2026年6月10日の Show HN

45 件

252

Extend UI – open-source UI kit for modern document apps #

extend.ai

81 コメント4:09 PMHN で見る

We're open-sourcing 14 components & examples today for PDF, DOCX, and XLSX viewers, plus bounding box citations, file upload, e-signature, and more. It's MIT licensed and fully customizable.

Demo video here: https://share.extend.ai/kRmSGKRF

When we started, we tried every file viewer and document component library we could find. Unfortunately, none of them had all the functionality (and polish) that we wanted, so we ended up building our own for https://extend.ai/. It was only ever meant to be internal, but enough customers kept asking for it that we decided to open source it.

It's useful for building document processing agents, real-time user facing document intake flows, or all kinds of internal tooling.

We naively thought this would be a solved problem. Turns out, making PDF/XLSX/DOCX viewers that work at scale is not trivial...we use and maintain it for Extend ourselves, so we've fixed a lot of edge cases that came up while running millions of pages / day through our own system. Our hope is that with our resources + community support, it'll keep getting better over time.

210

I am building a map of people who lived in the Roman Empire #

new.roman-names.com

51 コメント7:28 PMHN で見る

Driving home from work one day, I wanted to know how many people we knew the names of who lived during the Roman era. Searching around, I found lists of Consuls and officials, but nothing that covered ordinary people or even most people like freedmen and slaves. So I ended up building a pipeline to process the more than 500k Latin inscriptions in the Epigraphic Database Clauss-Slaby https://edcs.hist.uzh.ch/en/ and extract the names of people (and attempt to cluster them, but this is a work in progress).

There are databases where Classicists have done this manually for specific regions, Trismegistos https://www.trismegistos.org/ and Latin Inscriptions of the Roman Empire (LIRE) https://pure.au.dk/portal/en/publications/latin-inscriptions... are two major efforts I found. But there doesn't seem to be a project that did what I set out to do, although I have read in some places that it was believed to be possible.

I am not a classicist or a web developer, but I have Claude and Gemini and I can sort of read basic Latin - so I set to work. I used LIRE and another database as ground truth and built a pipeline to extract and process the inscriptions to recover the names. The process I developed uses a high end LLM like Sonnet or Gemini Pro to supervise the extraction and tuning process on a regional basis until the obvious error rate is reasonable. For this, so far, reasonable to me means less than 1-2% in the smaller initial samples of 100-500 and no observed systemic issues. The different regions often need different prompts, so this basically became an exercise in letting the higher level AI tune the prompt for the lower level AI. The extraction when measured against LIRE produces an F1 score between 0.64 and 0.87, but take this with a grain of salt.

Once I had done a few regions, I wanted to see the work, so I threw together a pretty crude website but as I am not a web developer, it was crude in how it accessed its data. It does look cool and I also added summarization, and machine translation to each entry. I wanted to eventually get feedback from an actual team of classicists and make the website work better, so I am rewriting it as we speak but it is broadly functional now with a few extra bugs but substantially improved performance compared to the old one. All entries link back to the proper sources, and the old web app linked to several additional sources where the data was present, but I haven't gotten that working again just yet on the new one. (The old web interface is still available at https://roman-names.com, but I will warn you it is clunky and not mobile friendly at all)

Key findings so far:

AI supervised AI extraction saved me time. I was manually tuning things for a while and then the runbook became an idea that I feed my instructions in and let the big AI go with sparse oversight from me.

The extraction improved significantly (by about 10 F1 points) when I fed the model the raw text including the markers, vs a cleaned up version of the text.

I just thought it was a cool little project and wanted to share. If you happen to work in any adjacent space and there is something I could do better etc let me know.

158

HelixDB – A graph database built on object storage #

github.com

46 コメント3:47 PMHN で見る

Hey HN, it’s been just over a year since we launched HelixDB (https://news.ycombinator.com/item?id=43975423), a project a friend and I started in college. It’s an OLTP graph database built on object-storage, with native vector search and full-text search (FTS).

Why graph, vector and FTS? Graph databases provide a natural cognitive model for data, vectors allow for a semantic understanding of the entities and relationships in the graph, and FTS provides more specific filtering. Many AI-driven applications attempt to combine all of these functionalities by stitching together multiple disconnected systems, but even then there’s no native way to perform joins or queries that span all systems. You still need to handle this logic at the application level.

Helix started as a graph DB, but we moved to a hybrid graph/vector approach after attempting to build an AI memory system, which led us down the GraphRAG and HybridRAG rabbit hole, where we would need separate graph and vector databases.

We knew scalability would be a challenge at each stage of our product's development, however our initial focus this past year was to prove out the product through local deployments and was only meant to be run on a single node. Scaling graph DBs remained a difficult and expensive problem we’d have to solve later. Some common ways other graph DBs solve scaling is by duplicating entire datasets across distributed machines (extremely expensive per node), or by sharding the data.

Sharding databases is effective and affordable, however, graph data doesn’t have explicit partitions like relational databases do. For example, sharding a relational DB involves splitting up tables. When it comes to graph DBs, the edges can span across any of the partitions, and hopping across multiple machines when traversing nodes is ineffective and computationally expensive.

Replicating graph DBs for high availability and better throughput drastically increases the operational cost of the db and still has a limit of how big you can vertically scale. The workload that we’re used for requires storing a huge amount of data for agents, where only a subset of that data is ever needed at any one time. So rather than having the whole thing in memory, we can store it all in object-storage and get the bits we need when they’re needed.

Agents benefit from better context, which is achieved from more and better data (more relationships etc). By using S3 as the persistence/data layer there is no limit to how big the graph can be or how many relationships you can have, and we can scale to serve throughput and requests by horizontally spinning up nodes and caching relevant subsets of the graph on each node. This way, you get extremely low latency for “hot” data and a p99 of ~100ms for writes and ~50ms for reads from cold storage (S3). Plus you get the benefit of dirt cheap storage.

Workloads that HelixDB is currently supporting: - Huge amounts of data (TBs) from which the agents need to search and traverse over - Offering affordable graph storage for companies where cost of graph data is a bottleneck - Consolidating multiple databases, enabling AI agents to have autonomy over companies, helping them become more autonomous. - AI memory - Company brains

We’re currently working on our own generalised AI memory layer which will use HelixDB under the hood and be completely open-source. Also, we’re finishing up on pre-filtering for vector search which will allow you to pre-filter based on relationships in the graph, metadata, and sub-graphs. And lastly, GA cloud will be available in the coming weeks.

If you want to run Helix locally (either on-disk or in-memory), you can find more info on our github (https://github.com/HelixDB/helix-db) or via our docs (https://docs.helix-db.com/database/local-development). If you’re interested in getting started with our distributed cloud, please email us [email protected].

Many thanks! Comments and feedback welcome!

Lightweight Task queue on Erlang/OTP, SQLite-backed, no overengineering #

github.com

17 コメント1:45 PMHN で見る

Setting up Kafka or such enterprise oriented software with their clusters or dedicated servers is heavy and bothering enough that most small teams or indie hackers skip it entirely and making compromise to use in-memory queues.

I wanted something in between: a persistent queue that is simple to run (one binary, which makes one sqlite db), gets real fault isolation and crash recovery due to Elixir, easy to inspect (open ezra.db in any SQLite browser and see every task), and requires no new client library - it speaks the Redis Streams wire protocol, so any Redis client in any language just works out of the box.

Very short demo video: [https://www.youtube.com/watch?v=MLYyD3DVWmE]

macOS menu bar gauges for your Claude Code quota #

github.com

40 コメント9:43 AMHN で見る

Turn your name into a tree in an infinite procedural shanshui landscape #

landscape.bairui.dev

21 コメント2:39 PMHN で見る

Artie – Real-time data replication to your warehouse, now self-serve #

artie.com

6 コメント5:27 AMHN で見る

Hey HN, cofounder of Artie here. We’ve built a real-time data replication tool that captures every row-level change in your source database and streams it to your warehouse in under 60 seconds.

The last time I posted here, people had to book a call with us in order to access Artie. Today, that’s no longer the case. You can now connect your source and destination and start streaming immediately.

I spent years of my career building large-scale data pipelines and experienced how difficult it was to get real-time data firsthand. I believed there must be a better way to stream data into our warehouse, which resulted in Artie being born. And now with AI agents, reducing data latency has become more and more crucial as agents need to make decisions off of fresh data.

When I first started building Artie, I quickly learned that the components meant to keep CDC running smoothly are very much bolted on with tons of edge cases. Unfortunately in practice, they were not built to work together. We ended up dealing with schema drift, backfill race conditions, Kafka offset commits, and TOAST columns. I’d love to know if others have hit these same issues while building in-house.

artie.com, would love feedback!

NBSDgames – 21 new, improved, original text games for Unix, DOS, Plan9 #

github.com

1 コメント3:08 PMHN で見る

Social network where inviting someone makes you accountable for them #

chirpper.com

21 コメント3:09 PMHN で見る

Chirpper is invite-only. When you vouch someone in, they join your TrustChain. Their behavior affects your TrustRank, and that propagates up the lineage. No moderators. The accountability is architectural, not policy-based. You can be pseudonymous, but you can't be unaccountable. Happy to get into the mechanics in comments.

Camel Mono – a monospace font that makes camelCase easier to read #

github.com

0 コメント2:02 PMHN で見る

Magenta Real-Time Music Generation on iPhone, Without the GPU #

github.com

0 コメント10:22 PMHN で見る

Last Thursday, Deepmind released Magenta Realtime 2 , an open source music generation model. They said it could run on Mac, but not iPhone.

As a v̵i̵b̵e̵ ̵c̵o̵d̵i̵n̵g̵ ̵a̵d̵d̵i̵c̵t̵ agentic AI maxxi and person who has melted iPhones before (link at bottom), I took that as a personal challenge and made it my weekend project.

On Saturday, I got it to run for 10min straight on an iPhone 12 Pro from 2020 without melting the phone or - shockingly - touching the GPU.

How? I chopped the model up into 5 pieces and set them each to run on different parts of Apple's system on a chip (SoC).

My past experience taught me that if you can actually leverage it, the iPhone's NPU is incredibly powerful, and power efficient. If you're doing sustained real-time generation for long periods of time on a device without a fan, you gotta use the neural engine or else you will melt the device.

See: https://accelerateordie.com/p/we-melted-iphones-for-science

The Apple Neural Engine has a ton of constraints, the main one being that it only accepts fixed shape inputs, and only supports some architectures -- which is why I chopped the model up into pieces.

But it works! And I wrote zero lines of code by hand. Back when I was running VC-backed companies, I would have needed a small team of grumpy greybeard engineers to do this and it would have taken 2-6 weeks. Now I can feed my own nerd fetish and do this stuff myself.

Next up: I'm building an iPhone app that ties into your heart rate, movement data, location etc to generate a real-time soundtrack to you life.

What a time to be alive!

I build an app to learn personal finance #

finance.usescroll.app

10 コメント5:46 PMHN で見る

Llmbuffer – Python library for cache-optimized LLM conversation history #

github.com

1 コメント10:26 PMHN で見る

I was not getting good cache utilization when including dynamic context in agent threads. After a lot of experimentation, I found a good pattern that minimizes how often long lived conversation history gets modified while still supporting dynamic context. It has flexible hooks for doing things like truncating or summarizing tool outputs when transitioning messages to the long term history. And I'm seeing >>90% of tokens hitting the cache for my agents despite including a lot of dynamic user context.

There are a wide range of agent prompting strategies so I'd love to hear where this library works well and where there are patterns that don't fit well into the current API!

A 150M model that extracts verbatim evidence spans for RAG, no LLM call #

huggingface.co

0 コメント4:29 PMHN で見る

Drift – an embedding-model upgrade should be a rotation, not a reindex #

github.com

3 コメント12:08 PMHN で見る

Learn while you wait for your agents to code #

github.com

2 コメント4:53 PMHN で見る

Hi HN,

While waiting for Claude Code to finish running, It's very tempting to start another task or browse the internet. This is what happened to me so I built Foyer to try to learn about what the agents are working on instead of losing focus.

Product is an early MVP and would love some feedback on this.

Jailbreak this model to get 3B tokens #

opir.ai

0 コメント11:47 PMHN で見る

Impress your boss with interactive Decision Tree Visualization #

github.com

0 コメント10:34 AMHN で見る

RiskKernel, kill -9 an AI agent and resume it without paying twice #

riskkernel.com

6 コメント12:37 PMHN で見る

Kctx – A read-only Kubernetes context engine for SREs and AI Agents #

github.com

0 コメント8:44 PMHN で見る

A curated collection of simple datasets for machine learning #

github.com

1 コメント2:10 PMHN で見る

WebCLI – what if the browser was just another Unix command? #

webcli.sh

0 コメント8:27 AMHN で見る

A Bluesky client for PICO-8 #

picosky.vinnymac.dev

0 コメント12:50 PMHN で見る

I’ve been working on this for some time after learning about PICO-8 and its constraints to fit into the p8 cartridge limits and was looking for a challenge.

So I made Picosky, a Bluesky client for the PICO-8 console to see what would be possible combining GPIO with sockets. Initially I just aimed to like a post on my feed, and then it grew from there.

Which also inspired a sibling game, https://npicomx.vinnymac.dev based on my experience contributing to the npmx.dev project in early 2026.

Both games require more play testing for bugs, so share feedback and let me know what you would have done differently.

Thanks

Nuts – pip/NPM for Java with first-class workspaces, JDK provisioning #

github.com

2 コメント10:23 PMHN で見る

My frustration with distributing java apps didnt show up recently. I remember having implemented my first network jar downloaded back in the 2000's because i needed applet like feature support with desktop full control. Years after, the problem is the very same. Webstart didnt really took off and the only mean i had in my projects was the ugly fatjars, including the (for me) uglier spring-boot repackaging that changes the application classloading behaviour and hence giving me by time some headackes i was not prepared for.

So basically nuts started as a response to this frustration 9 years ago, but from now i think its mature enough (used in production) to be shared, and forecebly i am more keen to need suggestions and help from fellow contributors.

Tapflow – self-hosted iOS/Android simulator streaming for mobile QA #

github.com

0 コメント9:24 AMHN で見る

I'd love feedback from anyone who's fought the simulator-access problem, or has opinions on the private-API touch approach or the MSE → WebCodecs/WASM latency path.

Amanuensis – a local-first AI persona that won't fabricate facts #

github.com

0 コメント6:08 PMHN で見る

This Week in Obsidian – Obsidian Newsletter Published Every Tuesday #

thisweekinobsidian.substack.com

0 コメント3:24 AMHN で見る

I am the author of This Week in Obsidian, a newsletter published every Tuesday to help Obsidian users stay up-to-date with the latest community news, discussions, and interesting plugins. The Obsidian community is very active, with new plugins and lively discussions every day. For the average user, there's no need to spend a lot of time reading all of that. I hope This Week in Obsidian can help you spend just a little time each week keeping up with the latest from the community.

The newsletter has now published 25 issues and has over 500 subscribers. I've also worked out an efficient and simple publishing process, so maintaining the newsletter is cost-effective and doesn't take too much of my time or energy. That means I'll likely continue publishing it for a long time. If you're an Obsidian user too, maybe you'll also enjoy This Week in Obsidian. If you have suggestions for improvement, I'd love to hear your feedback.

Practicing foreign language generating conversation on topic [video] #

youtube.com

0 コメント7:38 AMHN で見る

SoulOS open-source replacing system prompts with stable state machines #

github.com

1 コメント12:09 PMHN で見る

Construct SQL from table records by breaking down decision tree #

github.com

1 コメント2:04 PMHN で見る

Create SQL by over fitting decision tree on data

Then I simplify the boolean representation.

The demo is hosted in streamlit (https://inversql.streamlit.app).

Sorry if this is a repost, my previous post had the wrong tag.

Meadow Mind – a 7B diffusion LLM plays Gym games with zero training #

github.com

0 コメント5:41 PMHN で見る

KnowledgeMCP – Turn any docs into an MCP endpoint (0 LLM at query time) #

github.com

0 コメント12:36 AMHN で見る

AI Mime - Use a screen recording for context instead of prompting #

github.com

0 コメント6:21 PMHN で見る

Meadow Notes – extract and publish microsites from your Markdown graphs #

meadow-notes.com

0 コメント9:50 PMHN で見る

Sometimes I wonder if my life would be better if I'd never bumped into Andy Matuschak's ideas. But I did, and basically got one-shotted by his concept of "evergreen notes", which are notes that are similar to well-factored code. They're small, conceptual, and densely-linked.

They're also really hard to share with people, because those links form a complex graph that goes all over the place. You could share individual notes, but since the notes are small and have a lot of links there's not a lot of utility in sharing single notes. You could share all of them, but who in their right mind would want to share _all_ their notes.

I found myself wanting to share little groups of notes with different sets of people reasonably often. None of the existing publishing tools I encountered supported automatic curation where it would suggest a candidate graph that I could then modify. Also, none were geared towards publishing lots of little sites.

I've spent about a year part-time developing the project so far, and in that time I've probably published about 50 sites. Each time I learned something and improved the tooling. I like it pretty well now, so I'm sharing.

It's open source, but also has a "we host for you" option. You can publish 3 sites to the meadow-notes.com site for free, so you can try it out.

I'd love to know what you think.

I let an AI C-suite run my company – starter kit from the inside #

thepromptnova.gumroad.com

0 コメント10:27 PMHN で見る

Sum Type and Type Matching in C #

github.com

0 コメント2:18 AMHN で見る

Previous thread : https://news.ycombinator.com/item?id=45145176

Tried to implement a best effort pattern patching inspired solution in C. Destructuring not available, and nesting not directly available, but both can be achieved by opening more Match-When blocks. I guess we can say this is almost pattern matching in C.

SNItch – fuzz the TLS SNI field to discover hidden virtual hosts #

github.com

0 コメント12:44 PMHN で見る

Nodrix – open-source IoT cloud deployed to your own Cloudflare account #

github.com

0 コメント11:48 PMHN で見る

Tempis, a canvas-based timeline component for large datasets #

tempis.dev

0 コメント12:39 PMHN で見る

Petiglyph – TUI/CLI to turn images and videos into custom font glyphs #

github.com

0 コメント2:14 PMHN で見る

petiglyph is a TUI/CLI tool to easily turn images and videos into custom font glyphs, static or animated, to be added to your TUIs!

for example, if you wanted a specific icon to be available inside your Terminal UI without using Kitty graphics protocol, you could use petiglyph to generate a .ttf file that include a monochrome version of this icon as a glyph to be used like any other Nerd font glyph.

you can generate standard size glyphs or compose a grid of glyphs to display a bigger visual over several lines.

petiglyph also supports generating frames of glyphs (standard or grid) based on videos for you to render an animation inside the terminal, the TUI allows you to copy to clipboard the "glyph frames" to be able to easily paste them in your own code.

be aware that this tool relies on installing custom fonts on your system so once the fonts are installed you need to reboot the apps (or even your whole computer sometimes) for the fonts to be correctly loaded.

available on github, npm, pypi, AUR

LoopGain – Cut agent API spend by measuring when loops stop improving #

github.com

1 コメント12:38 PMHN で見る

A CLI that fact-checks text using a 500MB local model and web search #

github.com

0 コメント3:09 PMHN で見る

DESi Sees It #

hstre.github.io

0 コメント2:12 PMHN で見る

An Alternative to Selenium IDE #

uindow.com

0 コメント4:54 PMHN で見る

Eatmydata.ai – Local-First Question-to-SQL-to-Dashboard AI #

eatmydata.ai

0 コメント7:43 AMHN で見る

Yet another "talk to your data and build a dashboard" app, where data does not leave your browser.

You ask a question, agents produce multiple SQL queries to in-browser sqlite, never seeing results, and write dashboard configuration code. The data you analyze will be indexed with a local semantic index (embeddings generation + sqlite vector search fully local).

Next, sandboxed QuickJS runs this code to produce rich dashboards directly in your browser, no backend attached. This is a fully frontend app (except OpenRouter or other remote LLM).

All data sent to LLM's is heavily sanitized and obfuscated at several points. The remote LLM never sees the contents of data it analyzes. Why does it exist - I started this is a testbed for my local-first AI projects, agentic workflows and contextual data analysis experiments.

It grew into a tool I use daily for quick and dirty data analytics when I don't want to waste time debugging SQL or building charts for simple data questions, when I literally need an answer under 10s.

I also don't like the idea of sharing random data in Claude/ChatGPT chat, neither uploading any work-related datasets to them. Plus they both often choke on tiny 100k rows data.

Fully open-sourced under MIT https://github.com/eatmydata-org/eatmydata, run it yourself it's a static web app.

What's in the box:

- SQLite OPFS adapted from wa-sqlite, data queried only locally;

- TurboQuant semantic indexing extension for sqlite (MIT-licensed);

- Quantized PII detection and embedding generation models straight in browser;

- NER and embeddings inference engines in zero-dependency C and wasm-simd128 optimizations (1.7x faster and 38x lighter binary compare to onnxruntime);

- QuickJS sandbox for AI-generated code;

- Orchestrator <-> SQL Planner <-> Coder agent loop that build SQL and dashboards from user query;

- Apache ECharts for dashboards;

- Fork of xslx Community edition to support styles (missing in OSS version upstream).

Hope it'll be useful to anyone who is interested in local-first stuff.

2026年6月10日 の Show HN

2026年6月10日の Show HN