毎日の Show HN

Upvote0

2026年6月10日 の Show HN

45 件
252

Extend UI – open-source UI kit for modern document apps #

extend.ai faviconextend.ai
81 コメント4:09 PMHN で見る
We're open-sourcing 14 components & examples today for PDF, DOCX, and XLSX viewers, plus bounding box citations, file upload, e-signature, and more. It's MIT licensed and fully customizable.

Demo video here: https://share.extend.ai/kRmSGKRF

When we started, we tried every file viewer and document component library we could find. Unfortunately, none of them had all the functionality (and polish) that we wanted, so we ended up building our own for https://extend.ai/. It was only ever meant to be internal, but enough customers kept asking for it that we decided to open source it.

It's useful for building document processing agents, real-time user facing document intake flows, or all kinds of internal tooling.

We naively thought this would be a solved problem. Turns out, making PDF/XLSX/DOCX viewers that work at scale is not trivial...we use and maintain it for Extend ourselves, so we've fixed a lot of edge cases that came up while running millions of pages / day through our own system. Our hope is that with our resources + community support, it'll keep getting better over time.

210

I am building a map of people who lived in the Roman Empire #

new.roman-names.com faviconnew.roman-names.com
51 コメント7:28 PMHN で見る
Driving home from work one day, I wanted to know how many people we knew the names of who lived during the Roman era. Searching around, I found lists of Consuls and officials, but nothing that covered ordinary people or even most people like freedmen and slaves. So I ended up building a pipeline to process the more than 500k Latin inscriptions in the Epigraphic Database Clauss-Slaby https://edcs.hist.uzh.ch/en/ and extract the names of people (and attempt to cluster them, but this is a work in progress).

There are databases where Classicists have done this manually for specific regions, Trismegistos https://www.trismegistos.org/ and Latin Inscriptions of the Roman Empire (LIRE) https://pure.au.dk/portal/en/publications/latin-inscriptions... are two major efforts I found. But there doesn't seem to be a project that did what I set out to do, although I have read in some places that it was believed to be possible.

I am not a classicist or a web developer, but I have Claude and Gemini and I can sort of read basic Latin - so I set to work. I used LIRE and another database as ground truth and built a pipeline to extract and process the inscriptions to recover the names. The process I developed uses a high end LLM like Sonnet or Gemini Pro to supervise the extraction and tuning process on a regional basis until the obvious error rate is reasonable. For this, so far, reasonable to me means less than 1-2% in the smaller initial samples of 100-500 and no observed systemic issues. The different regions often need different prompts, so this basically became an exercise in letting the higher level AI tune the prompt for the lower level AI. The extraction when measured against LIRE produces an F1 score between 0.64 and 0.87, but take this with a grain of salt.

Once I had done a few regions, I wanted to see the work, so I threw together a pretty crude website but as I am not a web developer, it was crude in how it accessed its data. It does look cool and I also added summarization, and machine translation to each entry. I wanted to eventually get feedback from an actual team of classicists and make the website work better, so I am rewriting it as we speak but it is broadly functional now with a few extra bugs but substantially improved performance compared to the old one. All entries link back to the proper sources, and the old web app linked to several additional sources where the data was present, but I haven't gotten that working again just yet on the new one. (The old web interface is still available at https://roman-names.com, but I will warn you it is clunky and not mobile friendly at all)

Key findings so far:

AI supervised AI extraction saved me time. I was manually tuning things for a while and then the runbook became an idea that I feed my instructions in and let the big AI go with sparse oversight from me.

The extraction improved significantly (by about 10 F1 points) when I fed the model the raw text including the markers, vs a cleaned up version of the text.

I just thought it was a cool little project and wanted to share. If you happen to work in any adjacent space and there is something I could do better etc let me know.

158

HelixDB – A graph database built on object storage #

github.com favicongithub.com
46 コメント3:47 PMHN で見る
Hey HN, it’s been just over a year since we launched HelixDB (https://news.ycombinator.com/item?id=43975423), a project a friend and I started in college. It’s an OLTP graph database built on object-storage, with native vector search and full-text search (FTS).

Why graph, vector and FTS? Graph databases provide a natural cognitive model for data, vectors allow for a semantic understanding of the entities and relationships in the graph, and FTS provides more specific filtering. Many AI-driven applications attempt to combine all of these functionalities by stitching together multiple disconnected systems, but even then there’s no native way to perform joins or queries that span all systems. You still need to handle this logic at the application level.

Helix started as a graph DB, but we moved to a hybrid graph/vector approach after attempting to build an AI memory system, which led us down the GraphRAG and HybridRAG rabbit hole, where we would need separate graph and vector databases.

We knew scalability would be a challenge at each stage of our product's development, however our initial focus this past year was to prove out the product through local deployments and was only meant to be run on a single node. Scaling graph DBs remained a difficult and expensive problem we’d have to solve later. Some common ways other graph DBs solve scaling is by duplicating entire datasets across distributed machines (extremely expensive per node), or by sharding the data.

Sharding databases is effective and affordable, however, graph data doesn’t have explicit partitions like relational databases do. For example, sharding a relational DB involves splitting up tables. When it comes to graph DBs, the edges can span across any of the partitions, and hopping across multiple machines when traversing nodes is ineffective and computationally expensive.

Replicating graph DBs for high availability and better throughput drastically increases the operational cost of the db and still has a limit of how big you can vertically scale. The workload that we’re used for requires storing a huge amount of data for agents, where only a subset of that data is ever needed at any one time. So rather than having the whole thing in memory, we can store it all in object-storage and get the bits we need when they’re needed.

Agents benefit from better context, which is achieved from more and better data (more relationships etc). By using S3 as the persistence/data layer there is no limit to how big the graph can be or how many relationships you can have, and we can scale to serve throughput and requests by horizontally spinning up nodes and caching relevant subsets of the graph on each node. This way, you get extremely low latency for “hot” data and a p99 of ~100ms for writes and ~50ms for reads from cold storage (S3). Plus you get the benefit of dirt cheap storage.

Workloads that HelixDB is currently supporting: - Huge amounts of data (TBs) from which the agents need to search and traverse over - Offering affordable graph storage for companies where cost of graph data is a bottleneck - Consolidating multiple databases, enabling AI agents to have autonomy over companies, helping them become more autonomous. - AI memory - Company brains

We’re currently working on our own generalised AI memory layer which will use HelixDB under the hood and be completely open-source. Also, we’re finishing up on pre-filtering for vector search which will allow you to pre-filter based on relationships in the graph, metadata, and sub-graphs. And lastly, GA cloud will be available in the coming weeks.

If you want to run Helix locally (either on-disk or in-memory), you can find more info on our github (https://github.com/HelixDB/helix-db) or via our docs (https://docs.helix-db.com/database/local-development). If you’re interested in getting started with our distributed cloud, please email us [email protected].

Many thanks! Comments and feedback welcome!

75

Lightweight Task queue on Erlang/OTP, SQLite-backed, no overengineering #

github.com favicongithub.com
17 コメント1:45 PMHN で見る
Setting up Kafka or such enterprise oriented software with their clusters or dedicated servers is heavy and bothering enough that most small teams or indie hackers skip it entirely and making compromise to use in-memory queues.

I wanted something in between: a persistent queue that is simple to run (one binary, which makes one sqlite db), gets real fault isolation and crash recovery due to Elixir, easy to inspect (open ezra.db in any SQLite browser and see every task), and requires no new client library - it speaks the Redis Streams wire protocol, so any Redis client in any language just works out of the box.

Very short demo video: [https://www.youtube.com/watch?v=MLYyD3DVWmE]

25

Artie – Real-time data replication to your warehouse, now self-serve #

artie.com faviconartie.com
6 コメント5:27 AMHN で見る
Hey HN, cofounder of Artie here. We’ve built a real-time data replication tool that captures every row-level change in your source database and streams it to your warehouse in under 60 seconds.

The last time I posted here, people had to book a call with us in order to access Artie. Today, that’s no longer the case. You can now connect your source and destination and start streaming immediately.

I spent years of my career building large-scale data pipelines and experienced how difficult it was to get real-time data firsthand. I believed there must be a better way to stream data into our warehouse, which resulted in Artie being born. And now with AI agents, reducing data latency has become more and more crucial as agents need to make decisions off of fresh data.

When I first started building Artie, I quickly learned that the components meant to keep CDC running smoothly are very much bolted on with tons of edge cases. Unfortunately in practice, they were not built to work together. We ended up dealing with schema drift, backfill race conditions, Kafka offset commits, and TOAST columns. I’d love to know if others have hit these same issues while building in-house.

artie.com, would love feedback!

11

Social network where inviting someone makes you accountable for them #

chirpper.com faviconchirpper.com
21 コメント3:09 PMHN で見る
Chirpper is invite-only. When you vouch someone in, they join your TrustChain. Their behavior affects your TrustRank, and that propagates up the lineage. No moderators. The accountability is architectural, not policy-based. You can be pseudonymous, but you can't be unaccountable. Happy to get into the mechanics in comments.
9

Magenta Real-Time Music Generation on iPhone, Without the GPU #

github.com favicongithub.com
0 コメント10:22 PMHN で見る
Last Thursday, Deepmind released Magenta Realtime 2 , an open source music generation model. They said it could run on Mac, but not iPhone.

As a v̵i̵b̵e̵ ̵c̵o̵d̵i̵n̵g̵ ̵a̵d̵d̵i̵c̵t̵ agentic AI maxxi and person who has melted iPhones before (link at bottom), I took that as a personal challenge and made it my weekend project.

On Saturday, I got it to run for 10min straight on an iPhone 12 Pro from 2020 without melting the phone or - shockingly - touching the GPU.

How? I chopped the model up into 5 pieces and set them each to run on different parts of Apple's system on a chip (SoC).

My past experience taught me that if you can actually leverage it, the iPhone's NPU is incredibly powerful, and power efficient. If you're doing sustained real-time generation for long periods of time on a device without a fan, you gotta use the neural engine or else you will melt the device.

See: https://accelerateordie.com/p/we-melted-iphones-for-science

The Apple Neural Engine has a ton of constraints, the main one being that it only accepts fixed shape inputs, and only supports some architectures -- which is why I chopped the model up into pieces.

But it works! And I wrote zero lines of code by hand. Back when I was running VC-backed companies, I would have needed a small team of grumpy greybeard engineers to do this and it would have taken 2-6 weeks. Now I can feed my own nerd fetish and do this stuff myself.

Next up: I'm building an iPhone app that ties into your heart rate, movement data, location etc to generate a real-time soundtrack to you life.

What a time to be alive!

7

Llmbuffer – Python library for cache-optimized LLM conversation history #

github.com favicongithub.com
1 コメント10:26 PMHN で見る
I was not getting good cache utilization when including dynamic context in agent threads. After a lot of experimentation, I found a good pattern that minimizes how often long lived conversation history gets modified while still supporting dynamic context. It has flexible hooks for doing things like truncating or summarizing tool outputs when transitioning messages to the long term history. And I'm seeing >>90% of tokens hitting the cache for my agents despite including a lot of dynamic user context.

There are a wide range of agent prompting strategies so I'd love to hear where this library works well and where there are patterns that don't fit well into the current API!

6

Learn while you wait for your agents to code #

github.com favicongithub.com
2 コメント4:53 PMHN で見る
Hi HN,

While waiting for Claude Code to finish running, It's very tempting to start another task or browse the internet. This is what happened to me so I built Foyer to try to learn about what the agents are working on instead of losing focus.

Product is an early MVP and would love some feedback on this.

4

A Bluesky client for PICO-8 #

picosky.vinnymac.dev faviconpicosky.vinnymac.dev
0 コメント12:50 PMHN で見る
I’ve been working on this for some time after learning about PICO-8 and its constraints to fit into the p8 cartridge limits and was looking for a challenge.

So I made Picosky, a Bluesky client for the PICO-8 console to see what would be possible combining GPIO with sockets. Initially I just aimed to like a post on my feed, and then it grew from there.

Which also inspired a sibling game, https://npicomx.vinnymac.dev based on my experience contributing to the npmx.dev project in early 2026.

Both games require more play testing for bugs, so share feedback and let me know what you would have done differently.

Thanks

4

Nuts – pip/NPM for Java with first-class workspaces, JDK provisioning #

github.com favicongithub.com
2 コメント10:23 PMHN で見る
My frustration with distributing java apps didnt show up recently. I remember having implemented my first network jar downloaded back in the 2000's because i needed applet like feature support with desktop full control. Years after, the problem is the very same. Webstart didnt really took off and the only mean i had in my projects was the ugly fatjars, including the (for me) uglier spring-boot repackaging that changes the application classloading behaviour and hence giving me by time some headackes i was not prepared for.

So basically nuts started as a response to this frustration 9 years ago, but from now i think its mature enough (used in production) to be shared, and forecebly i am more keen to need suggestions and help from fellow contributors.

3

This Week in Obsidian – Obsidian Newsletter Published Every Tuesday #

thisweekinobsidian.substack.com faviconthisweekinobsidian.substack.com
0 コメント3:24 AMHN で見る
I am the author of This Week in Obsidian, a newsletter published every Tuesday to help Obsidian users stay up-to-date with the latest community news, discussions, and interesting plugins. The Obsidian community is very active, with new plugins and lively discussions every day. For the average user, there's no need to spend a lot of time reading all of that. I hope This Week in Obsidian can help you spend just a little time each week keeping up with the latest from the community.

The newsletter has now published 25 issues and has over 500 subscribers. I've also worked out an efficient and simple publishing process, so maintaining the newsletter is cost-effective and doesn't take too much of my time or energy. That means I'll likely continue publishing it for a long time. If you're an Obsidian user too, maybe you'll also enjoy This Week in Obsidian. If you have suggestions for improvement, I'd love to hear your feedback.

3

Meadow Notes – extract and publish microsites from your Markdown graphs #

meadow-notes.com faviconmeadow-notes.com
0 コメント9:50 PMHN で見る
Sometimes I wonder if my life would be better if I'd never bumped into Andy Matuschak's ideas. But I did, and basically got one-shotted by his concept of "evergreen notes", which are notes that are similar to well-factored code. They're small, conceptual, and densely-linked.

They're also really hard to share with people, because those links form a complex graph that goes all over the place. You could share individual notes, but since the notes are small and have a lot of links there's not a lot of utility in sharing single notes. You could share all of them, but who in their right mind would want to share _all_ their notes.

I found myself wanting to share little groups of notes with different sets of people reasonably often. None of the existing publishing tools I encountered supported automatic curation where it would suggest a candidate graph that I could then modify. Also, none were geared towards publishing lots of little sites.

I've spent about a year part-time developing the project so far, and in that time I've probably published about 50 sites. Each time I learned something and improved the tooling. I like it pretty well now, so I'm sharing.

It's open source, but also has a "we host for you" option. You can publish 3 sites to the meadow-notes.com site for free, so you can try it out.

I'd love to know what you think.

1

Petiglyph – TUI/CLI to turn images and videos into custom font glyphs #

github.com favicongithub.com
0 コメント2:14 PMHN で見る
petiglyph is a TUI/CLI tool to easily turn images and videos into custom font glyphs, static or animated, to be added to your TUIs!

for example, if you wanted a specific icon to be available inside your Terminal UI without using Kitty graphics protocol, you could use petiglyph to generate a .ttf file that include a monochrome version of this icon as a glyph to be used like any other Nerd font glyph.

you can generate standard size glyphs or compose a grid of glyphs to display a bigger visual over several lines.

petiglyph also supports generating frames of glyphs (standard or grid) based on videos for you to render an animation inside the terminal, the TUI allows you to copy to clipboard the "glyph frames" to be able to easily paste them in your own code.

be aware that this tool relies on installing custom fonts on your system so once the fonts are installed you need to reboot the apps (or even your whole computer sometimes) for the fonts to be correctly loaded.

available on github, npm, pypi, AUR

1

Eatmydata.ai – Local-First Question-to-SQL-to-Dashboard AI #

eatmydata.ai faviconeatmydata.ai
0 コメント7:43 AMHN で見る
Yet another "talk to your data and build a dashboard" app, where data does not leave your browser.

You ask a question, agents produce multiple SQL queries to in-browser sqlite, never seeing results, and write dashboard configuration code. The data you analyze will be indexed with a local semantic index (embeddings generation + sqlite vector search fully local).

Next, sandboxed QuickJS runs this code to produce rich dashboards directly in your browser, no backend attached. This is a fully frontend app (except OpenRouter or other remote LLM).

All data sent to LLM's is heavily sanitized and obfuscated at several points. The remote LLM never sees the contents of data it analyzes. Why does it exist - I started this is a testbed for my local-first AI projects, agentic workflows and contextual data analysis experiments.

It grew into a tool I use daily for quick and dirty data analytics when I don't want to waste time debugging SQL or building charts for simple data questions, when I literally need an answer under 10s.

I also don't like the idea of sharing random data in Claude/ChatGPT chat, neither uploading any work-related datasets to them. Plus they both often choke on tiny 100k rows data.

Fully open-sourced under MIT https://github.com/eatmydata-org/eatmydata, run it yourself it's a static web app.

What's in the box:

- SQLite OPFS adapted from wa-sqlite, data queried only locally;

- TurboQuant semantic indexing extension for sqlite (MIT-licensed);

- Quantized PII detection and embedding generation models straight in browser;

- NER and embeddings inference engines in zero-dependency C and wasm-simd128 optimizations (1.7x faster and 38x lighter binary compare to onnxruntime);

- QuickJS sandbox for AI-generated code;

- Orchestrator <-> SQL Planner <-> Coder agent loop that build SQL and dashboards from user query;

- Apache ECharts for dashboards;

- Fork of xslx Community edition to support styles (missing in OSS version upstream).

Hope it'll be useful to anyone who is interested in local-first stuff.