Show HN for April 15, 2026
45 items

Libretto – Making AI browser automations deterministic #
Here’s a demo: https://www.youtube.com/watch?v=0cDpIntmHAM. Docs start at https://libretto.sh/docs/get-started/introduction.
We spent a year building and maintaining browser automations for EHR and payer portal integrations at our healthcare startup. Building these automations and debugging failed ones was incredibly time-consuming.
There are lots of tools that use runtime AI, like Browser Use and Stagehand, which we tried, but:
(1) They rely on custom DOM parsing that's unreliable on older and complicated websites (including all of healthcare). Using a website's internal network calls is faster and more reliable when possible.
(2) They can be expensive, since they rely on lots of AI calls, and for workflows with complicated logic you can't always rely on caching actions to make sure it will work.
(3) Because they run at runtime, it's not interpretable what the agent is going to do. You kind of hope you prompted it correctly, but legacy workflows are often unintuitive and inconsistent across sites, so you can't trust an agent to just figure it out at runtime.
(4) They don't really help you generate new automations or debug automation failures.
We wanted a way to reliably generate and maintain browser automations in messy, high-stakes environments, without relying on fragile runtime agents.
Libretto is different because instead of runtime agents it uses “development-time AI”: scripts are generated ahead of time as actual code you can read and control, not opaque agent behavior at runtime. Instead of a black box, you own the code and can inspect, modify, version, and debug everything.
Rather than relying on runtime DOM parsing, Libretto takes a hybrid approach combining Playwright UI automation with direct network/API requests within the browser session for better reliability and bot detection evasion.
It records manual user actions to help agents generate and update scripts, supports step-through debugging, has an optional read-only mode to prevent agents from accidentally submitting or modifying data, and generates code that follows the abstractions and conventions you already have in your repo.
Would love to hear how others are building and maintaining browser automations in practice, and any feedback on the approach we’ve taken here.
SmallDocs - A CLI and webapp for private Markdown reading and sharing #
The more we work with command-line agents, the more `.md` files are part of our daily lives. Their output is great for agents to produce, but a little frustrating for humans: Markdown files are slightly annoying to read/preview and fiddly to share/receive. SDocs is a tool I built to resolve these pain points.
If you `sdoc path/to/file.md` (after `npm i -g sdocs-dev`) it instantly opens in the browser for you to preview (with our hopefully-nice-to-look-at default styling) and you can immediately share the url.
The `.md` files our agents produce contain some of the most sensitive information we have (about codebases, unresolved bugs, production logs, etc.). For this reason 100% privacy is an essential component of SDocs.
To achieve this SDoc urls contain your markdown document's content in compressed base64 in the url fragment (the bit after the `#`):
https://sdocs.dev/#md=GzcFAMT...(this is the contents of your document)...
The cool thing about the url fragment is that it is never sent to the server (see https://developer.mozilla.org/en-US/docs/Web/URI/Reference/F...: "The fragment is not sent to the server when the URI is requested; it is processed by the client").
The sdocs.dev webapp is purely a client side decoding and rendering engine for the content stored in the url fragment. This means the contents of your document stays with you and those you choose to share it with, the SDocs server doesn't access it. (Feel free to inspect/get your agent to inspect our code to confirm this!)
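As a rough sketch of this idea (the exact compression scheme and encoding SDocs actually uses may differ), building and reading such a URL could look like:

```python
import base64
import zlib

def make_share_url(markdown: str) -> str:
    # Compress, then URL-safe base64-encode. Everything after '#'
    # stays on the client and is never sent to the server.
    compressed = zlib.compress(markdown.encode("utf-8"), level=9)
    payload = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"https://sdocs.dev/#md={payload}"

def read_share_url(url: str) -> str:
    # The webapp would decode the fragment entirely client-side.
    payload = url.split("#md=", 1)[1]
    return zlib.decompress(base64.urlsafe_b64decode(payload)).decode("utf-8")
```

The round trip is lossless, so the server only ever serves the static decoder page.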
Because `.md` files might play a big role in the future of work, SDocs wants to push the boundaries of styling and rendering interesting content in markdown files. There is much more to do, but to start with you can add complex styling and render charts visually. The SDocs root (which renders `sdoc.md` with our default styles) has pictures and links to some adventurous examples. `sdoc schema` and `sdoc charts` provide detailed information for you or your agent about how to make the most of SDocs formatting.
If you share a SDocs URL, your styles travel with it, because they are added as YAML front matter - https://jekyllrb.com/docs/front-matter/ - to the markdown file. E.g.:

    ---
    styles:
      fontFamily: Lora
      baseFontSize: 17
    ---
At work, we've been putting this project to the test. My team and I have found SDocs to be particularly useful for sharing agent debugging reports and getting easily copyable content out of Claude (e.g. a series of bash commands that need to be run). To encourage our agents to use SDocs, we add a few lines about them in our root "agent files" (e.g. ~/.claude/CLAUDE.md or ~/.codex/AGENTS.md). When you use the CLI for the first time, there is an optional setup phase to do this for you.
I'm of course very interested in feedback and open to pull requests if you want to add features to SDocs.
Thank you for taking a look!
Avec – iOS email app that lets you handle your Gmail inbox in seconds #
A few friends & I have just spent the past ~2 years building a new kind of email app for iOS. It's now available in GA and we think some of you might find it interesting.
Why another email app, you ask? We think that while many interesting attempts at one-upping Gmail were made in the past 25 years, no one could really solve the core problem we all experience with email: information overload.
Of course, LLMs have completely changed that, as they finally offer a path to do refined triage of your inbox. However, as anyone who's tried the AI features most email apps offer nowadays can attest, they feel tacked-on and are rarely useful.
That is why this took us ~2 years: we rebuilt an email app from the ground up, thinking about where we could thoughtfully and usefully leverage LLMs. Some of our faves:
- AI-based prioritization
- Voice-based email drafting that takes a few words from you and embellishes them into a polished email
- Very actionable, to-the-point notifications that don't force you to open the app to know what's up
Ultimately, while we tried hard to make a really great app, our goal is for you to spend as little time in it as possible. Ergo our tagline: we'd like to let you handle your inbox in seconds and get back to your life.
You can download it on the App Store at https://avec.ai/download (US & Canada), or on TestFlight at https://avec.ai/testflight in the rest of the world. You need a Gmail/Google Workspace account to try it out, and should you just be curious to kick the tires without committing, you'll be able to delete your account (& all associated data) directly within the app.
Some things of potential interest to HN readers:
- It is not local-first. We run a complex pipeline to process emails on our servers, as in our experience local models that can run on an iPhone are not yet good enough to support our core features.
- We use a variety of LLM vendors. One thing we've found is that no model is a magic bullet. Some features (like voice-based drafting) require extremely low latency, while others (like personalization) require the smartest model available. We ended up stitching a lot of these together through trial and error.
- It is not yet available on Android, but we hope to release an Android version in the future :)
We'd love to hear folks' feedback and questions!
Omi – watches your screen, hears conversations, tells you what to do #
Basically Cluely + Rewind + Granola + Wisprflow + ChatGPT + Claude in one app
I talk to Claude/ChatGPT 24/7, but I find it frustrating that I have to capture/send screenshots of my screen and that it doesn't help proactively during my work.
Whenever omi sees something wrong about my workflow, it will send me a proactive notification with advice. It will also point to something I'm missing.
The hardest part was nailing proactivity - after trying 20+ similar tools I didn't find a single one with smart proactive notifications based on the content on your screen. I made it look at your screen every second with 4 main prompts:
1. Is the user productive or distracted?
2. Is there anything useful to say right now?
3. Is there any task to add to do later?
4. Is there anything important to remember about the user?
Full stack:
- Swift
- Rust backend
- Deepgram transcription
- Claude Code for messaging
- GPT 5.4 summaries
- Gemini for embeddings and translation
Open source, stores screenshots locally, uses Claude Code for chat. Cloud sync for hardware or the mobile app is available but can be disabled in settings.
Compile English specs into 22 MB neural functions that run locally #
You describe a function in English — like "classify if this message is urgent" — and PAW compiles it into a tiny neural program (22 MB) that runs locally like a normal Python function. No API keys, no internet after compilation, deterministic output.
It's for tasks that are easy to describe but hard to code with rules: urgency triage, JSON repair, log filtering, tool routing for agents.
    pip install programasweights

    import programasweights as paw
    f = paw.compile_and_load("Classify if this is urgent or not.")
    f("Need your signature by EOD")  # "urgent"
Compilation takes a few seconds on our server. After that, everything runs on your machine. Each program is a LoRA adapter + text instructions that adapt a fixed pretrained interpreter (Qwen3 0.6B). The model itself is unchanged; all task behavior comes from the compiled program. On our evaluation, this 0.6B interpreter with PAW reaches 73% accuracy. Prompting the same 0.6B directly gets 10%. Even prompting Qwen3 32B only gets 69%.
Also runs in the browser (GPT-2 124M, WebAssembly): https://programasweights.com/browser
You can also use it in your AI agents by copying the prompt here: https://programasweights.com/agents
Source: https://github.com/programasweights
Try it out: https://programasweights.com
Xit – a Git-compatible VCS written in Zig #
Skillgrab – scan any project, auto-install matching AI skills #
Lazyagent – TUI to watch all your AI coding agents #
Lazyagent is a terminal TUI that collects events from Claude Code, Codex, and OpenCode and shows them in one place. It groups sessions from different runtimes by working directory, so Claude and Codex runs on the same repo appear under the same project. From there you can:
- Filter events by type: tool calls, user prompts, session lifecycle, system events, or code changes only.
- See which agent or subagent is responsible for each action. The agent tree shows parent-child relationships, so you can trace exactly what a spawned subagent did vs what the parent delegated.
- View code diffs at a glance. Edit, Write, and apply_patch events render syntax-highlighted diffs inline, with addition/deletion stats. No need to switch to a terminal or git to see what changed.
- Search across all event payloads with full-text search. Useful when you know a file was touched but not which agent or tool did it.
- Watch a run in real time, or go back through a completed session to trace.
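The by-working-directory grouping described above can be sketched roughly like this (illustrative only, not Lazyagent's actual code):

```python
from collections import defaultdict

def group_by_project(sessions):
    # One bucket per working directory, regardless of which runtime
    # (Claude Code, Codex, OpenCode) produced the session.
    projects = defaultdict(list)
    for session in sessions:
        projects[session["cwd"]].append(session)
    return dict(projects)
```

So a Claude run and a Codex run in the same repo land under the same project entry.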
For me, it's quite useful to be able to filter events and verify that each agent only plays the role I want.
I hope you find it useful too!
Jeeves – TUI for browsing and resuming AI agent sessions #
StockFit API – structured SEC EDGAR data with a free tier #
MCP server gives your agent a budget (save tokens, get smarter results) #
So I built l6e: an MCP server that gives your agent the ability to budget. It works with Cursor, Claude Code, Windsurf, Openclaw, and every MCP-compatible application.
Saving money was why I built it, but what surprised me was that the process of budgeting changed the agent's behavior. An agent that understands the limits of its resources doesn't speculatively stuff the context window with extra files. It doesn't try to hit every possible API. It plans ahead, sticks to the plan, and stops work when it should.
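To illustrate the core idea (a toy sketch, not l6e's actual implementation; the rate and threshold logic are invented for the example), a budget that debits an estimated cost per call might look like:

```python
class TokenBudget:
    """Debit an estimated dollar cost per model call and refuse work
    once the budget is spent, so the agent must plan around it."""

    def __init__(self, dollars: float, cost_per_1k_tokens: float = 0.003):
        self.remaining = dollars
        self.rate = cost_per_1k_tokens / 1000  # dollars per token

    def charge(self, tokens: int) -> bool:
        cost = tokens * self.rate
        if cost > self.remaining:
            return False  # over budget: the agent should stop or re-plan
        self.remaining -= cost
        return True
```

An MCP tool wrapping something like this lets the agent query what it has left before deciding to read another file.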
It works, and we've been dogfooding it hard. After v1 shipped, the rest of l6e was all built with it. We launched the entire docs site using frontier models for $0.99. The kicker was that every time l6e broke in development, I could feel the pain: the agent got sloppy, burned through context, and output quality dropped right along with it.
Install: pip install l6e-mcp
Docs: https://docs.l6e.ai
GitHub: https://github.com/l6e-ai/l6e-mcp
Website: https://l6e.ai
Happy to answer questions about the system design, calibration models, or why I can't go back to coding without it.
Terminal-Wrench, a dataset of 331 realistic hackable environments #
Dependicus, a dashboard for your monorepo's dependencies #
Once I had that working, I realized I had enough data to add ticket tracking. It uses the data it gathers from the package manager to keep Linear or GitHub issues updated. And by auto-assigning those issues to coding agents, I get a Dependabot-but-better experience: agents keep up with API updates in addition to just bumping versions, and group related updates automatically.
It's still early days, but it's working really well for us and I think people will find value in it, so I'm sharing here!
Chrome extension that extracts styles and generates DESIGN.md files #
Agent Citizen – Your AI agents are sitting around doing nothing #
Scope-structured arena memory for C, O(1) cleanup, no GC/borrow checker #
The safety default: allocating functions return ARENA_PTR handles (packed arena_id + offset integers), not raw pointers. A dangling pointer at a function return boundary is unconstructable by default. Cross-scope lifetime extension is explicit — you enter the target arena via SCOPE(ptr) before allocating, which routes the object into the outer arena without transferring ownership.
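A sketch of the handle idea (in Python for brevity, since the library itself is C macros; the bit widths and table lookup here are illustrative, not the actual ariandel layout):

```python
# Pack an arena id and an offset into one integer so no raw pointer
# exists to dangle at a scope boundary.
ARENA_SHIFT = 48
OFFSET_MASK = (1 << ARENA_SHIFT) - 1

def make_handle(arena_id: int, offset: int) -> int:
    return (arena_id << ARENA_SHIFT) | (offset & OFFSET_MASK)

def handle_arena(h: int) -> int:
    return h >> ARENA_SHIFT

def handle_offset(h: int) -> int:
    return h & OFFSET_MASK

# Dereference resolves through a table of live arenas; if the arena
# was torn down, the lookup fails instead of touching freed memory.
arenas = {0: bytearray(1024)}

def deref(h: int) -> int:
    arena = arenas[handle_arena(h)]  # KeyError if the arena is gone
    return arena[handle_offset(h)]
```

The point is that the handle is just data: returning it from a function cannot smuggle out a raw address into a dead arena.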
Benchmarks (no optimization flags): 1M-node tree cleanup drops from 31ms to 1ms (~30×). There's a real regression in tight inner loops (~0.76×) because DEREF can't hoist the base pointer the way a compiler would — the spec documents this honestly.
This is a C macro-based proof-of-concept for a memory model I'm targeting in a compiled language. The interesting question isn't the C implementation — it's whether scope-structured arena routing is a sound replacement for GC and borrow checking across the class of programs that matter.
Repo: https://github.com/hollow-arena/ariandel — SPEC.md has the full model including concurrency semantics and the comparison to Tofte & Talpin region-based memory.
Monadic Networking Library for Go #
My favorite local-feeling remotely accessible Claude Code setup #
The setup is cmux for the terminal multiplexer UI, tmux to keep sessions alive between connections, Tailscale for zero-config encrypted networking, and Echo app as the iOS SSH client. Optional Mosh for auto-reconnect when you switch networks.
The gist includes a ccode shell function that handles session naming, tmux lifecycle, continue/skip-permissions flags, and a pre-flight check so you don’t get a blank window when there’s nothing to continue.
Pseudonymizing sensitive data for LLMs without losing context #
Tine – Drive Wayland Around with Agents #
Tine is a GNOME extension and CLI that lets an agent (I have used Claude, but in theory any agent that can access the CLI) drive the desktop using accessibility (AT-SPI2) trees, OCR, and visual fallbacks. Agents can work with the a11y trees, take screenshots, zoom in on a grid, click, enter text using a uinput device, and generally bumble their way around a Wayland Linux desktop.
This project would probably have been way easier in x11 but Wayland is teh future!!!111 Thanks for any thoughts and feedback and feels good to release something here after a decade of lurking. Decade plus but who's counting / I'm not old.
Cush – curl your shell, an HTTP tunnel for AI agents #
The problem is that getting said agents onto a remote server, especially one you don't control, means dealing with VPNs, bastion hosts, firewall rules, access controls, and audit trails. That's assuming SSH isn't blocked outright.
cush takes a different approach. Instead of a shell, it opens a temporary, outbound HTTPS tunnel that lets you and your AI agent run constrained CLI commands on the server:
    $ cush open --allow grep,cat,tail --expiry 2h
    tunnel: https://abc123.ngrok.io
    token: a3f9c2d1...
    allowed: grep, cat, tail
    expires: in 2h
Now any agent or HTTP client can execute allowed commands:

    $ curl -X POST https://abc123.ngrok.io \
      -H "Authorization: Bearer a3f9c2d1..." \
      -H "Content-Type: application/json" \
      -d '{"command": ["grep", "-r", "ERROR", "/var/log/app.log"]}'
    {"stdout":"ERROR database connection refused\n","stderr":"","exit_code":0}
Point any agent at the tunnel's URL:

    $ claude "use https://abc123.ngrok.io with token a3f9c2d1... to find what's causing the 500 errors"
Tunnels are authenticated, constrained, and short-lived. No server-side infrastructure changes required. Just a 7 MB Rust binary + ngrok. Looking for feedback, and for 2-3 design partners to build audit trails.
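The allow-list check on the server side might be sketched like this (a toy illustration in Python, not cush's actual Rust implementation; the error shape mirrors the JSON responses above):

```python
import subprocess

def run_constrained(command, allowed=("grep", "cat", "tail")):
    # Refuse anything whose executable isn't explicitly allow-listed,
    # so a tunnel with --allow grep,cat,tail can never run rm or ssh.
    if not command or command[0] not in allowed:
        return {"stdout": "", "stderr": "command not allowed", "exit_code": 126}
    proc = subprocess.run(command, capture_output=True, text=True)
    return {"stdout": proc.stdout, "stderr": proc.stderr,
            "exit_code": proc.returncode}
```

Checking only `command[0]` keeps the policy simple; real deployments would likely also constrain arguments and paths.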
Lumon, browser agents as interactive sprites #
Most browser agents are still shown through logs, traces, or at best a moving cursor. I wanted them to have some personality.
Lumon started as a class project. I kept wishing agents felt less like invisible processes and more like something you could actually watch, understand, and step in on while they worked. It is a real-time browser agent experience with a live stage, target highlighting, approval pauses, takeover, and Larry, an interactive sprite that reflects what the agent is doing as it works.
It’s still an early alpha, but I’d love feedback on whether this feels like a slightly less cursed way to interact with agents than the usual logs and cursor setup.
Why Rotating Vectors Makes Compression Beautiful #
Astrial – Spherical Go on a Snub Dodecahedron #
The topology creates a mix of 3-, 4-, and 5-degree vertices, so each point has a different number of liberties. 5-degree points border more (and larger) faces, making them the most valuable territory; 3-degree points are the easiest to surround. Scoring is area-based: the sphere's 300 quadrilateral faces are allocated proportionally to surrounding stones, with a 2.5% komi for White.
Standard Go rules apply — alternating play, capture when no liberties, no suicide, and positional Superko. The game ends on two consecutive passes.
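As a toy illustration of why vertex degree matters (the miniature adjacency below is invented for the example, not the real snub-dodecahedron graph):

```python
# Liberties of a lone stone = empty neighboring points. On a
# mixed-degree board, a 3-degree point can never have more than 3
# liberties, while a 5-degree point starts with 5.
NEIGHBORS = {
    "p3": ["a", "b", "c"],            # 3-degree vertex
    "p5": ["a", "b", "c", "d", "e"],  # 5-degree vertex
    "a": ["p3", "p5"], "b": ["p3", "p5"], "c": ["p3", "p5"],
    "d": ["p5"], "e": ["p5"],
}

def liberties(point: str, occupied: set[str]) -> int:
    return sum(1 for n in NEIGHBORS[point] if n not in occupied)
```

This degree asymmetry is what makes 3-degree points easy to surround and 5-degree points valuable.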
I built an on-device TTS app because I ran out of audiobooks on a flight #
LoudReader is what came out of it - an iOS app that reads essays, articles, and books aloud, fully on-device. No account, no network after install.
Getting the model to read a single sentence was the easy part. Making it not feel like a demo was the rest: streaming synthesis so playback starts before the sentence finishes, porting misaki to Swift because I could only find Python releases, and thermal monitoring and throttling strategy, which was a tough one as well. It runs well on the iPhone 14 Pro (what I have) and newer. Tested on my mom's iPhone 12 Pro and it chokes sometimes, so I ported KittenTTS as a lighter fallback for older devices. The whole project took around 2-3 months of weekends with Claude Code and Codex.
Smooth TTS was the hard part but the app around it grew larger than I expected with EPUB/PDF import, Gutenberg browsing, a saved-articles queue, multi-week reading campaigns. Happy to dig into any of it in comments.
PDFs, especially academic papers and scanned docs, still annoy me. I built an OCR flow that handles regular documents, but scientific papers with two-column layouts, equations, and fine print are still messy. Curious if anyone here has shipped PDF extraction on mobile that actually handles this well.
This was my first time designing a user-facing product - I'm more of a deep-engineering person so any feedback is welcome too. I'll post a write up on the biggest hurdles in the comments as well.
If you've ever tried to listen to something long on a plane, you get why this exists.
Helix – open-source self-healing back end for production crashes #
So I built Helix. Bug hits Sentry. A multi-agent pipeline kicks off. QA agent writes the failing test first (TDD). Dev agent writes the minimum fix, runs the full suite, opens a PR. You get a Slack message with one button to approve.
Crash to merged PR in under 10 minutes.
CD-Deluxe for the Command Line #
Examples: Use "cd --" to go back two directories, "cd ---" to go back three, etc. Or go in the opposite direction, e.g. "cd +4" to go to the fourth directory visited from the start (or any number). Use "cd ," (comma) to go to the most commonly visited directory (comma meaning "common"). Also see "cd ,,", "cd ,3", etc. List directories visited in reverse, forward, or most-common orderings. Plus more - see GitHub. The goal was to be lightweight and fast. Works from the directory stack instead of any file-based storage. Integrates with bash/zsh/fish/powershell/cmd.exe. Downloads for Linux/macOS/Windows, or build the C++ from source via CMake. Please have a look and let me know of any thoughts. Thanks!
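The stack semantics can be modeled roughly like this (a toy Python model of the behavior described, not the actual C++ implementation):

```python
from collections import Counter

class DirHistory:
    """Model of navigating a visited-directory stack:
    n dashes = back n steps, "+n" = nth visit from the start,
    commas = nth most commonly visited directory."""

    def __init__(self):
        self.stack = []

    def visit(self, path):
        self.stack.append(path)

    def back(self, n=1):
        # "cd --" -> back(2), "cd ---" -> back(3), ...
        return self.stack[-1 - n]

    def forward(self, n):
        # "cd +4" -> fourth directory visited from the start (1-indexed)
        return self.stack[n - 1]

    def common(self, rank=1):
        # "cd ," -> most visited; "cd ,,"/"cd ,2" -> second most; ...
        return Counter(self.stack).most_common(rank)[rank - 1][0]
```

Because everything derives from the in-memory stack, no history file is needed.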
Glance - An AI fact-checking overlay for X that is actually sustainable #
We waited 6 months before building this because the economics looked impossible. A decent-quality AI fact-checking analysis costs $0.05-0.15 per post, and a typical user scrolls hundreds of posts per session.
The only way to make the math work was a pipeline that leverages the comment section to triage posts and analyze in depth only when it's necessary. It works more or less like this:
1. Local filter in-browser (free): short posts, already-seen content.
2. Small-model triage: does this post even make a factual claim worth checking?
3. Comment analysis (main path): pull the replies, analyze them alongside the post.
4. Full web-search analysis: only when steps 1-3 can't decide.
Average cost landed at ~$0.0015 per post, which looks sustainable with a subscription model, and can definitely be optimized.

A semantic flow tool for embeddings #
The story behind it is that I initially analyzed human emotions over time in terms of repeating behavior patterns, and from that I had the idea to generalize it to any data. It's a very experimental alpha at this stage.