Show HN for March 25, 2026
46 posts

DuckDB community extension for prefiltered HNSW using ACORN-1 #
Optio – Orchestrate AI coding agents in K8s to go from ticket to PR #
Optio is an open-source orchestration system that turns tickets into merged pull requests using AI coding agents. You point it at your repos, and it handles the full lifecycle:
- Intake — pull tasks from GitHub Issues, Linear, or create them manually
- Execution — spin up isolated K8s pods per repo, run Claude Code or Codex in git worktrees
- PR monitoring — watch CI checks, review status, and merge readiness every 30s
- Self-healing — auto-resume the agent on CI failures, merge conflicts, or reviewer change requests
- Completion — squash-merge the PR and close the linked issue
The key idea is the feedback loop. Optio doesn't just run an agent and walk away — when CI breaks, it feeds the failure back to the agent. When a reviewer requests changes, the comments become the agent's next prompt. It keeps going until the PR merges or you tell it to stop.
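The feedback loop described above can be sketched as a simple polling state machine. This is a hypothetical illustration, not Optio's actual code; the function names (`check_pr`, `run_agent`) and status shapes are invented for the example.

```python
# Hypothetical sketch of the ticket-to-merge feedback loop described above.
# Function names and status fields are illustrative, not Optio's API.

def run_feedback_loop(pr, run_agent, check_pr, max_rounds=10):
    """Drive an agent until the PR merges or we give up."""
    for _ in range(max_rounds):
        status = check_pr(pr)  # CI result, review state, mergeability
        if status["state"] == "merged":
            return "merged"
        if status["state"] == "ci_failed":
            # Feed the CI failure log back to the agent as its next prompt.
            run_agent(pr, prompt=f"CI failed:\n{status['log']}\nFix and push.")
        elif status["state"] == "changes_requested":
            # Reviewer comments become the agent's next prompt.
            run_agent(pr, prompt=f"Reviewer requested changes:\n{status['comments']}")
        elif status["state"] == "merge_conflict":
            run_agent(pr, prompt="Rebase onto main and resolve conflicts.")
    return "gave_up"
```

The point of the loop is that every failure mode routes back into the agent as a prompt rather than ending the run.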
Built with Fastify, Next.js, BullMQ, and Drizzle on Postgres. Ships with a Helm chart for production deployment.
E is for ENSHITTIFICATION – An illustrated children's book on big tech #
A is for Algorithms. B is for Billionaire. C/D is for Cory Doctorow.
Thought y'all might be interested. There's a free PDF or you can request a copy for future print runs. Would love to hear what you think.
Automate your workflow in plain English #
We talked to marketing ops people recently to validate whether we are solving the right problems. Three things came up every single time.
Setup complexity. People are not afraid of automation in theory. They are afraid of spending two hours configuring conditions and field mappings, only to have something silently misroute. The config layer is where confidence dies.
Debugging. When a workflow breaks there is usually no explanation. A trigger did not fire, data passed null downstream, a sequence stopped. You find out three weeks later when someone downstream asks a question. Nobody knows where it went wrong so they delete it and go back to doing it manually.
No trust without control. Everyone wanted to keep a review step before the system acts on its own. Not forever, but until it had proven itself across enough edge cases. The unlock for automation adoption is not fewer steps, it is making it safe to delegate gradually.
What we are building is a system that addresses all three. Plain English input so setup is fast, step-by-step explanations so debugging is readable, and staged autonomy so trust is earnable.
For founders who have built or managed GTM and marketing ops teams: does this match what you have seen? And is there a fourth problem we are missing?
Τ³-Bench is out – can agents handle complex docs and live calls? #
τ-Knowledge: agents must navigate ~700 interconnected policy documents to complete multi-step tasks. Best frontier model (GPT-5.2, high reasoning) hits ~25%. The surprising part: even when you hand the model the exact documents it needs, performance only reaches ~40%. We found that the bottleneck isn't retrieval — it's reasoning over complex, interlinked policies and executing the right actions in the right order.
τ-Voice: same grounded tasks, but over live full-duplex voice with realistic audio — accents, background noise, interruptions, compressed phone lines. Voice agents score 31–51% in clean audio conditions and 26–38% in realistic ones. A consistent failure pattern across providers (OpenAI, Gemini, xAI): agent mishears a name or email during authentication, and everything downstream fails.
We also incorporated 75+ task fixes to the original airline, retail, and telecom domains — many based on community audits and PRs (including contributions from Amazon and Anthropic). We believe a benchmark is only as good as its maintenance, and we're grateful for the community's help improving it.
Code and leaderboard are open — we'd welcome community submissions and feedback.
Blog post (papers, code, leaderboard): https://sierra.ai/blog/bench-advancing-agent-benchmarking-to...
Plasmite – a lightweight IPC system that's fun #
What was especially useful (and seemingly unusual among IPC systems) was that message channels outlive all readers and writers, and even survive reboots, because they're just files. For local IPC you don't need a broker or server process.
All the engineers who ever worked at Oblong loved Plasma, so I've recreated and updated it, as Plasmite.
It's written in Rust and the message format is JSON, but it's fast because it's based on lite3 (https://lite3.io/index.html), a really cool project you should also check out.
Bindings for Python, Go, Node, and C, but you can also get a lot done with just the CLI tools. The basic commands are:
- feed (to write)
- follow (to tail)
- fetch (to read one)
- duplex (to have a 2-way session)
I think duplex could be great for agent-agent communication, but I haven't tried this much yet. If you do, let me know!
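The "channels are just files" property above can be illustrated with a toy JSON-lines channel. This is NOT Plasmite's on-disk format (which is based on lite3), just a sketch of the concept: because a channel is an ordinary file, it outlives every reader and writer and survives reboots.

```python
import json

# Toy illustration: a channel is just a file of JSON messages.
# Not Plasmite's actual format, just the concept behind feed/fetch/follow.

def feed(channel_path, message):
    """Append one JSON message to the channel file (like `feed`)."""
    with open(channel_path, "a") as f:
        f.write(json.dumps(message) + "\n")

def fetch(channel_path, index=-1):
    """Read one message, by default the latest (like `fetch`)."""
    with open(channel_path) as f:
        lines = f.read().splitlines()
    return json.loads(lines[index])

def follow(channel_path, start=0):
    """Yield messages from `start` onward, like a non-blocking `follow`."""
    with open(channel_path) as f:
        for line in f.read().splitlines()[start:]:
            yield json.loads(line)
```

A writer can `feed` and exit; a reader started later (or after a reboot) still sees every message.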
RemoteDevJobs – AI-curated remote developer positions with scoring #
I built a free CharacterAI that runs locally #
The voice pipeline currently supports MLX on any M1 through M5 chip. I used Whisper-Turbo for STT, Qwen3.5-9B-4bit for the LLM and Qwen3-TTS-0.6B-4bit for TTS.
The repo also has a WebSocket transport to add these voices to devices powered by the ESP32 via secure WebSockets.
As Notes – A Static Site Generator in Your Markdown Knowledgebase #
CI/CD in your terminal, zero YAML #
We set out to build Zippy: a CI/CD system that works from your terminal. No context switching, no slow containers, instant feedback, and seamless Claude Code integration. Just git push, get an instant build, and move on. Two bash scripts: one to set up the (cached) environment, one to run the build process.
I built an integration for RL training of browser agents for everyone #
I built a voice AI that responds like a real woman #
I built vibeCoach — a voice AI where you actually practice these conversations out loud, and the AI responds like a real woman would.
She starts guarded. One-word answers, a little skeptical. If you escalate too fast or try something cheesy, she gets MORE guarded. If you're genuine and read the moment right, she opens up. Just like real life.
Under the hood it's a multi-agent system — multiple AI agents per conversation that hand off to each other as her emotional state shifts. The transitions are seamless. You just hear her tone change.
Voice AI roleplay is a proven B2B category — sales teams use it for call training. I took the same approach and pointed it at the conversation most men actually struggle with.
There's a hard conversation scenario too — she's angry about something you did, she's not hearing logic, and you have to navigate her emotions before you can resolve anything. That one's humbling.
Live at tryvibecoach.com. Built solo. Happy to answer questions.
Marco, a privacy-first, offline-first email client (IMAP-native, no AI) #
I started Marco because I finally lost patience with Apple Mail, and the email client market has a weird gap. Legacy clients look terrible and/or are not cross-platform. The good ones scan your data or cost $300+/year. And there's a graveyard of startups (Tempo, Big Mail, Caley) who built beautiful products and shut down after a year or two.
I made a few opinionated bets early on:
1. IMAP-first, not Gmail API-first. Nearly every email startup builds on the Gmail API. It's convenient, but it locks you into one provider. Marco is IMAP-native, which means it works with Gmail, Outlook, iCloud, Fastmail, custom domains, and any provider that supports IMAP.
2. Offline-first. You should be able to read, reply, delete, and organise email on a plane with no wifi. When you reconnect, everything syncs. This requirement nearly killed me. I went through WatermelonDB, Triplit, InstantDB, PowerSync, and Replicache before landing on my current approach: regular HTTP endpoints with TanStack DB and TanStack Query, using IndexedDB on web and SQLite on mobile as storage layers. I ditched "fully fledged" sync engines entirely. Turns out, for my data volumes (hundreds of thousands of entities per user), every sync engine I tried either choked on performance or added complexity I didn't need. I wrote about this journey in detail: https://marcoapp.io/blog/offline-first-landscape
3. No AI. This is intentional. Every email client launching right now leads with AI. I think most of it is noise that none of us want or need. Marco is a tool. It should be fast, reliable, and stay out of your way. No email summarisation, no smart replies, no "AI powered" anything.
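The offline-first bet above boils down to a familiar pattern: apply mutations to local storage immediately, keep a pending-operations log, and replay it on reconnect. Here is a minimal sketch of that pattern; it is a hypothetical simplification for illustration, not Marco's actual code.

```python
# Minimal offline-first sketch: optimistic local writes plus a pending
# operation log replayed to the server on reconnect. Illustrative only.

class OfflineQueue:
    def __init__(self):
        self.pending = []       # ops made while offline, in order
        self.local_state = {}   # e.g. message-id -> flag dict

    def apply(self, op):
        """Apply an op locally right away, and queue it for sync."""
        msg_id, field, value = op
        self.local_state.setdefault(msg_id, {})[field] = value
        self.pending.append(op)

    def sync(self, send_to_server):
        """On reconnect, replay pending ops in order, then clear the log."""
        for op in self.pending:
            send_to_server(op)
        self.pending.clear()
```

The same shape works whether the storage layer is IndexedDB on web or SQLite on mobile; the log just needs to persist across restarts.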
The stack is React Native with Expo, Node.js backend on Railway, Postgres, Redis, S3, etc (all privately networked). Yes, a lightweight backend is needed to facilitate things like push notifications. One codebase across all frontend platforms, 100% shared code.
Marco is bootstrapped and profitable at $8/month with a 7-day free trial. 2,000+ users, all organic. No VC, no paid marketing.
Would love feedback from HN. Happy to go deep on IMAP internals, the offline-first landscape, or any of the technical decisions.
Jmail Launches Jcal #
Herd – A Go sidecar to stop stateful processes (Puppeteer/LLMs) from OOMing
I'm an engineering student at Waterloo building stateful AI agents, and I kept hitting the same wall: whenever my Python scripts crashed or dropped a connection, the underlying Puppeteer or Ollama processes would just sit there orphaned, eating RAM until the node OOM-killed itself. Standard load balancers break sticky sessions, and passive HTTP timeouts are too slow for cleanup.
I couldn't find a good local process pool that actually cleaned up dead stateful sessions reliably, so I built Herd in Go.
It uses a persistent stream (gRPC/Unix sockets) strictly as a dead-man's switch. If your client script dies, the stream breaks. Herd registers the EOF and instantly fires a SIGKILL to the worker process (relying on Pdeathsig on Linux). For the actual heavy data, you just blast HTTP traffic through Herd's internal proxy, which routes it directly to the active process port.
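The dead-man's-switch mechanism above can be demonstrated in a few lines: a supervisor blocks on a stream held open by the client, and the moment the client's end closes (EOF), the worker is SIGKILLed. Herd itself is Go and uses gRPC/Unix sockets plus Pdeathsig; this Python sketch only illustrates the mechanism, using an in-process pipe to stand in for the client connection.

```python
import os
import subprocess
import sys
import threading

# Sketch of the dead-man's switch: EOF on the client stream => kill the worker.

def supervise(client_read_fd, worker):
    """Block until the client's end of the stream closes, then kill the worker."""
    with os.fdopen(client_read_fd, "rb") as stream:
        while stream.read(4096):   # returns b"" on EOF, i.e. the client died
            pass
    worker.kill()                  # SIGKILL on POSIX
    worker.wait()

# Demo: a long-running "stateful worker" and a pipe standing in for the client.
r, w = os.pipe()
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
t = threading.Thread(target=supervise, args=(r, worker))
t.start()
os.close(w)   # simulate the client crashing: the stream breaks
t.join()      # supervisor notices EOF and reaps the worker immediately
```

Cleanup latency here is bounded by how fast the OS delivers the EOF, not by an HTTP timeout.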
My actual goal is to turn this into a multi-node distributed mesh with a Redis registry, where a client can drop off and an edge gateway routes them back to the exact pod holding their stateful memory.
But I know building a distributed mesh on top of a leaky local engine is a death sentence. The single-node cleanup has to be flawless first.
I'd love for you guys to roast the architecture. Specifically: is relying on Pdeathsig actually robust enough for a local dead-man's switch in production, or am I being naive and need to just bite the bullet and wrap everything in cgroups & microvms right now?
Repo link: https://github.com/herd-core/herd
GhostDesk – MCP server giving AI agents a full virtual Linux desktop #
GhostDesk gives your agent a full Linux desktop and the motor skills to operate it like a human: realistic mouse movement, natural typing, screenshot fallback for CAPTCHAs. It reads UIs semantically and behaves like a real user when sites try to detect bots.
Book a flight, scrape a site without selectors, operate legacy software with no API, run QA across an entire app with one prompt. If a human can do it on a desktop, your agent can too.
Runs in Docker. Spin up multiple instances in parallel, each driven by a sub-agent. No real ceiling.
Works with Claude, GPT, Gemini, or any local model (Ollama, LM Studio). MIT.
Stella Foster – iMessage on Any Phone #
Keystone – building self-configuring agents #
Keystone runs the agent inside a Modal sandbox with its own Docker daemon. It iterates until tests pass, then hands you a .devcontainer/ you can just check in. Now your repo knows how to run itself!
Cognium – Tree-sitter + taint-tracking SAST for Java, Python, JS, Rust #
PSFuturemail – Write a letter and forget it until it arrives #
Existing options didn't quite work for me. Gmail lets you schedule emails, but seeing those drafts every time tempted me to peek, breaking the surprise. FutureMe has moved to a paid model, and most alternatives either lack encryption, feel limited, or don't allow editing after scheduling.
So I built PSFutureMail.
It's a simple web app where you can write a letter and choose a delivery date anywhere from days to decades in the future. Letters are private and encrypted by default. You can edit or delete them at any time before delivery. Attachments are supported as well.
There's also an option to publish letters anonymously so that others can read them.
The core idea is to make it easy to write something to your future self in the best possible way.
Would love feedback: https://www.psfuturemail.com
clickity – mechanical keyboard click sounds when you type on macOS #
sound files are from https://mechvibes.com/
Starlink constellation health – 108 reentry anomalies in TLE data #
Results on 257 confirmed reentries: V1=250.6x basin separation, Recall=1.000, mean detection lead 471 days.
Applied to 15,170 operational satellites: 108 fire the reentry signal. Not in the deorbit campaign. BSTAR is normal — signal is trajectory geometry. 23 are above 600km where drag is negligible.
Prior work (Oliveira et al. 2025, Frontiers) tracked SC25 effects on satellites that already reentered. This asks the forward-looking question: which operational satellites currently show precursor geometry?
Code runs on a clean clone. Reproducible 10/10. https://github.com/mojoatomic/stts.git
Necessary Cuts – an interactive fiction fragment #
The result is a short interactive fragment (~5 minutes) — ambient audio synced to prose, three scenes. I didn't want to build a true game as much as make an attempt at immersion. So I re-wrote a fragment set in the world of the book, and wired up the computer-y bits to it. The fragment is in second person (the novella isn't), because other POVs don't really work with the immersion angle.
For the technically interested, this is just vanilla JS and Web Audio API, no frameworks - as is the way of my people.
First-token-only flaw in Claude Code permissions (triage bot too) #
But that is exactly wrong. Allow and deny lists allow DANGEROUS actions like "git cleanup"
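The flaw class being described, matching only the first token of a command against the allow list, is easy to reconstruct. The code below is a hypothetical illustration of the bug pattern and a safer alternative, not Claude Code's actual implementation.

```python
# Illustrative reconstruction of a first-token-only permission check and a
# safer full-prefix check. Hypothetical code, not Claude Code's source.

ALLOW = {"git", "ls"}

def first_token_check(command):
    """Buggy: approves any command whose first word is allow-listed."""
    return command.split()[0] in ALLOW

def full_prefix_check(command, allowed_prefixes=("git status", "git diff", "ls")):
    """Safer: match against full allowed command prefixes, not just token one."""
    return any(command == p or command.startswith(p + " ")
               for p in allowed_prefixes)
```

With the buggy check, allow-listing `git` for read-only use also waves through destructive subcommands, because everything after the first token is ignored.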
Some human needs to read this HN post and my blog post. I've written a bash-guard fix that I use locally, but I CAN'T help everyone else until Anthropic takes my bug report seriously
https://github.com/anthropics/claude-code/issues/36637 https://github.com/anthropics/claude-code/pull/36645
FlowScript – Agent memory where contradictions are features #
So I built FlowScript.
FlowScript is a typed reasoning graph that your agent builds through tool calls during your everyday work. It is NOT a graph database; it is a small set of opinionated primitives, things like thoughts, questions, decisions, and blockers, each with typed relationships between them. Your agent calls the tools as it works, building up this typed graph. Afterwards you can query that structure for deterministic answers using five queries: tensions, blocked, why, whatIf, alternatives.
What does this look like in practice? Here's an agent that has been reasoning about database choices for a few sessions:
> mem.query.why("node_postgres_decision")
PostgreSQL chosen
← "Need ACID for payment processing"
← "Original requirement: handle refunds atomically"
← "Stripe webhook failures in staging revealed race condition"
> mem.query.tensions()
><[performance vs cost]
"Redis: sub-ms reads critical for UX" vs "Redis cluster: $200/mo for 3 nodes"
The why chain traces back to the original constraint, and the tension preserves the actual tradeoff being made. These are things no vector store can do, because they are NOT flat facts: they are relationships and reasoning chains captured deterministically. That means you can go back and audit your agent's actual reasoning, see how it evolved over time, and see the tensions that were being balanced. No more opaque reasoning lost as soon as the polished answer is generated. Try that in any other memory system, I'll wait.

When other memory systems come across a tension or a contradiction, for the most part they simply delete it. That is wrong, because the tension is new knowledge: knowledge we need to keep for auditing, and knowledge that tells us about the evolution of the system's cognition over time. So instead of deleting contradictions, we create named relationships for them. Relationships you can query.
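The why-chain and tension queries can be sketched in a few lines: causal edges walked back to root constraints, and contradictions stored as edges rather than deleted. This is a toy illustration of the idea, not FlowScript's actual data model or API.

```python
# Toy reasoning graph: `why` walks causal edges back to root constraints,
# and contradictions are kept as queryable "tension" records, never deleted.
# Illustrative sketch only, not FlowScript's implementation.

class ReasoningGraph:
    def __init__(self):
        self.because = {}    # node -> list of nodes that justify it
        self.tensions = []   # preserved tradeoffs: (label, side_a, side_b)

    def relate(self, node, because_of):
        self.because.setdefault(node, []).append(because_of)

    def add_tension(self, label, a, b):
        self.tensions.append((label, a, b))

    def why(self, node):
        """Walk the causal chain from a decision back to its root constraints."""
        chain, frontier = [], [node]
        while frontier:
            n = frontier.pop(0)
            chain.append(n)
            frontier.extend(self.because.get(n, []))
        return chain
```

Because the chain is an explicit data structure, the same query returns the same answer every time; nothing depends on an LLM re-deriving the reasoning.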
Every decision, every tension, every piece of reasoning is deterministically captured into a hash-encoded audit trail. Not only do you have a deterministic reasoning chain, that chain is auditable: you can go back to any point covered by your audit logs and review exactly the reasoning your model was using. Something no other system can offer. The EU AI Act is going to require exactly this kind of transparency by August 2026, and as far as I can tell, FlowScript is the first open-source agent memory system designed to meet that bar.
Try it now: our MCP server works in Claude Code or Cursor. Install it, then follow our Get Started guide: add one JSON block to your editor config, drop a snippet into your project CLAUDE.md file, and restart. Your AI assistant gets a full set of reasoning tools that actually trace causality.
pip install flowscript-agents openai
See flowscript.org for full setup instructions: <https://flowscript.org/get-started>

Or grab the TypeScript SDK for programmatic use:
npm install flowscript-core
There are drop-in adapters for LangGraph, CrewAI, Google ADK, and more Python agent frameworks. MIT licensed. Open source.

Repo: <https://github.com/phillipclapham/flowscript> Docs: <https://flowscript.org> Python SDK: <https://github.com/phillipclapham/flowscript-agents>
BallotGuessr – Guess the 2024 election margin from a Street View photo #
Dirsv – dir browser and batteries-included Markdown previewer #
IMO, it is a better markserv [0], supporting more of the features I need for my markdown workflow and general dir browsing:
- Math-heavy notes.
- Tech design docs with heavy Mermaid diagrams.
- Code files, images, and previews.

Last time I used markserv, browsing to an unsupported text file just downloaded it; dirsv renders the file with syntax highlighting.
I also built the companion nvim for this: https://github.com/letientai299/dirsv.nvim
Compared to my last Show HN, many bugs were fixed and a few changes were made:
- Synced scroll, cursorline, and visual selection between the nvim plugin and the previewer.
- Changed the default port range to 3579+, to avoid conflicts with the many other tools that like to use the 8xxx range.
Disclaimer: this was built with Claude Code, but I review the code seriously to make sure I can still work on it without any AI tools.
Vectree – Learn complex concepts through AI-generated interactive SVGs #
I launched Vectree about 3 weeks ago as an after work side project. It's basically a "visual Wikipedia" where you explore concepts through interactive, zoomable SVGs.
Link: https://vectree.io
Why I built it: Whenever I encountered a new complex concept, I just wanted a quick visual way to understand the basics. I started manually prompting LLMs to explain things to me by generating SVG schematics. It became so useful for me that I decided to automate the process and turn it into a web app.
How it works: You can browse the public graph of concepts totally for free. You click on different parts of an SVG diagram (nodes) to drill down into sub-concepts.
Bring Your Own Key (Private Lab): While browsing is free, I highly encourage creating an account and plugging in your own paid Gemini API key. This unlocks a "Private Lab" where you can: - Architect your own private concepts from scratch - Regenerate existing concepts (most concepts I generated with "flash" Gemini model) - Publish your private concepts to the public graph if you want to share them
Tech Stack: I used Elixir about 5 or 6 years ago. I kept hearing about how good its new AI/ML ecosystem is getting, so I used this project as an excuse to jump back in. - Backend: Elixir / Phoenix LiveView - Local AI: Bumblebee + Nx (running local embedding and toxicity models) - Cloud AI: Google Gemini (for generating the actual SVG structures and JSON) - DB: PostgreSQL + pgvector for semantic search
It's been a really fun experiment. I'd love for you to try it out and let me know what you think!
Mandarin Melon – A webapp for learning Chinese by reading social media #
I built Mandarin Melon (mandarin-melon.com) to scratch an itch as a Chinese learner.
I've found with language learning that I learn best when I'm getting a lot of high quality comprehensible input. Reading, listening, watching videos. But since my Chinese isn't great, it's kinda hard to find content that is at my level, and is actually engaging. A lot of people give advice like "watch Peppa Pig in Chinese", but IMO, Peppa Pig is not particularly beginner friendly and is also really boring. For reading, similarly, graded readers can target a specific reading level and are really useful, but get boring fast.
On the other hand, social media is about the ultimate form of engaging content. But as an intermediate learner, I quickly get lost trying to use actual Chinese social media platforms, scrolling without really understanding enough to be learning.
So I built Mandarin Melon as a way to read social media posts that use only the characters you've already studied.
(1.) Textbook vocabulary tailored feeds - If you study with the standard HSK textbooks, this creates a tailored feed using just characters you already know. For example, if you're at HSK level 3 here is a collection of 56,000+ posts that only use characters from HSK 3 and below: (https://mandarin-melon.com/bylist/hsk-old?level=3&atLevel=2&...). You can also choose to introduce posts with 1-3 characters you don't know, to push your learning and expand the base of posts to browse.
(2.) A mode for new learners (https://mandarin-melon.com/learn/onebyone/0) - In this mode, characters are introduced one-by-one, with definitions / pronunciations for the new character. The characters you learn are ordered such that each new character maximizes the number of new posts you can read by learning it. This wouldn't be a good way to learn Chinese on its own, but would be fun for a new learner to dip their toes into Chinese social media.
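The core filtering idea behind both modes can be sketched as a subset check: keep only posts whose characters fall within what the learner has studied, optionally allowing a few unknowns. This is an illustrative sketch, not the site's actual implementation.

```python
# Sketch of the tailored-feed filter described above: a post is readable if
# at most `max_unknown` of its (non-ASCII) characters are outside the known
# set. Illustrative only, not Mandarin Melon's actual code.

def readable_posts(posts, known_chars, max_unknown=0):
    """Return posts with at most `max_unknown` characters outside the known set."""
    known = set(known_chars)
    result = []
    for post in posts:
        # ASCII (latin text, punctuation, emoji-free) is ignored; only
        # Chinese characters count toward the unknown budget.
        unknown = {c for c in post if c not in known and not c.isascii()}
        if len(unknown) <= max_unknown:
            result.append(post)
    return result
```

Setting `max_unknown=1` corresponds to the "introduce posts with a few characters you don't know" option: it widens the pool while keeping each post nearly fully readable.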
Personally I'm using the app most days. I find the bite-sized bits of content / learning a really motivating way to keep up my daily Chinese practice. I also find the little stories of peoples posts really fun and natural in a way that textbooks / graded readers are not.
If there are any Chinese language learners, or folks interested in Chinese social media, I'd love to hear thoughts / feedback. Thanks for checking it out!
Rostra – Scroll the Greats #
So instead of nobodies, I'll scroll the greats.
Read the latest from top practitioners, or read old books, nothing in between. Why listen to a random X account's take on AI coding, seed oils, or Iran; or an anon telling you how to spend your time? On topics that don't change quickly (relationships, human nature, how to live a good life) I'll take Cicero, Marcus Aurelius, Lao Tzu, the greats of mankind, any day.
--
The Rostra was the platform in ancient Rome where orators addressed the people. Read history's greatest ideas in Rostra; life is too short for anything else.
readrostra.com
Blind voting kills groupthink – validate 10 ideas before others do 1 #
AVE Database Open taxonomy of 50 failure modes in multi-agent AI systems #
Claude Code generated a tip that looks like an ad #
"Tip: Need your database queries 1000x faster? Accelerate offers you that and more: https://pris.ly/tip-2-accelerate"
Screenshot: https://imgur.com/a/9nfAA1j
I had a fresh context before the prompt and no external retrieval (e.g. web search or Context8) was used. My assumption is that this is a result of the training data in combination with Claude Code's tip behaviour.
Where is the line between helpful suggestions and advertisement? Can ads be intentionally injected via training data?