Show HN for May 13, 2026
40 items
Torrix, self-hosted LLM observability (no Postgres, no Redis) #
curl -o docker-compose.yml https://raw.githubusercontent.com/torrix-ai/install/main/doc...
docker compose up
No external dependencies. All data stays in a local SQLite file on your machine.
It logs LLM calls through an HTTP proxy or a Python/Node SDK: tokens, cost, latency, full prompt and response traces, reasoning token capture. Works with OpenAI, Anthropic, Gemini, Groq, Mistral, Azure OpenAI and any OpenAI-compatible endpoint.
Things I added as I actually used it on real agent pipelines: cost forecasting and hard budget caps, PII masking, model routing rules, evals with golden runs, AI judge, a prompt library with version history, run tags for filtering by environment, an MCP server so AI assistants can query your own logs, and OTLP/HTTP ingestion for apps already using OpenTelemetry.
Community edition is free for one user with 7-day retention. Pro adds teams, RBAC, 30-day retention, API key management, full-text search and audit logs.
SQLite doesn't scale to high write throughput. This is aimed at teams logging hundreds to low thousands of LLM calls per day, not millions. Happy to hear what people think and what is missing.
GitHub / install: https://github.com/torrix-ai/install Website: https://www.torrix.ai
Rotunda - A browser built for agents with simulated typing #
Rotunda is a firefox fork primarily intended for agent use, which I’ve been hacking on nights/weekends.
There was a [lengthy](https://news.ycombinator.com/item?id=48024859) discussion last week on how expensive computer use models are. The cost is going to drop eventually, but I think on some level it's still usually the wrong primitive. The web gives us access to beautiful structured formats, plaintext, etc... why throw that away if we don't have to?
I realized at some point that for 99% of automations I just want agents to be able to control my Chrome instance. But that’s easier said than done: CDP (the Chrome automation protocol) leaks a ton of state about being programmatically controlled, either by toggling window attributes or by running `page.evaluate()` commands right in the page context. Plus if you watch an automation running it's pretty obvious what is happening: the mouse jumps around, fields are filled instantly, etc.
Rotunda tries to fix this. Its standout features:
- Realistic simulation of mouse movements and keyboard commands, powered by an RNN trained on my own timing patterns from the last week. (still feel weird about opting in to a key logger but whatever)
- Doesn’t lie about its host specs, only fibs about some client side details. Stealth browsers are too easy to flag statistically when you’re adding noise to canvas pixels or audio pipelines.
- It runs on your local device with a CLI or Playwright API accessible to Claude, Codex, or whatever your harness du jour looks like.
- Patches modern Firefox (150) with an agentic harness to keep this updated over time
MPL-2.0 on GitHub: https://github.com/monkeysee-ai/rotunda
Longer writeup on the design choices: https://pierce.dev/notes/a-browser-for-agents
Also check out the demo on the site! https://www.rotunda.sh/
Pretty excited by how this turned out but we’re still super early. Give it a try and please flag any issues!
FixMyNPM, CLI to fix your insecure npm config #
Neural window manager, neural network moving windows from mouse actions #
As an experiment to answer this, I set out to create a neural window manager, training a neural network to predict what the screen would look like next.
Basically, the idea was to generate the next frame based on the last two frames and the mouse position. That's it: moving windows without programming an event system, just a simple convolutional neural network guessing pixels.
To implement the experiment, I used Pygame to simulate a turquoise desktop background, a gray window with a navy blue title bar, a white cursor, and four colors in total. Then, a bot randomly dragged the window, and I recorded everything, processing the frames as color index matrices (not RGB, to avoid complications) and the mouse delta (dx, dy, click) that caused each transition. 8000 frames, a few minutes in Colab.
The model is a U-Net. The encoder compresses the stacked frames, the decoder reconstructs the next one, and the mouse vector is projected with a linear layer to fit the spatial size of the bottleneck. There, the two are concatenated before decoding, so that motion information flows through the decoder and meets each skip connection.
And it works! Which still surprises me a little. You can drag, and the window follows you; when you release, it stops. There's no internal state, no (x, y) coordinates anywhere. The model infers the position from what it sees, which works until it doesn't. But after a couple of seconds of strange movement, the window starts to distort.
This will probably improve with more computing power for training and more examples, but to narrow the scope of the experiment and test it within a web browser, I decided to abandon the rendering aspect and have the model predict primitives instead of pixels, simply converting the motion engine into a neural network.
Basically, I trained a small MLP to receive (distance to the title bar, distance to the resize point, click) and generate (dx, dy, dw, dh), with two separate heads: one for moving and one for resizing. The trick is that they share nothing except the click signal, so the model can't confuse dragging with resizing. I then exported it to ONNX as well, and now everything runs in the browser, without a server, just a canvas element and two small neural networks communicating with each other.
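A minimal sketch of the two-head wiring described above, in plain Python. The layer sizes, the tanh activation, and the random (untrained) weights are assumptions for illustration only; the names `make_mlp`, `forward`, and `predict` are hypothetical and not taken from the project. The point it shows is the isolation: each head sees only its own distance plus the shared click signal.

```python
import math
import random


def make_mlp(sizes, rng):
    """Build a list of (weights, biases) layers with small random values."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Forward pass: tanh on hidden layers, linear output layer."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_last = index == len(layers) - 1
        activations = outputs if is_last else [math.tanh(o) for o in outputs]
    return activations


rng = random.Random(0)
# Placeholder weights: a real model would load trained parameters instead.
move_head = make_mlp([2, 8, 2], rng)    # (title_distance, click) -> (dx, dy)
resize_head = make_mlp([2, 8, 2], rng)  # (resize_distance, click) -> (dw, dh)


def predict(title_distance, resize_distance, click):
    """Run both heads; the click signal is the only input they share."""
    dx, dy = forward(move_head, [title_distance, click])
    dw, dh = forward(resize_head, [resize_distance, click])
    return dx, dy, dw, dh
```

Because the move head never receives the resize distance, changing that input cannot alter (dx, dy), which is one way to read "the model can't confuse dragging with resizing."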
With this new approach, the renderer remains deterministic, with rectangles drawn in JavaScript, but the window's behavior (where it moves, how it resizes) is learned from examples. It feels like a peculiar middle ground between traditional and neural, so you can feel the space the network has learned by interacting with it: dragging near the title bar moves it, but approaching the corner resizes the window. There are no conditionals or hitbox code; the network simply learned where those areas are from examples.
Sometimes it gets confused near the edges, which, frankly, is more interesting than if it worked perfectly; you can perceive how the probability changes. This makes sense when you think about it, because no (x, y) coordinates are stored in these models; the position is implied in the activations. It works well for short sequences, but fails when asked to maintain state over time.
Update: A few weeks later, Meta published the Neural Computers article (2604.06425, it's worth reading). The premise is the same, but they go much further: CLIs and UIs, real programs. Their failure modes are practically identical to those I found with the pure pixel version: "challenges persist with routine reuse, controlled updates, and symbolic stability," which is a fancy way of saying that the window blurs after a few seconds (that was the reason for choosing deterministic rendering).
Monghoul – Desktop MongoDB GUI with schema-aware autocomplete and MCP #
I wanted a GUI that looks modern and snappy, minimal, not like 2003 MS Excel with dozens of buttons and dropdowns everywhere. I also wanted it to have a smart autocomplete that actually knows a schema, not just keys of the current collection, but their types and enum values. I wanted to type find({status: "}) and see "pending", "active", "cancelled" in the autocomplete suggestions.
As a tech stack, I chose Tauri for the shell, Bun for the sidecar running the MongoDB driver and a tRPC server, and React, Tailwind, and react-query for the UI. The installer is around 33 MB.
Also it has a built in MCP server that allows your AI tools to fully control the app: write queries, build charts, organize workspace, find and restore tabs that you once closed etc.
Using the combination of tauri + bun sidecar + trpc with react-query was the best decision:
- startup under 2 seconds
- end-to-end type safety without a need to update client interfaces on back-end changes
- client optimistic updates are super easy to do, so everything feels instant
Mistle – Open-source infrastructure for running sandboxed coding agents #
We saw larger tech companies like Ramp (Inspect) and Stripe (Minions) build this internally and thought an open source version should exist.
We made a few very intentional decisions when working on this:
1. Credentials are kept out of the sandbox. Authorized access goes through a proxy, so agents do not directly receive credentials.
2. The harness is not our problem. We're not going to tackle things like memory, self-learning.
3. No magic. Configurations are explicit. You can bring your own keys for models, sandboxes, and other providers. You can write your own instructions and agent.
Mistle can be run locally with a single command: https://github.com/mistlehq/mistle#run-mistle-locally
Questions, feedback and ideas are welcome!
Vim file browser that runs in separate terminal #
I didn't want to learn Vim's window management, so I created a Vim file browser that can run in its own tmux pane.
Mainline – a project tool with no backlog, story points or surveillance #
The notion of story points was the first thing to go. Even Ron Jeffries, who invented them, regrets their usage. Mainline hard-codes a story size of 1 and tracks whether each story falls outside the team's normal distribution of time from start to release. After a few weeks the team has a real handle on their cadence, backed by data instead of planning-poker theatre. No estimation meetings required. Just do the work.
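As a rough illustration of flagging stories that fall outside a team's usual start-to-release time, here is a small sketch. The two-standard-deviation cutoff and the function name `flag_outliers` are assumptions; the post does not say which statistical test Mainline actually uses.

```python
from statistics import mean, stdev


def flag_outliers(cycle_times_days, threshold=2.0):
    """Return (index, days) for stories far from the team's typical cycle time.

    A story is flagged when it sits more than `threshold` sample standard
    deviations away from the mean. The threshold is an illustrative choice.
    """
    if len(cycle_times_days) < 2:
        return []  # not enough history to estimate a spread
    mu = mean(cycle_times_days)
    sigma = stdev(cycle_times_days)
    if sigma == 0:
        return []  # every story took the same time
    return [
        (index, days)
        for index, days in enumerate(cycle_times_days)
        if abs(days - mu) > threshold * sigma
    ]


history = [2, 3, 2, 4, 3, 2, 3, 15]  # days from start to release, one per story
print(flag_outliers(history))
```

With every story sized at 1, the only input is elapsed time, so no estimation step is needed to produce the signal.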
Unreleased work is made visible and tracked as inventory. There's a cottage industry of hosted feature-flag platforms selling a different idea to the original technique. Long-lived business-facing flags create waste in the form of unmanaged inventory left hanging around past its usefulness. Worst case, they become a giant configuration surface that "tech debt" doesn't do enough to capture as a term. Mainline tracks branch-by-abstraction, dark launches, feature flags, and old stories in flight through three phases: in progress, ageing, and stale.
The backlog is replaced with a Story Map. Work is shown as a user journey, and if it's not on the map (or has no user persona associated) it doesn't exist. This makes it harder for someone to dump tech tasks or random work onto the team for the purpose of "having a backlog."
The collaboration patterns high-performing teams use are first-class. Ensemble, pairing, and solo work are all visible and tracked, which makes de-siloing legible and promotes co-ownership.
I resisted adding AI features. It would be a weekend's work, and it would also undo the whole point. The longer argument is at mainline.dev/docs/ai.
I started building this with event sourcing and CQRS because I wanted the app built around an immutable event log. It wound up being too much boilerplate to change anything simple. So I went with an RDBMS and transactions to implement the immutable event log instead.
As I've learned after watching dozens of microservice teams utterly fail to safely ship anything, I took my own advice and started with a boring monolith and RDBMS. The stack itself is boring on purpose: Ruby with Roda and Sequel, Puma with YJIT, deployed via Kamal to Hetzner.
For teams who want to get better at CI, I built an integration with GitHub (GitLab coming) that reads the repo statuses and tracks how long a branch (if any are used) is open for. It's just a red/green dot on the repos as a heads-up.
The live demo is at mainline.dev/demo. No signup required, read-only because it's one shared team.
I'd be interested in feedback on how else I can help teams get better at CI and CD, and on any gripes you have with the parasitic management layer of non-practitioners who use other tools to implement Taylorist methods. Also, any other features that I haven't thought of (or thought of removing).
AgentKanban for VS Code – A task board with agent harness integration #
The web app and remote boards are at: https://www.agentkanban.io
The VS Code extension is at VS Code Marketplace (https://marketplace.visualstudio.com/items?itemName=appsoftw...) or the Open VSX Registry (https://open-vsx.org/extension/appsoftwareltd/agent-kanban-v...).
TL;DR: It's a collaborative Kanban board / task management app that supports handoff to GitHub Copilot in VS Code and captures the ongoing user / agent conversation context on the task for resumption in new chats (with context curation tools).
The context collection ignores tool use to prevent bloat in the captured context. AgentKanban also has features for improving agentic coding session quality such as an optional plan / todo / implement workflow and support for Git worktree creation and clean up for working on concurrent tasks.
The tool is an evolution of an earlier VS Code kanban extension (https://marketplace.visualstudio.com/items?itemName=AppSoftw...) I built which proved fairly popular but only catered for a local file based workflow.
The new version with the remote board improves the reliability of context capture, with lots of developer experience improvements. It's a tool that I use every day in my own agentic coding workflows, and I can honestly say that it improves the quality of the code produced and reduces friction in organising work on concurrent features.
I hope you find it useful and would really appreciate your feedback on how you use it, what you think it does well, or any improvements you think could be added.
Many thanks for your time reading this
I spent $100 in Claude tokens and 1k battles training my AI tank #
It is a small game where an AI agent writes the logic for your tank. You watch it fight, give strategic feedback, let the agent update the tank code, and send it back into battle.
I have run 1,000+ battles on my own tank and spent about $200 in Claude credits improving it. The part I enjoy most is not just winning, but watching the tank make visible mistakes, thinking of a better strategy, and seeing whether Claude can turn that into better code.
HYPD – AI co-pilot for marketers running Google Ads #
Thesis behind it: just like programming is "solved" but engineering is not, ad-ops and media buying will be solved, but account strategy and human creativity remain the leverage.
Background: I founded PubNative (acquired by Verve Group), was Co-CEO at Verve Group, and for the last year we've been "taming" LLMs when working with structured and unstructured data. So far we have onboarded more than 200 agencies and freelancers.
Hard parts so far: (1) data accuracy, (2) understanding the gaps in LLM knowledge of the Google Ads API, (3) adding enough context to make answers fit what professional marketers expect.
Free trial + free tier on the site. Happy to enable demo accounts for anyone who wants to test it without connecting their account.
Petri – Drop-in Postgres image that forks a DB per test #
It's a drop-in Postgres image, with a Golang proxy. :5432 is passthrough, :5433 forks the DB per conn (CREATE DATABASE … TEMPLATE …, dropped on disconnect).
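To make the per-connection fork concrete, here is a sketch of the SQL a proxy could issue on connect and on disconnect. It is not Petri's code (the proxy itself is written in Go); the helper names, the `fork_` prefix, and the identifier rule are all hypothetical. The `CREATE DATABASE ... TEMPLATE ...` form is standard PostgreSQL, which requires that no other session be connected to the template while it is copied.

```python
import re
import uuid

# Conservative identifier rule so names can be embedded safely in SQL text.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_identifier(name):
    """Validate and double-quote a database name."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe database name: {name!r}")
    return f'"{name}"'


def fork_statements(template_db, fork_db=None):
    """Return (fork name, SQL to create the fork, SQL to drop it)."""
    if fork_db is None:
        fork_db = f"fork_{uuid.uuid4().hex[:12]}"  # hypothetical naming scheme
    create = (
        f"CREATE DATABASE {quote_identifier(fork_db)} "
        f"TEMPLATE {quote_identifier(template_db)}"
    )
    drop = f"DROP DATABASE IF EXISTS {quote_identifier(fork_db)}"
    return fork_db, create, drop


name, on_connect, on_disconnect = fork_statements("app_template", "fork_demo")
print(on_connect)
print(on_disconnect)
```

Each test connection then works against its own copy, and dropping the fork on disconnect keeps the server from accumulating databases.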
If you use it, let me know what you like or don't like, so I can make it better. Cheers!
Twatch – Rewind, search, and diff TUI applications #
I built a Rust-based terminal tool that wraps TUI applications and records their screen changes, so you can rewind, search, and diff previous screen states.
It runs the target program through a PTY, captures screen snapshots while the app is running, and lets you move back through the recorded history. You can search snapshots by string or regular expression, and highlight areas that changed between frames.
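The search and diff steps can be sketched over snapshots stored as lists of text rows. This is only an illustration in Python (the tool itself is written in Rust), and the snapshot representation and function names are assumptions rather than Twatch's internals.

```python
import re


def search_snapshots(snapshots, pattern):
    """Return indices of snapshots where any row matches the regex."""
    regex = re.compile(pattern)
    return [
        index
        for index, screen in enumerate(snapshots)
        if any(regex.search(row) for row in screen)
    ]


def changed_cells(before, after):
    """Return (row, column) positions whose character differs between frames."""
    changes = []
    for row in range(max(len(before), len(after))):
        old = before[row] if row < len(before) else ""
        new = after[row] if row < len(after) else ""
        width = max(len(old), len(new))
        old, new = old.ljust(width), new.ljust(width)  # pad so rows align
        for column in range(width):
            if old[column] != new[column]:
                changes.append((row, column))
    return changes


history = [
    ["CPU 10%", "MEM 40%"],
    ["CPU 12%", "MEM 40%"],
]
print(search_snapshots(history, r"CPU 1[2-9]%"))
print(changed_cells(history[0], history[1]))
```

A plain string search is the same call with the pattern passed through `re.escape`.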
The original idea came from working on baeru, a tool for adding visual effects and color changes over existing TUI applications: https://github.com/blacknon/baeru
While building it, I started wondering whether wrapping an existing TUI as a passthrough layer could have more practical uses.
This tool is also related to hwatch, another project of mine that adds history, diff, and hook functionality to watch: https://github.com/blacknon/hwatch
In this case, the idea is similar, but applied to interactive TUI applications rather than repeated command output.
I’d be interested to hear what people think, especially about possible use cases or similar tools I may have missed.
Diom – Open-source back end primitives with no runtime dependencies #
Diom includes implementations for common backend primitives such as cache, key-value, idempotency, rate-limiting, queues, and streams, with more on the way.
While building Svix, we had to reimplement the same backend primitives that everyone has to reimplement. We also constantly felt the tension between building something custom on top of existing infra (like Redis and Postgres) and adding more dedicated services (like RabbitMQ and Kafka) which we would then need to configure, operate, back up, and maintain. This was even worse for us because Svix is open-source, so additional infrastructure meant additional burden on our customers.
Six months ago we finally decided to build Diom, and focus on developer experience and ease of operation. It's open source, self-contained, and manages its own storage using fjall (a fast LSM-tree-based storage similar to RocksDB). It requires no external runtime dependencies (no redis/postgres/kafka/etc), and supports running as a single node or a highly-available Raft based cluster.
The goal of Diom is to provide developers with the backend primitives they need without having to write custom code on top of Redis, RabbitMQ, or Kafka, or even needing to run them at all. It currently supports cache, key-value, idempotency, rate-limiting, queues, and streams. We also plan on adding auth-tokens, distributed settings, feature flags, and other common components, as well as adding more functionality to existing components.
Diom favors ease of operation over scale, so it doesn't match Kafka-level throughput or very high QPS like Redis and Dragonfly. However, most products and developers don't process multiple terabytes and billions of events per second anyway. That said, Diom can still hit high performance for its target use-cases as it implements higher-level primitives rather than basic operations. Additionally, because the primitives live in the same process as the storage, there are fewer network round-trips, which keeps latency low.
It uses HTTP/2 with msgpack as the wire protocol (works fine from browsers), and ships a CLI and SDKs for Python, TypeScript, Rust, Go, and Java, with more on the way.
We have Svix fully ported to Diom and continuously running tests and simulated workloads in one of our staging environments. GA (general availability) is planned for later this year, once we've moved Svix production workloads over.
Repo (MIT licensed): https://github.com/svix/diom
Docs: https://docs.diom.com
Live playground: https://diom.com/playground
I'm excited to finally share Diom, and would love to hear what everyone thinks, and what other components you would like us to build! Would also love help figuring out what to call this. We currently say "component platform," but I'm not a fan of the name.
TrueCitation – academic source credibility checker (URL/DOI/journal) #
Chimera, a leaderless runtime with selectable settlement profiles #
Hashiverse, an open-source decentralized social network in Rust #
Hashiverse (https://github.com/hashiverse/hashiverse) is an open-source decentralized social network protocol where Sybil
resistance, rate limiting, peer reputation, and content moderation all fall out of one design choice: every action carries a
proof-of-work cost calibrated to how much abuse it could cause. No central servers, no DNS dependency, no registration authority,
no moderation team. Rust core, WASM browser client, volunteers on $5 VPS machines.
Twitter-shaped (posts, follows, hashtags, timelines). The design problem that usually kills these projects on day one is Sybil
resistance without a gatekeeper, so that is what I most want feedback on. Signatures and encryption are conventional (ed25519 +
ML-DSA + FN-DSA, ChaCha20Poly1305, Blake3). The interesting surface is how every protocol action is priced in proof-of-work
calibrated to its abuse potential.
Shared primitive: a data-dependent chain over 17 hash algorithms. 5 rounds, each selecting one of 17 algorithms (Blake2s/b,
SHA-2/3 at 256/384/512, Keccak-256/384/512, Groestl-256/512, Whirlpool, Skein-256/512, Blake3) and applying it 1 or 2 times. The
algorithm index and repetition count for round N come from bytes of round N-1's output, so dispatch is data-dependent and only
resolved at runtime.
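A minimal sketch of data-dependent dispatch, using only the eight algorithms from the list that Python's `hashlib` guarantees (the real chain uses 17). Which output bytes select the algorithm and the repetition count, and how the first round is seeded, are assumptions made here for illustration; the post does not specify them.

```python
import hashlib

# Subset of the 17 algorithms: only those hashlib always provides.
ALGORITHMS = [
    "blake2s", "blake2b",
    "sha256", "sha384", "sha512",
    "sha3_256", "sha3_384", "sha3_512",
]


def chained_hash(data: bytes, rounds: int = 5) -> bytes:
    """Hash `data` through `rounds` rounds chosen by the previous output."""
    # Assumption: a fixed first hash seeds the selection for round one.
    digest = hashlib.blake2s(data).digest()
    for _ in range(rounds):
        # Byte 0 of the previous output picks the algorithm,
        # byte 1 picks whether it is applied once or twice.
        algorithm = ALGORITHMS[digest[0] % len(ALGORITHMS)]
        repetitions = 1 + (digest[1] % 2)
        for _ in range(repetitions):
            digest = hashlib.new(algorithm, digest).digest()
    return digest


print(chained_hash(b"hello").hex())
```

The sequence of algorithms cannot be known without computing each round, which is the property that distinguishes this from a fixed pipeline. In this sketch the output length depends on the last algorithm chosen; a real protocol would need a fixed-width final step.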
Honest prior art: Evan Duffield's X11 (Dash, 2014) chained 11 SHA-3 finalists with exactly this thesis. X11 ASICs (Baikal,
iBeLink) shipped by 2016. Multi-hash chaining delays ASICs, it does not prevent them. What's different here is data-dependent
dispatch (X11's pipeline is fixed) and variable repetition count. The honest question is not "is this ASIC-proof?" but "how much
delay does data-dependent dispatch buy, and what software-update cadence should a protocol with no upgrade authority plan for?"
Layer 1: Server-ID PoW (DHT membership). Generating a server identity means grinding a salt with the server's public keys through
the chained hash until the derived 256-bit Kademlia ID has enough leading zero bits. Hours on commodity hardware per identity.
Two compounding mitigations: bucket location IDs rotate on a monthly time epoch (the keyspace region around a user shifts
deterministically), and prolific users fan across more buckets as the hierarchy subdivides under load. An attacker pays admission
PoW against a moving target whose surface grows with the target's prolificness.
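The identity grind in Layer 1 can be sketched as a loop over salts until the derived ID has enough leading zero bits. SHA-256 stands in for the chained hash here, and the salt encoding, argument order, and function names are assumptions for illustration. The difficulty in the example is tiny so that it finishes instantly, unlike the hours the post describes.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of the digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def grind_server_id(public_keys: bytes, difficulty_bits: int):
    """Try successive salts until the 256-bit ID meets the difficulty."""
    counter = 0
    while True:
        salt = counter.to_bytes(8, "big")
        # Stand-in: the real protocol would use its chained hash here.
        server_id = hashlib.sha256(salt + public_keys).digest()
        if leading_zero_bits(server_id) >= difficulty_bits:
            return salt, server_id
        counter += 1


salt, server_id = grind_server_id(b"demo-public-keys", difficulty_bits=8)
print(salt.hex(), server_id.hex())
```

Each extra bit of difficulty doubles the expected number of attempts, which is how the admission cost is tuned.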
Layer 2: RPC PoW. Every RPC carries a PoW over (timestamp, salt, payload, client ID, destination server ID). Under-threshold
requests are rejected before payload parse. Timestamp pinning prevents replay; ID pinning prevents reuse across (client, server)
pairs. Knock-on: because the destination server's ID is in the PoW, servers handling real load accumulate a routing-table
reputation. A fresh Sybil has no traffic history; to affect the routing table they must either be useful or grind their own fake
reputation by paying RPC PoW for every fabricated client request. Useful work becomes a Sybil deterrent.
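A sketch of how a server might check an RPC stamp before touching the payload. SHA-256 again stands in for the chained hash; the field encoding, the 30-second freshness window, and every name below are assumptions, not the protocol's wire format.

```python
import hashlib
import struct


def rpc_digest(timestamp, salt, payload, client_id, server_id):
    """Hash all pinned fields, length-prefixed to avoid ambiguous joins."""
    hasher = hashlib.sha256()
    hasher.update(struct.pack(">Q", timestamp))
    for field in (salt, payload, client_id, server_id):
        hasher.update(struct.pack(">I", len(field)))
        hasher.update(field)
    return hasher.digest()


def leading_zero_bits(digest):
    """Count zero bits at the start of the digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def solve(timestamp, payload, client_id, server_id, difficulty_bits):
    """Client side: search for a salt that meets the difficulty."""
    counter = 0
    while True:
        salt = counter.to_bytes(8, "big")
        digest = rpc_digest(timestamp, salt, payload, client_id, server_id)
        if leading_zero_bits(digest) >= difficulty_bits:
            return salt
        counter += 1


def verify(now, timestamp, salt, payload, client_id, server_id,
           difficulty_bits, max_skew_seconds=30):
    """Server side: reject stale or under-threshold requests."""
    if abs(now - timestamp) > max_skew_seconds:
        return False  # timestamp pinning: a replayed stamp has expired
    digest = rpc_digest(timestamp, salt, payload, client_id, server_id)
    return leading_zero_bits(digest) >= difficulty_bits
```

Because the destination server ID is part of the hashed input, a stamp ground for one server produces an unrelated digest when presented to another, so it will almost never meet the threshold there.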
Post submission is a sub-case: two-phase Claim/Commit so one cheap PoW cannot deliver a huge payload. Submission difficulty
scales with recent posting frequency.
Layer 3: Per-feedback PoW. No central tally. Every signal (like, dislike, hate speech, spam, CSAM, etc.) is a PoW-stamped entry
over (post_id, feedback_type), so a PoW cannot be reused across signals or posts. We use straightforward statistics to infer the
total number of feedback submissions as the reciprocal of the unlikelihood of the globally-maximum PoW per (post_id,
feedback_type) pair. That maximum is healed by clients noticing discrepancies, not by server-to-server gossip.
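The estimate in Layer 3 can be illustrated with a simplified model in which every submission is one uniformly random hash draw. A draw reaches k leading zero bits with probability 2^-k, so the reciprocal of that probability, 2^k, is a rough estimate of how many draws were made. The model and the function names are assumptions; the real protocol's minimum difficulty and exact estimator are not described in the post.

```python
import random


def leading_zero_bits(value: int, width: int = 64) -> int:
    """Leading zero bits of `value` viewed as a `width`-bit number."""
    return width - value.bit_length()


def max_pow_bits(submissions: int, rng: random.Random, width: int = 64) -> int:
    """Simulate submissions and keep only the best (maximum) PoW seen."""
    return max(
        leading_zero_bits(rng.getrandbits(width), width)
        for _ in range(submissions)
    )


def estimate_count(max_bits: int) -> int:
    """Reciprocal of the chance that one draw reaches `max_bits` zero bits."""
    return 2 ** max_bits


rng = random.Random(42)
best = max_pow_bits(1000, rng)
print(best, estimate_count(best))
```

The estimate is coarse: it is always a power of two and a single lucky draw can inflate it, so it indicates an order of magnitude rather than an exact tally. Its appeal is that only the maximum per (post_id, feedback_type) pair has to be stored and reconciled.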
If any of this resonates, or you spot something I've gotten wrong, I would love to hear it. PRs welcome.
-- Jimme Jardine
Dart Live – compiler, VM, analyzer and hot reload on the web via WASM #
It's 7.6 MB gzipped and there's no server running behind it, so I was able to host it directly on github pages.
Here's the github repo with some more info: https://github.com/modulovalue/dart-live
Clock Face Generator [video] #
Create and customize clock faces by playing around with the parameters. This is still a work in progress, but there's already a nice collection of features. Check out the Gumroad page for details.
Download the Clock Face Generator Blender Extension: https://salaivv.gumroad.com/l/blender-clock-generator
Elecz – MCP server for real-time electricity prices in 40 countries #
Spot prices, cheapest hours, and contract comparison data.
Data comes directly from ENTSO-E, Octopus Agile, AEMO, ERCOT, CAISO, NYISO, JEPX and other official market operators. Updated every 15–60 minutes depending on market. No API key or account required. Works with Claude, Cursor, n8n, and other MCP-compatible clients. https://elecz.com
I made GhosttyFX, a JavaFX terminal view that uses libghostty #
And for Clojurists that use Cljfx, I also made https://github.com/cljfx/ghosttyfx — a wrapper that exposes GhosttyFX as a reactive view.