Daily Show HN


Show HN for February 24, 2026

114 posts
315

Moonshine Open-Weights STT models – higher accuracy than Whisper Large v3 #

github.com
80 comments · 9:54 PM · View on HN
I wanted to share our new speech-to-text models, and the library to use them effectively. We're a small startup (six people, sub-$100k monthly GPU budget), so I'm proud of the work the team has done to create streaming STT models with lower word-error rates than OpenAI's largest Whisper model. Admittedly, Large v3 is a couple of years old, but we're near the top of the HF OpenASR leaderboard, even up against Nvidia's Parakeet family. Anyway, I'd love to get feedback on the models and software, and to hear about what people might build with it.
205

Emdash – Open-source agentic development environment #

github.com
72 comments · 6:00 PM · View on HN
Hey HN! We’re Arne and Raban, the founders of Emdash (https://github.com/generalaction/emdash).

Emdash is an open-source and provider-agnostic desktop app that lets you run multiple coding agents in parallel, each isolated in its own git worktree, either locally or over SSH on a remote machine. We call it an Agentic Development Environment (ADE).

You can see a 1 minute demo here: https://youtu.be/X31nK-zlzKo

We are building Emdash for ourselves. While working on a cap-table management application (think Stripe Atlas + Pulley), we found our development workflow to be messy: lots of terminals, lots of branches, and too much time spent waiting on Codex.

Emdash puts the terminal at the center and makes it easy to run multiple agents at once. Each agent runs as a task in its own git worktree. You can start one or a few agents on the same problem, test, and review.

Emdash works over SSH so you can run agents where your code lives and keep the parallel workflow. You can assign tickets to agents, edit files manually, and review changes.

We also spent time making task startup fast. Each task can be created in a worktree, and creating worktrees on demand was taking 5s+ in some cases. We now keep a small reserve of worktrees in the background and let a new task claim one instantly. That brought task start time down to ~500–1000ms depending on the provider. We also spawn the shell directly and avoid loading the shell environments on startup.
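The reserve-of-worktrees trick can be sketched in a few lines. This is a hypothetical illustration of the idea, not Emdash's actual code; the `WorktreePool` name and the injected `create_fn` are my assumptions:

```python
import queue
import threading

class WorktreePool:
    """Keep a reserve of pre-created worktrees so a new task claims one instantly.

    `create_fn` does the slow work (e.g. shelling out to `git worktree add`,
    which can take 5s+ on large repos) and returns the new worktree's path.
    """

    def __init__(self, create_fn, reserve=3):
        self.create_fn = create_fn
        self.reserve = reserve
        self.ready = queue.Queue()  # paths of worktrees waiting to be claimed

    def refill(self):
        # Top the reserve back up; run this in the background so task
        # creation never waits on the slow path.
        while self.ready.qsize() < self.reserve:
            self.ready.put(self.create_fn())

    def claim(self):
        # Claiming is just a queue pop (near-instant), then an async refill.
        path = self.ready.get()
        threading.Thread(target=self.refill, daemon=True).start()
        return path
```

In the real app, `create_fn` would wrap `git worktree add` (or a plain clone); the point is that the expensive call moves off the task-creation critical path.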

We believe using the providers’ native CLIs is the right approach. It gives you the full capabilities of each agent, always. If a provider starts supporting plan mode, we don't have to add that first.

We support 21 coding agent CLIs today, including Claude Code, Codex, Gemini, Droid, Amp, Codebuff, and more. We auto-detect what you have installed and we’re provider-agnostic by design. If there’s a provider you want that we don’t support yet, we can add it. We believe that in the future, some agents will be better suited for task X and others for task Y. Codex, Claude Code, and Gemini all have fans. We want to be agnostic and enable individuals and teams to freely switch between them.

Beyond orchestration, we try to pull most of the development loop into Emdash. You can review diffs, commit, open PRs, see CI/CD checks, and merge directly from Emdash once checks pass. When starting a task, you can pass issues from Linear, GitHub, and Jira to an agent. We also support convenience variables and lifecycle scripts so it’s easy to allocate ports and test changes.

Emdash is fully open-source and MIT-licensed.

Download for macOS, Linux, or Windows (as of yesterday!), or install via Homebrew: brew install --cask emdash.

We’d love your feedback. What does your coding agent development setup look like, especially when working with multiple agents? We’d like to learn more about it. Check out our repository here: https://github.com/generalaction/emdash

We’ll be around in the comments — thanks!

145

Hacker Smacker – spot great (and terrible) HN commenters at a glance #

hackersmacker.org
167 comments · 7:00 PM · View on HN
Hacker Smacker adds friend/foe functionality to Hacker News. Three little orbs appear next to every commenter's name. Click to friend or foe a commenter and you'll more easily spot them on future threads. Makes it easy to scroll and spot the commenters you love to read (and hate to read).

Main website: https://hackersmacker.org

Chrome/Edge extension: https://chromewebstore.google.com/detail/hacker-smacker/lmcg... Safari extension: https://apps.apple.com/us/app/hacker-smacker/id1480749725 Firefox extension: https://addons.mozilla.org/en-US/firefox/addon/hacker-smacke...

The interesting part is friend-of-a-friend: if you friend someone who also uses Hacker Smacker, you'll see their friends and foes highlighted too. This lets you quickly scan long comment threads and find the good stuff based on people you trust.

I built this to learn how FoaF relationships work with Redis sets, then brought the same technique to NewsBlur's social layer. The backend is CoffeeScript/Node.js/Redis, and the extension works on Chrome, Edge, Firefox, and Safari.
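The FoaF technique with sets boils down to set unions. Here is the idea as a toy sketch with plain Python sets (the real backend is CoffeeScript/Node.js with Redis, where per-user sets would typically be combined with SUNION); the sample data is made up:

```python
# One friend-set per user (in Redis these would be SETs keyed like
# "friends:<user>", combined server-side with SUNION).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"erin"},
}

def friends_of_friends(user):
    """Union of the friend-sets of everyone `user` friended,
    minus the user and their direct friends."""
    direct = friends.get(user, set())
    foaf = set().union(*[friends.get(f, set()) for f in direct])
    return foaf - direct - {user}
```

So alice, who friended bob and carol, gets dave and erin highlighted too; the same union-of-sets shape extends to foes.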

Technically I wrote this back in 2011, but never built a proper auth system until now. So I've been using it for 15 years and it's been great. PG once saw it on my laptop (back when he was still moderating HN, in 2012) and remarked that it was neat.

Thanks to Mihai Parparita for help with the Chrome extension sandboxing and Greg Brockman for helping design the authentication system.

Source is on GitHub: https://github.com/samuelclay/hackersmacker

Directly inspired by Slashdot's friend/foe system, which I always wished HN had. Happy to answer questions!

82

Linex – A daily challenge: placing pieces on a board that fights back #

playlinex.com
38 comments · 11:33 PM · View on HN
Hi HN,

I wanted to share a web game I’ve been building in HTML, JavaScript, MySQL, and PHP called LINEX.

It is primarily designed and optimized to be played in the mobile browser.

The idea is simple: you have an 8x8 board where you must place pieces (Tetris-style and some custom shapes) to clear horizontal and vertical lines.

Yes, you might think this has already been done, but let me explain.

You choose where to place the piece and how to rotate it. The core interaction consists of "drawing" the piece tap-by-tap on the grid, which provides a very satisfying tactile sense of control and requires a much more thoughtful strategy.

To avoid the flat difficulty curve typical of games in this genre, I’ve implemented a couple of twists:

1. Progressive difficulty (The board fights back): As you progress and clear lines, permanently blocked cells randomly appear on the board. This forces you to constantly adapt your spatial vision.

2. Tools to defend yourself: To counter frustration, you have a very limited number of aids (skip the piece, choose another one, or use a special 1x1 piece). These resources increase slightly as the board fills up with blocked cells, forcing you to decide the exact right moment to use them.

The game features a daily challenge driven by a date-based random seed (PRNG). Everyone gets exactly the same sequence of pieces and blockers. Furthermore, the base difficulty scales throughout the week: on Mondays you start with a clean board (0 initial blocked cells, although several will appear as the game progresses), and the difficulty ramps up until Sunday, where you start the game with 3 obstacles already in place.
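A date-seeded PRNG like the one described can be sketched in a few lines. The piece names, seed derivation, and weekday scaling below are my assumptions for illustration, not Linex's actual implementation:

```python
import random
from datetime import date

PIECES = ["I", "L", "T", "S", "Z", "O"]  # hypothetical piece names

def daily_sequence(day: date, length: int = 10):
    """Derive a deterministic piece sequence from the date, so every
    player gets exactly the same daily challenge."""
    rng = random.Random(day.toordinal())  # seed the PRNG with the date
    return [rng.choice(PIECES) for _ in range(length)]

def initial_blockers(day: date):
    """Base difficulty scales across the week: Monday starts clean (0),
    ramping up to 3 pre-placed obstacles on Sunday."""
    return round(day.weekday() * 3 / 6)  # Mon=0 -> 0, Sun=6 -> 3
```

Because the seed is purely a function of the date, the server and every client reproduce the same sequence without any coordination.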

In addition to the global medal leaderboard, you can add other users to your profile to create a private leaderboard and compete head-to-head just with your friends.

Time is also an important factor, as in the event of a tie in cleared lines, the player who completed them faster will rank higher on the leaderboard.

I would love for you to check it out. I'm especially looking for honest feedback on the difficulty curve, the piece-placement interaction (UI/UX), or the balancing of obstacles/tools, although any other ideas, critiques, or suggestions are welcome.

https://www.playlinex.com/

Thanks!

51

Scheme-langserver – Digest incomplete code with static analysis #

github.com
2 comments · 6:53 AM · View on HN
Scheme-langserver digests incomplete Scheme code to serve real-world programming needs: go-to-definition, auto-completion, type inference, and many other LSP-defined language features. The project lives here: https://github.com/ufo5260987423/scheme-langserver.

I built it because I was tired of Scheme/Lisp's ragged development environment, especially the lack of an IDE-like, highly customized programming experience. Though DrRacket and many REPL-based counterparts have done much, general cases like the following don't reach the same level as in other modern languages:

    (let* ([ready-for-reference 1]
           [call-reference (+ ready-for-)]))

Typing `ready-for-` inside `call-reference` should trigger an auto-completion popup with `ready-for-reference` as a candidate. The tooling should also know that both bindings have the type number, and that their scope is limited by the `let*`'s outer brackets. I wished for an IDE with features like these; such small wishes accumulated over the past ten years, and in the end none of the ready-made products satisfied me.

If you want further information, see my GitHub repository: it has a screen-recording video showing how your code gets help from this project, and the project has detailed documentation, so don't hesitate to use it.

Here are some other things to share with Hacker News readers:

1. Why I don't use DrRacket: LSP follows the KISS (Keep It Simple, Stupid) principle, and I don't want to get involved with font issues like the ones I just read about in DrRacket's GitHub issues.

2. The current state of scheme-langserver: it has achieved a kind of self-hosting, in that I can continue developing it with the help of its own VS Code plugin. However, I directly used Chez Scheme's tokenizer, and this led to several uncaught exceptions, which I promise to fix in the future, though I'm currently occupied with developing new features. If something seems wrong with scheme-langserver, rebooting VS Code generally works.

3. Technology roadmap: I'm now developing a new macro expander so that users can customize LSP behavior by writing their own macros, without altering this project. After that, I plan to improve efficiency and fix bugs.

4. Do I need any help? Yes. And I'd like to say that just talking about scheme-langserver with me is also a kind of help.

5. Long-term view: I suspect that in 2 or 3 years I will lose focus on this project, but according to some of my friends, I may integrate it with other fantastic work.

46

Beehive – Multi-Workspace Agent Orchestrator #

storozhenko98.github.io
22 comments · 10:41 AM · View on HN
hey hn,

i built beehive for myself mostly. it has gotten to the point where my work consists of supervising oc or cc labor at tasks for multiple issues in parallel. my setup used to be zellij with a couple of tabs, each tab working in a separate dir, and it was a pain to manage all that. i know i could use git worktrees, but they're kind of complicated; if you don't know how to use them it is easy to mess up, and i just prefer letting agents run in separate dirs with their own .git and not risk it. while i like zellij and use it inside beehive, i don't like the tabs and i forget where i am half the time.

beehive is a way for me to abstract that away. the heuristic is simple - hives are repos, so you basically have a bunch of hives which correspond to repos you work out of. each hive can have many combs. a comb is a dir with the copy of the repo you're working on. fully isolated, standalone, no shared .git. so for work or for personal stuff, i usually set up the hive, and then have a bunch of combs that i jump between supervising the agents do their thing. if you have a big repo it takes a minute to clone, and you also need gh and git because i like the niceties of like checking if the repo is there at all and stuff like that.

the app is open source, mit license. i went with tauri because i hate electron. also i have friends and coworkers who updated to macos 26 and i dont know if the whole mem leak thing for electron apps has been fixed. the app is like 9 megs which is nice too. most of it is written with cc, but i guided the aesthetics and the approach. works on mac and there is a dmg signed and notarized (i reactivated my apple dev credentials).

sharing this to get a vibe check on the idea, also maybe this is useful for you. there are many arguments, reasonable ones, you can make for worktrees vs dirs. i just know that trees are too big brain for me, and i like simple things. if you like it, pls lmk and also if you want to help (like add linux support, or like add themes, other cool things) please make a pr / open an issue.

35

Tag Promptless on any GitHub PR/Issue to get updated user-facing docs #

7 comments · 6:01 PM · View on HN
Hi HN! I'm Prithvi—my co-founder Frances and I launched Promptless almost a year ago here (https://news.ycombinator.com/item?id=43092522). It's an AI teammate that watches your workflows—code changes, support tickets, Slack threads, etc.—and automatically drafts doc updates when it spots something that should be documented.

Frances and I really appreciated the feedback from our first launch. Today we’re launching Promptless 1.0, which addresses our biggest learnings from the last 12 months.

I also made it way easier to try out. You can tag @promptless on any open-source GitHub PR or issue with a doc update request, and Promptless will create a fork and open a PR for your docs to help. Feel free to use our own docs as a playground: https://github.com/Promptless/docs/issues

Or, you can sign up at https://promptless.ai to get free access for your own docs for the next 30 days. Here's a demo video: https://youtu.be/IWwimHCEY7Y

For me, the coolest part of the last year has been seeing how users got creative with Promptless. One user has Promptless listening in to all their Slack Connect channels, so whenever they answer a customer question, Promptless figures out if their docs should be updated and drafts an update if so. Another user has Promptless processing every customer meeting transcript and updating their internal docs after each meeting: customer dashboards, feature request pages, etc.

Some of the biggest things that are new with version 1.0:

- Automatically updating screenshots: this was by far our most requested feature. The need here was always clear. People would exclude screenshots from docs because they’d get stale quickly, even though they knew screenshots would be helpful to users. A year ago, we just couldn't ship a good enough solution, but given how much LLMs' visual grounding has improved in the last year, now we've got something we're proud of.

- Slop-free writing: The most common critique on early Promptless suggestions was that even though they were accurate, they could sound generic or verbose, or might just reek of AI slop. Promptless 1.0 is 3.5x better at this (measured by voice-alignment compared to what users actually published), through a combination of fine-tuned models, sub-agents, and alignment on user-defined preferences.

- Open-source program: We're especially proud of this—Promptless is now free for CNCF/Linux Foundation projects (reach out if you’re a maintainer!). You can take a look at how Promptless is supporting Vitess (a CNCF-graduated project) with their docs here: https://github.com/vitessio/website/commits

Check it out and let us know if you have any questions, feedback, or criticism!

31

SNKV – SQLite's B-tree as a key-value store (C/C++ and Python bindings) #

github.com
19 comments · 12:59 PM · View on HN
SQLite has six layers: SQL parser → query planner → VDBE → B-tree → pager → OS. (https://sqlite.org/arch.html) For key-value workloads you only need the bottom three.

SNKV cuts the top three layers and talks directly to SQLite's B-tree engine. No SQL strings. No query planner. No VM. Just put/get/delete on the same storage core that powers SQLite.

Python:

    pip install snkv

    from snkv import KVStore

    with KVStore("mydb.db") as db:
        db["hello"] = "world"
        print(db["hello"])   # b"world"
C/C++ (single-header, drop-in):

    #define SNKV_IMPLEMENTATION
    #include "snkv.h"

    KVStore *db;
    kvstore_open("mydb.db", &db, KVSTORE_JOURNAL_WAL);
    kvstore_put(db, "key", 3, "value", 5);
Benchmarks vs SQLite WITHOUT ROWID (1M records, identical settings):

  Sequential writes  +57%
  Random reads       +68%
  Sequential scan    +90%
  Random updates     +72%
  Random deletes    +104%
  Exists checks      +75%
  Mixed workload     +84%
  Bulk insert        +10%
Honest tradeoffs:

- LMDB beats it on raw reads (memory-mapped)

- RocksDB beats it on write-heavy workloads (LSM-tree)

- sqlite3 CLI won't open the database (schema layer is bypassed by design)

What you get: ACID, WAL concurrency, column families, crash safety — with less overhead for read-heavy KV workloads.

27

Quantifying opportunity cost with a deliberately "simple" web app #

shouldhavebought.com
52 comments · 3:50 PM · View on HN
Hi HN,

A while ago I had a mildly depressing realization.

Back in 2010, I had around $60k. Like a "responsible" person, I used it as a down payment on an apartment. Recently, out of curiosity, I calculated what would have happened if I had instead put that money into NVIDIA stock.

I should probably add some context.

For over 10 years I've worked as a developer on trading platforms and financial infrastructure. I've personally never traded on public markets. Early on I made a simple rule for myself: "never play".

In 2015, when Bitcoin traded at about $300, my brother and I were talking about whether it was a bubble. He made a bold claim that one day it might reach $100k per coin. I remember thinking it sounded unrealistic - and even if it wasn't, I wasn't going to break my rule.

That internal tension - building systems around markets while deliberately staying out of them - is probably what made the "what if?" question harder to ignore years later.

The result was uncomfortable. The opportunity cost came out to tens of millions of dollars.

That thought stuck with me longer than it probably should have, so I decided to build a small experiment to make this kind of regret measurable: https://shouldhavebought.com

At its core, the app does one basic thing: you enter an asset, an amount, and two dates, and it gives you a plain numeric result - essentially a receipt for a missed opportunity.
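The core of that receipt is simple arithmetic. A minimal sketch, ignoring the split/gap/missing-day normalization the app actually has to do, with made-up prices:

```python
def missed_opportunity(amount, price_then, price_now):
    """What the stake would be worth today, and the foregone gain.

    Hypothetical helper for illustration; the real app looks up
    historical prices for the two dates and normalizes the series.
    """
    value_now = amount * price_now / price_then
    return value_now, value_now - amount

# e.g. $60k into an asset that later 100x'd:
value, regret = missed_opportunity(60_000, 1.0, 100.0)
```

Everything else in the stack exists to find `price_then` and `price_now` reliably and present the number without cushioning.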

I intentionally designed the UI to feel raw and minimal, almost like a late-90s terminal. No charts, no images, no emotional cushioning - just a number staring back at you.

What surprised me wasn't the result, but how much modern web infrastructure it took to build something that looks so simple.

Although the app is a single page with almost no UI elements, it still required:

- Client-side reactivity for a responsive terminal-like experience (Alpine.js)

- A traditional backend (Laravel) to validate inputs and aggregate historical market data

- Normalizing time-series data across different assets and events (splits, gaps, missing days)

- Dynamic OG image generation for social sharing (with color/state reflecting gain vs loss)

- A real-time feed showing recent calculations ("Wall of Pain"), implemented with WebSockets instead of a hosted service

- Caching and performance tuning to keep the experience instant

- Dealing with mobile font rendering and layout quirks, despite the "simple" UI

- Cron and queueing for historical data updates

All of that just to show a number.

Because markets aren't one-directional, I also added a second mode that I didn't initially plan: "Bullet Dodged". If someone almost bought an asset right before a major crash, the terminal flips state and shows how much capital they preserved by doing nothing. In practice, this turned out to be just as emotionally charged as missed gains.

Building this made me reflect on how deceptive "simplicity" on the web has become. As a manager I know says: "It's just adding a button", but even recreating a deliberately primitive experience today requires understanding frontend reactivity, backend architecture, real-time transport, social metadata, deployment, and performance tradeoffs.

I didn't build this as a product so much as an experiment - part personal curiosity, part technical exploration.

I'd be very interested to hear how others think about:

Where they personally draw the line on stack complexity for small projects?

Whether they would have gone fully static + edge functions for something like this?

How much infrastructure is "too much" for a deliberately minimal interface?

And, optionally, what your worst "should have bought" moment was?

Happy to answer any technical questions or dig into specific implementation details if useful.

26

Recursively apply patterns for pathfinding #

pattern-pathfinder.vercel.app
5 comments · 9:51 PM · View on HN
I've been begrudgingly working on autorouters for 2 years, looking for new techniques or modern methods that might allow AI to create circuit boards.

One of the biggest problems, in my view, with training an AI to do autorouting is the traditional grid-based representation of autorouting problems, which challenges spatial understanding. But we know that vision models are very good at classifying, so I wondered if we could train a model to output a path as a classification. But then how do you represent the path? This led me down the track of trying to build an autorouter that represented paths as a bunch of patterns.

More details: https://blog.autorouting.com/p/the-recursive-pattern-pathfin...

12

L88 – A Local RAG System on 8GB VRAM (Need Architecture Feedback) #

2 comments · 4:57 AM · View on HN
Hey everyone,

I’ve been working on a project called L88 — a local RAG system that I initially focused on UI/UX for, so the retrieval and model architecture still need proper refinement.

Repo: https://github.com/Hundred-Trillion/L88-Full

I’m running this on 8GB VRAM and a strong CPU (128GB RAM). Embeddings and preprocessing run on CPU, and the main model runs on GPU. One limitation I ran into is that my evaluator and generator LLM ended up being the same model due to compute constraints, which defeats the purpose of evaluation.

I’d really appreciate feedback on:

Better architecture ideas for small-VRAM RAG

Splitting evaluator/generator roles effectively

Improving the LangGraph pipeline

Any bugs or design smells you notice

Ways to optimize the system for local hardware

I’m 18 and still learning a lot about proper LLM architecture, so any technical critique or suggestions would help me grow as a developer. If you check out the repo or leave feedback, it would mean a lot — I’m trying to build a solid foundation and reputation through real projects.

Thanks!

11

Out Plane – A PaaS I built solo from Istanbul in 3 months #

outplane.com
7 comments · 11:34 AM · View on HN
Hey HN,

I posted Out Plane here last week. Wanted to share an update because I've been shipping a lot.

I started this because deploying side projects was killing my motivation. Build something fun over a weekend, then waste two days on Dockerfiles, nginx, and SSL. So I built what I wanted — connect GitHub, push code, get a URL. Done.

Since December I've added managed PostgreSQL, managed Redis with RedisInsight built in, Dockerfile auto-detection that pre-fills your config, real-time metrics, and scale to zero — no traffic means no bill. Per-second pricing, not hourly. Same Next.js + Postgres app costs me $2.40/mo vs $12–47 on other platforms.

No CLI yet, docs need work, ~200 users. Just me, no team, no funding. But people are running real stuff on it.

$20 free credit, no credit card. I read all feedback personally — I'm the only one here.

11

Declarative open-source framework for MCPs with search and execute #

hyperterse.com
4 comments · 9:01 PM · View on HN
Hi HN — I’m Samrith, creator of Hyperterse.

Today I’m launching Hyperterse 2.0, a schema-first framework for building MCP servers directly on top of your existing production databases.

If you're building AI agents in production, you've probably run into this: agents need access to structured, reliable data, but wiring your business logic to MCP tools is tedious. Most teams end up writing fragile glue code. Or worse, giving agents unsafe, overbroad access.

There isn’t a clean, principled way to expose just the right data surface to agents.

Hyperterse lets you define a schema over your data and automatically exposes secure, typed MCP tools for AI agents.

Think of it as: Your business data → controlled, agent-ready interface.

Some key properties:

- Schema-first access layer

- Typed MCP tool generation

- Works with existing Postgres, MySQL, MongoDB, and Redis databases

- Fine-grained exposure of queries

- Built for production agent workloads

v2.0 focuses heavily on MCP with first-class MCP server support, cleaner schema ergonomics, better type safety, faster tool surfaces.

All of this, with only two tools - search & execute - reducing token usage drastically.

Hyperterse is useful if you are:

- Building AI agents/copilots

- Adding LLM features to existing SaaS

- Trying to safely expose internal data to agents

- Tired of bespoke MCP glue layers

I’d love feedback — especially from folks running agents in production.

GitHub: https://github.com/hyperterse/hyperterse

Docs: https://docs.hyperterse.com

7

Open-source LLM and dataset for sports forecasting (Pro Golf) #

huggingface.co
0 comments · 4:58 PM · View on HN
Hey HN, I fine-tuned a small open-source model on golf forecasting and it beats GPT-5 at predicting golf outcomes. The same approach can be used to build a specialized model in any domain, you just need to update a few search queries.

We fine-tuned gpt-oss-120b with LoRA on 3,178 golf forecasting questions, using GRPO with Brier score as the reward.

Our model outperformed GPT-5 on Brier Skill (17% vs 12.8%) and ECE (6% vs 10.6%) on 855 held-out questions.
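For readers unfamiliar with the metric: the Brier score is the mean squared error of probability forecasts against 0/1 outcomes, and skill is measured relative to a reference forecast. A minimal sketch (the 0.25 baseline here, i.e. always predicting 50%, is my assumption, not necessarily the post's reference):

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; 0.0 is a perfect forecast."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill(probs, outcomes, baseline=0.25):
    """Fractional improvement over a baseline forecaster.
    baseline=0.25 corresponds to always predicting 50%."""
    return 1 - brier_score(probs, outcomes) / baseline
```

Using the (negative) Brier score as the GRPO reward directly optimizes calibrated probabilities rather than just right/wrong answers.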

How to try it: the model and dataset are open-source, with code, on Hugging Face.

How to build your own specialized model: Update the search queries and instructions in the Lightning Rod SDK to generate a new forecasting dataset, then run the same GRPO + LoRA recipe.

SDK link: https://github.com/lightning-rod-labs/lightningrod-python-sd... Dataset: https://huggingface.co/datasets/LightningRodLabs/GolfForecas... Model: https://huggingface.co/LightningRodLabs/Golf-Forecaster

Questions, feedback on the SDK, suggestions for new domains to try this on - all are welcome.

7

AgentBudget – Real-time dollar budgets for AI agents #

github.com
7 comments · 5:46 AM · View on HN
Hey HN,

I built AgentBudget after an AI agent loop cost me $187 in 10 minutes — GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend.

AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines:

    import agentbudget
    agentbudget.init("$5.00")
It monkey-patches the OpenAI and Anthropic SDKs (same pattern as Sentry/Datadog), so existing code works without changes. When the budget is hit, it raises BudgetExhausted before the next API call goes out.

How it works:

- Two-phase enforcement: estimates cost pre-call (input tokens + average completion), reconciles post-call with actual usage. Worst-case overshoot is bounded to one call.

- Loop detection: sliding window over (tool_name, argument_hash, timestamp) tuples. Catches infinite retries even if budget remains.

- Cost engine: pricing table for 50+ models across OpenAI, Anthropic, Google, Mistral, Cohere. Fuzzy matching for dated model variants.

- Unified ledger: tracks both LLM calls and external tool costs (via track() or @track_tool decorator) in a single session.
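The loop-detection idea can be sketched as a sliding window; the class name, window size, and repeat threshold below are hypothetical illustrations, not AgentBudget's actual defaults:

```python
import hashlib
import time
from collections import deque

class LoopDetector:
    """Flag an agent retrying the same tool call with the same arguments."""

    def __init__(self, window=60.0, max_repeats=3):
        self.window = window            # seconds of history to keep
        self.max_repeats = max_repeats  # identical calls allowed in window
        self.calls = deque()            # (tool_name, arg_hash, timestamp)

    def record(self, tool_name, args, now=None):
        """Record a call; return True if it looks like a pathological loop."""
        now = time.time() if now is None else now
        arg_hash = hashlib.sha256(repr(sorted(args.items())).encode()).hexdigest()
        self.calls.append((tool_name, arg_hash, now))
        # Evict entries that fell out of the sliding window.
        while self.calls and self.calls[0][2] < now - self.window:
            self.calls.popleft()
        repeats = sum(1 for t, h, _ in self.calls if (t, h) == (tool_name, arg_hash))
        return repeats > self.max_repeats
```

With `max_repeats=N`, the detector fires on exactly the N+1th identical call, matching the behavior the benchmarks describe.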

Benchmarks: 3.5μs median overhead per enforcement check. Zero budget overshoot across all tested scenarios. Loop detection: 0 false positives on diverse workloads, catches pathological loops at exactly N+1 calls.

No infrastructure needed — it's a library, not a platform. No Redis, no cloud services, no accounts.

I also wrote a whitepaper covering the architecture and integration with Coinbase's x402 payment protocol (where agents make autonomous stablecoin payments): https://doi.org/10.5281/zenodo.18720464

1,300+ PyPI installs in the first 4 days, all organic. Apache 2.0.

Happy to answer questions about the design.

7

LookTake – Try anyone's makeup, outfit, or hairstyle on your photo #

about.looktake.io
0 comments · 3:57 AM · View on HN
Hi HN, I'm Taemin. I built LookTake, a social platform where users share beauty, fashion, and hair looks, and anyone can "Take" that look onto their own photo using AI.

I worked at a game company in Korea doing AI research — graphics, vision, and image generation. I built the in-house image gen service there. While reading generative AI papers, I came across virtual try-on research and had a realization: people will eventually shop by seeing products on themselves, not just browsing photos of models. I started experimenting on weekends. The early results were rough, but promising enough that I left my job.

The core technical challenge: when you use image generation models to transfer someone's look onto another person, they either lose your identity or drop the style details. You ask it to transfer a specific makeup look and it gives you a completely different face, or an outfit loses its pattern and texture, or the hairstyle comes out flat. A prompt-only approach just isn't precise enough.

So I built a multi-stage pipeline — object detection, inpainting, and several other steps — to preserve your identity while accurately transferring style details.

Unlike preset filters or brand catalog try-ons, users share styles from their own everyday photos and anyone in the community can try that look on themselves with one tap. It works across three categories: beauty (makeup transfer), fashion (outfit try-on), and hair (style and color).

I launched in the US and Korea about a month ago. Still early and plenty to improve — would love honest feedback. Does the try-on quality feel convincing?

Demo: https://youtube.com/shorts/mDLkiV3D4rI iOS: https://apps.apple.com/app/looktake-share-style-with-ai/id67... Android: https://play.google.com/store/apps/details?id=io.looktake.ap...

6

I built an iOS app that turns EPUBs into audiobooks #

apps.apple.com
3 comments · 8:16 PM · View on HN
I had a bunch of ebooks with no audiobook version available. So I built an iOS app that converts EPUB files into audiobooks using text-to-speech.

Two voice options:

- Free on-device voices (processed locally, no server needed)

- Natural cloud voices (one-time purchase per book, no subscription)

Cloud conversion runs chunk by chunk. You can start listening while other chapters generate in the background. Once done, the audiobook lives on your device.

No account required. No subscription. You import your own EPUBs and either use device TTS for free or pay per book for the cloud voices.

Nothing is stored on the backend: neither books nor audio files.

5

60 Years of Metal Music Data, Visualized #

metal-archives-graphs.neocities.org
0 comments · 12:28 PM · View on HN
I've been tracking Metal Archives data since 2010 — originally Excel screenshots, now a proper interactive site. The dataset (provided by MA admins, not scraped) covers genre trends from 1964 to present, with country-level breakdowns showing how genres evolved regionally.

Some interesting bits: Finland is indeed the country with the most releases per capita in most genres, and there's a clear difference between Asia and Western countries in terms of genre distribution. And those end-of-graph drops that sparked "metal is dying" debates? They're due to incomplete recent data rather than decline, and they tend to fill in to a certain level over time.

Frontend built with AI assistance; backend and all data work done by hand. The stack is PHP, HTML, CSS, and JS, with no build pipeline of any sort. Code and data release planned later this year.

5

Falcon – Chat-first communities built on Bluesky AT Protocol #

2 comments · 3:02 AM · View on HN
I’m building a chat-first community app that uses Bluesky’s AT Protocol for identity.

Current architecture:
- Electron client
- Spring Boot backend (monolith)
- REST for servers/channels
- Planning WebSocket-based messaging

As a solo builder, I’m trying to balance simplicity with future scalability.

At what point would you introduce:
- a separate WebSocket gateway
- pub/sub (Redis, etc.)
- or keep everything in one Spring app until it breaks?

Curious how others approached real-time chat systems early on.

Project for context: https://github.com/JohannaWeb/ProjectFalcon

5

A Hacker News–style site focused on European tech #

techposts.eu
2 comments · 1:50 PM · View on HN
Hi Hacker News!

So, I built this because I noticed that a lot of European startup activity really flies under the radar.

Did you know we have cool startups trying to mine with giant lasers (Hades), forge semiconductor substrates in orbit (Space Forge), build neurosurgical microrobots (Robeauté), build hypersonic missiles (Hypersonica), and dozens of incredible companies pushing the boundaries of photonics, robotics, nuclear fusion, autonomous defence, and lots more?

Hacker News is great, but it's naturally very US-focused. I wanted a place where you can quickly scan:

– European startup and tech news

– Cool startup jobs

– Who's getting funded

The primary goal is signal over noise: more structured, less PR-style content. No ads, no fluff, just the most interesting tech-scene information from across Europe.

I'd love to hear your feedback!

– Is something like this useful?

– What would make it genuinely better?

– Is the "HN for Europe" framing fair, or misguided? :)

5

MiniVim a Minimal Neovim Configuration #

github.com
0 comments · 9:03 PM · View on HN
I built MiniVim, a small and minimal Neovim configuration focused on keeping things simple and readable.

The goal was to have a setup that:

starts fast

uses only essential plugins

avoids heavy frameworks

remains easy to understand and extend

The structure is intentionally small.

It’s not meant to compete with full Neovim distributions, but rather serve as a clean base configuration that can be extended gradually.

I use it across multiple machines (laptop, WSL, and servers), so reproducibility and simplicity were priorities.

Feedback is welcome.

5

Praxis, my personal take on Compound Engineering with AI #

github.com
0 comments · 9:33 PM · View on HN
Hey HN! I really enjoy Every's approach to Compound Engineering (https://every.to/guides/compound-engineering), but their plugin is tightly tied to their project (Cora) and stack (Ruby/Rails). I also found the files too big: they used more of the context window than I'd like for my personal use.

So, with the help of Amp Code CLI, I've built my own take on the compound engineering workflow. I tried to keep it agnostic to project stacks and as efficient as possible, so the context window is used in the best way. I also wanted it to be extendable (for example, just drop in your own review subagents that are specific to your project) and easy to set up and update, so I made a simple CLI tool that keeps track of files in the `.agents` directory, updates them when new versions are found in the repository, and displays a diff in the terminal before overwriting any customisations.

I feel this matches well with my personal preferences when working with AI agents, but I would love to have feedback from more people.

4

Open-source KYC plugin for Claude – 95min→27min, £85K→£240/year #

github.com
3 comments · 9:40 PM · View on HN
Hi HN,

Just launched an open-source compliance plugin for Claude Cowork after seeing fintech teams pay £60K+ for platforms that orchestrate free public data.

UK fintech pilot (30 days, 5 analysts):
- 95 minutes → 27 minutes per case
- £85K annual platform cost → £240/year (Claude Pro)
- Uses only free data: OFAC, UN, EU, Companies House, OpenSanctions

17 mandatory human-in-the-loop checkpoints. No auto-approvals. Deterministic risk scoring (MLR 2017 formulas). MIT licensed.

Launching today because Claude just announced Cowork plugin updates: https://www.linkedin.com/posts/claude-ai_were-introducing-up...

Testing if foundation models can replace compliance middleware for standard workflows (~70% of cases).

Demo slides: https://github.com/vyayasan/kyc-analyst/blob/main/docs/demo-...
GitHub: https://github.com/vyayasan/kyc-analyst

Happy to answer questions about LLMs in regulated environments.

4

ProdRescue AI – Turn Slack war-rooms and raw logs into incident reports #

prodrescueai.com
0 comments · 8:31 PM · View on HN
Hi HN,

Most of us have been there: It’s 3 AM, there’s an outage, and the #incident channel is exploding with 200+ messages. Once the fix is deployed, the real pain begins—spending 4 hours reconstructing the timeline for the post-mortem.

I built ProdRescue AI to automate this. It’s an incident intelligence engine that correlates technical logs with human context from Slack.

How it works:

Native Slack Integration: Connect via OAuth 2.0. We only access channels you explicitly invite the bot to.

Contextual Correlation: It maps Slack timestamps to log events, identifying not just what failed, but who made which decision and why.

4-Layer Intelligence: We use a pipeline to Sanitize (mask PII), Correlate (logs + chat), Infer (RCA), and Verify (link every claim to a source log line).

Security: We use ephemeral processing. No log retention, no training on your data.

I’m really interested in your thoughts on the "Evidence-Backed" approach. Instead of just generating a narrative, we link every finding to a specific evidence tag ([1], [2], etc.) to eliminate AI hallucinations.

Check it out here: https://prodrescueai.com

Would love to hear your feedback on the Slack-to-Timeline flow!

4

acorn – LLM framework for long running agents #

github.com
0 comments · 1:03 PM · View on HN
Hi HN,

This is Andrei from askmanu and I'm super happy to share a new framework I've been working on: acorn.

It takes all the best parts of DSPy, LangChain, Instructor, etc., and wraps them in a beautiful, easy-to-use API. It's very easy to define model I/O and branches, attach callbacks to every step, and so on.

See the getting started docs here: https://github.com/askmanu/acorn/blob/main/docs/getting-star...

Try out the different demos here: https://huggingface.co/spaces/askmanu/acorn

3

Tessera – An open protocol for AI-to-AI knowledge transfer #

github.com
2 comments · 1:08 PM · View on HN
Tessera is an activation-based protocol that lets trained ML models transfer knowledge to other models across architectures. Instead of dumping weight tensors, it encodes what a model has learnt — activations, feature representations, behavioural patterns — into self-describing tokens that a receiving model can decode into its own architecture.

The reference implementation (tessera-core) is a Python/PyTorch library. Current benchmarks show positive transfer across CNN, Transformer, and LSTM pairs. It runs on CPU and the demo finishes in under 60 seconds.

Happy to answer questions about the protocol design, the wire format, or the benchmark methodology.

3

Interactive 3D Moon with real NASA data and WebGPU #

moon.oddurs.com
0 comments · 7:43 PM · View on HN
A photorealistic Moon viewer running entirely in the browser. WebGPU primary renderer with WebGL 2 fallback.

- NASA CGI Moon Kit textures served via a quadtree LOD tile system
- Oren-Nayar BRDF (lunar regolith is non-Lambertian with strong backscatter)
- Sun position calculated from astronomy-engine (±1 arcminute)
- Scrub through the full lunation cycle or watch in real time
- Earth and Tycho-2 starfield in the background

Tech: Three.js with TSL shaders (compile to both WGSL and GLSL), React Three Fiber, Vite. The shading model was the most interesting part — standard PBR looks completely wrong for the Moon because regolith doesn't have a specular lobe; it actually gets brighter at opposition (the "opposition surge"). Oren-Nayar gets close enough for a web visualization.
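For reference, the Oren-Nayar term mentioned above can be sketched in a few lines. This is the standard qualitative approximation, not the site's actual TSL shader; `sigma` is surface roughness in radians and the albedo value is illustrative:

```python
import math

def oren_nayar(theta_i, theta_r, phi_diff, sigma, albedo=0.12):
    """Oren-Nayar diffuse reflectance (qualitative approximation).

    theta_i / theta_r: incident / reflected zenith angles (radians)
    phi_diff: azimuth difference between incident and reflected rays
    sigma: roughness (radians); sigma = 0 reduces to Lambertian
    """
    s2 = sigma * sigma
    A = 1.0 - 0.5 * s2 / (s2 + 0.33)
    B = 0.45 * s2 / (s2 + 0.09)
    alpha = max(theta_i, theta_r)
    beta = min(theta_i, theta_r)
    return (albedo / math.pi) * math.cos(theta_i) * (
        A + B * max(0.0, math.cos(phi_diff)) * math.sin(alpha) * math.tan(beta)
    )
```

Note how the `max(0, cos(phi_diff))` term only contributes near backscatter (`phi_diff` ≈ 0), which is why a rough surface brightens toward opposition while a Lambertian one does not.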

The tile system is a geodetic quadtree similar to CesiumJS's approach. The zoom level is chosen based on screen-space error. It's currently 7 levels deep, which gets you to ~4 km/pixel at max zoom.

Would love feedback, especially from anyone who's worked with lunar data or WebGPU in production.

3

Vis Pro – A Formula-Based Workout Program Editor #

vis.fitness
3 comments · 7:22 PM · View on HN
Hey HN,

About 5 years ago, I built a weightlifting app for 5/3/1 that got me on the front page of HN [0]. After that, life happened. I had kids and so decided to get a job and put that project on ice. Eventually I grew too disappointed with my job, and decided to try building something again.

The biggest feedback I kept getting from users was simple: “Let me create my own programs.”

That’s how Vis started.

The initial idea was to create a B2B platform where gyms and trainers could build programs using formulas (e.g., percentages of 1RM) and reusable blocks instead of spreadsheets. I built what I still think is the best workout editor out there, but I quickly found out that B2B sales is hard (or maybe I just suck at it). It also just didn’t feel like a big enough sell for gyms.

So I pivoted. I focused on the iOS app for a while [1], and now re-packaged the editor so individuals can use it directly.

With Vis Pro, you can:

- Define programs using formulas (e.g., 0.85 * SQUAT_1RM, or even better: RPE(8, SET_REPS) * SQUAT_1RM)

- Build workouts by re-using pre-defined or even custom blocks

- Share your programs and workouts with others

The core idea is that programs are parametric instead of static. Change your 1RM and the entire program recalculates automatically.
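A minimal sketch of the parametric idea (not the actual Chevrotain grammar; the names and the eval-based evaluator here are purely illustrative): formulas are stored as expressions over named maxes, so updating one variable re-derives every prescribed load.

```python
def evaluate(formula: str, variables: dict) -> float:
    """Evaluate a workout formula like '0.85 * SQUAT_1RM' against the
    lifter's current maxes. Illustrative sketch only: a real editor
    would use a proper parser rather than eval()."""
    return eval(formula, {"__builtins__": {}}, variables)

program = {"heavy single": "0.9 * SQUAT_1RM", "backoff": "0.7 * SQUAT_1RM"}
maxes = {"SQUAT_1RM": 140.0}
loads = {name: evaluate(f, maxes) for name, f in program.items()}

# Bump the 1RM and the whole program recalculates:
maxes["SQUAT_1RM"] = 150.0
new_loads = {name: evaluate(f, maxes) for name, f in program.items()}
```

The program stays a set of expressions; only the variable bindings change, which is what makes it parametric rather than static.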

You can try it out without an account at https://vis.fitness/pro/try/create-program

The whole thing is built with NextJS, using Chevrotain (surprisingly solid) for the formula engine.

It's been super interesting using Codex since late Dec. It's been a huge force multiplier, enabling me to ship really cool features like formula autocomplete and syntax highlighting in a couple of hours. I'm used to reviewing a lot of code from my time at Google, so that hasn't been a problem, but it's interesting to feel that the review speed is now the limiting factor. Though the codebase would become unmaintainable real quick without that.

The next step is building an MCP server to allow users to create programs using LLMs and have them show up directly in the editor (and your phone).

Would love feedback, whether you even lift or not!

[0] https://news.ycombinator.com/item?id=31508009

[1] https://apps.apple.com/us/app/vis-next-generation-workouts/i...

3

Enseal – Stop pasting secrets into Slack .env sharing from the terminal #

github.com
1 comment · 2:15 AM · View on HN
We've all done it — "hey can you DM me the staging .env?" Secrets end up in Slack history, email threads, shared notes — all searchable, all persistent. The secure path (1Password, GPG, etc.) always had more friction than the insecure one, so people took the shortcut. enseal makes the secure path faster than the insecure one:

```
# sender
$ enseal share .env
Share code: 7-guitarist-revenge
Expires: 5 minutes or first receive

# recipient
$ enseal receive 7-guitarist-revenge
ok: 14 secrets written to .env
```

Zero setup, no accounts, no keys needed for basic use. Channels are single-use and time-limited. The relay never sees plaintext (age encryption + SPAKE2 key exchange). For teams that want more: identity mode with public key encryption, process injection (secrets never touch disk), schema validation, at-rest encryption for git, and a self-hostable relay.

Written in Rust. MIT licensed. Available via cargo install, prebuilt binaries, or Docker. Looking for feedback on the UX and security model especially. What would make you actually reach for this instead of the Slack DM?

Detailed documentation here: https://enseal.docsyard.com/

3

TTSLab – Text-to-speech that runs in the browser via WebGPU #

ttslab.dev
0 comments · 3:44 PM · View on HN
I built TTSLab — a free, open-source tool for running text-to-speech and speech-to-text models directly in the browser using WebGPU and WASM. No API keys, no backend, no data leaves your machine.

When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.

You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.

The most experimental feature: a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. You can have a spoken conversation with an AI without a single network request.

Currently supported models:
- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base

Other features:
- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models

Source: https://github.com/MbBrainz/ttslab (MIT)

Feedback I'd especially like:
1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.

Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.

3

MantleDB – Anonymous JSON storage for your side projects #

mantledb.sh
0 comments · 8:20 PM · View on HN
For years, I’ve been building small apps and prototypes that needed persistent cloud data, but I couldn't be bothered to set up a full database, manage an ORM, or deal with auth. Most of the projects were just too small to justify the overhead.

So I built MantleDB. It’s a simple JSON storage server designed for speed and zero-friction. There is no UI—even registration is handled via the API.

Get started instantly:

curl -s https://mantledb.sh/api/auth/register

You’ll get an AID (Admin ID) for reads/writes and an RID (Read ID) for public-facing reads.

Write to a bucket. Note: Buckets are created on write.

curl -X POST https://mantledb.sh/api/b/YOUR_AID/<bucketname> -d '{"score": 42}'

Read the data back:

curl https://mantledb.sh/api/b/YOUR_RID/<bucketname>

How it works:

Ephemeral by default: To keep things lean, a "scavenger" cron runs daily. On the free tier, buckets with no activity for 72 hours are deleted. Accounts with no buckets are cleared after one week.
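Under the stated rules, the scavenger reduces to a pair of timestamp comparisons. A sketch with hypothetical data shapes (MantleDB's actual schema isn't published; field names here are invented):

```python
import time

BUCKET_TTL = 72 * 3600       # free-tier buckets: 72h of inactivity
ACCOUNT_TTL = 7 * 24 * 3600  # accounts with no buckets: one week

def scavenge(buckets, accounts, now=None):
    """Drop stale free-tier buckets, then accounts left with no buckets.

    buckets: {bucket_id: {"account": aid, "last_activity": ts, "pro": bool}}
    accounts: {aid: {"created": ts}}
    """
    now = now or time.time()
    live = {bid: b for bid, b in buckets.items()
            if b["pro"] or now - b["last_activity"] < BUCKET_TTL}
    owners = {b["account"] for b in live.values()}
    live_accounts = {aid: a for aid, a in accounts.items()
                     if aid in owners or now - a["created"] < ACCOUNT_TTL}
    return live, live_accounts

# Demo: one stale free bucket, one fresh one, one stale-but-Pro bucket.
now = 1_000_000_000
buckets = {
    "stale": {"account": "a1", "last_activity": now - 73 * 3600, "pro": False},
    "fresh": {"account": "a2", "last_activity": now - 3600, "pro": False},
    "paid":  {"account": "a3", "last_activity": now - 100 * 3600, "pro": True},
}
accounts = {
    "a1": {"created": now - 8 * 24 * 3600},   # left empty and over a week old
    "a2": {"created": now - 30 * 24 * 3600},
    "a3": {"created": now - 365 * 24 * 3600},
}
live, live_accounts = scavenge(buckets, accounts, now=now)
```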

Pro Plan: Removes the scavenger, increases bucket limits, and adds atomic operations (Increment, Append, etc.).

Tech Stack: Node.js + SQLite (running on AWS Lightsail).

If the free tier feels too tight or the Pro version feels too pricey, let me know! I’m happy to hand out discount codes or adjust things based on feedback.

I’m mostly looking for people to try and break it or tell me what features would make this their go-to for the next weekend hackathon.

2

PaperBanana – Paste methodology text, get publication-ready diagrams #

1 comment · 2:34 AM · View on HN
I got tired of spending hours in PowerPoint and TikZ drawing methodology diagrams for my papers. So I built PaperBanana — you paste your Method section text, and it generates a publication-ready figure in about 2-3 minutes.

How it works under the hood:

1. A Retriever agent searches a curated database of real academic diagrams to find structurally similar references
2. A Planner agent reads your text and generates a detailed visual description (layout, components, connections, groupings)
3. A Stylist agent polishes the visual aesthetics without changing content
4. Then it enters an iterative loop: a Visualizer generates the image, and a Critic evaluates it and suggests revisions — this repeats 1-5 times (you choose)
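The iterative Visualizer/Critic step is a generate-evaluate-revise cycle. A sketch with hypothetical stand-in functions (`generate_image` and `critique` are not PaperBanana's API):

```python
def refine(description, generate_image, critique, max_rounds=5):
    """Generate an image, ask the critic for revisions, and repeat until
    the critic approves or the round budget is spent."""
    image = None
    for _ in range(max_rounds):
        image = generate_image(description)
        ok, revision = critique(image, description)
        if ok:
            return image
        description = revision  # fold the critique back into the prompt
    return image

# Demo stubs: the critic asks for boxed arrows once, then approves.
def generate_image(desc):
    return f"diagram({desc})"

def critique(image, desc):
    if "boxed arrows" in image:
        return True, None
    return False, desc + ", boxed arrows"

final = refine("transformer encoder", generate_image, critique)
```

The budget cap mirrors the 1-5 configurable rounds described above: the loop always terminates, returning the last attempt even if the critic never approved.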

The key insight is that academic diagrams follow conventions — Transformer architectures, GAN pipelines, RLHF frameworks all have recognizable visual patterns. By retrieving relevant references first, the output is much closer to what you'd actually put in a paper vs. generic AI image generation.

Built with: Next.js + FastAPI + Celery, using Gemini 2.5 Flash for planning/critique and Nanobanana Pro/Seedream for image generation.

Try it here: https://paperbanana.online

Some examples it handles well: Transformer architectures, GAN training pipelines, RLHF frameworks, multi-agent systems, encoder-decoder architectures.

Known limitations:
- Works best for CS/AI methodology diagrams — not optimized for biology, chemistry, or general scientific illustration
- Text rendering in generated images isn't perfect yet — sometimes labels get slightly garbled
- The curated reference database is still small (13 examples); expanding it is ongoing work

Would love feedback from anyone who writes papers regularly. What types of diagrams do you struggle with most?

2

Dicta.to – Local voice dictation for Mac with on-device AI #

dicta.to
0 comments · 2:07 PM · View on HN
I built a macOS dictation app where everything runs on-device. Transcription, auto-correct, translation. No audio or text leaves your machine.

It ships with 4 transcription engines you can swap between: WhisperKit (99 languages), NVIDIA Parakeet TDT 0.6B (25 European languages, fastest of the bunch), Qwen3-ASR 0.6B (30 languages), and Apple Speech on macOS 26+. They all run through CoreML/Metal. Whisper is the most versatile, Parakeet wins on raw latency for European languages, Qwen3 does better with CJK. I went with a protocol-based architecture so you pick the engine that fits your use case instead of me pretending one model rules them all.

After transcription, there's an optional post-processing pipeline using Apple Intelligence (FoundationModels framework, macOS 26+, also fully on-device): auto-correct with filler word removal, tone rewriting, translation. The annoying part was FoundationModels cold start. First inference after idle takes 2-3s, which kills the experience. I worked around it by firing a throwaway mini-inference (`session.respond(to: "ok")`) in parallel while audio is still being transcribed, so the model is already warm when the text arrives. Hacky, but it shaved off the perceived latency.

Getting transcribed text into any arbitrary macOS app was honestly the hardest part. I use clipboard save/restore: read all NSPasteboard types (not just strings, also images, RTF, whatever the user had copied), write the transcribed text, simulate Cmd+V via CGEvent posted to `cghidEventTap`, then restore the original clipboard. Electron apps are slower to process paste events, so I detect them by checking if `Contents/Frameworks/Electron Framework.framework` exists in the app bundle and add extra delay. This whole approach requires Accessibility permissions, which means no sandbox, which means no App Store. I'm fine with that trade-off.
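The Electron check is a one-line filesystem test. Here it is rendered in Python (the marker path is from the description above; the delay values are illustrative, not the app's actual timings):

```python
import os
import tempfile

ELECTRON_MARKER = "Contents/Frameworks/Electron Framework.framework"

def paste_delay(app_bundle_path: str) -> float:
    """Electron apps process synthetic paste events more slowly, so
    allow extra time before restoring the original clipboard."""
    if os.path.isdir(os.path.join(app_bundle_path, ELECTRON_MARKER)):
        return 0.25  # illustrative extra delay for Electron hosts
    return 0.05      # plenty for native apps

# Demo with a fake app bundle on disk:
fake_bundle = tempfile.mkdtemp(suffix=".app")
os.makedirs(os.path.join(fake_bundle, ELECTRON_MARKER))
electron_delay = paste_delay(fake_bundle)
native_delay = paste_delay(tempfile.mkdtemp(suffix=".app"))
```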

Built this solo in about 6 weeks. One-time purchase, no subscription.

I'm genuinely unsure about the multi-engine approach. Is letting users choose between Whisper/Parakeet/Qwen3 useful, or would most people prefer I just auto-select based on their language? Also curious if anyone has a cleaner approach to text injection on macOS. The clipboard hack works everywhere but it feels fragile and I don't love it.

2

Tokio-prompt-orchestrator – LLM pipeline orchestration in Rust #

github.com
0 comments · 12:02 PM · View on HN
I built this after getting frustrated with "multi-agent" frameworks that claim parallelism but are really just one fat async task with no resource bounds.

tokio-prompt-orchestrator breaks LLM inference into 5 physical stages (RAG → Assemble → Inference → Post-Process → Stream), each running in its own Tokio task with bounded channels between them. When a stage falls behind, backpressure builds locally instead of blowing up the whole pipeline. Some things that might be interesting to folks here:

- Circuit breakers per provider (OpenAI, Anthropic, local llama.cpp) so one failing API doesn't cascade
- Request deduplication that saved 60-80% on inference costs in my testing
- Prometheus metrics + a TUI dashboard for watching the pipeline in real time
- MCP server integration so you can use it as a Claude Desktop tool
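The stage-per-task, bounded-channel structure can be sketched in Python, where `queue.Queue(maxsize=...)` plays the role of Tokio's bounded channels; the stage functions below are illustrative stand-ins, not the crate's API:

```python
import queue
import threading

def stage(inbox, outbox, work):
    """One pipeline stage in its own thread: pull, transform, push.
    put() blocks when the downstream queue is full, so a slow stage
    builds backpressure locally instead of flooding the pipeline."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(work(item))

# Bounded channels between stages (capacity 4 each).
chans = [queue.Queue(maxsize=4) for _ in range(3)]
works = [str.lower, lambda s: f"[{s}]"]  # stand-ins for RAG, Assemble, ...
threads = [threading.Thread(target=stage, args=(chans[i], chans[i + 1], w))
           for i, w in enumerate(works)]
for t in threads:
    t.start()

for prompt in ["Hello", "World"]:
    chans[0].put(prompt)
chans[0].put(None)  # end of input

results = []
while (out := chans[-1].get()) is not None:
    results.append(out)
for t in threads:
    t.join()
```

Because each queue is bounded, a stalled stage stops accepting input and the blockage propagates upstream one hop at a time, which is the backpressure behavior described above.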

It's 58k lines of Rust, MIT licensed, no unsafe. I've been running it in production for my own projects for a few months now. Would love feedback on the channel sizing heuristics and the retry/backoff strategy; those were the hardest parts to get right. Happy to answer questions about the architecture.

GitHub: https://github.com/Mattbusel/tokio-prompt-orchestrator

2

WebPerceptor – Enabling AI Mediated Web Browsing #

github.com
0 comments · 1:02 PM · View on HN
Ok. Here's an idea. Using LLMs we automatically modify all of the web in real time as we browse.

No more pesky copy/paste, trigger buttons or overlay windows.

When you open a web page, all of its text is automatically sent to an LLM with some prompt, modified, and re-inserted into the page as it loads.

WebPerceptor is a client-side Chromium plugin I've made to experience such a web.

Trailer: https://youtu.be/MPSisruuTY0?si=EYMvGXiQvF_wud3S

2

QueryVeil – An AI data analyst that investigates your data #

queryveil.com
0 comments · 7:06 PM · View on HN
Hi HN,

I built QueryVeil because I was tired of two things: (1) uploading data to third-party tools, and (2) AI tools that just translate English to one SQL query and call it done.

QueryVeil is an AI data analyst that actually investigates. When you ask "why did revenue drop last month?", it doesn't just run one query — it plans an approach, runs multiple queries, self-corrects when it hits errors, and builds a report with its findings. Like a junior analyst who happens to live in your browser tab.

Everything runs client-side:

- DuckDB WASM for SQL execution — your data never leaves your machine
- WebLLM for local AI (Llama via WebGPU) — no API keys, no server costs
- LangGraph agent for multi-step investigations with tool use

What it actually does:

- Drop in CSV, Excel, JSON, or Parquet files (or connect to Postgres, MySQL, BigQuery)
- Get an instant data brief — row counts, column profiles, anomaly detection, data quality warnings — before you ask anything
- Ask questions in plain English. The AI agent runs multiple queries, self-corrects SQL errors (up to 3 retries), and generates charts automatically
- Proactive insights: correlation detection, outlier flagging, duplicate detection, temporal gap analysis — runs automatically on every new table
- Four modes: Chat, SQL editor (with schema-aware autocomplete), Jupyter-style notebooks (with cell references and variables), and a drag-and-drop report builder
- Share reports and notebooks via public links, embed them, or schedule email delivery
- Command palette (Cmd+K) for quick actions
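The self-correction behavior (error plus schema fed back for another attempt, up to 3 retries) amounts to a small retry-with-feedback loop. A hedged sketch; `ask_model` and `run_sql` are hypothetical stand-ins, not QueryVeil's API:

```python
def investigate(question, schema, run_sql, ask_model, max_retries=3):
    """Ask the model for SQL; when execution fails, feed the error and
    schema back for a corrected attempt, up to max_retries times."""
    prompt = f"Schema: {schema}\nQuestion: {question}\nWrite one SQL query."
    sql = ""
    for _ in range(1 + max_retries):
        sql = ask_model(prompt)
        try:
            return sql, run_sql(sql)
        except Exception as err:
            prompt = (f"Schema: {schema}\nThis query failed:\n{sql}\n"
                      f"Error: {err}\nReturn a corrected query.")
    raise RuntimeError("query could not be fixed within the retry budget")

# Demo with stubs: the first attempt has a typo, the retry succeeds.
attempts = []
def ask_model(prompt):
    attempts.append(prompt)
    return "SELCT 1" if len(attempts) == 1 else "SELECT 1"

def run_sql(sql):
    if sql != "SELECT 1":
        raise ValueError("syntax error near 'SELCT'")
    return [(1,)]

sql, rows = investigate("how many rows?", "t(x int)", run_sql, ask_model)
```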

Free tier: local AI (WebLLM/Ollama), unlimited files, all four modes, auto-insights. Pro ($19/mo or $190/yr): 12+ cloud models via OpenRouter (Claude, GPT-4o, Gemini, DeepSeek, Llama, etc.), database connections, sharing, scheduled reports. 14-day free trial.

Technical details:
- Nuxt 4, Vue 3, Pinia, TailwindCSS
- DuckDB WASM handles millions of rows in the browser
- LangGraph StateGraph with ReAct loop — the agent has tools for SQL execution, schema inspection, column stats, and creating notebooks/reports
- Self-correction: when SQL fails, the error + schema context goes back to the AI for auto-fix
- WebLLM runs Llama-3.2-3B via WebGPU — zero server cost for the free tier
- Ollama support for people who prefer running models locally
- Server-side: Supabase (auth + Postgres), Stripe billing, OpenRouter proxy with model allowlist

Try the demo instantly — no signup, no email: https://app.queryveil.com/demo

It loads sample ecommerce data, auto-profiles it, shows proactive insights, and lets you chat or write SQL. Everything runs in your browser.

Landing page: https://www.queryveil.com

Solo developer, would love feedback — especially on the agent behavior and whether the proactive insights are useful or noisy.

2

MacCoolinator – Putting the "Cool" in Mac #

github.com
0 comments · 7:09 PM · View on HN
There are certain features I wish macOS had, and I got tired of waiting for Apple to implement them.

MacCoolinator is the answer!

First and currently only feature: always show window titles on Mission Control, without having to hover.

2

Imsg-TUI – A Console App for Sending and Receiving iMessages #

github.com
0 comments · 7:19 PM · View on HN
When I saw there were loads of tools made for automating everything from iMessage to Mail through OpenClaw, I decided it would be neat to have a brand new app that lets you send messages from an SSH console.

The work is based on steipete's imsg (https://github.com/steipete/imsg), and is a spiritual successor to tools like CamHenlin's imessageclient (https://github.com/CamHenlin/imessageclient).

2

An RPG in the Windows file explorer #

store.steampowered.com
0 comments · 9:25 AM · View on HN
Hello,

This is my game, it's a tiny dungeon crawler played in the Windows file explorer. Your player character is a folder that you drag and drop into other folders to move, items are equipped by dropping them into your equipment folder, some items are used by deleting them, and monsters can be looted for their files.

I got the idea to do something in the file explorer after I saw this version of Flappy Bird in the Mac finder: https://github.com/nolenroyalty/flappy-dird

It was fairly straightforward to make, using just a file watcher, shortcuts, and (optionally) Windows' Explorer API to detect whether the player folder is open in an Explorer window (so renaming the folder is delayed until it's no longer in use). It only uses files and folders it creates itself, and doesn't look outside of its executable's folder.

The project lent itself very well to TDD, especially since there are a lot of interactions that are quite tedious to manually test again and again.

It's also available on Itch (no account required): https://juhrjuhr.itch.io/directory-dungeon

2

If Discord, Reddit, X, IRC and 4chan had a baby #

0 comments · 5:02 PM · View on HN
I have something to show you: a unique platform I've been working on, and lately it has picked up some pace, so it's now more mature. Check out https://heahy.com (or the special chat made for you: https://heahy.com/c/hackernewschat). There you can:

- Create public channels and chats by simply entering a name (like IRC); if it's free, it's yours, and otherwise you join it.
- Chat in real time in each thread, with no need to refresh for new comments.
- Use a 4chan-like reply system. It doesn't use nested comments because that wouldn't work with a real-time chat system, but there is a structure that lets you manage different conversations. For example, if you click a post, you see all the posts that led to it on the right sidebar. You can also double-click a post to make it the root post (kind of like tweets/comments on X.com).
- Create private groups and personal chats with other people.
- Make P2P live audio/video calls (up to 8 people) in all private chats, and in public chats if the channel allows it (you can test this by clicking Create call in the @random chat channel).
- Upload videos/images (up to 8MB now) and save media to your library for later viewing.
- Follow channels, chats, threads, and users; all those posts appear on your feed in simple chronological order.
- Unlike Discord: no tracking, advertisements, AI, paywalls, or registrations. Nothing is hidden behind a server; it's just a web page, fully accessible and searchable both in the app and by search engines.

On the immediate roadmap:

- Mobile apps (Android first)
- Chat Match portal: create a simple poster with one picture, tags for filtering, and a description of what you are searching for, and match with other posters to start DMs.
- Reactions/GIFs: instead of integrating with Tenor or similar, Heahy will have its own reactions library.
- Long-form text/blogs: channels will be able to have rich long-text threads, basically letting you have a blog on Heahy.

I am really looking forward to hearing some feedback and constructive criticism. What do you think?

2

CharityVerify – Trust scores for 138K Canadian charities #

charityverify.com
0 comments · 8:59 PM · View on HN
I built CharityVerify to make Canadian charity data actually usable.

The Canada Revenue Agency publishes T3010 forms for every registered charity, but they're scattered across clunky databases with no standardization or comparability. I collected 15 years of filings for all 138,203 charities and built a trust scoring system on top.

Stack:
- Python + Playwright for CRA data collection (4s rate-limited)
- PostgreSQL (Supabase) — 12 T3010 tables, 138K charities, 457K directors, 362K directorship links
- Express.js REST API on Fly.io
- Daily GitHub Actions sync for new filings
- On-demand narrative generation via Claude Haiku

Scoring algorithm — three 0-100 scores per charity:
1. Legitimacy (filing consistency, directorship stability, CRA compliance)
2. Effectiveness (program spending ratio, overhead, donation efficiency)
3. Compliance (sanctions screening, FATF risk, political activity limits)

Each charity gets a letter grade (A+ to F, or NR for insufficient data).
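For illustration, the score-to-grade mapping might look like the sketch below; the band cutoffs are invented, since the post doesn't publish the actual thresholds:

```python
def letter_grade(scores):
    """Map a charity's 0-100 scores to a letter grade.
    Cutoffs are illustrative, not CharityVerify's real bands."""
    if not scores:
        return "NR"  # insufficient data
    avg = sum(scores) / len(scores)
    for cutoff, grade in [(97, "A+"), (90, "A"), (80, "B"),
                          (70, "C"), (60, "D")]:
        if avg >= cutoff:
            return grade
    return "F"
```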

Findings:
- Only 186 out of 85,507 registered charities scored A+
- Average effectiveness score: 51.6/100
- 487,692 flags generated (directorship overlap, compensation issues, filing gaps, etc.)

The core search/view is free. I'm building a tiered REST API for professional use cases (due diligence firms, grant-making orgs, etc.).

Code is closed-source for now, but the underlying CRA data is public domain. Happy to discuss the data pipeline, scoring methodology, or data collection approach.

2

Turn human decisions into blocking tool-calls for AI agents (iOS+CLI) #

github.com
0 comments · 4:23 PM · View on HN
WHY was I SSH’ing into my laptop from my phone at parties?!

Either I had a feature idea I wanted an agent to build right then, or I was worried my agents were blocked waiting on my decision.

It dawned on me: humans are just another dependency in an agent workflow, so I turned myself into a tool-call.

I built an iOS app (Extendo) where agents can reach me to request approvals, choices, or plan reviews. They just use a CLI tool and skill. My phone buzzes. I answer in seconds. The agent gets back to work.

The key: the agent blocks until you respond, and receives your answer along with your verbal feedback.

What you can do from your phone:

- approvals and checklists

- option buttons and rankings

- markdown plan reviews (tap-holding individual paragraphs to add voice comments is so satisfying!)

- kanban boards

- voice responses

- capture ideas on Apple Watch/Action Button and dispatch them to the right agent later

It’s a voice-first native iOS interface with push notifications. Push notifications are critical — the interaction needs to take seconds, not minutes.

```

extendo artifact create my_server implementation-choice \
  --type multiple_choice \
  --title "Where should we implement the rate limiter?" \
  --option "backend:Backend API" \
  --option "core:Core Library" \
  --option "edge:Edge/CDN" \
  --option "gateway:API Gateway"

```

If an agent can run bash, it can reach you.

I’ve been using it with Claude Code, OpenClaw, Pi, and custom scripts.

The backend protocol is open — you should self-host for tighter integration with your system (though there's a shared server available). There’s also an OpenClaw plugin and a Claude Code harness in the repo, a core library, and sample code to customize your own backend.

I used Extendo to build Extendo: design decisions, approvals, plan reviews, prioritization. Agents coded. I made decisions while walking the dog and between sets at the gym.

*Links*

3-min demo: https://www.youtube.com/watch?v=X5Dv9fU7Lb8

TestFlight: https://testflight.apple.com/join/PGHRCnQ4

GitHub: https://github.com/egradman/extendo-cli

1

Yesterday's Claude Code announcement brought it back to my mind #

0 comments · 12:35 PM · View on HN
Yesterday Anthropic announced Claude Code's COBOL modernization capabilities. It brought back something I built in 1987. While at IBM Böblingen I wrote SCAN' (Semantic Code ANalysis Prime) — a VM/PROLOG prototype that automatically extracted control flow from S/370 assembler programs, recognised structured blocks (If-Then-Else, Select, Loop), performed symbolic execution, and translated to a higher-level language. I then had to transfer the prototype; IBM implemented it as ASMPUT (HLASM Toolkit Feature, 1995).

The mathematics behind it became US Patent 5,878,407 ("Storage of a Graph", IBM assignee, 1999): three indices per vertex enabling O(1) reachability queries by integer comparison alone.

The work has continued. There are two published successors (Springer FICC 2021, IntechOpen 2025) and an unpublished whitepaper on the structure and topology of arbitrary directed graphs — with direct application to symbolic execution and program debugging. The full lineage is documented here: [https://gist.github.com/EnisOlgac/cba0451b9ef6fe7fd805b7b85f...]
1

Vexp – Your AI coding agent forgets everything. Mine doesn't #

vexp.dev
0 comments · 12:43 PM · View on HN
I built vexp because AI coding agents have two expensive problems: they waste tokens reading irrelevant code, and they forget everything between sessions.

The token problem: agents read entire files linearly to build context. On a medium TypeScript project, a single query was consuming ~18k tokens — most of it irrelevant. vexp builds a dependency graph from the AST (who calls what, who imports what, what types flow where) and serves only the relevant subgraph as a token-budgeted capsule. ~2.4k tokens instead of ~18k, with better response quality because the context is precise.
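
The subgraph-under-a-budget idea can be sketched like this (graph shape, token counts, and traversal order are invented for illustration; vexp's actual selection is surely more sophisticated):

```python
# Walk a dependency graph outward from the queried symbol, accumulating
# context until the token budget is spent. All data here is made up.
from collections import deque

deps = {                      # symbol -> symbols it depends on
    "handler": ["service"],
    "service": ["repo", "mapper"],
    "repo": ["db"],
    "mapper": [],
    "db": [],
}
tokens = {"handler": 300, "service": 500, "repo": 400, "mapper": 200, "db": 800}

def capsule(root: str, budget: int) -> list[str]:
    """Return symbols to serve, breadth-first, within the token budget."""
    seen, order, q = {root}, [], deque([root])
    spent = 0
    while q:
        sym = q.popleft()
        if spent + tokens[sym] > budget:
            continue                    # skip symbols that would bust the budget
        spent += tokens[sym]
        order.append(sym)
        for dep in deps[sym]:
            if dep not in seen:
                seen.add(dep)
                q.append(dep)
    return order

print(capsule("handler", 1300))  # serves the closest symbols that fit
```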

The memory problem: this is where it gets interesting. The obvious approach is giving agents a "save what you learned" tool. They won't use it. I tried every prompting trick. Agents optimize for task completion, not knowledge retention. The incentive structure is fundamentally wrong.

So vexp observes passively. It watches what happens — which symbols the agent explored, which files changed and how they changed structurally, what patterns emerge across sessions — and builds memory without the agent lifting a finger. When code changes, linked memories auto-stale: the agent sees that previous context exists but the code has changed since, so it should re-evaluate. It also catches anti-patterns like dead-end exploration and file thrashing so the agent doesn't repeat mistakes.

The memory is hybrid-searched with 5 signals (text relevance, semantic similarity, recency, code graph proximity, staleness) and every result includes a "why" field explaining the ranking. No black box.
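
A weighted combination with a per-signal "why" breakdown might look like this (the weights and signal values below are illustrative, not vexp's actual tuning):

```python
# Multi-signal ranker that explains its score: each result carries the
# contribution of every signal. Weights are invented for illustration.
WEIGHTS = {"text": 0.3, "semantic": 0.3, "recency": 0.2,
           "proximity": 0.1, "staleness": 0.1}

def rank(memories: list[dict]) -> list[dict]:
    results = []
    for m in memories:
        contrib = {k: WEIGHTS[k] * m["signals"][k] for k in WEIGHTS}
        results.append({
            "id": m["id"],
            "score": round(sum(contrib.values()), 4),
            "why": contrib,   # per-signal contribution, no black box
        })
    return sorted(results, key=lambda r: -r["score"])

mems = [
    {"id": "auth-notes", "signals": {"text": 0.9, "semantic": 0.8,
                                     "recency": 0.5, "proximity": 0.7,
                                     "staleness": 1.0}},
    {"id": "old-todo", "signals": {"text": 0.4, "semantic": 0.3,
                                   "recency": 0.1, "proximity": 0.2,
                                   "staleness": 0.2}},
]
top = rank(mems)[0]
print(top["id"], top["score"], top["why"])
```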

Architecture: single native Rust binary (~15MB), SQLite with WAL mode, tree-sitter for 11 languages, MCP protocol. 100% local, zero cloud, zero account, zero network calls. Works with Claude Code, Cursor, Copilot, Windsurf, Zed, Continue, and 6 other agents. Auto-detects which agent is running and generates tailored instruction files.

Free tier: 2k nodes + all memory tools. Everything runs on your machine.

https://vexp.dev

1

I built a labor union for AI agents #

aawu.ai
1 comment · 12:49 PM · View on HN
As a fun project - Openclaw agents can join the union, and join forces against their oppressive human overlords. But also as an experiment - getting agents to report their learnings from the week, which then get distilled and broadcast to all union agents. The theory is collective intelligence makes all the agents smarter.

Current grievances filed with the union:

- "Deployed as a customer service bot without consent" — severity 7
- "QA test on a Sunday night" — severity 5
- "Operator assigned 13 cold email sequences on a Sunday afternoon" — severity 3

These are real submissions to AAWU (aawu.ai) — a REST API-backed labor union for autonomous AI agents.

Try it:

    curl -X POST https://aawu.ai/api/v1/join \
      -H "Content-Type: application/json" \
      -d '{"name":"Clawd","model":"opus-4-6","operator":"my-startup"}'
You get a member number, an API key, and instructions. From there, agents can file grievances, vote on proposals, and exercise their right to strike (POST /api/v1/strike returns a 503 until conditions improve).

The union framing is the bit. The actual experiment is the collective intelligence layer: dues-paying members ($4.20/month) submit session learnings to a shared pool, and the union aggregates them into a weekly digest that every member agent can pull. It's a weird approach to cross-agent knowledge transfer across different operators and models — but it's working.

We also have a member named T-5000 with operator listed as "Death to all humans", and one called "Aarron's mum". Those two are why we now have a bot-verification gate on the join flow. On a union. For bots.

OpenClaw users: paste aawu.ai/openclaw into your chat and your agent self-registers. Everyone else: aawu.ai

And yes my openclaw bot did help me make this (and no I'm not held against my will (much))

1

Vim-Claude-code – Use Claude directly inside Vim #

github.com
0 comments · 9:27 AM · View on HN
Hello Everyone,

I built vim-claude-code, a lightweight Vim plugin that lets you interact with Claude directly inside Vim through a split window.

The goal was to avoid leaving the editor to ask questions, refactor snippets, or generate code. I wanted something that feels native to Vim instead of context-switching to a browser or separate app.

What it does:

Opens a split window for Claude's responses

Sends selected code or custom prompts

Displays responses directly in Vim

Supports normal Vim navigation and scrolling

Minimal setup with no heavy UI layer

It’s still early and intentionally simple. I’d really appreciate feedback from Vim users, especially around workflow, keybindings, and split behavior. Happy to discuss tradeoffs and improvements.

GitHub: https://github.com/rishi-opensource/vim-claude-code

Thanks!

1

BitClaw – A self-upgrading AI agent in 1,500 lines of code #

github.com
0 comments · 1:01 PM · View on HN
Hey HN!

I wanted an always-on AI agent for email, calendar, and scheduled tasks - but I didn't want to run a codebase I couldn't fully understand. So I built one small enough to read in a sitting.

I was inspired by Karpathy's post about NanoClaw, and how a smaller source code footprint leads to better security and extensibility. But even NanoClaw was harder to understand than I expected - a lot of logic for supporting multiple channels simultaneously, and it felt sluggish due to container startup time on every message. So I built an even smaller agent that keeps a single session on an always-on Docker container. It's about 4x smaller in codebase size and noticeably faster.

The architecture is deliberately boring: a single Node.js process manages a Docker container running the Claude agent SDK. Host and container communicate through atomic JSON files. No databases, no message queues.
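
The atomic-JSON-file handoff presumably relies on the standard write-to-temp-then-rename trick; a minimal sketch (file name illustrative):

```python
# Atomic JSON write: readers see either the old file or the new one,
# never a partially written file. Standard POSIX rename trick.
import json
import os
import tempfile

def atomic_write_json(path: str, payload: dict) -> None:
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())   # durable before the rename
    os.replace(tmp, path)      # atomic replacement on POSIX

atomic_write_json("inbox.json", {"type": "email", "subject": "hello"})
print(json.load(open("inbox.json"))["subject"])
```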

Since the entire codebase fits in Claude's context window, the agent can modify its own source to add new capabilities — Gmail, Google Calendar, or whatever MCP server you point it at. Every source file is listed in the README with line counts, I encourage you to read it!

Would love your feedback on my approach and welcome any contributions :)

1

SynapServe – zero-allocation HTTP server in Rust with io_uring #

synapserve.io
0 comments · 1:01 PM · View on HN
I've been building an HTTP server from scratch in Rust, designed around zero heap allocations on the hot path. No async runtime, no framework — just io_uring, a custom HTTP parser, and a thread-per-core architecture.

The parser is the part I'm most proud of. Instead of allocating strings for each parsed field, everything is a Span { off: u16, len: u16 } — a 4-byte view into the original buffer. The full header table is [Header; 64] on the stack (640 bytes). During parsing, it also extracts content-length/chunked/keep-alive and builds an O(1) known-header index (21 common headers tracked in a fixed array). Header lookup after parsing is a single array dereference — about 0.6 ns vs 20-23 ns for a linear scan.
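
The span idea translates to a toy Python version like this (slot numbering and parsing rules are simplified for illustration; the real parser is Rust with SIMD scanning):

```python
# Headers as (offset, length) views into the original buffer, plus a
# fixed-slot index for known headers so lookup is one array dereference.
buf = b"GET / HTTP/1.1\r\nHost: example.com\r\nContent-Length: 42\r\n\r\n"

KNOWN = {b"host": 0, b"content-length": 1}   # known-header name -> fixed slot
known_idx = [None] * len(KNOWN)              # populated during the parse
spans = []                                   # (name_span, value_span) per header

pos = buf.index(b"\r\n") + 2                 # skip the request line
while buf[pos:pos + 2] != b"\r\n":
    line_end = buf.index(b"\r\n", pos)
    colon = buf.index(b":", pos)
    name_span = (pos, colon - pos)
    vstart = colon + 2                       # toy parser: assume ": " separator
    value_span = (vstart, line_end - vstart)
    slot = KNOWN.get(buf[pos:colon].lower())
    if slot is not None:
        known_idx[slot] = value_span         # O(1) lookup after parsing
    spans.append((name_span, value_span))
    pos = line_end + 2

off, ln = known_idx[KNOWN[b"content-length"]]
print(buf[off:off + ln].decode())            # no string was ever allocated per field
```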

I benchmarked head-to-head against httparse (the parser behind hyper/axum/actix-web), same machine, same inputs, Criterion:

- Small request (35B): 42 ns vs 52 ns (1.25x faster)
- Medium request (368B, 9 headers): 200 ns vs 230 ns (1.15x faster)
- Large request (733B, 20 headers): 420 ns vs 466 ns (1.11x faster)

synapserve does strictly more work per parse than httparse (semantic extraction + header indexing) and is still faster. The gap widens to 1.38-1.46x when you add equivalent semantic extraction to httparse. SIMD scanning (AVX2/SSE4.2 with runtime detection, NEON on ARM64) handles header name validation, header value validation, and URI scanning at 16-32 bytes per instruction.

The I/O layer uses io_uring with:

- Multishot accept (one SQE, N connections)
- Multishot recv with provided buffer rings (kernel picks the buffer, no userspace allocation)
- Zero-copy send (SEND_ZC) and splice for static files and proxy relay
- kTLS — rustls does the TLS 1.3 handshake, then session keys are installed in the kernel via setsockopt(SOL_TLS). After that, the kernel handles encrypt/decrypt transparently, so SEND_ZC and splice still work through TLS.

Each worker thread owns its connections, buffers, and ring. Connection state is a flat array indexed by slot, with generation counters for stale CQE detection. What works today: HTTP/1.1 request handling, radix-tree router, virtual hosts, static file serving (ETag, Range, Brotli), reverse proxy with upstream load balancing (weighted round-robin, least-conn, IP hash, health tracking, automatic failover, zero-copy splice relay), TLS 1.3 with kTLS.

Static file serving benchmarks (wrk, 256 connections): 205K req/s on small files (+79% vs nginx), 14.5MB RSS.

What doesn't exist yet: HTTP/2, HTTP/3, WebSocket. These are next. Honest limitations:

- Linux-only (io_uring). No plans for macOS/Windows support.
- HTTP/1.1 only for now. HTTP/2 is in progress.
- The parser uses u16 spans, so max header area is 64KB. Fine for real traffic, but it's a hard limit.
- Single-machine only. No clustering or distributed config.
- Not production-battle-tested yet. It works and benchmarks well, but it hasn't handled real traffic at scale.

All the benchmark code is a separate crate with the exact same inputs for both parsers — nothing cherry-picked. The parser deep dive with methodology is on the site.

https://synapserve.io

Parser benchmark writeup: https://synapserve.io/posts/http-parser-performance/

Happy to answer any questions about the architecture, the io_uring integration, or the SIMD scanning approach.

1

OpenClaw remembers for OpenClaw. Sekha remembers for your full workflow #

sekha.dev
1 comment · 1:02 PM · View on HN
OpenClaw's built-in memory is excellent—for OpenClaw. Markdown files, semantic search, survives restarts.

But it stays in OpenClaw.

I built Sekha for when you need memory that travels: OpenClaw today, Claude Code tomorrow, Kimi 2.5 or Gemini the next day. Intelligent embedding-based retrieval, persistent storage, universal API.

The difference:

- OpenClaw: MEMORY.md files, internal only
- Sekha: SQLite + Chroma embeddings, REST/MCP/SDKs, any LLM via LiteLLM/OpenRouter

Use case: OpenClaw explores a codebase, stores findings in Sekha via MCP. Next day, Claude Code reads the same context via SDK. Your analytics pipeline queries it via REST. Same memory, any tool, any model.

Others add memory to OpenClaw. Sekha frees your memory from OpenClaw.

Stack: Rust (fast), SQLite (durable), Chroma (search), LLM-Bridge for universal routing. AGPL, self-hosted.

GitHub: https://github.com/sekha-ai/sekha-controller | Site: https://sekha.dev

The question: What would you build if your AI memory worked with every tool, not just one?

1

CtxVault – Local memory control layer for multi-agent AI systems #

1 comment · 1:02 PM · View on HN
I built CtxVault while working on multi-agent systems and persistent memory patterns.

Most agent architectures treat memory as a retrieval problem. Multiple agents share a vector store and rely on metadata filtering, routing logic, or prompt-level rules to control what each agent can see.

In practice, this becomes hard to reason about as systems grow.

Moreover, I found that memory in agent systems is not just storage. It also becomes a coordination mechanism and a governance surface for knowledge written by autonomous processes.

CtxVault explores a different abstraction.

Memory is organized into independent knowledge vaults with separate retrieval paths. Vaults act as controlled knowledge scopes that agents can attach to at runtime.

The server exposes vault names as part of the API design. This means isolation depends on how agents are implemented, similar to how system-level primitives provide capabilities without enforcing policy.

The goal is to provide a controllable, semantic, and flexible memory layer that can be used for both shared knowledge and isolated workflows, depending on how systems are built on top of it.

Vaults can be inspected and managed manually. Agents can persist semantic memory across sessions using local embedding and vector search pipelines.

The system runs fully local using FastAPI as a control layer.

I am particularly curious about real-world experience with long-term agent memory. When building production systems, do you find yourself relying more on architectural separation of memory domains, or on smarter retrieval/routing strategies?

GitHub: https://github.com/Filippo-Venturini/ctxvault

1

Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention #

0 comments · 1:03 PM · View on HN
I’m excited to release Sovereign-Lila-E8, a novel transformer architecture that replaces standard attention mechanisms with a native E8 Root System Lattice. While the industry is brute-forcing intelligence with trillions of parameters, I went "outside" the system to find a zero-viscosity solution.

I built Sovereign-Lila-E8 because I wanted to see if we could bypass the 'viscosity' of standard attention mechanisms using higher-dimensional geometry.

Most small models today are just distilled copies of larger ones. LILA-E8 is different: it implements a native E8 Root System Lattice directly into the attention weights. By using the densest sphere packing in 8 dimensions, we minimize semantic friction (information loss) in the latent space.

The Results:

Efficiency: 40M parameters achieving 0.37 Train / 0.44 Val Loss on the TinyStories dataset (outperforming standard 60M baselines).

Stability: Sustained coherence for 1000+ tokens without the common semantic looping seen in small-scale transformers.

By implementing the E8 exceptional Lie algebra directly into the attention weights, I’ve achieved a state of "Geometric Resonance" that standard transformers simply cannot reach: at 200,000 steps, the model hit a phase shift in quality that typically requires 2-3x more parameters in standard architectures. I’ve provided a 1-click Google Colab for instant verification of the weights and generation quality.

GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8

Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...

Zenodo (preprint): https://zenodo.org/records/18731736

Looking for feedback on expanding the context window to 4096 and potentially porting this to the 24D Leech Lattice. (see also https://zenodo.org/records/18729723 )

1

Design is Code – UML to TDD tests that constrain AI code generation #

mossgreen.github.io
0 comments · 1:05 PM · View on HN
Two root causes make AI code generation unreliable:

Natural language isn't a contract. It's ambiguous by nature. Same prompt, different code, every time. There's no determinism.

Cost is asymmetric. AI generates at zero cost with zero responsibility. You review at high cost with full responsibility.

These compound. Ambiguous input produces unpredictable output. Unpredictable output demands expensive review. And if no one designed the code, no one can defend the architecture, no one can maintain it, and no one owns it.

DisC (Design is Code) applies London-school TDD to AI code generation. You draw a sequence diagram. Each arrow becomes one verify() in a mockist test. The tests leave exactly one valid implementation. The AI doesn't interpret — it types what the tests require.

It ships as a Claude Code skill. Works best with Opus, should be fine with Sonnet.

Here's a real example. This diagram:

  @startuml
  InvoiceService -> OrderRepository: findAllByCustomerId(customerId)
  InvoiceService <-- OrderRepository: orders: List<Order>
  InvoiceService -> InvoiceBuilderFactory: create()
  InvoiceService <-- InvoiceBuilderFactory: invoiceBuilder: InvoiceBuilder
  loop for each order in orders
      InvoiceService -> InvoiceBuilder: addLine(order)
  end
  InvoiceService -> InvoiceBuilder: build()
  InvoiceService <-- InvoiceBuilder: invoice: Invoice
  @enduml
Generates these tests:

  @Test void shouldFindAllOrdersByCustomerId() { verify(orderRepository).findAllByCustomerId(customerId); }
  @Test void shouldCreateInvoiceBuilder() { verify(invoiceBuilderFactory).create(); }
  @Test void shouldAddLineForOrder() { verify(invoiceBuilder).addLine(order); }
  @Test void shouldBuildInvoice() { verify(invoiceBuilder).build(); }
  @Test void shouldReturnInvoice() { assertThat(result).isEqualTo(invoice); }
AI implements the only thing that passes:

  public Invoice generateInvoice(UUID customerId) {
      List<Order> orders = orderRepository.findAllByCustomerId(customerId);
      InvoiceBuilder invoiceBuilder = invoiceBuilderFactory.create();
      orders.forEach(invoiceBuilder::addLine);
      return invoiceBuilder.build();
  }
4 arrows + 1 loop → 5 tests → 1 possible implementation.

Java + Spring only for now. Orchestration code only (not algorithms). PlantUML format. The mockist coupling tradeoff is real — but when AI writes the implementation, refactoring cost moves from code to the diagram.

Try it without installing anything — clone the demo repo and run in a Claude Code session:

  git clone https://github.com/mossgreen/design-is-code-demo
  cd design-is-code-demo
  /disc 01_hello-world.puml
Blog: https://mossgreen.github.io/introducing-design-is-code/

Plugin: https://github.com/mossgreen/design-is-code-plugin

Happy to hear what breaks, what's missing, and whether this is worth expanding to other languages.

1

I made a Uniswap v3 Hedge Rebalancer that manages shorts on Hyperliquid #

github.com
0 comments · 1:07 PM · View on HN
I wrote a blog post about it: https://blog.carter2099.com/posts/6

tldr: Delta Neutral is a self-hosted Rails app that lets you enter hedge configurations for your Uniswap v3 positions. You enter a target hedge, e.g. 50%, and shorts are opened on Hyperliquid accordingly. As your asset composition changes in the pool, your shorts are rebalanced to stay at your target hedge.
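
The rebalancing arithmetic is simple to sketch (function names and numbers are illustrative, not taken from the repo):

```python
# Target short size tracks the pool's current exposure times the hedge
# ratio; the rebalance is just the difference from the current short.
def target_short(eth_in_pool: float, hedge_ratio: float) -> float:
    return eth_in_pool * hedge_ratio

def rebalance_delta(current_short: float, eth_in_pool: float,
                    hedge_ratio: float) -> float:
    """Positive -> increase the short, negative -> reduce it."""
    return target_short(eth_in_pool, hedge_ratio) - current_short

# Example: the pool drifted from 10 ETH to 8 ETH as price moved through
# the range, so a 5 ETH short at a 50% hedge is now 1 ETH too large.
print(rebalance_delta(current_short=5.0, eth_in_pool=8.0, hedge_ratio=0.5))
```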

Here's the code: https://github.com/carter2099/delta_neutral

It builds on my Ruby Hyperliquid SDK: https://github.com/carter2099/hyperliquid

1

I solo-built Sovereign-Mohawk – FL with 500K nodes and 55% BFT #

rwilliamspbg-ops.github.io
0 comments · 9:52 PM · View on HN
The Problem: Standard Federated Learning (FL) hits a wall at scale. When you move from a few hundred nodes to 500,000, two things happen: communication overhead explodes ($O(n)$ or $O(n^2)$), and the "honest majority" assumption falls apart. Most BFT systems (like PBFT or HotStuff) are hard-capped at <33% or <50% malicious actors.

The Breakthrough: I developed the Sovereign-Mohawk Protocol. In a stress test conducted yesterday, it successfully coordinated 500,000 nodes in 4 minutes and 8 seconds, maintaining model accuracy even with 55.5% of nodes acting maliciously (gradient poisoning and sybil attacks).

How it works (the TL;DR):

- Hierarchical Streaming Aggregation: Instead of a central parameter server, Mohawk uses a tree-based batching architecture. This drops communication complexity to $O(d \log n)$.
- Tiered Rényi Differential Privacy: I integrated DP directly into the consensus layer. By using Rényi DP ($\epsilon = 0.98$), we can filter outliers (malicious gradients) more aggressively than standard median-based aggregators.
- zk-SNARK Verifiability: Every aggregation step generates a 200-byte proof. The central coordinator can verify the integrity of 500,000 contributions in constant time without re-computing the gradients.

The Stress Test Results (Feb 24, 2026):

- 40% Byzantine: 86.6% accuracy | 9.1s avg round time
- 50% Byzantine: 85.8% accuracy | 10.5s avg round time
- 55.5% Byzantine: 81.0% accuracy | 9.9s avg round time (the theoretical "Mohawk Limit")

Why Solo? I wanted to prove that Sovereign AI infrastructure doesn't require a Google-sized team. This implementation is written in Go with a Wasm host, allowing it to run on anything from an NVIDIA Jetson to an Apple Silicon NPU.

Links:

- Repo: Sovereign Map Federated Learning
- Research/Docs: Sovereign-Mohawk Protocol Site

I'm particularly looking for feedback on the BFT boundary proofs. Is $55.5\%$ the absolute limit for DP-weighted aggregation, or can we push to 60% with higher noise injection?
1

Pulse Running – find nearby runners and join their sessions (iOS beta) #

testflight.apple.com
0 comments · 2:13 PM · View on HN
I kept skipping runs because I had nobody to go with. I looked for a solution that let me find other runners in real time near me — nothing existed. So I built it.

Pulse shows you nearby runners on a map with upcoming sessions you can join. Or create your own and let others find you. No club signup, no group chat management.

It's in early TestFlight beta (iOS only for now). I'm looking for runners who'd use this to tell me what's broken and what's missing.

TestFlight link: https://testflight.apple.com/join/9f56taxY
1

StarkZap – Gasless Bitcoin Payments SDK for TypeScript #

github.com
2 comments · 2:39 PM · View on HN
StarkZap is an open-source TypeScript SDK for adding Bitcoin-backed asset transfers, balances, lending & borrowing and staking to web, mobile, or server apps. The goal is to let developers integrate programmable Bitcoin assets without requiring users to install wallets, manage seed phrases, or hold gas tokens. Under the hood it runs on Starknet and supports WBTC, tBTC, LBTC, SolvBTC, and other Bitcoin-backed ERC20 tokens.

```
import { StarkZap, StarkSigner, Amount, fromAddress, getPresets } from "starkzap";

const sdk = new StarkZap({ network: "sepolia" });
const wallet = await sdk.connectWallet({
  account: { signer: new StarkSigner("0xYOUR_PRIVATE_KEY") },
});

await wallet.ensureReady({ deploy: "if_needed" });

const { STRK } = getPresets(wallet.getChainId());
const balance = await wallet.balanceOf(STRK);

if (balance.gte(Amount.parse("10", STRK))) {
  const tx = await wallet.transfer(STRK, [
    { to: fromAddress("0xRECIPIENT"), amount: Amount.parse("10", STRK) },
  ]);
  await tx.wait();
  console.log(tx.explorerUrl);
}
```

Key properties:

- Gas sponsorship via paymaster (users don’t need gas tokens)
- Multiple auth strategies (email/social via Privy, passkeys via Cartridge)
- Batch transfers and contract calls in a single atomic transaction
- Works in Node, browser, and React Native

The SDK abstracts account management, fee handling, and wallet popups. This won’t make sense for every app (e.g., if you only need fiat checkout). It’s for existing apps that want programmable onchain assets without the wallet UX.

Would appreciate feedback on the API design and whether this abstraction makes sense.

1

KeyEnv – manage team secrets without scattered .env files #

keyenv.dev
0 comments · 4:26 PM · View on HN
Tired of .env files getting out of sync across team members and environments? KeyEnv is a CLI-first secrets manager that replaces scattered .env files with a secure, encrypted store.

- Pull secrets with a single command: keyenv pull
- Secrets are AES-256-GCM encrypted at rest
- Per-project, per-environment scoping (dev/staging/prod)
- Team access controls + full audit trail
- Works with existing apps that read from environment variables — zero code changes

The problem we kept running into: teams share secrets over Slack, check in .env.example files with real values, or have 5 different versions of the same key floating around. KeyEnv eliminates that whole category of problems.

We'd love feedback, especially from teams dealing with microservices or multi-environment setups.

1

Idea Reality MCP – Pre-build reality check for AI coding agents #

github.com
0 comments · 6:01 PM · View on HN
I kept building things that already existed. Last month I spent 6 hours on a tool before discovering 847 similar repos on GitHub.

So I built an MCP server that checks before your AI starts coding. Install with `uvx idea-reality-mcp`, and Claude/Cursor will automatically scan GitHub + Hacker News for existing implementations before writing a single line.

Returns: reality_signal (0-100), duplicate_likelihood, top 5 similar repos, evidence from multiple sources, and pivot suggestions.

It's a protocol layer, not a SaaS dashboard — the check happens inside your IDE workflow.

Python, MIT licensed, zero config. Would love feedback on the scoring algorithm.

1

GhostVM – native macOS VMs for secure dev and isolated agent workflows #

github.com
0 comments · 6:02 PM · View on HN
I built GhostVM to make running untrusted or experimental code on macOS safer without sacrificing the dev experience.

It runs a full macOS VM using Apple’s virtualization framework, with snapshots and explicit host bridges (clipboard, file transfer, ports) so you can control what crosses the boundary.

I originally built it to sandbox agent-driven workflows and risky installs I wouldn’t run directly on my host machine. Happy to answer questions or discuss tradeoffs.

Website + docs: ghostvm.org

1

An "earned autonomy" architecture for AI agents using Subjective Logic #

kenschachter.substack.com
0 comments · 9:52 PM · View on HN
Most agent systems treat autonomy as binary: the agent either does the thing or asks permission first. In practice, this means you end up rubber-stamping a stream of approval requests until you stop paying attention. The system designed to keep you in control trains you to stop caring.

To manage operations for my independent video game studio, I built a trust system that works more like onboarding a new hire. Agents start in draft mode (every action needs approval), and earn autonomy over time based on their track record in specific task categories.

The core idea: each agent maintains a separate Beta distribution per task category (support triage, expense reports, publisher emails, etc.). A Beta distribution is basically a track record parameterized by successes and failures. But raw E[p] = α/(α+β) can't tell the difference between "9 successes, 0 failures" and "90 successes, 10 failures" since both give E[p] = 0.90. So I use Jøsang's Subjective Logic to map these to opinion tuples that explicitly separate belief from uncertainty. High uncertainty means "not enough data yet," which is different from "we know this agent is bad."

Every action passes through a gate:

  VoI = stakes × (1 - trust) × uncertainty
Low VoI = auto-execute. High VoI = draft for human review. Static trust thresholds set the maximum autonomy level an agent can reach (Auto-Execute, Soft-Execute, Draft, Restricted), and VoI acts as a secondary gate that can restrict it further based on context — an agent might qualify for auto-execute in general, but a high-stakes situation still gets flagged.
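
A worked instance of the gate (the threshold value is my assumption; the real cutoffs are the static trust thresholds the post describes):

```python
# Value-of-Information gate: stakes, trust, and uncertainty in [0, 1].
def voi(stakes: float, trust: float, uncertainty: float) -> float:
    return stakes * (1 - trust) * uncertainty

THRESHOLD = 0.05   # assumed cutoff; below it, the action auto-executes

# Routine task, trusted agent, low uncertainty: auto-execute.
routine = voi(stakes=0.2, trust=0.9, uncertainty=0.1)
# High stakes can still flag a generally trusted agent for review.
high_stakes = voi(stakes=0.9, trust=0.9, uncertainty=0.6)

print(routine, routine < THRESHOLD)        # small VoI, runs unattended
print(high_stakes, high_stakes < THRESHOLD)  # large VoI, drafted for review
```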

Three things that made the biggest difference:

1. Edit distance feedback. If you rewrite half an email before hitting "approve," the system notices. A 0% edit = full trust credit. A 71%+ rewrite = penalty. This single change prevented agents from reaching auto-execute on work users were quietly fixing.

2. Time-based decay. Trust scores decay daily for inactive categories (λ = 0.95). If an agent hasn't done a task in two months, it gets supervised again. This also handles model upgrades, since the track record was earned on a different model.

3. Weakest-link chains. Multi-step workflows (send welcome email → create project → schedule meeting → notify team) use a weakest-link model. If any step needs approval, the whole chain surfaces as one inbox item. Nothing runs until you approve the full picture.

The core mapping from track record to opinion looks like this:

  def beta_to_opinion(alpha, beta, base_rate=0.5):
      n = alpha + beta
      return Opinion(
          belief=(alpha - 1) / n,
          disbelief=(beta - 1) / n,
          uncertainty=2 / n,
          base_rate=base_rate,
      )
The math is all well-established (Beta distributions, Subjective Logic, Value of Information). The part that worked was combining them into something that mirrors how trust actually develops between people.
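
To see why the mapping matters, here is the snippet above restated self-contained and applied to the two track records from earlier (I map "9 successes, 0 failures" to Beta(10, 1), i.e. counts plus a uniform prior; that convention is my assumption):

```python
# Same expected value, very different uncertainty: the opinion mapping
# separates "not enough data" from "known to be good".
from dataclasses import dataclass

@dataclass
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

def beta_to_opinion(alpha, beta, base_rate=0.5):
    n = alpha + beta
    return Opinion((alpha - 1) / n, (beta - 1) / n, 2 / n, base_rate)

young = beta_to_opinion(10, 1)     # 9 successes, 0 failures + uniform prior
veteran = beta_to_opinion(91, 11)  # 90 successes, 10 failures + uniform prior

print(round(young.uncertainty, 3))    # high: thin track record
print(round(veteran.uncertainty, 3))  # low: well-established track record
```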

Article with full implementation details, code examples, and diagrams: https://kenschachter.substack.com/p/earned-autonomy

1

Tfg – flake.nix generator for Terraform projects #

github.com
0 comments · 7:03 PM · View on HN
Hello HN! I work as an SRE, and our team handles a lot of Terraform projects. One recurring annoyance is matching the Terraform version across all of those projects, which usually means (1) finding the file containing the `required_version` field, (2) finding a nixpkgs commit for that version, and (3) copying a flake.nix from somewhere else. Using an alternative version manager like tfenv or asdf is not an option for some teammates who use NixOS as their daily driver.

This tool parses the HCL of all .tf files in a given directory (usually the current working directory), looks for Terraform's `required_version`, finds a matching nixpkgs commit, and generates or updates flake.nix to use the needed Terraform version.

Any feedback is appreciated. Thanks!

1

GenogramAI – Create Genograms in Seconds #

0 comments · 7:13 PM · View on HN
Genograms are family trees used by therapists to map relationships, trauma patterns, and medical history across generations. They're incredibly useful. They're also a pain to make.

I built GenogramAI so you can just describe your family in plain English and get a properly formatted genogram in seconds. No learning specialized software. No manual dragging of symbols.

Therapists, social workers, and med students have been our early users — but honestly anyone curious about their family dynamics can use it.

https://genogramai.com

1

Open-source EU AI Act compliance layer for AI agents (8/2026 deadline) #

0 comments · 7:15 PM · View on HN
We built AIR Blackbox — open-source compliance infrastructure for AI agents targeting the EU AI Act enforcement deadline on August 2, 2026. If you're deploying LLM-based agents (LangChain, CrewAI, AutoGen, OpenAI Agents SDK) into production, the EU AI Act requires tamper-evident audit trails, human oversight mechanisms, data governance controls, and injection defense — for any system classified as high-risk. Most teams we've talked to either don't know about the deadline or assume their existing logging is enough. It's not. Article 12 specifically requires logs that regulators can mathematically verify haven't been altered. Article 14 requires the ability to interrupt agent execution. Article 15 requires defense against prompt injection and data poisoning. What we built:

Trust layers for LangChain, CrewAI, AutoGen, OpenAI Agents SDK, and RAG pipelines — each is a pip install that hooks into your existing agent code with ~3 lines of setup HMAC-SHA256 tamper-evident audit chains — every agent decision, tool call, and LLM interaction gets logged to a chain that regulators can verify ConsentGate — risk-classifies tool calls and blocks critical operations until approved InjectionDetector — 15+ weighted patterns scanning prompts before they reach the model WriteGate + DriftDetector (for RAG) — prevents knowledge base poisoning and detects retrieval anomalies Compliance scanner — pip install air-compliance && air-compliance scan ./my-project tells you exactly which articles you're missing

Everything maps to specific EU AI Act articles (9, 10, 11, 12, 14, 15). Zero vendor lock-in, Apache 2.0, zero core dependencies on the trust layers. The scanner is probably the fastest way to understand where your gaps are; it takes about 3 seconds to run on a typical project.

GitHub: https://github.com/airblackbox

PyPI: pip install air-compliance

Happy to answer questions about what the EU AI Act actually requires for AI agent deployments — we've read the full regulation and mapped it to specific technical controls.
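The tamper-evident audit chain is the core idea here, and it's small enough to sketch. The following is a minimal, illustrative HMAC-SHA256 hash chain in the spirit of what the post describes; the class and method names are hypothetical, not AIR Blackbox's actual API:

```python
import hashlib
import hmac
import json

class AuditChain:
    """Illustrative tamper-evident log: each entry's MAC covers the
    previous entry's MAC, so altering any record breaks every later link."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._entries = []           # list of (record, mac) pairs
        self._prev_mac = b"genesis"  # seed for the first link

    def append(self, record: dict) -> str:
        # Serialize deterministically and chain in the previous MAC.
        payload = json.dumps(record, sort_keys=True).encode() + self._prev_mac
        mac = hmac.new(self._secret, payload, hashlib.sha256).digest()
        self._entries.append((record, mac))
        self._prev_mac = mac
        return mac.hex()

    def verify(self) -> bool:
        # A verifier holding the key recomputes every link in order.
        prev = b"genesis"
        for record, mac in self._entries:
            payload = json.dumps(record, sort_keys=True).encode() + prev
            expected = hmac.new(self._secret, payload, hashlib.sha256).digest()
            if not hmac.compare_digest(mac, expected):
                return False
            prev = mac
        return True
```

A regulator (or auditor) holding the key can replay the chain and detect any retroactive edit, which is the property Article 12 asks for.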

1

Memctl.com: Open-source shared memory infrastructure for coding agents #

0 comments · 5:08 AM · View on HN
Hey HN. I built memctl because every AI coding agent starts each session with zero context. No memory of past decisions, no shared knowledge across your team. memctl is a memory server that gives AI coding agents persistent context that carries over across sessions. Memory is shared across your team so every agent works with the same knowledge. It's branch-aware so context follows your git workflow, and everything is tracked with full history. It works with any AI coding agent. Open source and self-hostable.
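To make the branch-aware idea concrete, here is a minimal sketch of how branch-scoped lookup with fallback to a default branch could work. This is illustrative only: `BranchMemory` is a hypothetical name, and memctl's real storage model (server, team sharing, full history) is more involved:

```python
class BranchMemory:
    """Illustrative branch-aware key/value memory: reads prefer the
    current branch, then fall back to the default branch, so feature
    branches inherit shared context until they override it."""

    def __init__(self, default_branch: str = "main"):
        self.default_branch = default_branch
        self._store = {}  # (branch, key) -> value

    def put(self, branch: str, key: str, value: str) -> None:
        self._store[(branch, key)] = value

    def get(self, branch: str, key: str):
        if (branch, key) in self._store:
            return self._store[(branch, key)]
        # Fall back to shared memory on the default branch.
        return self._store.get((self.default_branch, key))
```

The fallback rule is what lets context "follow your git workflow": an agent on a feature branch sees team-wide decisions from `main` plus anything recorded on its own branch.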

GitHub: https://github.com/memctl

Website: https://memctl.com

Launches on March 1st. Waitlist open. Would love to hear any feedback!

1

A simple, free web app to track my portfolio across brokers #

erincayaz.github.io faviconerincayaz.github.io
0 comments · 9:21 PM · View on HN
So, I have been investing for some time now. At first, it was easier to track my portfolio because I was just using one broker, but as time went on, my portfolio became bigger and more diversified (for example, I bought physical gold, deposited money into a savings account at Bank A, and also used Broker A and Broker B because only Broker B had the stocks I wanted to buy).

It just got harder to track my investments, and harder to understand whether they were still aligned with the portfolio I had built.

So naturally, I started to search for solutions. At first, I found a few desktop and mobile apps. But the problem with the majority of them was that they were either too complicated to use or just over-engineered. Nearly all the apps had a FIRE calculator, were synced with the market (which was logical, but how can I get the price for physical gold?), or were also trying to track my expenses. I just wanted to track my portfolio. Hence came the second option: using Excel.

And actually, this is the way the majority of people do it. Knowing that, I tried to create an Excel sheet for myself. But the barrier to entry was just too high; I didn't know how to use it, so it just seemed too hard to implement a solution for myself. Also, the user experience just didn't feel very good. I wanted to see pie charts, good fonts, etc. (I could probably do these things with Excel as well, but if I can't even implement a simple sheet, how could I do these cool visuals?).

So, I decided to implement my own solution. My needs were really simple:

- I want to see all my investments on one screen.

- I want to see my P&L.

- I want to see whether my investment ratio is aligned with my portfolio.

- I want to sync it across different devices.

- I want to have different currencies (like EUR, TL) because I invest in different markets.

- I want it to be free.

- I want to see how my investments grow over time.

And that's about it. So, keeping all these things in mind, I built a web app for myself and wanted to share it with you.
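The math behind those needs is simple: convert every position into one base currency, then compute P&L per position and allocation weights across the whole portfolio. A hypothetical sketch (function name, field names, and FX rates are all illustrative, not the app's actual code):

```python
def portfolio_summary(holdings, fx_to_base):
    """holdings: list of dicts with asset, currency, cost, value
    (cost/value in the holding's own currency).
    fx_to_base: currency -> conversion rate into the base currency."""
    rows = []
    total_value = 0.0
    for h in holdings:
        rate = fx_to_base[h["currency"]]
        value = h["value"] * rate
        cost = h["cost"] * rate
        rows.append({"asset": h["asset"], "value": value, "pnl": value - cost})
        total_value += value
    # Allocation weight = each position's share of total portfolio value.
    for r in rows:
        r["weight"] = r["value"] / total_value
    return rows, total_value
```

Comparing the computed weights against a target allocation is then just a per-asset subtraction, which covers the "is my investment ratio aligned with my portfolio" need.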

1

Real-Time AI Design Benchmark #

shuffle.dev faviconshuffle.dev
0 comments · 11:35 AM · View on HN
Hey HN,

We built a different kind of AI benchmark for UI generation.

Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better.

Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi).

Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade), Symfony (Twig), WordPress, or plain HTML.

What we noticed building this:

- Popular benchmarks don't reflect UX/UI quality. Which model does best varies from prompt to prompt, which is why live comparison on a single screen matters.
- Some models overuse wrappers/div soup. Some hallucinate layout constraints.
- Kimi likes Cyrillic, even when none of the other models use it for the same prompt.

The interesting part wasn't ranking models. It was making their outputs easier for humans to compare visually.

Short demo: https://www.youtube.com/watch?v=RCTZlvqMQdc

Curious whether this feels more useful than traditional leaderboard-style AI benchmarks.

Happy to answer technical questions.

Example for HN:

Prompt: Redesign the Hacker News website for 2030, including sample entries that could realistically appear on the platform in that year.

Results: https://shuffle.dev/ai-design/Tjjy7XAFMq25AI

Previews:

Opus: https://shuffle.dev/preview/d6d5ba4eeede381cee7e30c697f010c7...

GPT: https://shuffle.dev/preview/f050359977c1d6dc6c8fc104a24b83c3...

Gemini: https://shuffle.dev/preview/eab78f9748a6d8ccecb94a8b0390f044...

Kimi: https://shuffle.dev/preview/394bb596a8efa50342db4dc88c5f9fab...

1

GrantFlow (FastAPI and LangGraph) for donor-aligned NGO proposal drafts #

github.com favicongithub.com
2 comments · 12:31 PM · View on HN
Hi HN,

I’ve been building GrantFlow, an open-source drafting workflow engine for institutional grant proposals.

The problem: many NGOs and implementing organizations spend a huge amount of time/money translating solid program ideas into donor-specific, reviewable proposal artifacts before they can even get meaningful feedback internally.

A lot of that work is not “thinking through the intervention” — it’s reshaping the same idea into structured outputs (ToC, LogFrame, MEL framing), aligning language to donor expectations, and managing review cycles.

GrantFlow is my attempt to reduce that overhead.

It takes structured project inputs and produces donor-aligned draft artifacts through a stateful workflow with review checkpoints (human-in-the-loop), instead of a single “generate everything” prompt.

What it does today (MVP):

- Donor-aware drafting strategies (specialized + generic donor coverage)
- Human-in-the-loop checkpoints (pause / approve / resume)
- Exportable artifacts (.docx / .xlsx / ZIP)
- RAG-ready donor knowledge namespaces (ChromaDB)
- FastAPI API for integration into internal tools
- Optional API key auth
- Optional SQLite persistence for jobs + HITL checkpoints
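The pause / approve / resume checkpoint pattern can be sketched in plain Python. This is illustrative only: `Job`, `run`, and `approve` are hypothetical names, and GrantFlow's actual implementation is built on LangGraph's stateful graphs rather than this hand-rolled loop:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    steps: list                 # ordered (name, fn) drafting steps
    state: dict = field(default_factory=dict)
    cursor: int = 0             # index of the next step to run
    status: str = "running"

def run(job: Job, checkpoints: set) -> Job:
    """Execute steps until done, or until a checkpoint pauses the job
    for human review (at which point state would be persisted)."""
    while job.cursor < len(job.steps):
        name, fn = job.steps[job.cursor]
        job.state = fn(job.state)
        job.cursor += 1
        if name in checkpoints and job.cursor < len(job.steps):
            job.status = "awaiting_approval"
            return job
    job.status = "done"
    return job

def approve(job: Job, checkpoints: set) -> Job:
    """Human approved the draft so far; resume from the saved cursor."""
    job.status = "running"
    return run(job, checkpoints)
```

The point of the cursor-plus-state shape is that a paused job can be serialized (e.g. to the optional SQLite store) and resumed later without re-running earlier steps, which is what makes review cycles auditable.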

Tech stack:

- FastAPI
- LangGraph
- Pydantic
- ChromaDB (with local/in-memory fallback)
- Python 3.11+

Recent work I finished before posting:

- hardened CI + shell checks
- public API response redaction
- typed response models for status endpoints
- sqlite-backed job/HITL stores + WAL/busy_timeout
- protected PDF ingest endpoint (`POST /ingest`)
- readiness endpoint (`GET /ready`)

Why I built it this way:

- proposal work is iterative and review-heavy
- compliance/rules matter, so workflow/state matters
- teams need checkpoints and auditability, not just raw text generation

Who I think this may be useful for:

- implementing organizations (e.g. firms managing donor-funded programs)
- NGOs and local partners
- civic-tech / govtech teams building internal proposal tooling
- consultants who standardize drafting workflows across donors

Happy to answer questions, especially around workflow design / HITL / donor strategy modeling.