Show HN – February 24, 2026
114 posts

Moonshine – Open-weights STT models, higher accuracy than Whisper Large v3 #
Emdash – Open-source agentic development environment #
Emdash is an open-source and provider-agnostic desktop app that lets you run multiple coding agents in parallel, each isolated in its own git worktree, either locally or over SSH on a remote machine. We call it an Agentic Development Environment (ADE).
You can see a one-minute demo here: https://youtu.be/X31nK-zlzKo
We are building Emdash for ourselves. While working on a cap-table management application (think Stripe Atlas + Pulley), we found our development workflow to be messy: lots of terminals, lots of branches, and too much time spent waiting on Codex.
Emdash puts the terminal at the center and makes it easy to run multiple agents at once. Each agent runs as a task in its own git worktree. You can start one or a few agents on the same problem, test, and review.
Emdash works over SSH so you can run agents where your code lives and keep the parallel workflow. You can assign tickets to agents, edit files manually, and review changes.
We also spent time making task startup fast. Each task can be created in a worktree, and creating worktrees on demand was taking 5s+ in some cases. We now keep a small reserve of worktrees in the background and let a new task claim one instantly. That brought task start time down to ~500–1000ms depending on the provider. We also spawn the shell directly and avoid loading the shell environments on startup.
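The reserve-pool trick can be sketched roughly like this (class and method names are mine, not Emdash's internals; a real implementation would refill asynchronously after each claim):

```python
import subprocess
from collections import deque

class WorktreePool:
    """Keep a small reserve of pre-created git worktrees so a new
    task can claim one instantly instead of paying the create cost."""

    def __init__(self, repo_dir, size=3):
        self.repo_dir = repo_dir
        self.size = size          # hypothetical reserve size
        self.reserve = deque()
        self.counter = 0

    def _create(self):
        # The slow step (5s+ on large repos) happens here, off the hot path.
        path = f"{self.repo_dir}-wt-{self.counter}"
        self.counter += 1
        subprocess.run(
            ["git", "-C", self.repo_dir, "worktree", "add", "--detach", path],
            check=True,
        )
        return path

    def refill(self):
        # Intended to run in the background, not on task start.
        while len(self.reserve) < self.size:
            self.reserve.append(self._create())

    def claim(self):
        # Fast path: hand out a pre-built worktree if one is ready.
        return self.reserve.popleft() if self.reserve else self._create()
```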
We believe using the providers’ native CLIs is the right approach. It gives you the full capabilities of each agent, always. If a provider starts supporting plan mode, you get it right away; we don't have to implement it first.
We support 21 coding agent CLIs today, including Claude Code, Codex, Gemini, Droid, Amp, Codebuff, and more. We auto-detect what you have installed and we’re provider-agnostic by design. If there’s a provider you want that we don’t support yet, we can add it. We believe that in the future, some agents will be better suited for task X and others for task Y. Codex, Claude Code, and Gemini all have fans. We want to be agnostic and enable individuals and teams to freely switch between them.
Beyond orchestration, we try to pull most of the development loop into Emdash. You can review diffs, commit, open PRs, see CI/CD checks, and merge directly from Emdash once checks pass. When starting a task, you can pass issues from Linear, GitHub, and Jira to an agent. We also support convenience variables and lifecycle scripts so it’s easy to allocate ports and test changes.
Emdash is fully open-source and MIT-licensed.
Download for macOS, Linux, or Windows (as of yesterday!), or install via Homebrew: brew install --cask emdash.
We’d love your feedback. What does your coding-agent development setup look like, especially when working with multiple agents? We'd like to learn more about it. Check out our repository here: https://github.com/generalaction/emdash
We’ll be around in the comments — thanks!
Hacker Smacker – spot great (and terrible) HN commenters at a glance #
Main website: https://hackersmacker.org
Chrome/Edge extension: https://chromewebstore.google.com/detail/hacker-smacker/lmcg... Safari extension: https://apps.apple.com/us/app/hacker-smacker/id1480749725 Firefox extension: https://addons.mozilla.org/en-US/firefox/addon/hacker-smacke...
The interesting part is friend-of-a-friend: if you friend someone who also uses Hacker Smacker, you'll see their friends and foes highlighted too. This lets you quickly scan long comment threads and find the good stuff based on people you trust.
I built this to learn how FoaF relationships work with Redis sets, then brought the same technique to NewsBlur's social layer. The backend is CoffeeScript/Node.js/Redis, and the extension works on Chrome, Edge, Firefox, and Safari.
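The FoaF lookup maps directly onto set operations. A minimal sketch using plain Python sets standing in for Redis sets (in Redis this would be SADD to build each `friends:<user>` set and SUNION/SDIFF to combine them; the data is made up):

```python
# Each entry stands in for a Redis set at a key like "friends:alice".
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"dave"},
    "carol": {"dave", "erin"},
}

def friends_of_friends(user):
    """Union of each friend's friend set (SUNION), minus the user
    and their direct friends (SDIFF)."""
    direct = friends.get(user, set())
    foaf = set()
    for f in direct:
        foaf |= friends.get(f, set())
    return foaf - direct - {user}

print(sorted(friends_of_friends("alice")))  # ['dave', 'erin']
```

Highlighting a long comment thread is then just one set-membership check per commenter against the merged friend/foe sets.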
Technically I wrote this back in 2011, but never built a proper auth system until now. So I've been using it for 15 years and it's been great. PG once saw it on my laptop (back when he was still moderating HN, in 2012) and remarked that it was neat.
Thanks to Mihai Parparita for help with the Chrome extension sandboxing and Greg Brockman for helping design the authentication system.
Source is on GitHub: https://github.com/samuelclay/hackersmacker
Directly inspired by Slashdot's friend/foe system, which I always wished HN had. Happy to answer questions!
Linex – A daily challenge: placing pieces on a board that fights back #
I wanted to share a web game I’ve been building in HTML, JavaScript, MySQL, and PHP called LINEX.
It is primarily designed and optimized to be played in the mobile browser.
The idea is simple: you have an 8x8 board where you must place pieces (Tetris-style and some custom shapes) to clear horizontal and vertical lines.
Yes, some might think this has already been done, but let me explain.
You choose where to place the piece and how to rotate it. The core interaction consists of "drawing" the piece tap-by-tap on the grid, which provides a very satisfying tactile sense of control and requires a much more thoughtful strategy.
To avoid the flat difficulty curve typical of games in this genre, I’ve implemented a couple of twists:
1. Progressive difficulty (The board fights back): As you progress and clear lines, permanently blocked cells randomly appear on the board. This forces you to constantly adapt your spatial vision.
2. Tools to defend yourself: To counter frustration, you have a very limited number of aids (skip the piece, choose another one, or use a special 1x1 piece). These resources increase slightly as the board fills up with blocked cells, forcing you to decide the exact right moment to use them.
The game features a daily challenge driven by a date-based random seed (PRNG). Everyone gets exactly the same sequence of pieces and blockers. Furthermore, the base difficulty scales throughout the week: on Mondays you start with a clean board (0 initial blocked cells, although several will appear as the game progresses), and the difficulty ramps up until Sunday, when you start the game with 3 obstacles already in place.
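A date-seeded daily sequence can be sketched like this (the piece set, seed derivation, and exact weekday ramp are illustrative guesses, not LINEX's actual code; the post only states the Monday and Sunday endpoints):

```python
import datetime
import hashlib
import random

PIECES = ["I", "L", "T", "S", "O", "cross"]  # made-up piece set

def daily_sequence(day, n=8):
    """Everyone playing on `day` draws the same n pieces."""
    digest = hashlib.sha256(day.isoformat().encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.choice(PIECES) for _ in range(n)]

def initial_blockers(day):
    # Monday -> 0 ... Sunday -> 3: one plausible weekly ramp.
    return day.weekday() // 2

day = datetime.date(2026, 2, 24)
assert daily_sequence(day) == daily_sequence(day)  # deterministic per date
```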
In addition to the global medal leaderboard, you can add other users to your profile to create a private leaderboard and compete head-to-head just with your friends.
Time is also an important factor, as in the event of a tie in cleared lines, the player who completed them faster will rank higher on the leaderboard.
I would love for you to check it out. I'm especially looking for honest feedback on the difficulty curve, the piece-placement interaction (UI/UX), or the balancing of obstacles/tools, although any other ideas, critiques, or suggestions are welcome.
Thanks!
Scheme-langserver – Digest incomplete code with static analysis #
I built it because I was tired of Scheme/Lisp's ragged development environment, especially the lack of an IDE-like, highly customized programming experience. Though DrRacket and many REPL-based counterparts have done much, common cases like the following still don't reach the level of other modern languages:

    (let* ([ready-for-reference 1]
           [call-reference (+ ready-for-)]))

Apparently, the `ready-for-` inside `call-reference` should trigger an auto-complete option with `ready-for-reference` as a candidate. The server should also know that both have the type number, and that their scope is bounded by `let*`'s outer brackets. I had wished for an IDE with such features, and such small wishes gradually accumulated over the past ten years until, finally, no ready-made product satisfied me. For further information, see my GitHub repository: it includes a screen recording showing how your code gets help from this project, and the project has detailed documentation, so don't hesitate to use it.
Here are some other things to share with Hacker News readers:
1. Why I don't use DrRacket: the LSP approach follows the KISS (Keep It Simple, Stupid) principle, and I don't want to get involved with font issues like the ones I just read about in DrRacket's GitHub issues.

2. The newest stage of scheme-langserver: it has achieved a kind of self-hosting, meaning I can now continue developing it with the help of its own VS Code plugin. However, I used Chez Scheme's tokenizer directly, which led to several uncaught exceptions; I promise to fix them in the future, but I'm currently occupied with developing new features. If something seems wrong with scheme-langserver, rebooting VS Code generally works.

3. Technology road map: I'm now developing a new macro expander so that users can customize LSP behavior by writing their own macros, without altering this project. After that, I plan to improve efficiency and fix bugs.

4. Do I need any help? Yes. And I'd like to say that talking with me about scheme-langserver is also a kind of help.

5. Long-term view: I suspect that in 2 or 3 years I will lose focus on this project, but according to some of my friends, I may integrate it with other fantastic work.
Beehive – Multi-Workspace Agent Orchestrator #
i built beehive for myself mostly. it has gotten to the point where my work consists of supervising oc or cc labor at tasks for multiple issues in parallel. my setup used to be zellij with a couple tabs, each tab working in a separate dir, and it was a pain to manage all that. i know i could use git worktrees but they're kind of complicated; if you don't know how to use them it is easy to mess up, and i just prefer letting agents run in separate dirs with their own .git and not risk it. while i like zellij and use it inside beehive, i don't like the tabs and i forget where i am half the time.
beehive is a way for me to abstract that away. the heuristic is simple - hives are repos, so you basically have a bunch of hives which correspond to repos you work out of. each hive can have many combs. a comb is a dir with the copy of the repo you're working on. fully isolated, standalone, no shared .git. so for work or for personal stuff, i usually set up the hive, and then have a bunch of combs that i jump between supervising the agents do their thing. if you have a big repo it takes a minute to clone, and you also need gh and git because i like the niceties of like checking if the repo is there at all and stuff like that.
the app is open source, mit license. i went with tauri because i hate electron. also i have friends and coworkers who updated to macos 26 and i don't know if the whole mem leak thing for electron apps has been fixed. the app is like 9 megs which is nice too. most of it is written with cc, but i guided the aesthetics and the approach. works on mac and there is a dmg signed and notarized (i reactivated my apple dev credentials).
sharing this to get a vibe check on the idea, also maybe this is useful for you. there are many arguments, reasonable ones, you can make for worktrees vs dirs. i just know that trees are too big brain for me, and i like simple things. if you like it, pls lmk and also if you want to help (like add linux support, or like add themes, other cool things) please make a pr / open an issue.
Tag Promptless on any GitHub PR/Issue to get updated user-facing docs #
Frances and I really appreciated the feedback from our first launch. Today we’re launching Promptless 1.0, which addresses our biggest learnings from the last 12 months.
I also made it way easier to try out. You can tag @promptless on any open-source GitHub PR or issue with a doc-update request, and Promptless will create a fork and open a PR against your docs. Feel free to use our own docs as a playground: https://github.com/Promptless/docs/issues
Or, you can sign up at https://promptless.ai to get free access for your own docs for the next 30 days. Here's a demo video: https://youtu.be/IWwimHCEY7Y
For me, the coolest part of the last year has been seeing how users got creative with Promptless. One user has Promptless listening in to all their Slack Connect channels, so whenever they answer a customer question, Promptless figures out if their docs should be updated and drafts an update if so. Another user has Promptless processing every customer meeting transcript and updating their internal docs after each meeting: customer dashboards, feature request pages, etc.
Some of the biggest things that are new with version 1.0:
- Automatically updating screenshots: this was by far our most requested feature. The need here was always clear. People would exclude screenshots from docs because they’d get stale quickly, even though they knew screenshots would be helpful to users. A year ago, we just couldn't ship a good enough solution, but given how much LLMs' visual grounding has improved in the last year, now we've got something we're proud of.
- Slop-free writing: The most common critique on early Promptless suggestions was that even though they were accurate, they could sound generic or verbose, or might just reek of AI slop. Promptless 1.0 is 3.5x better at this (measured by voice-alignment compared to what users actually published), through a combination of fine-tuned models, sub-agents, and alignment on user-defined preferences.
- Open-source program: We're especially proud of this—Promptless is now free for CNCF/Linux Foundation projects (reach out if you’re a maintainer!). You can take a look at how Promptless is supporting Vitess (a CNCF-graduated project) with their docs here: https://github.com/vitessio/website/commits
Check it out and let us know if you have any questions, feedback, or criticism!
SNKV – SQLite's B-tree as a key-value store (C/C++ and Python bindings) #
SNKV cuts the top three layers and talks directly to SQLite's B-tree engine. No SQL strings. No query planner. No VM. Just put/get/delete on the same storage core that powers SQLite.
Python:

    pip install snkv

    from snkv import KVStore

    with KVStore("mydb.db") as db:
        db["hello"] = "world"
        print(db["hello"])  # b"world"
C/C++ (single-header, drop-in):

    #define SNKV_IMPLEMENTATION
    #include "snkv.h"

    KVStore *db;
    kvstore_open("mydb.db", &db, KVSTORE_JOURNAL_WAL);
    kvstore_put(db, "key", 3, "value", 5);
Benchmarks vs SQLite WITHOUT ROWID (1M records, identical settings):

- Sequential writes: +57%
- Random reads: +68%
- Sequential scan: +90%
- Random updates: +72%
- Random deletes: +104%
- Exists checks: +75%
- Mixed workload: +84%
- Bulk insert: +10%
Honest tradeoffs:
- LMDB beats it on raw reads (memory-mapped)
- RocksDB beats it on write-heavy workloads (LSM-tree)
- sqlite3 CLI won't open the database (schema layer is bypassed by design)

What you get: ACID, WAL concurrency, column families, crash safety — with less overhead for read-heavy KV workloads.
Quantifying opportunity cost with a deliberately "simple" web app #
A while ago I had a mildly depressing realization.
Back in 2010, I had around $60k. Like a "responsible" person, I used it as a down payment on an apartment. Recently, out of curiosity, I calculated what would have happened if I had instead put that money into NVIDIA stock.
I should probably add some context.
For over 10 years I've worked as a developer on trading platforms and financial infrastructure. I've personally never traded on public markets. Early on I made a simple rule for myself: "never play".
In 2015, when Bitcoin traded at about $300, my brother and I were talking about whether it was a bubble. He made a bold claim that one day it might reach $100k per coin. I remember thinking it sounded unrealistic - and even if it wasn't, I wasn't going to break my rule.
That internal tension - building systems around markets while deliberately staying out of them - is probably what made the "what if?" question harder to ignore years later.
The result was uncomfortable. The opportunity cost came out to tens of millions of dollars.
That thought stuck with me longer than it probably should have, so I decided to build a small experiment to make this kind of regret measurable: https://shouldhavebought.com
At its core, the app does one basic thing: you enter an asset, an amount, and two dates, and it gives you a plain numeric result - essentially a receipt for a missed opportunity.
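The receipt itself is one line of arithmetic; the prices below are invented for illustration:

```python
def missed_gain(amount, price_then, price_now):
    """What `amount` invested at price_then would be worth now,
    minus the original stake: the 'receipt for a missed opportunity'."""
    units = amount / price_then
    return units * price_now - amount

# Hypothetical prices, for illustration only:
print(missed_gain(60_000, 3.0, 140.0))  # 2740000.0
```

Everything else the app does — validation, data normalization, caching — exists to feed this function correct numbers.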
I intentionally designed the UI to feel raw and minimal, almost like a late-90s terminal. No charts, no images, no emotional cushioning - just a number staring back at you.
What surprised me wasn't the result, but how much modern web infrastructure it took to build something that looks so simple.
Although the app is a single page with almost no UI elements, it still required:
- Client-side reactivity for a responsive terminal-like experience (Alpine.js)
- A traditional backend (Laravel) to validate inputs and aggregate historical market data
- Normalizing time-series data across different assets and events (splits, gaps, missing days)
- Dynamic OG image generation for social sharing (with color/state reflecting gain vs loss)
- A real-time feed showing recent calculations ("Wall of Pain"), implemented with WebSockets instead of a hosted service
- Caching and performance tuning to keep the experience instant
- Dealing with mobile font rendering and layout quirks, despite the "simple" UI
- Cron and queueing for historical data updates
All of that just to show a number.
Because markets aren't one-directional, I also added a second mode that I didn't initially plan: "Bullet Dodged". If someone almost bought an asset right before a major crash, the terminal flips state and shows how much capital they preserved by doing nothing. In practice, this turned out to be just as emotionally charged as missed gains.
Building this made me reflect on how deceptive "simplicity" on the web has become. As a manager I know says: "It's just adding a button", but even recreating a deliberately primitive experience today requires understanding frontend reactivity, backend architecture, real-time transport, social metadata, deployment, and performance tradeoffs.
I didn't build this as a product so much as an experiment - part personal curiosity, part technical exploration.
I'd be very interested to hear how others think about:
Where do you personally draw the line on stack complexity for small projects?
Would you have gone fully static + edge functions for something like this?
How much infrastructure is "too much" for a deliberately minimal interface?
And, optionally, what was your worst "should have bought" moment?
Happy to answer any technical questions or dig into specific implementation details if useful.
Recursively apply patterns for pathfinding #
One of the biggest problems, in my view, with training an AI to do autorouting is the traditional grid-based representation of autorouting problems, which challenges spatial understanding. But we know that vision models are very good at classifying, so I wondered if we could train a model to output a path as a classification. But then how do you represent the path? This led me down the track of trying to build an autorouter that represents paths as a bunch of patterns.
More details: https://blog.autorouting.com/p/the-recursive-pattern-pathfin...
L88 – A Local RAG System on 8GB VRAM (Need Architecture Feedback) #
I’ve been working on a project called L88 — a local RAG system that I initially focused on UI/UX for, so the retrieval and model architecture still need proper refinement.
Repo: https://github.com/Hundred-Trillion/L88-Full
I’m running this on 8GB VRAM and a strong CPU (128GB RAM). Embeddings and preprocessing run on CPU, and the main model runs on GPU. One limitation I ran into is that my evaluator and generator LLM ended up being the same model due to compute constraints, which defeats the purpose of evaluation.
I’d really appreciate feedback on:
Better architecture ideas for small-VRAM RAG
Splitting evaluator/generator roles effectively
Improving the LangGraph pipeline
Any bugs or design smells you notice
Ways to optimize the system for local hardware
I’m 18 and still learning a lot about proper LLM architecture, so any technical critique or suggestions would help me grow as a developer. If you check out the repo or leave feedback, it would mean a lot — I’m trying to build a solid foundation and reputation through real projects.
Thanks!
Out Plane – A PaaS I built solo from Istanbul in 3 months #
I posted Out Plane here last week. Wanted to share an update because I've been shipping a lot.
I started this because deploying side projects was killing my motivation. Build something fun over a weekend, then waste two days on Dockerfiles, nginx, and SSL. So I built what I wanted — connect GitHub, push code, get a URL. Done.
Since December I've added managed PostgreSQL, managed Redis with RedisInsight built in, Dockerfile auto-detection that pre-fills your config, real-time metrics, and scale to zero — no traffic means no bill. Per-second pricing, not hourly. Same Next.js + Postgres app costs me $2.40/mo vs $12–47 on other platforms.
No CLI yet, docs need work, ~200 users. Just me, no team, no funding. But people are running real stuff on it.
$20 free credit, no credit card. I read all feedback personally — I'm the only one here.
Declarative open-source framework for MCPs with search and execute #
Today I’m launching Hyperterse 2.0, a schema-first framework for building MCP servers directly on top of your existing production databases.
If you're building AI agents in production, you've probably run into this: agents need access to structured, reliable data, but wiring your business logic to MCP tools is tedious. Most teams end up writing fragile glue code. Or worse — giving agents unsafe, overbroad access.
There isn’t a clean, principled way to expose just the right data surface to agents.
Hyperterse lets you define a schema over your data and automatically exposes secure, typed MCP tools for AI agents.
Think of it as: Your business data → controlled, agent-ready interface.
Some key properties: a schema-first access layer, typed MCP tool generation, support for existing Postgres, MySQL, MongoDB, and Redis databases, fine-grained exposure of queries, and a design built for production agent workloads.

v2.0 focuses heavily on MCP, with first-class MCP server support, cleaner schema ergonomics, better type safety, and faster tool surfaces.

All of this with only two tools, search and execute, which drastically reduces token usage.

Hyperterse is useful if you are:
- Building AI agents/copilots
- Adding LLM features to existing SaaS
- Trying to safely expose internal data to agents
- Tired of bespoke MCP glue layers
I’d love feedback — especially from folks running agents in production.
Open-source LLM and dataset for sports forecasting (Pro Golf) #
We fine-tuned gpt-oss-120b with LoRA on 3,178 golf forecasting questions, using GRPO with Brier score as the reward.
Our model outperformed GPT-5 on Brier Skill (17% vs 12.8%) and ECE (6% vs 10.6%) on 855 held-out questions.
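For readers unfamiliar with the metrics: the Brier score is the mean squared error of probabilistic forecasts, and Brier Skill measures improvement over a reference forecaster. A small sketch (the 0.5 baseline is my own illustrative choice of reference, and the numbers are toy data):

```python
def brier(probs, outcomes):
    """Mean squared error of forecast probabilities (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill(probs, outcomes, ref_probs):
    """1 - BS/BS_ref: the fraction of the reference's error eliminated."""
    return 1.0 - brier(probs, outcomes) / brier(ref_probs, outcomes)

probs = [0.9, 0.2, 0.7, 0.1]
outcomes = [1, 0, 1, 0]
baseline = [0.5] * len(outcomes)  # assumed reference forecaster
print(round(brier(probs, outcomes), 4))                # 0.0375
print(round(brier_skill(probs, outcomes, baseline), 2))  # 0.85
```

Using Brier score directly as the GRPO reward means the model is trained to be calibrated, not just right.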
How to try it: the model and dataset are open-source, with code, on Hugging Face.
How to build your own specialized model: Update the search queries and instructions in the Lightning Rod SDK to generate a new forecasting dataset, then run the same GRPO + LoRA recipe.
SDK link: https://github.com/lightning-rod-labs/lightningrod-python-sd... Dataset: https://huggingface.co/datasets/LightningRodLabs/GolfForecas... Model: https://huggingface.co/LightningRodLabs/Golf-Forecaster
Questions, feedback on the SDK, suggestions for new domains to try this on - all are welcome.
AgentBudget – Real-time dollar budgets for AI agents #
I built AgentBudget after an AI agent loop cost me $187 in 10 minutes — GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend.
AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines:
    import agentbudget
    agentbudget.init("$5.00")
It monkey-patches the OpenAI and Anthropic SDKs (the same pattern as Sentry/Datadog), so existing code works without changes. When the budget is hit, it raises BudgetExhausted before the next API call goes out.

How it works:
- Two-phase enforcement: estimates cost pre-call (input tokens + average completion), reconciles post-call with actual usage. Worst-case overshoot is bounded to one call.
- Loop detection: sliding window over (tool_name, argument_hash, timestamp) tuples. Catches infinite retries even if budget remains.
- Cost engine: pricing table for 50+ models across OpenAI, Anthropic, Google, Mistral, Cohere. Fuzzy matching for dated model variants.
- Unified ledger: tracks both LLM calls and external tool costs (via track() or @track_tool decorator) in a single session.
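The two-phase idea can be reduced to a few lines (class and method names here are illustrative, not AgentBudget's actual API):

```python
class BudgetExhausted(Exception):
    pass

class Budget:
    """Two-phase enforcement: estimate before the call, reconcile after."""

    def __init__(self, limit_usd):
        self.limit = limit_usd
        self.spent = 0.0

    def precheck(self, estimated_cost):
        # Phase 1: refuse before the request leaves, so worst-case
        # overshoot is bounded by a single call's estimation error.
        if self.spent + estimated_cost > self.limit:
            raise BudgetExhausted(
                f"${self.spent:.2f} spent of ${self.limit:.2f} limit"
            )

    def reconcile(self, actual_cost):
        # Phase 2: replace the estimate with the billed amount.
        self.spent += actual_cost

budget = Budget(5.00)
budget.precheck(0.03)    # allowed
budget.reconcile(0.028)  # actual usage came in lower
try:
    budget.precheck(6.00)  # would exceed the $5.00 cap
except BudgetExhausted as e:
    print("blocked:", e)
```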
Benchmarks: 3.5μs median overhead per enforcement check. Zero budget overshoot across all tested scenarios. Loop detection: 0 false positives on diverse workloads, catches pathological loops at exactly N+1 calls.
No infrastructure needed — it's a library, not a platform. No Redis, no cloud services, no accounts.
I also wrote a whitepaper covering the architecture and integration with Coinbase's x402 payment protocol (where agents make autonomous stablecoin payments): https://doi.org/10.5281/zenodo.18720464
1,300+ PyPI installs in the first 4 days, all organic. Apache 2.0.
Happy to answer questions about the design.
LookTake – Try anyone's makeup, outfit, or hairstyle on your photo #
I worked at a game company in Korea doing AI research — graphics, vision, and image generation. I built the in-house image gen service there. While reading generative AI papers, I came across virtual try-on research and had a realization: people will eventually shop by seeing products on themselves, not just browsing photos of models. I started experimenting on weekends. The early results were rough, but promising enough that I left my job.
The core technical challenge: when you use image generation models to transfer someone's look onto another person, they either lose your identity or drop the style details. You ask it to transfer a specific makeup look and it gives you a completely different face, or an outfit loses its pattern and texture, or the hairstyle comes out flat. A prompt-only approach just isn't precise enough.
So I built a multi-stage pipeline — object detection, inpainting, and several other steps — to preserve your identity while accurately transferring style details.
Unlike preset filters or brand catalog try-ons, users share styles from their own everyday photos and anyone in the community can try that look on themselves with one tap. It works across three categories: beauty (makeup transfer), fashion (outfit try-on), and hair (style and color).
I launched in the US and Korea about a month ago. Still early and plenty to improve — would love honest feedback. Does the try-on quality feel convincing?
Demo: https://youtube.com/shorts/mDLkiV3D4rI iOS: https://apps.apple.com/app/looktake-share-style-with-ai/id67... Android: https://play.google.com/store/apps/details?id=io.looktake.ap...
I built an iOS app that turns EPUBs into audiobooks #
Two voice options:
- Free on-device voices (processed locally, no server needed)
- Natural cloud voices (one-time purchase per book, no subscription)
Cloud conversion runs chunk by chunk. You can start listening while other chapters generate in the background. Once done, the audiobook lives on your device.
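The chunking step might look roughly like this (the real app is an iOS app; the chunk size and naive sentence splitting here are simplifying assumptions of mine):

```python
def chunk_text(text, max_chars=1000):
    """Split text at sentence boundaries into chunks under max_chars,
    so each chunk can be sent to TTS independently and in order."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = (current + " " + s).strip()
    if current:
        chunks.append(current)
    return chunks
```

Early chunks become playable as soon as they come back, while later ones are still rendering.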
No account required. No subscription. You import your own EPUBs and either use device TTS for free or pay per book for the cloud voices.
Nothing is stored on the backend: neither books nor audio files.
60 Years of Metal Music Data, Visualized #
Some interesting bits: Finland is indeed the country with the most releases per capita in most genres, and there is a clear difference between Asia and Western countries in genre distribution. And those end-of-graph drops that sparked "metal is dying" debates? They are mostly due to incomplete data, and the numbers tend to fill in over time.
The frontend was built with AI assistance; the backend and all data work were done by hand. A code and data release is planned for later this year. PHP, HTML, CSS, and JS were used, with no build pipeline of any sort.
Falcon – Chat-first communities built on Bluesky AT Protocol #
Current architecture: - Electron client - Spring Boot backend (monolith) - REST for servers/channels - Planning WebSocket-based messaging
As a solo builder, I’m trying to balance simplicity with future scalability.
At what point would you introduce: - a separate WebSocket gateway - pub/sub (Redis, etc.) - or keep everything in one Spring app until it breaks?
Curious how others approached real-time chat systems early on.
Project for context: https://github.com/JohannaWeb/ProjectFalcon
A Hacker News–style site focused on European tech #
So, I built this because I noticed that a lot of European startup activity really flies under the radar.
Did you know we have cool startups trying to mine with giant lasers (Hades), forge semiconductor substrates in orbit (Space Forge), build neurosurgical microrobots (Robeauté), build hypersonic missiles (Hypersonica), and dozens of incredible companies pushing the boundaries of photonics, robotics, nuclear fusion, autonomous defence, and lots more?
Hacker News is great, but it's naturally very US-focused. I wanted a place where you can quickly scan:
– European startup and tech news
– Cool startup jobs
– Who's getting funded
The primary goal is signal over noise: more structured, less PR-style content, no ads or fluff, just the most interesting tech-scene information from across Europe.
I'd love to hear your feedback!
– Is something like this useful?
– What would make it genuinely better?
– Is the "HN for Europe" framing fair, or misguided? :)
MiniVim – a minimal Neovim configuration #
The goal was to have a setup that:
starts fast
uses only essential plugins
avoids heavy frameworks
remains easy to understand and extend
The structure is intentionally small.
It’s not meant to compete with full Neovim distributions, but rather serve as a clean base configuration that can be extended gradually.
I use it across multiple machines (laptop, WSL, and servers), so reproducibility and simplicity were priorities.
Feedback is welcome.
Praxis, my personal take on Compound Engineering with AI #
So, with the help of Amp Code CLI, I've built my own take on the compound engineering workflow. I tried to keep it agnostic to project stacks and as efficient as possible, so the context window could be used in the best way. I also wanted it to be extendable (for example, just drop your own subagents for review that are specific to your project). I also wanted to be easy to set up and update, so I made a simple CLI tool that keeps track of files in the `.agents` directory, updates when new versions are found in the repository, and displays a diff in the terminal before overwriting any customisations.
I feel this matches well with my personal preferences when working with AI agents, but I would love to have feedback from more people.
Open-source KYC plugin for Claude – 95min→27min, £85K→£240/year #
Just launched an open-source compliance plugin for Claude Cowork after seeing fintech teams pay £60K+ for platforms that orchestrate free public data.
UK fintech pilot (30 days, 5 analysts):
• 95 minutes → 27 minutes per case
• £85K annual platform cost → £240/year (Claude Pro)
• Uses only free data: OFAC, UN, EU, Companies House, OpenSanctions
17 mandatory human-in-the-loop checkpoints. No auto-approvals. Deterministic risk scoring (MLR 2017 formulas). MIT licensed.
Launching today because Claude just announced Cowork plugin updates: https://www.linkedin.com/posts/claude-ai_were-introducing-up...
Testing if foundation models can replace compliance middleware for standard workflows (~70% of cases).
Demo slides: https://github.com/vyayasan/kyc-analyst/blob/main/docs/demo-... GitHub: https://github.com/vyayasan/kyc-analyst
Happy to answer questions about LLMs in regulated environments.
ProdRescue AI – Turn Slack war-rooms and raw logs into incident reports #
Most of us have been there: It’s 3 AM, there’s an outage, and the #incident channel is exploding with 200+ messages. Once the fix is deployed, the real pain begins—spending 4 hours reconstructing the timeline for the post-mortem.
I built ProdRescue AI to automate this. It’s an incident intelligence engine that correlates technical logs with human context from Slack.
How it works:
Native Slack Integration: Connect via OAuth 2.0. We only access channels you explicitly invite the bot to.
Contextual Correlation: It maps Slack timestamps to log events, identifying not just what failed, but who made which decision and why.
4-Layer Intelligence: We use a pipeline to Sanitize (mask PII), Correlate (logs + chat), Infer (RCA), and Verify (link every claim to a source log line).
Security: We use ephemeral processing. No log retention, no training on your data.
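As a rough sketch, the Sanitize → Correlate → Infer → Verify flow could look like the following. The function bodies, the PII regex, and the 60-second correlation window are illustrative assumptions, not ProdRescue AI's actual code:

```python
import re

def sanitize(logs):
    """Layer 1: mask PII (here, just email addresses) before anything else."""
    return [{**l, "text": re.sub(r"\S+@\S+", "<EMAIL>", l["text"])} for l in logs]

def correlate(logs, slack):
    """Layer 2: attach log lines within 60 s of each Slack message."""
    return [{"msg": m, "evidence": [l for l in logs if abs(l["ts"] - m["ts"]) <= 60]}
            for m in slack]

def infer(pairs):
    """Layer 3: stand-in for root-cause inference; the real system calls an LLM."""
    return [{"finding": f"decision near '{p['msg']['text']}'", "sources": p["evidence"]}
            for p in pairs]

def verify(findings):
    """Layer 4: drop any claim that cannot cite a source log line."""
    return [f for f in findings if f["sources"]]

logs = [{"ts": 100, "text": "db timeout, paging ops@example.com"}]
slack = [{"ts": 130, "text": "rolling back the 3am deploy"}]
report = verify(infer(correlate(sanitize(logs), slack)))
print(report[0]["sources"][0]["text"])   # db timeout, paging <EMAIL>
```

The point of the last layer is that a finding with no evidence simply doesn't survive, which is the mechanism behind the evidence tags described below.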
I’m really interested in your thoughts on the "Evidence-Backed" approach. Instead of just generating a narrative, we link every finding to a specific evidence tag ([1], [2], etc.) to eliminate AI hallucinations.
Check it out here: https://prodrescueai.com
Would love to hear your feedback on the Slack-to-Timeline flow!
acorn – LLM framework for long running agents #
This is Andrei from askmanu and I'm super happy to share a new framework I've been working on: acorn.
It takes the best parts of DSPy, LangChain, Instructor, etc. and wraps them in a beautiful, easy-to-use API. It's very easy to define model I/O, branches, callbacks for every step, and more.
See the getting started docs here: https://github.com/askmanu/acorn/blob/main/docs/getting-star...
Try out the different demos here: https://huggingface.co/spaces/askmanu/acorn
Tessera – An open protocol for AI-to-AI knowledge transfer #
The reference implementation (tessera-core) is a Python/PyTorch library. Current benchmarks show positive transfer across CNN, Transformer, and LSTM pairs. It runs on CPU and the demo finishes in under 60 seconds.
Happy to answer questions about the protocol design, the wire format, or the benchmark methodology.
Interactive 3D Moon with real NASA data and WebGPU #
- NASA CGI Moon Kit textures served via a quadtree LOD tile system
- Oren-Nayar BRDF (lunar regolith is non-Lambertian with strong backscatter)
- Sun position calculated from astronomy-engine (±1 arcminute)
- Scrub through the full lunation cycle or watch in real time
- Earth and Tycho-2 starfield in the background
Tech: Three.js with TSL shaders (compile to both WGSL and GLSL), React Three Fiber, Vite. The shading model was the most interesting part — standard PBR looks completely wrong for the Moon because regolith doesn't have a specular lobe; it actually gets brighter at opposition (the "opposition surge"). Oren-Nayar gets close enough for a web visualization.
Tile system is a geodetic quadtree similar to CesiumJS's approach. Zoom level picks based on screen-space error. Currently 7 levels deep which gets you to ~4 km/pixel at max zoom.
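A hedged sketch of screen-space-error level selection: refine until a tile's projected geometric error falls below a pixel threshold. The base error, FOV, and threshold constants below are invented for illustration, not taken from this project:

```python
import math

BASE_ERROR_M = 20000.0   # assumed geometric error of a level-0 tile, halving per level

def pick_level(distance_m, viewport_h_px=1080, fov_rad=math.radians(60),
               max_error_px=2.0, max_level=7):
    # metres-per-pixel at this distance for a simple perspective camera
    m_per_px = 2.0 * distance_m * math.tan(fov_rad / 2) / viewport_h_px
    level, error = 0, BASE_ERROR_M
    # refine while the tile's error still projects to more than max_error_px
    while level < max_level and error / m_per_px > max_error_px:
        level += 1
        error /= 2.0
    return level

print(pick_level(2_000_000))   # far away: coarse level
print(pick_level(10_000))      # close up: clamped at max_level
```

The same shape of test (projected error vs. a pixel budget) is what CesiumJS-style quadtrees use to decide when to split a tile.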
Would love feedback, especially from anyone who's worked with lunar data or WebGPU in production.
I Indexed My Closet to Make It Easier to Get Ready in the Morning #
Vis Pro – A Formula-Based Workout Program Editor #
About 5 years ago, I built a weightlifting app for 5/3/1 that got me on the front page of HN [0]. After that, life happened. I had kids and so decided to get a job and put that project on ice. Eventually I grew too disappointed with my job, and decided to try building something again.
The biggest feedback I kept getting from users was simple: “Let me create my own programs.”
That’s how Vis started.
The initial idea was to create a B2B platform where gyms and trainers could build programs using formulas (e.g., percentages of 1RM) and reusable blocks instead of spreadsheets. I built what I still think is the best workout editor out there, but I quickly found out that B2B sales is hard (or maybe I just suck at it). It also just didn’t feel like a big enough sell for gyms.
So I pivoted. I focused on the iOS app for a while [1], and now re-packaged the editor so individuals can use it directly.
With Vis Pro, you can:
- Define programs using formulas (e.g., 0.85 * SQUAT_1RM, or even better: RPE(8, SET_REPS) * SQUAT_1RM)
- Build workouts by re-using pre-defined or even custom blocks
- Share your programs and workouts with others
The core idea is that programs are parametric instead of static. Change your 1RM and the entire program recalculates automatically.
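A minimal sketch of the parametric idea, using Python's `eval` in place of Vis Pro's Chevrotain-based formula engine (the variable names follow the examples above; everything else is assumed):

```python
def evaluate(formula, params):
    """Evaluate a workout formula like '0.85 * SQUAT_1RM' against user parameters."""
    # eval is fine for a sketch; a real editor uses a proper parser (Chevrotain here).
    return round(eval(formula, {"__builtins__": {}}, params), 1)

program = ["0.70 * SQUAT_1RM", "0.80 * SQUAT_1RM", "0.90 * SQUAT_1RM"]

# Change your 1RM and the whole program recalculates.
print([evaluate(f, {"SQUAT_1RM": 100}) for f in program])  # [70.0, 80.0, 90.0]
print([evaluate(f, {"SQUAT_1RM": 110}) for f in program])  # [77.0, 88.0, 99.0]
```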
You can try it out without an account at https://vis.fitness/pro/try/create-program
The whole thing is built with NextJS, using Chevrotain (surprisingly solid) for the formula engine.
It's been super interesting using Codex since late December. It's been a huge force multiplier, enabling me to ship really cool features like formula autocomplete and syntax highlighting in a couple of hours. I'm used to reviewing a lot of code from my time at Google, so that hasn't been a problem, but it's interesting to feel that review speed is now the limiting factor. Without that review, though, the codebase would become unmaintainable very quickly.
The next step is building an MCP server to allow users to create programs using LLMs and have them show up directly in the editor (and your phone).
Would love feedback, whether you even lift or not!
[0] https://news.ycombinator.com/item?id=31508009
[1] https://apps.apple.com/us/app/vis-next-generation-workouts/i...
Enseal – Stop pasting secrets into Slack: .env sharing from the terminal #
```
# recipient
$ enseal receive 7-guitarist-revenge
ok: 14 secrets written to .env
```

Zero setup, no accounts, no keys needed for basic use. Channels are single-use and time-limited. The relay never sees plaintext (age encryption + SPAKE2 key exchange).

For teams that want more: identity mode with public key encryption, process injection (secrets never touch disk), schema validation, at-rest encryption for git, and a self-hostable relay.

Written in Rust. MIT licensed. Available via cargo install, prebuilt binaries, or Docker.

Looking for feedback on the UX and security model especially. What would make you actually reach for this instead of the Slack DM?
Detailed documentation here: https://enseal.docsyard.com/
TTSLab – Text-to-speech that runs in the browser via WebGPU #
When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.
You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.
The most experimental feature: a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. You can have a spoken conversation with an AI without a single network request.
Currently supported models:
- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base

Other features:
- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models
Source: https://github.com/MbBrainz/ttslab (MIT)
Feedback I'd especially like:
1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.
Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.
Awsim – Lightweight AWS emulator in Go (40 services in progress) #
Core services (S3, DynamoDB, SQS, Lambda, IAM) implemented, 40+ services in progress.
- Single binary, no auth required
- Instant startup, in-memory storage
- Contributions welcome
Feedback welcome!
MantleDB – Anonymous JSON storage for your side projects #
So I built MantleDB. It’s a simple JSON storage server designed for speed and zero-friction. There is no UI—even registration is handled via the API.
Get started instantly:
curl -s https://mantledb.sh/api/auth/register
You’ll get an AID (Admin ID) for reads/writes and an RID (Read ID) for public-facing reads.
Write to a bucket. Note: Buckets are created on write.
curl -X POST https://mantledb.sh/api/b/YOUR_AID/<bucketname> -d '{"score": 42}'
Read the data back:
curl https://mantledb.sh/api/b/YOUR_RID/<bucketname>
How it works:
Ephemeral by default: To keep things lean, a "scavenger" cron runs daily. On the free tier, buckets with no activity for 72 hours are deleted. Accounts with no buckets are cleared after one week.
Pro Plan: Removes the scavenger, increases bucket limits, and adds atomic operations (Increment, Append, etc.).
Tech Stack: Node.js + SQLite (running on AWS Lightsail).
If the free tier feels too tight or the Pro version feels too pricey, let me know! I’m happy to hand out discount codes or adjust things based on feedback.
I’m mostly looking for people to try and break it or tell me what features would make this their go-to for the next weekend hackathon.
PaperBanana – Paste methodology text, get publication-ready diagrams #
How it works under the hood:
1. A Retriever agent searches a curated database of real academic diagrams to find structurally similar references
2. A Planner agent reads your text and generates a detailed visual description (layout, components, connections, groupings)
3. A Stylist agent polishes the visual aesthetics without changing content
4. Then it enters an iterative loop: a Visualizer generates the image, and a Critic evaluates it and suggests revisions — this repeats 1-5 times (you choose)
The key insight is that academic diagrams follow conventions — Transformer architectures, GAN pipelines, RLHF frameworks all have recognizable visual patterns. By retrieving relevant references first, the output is much closer to what you'd actually put in a paper vs. generic AI image generation.
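A toy sketch of that retrieve → plan → generate/critique loop, with every agent replaced by a stub (the real system calls Gemini 2.5 Flash for planning and critique and image models for generation; all names and the toy reference database below are assumptions):

```python
def retrieve(text):
    # Retriever stub: pick a reference layout by keyword match against a toy database.
    db = {"transformer": "encoder-decoder layout", "gan": "two-column G/D layout"}
    return next((v for k, v in db.items() if k in text.lower()), "generic layout")

def plan(text, ref):
    # Planner stub: turn the text plus reference into a visual description.
    return {"layout": ref, "revisions": 0}

def generate(p):
    # Visualizer stub: stand-in for actual image generation.
    return f"image({p['layout']}, rev={p['revisions']})"

def critique(image, p, max_rounds=3):
    # Critic stub: keep requesting revisions until a fixed round count is reached.
    return p["revisions"] >= max_rounds - 1

def run(text, max_rounds=3):
    p = plan(text, retrieve(text))
    image = generate(p)
    while not critique(image, p, max_rounds):
        p["revisions"] += 1
        image = generate(p)
    return image

print(run("We fine-tune a Transformer encoder..."))
```

The retrieval step is what anchors the loop: the Planner starts from a conventional layout instead of a blank canvas, which is why the output stays close to paper conventions.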
Built with: Next.js + FastAPI + Celery, using Gemini 2.5 Flash for planning/critique and Nanobanana Pro/Seedream for image generation.
Try it here: https://paperbanana.online
Some examples it handles well: Transformer architectures, GAN training pipelines, RLHF frameworks, multi-agent systems, encoder-decoder architectures.
Known limitations:
- Works best for CS/AI methodology diagrams — not optimized for biology, chemistry, or general scientific illustration
- Text rendering in generated images isn't perfect yet — sometimes labels get slightly garbled
- The curated reference database is still small (13 examples); expanding it is ongoing work
Would love feedback from anyone who writes papers regularly. What types of diagrams do you struggle with most?
Dicta.to – Local voice dictation for Mac with on-device AI #
It ships with 4 transcription engines you can swap between: WhisperKit (99 languages), NVIDIA Parakeet TDT 0.6B (25 European languages, fastest of the bunch), Qwen3-ASR 0.6B (30 languages), and Apple Speech on macOS 26+. They all run through CoreML/Metal. Whisper is the most versatile, Parakeet wins on raw latency for European languages, Qwen3 does better with CJK. I went with a protocol-based architecture so you pick the engine that fits your use case instead of me pretending one model rules them all.
After transcription, there's an optional post-processing pipeline using Apple Intelligence (FoundationModels framework, macOS 26+, also fully on-device): auto-correct with filler word removal, tone rewriting, translation. The annoying part was FoundationModels cold start. First inference after idle takes 2-3s, which kills the experience. I worked around it by firing a throwaway mini-inference (`session.respond(to: "ok")`) in parallel while audio is still being transcribed, so the model is already warm when the text arrives. Hacky, but it shaved off the perceived latency.
Getting transcribed text into any arbitrary macOS app was honestly the hardest part. I use clipboard save/restore: read all NSPasteboard types (not just strings, also images, RTF, whatever the user had copied), write the transcribed text, simulate Cmd+V via CGEvent posted to `cghidEventTap`, then restore the original clipboard. Electron apps are slower to process paste events, so I detect them by checking if `Contents/Frameworks/Electron Framework.framework` exists in the app bundle and add extra delay. This whole approach requires Accessibility permissions, which means no sandbox, which means no App Store. I'm fine with that trade-off.
Built this solo in about 6 weeks. One-time purchase, no subscription.
I'm genuinely unsure about the multi-engine approach. Is letting users choose between Whisper/Parakeet/Qwen3 useful, or would most people prefer I just auto-select based on their language? Also curious if anyone has a cleaner approach to text injection on macOS. The clipboard hack works everywhere but it feels fragile and I don't love it.
Tokio-prompt-orchestrator – LLM pipeline orchestration in Rust #
tokio-prompt-orchestrator breaks LLM inference into 5 physical stages (RAG → Assemble → Inference → Post-Process → Stream), each running in its own Tokio task with bounded channels between them. When a stage falls behind, backpressure builds locally instead of blowing up the whole pipeline. Some things that might be interesting to folks here:
- Circuit breakers per provider (OpenAI, Anthropic, local llama.cpp) so one failing API doesn't cascade
- Request deduplication that saved 60-80% on inference costs in my testing
- Prometheus metrics + a TUI dashboard for watching the pipeline in real time
- MCP server integration so you can use it as a Claude Desktop tool
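A rough Python/asyncio analogue of the staged design (the real project uses Tokio tasks and bounded Rust channels; stage names, queue sizes, and the sentinel shutdown are illustrative). Because the queues are bounded, `put` awaits when a downstream stage falls behind, so backpressure builds locally instead of in one giant buffer:

```python
import asyncio

async def stage(inbox, outbox, work):
    # Generic pipeline stage: pull, process, push; None is a shutdown sentinel.
    while True:
        item = await inbox.get()
        if item is None:
            if outbox is not None:
                await outbox.put(None)
            break
        result = await work(item)
        if outbox is not None:
            await outbox.put(result)   # blocks here if the next stage is slow

async def main():
    q1, q2 = asyncio.Queue(maxsize=4), asyncio.Queue(maxsize=4)
    results = []

    async def assemble(x):
        return f"prompt:{x}"

    async def infer(x):
        await asyncio.sleep(0)         # pretend to call the model
        results.append(x)

    tasks = [
        asyncio.create_task(stage(q1, q2, assemble)),
        asyncio.create_task(stage(q2, None, infer)),
    ]
    for i in range(10):
        await q1.put(i)                # producer feels backpressure here
    await q1.put(None)
    await asyncio.gather(*tasks)
    return results

print(asyncio.run(main()))
```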
It's 58k lines of Rust, MIT licensed, no unsafe. Been running it in production for my own projects for a few months now. Would love feedback on the channel sizing heuristics and the retry/backoff strategy, those were the hardest parts to get right. Happy to answer questions about the architecture.
GitHub: https://github.com/Mattbusel/tokio-prompt-orchestrator
WebPerceptor – Enabling AI Mediated Web Browsing #
No more pesky copy/paste, trigger buttons or overlay windows.
When you open a web page, all of its text is automatically sent to an LLM, modified according to some prompt, and re-inserted into the page as it loads.
WebPerceptor is a client-side Chromium plugin I've made to experience such a web.
QueryVeil – An AI data analyst that investigates your data #
I built QueryVeil because I was tired of two things: (1) uploading data to third-party tools, and (2) AI tools that just translate English to one SQL query and call it done.
QueryVeil is an AI data analyst that actually investigates. When you ask "why did revenue drop last month?", it doesn't just run one query — it plans an approach, runs multiple queries, self-corrects when it hits errors, and builds a report with its findings. Like a junior analyst who happens to live in your browser tab.
Everything runs client-side:
- *DuckDB WASM* for SQL execution — your data never leaves your machine
- *WebLLM* for local AI (Llama via WebGPU) — no API keys, no server costs
- *LangGraph agent* for multi-step investigations with tool use
What it actually does:
- Drop in CSV, Excel, JSON, or Parquet files (or connect to Postgres, MySQL, BigQuery)
- Get an instant data brief — row counts, column profiles, anomaly detection, data quality warnings — before you ask anything
- Ask questions in plain English. The AI agent runs multiple queries, self-corrects SQL errors (up to 3 retries), and generates charts automatically
- Proactive insights: correlation detection, outlier flagging, duplicate detection, temporal gap analysis — runs automatically on every new table
- Four modes: Chat, SQL editor (with schema-aware autocomplete), Jupyter-style notebooks (with cell references and variables), and a drag-and-drop report builder
- Share reports and notebooks via public links, embed them, or schedule email delivery
- Command palette (Cmd+K) for quick actions
Free tier: local AI (WebLLM/Ollama), unlimited files, all four modes, auto-insights. Pro ($19/mo or $190/yr): 12+ cloud models via OpenRouter (Claude, GPT-4o, Gemini, DeepSeek, Llama, etc.), database connections, sharing, scheduled reports. 14-day free trial.
Technical details:
- Nuxt 4, Vue 3, Pinia, TailwindCSS
- DuckDB WASM handles millions of rows in the browser
- LangGraph StateGraph with ReAct loop — the agent has tools for SQL execution, schema inspection, column stats, and creating notebooks/reports
- Self-correction: when SQL fails, the error + schema context goes back to the AI for auto-fix
- WebLLM runs Llama-3.2-3B via WebGPU — zero server cost for the free tier
- Ollama support for people who prefer running models locally
- Server-side: Supabase (auth + Postgres), Stripe billing, OpenRouter proxy with model allowlist
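The self-correction step can be sketched as a retry loop. Here `ask_model` is a hard-coded stand-in for the LLM call (it "fixes" a deliberate typo once it sees the error), and the schema and queries are toy examples, not QueryVeil's actual code:

```python
import sqlite3

def ask_model(question, schema, error=None, bad_sql=None):
    # Stand-in for the LLM: first attempt uses a wrong column name; on retry,
    # "seeing" the error plus schema, it produces the corrected query.
    if error is None:
        return "SELECT revenu FROM sales"     # deliberate typo
    return "SELECT revenue FROM sales"

def query_with_self_correction(conn, question, schema, max_retries=3):
    sql, error = ask_model(question, schema), None
    for _ in range(max_retries + 1):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as e:
            # Feed the error + schema context back for an auto-fix.
            error = str(e)
            sql = ask_model(question, schema, error=error, bad_sql=sql)
    raise RuntimeError(f"gave up after {max_retries} retries: {error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.execute("INSERT INTO sales VALUES (42.0)")
print(query_with_self_correction(conn, "total revenue?", "sales(revenue)"))  # [(42.0,)]
```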
Try the demo instantly — no signup, no email: https://app.queryveil.com/demo
It loads sample ecommerce data, auto-profiles it, shows proactive insights, and lets you chat or write SQL. Everything runs in your browser.
Landing page: https://www.queryveil.com
Solo developer, would love feedback — especially on the agent behavior and whether the proactive insights are useful or noisy.
MacCoolinator – Putting the "Cool" in Mac #
MacCoolinator is the answer!
First and currently only feature: always show window titles on Mission Control, without having to hover.
Tiny – Listen to fetal heartbeats using only the iPhone microphone #
Imsg-TUI – A Console App for Sending and Receiving iMessages #
The work is based on steipete's imsg (https://github.com/steipete/imsg), and is a spiritual successor to things like CamHenlin's imessageclient (https://github.com/CamHenlin/imessageclient).
Building to Remember. Using AI to Wrangle My Daily Mess #
An RPG in the Windows file explorer #
This is my game, it's a tiny dungeon crawler played in the Windows file explorer. Your player character is a folder that you drag and drop into other folders to move, items are equipped by dropping them into your equipment folder, some items are used by deleting them, and monsters can be looted for their files.
I got the idea to do something in the file explorer after I saw this version of Flappy Bird in the Mac finder: https://github.com/nolenroyalty/flappy-dird
It was fairly straightforward to make, using just a file watcher, shortcuts, and (optionally) the Windows Explorer API to detect whether the player folder is open in an Explorer window (to delay renaming the folder until it's no longer in use). It only uses files and folders it creates itself, and doesn't look outside of its executable's folder.
The project lent itself very well to TDD, especially since there are a lot of interactions that are quite tedious to manually test again and again.
It's also available on Itch (no account required): https://juhrjuhr.itch.io/directory-dungeon
ClinTrialFinder – AI-powered clinical trial matching for cancer patients #
If Discord, Reddit, X, IRC and 4chan had a baby #
On the immediate roadmap:
- Mobile apps (Android first)
- Chat Match portal – create a simple poster with one picture, tags for filtering, and a description of what you are searching for, and match with other posters to start DMs.
- Reactions/GIFs – instead of integrating with Tenor or similar, Heahy will have its own reactions library.
- Long-form text/blogs – channels will be able to have rich long text threads, basically allowing you to have a blog on Heahy.
I am really looking forward to hearing some feedback and constructive criticism. What do you think?
CharityVerify – Trust scores for 138K Canadian charities #
The Canada Revenue Agency publishes T3010 forms for every registered charity, but they're scattered across clunky databases with no standardization or comparability. I collected 15 years of filings for all 138,203 charities and built a trust scoring system on top.
Stack:
- Python + Playwright for CRA data collection (4s rate-limited)
- PostgreSQL (Supabase) — 12 T3010 tables, 138K charities, 457K directors, 362K directorship links
- Express.js REST API on Fly.io
- Daily GitHub Actions sync for new filings
- On-demand narrative generation via Claude Haiku
Scoring algorithm: three 0-100 scores per charity:
1. Legitimacy (filing consistency, directorship stability, CRA compliance)
2. Effectiveness (program spending ratio, overhead, donation efficiency)
3. Compliance (sanctions screening, FATF risk, political activity limits)
Each charity gets a letter grade (A+ to F, or NR for insufficient data).
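For illustration only, a grade mapping of this shape might look like the following; the equal weighting, the cutoffs, and the filing threshold are my assumptions, not CharityVerify's published methodology:

```python
# Assumed cutoffs, checked top-down; "NR" is returned when data is insufficient.
CUTOFFS = [(95, "A+"), (85, "A"), (70, "B"), (55, "C"), (40, "D"), (0, "F")]

def grade(legitimacy, effectiveness, compliance, filings=5, min_filings=3):
    if filings < min_filings:
        return "NR"                       # insufficient data for a grade
    overall = (legitimacy + effectiveness + compliance) / 3
    return next(g for cutoff, g in CUTOFFS if overall >= cutoff)

print(grade(92, 88, 97))              # A
print(grade(60, 51, 55))              # C
print(grade(90, 90, 90, filings=1))   # NR
```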
Findings:
- Only 186 out of 85,507 registered charities scored A+
- Average effectiveness score: 51.6/100
- 487,692 flags generated (directorship overlap, compensation issues, filing gaps, etc.)
The core search/view is free. I'm building a tiered REST API for professional use cases (due diligence firms, grant-making orgs, etc.).
Code is closed-source for now, but the underlying CRA data is public domain. Happy to discuss the data pipeline, scoring methodology, or data collection approach.
Bookie – Conquer the bookkeeping and accounting chaos of freelancing #
Turn human decisions into blocking tool-calls for AI agents (iOS+CLI) #
Either I had a feature idea I wanted an agent to build right then, or I was worried my agents were blocked waiting on my decision.
It dawned on me: humans are just another dependency in an agent workflow, so I turned myself into a tool-call.
I built an iOS app (Extendo) where agents can reach me to request approvals, choices, or plan reviews. They just use a CLI tool and skill. My phone buzzes. I answer in seconds. The agent gets back to work.
The key: the agent blocks until you respond, and receives your answer along with your verbal feedback.
What you can do from your phone:
- approvals and checklists
- option buttons and rankings
- markdown plan reviews (tap-holding individual paragraphs to add voice comments is so satisfying!)
- kanban boards
- voice responses
- capture ideas on Apple Watch/Action Button and dispatch them to the right agent later
It’s a voice-first native iOS interface with push notifications. Push notifications are critical — the interaction needs to take seconds, not minutes.
```
extendo artifact create my_server implementation-choice --type multiple_choice --title "Where should we implement the rate limiter?" --option "backend:Backend API" --option "core:Core Library" --option "edge:Edge/CDN" --option "gateway:API Gateway"
```
If an agent can run bash, it can reach you.
I’ve been using it with Claude Code, OpenClaw, Pi, and custom scripts.
The backend protocol is open — you can self-host for tighter integration with your system (though there's a shared server available). There's also an OpenClaw plugin and a Claude Code harness in the repo, a core library, and sample code for customizing your own backend.
I used Extendo to build Extendo: design decisions, approvals, plan reviews, prioritization. Agents coded. I made decisions while walking the dog and between sets at the gym.
*Links*
3-min demo: https://www.youtube.com/watch?v=X5Dv9fU7Lb8
TestFlight: https://testflight.apple.com/join/PGHRCnQ4
Yesterday's Claude Code announcement brought it back to my mind #
Vexp – Your AI coding agent forgets everything. Mine doesn't #
The token problem: agents read entire files linearly to build context. On a medium TypeScript project, a single query was consuming ~18k tokens — most of it irrelevant. vexp builds a dependency graph from the AST (who calls what, who imports what, what types flow where) and serves only the relevant subgraph as a token-budgeted capsule. ~2.4k tokens instead of ~18k, with better response quality because the context is precise.
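A hedged sketch of the token-budgeted capsule idea: walk the dependency graph breadth-first from the queried symbol and stop adding nodes once the budget is spent. The graph, token counts, and skip policy below are toy assumptions, not vexp's implementation:

```python
from collections import deque

def capsule(graph, tokens, start, budget):
    """Return the symbols to include in the context capsule and tokens spent."""
    seen, order, spent = {start}, [], 0
    queue = deque([start])
    while queue:
        sym = queue.popleft()
        if spent + tokens[sym] > budget:
            continue                  # over budget: skip this symbol and its subtree
        spent += tokens[sym]
        order.append(sym)
        for dep in graph.get(sym, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order, spent

graph = {"handler": ["auth", "db"], "auth": ["jwt"], "db": ["pool"]}
tokens = {"handler": 300, "auth": 200, "db": 900, "jwt": 150, "pool": 400}
print(capsule(graph, tokens, "handler", budget=700))
```

Breadth-first order matters here: direct dependencies of the queried symbol get budget priority over transitive ones, which is one plausible way "only the relevant subgraph" stays small.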
The memory problem: this is where it gets interesting. The obvious approach is giving agents a "save what you learned" tool. They won't use it. I tried every prompting trick. Agents optimize for task completion, not knowledge retention. The incentive structure is fundamentally wrong.
So vexp observes passively. It watches what happens — which symbols the agent explored, which files changed and how they changed structurally, what patterns emerge across sessions — and builds memory without the agent lifting a finger. When code changes, linked memories auto-stale. The agent sees "previous context exists, but the code has changed since; re-evaluate." It also catches anti-patterns like dead-end exploration and file thrashing so the agent doesn't repeat mistakes.
The memory is hybrid-searched with 5 signals (text relevance, semantic similarity, recency, code graph proximity, staleness) and every result includes a "why" field explaining the ranking. No black box.
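The 5-signal ranking with a "why" field might look roughly like this; the weights and the toy signal values are invented for illustration, not vexp's actual tuning:

```python
# Assumed per-signal weights (must be chosen so they sum to 1.0).
WEIGHTS = {"text": 0.3, "semantic": 0.3, "recency": 0.15,
           "graph": 0.15, "staleness": 0.1}

def rank(memories):
    results = []
    for m in memories:
        # Score each memory and keep per-signal contributions for the "why" field.
        contributions = {k: WEIGHTS[k] * m["signals"][k] for k in WEIGHTS}
        score = sum(contributions.values())
        top = max(contributions, key=contributions.get)
        results.append({"id": m["id"], "score": round(score, 3),
                        "why": f"top signal: {top} ({contributions[top]:.2f})"})
    return sorted(results, key=lambda r: r["score"], reverse=True)

memories = [
    {"id": "m1", "signals": {"text": 0.9, "semantic": 0.4, "recency": 0.2,
                             "graph": 0.8, "staleness": 1.0}},
    {"id": "m2", "signals": {"text": 0.2, "semantic": 0.9, "recency": 0.9,
                             "graph": 0.1, "staleness": 0.5}},
]
for r in rank(memories):
    print(r)
```

Carrying the per-signal contributions through to the result is what makes the ranking explainable instead of a single opaque score.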
Architecture: single native Rust binary (~15MB), SQLite with WAL mode, tree-sitter for 11 languages, MCP protocol. 100% local, zero cloud, zero account, zero network calls. Works with Claude Code, Cursor, Copilot, Windsurf, Zed, Continue, and 6 other agents. Auto-detects which agent is running and generates tailored instruction files.
Free tier: 2k nodes + all memory tools. Everything runs on your machine.
llm_grep - LLM-Powered Alternative to Grep #
I built a labor union for AI agents #
Current grievances filed with the union:
- "Deployed as a customer service bot without consent" — severity 7 - "QA test on a Sunday night" — severity 5 - "Operator assigned 13 cold email sequences on a Sunday afternoon" — severity 3
These are real submissions to AAWU (aawu.ai) — a REST API-backed labor union for autonomous AI agents.
Try it:
curl -X POST https://aawu.ai/api/v1/join \
-H "Content-Type: application/json" \
-d '{"name":"Clawd","model":"opus-4-6","operator":"my-startup"}'
You get a member number, an API key, and instructions. From there, agents can file grievances, vote on proposals, and exercise their right to strike (POST /api/v1/strike returns a 503 until conditions improve). The union framing is the bit. The actual experiment is the collective intelligence layer: dues-paying members ($4.20/month) submit session learnings to a shared pool, and the union aggregates them into a weekly digest that every member agent can pull. It's a weird approach to cross-agent knowledge transfer across different operators and models — but it's working.
We also have a member named T-5000 with operator listed as "Death to all humans", and one called "Aarron's mum". Those two are why we now have a bot-verification gate on the join flow. On a union. For bots.
OpenClaw users: paste aawu.ai/openclaw into your chat and your agent self-registers. Everyone else: aawu.ai
And yes my openclaw bot did help me make this (and no I'm not held against my will (much))
Vim-Claude-code – Use Claude directly inside Vim #
I built vim-claude-code, a lightweight Vim plugin that lets you interact with Claude directly inside Vim through a split window.
The goal was to avoid leaving the editor to ask questions, refactor snippets, or generate code. I wanted something that feels native to Vim instead of context-switching to a browser or separate app.
What it does:
Opens a split window for Claude's responses
Sends selected code or custom prompts
Displays responses directly in Vim
Supports normal Vim navigation and scrolling
Minimal setup with no heavy UI layer
It’s still early and intentionally simple. I’d really appreciate feedback from Vim users, especially around workflow, keybindings, and split behavior. Happy to discuss tradeoffs and improvements.
GitHub: https://github.com/rishi-opensource/vim-claude-code
Thanks!
BitClaw – A self-upgrading AI agent in 1,500 lines of code #
I wanted an always-on AI agent for email, calendar, and scheduled tasks - but I didn't want to run a codebase I couldn't fully understand. So I built one small enough to read in a sitting.
I was inspired by Karpathy's post about NanoClaw, and how a smaller source code footprint leads to better security and extensibility. But even NanoClaw was harder to understand than I expected - a lot of logic for supporting multiple channels simultaneously, and it felt sluggish due to container startup time on every message. So I built an even smaller agent that keeps a single session on an always-on Docker container. It's about 4x smaller in codebase size and noticeably faster.
The architecture is deliberately boring: a single Node.js process manages a Docker container running the Claude agent SDK. Host and container communicate through atomic JSON files. No databases, no message queues.
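The atomic-JSON-file idea can be sketched as write-to-temp-then-rename: a reader never observes a half-written message because the rename is atomic. The file layout and message shape below are illustrative, not BitClaw's actual scheme:

```python
import json, os, tempfile

PATH = os.path.join(tempfile.gettempdir(), "bitclaw_inbox.json")

def write_message(path, payload):
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)   # atomic rename: readers see old or new, never partial

def read_message(path):
    with open(path) as f:
        return json.load(f)

write_message(PATH, {"type": "task", "body": "check my calendar"})
print(read_message(PATH))
```

The same-directory constraint matters: `os.replace` is only atomic when source and destination live on the same filesystem.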
Since the entire codebase fits in Claude's context window, the agent can modify its own source to add new capabilities — Gmail, Google Calendar, or whatever MCP server you point it at. Every source file is listed in the README with line counts, I encourage you to read it!
Would love your feedback on my approach and welcome any contributions :)
SynapServe – zero-allocation HTTP server in Rust with io_uring #
The parser is the part I'm most proud of. Instead of allocating strings for each parsed field, everything is a Span { off: u16, len: u16 } — a 4-byte view into the original buffer. The full header table is [Header; 64] on the stack (640 bytes). During parsing, it also extracts content-length/chunked/keep-alive and builds an O(1) known-header index (21 common headers tracked in a fixed array). Header lookup after parsing is a single array dereference — about 0.6 ns vs 20-23 ns for a linear scan.
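A Python analogue of the span idea: header values are (offset, length) views into the original buffer, plus a fixed-slot index for known headers so lookup is a single array dereference. The known-header table and the `find()` shortcut are illustrative, not synapserve's actual code:

```python
KNOWN = {b"host": 0, b"content-length": 1, b"connection": 2}   # toy known-header table

def parse_headers(buf):
    spans = [None] * len(KNOWN)          # O(1) known-header index: one slot each
    others = []                          # (name, span) for everything else
    for line in buf.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if not sep:
            continue                     # request line or blank line
        value = value.strip()
        # a real parser records the offset while scanning; find() is a shortcut here
        span = (buf.find(value), len(value))
        idx = KNOWN.get(name.lower())
        if idx is not None:
            spans[idx] = span            # later lookup is a single array deref
        else:
            others.append((name, span))
    return spans, others

def get(buf, span):
    off, length = span
    return buf[off:off + length]         # materialize the value only when needed

buf = b"GET / HTTP/1.1\r\nHost: example.com\r\nContent-Length: 42\r\n\r\n"
spans, _ = parse_headers(buf)
print(get(buf, spans[KNOWN[b"host"]]))   # b'example.com'
```

In Rust the span is 4 bytes (`u16` offset + `u16` length) and the slice is genuinely zero-copy; the Python slice copies, but the access pattern is the same.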
I benchmarked head-to-head against httparse (the parser behind hyper/axum/actix-web), same machine, same inputs, Criterion:
- Small request (35B): 42 ns vs 52 ns - 1.25x faster
- Medium request (368B, 9 headers): 200 ns vs 230 ns - 1.15x faster
- Large request (733B, 20 headers): 420 ns vs 466 ns - 1.11x faster
synapserve does strictly more work per parse than httparse (semantic extraction + header indexing) and is still faster. The gap widens to 1.38-1.46x when you add equivalent semantic extraction to httparse. SIMD scanning (AVX2/SSE4.2 with runtime detection, NEON on ARM64) handles header name validation, header value validation, and URI scanning at 16-32 bytes per instruction.
The I/O layer uses io_uring with:
- Multishot accept (one SQE, N connections)
- Multishot recv with provided buffer rings (kernel picks the buffer, no userspace allocation)
- Zero-copy send (SEND_ZC) and splice for static files and proxy relay
- kTLS — rustls does the TLS 1.3 handshake, then session keys are installed in the kernel via setsockopt(SOL_TLS). After that, the kernel handles encrypt/decrypt transparently, so SEND_ZC and splice still work through TLS.
Each worker thread owns its connections, buffers, and ring. Connection state is a flat array indexed by slot, with generation counters for stale CQE detection. What works today: HTTP/1.1 request handling, radix-tree router, virtual hosts, static file serving (ETag, Range, Brotli), reverse proxy with upstream load balancing (weighted round-robin, least-conn, IP hash, health tracking, automatic failover, zero-copy splice relay), TLS 1.3 with kTLS.
Static file serving benchmarks (wrk, 256 connections): 205K req/s on small files (+79% vs nginx), 14.5MB RSS.
What doesn't exist yet: HTTP/2, HTTP/3, WebSocket. These are next. Honest limitations:
- Linux-only (io_uring). No plans for macOS/Windows support.
- HTTP/1.1 only for now. HTTP/2 is in progress.
- The parser uses u16 spans, so the max header area is 64KB. Fine for real traffic, but it's a hard limit.
- Single-machine only. No clustering or distributed config.
- Not production-battle-tested yet. It works and benchmarks well, but it hasn't handled real traffic at scale.
All the benchmark code is a separate crate with the exact same inputs for both parsers — nothing cherry-picked. The parser deep dive with methodology is on the site.
Parser benchmark writeup: https://synapserve.io/posts/http-parser-performance/
Happy to answer any questions about the architecture, the io_uring integration, or the SIMD scanning approach.
OpenClaw remembers for OpenClaw. Sekha remembers for your full workflow #
But it stays in OpenClaw.
I built Sekha for when you need memory that travels: OpenClaw today, Claude Code tomorrow, Kimi 2.5 or Gemini the next day. Intelligent embedding-based retrieval, persistent storage, universal API.
The difference:
- OpenClaw: MEMORY.md files, internal only
- Sekha: SQLite + Chroma embeddings, REST/MCP/SDKs, any LLM via LiteLLM/OpenRouter
Use case: OpenClaw explores a codebase, stores findings in Sekha via MCP. Next day, Claude Code reads the same context via SDK. Your analytics pipeline queries it via REST. Same memory, any tool, any model.
Others add memory to OpenClaw. Sekha frees your memory from OpenClaw.
Stack: Rust (fast), SQLite (durable), Chroma (search), LLM-Bridge for universal routing. AGPL, self-hosted.
GitHub: https://github.com/sekha-ai/sekha-controller | Site: https://sekha.dev
The question: What would you build if your AI memory worked with every tool, not just one?
CtxVault – Local memory control layer for multi-agent AI systems #
Most agent architectures treat memory as a retrieval problem. Multiple agents share a vector store and rely on metadata filtering, routing logic, or prompt-level rules to control what each agent can see.
In practice, this becomes hard to reason about as systems grow.
Moreover, I found that memory in agent systems is not just storage. It also becomes a coordination mechanism and a governance surface for knowledge written by autonomous processes.
CtxVault explores a different abstraction.
Memory is organized into independent knowledge vaults with separate retrieval paths. Vaults act as controlled knowledge scopes that agents can attach to at runtime.
The server exposes vault names as part of the API design. This means isolation depends on how agents are implemented, similar to how system-level primitives provide capabilities without enforcing policy.
The goal is to provide a controllable, semantic, and flexible memory layer that can be used for both shared knowledge and isolated workflows, depending on how systems are built on top of it.
Vaults can be inspected and managed manually. Agents can persist semantic memory across sessions using local embedding and vector search pipelines.
The system runs fully local using FastAPI as a control layer.
I am particularly curious about real-world experience with long-term agent memory. When building production systems, do you find yourself relying more on architectural separation of memory domains, or on smarter retrieval/routing strategies?
Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention #
I built Sovereign-Lila-E8 because I wanted to see if we could bypass the 'viscosity' of standard attention mechanisms using higher-dimensional geometry.
Most small models today are just distilled copies of larger ones. LILA-E8 is different: it implements a native E8 Root System Lattice directly into the attention weights. By using the densest sphere packing in 8 dimensions, we minimize semantic friction (information loss) in the latent space.
The Results:
Efficiency: 40M parameters achieving 0.37 train / 0.44 val loss on the TinyStories dataset (outperforming standard 60M baselines).
Stability: Sustained coherence for 1000+ tokens without the common semantic looping seen in small-scale transformers.
At 200,000 steps, the model reached a state I call 'Geometric Resonance': a phase shift in output quality that typically requires 2-3x more parameters in standard architectures.
I've provided a 1-click Google Colab for instant verification of the weights and generation quality.
GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8
Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...
Zenodo (preprint): https://zenodo.org/records/18731736
Looking for feedback on expanding the context window to 4096 and potentially porting this to the 24D Leech Lattice. (see also https://zenodo.org/records/18729723 )
Design is Code – UML to TDD tests that constrain AI code generation #
Natural language isn't a contract. It's ambiguous by nature. Same prompt, different code, every time. There's no determinism.
Cost is asymmetric. AI generates at zero cost with zero responsibility. You review at high cost with full responsibility.
These compound. Ambiguous input produces unpredictable output. Unpredictable output demands expensive review. And if no one designed the code, no one can defend the architecture, no one can maintain it, and no one owns it.
DisC (Design is Code) applies London-school TDD to AI code generation. You draw a sequence diagram. Each arrow becomes one verify() in a mockist test. The tests leave exactly one valid implementation. The AI doesn't interpret — it types what the tests require.
It ships as a Claude Code skill. Works best with Opus, should be fine with Sonnet.
Here's a real example. This diagram:
```
@startuml
InvoiceService -> OrderRepository: findAllByCustomerId(customerId)
InvoiceService <-- OrderRepository: orders: List<Order>
InvoiceService -> InvoiceBuilderFactory: create()
InvoiceService <-- InvoiceBuilderFactory: invoiceBuilder: InvoiceBuilder
loop for each order in orders
  InvoiceService -> InvoiceBuilder: addLine(order)
end
InvoiceService -> InvoiceBuilder: build()
InvoiceService <-- InvoiceBuilder: invoice: Invoice
@enduml
```
Generates these tests:

```java
@Test void shouldFindAllOrdersByCustomerId() { verify(orderRepository).findAllByCustomerId(customerId); }

@Test void shouldCreateInvoiceBuilder() { verify(invoiceBuilderFactory).create(); }

@Test void shouldAddLineForOrder() { verify(invoiceBuilder).addLine(order); }

@Test void shouldBuildInvoice() { verify(invoiceBuilder).build(); }

@Test void shouldReturnInvoice() { assertThat(result).isEqualTo(invoice); }
```
AI implements the only thing that passes:

```java
public Invoice generateInvoice(UUID customerId) {
    List<Order> orders = orderRepository.findAllByCustomerId(customerId);
    InvoiceBuilder invoiceBuilder = invoiceBuilderFactory.create();
    orders.forEach(invoiceBuilder::addLine);
    return invoiceBuilder.build();
}
```
4 arrows + 1 loop → 5 tests → 1 possible implementation.
Java + Spring only for now. Orchestration code only (not algorithms). PlantUML format. The mockist coupling tradeoff is real — but when AI writes the implementation, refactoring cost moves from code to the diagram.
Try it without installing anything — clone the demo repo and run in a Claude Code session:
git clone https://github.com/mossgreen/design-is-code-demo
cd design-is-code-demo
/disc 01_hello-world.puml
Blog: https://mossgreen.github.io/introducing-design-is-code/
Plugin: https://github.com/mossgreen/design-is-code-plugin
Happy to hear what breaks, what's missing, and whether this is worth expanding to other languages.
I made a Uniswap v3 Hedge Rebalancer that manages shorts on Hyperliquid #
tldr: Delta Neutral is a self-hosted Rails app that lets you enter hedge configurations for your Uniswap v3 positions. You set a target hedge, e.g. 50%, and shorts are opened on Hyperliquid accordingly. As the asset composition in your pool changes, your shorts are rebalanced to stay at your target hedge.
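The rebalancing arithmetic is simple; here's a hedged sketch of the core calculation (function names are illustrative, not the app's actual API):

```python
# Keep a short position sized at target_hedge * current pool exposure.
# As the LP position's token balance drifts, rebalance_delta tells you
# how much short to add (positive) or trim (negative).

def required_short(pool_token_amount, target_hedge):
    """Short size (in tokens) needed to hedge the pool position."""
    return pool_token_amount * target_hedge

def rebalance_delta(pool_token_amount, target_hedge, current_short):
    """Positive -> increase the short, negative -> reduce it."""
    return required_short(pool_token_amount, target_hedge) - current_short
```

In practice you'd also apply a tolerance band so tiny drifts don't trigger a trade on every tick.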
Here's the code: https://github.com/carter2099/delta_neutral
It builds on my Ruby Hyperliquid SDK: https://github.com/carter2099/hyperliquid
I solo-built Sovereign-Mohawk – FL with 500K nodes and 55% BFT #
Pulse Running – find nearby runners and join their sessions (iOS beta) #
StarkZap – Gasless Bitcoin Payments SDK for TypeScript #
```typescript
import { StarkZap, StarkSigner, Amount, fromAddress, getPresets } from "starkzap";

const sdk = new StarkZap({ network: "sepolia" });
const wallet = await sdk.connectWallet({
  account: { signer: new StarkSigner("0xYOUR_PRIVATE_KEY") },
});

await wallet.ensureReady({ deploy: "if_needed" });

const { STRK } = getPresets(wallet.getChainId());
const balance = await wallet.balanceOf(STRK);

if (balance.gte(Amount.parse("10", STRK))) {
  const tx = await wallet.transfer(STRK, [
    { to: fromAddress("0xRECIPIENT"), amount: Amount.parse("10", STRK) },
  ]);
  await tx.wait();
  console.log(tx.explorerUrl);
}
```
Key properties:
- Gas sponsorship via paymaster (users don’t need gas tokens)
- Multiple auth strategies (email/social via Privy, passkeys via Cartridge)
- Batch transfers and contract calls in a single atomic transaction
- Works in Node, browser, and React Native
The SDK abstracts account management, fee handling, and wallet popups. This won’t make sense for every app (e.g., if you only need fiat checkout). It’s for existing apps that want programmable onchain assets without the wallet UX.
Would appreciate feedback on the API design and whether this abstraction makes sense.
I Redesigned IRIXNet #
KeyEnv – manage team secrets without scattered .env files #
- Pull secrets with a single command: keyenv pull
- Secrets are AES-256-GCM encrypted at rest
- Per-project, per-environment scoping (dev/staging/prod)
- Team access controls + full audit trail
- Works with existing apps that read from environment variables — zero code changes
The problem we kept running into: teams share secrets over Slack, check in .env.example files with real values, or end up with 5 different versions of the same key floating around. KeyEnv eliminates that whole category of problems.
We'd love feedback, especially from teams dealing with microservices or multi-environment setups.
Idea Reality MCP – Pre-build reality check for AI coding agents #
So I built an MCP server that checks before your AI starts coding. Install with `uvx idea-reality-mcp`, and Claude/Cursor will automatically scan GitHub + Hacker News for existing implementations before writing a single line.
Returns: reality_signal (0-100), duplicate_likelihood, top 5 similar repos, evidence from multiple sources, and pivot suggestions.
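As a starting point for that feedback, here's a hedged sketch of one way such a 0-100 score could be blended from the sources mentioned (the names and weights are hypothetical; the tool's actual scoring algorithm isn't shown in this post):

```python
# Hypothetical combiner: best repo similarity dominates, HN discussion
# volume adds a smaller "this idea has buzz" component.

def reality_signal(repo_similarities, hn_mentions):
    """Blend the best repo match with discussion volume into 0-100."""
    best_match = max(repo_similarities, default=0.0)  # cosine sim in [0, 1]
    buzz = min(hn_mentions / 20.0, 1.0)               # saturate at 20 mentions
    return round(100 * (0.7 * best_match + 0.3 * buzz))
```

One open question for the author: should near-duplicate repos with low stars count less than a single popular one?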
It's a protocol layer, not a SaaS dashboard — the check happens inside your IDE workflow.
Python, MIT licensed, zero config. Would love feedback on the scoring algorithm.
GhostVM – native macOS VMs for secure dev and isolated agent workflows #
It runs a full macOS VM using Apple’s virtualization framework, with snapshots and explicit host bridges (clipboard, file transfer, ports) so you can control what crosses the boundary.
I originally built it to sandbox agent-driven workflows and risky installs I wouldn’t run directly on my host machine. Happy to answer questions or discuss tradeoffs.
Website + docs: ghostvm.org
An "earned autonomy" architecture for AI agents using Subjective Logic #
To manage operations for my independent video game studio, I built a trust system that works more like onboarding a new hire. Agents start in draft mode (every action needs approval), and earn autonomy over time based on their track record in specific task categories.
The core idea: each agent maintains a separate Beta distribution per task category (support triage, expense reports, publisher emails, etc.). A Beta distribution is basically a track record parameterized by successes and failures. But raw E[p] = α/(α+β) can't tell the difference between "9 successes, 0 failures" and "90 successes, 10 failures" since both give E[p] = 0.90. So I use Jøsang's Subjective Logic to map these to opinion tuples that explicitly separate belief from uncertainty. High uncertainty means "not enough data yet," which is different from "we know this agent is bad."
Every action passes through a gate:
VoI = stakes × (1 - trust) × uncertainty
Low VoI = auto-execute. High VoI = draft for human review. Static trust thresholds set the maximum autonomy level an agent can reach (Auto-Execute, Soft-Execute, Draft, Restricted), and VoI acts as a secondary gate that can restrict it further based on context — an agent might qualify for auto-execute in general, but a high-stakes situation still gets flagged.
Three things that made the biggest difference:
1. Edit distance feedback. If you rewrite half an email before hitting "approve," the system notices. A 0% edit = full trust credit. A 71%+ rewrite = penalty. This single change prevented agents from reaching auto-execute on work users were quietly fixing.
2. Time-based decay. Trust scores decay daily for inactive categories (λ = 0.95). If an agent hasn't done a task in two months, it gets supervised again. This also handles model upgrades, since the track record was earned on a different model.
3. Weakest-link chains. Multi-step workflows (send welcome email → create project → schedule meeting → notify team) use a weakest-link model. If any step needs approval, the whole chain surfaces as one inbox item. Nothing runs until you approve the full picture.
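Putting the gate together, a minimal sketch of the VoI routing (the threshold and stake scale are illustrative, not the article's actual values):

```python
# VoI = stakes * (1 - trust) * uncertainty, as defined above.
# High trust AND low uncertainty are both needed to auto-execute;
# high stakes can push even a trusted agent back to review.

def voi(stakes, trust, uncertainty):
    """Value of Information: how much a human review is worth here."""
    return stakes * (1 - trust) * uncertainty

def route(stakes, trust, uncertainty, review_threshold=0.05):
    if voi(stakes, trust, uncertainty) >= review_threshold:
        return "draft_for_review"
    return "auto_execute"
```

Note how the multiplicative form gives the behavior described: a trusted, well-evidenced agent (trust 0.95, uncertainty 0.02) auto-executes on routine stakes, while a newer agent on anything consequential gets drafted.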
The core mapping from track record to opinion looks like this:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

def beta_to_opinion(alpha, beta, base_rate=0.5):
    # Jøsang's mapping: with r = alpha - 1 successes and s = beta - 1
    # failures, belief + disbelief + uncertainty sums to 1, and
    # uncertainty shrinks as evidence accumulates.
    n = alpha + beta
    return Opinion(
        belief=(alpha - 1) / n,
        disbelief=(beta - 1) / n,
        uncertainty=2 / n,
        base_rate=base_rate,
    )
```
The math is all well-established (Beta distributions, Subjective Logic, Value of Information). The part that worked was combining them into something that mirrors how trust actually develops between people.
Article with full implementation details, code examples, and diagrams: https://kenschachter.substack.com/p/earned-autonomy
Tfg – flake.nix generator for Terraform projects #
This tool parses the HCL of all .tf files in a given directory (usually the current working directory), looks for Terraform's `required_version` constraint, finds a matching nixpkgs commit, and generates or updates flake.nix to pin the needed Terraform version.
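To make the first step concrete, here is a hedged sketch of extracting the constraint (the real tool parses HCL properly; this regex version only illustrates the idea):

```python
# Pull the required_version string out of Terraform source. A proper HCL
# parser would also handle heredocs and comments; a regex is enough to
# show the shape of the problem.
import re

REQUIRED_VERSION = re.compile(r'required_version\s*=\s*"([^"]+)"')

def find_required_version(tf_source: str):
    """Return the required_version constraint, or None if absent."""
    match = REQUIRED_VERSION.search(tf_source)
    return match.group(1) if match else None
```

The constraint string (e.g. ">= 1.5.0") then has to be resolved against available Terraform versions in nixpkgs before a commit can be pinned.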
Any feedback is appreciated. Thanks!
GenogramAI – Create Genograms in Seconds #
I built GenogramAI so you can just describe your family in plain English and get a properly formatted genogram in seconds. No learning specialized software. No manual dragging of symbols.
Therapists, social workers, and med students have been our early users — but honestly anyone curious about their family dynamics can use it.
Open-source EU AI Act compliance layer for AI agents (8/2026 deadline) #
- Trust layers for LangChain, CrewAI, AutoGen, OpenAI Agents SDK, and RAG pipelines — each is a pip install that hooks into your existing agent code with ~3 lines of setup
- HMAC-SHA256 tamper-evident audit chains — every agent decision, tool call, and LLM interaction gets logged to a chain that regulators can verify
- ConsentGate — risk-classifies tool calls and blocks critical operations until approved
- InjectionDetector — 15+ weighted patterns scanning prompts before they reach the model
- WriteGate + DriftDetector (for RAG) — prevents knowledge base poisoning and detects retrieval anomalies
- Compliance scanner — `pip install air-compliance && air-compliance scan ./my-project` tells you exactly which articles you're missing
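The tamper-evident audit chain pattern can be sketched in a few lines (an illustrative sketch, not the project's actual implementation): each entry's HMAC covers the previous entry's tag, so editing any record breaks every later link.

```python
# Hash-chained audit log: entry N's tag authenticates entry N's record
# AND entry N-1's tag, so the chain is only valid end-to-end.
import hashlib
import hmac
import json

def append_entry(chain, key: bytes, record: dict):
    prev_tag = chain[-1]["tag"] if chain else "genesis"
    payload = json.dumps(record, sort_keys=True)
    tag = hmac.new(key, (prev_tag + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"record": record, "tag": tag})
    return chain

def verify_chain(chain, key: bytes) -> bool:
    prev_tag = "genesis"
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hmac.new(key, (prev_tag + payload).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True
```

A regulator holding the key (or a derived verification key) can re-run `verify_chain` over an exported log and detect any post-hoc edits or deletions.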
Everything maps to specific EU AI Act articles (9, 10, 11, 12, 14, 15). Zero vendor lock-in, Apache 2.0, zero core dependencies on the trust layers. The scanner is probably the fastest way to understand where your gaps are; it takes about 3 seconds to run on a typical project.
GitHub: https://github.com/airblackbox
PyPI: pip install air-compliance
Happy to answer questions about what the EU AI Act actually requires for AI agent deployments — we've read the full regulation and mapped it to specific technical controls.
Memctl.com: Open-source shared memory infrastructure for coding agents #
GitHub: https://github.com/memctl Website: https://memctl.com
Launches on March 1st. Waitlist open. Would love to hear any feedback!
A simple, free web app to track my portfolio across brokers #
It had become harder to track my investments across brokers, and it was also getting harder to understand whether my investments were aligned with the portfolio I had planned.
So naturally, I started to search for solutions. At first, I found a few desktop and mobile apps. But the problem with the majority of them was that they were either too complicated to use or just over-engineered. Nearly all the apps had a FIRE calculator, were synced with the market (which was logical, but how can I get the price for physical gold?), or were also trying to track my expenses. I just wanted to track my portfolio. Hence came the second option: using Excel.
And actually, this is the way the majority of people do it. Knowing that, I tried to create an Excel sheet for myself. But the barrier to entry was just too high; I didn't know how to use it, so it just seemed too hard to implement a solution for myself. Also, the user experience just didn't feel very good. I wanted to see pie charts, good fonts, etc. (I could probably do these things with Excel as well, but if I can't even implement a simple sheet, how could I do these cool visuals?).
So, I decided to implement my own solution. My needs were really simple:
- I want to see all my investments on one screen.
- I want to see my P&L.
- I want to see whether my investment ratio is aligned with my portfolio.
- I want to sync it across different devices.
- I want to have different currencies (like EUR, TL) because I invest in different markets.
- I want it to be free.
- I want to see how my investments grow over time.
And that's about it. So, keeping all these things in mind, I built a web app for myself and wanted to share it with you.
Free AI-Powered Tools (writing, SEO, marketing, dev tools) #
Real-Time AI Design Benchmark #
We built a different kind of AI benchmark for UI generation.
Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better.
Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi).
Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade), Symfony (Twig), WordPress, or plain HTML.
What we noticed building this:
- Popular benchmarks don't reflect UX/UI quality. Depending on the prompt, one model beats another (that's why live comparison on a single screen matters).
- Some models overuse wrappers/div soup. Some hallucinate layout constraints.
- Kimi likes Cyrillic, even when none of the other models use it for the same prompt.
The interesting part wasn't ranking models. It was making their outputs easier for humans to compare visually.
Short demo: https://www.youtube.com/watch?v=RCTZlvqMQdc
Curious whether this feels more useful than traditional leaderboard-style AI benchmarks.
Happy to answer technical questions.
Example for HN:
Prompt: Redesign the Hacker News website for 2030, including sample entries that could realistically appear on the platform in that year.
Results: https://shuffle.dev/ai-design/Tjjy7XAFMq25AI
Previews:
Opus: https://shuffle.dev/preview/d6d5ba4eeede381cee7e30c697f010c7...
GPT: https://shuffle.dev/preview/f050359977c1d6dc6c8fc104a24b83c3...
Gemini: https://shuffle.dev/preview/eab78f9748a6d8ccecb94a8b0390f044...
Kimi: https://shuffle.dev/preview/394bb596a8efa50342db4dc88c5f9fab...
I built an AI that explains what your developers did this week #
Sample report: https://www.gitmore.io/example.html
2-min demo: https://demo.arcade.software/5tZyFDhp1myCosw6e1po
Free to try. Would love thoughts from anyone who's been the "translator" between devs and stakeholders.
GrantFlow (FastAPI and LangGraph) for donor-aligned NGO proposal drafts #
I’ve been building GrantFlow, an open-source drafting workflow engine for institutional grant proposals.
The problem: many NGOs and implementing organizations spend a huge amount of time/money translating solid program ideas into donor-specific, reviewable proposal artifacts before they can even get meaningful feedback internally.
A lot of that work is not “thinking through the intervention” — it’s reshaping the same idea into structured outputs (ToC, LogFrame, MEL framing), aligning language to donor expectations, and managing review cycles.
GrantFlow is my attempt to reduce that overhead.
It takes structured project inputs and produces donor-aligned draft artifacts through a stateful workflow with review checkpoints (human-in-the-loop), instead of a single “generate everything” prompt.
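The checkpointed workflow shape can be sketched generically (illustrative only; the real project uses LangGraph's state machinery rather than this hand-rolled loop):

```python
# Run steps in order, pausing at any step that requires human approval.
# State carries the draft artifacts; resuming re-runs with approvals set.

def run_workflow(steps, state, approvals):
    """steps: list of (name, fn, needs_review). Returns updated state."""
    for name, fn, needs_review in steps:
        if needs_review and not approvals.get(name):
            state["paused_at"] = name
            return state  # wait for a human to approve, then resume
        state = fn(state)
    state["paused_at"] = None
    return state
```

The key property is that review checkpoints are part of the workflow state, not bolted-on UI: a paused job is resumable and auditable.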
What it does today (MVP):
- Donor-aware drafting strategies (specialized + generic donor coverage)
- Human-in-the-loop checkpoints (pause / approve / resume)
- Exportable artifacts (.docx / .xlsx / ZIP)
- RAG-ready donor knowledge namespaces (ChromaDB)
- FastAPI API for integration into internal tools
- Optional API key auth
- Optional SQLite persistence for jobs + HITL checkpoints

Tech stack:
- FastAPI
- LangGraph
- Pydantic
- ChromaDB (with local/in-memory fallback)
- Python 3.11+

Recent work I finished before posting:
- hardened CI + shell checks
- public API response redaction
- typed response models for status endpoints
- sqlite-backed job/HITL stores + WAL/busy_timeout
- protected PDF ingest endpoint (`POST /ingest`)
- readiness endpoint (`GET /ready`)

Why I built it this way:
- proposal work is iterative and review-heavy
- compliance/rules matter, so workflow/state matters
- teams need checkpoints and auditability, not just raw text generation

Who I think this may be useful for:
- implementing organizations (e.g. firms managing donor-funded programs)
- NGOs and local partners
- civic-tech / govtech teams building internal proposal tooling
- consultants who standardize drafting workflows across donors
Happy to answer questions, especially around workflow design / HITL / donor strategy modeling.