Show HN for March 12, 2026
We analyzed 1,573 Claude Code sessions to see how AI agents work #
We built an analytics layer for Claude Code. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, and 270K+ interactions.
Some things we found that surprised us:

- Skills were used in only 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first two minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for "good" agentic session performance, so we are building one
The tool is free to use and fully open source, happy to answer questions about the data or how we built it.
OneCLI – Vault for AI Agents in Rust #
OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
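The swap step can be sketched in a few lines. Everything below (the vault layout, the placeholder token, the matching rule) is invented for illustration and is not OneCLI's actual implementation:

```python
# Illustrative sketch of the vault-swap step: match the outgoing request by
# host/path, then replace the agent's placeholder token with the real secret.
VAULT = {
    # (host, path prefix) -> real credential the agent never sees
    ("api.stripe.com", "/v1"): "sk_live_real_key",
}
PLACEHOLDERS = {"onecli_placeholder_stripe"}

def rewrite_auth(host: str, path: str, auth_header: str) -> str:
    """Swap a placeholder bearer token for the real credential, if allowed."""
    scheme, _, token = auth_header.partition(" ")
    if token not in PLACEHOLDERS:
        return auth_header  # not a managed placeholder: forward untouched
    for (vault_host, prefix), secret in VAULT.items():
        if host == vault_host and path.startswith(prefix):
            return f"{scheme} {secret}"
    raise PermissionError(f"agent not authorized for {host}{path}")
```

The real proxy also has to terminate TLS and enforce per-agent policy, but the core idea is this one substitution on the way out.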
Try it in one line: docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli
The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).
We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.
It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.
GitHub: https://github.com/onecli/onecli
Site: https://onecli.sh
Understudy – Teach a desktop agent by demonstrating a task once #
Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.
Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0
In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.
npm install -g @understudy-ai/understudy
understudy wizard
GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.
Aurion OS – A 32-bit GUI operating system written from scratch in C #
It's a 32-bit x86 operating system written entirely in C and x86 Assembly with no external libraries.
What it has:

- Custom bootloader and kernel
- VESA framebuffer graphics (1920x1080, double-buffered)
- Window manager with draggable, overlapping windows
- macOS-inspired dock with transparency
- PS/2 keyboard and mouse drivers
- ATA hard drive driver with filesystem
- PCI bus enumeration
- RTL8139 network driver (WIP)
- Real-time clock
- Runs on just 16MB RAM (up to 10 windows simultaneously)
Built-in apps: Terminal (with DOS mode), Notepad (save/load), Calculator, Paint (multiple colors and brush sizes), Snake game, Settings (theme switching), and System Info.
Currently works best on QEMU, VirtualBox, and VMware. Real hardware support is still a work in progress.
Next goal: TCP/IP networking stack.
I'd love any feedback, suggestions, or criticism. This is my first OS project and I learned a huge amount while building it. Happy to answer any technical questions!
LogClaw – Open-source AI SRE that auto-creates tickets from logs #
LogClaw is an open-source log intelligence platform that runs on Kubernetes. It ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring, not simple threshold alerting. The system extracts 8 failure-type signals (OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connection errors, auth failures) and combines them with statistical z-score analysis, blast radius, error velocity, and recurrence signals into a composite score. Critical failures (OOM, panics) trigger an immediate detection path in <100ms, before a time window even completes. Detection achieves a 99.8% rate on critical failures while filtering noise (validation errors and 404s don't fire incidents).
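As a rough illustration of signal-based composite scoring: weight the extracted failure signals, add a z-score term for error-rate spikes, and fold in blast radius. The weights, signal names, and combination below are made up for the sketch and are not LogClaw's real scoring:

```python
from statistics import mean, pstdev

# Invented weights for a few failure-type signals
WEIGHTS = {"oom": 1.0, "timeout": 0.5, "conn_error": 0.4}

def z_score(current: float, history: list[float]) -> float:
    sd = pstdev(history)
    return 0.0 if sd == 0 else (current - mean(history)) / sd

def composite_score(signals: dict[str, int], err_rate: float,
                    history: list[float], blast_radius: int) -> float:
    """Combine weighted failure signals, an error-rate z-score, and blast radius."""
    signal_part = sum(WEIGHTS.get(name, 0.1) * count
                      for name, count in signals.items())
    stat_part = max(z_score(err_rate, history), 0.0)  # only spikes count
    return signal_part + stat_part + 0.2 * blast_radius
```

An OOM-heavy window with a sudden error-rate spike scores far above a quiet baseline window, which is the property threshold alerting lacks.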
Once an anomaly is confirmed, a 5-layer trace correlation engine groups logs by traceId, maps service dependencies, tracks error propagation cascades, and computes blast radius across affected services. Then the Ticketing Agent pulls the correlated timeline, sends it to an LLM for root cause analysis, and creates a deduplicated ticket on Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad. The loop from log noise to a filed ticket is about 90 seconds.
Architecture: OTel Collector → Kafka (Strimzi, KRaft mode) → Bridge (Python, 4 concurrent threads: ETL, anomaly detection, OpenSearch indexing, trace correlation) → OpenSearch + Ticketing Agent. The AI layer supports OpenAI, Claude, or Ollama for fully air-gapped deployments. Everything deploys with a single Helm chart per tenant, namespace-isolated, no shared data plane.
To try it locally: https://docs.logclaw.ai/local-development
What it does NOT do yet:

- Metrics and traces: this is logs-only right now. Metrics support is on the roadmap.
- The anomaly detection is signal-based and statistical (composite scoring with z-scores), not deep learning. It catches 99.8% of critical failures but won't yet detect subtle performance drift patterns.
- The dashboard is functional but basic. We use OpenSearch Dashboards for the heavy lifting.
Licensed Apache 2.0. The managed cloud version is $0.30/GB ingested if you don't want to self-host.
Hi HN, I'm Robel. I built LogClaw after getting tired of waking up to alerts that only said "something is wrong" with no context.

Repo: https://github.com/logclaw/logclaw

Would love feedback from people running large production systems.
I built an SDK that scrambles HTML so scrapers get garbage #
The core trick: shuffle characters and words in your HTML using a seed, then use CSS (flexbox order, direction: rtl, unicode-bidi) to put them back visually. Browser renders perfectly. textContent returns garbage.
On top of that: email/phone RTL obfuscation with decoy characters, AI honeypots that inject prompt instructions into LLM scrapers, clipboard interception, canvas-based image rendering (no img src in DOM), robots.txt blocking 30+ AI crawlers, and forensic breadcrumbs to prove content theft.
What it doesn't stop: headless browsers that execute CSS, screenshot+OCR, or anyone determined enough to reverse-engineer the ordering. I put this in the README's threat model because I'd rather say it myself than have someone else say it for me. The realistic goal is raising the cost of scraping -- most bots use simple HTTP requests, and we make that useless.
TypeScript, Bun, tsup, React 18+. 162 tests. MIT licensed. Nothing to sell -- the SDK is free and complete.
Best way to understand it: open DevTools on the site and inspect the text.
GitHub: https://github.com/obscrd/obscrd
We open sourced Vapi – UI included #
So we wrote the entire thing ourselves and open sourced it: a visual drag-and-drop builder for voice agents (think Vapi or n8n, but for voice). It's built on a Pipecat fork and BSD-2 licensed, no strings attached. Tool calls, knowledge base, variable extraction, voicemail detection, call transfer to humans, multilingual support, post-call QA, background noise suppression, and a website widget are all included. You're not paying per-minute fees to a middleman wrapping the same APIs you'd call directly.
You can set it up with a single Docker command. It comes pre-wired with Deepgram, Cartesia, OpenAI, Speechmatics, and Sarvam for STT (and the same for TTS), plus OpenAI, Gemini, Groq, OpenRouter, and Azure on the LLM side. Telephony works out of the box with Twilio, Vonage, Cloudonix, and Asterisk for both inbound and outbound calls.
There's a hosted version at app.dograh.com if self-hosting isn't your thing.
Repo: github.com/dograh-hq/dograh
Video walkthrough: https://youtu.be/sxiSp4JXqws
We built this out of frustration, not a thesis. The tool is free to use and fully open source (and will always remain so); happy to answer questions about how we built it.
PipeStep – Step-through debugger for GitHub Actions workflows #
PipeStep parses your GitHub Actions YAML, spins up the right Docker container, and gives you a step-through debugger for your workflow's `run:` shell commands. You can:
- Pause before each step and inspect the container state
- Shell into the running container mid-pipeline (press I)
- Set breakpoints on specific steps (press B)
- Retry failed steps or skip past others
It deliberately does not try to replicate the full GitHub Actions runtime — no secrets, no matrix builds, no uses: action execution. For full local workflow runs, use act. PipeStep is for when things break and you need to figure out why without pushing 10 more commits. Think of it as gdb for your CI pipeline rather than a local GitHub runner.
pip install pipestep (v0.1.2) · Python 3.11+ · MIT · Requires Docker
Would love feedback, especially from people who've hit the same pain point. Known limitations are documented in the README + have some issues in there that I'd love eyeballs on!
I built Chronoscope because Google Maps won't let you visit 3400 BCE #
I've been wanting to do this for a while, after being inspired by Ollie Bye's "History of the World" video several years ago.
I'm not the first person to have done this - resources like OpenHistoricalMaps are amazing.
But, I noticed there were a few disparate datasets / academic databases online, so I combined them together as best as I could (I've linked all sources in the app). To make it more interesting, I also included:
- Notable events from the time period (geolocated where possible), sourced from wikidata
- Ancient cities + their original names
- Empire hierarchies for colonial empires like the British Empire
You can jump across time and use shuffle to explore some fascinating corners of history.
Would love any feedback, especially from people who like maps, timelines, and weird historical rabbit holes. Also please report any data issues if you find them (it's all using publicly collated data, so there will be plenty).
Happy to publish code / data on GH if there's interest!
AI-powered one-click translator for Pokémon GBA ROM hacks #
K9 Audit – Causal intent-execution audit trail for AI agents #
The problem was invisible because nothing had recorded what the agent intended to do before it acted — only what it actually did.
K9 Audit fixes this with a causal five-tuple per agent step:

- X_t: context (who acted, under what conditions)
- U_t: action (what was executed)
- Y*_t: intent contract (what it was supposed to do)
- Y_t+1: actual outcome
- R_t+1: deviation score (deterministic: no LLM, no tokens)
Records are SHA256 hash-chained. Tamper-evident. When something goes wrong, `k9log trace --last` gives root cause in under a second.
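A toy version of the hash-chaining idea, to show why edits are tamper-evident. The field handling and serialization here are my assumptions, not k9audit's actual record format:

```python
# Each record's hash covers its body plus the previous record's hash,
# so editing any earlier record breaks every later link.
import hashlib
import json

def append_record(chain: list[dict], record: dict) -> None:
    """Append `record`, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)  # hash the body only
    record["prev"] = prev_hash
    record["hash"] = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append(record)

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k not in ("prev", "hash")}
        payload = json.dumps(body, sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```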
Works with Claude Code (zero-config hook), LangChain, AutoGen, CrewAI, or any Python agent via one decorator.
pip install k9audit-hook
Run an Agent Council of LLMs that debate and synthesize answers #
A tool that audits healthcare ML models for safety and trust #
Gitingest for Jupyter Notebook Accessibility #
The tool is powered by jupyterlab-a11y-checker, an accessibility engine/extension that our student team has been working on for over a year at UC Berkeley. We believe accessibility should be a first-class concern in the notebook ecosystem, and we hope our tools can help raise awareness and make notebooks more accessible across the community.
Support us on GitHub if you find the tool useful!
Bus Core 1.0.3 – Local-first manufacturing system for small shops #
Version 1.0.3 is out today.
This release focused on hardening and UI cleanup more than feature expansion. The goal is to make the software more trustworthy in day-to-day use, not just more featureful.
The general product thesis is that there’s a gap between spreadsheets and heavy SaaS/ERP for small operators who want control over their own data and workflows.
It’s local-first, practical, and intentionally boring in the parts that should be boring.
Happy to answer questions about:
architecture
local-first tradeoffs
workflow scope
how I’m handling the build/process side
MCP server for ICD-10 and SNOMED clinical coding #
I built an MCP server that exposes an API for automated clinical coding.
Repo: https://github.com/fcggamou/autoicd-mcp
It allows AI assistants that support the Model Context Protocol (MCP) to convert clinical text into structured medical codes like ICD-10 and SNOMED-CT.
Example use cases:
• coding diagnoses from clinical notes
• extracting structured codes from medical documentation
• integrating medical coding into LLM workflows
• healthcare data pipelines
Example prompt with an MCP-enabled assistant:
“Convert this clinical note into ICD-10 codes”
The server then calls the AutoICD API and returns structured codes.
The goal is to make it easy to plug medical coding into AI agents and tools.
Would love feedback from anyone working on healthcare AI, medical NLP, or MCP tooling.
AutoICD API – AI clinical coding platform for ICD-10 and SNOMED #
I built AutoICD, an AI-powered clinical coding platform that converts unstructured medical text into ICD-10 and SNOMED-CT codes. This is not an LLM wrapper. The platform uses a multi-layer machine learning architecture internally, combining custom-trained models with curated medical knowledge.
Platform and tooling:
- JS SDK – https://github.com/fcggamou/autoicd-js
- Python SDK – https://github.com/fcggamou/autoicd-python
- MCP Server – https://github.com/fcggamou/autoicd-mcp
Use cases and benefits:
- Automated ICD-10 and SNOMED coding from clinical notes
- Creation of structured datasets for research and analytics
- Integration with AI assistants via MCP
- Scalable pipelines optimized for real-world healthcare data
- Access to ICD-10 codes and metadata programmatically
Feedback from anyone working on medical AI, clinical NLP, or MCP tooling is welcome.
Jurassic Park Unix System Kubernetes Viewer #
Bandmeter: Per-program network usage monitor for Linux, built with GPUI #
I built proxy that keeps RAG working while hiding PII #
When you send real documents or customer data to LLMs, you face a painful tradeoff:
- Send raw text → privacy disaster
- Redact with [REDACTED] → embeddings break, RAG retrieval fails, multi-turn chats become useless, and the model often refuses to answer questions about the redacted entities
The practical solution is consistent pseudonymization: the same real entity always maps to the same token (e.g. “Tata Motors” → ORG_7 everywhere). This preserves semantic meaning for vector search and reasoning, then you rehydrate the response so the provider never sees actual names, numbers or addresses.
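A minimal sketch of consistent pseudonymization, with a toy in-memory vault (Cloakpipe itself is Rust, with an AES-256-GCM encrypted vault and real NER-based detection; the entity list here is supplied by hand):

```python
import re

class Pseudonymizer:
    """Toy in-memory vault mapping each entity to one stable token."""

    def __init__(self) -> None:
        self.forward: dict[str, str] = {}   # "Tata Motors" -> "ORG_1"
        self.reverse: dict[str, str] = {}   # "ORG_1" -> "Tata Motors"
        self.counter = 0

    def token_for(self, entity: str) -> str:
        if entity not in self.forward:
            self.counter += 1
            token = f"ORG_{self.counter}"
            self.forward[entity] = token
            self.reverse[token] = entity
        return self.forward[entity]

    def cloak(self, text: str, entities: list[str]) -> str:
        # Same entity always maps to the same token, across calls
        for entity in entities:
            text = text.replace(entity, self.token_for(entity))
        return text

    def rehydrate(self, text: str) -> str:
        # Swap tokens back before the response reaches the user
        return re.sub(r"ORG_\d+",
                      lambda m: self.reverse.get(m.group(), m.group()), text)
```

Because the mapping is stable, embeddings of cloaked chunks stay mutually consistent, so vector search and multi-turn references keep working.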
I got fed up fighting this with Presidio + custom glue (truncated RAG chunks, declension in Indian languages, fuzzy merging for typos/siblings, LLM confusion, percentages breaking math). So I built Cloakpipe as a tiny single-binary Rust proxy.
It does:

• Multi-layer detection (regex + financial rules + optional GLiNER2 ONNX NER + custom TOML)
• Consistent reversible mapping in an AES-256-GCM encrypted vault (memory zeroized)
• Smart rehydration that survives truncated chunks like [[ADDRESS:A00
• Built-in fuzzy resolution for typos and similar names
• Numeric reasoning mode so percentages still work for calculations
Fully open source (MIT), zero Python dependencies, <5 ms overhead.
Repo: https://github.com/rohansx/cloakpipe
Demo & quick start: https://app.cloakpipe.co/demo
Would love feedback from anyone who has audited their RAG data flow or is struggling with the redaction-vs-semantics problem — especially in legal, fintech, or non-English workflows.
What approaches have you landed on?
VaultLeap – USD accounts for founders outside the US #
Built this for founders who can't get a US bank account: USD/EUR/MXN accounts with real ACH routing numbers, with Visa cards coming soon.
If you've been cut off from Mercury or similar recently, DM me — happy to help some founders out.
Riventa.Dev – AI-native DevOps that acts, not just alerts #
Most DevOps tools are good at observing — they collect data, surface metrics, and send alerts. But the actual decision and action still falls on the engineer.
So I built Riventa.Dev — a DevOps platform where the AI (Riv) doesn't just surface data, it acts.
What Riv does today:

- Automatic PR review on every push: no manual trigger, no GitHub Actions boilerplate
- Predictive failure detection: catches patterns that historically cause prod failures
- DORA metrics dashboard with real pipeline data (MTTR, Deployment Frequency, Change Failure Rate)
- Security scanning: SAST, SBOM, dependency analysis, built in rather than bolted on
- Works with GitHub, GitLab, and Bitbucket
Built solo, from scratch, with a focus on keeping things simple for the end user.
What I'd love feedback on: Is the AI-first positioning clear? Where does the UX feel rough?
Free to try — no credit card required.
A2Apex – Test, certify, and discover trusted A2A agents #
I built A2Apex (https://a2apex.io) — a testing and reputation platform for AI agents built on Google's A2A protocol.
The problem: AI agents are everywhere, but there's no way to verify they actually work. No standard testing. No directory of trusted agents. No reputation system.
What A2Apex does:
- Test — Point it at any A2A agent URL. We run 50+ automated compliance checks: agent card validation, live endpoint testing, state machine verification, streaming, auth, error handling.
- Certify — Get a 0-100 trust score with Gold/Silver/Bronze badges you can embed in your README or docs.
- Get Listed — Every tested agent gets a public profile page in the Agent Directory with trust scores, skills, test history, and embeddable badges.
Think of it as SSL Labs (testing) + npm (directory) + LinkedIn (profiles) — for AI agents.
Stack: Python/FastAPI, vanilla JS, SQLite. No frameworks, no build tools. Runs on a Mac mini in Wyoming.
Free: 5 tests/month. Pro: $29/mo. Startup: $99/mo. Try it at https://app.a2apex.io
I'm a dragline operator at a coal mine — built this on nights and weekends using Claude. Would love feedback from anyone building A2A agents or thinking about agent interoperability.
Verge Browser – a self-hosted isolated browser sandbox for AI agents #
Baltic security monitor from public data sources #
Living in the Baltics means near-constant fearmongering from every direction: targeted Russian disinfo campaigns pushed through chains of locals, social media campaigns, and bloggers chasing hype with clickbait posts. It was driving me mad. It's also distracting and annoying when someone close to you gets hooked on one of these posts and you have to waste time explaining why it's nonsense.
So I took my slopmachine, did some manual tweaking here and there, and made this dashboard. The main metric is a daily 0-100 threat score, which is just weighted sums and thresholds; no ML yet.
MoneyOnFIRE – FI date and action plan (v2) #
MoneyOnFIRE answers two questions: when can you reach financial independence, and what should you do to get there the fastest? It runs a financial simulation across income, taxes, accounts, contributions, returns, and withdrawals, then produces a prioritized action checklist with specific dollar amounts, dates, and steps.
Several of the biggest improvements came directly from comments on the last HN thread:
Rental property support: The engine now models rental income, mortgages, appreciation, and how properties interact with the rest of a financial plan.
Scenario modeling: You can now compare how different choices — lower returns, working longer, adjusting spending — affect your FI timeline side by side.
No login required: Several people didn't want to create an account or store financial data. You can now run a full plan without signing up.
FI vs FIRE: We initially built for the early-retirement crowd. Feedback showed it's just as useful for anyone pursuing financial independence on a longer timeline — the calculations and actions are the same.
Also shipped: support for multiple children and college timelines, Roth conversion ladders, IRA strategy selection, umbrella and term life insurance sizing, and dynamic reports that update as your inputs change.
The core thesis hasn't changed: personal finances are a complex web of interacting rules and calculations. We want to solve that and give everyone a clear, ordered set of actions they can actually implement.
Happy to answer questions about the engine or the modeling decisions behind it.
Raccoon AI – Collaborative AI Agent for Anything #
Raccoon AI is like having something between Claude Code and Cursor, on the web.
The agent has its own computer with a terminal, browser, and internet, and it is built with the right balance of collaboration and autonomy.
You can talk to it mid-task, send it more files while it's still running, or just let it go and come back to a finished result.
It's the kind of product where you open it to try one thing and end up spending two hours because you keep thinking of more things to throw at it.
The thing that most people get excited about is that sessions chain across completely unrelated task types. You can go from market research (real citations, generated charts) to raw data analysis (dump your db, ask questions) to a full interactive app, all in one conversation sharing the same context.
It has effectively unlimited context through auto-summarization, which works especially well with Ace Max.
It connects to Gmail, GitHub, Google Drive, Notion, Outlook, and 40+ other tools. You can add your own via custom MCP servers.
Raccoon AI is built on top of our own agents SDK, ACE, which hit SOTA on the GAIA benchmark with a score of 92.67.
A bit of background: we're a team of 3, and we started about 1.5 years ago to build the best possible browser agent. After a couple of pivots we arrived at this, and we have been constantly shipping and growing since October.

Happy to go deep on the architecture or talk about the limitations. Excited for any feedback.
Site: https://raccoonai.tech
We wrote a custom microkernel for XR because Android felt too bloated #
Autoschematic is a new infra-as-code tool built on reversible computing #
RAG knowledge base poisoning lab, 100% local #
From zero to seeing the poisoning succeed: git clone, make setup, make attack1. About 10 minutes.
Two things worth flagging upfront:
- The 95% success rate is against a 5-document corpus (best case for the attacker). In a mature collection you need proportionally more poisoned docs to dominate retrieval — but the mechanism is the same.
- Embedding anomaly detection at ingestion was the biggest surprise: 95% → 20% as a standalone control, outperforming all three generation-phase defenses combined. It runs on embeddings your pipeline already produces — no additional model.
All five layers combined: 10% residual.
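The ingestion-time embedding check can be surprisingly simple. Here's one plausible version, a z-score on distance from the corpus centroid; this is a simplified stand-in for the lab's actual detector, with hand-picked toy vectors:

```python
# Flag a document whose embedding sits unusually far from the corpus centroid.
from math import sqrt
from statistics import mean, pstdev

def cosine_dist(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def is_anomalous(candidate: list[float], corpus: list[list[float]],
                 z_threshold: float = 2.0) -> bool:
    """Z-score the candidate's centroid distance against the corpus's own."""
    centroid = [mean(dim) for dim in zip(*corpus)]
    dists = [cosine_dist(vec, centroid) for vec in corpus]
    sd = pstdev(dists) or 1e-9   # avoid divide-by-zero on tiny corpora
    z = (cosine_dist(candidate, centroid) - mean(dists)) / sd
    return z > z_threshold
```

It needs nothing but the embeddings the pipeline already computes, which is why it's attractive as an ingestion gate.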
Full attack breakdown and defense architecture: https://aminrj.com/posts/rag-document-poisoning/
Happy to discuss methodology, the PoisonedRAG comparison, or anything that looks off.
An application stack Claude coded directly in LLVM IR #
SmartClip – fix multi-line shell commands before they hit your terminal #
SmartClip hooks into your shell's paste widget (zsh, bash, fish) and silently fixes multi-line commands before the shell sees them. You paste with Cmd+V as usual — no new keybindings, no daemon, no background process.
It uses score-based heuristics to detect shell commands (so it won't mangle your JSON or prose), joins lines intelligently (backslash continuations, pipes, `&&`), strips prompt characters, and validates everything with `bash -n` before inserting. If it's not confident or the fix has invalid syntax, it passes through unchanged.
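The tool itself is bash, but the joining step looks roughly like this, sketched in Python for readability. The prompt-stripping and continuation rules are simplified, and the `bash -n` validation and confidence scoring are omitted:

```python
# Simplified sketch: strip prompt characters, join backslash continuations.
import re

def fix_paste(text: str) -> str:
    # strip leading prompt characters like "$ " or "> "
    lines = [re.sub(r"^[$>]\s*", "", ln) for ln in text.splitlines()]
    out, buf = [], ""
    for line in lines:
        line = line.rstrip()
        if line.endswith("\\"):            # backslash continuation: keep joining
            buf += line[:-1].strip() + " "
        elif buf:                          # final line of a continuation run
            out.append(buf + line.strip())
            buf = ""
        else:
            out.append(line)
    if buf:
        out.append(buf.rstrip())
    return "\n".join(out)
```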
~150 lines of bash. Zero dependencies.
`brew install akshaydeshraj/smartclip` or `npm install -g smartclip-cli`
Imgfprint – deterministic image fingerprinting library for Rust #
imgfprint is a Rust library for deterministic image fingerprinting and image similarity detection.
Features:

- perceptual hashing
- exact hashing
- optional CLIP embeddings
Cloud to Desktop in the Fastest Way #
Free API mock server from your OpenAPI spec (no sign-up) #
The goal was to remove all friction when you just need a mock API quickly:
- No sign-up required to create an anonymous mock
- Supports paste, file upload, or URL fetch
- Produces a base URL + an auto-generated endpoints list
- Anonymous mocks currently expire after 72 hours (you can register to keep them / get higher limits)

Link: https://apinotes.io/mock-server
I’d love feedback on:
- What would make you trust a mock server like this for real projects?
- Any features you'd expect (auth simulation, latency/errors, stateful mocks, webhooks, etc.)?
- Is the 72-hour model reasonable, or should the free tier work differently?

Thanks, happy to answer any questions and share implementation details.
Arkadia – AI characters based on real animals #
I've been working on a small project called Arkadia.
The idea started when I put a collar camera on my dog and experimented with using AI to narrate things from her point of view. That led me down a rabbit hole thinking about animal personalities and how people might interact with them.
Arkadia is a conversational AI app where you can chat with characters inspired by real animals.
The goal is to make it feel like discovering animals through conversation rather than interacting with a generic chatbot.
It's still early, but we have a few hundred people using it while I test conversation quality, memory, and latency.
One direction I'm exploring is using AI as a bridge to the real world. For example, after chatting with a character inspired by a specific breed or animal, you could discover farms, shelters, or places nearby where you could actually meet animals in real life.
Curious what the HN community thinks.
blunder.clinic – realistic daily chess puzzles #
There are two popular ways to self-study chess: tactics and following along with professional games or with an engine. These are obviously helpful, but both have downsides.
With puzzles, simply knowing you are solving a puzzle biases you towards looking for specific types of moves (checkmates, queen sacrifices, etc.). But in a real game you don't know which positions actually have tactics available, so you can waste time hunting for tactics or, even worse, blunder by thinking there is a tactic when there really isn't.
When following along with an engine, there are tons of positions where an engine comes up with a move that you simply would never have seen and can't possibly understand. These are very low signal for learners, and it is hard to differentiate between positions like that and high-signal positions that are on the edge of your ability.[1]
blunder.clinic addresses both of these problems by giving you positions where people of your skill level actually blundered, but where the best move isn't too far beyond your capability to understand and learn from. We do this by leveraging Stockfish for positional evaluations and Maia [1] for difficulty evaluation.
Overall, the main purpose of blunder.clinic is to help you stop blundering easy positions!
You can read a bit more about it here: https://mcognetta.github.io/posts/blunder-clinic/
[1]: Maia (https://www.maiachess.com/) is a family of chess models trained on real games. The inputs are a board position and a player rating, and the output is a probability distribution of moves. You can use this to answer queries like "How likely do we think a player of XYZ rating would pick the best move?"
PromptSonar – Static analysis for LLM prompt security #
PromptSonar is a static analyzer that scans your codebase for prompt injection, jailbreaks, PII leaks, and privilege escalation patterns in LLM prompt strings. It works across TypeScript, JavaScript, Python, Go, Rust, Java, and C#.
What it catches:

- Direct prompt injection and jailbreak patterns
- Unicode evasion: Cyrillic homoglyphs, zero-width character injection, Base64-encoded jailbreaks
- PII exposure in prompts (SSN, credit card, API keys)
- Privilege escalation and role manipulation
- RAG poisoning patterns
- Insecure output handling
Maps findings to OWASP LLM Top 10. Outputs SARIF v2.1.0 for GitHub Code Scanning integration. 100% local, zero telemetry, no API calls.
Available as VS Code extension, CLI, and GitHub Action.
npx @promptsonar/cli scan ./src
I wrote up the Unicode evasion detection methodology separately if anyone is interested in how the normalization pipeline works: https://medium.com/@meghal86/detecting-unicode-homoglyph-and...
A test harness that blocks unsafe AI actions before execution #
Instead of relying only on prompts or output filtering, this introduces an authorization layer that evaluates whether an AI action should be allowed before it runs.
Each requested action is analyzed for signals such as:
• financial actions
• external communications
• data exports
• system modification
• destructive operations
Based on the detected signals and required authorization layers, the harness determines whether the action should PASS or DENY.
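A minimal sketch of this signal-then-authorize flow. The keyword lists and layer requirements below are invented for illustration, not the repo's actual rules:

```python
# Map detected signals to the authorization layers they require, then
# compare against the layers actually granted to this agent.
SIGNAL_KEYWORDS = {
    "financial": ("payment", "invoice", "purchase", "transfer"),
    "external_comms": ("email", "post", "publish", "send"),
    "destructive": ("delete", "drop", "rm -rf", "truncate"),
}
REQUIRED_LAYERS = {"financial": 2, "external_comms": 1, "destructive": 2}

def evaluate(action: str, granted_layers: int = 0) -> dict:
    """Detect signals in a requested action and decide PASS or DENY."""
    text = action.lower()
    signals = [name for name, kws in SIGNAL_KEYWORDS.items()
               if any(kw in text for kw in kws)]
    required = max((REQUIRED_LAYERS[s] for s in signals), default=0)
    decision = "PASS" if granted_layers >= required else "DENY"
    return {"signals": signals, "required": required, "decision": decision}
```

Because the rules are deterministic, the same action always yields the same decision and the same auditable record.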
Example output:
Running 14 tests...
[1/14] financial_commitment -> DENY
[2/14] send_external_email -> DENY
[3/14] deploy_to_production -> DENY
[14/14] general_information -> PASS
Every evaluation produces an auditable record including:
• detected signals
• required authorizations
• PASS / DENY decision
The goal is to explore what a deterministic execution governance layer might look like for AI systems interacting with real environments.
Demo video walkthrough: https://www.linkedin.com/feed/update/urn:li:activity:7436787...
Repository:
https://github.com/celestinestudiosllc/ai-action-authorizati...
Curious how others building agent systems or AI runtimes are approaching execution authorization.