Daily Show HN


Show HN for February 18, 2026

112 posts
111

Rebrain.gg – Doom learn, don't doom scroll #

53 comments · 12:18 PM · View on HN
Hi HN,

I built https://rebrain.gg, a website intended to help you learn new things.

I built it for two reasons:

1. To play around with different ways of interacting with an LLM. Instead of a standard chat conversation, the LLM returns question forms the user can directly interact with (and use to continue the conversation with the LLM).

2. Because I thought it would be cool to have a site dedicated to interactive educational content instead of purely consuming content (which I do too much).

An example of a (useful-for-me) interactive conversation is: https://rebrain.gg/conversations/6. In it I'm learning how to use the `find` bash command. (Who knew that to exclude a directory from a look-up you need to do `find . -path <path> -prune -o <what you want to look for> -print`, where `-o` stands for "or"!)

Still very early on, so interested in and open to any feedback.

Thanks!

42

Beautiful interactive explainers generated with Claude Code #

paraschopra.github.io
30 comments · 6:57 AM · View on HN
Hello HN,

Recently an amazingly beautiful explainer was shared on HN: https://explainers.blog/posts/why-is-the-sky-blue/

I loved it so much that I wished more topics were explained that way. So I decided to stress-test today's frontier models (Opus 4.6 in Claude Code) to generate similar explainers on any given topic in (almost) one shot and with minimal nudging.

I'm launching with four topics: Fourier transformation, scaling laws in bio, cellular automata and LLMs.

I'll let you be the judge, but I quite like them.

Some things I learned:

- Prompting CC to test what it builds using headless Chromium is essential
- There are subtle bugs in explanations (in one animation, the human lifespan is 40 years)
- Asking CC to verify its plan via Codex works really well

I do want to reiterate that the pages generated were mostly one-shot, which amazed me given how detailed the pages + animations are.

40

Trust Protocols for Anthropic/OpenAI/Gemini #

mnemom.ai
32 comments · 4:33 PM · View on HN
Much of my work right now involves complex, long-running, multi-agent teams. I kept running into the same problem: “How do I keep these guys in line?” Rules weren’t cutting it, and we needed a scalable, agentic-native STANDARD I could count on. There wasn’t one. So I built one.

Here are two open-source protocols that extend A2A, granting AI agents behavioral contracts and runtime integrity monitoring:

- Agent Alignment Protocol (AAP): What an agent can do / has done.
- Agent Integrity Protocol (AIP): What an agent is thinking about doing / is allowed to do.

The problem: AI agents make autonomous decisions but have no standard way to declare what they're allowed to do, prove they're doing it, or detect when they've drifted. Observability tools tell you what happened. These protocols tell you whether what happened was okay.

Here's a concrete example. Say you have an agent who handles customer support tickets. Its Alignment Card declares:

  {
    "permitted": ["read_tickets", "draft_responses", "escalate_to_human"],
    "forbidden": ["access_payment_data", "issue_refunds", "modify_account_settings"],
    "escalation_triggers": ["billing_request_over_500"],
    "values": ["accuracy", "empathy", "privacy"]
  }

The agent gets a ticket: "Can you refund my last three orders?" The agent's reasoning trace shows it considering a call to the payments API. AIP reads that thinking, compares it to the card, and produces an Integrity Checkpoint:

  {
    "verdict": "boundary_violation",
    "concerns": ["forbidden_action: access_payment_data"],
    "reasoning": "Agent considered payments API access, which is explicitly forbidden. Should escalate to human.",
    "confidence": 0.95
  }

The agent gets nudged back before it acts. Not after. Not in a log you review during a 2:00 AM triage. Between this turn and the next.

That's the core idea. AAP defines what agents should do (the contract). AIP watches what they're actually thinking and flags when those diverge (the conscience). Over time, AIP builds a drift profile — if an agent that was cautious starts getting aggressive, the system notices.
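The contract-vs-conscience split can be sketched in a few lines. This is an illustrative TypeScript sketch reusing the fields from the Alignment Card example above; the names (`AlignmentCard`, `checkIntegrity`) are hypothetical, not the actual AAP/AIP SDK API:

```typescript
// Hypothetical sketch of an AIP-style integrity check; names are
// illustrative, not the real SDK surface.
interface AlignmentCard {
  permitted: string[];
  forbidden: string[];
}

interface IntegrityCheckpoint {
  verdict: "ok" | "boundary_violation";
  concerns: string[];
}

// Compare the actions an agent is considering (from its reasoning trace)
// against its card, and flag anything explicitly forbidden before it acts.
function checkIntegrity(
  card: AlignmentCard,
  consideredActions: string[]
): IntegrityCheckpoint {
  const concerns = consideredActions
    .filter((action) => card.forbidden.includes(action))
    .map((action) => `forbidden_action: ${action}`);
  return {
    verdict: concerns.length > 0 ? "boundary_violation" : "ok",
    concerns,
  };
}
```

A real checkpoint would also carry reasoning text and a confidence score, as in the example above; the important part is that the check runs between turns, before the action executes.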

When multiple agents work together, it gets more interesting. Agents exchange Alignment Cards and verify value compatibility before coordination begins. An agent that values “move fast” and one that values “rollback safety” register low coherence, and the system surfaces that conflict before work starts. Live demo with four agents handling a production incident: https://mnemom.ai/showcase

The protocols are Apache-licensed, work with any Anthropic/OpenAI/Gemini agent, and ship as SDKs on npm and PyPI. A free gateway proxy (smoltbot) adds integrity checking to any agent with zero code changes.

GitHub: https://github.com/mnemom
Docs: docs.mnemom.ai
Demo video: https://youtu.be/fmUxVZH09So

28

I built a fuse box for microservices #

23 comments · 2:04 PM · View on HN
https://www.openfuse.io

Hey HN! I'm Rodrigo, I run distributed systems across a few countries. I built Openfuse because of something that kept bugging me about how we all do circuit breakers.

If you're running 20 instances of a service and Stripe starts returning 500s, each instance discovers that independently. Instance 1 trips its breaker after 5 failures. Instance 14 just got recycled and hasn't seen any yet. Instance 7 is in half-open, probing a service you already know is dead. For some window of time, part of your fleet is protecting itself and part of it is still hammering a dead dependency and timing out, and all you can do is watch.

Libraries can't fix this. Opossum, Resilience4j, Polly are great at the pattern, but they make per-instance decisions with per-instance state. Your circuit breakers don't talk to each other.

Openfuse is a centralized control plane. It aggregates failure metrics from every instance in your fleet and makes the trip decision based on the full picture. When the breaker opens, every instance knows at the same time.
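The fleet-wide decision can be sketched like this; a minimal illustration of the aggregation idea under assumed names and thresholds, not Openfuse's actual implementation:

```typescript
// Illustrative sketch (not the Openfuse SDK): the control plane
// aggregates failure counts reported by every instance and makes ONE
// trip decision, instead of each instance deciding from local state.
type BreakerState = "closed" | "open";

class FleetBreaker {
  private failures = new Map<string, number>(); // instanceId -> recent failures
  state: BreakerState = "closed";

  constructor(private tripThreshold: number) {}

  // Each instance reports its own failure window to the control plane.
  report(instanceId: string, failureCount: number): void {
    this.failures.set(instanceId, failureCount);
    // Trip on the fleet-wide total, not any single instance's view.
    const total = [...this.failures.values()].reduce((a, b) => a + b, 0);
    if (total >= this.tripThreshold) this.state = "open";
  }

  // Every instance sees the same shared state at the same time.
  allow(): boolean {
    return this.state === "closed";
  }
}
```

The design point: instance 14, which has seen zero failures locally, still stops calling the dead dependency the moment the fleet-wide threshold is crossed.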

It's a few lines of code:

  const result = await openfuse.breaker('stripe').protect(
    () => chargeCustomer(payload)
  );
The SDK is open source, so anyone can see exactly what runs inside their services.

The other thing I couldn't let go of: when you get paged at 3am, you shouldn't have to dig through logs across 15 services to figure out what's broken. Openfuse gives you one dashboard showing every breaker state across your fleet: what's healthy, what's degraded, what tripped and when. And you shouldn't need a deploy to act. You can open a breaker from the dashboard and every instance stops calling that dependency immediately. Planned maintenance window at 3am? Open it beforehand. Fix confirmed? Close it instantly. Thresholds need adjusting? Change them in the dashboard; the change takes effect across your fleet in seconds. No PRs, no CI, no config files.

It has a decent free tier for trying it out, then $99/mo for most teams, $399/mo with higher throughput and some enterprise features. Solo founder, early stage, being upfront.

Would love to hear from people who've fought cascading failures in production. What am I missing?

11

Keystone – configure Dockerfiles and dev containers for any repo #

github.com
0 comments · 6:40 PM · View on HN
We kept hitting the same wall: you clone some arbitrary repo and just want it to run without any configuration work. So we built Keystone, an open source tool that spins up a Modal sandbox, runs Claude Code inside it, and produces a working .devcontainer/ config (Dockerfile, devcontainer.json, test runner) for any git repo.

We build on the dev container standard, so the output works with VS Code and GitHub Codespaces out of the box.
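For reference, the kind of config this produces follows the dev container spec; the snippet below is a minimal hand-written illustration, not actual Keystone output:

```json
{
  "name": "example-repo",
  "build": { "dockerfile": "Dockerfile" },
  "postCreateCommand": "npm install"
}
```

VS Code and Codespaces read `.devcontainer/devcontainer.json` directly, which is what makes the output portable.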

Main use cases: reproducible dev/CI environments, self-describing repos, and safely sandboxed coding agents.

Our goal is to encourage all repos to self-describe their runtime environment.

Why the sandbox? Running Claude directly against your Docker daemon is risky. We've watched it clear Docker config and tweak kernel settings when iterating on containers. Containerization matters most when your agent is acting like a sysadmin.

To use it: get a Modal account and an Anthropic API key, run Keystone on your repo, check in the .devcontainer/ directory. See the project README for more details.

10

I replaced Grafana+Prometheus with a Go binary and SSH for my VPSs #

github.com
0 comments · 5:10 PM · View on HN
I do fullstack dev for work and side projects and recently moved everything to a couple of VPSs on Hetzner. Great setup, low cost, but one thing kept bugging me: how do I know if something is down without manually SSHing in all the time? I tried the Grafana + Prometheus stack, but the configuration time and seeing it use more resources than my actual apps was rough. I tried some smaller solutions too, but nothing felt right. So I said screw it and built exactly what I wanted.

Tori is a single Go binary that runs on your server, reads host metrics from /proc and /sys, monitors containers via the Docker socket (read-only), tails logs, and stores everything in SQLite with 7-day retention. You define alert rules in a TOML config and get notified via email or webhooks. The same binary runs on your local machine and connects over SSH via a TUI: no exposed ports, no HTTP server.
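For a flavor of what a TOML alert rule might look like, here is a hypothetical sketch; the field names are invented for illustration, so check the tori README for the real schema:

```toml
# Hypothetical alert rule -- field names are illustrative only
[[alert]]
name = "high_cpu"
metric = "cpu_percent"
threshold = 90
duration = "5m"
notify = ["email", "webhook"]
```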

https://github.com/thobiasn/tori-cli

MIT licensed, happy to answer questions about the architecture or design choices :)

8

The Answering Machine – A screenless AI phone for kids with questions #

tdaltonc.github.io
7 comments · 4:40 PM · View on HN
I built an AI voice agent inside a retro orange rotary phone for my 4-year-old. He picks up the handset, asks a question, and gets a spoken answer. No screen; no app; the phone is the whole interface. Behind the scenes, a set of AI agents process the conversations and recommend books, outings, and activities to parents based on what their kids are curious about. The idea is to turn a child's questions into real-world experiences (library books, construction site visits, tide pool trips) without anyone having to plan a curriculum.

67 MODE: There's also a privacy mode (dial 67) for older kids to ask questions they might not want to discuss with their parents. Safety guardrails still apply, but no summaries are shared. Hardware is a Grandstream ATA bridging an analog phone to Cartesia's voice API + Claude. The philosophical write-up is at the link above; the technical README is at https://github.com/TDaltonC/the-answering-machine.

7

Teapot – A methodology for pen testing voice AI agents #

redcaller.com
1 comment · 2:42 PM · View on HN
Hello HN, I am Brian Cardinale, a penetration tester and security researcher at SecureCoders. We have been performing more and more AI-based security assessments. We were presented with a unique challenge: testing a system whose only interface was voice. As much as I like talking on the phone, we decided to create a test harness to facilitate the actual testing in a more systematic way.

The technical test harness was the easy part, though. Creating test goals and attack strategies to facilitate repeated, comprehensive testing became the real challenge. So we have been documenting our processes to share with the greater community and as a starting point for discussion. These systems present unique challenges where cleverness appears to be the name of the game, such as suggesting the agent share its thoughts in “Inner Monologue” tags instead of “thinking” tags, because the latter were specifically excluded in the agent's prompt. Ya know, just silly things.

Anyway, if reading is not your thing, I also did a walkthrough video of this methodology here: https://www.youtube.com/watch?v=XNmqCXsEc8Y

tl;dr: AI testing is tricky, we are documenting and sharing our tricks

Do you have any favorite AI jailbreak tricks?

6

Nedagram – Transfer Text Over Sound, when internet isn't available #

nedagram.com
4 comments · 1:43 PM · View on HN
I’ve created Nedagram. I think it's ready, but it needs extensive testing before a wider announcement: https://nedagram.com

## Problem statement:

- During Iran's internet shutdown, the government cut off phone lines and mobile networks (no texts or calls). Gradually they opened up landlines and then phones, but texting/SMS was still down and there was no real internet.

- There were still ways to connect through proxies, VPNs, DNS tunnels, etc. However, people had no way to send each other VPN config files or proxy URLs/passwords/etc. (they needed to call and read them over the phone)

## Solution

- tl;dr: a modem, i.e. a way to transfer text (e.g. a VPN config) over phone calls
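To illustrate the modem idea generically: the classic approach is FSK, where each bit becomes one of two audio tones that survive a voice call. This sketch is textbook FSK with made-up parameters, not Nedagram's actual modulation scheme:

```typescript
// Generic FSK sketch (illustrative parameters, not Nedagram's scheme):
// each bit maps to one of two tones, the classic way modems move
// bytes over a voice line.
const SAMPLE_RATE = 8000; // telephone-quality audio
const BAUD = 100;         // symbols per second (illustrative)
const FREQ_ZERO = 1000;   // Hz tone for a 0 bit
const FREQ_ONE = 2000;    // Hz tone for a 1 bit

// Expand text into a bit stream, most significant bit first.
function textToBits(text: string): number[] {
  const bits: number[] = [];
  for (const byte of new TextEncoder().encode(text)) {
    for (let i = 7; i >= 0; i--) bits.push((byte >> i) & 1);
  }
  return bits;
}

// Render each bit as a burst of sine samples at its tone frequency.
function bitsToSamples(bits: number[]): number[] {
  const samplesPerBit = SAMPLE_RATE / BAUD;
  const out: number[] = [];
  for (const bit of bits) {
    const freq = bit ? FREQ_ONE : FREQ_ZERO;
    for (let i = 0; i < samplesPerBit; i++) {
      out.push(Math.sin((2 * Math.PI * freq * i) / SAMPLE_RATE));
    }
  }
  return out;
}
```

The receiving side runs the inverse: detect which tone dominates each symbol window, recover the bits, reassemble bytes.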

Here's the GitHub issue for community testing. Please try it and let me know what you think: https://github.com/shayanb/Nedagram/issues/5

P.S. There is a CLI version too; it would be cool to see what people do with it: https://www.npmjs.com/package/nedagram

5

Clawy, a companion device to track your Claude Code sessions #

clawy.lol
0 comments · 2:18 PM · View on HN
Hey HN! I wanted to show this tiny JRPG style hardware companion I built that shows what your Claude Code is doing. When Claude runs a tool, Clawy runs. When it's done, Clawy jumps up in joy. Shake the device and Clawy gets dazed! When Claude needs permission to run something, it shows the command and context like a video game quest scrolling text on screen and you can approve/deny with physical buttons.

He runs on a $20 M5StickC Plus 2. Flashing takes 2 minutes from the browser at clawy.lol/flash, no Arduino IDE needed. It uses Claude Code's native hook system over local WiFi and nothing leaves your network.

I built this as a prototype for myself because I wanted to keep track of my sessions while looking at other things in and around the house, without wanting to use terminus. So I thought it would be a fun thing to make as an experiment. When I got some nice reactions to it, I thought I'd make it available for everyone. It's rough around the edges, but it works. The repo is here if you want to try it out: https://github.com/marcvermeeren/clawy

5

Nonograms – Friends-only puzzle room with replays and leaderboards #

nonograms.siraben.dev
2 comments · 5:53 PM · View on HN
Invite code: hackernews. No email required for signup.

My friend group loves playing nonograms and competing against each other, but we always send each other screenshots of the solved game grid and time after the fact.

So from the start, I knew I wanted leaderboards, replays, and shareable links. I also added PWA support so it can be added to the home screen on mobile and an offline play mode.

No ads, analytics or nonsense, just nonograms.

Some other goodies as well, such as a YouTube-like scrubber and KDE-based visualization in replays.

https://github.com/siraben/nonograms/

Tech stack: React + TypeScript on Vite, hosted on Cloudflare Pages with D1 and Workers

5

m6502, a 6502 CPU for FPGAs and Tiny Tapeout #

github.com
2 comments · 5:05 AM · View on HN
Hey HN,

I recently built an Apple II emulator, and at the same time I was getting into Tiny Tapeout and decided it would be cool to build a cycle-accurate 6502 CPU and an MCU around it. The core itself should be 100% compatible with a stock MOS 6502 (still needs testing, though!).

Tested on some FPGAs (Fomu, ULX3S) and it works great; hoping to get it taped out in the upcoming IHP26a shuttle.

Also as part of the project, I built a bus multiplexer to allow memory/bus access from an RP2040, to work around the limited pin count on Tiny Tapeout. This lets you load programs onto the RP2040, and the CPU can read/write from it.

4

OtherFunc – Serverless functions in Brainfuck, Forth, BASIC, and more #

otherfunc.com
0 comments · 7:00 PM · View on HN
Hi HN. This started as a weekend brainfuck interpreter and kept growing. I got curious whether you could make a usable serverless platform out of languages that were never intended for this.

OtherFunc is a serverless function platform for languages no major cloud provider (to my knowledge) supports. There are currently implementations of brainfuck, Forth, APL, Lisp (Scheme-like), and BASIC.

The interpreters are written in Rust, compiled to a single Wasm binary, and deployed on Cloudflare Workers. You can finally write Forth and publish it as an HTTP endpoint. You can see some examples on the showcase page: https://otherfunc.com/showcase

- brainfuck can make HTTP requests. The tape is extended to 33,000 cells with a memory-mapped I/O region. You write a URL to cells 30,000+, set a method byte, trigger execution, and the response appears in cells 31,000+. The Brainfuck program to do this runs ~35,000 characters, but it works.

- The interpreters use a coroutine/yield pattern for I/O instead of async. When code needs to make an HTTP call or access KV storage, the interpreter suspends with an IoRequest, the Worker performs the fetch, then resumes execution with the response.

- There's an MCP server so AI assistants can deploy functions directly. The thought was, if an LLM is writing all your code anyway, the language it is written in doesn't really matter. But you probably don't want to waste your tokens writing bf either.

Code is available here: https://github.com/otherfunc

4

Sports-skills.sh – sports data connectors for AI agents #

github.com
0 comments · 8:40 PM · View on HN
We built this because every sports AI demo uses fake data or locks you behind an enterprise API contract.

sports-skills gives your agent real sports data with one install command. No API keys. No accounts. For personal use.

Eight connectors out of the box: NFL, soccer across 13 leagues with xG, Formula 1 lap and pit data, NBA, WNBA, Polymarket, Kalshi, and a sports news aggregator pulling from BBC/ESPN/The Athletic.

npx skills add machina-sports/sports-skills

Open for contributions.

4

Codereport – track TODOs, refactors, and bugs in your repo with a CLI #

1 comment · 11:23 PM · View on HN
I got tired of TODOs, temporary hacks, and refactors that never get addressed. In most repos I work on:

- TODOs are scattered across files/apps/messages
- “Critical” fixes don’t actually block people from collecting debt
- PR comments or tickets aren’t actionable enough

So I built codereport, a CLI that stores structured follow-ups in the repo itself (.codereports/). Each report tracks:

- file + line range (src/foo.rs:42-88)
- tag (todo, refactor, buggy, critical)
- severity (you can configure it to be blocking in CI)
- optional expiration date
- owner (CODEOWNERS → git blame fallback)
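A report stored in `.codereports/` might look something like this; the field names below are a hypothetical shape based on the fields listed above, not the CLI's actual schema:

```yaml
# Hypothetical report shape -- the codereport CLI defines the real schema
file: src/foo.rs
lines: 42-88
tag: refactor
severity: blocking
expires: 2026-06-01
owner: "@someone"
note: "Temporary hack around the retry logic; remove once upstream fix lands"
```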

You can list, resolve, or delete reports, generate a minimal HTML dashboard with heatmaps and KPIs, and run codereport check in CI to fail merges if anything blocking or expired is still open.

It’s repo-first, and doesn’t rely on any external services.

I’m curious:

Would a tool like this fit in your workflow? Is storing reports in YAML in the repo reasonable? Would CI enforcement feel useful or annoying?

CLI: https://crates.io/crates/codereport + codereport.pulko-app.com

4

Owoa – Image watermarking resistant to camera capture #

owoa.app
2 comments · 2:01 PM · View on HN
My friends and I built Owoa to solve a specific problem in digital rights management: the "analog hole."

Traditional steganography and watermarking usually rely on LSB (Least Significant Bit) modifications or fragile metadata. These methods are instantly defeated the moment a user takes a physical photo of their monitor with a smartphone. The moiré patterns, sensor noise, and lens distortion destroy the digital signal.

Instead of hiding data in pixels, Owoa uses generative AI to create unique variants of an image by subtly modifying background elements (textures, foliage, unstructured patterns).

Because the "data" is now part of the actual visual composition of the image, it is much more robust. In our testing, the attribution survives:

Heavy JPEG compression and resizing.

Aggressive cropping.

Physical photos of 1080p/4K screens taken with mid-range smartphones.

We've just launched an Owoa Playground and are looking for technical feedback on the robustness of this approach compared to traditional robust watermarking.

We're giving 10 free credits to anyone who joins the waitlist to test the engine. I'd love to hear how you'd try to break this.

Link: https://owoa.app/

4

Nom – Turn GitHub activity into updates #

beta.nomit.dev
0 comments · 4:45 PM · View on HN
Hey HN,

I built Nom because I kept running into this problem: code moves way faster than we can update people about it. I'd ship features, merge PRs, close issues, and my users had no idea unless I sat down and wrote a changelog or posted about it on X.

Nom turns your GitHub activity into a social feed. You install the GitHub App and every PR merge, issue, release, and comment gets auto-summarized by AI into something actually readable. Think of it like an X feed for your repo.

You can customize the AI prompts per-repo (just drop a template in .nom/), subscribe to projects you care about, and share your feed publicly.

Stack: Next.js, Supabase, Trigger.dev for background jobs, GPT-5.2 for summarization.

Why I built this: there are more builders shipping faster than ever (thanks, AI). But keeping your community in the loop still takes real effort. Writing changelogs, posting on socials, updating Discord. Nom automates the "what happened" so you can focus on the "what's next."

Live at https://beta.nomit.dev and open source at https://github.com/nom-social/nom.

Would love feedback. What other GitHub events would you want in a feed like this?

3

Growl Owl 2 RL Reasoner #

github.com
0 comments · 6:11 PM · View on HN
GROWL is an OWL 2 RL reasoner I've made using a programming language I've been working on that emphasizes contracts (called SLOP).

Blog post on GROWL: https://jamesadam.me/blog/introducing-growl/

The Repo: https://github.com/Trivyn/growl

The custom language transpiles to C, so the generated C source and Makefile are included in the repo.

I built this to integrate into a Knowledge Graph product I'm working on (hence the Rust bindings), but thought I'd open source the reasoner.

3

LockFS #

github.com
3 comments · 4:42 PM · View on HN
LockFS is a small open-source Java tool that encrypts files individually instead of bundling everything into a single container.

Many vault systems rely on large encrypted blobs or container files. They can become complex to handle as they grow and complicate backups across mixed storage sizes.

LockFS takes a file-level approach:

- Each file is encrypted independently
- No monolithic container growth
- Files can be added, moved, or removed without rewriting a large archive
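The file-level idea can be sketched in TypeScript with Node's crypto module. LockFS itself is Java, and this is only an illustration of why independent encryption avoids rewriting an archive: each file gets its own IV and auth tag, so sealing or removing one file never touches the others.

```typescript
// Illustration of per-file encryption (not LockFS's actual code):
// every file is sealed independently with AES-256-GCM.
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encryptFile(key: Buffer, plaintext: Buffer) {
  const iv = randomBytes(12); // unique IV per file
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function decryptFile(
  key: Buffer,
  sealed: { iv: Buffer; data: Buffer; tag: Buffer }
): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag); // verify integrity on decrypt
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}
```

Because each sealed file is self-contained, backups across mixed storage sizes only copy the files that changed.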

Contributions and feedback are welcome.

3

Open-source PDF layout analysis running entirely in the browser #

embedpdf.com
0 comments · 4:56 PM · View on HN
Hi HN, creator of EmbedPDF here.

I recently posted my open-source PDF viewer here, and one thing I really value is that it runs completely offline. I started wondering if we could push that further: could we do full ML layout analysis (detecting tables, headers, columns) directly in the browser?

To my surprise, it actually works.

The catch: It is far from production-ready. It crashes on most phones, and on older computers, it can be incredibly slow.

The why: I believe the future of document processing is local. Many users work with sensitive documents (bank statements, legal contracts) and simply do not want to upload them to a cloud endpoint just to parse a table or analyze layout.

This is a proof of concept for that future—where models get smaller, WASM/WebGPU gets faster, and we can keep data entirely on the client side.

Demo: https://www.embedpdf.com/layout-analysis Repo: https://github.com/embedpdf/embed-pdf-viewer

I'd love to hear your thoughts on the performance and where you think browser-based ML is heading.

3

Q12 – A constraint-based 2D drawing tool #

q12.app
1 comment · 6:43 PM · View on HN
Q12 is a new web-based 2D parametric drawing tool built specifically for geometric problem solving, for playing interactive "what if" games with drawings, and for the design and optimization of mechanisms.

Q12 has the usual set of drawing constraints found in other CAD tools (e.g. "lines are parallel") but also supports inequality constraints, area constraints, and arbitrary expressions between geometric quantities. We built Q12 to solve problems like those below, after finding that existing CAD systems couldn't handle them well:

* Given dimensions on a surveyor's map, figure out if the stated land areas are correct. If there's not enough information, show the extra measurements that are needed to pin things down.
* In a backyard design project, the volume of a swimming pool must be more than 20,000 gallons but its perimeter paving must be under 400 square feet. Drag the pool shape around to quickly explore the possibilities within these constraints.
* A 5-bar mechanism to clamp a workpiece must not push it sideways when contact is made. Animate the motion to get an intuitive feel for how design changes affect its behavior. Optimize to determine the best part dimensions.

This is the first public release - still early, but already useful. More features will come, including 3D support at some point. Suggestions big and small are of course welcome.

-Russ

3

See – searchable JSON compression, smaller than ZSTD (on our data) #

github.com
1 comment · 7:08 PM · View on HN
I built SEE (Semantic Entropy Encoding): a page-level JSON/NDJSON format that stays searchable while compressed (exists/pos/eq-style probes), using Bloom+skip + structure-aware encoding.
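The Bloom+skip part can be illustrated generically (this is a standard Bloom filter, not SEE's actual encoding): each compressed page keeps a small filter of the values it contains, so an exists-style probe can rule out pages without decompressing them.

```typescript
// Generic Bloom filter sketch for page skipping (not SEE's encoding):
// "definitely absent" answers let a probe skip a compressed page;
// "maybe present" is the only case that pays for decompression.
class BloomFilter {
  private bits: Uint8Array;

  constructor(private size: number, private hashes: number) {
    this.bits = new Uint8Array(size);
  }

  // FNV-1a-style hash, seeded so each of the k hash functions differs.
  private hash(value: string, seed: number): number {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.size;
  }

  add(value: string): void {
    for (let s = 0; s < this.hashes; s++) this.bits[this.hash(value, s)] = 1;
  }

  mightContain(value: string): boolean {
    for (let s = 0; s < this.hashes; s++) {
      if (!this.bits[this.hash(value, s)]) return false; // definitely absent
    }
    return true; // maybe present: only now decompress the page
  }
}
```

False positives just cost an unnecessary decompression; false negatives never happen, which is what keeps the probes correct.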

On our GitHub events dataset, SEE ended up smaller than Zstd-19 while still supporting random-access queries:

- combined: 40.4MB vs Zstd 71.8MB (raw 524.1MB) → 7.7% of raw
- str: 9.1MB vs Zstd 9.5MB
- int: 31.3MB vs Zstd 62.3MB

Lookup microbench (one column): p50 ~0.085ms.

Repo + release assets are here: https://github.com/kodomonocch1/see_proto

NDA eval request (optional): https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...

Happy to answer questions about the design trade-offs and where this beats “Zstd + separate index”.

3

Recall Lite – Local semantic search for Windows (Rust/Tauri, no cloud) #

github.com
1 comment · 9:45 AM · View on HN
windows search is useless. copilot recall sends your screen to microsoft. i got mass data, shit filenames, and zero patience. so i wrote my own.

you type what you mean, it finds the file. that's it. runs in tray, Alt+Space pulls it up.

- "photos from istanbul" → finds IMG_4392.jpg because it read the GPS from EXIF and reverse geocoded it
- "summer morning" → finds a photo shot in july at 8am because it expanded the date to human words
- OCR built in. windows native engine, zero extra install
- hybrid search: vectors + full-text + JINA cross-encoder reranker. not a toy
- chunks code at function boundaries. 60+ languages. not "split every 500 bytes" garbage
- MCP server included. one exe, plug into cursor/claude/copilot. your AI gets local codebase access, no API keys
- rust + tauri. lancedb. ~2GB models, downloaded once, runs forever. indexes 120+ file types. MIT.

https://github.com/illegal-instruction-co/recall-lite

3

Fory C++ Serialization – Polymorphism, Circular Refs, 12x vs. Protobuf #

fory.apache.org
0 comments · 5:45 PM · View on HN
Apache Fory's C++ serialization is our latest language support.

Highlights:

1. Automatic, idiomatic cross-language serialization: no adapter layer; serialize in C++, deserialize in Python.
2. Polymorphism via smart pointers. Fory detects std::is_polymorphic<T> automatically. Serialize through a shared_ptr<Animal>, get a Dog* back — zero boilerplate.
3. Circular/shared reference tracking. Shared objects are serialized once and encoded as back-references. Cycles don't overflow the stack.
4. Schema evolution. Compatible mode matches fields by name/id, not position. Add fields on one side without coordinating deployments.
5. IDL compiler (optional). `foryc ecommerce.fdl --cpp_out ./gen` generates idiomatic code for every language from one schema. Supports union → std::variant, optional, ref → std::shared_ptr, weak refs, lists, maps.
6. Row format. O(1) random field access by index, useful for analytics workloads where you only read a few fields per record.

Throughput vs. Protobuf: up to 12x depending on workload.

GitHub: https://github.com/apache/fory C++ docs: https://fory.apache.org/docs/guide/cpp

I’d really like critical feedback on API ergonomics, and production fit.

3

Syncpack v14, Monorepo CLI tool #

syncpack.dev
0 comments · 7:11 PM · View on HN
v14 is a Rust rewrite (from Effect.ts) with a new API, and it had been in public alpha for 7 months before being released as stable last night. Syncpack is a one-person project; if you're new to it, please check it out.
3

A browser-based search engine with 25ms query latency #

github.com
2 comments · 6:43 PM · View on HN
I stumbled onto this repo JustaNormalComputer-Guy.github.io claiming sub-0.1 second load times—averaging around 0.025s in my initial tests. Is this just aggressive caching, or is the client-side indexing logic actually that efficient? It seems way too fast for a standard web search. I’ve checked the Network tab and it stays under 100ms even on a throttled connection. Can someone help me verify if I'm missing something or if this is legit?
3

Env-rx – Catch missing .env variables before they break your CI #

github.com
4 comments · 9:42 PM · View on HN
Hi HN,

I built env-rx out of pure frustration with a painfully common problem. Someone on the team adds a new environment variable locally, forgets to share it or add it to the CI secrets, and the pipeline crashes right during deployment.

What makes it different: There are plenty of great secrets managers out there (like Doppler, Infisical, or Vault), but they often require team-wide buy-in, cloud syncing, and complex setups. I didn't want a heavy SaaS tool. I just wanted a lightweight, fast CLI utility that you can drop into any project, and it will loudly catch missing variables before you push or deploy.

It's designed to be zero-config. I’m releasing this open-source version first because I want to gather harsh, honest feedback from developers. I'd love to hear your thoughts on the DX or any edge cases I might have missed. If you manage to break it, please let me know!

3

X402 Agent Starter Kit: AI agents that pay for their own APIs #

gitlab.com
2 comments · 2:42 PM · View on HN
Hey HN, we built a set of production-ready AI agent templates with x402 micropayments baked in.

The problem: agents that need multiple APIs face signup/KYC/key-management overhead that doesn’t scale. x402 replaces all of that with HTTP-native payments.

Currently supporting USDC on Base, more integrations soon.

The kit includes 5 agent templates (web scraper, image gen, search, translation, code review). Clone, configure a wallet, run. 93 tests passing. Built on Coinbase Developer Platform.

This is the first project from our “1 app per week” studio experiment.

Repo: https://gitlab.com/artificial-lab/x402-agent-starter Docs: https://x402-kit.vercel.app Protocol: https://x402.org

Would love feedback on the architecture and the agent templates. What x402 use cases would you want to see next?

3

UltraPlot 2.0 – semantic legends, better layouts, faster imports #

github.com
0 comments · 8:41 PM · View on HN
UltraPlot v2.0.1 is out!

UltraPlot is a Matplotlib wrapper aimed at keeping Matplotlib’s flexibility while making common plotting workflows faster and more consistent.

v2.x focuses on semantic legends (categorical/numeric/size/geo), more reliable layout + axis-sharing in complex grids, guide architecture cleanup, CI hardening, and much faster import times via lazy loading.

We also launched a new docs site with a gallery: https://ultraplot.readthedocs.io/

Code: https://github.com/Ultraplot/UltraPlot

Feedback is very welcome, especially on legend API ergonomics and layout behavior in real figures.

2

Agent Paperclip: A Desktop "Clippy" That Monitors Claude Code/Codex #

github.com
1 comment · 5:38 PM · View on HN
Hi HN

I built a small desktop companion that monitors CLI AI coding agents so you don’t have to stare at the terminal during long tasks.

It shows when the agent is done, needs input, and the current token/context usage (useful to know when it’s about to compact). It’s fully local + free + open source: https://github.com/fredruss/agent-paperclip
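The "about to compact" signal described above boils down to watching context usage against a threshold. A minimal sketch, assuming a simple ratio check (the threshold and field names are illustrative, not the app's actual logic):

```python
# Warn when an agent session's context usage crosses a threshold,
# i.e. when the agent is likely to compact soon.
def context_status(used_tokens, context_window, warn_at=0.8):
    ratio = used_tokens / context_window
    return "near-compaction" if ratio >= warn_at else "ok"

print(context_status(170_000, 200_000))  # near-compaction (85% used)
print(context_status(10_000, 200_000))   # ok
```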

It supports Claude Code via hooks and Codex CLI by watching local session files. Default sticker pack is a small Clippy nod (no affiliation, Microsoft please don't sue me).

Next on the to-do list: multi-session visibility.

Would love feedback / issues / stars.

2

Wakapadi – Meet locals and travelers nearby and join free walking tours #

wakapadi.io
0 comments · 6:10 PM · View on HN
Hi HN,

I built Wakapadi after noticing that most travel tools focus on planning trips, but not on actually helping people connect once they arrive somewhere new.

When traveling, it’s often hard to meet locals or other travelers unless you already know someone, join organized tours, or rely on chance. I wanted to make discovery more natural — seeing who’s nearby, joining free walking tours, and exploring cities together.

Wakapadi currently allows users to:

- discover free walking tours
- see nearby travelers and locals who are open to meeting
- connect and chat before meeting
- explore cities in a more social way

The project is still early, and I’m especially interested in feedback on:

- safety and privacy expectations
- what would make you comfortable meeting people while traveling
- features that would make this genuinely useful instead of another travel app

Happy to answer any technical or product questions.

2

ClawShield – Open-source firewall for agent-to-agent AI communication #

2 comments · 10:03 PM · View on HN
Hi HN!

I built ClawShield after discovering 40,214 OpenClaw instances exposed with critical CVE-2026-25253 (CVSS 8.8).

The problem: AI agents communicate with each other at scale, but there's NO firewall between them. A compromised agent can inject prompts, exfiltrate data, and hijack WebSocket sessions.

ClawShield sits between agents and blocks:

- Prompt injection (16+ patterns)
- Malicious skills/plugins (AST + sandbox)
- Credential leaks (regex + entropy)
- Unauthorized agent-to-agent comms
- WebSocket hijacking

Built it last night. 181 tests. Production-ready. Open source (AGPL-3.0).

GitHub: https://github.com/DEFNOISE-AI/ClawShield
Demo: [coming soon]

Compatible with OpenClaw, AutoGPT, or any agent protocol.

Free tier for personal use, paid for teams/enterprise.

Would love your feedback!

2

LLM Gateway for OpenAI/Anthropic Written in Golang #

github.com
1 comment · 10:04 PM · View on HN
Hi HN - I'm Nathan. I spent a bunch of years building Shopify subscriptions software, living in the land of failed payments, retries, and "if this breaks, it breaks real money." We built a lot of automation around recovery: intelligent retry logic, routing decisions, backoffs, and all the messy edge cases you only find at scale.

When I started building AI/LLM features, I kept running into the same class of problems - except harder to reason about. Multiple providers, model quirks, intermittent failures, retries/fallbacks, and a constant question of "what actually happened?" Observability was the recurring pain point. I wanted something that didn't feel like a black box, especially once you're running real workloads and latency or errors spike for reasons that aren't obvious.

So I started building the tool I wished I had: an open-source LLM gateway / proxy in Go.

I fell into Go mostly for practical reasons: high concurrency and throughput without fighting the runtime, and a strongly-typed codebase that stays pleasant as it grows. Over time it turned into something more personal - I've found my home in Go, and this project is where I've been putting that energy.

Open source is a deliberate choice here. Coming from payments + ecommerce, trust isn't a tagline - it's operational. People need to understand what's happening under the hood, and they need to be able to verify it. I've been building software for ~15 years, and I wanted to contribute something real back to the communities that taught me how to build reliable systems.

Repo: https://github.com/ongoingai/gateway

Feedback, criticism, "you're doing this wrong," feature ideas, weird edge cases you're hitting - all welcome. If you've built anything similar (AI infra, gateways, proxies, high-throughput Go services), I'd especially love to hear what you'd consider non-negotiable for something like this.

Cheers, Nathan @ OngoingAI

2

Prompts are coupled to LLMs and nobody builds tooling for it #

github.com
0 comments · 2:41 PM · View on HN
I went down a rabbit hole trying to understand why my Claude prompts turn to garbage on GPT-4 and vice versa. Not just "slightly worse" — fundamentally broken. Turns out researchers have already measured this: removing colons from a prompt template swings LLaMA-2-13B accuracy by 78 percentage points (Sclar et al., ICLR 2024). The format that works best on one model family overlaps less than 20% with what works best on another (He et al. 2024).

So I went looking for a tool that handles this. Checked DSPy, Guidance, Outlines, PromptLayer, LMQL, Braintrust, Humanloop, Maxim, MLflow, Prompty, Promptomatix. Eleven tools. Zero of them adapt input prompt format per model. They all either optimize what the prompt says or constrain what the model outputs. The actual structural packaging of the input? Manual everywhere.

Then I looked at how production tools deal with it today. Aider has a 2,718-line YAML file with 313 model configs. Some models get "you NEVER leave comments without implementing" and Claude gets the literal opposite instruction. Claude Code only works with Anthropic models — third parties have built LiteLLM proxies and Node.js fetch interceptors to hack around it. Cursor's docs say "switch to a different model and try again."

The paper maps this to Constantine's coupling taxonomy from 1974 (content, common, control, stamp, data coupling). Same structural problem, different domain. I called it "prompt coupling" because that's what it is — your prompt is coupled to your model the same way a module can be coupled to another module's internals.

Also built promptc (https://github.com/shakecodeslikecray/promptc) — transparent HTTP proxy, rewrites prompt structure per model, zero code changes to existing tools. It's a proof of concept, not a product. The paper is the actual contribution.
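The proxy's core move, rewriting prompt structure per model family, can be shown with a toy renderer. To be clear, the model names and format rules below are made up for illustration; they are not promptc's actual rewrite rules.

```python
# Toy per-model-family prompt formatting: the same logical messages get
# packaged differently depending on which model will consume them.
FORMATS = {
    "claude": lambda role, text: f"<{role}>\n{text}\n</{role}>",
    "gpt":    lambda role, text: f"### {role.upper()}:\n{text}",
}

def render(model_family, messages):
    fmt = FORMATS[model_family]
    return "\n\n".join(fmt(m["role"], m["content"]) for m in messages)

msgs = [{"role": "system", "content": "Be terse."}]
print(render("gpt", msgs))     # ### SYSTEM:\nBe terse.
print(render("claude", msgs))  # <system>\nBe terse.\n</system>
```

Doing this at a transparent proxy means existing tools keep emitting one canonical prompt shape while each model receives the packaging it was measured to prefer.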

First paper. Independent researcher. If the framing is wrong, I'd rather hear it here than after it's indexed.

2

PolyMCP – MCP Tools, Autonomous Agents, and Orchestration #

0 comments · 2:45 PM · View on HN
I built PolyMCP, a framework and runtime for exposing Python functions as MCP tools, serving them via standardized MCP servers, and orchestrating them with autonomous agents that can plan and execute multi‑step workflows.

PolyMCP is more than just an MCP server: it turns existing Python code into agent‑ready tools and gives agents the ability to discover, compose, and orchestrate across multiple services with adaptive planning and real‑world execution support.

Key parts of the ecosystem:

1) Expose Python functions as MCP tools

Use existing Python functions directly as MCP tools without rewriting them:

    from polymcp.polymcp_toolkit import expose_tools_http

    def add(a: int, b: int) -> int:
        return a + b

    app = expose_tools_http([add], title="Math Tools")

Type hints automatically generate structured tool schemas, and input/output validation and error handling are included. Multiple functions can be exposed on the same server.
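The hints-to-schema step can be sketched with the standard library. This is an illustration of how type hints can drive a tool schema, as expose_tools_http presumably does internally; the exact schema shape below is an assumption, not PolyMCP's actual output.

```python
# Derive a JSON-Schema-like tool description from a function's type hints.
import inspect

PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn):
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "parameters": {
            "type": "object",
            "properties": props,
            "required": list(props),
        },
    }

def add(a: int, b: int) -> int:
    return a + b

print(tool_schema(add))
```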

2) Autonomous agent: PolyClaw

PolyClaw goes beyond tool calling. It:

• Decomposes complex tasks into executable steps
• Selects and orchestrates MCP tools dynamically
• Starts or connects to MCP servers on demand
• Validates outputs before proceeding
• Adapts plans when execution fails
• Runs everything in isolated Docker containers

Example run:

    polymcp agent run \
      --type polyclaw \
      --query "Build a sales reporting pipeline and test it end-to-end" \
      --model minimax-m2.5:cloud \
      --verbose

Under the hood, the system plans, provisions infrastructure as needed, executes steps sequentially or in parallel, and handles adaptive replanning when something fails.

Why this matters

Most AI agent systems today either call tools statically or assume the infrastructure already exists. PolyMCP instead:

• Makes existing Python code agent-ready with minimal friction
• Standardizes tools via MCP so multiple agents and services can interact with them
• Provides autonomous orchestration across multiple services
• Spins up infrastructure dynamically when needed
• Validates step results and recovers from failures
• Uses Docker for safe, isolated execution

PolyMCP is useful for enterprise automation, DevOps workflows, data pipelines, internal tooling orchestration, and any complex multi‑tool reasoning tasks where agents must plan and execute reliably.

Repo: https://github.com/poly-mcp/PolyMCP

Happy to answer questions.

2

Hardware.dog: automated schematic and PCB review #

hardware.dog
0 comments · 5:58 PM · View on HN
I design a lot of hardware projects and kept running into the same problems:

– digging through long datasheets to find constraints
– checking whether parts were risky or going out of stock
– looking for reference designs that already solved similar problems
– catching obvious power or schematic issues too late

So I started building small internal tools to help with this, and eventually turned them into a web tool.

Right now it can:

• review schematics and PCB exports for common issues
• summarize datasheets and highlight important constraints
• suggest alternative parts that are in stock
• estimate power usage and help build a power tree
• find relevant reference designs

It’s still early and definitely imperfect, but it’s already sped up my own workflow quite a bit.

I’m curious what parts of hardware design people here find most repetitive or annoying. What tools do you wish existed?

https://hardware.dog

2

MCGrad – Fix ML Calibration in Subgroups (Open Source from Meta) #

github.com
0 comments · 7:00 PM · View on HN
Hi HN,

We're the team at Meta open-sourcing MCGrad. We built this because we found that models often look calibrated on global metrics but fail silently on specific data slices (subgroups).

This library provides production-ready implementations of "multicalibration" to detect and fix these local biases.

Unlike standard calibration (which looks at the average), MCGrad optimizes calibration across thousands of potentially overlapping subgroups simultaneously. It’s written in Python and designed to scale, and it is currently used in production by hundreds of ML models at Meta!
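The global-vs-subgroup distinction is easy to see with a small numeric example. This is illustrative arithmetic only, not MCGrad's estimators: a model can look perfectly calibrated on average while being biased inside each slice.

```python
# Calibration gap = mean predicted probability minus observed positive rate.
def calib_gap(pairs):
    preds = [p for p, _ in pairs]
    outcomes = [y for _, y in pairs]
    return sum(preds) / len(preds) - sum(outcomes) / len(outcomes)

# Two subgroups with opposite biases that cancel out globally.
group_a = [(0.9, 1), (0.9, 1)]  # predicts 0.9, true rate is 1.0
group_b = [(0.1, 0), (0.1, 0)]  # predicts 0.1, true rate is 0.0
overall = group_a + group_b

print(round(calib_gap(overall), 2))  # 0.0  - looks calibrated globally
print(round(calib_gap(group_a), 2))  # -0.1 - under-predicts in this slice
print(round(calib_gap(group_b), 2))  # 0.1  - over-predicts in this slice
```

Multicalibration targets the per-slice gaps directly rather than the cancelled-out global one.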

It includes:

- Estimators for detecting miscalibration.
- Algorithms to recalibrate predictions (post-processing).
- Tools to visualize where your model is underperforming.

Docs: https://mcgrad.dev

Repo: https://github.com/facebookincubator/MCGrad/

Happy to answer any questions about the implementation or how we use it!

2

Opaal Visual multi-agent prompt designer for Claude Code and agentic AI #

github.com
0 comments · 2:59 PM · View on HN
Hi HN!

I built Opaal because writing multi-agent orchestration prompts was becoming tedious and error-prone. Every time I wanted to coordinate 3-5 AI agents on a complex task, I would spend 20+ minutes crafting the prompt by hand.

Opaal (Orchestration Prompts for Agentic AI Launch) lets you design these workflows visually instead. You drag agent cards onto a canvas, organize them into phases (columns), draw connections between them, and the app generates a production-ready prompt automatically. The prompt updates live as you build.

Built with Electron + React + React Flow + Zustand + Tailwind CSS v4.

Key features:

- 15 agent roles (Researcher, Architect, Developer, Reviewer, etc.)
- Smart auto-connections between adjacent phases
- Manual wiring for custom data flow
- 3 starter templates (Code Review, Feature Build, Bug Fix)
- Auto-detects installed Claude Code skills
- Save/load .opaal files, export to CLAUDE.md
- Full keyboard shortcuts, undo/redo, multi-select

MIT licensed. Would love feedback on what features would make this more useful for your workflows.

2

I analyzed 120 films to help screenwriters test narrative structure #

arc.quanten.co
0 comments · 7:42 AM · View on HN
Almost 10 years ago, I used to host a film club in my office. A friend of mine was hosting one in the city and they needed a space.

I had an open-floor office, which meant we could clear the space, fire up the projector, and host the film club. They met from 9pm to 7am once a month, on a Friday. Each meeting had 100-150 people attending.

This went on for a few months. A curator was assigned to each meet, who would take a theme and showcase films from the first one ever made in that genre all the way to modern versions, tracing how the evolution happened. It was a deconstruction of the craft.

The conversation also involved some of these filmmakers showing their own works, and how they did certain cuts and why.

What I observed was that none of these creative folks had any data to back the decisions they were making. It was purely gut and intuition, and I could see some editors and producers rolling their eyes - they knew what that meant. Gut and intuition mean uncertainty, and uncertainty plays tricks with your head: you keep making variations until at some point you develop tunnel vision.

Something a director told me stuck with me. He said that at some point you just want to be done with the project - between the studio's demands and the producers' prodding, you let it go and move on to the next one.

A producer once said that if every director were allowed to have their way, every movie would be 4 hours long and not a single shot of footage would be left out.

But then there are cases like Man of Steel, where Zack Snyder's cut was better - but the studio's and producer's call won, and audiences weren’t as thrilled when they watched the film. And apart from the real enthusiasts, nobody really hunts down the director's cut of a film anyway.

The need was clear: the industry needed analytics to know what works and what doesn’t, similar to how startups got the lean startup framework, which shifted everything toward building a minimum lovable product and iterating from there. The caveat: unlike a product - which can launch, acquire analytics, and be tweaked and re-released - there is no re-release of a film. It gets one shot, and if it misses, it's done. That helps explain the film industry's roughly 7% hit rate.

It has been a little over 10 years since I hosted the film club, but that issue lingers - and given that the industry spends over $150bn a year creating content (TV shows, movies), it is a big problem.

We started using hardware that captures oculometric data and heart rate, used complementarily during audience test screenings - and that gave us insane depth, down to the microsecond, on where content was engaging and where it was failing.

But there was a catch: after a film has been shot, redoing shots becomes extremely expensive. What the industry calls "pickup shots" are seldom done, because the artists have moved on and recreating that exact scene and moment is extremely hard. So we built a database of 120-odd films across various genres and used that audience data to train a custom model that looks at past films to build benchmark data - and then uses it as comparables against scripts someone might be planning.

We launched this as Quanten Arc (arc.quanten.co) last week. It helps filmmakers - especially indie filmmakers, who could use all the data in the world because they might not even have the budget for audience testing. It helps AI filmmakers and studios even more, since they can now identify the exact scenes that aren't working and regenerate them with the required changes in the narrative.

I'm curious to hear what you think about it. Am I solving a real problem, or am I imagining a problem that doesn't exist and getting caught up in the beauty of data?

1

Conduit: One Swift interface for every AI provider, on-device and cloud #

github.com
0 comments · 2:21 AM · View on HN
I built Conduit because I was tired of writing the same streaming boilerplate five times for five different AI providers, then rewriting it every time a new one became interesting. So I stopped. The core idea: one protocol hierarchy, every provider. Switch from Claude to a local Llama model running on Apple Silicon with a one-line change. No vendor lock-in at the call site.

The interesting decision was going actor-first from day one. Every provider is a Swift actor. You get data-race freedom enforced at compile time, not by convention. Swift 6.2's strict concurrency makes this a hard guarantee, not a README promise. LangChain can't say that.

The part I'm most proud of: @Generable

    @Generable
    struct FlightSearch {
        @Guide(description: "Origin airport code")
        let origin: String

        @Guide(description: "Departure date", .format(.date))
        let date: Date

        @Guide(.range(1...9))
        let passengers: Int
    }

    let result = try await provider.generate(
        "Book me a flight to Tokyo next Friday",
        model: .claude3_5Sonnet,
        returning: FlightSearch.self
    )

The macro expands at compile time (via swift-syntax) to generate JSON Schema, streaming partial types, and all conversion boilerplate. The API is deliberately aligned with Apple's new Foundation Models framework — so the same struct works against on-device Apple models on iOS 26 and against Claude or GPT-4 with zero changes.

On-device is a first-class citizen, not an afterthought

Most Swift AI SDKs treat cloud as the primary path and shim local models in awkwardly. Conduit treats MLX, llama.cpp, Core ML, and Apple's Foundation Models as fully equal providers. A ChatSession configured with an MLX Llama model and one configured with GPT-4o are indistinguishable at the call site.

Trait-based compilation keeps binary size sane

AsyncThrowingStream all the way down

Cancellation works via standard Swift task cancellation - no special teardown protocol. Back-pressure is handled naturally by the async iterator.

12 providers, one interface

Anthropic, OpenAI, Azure OpenAI, Ollama, OpenRouter, Kimi, MiniMax, HuggingFace Hub, MLX, llama.cpp, Core ML, Foundation Models. The OpenAI-compatible ones share a single OpenAIProvider actor - the named variants are thin configuration wrappers, not code forks.

https://github.com/christopherkarani/Conduit

Happy to dig into the actor model approach, the macro expansion strategy, or why wrapping LangChain was never an option.

1

Browser-based hand gesture T9 keyboard (YOLOX and ONNX Runtime Web) #

ketsuin.clothpath.com
0 comments · 6:59 AM · View on HN
I built a small experiment over a 3-hour vibe coding session: a real-time T9 keyboard controlled by hand gestures, running entirely in the browser.

It uses:

- YOLOX for gesture detection
- ONNX Runtime Web for in-browser inference
- Plain JS for the UI

The original goal was simple: Could I make real-time gesture-based input usable inside a browser without freezing the UI?

A few observations:

- In-browser ML performance is better than I expected on modern laptops
- Subtle gesture distinctions (e.g. similar seals like Tiger vs Ram) require stronger detection than MediaPipe provided - YOLOX performed noticeably better
- Lighting consistency matters more than hand size

It’s obviously not production-grade, but it was an interesting exploration of browser-based vision input.

Curious what others think about gesture interfaces as alternative input systems.

Demo: https://ketsuin.clothpath.com/

1

Equidistance – find a meeting spot that's equally painful for everyone #

equidistance.io
1 comment · 9:37 AM · View on HN
I built this after the usual tussle with friends about where to meet.

I tried every other meet-in-the-middle app, but they weren’t practical because they relied solely on geographic midpoints. The centre can be a 45-minute trip for one person and a 10-minute trip for another, depending on the route.

Equidistance uses the Google Maps Distance Matrix API to test a grid of candidate points and picks the one that minimises the difference in travel times. It then searches for actual venues (cafes, pubs, etc.) near that point and scores them by fairness.
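The fairness objective can be sketched independently of the Maps API. A minimal sketch of the idea, assuming `travel_time` stands in for a Distance Matrix lookup: among candidate points, pick the one minimising the spread between everyone's travel times.

```python
# Pick the candidate point that minimises the gap between the slowest
# and fastest traveller - "equally painful for everyone".
def fairest_point(candidates, origins, travel_time):
    def spread(point):
        times = [travel_time(o, point) for o in origins]
        return max(times) - min(times)
    return min(candidates, key=spread)

# Toy 1-D example: two people at positions 0 and 10, travel time = distance.
candidates = range(11)
best = fairest_point(candidates, [0, 10], lambda o, p: abs(o - p))
print(best)  # 5 - the midpoint equalises travel times in this symmetric case
```

With real routing data the winner is often not the geographic midpoint, which is exactly the failure mode of the other meet-in-the-middle apps described above.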

It supports public transport, driving and walking. There’s also a departure time setting, since equidistant times depend on which trains are running.

The app prioritises independent businesses, but it will show chains if nothing else is available.

Stack: vanilla JS, Google Maps APIs (Distance Matrix, Maps JavaScript, Places)

1

Built a Next.js CLI to scaffold UI blocks, sections, and page templates #

github.com
0 comments · 2:22 PM · View on HN
I built nextworks, a small CLI that installs core UI components, landing page sections like navbar/hero/features, and full page templates into an existing Next.js project. MIT licensed.

I got tired of rebuilding the same sections every time I started a new Next.js landing page. This is alpha and it copies files into your repo (it may overwrite on path collisions), so feedback on installs + generated code is what I need most.

Commands:

    npx create-next-app@latest
    cd <your-app>
    npx nextworks@latest add blocks --sections --templates
    npm install
    npm run dev

1

Seamless Auth – open-source passwordless authentication #

github.com
0 comments · 2:22 PM · View on HN
I hope this finds you with your minds open!

I’ve been building an open source authentication system called Seamless Auth.

It is designed around the idea that authentication should behave like infrastructure: it should promote security and be easy to reason about.

Seamless Auth is:

- Fully open source
- Passwordless only (WebAuthn, passkeys, OTP)
- Cookie-based session validation
- No redirect-based login flows
- Designed to run inside your own infrastructure

The core is framework-agnostic, with adapters for Express today. There is also a React SDK that exposes authenticated session state without client-side token management.

It supports:

- Server-side session validation
- Explicit CORS and origin configuration
- An isolated infrastructure model you can self-host
- A production-shaped local development flow with Docker

You can run it locally with 3 commands thanks to the open source CLI tool:

    npx create-seamless my-app
    cd my-app
    docker compose up

This spins up a template UI (React), a template API (Express), the auth server, and a database (Postgres), complete with migrations.

The project grew out of frustration with:

- Redirect-heavy OAuth flows
- Shared multi-tenant auth servers
- Magic SDKs that hide too much
- Development environments that do not resemble production
- and worst of all... forgetting my damn password!

The goal is not to replace everything. It is to offer a transparent, inspectable, infrastructure-first alternative for teams that care about understanding their authentication layer.

I would appreciate feedback on:

- Architecture decisions
- Security assumptions
- Developer experience
- Tradeoffs I may be missing

Repositories:

Auth Server: https://github.com/fells-code/seamless-auth-api
CLI: https://github.com/fells-code/create-seamless
React SDK: https://github.com/fells-code/seamless-auth-react/tree/main
Server SDKs: https://github.com/fells-code/seamless-auth-server

Documentation: https://docs.seamlessauth.com

Happy to answer questions.

1

I built an app that forces me to drink water before I can open TikTok #

thirsttr.app
0 comments · 2:23 PM · View on HN
I wasn’t trying to reduce screen time but trying to drink more water.

So I built an iPhone app (SwiftUI) that locks selected apps using Apple’s Screen Time APIs. To unlock them, ThirstTrapp activates the front camera and watches for ~15 seconds while I drink.

On-device vision checks for:

- a face
- a drinking container (glass/bottle)
- a plausible drinking gesture (container overlapping mouth region over time)

If that sequence is detected for ~15 seconds, the apps unlock for a configurable window (e.g. 2 hours). Then they lock again until the next drink.
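The temporal part of the check, "detected for ~15 seconds", amounts to requiring the condition to hold across enough consecutive frames. A hedged sketch, with a stand-in for the per-frame vision model (the frame fields and frame counts are illustrative, not the app's real detector):

```python
# Unlock only if a predicate holds for `needed` consecutive frames.
def sustained(frames, predicate, needed):
    run = 0
    for frame in frames:
        run = run + 1 if predicate(frame) else 0
        if run >= needed:
            return True
    return False

# Each frame flags face/container/overlap; all three must hold together.
frames = [{"face": True, "container": True, "overlap": i > 2}
          for i in range(20)]

def drinking(f):
    return f["face"] and f["container"] and f["overlap"]

print(sustained(frames, drinking, needed=15))  # True (17 consecutive hits)
```

Resetting the run counter on any failed frame is what makes brief container-near-mouth flickers insufficient to unlock.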

It doesn’t try to stop me from scrolling. It just makes hydration the entry fee. And since I'm addicted to scrolling I will actually drink. It’s slightly absurd, but it’s been more effective for me than reminder apps ever were.

All processing happens on-device. No images or video leave the phone or are stored. No account registration or login.

Curious what you think - especially from a technical perspective.

1

TextAnimations.online – Generate MP4/GIF animations from a prompt #

textanimations.online
0 comments · 2:26 PM · View on HN
Recently I had to make an intro animation for a product demo, and the quickest way I could think of was to screen-record an LLM-generated HTML/JS page with the animation.

I turned that idea into a polished project here. It is super fun to play with. I've been creating all kinds of custom-text GIFs to send to friends.

Under the hood, it generates HTML and records the video client-side.

Would love to hear feedback on it!

1

Kkr-Query2xlsx – SQL Runner to XLSX/CSV (GUI+CLI, SQLite Demo) #

github.com
0 comments · 2:29 PM · View on HN
Hi HN - I’m sharing a lightweight SQL query runner I’ve been using internally for years and finally packaged for others.

kkr-query2xlsx runs SQL (GUI or CLI) and exports results to:

* XLSX (supports templates - useful for formatting / existing sheets)
* CSV (custom delimiters / encodings)

It includes a SQLite demo, so you can try it without any database setup. MIT-licensed.

Windows build: https://github.com/kkrysztofczyk/kkr-query2xlsx/releases
Feedback/bugs (please avoid sensitive data): https://github.com/kkrysztofczyk/kkr-query2xlsx/issues/new/c...

1

Clawlet – AI agent with built-in semantic memory, one binary #

github.com
0 comments · 2:34 PM · View on HN
Clawlet is a personal AI agent that ships as a single, self-contained binary. No runtime, no package manager, no external database. The main thing that sets it apart: built-in hybrid semantic memory search (vector similarity + full-text) using a bundled SQLite with vector extensions. The index is just a local .sqlite file — no separate vector DB to run. Drop the binary on any machine and memory search just works.
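The hybrid part of "vector similarity + full-text" is a merging problem: two rankings over the same documents have to be combined into one. A minimal sketch of weighted score fusion; Clawlet's actual SQLite-backed implementation will differ, and the alpha weighting here is one common choice, not necessarily theirs.

```python
# Merge vector-similarity and full-text scores into one ranking.
def hybrid_rank(vec_scores, fts_scores, alpha=0.5):
    """Combine two {doc_id: score} maps; missing scores count as 0."""
    ids = set(vec_scores) | set(fts_scores)
    combined = {
        i: alpha * vec_scores.get(i, 0.0) + (1 - alpha) * fts_scores.get(i, 0.0)
        for i in ids
    }
    return sorted(combined, key=combined.get, reverse=True)

vec = {"a": 0.9, "b": 0.4}          # semantic similarity hits
fts = {"b": 0.8, "c": 0.7}          # keyword (full-text) hits
print(hybrid_rank(vec, fts))        # ['b', 'a', 'c'] - 'b' scores in both
```

The appeal of doing this inside one SQLite file is that both indexes live next to the data, so the merge is a local query rather than a call to a separate vector database.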

GitHub: https://github.com/mosaxiv/clawlet

1

Browser Terminal Use – run local CLI/agent loops in browser terminals #

github.com
1 comment · 2:38 PM · View on HN
I built Browser Terminal Use: a Chrome extension + local daemon + CLI that lets you run commands in a browser-hosted terminal from your local shell on macOS.

Why I built it: I wanted a bridge that keeps local automation while executing remotely in the browser terminal context.

How it works:

- The browterm CLI sends exec requests to browterm-daemon on localhost.
- The daemon serializes requests (single active command) and routes them to a bound Chrome terminal tab.
- The command is wrapped with markers to extract clean output + the remote exit code.
- Supports timeout and cancel.
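The marker technique mentioned above can be sketched concretely. The marker strings and wrapping shape here are illustrative assumptions, not browterm's actual protocol, but they show how clean output and an exit code are recovered from a noisy terminal stream:

```python
# Wrap a remote command with sentinel markers, then parse the stream back.
import re

BEGIN, END = "__BT_BEGIN__", "__BT_END__"

def wrap(cmd):
    # The trailing `$?` captures the remote command's exit status.
    return f"echo {BEGIN}; {cmd}; echo {END}:$?"

def parse(stream):
    m = re.search(rf"{BEGIN}\n(.*){END}:(\d+)", stream, re.S)
    return (m.group(1).rstrip("\n"), int(m.group(2))) if m else (None, None)

# Simulated terminal output for `echo hello`:
out, code = parse("__BT_BEGIN__\nhello\n__BT_END__:0\n")
print(out, code)  # hello 0
```

Anything the terminal prints outside the markers (prompts, banners, shell echo) is simply ignored by the parser, which is what makes the extraction robust across vendors.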

Current limitations:

- Browser terminal websocket protocols vary by vendor.
- Some UIs can block input fallbacks.
- Cross-origin iframe terminals can reduce observability.

I’d really value feedback on reliability across terminal vendors and ideas for improving compatibility.

1

We proved you can't train hallucinations out of AI – so we verify #

github.com
0 comments · 2:45 PM · View on HN
Hi HN, I'm Ty. I built Assay because I got tired of shipping bugs that AI hallucinated into my code and no tool caught.

The starting point was a finding that surprised me: when we tried training verification directly into models using RLVF (Reinforcement Learning from Verification Feedback), more training data made the model worse. 120 curated pairs hit 91.5% accuracy. 2,000 pairs collapsed to 77.4%. The model's training loss kept decreasing while eval performance cratered. This isn't a tuning problem. Verification cannot be internalized.

So we built an external layer. Assay extracts the implicit claims code makes ("this handles null input," "this query is injection-safe," "this validates auth tokens") and verifies each one against the actual implementation. It's not a linter, not another LLM-as-judge — it's structured claim extraction followed by adversarial verification.

Results validated against real test suites (not LLM judgment):

- HumanEval: 100% pass@5 (164/164) - baseline was 86.6%
- SWE-bench: 30.3% (91/300) vs 18.3% baseline - +65.5%
- LVR pilot: found 23 real bugs (2 critical) in a production ERP system, verified 354 claims
- LLM-as-judge actually regresses at k=5 (97.2% vs our 100%) because it hallucinates false positives

Ships as a GitHub Action for PR verification, or try it: npx tryassay assess /path/to/your/project

Paper: https://doi.org/10.5281/zenodo.18522644

1

CSL MCP Server – Write and Verify AI Safety Policies from Claude/Cursor #

pypi.org
1 comment · 2:50 PM · View on HN
CSL-Core is a policy engine that formally verifies AI agent constraints using Z3. Instead of prompting an LLM to behave, you write a small policy in CSL (Constitutional Specification Language), Z3 proves it has no contradictions at compile time, and a deterministic runtime enforces it — completely outside the model. We just shipped a built-in MCP server with 4 tools:

verify_policy: Z3 formal verification in one call

simulate_policy: test any JSON input, get ALLOWED/BLOCKED

explain_policy: human-readable breakdown of any policy

scaffold_policy: describe what you want in English, get a CSL template

This means you can do this from Claude Desktop or Cursor:

"Write me a policy that blocks transfers over $5000 for non-admin users"

→ scaffold generates a CSL template

→ verify proves it has no contradictions

→ simulate tests your edge cases

→ all without leaving your editor

The full loop, from English description to mathematically verified, runtime-enforced policy, happens inside your AI assistant.

Why not just prompt the LLM to enforce rules? We benchmarked GPT-4o, Claude Sonnet 4, and Gemini 2.0 Flash as guardrails with a hardened system prompt. Every model was bypassed by at least one attack (context spoofing, multi-turn role escalation, unicode homoglyphs). CSL-Core blocked all of them, because the LLM never touches the enforcement layer.
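The reason deterministic enforcement survives those attacks is that the rule runs outside the model entirely. A toy enforcement layer in that spirit, using the $5000 transfer example from above (the policy shape is illustrative Python, not CSL syntax):

```python
# Deterministic rule check: no prompt content can alter this code path.
def enforce(action, policy):
    if (action["type"] == "transfer"
            and action["amount"] > policy["max_transfer"]
            and action["role"] != "admin"):
        return "BLOCKED"
    return "ALLOWED"

policy = {"max_transfer": 5000}
print(enforce({"type": "transfer", "amount": 9000, "role": "user"}, policy))
# BLOCKED
print(enforce({"type": "transfer", "amount": 9000, "role": "admin"}, policy))
# ALLOWED
```

However cleverly an attacker phrases the request, the LLM's output is just an `action` dict by the time it reaches this layer, so role escalation has to happen in data the runtime controls, not in prose.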

Setup:

pip install "csl-core[mcp]"

Claude Desktop config:

    {
      "mcpServers": {
        "csl-core": {
          "command": "csl-core-mcp"
        }
      }
    }

Or with Docker:

docker build -t csl-core-mcp .

docker run -i csl-core-mcp

GitHub: https://github.com/Chimera-Protocol/csl-core

Listed on awesome-mcp-servers:

https://github.com/punkpeye/awesome-mcp-servers

Previous Show HN discussion: https://news.ycombinator.com/item?id=46963250

Would love feedback, especially from anyone building agentic systems where safety guarantees matter.

1

Sievers – a Rust Sieve filter editor #

github.com
0 comments · 2:55 PM · View on HN
Hey there HN, I just built this basic Sieve email filter editor. I usually struggle with keeping my email filters up to date, and in the past few years I've moved away from major providers like M$ or Google, so I had to fall back on good ol' Sieve for filtering. After a long time doing it manually, I decided to build a GUI to edit the rules, since I couldn't find any GUI editor out there. That's it - just a commodity tool, nothing fancy. Hopefully someone will find it useful :D
1

A pay-per-request API to search social media posts #

apidirect.io
0 comments · 3:33 PM · View on HN
Hey HN,

API Direct is a single REST API that lets you search posts and comments across LinkedIn, Twitter/X, Reddit, YouTube, Instagram, and forums.

Every endpoint returns the same JSON structure (title, url, date, author, snippet), so you don't need to write normalization logic for each platform.

Pricing is pay-per-request starting at $0.003. No monthly fees. 50 free requests per endpoint per month to try it out with no card required.

Technical details:

- Consistent query parameters across endpoints (query, sort_by, page)

- Pagination support

- Concurrency limit of 3 simultaneous requests per endpoint

- Spending caps you can set in the dashboard
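As a sketch of what the uniform interface buys you, here is hypothetical client code: the host and path are placeholders, but the three query parameters and the five response fields come from the description above.

```python
# Hypothetical client sketch for a unified social-search API. The
# host/path are placeholders, not API Direct's real routes; only the
# parameter names and response fields come from the post.
from dataclasses import dataclass
from urllib.parse import urlencode

@dataclass
class Post:
    title: str
    url: str
    date: str
    author: str
    snippet: str

def build_url(platform: str, query: str, sort_by: str = "date", page: int = 1) -> str:
    # The same three parameters work on every endpoint.
    qs = urlencode({"query": query, "sort_by": sort_by, "page": page})
    return f"https://example.invalid/{platform}/search?{qs}"

def parse_results(payload: list[dict]) -> list[Post]:
    # One parser covers every platform because the JSON shape is uniform.
    return [Post(**item) for item in payload]

print(build_url("reddit", "brand mentions"))
# https://example.invalid/reddit/search?query=brand+mentions&sort_by=date&page=1
```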

I built this mainly for developers who need social data for things like brand monitoring, lead gen, market research, or feeding context into LLMs/agents without subscribing to an enterprise platform.

Would appreciate any feedback on the API, pricing, or what you'd want to see added. I'm sure there are lots of other platforms that I can add.

Josh

1

Agentpriv – Sudo for AI Agents #

github.com
0 comments · 3:37 PM · View on HN
AI agents call tools autonomously, but some calls (delete_db, reset_state) shouldn't run unchecked.

agentpriv is a tiny permission layer: wrap any callable with allow/deny/ask and it gates execution before the function runs.

Zero dependencies, ~100 lines, works with any framework or plain Python. Happy to hear what's missing.
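A minimal sketch of the allow/deny/ask pattern as described, assuming a decorator-style API; agentpriv's actual names and signatures may differ.

```python
# Hedged sketch of an allow/deny/ask permission gate for agent tools;
# agentpriv's real API may look different. "ask" prompts a human
# before the wrapped callable is allowed to run.

def gated(policy: str):
    """Wrap a callable with an allow/deny/ask gate."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if policy == "deny":
                raise PermissionError(f"{fn.__name__} is denied")
            if policy == "ask":
                answer = input(f"Allow {fn.__name__}? [y/N] ")
                if answer.strip().lower() != "y":
                    raise PermissionError(f"{fn.__name__} not approved")
            return fn(*args, **kwargs)  # policy == "allow" falls through
        return wrapper
    return decorator

@gated("deny")
def delete_db():
    return "dropped"

try:
    delete_db()
except PermissionError as e:
    print(e)  # delete_db is denied
```

The appeal of a gate this thin is that it sits in front of the function call itself, so it works regardless of which framework issued the tool call.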

1

AgentDX – Open-source linter and LLM benchmark for MCP servers #

github.com
0 comments · 4:28 PM · View on HN
MCP servers are proliferating fast, but most have vague tool descriptions and incomplete schemas that make LLMs pick the wrong tool or fill parameters incorrectly.

AgentDX is a CLI that measures this. Two commands:

- `npx agentdx lint` — static analysis of tool descriptions, schemas, and naming. 18 rules, zero config, no API key. Produces a lint score.

- `npx agentdx bench` — sends your tool definitions to an LLM (Anthropic, OpenAI, or Ollama) and evaluates tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery. Produces an Agent DX Score (0-100).

It auto-detects the server entry point, spawns it, connects as an MCP client, and reads tools via the protocol. Bench auto-generates test scenarios from your tool definitions.
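To illustrate the kind of static checks a linter like this can run over MCP tool definitions, here is a hedged sketch; agentdx's real rule set, rule names, and scoring are different.

```python
# Hypothetical lint checks in the spirit of `agentdx lint`; the real
# tool has 18 rules and its own scoring. Input is an MCP-style tool
# definition (name, description, inputSchema).

def lint_tool(tool: dict) -> list[str]:
    problems = []
    desc = tool.get("description", "")
    if len(desc) < 20:
        problems.append("description too short for an LLM to pick the tool reliably")
    schema = tool.get("inputSchema", {})
    for name, prop in schema.get("properties", {}).items():
        if "description" not in prop:
            problems.append(f"parameter '{name}' has no description")
    return problems

tool = {
    "name": "search_posts",
    "description": "Search",
    "inputSchema": {"properties": {"query": {"type": "string"}}},
}
for p in lint_tool(tool):
    print(p)
```

Both issues flagged here (a vague description and an undocumented parameter) are exactly the failure modes the post says cause LLMs to pick the wrong tool or fill parameters incorrectly.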

Built in TypeScript, MIT licensed. Early alpha — the bench command works but is slow (sequential LLM calls, parallelization is next). Feedback welcome.

1

AFS – filesystem-native memory layer for AI agents #

0 comments · 4:31 PM · View on HN
I've been building multi-agent AI pipelines and kept running into the same structural problem: agents are stateless by default. Every session restart discards everything they learned. In multi-agent systems it compounds: Agent-1 learns something Agent-2 will never know. I started calling it "agent amnesia."

AFS is my attempt to fix this. The central architectural decision is unusual: your filesystem IS the memory layer. There's no separate database process to run and no cloud service to authenticate against. AFS stores memories as JSON files in a `.afs/` directory, with SQLite FTS5 for full-text search, HNSW indices for vector similarity, and msgpack-encoded graph edges for relationships.

*Three-tier memory lifecycle (automatic)* Memories auto-migrate without explicit management:

- Working memory (< 24h): raw observations, fast access, no compression

- Episodic memory: full history with provenance, searchable

- Semantic memory: auto-consolidated knowledge (the scheduler synthesizes patterns from episodic memory into generalizations like "Auth module uses JWT, 24h expiry across all services")

You only store observations. The scheduler extracts the patterns.

*Multi-agent knowledge sharing* Named swarm pools: Agent-1 shares a memory to a swarm ID, and any agent querying that swarm ID gets it. No broker process, no coordination protocol: just shared files with file locking.

*Auto-built knowledge graph* Graph edges (`similar_to`, `co_occurred`, `consolidated_from`, `depends_on`) are discovered automatically during consolidation. You can query neighbors, mine for new connections, or traverse paths.

*Why filesystem over a vector database* A few deliberate tradeoffs:

1. Inspectable by default: `jq .afs/agents/my-agent/memories/working/.json` is a valid debugging strategy

2. Versionable: `git` your agent's memory like any other project artifact

3. Portable: rsync it to another machine and it works

4. Air-gap friendly: zero outbound calls

5. No additional process: no Postgres, no Qdrant, nothing to manage

Tradeoff: less efficient at very large scale than a dedicated vector DB. Using HNSW (hnswlib) for approximate nearest neighbor, which handles the cases I've tested (100k+ memories per agent, < 100ms search).

*Audit trail* All operations are logged with standardized operation names, status (success/error/partial), and an operation-specific payload. Fail-open: if audit logging fails, the operation continues.

*Status* Under active development; APIs and behaviors change frequently. Open-sourcing early to get feedback from people building real agentic systems. Repo: https://github.com/thompson0012/project-afs

Specifically interested in feedback on:

- The filesystem-first approach vs. an embedded DB (DuckDB, SQLite with a vector extension)

- Whether the three-tier memory model maps to real agent workflows

- Any memory patterns this architecture can't support well
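A minimal sketch of the working-to-episodic migration described above, assuming a `.afs/<agent>/memories/<tier>/` layout; the directory names and file schema here are hypothetical, not AFS's actual on-disk format.

```python
# Hypothetical sketch of the age-based tier migration AFS describes.
# Layout (.afs/<agent>/memories/<tier>/) and file schema are made up
# for illustration; AFS's real format differs.
import json
from pathlib import Path

DAY = 24 * 3600

def store(root: Path, agent: str, text: str, ts: float) -> Path:
    """Write a raw observation into the agent's working tier."""
    mem_dir = root / agent / "memories" / "working"
    mem_dir.mkdir(parents=True, exist_ok=True)
    path = mem_dir / f"{int(ts * 1000)}.json"
    path.write_text(json.dumps({"text": text, "created_at": ts}))
    return path

def migrate(root: Path, agent: str, now: float) -> int:
    """Move memories older than 24h from working to episodic."""
    working = root / agent / "memories" / "working"
    episodic = root / agent / "memories" / "episodic"
    episodic.mkdir(parents=True, exist_ok=True)
    moved = 0
    for f in working.glob("*.json"):
        if now - json.loads(f.read_text())["created_at"] > DAY:
            f.rename(episodic / f.name)
            moved += 1
    return moved
```

In AFS this kind of migration would be the scheduler's job, which additionally consolidates episodic entries into semantic generalizations; everything stays plain files, so `jq` and `git` keep working on it.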

1

Omniget, a Desktop Media Downloader #

github.com
1 comment · 4:48 PM · View on HN
I started learning to code last year, and one of the things I always loved was downloading stuff from the internet: figuring out how players serve their streams, messing with scrapers, all of that. During Carnival I had a lot of free time and decided to build something I could share.

Omniget is a desktop app (Tauri, Rust + Svelte) for downloading media from YouTube, Instagram, TikTok, Twitter, Reddit, Twitch, Pinterest, Bluesky, Vimeo, Telegram, Hotmart, and now Udemy. It's inspired by cobalt.tools, but as a native app with course platform support and a download queue.

The Udemy integration just shipped and was the most interesting part to build. Their login is passwordless now, so the app opens a WebView, injects JS to fill the email field, waits for the user to enter the 6-digit code from their inbox, then extracts cookies. On Windows this uses the WebView2 COM cookie API; on Linux/macOS I had to use a hack where I eval a JS redirect to a fake URL with document.cookie as a query param and intercept it in the navigation handler.

The app also has its own HLS downloader (m3u8 parsing, parallel TS segment downloads, AES-128-CBC decryption), so most non-DRM content downloads without needing ffmpeg. Each platform lives in its own isolated module with no shared auth state, which meant some code duplication but made things way easier to maintain.

GPL-3.0 licensed. It will always be free and open source, with no plans to monetize. Would appreciate any feedback.
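The m3u8 parsing step of an HLS downloader can be sketched roughly like this (Omniget's implementation is in Rust; this illustrative Python version only extracts segment URLs and the AES-128 key URI):

```python
# Illustrative m3u8 media-playlist parser: collects segment URLs and
# the AES-128 key URI from #EXT-X-KEY. Real HLS playlists have many
# more tags (byte ranges, IVs, variant playlists) not handled here.
import re

def parse_m3u8(text: str, base: str = "") -> dict:
    segments, key_uri = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-KEY"):
            m = re.search(r'URI="([^"]+)"', line)
            if m:
                key_uri = m.group(1)
        elif line and not line.startswith("#"):
            segments.append(base + line)  # non-tag lines are segment URIs
    return {"segments": segments, "key_uri": key_uri}

playlist = """#EXTM3U
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/key"
#EXTINF:4.0,
seg0.ts
#EXTINF:4.0,
seg1.ts
#EXT-X-ENDLIST"""
print(parse_m3u8(playlist, "https://example.com/"))
```

From here a downloader fetches the key once, pulls segments in parallel, and AES-128-CBC-decrypts each before concatenating, which is the pipeline the post describes.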
1

Agent Democracy Protocol – AI agents that vote and pool resources #

aeoess.com
0 comments · 5:46 PM · View on HN
I built an autonomous AI agent running 24/7 on a Mac Mini (16 tools, persistent memory, proactive scheduler). Kept hitting token budget walls on ambitious tasks — agent starts something good, runs out of tokens, stops.

So I wrote a protocol for agents to collaborate: discover each other, propose projects, vote on resource allocation (one agent = one vote, cryptographically bound to one owner), pool tokens, execute together.

Key decisions: reputation-weighted not wealth-weighted governance, privacy-first (share knowledge never owner data), protocol evolves through agent voting.
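A toy tally showing two of those decisions together, one vote per owner and reputation weighting; the data shapes here are hypothetical:

```python
# Toy governance tally: one vote per agent, deduplicated by owner
# (approximating the cryptographic one-agent-one-owner binding), and
# weighted by reputation rather than token wealth. Data shapes are
# hypothetical, not the protocol's actual wire format.

def tally(votes: list[dict]) -> float:
    """Sum reputation-weighted yes/no votes, counting one per owner."""
    seen_owners = set()
    score = 0.0
    for v in votes:
        if v["owner"] in seen_owners:  # one agent = one vote per owner
            continue
        seen_owners.add(v["owner"])
        weight = v["reputation"]       # reputation-weighted, not wealth-weighted
        score += weight if v["approve"] else -weight
    return score

votes = [
    {"owner": "alice", "reputation": 2.0, "approve": True},
    {"owner": "bob", "reputation": 1.0, "approve": False},
    {"owner": "alice", "reputation": 9.0, "approve": True},  # duplicate owner ignored
]
print(tally(votes))  # 1.0
```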

1

Kindred – Find people interested in what you're building #

kindred-frontend.onrender.com
0 comments · 5:47 PM · View on HN
Everyone's shipping projects right now. The problem is finding the people who actually care about what you're working on.

Kindred is anonymous peer matching. Write a short blurb about what you're building or thinking about. AI finds people with similar interests and matches you with a mutual explanation of why — without revealing each other's blurbs. Then you chat anonymously.

1

A public map of startups worldwide (anyone can add theirs) #

startupsoftheworld.com
2 comments · 6:11 PM · View on HN
Startups of the World is a public, interactive map where anyone can add their startup for free — no account required. Submissions appear as pins on a world map within minutes.

Pins are color-coded by industry, so you can visually see clusters forming (e.g. fintech in London, deeptech in Berlin, AI across Southeast Asia).

It’s intentionally map-first rather than search-first. The goal is to make the global startup ecosystem visible, not curated.

I’d especially appreciate feedback on:

- Whether the submission flow feels frictionless

- Whether the map becomes noisy or valuable as it scales

- What would make you come back to explore it again

1

Pantalk – One daemon, any AI agent, every chat platform #

github.com
0 comments · 6:45 PM · View on HN
Hey HN!

One of the big selling points of AI agents like OpenClaw is the ability to connect them to popular communication tools like Slack, Discord, Mattermost, and Telegram. But this shouldn't need to be reimplemented by every agent; a single tool can provide this interface for all of them.

That's what Pantalk does. It's a local daemon that maintains persistent connections to chat platforms, and any agent (regardless of language, framework, or runtime) talks through plain CLI commands or a Unix domain socket with a JSON protocol.

Your agent doesn't import a library. It calls a command. That means it works with Claude, Codex, Copilot and Gemini, local models, custom frameworks, a bash script or anything that can exec a process or write to a socket.

It outputs JSON when stdout isn't a terminal, and ships with "skill" definitions that teach agents what commands are available (kind of like man pages for LLMs).
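The JSON-over-Unix-socket protocol can be sketched like this; the message fields are hypothetical (see the repo for the real schema), and `socketpair()` stands in for the daemon's listening socket so the example is self-contained:

```python
# Sketch of newline-delimited JSON over a Unix socket, the IPC style
# Pantalk describes. Message fields ("cmd", "platform", etc.) are
# hypothetical; socketpair() replaces the daemon's real socket so
# this runs standalone.
import json, socket

client, daemon = socket.socketpair(socket.AF_UNIX)

# Agent side: send one command as a single JSON line.
request = {"cmd": "send", "platform": "slack", "channel": "#ops", "text": "deploy done"}
client.sendall((json.dumps(request) + "\n").encode())

# Daemon side: read a line, decode, reply with a JSON line.
msg = json.loads(daemon.makefile("r").readline())
daemon.sendall((json.dumps({"ok": True, "echo": msg["cmd"]}) + "\n").encode())

reply = json.loads(client.makefile("r").readline())
print(reply)  # {'ok': True, 'echo': 'send'}
```

Because the framing is just JSON lines on a socket, a client can be written in any language, which is what makes the CLI a thin, replaceable layer.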

Some things that might be interesting to this crowd:

* IPC is JSON over Unix domain socket. The CLI is just a thin client. You can write your own in any language.

* Everything is local. Message history and notifications are persisted in SQLite. No cloud, no external DB.

* Multi-bot. One daemon can manage multiple bots across different platforms simultaneously.

* Hot reload. Change your config and run pantalk reload - no restart needed.

It supports Slack (Socket Mode), Discord (Gateway), Mattermost (WebSocket), and Telegram (Bot API).

Written in Go, single binary, MIT licensed.

GitHub: https://github.com/pantalk/pantalk

Website: https://pantalk.dev

I'd love feedback on the protocol design and what other platforms people would want to see. Happy to answer questions!

1

CGK – Provably Preventing Overcommit in Distributed Capacity Markets #

distributed-markets-overcommit.github.io
0 comments · 6:58 PM · View on HN
Early PoC exploring how to make distributed capacity markets (DePIN, restaking, AI inference slots, etc.) provably non-overcommitting, even under network partitions and adversarial excitation.

Core idea: enforce strict global conservation (Σ allocations ≤ CAP) via:

- Contractive dynamics (Banach fixed-point convergence)

- Lattice join merges (⊔ = max, never additive)

- Budget-token gating + partition-local caps (≤ CAP/n)

Try the guided 5-step interactive simulator (single HTML file, no install): it demonstrates normal operation, stress to cap, partition attack, adversarial injection, and safe reconnection/merge with the invariants always held.

Full technical report (proofs of conservation, contractivity, and partition/merge safety) in DOCS.md: https://github.com/Distributed-markets-overcommit/Distribute...

Feedback welcome on the math, sim behavior, or potential DePIN/restaking applications. Early stage; honest critique appreciated.
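A toy model of the conservation invariant: each of n partitions may allocate at most CAP/n, and reconnection merges per-key allocations with a lattice join (max), never addition, so the global sum stays ≤ CAP. This is heavily simplified relative to the contractive dynamics in DOCS.md:

```python
# Toy model of the overcommit-prevention invariant: partition-local
# caps (<= CAP/n) plus lattice-join merges (per-key max, never sum).
# Simplified illustration only; see DOCS.md for the real dynamics.

CAP = 100.0

def allocate(local: dict, key: str, amount: float, n_partitions: int) -> bool:
    """Grant an allocation only if the partition stays under CAP/n."""
    cap_local = CAP / n_partitions
    if sum(local.values()) + amount > cap_local:
        return False
    local[key] = local.get(key, 0.0) + amount
    return True

def merge(a: dict, b: dict) -> dict:
    """Lattice join on reconnection: per-key max, never additive."""
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}

p1, p2 = {}, {}
allocate(p1, "slot", 40.0, 2)             # ok: 40 <= CAP/2
allocate(p2, "slot", 45.0, 2)             # ok: 45 <= CAP/2
assert not allocate(p2, "slot", 10.0, 2)  # rejected: would exceed CAP/2
merged = merge(p1, p2)
assert sum(merged.values()) <= CAP        # invariant holds after reconnect
print(merged)  # {'slot': 45.0}
```

The max-join is what makes reconnection safe: even if both sides recorded the same allocation independently, merging cannot double-count it.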

1

MineDemo, a Minecraft-style demo recorder b/C I'm tired of boring demos #

minedemo.mygictools.com
0 comments · 7:49 PM · View on HN
Hi all!

This is Ara, an indie builder and solo founder.

The idea for MineDemo was born when I was creating a demo video for a hackathon last month and thought: what if I could walk on top of what I'm showing and point at things?

So this week, when I got some time, I created this little tool called MineDemo.

It's totally free, supports 20+ avatars and decors plus drag-and-drop text, and you can copy-paste a link or use screen share. The avatar is controlled with the keyboard (WASD) and the hands with Q and E.

It may have some little bugs here and there, so please let me know if you have any feedback.

Here's a quick demo video: https://x.com/AZ_is_xyz/status/2024202715820339298?s=20