Show HN for May 19, 2026

43 items

687

Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

github.com

252 comments · 12:23 PM · View on HN

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.

I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.

What it does:

- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware

- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it

- Ships with an eval harness and interactive dashboard so you can reproduce every number

I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's a 40% failure rate. No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier.
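The compounding arithmetic above can be checked directly. This is a minimal sketch that assumes every step succeeds independently with the same probability; the function name is illustrative and not part of Forge.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Show how quickly a 90% per-step accuracy decays as workflows get longer.
    for steps in (1, 3, 5, 10):
        ok = workflow_success_rate(0.90, steps)
        print(f"{steps:>2} steps: {ok:.1%} success, {1 - ok:.1%} failure")
```

For five steps at 90% per step, 0.9^5 is about 0.59, which is the roughly 40% failure rate quoted above.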

Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)

The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each. Key numbers:

- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.

- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%) - an 8B model with framework support beats the best result you can get through frontier API alone.

- Error recovery scores 0% for every model tested - local and frontier - without the retry mechanism. Not a capability gap, an architectural absence.

I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).

The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test): retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational - only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads where they activate once in a while.

One thing I really didn't expect: the serving backend matters. The same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. That is a swing of about 76 points from infrastructure alone. I don't think anyone's published this because standard benchmarks don't control for serving backend.

Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError) - the model sees the error and can retry instead of silently passing garbage forward.
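To illustrate the "200 but no 404" point, here is a hedged sketch of the idea. Only the exception name `ToolResolutionError` comes from the post; the `lookup_user` tool, the `run_step` loop, and the `revise` callback are hypothetical stand-ins and are not Forge's actual API.

```python
class ToolResolutionError(Exception):
    """Raised when a tool ran without crashing but resolved nothing useful."""


def lookup_user(directory: dict[str, str], name: str) -> str:
    """Hypothetical tool: return an email, or raise if there is no match."""
    email = directory.get(name)
    if email is None:
        # An empty result is reported as an error, not as a "successful" None.
        raise ToolResolutionError(f"no user named {name!r}; try another spelling")
    return email


def run_step(tool, *args, max_retries: int = 2, revise=None):
    """Call a tool; on ToolResolutionError, let `revise` propose new arguments."""
    for attempt in range(max_retries + 1):
        try:
            return tool(*args)
        except ToolResolutionError as err:
            if revise is None or attempt == max_retries:
                raise
            # In a real agent loop the model would see `err` and pick new arguments.
            args = revise(args, err)


if __name__ == "__main__":
    directory = {"ada": "ada@example.com"}
    lowercase_name = lambda args, err: (args[0], args[1].lower())
    print(run_step(lookup_user, directory, "Ada", revise=lowercase_name))
```

The design choice is that "found nothing" travels the same path as any other failure, so the orchestrator cannot mark the step complete by accident.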

Biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM - no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.
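A rough sketch of what deriving a token budget from `nvidia-smi` could look like. The `--query-gpu` and `--format` flags are standard `nvidia-smi` options, but the budget formula, the 90% headroom, and the per-token KV-cache size are assumptions made for illustration, not Forge's actual derivation.

```python
import subprocess


def query_nvidia_smi() -> str:
    """Return 'total, used' VRAM in MiB, one line per GPU."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_free_mib(smi_output: str) -> int:
    """Free VRAM in MiB for the first GPU listed in the CSV output."""
    first_line = smi_output.strip().splitlines()[0]
    total, used = (int(field) for field in first_line.split(","))
    return total - used


def token_budget(free_mib: int, model_mib: int, kv_bytes_per_token: int,
                 headroom_pct: int = 90) -> int:
    """Tokens whose KV cache fits in the VRAM left after loading the model.

    Assumed formula: keep `headroom_pct` percent of free VRAM, subtract the
    model weights, and divide the remainder by the KV-cache bytes per token.
    """
    usable_mib = free_mib * headroom_pct // 100 - model_mib
    return max(0, usable_mib * 1024 * 1024 // kv_bytes_per_token)


if __name__ == "__main__":
    free = parse_free_mib("8192, 512\n")  # sample output instead of a live query
    print(token_budget(free, model_mib=5000, kv_bytes_per_token=131072))
```

Capping the context at such a budget is one way to keep the whole working set on the GPU instead of spilling to CPU.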

How to try it:

- Clone the repo, run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.

- Try the proxy server mode - point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest mode and I'd love more eyes on it.

- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several that did on the original suite can't sweep it - including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of. Paper numbers based on pre v0.6.0 code.

Background: prior ML publication in unsupervised learning (83 citations). This paper accepted to ACM CAIS '26 - presenting May 26-29.

Repo: https://github.com/antoinezambelli/forge

Paper: https://www.caisconf.org/program/2026/demos/forge-agentic-re... https://github.com/antoinezambelli/forge/blob/main/docs/forg...

Dashboard: https://github.com/antoinezambelli/forge/docs/results/dashbo...

529

Gaussian Splat of a Strawberry

superspl.at

200 comments · 10:38 AM · View on HN

The Setup:

https://i.imgur.com/o0hgybh.jpeg

https://i.imgur.com/mcNiomp.jpeg

https://i.imgur.com/vIjw6pc.jpeg

https://i.imgur.com/nzOwmSC.jpeg

I made a 3D pose maker for artists

setpose.com

33 comments · 2:04 PM · View on HN

Superlog (YC P26) – Observability that installs itself and fixes bugs

superlog.sh

49 comments · 3:54 PM · View on HN

Hey HN, we’re Nico and Arseniy, co-founders of Superlog (https://superlog.sh). We're building a self-installing, self-healing observability tool that is meant not to be opened. It has a wizard that runs daily to set up proper logging and an agent that investigates errors and opens PRs.

Super short demo: https://www.youtube.com/watch?v=xFhU9Mk247M.

In our earlier startups, we tried Sentry, Datadog, Grafana, Dash0, and nothing was good enough. Proper telemetry and alerting still require a ton of manual setup. We struggled with adding good logs, so debugging was tough, especially as codebases grow at a faster pace. Meanwhile, the Datadog/Dash0 bill kept climbing, and we still spent engineering hours to learn, configure, and maintain our observability tooling.

With Sentry, we found ourselves flooded by a stream of alerts in our Slack channel. Most were duplicates or lacked context, so alert fatigue and constant interrupts were a real pain. The #ops notification is consistently the worst feeling on a Saturday morning.

We’ve seen too many times servers run out of memory and disk, and three AWS metrics giving us three different values. Half of the graphs on dashboards are normally empty or outdated, and manually clicking through UIs, especially when the team is small, seems like a huge waste of time.

At some point we realized that solving this problem would be more valuable than the things we had been working on, and we had the expertise to do it, since Arseniy had spent years at Datadog, getting paged during the night to debug production incidents. So we decided to build a platform that would just work: agent-first, MCP-native, zero-setup.

Here’s how Superlog works: we have a wizard that scans your repo, and automatically instruments it with well-structured logs, traces and metrics via OpenTelemetry. We make sure to highlight main failure modes, endpoint performance, usage per tenant, and LLM/upstream cost (by callsite, tenant and model).

Errors get fingerprinted and grouped into incidents, so you see one issue, not a thousand duplicates. When you get a notification from Superlog, you see a clear failure summary, its inferred severity and impact upfront.
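One common way to fingerprint errors is to mask the volatile parts of a message and hash what remains. The sketch below illustrates that general technique only; the normalization rule and function names are assumptions and do not describe Superlog's actual algorithm.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(error_type: str, message: str) -> str:
    """Hash an error after masking volatile parts such as ids and numbers."""
    normalized = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)
    return hashlib.sha256(f"{error_type}:{normalized}".encode()).hexdigest()[:12]


def group_into_incidents(errors):
    """Group (error_type, message) pairs by fingerprint."""
    incidents = defaultdict(list)
    for error_type, message in errors:
        incidents[fingerprint(error_type, message)].append(message)
    return dict(incidents)


if __name__ == "__main__":
    stream = [
        ("TimeoutError", "request 123 timed out after 30s"),
        ("TimeoutError", "request 456 timed out after 45s"),
        ("KeyError", "missing field tenant_id"),
    ]
    for key, messages in group_into_incidents(stream).items():
        print(key, len(messages))
```

Here the two timeouts collapse into one incident because they differ only in numbers, while the `KeyError` stays separate.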

Then the agent investigates and tries to solve the issue. If it has enough context, it produces a concise and tested PR. If it doesn't, it posts its findings for the investigating team, and automatically pulls in the engineers that could contribute more context based on documentation, previous investigations and Slack threads.

Either way the output is one clean PR per incident, posted in Slack, that you can merge, ignore, or open as a Claude Code session and modify.

Three things we think are different from other observability vendors:

(1) We solve the setup pain. The wizard will instrument everything with native OTel SDKs, respecting the semantic conventions, with proper service and environment tagging. We’re also working on native automatic dashboards and alerts, so that you can see what’s going on at a glance and don’t miss subtle failure modes.

(2) Our telemetry doesn’t decay. The wizard runs daily, and keeps adding logs, alerts and dashboards where it’s needed. You don't have to remember to instrument new features. The next time something breaks, the data you need to debug it is already there.

(3) Our goal is to solve alert fatigue. We use agents to merge similar errors and refine the summaries, giving you relevant information upfront. We have a custom evaluation setup that makes sure that our summaries are dense and correct, and that severity and impact are on point. We also give you confidence scores for every LLM-enhanced metric so that wrong guesses don’t get boosted.

Important: Superlog telemetry is vendor-neutral, so you keep all the logs/metrics/traces we install. Pricing is on the site. We're early, so expect rough edges and please tell us when you find them.

You can try it at https://superlog.sh. We'd love to hear what you're using today, what's broken about it, and whether the "one mergeable PR per incident" model sounds useful or terrifying. Especially keen to hear from folks running integration-heavy products, anyone who's rolled their own observability, and anyone who has tried Sentry / Datadog MCPs and given up. Comments and feedback welcome!

Hsrs – Type-Safe Haskell Bindings Generator for Rust

github.com

9 comments · 4:06 AM · View on HN

Hey everyone! I've been working on hsrs, a type-safe Haskell Bindings Generator for Rust.

I couldn't really find any bindings generator that would create type-safe, rich bindings for Haskell from Rust. Naturally, both languages have rich type systems, so I was amazed that no awesome bindings generator already existed, hence I decided to write my own. hsrs feels very similar to pyo3 and napi-rs, and if you've used those, hsrs will feel right at home.

What's unique about hsrs as opposed to hs-bindgen is that it has type-safe bindings for rich types, like Result, Maybe, etc. while also generating Haskell bindings. The repo contains a minimal example, and more details are available on the Haskell Discourse: https://discourse.haskell.org/t/ann-hsrs-ergonomic-haskell-b...

Id-agent – Token efficient UUID alternative for AI agents

github.com

55 comments · 11:16 AM · View on HN

Pg_deltax, Apache-licensed alternative to TimescaleDB

github.com

1 comment · 6:36 PM · View on HN

The Hanging Sculptures of the Xiaoxitian

funes.world

8 comments · 6:54 AM · View on HN

Yt-x v0.8.0 – Browse, play, and download YouTube from the terminal

github.com

4 comments · 7:44 PM · View on HN

Clark-Browser – Stealth Chromium

github.com

4 comments · 3:09 AM · View on HN

Fully open-sourced, perfect for agentic browsing, works with Vercel's agent-browser and Playwright.

LibreOffice-rs – I built a pure-Rust LibreOffice using autoresearch

github.com

1 comment · 5:01 PM · View on HN

Hey HN,

I built libreoffice-rs: a pure-Rust, std-only library + CLI for reading, writing, converting, and rendering office documents — with *zero* LibreOffice, Java, or C dependencies.

100x faster... I know, I know.

It supports DOCX, XLSX, PPTX, ODT/ODS/ODP, PDF, Markdown, CSV, HTML, SVG, and more. The CLI is designed to feel familiar:

```bash
cargo install libreoffice-pure

# soffice-style usage
libreoffice-pure --headless --convert-to pdf report.docx
libreoffice-pure --headless --convert-to csv spreadsheet.xlsx

# Markdown extraction
libreoffice-pure docx-to-md report.docx report.md
libreoffice-pure pptx-to-md slides.pptx slides.md

# Render pages as images
libreoffice-pure docx-to-pngs report.docx pages/ --dpi 144
```

Gpubook – An order book for GPU compute

gpubook.io

4 comments · 3:08 PM · View on HN

I built a native macOS Markdown viewer 100% with AI coding agents

github.com

1 comment · 9:57 PM · View on HN

I built Markdown Viewer because every Markdown app I found was either bloated (VS Code, Obsidian) or too bare-bones. Wanted something that loads instantly, renders Obsidian-style features cleanly, and weighs in at a few megabytes.

Built with Tauri 2 (Rust backend + webview frontend):

- GitHub Flavored Markdown + Obsidian extensions (wikilinks, callouts, emoji, math, Mermaid diagrams)
- Frontmatter rendered as a structured metadata bar above content
- HTML sanitization via ammonia for security
- No heavy dependencies, no Electron

What makes it interesting isn't so much the features — but how it was built. Every line of Rust, CSS, and JavaScript was written by AI coding agents (pi.dev/Qwen and Claude Code) without a single human writing code. No hand-holding, no "prompt then copy-paste" — just a high-level brief and iterative agent-driven development.

I've been using this project to hone my pi.dev setup, and I am getting somewhere with pi.dev/Qwen3.6 and a small set of extensions. I'm trying to avoid Claude Code/Opus for this project because I want to see what I can do with a local LLM.

Key stats:

- Instant load (no webview overhead, pure rendering)
- ~few MB binary
- Sanitized HTML via ammonia (XSS-safe)
- Open source on GitHub

Open source at https://github.com/rajatarya/mdviewer

Closed Rings – A CLI-first time tracker for developers

closedrings.sh

0 comments · 5:29 AM · View on HN

Hi, HN. I built Closed Rings, a developer-friendly, AI-agent-first time tracker that integrates with my workflow. I wanted something that lives in my terminal and my coding agent.

You can run `rings start "OAuth 2.0" -m "Start integrating OAuth 2.0"` when you start a new task and `rings close` when you're done with your current work. In between, it tracks context switches. You get a stand-up-ready summary, a focus report (longest focus block, number of context switches, time per project), and an export grouped by project or day.

You can also ask your AI agent: _Start tracking "OAuth 2.0"._ Or track retroactively: _Track a 1-hour meeting I forgot to track this morning at 8._ The MCP has a comprehensive set of tools.

This is primarily for consultants or freelance developers who want to start tracking their time right away. The CLI is pretty straightforward, and the MCP allows you to do everything you can do in the dashboard.

Want to integrate it in your own systems? Just create an API key and start communicating with the API.

The stack is pretty simple: Ruby on Rails (MCP + API), Go (CLI).

The pricing is also pretty flat: $7/mo ($60/year).

My first product. Feedback welcome.

Autodidact – Self-evolving local-first AI agent

github.com

4 comments · 3:36 PM · View on HN

pip install autodidact && autodidact init

Enforra – open-source action governance for AI agent tool calls

github.com

1 comment · 9:08 PM · View on HN

MyUUIDshop, Generate UUIDs and never worry about duplicates

myuuid.shop

7 comments · 5:17 AM · View on HN

This is in response to some recent discussion here and on X about a company having an in-house UUID microservice and a team dedicated to it. At first that was made fun of, but further discussion revealed that UUIDs can in fact sometimes collide, most likely due to improper entropy seeding. To ensure that UUIDs are unique, we store each generated UUID in a database and check new ones against it to make sure they have not been generated before. There is also an API through which you can check whether a UUID is present in the database. Paid options are available for heavy use. Enjoy!

SharpSkill – A LeetCode Alternative with real interview outcomes

sharpskill.dev

0 comments · 9:39 PM · View on HN

Hi HN,

We built SharpSkill to add another angle to the technical interview process.

It is just a LeetCode alternative that gives other insights.

A self-balancing skip-list (a.k.a. "splay-list") library in C

codeberg.org

0 comments · 12:48 PM · View on HN

A header-only C library implementing a concurrent, lock-free skip-list (specifically, a splay-list: a skip-list with optional adaptive rebalancing). The entire implementation lives in preprocessor macros in include/sl.h that generate type-specific code at compile time, similar to C++ templates.

Logbox – let Claude monitor your dev logs

github.com

1 comment · 7:03 PM · View on HN

TL;DR: logbox is an open-source tool that pipes dev server logs to a local sqlite db with `<your-dev-server-cmd> | logbox collect`. Give Claude Code access by running `claude mcp add logbox -- logbox serve`.

I used to copy & paste logs into Claude Code when manually testing my server in dev. I wanted to give it its own verification loop.

I initially tried having it boot the server itself and follow the logs. It was good at knowing if the server booted properly, but it capped out and missed details when the logs started flowing in.

I also tried piping the logs to a local file and telling Claude to read them from there. It worked, but became annoying once we had multiple services or wanted to reference past dev server sessions.

So I built logbox for our own use at Struct and decided to open-source it. It’s a simple Rust CLI that pipes logs into a local SQLite db with an MCP server that gives coding agents the ability to search them.

Once it could reliably monitor the dev server logs totally autonomously after testing its changes, I stopped needing to fish for log snippets and keep nudging it to get a manual test working end-to-end.

Everything stays local. `logbox serve` is an stdio MCP server and it just works with the local SQLite db.

claude-autopilot, autonomous dev pipeline with multi-model review

github.com

2 comments · 11:35 PM · View on HN

audio.observer – AI news jingles you didn’t ask for

audio.observer

0 comments · 5:06 PM · View on HN

PoC - VPN over WebRTC to Bypass Whitelists

github.com

1 comment · 8:25 PM · View on HN

Bevel – Guess the book from its opening passage

bevel.ink

4 comments · 8:13 PM · View on HN

Built Bevel because my girlfriend is in a book club and I wanted to make her something. It’s a daily puzzle: you read a 200ish-word public-domain passage and try to deduce the author, decade, nationality, and title in five guesses with adaptive hints in between guesses. Wordle-ish mechanics, but for classic literature. There’s a Discord activity too if you enjoy it & want to add it to your server. Bevel is currently unverified on Discord, so only addable to smaller servers for now. Would love any feedback.

Cervantes yet Another HN Reader

github.com

4 comments · 9:09 AM · View on HN

I've been switching between macOS, Linux and Windows machines quite a bit recently due to work, so it's been tough work to find a reader I enjoy using across all platforms, and there are a few features I've been wanting for a while...

...so I one-shotted Cervantes over the weekend. It's a Tauri-based cross-platform desktop app. At heart it lets you simply browse HN, but I added the ability to favorite users so you can see their content, to replace words, and to flag front-page thread movement, and it has a dark interface.

It's not too pretentious, I don't think. The design was also done via Claude Design.

How Expensive Is Your (Steam) Wishlist?

weloveit.io

0 comments · 5:15 PM · View on HN

A tool/toy that lets you connect to your Steam wishlist to calculate the total list/current price of all the games on it.

There's a shallow, jokey purpose to it ("I could buy a BMW with this amount!"), but the real purpose is to demonstrate how we can do a better job of portraying a game catalog. I often wishlist stuff, then it pops up in a "Hey, it's on sale!" email months later. In that email, there's a banner capsule, but that doesn't help my brain remember why I added it.

To that end, after you get the bill, you get a nice, flat feed of stuff about all the titles you've wishlisted over the years. It's all stuff that developers painstakingly put together, but which Steam tucks away under the fold of a game's Store page.

Anyway, my wishlist came to about $250. My QA guy is up to $19k. Give it a go; hope you enjoy it!

Search 67K .AI domains by AI-extracted tags and descriptions

ratemyaisite.com

0 comments · 6:46 PM · View on HN

I previously shared a mechanism for voting on .AI sites. It was something I built just to explore what's going on in this part of the internet.

I wanted more comprehensive data. So I've added AI generated structured categories and a description for each of the sites, so it's easy to find players within specific niches.

I'm creating a much much larger dataset (not just .AI websites). Let me know if any of this is of interest (will share for free if it's for research purposes).

Noxu DB, a Rust Port of Berkeley DB Java Edition

codeberg.org

0 comments · 12:27 PM · View on HN

Noxu provides ACID transactions, a log-structured B+tree, checkpoint-based crash recovery (ARIES), master-replica(s) replication, and XA. I have always admired the design and engineering behind Berkeley DB Java Edition, so I translated it to Rust for fun.

DDS Vibe Academy – 31 free AI coding masterclasses, built by AI agents

0 comments · 8:02 PM · View on HN

The DDS Vibe Academy is a free, 31-class curriculum on AI coding. It covers Claude Code, Google Antigravity, Gemini 3.1 Pro, Cursor, Shopify Sidekick, Ollama, Hydrogen 2026, and AI Cost Engineering. No paywall, no signup, no certificate.

The academy hub itself was built by AI agents. Claude Opus 4.7 authored 12 Liquid sections (~6,400 lines). Google Antigravity deployed every file to Shopify via the Shopify MCP. Cowork ran an autonomous browser audit. I did not write a single line of code or upload a single file manually. I designed the constraints. The agents did the implementation.

The curriculum spans four stages. Foundation onramps for Sidekick and Ollama. Development masterclasses on Claude Code and Cursor. Application-stage case studies documenting systems like the Sovereign Orchestrator and NicheForge. Mastery-stage forensic records of multi-agent AI infrastructure. A sister blog publishes Monday through Friday.

Read whatever you want, whenever you want: https://ddsboston.com/pages/dds-vibe-academy?utm_source=hn&u...

Resilient, A composable async resilience toolkit for Rust

github.com

1 comment · 9:29 AM · View on HN

Resilient is an async toolkit for Rust that handles fault tolerance for apps that frequently call other services or run database queries. Resilient supports rate limiting, circuit breaker, timeout, bulkhead, and retry policies. A Pipeline defines multiple policies at once and runs async operations according to the rules of those policies. You can also add a fallback if the system fails too often.

This was inspired by failsafe-go, but for Rust. I would love to know your views on this. Drop a star if you loved it.

Barstool, a Prettier macOS Menubar

barstool.lotl.dev

2 comments · 10:33 AM · View on HN

I really hate the way the macOS menu bar looks, and how crowded it gets with all my apps' Menubar menus.

Barstool lets me still see the time/date and other useful info while hiding the menubar. I can see wifi connectivity, date, time, and battery all the time.

The app also observes system notifications to surface now playing state, (Apple) calendar events, and volume/brightness changes.

I've had to do a lot of tinkering with macOS PrivateFrameworks, as Apple loves to make all the interesting data unavailable through official sources/APIs.

Happy to get any feedback/questions!

A sparse, compressed bitmap index in C. Better than Roaring Bitmaps?

codeberg.org

0 comments · 12:53 PM · View on HN

This is an implementation of a sparse, compressed bitmap index. In the best case, it can store 2048 bits in just 8 bytes. In the worst case, it stores the 2048 bits uncompressed and requires an additional 8 bytes of overhead. It compares favorably against Roaring Bitmaps and other competition in the space, but is it better?
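The best-case and worst-case figures above translate into a compression ratio and an overhead percentage. This is a small worked calculation using only the numbers stated in the post; the constant names are illustrative.

```python
BLOCK_BITS = 2048
RAW_BYTES = BLOCK_BITS // 8        # 2048 bits stored uncompressed
BEST_BYTES = 8                     # best case stated in the post
WORST_BYTES = RAW_BYTES + 8        # raw block plus 8 bytes of overhead

best_case_ratio = RAW_BYTES / BEST_BYTES          # how much smaller than raw
worst_case_overhead = (WORST_BYTES - RAW_BYTES) / RAW_BYTES

if __name__ == "__main__":
    print(f"raw block: {RAW_BYTES} bytes")
    print(f"best case: {best_case_ratio:.0f}x smaller than raw")
    print(f"worst case: {WORST_BYTES} bytes ({worst_case_overhead:.3%} overhead)")
```

So a 2048-bit block is 256 bytes raw, compresses 32x in the best case, and costs 264 bytes (about 3.1% extra) in the worst case.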

Sentilis – a folder of Markdown files becomes your bio, blog, store

github.com

0 comments · 4:32 PM · View on HN

Alint, a fast linter for repository structure and hygiene

github.com

0 comments · 11:39 PM · View on HN

Hi HN, I have been working on alint for the last little while. It is a linter for the shape of a repository rather than the code inside it. clippy, ruff, eslint, and others already handle the AST and code space. alint checks the other layer: required files, filename conventions, content patterns, structured values inside package.json / Cargo.toml / GitHub workflow YAML, and cross-file invariants. You write one .alint.yml describing what the repo should look like, and it runs in CI. Today it covers 70 rule kinds across 13 families, 19 bundled rulesets, 12 auto-fix ops, and 8 output formats including SARIF.

To check the design against reality, I built (vibed) working configs for 30 real OSS repos. The corpus is deliberately varied: single-language workspaces and polyglot monorepos, from small clean trees up to roughly 150k-file mega-repos, with and without mature in-house tooling. Each one has a writeup of what alint catches that the repo's existing tooling does not, including the false-positive-in-spirit cases; the configs are committed under examples/ so the claims are checkable instead of asserted. Findings span large-scale verify-script consolidation in mega-repos, real Trojan-Source bidi-control characters in archived release notes (caught by the bundled oss-baseline ruleset), and the usual CI hygiene drift.

Benchmarking is the project's strongest discipline: every release is benchmark-gated (see docs/benchmarks for methodology and results). Runs take 0.2s-1.5s for 100k-file repos and 5s-15s for 1M-file repos.

Test discipline is layered: unit tests, declarative YAML end-to-end scenarios, property-based invariants, smoke fixtures that pin rule semantics so an ignored knob is caught at PR time, and the benchmark gate. The project is dual-licensed Apache-2.0 OR MIT, telemetry-free (the only outbound is an extends: URL you wrote yourself, SRI-pinned), reproducible build, one static binary. Install: cargo install alint, a Homebrew tap, a distroless Docker image, an npm shim, or a GitHub Action.

The roadmap is public under docs/design/. Near-term: broader extensibility (richer plugin surface, more bundled rulesets) and a few high-demand rule kinds the case studies surfaced.

The fit with coding agents has been getting the most interest lately. Agents make exactly the structural mistakes alint catches: a stray .bak file, a scratch PLAN.md at the repo root, a committed console.log, drift off the repo naming convention. A --format agent output emits per-violation instruction strings an agent can act on, and alint export-agents-md writes the active ruleset into an AGENTS.md or CLAUDE.md section.

What it deliberately does not do is parse code. If a check needs an AST or type information, that belongs to clippy or eslint and alint stays out of it. On bazel roughly 29% (15 of 52 checks) are out of scope for exactly that reason, and the writeup breaks down which and why.

I am open to all kinds of feedback: pain points, things that should be primitives and are not, install or first-run friction, naming, DSL ergonomics (the JSONPath dashed-key bracket-form papercut has caught everyone), ruleset coverage. If you try it on a repo of yours and it falls over, that is the feedback I most want to hear.

Repo: https://github.com/asamarts/alint

Site and docs: https://alint.org

Benchmarks: https://alint.org/benchmarks/

The 30 example configs: https://alint.org/examples/

MediaMolder – A Modern Rewrite of FFmpeg

github.com

0 comments · 11:42 PM · View on HN

MediaMolder is a ground-up rewrite of FFmpeg's interface and orchestration layer, built directly on the same libav* media libraries. Written in Go, it includes a React GUI that runs in your web browser. Import, view, validate, run and monitor your FFmpeg jobs (command lines, which are really graphs). If validation fails, MediaMolder will suggest and can implement a fix. You will see per-node performance statistics as your job runs. There is an initial implementation of a real-time mode, which monitors and dynamically optimizes your running graph, in order to achieve and maintain real-time frame rates. When your graph is perfected, hit the “-> FFmpeg” button to see the equivalent FFmpeg command line.

MailSec – Free email security audit API (one curl, eight checks)

fivetag.systems

0 comments · 5:09 PM · View on HN

Circuit Breaker – runtime cost ceilings for AI agents

github.com

0 comments · 2:07 PM · View on HN

FortiGate SSL-VPN Honeypot

github.com

0 comments · 12:30 PM · View on HN

A deception honeypot that mimics FortiGate SSL-VPN devices to trap brute force attempts, detect deliberately exfiltrated credentials for counter‑intelligence, and report malicious activity to external intelligence feeds.

Memory Concierge – hotel concierge AI

memory-concierge.vercel.app

0 comments · 4:09 AM · View on HN

RTFRA - A Humble Proposal [RFC]

0 comments · 11:44 PM · View on HN

You know that feeling when no one reads the documentation you wrote? I bet we've all experienced that moment when, after spending a lot of time crafting a README file, you realize nobody gives a fuck.

But how do you know nobody reads your Readme, I hear you ask?

Well, come on, remember when your colleague or manager asks for a call to quiz you on things you already carefully detailed in the docs?

And now? AI agents - which are basically LLMs trained on human data - expect you to have an `AGENTS.md` file. This is on top of the `README.md`, `CONTRIBUTING.md`, `STYLEGUIDE.md`, and `TESTING.md` files you've already written. It's happening all over again.

So "tu quoque," Claude? My friend Claude, you want me to create a `CLAUDE.md` file because you don't want to read my docs either...

This is why I propose we create an `AGENTS.md` file containing nothing but this:

RTFRA

(Read The Fucking Readme, Agent)

Capframe – capability tokens for AI agent tool calls

capframe.ai

2 comments · 11:46 PM · View on HN

I made a 2D Lua game engine using Rust with code and asset live reload

usagiengine.com

0 comments · 7:46 PM · View on HN

I love making small 2D pixel art games, especially using tools that have some constraints. Things like Pico-8 and the Playdate SDK are simple, fun, and allow for focusing on the game idea rather than the technical minutiae. Years ago I prototyped an idea for this little game engine, Usagi, but used Rhai instead of Lua. The idea for this little game engine never went away, so I decided to finally dig in and build it.

Today I released v1.0.0 of Usagi Engine after making a bunch of small games, getting feedback from developers, and stabilizing the API. It's simple, has a great developer experience (CLI-based, init template, Lua plugin integration, and cross-platform export for web, Linux, macOS, and Windows with a single command).

The engine is public domain, and its source is on GitHub (linked from the website).

Rust was a great fit for this project due to its stability and tooling. The crate ecosystem is a real highlight for me. Plus clippy. I made a couple of games in Rust in the past (using Macroquad) which prepared me for this project. But for Usagi I decided to go with Raylib for its maturity. Usagi uses sola-raylib, the Rust bindings for C Raylib with some Rust-y wrappings. I maintain these bindings, which also helped me get familiar with what's possible with Raylib.

The well-known tools that are similar are Pico-8, Picotron, Love2D, and DragonRuby Game Toolkit. They all have their strengths and weaknesses. I think Usagi fits in a little spot amongst them where it's free, open source, and has a much more modern developer experience.

Now that the engine is v1.0.0, I'm going to focus my energy on making games with it, writing a book of tutorials, and creating video screencasts. I love sharing what I learn and helping people make their games.

I'd love it if you checked the engine out, and I'm looking forward to seeing what people make.

Crisper – On-device voice to polished text for macOS

speakcrisper.com

1 comment · 5:17 PM · View on HN

Hey HN,

I built Crisper because every dictation tool I tried either sent audio to the cloud or gave me raw, messy transcripts I still had to fix.

Crisper runs entirely on-device — no network calls, no account, no subscription. It does two things in sequence: transcribes using a speech model, then runs a local LM pass to strip filler words, fix grammar, and make the output sound intentional. The whole thing takes ~1–2 seconds on Apple Silicon.

A floating hotkey pill sits above every window. When you're done recording, it auto-pastes back into whatever app you were in before — Slack, Notion, VS Code, anything.

A few things I'm happy with:

- Three recording modes (toggle, hold-to-record, re-paste last), all rebindable
- Full transcript library with source app, timestamp, and audio playback
- Fully offline after first-run model download

Free to download. Would love feedback on the AI polish quality especially — that's the part I'm still tuning.

https://speakcrisper.com