Daily Show HN

Show HN for April 29, 2026

32 posts
91

Adblock-rust Manager – Firefox extension to enable the Brave ad blocker

github.com
44 comments, 12:24 PM
Firefox 149 ships adblock-rust (Brave's Rust engine, MPL-2.0) completely disabled with no UI. It's controlled by two about:config prefs with no WebExtension API, so you can't touch them programmatically from a standard extension.

This extension gives it a UI: ETP toggle (via browser.privacy API, instant), filter list manager with clipboard helpers for the manual about:config steps, and 8 preset lists. You can also add your own if you so desire.

54

A new benchmark for testing LLMs for deterministic outputs

interfaze.ai
21 comments, 4:01 PM
When building workflows that rely on LLMs, we commonly use structured output for programmatic use cases like converting an invoice into rows or meeting transcripts into tickets or even complex PDFs into database entries.

The model may return the schema you want, but with hallucinated values like `invoice_date` being off by 2 months or the transcript array ordered wrongly. The JSON is valid, but the values are not.

Structured output today is a big part of using LLMs, especially when building deterministic workflows.

Current structured output benchmarks (e.g., JSONSchemaBench) only validate the pass rate for JSON schema and types, and not the actual values within the produced JSON.

So we designed the Structured Output Benchmark (SOB), which fixes this by measuring the JSON schema pass rate, type correctness, and value accuracy across three modalities: text, image, and audio.

For our test set, every record is paired with a JSON Schema and a ground-truth answer that was manually verified against the source context by a human and cross-checked by an LLM, so a missing or hallucinated value is counted as wrong.

Open source is doing pretty well, with GLM 4.7 coming in at number 2, right after GPT 5.4.

We noticed the rankings shift across modalities: GLM-4.7 leads text, Gemma-4-31B leads images, Gemini-2.5-Flash leads audio.

For example, GPT-5.4 ranks 3rd on text but 9th on images.

Model size is not a predictor, either: Qwen3.5-35B and GLM-4.7 beat GPT-5 and Claude-Sonnet-4.6 on Value Accuracy. Phi-4 (14B) beats GPT-5 and GPT-5-mini on text.

Structured hallucinations are the hardest bug. Such values are type-correct, schema-valid, and plausible, so they slip through most guardrails. For example, in one audio record, the ground truth is "target_market_age": "15 to 35 years", and a model returns "25 to 35". This is invisible without field-level checks.
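A field-level check of the kind described above can be sketched as follows. This is a minimal illustration, not SOB's actual scoring code: the `flatten` and `value_accuracy` helpers, the exact-match rule, and the `product` field are assumptions made for the example. Only the `target_market_age` values come from the post.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0]": leaf} pairs."""
    if isinstance(value, dict):
        out: dict[str, Any] = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(flatten(child, path))
        return out
    if isinstance(value, list):
        out = {}
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
        return out
    return {prefix: value}


def value_accuracy(truth: dict, predicted: dict) -> tuple[float, list[str]]:
    """Return (share of ground-truth leaves matched exactly, mismatched paths)."""
    truth_leaves = flatten(truth)
    pred_leaves = flatten(predicted)
    # A leaf is wrong if it is missing from the prediction or differs in value.
    wrong = [
        path
        for path, expected in truth_leaves.items()
        if path not in pred_leaves or pred_leaves[path] != expected
    ]
    total = len(truth_leaves)
    score = 1.0 if total == 0 else (total - len(wrong)) / total
    return score, wrong


# "product" is a hypothetical second field added only to make the score non-trivial.
truth = {"target_market_age": "15 to 35 years", "product": "sneakers"}
predicted = {"target_market_age": "25 to 35", "product": "sneakers"}

score, wrong = value_accuracy(truth, predicted)
print(score, wrong)  # 0.5 ['target_market_age']
```

Both objects would pass the same JSON Schema, yet the per-field comparison flags the age range. A real benchmark would likely need looser matching for free-text fields than strict equality.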

Our goal is to be the best general model for deterministic tasks, and a key aspect of determinism is a controllable and consistent output structure. The first step to making structured output better is to measure it and hold ourselves against the best.

21

GeoTraceroute – Traceroutes on a 3D globe and submarine cables

geotraceroute.com
1 comment, 1:45 AM
I've been working on GeoTraceroute for a while and just shipped v2.3 with submarine cable inference.

A few things that might interest HN:

- 320 community-contributed nodes across 50 countries, all volunteer-run

- Three views: 3D globe with day/night rendering, 2D map, and a topological mode that infers submarine cable routing

- The submarine cable inference is the part I find most interesting. Since cable routers don't respond to ICMP, the underwater segments are invisible to standard traceroute. The tool detects ocean crossings by geolocation delta between consecutive hops, then infers the likely cable using a geo graph of landing points with A* pathfinding. I have no way to validate this; if anyone has ideas, I'd love to discuss.
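The two steps of the inference (flag a large geolocation jump between consecutive hops, then search a landing-point graph with A*) might look roughly like this. It is a sketch under stated assumptions, not GeoTraceroute's code: the 2000 km threshold, the function names, and the toy graph with its `uk_landing`, `ca_landing`, and `us_landing` nodes are all hypothetical.

```python
import heapq
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def ocean_crossings(hops, threshold_km=2000.0):
    """Index pairs (i, i+1) whose geolocation delta exceeds the assumed threshold."""
    return [(i, i + 1) for i in range(len(hops) - 1)
            if haversine_km(hops[i], hops[i + 1]) > threshold_km]


def a_star(graph, coords, start, goal):
    """A* over a landing-point graph; edge cost and heuristic are both
    great-circle distance, so the heuristic never overestimates."""
    frontier = [(haversine_km(coords[start], coords[goal]), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            new_cost = cost + haversine_km(coords[node], coords[nxt])
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                estimate = new_cost + haversine_km(coords[nxt], coords[goal])
                heapq.heappush(frontier, (estimate, new_cost, nxt, path + [nxt]))
    return None


# Approximate coordinates for Paris, London, New York (illustrative hop locations).
hops = [(48.86, 2.35), (51.51, -0.13), (40.71, -74.01)]
crossings = ocean_crossings(hops)

# Hypothetical landing points and cable segments.
coords = {"uk_landing": (50.8, -4.5), "ca_landing": (44.6, -63.6), "us_landing": (40.8, -72.9)}
graph = {"uk_landing": ["ca_landing", "us_landing"], "ca_landing": ["us_landing"]}
route = a_star(graph, coords, "uk_landing", "us_landing")
print(crossings, route)
```

A distance threshold alone cannot tell an ocean crossing from a long overland hop, so a real implementation would presumably also check the geography between the two hops.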

Coverage is good in EU and US but thin in Asia, Africa and South America. If you want to contribute a node: https://geotraceroute.com/join

Salim

21

Pi-hosts – Give the Pi coding agent access to your servers

github.com
0 comments, 2:14 AM
I initially built this for an AI chat bot that lets teams perform DevOps tasks straight from Slack/Teams (with proper permission control, obviously).

Useful to let developers perform mundane tasks, or help coordinate incident response.

I ended up using it myself on my own machine to manage Hetzner and AWS boxes, and I thought it might be useful to others.

The default policy should be good enough, but if you don't like living dangerously, you can set it to `paranoid`.

Install: `pi install npm:pi-hosts`

GitHub: https://github.com/hunvreus/pi-hosts

6

A Multi User Multi Task Board MCP Server

github.com
2 comments, 7:41 PM
I built a simple multi-user, multi-board Task/Kanban MCP server. I have been looking for something like this to manage development agents, but I wasn't seeing anything that felt like what I wanted. So I sat down and decided to vibe code an alternative.

While it was an experiment at first I have been using it daily for my personal development projects and I really think there are others who might be looking for exactly this. It's 100% a WIP, but it is also very usable.

I have a demo instance running at https://mootasks.dev. If you find this interesting I'd appreciate a star. This is really the first thing I built that I felt would be of interest to others.

The readme explains it, but if you have docker you can get this running in a couple minutes. It's helped my workflow a lot and I plan on continuing to add features / improve it.

6

AgentPort – Open-source Security Gateway For Agents

agentport.sh
2 comments, 4:33 PM
Hey HN!

I've been wanting to use something like OpenClaw for a while but couldn't get myself to give it access to anything important due to all the risks involved. Prompt injection is still a problem (even though some people seem to ignore it) and so are hallucinations and mishaps that cause agents to do things like delete production data [1].

Even harnesses like Claude Code and Codex are subject to this, particularly since we're getting progressively looser about how we run them e.g. Conductor is really popular and runs agents without any sandboxing.

That means we're in a bit of an all-or-nothing situation. There are people who just ignore the risks and connect everything to their agents and reap benefits from it while being subject to more risk, and there are others that just don't connect anything because they are mindful of the potential issues.

I've been quite cautious but have wanted to run more autonomous agents and so I built the component I needed to enable me to do so: AgentPort.

AgentPort is a gateway that connects to any service (e.g. Gmail, GitHub, Stripe, PostHog, Linear) and lets you set granular permissions for what the agent can do automatically, what it needs your approval for, and what it can never do.

For example, you can set `list_customers` and `get_customer` on the Stripe integration to "Auto-approve" but `create_refund` to "Ask for approval". The agent will thus be able to do a lot in the background independently but when it comes to a potentially destructive operation it will be blocked and receive an approval link to send to you. You can then approve or deny the call with those exact parameters e.g. `create_refund(customer_id: 1234, amount: 12)`.
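The three-way permission model can be pictured as a lookup from (integration, tool) to a decision. The sketch below is illustrative only and does not reflect AgentPort's real configuration format or API: `Decision`, `ToolCall`, `POLICY`, and `decide` are hypothetical names, and falling back to "ask" for unlisted tools is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    ASK = "ask for approval"
    DENY = "deny"


@dataclass
class ToolCall:
    integration: str
    tool: str
    params: dict = field(default_factory=dict)


# Hypothetical policy table mirroring the Stripe example in the post.
POLICY = {
    ("stripe", "list_customers"): Decision.AUTO_APPROVE,
    ("stripe", "get_customer"): Decision.AUTO_APPROVE,
    ("stripe", "create_refund"): Decision.ASK,
}


def decide(call: ToolCall, policy=POLICY, default=Decision.ASK) -> Decision:
    """Look up the decision for a call; unlisted tools use the assumed default."""
    return policy.get((call.integration, call.tool), default)


refund = ToolCall("stripe", "create_refund", {"customer_id": 1234, "amount": 12})
print(decide(refund).value)  # ask for approval
```

Keeping the full `params` on the call object is what would let an approver see and approve the exact arguments rather than the tool in general.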

Agents connect via MCP or CLI and have access to all the integrations you connected without ever getting API keys. Kind of like Composio but with granular permissions and open source.

The goal with AgentPort is to specifically address two vulnerabilities that agents are subject to:

1. Destructive operations on downstream services: It can't delete a database unless you explicitly approve

2. Credential exfiltration: Your agent never sees API keys

AgentPort also helps with sensitive data exfiltration, but that is more nuanced and complicated to defend against if the agent has an internet connection [2].

Ultimately, AgentPort was the missing piece for me to start running more autonomous agents that have access to third-party services, and hopefully it can unlock use cases for you too. There's a ton more work needed around securing agents (Claws in particular) and I've both been writing about it [3] and intend to do more in this space, so if you're thinking about similar things let's have a chat.

The repo is https://github.com/yakkomajuri/agentport and you can run it locally with docker compose in a minute or use the one-liner install to deploy a prod instance (domain, TLS, etc.) in just a few mins as well.

[1] "An AI agent deleted our production database. The agent's confession is below" (https://news.ycombinator.com/item?id=47911524)

[2] See my post "On agents dropping production databases": https://yakko.dev/blog/on-agents-dropping-production-dbs

[3] https://yakko.dev/blog

4

OmniForge – document intelligence and audio capture with local LLM

omniforge.online
0 comments, 3:47 PM
We built OmniForge for 2 reasons:

- we dread context switching between apps and wanted a unified place for docs and meeting recaps that can be used as context for an AI assistant

- we wanted an alternative to cloud solutions and the peace of mind that our information stays private

With OmniForge you can import documents, record > transcribe > summarize audio, and ask questions across all of them.

The app has a generous free plan and requires no registration/payment to try it.

Happy to answer questions and hear where this feels useful or where it falls short.

3

Platypus – Local meeting transcription, notes, and chat (Tauri, Rust)

platypusnotes.com
0 comments, 3:07 PM
Hi HN, I built Platypus because I wanted to combine note-taking, live transcription, and knowledge base management in one app. It is a free, local alternative to Granola and NotebookLM. It's a Tauri/Rust desktop app that does on-device meeting transcription via whisper.cpp, notes in TipTap, and per-project HNSW vector search across your knowledge base. You bring your own LLM (Claude, OpenAI, Gemini, or local Ollama).

A few things that were interesting to figure out:

- Zoom/Teams meeting auto-detection works via process inspection. Zoom spawns CptHost only during an active call (not when the app is just open); for Teams, the signal is its audio.mojom.AudioService sub-process. No Zoom/Teams API access needed.

- Local Whisper works great on a Mac. It is not recommended on older PCs, so I built in a switch to API transcription for when the local model is slow.
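The process-inspection idea can be sketched as a pure function over process command lines. This is an illustration in Python rather than Platypus's Rust code. The substring matching, the requirement that "teams" appear on the same command line as the audio service marker, and the sample command lines are all assumptions. The caller is expected to supply the command lines, for instance from `ps` or a process library.

```python
from typing import Iterable, Optional

ZOOM_MARKER = "cpthost"                    # helper process named in the post
TEAMS_MARKER = "audio.mojom.audioservice"  # sub-process marker named in the post


def detect_meeting(command_lines: Iterable[str]) -> Optional[str]:
    """Return "zoom", "teams", or None based on marker substrings."""
    for line in command_lines:
        lowered = line.lower()
        if ZOOM_MARKER in lowered:
            return "zoom"
        # Assumption: other Chromium-based apps can run the same audio service,
        # so the line must also mention Teams.
        if "teams" in lowered and TEAMS_MARKER in lowered:
            return "teams"
    return None


# Hypothetical command lines for illustration.
idle = ["zoom.us", "Google Chrome Helper --utility-sub-type=audio.mojom.AudioService"]
in_zoom_call = ["zoom.us", "CptHost"]
in_teams_call = ["Microsoft Teams Helper --utility-sub-type=audio.mojom.AudioService"]

print(detect_meeting(idle), detect_meeting(in_zoom_call), detect_meeting(in_teams_call))
```

Polling this every few seconds would be enough to start and stop recording without touching either vendor's API.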

Repo: https://github.com/pixelsmasher13/platypus
Site: https://platypusnotes.com

Feedback welcome!

3

Stream iOS Simulators to a Browser Window

github.com
1 comment, 9:34 PM
Agent tools seemingly know how to work with browsers better than with iPhone simulators, so I built this tool to capture the simulator XPC stream and render it in a webpage. This means Claude Code/Codex desktop apps can use their existing browser-use tools to launch the preview, screenshot, and read logs without getting caught in an osascript loop.

2

Agented, a Text Editor for LLMs

github.com
0 comments, 1:24 PM
Hi everyone,

I guess a lot of you by now know this pain. The agent goes off in some direction for fifteen minutes, half of it is wrong, and you're left trying to reconstruct from memory which parts to keep, because undo is one line back to nowhere useful.

`ae` is my idea to fix this. It's inspired by ed (yes, good old ed): line addresses, short verbs, and absolutely unusable for the common human being. I went down the rabbit hole and tried to reimagine ed in the age of LLMs. It works with any common agent, as a skill or over MCP. Check out the readme for all the details.
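For readers who have never used ed, the "line addresses, short verbs" style looks like the sketch below. It implements a tiny subset of classic ed semantics (`d` delete, `a` append after, `c` change) and is not `ae`'s actual command set, which is documented in its readme. The `apply` function and its signature are hypothetical.

```python
import re

# address[,address] followed by a single-letter verb, e.g. "2,3d" or "1a"
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([dac])$")


def apply(lines, command, text=()):
    """Apply one ed-style command to a list of lines and return a new list.
    Addresses are 1-based and inclusive, as in ed."""
    match = COMMAND.match(command)
    if not match:
        raise ValueError(f"unsupported command: {command!r}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    verb = match.group(3)
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"address out of range: {command!r}")
    new = list(text)
    if verb == "d":   # delete the addressed range
        return lines[:start - 1] + lines[end:]
    if verb == "a":   # append new text after the last addressed line
        return lines[:end] + new + lines[end:]
    return lines[:start - 1] + new + lines[end:]  # "c": replace the range


doc = ["alpha", "beta", "gamma", "delta"]
print(apply(doc, "2,3d"))           # ['alpha', 'delta']
print(apply(doc, "1a", ["new"]))    # ['alpha', 'new', 'beta', 'gamma', 'delta']
print(apply(doc, "2c", ["BETA"]))   # ['alpha', 'BETA', 'gamma', 'delta']
```

Because every edit is a short, explicit command against known line numbers, a log of such commands is easy to review or selectively replay.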

Every once in a while the LLM drifts back to its built-in tools and I have to nudge it back to `ae`. If you have any ideas on how to keep an agent (especially the one notorious for disobedience) from sliding back to defaults, and also general feedback, I'd love to hear it.

2

Stateless, system-wide Transparent Tor Proxy for Linux (v0.1.0)

github.com
0 comments, 12:44 PM
Hi, I made this transparent proxy because I wanted a tool that felt easy to use. You just turn it on and forget about it, and as soon as you turn it off your computer goes back to its previous state.

In short, TTP intercepts all TCP traffic and DNS queries (using nftables) and redirects them to Tor's TransPort and DNSPort, respectively. When turned off, it destroys its dedicated table atomically, without breaking the network configuration on your machine. It also implements a kill switch for connections established before TTP starts: their outgoing traffic is rejected, so each connection is automatically re-established over the secure connection. Finally, it offers custom SELinux policies and is crash-safe (using a lock file in /var/lib/ttp).

Disclaimer: this tool is not meant for high-risk activities. I DO NOT recommend it for any purpose other than privacy and/or testing and developing it. This is just a v0.1.0 by an undergraduate CS student. I made this to learn, so please help me out! Thank you all.

GitHub: https://github.com/onyks-os/TransparentTorProxy
Documentation: https://onyks-os.github.io/ttp/

1

Snitchmd – Cloudflare-protected URLs into clean Markdown via Docker

github.com
0 comments, 3:07 PM
Shmauthor here. Built this for myself, putting it out in case it's useful.

Needed any URL as clean Markdown for LLM context — including Cloudflare/anti-bot sites. curl gets HTTP 403 on those, raw HTML is 80%+ nav noise eating context, paid SaaS (Firecrawl, Jina) wasn't an option for me.

It's a Docker wrapper around two existing OSS tools: CloakBrowser (stealth Chromium that passes Cloudflare) and rs-trafilatura (HTML → Markdown). No new scraper, just glue. It runs locally, so my URLs stay on my box.

Token reduction (raw curl HTML vs snitchmd, tiktoken cl100k_base):

- cloudflare.com/learning/bots — curl: HTTP 403 → snitchmd: 0.8k

- docs.docker.com/engine/install — 187k → 0.9k

- en.wikipedia.org/wiki/LLM — 222.7k → 29.7k
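As a quick check on what these figures mean in relative terms, the reduction can be computed directly from the two rows that have a baseline (the Cloudflare row has none, since curl returned HTTP 403). The `reduction_pct` helper is a hypothetical name; the token counts are the ones listed above, in thousands.

```python
def reduction_pct(before_k: float, after_k: float) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round(100 * (before_k - after_k) / before_k, 1)


# (raw curl HTML tokens, snitchmd tokens), both in thousands, from the list above.
rows = {
    "docs.docker.com/engine/install": (187.0, 0.9),
    "en.wikipedia.org/wiki/LLM": (222.7, 29.7),
}

for url, (before, after) in rows.items():
    print(f"{url}: {reduction_pct(before, after)}% fewer tokens")
# docs.docker.com/engine/install: 99.5% fewer tokens
# en.wikipedia.org/wiki/LLM: 86.7% fewer tokens
```

The gap between the two rows is consistent with the Docker page being mostly navigation and markup while the Wikipedia article is mostly body text.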

Heads up: passes Cloudflare, can't solve "click traffic lights" captchas (reCAPTCHA v2, hCaptcha)

MIT. Happy to answer questions