매일의 Show HN

Upvote0

2026년 6월 5일의 Show HN

25 개
336

I Derived a Pancake #

absurdlyoptimized.com faviconabsurdlyoptimized.com
136 댓글6:42 AMHN에서 보기
After 25 years of making other people's pancake recipes - always yearning for more tang, more fluff, and more predictability - I decided to derive the pancake recipe from the chemistry.

You mark checkboxes for what you have on hand (ricotta, sour cream, kefir, buttermilk, yogurt, cottage cheese, lemon, cream of tartar, etc.) and it computes the best recipe based on targets for acid, fat, salt, sugar, and CO2.

My particular favorite are the yeast-raised lemon ricotta kefir pancakes - the best I've ever had.

The math is done in a small pure-ESM library: ingredient composition to component masses and acid moles, a stoichiometry layer, and a bisection solver for the target deficits.

I'm not a chemist, so if something is off, tell me and I will fix it!

156

Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens #

github.com favicongithub.com
78 댓글9:10 AMHN에서 보기
Hi HN,

Not sure if anyone would be interested.

But, just wanted to share that I've been maintaining my small tool called 'lowfat' that helps me filters some of my verbose CLI output.

It's a single binary, works as an agent hook or a shell wrapper. It has a plugin system to customize filters per command.

The idea is pretty simple: agents don't need the full kubectl get -o yaml or any 10k-line dump to make decisions. So that lowfat sits in between, strips the noise, and passes through what matters.

Here's my real report after 2 months of personal use:

lowfat history --all

  lowfat plugin candidates
  ─────────────────────────────────────────────────────────

    #  command                    runs   avg raw      cost   savings  source    status  
    1  kubectl get                101x     14.4K      1.5M     93.9%  plugin    good    
    2  grep                       103x     13.5K      1.4M     96.2%  plugin    good    
    3  git diff                    81x       995     80.6K     57.9%  built-in  good    
    4  kubectl                     90x       485     43.6K     33.6%  plugin    good    
    5  docker                     127x      5.5K    693.6K     96.1%  built-in  good    
    6  ls                         489x       117     57.3K     56.2%  built-in  good    
    7  find                        30x     16.5K    495.0K     95.5%  plugin    good    
    8  git show                    63x       490     30.9K     38.0%  built-in  good    
    9  git                        177x       368     65.2K     76.1%  built-in  good    
   10  git log                     86x       556     47.8K     78.5%  built-in  good    
   11  kubectl logs                 5x      3.6K     17.8K     43.0%  plugin    good    
   12  git status                  86x       152     13.1K     58.0%  built-in  good    
   13  docker ps                   20x       467      9.3K     52.8%  plugin    good    
   14  kubectl describe             6x       656      3.9K      1.2%  plugin    weak    
   15  docker images                9x       940      8.5K     61.8%  built-in  good    
   16  k get                        2x      2.1K      4.2K     35.9%  plugin    good    
   17  terraform                   10x       395      3.9K     32.1%  plugin    good    
   18  git commit                  32x        77      2.5K      0.0%  built-in  weak    
   19  docker build                 8x       487      3.9K     37.6%  built-in  good    
   20  docker compose              22x       979     21.5K     89.4%  built-in  good    

  total: 4.4M raw → 4.1M saved (91.8%)
My toolset above is kind limited, but it works pretty well for my usecase without any interruption Kinda help me not reaching the token limit for my company Bedrock limit usage and keep optimizing the saving on the go for later usage.

But, why not alternatives (https://github.com/zdk/lowfat#alternatives) ? The answers are: - My goal is to make the core lightweight but extensible via plugins i.e. not trying to bundle every command in the installed binary so that people own their output filters. - Customizable per usecase via plugin or filter pipelines as I am using my own toolset. - Customizable for non-public CLI tools, for example, some enterprise might have their interal CLI tools that public won't have access. - People should own their data. So the design is local-first, No telemetry forever. - I kinda love UNIX-style composible pipes, so lowfat-filter has implemented this style. - Be able to adjust aggressiveness of the filter, so we can control that we won't strip something the agent needed.

GitHub: https://github.com/zdk/lowfat

Anyway, if anyone is interested, feedbacks and questions are welcome!

Thanks!

32

MimicScribe – transcriber with ~97% accurate on-device speaker IDing #

mimicscribe.app faviconmimicscribe.app
9 댓글5:33 PMHN에서 보기
I’ve spent the last seven months building a tool I wish I’d had in my previous roles. MimicScribe is a macOS menu bar app that fits the "AI notetaker" category. It has accurate on-device speaker identification (a first possibly?), real-time meeting talking points for discovery calls, and a fully keyboard- and voice-driven interface.

I believe the accuracy of the speaker ID system is its biggest strength. I used fluid audio’s port of (https://github.com/fluidInference/FluidAudio) Pyannote's community-1 as a base. To improve accuracy, the system uses grammar structure cues from the Parakeet STT to mask by sentence. By taking a second set of samples within that mask for cluster assignment, it leverages the fact that most people don’t finish each other's… sandwiches in business meetings. It tends to slightly oversegment, as I’ve found it much easier to merge segments or reassign a speaker than it is to untangle an incorrect merge.

The app provides in-meeting talking points using a prompt tuned for discovery type calls. It can suggest probing questions to help you extract more detail or helps you refocus on the big picture with “magic wand” type questions (e.g. “how would your ideal system work”). Getting low latency models to provide novel, relevant, and totally not hallucinated information is a bit of a reach and it tends to restate the transcript frequently but little gems do come from it sometimes so it’s best to think of it as a source of inspiration and be a vigilant gatekeeper.

It’s set up so recording can be started and ended via holding a keyboard shortcut instead of connecting to your calendar service. I prefer this for privacy and to keep transcript history from getting cluttered. Tapping the shortcut shows and hides an always-on-top overlay on your active screen regardless of whether you have other apps full-screen or not. Beyond simple navigation, you can also use voice commands to make post-meeting corrections or additions, for instance, you can simply say "merge this speaker with that speaker" to clean up the transcript.

A developer friend who’s worked in finance reviewed the site and said he’d bounce because the privacy story wasn’t strong enough so I added a completely on-device mode and a bring-your-own-key option. Using cloud models does add a lot to the experience, including context aware speaker merging and fragment cleanup, summary items during meetings, action items attributed, etc. On-device mode is completely free and the speaker identification is still very useful.

The privacy story is my biggest worry with the app, particularly since its target audience is more technical people. I’d love to get people's thoughts on it and any feedback would be super helpful.

20

I nerfed our coding agents on purpose #

10 댓글11:19 PMHN에서 보기
Tl;dr: I trained a classifier to route to the least expensive model and reasoning depth to complete the request. Coupling that with additional automated token efficiency techniques has yielded 3x usage for the same spend. For anyone interested in trying it themselves: https://nerfguard.com

Various teammates and I switched over to Codex from Claude Code recently. We still bounce between the tools, but Codex’s speed and steerability coupled with performance gains were hard to ignore. One of the downsides was that the per token pricing kicked in way sooner. This is happening across the board, but we felt it in Codex more acutely. We’re a startup filled with people who work around the clock and are obsessed with building — naturally our daily bill alone was striking.

Luckily we’re going after a big mission and speed matters significantly more than marginal token spend on the edges. Still, it got us thinking about how it was ludicrous that while our product has a side effect of decreasing token spend and speeding up agentic workflows by many orders of magnitude, we were using these top tier models for all types of internal coding tasks without any of those optimizations. The waste felt pretty ridiculous — the most glaring culprit was that we were seemingly using the max intelligence model on max reasoning for every task even when the task clearly didn’t require it. As a company who spends a lot of time on cached intelligence, it was also easy for us to see how there was plenty of other low hanging fruit as well.

So, on a recent weekend, I quickly built a tool to optimize our usage. At its core is a very fast classifier that classifies your requests to the least intelligence required for the task and includes some nice token optimizations on top. The result is roughly the same quality for multiples lower token spend. But even more exciting for us, is that the properly bin packed intelligence and reasoning levels meant our speed also went up considerably. This wasn’t negligible.

We’ve observed up to 3x savings and hours per day per person in saved time that we would have otherwise been waiting on tool turns and coding agent responses.

For us, that means improved engineering velocity and significantly higher usage for the same spend. It also means more usage before getting throttled.

As I told friends about this, they also wanted to start using it to maximize the usage they could get out of their coding agent plans. There are now engineers across many of the most cutting edge AI companies using this tool to optimize their token utilization in this way. Not just to save money, but to maximize output. Turns out that the best way to avoid getting nerfed by Claude is to intentionally nerf yourself selectively. We decided to release it for the rest of the builder community to use as well. You can now turn on Nerfguard for yourself and start getting more usage today.

19

Courtside – TUI for NBA Games #

github.com favicongithub.com
5 댓글11:43 PMHN에서 보기
Hi HN, I made this after seeing a few similar projects on the front page. NBA API endpoints are public and there’s a pretty robust python package ( https://github.com/swar/nba_api ) that I referenced for the endpoint structure to build an sdk in go. used BubbleTea and LipGloss for styling. It was a bit tricky to test the live endpoints but I watched Friday’s Final game with this and it worked pretty well

playball - https://news.ycombinator.com/item?id=45451577

faceoff - https://news.ycombinator.com/item?id=47826104

10

Concord – Discord in Terminal #

github.com favicongithub.com
2 댓글8:56 AMHN에서 보기
Concord(not that game) is TUI client for discord.

Here are features: Login by token, email/password or QR code from the mobile app

Vim-style keys and full customization

Messaging, Voice chat support

Attachment downloads, uploads

Reactions(Unicode + custom emoji), poll voting

Inline image previews via Kitty, iTerm2, or Sixel protocols (halfblock fallback for anything else)

Avatar and custom emoji rendering, full-screen image viewer

Live typing indicators, unread + mention counts

Desktop notification

10

Lessons learned from running Claude Code swarms at scale #

6 댓글4:34 AMHN에서 보기
Some time ago I built a simple app to run swarms of coding agents — I call it fleet (https://news.ycombinator.com/item?id=48256389). It's based on centralized beads with a Python orchestrator and can run any coder (Claude, agy, Codex). Recently I added a UI to manage the whole agent lifecycle: adding new tasks, monitoring running ones, and a chat interface built on MCP with a centralized SQLite DB. From the UI I can spawn agents to run in any directory, define dependencies on other tasks, and specify which coder/model should do the job. Today I can run 10–15 agents concurrently. At that scale you burn through limits very fast, so I spent some time investigating where those limits go and how to maximize efficiency. Here are the lessons learned after a few weeks of running the fleet:

- CLAUDE.md is a terrible abstraction. These files load unconditionally, they often contain descriptions irrelevant to the task at hand, and they stack from your working directory upward. The result is wasted tokens and confusion from injecting irrelevant instructions into the session.

- Skills are bad, but not as bad as CLAUDE.md. They use a progressive disclosure approach: only the skill description goes into the session, and Claude loads the full skill text with a tool when it's needed. That's one level better, but it still doesn't let you scale — you can't create 10K skills, as that would eat your entire usable context. Claude recently introduced a skills budget that silently drops less frequently used skills from the session entirely. You can still invoke them in an interactive session, but the model can't invoke them in a background session.

- Some plugins may be installed more than once. During cleanup I found that a few of mine were installed in multiple locations, consuming double the tokens on duplicated instructions.

- Attaching plugins to every session is a bad idea at scale. You want to be precise about which plugins are actually useful and attach them per task.

- Use a hierarchical knowledge base instead of CLAUDE.md / skills / plugins. It lets you benefit from real progressive disclosure: keep your instructions and tool descriptions in it and let Claude navigate through it quickly and cheaply.

- System tools consume ~15K tokens (7% of the session). You can't manage this — they're just attached, and disabling tools doesn't remove them from the context.

- AskUserQuestion isn't available in background sessions. You need to implement your own tool — MCP- or CLI-based — to give `claude -p` the ability to talk to you.

- You become selective about which model handles each task. Decompose work into harder and simpler subtasks so you can route the simpler ones to weaker, cheaper models and save tokens.

- Your context-switching skill improves over time.

Fleet repo: https://github.com/sermakarevich/fleet

6

Omni – Local-first multimodal file search on macOS #

hanxiao.io faviconhanxiao.io
2 댓글11:20 PMHN에서 보기
Finally made something I've always wanted, using the model we built.

• SOTA omni embedding model, fully local, indexes text, PDF, image, audio, and video • Swift-native app UI + mlx-swift-transformer core. No Python. • Tested on M3 Pro 18G / M3 Ultra 512G / M4 Pro 48G. All work fine. • HTTP server exposes search to local agents like OpenClaw & Hermes − Indexing still feels slow even on the latest M3 Ultra, ranging from 10K tps to 300 tps depending on file type − Fans go crazy, high power draw while indexing − Search is near-instant. Multimodal relevance is sometimes arguable, but the idea is recall (the agentic LLM takes the results and refines for the final answer), so maybe that's fine

4

I benchmarked LLM agents on fixing real-world security vulnerabilities #

giovannigatti.github.io favicongiovannigatti.github.io
3 댓글7:43 AMHN에서 보기
I built a benchmark with 20 real CVEs across 18 Python projects (Pillow, GitPython, yt-dlp, urllib3, etc). I've run it over 5 LLM agents (3 OpenAI, 2 poolside) and 3 different prompts (full advisory, locate, diagnose) with a total of 300 runs. The agents are tasked to fix security vulnerabilities in a sandboxed environment and they are scored against a hidden security tests from the maintainer's own fix.

Best solve rate was 50%. On the other 50%, some fixes are sometimes coherent and pass all regression tests, but vulnerability still present.

The main differentiator I found between models is cost: gpt-5.5 at 12× more expensive than gpt-5.4-mini while producing statistically similar results. Within-family performance gaps are small, which points out the difference is likely due to model training data. I also did a power analysis and the task count needed to detect a meaningful within-family edge at ~700.

Full write-up: https://giovannigatti.github.io/cve-bench

Code: https://github.com/GiovanniGatti/cve-bench

2

Bash Runtime for AWS Lambda #

github.com favicongithub.com
0 댓글7:12 PMHN에서 보기
Hi HN,

I built a Bash runtime for AWS Lambda to make writing glue code simpler and faster. Sometimes, all you need is a bit of `sed`, `awk`, maybe a loop and a few HTTP API calls, and this runtime gives you all the tools to do that. It comes bundled with `jq` and `curl` so you can handle JSON payloads and string together HTTP API calls right out of the box, including calling AWS services with `curl --aws-sigv4`.

In keeping with the theme, the Lambda handler contract is also made as simple as practical: read from stdin, write to stdout, return 0 for success and non-0 for error. You can run shell scripts, call binaries (either what's available in `al2023.provided` or you can package your own static binaries with your handler), or a combination of both. If you remember nodding along to Adam Drake's post about how bash and coreutils can be faster than a Hadoop cluster, I hope you give this a whirl and find it useful. The runtime is packaged as a Lambda layer, so it should drop right into your normal AWS infrastructure.

2

LLMhop – A tiny, stateless router for LLMs with a NixOS module #

github.com favicongithub.com
0 댓글12:26 AMHN에서 보기
LLMhop is a tiny stateless proxy for LLM inference servers. It tackles an issue I faced when trying to serve more than one local LLM at once which is not natively supported by vLLM. The LLMhop binary inspects the model field of the request and routes it to the correct backend service with optional handling of authentication. In addition, it contains a NixOS module to run llama.cpp, vLLM, and sglang via Quadlet/Podman and auto-register with the proxy.