Show HN for April 27, 2026

33 items

389

OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview #

github.com

Scored 65.2% vs Google's official 47.8%, and the existing top closed source model Junie CLI's 64.3%.

Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would also like to clarify a few things:

1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.

2. The CLI agent was run in a leaderboard-compliant way (no modification of resources or timeouts).

3. The full terminal bench run was done using the fully open source version of the agent, no difference between what is on GitHub and what was run.

I was originally going to wait for it to land on the leaderboard, but it has been 8 days and the maintainers have unfortunately not responded (there is a large backlog of pull requests on their HF), so I decided to post anyway.

124

A terminal spreadsheet editor with Vim keybindings #

github.com

51 comments · 11:39 AM · View on HN

While speccing out this spreadsheet tool, I realized that I never had to think about the keybindings. It all just came naturally from Vim. Normal/insert/visual modes, hjkl navigation, dd/yy/p, :w, :q. The usual muscle memory works.

It supports CSV/TSV import and export, and a native .cell format that preserves formulas. The formula engine handles SUM, AVERAGE, COUNT, MIN, MAX, and IF with range references.

The codebase is a Cargo workspace: a pure cell-sheet-core library (no TUI dependency) and a cell-sheet-tui crate on top of ratatui. Early days, but it's usable.

To try it out: cargo install cell-sheet-tui

Feedback of any kind is greatly appreciated!

124

Utilyze – an open source GPU monitoring tool more accurate than nvtop #

systalyze.com

28 comments · 1:55 PM · View on HN

The standard GPU utilization metric reported by nvidia-smi, nvtop, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor is highly misleading. It reports the fraction of time that any kernel is running on the GPU, which means a GPU can report 100% utilization even if only a small portion of its compute capacity is actually being used. In practice, we've seen workloads with ~1–10% real compute throughput while dashboards show 100%.

This becomes a problem when teams rely on that metric for capacity planning or optimization decisions: it can make underutilized systems look saturated.

We're releasing an open-source (Apache 2.0) tool, Utilyze, to measure GPU utilization differently. It samples hardware performance counters and reports compute and memory throughput relative to the hardware's theoretical limits. It also estimates an attainable utilization ceiling for a given workload.
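To make the difference between the two kinds of metric concrete, here is a minimal sketch with made-up numbers. The function names, the 50 TFLOP/s achieved rate, and the 1000 TFLOP/s peak are all assumptions for illustration; they are not Utilyze's API or measurements.

```python
# Hypothetical numbers for illustration only; not measurements from Utilyze.

def time_based_utilization(busy_seconds: float, window_seconds: float) -> float:
    """Fraction of the sampling window during which any kernel was running."""
    return busy_seconds / window_seconds


def throughput_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Achieved compute throughput relative to the hardware's theoretical peak."""
    return achieved_flops / peak_flops


# A kernel keeps the GPU busy for the entire 1-second sampling window...
time_util = time_based_utilization(busy_seconds=1.0, window_seconds=1.0)

# ...but sustains only 50 TFLOP/s on a device with an assumed 1000 TFLOP/s peak.
compute_util = throughput_utilization(achieved_flops=50e12, peak_flops=1000e12)

print(f"time-based utilization:  {time_util:.0%}")
print(f"throughput utilization:  {compute_util:.0%}")
```

Under these assumptions the same workload looks fully saturated by the first metric and mostly idle by the second, which is the gap the post describes.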

GitHub link: https://github.com/systalyze/utilyze

We'd love to hear your thoughts!

The Unix Magic poster, annotated (updated) #

github.com

7 comments · 1:32 AM · View on HN

This is a site that maps the references on Gary Overacre's 1980s UNIX Magic poster to short write-ups with sources. I posted an earlier version about a year ago [1]. Since then I rewrote some of the annotations, added deep-linking to individual markers and a frame/sidebar view, gave the site a terminal-style redesign, and fixed historical inaccuracies (daemon etymology, nroff origin, B language vs. Multics, etc.).

Contributions and comments welcome; each marker is a GitHub issue.

site: https://unixmagic.net

[1] https://news.ycombinator.com/item?id=43019136

Unusual Wikipedia #

unusualwiki.nk412.com

7 comments · 4:27 PM · View on HN

I built a dual crossword puzzle where two crosswords share one grid #

forkle.co.uk

15 comments · 10:57 AM · View on HN

Forkle (forkle.co.uk) is a daily word game where two thematically linked crosswords occupy the same grid simultaneously. Every tile contains two letters - one belonging to each puzzle - displayed as a diagonal colour split. Where the two puzzles intersect, some tiles share the same letter, giving you a foothold into both crosswords at once.

The mechanic has two layers: the shared grid, and the connected themes. Each day's two crosswords are thematically paired around a central idea. Today's puzzle is "Same House, Different Kingdoms" - same home, same humans, entirely different worlds. One crossword is the dog's world, the other is the cat's. The themes are chosen to be related but distinct, which creates an extra layer of satisfaction when the connection clicks.

The constraint of forcing two crosswords into one physical space turns out to create genuinely interesting solving decisions - sometimes the two puzzles help each other, sometimes they fight.

Built solo over a few months using React, Python, Supabase, Fly.io and Resend. Three months of daily puzzles pre-loaded. Launched two weeks ago. Would love feedback from anyone willing to try it.

Qumulator – quantum circuit simulator, 1000 qubits, no GPU #

github.com

4 comments · 3:56 PM · View on HN

I built a reference site for the recurring hard problems in software #

thehardparts.dev

3 comments · 11:35 AM · View on HN

Hi HN, I've been working on this for a while, and it was hard to decide when to stop, both in refining how the information is presented and in adding entries. It's not meant as a blog, but rather as a reference that keeps growing.

Link: https://thehardparts.dev

Currently I've created 4 main sections:

- Failure Modes: ways projects go wrong

- Red Flags: early signals that are worth taking seriously

- Tech Decisions: common and not so common trade-offs for hard choices

- Playbooks: guided approaches for situations that repeat

I've also focused on creating links between them to show how connected many things are: a red flag usually precedes a failure mode, which might connect to a forced decision, etc.

Some entry points to give you an idea:

- The Invisible Deadline: a date that exists socially but not explicitly enough to manage honestly

- Everyone Asks The Same Person: when one person becomes the default source of truth

- Build a Practical Rollback Strategy: how to build a reliable rollback strategy

It has 151 entries across the 4 sections.

Curious what you think about the content, format, grouping.

I made a website to clean up recipe websites #

tangled.org

7 comments · 3:17 PM · View on HN

Hi HN! It has always annoyed me that recipe websites are so cluttered and have so much (in my opinion) useless fluff, so I made a little website to solve that problem! One good thing about SEO for once is that Google has pretty much forced the internet to adopt JSON-LD for recipes, so many recipes are directly extractable. For the others I have several fallback parsers, including an indieweb parser :)

If you do find a website that is broken, feel free to click the flag button or share it, as I would love to fix any edge cases.
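For readers curious what the JSON-LD path can look like, below is a minimal sketch using only the Python standard library. It is not the site's actual parser: the class and function names and the sample page are made up for illustration, and the fallback parsers are not covered.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_recipes(html: str) -> list[dict]:
    """Return every JSON-LD object in the page whose @type includes "Recipe"."""
    collector = JsonLdCollector()
    collector.feed(html)
    recipes = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        # JSON-LD may be a single object, a list, or an object wrapping "@graph".
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                recipes.append(item)
    return recipes


# Made-up sample page: the recipe data sits in JSON-LD, the fluff in the body.
SAMPLE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
 "recipeIngredient": ["2 eggs", "1 cup flour"]}
</script>
</head><body><p>A long story about my grandmother...</p></body></html>
"""

recipes = find_recipes(SAMPLE)
print(recipes[0]["name"], recipes[0]["recipeIngredient"])
```

The `name` and `recipeIngredient` keys are standard schema.org `Recipe` properties; real pages vary a lot in how they nest them, which is presumably where the edge cases come from.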

Vibe-coding video games with Claude (Day 14: Tetris) #

gamevibe.us

3 comments · 4:03 PM · View on HN

I used to run a flash games website (SWF files) years ago. I've made a few games of my own. I'm also an avid gamer and love to play games of all kinds.

I'm also a software engineer, and a few days ago I decided I wanted to run a games website again. So I bought the domain gamevibe.us and with the help of Claude I've been vibe-coding one video game every day since.

Happy to answer questions, take feedback, etc

Paper Millionaire – A Startup Stock Option Horror Roguelike #

paper-millionaire.pagey.site

0 comments · 3:45 AM · View on HN

Building a SQL analyst agent from scratch #

raminmousavi.dev

0 comments · 9:07 AM · View on HN

I built a SQL analyst agent based on a simple idea: generating SQL is not the same as doing analysis.

Most text-to-SQL tools stop at producing a query. But real analysis is iterative. You explore the schema, run queries, adjust, and refine.

This project tries to model that loop instead of treating a query as the final output.

I wrote about the approach, challenges, and tradeoffs here: https://raminmousavi.dev/blog/building-a-sql-analyst-agent

Github repo: https://github.com/raminious/sql-analyst-ai-agent

Git-agecrypt – transparent file-level encryption for Git #

github.com

0 comments · 3:31 PM · View on HN

Greetings HN! I've forked the excellent work done by [vlaci/git-agecrypt](https://github.com/vlaci/git-agecrypt). It looks like the original project has not been maintained for a while, so I decided to pick it up, update all the dependencies, and add some thorough testing.

I like the ability to store sensitive data in public repositories. It's especially useful when bootstrapping new IaC repositories: you don't have anything at the beginning of a new project, so there is no place yet to store secrets. This is "a way" of doing it.

I know you can use tools like SOPS, but I think the transparent approach offered by tools like this one or git-crypt really shines for many use cases.

Looking forward to hearing your feedback!

What happens when you load a webpage (Interactive) #

toolkit.whysonil.dev

0 comments · 7:26 PM · View on HN

Friendly prediction markets to turn trips into a running tournament #

bets.bernikins.com

0 comments · 2:06 AM · View on HN

On a trip with my friend, he introduced me to Kalshi and Polymarket, and I was shocked at how degenerate it was to be able to bet on literally anything. So naturally, I ironically built an app where we could do the same thing (with fake money) on group outings!

Managed to get a prototype finished in time for our big 14-person annual ski trip and we had a blast 'betting' on things like "Will [friend that always breaks his phone] break his phone?" and "Who will win the first game of Jackbox?" Having a low-stakes running leaderboard during our trip was so fun that I decided to flesh it out and share it with others for their trips, wacky professors, etc.

*How It Works*

1. Create a group for your trip and give your friends the group code to join.

2. Everyone starts with the same amount of Tokens/Points/Ski-bux that they can wager on each question.

3. Place bets on the options that you think will win - the more bets on an option, the more it costs to pick that option.

4. Anyone can propose when an outcome has happened and everyone gets 24 hours to dispute it.

5. Every share of the winning option pays out as 1 Token once the 24 hours has passed. So if you bet 2 tokens when the odds were 50%, you win 4 Tokens!
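The payout arithmetic in step 5 can be sketched as below. This is a simplification, not the app's pricing code: it assumes the whole stake is filled at one fixed price, whereas step 3 says the price rises as more bets land on an option. The function names are made up for illustration.

```python
def shares_bought(stake: float, price: float) -> float:
    """Shares received for `stake` tokens at a fixed per-share price.

    `price` is the option's implied probability, so it must lie in (0, 1].
    Assumption: the whole stake fills at this single price.
    """
    if not 0 < price <= 1:
        raise ValueError("price must be in (0, 1]")
    return stake / price


def payout(shares: float, option_won: bool) -> float:
    """Each share of the winning option pays 1 token; losing shares pay nothing."""
    return shares * 1.0 if option_won else 0.0


# The example from the post: 2 tokens wagered when the odds were 50%.
shares = shares_bought(stake=2, price=0.5)
print(shares, payout(shares, option_won=True), payout(shares, option_won=False))
```

At a 50% price, 2 tokens buy 4 shares, so a win returns 4 tokens (a net gain of 2) and a loss returns nothing.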

Please let me know what you think and what features I should add to make it even more fun for your friend groups!

---

Requires an account to use, but feel free to use this test account if you just want to peek around: (u: [email protected] - p: 11111111)

Also available on:

Android: https://play.google.com/store/apps/details?id=com.bernikins....

iOS: https://apps.apple.com/us/app/bernibets/id6761561613

YubiClicker, a clicker game that requires a physical security key #

yubiclicker.com

1 comment · 12:23 PM · View on HN

I thought Cookie Clicker but with YubiKey taps might be funny, and this is a proof of concept. Cloudflare did something similar in 2021, pitching YubiKey-proven taps as a CAPTCHA replacement:

https://news.ycombinator.com/item?id=27141593

Cloudflare's idea of proving personhood with YubiKey taps wasn't well received at the time. But here for a silly game, I think the idea holds up better. We can prove that you have a key, and serially tapped it a lot. If you automate activating the touch sensor on a physical key, you might be able to do a bit better. That's cool and you deserve to win.

2 weeks of coding, 3 months of OpenAI review, my ChatGPT App is live #

1 comment · 4:05 PM · View on HN

I run Tredict, an endurance sports training platform I've been building since 2020. OpenAI opened the ChatGPT App Directory to third-party submissions in December, and the official Tredict app is now live. The actual programming took me two weeks, but the entire process took three months.

AMA on the submission process to OpenAI (timeline, review effort, what they ask for), how I solved user-authenticated content inside the iframe widgets, or why I had to remove certain tools to stay on the fitness side of OpenAI's fitness/health line.

https://www.tredict.com/blog/tredict_chatgpt_app/

Connect with a free ChatGPT account in a couple of clicks, then ask ChatGPT to analyse your activities, rename past sessions, or create structured workouts. Planned workouts sync to Garmin, Coros, Wahoo, Suunto and some more via Tredict. When you ask for it, an interactive Tredict view opens directly in the chat thread, showing the actual activity with charts, map and metrics, or the structured workout you just created.

Two things I find interesting about this:

The app uses MCP UI Apps, not just tools. Tredict's actual activity and plan views render inside the chat as interactive widgets. Most ChatGPT apps I've seen so far are tool-only; the widget pattern is still uncommon. Getting user-authenticated content into those widgets was the hardest part. The widget runs in a sandboxed iframe that has no access to the user's OAuth tokens, and there are basically no documented best practices for this yet.

ChatGPT is also frugal with its context window, so it tends to fetch the activity list and skip the detailed metrics unless you nudge it. A vague "tell me about my run" gets a shallow answer, while "fetch the details and give me a detailed assessment" gets the full analysis. For multi-week plan creation Claude with the same MCP server still works noticeably better. With Claude.ai I can build full structured training plans spanning weeks or even months, with proper periodisation, mixed sport types and individualised intervals based on past activity data. ChatGPT struggles with that scope. The limit sits with the host, not the server. The interactive MCP UI Apps also work in Claude.ai, so the same activity and plan widgets render directly in the chat there too.

Server lives at https://www.tredict.com/api/mcp/v2 and works with any MCP-compatible host. Honestly it works best with Claude.ai, which makes it slightly absurd that my application to be listed in Anthropic's connector directory has been pending without feedback for a while. If any Anthropic folks see this: would genuinely appreciate a status update or even a rejection with reason.

Launch Your Product. Get Seen Weekly #

2 comments · 10:12 AM · View on HN

visit: https://www.scrolllaunch.com/

TermToMD: paste in choppy terminal output, get clean Markdown #

termtomd.com

0 comments · 11:31 AM · View on HN

It's not perfect, but it works for my needs. Hopefully it will be useful to someone else.

I was using a terminal-based LLM to help me create some tickets for an upcoming sprint. Pasting directly into the ticket system produced a poor-looking result, so I created TermToMD.

If you try it out, there is a gear button that will let you customize the results. Also the icon on the left will let you toggle dark mode and it has a quick markdown cheat sheet.

Webhook API – inbound email –> webhook #

echovalue.dev

0 comments · 11:40 AM · View on HN

Hi HN,

I often run into systems that can send email but cannot call a webhook directly.

I built a tool that creates an opaque email address <mailboxId>@hook.echovalue.dev and forwards inbound messages as structured JSON to a configured endpoint. It can also call a webhook from a cron schedule, so small scheduled jobs do not need their own worker.

There are built-in formats for Slack, Discord, Teams, Telegram and custom JSON templates.
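As a rough illustration of the email-to-JSON step, here is a minimal sketch using Python's standard `email` package. The payload field names, the `abc123` mailbox ID, and the sample message are assumptions for illustration; the service's real JSON schema is described in its docs and may differ. The actual HTTP POST to the configured endpoint is left out.

```python
import json
from email import message_from_string
from email.policy import default

# Made-up inbound message; "abc123" stands in for a real mailbox ID.
RAW = (
    "From: alerts@example.com\n"
    "To: abc123@hook.echovalue.dev\n"
    "Subject: Disk almost full\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Volume /data is at 93%.\n"
)


def email_to_payload(raw: str) -> dict:
    """Parse a raw RFC 5322 message into a small JSON-serializable dict.

    The keys used here are hypothetical, not the service's documented schema.
    """
    msg = message_from_string(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "text": body.get_content().strip() if body is not None else "",
    }


payload = email_to_payload(RAW)
# This JSON document is what would be sent to the configured webhook endpoint.
print(json.dumps(payload, indent=2))
```

Parsing with `policy=default` returns an `EmailMessage`, which is what provides `get_body()` and `get_content()`; the legacy default policy does not.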

Pricing is pay-as-you-go and intentionally low. It is mostly there as an abuse limit rather than a subscription model.

Docs: https://docs.echovalue.dev/webhook/

Any feedback?

Tera – A Compiler‑Native UI Framework with Shared Runtime/AI Context #

github.com

0 comments · 6:59 PM · View on HN

DAG-chat – DAG-based AI chat app with branch and merge #

github.com

0 comments · 1:14 PM · View on HN

Prediction market analysis app layering LLMs with data APIs #

apps.apple.com

5 comments · 3:06 PM · View on HN

I created a prediction market analysis app after trying prediction markets and doing quite poorly. I wondered if AI-driven predictions could be better with the right data. Depending on the model you use, the answer swings wildly between "definitely not" and "yes". Gemini 3 Flash and Sonnet have done well with complex pipeline analysis instructions. Examples of the data APIs I use are FRED, the NWS, Open-Meteo, CoinGecko, and congress.gov. I also incorporated general search queries with Serper and Tavily.

In full disclosure, I got maybe too ambitious and prepped it as an iOS product with monetization features. I am a vibe coder from a non-technical background and this is the first thing I created, so I would love critical feedback.

There is a sign-in but email verification is off so you can enter whatever.

Terminal UI for managing SSH servers (users admin, file transfers) #

github.com

0 comments · 4:05 PM · View on HN

I built a systems programming language (Tin) #

github.com

0 comments · 3:22 PM · View on HN

I spent 5 years building a financial planning and optimization tool #

projectionlab.com

0 comments · 3:10 PM · View on HN

It took me half a decade[0], but I finally landed the original vision for ProjectionLab.

It started out as a personal finance simulator, and I always wanted to build an optimization layer on top. This month I did!

I've added a tax strategy engine to automatically coordinate Roth conversions, tax-aware withdrawal blending, and gain harvesting, around constraints like targeting a federal tax bracket, preserving ACA subsidies, respecting IRMAA cliffs, or avoiding NIIT.

A beam search wraps around this to help find the best strategy for your objectives (maximizing net legacy, minimizing lifetime taxes, etc.)

It's been pretty cool to see that in the median case this often saves about $300k in lifetime taxes.

If you tried PL years ago in one of my original Show HNs, I'd love to hear what feels different now, or what areas still need more attention.

I've incorporated a lot of this community's feedback, e.g. there is now a free tier with basic forecasting that does save your data. And I've added more international depth for folks in Canada, UK, AU, and some other locations.

Also noteworthy:

- Flexible spending based on portfolio performance to help simulate more realistic spending profiles and boost Chance of Success.

- Net legacy estimation, estate planning, and charitable giving (QCDs, DAFs, etc)

- Government benefit estimation (US: Social Security, Medicare, ACA subsidies, Canada: CPP, OAS, GIS, etc)

- Monte Carlo options like block bootstrap (stitch random blocks of consecutive years to mix eras while keeping year-to-year patterns)

- Changelog [1]

[0] previous Show HNs: 2021 (https://news.ycombinator.com/item?id=26969173), 2022 (https://news.ycombinator.com/item?id=31083093), 2023 (https://news.ycombinator.com/item?id=36849502)

[1] https://projectionlab.com/changelog
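The block bootstrap option mentioned in the list above can be sketched roughly as follows. This is a generic illustration of the technique, not ProjectionLab's implementation: the return series is made up, and the fixed block size and function name are assumptions.

```python
import random


def block_bootstrap(returns: list[float], horizon: int, block_size: int,
                    rng: random.Random) -> list[float]:
    """Build a `horizon`-year return path by stitching random blocks of
    consecutive historical years, so year-to-year patterns inside each block
    are preserved while different eras get mixed together."""
    if not 1 <= block_size <= len(returns):
        raise ValueError("block_size must be between 1 and len(returns)")
    path: list[float] = []
    while len(path) < horizon:
        # Pick a random starting year that leaves room for a full block.
        start = rng.randrange(len(returns) - block_size + 1)
        path.extend(returns[start:start + block_size])
    return path[:horizon]  # trim the final block to the requested horizon


# Made-up annual returns, for illustration only.
historical = [0.10, -0.05, 0.20, 0.07, -0.15, 0.12, 0.03, 0.25]

rng = random.Random(42)  # seeded so a run is reproducible
path = block_bootstrap(historical, horizon=10, block_size=3, rng=rng)
print(path)
```

A Monte Carlo run would repeat this many times and simulate the plan against each path; larger blocks keep more of the historical sequence intact, smaller blocks mix eras more aggressively.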

Tunemark – Bookmark moments in songs on Android #

tunemark.app

0 comments · 11:01 AM · View on HN

I've been practising dancing for a few years now, and while practising a choreography I found myself constantly jumping back to certain parts of a song. While there are DJ-type apps available that can be used to add cue points to tracks, those require you to have the song file. Since I couldn't find an app where I could do this with a streaming service, I decided to build one myself.

Tunemark works by reading and controlling the currently playing media from Android's media notification. This allows it to work with most music apps as long as they show the currently playing media notification.

This is my first ever Android application and it has been fun learning some native app development. I have some features planned for future releases such as sharing bookmarks via a link and perhaps a companion app for smart watches.

Hopefully you find this interesting and if you or someone you know could benefit from it, please give it a try or let them know!

I'm happy to answer any questions and hear any feedback you have on the app, the product page, etc.