Show HN for May 2, 2026

32 items

168

State of the Art of Coding Models, According to Hacker News Commenters #

hnup.date

87 comments | 9:25 PM

Hello HN,

I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try and automate this process.

Basically the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for the harnesses people use, or for info on self-hosting or hardware setups.

I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info.

https://hnup.date/hn-sota

Agent-desktop – Native desktop automation CLI for AI agents #

github.com

43 comments | 2:18 AM

I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 Stars on GH). I figured it was worth sharing here.

Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly like this:

  1. Take a screenshot
  2. Have the model predict pixel coordinates
  3. Click x,y
  4. Take another screenshot
  5. Repeat

That works, but it's slow, expensive in tokens, and fragile. If the UI shifts a few pixels, things break. And the model still doesn't know what any element actually is.

But the OS already exposes structured UI information:

  - macOS: Accessibility API
  - Windows: UI Automation
  - Linux: AT-SPI

Screen readers have used these APIs for years. On the web, Playwright beat screenshot scraping for the same reason: structured access is just a better abstraction than pixels.

So I built a desktop equivalent: agent-desktop.

It's a cross-platform CLI for structured desktop automation through the accessibility tree. One Rust binary, about 15 MB, no runtime dependencies. It exposes 53 commands with JSON output, so an LLM can inspect and operate native apps without screenshots or vision models. Inspired by agent-browser by Vercel Labs.

A typical loop looks like this:

  agent-desktop snapshot --app Slack -i --compact
  agent-desktop click @e12
  agent-desktop type @e5 "ship it"
  agent-desktop press cmd+return

So the loop becomes:

  1. Snapshot
  2. Decide
  3. Act
  4. Snapshot again

The main design problem was context size.

A naive approach would dump the full accessibility tree into the model, but real apps get huge. Slack can easily exceed 50,000 tokens for a full tree dump, which makes the approach impractical.

The approach I ended up using is progressive skeleton traversal:

  - First pass: return a shallow tree, typically depth 3, with deeper containers truncated and annotated with children_count
  - Named containers get references so the agent can request only that subtree
  - The agent drills down into the relevant region with --root @e3
  - References are scoped and invalidated only for that subtree
  - After acting, the agent can re-query just that region instead of re-snapshotting the whole app

In practice, this reduced token usage by about 78% to 96% versus full-tree dumps in Electron apps like Slack, VS Code, and Notion.
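
To make the traversal idea concrete, here is a minimal Python sketch of a depth-limited snapshot with drill-down references. It is not agent-desktop's implementation (which is in Rust): the `Node` class, the `skeleton` function, the way depth is counted, and the demo tree are all assumptions made for illustration. Only the `children_count` annotation and the `@eN` reference style come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical accessibility-tree node; not agent-desktop's real schema."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def skeleton(node, max_depth=3, refs=None, depth=0):
    """Return a shallow dict view of `node`, truncating containers at `max_depth`.

    Truncated containers are annotated with children_count. Named ones also
    get a ref (e.g. "@e1") recorded in `refs` so a caller can expand just
    that subtree later.
    """
    if refs is None:
        refs = {}
    out = {"role": node.role, "name": node.name}
    if not node.children:
        return out
    if depth >= max_depth:
        out["children_count"] = len(node.children)
        if node.name:
            ref = f"@e{len(refs) + 1}"
            refs[ref] = node
            out["ref"] = ref
        return out
    out["children"] = [skeleton(c, max_depth, refs, depth + 1) for c in node.children]
    return out


def build_demo_tree():
    """Made-up tree: a window whose message list holds 200 items."""
    messages = Node("list", "Messages", [Node("text", f"msg {i}") for i in range(200)])
    pane = Node("group", "Channel pane", [messages])
    sidebar = Node("group", "Sidebar", [Node("button", "Threads")])
    return Node("window", "Demo", [Node("group", "Root", [sidebar, pane])])


refs = {}
top = skeleton(build_demo_tree(), max_depth=3, refs=refs)   # first pass: shallow view
subtree = skeleton(refs["@e1"], max_depth=1, refs={})       # drill into one region only
```

The first pass returns the 200-item list as a single truncated entry with a reference, and the second call expands only that region, which is the step that `--root @e3` plays in the CLI loop.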

A few implementation details that may be interesting here:

  - Rust workspace with strict platform/core separation through a PlatformAdapter trait
  - Accessibility-first activation chain; mouse synthesis is the fallback, not the default
  - Deterministic element refs like @e1, @e2, with optimistic re-identification across UI shifts
  - Structured errors with machine-readable codes plus retry suggestions
  - C ABI via cdylib, so it can be loaded directly from Python, Swift, Go, Node, Ruby, or C without shelling out
  - Batch operations in a single call
  - Support for windows, menus, sheets, popovers, alerts, and notifications
  - Special handling for Chromium/Electron accessibility trees, which can get very deep and noisy

Why I think this matters: pixel-based desktop control feels like a leaky abstraction. The OS already knows the UI semantically. Accessibility APIs give you roles, names, actions, hierarchy, focus, selection, and state directly. That seems like a much better substrate for desktop agents than screenshot loops.

If you're building your own desktop agent, internal automation tool, or research prototype, this may be useful.

Install:

  npm install -g agent-desktop
  agent-desktop snapshot --app Finder -i

Repo: https://github.com/lahfir/agent-desktop

I'd especially love feedback from people who've built desktop automation before. What are the biggest pain points you've run into, and what would you want a tool like this to support?

nfsdiag - a NFS diagnostic application #

github.com

6 comments | 12:48 PM

Mljar Studio – local AI data analyst that saves analysis as notebooks #

mljar.com

18 comments | 10:21 AM

Hi HN,

I’ve been working on mljar-supervised (open-source AutoML for tabular data) for a few years. Recently I built a desktop app around it called MLJAR Studio.

The idea is simple: you talk to your data in natural language, the AI generates Python code, executes it locally, and the whole conversation becomes a reproducible notebook (*.ipynb file). So instead of just chatting with data, you end up with something you can inspect, modify, and rerun.
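
As an illustration of the "conversation becomes a notebook" idea, here is a minimal sketch that turns prompt/code pairs into a notebook file using only the standard library. This is not MLJAR Studio's code: the function names and the example turn are hypothetical, and the dict layout follows the public Jupyter notebook format (version 4).

```python
import json


def conversation_to_notebook(turns):
    """Build a minimal nbformat-4 notebook dict from (prompt, code) pairs.

    Each prompt becomes a markdown cell and each generated snippet a code cell.
    """
    cells = []
    for prompt, code in turns:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": prompt})
        cells.append({
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not executed yet
            "outputs": [],
            "source": code,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def save_notebook(notebook, path):
    """Write the notebook dict to disk as an .ipynb file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(notebook, fh, indent=1)


# Hypothetical single-turn conversation.
turns = [(
    "Load sales.csv and show the first rows",
    "import pandas as pd\ndf = pd.read_csv('sales.csv')\ndf.head()",
)]
notebook = conversation_to_notebook(turns)
```

Calling `save_notebook(notebook, "analysis.ipynb")` would produce a file that can be opened, edited, and rerun in Jupyter.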

What MLJAR Studio does:

- Sets up a local Python environment automatically, runs on Mac, Windows, and Linux

- Installs missing packages during the conversation

- Built-in AutoML for tabular data (classification, regression, multiclass)

- Works with standard Python libraries (pandas, matplotlib, etc.)

- Works with any data file: CSV, Excel, Stata, Parquet ...

- Connects to PostgreSQL, MySQL, SQL Server, Snowflake, Databricks, and Supabase.

For AI: use Ollama locally (zero data egress), bring your own OpenAI key, or use MLJAR AI add-on.

I built this because I wanted something between Jupyter Notebook (flexible but manual) and AI tools that generate code but don’t preserve the workflow. Most tools I tried either hide too much, don’t give reproducible results, or are cloud-based.

Demos:

- 60-second demo: https://youtu.be/BjxpZYRiY4c

- Full 3-minute analysis: https://youtu.be/1DHMMxaNJxI

Pricing is $199 one-time, with a 7-day trial.

Curious if this is useful for others doing real data work, or if I’m solving my own problem here.

Happy to answer questions.

Piruetas – A self-hosted diary app I built for my girlfriend #

piruet.app

52 comments | 10:42 AM

I searched for a simple, self-hosted journal app for my girlfriend and everything I found was either too complex, too feature-heavy, too feature-less for what I needed or required trusting a cloud service.

So I built Piruetas (it means lollipops in Spanish - she chose the name btw).

It's a day-per-page diary with rich text editing, drag-and-drop image uploads, auto-save, public share links, and a clean mobile UI. It can be set up for Personal or Multi-user usage via docker compose deployment.

She seems to like it, so I decided to give back to the community and make it available to everyone (after some QA).

Live demo: https://piruet.app (login: demo / piruetas; data resets every 30 min!)

GitHub: https://github.com/patillacode/piruetas

Filling PDF forms with AI using client-side tool calling #

copilot.simplepdf.com

29 comments | 8:54 AM

Hey HN!

I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor. It fills fields, answers questions, focuses on a specific field, adds fields, deletes pages, and so on.

It's built on top of SimplePDF, which I started 7 years ago, pioneering privacy-respecting client-side PDF editing. It's now used by 200k+ people every month.

As for the privacy model: the PDF itself never leaves the browser. Parsing, rendering, and field detection all run client-side.

The text the model needs (and your messages) goes to whatever LLM you point at. By default that's our demo proxy (DeepSeek V4 Flash, rate-capped), but you can BYOK and point it at any cloud provider, or go fully local (I've been testing with LM Studio).

Unlike the existing "Chat with PDF" tools that only retrieve the text/OCR layer, Copilot can act on the PDF: filling fields, adding fields (detected client-side using CommonForms by Joe Barrow [1], jbarrow on HN, with some post-processing heuristics I added on top), focusing on fields, deleting pages, and so on.

I built this because SimplePDF is mostly used by healthcare customers where document privacy is paramount, and I wanted an AI experience that didn't require shipping PII to a third party. Stack is pretty standard:

- Tanstack Start

- AI SDK from Vercel

- Tailwind (I personally prefer CSS modules, I'm old-school, but since I'm open-sourcing it I figured Tailwind would be a better fit)

The more interesting part is the client-side tool calling: events are passed back and forth via iframe postMessage.

If you're not familiar with "tool calling" and "client-side tool calling", a quick primer:

Tool calling is what LLMs use to take actions. When Claude runs grep or ls, or hits an MCP server, those are tool calls.

Client-side tool calling means the intent to call a tool comes from the LLM, but the execution happens in the browser.

That matters for two reasons. First, speed: you can't go faster than client-to-client operations. Second, it lets you limit the data you expose to the LLM. For the demo I do feed the content of the document to the LLM, but that connection could be severed simply by removing the tool that exposes the content data.
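
For readers who want to see the shape of this, here is a language-agnostic sketch in Python rather than the browser code the demo actually uses. It is not SimplePDF's implementation: the registry class, the tool names, the `{"name": ..., "arguments": ...}` call shape, and the in-memory `form` are all assumptions for illustration. The point it shows is that the model only emits an intent, the client executes it locally, and removing one tool cuts off that data path.

```python
import json


class ClientToolRegistry:
    """Tools that run locally; the model only ever sees their return values."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def remove(self, name):
        self._tools.pop(name, None)

    def execute(self, tool_call_json):
        """Run one tool call of the assumed form {"name": ..., "arguments": {...}}."""
        call = json.loads(tool_call_json)
        fn = self._tools.get(call["name"])
        if fn is None:
            return {"ok": False, "error": f"unknown tool: {call['name']}"}
        return {"ok": True, "result": fn(**call.get("arguments", {}))}


# Hypothetical in-memory stand-in for a PDF form held on the client.
form = {"fields": {"full_name": ""}, "text": "Patient intake form"}


def fill_field(name, value):
    form["fields"][name] = value
    return "filled"


def get_document_text():
    return form["text"]


registry = ClientToolRegistry()
registry.register("fill_field", fill_field)
registry.register("get_document_text", get_document_text)

# The model's intent arrives as JSON; execution happens locally.
filled = registry.execute(
    '{"name": "fill_field", "arguments": {"name": "full_name", "value": "Ada"}}'
)

# Severing the content path is just removing the tool.
registry.remove("get_document_text")
blocked = registry.execute('{"name": "get_document_text", "arguments": {}}')
```

After the `remove` call the model can still fill fields, but a request for the document text comes back as an error instead of content.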

The demo is fully open source and available on GitHub [2]; the hosted demo is the same one this post links to [3].

What's not open source is SimplePDF itself (loaded as the iframe).

I could talk on and on about this, let me know if you have any questions, anything goes!

[1] https://github.com/jbarrow/commonforms

[2] https://github.com/SimplePDF/simplepdf-embed/tree/main/copil...

[3] https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...

Large Scale Article Extract of Newspapers 1730s-1960s #

snewpapers.com

20 comments | 8:42 AM

Hello HN, over the past 7 months I've spent nearly 3,000 hours building SNEWPAPERS, the first historical newspaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy, and of course semantic and agentic search capabilities.

Problem: I wanted to search through newspaper archives, but every service I tried only lets you search by keywords and dates and gives you back raw images of the papers, too many of them and with no context. A sea of noise.

Solution: I taught machines how to read the newspapers and so far I've extracted the content from > 600k pages (about 5TB) from the Chronicling America collection. Problems I had to deal with were an infinite variety of layouts, font sizes, image scan qualities, resolutions, aspect ratios, navigating around the images on the page. I also had to figure out how to get OCR to be nearly perfect so people wouldn't hate reading the extracts. I stitched together a multi-model pipeline (layout tech, ocr tech, llm, vllm) with heuristics to go from layout -> segmentation -> classification. I put it all in OpenSearch / Postgres and made it semantically searchable and also put an agentic search tool on top that knows how to use the API really well and helps you write queries to find what you're looking for. Happy to discuss AWS architecture and scaling as well, that was tough!

If you have five minutes and you just want to jump in and have your own personalized experience, what I would suggest is:

  1. Before searching for anything, go to the Sleuth page
  2. Ask it about anything from 1736 to 1963, maybe with 1 or 2 follow-up questions
  3. Then go to the search page so you can see the queries it wrote for you (bottom left, "saved queries") and uncover more info on whatever it is you're interested in

If you think it's cool and want to learn more, there are about 10 minutes of video guides on the various capabilities under "Guide" in the nav bar.

Some other people have also taken a crack at this, notably:

https://dell-research-harvard.github.io/resources/americanst... (very good attempt)

https://labs.loc.gov/work/experiments/newspaper-navigator/ (focused on images)

Browser-based light pollution simulator using real photometric data #

iesna.eu

20 comments | 9:08 AM

Hi HN, author here. iesna.eu is a browser-based ecosystem for working with photometric data: parsing standard luminaire files (LDT/EULUMDAT, IES LM-63, Oxytech, ATLA-S001), running design calculations against EN 13201 / ANSI/IES RP-8 / CJJ 45 / IES-IDA MLO, and (the part I most want to show off here) rendering real urban scenes in Bevy with the photometric data driving actual streetlight behavior, including sky-glow contribution.

The Skyglow Analysis demo loads a real LDT file into a Bevy scene (Khronos Bistro test asset). The luminaire's intensity distribution drives the streetlight rendering directly, with no fudging, and the sky-glow grade updates live as you adjust the uplight percentage. Swap to a full-cutoff fixture and the sky goes from F (Severe) back to A (Excellent). You can see the difference on the buildings as well as in the sky.

Stack: Rust core (eulumdat-rs and friends, ~20 crates handling photometric formats), Bevy for the 3D rendering, WASM for browser deployment. No backend; everything runs client-side. About a thousand lines of new code on top of the existing photometric library to make the Bevy integration work.

Things I'd love feedback on:

  - The atmospheric scattering model is currently single-scattering Rayleigh+Mie. Is that defensible for the use case, or should I move toward multi-scattering?
  - The Bistro test scene works well visually but isn't a controlled environment. Anyone know of a public urban geometry asset that's more typical of real road-lighting evaluation?
  - The CJJ 45 implementation (China's national road lighting standard) is the only one I've had to reverse-engineer from translated PDFs. If anyone has primary-source experience with it, I'd value a sanity check.

Open-source on GitHub (eulumdat-rs and the related crates). Crates.io: eulumdat

I Built a Museum Exhibit #

knhash.in

4 comments | 9:07 PM

Stop playing my matchstick puzzles, start building your own in seconds #

mathstick.github.io

35 comments | 5:04 AM

Rust library for Undo/Redo using deltas, snapshots or commands #

github.com

4 comments | 6:41 PM

Clipmon is a macOS clipboard manager on steroids #

github.com

6 comments | 8:29 PM

MemHub, Turn Your GPT/Claude/Gemini History into LLM-Wiki Mindmap #

github.com

0 comments | 12:56 AM

Hi, this is Tristan, CPO of XTrace.

We are launching a very cool feature inspired by Andrej Karpathy's LLM Wiki mindmap. Now everyone who doesn't have enough sessions and Markdown files made with Claude Code can still visualize their own memory mindmap!

Which public repos are friendliest to an AI coding agent? #

agentfriendlycode.com

0 comments | 7:14 PM

Hollow is an open-sourced self-modifying agentic system #

github.com

0 comments | 7:43 AM

Raptor – fast, energy efficient small file uploads to S3 #

github.com

3 comments | 4:26 AM

AgInTiFlow, a local web and CLI agent workspace using DeepSeek #

npmjs.com

0 comments | 1:31 PM

High performance database - Graph, vector, array, columnar, KV #

github.com

0 comments | 7:11 PM

I built Male Hormone Lab Interpreter that does what LLMs can't #

longevity-tools.com

0 comments | 9:12 AM

Hi,

My name is Zsolt. I build clinical tools for labs and longevity clinics, and I release some of them as free public tools.

The Male Hormone Lab Interpreter's unique feature is that it visualizes the testosterone production and feedback pathway on the HPG axis to identify the true root cause of hormonal imbalances. All hormones are interpreted based on optimal ranges. This is something LLMs still struggle with.

The only difference between the public and clinical versions is that the clinical ones suggest diagnoses and treatment options. But if there is any type of dysfunction, it's obvious from the diagram, and differentiating is also straightforward.

All the tools on longevity-tools.com are 100% free and private. No email or any other data collection, no third-party scripts, no tracking scripts. Full privacy.

Check out my other tools:

## Thyroid Function Interpreter https://longevity-tools.com/thyroid-function-interpreter

Visualizes hormones, calculated sensitivities, and secretory capacities on the HPT axis so thyroid dysfunction patterns are easier to recognize.

## Glucose Metabolism Interpreter https://longevity-tools.com/glucose-metabolism-interpreter

Verifies fasting and insulin status, then classifies glucose-metabolism dysfunction using insulin-dependent and non-insulin-dependent markers.

## Liver Function Blood Test Interpreter https://longevity-tools.com/liver-function-interpreter

Calculates liver health scores, runs differential diagnostic logic, helps identify the tissue of origin of elevated enzymes, and surfaces relevant next steps from clinical guidelines.

## Iron Status Interpreter https://longevity-tools.com/iron-status-interpreter

Adjusts ferritin for inflammation, estimates real iron stores, interprets them against optimal targets, and provides guideline-based next steps plus a personalized supplementation plan.

## Grip Strength Interpreter https://longevity-tools.com/grip-strength-interpreter

Shows grip-strength percentiles, estimates all-cause-mortality risk impact, proposes achievable targets, and projects future muscle-weakness risk.

## Humanity's Bortz Blood Age Calculator https://longevity-tools.com/humanitys-bortz-blood-age

The most advanced biological age calculator model based on blood tests available in commercial labs. It was trained on 306k UK Biobank participants and has higher predictive value than Levine PhenoAge.

## Levine PhenoAge Biological Age Calculator https://longevity-tools.com/levine-pheno-age

A biological age calculator based on routine blood biomarkers, developed by Morgan Levine, Steve Horvath, and collaborators. It prioritizes accessibility and low-cost lab markers over maximum precision.

Agent with its own computer on the cloud #

pulsarbot.cloud

0 comments | 7:15 AM

Glacier – A zero-config macOS terminal I vibecoded in Rust #

github.com

2 comments | 7:28 AM

Create the right image sizes for social media #

skills.sh

1 comment | 7:43 AM

Sanishne – Rust based bookmark boards #

sanishne.org

0 comments | 8:26 AM

I built Sanishne because useful links in teams kept disappearing into Slack/Discord/Notion noise.

It is a small, free shared bookmark board app. You create a board for a project, add links with short notes and tags, then invite people as owners, editors, or viewers. It also supports search, tag filtering, expiring invite links, MFA-backed accounts, and JSON import/export so your bookmarks are not trapped.

The name comes from Georgian: სანიშნე means bookmark. The app is intentionally boring: no AI summarization, no feed, no browser extension requirement, no team knowledge graph. Just a place where a group can keep the links that are actually worth coming back to.

Technically, it is built in Rust with Axum and some internal libraries I built over time.

I would especially appreciate feedback!

Shutt – Turn Strava activities into shareable photo/video posts #

shutt.run

0 comments | 9:10 AM

Fabrica – A minimal terminal-based coding agent built in Rust #

github.com

1 comment | 7:49 PM

Rotato – Node.js proxy that rotates LLM API keys on 429 errors #

github.com

0 comments | 9:08 PM

From Beats to Notes and Beyond #

bookerapp.replit.app

0 comments | 8:50 AM

What appear to be the fundamental building blocks of music (notes, scales, harmony) are not primitives.

They are emergent structures.

Pitch is just rhythm sped up.
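
A small worked example of that claim, as a sketch: it assumes a steady click at 120 BPM and a target of 440 Hz (concert A). Both numbers and the function names are illustrative choices, not taken from the linked material.

```python
import math


def bpm_to_hz(bpm):
    """Convert pulses per minute to pulses per second."""
    return bpm / 60.0


def speedup_to_reach(bpm, target_hz):
    """Factor by which a steady pulse must be sped up to repeat at target_hz."""
    return target_hz / bpm_to_hz(bpm)


factor = speedup_to_reach(120, 440.0)  # 120 BPM is 2 pulses per second
octaves = math.log2(factor)            # each doubling of rate is one octave
```

Under these assumptions the click has to be sped up 220 times, a little under eight doublings, before its repetition rate matches the 440 Hz note.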

A podcast generated by NotebookLM: https://www.youtube.com/watch?v=q9bFUocrm70

TurnZero – Persistent Expert for LLMs #

0 comments | 7:51 AM

In an attempt to reduce cold starts in AI sessions, I've made a tool that runs as an MCP server and loads the context before Turn 0.

Two things happen:

Personal Priors - your workflows and standards load once per session and persist across every supported AI client.

Expert Priors - when a prompt is stack-specific, relevant priors are injected based on semantic similarity. This is to reduce errors and unwanted behaviour from the AI.
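
To illustrate what similarity-based injection could look like, here is a minimal sketch. It is not TurnZero's implementation: the bag-of-words `embed` is a toy stand-in for a real embedding model, and the function names, threshold, and example priors are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_priors(prompt, priors, threshold=0.3, top_k=2):
    """Return up to top_k prior texts whose topic is similar enough to the prompt."""
    query = embed(prompt)
    scored = [(cosine(query, embed(p["topic"])), p["text"]) for p in priors]
    scored = [s for s in scored if s[0] >= threshold]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Hypothetical priors, made up for this example.
PRIORS = [
    {"topic": "rust axum web server handler",
     "text": "Prefer extractors over manual request parsing."},
    {"topic": "python pandas dataframe csv",
     "text": "Use vectorized operations instead of row loops."},
]

injected = select_priors("write an axum handler in rust", PRIORS)
skipped = select_priors("what is the weather today", PRIORS)
```

The threshold is what keeps unrelated prompts from pulling in priors at all; only the Rust prompt here clears it.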

Privacy guarantee: local-first by design. Raw prompts are never stored. Injection is always client-side.

```bash
pipx install turnzero
turnzero setup   # registers MCP server with Claude Code, Cursor, Claude Desktop, Gemini CLI
turnzero verify  # confirms everything is wired correctly
```

Demo: https://asciinema.org/a/8IV2yoLNTloSlZo0

Repo: https://github.com/turnzero-ai/turnzero

Headcam [video] #

youtube.com

0 comments | 9:29 PM

Hi all, I've been tirelessly swearing at Claude for several months in an effort to cram head tracking into loads of games and generally make it more accessible.

I'm working on dozens of OpenTrack compatible game mods all of which are being released under the MIT license, 9 are out now and can be found on GitHub/Nexus Mods:

https://github.com/itsloopyo?tab=repositories&q=head+trackin...

https://www.nexusmods.com/profile/itsloopyo/mods

In the linked video I discuss my iPhone app, Headcam, and the library powering its absurdly responsive video streaming.

If anybody would like to try it, you can join the TestFlight beta at https://headcam.app

Formattery – on-device file converter for iPhone, iPad, and Mac #

apps.apple.com

0 comments | 7:47 AM

A Universal Stability Criterion for Symbolic Complex Systems #

zenodo.org

1 comment | 6:43 PM

This work establishes a foundational invariant for proactive stability monitoring in artificial intelligence, legal frameworks, financial systems, and biological sequences.

Plannotator for Codex #

twitter.com

1 comment | 9:27 PM