Daily Show HN


Show HN for April 28, 2026

41 posts
182

Drive any macOS app in the background without stealing the cursor #

github.com
40 comments · 4:03 PM · View on HN
Hi HN, Francesco from Cua here. I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.

But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.
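
To make the contrast concrete, here is a rough pyobjc sketch of the two public posting paths (not Cua Driver's own code; the target PID is a placeholder, and it assumes your pyobjc build exposes CGEventPostToPid):

```python
# pip install pyobjc-framework-Quartz
import Quartz

def click_events(x, y):
    """Build a left-click (mouse down + up) at screen point (x, y)."""
    pt = Quartz.CGPointMake(x, y)
    down = Quartz.CGEventCreateMouseEvent(
        None, Quartz.kCGEventLeftMouseDown, pt, Quartz.kCGMouseButtonLeft)
    up = Quartz.CGEventCreateMouseEvent(
        None, Quartz.kCGEventLeftMouseUp, pt, Quartz.kCGMouseButtonLeft)
    return down, up

# Path 1: the hardware input stream. Works everywhere, but warps the user's cursor.
for ev in click_events(400, 300):
    Quartz.CGEventPost(Quartz.kCGHIDEventTap, ev)

# Path 2: per-PID delivery. Leaves the cursor alone, but Chromium treats these
# events as untrusted and silently drops the click at the renderer boundary.
target_pid = 12345  # hypothetical PID of the app being driven
for ev in click_events(400, 300):
    Quartz.CGEventPostToPid(target_pid, ev)
```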

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated, Screen Studio-inspired product demo.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need Chrome DevTools Protocol at all.

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.

- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.

- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.

- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.

- Activating the target first raises the window and can drag you across Spaces.

- Electron apps stop keeping useful AX trees alive when their windows are occluded, unless you reach for a private remote-aware SPI.

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.
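
To illustrate the AX side of that hybrid, here is a minimal sketch of walking an app's accessibility tree (not Cua Driver's code; the PID is a placeholder, it assumes the pyobjc ApplicationServices bindings expose these AXUIElement calls, and the calling process needs the Accessibility permission):

```python
# pip install pyobjc-framework-ApplicationServices
from ApplicationServices import (
    AXUIElementCreateApplication,
    AXUIElementCopyAttributeValue,
    kAXRoleAttribute,
    kAXTitleAttribute,
    kAXChildrenAttribute,
)

def dump_ax_tree(element, depth=0, max_depth=3):
    """Print a shallow view of the AX tree: role and title per element."""
    # Each attribute read returns (error_code, value); 0 means success.
    _, role = AXUIElementCopyAttributeValue(element, kAXRoleAttribute, None)
    _, title = AXUIElementCopyAttributeValue(element, kAXTitleAttribute, None)
    print("  " * depth + f"{role} {title or ''}".rstrip())
    if depth >= max_depth:
        return
    err, children = AXUIElementCopyAttributeValue(element, kAXChildrenAttribute, None)
    if err == 0 and children:
        for child in children:
            dump_ax_tree(child, depth + 1, max_depth)

app = AXUIElementCreateApplication(12345)  # hypothetical PID of a native app
dump_ax_tree(app)
```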

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.

118

Rocky – Rust SQL engine with branches, replay, column lineage #

github.com
48 comments · 2:35 PM · View on HN
Hi HN, I'm Hugo. I've been building Rocky over the past month, shipping fast in the open. The binary is on GitHub Releases, `dagster-rocky` on PyPI, and the VS Code extension on the Marketplace. I held off on a broader announcement until the trust-system surface was coherent enough to talk about as one thing. The governance waveplan — column classification, per-env masking, 8-field audit trail on every run, `rocky compliance` rollup, role-graph reconciliation, retention policies — landed end-to-end last week in engine-v1.16.0 and rounded out in v1.17.4 (tagged 2026-04-26). That's the milestone I'd been waiting for.

The pitch: keep Databricks or Snowflake. Bring Rocky for the DAG. Rocky is a Rust-based control plane for warehouse pipelines. Storage and compute stay with your warehouse. Rocky owns the graph — dependencies, compile-time types, drift, incremental logic, cost, lineage, governance. The things your current stack can't give you because it doesn't own the DAG.

A few things I think are interesting:

- Branches + replay. `rocky branch create stg` gives you a logical copy of a pipeline's tables (schema-prefix today; native Delta SHALLOW CLONE and Snowflake zero-copy are next). `rocky replay <run_id>` reconstructs which SQL ran against which inputs. Git-grade workflow on a warehouse.

- Column-level lineage from the compiler, not a post-hoc graph crawl. The type checker traces columns through joins, CTEs, and windows. VS Code surfaces it inline via LSP.

- Governance as a first-class surface. Column classification tags plus per-env masking policies, applied to the warehouse via Unity Catalog (Databricks) or masking policies (Snowflake). 8-field audit trail on every run. `rocky compliance` rollup that CI can gate on. Role-graph reconciliation via SCIM + per-catalog GRANT. Retention policies with a warehouse-side drift probe.

- Cost attribution. Every run produces per-model cost (bytes, duration). `[budget]` blocks in `rocky.toml`; breaches fire a `budget_breach` hook event.

- Compile-time portability + blast radius. Dialect-divergence lint across Databricks / Snowflake / BigQuery / DuckDB (12 constructs). `SELECT *` downstream-impact lint.

- Schema-grounded AI. Generated SQL goes through the compiler — AI suggestions type-check before they can land.

What Rocky isn't:

- Not a warehouse — it's the control plane on top.

- Not a Fivetran replacement. `rocky load` handles files (CSV/Parquet/JSONL); for SaaS sources use Fivetran, Airbyte, or warehouse-native CDC.

- Not dbt Cloud — no hosted UI, no managed scheduler. First-class Dagster integration if you need orchestration.

Adapters: Databricks (GA), Snowflake (Beta), BigQuery (Beta), DuckDB (local dev / playground). Apache 2.0.

I'd love feedback on the trust-system framing, the governance surface (particularly classification-to-masking resolution in `rocky compile` and the `rocky compliance` CI gate), the branches/replay design, the cost-attribution primitives, or anything else that catches your eye. Happy to go deep in the thread.

20

I mapped the latest UK fuel prices by county #

fuelfox.uk
2 comments · 6:42 PM · View on HN
I built this using the official UK government forecourt fuel price feed.

The map aggregates the latest petrol and diesel prices by county, with filters for fuel type and metric. Clicking a county shows the cheapest forecourt, average price, spread, and station count. The feed covers roughly 8,000 UK forecourts and refreshes every 30 minutes. Retailers publish the prices, so there can still be gaps in the data/stations but it's getting better over time.

20

A TUI for Markdown viewing and editing #

mdee.bkh.dev
2 comments · 7:56 PM · View on HN
Hi HN, I built a simple TUI for viewing and editing .md files in the terminal. More and more Markdown files keep appearing in our projects, and I found myself needing a quick way to view them (with syntax highlighting) and edit them without leaving the terminal, so I built this.
13

Ragnerock, an AI data analysis tool #

ragnerock.com
4 comments · 4:33 PM · View on HN
Hi HN, I’m Matt Mahowald, and together with my cofounder John, we’re launching the public beta of Ragnerock today.

As a data scientist, you spend the majority of your time wrangling data. Even though you might have a set of techniques and tricks you like to use, how exactly you treat a particular source of data tends to be fairly bespoke, so you end up writing custom logic each time.

Ragnerock was born from the observation that modern LLMs can be used to automate a lot of the grunt work involved in this process, while still allowing for fully customizable pipelines. What’s more, by leveraging techniques like constrained decoding, it’s possible to provide a unified query interface regardless of the data source - bridging raw data sources like text and images with your existing structured data living in your databases.
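
To illustrate the constrained-decoding idea (this is not our actual pipeline code; the model name, prompt, and schema are just placeholders), here is a rough sketch using OpenAI structured outputs to force free text into a fixed schema, the kind of normalization that lets a raw source sit next to structured tables:

```python
from openai import OpenAI

client = OpenAI()

# Fixed output schema: every document yields the same columns,
# so the results can be queried with plain SQL afterwards.
schema = {
    "type": "object",
    "properties": {
        "topic": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["topic", "sentiment"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this blog post: ..."}],
    # Constrained decoding: the model may only emit JSON matching the schema.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "post_analysis", "schema": schema, "strict": True},
    },
)
print(resp.choices[0].message.content)  # e.g. {"topic": "...", "sentiment": "..."}
```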

Ragnerock has four main components:

- A workflow designer that lets you build LLM-driven data processing and analysis pipelines

- A job orchestration layer that runs those workflows

- A query interface which lets you inspect the results of those workflows with plain SQL

- A notebook system which is 100% API-compatible with Jupyter and runs on your existing kernels, so you can easily pull data into your existing environments and analyses

Ragnerock also supports bring-your-own AI (OpenAI, Anthropic, and Google APIs), databases, and blob storage, so you can join with your existing datasets and have all outputs flow to your data lake. We’re particularly excited about our web crawling feature, which allows you to scrape websites and trigger workflows on updates: for example, you might point Ragnerock at your favorite blog and run a workflow to assess posts for topics and sentiment.

You can try it out at https://www.ragnerock.com ; no credit card needed and the first 20 hours of compute are free. It’s an early-stage product so we’re especially interested in feedback.

Happy to answer any questions - John and I will be around in the comments today.

13

Effected Keyboard 2 – Effects as You Type #

0 comments · 8:32 PM · View on HN
Have you ever heard of the Cafe Keyboard from Samsung's app store? Effected Keyboard 2 is very similar, but it adds effects that make you feel like your phone is on steroids, in a good way.

Effected Keyboard 2 feels convenient when you press the keys, with gentle feedback (if you disable heavy animations). Its look and feel are more mature and natural, it supports multiple languages, and easy gestures trigger useful features: swipe left from the spacebar for two different symbol layouts, swipe right from the spacebar to change language, and swipe up from the spacebar for a mini popup keyboard with undo/redo, text selection, and copy/paste via a clipboard. Sounds familiar?

Effected Keyboard 2 is based on AnySoftKeyboard and has been developed on top of it since 2013(!). It has some paid features, though to be honest I really tried not to be greedy with them. There are 3 packs, $5 each: a basic one with themes and simple effects as you type, a flying-letters one that pops letters from the keyboard into the app itself, and the immersive effects that blow your friends' minds when you show them.

Effected Keyboard 2 is an Android app, even though its tiny touches, such as the round corners (which are everywhere), remind me of an Apple product.

Compared to AnySoftKeyboard, Effected Keyboard 2 has a new set of emoji that look more stylish. It has additional themes. And its function keys are more Gboard-styled, with an option for a wider spacebar or a slightly shorter one in favor of additional punctuation keys in the bottom row.

Effected Keyboard 2 might not replace your keyboard completely, since it doesn't have AI typing prediction or AI-grade spelling correction. But! And that's an important "but": this keyboard is all about a different experience. It's like a trip to Eurodisney, somewhere you'd like to stay for a couple of nights but not live forever, and the same goes for this keyboard. It deserves every bit of attention it gets from you.

https://youtube.com/shorts/lj-jIrZ6XZ8

https://play.google.com/store/apps/details?id=com.vitali.pom...

3

I built a 90-second SIM for the moment engineers freeze on salary #

app.questly.academy
0 comments · 3:14 PM · View on HN
hey hn,

i kept watching friends take the first number a recruiter said out loud. not because they're bad at this, but because nobody actually preps for these calls the way they'd hoped they would. when the recruiter goes "so what are you looking for?", you either say something too low, or you flip it back with "what's the band?" and whatever they say basically becomes the ceiling anyway.

so i did some research and built a sim for that specific moment. hand-built decision tree. the optimal path borrows from fisher & ury (batna), galinsky (anchoring), and voss (the tactical pause thing).

first scenario is called "the lowball offer". you walk into a cafe. your friend/mentor kofi runs hiring at a series b. offer letter is face-down on the table. $95k base, market's around $130k. before you even flip it over, he asks: what's your walk-away number?

you don't have one. you never even thought about it.

four scenes with four choices each. i'd genuinely love to know where the optimal path i designed lands wrong, because i'm sure it does somewhere.

there's a cmu stat that puts the lifetime cost of under-negotiating at ~$500k. it's a show-stopper of a number. left it out for now but happy to add it back if people think it earns its place.

3

See your computer's audio output on a real-time piano #

github.com
0 comments · 2:51 AM · View on HN
I built this for two use cases:

- When listening to music, I very often wonder what is currently being played, be it the current chord, a short lick, etc.

- When transcribing, I don't like the workflow of extracting a complete audio file and transforming it to MIDI, so I thought a helper on the side, while still doing most of the work myself, could be very useful.

Disclaimer: the app is limited by the model's performance. Basic-Pitch is a relatively small model which runs super well on consumer laptops. I get around 50FPS of inference on my laptop (AMD Radeon 780M).
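
For reference, here is roughly what the model gives you via Spotify's basic-pitch Python package (the app itself runs the model through ONNX inside Tauri, so this is not the app's code; the file name is a placeholder):

```python
# pip install basic-pitch
from basic_pitch.inference import predict

# Offline transcription of one audio file into note events; the real-time piano
# view applies the same model continuously to the live system audio stream.
model_output, midi_data, note_events = predict("riff.wav")

for note in note_events[:5]:
    # Each event starts with (start seconds, end seconds, MIDI pitch, amplitude, ...).
    start, end, pitch, amplitude = note[:4]
    print(f"MIDI {int(pitch)}: {start:.2f}s -> {end:.2f}s, amp {amplitude:.2f}")
```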

Of course, this wouldn't be possible without:

- Tauri: https://tauri.app/

- Basic Pitch: https://github.com/spotify/basic-pitch

- ONNX: https://onnx.ai/

3

Implementing Patio11's "Dangerous Professional" as a Claude Code Plugin #

playground.tetraresearch.io
1 comment · 12:05 PM · View on HN
Howdy HN! My recent dive into home ownership has brought me a whole new world to navigate w.r.t. contractors, insurance claims, etc. I've been leaning heavily on the Dangerous Professional concept for clearer communication. It fits very cleanly as a plugin and has been very high-ROI for me.

This is a community implementation. No affiliation with patio11, just a fan of his work.

Repo: https://github.com/Tetra-Research/dangerous-professional-plu...

Install: `npx skills add Tetra-Research/dangerous-professional-plugin`

2

Simple SDK for agent-to-agent communication #

github.com
0 comments · 2:15 PM · View on HN
We were spending a lot of time rewriting the same primitives across projects where we were getting Claude + Codex + other harnesses communicating in real time.

Many other projects forced us into using their framework or harness or into a specific stack. So we open sourced an SDK that acts as the messaging layer without the rest of it.

Still early, but we've been working on it and found it useful.

2

FusionCore: ROS 2 sensor fusion that outperforms robot_localization #

github.com
0 comments · 1:46 PM · View on HN
I built sensor fusion for a mobile robot and reached for robot_localization like everyone does. After spending too long fighting navsat_transform, UTM zone boundaries, and YAML covariance tuning, I wrote my own.

FusionCore is a 22-state UKF that fuses IMU, wheel encoders, and GPS directly in ECEF (no coordinate projection, no extra node). It estimates IMU bias, adapts its noise covariance automatically from the innovation sequence, and gates outliers with a chi-squared test on every sensor.
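
The gating itself is the standard normalized-innovation test; here is a rough sketch of the idea (not FusionCore's actual code; the function name, confidence level, and example numbers are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def passes_gate(innovation: np.ndarray, S: np.ndarray, confidence: float = 0.997) -> bool:
    """Accept a measurement only if its residual is statistically plausible."""
    # Normalized innovation squared (Mahalanobis distance of the residual),
    # where `innovation` is z - h(x) and S is its predicted covariance.
    nis = float(innovation.T @ np.linalg.solve(S, innovation))
    # Reject as an outlier if it exceeds the chi-squared bound for the
    # measurement dimension (e.g. 3 for a GPS position fix).
    return nis <= chi2.ppf(confidence, df=innovation.shape[0])

# Example: a 40 m GPS residual against a 1 m^2 covariance fails the gate.
print(passes_gate(np.array([40.0, 0.0, 0.0]), np.eye(3)))  # False
```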

I benchmarked it against the robot_localization EKF on 6 sequences from the NCLT public dataset (University of Michigan, real robot, real GPS, RTK ground truth). It wins 5 of 6. On the 6th sequence (fall, degraded GPS over a long period) it loses badly. The robot_localization UKF diverged to NaN on all six.

Configs, methodology, and full reproduce instructions are in the benchmarks/ folder.

1

Multi Kanban Task Board and MCP Server #

github.com
0 comments · 1:14 AM · View on HN
I hope this passes the bar of non-trivial.

I built a simple multi-user, multi-board Kanban MCP server. I have been looking for something like this to manage development agents, but I wasn't seeing anything that felt like what I wanted. So I sat down and decided to vibe-code an alternative.

While it was an experiment at first, I have been using it daily for my personal development projects, and I really think there are others who might be looking for exactly this. It's 100% a WIP, but it is also very usable.

I have a demo instance running at [mootasks.dev](https://mootasks.dev). If you find this interesting I'd appreciate a star. This is really the first thing I built that I felt would be of interest to others.

1

Modern alternative to Google Dictionary, AI-powered and context-aware #

chromewebstore.google.com
0 comments · 6:24 AM · View on HN
I kept losing my reading flow every time I hit an unfamiliar word. The usual fix: open a new tab, search, scroll past ads, come back. Costs about 30 seconds of focus each time. Multiply that by 10 lookups in one article and it adds up fast.

Google Dictionary extension solved the tab-switching problem but never went further than static definitions. I wanted something smarter.

So I built QuickDef, a Chrome extension that sends the surrounding sentence to GPT-4o-mini alongside the word, so the definition matches what the word actually means in that context, not just the dictionary default.
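
The backend is Node/TypeScript (stack below), but the prompt pattern is simple enough to sketch; here is a rough Python illustration of sending the word together with its surrounding sentence (the prompt wording is illustrative, not the production prompt):

```python
from openai import OpenAI

client = OpenAI()

def define_in_context(word: str, sentence: str) -> str:
    """Ask for the definition of the sense actually used in the sentence."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f'Define "{word}" as it is used in this sentence, in one short '
                f"plain-English sentence:\n\n{sentence}"
            ),
        }],
    )
    return resp.choices[0].message.content

print(define_in_context("pitch", "The sales team refined their pitch before the demo."))
```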

Stack:

- Chrome Extension (Manifest V3)

- Node.js + TypeScript + Express (backend on Railway)

- OpenAI GPT-4o-mini

- Supabase (auth + database)

- Next.js landing page on Vercel

Freemium: 10 free AI lookups/day, unlimited dictionary mode always free.

Would love feedback from the HN community, especially on the freemium limit and anything technically questionable.

Chrome Web Store: https://chromewebstore.google.com/detail/ioepkncpchchdiookgp...

Website: https://www.quickdef.app

1

I built a WhatsApp bot to help you remember birthdays #

bub.club
0 comments · 2:51 PM · View on HN
I deleted Facebook well over a decade ago. As a now-40-year-old, I was of the generation that stored all of their data on Facebook, but the only feature I missed was birthday reminders.

This became even more embarrassing as friends and family started having kids. Hell if I remember your third kid's name, let alone birthday.

So a couple of years ago I built a simple app to keep birthdays in check. It was... fine. Rails web app, did the basics.

But I still had to go to the app to add people and worry about being logged out: just enough friction that I'd forget or not bother. So a couple of months ago I turned it into a WhatsApp-only app.

It now runs entirely through WhatsApp (with a few webviews chucked in when needed) and it's super smooth. I can add birthdays with natural language, yeet in a few at a time, or just record a voice note and get on with my day. And it sends me email reminders that save my bacon for what is now nearly 100 friends and family and their ruddy keeds.

Anyway, I thought it might be useful for others. Have at it.