Show HN for April 28, 2026
41 items
Drive any macOS app in the background without stealing the cursor #
The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.
But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.
We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.
Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.
Try it on macOS 14+:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"
The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated product demo, Screen Studio inspired.
Other things we have used it for:
- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need Chrome DevTools Protocol at all.
- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.
- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.
- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.
What made this harder than expected:
- CGEventPost warps the cursor because it goes through the HID stream.
- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.
- Activating the target first raises the window and can drag you across Spaces.
- Without a private remote-aware SPI, Electron apps stop keeping useful AX trees alive when their windows are occluded.
The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.
One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.
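As a rough illustration of picking an addressing mode per app, here is a minimal Python sketch. It is not Cua Driver's code: the `AppSnapshot` type, the `choose_addressing_mode` function, and both thresholds are hypothetical, and the only idea taken from the post is that the choice should follow how much useful AX surface an app exposes.

```python
from dataclasses import dataclass
from enum import Enum


class AddressingMode(Enum):
    AX = "ax"          # address elements through the accessibility tree
    HYBRID = "hybrid"  # AX where it exists, screenshots for the rest
    PIXELS = "pixels"  # screenshot coordinates only


@dataclass
class AppSnapshot:
    """Hypothetical summary of what one app exposes right now."""
    name: str
    ax_actionable_count: int  # AX nodes with a usable role and action


def choose_addressing_mode(snapshot: AppSnapshot,
                           rich_threshold: int = 50,
                           sparse_threshold: int = 5) -> AddressingMode:
    """Pick a mode from the amount of actionable AX surface.

    Both thresholds are illustrative guesses, not measured values.
    """
    if snapshot.ax_actionable_count >= rich_threshold:
        return AddressingMode.AX
    if snapshot.ax_actionable_count >= sparse_threshold:
        return AddressingMode.HYBRID
    return AddressingMode.PIXELS
```

A real driver would probably re-evaluate this per window rather than per app, since the same app can expose a rich tree in one view and almost nothing in another.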
Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...
I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.
Rocky – Rust SQL engine with branches, replay, column lineage #
The pitch: keep Databricks or Snowflake. Bring Rocky for the DAG. Rocky is a Rust-based control plane for warehouse pipelines. Storage and compute stay with your warehouse. Rocky owns the graph — dependencies, compile-time types, drift, incremental logic, cost, lineage, governance. The things your current stack can't give you because it doesn't own the DAG.
A few things I think are interesting:
- Branches + replay. `rocky branch create stg` gives you a logical copy of a pipeline's tables (schema-prefix today; native Delta SHALLOW CLONE and Snowflake zero-copy are next). `rocky replay <run_id>` reconstructs which SQL ran against which inputs. Git-grade workflow on a warehouse.
- Column-level lineage from the compiler, not a post-hoc graph crawl. The type checker traces columns through joins, CTEs, and windows. VS Code surfaces it inline via LSP.
- Governance as a first-class surface. Column classification tags plus per-env masking policies, applied to the warehouse via Unity Catalog (Databricks) or masking policies (Snowflake). 8-field audit trail on every run. `rocky compliance` rollup that CI can gate on. Role-graph reconciliation via SCIM + per-catalog GRANT. Retention policies with a warehouse-side drift probe.
- Cost attribution. Every run produces per-model cost (bytes, duration). `[budget]` blocks in `rocky.toml`; breaches fire a `budget_breach` hook event.
- Compile-time portability + blast radius. Dialect-divergence lint across Databricks / Snowflake / BigQuery / DuckDB (12 constructs). `SELECT *` downstream-impact lint.
- Schema-grounded AI. Generated SQL goes through the compiler — AI suggestions type-check before they can land.
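To make the cost-attribution bullet above concrete, here is a minimal Python sketch of a budget check over per-model costs. It is not Rocky's implementation. Only the `budget_breach` event name and the bytes/duration cost fields come from the post. The `ModelCost` type, the `check_budget` function, the limit parameters, and the event payload shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelCost:
    """Hypothetical per-model cost record for one run."""
    model: str
    bytes_scanned: int
    duration_s: float


def check_budget(costs: List[ModelCost],
                 max_bytes: int,
                 max_duration_s: float,
                 on_event: Callable[[str, Dict], None]) -> bool:
    """Sum per-model costs and fire a hook event if either limit is exceeded.

    Returns True when the run breached the budget.
    """
    total_bytes = sum(c.bytes_scanned for c in costs)
    total_duration = sum(c.duration_s for c in costs)
    breached = total_bytes > max_bytes or total_duration > max_duration_s
    if breached:
        # Payload keys are assumed; the source only names the event.
        on_event("budget_breach", {
            "total_bytes": total_bytes,
            "total_duration_s": total_duration,
            "most_expensive_model": max(costs, key=lambda c: c.bytes_scanned).model,
        })
    return breached
```

Passing the hook in as a callable keeps the check easy to test and lets CI, a Slack notifier, or a logger subscribe without changing the check itself.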
What Rocky isn't:
- Not a warehouse — it's the control plane on top.
- Not a Fivetran replacement. `rocky load` handles files (CSV/Parquet/JSONL); for SaaS sources use Fivetran, Airbyte, or warehouse-native CDC.
- Not dbt Cloud — no hosted UI, no managed scheduler. First-class Dagster integration if you need orchestration.
Adapters: Databricks (GA), Snowflake (Beta), BigQuery (Beta), DuckDB (local dev / playground). Apache 2.0.
I'd love feedback on the trust-system framing, the governance surface (particularly classification-to-masking resolution in `rocky compile` and the `rocky compliance` CI gate), the branches/replay design, the cost-attribution primitives, or anything else that catches your eye. Happy to go deep in the thread.
AgentSwift – open-source iOS builder agent #
Waiting for LLMs Suck – Give your user a game #
49Agents – 2D Canvas IDE for Orchestrating Agents, Repos, Issues #
I mapped the latest UK fuel prices by county #
The map aggregates the latest petrol and diesel prices by county, with filters for fuel type and metric. Clicking a county shows the cheapest forecourt, average price, spread, and station count. The feed covers roughly 8,000 UK forecourts and refreshes every 30 minutes. Retailers publish the prices, so there can still be gaps in the data/stations but it's getting better over time.
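The per-county figures described above can be sketched in a few lines of Python. This is an illustration only: the station record shape (`name`, `county`, `prices`) and the function name `summarize_by_county` are assumptions, not the feed's real schema. Stations with no price for the selected fuel are skipped, which is one simple way to handle the gaps mentioned above.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_county(stations, fuel="petrol"):
    """Group stations by county and compute the figures shown on click.

    Each station is a dict like
    {"name": ..., "county": ..., "prices": {"petrol": 139.9, "diesel": 147.9}}
    with prices in pence per litre (an assumed shape).
    """
    by_county = defaultdict(list)
    for station in stations:
        price = station["prices"].get(fuel)
        if price is None:  # gap in the feed: no price for this fuel
            continue
        by_county[station["county"]].append((price, station["name"]))

    summary = {}
    for county, entries in by_county.items():
        prices = [price for price, _ in entries]
        cheapest_price, cheapest_name = min(entries)
        summary[county] = {
            "cheapest_forecourt": cheapest_name,
            "cheapest_price": cheapest_price,
            "average_price": round(mean(prices), 1),
            "spread": round(max(prices) - min(prices), 1),
            "station_count": len(entries),
        }
    return summary
```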
A TUI for Markdown viewing and editing #
SyncVibe – Code with friends in the terminal, each with your own AI #
Devicons, +1300 logos and icons in React, SVG, and icon format #
+1300 logos and icons coming along with a brand new website, high fidelity SVG files, extended documentation and support for all major front end frameworks.
GitHub link: https://github.com/vorillaz/devicons
Ragnerock, an AI data analysis tool #
As a data scientist, you spend the majority of your time wrangling data. Even though you might have a set of techniques and tricks you like to use, how exactly you treat a particular source of data tends to be fairly bespoke, so you end up writing custom logic each time.
Ragnerock was born from the observation that modern LLMs can be used to automate a lot of the grunt work involved in this process, while still allowing for fully customizable pipelines. What’s more, by leveraging techniques like constrained decoding, it’s possible to provide a unified query interface regardless of the data source - bridging raw data sources like text and images with your existing structured data living in your databases.
Ragnerock has four main components:
- A workflow designer that lets you build LLM-driven data processing and analysis pipelines
- A job orchestration layer that runs those workflows
- A query interface which lets you inspect the results of those workflows with plain SQL
- A notebook system which is 100% API-compatible with Jupyter and runs on your existing kernels, so you can easily pull data into your existing environments and analyses
Ragnerock also supports bring-your-own AI (OpenAI, Anthropic, and Google APIs), databases, and blob storage, so you can join with your existing datasets and have all outputs flow to your data lake. We’re particularly excited about our web crawling feature, which allows you to scrape websites and trigger workflows on updates: for example, you might point Ragnerock at your favorite blog and run a workflow to assess posts for topics and sentiment.
You can try it out at https://www.ragnerock.com ; no credit card needed and the first 20 hours of compute are free. It’s an early-stage product so we’re especially interested in feedback.
Happy to answer any questions - John and I will be around in the comments today.
Effected Keyboard 2 – Effects as You Type #
Effected Keyboard 2 feels comfortable to type on, with gentle feedback on each key press (if you disable the heavy animations). Its look and feel are more mature and natural, it supports multiple languages, and easy gestures trigger useful features: swipe left from the spacebar for two different symbol layouts, swipe right from the spacebar to change language, and swipe up from the spacebar for a mini popup keyboard with undo/redo, text selection, and copy/paste with a clipboard. Sounds familiar?
Effected Keyboard 2 is based on Anysoftkeyboard and has been developed on top of it since 2013(!). It has some paid features, though to be honest I really tried not to be greedy with them. There are 3 packs, $5 each: a basic one with themes and simple effects as you type, a flying letters one that sends letters flying from the keyboard into the app itself, and an immersive effects one that will blow your friends' minds when you show it to them.
Effected Keyboard 2 is an Android app, though its small touches, such as the rounded corners (which are everywhere), remind me of an Apple product.
Compared to Anysoftkeyboard, Effected Keyboard 2 has a new, more stylish set of emoji and additional themes. Its function keys are closer to Gboard's style, with an option for a wider spacebar or a slightly shorter one that makes room for additional punctuation keys in the bottom row.
Effected Keyboard 2 might not replace your keyboard completely, since it doesn't have AI typing prediction or AI-level spelling correction. But! And that's an important “but”: this keyboard is all about a different experience. It's like a trip to Eurodisney, where you'd like to stay for a couple of nights but not live forever. It deserves whatever attention you can give it.
https://youtube.com/shorts/lj-jIrZ6XZ8
https://play.google.com/store/apps/details?id=com.vitali.pom...
I built a 90-second SIM for the moment engineers freeze on salary #
i kept watching friends take the first number a recruiter said out loud. not because they're bad at this, but because nobody actually preps for these calls the way they'd hoped they would. when the recruiter goes "so what are you looking for?", you either say something too low, or you flip it back by asking "what's the band?" and whatever they say basically becomes the ceiling anyway.
so i did some research and built a sim for that specific moment. hand-built decision tree. the optimal path borrows from fisher & ury (batna), galinsky (anchoring), and voss (the tactical pause thing).
first scenario is called "the lowball offer". you walk into a cafe. your friend/mentor kofi runs hiring at a series b. offer letter is face-down on the table. $95k base, market's around $130k. before you even flip it over, he asks: what's your walk-away number?
you don't have one. you never even thought about it.
four scenes with four choices each. i'd genuinely love to know where the optimal path i designed lands wrong, because i'm sure it does somewhere.
there's a cmu stat that puts the lifetime cost of undernegotiating at ~$500k. it's a stopper. left it out for now but happy to add it back if people think it earns its place.
See your computer's audio output on a real-time piano #
- When listening to music, I very often wonder what is currently being played, be it the current chord, a short lick, etc.
- When transcribing, I don't like the workflow of extracting the complete audio and converting it to MIDI, so I thought a helper on the side, while I still do most of the work myself, could be very useful.
Disclaimer: the app is limited by the model's performance. Basic-Pitch is a relatively small model which runs super well on consumer laptops. I get around 50FPS of inference on my laptop (AMD Radeon 780M).
Of course, this wouldn't be possible without:
- Tauri: https://tauri.app/
- Basic Pitch: https://github.com/spotify/basic-pitch
- ONNX: https://onnx.ai/
Implementing Patio11's "Dangerous Professional" as a Claude Code Plugin #
This is a community implementation. No affiliation with patio11, just a fan of his work.
Repo: https://github.com/Tetra-Research/dangerous-professional-plu...
Install: `npx skills add Tetra-Research/dangerous-professional-plugin`
Simple SDK for agent-to-agent communication #
Many other projects forced us into using their framework or harness or into a specific stack. So we open sourced an SDK that acts as the messaging layer without the rest of it.
Still early, but we've been working on it and found it useful.
FusionCore: ROS 2 sensor fusion that outperforms robot_localization #
FusionCore is a 22-state UKF that fuses IMU, wheel encoders, and GPS in ECEF directly (no coordinate projection, no extra node). It estimates IMU bias, adapts its noise covariance automatically from the innovation sequence, and gates outliers with a chi-squared test on every sensor.
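For readers unfamiliar with chi-squared gating, here is a minimal pure-Python sketch of the general technique for a 2-D measurement. It is not FusionCore's code, and the function names are made up for illustration. The filter computes the squared Mahalanobis distance of the innovation under the innovation covariance `S`, then rejects the measurement if it exceeds a chi-squared quantile. The 95% quantiles in the table are standard values, but which confidence level FusionCore uses is not stated above.

```python
# 95% chi-squared quantiles, keyed by measurement dimension (degrees of freedom).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def mahalanobis_sq_2d(innovation, S):
    """Return nu^T S^-1 nu for a 2-D innovation nu and 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0:
        raise ValueError("innovation covariance must be positive definite")
    y0, y1 = innovation
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (y0 * (d * y0 - b * y1) + y1 * (-c * y0 + a * y1)) / det


def accept_measurement(innovation, S, threshold=CHI2_95[2]):
    """Gate: keep the measurement only if it is statistically plausible."""
    return mahalanobis_sq_2d(innovation, S) <= threshold
```

With `S = [[4, 0], [0, 4]]` (2 m standard deviation per axis), an innovation of `(1, 1)` gives a distance of 0.5 and passes, while `(10, 0)` gives 25 and is rejected as an outlier.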
I benchmarked it against the robot_localization EKF on 6 sequences from the NCLT public dataset (University of Michigan, real robot, real GPS, RTK ground truth). It wins 5 of 6. On the 6th sequence (fall, degraded GPS over a long period) it loses badly. The robot_localization UKF diverged to NaN on all six.
Configs, methodology, and full reproduce instructions are in the benchmarks/ folder.
Multi Kanban Task Board and MCP Server #
I built a simple multi-user, multi-board kanban MCP server. I have been looking for something like this to manage development agents, but I wasn't seeing anything that felt like what I wanted. So I sat down and decided to vibe code an alternative.
While it was an experiment at first I have been using it daily for my personal development projects and I really think there are others who might be looking for exactly this. It's 100% a WIP, but it is also very usable.
I have a demo instance running at [mootasks.dev](https://mootasks.dev). If you find this interesting I'd appreciate a star. This is really the first thing I built that I felt would be of interest to others.
Modern alternative to Google Dictionary, AI-powered and context-aware #
The Google Dictionary extension solved the tab-switching problem but never went further than static definitions. I wanted something smarter.
So I built QuickDef, a Chrome extension that sends the surrounding sentence to GPT-4o-mini alongside the word, so the definition matches what the word actually means in that context, not just the dictionary default.
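The context step could look roughly like this. The sketch is in Python for brevity (the actual stack is TypeScript) and is not QuickDef's code: the function names, the sentence-splitting heuristic, and the payload field names `word` and `context` are all assumptions. It finds the sentence around the selected word and bundles both into a payload that a backend could pass to the model.

```python
import re


def surrounding_sentence(text, start, end):
    """Return the sentence containing text[start:end].

    Boundaries are a simple heuristic: '.', '!' or '?' followed by whitespace.
    Abbreviations such as 'e.g. ' will split too early.
    """
    boundaries = [m.end() for m in re.finditer(r"[.!?]\s+", text)]
    left = max((b for b in boundaries if b <= start), default=0)
    right = min((b for b in boundaries if b >= end), default=len(text))
    return text[left:right].strip()


def build_lookup_payload(text, start, end):
    """Bundle the selected word with its sentence (field names are assumed)."""
    return {
        "word": text[start:end],
        "context": surrounding_sentence(text, start, end),
    }
```

For the text "I deposited cash at the bank. We sat on the river bank at dusk.", selecting the second "bank" yields the river sentence as context, which is what lets the model pick the right sense.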
Stack:
- Chrome Extension (Manifest V3)
- Node.js + TypeScript + Express (backend on Railway)
- OpenAI GPT-4o-mini
- Supabase (auth + database)
- Next.js landing page on Vercel
Freemium: 10 free AI lookups/day, unlimited dictionary mode always free.
Would love feedback from the HN community, especially on the freemium limit and anything technically questionable.
Chrome Web Store: https://chromewebstore.google.com/detail/ioepkncpchchdiookgp...
Website: https://www.quickdef.app
I built a WhatsApp bot to help you remember birthdays #
This became even more embarrassing as friends and family started having kids. Hell if I remember your third kid's name, let alone birthday.
So a couple of years ago I built a simple app to keep birthdays in check. It was... fine. Rails web app, did the basics.
But I still had to go to the app to add people, worry about being logged out, just enough friction that I'd forget or not bother. So a couple of months ago I turned it into a whatsapp-only app.
It now runs entirely through whatsapp (with a few webviews chucked in when needed) and it's super smooth. I can add birthdays with natural language, yeet in a few at a time, or just record a voice note and get on with my day. And it sends me email reminders that save my bacon for what is now nearly 100 friends and family and their ruddy keeds.
Anyway, thought it might be useful for others. have at it.