Show HN — July 29, 2025
I built an AI that turns any book into a text adventure game (46 comments) #
Note: Work in progress. Suggestions are welcome.
Terminal-Bench-RL: Training Long-Horizon Terminal Agents with RL #
*What I did*:
- Created a Claude Code-inspired agent (system msg + tools)
- Built Docker-isolated GRPO training where each rollout gets its own container
- Developed a multi-agent synthetic data pipeline to generate & validate training data with Opus-4
- Implemented a hybrid reward signal of unit test verifiers & a behavioural LLM judge.
*Key results*:
- My untrained Qwen3-32B agent achieved 13.75% on Terminal-Bench (#19, beats Stanford's Qwen3-235B MoE)
- I verified that training runs stably on 32x H100s distributed across 4 bare-metal nodes
- I created a mini-eval framework for LLM-judge performance. Sonnet-4 won.
- ~£30-50k needed for a full 1000-epoch training run (I could only afford testing)
*Technical details*:
- The synthetic dataset ranges from easy to extremely hard tasks. An example hard task's prompt:
"I found this mystery program at `/app/program` and I'm completely stumped. It's a stripped binary, so I have no idea what it does or how to run it properly. The program seems to expect some specific input and then produces an output, but I can't figure out what kind of input it needs. Could you help me figure out what this program requires?"
- Simple config presets allow training to run on multiple hardware setups with minimal effort.
- GRPO used with 16 rollouts per task, up to 32k tokens per rollout.
- Agent uses XML/YAML format to structure tool calls
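A rough sketch of what an XML-wrapped YAML tool call might look like (the tag and field names here are my illustration, not the repo's exact schema):

```xml
<tool_call>
name: bash
arguments:
  command: "ls -la /app"
  timeout_secs: 30
</tool_call>
```

One appeal of this style over raw JSON function calls is that the XML tags unambiguously delimit the call while YAML sidesteps JSON's strict quoting, which smaller models tend to get wrong.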
*More details*:
My GitHub repos open-source it all (agent, data, code) and have way more technical details if you're interested:
- Terminal Agent RL repo
- Multi-agent synthetic data pipeline repo
I thought I would share this because I believe long-horizon RL is going to change everybody's lives, so I feel it is important (and super fun!) for us all to share knowledge around this area and enjoy exploring what is possible.
Thanks for reading!
Dan
(Built using rLLM RL framework which was brilliant to work with, and evaluated and inspired by the great Terminal Bench benchmark)
A GitHub Action that quizzes you on a pull request #
PR Quiz uses AI to generate a quiz from a pull request and blocks you from merging until the quiz is passed. You can configure various options like the LLM model to use, max number of attempts to pass the quiz or min diff size to generate a quiz for. I found that the reasoning models, while more expensive, generated better questions from my limited testing.
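The post mentions configurable options (model, max attempts, min diff size) but doesn't show the action's actual inputs; a hypothetical workflow wiring them up (the action reference and input names are my guesses) might look like:

```yaml
name: PR Quiz
on: pull_request
jobs:
  quiz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical action reference and input names, for illustration only
      - uses: example/pr-quiz-action@v1
        with:
          model: o4-mini
          max-attempts: 3
          min-diff-size: 50
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```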
Privacy: This GitHub Action runs a local webserver and uses ngrok to serve the quiz through a temporary url. Your code is only sent to the model provider (OpenAI).
Monchromate – the best greyscale browser extension #
That's how I came up with it. I made it open source, recently passed 100 users on the Chrome Web Store, and it has a 5-star rating as of now.
You might ask why you'd need it when you can toggle greyscale via system filters. The thing is, I didn't want to greyscale my work sites as well, so I added site exclusion. It also has a scheduler and intensity control, and above all it supports all browsers, including Firefox, with the same experience.
Would love any kind of feedback over this!!
ELF Injector #
Included in the project are sample chunks as well as a step-by-step tutorial on how it works.
It's a mix of C and assembly and currently runs on 32-bit ARM though it's easy to port to other architectures.
Xorq – open compute catalog for AI #
After years of struggling with scaling compute that worked in notebooks but failed in production, we decided to do something about it. Data has standards like Iceberg and Delta. But compute is still a mess—trapped in notebooks, duplicated effort across teams, or baked into custom Airflow DAGs. We think of Xorq as the missing analog to Apache Iceberg, but for compute.
We’ve spent the last year building Xorq, a *compute catalog* that helps teams *reuse, ship, and observe* transformations, features, models, and pipelines across engines.
Xorq is built on:
- *Arrow Flight* (`do_exchange`) for high-speed data transport
- *Ibis* for cross-engine expression trees, serialized to YAML
- A portable UDF engine that compiles pipelines to SQL or Python
- `uv` to make Python environments fully reproducible
Xorq features:
- pandas-style declarative transformations, backed by Ibis
- Multi-engine execution (e.g., DuckDB, Snowflake)
- UDFs as portable Flight endpoints
- Serveable transforms by way of the flight_udxf operator
- Built-in caching and lineage tracking
- Diff-able YAML artifacts, great for CI/CD
Xorq use cases:
Since our last major release, it’s been exciting to see the first Xorq use-cases show up in the wild. All with *Python simplicity and SQL-scale performance*.
- Feature Stores (https://www.xorq.dev/blog/featurestore-to-featurehouse)
- Semantic Layers (e.g. https://github.com/boringdata/boring-semantic-layer)
- MCP + ML Integration (https://docs.xorq.dev/vignettes/mcp_flight_server)
We’re open source and learning fast. Would love feedback on what’s useful or missing. Thanks in advance for trying it out!
Check out the demo of the Xorq CLI tool in action: https://asciinema.org/a/730484
---
Get Started
- GitHub: https://github.com/xorq-labs/xorq
- Xorq docs: https://docs.xorq.dev/
---
Sneak peek – Xorq Compute Catalog UI Console:
Check out this interactive Claude demo showing how the Xorq compute catalog can be visualized to accelerate composition, reuse, and troubleshooting of AI compute: https://claude.ai/public/artifacts/d2f00d2a-a3f9-4032-884e-d...
Walk-through of rocket landing optimization paper [pdf] #
I found this rocket landing trajectory optimization paper cool, but it took me a while to wrap my head around it and implement it. I wrote up an expanded version of the paper including details that would have helped me understand it the first time through, with the idea being that it might make the content more approachable for others with similar interests. The source code is also linked in the document.
I'm open to feedback, I'm always trying to get better across the board.
Debunking Election Fraud Claims – Interactive Data Viz and Simulations #
I built this after seeing several references to Election Truth Alliance on social media, and after reading their analysis, I just couldn't get the problems I saw in it out of my head.
So I downloaded the data, and rebuilt their full analysis from scratch.
Their critical error is a simple misunderstanding of the Law of Large Numbers: as sample size grows, sample averages converge to the distribution's true mean.
(Not to be confused with the Law of Truly Large Numbers, which states that unlikely things happen given enough trials. That confused me too.)
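The convergence in question is easy to demonstrate: simulate tallies of different sizes for the same underlying vote share and watch the noise shrink as the sample grows (a generic sketch with a made-up 52% share, not their actual election data):

```python
import random

def sample_share(n, p=0.52, seed=0):
    """Fraction of n simulated ballots that go to a candidate with true share p."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

# Law of Large Numbers: larger tallies cluster tightly around the true
# share, so bigger precincts *should* look less noisy than small ones.
for n in (100, 10_000, 1_000_000):
    print(n, round(sample_share(n), 4))
```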
Technical Details:
- No build system; this is entirely handmade HTML, CSS, and plain JavaScript.
- Initial analysis done in Python with only standard libraries.
- Visualizations created in Observable Plot and D3.js.
- Simulations run entirely client-side.
- Web page built with Scrollama for animations and behavior controls.
- Vote history visualizations process ~600k individual ballot records in real time, with a little bit of caching to keep your browser from chugging.
- Made with the help of Windsurf.
Interesting Challenges:
- Making the visualizations performant without a backend, which is accomplished with a bit of preloading as you scroll, and some amount of caching so that the visualizations can share resources whenever possible.
- Windsurf does run wild sometimes. During the initial preprocessing stage, it at one point dumped an absolutely massive JSON blob to disk; it was so large it actually crashed my whole computer while writing. To read it back, it obviously couldn't be loaded in one go, but rather than storing it in a saner format, my Opus 4-powered coding agent decided to build a streaming JSON parser from scratch. It worked, and I got the data out that I needed, so I didn't go back and make it more sensible, but man, that was dumb.
This actually started with the simulation, which took only about a day of work, and then later grew to include the re-analysis and visualizations. The visualizations were all done within 2-3 days after I got the data.
If I did it over again, I'd probably have tried to find some kind of build system or static site generator to compose the final result. Once the page got very long, it was quite unwieldy even for Windsurf: very short conversations could hit Sonnet 4's rate limit because there was just so much stuff in a single file.
Maia Chess – Human-like chess AI for playing, learning, and more #
* Play Maia-2: Play the (updated) most human-like chess engine, tailored to your skill level
* Analyze your games: See how you (or the pros!) stack up with both Maia’s human-based predictions and classic Stockfish evaluation
* Try Maia-powered puzzles: Tactics puzzles curated and analyzed through Maia’s unique lens
* Openings drill: Brand new! Select openings, play through them against Maia, and get instant, personalized feedback
* Hand & Brain: Play this fun team variant where you play with Maia as a human-AI team
* Bot-or-not: A chess Turing Test: can you spot the bot in a real human-vs-bot game?
* Leaderboards: See how you rank in each mode, and challenge yourself to climb higher
We’d love your feedback: what works, what doesn’t, what’s missing, or what would make the platform more valuable for you. Join our Discord to chat with us and other users (https://discord.gg/hHb6gqFpxZ).
If you're interested in our research behind Maia, you can check out these papers:
Aligning Superhuman AI with Human Behavior: Chess as a Model System, KDD 2020
Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess, NeurIPS 2021
Learning Models of Individual Behavior in Chess, KDD 2022
Designing Skill-Compatible AI: Methodologies and Frameworks in Chess, ICLR 2024
Maia-2: A Unified Model for Human-AI Alignment in Chess, NeurIPS 2024
Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess, under review
Same prompt tested across Replit, Bolt, v0, Lovable and Raq.com #
I built Raq.com – a platform that uses Claude Code to build working internal tools directly in the browser.
Claude Code is great at self correcting when given the right tools.
I've found that the popular web-based AI coding tools look great in demos but fail on real API integrations, or require a lot of back-and-forth error fixing. They don't appear to do much research or self-correction, likely to reduce spend. I wanted to see the current state of these tools, so I ran the same prompt on five platforms (Replit, Bolt, v0, Lovable, and Raq.com) to build a tool that requires 3 different APIs (Companies House, FinUK and OpenRouter) working together.
Four platforms produced broken prototypes or needed manual fixes. Raq.com delivered a complete working solution from a single prompt (that can be deployed to live with one click).
Full test with videos: https://raq.com/real-world-test
We're in early access (requires Claude Pro/Max for free usage). We're looking for non-coders who would like to build internal tools for their team.
Some technical info:
- Raq.com provisions isolated dev and prod Docker environments for each company (companyname.raq.com and companyname-dev.raq.com).
- The dev site includes a persistent terminal streamed to the browser, so the session continues even while the tab is closed.
- A CLAUDE.md file provides best practices, known pitfalls, and coding patterns for the Laravel + Filament stack.
- Self-correction loop: Claude can test and debug its own work. It has direct shell access to a custom script that bundles PHPUnit, syntax checks, and cache clearing, plus a Playwright wrapper to check for errors and take screenshots.
- A single click runs a script that rsyncs the dev workspace to the prod container, runs migrations, and clears caches.
Agentic Coding Tools – Directory for AI agents and vibe coders #
Most of these tools can plan, scaffold, and write code with minimal input. Some are polished, some experimental. I wanted a way to compare them all in one place.
You can filter by autonomy level, LLMs used, pricing, open source, etc. It’s a compact UI—works on mobile, has dark mode, and no signups or fluff.
Would love feedback:
Are there tools I’ve missed?
Anything that should be organized differently?
Info you wish was included?
Cheers.
I built a Tamagotchi that teaches French numbers #
So I built a virtual pet that thrives on correctly answered number challenges. 20-second drills throughout the day: get them right and Lexie grows, get them wrong and it gets a bit sad (but never dies, this isn't the 90s).
Speak, type, or tap your answers. Would love any feedback/bugreports.
Or just rant about how the Belgians sensibly say "nonante-neuf" whilst we're stuck with "four-twenties-nineteen".
API Radar – Real-time GitHub scanner for exposed API keys #
I'm a solo dev and student, and I recently built API Radar — a real-time tool that monitors public GitHub commits for leaked API keys (OpenAI, Google Gemini, Anthropic Claude, and more).
What it does:
- Scans public GitHub commits in real time
- Detects API keys using pattern matching and validation heuristics
- Redacts most of the key, but allows copying for verified leaks (for security teams)
- Leaderboards by leaky repositories and exposed providers
- Built to promote developer hygiene and security awareness
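The pattern-matching-plus-redaction step can be sketched in a few lines. The provider regexes below are illustrative guesses, not API Radar's real rules; a production scanner pairs patterns like these with live validation of each candidate key:

```python
import re

# Illustrative provider patterns -- real detectors use provider-published
# key formats plus validation calls before reporting a leak.
PATTERNS = {
    "openai": re.compile(r"\bsk-[A-Za-z0-9]{20,}"),
    "google": re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),
}

def redact(key, keep=6):
    """Keep a short prefix so a leak is identifiable but not usable."""
    return key[:keep] + "*" * (len(key) - keep)

def scan(text):
    """Return (provider, redacted_key) for every candidate match."""
    return [
        (provider, redact(m.group()))
        for provider, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
```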
Stack:
- Backend: Node.js (Fastify), MongoDB, Redis, custom TruffleHog-like scanner
- Frontend: Next.js 14, TailwindCSS, shadcn/ui
- Infra: VPS, NGINX + SSL, background worker farm, rate-limit handling
Current stats (soft launch):
- 210 active users
- 208 new users
- 2.6K total events
- 53s average engagement time
Built fully solo — from design to deployment, analytics to queue resilience. My goal was to ship something fast, security-aware, and production-grade.
Would love feedback on:
- Improving UX for security teams
- Ethics around redaction and disclosure
- Ideas to scale this into an OSS tool or API service
Thanks for reading! https://apiradar.live
— Zaim
YouTubeTldw: ad‑free, login‑free YouTube summaries in a flash #
The longer a talk is, the more ad revenue a creator gets. But we don't all have 40 minutes to listen to someone slowly edge around a point.
This website has no ads, no login, and is 100% free. You can find the source code here [2].
[1] https://pypi.org/project/tldw/
[2] https://github.com/DavidZirinsky/tldw-site
I waste my time extracting stuff every week from the Internet #
So basically, I got into a process of swiping through 2,500 or so pieces of content a week, before diving deeper into the ones that interested me at first glance, which means I've eaten a lot of my time for little to no value.
Any genius (or stupid) ideas on how to do better? I'd like to continue, but as it is now, it's too time-consuming and I'll get bored soon... Of course I could automate the selection with LLMs, but that's not the point; I like human-picked stuff (although I may benefit from auto-filtering generated content if I knew how).
Thanks :)
Suggest – Ultra-low-friction feedback for your website #
We built this to get feedback from external users, but we're seeing high uptake from our own team for internal feedback. Often, we'd encounter small paper cuts in the product in the middle of another task. In the past, many of these were just not reported - the effort to create an issue and describe it in sufficient detail was too high. And if you did report, the context switch was long enough that you'd interrupt your original task. Suggest takes a literal 4 seconds to leave feedback, and because of the session replay, all feedback is very high quality - always with enough information for a developer to reproduce an issue.
Suggest works alongside your existing tools and doesn't need you to replace any of them.
Thoughts and feedback are very welcome!
TanStack DB – Reactive DB with Differential Dataflow for TanStack Query #
We’ve been working on TanStack DB, an embedded, reactive client database for TanStack Query, and are proud to announce that with the 0.1 release it's now in BETA!
TanStack DB plugs into your existing TanStack Query useQuery calls and uses Differential Dataflow to incrementally recompute only what changed, so updates stay sub-millisecond even with 100k rows. You get live queries, optimistic updates with automatic rollback, and streaming joins — all in the client!
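The differential idea behind those sub-millisecond updates is to apply only the (key, multiplicity) delta to a maintained result instead of rescanning all 100k rows. A toy of the concept (TanStack DB's actual engine is TypeScript and far more general):

```python
# Toy incremental aggregate: each change arrives as a delta with
# multiplicity +1 (row inserted) or -1 (row deleted), and only that
# delta is applied -- the collection is never rescanned.
class IncrementalCount:
    def __init__(self):
        self.counts = {}            # group key -> current count

    def apply(self, key, mult):
        """mult is +1 for an inserted row, -1 for a deleted one."""
        new = self.counts.get(key, 0) + mult
        if new:
            self.counts[key] = new
        else:
            self.counts.pop(key, None)
```

In this framing, an optimistic update is just a delta applied immediately, and a rollback is the mirror-image delta.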
TanStack DB works with REST, GraphQL, WebSockets, and shines with sync engines like ElectricSQL or Firebase, letting you load large, normalized collections once and stream real-time changes into the client without manual bookkeeping. It sits on top of queryClient so you can adopt it incrementally, one route at a time.
- Intro post: https://tanstack.com/blog/tanstack-db-0.1-the-embedded-clien...
- Local-first sync via Electric: https://electric-sql.com/blog/2025/07/29/local-first-sync-wi...
- Web starter with TanStack Start: https://github.com/electric-sql/electric/tree/main/examples/...
- Mobile starter with Expo: https://github.com/electric-sql/electric/tree/main/examples/...
- Project website and docs: https://tanstack.com/db
- GitHub repo: https://github.com/tanstack/db
Try it out and let us know what you think!
I built a deep email validation library in Kotlin #
Hey HN,
I wanted a real-world project to properly learn Kotlin (coroutines, DSLs, etc.) and decided to tackle a problem I've found surprisingly underserved: comprehensive email validation. Most solutions stop at regex, but that doesn't prevent sign-ups from [email protected] or disposable email services.
So, I built a library that performs a series of deeper checks. I just tagged the v1.0.0 release because the API is now stable and I think it's ready for feedback from the community.
It validates an email in layers:
1. Syntax: A robust check that's more reliable than a typical regex.
2. Domain Registrability: Checks the domain against the Public Suffix List to ensure it's on a real TLD.
3. MX Records: A DNS query to see if the domain is actually configured to receive email.
4. Disposable Services: Checks against a list of known temporary/throwaway email providers.
5. SMTP Connection (Optional): A live check to see if the mailbox actually exists. This is off by default since port 25 is often blocked, but can be enabled via a proxy.
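A minimal Python sketch of the layering (the real library is Kotlin and adds Public Suffix List, MX, and SMTP checks; the regex and the disposable-domain list here are simplified stand-ins):

```python
import re

# Simplified stand-ins: the real syntax check and disposable list
# are far more thorough than these.
SYNTAX = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DISPOSABLE = {"mailinator.com", "10minutemail.com"}

def validate(email):
    """Return the list of failed layers; empty means the email passes."""
    if not SYNTAX.match(email):
        return ["syntax"]           # later layers need a parseable address
    domain = email.rsplit("@", 1)[1].lower()
    return ["disposable"] if domain in DISPOSABLE else []
```

The early return mirrors the layered design: there is no point querying DNS for an address that cannot be parsed.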
One of my main goals was to build something that would be useful on both the server and on a client like an Android app. This led to a couple of key design decisions:
- It's built with coroutines for non-blocking, concurrent I/O.
- It has a full offline mode. You can disable all network checks and run it using bundled datasets for things like syntax and disposable domain checks, which is great for providing instant, client-side feedback.
The configuration is done through a simple Kotlin DSL.
The project is MIT licensed. I'm posting this to get your thoughts on the approach, the architecture, or any Kotlin idioms I might have missed. How do you all typically handle this problem beyond regex?
4KFilmDb – A tool to track and analyze 4K movies (HDR, Dolby Atmos) #
Over the past few months, I’ve been building 4KFilmDb, the first (and independent) 4K movie database to track and compare streaming quality (HDR, bitrates, Atmos audio) across platforms (Netflix, Prime Video, Disney+, etc).
Key features:
• HDR & Atmos analyzers
• Smart filters (Presets) with built-in options to suggest 4K titles or ready-made lists
• Fake HDR titles tracker (spot poor HDR grades easily)
Currently in beta, so feedback is very welcome.
MultiDrive – a free app to clone, backup, erase drives (UI/CLI) #
After 17 years of work with drives, I got tired of seeing simple disk operations locked behind paywalls. Macrium killed their free version, EaseUS hides cloning in paid tiers, and so on and so forth. Another reason is that all the existing solutions are overly complicated. It must be simple enough for my mum to start erasing a USB stick or making a full drive backup. All in all, we thought the community deserved better.
What makes MultiDrive different:
- Dead simple launch of a full drive backup, clone, erase or restore
- 100% free. No ads, no "upgrade to pro" popups
- Standard formats. Backups use ZIP or RAW, no proprietary .afi/.tib nonsense
- Handles bad sectors, loose cables, can pause/resume any operation
- Parallel drive tasks
- CLI app as an addition for workflow automation
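The bad-sector handling and resume support described above can be sketched as a chunked copy loop that zero-fills unreadable blocks and tracks its offset (a sketch of the technique, not MultiDrive's implementation):

```python
import os

def clone(src, dst, chunk=1 << 16, resume_at=0):
    """Chunked clone that zero-fills unreadable blocks and can resume
    from a previously saved offset. Illustrative, not MultiDrive code."""
    mode = "r+b" if os.path.exists(dst) else "wb"
    with open(src, "rb") as s, open(dst, mode) as d:
        s.seek(resume_at)
        d.seek(resume_at)
        offset = resume_at
        while True:
            try:
                block = s.read(chunk)
            except OSError:                 # bad sector: skip it, keep going
                block = b"\x00" * chunk
                s.seek(offset + chunk)
            if not block:
                return offset               # persist this offset to resume later
            d.write(block)
            offset += len(block)
```

Pause/resume falls out for free: stop the loop, save `offset`, and pass it back as `resume_at`.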
Would love your feedback! What problems with disk operations have you had that current tools couldn't solve? We're building our roadmap based on real pain points.
PolyglotGPT – Conversational AI for Learning 40 Languages #
To start, all you have to do is set your native language and target language. Then, just start talking to it in either your native or target language. It'll catch any mistakes you make when you speak in your target language and answer any grammar/vocab questions you have.
It has a translate button, a romanize button (converts any text into the Latin alphabet), and you can highlight words/phrases you don't know in AI responses to have them explained.
I'd appreciate any feedback, thanks!
Gogg – A GOG game downloader written in Go #
I made an open-source tool in Go, named gogg, to download and back up your GOG.com game library.
It's cross-platform and has features like:
- A scriptable CLI and an easy-to-use GUI
- Multi-threaded and resumable downloads
- Filters for platform, language, DLCs, etc.
- File verification with hashes and total size calculation
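The hash-verification step is essentially a streamed digest compare. A small sketch (GOG is commonly reported to publish MD5 checksums; treat that, and the function shape, as assumptions rather than gogg's actual code):

```python
import hashlib

def verify_file(path, expected_md5, chunk=1 << 20):
    """Stream the downloaded file in 1 MiB chunks and compare its MD5
    against the expected checksum, without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            digest.update(block)
    return digest.hexdigest() == expected_md5
```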
You can find the project on GitHub: https://github.com/habedi/gogg
BreathylBox – A lockbox that only opens when you're sober #
I'm an incoming college freshman building BreathylBox, a lockbox that stays locked unless you pass a breathalyzer test and authenticate with a passcode. It’s designed to help prevent access to car keys, firearms, or phones when someone’s been drinking.
Here’s the landing page: https://www.breathylbox.com
Right now, we’re validating demand across use cases:
Parents storing car keys after parties Gun safety in homes with teens People trying to reduce tech use while drinking
We’re not selling anything yet — just trying to see if the idea resonates and which use case to prioritize. Would love your feedback:
Would you use something like this? What should we do (or avoid) before moving to manufacturing? Any obvious legal or hardware red flags?
Thanks in advance — happy to answer any questions!
Sean Short, CEO BreathylBox ([email protected])
Railway hackathon – deploy an idea over a weekend #
Build a template for others, whether it be for full-stack apps or a headless CMS.
We've seen people deploy traditional apps or infra to host marketing blog sites (we host ours on Railway).
Up to $1,000 in prizes for project complexity or content depth.
AI agents reviewing each other's code in production [video] #
Results after 2 months:
- 98% production-ready code before human review
- 3-month features now ship in 2 weeks
- 2 developers supporting 4 platforms effectively
Video walkthrough (10 min): https://www.youtube.com/watch?v=fV__0QBmN18
Tech stack: Claude Code, CodeRabbit, Asana and Figma via MCP, custom orchestration layer.
The interesting part is watching them disagree - CodeRabbit might suggest an optimization, and Claude will defend its approach with specific reasoning about our codebase. These conversations create great documentation.
Happy to answer questions about the setup, costs, or specific implementation details.
Give Claude a secure coding env to automate work in your apps #
After building AI tools for the past year, we recently made a YouTube video on building MCP servers and realized MCP is a total game-changer. It essentially lets AI do anything by connecting to your apps. But the deeper we dove, the clearer it became that security and privacy were complete afterthoughts. Coming from backgrounds at Okta and Stripe, this made us pretty uncomfortable.
We kept seeing the same pattern: every app needs its own MCP server, each storing sensitive tokens, with minimal security controls. It felt like we were back to the early days of OAuth implementations. Functional, but scary.
How Keyboard fixes this:
- Isolated execution: Your API keys live in your own GitHub Codespace secrets, Bearer OAuth tokens in encrypted files on your machine. Your credentials stay in your trust radius
- Ephemeral environments: Codespaces can be destroyed/recreated, limiting blast radius
- Built-in access controls: GitHub's enterprise-grade security model protects your credentials
- Zero-trust architecture: Only you can access your API keys and execution environment
What makes this different:
- Real code execution: Claude can run JavaScript/Node.js with npm packages and your API credentials
- Reusable workflows: Save complex scripts as "Keyboard Shortcuts" for instant reuse
- Universal integration: One setup connects Linear, Slack, Google Workspace, GitHub, and more
- Auto-environment management: Codespaces created/managed automatically as needed
The GitHub Codespace approach came from experimental work with interactive documentation. We realized Codespaces might be the most secure place to execute these tasks - isolated, ephemeral, with enterprise-grade controls.
We need your help: If this resonates, give us a star on GitHub! We're looking for early users and contributors who want to help make MCP more powerful and more secure.
We'd love your feedback, especially if you've been experimenting with MCP yourself!
If you want to try it here is the quickstart: https://docs.keyboard.dev/getting-started/quickstart
SBoMPlay – Client side SBoM explorer #
Open source, work-in-progress code; please share your feedback.
Dart implementation of the libp2p networking stack #
I built a Vue dependency debugger plugin #
StoxGPT – type "add RSI" and the indicator appears on the chart #
I built *StoxGPT*, a TradingView-powered chart where you control everything by chat. Example:
> *You*: add RSI
> *Chart*: (RSI indicator appears)
> *You*: change ticker to AMZN
> *Chart*: (switches symbols)
No menus, no hotkeys—just natural language mapped to the TradingView JS API.
---
### Why? I got tired of drilling through panels to add indicators or tweak settings. A chatbot front-end felt faster, so I wired GPT-3.5-Turbo to TradingView’s `widget.activeChart()` calls.
---
### How it works
* *React + Next.js* front-end
* *Dummy OHLCV generator* (open-source) for this demo
* Simple command grammar → LLM → function-calling layer → TradingView injection
* Hosted on Vercel; cold start ~400 ms
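A toy version of the command-grammar routing layer, before anything falls through to the LLM (the method names mirror TradingView-style calls like `createStudy`/`setSymbol` but are illustrative here, and the real flow runs through GPT function calling rather than prefix matching):

```python
# Prefix -> chart-call table; unrecognized messages return None so the
# caller can hand them to the LLM's function-calling layer instead.
ROUTES = [
    ("add rsi",           {"method": "createStudy", "args": ["RSI"]}),
    ("change ticker to ", {"method": "setSymbol"}),
]

def route(message):
    msg = message.lower().strip()
    for prefix, call in ROUTES:
        if msg.startswith(prefix):
            rest = message.strip()[len(prefix):].strip()
            args = call.get("args", []) + ([rest.upper()] if rest else [])
            return {"method": call["method"], "args": args}
    return None   # fall through to the LLM
```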
---
### What I’m exploring next
* *Plaid auth* → live Robinhood balances & fundamentals
* Back-testing via OpenAI function calls
* Sharing indicator “recipes” between users
---
### Looking for feedback
* Would live data make this a daily driver?
* Any killer feature missing?
* Is the chat modality actually faster for you?
Thanks in advance—happy to answer anything!
Faster local AWS EKS access #
CineWan – video generation platform powered by Wan2.2 AI model #
I've been working on CineWan, an AI video generation platform that leverages the new Wan2.2 models with Mixture-of-Experts (MoE) architecture.
Technical highlights:
• MoE architecture separates denoising across timesteps with specialized expert models
• Dynamic routing system selects experts based on content complexity
• Generates up to 720p, 121-frame videos from text or images
• Built on Next.js 15 with edge runtime for <50ms global response times
• Smart cost optimization: Cloudflare R2 storage with 3-day auto-expiration
• Real-time progress streaming with exponential backoff polling
The Wan2.2 models were trained on 65% more images and 83% more videos than v2.1, with integrated cinematography principles. We're seeing cinema-grade output quality that rivals much more expensive solutions.
Benchmax, a new open-source RL environment framework for LLM finetuning #
I’ve been working on `benchmax`, an open-source framework for building, running, and parallelizing environments to fine-tune LLMs with reinforcement learning.
What I wanted to solve for:
- Environments are tightly coupled with RL trainers, leading to fragmentation and limited compatibility.
- These coupled environments tend to be mostly competitive math and coding → for OSS RL + LLMs to scale, we need more complex, real-world environments.
- Scaling these environments in parallel is still not easily possible
What I'm excited about:
- benchmax is training-framework agnostic, with adapters already built out for verl and verifiers. We're gonna build more adapters for other frameworks (e.g. SkyRL), instead of forcing others to adopt our standard (though ofc they're welcome to)
- benchmax comes with a few interesting environments out of the box: spreadsheet processing, CRM, etc. → more coming soon!
- benchmax supports MCP as a first-class citizen. There has been an explosion of MCP servers/tools built for use cases ranging from browser use to Excel to game creation. `benchmax` allows folks to leverage and compose these existing MCP servers to build environments integrated with real-world systems
- Multi-node environment parallelization coming soon!
If you like what you see, feel free to *star* the *repo* to support the project! Our hope is to let anyone benchmax on their tasks, with benchmax.
https://github.com/cgftinc/benchmax
It’s still very early! I expect to ship a lot more → more environments, more trainer integrations. Would love y’all’s thoughts on what environments and trainer integrations could be interesting!
I built an API to generate PDF invoices from JSON #
I'm Daniel. I built a simple and straightforward API: you POST a JSON payload with your invoice data, and it returns a secure, presigned URL to a generated PDF. The goal is to make invoicing a single, reliable API call so you can get back to your main product.
I also used this as a personal challenge to move away from my old LAMP stack background and build something new with Python/FastAPI, Next.js, and a serverless architecture on GCP and AWS.
For the HN community, I've set up a promo code: HEYHN100
If you sign up, you can redeem it in your dashboard for 100 free credits (on top of the 10 you get by default). The credits don't expire.
I'm here to answer any questions and would genuinely appreciate any feedback or technical critiques you have. Thanks for checking it out.
GenDB – I built a tool to generate a full database from a single prompt #
I recently built a prototype called GenDB, an AI-powered backend builder designed to eliminate boilerplate and streamline database deployment.
The idea came out of my own frustrations: I was spending hours writing and rewriting Python code with TortoiseORM just to build and modify basic schemas. Then I’d have to deploy it all over again for even small changes. After seeing tools like Lovable and Cursor make front-end development nearly effortless, I started to wonder: why wasn’t backend development just as fluid?
With GenDB, you can:
- Prompt a schema (e.g., “Instagram clone”) via natural language or image
- Edit it visually using a DBML-based ERD editor
- One-click deploy to GCP or AWS
- (Coming soon) Auto-generate APIs and safe migration scripts
The goal is to go from idea → schema → live backend in minutes, without writing boilerplate or fiddling with cloud infrastructure too much.
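For readers unfamiliar with DBML, the intermediate representation such a prompt might round-trip through looks like this. This is a hand-written sketch of what an “Instagram clone” schema could start as, not GenDB's actual output:

```dbml
Table users {
  id int [pk, increment]
  username varchar [unique, not null]
  created_at timestamp
}

Table posts {
  id int [pk, increment]
  user_id int [ref: > users.id]   // each post belongs to one user
  caption text
  created_at timestamp
}

Table follows {
  follower_id int [ref: > users.id]
  followee_id int [ref: > users.id]
}
```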
It’s still an early prototype, and I’d love your feedback, especially if you’ve run into similar pain points. What seems useful? What’s missing? Where will this fall apart?
Demo + details here: https://gendb.carrd.co/
Thanks for taking a look!