Show HN — July 29, 2025
I built an AI that turns any book into a text adventure game (46 comments) #
Note: Work in progress. Suggestions are welcome.
Terminal-Bench-RL: Training Long-Horizon Terminal Agents with RL #
*What I did*:
- Created a Claude Code-inspired agent (system msg + tools)
- Built Docker-isolated GRPO training where each rollout gets its own container
- Developed a multi-agent synthetic data pipeline to generate & validate training data with Opus-4
- Implemented a hybrid reward signal of unit test verifiers & a behavioural LLM judge.
*Key results*:
- My untrained Qwen3-32B agent achieved 13.75% on Terminal-Bench (#19, beats Stanford's Qwen3-235B MoE)
- I verified that training runs stably on 32x H100s distributed across 4 bare-metal nodes
- I created a mini-eval framework for LLM-judge performance. Sonnet-4 won.
- ~£30-50k needed for a full 1000-epoch training run (I could only afford testing)
*Technical details*:
- The synthetic dataset ranges from easy to extremely hard tasks. An example hard task's prompt:
"I found this mystery program at `/app/program` and I'm completely stumped. It's a stripped binary, so I have no idea what it does or how to run it properly. The program seems to expect some specific input and then produces an output, but I can't figure out what kind of input it needs. Could you help me figure out what this program requires?"
- Simple config presets allow training to run on multiple hardware setups with minimal effort.
- GRPO used with 16 rollouts per task, up to 32k tokens per rollout.
- Agent uses XML/YAML format to structure tool calls
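A rough sketch of what an XML-wrapped YAML tool call might look like (the tag and field names here are my illustration, not the repo's exact schema):

```xml
<tool_call>
name: bash
arguments:
  command: "ls -la /app"
  timeout_secs: 30
</tool_call>
```

One appeal of this style over raw JSON function calls is that the XML tags unambiguously delimit the call while YAML sidesteps JSON's strict quoting, which smaller models tend to get wrong.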
*More details*:
My GitHub repos open-source it all (agent, data, code) and have way more technical details if you're interested:
- Terminal Agent RL repo
- Multi-agent synthetic data pipeline repo
I thought I would share this because I believe long-horizon RL is going to change everybody's lives, so I feel it is important (and super fun!) for us all to share knowledge around this area and enjoy exploring what is possible.
Thanks for reading!
Dan
(Built using rLLM RL framework which was brilliant to work with, and evaluated and inspired by the great Terminal Bench benchmark)
A GitHub Action that quizzes you on a pull request #
PR Quiz uses AI to generate a quiz from a pull request and blocks you from merging until the quiz is passed. You can configure various options like the LLM model to use, max number of attempts to pass the quiz or min diff size to generate a quiz for. I found that the reasoning models, while more expensive, generated better questions from my limited testing.
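The post mentions configurable options (model, max attempts, min diff size) but doesn't show the action's actual inputs; a hypothetical workflow wiring them up (the action reference and input names are my guesses) might look like:

```yaml
name: PR Quiz
on: pull_request
jobs:
  quiz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical action reference and input names, for illustration only
      - uses: example/pr-quiz-action@v1
        with:
          model: o4-mini
          max-attempts: 3
          min-diff-size: 50
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```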
Privacy: This GitHub Action runs a local webserver and uses ngrok to serve the quiz through a temporary url. Your code is only sent to the model provider (OpenAI).
Monchromate – the best greyscale browser extension #
That's how I came up with it. I made it open source, recently passed 100 users on the Chrome Web Store, and it has a 5-star rating as of now.
You might ask why you'd need it when you can toggle greyscale via system filters. The thing is, I didn't want to greyscale my work sites as well, so I added site exclusion. It also has a scheduler and intensity control, and above all it supports all browsers, including Firefox, with the same experience.
Would love any kind of feedback over this!!
ELF Injector #
Included in the project are sample chunks as well as a step-by-step tutorial on how it works.
It's a mix of C and assembly and currently runs on 32-bit ARM though it's easy to port to other architectures.
Xorq – open compute catalog for AI #
After years of struggling with scaling compute that worked in notebooks but failed in production, we decided to do something about it. Data has standards like Iceberg and Delta. But compute is still a mess—trapped in notebooks, duplicated effort across teams, or baked into custom Airflow DAGs. We think of Xorq as the missing analog to Apache Iceberg, but for compute.
We’ve spent the last year building Xorq, a *compute catalog* that helps teams *reuse, ship, and observe* transformations, features, models, and pipelines across engines.
Xorq is built on:
- *Arrow Flight* (`do_exchange`) for high-speed data transport
- *Ibis* for cross-engine expression trees, serialized to YAML
- A portable UDF engine that compiles pipelines to SQL or Python
- `uv` to make Python environments fully reproducible
Xorq features:
- pandas-style declarative transformations, backed by Ibis
- Multi-engine execution (e.g., DuckDB, Snowflake)
- UDFs as portable Flight endpoints
- Serveable transforms by way of the flight_udxf operator
- Built-in caching and lineage tracking
- Diff-able YAML artifacts, great for CI/CD
Xorq use cases:
Since our last major release, it’s been exciting to see the first Xorq use-cases show up in the wild. All with *Python simplicity and SQL-scale performance*.
- Feature Stores (https://www.xorq.dev/blog/featurestore-to-featurehouse)
- Semantic Layers (e.g. https://github.com/boringdata/boring-semantic-layer)
- MCP + ML Integration (https://docs.xorq.dev/vignettes/mcp_flight_server)
We’re open source and learning fast. Would love feedback on what’s useful or missing. Thanks in advance for trying it out!
Check out the demo of the Xorq CLI tool in action: https://asciinema.org/a/730484
---
Get Started
- GitHub: https://github.com/xorq-labs/xorq
- Xorq docs: https://docs.xorq.dev/
---
Sneak peek – Xorq Compute Catalog UI Console:
Check out this interactive Claude demo showing how the Xorq compute catalog can be visualized to accelerate composition, reuse, and troubleshooting of AI compute: https://claude.ai/public/artifacts/d2f00d2a-a3f9-4032-884e-d...
Walk-through of rocket landing optimization paper [pdf] #
I found this rocket landing trajectory optimization paper cool, but it took me a while to wrap my head around it and implement it. I wrote up an expanded version of the paper including details that would have helped me understand it the first time through, with the idea being that it might make the content more approachable for others with similar interests. The source code is also linked in the document.
I'm open to feedback, I'm always trying to get better across the board.
Debunking Election Fraud Claims – Interactive Data Viz and Simulations #
I built this after seeing several references to Election Truth Alliance on social media, and after reading their analysis, I just couldn't get the problems I saw in it out of my head.
So I downloaded the data, and rebuilt their full analysis from scratch.
Their critical error is a simple misunderstanding of the Law of Large Numbers: as sample size grows, sample averages converge to the distribution's true mean.
(Not to be confused with the Law of Truly Large Numbers, which states that unlikely things happen given enough trials. That confused me too.)
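The convergence in question is easy to demonstrate: simulate tallies of different sizes for the same underlying vote share and watch the noise shrink as the sample grows (a generic sketch with a made-up 52% share, not their actual election data):

```python
import random

def sample_share(n, p=0.52, seed=0):
    """Fraction of n simulated ballots that go to a candidate with true share p."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

# Law of Large Numbers: larger tallies cluster tightly around the true
# share, so bigger precincts *should* look less noisy than small ones.
for n in (100, 10_000, 1_000_000):
    print(n, round(sample_share(n), 4))
```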
Technical Details:
- No build system; this is entirely handmade HTML, CSS, and plain JavaScript.
- Initial analysis done in Python with only standard libraries.
- Visualizations created in Observable Plot and D3.js.
- Simulations run entirely client-side.
- Web page built with Scrollama for animations and behavior controls.
- Vote history visualizations process ~600k individual ballot records in real time, with a little bit of caching to keep your browser from chugging.
- Made with the help of Windsurf.
Interesting Challenges:
- Making the visualizations performant without a backend, which is accomplished with a bit of preloading as you scroll, and some amount of caching so that the visualizations can share resources whenever possible.
- Windsurf does run wild sometimes. During the initial preprocessing stage, it at one point dumped an absolutely massive JSON blob to disk; it was so large it actually crashed my whole computer while writing. To read it back, it obviously couldn't be loaded in one go, but rather than storing it in a saner format, my Opus 4-powered coding agent decided to build a streaming JSON parser from scratch. It worked, and I got the data out that I needed, so I didn't go back and make it more sensible, but man, that was dumb.
This actually started with the simulation, which took only about a day of work, and then later grew to include the re-analysis and visualizations. The visualizations were all done within 2-3 days after I got the data.
If I did it over again, I'd probably have tried to find some kind of build system or static site generator to compose the final result. Once the page got very long, it was quite unwieldy even for Windsurf: very short conversations could hit Sonnet 4's rate limit because there was just so much stuff in a single file.
Maia Chess – Human-like chess AI for playing, learning, and more #
* Play Maia-2: Play the (updated) most human-like chess engine, tailored to your skill level
* Analyze your games: See how you (or the pros!) stack up with both Maia’s human-based predictions and classic Stockfish evaluation
* Try Maia-powered puzzles: Tactics puzzles curated and analyzed through Maia’s unique lens
* Openings drill: Brand new! Select openings, play through them against Maia, and get instant, personalized feedback
* Hand & Brain: Play this fun team variant where you play with Maia as a human-AI team
* Bot-or-not: A chess Turing Test: can you spot the bot in a real human-vs-bot game?
* Leaderboards: See how you rank in each mode, and challenge yourself to climb higher
We’d love your feedback: what works, what doesn’t, what’s missing, or what would make the platform more valuable for you. Join our Discord to chat with us and other users (https://discord.gg/hHb6gqFpxZ).
If you're interested in our research behind Maia, you can check out these papers:
Aligning Superhuman AI with Human Behavior: Chess as a Model System, KDD 2020
Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess, NeurIPS 2021
Learning Models of Individual Behavior in Chess, KDD 2022
Designing Skill-Compatible AI: Methodologies and Frameworks in Chess, ICLR 2024
Maia-2: A Unified Model for Human-AI Alignment in Chess, NeurIPS 2024
Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess, under review
Same prompt tested across Replit, Bolt, v0, Lovable and Raq.com #
I built Raq.com – a platform that uses Claude Code to build working internal tools directly in the browser.
Claude Code is great at self correcting when given the right tools.
I've found that the popular web-based AI coding tools look great in demos but fail on real API integrations, or require a lot of back-and-forth error fixing. They don't appear to do much research or self-correction, likely to reduce spend. I wanted to see the current state of these tools, so I ran the same prompt on five platforms (Replit, Bolt, v0, Lovable, and Raq.com) to build a tool that requires 3 different APIs (Companies House, FinUK and OpenRouter) working together.
Four platforms produced broken prototypes or needed manual fixes. Raq.com delivered a complete working solution from a single prompt (that can be deployed to live with one click).
Full test with videos: https://raq.com/real-world-test
We're in early access (requires Claude Pro/Max for free usage). We're looking for non-coders who would like to build internal tools for their team.
Some technical info:
- Raq.com provisions isolated dev and prod Docker environments for each company (companyname.raq.com and companyname-dev.raq.com).
- The dev site includes a persistent terminal streamed to the browser, so the session continues even while the tab is closed.
- A CLAUDE.md file provides best practices, known pitfalls, and coding patterns for the Laravel + Filament stack.
- Self-correction loop: Claude can test and debug its own work. It has direct shell access to a custom script that bundles PHPUnit, syntax checks, and cache clearing, plus a Playwright wrapper to check for errors and take screenshots.
- A single click runs a script that rsyncs the dev workspace to the prod container, runs migrations, and clears caches.
Agentic Coding Tools – Directory for AI agents and vibe coders #
Most of these tools can plan, scaffold, and write code with minimal input. Some are polished, some experimental. I wanted a way to compare them all in one place.
You can filter by autonomy level, LLMs used, pricing, open source, etc. It’s a compact UI—works on mobile, has dark mode, and no signups or fluff.
Would love feedback:
Are there tools I’ve missed?
Anything that should be organized differently?
Info you wish was included?
Cheers.
I built a Tamagotchi that teaches French numbers #
So I built a virtual pet that thrives on correctly answered number challenges. 20-second drills throughout the day: get them right and Lexie grows, get them wrong and it gets a bit sad (but never dies, this isn't the 90s).
Speak, type, or tap your answers. Would love any feedback/bugreports.
Or just rant about how the Belgians sensibly say "nonante-neuf" whilst we're stuck with "four-twenties-nineteen".
API Radar – Real-time GitHub scanner for exposed API keys #
I'm a solo dev and student, and I recently built API Radar — a real-time tool that monitors public GitHub commits for leaked API keys (OpenAI, Google Gemini, Anthropic Claude, and more).
What it does:
- Scans public GitHub commits in real time
- Detects API keys using pattern matching and validation heuristics
- Redacts most of the key, but allows copying for verified leaks (for security teams)
- Leaderboards by leaky repositories and exposed providers
- Built to promote developer hygiene and security awareness
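The pattern-matching-plus-redaction step can be sketched in a few lines. The provider regexes below are illustrative guesses, not API Radar's real rules; a production scanner pairs patterns like these with live validation of each candidate key:

```python
import re

# Illustrative provider patterns -- real detectors use provider-published
# key formats plus validation calls before reporting a leak.
PATTERNS = {
    "openai": re.compile(r"\bsk-[A-Za-z0-9]{20,}"),
    "google": re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),
}

def redact(key, keep=6):
    """Keep a short prefix so a leak is identifiable but not usable."""
    return key[:keep] + "*" * (len(key) - keep)

def scan(text):
    """Return (provider, redacted_key) for every candidate match."""
    return [
        (provider, redact(m.group()))
        for provider, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
```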
Stack:
- Backend: Node.js (Fastify), MongoDB, Redis, custom TruffleHog-like scanner
- Frontend: Next.js 14, TailwindCSS, shadcn/ui
- Infra: VPS, NGINX + SSL, background worker farm, rate-limit handling
Current stats (soft launch):
- 210 active users
- 208 new users
- 2.6K total events
- 53s average engagement time
Built fully solo — from design to deployment, analytics to queue resilience. My goal was to ship something fast, security-aware, and production-grade.
Would love feedback on:
- Improving UX for security teams
- Ethics around redaction and disclosure
- Ideas to scale this into an OSS tool or API service
Thanks for reading! https://apiradar.live
— Zaim
YouTubeTldw: ad‑free, login‑free YouTube summaries in a flash #
The longer a talk is, the more ad revenue a creator gets. But we don't all have 40 minutes to listen to someone slowly edge around a point.
This website has no ads, no login, and is 100% free. You can find the source code here [2].
[1] https://pypi.org/project/tldw/
[2] https://github.com/DavidZirinsky/tldw-site
I waste my time extracting stuff every week from the Internet #
So basically, I got into a process of swiping through 2,500 or so pieces of content a week, before diving deeper into the ones that interested me at first glance, which means I've eaten a lot of my time for little to no value.
Any genius (or stupid) ideas on how to do better? I'd like to continue, but as it is now, it's too time-consuming and I'll get bored soon... Of course I could automate the selection with LLMs, but that's not the point; I like human-picked stuff (although I may benefit from auto-filtering generated content if I knew how).
Thanks :)
Suggest – Ultra-low-friction feedback for your website #
We built this to get feedback from external users, but we're seeing high uptake from our own team for internal feedback. Often, we'd encounter small paper cuts in the product in the middle of another task. In the past, many of these were just not reported - the effort to create an issue and describe it in sufficient detail was too high. And if you did report, the context switch was long enough that you'd interrupt your original task. Suggest takes a literal 4 seconds to leave feedback, and because of the session replay, all feedback is very high quality - always with enough information for a developer to reproduce an issue.
Suggest works alongside your existing tools and doesn't need you to replace any of them.
Thoughts and feedback are very welcome!
TanStack DB – Reactive DB with Differential Dataflow for TanStack Query #
We’ve been working on TanStack DB, an embedded, reactive client database for TanStack Query, and are proud to announce that with the 0.1 release it's now in BETA!
TanStack DB plugs into your existing TanStack Query useQuery calls and uses Differential Dataflow to incrementally recompute only what changed, so updates stay sub-millisecond even with 100k rows. You get live queries, optimistic updates with automatic rollback, and streaming joins — all in the client!
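The differential idea behind those sub-millisecond updates is to apply only the (key, multiplicity) delta to a maintained result instead of rescanning all 100k rows. A toy of the concept (TanStack DB's actual engine is TypeScript and far more general):

```python
# Toy incremental aggregate: each change arrives as a delta with
# multiplicity +1 (row inserted) or -1 (row deleted), and only that
# delta is applied -- the collection is never rescanned.
class IncrementalCount:
    def __init__(self):
        self.counts = {}            # group key -> current count

    def apply(self, key, mult):
        """mult is +1 for an inserted row, -1 for a deleted one."""
        new = self.counts.get(key, 0) + mult
        if new:
            self.counts[key] = new
        else:
            self.counts.pop(key, None)
```

In this framing, an optimistic update is just a delta applied immediately, and a rollback is the mirror-image delta.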
TanStack DB works with REST, GraphQL, WebSockets, and shines with sync engines like ElectricSQL or Firebase, letting you load large, normalized collections once and stream real-time changes into the client without manual bookkeeping. It sits on top of queryClient so you can adopt it incrementally, one route at a time.
- Intro post: https://tanstack.com/blog/tanstack-db-0.1-the-embedded-clien...
- Local-first sync via Electric: https://electric-sql.com/blog/2025/07/29/local-first-sync-wi...
- Web starter with TanStack Start: https://github.com/electric-sql/electric/tree/main/examples/...
- Mobile starter with Expo: https://github.com/electric-sql/electric/tree/main/examples/...
- Project website and docs: https://tanstack.com/db
- GitHub repo: https://github.com/tanstack/db
Try it out and let us know what you think!
I built a deep email validation library in Kotlin #
Hey HN,
I wanted a real-world project to properly learn Kotlin (coroutines, DSLs, etc.) and decided to tackle a problem I've found surprisingly underserved: comprehensive email validation. Most solutions stop at regex, but that doesn't prevent sign-ups from [email protected] or disposable email services.
So, I built a library that performs a series of deeper checks. I just tagged the v1.0.0 release because the API is now stable and I think it's ready for feedback from the community.
It validates an email in layers:
1. Syntax: A robust check that's more reliable than a typical regex.
2. Domain Registrability: Checks the domain against the Public Suffix List to ensure it's on a real TLD.
3. MX Records: A DNS query to see if the domain is actually configured to receive email.
4. Disposable Services: Checks against a list of known temporary/throwaway email providers.
5. SMTP Connection (Optional): A live check to see if the mailbox actually exists. This is off by default since port 25 is often blocked, but can be enabled via a proxy.
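A minimal Python sketch of the layering (the real library is Kotlin and adds Public Suffix List, MX, and SMTP checks; the regex and the disposable-domain list here are simplified stand-ins):

```python
import re

# Simplified stand-ins: the real syntax check and disposable list
# are far more thorough than these.
SYNTAX = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DISPOSABLE = {"mailinator.com", "10minutemail.com"}

def validate(email):
    """Return the list of failed layers; empty means the email passes."""
    if not SYNTAX.match(email):
        return ["syntax"]           # later layers need a parseable address
    domain = email.rsplit("@", 1)[1].lower()
    return ["disposable"] if domain in DISPOSABLE else []
```

The early return mirrors the layered design: there is no point querying DNS for an address that cannot be parsed.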
One of my main goals was to build something that would be useful on both the server and on a client like an Android app. This led to a couple of key design decisions:
- It's built with coroutines for non-blocking, concurrent I/O.
- It has a full offline mode. You can disable all network checks and run it using bundled datasets for things like syntax and disposable domain checks, which is great for providing instant, client-side feedback.
The configuration is done through a simple Kotlin DSL.
The project is MIT licensed. I'm posting this to get your thoughts on the approach, the architecture, or any Kotlin idioms I might have missed. How do you all typically handle this problem beyond regex?
4KFilmDb – A tool to track and analyze 4K movies (HDR, Dolby Atmos) #
Over the past few months, I’ve been building 4KFilmDb, the first (and independent) 4K movie database to track and compare streaming quality (HDR, bitrates, Atmos audio) across platforms (Netflix, Prime Video, Disney+, etc).
Key features:
• HDR & Atmos analyzers
• Smart filters (Presets) with built-in options to suggest 4K titles or ready-made lists
• Fake HDR titles tracker (spot poor HDR grades easily)
Currently in beta, so feedback is very welcome.
MultiDrive – a free app to clone, backup, erase drives (UI/CLI) #
After 17 years of work with drives, I got tired of seeing simple disk operations locked behind paywalls. Macrium killed their free version, EaseUS hides cloning in paid tiers, and so on and so forth. Another reason is that all the existing solutions are overly complicated. It must be simple enough for my mum to start erasing a USB stick or making a full drive backup. All in all, we thought the community deserved better.
What makes MultiDrive different:
- Dead simple launch of a full drive backup, clone, erase or restore
- 100% free. No ads, no "upgrade to pro" popups
- Standard formats. Backups use ZIP or RAW, no proprietary .afi/.tib nonsense
- Handles bad sectors, loose cables, can pause/resume any operation
- Parallel drive tasks
- CLI app as an addition for workflow automation
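The bad-sector handling and resume support described above can be sketched as a chunked copy loop that zero-fills unreadable blocks and tracks its offset (a sketch of the technique, not MultiDrive's implementation):

```python
import os

def clone(src, dst, chunk=1 << 16, resume_at=0):
    """Chunked clone that zero-fills unreadable blocks and can resume
    from a previously saved offset. Illustrative, not MultiDrive code."""
    mode = "r+b" if os.path.exists(dst) else "wb"
    with open(src, "rb") as s, open(dst, mode) as d:
        s.seek(resume_at)
        d.seek(resume_at)
        offset = resume_at
        while True:
            try:
                block = s.read(chunk)
            except OSError:                 # bad sector: skip it, keep going
                block = b"\x00" * chunk
                s.seek(offset + chunk)
            if not block:
                return offset               # persist this offset to resume later
            d.write(block)
            offset += len(block)
```

Pause/resume falls out for free: stop the loop, save `offset`, and pass it back as `resume_at`.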
Would love your feedback! What problems with disk operations have you had that current tools couldn't solve? We're building our roadmap based on real pain points.
PolyglotGPT – Conversational AI for Learning 40 Languages #
To start, all you have to do is set your native language and target language. Then, just start talking to it in either your native or target language. It'll catch any mistakes you make when you speak in your target language and answer any grammar/vocab questions you have.
It has a translate button, a romanize button (converts any text into the Latin alphabet), and you can highlight words/phrases you don't know in AI responses to have them explained.
I'd appreciate any feedback, thanks!
Gogg – A GOG game downloader written in Go #
I made an open-source tool in Go, named gogg, to download and back up your GOG.com game library.
It's cross-platform and has features like:
- A scriptable CLI and an easy-to-use GUI
- Multi-threaded and resumable downloads
- Filters for platform, language, DLCs, etc.
- File verification with hashes and total size calculation
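The hash-verification step is essentially a streamed digest compare. A small sketch (GOG is commonly reported to publish MD5 checksums; treat that, and the function shape, as assumptions rather than gogg's actual code):

```python
import hashlib

def verify_file(path, expected_md5, chunk=1 << 20):
    """Stream the downloaded file in 1 MiB chunks and compare its MD5
    against the expected checksum, without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            digest.update(block)
    return digest.hexdigest() == expected_md5
```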
You can find the project on GitHub: https://github.com/habedi/gogg
BreathylBox – A lockbox that only opens when you're sober #
I'm an incoming college freshman building BreathylBox, a lockbox that stays locked unless you pass a breathalyzer test and authenticate with a passcode. It’s designed to help prevent access to car keys, firearms, or phones when someone’s been drinking.
Here’s the landing page: https://www.breathylbox.com
Right now, we’re validating demand across use cases:
Parents storing car keys after parties Gun safety in homes with teens People trying to reduce tech use while drinking
We’re not selling anything yet — just trying to see if the idea resonates and which use case to prioritize. Would love your feedback:
Would you use something like this? What should we do (or avoid) before moving to manufacturing? Any obvious legal or hardware red flags?
Thanks in advance — happy to answer any questions!
Sean Short, CEO BreathylBox ([email protected])
Railway hackathon – deploy an idea over a weekend #
Build a template for others, whether it be for full-stack apps or a headless CMS.
We've seen people deploy traditional apps or infra to host marketing blog sites (we host ours on Railway).
Up to $1,000 in prizes for project complexity or content depth.
AI agents reviewing each other's code in production [video] #
Results after 2 months:
- 98% production-ready code before human review
- 3-month features now ship in 2 weeks
- 2 developers supporting 4 platforms effectively
Video walkthrough (10 min): https://www.youtube.com/watch?v=fV__0QBmN18
Tech stack: Claude Code, CodeRabbit, Asana and Figma via MCP, custom orchestration layer.
The interesting part is watching them disagree - CodeRabbit might suggest an optimization, and Claude will defend its approach with specific reasoning about our codebase. These conversations create great documentation.
Happy to answer questions about the setup, costs, or specific implementation details.
Give Claude a secure coding env to automate work in your apps #
After building AI tools for the past year, we recently made a YouTube video on building MCP servers and realized MCP is a total game-changer. It essentially lets AI do anything by connecting to your apps. But the deeper we dove, the clearer it became that security and privacy were complete afterthoughts. Coming from backgrounds at Okta and Stripe, this made us pretty uncomfortable.
We kept seeing the same pattern: every app needs its own MCP server, each storing sensitive tokens, with minimal security controls. It felt like we were back to the early days of OAuth implementations. Functional, but scary.
How Keyboard fixes this:
- Isolated execution: Your API keys live in your own GitHub Codespace secrets, Bearer OAuth tokens in encrypted files on your machine. Your credentials stay in your trust radius
- Ephemeral environments: Codespaces can be destroyed/recreated, limiting blast radius
- Built-in access controls: GitHub's enterprise-grade security model protects your credentials
- Zero-trust architecture: Only you can access your API keys and execution environment
What makes this different:
- Real code execution: Claude can run JavaScript/Node.js with npm packages and your API credentials
- Reusable workflows: Save complex scripts as "Keyboard Shortcuts" for instant reuse
- Universal integration: One setup connects Linear, Slack, Google Workspace, GitHub, and more
- Auto-environment management: Codespaces created/managed automatically as needed
The GitHub Codespace approach came from experimental work with interactive documentation. We realized Codespaces might be the most secure place to execute these tasks - isolated, ephemeral, with enterprise-grade controls.
We need your help: If this resonates, give us a star on GitHub! We're looking for early users and contributors who want to help make MCP more powerful and more secure.
We'd love your feedback, especially if you've been experimenting with MCP yourself!
If you want to try it here is the quickstart: https://docs.keyboard.dev/getting-started/quickstart
SBoMPlay – Client side SBoM explorer #
Open source, work-in-progress code; please share your feedback.
Dart implementation of the libp2p networking stack #
I built a Vue dependency debugger plugin #
StoxGPT – type "add RSI" and the indicator appears on the chart #
I built *StoxGPT*, a TradingView-powered chart where you control everything by chat. Example:
> *You*: add RSI
> *Chart*: (RSI indicator appears)
> *You*: change ticker to AMZN
> *Chart*: (switches symbols)
No menus, no hotkeys—just natural language mapped to the TradingView JS API.
---
### Why? I got tired of drilling through panels to add indicators or tweak settings. A chatbot front-end felt faster, so I wired GPT-3.5-Turbo to TradingView’s `widget.activeChart()` calls.
---
### How it works
* *React + Next.js* front-end
* *Dummy OHLCV generator* (open-source) for this demo
* Simple command grammar → LLM → function-calling layer → TradingView injection
* Hosted on Vercel; cold start ~400 ms
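A toy version of the command-grammar routing layer, before anything falls through to the LLM (the method names mirror TradingView-style calls like `createStudy`/`setSymbol` but are illustrative here, and the real flow runs through GPT function calling rather than prefix matching):

```python
# Prefix -> chart-call table; unrecognized messages return None so the
# caller can hand them to the LLM's function-calling layer instead.
ROUTES = [
    ("add rsi",           {"method": "createStudy", "args": ["RSI"]}),
    ("change ticker to ", {"method": "setSymbol"}),
]

def route(message):
    msg = message.lower().strip()
    for prefix, call in ROUTES:
        if msg.startswith(prefix):
            rest = message.strip()[len(prefix):].strip()
            args = call.get("args", []) + ([rest.upper()] if rest else [])
            return {"method": call["method"], "args": args}
    return None   # fall through to the LLM
```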
---
### What I’m exploring next
* *Plaid auth* → live Robinhood balances & fundamentals
* Back-testing via OpenAI function calls
* Sharing indicator “recipes” between users
---
### Looking for feedback
* Would live data make this a daily driver?
* Any killer feature missing?
* Is the chat modality actually faster for you?
Thanks in advance—happy to answer anything!
Faster local AWS EKS access #
CineWan – video generation platform powered by Wan2.2 AI model #
I've been working on CineWan, an AI video generation platform that leverages the new Wan2.2 models with Mixture-of-Experts (MoE) architecture.
Technical highlights:
• MoE architecture separates denoising across timesteps with specialized expert models
• Dynamic routing system selects experts based on content complexity
• Generates up to 720p, 121-frame videos from text or images
• Built on Next.js 15 with edge runtime for <50ms global response times
• Smart cost optimization: Cloudflare R2 storage with 3-day auto-expiration
• Real-time progress streaming with exponential backoff polling
The Wan2.2 models were trained on 65% more images and 83% more videos than v2.1, with integrated cinematography principles. We're seeing cinema-grade output quality that rivals much more expensive solutions.
Benchmax, a new open-source RL environment framework for LLM finetuning #
I’ve been working on `benchmax`, an open-source framework for building, running, and parallelizing environments to fine-tune LLMs with reinforcement learning.
What I wanted to solve for:
- Environments are tightly coupled with RL trainers, leading to fragmentation and limited compatibility.
- These coupled environments tend to be mostly competitive math and coding → for OSS RL + LLMs to scale, we need more complex, real-world environments.
- Scaling these environments in parallel is still not easily possible
What I'm excited about:
- benchmax is training-framework agnostic, with adapters already built out for verl and verifiers. We're gonna build more adapters for other frameworks (e.g. SkyRL), instead of forcing others to adopt our standard (though ofc they're welcome to)
- benchmax comes with a few interesting environments out of the box: spreadsheet processing, CRM, etc. → more coming soon!
- benchmax supports MCP as a first-class citizen. There has been an explosion of MCP servers/tools built for use cases ranging from browser use to Excel to game creation. `benchmax` allows folks to leverage and compose these existing MCP servers to build environments integrated with real-world systems
- Multi-node environment parallelization coming soon!
If you like what you see, feel free to *star* the *repo* to support the project! Our hope is to let anyone benchmax on their tasks, with benchmax.
https://github.com/cgftinc/benchmax
It’s still very early! I expect to ship a lot more → more environments, more trainer integrations. Would love y’all’s thoughts on what environments and trainer integrations could be interesting!
I built an API to generate PDF invoices from JSON #
I'm Daniel. I built a simple and straightforward API: you POST a JSON payload with your invoice data, and it returns a secure, presigned URL to a generated PDF. The goal is to make invoicing a single, reliable API call so you can get back to your main product.
I also used this as a personal challenge to move away from my old LAMP stack background and build something new with Python/FastAPI, Next.js, and a serverless architecture on GCP and AWS.
For the HN community, I've set up a promo code: HEYHN100
If you sign up, you can redeem it in your dashboard for 100 free credits (on top of the 10 you get by default). The credits don't expire.
I'm here to answer any questions and would genuinely appreciate any feedback or technical critiques you have. Thanks for checking it out.
GenDB – I built a tool to generate a full database from a single prompt #
I recently built a prototype called GenDB, an AI-powered backend builder designed to eliminate boilerplate and streamline database deployment.
The idea came out of my own frustrations: I was spending hours writing and rewriting Python code with TortoiseORM just to build and modify basic schemas. Then I’d have to deploy it all over again for even small changes. After seeing tools like Lovable and Cursor make front-end development nearly effortless, I started to wonder: why wasn’t backend development just as fluid?
With GenDB, you can:
- Prompt a schema (e.g., “Instagram clone”) via natural language or image
- Edit it visually using a DBML-based ERD editor
- One-click deploy to GCP or AWS
- (Coming soon) Auto-generate APIs and safe migration scripts
The goal is to go from idea → schema → live backend in minutes, without writing boilerplate or fiddling with cloud infrastructure too much.
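For readers unfamiliar with DBML, the intermediate representation such a prompt might round-trip through looks like this. This is a hand-written sketch of what an “Instagram clone” schema could start as, not GenDB's actual output:

```dbml
Table users {
  id int [pk, increment]
  username varchar [unique, not null]
  created_at timestamp
}

Table posts {
  id int [pk, increment]
  user_id int [ref: > users.id]   // each post belongs to one user
  caption text
  created_at timestamp
}

Table follows {
  follower_id int [ref: > users.id]
  followee_id int [ref: > users.id]
}
```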
It’s still an early prototype, and I’d love your feedback, especially if you’ve run into similar pain points. What seems useful? What’s missing? Where will this fall apart?
Demo + details here: https://gendb.carrd.co/
Thanks for taking a look!