Daily Show HN


Show HN for October 3, 2025

22 posts
75

FLE v0.3 – Claude Code Plays Factorio #

jackhopkins.github.io
17 comments · 7:32 PM · View on HN
We're excited to release v0.3.0 of the Factorio Learning Environment (FLE), an open-source environment for evaluating AI agents on long-horizon planning, spatial reasoning, and automation tasks.

== What is FLE? ==

FLE uses the game Factorio to test whether AI can handle complex, open-ended engineering challenges. Agents write Python code to build automated factories, progressing from simple resource extraction (~30 units/min) to sophisticated production chains (millions of units/sec).

== What's new in 0.3.0 ==

- Headless scaling: No longer needs the game client, enabling massive parallelization!

- OpenAI Gym compatibility: Standard interface for RL research

- Claude Code integration: We're livestreaming Claude playing Factorio [on Twitch](http://twitch.tv/playsfactorio)

- Better tooling and SDK: 1-line CLI commands to run evaluations (with W&B logging)
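The Gym compatibility mentioned above presumably means the standard reset/step loop. Here is a minimal sketch of that loop; the real FLE env id, observation contents, and action format are assumptions, so a stub environment stands in to keep the sketch runnable:

```python
# A minimal sketch of the Gym-style loop the post describes. The real FLE
# env id, observation contents, and action format are assumptions here; a
# stub environment stands in so the loop shape stays concrete and runnable.
class StubFactorioEnv:
    """Stand-in exposing the reset/step interface of Gym-compatible envs."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"inventory": {}, "production": 0}, {}

    def step(self, action):
        # In FLE the action would be Python code the agent wants to execute.
        self.steps += 1
        obs = {"inventory": {}, "production": self.steps * 30}
        reward = 30.0  # e.g. units/min of throughput gained
        terminated = self.steps >= 3
        return obs, reward, terminated, False, {}


env = StubFactorioEnv()
obs, info = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step("craft_iron_plates()")
    total += reward
    done = terminated or truncated
print(total)  # 90.0 after three steps
```

An RL training harness would wrap the same loop, which is what the OpenAI Gym compatibility is for.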

== Key findings ==

We evaluated frontier models (Claude Opus 4.1, GPT-5, Gemini 2.5 Pro, Grok 4) on 24 production automation tasks of increasing complexity.

Even the best models struggle:

- Most models still rely on semi-manual strategies rather than true automation

- Agents rarely define helper functions or abstractions, limiting their ability to scale

- Error recovery remains difficult – agents often get stuck in repetitive failure loops

The performance gap between models on FLE correlates more closely with real-world task benchmarks (like GDPVal) than with traditional coding/reasoning evals.

== Why this matters ==

Unlike benchmarks based on exams that saturate quickly, Factorio's exponential complexity scaling means there's effectively no performance ceiling. The skills needed - system debugging, constraint satisfaction, logistics optimization - transfer directly to real challenges.

== Try it yourself ==

$ uv add factorio-learning-environment

$ uv add "factorio-learning-environment[eval]"

$ fle cluster start

$ fle eval --config configs/gym_run_config.json

We're looking for researchers, engineers, and modders interested in pushing the boundaries of agent capabilities. Join our Discord if you want to contribute. We look forward to meeting you and seeing what you can build!

-- FLE Team

22

BetterBrain – Dementia prevention, covered by insurance #

betterbrain.com
5 comments · 2:03 AM · View on HN
Hey all! I’ve been building BetterBrain for the past few months; it’s the first dementia prevention program entirely covered by insurance. BetterBrain combines expert clinicians, comprehensive testing, and state-of-the-art AI, and for many insurance plans it costs $0. Research shows that dementia can be detected up to 20 years in advance. Despite this, many people at risk of dementia overlook regular brain health assessments. Many members of our founding team have family members affected by neurodegenerative disease.

We’re also hiring aggressively if anyone is interested in changing the future of treating neurodegenerative disease.

Would love to talk to anyone interested https://www.betterbrain.com/insurance

11

A visual AI interface to understand papers/books/topics #

kerns.ai
2 comments · 10:01 PM · View on HN
I feel like LLMs can help me understand anything. However, after I get a summary, I can't dive into the parts I find interesting, can't refer to the original source easily, and can't control context with chatbots. This is an attempt to build a complete knowledge-consumption experience with AI. Please give me feedback!
11

Powerful Visual Programming Language (Book) #

pipelang.com
15 comments · 9:42 PM · View on HN
Throughout my 30+ year software development career, after many sleepless nights spent digging through enormous codebases to understand logic or fix a bug, I kept thinking: "There must be a better, visual way to represent a program than text." Yet no usable visual programming language appeared on the horizon during those 30+ years. So I decided to take matters into my own hands and create a new visual programming language called "Pipe". A book about the language was published recently and is available for free on Amazon Kindle and Apple iBooks.

Pipe offers a level of sophistication and power comparable to the most powerful existing textual languages, and therefore has a very good chance of competing successfully with text-based programming. The book provides a full and comprehensive language specification, and on top of that it describes many features and ideas planned for future versions of the language.

Pipe implements many novel concepts and unique features; as a result, multiple patent applications have been filed and are pending. The book covers the graphical notation of all language elements and a full API specification for code integration. Pipe has the following features:

* General-purpose visual language.

* Compact but powerful language.

* Complete and detailed language specification.

* Practical visual language.

* API specification for integration with non-visual languages.

* Statically-typed language.

* Long-term plans for future versions.

* Augmentation of AI code generation.

* Language for the next generation of low-code systems.

The problem with AI code generation is that it is very difficult to prepare complete and precise input specifications, especially for a large project. One solution is to generate code only for base-level components that are easy to explain to AI, and to complete the rest of the application via manual coding. That, however, undermines the goal of using AI to remove the need for human programming. Pipe provides an alternative to textual coding by encapsulating AI-generated components within visual blocks and building the rest of the application as graphical workflows via an intuitive drag-and-drop interface. As a next step in Pipe's evolution, AI will generate complete visual workflows directly, making it much easier to understand and modify the generated logic.

Using the general-purpose visual language Pipe to connect blocks containing AI-generated code could inspire the next generation of extremely versatile low-code platforms: AI code generation followed by visual integration of the generated components is a very powerful low-code framework. Users will be able to generate new components with AI, which addresses the limited customization of existing low-code platforms, where components are mostly predefined. On top of that, a common visual programming language ensures portability of low-code projects between platforms.

7

Pluqqy – Terminal based context management tool for AI coding #

github.com
5 comments · 2:08 AM · View on HN
I vibe-coded a terminal tool called Pluqqy (I had a dormant domain on hand) to help me keep LLM context organized while coding with AI. It’s my first time writing Go and my first terminal app, built almost entirely with Claude Code.

• What it does: Pluqqy lets you manage prompts, rules, and context as small building blocks, then stitch them together into a single file (like AGENT.md or CLAUDE.md) that your coding agent can consume. It’s meant to reduce context drift and make iteration easier.

• Why I built it: I was losing track of my agent context between sessions and wanted something lightweight, reproducible, and terminal-native.

• Status: This is more of an experiment / thought-tool than a maintained project. It works on Mac; Windows/Linux haven’t been tested much.

• Install: go install github.com/pluqqy/pluqqy-terminal/cmd/pluqqy@latest

• Landing page: https://pluqqy.com (just had fun with it)
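The stitching step described above (small building blocks combined into one context file) can be sketched as plain concatenation. This is a hypothetical illustration; the file name and snippet layout are illustrative, not Pluqqy's actual on-disk format:

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of the "stitch building blocks into one file" step:
# concatenate small component snippets into a single CLAUDE.md. The file
# name and snippet layout are illustrative, not Pluqqy's actual format.
def stitch(components: list[str], out_path: Path) -> str:
    """Join component snippets into one context file for the coding agent."""
    doc = "\n\n".join(s.strip() for s in components) + "\n"
    out_path.write_text(doc)
    return doc


parts = ["# Rules\nBe concise.", "# Context\nThis repo is a Go CLI."]
with tempfile.TemporaryDirectory() as d:
    doc = stitch(parts, Path(d) / "CLAUDE.md")
print(doc.count("# "))  # 2: one heading per component
```

Keeping the pieces small and recombining them per session is what makes the context reproducible.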

6

API for removing watermarks from Sora 2 videos #

cliploom.app
0 comments · 3:54 PM · View on HN
Computer vision for detection, advanced inpainting for removal, FFmpeg for audio handling. Simple REST endpoints with webhook callbacks for async processing.

Built this after seeing developers struggle to build their own ML pipelines for video post-processing. The API handles the complexity: you just POST a video and get back a clean file.
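The POST-then-webhook flow can be sketched client-side like this. Everything here is an assumption for illustration (payload fields, signature scheme), not the actual API:

```python
import hashlib
import hmac
import json

# Hypothetical client-side helpers for an async video-processing API: submit
# a job, then authenticate the webhook callback. The payload fields and the
# HMAC-SHA256 signature scheme are assumptions, not the actual API.
def build_job_request(video_url: str, callback_url: str) -> dict:
    """Body you'd POST to the (assumed) job-creation endpoint."""
    return {"video_url": video_url, "callback_url": callback_url}


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw webhook body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


body = json.dumps({"job_id": "abc123", "status": "done"}).encode()
sig = hmac.new(b"shared-secret", body, hashlib.sha256).hexdigest()
print(verify_webhook(b"shared-secret", body, sig))  # True
```

Verifying the callback matters because the webhook endpoint is otherwise open to forged "job done" notifications.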

4

BodhiGPT – Become a Better Human with AI #

bodhigpt.com
1 comment · 2:01 PM · View on HN
Hey HN — Like many here, watching the capabilities of LLMs and AI advance over the last few years has been the most exciting stretch of my tech/data science career.

And while I am not an AI doomer, it’s become clear to me that as we become more reliant on AI to automate larger and larger portions of our lives, certain aspects of our humanity become even more important to develop (consciousness and awareness, mental and physical health, personal knowledge to form your own perspectives on topics, etc) and AI can actually help us do just that in ways that haven’t been easy or even possible before.

So I built BodhiGPT for myself to do that in as streamlined a way as possible. I didn’t originally intend to share it, but it’s become such an integral part of my daily routine that I decided to launch it as a side project. Feedback welcome, and I’m going to continue to add more tools over time (I already have a few in mind, but would be curious whether folks have other ideas within this theme). Cheers.

4

Beacon (open source) – Built after AWS billed me 700% more for RDS #

beaconinfra.dev
0 comments · 8:22 PM · View on HN
I had been hosting my side project on AWS, paying an acceptable price for not managing infrastructure at all. I moved everything to AWS Lightsail after my startup credits ran out. The project was initially a success and made several thousand euros per month in revenue. Then came Covid with new regulations, and suddenly my customers were nonexistent (the problem the project solved was no longer there). After that it wasn't making money; I was paying for it out of my own pocket, thinking maybe it would come back. Then one day, after some AWS emails I'd ignored as spam, I got a huge charge on my card, along with a bill from AWS, orders of magnitude higher than the previous charges. "WTF??" I said to myself while rushing to log into the dashboard to see what the issue was.

No DDoS, no misconfiguration, nothing unusual. I logged into the root account to look at the billing page, and there it was: an RDS PostgreSQL legacy fee of ~€200 because I had not upgraded to Postgres 16 (from 13).

I was baffled. I was paying €25 a month (27% tax included) for the smallest RDS instance, and now this monster fee for something I think should cost maybe €2. I mean, AWS just has to run it in a different environment. For €200 I could buy them a new server to run it for me.

That's when I had the realization: "I have a spare Raspberry Pi 3, I'll just host everything on that. That will be free." But self-hosting came with its own challenges, especially on a resource-constrained device. I needed better tools to deploy and monitor my application. SSH-ing into the Raspberry Pi for every deployment was a pain, and so was debugging issues. Existing deployment and monitoring solutions were either too expensive, too complex, or didn't work well on resource-constrained devices like the Raspberry Pi. Examples:

* Grafana/Prometheus for monitoring: over-engineered for my needs.

* OpenSearch/ELK for logs: a nightmare on low-resource devices.

* Metabase for dashboards: a RAM-hungry monster that eats more resources than 100 hosted applications would. For remote database access, opening a port behind Cloudflare Zero Trust is much easier than setting up Metabase.

So I decided to build my own deployment and monitoring agent, and why not make it open source? The agent can currently deploy applications from GitHub by polling release tags, monitor device metrics, alert when thresholds are reached, and forward logs to a cloud dashboard. It's still in development, with features improving every week. If you are interested, give it a star on GitHub.
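The poll-release-tags deployment the agent performs boils down to a simple comparison loop. A sketch of the decision logic, with the actual tag fetch left as a comment so the snippet stays runnable offline:

```python
# Sketch of the poll-release-tags loop the agent uses (per the post). Only
# the decision logic is shown; fetching the latest tag (e.g. from GitHub's
# /releases/latest endpoint) is left as a comment to keep this runnable.
def needs_deploy(current_tag: str, latest_tag: str) -> bool:
    """Deploy whenever a non-empty latest tag differs from what's running."""
    return bool(latest_tag) and latest_tag != current_tag


# In the real agent, latest_tag would come from polling, e.g.:
#   GET https://api.github.com/repos/<owner>/<repo>/releases/latest
print(needs_deploy("v1.0.0", "v1.1.0"))  # True: new release, redeploy
print(needs_deploy("v1.1.0", "v1.1.0"))  # False: already up to date
```

Polling release tags rather than listening for push events keeps the device-side agent stateless and firewall-friendly.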

4

Real-time app starter with FastAPI, PostgreSQL pub/sub, and UV #

1 comment · 3:12 PM · View on HN
Starter template for real-time web apps using modern Python/JS tooling and PostgreSQL's LISTEN/NOTIFY instead of external message queues.

Stack:

- UV (Python package manager, incredibly fast)

- FastAPI with full async/await

- PostgreSQL triggers + LISTEN/NOTIFY for pub/sub

- Bun for frontend builds

- Proper connection pooling and lifecycle management

GitHub: https://github.com/garage44/plank

This came from rebuilding the same pattern across projects. Most examples I found were toy demos that didn't handle reconnection, dead clients, or proper shutdown.

Includes a working frontend example that updates in real time when the database changes, plus a Docker Compose setup for testing.

Good for: admin dashboards, monitoring tools, collaborative apps where you just need current state pushed immediately. Not for: guaranteed message delivery or job queues.
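The LISTEN/NOTIFY pattern the template builds on can be sketched like this. Table, channel, and payload fields are illustrative; the template's actual trigger and handler may differ. The trigger SQL is kept as a string so the Python side stays runnable without a database:

```python
import json

# Sketch of the LISTEN/NOTIFY pattern: a trigger calls pg_notify() on each
# row change, and the app decodes the payload and pushes it to clients.
# Table, channel, and payload fields here are illustrative.
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION notify_items() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('items_changed',
                    json_build_object('op', TG_OP, 'id', NEW.id)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
"""


def parse_notification(payload: str) -> dict:
    """Decode a NOTIFY payload into the message pushed to websocket clients."""
    event = json.loads(payload)
    return {"type": "db_change", "op": event["op"], "id": event["id"]}


print(parse_notification('{"op": "INSERT", "id": 7}'))
```

Because NOTIFY only fires on commit and payloads are size-limited, this pattern suits "push current state" use cases rather than durable queues, which matches the template's stated scope.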

4

A VS Code Extension for Genesis DB – The event sourcing database #

genesisdb.io
0 comments · 7:42 PM · View on HN
I've been working on an extension that brings the full Genesis DB experience directly into Visual Studio Code. Genesis DB itself is a production-ready event-sourcing database engine, and with this extension you can now:

- Manage multiple connections (dev, staging, prod) with token-based authentication

- Explore events via a built-in Event Explorer UI

- Commit new events, run GDBQL queries, and instantly view results

- Manage schemas (register, browse, validate) directly in the editor

- Use built-in GDPR features like event erasure, without leaving VS Code

The goal: no more context switching between CLI, APIs, and docs; everything is integrated into the editor you already use.

You can check out the extension here: https://marketplace.visualstudio.com/items?itemName=patricec...

I'd love to hear what you think: Is this something that would help your workflow? What would you expect from a database extension in VS Code?

4

Was pissed about Google Docs, so I made a text editor myself #

sourcepilot.co
2 comments · 7:58 PM · View on HN
It’s been a while since I started writing a book. The process hasn’t been easy, first because I’m not a writer. I had created well-thought-out internet posts here and there, which ended up becoming my first book. It was a good experience, but then I started to think that a book that just gathered my thoughts online isn’t entirely “writing” a book; I needed more. So I opened Google Docs and started typing.

Then I started to figure out what I wanted to write: should it be a fantasy story, a self-biography, or an observation of the world? I believe most writers have this figured out beforehand, but not me. I began writing pieces to see if they would fit together and make sense. I started gathering philosophical anecdotes based on my core beliefs and sensed something brewing. When I finally decided what the book would be about, what I wanted to write, and the type of writing I wanted to do, I saw an already sizable document with ideas scattered throughout it. That was good, as I could just join the pieces, but I didn’t want to be trapped in writing that could be repetitive. I wanted the ideas, the philosophy, the whole reason the book is what it is, stored in a place I could easily access.

I planned to use AI as a memory dump, where I could add information during a conversation. Then, whenever I consulted it, I could check whether I had already written something and whether it reflected the temper and pace I wanted for my book. Everything seemed fine, but I ran into a few problems. First, the AI’s writing was a conundrum of errors. I could get assistance and a sense of what to write, but the AI itself, over our prolonged exchange, started to hallucinate and produce nonsense or “forget” our conversation. Second, the AI couldn’t consistently verify what was already written. As the text grew larger, the context window began to shrink, and the more I used the AI tool, the less helpful it became.
So I decided to search for a tool that could do what I wanted. I found elements in each of the products I used: some were extremely satisfying to write with, others had good features to enhance text, some let me organize my book by scattering ideas effectively, and still others used AI for correction and proofreading tasks. The solutions in this market are diverse and offer numerous approaches. I could easily move between tools, but I wanted something unified to keep my writing process in one place.

That’s why I created this text editor and called it SourcePilot. It identifies your writing style as you write, lets you add notes, sources, and videos, and uses them as context for the AI, enabling more nuanced outputs tailored to your writing. It was interesting to build, and I’m providing a link so you can try it. It’s a desktop app, and you can use it for free, depending on the hardware you have.

I’m looking for people who can give me feedback on what’s wrong with it: people who couldn’t install it (I built it on Mac and couldn’t test Linux and Windows), or who have problems logging in. I keep finding plenty of problems because I’m using the tool right now as I write this text. I’m planning to launch a new version soon, featuring an anti-slop algorithm I’ve developed, along with document branching. I just want to see if there are people interested in using it at the moment. If there aren’t users, that’s fine; I think I’ve made something for myself anyway. :) Thank you for your attention if you made it this far. You’re greatly appreciated. Cheers!
2

Lootbox – CLI that unifies MCP and custom functions for Claude Code #

github.com
0 comments · 9:09 PM · View on HN
Hey HN! I built a CLI that unifies MCP tools and your custom functions under one code-execution interface. Think serverless functions for your coding assistant, but on your local machine.

Installation is a one-liner curl script (see README)

Instead of configuring your AI with dozens of individual tools, drop a .ts file in a directory and give the LLM one capability: execute TypeScript with lootbox.

  // my-functions/example.ts
  export async function analyzeText(args: { text: string }) {
    return {
      length: args.text.length,
      words: args.text.split(' ').length,
      uppercase: args.text.toUpperCase()
    };
  }

  # Run server with your functions + MCP servers
  lootbox-runtime --rpc-dir ./my-functions --mcp-config mcp.json

  # AI writes code that uses both:
  lootbox -e '
    const file = await tools.mcp_github.read_file({repo: "x/y", path: "README.md"});
    const analysis = await tools.myapp.analyzeText({text: file});
    console.log(analysis);
  '
Your custom TypeScript functions are auto-discovered alongside your MCP servers and turned into a fully typed 'tools' object that the AI can use.

AI gets full type definitions for everything. Writes code that chains operations together instead of doing sequential tool calls.

LLM scripts are executed in a Deno Sandbox with only net access. RPC files get full access.

Based on Cloudflare's Code Mode research but completely local.

Check the README for some sample rpc files, a workflow, and a much deeper dive on how it all works.

Typically Claude Code will use

  lootbox --help
  lootbox --namespaces
  lootbox --types kv,sqlite  # returns the types from the typed client
And will then start writing a script to orchestrate the tools to accomplish the goal.

Lootbox can also run files, so you can tell Claude to save the script as a file and later just run it with

  lootbox path/to/script.ts
Built it after continuing to experiment and play with my original take on Code Mode.

https://github.com/jx-codes/lootbox

Original Take: https://github.com/jx-codes/codemode-mcp

1

SAI – A Reinforcement Learning Competition Platform #

competesai.com
0 comments · 7:47 PM · View on HN
We’ve been building something for the reinforcement learning (RL) community in stealth for the past two years, and I'm really excited to finally be able to share it here.

Before diving into what we’ve built, here’s the baseline we started from:

- AGI won’t emerge from isolated algorithms. It will require a shared ecosystem where researchers can train, benchmark, and learn together in open environments.

- We believe RL is the most promising pathway toward general intelligence.

- Most RL researchers are still publishing results in isolation, on tasks that can’t easily be compared.

So, we built SAI, an RL competition platform designed to make RL progress more accessible, standardized, and measurable.

SAI is a platform where you can train, benchmark, and submit models to a global leaderboard, a proving ground for reproducible RL research:

- Competitions designed to surface real research challenges (generalization, transfer, and adaptation)

- Infrastructure for reproducible experiments and shared results

- Community through discussion forums, visible progress and collaboration

With SAI live, the next step is competition, and our second one launches October 6: the Booster Soccer Showdown, in partnership with Booster Robotics.

The challenge itself asks a core AGI question in miniature:

Can one agent generalize across different environments without per-task tuning?

Competitors will need to train a humanoid soccer agent to succeed at three related tasks - testing policies for adaptability, transfer, and generalization, the very qualities real-world intelligence requires.

If you’re into RL or just curious about ML, feel free to try out the platform. All feedback and ideas are welcome!

Platform: https://competesai.com/

Booster Soccer Showdown: https://competesai.com/competitions/cmp_xnSCxcJXQclQ