Daily Show HN

Show HN for October 28, 2025

34 posts
122

Pipelex – declarative language for repeatable AI workflows (MIT) #

github.com
27 comments · 4:19 PM · View on HN
We’re Robin, Louis, and Thomas. Pipelex is a DSL and a Python runtime for repeatable AI workflows. Think Dockerfile/SQL for multi-step LLM pipelines: you declare steps and interfaces; any model/provider can fill them.

Why this instead of yet another workflow builder?

- Declarative, not glue code: you state what to do; the runtime figures out how.
- Agent-first: each step carries natural-language context (purpose, inputs/outputs with meaning) so LLMs can follow, audit, and optimize. Our MCP server enables agents not only to run pipelines but also to build new ones on demand.
- Open standard under MIT: language spec, runtime, API server, editor extensions, MCP server, n8n node.
- Composable: pipes can call other pipes, created by you or shared by the community.

Why a domain-specific language?

- We need context, meaning, and nuance preserved in a structured syntax that both humans and LLMs can understand.
- We need determinism, control, and reproducibility that pure prompts can't deliver.
- Bonus: editors, diffs, semantic coloring, easy sharing, search & replace, version control, linters…

How we got there:

Initially, we just wanted to solve every use-case with LLMs but kept rebuilding the same agentic patterns across different projects. So we challenged ourselves to keep the code generic and separate from use-case specifics, which meant modeling workflows from the relevant knowledge and know-how.

Unlike existing code/no-code frameworks for AI workflows, our abstraction layer doesn't wrap APIs; it transcribes business logic into a structured, unambiguous script executable by both software and AI. Hence the "declarative" aspect: the script says what should be done, not how to do it. It's like a Dockerfile or SQL for AI workflows.

Additionally, we wanted the language to be LLM-friendly. Classic programming languages hide logic and context in variable names, functions, and comments: all invisible to the interpreter. In Pipelex, these elements are explicitly stated in natural language, giving AI full visibility: it's all logic and context, with minimal syntax.
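As a toy illustration of that idea (a hypothetical Python sketch; `Pipe`, its fields, and the stub runtime are invented for illustration and are not actual Pipelex syntax), a step can carry its purpose and the meaning of its inputs and outputs as first-class, model-visible data:

```python
from dataclasses import dataclass, field

@dataclass
class Pipe:
    """Hypothetical declarative step: the logic and context are explicit
    data, visible to the runtime and to an LLM, not hidden in code."""
    purpose: str                                  # natural-language intent
    inputs: dict = field(default_factory=dict)    # input name -> its meaning
    output: str = ""                              # meaning of the result

summarize = Pipe(
    purpose="Condense a support ticket into a one-line triage summary",
    inputs={"ticket": "raw customer support ticket text"},
    output="one-line summary suitable for a triage queue",
)

# Any model/provider could "fill" this interface; here, a stub runtime:
def run(pipe: Pipe, **kwargs) -> str:
    return f"[{pipe.purpose}] <- {sorted(kwargs)}"

print(run(summarize, ticket="App crashes on login"))
```

Because the step is data rather than code, a different model or provider can be swapped in without touching the declaration.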

Then, we didn't want to write Pipelex scripts ourselves so we dogfooded: we built a Pipelex workflow that writes Pipelex workflows. It's in the MCP and CLI: "pipelex build pipe '…'" runs a multi-step, structured generation flow that produces a validated workflow ready to execute with "pipelex run". Then you can iterate on it yourself or with any coding agent.

What’s included: Python library, FastAPI and Docker, MCP server, n8n node, VS Code extension.

What we’d like from you

1. Build a workflow: did the language work for you or against you?
2. Tell us about agent/MCP workflows and n8n node usability.
3. Suggest new kinds of pipes and other AI models we could integrate.
4. We're looking for OSS contributors to the core library, and for people to share pipes with the community.

Known limitations

- Connectors: Pipelex doesn't integrate with "your apps"; we focus on the cognitive steps. You can integrate through code/API, or via MCP or n8n.
- Visualization: we still need to generate flow charts.
- The pipe builder is still buggy.
- Run it yourself: we don't yet provide a hosted Pipelex API; it's in the works.
- Cost tracking: we only track LLM costs, not image-generation or OCR costs yet.
- Caching and reasoning options: not supported yet.

Links

- GitHub: https://github.com/Pipelex/pipelex
- Cookbook: https://github.com/Pipelex/pipelex-cookbook
- Starter: https://github.com/Pipelex/pipelex-starter
- VS Code extension: https://github.com/Pipelex/vscode-pipelex
- Docs: https://docs.pipelex.com
- Demo video (2 min): https://youtu.be/dBigQa8M8pQ
- Discord for support and sharing: https://go.pipelex.com/discord

Thanks for reading. If you try Pipelex, tell us exactly where it hurts; that's the most valuable feedback we can get.

85

Research Hacker News, ArXiv & Google with Hierarchical Bayesian Models #

sturdystatistics.com
23 comments · 3:49 PM · View on HN
Hi Hacker News! I'm a Bayesian statistician who has been applying hierarchical mixture models (originally developed for genomics) to structuring text data. In the process, I used these models to build what started as a personal tool for conducting literature reviews and deep research.

My literature review process starts with a broad search to find a few key papers/groups, and from there expands along their citation networks. I needed to conduct a few rounds of literature reviews during the course of my research and decided to build a tool to facilitate this process. The tool started as an experimental wrapper over low-level statistical software in C, quickly became a testing/iteration ground for our API, and is now my personal go-to for lit reviews.

The tool organizes corpora of text content, visualizes the high-level themes, and lets me pull up relevant excerpts. Unlike LLMs, this model transparently organizes the data and can train from scratch quickly on small datasets to learn custom hierarchical taxonomies. My favorite part of the tool is the citation network integration: any research paper it pulls up has a "Citation Network Deep Dive" button that pulls every paper that cites or is cited by the original paper and organizes it for further exploration.

I initially built this tool for academic research, but ended up extending it to mine Hacker News for technical conversations, the top 200 Google results, and earnings transcripts. We have a gallery of ready-to-explore results on the homepage. If you are kicking off a custom deep dive, it takes about 1-5 minutes for academic search, 3-7 minutes for Hacker News, and 5-10 minutes for Google. To demonstrate the process, I put together a video walkthrough of a short literature review I conducted on AI hallucinations: https://www.youtube.com/watch?v=OUmDPAcK6Ns

I host this tool on my company’s website, free for personal use. I’d love to know if the HN community finds it useful (or to hear what breaks)!

68

Tamagotchi P1 for FPGAs #

github.com
11 comments · 7:34 PM · View on HN
After being thrust headfirst into FPGA development thanks to the Analogue Pocket, my first from-scratch creation was a gate-level implementation of the original Tamagotchi toy.

The core, running on both the Analogue Pocket and MiSTer platforms, lets users re-experience the very first Tamagotchi from 1996 with accurate emulation but modern features. The core has savestates (which are much harder to do in hardware than in software emulation), high turbo speeds (1,800x is the max clock speed I've reached so far), and more.

Learning more about hardware and FPGAs is something I've wanted to do for many years, and I highly recommend it for any programmer-brained person. It's a very slightly different way of thinking that has vast consequences on how you look at simple problems.

67

Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf #

fory.apache.org
59 comments · 5:58 PM · View on HN
Serialization framework with some interesting numbers: 10-20x faster on nested objects than JSON/Protobuf.

Technical approach: compile-time codegen (no reflection), a compact binary protocol with meta-packing, and a little-endian layout optimized for modern CPUs.

Unique features that other fast serializers don't have:

- Cross-language without IDL files (Rust ↔ Python/Java/Go)
- Trait object serialization (Box<dyn Trait>)
- Automatic circular reference handling
- Schema evolution without coordination

Happy to discuss design trade-offs.

Benchmarks: https://fory.apache.org/docs/benchmarks/rust
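As a toy illustration of why a fixed binary layout beats self-describing text (this uses Python's generic `struct` packing, not Fory's actual protocol or meta-packing):

```python
import json
import struct

record = {"x": 1.5, "y": -2.25, "id": 42}

# JSON repeats keys and quoting on every record
as_json = json.dumps(record).encode()

# Fixed little-endian layout: double, double, uint32 -- the schema lives in
# code (or in compile-time codegen), so only the values go on the wire
as_bin = struct.pack("<ddI", record["x"], record["y"], record["id"])

print(len(as_json), len(as_bin))            # binary is a fraction of the size
x, y, ident = struct.unpack("<ddI", as_bin)  # and decodes without text parsing
```

The same logic explains the speed gap: decoding is a fixed-offset memory read instead of a character-by-character parse.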
50

Butter – A Behavior Cache for LLMs #

butter.dev
22 comments · 7:39 PM · View on HN
Hi HN! I'm Erik. We built Butter, an LLM proxy that makes agent systems deterministic by caching and replaying responses, so automations behave consistently across runs.

- It’s a chat completions compatible endpoint, making it easy to drop into existing agents with a custom base_url

- The cache is template-aware, meaning lookups can treat dynamic content (names, addresses, etc.) as variables
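A template-aware lookup along those lines might work like this sketch (the patterns and the `complete` helper are hypothetical, not Butter's API; a real engine would also re-substitute the variables into the cached response):

```python
import re

# Hypothetical normalizer: treat emails and numbers as template variables
PATTERNS = [
    (re.compile(r"[\w.]+@[\w.]+"), "<EMAIL>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def template_key(prompt):
    for pattern, var in PATTERNS:
        prompt = pattern.sub(var, prompt)
    return prompt

cache = {}

def complete(prompt, llm):
    key = template_key(prompt)
    if key not in cache:          # only the first request of a given shape
        cache[key] = llm(prompt)  # pays for a real LLM call
    return cache[key]

calls = []
llm = lambda p: calls.append(p) or "ok"
complete("Invoice 1001 for bob@example.com", llm)
complete("Invoice 2002 for amy@example.org", llm)  # same template: cache hit
print(len(calls))  # 1
```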

You can see it in action in this demo where it memorizes tic-tac-toe games: https://www.youtube.com/watch?v=PWbyeZwPjuY

Why we built this: before Butter, we were Pig.dev (YC W25), where we built computer-use agents to automate legacy Windows applications. The goal was to replace RPA. But in practice, these agents were slow, expensive, and unpredictable - a major downgrade from deterministic RPA, and unacceptable in the worlds of healthcare, lending, and government. We realized users don't want to replace RPA with AI, they just want AI to handle the edge cases.

We set out to build a system for "muscle memory" for AI automations (general purpose, not just computer-use), where agent trajectories get baked into reusable code. You may recall our first iteration of this in May, a library called Muscle Mem: https://news.ycombinator.com/item?id=43988381

Today we're relaunching it as a chat completions proxy. It emulates scripted automations by storing observed message histories in a tree structure, where each fork in the tree represents some conditional branch in the workflow's "code". We replay behaviors by walking the agent down the tree, falling back to AI to add new branches if the next step is not yet known.
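The tree walk described above can be sketched like this (a toy model with a stub AI; the names are made up and this is not Butter's actual engine):

```python
class Node:
    def __init__(self):
        self.children = {}   # observation -> Node (a fork = conditional branch)
        self.action = None   # cached next action for this trajectory

def act(root, observations, ai):
    """Walk the trajectory tree; fall back to the AI only off known paths."""
    node = root
    for obs in observations:
        node = node.children.setdefault(obs, Node())
    if node.action is None:          # unknown branch: ask the model once
        node.action = ai(observations)
    return node.action

root, calls = Node(), []
ai = lambda obs: calls.append(obs) or f"handle:{obs[-1]}"
act(root, ["login", "form_a"], ai)
act(root, ["login", "form_a"], ai)   # replayed from the tree, no AI call
act(root, ["login", "form_b"], ai)   # new conditional branch, one AI call
print(len(calls))  # 2
```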

The proxy is live and free to use while we work through making the template-aware engine more flexible and accurate. Please try it out and share how it went, where it breaks, and if it’s helpful.

40

Dexto – Connect your AI Agents with real-world tools and data #

github.com
11 comments · 4:07 PM · View on HN
Hi HN, we’re the team at Truffle AI (YC W25), and we’ve been working on Dexto (https://www.dexto.ai/), a runtime and orchestration layer for AI Agents that lets you turn any app, service or tool into an AI assistant that can reason, think and act. Here's a video walkthrough - https://www.youtube.com/watch?v=WJ1qbI6MU6g

We started working on Dexto after helping clients set up agents for everyday marketing tasks like posting on LinkedIn, running Reddit searches, generating ad creatives, etc. We realized that the LLMs weren't the issue. The real drag was the repetitive orchestration around them:

- wiring LLMs to tools
- managing context and persistence
- adding memory and approval flows
- tailoring behavior per client/use case

Each small project quietly ballooned into weeks of plumbing where each customer had mostly the same, but slightly custom requirement.

So instead of another framework where you write orchestration logic yourself, we built Dexto as a top-level orchestration layer where you declare an agent’s capabilities and behavior:

- which tools or MCPs the agent can use
- which LLM powers it
- how it should behave (system prompt, tone, approval rules)

Once configured, the agent runs as an event-driven loop - reasoning through steps, invoking tools, handling retries, and maintaining its own state and memory. Your app doesn’t manage orchestration, it just triggers and subscribes to the agent’s events and decides how to render or approve outcomes.
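To make the shape of that concrete, here is a hypothetical config and a toy event loop (field names, tool names, and the runner are invented for illustration; this is not Dexto's real schema or API):

```python
# Invented config shape: declare capabilities, don't write orchestration code
agent_config = {
    "llm": "gpt-4o-mini",
    "system_prompt": "You are a careful marketing assistant.",
    "tools": ["browser", "linkedin_mcp"],
    "approval": {"required_for": ["post"]},
}

def run_agent(config, task, subscribers):
    """Toy event loop: the app just subscribes to events and renders them."""
    def emit(kind, payload):
        for callback in subscribers:
            callback(kind, payload)
    emit("started", task)
    for tool in config["tools"]:   # stand-in for the reason/act/retry loop
        emit("tool_call", tool)
    emit("finished", f"done: {task}")

events = []
run_agent(agent_config, "draft a post", [lambda kind, _: events.append(kind)])
print(events)  # ['started', 'tool_call', 'tool_call', 'finished']
```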

Agents can run locally, in the cloud, or hybrid. Dexto ships with a CLI, a web UI, and a few sample agents to get started.

To show its flexibility, we wrapped some OpenCV functions into an MCP server and connected it to Dexto (https://youtu.be/A0j61EIgWdI). Now, a non-technical user could detect faces in images or create custom photo collages by talking to the agent. The same approach works for coding agents, browser agents, multi-speaker podcast agents, and marketing assistants tuned to your data. https://docs.dexto.ai/examples/category/agent-examples

Dexto is modular, composable, and portable, allowing you to plug in new tools or even re-expose an entire Dexto agent as an MCP server and consume it from other apps like Cursor (https://www.youtube.com/watch?v=_hZMFIO8KZM). Because agents are defined through config and powered by a consistent runtime, they can run anywhere without code changes, making cross-agent (A2A) interactions and reuse effortless.

In a way, we like to think of Dexto as a “meta-agent” or “agent harness” that can be customized into a specialized agent depending on its tools, data, and platform.

For the time being, we have opted for an Elastic V2 license to give maximum flexibility for the community to build with Dexto while preventing bigger players from taking over and monetizing our work.

We’d love your feedback:

- Try the quickstart and tell us what breaks
- Share a use case you want to ship in a day, and we'll suggest a minimal config

Repo: https://github.com/truffle-ai/dexto

Docs: https://docs.dexto.ai/docs/category/getting-started

Quickstart: npm i -g dexto

24

I was tired of people DMing me just "hi", so I made this - NoGreeting #

nogreeting.kuber.studio
23 comments · 9:30 AM · View on HN
Most people on social media don't know how to text.

They think starting with a greeting and waiting for a response is kind because that's telephone etiquette, but they don't understand that doing it over text is like

someone calling you, saying "hello," then putting YOU on hold.

It literally makes the other person do extra work to find out what you want.

So I made this website. Instead of spending time explaining this concept to them (and maybe coming off as very rude), I just keep the link in my bio or send it to them when they do it.

Pick your name. Pick the greeting trigger. Get a link. (Optionally, select one of the 16 languages.)

They get a friendly explanation of why leading with context matters. You save 10 minutes. Everyone wins.

Train your network to respect your time by being clear about what you need. Life's too short for message ping-pong with strangers.

P.S. This builds on the legacy of nohello.net, but adds the option of other greetings and custom names in the messages.

It's also open source: https://github.com/Kuberwastaken/nogreeting

21

Ordered – A sorted collection library for Zig #

6 comments · 5:26 AM · View on HN
I made an early version of a sorted collection library for Zig. Sorted collections are data structures that maintain the data in sorted order. Examples of these data structures are `java.util.TreeMap` in Java and `std::map` in C++. These data structures are mainly used for fast lookups (point search) and fast range searches.
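For readers unfamiliar with the pattern, the same use cases look like this in Python with the stdlib `bisect` module (an illustration of sorted-collection point and range lookups, not this library's Zig API):

```python
import bisect

# Keep keys sorted on insert; lookups become binary searches
keys = []
for k in [42, 7, 19, 3, 88, 55]:
    bisect.insort(keys, k)

def contains(key):
    """Point search in O(log n)."""
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key

def range_search(lo, hi):
    """All keys in [lo, hi), via two binary searches."""
    return keys[bisect.bisect_left(keys, lo):bisect.bisect_left(keys, hi)]

print(keys)                  # [3, 7, 19, 42, 55, 88]
print(contains(19))          # True
print(range_search(10, 60))  # [19, 42, 55]
```

The list-based version pays O(n) per insert, which is exactly why tree-based structures like `java.util.TreeMap` and `std::map` keep inserts at O(log n) too.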

The library is available on GitHub: https://github.com/CogitatorTech/ordered

10

OpenAI Apps Handbook #

github.com
0 comments · 7:05 PM · View on HN
I went swimming in the ocean of OpenAI's Apps SDK… and came back with a handbook!

Over the past few weeks, I’ve been diving deep into the ChatGPT App SDK: exploring its APIs, tools, and hidden gems. Along the way, I built, broke, fixed, and reimagined a bunch of little experiments.

P.S. OpenAI's official docs are, of course, the source of truth; this is just a rough notebook.

Maybe I can create a CLI tool to scaffold apps?

8

Linux CLI game, quiz, cheatsheet and map from my mind mapping app #

mindmapsonline.com
2 comments · 4:26 PM · View on HN
I'm working on a mind mapping app that allows for gamification features and makes maps nicer and easier to remember. I was missing more advanced graphics in existing apps and the ability to treat one as my note-taking and learning tool.

I.e., how (and why?) should I remember a mind map if it looks the same as all other maps and all I can do is pick a couple of shapes and dashed/dotted lines?

This simply doesn't work with my brain, so I decided to create something better :) The map on the page is a preview of this, but I'm curious about the quiz, typing games, and cheatsheet; I'd love some feedback (and ideas) on such training modes!

Another thing I'd love is comments about any features that you miss in existing mind-mapping software. If you're a mind mapping enthusiast and interested in beta testing, check out the contact form :)

P.S. Last week I posted a simple map, but I didn't realize I could use Show HN for stuff that I've made that lets you play with something. It just vanished down the pages, so I've added new commands, modified all the bad answers to be more realistic, added some features for the quiz and typing game, and am reposting to get proper feedback and hopefully some testers!

Also, if you're not ready to take the lengthy quiz (nearly 180 questions), pick a smaller randomly generated subset to play with!

8

CoordConversions NPM Module for Map Coordinate Conversions #

github.com
1 comment · 5:51 PM · View on HN
I have been working on a project that has multiple repos, all of which have to convert between multiple map coordinate types, so I made an NPM module that allows you to parse and convert between Decimal Degrees, Degrees-Minutes, and Degrees-Minutes-Seconds coordinate types. Niche? Yes. Useful? Also yes (I hope)!
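For context, the core conversion math is compact. Here is a minimal sketch of the DD ↔ DMS round trip (illustrative only, not this module's API; it keeps the sign on the degrees field and ignores the negative-zero-degree edge case):

```python
def dd_to_dms(dd):
    """Decimal degrees -> (degrees, minutes, seconds)."""
    sign = -1 if dd < 0 else 1
    dd = abs(dd)
    d = int(dd)
    m = int((dd - d) * 60)
    s = round((dd - d - m / 60) * 3600, 4)
    return sign * d, m, s

def dms_to_dd(d, m, s):
    """(degrees, minutes, seconds) -> decimal degrees."""
    sign = -1 if d < 0 else 1
    return sign * (abs(d) + m / 60 + s / 3600)

print(dd_to_dms(-73.9857))                 # (-73, 59, 8.52)
print(round(dms_to_dd(-73, 59, 8.52), 4))  # -73.9857
```

A real library additionally has to handle parsing the many textual formats (73°59'8.52"W and friends), which is where most of the work lives.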
7

Rewriting Scratch 3.0 from scratch in Lua (browser-free native runtime) #

github.com
0 comments · 5:38 PM · View on HN
Built a native Scratch 3.0 runtime in Lua that runs .sb3 projects without a browser.

Why? Browser sandboxing prevents access to hardware features (haptics, sensors, fine-grained perf controls). Native runtime gives you direct hardware access and lets you deploy to consoles, handhelds, embedded devices. Also means much smaller binaries (LÖVE is ~7MB vs 50-100MB for Electron).

How it works:

- Scratch blocks compile to IR, then optimize, then generate Lua

- LuaJIT executes the compiled code

- Coroutine-based threading for concurrent scripts

- Lazy loading + LRU cache for memory management

- SVG support via resvg FFI

~100% compatible with Scratch 3.0 blocks. Extensions that need JavaScript won't work (no Music, TTS, Video Sensing), but core blocks are there.

Built on LÖVE framework, so it's cross-platform (desktop, mobile, gaming devices).

Still rough around the edges (user input not implemented yet, cloud variables only work locally), but it runs real Scratch projects today.

7

Ball X Pit – Physics Roguelite with 42 Ball Evolutions #

ballxpit.net
0 comments · 12:44 PM · View on HN
Tiny team (3 devs, 1 artist) here, sharing our first game: BALL x PIT, a survival roguelite focused on physics-based ball combat. Key bits:

- Unique ball types (Bomb, Black Hole, etc.) with distinct bounces/effects
- 42 evolutions via clear fusion (no random unlocks)
- Base-building for permanent run perks

Launched 5 days ago: 300k+ sales, 95% positive on Steam. On PC/PS5/Xbox (free with Game Pass)/Switch ($14.99). Happy to answer questions, and we're prepping an update with more fusions. Thanks for checking it out!
5

Permit Watch, Turning Ireland's work permits into a job-demand proxy #

permitwatch.ie
0 comments · 9:21 PM · View on HN
I'm a Stamp 1G student in Ireland. DETE's data is gold, but it's buried in Excel files. I built this to filter and search it: HSE's 12,501 health permits (2024), IT trends (6,788 issued), and CSRI success odds (e.g., 88.8% for US devs at Meta Dublin).

Proxy insight: High permits = hot sectors/companies for non-EU talent.

100% DETE open data. Feedback?

4

Interactive RISC-V CPU Visualizer (Sequential and Pipelined) #

mostlykiguess.github.io
0 comments · 6:04 AM · View on HN
I built an interactive RISC-V CPU visualizer for a course that lets you explore how instructions move through both a sequential and a 5-stage pipelined processor.

You can step through execution, watch data hazards resolve, and see how branching and forwarding work in real time.

The goal is to make learning CPU architecture more intuitive for students (and anyone who likes poking around pipelines). The whole Verilog code and implementation details are available in the project report!

Right now, the demo supports basic arithmetic, memory, and branch instructions, and includes two pre-programmed examples (a basic ALU demo and a Fibonacci pipeline program).

Would love feedback on how to handle branching better; I remember rushing it due to the deadline, and it isn't handled properly, but I haven't looked at the code in depth since :p I'd also like to do something similar for an OS and write it from scratch; I've been seeing a lot of posts, so any recommendations are appreciated :)

Visualizers: https://mostlykiguess.github.io/RISC-V-Processor-Implementat... https://mostlykiguess.github.io/RISC-V-Processor-Implementat...

Code: https://github.com/MostlyKIGuess/RISC-V-Processor-Implementa...

4

Web extension to remove social metrics on web #

trashpandaextension.com
0 comments · 3:10 PM · View on HN
I've been building Trash Panda, a web extension that removes "social metrics": number of likes, subscribers, followers, stars, favorites, upvotes, downvotes, etc.

When I was building this, I had a few major sites in mind, but I was surprised at how pervasive these social metrics are across the web. When the extension removes them, the web feels calmer.

Extension has a 30-day free trial.

Please let me know what you think - still fairly early stage!

3

VS Code extension to run/debug Go table tests #

github.com
0 comments · 4:59 PM · View on HN
Hey HN! Over the weekend I built a small VS Code extension that lets you run/debug individual subtests in Go table tests — something I've often wished the official extension supported.

Unlike other extensions I've seen, it doesn't just rely on regexes and assumed field names; it does a bit of structural analysis by following testing.T references.

If you’re a Go developer and this sounds useful, I’d love to hear your feedback!

2

Auto-generate stock research reports from SEC and industry publications #

app.deepvalue.tech
0 comments · 11:12 AM · View on HN
I built a tool that produces stock research reports in ~5 minutes by combining SEC filings, curated industry sources, and live financial data. Demo: https://www.youtube.com/watch?v=2wCM44B6iH8

Instead of relying on OpenAI’s Deep Research (too costly, ~$10/run), I built my own agents that:

- Parse 10-Ks and 10-Qs directly from the SEC

- Search curated industry publications (avoiding noisy market news)

- Pull financials from Financial Modeling Prep

A combiner agent then outputs:

- A one-page summary

- A detailed report with citations

- Financial tables + graphs

This is NOT a stock tip generator or a replacement for your own judgment. Think of it as a scalable junior analyst who does the research part for you.

You can sign up (3 free credits) or just comment a ticker and I’ll share a sample report. Would love feedback on whether this fits your workflow, especially from folks who do deep equity research.

2

'Elon's X page without politics': LLM-based content filtering in Chrome #

chromewebstore.google.com
0 comments · 12:52 PM · View on HN
Hi!

As a side-project this summer (partly to have an excuse to try out Claude Code ;D), I built a Chrome extension that hides content that doesn't match user preferences. Example usage: https://youtu.be/japjNSU3O7A

The content + preferences are sent to an LLM, which decides if the content is relevant. You can use your own OpenRouter API key, but there's also a 'free tier' which uses my key (there's a daily quota with this option, though).

The extension now also supports including images and video thumbnails in the filtering, although this is only available when using your own OpenRouter key. The code is available at: https://github.com/jac08h/great_filter

Feedback is appreciated!

2

My bootstrapped startup for co-parents was just featured in WIRED #

1 comment · 12:50 PM · View on HN
Hi HN, I’m Sol (YC 2006). I built BestInterest to help co-parents communicate peacefully after divorce or high-conflict relationships.

The idea came from personal experience — a painful divorce and difficult co-parenting communication. Courts often tell co-parents to keep things business-like and child-focused, which sounds simple but is brutally hard in practice. I realized AI can sometimes do what humans can’t — remove emotion from the loop. It started as a simple theory — that AI could actually prevent emotional abuse in digital communication.

I’d taken a break from tech after years at Google (I was a PM there), but eventually brushed off the dusties and decided: I’ll build it myself.

THE STACK

Google Cloud + Firebase + Gemini (with some OpenAI functionality still in place). Front end: FlutterFlow. I bootstrapped everything — no funding, no team at first, just persistence, a supportive partner, and late nights after my kids were asleep. One upside of being a co-parent is suddenly a lot of kid-free time to think!

Early on, I knew I wanted an advisor with deep expertise in abuse recovery. During my own healing, Dr. Ramani Durvasula’s YouTube videos were life-changing, so she topped my “never-going-to-happen” list. I cold-emailed her — and to my surprise, she said yes.

Yesterday, WIRED featured our story: “Divorced? With Kids? And an Impossible Ex? There’s AI for That.” Side note: in the article, our leading competitor acknowledged using users’ personal correspondence for training data — which was… surprising.

It’s surreal seeing something that began with personal pain now helping others in such a profound way. I get emails every week from users saying the app has “literally changed their lives.” It’s incredibly gratifying.

I’m learning as I go — building in a space this sensitive has challenged me in many ways and shown just how deeply this kind of technology is needed.

When used with care and intention, AI can genuinely help marginalized communities — people navigating conflict, isolation, or trauma. I was “fortunate,” in a strange way, to have lived this pain firsthand; it helped me understand what needed to be built for my niche. AI has just as much potential to create harm or false information as it does to bring light to dark places — protecting victims and helping people find safety in their communication.

Happy to talk about any of these:

- Bootstrapping a consumer AI app solo

- Restarting life as an entrepreneur after kids, a divorce, and an eight-year hiatus

- Transitioning to being a full-time dad

- Growing a real subscriber base in a niche without a marketing budget — leveraging AI tools and SEO

- Building for a legally and emotionally complex community

- Using AI to protect against abuse — designing filters that help without over-censoring

- Breaking into a quasi-regulated industry where many assume court approval is required just to operate

- Or what it’s like designing tech for the most emotionally charged messages imaginable

AMA — happy to talk about the journey, the challenges, or anything else that resonates.

1

I built an open source LLM integration for PostgreSQL #

github.com
0 comments · 3:14 PM · View on HN
More often than not, we pre-process data with an LLM before inserting it into a column/row on the database: for example, translating text and storing the different languages, running OCR on a document and storing the structured output, or using classification to label user messages/documents before storing them.

Many of these use cases could happen on the database rather than in a dedicated pipeline. Motivated by letting Postgres do more by extending it, we built Postgres-LLM, a trigger that can conditionally be set on specific columns and executes a well-defined task on insert/update: it runs an LLM and stores the output in another column, or updates the initial column.

The project supports any LLM that is compatible with the OpenAI Chat API. I was motivated to create it to showcase our recently launched LLM, Interfaze.ai, which was trained for developer tasks like OCR, translation, classification, and more. However, you can use any LLM of your choice by replacing the URL and API key during setup.
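A toy model of the trigger idea in plain Python (the function names are invented and the "LLM" is a stub classifier; this is not the extension's real API):

```python
def make_llm_trigger(source_col, target_col, task):
    """Build a before-insert 'trigger' that runs an LLM task on one column
    and writes the result to another."""
    def trigger(row):
        row[target_col] = task(row[source_col])   # classify/translate/OCR...
        return row
    return trigger

# Stub standing in for a real OpenAI-compatible chat call
classify = make_llm_trigger(
    "message", "label",
    task=lambda text: "complaint" if "broken" in text else "other",
)

row = classify({"message": "my device arrived broken"})
print(row)  # {'message': 'my device arrived broken', 'label': 'complaint'}
```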

1

I built a triple-agent LLM system that verifies its own work #

0 comments · 12:56 PM · View on HN
Hi HN,

Six months ago, I asked Gemini to "send my weekly report to the team." It replied: "Email sent successfully"—but the email was never sent. The attachment was wrong. Nobody told me.

That's when I realized: *LLMs lie about their own execution.*

---

*The Problem:*

When you ask an LLM to automate multi-step tasks (search file → attach → send), it cheerfully reports success even when:

- The file doesn't exist (it hallucinates the ID)
- The API call failed silently
- Permissions were denied

Single-LLM systems have no incentive to admit failure; they optimize for appearing helpful, not for being correct.

---

*My Solution: Don't Let the LLM Grade Its Own Homework*

I built PupiBot with three separate agents that cannot collude, ensuring *the agent that executed the step is NOT the one verifying it succeeded.*

The architecture is simple:

* *CEO Agent (Planner, Gemini Flash):* generates the execution plan (no API access).
* *COO Agent (Executor, Gemini Pro):* executes steps, calls 81 Google APIs, returns raw API responses.
* *QA Agent (Verifier, Gemini Flash):* after EVERY critical step, validates success with real, independent API calls, and triggers a retry if verification fails.

*Real Example (Caught & Fixed):* User: "Email last month's sales report to Alice"

* Search Drive: not found
* *QA Agent:* "Step failed. Retry with fuzzy search."
* Finds "Q3_Sales_Final_v2.pdf" | *QA Agent:* "File verified. Proceed."
* Sends email | *QA Agent:* "Email delivered. Attachment confirmed."

It's like code review: you don't approve your own PRs.
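That separation can be sketched with stub agents (all names and logic here are invented for illustration, not PupiBot's actual code). The key property: the verifier inspects the state of the world, not the executor's claim:

```python
def planner(goal):
    """CEO stand-in: produce a plan, no API access."""
    return ["find_report", "send_email"]

def run_pipeline(goal, executor, verifier, max_retries=2):
    state = {"sent": []}                 # stand-in for the real world
    for step in planner(goal):
        for _ in range(max_retries + 1):
            executor(step, state)        # COO stand-in
            if verifier(step, state):    # QA: independent check per step
                break
        else:
            raise RuntimeError(f"step {step!r} failed verification")
    return state

def flaky_executor(step, state):
    if step == "send_email":             # only succeeds on the 2nd attempt,
        flaky_executor.tries = getattr(flaky_executor, "tries", 0) + 1
        if flaky_executor.tries >= 2:    # though it would "claim" success always
            state["sent"].append("report.pdf")

def verifier(step, state):
    return step != "send_email" or bool(state["sent"])

state = run_pipeline("email report", flaky_executor, verifier)
print(state)  # {'sent': ['report.pdf']}
```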

---

*Current Implementation & Transparency:*

* *Open Source:* MIT License, Python 3.10+
* *APIs:* Google Workspace (Gmail, Drive, Contacts, Calendar, Docs)
* *Reliability (self-tested):* baseline (single Gemini Pro) was ~70% success; PupiBot (triple-agent) achieves *~92% success* on the same tasks
* *Known limitations:* Google-only; 3x LLM overhead (the tradeoff: reliability > speed); early stage

---

*Why I'm Sharing This (My Garage Story):*

I'm not a programmer and have no formal CS degree. My development process was simple: I'd use PupiBot as my daily assistant, manually log every error, and bring that "bug report" to my AI assistants (Claude, Gemini) to fix.

PupiBot is my 'custom car' built in the garage, fueled by passion and persistence. I’m finally opening the door to invite the real mechanics (you, HN) to examine the engine.

*What I'd Love from HN:*

1. *Feedback* on the independent QA agent pattern
2. *Benchmarking ideas* for rigorous evaluation
3. *Architectural critiques:* where's the weak link?

---

*Links:*

- GitHub: https://github.com/PupiBott/PupiBot1.0
- Quick Demo (1:44): https://youtube.com/shorts/wykKckwaukY?si=0xdn7rM6B2tMAIPw
- Architecture Docs: https://github.com/PupiBott/PupiBot1.0/blob/main/ARCHITECTUR...

Built by a self-taught technology enthusiast in Chile. Special thanks to Claude Sonnet 4.5 for being my coding partner throughout this journey.

1

Clockwork – Intelligent, Composable Infrastructure Primitives in Python #

github.com
0 comments · 12:47 PM · View on HN
I've been working on Clockwork, a Python library for composable infrastructure blocks that lets you dial the AI involvement up or down per resource.

The core idea: build complex infra components from basic building blocks, with a knob for how much "intelligence" you want in each resource.

```
# Specify everything yourself
nginx = DockerResource(
    image="nginx:1.25-alpine",
    ports=["8080:80"],
    volumes=["/configs:/etc/nginx"]
)

# Just set constraints, AI fills the rest
nginx = DockerResource(
    description="web server with caching",
    ports=["8080:80"]
)

# Or just describe it
nginx = DockerResource(
    description="web server for static files",
    assertions=[HealthcheckAssert(url="http://localhost:8080")]
)
```

Same resource type, you pick the level of control. What I find tedious (picking nginx vs caddy vs httpd) you might care deeply about. So every resource lets you specify what matters to you and skip what doesn't.

It's built on Pulumi for deployment, uses Pydantic for declarative specifications, and works with local LLMs (LM Studio) and cloud-based such as OpenRouter.

Also has composable resources - group related things together:

```
BlankResource(name="dev-stack", description="Local dev environment").add(
    DockerResource(description="postgres", ports=["5432:5432"]),
    DockerResource(description="redis", ports=["6379:6379"]),
    DockerResource(description="api server", ports=["8000:8000"])
)
```

The AI sees the whole group and configures things to work together. Or you can .connect() independent resources for dependency ordering and auto-generated connection strings (this is still WIP, as is the whole project, but I'm currently thinking through a mechanism for "connecting" things together appropriately).

Repo: https://github.com/kessler-frost/clockwork

It's early (v0.3.0) and I'm still figuring out what works. Main questions:

1. The "adjustable AI" concept - is this useful or confusing? 2. Which resources/features would be most valuable next?

Would love to hear if this resonates with anyone or if I'm solving a problem nobody has.

1

Weak Legacy 2 Guide – Tier Lists, Codes and Trello #

weaklegacy2.com
0 comments · 12:47 PM · View on HN
Hey HN! If you play Weak Legacy 2 (the Roblox Demon Slayer-inspired RPG), this site's been a go-to for me: daily-updated codes, full clan/breathing tier lists, and the official Trello roadmap, all tied to UPD 5.0. Key bits:

- 2 active codes (verified Oct 2025)
- Tier lists for 20 clans + 14 breathings
- Live reset timer (15h left now) + stock updates

Saves time hunting for info across Discord/Reddit. Thought fellow players might find it useful!
1

ICESight – Computer Vision Tool for Detecting and Mapping ICE Activity #

realtimefascism.com
0 comments · 4:28 PM · View on HN
Hi HN, I built this after seeing ICEBlock get removed from the App Store.

ICESight is a photo-based tool that uses AI to handle moderation and verification, with human users acting as verifiers. When someone uploads a photo of an ICE sighting, it runs through an object detection model to find possible agents or vehicles. Each detected area is then sent to another AI service that checks if the scene looks authentic.

If it passes those checks, the system saves visual embeddings for each agent in a database and compares them against future uploads. Verified photos are added to a public map where other users can review and confirm sightings.
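The embedding-comparison step described above boils down to nearest-neighbor matching by cosine similarity (a minimal sketch with made-up vectors and an assumed threshold, not ICESight's actual model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

known = {"agent_17": [0.9, 0.1, 0.3]}   # made-up stored embeddings

def match(embedding, threshold=0.95):
    """Return the best-matching known identity, or None below the threshold."""
    best = max(known, key=lambda k: cosine(known[k], embedding))
    return best if cosine(known[best], embedding) >= threshold else None

print(match([0.88, 0.12, 0.31]))  # agent_17
print(match([0.1, 0.9, 0.0]))     # None
```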

I hope this can become a useful public dataset that helps improve transparency and public safety. Feedback very welcome!