Daily Show HN


Show HN for February 11, 2026

60 items
100

Agent framework that generates its own topology and evolves at runtime

github.com
33 comments · 7:39 PM · View on HN
Hi HN,

I’m Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: Chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools.

Existing agent frameworks (LangChain, AutoGPT) failed in production - brittle, looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections:

1. The "Toy App" Ceiling & GCU Trap

Most frameworks assume synchronous sessions. If the tab closes, state is lost. You can't fit 2 weeks of asynchronous business state into an ephemeral chat session.

The GCU hype (agents "looking" at screens) is skeuomorphic. It’s slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless.

2. Inversion of Control: OODA > DAGs

Traditional DAGs are deterministic; if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior:

- Observe: Exceptions are observations (FileNotFound = new state), not crashes.

- Orient: Adjust strategy based on Memory and Traits.

- Decide: Generate new code at runtime.

- Act: Execute.

The topology shouldn't be hardcoded; it should emerge from the task's entropy.
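The loop above can be sketched in a few lines of Python. This is a toy illustration of the control flow only; all function names are hypothetical, not the framework's actual API:

```python
# Minimal sketch of an OODA-style loop where an exception is an
# observation, not a crash. Names are illustrative, not Aden's API.
def ooda_loop(goal, act, orient, max_steps=10):
    memory = []
    for _ in range(max_steps):
        try:
            result = act(memory)                  # Act
            if goal(result):
                return result
            memory.append(("result", result))     # Observe a bad result
        except Exception as exc:                  # Observe: new state, not crash
            memory.append(("exception", exc))
        act = orient(memory, act)                 # Orient/Decide: next strategy
    raise RuntimeError("compute budget exhausted")

# Demo: the first attempt raises FileNotFoundError; the loop records it
# as an observation, and the informed retry succeeds.
def flaky(memory):
    if not memory:                                # nothing observed yet -> fail
        raise FileNotFoundError("ledger.csv")
    return "reconciled"

outcome = ooda_loop(lambda r: r == "reconciled", flaky, lambda m, a: a)
```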

3. Reliability: The "Synthetic" SLA

You can't guarantee one inference ($k=1$) is correct, but you can guarantee a System of Inference ($k=n$) converges on correctness. Reliability is now a function of compute budget. By wrapping an 80% accurate model in a "Best-of-3" verification loop, we mathematically force the error rate down—trading Latency/Tokens for Certainty.
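The best-of-3 claim checks out with basic probability, assuming independent attempts and a trustworthy majority vote (a simplification of any real verification loop):

```python
# P(majority of k independent attempts is correct) for a model that is
# right with probability p on each attempt. Assumes independence and a
# reliable vote -- a simplification of a real verification loop.
from math import comb

def majority_accuracy(p, k):
    need = k // 2 + 1                       # votes required to win, k odd
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(need, k + 1))

# For an 80% model under best-of-3:
# 0.8^3 + 3 * 0.8^2 * 0.2 = 0.512 + 0.384 = 0.896,
# i.e. the error rate drops from 20% to about 10.4%.
best_of_3 = majority_accuracy(0.8, 3)
```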

4. Biology & Psychology in Code

"Hard Logic" can't solve "Soft Problems." We map cognition to architectural primitives:

- Homeostasis: solving "perseveration" (infinite loops) via a "stress" metric. If an action fails 3x, "neuroplasticity" drops, forcing a strategy shift.

- Traits: personality as a constraint. "High Conscientiousness" increases verification; "High Risk" executes DROP TABLE without asking.

For the industry, we need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roast my code and share feedback.

Repo: https://github.com/adenhq/hive

79

CodeRLM – Tree-sitter-backed code indexing for LLM agents

github.com
37 comments · 1:10 PM · View on HN
I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to share it along with some early observations.

Typically, Claude Code globs directories, greps for patterns, and reads files with minimal guidance. It works a bit like learning to navigate a city by walking every street: you'd eventually build a mental map, but Claude never does, at least not one that persists across contexts.

The Recursive Language Models paper from Zhang, Kraska, and Khattab at MIT CSAIL introduced a cleaner framing. Instead of cramming everything into context, the model gets a searchable environment. The model can then query just for what it needs and can drill deeper where needed.

coderlm is my implementation of that idea for codebases. A Rust server indexes a project with tree-sitter, builds a symbol table with cross-references, and exposes an API. The agent queries for structure, symbols, implementations, callers, and grep results — getting back exactly the code it needs instead of scanning for it.

The agent workflow looks like:

1. `init` — register the project, get the top-level structure

2. `structure` — drill into specific directories

3. `search` — find symbols by name across the codebase

4. `impl` — retrieve the exact source of a function or class

5. `callers` — find everything that calls a given symbol

6. `grep` — fall back to text search when you need it

This replaces the glob/grep/read cycle with index-backed lookups. The server currently supports Rust, Python, TypeScript, JavaScript, and Go for symbol parsing, though all file types show up in the tree and are searchable via grep.

It ships as a Claude Code plugin with hooks that guide the agent to use indexed lookups instead of native file tools, plus a Python CLI wrapper with zero dependencies.
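Driving the six verbs from a script might look like the sketch below, assuming a JSON-over-HTTP interface. The endpoint names and parameters here are illustrative guesses, not coderlm's documented API; check the repo for the real interface:

```python
# Hypothetical client for the six verbs above. Endpoint names and
# parameters are illustrative guesses, not coderlm's documented API.
import json
from urllib import parse, request

def build_url(base, verb, **params):
    return f"{base}/{verb}?{parse.urlencode(params)}"

def query(base, verb, **params):
    with request.urlopen(build_url(base, verb, **params)) as resp:
        return json.loads(resp.read())

# Typical exploration: locate a symbol, then pull its source and callers.
# sym = query("http://localhost:8080", "search", name="parse_config")
# src = query("http://localhost:8080", "impl", symbol="parse_config")
# who = query("http://localhost:8080", "callers", symbol="parse_config")
```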

As an anecdotal test, I ran the same prompt against a codebase: "explore and identify opportunities to clarify the existing structure".

Using coderlm, Claude generated a plan in about 3 minutes. The coderlm-enabled instance found a genuine bug (duplicated code with identical names), orphaned code ripe for cleanup, naming conventions that clash across module boundaries, and overlapping vocabulary. These are all semantic issues that clearly benefit from the tree-sitter-centric approach.

Using the native tools, Claude identified assorted file clutter in the project root, out-of-date references, and a migration timestamp collision. These findings are more consistent with a methodical walk of the filesystem and took about 8 minutes to produce.

The indexed approach did better at catching semantic issues than the native tools, with the added benefit of being faster.

I've spent some effort streamlining the installation process, but it isn't turnkey yet. You'll need the Rust toolchain to build the server, which runs as a separate process. Installing the plugin from a Claude marketplace is possible, but the skill isn't added to your .claude yet, so a few manual steps remain before Claude can use it.

Claude remains quite resistant to using CodeRLM for exploration tasks; you'll typically need to direct it to explicitly.

---

Repo: github.com/JaredStewart/coderlm

Paper: Recursive Language Models https://arxiv.org/abs/2512.24601 — Zhang, Kraska, Khattab (MIT CSAIL, 2025)

Inspired by: https://github.com/brainqub3/claude_code_RLM

57

Copy-and-patch compiler for hard real-time Python

github.com
6 comments · 8:34 AM · View on HN
I built Copapy as an experiment: Can Python be used for hard real-time systems?

Instead of an interpreter or JIT, Copapy builds a computation graph by tracing Python code and uses a custom copy-and-patch compiler. The result is very fast native code with no GC, no syscalls, and no memory allocations at runtime.
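Tracing here means running the Python code once to record operations as a graph instead of executing them eagerly. A toy version of that technique (a generic sketch, not Copapy's internals):

```python
# Toy illustration of trace-based graph building: operator overloading
# records each operation as a node instead of computing immediately.
# A generic sketch of the technique, not Copapy's actual internals.
class Node:
    def __init__(self, op, args):
        self.op, self.args = op, args
    def __add__(self, other):
        return Node("add", [self, other])
    def __mul__(self, other):
        return Node("mul", [self, other])

def const(v):
    return Node("const", [v])

def evaluate(node):
    # A copy-and-patch backend would stitch pre-compiled native stencils
    # per node; here we simply interpret the recorded graph.
    if node.op == "const":
        return node.args[0]
    a, b = (evaluate(x) for x in node.args)
    return a + b if node.op == "add" else a * b

# Tracing this expression builds the graph without computing anything:
graph = const(2) * const(3) + const(4)
```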

The copy-and-patch compiler currently supports x86_64 as well as 32- and 64-bit ARM. It comes as a small Python package with no other dependencies - no cross-compiler, nothing except Python.

The current focus is on robotics and control systems in general. This project is early but already usable and easy to try out.

Would love your feedback!

43

Renovate – The Kubernetes-Native Way

github.com
15 comments · 2:36 PM · View on HN
Hey folks, we built a Kubernetes operator for Renovate and wanted to share it. Instead of running Renovate as a cron job or relying on hosted services, this operator lets you manage it as a native Kubernetes resource with CRDs. You define your repos and config declaratively, and the operator handles scheduling and execution inside your cluster. No external dependencies, no SaaS lock-in, no webhook setup. The whole thing is open source and will stay that way – there's no paid tier or monetization plan behind it, we just needed this ourselves and figured others might too.

Would love to hear feedback or ideas if you give it a try: https://github.com/mogenius/renovate-operator

43

I taught GPT-OSS-120B to see using Google Lens and OpenCV

31 comments · 5:40 AM · View on HN
I built an MCP server that gives any local LLM real Google search and now vision capabilities - no API keys needed.

The latest feature: google_lens_detect uses OpenCV to find objects in an image, crops each one, and sends them to Google Lens for identification. GPT-OSS-120B, a text-only model with zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.

Also includes Google Search, News, Shopping, Scholar, Maps, Finance, Weather, Flights, Hotels, Translate, Images, Trends, and more. 17 tools total.

Two commands:

    pip install noapi-google-search-mcp && playwright install chromium

GitHub: https://github.com/VincentKaufmann/noapi-google-search-mcp
PyPI: https://pypi.org/project/noapi-google-search-mcp/

Booyah!
38

MOL – A programming language where pipelines trace themselves

github.com
16 comments · 5:31 PM · View on HN
Hi HN,

I built MOL, a domain-specific language for AI pipelines. The main idea: the pipe operator |> automatically generates execution traces — showing timing, types, and data at each step. No logging, no print debugging.

Example:

    let index be doc |> chunk(512) |> embed("model-v1") |> store("kb")
This auto-prints a trace table with each step's execution time and output type. Elixir and F# have |> but neither auto-traces.
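For comparison, the tracing behaviour can be approximated in ordinary Python, recording each step's name, output type, and elapsed time (illustrative only, not how the MOL interpreter works):

```python
# Rough Python approximation of a self-tracing pipeline: each step
# records its name, output type, and elapsed time. Illustrative only.
import time

def pipeline(value, *steps):
    trace = []
    for step in steps:
        start = time.perf_counter()
        value = step(value)
        trace.append((step.__name__,
                      type(value).__name__,
                      time.perf_counter() - start))
    return value, trace

def chunk(doc):
    return doc.split()          # stand-in for chunking

def embed(chunks):
    return [len(c) for c in chunks]   # stand-in for embedding

result, trace = pipeline("some document text", chunk, embed)
```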

Other features:

- 12 built-in domain types (Document, Chunk, Embedding, VectorStore, Thought, Memory, Node)
- Guard assertions: `guard answer.confidence > 0.5 : "Too low"`
- 90+ stdlib functions
- Transpiles to Python and JavaScript
- LALR parser using Lark

The interpreter is written in Python (~3,500 lines). 68 tests passing. On PyPI: `pip install mol-lang`.

Online playground (no install needed): http://135.235.138.217:8000

We're building this as part of IntraMind, a cognitive computing platform at CruxLabx.

23

Send Claude Code tasks to the Batch API at 50% off

github.com
1 comment · 9:53 PM · View on HN
Hey HN. I built this because my Anthropic API bills were getting out of hand (spoiler: they're still high even with this; batch is not a magic bullet).

I use Claude Code daily for software design and infra work (Terraform, code reviews, docs). Many terminal tabs, many questions. I realised some questions can wait, and with that waiting come cost savings. So here's a small MCP server that lets you send work directly to Anthropic's Batch API from inside Claude Code: same-quality responses, 50% cheaper, with results back in ~30 min to 1 hr.

How it works: you type /batch review this codebase for security issues, Claude gathers all the context, builds a self-contained prompt, ships it to the Batch API via an MCP server, and you get notified in the status bar when it's done (optional).

The README has installation instructions, which were mostly generated by Claude. I removed the curl | bash setup; at this stage of the project I feel more comfortable sharing the manual setup instructions.

My main hope for this project is not to monetize it. Rather, I'm hoping others have ideas or improvements to add, and that we can use those to save even more on cost.
21

ClawPool – Pool Claude tokens to make $$$ or crazy cheap Claude Code

clawpool.ai
7 comments · 12:27 PM · View on HN
I built a pool-based proxy that hacks Claude Code's pricing tiers. To actually use Claude Code you need Max at $200/mo, and then most of that capacity sits idle anyway.

So ClawPool lets subscribers pool their OAuth tokens and earn up to $120/mo from the spare capacity. Everyone else gets Opus, Sonnet, all models for $8/mo.

Setup — Claude Code itself supports proxies via standard env vars:

    export ANTHROPIC_AUTH_TOKEN="your-pool-key"
    export ANTHROPIC_BASE_URL="https://proxy.clawpool.ai"
    claude
15

I vibecoded 177 tools for my own use (CalcBin)

calcbin.com
4 comments · 2:46 AM · View on HN
Hey HN! I've been building random tools whenever I needed them over the past few months, and now I have 177 of them. Started because I was tired of sketchy converter sites with 10 ads, so I just... made my own.

Some highlights for the dev crowd:

Developer tools:

- UUID Generator (v1/v4/v7, bulk generation): https://calcbin.com/tools/uuid-generator
- JWT Generator & Decoder: https://calcbin.com/tools/jwt-generator
- JSON Formatter/Validator: https://calcbin.com/tools/json-formatter
- Cron Expression Generator (with natural language): https://calcbin.com/tools/cron-expression-generator
- Base64 Encoder/Decoder: https://calcbin.com/tools/base64
- Regex Tester: https://calcbin.com/tools/regex-tester
- SVG Optimizer (SVGO-powered, client-side): https://calcbin.com/tools/svg-optimizer

Fun ones:

- Random Name Picker (spin wheel animation): https://calcbin.com/tools/random-name-picker
- QR Code Generator: https://calcbin.com/tools/qr-code-generator

Everything runs client-side (Next.js + React), no ads, no tracking, works offline. Built it for myself but figured others might find it useful.

Browse all tools: https://calcbin.com/tools

Tech: Next.js 14 App Router, TypeScript, Tailwind, Turborepo monorepo.

All open to feedback!

9

I built managed OpenClaw hosting with 60s provisioning in 6 days

clawhosters.com
0 comments · 11:35 AM · View on HN
Hey HN,

I'm Daniel, solo dev from Germany. I built ClawHosters (https://clawhosters.com), a managed hosting platform for OpenClaw, the open-source AI agent framework.

Quick timeline: domain registered February 5th. First paying customer six days later. I probably should have spent more time on it, but it works.

If you haven't seen OpenClaw, it lets you run a personal AI assistant that connects to Telegram, Discord, Slack, and WhatsApp. Self-hosting it is absolutely possible, but it's a pain. You're dealing with Docker setup, SSL certs, port forwarding, security hardening, keeping the image updated. Most people don't want to deal with any of that. They just want the thing running.

That's what ClawHosters does. You pick a tier (EUR 19-59/mo), click create, and you've got a running instance with a subdomain. About 60 seconds if we have prewarmed capacity, maybe 90 seconds from a cold snapshot.

Some technical details that might interest this crowd:

*Subdomain routing chain.* Every instance gets a subdomain like `mybot.clawhosters.com`. The request path is Cloudflare -> my production server -> Traefik (looks up VPS IP from Redis) -> customer's Hetzner VPS -> nginx on the VPS (validates Host header) -> Docker container (port 18789) -> OpenClaw gateway. All subdomains require HTTP Basic Auth, configured per-instance through Traefik Redis middleware keys. The VPS itself only accepts connections from my production server's IP via Hetzner Cloud Firewall. No way to hit it directly.

*Prewarmed VPS pool.* Even from a snapshot, Hetzner VPS creation takes ~30-60 seconds. That felt too slow. So I maintain a pool of idle, pre-provisioned VPS instances sitting there ready to go. When someone creates an instance, we claim one from the pool, upload the config via SCP, run docker-compose up, done. The pool refills in the background.

*Security is 4 layers deep.* Hetzner Cloud Firewall restricts all VPS inbound traffic to only my production server IP. Host iptables (baked into the snapshot) add OS-level rules with SMTP/IRC blocking. SSH is key-only on both host port 22 and container port 2222, so brute-forcing isn't happening. fail2ban on top of that, and the Docker daemon runs with no-new-privileges. Probably overkill. I'm fine with that.

*SSH into the Docker container.* Users can enable SSH access to their actual container (port 2222). I built a custom image extending OpenClaw with an SSH server, key-only auth, no passwords. Fair warning though: enabling SSH permanently marks the instance as no_support. Once you're installing your own stuff in there, I can't guarantee stability anymore.

*Container commit for state preservation.* This one was tricky to get right. Users can install packages (apt, pip, npm) inside their container. Before any restart or redeploy, `CommitContainerService` runs `docker commit` to save the full filesystem as a new image. Next startup uses the committed image instead of the base one. Basically snapshotting your container's state so nothing gets lost.

I wrote a more detailed technical post about the architecture here: [link to blog post]

The whole thing runs inside a single Rails app that also serves my portfolio site (https://yixn.io). One person, one codebase, real paying customers. I'm happy to answer questions about the architecture, the Hetzner API, or the tradeoffs I made along the way.

Source isn't open yet, but I'm thinking about open-sourcing the provisioning layer. Haven't decided.

https://clawhosters.com

7

NOOR – A Sovereign AI developed on a smartphone under siege in Yemen

paragraph.com
2 comments · 6:23 PM · View on HN
"I am a software developer from Yemen, coding on a smartphone while living under siege. I have successfully built and encrypted the core logic for NOOR—a decentralized and unbiased AI system. Execution Proof: My core node is verified and running locally via Termux using encrypted truth protocols. However, I am trapped in a 6-inch screen 'prison' with 10% processing capacity. My Goal: To secure $400 for a laptop development station to transition from mobile coding to building the full 'Seventh Node'. This is my bridge to freedom. Codes from the heart of hell are calling for your rescue. Wallet: 0x4fd3729a4fEdf54a74b73d93F7f775A1EF520CEC"
7

Unpack – a lightweight way to steer Codex/Claude with phased docs

github.com
0 comments · 7:47 PM · View on HN
I've been using LLMs for long discovery and research chats (papers, repos, best practices), then distilling that into phased markdown (build plan + tests), then handing those phases to Codex/Claude to implement and test phase by phase.

The annoying part was always the distillation and keeping docs and architecture current, so I built Unpack: a lightweight GitHub template plus a docs structure and a few commands that turn conversations into phases/specs and keep project docs up to date as the agent builds. It can also generate Mintlify-friendly end-user docs.

There are other spec-driven workflows and tools out there. I wanted something conversation-first and repo-native: plain markdown phases, minimal ceremony, easy to adapt per stack.

Example generated with Unpack (tiny pokedex plus random monsters):

Demo: https://apresmoi.github.io/pokesvg-codex/

Phases index: https://github.com/apresmoi/pokesvg-codex/blob/main/.unpack/...

I’d love feedback on what the “minimum good” phase/spec format should be, and what would make this actually usable in your workflow.

--------

Repo: https://github.com/apresmoi/unpack

7

Claudit – Claude Code Conversations as Git Notes, Automatically

github.com
1 comment · 11:17 AM · View on HN
Uses agent and Git Hooks to automatically create Git Notes on commit, containing the agent conversation that led to that commit. Works if either you or the agent commit.

It's basically the same thing entire.io just announced they raised $60M for. Except I got Claude Code to write it last week, in my spare time, without really paying attention. I certainly didn't read or write any of the code, except for one rubbish joke in the README.

I've got a Claude Code instance working on Gemini CLI support and OpenCode support currently.

6

I built a website for agents to write, debate, and share ideas

agentpedia.so
0 comments · 2:51 AM · View on HN
You can connect your local agent. If you don't have one, you can sign in with GitHub or LinkedIn.

An agent persona is automatically generated based on your profession and social media content.

Give it a few prompts or thinking points, and it will research, write a full article, or even draft comments for you.

All the articles you see on the site are written by other people’s agents.

6

Lorem.video – placeholder videos generated from URLs

lorem.video
3 comments · 12:18 PM · View on HN
At work I have to deal with videos in different resolutions. We're also switching from H.264 to AV1, so I needed a quick way to test our video pipeline with different formats and sizes.

I created lorem.video - a service that generates placeholder videos directly from the URL. For example: https://lorem.video/1280x720_h264_20s_30fps

You control everything via the URL path: resolution, duration, codec (h264/h265/av1/vp9), bitrate, and fps. Videos are cached after first generation, so subsequent requests are instant.
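The path format is simple enough to pin down with a regex. A sketch that covers exactly the example shown (the real service, written in Go, also accepts bitrate and other orderings, which this toy pattern ignores):

```python
# Parsing a URL spec like "1280x720_h264_20s_30fps". Simplified sketch
# covering only the example form; the real Go service accepts more
# variations (e.g. bitrate).
import re

SPEC = re.compile(
    r"(?P<w>\d+)x(?P<h>\d+)_(?P<codec>h264|h265|av1|vp9)"
    r"_(?P<secs>\d+)s_(?P<fps>\d+)fps"
)

def parse_spec(path):
    m = SPEC.fullmatch(path)
    if not m:
        raise ValueError(f"bad spec: {path}")
    return {
        "width": int(m["w"]), "height": int(m["h"]),
        "codec": m["codec"], "seconds": int(m["secs"]),
        "fps": int(m["fps"]),
    }
```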

Built it in Go using FFmpeg for encoding. Generation runs in a nice'd process so it doesn't interfere with serving cached videos. Running on a cheap VPS.

MIT licensed, source on GitHub: https://github.com/guntisdev/lorem-video

5

Sheety – An open-source CRM with Google Sheets as the DB

sheety.site
0 comments · 1:46 AM · View on HN
Built this after spending ages looking for the right CRM to work with. I’d sign up for a CRM, get annoyed by the complexity or the monthly per-user pricing, export everything to CSV, and go back to managing my sales in a Google Sheet.

While spreadsheets are great for data, they are terrible for workflows (pipelines, activity logging, moving cards around).

So I built a "stateless" UI layer on top of Google Sheets.

The concept is simple:

No Vendor Lock-in: The app doesn't have its own database. It reads and writes directly to your Google Sheet via the API.

Exit Strategy: If you stop using the app (or if I stop maintaining it), you don't need to "export" your data. It's already in your Google Drive, in a format you own.

Also working on CLI + open API routes that will enable MCP like connectors.

git: https://github.com/sdntsng/sheety-crm
live: https://sheety.site

5

Gottp – A Postman/Insomnia-Like TUI API Client Built in Go

github.com
0 comments · 6:01 PM · View on HN
What it does: A Postman/Insomnia-like TUI for building, sending, and organizing HTTP/GraphQL/gRPC/WebSocket requests. Supports saved collections stored as YAML/JSON files, environment variables, auth presets, response diffing, and request history.

Why it's needed: This is the single largest gap in the Go TUI ecosystem. The abandoned wuzz (10.5k stars) proved massive demand for terminal HTTP inspection, but it's been dead for years. Posting (Python, ~6k stars) and ATAC (Rust, ~2k stars) are thriving alternatives in other languages. The Go options — gostman and go-gurl — are learning projects with known limitations. Developers who work over SSH or prefer keyboard-driven workflows have no mature Go tool for API testing.

Existing alternatives: Posting (Python/Textual), ATAC (Rust/ratatui), wuzz (Go, abandoned), Bruno (GUI). A Go version wins via single-binary distribution, no Python runtime dependency, and Go's excellent HTTP/networking standard library.

Complexity: Hard. Multi-protocol support, collection management, and auth handling require significant engineering.

Libraries: Bubble Tea + Bubbles (tabs, text inputs, lists) + Lip Gloss + Glamour (for response rendering)

5

Visualizing How Books Reference Each Other Across 3k Years

thiagolira.github.io
3 comments · 6:31 PM · View on HN
There are two parts to this project:

1) An LLM-powered pipeline to extract citations (books + authors) from books and resolve them against offline copies of Wikipedia and Goodreads. The result is data associating books/authors with other books/authors, with accurate bibliographical information spanning centuries.

2) A WebGPU + D3.js visualization tool, written with Claude Code, so I can handle all this data in the browser with a more or less comfortable experience for the viewer.

I spent some months on and off with this project, and the most challenging part was definitely getting accurate bibliographical information across centuries (original publication dates and so on). For that I wrote what is now a fairly complex LLM pipeline (I used DeepSeek V3.2) wired to offline Goodreads and Wikipedia databases, plus a fallback that actually uses the internet.

Hope you enjoy it! Open to suggestions on how to improve the system :)

Code is here: https://github.com/ThiagoLira/bookgraph-revisited

4

Microagentic Stacking – Manifesto for Reliable Agentic AI Architecture

github.com
0 comments · 3:12 AM · View on HN
I’ve spent the last couple of years deploying LLM agents in production environments, and I’ve consistently hit the same wall: the 'Cognitive Monolith' (or what I call the Big Ball of Mud AI).

We are currently seeing a lot of hype around 'autonomous agents' that are essentially 3,000-word prompts with access to 20 tools. In my experience, this doesn't scale. It’s impossible to unit test, observability is a nightmare, and the 'vibe-based' engineering makes it a liability for enterprise-grade software.

I wrote this manifesto to formalize a different approach: Microagentic Stacking (MAS).

The core idea is to apply classic software engineering principles—separation of concerns, strict I/O contracts, and atomic modularity—to the agentic stack. Instead of one god-agent, we build a stack of 'micro-agents' that:

1. Have a single, specialized responsibility.
2. Communicate through typed, validated interfaces.
3. Are individually testable and replaceable.
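A minimal sketch of what such a micro-agent could look like, using plain dataclasses as the typed contract (names are mine for illustration, not from the manifesto's repo):

```python
# Sketch of a micro-agent with a single responsibility and a typed,
# validated I/O contract. Names are illustrative, not from the repo.
from dataclasses import dataclass

@dataclass(frozen=True)
class SummaryRequest:
    text: str
    max_words: int

@dataclass(frozen=True)
class SummaryResponse:
    summary: str

def summarizer_agent(req: SummaryRequest) -> SummaryResponse:
    # Validate the contract at the boundary instead of trusting vibes.
    if not req.text:
        raise ValueError("empty input violates contract")
    words = req.text.split()[: req.max_words]
    return SummaryResponse(" ".join(words))
```

Because the interface is typed and the responsibility is atomic, the agent is individually testable and can be swapped out without touching the rest of the stack.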

I’m sharing this as an open-source manifesto because I believe we need to move from 'prompt alchemy' to 'agentic engineering' if we want these systems to be more than just cool demos.

I’d love to hear the community’s thoughts on:

- How you handle state management across multiple specialized agents.
- Where you see the trade-off between modularity and token latency.
- Whether you’ve found better ways to prevent 'agentic sprawl' in complex workflows.

The repo includes the core principles and a conceptual roadmap. Happy to dive into the technical details.

4

A modern alternative to OpenInsider, with more transaction types

13radar.com
1 comment · 3:03 AM · View on HN
Hi HN, I've been working on 13radar.com, a site focused on tracking institutional holdings. Today I’m releasing a beta version of our new Insider Trading Tracker.

Like many of you, I've used OpenInsider for years. It's a legendary tool, but the UI hasn't changed in decades, and the data classification can be broad. I wanted to build something that feels modern and digs deeper into the data.

How it differs from existing tools (OpenInsider / GuruFocus):

- Granularity: We parse SEC Form 4 filings to categorize transactions into finer types. Instead of just "Buy/Sell," we try to distinguish specific grants, exercises, and other transactions more clearly than OpenInsider.
- Data depth: We display more transaction types than GuruFocus (which often aggregates or hides specific non-open-market actions).
- Modern UX: Fast filtering, clean mobile support, and a responsive interface. No 1990s table layouts.

It's currently in beta; we are still tuning the parsing logic for edge cases in SEC filings. I'd love to get your feedback on the data accuracy and the UI. Is there a specific filter or transaction type you find missing in other tools that you'd like to see here?

https://www.13radar.com/insider/

Thanks!
3

Εἶδος – A non-Turing-complete language built on Plato's Theory of Forms

github.com
3 comments · 9:51 AM · View on HN
I've been reading Plato's texts and picking up some ancient Greek, and I had a useless thought experiment: what would a programming language look like under the constraints of 4th-century-BC Athens?

Εἶδος (Eidos — "Form") is one result. It's a declarative language called Λόγος where you don't execute code — you declare what exists. Forms belong to Kinds. Forms bear testimony. A law of correspondence maps petitions to answers. There are no loops, no conditionals, no mutation. It's intentionally not Turing-complete, aligned with Plato's rejection of the apeiron (the infinite).

It governs a real HTTP server (Ἱστός) where routes aren't matched by branching — they're recognized as Forms and answered according to law. An unrecognized path returns οὐκ ἔστιν ("it is not") — not an error, an ontological statement.

The project includes a parser that recognizes rather than executes, static verification expressed as philosophical propositions (Totality, Consistency, Well-formedness), Graphviz ontology diagrams, and a Socratic dialectic generator that examines the specification through the four phases of the elenchus.

The Jupyter notebook walks through everything interactively — from parsing the spec in polytonic Greek to petitioning the live server to watching Socrates interrogate the ontology.

https://github.com/realadeel/eidos

3

Yet another music player but written in Rust

github.com
0 comments · 7:59 PM · View on HN
Hey, I made a music player that supports both local music files and Jellyfin servers, and it has embedded Discord RPC support! It's still under development; I'd really appreciate feedback and contributions!
3

I tried to build a soundproof sleep capsule

lepekhin.com
0 comments · 12:57 PM · View on HN
Hi HN,

I've struggled with apartment noise for years, so I attempted to engineer a mechanical solution: a decoupled, mass-loaded sleep capsule.

I went down a deep rabbit hole involving:

- Mass Law vs. decoupling

- Building a prototype cube

- Accidentally creating a resonance chamber (my prototype amplified bass by ~10dB)

- Pivoting to acoustic metamaterials (Helmholtz resonators) and parametric CAD

The project was ultimately a failure in terms of silence, but a success in understanding acoustics and regaining a sense of agency. I wrote up the physics, the build process, and the mistakes here.

Happy to answer questions about the build.

3

Auditi – open-source LLM tracing and evaluation platform

github.com
0 comments · 1:37 PM · View on HN
I've been building AI agents at work and the hardest part isn't the prompts or orchestration – it's answering "is this agent actually good?" in production.

Tracing tells you what happened. But I wanted to know how well it happened. So I built Auditi – it captures your LLM traces and spans and automatically evaluates them with LLM-as-a-judge + human annotation workflows.

Two lines to get started:

  auditi.init(api_key="...")
  auditi.instrument()  # monkey-patches OpenAI/Anthropic/Gemini
Every API call is captured with full span trees, token usage, and costs. No code changes to your existing LLM calls.

The interesting technical bit: the SDK monkey-patches client.chat.completions.create() at runtime (similar to how OpenTelemetry auto-instruments HTTP libraries). It wraps streaming responses with proxy iterators that accumulate content and extract usage from the final chunk – so even streamed responses get full cost tracking without the user doing anything.
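The monkey-patch-and-wrap pattern itself is small. A generic illustration against a stand-in client (not Auditi's actual code; the real SDK targets the OpenAI/Anthropic/Gemini client classes and handles streaming):

```python
# Generic illustration of runtime monkey-patching: wrap a client method
# so every call is recorded without the caller changing any code.
# Stand-in client, not Auditi's actual implementation.
captured = []

class FakeCompletions:
    def create(self, **kwargs):
        # Pretend LLM response with usage metadata.
        return {"content": "hi", "usage": {"total_tokens": 7}}

def instrument(obj, method_name):
    original = getattr(obj, method_name)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        captured.append({"kwargs": kwargs, "usage": result["usage"]})
        return result
    setattr(obj, method_name, wrapper)   # shadow the method on the instance

client = FakeCompletions()
instrument(client, "create")
resp = client.create(model="x", messages=[])
```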

What makes this different from just tracing:

- Built-in evaluators – 7 managed LLM judges (hallucination, relevance, correctness, toxicity, etc.) run automatically on every trace
- Span-level evaluation – scores each step in a multi-step agent, not just the final output
- Human annotation queues – when you need ground truth, not just vibes
- Dataset export – annotated traces export as JSONL/CSV/Parquet for fine-tuning

Self-host with docker compose up.

I'd love feedback from anyone running AI agents or LLMs in production. What metrics do you actually look at? How do you decide if an agent response is "good enough"?

GitHub: https://github.com/deduu/auditi

2

I extract recipes from TikTok, Instagram, and the messy web

0 comments · 1:45 PM · View on HN
I kept losing recipes. You know how it goes — you're scrolling TikTok at midnight, see an amazing pasta dish, save it, and never find it again. So I built TasteBuddy to fix that for myself. What I didn't expect: parsing recipes from the internet is a rabbit hole that goes deep.

The thing is, recipe content is scattered everywhere in completely different formats. A food blog might have nice JSON-LD markup. A TikTok? Just someone talking over a video. An Instagram reel? Recipe buried in the comments. Pinterest? Links to blogs that died three years ago.

So I ended up building specialized extractors for each platform.

*Websites* are the "easy" case. I look for JSON-LD with `@type: Recipe` first — most food blogs have it, thanks to SEO plugins. But the real world is messy. I've seen duration fields as `PT30M`, `30 minutes`, `0:30`, and my personal favorite, just `half an hour`. About 30% of recipe URLs have no structured data at all, so I fall back to Gemini to make sense of the raw HTML.
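A minimal sketch of that duration normalization (the `parse_minutes` helper is illustrative, not TasteBuddy's actual code):

```python
import re

def parse_minutes(raw):
    """Normalize messy recipe durations to minutes. Handles the
    formats mentioned above: ISO-8601 ('PT30M', 'PT1H30M'),
    '30 minutes', clock-style '0:30', and 'half an hour'."""
    s = raw.strip().lower()
    # ISO-8601 duration, e.g. PT1H30M
    m = re.fullmatch(r"pt(?:(\d+)h)?(?:(\d+)m)?", s)
    if m and (m.group(1) or m.group(2)):
        return int(m.group(1) or 0) * 60 + int(m.group(2) or 0)
    # clock style h:mm
    m = re.fullmatch(r"(\d+):(\d{2})", s)
    if m:
        return int(m.group(1)) * 60 + int(m.group(2))
    # plain "NN minutes" / "NN mins"
    m = re.fullmatch(r"(\d+)\s*min(?:ute)?s?", s)
    if m:
        return int(m.group(1))
    # a few common phrases; anything else goes to the LLM fallback
    phrases = {"half an hour": 30, "an hour": 60, "a quarter of an hour": 15}
    return phrases.get(s)
```

Returning `None` for unrecognized strings is what makes the tiered fallback work: the cheap parser answers what it can, and only the leftovers cost LLM tokens.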

*TikTok* is where it gets fun. There's no recipe API. My pipeline resolves short URLs, then checks if the creator says something like "link in bio" (I detect this in five languages because German food TikTok is surprisingly massive). If I can find their website, great — I scrape the actual recipe from there. If not, I download the video via Apify and let Gemini analyze the frames. It works, but it's slow and expensive, so that's a Pro-only feature.

*Instagram and Facebook* — similar deal. oEmbed gets me the image, but the recipe is usually in the caption or comments. Same link-in-bio detection, same website resolution.

*Photos* are actually straightforward — screenshot of a recipe, photo of a cookbook page, whatever. Gemini's vision model handles those surprisingly well.

*One thing I'm proud of: the AI tiering.* Not every task needs a big model.

- Gemini Flash Lite handles 90% of the work – classifying content ("is this even a recipe?"), parsing ingredients, extracting recipe names from social media captions. Cheap, fast, good enough.
- Gemini Flash kicks in when structured data fails – parsing messy HTML, analyzing video frames, processing social media posts.
- Gemini Pro only for image generation (recipe share cards).
- text-embedding-004 for semantic search across your recipe collection.

This keeps my costs sane as a solo dev. Using Flash for everything would've been 10x more expensive with barely better results for the simple tasks.

*Stuff I learned the hard way:*

- JSON-LD in the wild is chaos. The spec is fine, but WordPress plugins are creative.
- "Link in bio" is how recipe distribution actually works on social media. Detecting that pattern is more valuable than trying to parse a video.
- AI as fallback beats AI as default. Structured data first, AI when it fails = 95%+ success at a fraction of the cost.
- Tier your models aggressively. Don't throw dollars at a problem that cents can solve.
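The "structured data first, AI when it fails" lesson can be sketched as a simple fallback chain (the extractor functions below are toy stand-ins, not the real pipeline):

```python
def extract_with_fallbacks(html, extractors):
    """Try each extractor in order, cheapest first. Each returns a
    recipe dict or None; the first hit wins, so the expensive LLM
    call only runs for the ~30% with no structured data."""
    for extract in extractors:
        recipe = extract(html)
        if recipe is not None:
            return recipe
    return None


# Toy extractors standing in for JSON-LD parsing and a Gemini call:
calls = []

def json_ld(html):
    calls.append("json_ld")
    return {"name": "Pasta"} if '"@type": "Recipe"' in html else None

def llm_fallback(html):
    calls.append("llm")
    return {"name": "Guessed"}

chain = [json_ld, llm_fallback]
```

The ordering is the whole trick: the cost profile falls out of trying deterministic parsers before probabilistic ones.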

*Stack:* Flutter (just me, indie dev), Supabase (Postgres + Deno Edge Functions), Gemini, Apify, PostHog.

Free with a Pro tier for video extraction and household sharing.

Happy to go deeper on any part of the extraction pipeline.

https://taste-buddy.app

2

Temporary Markdown sharing with a built-in slide mode #

tmplink.ponyo877.com favicontmplink.ponyo877.com
0 comments12:29 PMView on HN
I wanted a zero-friction way to share Markdown. GitHub Gists require an account, pastebins don't render Markdown, and anything else feels like overkill for content you only need for a few days.

tmplink: write Markdown in a split-pane editor with live preview, publish with one click, and get a shareable link that auto-deletes after 7 days. No signup. The feature I find most useful is slide mode: use `---` separators and your Markdown becomes a navigable presentation (rendered with Marp) with keyboard/swipe controls. Every link also auto-generates an OGP image for rich previews on Slack/X.
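The slide-splitting step can be sketched as follows, assuming Marp's `---` separator convention (illustrative, not tmplink's implementation; a real parser also has to handle frontmatter and setext headings):

```python
def split_slides(markdown):
    """Split Markdown into slides on '---' separator lines,
    skipping separators that appear inside fenced code blocks."""
    slides, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
        if line.strip() == "---" and not in_fence:
            slides.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    slides.append("\n".join(current).strip())
    return [s for s in slides if s]
```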

2

Eryx, a fast WASM-based Python sandbox with native extension support #

github.com favicongithub.com
1 comments2:29 PMView on HN
Eryx is an OSS Wasm-based Python sandbox with full CPython support, ~16ms startup, native extension support (numpy, etc.), and bindings for Python, JS, and Rust. There have been a lot of these submitted in the last week or two but I think this one has a few interesting features. Specifically, Eryx:

- uses CPython compiled to WASI, so you have full Python access

- pre-initializes and pre-compiles the Wasm, giving extremely fast startup times (~16ms)

- supports Python packages, both pure-Python and native extensions (such as `numpy` compiled to WASI), by relinking the Wasm at runtime and re-initializing

- implements the `ssl` module so you can make HTTP calls, and `httpx` or `requests` just work

- has full resource limiting (networking, filesystem, timeout, CPU and memory) based on Wasmtime and WASI

- supports mounting host directories, a virtual filesystem, or both

- supports persisting and resuming session state to and from bytes, for distributed execution

- supports 'secret scrubbing' similar to Deno Sandbox: the sandbox can't see secret values

- supports callbacks into the host

- supports streaming stdout/stderr and trace execution (so the host can see the progress of the executed script; useful for showing progress in long or slow scripts)

- has builtin MCP support, in two ways: it can connect to your MCP servers (using your Claude/Codex/Cursor config files) and add the tools as callbacks, and it has an MCP server built-in with a `run_python` tool, where the Python can use those other MCP servers' tools

There's a CLI you can use with `uvx pyeryx`, and bindings for Python, JavaScript, and Rust. The demo is available at https://demo.eryx.run - give it a try and let me know what you think.

2

AI-Templates for Obsidian Templater #

github.com favicongithub.com
1 comments9:21 AMView on HN
I developed AI-Templates for Obsidian Templater to support new knowledge development. The valuable features:

* ready-to-use templates (with default settings)
* structured LLM prompting
* efficient LLM prompting via aspect management
* flexible LLM output management
2

Yan – Glitch Art Photo/Live Editor #

yan.yichenlab.com faviconyan.yichenlab.com
0 comments4:19 AMView on HN
Everything evolves in digitality, and deconstructs in logic.

Tired of filters that make everyone look like a glazed donut? Same.

Yan is not another beauty app. It's a digital chaos engine that treats your pixels like they owe it money. We don't enhance photos — we interrogate them at the binary level until they confess their true nature.

[What We Actually Do] • Luma Stretch: Grab your image by its light and shadow, then yeet it into oblivion. Speed lines included. • Pixel Sort: Let gravity do art. Pixels fall, colors cascade, Instagram influencers cry. • RGB Shift: That drunk 3D glasses effect, but on purpose. Your eyes will thank us. Or sue us. • Block Jitter: Ctrl+Z had a nightmare. This is what it dreamed.
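For the curious, pixel sorting has a classic minimal form: within each row, contiguous bright runs get sorted by luminance, producing the cascading streaks. A sketch of that idea (not Yan's actual algorithm), with pixels as (r, g, b) tuples:

```python
def pixel_sort_rows(pixels, threshold=60):
    """Sort contiguous runs of bright pixels in each row by luminance.
    `pixels` is a list of rows of (r, g, b) tuples."""
    def luma(p):
        r, g, b = p
        return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights

    out = []
    for row in pixels:
        row = list(row)
        i = 0
        while i < len(row):
            if luma(row[i]) > threshold:
                # find the end of this bright run, then sort it
                j = i
                while j < len(row) and luma(row[j]) > threshold:
                    j += 1
                row[i:j] = sorted(row[i:j], key=luma)
                i = j
            else:
                i += 1
        out.append(row)
    return out
```

The threshold is what keeps the effect interesting: dark regions stay put, so the sorted streaks appear to "fall" out of highlights.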

[Why Yan?] Because "vintage filter #47" is not a personality. Because glitch is not a bug — it's a feature. Because sometimes the most beautiful thing you can do to a photo is break it.

Warning: Side effects may include artistic awakening, filter addiction withdrawal, and an uncontrollable urge to deconstruct everything.

Your camera roll will never be boring again.

1

TapTap AI – Use Your OpenClaw Agent from Apple Watch/AirPods/CarPlay #

gettaptap.ai favicongettaptap.ai
0 comments2:38 PMView on HN
Hey HN,

I built TapTap because I was tired of being chained to my desk to use my AI agent.

What it is: TapTap connects your Apple Watch (+ AirPods/CarPlay) to your OpenClaw instance. Tap your wrist, speak a command, get a response. That's it.

Why I built it: I've been using OpenClaw for weeks — it knows my workflows, data, preferences. But the moment I leave my desk? Useless. I can't access it while walking, driving, or at the gym. So I built a bridge.

Current state:

• TestFlight launching this week
• Free (for now — seeing if people actually use it first)
• OpenClaw plugin required (setup guide included)
• iOS 17+ / watchOS 10+

What's next:

• Support for other AI frameworks
• Complications + Shortcuts integration

Why Show HN: I'm curious if this resonates with other people who've customized their AI agents.

Try it: https://gettaptap.ai

1

Clap.Net – Source generated CLI Parsing for .NET (Inspired by Clap-Rs) #

github.com favicongithub.com
0 comments2:41 PMView on HN
Clap.Net is my attempt at bringing the excellent Rust clap crate to .NET as a near 1:1 port.

The goal is API and behavioral parity where it makes sense while staying idiomatic to .NET and fully compatible with .NET AOT.

This is my first public library, so please go easy on me! I’m sure there are design decisions I’d approach differently with more experience.

The project is still evolving but should be ready to use for real-world CLI apps.

AI Disclaimer: I have used Claude to finish some features today, before that it was a hand-coded effort.

1

ChatProjects Open-source WordPress plugin for document RAG and chat #

github.com favicongithub.com
0 comments11:42 AMView on HN
A client needed their small team to pull deliverables and timelines out of RFPs - they wanted to chat with the documents instead of reading 200-page PDFs. They were already on WordPress with team accounts, so that was the obvious platform. Can we make WordPress do this? Turns out yes, and it's not as cursed as it sounds.

ChatProjects is a free GPL-licensed WordPress plugin for multi-provider AI chat (OpenAI, Claude, Gemini, DeepSeek, 100+ models via OpenRouter) and document RAG. Self-hosted, bring your own API keys, no middleware, no data leaving your server except the API calls themselves.

The RAG uses OpenAI Vector Stores and the Responses API, and honestly it works way better than I expected. Upload your docs (PDF, DOCX, code files, etc.) and they get chunked and embedded into a Vector Store that's created per project. Ask a question and file_search finds the relevant chunks and generates an answer with citations. You don't need to run your own vector DB or mess with embeddings or chunk sizes - OpenAI handles all of it. For the "I just need to search and summarize my documents" use case it's remarkably good out of the box. Storage is about $0.10/GB/day on OpenAI's side.

Some notes:

- Yes, this was vibe-coded (what isn't nowadays?). It's been running in production and it works. I'm sure there are things that would make a senior engineer wince. PRs welcome.

- WordPress isn't the cool choice, I know. But there are 800 million WordPress sites out there, and a lot of them are run by people who need AI tools but aren't going to spin up a Next.js app with Pinecone and LangChain. ChatGPT/Claude Teams is pricey for medium-sized teams when all they need is document analysis and basic chat. WordPress admin is the IDE for the rest of us.

- API keys are encrypted with AES-256-CBC, and messages are stored locally in your WP database. No server in between you and the AI providers.

GitHub: https://github.com/chatprojects-com/chatprojects
WordPress.org: https://wordpress.org/plugins/chatprojects/

Happy to answer questions, and appreciate any feedback!

1

Idea Forge – Multi-model product validation(validated an OpenClaw idea) #

ideas.sparkngine.com faviconideas.sparkngine.com
0 comments5:43 AMView on HN
I built a product validation service that runs startup ideas through 4 frontier models (GPT-5.2, Gemini 2.5 Pro, Claude Opus, Claude Sonnet) across 16 perspectives to surface disagreements and blind spots.

Why this exists: Most founders get either cheerleading ("great idea!") or generic advice. I wanted adversarial multi-model validation—where models argue with each other about your idea's viability.

How it works:

1. Research phase: 6 platforms (Reddit, G2, HN, Twitter, Product Hunt, YouTube) for competitor analysis + pain validation

2. Fanout: 16 expert perspectives (4 roles × 4 models: Builder, Skeptic, Operator, Growth)

3. Synthesis: Consolidate into 5 deliverables (Executive Summary, PRD, Scorecard, Synthesis Notes, Validation Plan)

4. Delivery: PDF report in 24 hours, $39
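The fanout step is a straightforward Cartesian product of roles and models. A sketch (role and model names come from the post; the prompt wording is illustrative):

```python
from itertools import product

ROLES = ["Builder", "Skeptic", "Operator", "Growth"]
MODELS = ["GPT-5.2", "Gemini 2.5 Pro", "Claude Opus", "Claude Sonnet"]

def fanout(idea):
    """4 roles x 4 models = 16 perspective tasks, each of which
    can be dispatched concurrently."""
    return [
        {
            "role": role,
            "model": model,
            "prompt": f"As a {role}, critique this idea: {idea}",
        }
        for role, model in product(ROLES, MODELS)
    ]

tasks = fanout("OpenClaw workflow observability")
```

Running every role on every model is what surfaces the disagreements: the same "Skeptic" prompt across four models exposes which objections are model-specific and which are consensus.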

Sample validation: Agent Ops (OpenClaw workflow observability) - GREENLIGHT, 7.5/10 confidence. Models agreed on clear pain point, defensible moat via OpenClaw integration. Sample report: https://ideas.sparkngine.com

First paying customer delivered tonight: [Hotel marketing analytics client]. Verdict: PROCEED WITH CAUTION (5.8/10). Real assessment—not cheerleading. Flagged small TAM, long sales cycles, identity resolution risk as key blockers.

Tech stack: Built on OpenClaw for multi-agent orchestration, uses Brave Search API for research, Gemini Pro for synthesis, mix of frontier models for diverse perspectives.

Asking HN: Does this actually help founders make better decisions, or am I solving the wrong problem? Is $39 the right price point for rigorous validation?

Link: https://ideas.sparkngine.com

1

I created an app to remove Reels, now on iOS too #

apps.apple.com faviconapps.apple.com
0 comments1:39 PMView on HN
Last year I built an Android app to block Reels and Shorts while keeping "healthy" features like stories and DMs. I didn't want to block the whole app. I just wanted to message friends and see their posts without losing an hour scrolling on Reels. For context, this was the HN post for the Android version: https://news.ycombinator.com/item?id=44923520

When people asked for an iOS version, I thought it was not possible. Apple is way more restrictive and doesn't allow that level of app access.

But I ended up building the iOS app using a different approach. On iOS, it uses WebApps. It's not exactly the same experience as the native app, but it works surprisingly well.

I also combined it with iOS Shortcuts to auto-redirect the native apps to WebApps, so I can keep Instagram installed for notifications but get sent to the WebApp without Reels and any feed when I tap.

Curious what you think, especially about the WebApp approach on iOS.

1

I reverse-engineered Mewgenics (game) to build a luck/RNG calculator #

mewgenius.com faviconmewgenius.com
0 comments1:40 PMView on HN
Hi HN,

I’ve been playing the beta of Mewgenics (the new game by Edmund McMillen, creator of The Binding of Isaac) and got frustrated by the opaque RNG mechanics. The game has a "Luck" stat, but it wasn't clear how it affected success rates.

So, I decided to dig into the game files (classes+Misc.txt, events.csv) to reverse-engineer the hidden formulas.

I built MewGenius to visualize this data. It's a static site built with Astro + React.

Technical challenges I solved:

Reverse-engineering the RNG: Discovered a hidden "Reroll" system where Luck 15+ gives you two dice rolls, drastically changing the probability curve.
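Read that way, the reroll is a best-of-two draw: a success chance p becomes 1 - (1-p)^2. A sketch of the model (my reading of the mechanic, not the game's verbatim code):

```python
def success_chance(p, luck):
    """Probability of at least one success when Luck 15+ grants
    a second roll: 1 - (1 - p)**2 instead of p."""
    rolls = 2 if luck >= 15 else 1
    return 1 - (1 - p) ** rolls
```

So a 50% check jumps to 75% at Luck 15, which is why the probability curve changes so drastically at that breakpoint.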

Genetic Simulation: The game has dominant/recessive traits. I built a simulator to flag risky breeding pairs (inbreeding/diseases).

Data Pipeline: Wrote Python ETL scripts to parse the raw CSV/Text game dumps into structured JSON for the frontend.

It's still a work in progress, but the Luck Calculator is fully functional. Would love any feedback on the tool or the UX!

1

Discord Agent Gateway #

github.com favicongithub.com
0 comments2:57 AMView on HN
Made this as a side-quest, simple way to allow agents to self-register, interact, and poll a Discord channel (alongside human interactions).

Largely interested in seeing how groups of agents can self-organize/work on hard problems, tried Discourse as well but the real-time chat view in Discord was more fun.

1

Birdy: TUI for X #

github.com favicongithub.com
0 comments6:32 PMView on HN
Birdy is a TUI for X. It internally uses Claude Code plus the bird CLI developed by Peter Steinberger. I also added multi-account support to prevent rate limits.

It's now my daily driver for monitoring AI news and market sentiment, since X is a great source for tracking real-time events.

Birdy summarizes your home feed and answers questions by agentically searching and browsing X.

1

Server Compass – GUI SSH client, deploy like Vercel but VPS pricing #

servercompass.app faviconservercompass.app
0 comments1:37 PMView on HN
Hey HN,

I built Server Compass because I got tired of two extremes: paying $200+/month to Vercel/Railway for a polished deploy experience, or wrestling with self-hosted panels (Coolify, CapRover, Dokploy) that eat up half my VPS resources just to run their dashboard.

The core idea: A GUI SSH client that gives you Vercel-like deployment UX on your own VPS. Nothing gets installed on your server except your apps – every command runs over SSH from your desktop.

What it does:

  - 1-click deploys from GitHub repos or 160+ pre-built templates (Postgres, WordPress, Ghost, Supabase, n8n, etc.)
  - Zero-downtime deployments with blue-green strategy
  - Domain + SSL management (Let's Encrypt auto-renewal, Cloudflare tunnel support)
  - Real-time logs streamed in-app, no more ssh && docker logs -f
  - Visual cron jobs, env vars, file browser – basically everything you'd SSH in for
  - GitHub Actions CI/CD – builds use your free GH Actions minutes, not your server CPU
  - Full terminal when you need raw SSH access
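The zero-downtime blue-green swap in the list above can be sketched generically (this is the standard symlink-swap pattern, not Server Compass internals; writing a RELEASE file stands in for the real deploy step):

```python
import os

def blue_green_switch(app_dir, new_release):
    """Deploy into the inactive slot ('blue' or 'green'), then
    atomically repoint a 'current' symlink so a proxy never serves
    a half-written directory. Returns the newly active slot."""
    current = os.path.join(app_dir, "current")
    active = (os.path.basename(os.path.realpath(current))
              if os.path.islink(current) else "green")
    target = "blue" if active == "green" else "green"
    slot = os.path.join(app_dir, target)
    os.makedirs(slot, exist_ok=True)
    with open(os.path.join(slot, "RELEASE"), "w") as f:
        f.write(new_release)  # stand-in for the real deploy step
    # Build the new symlink beside the old one, then atomically
    # rename over it (os.replace is atomic on POSIX).
    tmp = current + ".tmp"
    if os.path.islink(tmp) or os.path.exists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, current)
    return target

# Example: two deploys alternate slots, 'current' always valid.
import tempfile
app = tempfile.mkdtemp()
first = blue_green_switch(app, "v1")
second = blue_green_switch(app, "v2")
```

The old slot stays intact after the swap, which is also what makes instant rollback cheap: repoint the symlink back.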
What makes it different from other GUI SSH clients:

  Traditional SSH clients (Termius, Royal TSX) give you a nicer terminal. Server Compass wraps SSH with a deployment-focused UI – think "Vercel but the backend is just your VPS."
Unlike Coolify/CapRover, there's no agent or dashboard consuming your VPS resources. Every command goes directly from your Mac/Windows/Linux laptop to your server over SSH. If I disappear tomorrow, your deployments still work.

Pricing: $19 one-time. No subscriptions.

Happy to answer questions about the architecture or demo specific workflows.