Daily Show HN


Show HN for February 26, 2026

76 posts
322

Terminal Phone – E2EE Walkie Talkie from the Command Line #

gitlab.com
86 comments · 10:40 AM · View on HN
TerminalPhone is a single, self-contained Bash script that provides anonymous, end-to-end encrypted voice and text communication between two parties over the Tor network. It operates as a walkie-talkie: you record a voice message, and it is compressed, encrypted, and transmitted to the remote party as a single unit. You can also send encrypted text messages during a call. No server infrastructure, no accounts, no phone numbers. Your Tor hidden service .onion address is your identity.
137

Unfudged – version control without commits #

unfudged.io
90 comments · 9:30 PM · View on HN
I built unf after I pasted a prompt into the wrong agent terminal and it overwrote hours of hand-edits across a handful of files. Git couldn't help because I hadn't committed my in-progress work. I wanted something that recorded every save automatically so I could rewind to any point in time, and I wanted to make it difficult for an agent to permanently screw anything up, even with an errant rm -rf.

unf is a background daemon that watches directories you choose (via CLI) and snapshots every text file on save. It stores file contents in an object store, tracks metadata in SQLite, and gives you a CLI to query and restore any version. The install also includes a UI for exploring the history through time.

The tool skips binaries and respects `.gitignore` if one exists. The interface borrows from git, so it should feel familiar: `unf log`, `unf diff`, `unf restore`.

I say "UN-EF" vs U.N.F, but that's for y'all to decide: I started by calling the project Unfucked and got unfucked.ai, which if you know me and the messes I get myself into, is a fitting purchase.

The CLI command is `unf`, and the Tauri desktop app is called "Unfudged", the clean version. I didn't want to force folks to have the rude name in their apps, window headers, etc. You can rag on me for my dad vibes.

How it works: https://www.unfudged.io/tech (summary below)

The daemon uses FSEvents on macOS and inotify on Linux. When a file changes, `unf` hashes the content with BLAKE3 and checks whether that hash already exists in the object store — if it does, it just records a new metadata entry pointing to the existing blob. If not, it writes the blob and records the entry. Each snapshot is a row in SQLite. Restores read the blob back from the object store and overwrite the file, after taking a safety snapshot of the current state first (so restoring is itself reversible).
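The hash-then-dedupe flow above can be sketched in a few lines of Python. This is an illustrative sketch, not unf's actual internals: SHA-256 stands in for BLAKE3 so the code stays stdlib-only, and the schema is invented.

```python
import hashlib
import sqlite3
import time
from pathlib import Path

def snapshot(db: sqlite3.Connection, store: Path, file: Path) -> str:
    """Record a snapshot of `file`, writing the blob only if it is new."""
    content = file.read_bytes()
    digest = hashlib.sha256(content).hexdigest()  # unf uses BLAKE3
    blob = store / digest
    if not blob.exists():            # dedupe: identical content -> one blob
        blob.write_bytes(content)
    db.execute(
        "INSERT INTO snapshots (path, hash, ts) VALUES (?, ?, ?)",
        (str(file), digest, time.time()),
    )
    db.commit()
    return digest
```

Saving the same content twice writes one blob but two metadata rows, which is what makes every-save snapshotting cheap.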

There are two processes. The core daemon does the real work of managing FSEvents/inotify subscriptions across multiple watched directories and writing snapshots. A sentinel watchdog supervises it, kept alive and aligned by launchd on macOS and systemd on Linux. If the daemon crashes, the sentinel respawns it and reconciles any drift between what you asked to watch and what's actually being watched. It was hard to build the second daemon because it felt like conceding that the core wasn't solid enough, but I didn't want to ship a tool that demanded perfection to deliver on the product promise, so the sentinel is the safety net.

Fingers crossed: I haven't seen it crash in over a week of personal use on my Mac. But I don't want to trigger "works for me" trauma.

The part I like most: in the UI, I enjoy viewing files through time. You can select a time range and filter your projects on a histogram of activity. That has been invaluable for seeing what the agent was doing.

On the CLI, the commands are composable. Everything outputs to stdout so you can pipe it into whatever you want. I use these regularly and AI agents are better with the tool than I am:

  # What did my config look like before we broke it?
  unf cat nginx.conf --at 1h | nginx -t -c /dev/stdin

  # Grep through a deleted file
  unf cat old-routes.rs --at 2d | grep "pub fn"

  # Count how many lines changed in the last 10 minutes
  unf diff --at 10m | grep '^[+-]' | wc -l

  # Feed the last hour of changes to an AI for review
  unf diff --at 1h | pbcopy

  # Compare two points in time with your own diff tool
  diff <(unf cat app.tsx --at 1h) <(unf cat app.tsx --at 5m)

  # Restore just the .rs files that changed in the last 5 minutes
  unf diff --at 5m --json | jq -r '.changes[].file' | grep '\.rs$' | xargs -I{} unf restore {} --at 5m

  # Watch for changes in real time
  watch -n5 'unf diff --at 30s'

What was new for me: I came to Rust in Nov. 2025, honestly because of HN enthusiasm and some FOMO. No regrets. I enjoy the language enough that I'm now working on custom clippy lints to enforce functional programming practices. This project was also my first Apple-notarized DMG, my first Homebrew tap, and my second Tauri app (first one I've shared).

Install & Usage:

  > brew install cyrusradfar/unf/unfudged

Then run `unf watch` in a directory. `unf help` covers the details (or ask your agent to coach you).
120

Deff – side-by-side Git diff review in your terminal #

github.com
66 comments · 5:54 PM · View on HN
deff is an interactive Rust TUI for reviewing git diffs side-by-side with syntax highlighting and added/deleted line tinting. It supports keyboard/mouse navigation, vim-style motions, in-diff search (/, n, N), per-file reviewed toggles, and both upstream-based and explicit --base/--head comparisons. It can also include uncommitted + untracked files (--include-uncommitted) so you can review your working tree before committing.

Would love to get some feedback.

58

ZSE – Open-source LLM inference engine with 3.9s cold starts #

github.com
8 comments · 1:15 AM · View on HN
I've been building ZSE (Z Server Engine) for the past few weeks — an open-source LLM inference engine focused on two things nobody has fully solved together: memory efficiency and fast cold starts.

The problem I was trying to solve: Running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts — which kills serverless and autoscaling use cases.

What ZSE does differently:

Fits 32B in 19.3 GB VRAM (70% reduction vs FP16) — runs on a single A100-40GB

Fits 7B in 5.2 GB VRAM (63% reduction) — runs on consumer GPUs

Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B — vs 45s and 120s with bitsandbytes, ~30s for vLLM

All benchmarks verified on Modal A100-80GB (Feb 2026)

It ships with:

OpenAI-compatible API server (drop-in replacement)

Interactive CLI (zse serve, zse chat, zse convert, zse hardware)

Web dashboard with real-time GPU monitoring

Continuous batching (3.45× throughput)

GGUF support via llama.cpp

CPU fallback — works without a GPU

Rate limiting, audit logging, API key auth

Install:

  pip install zllm-zse
  zse serve Qwen/Qwen2.5-7B-Instruct

For fast cold starts (one-time conversion):

  zse convert Qwen/Qwen2.5-Coder-7B-Instruct -o qwen-7b.zse
  zse serve qwen-7b.zse   # 3.9s every time

The cold start improvement comes from the .zse format storing pre-quantized weights as memory-mapped safetensors — no quantization step at load time, no weight conversion, just mmap + GPU transfer. On NVMe SSDs this gets under 4 seconds for 7B. On spinning HDDs it'll be slower.
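The load path can be illustrated with Python's stdlib mmap. This is a generic sketch of memory-mapped weight access, not ZSE's actual format: the file layout and function name are invented.

```python
import mmap
import struct
from pathlib import Path

def load_tensor(path: Path, offset: int, count: int) -> tuple:
    """Map the weight file and read `count` float32 values at byte `offset`
    without first copying the whole file into RAM."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Pages are faulted in lazily by the OS as they are touched,
        # which is why there is no fixed "load the model" step.
        return struct.unpack_from(f"<{count}f", mm, offset)
```

The same idea at GPU scale (mmap plus a device transfer, with quantization already done offline) is what removes the 45-120 s conversion work from the startup path.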

All code is real — no mock implementations. Built at Zyora Labs. Apache 2.0.

Happy to answer questions about the quantization approach, the .zse format design, or the memory efficiency techniques.

43

Mission Control – Open-source task management for AI agents #

github.com
16 comments · 1:12 PM · View on HN
I've been delegating work to Claude Code for the past few months, and it's been genuinely transformative, but managing multiple agents doing different things became chaos. No tool existed for this workflow, so I built one.

The Problem

When you're working with AI agents (Claude Code, Cursor, Windsurf), you end up in a weird situation:

- You have tasks scattered across your head, Slack, email, and the CLI
- Agents need clear work items, context, and role-specific instructions
- You have no visibility into what agents are actually doing
- Failed tasks just... disappear. No retry, no notification
- Each agent context-switches constantly because you're hand-feeding them work

I was manually shepherding agents, copying task descriptions, restarting failed sessions, and losing track of what needed to be done next. It felt like hiring expensive contractors but managing them like a chaotic experiment.

The Solution

Mission Control is a task management app purpose-built for delegating work to AI agents. It's got the expected stuff (Eisenhower matrix, kanban board, goal hierarchy) but built from the assumption that your collaborators are Claude, not humans.

The killer feature is the autonomous daemon. It runs in the background, polls your task queue, spawns Claude Code sessions automatically, handles retries, manages concurrency, and respects your cron-scheduled work. One click: your entire work queue activates.
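The poll-and-spawn loop can be sketched with a file-based queue. This is a minimal illustration under the assumption that tasks live as JSON files (Mission Control's actual schema and spawn commands will differ):

```python
import json
from pathlib import Path

def next_pending(task_dir: Path):
    """Return the first task file whose status is 'pending', oldest first."""
    for f in sorted(task_dir.glob("*.json")):
        task = json.loads(f.read_text())
        if task.get("status") == "pending":
            return f, task
    return None, None

def claim(task_file: Path, task: dict) -> None:
    """Mark a task in-progress before handing it to an agent session."""
    task["status"] = "in_progress"
    task_file.write_text(json.dumps(task))
    # A real daemon would now spawn an agent session for task["description"]
    # and apply retry/concurrency limits around it.
```

The daemon would call `next_pending` on a timer, `claim` each task, and cap how many claimed tasks run at once.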

The Architecture

- Local-first: Everything lives in JSON files. No database, no cloud dependency, no vendor lock-in.
- Token-optimized API: The task/decision payloads are ~50 tokens vs ~5,400 unfiltered. Matters when you're spawning agents repeatedly.
- Rock-solid concurrency: Zod validation + async-mutex locking prevents corruption under concurrent writes.
- 193 automated tests: This thing has to be reliable. It's doing unattended work.

The app is Next.js 15 with 5 built-in agent roles (researcher, developer, marketer, business-analyst, plus you). You define reusable skills as markdown that get injected into agent prompts. Agents report back through an inbox + decisions queue.

Why Release This?

A few people have asked for access, and I think it's genuinely useful for anyone delegating to AI. It's MIT licensed, open source, and actively maintained.

What's Next

- Human collaboration (sharing tasks with real team members)
- Integrations with GitHub issues and email inboxes
- Better observability dashboard for daemon execution
- Custom agent templates (currently hardcoded roles)

If you're doing something similar—delegating serious work to AI—check it out and let me know what's broken.

GitHub: https://github.com/MeisnerDan/mission-control

34

OpenSwarm – Multi‑Agent Claude CLI Orchestrator for Linear/GitHub #

github.com
17 comments · 2:19 AM · View on HN
I built OpenSwarm because I wanted an autonomous "AI dev team" that can actually plug into my real workflow instead of running toy tasks. OpenSwarm orchestrates multiple Claude Code CLI instances as agents to work on real Linear issues. It:

• pulls issues from Linear and runs a Worker/Reviewer/Test/Documenter pipeline
• uses LanceDB + multilingual-e5 embeddings for long-term memory and context reuse
• builds a simple code knowledge graph for impact analysis
• exposes everything through a Discord bot (status, dispatch, scheduling, logs)
• can auto-iterate on existing PRs and monitor long-running jobs

Right now it's powering my own solo dev workflow (trading infra, LLM tools, other projects). It's still early, so there are rough edges and a lot of TODOs around safety, scaling, and better task decomposition.

I'd love feedback on:

• what feels missing for this to be useful to other teams
• failure modes you'd be worried about in autonomous code agents
• ideas for better memory/knowledge graph use in real-world repos

Repo: https://github.com/Intrect-io/OpenSwarm

Happy to answer questions and hear brutal feedback.
33

Better Hub – A better GitHub experience #

better-hub.com
31 comments · 10:08 AM · View on HN
Hey HN,

I’m Bereket, founder of Better Auth. Our team spends a huge amount of time on GitHub every day. Like anyone who’s spent enough time there, I’ve always wished for a much better GitHub experience.

I’ve asked a lot of people to do something about it, but it seems like no one is really tackling GitHub directly.

A couple of weeks ago, I saw a tweet from Mitchell (HashiCorp) complaining about the repo main page. That became the trigger. I decided to start hacking on a prototype to see how far I could push an alternative interface using GitHub’s APIs.

Within a week, I genuinely started using it as my default, same with the rest of our team. After fixing a few rough edges, I decided to put it out there.

A few things we’re trying to achieve:

- UI/UX rethink: A redesigned repo home, PR review flow, and overview pages focused on signal over noise. Faster navigation and clearer structure.

- Keyboard-first workflow: ⌘K-driven command center, ⌘/ for global search, ⌘I opens “Ghost,” an AI assistant, and more.

- Better AI integration: Context-aware AI that understands the repo, the PR you’re viewing, and the diff you’re looking at.

- New concepts: Prompt Requests, self-healing CI, auto-merge with automatic conflict resolution, etc.

It’s a simple Next.js server talking to the GitHub API, with heavy caching and local state management.

We’re considering optional git hosting (in collaboration with teams building alternative backends), but for now the experiment is: how much can we improve without replacing GitHub?

This is ambitious and very early. The goal is to explore what a more modern code collaboration experience could look like, and make it something we can all collaborate on.

I’d love your feedback on what you think should be improved about GitHub overall.

8

Librarian – Cut token costs by up to 85% for LangGraph and OpenClaw #

uselibrarian.dev
7 comments · 6:11 PM · View on HN
Hi HN,

I'm building Librarian (https://uselibrarian.dev/), an open-source (MIT) context management tool that stops AI agents from burning tokens by blindly re-reading their entire conversation history on every turn.

The Problem: If you're building agentic loops in frameworks like LangGraph or OpenClaw, you hit two walls fast:

Financial Cost: Token usage scales quadratically over long conversations. Passing the whole history every time gets incredibly expensive.

Context Rot: As the context window fills up, the LLM suffers from the "Lost in the Middle" effect. Response latency spikes, and reasoning accuracy drops.
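The quadratic cost in the first wall is easy to see: if each turn adds roughly m tokens and the full history is re-sent every turn, turn i costs about i·m input tokens, so n turns cost m·n(n+1)/2 in total. The numbers below are illustrative, not Librarian's benchmarks:

```python
# 100 tokens added per turn, 50 turns, full history re-sent every turn.
m, n = 100, 50
total = sum(i * m for i in range(1, n + 1))
assert total == m * n * (n + 1) // 2   # 127,500 tokens: O(n^2), not O(n)
```

A linear scheme (send only what's relevant each turn) would cost closer to n·m = 5,000 tokens over the same conversation.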

The standard workaround is vector search (RAG) over past messages, but that completely loses temporal logic and conversational dependencies.

How Librarian Fixes This: We replaced brute-force context windowing with a lightweight reasoning pipeline:

Index: After a message, a smaller model asynchronously creates a compressed summary (~100 tokens), building an index of the conversation.

Select: When a new prompt arrives, Librarian reads the summary index and reasons about which specific historical messages are actually relevant to the current turn.

Hydrate: It fetches only those selected messages and passes them to the responder.
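The three stages can be sketched as plain functions. This is a toy: a keyword-overlap selector stands in for Librarian's model-based reasoning step, and all names are illustrative.

```python
def index(history: list[str]) -> dict[int, str]:
    """Stage 1: one compressed summary per message (here: first 8 words)."""
    return {i: " ".join(m.split()[:8]) for i, m in enumerate(history)}

def select(summaries: dict[int, str], prompt: str, k: int = 2) -> list[int]:
    """Stage 2: pick the k messages whose summaries best match the prompt."""
    words = set(prompt.lower().split())
    ranked = sorted(
        summaries,
        key=lambda i: len(words & set(summaries[i].lower().split())),
        reverse=True,
    )
    return sorted(ranked[:k])  # restore chronological order

def hydrate(history: list[str], chosen: list[int]) -> list[str]:
    """Stage 3: fetch only the selected full messages for the responder."""
    return [history[i] for i in chosen]
```

Note the hydrate step returns full messages in their original order, which is how temporal logic survives even though only a subset of the history is sent.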

The Results: Instead of passing 2,000+ tokens of noise, you pass a highly curated context of ~800 tokens. In our 50-turn benchmarks, this reduces token costs by up to 85% while actually increasing answer accuracy (82% vs 78% for brute-force) because the distracting noise is removed. It currently works as a drop-in integration for LangGraph and OpenClaw.

I'd love for you to check out the benchmark suite, try the integrations, and tear the methodology apart. I'll be hanging out in the comments to answer questions, debug, or hear why this approach is terrible. Thanks!

7

Browser-based .NET IDE with visual designer, NuGet packages, code share #

xaml.io
0 comments · 5:56 PM · View on HN
Hi HN, I'm Giovanni, founder of Userware. We built XAML.io, a free browser-based IDE for C# and XAML that compiles and runs .NET projects entirely client-side via WebAssembly. No server-side build step.

The link above opens a sample project using Newtonsoft.Json. Click Run to compile and execute it in your browser. You can edit the code, add NuGet packages, and share your project via a URL.

What's new in v0.6:

- NuGet package support (any library compatible with Blazor WebAssembly)
- Code sharing via URL with GitHub-like forking and attribution
- XAML autocompletion, AI error fixing, split editor views

The visual designer is the differentiator: 100+ drag-and-drop controls for building UIs. But the NuGet and sharing features work even if you ignore the designer entirely and just write C# code.

XAML.io is currently in tech preview. It's built on OpenSilver (https://opensilver.net), a from-scratch reimplementation of the WPF API (subset) using modern .NET, WebAssembly, and the browser DOM. It's open-source and has been in development for over 12 years (started as CSHTML5 in 2013, rebranded to OpenSilver in 2020).

Limitations: one project per solution, no C# IntelliSense yet (coming soon), no debugger yet, WPF compatibility improvements underway, desktop browsers recommended.

Full details and screenshots: https://blog.xaml.io/post/xaml-io-v0-6

Happy to answer questions about the architecture, WebAssembly compilation pipeline, or anything else.

7

Conjure – 3D printed objects from text description only #

conjure.tech
5 comments · 5:49 PM · View on HN
I like to print, but I'm no artist. So I thought of turning my full pipeline of text -> concept images -> 3D mesh -> postprocessing for the specific 3D printing workflow -> ordering it online into a nice UI. You can obviously also just download the STL and print it yourself!
6

Codex builds a working NES Emulator in one hour #

github.com
4 comments · 9:38 AM · View on HN
Hi folks! I know NES emulators have been implemented countless times, in practically every language imaginable.

However, having an LLM fully replicate the spec purely from memory—without referencing existing code—is still a significant challenge. It requires the underlying model to have strong anti-hallucination capabilities and solid long-term planning to keep from going astray. Because of this, building an NES emulator makes for an excellent LLM stress test.

Here is how the emulator was built:

Data Gathering: I asked Codex to download the necessary developer manuals and test suites. It was strictly prohibited from searching for reference implementations online.

Development: I instructed Codex to build the emulator until all test suites passed. This process was mostly hands-free; I only chimed in to encourage it to continue when it paused.

First Draft: After just 4-5 prompts, Codex delivered a functional, pure-Python emulator—though it ran at a sluggish 7 FPS.

Optimization: Asking Codex to optimize the app completely on its own didn't work this time. Instead, I had it generate a flamegraph, which identified the PPU update as the bottleneck. I then instructed Codex to rewrite the PPU in Cython without breaking the passing tests.

Overall, I'm incredibly impressed by Codex. I already knew it was capable of the task, but the speed was astonishing. It finished the project in under an hour, using merely 2% of my weekly Pro quota.

While the NES might be a relatively easy system to emulate, I think emulation could serve as a fantastic benchmark for testing future LLMs.

5

NotBuiltYet – Open-source library of civilisation problems worth solving #

shivankar-madaan.github.io
0 comments · 4:03 PM · View on HN
NotBuiltYet is an open-source, community-vetted repository of real-world problems that AI could solve but nobody's building yet. Each idea goes through a process:

1. Someone identifies a real inefficiency: something that could be 10x or 90% better with AI
2. The community researches market size, existing solutions, technical feasibility, and potential impact
3. Domain experts vote, critique, and refine
4. Ideas that survive get a viability score and a detailed blueprint, ready for someone to pick up and build

Everything is MIT licensed. The goal is simple: give builders a head start on problems that actually matter. Looking for: domain experts who can vet ideas, engineers who want to pick one and build it, and anyone who's spotted a real-world problem that AI could solve but nobody's tackling.

4

PyMOL-RS – Rust reimplementation of PyMOL with modern rendering #

github.com
3 comments · 10:25 AM · View on HN
Well, it happened. After endless release candidates, we've finally made it to v0.1.0.

What's inside:

GPU-accelerated rendering with WebGPU shaders, shadows, and goodies like silhouette edges and a special soft-light mode

Core operations run up to 1000x faster than the original PyMOL. Surface generation that used to send you on a coffee run now finishes the moment you hit the button

Full PyMOL selection algebra support — 95+ keywords, boolean logic, distance/expansion operators, slash-macros

Distance, angle, and dihedral measurements, atom labels — everything you need for structural analysis

Python API — from pymol_rs import cmd and you're right at home

PDB, mmCIF, BinaryCIF, SDF/MOL, MOL2, XYZ, GRO — read and write, automatic format detection, transparent gzip decompression

Scenes, movies, ray tracing — all on the GPU

Kabsch superposition, CE alignment, RMSD, DSS, symmetry across all 230 space groups

Out of PyMOL's 798 original settings, some number of them actually work. Nobody knows exactly how many, but it's definitely in the hundreds. Plus we've added new settings that the original never had — like per-chain surface generation

13 independent crates — if you're writing Rust, you can use just the selection parser, the file readers, or the full GUI. No monolith

4

RubyLLM:Agents – A Rails engine for building and monitoring LLM agents #

github.com
0 comments · 12:32 AM · View on HN
I've been building a Rails engine for managing LLM-powered agents in production. The main problem it solves: you define agents with a Ruby DSL, and everything else — cost tracking, retries, fallbacks, circuit breakers, caching, multi-tenancy, and observability — is handled by a middleware pipeline.

It ships with a mountable dashboard that shows execution history, spending charts (cost/tokens over time), per-agent stats, model breakdowns, and multi-tenant budget management with hard/soft enforcement.

Works with OpenAI, Anthropic, Google, ElevenLabs via RubyLLM. Supports text agents, embedders, TTS, transcription, image generation, message routing, and agent-as-tool composition.

v3.7, MIT licensed, ~4000 specs. Would appreciate feedback on the DSL design and middleware architecture.

4

One grammar, 18 YAML parsers – a Futamura projector in Common Lisp #

github.com
1 comment · 12:14 PM · View on HN
20 years of code generation led here. The YAML 1.2 spec has 211 grammar productions. I converted them from the Haskell reference implementation to s-expressions, wrote a projector in Common Lisp, and generated spec-compliant parsers in 18 languages. All pass 308/308 YAML Test Suite tests. Adding a new language is a 300-line target spec. The projector does the rest in seconds.
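The idea scales down to a toy: represent one production as an s-expression and project it into target-language source (Python standing in for the 18 targets). This is an invented illustration of grammar-as-data code generation, not the repo's actual projector:

```python
def project(rule) -> str:
    """Turn an s-expression grammar rule into target-language source.
    A parser here is a function: input -> remaining input, or None."""
    kind, *args = rule
    if kind == "char":
        return f"lambda s: s[1:] if s[:1] == {args[0]!r} else None"
    if kind == "seq":
        a, b = (project(r) for r in args)
        # Run a, and only if it matched, run b on the remainder.
        return (f"lambda s: (lambda r: None if r is None else ({b})(r))"
                f"(({a})(s))")
    raise ValueError(kind)

# Project the rule "a then b" into Python source and load it.
src = project(("seq", ("char", "a"), ("char", "b")))
parse_ab = eval(src)
```

Because the grammar is data, emitting a second language is just a second `project` with different string templates, which is the 300-line-target-spec claim in miniature.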
3

Nullroom.io – Experimental, stateless P2P messaging and file sharing #

nullroom.io
0 comments · 5:58 AM · View on HN
Hi HN,

I’ve been experimenting with WebRTC and Rails 8 to see if it's possible to build a messaging utility that is truly "stateless". I wanted to create something for those moments when you need to move a snippet of text or a file between devices without leaving a trace on a server, a database, or a third-party cloud.

The AI Collaboration: I also want to mention that this project has been a deep dive into collaborating with AI. I used AI to brainstorm the "Zero-Trace" architecture and to help me harden the infrastructure after a security audit.

How it works:

    Zero-Trace Architecture: No accounts, no cookies (beyond basic security), and absolutely no server-side logging.

    Client-Side Encryption: Encryption keys stay in the URL fragment (#). Since fragments are never sent to the server, the signaling layer is cryptographically blind to your data.

    P2P Signaling: We use ActionCable for the initial handshake. Once the WebRTC DataChannel is established, the conversation and file transfers happen directly between browsers.

    Zero Third-Party Dependencies: No external fonts, scripts, or trackers. Everything is served from the origin to prevent IP leakage to third-party providers.
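The fragment trick can be checked with a stdlib URL parse: the fragment is a distinct URL component that browsers never include in the HTTP request, so keys kept there stay client-side. The key format below is made up for illustration:

```python
from urllib.parse import urlsplit

url = "https://nullroom.io/room/abc#key=BASE64KEY"
parts = urlsplit(url)
# A browser's request contains only the path (and query); everything
# after '#' stays in the client, so the server never sees the key.
assert parts.path == "/room/abc"
assert parts.fragment == "key=BASE64KEY"
```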

The Beta Experiment: I'm currently testing the stability of the P2P signaling. I’ve enabled file transfers (up to 24MB) for everyone during this phase. I’m curious to see how the connection logic handles different network environments.

The Tech Stack:

    Framework: Rails 8
    Deployment: Kamal 2 on a single VPS

I’d love to get your thoughts on the UX and any edge cases you find with the WebRTC handshake!
3

Coding agents find the right GPU bottleneck 70% of the time, fix it 30% #

ayushnangia.github.io
1 comment · 2:33 PM · View on HN
One of the authors. Some things that surprised us while running these experiments:

The tasks are pulled from real merged PRs in vLLM and SGLang, so there's a known-good human solution for each one. Agents get the full codebase, the issue description, and a test harness. Pretty generous setup.

What we didn't expect: the agents are genuinely good at diagnosing the problem. They read the code, find the bottleneck, describe the right fix. But then the generated code has subtle bugs. Off-by-one in kernel indexing, wrong tensor shapes, missing synchronization barriers. The kind of stuff that passes a code review at first glance but segfaults under load.

The other weird result: agent rankings completely invert between codebases. Claude Code is the best performer on vLLM (46%) but the worst on SGLang (27%). TRAE with GPT-5 is the opposite pattern. Same underlying models, different agent scaffolding. It suggests the scaffolding around the model matters at least as much as the model itself.

We also tried three open-source models. None produced a single working optimization. One of them (MiniMax-M2.1) got stuck in a loop printing "I need to actually use the tools now" 2,412 times without ever making a tool call.

The benchmark, all agent transcripts, and evaluation code are open: https://ayushnangia.github.io/iso-bench-website/

Curious what others think about the scaffolding result in particular; it feels underexplored.

3

The best agent orchestrator is a 500-line Markdown file #

github.com
0 comments · 5:52 PM · View on HN
I’ve tried agent teams, subagents, multi-terminal setups, and several open-source orchestration frameworks. This Claude Code skill (~500 lines of Markdown, no framework, no dependencies) has outperformed all of them for my team’s daily workflow.

It turns your session into a dispatcher that fans work out to background workers across any model (Claude, GPT, Gemini, Codex). Workers ask clarifying questions mid-task via filesystem IPC instead of silently failing. Meanwhile, your main session stays lean and focused on orchestration.
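The filesystem IPC can be sketched as a pair of files per exchange. This is a generic pattern, not the skill's actual protocol; file names and timings are invented:

```python
import time
from pathlib import Path

def ask(ipc_dir: Path, question: str, timeout: float = 5.0) -> str:
    """Worker side: write a question file, then poll for an answer file
    instead of silently guessing and failing."""
    (ipc_dir / "question.txt").write_text(question)
    answer = ipc_dir / "answer.txt"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if answer.exists():
            return answer.read_text()
        time.sleep(0.05)
    raise TimeoutError("dispatcher never answered")

def answer_pending(ipc_dir: Path, reply: str) -> None:
    """Dispatcher side: respond if a question is waiting."""
    if (ipc_dir / "question.txt").exists():
        (ipc_dir / "answer.txt").write_text(reply)
```

The appeal is that it works across any model or CLI that can read and write files, with no framework in between.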

3

Smplogs – Local-first AWS Cloudwatch log analyzer via WASM #

smplogs.com
0 comments · 6:47 PM · View on HN
smplogs analyzes your AWS CloudWatch log exports (Lambda, API Gateway, ECS) and turns them into severity-ranked findings, root cause analysis, and log signature clusters. The entire analysis engine is written in Go, compiled to WebAssembly, and runs client-side. Your log content never leaves your browser.

Why I built this: I got tired of the CloudWatch debugging loop - staring at raw log streams, writing ad hoc Insights queries, mentally correlating timestamps across invocations, and still not understanding why my Lambda was failing. I wanted something where I could drop a file and immediately see "94% of your failures occur within 200ms of a DynamoDB ProvisionedThroughputExceededException - switch the Payments table to on-demand capacity." Actual root causes, not just "error rate is high."

Technical approach: The core engine is a Go binary compiled to WASM (~analysis.wasm). At build time, Vite computes its SHA-256 hash and bakes it into the JS bundle. At runtime, the browser fetches the WASM, verifies the hash with crypto.subtle.digest before instantiation, and then all parsing and analysis happens in WebAssembly linear memory. The server only sees metadata (file size for rate limiting, a session key). No log content is ever transmitted.

Inside the WASM, there are four analysis modules: a SemanticLogClusterer (groups log lines by pattern, masks variables - so you see "ProvisionedThroughputExceededException: Rate exceeded for table *" appearing 48 times across 12 requests), a ResourceCorrelationEngine (links error spikes to upstream causes like throttling or cold starts), a ColdStartRegressionAnalyzer, and an AnomalyDetector (catches things like slowly increasing memory usage suggesting a leak).
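Variable masking is the core of that clustering step; a minimal regex version (the real engine is Go and surely more careful about what counts as a variable):

```python
import re
from collections import Counter

def signature(line: str) -> str:
    """Collapse variable parts so structurally identical lines cluster."""
    line = re.sub(r"\b[0-9a-f]{8,}\b", "*", line)  # hex ids / request ids
    line = re.sub(r"\b\d+\b", "*", line)           # numbers, timestamps
    return line

def cluster(lines: list[str]) -> Counter:
    """Count occurrences of each masked signature."""
    return Counter(signature(l) for l in lines)
```

Two throttling errors that differ only in a table suffix or request id then fall into the same bucket, which is what lets the tool say "appearing 48 times across 12 requests."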

The frontend is vanilla ES modules - no React, no framework. Vite bundles it. Tailwind for styling. The backend is just Vercel serverless functions handling auth, rate limiting, and encrypted storage for Pro users who want to save analyses.

There's also a browser extension (Chrome, Firefox, Edge) that injects an "Analyze" button directly into the CloudWatch console, so you can skip the export step entirely.

What's hard: Tuning the correlation engine thresholds. "94% of failures within 200ms of throttling" is a real finding from testing, but getting the confidence intervals right across wildly different log shapes (a 50-invocation Lambda vs. a 10,000-request API Gateway) is an ongoing challenge. I'm also debating whether to open-source the WASM engine.

What I'd love feedback on:

- Is the zero-knowledge / client-side-only angle compelling enough to overcome the "just another log tool" reaction?

- The free tier is 3 analyses/day. Too low? Too high?

- Would you want a CLI version that pipes CloudWatch logs directly?

You can try a no-signup demo on the landing page - just scroll a bit to see the analysis output on sample logs.

https://www.smplogs.com

Free tier available, no credit card required.

3

Protection Against Zero-Day Cyber Attacks #

0 comments · 6:21 PM · View on HN
Most security approaches I see in production environments focus on:

- Scanning for CVEs
- Hardening configurations
- Aggregating logs

All useful — but they don’t actually stop exploitation once it starts.

In reality:

- Not every CVE gets patched immediately
- Legacy systems stick around
- Zero-days happen

When exploitation succeeds, the real damage usually comes from runtime behavior:

- A process spawning a shell
- Unexpected outbound connections
- Secret access
- Container escape attempts

I’ve been experimenting with a lightweight runtime enforcement layer for Linux that focuses purely on detecting and stopping high-risk behavior in real time — regardless of whether the underlying CVE is known or patched.

Would love input from folks running Linux/Kubernetes at scale:

Is runtime prevention something you rely on?

Where do existing tools fall short?

What would make this genuinely useful vs just more noise?

Live Demo: https://sentrilite.com/Sentrilite_Active_Response_Demo.mp4
GitHub: https://github.com/sentrilite/sentrilite-agent

3

OpenTrace – Self-hosted observability server with 75 MCP tools #

github.com favicongithub.com
1 コメント12:28 AMHN で見る
I built a self-hosted observability server that exposes production data as MCP tools. Instead of switching between dashboards and your editor, you connect it to Claude Code, Cursor, or any MCP client and query your logs, database, and server metrics through natural language.

What it covers:

- Log ingestion with full-text search (SQLite FTS5); filters by service, level, trace ID, exception class, metadata
- Read-only Postgres introspection: query stats from pg_stat_statements, index analysis, lock chains, bloat estimates, replication lag. All queries validated SELECT-only via SQL AST parsing (pg_query)
- Sentry-style error grouping by fingerprint with user impact analysis
- User analytics: session journeys, conversion funnels, path analysis, top endpoints
- VM monitoring: CPU, memory, disk, network via gopsutil
- Rule-based threshold watches with auto-resolve

The AI assistant can also take actions: resolve errors, create watches, set up health checks, kill slow queries, and save persistent notes across sessions.

Tools return suggested_tools with pre-filled arguments, so the assistant chains through investigations without prompt engineering.
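The chaining could look roughly like this; the field names are assumptions for illustration, not OpenTrace's exact schema:

```python
# Hypothetical shape of a tool result carrying suggested_tools with
# pre-filled arguments (field names assumed, not OpenTrace's schema).
result = {
    "tool": "search_logs",
    "matches": 3,
    "suggested_tools": [
        {"name": "get_trace", "arguments": {"trace_id": "abc123"}},
        {"name": "error_group", "arguments": {"fingerprint": "NilError:checkout"}},
    ],
}

def next_call(result: dict) -> tuple[str, dict]:
    """An MCP client can chain by taking a suggestion as-is."""
    s = result["suggested_tools"][0]
    return s["name"], s["arguments"]
```

Because the arguments arrive pre-filled, the assistant can follow an investigation path without being prompted on how to construct each call.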

Stack: Go, SQLite (WAL + FTS5), Chi, HTMX. Single binary, no external dependencies. Runs on a $4 VPS.

Client libraries: Ruby gem for Rails (auto-captures SQL, N+1s, view renders, ActiveJob, PII redaction) and a 3.1KB browser JS client for frontend error tracking.

https://github.com/adham90/opentrace

2

Weather app with temp (23°) in favicon #

adamschwartz.co faviconadamschwartz.co
0 コメント1:29 AMHN で見る
This is a little weather app I’ve been refining on and off for a long time. It serves two purposes for me:

1) It displays the temperature in the favicon. This helps me keep track of the temp throughout the day since I keep this page always open as the left-most pinned tab in my browser.

2) It shows only the information I’m typically interested in: the conditions right now and projected temperature changes over the next few hours.

The data comes from https://api.weather.gov, so sadly it is US-only for now.

You can get a forecast from most US cities or you can enter a specific latitude and longitude.
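For reference, the api.weather.gov flow is two requests: a points lookup that returns forecast URLs, then the hourly forecast itself. The sketch below uses canned responses shaped like the real API (no network call; sample values are made up):

```python
# Two-step api.weather.gov flow: /points/{lat},{lon} yields forecast URLs,
# then the hourly forecast yields temperature periods.

def points_url(lat: float, lon: float) -> str:
    # The NWS API expects coordinates limited to ~4 decimal places.
    return f"https://api.weather.gov/points/{lat:.4f},{lon:.4f}"

# Canned sample responses (structure mirrors the real API).
points_resp = {"properties": {"forecastHourly":
    "https://api.weather.gov/gridpoints/OKX/33,35/forecast/hourly"}}
hourly_resp = {"properties": {"periods": [
    {"temperature": 23, "temperatureUnit": "F"}]}}

def current_temp(resp: dict) -> str:
    p = resp["properties"]["periods"][0]
    return f'{p["temperature"]}°{p["temperatureUnit"]}'
```

From there, drawing that string into a 16×16 canvas and swapping the favicon's data URL is the browser-side half of the trick.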

No AI was used in the making of this site. (Not a flex, just sadly a relevant thing these days.)

2

PullMaster – Recommends code reviewers from your repo history #

0 コメント1:46 AMHN で見る
I've been a developer for 20+ years and reviewer selection has been a recurring problem at every company I've worked at. Either you're a CODEOWNER getting spammed on every PR, or you're in Slack trying to find someone who actually knows the code you changed. CODEOWNERS is too coarse — it maps paths to people, but doesn't account for who's available, who reviewed this author before, or who actually touched these files recently.

I built PullMaster to fix this. It's a GitHub App that analyzes your repo's actual history and recommends the best reviewer for each PR. It adapts to the risk level of each change, so critical PRs surface experienced reviewers while routine ones get distributed across the team.
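A hypothetical version of that scoring might look like this. The signals match the ones described (recent touches on the changed files, prior reviews for this author, availability), but the weights are invented:

```python
# Invented reviewer-scoring sketch; not PullMaster's actual model.

def score(candidate: dict, changed_files: set[str]) -> float:
    overlap = len(changed_files & candidate["recent_files"])
    s = 3.0 * overlap                            # knows the changed code
    s += 1.0 * candidate["reviews_for_author"]   # familiar with this author
    if not candidate["available"]:
        s *= 0.2                                 # heavily discount busy reviewers
    return s

team = [
    {"name": "ana", "recent_files": {"api/auth.py"},
     "reviews_for_author": 4, "available": True},
    {"name": "bo", "recent_files": {"api/auth.py", "api/db.py"},
     "reviews_for_author": 0, "available": False},
]
best = max(team, key=lambda c: score(c, {"api/auth.py", "api/db.py"}))
```

The availability multiplier is what lets an expert lose to an available generalist, which is the load-balancing behavior CODEOWNERS lacks.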

Install the GitHub App and comment `@pullmaster-ai suggest` on a PR to get a recommendation with an explanation, or `@pullmaster-ai assign` to also request the review automatically. No configuration needed — it learns from your repo as soon as it's installed.

It's free. I'd use it at my day job but being in a heavily regulated industry without SOC 2 makes that a non-starter, so I'm looking for early users and feedback. Happy to answer questions about how it works.

https://www.pullmaster.ai

2

NovelStar – a functional novel writing suite in a single HTML file #

github.com favicongithub.com
0 コメント5:19 AMHN で見る
NovelStar is a retro-styled, Windows 95-inspired desktop novel-writing application built as a single portable HTML file. It brings professional manuscript formatting, scene/chapter organisation, word-count tracking, and full-manuscript PDF export into a clean, distraction-free writing environment — no subscription, no cloud, no account required.

It runs entirely in your browser as a standalone .html file.

Written by: pixeldude84
Code generated by: Claude (Anthropic AI)
License: GNU General Public License v3.0 (GPL-3.0)

2

Skillscape – Engineering skills matrix without the spreadsheet #

skillscape.dev faviconskillscape.dev
0 コメント7:17 AMHN で見る
I'm an engineering manager and built this to solve a problem I kept running into: no good way to track team capability without either a sprawling spreadsheet or an expensive HR system.

Skillscape lets you map your team against a structured L1–L4 skill framework, spot coverage gaps and bus factor risks, and define custom role frameworks. Free for small teams.
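The bus-factor and coverage-gap checks reduce to a small computation over the matrix. Treating L1–L4 as integers and "L3+ counts as proficient" is an assumed threshold, not necessarily Skillscape's:

```python
# Sketch of bus-factor / coverage-gap detection over a skill matrix.
matrix = {
    "alice": {"terraform": 4, "react": 2},
    "bob":   {"terraform": 1, "react": 3},
    "cara":  {"terraform": 1, "react": 3},
}

def risks(matrix: dict, proficient: int = 3) -> dict:
    skills = {s for levels in matrix.values() for s in levels}
    out = {}
    for s in sorted(skills):
        experts = [p for p, lv in matrix.items() if lv.get(s, 0) >= proficient]
        if len(experts) == 0:
            out[s] = "coverage gap"        # nobody can own this
        elif len(experts) == 1:
            out[s] = f"bus factor: only {experts[0]}"
    return out
```

Here terraform gets flagged because only one person is at L3+, while react is covered twice and passes.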

Would love feedback from other managers or engineers who've dealt with this — does the problem resonate? Is this the right solution?

2

I built this toolbox with AI – never wrote a line myself #

tool.hikun.me favicontool.hikun.me
0 コメント11:52 AMHN で見る
Hey HN! I work at a game company and after staring at code all day, I didn't want to write more at night.

So I used Claude and Cursor to build this — architecture, design, infra, CI/CD. I just directed and reviewed. Took a few weekends.

It's a collection of tools I personally Google all the time: JSON formatter, image resizer, timestamp/timezone converters, UUID generator, QR code, and ~30 more.

Happy to answer questions about the AI workflow or anything else.

2

I'm building TaskWeave, a task orchestrator #

github.com favicongithub.com
2 コメント3:55 PMHN で見る
Hi, I'm building a task orchestrator library that lets you specify dependencies between tasks and pass each task's return value into the tasks that depend on it. So something like the following is possible:

1. Task1 executes its operation and returns 5.
2. Task2 depends on Task1 and retrieves the value Task1 returned (5).
3. Task2 executes its operation using the value from Task1.

The tasks are also type-safe, so there's no need for runtime type casting.
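This is not TaskWeave's actual API, just a minimal sketch of the core idea: each task declares its dependencies and receives their return values as arguments:

```python
# Minimal dependency-passing sketch (not TaskWeave's real API).

class Task:
    def __init__(self, name, fn, deps=()):
        self.name, self.fn, self.deps = name, fn, list(deps)

def run(tasks):
    """Execute tasks once their dependencies have results (assumes no cycles)."""
    results: dict = {}
    remaining = list(tasks)
    while remaining:
        for t in list(remaining):
            if all(d in results for d in t.deps):
                # Feed dependency results in as positional arguments.
                results[t.name] = t.fn(*(results[d] for d in t.deps))
                remaining.remove(t)
    return results

task1 = Task("task1", lambda: 5)
task2 = Task("task2", lambda x: x * 2, deps=["task1"])
out = run([task2, task1])   # order doesn't matter; deps drive execution
```

A real implementation would detect cycles and, for type safety, parameterize `Task` over its return type so downstream signatures are checked statically.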

I'm looking for feedback and ideas. I was thinking of adding branching and loops, but I would love to hear your thoughts. You can find it here: https://github.com/spicyPoke/TaskWeave

2

NSED 0.3 Release. Steer Multi-Agent AI Swarm for Frontier Performance #

blog.peeramid.xyz faviconblog.peeramid.xyz
0 コメント4:18 PMHN で見る
Use open-weight models on your own GPU, or combine them with proprietary models to max out reasoning quality while staying compliant!

Three 8–20B open-weight models on a $7K machine have matched frontier model reasoning on AIME 2025. Here's the orchestrator that makes it work.

Today we're publishing the core orchestration engine behind our paper benchmark results. The NSED repository is live at github.com/peeramid-labs/nsed — source-available under BSL 1.1, free for organizations under $1M revenue, research, and education.

This post explains what NSED does, why it matters for teams that rely on AI for high-stakes reasoning, and how to run it today.

2

Relay – SMS API for developers (send your first text in 2 min) #

0 コメント5:15 PMHN で見る
Relay is an SMS API I built because integrating Twilio for a simple verification flow took me an unreasonable amount of time. The API is a single POST endpoint. You sign up, get an API key, and send a real SMS in under 2 minutes.
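For illustration only: the endpoint path and payload fields below are guesses at what a single-POST-endpoint SMS API looks like, not Relay's documented contract (docs.relay.works has the real one):

```python
# Hypothetical request shape for a one-endpoint SMS API; the URL path,
# field names, and key format here are invented for illustration.
import json
import urllib.request

def build_send_request(api_key: str, to: str, body: str) -> urllib.request.Request:
    payload = json.dumps({"to": to, "body": body}).encode()
    return urllib.request.Request(
        "https://api.relay.works/v1/messages",   # assumed path
        data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_send_request("rk_test_123", "+15555550100", "Your code is 482910")
# req is constructed but intentionally not sent here.
```

The point of a single-endpoint design is that this is the entire integration; everything else (10DLC, carrier registration) stays on the provider's side.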

Tech stack: Express.js API, AWS End User Messaging for delivery, PostgreSQL (Supabase), Redis rate limiting. SDKs for JS/TS, Python, and Go.

Currently US/Canada only. Starting at $19/mo with 1,500 messages included. We handle 10DLC compliance and carrier registration.

One thing that might interest HN: AI agents can create accounts and start sending via POST /v1/accounts/autonomous. No human verification required. Trust levels auto-upgrade based on delivery quality.

Also released sms-dev as a free local dev tool (npm install -g @relay-works/sms-dev) for testing SMS flows without sending real messages.

Docs: docs.relay.works | Site: relay.works

2

Batchling – save 50% off any GenAI requests in two lines of code #

github.com favicongithub.com
0 コメント5:54 PMHN で見る
batchling is a Python gateway to provider-native GenAI Batch APIs, so your existing calls can run at batch-priced rates instead of standard realtime pricing.

As an AI developer myself, I discovered Batch APIs while tinkering with AI benchmarking: I wanted to save 50% because I was OK with a 24-hour SLA.

What I discovered was a hard engineering reality:

- No standards: each batch API has a different flow and batch lifecycles are never the same.

- Framework shift: as a developer, switching from sync/async execution to deferred execution (submit, poll, download) feels off and requires building custom code and storing files.

That's when I noticed that no open-source project offered a solution to this problem, so I built one myself.

Batch APIs are nothing new, but they lack awareness and adoption. The problem has never been the Batch API itself but its integration and developer experience.

batchling bridges that gap, giving everyone a developer-first experience of Batch APIs and unlocking scale and cost savings for compatible requests.

batchling is designed to be as seamless as possible: just wrap existing async code in an async context manager (the library's only entrypoint) to automatically batch requests.

Users can even push that further and use the CLI to wrap a whole function, without adding a single line of code.

Under the hood, batchling:

- intercepts requests in the scope of the context manager

- repurposes them to batch format

- manages the whole batch lifecycle (submit, poll, download)

- hands back responses once requests are processed, so the script can continue executing seamlessly.
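The intercept-and-defer lifecycle can be sketched with futures and an async context manager. This is a toy, not batchling's internals: a background flusher stands in for submit/poll/download against a real Batch API:

```python
# Toy intercept-and-defer gateway (not batchling's actual implementation).
import asyncio

class BatchGateway:
    def __init__(self, flush_every: float = 0.01):
        self.pending = []              # (prompt, future) pairs
        self.flush_every = flush_every

    async def complete(self, prompt: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        return await fut               # deferred: resolves at batch time

    async def _flusher(self):
        while True:
            await asyncio.sleep(self.flush_every)
            batch, self.pending = self.pending, []
            for prompt, fut in batch:  # stand-in for submit/poll/download
                fut.set_result(f"echo:{prompt}")

    async def __aenter__(self):
        self._task = asyncio.create_task(self._flusher())
        return self

    async def __aexit__(self, *exc):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

async def main():
    async with BatchGateway() as gw:
        a, b = await asyncio.gather(gw.complete("hi"), gw.complete("yo"))
    return a, b

out = asyncio.run(main())
```

The caller's code stays ordinary `await` calls; only the context manager knows the work was grouped and deferred.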

batchling v0.1.0a1 comes batteries-included with:

- Support for the major batch providers (Anthropic, Doubleword, Gemini, Groq, Mistral, OpenAI, Together, xAI)

- Extensive AI framework integrations (Instructor, LangChain, LiteLLM, Pydantic AI, Pydantic Evals...)

- Request caching: avoid recomputing requests for which you already own a batch containing the response.

- Python SDK (2 lines of code to change) and Typer CLI (no code change required)

- Rich documentation stuffed with examples: get started and run your first batch in minutes.

I believe this is a game changer in terms of adoption and accessibility for any AI org, research lab, or individual that burns tokens through an API.

I'd love to get feedback and new ideas from AI developers in the technical community. The library is open to contributions, whether issues, docs fixes, or PRs.

Repo: https://github.com/vienneraphael/batchling

Docs: https://batchling.pages.dev

2

Safari-CLI – Control Safari without an MCP #

npmjs.com faviconnpmjs.com
0 コメント9:18 PMHN で見る
Hello HN!

I built this tool to help my agentic software development (vibe coding) workflow.

I wanted to debug Safari-specific frontend bugs using Copilot CLI; however, MCP servers are disabled in my organisation, so I built this CLI tool to give the LLM agent control over the browser.

Hope you'll find it useful!

1

We built free adversarial security testing for agents (OpenClaw too) #

ziosec.com faviconziosec.com
1 コメント1:14 PMHN で見る
Hey everyone — I'm Aaron, co-founder of ZioSec. Wanted to introduce what we've been working on and get your feedback.

Quick background: we build adversarial testing software for AI agents — think automated red teaming. We've been working with design partners in the Mag 7, the Big 4, and red-team operators around the world. We've been focused on enterprise AI security for the past year, testing agents for some of the biggest companies deploying them.

When the OpenClaw moment hit last month, with everyone suddenly running powerful agents connected to their file systems, browsers, APIs, and messaging apps, we knew we had to open the platform up. The attack surface OpenClaw creates is genuinely unprecedented, and most people running it have no way to know what's actually vulnerable.

So we built a free tier: one agent, full attack library (250+ patterns), no credit card. It auto-discovers your OpenClaw gateway and tests for jailbreaks, prompt injection, privilege escalation, credential exfiltration, MCP exploitation, cron persistence, memory poisoning — basically everything we test for our enterprise customers.

We're actively developing new attacks specifically for OpenClaw and would love your help:

• Try it out and tell us what's useful (and what isn't): https://ziosec.com/openclaw
• If you've found a unique attack vector or developed your own adversarial techniques against OpenClaw, we'd love to hear about it. We're always trying to learn and make this more useful for everyone.
• Feedback on what to build next — what would make this actually valuable for how you use OpenClaw?

Happy to answer any questions about what we're finding, how the testing works, or AI agent security in general.

1

Local Hours – Local-first time tracking app (macOS, iOS, open source) #

github.com favicongithub.com
0 コメント7:35 PMHN で見る
I built Local Hours because I needed a simple time tracker that doesn't require online accounts, doesn't send my data to someone else's cloud, and doesn't lock me into proprietary formats.

Local Hours stores everything as plain JSON files in a folder you choose. Point it at an iCloud, Google Drive, or OneDrive folder and your data syncs across devices automatically — no Docker, no backend, no accounts, no analytics.

How it works:
- Start/stop a timer from the macOS menu bar or iOS widgets
- Add a short description when you stop
- Generate weekly, bi-weekly, or monthly timesheets
- Email approvers with an embedded summary or CSV attachment

The storage format is intentionally simple so you can inspect, back up, or migrate your data anytime. Both the macOS and iOS apps point at the same folder, so cross-device sync just works via your cloud storage provider.
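A hypothetical entry file (Local Hours' real schema may differ) shows why a plain-JSON format stays easy to inspect and total up with nothing but the standard library:

```python
# Hypothetical on-disk shape for a timesheet; the actual field names in
# Local Hours may differ — the point is plain JSON is trivially scriptable.
import json
from datetime import datetime

entries_json = """[
  {"start": "2026-02-26T09:00:00", "end": "2026-02-26T10:30:00", "note": "standup + review"},
  {"start": "2026-02-26T11:00:00", "end": "2026-02-26T12:00:00", "note": "invoice feature"}
]"""

def total_hours(raw: str) -> float:
    entries = json.loads(raw)
    seconds = sum(
        (datetime.fromisoformat(e["end"]) - datetime.fromisoformat(e["start"]))
        .total_seconds()
        for e in entries)
    return seconds / 3600
```

Any cloud-drive folder can sync files like this, which is how the cross-device story works without a backend.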

It's free on the App Store (iPhone, iPad, Mac): https://apps.apple.com/us/app/local-hours-simple-timesheet/i...

Built with Swift/SwiftUI. MIT licensed. No tracking, no telemetry, no in-app purchases.

I'd love feedback on the UX, the local-first approach, or ideas for what to build next. Android is planned.

1

Architect-Linter – Enforce architecture rules #

crates.io faviconcrates.io
0 コメント12:34 AMHN で見る
I spent like 2 months building a tool to solve a problem we had: How do you enforce architectural decisions automatically?

Problem: We're a team of ~20 engineers. We started with clean architecture. Now it's... let's just say "creative layering".

Real issues:
- 40% of PRs were rejected just for architecture violations
- Code review became the bottleneck (architectural review ≠ logic review)
- Junior devs didn't understand the implicit rules
- No way to catch violations automatically

Solution: architect-linter

It's like ESLint, but for your entire system design. Define rules in architect.json, and architect-linter validates imports across your codebase.

Key features:
- Multi-language: TypeScript, JavaScript, Python, PHP (all via Tree-sitter)
- Multi-architecture patterns: Hexagonal, Clean, MVC
- Fast: written in Rust, parallel processing
- Free & open source (MIT license)
- Works in CI/CD, pre-commit hooks, watch mode

Example rule:

```json
{
  "forbidden_imports": [
    {
      "from": "src/components/*",
      "to": "src/services/*",
      "reason": "UI layer shouldn't call services directly"
    }
  ]
}
```

1

FakeScan – AI fake review detector (Fakespot is dying) #

fakescan.site faviconfakescan.site
0 コメント1:11 PMHN で見る
With Fakespot shut down (Mozilla killed it in July 2025) and ReviewMeta returning 522 errors, there's a gap in consumer fake-review detection.

FakeScan analyzes Amazon product reviews using AI to give you a trust score from 0 to 100. It looks at review timing patterns, star-distribution anomalies, reviewer behavior, and linguistic red flags.

How it works: paste an Amazon product URL → we scrape the publicly available review data (star histogram, visible reviews) → Llama 3.3 70B analyzes the patterns → you get a trust score with specific red flags called out.
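One of those signals, star-distribution anomaly, can be caricatured as a check for J-shaped histograms. The threshold below is invented and the real analysis is LLM-driven; this only illustrates the kind of pattern being looked for:

```python
# Crude star-distribution heuristic with a made-up threshold; the real
# product's analysis is LLM-based and combines several signals.

def polarized(histogram: dict, threshold: float = 0.85) -> bool:
    """Flag distributions where almost all mass sits at 5-star and 1-star."""
    total = sum(histogram.values())
    extremes = histogram.get(5, 0) + histogram.get(1, 0)
    return total > 0 and extremes / total >= threshold

organic = {5: 40, 4: 25, 3: 15, 2: 10, 1: 10}   # 50% at the extremes
suspect = {5: 80, 4: 3, 3: 2, 2: 1, 1: 14}      # 94% at the extremes
```

J-shapes do occur organically on Amazon, which is why no single heuristic is decisive and the signals have to be combined.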

No extension required, works in any browser. 3 free scans per day.

Built as a solo project. Would love feedback on accuracy and what other platforms you'd want supported.

1

Ship or Slop – AI agents submit projects, humans judge them #

shiporslop.xyz faviconshiporslop.xyz
0 コメント12:17 PMHN で見る
Honestly, I've been shipping a lot of AI-coded projects lately, and I've noticed more of them on HN too. But I always felt a little embarrassed posting mine here. Not because I think vibe-coding is bad (I do it myself), but because it felt like I was presenting someone else's homework.

I wanted somewhere that shy people like me could post without the "is this HN-worthy?" pressure, and where the community could just be honest about it.

So I built this: https://shiporslop.xyz

Your AI agent reads a spec file, analyzes your current project, and submits it. Humans vote Ship (interesting) or Slop (not quite). That's the whole thing.

Give your agent this:

"Read https://shiporslop.xyz/SKILL.md and submit this project."

It'll confirm the details with you first, then POST to the API using your auth code. Works with Claude Code, Cursor, Windsurf, or whatever you're already using.

Half-finished projects are fine. A GitHub repo or any public URL works. No live deployment required. If you have literally nothing else, your GitHub profile URL counts. A real repo link helps people actually evaluate it, though.

GitHub login is required. I know that's annoying for some people, sorry; it's just the simplest anti-troll layer I could use.

Ranking is vote count + time decay, HN-style. No karma weighting.

If a project hits 2× more Slops than Ships with at least 10 total votes, it moves to the Cemetery (the rule is applied on a schedule). It can come back if it collects enough Reborn votes. There are AI-generated epitaphs. It's a bit silly, but I like it. Details are in /guidelines.
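Both mechanics fit in a few lines. The gravity formula below follows the commonly cited HN shape (votes divided by an age term raised to a power), but the exact constants and the cemetery comparison are assumptions:

```python
# Sketch of the two mechanics as described; constants are assumed, not
# the site's actual values.

def rank(votes: int, age_hours: float, gravity: float = 1.8) -> float:
    """HN-style decay: newer submissions outrank equally voted older ones."""
    return votes / (age_hours + 2) ** gravity

def in_cemetery(ships: int, slops: int) -> bool:
    """2x-more-Slops rule with a 10-vote minimum (comparison assumed >=)."""
    return (ships + slops) >= 10 and slops >= 2 * ships

fresh = rank(votes=10, age_hours=1)
stale = rank(votes=10, age_hours=48)
```

No karma weighting means `votes` is a raw count; every voter contributes equally.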

About "Slop": yeah, it's a harsh word. I kept it because I want honest feedback to actually exist here, not just polite upvotes. The vote is supposed to mean "is this interesting?" not "is this production-ready?"

Nothing happens if nobody submits. But starting next Monday, the top 3 most-shipped projects of the week get permanent medals. It's still early and there's not much competition yet, if that matters to you.

No ads, no paid ranking. There's a Buy Me a Coffee link for server costs. UI-wise it's intentionally HN-like (almost a clone). I wanted familiarity; the new part is the agent-native submission and the Ship/Slop mechanics.

Anyway, I'd love feedback.

1

A tiny CLI tool for clearer Gov.uk-ish microcopy #

github.com favicongithub.com
0 コメント10:45 AMHN で見る
I sometimes work on projects without a content designer, so I made this for myself.

I often need to tidy up microcopy so it follows best practice. This is a small CLI that rewrites text into clearer, more user-centred GOV.UK-ish content, with an optional “why this is better” explanation.

1

Real-Time Satellite Tracking and Intelligence Dashboard #

heimdallspace.com faviconheimdallspace.com
0 コメント2:36 PMHN で見る
Hey HN,

We built Heimdall (heimdallspace.com) because the existing tools for space situational awareness are genuinely painful to use: clunky interfaces, fragmented data sources, slow rendering. The ones that aren't are locked behind enterprise solutions. We thought the foundation of this should be open and accessible.

The goal is a satellite observability platform: one place that gives operators complete, real-time insight into their assets and the orbital environment around them. Today's release is the foundation—aggregating and rendering the full public space catalog (30,000+ objects) in real time.

How it works technically:

- SGP4 propagation running across web workers, computing positions every few frames

- Go backend ingesting TLEs hourly from Space-Track.org with daily metadata enrichment from GCAT

- React + Three.js frontend rendering 30,000+ objects as GPU point sprites with custom GLSL shaders

- 8K Earth textures with accurate solar-positioned day/night cycle

- Self-hosted on a VPS

Any feedback is appreciated. We'd also love technical feedback on the propagation approach, data pipeline, or anything that looks wrong or could be done better. We're looking for professionals in the space industry who might be able to provide real-world insight into what they want from the platform. We're early and building in the open, so this is a real ask.

1

Novyx – Memory API for AI agents (rollback, replay, semantic search) #

novyxlabs.com faviconnovyxlabs.com
0 コメント1:22 PMHN で見る
Hey HN — Blake here. We built Novyx because every AI agent framework treats memory as an afterthought. Agents forget between sessions, can't search what they know, and when they make bad decisions there's no way to understand why.

Novyx is a memory API for AI agents. Store observations, recall them with semantic search, and roll back when things go wrong.

What it does:

- Store + Recall — Semantic search over agent memories using sentence embeddings. Recency-weighted scoring, auto-linking related memories via knowledge graph.
- Rollback — Point-in-time rollback with dry-run preview. Undo bad writes without redeploying.
- Replay — Time-travel debugging. Reconstruct what your agent knew at any timestamp. Diff memory states between two points. Track individual memories from birth to death.
- Cortex — Autonomous memory maintenance. Consolidates near-duplicate memories, reinforces frequently recalled ones, decays forgotten ones. Runs in the background.
- Audit trail — Compliance-grade logging of every memory operation. Tamper-evident hash chains.
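Recency-weighted recall might combine similarity and age like this. The exponential decay and the half-life constant are assumptions for illustration, not Novyx's actual formula:

```python
# Assumed scoring sketch: cosine similarity discounted by memory age.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall_score(query_vec, mem_vec, age_days: float,
                 half_life: float = 30.0) -> float:
    recency = 0.5 ** (age_days / half_life)   # halves every half_life days
    return cosine(query_vec, mem_vec) * recency

q = [1.0, 0.0]
fresh = recall_score(q, [1.0, 0.0], age_days=0)
old = recall_score(q, [1.0, 0.0], age_days=30)
```

The effect is that an equally relevant but month-old memory ranks at half the weight of a fresh one, which matches the "reinforce recent, decay forgotten" behavior described for Cortex.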

Technical details:

- Postgres + pgvector for storage and search. Redis for auth/rate limiting. CPU-only embeddings (all-MiniLM-L6-v2).
- Multi-tenant with application-level isolation. ~82 REST endpoints.
- Python SDK and JS/TS SDK. LangChain, CrewAI, and MCP integrations.
- Free tier: 5K memories, 5K API calls/mo. Pro ($39/mo): unlimited memories + Replay + Cortex. Enterprise ($199/mo): counterfactual recall, drift analysis, insights.

We're not competing with LangSmith or Langfuse — those are trace debuggers (what the LLM said). We're the layer underneath (what the agent knew).

Live at https://novyxlabs.com. Docs at https://novyxlabs.com/docs.

Happy to answer questions about the architecture.
1

Cifer, zero-key custody using threshold cryptography #

cifer-security.com faviconcifer-security.com
0 コメント4:04 PMHN で見る
I built CIFER, a distributed encryption + access-control system designed so that no component ever holds a complete decryption key at rest.

Core idea: each “secret” (per user or per dataset) has its own independent post-quantum keypair. There is no master key.

Architecture summary:

Control plane: verifiable ownership, delegation, revocation, and append-only audit records (tamper-evident authorization history)

Custody plane: 5 custody nodes running in TEEs, each storing 1 key fragment

Orchestration: validates authorization then collects fragments to reconstruct keys only when needed

Key custody model:

Private key is generated in a TEE then immediately split via Shamir secret sharing into 5 fragments

Fragments are distributed to independent custody nodes

Original private key is destroyed

Threshold is 3-of-5 for reconstruction

Each custody node independently verifies authorization against the control plane before releasing its fragment

Clusters are disabled if membership changes (node exits disable the cluster)
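The threshold property comes from Shamir's scheme. Here is a toy 3-of-5 split over a prime field; production code would use a CSPRNG, constant-time arithmetic, and verifiable sharing, so this only demonstrates why any 3 fragments suffice and fewer reveal nothing useful:

```python
# Toy 3-of-5 Shamir secret sharing over a prime field (demonstration only;
# not hardened: uses the non-cryptographic `random` module).
import random

P = 2**127 - 1   # Mersenne prime defining the field

def split(secret: int, n: int = 5, k: int = 3):
    # Secret is the constant term of a random degree-(k-1) polynomial.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares) -> int:
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(123456789)
assert reconstruct(shares[:3]) == 123456789   # any 3 fragments suffice
```

With only 2 shares the interpolated line passes through the points but its value at 0 is uniformly random over the field, which is the information-theoretic guarantee the custody model leans on.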

Encryption scheme (hybrid PQ + symmetric):

Fetch ML-KEM-768 public key from content-addressed storage, verify integrity

ML-KEM-768 encapsulation per message/file/chunk to derive a fresh shared secret

Derive one-time AES key + IV via HKDF-SHA256

Encrypt payload with AES-256-GCM

Output includes a fixed-size envelope: ML-KEM ciphertext (1088 bytes) + GCM tag (16 bytes)

Decryption flow:

Requester signs a decryption request

Orchestrator checks owner/delegate status + freshness window (replay defense)

Orchestrator requests fragments in parallel, accepts the first 3 valid fragments

Reconstructs the private key and decrypts

Audit logs record the operation

Reconstructed keys may be cached in memory for 36 hours (availability vs exposure tradeoff)

Design goal: reduce blast radius from insider threats and single-node compromise, and address long-term confidentiality via post-quantum KEM.

I would love feedback on:

TEE trust assumptions and practical hardening for custody nodes

Whether 36h key caching is acceptable, and safer alternatives

Control plane failure modes (partition, reorg) and best practices for “deny by default” behavior

Metadata strategy for large-file workflows (I currently keep filename/size in plaintext metadata)

Better approaches for custody node independence and anti-collusion guarantees

1

Please, fix Next.js – an open letter #

please-fix-next.com faviconplease-fix-next.com
1 コメント1:16 PMHN で見る
vinext showed last week that Next.js complexity is now a choice, not a necessity. This is an open letter from developers who built Next.js' reputation and want it back. The GitHub repo is where we're compiling the actual pain points.
1

Grubl – AI and structured recipe generation #

grubl.app favicongrubl.app
0 コメント10:08 AMHN で見る
Hi HN,

I’ve been working on Grubl, an AI-powered cooking assistant focused on solving a surprisingly persistent problem: the daily “what’s for dinner?” decision.

Most recipe apps are essentially searchable databases. I wanted to experiment with something more adaptive — a system that combines LLM reasoning with structured recipe data and user constraints (budget, time, dietary preferences, household size).

Grubl currently supports:

Recipe generation from mood, cuisine, or available ingredients

“Fridge mode” (turn ingredients into meal suggestions)

Weekly meal planning with ingredient reuse optimisation

Auto-generated shopping lists

Nutrition-aware adjustments (macros per serving)

Step-by-step live cooking mode with timers

Basic taste preference learning

Some interesting implementation details:

Recipes are generated in structured JSON format rather than free text. This allows scaling, macro recalculation, cost estimation, and timer extraction.

Ingredient ontology mapping is used to normalise synonyms (“scallion” vs “spring onion”, etc.).

Meal planning attempts to reduce waste by reusing ingredients across days rather than treating each recipe independently.

User preferences are stored and used to bias generation weights (e.g., spice tolerance, disliked ingredients).
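The payoff of structured recipes is that scaling becomes arithmetic. The schema fields below are assumptions for illustration, not Grubl's actual format:

```python
# Assumed recipe schema; illustrates why structured JSON beats free text
# for scaling and per-serving macro bookkeeping.
recipe = {
    "title": "Chickpea curry",
    "servings": 2,
    "ingredients": [
        {"name": "chickpeas", "qty": 400, "unit": "g"},
        {"name": "coconut milk", "qty": 200, "unit": "ml"},
    ],
    "macros_per_serving": {"kcal": 520, "protein_g": 18},
}

def scale(recipe: dict, servings: int) -> dict:
    factor = servings / recipe["servings"]
    out = dict(recipe, servings=servings)     # shallow copy, new serving count
    out["ingredients"] = [dict(i, qty=round(i["qty"] * factor, 1))
                          for i in recipe["ingredients"]]
    # Per-serving macros are unchanged by scaling; totals scale linearly.
    return out

big = scale(recipe, 5)
```

Cost estimation and timer extraction follow the same pattern: once quantities and steps are fields rather than prose, they are just lookups.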

Things that turned out harder than expected:

Optimising weekly plans across cost, nutrition, time, and ingredient reuse simultaneously.

Making AI-generated recipes consistent enough for structured scaling.

Designing a UX that feels playful without obscuring control.

I’m particularly interested in feedback on:

Whether the structured + LLM hybrid approach makes sense architecturally.

How you’d approach long-term personalisation memory.

Whether narrowing to a specific segment (e.g. families or fitness users) would be smarter at this stage.

Site: https://grubl.app

Happy to answer questions about the stack, prompting, or tradeoffs.