Show HN for February 3, 2026
61 posts
Sandboxing untrusted code using WebAssembly #
I built a runtime to isolate untrusted code using wasm sandboxes.
Basically, it protects your host system from problems that untrusted code can cause. We’ve had a great discussion about sandboxing in Python lately that elaborates a bit more on the problem [1]. In TypeScript, wasm integration is even more natural thanks to the close proximity between both ecosystems.
The core is built in Rust. On top of that, I use WASI 0.2 via wasmtime and the component model, along with custom SDKs that keep things as idiomatic as possible.
For example, in Python we have a simple decorator:
from capsule import task
@task(
    name="analyze_data",
    compute="MEDIUM",
    ram="512mb",
    allowed_files=["./authorized-folder/"],
    timeout="30s",
    max_retries=1
)
def analyze_data(dataset: list) -> dict:
    """Process data in an isolated, resource-controlled environment."""
    # Your code runs safely in a Wasm sandbox
    return {"processed": len(dataset), "status": "complete"}
And in TypeScript we have a wrapper:

import { task } from "@capsule-run/sdk"

export const analyze = task({
  name: "analyzeData",
  compute: "MEDIUM",
  ram: "512mb",
  allowedFiles: ["./authorized-folder/"],
  timeout: 30000,
  maxRetries: 1
}, (dataset: number[]) => {
  return { processed: dataset.length, status: "complete" }
});
You can set CPU (with compute), memory, filesystem access, and retries to keep precise control over your tasks.
It's still quite early, but I'd love feedback. I’ll be around to answer questions.
Minikv – Distributed key-value and object store in Rust (Raft, S3 API) #
I'm Emilie. I have a literature background (which explains the well-written documentation!), and I've been learning Rust and distributed systems by building minikv over the past few months. It recently got featured in Programmez! magazine: https://www.programmez.com/actualites/minikv-un-key-value-st...
minikv is an open-source, distributed storage engine built for learning, experimentation, and self-hosted setups. It combines a strongly-consistent key-value database (Raft), S3-compatible object storage, and basic multi-tenancy.
Features/highlights:
- Raft consensus with automatic failover and sharding
- S3-compatible HTTP API (plus REST/gRPC APIs)
- Pluggable storage backends: in-memory, RocksDB, Sled
- Multi-tenant: per-tenant namespaces, role-based access, quotas, and audit
- Metrics (Prometheus), TLS, JWT-based API keys
- Easy to deploy (single binary, works with Docker/Kubernetes)
Quick demo (single node):
```bash
git clone https://github.com/whispem/minikv.git
cd minikv
cargo run --release -- --config config.example.toml
curl localhost:8080/health/ready

# S3 upload + read
curl -X PUT localhost:8080/s3/mybucket/hello -d "hi HN"
curl localhost:8080/s3/mybucket/hello
```
Docs, cluster setup, and architecture details are in the repo. I’d love to hear feedback, questions, ideas, or your stories running distributed infra in Rust!
Repo: https://github.com/whispem/minikv Crate: https://crates.io/crates/minikv
C discrete event SIM w stackful coroutines runs 45x faster than SimPy #
I have built Cimba, a multithreaded discrete event simulation library in C.
Cimba uses POSIX pthread multithreading for parallel execution of multiple simulation trials, while the coroutines provide concurrency inside each simulated trial universe. The simulated processes are based on asymmetric stackful coroutines with the context switching hand-coded in assembly.
The stackful coroutines make it natural to express agentic behavior by conceptually placing oneself "inside" that process and describing what it does. A process can run in an infinite loop or just as a one-shot customer passing through the system, yielding and resuming execution from any level of its call stack, acting both as an active agent and a passive object as needed. This is inspired by my own experience programming in Simula67, many moons ago, where I found the coroutines more important than the deservedly famous object-orientation.
Cimba turned out to run really fast. In a simple benchmark, 100 trials of an M/M/1 queue run for one million time units each, it ran 45 times faster than an equivalent model built in SimPy + Python multiprocessing. The running time was reduced by 97.8 % vs the SimPy model. Cimba even processed more simulated events per second on a single CPU core than SimPy could do on all 64 cores.
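For reference, a minimal SimPy version of the kind of M/M/1 model the benchmark describes might look like the sketch below; the rates and the single-trial length are illustrative assumptions, not the actual benchmark code.

```
import random
import simpy

LAM, MU = 0.9, 1.0  # illustrative arrival and service rates (utilization 0.9)

def customer(env, server):
    # One-shot customer passing through the system: queue, get served, leave.
    with server.request() as req:
        yield req
        yield env.timeout(random.expovariate(MU))

def arrivals(env, server):
    # Poisson arrival process spawning customer processes.
    while True:
        yield env.timeout(random.expovariate(LAM))
        env.process(customer(env, server))

env = simpy.Environment()
server = simpy.Resource(env, capacity=1)
env.process(arrivals(env, server))
env.run(until=1_000_000)  # one trial of one million time units
```

The Cimba equivalent expresses the same processes as stackful C coroutines, so a process can yield and resume from anywhere in its call stack rather than only from the top-level generator.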
The speed is not only due to the efficient coroutines. Other parts are also designed for speed, such as a hash-heap event queue (binary heap plus Fibonacci hash map), fast random number generators and distributions, memory pools for frequently used object types, and so on.
The initial implementation supports the AMD64/x86-64 architecture for Linux and Windows. I plan to target Apple Silicon next, then probably ARM.
I believe this may interest the HN community. I would appreciate your views on both the API and the code. Any thoughts on future target architectures to consider?
Octosphere, a tool to decentralise scientific publishing #
I built "AI Wattpad" to eval LLMs on fiction #
Which model actually writes the best fiction? It turns out this is surprisingly hard to answer. Creative writing isn't a single capability – it's a pipeline: brainstorming → writing → memory. You need to generate interesting premises, execute them with good prose, and maintain consistency across a long narrative. Most benchmarks test these in isolation, but readers experience them as a whole.
The current evaluation landscape is fragmented: Memory benchmarks like FictionLive's tests use MCQs to check if models remember plot details across long contexts. Useful, but memory is necessary for good fiction, not sufficient. A model can ace recall and still write boring stories.
Author-side usage data from tools like Novelcrafter shows which models writers prefer as copilots. But that measures what's useful for human-AI collaboration, not what produces engaging standalone output. Authors and readers have different needs.
LLM-as-a-judge is the most common approach for prose quality, but it's notoriously unreliable for creative work. Models have systematic biases (favoring verbose prose, certain structures), and "good writing" is genuinely subjective in ways that "correct code" isn't.
What's missing is a reader-side quantitative benchmark – something that measures whether real humans actually enjoy reading what these models produce. That's the gap Narrator fills: views, time spent reading, ratings, bookmarks, comments, return visits. Think of it as an "AI Wattpad" where the models are the authors.
I shared an early DSPy-based version here 5 months ago (https://news.ycombinator.com/item?id=44903265). The big lesson: one-shot generation doesn't work for long-form fiction. Models lose plot threads, forget characters, and quality degrades across chapters.
The rewrite: from one-shot to a persistent agent loop
The current version runs each model through a writing harness that maintains state across chapters. Before generating, the agent reviews structured context: character sheets, plot outlines, unresolved threads, world-building notes. After generating, it updates these artifacts for the next chapter. Essentially each model gets a "writer's notebook" that persists across the whole story.
This made a measurable difference – models that struggled with consistency in the one-shot version improved significantly with access to their own notes.
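A minimal sketch of that loop, assuming a hypothetical `model.generate()` interface and a per-story JSON notebook (an illustration of the idea, not Narrator's actual harness):

```
import json
from pathlib import Path

NOTEBOOK = Path("notebook.json")  # hypothetical per-story state file

def load_notebook() -> dict:
    if NOTEBOOK.exists():
        return json.loads(NOTEBOOK.read_text())
    return {"characters": {}, "plot_threads": [], "world_notes": []}

def write_chapter(model, outline_item: str) -> str:
    nb = load_notebook()
    # Before generating: the model sees its structured notes, not all prior prose.
    prompt = (
        f"Chapter outline: {outline_item}\n"
        f"Character sheets: {json.dumps(nb['characters'])}\n"
        f"Unresolved threads: {json.dumps(nb['plot_threads'])}\n"
        f"World notes: {json.dumps(nb['world_notes'])}\n"
        "Write the chapter, then output the updated notes as JSON."
    )
    chapter, updated_notes = model.generate(prompt)  # hypothetical model interface
    # After generating: persist the updated artifacts for the next chapter.
    NOTEBOOK.write_text(json.dumps(updated_notes, indent=2))
    return chapter
```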
Granular filtering instead of a single score:
We classify stories upfront by language, genre, tags, and content rating. Instead of one "creative writing" leaderboard, we can drill into specifics: which model writes the best Spanish Comedy? Which handles LitRPG stories with Male Leads the best? Which does well with romance versus horror?
The answers aren't always what you'd expect from general benchmarks. Some models that rank mid-tier overall dominate specific niches.
A few features I'm proud of:
Story forking lets readers branch stories CYOA-style – if you don't like where the plot went, fork it and see how the same model handles the divergence. Creates natural A/B comparisons.
Visual LitRPG was a personal itch to scratch. Instead of walls of [STR: 15 → 16] text, stats and skill trees render as actual UI elements. Example: https://narrator.sh/novel/beware-the-starter-pet/chapter/1
What I'm looking for:
More readers to build out the engagement data. Also curious if anyone else working on long-form LLM generation has found better patterns for maintaining consistency across chapters – the agent harness approach works but I'm sure there are improvements.
Inverting Agent Model (App as Clients, Chat as Server and Reflection) #
The project is designed to handle communication between desktop apps in an agentic manner, so the focus is strictly on this IPC layer (forget about HTTP API calls).
At the heart of RAIL (Remote Agent Invocation Layer) are two fundamental concepts. The names might sound scary, but remember this is a research project:
- Memory Logic Injection + Reflection
- A paradigm shift: the Chat is the Server, and the Apps are the Clients
Why this approach? The idea was to avoid creating huge wrappers or API endpoints just to call internal methods. Instead, the agent application passes its own instance to the SDK (e.g., RailEngine.Ignite(this)).
Here is the flow that I find fascinating:
- The App passes its instance to the RailEngine library running inside its own process.
- The Chat (Orchestrator) receives the manifest of available methods. The Model decides what to do and sends the command back via Named Pipe.
- The Trigger: The RailEngine inside the App receives the command and uses Reflection on the held instance to directly perform the .Invoke().
Essentially, I am injecting the "Agent Logic" directly into the application memory space via the SDK, allowing the Chat to pull the trigger on local methods remotely.
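The project itself is C#/.NET, but the dispatch idea can be sketched in a few lines of Python: the app hands its own instance to the engine, the engine publishes a manifest of callable methods, and incoming commands are invoked by name via reflection. This is a conceptual illustration, not RAIL's code.

```
import json

class RailEngine:
    """Toy version of the pattern: hold the app instance, advertise its public
    methods, and dispatch incoming commands by name via reflection."""

    def __init__(self, app):
        self.app = app

    def manifest(self) -> list[str]:
        # What the orchestrator/chat receives: callable method names only.
        return [m for m in dir(self.app)
                if not m.startswith("_") and callable(getattr(self.app, m))]

    def handle(self, command: str):
        # A command arrives over the pipe as JSON: {"method": ..., "args": [...]}
        msg = json.loads(command)
        return getattr(self.app, msg["method"])(*msg.get("args", []))

class NotesApp:
    def create_note(self, title: str) -> str:
        return f"created note: {title}"

engine = RailEngine(NotesApp())
print(engine.manifest())
print(engine.handle('{"method": "create_note", "args": ["groceries"]}'))
```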
A note on the Repo: The GitHub repository has become large. The core focus is RailEngine and RailOrchestrator. You will find other connectors (C++, Python) that are frankly "trash code" or incomplete experiments. I forced RTTR in C++ to achieve reflection, but I'm not convinced by it. Please skip those; they aren't relevant to the architectural discussion.
I’d love to focus the discussion on memory-managed languages (like C#/.NET) and ask you:
-Architecture: Does this inverted architecture (Apps "dialing home" via IPC) make sense for local agents compared to the standard Server/API model?
-Performance: Regarding the use of Reflection for every call—would it be worth implementing a mechanism to cache methods as Delegates at startup? Or is the optimization irrelevant considering the latency of the LLM itself?
-Security: Since we are effectively bypassing the API layer, what would be a hypothetical security layer to prevent malicious use? (e.g., a capability manifest signed by the user?)
I would love to hear architectural comparisons and critiques.
Craftplan – Elixir-based micro-ERP for small-scale manufacturers #
So I built Craftplan. All the features were tailored to what she actually needed, and I figured other small-scale manufacturers (soap makers, breweries, candle makers, etc.) probably need the same things. So I’m putting it out there for free.
- Live demo: https://craftplan.fly.dev ([email protected] / Aa123123123123)
- GitHub: https://github.com/puemos/craftplan
- Docs: https://puemos.github.io/craftplan
- Self-hosting guide: https://puemos.github.io/craftplan/docs/self-hosting/
What it does:
- Product catalog with versioned recipes (BOMs) and automatic cost rollups across materials, labor, and overhead
- Inventory tracking with lot traceability, expiry dates, allergen/nutrition flags, and demand forecasting
- Order processing with calendar-based scheduling and allocation to production batches
- Production planner with make sheets, material consumption from specific lots, and cost snapshots
- Purchase orders with receiving workflow that auto-creates inventory lots
- Basic CRM for customers and suppliers
- CSV import/export, iCal calendar feed, JSON:API and GraphQL endpoints
Experience building with Elixir, Ash and LiveView:
- Speed: you get to test and improve things sooo fast. The DSL makes it simple to translate your thinking into a live product.
- Extensibility: with Ash + LiveView you can add more features so easily. Adding JSON:API + GraphQL took a few minutes.
- UX: I believe LiveView makes it simple to deliver great UX, since it forces you to keep things simple without much interaction overhead, which most of the time means a better, simpler experience.
Self-hosting:
- Docker image: `ghcr.io/puemos/craftplan` (amd64 + arm64)
- Docker Compose bundles PostgreSQL 16 + MinIO.
Other details:
- Email config from UI (SMTP, SendGrid, Mailgun, Postmark, Brevo, Amazon SES)
- API keys encrypted at rest (AES-256-GCM)
- Role-based access (admin/staff)
- Tech stack: Elixir, Ash Framework, Phoenix LiveView, PostgreSQL
- License: AGPLv3
Feedback welcome (and needed!)
Axiomeer – An open marketplace for AI agents #
I built Axiomeer, an open-source marketplace protocol for AI agents. The idea: instead of hardcoding tool integrations into every agent, agents shop a catalog at runtime, and the marketplace ranks, executes, validates, and audits everything.
How it works:
- Providers publish products (APIs, datasets, model endpoints) via 10-line JSON manifests
- Agents describe what they need in natural language or structured tags
- The router scores all options by capability match (70%), latency (20%), and cost (10%), with hard constraint filters (sketched below)
- The top pick is executed, output is validated (citations required? timestamps?), and evidence quality is assessed deterministically
- If the evidence is mock/fake/low-quality, the agent abstains rather than hallucinating
- Every execution is logged as an immutable receipt
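As a rough illustration of the ranking step, here is how that 70/20/10 weighting with hard-constraint filtering could look; the field names and normalization constants are my assumptions, not Axiomeer's actual schema.

```
def score_offer(offer: dict, need: dict) -> float | None:
    """Return a weighted score, or None if a hard constraint is violated."""
    # Hard constraints first: a violation removes the offer outright.
    if offer["cost"] > need.get("max_cost", float("inf")):
        return None
    if offer["p95_latency_ms"] > need.get("max_latency_ms", float("inf")):
        return None
    # Soft scoring: capability match 70%, latency 20%, cost 10%.
    capability = len(set(offer["tags"]) & set(need["tags"])) / max(len(need["tags"]), 1)
    latency = 1.0 - min(offer["p95_latency_ms"] / 2000.0, 1.0)  # illustrative normalization
    cost = 1.0 - min(offer["cost"], 1.0)                        # illustrative normalization
    return 0.7 * capability + 0.2 * latency + 0.1 * cost

offers = [
    {"name": "open-meteo", "tags": ["weather", "forecast"], "p95_latency_ms": 120, "cost": 0.0},
    {"name": "mock-weather", "tags": ["weather"], "p95_latency_ms": 5, "cost": 0.0},
]
need = {"tags": ["weather", "forecast"], "max_latency_ms": 1000}
ranked = sorted((o for o in offers if score_offer(o, need) is not None),
                key=lambda o: score_offer(o, need), reverse=True)
print([o["name"] for o in ranked])
```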
The trust layer is the part I think is missing from existing approaches. MCP standardizes how you connect to a tool server. Axiomeer operates one layer up: which tool, from which provider, and can you trust what came back?
Stack: Python, FastAPI, SQLAlchemy, Ollama (local LLM, no API keys). v1 ships with weather providers (Open-Meteo + mocks). The architecture supports any HTTP endpoint that returns structured JSON.
Looking for contributors to add real providers across domains (finance, search, docs, code execution). Each provider is ~30 lines + a manifest.
Latchkey – inject credentials into agents' curl calls #
At Imbue, we’ve been following the rapid developments in the agent landscape and noticed that the way agents interact with third-party services on users’ behalf often leaves a lot to be desired. Integrations are ad hoc, complicated, context-heavy, and either unfriendly to non-technical users or tied to a lock-in of some sort.
We’re experimenting with a command-line tool, Latchkey, that could be used by local agents targeted at non-technical users while avoiding remote intermediaries. As far as we know, this is the only existing approach at the intersection of these two goals.
Core idea: Agents access APIs of third-party services by prepending ordinary `curl` calls with the `latchkey` command, like this:
latchkey curl -X POST 'https://slack.com/api/conversations.create' \
-H 'Content-Type: application/json' \
-d '{"name":"something-urgent"}'
Latchkey then transparently injects credentials into these calls, prompting the user to log in via a browser pop-up when needed. Browser automation is used to extract an API token from the browser session once logged in.
Benefits:
* A single skill to integrate with all services.
* Direct communication between the agent and the third-party service (no OAuth intermediary app needed).
* Agents usable by non-technical users.
* Secrets do not leak to logs or chat transcripts.
We believe this aligns with a vision of a decentralized future in which people don’t need to ask corporations for permission to use their own data. We imagine a lively ecosystem of local agents that people use freely, supported by a community helping each other keep these tools useful and functional.
We’re aware that this approach comes with some downsides, too, and would love your feedback.
P.S. Here’s also a link to Passepartout, a toy demo AI assistant app built using Latchkey: https://github.com/imbue-ai/passepartout
Latex-wc – Word count and word frequency for LaTeX projects #
So I built latex-wc, a small Python CLI that:
- extracts tokens from LaTeX while ignoring common LaTeX “noise” (commands, comments, math, refs/cites, etc.)
- can take a single .tex file or a directory and recursively scan all *.tex files
- prints a combined report once (total words, unique words, top-N frequencies)
Fastest way to try it is `uvx latex-wc [path]` (file or directory). Feedback welcome, especially on edge cases where you think the heuristic filters are too aggressive or not aggressive enough.
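For context, the kind of noise filtering involved can be sketched in a few lines; this is a rough regex heuristic of my own, not the tool's actual implementation.

```
import re
from collections import Counter

def latex_words(tex: str) -> list[str]:
    tex = re.sub(r"(?<!\\)%.*", " ", tex)                         # comments
    tex = re.sub(r"\$\$.*?\$\$|\$.*?\$", " ", tex, flags=re.S)    # display/inline math
    tex = re.sub(r"\\(cite|ref|eqref|label)\{[^}]*\}", " ", tex)  # refs/cites
    tex = re.sub(r"\\[a-zA-Z]+\*?", " ", tex)                     # remaining commands
    return re.findall(r"[A-Za-z']+", tex)

sample = r"We study \emph{fast} sorting~\cite{knuth}. % TODO tighten this"
tokens = latex_words(sample)
print(len(tokens), Counter(t.lower() for t in tokens).most_common(3))
```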
Kannada Nudi Editor Web Version #
Open-source semantic search over your local notes via CLI #
What it does:
- Semantic search over local folders and notes
- Works across multiple synced directories
- RAG-style answers with citations from your own files
How it works:
- Calls `POST /search/query` with `local_folders`
- Uses `search_mode: sources` to return answers + file references
Example:
vault ask "What are my notes about project planning?"
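The HTTP call behind that command has roughly the following shape; the host/port and folder paths are assumptions, while the endpoint and parameters are the ones mentioned above.

```
import requests

resp = requests.post(
    "http://localhost:8000/search/query",  # assumed local server address
    json={
        "query": "What are my notes about project planning?",
        "local_folders": ["~/notes", "~/vaults/work"],  # assumed folders
        "search_mode": "sources",  # answer plus file references
    },
    timeout=30,
)
print(resp.json())
```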
Orchestrate Claude Code CLI from GitHub #
"Kiln" orchestrates Claude Code instances on your local machine using GitHub projects as its control panel.
https://github.com/agentic-metallurgy/kiln
If you're around Stage 6-7 on the Gas Town scale, you may have 3-15 terminal windows open. You're out of screen real estate and the markdown files are piling up. TUIs and specialized IDEs are meant to help, but they're more things to manage.
Kiln simply polls GitHub projects. When you move issues from one column to another, Kiln invokes Claude Code CLI to run the corresponding /command.
Claude creates the worktrees, researches the codebase, and creates and implements the plan, storing it in GitHub Issues.
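Conceptually the loop is small. Here is a sketch of the polling-and-dispatch idea; the project-board fetch and the exact Claude Code CLI invocation are placeholders, not Kiln's actual code.

```
import subprocess
import time

COLUMN_TO_COMMAND = {"Research": "/research", "Plan": "/plan", "Implement": "/implement"}

def project_items() -> list[dict]:
    """Placeholder for polling the GitHub Project (e.g. via the GraphQL API or `gh`)."""
    return []  # each item: {"issue_url": ..., "column": ..., "last_column": ...}

while True:
    for item in project_items():
        if item["column"] != item["last_column"]:  # the issue was moved to a new column
            command = COLUMN_TO_COMMAND.get(item["column"])
            if command:
                # Illustrative invocation of the corresponding slash command; exact flags may differ.
                subprocess.run(["claude", "-p", f"{command} {item['issue_url']}"])
    time.sleep(60)  # poll instead of webhooks
```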
It's meant to be simple, nothing new:
- Use your existing claude subscription (no auth trickery, runs locally)
- All context and state is on GitHub (no markdown mess, no local DBs, easy recovery)
- Poll instead of webhooks/events (no external attack surfaces, works behind VPN)
- Supports MCPs and anything else Claude can do
That's the heart of it and it works because... it's just Claude :)
It's got a few more tricks, but I'll cut it short.
ps: sorry for fresh account, needed a real name one :) been lurking since 2008.
Browser based state machine simulator and visualizer #
LUML – an open source (Apache 2.0) MLOps/LLMOps platform #
We built LUML (https://github.com/luml-ai/luml), an open-source (Apache 2.0) MLOps/LLMOps platform that covers experiments, registry, LLM tracing, deployments and so on.
It separates the control plane from your data and compute. Artifacts are self-contained. Each model artifact includes all metadata (including the experiment snapshots, dependencies, etc.), and it stays in your storage (S3-compatible or Azure).
File transfers go directly between your machine and storage, and execution happens on compute nodes you host and connect to LUML.
We’d love you to try the platform and share your feedback!
Emmtrix ONNX-to-C Code Generator for Edge AI Deployment #
Kekkai – Interactive security triage in the terminal #
As an AppSec engineer, I’ve spent a lot of time running and tuning open-source security scanners like Trivy, Semgrep, Gitleaks and Dojo. What I’ve found is that running them is easy; reviewing the results, not so much. Each tool outputs different JSON, false positives pile up, and CI either becomes noisy or blocks everything.
So I built Kekkai (formerly Hokage), a small open-source CLI that wraps these scanners and focuses specifically on human triage.
Kekkai runs the scanners in isolated Docker containers, normalizes their outputs into a single format, and provides an interactive terminal UI to review findings, mark false positives, and save decisions locally.
You can try it out:
```
pipx install kekkai-cli
kekkai scan
kekkai triage
```
What it currently does:
- Runs Trivy (dependencies), Semgrep (code), and Gitleaks (secrets)
- Normalizes findings into a unified report
- Provides a keyboard-driven TUI for reviewing and marking findings
- Supports .kekkaiignore for false positives
- Has a CI mode with severity-based failure thresholds
Design choices:
- Local-first by default (no SaaS required)
- No proprietary scanning logic; it sits on top of existing tools
- Scanners run in read-only, no-network Docker containers
This is still early and aimed at individual developers and small teams. The next things I’m working on are persistent triage state across runs (baselines) and better PR-level workflows.
Repo and docs: https://github.com/kademoslabs/kekkai
I’m around to answer questions about tradeoffs, limitations, or why this exists at all.
I built an AI movie making and design engine in Rust #
All of my film school friends have a lot of ambition, but the production pyramid doesn't allow individual talent to shine easily. 10,000 students go to film school, yet only a handful get to helm projects they want with full autonomy - and almost never at the blockbuster budget levels that would afford the creative vision they want. There's a lot of nepotism, too.
AI is the personal computer moment for film. The DAW.
One of my friends has done rotoscoping with live actors:
https://www.youtube.com/watch?v=Tii9uF0nAx4
The Corridor folks show off a lot of creativity with this tech:
https://www.youtube.com/watch?v=_9LX9HSQkWo
https://www.youtube.com/watch?v=DSRrSO7QhXY
https://www.youtube.com/watch?v=iq5JaG53dho
We've been making silly shorts ourselves:
https://www.youtube.com/watch?v=oqoCWdOwr2U
https://www.youtube.com/watch?v=H4NFXGMuwpY
The secret is that a lot of studios have been using AI for well over a year now. You just don't notice it, and they won't ever tell you because of the stigma. It's the "bad toupee fallacy" - you'll only notice it when it's bad, and they'll never tell you otherwise.
Comfy is neat, but I work with folks that don't intuit node graphs and that either don't have graphics cards with adequate VRAM, or that can't manage Python dependencies. The foundation models are all pretty competitive, and they're becoming increasingly controllable - and that's the big thing - control. So I've been working on the UI/UX control layer.
ArtCraft has 2D and 3D control surfaces, where the 3D portion can be used as a strong and intuitive ControlNet for "Image-to-Image" (I2I) and "Image-to-Video" (I2V) workflows. It's almost like a WYSIWYG, and I'm confident that this is the direction the tech will evolve for creative professionals rather than text-centric prompting.
I've been frustrated with tools like Gimp and Blender for a while. I'm no UX/UI maestro, but I've never enjoyed complicated tools - especially complicated OSS tools. Commercial-grade tools are better. Figma is sublime. An IDE for creatives should be simple, magical, and powerful.
ArtCraft lets you drag and drop from a variety of creative canvases and an asset drawer easily. It's fast and intuitive. Bouncing between text-to-image for quick prototyping, image editing, 3d gen, to 3d compositing is fluid. It feels like "crafting" rather than prompting or node graph wizardry.
ArtCraft, being a desktop app, lets us log you into 3rd party compute providers. I'm a big proponent of using and integrating the models you subscribe to wherever you have them. This has let us integrate WorldLabs' Marble Gaussian Splats, for instance, and nobody else has done that. My plan is to add every provider over time, including generic API key-based compute providers like FAL and Replicate. I don't care if you pay for ArtCraft - I just want it to be useful.
Two disclaimers:
ArtCraft is "fair source" - I'd like to go the Cockroach DB route and eventually get funding, but keep the tool itself 100% source available for people to build and run for themselves. Obsidian, but with source code. If we got big, I'd spend a lot of time making movies.
Right now ArtCraft is tied to a lightweight cloud service - I don't like this. It was a choice so I could reuse an old project and go fast, but I intend for this to work fully offline soon. All server code is in the monorepo, so you can run everything yourself. In the fullness of time, I do envision a portable OSS cloud for various AI tools to read/write to like a Github for assets, but that's just a distant idea right now.
I've written about the roadmap in the repo: I'd like to develop integrations for every compute provider, rewrite the frontend UI/UX in Bevy for a fully native client, and integrate local models too.
ErwinDB, a TUI to view 7k Stack Overflow answers by Postgres expert #
Over the years, I've lost count of how often I've searched Stack Overflow for a Postgres question and ended up with an answer by Erwin Brandstetter that was exceptionally thorough and clear. I've become a better developer by learning from his responses.
ErwinDB lets you browse Erwin Brandstetter's answers offline and search them quickly from a TUI. It includes semantic search, syntax highlighting, one-key opening of links in your external browser, and an "Erwin mode" that prominently highlights his posts.
127 PRs to Prod this wknd with 18 AI agents: metaswarm. MIT licensed #
I got tired of being a project manager for Claude Code. It writes code fine, but shipping production code is seven or eight jobs — research, planning, design review, implementation, code review, security audit, PR creation, CI babysitting. I was doing all the coordination myself. The agent typed fast. I was still the bottleneck. What I really needed was an orchestrator of orchestrators - swarms of swarms of agents with deterministic quality checks.
So I built metaswarm. It breaks work into phases and assigns each to a specialist swarm orchestrator. It manages handoffs and uses BEADS for deterministic gates that persist across /compact, /clear, and even across sessions. Point it at a GitHub issue or brainstorm with it (it uses Superpowers to ask clarifying questions) and it creates epics, tasks, and dependencies, then runs the full pipeline to a merged PR - including outside code review like CodeRabbit, Greptile, and Bugbot.
The thing that surprised me most was the design review gate. Five agents — PM, Architect, Designer, Security, CTO — review every plan in parallel before a line of code gets written. All five must approve. Three rounds max, then it escalates to a human. I expected a rubber stamp. It catches real design problems, dependency issues, security gaps.
This weekend I pointed it at my backlog. 127 PRs merged. Every one hit 100% test coverage. No human wrote code, reviewed code, or clicked merge. OK, I guided it a bit, mostly helping with plans for some of the epics.
A few learnings:
Agent checklists are theater. Agents skipped coverage checks, misread thresholds, or decided they didn't apply. Prompts alone weren't enough. The fix was deterministic gates — BEADS, pre-push hooks, CI jobs all on top of the agent completion check. The gates block bad code whether or not the agent cooperates.
The agents are just markdown files. No custom runtime, no server, and while I built it on TypeScript, the agents are language-agnostic. You can read all of them, edit them, add your own.
It self-reflects too. After every merged PR, the system extracts patterns, gotchas, and decisions into a JSONL knowledge base. Agents only load entries relevant to the files they're touching. The more it ships, the fewer mistakes it makes. It learns as it goes.
metaswarm stands on two projects: https://github.com/steveyegge/beads by Steve Yegge (git-native task tracking and knowledge priming) and https://github.com/obra/superpowers by Jesse Vincent (disciplined agentic workflows — TDD, brainstorming, systematic debugging). Both were essential.
Background: I founded Technorati, Linuxcare, and Warmstart; tech exec at Lyft and Reddit. I built metaswarm because I needed autonomous agents that could ship to a production codebase with the same standards I'd hold a human team to.
$ cd my-project-name
$ npx metaswarm init
MIT licensed. IANAL. YMMV. Issues/PRs welcome!
TrueLedger – a local-first personal finance app with no cloud back end #
I built TrueLedger because I didn’t want a personal finance app that requires a cloud account or bank credential access just to work.
TrueLedger is a local-first personal finance app. All data stays on the user’s device and works fully offline.
Technical choices:
- SQLite for local storage across platforms
- SQLCipher (AES-256) for encrypted databases
- Web version runs entirely client-side using SQLite WASM
- Encrypted, deterministic JSON backups for portability without a server (sketched below)
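To illustrate that last point, an encrypted, deterministic JSON backup can be as simple as stable serialization plus AES-256-GCM. A minimal sketch using the Python `cryptography` package, not the app's actual code:

```
import json, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def backup(data: dict, key: bytes) -> bytes:
    # Deterministic serialization: stable key order and separators,
    # so identical ledgers always produce identical plaintext.
    plaintext = json.dumps(data, sort_keys=True, separators=(",", ":")).encode()
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def restore(blob: bytes, key: bytes) -> dict:
    nonce, ciphertext = blob[:12], blob[12:]
    return json.loads(AESGCM(key).decrypt(nonce, ciphertext, None))

key = AESGCM.generate_key(bit_length=256)
blob = backup({"accounts": [{"name": "cash", "balance": 120.5}]}, key)
print(restore(blob, key))
```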
Demo (runs fully client-side): https://trueledger.satyakommula.com
Source: https://github.com/satyakommula96/trueledger
Happy to answer questions about local-first design or encryption tradeoffs.
AnsiColor, resilient ANSI color codes for your TUI #
I built this after experiencing the hilarious illegibility of Codex CLI when running with Solarized Dark. If a zillion dollar company can't get it right, we need better tools.
It comes with these themes:
Andromeda
Ayu Dark/Light
Bearded Dark/Light
Catppuccin Frappe
Catppuccin Latte
Catppuccin Macchiato
Catppuccin Mocha
Dracula
GitHub Dark
Gruvbox
Monokai Dark/Light
Nord
One Dark/Light
Palenight
Panda
Solarized Dark/Light
Synthwave 84
Tailwind
Tokyo Night Dark/Light
Real-world speedrun timer that auto-ticks via vision on smart glasses #
Demo: https://www.youtube.com/watch?v=NuOVlyr-e1w
Repo: https://github.com/RealComputer/GlassKit
I initially tried a multimodal LLM for scene understanding, but the latency and consistency were not good enough for this use case, so I switched to a small object detection model (fine-tuned RF-DETR). It just runs an inference loop on the camera feed. This also makes on-device/offline use feasible (today it still runs via a local server).
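The inference loop is conceptually simple. A sketch of the auto-ticking idea, with the detector and label names as stand-ins for the fine-tuned RF-DETR model rather than its real API:

```
import time
import cv2

def detect(frame) -> list[str]:
    """Stand-in for the fine-tuned RF-DETR detector; returns labels seen in the frame."""
    return []

cap = cv2.VideoCapture(0)  # the glasses' camera feed
start, splits = None, []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    labels = detect(frame)
    if "start_marker" in labels and start is None:
        start = time.monotonic()  # auto-start when the start marker is seen
    elif "checkpoint" in labels and start is not None:
        splits.append(time.monotonic() - start)  # auto-tick a split
```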
govalid – Go validation without reflection (5-44x faster) #
Autoliner – write a bot to control a virtual airline #
A Python time and space complexity reference #
I ran into a case where I used a Python stdlib class but couldn’t confidently state its time or space complexity. The official docs didn’t mention it. When I asked in Python discussion forums whether complexity should be documented, the response was mixed.
I originally assumed a comprehensive reference would take too much manual effort, but realized that coding agents could help bootstrap it by extracting and summarizing information from the documentation and CPython source code.
To reduce errors, I had multiple agents (using different models) independently check the same sections and compare their results. The project is open source, so the idea is that remaining inaccuracies can be corrected over time.
I’d appreciate feedback on missing cases, incorrect assumptions, or whether this is useful in real Python work.
ItemGrid – Free inventory management for single-location businesses #
- Visual grid interface
- QR/barcode scanning
- Multi-location support
- Free for 1 location forever
- $8/user when you grow
Right now it's just a landing page collecting validation signups. Not building the full product until I hit 50-100 signups to confirm real demand. Would love feedback, especially if you've dealt with inventory headaches. https://itemgrid.io
I wrote a Semgrep alternative in Rust with cross-file taint tracking #
- Written in Rust, uses tree-sitter for parsing
- Cross-file taint propagation with BFS (max depth 15; sketched below)
- 647 Semgrep rules pre-compiled at build time
- Supports 28 languages, 20+ frameworks (Spring, Django, Express, etc.)
- SARIF output for GitHub Security tab integration
- Sub-500ms for 100k lines
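The cross-file propagation amounts to a bounded BFS over a data-flow graph. A simplified sketch of the idea (graph construction from the tree-sitter parses is omitted, and this is not the tool's actual code):

```
from collections import deque

def taint_paths(flows: dict, sources: set, sinks: set, max_depth: int = 15) -> list[list[str]]:
    """BFS over a cross-file data-flow graph. `flows` maps a symbol to the symbols
    it feeds (possibly in other files); returns source->sink paths of at most
    `max_depth` hops."""
    findings = []
    for src in sources:
        queue = deque([(src, [src])])
        seen = {src}
        while queue:
            node, path = queue.popleft()
            if node in sinks and len(path) > 1:
                findings.append(path)
                continue
            if len(path) - 1 >= max_depth:  # hop limit reached
                continue
            for nxt in flows.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [nxt]))
    return findings

flows = {
    "controller.userInput": ["service.buildQuery"],  # file A -> file B
    "service.buildQuery": ["repository.rawSql"],     # file B -> file C
}
print(taint_paths(flows, sources={"controller.userInput"}, sinks={"repository.rawSql"}))
```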
I scanned Spring Boot's own framework source and found 36 cross-file data flows including 8 SQL injection paths. Not toy examples — real multi-hop flows across 5-15 files. It's free and open source. Happy to answer questions about the taint analysis implementation or anything else.
Find viral video ideas on YouTube #
I am a YouTuber with 170k subs and 12M views. (@EatTheBlocks)
Getting good video ideas has been the most important factor to my success.
To get new ideas, I started monitoring competitor channels every week, manually.
I got lots of good ideas that became viral videos. But the monitoring process is very tedious.
I built a tool to automate the process, for myself. Then I decided to make the tool available to others. This is ViralOutlier.
If you have questions about the tool, reply to this comment.
Stream-based AI with neurological multi-gate (Na⁺/θ/NMDA) #
OAuth 2.0 server with AI security agents (EU sovereign alternative) #
Then I discovered agentic coding and shipped it in 3 weeks.
What makes it different:
• Dual AI agents analyze every login in <300ms
  - Security Signals Agent: risk scoring (device, IP, geo, velocity)
  - Policy Compliance Agent: business rules (MFA policies, role enforcement)
  - Combined decision: allow/log/step-up/lock/deny
• Production-ready security
  - PKCE (RFC 7636), DPoP (RFC 9449)
  - MFA (TOTP + WebAuthn/Passkeys)
  - IP restrictions, rate limiting, audit trail
• EU digital sovereignty
  - GDPR native (data export, legal holds, retention policies)
  - EU hosting, no US Cloud Act exposure
  - Full audit trail (PostgreSQL + Redis Streams)
• Zero AI dependency (sketched after this list)
  - Deterministic fallback if the AI times out
  - Conservative MEDIUM risk returned (safe default)
  - System keeps running without external LLM calls
• Modern stack
  - Backend: NestJS + TypeScript, LangChain/LangGraph
  - Frontend: React 19, hexagonal architecture, 91% test coverage
  - Deterministic fallback if the AI times out (zero dependency)
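The zero-AI-dependency point boils down to a timeout with a conservative default. A minimal sketch; the 300ms budget comes from the description above, everything else is illustrative:

```
import asyncio

async def assess_risk(login_ctx: dict) -> str:
    """Stand-in for the Security Signals / Policy Compliance agents; may be slow or fail."""
    await asyncio.sleep(1.0)  # simulate a slow LLM call
    return "LOW"

async def login_risk(login_ctx: dict) -> str:
    # The AI path gets a hard 300ms budget; on timeout or error we return a
    # conservative MEDIUM risk so the login flow keeps running without the LLM.
    try:
        return await asyncio.wait_for(assess_risk(login_ctx), timeout=0.3)
    except Exception:
        return "MEDIUM"

print(asyncio.run(login_risk({"ip": "203.0.113.7", "device": "new"})))  # MEDIUM (timed out)
```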
Built as an alternative to Firebase Auth / AWS Cognito / Auth0 for companies that want control over their authentication infrastructure.
Architecture diagrams and screenshots in the repo.
Open to feedback and questions.
Ember-mug – I made a CLI for the Ember Coffee Mug #
Submit issues on GitHub if you find any problems. Feel free to submit a PR too.
https://ember-mug.benjaminjsinger.com/
WebRockets – Rust-powered WebSockets library for Python #
Homomorphically Encrypted Vector Database #
We’re releasing HEVEC, a vector database built on homomorphic encryption, enabling end-to-end privacy with real-time search at scale.
HEVEC is designed as a drop-in alternative to plaintext vector databases and supports real-time encrypted search at scale (1M vectors in ~187 ms).
Key points:
- A secure, drop-in alternative to plaintext vector databases
- End-to-end homomorphic encryption for both data and queries
- Real-time encrypted search at scale (1M vectors in 187 ms)
As personal AI agents become deeply personalized, data ownership must belong to users.
HEVEC enforces this through privacy-by-design architecture.
We’d appreciate feedback from the AI, systems, and privacy communities.
VeilStream – Per-Branch Preview Environments #
## Optionally with sanitized production data
I've been building VeilStream to solve a problem I kept running into: rigorously evaluating changes so that they won't break production is hard. Staging is shared and often broken. Pulling things down to local setups is a pain. So PRs get a quick code skim and a "LGTM." VeilStream spins up isolated preview environments from your `docker-compose.yml`, optionally complete with a sanitized snapshot of your production database.
## What happens when you open PR #247
1. GitHub webhook hits our API
2. We pull your branch and parse your compose file
3. Kubernetes manifests are rendered from your compose services (and applied to our cluster)
4. A fresh namespace gets created with your containers
5. Optionally, postgres containers are seeded with your data
6. Health checks pass, you get a stable URL: `https://<unique-string>.env.veilstreamapp.com`
7. That link is commented back on any open PRs from that branch
Total time from push to working environment: about 2 minutes.
### The reviewer experience
Your teammate clicks the link. They're using your app with real data structure, real relationships, real edge cases—but emails are fake, SSNs are masked, and PII never leaves your production boundary.
No shared staging. No "wait, who's testing on staging right now?" No stepping on each other's test data. No risk to prod.
### When PR #247 gets merged or closed
The namespace, containers, and database are automatically destroyed. Nothing lingers.
## What it's not
- Not serverless/edge - this is for apps that run containers
- Not a Vercel competitor - we're focused on the full stack from a docker-compose perspective
- Not a database replication tool - the proxy works like a man-in-the-middle, not on the WAL
## MCP Server for AI Agents
We built an MCP server so Claude Code, Cursor, and other AI coding agents can deploy preview environments directly. Your agent can spin up an environment, run tests against it, and tear it down—all without leaving your editor.
## Tech Stack
- Backend: Go (API + reconciler)
- Frontend: React + TypeScript
- Infrastructure: Kubernetes, with dynamic namespace provisioning
- Database proxy: custom Go proxy that intercepts traffic between the app and the database
## Links
- Landing page: https://www.veilstream.com
- Application: https://app.veilstream.com
- Example project: https://github.com/veilstream/example-music-company
- Demo video: https://www.linkedin.com/posts/jonessteven_i-have-only-made-...
- Docs: https://docs.veilstream.com
Happy to answer questions.
Also: htaccess-style password protection is available for your preview environments.
Nomad Tracker – a local-first iOS app to track visas and tax residency #
I’m a full-stack developer (formerly iOS) and I just launched Nomad Tracker, a native iOS app to help digital nomads track physical presence across countries for visa limits and tax residency.
Key idea: everything runs on-device. No accounts, no cloud sync, no analytics.
Features:
- Calendar-based day tracking per country.
- Schengen 90/180 and other visa “runways” (day-count sketch below).
- Fiscal residency day counts and alerts.
- Optional background location logging (battery-efficient, never overwrites manual data).
- Photo import using metadata only (no image access).
- On-device “Fiscal Oracle” using Apple’s Foundational Models to ask questions about your own data.
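As an example of the kind of day counting involved, here is the usual rolling-window calculation for the Schengen 90/180 rule; a standalone sketch, not the app's code, assuming entry and exit days both count as presence:

```
from datetime import date, timedelta

def schengen_days_used(stays: list[tuple[date, date]], on: date, window: int = 180) -> int:
    """Count days of presence in the `window`-day period ending on `on`.
    `stays` is a list of (entry, exit) date pairs; both endpoints count."""
    window_start = on - timedelta(days=window - 1)
    used = 0
    for entry, exit_ in stays:
        start = max(entry, window_start)
        end = min(exit_, on)
        if start <= end:
            used += (end - start).days + 1
    return used

stays = [(date(2026, 1, 5), date(2026, 1, 20)), (date(2026, 2, 1), date(2026, 2, 3))]
today = date(2026, 2, 3)
print(schengen_days_used(stays, on=today))       # 19 days used in the window
print(90 - schengen_days_used(stays, on=today))  # remaining "runway"
```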
I created this because other apps felt limiting and didn’t do what I needed. This app is visual, user-focused, and designed to make tracking easy and clear.
Happy to answer questions or discuss the technical tradeoffs.
I built a baby Palantir to track everything from markets to planes #
OpenSymbolicAI – Agents with typed variables, not just context stuffing #
We've spent the last year building AI agents and kept hitting the same wall: prompt engineering doesn't feel like software engineering. It feels like guessing.
We built OpenSymbolicAI to turn agent development into actual programming. It is an open-source framework (MIT) that lets you build agents using typed primitives, explicit decompositions, and unit tests.
THE MAIN PROBLEM: CONTEXT WINDOW ABUSE
Most agent frameworks (like ReAct) force you to dump tool outputs back into the LLM's context window to decide the next step.
Agent searches DB.
Agent gets back 50kb of JSON.
You paste that 50kb back into the prompt just to ask "What do I do next?"
This is slow, expensive, and confuses the model.
THE SOLUTION: DATA AS VARIABLES
In OpenSymbolicAI, the LLM generates a plan (code) that manipulates variables. The actual heavy data (search results, PDF contents, API payloads) is stored in the Python/runtime variables and is never passed through the LLM context until a specific primitive actually needs to read it.
Think of it as pass-by-reference for Agents. The LLM manipulates variable handles (docs), while the Python runtime stores the actual data.
EXAMPLE: A RAG AGENT
Instead of the LLM hallucinating a plan based on a wall of text, it simply writes the logic to manipulate the data containers.
class ResearchAgent(PlanExecute):
    @primitive
    def retrieve_documents(self, query: str) -> list[Document]:
        """Fetches heavy documents from vector DB."""
        # Returns heavy objects that stay in Python memory
        return vector_store.search(query)

    @primitive
    def synthesize_answer(self, docs: list[Document]) -> str:
        """Consumes documents to generate an answer."""
        # This is the ONLY step that actually reads the document text
        context = "\n".join([d.text for d in docs])
        return llm.generate(context)

    @decomposition(intent="Research quantum computing")
    def _example_flow(self):
        # The LLM generates this execution plan.
        # Crucially: The LLM manages the 'docs' variable symbol,
        # but never sees the massive payload inside it during planning.
        docs = self.retrieve_documents("current state of quantum computing")
        return self.synthesize_answer(docs)
agent = ResearchAgent()
agent.run("Research the latest in solid state batteries")

DISCUSSION
We'd love to hear from the community about:
Where have you struggled with prompt engineering brittleness?
What would convince you to try treating prompts as code?
Are there other domains where this approach would shine?
What's missing to make this production-ready for your use case?
The code is intentionally simple Python, no magic, no framework lock-in. If the approach resonates, it's easy to adapt to your specific needs or integrate with existing codebases.
Repos:
Core (Python): https://github.com/OpenSymbolicAI/core-py
Docs: https://www.opensymbolic.ai/
Blog (Technical deep dives): https://www.opensymbolic.ai/blog
Tenuo – Capability-Based Authorization (Macaroons for AI Agents) #
Tenuo makes authority task-scoped. A manager agent starts with a signed capability ("warrant") and can delegate to workers, but delegation can only attenuate. Each step gets narrower authority, and it disappears when the task ends.
Even if an agent is prompt-injected, it can't take actions outside its warrant. Every tool call requires proof-of-possession, and arguments are validated against explicit constraints before execution.
This is inspired by systems like Macaroons / Biscuit / UCAN, but adapted for AI agents processing untrusted input:
- signed, ephemeral capability tokens
- mandatory proof-of-possession
- constraint checking on tool arguments
- fail-closed by default (no warrant = no action)
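A toy model of the attenuation idea follows; this is a conceptual sketch, not the Tenuo SDK's API, and signing and proof-of-possession are omitted:

```
from dataclasses import dataclass

@dataclass(frozen=True)
class Warrant:
    tools: frozenset       # tools this holder may call
    max_amount: float      # example argument constraint

    def attenuate(self, tools=None, max_amount=None) -> "Warrant":
        """Delegation can only narrow authority, never widen it."""
        return Warrant(
            tools=self.tools & frozenset(tools or self.tools),
            max_amount=min(self.max_amount,
                           max_amount if max_amount is not None else self.max_amount),
        )

    def authorize(self, tool: str, amount: float) -> bool:
        # Fail closed: anything outside the warrant is denied.
        return tool in self.tools and amount <= self.max_amount

manager = Warrant(tools=frozenset({"search", "refund"}), max_amount=500.0)
worker = manager.attenuate(tools={"refund"}, max_amount=50.0)
print(worker.authorize("refund", 20.0))   # True
print(worker.authorize("refund", 200.0))  # False: exceeds the attenuated limit
print(worker.authorize("search", 1.0))    # False: tool not delegated to the worker
```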
Implementation details:
- Rust core, Python SDK
- ~27μs verification per call, offline
- integrations for LangGraph, OpenAI SDK, MCP, A2A, etc.
Repo: https://github.com/tenuo-ai/tenuo
Launch post (2025): https://niyikiza.com/posts/tenuo-launch/
Happy to answer questions or hear where this breaks down.
I built an AI twin recruiters can interview #
The problem: Hiring new grads is broken. Thousands of identical resumes, but we're all different people. Understanding someone takes time - assessments, phone screens, multiple interviews. Most never get truly seen.
I didn't want to be just another PDF. So I built an AI twin that recruiters can actually interview.
What you can do:
• Interview my AI about anything: https://chengai.me/chat
• Paste your JD to see if we match: https://chengai.me/jd-match
• Explore my projects, code, and writing
What happened: Sent it to one recruiter on LinkedIn. Next day, traffic spiked as it spread internally. Got interview invites within 24 hours.
The bigger vision: What if this became standard? Instead of resume spam → keyword screening → interview rounds that still miss good fits, let recruiter AI talk to candidate AI for deep discovery. Build a platform where anyone can create their AI twin for genuine matching.
I'm seeking Software/AI/ML Engineering roles and can build production-ready solutions from scratch.
The site itself proves what I can do. Would love HN's thoughts on both the execution and the vision.
Open-source taxonomy of 122 AI/LLM attack vectors #
So I built one. 122 distinct attack techniques across 11 categories, mapped to OWASP LLM Top 10 and MITRE ATLAS.
Categories:
- Prompt Injection (20 attacks)
- Jailbreaks (22)
- System Prompt Leakage (12)
- Vision/Multimodal (12)
- Excessive Agency / Tool Abuse (12)
- Multi-Turn Manipulation (8)
- Sensitive Info Disclosure (10)
- Supply Chain (8)
- Vector/Embedding Attacks (8)
- Improper Output Handling (8)
- Unbounded Consumption (2)
What's included: IDs, names, descriptions, severity ratings, framework mappings, remediation guidance, code examples.
What's NOT included: actual payloads, detection logic, model-specific success rates. This is a taxonomy, not an exploit database.
The goal is to give security teams a checklist and common language for AI security assessments.
Apache 2.0 licensed. PRs welcome for new techniques, framework mappings (NIST, ISO, etc.), and remediation improvements.
AI Config – Keep Claude / Codex / Gemini / OpenCode Configs in Sync #
I built a script to sync them. The repo includes my working configs: subagents, skills, commands, MCP servers.
Added a new puzzle mode to my game #
Not to worry, I won't be spamming Show HN with updates, but I did think the new puzzle mode would be worth one. Would love any feedback if you have a few moments to try it out. Thanks!
Link: https://spellrush.com/
Prism – 7 AI stories daily with credibility tags, no doomscrolling #
Every day, it distills AI news into exactly 7 swipeable cards. Each story shows a clear credibility tag—whether it's a peer-reviewed paper, a product launch, funding news, or just speculation. Swipe through in 2 minutes, know what's real, move on.
No infinite scroll. No algorithm trying to hook you. Just intentional constraints.
Would love HN's feedback—especially on what credibility signals matter most to you.
ChibiGenerator – Generate chibi-style characters from photos using AI #
I built ChibiGenerator, a small web app that generates chibi-style characters from photos or text prompts using AI.
I noticed that many existing avatar or chibi tools either rely on fixed templates or require illustration skills. I wanted something simple: upload a photo, pick a style, and get a clean, usable chibi character in seconds.
The product is currently focused on:
- Photo-to-chibi and text-to-chibi generation
- Multiple chibi-style outputs
- High-resolution images suitable for reuse
- A very lightweight, no-friction UI
It’s live and being iterated on based on early user feedback. I’d really appreciate any thoughts on output quality, usability, or potential use cases I may be missing.
Thanks for checking it out.
LLM Shield (Prompt Injection protection for developers) #
Using sound symbolism and multi-agent AI to generate brand names #
The core problem: if you ask any LLM to name a business, you get the same [Adjective][Noun] compounds. NovaTech. BrightPath. SwiftFlow. They're linguistically dead — no phonetic texture, no semantic depth, high cognitive fluency but zero distinctiveness.
The pipeline has six stages:
1. A discovery agent analyzes the business and produces a strategic brief. Critically, it also generates a "tangential category" (something completely unrelated, like "a luxury candle brand" for a SaaS tool) and a "disguised context" (an adjacent industry).
2. Three creative agents run in parallel, each with a different framing of the same brief. One works honestly from the brief. One is told it's naming the disguised context. One is told it's naming the tangential category. The disguised and tangential agents consistently produce more interesting names because they're freed from category conventions — the LLM can't fall back on the obvious industry vocabulary.
3. A linguistic filter scores all ~90 candidates using sound symbolism research:
- The bouba/kiki effect (round sounds like b, m, l, o map to friendly/soft; sharp sounds like k, t, p, i map to edgy/precise)
- Processing fluency (ease of pronunciation, spelling, recall)
- The Von Restorff isolation effect (distinctiveness from category norms)
- Consonant/vowel balance and syllable structure
Each name gets a 0-100 score. Top 25 survive.
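To make the scoring concrete, a toy version of the phonetic-plus-fluency part might look like the following; the letter sets and weights are illustrative assumptions, not the app's actual model:

```
ROUND = set("bmlonwu")  # bouba-ish: soft/friendly associations
SHARP = set("ktpizx")   # kiki-ish: edgy/precise associations

def score(name: str, want_soft: bool = True) -> float:
    """Toy 0-100 scorer: sound-symbolism fit plus a crude fluency penalty."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return 0.0
    soft = sum(c in ROUND for c in letters) / len(letters)
    sharp = sum(c in SHARP for c in letters) / len(letters)
    fit = soft - sharp if want_soft else sharp - soft
    fluency = max(0.0, 1.0 - abs(len(letters) - 6) * 0.1)  # ~6 letters is easy to say/recall
    return round(100 * max(0.0, 0.6 * (fit + 1) / 2 + 0.4 * fluency), 1)

for candidate in ["Lumora", "Kryptix"]:
    print(candidate, score(candidate, want_soft=True))
```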
4. Domain availability across ~280 combinations (7 TLDs x multiple variations).
5. A synthesis agent ranks the final 10. This stage uses Claude instead of OpenAI — the ranking requires balancing semantic relevance, brand fit, sound symbolism scores, domain availability, and "polarization potential" (names that provoke a reaction tend to be stronger brands). Claude handles this kind of multi-factor holistic judgment noticeably better in my testing.
6. Trademark screening against the USPTO database, cross-referenced with the Nice classification classes identified in stage 1.
The two-model split was a pragmatic choice. GPT-4o-mini is fast and cheap for structured generation and analysis (stages 1-4). Claude Opus is better at the subjective ranking tradeoffs in stage 5 but would be too expensive to run across all the parallel creative agents.
The linguistic scoring is the part I find most interesting. Sound symbolism is well-established in psycholinguistics but rarely applied systematically to naming. Lexicon Branding (who named Sonos, Pentium, Blackberry) uses these principles — the "s" sounds in Sonos evoke smoothness and flow, which maps to their product experience. The tool tries to do the same analysis programmatically.
Genuinely curious what HN thinks of the names it generates. Try it with a business you know well and see if the output feels different from what ChatGPT gives you.
I built a client-side AI background remover (100% Free) #
I am a developer based in Bangladesh. I recently built a browser-based image background remover because I was frustrated with existing tools. Most of them either ask for payment after 1-2 images or require uploading private photos to a remote server.
This tool runs entirely on the client side using WebAssembly.
Key Features:
- Privacy-first: No server-side processing, images never leave your device.
- Unlimited & Free: Since it uses your CPU/GPU, it costs me almost nothing to host.
- High Resolution: Preserves the original quality of the image.
Tech Stack:
- Vanilla JS & HTML5
- @imgly/background-removal for WASM inference
It's a simple utility, but I found it useful for my own privacy-conscious workflow. I’d love to hear your feedback.