Daily Show HN

Show HN for April 8, 2026

47 posts
76

We fingerprinted 178 AI models' writing styles and similarity clusters #

rival.tips
23 comments · 2:06 PM · View on HN
We have a dataset of 3,095 standardized AI responses across 43 prompts. From each response, we extract a 32-dimension stylometric fingerprint (lexical richness, sentence structure, punctuation habits, formatting patterns, discourse markers).

Some findings:

- 9 clone clusters (>90% cosine similarity on z-normalized feature vectors)
- Mistral Large 2 and Large 3 2512 score 84.8% on a composite metric combining 5 independent signals
- Gemini 2.5 Flash Lite writes 78% like Claude 3 Opus. Costs 185x less
- Meta has the strongest provider "house style" (37.5x distinctiveness ratio)
- "Satirical fake news" is the prompt that causes the most writing convergence across all models
- "Count letters" causes the most divergence

The composite clone score combines: prompt-controlled head-to-head similarity, per-feature Pearson correlation across challenges, response length correlation, cross-prompt consistency, and aggregate cosine similarity.

Tech: stylometric extraction in Node.js, z-score normalization, cosine similarity for aggregate, Pearson correlation for per-feature tracking. Analysis script is ~1400 lines.
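The core of the comparison is simple enough to sketch: z-score each feature across the model population, then take cosine similarity between the normalized fingerprints. A minimal illustration in Python (the project itself does this in Node.js; the fingerprints below are made-up placeholders, not real model data):

```python
import numpy as np

# Made-up 32-dimension fingerprints for three fictional models; the real
# project extracts these from 3,095 responses across 43 prompts.
fingerprints = {
    "model_a": np.array([4.2, 18.7, 0.31, 0.05] * 8),
    "model_b": np.array([4.1, 19.2, 0.30, 0.06] * 8),
    "model_c": np.array([6.8, 11.3, 0.55, 0.01] * 8),
}

names = list(fingerprints)
X = np.stack([fingerprints[n] for n in names])

# z-score each feature across the model population so no single raw scale
# (e.g. sentence length vs. punctuation rate) dominates the similarity.
Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

# Cosine similarity between z-normalized fingerprints (the clone-cluster signal).
U = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-9)
sim = U @ U.T

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: cosine similarity {sim[i, j]:.3f}")
```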

64

Druids – Build your own software factory #

github.com
16 comments · 8:12 PM · View on HN
Hi HN!

Druids (https://github.com/fulcrumresearch/druids) is an open-source library for structuring and running multi-agent coding workflows. Druids makes this easy by abstracting away all the VM infrastructure, agent provisioning, and communication. You can watch our demo video here (https://www.youtube.com/watch?v=EVJqW-tvSy4) to see what it looks like.

At a high level:

- Users can write Python programs that define what roles the agents take on and how they interact with each other.

- A program is made of events - clear state transitions that the agents or clients can call to modify state. Each event gets exposed as an agent tool.

- Druids provisions full VMs so that the agents can run continuously and communicate effectively.

We made Druids because we were making lots of internal coding tools using agents and found it annoying to have to rearrange the wiring every time.

As we were building Druids, we realized a lot of our internal tools were easier to express as an event-driven architecture – separating deterministic control flow from agent behavior – and this design also made it possible to have many agents work reliably.
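As a rough, hypothetical sketch of that pattern (not Druids' actual API), here is what "events as explicit state transitions, each exposable as an agent tool" can look like in plain Python, with the agent calls stubbed out:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical sketch of the event-driven pattern described above, NOT the
# real Druids API: events are explicit state transitions that agents (or
# clients) invoke as tools, while control flow stays deterministic.

@dataclass
class ReviewState:
    task: str
    patch: Optional[str] = None
    approved: bool = False
    log: List[str] = field(default_factory=list)

EVENTS: Dict[str, Callable] = {}

def event(fn: Callable) -> Callable:
    """Register a state transition so it could be exposed as an agent tool."""
    EVENTS[fn.__name__] = fn
    return fn

@event
def submit_patch(state: ReviewState, patch: str) -> None:
    state.patch = patch
    state.log.append("coder submitted a patch")

@event
def approve(state: ReviewState) -> None:
    if state.patch is None:
        raise ValueError("nothing to approve yet")
    state.approved = True
    state.log.append("reviewer approved the patch")

# Deterministic control flow; in a real workflow the calls below would be
# made by LLM-driven coder/reviewer agents through their tool interfaces.
state = ReviewState(task="fix the empty-input crash")
EVENTS["submit_patch"](state, patch="fix: handle empty input")
EVENTS["approve"](state)
print(state.log, "approved =", state.approved)
```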

We had issues with scaling the number of concurrent agents within a run, so we decided to have each program run in an isolated, sandboxed program runtime, much the same way you run a Modal function. Each agent then calls the runtime with an agent token; the runtime checks who is allowed to talk to whom or send files across VMs, and then applies the tool call.

Our early users have found the library useful for:

- running many agents to do performance optimization

- building custom automated software pipelines, e.g. code review, pentesting, large-scale migrations, etc.

We've heard that the frontier labs have the infrastructure to quickly spin up 100 agents and have them coordinate with each other smoothly in various ways. We're hoping that Druids can be a starting point to make that infrastructure more accessible.

44

BAREmail ʕ·ᴥ·ʔ – minimalist Gmail client for bad WiFi #

github.com
44 comments · 2:44 PM · View on HN
I've been frustrated one too many times by terrible airplane wifi and not being able to load Gmail or Superhuman when all I want to do is get a few simple text-only emails out the door.

These clients have become pretty bloated with the assumption you've always got great bandwidth.

So I vibe coded BAREmail. It's open source, has no backend, and you can just set it up for yourself. It takes ~3 minutes to set up API access via Google Cloud Platform (thanks for making this not super easy, Google!).

I tried to maintain a nice design and some important keyboard shortcuts without getting too overBEARing.

14

I built a local data lake for AI-powered data engineering and analytics #

stream-sock-3f5.notion.site
8 comments · 9:11 PM · View on HN
I got tired of the overhead required to run even a simple data analysis - cloud setup, ETL pipelines, orchestration, cost monitoring - so I built a fully local data-stack/IDE where I can write SQL/Py, run it, see results, and iterate quickly and interactively.

You get a data-lake-style catalog, zero ETL, lineage, versioning, and analytics running entirely on your machine. You can import from a database, webpage, CSV, etc., and query in natural language or do your own work in SQL/PySpark. Connect to local models like Gemma or cloud LLMs like Claude for querying and analysis. You don't have to set up local LLMs; support comes built in.

This is completely free. No cloud account required.

Download the software - https://getnile.ai/downloads

Watch a demo - https://www.youtube.com/watch?v=C6qSFLylryk

Check the code repo - https://github.com/NileData/local

This is still early and I'd genuinely love your feedback on what's broken, what's missing, and if you find this useful for your data and analytics work.

13

500k+ events/sec transformations for ClickHouse ingestion #

github.com
4 comments · 5:26 PM · View on HN
Hi HN! We are Ashish and Armend, founders of GlassFlow.

Over the last year, we worked with teams running high-throughput pipelines into self-hosted ClickHouse. Mostly for observability and real-time analytics.

A question that came repeatedly was: What happens when throughput grows?

Usually, things work fine at 10k events/sec, but we started seeing backpressure and errors at >100k.

When throughput per pipeline stops scaling, adding more CPU/memory doesn't help, because parts of the pipeline are often not parallelized or are bottlenecked by state handling.

At this point, engineers usually scale by adding more pipeline instances.

That works but comes with some trade-offs:

- You have to split the workload (e.g., multiple pipelines reading from the same source)
- Transformation logic gets duplicated across pipelines
- Stateful logic becomes harder to manage and keep consistent
- Debugging and changes get more difficult because the data flow is fragmented

Another challenge arises when working with high-cardinality keys like user IDs, session IDs, or request IDs, and when you need to handle longer time windows (24h or more). The state grows quickly and many systems rely on in-memory state, which makes it expensive and harder to recover from failures.

We wanted to solve this problem and rebuild our approach at GlassFlow.

Instead of scaling by adding more pipelines, we scale within a single pipeline by using replicas. Each replica consumes, processes, and writes independently, and the workload is distributed across them.
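A simplified sketch of that idea, assuming key-hashed partitioning (this is the generic technique, not GlassFlow's code): route each event to a replica by a stable hash of its key, so per-key state stays local to one replica and every replica can write its own batches independently.

```python
import hashlib
from collections import defaultdict

NUM_REPLICAS = 4

def replica_for(key: str) -> int:
    # Stable hash so the same user/session always lands on the same replica,
    # keeping per-key state local instead of shared across workers.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_REPLICAS

events = [
    {"user_id": "u1", "value": 10},
    {"user_id": "u2", "value": 5},
    {"user_id": "u1", "value": 7},
]

# Each replica gets its own shard; in the real system every replica would
# consume, transform, and batch-write its shard to ClickHouse on its own.
shards = defaultdict(list)
for ev in events:
    shards[replica_for(ev["user_id"])].append(ev)

for replica, batch in sorted(shards.items()):
    print(f"replica {replica} writes a batch of {len(batch)} events")
```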

In the benchmarks we’re sharing, this scales to 500k+ events/sec while still running stateful transformations and writing into ClickHouse.

A few things we think are interesting:

- Scaling is close to linear as you add replicas
- Works with stateful transformations (not just stateless ingestion)
- State is backed by a file-based KV store instead of relying purely on memory
- The ClickHouse sink is optimized for batching to avoid small inserts
- The product is built with Go

Full write-up + benchmarks: https://www.glassflow.dev/blog/glassflow-now-scales-to-500k-...

Repo: https://github.com/glassflow/clickhouse-etl

Happy to answer questions about the design or trade-offs.

9

Itsumo: Make yourself fun and interesting academic language lessons #

itsumo.study
0 comments · 6:14 PM · View on HN
Itsumo is a language learning app that generates short lessons from stories about topics you choose, at your level. Instead of working through a fixed curriculum or another flashcard deck, you can make a lesson about whatever happens to be interesting to you that day, with audio, tappable words, and grammar notes.

You can try a language story from the landing page without signing up, and if you want to try full access, no credit card is required at signup and the premium tier is free for 14 days. It supports Japanese, Polish, Spanish, English, Portuguese, French and Italian.

I built it because I kept bouncing between two kinds of Japanese study tools: textbooks I liked but never had with me, and apps I always had with me but didn't actually want to open. What I wanted was something that felt closer to reading a lesson in a book than doing a drill, and I wanted it on my phone.

There aren't a lot of apps out there targeting intermediate language-learners. I've been studying Japanese, and a lot of the tools are optimized either for complete beginners or for spaced-repetition-heavy review. That works for some people, but once I was past the basics, I wanted to practice in the context of a story that felt relevant to me.

Technical details, since this is HN:

- It's an Expo React Native app, and we're going for universal access. It's available on the web now, with mobile versions launching soon.
- Created with a cool custom mobile studio I talked about here: https://youtu.be/ufgYlZCfRM4?t=643
- Lesson generation goes through an LLM pipeline with a verification pass to keep the output near the requested JLPT level.
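Read loosely, the verification pass is a generate-then-check loop: draft a lesson, estimate its level, and regenerate if it drifts from the target. A heavily simplified sketch with the LLM calls stubbed out (Itsumo's real prompts and pipeline are not shown here):

```python
import random

TARGET_LEVEL = "JLPT N3"  # the learner's requested level

def generate_lesson(topic: str, level: str) -> str:
    # Stand-in for the real LLM call that writes a short story-based lesson.
    return f"[{level}] short story about {topic}"

def estimate_level(lesson: str) -> str:
    # Stand-in for a verification model/heuristic that grades the draft.
    return random.choice(["JLPT N2", "JLPT N3", "JLPT N4"])

def make_lesson(topic: str, max_attempts: int = 3) -> str:
    draft = generate_lesson(topic, TARGET_LEVEL)
    for _ in range(max_attempts):
        if estimate_level(draft) == TARGET_LEVEL:
            return draft
        # Regenerate when the draft drifts from the requested level.
        draft = generate_lesson(topic, TARGET_LEVEL)
    return draft  # fall back to the last draft rather than failing hard

print(make_lesson("ordering coffee in Kyoto"))
```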

I'm especially interested in feedback from people who are somewhere in the middle with Japanese, where beginner material feels too basic but native material is still too much work. If you try it, I'd like to hear where it feels off, confusing, or not actually helpful.

Give it a try: https://itsumo.study

Built by Margaret and Raf https://okthink.ai/

5

I built an open source multi-agent harness in Go #

github.com
0 comments · 10:19 PM · View on HN
Hey HN. I built an AI agent harness over the past few months and I'm open sourcing it today.

Some context on why. I've been building with Claude Code daily using this harness. It orchestrates multiple AI agents as a team, with a dashboard, chat, kanban board, the works. I used it to build a full SaaS product (MyUpMonitor, https://myupmonitor.com) in about 24 hours of focused coding.

Then yesterday Anthropic announced Mythos and decided to keep it behind closed doors. Meanwhile I'm paying for Claude and I can't access their best model. I don't think that is nice at all...

So I'm open sourcing the harness with support for both Claude Code and OpenAI Codex. The whole point is that you shouldn't be dependent on one company's API to run your AI workflow.

What this harness does:

- Spawns and manages multiple AI agents in parallel
- Each agent gets a persona, role, and communication channels
- CEO agent that can hire/fire workers on its own
- Web dashboard with chat, kanban board, and live terminal output
- Supports Claude Code, OpenAI Codex, Cursor Agent, opencode, and local models
- MCP server per worker for tool access
- Written in Go. Two binaries. SQLite. No heavy deps.

Would love feedback from anyone working on multi-agent setups or just what you think in general. Thank you!

5

Prefab – A generative UI framework for Python #

prefab.prefect.io
0 comments · 11:37 PM · View on HN
Hi HN,

I'm the author of FastMCP, the most popular Python framework for building MCP servers. I've been really excited about MCP Apps for a while. I think letting a server ship a fully interactive UI directly into the conversation is one of the most compelling additions to the protocol.

I wanted to make this a first-class experience in FastMCP, but I kept getting stuck on what it actually means for a Python framework to integrate with a frontend feature. The JavaScript ecosystem has extraordinary tooling for this. I didn't want to build a worse version of it just to stay in Python.

What changed my thinking was looking at how MCP is actually deployed. Most people think of it as a way to reach customers, but what we overwhelmingly see is companies using MCP internally — replacing dashboards, workflows, and internal tools. And for that, Python developers don't need the full JS ecosystem and its polished, branded, custom frontends. What they need is a way to compose the right components and data bindings.

So we built Prefab. It's a generative UI framework with 100+ shadcn components that you compose using Python context managers. It serializes to a JSON protocol that renders as a real React application, with full client-side interactivity, reactive state, actions, and no JavaScript required. And because context managers have no closing tags, a partial Prefab script is already a valid UI, which means an agent can stream an interface to the renderer as it generates, without waiting for the full program to complete.
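The context-manager idea is easy to picture with a toy serializer: each `with` block pushes a node onto a stack and pops it on exit, so nesting builds a JSON component tree, and a half-finished script still yields a valid prefix of that tree. The names below are hypothetical, not Prefab's actual API:

```python
import json
from contextlib import contextmanager

# Toy serializer: nesting `with` blocks builds a JSON component tree that a
# renderer (e.g. a React client) could consume.
_root = {"type": "app", "children": []}
_stack = [_root]

@contextmanager
def component(kind: str, **props):
    node = {"type": kind, "props": props, "children": []}
    _stack[-1]["children"].append(node)
    _stack.append(node)
    try:
        yield node
    finally:
        _stack.pop()

def text(value: str) -> None:
    _stack[-1]["children"].append({"type": "text", "value": value})

# Compose a small UI; because there are no closing tags, interrupting the
# script at any point still leaves a well-formed (partial) tree.
with component("card", title="Orders"):
    with component("table", rows=3):
        text("order_id | status | total")
    with component("button", action="refresh_orders"):
        text("Refresh")

print(json.dumps(_root, indent=2))
```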

Prefab is built right into FastMCP 3.2, but Prefab Apps can also run standalone or against any REST backend (like FastAPI).

Docs: https://prefab.prefect.io

Try the playground (no install required): https://prefab.prefect.io/docs/playground

GitHub: https://github.com/PrefectHQ/prefab

4

Linggen – Open-source AI agent with P2P remote access from your phone #

linggen.dev
1 comment · 9:43 PM · View on HN
Hi HN, I built Linggen — a local-first, open-source AI coding agent written in Rust.

What's new in 0.9.2:

- P2P remote access via WebRTC — control your agent from your phone, no port forwarding or cloud relay needed. Just `ling login` and scan a QR code.
- Plan mode — the agent proposes a step-by-step plan before writing code; you approve or edit before execution.
- Works with any model — Ollama, OpenAI-compatible APIs, Gemini, DeepSeek. Bring your own keys or run fully local.

It's like Claude Code but model-agnostic, extensible through skills (markdown files), and now accessible from anywhere.

Demo video on the landing page showing install → plan → build → mobile sync.

Install: curl -fsSL https://linggen.dev/install.sh | bash

GitHub: https://github.com/linggen/linggen

3

Composer – Diagram Your Codebase with MCP #

usecomposer.com
0 comments · 11:45 PM · View on HN
I've been working on Composer 2.0 (usecomposer.com), a tool that connects to your AI coding agents (Claude Code, Cursor, Copilot, Windsurf) via MCP and generates full architecture diagrams from your codebase.

You can also describe what you want to build in plain English and an AI architect designs the system with you!

Would love feedback on where it's useful and where it breaks down.

3

Prompt injection detector beats ProtectAI by 19% accuracy, 8.9x smaller #

huggingface.co
2 comments · 4:58 PM · View on HN
ProtectAI's deberta-v3 is the most downloaded prompt injection classifier on HuggingFace. We benchmarked directly against it on Qualifire's independent dataset (excluded from our training).

* 91.68% vs 72.28% accuracy

* 95.84% vs 65.33% precision — at 65% precision, roughly 1 in 3 blocked prompts is a legitimate request

* 83MB vs 739MB ONNX size

* 101ms vs 646ms on CPU

3

A website to track live music attendance #

showcount.com
0 comments · 8:42 PM · View on HN
TL;DR: I built a website that allows users to track the concerts they've been to. If you have strong opinions about engineering/design or how shows should be tracked (festivals, venues, etc...), I'd love to get your input!

For the past ~5 years, I've been tracking the shows I attend on my personal website (https://love-music-will-travel.henryrobbins.com). It's fun to see things like distance traveled and how many times I've been to certain venues. I know many friends who also track their shows through notes, ticket stubs, Excel, etc... It always bummed me out that I couldn't pore through their concert data myself...

showcount.com is my solution to that desire. It's essentially a public version of my old personal website, where anyone can make an account and manage a show list (mine is https://www.showcount.com/user/love-music-will-travel).

I'm currently on the lookout for other live music lovers and/or data nerds to try out the site and give opinions on various design choices. If any of the following topics are of interest to you, please reach out!

- How should venue name/location changes be handled?
- How should music festivals be handled?
- I have an initial version of an AI parser for loading in existing show lists; how can this be made more robust?
- What else should have first-class tracking support (e.g., friends in attendance)?

As an aside, this project is also my first experiment with full-on vibe-coding / harness-engineering. I began the project with Cursor and then switched to Claude Code. I've been programming for the better part of a decade, mostly Python and Java. Full-stack development is relatively new to me. I include the tech stack below. Most decisions were made pragmatically based on what I thought would get me to a first version of the site as quickly as possible.

- Next.js web app hosted on Vercel
- FastAPI backend service (for the AI parsing) hosted on Railway
- Supabase
- Observability through Axiom (logging), PostHog (analytics), and Sentry (monitoring)
- Clerk for user authentication
- Google Maps API for venue locations
- Claude API for the AI parser
- Terraform for infra-as-code

2

I built a tool to bootstrap VLESS and REALITY over SSH (with rollback) #

0 comments · 6:32 PM · View on HN
Built a small tool that sets up VLESS + REALITY over SSH in one command. It handles:

- full Xray setup
- client configs (vless URI, sing-box, mihomo)
- rollback if something breaks

Example: ./irit.sh --mode setup --host <ip>

Would love feedback: https://github.com/anonymmized/Irit

2

Open-source megakernel that matches M5 Max tok/W at 2x the throughput on RTX 3090 #

github.com
0 comments · 3:00 PM · View on HN
Hey there, we fused all 24 layers of Qwen3.5-0.8B (a hybrid DeltaNet + Attention model) into a single CUDA kernel launch and made it open source for everyone to try.

On an RTX 3090 power-limited to 220W:

- 411 tok/s vs 229 tok/s on M5 Max (1.8x)
- 1.87 tok/J, beating M5 Max efficiency
- 1.55x faster decode than llama.cpp on the same GPU
- 3.4x faster prefill

The RTX 3090 launched in 2020. Everyone calls it power-hungry. It isn't; the software is. The conventional wisdom: NVIDIA is fast but thirsty, Apple Silicon is slow but sips power, pick a side.

With stock frameworks, the numbers back that up:

Setup | tok/s | Power | tok/J
RTX 3090 (llama.cpp) | 267 | 350W | 0.76
M5 Max (LM Studio) | 229 | ~130W | 1.76

Case closed. Except the 3090 has 936 GB/s of bandwidth and 142 TFLOPS of FP16 compute, and llama.cpp extracts 267 tok/s out of it. That ratio is absurd.

Traditional inference dispatches one kernel per operation. For 24 layers, that's roughly 100 launches per token. Every boundary means:

- Return control to the CPU
- Dispatch the next kernel
- Re-fetch weights from global memory
- Synchronize threads

Why had nobody done this yet? Qwen3.5-0.8B isn't a vanilla transformer. It alternates:

- 18 DeltaNet layers: linear attention with a learned recurrence
- 6 full-attention layers: standard MHA

This hybrid pattern is where frontier models are heading: Qwen3-Next, Kimi Linear, all of them. DeltaNet scales linearly with context length instead of quadratically.

It's new, and nobody has shipped a fused kernel for it. MLX doesn't have DeltaNet kernels at all. llama.cpp supports it generically. Everyone else is waiting. The 267 tok/s wasn't a hardware ceiling, it was the software ceiling for a brand-new architecture.

We wrote a single CUDA kernel that runs the entire forward pass in one dispatch. Data stays in registers and shared memory as it flows through the network. Zero CPU round-trips, zero redundant memory fetches.

- 82 blocks x 512 threads, all SMs occupied
- BF16 weights and activations, FP32 accumulation
- DeltaNet recurrence runs in warp-cooperative FP32 registers
- Full attention fuses QKV, RoPE, causal softmax, and output projection
- Cooperative grid sync replaces kernel launches between layers

Results on the same RTX 3090, same model, same weights:

Setup | Prefill (pp520) | Decode (tg128)
Megakernel | 37,800 tok/s | 413 tok/s
llama.cpp BF16 | 11,247 tok/s | 267 tok/s
PyTorch + HF | 7,578 tok/s | 108 tok/s

Then we turned the power down. Fewer wasted cycles mean less heat, so we swept nvidia-smi -pl:

Power limit | Clock | Draw | tok/s | tok/J | Notes
420W (stock) | 1980 MHz | 314W | 433 | 1.38 | baseline
300W | 1935 MHz | 299W | 432 | 1.44 | -5% power, 99.8% speed
220W | 1635 MHz | 220W | 411 | 1.87 | -30% power, 95% speed
150W | 405 MHz | 150W | 194 | 1.29 | clock cliff, too aggressive

At 220W we hit the sweet spot: 95% of the throughput for 70% of the power. Tighter execution converts almost directly into saved watts. Measurement: NVML energy counters for NVIDIA, powermetrics for Apple Silicon, matching Hazy Research's Intelligence Per Watt methodology. Accelerator power only, not wall draw.
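The headline ratios can be recomputed directly from the sweep; here is a quick sanity check in Python using the table's own numbers:

```python
# Sanity-check the power sweep: tok/J and the "95% speed for 70% power" claim.
stock_toks, stock_watts = 433, 314    # 420 W limit, measured draw
capped_toks, capped_watts = 411, 220  # 220 W limit

print(f"tok/J at 220 W:  {capped_toks / capped_watts:.2f}")   # ~1.87
print(f"throughput kept: {capped_toks / stock_toks:.0%}")     # ~95%
print(f"power used:      {capped_watts / stock_watts:.0%}")   # ~70%
```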

Without the megakernel the 3090 barely edges out a laptop chip. With it, a five-year-old GPU beats Apple's latest on throughput, matches it on efficiency, and costs a quarter as much. The NVIDIA vs Apple efficiency gap isn't silicon. It's software.

Try it:

git clone https://github.com/Luce-Org/luce-megakernel.git
cd luce-megakernel
pip install -e .
python bench_pp_tg.py

Requires: NVIDIA Ampere+ (tested on 3090), CUDA 12+, PyTorch 2.0+, ~1.5GB VRAM.

Code is open source (MIT): https://github.com/Luce-Org/luce-megakernel

Let us know if you have any feedback.

2

IDWIW – a YouTube viewer to avoid algorithm traps #

0 comments · 2:39 PM · View on HN
Hi HN,

I built IDWIW — short for "I Decide What I Watch" — an algorithm-free YouTube viewer.

You can try it here: https://get.idwiw.app/

The problem I kept running into was that YouTube is full of "algorithm traps". I usually want to spend as little time on the platform as possible, even though there’s a lot of useful content I want to consume for learning.

Existing RSS readers didn’t really solve this for me because they aren’t designed to queue up and consume video well. YouTube’s "Watch Later" list also didn’t work — videos don’t automatically disappear after watching, and since it lives on YouTube, you're still constantly exposed to recommendations.

So I built what I actually wanted: a dedicated YouTube RSS reader and viewer with a proper watch list.
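For context, YouTube publishes a plain Atom/RSS feed per channel, which is what makes an algorithm-free reader like this practical. The app itself is Rails; here is a minimal Python sketch of reading one channel feed (the channel id is a placeholder):

```python
import urllib.request
import xml.etree.ElementTree as ET

# YouTube publishes an Atom feed per channel; no API key or login needed.
CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"  # placeholder channel id
FEED_URL = f"https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}"

NS = {"atom": "http://www.w3.org/2005/Atom"}

with urllib.request.urlopen(FEED_URL) as resp:
    tree = ET.parse(resp)

# List the channel's latest uploads: title and link, with no recommendations.
for entry in tree.getroot().findall("atom:entry", NS):
    title = entry.findtext("atom:title", namespaces=NS)
    link = entry.find("atom:link", NS).attrib["href"]
    print(title, "->", link)
```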

The goal is to make it easy to decide what to watch (and what not to), and when watching, stay completely isolated from the recommendation feed.

I built the first version in about 12 days using Rails 8.

If you're curious about how it was built, I wrote a detailed breakdown here: https://medium.com/@kei178/from-idea-to-product-in-12-days-w...

The app is free to try. I'd love feedback — especially whether this solves a problem you personally have, and if anything feels confusing or missing.

Happy to answer any questions, technical or otherwise!

2

OpenFable – Open-source RAG engine using tree-structured indexes #

github.com
0 comments · 1:05 PM · View on HN
Hi HN, I built OpenFable, an open-source retrieval engine that implements the FABLE algorithm (https://arxiv.org/abs/2601.18116) for RAG pipelines. I'm using it in another project and thought that others might benefit.

Most RAG systems chunk documents into flat segments and retrieve by vector similarity. This works for simple lookups but breaks when answers span multiple sections, when relevant content is buried in a subsection, or when you need to control how many tokens you're sending to an LLM.

OpenFable takes a different approach: when you ingest a document, it uses an LLM to identify discourse boundaries (not fixed-size windows), then builds a hierarchical tree (root, sections, subsections, leaf chunks) with embeddings at every level. Retrieval combines two paths:

1. LLM-guided path: the LLM reasons about which documents and subtrees are relevant from summaries
2. Vector path: similarity search with structure-aware score propagation through the tree

Results from both paths are fused, deduplicated, and trimmed to fit a token budget you specify. You get the most relevant chunks, in document order, within budget.
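A simplified sketch of that fusion step (illustrative only, not OpenFable's code): merge candidates from both paths, deduplicate, greedily keep the highest-scoring chunks that fit the token budget, and return them in document order.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_pos: int    # position in the original document order
    tokens: int     # token count of the chunk
    score: float    # fused relevance score
    text: str

def fuse_and_trim(llm_path, vector_path, token_budget):
    # Deduplicate by document position, keeping the better score per chunk.
    best = {}
    for chunk in list(llm_path) + list(vector_path):
        prev = best.get(chunk.doc_pos)
        if prev is None or chunk.score > prev.score:
            best[chunk.doc_pos] = chunk

    # Greedily keep the most relevant chunks that still fit the budget...
    selected, used = [], 0
    for chunk in sorted(best.values(), key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= token_budget:
            selected.append(chunk)
            used += chunk.tokens

    # ...then return them in document order so the context reads naturally.
    return sorted(selected, key=lambda c: c.doc_pos)

hits = fuse_and_trim(
    llm_path=[Chunk(3, 120, 0.9, "…"), Chunk(7, 200, 0.6, "…")],
    vector_path=[Chunk(3, 120, 0.7, "…"), Chunk(1, 80, 0.8, "…")],
    token_budget=250,
)
print([(c.doc_pos, c.tokens) for c in hits])  # fits 200 of the 250-token budget
```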
                                                                                                     
From the FABLE paper: the algorithm matches full-context inference (517K tokens) using only 31K tokens, a 94% reduction, while hitting 92% completeness vs. Gemini-2.5-Pro at 91% with the full document.

Retrieval only: OpenFable returns ranked chunks, not generated answers. Bring your own LLM for generation.
                                                                                                     
It runs as a Docker stack (FastAPI + PostgreSQL/pgvector) and exposes both a REST API and an MCP server, so LLM agents like Claude Desktop or Cursor can use it directly.

Trade-offs I want to be upfront about:

- Ingestion is expensive; every document requires multiple LLM calls for chunking and tree construction
- Retrieval isn't sub-second; the LLM-guided path adds round-trips
- No built-in auth; designed to sit behind a reverse proxy
- v0.1.0 — works end to end, but the roadmap includes async ingestion, document deletion, and metadata filtering

Stack: Python 3.12, FastAPI, SQLAlchemy, pgvector, LiteLLM, fastMCP. Apache 2.0.

Happy to answer questions about the algorithm, implementation choices, or benchmarks.

1

Host Infinite Python Services #

phemeral.dev
0 comments · 5:14 PM · View on HN
Built this project over the past few months to make it simpler to host and scale Python servers. For FastAPI, Flask, and Django apps, all you need to do is connect your git repo and push; the config is figured out automatically and your app is deployed for you. For any other framework, you just specify a start command before you push. Your applications run on Phemeral's managed compute, automatically scale to zero, and rapidly (~30ms) scale back up from VM snapshots when traffic arrives.
1

Hoeren – Local-only meeting transcription and voice dictation #

0 comments · 2:37 PM · View on HN
My company has a strict data policy around call recordings (and around using cloud AI tools). I got tired of working around it every time I needed to transcribe something, so I built Hoeren, a macOS app that keeps everything on-device. No account, no internet, no subscription. Granola, Otter, and Fireflies all require at least one of those; Hoeren doesn't.

Two things it does well:

- Meeting transcription and summaries with action items
- Voice dictation with optional AI prompting - you speak roughly, it turns that into something better (I use it to improve my English in Slack)

Apple Silicon only; Whisper and Qwen under the hood. One-time $49, no subscription.

HN50 gets you 50% off - https://hoeren.app