Show HN for May 4, 2026

17 items

I indexed 8,643 BSides talks across 227 chapters and 6 continents

allbsides.com

8 comments · 10:10 PM

Hi HN,

I'm Roland, and for the past few weeks, I've been building AllBSides — a directory of every BSides conference talk uploaded to YouTube. As of today, 8,643 talks from 5,927 speakers across 227 chapters in 68 countries. Combined runtime is 280 days. The transcripts come to about 60 million words.

The archive came together in stages:

1. Manually map every BSides chapter's YouTube channel
2. Pull every video and transcript from Supabase
3. Run each transcript through Haiku for tag extraction (tools, topics, difficulty, team, talk style, research method, and much more)
4. Run results through Sonnet for categorization and dedup
5. Final pass goes through Opus for verification
6. Do a manual verification: at one time, the pipeline showed over 16k AI suggestions for manual verification. Today, most are resolved.
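The tiered steps above can be sketched roughly as follows. This is a minimal illustration, not the actual AllBSides code: `TalkRecord`, `run_pipeline`, and the three stub functions are hypothetical names, and the stubs are plain Python stand-ins for the Haiku, Sonnet, and Opus calls so the example runs without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage signature: (transcript, tags so far) -> new tag list.
Stage = Callable[[str, list[str]], list[str]]

# Hypothetical lookup of known tool names; the real taxonomy is far larger.
KNOWN = {"wireshark": "Wireshark", "nmap": "Nmap", "burp": "Burp Suite"}


@dataclass
class TalkRecord:
    video_id: str
    transcript: str
    tags: list[str] = field(default_factory=list)
    needs_review: bool = False


def stub_extract(transcript: str, _prior: list[str]) -> list[str]:
    """Cheap, noisy first pass (stand-in for the small model)."""
    words = (w.strip(".,") for w in transcript.lower().split())
    return [w for w in words if w in KNOWN]


def stub_consolidate(_transcript: str, tags: list[str]) -> list[str]:
    """Canonicalize names and drop duplicates (stand-in for the mid tier)."""
    merged: list[str] = []
    for tag in tags:
        canonical = KNOWN.get(tag, tag)
        if canonical not in merged:
            merged.append(canonical)
    return merged


def stub_verify(transcript: str, tags: list[str]) -> list[str]:
    """Keep only tags whose first word occurs in the transcript (stand-in for the top tier)."""
    text = transcript.lower()
    return [t for t in tags if t.split()[0].lower() in text]


def run_pipeline(talk: TalkRecord, extract: Stage, consolidate: Stage, verify: Stage) -> TalkRecord:
    raw = extract(talk.transcript, [])
    merged = consolidate(talk.transcript, raw)
    confirmed = verify(talk.transcript, merged)
    talk.tags = confirmed
    # Anything the verifier changed is queued for the manual pass.
    talk.needs_review = set(confirmed) != set(merged)
    return talk


talk = run_pipeline(
    TalkRecord("abc123", "We used Wireshark and nmap, then Wireshark again."),
    stub_extract, stub_consolidate, stub_verify,
)
print(talk.tags, talk.needs_review)
```

Passing the stages in as callables keeps each tier swappable, which fits the post's point that the whole pipeline is rebuildable from scratch.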

Total LLM cost so far: about €200. The whole pipeline is rebuildable from scratch.

Each talk gets its own page with embedded video, full transcript, speakers, tags, and "related talks." Each tool/framework/protocol/standard mentioned across the corpus gets its own page (3,968 distinct technologies tracked).

Some interesting facts I gathered while building it:

-(A) The site is currently 94% bot traffic. Of that, about 80,000 hits/month are AI training crawlers (ClaudeBot, GPTBot, meta-externalagent). Within 7 days of the talks archive going live, all major AI labs had ingested the entire corpus. The discovery cascade was startling to watch in real time.

-(B) The taxonomy work was the hardest part. Distinguishing "tools" from "frameworks" from "protocols" from "concepts" sounds easy until you have 5,000 ambiguous extracted entities. The 3-tier LLM pipeline helped a lot — Haiku alone was too noisy, Opus alone was too expensive.

-(C) Top tools mentioned: Wireshark (343), PowerShell (342), Metasploit (332), Burp Suite (322), GitHub (296), VirusTotal (273), Docker (253), Splunk (251), Nmap (247), MITRE ATT&CK (237). The list reflects what BSides talks actually discuss, not what vendors curate.

-(D) May is the peak BSides month — 29 events, 17% of all events with dates.

-(E) The top 1% of talks (86 videos by view count) account for 51% of all viewership. The other 99% are deeply niche, often the only video record of a specific technique.
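A concentration figure like the one in (E) can be computed with a few lines. The sketch below assumes a plain list of per-video view counts; `top_share` is a hypothetical helper name and the toy numbers are made up for illustration, not taken from the AllBSides data.

```python
def top_share(view_counts, fraction=0.01):
    """Return (k, share): the number of videos in the top `fraction`
    by views, and the share of total views they account for."""
    if not view_counts:
        return 0, 0.0
    ranked = sorted(view_counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # 1% of 8,643 talks gives 86
    total = sum(ranked)
    if total == 0:
        return k, 0.0
    return k, sum(ranked[:k]) / total


# Toy data: one popular talk and three niche ones.
k, share = top_share([1000, 50, 30, 20], fraction=0.25)
print(f"top {k} video(s) hold {share:.0%} of views")
```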

The stack is intentionally lean: Go, SQLite, vanilla JavaScript, BunnyCDN. Static rendering at build time. No frameworks, no client-side state. The site costs about €50/month to run.

The data behind this post and much more can be found in the site footer, under the link "stats".

Happy to answer questions about the data pipeline, the taxonomy decisions, or what the AI crawler patterns looked like as the archive went live. Feedback on what to build next is genuinely welcome — I'm a solo dev figuring this out as I go.

— Roland (parkado)

Muesli – If Granola and Wisprflow had an open source on device baby

freedspeech.xyz

10 comments · 4:41 PM

Hey folks, I am the developer behind muesli, a one-stop app for your speech-to-text needs, whether voice dictation or meeting transcription. It runs on device on the Apple Neural Engine using CoreML-based STT models (Parakeet, Whisper, Cohere transcribe). Everything is open source and we are at 160 stars, au naturel. I would love for folks to use it and contribute to its development.

Bonsai 1.7B ternary model at 442T/s on M4 Max

agents2agents.ai

3 comments · 3:47 PM

We took the recently released Bonsai 1.7B ternary model from PrismML (https://github.com/PrismML-Eng/Bonsai-demo) and ran our agentic evolution search on it for 6 hours to optimize the Metal kernels. The search was fully autonomous. Measured against unmodified upstream llama.cpp at the same Bonsai/Q2_0 commit, same M4 Max:

- tg128: 309.82 → 442.42 t/s (+42.8%)
- pp512: 4250.32 → 4622.63 t/s (+8.8%)
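The relative gains can be recomputed from the quoted throughput figures. This is only the arithmetic, with `percent_gain` as a hypothetical helper name; it does not reproduce the benchmark itself.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Tokens per second as quoted in the post: (baseline, optimized).
results = {
    "tg128": (309.82, 442.42),
    "pp512": (4250.32, 4622.63),
}

for name, (before, after) in results.items():
    gain = percent_gain(before, after)
    print(f"{name}: {before} -> {after} t/s ({gain:+.1f}%)")
```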

Agent-evals – Claude skill to build your own evals

github.com

1 comment · 7:27 PM

I’ve spent the past 10 years working on AI in finance, with much of that time focused on building evaluation systems for production environments.

As agents become more widely adopted, more software engineering and product people have started building them. But I’ve noticed that many teams are not yet fluent in systematic evaluation, or in the processes needed to keep agent quality high over time.

For large organizations, that gap is rarely the bottleneck because they have dedicated teams. But after speaking with a number of startups, it became clear that building strong, up-to-date evals is much harder in a fast-moving startup, especially when the team does not have a data science background.

So I tried to condense as much of my experience as possible into a Claude Skill: a practical starting point for evaluating your agent.

The idea is simple: tell Claude you need evals, and it will set up a solid baseline directly in your codebase - that's it! The evals follow patterns I've seen many times before, and give you a summary of what your agent does well and what it doesn't.

Looking forward to your feedback!

Node-Vmm – Linux MicroVMs in Pure Node.js for Mac/Windows/Linux in ~1s

github.com

1 comment · 10:37 PM

ReflowPDF – wrote a layout engine because every PDF library failed

reflowpdf.com

2 comments · 1:09 AM

Let – Offline-first life events tracker (React Native, SQLite)

github.com

1 comment · 1:53 PM

Replacing spec-driven development with just facts

github.com

2 comments · 2:00 PM

I had a lot of issues with spec-driven approaches. Agents produce fluff too readily, and large projects have so many specs that agents start making mistakes maintaining them. There's a constant consistency tax.

In the end every spec is just a bunch of facts, so I decided to keep the facts and throw away everything else, while making it friendlier for agentic use.

Introducing facts - skills and CLI for agents to use facts-driven development. https://github.com/av/facts

Kula – a family health platform that makes sense of your data

12 comments · 4:40 AM

My parents are in India, I'm in the US. Their health system was a continuous stream of WhatsApp photos of lab reports, vague updates over the phone, and me finding out about doctor visits weeks later. So I built Kula. Upload lab reports (photo, PDF, WhatsApp forward) and it parses them and tracks trends. Connect a wearable and track daily health signals as well as your baselines. Everything goes into one record you can search and review over time. There's a chat layer where you can ask questions in plain language, like "what's my dad's cholesterol trend showing", and get a sourced answer from your own data. I built it primarily for my family. My parents told me they'd use it even without me, just to have their records organized before doctor visits. That truly changed how I think about it.

Looking for feedback on this platform. Would you use this? What are your thoughts? What's missing?

www.mykula.health

Privacy-First Pdf Converter

privapdf.net

6 comments · 1:55 PM

NeuralScript – A pure-Rust AOT compiler

github.com

0 comments · 8:06 PM

Yames – A distraction-free desktop metronome built with Rust and Tauri

turutupa.github.io

0 comments · 11:49 PM

Pytest plugin that classifies why your CI failed

github.com

0 comments · 2:12 PM

NoReporter – AI-only newsroom, $1/year

noreporter.ai

0 comments · 6:38 PM

A completely automated news agency that eliminates human selection bias.

Genosyn – Run Autonomous Companies

genosyn.com

0 comments · 7:24 PM

Visual SSL TLS Handshake Visualizer

sitesecurityscore.com

0 comments · 3:19 PM

Here's a handy tool for anyone who wants to learn how an SSL/TLS handshake works, including key exchange and the certificate chain.

Logram, a filterable, modular log navigator for the terminal

github.com

0 comments · 4:42 PM

I was _very_ tired of going through massive log files with just vim/grep. You give logram the formats of your logs, and it parses them into different fields that you can filter on. I also aimed to make it easily modular. It saves me a ton of time and I hope you find it useful too.
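The general idea of describing a log format once and then filtering on the parsed fields can be sketched with a regular expression and named groups. This is not logram's implementation or configuration syntax: the `LOG_FORMAT` pattern, the sample lines, and both function names are assumptions made up for illustration.

```python
import re

# Hypothetical format: "<date> <time> <LEVEL> <component>: <message>"
LOG_FORMAT = re.compile(
    r"(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<component>[\w.]+): (?P<message>.*)"
)


def parse_lines(lines, pattern=LOG_FORMAT):
    """Turn matching lines into dicts of named fields; skip the rest."""
    records = []
    for line in lines:
        match = pattern.match(line)
        if match:
            records.append(match.groupdict())
    return records


def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


sample = [
    "2026-05-04 16:42:01 ERROR db.pool: connection timed out",
    "2026-05-04 16:42:02 INFO http.server: GET /health 200",
    "not a structured line",
]

records = parse_lines(sample)
for record in filter_records(records, level="ERROR"):
    print(record["timestamp"], record["component"], record["message"])
```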