Show HN for May 17, 2026

11 posts

444

Semble – Code search for agents that uses 98% fewer tokens than grep #

github.com

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.

Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
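To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion (RRF) over two ranked lists, one from BM25 and one from embedding search. This is not Semble's code: the function name, the file names, and the constant `k=60` (a commonly used default) are assumptions for illustration, and the code-aware reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not Semble's value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query.
bm25_hits = ["auth.py", "session.py", "utils.py"]
embedding_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
print(fused)
```

A document that both retrievers rank highly rises to the top, while a document found by only one retriever still stays in the fused list.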

Main features:

- Token-efficient: 98% fewer tokens than grep+read

- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)

- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested

- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode

- Zero config: no API keys, no GPU, no external services
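For readers unfamiliar with the NDCG@10 metric quoted above, here is a minimal sketch of how it is commonly computed for one query. It is an assumption-laden simplification, not the benchmark's code: it uses linear gain, and it derives the ideal ordering from the relevance labels of the returned results only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the actual ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each returned result, in ranked order (1 = relevant).
print(ndcg_at_k([1, 0, 0]))  # relevant result ranked first
print(ndcg_at_k([0, 1]))     # relevant result ranked second scores lower
```

A benchmark score such as 0.854 would then be this value averaged over all queries.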

Install in Claude Code with: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Or check our README for other installation instructions, benchmarks, and methodology:

Semble: https://github.com/MinishLab/semble

Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks

Model: https://huggingface.co/minishlab/potion-code-16M

Let us know if you have any feedback or questions!

Codiff, a local diff review tool #

github.com

7 comments · 5:30 AM · View on HN

Nowadays I review a lot of code locally that was written by LLMs. I used to review my own code using git + delta. It started to feel limiting with the amount of code written by LLMs.

When looking at a large diff on Friday, I pointed an LLM at diffs.com and trees.software and told it to build an app. It took only 16 minutes, and the result is extremely fast on large diffs, beautiful, and minimal.

Today I polished it up and added all the features that I need. It has file filters, search, an LLM walkthrough mode, and review comments that you can paste back into your LLM.

I will be using Codiff a lot, and can finally review the large diff from Friday that led me to build this. If you like it, fork it!

Typeset sitelen pona and copy a PNG (for toki pona speakers) #

sitelen.vercel.app

1 comment · 4:41 PM · View on HN

Serene Bach – a Go weblog engine that runs as CGI or HTTP #

github.com

0 comments · 3:47 AM · View on HN

I originally made Serene Bach in the 2000s as a weblog engine written in Perl CGI. I rebuilt it from scratch in Go as a single binary that can run either as a CGI program or as a normal HTTP server.

I know CGI is generally considered legacy technology now, but I still rely on it for shared hosting. In this version, I added Markdown support, a responsive default theme, Open Graph image generation, and static output generation.

It is still in beta, but the repository includes a Docker image published on GHCR, documentation, and a local quick start. I'd appreciate feedback from anyone interested in small self-hosted publishing tools, especially if you still care about shared hosting or CGI-style deployment.

I made a printable graph paper templates website #

printablegraphpaper.org

9 comments · 2:56 PM · View on HN

Cheap-IM: Thinking Machines' demo on a CPU laptop #

github.com

0 comments · 11:49 PM · View on HN

Forecasting my backyard weather with a 22M time-series model #

huggingface.co

4 comments · 3:08 PM · View on HN

HypergraphZ – directed hypergraph library in Zig with Python bindings #

github.com

0 comments · 7:08 PM · View on HN

Built a verifiable, open-source SOC 2 readiness scanner #

loxeai.com

0 comments · 12:02 AM · View on HN

After months of speaking with 50+ CISOs, DevOps engineers, and pre-Series A founders, I realized there is a problem in the GRC industry. SOC 2 automation exists, but people are split on whether to trust black-box tools with systems that are continuously changing. As a result, audits are slow and mistrusted.

Right now the most important thing is verifiability and depth, rather than just compliance automation, because that already exists everywhere.

Here's what I did from learning this:

-> Created an open-source AWS Evidence Scanner & Control Mapper for lean, pre-Series A, AWS-native teams that are thinking about SOC 2 Type I or are undergoing a SOC 2 Type I audit. It collects evidence across 15+ AWS services and maps it to 12 critical controls in the Trust Services Criteria.

Why open-source? Accessibility for people who might have their hands tied choosing between expensive GRC tools. It's also used as a trust mechanism: the code is right there. A CEO or auditor can read exactly what API calls we make before giving us the role ARN.

-> I included a paid report embedded within the tool (open-core model). Users have the option to pay for a report in which every finding traces back to the API call that produced it and is SHA-256 hashed (at a fraction of the cost of bigger legacy platforms). It comes with remediation steps and a compliance copilot to help with other parts of the Type I process beyond evidence collection (like policy writing, risk assessment, etc.).

Why a paid report? The best way to make the auditor's job as easy as possible is to give them a verifiable package where the evidence is right there in front of them, timestamped so they can see what happened and when (rooted in AWS APIs). No black box, no way to fake it. This saves weeks of back-and-forth between auditors and clients, with the click of a few buttons.

An auditor can re-run the same API call, hash the response themselves, and verify it matches what's in the report.
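The check described above could look roughly like the sketch below. It is not the tool's actual code: the function names, the sample response, and the canonical JSON serialization (sorted keys, compact separators) are assumptions. The tool and the auditor would need to agree on the exact serialization for the hashes to match, and real AWS responses containing non-JSON types such as datetimes would need converting first.

```python
import hashlib
import hmac
import json


def evidence_digest(api_response: dict) -> str:
    """SHA-256 hex digest of a JSON-serializable API response.

    Keys are sorted and whitespace is removed so the same data always
    hashes to the same value (an assumed canonical form).
    """
    canonical = json.dumps(api_response, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_report(api_response: dict, reported_digest: str) -> bool:
    """True if re-hashing the response reproduces the digest in the report."""
    return hmac.compare_digest(evidence_digest(api_response), reported_digest)


# Hypothetical response captured at scan time and its recorded digest.
scanned = {"Bucket": "example-logs", "Encryption": {"Algorithm": "AES256"}}
reported = evidence_digest(scanned)

# The auditor re-runs the call and gets the same data, in a different key order.
rerun = {"Encryption": {"Algorithm": "AES256"}, "Bucket": "example-logs"}
print(matches_report(rerun, reported))
```

If the underlying configuration had changed or the evidence had been edited, the digests would no longer match.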

Value: 30 seconds to deploy, then 5 minutes to run the scan, after which the evidence is collected and mapped. The paid report includes verifiable evidence companies can send to their auditor. Paid features include a copilot to help with audit readiness beyond just evidence collection.

-> Understand Limitations.

I understand the scope of this product is pretty limited, in part because it's also very new. I'm not going to claim it solves all of compliance, because it doesn't. It makes a very time-consuming part of the process easy to automate and gives an auditor a report they can rely on.

What now? If you've gone through SOC 2, are thinking about it, or are in the middle of it, I would love your reaction to the output, even if it's critical. I'm also looking for early testers/users.

repo here: https://github.com/adog0822/AWS-Evidence-Layer

try it here: https://loxeai.com

A freehand drawing guestbook for my portfolio #

paco.fyi

0 comments · 3:42 PM · View on HN

I added a drawing guestbook to my personal portfolio as a way to let visitors leave a doodle.

It uses the perfect-freehand library to generate pressure-sensitive SVG strokes. Each drawing is serialized as a stroke point array and appended to a flat JSON file on the server. No database, no auth.
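As a rough illustration of the storage approach, here is a minimal sketch of appending a drawing to a flat JSON file. The site's real server code is not shown in the post, so everything here is an assumption: the function name, the file layout (one JSON array of drawings), and the point format of `[x, y, pressure]` per stroke point. It also ignores concurrent writes, which a real server would need to handle.

```python
import json
from pathlib import Path


def append_drawing(path, strokes):
    """Append one drawing to a JSON file holding a list of drawings.

    `strokes` is a list of strokes, each a list of [x, y, pressure] points
    (an assumed format). Returns the number of drawings now stored.
    """
    path = Path(path)
    # Start from an empty guestbook if the file does not exist yet.
    drawings = json.loads(path.read_text()) if path.exists() else []
    drawings.append({"strokes": strokes})
    path.write_text(json.dumps(drawings))
    return len(drawings)


if __name__ == "__main__":
    doodle = [[[10, 12, 0.4], [14, 18, 0.6], [20, 25, 0.5]]]
    print(append_drawing("guestbook.json", doodle))
```

Rewriting the whole file on each append keeps the sketch simple and is reasonable for a small guestbook; a newline-delimited format would avoid re-reading the file as it grows.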

Feel free to draw something!

Proper, a Rails-shaped Python web framework #

properproject.org

0 comments · 2:58 PM · View on HN
