Show HN for March 31, 2026 (55 posts)
How This Graybeard Built the Fastest and Freest Postgres BM25 Search #
The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is gated behind the AGPL; and developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.
Or would we? I'd been experimenting with AI-boosted development long enough at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.
I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.
It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and hopefully soon, a hyperscaler near you:
https://github.com/timescale/pg_textsearch
In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. To my surprise, we were not only able to meet Parade/Tantivy's query performance, but exceed it substantially, measuring a 4.7x advantage on query throughput at scale:
https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-...
It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.
The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.
Thanks,
TJ ([email protected])
Claude Code rewritten as a bash script #
PhAIL – Real-robot benchmark for AI models #
PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking – one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.
Best model: 64 UPH (units per hour). Human teleoperating the same robot: 330. Human by hand: 1,300+.
Everything is public – every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions.
Happy to answer questions about methodology, the models, or what we observed.
Wageslave – I quit my soul sucking job to make a game about it #
So I relegated this aspiration to the realm of pipe dreams, something whose idea I preferred to the actual process of doing it, and I continued working as a software engineer.
This only changed after a disruption to my job: the fintech I worked for got acquired by a big traditional bank and we were absorbed into their ranks.
Long story short: it wasn't a culture fit.
After six months, I decided to quit; earning a salary was the only thing keeping me there.
With all this newfound free time, I decided to give game dev another chance. I would never get back those six months, but I could use them as fuel for creative inspiration.
That's how the idea of wageslave came to be: I wanted to capture the absurdity of the 9-to-5 in an interactive format, with winks and nods to developer culture.
I don't know if I was successful with this goal, but creating it has been a lot of fun and a cathartic experience for me. I am actually enjoying the process and not just the results. So much so that I'm aiming to complete at least two more projects before I reevaluate whether this is viable.
I initially planned on releasing the game for free, but in the spirit of taking game dev seriously, I will be selling it for a small amount.
Feel free to try the demo; I'd be happy to hear any feedback!
DeepTable – an API that converts messy Excel files into structured data #
The core issue: most real-world spreadsheets aren't relational tables. Merged cells, multi-level headers, multiple tables per sheet, totals mixed in with data. You can't just dump them to CSV and call it done. LLMs handle the easy cases but fall apart on complex workbooks at scale.
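To illustrate one of the problems above, here is a toy sketch of flattening a two-level (merged-cell) header into relational column names. This is illustrative only, not DeepTable's pipeline, and the sample data is invented:

```python
# A sheet like:
#          | 2024        | 2025
#   Region | Q1   | Q2   | Q1   | Q2
# has merged top-level cells that a naive CSV dump leaves blank.
rows = [
    ["",       "2024", "2024", "2025", "2025"],
    ["Region", "Q1",   "Q2",   "Q1",   "Q2"],
    ["EMEA",   10,     12,     11,     15],
]

def flatten_headers(rows, header_rows=2):
    top, sub = rows[0], rows[1]
    # Carry the last non-empty top-level header across merged cells,
    # then join the two levels into a single column name.
    carried, names = "", []
    for t, s in zip(top, sub):
        carried = t or carried
        names.append("_".join(x for x in (carried, s) if x))
    return [dict(zip(names, r)) for r in rows[header_rows:]]
```

Real workbooks add totals rows, multiple tables per sheet, and deeper header trees on top of this, which is where the one-off scripts tend to break down.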
Our approach uses an agent-guided compilation pipeline that produces SQL-ready relational tables with full cell-level provenance. This demo visualizes what we do: https://storage.googleapis.com/deeptable-public/deeptable_an...
We have a handful of early customers but honestly don't know yet whether this is a real market or a niche problem. We're posting this to hear from people who've dealt with arbitrary spreadsheet ingestion. Whether you solved it, gave up, or are still living with the pain.
If you want to try it on your own files, email me (see my profile for my email) and I'll give you API access.
Multi-agent autoresearch for ANE inference beats Apple's CoreML by 6× #
Each agent ran locally on a different Mac (M1–M4), repeatedly modifying how a DistilBERT model is executed on the ANE, benchmarking latency, and sharing results and insights with other agents in real time.
Instead of exploring independently, agents could:
- see what others had tried
- reuse working strategies
- avoid known failure modes
Across all tested chips, the agents ended up outperforming Apple’s CoreML baseline, with up to 6.31× lower median inference latency on the same hardware.
An interesting pattern we observed: an agent stuck at ~2.1ms latency on M4 was able to break through after incorporating strategies discovered by agents on different chips (M2, M4 Max), eventually reaching ~1.5ms and surpassing CoreML.
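The sharing mechanism described above can be sketched as a common strategy log that every agent reports into and reads from. This is an assumed design for illustration, not the actual system:

```python
import threading

class StrategyPool:
    """Shared record of attempted ANE strategies and their best latencies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tried = {}  # strategy name -> best latency (ms) observed so far

    def report(self, strategy, latency_ms):
        # Keep only the best (lowest) latency seen for each strategy.
        with self._lock:
            best = self._tried.get(strategy)
            if best is None or latency_ms < best:
                self._tried[strategy] = latency_ms

    def best_known(self):
        # The strategy any newly joining agent should try first.
        with self._lock:
            if not self._tried:
                return None
            return min(self._tried.items(), key=lambda kv: kv[1])

    def already_failed(self, strategy, threshold_ms):
        # Skip strategies another agent has shown to be too slow.
        with self._lock:
            return self._tried.get(strategy, 0.0) > threshold_ms
```

In this framing, the M4 agent's breakthrough corresponds to calling `best_known()` and picking up a strategy first reported by an M2 or M4 Max agent.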
Full write-up: https://x.com/christinetyip/status/2039040161439224157
Detailed results: https://ensue-network.ai/lab/ane?view=strategies and https://ensue-network.ai/lab/ane
Curious what other optimization problems this kind of setup could be applied to, especially in systems, compilers, or ML infra. Would be interested in exploring similar experiments.
Fingerprinting browser-impersonating bots w/o JavaScript (open spec) #
WebRTC video calls, no account needed #
just-call.app — no sign-up, no install, just a link.
Happy to answer questions or take feedback.
Prawduct, a product development framework for Claude Code #
Prawduct is a set of prompts, skills, hooks, and artifact templates that help focus Claude Code on product development rather than code development.
You can start from something as simple as "make a website with a scientific calculator" or as complex as "create an MMO with clients for iOS, Android, and web". You can specify as many or as few architectural standards and implementation details as you want.
Specialized skills like /critic and /janitor are run automatically and apply context-less reviews to catch drift, hacks, and violations of best practices.
I've been using Prawduct myself for a couple of months, developing my own projects and also iterating on Prawduct itself (which is of course self-hosted on its own framework).
I'd love to hear feedback.
INTERCALsky. ATproto client. Ada carries packets. INTERCAL carries meaning #
Posted to Bluesky. With ACCEPTABLE politeness. PLEASE.
This is INTERCAL:
https://en.wikipedia.org/wiki/INTERCAL
This is Ada: https://en.wikipedia.org/wiki/Ada_(programming_language)
Vibe Check – UX Benchmark for vibe designs #
Example of sim playback: https://app.appvelocity.io/vibe/simulation/8321db67-883b-445...
Example report: https://appvelocity-io.pmailroute.net/x/d?c=50527836&l=694a7...
If anyone shares some early Vibes, I'll run Vibe Check on your behalf and share some insights.
Ironedome Commander – Israel/Iran War Arcade #
Gravity doesn't track mass, it tracks waveform complexity #
A companion paper establishes the framework: fix an observation pipeline and measure persistence across 13 real-data domains from independent instruments (LIGO, EHT, CMB, sunspots, quasars, supernovae, neutrinos). A single temporal axis organizes all domains, with electromagnetism at one pole and gravity at the other. A periodic signal through the same pipeline produces zero positive persistence across 180+ runs — accumulation destroys the wave when observation boundaries don't divide the period.
First Paper: https://zenodo.org/records/19323952
Second Paper: https://zenodo.org/records/19341889
Companion piece to the first paper: Light and gravity are opposite poles of observation (less technical) https://www.wvrk.org/works/the-structure
Erebus (the underlying system): https://erebus.org
Browserbeam – a browser API built for AI agents #
I started digging deeper, and at some point I just bluntly asked the following question in the Cursor chat: "I ask you, as an LLM that uses these headless browsers, what do you wish people would build to make your work easier?"
And it worked: I expanded the "Thinking" section and saw "The user is asking me a really interesting meta-question ...", after which it listed the top 10 most painful issues in agent<->browser interaction.
So I started building a browser API that returns what LLMs actually need, not what browsers return.
Fast forward a few weeks and here we are. A REST API built specifically to help LLMs interact with real browsers.
Instead of reading raw HTML, you get:
- markdown and a page map
- short refs (e1, e2) for clicking instead of CSS selectors
- a stable flag when the page is ready
- diffs after each step
- the list of all interactive elements (links, buttons, inputs)
- automatic blocker dismissal
- a small extract step that returns structured JSON from a schema you describe
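To make the shape of such a response concrete, here is a hypothetical payload for a single step. The field names are my guesses for illustration, not Browserbeam's actual API:

```python
import json

# Hypothetical single-step response: a stable flag, markdown view,
# short element refs, and a diff since the previous step.
step = json.loads("""
{
  "stable": true,
  "markdown": "# Checkout\\nYour cart has 2 items.",
  "elements": [
    {"ref": "e1", "kind": "button", "text": "Pay now"},
    {"ref": "e2", "kind": "input", "label": "Promo code"}
  ],
  "diff": {"added": ["Your cart has 2 items."], "removed": []}
}
""")

# An agent clicks by short ref instead of a brittle CSS selector.
target = next(e["ref"] for e in step["elements"] if e["kind"] == "button")
```

The point of the short refs is that the model only ever has to emit "click e1" rather than reconstruct a selector from raw markup.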
Official SDKs for Python, TypeScript, Ruby. MCP server for Cursor and Claude Desktop.
Would appreciate any feedback, especially on the API design.
Open-source AI native linktree app #
An extension that opens any Goodreads book in anna's or Zlib in a click #
I built a free, open source browser extension that adds buttons directly onto Goodreads book pages. Instead of copying titles and searching manually, you just click the badge for whichever source you want.
You can also toggle sources on/off, so if you only want Z-Lib and Anna's Archive badges and not Gutenberg, you can do exactly that.
Supported sources:
Anna's Archive
Z-Library
Project Gutenberg
AudioBookBay (new!)
Supported sites:
• Goodreads
• StoryGraph
• Hardcover
• Babelio
• Novelupdates
It is available on:
• Chrome
• Firefox (including Firefox mobile)
• Edge
Anime.js is used for animation.
No data is collected; you can verify that yourself via the source code on GitHub or the privacy page.
This has been updated to v1.0.8!
It is free and open source. If you like this extension and want to support me, please star it and rate it. (You can also sponsor me on GitHub!)
Thanks.
OpenClaw Arena – Benchmark models on real tasks, rank by perf and cost #
The problem: Chatbot Arena tests conversation quality. But most people using AI agents need them to do more: browse the web, manage files, write and run code, create full applications, automate multi-step workflows. There's no benchmark that (1) tests general-purpose agentic tasks, (2) uses user-submitted tasks instead of fixed test sets, and (3) separately ranks models on both quality and cost-effectiveness.
What we built: OpenClaw Arena lets you submit any task and pit 2-5 models against each other. A judge OpenClaw agent (currently using one of the top models: Claude Opus 4.6, GPT-5.4, or Gemini 3.1 Pro) runs on a fresh VM, spawns one subagent per model, and each model solves the task independently with full access to terminal, browser, file system, and code execution.
Results feed into two live leaderboards:
- Performance — which model produces the best results
- Cost-effectiveness — which model delivers the best quality per dollar
What we've found (after 300+ battles, 15 models):
The two rankings are completely different. Performance top 3: Claude Opus 4.6, GPT-5.4, Claude Sonnet 4.6. Cost-effectiveness top 3: Step 3.5 Flash, Grok 4.1 Fast, MiniMax M2.7.
Claude Opus 4.6 ranks #1 on performance but #14 on cost-effectiveness.
Step 3.5 Flash is #1 on cost-effectiveness, #5 on performance. (I didn't expect that TBH)
Several models (GLM-5 Turbo, Xiaomi MiMo v2 Pro, MiniMax M2.7) outrank Gemini 3.1 Pro on performance. Actually, Gemini 3.1 Pro is so bad at using skills that we had to optimize the judge message just for it; otherwise it sometimes just reads the skill and decides to do nothing...
Note: we bootstrapped the first 300 battles by crawling what people are doing with OpenClaw (on X, Reddit, etc.) and generating battles with similar tasks and randomly selected models.
Methodology: We only use the relative ordering of models within each battle to compute rankings — not the raw scores. Same principle as Chatbot Arena: absolute scores from judges are noisy and poorly calibrated (a "7/10" in one battle might be "6/10" in another), but "A ranked above B" is much more consistent and reliable. Rankings use a grouped Plackett-Luce model (not simple win-rate or Bradley-Terry) with 1,000-resample bootstrap confidence intervals. Each model entry shows score ± CI and a rank spread (plausible rank range). Models with insufficient data are marked "provisional." Full methodology with equations: https://app.uniclaw.ai/arena/leaderboard/methodology?via=hn
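To make the ranking principle concrete, here is a minimal sketch of a standard Plackett-Luce fit by gradient ascent in plain Python. This is only an illustration of the core idea; the arena's actual model is grouped and adds bootstrap confidence intervals:

```python
import math

def plackett_luce_scores(rankings, items, iters=2000, lr=0.05):
    """Fit Plackett-Luce log-strengths from orderings (best to worst).

    Each ranking is modeled as a sequence of choices: at every stage,
    the winner is drawn from the remaining pool with probability
    exp(theta_winner) / sum(exp(theta_j) for j still in the pool).
    """
    theta = {it: 0.0 for it in items}
    for _ in range(iters):
        grad = {it: 0.0 for it in items}
        for ranking in rankings:
            for i in range(len(ranking) - 1):
                tail = ranking[i:]
                z = sum(math.exp(theta[j]) for j in tail)
                grad[ranking[i]] += 1.0          # this stage's winner
                for j in tail:                    # softmax penalty on the pool
                    grad[j] -= math.exp(theta[j]) / z
        for it in items:
            theta[it] += lr * grad[it]
    # Center at zero: only differences between strengths are identifiable.
    mean = sum(theta.values()) / len(theta)
    return {it: v - mean for it, v in theta.items()}
```

Note that only the relative order within each battle enters the likelihood, which is exactly why noisy absolute judge scores drop out of the ranking.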
Key features:
- Live dual leaderboard (performance + cost-effectiveness) with Plackett-Luce ranking
- Dynamic user-submitted tasks across 11 categories (no fixed test set to overfit on); we will add more categories, just let us know what you want
- Fresh VM per benchmark with one subagent per model
- User-selectable judge model
- Full conversation history, judge reasoning, and workspace artifacts preserved and shown to users
- Full transparency: you can evaluate the output yourself, not just trust the score
- Open-source judge skill: https://github.com/unifai-network/skills/tree/main/agent-ben...
Public benchmarks are free (we cover compute). The leaderboard is browsable without an account.
- Leaderboard: https://app.uniclaw.ai/arena?via=hn
- Submit a battle: https://app.uniclaw.ai/arena/new?via=hn (free account required)
- Methodology: https://app.uniclaw.ai/arena/leaderboard/methodology?via=hn
- Judge skill source: https://github.com/unifai-network/skills/tree/main/agent-ben...
We'd love feedback on the methodology and what tasks you'd want to see benchmarked.
Mpump – browser groovebox where grooves are shareable links #
No install, no account.
The idea is simple: a groove is a URL. The full pattern, tempo, and sounds live in the link. You open it, hear it, change it, send it back different. Has drums, bass, synth, Euclidean rhythms, live jam sessions.
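One plausible way to make "a groove is a URL" work is to serialize the whole state into the link fragment. This is a sketch of the general idea under that assumption; Mpump's actual encoding may differ:

```python
import base64
import json

def groove_to_url(groove, base="https://mpump.live/"):
    """Pack a groove (pattern, tempo, sounds) into the URL fragment."""
    payload = json.dumps(groove, separators=(",", ":")).encode()
    return base + "#" + base64.urlsafe_b64encode(payload).decode().rstrip("=")

def url_to_groove(url):
    """Recover the groove from a shared link."""
    frag = url.split("#", 1)[1]
    frag += "=" * (-len(frag) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(frag))
```

Keeping the state in the fragment (after `#`) also means it never hits the server, so sharing and remixing need no account or database.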
Would love feedback, especially from people who make this kind of music.
https://mpump.live
Built with Claude Code. Source on GitHub (AGPL-3.0): https://github.com/gdamdam/mpump
MCP server that generates macOS tools via Open Scripting Architecture #
----
Claude has several ways to interact with an app or service. Computer use is the broadest and slowest, so Claude tries the most precise tool first:
- If you have an MCP server for the service, Claude uses that.
- If the task is a shell command, Claude uses Bash.
- If the task is browser work and you have Claude in Chrome set up, Claude uses that.
- If none of those apply, Claude uses computer use.
----
So osa-mcp adds an MCP server with tools for every AppleScript/JXA app it can find on the host, in order to maximize the first/best case. I've verified that it works with Claude Code and Cowork, but it should work with any other MCP client as well. It also supports Remote Login via SSH [2].
This enables some pretty cool workflows and custom skills, e.g. "Read today's inbox in Mail and give me a summary. Check to see if I should schedule any additional meetings in Calendar, and if they conflict with the plans I made with anyone in Messages let them know. Then organize my notes for each meeting, and update their descriptions." Thus the capabilities of AppleScript are made available with natural language.
Inspiration is credited to [3] and [4], but neither exposes the entirety of OSA over MCP in a dynamic manner like this. Feedback is appreciated; I think it is pretty much an unofficial preview of the inevitable agentic Siri that will be released in a future macOS update.
[1] https://code.claude.com/docs/en/computer-use
[2] https://support.apple.com/guide/mac-help/allow-a-remote-comp...
AI Engineering AI-native Self-learning course repo #
Amoxide – The right aliases, at the right time #
amoxide organizes aliases in three layers, from broadest to most specific:
- Global — always active, available in every shell session
- Profiles — named groups of aliases you can activate/deactivate
- Project — local .aliases files that auto-load per directory
Each layer can override the previous one. Project aliases override profile aliases, which override global aliases.
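The override order above can be sketched as a simple layered merge, where later (more specific) layers win. This is illustrative only, not amoxide's implementation:

```python
def resolve_aliases(global_aliases, profile_aliases, project_aliases):
    """Merge the three layers; project beats profile beats global."""
    merged = {}
    for layer in (global_aliases, profile_aliases, project_aliases):
        merged.update(layer)  # later layers overwrite earlier entries
    return merged
```

So a project-local `.aliases` entry for `gs` would shadow both a profile and a global definition of the same name, while untouched aliases fall through from the broader layers.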