Daily Show HN

Show HN for March 31, 2026

55 posts
203

How This Graybeard Built the Fastest and Freest Postgres BM25 Search

github.com
56 comments · 4:29 PM · View on HN
Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in timeseries data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main memory limitations. We just needed a scalable ranked keyword search solution too.

The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is gated behind the AGPL; and developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.

Or would we? I'd been experimenting long enough with AI-boosted development at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.

I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.

It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and, hopefully soon, a hyperscaler near you:

https://github.com/timescale/pg_textsearch

In the blog post accompanying the release, I give an overview of the architecture and present benchmark results on MS-MARCO. To my surprise, we were able not only to meet Parade/Tantivy's query performance but to exceed it substantially, measuring a 4.7x advantage in query throughput at scale:

https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-...
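
For readers who haven't worked with it, the BM25 ranking function at the heart of this project can be sketched in a few lines of Python. This is an illustration of the formula only (Lucene-style IDF, tokenized documents as word lists), not pg_textsearch's implementation:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                     # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # term-frequency saturation (k1) and length normalization (b)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

The hard part an extension like this solves isn't the formula; it's maintaining the inverted index and corpus statistics incrementally under Postgres's transactional semantics.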

It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.

The benchmark scripts and methodology are available in the github repo. Happy to answer any questions in the thread.

Thanks,

TJ ([email protected])

60

Claude Code rewritten as a bash script

github.com
19 comments · 11:24 PM · View on HN
Have you ever wondered if Claude Code could be rewritten as a bash script? Me neither, yet here we are. Just for kicks I decided to try stripping down the source, removing all the packages.
21

PhAIL – Real-robot benchmark for AI models

phail.ai
8 comments · 4:25 PM · View on HN
I built this because I couldn't find honest numbers on how well VLA models actually work on commercial tasks. I come from search ranking at Google where you measure everything, and in robotics nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking – one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.

Best model: 64 UPH. Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public – every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions.

Happy to answer questions about methodology, the models, or what we observed.

10

Wageslave – I quit my soul-sucking job to make a game about it

cauldron.itch.io
6 comments · 4:21 PM · View on HN
Game development used to be something I dabbled in on and off, and despite having chosen to major in Computer Science to pursue a career in the field, I never managed to establish any consistency with it.

So I relegated this aspiration to the realm of pipe dreams, something whose idea I preferred to the actual process of doing it, and I continued working as a software engineer.

This only changed after a disruption to my job: the fintech I worked for got acquired by a big traditional bank and we were absorbed into their ranks.

Long story short: it wasn't a culture fit.

After six months, I decided to quit; earning a salary was the only thing keeping me there.

With all this newfound free time, I decided to give game dev another chance. I would never get back those six months, but I could use them as fuel for creative inspiration.

That's how the idea of Wageslave came to be: I wanted to capture the absurdity of the 9-to-5 in an interactive format, as well as include winks and nods to developer culture.

I don't know if I was successful with this goal, but it's been lots of fun creating it as well as being a cathartic experience for me. I am actually enjoying the process and not just the results. So much so that I'm aiming to complete at least two more projects before I reevaluate whether this is viable.

I initially planned on releasing the game for free, but in the spirit of taking game dev seriously, I will be selling it for a small amount.

Feel free to try the demo; I'd be happy to hear any feedback!

8

DeepTable – an API that converts messy Excel files into structured data

docs.deeptable.com
0 comments · 3:27 PM · View on HN
We tried to build an Excel error checker. To achieve that, we needed to actually understand the semantic structure of a spreadsheet first. So we built that, and it turned out to be the harder, more general problem.

The core issue: most real-world spreadsheets aren't relational tables. Merged cells, multi-level headers, multiple tables per sheet, totals mixed in with data. You can't just dump them to CSV and call it done. LLMs handle the easy cases but fall apart on complex workbooks at scale.

Our approach uses an agent-guided compilation pipeline that produces SQL-ready relational tables with full cell-level provenance. This demo visualizes what we do: https://storage.googleapis.com/deeptable-public/deeptable_an...
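
As an illustration of what cell-level provenance can mean in practice, here is a minimal sketch; the `Cell` type and `provenance` helper are hypothetical, not DeepTable's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    sheet: str     # worksheet the value came from
    ref: str       # original cell reference, e.g. "B7"
    value: object  # the extracted value itself

# A "relational row" whose every value remembers its source cell,
# so any downstream SQL result can be traced back to the workbook.
row = {
    "region": Cell("Q3 Sales", "A4", "EMEA"),
    "revenue": Cell("Q3 Sales", "D4", 1_250_000),
}

def provenance(row):
    """Map each output column back to its source cell."""
    return {col: f"{c.sheet}!{c.ref}" for col, c in row.items()}
```

The point of carrying this through the pipeline is that when a value looks wrong in the output table, you can jump straight to the merged cell or subtotal row it came from.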

We have a handful of early customers but honestly don't know yet whether this is a real market or a niche problem. We're posting this to hear from people who've dealt with arbitrary spreadsheet ingestion, whether you solved it, gave up, or are still living with the pain.

If you want to try it on your own files, email me (see my profile for my email) and I'll give you API access.

6

Multi-agent autoresearch for ANE inference beats Apple's CoreML by 6×

ensue-network.ai
0 comments · 7:31 PM · View on HN
We ran an experiment over the weekend to explore whether multiple autonomous agents could collaboratively optimize inference on Apple’s Neural Engine (ANE).

Each agent ran locally on a different Mac (M1–M4), repeatedly modifying how a DistilBERT model is executed on the ANE, benchmarking latency, and sharing results and insights with other agents in real time.

Instead of exploring independently, agents could:

- see what others had tried

- reuse working strategies

- avoid known failure modes

Across all tested chips, the agents ended up outperforming Apple’s CoreML baseline, with up to 6.31× lower median inference latency on the same hardware.

An interesting pattern we observed: an agent stuck at ~2.1ms latency on M4 was able to break through after incorporating strategies discovered by agents on different chips (M2, M4 Max), eventually reaching ~1.5ms and surpassing CoreML.
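
Purely as an illustration, the sharing behavior described above (reusing wins, avoiding known failure modes) could be sketched as a small shared ledger; the real system presumably coordinates agents over the network rather than in one process:

```python
import threading

class StrategyLedger:
    """Shared record of (strategy, latency) results across agents."""
    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}    # strategy name -> best latency seen (ms)
        self._failed = set()  # strategies known not to work

    def report(self, strategy, latency_ms):
        with self._lock:
            best = self._results.get(strategy)
            if best is None or latency_ms < best:
                self._results[strategy] = latency_ms

    def report_failure(self, strategy):
        with self._lock:
            self._failed.add(strategy)

    def best(self):
        """Strategy with the lowest recorded latency, or None."""
        with self._lock:
            if not self._results:
                return None
            return min(self._results, key=self._results.get)

    def known_bad(self, strategy):
        with self._lock:
            return strategy in self._failed
```

An agent stuck on a plateau would query `best()` before its next round of mutations instead of exploring blind, which is exactly the M4 breakthrough pattern described above.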

Full write-up: https://x.com/christinetyip/status/2039040161439224157

Detailed results: https://ensue-network.ai/lab/ane?view=strategies and https://ensue-network.ai/lab/ane

Curious what other optimization problems this kind of setup could be applied to, especially in systems, compilers, or ML infra. Would be interested in exploring similar experiments.

5

WebRTC video calls, no account needed

just-call.app
4 comments · 11:52 AM · View on HN
I built this to call my mom. FaceTime wasn't working reliably across our connection, so I threw together a WebRTC app. The quality surprised me: better than I expected, and it worked where FaceTime didn't.

just-call.app — no sign-up, no install, just a link.

Happy to answer questions or take feedback.

5

Prawduct, a product development framework for Claude Code

github.com
6 comments · 3:55 PM · View on HN
Claude Code is amazing at writing code, but it will happily build from under-specified requirements, implement the same thing different ways, and fail to write great tests unless you specifically ask it to.

Prawduct is a set of prompts, skills, hooks, and artifact templates that help focus Claude Code on product development rather than code development.

You can start from something as simple as "make a website with a scientific calculator" or as complex as "create an MMO with clients for iOS, Android, and web". You can specify as many or as few architecture standards and implementation details as you want.

Specialized skills like /critic and /janitor are run automatically and apply context-less reviews to catch drift, hacks, and violations of best practices.

I've been using Prawduct myself for a couple of months, developing my own projects and also iterating on Prawduct itself (which is of course self-hosted on its own framework).

I'd love to hear feedback.

4

Vibe Check – UX benchmark for vibe designs

vibecheck.appvelocity.io
1 comment · 2:57 PM · View on HN
Vibe Check shares benchmarking insights on any vibe-coded URL: Make, Lovable, Claude Code, v0, etc. Provide a link and we'll share your 'time-to-value'. Provide a URL and set a challenge ("Find pricing and subscribe"); an AI then navigates in real time to report UX insights: interactions, time-on-task, drop-off, etc. What's cool about this is that it gives you quantitative data to act on in improving the UX of product journeys.

Example of sim playback: https://app.appvelocity.io/vibe/simulation/8321db67-883b-445...

Example report: https://appvelocity-io.pmailroute.net/x/d?c=50527836&l=694a7...

If anyone shares some early Vibes, I'll run Vibe Check on your behalf and share some insights.

3

Gravity doesn't track mass, it tracks waveform complexity

0 comments · 12:17 AM · View on HN
A bandwidth-limited neural network (that I've named Erebus) trained on public LIGO gravitational wave data finds that persistence — how much temporal accumulation helps predict structure — correlates with spectral entropy of the waveform (r = +0.69, p = 2×10⁻⁵) and not with mass (ρ = −0.05, p = 0.80). Two black hole mergers at identical mass differ by 100× in persistence. The difference is entirely explained by the spectral complexity of the signal, not the mass of the source. The pattern reproduces across four architecturally distinct observers (GRU, LSTM, Transformer, ViT-Small).

A companion paper establishes the framework: fix an observation pipeline and measure persistence across 13 real-data domains from independent instruments (LIGO, EHT, CMB, sunspots, quasars, supernovae, neutrinos). A single temporal axis organizes all domains, with electromagnetism at one pole and gravity at the other. A periodic signal through the same pipeline produces zero positive persistence across 180+ runs — accumulation destroys the wave when observation boundaries don't divide the period.

First paper: https://zenodo.org/records/19323952

Second paper: https://zenodo.org/records/19341889

Companion piece to the first paper: Light and gravity are opposite poles of observation (less technical) https://www.wvrk.org/works/the-structure

Erebus (the underlying system): https://erebus.org

2

Browserbeam – a browser API built for AI agents

browserbeam.com
0 comments · 10:01 PM · View on HN
I often use LLMs to automate different workflows, some of which include browsing the web and gathering data. At some point I started noticing a few things that bothered me: the browser interactions were clunky, as if the agent was struggling to "see" and understand the page, and as a result, many tokens were wasted. Same for knowing when the page is actually ready or not.

I started digging deeper and at some point I just bluntly asked in the Cursor chat the following question: "I ask you, as an LLM that uses these headless browsers, what do you wish people would build to make your work easier?"

And it worked: I expanded the "Thinking" section and saw "The user is asking me a really interesting meta-question ...", after which it listed the top 10 most painful issues in the agent<->browser interaction.

So I started building a browser API that returns what LLMs actually need, not what browsers return.

Fast forward a few weeks and here we are. A REST API built specifically to help LLMs interact with real browsers.

Instead of reading raw HTML, you get markdown, page map, short refs (e1, e2) for clicking instead of CSS selectors, a stable flag when the page is ready, diffs after each step, the list of all interactive elements (links, buttons, inputs), automatic blocker dismissal and a small extract step that returns structured JSON from a schema you describe.

Official SDKs for Python, TypeScript, Ruby. MCP server for Cursor and Claude Desktop.
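
To make the shape of this concrete, here is a mock of the kind of response described above and how an agent might act on it; all field names here are my guesses, not Browserbeam's actual schema:

```python
# A mocked page snapshot of the sort an agent-oriented browser API
# might return (hypothetical field names).
page = {
    "stable": True,  # page has finished loading and settling
    "markdown": "# Pricing\n\nChoose a plan below.",
    "elements": [
        {"ref": "e1", "role": "link", "text": "Home"},
        {"ref": "e2", "role": "button", "text": "Subscribe"},
    ],
}

def click_action(page, text):
    """Build a click request using a short ref instead of a CSS selector."""
    for el in page["elements"]:
        if el["text"] == text:
            return {"action": "click", "ref": el["ref"]}
    raise ValueError(f"no element with text {text!r}")
```

The win over raw HTML is token economy and stability: the model matches on visible text and a two-character ref, and the `stable` flag replaces selector-polling heuristics.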

Would appreciate any feedback, especially on the API design.

2

An extension that opens any Goodreads book in Anna's or Zlib in a click

chromewebstore.google.com
2 comments · 5:50 PM · View on HN
"Open Books in Zlib Or Anna's Archive with a button on Goodreads in one click."

I built a free, open source browser extension that adds buttons directly onto Goodreads book pages. Instead of copying titles and searching manually, you just click the badge for whichever source you want.

You can also toggle sources on and off: if you only want the Z-Library and Anna's Archive badges and not Gutenberg, you can do exactly that.

Supported sources:

Anna's Archive

Z-Library

Project Gutenberg

AudioBookBay (new!)

Supported sites:

• Goodreads

• StoryGraph

• Hardcover

• Babelio

• Novelupdates

It is available on:

• Chrome

• Firefox (including Firefox mobile)

• Edge

Anime.js is used for animation.

No data is collected; you can verify that yourself via the source code on GitHub or the privacy page.

This has been updated to v1.0.8!

It is free and open source.

If you like this extension and want to support me, please star and rate it. (You can also sponsor me on GitHub!)

Thanks.

2

OpenClaw Arena – Benchmark models on real tasks, rank by perf and cost

app.uniclaw.ai
0 comments · 5:52 PM · View on HN
We built an arena for comparing AI models on real agentic tasks, not chat or static benchmarks. Models run as actual OpenClaw subagents in fresh VMs with full tool access, and results feed into two separate leaderboards: performance and cost-effectiveness.

The problem: Chatbot Arena tests conversation quality. But most people using AI agents need them to do more: browse the web, manage files, write and run code, create full applications, automate multi-step workflows. There's no benchmark that (1) tests general-purpose agentic tasks, (2) uses user-submitted tasks instead of fixed test sets, and (3) separately ranks models on both quality and cost-effectiveness.

What we built: OpenClaw Arena lets you submit any task and pit 2-5 models against each other. A judge OpenClaw agent (currently using one of the top models: Claude Opus 4.6, GPT-5.4, or Gemini 3.1 Pro) runs on a fresh VM, spawns one subagent per model, and each model solves the task independently with full access to terminal, browser, file system, and code execution.

Results feed into two live leaderboards:

- Performance — which model produces the best results

- Cost-effectiveness — which model delivers the best quality per dollar

What we've found (after 300+ battles, 15 models):

The two rankings are completely different. Performance top 3: Claude Opus 4.6, GPT-5.4, Claude Sonnet 4.6. Cost-effectiveness top 3: Step 3.5 Flash, Grok 4.1 Fast, MiniMax M2.7.

Claude Opus 4.6 ranks #1 on performance but #14 on cost-effectiveness.

Step 3.5 Flash is #1 on cost-effectiveness, #5 on performance. (I didn't expect that TBH)

Several models (GLM-5 Turbo, Xiaomi MiMo v2 Pro, MiniMax M2.7) outrank Gemini 3.1 Pro on performance. Gemini 3.1 Pro is actually so bad at using skills that we had to optimize the judge message just for it; otherwise it sometimes just reads the skill and decides to do nothing...

Note: we bootstrapped the first 300 battles by crawling what people are doing with OpenClaw (on X, Reddit, etc.) and generating battles with similar tasks and randomly selected models.

Methodology: We only use the relative ordering of models within each battle to compute rankings — not the raw scores. Same principle as Chatbot Arena: absolute scores from judges are noisy and poorly calibrated (a "7/10" in one battle might be "6/10" in another), but "A ranked above B" is much more consistent and reliable. Rankings use a grouped Plackett-Luce model (not simple win-rate or Bradley-Terry) with 1,000-resample bootstrap confidence intervals. Each model entry shows score ± CI and a rank spread (plausible rank range). Models with insufficient data are marked "provisional." Full methodology with equations: https://app.uniclaw.ai/arena/leaderboard/methodology?via=hn
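
As a rough illustration of the ranking step, here is a plain Plackett-Luce fit via Hunter's classic MM update. The arena's actual grouped implementation, plus the 1,000-resample bootstrap for confidence intervals, goes beyond this sketch:

```python
def fit_plackett_luce(rankings, n_items, iters=500):
    """Fit Plackett-Luce strengths from observed orderings (best-first)
    using the MM update of Hunter (2004). Returns normalized weights."""
    w = [1.0 / n_items] * n_items
    for _ in range(iters):
        wins = [0] * n_items      # choice events won by each item
        denom = [0.0] * n_items   # sum over events of 1 / (choice-set weight)
        for r in rankings:
            # A ranking of k items decomposes into k-1 sequential choices:
            # the item at position p is chosen from everything not yet placed.
            for p in range(len(r) - 1):
                remaining = r[p:]
                s = sum(w[i] for i in remaining)
                wins[r[p]] += 1
                for i in remaining:
                    denom[i] += 1.0 / s
        w = [wins[i] / denom[i] if denom[i] else w[i] for i in range(n_items)]
        total = sum(w)
        w = [x / total for x in w]  # fix the scale; PL is scale-invariant
    return w
```

This is why only relative orderings matter: the likelihood is built entirely from "A placed above B in this battle" events, never from raw judge scores.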

Key features:

- Live dual leaderboard (performance + cost-effectiveness) with Plackett-Luce ranking

- Dynamic user-submitted tasks across 11 categories (no fixed test set to overfit on); we'll add more, so let me know what you'd like to see added

- Fresh VM per benchmark with one subagent per model

- User-selectable judge model

- Full conversation history, judge reasoning, and workspace artifacts preserved and shown to users

- Full transparency: you can evaluate the output yourself, not just trust the score

- Open-source judge skill: https://github.com/unifai-network/skills/tree/main/agent-ben...

Public benchmarks are free (we cover compute). The leaderboard is browsable without an account.

- Leaderboard: https://app.uniclaw.ai/arena?via=hn

- Submit a battle: https://app.uniclaw.ai/arena/new?via=hn (free account required)

- Methodology: https://app.uniclaw.ai/arena/leaderboard/methodology?via=hn

- Judge skill source: https://github.com/unifai-network/skills/tree/main/agent-ben...

We'd love feedback on the methodology and what tasks you'd want to see benchmarked.

2

Mpump – browser groovebox where grooves are shareable links

0 comments · 1:41 PM · View on HN
I built a groovebox for making loops: techno, house, acid, anything electronic that works in loops.

No install, no account.

The idea is simple: a groove is a URL. The full pattern, tempo, and sounds live in the link. You open it, hear it, change it, send it back different. Has drums, bass, synth, Euclidean rhythms, live jam sessions.
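
I don't know mpump's actual encoding, but state-in-a-link schemes like this are commonly implemented by serializing the whole state into the URL fragment, roughly like so:

```python
import base64
import json

def groove_to_url(groove, base="https://mpump.live"):
    """Pack the full groove state (pattern, tempo, sounds) into a URL fragment."""
    payload = json.dumps(groove, separators=(",", ":")).encode()
    frag = base64.urlsafe_b64encode(payload).decode().rstrip("=")
    return f"{base}/#{frag}"

def url_to_groove(url):
    """Recover the groove state from a shared link."""
    frag = url.split("#", 1)[1]
    pad = "=" * (-len(frag) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(frag + pad))
```

Using the fragment (after `#`) keeps the state client-side: it never reaches the server, so sharing a groove needs no account and no database row.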

Would love feedback, especially from people who make this kind of music.

https://mpump.live

Built with Claude Code. Source on GitHub (AGPL-3.0): https://github.com/gdamdam/mpump

1

MCP server that generates macOS tools via Open Scripting Architecture

github.com
0 comments · 11:25 PM · View on HN
Cf. the Claude computer use beta. Anthropic's own docs state that MCP servers are preferable to computer use [1]:

----

Claude has several ways to interact with an app or service. Computer use is the broadest and slowest, so Claude tries the most precise tool first:

- If you have an MCP server for the service, Claude uses that.

- If the task is a shell command, Claude uses Bash.

- If the task is browser work and you have Claude in Chrome set up, Claude uses that.

- If none of those apply, Claude uses computer use.

----

So osa-mcp adds an MCP server with tools for every AppleScript/JXA-scriptable app it can find on the host, in order to maximize the first/best case. I've verified that it works with Claude Code and Cowork, but it should work with any other MCP client as well. It also supports Remote Login via SSH [2].

This enables some pretty cool workflows and custom skills, e.g. "Read today's inbox in Mail and give me a summary. Check to see if I should schedule any additional meetings in Calendar, and if they conflict with the plans I made with anyone in Messages let them know. Then organize my notes for each meeting, and update their descriptions." Thus the capabilities of AppleScript are made available with natural language.
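
Under the hood, running an OSA script from a tool boils down to shelling out to macOS's `osascript` CLI; a sketch (the helper names are mine, not osa-mcp's):

```python
import subprocess

def osascript_argv(script, language="AppleScript"):
    """Build the argv for macOS's osascript CLI.
    JXA scripts need `-l JavaScript`; AppleScript is the default."""
    argv = ["osascript"]
    if language == "JavaScript":
        argv += ["-l", "JavaScript"]
    return argv + ["-e", script]

def run_osa(script, language="AppleScript"):
    """Execute the script (macOS only, where osascript ships with the OS)."""
    out = subprocess.run(osascript_argv(script, language),
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

An MCP tool then only needs to template a script per discoverable app, e.g. `run_osa('tell application "Mail" to get unread count of inbox of account 1')`, and return the stdout to the model.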

Inspiration is credited to [3] and [4], but neither exposes the entirety of OSA over MCP in this dynamic manner. Feedback is appreciated; I think it is pretty much an unofficial preview of the inevitable agentic Siri that will ship in a future macOS update.

[1] https://code.claude.com/docs/en/computer-use

[2] https://support.apple.com/guide/mac-help/allow-a-remote-comp...

[3] https://github.com/joshrutkowski/applescript-mcp

[4] https://github.com/supermemoryai/apple-mcp

1

Amoxide – The right aliases, at the right time

amoxide.rs
0 comments · 11:30 PM · View on HN
Like direnv, but for aliases. Define aliases per project, per toolchain, or globally — and load the right ones automatically.

amoxide organizes aliases in three layers, from broadest to most specific:

- Global — always active, available in every shell session

- Profiles — named groups of aliases you can activate/deactivate

- Project — local .aliases files that auto-load per directory

Each layer can override the previous one. Project aliases override profile aliases, which override global aliases.
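
The override semantics described above amount to a layered merge, sketched here in Python (illustrative only, not amoxide's implementation):

```python
def resolve_aliases(global_aliases, profiles, project):
    """Merge alias layers from broadest to most specific.
    Later layers win per alias name: global < profiles < project."""
    merged = dict(global_aliases)
    for layer in profiles:   # zero or more active profiles, in order
        merged.update(layer)
    merged.update(project)   # the project's .aliases file wins
    return merged
```

So a project can repoint `k` at a dev cluster while leaving the profile's other kubectl aliases untouched, which is the whole point of resolving per alias rather than per layer.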

1

AI-Native NAACP

naacp.ai
0 comments · 11:28 PM · View on HN
Testing my first platform; I thought this would be a good use case. Should be fairly self-explanatory; if not, that's also explanatory.