Show HN for September 3, 2025 (24 items)
Entropy-Guided Loop – How to make small models reason #
What it does
- Captures logprobs/top-k during generation, computes perplexity and token-level entropy.
- Triggers at most one refine when simple thresholds fire; passes a compact “uncertainty report” (uncertain tokens + top-k alts + local context) back to the model.
- In our tests on technical Q&A / math / code, a small model recovered much of “reasoning” quality at ~⅓ the cost while refining ~⅓ of outputs.
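The two signals above fall out of the per-token logprobs most APIs already return. A minimal sketch (function names and the renormalization over the top-k alternatives are my own choices, not necessarily the repo's):

```python
import math

def token_entropy(topk_logprobs):
    """Shannon entropy over the top-k alternatives at one position,
    renormalized so the truncated distribution sums to 1."""
    probs = [math.exp(lp) for lp in topk_logprobs]
    z = sum(probs)
    probs = [p / z for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def perplexity(chosen_logprobs):
    """exp of the mean negative log-likelihood of the sampled tokens."""
    return math.exp(-sum(chosen_logprobs) / len(chosen_logprobs))
```

A flat two-way split gives entropy ln 2, and a sequence sampled at uniform probability 1/2 per token gives perplexity 2, which is a quick sanity check for both functions.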
Why I built it
I kept seeing “reasoning” models behave like expensive black boxes. Meanwhile, standard inference already computes useful signals both before softmax normalization and after it (logprobs), which we usually throw away. This loop tries the simplest thing you could think of: use those signals to decide when (and where) to think again.
How to try it
- GitHub (notebook + minimal code): https://github.com/monostate/weave-logprobs-reasoning-loop
- Paper (short, engineer-written): http://arxiv.org/abs/2509.00079
- Blog (more context): https://monostate.ai/blog/entropy-refinement-blog
Requirements: Python and an API that exposes logprobs (tested with OpenAI's non-reasoning 4.1). Set OPENAI_API_KEY, plus WEAVE for observability. Run the notebook; it prints metrics and shows which tokens triggered refinement.
Stack / notes
- Python, simple loop (no retraining).
- Uses Responses API logprobs/top-k; metrics: perplexity, max token entropy, low-confidence counts.
- Weave for lightweight logging/observability (optional).
What I learned / things that mattered
- Passing alternatives (not just “this looks uncertain”) prevents over-correction.
- A simple OR rule (ppl / max-entropy / low-confidence count) catches complementary failure modes.
- Numbers drift across vendors; keeping the method vendor-agnostic is better than chasing fragile pairings.
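The OR rule itself fits in a few lines. A sketch (the threshold values here are placeholders I picked for illustration, not the ones from the paper):

```python
def should_refine(ppl, max_entropy, low_conf_count,
                  ppl_thresh=1.8, ent_thresh=1.2, count_thresh=3):
    """Trigger the single refine pass if ANY signal fires.
    The three signals catch complementary failure modes:
    high perplexity (globally unsure), one high-entropy token
    (locally unsure), or many low-confidence tokens (diffusely unsure)."""
    return (ppl > ppl_thresh
            or max_entropy > ent_thresh
            or low_conf_count >= count_thresh)
```

Because the rule is a disjunction, an output that looks fine globally can still be refined for one sharply uncertain token, and vice versa.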
Limitations / caveats
- Needs APIs that expose logprobs/top-k.
- Results are indicative, not a leaderboard; focus is on within-model gains (single-pass vs +loop).
- Thresholds might need light tuning per domain.
- One pass only; not a chain-of-thought replacement.
Asks / feedback
Run it on your own models and ideas (e.g., 4o-mini, v3, Llama variants with logprobs) and, if you'd like, share your logs in a PR against our README on GitHub. PRs welcome; I'll credit and link.
Overall, let me know whether making small models reason like this is useful to you!
Chibi, AI that tells you why users churn #
I’ve been a PM for 3 years, and one hard part has always been understanding why users churn, drop off, and behave the way they do!
Session replays had the answer, but watching hours of them was painful.
I chatted with a bunch of founder friends and PMs and they too had similar troubles.
So I built Chibi, an AI that watches replays and tells you what’s broken, confusing, or causing drop-off.
Long term: I'm wondering whether Chibi could evolve into an AI product-manager co-worker that can detect and prioritize issues, think through features, and even run A/B tests.
Tech stack: Elixir + Phoenix, rrweb, and Gemini
Would love to know what you think :)
Happy to answer any questions too
I built an AI that uses a metacognitive loop to solve invention problems #
TwoTickets – meet through events, not swipes #
On TwoTickets, you Twoot an event in order to connect with others around that shared plan. The flow is simple: Twoot → Match → Chat → Decide → Go. You don’t see profiles or matches until you Twoot events, so plans come first and profiles second.
The aim is to make meeting new people more natural: the event itself is the ice-breaker, not a random line in a bio. We’re in soft launch now and would love feedback from HN — does this “event-first” approach resonate with you, and where do you see the pitfalls?
Paul Graham once said that all dating apps are really just matching apps. I wonder how close this comes to answering that assertion, though TwoTickets is broader than dating.
*Links:* - Website: https://www.twotickets.us - iOS App: https://apps.apple.com/us/app/twotickets-match-eventfully/id...
Tail Lens – A dev tool for visually editing Tailwind CSS #
Try it live here - https://taillens.io
Text2SQL with a Graph Semantic Layer #
Instead of feeding the model a list of tables and columns, we feed it a graph that understands what a customer is, how it connects to orders, which products belong to a campaign, and what "active user" actually means in your business context.

We used FalkorDB for the graph part because it handles relationship mapping better than cramming table schemas into prompts. Graphiti tracks the conversation so follow-ups actually work.

Final notes: Your data stays in your databases. We read from existing schemas, never migrate data. Standard SQL outputs you can run anywhere. We've built an MCP, and you can generate an API key to take it for a spin. Please tell us how it’s working out for you!
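This is not FalkorDB's actual data model, but a toy illustration of what a semantic layer adds over a bare schema dump: entities, typed relationships, and business definitions, serialized into prompt context instead of raw DDL (all names here are invented for the example):

```python
# Toy semantic graph: entities, how they relate, and what business terms mean.
graph = {
    "nodes": {
        "Customer": {"table": "customers", "key": "customer_id"},
        "Order": {"table": "orders", "key": "order_id"},
    },
    "edges": [("Customer", "places", "Order")],
    "definitions": {
        "active user": "a Customer with at least one Order in the last 30 days",
    },
}

def to_prompt_context(g):
    """Flatten the graph into lines a text-to-SQL prompt can consume."""
    lines = [f"{a} --{rel}--> {b}" for a, rel, b in g["edges"]]
    lines += [f'"{term}" means {d}' for term, d in g["definitions"].items()]
    return "\n".join(lines)
```

With context like this, a question about "active users" can resolve to the 30-day definition instead of forcing the model to guess from column names.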
Multi-Agent-Coder Is #12 on Stanford's TBench. Beats Claude Code #
The architecture is straightforward: an orchestrator agent deploys explorer and coder subagents to complete complex terminal-based tasks, with an intelligent context-sharing mechanism along the way that makes it all work.
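As a rough mental model of that pattern (the real prompts and code are in the repo; every name below is hypothetical), the orchestrator threads one shared context through the subagents so later agents see earlier results:

```python
def explore(task, context):
    # Hypothetical explorer subagent: gathers facts about the environment.
    context["findings"] = f"files relevant to: {task}"

def code(task, context):
    # Hypothetical coder subagent: acts on the shared findings.
    return f"patch for '{task}' using {context['findings']}"

def orchestrate(task):
    """Orchestrator: routes the task through subagents, passing the
    same context dict so exploration results reach the coder."""
    context = {}
    explore(task, context)
    return code(task, context)
```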
The repo has a lot of technical details, and all the code and prompts for you to play around with if you'd like!
I had a lot of fun making this, I hope you have fun reading the README, using it yourself, or even extending it!
As always, a huge thanks to the great team behind Terminal Bench. It is a great benchmark.
Thanks for reading, Dan
A hacky app for location sharing without surveillance #
Best JSON Comparison Tool #
For lack of a clean, accurate, feature-rich JSON comparison tool out there, I made the jsontoolbox compare tool.
This is the only tool that:
- does real-time comparison
- shows the JSON path dynamically as you navigate the JSON
- allows type/paste, import from file, or drag-drop of one or two files into the editor to compare
- lets you choose whether to sync-scroll
- sorts both JSONs (only) if you'd like to see a sorted diff
- lets you swap both JSONs
- lets you download each JSON separately with a custom file name
- works completely client-side
- has no ads
- has dark/light mode
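Not jsontoolbox's implementation, but a toy sketch of the core idea behind a path-aware comparison: recurse through both documents and report a JSON path for every differing leaf:

```python
def json_diff(a, b, path="$"):
    """Yield (json_path, left_value, right_value) for each differing leaf."""
    if type(a) is not type(b):
        yield (path, a, b)          # type mismatch (incl. missing keys -> None)
    elif isinstance(a, dict):
        for k in sorted(set(a) | set(b)):
            yield from json_diff(a.get(k), b.get(k), f"{path}.{k}")
    elif isinstance(a, list):
        for i in range(max(len(a), len(b))):
            av = a[i] if i < len(a) else None
            bv = b[i] if i < len(b) else None
            yield from json_diff(av, bv, f"{path}[{i}]")
    elif a != b:
        yield (path, a, b)          # differing scalar
```

For example, comparing `{"a": 1, "b": [1, 2]}` against `{"a": 1, "b": [1, 3]}` yields a single difference at path `$.b[1]`.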
It is also one of the best JSON Formatter/Minifier out there :)
I know there is a sea of such tools out there, but as a developer none were good enough for my use case. Please try it out and share feedback.
Trending Rust NTP inspection CLI #
Just came across a crate on crates.io that recently hit v1.0.0. It’s called rkik - basically a "dig for NTP". I hadn’t seen a tool like this in Rust before.
Looks pretty handy: it can query and compare NTP servers, output JSON for monitoring, and even run continuous checks. Seems to be getting some traction in the Rust community - might be worth a look if you’re into system administration, networking, or DevOps.
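I haven't read rkik's internals, but the wire protocol it speaks is simple enough to sketch: an SNTP client request is a 48-byte packet whose first byte packs leap indicator, version, and mode, and the server's transmit timestamp sits at byte offset 40 as seconds since 1900 (so you subtract the 1900-to-1970 offset to get Unix time). A minimal sketch of both ends of that exchange:

```python
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def build_sntp_request():
    # First byte: LI=0, VN=4, Mode=3 (client); the remaining 47 bytes are zero.
    return bytes([0b00_100_011]) + bytes(47)

def parse_transmit_time(packet):
    # Transmit timestamp: 32-bit seconds + 32-bit fraction, big-endian, at offset 40.
    secs, frac = struct.unpack("!II", packet[40:48])
    return secs - NTP_EPOCH_OFFSET + frac / 2**32
```

Comparing several servers (as rkik does) then amounts to sending this packet to each over UDP port 123 and diffing the parsed timestamps against the local clock.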
Listgitfiles.sh – Fetch Raw GitHub File URLs with One Command #
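No description was posted, but judging by the name, the script lists a repo's files and rewrites each path into a raw URL. The rewrite step is just string assembly against the well-known raw.githubusercontent.com layout; a guess at what the script produces, not its actual code:

```python
def raw_urls(owner, repo, branch, paths):
    """Map repo-relative file paths to raw.githubusercontent.com URLs."""
    base = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
    return [f"{base}/{p.lstrip('/')}" for p in paths]
```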
Turn any PDF research paper into a video explanation using AI #
I built a tool that uses AI to generate a video explanation from a PDF link to a research paper.
Just paste in a PDF link, and it creates a narrated video that deeply explains the paper, including detailed explanations of plots and figures. The goal is to make digesting research papers require a bit less effort.
Link: https://researchpapervideos.com/
Would love to hear what you think!
dvcdbg 0.3.0: 1.3KB Initialization Sequence Explorer (Arduino in Rust) #
So I wrote my own driver, and along the way implemented a *1.3KB algorithm to search and verify the initialization sequence* on an Arduino Uno.
Key features:
- Iterative topological sort to explore init sequences
- Optimized for AVR constraints using bit flags and static arrays
- Utilities: I2C scanner, hex/binary dump, serial adapter
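Not dvcdbg's actual code (which is Rust on AVR), but the bit-flag idea translates to a few lines in any language. A Python sketch of an iterative (non-recursive) topological pass where each node's init dependencies are encoded as a bitmask, which is what keeps the state in a couple of machine words instead of heap structures:

```python
def topo_order(n, deps):
    """Iterative topological sort over n nodes.
    deps[i] is a bitmask of the nodes that must initialize before node i."""
    done = 0                    # bitmask of already-initialized nodes
    order = []
    while len(order) < n:
        progressed = False
        for i in range(n):
            # Node i is ready if it isn't done and all its deps are done.
            if not (done >> i) & 1 and deps[i] & ~done == 0:
                done |= 1 << i
                order.append(i)
                progressed = True
        if not progressed:
            raise ValueError("cycle in init dependencies")
    return order
```

A chain like node 2 needs node 1 needs node 0 comes out as `[0, 1, 2]`, and a dependency cycle is detected when a full scan makes no progress.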
This actually initializes a Grove OLED on Arduino Uno using just 1.3KB SRAM.
Code & docs: https://github.com/p14c31355/dvcdbg Crate: https://crates.io/crates/dvcdbg
Fst – Lightweight C utility for detailed directory statistics (LGPL 3.0) #
I’ve just released fst, a minimalistic C utility that provides comprehensive statistics about directories. It’s designed to be fast, with no dependencies, and is fully statically compilable.
Features:
- Counts of files, directories, empty files/folders
- Classifies binary, text, and script files
- Displays min, max, and average file sizes
- Identifies recent and oldest files
- Lists executable files, symbolic and hard links
- Supports human-readable sizes and recursive stats
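For readers who want the shape of the idea without reading the C, here is a rough Python sketch of the counting core (fst itself does much more, e.g. binary/text classification and link detection):

```python
import os

def dir_stats(root):
    """Walk a directory tree and collect basic counts and size stats."""
    files = dirs = empty = 0
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            files += 1
            size = os.path.getsize(os.path.join(dirpath, name))
            sizes.append(size)
            if size == 0:
                empty += 1
    return {
        "files": files, "dirs": dirs, "empty_files": empty,
        "min": min(sizes, default=0), "max": max(sizes, default=0),
        "avg": sum(sizes) / len(sizes) if sizes else 0,
    }
```

The C version can do this in a single `nftw`/`readdir` pass with `stat` per entry, which is where the "no dependencies, statically compilable" properties come from.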
Key Points:
- Written in pure C, minimal dependencies, lightning fast
- Works on both ARM and x86 architectures, fully statically compilable
- Can run with no arguments (defaults to the current directory)
- Licensed under LGPL 3.0
- GitHub repository: https://github.com/Ferki-git-creator/fst-ferki
- Submitted to GNU for review
I'm seeking feedback, ideas, and contributions from the community. The goal is to enhance usability, add more metrics, and possibly extend internationalization support in the future.
Any thoughts or suggestions would be highly appreciated!
Deep Researcher Web App, Node.js-Based (Open Source; MIT) #
The app requires your own API key to run OpenAI's LLMs and the web search tool. The LLM vendor can easily be swapped for another, but substituting the web search tool will require some work.
- GitHub repo: https://github.com/Antibody/deep-researcher-node