Daily Show HN

Show HN for January 26, 2026

44 items
52

Ourguide – OS-wide task guidance system that shows you where to click

ourguide.ai
22 comments · 6:19 PM · View on HN
Hey! I'm Eshaan, and I'm building Ourguide – an on-screen task guidance system that can show you where to click, step by step, when you need help.

I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, so you never have to leave your current window. There is also Ask mode, a vision-integrated chat that captures your screen context (which you can toggle on and off anytime), so you can ask, "How do I fix this error?" without having to explain what "this" is.

It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model with 2300 screenshots to identify and segment all UI elements on a screen and used a VLM to find the correct icon to highlight. While this worked extremely well—better than SOTA grounding models like UI Tars—the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
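
Roughly, the two-stage loop looks like this (a simplified sketch with placeholder functions and outputs, not the actual implementation):

```
from dataclasses import dataclass

@dataclass
class UIElement:
    label: str
    bbox: tuple[int, int, int, int]  # (x, y, w, h) in screen pixels

def detect_ui_elements(screenshot: bytes) -> list[UIElement]:
    # Stage 1 (placeholder): a segmentation model proposes every clickable element.
    return [UIElement("Create bucket", (912, 188, 140, 36))]

def pick_target_element(task: str, elements: list[UIElement]) -> UIElement:
    # Stage 2 (placeholder): a VLM decides which candidate matches the next step.
    return elements[0]

def next_highlight(screenshot: bytes, task: str) -> tuple[int, int, int, int]:
    """Bounding box the overlay should highlight for the next step."""
    return pick_target_element(task, detect_ui_elements(screenshot)).bbox
```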

You may ask: if the app can show you where to click, why can't it just click for you? While trying to build computer-use agents at my job in Palo Alto, I hit the core limitation of today’s computer-use models: success rates on benchmarks like OSWorld hover in the mid-50% range. VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use—without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I’d love your feedback on where it fails, where it works well, and which specific niches you think Ourguide would be most helpful for.

40

Cua-Bench – a benchmark for AI agents in GUI environments

github.com
8 comments · 5:46 PM · View on HN
Hey HN, we're excited to share Cua-Bench ( https://github.com/trycua/cua ), an open-source framework for evaluating and training computer-use agents across different environments.

Computer-use agents show massive performance variance across different UIs—an agent with 90% success on Windows 11 might drop to 9% on Windows XP for the same task. The problem is OS themes, browser versions, and UI variations that existing benchmarks don't capture.

The existing benchmarks (OSWorld, Windows Agent Arena, AndroidWorld) were great but operated in silos—different harnesses, different formats, no standardized way to test the same agent across platforms. More importantly, they were evaluation-only. We needed environments that could generate training data and run RL loops, not just measure performance. Cua-Bench takes a different approach: it's a unified framework that standardizes environments across platforms and supports the full agent development lifecycle—benchmark, train, deploy.

With Cua-Bench, you can:

- Evaluate agents across multiple benchmarks with one CLI (native tasks + OSWorld + Windows Agent Arena adapters)

- Test the same agent on different OS variations (Windows 11/XP/Vista, macOS themes, Linux, Android via QEMU)

- Generate new tasks from natural language prompts

- Create simulated environments for RL training (shell apps like Spotify, Slack with programmatic rewards)

- Run oracle validations to verify environments before agent evaluation

- Monitor agent runs in real-time with traces and screenshots

All of this works on macOS, Linux, Windows, and Android, and is self-hostable.

To get started:

Install cua-bench:

% pip install cua-bench

Run a basic evaluation:

% cb run dataset datasets/cua-bench-basic --agent demo

Open the monitoring dashboard:

% cb run watch <run_id>

For parallelized evaluations across multiple workers:

% cb run dataset datasets/cua-bench-basic --agent your-agent --max-parallel 8

Want to test across different OS variations? Just specify the environment:

% cb run task slack_message --agent your-agent --env windows_xp

% cb run task slack_message --agent your-agent --env macos_sonoma

Generate new tasks from prompts:

% cb task generate "book a flight on kayak.com"

Validate environments with oracle implementations:

% cb run dataset datasets/cua-bench-basic --oracle

The simulated environments are particularly useful for RL training—they're HTML/JS apps that render across 10+ OS themes with programmatic reward verification. No need to spin up actual VMs for training loops.
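
As a purely illustrative sketch of what "programmatic reward verification" can mean here (the function and state shape below are hypothetical, not Cua-Bench's actual API), the simulated app exposes its state and the task is scored by inspecting it:

```
# Hypothetical reward check for a "send a Slack message" task in a simulated
# shell app -- illustrative only, not Cua-Bench's actual API or state schema.
def slack_message_reward(app_state: dict) -> float:
    messages = app_state.get("sent_messages", [])
    done = any(
        m.get("channel") == "#general" and "standup" in m.get("text", "").lower()
        for m in messages
    )
    return 1.0 if done else 0.0

print(slack_message_reward({"sent_messages": [{"channel": "#general", "text": "Standup at 10"}]}))  # 1.0
```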

We're seeing teams use Cua-Bench for:

- Training computer-use models on mobile and desktop environments

- Generating large-scale training datasets (working with labs on millions of screenshots across OS variations)

- RL fine-tuning with shell app simulators

- Systematic evaluation across OS themes and browser versions

- Building task registries (collaborating with Snorkel AI on task design and data curation, similar to their Terminal-Bench work)

Cua-Bench is 100% open-source under the MIT license. We're actively developing it as part of Cua (https://github.com/trycua/cua), our Computer Use Agent SDK, and we'd love your feedback, bug reports, or feature ideas.

GitHub: https://github.com/trycua/cua

Docs: https://cua.ai/docs/cuabench

Technical Report: https://cuabench.ai

We'll be here to answer any technical questions and look forward to your comments!

18

NukeCast – If it happened today, where would the fallout go?

nukecast.com
6 comments · 3:21 AM · View on HN
I built NukeCast because I’ve always wanted a tool that answers one question fast: if it happened today, where would the fallout plumes go, and where would you drive to get out of it? NukeCast uses weather forecasts to drive a Lagrangian particle dispersion model with wet and dry deposition. Scenarios are defined by selecting strike site(s) and yield(s); the setup uses preselected US sites and yields based on FEMA emergency data. Outputs are shown as estimated ground-level radiation dose at the surface over a 12-hour integration. Free to use with limits, and a paid tier if you want more runs/features, because AWS compute time ain’t cheap.
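
To make the approach concrete, here is a toy sketch of a Lagrangian dispersion step with deposition. It is illustrative only: the wind vector, diffusion, deposition rate, and grid are made-up placeholders, not NukeCast's actual physics or data.

```
import numpy as np

rng = np.random.default_rng(0)
n_particles, n_steps, dt = 5_000, 144, 300.0      # 12 hours in 5-minute steps
pos = np.zeros((n_particles, 2))                  # all particles start at the strike site (m)
activity = np.ones(n_particles)                   # relative source term per particle
wind = np.array([8.0, 2.0])                       # m/s; would come from the weather forecast
dep_rate = 2e-5                                   # 1/s, combined wet + dry deposition
cell = 5_000.0                                    # 5 km grid cells
grid = np.zeros((200, 200))                       # accumulated surface deposition

for _ in range(n_steps):
    pos += wind * dt + rng.normal(0.0, 500.0, pos.shape)   # advection + turbulent spread
    deposited = activity * (1 - np.exp(-dep_rate * dt))    # fraction settling this step
    activity -= deposited
    ix = np.clip((pos[:, 0] // cell).astype(int) + 100, 0, 199)
    iy = np.clip((pos[:, 1] // cell).astype(int) + 100, 0, 199)
    np.add.at(grid, (iy, ix), deposited)          # bin deposition onto the surface grid

# `grid` now shows where material settles; a dose model would convert it to dose rates.
```
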
5

Partial content web crawling using HTTP/2 and Go

altayakkus.substack.com
0 comments · 2:38 PM · View on HN
Hi, I wrote a low-level HTTP/2 web crawler in Go, which can scrape partial content to save traffic.

TL;DR: the HTML of a YouTube video page, for example, contains the video description, views, likes, etc. in its first 600 KB; the remaining 900 KB are of no use to me, but I have to pay my proxies by the gigabyte.

My crawler receives the response packet by packet, and once I have everything I need, it resets the request (HTTP/2 RST_STREAM), so I only pay for what I crawled.

This is also potentially useful for large-scale crawling operations where duplicates matter: you could compute a SimHash on the fly and reset the stream before crawling the entire document (again).
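
For illustration, here is the same pay-for-what-you-crawl idea sketched in Python with httpx (the real crawler is raw HTTP/2 in Go; this assumes `pip install httpx[http2]`):

```
import httpx

def fetch_prefix(url: str, needed: bytes, limit: int = 600_000) -> bytes:
    """Stream the response and stop as soon as we have what we need."""
    buf = b""
    with httpx.Client(http2=True) as client:
        with client.stream("GET", url) as resp:
            for chunk in resp.iter_bytes():
                buf += chunk
                # Breaking out closes the response early instead of downloading
                # the rest of the document.
                if needed in buf or len(buf) >= limit:
                    break
    return buf
```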

4

I got tired of checking 5 dashboards, so I built a simpler one

anypanel.io
1 comment · 4:03 PM · View on HN
Hey, I’m Felix, an 18-year-old student from Austria. I’ve built a few small SaaS projects, mostly solo, and I kept running into the same small but persistent problem.

Whenever I wanted to understand how things were going, I’d end up jumping between Stripe, analytics, database queries, logs, and cron scripts. I even built custom dashboards and Telegram bots to notify me about certain numbers, but that just added more things to maintain.

What I wanted was something simpler: send a number from my backend and see it on a clean dashboard.

So I built a small tool for myself.

It’s essentially a very simple API where you push numeric metrics with a timestamp, and then view them as counters, charts, goals, or percentage changes over time.

It’s not meant to replace analytics tools. I still use those. This is more for things like user counts, MRR, failed jobs, or any metric you already know you want to track without setting up a full integration.

Some intentional constraints:

- no SDKs, just a basic HTTP API
- works well with backend code and cron jobs
- stores only numbers and timestamps
- flexible enough to track any metric you can turn into a number
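
Pushing a metric could look something like this (the endpoint path, field names, and auth header are placeholders to illustrate the idea, not the documented API):

```
import time
import requests

requests.post(
    "https://anypanel.io/api/metrics",                  # assumed endpoint
    headers={"Authorization": "Bearer <api-key>"},      # assumed auth scheme
    json={"metric": "mrr", "value": 4210.0, "timestamp": int(time.time())},
    timeout=5,
)
```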

It’s still early and very much an MVP. I’m mainly posting to get feedback:

- does this solve a real problem for you?
- what feels unnecessary or missing?
- how would you approach this differently?

Website: https://anypanel.io

Happy to answer questions or hear why this doesn’t make sense. Thanks, Felix

4

A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory

github.com
3 comments · 1:29 AM · View on HN
The problem with LLMs isn't intelligence; it's amnesia and dishonesty.

Hey HN,

I’ve spent the last few months building Remember-Me, an open-source "Sovereign Brain" stack designed to run entirely offline on consumer hardware.

The core thesis is simple: Don't rent your cognition.

Most RAG (Retrieval Augmented Generation) implementations are just "grep for embeddings." They are messy, imprecise, and prone to hallucination. I wanted to solve the "Context integrity" problem at the architectural layer.

The Tech Stack (How it works):

QDMA (Quantum Dream Memory Architecture): instead of a flat vector DB, it uses a hierarchical projection engine. It separates "Hot" (Recall) from "Cold" (Storage) memory, allowing for effectively infinite context window management via compression.

CSNP (Context Switching Neural Protocol) - The Hallucination Killer: This is the most important part. Every memory fragment is hashed into a Merkle Chain. When the LLM retrieves context, the system cryptographically verifies the retrieval against the immutable ledger.

If the hash doesn't match the chain: The retrieval is rejected.

Result: The AI effectively cannot "make things up" about your past, because it is mathematically constrained to the ledger.

Local Inference: Built on top of the llama.cpp server. It runs Llama-3 (or any GGUF) locally. No API keys. No data leaving your machine.
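
Conceptually, the verification is a hash-chain check like the generic sketch below (illustrative only, not the actual CSNP implementation):

```
import hashlib

def link_hash(prev_hash: str, fragment: str) -> str:
    return hashlib.sha256((prev_hash + fragment).encode()).hexdigest()

def verify_chain(fragments: list[str], ledger: list[str]) -> bool:
    """Accept a retrieval only if every fragment re-hashes to its ledger entry."""
    prev = "genesis"
    for fragment, recorded in zip(fragments, ledger):
        if link_hash(prev, fragment) != recorded:
            return False      # mismatch -> the retrieval is rejected
        prev = recorded
    return True
```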

Features:

Zero-Dependency: Runs on Windows/Linux with just Python and a GPU (or CPU).

Visual Interface: Includes a Streamlit-based "Cognitive Interface" to visualize memory states.

Open Source: MIT License.

This is an attempt to give "Agency" back to the user. I believe that if we want AGI, it needs to be owned by us, not rented via an API.

Repository: https://github.com/merchantmoh-debug/Remember-Me-AI

I’d love to hear your feedback on the Merkle-verification approach. Does constraining the context window effectively solve the "trust" issue for you?

It's fully working and fully tested. If you tried to git clone it before without luck (this is not my first Show HN for this project), feel free to try again.

To everyone who HATES AI slop, greedy corporations, and having their private data stuck on cloud servers:

You're welcome.

Cheers, Mohamad

3

Bytepiper – turn .txt files into live APIs

bytepiper.com
4 comments · 7:45 AM · View on HN
Hi HN,

I built a small tool that converts API logic written in plain .txt files into real, executable PHP API endpoints.

The motivation was personal: I can design and ship frontends quickly, but backend APIs (setup, boilerplate, deployment) always slowed down small projects and MVPs. I wanted a way to describe inputs, rules, and responses in text and get a working endpoint without worrying about infrastructure.

This is early and opinionated. I’m especially interested in feedback around:

- trust and security concerns
- where this breaks down
- whether this is useful beyond prototypes

Happy to answer questions about how it works.

2

Debugging conflicting U.S. sexual behavior surveys

osf.io
0 comments · 2:06 AM · View on HN
I'm the author of a new preprint that tries to resolve why major U.S. sexual behavior surveys appeared to report contradictory trends over the past decade. The key move is separating never-occurrence from temporary inactivity among the experienced, rather than averaging them together. That decomposition is applied symmetrically across surveys and validated against an independent dataset.

The paper then treats the remaining discrepancy as a debugging problem:

• check distributions for digit heaping and compression
• look for stock–flow reversals consistent with under-reporting
• compare adjacent survey waves for internal consistency
• cross-validate against an external series not subject to the same reporting incentives

Once those diagnostics are applied, the conflicting results reconcile cleanly. This is a methods/measurement paper, not a causal one. The contribution is showing how small reporting artifacts and aggregation choices can produce large apparent disagreements—even in high-quality surveys—when they aren't handled carefully.

(One note for the methodologically inclined: conditioning on "sexually experienced" looks like it could induce collider bias. Section 4 and Appendix D5 address this directly—the gender gap in sexual debut didn't change differentially, so selection into the analysis sample is symmetric.)

Preprint: https://osf.io/preprints/socarxiv/jcdbm_v2

Replication code: https://github.com/Joshfkon/ResearchPaper_PartnershipGap

Happy to discuss the diagnostics, decomposition logic, or limitations.
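
As a toy illustration of why that decomposition matters (the numbers are made up), the headline inactivity rate mixes two different quantities:

```
p_never = 0.15                  # never-occurrence (hypothetical)
p_inactive_given_exp = 0.20     # temporary inactivity among the experienced (hypothetical)

p_experienced = 1 - p_never
p_inactive_overall = p_never + p_experienced * p_inactive_given_exp
print(p_inactive_overall)       # 0.32 -- surveys can agree here while disagreeing on the split
```
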
2

SHDL – A minimal hardware description language built from logic gates

github.com
2 comments · 12:34 AM · View on HN
Hi, everyone!

I built SHDL (Simple Hardware Description Language) as an experiment in stripping hardware description down to its absolute fundamentals.

In SHDL, there are no arithmetic operators, no implicit bit widths, and no high-level constructs. You build everything explicitly from logic gates and wires, and then compose larger components hierarchically. The goal is not synthesis or performance, but understanding: what digital systems actually look like when abstractions are removed.
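
As a plain-Python illustration of that "everything from gates" construction (this is not SHDL syntax, just the same idea expressed as functions):

```
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def xor(a, b):
    # XOR built from four NANDs
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a, b):
    # sum = a XOR b, carry = a AND b (NOT of NAND)
    return xor(a, b), nand(nand(a, b), nand(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
```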

SHDL is accompanied by PySHDL, a Python interface that lets you load circuits, poke inputs, step the simulation, and observe outputs. Under the hood, SHDL compiles circuits to C for fast execution, but the language itself remains intentionally small and transparent.

This is not meant to replace Verilog or VHDL. It’s aimed at:

- learning digital logic from first principles

- experimenting with HDL and language design

- teaching or visualizing how complex hardware emerges from simple gates

I would especially appreciate feedback on:

- the language design choices

- what feels unnecessarily restrictive vs. educationally valuable

- whether this kind of “anti-abstraction” HDL is useful to you

Repo: https://github.com/rafa-rrayes/SHDL

Python package: PySHDL on PyPI

Thanks for reading, and I’m very open to critique.

2

Alprina – Intent matching for co-founders and investors

alprina.com
1 comment · 11:37 AM · View on HN
We noticed that professional networking is mostly profile-based: you create a static resume and hope the right people somehow discover you. But what actually drives meaningful connections is complementary intent: a founder looking for a technical co-founder, an engineer wanting to join a specific type of early-stage company, an investor actively looking for a certain thesis.

We built Alprina to match people on what they want right now, not just who they are on paper.

On Alprina, you create "intents" in natural language (what you're looking for), join networks (communities where matching happens), and our AI matches you with people whose intents complement yours. You can attach context like pitch decks or profiles so that when you match, the other side immediately understands why you're reaching out.

Would love feedback from the HN community - especially on the balance between match precision and serendipity. Too strict and you miss interesting connections; too loose and it's just noise.

1

Ideon – An open source, infinite canvas for your project's segmentation

theideon.com
0 comments · 6:48 PM · View on HN
The problem: I want to use my tools. You want to use yours. The contractor we hired uses yet another set of tools. This causes unnecessary friction for all participants.

Ideon is a self-hosted visual workspace designed to bridge this gap. It doesn't replace your existing stack (GitHub, Figma, Notion, etc.) but provides a shared context where all these pieces live together on an infinite canvas.

We built this because projects often die from fragmentation—code is in one place, decisions in chat logs, and visuals in design tools. Ideon aims to keep the project "mentally navigable" for everyone involved.

Key features:

- Visual Blocks: Organize Repositories, Notes, Links, Files, and People spatially.
- State History: Track how decisions evolved with workspace snapshots.
- Multiplayer: Real-time collaboration.
- Self-hosted: Docker-based and AGPLv3 licensed.

Tech stack: Next.js, PostgreSQL, Docker.

Would love to hear your feedback on the approach!

1

PillarLabAI – A reasoning engine for prediction markets

pillarlabai.com
0 comments · 7:17 PM · View on HN
Hi HN, I’m the creator of PillarLab. I noticed that most people betting on prediction markets like Polymarket or Kalshi were either guessing or using generic LLMs that hallucinate data.

I built PillarLab to solve the 'black box' problem of AI. It uses 1,720+ proprietary 'Pillars' (analytical frameworks) to guide the AI through rigorous logic: things like sharp money tracking, xG soccer models, and line movement.

I’d love your feedback on the reasoning. Does the weighting of factors make sense to you? I'll be here to answer questions all day!

1

AI Compass: Daily AI Search Signals and Trends

theaidirection.com
1 comment · 8:00 AM · View on HN
Hi Hacker Friends,

I’m sharing AI Compass, a daily AI signal brief built for builders who want facts over hype.

AI moves fast. We track real-time signals across Google Trends and web news, then use AI clustering, denoising, and source attribution to surface what actually matters: model launches, company moves, and emerging terms.

You get a structured daily brief with traceable sources and clear takeaways, so you can understand the landscape in minutes.

Our priorities:

1. High signal-to-noise: only accurate, relevant items. No hype.
2. Objective and transparent: conclusions backed by traceable evidence.
3. Fast to consume: built for developers, PMs, and indie builders.

This is a new launch. Feedback, bug reports, and feature ideas are welcome.

1

GitHub Action that analyzes CI failures using AI

github.com
0 comments · 8:09 AM · View on HN
I built a GitHub Action that analyzes workflow failures and correlates them with PR changes. It uses cordon (transformer-based semantic anomaly detection) to reduce massive logs to their unusual patterns, then DSPy to orchestrate LLM analysis of the failures. It then determines whether the PR changes likely caused the issue and posts structured root-cause reports.

Marketplace: https://github.com/marketplace/actions/github-actions-failur...

1

The Poor Man's Guide to Cloud GPU Selection

gist.github.com
0 comments · 11:22 AM · View on HN
I wrote an article about a cloud GPU selection technique that would maximize FLOPs per $.

Taking the pre-training of LLMs as an example, it shows how the cost-optimal GPU changes depending on the computational intensity (∝ model size x batch size).
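
A tiny roofline-style illustration of that selection logic (the specs, prices, and intensity value below are placeholders, not numbers from the article):

```
# Achievable throughput is capped by either peak compute or memory bandwidth x intensity.
gpus = {
    "gpu_a": {"peak_tflops": 300.0, "mem_bw_tbps": 1.0, "usd_per_hr": 2.0},
    "gpu_b": {"peak_tflops": 60.0,  "mem_bw_tbps": 0.5, "usd_per_hr": 0.4},
}
intensity = 200.0   # FLOPs per byte; grows with model size x batch size

for name, g in gpus.items():
    achievable_tflops = min(g["peak_tflops"], g["mem_bw_tbps"] * intensity)
    print(name, round(achievable_tflops / g["usd_per_hr"], 1), "TFLOP/s per $/hr")
```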

1

CSR vs. SSR Detector

chromewebstore.google.com
0 comments · 11:23 AM · View on HN
Hi HN

After work I built a small Chrome extension that detects whether a webpage is rendered using Server Side Rendering, Client Side Rendering, or a hybrid approach.

As a frontend developer I often wanted a quick way to check how a site is rendered without opening devtools and digging through network and DOM. This started as a personal after hours project and turned into something I use daily, so I decided to share it.

What it does:

• Detects SSR, CSR, or hybrid rendering
• Recognizes frameworks like Next.js, React, Nuxt, Gatsby and others
• Shows basic performance timings like DOM ready and FCP
• Keeps last 10 checks in local history
• Works fully locally with no data collection

Detection is based on 15+ indicators and works surprisingly well across modern stacks.
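
One such indicator, in crude Python for illustration (not the extension's actual logic): check whether the raw HTML already carries substantial body text before any JavaScript runs.

```
import re
import urllib.request

def looks_server_rendered(url: str) -> bool:
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    body = re.search(r"<body.*?>(.*)</body>", html, re.S | re.I)
    text = re.sub(r"<script.*?</script>|<[^>]+>", " ",
                  body.group(1) if body else "", flags=re.S | re.I)
    return len(text.split()) > 50   # CSR shells usually ship a nearly empty body
```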

Everything is open source. No tracking. No external servers. Just a lightweight dev tool.

I recently improved React 18 detection, fixed history display, added better error handling, and cleaned up docs and roadmap.

This is very much a side project made for fun and learning. If it helps even a few devs or SEO folks, I will be happy.

https://chromewebstore.google.com/detail/csr-vs-ssr-detector...

Feedback and suggestions are welcome. Thanks for checking it out

1

ZigZag – Generate Markdown code reports from directories (Zig)

github.com
0 comments · 11:24 AM · View on HN
Hi HN,

I built ZigZag, a command-line tool written in Zig that recursively scans source code directories and generates a single markdown report containing the code and metadata.

It’s designed to be fast on large codebases and uses:

- Parallel directory and file processing
- A persistent on-disk cache to avoid re-reading unchanged files
- Different file reading strategies based on file size (read vs mmap)
- Timezone-aware timestamps in reports

Each directory produces a report.md with a table of contents, syntax-highlighted code blocks, file sizes, modification times, and detected language.
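
A rough Python sketch of that report structure, purely for illustration (ZigZag itself is written in Zig and adds the caching, mmap, and parallelism mentioned above):

```
from pathlib import Path

FENCE = "`" * 3  # markdown code fence

def report(directory: str, exts=(".zig", ".py", ".go")) -> str:
    lines = ["# Code report", ""]
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix in exts:
            lines += [
                f"## {path}",
                f"Size: {path.stat().st_size} bytes",
                "",
                FENCE + path.suffix.lstrip("."),   # language tag from the extension
                path.read_text(errors="replace"),
                FENCE,
                "",
            ]
    return "\n".join(lines)
```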

Repo: https://github.com/LegationPro/zigzag

I built this mainly for auditing and documenting large repositories. Feedback, critiques, and ideas are welcome.

1

SheetSage – A Linter for the Most Dangerous Programming Language

sheetsage.co
3 comments · 11:38 AM · View on HN
I built SheetSage because "Silent Failures" in spreadsheets are a massive unmanaged risk in finance and ops. Most tools just find broken references (#REF!), but the real killers are logical errors like a VLOOKUP defaulting to approximate match on unsorted data, returning a plausible but wrong value.

The Technical Implementation:

Locale-aware parsing: Since Google Sheets doesn’t provide an AST for formulas, I had to build a conservative parser that tracks quotes, parens, and braces to extract function calls without getting poisoned by strings or array literals. It handles localized argument separators (, vs ;) and decimal separators (, vs .) based on the spreadsheet's locale.
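
A stripped-down sketch of that kind of conservative scan (illustrative, not the production parser), showing how tracking quote state keeps separators inside string literals from poisoning the extraction:

```
import re

def function_calls(formula: str) -> list[str]:
    calls, in_string, i = [], False, 0
    while i < len(formula):
        ch = formula[i]
        if ch == '"':                 # toggle string state ("" escapes net out)
            in_string = not in_string
        elif not in_string:
            m = re.match(r"[A-Z][A-Z0-9_.]*\(", formula[i:])
            if m:
                calls.append(m.group()[:-1])
                i += len(m.group()) - 1
        i += 1
    return calls

print(function_calls('=IF(A1=";",VLOOKUP(B1,Data!A:B,2),0)'))
# -> ['IF', 'VLOOKUP']  (the ";" inside the string literal is ignored)
```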

R1C1 Clustering: To avoid UI noise, I don't treat every cell as a unique finding. I normalize formulas using getFormulasR1C1() to identify templates that have been copied down. This allows the fix-all engine to refactor thousands of cells in one batch.

Systemic soft-cap scoring: Standard penalty-per-thousand metrics often under-react to widespread errors, so I implemented a continuous soft-cap model. It calculates union coverage for risks—if a critical error covers 40% of your workbook, your health score is soft-capped regardless of how many other healthy cells you have.
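
A minimal sketch of the soft-cap shape (the constants here are assumptions): the attainable score falls with the share of the workbook a critical risk covers, no matter how many healthy cells remain.

```
def health_score(base_score: float, critical_coverage: float) -> float:
    ceiling = 100.0 * (1.0 - 0.8 * critical_coverage)   # 40% coverage -> cap at 68
    return min(base_score, ceiling)

print(health_score(95.0, critical_coverage=0.40))   # -> 68.0
```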

Snapshot & Rollback: Since I’m mutating user data, I implemented a SnapshotService that writes original formulas to a hidden SheetSage_SNAPSHOT sheet before any bulk fix. This provides a native "Undo" even after the Apps Script execution finishes.

Privacy: No spreadsheet data ever leaves the Google environment. The audit engine runs entirely in Apps Script. The only external call is a signed HMAC request to a Vercel/Next.js billing service to verify subscription entitlements via a stable clientId.

I'd love to discuss the heuristics I'm using to distinguish magic numbers from legitimate constants (like 24 for hours), and how I'm handling LockService to prevent race conditions during bulk refactoring.

1

InsAIts V2 – Real-time monitoring for multi-agent AI communication

github.com
0 comments · 3:57 AM · View on HN
Hi HN, I'm Cristian, an indie developer from Italy working on tools to make multi-agent AI safer and more debuggable. When agents talk to each other (CrewAI, LangGraph, AutoGen, custom setups), they quickly develop shorthand, lose context, invent jargon, and propagate hallucinations — all invisible to us. InsAIts is a lightweight Python SDK that adds observability in ~3 lines of code. V2 just shipped with:

- Anchor-aware detection (set the user's original query as context to reduce false positives)
- Forensic root-cause tracing + ASCII chain visualization
- Built-in domain dictionaries (finance, healthcare, kubernetes, ML, devops, quantum)
- Local (Ollama) decipher mode — translates agent jargon to human-readable (Cloud soon)
- Integrations: Slack alerts, Notion/Airtable export, LangGraph/CrewAI wrappers

Privacy-first: local embeddings by default, nothing leaves your machine unless you opt into cloud decipher. Free tier works without an API key (local only). Also running limited lifetime deals for early supporters.

Quick install: pip install insa-its[full]

Demos included:

- Live terminal dashboard
- Marketing team agent simulation (watch shorthand emerge in real time)

GitHub: https://github.com/Nomadu27/InsAIts
PyPI: https://pypi.org/project/insa-its/
Docs: https://insaitsapi-production.up.railway.app/docs

Would love feedback — especially from anyone building agent crews or running multi-LLM systems in production. What’s your biggest pain point with agent observability? Thanks for checking it out!

Cristian

1

Python SDK for RamaLama AI Containers

github.com
0 comments · 6:43 PM · View on HN
TL;DR An SDK for running AI on-device with even the most non-standard hardware.

Hey, I’m one of the maintainers of RamaLama[1] which is part of the containers ecosystem (podman, buildah, skopeo). It’s a runtime-agnostic tool for coordinating local AI inference with containers.

I put together a Python SDK for programmatic control over local AI, using ramalama under the hood. Being runtime-agnostic, you can use ramalama with llama.cpp, vLLM, mlx, etc., so long as the underlying service exposes an OpenAI-compatible endpoint. This is especially powerful for users deploying to edge or other devices with atypical hardware/software configurations that, for example, require custom runtime compilations.

```
from ramalama_sdk import RamalamaModel

runtime_image = "quay.io/ramalama/ramalama:latest"
model = "huggingface://ggml-org/gpt-oss-20b-GGUF"

with RamalamaModel(model, base_image=runtime_image) as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
```

This SDK manages:

  - Pulling and verifying runtime images
  - Downloading models (HuggingFace, Ollama, ModelScope, OCI registries)
  - Managing the runtime process

It works with air-gapped deployments and private registries and also has async support.

If you want to learn more the documentation is available here: https://docs.ramalama.com/sdk/introduction.

Otherwise, I hope this is useful to people out there, and I would appreciate feedback about where to prioritize next, whether that’s specific language support, additional features (speech to text? RAG? MCP?), or something else.

1. github.com/containers/ramalama