Daily Show HN


Show HN for March 12, 2026

54 items
116

We analyzed 1,573 Claude Code sessions to see how AI agents work #

github.com · 72 comments · 1:41 PM
We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.

So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.

Some things we found that surprised us:

- Skills were only being used in 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for "good" agentic session performance, so we are building one
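Numbers like the abandonment rate fall out of simple aggregation once sessions are structured records. A minimal sketch, with hypothetical field names (the actual rudel.ai schema isn't shown here):

```python
from dataclasses import dataclass

@dataclass
class Session:
    task_type: str     # e.g. "documentation", "refactoring" (hypothetical fields)
    duration_s: float  # wall-clock length of the session
    completed: bool    # did the session end with an accepted result?

def abandonment_rate(sessions, early_cutoff_s=60):
    """Return (share abandoned, share of abandonments that were early)."""
    if not sessions:
        return 0.0, 0.0
    abandoned = [s for s in sessions if not s.completed]
    early = [s for s in abandoned if s.duration_s < early_cutoff_s]
    return (len(abandoned) / len(sessions),
            len(early) / len(abandoned) if abandoned else 0.0)

sessions = [
    Session("documentation", 300, True),
    Session("refactoring", 45, False),
    Session("refactoring", 20, False),
    Session("documentation", 600, True),
]
rate, early_share = abandonment_rate(sessions)
```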

The tool is free to use and fully open source, happy to answer questions about the data or how we built it.

83

OneCLI – Vault for AI Agents in Rust #

github.com · 31 comments · 4:41 PM
We built OneCLI because AI agents are being given raw API keys. And it's going about as well as you'd expect. We figured the answer isn't "don't give agents access," it's "give them access without giving them secrets."

OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
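The swap itself can be sketched in a few lines. This is a toy illustration of the placeholder idea, not OneCLI's actual Rust code; the route table, header format, and `onecli-` prefix are all assumptions:

```python
# Route table maps (host, path prefix) -> (allowed agents, vaulted credential).
# Illustrative values only.
ROUTES = {
    ("api.example.com", "/v1/"): ({"agent-a"}, "sk-real-secret"),
}

def swap_credential(agent_id, host, path, headers):
    """Verify access, then replace a placeholder bearer token with the
    real credential before the request is forwarded upstream."""
    for (h, prefix), (allowed, secret) in ROUTES.items():
        if host == h and path.startswith(prefix):
            if agent_id not in allowed:
                raise PermissionError(f"{agent_id} may not call {host}")
            out = dict(headers)
            if out.get("Authorization", "").startswith("Bearer onecli-"):
                out["Authorization"] = f"Bearer {secret}"
            return out
    raise LookupError(f"no route for {host}{path}")

fwd = swap_credential("agent-a", "api.example.com", "/v1/chat",
                      {"Authorization": "Bearer onecli-placeholder"})
```

The agent only ever holds `onecli-placeholder`; the upstream service only ever sees the real key.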

Try it in one line:

    docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli

The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).

We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.

It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.

GitHub: https://github.com/onecli/onecli
Site: https://onecli.sh

49

Understudy – Teach a desktop agent by demonstrating a task once #

github.com · 12 comments · 5:04 PM
I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

    npm install -g @understudy-ai/understudy
    understudy wizard
GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

30

Aurion OS – A 32-bit GUI operating system written from scratch in C #

github.com · 15 comments · 6:33 PM
Hi HN! I'm 13 and I built Aurion OS as a solo learning project over 14 days (~12 hours/day).

It's a 32-bit x86 operating system written entirely in C and x86 Assembly with no external libraries.

What it has:

- Custom bootloader and kernel
- VESA framebuffer graphics (1920x1080, double-buffered)
- Window manager with draggable, overlapping windows
- macOS-inspired dock with transparency
- PS/2 keyboard and mouse drivers
- ATA hard drive driver with filesystem
- PCI bus enumeration
- RTL8139 network driver (WIP)
- Real-time clock
- Runs on just 16MB RAM (up to 10 windows simultaneously)

Built-in apps: Terminal (with DOS mode), Notepad (save/load), Calculator, Paint (multiple colors and brush sizes), Snake game, Settings (theme switching), and System Info.

Currently works best on QEMU, VirtualBox, and VMware. Real hardware support is still a work in progress.

Next goal: TCP/IP networking stack.

I'd love any feedback, suggestions, or criticism. This is my first OS project and I learned a huge amount while building it. Happy to answer any technical questions!

15

LogClaw – Open-source AI SRE that auto-creates tickets from logs #

logclaw.ai · 10 comments · 5:06 PM
Hi HN, I'm Robel. I built LogClaw because I was tired of paying for Datadog and still waking up to pages that said "something is wrong" with no context.

LogClaw is an open-source log intelligence platform that runs on Kubernetes. It ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring — not simple threshold alerting. The system extracts 8 failure-type signals (OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connection errors, auth failures), combines them with statistical z-score analysis, blast radius, error velocity, and recurrence signals into a composite score. Critical failures (OOM, panics) trigger the immediate detection path in <100ms — before a time window even completes. The detection catches 99.8% of critical failures while filtering noise (validation errors and 404s don't fire incidents).
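The composite-scoring idea can be sketched roughly like this. Weights, signal names, and the threshold are illustrative stand-ins, not LogClaw's tuned values:

```python
import statistics

# Hypothetical weights -- illustrative only.
SIGNAL_WEIGHTS = {"oom": 1.0, "crash": 1.0, "timeout": 0.5,
                  "db_deadlock": 0.7, "auth_failure": 0.3}
CRITICAL = {"oom", "crash"}

def composite_score(signals, error_history, current_errors):
    """Combine failure-type signals with a z-score on error velocity."""
    base = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)
    mean = statistics.mean(error_history)
    stdev = statistics.pstdev(error_history) or 1.0  # avoid divide-by-zero
    z = (current_errors - mean) / stdev
    return base + max(z, 0.0)

def should_fire(signals, score, threshold=2.0):
    # Critical failures bypass the window and threshold entirely.
    return bool(CRITICAL & set(signals)) or score >= threshold
```

A lone auth failure with flat error counts stays below the threshold, while an OOM fires immediately regardless of score.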

Once an anomaly is confirmed, a 5-layer trace correlation engine groups logs by traceId, maps service dependencies, tracks error propagation cascades, and computes blast radius across affected services. Then the Ticketing Agent pulls the correlated timeline, sends it to an LLM for root cause analysis, and creates a deduplicated ticket on Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad. The loop from log noise to a filed ticket is about 90 seconds.

Architecture: OTel Collector → Kafka (Strimzi, KRaft mode) → Bridge (Python, 4 concurrent threads: ETL, anomaly detection, OpenSearch indexing, trace correlation) → OpenSearch + Ticketing Agent. The AI layer supports OpenAI, Claude, or Ollama for fully air-gapped deployments. Everything deploys with a single Helm chart per tenant, namespace-isolated, no shared data plane.

To try it locally: https://docs.logclaw.ai/local-development

What it does NOT do yet:

- Metrics and traces — this is logs-only right now. Metrics support is on the roadmap.
- The anomaly detection is signal-based + statistical (composite scoring with z-score), not deep learning. It catches 99.8% of critical failures but won't detect subtle performance drift patterns yet.
- The dashboard is functional but basic. We use OpenSearch Dashboards for the heavy lifting.

Licensed Apache 2.0. The managed cloud version is $0.30/GB ingested if you don't want to self-host.

Repo: https://github.com/logclaw/logclaw

13

I built an SDK that scrambles HTML so scrapers get garbage #

obscrd.dev · 29 comments · 1:27 PM
Hey HN -- I'm a solo dev. Built this because I got tired of AI crawlers reading my HTML in plain text while robots.txt did nothing.

The core trick: shuffle characters and words in your HTML using a seed, then use CSS (flexbox order, direction: rtl, unicode-bidi) to put them back visually. Browser renders perfectly. textContent returns garbage.
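The shuffle-plus-CSS-order part can be sketched in a few lines. This is a simplified illustration of the idea, not the SDK's actual implementation (which layers word shuffling, RTL tricks, and decoys on top):

```python
import random

def scramble(text, seed):
    """Shuffle characters, emitting spans whose CSS `order` restores the
    original sequence under display:flex. A browser renders the text
    correctly; textContent (what a naive scraper reads) is shuffled."""
    rng = random.Random(seed)          # seeded so server output is stable
    indexed = list(enumerate(text))    # remember each char's true position
    rng.shuffle(indexed)
    spans = "".join(
        f'<span style="order:{i}">{ch}</span>' for i, ch in indexed
    )
    return f'<span style="display:inline-flex">{spans}</span>'

html = scramble("hello", seed=42)
```

Sorting the spans by their `order` value recovers the original string, which is exactly what flexbox does visually.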

On top of that: email/phone RTL obfuscation with decoy characters, AI honeypots that inject prompt instructions into LLM scrapers, clipboard interception, canvas-based image rendering (no img src in DOM), robots.txt blocking 30+ AI crawlers, and forensic breadcrumbs to prove content theft.

What it doesn't stop: headless browsers that execute CSS, screenshot+OCR, or anyone determined enough to reverse-engineer the ordering. I put this in the README's threat model because I'd rather say it myself than have someone else say it for me. The realistic goal is raising the cost of scraping -- most bots use simple HTTP requests, and we make that useless.

TypeScript, Bun, tsup, React 18+. 162 tests. MIT licensed. Nothing to sell -- the SDK is free and complete.

Best way to understand it: open DevTools on the site and inspect the text.

GitHub: https://github.com/obscrd/obscrd

8

We open sourced Vapi – UI included #

github.com · 5 comments · 3:03 PM
We kept hitting the same wall building voice AI systems. Pipecat and LiveKit are great projects, genuinely. But getting them to production took us weeks of plumbing: wiring things together, handling barge-ins, setting up telephony, knowledge bases, tool calls, and so on. And every time we needed to tweak agent behavior, we were back in the code and redeploying. We just wanted to change a prompt and test it in 30 seconds. That's why Vapi, Retell, etc. exist.

So we wrote the entire stack and open sourced it as a visual drag-and-drop builder for voice agents (think Vapi, or n8n for voice). Built on a Pipecat fork, BSD-2 licensed, no strings attached. Tool calls, knowledge base, variable extraction, voicemail detection, call transfer to humans, multilingual support, post-call QA, background noise suppression, and a website widget are all included. You're not paying per-minute fees to a middleman wrapping the same APIs you'd call directly.

You can set it up with a single docker command. It comes pre-wired with Deepgram, Cartesia, OpenAI, Speechmatics, and Sarvam for STT (and the same providers for TTS), and OpenAI, Gemini, Groq, OpenRouter, and Azure on the LLM side. Telephony works out of the box with Twilio, Vonage, Cloudonix, and Asterisk for both inbound and outbound.

There's a hosted version at app.dograh.com if self-hosting isn't your thing.

Repo: github.com/dograh-hq/dograh
Video walkthrough: https://youtu.be/sxiSp4JXqws

We built this out of frustration, not a thesis. The tool is free to use and fully open source (and will always remain so). Happy to answer questions about how we built it.

6

PipeStep – Step-through debugger for GitHub Actions workflows #

github.com · 2 comments · 5:08 PM
Hey HN — I kept seeing developers describe the same frustration: the commit-push-wait-read-logs cycle when debugging CI pipelines. So I built PipeStep.

PipeStep parses your GitHub Actions YAML, spins up the right Docker container, and gives you a step-through debugger for your workflow's `run:` shell commands. You can:

- Pause before each step and inspect the container state
- Shell into the running container mid-pipeline (press I)
- Set breakpoints on specific steps (press B)
- Retry failed steps or skip past others

It deliberately does not try to replicate the full GitHub Actions runtime — no secrets, no matrix builds, no uses: action execution. For full local workflow runs, use act. PipeStep is for when things break and you need to figure out why without pushing 10 more commits. Think of it as gdb for your CI pipeline rather than a local GitHub runner.
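The step-through loop itself is conceptually simple. A sketch of the control flow with the Docker execution stubbed out (the `decide`/`execute` callback names are hypothetical, not PipeStep's API):

```python
def step_through(steps, decide, execute):
    """Walk parsed workflow steps interactively.

    decide(step)  -> "run" | "skip" | "stop"  (the debugger prompt)
    execute(step) -> bool                      (run inside the container)
    """
    results = []
    for step in steps:
        action = decide(step)
        if action == "stop":
            break
        if action == "skip":
            results.append((step["name"], "skipped"))
            continue
        ok = execute(step)
        results.append((step["name"], "ok" if ok else "failed"))
        if not ok:          # pause point: retry/shell-in would hook here
            break
    return results

steps = [{"name": "lint", "run": "ruff ."},
         {"name": "test", "run": "pytest"},
         {"name": "deploy", "run": "./deploy.sh"}]
log = step_through(steps,
                   decide=lambda s: "skip" if s["name"] == "deploy" else "run",
                   execute=lambda s: True)
```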

pip install pipestep (v0.1.2) · Python 3.11+ · MIT · Requires Docker

Would love feedback, especially from people who've hit the same pain point. Known limitations are documented in the README + have some issues in there that I'd love eyeballs on!

6

I built Chronoscope, because Google Maps won't let you visit 3400 BCE #

shiphappens.xyz · 3 comments · 10:33 AM
I built Chronoscope, a project to explore the world through time.

I've been wanting to do this for a while, after being inspired by Ollie Bye's "History of the World" video several years ago.

I'm not the first person to have done this - resources like OpenHistoricalMaps are amazing.

But, I noticed there were a few disparate datasets / academic databases online, so I combined them together as best as I could (I've linked all sources in the app). To make it more interesting, I also included:

- Notable events from the time period (geolocated where possible), sourced from wikidata

- Ancient cities + their original names

- Empire hierarchies for colonial empires like the British Empire

You can jump across time and use shuffle to explore some fascinating corners of history.

Would love any feedback, especially from people who like maps, timelines, and weird historical rabbit holes. Also please report any data issues if you find them (it's all using publicly collated data, so there will be plenty).

Happy to publish code / data on GH if there's interest!

4

AI-powered one-click translator for Pokémon GBA ROM hacks #

github.com · 3 comments · 7:45 AM
Meowth GBA Translator is an open-source, AI-powered tool that automates translation of Pokémon GBA ROMs (including binary hacks like FireRed, Emerald, Ruby/Sapphire, and Mystery Dungeon). Powered by LLMs (supports OpenAI, DeepSeek, Gemini, Claude, Groq, and 10+ others), it extracts text, translates intelligently while preserving codes and context, then rebuilds the ROM — all in one click via a friendly GUI or simple CLI command. Supports 6+ languages (Chinese, English, French, German, Italian, Spanish) with optimized prompts and smart font patching. Focus on gameplay mods, let AI handle the words. Free, MIT-licensed, cross-platform.
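The "preserving codes" step usually comes down to masking control sequences before translation and restoring them afterwards. A sketch of that idea (the control-code regex is a guess at GBA-style codes, not Meowth's actual pattern):

```python
import re

# Assumed code shapes: bracketed tokens like [PLAYER] and backslash escapes.
CODE_RE = re.compile(r"\[[A-Z_]+\]|\\[a-z][0-9A-F]{0,2}")

def protect(text):
    """Replace control codes with opaque placeholders the LLM won't touch."""
    codes = []
    def stash(m):
        codes.append(m.group(0))
        return f"\x00{len(codes) - 1}\x00"
    return CODE_RE.sub(stash, text), codes

def restore(text, codes):
    """Put the original codes back after translation."""
    return re.sub(r"\x00(\d+)\x00", lambda m: codes[int(m.group(1))], text)

masked, codes = protect("[PLAYER] received a POTION!")
# ... send `masked` to the LLM for translation ...
translated = masked.replace("received a POTION", "recibió una POCIÓN")
out = restore(translated, codes)
```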
4

K9 Audit – Causal intent-execution audit trail for AI agents #

github.com · 1 comment · 12:40 AM
On March 4, 2026, my Claude Code agent wrote a staging URL into a production config file — three times, 41 minutes apart. Syntax was valid, no error thrown. My logs showed every action. All green.

The problem was invisible because nothing had recorded what the agent intended to do before it acted — only what it actually did.

K9 Audit fixes this with a causal five-tuple per agent step:

- X_t: context (who acted, under what conditions)
- U_t: action (what was executed)
- Y*_t: intent contract (what it was supposed to do)
- Y_t+1: actual outcome
- R_t+1: deviation score (deterministic — no LLM, no tokens)

Records are SHA256 hash-chained. Tamper-evident. When something goes wrong, `k9log trace --last` gives root cause in under a second.
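Hash-chaining like this is straightforward to sketch. The record fields below are collapsed for brevity and the format is illustrative, not K9 Audit's actual schema:

```python
import hashlib
import json

GENESIS = "0" * 64

def append_record(chain, record):
    """Append a record whose hash covers its body plus the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"prev": prev, **record}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify(chain):
    """Recompute every hash; any edit anywhere breaks the chain."""
    prev = GENESIS
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, {"intent": "update staging config", "action": "edit prod.yaml"})
append_record(chain, {"intent": "run tests", "action": "pytest"})
```

Because each record commits to its predecessor's hash, rewriting any earlier record invalidates everything after it.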

Works with Claude Code (zero-config hook), LangChain, AutoGen, CrewAI, or any Python agent via one decorator.

pip install k9audit-hook

4

Run an Agent Council of LLMs that debate and synthesize answers #

github.com · 2 comments · 1:26 PM
I built a local-first UI that adds two reasoning architectures on top of small models like Qwen, Llama and Mistral: a sequential Thinking Pipeline (Plan → Execute → Critique) and a parallel Agent Council where multiple expert models debate in parallel and a Judge synthesizes the best answer. No API keys, zero .env setup — just pip install multimind. Benchmark on GSM8K shows measurable accuracy gains vs. single-model inference.
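The council pattern reduces to fan-out plus a judge. A sketch with stub "experts" standing in for the local models (the real project wires these to Qwen/Llama/Mistral inference):

```python
from collections import Counter

def agent_council(question, experts, judge):
    """Each expert answers independently; the judge synthesizes a winner."""
    answers = {name: fn(question) for name, fn in experts.items()}
    return judge(question, answers)

# Stub experts -- callables standing in for model inference.
experts = {
    "math":    lambda q: "4",
    "skeptic": lambda q: "4, assuming base 10",
}

def majority_judge(question, answers):
    """Toy judge: pick the most common core answer across experts."""
    core = [a.split(",")[0] for a in answers.values()]
    return Counter(core).most_common(1)[0][0]

best = agent_council("What is 2+2?", experts, majority_judge)
```

In practice the judge is itself a model call that reads all candidate answers plus the debate transcript; the majority vote here is just the smallest thing that demonstrates the shape.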
4

Gitingest for Jupyter Notebook Accessibility #

jupycheck.vercel.app · 1 comment · 12:02 AM
Hi all, I'm sharing Jupycheck, an open source web tool that detects accessibility issues in Jupyter Notebooks that are either uploaded or from a GitHub repository. It also lets you remediate accessibility issues by launching the notebooks in a JupyterLite environment with our interactive Lab extension installed.

The tool is powered by jupyterlab-a11y-checker, an accessibility engine/extension that our student team has been working on for over a year at UC Berkeley. We believe accessibility should be a first-class concern in the notebook ecosystem, and we hope our tools can help raise awareness and make notebooks more accessible across the community.

Support us on GitHub if you find the tool useful!

3

Bus Core 1.0.3 – Local-first manufacturing system for small shops #

buscore.ca · 0 comments · 12:44 AM
I’ve been building BUS Core, a local-first manufacturing/workshop system aimed at small makers and shop-style operations.

Version 1.0.3 is out today.

This release focused on hardening and UI cleanup more than feature expansion. The goal is to make the software more trustworthy in day-to-day use, not just more featureful.

The general product thesis is that there’s a gap between spreadsheets and heavy SaaS/ERP for small operators who want control over their own data and workflows.

It’s local-first, practical, and intentionally boring in the parts that should be boring.

Happy to answer questions about:

- architecture
- local-first tradeoffs
- workflow scope
- how I'm handling the build/process side

3

MCP server for ICD-10 and SNOMED clinical coding #

github.com · 0 comments · 3:53 AM
Hi HN,

I built an MCP server that exposes an API for automated clinical coding.

Repo: https://github.com/fcggamou/autoicd-mcp

It allows AI assistants that support the Model Context Protocol (MCP) to convert clinical text into structured medical codes like ICD-10 and SNOMED-CT.

Example use cases:

- coding diagnoses from clinical notes
- extracting structured codes from medical documentation
- integrating medical coding into LLM workflows
- healthcare data pipelines

Example prompt with an MCP-enabled assistant:

“Convert this clinical note into ICD-10 codes”

The server then calls the AutoICD API and returns structured codes.

The goal is to make it easy to plug medical coding into AI agents and tools.

Would love feedback from anyone working on healthcare AI, medical NLP, or MCP tooling.

3

AutoICD API – AI clinical coding platform for ICD-10 and SNOMED #

autoicdapi.com · 0 comments · 4:00 AM
Hi HN,

I built AutoICD, an AI-powered clinical coding platform that converts unstructured medical text into ICD-10 and SNOMED-CT codes. This is not an LLM wrapper. The platform uses a multi-layer machine learning architecture internally, combining custom-trained models with curated medical knowledge.

Platform and tooling:

- JS SDK – https://github.com/fcggamou/autoicd-js
- Python SDK – https://github.com/fcggamou/autoicd-python
- MCP Server – https://github.com/fcggamou/autoicd-mcp

Use cases and benefits:

- Automated ICD-10 and SNOMED coding from clinical notes
- Creation of structured datasets for research and analytics
- Integration with AI assistants via MCP
- Scalable pipelines optimized for real-world healthcare data
- Access to ICD-10 codes and metadata programmatically

Feedback from anyone working on medical AI, clinical NLP, or MCP tooling is welcome.

3

Jurassic Park Unix System Kubernetes Viewer #

github.com · 2 comments · 7:11 AM
I made an app that allows you to view Kubernetes resources just like the unix system in Jurassic Park :) Unlikely to be used for anything serious, but with the tools available today I couldn't let the idea slip.
3

I built a proxy that keeps RAG working while hiding PII #

0 comments · 1:46 PM
Hey HN,

When you send real documents or customer data to LLMs, you face a painful tradeoff:

- Send raw text → privacy disaster
- Redact with [REDACTED] → embeddings break, RAG retrieval fails, multi-turn chats become useless, and the model often refuses to answer questions about the redacted entities.

The practical solution is consistent pseudonymization: the same real entity always maps to the same token (e.g. “Tata Motors” → ORG_7 everywhere). This preserves semantic meaning for vector search and reasoning, then you rehydrate the response so the provider never sees actual names, numbers or addresses.
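Consistent pseudonymization is easy to sketch, even though Cloakpipe itself is a Rust proxy with an encrypted vault. A toy Python version of the same-entity-same-token idea (class and method names are made up for illustration):

```python
class PseudoVault:
    """Same real entity -> same token, every time; reversible on the way back."""
    def __init__(self):
        self.fwd = {}   # entity -> token
        self.rev = {}   # token  -> entity
        self.n = 0

    def tokenize(self, entity, kind="ORG"):
        if entity not in self.fwd:
            self.n += 1
            token = f"{kind}_{self.n}"
            self.fwd[entity] = token
            self.rev[token] = entity
        return self.fwd[entity]

    def cloak(self, text, entities):
        """Replace known entities before the text leaves for the LLM."""
        for e in entities:
            text = text.replace(e, self.tokenize(e))
        return text

    def rehydrate(self, text):
        """Swap tokens back into the provider's response."""
        for token, entity in self.rev.items():
            text = text.replace(token, entity)
        return text

v = PseudoVault()
cloaked = v.cloak("Tata Motors reported growth. Tata Motors also ...",
                  ["Tata Motors"])
restored = v.rehydrate("Summary: ORG_1 grew.")
```

Because the mapping is stable, embeddings of cloaked chunks stay mutually consistent, which is what keeps vector retrieval working.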

I got fed up fighting this with Presidio + custom glue (truncated RAG chunks, declension in Indian languages, fuzzy merging for typos/siblings, LLM confusion, percentages breaking math). So I built Cloakpipe as a tiny single-binary Rust proxy.

It does:

- Multi-layer detection (regex + financial rules + optional GLiNER2 ONNX NER + custom TOML)
- Consistent reversible mapping in an AES-256-GCM encrypted vault (memory zeroized)
- Smart rehydration that survives truncated chunks like [[ADDRESS:A00
- Built-in fuzzy resolution for typos and similar names
- Numeric reasoning mode so percentages still work for calculations

Fully open source (MIT), zero Python dependencies, <5 ms overhead.

Repo: https://github.com/rohansx/cloakpipe
Demo & quick start: https://app.cloakpipe.co/demo

Would love feedback from anyone who has audited their RAG data flow or is struggling with the redaction-vs-semantics problem — especially in legal, fintech, or non-English workflows.

What approaches have you landed on?

3

VaultLeap – USD accounts for founders outside the US #

vaultleap.com · 2 comments · 3:07 PM
I'm Greg, co-founder of VaultLeap.

Built this for founders who can't get a US bank account. USD/EUR/MXN accounts with real ACH routing numbers and we have Visa cards coming soon.

If you've been cut off from Mercury or similar recently, DM me — happy to help some founders out.

3

Riventa.Dev – AI-native DevOps that acts, not just alerts #

riventa.dev · 0 comments · 3:17 PM
Hi HN,

Most DevOps tools are good at observing — they collect data, surface metrics, and send alerts. But the actual decision and action still falls on the engineer.

So I built Riventa.Dev — a DevOps platform where the AI (Riv) doesn't just surface data, it acts.

What Riv does today:

- Automatic PR review on every push — no manual trigger, no GitHub Actions boilerplate
- Predictive failure detection — catches patterns that historically cause prod failures
- DORA metrics dashboard with real pipeline data (MTTR, Deployment Frequency, Change Failure Rate)
- Security scanning: SAST, SBOM, dependency analysis — built in, not bolted on
- Works with GitHub, GitLab, and Bitbucket

Built solo, from scratch, with a focus on keeping things simple for the end user.

What I'd love feedback on: Is the AI-first positioning clear? Where does the UX feel rough?

Free to try — no credit card required.

3

A2Apex – Test, certify, and discover trusted A2A agents #

a2apex.io · 2 comments · 4:10 PM
Hey HN,

I built A2Apex (https://a2apex.io) — a testing and reputation platform for AI agents built on Google's A2A protocol.

The problem: AI agents are everywhere, but there's no way to verify they actually work. No standard testing. No directory of trusted agents. No reputation system.

What A2Apex does:

- Test — Point it at any A2A agent URL. We run 50+ automated compliance checks: agent card validation, live endpoint testing, state machine verification, streaming, auth, error handling.

- Certify — Get a 0-100 trust score with Gold/Silver/Bronze badges you can embed in your README or docs.

- Get Listed — Every tested agent gets a public profile page in the Agent Directory with trust scores, skills, test history, and embeddable badges.

Think of it as SSL Labs (testing) + npm (directory) + LinkedIn (profiles) — for AI agents.

Stack: Python/FastAPI, vanilla JS, SQLite. No frameworks, no build tools. Runs on a Mac mini in Wyoming.

Free: 5 tests/month. Pro: $29/mo. Startup: $99/mo. Try it at https://app.a2apex.io

I'm a dragline operator at a coal mine — built this on nights and weekends using Claude. Would love feedback from anyone building A2A agents or thinking about agent interoperability.

3

Baltic security monitor from public data sources #

estwarden.eu · 0 comments · 5:44 PM
People around me started repeating stuff from various psyop campaigns on TikTok or other social media they consume.

Living in the Baltics, the fearmongering is basically 24/7 and comes from everywhere: constant targeted Russian disinfo campaigns run through chains of locals and social media, plus bloggers chasing hype with clickbait posts. It was driving me mad, and it's distracting and annoying when someone close to you gets hooked on one of these posts and you waste time explaining why it's BS.

So I took my slopmachine, did some manual tweaking here and there, and made this dashboard. The main metric is a daily 0-100 threat score, which is just weighted sums and thresholds - no ML yet.
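A weighted-sum-and-thresholds score is about as simple as it sounds. A sketch with made-up event types and weights (the dashboard's real ones differ):

```python
# Illustrative event weights -- not the dashboard's actual values.
WEIGHTS = {"airspace_violation": 30, "gps_jamming": 15,
           "naval_activity": 10, "disinfo_spike": 5}

def threat_score(events):
    """Daily 0-100 score: weighted event counts, clamped at 100."""
    raw = sum(WEIGHTS.get(kind, 0) * count for kind, count in events.items())
    return min(raw, 100)

def level(score):
    """Bucket the score into display bands (thresholds are assumptions)."""
    return "high" if score >= 70 else "elevated" if score >= 40 else "low"

today = threat_score({"gps_jamming": 2, "disinfo_spike": 3})
```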

3

MoneyOnFIRE – FI date and action plan (v2) #

moneyonfire.com · 0 comments · 5:55 PM
A few months ago we posted here and got a lot of insightful feedback. This is what we built from it.

MoneyOnFIRE answers two questions: when can you reach financial independence, and what should you do to get there the fastest? It runs a financial simulation across income, taxes, accounts, contributions, returns, and withdrawals, then produces a prioritized action checklist with specific dollar amounts, dates, and steps.
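The core of any FI-date engine is a compounding loop against a withdrawal-rate target. A deliberately crude sketch that ignores taxes, account types, and sequence-of-returns risk, all of which the real engine models:

```python
def years_to_fi(savings, annual_contrib, annual_spend,
                real_return=0.05, swr=0.04):
    """Grow a portfolio until it covers annual spending at the chosen
    safe withdrawal rate (4% SWR implies a 25x-spending target)."""
    target = annual_spend / swr
    years = 0
    while savings < target and years < 100:
        savings = savings * (1 + real_return) + annual_contrib
        years += 1
    return years

# Example: $100k saved, $30k/yr contributions, $40k/yr spending.
y = years_to_fi(savings=100_000, annual_contrib=30_000, annual_spend=40_000)
```

The interesting part of a real planner is everything this omits: contribution ordering across account types, Roth conversion ladders, and withdrawal sequencing.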

Several of the biggest improvements came directly from comments on the last HN thread:

Rental property support: The engine now models rental income, mortgages, appreciation, and how properties interact with the rest of a financial plan.

Scenario modeling: You can now compare how different choices — lower returns, working longer, adjusting spending — affect your FI timeline side by side.

No login required: Several people didn't want to create an account or store financial data. You can now run a full plan without signing up.

FI vs FIRE: We initially built for the early-retirement crowd. Feedback showed it's just as useful for anyone pursuing financial independence on a longer timeline — the calculations and actions are the same.

Also shipped: support for multiple children and college timelines, Roth conversion ladders, IRA strategy selection, umbrella and term life insurance sizing, and dynamic reports that update as your inputs change.

The core thesis hasn't changed: personal finances are a complex web of interacting rules and calculations. We want to solve that and give everyone a clear, ordered set of actions they can actually implement.

Happy to answer questions about the engine or the modeling decisions behind it.

3

Raccoon AI – Collaborative AI Agent for Anything #

raccoonai.tech · 1 comment · 6:20 PM
Hey HN, I'm Shubh, Co-Founder of Raccoon AI.

Raccoon AI is like having something between Claude Code and Cursor in the web.

The agent has its own computer with a terminal, browser, and internet, and it is built with the right balance of collaboration and autonomy.

You can talk to it mid-task, send it more files while it's still running, or just let it go and come back to a finished result.

It's the kind of product where you open it to try one thing and end up spending two hours because you keep thinking of more things to throw at it.

The thing that most people get excited about is that sessions chain across completely unrelated task types. You can go from market research (real citations, generated charts) to raw data analysis (dump your db, ask questions) to a full interactive app, all in one conversation sharing the same context.

It has unlimited context through auto summarization, which is really good with Ace Max.

It connects to Gmail, GitHub, Google Drive, Notion, Outlook, and 40+ other tools. You can add your own via custom MCP servers.

Raccoon AI is built on top of our own agents SDK, ACE, which hit SOTA on GAIA benchmark with a score of 92.67.

A bit of background: we're a team of 3. We started about 1.5 years ago to build the best possible browser agent; after a couple of pivots we arrived at this, and we've been constantly shipping and growing since October.

Happy to go deep on the architecture or talk about the limitations and excited about the feedback.

Site: https://raccoonai.tech

2

Autoschematic is a new infra-as-code tool built on reversible computing #

github.com · 0 comments · 4:13 PM
Unlike Terraform and Pulumi, Autoschematic is built around a bidirectional (push-pull) state model. This means that it can resolve state drift by "pulling" or "pushing" (applying). This makes it a much better fit for certain use-cases where configuration drift is more common, like Snowflake. It also means you can import your existing infra automatically.
2

RAG knowledge base poisoning lab, 100% local #

github.com · 0 comments · 1:40 PM
I'm the author. The lab runs entirely on LM Studio + Qwen2.5-7B-Instruct (Q4_K_M) + ChromaDB — no cloud APIs, no GPU required, no API keys.

From zero to seeing the poisoning succeed: git clone, make setup, make attack1. About 10 minutes.

Two things worth flagging upfront:

- The 95% success rate is against a 5-document corpus (best case for the attacker). In a mature collection you need proportionally more poisoned docs to dominate retrieval — but the mechanism is the same.

- Embedding anomaly detection at ingestion was the biggest surprise: 95% → 20% as a standalone control, outperforming all three generation-phase defenses combined. It runs on embeddings your pipeline already produces — no additional model.

All five layers combined: 10% residual.
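The centroid-distance idea behind ingestion-time embedding anomaly detection can be sketched simply. This is a simplified stand-in for the lab's actual detector (the threshold and toy 2-d vectors are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def anomalous(embedding, corpus, threshold=0.5):
    """Flag a document at ingestion if its embedding sits far from the
    corpus centroid -- using vectors the pipeline already produces."""
    dim = len(corpus[0])
    centroid = [sum(v[i] for v in corpus) / len(corpus) for i in range(dim)]
    return cosine(embedding, centroid) < threshold

# Toy corpus of benign embeddings clustered near one direction.
corpus = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
```

The appeal is exactly what the post notes: no additional model, just a distance check on embeddings computed anyway.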

Full attack breakdown and defense architecture: https://aminrj.com/posts/rag-document-poisoning/

Happy to discuss methodology, the PoisonedRAG comparison, or anything that looks off.

2

An application stack Claude coded directly in LLVM IR #

github.com · 0 comments · 5:41 PM
This repo is the result of a debate about what kind of programming language might be appropriate if humans are no longer the primary authors. Initially the thought was "LLMs can just generate binaries directly" (this was before a more famous person had the same idea). But that on reflection seems like a bad approach because languages exist to capture program semantics that are elided by translation to machine code. The next step was to wonder if an existing "machine readable" program representation can be the target for LLM code generation. It turns out yes. This project is the result of asking Claude to create an application stack entirely coded in LLVM's intermediate representation language.
2

SmartClip – fix multi-line shell commands before they hit your terminal #

github.com
0 comments · 1:24 PM · View on HN
I kept copying multi-line commands from ChatGPT/Claude/READMEs and getting `command not found` errors when pasting into my terminal. Bracketed paste mode doesn't help — it prevents line-by-line execution, but the content itself still arrives broken (stray `$` prompts, split continuations, operators across lines).

SmartClip hooks into your shell's paste widget (zsh, bash, fish) and silently fixes multi-line commands before the shell sees them. You paste with Cmd+V as usual — no new keybindings, no daemon, no background process.

It uses score-based heuristics to detect shell commands (so it won't mangle your JSON or prose), joins lines intelligently (backslash continuations, pipes, `&&`), strips prompt characters, and validates everything with `bash -n` before inserting. If it's not confident or the fix has invalid syntax, it passes through unchanged.
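
As a rough illustration of the joining logic (a Python sketch of the idea only; the actual tool is ~150 lines of bash and also runs the `bash -n` validation step, omitted here):

```python
import re

def clean_paste(text):
    """Strip leading '$ ' prompt characters, then join backslash
    continuations and lines ending in a pipe or logical operator."""
    lines = [re.sub(r'^\s*\$\s?', '', l) for l in text.splitlines()]
    out = []
    for line in lines:
        if out and (out[-1].endswith('\\')
                    or out[-1].rstrip().endswith(('|', '&&', '||'))):
            prev = out.pop()
            if prev.endswith('\\'):
                prev = prev[:-1]          # drop the continuation backslash
            out.append(prev.rstrip() + ' ' + line.strip())
        else:
            out.append(line)
    return '\n'.join(out)
```
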

~150 lines of bash. Zero dependencies.

`brew install akshaydeshraj/smartclip` or `npm install -g smartclip-cli`

2

Imgfprint – deterministic image fingerprinting library for Rust #

0 comments · 1:14 PM · View on HN
GitHub: https://github.com/themankindproject/imgfprint-rs

imgfprint is a Rust library for deterministic image fingerprinting and image similarity detection.

Features:

- perceptual hashing
- exact hashing
- optional CLIP embeddings
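
For readers unfamiliar with perceptual hashing, the classic "average hash" idea behind such libraries looks like this (an illustrative Python sketch operating on an already-downscaled grayscale grid, not imgfprint's implementation):

```python
def average_hash(pixels):
    """Perceptual 'average hash': bit i is 1 if pixel i is above the mean.
    Similar images produce hashes with a small Hamming distance."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))
```
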

2

Cloud to Desktop in the Fastest Way #

nativedesktop.com
0 comments · 5:10 PM · View on HN
Native Desktop is a toolkit for building native desktop applications using modern web technologies without dealing with the usual complexity of desktop tooling. It focuses on providing a simple developer experience where you can scaffold, build, and distribute desktop apps using familiar workflows and a modular package ecosystem. Instead of forcing developers to manage complicated native environments, Native Desktop provides a CLI and a set of packages that handle the heavy lifting while keeping projects flexible and maintainable.

The goal is to let developers move from an idea to a working desktop application quickly while still having full control over architecture and distribution. The project is designed for developers who already build with modern web stacks and want a straightforward way to turn those applications into desktop software without reinventing the entire toolchain.
1

Free API mock server from your OpenAPI spec (no sign-up) #

apinotes.io
0 comments · 3:10 PM · View on HN
Hi HN, I built ApiNotes Mock Server, a small tool that generates a live mock REST API from an OpenAPI (3.0/3.1) or Swagger 2.0 spec.

The goal was to remove all friction when you just need a mock API quickly:

- No sign-up required to create an anonymous mock
- Supports paste, file upload, or URL fetch
- Produces a base URL + an auto-generated endpoints list
- Anonymous mocks currently expire after 72 hours (you can register to keep them / get higher limits)

Link: https://apinotes.io/mock-server
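
The core trick in tools like this is walking the spec's schemas to synthesize example payloads. A minimal sketch of that idea (my illustration, not ApiNotes' implementation; handles only a small subset of JSON Schema):

```python
def example_from_schema(schema):
    """Generate a sample value from a (subset of an) OpenAPI schema,
    preferring an explicit 'example' when the spec provides one."""
    if "example" in schema:
        return schema["example"]
    t = schema.get("type", "object")
    if t == "object":
        return {k: example_from_schema(v)
                for k, v in schema.get("properties", {}).items()}
    if t == "array":
        return [example_from_schema(schema.get("items", {}))]
    return {"string": "string", "integer": 0,
            "number": 0.0, "boolean": True}.get(t)
```
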

I’d love feedback on:

- What would make you trust a mock server like this for real projects?
- Any features you’d expect (auth simulation, latency/errors, stateful mocks, webhooks, etc.)?
- Is the 72-hour model reasonable, or should the free tier work differently?

Thanks, happy to answer any questions and share implementation details.

1

Arkadia – AI characters based on real animals #

arkadia.lexisark.com
0 comments · 1:28 PM · View on HN
Hi HN,

I've been working on a small project called Arkadia.

The idea started when I put a collar camera on my dog and experimented with using AI to narrate things from her point of view. That led me down a rabbit hole thinking about animal personalities and how people might interact with them.

Arkadia is a conversational AI app where you can chat with characters inspired by real animals.

The goal is to make it feel like discovering animals through conversation rather than interacting with a generic chatbot.

It's still early, but we have a few hundred people using it while I test conversation quality, memory, and latency.

One direction I'm exploring is using AI as a bridge to the real world. For example, after chatting with a character inspired by a specific breed or animal, you could discover farms, shelters, or places nearby where you could actually meet animals in real life.

Curious what the HN community thinks.

1

blunder.clinic, realistic daily chess puzzles #

blunder.clinic
0 comments · 6:21 PM · View on HN
Today, I launched blunder.clinic, a daily chess puzzle app that gives you realistic positions in which to try not to blunder. These are similar to traditional chess puzzles (i.e., tactics), but differ in a few key ways.

There are two popular ways to self-study chess: solving tactics and following along with professional games or an engine. These are obviously helpful, but both have downsides.

When solving puzzles, simply knowing you are in a puzzle biases you toward looking for specific types of moves (checkmates, queen sacrifices, etc.). But in a real game, you don't know which positions actually contain tactics, so you can waste time hunting for them or, even worse, blunder by assuming a tactic exists when it doesn't.

When following along with an engine, there are tons of positions where an engine comes up with a move that you simply would never have seen and can't possibly understand. These are very low signal for learners, and it is hard to differentiate between positions like that and high-signal positions that are on the edge of your ability.[1]

blunder.clinic addresses both of these problems by giving you positions where players of your skill level actually blundered, but where the best move is not too far beyond your ability to understand and learn from. We do this by leveraging Stockfish for positional evaluation and Maia[1] for difficulty evaluation.

Overall, the main purpose of blunder.clinic is to help you stop blundering easy positions!

You can read a bit more about it here: https://mcognetta.github.io/posts/blunder-clinic/

[1]: Maia (https://www.maiachess.com/) is a family of chess models trained on real games. The inputs are a board position and a player rating, and the output is a probability distribution of moves. You can use this to answer queries like "How likely do we think a player of XYZ rating would pick the best move?"
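
A hypothetical sketch of how those two signals might combine into a puzzle filter (my illustration; the names, thresholds, and logic here are assumptions, not the site's published method):

```python
def is_good_puzzle(eval_loss_cp, p_best_at_rating,
                   min_loss=150, min_p=0.15, max_p=0.7):
    """A real blunder occurred (large Stockfish centipawn loss), and a
    Maia-style model says the best move is findable at the player's
    rating, yet not so likely that the position is trivial."""
    return eval_loss_cp >= min_loss and min_p <= p_best_at_rating <= max_p
```
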

1

PromptSonar – Static analysis for LLM prompt security #

github.com
1 comment · 1:28 PM · View on HN
I built PromptSonar because I kept seeing LLM security discussions focus entirely on runtime interception — but nobody was scanning the prompt strings written directly into source code before they ship.

PromptSonar is a static analyzer that scans your codebase for prompt injection, jailbreaks, PII leaks, and privilege escalation patterns in LLM prompt strings. It works across TypeScript, JavaScript, Python, Go, Rust, Java, and C#.

What it catches:

- Direct prompt injection and jailbreak patterns
- Unicode evasion: Cyrillic homoglyphs, zero-width character injection, Base64-encoded jailbreaks
- PII exposure in prompts (SSN, credit card, API keys)
- Privilege escalation and role manipulation
- RAG poisoning patterns
- Insecure output handling
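
To give a flavor of the Unicode-evasion checks described above, here is a minimal sketch of the idea (my illustration, not PromptSonar's implementation): flag zero-width characters and Latin text that mixes in Cyrillic letters.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def suspicious_unicode(s):
    """Return a list of findings: zero-width characters and
    mixed Latin/Cyrillic text (a common homoglyph evasion)."""
    findings = []
    if any(ch in ZERO_WIDTH for ch in s):
        findings.append("zero-width character")
    scripts = set()
    for ch in s:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                scripts.add("cyrillic")
            elif name.startswith("LATIN"):
                scripts.add("latin")
    if {"cyrillic", "latin"} <= scripts:
        findings.append("mixed Latin/Cyrillic")
    return findings
```
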

Maps findings to OWASP LLM Top 10. Outputs SARIF v2.1.0 for GitHub Code Scanning integration. 100% local, zero telemetry, no API calls.

Available as VS Code extension, CLI, and GitHub Action.

npx @promptsonar/cli scan ./src

I wrote up the Unicode evasion detection methodology separately if anyone is interested in how the normalization pipeline works: https://medium.com/@meghal86/detecting-unicode-homoglyph-and...

1

A test harness that blocks unsafe AI actions before execution #

0 comments · 5:13 PM · View on HN
I built a small test harness that evaluates AI actions before they execute.

Instead of relying only on prompts or output filtering, this introduces an authorization layer that evaluates whether an AI action should be allowed before it runs.

Each requested action is analyzed for signals such as:

• financial actions
• external communications
• data exports
• system modification
• destructive operations

Based on the detected signals and required authorization layers, the harness determines whether the action should PASS or DENY.

Example output:

Running 14 tests...

[1/14] financial_commitment -> DENY
[2/14] send_external_email -> DENY
[3/14] deploy_to_production -> DENY
[14/14] general_information -> PASS

Every evaluation produces an auditable record including:

• detected signals
• required authorizations
• PASS / DENY decision

The goal is to explore what a deterministic execution governance layer might look like for AI systems interacting with real environments.
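
The described flow can be condensed into a toy sketch (the signal keywords and names here are my illustration, not the repo's actual code): classify an action by keyword signals, then DENY unless every detected signal is explicitly authorized.

```python
SIGNALS = {
    "financial": ["payment", "purchase", "transfer"],
    "external_comms": ["email", "post", "publish"],
    "destructive": ["delete", "drop table", "rm -rf"],
}

def evaluate(action, granted=()):
    """Deterministic pre-execution check: detect signals, then
    PASS only if every detected signal has been authorized."""
    text = action.lower()
    detected = [name for name, keywords in SIGNALS.items()
                if any(k in text for k in keywords)]
    decision = "PASS" if all(name in granted for name in detected) else "DENY"
    # Every evaluation yields an auditable record.
    return {"action": action, "signals": detected, "decision": decision}
```
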

Demo video walkthrough: https://www.linkedin.com/feed/update/urn:li:activity:7436787...

Repository: https://github.com/celestinestudiosllc/ai-action-authorizati...

Curious how others building agent systems or AI runtimes are approaching execution authorization.