TinyFish Bigset: Open-Source Web Data Agent

TinyFish Bigset launched June 2 as an AGPL-3.0 open-source multi-agent system that turns a plain-English sentence into a structured dataset pulled from the live web, then refreshes it on a schedule. It can be self-hosted with Docker, and a hosted free tier covers 2,500 row operations per month.

What Happened

TinyFish, the San Francisco company behind a stack of enterprise web-agent APIs, released Bigset as its first open-source product. The pitch on the project README is direct: build and maintain any dataset from the live web, with refresh cadences from 30 minutes to weekly. You describe the dataset you want ("top 50 image generation models on Hugging Face by likes this week"), Bigset infers the schema, dispatches autonomous research agents, verifies findings against sources, deduplicates, and exports CSV or XLSX.
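The last stages of that pipeline are easy to picture in code. What follows is a minimal Python sketch, not Bigset's implementation. It assumes agent findings arrive as dicts. It treats a row as verified only if it cites an http(s) source URL, which is a crude, hypothetical stand-in for real source verification. It then deduplicates on a case-insensitive `name` column and serializes the result to CSV. All function names, column names and sample values are illustrative.

```python
import csv
import io


def keep_sourced(rows):
    """Hypothetical stand-in for verification: drop rows without an http(s) source."""
    return [r for r in rows if r.get("source", "").startswith(("http://", "https://"))]


def dedupe(rows, key="name"):
    """Keep the first row for each case- and whitespace-insensitive key value."""
    seen = set()
    unique = []
    for row in rows:
        k = row[key].strip().lower()
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical agent findings: one duplicate (different casing) and one unsourced row.
findings = [
    {"name": "model-a", "likes": "120", "source": "https://example.com/a"},
    {"name": "Model-A ", "likes": "120", "source": "https://example.com/a"},
    {"name": "model-b", "likes": "80", "source": ""},
]
rows = dedupe(keep_sourced(findings))
print(to_csv(rows, ["name", "likes", "source"]))
```

Keeping the first occurrence is the simplest deduplication policy. A real system would more likely merge conflicting values or prefer the most recently verified source.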

What You Can Do With It

For creators tracking the AI tool landscape, this is a credible self-hostable answer to "how do I keep a live spreadsheet of the things I care about?" Here are three workflows you can set up with Bigset in under an hour:

Track a topic for content research. Spin up a dataset of "AI music generation models released in 2026 with pricing tier and licensing terms" and have it refresh every 24 hours. Pull the CSV into a research doc before each newsletter or video script.

Maintain a tools roundup page. Feed a weekly-refresh dataset into a static site to keep a "Best AI [category] tools" page current without manual edits.

Build a competitor pricing watcher. Set Bigset to refresh a pricing dataset every 12 hours, then diff the exports to catch tier changes in tools you cover.
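The diff step in the pricing watcher can be a short script. This is a minimal Python sketch. It assumes each export is a CSV with a `tool` column that uniquely identifies a row; the column names and sample data are hypothetical. It reports three things:

- tools that appeared;
- tools that disappeared;
- per-field changes, such as a price or tier edit.

```python
import csv
import io


def load_export(csv_text, key):
    """Index a CSV export by its key column (later duplicates overwrite earlier ones)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_exports(old_csv, new_csv, key="tool"):
    """Return (added, removed, changed) between two exports of the same dataset."""
    old = load_export(old_csv, key)
    new = load_export(new_csv, key)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for name in sorted(old.keys() & new.keys()):
        # Compare across the union of columns so added or dropped columns also show up.
        cols = sorted(old[name].keys() | new[name].keys())
        fields = {
            col: (old[name].get(col), new[name].get(col))
            for col in cols
            if old[name].get(col) != new[name].get(col)
        }
        if fields:
            changed[name] = fields
    return added, removed, changed


old_export = "tool,tier,price\nAlpha,Pro,$20\nBeta,Free,$0\n"
new_export = "tool,tier,price\nAlpha,Pro,$25\nGamma,Pro,$15\n"
added, removed, changed = diff_exports(old_export, new_export)
print(added)    # ['Gamma']
print(removed)  # ['Beta']
print(changed)  # {'Alpha': {'price': ('$20', '$25')}}
```

Run it against consecutive 12-hour exports. Three empty results mean no tool was added or dropped and no tier or price moved between refreshes.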

Why It Matters

Most "research agent" releases this year have been closed-source SaaS pitched at sales or finance teams. Bigset is the opposite shape: AGPL-3.0, Docker-deployable, and explicitly built around two-tier orchestration (one orchestrator agent plus sub-agents) with hard limits (each sub-agent is capped at six tool calls). The tool-call cap makes run cost predictable. The license makes the source code auditable and turns self-hosting into a real option rather than a press-release feature.

It also lands at a moment when many creators juggle browser tabs of leaderboards, pricing pages, and release feeds by hand. A scheduled, schema-aware data agent reduces that work to a prompt and a refresh interval.

Key Details

License: AGPL-3.0, self-hostable via Docker.

Free hosted tier: 2,500 row operations per month, 9 curated public datasets included (AI hiring, GPU pricing, open-source repos among them).

Stack: Next.js 16 frontend, Fastify backend, Convex database, TinyFish web-access APIs, OpenRouter for model routing.

Default models: Claude Sonnet 4.6 for schema inference, Qwen3.7-max for agent roles.

Architecture: One orchestrator agent dispatches sub-agents limited to six tool calls each, which keeps run costs and execution time bounded.
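The cost bound follows directly from the cap. With n sub-agents making at most six tool calls each, a run makes at most 6n tool calls. Below is a minimal Python sketch of that pattern. It assumes a single `search` tool and a simple follow-up loop. `run_sub_agent`, `orchestrate` and the query format are hypothetical illustrations, not Bigset's code.

```python
from typing import Callable, Dict, List, Optional

MAX_TOOL_CALLS = 6  # per-sub-agent cap described in the launch details

# Hypothetical tool: returns a finding, or None when there is nothing more to find.
SearchTool = Callable[[str], Optional[str]]


def run_sub_agent(task: str, search: SearchTool, max_calls: int = MAX_TOOL_CALLS) -> List[str]:
    """Research one task, never making more than max_calls tool calls."""
    findings: List[str] = []
    query = task
    for _ in range(max_calls):  # the loop bound is the hard budget
        result = search(query)
        if result is None:  # the tool reports nothing further; stop early
            break
        findings.append(result)
        query = f"{task} (follow-up {len(findings)})"
    return findings


def orchestrate(tasks: List[str], search: SearchTool,
                max_calls: int = MAX_TOOL_CALLS) -> Dict[str, List[str]]:
    """Dispatch one sub-agent per task; total tool calls <= len(tasks) * max_calls."""
    return {task: run_sub_agent(task, search, max_calls) for task in tasks}


# Usage: a fake tool that never runs dry, so every sub-agent hits the cap.
calls: List[str] = []


def fake_search(query: str) -> Optional[str]:
    calls.append(query)
    return f"finding for {query}"


results = orchestrate(["model pricing", "model licensing", "release dates"], fake_search)
print(len(calls))  # 18, i.e. 3 sub-agents * 6 calls
```

The sketch runs sub-agents one after another for clarity. A production orchestrator would likely run them concurrently, which changes wall-clock time but not the worst-case call count.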

What to Do Next

Clone the repo and run the Docker quickstart to test a dataset against your own creator workflow before committing to the hosted tier. Read the company's broader API documentation if you want to plug Bigset into an existing research pipeline. The original launch coverage at TestingCatalog has additional screenshots of the schema-inference flow.
