State of Creative AI Tools: March 2026 Report

AI agents gained more GitHub stars this week than image generators gained all month. That single data point captures what is happening across creative AI in March 2026: the tools creators rely on are shifting from generation to orchestration, and the numbers tell a story most trend pieces miss.

We pulled data from five public sources: 400 recent arXiv research papers across AI and computer vision, 50 trending papers and 250 top models on HuggingFace, 50 of the most popular HuggingFace Spaces, and the weekly GitHub Trending page. Here is what the numbers actually say.

Key Findings

1. AI Agents Dominate GitHub: 10 of 16 Trending Repos Are Agent Frameworks

The most striking signal in this dataset comes from GitHub. Of the 16 AI-related trending repositories this week, 10 are agent frameworks or agent-related tools. The top gainer added 23,185 stars in a single week. For context, the top text-to-speech repo (Fish Speech) gained 2,159 stars in the same period.

GitHub Trending AI Repos: Agent Frameworks vs Other Categories
Repository	Category	Total Stars	Stars Gained (Week)
AI Agency Framework	Agent Framework	55,508	+23,185
Agent Optimization System	Agent Tooling	88,325	+14,298
OpenViking Context DB	Agent Infrastructure	16,408	+10,158
Lightpanda Browser	Agent Automation	22,239	+9,984
Learn Claude Code	Agent Tooling	33,674	+7,836
Impeccable Design AI	Agent Design	10,964	+6,432
Page Agent	Web Agent	11,827	+6,243
DeepAgents	Agent Framework	15,605	+4,877
Claude HUD	Agent Tooling	8,533	+3,674
Hermes Agent	Agent Framework	9,176	+3,241
BitNet 1-bit LLMs	Model Optimization	35,886	+4,792
Fish Speech TTS	Audio Generation	28,338	+2,159

Together, the 10 agent-related repos gained 89,928 stars this week, just shy of 90,000. Research points the same way. On HuggingFace, the most upvoted trending paper (91 upvotes) was MetaClaw, an agent that meta-learns and evolves autonomously. On arXiv, "LLM agents" appeared 5 times across the 400 papers, placing it among the top keywords. Creators are not just generating content anymore. They are building systems that generate, iterate, and improve content autonomously.
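The star total can be checked directly against the table. The minimal Python sketch below transcribes the weekly gains. The `REPOS` list and its agent/non-agent flag are our own transcription of the table, not a published dataset.

```python
# Weekly star gains transcribed from the GitHub Trending table above.
# The boolean marks the 10 rows the report counts as agent-related.
REPOS = [
    ("AI Agency Framework", 23_185, True),
    ("Agent Optimization System", 14_298, True),
    ("OpenViking Context DB", 10_158, True),
    ("Lightpanda Browser", 9_984, True),
    ("Learn Claude Code", 7_836, True),
    ("Impeccable Design AI", 6_432, True),
    ("Page Agent", 6_243, True),
    ("DeepAgents", 4_877, True),
    ("Claude HUD", 3_674, True),
    ("Hermes Agent", 3_241, True),
    ("BitNet 1-bit LLMs", 4_792, False),
    ("Fish Speech TTS", 2_159, False),
]

agent_gain = sum(gain for _, gain, is_agent in REPOS if is_agent)
other_gain = sum(gain for _, gain, is_agent in REPOS if not is_agent)
agent_count = sum(1 for _, _, is_agent in REPOS if is_agent)

print(f"{agent_count} agent repos gained {agent_gain:,} stars")
print(f"Non-agent repos in the table gained {other_gain:,} stars")
print(f"Agent share of listed gains: {agent_gain / (agent_gain + other_gain):.1%}")
```

On these rows, the agent projects account for about 93% of the weekly gains listed in the table.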

2. Text-to-Image Generation Has Plateaued

No new text-to-image models entered the top 50 by downloads during the report period. The leaderboard is largely frozen. Stable Diffusion XL (2.27M downloads, released July 2023) and SD v1.5 (1.6M) hold the top two spots. FLUX.1-dev (754K downloads, 12,474 likes) and FLUX.1-schnell (710K) are close behind. The newest model in the top 10, Z-Image-Turbo from Tongyi-MAI, was released in November 2025 and ranks third with 876K downloads.

Top Text-to-Image Models by Downloads
Model	Downloads	Likes	Released
stabilityai/stable-diffusion-xl-base-1.0	2,269,426	7,539	Jul 2023
stable-diffusion-v1-5	1,595,405	1,049	Aug 2024
Tongyi-MAI/Z-Image-Turbo	876,154	4,276	Nov 2025
black-forest-labs/FLUX.1-dev	754,240	12,474	Jul 2024
black-forest-labs/FLUX.1-schnell	709,839	4,694	Jul 2024

This does not mean image generation is dead. It means the open-source image generation stack has matured. FLUX.1-dev has the highest like-to-download ratio of any model in the top 10 (1 like per 60 downloads vs SDXL's 1:301), suggesting the community considers it the quality leader even if pipeline integrations still default to SDXL. For creators, this plateau is good news: the tools are stable, well-documented, and not changing every month.
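The ratio comparison can be reproduced from the five rows in the table. Models beyond these five are not listed, so the sketch cannot confirm the claim for the full top 10. Ranking by downloads per like is our framing of the metric; lower means more likes per download.

```python
# (downloads, likes) copied from the text-to-image table above.
MODELS = {
    "stabilityai/stable-diffusion-xl-base-1.0": (2_269_426, 7_539),
    "stable-diffusion-v1-5": (1_595_405, 1_049),
    "Tongyi-MAI/Z-Image-Turbo": (876_154, 4_276),
    "black-forest-labs/FLUX.1-dev": (754_240, 12_474),
    "black-forest-labs/FLUX.1-schnell": (709_839, 4_694),
}


def downloads_per_like(downloads: int, likes: int) -> float:
    """Lower is better: fewer downloads needed to earn one like."""
    return downloads / likes


# Sort models from most-liked-per-download to least.
ranked = sorted(MODELS.items(), key=lambda item: downloads_per_like(*item[1]))
for name, (downloads, likes) in ranked:
    print(f"{name:45s} 1 like per {downloads_per_like(downloads, likes):,.0f} downloads")
```

On these rows, FLUX.1-dev ranks first at about 60 downloads per like. SD v1.5 ranks last at about 1,521.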

3. Video Generation Is the Hottest Creative Pipeline

Text-to-video had 3 new models enter the top 50, second only to audio's 4. Wan2.2 from Wan-AI dominates with 130,303 downloads for the T2V variant alone. The image-to-video model adds 43,867 and the Lightning distilled version adds 34,366. Those three total 208,536; counting further variants, Wan2.2 models account for over 350,000 downloads combined.
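The three variants named here account for only part of the combined figure. The sketch below sums them to show how much must come from Wan2.2 variants that are not itemized. The variant labels are descriptive, not exact HuggingFace repo IDs.

```python
# Download counts for the three Wan2.2 variants itemized in the text.
# Labels are descriptive placeholders, not exact HuggingFace repo IDs.
WAN22_ITEMIZED = {
    "text-to-video (T2V)": 130_303,
    "image-to-video (I2V)": 43_867,
    "Lightning distilled": 34_366,
}

itemized_total = sum(WAN22_ITEMIZED.values())
reported_combined = 350_000  # the report's "over 350,000" figure, treated as a lower bound

print(f"Itemized variants: {itemized_total:,}")
print(f"Downloads from other variants (at least): {reported_combined - itemized_total:,}")
```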

New Models by Creative Pipeline
Pipeline	New Models (Top 50)	Dominant Player	Top Downloads
Text-to-Audio	4	Meta MusicGen	1,398,448
Text-to-Video	3	Wan-AI Wan2.2	130,303
Text-to-3D	1	Microsoft TRELLIS	26,659
Text-to-Image	0	Stability AI SDXL	2,269,426
Text-Generation	0	Qwen 2.5 7B	22,065,027

On HuggingFace's trending papers, 2 of the top 3 most-upvoted are video-related: Video-CoE (83 upvotes) for event prediction and MosaicMem (69 upvotes) for controllable video world models. The research community and the open-source community are aligned: video is where the most active development is happening.

4. Qwen Owns the Open-Source LLM Pipeline

Qwen models hold 5 of the top 10 spots in text-generation by downloads. Qwen2.5-7B-Instruct leads with 22 million downloads, nearly triple Meta's Llama 3.1-8B-Instruct at 7.6 million. OpenAI's open-weight entry, gpt-oss-20b, sits at 7.5 million downloads with 4,469 likes.

Text-Generation Model Downloads: Top Contenders
Model	Organization	Downloads	Likes
Qwen2.5-7B-Instruct	Alibaba/Qwen	22,065,027	1,139
Qwen3-0.6B	Alibaba/Qwen	13,096,387	1,141
gpt2	OpenAI Community	11,448,387	3,131
Qwen2.5-1.5B-Instruct	Alibaba/Qwen	8,923,707	642
Qwen3-8B	Alibaba/Qwen	8,567,203	995
Llama-3.1-8B-Instruct	Meta	7,632,351	5,577
gpt-oss-20b	OpenAI	7,468,776	4,469
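The "nearly triple" comparison and Qwen's weight in this table can be recomputed from the rows above. This minimal sketch covers only the seven models listed, not the full top 10.

```python
# (model, organization, downloads) copied from the text-generation table above.
ROWS = [
    ("Qwen2.5-7B-Instruct", "Alibaba/Qwen", 22_065_027),
    ("Qwen3-0.6B", "Alibaba/Qwen", 13_096_387),
    ("gpt2", "OpenAI Community", 11_448_387),
    ("Qwen2.5-1.5B-Instruct", "Alibaba/Qwen", 8_923_707),
    ("Qwen3-8B", "Alibaba/Qwen", 8_567_203),
    ("Llama-3.1-8B-Instruct", "Meta", 7_632_351),
    ("gpt-oss-20b", "OpenAI", 7_468_776),
]

downloads = {model: count for model, _, count in ROWS}
lead_ratio = downloads["Qwen2.5-7B-Instruct"] / downloads["Llama-3.1-8B-Instruct"]

qwen_total = sum(count for _, org, count in ROWS if org == "Alibaba/Qwen")
table_total = sum(count for _, _, count in ROWS)

print(f"Qwen2.5-7B vs Llama 3.1-8B: {lead_ratio:.2f}x")
print(f"Qwen share of downloads in this table: {qwen_total / table_total:.0%}")
```

Qwen2.5-7B-Instruct comes out at about 2.89 times Llama 3.1-8B-Instruct. The four Qwen rows make up about two-thirds of the downloads shown.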

The Qwen3 family (released April 2025) already has models with 13.1 million and 8.6 million downloads. This is not just about quality. Across its Qwen2.5 and Qwen3 families, Qwen offers models at a wide range of sizes (0.6B, 1.5B, 1.7B, 3B, 7B, 8B), which makes it the default choice for local deployment and fine-tuning. For creators building AI-powered tools, Qwen is the practical choice for text generation that runs on consumer hardware.
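Why do these sizes matter for consumer hardware? A common rule of thumb is that model weights take roughly parameters × bits per weight / 8 bytes. The sketch below applies it to the Qwen sizes mentioned in this section. It is a back-of-envelope estimate under stated assumptions: the nominal sizes are taken from model names, and the estimate ignores the KV cache, activations, and runtime overhead. Real memory use is therefore higher.

```python
# Rough weight-memory estimate: parameters * bits_per_weight / 8 bytes.
# Assumptions: nominal parameter counts from model names; ignores KV cache,
# activations, and framework overhead, so real usage is higher.
SIZES_B = [0.6, 1.5, 1.7, 3, 7, 8]  # billions of parameters


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Billions of params * bits / 8 gives gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


for size in SIZES_B:
    fp16 = weight_gb(size, 16)
    q4 = weight_gb(size, 4)
    print(f"{size:>4}B params: ~{fp16:5.1f} GB at 16-bit, ~{q4:4.1f} GB at 4-bit")
```

By this estimate, a 7B or 8B model needs roughly 3.5 to 4 GB for 4-bit weights. That helps explain why these sizes are popular for local use. Actual requirements depend on the runtime and the context length.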

5. Research Is Shifting from Generation to Reasoning

The top arXiv keywords paint a clear picture of where AI research is heading:

Top Research Keywords Across 400 Recent arXiv Papers
Keyword	Frequency	Category
Language models	19	Foundation
Reinforcement learning	15	Reasoning/Training
Large language models	12	Foundation
Vision-language models	11	Multimodal
Gaussian splatting	6	3D/Spatial
LLM agents	5	Agents
Policy optimization	5	RL/Alignment
Reward modeling	5	RL/Alignment
Diffusion models	4	Generation
Video diffusion	4	Generation
3D reconstruction	4	3D/Spatial
Autonomous driving	4	Robotics

Reinforcement learning (15 mentions), policy optimization (5), and reward modeling (5) collectively account for 25 keyword appearances. These are training and alignment techniques aimed at making models reason and behave better, not at generating more. "Diffusion models," the backbone of image and video generation, appears only 4 times, or 8 counting "video diffusion." The research community appears to have moved past generation as its primary challenge and toward reasoning, planning, and autonomous action.
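The keyword arithmetic is easy to reproduce from the table. The sketch below sorts terms into RL and diffusion buckets following the paragraph above; this grouping is ours, not a formal taxonomy.

```python
# Keyword frequencies copied from the arXiv keyword table above.
KEYWORDS = {
    "Language models": 19,
    "Reinforcement learning": 15,
    "Large language models": 12,
    "Vision-language models": 11,
    "Gaussian splatting": 6,
    "LLM agents": 5,
    "Policy optimization": 5,
    "Reward modeling": 5,
    "Diffusion models": 4,
    "Video diffusion": 4,
    "3D reconstruction": 4,
    "Autonomous driving": 4,
}

# Our grouping of terms into the two buckets the text compares.
RL_TERMS = ["Reinforcement learning", "Policy optimization", "Reward modeling"]
GEN_TERMS = ["Diffusion models", "Video diffusion"]

rl_total = sum(KEYWORDS[k] for k in RL_TERMS)
gen_total = sum(KEYWORDS[k] for k in GEN_TERMS)

print(f"RL/alignment keywords: {rl_total}")
print(f"Diffusion keywords: {gen_total}")
print(f"RL-to-diffusion ratio: {rl_total / gen_total:.1f}x")
```

RL-related terms outnumber diffusion-related terms about 3 to 1 (25 vs 8).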

6. 3D Generation Is Early but Accelerating

Text-to-3D had 1 new model in the top 50, and its downloads are far lower than those of other pipelines. Microsoft TRELLIS leads with just 26,659 downloads versus 2.27 million for the top image model, roughly 85 times fewer. But the signals of acceleration are clear.
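To put the gap in perspective, the sketch below compares each pipeline's top-model downloads with TRELLIS. It uses the figures from the pipeline summary in the Full Data section. Expressing the gap as a base-10 logarithm is our choice of framing.

```python
import math

# Top-model downloads per pipeline, copied from the pipeline summary table.
TOP_DOWNLOADS = {
    "Text-Generation": 22_065_027,
    "Text-to-Image": 2_269_426,
    "Text-to-Audio": 1_398_448,
    "Text-to-Video": 130_303,
    "Text-to-3D": 26_659,
}

baseline = TOP_DOWNLOADS["Text-to-3D"]
for pipeline, downloads in TOP_DOWNLOADS.items():
    ratio = downloads / baseline
    # log10 of the ratio expresses the gap in orders of magnitude.
    print(f"{pipeline:16s} {ratio:8.1f}x text-to-3D ({math.log10(ratio):.1f} orders of magnitude)")
```

The image gap is about 85x, just under two orders of magnitude. Text-to-video is under 5x, so the gap is steepest against text generation, image, and audio.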

TRELLIS holds 3 of the top 6 text-to-3D model slots (xlarge, large, base variants). Hunyuan3D-2 from Tencent has 3,236 likes on HuggingFace Spaces. Gaussian splatting appeared 6 times in arXiv keywords, and 3D reconstruction appeared 4 times. The LoST paper (Level of Semantics Tokenization for 3D Shapes) was trending on HuggingFace with 14 upvotes.

For creators, 3D generation is not production-ready for most workflows, but it is crossing the threshold from research curiosity to functional tool. TRELLIS running locally is the closest thing 3D has had to a "Stable Diffusion moment."

7. Audio Generation: ACE-Step Challenges Meta's MusicGen

Text-to-audio had the most new models (4) of any pipeline. Meta's MusicGen-medium still dominates downloads at 1.4 million, but ACE-Step 1.5 (released January 2026) has already reached 32,987 downloads with 649 likes, the highest like count of any audio model released in the past year.

Audio Generation Model Landscape
Model	Downloads	Likes	Released
Meta MusicGen-medium	1,398,448	158	Jun 2023
Meta MusicGen-small	117,976	480	Jun 2023
ACE-Step 1.5	32,987	649	Jan 2026
Stable Audio Open 1.0	31,020	1,426	May 2024
Meta MusicGen-large	24,306	525	Jun 2023
MOSS SoundEffect	6,431	41	Feb 2026

On GitHub, Fish Speech (28,338 stars, +2,159 this week) is the top open-source TTS project. Audio is following the same trajectory image generation took in 2023-2024: a dominant incumbent (MusicGen) challenged by specialized newcomers. These are ACE-Step for music, Fish Speech for voice, and Stable Audio Open for sound design.

8. The HuggingFace Spaces Ecosystem Reveals What Creators Actually Use

The most-liked HuggingFace Spaces, drawn from the top 20 by likes, hint at which tools creators come back to:

Top HuggingFace Spaces by Likes (Creative Categories)
Space	Category	Likes	SDK
Open LLM Leaderboard	Benchmarking	13,904	Docker
AI Comic Factory	Image + Story	10,995	Docker
Kolors Virtual Try-On	Image/Fashion	10,011	Gradio
FLUX.1-dev	Image Generation	9,405	Gradio
MusicGen	Audio Generation	5,068	Gradio
FLUX.1-schnell	Image Generation	5,046	Gradio
Wan2.2-Animate	Video Generation	4,986	Gradio
TRELLIS	3D Generation	4,776	Gradio
Hunyuan3D-2	3D Generation	3,236	Gradio
Kokoro-TTS	Voice/TTS	3,232	Gradio

Three patterns: First, benchmarking tools (Open LLM Leaderboard, MTEB, LM Arena) are among the most-liked, showing creators care about model selection, not just model usage. Second, practical creative tools (AI Comic Factory, Virtual Try-On, LivePortrait) outperform raw model demos. Third, every major creative modality now has at least one Space with 3,000+ likes: images (FLUX), video (Wan2.2), audio (MusicGen), voice (Kokoro-TTS), 3D (TRELLIS, Hunyuan3D).
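The third pattern can be checked by taking the most-liked Space in each category of the table. The sketch treats the table's category labels as modalities, which is a simplification: "Image + Story" and "Image/Fashion" are counted separately from "Image Generation."

```python
# (Space, category, likes) copied from the Spaces table above.
SPACES = [
    ("Open LLM Leaderboard", "Benchmarking", 13_904),
    ("AI Comic Factory", "Image + Story", 10_995),
    ("Kolors Virtual Try-On", "Image/Fashion", 10_011),
    ("FLUX.1-dev", "Image Generation", 9_405),
    ("MusicGen", "Audio Generation", 5_068),
    ("FLUX.1-schnell", "Image Generation", 5_046),
    ("Wan2.2-Animate", "Video Generation", 4_986),
    ("TRELLIS", "3D Generation", 4_776),
    ("Hunyuan3D-2", "3D Generation", 3_236),
    ("Kokoro-TTS", "Voice/TTS", 3_232),
]

# Keep the most-liked Space per category.
best = {}
for name, category, likes in SPACES:
    if category not in best or likes > best[category][1]:
        best[category] = (name, likes)

for category, (name, likes) in sorted(best.items(), key=lambda kv: -kv[1][1]):
    flag = "yes" if likes >= 3_000 else "no"
    print(f"{category:18s} {name:22s} {likes:>6,}  3,000+ likes: {flag}")
```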

Trend Analysis

Rising: AI agents and autonomous workflows (10/16 GitHub trending repos), reinforcement learning for LLM alignment (25 arXiv keyword hits), video generation (3 new models, 2 of top 3 trending papers), open-source audio challengers (ACE-Step, Fish Speech).

Stable: Text-to-image generation (mature, 0 new top-50 models), text generation dominated by Qwen (5 of top 10), benchmark and evaluation tools (top HF Spaces by likes).

Emerging: 3D generation (TRELLIS gaining traction, Gaussian splatting in 6 keyword appearances), vision-language-action models (spatial reasoning papers trending), 1-bit model optimization (Microsoft BitNet gaining 4,792 stars this week).

Declining as standalone categories: Pure text generation research (the field is moving to multimodal and agentic), single-purpose generation tools (being replaced by agent-orchestrated pipelines).

Predictions

1. Agent frameworks will consolidate by Q3 2026. Ten agent-related projects trending on GitHub at the same time is unsustainable. Expect 2-3 winners to absorb the rest through community adoption, similar to how LangChain dominated the LLM toolkit space in 2023.

2. Wan2.2 will become the "Stable Diffusion of video" within 6 months. With 350,000+ combined downloads and multiple community variants already shipping, it has the early adoption curve that SDXL had in mid-2023.

3. Qwen will maintain open-source LLM dominance through 2026. Its strategy of releasing models at many sizes (0.6B to 72B) creates lock-in at the fine-tuning and deployment level that competitors cannot easily match.

4. Text-to-3D will have its "Stable Diffusion moment" by late 2026. TRELLIS downloads are growing, Gaussian splatting research is accelerating, and Apple/Meta's spatial computing push creates real demand. The 26,659 downloads for TRELLIS today will look like SDXL's early numbers in hindsight.

5. ACE-Step will overtake MusicGen in community preference within 3 months. ACE-Step 1.5 earns about 1 like per 51 downloads, roughly 174 times the rate of MusicGen-medium (about 1 per 8,850). That indicates much higher user satisfaction despite lower absolute numbers.
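The ratio behind prediction 5 can be recomputed from the audio table. This minimal sketch uses likes per download, where higher means more likes per download.

```python
# (downloads, likes) copied from the audio table above.
ACE_STEP = (32_987, 649)
MUSICGEN_MEDIUM = (1_398_448, 158)


def likes_per_download(downloads: int, likes: int) -> float:
    """Higher means more community likes per download."""
    return likes / downloads


ace = likes_per_download(*ACE_STEP)
musicgen = likes_per_download(*MUSICGEN_MEDIUM)

print(f"ACE-Step 1.5:    1 like per {1 / ace:,.0f} downloads")
print(f"MusicGen-medium: 1 like per {1 / musicgen:,.0f} downloads")
print(f"ACE-Step earns {ace / musicgen:.0f}x more likes per download")
```

On these figures, ACE-Step 1.5 earns about 174 times as many likes per download as MusicGen-medium.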

What This Means for Creators

If you work with images: Your tools are stable. FLUX.1-dev and SDXL are not going anywhere. Invest time in mastering workflows rather than chasing new models. The next wave of improvement will come from agent-based iteration, not new base models.

If you work with video: Wan2.2 is the model to learn now. The ecosystem around it (Lightning variants, Fun-Reward LoRAs) is growing fast. This is comparable to learning Stable Diffusion in early 2023: early investment pays off as the community builds tooling around it.

If you work with audio: Try ACE-Step 1.5 alongside MusicGen. By likes, the community already favors it (649 likes vs 158 for MusicGen-medium), and it was built from the ground up for modern music generation tasks.

If you are building AI-powered tools: Qwen2.5-7B-Instruct is the practical default for any text generation task that needs to run locally. It has nearly 3x the downloads of Llama 3.1-8B for a reason: it works reliably at the 7B parameter scale.

For everyone: Learn agent frameworks. The shift from "I generate one thing at a time" to "I set up a system that generates, evaluates, and iterates" is the defining trend of 2026. The GitHub data makes this unambiguous.

Full Data

Creative AI Pipeline Summary: March 2026
Pipeline	Top Model	Downloads	New Models	Maturity
Text-to-Image	SDXL Base 1.0	2,269,426	0	Mature
Text-to-Video	Wan2.2-T2V-A14B	130,303	3	Growing
Text-to-Audio	MusicGen-medium	1,398,448	4	Transitioning
Text-to-3D	TRELLIS-text-xlarge	26,659	1	Early
Text-Generation	Qwen2.5-7B-Instruct	22,065,027	0	Mature

Research Focus Areas: arXiv Keyword Distribution (400 Papers)
Category	Keywords	Combined Frequency	Share
LLM/Foundation	language models, large language models	31	23%
Reasoning/RL	reinforcement learning, policy optimization, reward modeling	25	19%
Multimodal	vision-language models, vision language	16	12%
3D/Spatial	gaussian splatting, 3D reconstruction, spatial reasoning	13	10%
Generation	diffusion models, video diffusion	8	6%
Agents	LLM agents	5	4%

This research was produced by Creative AI News.

Subscribe for free to get the weekly digest every Tuesday.
