AI agents gained more GitHub stars this week than image generators gained all month. That single data point captures what is happening across creative AI in March 2026: the tools creators rely on are shifting from generation to orchestration, and the numbers tell a story most trend pieces miss.

We pulled data from five public sources: 400 recent arXiv research papers across AI and computer vision, 50 trending papers and 250 top models on HuggingFace, 50 of the most popular HuggingFace Spaces, and the weekly GitHub Trending page. Here is what the numbers actually say.

Key Findings

1. AI Agents Dominate GitHub: 10 of 16 Trending Repos Are Agent Frameworks

The most striking signal in this dataset comes from GitHub. Of the 16 AI-related trending repositories this week, 10 are agent frameworks or agent-related tools. The top gainer added 23,185 stars in a single week. For context, the top text-to-speech repo (Fish Speech) gained 2,159 stars in the same period.

GitHub Trending AI Repos: Agent Frameworks vs Other Categories
| Repository | Category | Total Stars | Stars Gained (Week) |
|---|---|---|---|
| AI Agency Framework | Agent Framework | 55,508 | +23,185 |
| Agent Optimization System | Agent Tooling | 88,325 | +14,298 |
| OpenViking Context DB | Agent Infrastructure | 16,408 | +10,158 |
| Lightpanda Browser | Agent Automation | 22,239 | +9,984 |
| Learn Claude Code | Agent Tooling | 33,674 | +7,836 |
| Page Agent | Web Agent | 11,827 | +6,243 |
| Impeccable Design AI | Agent Design | 10,964 | +6,432 |
| DeepAgents | Agent Framework | 15,605 | +4,877 |
| Claude HUD | Agent Tooling | 8,533 | +3,674 |
| Hermes Agent | Agent Framework | 9,176 | +3,241 |
| Fish Speech TTS | Audio Generation | 28,338 | +2,159 |
| BitNet 1-bit LLMs | Model Optimization | 35,886 | +4,792 |

Agent frameworks collectively gained nearly 90,000 stars this week. The research data points the same way: on HuggingFace, the most upvoted paper (91 votes) was MetaClaw, an agent that meta-learns and evolves autonomously, and "LLM agents" made the top 30 arXiv keywords with 5 appearances. Creators are not just generating content anymore. They are building systems that generate, iterate, and improve content autonomously.
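The weekly gains for the 10 agent-related repos can be totaled directly from the table above (a quick sanity check, using the figures as printed):

```python
# Weekly star gains for the 10 agent-related repos in the GitHub Trending table.
agent_gains = [23_185, 14_298, 10_158, 9_984, 7_836,
               6_243, 6_432, 4_877, 3_674, 3_241]

total = sum(agent_gains)
print(total)  # → 89928, just under 90,000 stars in one week
```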

2. Text-to-Image Generation Has Plateaued

Zero new text-to-image models entered the top 50 by downloads in this data pull. The leaderboard is frozen: Stable Diffusion XL (2.27M downloads, released July 2023), SD v1.5 (1.6M), and FLUX.1-dev (754K, 12,474 likes) occupy the top spots. The newest model in the top 10, Z-Image-Turbo from Tongyi-MAI, was released in November 2025 and has 876K downloads.

Top Text-to-Image Models by Downloads
| Model | Downloads | Likes | Released |
|---|---|---|---|
| stabilityai/stable-diffusion-xl-base-1.0 | 2,269,426 | 7,539 | Jul 2023 |
| stable-diffusion-v1-5 | 1,595,405 | 1,049 | Aug 2024 |
| Tongyi-MAI/Z-Image-Turbo | 876,154 | 4,276 | Nov 2025 |
| black-forest-labs/FLUX.1-dev | 754,240 | 12,474 | Jul 2024 |
| black-forest-labs/FLUX.1-schnell | 709,839 | 4,694 | Jul 2024 |

This does not mean image generation is dead. It means the open-source image generation stack has matured. FLUX.1-dev has the highest like-to-download ratio of any model in the top 10 (1 like per 60 downloads vs SDXL's 1:301), suggesting the community considers it the quality leader even if pipeline integrations still default to SDXL. For creators, this plateau is good news: the tools are stable, well-documented, and not changing every month.
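The like-to-download comparison works out as follows (a minimal sketch using the download and like counts from the table above):

```python
# Likes-per-download, expressed as "1 like per N downloads".
# A lower N means stronger community endorsement relative to usage.
models = {
    "FLUX.1-dev": (754_240, 12_474),     # (downloads, likes)
    "SDXL Base 1.0": (2_269_426, 7_539),
}

for name, (downloads, likes) in models.items():
    print(f"{name}: 1 like per {downloads // likes} downloads")
# → FLUX.1-dev: 1 like per 60 downloads
# → SDXL Base 1.0: 1 like per 301 downloads
```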

3. Video Generation Is the Hottest Creative Pipeline

Text-to-video had 3 new models enter the top 50, the most new entries of any creative pipeline except audio. Wan2.2 from Wan-AI dominates with 130,303 downloads for the T2V variant alone, plus 43,867 for the image-to-video model and 34,366 for the Lightning distilled version. Counting every variant, Wan2.2 accounts for over 350,000 downloads.

New Models by Creative Pipeline
| Pipeline | New Models (Top 50) | Dominant Player | Top Downloads |
|---|---|---|---|
| Text-to-Audio | 4 | Meta MusicGen | 1,398,448 |
| Text-to-Video | 3 | Wan-AI Wan2.2 | 130,303 |
| Text-to-3D | 1 | Microsoft TRELLIS | 26,659 |
| Text-to-Image | 0 | Stability AI SDXL | 2,269,426 |
| Text-Generation | 0 | Qwen 2.5 7B | 22,065,027 |

On HuggingFace's trending papers, 2 of the top 3 most-upvoted are video-related: Video-CoE (83 upvotes) for event prediction and MosaicMem (69 upvotes) for controllable video world models. The research community and the open-source community are aligned: video is where the most active development is happening.

4. Qwen Owns the Open-Source LLM Pipeline

Qwen models hold 5 of the top 10 spots in text-generation by downloads. Qwen2.5-7B-Instruct leads with 22 million downloads, nearly triple Meta's Llama 3.1-8B-Instruct at 7.6 million. OpenAI's open-source entry, gpt-oss-20b, sits at 7.5 million downloads with 4,469 likes.

Text-Generation Model Downloads: Top Contenders
| Model | Organization | Downloads | Likes |
|---|---|---|---|
| Qwen2.5-7B-Instruct | Alibaba/Qwen | 22,065,027 | 1,139 |
| Qwen3-0.6B | Alibaba/Qwen | 13,096,387 | 1,141 |
| gpt2 | OpenAI Community | 11,448,387 | 3,131 |
| Qwen2.5-1.5B-Instruct | Alibaba/Qwen | 8,923,707 | 642 |
| Qwen3-8B | Alibaba/Qwen | 8,567,203 | 995 |
| Llama-3.1-8B-Instruct | Meta | 7,632,351 | 5,577 |
| gpt-oss-20b | OpenAI | 7,468,776 | 4,469 |

The Qwen3 family (released April 2025) already has models with 13 million and 8.5 million downloads. This is not just about quality. Qwen offers models at every size (0.6B, 1.7B, 3B, 7B, 8B), making them the default choice for local deployment and fine-tuning. For creators building AI-powered tools, Qwen is the practical choice for text generation that runs on consumer hardware.

5. Research Is Shifting from Generation to Reasoning

The top arXiv keywords paint a clear picture of where AI research is heading:

Top Research Keywords Across 400 Recent arXiv Papers
| Keyword | Frequency | Category |
|---|---|---|
| Language models | 19 | Foundation |
| Reinforcement learning | 15 | Reasoning/Training |
| Large language models | 12 | Foundation |
| Vision-language models | 11 | Multimodal |
| Gaussian splatting | 6 | 3D/Spatial |
| LLM agents | 5 | Agents |
| Policy optimization | 5 | RL/Alignment |
| Reward modeling | 5 | RL/Alignment |
| Diffusion models | 4 | Generation |
| Video diffusion | 4 | Generation |
| 3D reconstruction | 4 | 3D/Spatial |
| Autonomous driving | 4 | Robotics |

Reinforcement learning (15 mentions), policy optimization (5), and reward modeling (5) collectively account for 25 keyword appearances. These are all about making models reason better, not generate more. "Diffusion models" (the backbone of image/video generation) appears only 4 times. The research community has moved past generation as a primary challenge and toward reasoning, planning, and autonomous action.
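The reasoning-vs-generation split falls out of a simple aggregation over the keyword table (a sketch; the category labels here are our groupings, not arXiv metadata):

```python
from collections import defaultdict

# Keyword frequencies from the arXiv table, tagged by theme.
keywords = {
    "reinforcement learning": (15, "reasoning"),
    "policy optimization": (5, "reasoning"),
    "reward modeling": (5, "reasoning"),
    "diffusion models": (4, "generation"),
    "video diffusion": (4, "generation"),
}

totals = defaultdict(int)
for freq, category in keywords.values():
    totals[category] += freq

print(dict(totals))  # → {'reasoning': 25, 'generation': 8}
```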

6. 3D Generation Is Early but Accelerating

Text-to-3D had 1 new model in the top 50, and downloads are orders of magnitude lower than other pipelines: Microsoft TRELLIS leads with just 26,659 downloads versus 2.2 million for the top image model. But the signals of acceleration are clear.

TRELLIS holds 3 of the top 6 text-to-3D model slots (xlarge, large, base variants). Hunyuan3D-2 from Tencent has 3,236 likes on HuggingFace Spaces. Gaussian splatting appeared 6 times in arXiv keywords, and 3D reconstruction appeared 4 times. The LoST paper (Level of Semantics Tokenization for 3D Shapes) was trending on HuggingFace with 14 upvotes.

For creators, 3D generation is not production-ready for most workflows, but it is crossing the threshold from research curiosity to functional tool. TRELLIS running locally is the closest thing to a "Stable Diffusion moment" for 3D.

7. Audio Generation: ACE-Step Challenges Meta's MusicGen

Text-to-audio had the most new models (4) of any pipeline. Meta's MusicGen-medium still dominates downloads at 1.4 million, but ACE-Step 1.5 (released January 2026) has already reached 32,987 downloads with 649 likes, the highest like count of any audio model released in the past year.

Audio Generation Model Landscape
| Model | Downloads | Likes | Released |
|---|---|---|---|
| Meta MusicGen-medium | 1,398,448 | 158 | Jun 2023 |
| Meta MusicGen-small | 117,976 | 480 | Jun 2023 |
| ACE-Step 1.5 | 32,987 | 649 | Jan 2026 |
| Stability Audio Open 1.0 | 31,020 | 1,426 | May 2024 |
| Meta MusicGen-large | 24,306 | 525 | Jun 2023 |
| MOSS SoundEffect | 6,431 | 41 | Feb 2026 |

On GitHub, Fish Speech (28,338 stars, +2,159 this week) is the top open-source TTS project. Audio is following the same trajectory image generation took in 2023-2024: a dominant incumbent (MusicGen) being challenged by specialized newcomers (ACE-Step for music, Fish Speech for voice, Stability Audio for sound design).

8. The HuggingFace Spaces Ecosystem Reveals What Creators Actually Use

The top 20 HuggingFace Spaces by likes tell us what creative professionals return to repeatedly:

Top HuggingFace Spaces by Likes (Creative Categories)
| Space | Category | Likes | SDK |
|---|---|---|---|
| Open LLM Leaderboard | Benchmarking | 13,904 | Docker |
| AI Comic Factory | Image + Story | 10,995 | Docker |
| Kolors Virtual Try-On | Image/Fashion | 10,011 | Gradio |
| FLUX.1-dev | Image Generation | 9,405 | Gradio |
| MusicGen | Audio Generation | 5,068 | Gradio |
| FLUX.1-schnell | Image Generation | 5,046 | Gradio |
| Wan2.2-Animate | Video Generation | 4,986 | Gradio |
| TRELLIS | 3D Generation | 4,776 | Gradio |
| Hunyuan3D-2 | 3D Generation | 3,236 | Gradio |
| Kokoro-TTS | Voice/TTS | 3,232 | Gradio |

Three patterns: First, benchmarking tools (Open LLM Leaderboard, MTEB, LM Arena) are among the most-liked, showing creators care about model selection, not just model usage. Second, practical creative tools (AI Comic Factory, Virtual Try-On, LivePortrait) outperform raw model demos. Third, every major creative modality now has at least one Space with 3,000+ likes: images (FLUX), video (Wan2.2), audio (MusicGen), voice (Kokoro-TTS), 3D (TRELLIS, Hunyuan3D).

Trend Analysis

Rising: AI agents and autonomous workflows (10/16 GitHub trending repos), reinforcement learning for LLM alignment (25 arXiv keyword hits), video generation (3 new models, 2 of top 3 trending papers), open-source audio challengers (ACE-Step, Fish Speech).

Stable: Text-to-image generation (mature, 0 new top-50 models), text generation dominated by Qwen (5 of top 10), benchmark and evaluation tools (top HF Spaces by likes).

Emerging: 3D generation (TRELLIS gaining traction, gaussian splatting in 6 papers), vision-language-action models (spatial reasoning papers trending), 1-bit model optimization (Microsoft BitNet gaining 4,792 stars).

Declining as standalone categories: Pure text generation research (the field is moving to multimodal and agentic), single-purpose generation tools (being replaced by agent-orchestrated pipelines).

Predictions

1. Agent frameworks will consolidate by Q3 2026. Ten competing agent frameworks on GitHub trending simultaneously is unsustainable. Expect 2-3 winners to absorb the rest through community adoption, similar to how LangChain dominated the LLM toolkit space in 2023.

2. Wan2.2 will become the "Stable Diffusion of video" within 6 months. With 350,000+ combined downloads and multiple community variants already shipping, it has the early adoption curve that SDXL had in mid-2023.

3. Qwen will maintain open-source LLM dominance through 2026. Their strategy of releasing at every model size (0.6B to 72B) creates lock-in at the fine-tuning and deployment level that competitors cannot easily match.

4. Text-to-3D will have its "Stable Diffusion moment" by late 2026. TRELLIS downloads are growing, gaussian splatting research is accelerating, and Apple/Meta's spatial computing push creates real demand. The 26,659 downloads for TRELLIS today will look like SDXL's early numbers in hindsight.

5. ACE-Step will overtake MusicGen in community preference within 3 months. Its like-to-download ratio (1:50) is more than 170x better than MusicGen-medium's (1:8,850), indicating much higher user satisfaction despite lower absolute numbers.
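The engagement gap can be computed directly from the audio table's figures:

```python
# Downloads-per-like for each model (lower = more engagement per user).
ace_ratio = 32_987 / 649          # ~51 downloads per like
musicgen_ratio = 1_398_448 / 158  # ~8,851 downloads per like

# How many times higher is ACE-Step's engagement rate?
advantage = musicgen_ratio / ace_ratio
print(round(advantage))  # → 174
```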

What This Means for Creators

If you work with images: Your tools are stable. FLUX.1-dev and SDXL are not going anywhere. Invest time in mastering workflows rather than chasing new models. The next wave of improvement will come from agent-based iteration, not new base models.

If you work with video: Wan2.2 is the model to learn now. The ecosystem around it (Lightning variants, Fun-Reward LoRAs) is growing fast. This is comparable to learning Stable Diffusion in early 2023 -- early investment pays off as the community builds tooling around it.

If you work with audio: Try ACE-Step 1.5 alongside MusicGen. The community clearly prefers it (649 likes vs 158 for MusicGen-medium), and it was built from the ground up for modern music generation tasks.

If you are building AI-powered tools: Qwen2.5-7B-Instruct is the practical default for any text generation task that needs to run locally. It has 3x the downloads of Llama 3.1-8B for a reason: it works reliably at the 7B parameter scale.

For everyone: Learn agent frameworks. The shift from "I generate one thing at a time" to "I set up a system that generates, evaluates, and iterates" is the defining trend of 2026. The GitHub data makes this unambiguous.

Full Data

Creative AI Pipeline Summary: March 2026
| Pipeline | Top Model | Downloads | New Models | Maturity |
|---|---|---|---|---|
| Text-to-Image | SDXL Base 1.0 | 2,269,426 | 0 | Mature |
| Text-to-Video | Wan2.2-T2V-A14B | 130,303 | 3 | Growing |
| Text-to-Audio | MusicGen-medium | 1,398,448 | 4 | Transitioning |
| Text-to-3D | TRELLIS-text-xlarge | 26,659 | 1 | Early |
| Text-Generation | Qwen2.5-7B-Instruct | 22,065,027 | 0 | Mature |
Research Focus Areas: arXiv Keyword Distribution (400 Papers)
| Category | Keywords | Combined Frequency | Share |
|---|---|---|---|
| LLM/Foundation | language models, large language models | 31 | 23% |
| Reasoning/RL | reinforcement learning, policy optimization, reward modeling | 25 | 19% |
| Multimodal | vision-language models, vision language | 16 | 12% |
| 3D/Spatial | gaussian splatting, 3D reconstruction, spatial reasoning | 13 | 10% |
| Generation | diffusion models, video diffusion | 8 | 6% |
| Agents | LLM agents | 5 | 4% |

This research was produced by Creative AI News.

Subscribe for free to get the weekly digest every Tuesday.