AI agents gained more GitHub stars this week than image generators gained all month. That single data point captures what is happening across creative AI in March 2026: the tools creators rely on are shifting from generation to orchestration, and the numbers tell a story most trend pieces miss.
We pulled data from five public sources: 400 recent arXiv research papers across AI and computer vision, 50 trending papers and 250 top models on HuggingFace, 50 of the most popular HuggingFace Spaces, and the weekly GitHub Trending page. Here is what the numbers actually say.
Key Findings
1. AI Agents Dominate GitHub: 10 of 16 Trending Repos Are Agent Frameworks
The most striking signal in this dataset comes from GitHub. Of the 16 AI-related trending repositories this week, 10 are agent frameworks or agent-related tools. The top gainer added 23,185 stars in a single week. For context, the top text-to-speech repo (Fish Speech) gained 2,159 stars in the same period.
| Repository | Category | Total Stars | Stars Gained (Week) |
|---|---|---|---|
| AI Agency Framework | Agent Framework | 55,508 | +23,185 |
| Agent Optimization System | Agent Tooling | 88,325 | +14,298 |
| OpenViking Context DB | Agent Infrastructure | 16,408 | +10,158 |
| Lightpanda Browser | Agent Automation | 22,239 | +9,984 |
| Learn Claude Code | Agent Tooling | 33,674 | +7,836 |
| Impeccable Design AI | Agent Design | 10,964 | +6,432 |
| Page Agent | Web Agent | 11,827 | +6,243 |
| DeepAgents | Agent Framework | 15,605 | +4,877 |
| Claude HUD | Agent Tooling | 8,533 | +3,674 |
| Hermes Agent | Agent Framework | 9,176 | +3,241 |
| BitNet 1-bit LLMs | Model Optimization | 35,886 | +4,792 |
| Fish Speech TTS | Audio Generation | 28,338 | +2,159 |
Together, the ten agent repositories in the table gained 89,928 stars this week, just under 90,000. The research side points the same way: on HuggingFace, the most-upvoted paper (91 votes) was MetaClaw, an agent that meta-learns and evolves autonomously, and "LLM agents" appeared 5 times in our arXiv sample, placing it among the top 30 keywords. Creators are no longer just generating content. They are building systems that generate, iterate, and improve content autonomously.
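The combined figure can be recomputed directly from the table. The sketch below is a minimal check, not part of our collection pipeline: the variable names are illustrative, and the values are simply the ten agent rows and the Fish Speech row copied from the table above.

```python
# Weekly star gains for the ten agent-related repositories in the table above.
agent_star_gains = {
    "AI Agency Framework": 23_185,
    "Agent Optimization System": 14_298,
    "OpenViking Context DB": 10_158,
    "Lightpanda Browser": 9_984,
    "Learn Claude Code": 7_836,
    "Page Agent": 6_243,
    "Impeccable Design AI": 6_432,
    "DeepAgents": 4_877,
    "Claude HUD": 3_674,
    "Hermes Agent": 3_241,
}
# Weekly gain of the top text-to-speech repo, used as the comparison point.
fish_speech_gain = 2_159

total_agent_gain = sum(agent_star_gains.values())
top_repo, top_gain = max(agent_star_gains.items(), key=lambda item: item[1])
ratio_to_tts = top_gain / fish_speech_gain

print(f"Agent repos counted: {len(agent_star_gains)}")
print(f"Combined weekly gain: {total_agent_gain:,}")
print(f"{top_repo} vs Fish Speech: {ratio_to_tts:.1f}x")
```

For the rows shown, the combined weekly gain comes to 89,928 stars, and the top agent repository gained roughly 10.7 times as many stars as Fish Speech.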
2. Text-to-Image Generation Has Plateaued
No new text-to-image models entered the top 50 by downloads in the period we sampled. The top of the leaderboard has barely moved: Stable Diffusion XL (2.27M downloads, released July 2023) and SD v1.5 (1.6M) still hold the first two spots, with FLUX.1-dev (754K downloads, 12,474 likes) close behind. The newest model in the top 10, Z-Image-Turbo from Tongyi-MAI, was released in November 2025 and already ranks third with 876K downloads.
| Model | Downloads | Likes | Released |
|---|---|---|---|
| stabilityai/stable-diffusion-xl-base-1.0 | 2,269,426 | 7,539 | Jul 2023 |
| stable-diffusion-v1-5 | 1,595,405 | 1,049 | Aug 2024 (repo re-upload; model from Oct 2022) |
| Tongyi-MAI/Z-Image-Turbo | 876,154 | 4,276 | Nov 2025 |
| black-forest-labs/FLUX.1-dev | 754,240 | 12,474 | Jul 2024 |
| black-forest-labs/FLUX.1-schnell | 709,839 | 4,694 | Jul 2024 |
This does not mean image generation is dead. It means the open-source image generation stack has matured. FLUX.1-dev has the highest like-to-download ratio of any model in the top 10 (1 like per 60 downloads vs SDXL's 1:301), suggesting the community considers it the quality leader even if pipeline integrations still default to SDXL. For creators, this plateau is good news: the tools are stable, well-documented, and not changing every month.
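The like-to-download comparison is plain division, so it is easy to reproduce for all five models in the table above. This is a minimal sketch under one assumption: "like-to-download ratio" means downloads divided by likes, so a lower number means more likes per download. The function name `downloads_per_like` is our own label, not a HuggingFace metric.

```python
# (model, downloads, likes) rows copied from the text-to-image table above.
image_models = [
    ("stabilityai/stable-diffusion-xl-base-1.0", 2_269_426, 7_539),
    ("stable-diffusion-v1-5", 1_595_405, 1_049),
    ("Tongyi-MAI/Z-Image-Turbo", 876_154, 4_276),
    ("black-forest-labs/FLUX.1-dev", 754_240, 12_474),
    ("black-forest-labs/FLUX.1-schnell", 709_839, 4_694),
]


def downloads_per_like(downloads: int, likes: int) -> float:
    """Downloads needed to earn one like; lower means stronger endorsement."""
    return downloads / likes


# Rank models from most-liked-per-download to least.
ranked = sorted(image_models, key=lambda row: downloads_per_like(row[1], row[2]))

for name, downloads, likes in ranked:
    print(f"{name}: 1 like per {downloads_per_like(downloads, likes):.0f} downloads")
```

FLUX.1-dev comes out first at about 1 like per 60 downloads, and SDXL lands at about 1 per 301, matching the figures quoted above.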
3. Video Generation Is the Hottest Creative Pipeline
Text-to-video had 3 new models enter the top 50, more than any creative pipeline except audio. Wan2.2 from Wan-AI dominates with 130,303 downloads for the T2V variant alone, plus 43,867 for the image-to-video model and 34,366 for the Lightning distilled version. Those three total 208,536; counting the remaining Wan2.2 variants, combined downloads exceed 350,000.
| Pipeline | New Models (Top 50) | Dominant Player | Top Downloads |
|---|---|---|---|
| Text-to-Audio | 4 | Meta MusicGen | 1,398,448 |
| Text-to-Video | 3 | Wan-AI Wan2.2 | 130,303 |
| Text-to-3D | 1 | Microsoft TRELLIS | 26,659 |
| Text-to-Image | 0 | Stability AI SDXL | 2,269,426 |
| Text-Generation | 0 | Qwen 2.5 7B | 22,065,027 |
On HuggingFace's trending papers, 2 of the top 3 most-upvoted are video-related: Video-CoE (83 upvotes) for event prediction and MosaicMem (69 upvotes) for controllable video world models. The research community and the open-source community are aligned: video is where the most active development is happening.
4. Qwen Owns the Open-Source LLM Pipeline
Qwen models hold 5 of the top 10 spots in text-generation by downloads. Qwen2.5-7B-Instruct leads with 22 million downloads, nearly triple Meta's Llama 3.1-8B-Instruct at 7.6 million. OpenAI's open-source entry, gpt-oss-20b, sits at 7.5 million downloads with 4,469 likes.
| Model | Organization | Downloads | Likes |
|---|---|---|---|
| Qwen2.5-7B-Instruct | Alibaba/Qwen | 22,065,027 | 1,139 |
| Qwen3-0.6B | Alibaba/Qwen | 13,096,387 | 1,141 |
| gpt2 | OpenAI Community | 11,448,387 | 3,131 |
| Qwen2.5-1.5B-Instruct | Alibaba/Qwen | 8,923,707 | 642 |
| Qwen3-8B | Alibaba/Qwen | 8,567,203 | 995 |
| Llama-3.1-8B-Instruct | Meta | 7,632,351 | 5,577 |
| gpt-oss-20b | OpenAI | 7,468,776 | 4,469 |
The Qwen3 family (released April 2025) already has models with 13 million and 8.6 million downloads. This is not just about quality. Qwen offers models at every small size (0.6B, 1.7B, 3B, 7B, 8B), making them the default choice for local deployment and fine-tuning. For creators building AI-powered tools, Qwen is the practical choice for text generation that runs on consumer hardware.
5. Research Is Shifting from Generation to Reasoning
The top arXiv keywords paint a clear picture of where AI research is heading:
| Keyword | Frequency | Category |
|---|---|---|
| Language models | 19 | Foundation |
| Reinforcement learning | 15 | Reasoning/Training |
| Large language models | 12 | Foundation |
| Vision-language models | 11 | Multimodal |
| Gaussian splatting | 6 | 3D/Spatial |
| LLM agents | 5 | Agents |
| Policy optimization | 5 | RL/Alignment |
| Reward modeling | 5 | RL/Alignment |
| Diffusion models | 4 | Generation |
| Video diffusion | 4 | Generation |
| 3D reconstruction | 4 | 3D/Spatial |
| Autonomous driving | 4 | Robotics |
Reinforcement learning (15 mentions), policy optimization (5), and reward modeling (5) collectively account for 25 keyword appearances. These are all about making models reason better, not generate more. "Diffusion models" (the backbone of image/video generation) appears only 4 times. The research community has moved past generation as a primary challenge and toward reasoning, planning, and autonomous action.
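The 25-versus-8 comparison comes from grouping the keyword table by category. The sketch below shows that tally. It assumes the reasoning cluster is the union of the table's "Reasoning/Training" and "RL/Alignment" labels, which is our reading of the paragraph above, and the names are illustrative.

```python
from collections import defaultdict

# (keyword, frequency, category) rows copied from the keyword table above.
keyword_rows = [
    ("Language models", 19, "Foundation"),
    ("Reinforcement learning", 15, "Reasoning/Training"),
    ("Large language models", 12, "Foundation"),
    ("Vision-language models", 11, "Multimodal"),
    ("Gaussian splatting", 6, "3D/Spatial"),
    ("LLM agents", 5, "Agents"),
    ("Policy optimization", 5, "RL/Alignment"),
    ("Reward modeling", 5, "RL/Alignment"),
    ("Diffusion models", 4, "Generation"),
    ("Video diffusion", 4, "Generation"),
    ("3D reconstruction", 4, "3D/Spatial"),
    ("Autonomous driving", 4, "Robotics"),
]

# Sum frequencies per category label.
category_totals = defaultdict(int)
for _keyword, frequency, category in keyword_rows:
    category_totals[category] += frequency

# Assumption: the "reasoning" cluster is the union of these two labels.
REASONING_LABELS = {"Reasoning/Training", "RL/Alignment"}
reasoning_total = sum(category_totals[label] for label in REASONING_LABELS)
generation_total = category_totals["Generation"]

for category, total in sorted(category_totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:20s} {total}")
print(f"Reasoning cluster: {reasoning_total}, Generation: {generation_total}")
```

On these twelve rows the reasoning cluster totals 25 keyword appearances against 8 for generation, roughly a three-to-one gap.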
6. 3D Generation Is Early but Accelerating
Text-to-3D had 1 new model in the top 50, and downloads are far lower than in the established pipelines: Microsoft TRELLIS leads with just 26,659 downloads versus 2.27 million for the top image model, roughly 85 times fewer. But the signals of acceleration are clear.
TRELLIS holds 3 of the top 6 text-to-3D model slots (xlarge, large, base variants). Hunyuan3D-2 from Tencent has 3,236 likes on HuggingFace Spaces. Gaussian splatting appeared 6 times in arXiv keywords, and 3D reconstruction appeared 4 times. The LoST paper (Level of Semantics Tokenization for 3D Shapes) was trending on HuggingFace with 14 upvotes.
For creators, 3D generation is not production-ready for most workflows, but it is crossing the threshold from research curiosity to functional tool. TRELLIS running locally is the closest thing yet to a "Stable Diffusion moment" for 3D.
7. Audio Generation: ACE-Step Challenges Meta's MusicGen
Text-to-audio had the most new models (4) of any pipeline. Meta's MusicGen-medium still dominates downloads at 1.4 million, but ACE-Step 1.5 (released January 2026) has already reached 32,987 downloads with 649 likes, the highest like count of any audio model released in the past year.
| Model | Downloads | Likes | Released |
|---|---|---|---|
| Meta MusicGen-medium | 1,398,448 | 158 | Jun 2023 |
| Meta MusicGen-small | 117,976 | 480 | Jun 2023 |
| ACE-Step 1.5 | 32,987 | 649 | Jan 2026 |
| Stability Audio Open 1.0 | 31,020 | 1,426 | May 2024 |
| Meta MusicGen-large | 24,306 | 525 | Jun 2023 |
| MOSS SoundEffect | 6,431 | 41 | Feb 2026 |
On GitHub, Fish Speech (28,338 stars, +2,159 this week) is the top open-source TTS project. Audio is following the same trajectory image generation took in 2023-2024: a dominant incumbent (MusicGen) being challenged by specialized newcomers (ACE-Step for music, Fish Speech for voice, Stability Audio for sound design).
8. The HuggingFace Spaces Ecosystem Reveals What Creators Actually Use
The most-liked HuggingFace Spaces show which tools creators value most. Selected entries from the top 20 by likes:
| Space | Category | Likes | SDK |
|---|---|---|---|
| Open LLM Leaderboard | Benchmarking | 13,904 | Docker |
| AI Comic Factory | Image + Story | 10,995 | Docker |
| Kolors Virtual Try-On | Image/Fashion | 10,011 | Gradio |
| FLUX.1-dev | Image Generation | 9,405 | Gradio |
| MusicGen | Audio Generation | 5,068 | Gradio |
| FLUX.1-schnell | Image Generation | 5,046 | Gradio |
| Wan2.2-Animate | Video Generation | 4,986 | Gradio |
| TRELLIS | 3D Generation | 4,776 | Gradio |
| Hunyuan3D-2 | 3D Generation | 3,236 | Gradio |
| Kokoro-TTS | Voice/TTS | 3,232 | Gradio |
Three patterns stand out. First, benchmarking tools (Open LLM Leaderboard, MTEB, LM Arena) are among the most-liked, showing creators care about model selection, not just model usage. Second, practical creative tools (AI Comic Factory, Virtual Try-On, LivePortrait) outperform raw model demos. Third, every major creative modality now has at least one Space with 3,000+ likes: images (FLUX), video (Wan2.2), audio (MusicGen), voice (Kokoro-TTS), 3D (TRELLIS, Hunyuan3D).
Trend Analysis
Rising: AI agents and autonomous workflows (10/16 GitHub trending repos), reinforcement learning for LLM alignment (25 arXiv keyword hits), video generation (3 new models, 2 of top 3 trending papers), open-source audio challengers (ACE-Step, Fish Speech).
Stable: Text-to-image generation (mature, 0 new top-50 models), text generation dominated by Qwen (5 of top 10), benchmark and evaluation tools (top HF Spaces by likes).
Emerging: 3D generation (TRELLIS gaining traction, gaussian splatting with 6 arXiv keyword appearances), vision-language-action models (spatial reasoning papers trending), 1-bit model optimization (Microsoft BitNet gaining 4,792 stars).
Declining as standalone categories: Pure text generation research (the field is moving to multimodal and agentic), single-purpose generation tools (being replaced by agent-orchestrated pipelines).
Predictions
1. Agent frameworks will consolidate by Q3 2026. Ten agent frameworks and tools trending on GitHub at the same time is unsustainable. Expect 2-3 winners to absorb the rest through community adoption, similar to how LangChain dominated the LLM toolkit space in 2023.
2. Wan2.2 will become the "Stable Diffusion of video" within 6 months. With 350,000+ combined downloads and multiple community variants already shipping, it has the early adoption curve that SDXL had in mid-2023.
3. Qwen will maintain open-source LLM dominance through 2026. Their strategy of releasing at every model size (0.6B to 72B) creates lock-in at the fine-tuning and deployment level that competitors cannot easily match.
4. Text-to-3D will have its "Stable Diffusion moment" by late 2026. TRELLIS downloads are growing, gaussian splatting research is accelerating, and Apple/Meta's spatial computing push creates real demand. The 26,659 downloads for TRELLIS today will look like SDXL's early numbers in hindsight.
5. ACE-Step will overtake MusicGen in community preference within 3 months. Its like-to-download ratio (about 1 like per 51 downloads) is roughly 174 times higher than MusicGen-medium's (about 1 per 8,851), which suggests much stronger engagement per download despite lower absolute numbers.
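The ratio behind prediction 5 can be recomputed from the audio table in the Key Findings section. This is a minimal sketch that assumes "like-to-download ratio" means downloads divided by likes. The helper name is ours, and all six audio rows are included so the two models in the prediction can be seen in context.

```python
# model -> (downloads, likes), copied from the text-to-audio table.
audio_models = {
    "Meta MusicGen-medium": (1_398_448, 158),
    "Meta MusicGen-small": (117_976, 480),
    "ACE-Step 1.5": (32_987, 649),
    "Stability Audio Open 1.0": (31_020, 1_426),
    "Meta MusicGen-large": (24_306, 525),
    "MOSS SoundEffect": (6_431, 41),
}


def downloads_per_like(downloads: int, likes: int) -> float:
    """Downloads needed to earn one like; lower means more likes per download."""
    return downloads / likes


ratios = {
    name: downloads_per_like(downloads, likes)
    for name, (downloads, likes) in audio_models.items()
}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: 1 like per {ratio:.0f} downloads")

# How many times more likes per download ACE-Step earns than MusicGen-medium.
multiple = ratios["Meta MusicGen-medium"] / ratios["ACE-Step 1.5"]
print(f"ACE-Step 1.5 vs MusicGen-medium: about {multiple:.0f}x")
```

On these figures ACE-Step 1.5 earns one like per roughly 51 downloads and MusicGen-medium one per roughly 8,851, a gap of about 174 times.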
What This Means for Creators
If you work with images: Your tools are stable. FLUX.1-dev and SDXL are not going anywhere. Invest time in mastering workflows rather than chasing new models. The next wave of improvement will come from agent-based iteration, not new base models.
If you work with video: Wan2.2 is the model to learn now. The ecosystem around it (Lightning variants, Fun-Reward LoRAs) is growing fast. This is comparable to learning Stable Diffusion in early 2023: early investment pays off as the community builds tooling around it.
If you work with audio: Try ACE-Step 1.5 alongside MusicGen. It has already collected more likes than MusicGen-medium (649 vs 158) on a small fraction of the downloads, and it was built from the ground up for modern music generation tasks.
If you are building AI-powered tools: Qwen2.5-7B-Instruct is the practical default for any text generation task that needs to run locally. It has nearly 3x the downloads of Llama 3.1-8B-Instruct, and it works reliably at the 7B parameter scale.
For everyone: Learn agent frameworks. The shift from "I generate one thing at a time" to "I set up a system that generates, evaluates, and iterates" is the defining trend of 2026. The GitHub data makes this unambiguous.
Full Data
| Pipeline | Top Model | Downloads | New Models | Maturity |
|---|---|---|---|---|
| Text-to-Image | SDXL Base 1.0 | 2,269,426 | 0 | Mature |
| Text-to-Video | Wan2.2-T2V-A14B | 130,303 | 3 | Growing |
| Text-to-Audio | MusicGen-medium | 1,398,448 | 4 | Transitioning |
| Text-to-3D | TRELLIS-text-xlarge | 26,659 | 1 | Early |
| Text-Generation | Qwen2.5-7B-Instruct | 22,065,027 | 0 | Mature |
arXiv keyword categories by combined frequency:
| Category | Keywords | Combined Frequency | Share |
|---|---|---|---|
| LLM/Foundation | language models, large language models | 31 | 23% |
| Reasoning/RL | reinforcement learning, policy optimization, reward modeling | 25 | 19% |
| Multimodal | vision-language models, vision language | 16 | 12% |
| 3D/Spatial | gaussian splatting, 3D reconstruction, spatial reasoning | 13 | 10% |
| Generation | diffusion models, video diffusion | 8 | 6% |
| Agents | LLM agents | 5 | 4% |
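The Share column is not computed against the six rows alone, so it helps to check what denominator it implies. The sketch below searches for totals that reproduce every listed share. It rests on an assumption the table does not state: that each share is the category's frequency divided by one common total, rounded to the nearest whole percent. The function and variable names are illustrative.

```python
import math

# (category, combined frequency, share in percent) rows from the table above.
category_rows = [
    ("LLM/Foundation", 31, 23),
    ("Reasoning/RL", 25, 19),
    ("Multimodal", 16, 12),
    ("3D/Spatial", 13, 10),
    ("Generation", 8, 6),
    ("Agents", 5, 4),
]


def rounded_share(count: int, total: int) -> int:
    """Share of total in whole percent, rounding halves up."""
    return math.floor(100 * count / total + 0.5)


listed_sum = sum(count for _name, count, _share in category_rows)

# Assumption: all shares use one common total and nearest-percent rounding.
# Search upward from the listed sum for totals that reproduce every share.
consistent_totals = [
    total
    for total in range(listed_sum, 1001)
    if all(
        rounded_share(count, total) == share
        for _name, count, share in category_rows
    )
]

print(f"Sum of listed frequencies: {listed_sum}")
print(f"Totals consistent with every share: {consistent_totals}")
```

Under that assumption the six shares are consistent with a denominator of 132 to 135 keyword occurrences, more than the 98 listed here, which would explain why the Share column adds up to 74% rather than 100%.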
This research was produced by Creative AI News.