Introduction
On April 5, 2025, Meta released Llama 4, its latest generation of large language models [1]. This release has generated significant discussion in the AI community regarding Meta's approach to open-sourcing, the trustworthiness of evaluation results, and the overall reception of the models. This analysis examines these aspects in detail, providing a critical assessment of the Llama 4 release.
Overview of Llama 4 Models
Meta's Llama 4 release consists of three models in what they call the "Llama 4 herd" [1]:
- Llama 4 Scout
- Llama 4 Maverick
- Llama 4 Behemoth
All models use a Mixture-of-Experts (MoE) architecture and are natively multimodal, capable of processing text, images, and video through an "early fusion" approach [1][2].
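The core MoE idea is that each token is routed to a small subset of expert feed-forward networks instead of passing through one dense layer, so only a fraction of the total parameters is active per token. A minimal top-1 routing sketch (the dimensions, expert count, and ReLU activation are illustrative choices, not Llama 4's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FF, N_EXPERTS = 16, 32, 4  # toy sizes, not Llama 4's

# Each expert is a small two-layer MLP: W1 (D_MODEL x D_FF), W2 (D_FF x D_MODEL).
experts = [
    (rng.standard_normal((D_MODEL, D_FF)) * 0.02,
     rng.standard_normal((D_FF, D_MODEL)) * 0.02)
    for _ in range(N_EXPERTS)
]
# Router: a linear layer producing one score per expert for each token.
W_router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

def moe_forward(x):
    """x: (n_tokens, D_MODEL) -> (n_tokens, D_MODEL), top-1 expert per token."""
    logits = x @ W_router                      # (n_tokens, N_EXPERTS)
    choice = logits.argmax(axis=-1)            # index of the winning expert
    out = np.empty_like(x)
    for e, (w1, w2) in enumerate(experts):
        mask = choice == e
        if mask.any():
            h = np.maximum(x[mask] @ w1, 0.0)  # ReLU stand-in for the real activation
            out[mask] = h @ w2
    return out

tokens = rng.standard_normal((8, D_MODEL))
y = moe_forward(tokens)
print(y.shape)  # (8, 16)
```

The practical consequence, relevant to the hardware discussion below, is that all experts must be resident in memory even though each token only uses one (or a few) of them.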
Concerns About Meta's Open Source Approach
Meta's approach to open-sourcing Llama 4 has raised several concerns:
License Restrictions
Despite Meta's claims of "openness," Llama 4 is not truly open source according to generally accepted definitions. The Llama 4 Community License Agreement contains significant restrictions, particularly in Section 2, "Additional Commercial Terms," which requires entities with more than 700 million monthly active users to obtain a separate license directly from Meta [4][5].
"Open Weights" vs. "Open Source"
Critics argue that "open weights" more accurately describes Meta's approach rather than "open source." While the model weights are available for download, the licensing restrictions prevent truly open use [5][6].
Hardware Requirements
Unlike earlier Llama generations, even the smallest Llama 4 model (Scout) demands high-end hardware: Meta's stated target is a single NVIDIA H100 GPU, and even that requires quantization. This puts local use out of reach for most individual researchers and smaller organizations [2][3].
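The hardware floor follows from simple parameter-memory arithmetic. Because MoE models must keep every expert in memory, Scout's publicly reported total of roughly 109 billion parameters (of which only ~17B are active per token) is what determines the weight footprint; treat that figure as approximate:

```python
# Back-of-envelope weight-memory estimate for Llama 4 Scout.
# 109B total parameters is the figure reported at release; it is approximate.
TOTAL_PARAMS = 109e9

def weight_gib(bytes_per_param):
    """Weight memory in GiB for a given precision."""
    return TOTAL_PARAMS * bytes_per_param / 2**30

print(f"bf16: {weight_gib(2):.0f} GiB")    # ~203 GiB: far beyond one 80 GiB H100
print(f"int4: {weight_gib(0.5):.0f} GiB")  # ~51 GiB: consistent with Meta's
                                           # claim that quantized Scout fits one H100
```

This excludes activations and the KV cache, so real serving needs headroom beyond these numbers.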
Experimental vs. Released Versions
There is a discrepancy between the models used for benchmarking (particularly on LMArena) and the publicly released versions. Meta's blog post notes that an "experimental chat version" of Maverick achieved the high LMArena score, but that version was never released [7][8].
Trustworthiness of Evaluation Results
Several issues have emerged regarding the trustworthiness of Llama 4's evaluation results:
Benchmark Manipulation Allegations
There have been allegations that Meta manipulated benchmark results. A viral Reddit post circulated a Chinese-language post, allegedly written by a Meta employee, claiming internal pressure to blend benchmark test sets into post-training data to achieve better scores [9][10].
Meta's Denial
Meta's VP of Generative AI, Ahmad Al-Dahle, has denied these allegations, stating they are "simply not true" and that the company would "never do that" [11][12].
Selective Benchmark Reporting
Critics note that Meta selectively reported benchmarks where Llama 4 performs well while omitting those where it underperforms compared to competitors like DeepSeek V3.1 [6][13].
Different Versions for Benchmarks
The version of Maverick evaluated on LMArena is not identical to what Meta made publicly available. Meta's blog post mentions an "experimental chat version" tailored to improve "conversationality," raising questions about how representative the reported benchmark scores are of the released model [7][8].
Context Window Claims
Despite Meta's promotion of Llama 4 Scout's 10 million token context window, developers have found that using even a fraction of that length is impractical due to memory limitations. Third-party services hosting Scout have capped its context at far smaller windows (128,000 to 328,000 tokens) [3][14].
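The memory pressure at long contexts comes largely from the KV cache, which grows linearly with context length. A sketch of the standard KV-cache formula, using placeholder architecture numbers (the layer count, KV-head count, and head size below are illustrative assumptions, not Scout's published configuration):

```python
# KV-cache bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens.
# The architecture numbers below are illustrative placeholders, not Scout's specs.
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 48, 8, 128, 2  # bf16 cache

def kv_cache_gib(tokens):
    """KV-cache size in GiB for a given context length."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * tokens / 2**30

print(f"{kv_cache_gib(128_000):.1f} GiB at 128k tokens")
print(f"{kv_cache_gib(10_000_000):.0f} GiB at 10M tokens")
```

Even under these modest assumptions, a 10M-token cache runs to multiple terabytes, which is why hosts cap the window well below the advertised figure.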
General Reaction to Llama 4
The AI community's reaction to Llama 4 has been mixed:
Disappointment
Some experts and community members have expressed disappointment with Llama 4, describing it as "entirely lost" compared to previous Llama releases. The Interconnects.ai analysis states: "Where Llama 2's and Llama 3's releases were arguably some of the top few events in AI for their respective release years, Llama 4 feels entirely lost" [6][15].
Unusual Release Timing
The Saturday release has been described as "utterly bizarre for a major company launching one of its highest-profile products of the year," suggesting potential internal issues or rushed timing [3][16].
Performance Concerns
Early users have reported inconsistent performance from the Maverick and Scout models, with tasks that competing models handle easily proving difficult for Llama 4 [15][17].
Accessing Scout and Maverick Models
Llama 4 Scout and Maverick models are available through several channels:
Direct Download
The models can be downloaded from llama.com and Hugging Face after accepting the license terms [1][2].
Cloud Providers
The models are available through various cloud platforms including Amazon Web Services (SageMaker and Bedrock), Microsoft Azure, Google Cloud, and Databricks [19][20].
API Access
Services like OpenRouter and others provide API access to the models [21].
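OpenRouter exposes models through an OpenAI-compatible chat-completions endpoint, so access is a single authenticated POST. A minimal sketch using only the standard library; the model slug below is an assumption, so check OpenRouter's model list for the exact identifier:

```python
import json
import os
import urllib.request

# Request body in the OpenAI-compatible chat-completions format.
payload = {
    "model": "meta-llama/llama-4-scout",  # assumed slug; verify on openrouter.ai
    "messages": [
        {"role": "user", "content": "Summarize Llama 4's architecture in one sentence."}
    ],
}

api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```

The same payload shape works against other OpenAI-compatible hosts; only the base URL and model identifier change.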
Why Behemoth Only Exists in a Blog Post
Llama 4 Behemoth is mentioned in Meta's blog post but has not been released publicly for several reasons:
Still in Training
Meta explicitly states that Behemoth is "still training" and they're "excited to share more details about it even while it's still in flight" [1][22].
Teacher Model Role
Behemoth served as a "teacher" model for distillation, helping to train the smaller Scout and Maverick models through a process called model distillation [1][3].
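Distillation here refers to the standard technique of training a smaller student model to match a larger teacher's softened output distribution. A textbook sketch of the soft-target loss (this is the generic formulation, not Meta's actual, unpublished training recipe):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Higher temperature flattens both distributions, exposing the teacher's
    relative preferences among wrong answers ("dark knowledge").
    """
    p = softmax(teacher_logits / temperature)
    q = softmax(student_logits / temperature)
    return float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.5]])
print(distill_loss(teacher, teacher))                           # 0.0: perfect match
print(distill_loss(np.array([[0.5, 1.0, 4.0]]), teacher) > 0)   # True: mismatch penalized
```

In practice this soft-target term is usually combined with an ordinary cross-entropy loss on the ground-truth labels.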
Conclusion
Meta's Llama 4 release represents a significant technical advancement in terms of architecture (MoE), multimodal capabilities, and context window size. However, it has fallen short of expectations in several areas:
- The "open source" claims are undermined by licensing restrictions that prevent truly open use [4][5].
- Questions about benchmark manipulation and the discrepancy between benchmarked and released versions raise concerns about the trustworthiness of evaluation results [9][10][11].
- The community reaction has been mixed, with many expressing disappointment compared to previous Llama releases [6][15][17].
- The hardware requirements for even the smallest model limit accessibility to researchers and smaller organizations [2][3].
- The unreleased Behemoth model, while technically impressive, exists only in Meta's blog post, raising questions about Meta's transparency and competitive strategy [1][3][6].
Overall, Llama 4 appears to be Meta's attempt to keep pace in the increasingly competitive AI landscape, but the release has exposed gaps between Meta's AI ambitions and the reality of what they've delivered to the community [3][6].
References
- Meta AI. (2025, April 5). The Llama 4 herd: The beginning of a new era of natively multimodal intelligence. https://ai.meta.com/blog/llama-4-multimodal-intelligence/
- Hugging Face. (2025, April 5). Welcome Llama 4 Maverick & Scout on Hugging Face. https://huggingface.co/blog/llama4-release
- Ars Technica. (2025, April 7). Meta's surprise Llama 4 drop exposes the gap between AI ambition and reality. https://arstechnica.com/ai/2025/04/metas-surprise-llama-4-drop-exposes-the-gap-between-ai-ambition-and-reality/
- Meta Llama. (2025). Llama 4 Community License Agreement. https://github.com/meta-llama/llama-models/blob/main/models/llama4/LICENSE
- TechCrunch. (2025, April 5). Meta releases Llama 4, a new crop of flagship AI models. https://techcrunch.com/2025/04/05/meta-releases-llama-4-a-new-crop-of-flagship-ai-models/
- Interconnects.ai. (2025, April 7). Llama 4: Did Meta just push the panic button? https://www.interconnects.ai/p/llama-4
- Reddit. (2025, April 7). Meta got caught gaming AI benchmarks for Llama 4. https://www.reddit.com/r/OpenAI/comments/1ju2buh/meta_got_caught_gaming_ai_benchmarks_for_llama_4/
- Venture Beat. (2025, April 8). Meta defends Llama 4 release against reports of mixed quality, blames bugs. https://venturebeat.com/ai/meta-defends-llama-4-release-against-reports-of-mixed-quality-blames-bugs/
- Reddit. (2025, April 6). Serious issues in Llama 4 training. I Have Submitted My Resignation Letter. https://www.reddit.com/r/LocalLLaMA/comments/1jt8yug/serious_issues_in_llama_4_training_i_have/
- Beebom. (2025, April 8). Meta Under Fire for Manipulating Llama 4 Benchmark, But It Isn't the First Time. https://beebom.com/meta-llama-4-benchmark-manipulation-not-first-time/
- TechCrunch. (2025, April 7). Meta exec denies the company artificially boosted Llama 4's benchmark scores. https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
- Analytics India Magazine. (2025, April 8). Meta Denies Any Wrongdoing in Llama 4 Benchmarks. https://analyticsindiamag.com/ai-news-updates/meta-denies-any-wrongdoing-in-llama-4-benchmarks/
- Tech in Asia. (2025, April 8). Meta denies manipulation of AI benchmark with Llama 4 models. https://www.techinasia.com/news/meta-denies-manipulation-ai-benchmark-llama-4-models
- Reddit. (2025, April 6). Meta's Llama 4 Fell Short. https://www.reddit.com/r/LocalLLaMA/comments/1jt7hlc/metas_llama_4_fell_short/
- Reddit. (2025, April 5). I'm incredibly disappointed with Llama-4. https://www.reddit.com/r/LocalLLaMA/comments/1jsl37d/im_incredibly_disappointed_with_llama4/
- CNBC. (2025, April 5). Meta debuts new Llama 4 models, but most powerful AI model is still to come. https://www.cnbc.com/2025/04/05/meta-debuts-new-llama-4-models-but-most-powerful-ai-model-is-still-to-come.html
- Reddit. (2025, April 5). What are your thoughts about the Llama 4 models? https://www.reddit.com/r/LocalLLaMA/comments/1jsr8ie/what_are_your_thoughts_about_the_llama_4_models/
- Resemble AI. (2025, April 6). What Is LLaMA 4? Everything You Need to Know. https://www.resemble.ai/what-is-llama-4-everything-you-need-to-know/
- Amazon Web Services. (2025, April 5). Meta's Llama 4 models now available on Amazon Web Services. https://www.aboutamazon.com/news/aws/aws-meta-llama-4-models-available
- Databricks. (2025, April 5). Introducing Meta's Llama 4 on the Databricks Data Intelligence Platform. https://www.databricks.com/blog/introducing-metas-llama-4-databricks-data-intelligence-platform
- Medium. (2025, April 6). How to use Meta Llama4 for free? OpenRouter, HuggingFace and more. https://medium.com/data-science-in-your-pocket/how-to-use-meta-llama4-for-free-da46c30aa32c
- BD Tech Talks. (2025, April 6). What to know about Meta's Llama 4 model family. https://bdtechtalks.com/2025/04/06/meta-llama-4/