Meta releases Llama 3.1: The largest open-source AI model yet

4 months ago   •   6 min read

By Peter Foy

Meta (META) just released Llama 3.1, the largest open-source AI model to date, which the company says outperforms some of the industry's leading proprietary models.

"Our latest instruction-tuned model is available in 8B, 70B and 405B versions." (source: Meta)

Let's take a look at what this new model has to offer...

A new benchmark in open-source AI

Back in April with the release of Llama 3, Meta hinted at its ambitious plans to create an open-source AI model that could rival the best private models from companies like OpenAI. Today, that vision has come to fruition with the release of Llama 3.1.

According to Meta, Llama 3.1 surpasses the performance of OpenAI's GPT-4o and Anthropic’s Claude 3.5 Sonnet on multiple benchmarks. This model is available in three versions: 8 billion, 70 billion, and a staggering 405 billion parameters.
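To give a feel for what those three sizes mean in practice, here is a rough back-of-envelope sketch of how much memory the weights alone would need. It assumes the nominal parameter counts (8, 70, and 405 billion) and common storage widths (2 bytes for bf16, 1 for int8, 0.5 for int4); it ignores the KV cache, activations, and runtime overhead, so real deployments need more.

```python
# Back-of-envelope weight memory for each Llama 3.1 size.
# Assumptions: nominal parameter counts, dense weights only;
# KV cache, activations, and runtime overhead are ignored.
PARAMS_BILLIONS = {"8B": 8, "70B": 70, "405B": 405}
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bytes_per_param
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        row = ", ".join(
            f"{dtype}: {weight_memory_gb(size, width):.0f} GB"
            for dtype, width in BYTES_PER_PARAM.items()
        )
        print(f"Llama 3.1 {name} -> {row}")
```

Under these assumptions the 405B weights alone come to about 810 GB in bf16, more than fits on a single 8-GPU server with 80 GB cards, while the 8B model fits comfortably on one GPU.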

Llama 3.1 Benchmarks

| Category | Benchmark | Llama 3.1 8B | Gemma 2 9B IT | Llama 3.1 70B | GPT 3.5 Turbo | Llama 3.1 405B | GPT-4 Omni | Claude 3.5 Sonnet |
|---|---|---|---|---|---|---|---|---|
| General | MMLU Chat (0-shot, CoT) | 73.0 | 72.3 (0-shot, non-CoT) | 86.0 | 69.8 | 88.6 | 88.7 | 88.3 |
| | MMLU PRO (5-shot, CoT) | 48.3 | - | 66.4 | 49.2 | 73.3 | 74.0 | 77.0 |
| | IFEval | 80.4 | 73.6 | 87.5 | 69.9 | 88.6 | 85.6 | 88.0 |
| Code | HumanEval (0-shot) | 72.6 | 54.3 | 80.5 | 68.0 | 89.0 | 90.2 | 92.0 |
| | MBPP EvalPlus (base) (0-shot) | 72.8 | 71.7 | 86.0 | 82.0 | 88.6 | 87.8 | 90.5 |
| Math | GSM8K (8-shot, CoT) | 84.5 | 76.7 | 95.1 | 81.6 | 96.8 | 96.1 | 96.4 (0-shot) |
| | MATH (0-shot, CoT) | 51.9 | 44.3 | 68.0 | 43.1 | 73.8 | 76.6 | 71.1 |
| Reasoning | ARC Challenge (0-shot) | 83.4 | 87.6 | 94.8 | 83.7 | 96.9 | 96.7 | 96.7 |
| | GPQA (0-shot, CoT) | 32.8 | - | 46.7 | 30.8 | 51.1 | 53.6 | 59.4 |
| Tool use | BFCL | 76.1 | - | 84.8 | 85.9 | 88.5 | 80.5 | 90.2 |
| | Nexus (0-shot) | 38.5 | 30.0 | 56.7 | 37.2 | 58.7 | 56.1 | 45.7 |
| Long context | ZeroSCROLLS/QuALITY | 81.0 | - | 90.5 | - | 95.2 | 90.5 | 90.5 |
| | InfiniteBench/En.MC | 65.1 | - | 78.2 | - | 83.4 | 82.5 | - |
| | NIH/Multi-needle | 98.8 | - | 97.5 | - | 98.1 | 100.0 | 90.8 |
| Multilingual | Multilingual MGSM (0-shot) | 68.9 | 53.2 | 86.9 | 51.4 | 91.6 | 90.5 | 91.6 |

Here's a summary of the key takeaways from these benchmarks:

Performance Highlights

  • General Understanding: The 405B model scores 88.6 on MMLU Chat and 73.3 on MMLU PRO, roughly level with GPT-4 Omni (88.7 and 74.0); Claude 3.5 Sonnet leads MMLU PRO at 77.0.
  • Coding Tasks: The 405B model scores 89.0 on HumanEval and 88.6 on MBPP EvalPlus, close behind Claude 3.5 Sonnet (92.0 and 90.5).
  • Mathematics: The 405B model posts the top GSM8K score (96.8) and is second on MATH (73.8, behind GPT-4 Omni at 76.6).
  • Reasoning and Tool Use: The 405B model has the highest ARC Challenge (96.9) and Nexus (58.7) scores and is second on BFCL (88.5, behind Claude 3.5 Sonnet at 90.2); on GPQA it trails both proprietary models.
  • Long Context Handling: The 405B model leads on ZeroSCROLLS/QuALITY (95.2 vs. 90.5) and InfiniteBench/En.MC (83.4 vs. 82.5), while GPT-4 Omni leads NIH/Multi-needle (100.0).
  • Multilingual Capabilities: The 405B model ties Claude 3.5 Sonnet for the top Multilingual MGSM score (91.6).

Comparative Performance

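  • Llama 3.1 405B leads or ties on 7 of the 15 benchmarks in the table and trails GPT-4 Omni or Claude 3.5 Sonnet on the rest, with the largest gap on GPQA (51.1 vs. 59.4).
  • The smaller models are competitive in their size classes: the 8B model beats Gemma 2 9B IT on every reported benchmark except ARC Challenge, and the 70B model beats GPT 3.5 Turbo on every reported benchmark except BFCL, which makes them practical options when compute is limited.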
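For readers who want to check the comparison themselves, the short sketch below tallies which of the three largest models has the top score on each benchmark. The numbers are copied from the benchmark table above; the function names are just illustrative, and the tally takes the reported scores at face value even where the evaluation settings differ (for example, Claude 3.5 Sonnet's GSM8K score is 0-shot).

```python
# Scores copied from the benchmark table above (three largest models only).
# Tuple order follows MODELS; None means the score was not reported.
MODELS = ("Llama 3.1 405B", "GPT-4 Omni", "Claude 3.5 Sonnet")
SCORES = {
    "MMLU Chat": (88.6, 88.7, 88.3),
    "MMLU PRO": (73.3, 74.0, 77.0),
    "IFEval": (88.6, 85.6, 88.0),
    "HumanEval": (89.0, 90.2, 92.0),
    "MBPP EvalPlus": (88.6, 87.8, 90.5),
    "GSM8K": (96.8, 96.1, 96.4),
    "MATH": (73.8, 76.6, 71.1),
    "ARC Challenge": (96.9, 96.7, 96.7),
    "GPQA": (51.1, 53.6, 59.4),
    "BFCL": (88.5, 80.5, 90.2),
    "Nexus": (58.7, 56.1, 45.7),
    "ZeroSCROLLS/QuALITY": (95.2, 90.5, 90.5),
    "InfiniteBench/En.MC": (83.4, 82.5, None),
    "NIH/Multi-needle": (98.1, 100.0, 90.8),
    "Multilingual MGSM": (91.6, 90.5, 91.6),
}


def leaders(scores):
    """Map each benchmark to the models holding the top reported score (ties kept)."""
    result = {}
    for bench, row in scores.items():
        best = max(s for s in row if s is not None)
        result[bench] = [m for m, s in zip(MODELS, row) if s == best]
    return result


def count_wins(scores):
    """Count, per model, the benchmarks where it leads or ties for the lead."""
    counts = {m: 0 for m in MODELS}
    for top_models in leaders(scores).values():
        for model in top_models:
            counts[model] += 1
    return counts


if __name__ == "__main__":
    for model, wins in count_wins(SCORES).items():
        print(f"{model}: leads or ties on {wins} of {len(SCORES)} benchmarks")
```

The counts add up to 16 rather than 15 because the Multilingual MGSM tie is credited to both models.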

Implications

  • The release of Llama 3.1 sets a new standard for open-source models, offering high performance across a wide range of tasks.
  • Its scalability and performance make it a strong contender in the landscape of AI models, challenging proprietary alternatives.

The compute behind Llama 3.1

As The Verge writes, training Llama 3.1 took more than 16,000 Nvidia H100 GPUs. Meta has not disclosed the cost, but it is estimated to be in the hundreds of millions of dollars.

Despite the significant investment, Meta remains committed to open source, releasing Llama 3.1 under a license that requires Meta's approval only for companies with hundreds of millions of users.

Meta's open-source advantage

Mark Zuckerberg, Meta’s CEO, believes that open-source AI models will eventually outpace proprietary models, drawing parallels to the rise of Linux as the dominant open-source operating system.

As Zuck writes in his post titled "Open Source AI Is the Path Forward":

Today, several tech companies are developing leading closed models. But open source is quickly closing the gap.
This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry.

He envisions Llama 3.1 as an inflection point in the industry, encouraging developers to primarily use open-source solutions. This strategy mirrors Meta’s previous Open Compute Project, which standardized data center designs and saved the company billions.

Partnerships and deployment

To facilitate the widespread adoption of Llama 3.1, Meta is collaborating with over two dozen companies, including tech giants like Microsoft (MSFT), Amazon (AMZN), Google (GOOGL), and Nvidia (NVDA).

These partnerships aim to help developers deploy their own customized versions of the model. Meta claims that Llama 3.1 costs roughly half as much to run in production as OpenAI's GPT-4o, making it an attractive option for businesses.

Enhanced capabilities & safety measures

Llama 3.1 is not just about scale; it also introduces advanced features such as the ability to integrate with search engine APIs and generate Python code for complex queries.
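To make the tool-use idea concrete, here is a minimal sketch of how an application might route a model's tool request to a search function or a code runner. Everything in it is an assumption for illustration: the JSON shape, the tool names `web_search` and `run_python`, and the stub implementations are hypothetical and are not Meta's actual prompt format or API.

```python
import json


# Hypothetical tools; a real app would call a search API or a code sandbox here.
def web_search(query: str) -> str:
    return f"[stub] top results for: {query}"


def run_python(code: str) -> str:
    # Stub only: model-written code should run in an isolated sandbox, never directly.
    return f"[stub] would execute {len(code)} characters of Python"


TOOLS = {"web_search": web_search, "run_python": run_python}


def dispatch(tool_call_json: str) -> str:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    # Pretend the model replied with this tool request instead of a final answer.
    model_output = '{"name": "web_search", "arguments": {"query": "weather in Menlo Park"}}'
    print(dispatch(model_output))
```

In a full loop, the application would pass the tool's result back to the model as a new message so it can write the final answer.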

Meta has also emphasized safety, conducting extensive adversarial testing to identify potential cybersecurity and biochemical misuse. The release comes with new trust and safety tools such as Llama Guard 3, Prompt Guard, and CyberSecEval 3, which are meant to support responsible deployment.

Global availability and future plans

Meta’s AI assistant, powered by Llama 3.1, is now accessible through WhatsApp and the Meta AI website in the US, with plans to expand to Instagram and Facebook soon.

The assistant supports multiple languages, including French, German, Hindi, Italian, and Spanish. However, after a certain number of prompts the assistant will switch from the most advanced 405-billion-parameter model to the more scaled-back 70-billion-parameter model, which suggests the larger model is expensive to operate.

Challenges and regulatory concerns

While Meta is excited about the potential of Llama 3.1, it faces regulatory challenges, particularly in the European Union.

Citing the unpredictable regulatory environment, Meta has decided not to release its upcoming multimodal AI models in the EU. This decision highlights the complexities tech companies face in complying with stringent data protection laws like the GDPR.

Summary: Llama 3.1 release

The release of Llama 3.1 represents a significant leap forward in the AI industry, setting new standards for open-source models.

With its advanced capabilities, cost-effectiveness, and collaborative deployment strategy, Llama 3.1 is poised to become a cornerstone in the AI ecosystem.

If you want to learn more, check out Rowan Cheung's interview with Mark Zuckerberg, which covers the new model, the future of AI/AGI, and more.

