Meta releases Llama 3.1: The largest open-source AI model yet



By Peter Foy

Meta (META) just released Llama 3.1, the largest open-source AI model to date, which Meta says outperforms some of the industry's leading proprietary models on several benchmarks.

Meta's latest instruction-tuned model is available in 8B, 70B and 405B versions.

Let's take a look at what this new model has to offer...

A new benchmark in open-source AI

Back in April with the release of Llama 3, Meta hinted at its ambitious plans to create an open-source AI model that could rival the best private models from companies like OpenAI. Today, that vision has come to fruition with the release of Llama 3.1.

According to Meta, Llama 3.1 surpasses the performance of OpenAI's GPT-4o and Anthropic’s Claude 3.5 Sonnet on multiple benchmarks. This model is available in three versions: 8 billion, 70 billion, and a staggering 405 billion parameters.
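To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 16-bit weights (2 bytes per parameter), which is an assumption for illustration rather than a figure from Meta, and it ignores activations, the KV cache, and any quantization. The function name weight_memory_gb is made up for this example.

```python
# Rough weights-only memory estimate. Assumption: 2 bytes per parameter
# (16-bit weights); activations and KV cache are NOT included.
MODEL_SIZES_BILLIONS = {"8B": 8, "70B": 70, "405B": 405}


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Return approximate weight storage in decimal gigabytes."""
    # billions of params * bytes per param = billions of bytes = GB
    return float(params_billions * bytes_per_param)


for name, size in MODEL_SIZES_BILLIONS.items():
    print(f"Llama 3.1 {name}: ~{weight_memory_gb(size):.0f} GB of weights at 16-bit")
```

Under that assumption the 405B model needs several hundred gigabytes for its weights alone, which is why it typically has to be split across multiple GPUs, while the 8B model fits on a single high-memory card.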

Llama 3.1 Benchmarks

Category	Benchmark	Llama 3.1 8B	Gemma 2 9B IT	Llama 3.1 70B	GPT 3.5 Turbo	Llama 3.1 405B	GPT-4 Omni	Claude 3.5 Sonnet
General	MMLU Chat (0-shot, CoT)	73.0	72.3 (0-shot, non-CoT)	86.0	69.8	88.6	88.7	88.3
	MMLU PRO (5-shot, CoT)	48.3	-	66.4	49.2	73.3	74.0	77.0
	IFEval	80.4	73.6	87.5	69.9	88.6	85.6	88.0
Code	HumanEval (0-shot)	72.6	54.3	80.5	68.0	89.0	90.2	92.0
	MBPP EvalPlus (base) (0-shot)	72.8	71.7	86.0	82.0	88.6	87.8	90.5
Math	GSM8K (8-shot, CoT)	84.5	76.7	95.1	81.6	96.8	96.1	96.4 (0-shot)
	MATH (0-shot, CoT)	51.9	44.3	68.0	43.1	73.8	76.6	71.1
Reasoning	ARC Challenge (0-shot)	83.4	87.6	94.8	83.7	96.9	96.7	96.7
	GPQA (0-shot, CoT)	32.8	-	46.7	30.8	51.1	53.6	59.4
Tool use	BFCL	76.1	-	84.8	85.9	88.5	80.5	90.2
	Nexus (0-shot)	38.5	30.0	56.7	37.2	58.7	56.1	45.7
Long context	ZeroSCROLLS/QuALITY	81.0	-	90.5	-	95.2	90.5	90.5
	InfiniteBench/En.MC	65.1	-	78.2	-	83.4	82.5	-
	NIH/Multi-needle	98.8	-	97.5	-	98.1	100.0	90.8
Multilingual	Multilingual MGSM (0-shot)	68.9	53.2	86.9	51.4	91.6	90.5	91.6

Here's a summary of the key takeaways from these benchmarks:

Performance Highlights

General Understanding: The 405B model scores 88.6 on MMLU Chat, within 0.1 points of GPT-4 Omni, though it trails Claude 3.5 Sonnet on MMLU PRO (73.3 vs. 77.0).
Coding Tasks: The 405B model scores 89.0 on HumanEval and 88.6 on MBPP EvalPlus, close behind Claude 3.5 Sonnet (92.0 and 90.5).
Mathematics: The 405B variant posts the top GSM8K score (96.8) and trails only GPT-4 Omni on MATH (73.8 vs. 76.6).
Reasoning and Tool Use: The 405B model leads on ARC Challenge (96.9) and Nexus (58.7), is second to Claude 3.5 Sonnet on BFCL (88.5 vs. 90.2), and trails both proprietary models on GPQA (51.1).
Long Context Handling: The 405B model leads on ZeroSCROLLS/QuALITY (95.2 vs. 90.5 for both GPT-4 Omni and Claude 3.5 Sonnet) and on InfiniteBench/En.MC (83.4).
Multilingual Capabilities: The 405B model ties Claude 3.5 Sonnet for the top Multilingual MGSM score (91.6).

Comparative Performance

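Llama 3.1 405B posts the top score, or ties for it, on 7 of the 15 benchmarks in the table, and it trails GPT-4 Omni or Claude 3.5 Sonnet by only a few points on most of the rest. That makes it competitive with the leading proprietary models rather than a clear winner across the board.
Even the smaller Llama 3.1 8B and 70B models are competitive in their size classes: the 8B beats Gemma 2 9B IT on every benchmark where both report a score except ARC Challenge, and the 70B beats GPT 3.5 Turbo on all but BFCL.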
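For readers who want to check the comparison themselves, here is a minimal sketch that tallies which of the three frontier models has the top score on each benchmark. The scores are copied from the table above; the function names leaders and wins are made up for this example. Keep in mind that a few cells use different settings (for instance, Claude 3.5 Sonnet's GSM8K score is 0-shot), so the tally is a rough guide rather than a strict apples-to-apples ranking.

```python
# Scores copied from the benchmark table above, in the order:
# (Llama 3.1 405B, GPT-4 Omni, Claude 3.5 Sonnet).
# None marks a score that is missing from the table.
MODELS = ("Llama 3.1 405B", "GPT-4 Omni", "Claude 3.5 Sonnet")
SCORES = {
    "MMLU Chat": (88.6, 88.7, 88.3),
    "MMLU PRO": (73.3, 74.0, 77.0),
    "IFEval": (88.6, 85.6, 88.0),
    "HumanEval": (89.0, 90.2, 92.0),
    "MBPP EvalPlus": (88.6, 87.8, 90.5),
    "GSM8K": (96.8, 96.1, 96.4),
    "MATH": (73.8, 76.6, 71.1),
    "ARC Challenge": (96.9, 96.7, 96.7),
    "GPQA": (51.1, 53.6, 59.4),
    "BFCL": (88.5, 80.5, 90.2),
    "Nexus": (58.7, 56.1, 45.7),
    "ZeroSCROLLS/QuALITY": (95.2, 90.5, 90.5),
    "InfiniteBench/En.MC": (83.4, 82.5, None),
    "NIH/Multi-needle": (98.1, 100.0, 90.8),
    "Multilingual MGSM": (91.6, 90.5, 91.6),
}


def leaders(scores):
    """Map each benchmark to the model(s) holding the top score (ties included)."""
    result = {}
    for bench, row in scores.items():
        best = max(s for s in row if s is not None)
        result[bench] = [m for m, s in zip(MODELS, row) if s == best]
    return result


def wins(scores, model):
    """Return the sorted list of benchmarks where `model` leads or ties."""
    return sorted(b for b, top in leaders(scores).items() if model in top)


for model in MODELS:
    print(f"{model}: leads or ties on {len(wins(SCORES, model))} of {len(SCORES)}")
```

Running the tally shows the three models splitting the benchmarks between them, with the Multilingual MGSM tie counted for both Llama 3.1 405B and Claude 3.5 Sonnet.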

Implications

The release of Llama 3.1 sets a new standard for open-source models, offering high performance across a wide range of tasks.
Its scalability and performance make it a strong contender in the landscape of AI models, challenging proprietary alternatives.

The compute behind Llama 3.1

As The Verge reports, training Llama 3.1 used more than 16,000 Nvidia H100 GPUs. Although Meta has not disclosed the exact cost, it is estimated to be in the hundreds of millions of dollars.

Despite the significant investment, Meta remains committed to open source, releasing Llama 3.1 under a license that requires Meta's approval only for companies with hundreds of millions of users.

Meta's open-source advantage

Mark Zuckerberg, Meta’s CEO, believes that open-source AI models will eventually outpace proprietary models, drawing parallels to the rise of Linux as the dominant open-source operating system.

As Zuck writes in his post titled Open Source AI Is the Path Forward:

Today, several tech companies are developing leading closed models. But open source is quickly closing the gap.

This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry.

He envisions Llama 3.1 as an inflection point in the industry, encouraging developers to primarily use open-source solutions. This strategy mirrors Meta’s previous Open Compute Project, which standardized data center designs and saved the company billions.

Partnerships and deployment

To facilitate the widespread adoption of Llama 3.1, Meta is collaborating with over two dozen companies, including tech giants like Microsoft (MSFT), Amazon (AMZN), Google (GOOGL), and Nvidia (NVDA).

These partnerships aim to help developers deploy their customized versions of the model. Meta claims that Llama 3.1 is roughly half as costly to run in production compared to OpenAI’s GPT-4o, making it an attractive option for businesses.

Enhanced capabilities & safety measures

Llama 3.1 is not just about scale; it also introduces advanced features such as the ability to integrate with search engine APIs and generate Python code for complex queries.

Meta has also emphasized the importance of safety, conducting extensive adversarial testing to identify potential cybersecurity and biochemical misuse. The release includes new trust and safety tools such as Llama Guard 3 and Prompt Guard, which are intended to support responsible deployment.

Global availability and future plans

Meta’s AI assistant, powered by Llama 3.1, is now accessible through WhatsApp and the Meta AI website in the US, with plans to expand to Instagram and Facebook soon.

The assistant supports multiple languages, including French, German, Hindi, Italian, and Spanish. However, after a certain number of prompts, the assistant will switch from the most advanced 405-billion-parameter model to the more scaled-back 70-billion-parameter model, which suggests that the larger model is expensive to operate.
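Meta has not published how this switch works, so the following is only an illustrative sketch of the general idea: serve the first few prompts with the large model, then fall back to the smaller one. The ModelRouter class, the model identifiers, and the prompt limit are all hypothetical and are not taken from Meta's implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelRouter:
    """Toy per-user router that falls back to a smaller model after a prompt quota."""

    large_model: str = "llama-3.1-405b"     # hypothetical identifier
    fallback_model: str = "llama-3.1-70b"   # hypothetical identifier
    large_prompt_limit: int = 3             # hypothetical; the real limit is not disclosed
    prompts_seen: int = 0

    def pick_model(self) -> str:
        """Count the incoming prompt and return the model that should serve it."""
        self.prompts_seen += 1
        if self.prompts_seen <= self.large_prompt_limit:
            return self.large_model
        return self.fallback_model


router = ModelRouter(large_prompt_limit=2)
for i in range(1, 5):
    print(f"prompt {i} -> {router.pick_model()}")
```

A production system would presumably track usage per user and per time window and might weigh prompt complexity as well, but the cost logic is the same: the most expensive model is reserved for a limited share of requests.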

Challenges and regulatory concerns

While Meta is excited about the potential of Llama 3.1, it faces regulatory challenges, particularly in the European Union.

Due to the unpredictable regulatory environment, Meta has decided not to release its multimodal AI models in the EU. This decision highlights the complexities tech companies face in complying with stringent data protection laws like the GDPR.

Meta will *not* release the multimodal versions of its AI products and models in the EU because of an unpredictable regulatory environment.
This means that EU users of Ray-Ban Meta won't be able to use the image understanding features.
It also means that the EU industry will not…
— Yann LeCun (@ylecun) July 19, 2024

Summary: Llama 3.1 release

The release of Llama 3.1 represents a significant leap forward in the AI industry, setting new standards for open-source models.

With its advanced capabilities, cost-effectiveness, and collaborative deployment strategy, Llama 3.1 is poised to become a cornerstone in the AI ecosystem.

If you want to learn more, check out this interview with Rowan Cheung and Mark Zuckerberg discussing the new model, the future of AI/AGI, and more:

Exclusive: Meta just released Llama 3.1 405B — the first-ever open-sourced frontier AI model, beating top closed models like GPT-4o across several benchmarks.

I sat down with Mark Zuckerberg, diving into why this marks a major moment in AI history.

Timestamps:

00:00 Intro… pic.twitter.com/wI0X86P0dM
— Rowan Cheung (@rowancheung) July 23, 2024
