Grok 4: The world's most powerful AI model

Introduction

On 10 July 2025, xAI launched Grok 4. In its livestream, xAI claimed it is the most intelligent AI model available today, and its benchmarks point in that direction as well. xAI announced two versions, Grok 4 and Grok 4 Heavy. According to xAI's results, both models show significant accuracy gains and beat other reasoning LLMs such as Gemini 2.5 Pro and o3 on several benchmarks. Grok 4 is a single-agent model, while Grok 4 Heavy is a multi-agent system.

Training

Grok 4 is not fundamentally different from Grok 3. The main changes are better reasoning and 10x more compute.

Humanity’s Last Exam

Humanity’s Last Exam (HLE) is a set of 2,500 PhD-level and other advanced questions across a broad range of fields such as mathematics, chemistry, languages, engineering and many more. It was created by the Center for AI Safety and Scale AI early this year in response to popular AI benchmarks having reached “saturation”. Grok 4 has a context window of 128,000 tokens in the app and 256,000 tokens in the API. On Humanity’s Last Exam, Grok 4 Heavy with tools achieved 44.4%, while Grok 4 achieved 38.6% with tools and 25.4% without tools. Elon Musk claims that the model is smarter than all graduate students in all disciplines.

HLE (text only)

HLE (text only) is the text-only subset of Humanity’s Last Exam, and it was also used to test Grok 4, with striking results. Grok 4 achieved 50.7% accuracy with external tools at test time, and approximately 42% without tools during training. The trend suggests that the more compute the model is given, the more accurate it becomes.

Grok 4 was also tested on other reasoning benchmarks such as AIME25, GPQA and LiveCodeBench (LCB). It performed well against other reasoning models and scored the highest accuracy on most of these benchmarks.

Andon Labs tested how well Grok 4 can manage a simple long-running business scenario: operating a vending machine. The AI agent must track inventory, place orders, set prices and cover daily fees. The model outperformed Claude Opus 4, o3, many other AI models and even humans. The leaderboard is available at Vending-Bench: Testing long-term coherence in agents | Andon Labs.

xAI also significantly improved voice mode: it is now 2x faster, offers 5 voices, and has reached 10x more daily user seconds.

Right now, Grok 4 and Grok 4 Heavy are available on grok.com, with Grok 4 Heavy requiring a $300/month subscription.

What is next?

In its livestream, xAI said it will soon announce a coding model that will be faster and smarter. In addition, its multimodal capabilities will be improved, and video generation and video understanding will be announced in the future.