Grok-3, developed by Elon Musk’s xAI, was unveiled on Monday, with the company making bold claims about its capabilities while showcasing a massive computing infrastructure that signals even bigger ambitions.
The announcement focused heavily on raw computational muscle, benchmark performance, and upcoming features, though many of the actual demonstrations felt like replays of what other AI companies have already achieved.
The star of the initial part of the show wasn’t the AI itself, but rather “Colossus,” a behemoth cluster of 200,000 GPUs that powers Grok-3’s training.
The system came together in two phases: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to the xAI developers, building this infrastructure proved more challenging than developing the AI model itself.
The company already has plans for an even more powerful cluster, with Musk saying they are aiming for five times the current capacity, effectively building what would be the most powerful GPU cluster on earth.
When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the regular model without Chain of Thought and reasoning embedded) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests.
It also seems very promising in blind tests.
xAI confirmed that the mysterious model codenamed “Chocolate” was actually an early test version of Grok-3 that was uploaded to the LLM Arena.
During those tests, it achieved the highest Elo rating among all the LLMs, meaning users preferred its answers over those of every other AI model in head-to-head comparisons, without knowing which model they were evaluating.
This is arguably one of the more reliable ways to measure quality, because developers cannot game it by training their models on the benchmark's datasets. The ranking is based purely on blind preference votes from thousands of anonymous users.
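To make the idea of an Elo rating built from blind votes more concrete, here is a minimal sketch of the classic Elo update applied to pairwise preferences. The starting rating of 1000, the K-factor of 32, and the function names are assumptions chosen for illustration. The arena's actual scoring pipeline is not described in the announcement and may differ.

```python
# Simplified Elo sketch. The start rating, K-factor, and helper names
# are illustrative assumptions, not the arena's real configuration.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool is conserved.
    return rating_a + delta, rating_b - delta


def rate(votes, start: float = 1000.0, k: float = 32.0) -> dict:
    """Replay (winner, loser) votes in order and return final ratings."""
    ratings = {}
    for winner, loser in votes:
        ra = ratings.setdefault(winner, start)
        rb = ratings.setdefault(loser, start)
        ratings[winner], ratings[loser] = update(ra, rb, True, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between anonymized models.
    votes = [("model_x", "model_y"), ("model_x", "model_z"), ("model_y", "model_z")]
    for name, rating in sorted(rate(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

A win against a higher-rated opponent moves the rating more than a win against a lower-rated one, which is why a model that keeps being preferred over strong rivals climbs to the top of the leaderboard.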

A specialized “Reasoning Beta” variant of Grok-3, which employs internal chain-of-thought processing and additional computing at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark, while the next-best models score below 87%.
Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to a longer training time.
In other words, the full-size Grok-3 still has room for improvement once it receives comparable training duration, which seems promising given its greater parameter count.
But when xAI moved to demonstrate Grok-3’s capabilities live, the presentation felt more like a game of catch-up than innovation. The team showcased the model solving physics problems and writing game code from scratch—impressive feats that ChatGPT, Claude, and Google’s Gemini mastered a while ago.
New tools, old tricks
They also introduced DeepSearch, a research agent that, like similar tools from OpenAI and Google, scours the web and generates extensive reports on given topics.
X Premium Plus subscribers get immediate access to Grok-3, but the most powerful version and updated versions will usually live in a dedicated standalone app or on Grok.com.
Voice interactions, similar to OpenAI’s “Advanced Voice Mode,” will arrive in the coming weeks, with Musk emphasizing this isn’t simple text-to-speech but a genuine AI voice model capable of natural, expressive speech.
Developers will get API access in the coming weeks, along with audio transcription capabilities, making Grok-3 a powerful tool for third-party AI-powered apps.
Just after showcasing an example of a Tetris game generated by Grok, xAI also revealed plans for an AI gaming studio that will let developers build games powered by Grok-3.
Right now, the model is being slowly rolled out. At the time of writing, Decrypt has yet to receive access to the model, but some enthusiasts have tried it and are so far pleased with the results.
Computer scientist Lex Fridman, one of the loudest voices in the AI space, praised Grok-3’s capabilities.
I got to use Grok 3 extensively (early). My mind is blown, very impressive model Congrats to Elon and the team for bringing it to life
— Lex Fridman (@lexfridman) February 18, 2025
Others compared it to leading market rivals.
“Grok 3 + Thinking feels somewhere around the state of art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking,” OpenAI co-founder Andrej Karpathy wrote in an extensive post on X. “For now, big congrats to the xAI team, they clearly have huge velocity and momentum.”
I was given early access to Grok 3 earlier today, making me I think one of the first few who could run a quick vibe check.
Thinking
First, Grok 3 clearly has an around state of the art thinking model (“Think” button) and did great out of the box on my Settler’s of Catan… pic.twitter.com/qIrUAN1IfD
— Andrej Karpathy (@karpathy) February 18, 2025
X user Penny2x shared a game built from scratch with Grok-3: a 2D platformer similar to Mario Bros.
The user appeared impressed by Grok’s ability to follow instructions and improve the game over several iterations.
“I just keep asking for adjustments, and it keeps spitting the game out in a single file that I can put on my desktop and run,” the user wrote in a post on X. “This is incredible. We live in the future. Everyone is a developer now.”
The game is available for testing at Thank Doge.
The company also confirmed plans to open-source Grok-2 once Grok-3 is fully mature and running correctly, which is expected to occur sometime in the coming months.
xAI previously open-sourced Grok-1, so the move would continue its trend of releasing older versions to spur innovation, though Grok-2 lags behind top-tier models.
For now, Grok-3 appears adept at matching what the best AI models can already do.
The real test will come when xAI rolls out its promised voice features, gaming tools, and API access in the weeks ahead. For now, the ball is in the court of OpenAI, which is set to release GPT-4.5 soon.
Edited by Sebastian Sinclair