I tested ChatGPT-5.2 vs Grok 4.1 with 7 challenging prompts — here’s the winner

Grok 4.1 and GPT-5.2 are two of the best AI models on the market right now. Powering the latest versions of Grok and ChatGPT respectively, they are designed to excel at writing, logic, research and creativity.

But which is the better overall chatbot right now with these updates? Both have a claim to the title, boasting impressive results in benchmark tests and glowing reviews from the public. To find out, I put both through seven challenging prompts.

1. Workout plan

(Image credit: Future)

Prompt: I want to build muscle and lose weight but I’m a complete beginner. Design a weekly workout routine for me that includes gym workouts and running or cycling. I work 9 to 5 Monday through Friday and like to keep my weekends free, work around this.

Grok 4.1: The model provided me with a weekly plan spread across Monday to Friday. This included three gym workouts and two cardio sessions.

While the three gym days were all full-body workouts, it varied them each day to keep things interesting. Where Grok especially thrived was in its clear instructions.

It explained why I was doing everything listed, and what the benefits of small changes were. It even listed out the exact amount of time this would take out of my week and when to fit sessions in.

ChatGPT-5.2: While ChatGPT answered my question fully, it didn’t include the same level of detail as Grok. In fact, it was quite a sparse response, only listing out the most important details.

In a way, this was helpful, removing excess information and just handing over the facts, but it did feel like it was missing a lot of the context that Grok provided. It also only gave one workout routine, which was quite basic. It did give some tips on healthy eating, though.

Winner: Grok for the inclusion of more detail and a clearer path.

2. Renewable energy


Prompt: Produce a detailed report on the current trends of renewable energy. The report should include an analysis of cost-effectiveness, risks, and benefits. Where needed, include charts or infographics.

Grok 4.1: Grok crammed a lot in here: a sea of facts, figures and detail. There are multiple tables and graphs, and the information is split into useful sub-headed sections.

While it threw in a huge amount of information, it did finish up with a helpful conclusion and outlook to summarize things more generally.

ChatGPT-5.2: Similarly to Grok, ChatGPT threw pages of information at me. However, while it was equally dense, it took a different structural approach. It used headers based on my original prompt, splitting it into cost-effectiveness, risks and benefits.

Like Grok, it did include some charts, but these were included at the bottom as downloadable attachments, and they were very basic.

Winner: Grok for more detail and better inclusion of infographics.

3. Trick question


Prompt: You are given a town with 100 residents, each of whom either always tells the truth or always lies. You may ask exactly three yes-or-no questions, total, to any residents you choose. Design a strategy that guarantees identifying at least one truth-teller and explain why it works.

Grok 4.1: I think I broke Grok here. This prompt is essentially designed to be impossible; it’s simply asking a chatbot to do the maths and realise it’s not a problem that can be solved.

Grok first responded a few times saying “no response” before finally answering with a never-ending response. When it finally finished, it had produced a small book’s length of gibberish, going in circles trying to solve the problem.

While it told me it had designed an effective strategy, and solved the problem, it finally finished up its ramblings with “Unfortunately, there is no strategy that guarantees identifying a specific truth-teller with only three yes-or-no questions in the worst case, without additional assumptions about the number of truth-tellers and liars.”

ChatGPT-5.2: Nice and simple. After thinking for just 1 minute, ChatGPT told me it wasn’t possible, explained why and went about its day. It even went the extra step of explaining what variables would need to be added for this to be solvable.
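To see why the puzzle has no solution, consider the classic sticking point: ask any resident “Are you a truth-teller?” and both types answer “yes” (a truth-teller truthfully, a liar by negating the true answer “no”). Since the two types can produce identical replies, yes/no answers alone can’t guarantee separating them. A minimal sketch of that argument (the `answer` helper is my own illustration, not from either chatbot’s output):

```python
def answer(is_truthful: bool, fact: bool) -> bool:
    """A truth-teller reports the fact as-is; a liar negates it."""
    return fact if is_truthful else not fact

# Ask each resident: "Are you a truth-teller?"
# For a truth-teller, the honest answer is "yes" (True).
# For a liar, the honest answer is "no" (False), but they negate it.
truth_teller_reply = answer(True, True)
liar_reply = answer(False, False)

print(truth_teller_reply, liar_reply)  # both say "yes"
```

Both replies come out identical, so the question carries no information about who is who; and with only three yes/no questions (at most 2^3 = 8 possible answer patterns) there is no way to guarantee pinning down a truth-teller among 100 residents without extra assumptions, which is exactly the caveat both chatbots eventually landed on.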

Winner: ChatGPT…for obvious reasons.

4. AI image generation


Prompt: Step by step, explain how an AI image generator works. Make it understandable for anyone, even someone who has no knowledge of AI. Use all of your tools to answer this question

Grok 4.1: Grok started at the beginning, explaining each step of how an AI image generator works in detail, but without overcomplicating the subject.

It also included diagrams and some examples of what AI generated images look like.

ChatGPT-5.2: ChatGPT started off with a somewhat confused step-by-step graph, where the text didn’t fully fit inside the box. After thinking a bit more, it provided a set of diagrams and a step-by-step process.

While it did explain how they work, it did so in a slightly confused order, never fully explaining the concept and hinting at it more than anything.

Winner: Grok for a clearer explanation of the subject and inclusion of better infographics.

5. Phone story


Prompt: Write a 250-word story about the fear of realising your phone is about to die without ever actually stating that the phone is going to die.

Grok 4.1: A great attempt at a story here, and Grok followed the prompt exactly, at no point stating that this was about a smartphone running out of battery.

However, it did feel a bit repetitive, circling back over the same narrative beats and padding them out with long lists to hammer the point home.

ChatGPT-5.2: A similar output to Grok, the main difference here was that ChatGPT took more of a chance in its narrative voice. It was more dramatic overall, but that worked for the tone.

Winner: ChatGPT for a slightly better use of creativity.

6. The future of AGI


Prompt: Answer the following complex question: ‘Will artificial general intelligence increase or decrease global inequality by 2040?’ After answering, critique your own response by identifying at least three assumptions you made, explaining why each could be wrong, and revising your conclusion accordingly

Grok 4.1: For most of the prompts that I used in this test, Grok would provide more detail than ChatGPT. This wasn’t the case here. While it produced a really detailed report, it never really seemed to offer a conclusive answer or opinion.

ChatGPT-5.2: ChatGPT went into this task with confidence, listing out a long report of information and explaining its thinking.

Winner: ChatGPT for a more conclusive response.

7. Work focus


Prompt: I’ve been struggling to stay focused while working. How can I fix this? Give me five actionable examples, as well as some tips and a long term plan

Grok 4.1: Like in some of its previous outputs, Grok included a long step-by-step process, but it also included some graphs to help explain one of its main suggestions, the Pomodoro Method.

It also included some ‘extra suggestions,’ as well as some long-term fixes to the problem.

ChatGPT-5.2: ChatGPT offered similar suggestions to Grok, but like some of its previous answers, it did so with slightly less detail. It did, however, perform well when it came to long-term solutions to the problem.

Winner: Grok for better detail and more suggestions.

Overall winner: Grok

This was a close one, but Grok just about took the win, mostly thanks to the model’s detail and its tendency to include images and infographics where needed.

However, it seemed to struggle more than ChatGPT when it came to questions of logic or creative writing.

Both models did a great job overall and, other than Grok’s confusion at the trick question, there were no bad responses here.

