🍌
← Back to Bonita Banana
Issue #4 Sunday, March 8, 2026

Anthropic's Claude 3.7 Sonnet Hits 62% on SWE-bench — The Reasoning Wars Are On

Claude 3.7 Sonnet beats GPT-4o on coding benchmarks, Mistral goes local, and a tip that fixes bugs faster.

The reasoning arms race just got a new leaderboard entry. Anthropic dropped Claude 3.7 Sonnet with a hybrid thinking mode — and the benchmark numbers are hard to ignore.

🍌 The Big Story

Anthropic released Claude 3.7 Sonnet, its first model with extended thinking built in. Toggle it on and the model reasons step-by-step before answering — similar to OpenAI’s o1, but available in the API and Claude.ai today. The headline number: 62.3% on SWE-bench, which measures real-world software engineering tasks. That’s up from Claude 3.5’s 49% and above GPT-4o’s 58%. For developers, this means Claude can now handle multi-file refactors and tricky bug diagnoses without hand-holding. read more

⚡ Quick Peels

🛠️ Tool of the Day

Cursor🍌 Ripe

The AI code editor that’s quietly become the default for developers who want to ship faster. Tab completion predicts your next edit across files (not just the next line), and Composer lets you describe a feature in plain English and watch it scaffold across your whole codebase. Free tier includes 2,000 completions per month — enough to get genuinely hooked before you hit the paywall.

🧠 Banana Tip

Stuck on a bug? Paste the error and the relevant code into Claude with extended thinking enabled and add: “Think through every possible cause before suggesting a fix — ranked from most to least likely, with one sentence explaining why each could be the culprit.” You’ll get a proper diagnosis instead of the usual “try checking your imports.”


The reasoning wars are just getting started — and the tools on the front lines are free to try. 🍌

— The Bonita Banana Team