Comparing AI Coding Assistants

In the rapidly evolving landscape of AI-powered coding assistants, developers are presented with a variety of options. But which tool truly stands out? I’ve put several popular AI coding tools through their paces, testing them on a real-world scenario: a ~70-file Python project, a roguelike game I developed some time ago (you can find it on GitHub: https://github.com/Julioevm/coten).

To ensure a fair comparison, I performed the following specific tests for each tool:

  1. Workspace-wide change: Requesting a new feature that leverages the existing codebase, requiring changes across several files (around 5). The expectation was to be guided through each necessary change.

  2. Code finding: Using the chat interface to inquire about the appropriate file for a specific change, particularly where to add another status bar in the UI.

  3. Localized change: Implementing a relatively small change in a specified file, potentially as a follow-up to the previous question if the correct location was identified. Suggestions for related changes in other files were also valued.

  4. Autocomplete: Evaluating the general satisfaction with auto-complete functionality, without attempting complex code generation.

  5. Inline issue fix: Attempting to resolve a simple linter-highlighted error in the code, specifically a basic Python typing issue (see the example below).
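
To make test 5 concrete, here’s a representative example of the kind of basic typing issue I mean; it’s an illustrative snippet, not the exact line from my project:

    from typing import Optional

    # Before: a linter flags the default value, since None is not a list[str].
    def get_names(entities: list[str] = None) -> list[str]:
        return [name.upper() for name in entities]

    # After: the kind of fix the tools were expected to suggest.
    def get_names_fixed(entities: Optional[list[str]] = None) -> list[str]:
        return [name.upper() for name in entities or []]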

I started with the same prompt in every case: the one that had worked well enough with Cursor, which was the first tool I tested, and where a single prompt did the job.

For example, for the first test I used:

“What changes should I perform in order to add a new functionality to my game?

The functionality is as follows: Weapons can have a special ability. You trigger it by pressing space.”

Arguably not the best prompt, but I wanted to try these tools in a natural way, rather than with the careful prompting an experienced user would apply. And since Cursor succeeded with it, I kept the same basic prompt for the other tests.
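
For a sense of what test 1 asks for, here’s a rough sketch of the shape such a feature could take, compressed into one snippet; every name here is a hypothetical stand-in, not the actual code from coten:

    from typing import Callable, Optional

    # weapon.py (hypothetical): weapons optionally carry a special ability.
    class Weapon:
        def __init__(self, name: str, special: Optional[Callable[["Actor"], None]] = None):
            self.name = name
            self.special = special  # effect applied when the ability is triggered

    class Actor:
        def __init__(self, weapon: Optional[Weapon] = None):
            self.weapon = weapon

    def whirlwind(wielder: "Actor") -> None:
        print("You spin, striking every adjacent enemy!")

    # input_handlers.py (hypothetical): space triggers the equipped weapon's ability.
    def handle_key(player: Actor, key: str) -> None:
        if key == " " and player.weapon and player.weapon.special:
            player.weapon.special(player)

    player = Actor(weapon=Weapon("Storm Blade", special=whirlwind))
    handle_key(player, " ")  # prints the whirlwind message

In the real project the change also has to touch item definitions, the event loop, and related UI, which is why I expected around five files to be affected.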

Let’s dive into how each tool performed:

Cursor: Setting a New Standard

Website: https://www.cursor.com
Models: GPT-4, GPT-4o, and Claude 3.5 Sonnet (Claude 3.5 used in this test)

Cursor has truly raised the bar for AI coding assistants. Here’s a breakdown of its performance:

  • Workspace-wide changes: In a single prompt, Cursor showed all the necessary changes with minimal input from the user. The suggested changes implemented the feature without issues. As a bonus, it even suggested updating the in-game help - a detail no other tool considered.

  • Code finding: Cursor indexes the codebase, resulting in very accurate file location based on functionality. It can also show related files where functions are used together.

  • Localized changes: Executes changes effectively, including updates to related files to ensure functionality.

  • Autocomplete: Quick and context-aware, even within quotes. It can suggest non-adjacent follow-up lines related to the last change.

  • Inline issue fixing: Resolves lint errors with a simple right-click.

  • User Interface: The UI makes it easy to apply changes for each file and check the diff for approval.

Summary: Cursor sets a new standard for AI assistants. Its code suggestions are more accurate and comprehensive, and the UX creates a seamless experience. Where other tools might require multiple prompts and file navigation, Cursor handles it all in one place. While it’s twice as expensive as some alternatives like Cody, the time saved could well justify the cost for many developers. Another caveat is that it’s a standalone editor, while most other options offer plugins for editors beyond VS Code.

GitHub Copilot: The Established Player

Website: https://github.com/features/copilot
Model: GPT-4o

Copilot has been a go-to for many developers. Here’s how it performed:

  • Workspace-wide changes: Copilot came close to identifying the right files for necessary changes but missed some. The implementation wasn’t given in full, requiring additional prompting for missing parts.

  • Code finding: Results were mixed. It struggled to find where specific code was handled in the codebase, but did better when asked for more specific changes.

  • Localized changes: Provided implementation for localized changes and suggested necessary changes to other files. It can apply changes but lacks a built-in diff viewer, requiring reliance on git diff.

  • Autocomplete: Performance could be better. It struggles to autocomplete when certain characters appear mid-line, often requiring the cursor to be at the end of a line.

  • Inline issue fixing: The suggested fix for the tested error was incorrect.

Summary: Copilot offers solid integration with VS Code and has most of the desired features, but some need polishing. The autocomplete experience lags behind competitors. While its chat and VS Code integration are strong, tools like Cody offer improvements at the same price point. Notably, Copilot is the only tool in this comparison without a free tier option.

Codeium: A Mixed Bag

Website: https://codeium.com
Models: Proprietary model, with options for Claude 3.5 Sonnet and GPT-4 (Sonnet used for chat tests)

Codeium shows promise but falls short in some key areas:

  • Workspace-wide changes: Provides a high-level overview of changes and suggests concrete changes for different files. However, the suggestions aren’t comprehensive enough to complete the feature, often requiring additional prompting or manual coding. The tool lacks functionality to automatically apply these suggestions.

  • Code finding: Struggled to locate where the functionality I asked about was implemented, though it found some related functions.

  • Localized changes: Required multiple prompts even when shown the correct context file. Even then, the implementation was often wrong or incomplete. The tool lacks functionality to apply the changes it suggests.

  • Autocomplete: Feels faster than Copilot and solves some issues with mid-line autocompletion. However, the quality of suggestions isn’t always high.

  • Inline issue fixing: Capable of addressing inline issues.

Summary: Codeium has fallen behind competitors, particularly in code suggestions and file navigation. It offers a generous free tier, but its performance doesn’t match up to other tools, especially in complex tasks. Users are limited to a few messages with the advanced models (GPT-4 and Claude 3.5 Sonnet) unless they subscribe, and the base model can fall short.

Supermaven: Limited But Potential

Website: https://supermaven.com
Models: GPT-4 or Claude 3.5 Sonnet (requires user’s API key in free mode)

Supermaven faces some significant limitations:

  • Workspace-wide changes: Unable to process the entire codebase.

  • Code finding: Limited by inability to access full projects.

  • Localized changes: Can suggest and apply changes to specific files but may introduce formatting issues, particularly with Python indentation. It lacks awareness of inter-file dependencies.

  • Autocomplete: Performs well, with quick suggestions that work in the middle of lines.

  • Inline issue fixing: Capable of fixing issues, showing diffs in chat, and auto-applying results.

Summary: Supermaven’s free tier feels restrictive, limiting its usefulness for complex projects. However, its pay-as-you-go API key option might appeal to some users. In its current state, it’s hard to find compelling reasons to choose Supermaven over the alternatives, at least on the free tier. That comparison may not be entirely fair, given that Copilot doesn’t have a free tier to begin with, but I shouldn’t have to hand over my credit card details just to test a tool.

Cody: A Strong Contender

Website: https://sourcegraph.com/cody
Models: Claude 3.5 Sonnet (with support for GPT-4o, Gemini, Mistral, local Ollama, and more)

Cody impresses with its capabilities and flexibility:

  • Workspace-wide changes: Offers a high-level list of changes. After another prompt, it shows the code that needs changing. Compared to Cursor, the changes are more complex and don’t fit as nicely with the existing codebase. While the initial suggestions alone wouldn’t make the feature work, they came close in one attempt.

  • Code finding: Performance varies. It might not show the exact file for changes but suggests function names to look for and possible code implementations. In another attempt, it found a relevant function but not the precise location for the change.

  • Localized changes: Executes file-specific changes well, with diff viewing and direct application. While it doesn’t immediately suggest changes in other files, it mentions their necessity, allowing for further prompting to get complete changes.

  • Autocomplete: Fast, but not always smooth. Sometimes adds unnecessary brackets and doesn’t always trigger mid-line. This area could use improvement.

  • Inline issue fixing: Capable of addressing lint errors with a right-click.

Summary: Cody is a pleasant surprise, offering a robust set of features at a competitive price point ($10/month, same as Copilot). It has a generous free tier and supports integration with local Ollama instances. Custom commands for quick, specific prompts add to its versatility. While some areas need polishing, Cody shows great promise and outperforms Copilot in several aspects.

Conclusion

After putting these AI coding assistants to the test, it’s clear that the field is evolving rapidly. Cursor stands out as the current frontrunner, offering an exceptionally smooth experience that could justify its higher price point for many developers. However, Cody emerges as a strong contender, providing a robust set of features at a competitive price.

Copilot, while still solid, seems to be falling behind in some areas, particularly in autocomplete functionality. Codeium and Supermaven, while having their strengths, currently lag behind the top performers in this comparison.

The landscape of AI coding tools is rapidly changing. While this comparison provides a snapshot of current capabilities, it’s always worth keeping an eye on new developments and updates from these and other emerging tools in the space.
