[ { "speaker": "Speaker 1", "text": "I've been writing code for the last 15 years and working as a software engineer for the last 10. And yet it feels like software engineering as a whole is changing so fast every couple months as" }, { "speaker": "Speaker 1", "text": "new models come out or as coding agents get better. And it can be kind of hard to understand these trends. So, at Cursor, we just released a bunch of data on how AI-assisted development and coding" }, { "speaker": "Speaker 1", "text": "agents have been changing the field of software engineering. So, I want to read through it live and give some of my commentary on how I think about this and how it's affected my own work as an" }, { "speaker": "Speaker 1", "text": "engineer. This developer habits report is going to talk about five things. So, coding speed is doubling year over year." }, { "speaker": "Speaker 1", "text": "We're seeing larger PRs. We'll talk more about that. Agent-generated code is sticking around." }, { "speaker": "Speaker 1", "text": "Uh we've benchmarked many different model families, and the cost per line and the cost to actually submit these requests uh is very different across different models and different providers." }, { "speaker": "Speaker 1", "text": "We see this trend of the top 1% power users of AI and coding agents really being very productive and having a larger separation from the rest of users." }, { "speaker": "Speaker 1", "text": "Uh context is growing. We see a dramatic increase in input tokens and a shift towards trying to cache as much as possible, which makes sense. And then we're seeing a lot of people evolve from" }, { "speaker": "Speaker 1", "text": "more basic prompts to kind of building this system, building this factory that's going to help you produce high-quality software." }, { "speaker": "Speaker 1", "text": "And we have some interesting data to help talk about this. So, let's start with developer acceleration."
}, { "speaker": "Speaker 1", "text": "Developers are now adding more code per week, with growth accelerating since the start of 2026." }, { "speaker": "Speaker 1", "text": "Now, I wanted to add this. It's not a perfect metric, but I do think that lines of code added is at least directionally interesting. Obviously, you can add lots of bad code and that's" }, { "speaker": "Speaker 1", "text": "actually a net negative for the codebase. But when you do look at this in aggregate across the Cursor user data, it does show this trend of how more and more developers are both creating more projects, trying out new" }, { "speaker": "Speaker 1", "text": "ideas, trying out new prototypes. Uh other people outside of the development process are able to contribute to building software, which I think is uh by and large a very good thing. But" }, { "speaker": "Speaker 1", "text": "that does come with its own challenges, so we can't look at just that one bit of data um in isolation. Notably, code additions are growing per PR. So, the lines added per PR are up two and a half" }, { "speaker": "Speaker 1", "text": "times year over year, and that's continuing to grow. And then specifically, there's this mega PR trend, which is fascinating to me. PRs with a thousand lines changed are becoming more and more common. Which" }, { "speaker": "Speaker 1", "text": "makes sense. You see people who are kind of vibing out these PRs with tons of changes. Maybe they don't yet understand what a lock file is and why it added, you know, thousands of lines of code in" }, { "speaker": "Speaker 1", "text": "their diff, or maybe they accidentally generated some code and it needs to be ignored in Git, or maybe it's actually just a ton of code." }, { "speaker": "Speaker 1", "text": "And I think this is really interesting for two reasons. 
One, you see the spike around the holidays, which I think is when a lot of people started to explore the latest models, Opus 4.5, kind of" }, { "speaker": "Speaker 1", "text": "getting into Cursor, trying this and applying it, or other coding agents. And then secondly, I think that these mega PRs do pose a real challenge for developers. It's hard to maintain" }, { "speaker": "Speaker 1", "text": "quality as the number of lines of code produced grows, and in general, that code can become a liability. I had a tweet about this earlier if you want to go check it out, but you should be trying to minimize the" }, { "speaker": "Speaker 1", "text": "amount of code, and an agent without proper wielding and control is going to be happy to write a lot of code for you." }, { "speaker": "Speaker 1", "text": "Some of that code you might not even need. It might be overly defensive. It might be overly backwards compatible for situations that don't really even matter or don't exist. So, it really takes a" }, { "speaker": "Speaker 1", "text": "lot of nuance to do this well, and a lot of these patterns are still really being figured out." }, { "speaker": "Speaker 1", "text": "But interestingly, in the past couple months, if you look at tool calls, so writing or editing files or running shell commands, searching the web, for example, you're seeing this continue to" }, { "speaker": "Speaker 1", "text": "rise, by about 30%. And for me, the way I think about this is agents and models are generally getting better at calling tools. And tool calls are a pretty good approximation of an agent's" }, { "speaker": "Speaker 1", "text": "usefulness. If they're making more file changes, if they're reading the web, if they're running shell commands, it's probably a more productive and helpful agent for you. So, I kind of view this chart as general" }, { "speaker": "Speaker 1", "text": "model intelligence and model quality improving over time, which I find really interesting."
}, { "speaker": "Speaker 1", "text": "And then, the last part in this section, AI-generated code is surviving longer." }, { "speaker": "Speaker 1", "text": "This is a really interesting stat that Cursor can help provide through our data: AI lines that have been accepted are still present after 60 minutes. And, you know, you could argue what the" }, { "speaker": "Speaker 1", "text": "correct duration of that time should be." }, { "speaker": "Speaker 1", "text": "I think for me, what I take away is that code is very sticky, and that's why a lot of people say that adding tons and tons of code is kind of a liability to the maintenance of that software over" }, { "speaker": "Speaker 1", "text": "time. So, with great power comes great responsibility. Yes, it's amazing that we can generate and write lots of code, but the, you know, the senior staff engineers, the codebase architects, the" }, { "speaker": "Speaker 1", "text": "people who are thinking about how to build these systems and make them maintainable over time, are also trying to fight against that, and they're using AI to work against the AI to make code review" }, { "speaker": "Speaker 1", "text": "easier, to make sure we're not adding things that we don't need. And I expect this trend just to continue to rise. I don't think there's a perfect solution in the market right now that" }, { "speaker": "Speaker 1", "text": "has figured this out. Everyone is still grappling with the ease of generating code through agents and what that means for the software systems that we maintain over time." }, { "speaker": "Speaker 1", "text": "Okay, section two." }, { "speaker": "Speaker 1", "text": "Uh intelligence. This is really interesting. I mean, there's a deeper philosophical thing here, I think, which is why do people like Opus? Why do people like GPT? Why do people like other models? 
And there" }, { "speaker": "Speaker 1", "text": "is um some correlation between how you view the warmth and the response of the model and the brand of the model, and you're kind of building this relationship, kind of like it was a coworker, to where you" }, { "speaker": "Speaker 1", "text": "might be willing to pay a premium for using that model. I think a lot of people like the Claude models, a lot of people like the GPT models, and both of them are somewhat converging on some" }, { "speaker": "Speaker 1", "text": "best practices for how you should respond, how eager you should be to make edits, how much you should push back, how much you should continue working on long-horizon tasks without having to ask" }, { "speaker": "Speaker 1", "text": "a bunch of clarifying questions." }, { "speaker": "Speaker 1", "text": "This is really tough, and getting this model behavior right has been a multi-year effort for um all of the model labs." }, { "speaker": "Speaker 1", "text": "So, it's interesting here to see that generally the Claude models are a bit more expensive. But then when you look at the cost per accepted line, this is where things start to get interesting. So, if a model is more" }, { "speaker": "Speaker 1", "text": "expensive, but it helps you get your job done faster, does that mean that it's actually the same cost as maybe a cheaper model that's going to make a bunch of edits or just work a lot" }, { "speaker": "Speaker 1", "text": "longer? I think in some cases, yes. Like if you can get the most intelligent model to one-shot something, it will be overall a lower price than a very unintelligent model just spinning its" }, { "speaker": "Speaker 1", "text": "wheels for a very long time. But this is a nuanced question because for most of your agentic work, probably not everything requires that level of intelligence or that level of um you" }, { "speaker": "Speaker 1", "text": "know, one-shot capabilities. 
So, if you use the most expensive model for everything, it will add up, and I think we've seen that um already happening with some companies as they're" }, { "speaker": "Speaker 1", "text": "trying to figure out how to balance um using very, very smart models and also the economics of thousands of engineers who are writing code now every day using these tools and trying to find the right" }, { "speaker": "Speaker 1", "text": "balance of price, performance, and cost." }, { "speaker": "Speaker 1", "text": "And so, we're seeing this uh both in our own internal evals called CursorBench, as well as external evals where people are trying to figure out where on this chart they want to fit. So," }, { "speaker": "Speaker 1", "text": "the average cost per task is on the x-axis, and on the y-axis we have the percentage that the model scores on our evals. And you know, you might argue that well, this is CursorBench, so obviously" }, { "speaker": "Speaker 1", "text": "Cursor models are going to score well here. I think there are many ways that we try to make that not true, and make sure that we're not just, you know, padding our own" }, { "speaker": "Speaker 1", "text": "benchmarks, right? But, I think it's also good to take this with a grain of salt, and compare it against other external benchmarks. For example, we report on Terminal-Bench and SWE-bench" }, { "speaker": "Speaker 1", "text": "Multilingual, and then there's Artificial Analysis and a bunch of other external benchmarks where you can kind of make your own comparison here. So, for example, the Artificial Analysis" }, { "speaker": "Speaker 1", "text": "benchmark is pretty similar to the results that we're seeing here." }, { "speaker": "Speaker 1", "text": "Um but, what it's trying to show, in my opinion, the bigger conversation, is how much do we value intelligence at a certain price point? And especially the total time it takes to get a task done."
}, { "speaker": "Speaker 1", "text": "If there's a fast variant of a model, and it's economically much more affordable than maybe a different fast model, does it make more sense for us to use that and get the work done more" }, { "speaker": "Speaker 1", "text": "quickly?" }, { "speaker": "Speaker 1", "text": "Kind of depends. Kind of depends on the team." }, { "speaker": "Speaker 1", "text": "Okay. The power user gap, number three." }, { "speaker": "Speaker 1", "text": "Uh This is really interesting. This one kind of blew my mind. I mean, it makes sense. You see people building these wild things on X, and you hear about these people who are just using a" }, { "speaker": "Speaker 1", "text": "ton of agents and creating wild things." }, { "speaker": "Speaker 1", "text": "And when you look at this usage, you see a small share of developers that's just writing a ton of code with AI, or building these uh very complex agent systems, and using a lot of tokens. And" }, { "speaker": "Speaker 1", "text": "they're using tokens to automate software, which we'll talk about later." }, { "speaker": "Speaker 1", "text": "Um but, I think for me, the thing to take away here is somewhat similar to lines of code produced. I think tokens consumed is not a perfect measure. I think there is some amount of token" }, { "speaker": "Speaker 1", "text": "waste here, where even the people in the top 1%, on the bleeding edge of trying out and using these models and these agents as much as they can, are well aware that this is not a perfect measure of" }, { "speaker": "Speaker 1", "text": "productivity, that some tokens are kind of wasted in the pursuit of, whatever this means to you, becoming more AI native, figuring out AI agent workflows, whatever, you know," }, { "speaker": "Speaker 1", "text": "word salad you want to use to describe that." }, { "speaker": "Speaker 1", "text": "Everyone is trying to disrupt themselves and figure out how they use these tools."
}, { "speaker": "Speaker 1", "text": "And a lot of people, especially a lot of companies, are willing to have some error bars on how much of that token usage is actually just kind of throwaway in pursuit of that larger" }, { "speaker": "Speaker 1", "text": "goal of upskilling or reskilling an entire workforce. So, it's interesting." }, { "speaker": "Speaker 1", "text": "I think what it means to me is it is worth investing in learning the latest models, learning the latest agents. And it's also okay to come at that with this critical eye of cost to intelligence to" }, { "speaker": "Speaker 1", "text": "performance and trying to get the most value out of the tools that you're using." }, { "speaker": "Speaker 1", "text": "So, when you look at these developers kind of pulling away from the median developers, this chart is measuring it in lines of code per week, which we've already kind of talked about is, you" }, { "speaker": "Speaker 1", "text": "know, an imperfect measure, but still interesting." }, { "speaker": "Speaker 1", "text": "And then here's another interesting thing. Beyond lines of code, compared with what the median active user merges," }, { "speaker": "Speaker 1", "text": "these developers merge 15 times more PRs." }, { "speaker": "Speaker 1", "text": "Um so, that's really interesting. I actually like merged PRs as a better metric than just lines of code added cuz presumably um for some compliance reasons, a lot of companies need to have at least" }, { "speaker": "Speaker 1", "text": "one human reviewer sign off on a PR." }, { "speaker": "Speaker 1", "text": "So, if that's the case and a PR is getting merged, there is at least some human reviewer doing some sort of check." }, { "speaker": "Speaker 1", "text": "Now, it might just be a cursory check, no pun intended, but they're doing some review of the code."
}, { "speaker": "Speaker 1", "text": "And for a PR to get merged and to know that that code is going to production and someone is going to be responsible for that code, it is a higher bar than I just generated a bunch of code and I" }, { "speaker": "Speaker 1", "text": "just like vibed out this prototype, right? So, that's pretty interesting and I think it speaks back to what I was saying where the P99 people are really trying to get the most out of these models and figure out" }, { "speaker": "Speaker 1", "text": "how to kind of reimagine their workflows. Now, on context, this one is maybe not as interesting, but I think still pretty interesting as kind of a footnote. A lot of these things, the" }, { "speaker": "Speaker 1", "text": "coding agents and the harnesses are taking care of for you and handling behind the scenes, but over time, you're going to see a lot more input tokens, which makes sense. Like, you give the models," }, { "speaker": "Speaker 1", "text": "um, you know, a relatively simple prompt and then they go and do a lot of work for you. So, when you look at this input mix, it's 90% of the input/output token volume," }, { "speaker": "Speaker 1", "text": "making the context the dominant part of the non-cached model usage. And if you jump down to input context, you're seeing that's basically the main token cost now. So, it's dominating that token" }, { "speaker": "Speaker 1", "text": "consumption and it's become the majority of the price-equivalent token usage, going since the start of the year from roughly half of the input/output token cost to nearly 70%. And now if we drill in like one final" }, { "speaker": "Speaker 1", "text": "time, most of that is going to be cached. 
So, the cache read tokens are going to dominate all of that total token activity, which really shows how important it is in this agent work to reuse" }, { "speaker": "Speaker 1", "text": "prior context, to use prompt caching. You know, there are new strategies that model providers are putting um, into the models that they're offering to better take" }, { "speaker": "Speaker 1", "text": "advantage of the cache. As the context grows, as you ask agents to do these very very long threads or conversations and then compact or self-summarize over time, it's very important that you can" }, { "speaker": "Speaker 1", "text": "cache as much as possible in that. And there's a lot of nuance to doing this well with things like tool calling, and as these models get more complicated. So, we have a blog post if you want to learn more about how we try" }, { "speaker": "Speaker 1", "text": "to maximize caching across models and providers, but it's a very tough problem." }, { "speaker": "Speaker 1", "text": "And kind of to wrap up, the interesting thing to take away is you might think of people working with coding agents or writing code with AI as doing you know simple prompts into a text input or" }, { "speaker": "Speaker 1", "text": "writing plans and reviewing plans and making those changes, but I think increasingly what you're starting to see is more changes accepted without manual review. In my opinion, this number is lower than" }, { "speaker": "Speaker 1", "text": "what I think you would expect if you were in the SF/X zeitgeist online bubble." }, { "speaker": "Speaker 1", "text": "Like 30, what is this? 36%? I mean, if you follow just the online hype cycle, you would think that oh everyone's just merging code without review, but I think for most teams, like" }, { "speaker": "Speaker 1", "text": "I talked about, for compliance reasons or software quality maintenance reasons, it's still really important to have that review."
}, { "speaker": "Speaker 1", "text": "The thing that I take away from this more than anything else, for all its pros and cons, is that we're starting to build these agent systems where it's really important to figure out how" }, { "speaker": "Speaker 1", "text": "you build this repeatable software factory and you automate the best practices of building software at scale across your team, whether that's building something on top of an agent SDK that kind of abstracts" }, { "speaker": "Speaker 1", "text": "away some of the harness and the models or you're setting up automations that will do security reviews or code reviews or other things." }, { "speaker": "Speaker 1", "text": "Uh I think this is the next trend that we're seeing uh where those automations, those cloud agents, those systems can run overnight. They can run in the background, and they can help you" }, { "speaker": "Speaker 1", "text": "make better quality software, and ultimately a better product at the end of the day um as you continue to improve and redefine what software engineering looks like." }, { "speaker": "Speaker 1", "text": "So, that's the report. I know I tried to breeze through it and give some of my commentary, but I found this really interesting. This data is helpful to kind of understand how software" }, { "speaker": "Speaker 1", "text": "engineering as an industry is changing." }, { "speaker": "Speaker 1", "text": "So, let me know what you thought was most interesting, and if we should do something like this again." }, { "speaker": "Speaker 1", "text": "Peace." } ]