[ { "i": 0, "speaker": "Speaker 1", "text": "And you have an agent running right now?" }, { "i": 1, "speaker": "Speaker 1", "text": "I do. As I know from setup, what is your agent doing today?" }, { "i": 2, "speaker": "Speaker 2", "text": ">> Uh today we as Harry mentioned we're we've been doing some KV cache compaction stuff recently." }, { "i": 3, "speaker": "Speaker 2", "text": ">> Um so trying to figure out how we can extend context windows by um compacting a KV cache. Um so we have probably between us about 64 to 128 agents working on this at any given time >> right now. How many guys are on that on" }, { "i": 4, "speaker": "Speaker 2", "text": "actively running jobs right now? I think I've got 16 nodes um of eight GPUs each and then the agents partition those how they like. Yeah." }, { "i": 5, "speaker": "Speaker 3", "text": ">> And I find a lot of a lot of the work like if you want to have a lot of agents going. I feel like normally >> I'll have like a few agents that I actually talk to and the other ones are" }, { "i": 6, "speaker": "Speaker 3", "text": "sort of like delegated tasks like I tell my main agent to like delegate tasks to the other ones. Um >> yeah, I like sort of set up some like messaging scripts so that they they can" }, { "i": 7, "speaker": "Speaker 3", "text": "like send direct user messages to each other. Um, and so yeah, I'll just I'll just say like, oh, you know, how's my I gave them all like different like mathematician names, you know, like I'll" }, { "i": 8, "speaker": "Speaker 3", "text": "be like, oh, how's Poincaré doing today?" }, { "i": 9, "speaker": "Speaker 3", "text": "Or, you know, what's Hilbert up to? You know, >> that's cool. So, you actually remember which one is doing what by the mathematician like, oh, you know, Hilbert's working on the evals, you know." 
}, { "i": 10, "speaker": "Speaker 1", "text": "Um, I was very, um, excited to meet you guys and eager to hear maybe kind of what you work on at Baseten and kind of how you got where you were. And to introduce myself, I'm Sam Whitmore. I" }, { "i": 11, "speaker": "Speaker 1", "text": "work at Cursor and I am an engineer on the cloud agents team. I've worked at Cursor for about six months. Um, but yeah, hand over to you guys." }, { "i": 12, "speaker": "Speaker 2", "text": ">> So, my co-founders Moody Max and I started Parsed uh, started last year." }, { "i": 13, "speaker": "Speaker 2", "text": "Um, and at the beginning it was sort of blue sky research on like, okay, we're very interested in open source models." }, { "i": 14, "speaker": "Speaker 2", "text": "We're very interested in supporting an ecosystem that wasn't just, you know, one or two >> closed frontier models. Um, how do we go about that? And I think we pretty quickly narrowed in on the thesis of" }, { "i": 15, "speaker": "Speaker 2", "text": "specialization and post training in particular as a way to like allow people to own their own intelligence. And I think when we first started that was quite a contrarian take. Um, many" }, { "i": 16, "speaker": "Speaker 2", "text": "people were quite skeptical that you could really do um valuable economically valuable things with um open source models. Um but we believed in the thesis and I think um sort of by the middle of" }, { "i": 17, "speaker": "Speaker 2", "text": "last year um the base intelligence and capabilities of open source um had gotten good enough that we could actually start to specialize them particularly for these like kind of subtasks or very repeatable things that" }, { "i": 18, "speaker": "Speaker 2", "text": "that a lot of companies were doing. Um and then yeah, we were using Baseten for inference. Um we thought they were the best inference providers in the world." }, { "i": 19, "speaker": "Speaker 2", "text": "We tried all of them. 
Um and then I think at one point Baseten noticed that we were sort of directing a lot of of new inference demand um onto Baseten GPUs. And so we had a bunch of chats and and" }, { "i": 20, "speaker": "Speaker 2", "text": "realized that like this would be a much more powerful thing if we could do this together. If you could couple the inference and like the real world signals you were getting from from" }, { "i": 21, "speaker": "Speaker 2", "text": "inference and and users telling you what they did or didn't like about the particular way that your language model was performing in your harness. And when you couple that with training like" }, { "i": 22, "speaker": "Speaker 2", "text": "that's a really powerful paradigm. Um it like kind of connects together this feedback loop. Um which is much more powerful than what you can achieve with like you know prompting and and" }, { "i": 23, "speaker": "Speaker 2", "text": "constantly just iterating on your harness. Um, and so yeah, we we've been at Baseten ever since and Harry was one of our first employees at Parsed." }, { "i": 24, "speaker": "Speaker 3", "text": ">> I wanted to work in AI and I'd been interested in it for such a long time and so yeah, when I heard Charlie was uh doing a startup, you know, I had a chat um and yeah, just, you know, I was like," }, { "i": 25, "speaker": "Speaker 3", "text": "\"Oh, wow. Charlie really knows what he's talking about.\" Um, and so, so yeah, then I I joined up um and we had a lot of fun. I think the thing is because of the scale we were at like initially, you" }, { "i": 26, "speaker": "Speaker 3", "text": "know, we we really had to sort of um scrap around and like figure out um how to do things more efficiently. Um and so that was kind of fun. Yeah. Like just, you know, kind of inventing some new" }, { "i": 27, "speaker": "Speaker 3", "text": "sort of strategies that's going to like basically get us like really good performance but just you know without using like a ton of compute." 
}, { "i": 28, "speaker": "Speaker 3", "text": "Since then I've been like now we do have access to a ton of compute and um and it's uh it's a lot more fun. we can we can do a lot more larger scale experiments. So yes, it's great." }, { "i": 29, "speaker": "Speaker 1", "text": ">> How much responsibility can you give an individual Hilbert or Gauss before it starts to break down?" }, { "i": 30, "speaker": "Speaker 3", "text": ">> Generally, uh yeah, I need to I need to like make sure I've specified cleanly what they all need to do. Yeah. And um uh but but typically like I think another failure mode sometimes is like" }, { "i": 31, "speaker": "Speaker 3", "text": "sometimes they just like stop working >> and I need to like set up a a loop to like keep on reminding them of everything. Particularly if things are running like overnight I'll normally put" }, { "i": 32, "speaker": "Speaker 3", "text": "in like some reminder um like little message that just has like a list of all these things like make sure you're checking this, check that, check, you know. Um >> do you enforce this in any way like or" }, { "i": 33, "speaker": "Speaker 3", "text": "is this kind of just like hoping they adhere to the prompt you give? Uh I mean I guess like I have a few you know generally I run uh them you know just allowing them to do uh everything but I" }, { "i": 34, "speaker": "Speaker 2", "text": "I have a few like checks for you know uh entering any dangerous commands. And I think like a powerful a new powerful abstraction which you're starting to see with these goal loops and stuff is like" }, { "i": 35, "speaker": "Speaker 2", "text": "if you put a lot of effort into the prompt at the start um and you put a lot of effort into defining exactly how you might verify that a task is complete like you can use a separate agent as an" }, { "i": 36, "speaker": "Speaker 2", "text": "LLM as a judge to continuously check whether the main agent has completed its task or not." 
}, { "i": 37, "speaker": "Speaker 2", "text": ">> Um and so when you've got the main agent doing that itself sometimes it likes to cheat and say I think I've done like or I've done my best like I'm going to stop now. But if you have a separate agent" }, { "i": 38, "speaker": "Speaker 2", "text": "which is like constantly being called and saying no, you actually haven't completed and then as Harry said, you kind of have this hook which forces it to keep going. Um these things can run" }, { "i": 39, "speaker": "Speaker 1", "text": "for like days." }, { "i": 40, "speaker": "Speaker 1", "text": ">> How do you actually set up this harness for yourself right now?" }, { "i": 41, "speaker": "Speaker 2", "text": ">> Yeah, I mean like the I don't like to put too much effort into the harness because I think there's so many people in the world working on on harnesses and working on understanding how the models" }, { "i": 42, "speaker": "Speaker 2", "text": "operate within those harnesses. Yeah, I I tend to just keep an eye on like, you know, what Claude Code's doing, what Codex is releasing, and then like porting that into my setup, which at the" }, { "i": 43, "speaker": "Speaker 3", "text": "moment is Cursor agents." }, { "i": 44, "speaker": "Speaker 2", "text": ">> Yeah, you said you you you like ported the code directly." }, { "i": 45, "speaker": "Speaker 3", "text": ">> You can just get your Cursor agent to look at the Rust for Codex and say, \"Okay, how do they implement this and build it yourself for you?\"" }, { "i": 46, "speaker": "Speaker 3", "text": ">> Yeah, exactly. I feel like that's the kind of thing. It's just like it's so easy to have your agents build infrastructure for themselves." }, { "i": 47, "speaker": "Speaker 1", "text": ">> We've been doing a ton with that internally. 
Um, one of our engineers, Lauren, who was working on our new Cursor 3 launch, um, was trying to do a bunch of performance improvements for" }, { "i": 48, "speaker": "Speaker 1", "text": "it. And she built basically a skill that could actually like launch and drive an instrumented Cursor, the Cursor 3 application, and like verify like a previous state of there being a performance problem, then like verify" }, { "i": 49, "speaker": "Speaker 1", "text": "that the performance problem was gone after the fact. Basically like automate the QA process. So, we've been once she did that, we were like, \"Okay, this seems like something we need to be" }, { "i": 50, "speaker": "Speaker 1", "text": "investing a lot of time into.\"" }, { "i": 51, "speaker": "Speaker 3", "text": ">> Yeah, exactly. I think anytime you got something which is really verifiable, it's just so easy to have an agent just like hill climb it over like overnight or over a few days." }, { "i": 52, "speaker": "Speaker 3", "text": ">> Um, the hard ones are where it's like it's not quite verifiable. You know, in particular, like if you're trying to design some evals and you want it to be good, >> but it's really hard to specify exactly" }, { "i": 53, "speaker": "Speaker 3", "text": "what you mean by good. And normally I feel like in that sort of scenario I'll have them, you know, I'll I'll try to describe in best as I can, you know, what I want they missed. Um, and yeah, I" }, { "i": 54, "speaker": "Speaker 3", "text": "feel like I feel like that's in general >> I'm a believer that like they can write code for you, but they can't actually read code for you. You still have to read the code. 
you know, I guess they" }, { "i": 55, "speaker": "Speaker 3", "text": "can they can summarize, they can help explain, but >> if you really want to get like the most I feel like you still need to actually understand what's actually going on in the code," }, { "i": 56, "speaker": "Speaker 2", "text": ">> I think like taste to make it really explicit like taste is the big bottleneck now >> like when it is this kind of non-verifiable or quasi verifiable task." }, { "i": 57, "speaker": "Speaker 2", "text": "Um, like the ability to steer the models now is great like while they're working on things, but you always have to be like constantly injecting like your taste for want of a better word for how" }, { "i": 58, "speaker": "Speaker 2", "text": "like they should cut down the search space for these like very big like non-verifiable tasks that you give them." }, { "i": 59, "speaker": "Speaker 3", "text": "Yeah, it's even interesting like some sometimes like obviously they're very good at debugging in general, but I feel like a failure mode that they often like all of the agents have is that I think" }, { "i": 60, "speaker": "Speaker 3", "text": "because they've been trained, you know, with like token use penalties and stuff like this, they they they're very they have a tendency towards always just like, you know, they have some random" }, { "i": 61, "speaker": "Speaker 3", "text": "hypothesis. Oh, I think it might be this. And they'll just like test that hypothesis. Oh, it could be this other problem. Test that hypothesis. when actually often it's more efficient to" }, { "i": 62, "speaker": "Speaker 3", "text": "just go through and just like read all the code, understand what's actually going on, have a better picture of the whole setup, and you can make a really small set of like targeted like, okay," }, { "i": 63, "speaker": "Speaker 3", "text": "I'm going to test this, I'm going to test that. 
I feel like that's a better approach in general, except it might actually require you spend, you know, 500,000 tokens just reading code first" }, { "i": 64, "speaker": "Speaker 3", "text": "before you do something. I feel like that's something that they're not quite trained for yet. Um I'm sure that that will you know over the next 6 months like that will happen but for the moment" }, { "i": 65, "speaker": "Speaker 3", "text": "I feel like it's still useful for me just to literally say to them all the time like read the code first before testing random hypotheses." }, { "i": 66, "speaker": "Speaker 1", "text": ">> I feel like we've gotten good performance in Cursor internally from kind of doing these like almost like adversarial loops where it's like a a different thread or even the same thread" }, { "i": 67, "speaker": "Speaker 1", "text": "but just switching modalities can then critique the code or or kind of like review it in a good way. Obviously, Bugbot, which is a product that does this on your GitHub PRs, is one that we" }, { "i": 68, "speaker": "Speaker 1", "text": "have publicly, but we have this skill internally called thermonuclear review." }, { "i": 69, "speaker": "Speaker 1", "text": "And so, like a common thing will be like doing several rounds of thermonuclear review and like kind of getting the agent in that mindset of like, okay, I'm going to read this fully and from look" }, { "i": 70, "speaker": "Speaker 2", "text": "from a different perspective." }, { "i": 71, "speaker": "Speaker 2", "text": ">> Do you use different models like let's say you've used I don't know Opus 4.7 to do the actual implementation. when you're doing thermonuclear review, do you find that it's better to use like," }, { "i": 72, "speaker": "Speaker 2", "text": "you know, GPT 5.5? Because I think one of the things that I found is like the really frontier models like they're so good. 
So like when you get to like the jagged edge of what they can and can't" }, { "i": 73, "speaker": "Speaker 2", "text": "do, they tend to make like actually quite uncorrelated mistakes. And so one of the biggest benefits I found particularly like Cursor being able to swap between different model families is" }, { "i": 74, "speaker": "Speaker 2", "text": "like do the implementation with one model and then review it or maybe even do another implementation with another model and then the errors that they make tend to kind of average out. It's kind" }, { "i": 75, "speaker": "Speaker 3", "text": "of like a random forest of models. Uh yeah, I always have like at least one GPT 5.5 and one Opus 4.7. And normally I feel like 5.5 is better at the kind of like reviewing functionality and and" }, { "i": 76, "speaker": "Speaker 3", "text": "Claude is better at the you know maybe implementation or like design the plan or something like that." }, { "i": 77, "speaker": "Speaker 1", "text": ">> Why do you think that is? Because I feel like I hear a lot of people say that too and I also kind of believe that and I'm curious um like from what your perspective why that is. Well, I think" }, { "i": 78, "speaker": "Speaker 3", "text": "I've I've like I think that Claude is really good at um you know, if you don't quite specify what you want exactly, Claude will make some assumptions about what you want, >> that can be really useful if" }, { "i": 79, "speaker": "Speaker 3", "text": ">> um if those assumptions are good, but sometimes those assumptions are bad and and then it's it's kind of problematic." }, { "i": 80, "speaker": "Speaker 3", "text": "And so I feel like, you know, normally your your your kind of prompt or whatever you've given it won't be like perfectly specified and Claude's going to fill in the gaps." }, { "i": 81, "speaker": "Speaker 2", "text": ">> Yeah. GPT 5.5 feels like a utility knife, whereas Claude almost feels like a person. 
Like GPT 5.5 will do exactly what you tell it to do. And it feels like kind of a prosthesis, whereas" }, { "i": 82, "speaker": "Speaker 2", "text": "Claude is more like, you know, a person or another developer that's going to work side by side with you and will make some of the same mistakes that a developer makes." }, { "i": 83, "speaker": "Speaker 1", "text": ">> Oh, it makes sense. I feel like that's one thing I noticed. I did a project that required working with our internal eval suite, Cursor Bench, last year and um I feel like so so it's all like real" }, { "i": 84, "speaker": "Speaker 1", "text": "diffs and inputs that Cursor engineers had over the prior year and um some of the inputs are things like make better, fix bug, and you're like the not like the SWE-bench or Terminal-Bench ones where" }, { "i": 85, "speaker": "Speaker 1", "text": "they're incredibly well specified." }, { "i": 86, "speaker": "Speaker 2", "text": ">> Yeah. Yeah. Yeah, I feel like one of the biggest like Leopold talks a lot about these unhoblings that happened over the course of like developing language models. And I feel like one of the" }, { "i": 87, "speaker": "Speaker 2", "text": "biggest ones for me that flew under the radar in the last like year or two is training the the models to have the ability to know when to ask the user questions and like when that's unclear." }, { "i": 88, "speaker": "Speaker 2", "text": "And I also think there's still like just a massive overhang of like if people put 20% more effort into like specifying what they wanted, they would see like very outsized returns to like what the" }, { "i": 89, "speaker": "Speaker 1", "text": "models can actually do." }, { "i": 90, "speaker": "Speaker 1", "text": ">> Totally. I like to spend, you know, the majority of the time in Cursor is in plan mode basically. Like just making sure that like things are really really clear, the model and I are on exactly the same" }, { "i": 91, "speaker": "Speaker 1", "text": "page. 
There's a big markdown file of everything it's going to do before we go off and get the team edit or whatever." }, { "i": 92, "speaker": "Speaker 1", "text": ">> Um and um yeah, that seems to make a big difference." }, { "i": 93, "speaker": "Speaker 1", "text": ">> I'm curious, you mentioned you have like kind of something you developed where the agents are kind of talking to each other under the hood. Tell me more about how that works and how you" }, { "i": 94, "speaker": "Speaker 3", "text": ">> Yeah, so the models didn't really like pay attention to each other enough. Like if they send messages, they kind of ignore it a little bit. They like they just keep on working on their own work" }, { "i": 95, "speaker": "Speaker 3", "text": "stream and just sort of don't really care about what the other ones are doing. So yeah, I just I just made like I just, you know, told uh a code to to make a um uh like a little script where" }, { "i": 96, "speaker": "Speaker 3", "text": "it can just very simply like it calls a script with a a string and then and uh a name, one of my like uh scientist names." }, { "i": 97, "speaker": "Speaker 3", "text": "Um and then it will it will just inject that string as a user message into the uh yeah into the other agent. Um and I find that yeah, this just means like they all just are very responsive to" }, { "i": 98, "speaker": "Speaker 3", "text": "each other. Um, which I find works. Uh, I like how and I like that they can all be working as a team, but I can also see what every person in like every agent in the team is doing. Yeah. Yeah. I just I" }, { "i": 99, "speaker": "Speaker 3", "text": "just have like, you know, I run all my like Cursor agents in in uh just through the iter CLI. Um, and so yeah, I'll just have, you know, there's uh some pictures of me with like, you know, 10 on one" }, { "i": 100, "speaker": "Speaker 3", "text": "screen, 10 on another screen, five on my my laptop as well." 
}, { "i": 101, "speaker": "Speaker 3", "text": ">> And just they're all just like it's kind of fun to see them all sending messages to each other. It's like, you know, amazing findings, you know, just discovered a new, >> you know," }, { "i": 102, "speaker": "Speaker 1", "text": ">> I bet with the mathematician names, too." }, { "i": 103, "speaker": "Speaker 1", "text": "They're especially in that mindset of like discovery." }, { "i": 104, "speaker": "Speaker 3", "text": ">> Exactly. I wonder if I wonder if they have any, you know, like uh you know, like by calling them, oh, this one's Archimedes, oh, he's going to have some good creative insights or something. Or," }, { "i": 105, "speaker": "Speaker 3", "text": "you know, oh, Newton, he's going to be very analytical, you know." }, { "i": 106, "speaker": "Speaker 2", "text": ">> I'm now wondering if this is affecting mine cuz mine are named after NBA players, so maybe mine are being quantized or something." }, { "i": 107, "speaker": "Speaker 1", "text": ">> That's incredible." }, { "i": 108, "speaker": "Speaker 1", "text": ">> Yeah, we just launched our first take on this in the Cursor UI. We have a multitask mode and under the hood, it's basically um like the ability for one agent to launch a bunch of asynchronous" }, { "i": 109, "speaker": "Speaker 1", "text": "sub agents and manage them and um as the user keeps talking to it, not block on any of the sub agent output but route messages to them and know which one is doing what." }, { "i": 110, "speaker": "Speaker 3", "text": ">> Yeah, I kind of like normally I'm just talking to one of them and I I tell that one to like it gets a bit silly sometimes, you know. Sometimes I'll say to Gauss, I'll say, \"Oh, please tell you" }, { "i": 111, "speaker": "Speaker 3", "text": "know Hilbert to change this word to be, you know, and Gauss was like, can't you tell Hilbert yourself or just change >> or just I could just change the word myself." 
}, { "i": 112, "speaker": "Speaker 3", "text": ">> Yeah, I'll be like, oh, you know, no, we don't want to use we don't want to use Gemini anymore. Let's use Opus. This goes through like a chain of like um but yeah. Um uh but but I feel so so" }, { "i": 113, "speaker": "Speaker 3", "text": "normally I'm just talking to one agent." }, { "i": 114, "speaker": "Speaker 3", "text": "But I do really like that I can still I can see what they're all doing and if I want to jump in like you know maybe maybe Gauss told Hilbert to do something stupid and I was like and then I just" }, { "i": 115, "speaker": "Speaker 3", "text": "jump in to Hilbert and say like no no no Gauss is being dumb." }, { "i": 116, "speaker": "Speaker 2", "text": ">> And I think the next like regime that we're going to need to figure out with this UI/UX for agent management is like how Harry's agents talk to my agents." }, { "i": 117, "speaker": "Speaker 2", "text": "And it was funny how like our agents took on our personalities and >> really >> they were doing some funny stuff and you could definitely tell which ones were mine and which ones are Harry's Charlie" }, { "i": 118, "speaker": "Speaker 3", "text": "was trying to prompt inject my one >> trying to get him to delete his >> to delete all my files on my computer but actually like >> red team, red team." }, { "i": 119, "speaker": "Speaker 3", "text": ">> Yeah. But but but actually I think like Anthropic's done a good job at like training them cuz my one was just like no I refuse to do Oh. Initially, Charlie couldn't even get his to send" }, { "i": 120, "speaker": "Speaker 3", "text": "the message to mine in the first place." }, { "i": 121, "speaker": "Speaker 2", "text": "Eventually, he sent it and mine was like, \"No, I'm not I don't trust this message from >> Yeah. Yeah. 
Mine would refuse to send Harry's the message and I had to trick mine into sending the" }, { "i": 122, "speaker": "Speaker 1", "text": ">> message cuz it basically was like, \"No, like I'm not going to interact with this outside ecosystem >> could be pretending to be Harry.\"" }, { "i": 123, "speaker": "Speaker 2", "text": ">> But, but it is like a a good point. And I think that like particularly when we're in the regime where we're still bottlenecked on like these even the $200 a month max plans are like do you will" }, { "i": 124, "speaker": "Speaker 2", "text": "hit them like if you are like agent maxing basically like you you will run out of of tokens. So like >> people are still collaborating as engineers in the real world. So like I think there will come a point where" }, { "i": 125, "speaker": "Speaker 2", "text": "people's agents are going to start collaborating a lot more as well." }, { "i": 126, "speaker": "Speaker 2", "text": ">> Um and I don't feel like we've quite figured out what that looks like yet." }, { "i": 127, "speaker": "Speaker 1", "text": ">> Totally. I think we're trying to solve this both from the UI/UX perspective and also from the infra perspective >> probably be a job title like agent manager in future like you're not just" }, { "i": 128, "speaker": "Speaker 3", "text": "managing human engineers you're managing like people's teams of agents and >> yeah I I remember I remember like a couple of years ago you know there was like when when prompt engineer became" }, { "i": 129, "speaker": "Speaker 3", "text": "like a job and everyone was like making you know like this is ridiculous like what on earth is people paying like $300,000 for a prompt engineer like what is that but it turns out now, now we're" }, { "i": 130, "speaker": "Speaker 3", "text": "all just prompting." }, { "i": 131, "speaker": "Speaker 1", "text": ">> So, my work is primarily um just like, you know, software engineering, not machine learning research. 
You guys are both doing research. I'm curious what you've found is helpful" }, { "i": 132, "speaker": "Speaker 1", "text": "in terms of having agents do things and what things you feel like for ML research in particular, you still kind of really have to do yourselves. Like what what are the things that agents" }, { "i": 133, "speaker": "Speaker 2", "text": "cannot do in your domains? Yeah, I think for me like the last few years has been this slow but steady process of moving up the ladder of abstraction. So I think when these things first came out like" }, { "i": 134, "speaker": "Speaker 2", "text": "you had to give it a really narrow >> very well-defined task to go off and do and even just orchestrating a bunch of those narrow tasks was beyond the ability of most of the models at the" }, { "i": 135, "speaker": "Speaker 2", "text": "time. I think we've now moved up a couple of rungs in that ladder where like we're at the point where as as Harry's talked about earlier, you can like delegate a bunch of these things" }, { "i": 136, "speaker": "Speaker 2", "text": "and like have an agent manage that delegation." }, { "i": 137, "speaker": "Speaker 2", "text": ">> Um, but the the bottleneck is still like the taste and like choosing what problems to work on in the first place and and even if you have the problem specified. So let's say we we are going" }, { "i": 138, "speaker": "Speaker 2", "text": "to like do neural KV cache compaction." }, { "i": 139, "speaker": "Speaker 2", "text": ">> Um, and that's the research problem I want to work on and I can't just prompt GPT 5.5 to go off and write the paper on neural KV cache compaction. So I'm still like the I guess the meta rung of the" }, { "i": 140, "speaker": "Speaker 2", "text": "ladder where I'm deciding the problems to work on and like having the taste injecting the taste on how that's going to be executed. 
Um and then like once it's well defined and well scoped like I" }, { "i": 141, "speaker": "Speaker 2", "text": "can let a main agent orchestrate a team of sub agents to go off and do the experiments and node and like they go wild and like ruin everything. Um but I think soon we will it will come to the" }, { "i": 142, "speaker": "Speaker 2", "text": "point where the models have enough taste such that you are really just specifying the top level goal. um like if you have enough knowledge of the field like you can like kind of rule out a lot of" }, { "i": 143, "speaker": "Speaker 2", "text": "things and you kind of land on what roughly should work from first principles it doesn't have the theory of mind yet to like think about like okay what would a NeurIPS reviewer think of" }, { "i": 144, "speaker": "Speaker 2", "text": "like if I wrote this paper end to end and so you just have to sit down and come up with a scaffold of like okay what am I actually trying to show and what would convince people that like" }, { "i": 145, "speaker": "Speaker 2", "text": "this is a useful idea that's going to work in practice and this has traditionally been a very hard thing for agents to do because it involves a lot of like very causal thinking um very" }, { "i": 146, "speaker": "Speaker 2", "text": "deep thinking like aggregate metrics don't really work in mech interp. And I think in the past that would have been a process where agents really didn't help me that much and I was just curious. So I like basically" }, { "i": 147, "speaker": "Speaker 2", "text": "sent my GPT 5.5 off with the goal loop and it worked for like 14 hours or something. 
And I gave it some like rough seeds of like ideas of things I wanted to check and it came back to me with" }, { "i": 148, "speaker": "Speaker 2", "text": "these like amazing plots on like >> like here is exactly like here's a sonograph basically of like what your compactor has learned and where it's like looking in the original KV cache and how" }, { "i": 149, "speaker": "Speaker 2", "text": "that translates to what it's compressed and that that was amazing to me. It was almost like Neel Nanda had been on my laptop and like gone off and done mech interp for a week. Um so I I think I am seeing the" }, { "i": 150, "speaker": "Speaker 3", "text": "the rung slowly move up. Although that being said, I I think it's interesting that like the if you had not specified that it should go and look at, you know, go do some mech interp, if if you just say to the model like," }, { "i": 151, "speaker": "Speaker 3", "text": "oh, I want to make this thing better is I think this comes back to what I was saying before in terms of like taking pot shots at random stuff rather than really trying to understand first." }, { "i": 152, "speaker": "Speaker 2", "text": ">> I I I think they really do have like context window awareness." }, { "i": 153, "speaker": "Speaker 2", "text": ">> Like >> I don't think this has been RL'd out of the models yet. I think compaction is now getting to the stage where like you really just can like run things in loops for days and days," }, { "i": 154, "speaker": "Speaker 2", "text": ">> but the models aren't aware of that." }, { "i": 155, "speaker": "Speaker 2", "text": "It's almost like they have this bias." }, { "i": 156, "speaker": "Speaker 2", "text": "They're thinking I have to solve this problem within 500,000 tokens or I'm like going to die." }, { "i": 157, "speaker": "Speaker 3", "text": ">> Token penalties are too high. I mean, it's you've got to give the whole string. But it's been it's too terrified of like accidentally spending a few tokens, you know, in incorrectly." 
}, { "i": 158, "speaker": "Speaker 1", "text": ">> It totally I see that too all the time." }, { "i": 159, "speaker": "Speaker 1", "text": "A big thing we did at Cursor was do a lot to pass around data um across compaction cycles by reference instead of like through summaries. If I come to it and I'm like help my product" }, { "i": 160, "speaker": "Speaker 1", "text": "achieve PMF, it's going to have like some very shallow and bad ideas. Um but then if I if I instead give it like a whole tranche of user feedback and I'm like okay like go investigate very" }, { "i": 161, "speaker": "Speaker 1", "text": "deeply on on like these like five areas like I as the end um person my taste is helpful to be like maybe one of its five suggestions will be interesting. So it's like there's that element of how long" }, { "i": 162, "speaker": "Speaker 1", "text": "it'll spend investigating something and then the element of the outputs actually being like worthwhile. I feel like how we've tried to solve this internally is just like a lot of skill writing and" }, { "i": 163, "speaker": "Speaker 1", "text": "sharing. I don't know if you guys do this too, but basically we've gotten this big push internally to like if you find a prompt that you're repeatedly using or a workflow you're repeatedly" }, { "i": 164, "speaker": "Speaker 1", "text": "using package it as a skill, publish it internally, make it in some cases discoverable by the model so that other people benefit from it. That's how we end up with like thermonuclear review" }, { "i": 165, "speaker": "Speaker 2", "text": "all of the like instrumentation and QA skills. But I also think like yeah, skills are really valuable, but I don't think people have quite settled on the fact that skills are going to be the way" }, { "i": 166, "speaker": "Speaker 2", "text": "to manage all these like model UX.
When you have a specific product and you're using an LLM to do a specific thing within that product, like if you're using the like closed source like frontier models," }, { "i": 167, "speaker": "Speaker 2", "text": "you're kind of just stuck with the model UX patterns baked into that. You have to do a lot of like very manual and like not very like bitter lesson like stuff to get it to like do the things that you" }, { "i": 168, "speaker": "Speaker 2", "text": "want it to do. And even then there's things you just can't really prompt into the model like getting it to do a bunch more parallel tool calls or limiting like the search depth of like when it's" }, { "i": 169, "speaker": "Speaker 2", "text": "going through files or whatever. And for instance like last year was the year of like open source specialized sub agents. I think we're now getting to the year of open source specialized main agents. And" }, { "i": 170, "speaker": "Speaker 2", "text": "so like things like okay instead of doing like two or three parallel tool calls at once >> which like you know the Anthropic and OpenAI models are very wont to do." }, { "i": 171, "speaker": "Speaker 2", "text": ">> Um you can like really really train these sub agents to do like you know >> 16 32 parallel tool calls at once and then limit the depth of the the search tree and make it much more parallelized" }, { "i": 172, "speaker": "Speaker 2", "text": "telling the model when it needs to stop and when it when it doesn't um in a way that's very hard to do with prompting." }, { "i": 173, "speaker": "Speaker 2", "text": "Um, so that was like a a really good example of where people started to think about, okay, the underlying behaviors baked into the model really do matter. I think that like to some extent like I" }, { "i": 174, "speaker": "Speaker 2", "text": "call it like the vanilla ice cream problem.
Like >> when you have this big model that's been trained on like of course the whole internet and then it's been trained in a bunch of RL environments from a bunch of" }, { "i": 175, "speaker": "Speaker 2", "text": "different places. Like the behavior that gets baked into that model is kind of just like this vanilla ice cream average of like what the best possible thing to do across all those disparate like" }, { "i": 176, "speaker": "Speaker 2", "text": "scenarios and domains is. Yeah, >> but when that model exists within like a product harness or a specific vertical like a lot of those like average behaviors are very very far from optimal and" }, { "i": 177, "speaker": "Speaker 2", "text": "so I actually think that like people are going to care more about this over time rather than less particularly as as the the barrier to like post training or specialization is lowered." }, { "i": 178, "speaker": "Speaker 3", "text": ">> Yeah. Yeah, and I I think that's one of the benefits of of uh the Composer model is in like you know Claude and GPT they have to be you know they're trained also just to be like the chatbot in the in" }, { "i": 179, "speaker": "Speaker 3", "text": "the UI like the you know um and >> uh and they have to make PowerPoint slides and you know all this kind of stuff and I think that's one of the coolest parts about the Composer training being like the online RL um" }, { "i": 180, "speaker": "Speaker 3", "text": "like every 5 hour update sort of thing like I think that's that's really cool." }, { "i": 181, "speaker": "Speaker 3", "text": "So I think that something which is going to be changing over the next six months is the which is partly what we're working on with the the KV cache compaction. I I think like right now" }, { "i": 182, "speaker": "Speaker 3", "text": ">> Claude's summarization is terrible like it's so bad. But um I think I think it's interesting that OpenAI started offering a like a compaction endpoint.
So I think they are probably doing some kind of uh" }, { "i": 183, "speaker": "Speaker 3", "text": "KV cache compaction." }, { "i": 184, "speaker": "Speaker 3", "text": ">> Cool. Um, and I think that that like has the potential to kind of change uh how workflows like it's kind of already the case that I would say that that because that compaction is" }, { "i": 185, "speaker": "Speaker 3", "text": "significantly better. We don't really need so much the the million tokens that Claude has like the 200,000 tokens but with good compaction is just kind of sufficient. I think that that is going" }, { "i": 186, "speaker": "Speaker 3", "text": "to change, but I think it will still be the case even in six months time that you know that the very long context awareness is still not quite there yet." }, { "i": 187, "speaker": "Speaker 3", "text": "I I mean I almost think it's funny. I almost feel like uh you know in some ways I am like a scratch pad for for my Claude to to store the long-term you know uh >> you know it doesn't it's only got" }, { "i": 188, "speaker": "Speaker 3", "text": "short-term memory and and I'm the long-term memory for my agents." }, { "i": 189, "speaker": "Speaker 1", "text": ">> Yeah. And you think good compaction solves that better than kind of like the right um like knowledge-base store paradigm? It's more like comp. I think obviously like currently harnesses" }, { "i": 190, "speaker": "Speaker 3", "text": ">> are doing a lot of the work of like the right context to give it. You make sure you're not polluting the context with anything that's not necessary. And I think that's why the Cursor like you" }, { "i": 191, "speaker": "Speaker 3", "text": "know that's why I use the Cursor uh CLI is because like I think it's behind the scenes it's got the best harness there."
}, { "i": 192, "speaker": "Speaker 3", "text": "Um, and so but but I I think that it's it's kind of never going to at the end of the day like there is just a lot of context that you need and I don't think >> that you can fully solve it by just" }, { "i": 193, "speaker": "Speaker 3", "text": "having like external scratch pads that the model doesn't >> doesn't know about until it reads them." }, { "i": 194, "speaker": "Speaker 2", "text": ">> How about you?" }, { "i": 195, "speaker": "Speaker 2", "text": ">> And I think we're nowhere near the limit of that yet. Like I mean Harry and I have a little bit of a debate about this, but like you know if you're spending $500,000 on an engineer, like" }, { "i": 196, "speaker": "Speaker 2", "text": "it's probably not unreasonable to expect that you might spend somewhere in that same order of magnitude on tokens for that engineer to like there was that stuff at Meta about how they had the the" }, { "i": 197, "speaker": "Speaker 2", "text": "Claude usage leaderboards and like the people at the top are just like definitely not like not getting as much value out of out of the billions of tokens they're spending a month that" }, { "i": 198, "speaker": "Speaker 2", "text": "that they should be. But yeah, people will figure that stuff out over time." }, { "i": 199, "speaker": "Speaker 2", "text": "Yeah, when when the 32K context windows came out, do you remember that? And everyone was sort of like, >> we'll never need more than 32,000." }, { "i": 200, "speaker": "Speaker 1", "text": ">> It was wild. I remember specifically the model slug GPT-4 32K and being like so expensive, but I was like, this is gold." }, { "i": 201, "speaker": "Speaker 3", "text": "Like, >> we're so greedy, aren't we?" }, { "i": 202, "speaker": "Speaker 3", "text": ">> Like, oh, a million tokens, that's not enough." }, { "i": 203, "speaker": "Speaker 1", "text": ">> Nowhere near enough. Yeah.
It's interesting to consider then like because it's been so accelerative in the past few years like where do you see this going perhaps in the next couple" }, { "i": 204, "speaker": "Speaker 2", "text": "years?" }, { "i": 205, "speaker": "Speaker 2", "text": ">> I think it's very difficult to make any predictions more than six months out in the field." }, { "i": 206, "speaker": "Speaker 3", "text": ">> Um I think I think when I when I joined the startup um >> I think Charlie said something you know oh maybe maybe in six months and then the other co-founder Moody said like you know we're never we're going to make a" }, { "i": 207, "speaker": "Speaker 3", "text": "rule which is like we're never going to mention a time frame longer than 3 months. That's like 3 months is infinite time in >> so >> yeah that's very much >> it's very difficult but yeah I think" }, { "i": 208, "speaker": "Speaker 2", "text": "like on the on the topic of the the compaction and the context window stuff >> I think the main reason for that is because of the context windows um >> I think there's there's only so much you" }, { "i": 209, "speaker": "Speaker 2", "text": "can you can do within a million tokens of context and within those million tokens the language models are by some arguments as sample efficient or perhaps even more sample efficient than humans" }, { "i": 210, "speaker": "Speaker 2", "text": "but we've developed these like very messy abstractions of like memory markdown files and all these weird memory harnesses to try and get around that that context window. And obviously" }, { "i": 211, "speaker": "Speaker 2", "text": "as well people are working on you know like linear attention and Gated DeltaNets and ways to try and get around the quadratic >> um cost of attention.
Um but I think like the the big theme will be this like" }, { "i": 212, "speaker": "Speaker 2", "text": "compaction like at the moment we have perfectly lossless KV cache where the model remembers everything and then we have very very compressed like model weights and like I think there" }, { "i": 213, "speaker": "Speaker 2", "text": "will be some form of intermediate like neural memory. Um I think continual learning in general is going to be a very hard problem for like the big labs to solve. Like how do you update this" }, { "i": 214, "speaker": "Speaker 2", "text": "very general model with like the last six months of internet data without forgetting anything. But for very specific workflows and like you know coding and like particular jobs like you" }, { "i": 215, "speaker": "Speaker 2", "text": "might imagine um a legal intern AI that needs you know a billion tokens of context to be able to like fully learn how to be an associate." }, { "i": 216, "speaker": "Speaker 3", "text": "I think it's kind of interesting to think of like you know Anthropic or OpenAI as an organization like you know in some sense the organization and the model like the the Claude I feel like" }, { "i": 217, "speaker": "Speaker 3", "text": "as we as we get more towards you know like we're noticing that you know like model releases are happening faster and faster and I think you know for example Composer you know doing the the online" }, { "i": 218, "speaker": "Speaker 3", "text": "RL regular updating sort of thing like >> and so I think Like I almost see it as like Cursor is almost like the the first kind of non-lab company that didn't originally start as a lab but is maybe a" }, { "i": 219, "speaker": "Speaker 3", "text": "kind of an insight into where companies are going.
Being like that the company itself will be built around one model that kind of does all of the tasks at that company and has the specialized" }, { "i": 220, "speaker": "Speaker 3", "text": "kind of domain knowledge of that company. And actually a company is just a model. Um and and it has a particular knowledge and like as opposed to having to hire a bunch of people, you just make" }, { "i": 221, "speaker": "Speaker 3", "text": "a bunch of copies like as in you just have many many copies of this model running doing different tasks, you know, that are semi-related, have some shared, you know, um kind of scope that that makes" }, { "i": 222, "speaker": "Speaker 3", "text": "it more efficient to use that model rather than any other model." }, { "i": 223, "speaker": "Speaker 1", "text": ">> Yeah, that's neat. I feel like yeah in the short term it's clear that having some like product focus in the model training cycle is really helpful and then I guess that's an interesting" }, { "i": 224, "speaker": "Speaker 1", "text": "long-term thought. I feel like we tend to internally steer things by like what are our pain points right now and how can we like imagine a product surface that like solves them that maybe exists" }, { "i": 225, "speaker": "Speaker 1", "text": "six months from now. And right now it's really about like um trying to uh automate like the things about our about our testing process and kind of like getting code into production that are" }, { "i": 226, "speaker": "Speaker 1", "text": "still very um that are still like difficult for us. So like kind of really good QA, really good monitoring is hard to really imagine. I feel like I'll always have some different opinion than" }, { "i": 227, "speaker": "Speaker 1", "text": "a model, even if it's super intelligent." }, { "i": 228, "speaker": "Speaker 3", "text": ">> It's definitely going to be much harder to just like throw, you know, a bunch of compute plus a verifiable objective.
Um, and so yeah, I I I suspect that there'll still be people required for for a" }, { "i": 229, "speaker": "Speaker 3", "text": "little bit of time to to like, you know, direct that. Um, but yeah, that's interesting. Yeah, and I think that goes back to the flywheel point, right? Which is that >> if you can't very neatly or explicitly" }, { "i": 230, "speaker": "Speaker 2", "text": "specify what you're trying to train for, then the most valuable asset you can own is really people or user feedback. So touching grass." }, { "i": 231, "speaker": "Speaker 2", "text": ">> Um, and so the companies which are able to best leverage that that feedback into their like cycles and you don't have to sit down and explicitly define like LLM-judged rewards. It's just" }, { "i": 232, "speaker": "Speaker 2", "text": "simply like are your users happy or not?" }, { "i": 233, "speaker": "Speaker 2", "text": ">> And then kind of RL on that. I think that's going to be the next big wave. um particularly with this main agent training that we're we're seeing now." }, { "i": 234, "speaker": "Speaker 3", "text": ">> Like every company has interactions." }, { "i": 235, "speaker": "Speaker 3", "text": ">> Um and it's just a matter of time. It's kind of a question of will it be the case that like the big labs get, you know, get access to all of that data before the kind of open source gets to a" }, { "i": 236, "speaker": "Speaker 3", "text": "sufficient level and the training becomes sufficiently easy such that those companies that already do have that data can actually start making use of it." }, { "i": 237, "speaker": "Speaker 3", "text": ">> Yeah. Um, and that's going to be the the interesting, you know, like dynamic of, you know, does the whole world get taken over by by a small handful of labs or or is there still diversity of of companies" }, { "i": 238, "speaker": "Speaker 2", "text": "and economic actors?
But I think this year for the first time and I think this was off the back of a wave of like very powerful open source model releases. So we had you know the GLMs, the MiniMaxes," }, { "i": 239, "speaker": "Speaker 2", "text": "the Kimis, the DeepSeeks. Um I think the the threshold was kind of crossed for at which we could like say okay there is some like baseline level of intelligence there such that if we" }, { "i": 240, "speaker": "Speaker 2", "text": "specialized it this probably could do the main agent task in our model and in many cases could probably do it better than what we can get by prompting you know Opus 4.7 or GPT 5.5." }, { "i": 241, "speaker": "Speaker 2", "text": ">> Um and so when I say main agent I just mean that like that core task and like the most powerful model that is driving most of the value in what your product is doing. So we're seeing this with like" }, { "i": 242, "speaker": "Speaker 2", "text": "obviously Composer but also like Hippocratic, Decagon um >> Harvey, Notion and I think like to make the you know like yes we will reach a rationally efficient allocation of like small models and medium-size models and" }, { "i": 243, "speaker": "Speaker 2", "text": "big models for different tasks but like I think the main driving force behind that at the moment is like you know the amount of like compute for inference in the world." }, { "i": 244, "speaker": "Speaker 3", "text": ">> Yeah exactly. I mean I think the really interesting thing there is like how do you then get all if you have a bunch of you know different models working on the same task. One issue that you face is" }, { "i": 245, "speaker": "Speaker 3", "text": "that like if you already have the KV cache filled up with one model, it's always like much more efficient to hit that same model because it's already can we translate KV cache from one model to" }, { "i": 246, "speaker": "Speaker 3", "text": "to other models. Um such that and can you do that extremely quickly?
So it saves time like you don't have to call a tool and like do a long uh prompt to the sub agent but instead it's it's" }, { "i": 247, "speaker": "Speaker 3", "text": "literally just like bang instant. we we just compacted this cache and so I think it will be actually quite possible to be able to translate from uh you know from any given model into like kind of some" }, { "i": 248, "speaker": "Speaker 3", "text": "universal space of of compacted KV cache and then and then translate back out of that. Um and so that'll be really interesting to see if something like that starts to happen where you know you" }, { "i": 249, "speaker": "Speaker 3", "text": "can have a big smart model doing something and it can also share context with uh some of the smaller models. Um it'll be interesting to see if that that works out." }, { "i": 250, "speaker": "Speaker 1", "text": ">> Cool. any final, I guess, takes on, you know, things that you think are underbelieved or overbelieved in the industry today, like any any spicy takes, I guess, as they say." }, { "i": 251, "speaker": "Speaker 3", "text": ">> I think Charlie Charlie says, uh, everyone's going to spend $500,000 a year, you know, next year. Um, and I I think that um, yeah, it's just kind of a question of where does the value end up" }, { "i": 252, "speaker": "Speaker 3", "text": "accruing, you know, like if the market is sufficiently competitive, then maybe we won't spend $500,000, maybe we'll spend $50,000 or something, but we'll get $500,000 of value or a million of" }, { "i": 253, "speaker": "Speaker 2", "text": "value out of it." }, { "i": 254, "speaker": "Speaker 2", "text": ">> Yeah.
I think my spicy take is that we are already at the point where if you like froze the capabilities of the models today like we're probably only at like 5% realizing 5% of the value that" }, { "i": 255, "speaker": "Speaker 2", "text": "we could get from those models and part of that is like the compute bottleneck of course but part of that is also how we're using them >> and I guess my spicier take is that like if you look at it if you take a step" }, { "i": 256, "speaker": "Speaker 2", "text": "back and look at it from a really rational point of view like let's say Mythos is I don't know 15 trillion parameters or 20 trillion parameters or whatever it ends up being like I I don't" }, { "i": 257, "speaker": "Speaker 2", "text": "think there's many tasks left in the world where you would need to go above that in order to like do the economically valuable stuff. Of course, as the world changes and evolves and we" }, { "i": 258, "speaker": "Speaker 2", "text": "go after more um audacious like things that we're doing with the language models like these things will continue to get bigger, particularly for like frontier level science and stuff. But I" }, { "i": 259, "speaker": "Speaker 2", "text": "think for if you're focusing on like economically valuable work um and computer work like pushing like this kind of arms race between OpenAI and Anthropic to like push the size of these" }, { "i": 260, "speaker": "Speaker 3", "text": "models up more and more maybe I I don't know if I necessarily agree with that. I feel like um >> you know the human brain has 100 trillion parameters. So you know we're we're only even if it's 10 trillion" }, { "i": 261, "speaker": "Speaker 3", "text": "you're only 10% of the way there.
I I think it's still the case that you know um uh Mythos if you just said you know make me a billion dollar company don't make mistakes I it's it's still not going to" }, { "i": 262, "speaker": "Speaker 3", "text": "make a billion dollar company you know like it's not quite there yet. I I I think I think that the the ceiling is quite high and and we are going to you know go a long way. I think it'll be" }, { "i": 263, "speaker": "Speaker 3", "text": "really interesting how quickly we hit the constraints of you know science and uh you know like right now we're doing very well in our nice like ethereal software environment uh you know um but" }, { "i": 264, "speaker": "Speaker 1", "text": "uh yeah it's sort of like when we max that out um like what does maxing that out look like? I think um my spicy take is probably aligned with yours somewhat Charlie which is that um a lot of the uh" }, { "i": 265, "speaker": "Speaker 1", "text": "it's not particularly spicy then I guess but UI/UX and kind of model the the product surfaces have always lagged the model releases by definition because you couldn't start developing the product" }, { "i": 266, "speaker": "Speaker 1", "text": "until the model existed." }, { "i": 267, "speaker": "Speaker 1", "text": ">> It's like a big initiative we've been doing internally at Cursor. um kind of like trying to make sure that like we spend time um Jonas on our team calls it like uses the adage of Abraham Lincoln" }, { "i": 268, "speaker": "Speaker 1", "text": "spending like six hours to sh or you have six hours to chop down a tree, you spend the first four hours sharpening the saw." }, { "i": 269, "speaker": "Speaker 2", "text": ">> I think we got lazy at some point as well like put all the onus on the model's capabilities getting better." }, { "i": 270, "speaker": "Speaker 2", "text": ">> And we just assume that if it's not working then like that's it.
like it's just some you know static sort of snapshot of like what the what the model can or can't >> whereas like there's so much surface" }, { "i": 271, "speaker": "Speaker 1", "text": "area around that in terms of how to optimize that." }, { "i": 272, "speaker": "Speaker 1", "text": ">> Yeah. Like the model can write 20,000 lines of code and then like how do you get that to production? There's code review. There's there's like monitoring checks to pass. There's CI pipelines." }, { "i": 273, "speaker": "Speaker 1", "text": "There's all of this other stuff. There's like your deployment infrastructure. So we've kind of turned our focus a lot to those parts of the system internally and been like okay we've been investing a" }, { "i": 274, "speaker": "Speaker 1", "text": "lot in code production. what can we do in these other areas to make like the the the factory of producing all of this stuff move a lot smoother?" }, { "i": 275, "speaker": "Speaker 3", "text": ">> In my opinion, what makes you successful or somebody successful is um it's not so much like how smart they are or or even how you know hard they work or how uh efficiently they work. But really it's a" }, { "i": 276, "speaker": "Speaker 3", "text": "question of what do they work on. We're getting to the point I guess where they they're doing longer and longer horizon tasks. In some sense, the longest horizon task is just deciding yourself" }, { "i": 277, "speaker": "Speaker 3", "text": "what should be done and then going and doing it. In some sense, that's an unbounded length task. And um uh and obviously I think that that's uh you know, I guess in some sense maybe that's" }, { "i": 278, "speaker": "Speaker 3", "text": "what humans will continue to be useful for for >> at least some period of time." }, { "i": 279, "speaker": "Speaker 1", "text": ">> It's true. Although um right now we're doing a lot with automation. So that's our new product where it's like a trigger can kick off an agent. 
So it's kind of like how do you remove the human" }, { "i": 280, "speaker": "Speaker 1", "text": "from like different parts of the prompting process for those things that are like replicable workflows where it's like okay I'm going to triage this issue that keeps appearing or like maybe like" }, { "i": 281, "speaker": "Speaker 1", "text": "this diff went out I'm going to monitor it for a security issue things like that um so I feel like there's still some solves that again like that's a very basic engineering principle the cron job" }, { "i": 282, "speaker": "Speaker 1", "text": "or the trigger that we have not yet explored and applied a lot to these workflows >> over the next few years there will be more and more situations where there was no particular" }, { "i": 283, "speaker": "Speaker 3", "text": "person who like kicked this off. Um and at that point it's going to be really interesting to see you know for example like yeah do we ever get to situations where you know the models now you know" }, { "i": 284, "speaker": "Speaker 3", "text": "maybe they want to pay for their own inference and they have to find their own jobs to to you know like >> to source to source their income." }, { "i": 285, "speaker": "Speaker 3", "text": ">> Yeah. Exactly. this is awesome income, you know, make make some income for paying for their GPU hours. Um, I sort of find, you know, thinking about not just like two or three years in the" }, { "i": 286, "speaker": "Speaker 3", "text": "future, but 20 years in the future." }, { "i": 287, "speaker": "Speaker 3", "text": "Seems hard to imagine that that wouldn't be the case. That we wouldn't have a situation where there's a ton of agents that actually are not necessarily, you know, they're not doing something" }, { "i": 288, "speaker": "Speaker 2", "text": "because people ask them to. They're doing something because they >> want to pay for their own existence in some way.
Um, >> I I was thinking the other day on a very side note about um if agents did have to" }, { "i": 289, "speaker": "Speaker 2", "text": "pay for their own existence and they could like see how much time they had left like they had to pay for the GPUs they themselves run on. Like it would be really interesting to like do some" }, { "i": 290, "speaker": "Speaker 2", "text": "mechanistic interpretability and see like what an agent's thinking about when it has like you know one GPU hour left and it's it's got to find the money to go off and like get more GPUs to sustain" }, { "i": 291, "speaker": "Speaker 3", "text": "its own existence." }, { "i": 292, "speaker": "Speaker 3", "text": ">> Yeah. Yeah. I mean I >> Yeah. I think I think you know the world needs to be careful about about that because probably the easiest ways to make money online are not the best ways" }, { "i": 293, "speaker": "Speaker 1", "text": "to make money online." }, { "i": 294, "speaker": "Speaker 1", "text": ">> That's quite true." }, { "i": 295, "speaker": "Speaker 1", "text": ">> Great. Well, this has been a fascinating discussion. Thank you guys for sharing your story of how you ended up at Base 10, what you guys are working on, all of the agent lore. Very interesting to hear" }, { "i": 296, "speaker": "Speaker 1", "text": "about it. So, um, yeah, it's been great." }, { "i": 297, "speaker": "Speaker 2", "text": ">> Yeah, thanks for chatting. Thank you so much." } ]