Generative AI in the Real World : Context Engineering with Drew Breunig

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Sound recording
Author/Creator:
Breunig, Drew.
Contributor:
Lorica, Ben.
Language:
English
Subjects (All):
Artificial intelligence.
Physical Description:
1 online resource (1 audio file)
Place of Publication:
O'Reilly Media, Inc. 2025
Summary:
In this episode, Ben Lorica and Drew Breunig, a strategist at the Overture Maps Foundation, talk all things context engineering: what's working, where things are breaking down, and what comes next. Listen in to hear why huge context windows aren't solving the problems we hoped they might, why companies shouldn't discount evals and testing, and why we're doing the field a disservice by leaning into marketing and buzzwords rather than trying to leverage what the current crop of LLMs is actually capable of.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone's agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

00.00: All right. So today we have Drew Breunig. He is a strategist at the Overture Maps Foundation, and he's also in the process of writing a book for O'Reilly called the Context Engineering Handbook. And with that, Drew, welcome to the podcast.

00.23: Thanks, Ben. Thanks for having me on here.

00.26: So context engineering. . . I remember before ChatGPT was even released, someone was talking to me about prompt engineering. I said, "What's that?" And then of course, fast-forward to today, and now people are talking about context engineering. And I guess the short definition is that it's the delicate art and science of filling the context window with just the right information. What's broken with how teams think about context today?

00.56: I think it's important to talk about why we need a new word, or why a new word makes sense. I was just talking with Mike Taylor, who wrote the prompt engineering book for O'Reilly, about exactly this and why we need a new word. Why is prompt engineering not good enough?
And I think it has to do with the way the models, and the way they're being built, are evolving. I think it also has to do with the way that we're learning how to use these models. And so prompt engineering was a natural term when your interaction with the model, and how you programmed it, was maybe one turn of conversation, maybe two, and you might pull in some context to give it examples. You might do some RAG and context augmentation, but you're working with this one-shot service. And that was really similar to the way people were working in chatbots. And so prompt engineering started to evolve as this thing.

02.00: But as we started to build agents, and as companies started to develop models that were capable of multiturn, tool-augmented reasoning, suddenly you're not using that one prompt. You have a context that is sometimes being prompted by you, sometimes being modified by your software harness around the model, and sometimes being modified by the model itself. And increasingly the model is starting to manage that context. And a prompt is very user-centric. It is a user giving that prompt. But when we start to have this multiturn, systematic editing and preparation of contexts, a new word was needed, which is this idea of context engineering. This is not to belittle prompt engineering. I think it's an evolution. And it shows how we're evolving and finding this space in real time. I think context engineering is more suited to agents and applied AI programming, whereas prompt engineering lives in how people use chatbots, which is a different field. It's not better and not worse. And so context engineering is more specifically about understanding the failure modes that occur, diagnosing those failure modes, and establishing good practices both for preparing your context and for setting up systems that fix and edit your context, if that makes sense.

03.33: Yeah, and also, it seems like the words themselves are indicative of the scope, right?
So "prompt" engineering means it's the prompt. So you're fiddling with the prompt. And [with] context engineering, "context" can be a lot of things. It could be the information you retrieve. It might involve RAG, so you retrieve information. You put that in the context window.

04.02: Yeah. And people were doing that with prompts too. But I think in the beginning we just didn't have the words. And that word became a big empty bucket that we filled up. You know, the quote I probably quote too often, but I find it fitting, is one of my favorite quotes from Stewart Brand, which is, "If you want to know where the future is being made, follow where the lawyers are congregating and the language is being invented." And the arrival of context engineering as a word came after the field was invented. It just kind of crystallized and demarcated what people were already doing.

04.36: So the word "context" means you're providing context. So context could be a tool, right? It could be memory. Whereas the word "prompt" is much more specific.

04.55: And I think it also implies that it has to be edited by a person. I'm a big advocate for not using anthropomorphizing words around large language models. "Prompt" to me involves agency. And so I think it's nice; it's a good delineation.

05.14: And then I think one of the very immediate lessons that people realize is, just because. . . So one of the things that these model providers do when they have a model release, one of the things they note, is the size of the context window. So people started associating context window [with] "I stuff as much as I can in there." But the reality is actually that, one, it's not efficient. And two, it also is not useful to the model. Just because you have a massive context window doesn't mean that the model treats the entire context window evenly.

05.57: Yeah, it doesn't treat it evenly. And it's not a one-size-fits-all solution.
So I don't know if you remember last year, but that was the big dream, which was, "Hey, we're doing all this work with RAG and augmenting our context. But wait a second, if we can make the context 1 million tokens, 2 million tokens, I don't have to run RAG on all of my corporate documents. I can just fit it all in there, and I can constantly be asking this. And if we can do this, we essentially have solved all of the hard problems that we were worrying about last year." And so that was the big hope. And you started to see an arms race of everybody trying to make bigger and bigger context windows, to the point where, you know, Llama 4 had its spectacular flameout. It was rushed out the door, but the headline feature by far was "We will be releasing a 10 million token context window." And the thing that everybody realized is, like, all right, we were really hopeful for that. And then as we started building with these context windows, we started to realize there were some big limitations around them.

07.01: Perhaps the thing that clicked for me was in Google's Gemini 2.5 paper. Fantastic paper. And one of the reasons I love it is that they dedicate about four pages in the appendix to the methodology and harnesses they built so that they could teach Gemini to play Pokémon: how to connect it to the game, how to actually read out the state of the game, how to make choices about it, what tools they gave it, all of these other things. And buried in there was a real "warts and all" case study, which is my favorite kind: one where you talk about the hard things, and especially where you cite the things you can't overcome. Gemini 2.5 had a million-token context window, with 2 million tokens eventually coming. But in this Pokémon thing, they said, "Hey, we actually noticed something, which is once you get to about 200,000 tokens, things start to fall apart, and they fall apart for a host of reasons." They start to hallucinate.
One of the things that is really demonstrable is that they start to rely more on the context knowledge than the weights knowledge.

08.22: So inside every model there's a knowledge base. There's, you know, all of these other things that get kind of buried into the parameters. But when you reach a certain level of context, it starts to overload the model, and it starts to rely more on the examples in the context. And so this means that you are not taking advantage of the full strength or knowledge of the model.

08.43: So that's one way it can fail. We call this "context distraction," though Kelly Hong at Chroma has written an incredible paper documenting this, which she calls "context rot," which is a similar way [of] charting when these benchmarks start to fall apart. Now the cool thing about this is that you can actually use this to your advantage. There's another paper out of, I believe, the Harvard Interaction Lab, where they look at these inflection points for. . .

09.13: Are you familiar with the term "in-context learning"? In-context learning is when you teach the model to do something it doesn't know how to do by providing examples in your context. And those examples ill...
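As a rough illustration of the context-assembly problem discussed in this episode, the following Python sketch fills a context window from several sources (instructions, few-shot examples for in-context learning, retrieved documents, and recent conversation turns) while staying under a fixed token budget, instead of stuffing in everything a large window can hold. It is a hypothetical example, not a method described in the episode: `ContextSources`, `build_context`, and the word-count `count_tokens` stand-in are invented for illustration, the priority order is an assumption, and the budget is a free parameter. The roughly 200,000-token point mentioned for Gemini 2.5's Pokémon experiment is one reported observation for one model and task, not a general limit.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated
    # words. Real token counts depend on the model's own tokenizer.
    return len(text.split())


@dataclass
class ContextSources:
    # Hypothetical container for the pieces that can go into a context window.
    instructions: str                                   # always included
    examples: list[str] = field(default_factory=list)   # few-shot examples (in-context learning)
    retrieved: list[str] = field(default_factory=list)  # RAG results, most relevant first
    history: list[str] = field(default_factory=list)    # conversation turns, oldest first


def build_context(sources: ContextSources, budget: int) -> list[str]:
    """Assemble context pieces in an assumed priority order without exceeding `budget` tokens."""
    used = count_tokens(sources.instructions)
    if used > budget:
        raise ValueError("instructions alone exceed the token budget")
    parts = [sources.instructions]

    # Examples, then retrieved documents: include each one that still fits,
    # skipping any item too large for the remaining budget.
    for item in sources.examples + sources.retrieved:
        cost = count_tokens(item)
        if used + cost <= budget:
            parts.append(item)
            used += cost

    # History: walk backward from the newest turn and stop at the first turn
    # that does not fit, so the kept turns form one contiguous recent run.
    recent: list[str] = []
    for turn in reversed(sources.history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        recent.append(turn)
        used += cost
    parts.extend(reversed(recent))  # restore chronological order
    return parts


if __name__ == "__main__":
    sources = ContextSources(
        instructions="You are a helpful assistant.",
        examples=["Q: 2+2? A: 4"],
        retrieved=["doc one text here", "doc two is a much longer document " * 10],
        history=["user: hi", "assistant: hello", "user: what is context engineering?"],
    )
    # With a 20-"token" budget this keeps the instructions, the example, the
    # short document, and the two most recent turns; the long document is skipped.
    for part in build_context(sources, budget=20):
        print(part)
```

Keeping the newest history turns as one contiguous run, rather than skipping around, avoids handing the model a conversation with gaps in it. A real system would also use the model's own tokenizer rather than a word count, and would choose the budget by testing, in line with the episode's point about evals.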
Notes:
OCLC-licensed vendor bibliographic record.
OCLC:
1546904116
