Why LLMs require judgement as much as context

Large language models (LLMs) have reached a turning point. Scale brought us here, but the next breakthrough is about cultivating judgement: the ability to discern which information matters, when it matters and how it should shape decisions in real time.   

We’ve been conditioned to equate bigger with better: larger models, more parameters, expanded context windows. Yet something fundamental is missing from this equation. A model that can access everything but prioritise nothing hasn’t become more intelligent; it has simply accumulated more potential points of failure.

As LLMs evolve into agentic systems capable of reasoning and autonomous action, their ability to filter signal from noise, weigh relevance and anchor decisions in what truly matters will determine how capable they are.   

Scale alone isn’t enough   

The context window of an LLM, the amount of recent text it can hold and use to shape its next response, has expanded dramatically in recent years: from a few thousand tokens to a few hundred thousand, and in some cases a million. In theory, this should allow LLMs to read and reason across entire documents, sustain longer conversations, and draw on multiple sources to produce more coherent answers.
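The mechanics of a context window can be sketched as a simple token budget: once a conversation outgrows the window, the oldest messages have to be dropped. The sketch below is a minimal illustration assuming a naive whitespace token count; real systems use the model’s own tokeniser.

```python
# Sketch: keep only the most recent messages that fit a context window.
# Token counting here is a naive whitespace split; production systems
# use the model's own tokeniser instead.

def count_tokens(text: str) -> int:
    return len(text.split())

def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the remainder fits the window."""
    kept: list[str] = []
    budget = max_tokens
    for msg in reversed(messages):   # walk newest-first
        cost = count_tokens(msg)
        if cost > budget:
            break                    # oldest messages fall out of memory
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))      # restore chronological order

history = [
    "Hi, I ordered a laptop last week.",
    "The delivery said Tuesday but nothing arrived.",
    "Can you check the status of order 4417?",
]
print(trim_to_window(history, max_tokens=15))
```

With a 15-token budget, only the two most recent messages survive; a larger window keeps the whole exchange, which is exactly the capability that bigger context windows buy.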

However, Stanford’s 2025 AI Index shows that leading LLMs are producing near-identical results on standard proficiency benchmarks despite wide differences in model size and memory. This suggests that increased scale alone does not make a meaningful difference to LLM efficacy.

At the same time, larger context windows are more costly to run. That isn’t necessarily a bad thing: bigger contexts let LLMs handle longer documents, recall past exchanges, and reason across complex information. But for business ROI, the higher spend on compute needs to be matched by better outputs.

Nvidia estimates that keeping a 128K-token conversation (roughly the length of a short book) in an LLM’s working memory can consume about 40 gigabytes of graphics processing unit (GPU) memory. A single long chat can therefore max out an entire GPU, which is very costly for potentially marginal gains in performance.
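A back-of-the-envelope calculation shows where that figure comes from. The model configuration below (80 layers, 8 key/value heads, head dimension 128, 16-bit precision) is an assumption in the style of a 70B-parameter model with grouped-query attention, not the exact model behind Nvidia’s estimate:

```python
# Back-of-the-envelope KV-cache memory for a 128K-token context.
# The configuration is an illustrative assumption (a 70B-class model
# with grouped-query attention), not Nvidia's exact test setup.

num_layers   = 80           # transformer layers
num_kv_heads = 8            # key/value heads (grouped-query attention)
head_dim     = 128          # dimension per head
seq_len      = 128 * 1024   # 128K tokens
bytes_per_el = 2            # fp16 precision

# Both keys and values are cached, hence the factor of 2.
kv_bytes = 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_el
print(f"{kv_bytes / 2**30:.1f} GiB")   # → 40.0 GiB
```

Under these assumptions the key/value cache alone reaches 40 GiB for one conversation, before counting the model weights themselves.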

More data doesn’t mean better answers  

LLMs need the right data to produce answers that are accurate, relevant, and useful. Today, they are being fed more information than ever in a bid to make their responses richer and more precise. This can include recent documents, data from internal knowledge bases, previous chat histories, database records, and live information pulled from APIs or other connected applications.  

Each of these sources adds useful information, but also more complexity. The data is often scattered across different systems, updated at different speeds, and stored in different formats, so stitching it together takes more time and computing power. The crux of the issue, however, is that even with all that data, LLMs aren’t guaranteed to use the right information at the right time.

Stanford and Berkeley’s Lost in the Middle research shows that when models are flooded with long contexts, they often fail to recall what matters most. In other words, simply giving LLMs more information doesn’t help if they can’t recognise what’s relevant.  

A customer support bot that scrolls through an entire chat history instead of focusing on the last issue you raised, for example, is slowed down by the extra information; access to more data does not by itself produce better judgement.

The same issue can crop up in enterprise search. Ask an AI assistant for your company’s latest travel policy, and it might pull up five versions — including one from 2019 — because it can’t judge which source is current. The answer looks comprehensive, but it’s not actually useful.  

In short, the problem isn’t simply how much data an LLM can access, but how well it manages that data.   

The role of context engineering  

If more data alone isn’t the answer, better context is. Context engineering is the practice of deciding what information an LLM needs, when it needs it, and where that information should come from. The aim isn’t to feed models everything, but to help them focus on the right things and produce better outputs.

Getting context engineering right depends on improving performance, relevance, and access. Performance improves when LLMs can reuse work they’ve already done, so time and energy aren’t wasted recomputing answers. Relevance is about helping LLMs narrow their field of view to the data that improves reasoning for a specific task. Access is about ensuring useful data is available, accurate, and secure whenever the model needs it. Taken together, these three elements enable LLMs to make better choices about what to use and when, transforming raw information into meaningful context.

Filtering data to deliver accuracy   

Modern data infrastructure is what makes this possible. Real-time in-memory storage speeds retrieval so LLMs can recall useful context in milliseconds, while semantic caching avoids unnecessary compute by recognising previously answered questions. Vector search helps surface the most relevant information from large stores of data. Together, these techniques give LLMs the ability to use the right context at the right moment, rather than simply remembering everything.

For example, a business using an LLM to summarise company compliance policies risks inaccurate answers if outdated or unrelated documents are merged. With context engineering, the model filters for the most recent verified documents. Real-time retrieval ensures only up-to-date information is used, making answers faster and more accurate. Simply put, the model is not remembering more; it’s reasoning better.
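That filtering step can be sketched as a small pre-retrieval pass: before any document reaches the model, discard unverified drafts and keep only the newest version of each policy. The field names (`name`, `updated`, `verified`) are illustrative assumptions, not a real schema.

```python
# Sketch of retrieval-time filtering: before documents reach the model,
# drop unverified drafts and keep only the newest version per policy.
# Field names here are illustrative assumptions, not a real schema.
from datetime import date

documents = [
    {"name": "travel-policy",  "updated": date(2019, 3, 1),  "verified": True},
    {"name": "travel-policy",  "updated": date(2025, 1, 15), "verified": True},
    {"name": "travel-policy",  "updated": date(2025, 6, 2),  "verified": False},
    {"name": "expense-policy", "updated": date(2024, 9, 9),  "verified": True},
]

def latest_verified(docs):
    newest = {}
    for doc in docs:
        if not doc["verified"]:
            continue                      # skip unverified drafts
        current = newest.get(doc["name"])
        if current is None or doc["updated"] > current["updated"]:
            newest[doc["name"]] = doc     # keep newest version per policy
    return list(newest.values())

for doc in latest_verified(documents):
    print(doc["name"], doc["updated"])
```

Here the 2019 travel policy and the unverified June draft are both filtered out before retrieval, so the model only ever sees the current verified version of each policy.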

AI that makes smarter decisions  

The transition from static models to dynamic agents marks a fundamental shift in what we should expect from AI. Context windows will continue to expand, but scale alone has never equalled wisdom. What separates truly capable AI systems is their capacity to identify which information matters and the judgement to act on it appropriately. This combination of insight and judgement will shape the next generation of AI.   

