How I optimized my AI agent context

By Youcef EL KAMEL

Optimizing AI agent context after losing Opus 1M

I loved Opus 1M for one very simple reason: it let you be a little sloppy.

You could load too much context. Read too many files. Let a skill drag half the repo in “just in case.” And a lot of the time, it still worked.

Once I moved back to GLM 5.1 with a much tighter context window, the system reminded me of something a lot of people forget with LLMs: context is not free.

It’s a budget. And if you spend it badly, everything else gets fragile.

The real problem wasn’t the model

The classic reaction when a workflow breaks after a model switch is:

“The new model is worse.”

Sometimes that’s true. But in my case, the real problem was somewhere else: my skills and crons had developed bad habits.

With a huge context window, workflows become lazy:

  • they read optional files upfront
  • they load complete histories when an excerpt would do
  • they carry full checklists that could be compressed
  • they repeat the same rules in both the cron payload and the skill
  • they keep “safe” but verbose instructions that bloat every run

I saw the result quickly on some Reddit and Instagram runs: context overflow.

Not a cute little overflow either. The kind of run that blows up before it even starts doing the actual work.

What losing Opus 1M forced me to do

In practice, moving to a smaller-context model forced me to clean up the agent layer architecture for real.

And honestly, that was healthy.

I started treating prompts the same way I treat production code:

  • remove dead weight
  • defer anything that can be loaded later
  • eliminate duplicated instructions
  • separate must-have from nice-to-have
  • stop reading files “for comfort”

In other words: move from “load everything and hope for the best” to “load the strict minimum, then expand only if needed.”

1. I introduced a “minimal context first” doctrine

This was the biggest change.

Before, a lot of skills did a kind of XXL preflight:

  • app context
  • strategy
  • templates
  • variations
  • anti-patterns
  • examples
  • full tracking
  • side docs

All of that before the first real action.

Now I structure skills like this:

Phase A, required minimal context

Read only what is needed to make the first good decision.

Typical example:

  • account config
  • app context
  • strategy
  • one primary template file

Phase B, read on demand

Everything else becomes optional:

  • examples
  • long anti-pattern docs
  • extra variations
  • detailed history
  • fallback docs

If a run doesn’t need them, it never loads them.
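A minimal Python sketch of the Phase A / Phase B split, assuming a skill's context is a set of text files. The class name `SkillContext`, the `need` method, and the file names in the usage note are hypothetical, picked only for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List


def _read_file(path: str) -> str:
    """Default reader: load a UTF-8 text file from disk."""
    return Path(path).read_text(encoding="utf-8")


class SkillContext:
    """Loads required files eagerly and optional files only when asked."""

    def __init__(
        self,
        required: List[str],
        optional: List[str],
        reader: Callable[[str], str] = _read_file,
    ) -> None:
        self._reader = reader
        self._optional = set(optional)
        self.loaded: Dict[str, str] = {}
        # Phase A: read only what the first decision needs.
        for path in required:
            self.loaded[path] = reader(path)

    def need(self, path: str) -> str:
        """Phase B: load a declared optional file on first use, then cache it."""
        if path not in self.loaded:
            if path not in self._optional:
                raise KeyError(f"{path} is not a declared optional file")
            self.loaded[path] = self._reader(path)
        return self.loaded[path]

    def prompt_chars(self) -> int:
        """Rough size of everything loaded so far, in characters."""
        return sum(len(text) for text in self.loaded.values())
```

A run would build the context with something like `SkillContext(["account.md", "strategy.md"], ["examples.md"])` and call `need("examples.md")` only on the branch that actually uses examples. Declaring optional files explicitly keeps the expensive reads visible when re-reading the skill.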

It sounds obvious when you say it like that. But in multi-agent systems, this discipline changes everything.

2. I stopped loading full files “just in case”

The worst offenders are history files: tracking CSVs, logs, post lists, giant markdown examples.

Before, some workflows would read the entire file to “have the full picture.”

In reality, 90% of the time you do not need the full file. You need:

  • the header
  • the latest relevant lines
  • or a targeted lookup for one username / status / subreddit

So I started replacing “read all” with:

  • reading the first 50 lines or another useful excerpt
  • then doing targeted lookups during execution if a specific case requires it

That change alone cuts a huge chunk out of the starting prompt.

And more importantly, it makes the run more resilient: the system is no longer dependent on one massive block of context before it can act.
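Here is a rough Python sketch of what replacing "read all" can look like for a tracking CSV. The function names and the column names in the usage note are assumptions for illustration, not the actual files from my setup.

```python
import csv
from collections import deque
from itertools import islice
from typing import Dict, List, Optional


def read_head(path: str, max_lines: int = 50) -> List[str]:
    """Return at most the first `max_lines` lines and stop reading there."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]


def read_tail(path: str, max_lines: int = 20) -> List[str]:
    """Return the last `max_lines` lines.

    The file is still streamed from disk, but only `max_lines` lines are
    kept in memory, so only those can end up in the prompt.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]


def lookup_row(path: str, column: str, value: str) -> Optional[Dict[str, str]]:
    """Stream the CSV and return the first row where `column` equals `value`."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row.get(column) == value:
                return row
    return None
```

The starting prompt gets `read_head(...)` or `read_tail(...)`, and a call such as `lookup_row("tracking.csv", "username", "some_user")` happens later, only when one specific target needs checking.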

3. I removed duplication between cron payloads and skills

Another classic trap: you put the same instructions everywhere.

The cron says:

  • be discreet
  • no preamble
  • use minimal context
  • don’t read the full app folder

Then the skill says:

  • be discreet
  • no preamble
  • read only essentials
  • avoid full tracking loads

Each line seems reasonable on its own. Together, you pay twice for the same rule.

So I simplified the split:

  • the cron payload keeps only the essential runtime constraints
  • the skill keeps the detailed operating logic

The payload is no longer a mini handbook. It goes back to being an execution envelope.
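One way to spot this kind of duplication is a small script that compares the rule lines of a payload with those of a skill. This is only a sketch: it assumes one rule per line, and the normalization is deliberately naive.

```python
import re
from typing import List, Set


def normalize(rule: str) -> str:
    """Lowercase a rule and strip bullets, punctuation, and extra spaces."""
    text = re.sub(r"[^a-z0-9 ]+", " ", rule.lower())
    return " ".join(text.split())


def extract_rules(text: str) -> Set[str]:
    """Treat every non-empty line as one rule."""
    rules = (normalize(line) for line in text.splitlines())
    return {rule for rule in rules if rule}


def duplicated_rules(payload: str, skill: str) -> List[str]:
    """Rules that appear, after normalization, in both texts."""
    return sorted(extract_rules(payload) & extract_rules(skill))
```

It only catches rules that match word for word after normalization. Paraphrases like "use minimal context" versus "read only essentials" still need a human read, but exact repeats are the cheapest ones to remove.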

4. I kept the quality guardrails… but in compact form

I didn’t want to make the opposite mistake and compress so hard that the output quality collapsed.

On Reddit, for example, one critical safeguard for me is the humanizer. I do not want posts that smell like AI from a mile away.

The problem is that loading a full humanizer skill on every public-facing run costs context.

So the approach I now use is:

  • keep the humanizer mandatory
  • but use a compact embedded checklist first
  • only load the full skill as a fallback if the draft still feels suspicious

In practice, that means things like:

  • remove corporate phrasing
  • strip overly polished symmetry
  • avoid em dashes and overly neat structure
  • keep slight human imperfection
  • check whether the text would survive scrutiny from someone hunting AI content

That keeps the quality bar while avoiding the cost of a full systematic load every single time.
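The control flow can be sketched in a few lines of Python. Everything here is a stand-in: `rewrite` represents the model call, `looks_suspicious` represents whatever check decides the draft still smells like AI, and `load_full_skill` represents reading the full humanizer skill. None of these names come from a real library.

```python
from typing import Callable

# Hypothetical compact checklist, condensed from the bullets above.
COMPACT_CHECKLIST = (
    "Remove corporate phrasing. Strip overly polished symmetry. "
    "Avoid em dashes and overly neat structure. Keep slight human imperfection."
)


def humanize(
    draft: str,
    rewrite: Callable[[str, str], str],
    looks_suspicious: Callable[[str], bool],
    load_full_skill: Callable[[], str],
) -> str:
    """Run the cheap pass first; pay for the full skill only if it is needed."""
    revised = rewrite(draft, COMPACT_CHECKLIST)
    if looks_suspicious(revised):
        # Fallback: only now is the full humanizer text loaded into context.
        revised = rewrite(revised, load_full_skill())
    return revised
```

Passing `load_full_skill` as a function rather than as a string is the point of the design: the expensive text is never read unless the fallback branch runs.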

5. I separated “decision context” from “execution context”

This distinction helps me a lot now.

Decision context

What the agent needs to choose the right action. For example: which account to use, which subreddit to target, which template to pick.

Execution context

What the agent needs once the action is already chosen. For example: exact flair, detailed subreddit history, a specific CSV row for one target.

Before, many skills mixed both together and loaded everything up front.

Now I force the sequence:

  1. decide using a light context
  2. only then load what is needed to execute cleanly

This makes agents more disciplined. And it also makes failures much easier to debug.
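As a rough illustration, the forced sequence can be written as three injected steps in Python. The `Decision` fields mirror the examples above (account, subreddit, template); the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Decision:
    """What the light decision context is allowed to produce."""
    account: str
    subreddit: str
    template: str


def run_two_phase(
    decide: Callable[[], Decision],
    load_execution_context: Callable[[Decision], Dict[str, str]],
    execute: Callable[[Decision, Dict[str, str]], str],
) -> str:
    """Decide with a light context, then load only what that decision needs."""
    decision = decide()
    # Execution context depends on the decision, so it cannot be loaded earlier.
    details = load_execution_context(decision)
    return execute(decision, details)
```

Because the execution loader takes the `Decision` as input, there is no way to preload details for targets that were never chosen. When a run fails, it is also clear which of the three steps to look at.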

6. Big context windows hide design debt

This is probably the most interesting lesson in the whole thing.

A large context window can hide:

  • overly verbose skills
  • badly segmented workflows
  • duplicated instructions
  • workspace files that have grown too large
  • crons that accumulated layers of patches over time

Once you go back to a stricter window, all of that comes back to the surface.

And that is good news.

Because underneath, this is not just a token problem. It is an operational design problem.

If an agent needs 15 files and 200 rules before it can click a button, the problem is not only the model. The problem is that the workflow is too heavy.

What I optimized concretely on the cron side

Crons were the first target because they run on their own and need to be reliable.

I started by tightening the payloads of the most sensitive runs:

  • Instagram outreach
  • Instagram inbox / cleanup
  • Reddit posting
  • Reddit warmup
  • Reddit comment acquisition
  • Reddit engagement

The default rule is now:

  • execution-only
  • no useless planning
  • no skill summary inside the run
  • no preamble
  • no full app-folder load
  • no full tracking-file read without a strict reason

A cron should be designed as a tight operational unit. Not as a brainstorming session.
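To make the "execution envelope" idea concrete, here is a small Python sketch of a payload that carries only a skill name, a task, and a few runtime constraints, and refuses to grow past a size cap. The field names, the output layout, and the 400-character cap are arbitrary assumptions, not a real cron format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CronPayload:
    """A compact execution envelope; detailed logic stays in the skill."""
    skill: str
    task: str
    constraints: Tuple[str, ...] = (
        "execution-only",
        "no preamble",
        "no full app-folder load",
        "no full tracking-file read without a strict reason",
    )
    max_chars: int = 400  # arbitrary cap, chosen for illustration

    def render(self) -> str:
        """Build the payload text and fail loudly if it starts to bloat."""
        lines = [f"skill: {self.skill}", f"task: {self.task}", "constraints:"]
        lines += [f"- {constraint}" for constraint in self.constraints]
        text = "\n".join(lines)
        if len(text) > self.max_chars:
            raise ValueError(
                f"payload is {len(text)} chars, limit is {self.max_chars}"
            )
        return text
```

The hard cap is the useful part: the next time someone patches a cron with "one more instruction", the payload fails at build time instead of silently growing.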

And on the skill side, the improvement matters even more

Skills are where most of the battle is won.

When a skill is well written, the model feels like it has more context than it actually does, because the information is better structured.

When a skill is badly written, even a large window never really feels like enough.

The patterns I keep by default now:

  • a very short opening
  • an explicit context-budget doctrine
  • a “required minimal context” section
  • a “read on demand only” section
  • business rules grouped together instead of repeated everywhere
  • a clear fallback when extra context becomes necessary

That is cleaner for the model, but also for me. When I re-read a skill, I can immediately see what is expensive and what is truly essential.
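A tiny check like the following Python sketch can flag skills that drifted away from this layout. The section names are assumptions based on the list above, and the match is a plain case-insensitive substring search, so it only proves the words are present, not that the section is any good.

```python
from typing import List

# Hypothetical section names, taken from the default patterns listed above.
REQUIRED_SECTIONS = (
    "context budget",
    "required minimal context",
    "read on demand only",
    "fallback",
)


def missing_sections(skill_text: str) -> List[str]:
    """Return the expected sections that never appear in the skill text."""
    lowered = skill_text.lower()
    return [name for name in REQUIRED_SECTIONS if name not in lowered]
```

Run over a folder of skills, an empty list means the structure is there, and anything else points at the skill to re-read first.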

The result: less spectacular than Opus 1M, but healthier

Let’s be honest: moving from an ultra-large context window to a tighter one is not fun.

You lose comfort. You need to be more rigorous. You see the limits of your instructions faster.

But the hidden upside is huge: you end up with a much cleaner system.

Today, my best workflows no longer depend on the generosity of a giant context window. They depend on:

  • good decomposition
  • progressive reading
  • compact rules
  • targeted fallbacks
  • a real separation between essential and optional information

And that is much more durable in the long run.

My advice if you run agents in production

If you rely on agents, crons, or recurring LLM workflows, do this exercise even if you still have access to a giant context window.

Ask yourself:

  • what is truly needed for the first step?
  • which files are being read out of habit?
  • which rules are duplicated?
  • what can move to fallback?
  • which histories can be replaced by targeted lookups?

In my experience, most systems can shed 30–70% of their upfront context without losing quality.

And sometimes it even gets better.

Because an agent with less clutter often makes better decisions.
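To run this audit with numbers instead of gut feeling, a rough Python sketch like this one ranks what gets loaded upfront. It assumes about 4 characters per token, which is a crude heuristic and not a real tokenizer, so treat the output as relative sizes only.

```python
import math
from typing import Dict, List, Tuple


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the 4 chars/token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def audit(files: Dict[str, str], budget_tokens: int) -> List[Tuple[str, int, float]]:
    """Return (name, tokens, share of budget), most expensive first."""
    rows = [(name, estimate_tokens(text)) for name, text in files.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, tokens, tokens / budget_tokens) for name, tokens in rows]


def reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of upfront context removed, e.g. 0.6 for a 60% cut."""
    return 1 - after_tokens / before_tokens
```

Feed `audit` the text of every file and rule block a run loads before its first action. The top of the list is usually where the habits are, and `reduction` gives a before/after figure once the cleanup is done.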

Conclusion

Losing Opus 1M forced me to do something I had postponed for too long: treat prompts like architecture.

Not like disposable text. Not like a stack of notes. Not like “let’s dump everything in and see what happens.”

I now see much more clearly the difference between:

  • a system that works because it has a lot of margin
  • and a system that works because it is well designed

The first one is comfortable. The second one is durable.

And if I have to choose for agents that run in production every day, I’ll take the second one every time.

Youcef | Creative Builder. I build systems that work while I sleep.

#OpenClaw #GLM 5.1 #context window #AI agents #cron #skills #optimization #LLM ops #multi-agent