How a Fuzzer Works: Mutations, Seeds, and Corpus Explained

Complete Fuzzing Series - Blog 2 of N

Part | Title | Status
1 | What Is Fuzzing and Why Does It Actually Work? | Published
2 (you are here) | What Happens Inside a Fuzzer: Seeds, Mutations, and Corpus | Published
3 | The Taxonomy: Every Type of Fuzzing Explained | Coming soon
4 | What Happens Inside a Program When It Crashes | Coming soon
5 | Sanitizers: Your Fuzzing Superpowers | Coming soon
6+ | More parts coming… | Coming soon

→ Next: Blog 3 - The Taxonomy: Every Type of Fuzzing Explained

← Previous: Blog 1 - What Is Fuzzing and Why Does It Actually Work?

Part of the Complete Fuzzing Series on IoTSec.in


In Blog 1, I told you fuzzing is not random. It’s guided. It finds the gaps developers never thought about.

But I never explained how exactly that works.

Like, when you actually run AFL on your laptop, what is happening? What is the fuzzer doing every second? What is a seed? What is a corpus? What does “mutation” actually mean in practice?

That’s what this blog is about.

I’m going to walk you through everything using one story. Jack, a beginner security researcher, sits down at his laptop and runs his first fuzzing campaign. We’ll follow him from start to crash.


The Setup - Jack Has One File and One Program

Jack downloaded AFL. He has a JPEG parser on his Linux machine: a program that reads JPEG files and processes them.

He wants to fuzz it. Find bugs in it.

He has one valid JPEG file on his desktop: photo.jpg. A real photo that the parser already accepts without any problem.

That’s all he needs to start.


What Is a Seed - The Starting Point

Jack’s photo.jpg is his seed.

A seed is just a valid, real input that the program already accepts. Nothing special about it. It’s the starting point, the base from which everything else is built.

Why do we need a seed instead of just generating random bytes?

Because random bytes get rejected immediately. If you feed pure garbage to a JPEG parser, it checks the first few bytes, sees they don’t form a valid JPEG header, and exits. The fuzzer never gets past the front door.

A valid seed gets inside the program. It reaches the actual parsing logic. Now the fuzzer has something to work with.

Random bytes → parser rejects at door → fuzzer learns nothing
Valid seed   → parser accepts → fuzzer gets inside → real testing begins

Good seeds matter. A seed that gets deep inside the program gives the fuzzer a much better starting position than one that gets rejected at the first check.
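
To make the "front door" concrete, here is a minimal sketch of the kind of check a JPEG parser does first. The function name looks_like_jpeg is made up for illustration, and a real parser checks far more than this; the only fact it relies on is that JPEG files begin with the bytes FF D8.

```python
import os

JPEG_SOI = b"\xff\xd8"  # "start of image" marker that opens a JPEG file


def looks_like_jpeg(data: bytes) -> bool:
    """Toy front-door check: accept only inputs that begin with FF D8."""
    return data.startswith(JPEG_SOI)


seed = bytes.fromhex("FFD8FFE000104A46")  # first bytes of a typical JPEG
garbage = os.urandom(8)                   # pure random bytes

print(looks_like_jpeg(seed))     # True
print(looks_like_jpeg(garbage))  # almost always False
```

Even with this tiny two-byte check, only about 1 in 65,536 random inputs would get through. The seed gets through every time.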


What Is Mutation - How the Fuzzer Creates New Inputs

Jack has photo.jpg. AFL doesn’t just feed the same file over and over. That would be useless: the same input always produces the same result.

AFL takes photo.jpg and starts making mutations: copies with small, deliberate changes.

Think of it like a photocopier that makes mistakes on purpose. You put in the original. It copies it, but randomly smudges a word, cuts off the last line, duplicates a paragraph, writes garbage in the margin.

Each copy is slightly different. Slightly broken. Most of them are useless. But occasionally one broken copy triggers something the original never could.

AFL does the same thing to bytes:

Original photo.jpg:   FF D8 FF E0 00 10 4A 46

Bit flip:             FF D8 FF E0 00 10 4B 46  ← one bit changed
Insert bytes:         FF D8 FF E0 00 00 00 10 4A 46  ← bytes added
Delete bytes:         FF D8 FF E0 4A 46  ← bytes removed
Repeat a chunk:       FF D8 FF D8 FF D8 FF E0  ← section repeated
Replace with zero:    FF D8 FF E0 00 00 00 00  ← zeroed out
Replace with max:     FF D8 FF E0 FF FF FF FF  ← maxed out

No intelligence. No reading the code. Just mechanical changes. Millions of them per hour.
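
The mutations in the table above can be sketched in a few lines of Python. This is not AFL's code (AFL is written in C and has many more mutation stages); the function names are my own and the positions are hard-coded so the output matches the hex dumps above.

```python
def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Flip one bit (bit 0 = least significant) inside one byte."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


def insert_bytes(data: bytes, index: int, extra: bytes) -> bytes:
    """Insert extra bytes before position `index`."""
    return data[:index] + extra + data[index:]


def delete_bytes(data: bytes, index: int, count: int) -> bytes:
    """Remove `count` bytes starting at `index`."""
    return data[:index] + data[index + count:]


def overwrite(data: bytes, index: int, count: int, value: int) -> bytes:
    """Replace `count` bytes starting at `index` with one repeated byte value."""
    return data[:index] + bytes([value]) * count + data[index + count:]


original = bytes.fromhex("FFD8FFE000104A46")

print(flip_bit(original, 6, 0).hex(" ").upper())                # FF D8 FF E0 00 10 4B 46
print(insert_bytes(original, 4, b"\x00\x00").hex(" ").upper())  # FF D8 FF E0 00 00 00 10 4A 46
print(delete_bytes(original, 4, 2).hex(" ").upper())            # FF D8 FF E0 4A 46
print(overwrite(original, 4, 4, 0x00).hex(" ").upper())         # FF D8 FF E0 00 00 00 00
print(overwrite(original, 4, 4, 0xFF).hex(" ").upper())         # FF D8 FF E0 FF FF FF FF
```

A real fuzzer picks the position and the operation itself, usually at random, instead of taking them as arguments.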

One important thing: AFL is not completely random. It has a list of known dangerous values it always tries:

0
-1
255
65535
2147483647

These are values that have historically broken programs. Developers forget to handle them. AFL always tries substituting these values into the input because experience shows they find bugs.

The fuzzer doesn’t know why these values are dangerous. Someone wrote that list into AFL years ago and it keeps working.
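
Here is a rough sketch of what "substituting a dangerous value" looks like at the byte level. The helper name substitute, the 4-byte little-endian encoding, and the idea that offset 4 holds some kind of length field are all assumptions for this example, not how AFL or the JPEG format actually lay things out.

```python
def substitute(data: bytes, offset: int, value: int, width: int) -> bytes:
    """Overwrite `width` bytes at `offset` with `value` (little-endian, wrapped to fit)."""
    mask = (1 << (8 * width)) - 1
    encoded = (value & mask).to_bytes(width, "little")
    return data[:offset] + encoded + data[offset + width:]


original = bytes.fromhex("FFD8FFE000104A46")
DANGEROUS = [0, -1, 255, 65535, 2147483647]  # the values listed above

# Pretend bytes 4..7 are a 4-byte field and try every value there.
for value in DANGEROUS:
    mutated = substitute(original, 4, value, 4)
    print(f"{value:>10} -> {mutated.hex(' ').upper()}")
```

The output hints at why these values hurt: -1 becomes FF FF FF FF, which is also the largest unsigned 32-bit number, and 2147483647 is the largest signed 32-bit number, so adding 1 to it overflows.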


What Is Instrumentation - The Hidden Cameras

Here’s something I had to understand before corpus made sense.

AFL doesn’t just feed files to the parser and hope for a crash. It watches what happens inside the program during every single run.

But wait, AFL doesn’t read the code. So how does it watch what happens inside?

Instrumentation. Before fuzzing starts, you compile the program with AFL’s compiler wrapper (afl-gcc or afl-clang-fast), which inserts special markers at every decision point:

// Original code
if (file_size > 1000) {
    process_large();
} else {
    process_small();
}

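// After AFL instruments it (conceptual: report_marker is a made-up name,
// the real markers are a few injected machine instructions)
if (file_size > 1000) {
    report_marker(MARKER_A);   // AFL planted this
    process_large();
} else {
    report_marker(MARKER_B);   // and this
    process_small();
}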

When the program runs, these markers fire and report back to AFL: “branch A was taken” or “branch B was taken.”

AFL keeps a master checklist:

MARKER_A hit? → NO  (not yet)
MARKER_B hit? → YES (seed reached here)
MARKER_C hit? → NO  (not yet)

Every single run, AFL updates this checklist. This is how it tracks which parts of the program have been explored.

This checklist is stored in a bitmap: an array of 65,536 slots in memory. Each slot stands for one branch in the program and holds a small hit counter. Zero means never hit; anything above zero means hit, and roughly how often.

Checking “did this run reach anything new” is just one comparison between two bitmaps. Happens in microseconds. That’s why AFL can run millions of inputs per hour.
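
Here is a small sketch of that checklist idea, simplified to hit / not hit and ignoring hit counts. The function names and the marker numbers are invented for the example; real AFL does this in optimized C over shared memory, comparing many slots at a time.

```python
from typing import Iterable

MAP_SIZE = 65536  # number of slots, as described above


def run_to_bitmap(markers_hit: Iterable[int]) -> bytearray:
    """Build the bitmap for one run: slot = 1 for every marker that fired."""
    bitmap = bytearray(MAP_SIZE)
    for marker_id in markers_hit:
        bitmap[marker_id % MAP_SIZE] = 1
    return bitmap


def has_new_coverage(run: bytearray, master: bytearray) -> bool:
    """Return True if the run hit a slot the master checklist never saw.

    Also merges the run into the master checklist.
    """
    found_new = False
    for slot, hit in enumerate(run):
        if hit and not master[slot]:
            master[slot] = 1
            found_new = True
    return found_new


master = bytearray(MAP_SIZE)  # nothing explored yet

print(has_new_coverage(run_to_bitmap([2, 4]), master))     # True: first run, all new
print(has_new_coverage(run_to_bitmap([2, 4]), master))     # False: same markers again
print(has_new_coverage(run_to_bitmap([2, 4, 6]), master))  # True: marker 6 is new
```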


What Is a Corpus - The Folder of Survivors

Now we get to the most important concept. And the one I found most confusing at first.

Jack’s AFL run is creating millions of mutated files. Most get thrown away. Some get saved. The ones that get saved: that collection is called the corpus.

But what decides if a file gets saved or thrown away?

Simple. One question:

Did this input hit any branch marker that was never hit before?

YES → interesting input → save it to corpus
NO  → useless input    → throw away forever

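That’s the core rule. (AFL also keeps an input that hits a known branch far more often than before, but new branches are the main idea.)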

Let’s watch it happen:

Seed photo.jpg runs → hits MARKER_B and MARKER_D
Master checklist: MARKER_B ✓, MARKER_D ✓
Seed saved to corpus (everything was new at start)

Mutated v1 runs → hits MARKER_B and MARKER_D
Same as before. Nothing new.
→ DISCARDED

Mutated v2 runs → hits MARKER_B, MARKER_D, MARKER_F
MARKER_F is new! Never hit before.
→ SAVED TO CORPUS

Master checklist updates: MARKER_B ✓, MARKER_D ✓, MARKER_F ✓

So after this, the corpus folder looks like:

output/queue/
    id:000000  ← original seed (photo.jpg)
    id:000001  ← mutated v2 (hit MARKER_F, the new one)

Mutated v1 is gone. Forever. Never touched again.

The corpus is just a folder of files on disk. Real files. You can open them, copy them, delete them. Each one looks like a slightly broken JPEG. But each one earned its place by unlocking a part of the program that nothing before it had reached.

Every corpus file is a mutated input. But not every mutated input becomes a corpus file.
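
The walkthrough above fits in a few lines of Python. This is only a sketch of the save-or-discard rule with marker names taken from the example; the function name consider is my own, and it ignores hit counts and everything else a real fuzzer tracks.

```python
seen_markers = set()  # the master checklist
corpus = []           # names of saved inputs, in the order they were saved


def consider(name, markers_hit):
    """Save the input if it hit at least one marker that nothing hit before."""
    new_markers = set(markers_hit) - seen_markers
    if new_markers:
        seen_markers.update(new_markers)
        corpus.append(name)
        return True
    return False  # nothing new: the input is discarded


consider("photo.jpg", {"MARKER_B", "MARKER_D"})               # saved: everything is new
consider("mutated v1", {"MARKER_B", "MARKER_D"})              # discarded: nothing new
consider("mutated v2", {"MARKER_B", "MARKER_D", "MARKER_F"})  # saved: MARKER_F is new

print(corpus)                # ['photo.jpg', 'mutated v2']
print(sorted(seen_markers))  # ['MARKER_B', 'MARKER_D', 'MARKER_F']
```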


The Queue - Not All Corpus Files Are Equal

Corpus keeps growing. Now AFL has many files to choose from for the next round of mutations.

But which one does it pick first?

AFL uses a queue, a priority system. Not all corpus files are equally valuable. Some are better starting points than others.

AFL favors:

Smaller files      → easier to mutate meaningfully
Faster execution   → more mutations per second
More new branches reached → already productive, might go deeper

A tiny 200-byte file that reached 5 new branches gets picked before a 10MB file that reached 1 new branch.

AFL calls its best inputs “favored”: it spends more mutations on these and skips less interesting ones in some rounds.

Queue priority:
[small file, 5 new branches]  ← AFL picks this often
[medium file, 2 new branches] ← picked sometimes
[large file, 1 new branch]    ← picked rarely
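
One way to picture the priority idea is a simple sort. To be clear, this scoring is something I made up to illustrate the three preferences above; it is not AFL's actual algorithm. The medium file's size and all the execution times are invented numbers.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    name: str
    size_bytes: int
    exec_ms: float     # how long one run takes (invented numbers)
    new_branches: int  # branches this input was first to reach


def priority_key(entry: QueueEntry):
    """Sort key: more new branches first, then cheaper (size x time) first."""
    return (-entry.new_branches, entry.size_bytes * entry.exec_ms)


queue = [
    QueueEntry("large", 10_000_000, 40.0, 1),
    QueueEntry("small", 200, 0.5, 5),
    QueueEntry("medium", 50_000, 3.0, 2),
]

for entry in sorted(queue, key=priority_key):
    print(entry.name)  # small, then medium, then large
```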

The Full Loop - Everything Together

Now let’s watch Jack’s entire fuzzing session from start to crash.

Jack types one command:

afl-fuzz -i seeds/ -o output/ -- ./jpeg_parser @@

AFL starts. Here’s the loop it repeats, many times every second:

1. Pick a file from corpus (queue)
2. Mutate it: flip bits, insert bytes, try dangerous values
3. Feed the mutated file to jpeg_parser
4. Parser runs completely, start to finish, then exits
5. AFL checks the bitmap: any new branches hit?
   YES → save this file to output/queue/
   NO  → throw it away
6. Did it crash?
   YES → save to output/crashes/
7. Go back to step 1
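
The seven steps above can be sketched as a toy fuzzer in Python. Everything here is a stand-in: toy_parser is a fake target with a planted bug, SimulatedCrash plays the role of a real crash signal, and the mutation list is tiny. It shows the shape of the loop, not AFL's implementation.

```python
import random


class SimulatedCrash(Exception):
    """Stands in for the target process dying on a signal."""


def toy_parser(data: bytes) -> set:
    """Fake target: returns the markers it hit, raises on the planted bug."""
    hit = {"start"}
    if data[:2] != b"\xff\xd8":
        return hit                      # rejected at the front door
    hit.add("header_ok")
    if len(data) > 4 and data[4] == 0xFF:
        hit.add("big_length")
        if len(data) > 5 and data[5] == 0xFF:
            raise SimulatedCrash("length field out of range")
    return hit


def mutate(data: bytes, rng: random.Random) -> bytes:
    """One random mutation: flip a bit, or overwrite a byte with 0x00 or 0xFF."""
    out = bytearray(data)
    pos = rng.randrange(len(out))
    choice = rng.randrange(3)
    if choice == 0:
        out[pos] ^= 1 << rng.randrange(8)
    elif choice == 1:
        out[pos] = 0x00
    else:
        out[pos] = 0xFF
    return bytes(out)


def fuzz(seed: bytes, rounds: int, rng: random.Random):
    corpus, crashes = [seed], []
    seen = set(toy_parser(seed))        # the seed's coverage starts the checklist
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)  # steps 1 and 2
        try:
            hit = toy_parser(candidate)              # steps 3 and 4
        except SimulatedCrash:
            crashes.append(candidate)                # step 6
            continue
        if hit - seen:                               # step 5: anything new?
            seen |= hit
            corpus.append(candidate)
    return corpus, crashes


corpus, crashes = fuzz(bytes.fromhex("FFD8FFE000104A46"), 5000, random.Random(1))
print(len(corpus), "corpus entries,", len(crashes), "crashing inputs")
```

Notice that the planted bug needs two bytes to be 0xFF, and each mutation changes only one byte. The fuzzer can only get there in two steps: first it saves an input that reaches "big_length", then it mutates that saved input again. That stepping-stone effect is the whole reason the corpus exists.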

That loop runs millions of times per hour. The corpus grows. New branches get discovered. The program gets explored deeper and deeper.

Jack watches his screen:

total execs    :  2,400,000
corpus size    :  47 inputs
crashes found  :  3
branches hit   :  1,240 / 3,000

20 minutes later:

output/crashes/
    id:000000,sig:11  ← this input crashed the parser
    id:000001,sig:11  ← this one too
    id:000002,sig:06  ← and this one

AFL saved the exact files that caused each crash. Jack can now feed them manually to the parser and confirm. Then investigate: is this exploitable?
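
The sig: part of each file name is the signal that killed the parser. Here is a small sketch for decoding it; the helper names are my own, and real AFL crash names usually carry more comma-separated fields after sig, which this parser simply skips.

```python
import signal


def crash_signal(filename: str) -> int:
    """Extract the signal number from a name like 'id:000000,sig:11'."""
    for field in filename.split(","):
        key, _, value = field.partition(":")
        if key == "sig":
            return int(value)
    raise ValueError(f"no sig field in {filename!r}")


def signal_name(number: int) -> str:
    """Translate a signal number to its name, if this platform knows it."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


for name in ["id:000000,sig:11", "id:000001,sig:11", "id:000002,sig:06"]:
    number = crash_signal(name)
    print(name, "->", number, signal_name(number))
```

On Linux, signal 11 is SIGSEGV (an invalid memory access) and signal 6 is SIGABRT (the program aborted itself, for example on a failed assertion).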

AFL doesn’t stop. It keeps running. More crashes might be hiding deeper.


One Thing That Confused Me

“The corpus and the mutated inputs look the same. Both are just broken JPEG files. What’s the difference?”

This confused me too. Here’s the answer:

AFL creates millions of mutated files. The corpus holds only the ones that unlocked new code. Out of 1,000,000 mutations, maybe 200 make it into the corpus. The other 999,800 are gone forever.

1,000,000 mutated files created
    999,800 → nothing new → thrown away
        200 → new branch hit → saved to corpus

They look the same from outside. But corpus files are the survivors. The ones that proved themselves useful.


Why Crashes Don’t Stop the Fuzzer

When AFL finds a crash, it saves the file to output/crashes/ and keeps going.

Why not stop?

Because one crash doesn’t mean the job is done. There might be 50 more crashes hiding in code paths that haven’t been reached yet. AFL’s job is to find all of them. Your job as a researcher is to analyze them later.

Not every crash is a vulnerability either. Some crashes are just bugs with no security impact. The fuzzer doesn’t know the difference, so it saves everything and lets you decide.


What I Found Confusing (And Now Don’t)

“How does the fuzzer know a new code path was reached if it can’t read the code?”
Instrumentation. The compiler plants markers at every branch during compilation. AFL watches which markers fire during each run. No code reading required.

“What exactly is a corpus? Is it the code paths?”
No. The corpus is the actual files: the inputs that caused new code paths to be reached. You can open a corpus file in a hex editor. It’s just bytes. Modified versions of your seed that happened to unlock new territory.

“Does AFL follow the program while it runs?”
No. AFL and the parser are separate processes. AFL hands a file to the parser. The parser runs completely and exits, with the markers recording what was hit along the way. AFL then checks what happened and creates the next file. AFL doesn’t steer the parser during the run.

“Why do good seeds matter so much?”
A bad seed gets rejected at the first check. The fuzzer never gets past the front door. A good seed gets deep inside the program and gives AFL a rich starting point to mutate from.


What We Learned

Seed - A valid, real input that the program already accepts. The fuzzer’s starting point. The base from which all mutations are created.

Mutation - Taking an existing input and changing it slightly — flipping bits, inserting bytes, deleting bytes, substituting known dangerous values. The fuzzer creates millions of mutations per hour.

Instrumentation - The process of compiling a program with hidden markers at every branch point. These markers report back to AFL which parts of the code were executed during each run.

Bitmap - AFL’s master checklist. 65,536 slots, each standing for one branch in the program and holding a small hit counter (0 = never hit). AFL uses this to decide in microseconds whether an input reached anything new.

Corpus - The folder of saved inputs that each reached at least one new branch. Every corpus file earned its place. Useless inputs are thrown away forever.

Queue - AFL’s priority system for deciding which corpus file to mutate next. Smaller files and files that reached more new branches get picked first.

Coverage-guided fuzzing - Fuzzing that uses instrumentation and bitmap tracking to guide mutations toward unexplored code. The fuzzer is not random: it builds a map of the program and keeps pushing toward the parts it has not reached yet.

Interesting input - Any input that hits a new branch, hits an existing branch significantly more times than before, or causes a crash. Everything else is uninteresting and gets discarded.

Crash - When a mutation causes the program to terminate unexpectedly. AFL saves these automatically to output/crashes/. The fuzzer doesn’t stop when it finds one; it keeps going.

