Leo+Dad · Made for Leo
Collecting Datasets
Rung 1 of 3 · Discover

Where a Dataset Comes From

Before any charts, let's watch a messy pile of observations turn itself into neat counts — and see how a fair handful can speak for a whole crowd, while a biased one quietly lies.

NESA SC4-DA1-01 · The start of data science

Play: Click observations in and watch the frequency table fill. Then flip to sampling and see a fair sample and a biased one give two different answers.

Here's the whole idea in one breath: data starts as a messy pile of raw observations, and the work is turning that pile into counts you can actually read. A tally and a frequency table do exactly that. And because you usually can't measure everyone, you take a sample — which only tells the truth if it's fair.

The Messy Pile

Imagine you stood at the school gate and wrote down the eye colour of the first forty kids through it: brown, blue, brown, brown, green, hazel, blue, brown… Forty scribbles. Is brown winning? You genuinely can't tell — your eye slides off a list of forty words. The raw observations are all there, but a pile isn't an answer. You have to organise it before it means anything.

The fix is ancient and brilliant: a tally. Each time a value shows up you draw one stroke next to it, and every fifth stroke goes across the previous four, so you can count in fives at a glance. Add up the strokes for each value and you've got a frequency — the number of times that value appeared. Lay every value beside its frequency in a little two-column table and that's a frequency table: the pile, finally turned into counts.
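If you'd like to see the same trick done by a computer, here is a small Python sketch. The list of eye colours is made up for illustration (it is not real survey data), and the helper name `tally_marks` is our own invention. It draws each bundle of five as four strokes with a slash across them.

```python
from collections import Counter

# Made-up observations for illustration only (not real survey data).
observations = ["brown", "blue", "brown", "brown", "green",
                "hazel", "blue", "brown", "brown", "blue", "brown"]


def tally_marks(count):
    """Draw a count as tally marks: bundles of five, then leftover strokes."""
    bundles, leftover = divmod(count, 5)
    parts = ["||||/"] * bundles          # four strokes plus one across = 5
    if leftover:
        parts.append("|" * leftover)     # the strokes that don't fill a bundle
    return " ".join(parts)


# Counter walks through the pile once and counts how often each value appears.
frequency = Counter(observations)

# Print the frequency table: value, tally marks, frequency.
for value, count in frequency.most_common():
    print(f"{value:<6} {tally_marks(count):<10} {count}")
```

Brown shows up six times in this pretend pile, so its row gets one full bundle and one extra stroke. The pile and the table hold exactly the same information; only the table can be read at a glance.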

Say it plainly: raw observations are just a pile. A tally counts them as they come in; a frequency table is the tidy result — each value with the number of times it showed up. Same data, but now you can read it.

You Can't Measure Everyone

There's a second problem hiding behind the first. You wanted to know about the whole school's eye colour, but you only counted forty kids — you took a sample. That's normal; nobody measures the entire population every time. The catch is that a sample only tells the truth about the crowd if it's representative — a fair, unbiased slice of it.

Picture asking "what's everyone's favourite sport?" but only sampling kids walking out of basketball training. Your frequency table will scream basketball, and it'll be a lie about the school, because the sample was loaded before you started. A fair sample is picked so that every person in the crowd has an equal chance of being chosen; a biased sample quietly leaves some of them out, then speaks confidently for people it never asked.

Flip the toy to its sampling mode. The same coloured crowd sits there the whole time — you're not changing it. A fair random handful lands close to the real split; a biased handful, grabbed only from one corner, gives a confidently wrong answer. Same population, two samples, two different "truths." That gap is the whole reason careful collecting matters.
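The same experiment can be sketched in a few lines of Python. This is not how the toy itself works under the hood; it is a stand-in with an invented crowd of 200 people, where the names `crowd`, `share_of_red` and the "north"/"south" corners are all assumptions made up for the example.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the same handfuls come out on every run

# An invented crowd of 200: each person is (colour, corner they stand in).
# Red is deliberately bunched in the north corner so bias has something to grab.
crowd = ([("red", "north")] * 40 + [("blue", "north")] * 10 +
         [("red", "south")] * 20 + [("blue", "south")] * 130)


def share_of_red(people):
    """Fraction of the given people whose colour is red."""
    counts = Counter(colour for colour, _corner in people)
    return counts["red"] / len(people)


# Fair sample: every person in the crowd has the same chance of being picked.
fair_sample = random.sample(crowd, 20)

# Biased sample: the handful is grabbed from the north corner only.
north_corner = [person for person in crowd if person[1] == "north"]
biased_sample = random.sample(north_corner, 20)

print(f"real share of red:    {share_of_red(crowd):.2f}")
print(f"fair sample says:     {share_of_red(fair_sample):.2f}")
print(f"biased sample says:   {share_of_red(biased_sample):.2f}")
```

The real share of red in this pretend crowd is 0.30. The fair handful will wobble around that number from run to run. The biased handful can never report less than 0.50, because the north corner only holds ten blue people to pick from. The crowd never changed; only the way of choosing did.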

Us, Thinking Out Loud

If a frequency table just counts what you collected, what happens to it when the collecting was unfair to begin with?

Where have you seen a “9 out of 10 people” claim that probably only asked a loaded little crowd?