2 · How to do it — Collecting Datasets

Collecting a dataset well isn't luck — it's the same four moves every time. Run them in order and the messy pile sorts itself out.

The Four Steps

One — define the variable, with units. Decide exactly what you're recording before you start. Not "how tall" but "height in centimetres." Not "how warm" but "temperature in °C." A fuzzy variable gives you a fuzzy pile that won't add up later.

Two — decide a fair sample. You can't measure everyone, so choose who or what you'll measure — fairly, so every part of the crowd has an equal chance of being picked. A random handful beats "whoever's standing near me" every time.

Three — record into a tidy table. As results come in, write them straight into a table where each row is one observation and each column is one variable. One thing measured, one row. This raw table is your honest record before any counting.

Four — tally into a frequency table. Now sweep through the tidy table and tally each value, turning rows of raw results into a frequency table — each value beside the number of times it appeared.

Say it plainly: define it (with units) → sample it fairly → record it row-by-row → tally it into counts. The tidy table is the raw truth; the frequency table is the summary. Don't skip straight to counting — the tidy table is what keeps you honest.

When the Data Comes from an Experiment

Sometimes you're not surveying a crowd — you're measuring something you change on purpose, like how far a toy car rolls from different ramp heights. The same routine holds, but you add one rule on top: run a fair test. Change one thing (the ramp height), keep everything else the same (same car, same floor, same starting push), and measure the result. Change two things at once and you'll never know which one moved the numbers — your dataset becomes unreadable before you even tally it.

A Worked One, Slowly

Question: collect a dataset of how many hours of sleep the kids in your class got last night.

Define it: the variable is hours of sleep last night, rounded to the nearest half hour — units locked in. Sample it fairly: you ask the whole class, so there's no bias; every kid is in. Record it tidily: one row per kid, one column for their name and one for their hours — Mia 8, Jack 6.5, Priya 9, Omar 7… — the raw, honest record. Tally it: sweep the hours column and build a frequency table that says 6.5 hrs: 3 kids, 7 hrs: 5 kids, 8 hrs: 6 kids… Four moves, in order, and a scribbled class survey becomes a dataset you could actually chart. That structure works for eye colour, dice rolls, ramp distances — anything.

Collecting a Dataset, Step by Step

The Four Steps

When the Data Comes from an Experiment

A Worked One, Slowly

Us, Thinking Out Loud