Machine Learning, From Zero
What a model really is, how it learns from data instead of from rules you write, and the four kinds of machine-learning system — built from first principles, one plain idea at a time.
1. What "machine learning" actually means
2. How a model learns
3. The four kinds of ML system
4. Common pitfalls when learning ML
5. Check yourself
6. Adoption checklist
7. Related
8. References
Picture a neighbourhood bakery that wants to know how many loaves to bake tomorrow. The owner could write the rules by hand: start at 200, add 50 on a public holiday, subtract 30 when it rains, bump it up the week schools reopen. This works for a while. Then a festival lands on a rainy Tuesday, two rules collide, and the shelves are wrong all day. Every new situation needs another rule, and the rules start contradicting each other.
Machine learning flips the direction of that effort. Instead of writing the rules, you hand a program three years of daily sales — the date, the weather, the holidays, the foot traffic, and the number that actually sold — and let it work out the relationship for itself. The output of that process is a model: a mathematical relationship, derived from data, that maps an input you have (tomorrow's conditions) to an answer you want (tomorrow's demand).
That is the whole shift. Traditional software is logic a human wrote. Machine learning is a relationship a program found in data. You are no longer programming the answer; you are programming the search for the answer (Mitchell, 1997). The skill stops being "can I express this rule?" and becomes "do I have data where the pattern lives, and do I know what answer I'm asking for?"
This matters because it changes where bugs come from. A rule-based system fails when a human forgets a case. A learned system fails when the data is biased, the question is fuzzy, or the world shifts away from what the model saw in training. Different failure modes need different instincts — which is what the rest of this primer builds.
Six terms carry most of the weight in any first ML conversation. Learn them as a connected set, not a glossary: each one only makes sense next to the others.
A model is the learned relationship itself — the thing that turns an input into an answer. It is not code a person wrote line by line.
Concretely, a model is a structure plus a set of numbers: feed in tomorrow's conditions, the numbers combine them, and an answer comes out. You can save it, copy it, and run it a million times. Everything else in this section is about where those numbers come from and what they act on.
A feature is one input the model reads — one column of your data. For the bakery: the day of the week, the temperature, "is it a holiday," last week's sales. Each is a feature.
Choosing good features is most of the real work. A model can only find a pattern in what you give it: if footfall drives sales but you never recorded footfall, no amount of cleverness recovers it. Features are the model's entire view of the world.
A label is the known, correct answer attached to a past example — the number that actually sold on a given day. Features describe the situation; the label is the outcome you wish you could predict.
Labelled data, where every row carries its answer, is the raw material a model learns from. The quality of those labels sets the ceiling on everything: a model trained on sloppy answers learns to reproduce sloppy answers, confidently.
Training is the loop that turns data into a model. The program makes a guess, compares it to the label, measures how wrong it was, and nudges its internal numbers to be a little less wrong — then repeats, across every example, often millions of times.
Nobody hand-tunes those numbers. Training is automated search for the settings that best match inputs to known answers. When people say a model "learned," this loop is what they mean — nothing more mystical than that.
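The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular library trains: the model is a single number `w` (prediction = `w * x`), "how wrong" is measured as squared error, and each nudge is one step of plain gradient descent with a fixed learning rate. The data and the names `train` and `mean_squared_error` are hypothetical.

```python
# A minimal sketch of the training loop: guess, measure the error, nudge, repeat.
# Assumptions (hypothetical): a one-number model (prediction = w * x),
# squared error as the measure of "how wrong", and a fixed learning rate.

def train(examples, steps=200, learning_rate=0.01):
    w = 0.0  # the model's single internal number, starting from a blank guess
    for _ in range(steps):
        # Average gradient of the squared error (w*x - y)**2 over all examples.
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        # Nudge w a little in the direction that reduces the error.
        w -= learning_rate * grad
    return w

def mean_squared_error(w, examples):
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

# Hypothetical data: the label is roughly 3 times the feature.
examples = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2), (4.0, 11.8)]
w = train(examples)
print(round(w, 2))
print(mean_squared_error(0.0, examples), mean_squared_error(w, examples))
```

The learned `w` lands near 3, the slope hidden in the hypothetical data, and its error is far lower than that of the starting guess; no one typed that number in.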
A prediction is the model's output on data it has never seen. Training happens once on past data; prediction is what you actually use the model for, day after day, on new inputs.
The honest test of a model is never how well it fits the data it trained on — it is how well it predicts cases it was never shown. A model that aces yesterday and fumbles tomorrow has memorised, not learned.
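To make the memorise-versus-learn distinction concrete, here is a hypothetical comparison on held-out data. A "memoriser" stores every training answer in a lookup table; the other model is a rule that captured the underlying pattern (written directly here for brevity rather than trained). The data, the names, and the memoriser's fallback guess of 0 are all assumptions for illustration.

```python
# A minimal sketch of judging models on data they never saw.
# Assumptions (hypothetical): a lookup-table "memoriser" and a rule that
# captured the pattern label = 3 * feature (hand-written here, not trained).

train_set = [(1, 3), (2, 6), (3, 9), (4, 12)]
test_set = [(5, 15), (6, 18)]  # inputs neither model was shown

memory = dict(train_set)

def memoriser(x):
    # Perfect recall on training inputs; for anything new it can only
    # fall back to a fixed guess (here 0).
    return memory.get(x, 0)

def fitted_rule(x):
    return 3 * x

def mean_absolute_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)

for name, model in [("memoriser", memoriser), ("fitted rule", fitted_rule)]:
    print(name,
          "train error:", mean_absolute_error(model, train_set),
          "test error:", mean_absolute_error(model, test_set))
```

Both score zero error on the training rows, but on the two unseen inputs the memoriser's mean absolute error is 16.5 while the rule's stays at 0.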
Weights and biases are the internal numbers training tunes. A weight scales how much one feature matters; a bias shifts the baseline answer up or down regardless of the inputs.
A tiny example makes it concrete: total cost at a parking garage is a fixed entry fee plus a per-hour charge. The per-hour charge is a weight on the "hours" feature; the entry fee is the bias — what you pay before any feature moves. Training is, in essence, the search for the right weights and bias (Google ML Glossary).
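The parking-garage example can be written as a one-feature model. In this sketch the fee values (an entry fee of 3.00 and 2.50 per hour) and the receipts are hypothetical, and "training" uses ordinary least squares, a closed-form way to find the best-fitting line, rather than the iterative guess-and-nudge loop; both are searches for the same weight and bias.

```python
# A minimal sketch of a weight and a bias, using the parking-garage example.
# Assumptions (hypothetical): receipts generated by a 3.00 entry fee plus
# 2.50 per hour, and a closed-form least-squares fit as the "training" step.

def predict_cost(hours, weight, bias):
    # bias: what you pay before any feature moves (the entry fee)
    # weight: how much each unit of the "hours" feature adds (the hourly rate)
    return bias + weight * hours

def fit_line(data):
    # Ordinary least squares for one feature: slope = cov(x, y) / var(x).
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    weight = cov / var
    bias = mean_y - weight * mean_x
    return weight, bias

# Past receipts: (hours parked, total paid).
receipts = [(1, 5.50), (2, 8.00), (4, 13.00), (6, 18.00)]
weight, bias = fit_line(receipts)
print("learned hourly rate (weight):", weight)
print("learned entry fee (bias):", bias)
print("predicted cost of a 3-hour stay:", predict_cost(3, weight, bias))
```

Given receipts generated by those fees, the fit recovers a weight of 2.5 and a bias of 3.0, up to floating-point rounding.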
Almost every ML system fits one of four families, separated by what the data gives the model to learn from. This section tours each family, then closes with a decision tree for telling them apart.
Supervised learning trains on examples that already carry the correct answer — features and labels together. It is the bakery case: years of conditions paired with what actually sold. The model studies labelled history, then answers for new inputs. Most ML you meet day to day is supervised, and it splits by the shape of the answer.
When the answer is a number on a scale, the task is regression: tomorrow's electricity demand in megawatts, the resale price of a used motorcycle, the minutes a delivery will take. The output is a position on a continuum, and "close" is meaningful — being off by 2 is better than off by 200.
When the answer is a category, the task is classification: is this payment fraud or legitimate, which of five languages is this review written in. With exactly two categories it is binary classification (fraud / not fraud); with more than two it is multiclass (which crop disease is in this leaf photo). The output is a choice from a fixed set, not a point on a scale.
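One way to see the difference between regression and classification is in how each is scored. The sketch below, with hypothetical data, uses mean absolute error for a regression task (delivery minutes), where the size of a miss matters, and accuracy for a binary classification task (fraud or legitimate), where an answer is simply right or wrong. These are two common metrics among many, chosen here for illustration.

```python
# A minimal sketch of how the shape of the answer changes how error is scored.
# Assumptions (hypothetical data): delivery times in minutes for regression,
# fraud / legitimate labels for binary classification.

def mean_absolute_error(predicted, actual):
    # Regression: distance matters, so off by 2 costs less than off by 200.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def accuracy(predicted, actual):
    # Classification: each answer is right or wrong; there is no "close".
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

delivery_actual = [30, 45, 25]
print(mean_absolute_error([32, 45, 25], delivery_actual))   # one small miss
print(mean_absolute_error([230, 45, 25], delivery_actual))  # one large miss

payment_actual = ["fraud", "legitimate", "legitimate", "fraud"]
payment_predicted = ["fraud", "legitimate", "fraud", "fraud"]
print(accuracy(payment_predicted, payment_actual))
```

A 2-minute miss and a 200-minute miss produce very different regression errors; for the payments, three of four answers match, giving an accuracy of 0.75.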
Unsupervised learning is handed data with no answers attached and asked to find structure anyway. The most common form is clustering — grouping records that resemble each other. Point it at a million support tickets with no categories and it surfaces the recurring themes; point it at listener histories and it finds taste segments.
The defining difference from classification: in classification you define the categories in advance and label examples. In clustering the groups emerge from the data, and a human decides afterward what each one means — or whether it means anything.
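Clustering can be sketched with k-means, one widely used algorithm among several (the text does not prescribe one). Assumptions: two-dimensional hypothetical points, `k = 2`, and the first `k` points used as starting centres so the run is deterministic. The output is just groups of points with no names; deciding what each group means is still a human's job.

```python
import math

# A minimal sketch of clustering with k-means.
# Assumptions (hypothetical): unlabelled 2-D points, k = 2 groups, and the
# first k points used as starting centres to keep the example deterministic.

def k_means(points, k, iterations=10):
    centres = list(points[:k])
    groups = []
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of the points assigned to it.
        for i, group in enumerate(groups):
            if group:  # leave a centre in place if no point chose it
                centres[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return centres, groups

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.5), (8.5, 9.0)]
centres, groups = k_means(points, k=2)
for centre, group in zip(centres, groups):
    print("centre", centre, "members", group)  # no names: a human adds those
```

On this data the two groups settle quickly, with centres at (1.0, 1.5) and (8.5, 8.5).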
Reinforcement learning has no labelled answers and no fixed dataset. An agent acts inside an environment, receives a reward or penalty for what it did, and gradually discovers a policy: a strategy for choosing actions that earn the most reward over time. A warehouse robot learning the fastest pick route by trying paths and scoring the time is reinforcement learning; so, in large part, is the system that learned superhuman play in the game of Go, which paired reinforcement learning through self-play with learning from human expert games (Silver et al., 2016).
The signal here is weaker and later than a label — you learn from consequences, often delayed, rather than from a correct answer handed over up front.
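A full reinforcement-learning example needs an environment with states, but the core idea of learning from reward can be sketched with a simplified "multi-armed bandit": an agent repeatedly picks one of three warehouse routes and only sees how long the trip took. Assumptions, all hypothetical: the route names and average times, Gaussian noise on each trip, reward equal to minus the time taken, and epsilon-greedy exploration (try a random route 10% of the time, otherwise take the best so far). Delayed rewards are deliberately left out of this sketch.

```python
import random

# A minimal sketch of learning from reward, simplified to a multi-armed bandit.
# Assumptions (hypothetical): three routes with noisy travel times, reward =
# minus seconds taken, epsilon-greedy exploration. Real reinforcement learning
# also handles states and delayed rewards, which this sketch omits.

ROUTE_MEAN_SECONDS = {"A": 95.0, "B": 80.0, "C": 110.0}  # hidden from the agent

def walk(route, rng):
    return rng.gauss(ROUTE_MEAN_SECONDS[route], 5.0)  # one noisy trial

def learn_policy(episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {r: 0.0 for r in ROUTE_MEAN_SECONDS}  # estimated reward per route
    tries = {r: 0 for r in ROUTE_MEAN_SECONDS}
    for _ in range(episodes):
        if rng.random() < epsilon or min(tries.values()) == 0:
            route = rng.choice(list(ROUTE_MEAN_SECONDS))  # explore
        else:
            route = max(value, key=value.get)             # exploit best so far
        reward = -walk(route, rng)
        tries[route] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        value[route] += (reward - value[route]) / tries[route]
    return max(value, key=value.get), value

best, value = learn_policy()
print("learned best route:", best)
print("estimated rewards:", value)
```

The agent is never told which route is fastest; it settles on route B because that is where its averaged rewards came out highest.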
Generative systems learn the patterns in a large body of data so thoroughly that they can produce new examples in the same vein — drafting a product description, sketching an image from a prompt, proposing code. They map an input of one kind to an output of another: text to text, text to image, text to audio.
Under the hood, a generative model is first trained to imitate its data, then often refined to follow instructions or human preferences. The catch worth internalising early: it is optimised to produce output that is plausible, which is not the same as output that is true.
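The plausible-versus-true gap shows up even in a toy generative model. The sketch below is a bigram model: it records which word follows which in a tiny hypothetical corpus, then generates new sentences by sampling those transitions. Modern generative systems are vastly larger and work differently in detail, so treat this only as an illustration of the property described above, not as how those systems are built.

```python
import random
from collections import defaultdict

# A toy sketch of the generative idea: learn which word tends to follow which
# (a bigram model), then sample new text from those patterns.
# Assumptions (hypothetical): a three-sentence corpus and whitespace tokens.

corpus = [
    "the bakery opens at seven .",
    "the bakery sells bread .",
    "the market opens at nine .",
]

def train_bigrams(sentences):
    follows = defaultdict(list)
    for s in sentences:
        words = s.split()
        for a, b in zip(words, words[1:]):
            follows[a].append(b)  # repeats make frequent pairs more likely
    return follows

def generate(follows, start="the", seed=None, max_words=10):
    rng = random.Random(seed)
    words = [start]
    while words[-1] != "." and len(words) < max_words:
        words.append(rng.choice(follows[words[-1]]))
    return " ".join(words)

def is_producible(follows, sentence):
    # True if every adjacent word pair was seen during training.
    words = sentence.split()
    return all(b in follows[a] for a, b in zip(words, words[1:]))

follows = train_bigrams(corpus)
print(generate(follows, seed=1))
# Every step is a pattern seen in training, yet the whole claim is false:
print(is_producible(follows, "the bakery opens at nine ."))
```

Every step of "the bakery opens at nine ." appeared in training, so the model can produce it fluently, yet in this tiny world the bakery opens at seven.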
A simple decision tree is enough to place any ML problem. Start at the data and ask one question: what does the data give us? A labelled answer points to supervised learning, and from there the shape of the answer (a number or a category) picks regression or classification. No labels points to unsupervised structure-finding; a reward signal points to reinforcement learning; learning a pattern in order to create points to generative AI.
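The decision tree can also be written as a small function. The yes/no parameters are hypothetical simplifications of "what does the data give us?", and checking the goal of creating first is a choice made for this sketch, since a generative system also learns from data. Here `label_is_quantity` means an ordered quantity where distance matters, not merely an answer written as a number.

```python
# A minimal sketch of the decision tree as code.
# Assumptions (hypothetical): four yes/no questions stand in for
# "what does the data give us?"; the "create" goal is checked first.

def kind_of_ml(has_labels, label_is_quantity=False, has_reward_signal=False,
               goal_is_to_create=False):
    if goal_is_to_create:
        return "generative AI"
    if has_labels:
        # The shape of the answer picks the supervised task.
        return "supervised regression" if label_is_quantity else "supervised classification"
    if has_reward_signal:
        return "reinforcement learning"
    return "unsupervised learning (e.g. clustering)"

print(kind_of_ml(has_labels=True, label_is_quantity=True))  # demand in megawatts
print(kind_of_ml(has_labels=True))                           # fraud vs legitimate
print(kind_of_ml(has_labels=False))                          # uncategorised tickets
print(kind_of_ml(has_labels=False, has_reward_signal=True))  # robot route timing
```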
The mistakes below are the ones that trip up almost everyone in their first months. Each is a confusion of categories, not a gap in maths.
Both "sort things into groups," so they blur together. But they are opposites in one decisive way: classification predicts categories you defined and labelled in advance; clustering discovers groups you never specified. Treating a clustering result as if its groups were your intended categories leads to confident, wrong conclusions about what the model "decided."
Ask one question: did I hand the model the right answers during training? If yes, it is classification and the categories are yours. If no, and you let groupings emerge, it is clustering — and naming those groups is your job, done after the fact, with no guarantee they match any category you had in mind.
It is tempting to fix a weak model by pouring in more rows. But volume cannot rescue a fuzzy target or noisy labels; it just reproduces the confusion faster and more confidently. Ten thousand clean examples with trustworthy labels can outperform ten million noisy ones.
Spend the first effort on the question and the labels, not the row count. Define precisely what you are predicting, make sure the label actually captures it, and audit a sample by hand. Good features and clean labels move accuracy more reliably than raw scale (Domingos, 2012).
Fluent, well-structured text reads as authoritative, so it is easy to trust. But a generative model optimises for plausibility, not correctness — it will state a confident, well-formed claim that is simply wrong, with no internal signal that anything is off.
Treat every generation as a draft. Verify factual claims against a real source, keep a citation trail, and put a human gate in front of anything consequential. The model is a fast first-pass writer, not an oracle — design the workflow around that.
The shape of the output number fools people. A model that emits a postal code, a product ID, or a category index outputs a number — but it is predicting a category, not a quantity. Run regression on it and "close" becomes meaningless: postal code 10001 is not "nearly" 10002.
Classify by what the number means, not how it looks. If it is an ordered quantity where distance matters, it is regression. If it is a label that merely happens to be written as a number, it is classification. Ask: would "off by one" be better than "off by a hundred"? If not, it is a category.
It is natural to say a fraud model "knows" fraud or a language model "understands" a sentence. That language quietly imports human comprehension where there is none. The model found statistical relationships between features and labels; it holds no concept of money, crime, or meaning.
Reason about a model as tuned numbers over features — because that framing predicts where it breaks. When the world drifts from the training data, or a category it never saw appears, performance degrades silently. Expecting understanding hides those failures; expecting a fitted function anticipates them.
Four quick prompts. Name the kind of ML before reading the answer — retrieval beats re-reading.
Q. A utility wants to predict tomorrow's electricity demand in megawatts. Which kind of ML?
Supervised regression — the answer is a quantity on a scale, so "close" is meaningful.
Q. A bank must flag each incoming payment as fraudulent or legitimate. Which kind?
Supervised classification, and specifically binary — two categories, the answer is a choice, not a number.
Q. You have a million support tickets, no categories defined, and you want to find the recurring themes. Which kind?
Unsupervised learning — clustering. The groups emerge from the data; you name them afterward.
Q. A robot learns the fastest pick route in a warehouse by trying paths and scoring the time taken. Which kind?
Reinforcement learning — it learns from rewards earned by acting, not from labelled answers handed over in advance.
Ten checkpoints on the path to genuine ML literacy. The order matters — each builds on the one before, and the last is the real test.
The line to hold: ordinary code is logic a human wrote; ML is a relationship a program found in data. If you can state this cleanly, the rest of the field has a place to attach.
Supervised, unsupervised, reinforcement, generative. Inventing your own example for each — not repeating one you read — is the test that the category, not the anecdote, has landed.
A quantity on a scale is regression; a choice from a fixed set is classification. The deciding question is whether "off by a little" is better than "off by a lot."
You define and label them in advance → classification. They emerge from the data and you name them after → clustering. This is the single most common beginner mix-up.
Given a table, identify which columns are inputs (features) and which is the known answer (label). If you cannot find the label, the task is not supervised — that itself is a useful signal.
Training is an automated loop: guess, measure the error against the label, nudge the numbers, repeat. No human hand-tunes the weights. "The model learned" means exactly this loop ran.
Postal codes, product IDs, and class indices are numbers that name categories. Recognising these prevents the classic error of running regression on a classification problem.
Plausible is not true. A literacy checkpoint is the reflex to verify, cite, and keep a human gate — rather than pasting confident output straight through to a decision.
Labelled data goes into training; training produces a model; the model makes predictions on new, unseen inputs. Drawing this without notes means the mechanics, not just the words, are yours.
The real test of understanding is transfer. If you can make a colleague grasp "rules you write" versus "patterns the program finds" in a minute, you own the foundation — and you are ready for Part 2.
A score below 7 out of 10 means the vocabulary is in place but the distinctions are still fuzzy: re-run the "four kinds" tour and the pitfalls, then come back.