Machine Learning, From Zero

On This Page

1. What "machine learning" actually means
2. How a model learns
3. The four kinds of ML system
4. Common pitfalls when learning ML
5. Check yourself
6. Adoption checklist
7. Related
8. References

What "machine learning" actually means

Picture a neighbourhood bakery that wants to know how many loaves to bake tomorrow. The owner could write the rules by hand: start at 200, add 50 on a public holiday, subtract 30 when it rains, bump it up the week schools reopen. This works for a while. Then a festival lands on a rainy Tuesday, two rules collide, and the shelves are wrong all day. Every new situation needs another rule, and the rules start contradicting each other.

Machine learning flips the direction of that effort. Instead of writing the rules, you hand a program three years of daily sales — the date, the weather, the holidays, the foot traffic, and the number that actually sold — and let it work out the relationship for itself. The output of that process is a model: a mathematical relationship, derived from data, that maps an input you have (tomorrow's conditions) to an answer you want (tomorrow's demand).

That is the whole shift. Traditional software is logic a human wrote. Machine learning is a relationship a program found in data. You are no longer programming the answer; you are programming the search for the answer (Mitchell, 1997). The skill stops being "can I express this rule?" and becomes "do I have data where the pattern lives, and do I know what answer I'm asking for?"
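The contrast can be sketched in a few lines of Python. The rule values come from the bakery example; the `history` rows and the `fit_holiday_bump` helper are hypothetical, and a real system would use far more data and a proper training routine.

```python
# Hand-written rules: a human encodes every case (values from the bakery example).
def loaves_by_rule(is_holiday: bool, is_raining: bool) -> int:
    loaves = 200
    if is_holiday:
        loaves += 50
    if is_raining:
        loaves -= 30
    return loaves

# Learned relationship: the numbers come out of past data instead.
# Hypothetical history rows: (is_holiday, loaves_sold).
history = [(False, 190), (False, 210), (True, 260), (True, 240)]

def fit_holiday_bump(rows):
    """Estimate the baseline and the holiday bump as averages of past sales."""
    normal = [sold for holiday, sold in rows if not holiday]
    festive = [sold for holiday, sold in rows if holiday]
    baseline = sum(normal) / len(normal)
    bump = sum(festive) / len(festive) - baseline
    return baseline, bump

baseline, bump = fit_holiday_bump(history)
print("rule says:", loaves_by_rule(True, False))
print("data says:", baseline + bump)
```

In the first function a person chose 200 and 50. In the second, nobody did: change the history and the numbers change with it.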

This matters because it changes where bugs come from. A rule-based system fails when a human forgets a case. A learned system fails when the data is biased, the question is fuzzy, or the world shifts away from what the model saw in training. Different failure modes need different instincts — which is what the rest of this primer builds.

How a model learns

Six terms carry most of the weight in any first ML conversation. Learn these as a connected set, not a glossary: each one only makes sense next to the others.

1. Model

A model is the learned relationship itself — the thing that turns an input into an answer. It is not code a person wrote line by line.

Concretely, a model is a structure plus a set of numbers: feed in tomorrow's conditions, the numbers combine them, an answer comes out. You can save it, copy it, and run it a million times. Everything else in this section is about where those numbers come from and what they act on.
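As a minimal sketch of "structure plus numbers", assume the structure is a weighted sum. The `LinearModel` class and its numbers are hypothetical, chosen only to show that a model can be saved, copied, and run.

```python
import json

class LinearModel:
    """Structure: a weighted sum of the inputs plus a baseline."""

    def __init__(self, weights, bias):
        self.weights = weights   # one number per input
        self.bias = bias         # the baseline answer

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def to_json(self):
        # Saving a model means saving its numbers.
        return json.dumps({"weights": self.weights, "bias": self.bias})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["weights"], data["bias"])

# Hypothetical numbers; in practice training would produce them.
model = LinearModel(weights=[50.0, -30.0], bias=200.0)
copy = LinearModel.from_json(model.to_json())   # save, then load a copy

tomorrow = [1.0, 1.0]   # [is_holiday, is_raining]
print(copy.predict(tomorrow))
```

The copy behaves exactly like the original because the numbers are all there is to carry over.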

2. Feature

A feature is one input the model reads — one column of your data. For the bakery: the day of the week, the temperature, "is it a holiday," last week's sales. Each is a feature.

Choosing good features is most of the real work. A model can only find a pattern in what you give it: if footfall drives sales but you never recorded footfall, no amount of cleverness recovers it. Features are the model's entire view of the world.
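One way to picture features is as the list of numbers built from a single day's record. The sketch below assumes a hypothetical record layout and a hypothetical `to_feature_vector` helper; the field names are illustrative only.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def to_feature_vector(record):
    """Turn one day's record into the list of numbers a model reads."""
    # Day of week becomes seven 0/1 flags, one per day.
    day_flags = [1.0 if record["day"] == d else 0.0 for d in DAYS]
    return day_flags + [
        float(record["temperature_c"]),
        1.0 if record["is_holiday"] else 0.0,
        float(record["last_week_sales"]),
    ]

record = {"day": "Tue", "temperature_c": 14, "is_holiday": False, "last_week_sales": 205}
vector = to_feature_vector(record)
print(len(vector), vector)
```

Footfall does not appear in the record, so it cannot appear in the vector. Whatever is missing here is invisible to the model.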

3. Label

A label is the known, correct answer attached to a past example — the number that actually sold on a given day. Features describe the situation; the label is the outcome you wish you could predict.

Data where every row has its label is the raw material a model learns from. The quality of those labels sets the ceiling on everything: a model trained on sloppy answers learns to reproduce sloppy answers, confidently.
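In code, separating features from labels is often the first step. The rows and the `split_features_and_labels` helper below are hypothetical, a sketch of the idea and not a standard API.

```python
# Hypothetical bakery rows; the last one has no recorded answer.
rows = [
    {"temperature_c": 12, "is_holiday": 0, "loaves_sold": 198},
    {"temperature_c": 18, "is_holiday": 1, "loaves_sold": 251},
    {"temperature_c": 9, "is_holiday": 0, "loaves_sold": None},
]
LABEL = "loaves_sold"

def split_features_and_labels(rows, label_key):
    """Keep only rows with a known label; return (features, labels)."""
    features, labels = [], []
    for row in rows:
        if row[label_key] is None:
            continue  # an unlabelled row cannot teach a supervised model
        features.append([value for key, value in row.items() if key != label_key])
        labels.append(row[label_key])
    return features, labels

X, y = split_features_and_labels(rows, LABEL)
print(X, y)
```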

4. Training

Training is the loop that turns data into a model. The program makes a guess, compares it to the label, measures how wrong it was, and nudges its internal numbers to be a little less wrong — then repeats, across every example, often millions of times.

Nobody hand-tunes those numbers. Training is automated search for the settings that best match inputs to known answers. When people say a model "learned," this loop is what they mean — nothing more mystical than that.
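The loop can be sketched with one feature and two internal numbers. This is a minimal illustration using gradient descent on squared error, averaging the nudge over all examples on each pass; the data, the learning rate, and the number of passes are assumptions chosen so the example converges.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [8.0, 11.0, 14.0, 17.0]   # labels, generated from y = 3x + 5

def train(xs, ys, learning_rate=0.05, epochs=2000):
    weight, bias = 0.0, 0.0    # start from an uninformed guess
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            guess = weight * x + bias     # make a guess
            error = guess - y             # compare it to the label
            grad_w += 2 * error * x / n   # how the squared error changes with weight
            grad_b += 2 * error / n       # how the squared error changes with bias
        weight -= learning_rate * grad_w  # nudge the numbers to be less wrong
        bias -= learning_rate * grad_b
    return weight, bias

weight, bias = train(xs, ys)
print(round(weight, 3), round(bias, 3))
```

Nothing in `train` knows that the answer is 3 and 5. It only knows how to be slightly less wrong than last time.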

5. Prediction

A prediction is the model's output on data it has never seen. Training happens up front on past data, and is repeated only when the model is refreshed; prediction is what you actually use the model for, day after day, on new inputs.

The honest test of a model is never how well it fits the data it trained on — it is how well it predicts cases it was never shown. A model that aces yesterday and fumbles tomorrow has memorised, not learned.
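The difference between memorising and learning shows up as soon as some data is held back. The sketch below is hypothetical throughout: a lookup-table "memoriser" and a least-squares line are both fitted on three rows, then scored on two rows neither has seen.

```python
train_rows = [(1, 8), (2, 11), (3, 14)]   # (input, label) pairs seen in training
test_rows = [(4, 17), (5, 20)]            # held back, never shown during training

def fit_memoriser(rows):
    """Store every training answer; fall back to the average for anything new."""
    table = dict(rows)
    fallback = sum(table.values()) / len(table)
    return lambda x: table.get(x, fallback)

def fit_line(rows):
    """Least-squares straight line through the training rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_abs_error(predict, rows):
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)

for name, fit in [("memoriser", fit_memoriser), ("line", fit_line)]:
    model = fit(train_rows)
    print(name, mean_abs_error(model, train_rows), mean_abs_error(model, test_rows))
```

Both score perfectly on the training rows. Only the held-back rows tell them apart.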

6. Weights and bias

These are the internal numbers training tunes. A weight scales how much one feature matters; a bias shifts the baseline answer up or down regardless of the inputs.

A tiny example makes it concrete: total cost at a parking garage is a fixed entry fee plus a per-hour charge. The per-hour charge is a weight on the "hours" feature; the entry fee is the bias — what you pay before any feature moves. Training is, in essence, the search for the right weights and bias (Google ML Glossary).
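The parking example translates directly into code. The fee values and the two receipts below are hypothetical; with only one feature and noise-free receipts, two observations are enough to recover the weight and the bias.

```python
def parking_cost(hours, per_hour, entry_fee):
    # per_hour is the weight on the "hours" feature; entry_fee is the bias.
    return entry_fee + per_hour * hours

# Two hypothetical receipts: (hours parked, total paid).
receipts = [(1.0, 6.5), (3.0, 11.5)]

def recover_weight_and_bias(receipts):
    (h1, c1), (h2, c2) = receipts
    weight = (c2 - c1) / (h2 - h1)   # extra cost per extra hour
    bias = c1 - weight * h1          # what is left when hours is zero
    return weight, bias

per_hour, entry_fee = recover_weight_and_bias(receipts)
print(per_hour, entry_fee, parking_cost(5.0, per_hour, entry_fee))
```

Real data is noisy and has many features, which is why training searches for these numbers instead of solving for them from two points.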

The four kinds of ML system

Almost every ML system fits one of four families, separated by what the data gives the model to learn from. The diagram below is the decision tree; this section is the tour.

Supervised learning

Supervised learning trains on examples that already carry the correct answer — features and labels together. It is the bakery case: years of conditions paired with what actually sold. The model studies labelled history, then answers for new inputs. Most ML you meet day to day is supervised, and it splits by the shape of the answer.

Regression — predict a quantity

When the answer is a number on a scale, the task is regression: tomorrow's electricity demand in megawatts, the resale price of a used motorcycle, the minutes a delivery will take. The output is a position on a continuum, and "close" is meaningful — being off by 2 is better than off by 200.
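Because "close" is meaningful, regression is usually scored by how far off the predictions are. A minimal sketch using mean absolute error, with hypothetical demand figures in megawatts:

```python
def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the answer."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual_mw = [1000.0, 1200.0, 900.0]
close_guess = [1002.0, 1198.0, 902.0]   # off by 2 each time
far_guess = [1200.0, 1000.0, 1100.0]    # off by 200 each time

print(mean_absolute_error(actual_mw, close_guess))
print(mean_absolute_error(actual_mw, far_guess))
```

Mean absolute error is one common choice among several; it is used here only because it reads directly as "how many megawatts off, on average".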

Classification — predict a category

When the answer is a category, the task is classification: is this payment fraud or legitimate, which of five languages is this review written in. With exactly two categories it is binary classification (fraud / not fraud); with more than two it is multiclass (which crop disease is in this leaf photo). The output is a choice from a fixed set, not a point on a scale.
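A classifier's last step is a choice from a fixed set. The sketch below assumes a model has already produced scores; the score values, the 0.5 threshold, and the helper names are hypothetical.

```python
def classify_binary(fraud_score, threshold=0.5):
    """Two categories: pick one side of a threshold."""
    return "fraud" if fraud_score >= threshold else "legitimate"

def classify_multiclass(scores):
    """More than two categories: pick the one with the highest score."""
    return max(scores, key=scores.get)

def accuracy(actual, predicted):
    """Fraction of answers that are exactly right; there is no partial credit."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

scores = [0.92, 0.10, 0.55, 0.30]
actual = ["fraud", "legitimate", "legitimate", "legitimate"]
predicted = [classify_binary(s) for s in scores]
language = classify_multiclass({"en": 0.1, "fr": 0.7, "de": 0.2})

print(predicted, accuracy(actual, predicted), language)
```

Unlike regression, a wrong answer here is just wrong: "legitimate" is not a near miss for "fraud".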

Unsupervised learning

Unsupervised learning is handed data with no answers attached and asked to find structure anyway. The most common form is clustering — grouping records that resemble each other. Point it at a million support tickets with no categories and it surfaces the recurring themes; point it at listener histories and it finds taste segments.

The defining difference from classification: in classification you define the categories in advance and label examples. In clustering the groups emerge from the data, and a human decides afterward what each one means — or whether it means anything.
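Clustering can be sketched with a bare-bones k-means on one-dimensional data. Everything here is an assumption for illustration: the listening figures, the choice of k-means, and the deterministic starting centroids (this version needs `k` of at least 2).

```python
def kmeans_1d(points, k=2, iterations=10):
    """Group numbers around k centres; no labels are involved anywhere."""
    lo, hi = min(points), max(points)
    # Simple deterministic start: spread centroids evenly from min to max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical hours listened per week for six users.
hours = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(hours)
print(centroids, clusters)
```

The algorithm returns two groups but no names. Calling them "casual" and "heavy" listeners is a human decision made afterward.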

Reinforcement learning

Reinforcement learning has no labelled answers and no fixed dataset. An agent acts inside an environment, receives a reward or penalty for what it did, and gradually discovers a policy — a strategy for choosing actions that earn the most reward over time. A warehouse robot learning the fastest pick route by trying paths and scoring the time is reinforcement learning; so is the system that learned superhuman play in the game of Go (Silver et al., 2016).

The signal here is weaker and later than a label — you learn from consequences, often delayed, rather than from a correct answer handed over up front.
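The warehouse example can be reduced to a toy: three routes, a noisy timer, and an agent that mostly repeats what has worked while occasionally trying something else. This is a minimal epsilon-greedy sketch; the route times, the noise range, and the parameter values are hypothetical.

```python
import random

# Hypothetical average seconds per route; the agent never reads this directly.
ROUTE_SECONDS = {"A": 50.0, "B": 42.0, "C": 65.0}

def walk(route, rng):
    """Environment: one attempt returns a reward (negative time taken)."""
    return -(ROUTE_SECONDS[route] + rng.uniform(-2.0, 2.0))

def learn_policy(episodes=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    routes = list(ROUTE_SECONDS)
    totals = {r: 0.0 for r in routes}
    counts = {r: 0 for r in routes}

    def estimate(route):
        # Untried routes look infinitely good, so each gets tried at least once.
        return totals[route] / counts[route] if counts[route] else float("inf")

    for _ in range(episodes):
        if rng.random() < epsilon:
            route = rng.choice(routes)           # explore: try something at random
        else:
            route = max(routes, key=estimate)    # exploit: use the best so far
        reward = walk(route, rng)
        totals[route] += reward
        counts[route] += 1

    best = max(routes, key=estimate)
    return best, {r: estimate(r) for r in routes}

best_route, estimates = learn_policy()
print(best_route, estimates)
```

No row of data ever says "B is correct". The agent works that out from the rewards its own actions earned.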

Generative AI

Generative systems learn the patterns in a large body of data so thoroughly that they can produce new examples in the same vein: drafting a product description, sketching an image from a prompt, proposing code. They map an input to an output of the same or a different kind: text to text, text to image, text to audio.

Under the hood, a generative model is first trained to imitate its data, then often refined to follow instructions or human preferences. The catch worth internalising early: it is optimised to produce output that is plausible, which is not the same as output that is true.
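A toy makes the plausible-versus-true gap visible. The sketch below is a bigram text generator, far simpler than any real generative model and used here only as an illustration; the corpus and function names are hypothetical.

```python
import random
from collections import defaultdict

corpus = "the bakery sells bread . the market sells fish . the bakery opens early ."

def train_bigrams(text):
    """Record which word has followed which in the training text."""
    follows = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)
    return follows

def generate(follows, start, max_words, rng):
    """Repeatedly pick a word that has been seen after the previous one."""
    words = [start]
    while len(words) < max_words and words[-1] in follows:
        words.append(rng.choice(follows[words[-1]]))
    return words

follows = train_bigrams(corpus)
sentence = generate(follows, "the", 8, random.Random(1))
print(" ".join(sentence))
```

Every word-to-word step in the output was seen in training, so the result always reads smoothly. Nothing stops it from producing "the bakery sells fish", which the corpus never said.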

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'16px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':10,'nodeSpacing':22,'rankSpacing':55,'useMaxWidth':true}}}%%
flowchart LR
  DATA([Raw data])
  Q{What does the data give us?}
  subgraph SUP [Supervised · labelled examples]
    direction LR
    SQ{Number or category?}
    REG[Regression predict a quantity]
    CLS[Classification predict a category binary · multiclass]
    SQ -->|a number| REG
    SQ -->|a category| CLS
  end
  subgraph UNS [Unsupervised · no labels]
    CLU[Clustering group similar records]
  end
  subgraph RL [Reinforcement · reward signal]
    POL[Policy actions that earn the most reward]
  end
  subgraph GEN [Generative · learn then create]
    CRT[New content text · image · audio]
  end
  DATA --> Q
  Q -->|labelled| SQ
  Q -->|no labels| CLU
  Q -->|reward| POL
  Q -->|learn a pattern| CRT
  %% ─── Zone + node styling (DESIGN.md §1.3) ───
  style DATA fill:#E5E7EB,stroke:#4B5563,stroke-width:2px,color:#111827
  style Q fill:#E5E7EB,stroke:#4B5563,stroke-width:2px,color:#111827
  style SUP fill:#EFF6FF,stroke:#93C5FD,stroke-width:1.2px,color:#1e3a5f
  style SQ fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1e3a5f
  style REG fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1e3a5f
  style CLS fill:#DBEAFE,stroke:#2563EB,stroke-width:1.5px,color:#1e3a5f
  style UNS fill:#F0FDF4,stroke:#86EFAC,stroke-width:1.2px,color:#14532D
  style CLU fill:#DCFCE7,stroke:#16A34A,stroke-width:1.5px,color:#14532D
  style RL fill:#FEFCE8,stroke:#FDE68A,stroke-width:1.2px,color:#713f12
  style POL fill:#FEF9C3,stroke:#CA8A04,stroke-width:1.5px,color:#713f12
  style GEN fill:#FAF5FF,stroke:#E9D5FF,stroke-width:1.2px,color:#4C1D95
  style CRT fill:#F3E8FF,stroke:#7C3AED,stroke-width:1.5px,color:#4C1D95

The map above is a decision tree for placing any ML problem. Start at the data and ask one question: what does the data give us? A labelled answer points to supervised learning, and from there the shape of the answer (a number or a category) picks regression or classification. No labels points to unsupervised structure-finding; a reward signal points to reinforcement learning; learning a pattern in order to create points to generative AI. Each branch is a colour-coded zone.

Common pitfalls when learning ML

The mistakes below are the ones that trip up almost everyone in their first months. Each is a confusion of categories, not a gap in maths.

Confusing classification with clustering

Both "sort things into groups," so they blur together. But they are opposites in one decisive way: classification predicts categories you defined and labelled in advance; clustering discovers groups you never specified. Treating a clustering result as if its groups were your intended categories leads to confident, wrong conclusions about what the model "decided."

What to do instead

Ask one question: did I hand the model the right answers during training? If yes, it is classification and the categories are yours. If no, and you let groupings emerge, it is clustering — and naming those groups is your job, done after the fact, with no guarantee they match any category you had in mind.

Assuming more data always beats a sharper question

It is tempting to fix a weak model by pouring in more rows. But volume cannot rescue a fuzzy target or noisy labels; it just reproduces the confusion faster and more confidently. Ten thousand clean examples with trustworthy labels can beat a noisy ten million.

What to do instead

Spend the first effort on the question and the labels, not the row count. Define precisely what you are predicting, make sure the label actually captures it, and audit a sample by hand. Good features and clean labels move accuracy more reliably than raw scale (Domingos, 2012).
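Auditing a sample by hand needs very little tooling. The sketch below assumes a hypothetical workflow: draw a reproducible random sample, have a reviewer re-label it, then measure how often the recorded label and the reviewer disagree.

```python
import random

def sample_for_audit(rows, sample_size, seed=0):
    """Pick a reproducible random sample of rows for a human to re-check."""
    rng = random.Random(seed)
    return rng.sample(rows, min(sample_size, len(rows)))

def label_error_rate(audited):
    """audited: list of (recorded_label, reviewer_label) pairs."""
    disagreements = sum(1 for recorded, reviewed in audited if recorded != reviewed)
    return disagreements / len(audited)

# Hypothetical dataset of 1,000 row ids, and four hypothetical audit results.
row_ids = list(range(1000))
to_review = sample_for_audit(row_ids, 50)
audit_results = [("fraud", "fraud"), ("legit", "fraud"), ("legit", "legit"), ("fraud", "fraud")]

print(len(to_review), label_error_rate(audit_results))
```

If a quarter of a hand-checked sample disagrees with the recorded labels, adding more rows labelled the same way is unlikely to help.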

Treating generative output as ground truth

Fluent, well-structured text reads as authoritative, so it is easy to trust. But a generative model optimises for plausibility, not correctness — it will state a confident, well-formed claim that is simply wrong, with no internal signal that anything is off.

What to do instead

Treat every generation as a draft. Verify factual claims against a real source, keep a citation trail, and put a human gate in front of anything consequential. The model is a fast first-pass writer, not an oracle — design the workflow around that.

Calling every numeric output "regression"

The shape of the output number fools people. A model that emits a postal code, a product ID, or a category index outputs a number — but it is predicting a category, not a quantity. Run regression on it and "close" becomes meaningless: postal code 10001 is not "nearly" 10002.

What to do instead

Classify by what the number means, not how it looks. If it is an ordered quantity where distance matters, it is regression. If it is a label that merely happens to be written as a number, it is classification. Ask: would "off by one" be better than "off by a hundred"? If not, it is a category.
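The postal-code case can be shown in a few lines. Treated as a number, two codes look one unit apart; treated as a category, for instance with a one-hot encoding, every pair of different codes is equally different. The code list and helper names are hypothetical.

```python
POSTAL_CODES = ["10001", "10002", "94105"]

def one_hot(code, categories=POSTAL_CODES):
    """Encode a category as a row of 0s with a single 1 at its position."""
    return [1 if code == c else 0 for c in categories]

def differing_positions(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

# As a number, 10001 and 10002 look almost identical.
numeric_gap = abs(int("10001") - int("10002"))

# As categories, any two different codes differ by the same amount.
near = differing_positions(one_hot("10001"), one_hot("10002"))
far = differing_positions(one_hot("10001"), one_hot("94105"))

print(numeric_gap, near, far)
```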

Believing the model "understands"

It is natural to say a fraud model "knows" fraud or a language model "understands" a sentence. That language quietly imports human comprehension where there is none. The model found statistical relationships between features and labels; it holds no concept of money, crime, or meaning.

What to do instead

Reason about a model as tuned numbers over features — because that framing predicts where it breaks. When the world drifts from the training data, or a category it never saw appears, performance degrades silently. Expecting understanding hides those failures; expecting a fitted function anticipates them.
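Because the degradation is silent, it helps to watch the inputs as well as the outputs. One simple check, sketched below under stated assumptions, measures how far a feature's recent average has moved from its training average, in units of the training standard deviation. The data and the alert threshold of 3 are hypothetical.

```python
from statistics import mean, stdev

def drift_score(training_values, live_values):
    """Shift in the mean, measured in training standard deviations."""
    return abs(mean(live_values) - mean(training_values)) / stdev(training_values)

# Hypothetical daily temperatures seen in training versus this week.
training_temps = [10, 12, 11, 13, 9, 11]
live_temps = [17, 18, 16, 19]

score = drift_score(training_temps, live_temps)
ALERT_THRESHOLD = 3.0   # assumed cut-off; tune it for the feature in question

if score > ALERT_THRESHOLD:
    print(f"drift score {score:.1f}: inputs no longer resemble the training data")
```

A check like this says nothing about whether predictions are wrong. It only flags that the model is now being asked about conditions it was not fitted on.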

Check yourself

Four quick prompts. Name the kind of ML before reading the answer — retrieval beats re-reading.

Q. A utility wants to predict tomorrow's electricity demand in megawatts. Which kind of ML?

Supervised regression — the answer is a quantity on a scale, so "close" is meaningful.

Q. A bank must flag each incoming payment as fraudulent or legitimate. Which kind?

Supervised classification, and specifically binary — two categories, the answer is a choice, not a number.

Q. You have a million support tickets, no categories defined, and you want to find the recurring themes. Which kind?

Unsupervised learning — clustering. The groups emerge from the data; you name them afterward.

Q. A robot learns the fastest pick route in a warehouse by trying paths and scoring the time taken. Which kind?

Reinforcement learning — it learns from rewards earned by acting, not from labelled answers handed over in advance.

Adoption checklist

Ten checkpoints on the path to genuine ML literacy. The order matters — each builds on the one before, and the last is the real test.

1 I can say in one sentence what makes ML different from ordinary code. ☐

The line to hold: ordinary code is logic a human wrote; ML is a relationship a program found in data. If you can state this cleanly, the rest of the field has a place to attach.

2 I can name the four system types and give a fresh example of each. ☐

Supervised, unsupervised, reinforcement, generative. Inventing your own example for each — not repeating one you read — is the test that the category, not the anecdote, has landed.

3 I can tell regression from classification by the shape of the answer. ☐

A quantity on a scale is regression; a choice from a fixed set is classification. The deciding question is whether "off by a little" is better than "off by a lot."

4 I can tell classification from clustering by who defines the categories. ☐

You define and label them in advance → classification. They emerge from the data and you name them after → clustering. This is the single most common beginner mix-up.

5 I can point to the features and the label in a small dataset. ☐

Given a table, identify which columns are inputs (features) and which is the known answer (label). If you cannot find the label, the task is not supervised — that itself is a useful signal.

6 I understand training as tuning weights and bias to fit labels. ☐

Training is an automated loop: guess, measure the error against the label, nudge the numbers, repeat. No human hand-tunes the weights. "The model learned" means exactly this loop ran.

7 I can spot when a numeric output is really a category. ☐

Postal codes, product IDs, and class indices are numbers that name categories. Recognising these prevents the classic error of running regression on a classification problem.

8 I treat generative output as a draft to verify, not an oracle. ☐

Plausible is not true. A literacy checkpoint is the reflex to verify, cite, and keep a human gate — rather than pasting confident output straight through to a decision.

9 I can sketch the data → train → model → predict loop from memory. ☐

Labelled data goes into training; training produces a model; the model makes predictions on new, unseen inputs. Drawing this without notes means the mechanics, not just the words, are yours.

10 I can teach the bakery contrast to a non-engineer in sixty seconds. ☐

The real test of understanding is transfer. If you can make a colleague grasp "rules you write" versus "patterns the program finds" in a minute, you own the foundation — and you are ready for Part 2.

A score below 7 means the vocabulary is in place but the distinctions are still fuzzy — re-run the "four kinds" tour and the pitfalls, then come back.

References

Google for Developers — Machine Learning Crash Course: "What Is Machine Learning?" — developers.google.com — system types, the supervised/unsupervised/reinforcement/generative split, and regression vs classification.
Google for Developers — Machine Learning Glossary — developers.google.com — definitions of model, feature, label, training, weights, and bias.
Pedro Domingos — "A Few Useful Things to Know about Machine Learning," Communications of the ACM, 2012 — homes.cs.washington.edu — why data quality and good features outweigh raw volume.
Tom M. Mitchell — Machine Learning, McGraw-Hill, 1997 — cs.cmu.edu — the foundational "learning from experience" definition.
Silver et al. — "Mastering the game of Go with deep neural networks and tree search," Nature, 2016 — nature.com — reinforcement learning at scale.
NIST — AI Risk Management Framework (AI RMF 1.0) — nist.gov — governing the failure modes that distinguish learned systems from rule-based ones.
