
Human and
Machine Learning

Week 1 — Introduction and Basic Bayes

Spring 2026 / Apr 17 / Prof. Joseph Austerweil
Block 1 · 5 minutes

Welcome

Human and Machine Learning · Week 1

Today

  • Who you are, what you're hoping to get out of the course
  • Why this course exists — inductive inference under uncertainty
  • Basic probability and Bayes' rule
  • Two worked examples: sick friend, Kahneman-Tversky cab problem
  • What is GenJAX, and how will we use it?
  • Logistics and homework
Block 2 · 25 minutes

Why This Course Exists

Inductive inference


Deduction vs. induction

Deduction

$1 + 4 = \,?$

Preserves truth. One answer.

Induction

$? + ? = 5$

Underdetermined. Many plausible answers.


The problems that people excel at — where we outperform machines — are inductive.

They feel easy. They are not.


Three inductive problems

(... that the mind solves constantly)


1. Perception

Which square is darker — A or B?


The visual system solves
square color + shadow = intensity
for square color, using priors about how retinal images are generated.

You are literally looking at an inductive inference right now.


2. Mental states

Data: other people's behavior (motion, speech, gaze).
Hypotheses: their goals and beliefs.

Heider & Simmel (1944) — animated geometric shapes.

Everyone spontaneously narrates them as agents with intentions, grudges, fears.

Those intentions were never in the video.

(Watch ~90s: https://www.youtube.com/watch?v=VTNmLt7QX8E)


3. Word learning

You're in the Australian outback. Someone points at a hopping animal and says "jumbuck."

What does jumbuck mean?

  • the animal itself?
  • undetached kangaroo-parts?
  • kangaroo temporal-stages?
  • any mammal?
  • dinner?

All consistent with the data. Children pick one — fast.
They bring strong prior expectations about what words mean.


Check-in

What's the common structure across these three problems?


  • Hidden variable we care about: $h$
  • Noisy / sparse data we observe: $d$
  • Prior knowledge we bring to bear

$P(h \mid d) \propto P(d \mid h) \cdot P(h)$

(We'll unpack every symbol in the next hour.)

Block 3 · 25 minutes

Basic Bayes

Foundations, compactly but rigorously


Outcomes, events, event space

Flip two coins.

  • Outcome set: $S = \{HH, HT, TH, TT\}$, so $|S| = 4$
  • Event: any subset of $S$.
    • e.g., "at least one head" $= \{HH, HT, TH\}$
  • Event space: the set of events (the powerset of $S$, when $S$ is finite).

For continuous $S$, the math gets trickier; we'll come back to it next week.


Random variables

A random variable is a function from the outcome space to some value space.

$Y: S \to \{T, F\}, \quad Y(s) = \text{"does } s \text{ have at least one H?"}$

$Y(HH) = T, \quad Y(HT) = T, \quad Y(TH) = T, \quad Y(TT) = F.$


The word random is about the input, not the mapping. The function itself is deterministic.


Probability

$P(Y = T) = \frac{|\{s \in S : Y(s) = T\}|}{|S|} = \frac{3}{4}$

Count the outcomes where $Y$ takes the value you care about. Divide by the total number of outcomes. (This works because all four outcomes are equally likely.)
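The counting definition can be checked by brute force. The sketch below is plain standard-library Python, not GenJAX; the helper name `prob` is hypothetical, and it assumes all outcomes are equally likely, as in the two-coin example.

```python
from fractions import Fraction
from itertools import product

# Outcome set S for two coin flips: HH, HT, TH, TT.
S = ["".join(flips) for flips in product("HT", repeat=2)]

def Y(s):
    """Random variable: does outcome s contain at least one H?"""
    return "H" in s

def prob(event):
    """P(event) by counting, assuming equally likely outcomes.

    `event` is a predicate on outcomes (hypothetical helper, for illustration).
    """
    favorable = [s for s in S if event(s)]
    return Fraction(len(favorable), len(S))

p_y_true = prob(Y)
print(S)         # ['HH', 'HT', 'TH', 'TT']
print(p_y_true)  # 3/4
```

Note that `Y` itself contains no randomness: it is an ordinary deterministic function, and the probability comes entirely from counting over `S`.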


Joint probability

$P(\text{first} = H,\, \text{second} = T) = P(\{HT\}) = \tfrac{1}{4}$

Joint = "both of these at once."


Conditional probability — set restriction

Question.
What is $P(\text{first} = H \mid \text{at least one } H)$?

Many people's first answer: 3/4.

That's wrong.

Conditioning = restricting the universe to outcomes where the condition holds.

New universe: $\{HH, HT, TH\}$ (3 outcomes).

Of those, 2 have first $= H$.

$P(\text{first} = H \mid \geq 1\,H) = \tfrac{2}{3}$


The ratio definition

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Same answer:

$P(\text{first}=H \mid \geq 1\,H) = \frac{P(\{HH, HT\})}{P(\{HH, HT, TH\})} = \frac{2/4}{3/4} = \frac{2}{3} \checkmark$
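A short sketch can confirm that the set-restriction view and the ratio definition agree on the two-coin example. It is plain Python with hypothetical helper names (`prob`, `first_is_h`, `at_least_one_h`) and assumes equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

S = ["".join(flips) for flips in product("HT", repeat=2)]

def prob(event, universe):
    """P(event) by counting within `universe` (equally likely outcomes)."""
    return Fraction(sum(1 for s in universe if event(s)), len(universe))

def first_is_h(s):
    return s[0] == "H"

def at_least_one_h(s):
    return "H" in s

# Route 1: restrict the universe to outcomes where the condition holds.
restricted = [s for s in S if at_least_one_h(s)]
by_restriction = prob(first_is_h, restricted)

# Route 2: ratio definition, P(A and B) / P(B), over the full universe.
p_joint = prob(lambda s: first_is_h(s) and at_least_one_h(s), S)
p_condition = prob(at_least_one_h, S)
by_ratio = p_joint / p_condition

print(by_restriction, by_ratio)  # 2/3 2/3
```

The event "first is H and at least one H" contains two outcomes, HH and HT, which is where the 2/4 in the numerator comes from.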


Marginalization

$P(X) = \sum_c P(X, C = c)$

Sum the joint over the values of whatever you want to get rid of.
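As a small illustration, take the two-coin joint distribution and marginalize out the second flip. This is a plain-Python sketch: the first flip plays the role of $X$, the second plays the role of $C$, and the function name `marginal_first` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over (first, second) for two fair coin flips.
joint = {(a, b): Fraction(1, 4) for a, b in product("HT", repeat=2)}

def marginal_first(x):
    """P(first = x): sum the joint over every value of the second flip."""
    return sum(p for (a, _), p in joint.items() if a == x)

print(marginal_first("H"))  # 1/2
print(marginal_first("T"))  # 1/2
```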


Bayes' rule — one line

From the product rule, both directions:

$P(A \mid B)\,P(B) = P(A, B) = P(B \mid A)\,P(A)$

Divide:

$\boxed{\,P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\,}$

That's it. The rest of the course is understanding what this means.
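One way to build trust in the rule is to check it numerically on the two-coin example, with $A$ = "first flip is H" and $B$ = "at least one H". The sketch below is plain Python; the helper `prob` and its `given` argument are hypothetical names, and outcomes are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

S = ["".join(flips) for flips in product("HT", repeat=2)]

def prob(event, given=lambda s: True):
    """P(event | given) by counting within the outcomes where `given` holds."""
    universe = [s for s in S if given(s)]
    return Fraction(sum(1 for s in universe if event(s)), len(universe))

def A(s):
    return s[0] == "H"   # first flip is H

def B(s):
    return "H" in s      # at least one H

lhs = prob(A, given=B)                       # P(A | B), computed directly
rhs = prob(B, given=A) * prob(A) / prob(B)   # Bayes' rule: 1 * (1/2) / (3/4)

print(lhs, rhs)  # 2/3 2/3
```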


Check-in

For the problem I'm about to give you — the cab problem

  • What's $d$ (the data, the observation)?
  • What's $h$ (the hidden variable)?
Block 4 · 35 minutes

Bayes in Action

Sick friend · Cab problem · Synthesis


Sick friend

Your 50-year-old friend has been a chain smoker since 18.
There's a nasty cold going around.
You go over to their house. You hear them cough.

Three possibilities:

  • Cold     ·     Stomach virus     ·     Lung cancer

How likely is each?


First pass — qualitatively

| | Prior | $P(\text{cough} \mid h)$ | Prior × Likelihood |
|---|---|---|---|
| Cold | high | high | high |
| Stomach virus | medium | low | low |
| Lung cancer | medium | high | medium |

Before any math: cold is most likely.


With numbers

| | $P(h)$ | $P(\text{cough} \mid h)$ | Numerator | Posterior |
|---|---|---|---|---|
| Cold | 0.45 | 0.9 | 0.405 | 0.750 |
| Stomach virus | 0.10 | 0.45 | 0.045 | 0.083 |
| Lung cancer | 0.10 | 0.9 | 0.090 | 0.167 |

Sum of numerators: $0.405 + 0.045 + 0.090 = 0.540$.
Divide each numerator by the sum to normalize.
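The table's arithmetic can be reproduced in a few lines. This is a plain-Python sketch using the numbers from the slide; the function name `posterior` is hypothetical. The priors need not sum to 1 here, because the final division renormalizes over the three hypotheses under consideration.

```python
# Priors and likelihoods copied from the sick-friend table.
priors = {"cold": 0.45, "stomach virus": 0.10, "lung cancer": 0.10}
likelihoods = {"cold": 0.9, "stomach virus": 0.45, "lung cancer": 0.9}

def posterior(priors, likelihoods):
    """Multiply prior by likelihood for each hypothesis, then normalize."""
    numerators = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(numerators.values())  # the sum of numerators
    return {h: n / evidence for h, n in numerators.items()}

post = posterior(priors, likelihoods)
for h, p in post.items():
    print(f"{h}: {p:.3f}")
# cold: 0.750
# stomach virus: 0.083
# lung cancer: 0.167
```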


The pedagogical punchline

$P(\text{cold} \mid \text{cough}) = \frac{0.9 \cdot 0.45}{0.9 \cdot 0.45 + 0.45 \cdot 0.10 + 0.9 \cdot 0.10} = \frac{0.405}{0.540} = 0.75$

Even with a chain-smoking friend, $P(\text{lung cancer} \mid \text{cough}) \approx 17\%$, which is not dominant.

The cold wins because:

  1. Priors matter: colds are going around; lung cancer is rare in any given week, even for smokers. Cold has a ~4.5× higher prior.
  2. Likelihoods matter: a cough is more likely from a cold (0.9) than a stomach virus (0.45).
  3. Neither alone is enough. You need the whole rule.

The Kahneman–Tversky cab problem


Setup

  • A city has two cab companies: Blue and Green.
  • 85% of cabs are Green. 15% are Blue.
  • At night, a hit-and-run happens. A witness says the cab was Blue.
  • The witness correctly identifies cab colors at night 80% of the time.

What is the probability the cab was actually Blue?


(Commit to a number before we compute.)


Pass 2 — the area diagram

Imagine 100 cabs.

  • Blue cabs: 15. Witness correctly says "blue" on 80% of them → 12.
  • Green cabs: 85. Witness incorrectly says "blue" on 20% of them → 17.

Total "blue" reports: $12 + 17 = 29$.

$P(\text{Blue} \mid \text{witness says Blue}) = \frac{12}{29} \approx 0.41$


Below 50%. A "Blue" report is more likely wrong than right, even though the witness is 80% accurate.


Pass 3 — formal Bayes

Hypotheses $h \in \{B, G\}$.   Observation $o$ = "witness says Blue."

$P(B \mid o) = \frac{P(o \mid B)\,P(B)}{P(o \mid B)\,P(B) + P(o \mid G)\,P(G)}$

$= \frac{0.80 \cdot 0.15}{0.80 \cdot 0.15 + 0.20 \cdot 0.85} = \frac{0.12}{0.12 + 0.17} = \frac{0.12}{0.29} \approx 0.41$


Same number as the area diagram. The equation and the counting are the same calculation.
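The same calculation can be written as a small function, which makes it easy to see how the answer moves with the base rate. This is a plain-Python sketch; the name `posterior_blue` is hypothetical, and it assumes the witness is equally accurate for both colors, as the setup states.

```python
def posterior_blue(prior_blue, accuracy):
    """P(cab is Blue | witness says Blue) for a two-company city.

    Assumes the witness's accuracy is the same for Blue and Green cabs.
    """
    p_say_blue_given_blue = accuracy
    p_say_blue_given_green = 1 - accuracy
    numerator = p_say_blue_given_blue * prior_blue
    evidence = numerator + p_say_blue_given_green * (1 - prior_blue)
    return numerator / evidence

p = posterior_blue(prior_blue=0.15, accuracy=0.80)
print(round(p, 2))  # 0.41

# With equal base rates, the posterior matches the witness's accuracy.
p_equal = posterior_blue(prior_blue=0.50, accuracy=0.80)
print(round(p_equal, 2))  # 0.8
```

The second call shows that the intuitive answer of 80% is only correct when the two companies are equally common.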


Base-rate neglect

Most people report something close to 80%.
They anchor on the likelihood and ignore the prior.

Kahneman & Tversky (1972); Bar-Hillel (1980).

Medical doctors fail this too, even for diagnoses (Casscells, Schoenberger, & Graboys, 1978).


Heuristics and biases

Heuristic
A rule-of-thumb. Usually works. Cheap to apply.

Bias
Systematic error introduced when a heuristic is misapplied.

Kahneman & Tversky argued that the mind is full of heuristics — good enough for most everyday inference but wrong in predictable ways when base rates are extreme, evidence is unfamiliar, or causal structure is unusual.

This is a running theme for the semester.


Synthesis — the 5-step recipe

This is the course's method.

  1. Formalize the problem people face.
  2. Formalize the knowledge they bring to it.
  3. Apply Bayes' rule — compute what an ideal learner would infer.
  4. Characterize the ideal learner's behavior.
  5. Identify what knowledge and constraints must be assumed for model and human behavior to match.

Every week we pick a different cognitive domain and run this loop.

Block 5 · 20 minutes

GenJAX, Admin, Homework


What is GenJAX?

Bayes' rule is easy to write. Enumerating hypothesis spaces by hand is not.

For continuous distributions you can't enumerate at all.

GenJAX is a probabilistic programming language.
You write the generative process as code. The machine does the enumeration and counting.


GenJAX by example

import genjax
from genjax import gen, flip

@gen
def chibany_day():
    lunch = flip(0.5) @ "lunch"
    dinner = flip(0.5) @ "dinner"
    return (lunch, dinner)
  • @gen — this is a generative model, not a regular function.
  • flip(0.5) — a Bernoulli random variable.
  • @ "lunch" — name the random choice so we can condition on it later.

This function IS the outcome space $\Omega$ (written $S$ in Block 3). Every call samples a point.
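To get a feel for what "sample, then condition" means before touching GenJAX, here is a plain-Python analogue of the model above. It is not GenJAX code and does not reflect how GenJAX performs inference: `flip` here is a stand-in written with the standard `random` module, `estimate` is a hypothetical helper, and the conditioning uses simple rejection sampling, chosen only because it is easy to read.

```python
import random

def flip(p, rng):
    """Plain-Python stand-in for a Bernoulli draw (not GenJAX's flip)."""
    return 1 if rng.random() < p else 0

def chibany_day(rng):
    """Sample one point of the outcome space, with named random choices."""
    return {"lunch": flip(0.5, rng), "dinner": flip(0.5, rng)}

def estimate(n_samples=100_000, seed=0):
    """Estimate P(lunch = 1 | lunch + dinner >= 1) by rejection sampling."""
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(n_samples):
        day = chibany_day(rng)
        if day["lunch"] + day["dinner"] >= 1:  # keep only days meeting the condition
            kept += 1
            hits += day["lunch"]
    return hits / kept

est = estimate()
print(est)  # close to 2/3, the conditional probability from Block 3
```

This is the two-coin conditional from Block 3 in disguise: discarding samples that violate the condition is the sampling version of restricting the universe.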


Preemptive callouts

  • Use flip(p), not bernoulli(p). bernoulli takes a logit, not a probability.
  • The @ operator here is not matrix multiplication. GenJAX overloads it.
  • Output displays as Array(0, dtype=int32). Treat these as 0s and 1s.
  • Everything runs in Google Colab. No local installation.

Admin — grading

| Component | Weight | Details |
|---|---|---|
| Final project | 50% | proposal 5% · talk 7.5% · paper 37.5% |
| Programming assignments (4) | 30% | Clusters 7.5% · Gen 7.5% · MC 10.5% · RL 4.5% |
| Weekly written reflections | 15% | ~200 words · 8 of 13 · pass/fail |
| Participation | 5% | |
| Quizzes | 0% | self-check only |

All four assignments are completed in GenJAX. Weekly reflections replace the SP25 paper-presentation format (doesn't fit a 2-4 person seminar).


Admin — the rest

  • Textbook: A Narrative Introduction to Probability, https://josephausterweil.github.io/probintro/
  • Tooling: Google Colab, GenJAX. No local setup.
  • Office hours: TBD — I'll poll.
  • Email: best-effort 36h response (48h on weekends).
  • AI tools: welcome as a technical resource; must cite; not for quiz/assignment answers.
  • Full syllabus on the course website (coming up).

Homework for Week 2

  1. Textbook T1 Ch 1-3 — reinforces everything from Block 3.
    intro/01_goals.md · 02_hungry.md · 03_prob_count.md

  2. Textbook T2 Ch 0-1 — gets GenJAX running in Colab.
    genjax/00_getting_started.md · 01_python_basics.md

  3. Start thinking about a final-project topic. We'll talk Week 2.

Week 2 has a short optional self-check quiz (Intro Probability Theory 1).


Questions?


Thanks — see you next Friday.
