The Loop

Four rounds. Every loop contains some version of these.

01 Live

ML System Design

Design production ML systems end-to-end within a single 45–60 minute round.

  • Problem scoping — do you ask the right questions before designing?
  • Architecture decisions — retrieval vs ranking, trade-offs, cascade thinking
  • Scale and latency instinct — napkin math, serving budgets, retraining cadences
  • ML evaluation — offline metrics, A/B design, guardrails, north star connection
Duration: 45–60 min
Frequency: 1–2 rounds per loop
Seniority: L4 and above
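
The napkin math mentioned above can be rehearsed as a few lines of arithmetic. The sketch below is only an illustration: the traffic volume, peak-to-average ratio, batch size, per-batch cost, and latency budget are all hypothetical numbers, and the helper names are made up for this example. It assumes ranking batches are scored sequentially on one replica.

```python
# Back-of-envelope serving estimate for a two-stage (retrieval -> ranking) cascade.
# Every number below is hypothetical and chosen only to illustrate the arithmetic.

SECONDS_PER_DAY = 86_400


def peak_qps(daily_requests: int, peak_to_average: float) -> float:
    """Average QPS scaled by an assumed peak-to-average traffic ratio."""
    return daily_requests / SECONDS_PER_DAY * peak_to_average


def ranking_latency_ms(candidates: int, batch_size: int, ms_per_batch: float) -> float:
    """Latency if candidate batches are scored one after another on one replica."""
    batches = -(-candidates // batch_size)  # ceiling division
    return batches * ms_per_batch


qps = peak_qps(daily_requests=100_000_000, peak_to_average=3.0)

retrieval_ms = 20.0  # assumed cost of the retrieval stage
ranking_ms = ranking_latency_ms(candidates=500, batch_size=128, ms_per_batch=8.0)
total_ms = retrieval_ms + ranking_ms
budget_ms = 100.0  # assumed end-to-end serving budget

print(f"peak QPS ~ {qps:,.0f}")
print(f"estimated latency {total_ms:.0f} ms against a {budget_ms:.0f} ms budget")
```

Saying the assumptions aloud (peak ratio, candidates per request, cost per batch) matters more than the exact figures, since the interviewer can then challenge any one of them.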
02 Live

AI Systems Design

Design LLM-powered systems — RAG pipelines, agents, evaluation frameworks, fine-tuning strategies.

  • LLM architecture choices — when to RAG vs fine-tune vs prompt-engineer
  • Retrieval design — chunking, embedding, indexing, hybrid search
  • Evaluation — hallucination, faithfulness, latency, cost at scale
  • Agent design — tool use, memory, orchestration, failure handling
Duration: 45 min
Frequency: 1–2 rounds (AI-first companies), 0–1 at product companies
Seniority: L4 and above; L6+ expected to scope ambiguous problems
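
One common way to combine keyword and vector results in hybrid search is reciprocal rank fusion (RRF). The sketch below is a minimal illustration, not a prescribed design: the document IDs are invented, the two input rankings stand in for a keyword retriever and an embedding retriever, and k = 60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

RRF uses only ranks, so it sidesteps the problem of keyword scores and cosine similarities living on different scales. That trade-off is worth naming if you propose it.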
03 Live

ML Coding

Implement ML algorithms and systems from scratch — no libraries, clean code, explained aloud.

  • Algorithmic ML — implement attention, k-means, backprop, decision tree splits
  • Data pipelines — feature engineering, windowed aggregates, join logic
  • Production code quality — vectorisation, memory efficiency, edge case handling
  • Explanation under pressure — think aloud, name complexity, offer alternatives
Duration: 45 min
Frequency: 1–2 rounds per loop
Seniority: L4 and above; L6+ expected to optimise and generalise
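
As an illustration of the "from scratch, no libraries" style, here is a minimal sketch of single-head scaled dot-product attention using only the standard library. It assumes Q, K, and V are plain lists of float rows, and it omits masking, batching, and multiple heads, all of which an interviewer may ask you to add.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no masking."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of value rows, one output dimension at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Two identical keys receive equal weight, so the output is the mean of the value rows.
result = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[0.0, 2.0], [4.0, 6.0]])
print(result)
```

When explaining it aloud, name the cost: with n queries, m keys, and dimension d, the score computation is O(n · m · d), which is the term that a vectorised or chunked version would target.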
04 Live

Statistics & Experimentation

A/B testing, causal inference, and power analysis — the quantitative backbone of every production ML decision.

  • A/B design — randomisation unit, holdout strategy, novelty effect, network effects
  • Power analysis — MDE, sample size, α and β, multiple testing correction
  • Causal inference — difference-in-differences, instrumental variables, selection bias
  • Statistical pitfalls — p-hacking, Simpson's paradox, Goodhart's Law, survivorship bias
Duration: 45–60 min
Frequency: 1–2 rounds at product companies; often replaces system design for DS roles
Seniority: L4: apply the right test correctly. L6+: design experiments end-to-end and catch validity threats
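
The power analysis items above (MDE, sample size, α and β, multiple testing correction) can be tied together in one small calculation. This is a sketch under stated assumptions: a two-sided test on the difference of two proportions, the normal approximation with unpooled variance, equal-sized arms, and an absolute MDE. The function names and the 10% baseline are illustrative only.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per arm for a two-sided two-proportion z-test."""
    p_treat = p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / num_comparisons


# Hypothetical scenario: 10% baseline conversion, detect a 1 point absolute lift.
n_single = sample_size_per_arm(p_baseline=0.10, mde_abs=0.01)
n_corrected = sample_size_per_arm(p_baseline=0.10, mde_abs=0.01,
                                  alpha=bonferroni_alpha(0.05, num_comparisons=4))
print(n_single, n_corrected)
```

Two things are worth saying aloud with this formula: the required sample grows with the inverse square of the MDE, and correcting for several metrics or variants raises the per-arm requirement further.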

The Bar

What changes as you go senior — and how to demonstrate it.

The same question in a system design interview is evaluated completely differently at L5 vs L6. Know where the bar is before you walk in.

L4 / L5 · Execution Bar · Mid-level · SWE II / Senior I
  • Problem framing: Problem is given. You scope within it.
  • What they test: Can you solve it correctly and completely?
  • Design: Find the right answer. Demonstrate depth.
  • Failure mode: Missing components, shallow trade-offs.
  • ML metrics: Recall the right ones for the problem.
  • Production: Know serving, retraining, monitoring basics.
  • Experimentation: Apply the right test. State assumptions. Compute sample size.
L6 / Staff+ · Strategy Bar · Senior / Staff · Tech Lead
  • Problem framing: You define the scope. Ambiguity is the test.
  • What they test: Can you own the problem end-to-end?
  • Design: Defend trade-offs. Show what you would not build.
  • Failure mode: Missing stakes, business connection, org impact.
  • ML metrics: Choose the right ones and justify the choice.
  • Production: Failure modes, cost, team runway, tech debt.
  • Experimentation: Design end-to-end. Catch validity threats. Connect to business decisions.

The single biggest shift L5 → L6

At L5, the problem is handed to you. At L6, the ambiguity is the problem. A senior candidate who waits to be told what to design has already failed. State your scope assumptions in the first 60 seconds and defend them.

Company Targeting

Where to spend your depth depends on who's interviewing you.

Same candidate, same skill level — different companies weight rounds completely differently. Prep to the company's bar, not a generic bar.

FAANG: Google · Meta · Amazon · Netflix · Apple
ML System Design: Very heavy. 45–60 min, often 2 rounds. Google tests ML theory depth; Meta expects multi-stakeholder framing.
ML Coding: Strict. LeetCode medium-hard. Google often adds ML implementation (implement attention, custom loss).
Experimentation: Google and Meta: heavy. Expect power analysis, metric design, A/B validity, and causal inference for ML roles. Amazon tests Leadership Principles (LP) stories separately.
Key differentiator: Scale is the answer to every "why" at FAANG. Bring up napkin math and 99th percentile latency without being prompted.
AI-first: OpenAI · Anthropic · DeepMind · Cohere · Mistral
AI Systems Design: Primary differentiator. Expect 1–2 rounds on LLM architecture, evaluation, safety, RLHF, inference optimisation.
Research depth: Be able to discuss recent papers. They care that you're reading. Alignment and safety awareness matters at Anthropic.
ML Coding: Implementation-heavy. Implement training loop components, custom optimisers, quantisation techniques.
Key differentiator: Mission alignment is heavily weighted. Know the company's safety philosophy. Curiosity over pedigree.
Product companies: Stripe · Airbnb · Uber · LinkedIn · DoorDash
ML System Design: Business-connected. The interviewer will push "how does this improve revenue / retention?" Answer it proactively.
Experimentation: A/B testing, causal inference, and power analysis are treated as a first-class interview topic, especially for DS roles.
Coding: Standard LeetCode. Less ML-specific than FAANG or AI-first.
Key differentiator: Pragmatic ML wins. "Ship a 70% solution now" over "perfect system in 6 months." Frame everything in user outcomes.
Startups: Series A–C · AI-powered products
Breadth: Full-stack ML. Can you own the data pipeline, model, deployment, and monitoring alone? They're hiring for one person doing four jobs.
Experimentation: Lightweight and practical. They care about shipping: "we ran an experiment, saw X, shipped Y." Less rigour, more product instinct.
System design: Lighter and more conversational. They care less about FAANG-style depth and more about practical decision-making under constraints.
Key differentiator: Move fast, make decisions under uncertainty, take ownership. Show evidence of all three, in that order.

Your Role

Which rounds matter most depends on what you're interviewing for.

ML Engineer

Building and deploying ML systems in production.

  • Primary · ML System Design: 2+ rounds. This is the core test.
  • Primary · ML Coding: 1–2 rounds. LeetCode + ML implementation.
  • Important · Statistics & Experimentation: 1 round at product companies. A/B design, metric sensitivity, experiment validity.
  • Lighter · AI Systems Design: 0–1 round. Relevant at AI-adjacent companies.
Focus

Nail system design depth first. Coding is table stakes — clean, efficient, explained. Senior ✦ signals in system design separate L5 from L6.

AI / LLM Engineer

Building LLM-powered products and infrastructure.

  • Primary · AI Systems Design: 2 rounds. This is your differentiator.
  • Primary · ML System Design: 1 round. Expect a recommendation or ranking problem.
  • Important · ML Coding: LLM implementation focus, covering attention, tokenisation, and fine-tuning loops.
  • Lighter · Statistics & Experimentation: 0–1 round. Focus on LLM evaluation design and A/B testing for model releases.
Focus

Lead with LLM architecture literacy. Know evaluation cold: hallucination, faithfulness, and latency/cost trade-offs. Safety thinking is a differentiator at Anthropic, OpenAI, and DeepMind.

Data Scientist

Turning data into decisions and products.

  • Primary · Statistics & Experimentation: The core DS test. A/B design, power analysis, causal inference, statistical pitfalls.
  • Primary · ML Coding: Python + SQL. Feature engineering, data manipulation, model implementation.
  • Important · ML System Design: Lighter version, centred on data pipeline, metrics, and experimentation design.
  • Lighter · AI Systems Design: 0–1 round for AI-focused DS roles. Evaluation framework and RAG quality design.
Focus

Experimentation fluency is non-negotiable. Connect every ML decision to a business metric. The DS candidate who speaks in p-values but can't connect to revenue or retention will not get an offer at L5+.

The Clock

A suggested sequence. Adjust to your loop date.

4 weeks out · Deep Study
  • ML System Design: 1 system per day, all 5 phases. Use the Design Room.
  • ML Coding: 2 problems per day — fundamentals, not hard LeetCode yet.
  • Stats & Experimentation: master A/B design, power analysis, and causal inference fundamentals.
  • For AI roles: read 2–3 recent LLM papers relevant to the company.
2 weeks out · Mock & Sharpen
  • Run 2–3 mock system design interviews (with a peer or AI).
  • AI Systems: deep dive on RAG, evaluation, agent patterns.
  • Experimentation: work through 2–3 case studies — metric design, validity threats, network effects.
  • Identify the 2–3 systems most likely for your target company. Drill those.
1 week out · Review Mode
  • Quick-reference only — no new material.
  • Research your target company: recent blog posts, papers, product launches.
  • Review statistical pitfalls: p-hacking, Simpson's paradox, survivorship bias, Goodhart's Law.
  • One full mock interview with timed phases.
48 hours out · Light Touch
  • Review Senior ✦ callouts for the 2 most likely systems.
  • Review your causal inference framework and A/B experiment validity checklist.
  • Prepare 3 thoughtful questions to ask the interviewer.
  • No new material. Sleep matters more than cramming.
Day of · Execute
  • Re-read the interviewer's prompt slowly. Clarify before designing.
  • State your high-level approach (two-stage, cascade, or threshold-based, as the problem demands) in the first 2 minutes.
  • Name a Senior ✦ insight per phase. Don't wait to be asked.
  • Close with business impact. Always.