AI / ML Engineering Interview Prep

Interview Command Center

Four rounds. Know what each one tests. Know what your level requires.

01 Live · ML System Design: Design production ML systems end-to-end under a 45-minute time constraint.
02 Live · AI Systems Design: Design LLM-powered systems, including RAG pipelines, agents, evaluation frameworks, and fine-tuning strategies.
03 Live · ML Coding: Implement ML algorithms and systems from scratch, with no libraries and clean code, explained aloud.
04 Live · Statistics & Experimentation: A/B testing, causal inference, and power analysis, the quantitative backbone of every production ML decision.

The Loop

Four rounds. Every loop contains some version of these.

01 Live

ML System Design

Design production ML systems end-to-end under a 45-minute time constraint.

Problem scoping — do you ask the right questions before designing?
Architecture decisions — retrieval vs ranking, trade-offs, cascade thinking
Scale and latency instinct — napkin math, serving budgets, retraining cadences
ML evaluation — offline metrics, A/B design, guardrails, north star connection
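
The napkin math named in the list above can be rehearsed as a few lines of arithmetic. The sketch below is illustrative only: the traffic volume, peak factor, embedding size, per-replica throughput, and all function names are assumptions chosen for the example, not figures from any real system.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(daily_requests: int) -> float:
    """Average queries per second implied by a daily request count."""
    return daily_requests / SECONDS_PER_DAY


def peak_qps(daily_requests: int, peak_factor: float = 3.0) -> float:
    """Peak QPS, assuming peak traffic is `peak_factor` times the average."""
    return average_qps(daily_requests) * peak_factor


def embedding_storage_gb(num_items: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw storage for one vector per item, in decimal gigabytes (float32 by default)."""
    return num_items * dim * bytes_per_value / 1e9


def replicas_needed(qps: float, per_replica_qps: float) -> int:
    """Serving replicas required to absorb `qps`, rounded up."""
    return math.ceil(qps / per_replica_qps)


if __name__ == "__main__":
    daily = 500_000_000  # hypothetical: 500M ranking requests per day
    peak = peak_qps(daily)
    print(f"average QPS: {average_qps(daily):,.0f}")
    print(f"peak QPS:    {peak:,.0f}")
    print(f"replicas at 500 QPS each: {replicas_needed(peak, 500)}")
    print(f"100M x 128-dim float32 embeddings: {embedding_storage_gb(100_000_000, 128):.1f} GB")
```

Saying the inputs aloud (requests per day, peak factor, bytes per value) matters more than the exact outputs, since the interviewer can then challenge an assumption instead of a number.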

Duration: 45–60 min
Frequency: 1–2 rounds per loop
Seniority: L4 and above

02 Live

AI Systems Design

Design LLM-powered systems — RAG pipelines, agents, evaluation frameworks, fine-tuning strategies.

LLM architecture choices — when to RAG vs fine-tune vs prompt-engineer
Retrieval design — chunking, embedding, indexing, hybrid search
Evaluation — hallucination, faithfulness, latency, cost at scale
Agent design — tool use, memory, orchestration, failure handling
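
Two of the retrieval design topics above, chunking and hybrid search, are small enough to sketch in plain Python. This is a minimal illustration under stated assumptions: word-based chunking with overlap is one of several chunking strategies, and reciprocal rank fusion (RRF) is one common way to merge a keyword ranking with a vector ranking. The function names, the default sizes, and the constant `k = 60` are illustrative choices, not requirements of any library.

```python
from collections import defaultdict


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword (e.g. BM25-style) ranking
    vector_hits = ["c", "a", "d"]   # hypothetical embedding-similarity ranking
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF uses only rank positions, so it sidesteps the problem of keyword scores and cosine similarities living on different scales. That trade-off is worth naming when defending a hybrid search design.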

Duration: 45 min
Frequency: 1–2 rounds (AI-first companies), 0–1 at product companies
Seniority: L4 and above; L6+ expected to scope ambiguous problems

03 Live

ML Coding

Implement ML algorithms and systems from scratch — no libraries, clean code, explained aloud.

Algorithmic ML — implement attention, k-means, backprop, decision tree splits
Data pipelines — feature engineering, windowed aggregates, join logic
Production code quality — vectorisation, memory efficiency, edge case handling
Explanation under pressure — think aloud, name complexity, offer alternatives
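
As one example of the "implement attention" style of question above, here is a minimal sketch of single-head scaled dot-product attention in plain Python, in keeping with the no-libraries constraint. It assumes inputs arrive as lists of lists and leaves out masking, batching, and multiple heads. The function names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries: n_q x d_k, keys: n_k x d_k, values: n_k x d_v (lists of lists).
    Time complexity is O(n_q * n_k * (d_k + d_v)).
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d_k)
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled to keep softmax well-behaved.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        row = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        output.append(row)
    return output


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]  # identical keys give uniform weights
    v = [[0.0, 2.0], [4.0, 6.0]]
    print(attention(q, k, v))
```

Points worth saying aloud while writing it: why the scores are divided by the square root of `d_k`, why the max is subtracted inside `softmax`, and that a vectorised version would replace the loops with matrix multiplications.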

Duration: 45 min
Frequency: 1–2 rounds per loop
Seniority: L4 and above; L6+ expected to optimise and generalise

04 Live

Statistics & Experimentation

A/B testing, causal inference, and power analysis — the quantitative backbone of every production ML decision.

A/B design — randomisation unit, holdout strategy, novelty effect, network effects
Power analysis — MDE, sample size, α and β, multiple testing correction
Causal inference — difference-in-differences, instrumental variables, selection bias
Statistical pitfalls — p-hacking, Simpson's paradox, Goodhart's Law, survivorship bias
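
The power analysis terms above (MDE, sample size, α, β) fit together in one formula. The sketch below assumes a two-sided two-proportion z-test with equal-sized arms and the usual normal approximation; other test designs need different formulas. It also shows Bonferroni as one simple, conservative multiple testing correction. The function names and the example baseline are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm for a two-sided two-proportion z-test (normal approximation).

    baseline: control conversion rate; mde_abs: minimum detectable absolute lift.
    power is 1 - beta, the probability of detecting a true lift of size mde_abs.
    """
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / num_comparisons


if __name__ == "__main__":
    # Hypothetical: 10% baseline conversion, detect a 1 percentage point lift.
    print(sample_size_per_arm(baseline=0.10, mde_abs=0.01))
    # Testing 5 metrics at a family-wise alpha of 0.05.
    print(bonferroni_alpha(0.05, 5))
```

With these inputs the requirement comes out on the order of 15,000 users per arm. Halving the MDE roughly quadruples the sample size, because the MDE enters the formula squared.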

Duration: 45–60 min
Frequency: 1–2 rounds at product companies; often replaces system design for DS roles
Seniority: L4: apply the right test correctly. L6+: design experiments end-to-end and catch validity threats

The Bar

What changes as you go senior — and how to demonstrate it.

The same question in a system design interview is evaluated completely differently at L5 vs L6. Know where the bar is before you walk in.

L4 / L5 · Execution Bar · Mid-level · SWE II / Senior I

Problem framing: Problem is given. You scope within it.
What they test: Can you solve it correctly and completely?
Design: Find the right answer. Demonstrate depth.
Failure mode: Missing components, shallow trade-offs.
ML metrics: Recall the right ones for the problem.
Production: Know serving, retraining, monitoring basics.
Experimentation: Apply the right test. State assumptions. Compute sample size.

L6 / Staff+ · Strategy Bar · Senior / Staff · Tech Lead

Problem framing: You define the scope. Ambiguity is the test.
What they test: Can you own the problem end-to-end?
Design: Defend trade-offs. Show what you would not build.
Failure mode: Missing stakes, business connection, org impact.
ML metrics: Choose the right ones and justify the choice.
Production: Failure modes, cost, team runway, tech debt.
Experimentation: Design end-to-end. Catch validity threats. Connect to business decisions.

The single biggest shift L5 → L6

At L5, the problem is handed to you. At L6, the ambiguity is the problem. A senior candidate who waits to be told what to design has already failed. State your scope assumptions in the first 60 seconds and defend them.

Company Targeting

Where to spend your depth depends on who's interviewing you.

Same candidate, same skill level — different companies weight rounds completely differently. Prep to the company's bar, not a generic bar.

FAANG: Google · Meta · Amazon · Netflix · Apple

ML System Design: Very heavy. 45–60 min, often 2 rounds. Google tests ML theory depth; Meta expects multi-stakeholder framing.

ML Coding: Strict. LeetCode medium-hard. Google often adds ML implementation (implement attention, custom loss).

Experimentation: Heavy at Google and Meta. Expect power analysis, metric design, A/B validity, and causal inference for ML roles. Amazon tests Leadership Principles (LP) stories separately.

Key differentiator: Scale is the answer to every "why" at FAANG. Bring napkin math and 99th-percentile latency into your answer without being prompted.

AI-first: OpenAI · Anthropic · DeepMind · Cohere · Mistral

AI Systems Design: Primary differentiator. Expect 1–2 rounds on LLM architecture, evaluation, safety, RLHF, inference optimization.

Research depth: Be able to discuss recent papers. They care that you're reading. Alignment and safety awareness matters at Anthropic.

ML Coding: Implementation-heavy. Implement training loop components, custom optimisers, quantisation techniques.

Key differentiator: Mission alignment is heavily weighted. Know the company's safety philosophy. Curiosity over pedigree.

Product companies: Stripe · Airbnb · Uber · LinkedIn · DoorDash

ML System Design: Business-connected. The interviewer will push "how does this improve revenue / retention?" Answer it proactively.

Experimentation: A/B testing, causal inference, and power analysis are treated as a first-class interview topic, especially for DS roles.

Coding: Standard LeetCode. Less ML-specific than FAANG or AI-first.

Key differentiator: Pragmatic ML wins. "Ship a 70% solution now" over "perfect system in 6 months." Frame everything in user outcomes.

Startups: Series A–C · AI-powered products

Breadth: Full-stack ML. Can you own the data pipeline, model, deployment, and monitoring alone? They're hiring for one person doing four jobs.

Experimentation: Lightweight and practical. They care about shipping: "we ran an experiment, saw X, shipped Y." Less rigour, more product instinct.

System design: Lighter and more conversational. They care less about FAANG-style depth and more about practical decision-making under constraints.

Key differentiator: Move fast, make decisions under uncertainty, take ownership. Show evidence of all three, in that order.

Your Role

Which rounds matter most depends on what you're interviewing for.

ML Engineer

Building and deploying ML systems in production.

Primary · ML System Design: 2+ rounds. This is the core test.
Primary · ML Coding: 1–2 rounds. LeetCode + ML implementation.
Important · Statistics & Experimentation: 1 round at product companies. A/B design, metric sensitivity, experiment validity.
Lighter · AI Systems Design: 0–1 round. Relevant at AI-adjacent companies.

Focus

Nail system design depth first. Coding is table stakes — clean, efficient, explained. Senior ✦ signals in system design separate L5 from L6.

AI / LLM Engineer

Building LLM-powered products and infrastructure.

Primary · AI Systems Design: 2 rounds. This is your differentiator.
Primary · ML System Design: 1 round. Expect recommendation or ranking.
Important · ML Coding: LLM implementation focus, covering attention, tokenisation, and fine-tuning loops.
Lighter · Statistics & Experimentation: 0–1 round. Focus on LLM evaluation design and A/B testing for model releases.

Focus

Lead with LLM architecture literacy. Know evaluation cold — hallucination, faithfulness, latency/cost tradeoffs. Safety thinking is a differentiator at Anthropic, OpenAI, DeepMind.

Data Scientist

Turning data into decisions and products.

Primary · Statistics & Experimentation: The core DS test. A/B design, power analysis, causal inference, statistical pitfalls.
Primary · ML Coding: Python + SQL. Feature engineering, data manipulation, model implementation.
Important · ML System Design: Lighter version covering data pipeline, metrics, and experimentation design.
Lighter · AI Systems Design: 0–1 round at AI-focused DS roles. Evaluation framework and RAG quality design.

Focus

Experimentation fluency is non-negotiable. Connect every ML decision to a business metric. The DS candidate who speaks in p-values but can't connect to revenue or retention will not get an offer at L5+.

The Clock

A suggested sequence. Adjust to your loop date.

4 weeks out: Deep Study

ML System Design: 1 system per day, all 5 phases. Use the Design Room.
ML Coding: 2 problems per day — fundamentals, not hard LeetCode yet.
Stats & Experimentation: master A/B design, power analysis, and causal inference fundamentals.
For AI roles: read 2–3 recent LLM papers relevant to the company.

2 weeks out: Mock & Sharpen

Run 2–3 mock system design interviews (with a peer or AI).
AI Systems: deep dive on RAG, evaluation, agent patterns.
Experimentation: work through 2–3 case studies — metric design, validity threats, network effects.
Identify the 2–3 systems most likely for your target company. Drill those.
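
For the validity-threat case studies above, one check that is easy to practise is a sample ratio mismatch (SRM) test: does the observed split between arms match the intended allocation? SRM is offered here as one common example of a validity check, not as something this guide prescribes. The sketch assumes a two-arm experiment and uses a chi-square goodness-of-fit test with one degree of freedom. The function name and the counts are hypothetical.

```python
import math


def srm_p_value(control_users: int, treatment_users: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the arm split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts from an experiment designed as a 50/50 split.
    print(srm_p_value(50_000, 51_000))
    print(srm_p_value(50_000, 50_050))
```

A very small p-value suggests the assignment or logging pipeline is biased, in which case the metric comparison should not be trusted until the cause is found. The alarm threshold is a team convention. Many teams use a stricter level than 0.05, because the check runs on every experiment.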

1 week out: Review Mode

Quick-reference only — no new material.
Research your target company: recent blog posts, papers, product launches.
Review statistical pitfalls: p-hacking, Simpson's paradox, survivorship bias, Goodhart's Law.
One full mock interview with timed phases.
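
Of the statistical pitfalls listed above, Simpson's paradox is the easiest to review with a small numeric case. The counts below are made up for illustration: treatment converts better within each segment, yet worse once the segments are pooled, because the two arms have very different segment mixes.

```python
# Hypothetical (conversions, users) per segment and arm.
DATA = {
    "mobile": {"treatment": (81, 87), "control": (234, 270)},
    "desktop": {"treatment": (192, 263), "control": (55, 80)},
}


def rate(conversions: int, users: int) -> float:
    """Conversion rate for one cell."""
    return conversions / users


def pooled_rate(data: dict, arm: str) -> float:
    """Conversion rate for one arm after summing across all segments."""
    conversions = sum(segment[arm][0] for segment in data.values())
    users = sum(segment[arm][1] for segment in data.values())
    return rate(conversions, users)


if __name__ == "__main__":
    for name, segment in DATA.items():
        t = rate(*segment["treatment"])
        c = rate(*segment["control"])
        print(f"{name}: treatment {t:.3f} vs control {c:.3f}")
    print(f"pooled: treatment {pooled_rate(DATA, 'treatment'):.3f} "
          f"vs control {pooled_rate(DATA, 'control'):.3f}")
```

In a properly randomised A/B test the segment mix should be roughly equal across arms, so a reversal like this usually points to observational data or a broken assignment. That is the observation to state in an interview.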

48 hours out: Light Touch

Review Senior ✦ callouts for the 2 most likely systems.
Review your causal inference framework and A/B experiment validity checklist.
Prepare 3 thoughtful questions to ask the interviewer.
No new material. Sleep matters more than cramming.

Day of: Execute

Re-read the interviewer prompt slowly. Clarify before designing.
State your two-stage / cascade / threshold approach in the first 2 minutes.
Name a Senior ✦ insight per phase. Don't wait to be asked.
Close with business impact. Always.
