description	concept
memoryless, waiting	geometric, exponential
trials with success & failure	Bernoulli, binomial
large sample, average	CLT, approximately Gaussian
bound given only mean / variance	Markov / Chebyshev inequalities
estimating parameter from data	MLE, MAP, Bayesian update
update belief from evidence	Bayes' rule
function of a RV	Jensen’s inequality
expected count of things	linearity + indicators
expected hitting time / return probability	first-step analysis, solve recurrence
Markov property (future depends only on present)	first-step analysis

Distributions

	discrete time (trials)	continuous time
counting	binomial: how many successes in $n$ trials?	Poisson: how many events in fixed window?
waiting	geometric: how many trials until first success?	exponential: how long until first event?

Discrete

Bernoulli RV $X\sim\operatorname{Bernoulli}(p)$ models a single trial with success probability $p$

$$ P(X=1)=p,\quad P(X=0)=1-p $$
- compact PMF form
  
  $$ P(X=x)=p^x(1-p)^{1-x} $$
$\mathbb E[X]=p$, $\operatorname{Var}[X]=p(1-p)$
- proof
  - $\mathbb E[X]=1\cdot p+0\cdot (1-p)=p$
  - $\mathbb E[X^2]=1^2\cdot p+0^2\cdot (1-p)=p$
  - $\operatorname{Var}[X]=\mathbb E[X^2]-\mathbb E[X]^2=p-p^2=p(1-p)$
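The proof above can be checked numerically. This is a minimal sketch, assuming an illustrative value $p=3/10$; the helper name `bernoulli_moments` is hypothetical and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Return (mean, variance) of Bernoulli(p), computed directly from the PMF."""
    pmf = {0: 1 - p, 1: p}                            # P(X=0)=1-p, P(X=1)=p
    mean = sum(x * q for x, q in pmf.items())          # E[X]
    second_moment = sum(x**2 * q for x, q in pmf.items())  # E[X^2]
    return mean, second_moment - mean**2               # Var[X] = E[X^2] - E[X]^2

p = Fraction(3, 10)   # illustrative value, not from the notes
mean, var = bernoulli_moments(p)
print(mean, var)      # 3/10 21/100
```

Both values agree with $p$ and $p(1-p)$.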
categorical RV generalizes Bernoulli from 2 to $k$ outcomes

$$ P(X=i)=p_i $$
- the indicator $X_i=\mathbf 1\{X=i\}$ of each outcome $i$ is $\operatorname{Bernoulli}(p_i)$, so $\mathbb E[X_i]=p_i$ and $\operatorname{Var}[X_i]=p_i(1-p_i)$
handy techniques
- indicator squaring: for any binary variable $X\in\{0,1\}$, $X^2=X$ so $\mathbb E[X^2]=\mathbb E[X]$

binomial RV $X\sim\operatorname{Binomial}(n,p)$ is the sum of $n$ independent $\operatorname{Bernoulli}(p)$ trials
- $X=\sum_{i=1}^n X_i$ where $X_i\sim\operatorname{Bernoulli}(p)$
$$ P(X=k)=\binom nk p^k(1-p)^{n-k} $$
- every sequence with $k$ successes has the same probability $p^k(1-p)^{n-k}$, and there are $\binom nk$ such sequences
Binomial theorem

$$ (a+b)^n=\sum_{k=0}^n\binom nk a^k b^{n-k} $$
setting $a=p$ and $b=1-p$, we can verify that $\sum_k P(X=k)=1$

$$ \begin{align*} \sum_{k=0}^n\binom nk p^k(1-p)^{n-k}=(p+(1-p))^n=1^n=1 \end{align*} $$
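The same normalization can be checked in code. This is a minimal sketch, assuming illustrative values $n=10$, $p=1/3$; `binomial_pmf` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X=k) for X ~ Binomial(n, p): C(n,k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, Fraction(1, 3)   # illustrative values, not from the notes
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)                # 1
```

With exact fractions the sum is exactly 1, matching $(p+(1-p))^n=1$.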
$\mathbb E[X]=np$, $\operatorname{Var}[X]=np(1-p)$
- proof
  - $\mathbb E[X]=\mathbb E\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbb E[X_i]=np$
  - $\operatorname{Var}[X]=\operatorname{Var}[\sum_{i=1}^nX_i]=\sum_{i=1}^n\operatorname{Var}[X_i]=np(1-p)$ (second equality follows because the $X_i$ are independent)
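As a sanity check on the proof, the moments can also be computed by brute force from the PMF. This is a sketch under assumed values $n=8$, $p=1/4$; `binomial_moments` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (mean, variance) of Binomial(n, p) by summing over the PMF."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))            # E[X]
    second_moment = sum(k * k * q for k, q in enumerate(pmf))  # E[X^2]
    return mean, second_moment - mean**2

n, p = 8, Fraction(1, 4)    # illustrative values, not from the notes
mean, var = binomial_moments(n, p)
print(mean, var)            # 2 3/2
```

The brute-force values match $np$ and $np(1-p)$ obtained from the Bernoulli decomposition.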
handy techniques
- decompose into Bernoulli and use linearity of expectation
- decompose $X^2$ into cross-terms
  
  $$ X^2=\left(\sum_i X_i\right)^2=\sum_i X_i^2+\sum_{i\neq j}X_iX_j $$
  - this makes computing the expectation easy, since $X_i^2=X_i$ for indicator variables and there are $n(n-1)$ ordered pairs with $i\neq j$
    
    $$ \mathbb E[X^2]=\sum_i\mathbb E[X_i]+\sum_{i\neq j}\mathbb E[X_iX_j]=n\mathbb E[X_i]+n(n-1)\mathbb E[X_iX_j] $$
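Worked through for the binomial case, assuming the $X_i$ are independent $\operatorname{Bernoulli}(p)$ so that $\mathbb E[X_iX_j]=\mathbb E[X_i]\,\mathbb E[X_j]=p^2$ for $i\neq j$, the cross-term decomposition recovers the variance:

$$ \begin{align*} \mathbb E[X^2]&=np+n(n-1)p^2\\ \operatorname{Var}[X]&=\mathbb E[X^2]-\mathbb E[X]^2=np+n(n-1)p^2-n^2p^2=np-np^2=np(1-p) \end{align*} $$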

Poisson RV $X\sim\operatorname{Poisson}(\lambda)$ models the number of events in a fixed interval when events occur independently at a constant average rate; $\lambda$ is the expected number of events in that interval

$$ P(X=k)=\frac{\lambda^ke^{-\lambda}}{k!} $$
if $\lambda$ is instead the rate per unit time, the number of events in an interval of length $t$ is distributed as $N(t)\sim\operatorname{Poisson}(\lambda t)$
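A small numeric sketch of the PMF and the $\lambda t$ scaling, assuming an illustrative rate of 2 events per unit time and a window of length 1.5. `poisson_pmf` is a hypothetical helper, and the infinite sum is truncated at $k=60$, where the remaining tail is negligible for these values.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X=k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

rate, t = 2.0, 1.5          # illustrative rate per unit time and window length
lam_t = rate * t            # N(t) ~ Poisson(rate * t)

# Truncate the support at k = 59; the tail mass beyond that is negligible here.
probs = [poisson_pmf(k, lam_t) for k in range(60)]
total = sum(probs)                                   # should be close to 1
mean_count = sum(k * q for k, q in enumerate(probs))  # should be close to rate * t
print(total, mean_count)
```

The truncated PMF sums to approximately 1 and its mean is approximately $\lambda t$.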