Random Variable
Definition
A random variable is a function that maps experiment outcomes to numerical values. That is, it maps the sample space $\Omega$ to numbers:

$$X : \Omega \to \mathbb{R}$$
We use $\operatorname{Ran}(X)$ to denote the range (or, more specifically, the image) of the random variable $X$.
- If $\operatorname{Ran}(X)$ is finite or countably infinite, $X$ is called a discrete random variable.
- If $\operatorname{Ran}(X)$ is uncountably infinite, $X$ is called a continuous random variable.
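To make the definition concrete, here is a minimal Python sketch of a random variable as an ordinary function on a sample space; the two-coin-flip experiment and the names `sample_space` and `X` are illustrative choices, not from the source:

```python
# Experiment: flip two fair coins. sample_space plays the role of Omega.
sample_space = ["HH", "HT", "TH", "TT"]

def X(outcome: str) -> int:
    """Random variable: maps an outcome to the number of heads in it."""
    return outcome.count("H")

# The image Ran(X) is the set of values X can take.
ran_X = {X(omega) for omega in sample_space}
print(ran_X)  # {0, 1, 2} -- finite, so X is a discrete random variable
```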
Probability Mass Function
The probability mass function (PMF) of a discrete random variable $X$ assigns probabilities to the possible values of the random variable. That is:

$$p_X(x) = P(X = x)$$

where:

$$p_X : \operatorname{Ran}(X) \to [0, 1]$$

Note that $0 \le p_X(x) \le 1$, as shown in the function signature of $p_X$. In addition, the PMF must satisfy:

$$\sum_{x \in \operatorname{Ran}(X)} p_X(x) = 1$$
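As a quick illustration (a sketch reusing the hypothetical two-coin-flip experiment above), the PMF over equally likely outcomes can be built by counting, then checked against both conditions:

```python
from collections import Counter
from fractions import Fraction

sample_space = ["HH", "HT", "TH", "TT"]  # equally likely outcomes

def X(outcome: str) -> int:
    return outcome.count("H")  # number of heads

# p_X(x) = P(X = x): count the outcomes that map to each value x.
counts = Counter(X(omega) for omega in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in counts.items()}

print({x: str(p) for x, p in pmf.items()})  # {2: '1/4', 1: '1/2', 0: '1/4'}
assert all(0 <= p <= 1 for p in pmf.values())  # each p_X(x) is in [0, 1]
assert sum(pmf.values()) == 1                  # probabilities sum to 1
```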
Expectation
The expectation (or expected value, mean) of a discrete random variable $X$ is defined as:

$$E[X] = \sum_{x \in \operatorname{Ran}(X)} x \cdot p_X(x)$$

Equivalently, summing over outcomes instead of values:

$$E[X] = \sum_{\omega \in \Omega} X(\omega) \cdot P(\omega)$$

Recall that a random variable is a function, and thus $X(\omega)$ is the value of the random variable $X$ at outcome $\omega$.
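The two forms of the sum can be verified numerically. The sketch below computes $E[X]$ for a single fair six-sided die roll both ways (the die example is my own, not from the source):

```python
from fractions import Fraction

# PMF of one fair die roll: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Sum over values: E[X] = sum_x x * p_X(x)
e_over_values = sum(x * p for x, p in pmf.items())

# Sum over outcomes: E[X] = sum_omega X(omega) * P(omega).
# Here each outcome is the face itself, so X(omega) = omega.
e_over_outcomes = sum(omega * Fraction(1, 6) for omega in range(1, 7))

print(e_over_values, e_over_outcomes)  # 7/2 7/2
```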
Notation
The notation around random variables confused me a lot in the beginning. Let's clarify it here:
- $X = x$ denotes that the random variable $X$ takes value $x$. It is shorthand for the event:
  $$\{\omega \in \Omega : X(\omega) = x\}$$
- $P(X = x)$ denotes the probability that the random variable $X$ takes value $x$.
- $P(x)$ is, most of the time, interchangeable with $P(X = x)$ and $p_X(x)$.
Important
- In the notation $X = x$, $x$ is the value of the random variable, not the event
- In the notation $P(X = x)$, $X = x$ is the event, not the value of the random variable
- In the notation $P(x)$, $x$ is the value of the random variable, not the event
Likelihood
Likelihood Function
The likelihood function measures how likely a particular parameter value $\theta$ is given observed data $x$:

$$L(\theta \mid x) = P(x \mid \theta)$$

where:

- $\theta$ represents the parameters of the distribution
- $x$ is the observed data (treated as constant)
- $P(x \mid \theta)$ is the probability of observing $x$ given parameter $\theta$
Important
The two sides of the above equation are equal in value but not in semantics:
- Likelihood: $L(\theta \mid x)$ is a function of the parameter $\theta$ given fixed data $x$
- Probability: $P(x \mid \theta)$ is a function of the data $x$ given fixed parameter $\theta$
For independent observations $x_1, x_2, \ldots, x_n$, the joint likelihood is:

$$L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \theta)$$
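As an illustration, the sketch below evaluates this joint likelihood for hypothetical independent coin flips under a Bernoulli model; the data in `flips` and the function name `likelihood` are made up for this example:

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical observations (1 = heads)

def likelihood(theta: float, data: list) -> float:
    """L(theta | data) = prod_i P(x_i | theta) for independent Bernoulli flips."""
    return math.prod(theta if x == 1 else 1 - theta for x in data)

print(likelihood(0.50, flips))  # 0.00390625
print(likelihood(0.75, flips))  # ~0.0111 -- higher, matching 6/8 heads
```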
Log Likelihood
The log likelihood is simply the natural logarithm of the likelihood function:

$$\ell(\theta \mid x) = \log L(\theta \mid x)$$

For independent observations, the log likelihood becomes:

$$\ell(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \log P(x_i \mid \theta)$$
Why use log likelihood?
- Converts products to sums (easier to differentiate)
- More numerically stable (avoids underflow)
- Preserves the location of the maximum: $\arg\max_\theta \ell(\theta \mid x) = \arg\max_\theta L(\theta \mid x)$, since $\log$ is monotonically increasing (see the sketch below)
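The sketch below demonstrates the last two points on the same hypothetical Bernoulli data: repeating the data 200 times makes the raw product underflow to zero while the log likelihood stays finite, and a coarse grid search finds the same maximizer either way:

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 1]  # same hypothetical data as above

def log_likelihood(theta: float, data: list) -> float:
    """log L(theta | data) = sum_i log P(x_i | theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

big_data = flips * 200  # 1600 observations
print(math.prod(0.75 if x == 1 else 0.25 for x in big_data))  # 0.0 (underflow)
print(log_likelihood(0.75, big_data))  # about -899.7, still representable

# Same argmax: log is monotonic, so the maximizer is unchanged.
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=lambda t: log_likelihood(t, flips)))  # 0.75 = 6/8 heads
```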
Negative Log Likelihood (NLL)
The negative log likelihood (NLL) is simply the negative of the log likelihood:

$$\mathrm{NLL}(\theta \mid x) = -\ell(\theta \mid x) = -\log L(\theta \mid x)$$
Why use NLL?
- Most optimization algorithms (like gradient descent) are designed for minimization
- NLL minimization is equivalent to likelihood maximization: $\arg\min_\theta \mathrm{NLL}(\theta \mid x) = \arg\max_\theta L(\theta \mid x)$
- Often provides cleaner mathematical expressions
For independent observations:

$$\mathrm{NLL}(\theta \mid x_1, \ldots, x_n) = -\sum_{i=1}^{n} \log P(x_i \mid \theta)$$
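As a final sketch, the NLL of the same hypothetical Bernoulli data can be minimized with plain gradient descent, recovering the sample proportion of heads; the learning rate and iteration count here are arbitrary choices:

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 1]  # same hypothetical data as above

def nll(theta: float, data: list) -> float:
    """NLL(theta) = -sum_i log P(x_i | theta)."""
    return -sum(math.log(theta if x == 1 else 1 - theta) for x in data)

# For Bernoulli data, d/dtheta NLL = -heads/theta + tails/(1 - theta).
heads = sum(flips)
tails = len(flips) - heads
theta, lr = 0.5, 0.01
for _ in range(500):
    grad = -heads / theta + tails / (1 - theta)
    theta -= lr * grad
    theta = min(max(theta, 1e-6), 1 - 1e-6)  # keep theta inside (0, 1)

print(round(theta, 3), nll(theta, flips))  # ~0.75, the minimizer of NLL
```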
References
- Stanford CS109: Discrete Random Variables: Basics
- Stanford CS109: Discrete Random Variables: More on Expectation