Random Variable

Definition

A random variable 𝑋 is a function that maps each experiment outcome πœ”βˆˆΞ© to a numerical value in ℝ. That is, it maps the sample space Ξ© to the real numbers: πœ”β†¦π‘‹(πœ”)

𝑋:Ω→ℝ

We use Ω𝑋 to denote the range (i.e., the image: the set of values that 𝑋 can actually take) of random variable 𝑋.

  • If Ω𝑋 is finite or countably infinite, 𝑋 is called a discrete random variable.
  • If Ω𝑋 is uncountably infinite, 𝑋 is called a continuous random variable.
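
To make the definition concrete, here is a minimal Python sketch (not from the original notes; the two-coin-flip experiment and all names below are assumptions chosen for illustration):

```python
# A random variable as a plain Python function on a finite sample space.
from itertools import product

# Sample space Ξ©: every outcome of flipping a fair coin twice.
omega = list(product("HT", repeat=2))       # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
prob = {w: 1 / len(omega) for w in omega}   # β„™(Ο‰) = 1/4 for each outcome

def X(w):
    """Random variable X: Ο‰ ↦ number of heads in the outcome Ο‰."""
    return sum(1 for c in w if c == "H")

omega_X = sorted({X(w) for w in omega})     # Ξ©_X = [0, 1, 2]: finite, so X is discrete
print(omega_X)
```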

Probability Mass Function

The probability mass function (PMF) of a discrete random variable 𝑋 assigns probabilities to the possible values of the random variable. That is: 𝑝𝑋:Ω𝑋→[0,1] where:

𝑝𝑋(π‘˜)=β„™(𝑋=π‘˜)=βˆ‘πœ”βˆˆΞ©:𝑋(πœ”)=π‘˜β„™(πœ”)

note that π‘˜βˆˆΞ©π‘‹ as shown in the function signature of 𝑝𝑋, and thus the PMF must satisfy:

βˆ‘π‘§βˆˆΞ©π‘‹π‘π‘‹(𝑧)=1

Expectation

The expectation (or expected value, mean) of a discrete random variable 𝑋 is defined as:

$$\mathbb{E}[X] = \sum_{x \in \Omega_X} x \cdot p_X(x) = \sum_{\omega \in \Omega} X(\omega)\, \mathbb{P}(\omega)$$

Recall that a random variable 𝑋 is a function, so 𝑋(πœ”) is the value of 𝑋 at outcome πœ”; this is why the expectation can equivalently be written as a weighted sum over the sample space Ξ©.
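
Again as a self-contained sketch under the same assumptions, the expectation can be computed both over Ω𝑋 and over Ξ©, and the two agree:

```python
# Expectation of X = number of heads in two fair coin flips.
from itertools import product

omega = list(product("HT", repeat=2))
prob = {w: 1 / len(omega) for w in omega}

def X(w):
    return sum(1 for c in w if c == "H")

values = {X(w) for w in omega}                                   # Ξ©_X
p_X = {k: sum(prob[w] for w in omega if X(w) == k) for k in values}

E_over_values = sum(x * p_X[x] for x in values)                  # Ξ£ x Β· p_X(x)
E_over_outcomes = sum(X(w) * prob[w] for w in omega)             # Ξ£ X(Ο‰) Β· β„™(Ο‰)
print(E_over_values, E_over_outcomes)                            # 1.0 1.0
```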

Notation

The notation around random variables confused me a lot in the beginning. Let's clarify it here:

  • 𝑋=π‘˜ where π‘˜βˆˆΞ©π‘‹ denotes that the random variable 𝑋 takes value π‘˜, which implies that: 𝑋(πœ”)=π‘˜whereπœ”βˆˆΞ©

  • β„™(𝑋=π‘˜)=βˆ‘πœ”βˆˆΞ©:𝑋(πœ”)=π‘˜β„™(πœ”) denotes the probability that random variable 𝑋 takes value π‘˜.

  • Most of the time, β„™(𝑋=π‘˜) is interchangeable with 𝑃(𝑋=π‘˜) and 𝑃(π‘˜)

Important

  • In the β„™(𝑋=☐) notation, ☐ is the value of the random variable, not the event
  • In the β„™(☐) notation, ☐ is the event, not the value of the random variable
  • In the 𝑝𝑋(☐) notation, ☐ is the value of the random variable, not the event
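
A quick worked example may make these distinctions concrete (the fair six-sided die is an assumption added here for illustration): take Ξ© = {1, 2, …, 6} with β„™(πœ”) = 1/6 for every outcome, and let 𝑋(πœ”) = πœ”. Then:

$$\mathbb{P}(X = 3) = \sum_{\omega \in \Omega \,:\, X(\omega) = 3} \mathbb{P}(\omega) = \mathbb{P}(\{3\}) = \tfrac{1}{6} = p_X(3)$$

The 3 in β„™(𝑋=3) and 𝑝𝑋(3) is a value of the random variable, while the {3} in β„™({3}) is an event, i.e. a subset of Ξ©.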

Likelihood

Likelihood Function

The likelihood function β„’(πœƒ|π‘₯) measures how well a particular parameter value πœƒ explains the observed data π‘₯.

$$\mathcal{L}(\theta \mid x) = P_\theta(X = x) = P(X = x \mid \theta)$$

where:

  • πœƒ represents the parameters of the distribution
  • π‘₯ is the observed data (treated as constant)
  • 𝑃(𝑋=π‘₯|πœƒ) is the probability of observing π‘₯ given parameter πœƒ

Important

The two sides of the equation above are equal in value but differ in interpretation:

  • Likelihood: β„’(πœƒ|π‘₯) is a function of the parameters πœƒ with the data π‘₯ held fixed
  • Probability: 𝑃(𝑋=π‘₯|πœƒ) is a function of the data π‘₯ with the parameters πœƒ held fixed

For independent observations π‘₯1,π‘₯2,…,π‘₯𝑛, the joint likelihood is:

$$\mathcal{L}(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} P(X = x_i \mid \theta)$$
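
As an illustration (a minimal Python sketch; the Bernoulli model and the made-up coin flips below are assumptions, not part of the original notes), the joint likelihood of i.i.d. Bernoulli(πœƒ) data is just the product of the per-observation probabilities:

```python
# Joint likelihood of i.i.d. Bernoulli(ΞΈ) observations: L(ΞΈ | x_1..x_n) = ∏ P(X = x_i | ΞΈ).
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])          # made-up flips (1 = heads)

def likelihood(theta, x):
    # P(X = x_i | ΞΈ) is ΞΈ when x_i = 1 and (1 - ΞΈ) when x_i = 0
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

print(likelihood(0.5, x))                 # 0.015625
print(likelihood(4 / 6, x))               # β‰ˆ 0.0219: this ΞΈ explains the data better
```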

Log Likelihood

The log likelihood β„“(πœƒ|π‘₯) is simply the natural logarithm of the likelihood function:

β„“(πœƒ|π‘₯)=logβ„’οΈ€(πœƒ|π‘₯)=log𝑃(𝑋=π‘₯|πœƒ)

For independent observations, the log likelihood becomes:

β„“(πœƒ|π‘₯1,…,π‘₯𝑛)=log(βˆπ‘›π‘–=1𝑃(𝑋=π‘₯𝑖|πœƒ))=βˆ‘π‘›π‘–=1log𝑃(𝑋=π‘₯𝑖|πœƒ)

Why use log likelihood?

  • Converts products to sums (easier to differentiate)
  • More numerically stable (avoids underflow)
  • Preserves the location of the maximum, since log is strictly increasing:

$$\arg\max_{\theta} \mathcal{L}(\theta \mid x) = \arg\max_{\theta} \ell(\theta \mid x)$$
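
The numerical-stability point is easy to see in a short sketch (NumPy, the synthetic data set, and the grid of πœƒ values below are all assumptions for illustration): with a few thousand observations the raw likelihood underflows to 0.0 in double precision, while the log likelihood stays finite and its argmax lands where expected.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=5000)                  # synthetic Bernoulli(0.7) flips

def log_likelihood(theta, x):
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)                 # candidate parameter values
raw = [np.prod(np.where(x == 1, t, 1 - t)) for t in thetas]
ll = [log_likelihood(t, x) for t in thetas]

print(max(raw))                                      # 0.0 -- every raw product underflows
print(thetas[np.argmax(ll)])                         # β‰ˆ 0.7 (grid point nearest the sample mean)
```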

Negative Log Likelihood (NLL)

The negative log likelihood (NLL) is simply the negative of the log likelihood:

NLL(πœƒ|π‘₯)=βˆ’β„“(πœƒ|π‘₯)=βˆ’logβ„’οΈ€(πœƒ|π‘₯)

Why use NLL?

  • Most optimization algorithms (like gradient descent) are designed for minimization
  • NLL minimization ⇔ likelihood maximization
  • Often provides cleaner mathematical expressions

For independent observations:

NLL(πœƒ|π‘₯1,…,π‘₯𝑛)=βˆ’βˆ‘π‘›π‘–=1log𝑃(𝑋=π‘₯𝑖|πœƒ)

References

Stanford CS109: Discrete Random Variables: Basics

Stanford CS109: Discrete Random Variables: More on Expectation

Wikipedia: Likelihood function