Probability Theory 101: Building Blocks for Machine Learning. Chapter 1.
Brain Dump
Oct 6, 2024
Think of probability as a tool that helps us make sense of uncertainty. It's not about knowing what will happen for sure, but about understanding the chances of different outcomes. Probability guides everything from weather forecasts to stock market trends and even the decisions made by AI.
When you interact with AI language models like ChatGPT or Claude or other LLMs, the responses you receive are based on probability calculations. The model identifies the most likely response from a vast array of alternatives, effectively selecting the top contender based on its learned patterns and data.
But here’s the interesting part: even though we use probability all the time, people still haven't come to a clear conclusion about its definition. Mathematicians, philosophers, and scientists often debate what probability truly means. It's a tricky concept that shapes how we understand the uncertain world around us.
This seemingly simple concept is at the heart of everything from weather forecasting to modern machine learning.
Probability theory provides a mathematical framework for reasoning about uncertain events. It is the branch of mathematics that deals with analyzing and quantifying the likelihood of events occurring. In deep learning, we often work with uncertain or incomplete information, and probability theory gives us a formal framework for reasoning about that uncertainty.
Table of Contents:
Basics: events, sample space, and probability
Axioms of probability
Random variables: discrete and continuous
Probability mass functions, joint PMFs, and marginal probabilities
Probability density functions
Conditional probability
Chain rule of probability
Let's start:
Event: An outcome or set of outcomes of an experiment or process.
Sample space: The set of all possible outcomes.
Probability: How likely an event is to occur, expressed as a number between 0 and 1.
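To make these three terms concrete, here is a minimal Python sketch using a single fair die (the names and the example are just for illustration):

```python
from fractions import Fraction

# Sample space: all possible outcomes of rolling one fair die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: a subset of the sample space, e.g. "the roll is even".
event_even = {2, 4, 6}

# Probability with equally likely outcomes:
# (outcomes in the event) / (outcomes in the sample space).
p_even = Fraction(len(event_even), len(sample_space))

print(p_even)  # 1/2 -- a number between 0 and 1
```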
Axioms of Probability:
The fundamental rules or principles that form the foundation of probability theory
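The axioms themselves (Kolmogorov's axioms) can be stated in the same notation used throughout this note:
1. Non-negativity: P(A) ≥ 0 for every event A.
2. Normalization: P(S) = 1, where S is the entire sample space.
3. Additivity: if A and B are mutually exclusive (they cannot happen together), then P(A ∪ B) = P(A) + P(B).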
Key areas of probability theory:
1. Discrete probability: deals with random variables that take countably many values (coin flips, dice rolls, word counts).
2. Continuous probability: deals with random variables that can take any value in a range (waiting times, temperatures, weights).
4. Random Variables:
There are two kinds: discrete and continuous.
4.1. Discrete Random Variable:
Imagine we're flipping a coin. We want to assign numbers to the outcomes so we can do math with them. This assignment is what we call a random variable.
Let's define a random variable X like this: X = 1 if the coin lands heads, and X = 0 if it lands tails.
Now that we've defined our discrete random variable X, we need a way to describe its probability distribution. For discrete random variables like our coin flip example, we use what's called a Probability Mass Function (PMF).
Why do we need to describe the probability distribution?
Because it provides a complete picture of how the random variable behaves.
4.1.a Probability Mass Function (PMF):
A function that gives the probability of each possible value of a single discrete random variable.
The probability mass function for our coin flip example looks like this (for a fair coin): P(X = 1) = 0.5 and P(X = 0) = 0.5.
It's called a mass function because it assigns a weight (or mass) to each discrete value that X can take.
Key properties of a Probability Mass Function (PMF):
1. Every probability is between 0 and 1: 0 ≤ P(X = x) ≤ 1.
2. The probabilities over all possible values of X sum to 1.
Using the PMF, we can easily answer questions like: what is the probability of getting heads? Do all the probabilities sum to 1? The short sketch below shows both.
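Here is a minimal Python sketch of this PMF, assuming a fair coin and the X defined above (1 for heads, 0 for tails):

```python
# PMF of the coin-flip random variable X: X = 1 for heads, X = 0 for tails.
pmf_X = {1: 0.5, 0: 0.5}

# Probability of a specific value, e.g. heads:
print(pmf_X[1])             # 0.5

# Key PMF property: the probabilities over all values sum to 1.
print(sum(pmf_X.values()))  # 1.0
```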
4.1.b Joint Probability Mass Function (Joint PMF):
A function that gives the joint probability for every possible combination of values of two or more discrete random variables considered simultaneously.
Joint probability: P(X = x, Y = y), the probability that X takes the value x and Y takes the value y at the same time.
A joint PMF considers the simultaneous occurrence of multiple events, while a regular PMF considers only one event at a time.
4.1.c PMF vs Joint PMF difference:
Let's consider a scenario with two dice, a red die (R) and a blue die (B).
Key difference:
A joint PMF allows us to consider multiple variables at once, capturing how they occur together, while a regular PMF only deals with one variable at a time.
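A small Python sketch of the difference, assuming both dice are fair and independent (the names are my own, chosen only for illustration):

```python
from fractions import Fraction
from itertools import product

# Regular PMF: one die at a time.
pmf_red = {r: Fraction(1, 6) for r in range(1, 7)}

# Joint PMF: every (red, blue) combination, assuming fair, independent dice.
joint_pmf = {(r, b): Fraction(1, 36) for r, b in product(range(1, 7), repeat=2)}

print(pmf_red[2])         # 1/6  -- P(R = 2)
print(joint_pmf[(2, 3)])  # 1/36 -- P(R = 2, B = 3)
```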
4.1.d Marginal Probability:
The probability of an outcome for one die, regardless of the outcome of the other die.
Consider again the scenario with two dice, a red die (R) and a blue die (B).
Calculating marginal probabilities: for the red die to show 2, we sum the joint probabilities over every possible value of the blue die: P(R = 2) = P(R = 2, B = 1) + P(R = 2, B = 2) + ... + P(R = 2, B = 6) = 6 × 1/36 = 1/6.
Similarly, for the blue die to show 3, we sum all the joint probabilities where the blue die is 3: P(B = 3) = P(R = 1, B = 3) + P(R = 2, B = 3) + ... + P(R = 6, B = 3) = 6 × 1/36 = 1/6.
Interpretation:
The marginal probability P(R = 2) = 1/6 means that the red die shows a 2 one time in six, no matter what the blue die shows.
4.1.e. Relationship between marginal probabilities & joint probabilities:
Marginal probabilities are derived from joint probabilities by summing over all possible values of the other variables.
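Continuing the same assumption of fair, independent dice, here is a short sketch of marginalization as a sum over the other variable:

```python
from fractions import Fraction
from itertools import product

# Joint PMF of the red and blue dice (fair and independent).
joint_pmf = {(r, b): Fraction(1, 36) for r, b in product(range(1, 7), repeat=2)}

# Marginal P(R = 2): sum the joint PMF over every possible blue value.
p_red_2 = sum(joint_pmf[(2, b)] for b in range(1, 7))
print(p_red_2)  # 1/6
```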
4.2. Continuous Random Variable:
Imagine we're measuring the waiting time at a busy coffee shop. Let's define a random variable T as the time (in minutes) a customer waits for their order.
Properties:
T can take any non-negative real value (2 minutes, 2.5 minutes, 3.74138 minutes, and so on).
There are uncountably many possible values, so we cannot list them all.
The probability that T equals any single exact value is 0; probabilities are only meaningful over intervals.
Questions we might ask: What is the probability that a customer waits between 2 and 5 minutes? What is the probability that a customer waits less than 10 minutes?
4.2.f What is a Probability Density Function?
It's a function used specifically for continuous random variables. It describes how likely the variable is to fall near a particular value, rather than giving the probability of a single exact value. For example, the probability that a customer waits exactly 2 minutes for their coffee is zero. Not 2.01 minutes, not 2.002 minutes, not 1.9999 minutes, not 2.000001 minutes, but precisely 2.000000... minutes, not a single millisecond more or less than 2 minutes.
The probability of this exact waiting time is 0.
The fact that the probability of any exact value is 0 shows why we need a different approach for continuous variables. This is where probability density comes in.
Zero probability for an exact value also explains why we need to integrate over an interval to find probabilities, rather than simply evaluating a function at a point.
Key points:
The probability density function f(x) is not a probability but a density. Probabilities are found by integrating the PDF over an interval: P(a ≤ X ≤ b) is the area under f(x) between a and b. The integral of f(x) over all possible x is 1, which means the total area under the entire PDF curve always equals 1.
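As a rough illustration, here is a sketch that assumes a specific distribution for the waiting time (an exponential distribution with a 3-minute average, chosen only for this example) and finds P(2 ≤ T ≤ 5) by integrating the density numerically:

```python
import math

MEAN_WAIT = 3.0  # assumed average waiting time in minutes (hypothetical)

def pdf(t):
    """Exponential density f(t) = (1/mean) * exp(-t/mean) for t >= 0."""
    return math.exp(-t / MEAN_WAIT) / MEAN_WAIT

# P(2 <= T <= 5): integrate the density over the interval (midpoint Riemann sum).
a, b, n = 2.0, 5.0, 100_000
width = (b - a) / n
prob = sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width
print(round(prob, 4))  # ~0.3245, matches exp(-2/3) - exp(-5/3)

# P(T == 2) exactly is 0: a single point has zero width, hence zero area.
```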
Since the PDF is for continuous random variables, what do we have for discrete random variables?
Probability Mass Function:
Both PMFs and PDFs describe the probability distribution of random variables. In this sense, they serve analogous roles for discrete and continuous variables respectively.
5. Conditional Probability:
Conditional probability is the probability of an event occurring, given that another event has already occurred.
An example: diagnosing strep throat. The overall probability that a random person has strep throat is low, but the probability that a person has strep throat given that they have a sore throat, P(strep | sore throat), is much higher. The new information (the symptom) changes the probability we assign.
Why is conditional probability more informative?
Conditional probability helps us update our beliefs based on new information.
In machine learning this is crucial: it allows models to make predictions based on specific input features (like symptoms) rather than overall statistics, which leads to more accurate and useful predictions.
Definition: The conditional probability of event A given event B is denoted as P(A|B) and is defined as:
P(A|B) = P(A ∩ B) / P(B)
Where: P(A ∩ B) is the probability that both A and B occur, and P(B) is the probability of B (which must be greater than 0).
Another simple example: consider a standard deck of 52 playing cards. Let A be the event of drawing a King and B the event of drawing a face card (Jack, Queen, or King). Then P(A ∩ B) = 4/52 (every King is a face card) and P(B) = 12/52, so P(A|B) = (4/52) / (12/52) = 1/3.
Interpretation: If we know we've drawn a face card, the probability of it being a King is 1/3.
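A quick sketch that verifies this by counting, treating Jack, Queen, and King as the face cards:

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

face_cards = [card for card in deck if card[0] in {"J", "Q", "K"}]   # event B: 12 cards
kings_among_faces = [card for card in face_cards if card[0] == "K"]  # A and B: 4 cards

# P(King | face card) = P(A ∩ B) / P(B) = (4/52) / (12/52) = 1/3
p_king_given_face = Fraction(len(kings_among_faces), len(face_cards))
print(p_king_given_face)  # 1/3
```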
Applications in Deep Learning: classification models estimate the probability of a label given the input features, P(label | features), and language models estimate the probability of the next token given the previous tokens, P(next token | context).
6. Chain Rule of Probability:
Chain rule of probability is closely related to conditional probability. It's a direct application & extension of conditional probability to scenarios involving multiple events.
The chain rule is built on the definition of conditional probability: P(A|B) = P(A ∩ B) / P(B).
We can rearrange this to get: P(A ∩ B) = P(A|B) * P(B).
This is the simplest form of the chain rule, for just two events.
Let's consider three events related to diagnosing strep throat: A = having strep throat, B = having a sore throat, C = having a fever.
Simple Two-Event Case:
P(A ∩ B) = P(A|B) * P(B)
Which means: the probability of having strep throat and a sore throat is equal to the probability of having strep throat given that you have a sore throat, times the probability of having a sore throat.
Now, let's extend this to three events using the chain rule:
P(A ∩ B ∩ C) = P(A | B, C) * P(B | C) * P(C)
The probability of having strep throat, a sore throat, and a fever is equal to the probability of strep throat given a sore throat and fever, times the probability of a sore throat given a fever, times the probability of a fever.
This extension of the chain rule allows us to calculate the joint probability of all three events occurring together by breaking it down into conditional probabilities.
The power of this rule is that it can be extended to any number of events.
For example, if we add a fourth symptom D, we can extend it further to:
P(A ∩ B ∩ C ∩ D) = P(A | B, C, D) * P(B | C, D) * P(C | D) * P(D)
This process of breaking down joint probabilities into a product of conditional probabilities is fundamental in many areas of machine learning and probabilistic modeling, as it allows us to work with manageable pieces of information.
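As a sketch of how this decomposition is used, here is a small example with made-up probability values for the strep-throat events (the numbers are purely illustrative; only the chain-rule structure matters):

```python
# Hypothetical numbers, chosen only to show the chain-rule structure.
p_C = 0.20           # P(C): fever
p_B_given_C = 0.50   # P(B | C): sore throat given fever
p_A_given_BC = 0.30  # P(A | B, C): strep throat given sore throat and fever

# Chain rule: P(A ∩ B ∩ C) = P(A | B, C) * P(B | C) * P(C)
p_joint = p_A_given_BC * p_B_given_C * p_C
print(round(p_joint, 4))  # 0.03
```

This is the same structure language models rely on when they score a whole token sequence as a product of next-token conditional probabilities.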
To be continued in the next note, Chapter 2.
#CSCE598 #deeplearning #probability #probabilitytheory #ml #ai #mathematics
#DeepLearningNotes