Day 10: Probability and Statistics - Basic Concepts, Relevant Distributions - Expanded#

Objective#

Deepen your mastery of probability and statistics, crucial for analyzing data and making data-driven decisions. This lesson provides a thorough exploration of the core concepts, various distributions, and practical applications using Python, equipping you with the skills to perform detailed statistical analysis and understand data distributions.

Prerequisites#

  • Proficient in Python programming.

  • Basic understanding of algebra and mathematical concepts.

  • A keen interest in data analysis and interpretation.

Introduction to Probability and Statistics#

Probability and statistics are interrelated disciplines that help us quantify uncertainty, analyze random phenomena, and make sense of data.

Probability: The Mathematics of Uncertainty#

Probability offers a framework for quantifying the uncertainty of events. It’s foundational for statistical inference, where we use data to make predictions about broader populations.

  • Random Experiment: An experiment where the outcome cannot be predicted with certainty.

  • Sample Space (S): The set of all possible outcomes of a random experiment.

  • Event (E): Any subset of the sample space.

  • Probability of an Event (P(E)): A measure ranging from 0 (impossibility) to 1 (certainty), denoting the likelihood of occurrence of an event.

Statistics: The Science of Data#

Statistics is the practice of collecting, analyzing, interpreting, and presenting data. It’s about extracting meaning from data and making informed decisions.

  • Descriptive Statistics: Methods for summarizing and organizing data. Includes measures like mean, median, mode, variance, and standard deviation.

  • Inferential Statistics: Techniques for making predictions or inferences about a population based on a sample. It includes hypothesis testing, confidence intervals, and regression analysis.

Understanding Distributions#

Distributions are fundamental in statistics as they describe how data points are likely to be distributed.

  • Uniform Distribution: The simplest probability distribution where every event has an equal chance of occurring.

  • Normal (Gaussian) Distribution: Many natural phenomena follow this distribution. It’s characterized by its mean (μ) and standard deviation (σ).

  • Binomial Distribution: Describes the number of successes in a fixed number of independent Bernoulli trials, each with its own success probability.

  • Poisson Distribution: Models the number of times an event occurs in a fixed interval of time or space, given the average number of times the event occurs over that interval

Implementing Probability and Statistics in Python#

Let’s dive into practical implementation using Python’s powerful libraries.

Step 1: Import Necessary Libraries#

import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt

Step 2: Descriptive Statistics#

  • Understanding the central tendency and dispersion in your data.

# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Central Tendency
mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data)[0]

# Dispersion
range_ = np.ptp(data)
variance = np.var(data)
std_dev = np.std(data)

print(f"Mean: {mean}, Median: {median}, Mode: {mode}, Range: {range_}, Variance: {variance}, Standard Deviation: {std_dev}") 
Mean: 5.5, Median: 5.5, Mode: 1, Range: 9, Variance: 8.25, Standard Deviation: 2.8722813232690143

Step 3: Understanding Distributions#

  • Visualize and understand different probability distributions.

Normal Distribution Example

# Parameters
mu, sigma = 0, 0.1

# Generating random values
s = np.random.normal(mu, sigma, 1000)

# Plotting
count, bins, ignored = plt.hist(s, 30, density=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r')
plt.title('Normal Distribution')
plt.show()
../_images/61220ab148633e1fad59f4d01737bdace2482dcad293b48e76229d149647321c.png

Binomial Distribution Example

# Parameters
n, p = 10, 0.5

# Generating random values
s = np.random.binomial(n, p, 1000)

# Plotting
plt.hist(s, bins=30, density=True)
plt.title('Binomial Distribution')
plt.show()
../_images/a778a8b0367d95eea8e0c157595fa1fc840f4b9afa64384774062de96ea1b228.png

Poisson Distribution Example

# Parameters
lambda_ = 5

# Generating random values
s = np.random.poisson(lambda_, 1000)

# Plotting
plt.hist(s, bins=30, density=True)
plt.title('Poisson Distribution')
plt.show()
../_images/0742ee6b9102b6f2be55ab630bd3da8f00e306e7ed76fe10be157370163fbb60.png

Step 4: Inferential Statistics#

  • Inferential statistics help us make predictions about a population based on sample data.

# Hypothesis Testing Example
# Null Hypothesis: The sample comes from a population with a mean of μ_0
# Alternative Hypothesis: The sample does not come from a population with a mean of μ_0

# Parameters
mu_0 = 5
alpha = 0.05  # Significance level

# T-test
t_statistic, p_value = stats.ttest_1samp(data, mu_0)

print(f"T-statistic: {t_statistic}, P-value: {p_value}")
print("Reject Null Hypothesis" if p_value < alpha else "Fail to Reject Null Hypothesis")
T-statistic: 0.5222329678670935, P-value: 0.614117254808394
Fail to Reject Null Hypothesis

Conclusion#

Probability and statistics are indispensable in the field of data analysis and decision-making. This lesson provided a deep dive into the core concepts, distributions, and practical Python implementations, laying the groundwork for rigorous data analysis and interpretation.

Further Resources#

https://www.khanacademy.org/math/statistics-probability

https://www.probabilitycourse.com/preface.php

https://www.vfu.bg/en/e-Learning/Math–Bertsekas_Tsitsiklis_Introduction_to_probability.pdf

https://www.youtube.com/watch?v=1uW3qMFA9Ho&list=PLUl4u3cNGP60hI9ATjSFgLZpbNJ7myAg6

https://morningside.libguides.com/math-stats-resources/probability-statistics