summaryrefslogtreecommitdiff
path: root/tutorials/module_4/4.4 Statistical Analysis.md
blob: bf3a8bd0f8dac4f75c9d28af4186daab42bdb66d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# Statistical Analysis
**Learning Objectives:**

- Descriptive statistics (mean, median, variance, std deviation)
- Histograms and probability distributions
- Correlation and regression
- Uncertainty, error bars, confidence intervals
---
## Statistical tools
Numpy comes with some useful statistical tools that we can use to analyze our data. We can use these tools when working with data, it’s important to understand the **central tendency** and **spread** of your dataset. NumPy provides several built-in functions to quickly compute common statistical metrics such as **mean**, **median**, **standard deviation**, and **variance**. These are fundamental tools for analyzing measurement consistency, uncertainty, and identifying trends in data.
```python
import numpy as np

mean = np.mean([1, 2, 3, 4, 5])
median = np.median([1, 2, 3, 4, 5])
std = np.std([1, 2, 3, 4, 5])
variance = np.var([1, 2, 3, 4, 5])
```

As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.