


# Statistical Analysis
**Learning Objectives:**

- Descriptive statistics (mean, median, variance, std deviation)
- Histograms and probability distributions
- Correlation and regression
- Uncertainty, error bars, confidence intervals
---
## Engineering Models
Why care? By analyzing data, engineers can use statistical tools to build a mathematical model that helps predict how a system will behave.
- Curve fitting: You've probably used Excel for this before; we will do it with Python.
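As a rough sketch of what curve fitting looks like in Python, the example below fits a straight line to a small data set with `np.polyfit` and checks how strongly the two variables are correlated with `np.corrcoef`. The load and extension values are made up for illustration and are not from a real experiment.

```python
import numpy as np

# Hypothetical measurements (illustration only): applied load vs. spring extension
load = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])          # N
extension = np.array([0.1, 2.1, 3.9, 6.2, 7.8, 10.1])    # mm

# Fit a degree-1 polynomial: extension ~ slope * load + intercept
slope, intercept = np.polyfit(load, extension, 1)

# Use the fitted model to predict the extension at a load we did not measure
predicted = slope * 6.0 + intercept

# Pearson correlation coefficient between the two variables
r = np.corrcoef(load, extension)[0, 1]

print(f"slope = {slope:.3f} mm/N, intercept = {intercept:.3f} mm")
print(f"predicted extension at 6 N = {predicted:.2f} mm")
print(f"correlation coefficient r = {r:.4f}")
```

This is the same least-squares trendline that Excel draws; the difference is that the model coefficients are now variables you can reuse in later calculations.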

## Statistics Review
Let's take a second to remind ourselves of some statistical terms and how they are defined mathematically.

| Statistic                | Formula                                                                                                          |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------- |
| Arithmetic Mean          | $$\bar{y} = \frac{\sum y_i}{n}$$                                                                                 |
| Standard Deviation       | $$s_y = \sqrt{\frac{S_t}{n - 1}}, \quad S_t = \sum (y_i - \bar{y})^2$$                                           |
| Variance                 | $$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}{n- 1}$$ |
| Coefficient of Variation | $$c.v. = \frac{s_y}{\bar{y}} \times 100\%$$                                                                      |
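To make the formulas concrete, here is a minimal sketch that evaluates each row of the table by hand with NumPy arrays. The five readings in `y` are hypothetical values chosen so the arithmetic is easy to check.

```python
import numpy as np

# Hypothetical repeated measurements (illustration only)
y = np.array([9.8, 10.2, 10.1, 9.9, 10.0])
n = y.size

# Arithmetic mean: sum of the values divided by the number of values
y_bar = y.sum() / n

# Total sum of squared residuals about the mean
S_t = np.sum((y - y_bar) ** 2)

# Sample standard deviation and variance (divide by n - 1)
s_y = np.sqrt(S_t / (n - 1))
variance = S_t / (n - 1)

# Equivalent form of the variance that does not need the mean first
variance_alt = (np.sum(y ** 2) - y.sum() ** 2 / n) / (n - 1)

# Coefficient of variation, in percent
cv = s_y / y_bar * 100

print(f"mean = {y_bar:.3f}, std = {s_y:.4f}, variance = {variance:.4f}, c.v. = {cv:.2f}%")
```

Both variance expressions agree up to floating-point round-off, which is a useful sanity check when implementing a formula yourself.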

## Statistics functions in Python
Both NumPy and Pandas come with useful statistical tools for analyzing data. When working with data, it's important to understand the **central tendency** and **spread** of your dataset. NumPy provides several built-in functions to quickly compute common statistical metrics such as **mean**, **median**, **standard deviation**, and **variance**. These are fundamental tools for assessing measurement consistency, quantifying uncertainty, and identifying trends in data.
```python
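import numpy as np

data = np.array([1, 2, 3, 4, 5])

mean = np.mean(data)      # arithmetic mean
median = np.median(data)  # middle value of the sorted data
# NumPy defaults to the population formulas (divide by n).
# Pass ddof=1 to divide by n - 1, matching the sample formulas above.
std = np.std(data, ddof=1)
variance = np.var(data, ddof=1)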
```

Pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. You can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. Unlike NumPy's defaults, the pandas `.std()` and `.var()` methods divide by $n - 1$ by default. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
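A short sketch of these methods on a small DataFrame is shown below. The column names and values are hypothetical, chosen only to show what the output looks like.

```python
import pandas as pd

# Hypothetical sensor log (illustration only)
df = pd.DataFrame({
    "temperature_C": [20.1, 20.4, 19.8, 20.0, 20.2],
    "pressure_kPa": [101.2, 101.5, 100.9, 101.1, 101.3],
})

# One call summarizes every numeric column:
# count, mean, std, min, quartiles, and max
summary = df.describe()
print(summary)

# Individual statistics are available per column as well
temp_mean = df["temperature_C"].mean()
temp_std = df["temperature_C"].std()   # sample standard deviation (n - 1)
print(f"temperature: {temp_mean:.2f} +/- {temp_std:.2f} C")
```

Because `summary` is itself a DataFrame, a single value can be pulled out with `summary.loc["mean", "temperature_C"]`.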

## Problem: Use `.describe()` to report on a data series


---

Great, so we 
## Statistical Distributions
Normal distributions
<img src="image_1761513820040.png" width="650">
- Design thinking: Motorola developed the Six Sigma quality methodology, which is based on the probability of a product failing. It has since been adopted worldwide.
- Statistical analysis of data.
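To get a feel for the normal distribution, the sketch below draws random samples from one, builds a histogram, and counts what fraction of samples fall within 1, 2, and 3 standard deviations of the mean. The mean, standard deviation, sample size, and seed are arbitrary assumptions for illustration.

```python
import numpy as np

# Assumed parameters (illustration only): a dimension with mean 50 and std 2
mu, sigma = 50.0, 2.0
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# Histogram normalized so that the bar areas sum to 1 (a density estimate)
density, bin_edges = np.histogram(samples, bins=40, density=True)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Theoretical normal probability density evaluated at the bin centers
pdf = np.exp(-0.5 * ((bin_centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Fraction of samples within k standard deviations of the mean
fractions = {}
for k in (1, 2, 3):
    fractions[k] = np.mean(np.abs(samples - mu) < k * sigma)
    print(f"within {k} sigma: {fractions[k]:.4f}")
```

The fractions should land close to the familiar 68%, 95%, and 99.7% figures, and `density` should track `pdf` closely. The share of samples outside a chosen number of standard deviations is the kind of failure probability that sigma-based quality targets are built on.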

## Spectroscopy
### Background
Spectroscopy is the study of how matter interacts with electromagnetic radiation, including the absorption and emission of light and other forms of radiation. It examines how these interactions depend on the wavelength of the radiation, providing insight into the physical and chemical properties of materials. In simple terms, spectroscopy helps us understand what substances are made of and how they behave when exposed to energy.

In engineering applications, spectroscopy is a powerful diagnostic and analysis tool. It can be used for material identification, such as how NASA determines the composition of planetary surfaces and atmospheres. It’s also applied in combustion and thermal analysis, where emission spectroscopy measures plasma temperatures and monitors exhaust composition in rocket engines. These applications allow engineers to better understand material behavior under extreme conditions and improve system performance and efficiency.


## Problem: Spectroscopy