-rw-r--r--  tutorials/module_4/4.4 Statistical Analysis.md     28
-rw-r--r--  tutorials/module_4/4.5 Statistical Analysis II.md  63
2 files changed, 83 insertions, 8 deletions
diff --git a/tutorials/module_4/4.4 Statistical Analysis.md b/tutorials/module_4/4.4 Statistical Analysis.md
index 09ac1fb..de6de07 100644
--- a/tutorials/module_4/4.4 Statistical Analysis.md
+++ b/tutorials/module_4/4.4 Statistical Analysis.md
@@ -7,10 +7,21 @@
- Uncertainty, error bars, confidence intervals
---
## Engineering Models
-
+Why care? By analyzing data, engineers can use statistical tools to build a mathematical model that helps us predict behaviour. You've probably used Excel for this before; we will do it with Python.
- Curve fitting
--
-## Statistical tools
+
+## Statistics Review
+Let's take a moment to remind ourselves of some statistical terms and how we define them mathematically.
+
+| Quantity | Formula |
+| ------------------------ | ---------------------------------------------------------------------------------------------------------------- |
+| Arithmetic Mean | $$\bar{y} = \frac{\sum y_i}{n}$$ |
+| Standard Deviation | $$s_y = \sqrt{\frac{S_t}{n - 1}}, \quad S_t = \sum (y_i - \bar{y})^2$$ |
+| Variance | $$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}{n- 1}$$ |
+| Coefficient of Variation | $$c.v. = \frac{s_y}{\bar{y}} \times 100\%$$ |
+
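To make the table concrete, here is a quick check of the sample formulas against NumPy (the measurement values below are made up). Note that `np.std` defaults to the population formula, dividing by $n$; pass `ddof=1` to match the sample standard deviation defined above.

```python
import numpy as np

# hypothetical measurements
y = np.array([2.0, 2.4, 2.2, 2.1, 2.3])
n = y.size

y_bar = y.sum() / n             # arithmetic mean
S_t = ((y - y_bar) ** 2).sum()  # total sum of squared deviations
s_y = np.sqrt(S_t / (n - 1))    # sample standard deviation
cv = s_y / y_bar * 100          # coefficient of variation in %

# np.std divides by n unless ddof=1 is given
assert np.isclose(s_y, np.std(y, ddof=1))
print(y_bar, s_y, cv)
```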
+## Statistical functions in Python
Both NumPy and Pandas come with useful statistical tools for analyzing data. When working with data, it's important to understand the **central tendency** and **spread** of your dataset. NumPy provides several built-in functions to quickly compute common statistical metrics such as **mean**, **median**, **standard deviation**, and **variance**. These are fundamental tools for analyzing measurement consistency and uncertainty and for identifying trends in data.
```python
import numpy as np
@@ -23,8 +34,17 @@ variance = np.var([1, 2, 3, 4, 5])
Pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. When working with pandas we can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
-## Statistical Distribution
+## Problem: Use `.describe()` to report on a data series
+
+
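A minimal sketch of one way to approach this problem (the series name and readings are made up):

```python
import pandas as pd

# hypothetical temperature readings
temps = pd.Series([20.1, 20.4, 19.8, 20.0, 20.3, 20.2], name="temp_C")

# count, mean, std, min, quartiles, and max in one call
summary = temps.describe()
print(summary)
```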
+---
+
+Great, so we can now generate quick summaries of a dataset. Next, let's look at how the values themselves are distributed.
+## Statistical Distributions
+Many measured quantities cluster around their mean following a **normal distribution**.
+- Design thinking: Motorola founded the Six Sigma methodology on the probability of a product failing; it has since been adopted worldwide.
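As a sketch of the idea behind Six Sigma, we can use `scipy.stats.norm` to compute the probability that a normally distributed measurement falls within $\pm k$ standard deviations of its mean:

```python
from scipy import stats

# probability that a normally distributed value lands within ±k sigma
for k in (1, 2, 3, 6):
    p_inside = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within ±{k} sigma: {p_inside:.9f}")
```

At $\pm 6\sigma$ the chance of a value falling outside the limits is only about two in a billion, which is where the name comes from.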
## Problem: Spectroscopy
+Let's apply these statistical tools to a spectroscopy dataset.
diff --git a/tutorials/module_4/4.5 Statistical Analysis II.md b/tutorials/module_4/4.5 Statistical Analysis II.md
index df1b585..3df558b 100644
--- a/tutorials/module_4/4.5 Statistical Analysis II.md
+++ b/tutorials/module_4/4.5 Statistical Analysis II.md
@@ -1,17 +1,72 @@
# 4.5 Statistical Analysis II
-[Introduction text]
+As mentioned in the previous tutorial, data is what gives us the basis to create models. By now you've probably used Excel to create a line of best fit. In this tutorial, we will go deeper into how this works and how we can apply it to create our own models and make our own predictions.
## Least Square Regression and Line of Best Fit
+
+
### What is Linear Regression?
Linear regression is one of the most fundamental techniques in data analysis. It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
-Linear regression helps identify proportional relationships, estimate calibration constants, or model linear system responses.
+### Linear
+To find a linear regression line we can apply the **least squares** method: choose the slope $a_1$ and intercept $a_0$ of $y = a_0 + a_1 x$ that minimize the sum of the squared residuals $S_r = \sum (y_i - a_0 - a_1 x_i)^2$.
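As a sketch, the least squares slope and intercept can be computed directly from the data, and `np.polyfit` with degree 1 gives the same line (the data points here are made up):

```python
import numpy as np

# made-up calibration data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
n = x.size

# closed-form least squares slope and intercept
a1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
a0 = y.mean() - a1 * x.mean()

# np.polyfit with degree 1 solves the same least squares problem
slope, intercept = np.polyfit(x, y, 1)
print(a1, a0, slope, intercept)
```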
-## Least square fitting
+### Exponential and Power functions
+Exponential ($y = a e^{bx}$) and power ($y = a x^b$) models can be fitted with a logarithm trick: taking the logarithm of both sides gives $\ln y = \ln a + bx$ (or $\ln y = \ln a + b \ln x$), which is a straight line, so the linear least squares machinery applies.
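A minimal sketch of the logarithm trick for an exponential model, using made-up noise-free data so the recovered parameters come out exact:

```python
import numpy as np

# made-up data that follows y = a * exp(b x) with a = 2.0, b = 0.8
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(0.8 * x)

# ln y = ln a + b x is a straight line in (x, ln y)
b, ln_a = np.polyfit(x, np.log(y), 1)
a = np.exp(ln_a)
print(a, b)
```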
+### Polynomial
+For non-linear functions such as polynomials, NumPy has a convenient feature: `np.polyfit`.
-## Extrapolation
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+x_d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
+y_d = np.array([0, 0.8, 0.9, 0.1, -0.6, -0.8, -1, -0.9, -0.4])
+
+plt.figure(figsize = (12, 8))
+for i in range(1, 7):
+
+ # get the polynomial coefficients
+ y_est = np.polyfit(x_d, y_d, i)
+ plt.subplot(2,3,i)
+ plt.plot(x_d, y_d, 'o')
+ # evaluate the values for a polynomial
+ plt.plot(x_d, np.polyval(y_est, x_d))
+ plt.title(f'Polynomial order {i}')
+
+plt.tight_layout()
+plt.show()
+```
+
+### Using Scipy
+```python
+# let's define the function form
+def func(x, a, b):
+ y = a*np.exp(b*x)
+ return y
+
+alpha, beta = optimize.curve_fit(func, xdata =
+ x, ydata = y)[0]
+print(f'alpha={alpha}, beta={beta}')
+
+# Let's have a look at the data and the fitted curve
+plt.figure(figsize = (10,8))
+plt.plot(x, y, 'b.')
+plt.plot(x, alpha*np.exp(beta*x), 'r')
+plt.xlabel('x')
+plt.ylabel('y')
+plt.show()
+```
+
+
+
+
+### How well did we do?
+
+Using the **coefficient of determination**, $r^2 = (S_t - S_r)/S_t$, where $S_t = \sum (y_i - \bar{y})^2$ is the spread around the mean and $S_r$ is the sum of squared residuals around the fit, we can quantify how much of the variation in the data the model explains; a value close to 1 indicates a good fit.
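As a sketch, $r^2$ can be computed from these definitions for a straight-line fit (the data points are made up):

```python
import numpy as np

# made-up data and a straight-line fit
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
coeffs = np.polyfit(x, y, 1)
y_fit = np.polyval(coeffs, x)

S_t = ((y - y.mean()) ** 2).sum()  # spread around the mean
S_r = ((y - y_fit) ** 2).sum()     # spread around the fit
r2 = (S_t - S_r) / S_t
print(f"r^2 = {r2:.4f}")
```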
+
+## Extrapolation
+The choice of basis functions matters here: a model that fits well inside the measured range, such as a high-order polynomial, can diverge quickly outside it, so extrapolated predictions should be treated with caution.
## Moving average