path: root/tutorials/module_4/4.5 Statistical Analysis II.md
Diffstat (limited to 'tutorials/module_4/4.5 Statistical Analysis II.md')
-rw-r--r--  tutorials/module_4/4.5 Statistical Analysis II.md | 79
1 file changed, 60 insertions, 19 deletions
diff --git a/tutorials/module_4/4.5 Statistical Analysis II.md b/tutorials/module_4/4.5 Statistical Analysis II.md
index da25643..20805c9 100644
--- a/tutorials/module_4/4.5 Statistical Analysis II.md
+++ b/tutorials/module_4/4.5 Statistical Analysis II.md
@@ -18,10 +18,8 @@ $$
You may have asked yourself, "What if my data is not linear?" If the variables in your data are related by an exponential or power law, we can use a logarithm trick: apply a log transform to linearize the relationship, then apply the linear least-squares method.
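+As a minimal sketch of the log trick (synthetic data assumed): for $y = a e^{bx}$, taking $\ln y$ gives the straight line $\ln y = \ln a + b x$, which an ordinary linear fit can handle.
+```python
+import numpy as np
+
+# Synthetic exponential data (values assumed for illustration): y = 2.5 * exp(0.8 x)
+x = np.linspace(0, 2, 20)
+y = 2.5 * np.exp(0.8 * x)
+
+# Linearize with a logarithm, then do a linear least-squares fit
+b, ln_a = np.polyfit(x, np.log(y), deg=1)  # slope = b, intercept = ln(a)
+print(f"a = {np.exp(ln_a):.3f}, b = {b:.3f}")
+```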
### Polynomial
-https://www.geeksforgeeks.org/machine-learning/python-implementation-of-polynomial-regression/
The least-squares method can also be applied to polynomial functions. For a non-linear function such as a polynomial, NumPy has a convenient built-in feature.
-
```python
x_d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
y_d = np.array([0, 0.8, 0.9, 0.1, -0.6, -0.8, -1, -0.9, -0.4])
@@ -42,14 +40,14 @@ plt.show()
```
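+The middle of the block above falls outside this diff hunk; as a self-contained sketch (degree 3 is an assumed choice), the fit itself can be done with `np.polyfit`:
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+x_d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
+y_d = np.array([0, 0.8, 0.9, 0.1, -0.6, -0.8, -1, -0.9, -0.4])
+
+# Least-squares fit of a cubic polynomial; coefficients come back highest power first
+coeffs = np.polyfit(x_d, y_d, deg=3)
+p = np.poly1d(coeffs)  # callable polynomial built from the coefficients
+
+x_fit = np.linspace(x_d.min(), x_d.max(), 200)
+plt.plot(x_d, y_d, 'o', label='data')
+plt.plot(x_fit, p(x_fit), '-', label='degree-3 fit')
+plt.legend()
+plt.show()
+```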
### Using Scipy
+You can also use SciPy's `optimize.curve_fit` to fit a chosen function form directly, for example an exponential:
```python
# let's define the function form
def func(x, a, b):
y = a*np.exp(b*x)
return y
-alpha, beta = optimize.curve_fit(func, xdata =
- x, ydata = y)[0]
+alpha, beta = optimize.curve_fit(func, xdata = x, ydata = y)[0]
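+# curve_fit returns (popt, pcov); the [0] above keeps only the fitted parameters.
+# An initial guess often helps exponential fits, e.g. optimize.curve_fit(func, x, y, p0=(1, 0.1)).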
print(f'alpha={alpha}, beta={beta}')
# Let's have a look at the data
@@ -79,7 +77,6 @@ Where:
* $\hat{y}_i$ = predicted data from the model
* $\bar{y}$ = mean of observed data
#### Standard Error of the Estimate
-
If the scatter of data about the regression line is approximately normal, the **standard error of the estimate** represents the typical deviation of a point from the fitted line:
$$
@@ -88,27 +85,25 @@ $$
where $n$ is the number of data points.
Smaller $s_{y/x}$ means the regression line passes closer to the data points.
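+A small sketch of the calculation (assuming the usual straight-line case, where two coefficients are estimated and the divisor is $n-2$):
+```python
+import numpy as np
+
+def standard_error(y, y_hat, n_params=2):
+    """s_{y/x} = sqrt(S_r / (n - n_params)); n_params = 2 for a straight-line fit."""
+    S_r = np.sum((y - y_hat) ** 2)  # sum of squared residuals
+    return np.sqrt(S_r / (len(y) - n_params))
+```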
-#### Coefficient of Determination – (R^2)
+#### Coefficient of Determination – ($R^2$)
The coefficient of determination, $R^2$, tells us how much of the total variation in $y$ is explained by the regression:
-
$$
R^2 = \frac{S_t - S_r}{S_t} = 1 - \frac{S_r}{S_t}
$$
-* (R^2 = 1.0) → perfect fit (all points on the line)
-* (R^2 = 0) → model explains none of the variation
-
-In engineering terms, a high (R^2) indicates that your model captures most of the physical trend — for example, how deflection scales with load.
+- $R^2 = 1.0$ → perfect fit (all points on the line)
+- $R^2 = 0$ → model explains none of the variation
+
+In engineering terms, a high $R^2$ indicates that your model captures most of the physical trend, such as how deflection scales with load.
-
-#### Correlation Coefficient – (r)
-For linear regression, the **correlation coefficient** (r) is the square root of (R^2), with sign matching the slope of the line:
+#### Correlation Coefficient – ($r$)
+For linear regression, the **correlation coefficient** $r$ is the square root of $R^2$, with sign matching the slope of the line:
$$
r = \pm \sqrt{R^2}
$$
-* (r > 0): positive correlation (both variables increase together)
-* (r < 0): negative correlation (one increases, the other decreases)
-#### Example – Evaluating Fit in Python
+- $r > 0$: positive correlation (both variables increase together)
+- $r < 0$: negative correlation (one increases, the other decreases)
+## Problem 1: Goodness of fit
+Fit linear and polynomial models to stress–strain data. Compute $R^2$ and discuss which model fits better.
```python
import numpy as np
@@ -134,5 +129,51 @@ print(f"r = {r:.3f}")
```
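+One possible starting point for Problem 1 (the stress–strain values below are made-up placeholders):
+```python
+import numpy as np
+
+# Placeholder stress-strain data; replace with your own measurements
+strain = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005])
+stress = np.array([0.0, 1.9, 3.7, 5.2, 6.2, 6.8])  # MPa
+
+def r_squared(y, y_hat):
+    """R^2 = 1 - S_r / S_t."""
+    S_r = np.sum((y - y_hat) ** 2)        # residual sum of squares
+    S_t = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
+    return 1 - S_r / S_t
+
+for deg in (1, 2):
+    coeffs = np.polyfit(strain, stress, deg)
+    y_hat = np.polyval(coeffs, strain)
+    print(f"degree {deg}: R^2 = {r_squared(stress, y_hat):.4f}")
+```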
## Extrapolation
-basis funct
-## Moving average \ No newline at end of file
+Once we have a regression model, it's tempting to use it to predict values beyond the range of measured data. This process is called **extrapolation**.
+
+In interpolation, the model is supported by real data on both sides of the point. In extrapolation, we assume that the same physical relationship continues indefinitely, and that is often not true in engineering systems.
+
+Most regression equations are empirical: they describe the trend over the range of observed conditions but may not capture the true physics. Common failure modes include nonlinear behavior outside the fitted range (e.g. stress–strain curves), physical limits (temperatures below absolute zero, efficiencies above 100%), and a change of mechanism that makes the model inapplicable (e.g. heat transfer shifting from convection-dominated to radiation-dominated at higher temperatures).
+
+Some guidelines for using extrapolation:
+- Plot the data used for fitting
+- Avoid predicting far beyond the range of your data unless supported by a physical model (see the sketch below)
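+A quick synthetic illustration of the risk (all values assumed): fit a straight line to the nearly linear part of a quadratic curve and compare predictions outside the fitted range.
+```python
+import numpy as np
+
+# "True" behavior is quadratic, but we only measure the small-x region
+x_meas = np.linspace(0, 2, 10)
+y_meas = 1.0 * x_meas + 0.5 * x_meas**2
+
+slope, intercept = np.polyfit(x_meas, y_meas, deg=1)
+
+for x_new in (1.0, 5.0, 10.0):
+    y_true = 1.0 * x_new + 0.5 * x_new**2
+    y_pred = slope * x_new + intercept
+    print(f"x = {x_new:4.1f}: linear model = {y_pred:6.2f}, true = {y_true:6.2f}")
+# Near the data the line is fine; far outside it badly underpredicts the quadratic.
+```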
+## Moving average
+Real experimental data often contains small random fluctuations, i.e. noise, that obscure the underlying trend. Rather than fitting a more complex equation, we can smooth the data using a moving average, which replaces each point with the average of its nearby neighbors. This simple method reduces random variation while preserving the overall shape of the signal.
+
+A moving average, or rolling mean, takes the average over a sliding window of data points, given by the equation:
+$$\bar{y}_i = \frac{1}{N} \sum_{j=i-k}^{i+k} y_j$$
+where:
+- $N$ = window size (number of points averaged),
+- $k = (N-1)/2$ if the window is centered,
+- $y_j$ = original data values.
+
+A larger window gives a smoother curve but loses detail; a smaller window retains more detail but removes less noise.
+### Example: Smoothing sensor noise
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import pandas as pd
+
+# Generate noisy signal
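+np.random.seed(0)  # fixed seed (added) so the noisy example is reproducible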
+x = np.linspace(0, 4*np.pi, 100)
+y = np.sin(x) + 0.2*np.random.randn(100)
+
+# Apply moving average with different window sizes
+df = pd.DataFrame({'x': x, 'y': y})
+df['y_smooth_5'] = df['y'].rolling(window=5, center=True).mean()
+df['y_smooth_15'] = df['y'].rolling(window=15, center=True).mean()
+
+plt.plot(df['x'], df['y'], 'k.', alpha=0.4, label='Raw data')
+plt.plot(df['x'], df['y_smooth_5'], 'r-', label='Window = 5')
+plt.plot(df['x'], df['y_smooth_15'], 'b-', label='Window = 15')
+plt.xlabel('Time (s)')
+plt.ylabel('Signal')
+plt.title('Effect of Moving Average Window Size')
+plt.legend()
+plt.show()
+```
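+The same smoothing works without pandas, e.g. via convolution (a sketch; note that `mode='same'` zero-pads, so the first and last few points are biased toward zero, unlike pandas' NaN edges):
+```python
+import numpy as np
+
+def moving_average(y, N):
+    """Centered moving average with window size N via convolution."""
+    kernel = np.ones(N) / N
+    return np.convolve(y, kernel, mode='same')  # zero-padded at the edges
+```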
+
+## Problem 2: Moving average
+Apply a moving average to noisy temperature data and compare the raw and smoothed signals.
+