Diffstat (limited to 'tutorials/module_4/4.4 Statistical Analysis.md')
| -rw-r--r-- | tutorials/module_4/4.4 Statistical Analysis.md | 108 |
1 files changed, 1 insertions, 107 deletions
diff --git a/tutorials/module_4/4.4 Statistical Analysis.md b/tutorials/module_4/4.4 Statistical Analysis.md
index 4a90ba5..bf3a8bd 100644
--- a/tutorials/module_4/4.4 Statistical Analysis.md
+++ b/tutorials/module_4/4.4 Statistical Analysis.md
@@ -17,110 +17,4 @@ std = np.std([1, 2, 3, 4, 5])
 variance = np.var([1, 2, 3, 4, 5])
 ```
 
-As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
-
-## Linear Regression
-### What is Linear Regression?
-Linear regression is one of the most fundamental techniques in data analysis.
-It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
-
-Mathematically, the model assumes a linear equation:
-$$
-y = m x + b
-$$
-where
-- $y$ = dependent variable
-- $x$ = independent variable
-- $m$ = slope (rate of change)
-- $b$ = intercept (value of $y$ when $x = 0$)
-
-Linear regression helps identify proportional relationships, estimate calibration constants, or model linear system responses.
-
-### Problem 1: Stress–Strain Relationship
-Let’s assume we’ve measured the stress (σ) and strain (ε) for a material test and want to estimate Young’s modulus (E) from the slope.
-
-```python
-import numpy as np
-import pandas as pd
-import matplotlib.pyplot as plt
-
-# Example data (strain, stress)
-strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
-stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
-
-# Fit a linear regression line using NumPy
-coeffs = np.polyfit(strain, stress, deg=1)
-m, b = coeffs
-print(f"Slope (E) = {m:.2f} MPa, Intercept = {b:.2f}")
-
-# Predicted stress
-stress_pred = m * strain + b
-
-# Plot
-plt.figure()
-plt.scatter(strain, stress, label="Experimental Data", color="navy")
-plt.plot(strain, stress_pred, color="red", label="Linear Fit")
-plt.xlabel("Strain (mm/mm)")
-plt.ylabel("Stress (MPa)")
-plt.title("Linear Regression – Stress–Strain Curve")
-plt.legend()
-plt.grid(True)
-plt.show()
-
-```
-
-The slope `m` represents the Young’s Modulus (E), showing the stiffness of the material in the linear elastic region.
-
-```python
-import numpy as np
-import matplotlib.pyplot as plt
-from sklearn.linear_model import LinearRegression
-from sklearn.metrics import r2_score, mean_squared_error
-
-# ------------------------------------------------
-# 1. Example Data: Stress vs. Strain
-# (Simulated material test data)
-strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
-stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
-
-# Reshape strain for scikit-learn (expects 2D input)
-X = strain.reshape(-1, 1)
-y = stress
-
-# ------------------------------------------------
-# 2. Fit Linear Regression Model
-model = LinearRegression()
-model.fit(X, y)
-
-# Extract slope and intercept
-m = model.coef_[0]
-b = model.intercept_
-print(f"Linear model: Stress = {m:.2f} * Strain + {b:.2f}")
-
-# ------------------------------------------------
-# 3. Predict Stress Values and Evaluate the Fit
-y_pred = model.predict(X)
-
-# Coefficient of determination (R²)
-r2 = r2_score(y, y_pred)
-
-# Root mean square error (RMSE)
-rmse = np.sqrt(mean_squared_error(y, y_pred))
-
-print(f"R² = {r2:.4f}")
-print(f"RMSE = {rmse:.3f} MPa")
-
-# ------------------------------------------------
-# 4. Visualize Data and Regression Line
-plt.figure(figsize=(6, 4))
-plt.scatter(X, y, color="navy", label="Experimental Data")
-plt.plot(X, y_pred, color="red", label="Linear Fit")
-plt.xlabel("Strain (mm/mm)")
-plt.ylabel("Stress (MPa)")
-plt.title("Linear Regression – Stress–Strain Relationship")
-plt.legend()
-plt.grid(True)
-plt.tight_layout()
-plt.show()
-
-```
\ No newline at end of file
+As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
\ No newline at end of file
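
The paragraph this commit keeps names the pandas summary methods without showing them in use. The following is a minimal sketch of how they might be called, not part of the commit itself. The DataFrame, its column names (`temperature`, `pressure`), and its values are hypothetical and invented purely for illustration. It also shows one detail worth checking when mixing the two libraries: `np.std` defaults to the population formula (`ddof=0`), while pandas `.std()` defaults to the sample formula (`ddof=1`).

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, invented for illustration only
df = pd.DataFrame({
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# One call summarizes every numeric column:
# count, mean, std, min, quartiles, and max
summary = df.describe()
print(summary)

# Column-wise statistics without looping over columns
col_means = df.mean()
col_vars = df.var()

# pandas uses the sample standard deviation (ddof=1) by default,
# whereas NumPy uses the population standard deviation (ddof=0)
sample_std = df["temperature"].std()
population_std = np.std(df["temperature"].to_numpy())

print(f"pandas std (ddof=1): {sample_std:.4f}")
print(f"NumPy std  (ddof=0): {population_std:.4f}")
```

If the two libraries need to agree, passing `ddof` explicitly (for example `df.std(ddof=0)` or `np.std(values, ddof=1)`) removes the ambiguity.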
