Diffstat (limited to 'tutorials/module_4')
| -rw-r--r-- | tutorials/module_4/3_linear_regression.md | 106 |
| -rw-r--r-- | tutorials/module_4/4.2 Interpreting Data.md | 40 |
| -rw-r--r-- | tutorials/module_4/4.4 Statistical Analysis.md | 108 |
| -rw-r--r-- | tutorials/module_4/Pandas.md | 16 |
4 files changed, 137 insertions, 133 deletions
diff --git a/tutorials/module_4/3_linear_regression.md b/tutorials/module_4/3_linear_regression.md
index 511ea1a..6c60531 100644
--- a/tutorials/module_4/3_linear_regression.md
+++ b/tutorials/module_4/3_linear_regression.md
@@ -1,18 +1,110 @@
 # Linear Regression
-## Statistical tools
-Numpy comes with some useful statistical tools that we can use to analyze our data.
+###
+
+## Linear Regression
+### What is Linear Regression?
+Linear regression is one of the most fundamental techniques in data analysis.
+It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
+
+Mathematically, the model assumes a linear equation:
+$$
+y = m x + b
+$$
+where
+- $y$ = dependent variable
+- $x$ = independent variable
+- $m$ = slope (rate of change)
+- $b$ = intercept (value of $y$ when $x = 0$)
+
+Linear regression helps identify proportional relationships, estimate calibration constants, or model linear system responses.
+
+### Problem 1: Stress–Strain Relationship
+Let’s assume we’ve measured the stress (σ) and strain (ε) for a material test and want to estimate Young’s modulus (E) from the slope.
 ```python
 import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# Example data (strain, stress)
+strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
+stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
+
+# Fit a linear regression line using NumPy
+coeffs = np.polyfit(strain, stress, deg=1)
+m, b = coeffs
+print(f"Slope (E) = {m:.2f} MPa, Intercept = {b:.2f}")
+
+# Predicted stress
+stress_pred = m * strain + b
+
+# Plot
+plt.figure()
+plt.scatter(strain, stress, label="Experimental Data", color="navy")
+plt.plot(strain, stress_pred, color="red", label="Linear Fit")
+plt.xlabel("Strain (mm/mm)")
+plt.ylabel("Stress (MPa)")
+plt.title("Linear Regression – Stress–Strain Curve")
+plt.legend()
+plt.grid(True)
+plt.show()
-mean = np.mean([1, 2, 3, 4, 5])
-median = np.median([1, 2, 3, 4, 5])
-std = np.std([1, 2, 3, 4, 5])
-variance = np.var([1, 2, 3, 4, 5])
 ```
+The slope `m` represents the Young’s Modulus (E), showing the stiffness of the material in the linear elastic region.
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import r2_score, mean_squared_error
-###
+# ------------------------------------------------
+# 1. Example Data: Stress vs. Strain
+# (Simulated material test data)
+strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
+stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
+
+# Reshape strain for scikit-learn (expects 2D input)
+X = strain.reshape(-1, 1)
+y = stress
+
+# ------------------------------------------------
+# 2. Fit Linear Regression Model
+model = LinearRegression()
+model.fit(X, y)
+
+# Extract slope and intercept
+m = model.coef_[0]
+b = model.intercept_
+print(f"Linear model: Stress = {m:.2f} * Strain + {b:.2f}")
+
+# ------------------------------------------------
+# 3. Predict Stress Values and Evaluate the Fit
+y_pred = model.predict(X)
+
+# Coefficient of determination (R²)
+r2 = r2_score(y, y_pred)
+
+# Root mean square error (RMSE)
+rmse = np.sqrt(mean_squared_error(y, y_pred))
+
+print(f"R² = {r2:.4f}")
+print(f"RMSE = {rmse:.3f} MPa")
+
+# ------------------------------------------------
+# 4. Visualize Data and Regression Line
+plt.figure(figsize=(6, 4))
+plt.scatter(X, y, color="navy", label="Experimental Data")
+plt.plot(X, y_pred, color="red", label="Linear Fit")
+plt.xlabel("Strain (mm/mm)")
+plt.ylabel("Stress (MPa)")
+plt.title("Linear Regression – Stress–Strain Relationship")
+plt.legend()
+plt.grid(True)
+plt.tight_layout()
+plt.show()
+
+```
\ No newline at end of file
diff --git a/tutorials/module_4/4.2 Interpreting Data.md b/tutorials/module_4/4.2 Interpreting Data.md
index 109a741..d199ee7 100644
--- a/tutorials/module_4/4.2 Interpreting Data.md
+++ b/tutorials/module_4/4.2 Interpreting Data.md
@@ -1,14 +1,48 @@
-# Interpreting Data
+# Interpreting Data for Plotting
 Philosophy of visualizing data
+When engineers work with data we have to consider the following:
+ - Purpose -> explain a process, compare or contrast, show a change or establish a relationship
+ - Plot Composition -> How do you arrange the components of the plot to clearly show the purpose
+ - Clarity
+## Syntax and semantics in Mathematics - The meaning of our data
+In the English language, grammar defines the syntax—the structural rules that determine how words are arranged in a sentence. However, meaning arises only through semantics, which tells us what the sentence actually conveys.
+Similarly, in the language of mathematics, syntax consists of the formal rules that govern how we combine symbols, perform operations, and manipulate equations. Yet it is semantics—the interpretation of those symbols and relationships—that gives mathematics its meaning and connection to the real world.
-## The meaning of your data
-Similarly to the English language, when we put words together we create context. As engineers and scientists, if mathematics is our language, then the data is the context.
+As engineers and scientists, we must grasp the semantics of our work, not merely the procedures; it is our responsibility to understand the meaning behind it. YouTube creator and rocket engineer Destin Sandlin, better known as SmarterEveryDay, illustrates this concept in his video on the “backwards bicycle,” which demonstrates how syntax and semantics parallel the difference between knowledge and true understanding.
+> I had the knowledge of operating the bike, but I did not have the understanding. Therefore knowledge does not equal understanding
+> - Destin Sandlin
+
 ## Audience
+When Interpreting the
+- Colleague/Supervisor, etc.
+- Research conference
+- Clients
+
+## Making good plots
+
+- Labeling
+- Grid
+- Axis scaling
+-
+
+
+## Problem 1:
+
+
+
+
+
+## Problem 2:
+
+
+
+
+
diff --git a/tutorials/module_4/4.4 Statistical Analysis.md b/tutorials/module_4/4.4 Statistical Analysis.md
index 4a90ba5..bf3a8bd 100644
--- a/tutorials/module_4/4.4 Statistical Analysis.md
+++ b/tutorials/module_4/4.4 Statistical Analysis.md
@@ -17,110 +17,4 @@ std = np.std([1, 2, 3, 4, 5])
 variance = np.var([1, 2, 3, 4, 5])
 ```
-As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
-
-## Linear Regression
-### What is Linear Regression?
-Linear regression is one of the most fundamental techniques in data analysis.
-It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
-
-Mathematically, the model assumes a linear equation:
-$$
-y = m x + b
-$$
-where
-- $y$ = dependent variable
-- $x$ = independent variable
-- $m$ = slope (rate of change)
-- $b$ = intercept (value of $y$ when $x = 0$)
-
-Linear regression helps identify proportional relationships, estimate calibration constants, or model linear system responses.
-
-### Problem 1: Stress–Strain Relationship
-Let’s assume we’ve measured the stress (σ) and strain (ε) for a material test and want to estimate Young’s modulus (E) from the slope.
-
-```python
-import numpy as np
-import pandas as pd
-import matplotlib.pyplot as plt
-
-# Example data (strain, stress)
-strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
-stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
-
-# Fit a linear regression line using NumPy
-coeffs = np.polyfit(strain, stress, deg=1)
-m, b = coeffs
-print(f"Slope (E) = {m:.2f} MPa, Intercept = {b:.2f}")
-
-# Predicted stress
-stress_pred = m * strain + b
-
-# Plot
-plt.figure()
-plt.scatter(strain, stress, label="Experimental Data", color="navy")
-plt.plot(strain, stress_pred, color="red", label="Linear Fit")
-plt.xlabel("Strain (mm/mm)")
-plt.ylabel("Stress (MPa)")
-plt.title("Linear Regression – Stress–Strain Curve")
-plt.legend()
-plt.grid(True)
-plt.show()
-
-```
-
-The slope `m` represents the Young’s Modulus (E), showing the stiffness of the material in the linear elastic region.
-
-```python
-import numpy as np
-import matplotlib.pyplot as plt
-from sklearn.linear_model import LinearRegression
-from sklearn.metrics import r2_score, mean_squared_error
-
-# ------------------------------------------------
-# 1. Example Data: Stress vs. Strain
-# (Simulated material test data)
-strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
-stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa
-
-# Reshape strain for scikit-learn (expects 2D input)
-X = strain.reshape(-1, 1)
-y = stress
-
-# ------------------------------------------------
-# 2. Fit Linear Regression Model
-model = LinearRegression()
-model.fit(X, y)
-
-# Extract slope and intercept
-m = model.coef_[0]
-b = model.intercept_
-print(f"Linear model: Stress = {m:.2f} * Strain + {b:.2f}")
-
-# ------------------------------------------------
-# 3. Predict Stress Values and Evaluate the Fit
-y_pred = model.predict(X)
-
-# Coefficient of determination (R²)
-r2 = r2_score(y, y_pred)
-
-# Root mean square error (RMSE)
-rmse = np.sqrt(mean_squared_error(y, y_pred))
-
-print(f"R² = {r2:.4f}")
-print(f"RMSE = {rmse:.3f} MPa")
-
-# ------------------------------------------------
-# 4. Visualize Data and Regression Line
-plt.figure(figsize=(6, 4))
-plt.scatter(X, y, color="navy", label="Experimental Data")
-plt.plot(X, y_pred, color="red", label="Linear Fit")
-plt.xlabel("Strain (mm/mm)")
-plt.ylabel("Stress (MPa)")
-plt.title("Linear Regression – Stress–Strain Relationship")
-plt.legend()
-plt.grid(True)
-plt.tight_layout()
-plt.show()
-
-```
\ No newline at end of file
+As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
\ No newline at end of file
diff --git a/tutorials/module_4/Pandas.md b/tutorials/module_4/Pandas.md
index 672661f..499cbef 100644
--- a/tutorials/module_4/Pandas.md
+++ b/tutorials/module_4/Pandas.md
@@ -3,24 +3,8 @@ Panel Data
 https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_spreadsheets.html#compare-with-spreadsheets
-
-
-
-
-https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
-
-
-
 https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
-
 https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html
-
-
 https://pandas.pydata.org/docs/user_guide/reshaping.html
-
-
 https://pandas.pydata.org/docs/user_guide/merging.html
-
-
-
 https://pandas.pydata.org/docs/getting_started/intro_tutorials/08_combine_dataframes.html
\ No newline at end of file
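
Note on the fit metrics in `3_linear_regression.md`: the slope, intercept, R² and RMSE that the tutorial obtains from `np.polyfit` and scikit-learn can be cross-checked with plain NumPy. The block below is a minimal sketch and is not part of this commit. It assumes the same `strain` and `stress` arrays as the tutorial and uses the standard closed-form least-squares formulas for a single predictor. The names `x_mean`, `y_mean`, `ss_res` and `ss_tot` are illustrative only.

```python
import numpy as np

# Same example data as the tutorial (strain is dimensionless, stress in MPa)
strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4])

# Closed-form least-squares estimates for y = m*x + b
x_mean, y_mean = strain.mean(), stress.mean()
m = np.sum((strain - x_mean) * (stress - y_mean)) / np.sum((strain - x_mean) ** 2)
b = y_mean - m * x_mean

# Residual and total sums of squares
stress_pred = m * strain + b
ss_res = np.sum((stress - stress_pred) ** 2)
ss_tot = np.sum((stress - y_mean) ** 2)

# R² = 1 - SS_res / SS_tot; RMSE = sqrt(mean squared residual)
r2 = 1.0 - ss_res / ss_tot
rmse = np.sqrt(np.mean((stress - stress_pred) ** 2))

print(f"Slope (E) = {m:.2f} MPa, Intercept = {b:.2f} MPa")
print(f"R² = {r2:.4f}, RMSE = {rmse:.3f} MPa")
```

Agreement with the `np.polyfit` and `LinearRegression` results, up to floating-point rounding, is a quick sanity check that all three routes fit the same line.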
