From c37c82f36fadd74dfe84980c71e3fe1fabf47dcd Mon Sep 17 00:00:00 2001
From: Christian Kolset
Date: Wed, 15 Oct 2025 11:15:29 -0600
Subject: Added module 4 tutorials

---
 ...Introduction to Data and Scientific Datasets.md |  65 ++++++++++
 .../module_4/4.2 Importing and Managing Data.md    | 142 +++++++++++++++++++++
 .../4.3 Data Cleaning and Preprocessing.md         | 104 +++++++++++++++
 tutorials/module_4/4.4 Statistical Analysis.md     | 126 ++++++++++++++++++
 .../4.5 Data Filtering and Signal Processing.md    |  47 +++++++
 .../4.6 Data Visualization and Presentation.md     |  29 +++++
 tutorials/module_4/Pasted image 20251013064715.png | Bin 0 -> 10807 bytes
 tutorials/module_4/Pasted image 20251013064718.png | Bin 0 -> 10807 bytes
 tutorials/module_4/Spectroscopy.md                 |  16 +++
 tutorials/module_4/outline.md                      | 135 ++++++++++++++++++++
 10 files changed, 664 insertions(+)
 create mode 100644 tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
 create mode 100644 tutorials/module_4/4.2 Importing and Managing Data.md
 create mode 100644 tutorials/module_4/4.3 Data Cleaning and Preprocessing.md
 create mode 100644 tutorials/module_4/4.4 Statistical Analysis.md
 create mode 100644 tutorials/module_4/4.5 Data Filtering and Signal Processing.md
 create mode 100644 tutorials/module_4/4.6 Data Visualization and Presentation.md
 create mode 100644 tutorials/module_4/Pasted image 20251013064715.png
 create mode 100644 tutorials/module_4/Pasted image 20251013064718.png
 create mode 100644 tutorials/module_4/Spectroscopy.md
 create mode 100644 tutorials/module_4/outline.md

diff --git a/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
new file mode 100644
index 0000000..8327006
--- /dev/null
+++ b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
@@ -0,0 +1,65 @@
+# Introduction to Data and Scientific Datasets
+
+**Learning objectives:**
+
+- Understand what makes data “scientific” (units, precision, metadata)
+- Recognize types of data: time-series, experimental, simulation, and imaging data
+- Identify challenges in data processing (missing data, noise, outliers)
+- Overview of the data-analysis workflow
+---
+### What is scientific data?
+Scientific data refers to **measured or simulated information** that describes a physical phenomenon in a quantitative and reproducible way. Whether it is collected experimentally or predicted with a model, scientific data is rooted in physical laws and carries information about the system’s behavior, boundary conditions, and measurement uncertainty.
+
+We may collect this in the following ways:
+- **Experiments** – temperature readings from thermocouples, strain or force from sensors, vibration accelerations, or flow velocities.
+- **Simulations** – outputs from finite-element or CFD models such as pressure, stress, or temperature distributions.
+- **Instrumentation and sensors** – digital or analog signals from transducers, encoders, or DAQ systems.
+
+### Introduction to pandas
+`pandas` (**Pan**el **Da**ta) is a Python library designed for **data analysis and manipulation**, widely used in engineering, science, and data analytics. It provides two core data structures: the **Series** and the **DataFrame**.
+
+A `Series` represents a single column or one-dimensional labeled array, while a `DataFrame` is a two-dimensional table of data, similar to a spreadsheet, where each column is a `Series` and each row has a labeled index. 
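+
+As a quick illustration, both structures can be built directly in code (the sensor values below are made up for the example):
+
+```python
+import pandas as pd
+
+# A Series: one labeled, one-dimensional column of data
+temps = pd.Series([20.1, 20.4, 20.9], name="Temp_C")
+
+# A DataFrame: a table in which each column is a Series
+df = pd.DataFrame({
+    "Time_s": [0, 1, 2],
+    "Temp_C": [20.1, 20.4, 20.9],
+})
+
+print(temps)
+print(df)
+```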
+
+DataFrames can be created from dictionaries, lists, NumPy arrays, or imported from external files such as CSV or Excel. Once data is loaded, you can **view and explore** it using methods like `head()`, `tail()`, and `describe()`. Data can be **selected by label** or **by position**. These indexing systems make it easy to slice, filter, and reorganize datasets efficiently.
+
+
+### Problem 1: Import a text file
+```python
+import pandas as pd
+
+file_path = "force_displacement_data.txt"
+
+df_txt = pd.read_csv(
+    file_path,
+    sep=r"\s+",      # whitespace-delimited text file
+    comment="#",
+    skiprows=0,
+    header=0
+)
+
+print("\n=== Basic Statistics ===")
+print(df_txt.describe())
+
+if "Force_N" in df_txt.columns:
+    print("\nFirst five Force readings:")
+    print(df_txt["Force_N"].head())
+
+try:
+    import matplotlib.pyplot as plt
+
+    plt.plot(df_txt.iloc[:, 0], df_txt.iloc[:, 1])
+    plt.xlabel(df_txt.columns[0])
+    plt.ylabel(df_txt.columns[1])
+    plt.title("Loaded Data from Text File")
+    plt.grid(True)
+    plt.show()
+
+except ImportError:
+    print("\nmatplotlib not installed — skipping plot.")
+
+```
+
+
+**Activities & Examples:**
+- Load small CSV datasets using `numpy.loadtxt()` and `pandas.read_csv()`
+- Discuss real ME examples: strain gauge data, thermocouple readings, pressure transducers
\ No newline at end of file
diff --git a/tutorials/module_4/4.2 Importing and Managing Data.md b/tutorials/module_4/4.2 Importing and Managing Data.md
new file mode 100644
index 0000000..101d5ab
--- /dev/null
+++ b/tutorials/module_4/4.2 Importing and Managing Data.md
@@ -0,0 +1,142 @@
+# Importing and Managing Data
+
+**Learning objectives:**
+
+- Import data from CSV, Excel, and text files using Pandas
+- Handle headers, delimiters, and units
+- Combine and merge multiple datasets
+- Manage data with time or index labels
+---
+## File types
+Once data is collected, the first step is importing it into a structured form that Python can interpret. The `pandas` library provides the foundation for this: it can read nearly any file format used in engineering (text files, CSV, Excel sheets, CFD results, etc.) as well as many native Python structures (lists, dictionaries, NumPy arrays, etc.), and it organizes the data in a DataFrame, a tabular structure similar to an Excel sheet but optimized for coding.
+![](https://pandas.pydata.org/docs/_images/02_io_readwrite.svg)
+## Importing spreadsheets using Pandas
+Comma-Separated Values (CSV) is a common spreadsheet file type. It is essentially a text file where each line is a new row of the table and each comma indicates the start of a new column. It is a standard convention to save spreadsheets in this format.
+
+Let's take a look at how this works in Python.
+```python
+import pandas as pd
+
+# Read a CSV file
+df = pd.read_csv("data_experiment.csv")
+
+# Optional arguments
+df_csv = pd.read_csv(
+    "data_experiment.csv",
+    delimiter=",",        # specify custom delimiter
+    header=0,             # row number to use as header
+    index_col=None,       # or specify a column as index
+    skiprows=0,           # skip metadata lines
+)
+print(df)
+```
+
+We have now created a new DataFrame holding the data from our .csv file.
+
+We can also do this for **Excel files**. Pandas has a built-in function to make this easier for us.
+```python
+df_xlsx = pd.read_excel("temperature_log.xlsx", sheet_name="Sheet1")
+print(df_xlsx.head())
+```
+
+Additionally, though less common in everyday engineering practice, pandas can import a wide variety of other formats such as JSON, HTML, SQL, or even your clipboard. 
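+
+For example, each of these readers also returns a DataFrame (the file names here are placeholders, not files shipped with the course):
+
+```python
+df_json = pd.read_json("results.json")    # JSON file
+tables = pd.read_html("report.html")      # returns a list of DataFrames, one per HTML table
+df_clip = pd.read_clipboard()             # parses tabular text currently on the clipboard
+```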
+
+### Handling Headers, Units, and Metadata
+Raw data often contains metadata or units above the table. Pandas can account for this metadata by skipping the first few rows.
+
+```python
+df = pd.read_csv("sensor_data.csv", skiprows=3)
+df.columns = ["Time_s", "Force_N", "Displacement_mm"]
+
+# Convert units
+df["Displacement_m"] = df["Displacement_mm"] / 1000
+```
+
+### Writing and Editing Data in pandas
+https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
+
+Once data has been analyzed or cleaned, `pandas` allows you to **export results** to multiple file types for reporting or further processing. Similarly to importing, we can also export .csv files and Excel files. Pandas makes it easy to modify individual datapoints directly within a DataFrame. You can locate entries either by label or by position:
+
+```python
+# by label
+df.loc[row_label, column_label]
+# or by position
+df.iloc[row_index, column_index]
+```
+
+
+```python
+import pandas as pd
+
+# Create DataFrame manually
+data = {
+    "Time_s": [0, 1, 2, 3],
+    "Force_N": [0.0, 5.2, 10.4, 15.5],
+    "Displacement_mm": [0.0, 0.3, 0.6, 0.9]
+}
+df = pd.DataFrame(data)
+
+# Edit a single value
+df.loc[1, "Force_N"] = 5.5
+
+# Export to CSV
+df.to_csv("edited_experiment.csv", index=False)
+```
+
+This workflow makes pandas ideal for working with tabular data: you can quickly edit or generate datasets, verify values, and save clean, structured files for later visualization or analysis.
+
+## Subsetting and Conditional Filtering
+You can select rows, columns, or specific conditions from a DataFrame.
+
+```python
+# Select a column
+force = df["Force_N"]
+
+# Select multiple columns
+subset = df[["Time_s", "Force_N"]]
+
+# Conditional filtering
+df_high_force = df[df["Force_N"] > 50]
+```
+
+
+![[Pasted image 20251013064718.png]]
+
+## Combining and Merging Datasets
+Often, multiple sensors or experiments must be merged into one dataset for analysis.
+
+```python
+# Merge on a common column (e.g., time)
+merged = pd.merge(df_force, df_temp, on="Time_s")
+
+# Stack multiple test runs vertically
+combined = pd.concat([df_run1, df_run2], axis=0)
+```
+
+
+## Problem 1: Describe a dataset
+Use pandas' built-in `describe()` method to report the summary statistics (including the mean) of the given experimental data, then plot force against time:
+
+```python
+import matplotlib.pyplot as plt
+
+print(df.describe())
+
+plt.plot(df["Time_s"], df["Force_N"])
+plt.xlabel("Time (s)")
+plt.ylabel("Force (N)")
+plt.title("Force vs. 
Time") +plt.show() +``` + + +### Problem 2: Import time stamped data + + + +### Further Docs +[Comparison with Spreadsheets](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_spreadsheets.html#compare-with-spreadsheets) +[Intro to Reading/Writing Files](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html) +[Subsetting Data](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html) +[Adding Columns](https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html) +[Reshaping Data](https://pandas.pydata.org/docs/user_guide/reshaping.html) +[Merging DataFrames](https://pandas.pydata.org/docs/user_guide/merging.html) +[Combining DataFrames](https://pandas.pydata.org/docs/getting_started/intro_tutorials/08_combine_dataframes.html) diff --git a/tutorials/module_4/4.3 Data Cleaning and Preprocessing.md b/tutorials/module_4/4.3 Data Cleaning and Preprocessing.md new file mode 100644 index 0000000..7ac126c --- /dev/null +++ b/tutorials/module_4/4.3 Data Cleaning and Preprocessing.md @@ -0,0 +1,104 @@ +# Data Cleaning and Preprocessing + +**Learning objectives:** + +- Detect and handle missing or invalid data +- Identify and remove outliers +- Apply smoothing and detrending +- Unit consistency and scaling +--- +## What is data cleaning? +Data cleaning is an **iterative and adaptive process** that uses different methods depending on the characteristics of the dataset, the goals of the analysis, and the tools available. It generally includes several key tasks, such as: +- Handling or replacing missing and invalid data +- Detecting and correcting outliers +- Reducing noise through smoothing or filtering techniques + + +## Handling missing or invalid data +Missing data occurs when expected values or measurements are absent from a dataset, often appearing as `NULL`, `0`, empty strings, or `NaN` (Not a Number) entries. These gaps can arise from various sources, including sensor malfunctions during data acquisition, errors in transmission, or formatting issues during data conversion. Because missing data can distort analyses and weaken model accuracy, it must be carefully identified and treated during the data cleaning stage. + +Detecting missing data may seem simple, but selecting an appropriate way to replace those gaps is often more complex. The process typically begins by locating missing or invalid entries through visualization or value inspection. Once identified, the goal is to estimate replacements that closely approximate the true, unobserved values. The method used depends heavily on the behavior and structure of the data. + +- **Slowly changing data**, such as temperature measurements, can often be filled using the nearest valid observation. +- **Seasonal or moderately variable data**, like weather records, may benefit from statistical approaches such as moving averages, medians, or _K_-nearest neighbor imputation. +- **Strongly time-dependent data**, such as financial or process signals, are best handled using interpolation methods that estimate values based on surrounding data points. + +You may have data the looks like this. 
+![A solar irradiance raw input data time-series plot with missing values.|450](https://www.mathworks.com/discovery/data-cleaning/_jcr_content/mainParsys/band/mainParsys/lockedsubnav/mainParsys/columns_copy_copy/725f6f68-0273-4bd3-8e6a-6a184615752a/image.adapt.full.medium.jpg/1758740047296.jpg)
+
+In Python, the `pandas` library provides several simple and powerful ways to handle missing values. Missing entries in a DataFrame appear as `NaN` (Not a Number), and you can replace or estimate these values using methods such as forward fill, backward fill, interpolation, or moving averages.
+
+- Forward fill (`ffill`) uses the last valid observation to replace missing values, which is useful for slowly changing signals.
+- Backward fill (`bfill`) propagates the next valid value backward to fill earlier gaps.
+- Interpolation estimates missing values using linear or polynomial trends between known data points.
+- Rolling mean or moving average smooths short-term fluctuations by averaging nearby samples, similar to MATLAB’s `movmean()` function.
+
+The example below demonstrates these techniques applied to a temperature dataset with missing readings:
+```python
+import pandas as pd
+
+# Example data
+data = {"Time_s": [0, 1, 2, 3, 4, 5],
+        "Temp_C": [20.1, None, 21.0, None, 22.3, 22.8]}
+df = pd.DataFrame(data)
+
+# Fill with the last valid value (forward fill)
+df["Temp_ffill"] = df["Temp_C"].ffill()
+
+# Fill with next valid value (backward fill)
+df["Temp_bfill"] = df["Temp_C"].bfill()
+
+# Linear interpolation between missing values
+df["Temp_interp"] = df["Temp_C"].interpolate(method="linear")
+
+# Rolling mean (similar to moving average)
+df["Temp_movmean"] = df["Temp_C"].fillna(
+    df["Temp_C"].rolling(window=3, min_periods=1).mean()
+)
+print(df)
+
+```
+
+
+## Identify and remove outliers
+Outliers are data points that differ greatly from the rest of the data, often appearing as unusually high or low values that don’t follow the overall pattern. They can distort analysis and lead to misleading conclusions. Outliers may result from measurement errors, data entry mistakes, normal variation, or true anomalies in the system being measured.
+
+One common statistical approach to detect and remove outliers is the **Z-score method**. A Z-score describes how far a data point is from the mean of the dataset, measured in units of standard deviation. For a normally distributed variable, most data points lie within three standard deviations of the mean (±3 σ). Values that fall far outside this range are likely to be **outliers**, meaning they deviate significantly from the typical trend of the data.
+
+In practice, we calculate the Z-score for each observation, take its absolute value, and remove points whose Z-score exceeds a threshold, commonly 3 for general data, or 2.5 when the dataset is smaller or more sensitive to noise. This method works best when the data roughly follows a bell-shaped (Gaussian) distribution.
+
+The following example demonstrates how we could apply the Z-score method using the SciPy library and a small sample dataset of force measurements:
+
+```python
+import pandas as pd
+import numpy as np
+from scipy import stats
+
+# Example dataset
+df = pd.DataFrame({"Force_N": [10, 11, 10.5, 10.2, 11.1, 50, 10.8, 9.9]})
+
+# Compute Z-scores
+z = np.abs(stats.zscore(df["Force_N"]))
+
+# Keep only points where Z < 2.5 (the tighter threshold suited to small
+# datasets; with only 8 samples, even this obvious outlier scores below 3)
+df_clean = df[z < 2.5]
+
+print(df_clean)
+```
+
+
+### Problem 1: Cleaning datasets
+Clean up the following dataset using the methods above. 
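+
+The course dataset for this exercise is not included in this draft; as a stand-in, the snippet below builds a small synthetic set with the defects discussed above (missing entries and an obvious outlier):
+
+```python
+import numpy as np
+import pandas as pd
+
+# Placeholder data: synthetic force readings with a gap and one spike
+rng = np.random.default_rng(0)
+force = 10 + 0.1 * rng.standard_normal(20)
+force[5] = np.nan     # missing reading
+force[12] = 55.0      # spurious outlier
+df_dirty = pd.DataFrame({"Time_s": np.arange(20), "Force_N": force})
+print(df_dirty)
+```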
+
+## Apply smoothing and detrending
+
+
+
+
+## Units and scaling
+
+
+
+### Problem 2:
+
+
diff --git a/tutorials/module_4/4.4 Statistical Analysis.md b/tutorials/module_4/4.4 Statistical Analysis.md
new file mode 100644
index 0000000..4a90ba5
--- /dev/null
+++ b/tutorials/module_4/4.4 Statistical Analysis.md
@@ -0,0 +1,126 @@
+# Statistical Analysis
+**Learning Objectives:**
+
+- Descriptive statistics (mean, median, variance, std deviation)
+- Histograms and probability distributions
+- Correlation and regression
+- Uncertainty, error bars, confidence intervals
+---
+## Statistical tools
+When working with data, it’s important to understand the **central tendency** and **spread** of your dataset. NumPy provides several built-in functions to quickly compute common statistical metrics such as **mean**, **median**, **standard deviation**, and **variance**. These are fundamental tools for analyzing measurement consistency and uncertainty and for identifying trends in data.
+```python
+import numpy as np
+
+mean = np.mean([1, 2, 3, 4, 5])
+median = np.median([1, 2, 3, 4, 5])
+std = np.std([1, 2, 3, 4, 5])
+variance = np.var([1, 2, 3, 4, 5])
+```
+
+As seen in the previous lecture, pandas also includes several built-in statistical tools that make it easy to analyze entire datasets directly from a DataFrame. Instead of applying individual NumPy functions to each column, you can use methods such as `.mean()`, `.std()`, `.var()`, and especially `.describe()` to generate quick summaries of your data. These tools are convenient when working with experimental or simulation data that contain multiple variables, allowing you to assess trends, variability, and potential outliers all at once.
+
+## Linear Regression
+### What is Linear Regression?
+Linear regression is one of the most fundamental techniques in data analysis.
+It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
+
+Mathematically, the model assumes a linear equation:
+$$
+y = m x + b
+$$
+where
+- $y$ = dependent variable
+- $x$ = independent variable
+- $m$ = slope (rate of change)
+- $b$ = intercept (value of $y$ when $x = 0$)
+
+Linear regression helps identify proportional relationships, estimate calibration constants, or model linear system responses.
+
+### Problem 1: Stress–Strain Relationship
+Let’s assume we’ve measured the stress (σ) and strain (ε) for a material test and want to estimate Young’s modulus (E) from the slope.
+
+```python
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# Example data (strain, stress)
+strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025])
+stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4])  # MPa
+
+# Fit a linear regression line using NumPy
+coeffs = np.polyfit(strain, stress, deg=1)
+m, b = coeffs
+print(f"Slope (E) = {m:.2f} MPa, Intercept = {b:.2f}")
+
+# Predicted stress
+stress_pred = m * strain + b
+
+# Plot
+plt.figure()
+plt.scatter(strain, stress, label="Experimental Data", color="navy")
+plt.plot(strain, stress_pred, color="red", label="Linear Fit")
+plt.xlabel("Strain (mm/mm)")
+plt.ylabel("Stress (MPa)")
+plt.title("Linear Regression – Stress–Strain Curve")
+plt.legend()
+plt.grid(True)
+plt.show()
+
+```
+
+The slope `m` represents Young’s modulus (E), indicating the stiffness of the material in the linear elastic region. 
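+
+The same fit can also be performed with scikit-learn’s `LinearRegression`, which pairs naturally with goodness-of-fit metrics such as R² and RMSE, as the next example shows: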
+ +```python +import numpy as np +import matplotlib.pyplot as plt +from sklearn.linear_model import LinearRegression +from sklearn.metrics import r2_score, mean_squared_error + +# ------------------------------------------------ +# 1. Example Data: Stress vs. Strain +# (Simulated material test data) +strain = np.array([0.000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025]) +stress = np.array([0.0, 52.0, 104.5, 157.2, 208.1, 261.4]) # MPa + +# Reshape strain for scikit-learn (expects 2D input) +X = strain.reshape(-1, 1) +y = stress + +# ------------------------------------------------ +# 2. Fit Linear Regression Model +model = LinearRegression() +model.fit(X, y) + +# Extract slope and intercept +m = model.coef_[0] +b = model.intercept_ +print(f"Linear model: Stress = {m:.2f} * Strain + {b:.2f}") + +# ------------------------------------------------ +# 3. Predict Stress Values and Evaluate the Fit +y_pred = model.predict(X) + +# Coefficient of determination (R²) +r2 = r2_score(y, y_pred) + +# Root mean square error (RMSE) +rmse = np.sqrt(mean_squared_error(y, y_pred)) + +print(f"R² = {r2:.4f}") +print(f"RMSE = {rmse:.3f} MPa") + +# ------------------------------------------------ +# 4. Visualize Data and Regression Line +plt.figure(figsize=(6, 4)) +plt.scatter(X, y, color="navy", label="Experimental Data") +plt.plot(X, y_pred, color="red", label="Linear Fit") +plt.xlabel("Strain (mm/mm)") +plt.ylabel("Stress (MPa)") +plt.title("Linear Regression – Stress–Strain Relationship") +plt.legend() +plt.grid(True) +plt.tight_layout() +plt.show() + +``` \ No newline at end of file diff --git a/tutorials/module_4/4.5 Data Filtering and Signal Processing.md b/tutorials/module_4/4.5 Data Filtering and Signal Processing.md new file mode 100644 index 0000000..112826e --- /dev/null +++ b/tutorials/module_4/4.5 Data Filtering and Signal Processing.md @@ -0,0 +1,47 @@ +# Data Filtering and Signal Processing + +**Learning Objectives** + +- Understand the purpose of filtering in experimental and computational data +- Differentiate between noise, bias, and true signal +- Apply time-domain and frequency-domain filters to remove unwanted noise +- Introduce basic spatial (2-D) filtering for imaging or contour data +- Interpret filter performance and trade-offs (cutoff frequency, phase lag) + +--- + + +#### Topics + +- Review: what “noise” looks like statistically +- Time-domain filters + - Moving-average, Savitzky–Golay smoothing + - FIR and IIR filters (low-pass, high-pass, band-pass) +- Frequency-domain filtering + - Fast Fourier Transform (FFT) basics + - Noise removal using spectral methods +- Spatial filtering and image operations + - Gaussian smoothing, Sobel edge detection, median filters +- Comparing filtered vs. 
unfiltered data visually
+#### Python Focus
+
+- `scipy.signal` for 1-D signals
+    - `butter()`, `filtfilt()`, `savgol_filter()`
+    - `freqz()` for visualizing filter response
+- `numpy.fft` for frequency-domain analysis
+- `scipy.ndimage` for 2-D spatial filters
+    - `gaussian_filter()`, `median_filter()`, `sobel()`
+- Quick visualization with `matplotlib.pyplot` and `imshow()`
+#### Applications
+
+- **Vibration analysis:** Filter accelerometer data to isolate modal frequencies
+- **Thermal measurements:** Smooth transient thermocouple data to remove spikes
+- **Fluid or heat transfer visualization:** Apply Gaussian blur or gradient filters to contour plots or infrared images
+- **Structural testing:** Remove noise from strain-gauge or displacement signals before computing stress–strain
+
+#### Problems
+
+- Filter noisy vibration or pressure data and compare spectra before/after
+- Apply a moving average and a Butterworth filter to the same dataset — evaluate differences
+- Use `ndimage.sobel()` to highlight temperature gradients in a heat-map image
+- Challenge: write a short Python function that automatically chooses an appropriate smoothing window based on noise level
\ No newline at end of file
diff --git a/tutorials/module_4/4.6 Data Visualization and Presentation.md b/tutorials/module_4/4.6 Data Visualization and Presentation.md
new file mode 100644
index 0000000..b788fc7
--- /dev/null
+++ b/tutorials/module_4/4.6 Data Visualization and Presentation.md
@@ -0,0 +1,29 @@
+# Data Visualization and Presentation
+
+**Learning objectives:**
+
+- Create scientific plots using `matplotlib.pyplot`
+- Customize figures (labels, legends, styles, subplots)
+- Plot multi-dimensional and time-series data
+- Combine plots and export for reports
+---
+
+**Extensions:**
+
+- Intro to `seaborn` for statistical visualization
+- Plotting uncertainty and error bars
+
+
+
+
+## How to represent data scientifically
+
+
+
+
+
+
+
+
+
+## Taking it further with R
\ No newline at end of file
diff --git a/tutorials/module_4/Pasted image 20251013064715.png b/tutorials/module_4/Pasted image 20251013064715.png
new file mode 100644
index 0000000..e081063
Binary files /dev/null and b/tutorials/module_4/Pasted image 20251013064715.png differ
diff --git a/tutorials/module_4/Pasted image 20251013064718.png b/tutorials/module_4/Pasted image 20251013064718.png
new file mode 100644
index 0000000..e081063
Binary files /dev/null and b/tutorials/module_4/Pasted image 20251013064718.png differ
diff --git a/tutorials/module_4/Spectroscopy.md b/tutorials/module_4/Spectroscopy.md
new file mode 100644
index 0000000..544b049
--- /dev/null
+++ b/tutorials/module_4/Spectroscopy.md
@@ -0,0 +1,16 @@
+
+
+
+Example
+
+
+Statistics:
+
+Statistical Average (mean)
+
+Variance
+Standard Deviation
+Linear regression
+
+
+
diff --git a/tutorials/module_4/outline.md b/tutorials/module_4/outline.md
new file mode 100644
index 0000000..354a2f0
--- /dev/null
+++ b/tutorials/module_4/outline.md
@@ -0,0 +1,135 @@
+
+This is a rich and highly practical module for mechanical engineers, since _data analysis and visualization_ tie directly into interpreting experiments, simulations, and sensor data.
+
+
+It follows the course flow from importing → cleaning/filtering → visualization, and integrates _NumPy, SciPy,_ and _Matplotlib_ progressively.
+
+---
+
+# **Module 4: Data Analysis and Processing**
+
+## **Overview**
+
+This module introduces methods for handling, cleaning, and visualizing scientific and experimental data in Python. 
+Students will learn to: + +- Import data from various sources (CSV, Excel, sensors, simulations) +- Detect and correct data errors or noise +- Filter and smooth signals +- Extract meaningful patterns and trends +- Create clear, professional-quality figures for reports + +**Primary Libraries:** +`NumPy`, `Pandas`, `SciPy.signal`, `SciPy.ndimage`, `Matplotlib`, `Seaborn` + +--- + +## **Lecture 1 — Introduction to Data and Scientific Datasets** + +**Learning objectives:** + +- Understand what makes data “scientific” (units, precision, metadata) +- Recognize types of data: time-series, experimental, simulation, and imaging data +- Identify challenges in data processing (missing data, noise, outliers) +- Overview of the data-analysis workflow + +**Activities & Examples:** + +- Load small CSV datasets using `numpy.loadtxt()` and `pandas.read_csv()` +- Discuss real ME examples: strain gauge data, thermocouple readings, pressure transducers + +--- + +## **Lecture 2 — Importing and Managing Data** + +**Learning objectives:** + +- Import data from CSV, Excel, and text files using Pandas +- Handle headers, delimiters, and units +- Combine and merge multiple datasets +- Manage data with time or index labels + +**Hands-on examples:** + +- Combine data from multiple experimental runs +- Import time-stamped data and plot quick trends + +--- + +## **Lecture 3 — Data Cleaning and Preprocessing** + +**Learning objectives:** + +- Detect and handle missing or invalid data +- Identify and remove outliers +- Apply smoothing and detrending +- Unit consistency and scaling + +**Techniques & Tools:** + +- `pandas.isna()`, `dropna()`, and `fillna()` +- Statistical checks with `numpy.mean()`, `numpy.std()` +- Z-score outlier removal +- Case study: noisy strain vs. time dataset + +--- + +## **Lecture 4 — Data Filtering and Signal Processing (SciPy)** + +**Learning objectives:** + +- Understand why and when filtering is needed +- Apply low-pass, high-pass, and band-pass filters +- Implement moving-average and Savitzky–Golay filters +- Compare frequency vs. time-domain filtering + +**Toolbox Focus:** `scipy.signal` +**Example Applications:** + +- Filter noisy vibration data from accelerometers +- Remove DC offset from force measurements + +--- + +## **Lecture 5 — Image and Spatial Data Processing (Optional/Extension)** + +**Learning objectives:** + +- Introduce `scipy.ndimage` for image-based data +- Perform smoothing, edge detection, and segmentation +- Apply spatial filtering to thermal images or contour data + +**Applications:** + +- Heat distribution image analysis +- Flow visualization from CFD contour plots + +--- + +## **Lecture 6 — Data Visualization and Presentation** + +**Learning objectives:** + +- Create scientific plots using `matplotlib.pyplot` +- Customize figures (labels, legends, styles, subplots) +- Plot multi-dimensional and time-series data +- Combine plots and export for reports + +**Extensions:** + +- Intro to `seaborn` for statistical visualization +- Plotting uncertainty and error bars + +**Capstone Exercise:** + +- Load experimental dataset → clean → filter → visualize results + (Example: force–displacement data → stress–strain curve with trendline) + +--- + +## **Optional Lab/Project Ideas** + +- Clean and visualize experimental data from a tensile test +- Filter and interpret vibration data from a rotating machine +- Plot temperature variation in a heat exchanger experiment +- Generate report-quality figures comparing experimental and simulation data -- cgit v1.2.3