Diffstat (limited to 'tutorials')
4 files changed, 275 insertions, 67 deletions
diff --git a/tutorials/module_4/4.0 Outline.md b/tutorials/module_4/4.0 Outline.md
index 8156651..0c19b12 100644
--- a/tutorials/module_4/4.0 Outline.md
+++ b/tutorials/module_4/4.0 Outline.md
@@ -43,7 +43,7 @@
     g. Extrapolation and limitations
     h. Moving averages
     i. Problem 1: Fit a linear and polynomial model to stress-strain data. Compute R^2 and discuss which model fits better.
-    j. Problem 2: Apply a moving average to noisy temperature data and ocmpare ra vs. smoothed signals.
+    j. Problem 2: Apply a moving average to noisy temperature data and compare raw vs. smoothed signals.
 6. Data Filtering and Signal Processing
     a. What is it and why it matters - noise vs. signal
diff --git a/tutorials/module_4/4.5 Statistical Analysis II.md b/tutorials/module_4/4.5 Statistical Analysis II.md
index da25643..20805c9 100644
--- a/tutorials/module_4/4.5 Statistical Analysis II.md
+++ b/tutorials/module_4/4.5 Statistical Analysis II.md
@@ -18,10 +18,8 @@ $$
 You may have asked yourself, "What if my data is not linear?" If the variables in your data are related exponentially or by a power law, we can use a logarithm trick: taking the log of the function linearizes it, and then the linear least-squares method applies.
 ### Polynomial
-https://www.geeksforgeeks.org/machine-learning/python-implementation-of-polynomial-regression/
 The least-squares method can also be applied to polynomial functions. For nonlinear functions such as polynomials, NumPy has a convenient built-in feature.
-
 ```python
 x_d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
 y_d = np.array([0, 0.8, 0.9, 0.1, -0.6, -0.8, -1, -0.9, -0.4])
@@ -42,14 +40,14 @@ plt.show()
 ```
 ### Using Scipy
+You can also use SciPy's `optimize.curve_fit` to fit a user-defined function form directly, such as the exponential below.
 ```python
 # let's define the function form
 def func(x, a, b):
     y = a*np.exp(b*x)
     return y
-alpha, beta = optimize.curve_fit(func, xdata = 
- x, ydata = y)[0]
+alpha, beta = optimize.curve_fit(func, xdata = x, ydata = y)[0]
 print(f'alpha={alpha}, beta={beta}')
 # Let's have a look at the data
@@ -79,7 +77,6 @@ Where:
 * $\hat{y}_i$ = predicted data from the model
 * $\bar{y}$ = mean of observed data
 #### Standard Error of the Estimate
-
 If the scatter of data about the regression line is approximately normal, the **standard error of the estimate** represents the typical deviation of a point from the fitted line:
 $$
@@ -88,27 +85,25 @@ $$
 where $n$ is the number of data points. A smaller $s_{y/x}$ means the regression line passes closer to the data points.
-#### Coefficient of Determination – (R^2)
+#### Coefficient of Determination – ($R^2$)
 The coefficient of determination, $R^2$, tells us how much of the total variation in $y$ is explained by the regression:
-
 $$
 R^2 = \frac{S_t - S_r}{S_t} = 1 - \frac{S_r}{S_t}
 $$
-* (R^2 = 1.0) → perfect fit (all points on the line)
-* (R^2 = 0) → model explains none of the variation
-
-In engineering terms, a high (R^2) indicates that your model captures most of the physical trend — for example, how deflection scales with load.
+- $R^2 = 1.0$ → perfect fit (all points on the line)
+- $R^2 = 0$ → the model explains none of the variation
+
+In engineering terms, a high $R^2$ indicates that your model captures most of the physical trend, for example, how deflection scales with load.
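As a rough illustration of the definitions above, the sketch below computes $R^2 = 1 - S_r/S_t$ for a straight-line fit. It assumes the usual sums of squares $S_t = \sum (y_i - \bar{y})^2$ and $S_r = \sum (y_i - \hat{y}_i)^2$. The load-deflection numbers are hypothetical values made up for illustration, not measured data.

```python
import numpy as np

# Hypothetical load-deflection data (illustrative numbers, not measurements)
load = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])                # [kN]
deflection = np.array([0.02, 0.55, 1.10, 1.48, 2.05, 2.52, 3.01])   # [mm]

# Straight-line least-squares fit (np.polyfit returns highest power first)
slope, intercept = np.polyfit(load, deflection, 1)
predicted = slope * load + intercept

# Total and residual sums of squares
S_t = np.sum((deflection - deflection.mean())**2)   # spread of the data about its mean
S_r = np.sum((deflection - predicted)**2)           # spread of the data about the fitted line
R2 = 1 - S_r / S_t

print(f"slope = {slope:.3f} mm/kN, R^2 = {R2:.4f}")
```

For a straight-line fit with an intercept, this $R^2$ should equal the square of `np.corrcoef(load, deflection)[0, 1]`. That gives a quick way to check the arithmetic.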
-
-#### Correlation Coefficient – (r)
-For linear regression, the **correlation coefficient** (r) is the square root of (R^2), with sign matching the slope of the line:
+#### Correlation Coefficient – ($r$)
+For linear regression, the correlation coefficient $r$ is the square root of $R^2$, with its sign matching the sign of the slope:
 $$
 r = \pm \sqrt{R^2}
 $$
-* (r > 0): positive correlation (both variables increase together)
-* (r < 0): negative correlation (one increases, the other decreases)
-#### Example – Evaluating Fit in Python
+- $r > 0$: positive correlation (both variables increase together)
+- $r < 0$: negative correlation (one increases, the other decreases)
+
+## Problem 1: Linear vs. polynomial fit
+Fit a linear and a polynomial model to stress-strain data. Compute $R^2$ for each and discuss which model fits better.
 ```python
 import numpy as np
@@ -134,5 +129,51 @@ print(f"r = {r:.3f}")
 ```
 ## Extrapolation
-basis funct
-## Moving average
\ No newline at end of file
+Once we have a regression model, it's tempting to use it to predict values beyond the range of the measured data. This is called extrapolation.
+
+In interpolation, the model is supported by real data on both sides of the point being predicted. In extrapolation, we assume that the same physical relationship continues beyond the data, and in engineering systems that is often not true.
+
+Most regression equations are empirical: they describe the trend within the range of observed conditions but may not capture the true physics. Common problems include:
+- Nonlinear behavior outside the fitted range, such as a stress–strain curve beyond the elastic limit.
+- Physical limits, such as predicted temperatures below absolute zero or efficiencies greater than 100%.
+- A change of mechanism that makes the model inapplicable, such as heat transfer switching from convection to radiation as the dominant mode at higher temperatures.
+
+Some guidelines for using extrapolation:
+- Plot the data used for fitting together with the model, so the range of validity is visible.
+- Avoid predicting far beyond the range of your data unless the prediction is supported by a physical model.
+## Moving average
+Real experimental data often contain small random fluctuations, a.k.a. noise, that obscure the underlying trend. Rather than fitting a complex equation, we can smooth the data using a moving average, which replaces each point with the average of its nearby neighbors. This simple method reduces random variation while preserving the overall shape of the signal.
+
+A moving average (or rolling mean) averages the data over a sliding window. For a centered window of $N$ points:
+$$\bar{y}_i = \frac{1}{N} \sum_{j=i-k}^{i+k} y_j$$
+where:
+- $N$ = window size (number of points averaged),
+- $k = (N-1)/2$ = number of points on each side of $y_i$ (a centered window needs an odd $N$),
+- $y_j$ = original data values.
+
+A larger window gives a smoother curve but loses detail; a smaller window retains more detail but removes less noise.
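The centered-window formula can also be written out directly with NumPy. The sketch below is one possible implementation, assuming an odd window size $N$. The helper name `centered_moving_average` is hypothetical. It convolves the data with a box kernel of $N$ equal weights $1/N$. It also leaves the first and last $k$ points as NaN, which is how pandas' `rolling(window=N, center=True).mean()` treats incomplete windows.

```python
import numpy as np
import pandas as pd

def centered_moving_average(y, N):
    """Hypothetical helper: mean of y[i-k .. i+k] with k = (N-1)//2.
    Points without a full window are returned as NaN."""
    if N % 2 == 0:
        raise ValueError("N must be odd for a centered window")
    k = (N - 1) // 2
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    kernel = np.ones(N) / N                      # box kernel: N equal weights of 1/N
    # 'valid' keeps only full windows; element m is the mean of y[m .. m+N-1],
    # which is centered on index m + k
    out[k:len(y) - k] = np.convolve(y, kernel, mode="valid")
    return out

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 4*np.pi, 100)) + 0.2 * rng.standard_normal(100)

manual = centered_moving_average(y, 5)
rolling = pd.Series(y).rolling(window=5, center=True).mean().to_numpy()
print("matches pandas rolling mean:", np.allclose(manual, rolling, equal_nan=True))
```

Writing it this way makes the trade-off in the text concrete: increasing `N` widens the kernel, so each output point averages more neighbors.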
+### Example: Smoothing sensor noise
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import pandas as pd
+
+# Generate a noisy signal (sine wave + Gaussian noise)
+x = np.linspace(0, 4*np.pi, 100)
+y = np.sin(x) + 0.2*np.random.randn(100)
+
+# Apply centered moving averages with two window sizes
+# (points near the ends have no full window and become NaN)
+df = pd.DataFrame({'x': x, 'y': y})
+df['y_smooth_5'] = df['y'].rolling(window=5, center=True).mean()
+df['y_smooth_15'] = df['y'].rolling(window=15, center=True).mean()
+
+plt.plot(df['x'], df['y'], 'k.', alpha=0.4, label='Raw data')
+plt.plot(df['x'], df['y_smooth_5'], 'r-', label='Window = 5')
+plt.plot(df['x'], df['y_smooth_15'], 'b-', label='Window = 15')
+plt.xlabel('Time (s)')
+plt.ylabel('Signal')
+plt.title('Effect of Moving Average Window Size')
+plt.legend()
+plt.show()
+```
+
+## Problem 2: Moving average
+Apply a moving average to noisy temperature data and compare the raw and smoothed signals.
+
diff --git a/tutorials/module_4/4.6 Data Filtering and Signal Processing.md b/tutorials/module_4/4.6 Data Filtering and Signal Processing.md
index ac1760e..9ca3034 100644
--- a/tutorials/module_4/4.6 Data Filtering and Signal Processing.md
+++ b/tutorials/module_4/4.6 Data Filtering and Signal Processing.md
@@ -9,38 +9,218 @@
 - Interpret filter performance and trade-offs (cutoff frequency, phase lag)
 ---
+## What is data filtering and why does it matter?
-#### Topics
-
-- Review: what “noise” looks like statistically
-- Time-domain filters
-    - Moving-average, Savitzky–Golay smoothing
-    - FIR and IIR filters (low-pass, high-pass, band-pass)
-- Frequency-domain filtering
-    - Fast Fourier Transform (FFT) basics
-    - Noise removal using spectral methods
-- Spatial filtering and image operations
-    - Gaussian smoothing, Sobel edge detection, median filters
-- Comparing filtered vs. unfiltered data visually
-#### Python Focus
-
-- `scipy.signal` for 1-D signals
-    - `butter()`, `filtfilt()`, `savgol_filter()`
-    - `freqz()` for visualizing filter response
-- `numpy.fft` for frequency-domain analysis
-- `scipy.ndimage` for 2-D spatial filters
-    - `gaussian_filter()`, `median_filter()`, `sobel()`
-- Quick visualization with `matplotlib.pyplot` and `imshow()`
-#### Applications
-
-- **Vibration analysis:** Filter accelerometer data to isolate modal frequencies
-- **Thermal measurements:** Smooth transient thermocouple data to remove spikes
-- **Fluid or heat transfer visualization:** Apply Gaussian blur or gradient filters to contour plots or infrared images
-- **Structural testing:** Remove noise from strain-gauge or displacement signals before computing stress–strain
-
-#### Problems
-
-- Filter noisy scpectroscopy data and compare spectra before/after
-- Apply a moving average and a Butterworth filter to the same dataset — evaluate differences
-- Use `ndimage.sobel()` to highlight temperature gradients in a heat-map image
-- Challenge: write a short Python function that automatically chooses an appropriate smoothing window based on noise level
\ No newline at end of file
+Filtering is a signal-processing operation that removes unwanted parts of a signal, typically the components within a certain frequency range. Low-pass filters remove components above a cut-off frequency; high-pass filters remove components below it. Combining a low-pass and a high-pass filter gives a band-pass filter, which keeps only the components between two cut-off frequencies.
+
+Measurements from sensors, test rigs, or simulations are rarely perfect. Electrical interference, vibration, quantization error, or even airflow turbulence can create random variations that obscure the trend.
+
+Different filtering methods are used depending on the type of data, the nature of the noise, and the desired outcome, whether that is removing interference, detecting anomalies, or smoothing fluctuations in time-series data. Choosing an appropriate filter gives cleaner, more reliable data for analysis and decision-making.
+
+**Key data filtering methods**
+
+| **Filtering Method** | **Types of Filters** | **Purpose** | **Applications** |
+| --- | --- | --- | --- |
+| Frequency-based filters (signal processing filters) | - Low-pass<br>- High-pass<br>- Band-pass<br>- Band-stop (notch) | Remove or retain specific frequency components in data | Noise reduction (e.g., low-pass filtering to remove high-frequency noise), image processing, sensor data analysis |
+| Smoothing filters (statistical methods) | - Median filter<br>- Moving average<br>- Gaussian filter<br>- Exponential smoothing | Smooth data by reducing noise and variability | Time-series smoothing, image processing, outlier removal |
+| Rule-based filters (conditional filtering) | - Threshold filters (e.g., greater than, less than)<br>- Rule-based filters | Filter data based on predefined logical conditions | Data cleaning, outlier detection, quality control |
+| Trend-based filters (time-series methods) | - Hodrick-Prescott filter<br>- Kalman filter<br>- Wavelet filter | Identify trends, remove seasonality, smooth fluctuations | Stock market analysis, climate data, sensor monitoring |
+| Machine learning–based filters | - Anomaly detection algorithms<br>- Autoencoders<br>- Clustering-based filtering | Use machine learning to detect and remove noisy or irrelevant data | Fraud detection, predictive maintenance, automated data cleaning |
+
+
+## Frequency domain basics
+So far, we've looked at data in the time domain, for example, Temperature(t), Pressure(t), or Displacement(t). The frequency domain is a different way of looking at the same information. Instead of asking _"How does this signal change over time?"_, we ask _"What frequencies make up this signal?"_ Every repeating vibration, oscillation, or wave can be described by the frequencies it contains. Transforming to the frequency domain reveals the hidden structure of a signal, especially when it is a mix of several oscillations.
+
+Consider the vibration of a shaft driven by two rotating components, one at 10 Hz and another at 50 Hz. In the time domain the combined signal looks complicated, but in the frequency domain two clear peaks appear at 10 Hz and 50 Hz, immediately revealing the underlying behavior.
+
+The mathematical tool that converts a time-domain signal into its frequency components is the Fourier transform:
+$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2 \pi f t}\, dt$$
+In practice, measured data are sampled, so we use the Discrete Fourier Transform (DFT), computed efficiently with the Fast Fourier Transform (FFT) algorithm. The DFT decomposes a sampled signal into a sum of sine and cosine waves at discrete frequencies.
+##### Visualization
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Create a time vector (1 second at 1000 Hz sampling)
+fs = 1000
+t = np.linspace(0, 1, fs, endpoint=False)
+
+# Signal: 10 Hz + 50 Hz sine waves + noise
+x = np.sin(2*np.pi*10*t) + 0.5*np.sin(2*np.pi*50*t) + 0.2*np.random.randn(len(t))
+
+# Compute the FFT and the frequency (in Hz) of each bin
+N = len(x)
+X = np.fft.fft(x)
+freq = np.fft.fftfreq(N, d=1/fs)
+
+# Plot magnitude spectrum (only positive frequencies)
+plt.figure(figsize=(10,5))
+plt.plot(freq[:N//2], np.abs(X[:N//2]))
+plt.title('Frequency Domain Representation')
+plt.xlabel('Frequency [Hz]')
+plt.ylabel('Magnitude |X| (unscaled)')
+plt.grid(True)
+plt.show()
+
+```
+
+## Fourier transform overview (numpy.fft, scipy.fft)
+The Fourier transform is the mathematical bridge between the time and frequency domains. In Python, both NumPy and SciPy can compute the DFT of data sampled in the time domain. For a discrete signal $x[n]$ of $N$ samples taken at uniform intervals, the DFT is:
+$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j \frac{2\pi}{N}kn}$$
+where:
+- $x[n]$ = $n$-th sample of the time-domain signal
+- $X[k]$ = $k$-th frequency component (a complex number)
+- $N$ = total number of samples
+- $e^{-j 2\pi kn / N}$ = complex-exponential basis functions, representing sinusoids of different frequencies
+
+Both NumPy and SciPy use the Fast Fourier Transform (FFT), an algorithm that computes exactly the DFT defined above but in $O(N \log N)$ operations instead of the $O(N^2)$ of the direct sum. In this section we use `scipy.fft`, SciPy's newer FFT module, which adds options aimed at larger data sets, such as parallel computation through its `workers` argument.
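To see that the FFT computes the same sum as the $X[k]$ equation, the sketch below evaluates that sum directly as an $N \times N$ matrix product and compares it with `scipy.fft.fft`. The helper `dft` is hypothetical. The 100 Hz sampling rate and the 10 Hz and 25 Hz test tones are assumptions, chosen so that with one second of data, bin $k$ corresponds to $k$ Hz.

```python
import numpy as np
from scipy.fft import fft

def dft(x):
    """Hypothetical helper: direct O(N^2) evaluation of
    X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)                        # time-sample index (row)
    k = n.reshape(-1, 1)                    # frequency index (column)
    W = np.exp(-2j * np.pi * k * n / N)     # N x N matrix of complex-exponential basis functions
    return W @ x

fs = 100                                    # sampling frequency [Hz] (assumed)
t = np.arange(fs) / fs                      # 1 s of data, so bin k is k Hz
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)

X_direct = dft(x)
X_fast = fft(x)
print("direct sum matches FFT:", np.allclose(X_direct, X_fast))

# The two largest positive-frequency magnitudes sit at the two tone frequencies
half = np.abs(X_fast[: len(x) // 2])
peak_bins = sorted(int(k) for k in np.argsort(half)[-2:])
print("peak bins [Hz]:", peak_bins)
```

The direct sum needs about $N^2$ complex multiplications, so it is only practical for small $N$. The FFT avoids exactly that cost.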
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.fft import fft, fftfreq
+
+# --- Create a sample signal ---
+fs = 1000                # sampling frequency [Hz]
+T = 1/fs                 # sampling period [s]
+t = np.arange(0, 1, T)   # 1 second of data
+
+# signal: two sine waves + random noise
+x = np.sin(2*np.pi*10*t) + 0.5*np.sin(2*np.pi*50*t) + 0.2*np.random.randn(len(t))
+
+# --- Compute FFT ---
+N = len(x)
+X = fft(x)
+freq = fftfreq(N, T)[:N//2]   # positive frequencies only
+
+# --- Plot results ---
+plt.figure(figsize=(12,5))
+plt.subplot(1,2,1)
+plt.plot(t, x)
+plt.title('Time Domain')
+plt.xlabel('Time [s]')
+plt.ylabel('Amplitude')
+
+plt.subplot(1,2,2)
+plt.plot(freq, 2/N * np.abs(X[:N//2]))   # scale by 2/N so peak heights match the sine amplitudes
+plt.title('Frequency Domain')
+plt.xlabel('Frequency [Hz]')
+plt.ylabel('Amplitude')
+plt.tight_layout()
+plt.show()
+
+```
+
+
+## Low-pass and high-pass filters (scipy.signal.butter, filtfilt)
+After analyzing signals in the frequency domain, we often want to keep only the frequencies that matter and remove those that don't. You may have encountered filters in electronic circuits; the same idea can be applied digitally to a sampled signal.
+
+The Butterworth filter is one of the most common digital filters because its frequency response is smooth and flat in the passband (no ripple).
+
+Its gain (magnitude response) is:
+$$|H(\omega)| = \frac{1}{\sqrt{1 + \left( \frac{\omega}{\omega_c} \right)^{2n}}}$$
+where:
+- $\omega_c$ = cut-off frequency, where the gain has dropped to $1/\sqrt{2}$ (-3 dB)
+- $n$ = filter order (a higher order gives a sharper roll-off)
+
+In Python, `scipy.signal.butter` designs this filter, and `scipy.signal.filtfilt` applies it forward and then backward, so the phase shifts of the two passes cancel (zero-phase filtering). Because the filter is applied twice, the magnitude response is effectively squared.
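Two claims above can be checked numerically: the gain is $1/\sqrt{2}$ at the cut-off, and `filtfilt` introduces no phase shift. This is a minimal sketch. It assumes a 1 kHz sampling rate, a 10 Hz cut-off, a 4th-order filter, and a 2 Hz test sine. The helper `best_lag` is hypothetical. It searches for the integer shift that best aligns the filtered output with the input, ignoring 500 samples at each end, where start-up transients are. For comparison, the sketch also runs `scipy.signal.lfilter`, which applies the filter in a single forward pass.

```python
import numpy as np
from scipy.signal import butter, filtfilt, freqz, lfilter

fs = 1000      # sampling frequency [Hz] (assumed)
cutoff = 10    # cut-off frequency [Hz] (assumed)
order = 4
b, a = butter(order, cutoff / (fs / 2), btype='low')

# Gain of the designed filter evaluated exactly at the cut-off frequency
_, h = freqz(b, a, worN=[cutoff], fs=fs)
gain_at_cutoff = abs(h[0])
print(f"|H| at {cutoff} Hz = {gain_at_cutoff:.4f}, 1/sqrt(2) = {1/np.sqrt(2):.4f}")

# Slow 2 Hz test sine, 2 s long (4 full periods)
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 2 * t)

one_pass = lfilter(b, a, x)     # forward only: output lags the input
zero_phase = filtfilt(b, a, x)  # forward + backward: phase shifts cancel

def best_lag(ref, y, max_lag=100, margin=500):
    """Hypothetical helper: integer shift d (in samples) minimizing
    sum((y[n + d] - ref[n])**2) over samples at least `margin` from either end."""
    n = np.arange(margin, len(ref) - margin)
    errors = [np.sum((y[n + d] - ref[n]) ** 2) for d in range(-max_lag, max_lag + 1)]
    return int(np.argmin(errors)) - max_lag

lag_one_pass = best_lag(x, one_pass)
lag_zero_phase = best_lag(x, zero_phase)
print("lfilter lag  [samples]:", lag_one_pass)
print("filtfilt lag [samples]:", lag_zero_phase)
```

With these settings, the single-pass output should trail the input by a few tens of samples, while the `filtfilt` output should line up with it. This is why `filtfilt` is preferred when the timing of features matters and the whole record is available.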
+## Example: Removing high-frequency noise from a displacement signal
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.signal import butter, filtfilt
+
+# --- Create a noisy signal ---
+fs = 1000                                  # sampling frequency [Hz]
+t = np.linspace(0, 1, fs, endpoint=False)  # 1 s of samples spaced exactly 1/fs apart
+signal = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*50*t)   # 5 Hz + 50 Hz components
+noisy_signal = signal + 0.3*np.random.randn(len(t))
+
+# --- Design a low-pass Butterworth filter ---
+cutoff = 10   # desired cutoff frequency [Hz]
+order = 4
+b, a = butter(order, cutoff/(fs/2), btype='low', analog=False)   # cutoff normalized to the Nyquist frequency
+
+# --- Apply the filter (zero-phase) ---
+filtered = filtfilt(b, a, noisy_signal)
+
+# --- Plot results ---
+plt.figure(figsize=(10,5))
+plt.plot(t, noisy_signal, 'gray', alpha=0.5, label='Noisy Signal')
+plt.plot(t, filtered, 'r', linewidth=2, label='Low-pass Filtered')
+plt.xlabel('Time [s]')
+plt.ylabel('Amplitude')
+plt.title('Low-pass Butterworth Filter')
+plt.legend()
+plt.show()
+
+```
+
+```python
+# --- Design a high-pass filter (reuses fs, order, t, and noisy_signal from the previous block) ---
+cutoff_hp = 20   # cutoff frequency [Hz]
+b_hp, a_hp = butter(order, cutoff_hp/(fs/2), btype='high', analog=False)
+filtered_hp = filtfilt(b_hp, a_hp, noisy_signal)
+
+# --- Plot comparison ---
+plt.figure(figsize=(10,5))
+plt.plot(t, noisy_signal, 'gray', alpha=0.5, label='Noisy Signal')
+plt.plot(t, filtered_hp, 'b', linewidth=2, label='High-pass Filtered')
+plt.xlabel('Time [s]')
+plt.ylabel('Amplitude')
+plt.title('High-pass Butterworth Filter')
+plt.legend()
+plt.show()
+
+```
+
+## Example: Removing noise from an image before further analysis (e.g., PIV)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.fft import fft2, ifft2, fftshift, ifftshift
+from skimage import data, img_as_float
+from skimage.util import random_noise
+
+# --- Load and corrupt an image ---
+image = img_as_float(data.camera())                    # grayscale test image
+noisy = random_noise(image, mode='s&p', amount=0.05)   # add salt & pepper noise
+
+# --- 2D FFT ---
+F = fft2(noisy)
+Fshift = fftshift(F)   # move zero frequency to the center
+
+# --- Build a circular low-pass mask ---
+rows, cols = noisy.shape
+crow, ccol = rows//2, cols//2
+radius = 30   # cutoff radius in the frequency domain [pixels]; smaller = stronger smoothing
+mask = np.zeros_like(noisy)
+Y, X = np.ogrid[:rows, :cols]
+dist = np.sqrt((X-ccol)**2 + (Y-crow)**2)
+mask[dist <= radius] = 1   # keep only the low frequencies near the center
+
+# --- Apply mask and inverse FFT ---
+Fshift_filtered = Fshift * mask
+F_ishift = ifftshift(Fshift_filtered)
+filtered = np.real(ifft2(F_ishift))   # discard the negligible imaginary part
+
+# --- Plot results ---
+fig, ax = plt.subplots(1, 3, figsize=(12,5))
+ax[0].imshow(noisy, cmap='gray')
+ax[0].set_title('Noisy Image')
+ax[1].imshow(np.log(1+np.abs(Fshift)), cmap='gray')
+ax[1].set_title('FFT Magnitude Spectrum')
+ax[2].imshow(filtered, cmap='gray')
+ax[2].set_title('Low-pass Filtered Image')
+for axis in ax:
+    axis.axis('off')
+plt.tight_layout()
+plt.show()
+```
+
+## Problem 1: Frequency content of a noisy signal
+Generate a synthetic signal (the sum of two sine waves plus random noise). Apply a moving average, and use the FFT to show the signal's frequency components.
+
+
+## Problem 2: Butterworth low-pass filter
+Design a Butterworth low-pass filter to isolate the fundamental frequency of a vibration signal (e.g., from rotating machinery). Plot the signal before and after filtering.
\ No newline at end of file
diff --git a/tutorials/module_4/4.7 Data Visualization and Presentation.md b/tutorials/module_4/4.7 Data Visualization and Presentation.md
index 3b53ff0..f11f401 100644
--- a/tutorials/module_4/4.7 Data Visualization and Presentation.md
+++ b/tutorials/module_4/4.7 Data Visualization and Presentation.md
@@ -17,16 +17,3 @@
 ## How to represent data scientifically
-
-
-
-:
-
-
-
-
-
-
-
-
-## Taking it further with R
\ No newline at end of file
