author     Christian Kolset <christian.kolset@gmail.com>    2025-10-24 17:22:07 -0600
committer  Christian Kolset <christian.kolset@gmail.com>    2025-10-24 17:22:07 -0600
commit     eb0ee1f0d51d33666376552e610de15f233167f5 (patch)
tree       b39df8b924ea6fadc318c9af3b9055508f84c03c /tutorials/module_4
parent     0ecf562bdf2bddb16593fc5402f0c147e2bf7fac (diff)
Added signal processing information to data_dump
Diffstat (limited to 'tutorials/module_4')
-rw-r--r--  tutorials/module_4/2_data_processing.md             136
-rw-r--r--  tutorials/module_4/4.5 Statistical Analysis II.md      4
2 files changed, 130 insertions(+), 10 deletions(-)
diff --git a/tutorials/module_4/2_data_processing.md b/tutorials/module_4/2_data_processing.md
index eba7a12..d01532c 100644
--- a/tutorials/module_4/2_data_processing.md
+++ b/tutorials/module_4/2_data_processing.md
@@ -1,6 +1,36 @@
# Data Processing
-## Signal Processing - Filtering
+## Data Filtering
+
+Filtering is a signal-processing technique for removing unwanted parts of a signal within a certain frequency range. A low-pass filter removes everything above a chosen cut-off frequency; a high-pass filter does the opposite. Combining a low-pass and a high-pass filter gives a band-pass filter, which keeps only the content between a pair of frequencies.
+
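+As a minimal sketch of that last idea (assuming NumPy and SciPy, with an illustrative 500 Hz sampling rate and arbitrary cutoffs), a band-pass effect can be built by running a high-pass and a low-pass in sequence:
+
+```python
+import numpy as np
+from scipy import signal
+
+fs = 500                               # example sampling rate (Hz)
+t = np.arange(0, 1.0, 1 / fs)
+x = (np.sin(2 * np.pi * 3 * t)         # low-frequency component
+     + np.sin(2 * np.pi * 8 * t)       # component we want to keep
+     + np.sin(2 * np.pi * 60 * t))     # high-frequency component
+
+# High-pass at 5 Hz removes the 3 Hz component...
+bh, ah = signal.butter(4, 5, fs=fs, btype='high')
+# ...then low-pass at 20 Hz removes the 60 Hz component.
+bl, al = signal.butter(4, 20, fs=fs, btype='low')
+x_band = signal.filtfilt(bl, al, signal.filtfilt(bh, ah, x))  # keeps ~8 Hz
+```
+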
+While techniques vary by field, three fundamental aspects define how data filtering is applied:
+- **Noise reduction** removes unwanted variations or distortions that can obscure meaningful information, improving data clarity and consistency.
+- **Relevance filtering** selects only the most useful data based on specific criteria, ensuring that analytics and decision-making focus on high-value information.
+- **Data smoothing and transformation** reduces abrupt fluctuations and refines raw data, making it easier to identify trends and patterns in time-series analysis and predictive modeling.
+
+### Example: Medical Imaging
+In [medical image processing](https://www.mathworks.com/help/medical-imaging/index.html), data filtering is essential for producing clearer scans. For example, MRI and CT scans use filters to reduce noise caused by movement or interference, making it easier for radiologists to detect abnormalities. Without filtering, critical details could be lost in background noise, potentially leading to misdiagnosis.
+
+## Best Practices for Effective Data Filtering
+Effective data filtering is crucial to maintaining data accuracy and reliability. Key best practices for high-quality results are:
+- **Understanding your data:** Before applying any filters, analyze the structure and characteristics of your data set. This step includes identifying noise, missing values, and outliers to choose the most suitable filtering techniques.
+- **Choosing the right filter:** Select filters that align with your analysis goals. For example, use frequency-based filters for noise reduction, smoothing filters for trend preservation, and rule-based filters for outlier detection.
+- **Preserving data integrity:** Avoid over-filtering, which can remove important insights. Focus on improving accuracy while maintaining essential data and patterns.
+- **Evaluating filtered data:** Always assess the effectiveness of your filtering. Compare raw versus filtered data, visualize the results, and use statistical metrics to confirm accuracy and reliability (a short before-and-after sketch follows this list).
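+
+To make that last point concrete, here is a minimal sketch (assuming NumPy, with a synthetic 5 Hz sine as the known reference and a 10-sample moving average standing in for the filter) of comparing raw and filtered data with a simple error metric:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+t = np.arange(0, 1.0, 1 / 500)
+clean = np.sin(2 * np.pi * 5 * t)                    # known reference signal
+noisy = clean + 0.5 * rng.standard_normal(t.size)    # add Gaussian noise
+
+# Stand-in filter: 10-sample moving average
+smoothed = np.convolve(noisy, np.ones(10) / 10, mode='same')
+
+# Root-mean-square error against the known reference, before and after
+rmse_before = np.sqrt(np.mean((noisy - clean) ** 2))
+rmse_after = np.sqrt(np.mean((smoothed - clean) ** 2))
+print(f"RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
+```
+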
+### Types of Data Filtering Methods
+Different filtering methods are used depending on the data set type, the nature of the noise, and the desired outcome—whether it’s removing interference, [detecting anomalies](https://www.mathworks.com/discovery/anomaly-detection.html), or smoothing fluctuations in time-series data. Choosing the right filter ensures cleaner, more reliable data for analysis and decision-making.
+
+**Key Data Filtering Methods**
+
+| **Filtering Method** | **Types of Filters** | **Purpose** | **Applications** |
+| --------------------------------------------------- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
+| Frequency-based filters (signal processing filters) | - Low-pass<br>- High-pass<br>- Bandpass<br>- Bandstop (notch) | Remove or retain specific frequency components in data | Noise reduction (e.g., low-pass filtering to remove high-frequency noise), image processing, sensor data analysis |
+| Smoothing filters (statistical methods) | - Median filter<br>- Moving average<br>- Gaussian filter<br>- Exponential smoothing | Smooth data by reducing noise and variability | Time-series smoothing, image processing, outlier removal |
+| Rule-based filters (conditional filtering) | - Threshold filters (e.g., greater than, less than)<br>- Rule-based filters | Filter data based on predefined logical conditions | Data cleaning, outlier detection, quality control |
+| Trend-based filters (time-series methods) | - Hodrick-Prescott filter<br>- Kalman filter<br>- Wavelet filter | Identify trends, remove seasonality, smooth fluctuations | Stock market analysis, climate data, sensor monitoring |
+| Machine learning–based filters | - Anomaly detection algorithms<br>- Autoencoders<br>- Clustering-based filtering | Use AI and machine learning to detect and remove noisy or irrelevant data | Fraud detection, predictive maintenance, automated data cleaning |
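+
+As a quick illustration of the rule-based row above, here is a minimal sketch (assuming NumPy; the three-standard-deviation threshold and the injected outliers are illustrative) of filtering out-of-range points with a simple condition:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(1)
+values = rng.normal(loc=10.0, scale=2.0, size=1000)
+values[::100] += 25                                 # inject a few artificial outliers
+
+# Keep only points within three standard deviations of the mean
+mask = np.abs(values - values.mean()) < 3 * values.std()
+filtered_values = values[mask]
+print(f"Removed {values.size - filtered_values.size} outliers")
+```
+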
+### Moving Average
### Low-Pass
@@ -8,24 +38,112 @@
### Band-Pass
-## Data Filtering
+## Example: Filtering Audio Data
+```python
+from scipy import signal
+from scipy.io import wavfile
+
+fs, data = wavfile.read('input.wav')      # fs is the sampling rate in Hz
+
+# 6th-order Butterworth low-pass with a 1 kHz cutoff, applied with zero phase
+b, a = signal.butter(6, 1000, fs=fs, btype='low')
+filtered = signal.filtfilt(b, a, data)    # for stereo data, pass axis=0
+
+wavfile.write('output.wav', fs, filtered.astype(data.dtype))
+```
+## Generating a Noisy Example Signal
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+fs = 500  # Sampling frequency (Hz)
+t = np.arange(0, 1.0, 1.0/fs)  # 1 second of data
+freq = 5  # Frequency of the signal (Hz)
+x = np.sin(2 * np.pi * freq * t)  # Pure tone
+
+# Add Gaussian noise
+np.random.seed(0)
+x_noisy = x + 0.5 * np.random.randn(len(t))
+plt.plot(t, x_noisy)
+plt.title("Noisy Signal")
+plt.xlabel("Time [s]")
+plt.ylabel("Amplitude")
+plt.show()
+```
-## Data Cleaning
+## Filtering Signals: Butterworth Low-Pass Example
+You can easily filter out high-frequency noise using a Butterworth low-pass filter. Here’s a practical workflow:
-### Empty Cells
+#### **Design the filter**
+```python
+cutoff = 10  # Desired cutoff frequency (Hz)
+order = 4    # Filter order (steepness)
+b, a = signal.butter(order, cutoff, fs=fs, btype='low')
+```
-Remove data point - `df.dropna()`
-Replace data point - `fillna(130, inplace = True)`
-We can use this to replace each data point with mean, median or mode -
+#### **Apply the filter (zero-phase for no time shift)**
```python
-x = df["Calories"].mean()
-df.fillna({"Calories": x}, inplace=True)
+x_filtered = signal.filtfilt(b, a, x_noisy)
```
+#### **Plot to compare before and after**
+
+```python
+plt.plot(t, x_noisy, label="Noisy")
+plt.plot(t, x_filtered, label="Filtered", linewidth=2)
+plt.legend()
+plt.title("Signal Before and After Low-Pass Filtering")
+plt.xlabel("Time [s]")
+plt.show()
+```
+
+**Tip:** Use `filtfilt` for zero-phase filtering. If you use `lfilter` instead, the output will be shifted in time.
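+
+To see the difference, you can run both on the same signal (this reuses `b`, `a`, `t`, and `x_noisy` from the blocks above) and compare the plots; the `lfilter` output lags behind:
+
+```python
+x_causal = signal.lfilter(b, a, x_noisy)         # causal filtering: introduces a delay
+x_zero_phase = signal.filtfilt(b, a, x_noisy)    # forward-backward: no delay
+
+plt.plot(t, x_zero_phase, label="filtfilt (zero-phase)")
+plt.plot(t, x_causal, label="lfilter (delayed)")
+plt.legend()
+plt.show()
+```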
+
+### Visualizing Frequency Content (Power Spectrum)
+
+You don’t have to rely on a plain FFT. For a more robust picture of the frequency content, use Welch’s method:
+
+```python
+f, Pxx = signal.welch(x_noisy, fs, nperseg=256)
+plt.semilogy(f, Pxx)
+plt.title("Power Spectral Density (Welch's Method)")
+plt.xlabel("Frequency [Hz]")
+plt.ylabel("PSD")
+plt.show()
+```
+
+Welch’s method averages over segments, giving you a smoother, more robust estimate than a plain FFT, especially when your data is noisy or short.
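+
+One practical use of the Welch output (reusing `f` and `Pxx` from the block above) is to read off the dominant frequency directly:
+
+```python
+dominant_freq = f[np.argmax(Pxx)]                       # frequency bin with the highest power
+print(f"Dominant frequency: {dominant_freq:.1f} Hz")    # close to 5 Hz for this example
+```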
+
+---
+
+### Peak Detection in Signals
+
+To find peaks (local maxima), such as heartbeats in ECG or clicks in audio:
+
+```python
+peaks, _ = signal.find_peaks(x_filtered, height=0.5)
+plt.plot(t, x_filtered)
+plt.plot(t[peaks], x_filtered[peaks], "x")
+plt.title("Detected Peaks")
+plt.show()
+```
+
+You can fine-tune detection with parameters like `height`, `distance`, and `prominence` to match your data’s characteristics.
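+
+For example, requiring a minimum spacing and prominence (the values below are illustrative and reuse `x_filtered`, `t`, and `fs` from above) helps reject small ripples:
+
+```python
+peaks, props = signal.find_peaks(
+    x_filtered,
+    height=0.5,          # minimum peak height
+    distance=fs // 10,   # at least 0.1 s between neighbouring peaks
+    prominence=0.3,      # how much a peak must stand out from its surroundings
+)
+print(f"Found {len(peaks)} peaks at t = {t[peaks]}")
+```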
+
+### Creating and Applying Custom Filters
+
+Suppose you want a band-pass filter. Design it and apply it just as before:
+
+```python
+lowcut = 2
+highcut = 15
+b, a = signal.butter(order, [lowcut, highcut], fs=fs, btype='band')
+x_band = signal.filtfilt(b, a, x_noisy)
+```
+
+For more control, use different filter types (`cheby1`, `ellip`, etc.), or design FIR filters with `firwin`:
+
+```python
+numtaps = 101  # Filter length
+fir_coeff = signal.firwin(numtaps, [lowcut, highcut], fs=fs, pass_zero='bandpass')
+x_fir = signal.filtfilt(fir_coeff, 1.0, x_noisy)
+```
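+
+A Chebyshev Type I band-pass, as one of the alternative IIR types mentioned above, might look like this (a sketch reusing `order`, `lowcut`, `highcut`, `fs`, and `x_noisy`; the 1 dB passband ripple is an illustrative choice):
+
+```python
+b_ch, a_ch = signal.cheby1(order, 1, [lowcut, highcut], fs=fs, btype='band')
+x_cheby = signal.filtfilt(b_ch, a_ch, x_noisy)
+```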
+
+### Denoising and Smoothing
+
+Quick moving average:
+
+```python
+window = np.ones(10) / 10
+x_smooth = np.convolve(x_noisy, window, mode='same')
+```
+
+Or use the built-in Savitzky-Golay filter, which is better at preserving peak shape:
+
+```python
+x_savgol = signal.savgol_filter(x_noisy, window_length=51, polyorder=3)
+```
+
+### Common Pitfalls and Best Practices
+- Always check filter stability and output. High-order filters or poorly chosen cutoff frequencies can cause ringing or instability (a quick check is sketched below).
+- When using `filtfilt`, your signal must be longer than the default padding (three times the filter length); otherwise you’ll get edge artifacts or an error.
+- Choose the correct filter type for your data and application. FIR filters are always stable but may require longer filter lengths.
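+
+One way to guard against the first pitfall is to check the poles of an IIR design, or to switch to a second-order-sections (SOS) representation, which behaves better numerically at higher orders. A rough sketch (reusing `a`, `cutoff`, `fs`, and `x_noisy` from above):
+
+```python
+# All pole magnitudes must be below 1 for a stable IIR filter
+poles = np.roots(a)
+print("Stable:", bool(np.all(np.abs(poles) < 1)))
+
+# Safer alternative for higher orders: design and apply in SOS form
+sos = signal.butter(8, cutoff, fs=fs, btype='low', output='sos')
+x_sos = signal.sosfiltfilt(sos, x_noisy)
+```
+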
###
diff --git a/tutorials/module_4/4.5 Statistical Analysis II.md b/tutorials/module_4/4.5 Statistical Analysis II.md
index 3df558b..458bada 100644
--- a/tutorials/module_4/4.5 Statistical Analysis II.md
+++ b/tutorials/module_4/4.5 Statistical Analysis II.md
@@ -1,7 +1,7 @@
# 4.5 Statistical Analysis II
As mentioned in the previous tutorial, data is what gives us the basis to create models. By now you've probably used Excel to create a line of best fit. In this tutorial, we will go deeper into how this works and how we can apply it to create our own models and make our own predictions.
 File changes in local repository
=======
- File changes in remote reposito
+ File changes in remote repository
## Least Square Regression and Line of Best Fit
@@ -10,6 +10,7 @@ As mentioned in the previous tutorial. Data is what gives us the basis to create
Linear regression is one of the most fundamental techniques in data analysis. It models the relationship between two (or more) variables by fitting a **straight line** that best describes the trend in the data.
+
### Linear
To find a linear regression line we can apply the
@@ -67,6 +68,7 @@ plt.show()
Using the
+
## Extrapolation
basis funct
## Moving average