1 files changed, 67 insertions, 42 deletions
diff --git a/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
index d52c33c..3ad34e4 100644
--- a/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
+++ b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
@@ -3,7 +3,6 @@
 **Learning objectives:**
 
 - Understand what makes data “scientific” (units, precision, metadata)
-- Recognize types of data: time-series, experimental, simulation, and imaging data
 - Identify challenges in data processing (missing data, noise, outliers)
 - Overview of the data-analysis workflow
 ---
@@ -14,6 +13,11 @@ We may collect this in the following ways:
 - **Experiments** – temperature readings from thermocouples, strain or force from sensors, vibration accelerations, or flow velocities.
 - **Simulations** – outputs from finite-element or CFD models such as pressure, stress, or temperature distributions.
 - **Instrumentation and sensors** – digital or analog signals from transducers, encoders, or DAQ systems.
+## Data Processing Workflow
+```mermaid
+flowchart
+	A[Collecting] --> B[Cleaning & Filtering] --> C[Analysis] --> D[Visualization]
+```
 ## Introduction to pandas
 `pandas` (**Pan**el **Da**ta) is a Python library designed for data analysis and manipulation, widely used in engineering, science, and data analytics. It provides two core data structures: the **Series** and the **DataFrame**. 
 
@@ -21,45 +25,26 @@ A `Series` represents a single column or one-dimensional labeled array, while
 
 DataFrames can be created from dictionaries, lists, NumPy arrays, or imported from external files such as CSV or Excel. Once data is loaded, you can **view and explore** it using methods like `head()`, `tail()`, and `describe()`. Data can be **selected by label** or **by position**. These indexing systems make it easy to slice, filter, and reorganize datasets efficiently.
 
+### Problem 1: Create a dataframe from an array
+Given the data `force_N` and `time_s`, use pandas to tabulate them into a DataFrame.
 
-### Problem 1: Create a dataframe from a text file
-Given the the file `force_displacement_data.txt`. Use pandas to tabulate the data into a dataframe
 ```python
 import pandas as pd
 
-file_path = "force_displacement_data.txt"
+force_N = [10, 20, 30, 25, 15]
+time_s = [0, 1, 2, 3, 4]
 
-df_txt = pd.read_csv(
-    file_path,
-    delim_whitespace=True,
-    comment="#",
-    skiprows=0,
-    header=0
-)
+df = pd.DataFrame({
+    'Time (s)': time_s,
+    'Force (N)': force_N
+})
 
 print("\n=== Basic Statistics ===")
-print(df_txt.describe())232
-
-if "Force_N" in df_txt.columns:
-    print("\nFirst five Force readings:")
-    print(df_txt["Force_N"].head())
-
-try:
-    import matplotlib.pyplot as plt
-
-    plt.plot(df_txt.iloc[:, 0], df_txt.iloc[:, 1])
-    plt.xlabel(df_txt.columns[0])
-    plt.ylabel(df_txt.columns[1])
-    plt.title("Loaded Data from Text File")
-    plt.grid(True)
-    plt.show()
-
-except ImportError:
-    print("\nmatplotlib not installed — skipping plot.")
-
+print(df.describe())
 ```
-
-## Subsetting and Conditional filtering
+Notice how the `describe()` function outputs summary statistics that we may find useful.
+### Manipulating dataframes
+#### Subsets and Conditional filtering
 You can select rows, columns, or specific conditions from a DataFrame.
 
 ```python
@@ -73,10 +58,9 @@ subset = df[["Time_s", "Force_N"]]
 df_high_force = df[df["Force_N"] > 50]
 ```
 
-
 ![[Pasted image 20251013064718.png]]
 
-## Combining and Merging Datasets
+#### Combining and Merging Datasets
 Often, multiple sensors or experiments must be merged into one dataset for analysis.
 
 ```python
@@ -88,21 +72,62 @@ combined = pd.concat([df_run1, df_run2], axis=0)
 ```
 
 
-## Problem 1: Describe a dataset
-Use pandas built-in describe data to report on the statistical mean of the given experimental data.
+https://pandas.pydata.org/docs/user_guide/merging.html
+
+#### Creating new columns based on existing ones
+<img src="image_1761156588079.png" width="600">
 
+Much like Excel, pandas lets you manipulate columns by referring to their header names. In this example, we multiply a DataFrame column by a constant.
 ```python
-import matplotlib.pyplot as plt
+air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
+```
 
-plt.plot(df["Time_s"], df["Force_N"])
-plt.xlabel("Time (s)")
-plt.ylabel("Force (N)")
-plt.title("Force vs. Time")
-plt.show()
+We may also want to compute a new column as a function of other columns. We can do so by applying a mathematical operation to them, as follows:
+```python
+air_quality["ratio_paris_antwerp"] = (
+    air_quality["station_paris"] / air_quality["station_antwerp"]
+)
 ```
 
+https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
 
+https://pandas.pydata.org/docs/user_guide/reshaping.html
 
+### Problem 2: Create a dataframe from a text file
+Given the file `force_displacement_data.txt`, use pandas to tabulate the data into a DataFrame.
+```python
+import pandas as pd
+
+file_path = "force_displacement_data.txt"
+
+df_txt = pd.read_csv(
+    file_path,
+    sep=r"\s+",  # whitespace-delimited; replaces the deprecated delim_whitespace=True
+    comment="#",
+    skiprows=0,
+    header=0
+)
+
+print("\n=== Basic Statistics ===")
+print(df_txt.describe())
+
+if "Force_N" in df_txt.columns:
+    print("\nFirst five Force readings:")
+    print(df_txt["Force_N"].head())
+
+try:
+    import matplotlib.pyplot as plt
+
+    plt.plot(df_txt.iloc[:, 0], df_txt.iloc[:, 1])
+    plt.xlabel(df_txt.columns[0])
+    plt.ylabel(df_txt.columns[1])
+    plt.title("Loaded Data from Text File")
+    plt.grid(True)
+    plt.show()
+
+except ImportError:
+    print("\nmatplotlib not installed — skipping plot.")
+```
 
 
 **Activities & Examples:**