path: root/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
Diffstat (limited to 'tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md')
-rw-r--r-- tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md | 55
1 files changed, 50 insertions, 5 deletions
diff --git a/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
index 8327006..d52c33c 100644
--- a/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
+++ b/tutorials/module_4/4.1 Introduction to Data and Scientific Datasets.md
@@ -14,16 +14,16 @@ We may collect this in the following ways:
- **Experiments** – temperature readings from thermocouples, strain or force from sensors, vibration accelerations, or flow velocities.
- **Simulations** – outputs from finite-element or CFD models such as pressure, stress, or temperature distributions.
- **Instrumentation and sensors** – digital or analog signals from transducers, encoders, or DAQ systems.
-
-### Introduction to pandas
-`pandas` (**Pan**el **Da**ta) is a Python library designed for **data analysis and manipulation**, widely used in engineering, science, and data analytics. It provides two core data structures: the **Series** and the **DataFrame**.
+## Introduction to pandas
+`pandas` (**Pan**el **Da**ta) is a Python library designed for data analysis and manipulation, widely used in engineering, science, and data analytics. It provides two core data structures: the **Series** and the **DataFrame**.
A `Series` represents a single column or one-dimensional labeled array, while a `DataFrame` is a two-dimensional table of data, similar to a spreadsheet table, where each column is a `Series` and each row has a labeled index.
DataFrames can be created from dictionaries, lists, NumPy arrays, or imported from external files such as CSV or Excel. Once data is loaded, you can **view and explore** it using methods like `head()`, `tail()`, and `describe()`. Data can be **selected by label** or **by position**. These indexing systems make it easy to slice, filter, and reorganize datasets efficiently.
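
As a minimal sketch of the ideas above, the following builds a small DataFrame from a dictionary, previews it, and reads one value by label and by position. The readings and the row labels (`s0` to `s3`) are invented for illustration and are not from the course data; `loc` and `iloc` are the standard pandas accessors for label-based and position-based selection.

```python
import pandas as pd

# Hypothetical readings, invented for illustration only
df_demo = pd.DataFrame(
    {
        "Time_s": [0.0, 0.1, 0.2, 0.3],
        "Force_N": [0.0, 12.5, 24.8, 37.1],
    },
    index=["s0", "s1", "s2", "s3"],  # explicit row labels
)

# Each column of a DataFrame is a Series
force_series = df_demo["Force_N"]

# Quick look at the first rows and the summary statistics
print(df_demo.head(2))
print(df_demo.describe())

# Select the same cell by label and by position
by_label = df_demo.loc["s1", "Force_N"]
by_position = df_demo.iloc[1, 1]
print(by_label, by_position)
```

Both selections return the same cell: `loc` addresses it through the row and column labels, `iloc` through zero-based integer positions.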
-### Problem 1: Import a text file
+### Problem 1: Create a dataframe from a text file
+Given the file `force_displacement_data.txt`, use pandas to tabulate the data into a DataFrame.
```python
import pandas as pd
@@ -38,7 +38,7 @@ df_txt = pd.read_csv(
)
print("\n=== Basic Statistics ===")
print(df_txt.describe())
if "Force_N" in df_txt.columns:
    print("\nFirst five Force readings:")
@@ -59,6 +59,51 @@ except ImportError:
```
+## Subsetting and Conditional Filtering
+You can select a single column, several columns at once, or only the rows that satisfy a condition.
+
+```python
+# Select a column
+force = df["Force_N"]
+
+# Select multiple columns
+subset = df[["Time_s", "Force_N"]]
+
+# Conditional filtering
+df_high_force = df[df["Force_N"] > 50]
+```
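
Conditions can also be combined. The sketch below assumes a small, made-up `df` with the same column names as above. It builds a boolean mask from two conditions and uses `loc` to pick rows and columns in one step. Each condition needs its own parentheses, and `&` (not `and`) joins them element-wise.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame(
    {
        "Time_s": [0.0, 0.5, 1.0, 1.5, 2.0],
        "Force_N": [10.0, 35.0, 55.0, 72.0, 48.0],
    }
)

# Rows where the force exceeds 50 N
df_high_force = df[df["Force_N"] > 50]

# Combine two conditions: high force AND early in the test
mask = (df["Force_N"] > 50) & (df["Time_s"] < 1.5)

# Filter rows with the mask and choose columns in the same call
early_high = df.loc[mask, ["Time_s", "Force_N"]]
print(early_high)
```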
+
+
+![[Pasted image 20251013064718.png]]
+
+## Combining and Merging Datasets
+Often, data from multiple sensors or experiments must be merged into one dataset for analysis.
+
+```python
+# Merge on a common column (e.g., time)
+merged = pd.merge(df_force, df_temp, on="Time_s")
+
+# Stack multiple test runs vertically
+combined = pd.concat([df_run1, df_run2], axis=0)
+```
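
A runnable sketch of both operations, using small made-up frames with the same names as above (the values are assumptions for illustration). By default `pd.merge` keeps only rows whose key appears in both frames (an inner join). Passing `how="outer"` keeps every time stamp and fills the gaps with `NaN`. For `pd.concat`, `ignore_index=True` renumbers the rows so that indices from the individual runs do not repeat.

```python
import pandas as pd

# Hypothetical sensor logs, invented for illustration only
df_force = pd.DataFrame({"Time_s": [0.0, 1.0, 2.0], "Force_N": [10.0, 20.0, 30.0]})
df_temp = pd.DataFrame({"Time_s": [1.0, 2.0, 3.0], "Temp_C": [21.5, 22.0, 22.4]})

# Inner join (default): only time stamps present in both frames
merged = pd.merge(df_force, df_temp, on="Time_s")

# Outer join: keep every time stamp, missing readings become NaN
merged_all = pd.merge(df_force, df_temp, on="Time_s", how="outer")

# Two hypothetical test runs with identical columns
df_run1 = pd.DataFrame({"Time_s": [0.0, 1.0], "Force_N": [5.0, 6.0]})
df_run2 = pd.DataFrame({"Time_s": [0.0, 1.0], "Force_N": [7.0, 8.0]})

# Stack vertically and renumber the rows 0..n-1
combined = pd.concat([df_run1, df_run2], axis=0, ignore_index=True)

print(merged)
print(merged_all)
print(combined)
```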
+
+
+## Problem 2: Describe a dataset
+Use the built-in pandas `describe()` method to report the statistical mean of the given experimental data.
+
+```python
+import matplotlib.pyplot as plt
+
+plt.plot(df["Time_s"], df["Force_N"])
+plt.xlabel("Time (s)")
+plt.ylabel("Force (N)")
+plt.title("Force vs. Time")
+plt.show()
+```
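
To report the mean itself, one possible approach is to read it out of the table that `describe()` returns. This is a sketch on made-up values, not the course dataset. The summary is a DataFrame whose rows are the statistics (`count`, `mean`, `std`, ...), so a single statistic can be selected by label.

```python
import pandas as pd

# Hypothetical experimental data, invented for illustration only
df = pd.DataFrame(
    {
        "Time_s": [0.0, 1.0, 2.0, 3.0],
        "Force_N": [10.0, 20.0, 30.0, 40.0],
    }
)

# describe() returns a DataFrame indexed by statistic name
summary = df.describe()
print(summary)

# Pull the mean of one column out of the summary table
mean_force = summary.loc["mean", "Force_N"]
print(f"Mean force: {mean_force:.2f} N")
```

The same number is available directly from `df["Force_N"].mean()`. `describe()` is convenient when several statistics are needed at once.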
+
+
+
+
**Activities & Examples:**
- Load small CSV datasets using `numpy.loadtxt()` and `pandas.read_csv()`
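
A minimal sketch of the loading activity, assuming a tiny comma-separated dataset with one header row. The text is held in memory with `io.StringIO` so the example runs without a file on disk; a file path can be passed to either function instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content, invented for illustration only
csv_text = "Time_s,Force_N\n0.0,10.0\n1.0,20.0\n2.0,30.0\n"

# NumPy: returns a plain 2-D float array, so the header row must be skipped
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# pandas: returns a DataFrame and uses the header row as column names
df_csv = pd.read_csv(io.StringIO(csv_text))

print(arr.shape)
print(df_csv.columns.tolist())
```

`numpy.loadtxt()` suits purely numeric tables, while `pandas.read_csv()` keeps the column names and handles mixed data types.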