# Noise Monitoring Database - Data Structure Documentation

## Overview

This database contains environmental noise measurement data collected from live music events and outdoor performances. 
The data includes event metadata, band performance schedules, measurement configurations, and time-series sensor data with frequency spectrum analysis.

**Database Name:** `simu`
**Database System:** PostgreSQL
**Total Tables:** 5
**Total Records:** ~9.8 million rows
**Export File:** `simu.sql` (8.1 GB)

---

## Database Schema

### Entity Relationship Diagram

```
┌─────────────┐
│   event     │
│             │
│ • id (PK)   │
│ • event_name│
│ • year      │
│ • ...       │
└──────┬──────┘
       │
       │ 1:N
       │
   ┌───┴────────────────────┐
   │                        │
   │                        │
┌──▼──────────┐      ┌──────▼────────┐
│    band     │      │ mcore_config  │
│             │      │               │
│ • id (PK)   │      │ • id (PK)     │
│ • event_id  │      │ • event_id    │
│ • band_name │      │ • mcore_id    │
│ • bandinfo  │      │ • mcore_loc...│
└──────┬──────┘      │ • foh         │
       │             └───────┬───────┘
       │ 1:N                 │
       │                     │ 1:N
       │                     │ (via mcore_id)
┌──────▼──────────┐   ┌──────▼──────────────────┐
│ band_schedule   │   │     sensordata          │
│                 │   │                         │
│ • id (PK)       │   │ • timestamp (PK)        │
│ • band_id       │   │ • mcore_id              │
│ • time_start    │   │ • laeq, lceq, lcpeak    │
│ • time_stop     │   │ • lzterz_* (spectrum)   │
└─────────────────┘   └─────────────────────────┘
```

---

## Table Structures

### 1. `event` - Event Metadata

Stores information about music events, concerts, and performances where noise measurements were conducted.

**Row Count:** 378

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | INTEGER | NO | Primary key, auto-increment |
| `event_name` | TEXT | YES | Name/identifier of the event |
| `year` | INTEGER | YES | Year when the event took place |
| `sub_alignment` | TEXT | YES | Event sub-classification |
| `event_type` | TEXT | YES | Type of event (e.g., concert, festival) |
| `venue_type` | TEXT | YES | Venue classification (indoor/outdoor) |
| `audience` | INTEGER | YES | Expected or actual audience size |

**Example Data:**
```sql
id | event_name               | year | sub_alignment | event_type | venue_type | audience
---+-------------------------+------+---------------+------------+------------+---------
3  | Event_indoor_2023_1     | 2023 | standard      | other      | indoor     | 0
4  | Event_Unknown_2018_1    | 2018 | standard      |            |            | 0
```

**Relationships:**
- Referenced by: `band.event_id`, `mcore_config.event_id`

---

### 2. `band` - Band/Performer Information

Contains information about bands or performers that played at the events.

**Row Count:** 708

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | INTEGER | NO | Primary key, auto-increment |
| `event_id` | INTEGER | YES | Foreign key to `event.id` |
| `band_name` | TEXT | YES | Name/identifier of the band |
| `bandinfo` | TEXT | YES | Genre or additional band information |

**Example Data:**
```sql
id | event_id | band_name       | bandinfo
---+----------+----------------+----------
2  | 4        | Band_pop_1     | pop
8  | 10       | Band_rock_1    | rock
10 | 12       | Band_hip_hop_1 | hip_hop
```

**Relationships:**
- References: `event.id` via `event_id`
- Referenced by: `band_schedule.band_id`

---

### 3. `band_schedule` - Performance Schedule

Records the start and end times of band performances during events.

**Row Count:** 659

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | INTEGER | NO | Primary key, auto-increment |
| `band_id` | INTEGER | YES | Foreign key to `band.id` |
| `time_start` | TIMESTAMP | YES | Performance start time (without timezone) |
| `time_stop` | TIMESTAMP | YES | Performance end time (without timezone) |

**Example Data:**
```sql
id   | band_id | time_start          | time_stop
-----+---------+--------------------+--------------------
1086 | 812     | 2025-06-27 18:11:24| 2025-06-27 18:55:10
830  | 496     | 2024-08-17 16:55:54| 2024-08-17 17:22:14
```

**Relationships:**
- References: `band.id` via `band_id`

**Usage:** Links performance timing to sensor data for analysis of noise levels during specific performances.

---

### 4. `mcore_config` - Measurement Configuration

Defines measurement points (sensor locations) for each event. Each row represents one sensor placement.

**Row Count:** 892

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | INTEGER | NO | Primary key, auto-increment |
| `event_id` | INTEGER | YES | Foreign key to `event.id` |
| `mcore_location` | TEXT | YES | **Anonymized** location identifier |
| `foh` | BOOLEAN | YES | `TRUE` if Front-of-House (infield), `FALSE` if residential/outdoor |
| `mcore_id` | INTEGER | YES | Unique identifier for the measurement device |
| `mic_height` | DOUBLE PRECISION | YES | Microphone height in meters |
| `event_distance` | DOUBLE PRECISION | YES | Distance from event source in meters |

**Location Naming Convention:**
- **FOH (Front-of-House):** `infield_1`, `infield_2`, ..., `infield_39`
  - Measurement points at or near the sound source (stage, mixing desk area)
- **Residential/Outdoor:** `residential_A`, `residential_B`, ..., `residential_CP`
  - Measurement points in surrounding areas (residential, monitoring locations)

**Example Data:**
```sql
id | event_id | mcore_location  | foh   | mcore_id | mic_height | event_distance
---+----------+----------------+-------+----------+------------+---------------
1  | 3        | infield_36     | TRUE  | 20002    | 2.0        | 30.0
3  | 4        | residential_CO | FALSE | 10002    | 0.0        | 510.0
4  | 4        | residential_AY | FALSE | 10003    | 20.0       | 530.0
```

**Relationships:**
- References: `event.id` via `event_id`
- Referenced by: `sensordata.mcore_id`

**Important Notes:**
- Same `mcore_location` value means same physical location across different events
- `mcore_id` identifies the specific measurement device used
- Location names are **anonymized** - original street addresses have been replaced

---

### 5. `sensordata` - Sensor Measurements (Time-Series)

Contains high-resolution acoustic measurement data collected from sensors. Includes overall sound levels and frequency spectrum analysis.

**Row Count:** 9,793,768

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `timestamp` | TIMESTAMPTZ | NO | Measurement timestamp (with timezone) |
| `mcore_id` | INTEGER | YES | Foreign key to `mcore_config.mcore_id` |
| `laeq` | SMALLINT | YES | A-weighted equivalent continuous sound level (dB × 100) |
| `lceq` | SMALLINT | YES | C-weighted equivalent continuous sound level (dB × 100) |
| `lcpeak` | SMALLINT | YES | C-weighted peak sound level (dB × 100) |
| `lafmax` | SMALLINT | YES | A-weighted maximum sound level (dB × 100) |
| `laft` | SMALLINT | YES | A-weighted fast time-weighted sound level (dB × 100) |
| `lzterz_6_3hz` | SMALLINT | YES | Z-weighted 1/3-octave band level at 6.3 Hz (dB × 100) |
| `lzterz_8hz` | SMALLINT | YES | Z-weighted 1/3-octave band level at 8 Hz (dB × 100) |
| `lzterz_10hz` | SMALLINT | YES | Z-weighted 1/3-octave band level at 10 Hz (dB × 100) |
| ... | ... | ... | *(continues for all frequency bands)* |
| `lzterz_20khz` | SMALLINT | YES | Z-weighted 1/3-octave band level at 20 kHz (dB × 100) |

**Total Frequency Bands:** 36 (from 6.3 Hz to 20 kHz)

**Frequency Spectrum Columns:**
```
6.3Hz, 8Hz, 10Hz, 12.5Hz, 16Hz, 20Hz, 25Hz, 31.5Hz, 40Hz, 50Hz,
63Hz, 80Hz, 100Hz, 125Hz, 160Hz, 200Hz, 250Hz, 315Hz, 400Hz, 500Hz,
630Hz, 800Hz, 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.15kHz, 4kHz, 5kHz,
6.3kHz, 8kHz, 10kHz, 12.5kHz, 16kHz, 20kHz
```

**Data Encoding:**
All sound level values are stored as **SMALLINT** representing **dB × 100** for precision.

**Example:**
- Stored value: `7042` → Actual: `70.42 dB`
- To convert: `actual_dB = stored_value / 100.0`

**Example Data:**
```sql
timestamp                      | mcore_id | laeq | lceq | lcpeak | lafmax | laft | ...
-------------------------------+----------+------+------+--------+--------+------+----
2025-07-05 16:49:40+00        | 10035    | 5550 | 8140 | 9810   | 5670   | 6110 | ...
2025-07-05 16:49:41+00        | 10002    | 7042 | 7496 | 8768   | 7288   | 7271 | ...
```

**Relationships:**
- References: `mcore_config.mcore_id` via `mcore_id`

**Temporal Resolution:**
- Typical measurement interval: 1 second
- Can vary depending on device configuration

---

## Data Relationships

### Primary Relationships

1. **Event → Band (One-to-Many)**
   - One event can have multiple bands performing
   - `band.event_id → event.id`

2. **Band → Band Schedule (One-to-Many)**
   - One band can have multiple performance time slots
   - `band_schedule.band_id → band.id`

3. **Event → Measurement Configuration (One-to-Many)**
   - One event can have multiple measurement points
   - `mcore_config.event_id → event.id`

4. **Measurement Configuration → Sensor Data (One-to-Many)**
   - One measurement point generates many sensor readings over time
   - `sensordata.mcore_id → mcore_config.mcore_id`

### Data Analysis Flow

```
Event Context
    └─> Bands/Performers
    │       └─> Performance Schedule (when did they play?)
    │
    └─> Measurement Points (where did we measure?)
            └─> Time-Series Sensor Data (what did we measure?)
                    └─> Acoustic levels + frequency spectrum
```

---

## Database Restoration

### Prerequisites

- **PostgreSQL** version 12 or higher
- **Available disk space:** ~10 GB (8.1 GB for SQL file + 2 GB for database)
- **Import time:** 30-60 minutes (depending on hardware)

### Method 1: Using psql Command Line

#### Step 1: Create Database

```bash
psql -h localhost -p 5432 -U postgres -c "CREATE DATABASE simu;"
```

#### Step 2: Import SQL File

```bash
psql -h localhost -p 5432 -U postgres -d simu -f simu.sql
```

**With custom connection parameters:**
```bash
psql -h your_host -p your_port -U your_user -d simu -f simu.sql
```

### Method 2: Using Docker PostgreSQL

If PostgreSQL is running in a Docker container:

#### Step 1: Copy SQL File to Container

```bash
docker cp simu.sql <container_name>:/tmp/
```

#### Step 2: Create Database

```bash
docker exec -it <container_name> psql -U postgres -c "CREATE DATABASE simu;"
```

#### Step 3: Import

```bash
docker exec -i <container_name> psql -U postgres -d simu < simu.sql
```

### Method 3: Using pgAdmin (GUI)

1. Create new database: Right-click "Databases" → "Create" → "Database"
   - Database name: `simu`
2. Open Query Tool for the new database
3. File → Open → Select `simu.sql`
4. Execute (F5)

### Method 4: Using Python

```python
import psycopg2

# Connect to PostgreSQL server
conn = psycopg2.connect(
    host='localhost',
    port=5432,
    user='postgres',
    password='your_password'
)
conn.autocommit = True
cursor = conn.cursor()

# Create database
cursor.execute("CREATE DATABASE simu;")
cursor.close()
conn.close()

# Connect to new database and import
conn = psycopg2.connect(
    host='localhost',
    port=5432,
    database='simu',
    user='postgres',
    password='your_password'
)
cursor = conn.cursor()

# Read and execute SQL file
with open('simu.sql', 'r', encoding='utf-8') as f:
    sql_script = f.read()
    cursor.execute(sql_script)

conn.commit()
cursor.close()
conn.close()

print("Database restored successfully!")
```

## Data Quality Notes

1. **Missing Values:** Some fields may contain NULL values, especially:
   - `mic_height` and `event_distance` in `mcore_config`
   - Low and high frequency bands in `sensordata` (6.3Hz, 8Hz, 16kHz-20kHz)

2. **Data Encoding:** All dB values are stored × 100 (e.g., 7042 = 70.42 dB)

3. **Timestamp Zones:**
   - `sensordata.timestamp`: WITH timezone (UTC)
   - `band_schedule` times: WITHOUT timezone (local time)

4. **Measurement Continuity:** Sensor data may have gaps due to:
   - Equipment downtime
   - Battery changes
   - Technical issues

---