02 GLUE Ensemble — CI Band Visualization#

Monte Carlo sampling via spotpy with NSE behavioral threshold. Forward model: Q_sim = a * Q_wam + b (linear scaling correction).

Run this notebook from the repo root directory: jupyter notebook

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

from red_tide_reanalysis.ingestion.obs_loader import load_observations
from red_tide_reanalysis.ingestion.wam_loader import load_wam_model
from red_tide_reanalysis.ingestion.align import align_obs_model
from red_tide_reanalysis.glue.method import GLUEMethod
from red_tide_reanalysis.writers.ensemble_writer import write_ensemble_csv
from red_tide_reanalysis.writers.stats_writer import write_stats_csv

1. Load and Align Data#

obs_raw = load_observations("../Observation_Data/Observed_flow_ARCADIA_FL.csv")
wam_raw = load_wam_model("../Synthetic_Model_Data/Station 02296750 (ARCADIA)_reach000084_83.csv")
obs, model = align_obs_model(obs_raw, wam_raw)
print(f"Aligned series: {len(obs)} timesteps, {obs.index[0].date()} to {obs.index[-1].date()}")
WARNING: align.align_obs_model():  1461 observation dates have no WAM model counterpart (e.g., 1995-01-01). These dates will be excluded from analysis.
Aligned series: 9131 timesteps, 1999-01-01 to 2023-12-31

2. Configure and Run GLUE#

method = GLUEMethod(
    n_members=200,
    station_id="02296750_peace_river",
    n_draws=10000,
    nse_threshold=0.5,
    a_bounds=(0.1, 3.0),
    b_bounds=(-50.0, 50.0),
)
result = method.run(obs, model, n_members=200, seed=42)
print(f"GLUE: {result.config['n_behavioral']} behavioral / {result.config['n_draws']} total draws (NSE >= {result.config['nse_threshold']})")
print(f"Ensemble shape: {result.members.shape}")
Initializing the  Monte Carlo (MC) sampler  with  10000  repetitions
Starting the MC algorithm with 10000 repetitions...
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
2145 of 10000, min objf=-8.86777, max objf=0.862195, time remaining: 00:00:07
4319 of 10000, min objf=-8.86777, max objf=0.86247, time remaining: 00:00:05
6493 of 10000, min objf=-8.86777, max objf=0.863911, time remaining: 00:00:03
8707 of 10000, min objf=-8.86777, max objf=0.863911, time remaining: 00:00:01

*** Final SPOTPY summary ***
Total Duration: 9.16 seconds
Total Repetitions: 10000
Minimal objective value: -8.86777
Corresponding parameter setting:
a: 2.99078
b: 49.515
Maximal objective value: 0.863911
Corresponding parameter setting:
a: 0.902571
b: 0.144144
******************************

GLUE: 1907 behavioral / 10000 total draws (NSE >= 0.5)
Ensemble shape: (200, 9131)

3. Threshold Sensitivity Table#

Shows behavioral vs. non-behavioral parameter set counts at three NSE cutoff values.

import spotpy
import spotpy.algorithms
import spotpy.objectivefunctions
import spotpy.parameter
from red_tide_reanalysis.glue.method import _GLUESpotSetup

# Re-run a fresh MC sample with the same seed for sensitivity diagnostics
np.random.seed(42)
spot_setup = _GLUESpotSetup(
    obs.values, model.values,
    a_bounds=(0.1, 3.0),
    b_bounds=(-50.0, 50.0),
)
sampler = spotpy.algorithms.mc(spot_setup, dbname="sensitivity", dbformat="ram", save_sim=False)
sampler.sample(10000)
raw_results = sampler.getdata()
# spotpy 1.6.7: field name is 'like1' (verified via dtype.names diagnostic)
nse_scores = np.squeeze(raw_results["like1"])

thresholds = [0.3, 0.5, 0.7]
total = len(nse_scores)
rows = [
    {
        "NSE Threshold": t,
        "Behavioral": int(np.sum(nse_scores >= t)),
        "Non-Behavioral": int(np.sum(nse_scores < t)),
        "% Behavioral": f"{100 * np.sum(nse_scores >= t) / total:.1f}%",
    }
    for t in thresholds
]
df_sensitivity = pd.DataFrame(rows)
print(f"\nGLUE Threshold Sensitivity (total draws: {total})")
display(df_sensitivity)
Initializing the  Monte Carlo (MC) sampler  with  10000  repetitions
Starting the MC algorithm with 10000 repetitions...
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
2188 of 10000, min objf=-8.86777, max objf=0.862195, time remaining: 00:00:07
4361 of 10000, min objf=-8.86777, max objf=0.86247, time remaining: 00:00:05
6537 of 10000, min objf=-8.86777, max objf=0.863911, time remaining: 00:00:03
8716 of 10000, min objf=-8.86777, max objf=0.863911, time remaining: 00:00:01

*** Final SPOTPY summary ***
Total Duration: 9.2 seconds
Total Repetitions: 10000
Minimal objective value: -8.86777
Corresponding parameter setting:
a: 2.99078
b: 49.515
Maximal objective value: 0.863911
Corresponding parameter setting:
a: 0.902571
b: 0.144144
******************************


GLUE Threshold Sensitivity (total draws: 10000)
NSE Threshold Behavioral Non-Behavioral % Behavioral
0 0.3 3037 6963 30.4%
1 0.5 1907 8093 19.1%
2 0.7 863 9137 8.6%

4. CI Band Visualization#

5th–95th percentile shaded band from the 200-member GLUE ensemble. Observations as scatter points; WAM model output as a dashed red line.

p5  = np.percentile(result.members, 5,  axis=0)
p95 = np.percentile(result.members, 95, axis=0)
median = np.median(result.members, axis=0)

fig, ax = plt.subplots(figsize=(14, 5))
ax.fill_between(result.time_index, p5, p95, alpha=0.25, color="blue", label="5th-95th pctl")
ax.plot(result.time_index, median, color="yellow", linewidth=1.5, label="Median")
ax.plot(result.time_index, result.model_output.values, color="green", linewidth=1, linestyle="--", label="WAM Model")
ax.scatter(result.time_index, result.observations.values, color="red", s=8, zorder=5, label="Observations")
ax.set_xlabel("Date")
ax.set_ylabel("Discharge (CMS)")
ax.set_title("GLUE Ensemble — 200 Members (NSE >= 0.5)")
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
../_images/45da5d7c9a08bad9d72d41093350119c57973aaa2fc54c98ef7f6fa4fb0c51d0.png

5. Export CSVs#

output_dir = Path("data/outputs")
ens_path = write_ensemble_csv(result, output_dir / "ensembles")
stats_path = write_stats_csv(result, output_dir / "stats")
print(f"Ensemble CSV: {ens_path}")
print(f"Stats CSV:    {stats_path}")
Ensemble CSV: data\outputs\ensembles\glue_02296750_peace_river_discharge_members.csv
Stats CSV:    data\outputs\stats\glue_02296750_peace_river_discharge_stats.csv

Summary#

  • Forward model: Q_sim = a * Q_wam + b (2-parameter linear scaling)

  • MC sampling: 10,000 draws via spotpy

  • Operating NSE threshold: 0.5

  • Ensemble size: 200 members (resampled with replacement from behavioral sets)

  • Output files: data/outputs/ensembles/ and data/outputs/stats/