{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Radiation bias correction for low-cost temperature sensors\n",
    "\n",
    "Low-cost temperature devices (LCD) are increasingly used to densify urban meteorological networks at low deployment cost {cite}`muller2013sensors,hamidi2020state,wong2025government`. However, these devices often lack proper radiation shields, which causes their temperature readings to be systematically biased upward during daytime {cite}`bell2015good,chapman2016can,buchau2018modelling,cornes2020correcting,bosch2026swiss`.\n",
    "\n",
    "The `meteora.bias_correction` module provides tools to apply pre-trained correction models that remove this radiation-driven bias. Each model captures the radiation-to-temperature-bias relationship for a specific sensor type and can be shared as a [scikit-learn pipeline](https://scikit-learn.org/stable/modules/compose.html#pipeline) serialized with [skops](https://skops.readthedocs.io) on [Hugging Face Hub](https://huggingface.co/models).\n",
    "\n",
    "This notebook shows the following pipeline:\n",
    "\n",
    "1. Retrieve LCD temperature data from the AWEL network of Decentlab sensors in Zurich {cite}`awel2026lufttemperatur` using `meteora.clients.AWELClient`.\n",
    "2. Retrieve reference automated weather station (AWS) air temperature and shortwave radiation data from the [MeteoSwiss automated observation network (SwissMetNet)](https://www.meteoswiss.admin.ch/weather/measurement-systems/land-based-stations/automatic-measurement-network.html) using `meteora.clients.MeteoSwissClient`.\n",
    "3. Load a pre-trained correction model from Hugging Face Hub at [`martibosch/decentlab-bias-correction`](https://huggingface.co/martibosch/decentlab-bias-correction).\n",
    "4. Apply the correction and compare the results.\n",
    "\n",
    "The pre-trained model used here is fitted based on the temperature readings of a Decentlab sensor and the temperature and shortwave radiation measurements from [a reference AWS from SwissMetNet at Zollikofen (Switzerland)](https://www.meteoswiss.admin.ch/services-and-publications/applications/measurement-values.html#param=messwerte-lufttemperatur-10min&table=false&station=BER), collocated alongside further LCD models in an intercomparison field study in summer 2025 {cite}`bosch2026swiss`.\n",
    "\n",
    "```{note}\n",
    "This notebook requires the `sk` and `xvec` optional extras as well as the `interpret` library for the pre-trained correction models. The requirements can be installed as in, e.g.:\n",
    "\n",
    "pip install interpret meteora[sk,xvec]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import contextily as cx\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from huggingface_hub import hf_hub_download\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.pipeline import Pipeline\n",
    "from skops import io as skops_io\n",
    "\n",
    "from meteora import clients, settings, utils\n",
    "from meteora.bias_correction import (\n",
    "    BestScaleRadiationTransformer,\n",
    "    apply_bias_correction,\n",
    "    load_correction_model,\n",
    ")\n",
    "\n",
    "figwidth, figheight = plt.rcParams[\"figure.figsize\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "region = \"Zurich, Switzerland\"\n",
    "start = \"2023-07-01\"\n",
    "end = \"2023-07-31\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## LCD sensor data\n",
    "\n",
    "We can start by getting LCD data for our study area, in this case, the city of Zurich (Switzerland). The AWEL (Amt für Abfall, Wasser, Energie und Luft) network operates Decentlab LoRa temperature sensors across the Zurich agglomeration. We use the `AWELClient` to retrieve temperature measurements for July 2023:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "awel_client = clients.AWELClient(region)\n",
    "lcd_ts_df = utils.long_to_wide(\n",
    "    awel_client.get_ts_df(settings.ECV_TEMPERATURE, start=start, end=end)\n",
    ")\n",
    "lcd_stations_gdf = awel_client.stations_gdf\n",
    "lcd_ts_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## Reference weather station data\n",
    "\n",
    "The bias correction relies on co-located shortwave radiation measurements to predict the radiation-induced temperature offset. We use the `MeteoSwissClient` to fetch air temperature and global shortwave radiation from official MeteoSwiss automated weather stations in the same region:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "aws_client = clients.MeteoSwissClient(region)\n",
    "aws_ts_df = aws_client.get_ts_df(\n",
    "    [settings.ECV_TEMPERATURE, settings.ECV_RADIATION_SHORTWAVE],\n",
    "    start=start,\n",
    "    end=end,\n",
    ")\n",
    "aws_ts_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "We can visualise both station networks on a shared map:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "colors = sns.color_palette()\n",
    "fig, ax = plt.subplots()\n",
    "aws_client.stations_gdf.to_crs(lcd_stations_gdf.crs).plot(\n",
    "    ax=ax, color=colors[0], label=\"AWS (MeteoSwiss)\"\n",
    ")\n",
    "lcd_stations_gdf.plot(ax=ax, color=colors[1], label=\"LCD (AWEL)\")\n",
    "ax.legend()\n",
    "cx.add_basemap(ax, crs=lcd_stations_gdf.crs, attribution=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "*(C) OpenStreetMap contributors, Tiles style by Humanitarian OpenStreetMap Team hosted by OpenStreetMap France*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "For the spatial matching between LCD stations and their nearest AWS reference (needed to assign each LCD sensor the correct radiation time series), we can convert the long-form data frame and station geo-data frame to a [vector data cube](data-structures.ipynb#vector-data-cubes):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "aws_ts_cube = utils.long_to_cube(aws_ts_df, aws_client.stations_gdf)\n",
    "aws_ts_cube"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## Applying bias correction\n",
    "\n",
    "We call `apply_bias_correction`, passing the HuggingFace Hub repository string directly as the model. Since the pipeline is serialized with [skops](https://skops.readthedocs.io), it is recommended to first inspect the non-sklearn types it contains as a security step before deserialization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_repo = \"martibosch/decentlab-bias-correction\"\n",
    "\n",
    "trusted = skops_io.get_untrusted_types(file=hf_hub_download(model_repo, \"model.skops\"))\n",
    "print(\"Types to review before trusting:\")\n",
    "for t in trusted:\n",
    "    print(f\"  {t}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "After reviewing the listed types, we can pass the HuggingFace Hub repository alongside the LCD and AWS data to `apply_bias_correction`, which will download and deserialize the model automatically. For each LCD station (`lcd_stations_gdf`), the function will find the nearest AWS reference station (`aws_ts_cube`), extract its radiation time series, run it through the model pipeline to predict the radiation-induced temperature offset, and finally subtract that offset from the raw LCD readings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "cor_ts_df = apply_bias_correction(\n",
    "    lcd_ts_df,\n",
    "    aws_ts_cube,\n",
    "    model_repo,\n",
    "    lcd_stations_gdf=lcd_stations_gdf,\n",
    "    trusted=trusted,\n",
    ")\n",
    "cor_ts_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "Note that `ref_ts` can be any meteora data structure in wide (single reference station, flat time series index) or long (multiple reference station multi-indexed by station and time — in that order) form. Wide inputs apply the same correction to every LCD station (based on the radiation readings from the single reference station), whereas long inputs require `lcd_stations_gdf` and `ref_stations_gdf` so that each LCD station is matched to its nearest reference via a spatial join. When `ref_ts` is a vector data cube the geometry is used for the spatial join directly, so `ref_stations_gdf` is not needed.\n",
    "\n",
    "## Results\n",
    "\n",
    "We can now inspect the correction for a LCD station over a three day window in mid-July alongside the shortwave radiation of the nearest MeteoSwiss reference, to assess the differences between AWS and LCD data (raw and corrected) as well as how these relate to the incoming shortwave radiation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "aws_wide = utils.long_to_wide(aws_ts_df)\n",
    "station_id = lcd_ts_df.columns[0]\n",
    "plot_slice = slice(\"2023-07-10\", \"2023-07-12\")\n",
    "\n",
    "rad_wide = aws_wide[settings.ECV_RADIATION_SHORTWAVE]\n",
    "aws_temp_wide = aws_wide[settings.ECV_TEMPERATURE]\n",
    "aws_station_id = rad_wide.columns[0]  # nearest AWS station (approximate)\n",
    "\n",
    "fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(2 * figwidth, figheight))\n",
    "\n",
    "axes[0].plot(\n",
    "    aws_temp_wide.loc[plot_slice, aws_station_id],\n",
    "    label=f\"AWS ({aws_station_id})\",\n",
    "    color=\"steelblue\",\n",
    "    alpha=0.8,\n",
    ")\n",
    "axes[0].plot(\n",
    "    lcd_ts_df.loc[plot_slice, station_id],\n",
    "    label=\"LCD (uncorrected)\",\n",
    "    color=\"tomato\",\n",
    "    alpha=0.8,\n",
    ")\n",
    "axes[0].plot(\n",
    "    cor_ts_df.loc[plot_slice, station_id],\n",
    "    label=\"LCD (corrected)\",\n",
    "    color=\"seagreen\",\n",
    "    linewidth=1.5,\n",
    "    linestyle=\"--\",\n",
    ")\n",
    "axes[0].set_ylabel(\"Temperature (\\u00b0C)\")\n",
    "axes[0].legend()\n",
    "\n",
    "axes[1].plot(\n",
    "    rad_wide.loc[plot_slice, aws_station_id],\n",
    "    color=\"goldenrod\",\n",
    "    label=f\"Shortwave radiation ({aws_station_id})\",\n",
    ")\n",
    "axes[1].set_ylabel(\"Radiation (W m\\u207b\\u00b2)\")\n",
    "axes[1].set_xlabel(\"Time\")\n",
    "axes[1].legend()\n",
    "\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "## Training a correction model\n",
    "\n",
    "Besides supporting the application of pre-trained models (i.e., via `apply_bias_correction`), meteora also supports training custom bias correction models given a paired time series of shortwave radiation from a co-located reference AWS (`X_train`) and the observed temperature bias at the LCD sensor (`y_train`, i.e. LCD minus reference temperature). For the most part, this is essentially done using [scikit-learn's pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html), which is a sequence of data transformations with a final predictor model. We can actually inspect the structure of the pre-trained pipeline by loading it explicitly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_correction_model(model_repo, \"model.skops\", trusted=trusted)\n",
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "We can see that the pipeline is composed of the `BestScaleRadiationTransformer` followed by an `ExplainableBosstingRegressor` {cite}`lou2013accurate,nori2019interpretml`. The `BestScaleRadiationTransformer` from meteora is based on the observation that the temperature bias of an unshielded sensor is driven not only by instantaneous shortwave radiation but also by its accumulation over preceding time (due to the thermal inertia of the sensor) {cite}`bell2015good,cornes2020correcting,beele2022quality`. Accordingly, the transformer evaluates a set of candidate rolling-sum windows (in minutes) and selects the one most correlated with the observed temperature bias during `fit`; `transform` then applies that window to a new radiation time series and returns the result as a single-column DataFrame ready for any scikit-learn-compatible regressor.\n",
    "\n",
    "Below we illustrate the workflow with the MeteoSwiss radiation data and a synthetic bias of 3 mK per W m⁻²:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "rad_ser = aws_wide[settings.ECV_RADIATION_SHORTWAVE].iloc[:, 0].dropna()\n",
    "\n",
    "# X_train: time + radiation columns; y_train: temperature bias (LCD - reference)\n",
    "# in practice y_train comes from paired LCD–reference measurements\n",
    "X_train = pd.DataFrame(\n",
    "    {\n",
    "        settings.TIME_COL: rad_ser.index,\n",
    "        settings.ECV_RADIATION_SHORTWAVE: rad_ser.values,\n",
    "    }\n",
    ")\n",
    "y_train = (\n",
    "    0.003 * rad_ser.values\n",
    ")  # synthetic: 3 mK per W m\\u207b\\u00b2 (for illustration)\n",
    "\n",
    "window_minutes = [30, 60, 120, 240]\n",
    "pipeline = Pipeline(\n",
    "    [\n",
    "        (\"transformer\", BestScaleRadiationTransformer(window_minutes)),\n",
    "        (\"regressor\", LinearRegression()),\n",
    "    ]\n",
    ")\n",
    "pipeline.fit(X_train, y_train)\n",
    "print(f\"Best radiation window: {pipeline['transformer'].best_scale_} min\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "While `LinearRegression` works well as a first-order approximation, non-linear models may better capture the relationship between accumulated radiation and sensor bias {cite}`cornes2020correcting,beele2022quality,bosch2026swiss`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "The fitted pipeline can be serialized with [skops](https://skops.readthedocs.io) and uploaded to [Hugging Face Hub](https://huggingface.co/models) for later reuse:\n",
    "\n",
    "```python\n",
    "skops_io.dump(pipeline, \"model.skops\")\n",
    "\n",
    "from huggingface_hub import HfApi\n",
    "\n",
    "api = HfApi()\n",
    "api.create_repo(\"username/my-lcd-bias-correction\", repo_type=\"model\")\n",
    "api.upload_file(\n",
    "    path_or_fileobj=\"model.skops\",\n",
    "    path_in_repo=\"model.skops\",\n",
    "    repo_id=\"username/my-lcd-bias-correction\",\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "```{bibliography}\n",
    ":filter: docname in docnames\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "tags,-all"
  },
  "kernelspec": {
   "display_name": "Python (Pixi)",
   "language": "python",
   "name": "pixi-kernel-python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.11"
  },
  "pixi-kernel": {
   "environment": "doc"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}