Radiation bias correction for low-cost temperature sensors#

Low-cost temperature devices (LCD) are increasingly used to densify urban meteorological networks at low deployment cost (Muller et al., 2013, Wong et al., 2025). However, these devices often lack proper radiation shields, which causes their temperature readings to be systematically biased upward during daytime (Bell et al., 2015, Büchau, 2018, Chapman et al., 2016, Cornes et al., 2020, Climatology Group of the University of Bern, 2026).

The meteora.bias_correction module provides tools to apply pre-trained correction models that remove this radiation-driven bias. Each model captures the radiation-to-temperature-bias relationship for a specific sensor type and can be shared as a scikit-learn pipeline serialized with skops on Hugging Face Hub.

This notebook walks through the following workflow:

  1. Retrieve LCD temperature data from the AWEL network of Decentlab sensors in Zurich (Amt für Abfall, Wasser, Energie und Luft, 2026) using meteora.clients.AWELClient.

  2. Retrieve reference automated weather station (AWS) air temperature and shortwave radiation data from the MeteoSwiss automated observation network (SwissMetNet) using meteora.clients.MeteoSwissClient.

  3. Load a pre-trained correction model from Hugging Face Hub at martibosch/decentlab-bias-correction.

  4. Apply the correction and compare the results.

The pre-trained model used here was fitted to the temperature readings of a Decentlab sensor and to the temperature and shortwave radiation measurements of a SwissMetNet reference AWS at Zollikofen (Switzerland). Both were co-located, alongside further LCD models, in an intercomparison field study in summer 2025 (Climatology Group of the University of Bern, 2026).

Note

This notebook requires the sk and xvec optional extras of meteora as well as the interpret library, which the pre-trained correction models depend on. They can be installed with, e.g.:

pip install interpret "meteora[sk,xvec]"

import contextily as cx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from huggingface_hub import hf_hub_download
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from skops import io as skops_io

from meteora import clients, settings, utils
from meteora.bias_correction import (
    BestScaleRadiationTransformer,
    apply_bias_correction,
    load_correction_model,
)

figwidth, figheight = plt.rcParams["figure.figsize"]
region = "Zurich, Switzerland"
start = "2023-07-01"
end = "2023-07-31"

LCD sensor data#

We can start by getting LCD data for our study area, in this case, the city of Zurich (Switzerland). The AWEL (Amt für Abfall, Wasser, Energie und Luft) network operates Decentlab LoRa temperature sensors across the Zurich agglomeration. We use the AWELClient to retrieve temperature measurements for July 2023:

awel_client = clients.AWELClient(region)
lcd_ts_df = utils.long_to_wide(
    awel_client.get_ts_df(settings.ECV_TEMPERATURE, start=start, end=end)
)
lcd_stations_gdf = awel_client.stations_gdf
lcd_ts_df.head()
station_id 530 534 2651 2652 2653 2655 2656 2657 2659 2679 2680 2682 2683 2688 2689 2695 2696 2697 2810
time
2023-07-01 00:00:00 NaN 17.01 15.82 16.78 14.53 16.30 NaN 16.30 16.70 16.20 17.50 16.92 16.28 16.68 16.96 16.64 16.04 15.73 16.08
2023-07-01 00:10:00 16.75 17.14 15.87 16.82 14.54 16.41 NaN NaN 16.75 16.17 17.52 16.95 16.27 16.78 17.14 16.61 16.03 15.60 16.05
2023-07-01 00:20:00 16.72 17.14 15.87 16.91 14.49 16.51 NaN 16.35 16.79 16.38 17.56 17.01 16.34 16.83 17.15 16.67 16.03 15.55 16.04
2023-07-01 00:30:00 16.72 17.25 15.90 17.07 14.49 16.57 NaN 16.43 16.86 16.57 17.54 17.16 16.28 16.95 17.26 16.71 16.09 15.44 15.97
2023-07-01 00:40:00 16.69 17.14 15.95 17.11 14.49 16.68 NaN 16.47 16.87 16.59 17.48 17.31 NaN 17.11 17.32 16.75 15.98 15.40 15.99
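For readers unfamiliar with the two layouts, the long-to-wide reshaping used above can be approximated in plain pandas. The sketch below is not meteora's implementation of utils.long_to_wide; it assumes a long frame indexed by (station_id, time), and the values are made up for illustration.

```python
import pandas as pd

# toy long-form frame: one row per (station_id, time) pair; values are made up
times = pd.to_datetime(["2023-07-01 00:00", "2023-07-01 00:10"])
long_df = pd.DataFrame(
    {"temperature": [17.0, 17.1, 15.8, 15.9]},
    index=pd.MultiIndex.from_product(
        [["534", "2651"], times], names=["station_id", "time"]
    ),
)

# wide form: one row per time step, one column per station
wide_df = long_df["temperature"].unstack(level="station_id")
print(wide_df)
```

The wide layout is convenient for plotting and for element-wise arithmetic between stations, which is why it is used for the LCD readings in the rest of this notebook.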

Reference weather station data#

The bias correction relies on co-located shortwave radiation measurements to predict the radiation-induced temperature offset. We use the MeteoSwissClient to fetch air temperature and global shortwave radiation from official MeteoSwiss automated weather stations in the same region:

aws_client = clients.MeteoSwissClient(region)
aws_ts_df = aws_client.get_ts_df(
    [settings.ECV_TEMPERATURE, settings.ECV_RADIATION_SHORTWAVE],
    start=start,
    end=end,
)
aws_ts_df.head()
temperature radiation_shortwave
station_id time
REH 2023-07-01 00:00:00 15.6 0.0
2023-07-01 00:10:00 15.7 0.0
2023-07-01 00:20:00 15.5 0.0
2023-07-01 00:30:00 15.5 0.0
2023-07-01 00:40:00 15.5 1.0

We can visualise both station networks on a shared map:

colors = sns.color_palette()
fig, ax = plt.subplots()
aws_client.stations_gdf.to_crs(lcd_stations_gdf.crs).plot(
    ax=ax, color=colors[0], label="AWS (MeteoSwiss)"
)
lcd_stations_gdf.plot(ax=ax, color=colors[1], label="LCD (AWEL)")
ax.legend()
cx.add_basemap(ax, crs=lcd_stations_gdf.crs, attribution="")
[Figure: map of the AWS (MeteoSwiss) and LCD (AWEL) station locations over a basemap]

(C) OpenStreetMap contributors, Tiles style by Humanitarian OpenStreetMap Team hosted by OpenStreetMap France

For the spatial matching between LCD stations and their nearest AWS reference (needed to assign each LCD sensor the correct radiation time series), we can convert the long-form data frame and station geo-data frame to a vector data cube:

aws_ts_cube = utils.long_to_cube(aws_ts_df, aws_client.stations_gdf)
aws_ts_cube
<xarray.Dataset> Size: 242kB
Dimensions:              (geometry: 3, time: 4321)
Coordinates:
  * geometry             (geometry) geometry 24B POINT (2681433 1253548) ... ...
    station_id           (geometry) object 24B 'REH' 'SMA' 'UEB'
  * time                 (time) datetime64[us] 35kB 2023-07-01 ... 2023-07-31
Data variables:
    temperature          (geometry, time) float64 104kB 15.6 15.7 ... nan nan
    radiation_shortwave  (geometry, time) float64 104kB 0.0 0.0 0.0 ... 4.0 4.0
Indexes:
    geometry  GeometryIndex (crs=EPSG:2056)

Applying bias correction#

apply_bias_correction accepts the Hugging Face Hub repository string directly as the model. Since the pipeline is serialized with skops, it is recommended to first inspect the non-sklearn types it contains as a security step before deserialization:

model_repo = "martibosch/decentlab-bias-correction"

trusted = skops_io.get_untrusted_types(file=hf_hub_download(model_repo, "model.skops"))
print("Types to review before trusting:")
for t in trusted:
    print(f"  {t}")
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Types to review before trusting:
  interpret.glassbox._ebm._ebm.ExplainableBoostingRegressor
  meteora.bias_correction.BestScaleRadiationTransformer

After reviewing the listed types, we can pass the Hugging Face Hub repository alongside the LCD and AWS data to apply_bias_correction, which downloads and deserializes the model automatically. For each LCD station (lcd_stations_gdf), the function finds the nearest AWS reference station (aws_ts_cube), extracts its radiation time series, runs it through the model pipeline to predict the radiation-induced temperature offset, and finally subtracts that offset from the raw LCD readings:

cor_ts_df = apply_bias_correction(
    lcd_ts_df,
    aws_ts_cube,
    model_repo,
    lcd_stations_gdf=lcd_stations_gdf,
    trusted=trusted,
)
cor_ts_df.head()
/home/docs/checkouts/readthedocs.org/user_builds/meteora/checkouts/latest/.pixi/envs/doc/lib/python3.13/site-packages/sklearn/base.py:463: InconsistentVersionWarning: Trying to unpickle estimator Pipeline from version 1.7.2 when using version 1.8.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
station_id 530 534 2651 2652 2653 2655 2656 2657 2659 2679 2680 2682 2683 2688 2689 2695 2696 2697 2810
time
2023-07-01 00:00:00 NaN 17.354606 16.164606 17.124606 14.874606 16.644606 NaN 16.644606 17.044606 16.544606 17.844606 17.264606 16.624606 17.024606 17.304606 16.984606 16.384606 16.044612 16.394612
2023-07-01 00:10:00 17.094606 17.484606 16.214606 17.164606 14.884606 16.754606 NaN NaN 17.094606 16.514606 17.864606 17.294606 16.614606 17.124606 17.484606 16.954606 16.374606 15.897380 16.347380
2023-07-01 00:20:00 17.064606 17.484606 16.214606 17.254606 14.834606 16.854606 NaN 16.694606 17.134606 16.724606 17.904606 17.354606 16.684606 17.174606 17.494606 17.014606 16.374606 15.841275 16.331275
2023-07-01 00:30:00 17.064606 17.594606 16.244606 17.414606 14.834606 16.914606 NaN 16.774606 17.204606 16.914606 17.884606 17.504606 16.624606 17.294606 17.604606 17.054606 16.434606 15.690199 16.220199
2023-07-01 00:40:00 17.040136 17.484606 16.300136 17.454606 14.834606 17.030136 NaN 16.820136 17.214606 16.934606 17.824606 17.654606 NaN 17.460136 17.664606 17.100136 16.330136 15.645376 16.235376

Note that the reference time series argument (ref_ts, here aws_ts_cube) can be any meteora data structure in wide form (single reference station, flat time series index) or long form (multiple reference stations, multi-indexed by station and time, in that order). Wide inputs apply the same correction to every LCD station (based on the radiation readings from the single reference station), whereas long inputs require lcd_stations_gdf and ref_stations_gdf so that each LCD station is matched to its nearest reference via a spatial join. When ref_ts is a vector data cube, its geometry is used for the spatial join directly, so ref_stations_gdf is not needed.
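The matching and subtraction steps described above can be illustrated with a minimal sketch in NumPy and pandas. This is not how apply_bias_correction is implemented: the station coordinates, the readings and the predict_bias helper (a stand-in for the trained pipeline) are all hypothetical, and the nearest station is found by plain Euclidean distance on projected coordinates.

```python
import numpy as np
import pandas as pd

# hypothetical projected coordinates (metres) for two LCD and two reference stations
lcd_xy = pd.DataFrame({"x": [0.0, 900.0], "y": [0.0, 100.0]}, index=["lcd_a", "lcd_b"])
ref_xy = pd.DataFrame({"x": [100.0, 1000.0], "y": [0.0, 0.0]}, index=["ref_1", "ref_2"])

# 1. match each LCD station to its nearest reference station (Euclidean distance)
dist = np.hypot(
    lcd_xy["x"].to_numpy()[:, None] - ref_xy["x"].to_numpy()[None, :],
    lcd_xy["y"].to_numpy()[:, None] - ref_xy["y"].to_numpy()[None, :],
)
nearest_ref = pd.Series(
    ref_xy.index.to_numpy()[dist.argmin(axis=1)], index=lcd_xy.index
)

# 2. made-up wide radiation and raw LCD temperature series on a shared time index
time = pd.date_range("2023-07-01 12:00", periods=3, freq="10min")
rad_wide = pd.DataFrame(
    {"ref_1": [800.0, 820.0, 790.0], "ref_2": [400.0, 410.0, 390.0]}, index=time
)
lcd_wide = pd.DataFrame(
    {"lcd_a": [27.0, 27.2, 27.1], "lcd_b": [25.0, 25.1, 25.0]}, index=time
)


def predict_bias(radiation):
    """Stand-in for the trained pipeline: a made-up 3 mK per W m^-2 response."""
    return 0.003 * radiation


# 3. subtract the offset predicted from the matched reference from each LCD series
corrected = pd.DataFrame(
    {
        lcd_id: lcd_wide[lcd_id] - predict_bias(rad_wide[ref_id])
        for lcd_id, ref_id in nearest_ref.items()
    }
)
print(nearest_ref)
print(corrected)
```

In the real function the offset comes from the deserialized pipeline rather than a fixed linear factor, so the size of the correction depends on the trained model.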

Results#

We can now inspect the correction for an LCD station over a three-day window in mid-July, alongside the shortwave radiation of a MeteoSwiss reference station. This lets us assess the differences between AWS and LCD data (raw and corrected) and how they relate to the incoming shortwave radiation:

aws_wide = utils.long_to_wide(aws_ts_df)
station_id = lcd_ts_df.columns[0]
plot_slice = slice("2023-07-10", "2023-07-12")

rad_wide = aws_wide[settings.ECV_RADIATION_SHORTWAVE]
aws_temp_wide = aws_wide[settings.ECV_TEMPERATURE]
aws_station_id = rad_wide.columns[0]  # first AWS station, used as a stand-in for the nearest one

fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(2 * figwidth, figheight))

axes[0].plot(
    aws_temp_wide.loc[plot_slice, aws_station_id],
    label=f"AWS ({aws_station_id})",
    color="steelblue",
    alpha=0.8,
)
axes[0].plot(
    lcd_ts_df.loc[plot_slice, station_id],
    label="LCD (uncorrected)",
    color="tomato",
    alpha=0.8,
)
axes[0].plot(
    cor_ts_df.loc[plot_slice, station_id],
    label="LCD (corrected)",
    color="seagreen",
    linewidth=1.5,
    linestyle="--",
)
axes[0].set_ylabel("Temperature (\u00b0C)")
axes[0].legend()

axes[1].plot(
    rad_wide.loc[plot_slice, aws_station_id],
    color="goldenrod",
    label=f"Shortwave radiation ({aws_station_id})",
)
axes[1].set_ylabel("Radiation (W m\u207b\u00b2)")
axes[1].set_xlabel("Time")
axes[1].legend()

fig.tight_layout()
[Figure: top panel, temperature (°C) of the AWS, the uncorrected LCD and the corrected LCD; bottom panel, shortwave radiation (W m⁻²); both over time]

Training a correction model#

Besides applying pre-trained models (via apply_bias_correction), meteora also supports training custom bias correction models from a paired time series of shortwave radiation from a co-located reference AWS (X_train) and the observed temperature bias at the LCD sensor (y_train, i.e., LCD minus reference temperature). Most of this relies on scikit-learn pipelines, i.e., a sequence of data transformations followed by a final predictor. We can inspect the structure of the pre-trained pipeline by loading it explicitly:

model = load_correction_model(model_repo, "model.skops", trusted=trusted)
model
/home/docs/checkouts/readthedocs.org/user_builds/meteora/checkouts/latest/.pixi/envs/doc/lib/python3.13/site-packages/sklearn/base.py:463: InconsistentVersionWarning: Trying to unpickle estimator Pipeline from version 1.7.2 when using version 1.8.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Pipeline(steps=[('radiation_transformer',
                 BestScaleRadiationTransformer(radiation_col='radiation_shortwave',
                                               time_col='time',
                                               window_minutes=[60, 120, 180,
                                                               240, 300])),
                ('model', ExplainableBoostingRegressor())])

The pipeline consists of a BestScaleRadiationTransformer followed by an ExplainableBoostingRegressor (Lou et al., 2013, Nori et al., 2019). The BestScaleRadiationTransformer from meteora is based on the observation that the temperature bias of an unshielded sensor is driven not only by instantaneous shortwave radiation but also by the radiation accumulated over the preceding period, due to the thermal inertia of the sensor (Beele et al., 2022, Bell et al., 2015, Cornes et al., 2020). Accordingly, during fit the transformer evaluates a set of candidate rolling-sum windows (in minutes) and selects the one most correlated with the observed temperature bias; transform then applies that window to a new radiation time series and returns the result as a single-column DataFrame ready for any scikit-learn-compatible regressor.
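To make the window selection idea concrete, here is a minimal sketch in plain pandas. It is not meteora's implementation: the best_window helper is hypothetical, the data are synthetic, and ranking windows by absolute Pearson correlation is an assumption about what "most correlated" means.

```python
import numpy as np
import pandas as pd

# synthetic 10-minute radiation series with a smooth daily cycle (made-up data)
time = pd.date_range("2023-07-01", periods=3 * 144, freq="10min")
hours = (time.hour + time.minute / 60).to_numpy()
radiation = pd.Series(
    np.clip(900 * np.sin(np.pi * (hours - 6) / 12), 0, None), index=time
)

# synthetic bias that responds to the radiation accumulated over the last 2 hours
bias = 1e-4 * radiation.rolling("120min").sum()


def best_window(radiation, bias, window_minutes):
    """Return the window (minutes) whose rolling sum correlates best with bias."""
    scores = {
        w: radiation.rolling(f"{w}min").sum().corr(bias) for w in window_minutes
    }
    return max(scores, key=lambda w: abs(scores[w])), scores


window, scores = best_window(radiation, bias, [30, 60, 120, 240])
print(window, scores)
```

Because the synthetic bias is built from the 120-minute rolling sum, that window should score highest here; with real paired measurements the preferred window depends on the sensor's thermal response.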

Below we illustrate the workflow with the MeteoSwiss radiation data and a synthetic bias of 3 mK per W m⁻²:

rad_ser = aws_wide[settings.ECV_RADIATION_SHORTWAVE].iloc[:, 0].dropna()

# X_train: time + radiation columns; y_train: temperature bias (LCD - reference)
# in practice y_train comes from paired LCD–reference measurements
X_train = pd.DataFrame(
    {
        settings.TIME_COL: rad_ser.index,
        settings.ECV_RADIATION_SHORTWAVE: rad_ser.values,
    }
)
# synthetic bias of 3 mK per W m⁻² (for illustration)
y_train = 0.003 * rad_ser.values

window_minutes = [30, 60, 120, 240]
pipeline = Pipeline(
    [
        (
            "transformer",
            BestScaleRadiationTransformer(window_minutes=window_minutes),
        ),
        ("regressor", LinearRegression()),
    ]
)
pipeline.fit(X_train, y_train)
print(f"Best radiation window: {pipeline['transformer'].best_scale_} min")
Best radiation window: 30 min
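With real data, the training target would come from paired measurements rather than a synthetic factor. The following sketch shows one way to assemble X_train and y_train from co-located series. The series names and values are made up, and the column names time and radiation_shortwave are assumed to match the defaults shown in the pipeline representation above.

```python
import pandas as pd

# made-up paired 10-minute series from a co-located LCD sensor and reference AWS
time = pd.date_range("2025-07-01 10:00", periods=4, freq="10min")
lcd_temp = pd.Series([24.1, 24.6, None, 25.3], index=time, name="lcd")
ref_temp = pd.Series([23.5, 23.8, 24.0, 24.2], index=time, name="ref")
ref_rad = pd.Series(
    [650.0, 700.0, 720.0, 760.0], index=time, name="radiation_shortwave"
)

# align on time and keep only time steps where all three values are present
paired = pd.concat([lcd_temp, ref_temp, ref_rad], axis=1).dropna()

# features: time and radiation columns; target: LCD minus reference temperature
X_train = paired[["radiation_shortwave"]].rename_axis("time").reset_index()
y_train = (paired["lcd"] - paired["ref"]).to_numpy()
print(X_train)
print(y_train)
```

Dropping incomplete time steps before computing the difference keeps the feature rows and target values aligned one to one.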

While LinearRegression works well as a first-order approximation, non-linear models may better capture the relationship between accumulated radiation and sensor bias (Beele et al., 2022, Cornes et al., 2020, Climatology Group of the University of Bern, 2026).

The fitted pipeline can be serialized with skops and uploaded to Hugging Face Hub for later reuse:

skops_io.dump(pipeline, "model.skops")

from huggingface_hub import HfApi

api = HfApi()
api.create_repo("username/my-lcd-bias-correction", repo_type="model")
api.upload_file(
    path_or_fileobj="model.skops",
    path_in_repo="model.skops",
    repo_id="username/my-lcd-bias-correction",
)

References#

[1]

Eva Beele, Maarten Reyniers, Raf Aerts, and Ben Somers. Quality control and correction method for air temperature data from a citizen science weather station network in Leuven, Belgium. Earth System Science Data, 14(10):4681–4717, 2022.

[2]

Simon Bell, Dan Cornford, and Lucy Bastin. How good are citizen weather stations? Addressing a biased opinion. Weather, 70(3):75–84, 2015.

[3]

Yann Georg Büchau. Modelling Shield Temperature Sensors: An Assessment of the Netatmo Citizen Weather Station. PhD thesis, Universität Hamburg, 2018.

[4]

Lee Chapman, Cassandra Bell, and Simon Bell. Can the crowdsourcing data paradigm take atmospheric science to a new level?: a case study of the urban heat island of London quantified using Netatmo weather stations. International Journal of Climatology, 2016.

[5]

Richard C Cornes, Marieke Dirksen, and Raymond Sluiter. Correcting citizen-science air temperature measurements across the Netherlands for short wave radiation bias. Meteorological Applications, 27(1):e1814, 2020.

[6]

Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 623–631. ACM, 2013.

[7]

C. Muller, L. Chapman, C. S. B. Grimmond, D. Young, and X. Cai. Sensors and the city: a review of urban meteorological networks. International Journal of Climatology, 33(7):1585–1600, 2013.

[8]

Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. InterpretML: a unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223, 2019.

[9]

Carmen Hau Man Wong, Yu Ting Kwok, Yueyang He, and Edward Ng. Government-involved urban meteorological networks (UMNs): a global review. Urban Climate, 61:102409, 2025.

[10]

Amt für Abfall, Wasser, Energie und Luft. Lufttemperatur und Luftfeuchte LoRa-Sensor-Messwerte. Available (in German) from https://opendata.swiss/de/dataset/lufttemperatur-und-luftfeuchte-lora-sensor-messwerte. Accessed: 14 January 2026.

[11]

Climatology Group of the University of Bern. Revisiting urban heat indices in Switzerland using low-cost measurement networks. Manuscript under preparation, 2026.