Radiation bias correction for low-cost temperature sensors#
Low-cost temperature devices (LCD) are increasingly used to densify urban meteorological networks at low deployment cost (Muller et al., 2013, Wong et al., 2025). However, these devices often lack proper radiation shields, which causes their temperature readings to be systematically biased upward during daytime (Bell et al., 2015, Büchau, 2018, Chapman et al., 2016, Cornes et al., 2020, Climatology Group of the University of Bern, 2026).
The meteora.bias_correction module provides tools to apply pre-trained correction models that remove this radiation-driven bias. Each model captures the radiation-to-temperature-bias relationship for a specific sensor type and can be shared as a scikit-learn pipeline serialized with skops on Hugging Face Hub.
This notebook shows the following pipeline:
1. Retrieve LCD temperature data from the AWEL network of Decentlab sensors in Zurich (Amt für Abfall, Wasser, Energie und Luft, 2026) using meteora.clients.AWELClient.
2. Retrieve reference automated weather station (AWS) air temperature and shortwave radiation data from the MeteoSwiss automated observation network (SwissMetNet) using meteora.clients.MeteoSwissClient.
3. Load a pre-trained correction model from Hugging Face Hub at martibosch/decentlab-bias-correction.
4. Apply the correction and compare the results.
The pre-trained model used here was fitted on the temperature readings of a Decentlab sensor and on the temperature and shortwave radiation measurements of a SwissMetNet reference AWS at Zollikofen (Switzerland). The sensor was collocated with the AWS, alongside further LCD models, in an intercomparison field study in summer 2025 (Climatology Group of the University of Bern, 2026).
Note
This notebook requires the sk and xvec optional extras of meteora, as well as the interpret library, which the pre-trained correction models depend on. They can be installed with, e.g.:
pip install interpret "meteora[sk,xvec]"
import contextily as cx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from huggingface_hub import hf_hub_download
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from skops import io as skops_io
from meteora import clients, settings, utils
from meteora.bias_correction import (
    BestScaleRadiationTransformer,
    apply_bias_correction,
    load_correction_model,
)
figwidth, figheight = plt.rcParams["figure.figsize"]
region = "Zurich, Switzerland"
start = "2023-07-01"
end = "2023-07-31"
LCD sensor data#
We can start by getting LCD data for our study area, in this case, the city of Zurich (Switzerland). The AWEL (Amt für Abfall, Wasser, Energie und Luft) network operates Decentlab LoRa temperature sensors across the Zurich agglomeration. We use the AWELClient to retrieve temperature measurements for July 2023:
awel_client = clients.AWELClient(region)
lcd_ts_df = utils.long_to_wide(
    awel_client.get_ts_df(settings.ECV_TEMPERATURE, start=start, end=end)
)
lcd_stations_gdf = awel_client.stations_gdf
lcd_ts_df.head()
| time / station_id | 530 | 534 | 2651 | 2652 | 2653 | 2655 | 2656 | 2657 | 2659 | 2679 | 2680 | 2682 | 2683 | 2688 | 2689 | 2695 | 2696 | 2697 | 2810 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-07-01 00:00:00 | NaN | 17.01 | 15.82 | 16.78 | 14.53 | 16.30 | NaN | 16.30 | 16.70 | 16.20 | 17.50 | 16.92 | 16.28 | 16.68 | 16.96 | 16.64 | 16.04 | 15.73 | 16.08 |
| 2023-07-01 00:10:00 | 16.75 | 17.14 | 15.87 | 16.82 | 14.54 | 16.41 | NaN | NaN | 16.75 | 16.17 | 17.52 | 16.95 | 16.27 | 16.78 | 17.14 | 16.61 | 16.03 | 15.60 | 16.05 |
| 2023-07-01 00:20:00 | 16.72 | 17.14 | 15.87 | 16.91 | 14.49 | 16.51 | NaN | 16.35 | 16.79 | 16.38 | 17.56 | 17.01 | 16.34 | 16.83 | 17.15 | 16.67 | 16.03 | 15.55 | 16.04 |
| 2023-07-01 00:30:00 | 16.72 | 17.25 | 15.90 | 17.07 | 14.49 | 16.57 | NaN | 16.43 | 16.86 | 16.57 | 17.54 | 17.16 | 16.28 | 16.95 | 17.26 | 16.71 | 16.09 | 15.44 | 15.97 |
| 2023-07-01 00:40:00 | 16.69 | 17.14 | 15.95 | 17.11 | 14.49 | 16.68 | NaN | 16.47 | 16.87 | 16.59 | 17.48 | 17.31 | NaN | 17.11 | 17.32 | 16.75 | 15.98 | 15.40 | 15.99 |
Reference weather station data#
The bias correction relies on co-located shortwave radiation measurements to predict the radiation-induced temperature offset. We use the MeteoSwissClient to fetch air temperature and global shortwave radiation from official MeteoSwiss automated weather stations in the same region:
aws_client = clients.MeteoSwissClient(region)
aws_ts_df = aws_client.get_ts_df(
    [settings.ECV_TEMPERATURE, settings.ECV_RADIATION_SHORTWAVE],
    start=start,
    end=end,
)
aws_ts_df.head()
| station_id | time | temperature | radiation_shortwave |
|---|---|---|---|
| REH | 2023-07-01 00:00:00 | 15.6 | 0.0 |
| | 2023-07-01 00:10:00 | 15.7 | 0.0 |
| | 2023-07-01 00:20:00 | 15.5 | 0.0 |
| | 2023-07-01 00:30:00 | 15.5 | 0.0 |
| | 2023-07-01 00:40:00 | 15.5 | 1.0 |
We can visualise both station networks on a shared map:
colors = sns.color_palette()
fig, ax = plt.subplots()
aws_client.stations_gdf.to_crs(lcd_stations_gdf.crs).plot(
    ax=ax, color=colors[0], label="AWS (MeteoSwiss)"
)
lcd_stations_gdf.plot(ax=ax, color=colors[1], label="LCD (AWEL)")
ax.legend()
cx.add_basemap(ax, crs=lcd_stations_gdf.crs, attribution="")
(C) OpenStreetMap contributors, Tiles style by Humanitarian OpenStreetMap Team hosted by OpenStreetMap France
For the spatial matching between LCD stations and their nearest AWS reference (needed to assign each LCD sensor the correct radiation time series), we can convert the long-form data frame and station geo-data frame to a vector data cube:
aws_ts_cube = utils.long_to_cube(aws_ts_df, aws_client.stations_gdf)
aws_ts_cube
<xarray.Dataset> Size: 242kB
Dimensions: (geometry: 3, time: 4321)
Coordinates:
* geometry (geometry) geometry 24B POINT (2681433 1253548) ... ...
station_id (geometry) object 24B 'REH' 'SMA' 'UEB'
* time (time) datetime64[us] 35kB 2023-07-01 ... 2023-07-31
Data variables:
temperature (geometry, time) float64 104kB 15.6 15.7 ... nan nan
radiation_shortwave (geometry, time) float64 104kB 0.0 0.0 0.0 ... 4.0 4.0
Indexes:
geometry GeometryIndex (crs=EPSG:2056)
Applying bias correction#
We call apply_bias_correction, passing the Hugging Face Hub repository string directly as the model. Because the pipeline is serialized with skops, it is recommended to first inspect the types that skops does not trust by default (here, the non-scikit-learn types the file contains) as a security step before deserialization:
model_repo = "martibosch/decentlab-bias-correction"
trusted = skops_io.get_untrusted_types(file=hf_hub_download(model_repo, "model.skops"))
print("Types to review before trusting:")
for t in trusted:
    print(f" {t}")
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Types to review before trusting:
interpret.glassbox._ebm._ebm.ExplainableBoostingRegressor
meteora.bias_correction.BestScaleRadiationTransformer
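The review step can be made explicit by comparing the reported types against an allow-list before passing them on as trusted. The sketch below is only one possible approach and not part of skops or meteora: the allow-list of top-level packages and the helper name `split_by_allow_list` are assumptions chosen for illustration, and the type names are copied from the output above.

```python
# Types reported by skops for this model file (copied from the output above).
reported_types = [
    "interpret.glassbox._ebm._ebm.ExplainableBoostingRegressor",
    "meteora.bias_correction.BestScaleRadiationTransformer",
]

# Hypothetical allow-list: top-level packages we have decided to trust.
ALLOWED_PACKAGES = {"interpret", "meteora"}


def split_by_allow_list(type_names, allowed_packages):
    """Partition fully qualified type names into accepted and rejected lists."""
    accepted, rejected = [], []
    for name in type_names:
        top_level = name.split(".", 1)[0]
        if top_level in allowed_packages:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


accepted, rejected = split_by_allow_list(reported_types, ALLOWED_PACKAGES)
if rejected:
    # Stop before deserialization if anything unexpected shows up.
    raise ValueError(f"Unexpected types in model file: {rejected}")
print(f"{len(accepted)} types accepted")
```

Matching on the top-level package is deliberately coarse. A stricter variant would list the exact fully qualified type names instead.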
After reviewing the listed types, we can pass the Hugging Face Hub repository alongside the LCD and AWS data to apply_bias_correction, which downloads and deserializes the model automatically. For each LCD station (lcd_stations_gdf), the function finds the nearest AWS reference station (aws_ts_cube), extracts its radiation time series, runs it through the model pipeline to predict the radiation-induced temperature offset, and finally subtracts that offset from the raw LCD readings:
cor_ts_df = apply_bias_correction(
    lcd_ts_df,
    aws_ts_cube,
    model_repo,
    lcd_stations_gdf=lcd_stations_gdf,
    trusted=trusted,
)
cor_ts_df.head()
/home/docs/checkouts/readthedocs.org/user_builds/meteora/checkouts/latest/.pixi/envs/doc/lib/python3.13/site-packages/sklearn/base.py:463: InconsistentVersionWarning: Trying to unpickle estimator Pipeline from version 1.7.2 when using version 1.8.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn(
| time / station_id | 530 | 534 | 2651 | 2652 | 2653 | 2655 | 2656 | 2657 | 2659 | 2679 | 2680 | 2682 | 2683 | 2688 | 2689 | 2695 | 2696 | 2697 | 2810 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-07-01 00:00:00 | NaN | 17.354606 | 16.164606 | 17.124606 | 14.874606 | 16.644606 | NaN | 16.644606 | 17.044606 | 16.544606 | 17.844606 | 17.264606 | 16.624606 | 17.024606 | 17.304606 | 16.984606 | 16.384606 | 16.044612 | 16.394612 |
| 2023-07-01 00:10:00 | 17.094606 | 17.484606 | 16.214606 | 17.164606 | 14.884606 | 16.754606 | NaN | NaN | 17.094606 | 16.514606 | 17.864606 | 17.294606 | 16.614606 | 17.124606 | 17.484606 | 16.954606 | 16.374606 | 15.897380 | 16.347380 |
| 2023-07-01 00:20:00 | 17.064606 | 17.484606 | 16.214606 | 17.254606 | 14.834606 | 16.854606 | NaN | 16.694606 | 17.134606 | 16.724606 | 17.904606 | 17.354606 | 16.684606 | 17.174606 | 17.494606 | 17.014606 | 16.374606 | 15.841275 | 16.331275 |
| 2023-07-01 00:30:00 | 17.064606 | 17.594606 | 16.244606 | 17.414606 | 14.834606 | 16.914606 | NaN | 16.774606 | 17.204606 | 16.914606 | 17.884606 | 17.504606 | 16.624606 | 17.294606 | 17.604606 | 17.054606 | 16.434606 | 15.690199 | 16.220199 |
| 2023-07-01 00:40:00 | 17.040136 | 17.484606 | 16.300136 | 17.454606 | 14.834606 | 17.030136 | NaN | 16.820136 | 17.214606 | 16.934606 | 17.824606 | 17.654606 | NaN | 17.460136 | 17.664606 | 17.100136 | 16.330136 | 15.645376 | 16.235376 |
Note that ref_ts can be any meteora data structure in wide form (a single reference station with a flat time index) or long form (multiple reference stations, multi-indexed by station and time, in that order). Wide inputs apply the same correction to every LCD station, based on the radiation readings of the single reference station. Long inputs require lcd_stations_gdf and ref_stations_gdf so that each LCD station is matched to its nearest reference station via a spatial join. When ref_ts is a vector data cube, its geometry is used for the spatial join directly, so ref_stations_gdf is not needed.
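The matching and subtraction logic described above can be sketched in plain Python. This is an illustration under stated assumptions and not meteora's implementation: the station ids, coordinates and readings are made up, distances are taken as Euclidean in a projected CRS (metres), and `predict_offset` is a hypothetical linear stand-in for the model pipeline.

```python
import math

# Hypothetical station coordinates in a projected CRS (metres).
lcd_stations = {"lcd_a": (2683000.0, 1248000.0), "lcd_b": (2690000.0, 1252000.0)}
ref_stations = {"ref_1": (2682000.0, 1247000.0), "ref_2": (2691000.0, 1253000.0)}


def nearest_reference(lcd_xy, ref_stations):
    """Return the id of the reference station closest to ``lcd_xy``."""
    return min(
        ref_stations,
        key=lambda ref_id: math.dist(lcd_xy, ref_stations[ref_id]),
    )


def predict_offset(radiation):
    """Stand-in for the model pipeline: a made-up linear response in K."""
    return 0.003 * radiation


# Toy readings at a single time step (temperature in °C, radiation in W m⁻²).
lcd_temperature = {"lcd_a": 25.0, "lcd_b": 24.0}
ref_radiation = {"ref_1": 800.0, "ref_2": 400.0}

# Step 1: match every LCD station to its nearest reference station.
matches = {
    lcd_id: nearest_reference(xy, ref_stations)
    for lcd_id, xy in lcd_stations.items()
}
# Step 2: predict the offset from the matched radiation and subtract it.
corrected = {
    lcd_id: lcd_temperature[lcd_id] - predict_offset(ref_radiation[matches[lcd_id]])
    for lcd_id in lcd_stations
}
print(matches)
```

In the wide case there is a single reference station, so the matching step disappears and every LCD station receives the offset predicted from the same radiation series.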
Results#
We can now inspect the correction for an LCD station over a three-day window in mid-July, alongside the shortwave radiation of a nearby MeteoSwiss reference station. This shows how the AWS and LCD temperatures (raw and corrected) differ and how those differences relate to the incoming shortwave radiation:
aws_wide = utils.long_to_wide(aws_ts_df)
station_id = lcd_ts_df.columns[0]
plot_slice = slice("2023-07-10", "2023-07-12")
rad_wide = aws_wide[settings.ECV_RADIATION_SHORTWAVE]
aws_temp_wide = aws_wide[settings.ECV_TEMPERATURE]
aws_station_id = rad_wide.columns[0]  # first AWS column, used as a stand-in for the nearest station
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(2 * figwidth, figheight))
axes[0].plot(
    aws_temp_wide.loc[plot_slice, aws_station_id],
    label=f"AWS ({aws_station_id})",
    color="steelblue",
    alpha=0.8,
)
axes[0].plot(
    lcd_ts_df.loc[plot_slice, station_id],
    label="LCD (uncorrected)",
    color="tomato",
    alpha=0.8,
)
axes[0].plot(
    cor_ts_df.loc[plot_slice, station_id],
    label="LCD (corrected)",
    color="seagreen",
    linewidth=1.5,
    linestyle="--",
)
axes[0].set_ylabel("Temperature (\u00b0C)")
axes[0].legend()
axes[1].plot(
    rad_wide.loc[plot_slice, aws_station_id],
    color="goldenrod",
    label=f"Shortwave radiation ({aws_station_id})",
)
axes[1].set_ylabel("Radiation (W m\u207b\u00b2)")
axes[1].set_xlabel("Time")
axes[1].legend()
fig.tight_layout()
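The visual comparison can be complemented with summary statistics of the LCD readings against the reference, before and after correction. The following is a minimal sketch in plain Python: the helper names `mean_bias` and `rmse` are our own, and the numbers are made up for illustration rather than taken from the data above.

```python
import math


def paired_differences(candidate, reference):
    """Differences (candidate - reference) where both values are available."""
    return [
        c - r
        for c, r in zip(candidate, reference)
        if c is not None and r is not None
    ]


def mean_bias(candidate, reference):
    """Mean difference between candidate and reference readings."""
    diffs = paired_differences(candidate, reference)
    return sum(diffs) / len(diffs)


def rmse(candidate, reference):
    """Root mean square error of candidate against reference readings."""
    diffs = paired_differences(candidate, reference)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up temperatures in °C (None marks a gap), for illustration only.
aws = [20.0, 22.0, 24.0, 23.0]
lcd_raw = [20.5, 23.5, 26.0, None]
lcd_cor = [20.1, 22.3, 24.4, None]

for label, series in [("raw", lcd_raw), ("corrected", lcd_cor)]:
    print(f"{label}: bias={mean_bias(series, aws):.2f} K, RMSE={rmse(series, aws):.2f} K")
```

Restricting such statistics to daytime time steps, for example where the reference radiation exceeds a chosen threshold, would isolate the radiation-driven part of the error.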
Training a correction model#
Besides applying pre-trained models (via apply_bias_correction), meteora also supports training custom bias correction models from a paired time series of shortwave radiation at a co-located reference AWS (X_train) and the observed temperature bias of the LCD sensor (y_train, i.e. LCD minus reference temperature). This is mostly done with scikit-learn pipelines, which chain a sequence of data transformations with a final predictor. We can inspect the structure of the pre-trained pipeline by loading it explicitly:
model = load_correction_model(model_repo, "model.skops", trusted=trusted)
model
/home/docs/checkouts/readthedocs.org/user_builds/meteora/checkouts/latest/.pixi/envs/doc/lib/python3.13/site-packages/sklearn/base.py:463: InconsistentVersionWarning: Trying to unpickle estimator Pipeline from version 1.7.2 when using version 1.8.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn(
Pipeline(steps=[('radiation_transformer',
BestScaleRadiationTransformer(radiation_col='radiation_shortwave',
time_col='time',
window_minutes=[60, 120, 180,
240, 300])),
('model', ExplainableBoostingRegressor())])
BestScaleRadiationTransformer parameters:
| Parameter | Value |
|---|---|
| window_minutes | [60, 120, ...] |
| time_col | 'time' |
| radiation_col | 'radiation_shortwave' |
ExplainableBoostingRegressor parameters:
| Parameter | Value |
|---|---|
| feature_names | None |
| feature_types | None |
| max_bins | 1024 |
| max_interaction_bins | 64 |
| interactions | '5x' |
| exclude | None |
| validation_size | 0.15 |
| outer_bags | 14 |
| inner_bags | 0 |
| learning_rate | 0.04 |
| greedy_ratio | 10.0 |
| cyclic_progress | False |
| smoothing_rounds | 500 |
| interaction_smoothing_rounds | 100 |
| max_rounds | 50000 |
| early_stopping_rounds | 100 |
| early_stopping_tolerance | 1e-05 |
| callback | None |
| min_samples_leaf | 4 |
| min_hessian | 0.0 |
| reg_alpha | 0.0 |
| reg_lambda | 0.0 |
| max_delta_step | 0.0 |
| gain_scale | 5.0 |
| min_cat_samples | 10 |
| cat_smooth | 10.0 |
| missing | 'separate' |
| max_leaves | 2 |
| monotone_constraints | None |
| objective | 'rmse' |
| n_jobs | -2 |
| random_state | 42 |
We can see that the pipeline is composed of a BestScaleRadiationTransformer followed by an ExplainableBoostingRegressor (Lou et al., 2013, Nori et al., 2019). The BestScaleRadiationTransformer from meteora is based on the observation that the temperature bias of an unshielded sensor is driven not only by instantaneous shortwave radiation but also by the radiation accumulated over the preceding period, owing to the thermal inertia of the sensor (Beele et al., 2022, Bell et al., 2015, Cornes et al., 2020). Accordingly, during fit the transformer evaluates a set of candidate rolling-sum windows (in minutes) and selects the one most correlated with the observed temperature bias. During transform it applies that window to a new radiation time series and returns the result as a single-column DataFrame, ready for any scikit-learn-compatible regressor.
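The window selection idea can be sketched without any dependencies. The code below illustrates the principle and is not the transformer's actual code: it assumes regularly sampled data (10-minute steps), a trailing rolling sum that starts with partial windows, and the absolute Pearson correlation as the selection criterion. The radiation series is synthetic, and the bias is built from the 60-minute rolling sum so that the expected choice is known in advance.

```python
import math

STEP_MINUTES = 10  # assumed regular sampling interval


def rolling_sum(values, window_minutes, step_minutes=STEP_MINUTES):
    """Trailing rolling sum over ``window_minutes`` of regularly sampled data."""
    n_steps = max(1, window_minutes // step_minutes)
    sums, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= n_steps:
            running -= values[i - n_steps]  # drop the sample leaving the window
        sums.append(running)
    return sums


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_best_window(radiation, bias, candidates):
    """Return the window most correlated with the bias, plus all scores."""
    scores = {w: pearson(rolling_sum(radiation, w), bias) for w in candidates}
    best = max(scores, key=lambda w: abs(scores[w]))
    return best, scores


# Synthetic two-day radiation series (10-minute steps) with a varying cloud factor.
radiation = [
    max(0.0, 800.0 * math.sin(2 * math.pi * (i - 36) / 144))
    * (0.4 + 0.15 * ((i * 7) % 5))
    for i in range(288)
]
# Synthetic bias proportional to the 60-minute rolling sum.
bias = [0.001 * s for s in rolling_sum(radiation, 60)]

best, scores = select_best_window(radiation, bias, [30, 60, 120, 240])
print(f"Best window: {best} min")
```

Because the synthetic bias is an exact multiple of the 60-minute rolling sum, that window reaches a correlation of essentially 1 and is the one selected. With real measurements, the scores of neighbouring windows are typically much closer to each other.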
Below we illustrate the workflow with the MeteoSwiss radiation data and a synthetic bias of 3 mK per W m⁻²:
rad_ser = aws_wide[settings.ECV_RADIATION_SHORTWAVE].iloc[:, 0].dropna()
# X_train: time + radiation columns; y_train: temperature bias (LCD - reference)
# in practice y_train comes from paired LCD–reference measurements
X_train = pd.DataFrame(
    {
        settings.TIME_COL: rad_ser.index,
        settings.ECV_RADIATION_SHORTWAVE: rad_ser.values,
    }
)
# synthetic target: 3 mK per W m⁻² (for illustration)
y_train = 0.003 * rad_ser.values
window_minutes = [30, 60, 120, 240]
pipeline = Pipeline(
    [
        ("transformer", BestScaleRadiationTransformer(window_minutes)),
        ("regressor", LinearRegression()),
    ]
)
pipeline.fit(X_train, y_train)
print(f"Best radiation window: {pipeline['transformer'].best_scale_} min")
Best radiation window: 30 min
While LinearRegression works well as a first-order approximation, non-linear models may better capture the relationship between accumulated radiation and sensor bias (Beele et al., 2022, Cornes et al., 2020, Climatology Group of the University of Bern, 2026).
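With real data, the training target would come from paired observations rather than from a synthetic relationship. A minimal sketch of that step follows, assuming LCD and reference temperatures are available as mappings from timestamp to value. The readings, timestamps and the helper name `temperature_bias` are hypothetical. With pandas Series, subtracting the two series (which aligns them on the index) and dropping missing values would achieve the same.

```python
# Hypothetical paired 10-minute readings in °C; None marks a missing value.
lcd_temperature = {
    "2025-07-01T12:00": 27.4,
    "2025-07-01T12:10": 27.9,
    "2025-07-01T12:20": None,
    "2025-07-01T12:30": 28.3,
}
ref_temperature = {
    "2025-07-01T12:00": 25.9,
    "2025-07-01T12:10": 26.1,
    "2025-07-01T12:20": 26.2,
    "2025-07-01T12:40": 26.4,
}


def temperature_bias(lcd, ref):
    """Return LCD minus reference for timestamps where both readings exist."""
    shared = sorted(set(lcd) & set(ref))
    return {
        t: lcd[t] - ref[t]
        for t in shared
        if lcd[t] is not None and ref[t] is not None
    }


bias = temperature_bias(lcd_temperature, ref_temperature)
timestamps = list(bias)  # time steps to keep in the radiation features
target = list(bias.values())  # matching bias values for training
print(timestamps)
```

The radiation features must be restricted to the same timestamps so that the features and the target stay aligned row by row.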
The fitted pipeline can be serialized with skops and uploaded to Hugging Face Hub for later reuse:
skops_io.dump(pipeline, "model.skops")
from huggingface_hub import HfApi
api = HfApi()
api.create_repo("username/my-lcd-bias-correction", repo_type="model")
api.upload_file(
    path_or_fileobj="model.skops",
    path_in_repo="model.skops",
    repo_id="username/my-lcd-bias-correction",
)
References#
Eva Beele, Maarten Reyniers, Raf Aerts, and Ben Somers. Quality control and correction method for air temperature data from a citizen science weather station network in Leuven, Belgium. Earth System Science Data, 14(10):4681–4717, 2022.
Simon Bell, Dan Cornford, and Lucy Bastin. How good are citizen weather stations? Addressing a biased opinion. Weather, 70(3):75–84, 2015.
Yann Georg Büchau. Modelling Shield Temperature Sensors: An Assessment of the Netatmo Citizen Weather Station. PhD thesis, Universität Hamburg, 2018.
Lee Chapman, Cassandra Bell, and Simon Bell. Can the crowdsourcing data paradigm take atmospheric science to a new level? A case study of the urban heat island of London quantified using Netatmo weather stations. International Journal of Climatology, 2016.
Richard C Cornes, Marieke Dirksen, and Raymond Sluiter. Correcting citizen-science air temperature measurements across the Netherlands for short wave radiation bias. Meteorological Applications, 27(1):e1814, 2020.
Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 623–631. ACM, 2013.
C. Muller, L. Chapman, C. S. B. Grimmond, D. Young, and X. Cai. Sensors and the city: a review of urban meteorological networks. International Journal of Climatology, 33(7):1585–1600, 2013.
Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. InterpretML: a unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223, 2019.
Carmen Hau Man Wong, Yu Ting Kwok, Yueyang He, and Edward Ng. Government-involved urban meteorological networks (UMNs): a global review. Urban Climate, 61:102409, 2025.
Amt für Abfall, Wasser, Energie und Luft. Lufttemperatur und Luftfeuchte LoRa-Sensor-Messwerte. Available from (in German) https://opendata.swiss/de/dataset/lufttemperatur-und-luftfeuchte-lora-sensor-messwerte. Accessed: 14 January 2026, 2026.