API

Available clients

AEMET

AEMET client.

class meteora.clients.aemet.AemetClient(region, api_key, **sjoin_kwargs)

AEMET client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • api_key (str) – AEMET API key.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.
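When region is given as a sequence of west, south, east and north bounds, keeping the stations inside it reduces to a point-in-box test. The sketch below illustrates only that idea on plain (x, y) tuples; the function name and the station coordinates are hypothetical, and the clients themselves rely on geopandas.sjoin (which also covers polygon regions and the sjoin_kwargs options).

```python
from typing import Dict, List, Sequence, Tuple


def stations_within_bounds(
    stations: Dict[str, Tuple[float, float]],
    bounds: Sequence[float],
) -> List[str]:
    """Return the ids of the stations whose (x, y) point lies within the bounds.

    The bounds follow the (west, south, east, north) order; points lying on
    the boundary are kept.
    """
    west, south, east, north = bounds
    if west > east or south > north:
        raise ValueError("bounds must be ordered as (west, south, east, north)")
    return [
        station_id
        for station_id, (x, y) in stations.items()
        if west <= x <= east and south <= y <= north
    ]


# Hypothetical stations as (longitude, latitude) pairs.
stations = {
    "A": (2.15, 41.39),
    "B": (2.30, 41.50),
    "C": (3.00, 42.00),
}
print(stations_within_bounds(stations, (2.0, 41.3, 2.5, 41.6)))  # ['A', 'B']
```

Regions crossing the antimeridian (where west > east) are rejected by this sketch rather than handled.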

get_ts_df(variables)

Get time series data frame for the last 24h.

Parameters:

variables (str, int or list-like of str or int) – Target variables, which can be either an AEMET variable code (integer or string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

property request_headers: dict

Request headers.

property variables_df: DataFrame

Variables dataframe.

Agrometeo

Agrometeo client.

class meteora.clients.agrometeo.AgrometeoClient(region, *, crs=None, **sjoin_kwargs)

Agrometeo client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • crs (str, dict or pyproj.CRS, optional) – The coordinate reference system (CRS) to be used. For Agrometeo, the provided value must be equivalent to either EPSG:21781 (default) or EPSG:4326.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

get_ts_df(variables, start, end, *, scale=None, measurement=None)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either an Agrometeo variable code (integer or string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

  • scale ({"hour", "day", "month", "year"}, optional) – Temporal scale of the measurements. If None, returns the finest scale, i.e., 10 minutes.

  • measurement ({"min", "avg", "max"}, optional) – Whether the measurement values correspond to the minimum, average or maximum value for the required temporal scale. If None, returns the average. Ignored if scale is None.

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame
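The timezone rule for start and end (naive values interpreted in the source timezone, aware values converted to it) can be sketched with the standard library alone. The helper normalize_bound and the fixed UTC+1 offset standing in for the client’s TZ attribute are assumptions for illustration; the clients accept anything pandas.Timestamp accepts, and the pandas counterparts of the two branches are Timestamp.tz_localize and Timestamp.tz_convert.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical source timezone: a fixed UTC+1 offset stands in for the
# client's TZ attribute (real data sources use named zones with DST rules).
SOURCE_TZ = timezone(timedelta(hours=1))


def normalize_bound(value: datetime, tz: timezone = SOURCE_TZ) -> datetime:
    """Interpret naive values in ``tz`` and convert aware values to it."""
    if value.tzinfo is None or value.tzinfo.utcoffset(value) is None:
        # Naive: keep the wall-clock time and attach the source timezone.
        return value.replace(tzinfo=tz)
    # Aware: same instant, expressed in the source timezone.
    return value.astimezone(tz)


start = normalize_bound(datetime(2024, 6, 1, 0, 0))
end = normalize_bound(datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc))
print(start.isoformat())  # 2024-06-01T00:00:00+01:00
print(end.isoformat())    # 2024-06-01T13:00:00+01:00
```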

AWEL

Office for Waste, Water, Energy and Air (AWEL) of the canton of Zurich.

class meteora.clients.awel.AWELClient(region, *, sensor_height=2, pooch_kwargs=None, progress=None, **sjoin_kwargs)

AWEL client (canton of Zurich).

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • sensor_height ({1, 1.8, 2, 2.5, 2.7, 3.5, 4}, default 2) – Sensor height (in m) above the site location. Must be one of 1, 1.8, 2, 2.5, 2.7, 3.5 or 4; defaults to 2 m.

  • pooch_kwargs (dict, optional) – Keyword arguments to pass to the pooch.retrieve function when downloading the stations time series data.

  • progress (bool, optional) – Whether to show a tqdm progress bar for partitioned time series fetches. If None, the value from settings.SHOW_PROGRESS is used.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

get_ts_df(variables, start, end)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either an AWEL variable code (string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

Iowa Environmental Mesonet (IEM)

Iowa Environmental Mesonet (IEM) client.

class meteora.clients.iem.ASOSOneMinIEMClient(region, **sjoin_kwargs)

ASOS 1 minute Iowa Environmental Mesonet (IEM) client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

class meteora.clients.iem.METARASOSIEMClient(region, **sjoin_kwargs)

METAR/ASOS Iowa Environmental Mesonet (IEM) client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

Meteocat

Meteocat client.

class meteora.clients.meteocat.MeteocatClient(region, api_key, *, progress=None, **sjoin_kwargs)

Meteocat client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • api_key (str) – Meteocat API key.

  • progress (bool, optional) – Whether to show a tqdm progress bar for partitioned time series fetches. If None, the value from settings.SHOW_PROGRESS is used.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

get_ts_df(variables, start, end)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either a Meteocat variable code (integer or string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

MeteoSwiss

MeteoSwiss client.

class meteora.clients.meteoswiss.MeteoSwissClient(region, *, crs=None, pooch_kwargs=None, progress=None, **sjoin_kwargs)

MeteoSwiss client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • crs (str, dict or pyproj.CRS, optional) – The coordinate reference system (CRS) to be used. For MeteoSwiss, the provided value must be equivalent to either EPSG:2056 (default) or EPSG:4326.

  • pooch_kwargs (dict, optional) – Keyword arguments to pass to the pooch.retrieve function when caching file downloads.

  • progress (bool, optional) – Whether to show a tqdm progress bar for partitioned time series fetches. If None, the value from settings.SHOW_PROGRESS is used.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

get_ts_df(variables, start, end)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either a MeteoSwiss variable code (integer or string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

Netatmo

Netatmo client.

class meteora.clients.netatmo.NetatmoClient(region, client_id, client_secret, *, redirect_uri=None, token=None, window_size=None, **sjoin_kwargs)

Netatmo client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • client_id (str) – Client ID of the Netatmo API, used to authenticate the client, i.e., to obtain and refresh tokens.

  • client_secret (str) – Client secret of the Netatmo API, used together with client_id to authenticate the client.

  • redirect_uri (str, optional) – Redirect URI for the Netatmo app, used for authentication with the “Authorization code” grant type (the default). Ignored if token is provided. If None, the value from settings.REDIRECT_URI is used.

  • token (dict, optional) – Token dictionary with the keys “access_token” and “refresh_token”.

  • window_size (numeric, optional) – Window size (square side, in degrees) to split the region into non-overlapping windows (to bypass Netatmo API limits). If None, the value from clients.netatmo.WINDOW_SIZE is used.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.
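How the region is split into windows is not detailed here. The sketch below assumes a simple regular grid that starts at the south-west corner and clips the last row and column to the region bounds, so that the windows do not overlap and cover the region exactly. The function name and the bounds are hypothetical, and the client’s actual tiling may differ.

```python
import math
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (west, south, east, north)


def split_into_windows(bounds: Bounds, window_size: float) -> List[Bounds]:
    """Tile ``bounds`` with non-overlapping square windows of side ``window_size``.

    Windows on the east and north edges are clipped to the region, so the
    tiles cover the region exactly.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    west, south, east, north = bounds
    n_cols = max(1, math.ceil((east - west) / window_size))
    n_rows = max(1, math.ceil((north - south) / window_size))
    windows = []
    for row in range(n_rows):
        y0 = south + row * window_size
        y1 = min(y0 + window_size, north)
        for col in range(n_cols):
            x0 = west + col * window_size
            x1 = min(x0 + window_size, east)
            windows.append((x0, y0, x1, y1))
    return windows


windows = split_into_windows((8.0, 47.0, 8.5, 47.25), window_size=0.25)
print(windows)  # [(8.0, 47.0, 8.25, 47.25), (8.25, 47.0, 8.5, 47.25)]
```

Each window would then be queried separately and the resulting stations merged, which is how splitting helps stay within per-request API limits.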

get_ts_df(variables, start, end, *, scale=None, limit=None, real_time=None)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either a Netatmo variable code (integer or string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

  • scale ({"30min", "1hour", "3hours", "1day", "1week", "1month"}, optional) – Temporal scale of the measurements. If None, returns the finest scale, i.e., “30min” (30 minutes).

  • limit (int, optional) – Maximum number of time steps to return. If None, the maximum number allowed by the Netatmo API (1024) is used.

  • real_time (bool, optional) – If True, the exact timestamps are returned. Otherwise, when scale is coarser than the finest scale (30 minutes), timestamps are offset by half of the scale. If None, the default value of False is used (in line with the Netatmo API).

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

National Oceanic and Atmospheric Administration (NOAA)

National Oceanic and Atmospheric Administration (NOAA) client.

class meteora.clients.noaa.GHCNHourlyClient(region, *, pooch_kwargs=None, progress=None, **sjoin_kwargs)

NOAA GHCN hourly client.

Parameters:
  • region (str, Sequence, GeoSeries, GeoDataFrame, PathLike, or IO) –

    The region to process. This can be either:

    • A string with a place name (Nominatim query) to geocode.

    • A sequence with the west, south, east and north bounds.

    • A geometric object, e.g., shapely geometry, or a sequence of geometric objects. In such a case, the value will be passed as the data argument of the GeoSeries constructor, and needs to be in the same CRS as the one used by the client’s class (i.e., the CRS class attribute).

    • A geopandas geo-series or geo-data frame.

    • A filename or URL, a file-like object opened in binary (‘rb’) mode, or a Path object that will be passed to geopandas.read_file.

  • pooch_kwargs (dict, optional) – Keyword arguments to pass to the pooch.retrieve function when downloading the stations time series data.

  • progress (bool, optional) – Whether to show a tqdm progress bar for partitioned time series fetches. If None, the value from settings.SHOW_PROGRESS is used.

  • sjoin_kwargs (dict, optional) – Keyword arguments to pass to the geopandas.sjoin function when filtering the stations within the region. If None, the value from settings.SJOIN_KWARGS is used.

get_ts_df(variables, start, end)

Get time series data frame.

Parameters:
  • variables (str, int or list-like of str or int) – Target variables, which can be either a GHCNh variable code (string) or an essential climate variable (ECV) following the Meteora nomenclature (string).

  • start (datetime-like, str, int, float) – Start of the requested data period. Accepts any datetime-like object that can be passed to pandas.Timestamp. Naive values are interpreted in the data source’s timezone (the client’s TZ attribute); timezone-aware values are converted to it.

  • end (datetime-like, str, int, float) – End of the requested data period. Accepts the same values as start, with the same timezone handling.

Returns:

ts_df – Long-form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column). The time level of the index is timezone-aware in the data source’s timezone (the client’s TZ attribute).

Return type:

pandas.DataFrame

Quality control

Quality control for CWS data.

Based on Napoly et al., 2018 (https://doi.org/10.3389/feart.2018.00118)

class meteora.qc.QCPipeline(*, stations=None, elevation=None, steps=None, step_kwargs=None)

Chain of QC steps applied to a (wide) time series data frame.

The pipeline runs an ordered list of steps, dropping the stations that each step flags (and applying any value transformation, e.g. the elevation adjustment). Which steps run and in which order is controlled by the steps argument; each step’s parameters are passed as its own dict in the positionally-matched step_kwargs list.

Parameters:
  • stations (geopandas.GeoDataFrame or geopandas.GeoSeries, optional) – Station locations indexed by the station id, either as a geo-data frame or as the geometry geoseries directly. Used to auto-populate the station_gser argument of the steps that declare it (flag_mislocated, flag_buddies); if None, those steps are omitted from the defaults (and self-skip with a warning if listed explicitly).

  • elevation (str or pandas.Series, optional) – Source of the per-station elevation used by adjust_elevation, as either a column of stations (str) or a series indexed by the station id. Defaults to the column named in settings.ELEVATION_COL. Used to auto-populate the station_elevation_ser argument; if it cannot be resolved (e.g. the column is absent), adjust_elevation is omitted from the defaults. It (like the geometry) can always be overridden per step via step_kwargs.

  • steps (list of str or callable, optional) – Ordered list of steps to run. Each item is either the name of a built-in function (one of settings.DEFAULT_QC_STEPS plus “flag_buddies” and “mask_outliers”) or a callable with the signature step(ts_df, **kwargs) -> (ts_df, discarded_station_ids). If None, settings.DEFAULT_QC_STEPS is used, omitting steps whose station metadata is absent. The buddy check is not a default step; add it with, e.g., steps=[*settings.DEFAULT_QC_STEPS, “flag_buddies”].

  • step_kwargs (list of dict, optional) – Per-step keyword arguments, positionally matched to steps (so its length must equal that of steps). Each dict is forwarded to the corresponding step, on top of the auto-populated station metadata (which it can override); omit it (or pass {}) to fall back to the settings.* defaults. Only valid when steps is given explicitly; when steps is None the defaults are built automatically.
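The step contract described above (each step takes the wide time series plus keyword arguments and returns the possibly transformed series together with the discarded station ids, with step_kwargs matched by position) can be sketched without pandas. In this hypothetical sketch the wide frame is a dict mapping station ids to value lists, the function run_pipeline and the two toy steps are made up, and QCPipeline’s automatic population of station metadata is not reproduced.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A "wide" time series sketched as {station_id: [value at t0, t1, ...]}.
TimeSeries = Dict[str, List[Optional[float]]]
Step = Callable[..., Tuple[TimeSeries, List[str]]]


def run_pipeline(
    ts: TimeSeries,
    steps: Sequence[Step],
    step_kwargs: Optional[Sequence[dict]] = None,
) -> Tuple[TimeSeries, Dict[str, List[str]]]:
    """Apply ``steps`` in order, dropping the stations each one discards."""
    if step_kwargs is None:
        step_kwargs = [{}] * len(steps)
    if len(step_kwargs) != len(steps):
        raise ValueError("step_kwargs must have the same length as steps")
    discarded: Dict[str, List[str]] = {}
    for step, kwargs in zip(steps, step_kwargs):
        ts, step_discarded = step(ts, **kwargs)
        discarded[step.__name__] = list(step_discarded)
        # Drop the flagged stations before the next step runs.
        ts = {sid: values for sid, values in ts.items() if sid not in step_discarded}
    return ts, discarded


def flag_all_missing(ts, **kwargs):
    """Toy flagging step: discard stations without any valid measurement."""
    return ts, [sid for sid, values in ts.items() if all(v is None for v in values)]


def offset(ts, *, delta=0.0):
    """Toy transforming step: shift every value, discarding no station."""
    shifted = {
        sid: [None if v is None else v + delta for v in values]
        for sid, values in ts.items()
    }
    return shifted, []


ts = {"A": [20.0, 21.0], "B": [None, None], "C": [19.5, None]}
clean, discarded = run_pipeline(ts, [flag_all_missing, offset], [{}, {"delta": 1.0}])
print(clean)      # {'A': [21.0, 22.0], 'C': [20.5, None]}
print(discarded)  # {'flag_all_missing': ['B'], 'offset': []}
```

Keying the discard record by each function’s name mirrors the description of the discarded_ attribute; the union of its values corresponds to discarded_stations.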

discarded_

Mapping of step name (the function name) to the list of station ids it discarded, populated by apply.

Type:

dict

apply(ts_df, *, copy=True)

Run the QC steps on a (wide) time series data frame.

Parameters:
  • ts_df (pandas.DataFrame) – Wide time series data frame with stations as columns and time as index.

  • copy (bool, default True) – Whether to operate on a copy of ts_df (leaving the input untouched).

Returns:

ts_df – Quality-controlled (and possibly elevation-adjusted) time series data frame, with the discarded stations dropped. The ids discarded by each step are available in the discarded_ attribute; their union is in discarded_stations.

Return type:

pandas.DataFrame
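Since get_ts_df returns a long-form data frame indexed by (station, time) while apply expects stations as columns, a single variable has to be pivoted first; in pandas this is typically done with ts_df[variable].unstack(level=0), which moves the first (station) index level to the columns. The stdlib sketch below shows the same reshaping on plain (station, time, value) records; the record layout and the helper name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def long_to_wide(
    records: Iterable[Tuple[Hashable, Hashable, float]],
) -> Tuple[List[Hashable], Dict[Hashable, list]]:
    """Pivot (station, time, value) records into one column per station.

    Returns the sorted time index and, for each station, its values aligned
    on that index (None where the station has no measurement).
    """
    by_station: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)
    for station, time, value in records:
        by_station[station][time] = value
    index = sorted({t for values in by_station.values() for t in values})
    columns = {
        station: [values.get(t) for t in index]
        for station, values in by_station.items()
    }
    return index, columns


records = [("A", 0, 20.0), ("A", 1, 21.0), ("B", 1, 19.0)]
index, columns = long_to_wide(records)
print(index)    # [0, 1]
print(columns)  # {'A': [20.0, 21.0], 'B': [None, 19.0]}
```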

property discarded_stations: list

Union of the stations discarded across all steps (after apply).

meteora.qc.adjust_elevation(ts_df, *, station_elevation_ser=None, atmospheric_lapse_rate=None)

Adjust temperature measurements based on station elevation.

Unlike the other (station-discarding) QC steps, this one transforms the time series and discards no station (its discard list is always empty).

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • station_elevation_ser (pandas.Series, optional) – Series of station elevations, indexed by the station id. If None, the step is skipped and ts_df is returned unchanged.

  • atmospheric_lapse_rate (numeric, optional) – Atmospheric lapse rate (in units of ts_df per unit of station_elevation_ser) to account for the elevation effect. If None, the value from settings.ATMOSPHERIC_LAPSE_RATE is used.

Returns:

adjusted_ts_df, discarded – The elevation-adjusted time series data frame and an (always empty) discard list.

Return type:

pandas.DataFrame, list
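A minimal sketch of a lapse-rate adjustment, assuming readings are reduced to a common sea-level reference by adding lapse_rate × elevation (a linear temperature decrease with height). The sign convention, the reference height and the 0.0065 K/m default (the standard-atmosphere tropospheric lapse rate) are assumptions of this sketch, not necessarily what adjust_elevation or settings.ATMOSPHERIC_LAPSE_RATE use; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def adjust_to_sea_level(
    ts: Dict[str, List[Optional[float]]],
    station_elevation: Dict[str, float],
    lapse_rate: float = 0.0065,  # assumed default, in K per m
) -> Dict[str, List[Optional[float]]]:
    """Add lapse_rate * elevation to every reading of each station.

    With a linear decrease of temperature with height, this brings all the
    stations to a common sea-level reference; missing values stay missing.
    """
    return {
        sid: [
            None if value is None else value + lapse_rate * station_elevation[sid]
            for value in values
        ]
        for sid, values in ts.items()
    }


# Hypothetical station at 400 m with one missing reading.
adjusted = adjust_to_sea_level({"A": [10.0, None]}, {"A": 400.0})
print(adjusted)
```

Here the 10.0 reading becomes 10.0 + 0.0065 × 400 = 12.6 (up to floating-point rounding), and no station is dropped.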

meteora.qc.flag_buddies(ts_df, *, station_gser=None, buddy_radius=None, buddy_min_n=None, low_alpha=None, high_alpha=None, station_outlier_threshold=None, keep_isolated=False)

Spatial buddy check.

Outlier detection within the neighbourhood (“buddies”) of each station, intended to catch faulty values (primarily single, unrealistically high values due to radiative errors) that remain after the station-wise checks. For each station, the buddies are the stations within buddy_radius; at each time step, the median and the Qn scale estimator (a robust alternative to the standard deviation) are computed across the buddies (excluding the checked station), and a modified z-score is derived analogously to flag_outliers. A station is flagged as a buddy outlier when the proportion of its time steps with an outlier z-score exceeds station_outlier_threshold. Adapted from the m5 level of CrowdQC+ (Fenner et al., 2021, https://doi.org/10.3389/feart.2021.720747); unlike the original, this check neither corrects for the atmospheric lapse rate (this is handled separately by adjust_elevation) nor flags individual values: it only flags whole stations.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • station_gser (geopandas.GeoSeries, optional) – Geoseries of station locations (points), indexed by the station id. It is (re)projected to a metric CRS to compute the neighbourhoods, so buddy_radius is interpreted in meters. If None, the step is skipped (with a warning) and no station is flagged.

  • buddy_radius (numeric, optional) – Radius (in meters) within which neighbouring stations are considered buddies. If None, the value from settings.BUDDY_RADIUS is used.

  • buddy_min_n (int, optional) – Minimum number of buddies with valid data required to check a station. Stations with fewer buddies are flagged as isolated rather than evaluated. If None, the value from settings.BUDDY_MIN_N is used.

  • low_alpha (numeric, optional) – Value for the lower tail (in proportion from 0 to 1) that leads to flagging a measurement as a buddy outlier. If None, the value from settings.BUDDY_LOW_ALPHA is used.

  • high_alpha (numeric, optional) – Value for the upper tail (in proportion from 0 to 1) that leads to flagging a measurement as a buddy outlier. If None, the value from settings.BUDDY_HIGH_ALPHA is used.

  • station_outlier_threshold (numeric, optional) – Maximum proportion (from 0 to 1) of buddy-outlier measurements after which the respective station is flagged as a buddy outlier. If None, the value from settings.BUDDY_STATION_OUTLIER_THRESHOLD is used.

  • keep_isolated (bool, default False) – Whether to keep isolated stations (those with fewer than buddy_min_n buddies, which cannot be evaluated). By default they are discarded along with the buddy outliers; set to True to exclude them from the returned list.

Returns:

ts_df, discard_stations – The (unchanged) time series data frame and the list of station ids to discard, i.e. those flagged as buddy outliers and (unless keep_isolated) those flagged as isolated (fewer than buddy_min_n buddies).

Return type:

pandas.DataFrame, list
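The part specific to the buddy check is building each station’s neighbourhood and setting aside isolated stations; the per-time-step z-scores then follow the same logic as flag_outliers. The sketch below covers only the neighbourhood part, with two stated substitutions: it uses great-circle (haversine) distances on a spherical Earth instead of reprojecting to a metric CRS, and it counts all neighbours rather than only those with valid data. Function names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance (in meters) between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_buddies(
    stations: Dict[str, Tuple[float, float]],
    buddy_radius: float,
    buddy_min_n: int,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return each station's buddies (itself excluded) and the isolated station ids."""
    buddies = {
        sid: [
            other
            for other, q in stations.items()
            if other != sid and haversine_m(p, q) <= buddy_radius
        ]
        for sid, p in stations.items()
    }
    # Stations with too few buddies cannot be evaluated.
    isolated = [sid for sid, found in buddies.items() if len(found) < buddy_min_n]
    return buddies, isolated


# Hypothetical (lon, lat) locations: three nearby stations and a distant one.
stations = {
    "A": (8.540, 47.370),
    "B": (8.545, 47.370),
    "C": (8.550, 47.372),
    "D": (9.000, 47.500),
}
buddies, isolated = find_buddies(stations, buddy_radius=1000, buddy_min_n=2)
print(buddies["A"])  # ['B', 'C']
print(isolated)      # ['D']
```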

meteora.qc.flag_indoor(ts_df, *, station_indoor_corr_threshold=None)

Flag indoor stations.

Stations whose time series of measurements show low correlations with the spatial median time series are likely set up indoors.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • station_indoor_corr_threshold (numeric, optional) – Stations showing Pearson correlations (with the overall station median distribution) lower than this threshold are likely set up indoors. If None, the value from settings.STATION_INDOOR_CORR_THRESHOLD is used.

Returns:

ts_df, indoor_stations – The (unchanged) time series data frame and the list of station ids flagged as indoor.

Return type:

pandas.DataFrame, list

meteora.qc.flag_mislocated(ts_df, *, station_gser=None)

Flag mislocated stations.

When multiple stations share the same location, it is likely due to an incorrect setup that led to automatic location assignment based on the IP address of the wireless network.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns); returned unchanged (this step flags stations purely from their geometry).

  • station_gser (geopandas.GeoSeries, optional) – Geoseries of station locations (points). If None, the step is skipped (with a warning) and no station is flagged.

Returns:

ts_df, mislocated_stations – The (unchanged) time series data frame and the list of station ids considered mislocated.

Return type:

pandas.DataFrame, list
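A minimal sketch of the shared-location rule, assuming exact coordinate equality and that every station at a shared location is flagged (rather than all but one); the function name and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flag_shared_locations(stations: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the ids of all stations whose location is shared with another one."""
    by_location: Dict[Tuple[float, float], List[str]] = defaultdict(list)
    for sid, location in stations.items():
        by_location[location].append(sid)
    # Every station in a group of two or more is considered mislocated.
    return [sid for sids in by_location.values() if len(sids) > 1 for sid in sids]


stations = {"A": (8.54, 47.37), "B": (8.54, 47.37), "C": (8.60, 47.40)}
print(flag_shared_locations(stations))  # ['A', 'B']
```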

meteora.qc.flag_outliers(ts_df, *, low_alpha=None, high_alpha=None, station_outlier_threshold=None)

Flag outlier stations.

Measurements can show suspicious deviations from a normal distribution (based on a modified z-score using the robust Qn scale estimator). Stations with a high proportion of such measurements can be related to radiative errors in non-shaded areas or other measurement errors.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • low_alpha (numeric, optional) – Value for the lower tail (in proportion from 0 to 1) that leads to the rejection of the null hypothesis, i.e., the corresponding measurement does not follow a normal distribution and can be considered an outlier. If None, the value from settings.OUTLIER_LOW_ALPHA is used.

  • high_alpha (numeric, optional) – Value for the upper tail (in proportion from 0 to 1) that leads to the rejection of the null hypothesis, as for low_alpha. If None, the value from settings.OUTLIER_HIGH_ALPHA is used.

  • station_outlier_threshold (numeric, optional) – Maximum proportion (from 0 to 1) of outlier measurements after which the respective station may be flagged as faulty. If None, the value from settings.STATION_OUTLIER_THRESHOLD is used.

Returns:

ts_df, outlier_stations – The (unchanged) time series data frame and the list of station ids flagged as outlier.

Return type:

pandas.DataFrame, list
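The test described above can be sketched with the Python standard library. This is a minimal illustration, not Meteora's implementation: it assumes that z-scores are computed across stations at each time step (similar to CrowdQC+), that low_alpha and high_alpha act as cut-off quantiles of the standard normal distribution, and it uses a naive O(n²) Qn estimator without finite-sample correction. The function names, the dict-of-lists input and the default values are hypothetical, and missing values are not handled.

```python
import statistics
from itertools import combinations


def qn_scale(values):
    """Naive O(n^2) Rousseeuw-Croux Qn scale estimate (no finite-sample correction)."""
    n = len(values)
    if n < 2:
        return 0.0
    h = n // 2 + 1
    k = h * (h - 1) // 2  # rank of the pairwise distance to keep
    distances = sorted(abs(a - b) for a, b in combinations(values, 2))
    return 2.2219 * distances[k - 1]  # 2.2219 makes Qn consistent at the normal


def outlier_mask(values, low_alpha=0.01, high_alpha=0.95):
    """Flag values whose modified z-score falls in the lower or upper tail of N(0, 1)."""
    scale = qn_scale(values)
    if scale == 0:
        return [False] * len(values)
    center = statistics.median(values)
    normal = statistics.NormalDist()  # standard normal distribution
    mask = []
    for x in values:
        p = normal.cdf((x - center) / scale)  # cumulative probability of the z-score
        mask.append(p < low_alpha or p > high_alpha)
    return mask


def flag_outlier_stations(ts, low_alpha=0.01, high_alpha=0.95, threshold=0.2):
    """Return the ids of stations whose share of outlier measurements exceeds threshold.

    ts maps each station id to its measurements, aligned on the same time steps.
    """
    station_ids = list(ts)
    n_steps = len(ts[station_ids[0]])
    counts = dict.fromkeys(station_ids, 0)
    for t in range(n_steps):
        values = [ts[s][t] for s in station_ids]  # all stations at time step t
        for s, is_outlier in zip(station_ids, outlier_mask(values, low_alpha, high_alpha)):
            counts[s] += is_outlier
    return [s for s in station_ids if counts[s] / n_steps > threshold]


ts = {
    "A": [20.0, 21.0, 20.0],
    "B": [20.5, 21.5, 20.5],
    "C": [21.0, 22.0, 21.0],
    "D": [19.5, 20.5, 19.5],
    "E": [20.2, 21.2, 20.2],
    "F": [35.0, 36.0, 20.3],  # far too warm at the first two time steps
}
outlier_stations = flag_outlier_stations(ts)  # ["F"]
```

Station F is an outlier at two of the three time steps, a proportion well above the illustrative threshold of 0.2, so it is the only station flagged.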

meteora.qc.flag_unreliable(ts_df, *, unreliable_threshold=None)#

Flag stations with a high proportion of non-valid measurements.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • unreliable_threshold (numeric, optional) – Proportion (from 0 to 1) of non-valid measurements beyond which a station is considered unreliable. If None, the value from settings.UNRELIABLE_THRESHOLD is used.

Returns:

ts_df, unreliable_stations – The (unchanged) time series data frame and the list of station ids considered unreliable.

Return type:

pandas.DataFrame, list
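Assuming that "non-valid" means missing (None or NaN), the proportion test reduces to counting missing values per station. The stdlib sketch below uses a hypothetical dict-of-lists input (station id to measurements) and an illustrative default threshold; it mirrors the documented return pair but is not Meteora's code.

```python
import math


def flag_unreliable(ts, unreliable_threshold=0.2):
    """Return ts unchanged and the ids of stations with too many missing values.

    ts maps each station id to its measurements; None and NaN count as non-valid.
    """
    unreliable = []
    for station_id, values in ts.items():
        n_invalid = sum(1 for v in values if v is None or math.isnan(v))
        if n_invalid / len(values) > unreliable_threshold:
            unreliable.append(station_id)
    return ts, unreliable


ts = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.0, float("nan"), None, 4.0, 5.0],  # 2 of 5 values missing
}
ts, unreliable_stations = flag_unreliable(ts)  # unreliable_stations == ["B"]
```

With a wide pandas data frame (stations as columns), the same per-station proportions are given by ts_df.isna().mean().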

meteora.qc.mask_outliers(ts_df, *, low_alpha=None, high_alpha=None)#

Mask (set to NaN) individual outlier measurements.

Unlike flag_outliers, which discards whole stations, this step replaces the individual measurements flagged as outliers (by the same modified z-score, see _outlier_mask) with NaN, leaving the remaining measurements untouched and discarding no station. It therefore transforms the time series instead of flagging stations, and is in this respect closer to the m2 level of CrowdQC+ (Fenner et al., 2021), which flags values rather than stations. It is deliberately not part of settings.DEFAULT_QC_STEPS.

Parameters:
  • ts_df (pandas.DataFrame) – Time series of measurements (rows) for each station (columns).

  • low_alpha (numeric, optional) – Lower tail proportion (from 0 to 1) beyond which a measurement is masked. If None, the value from settings.OUTLIER_LOW_ALPHA is used.

  • high_alpha (numeric, optional) – Upper tail proportion (from 0 to 1) beyond which a measurement is masked. If None, the value from settings.OUTLIER_HIGH_ALPHA is used.

Returns:

masked_ts_df, discarded – The time series data frame with the outlier measurements set to NaN, and an (always empty) discard list.

Return type:

pandas.DataFrame, list

Climate indices#

Climate indices with xclim integration.

meteora.climate_indices.cooling_degree_days(ts_df, *, thresh=None, freq=None, temperature_col=None, temperature_unit='degC')#

Compute cooling degree days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean temperature before computing the index.
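Cooling degree days accumulate, for each day, how far the daily mean temperature exceeds the threshold. The stdlib sketch below illustrates the aggregation described in the Notes for a single station and a single output period; the (timestamp, temperature) input, the function names and the 18 degC threshold in the usage are assumptions for illustration, and xclim's own implementation also handles units, resampling frequencies and missing data, which this sketch ignores.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def daily_means(records):
    """Average (timestamp, temperature) pairs per calendar day."""
    by_day = defaultdict(list)
    for timestamp, temperature in records:
        by_day[timestamp.date()].append(temperature)
    return {day: statistics.mean(temps) for day, temps in sorted(by_day.items())}


def cooling_degree_days(records, thresh):
    """Sum of the daily mean temperature excess over thresh (cooler days count as 0)."""
    return sum(max(t - thresh, 0.0) for t in daily_means(records).values())


records = [
    (datetime(2024, 7, 1, 6), 18.0), (datetime(2024, 7, 1, 18), 26.0),  # mean 22 -> 4
    (datetime(2024, 7, 2, 6), 14.0), (datetime(2024, 7, 2, 18), 20.0),  # mean 17 -> 0
    (datetime(2024, 7, 3, 6), 19.0), (datetime(2024, 7, 3, 18), 23.0),  # mean 21 -> 3
]
cdd = cooling_degree_days(records, thresh=18.0)  # 7.0
```

Heating degree days follow the same pattern with the sign reversed, i.e., summing max(thresh - t, 0) over the daily means.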

meteora.climate_indices.daily_temperature_range(ts_df, *, freq=None, op=None, temperature_col=None, temperature_unit='degC')#

Compute daily temperature range with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Aggregation operator for the temperature range. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily minimum and maximum temperature before computing the index.
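The daily temperature range is the difference between the daily maximum and minimum temperature, which is then aggregated over the output period (the role of op). A stdlib sketch for a single station and a single output period, with a hypothetical (timestamp, temperature) input and op passed as a Python callable rather than the string that the function above accepts:

```python
import statistics
from collections import defaultdict
from datetime import datetime


def daily_temperature_range(records, op=statistics.mean):
    """Aggregate the daily (max - min) temperature differences with op."""
    by_day = defaultdict(list)
    for timestamp, temperature in records:
        by_day[timestamp.date()].append(temperature)
    ranges = [max(temps) - min(temps) for temps in by_day.values()]
    return op(ranges)


records = [
    (datetime(2024, 7, 1, 6), 12.0),
    (datetime(2024, 7, 1, 12), 20.0),
    (datetime(2024, 7, 1, 18), 16.0),  # range of day 1: 8.0
    (datetime(2024, 7, 2, 6), 10.0),
    (datetime(2024, 7, 2, 15), 22.0),  # range of day 2: 12.0
]
mean_dtr = daily_temperature_range(records)  # 10.0
max_dtr = daily_temperature_range(records, op=max)  # 12.0
```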

meteora.climate_indices.dry_days(ts_df, *, thresh=None, freq=None, op=None, precipitation_col=None, precipitation_unit='mm/day')#

Compute dry days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Precipitation threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Comparison operator. If None, the xclim default is used.

  • precipitation_col (str, optional) – Column holding precipitation values. If None, the default ECV name settings.ECV_PRECIPITATION is used.

  • precipitation_unit (str, default "mm/day") – Units of the precipitation values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily total precipitation before computing the index.

meteora.climate_indices.frost_days(ts_df, *, thresh=None, freq=None, temperature_col=None, temperature_unit='degC')#

Compute frost days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily minimum temperature before computing the index.
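A frost day is a day whose minimum temperature falls below the threshold (0 °C is the usual choice). The sketch below shows the daily-minimum aggregation followed by the count for one station and one period; the input format and function names are assumptions, and the units, frequencies and missing data handled by xclim are ignored.

```python
from collections import defaultdict
from datetime import datetime


def daily_minima(records):
    """Minimum temperature per calendar day from (timestamp, temperature) pairs."""
    by_day = defaultdict(list)
    for timestamp, temperature in records:
        by_day[timestamp.date()].append(temperature)
    return {day: min(temps) for day, temps in by_day.items()}


def count_frost_days(records, thresh=0.0):
    """Number of days whose minimum temperature is below thresh."""
    return sum(1 for tmin in daily_minima(records).values() if tmin < thresh)


records = [
    (datetime(2024, 1, 1, 6), -2.0), (datetime(2024, 1, 1, 14), 5.0),  # frost day
    (datetime(2024, 1, 2, 6), 1.0), (datetime(2024, 1, 2, 14), 6.0),
    (datetime(2024, 1, 3, 6), -0.5), (datetime(2024, 1, 3, 14), 3.0),  # frost day
]
n_frost_days = count_frost_days(records)  # 2
```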

meteora.climate_indices.heat_index(ts_df, *, temperature_col=None, temperature_unit='degC', relative_humidity_col=None, relative_humidity_unit='%')#

Compute heat index with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • temperature_col (str, optional) – Column holding temperature values. If None, the default ECV name settings.ECV_TEMPERATURE is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

  • relative_humidity_col (str, optional) – Column holding relative humidity values. If None, the default ECV name settings.ECV_RELATIVE_HUMIDITY is used.

  • relative_humidity_unit (str, default "%") – Units of the relative humidity values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

meteora.climate_indices.heating_degree_days(ts_df, *, thresh=None, freq=None, temperature_col=None, temperature_unit='degC')#

Compute heating degree days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean temperature before computing the index.

meteora.climate_indices.hot_days(ts_df, *, thresh=None, freq=None, temperature_col=None, temperature_unit='degC')#

Compute hot days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily maximum temperature before computing the index.

meteora.climate_indices.hot_spell_frequency(ts_df, *, thresh=None, window=None, freq=None, op=None, resample_before_rl=None, temperature_col=None, temperature_unit='degC')#

Compute hot spell frequency with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • window (int, optional) – Minimum consecutive days for a spell. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Comparison operator. If None, the xclim default is used.

  • resample_before_rl (bool, optional) – Whether to resample before run length calculation. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily maximum temperature before computing the index.

meteora.climate_indices.hot_spell_total_length(ts_df, *, thresh=None, window=None, freq=None, op=None, resample_before_rl=None, temperature_col=None, temperature_unit='degC')#

Compute hot spell total length with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • window (int, optional) – Minimum consecutive days for a spell. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Comparison operator. If None, the xclim default is used.

  • resample_before_rl (bool, optional) – Whether to resample before run length calculation. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily maximum temperature before computing the index.
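Both hot spell indices rest on runs of consecutive days whose maximum temperature exceeds thresh: the frequency counts the runs of at least window days, and the total length adds up the days in those runs. A minimal stdlib sketch over an already-aggregated list of daily maxima for one period (so resample_before_rl does not come into play); the function name and inputs are hypothetical, and the comparison is fixed to ">".

```python
def hot_spells(daily_tmax, thresh, window):
    """Lengths of the runs of at least window consecutive days with tmax > thresh."""
    spells = []
    run = 0
    for tmax in daily_tmax:
        if tmax > thresh:
            run += 1
        else:
            if run >= window:
                spells.append(run)
            run = 0
    if run >= window:  # a run may still be open at the end of the series
        spells.append(run)
    return spells


daily_tmax = [29, 31, 32, 33, 28, 31, 32, 27, 31, 34, 35, 36]
spells = hot_spells(daily_tmax, thresh=30, window=3)  # [3, 4]
frequency = len(spells)  # 2 hot spells
total_length = sum(spells)  # 7 days
```

The two-day run (31, 32) is too short for a window of 3 and contributes to neither index.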

meteora.climate_indices.humidex(ts_df, *, temperature_col=None, temperature_unit='degC', dew_point_temperature_col=None, dew_point_temperature_unit='degC', relative_humidity_col=None, relative_humidity_unit='%')#

Compute humidex with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • temperature_col (str, optional) – Column holding temperature values. If None, the default ECV name settings.ECV_TEMPERATURE is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

  • dew_point_temperature_col (str, optional) – Column holding dew point temperature values. If None, the default ECV name settings.ECV_DEW_POINT_TEMPERATURE is used.

  • dew_point_temperature_unit (str, default "degC") – Units of the dew point temperature values. Ignored when ts_df provides unit metadata.

  • relative_humidity_col (str, optional) – Column holding relative humidity values. If None, the default ECV name settings.ECV_RELATIVE_HUMIDITY is used.

  • relative_humidity_unit (str, default "%") – Units of the relative humidity values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

If the dew point column is not available, relative humidity is used instead.
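Humidex combines the air temperature with the vapour pressure derived from the dew point. The widely used Environment Canada formula is sketched below; xclim's implementation may differ in its constants, and the relative humidity fallback mentioned above is omitted. The function name and signature are hypothetical.

```python
import math


def humidex(temperature, dew_point):
    """Humidex from air temperature and dew point, both in degC."""
    dew_point_k = dew_point + 273.15
    # Vapour pressure (hPa) from the dew point (in kelvin).
    vapour_pressure = 6.11 * math.exp(5417.7530 * (1 / 273.16 - 1 / dew_point_k))
    # Add 5/9 degC (approximately 0.5555) per hPa of vapour pressure above 10 hPa.
    return temperature + 0.5555 * (vapour_pressure - 10.0)


h = humidex(30.0, 15.0)  # about 34
```

With a very dry atmosphere (vapour pressure below 10 hPa) the formula yields a humidex slightly below the air temperature.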

meteora.climate_indices.ice_days(ts_df, *, thresh=None, freq=None, temperature_col=None, temperature_unit='degC')#

Compute ice days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily maximum temperature before computing the index.

meteora.climate_indices.max_1day_precipitation_amount(ts_df, *, freq=None, precipitation_col=None, precipitation_unit='mm/day')#

Compute maximum 1-day precipitation amount with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • precipitation_col (str, optional) – Column holding precipitation values. If None, the default ECV name settings.ECV_PRECIPITATION is used.

  • precipitation_unit (str, default "mm/day") – Units of the precipitation values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily total precipitation before computing the index.

meteora.climate_indices.max_n_day_precipitation_amount(ts_df, *, window=None, freq=None, precipitation_col=None, precipitation_unit='mm/day')#

Compute maximum n-day precipitation amount with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • window (int, optional) – Rolling window (in days). If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • precipitation_col (str, optional) – Column holding precipitation values. If None, the default ECV name settings.ECV_PRECIPITATION is used.

  • precipitation_unit (str, default "mm/day") – Units of the precipitation values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily total precipitation before computing the index.
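The maximum n-day amount is the largest sum of daily totals over any window of consecutive days. A stdlib sketch for one station and one output period; it assumes, hypothetically, that the records cover every day with no gaps, so that adjacent list positions are adjacent days. The input format and function names are illustrative only.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Total precipitation per calendar day, in chronological order."""
    by_day = defaultdict(float)
    for timestamp, amount in records:
        by_day[timestamp.date()] += amount
    return [by_day[day] for day in sorted(by_day)]


def max_n_day_precipitation(records, window):
    """Largest precipitation sum over any window of consecutive daily totals."""
    totals = daily_totals(records)
    if len(totals) < window:
        raise ValueError("fewer days than the window length")
    return max(sum(totals[i:i + window]) for i in range(len(totals) - window + 1))


records = [
    (datetime(2024, 10, 1, 6), 2.0), (datetime(2024, 10, 1, 18), 3.0),  # day total 5
    (datetime(2024, 10, 2, 12), 0.0),  # dry day
    (datetime(2024, 10, 3, 6), 10.0), (datetime(2024, 10, 3, 18), 4.0),  # day total 14
    (datetime(2024, 10, 4, 12), 6.0),
    (datetime(2024, 10, 5, 12), 1.0),
]
rx3day = max_n_day_precipitation(records, window=3)  # 21.0 (days 3 to 5)
```

With window=1 the same function gives the maximum 1-day amount (14.0 here).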

meteora.climate_indices.prcptot(ts_df, *, thresh=None, freq=None, precipitation_col=None, precipitation_unit='mm/day')#

Compute total precipitation with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Precipitation threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • precipitation_col (str, optional) – Column holding precipitation values. If None, the default ECV name settings.ECV_PRECIPITATION is used.

  • precipitation_unit (str, default "mm/day") – Units of the precipitation values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily total precipitation before computing the index.

meteora.climate_indices.sfc_wind_max(ts_df, *, freq=None, wind_speed_col=None, wind_speed_unit='m s-1')#

Compute maximum wind speed with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • wind_speed_col (str, optional) – Column holding wind speed values. If None, the default ECV name settings.ECV_WIND_SPEED is used.

  • wind_speed_unit (str, default "m s-1") – Units of the wind speed values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean wind speed before computing the index.

meteora.climate_indices.sfc_wind_mean(ts_df, *, freq=None, wind_speed_col=None, wind_speed_unit='m s-1')#

Compute mean wind speed with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • wind_speed_col (str, optional) – Column holding wind speed values. If None, the default ECV name settings.ECV_WIND_SPEED is used.

  • wind_speed_unit (str, default "m s-1") – Units of the wind speed values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean wind speed before computing the index.

meteora.climate_indices.sfc_wind_min(ts_df, *, freq=None, wind_speed_col=None, wind_speed_unit='m s-1')#

Compute minimum wind speed with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • wind_speed_col (str, optional) – Column holding wind speed values. If None, the default ECV name settings.ECV_WIND_SPEED is used.

  • wind_speed_unit (str, default "m s-1") – Units of the wind speed values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean wind speed before computing the index.

meteora.climate_indices.tn_days_above(ts_df, *, thresh=None, freq=None, op=None, temperature_col=None, temperature_unit='degC')#

Compute the number of days with a daily minimum temperature above a threshold with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Temperature threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Comparison operator. If None, the xclim default is used.

  • temperature_col (str, optional) – Column holding temperature values. If None, the first column in ts_df is used.

  • temperature_unit (str, default "degC") – Units of the temperature values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily minimum temperature before computing the index.

meteora.climate_indices.wetdays(ts_df, *, thresh=None, freq=None, op=None, precipitation_col=None, precipitation_unit='mm/day')#

Compute wet days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Precipitation threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • op (str, optional) – Comparison operator. If None, the xclim default is used.

  • precipitation_col (str, optional) – Column holding precipitation values. If None, the default ECV name settings.ECV_PRECIPITATION is used.

  • precipitation_unit (str, default "mm/day") – Units of the precipitation values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily total precipitation before computing the index.

meteora.climate_indices.windy_days(ts_df, *, thresh=None, freq=None, wind_speed_col=None, wind_speed_unit='m s-1')#

Compute windy days with xclim.

Parameters:
  • ts_df (pandas.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • thresh (str, optional) – Wind speed threshold. If None, the xclim default is used.

  • freq (str, optional) – Resampling frequency for the index output. If None, the xclim default is used.

  • wind_speed_col (str, optional) – Column holding wind speed values. If None, the default ECV name settings.ECV_WIND_SPEED is used.

  • wind_speed_unit (str, default "m s-1") – Units of the wind speed values. Ignored when ts_df provides unit metadata.

Returns:

Data frame with time as index and stations as columns.

Return type:

pandas.DataFrame

Notes

Uses daily mean wind speed before computing the index.

Utils#

Utils.

class meteora.utils.DummyAttribute#

Dummy attribute.

meteora.utils.abstract_attribute(obj=None)#

Abstract attribute.

meteora.utils.dms_to_decimal(ser)#

Convert a series from degrees, minutes, seconds (DMS) to decimal degrees.

Parameters:

ser (Series)

Return type:

Series
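The conversion follows decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The input encoding accepted by dms_to_decimal is not documented here; the sketch below assumes, hypothetically, compact strings such as "402530N" (degrees, two-digit minutes, two-digit seconds, hemisphere letter) and converts one value at a time.

```python
def dms_str_to_decimal(value):
    """Convert a hypothetical 'DDMMSSH' / 'DDDMMSSH' string to decimal degrees."""
    hemisphere = value[-1].upper()
    digits = value[:-1]
    degrees = int(digits[:-4])  # all leading digits before minutes and seconds
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    return -decimal if hemisphere in ("S", "W") else decimal


lat = dms_str_to_decimal("402530N")  # 40.425
lon = dms_str_to_decimal("0034042W")  # about -3.6783
```

A pandas Series could then be converted element-wise with ser.map(dms_str_to_decimal).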

meteora.utils.get_heatwave_periods(ts_df, *, heatwave_t_threshold=None, heatwave_n_consecutive_days=None, station_agg_func=None, inter_station_agg_func=None)#

Get the heatwave periods from a time series of temperature measurements.

A heatwave is defined as a period of at least heatwave_n_consecutive_days days with a temperature above heatwave_t_threshold.

Parameters:
  • ts_df (pd.DataFrame) – Data frame with a time series of temperature measurements at each station, in long or wide format.

  • heatwave_t_threshold (float, optional) – The temperature threshold for a heatwave, in units of ts_df. If not provided, the value from settings.HEATWAVE_T_THRESHOLD is used.

  • heatwave_n_consecutive_days (int, optional) – The number of consecutive days above the mean temperature threshold for the corresponding period to be considered a heatwave. If not provided, the value from settings.HEATWAVE_N_CONSECUTIVE_DAYS is used.

  • station_agg_func (str or function, optional) – How to aggregate the daily temperature measurements at each station. Must be a string function name or a callable function, which will be passed as the func argument of pandas.core.groupby.DataFrameGroupBy.agg. If not provided, the value from settings.HEATWAVE_STATION_AGG_FUNC is used.

  • inter_station_agg_func (str or function, optional) – How to aggregate the per-station daily temperature values across all stations. Must be a string function name or a callable function, which will be passed as the func argument of pandas.core.groupby.DataFrameGroupBy.agg. If not provided, the value from settings.HEATWAVE_INTER_STATION_AGG_FUNC is used.

Returns:

heatwave_range_df – Data frame with the heatwave start and end dates as columns, indexed by the heatwave event identifier.

Return type:

pd.DataFrame
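The definition above involves two aggregations before the run detection: daily values per station (station_agg_func), then across stations (inter_station_agg_func). A stdlib sketch under these assumptions: records are hypothetical (station_id, timestamp, temperature) tuples, both aggregation functions default to the mean for illustration only (the actual defaults come from settings), and a missing day breaks a run. It returns (start, end) date pairs, the same shape as the heatwave_periods argument of get_heatwave_ts_df, rather than a data frame.

```python
import statistics
from collections import defaultdict
from datetime import date, datetime, timedelta


def heatwave_periods(records, t_threshold, n_consecutive_days,
                     station_agg_func=statistics.mean,
                     inter_station_agg_func=statistics.mean):
    """Return (start, end) dates of runs of at least n_consecutive_days hot days."""
    # 1. Group measurements by day and station.
    grouped = defaultdict(lambda: defaultdict(list))
    for station_id, timestamp, temperature in records:
        grouped[timestamp.date()][station_id].append(temperature)
    # 2. Aggregate per station, then across stations, for each day.
    daily = {
        day: inter_station_agg_func([station_agg_func(v) for v in per_station.values()])
        for day, per_station in grouped.items()
    }
    # 3. Collect runs of consecutive calendar days above the threshold.
    periods, run = [], []
    for day in sorted(d for d, t in daily.items() if t > t_threshold):
        if run and day - run[-1] != timedelta(days=1):
            if len(run) >= n_consecutive_days:
                periods.append((run[0], run[-1]))
            run = []
        run.append(day)
    if len(run) >= n_consecutive_days:
        periods.append((run[0], run[-1]))
    return periods


# Noon temperatures of stations A and B for 1-6 July 2024.
temps = {1: (28, 30), 2: (31, 33), 3: (32, 32), 4: (34, 30), 5: (29, 29), 6: (31, 35)}
records = [
    (station_id, datetime(2024, 7, day, 12), t)
    for day, pair in temps.items()
    for station_id, t in zip(("A", "B"), pair)
]
periods = heatwave_periods(records, t_threshold=30, n_consecutive_days=3)
# periods == [(date(2024, 7, 2), date(2024, 7, 4))]
```

The hot day of 6 July is isolated, so it does not form a heatwave with three required consecutive days.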

meteora.utils.get_heatwave_ts_df(ts_df, *, heatwave_periods=None, heatwave_t_threshold=None, heatwave_n_consecutive_days=None, station_agg_func=None, inter_station_agg_func=None)#

Get a time series data frame for the heatwave periods.

A heatwave is defined as a period of at least heatwave_n_consecutive_days days with a temperature above heatwave_t_threshold.

Parameters:
  • ts_df (pd.DataFrame) – Data frame with a time series of temperature measurements at each station, in long or wide format.

  • heatwave_t_threshold (float, optional) – The temperature threshold for a heatwave, in units of ts_df. If not provided, the value from settings.HEATWAVE_T_THRESHOLD is used.

  • heatwave_n_consecutive_days (int, optional) – The number of consecutive days above the mean temperature threshold for the corresponding period to be considered a heatwave. If not provided, the value from settings.HEATWAVE_N_CONSECUTIVE_DAYS is used.

  • station_agg_func (str or function, optional) – How to aggregate the daily temperature measurements at each station. Must be a string function name or a callable function, which will be passed as the func argument of pandas.core.groupby.DataFrameGroupBy.agg. If not provided, the value from settings.HEATWAVE_STATION_AGG_FUNC is used.

  • inter_station_agg_func (str or function, optional) – How to aggregate the per-station daily temperature values across all stations. Must be a string function name or a callable function, which will be passed as the func argument of pandas.core.groupby.DataFrameGroupBy.agg. If not provided, the value from settings.HEATWAVE_INTER_STATION_AGG_FUNC is used.

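  • heatwave_periods (list[tuple[date, date]], optional) – Heatwave periods as (start, end) date pairs. If None, the periods are determined from ts_df as in get_heatwave_periods, using the remaining heatwave arguments.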

Returns:

heatwave_ts_df – Time series data frame with the measurements that fall within the heatwave periods.

Return type:

pd.DataFrame

meteora.utils.log(message, *, level=None, name=None, filename=None)#

Write a message to the logger.

This logs to file and/or prints to the console (terminal), depending on the current configuration of settings.LOG_FILE and settings.LOG_CONSOLE.

Parameters:
  • message (str) – The message to log.

  • level (int, optional) – One of the level constants of Python’s logging module (e.g., logging.INFO). If None, the value from settings.LOG_LEVEL is used.

  • name (str, optional) – Name of the logger. If None, the value from settings.LOG_NAME is used.

  • filename (str, optional) – Name of the log file, without file extension. If None, the value from settings.LOG_FILENAME is used.

Return type:

None

meteora.utils.long_to_cube(ts_df, stations_gdf)#

Convert a time series data frame and station locations to a vector data cube.

A vector data cube is an n-D array with at least one dimension indexed by vector geometries. In Python, it is represented as an xarray Dataset (or DataArray) with a dimension indexed by vector geometries, set using xvec.

Parameters:
  • ts_df (pd.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • stations_gdf (gpd.GeoDataFrame) – The stations data as a GeoDataFrame.

Returns:

ts_cube – The vector data cube with the time series of measurements for each station. The stations are indexed by their geometry.

Return type:

xr.Dataset

meteora.utils.long_to_stationbench(ts_df, stations_gdf, *, variable_rename=None, dst_time_dim='time', dst_station_dim='station_id')#

Convert a long time series data frame to StationBench’s station data format.

Parameters:
  • ts_df (pd.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • stations_gdf (gpd.GeoDataFrame) – Stations data as a GeoDataFrame, indexed by station ID.

  • variable_rename (Mapping[str, str], optional) – Mapping used to rename data variable names in the output dataset. If None, Meteora ECV names are mapped to StationBench defaults (temperature -> 2m_temperature, wind_speed -> 10m_wind_speed).

  • dst_time_dim (str, default "time") – Name of the time dimension in the output dataset. The default, time, is the name expected by StationBench.

  • dst_station_dim (str, default "station_id") – Name of the station dimension in the output dataset. The default, station_id, is the name expected by StationBench.

Returns:

ts_ds – Dataset with dimensions dst_time_dim and dst_station_dim (time and station_id by default), coordinates latitude and longitude, and one data variable per weather variable.

Return type:

xr.Dataset

meteora.utils.long_to_wide(ts_df, *, variables=None)#

Convert a time series data frame from long (default) to wide format.

Parameters:
  • ts_df (pd.DataFrame) – Long form data frame with a time series of measurements (second-level index) at each station (first-level index) for each variable (column).

  • variables (str, int or list-like of str or int, optional) – Target variables, which must be columns in ts_df.

Returns:

wide_ts_df – Wide form data frame with a time series of measurements (index) for each variable (first-level column index) at each station (second-level column index). If there is only one variable, the column index is a single level featuring the stations.

Return type:

pd.DataFrame
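For a long data frame indexed by (station, time), the reshaping described above corresponds to pandas' DataFrame.unstack on the station level. This is a sketch of the resulting layout with made-up data, not necessarily how long_to_wide is implemented; the index level names station_id and time are assumptions.

```python
import pandas as pd

# Long form: (station, time) MultiIndex rows, one column per variable.
index = pd.MultiIndex.from_product(
    [["st1", "st2"], pd.date_range("2024-07-01", periods=2, freq="D")],
    names=["station_id", "time"],
)
ts_df = pd.DataFrame(
    {"temperature": [20.0, 21.0, 19.0, 22.0], "relative_humidity": [60, 58, 65, 55]},
    index=index,
)

# Wide form: time rows; columns are (variable, station) pairs.
wide_ts_df = ts_df.unstack(level="station_id")

# With a single variable, the columns are just the stations.
wide_temp_df = ts_df["temperature"].unstack(level="station_id")
```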

meteora.utils.ts(*, style='datetime', template=None)#

Get current timestamp as string.

Parameters:
  • style (str {"datetime", "date", "time"}, default "datetime") – Format the timestamp with this built-in template.

  • template (str, optional) – If not None, format the timestamp with this template instead of one of the built-in styles.

Returns:

ts – The string timestamp.

Return type:

str

Base abstract client#

Base abstract classes for meteo station datasets.

class meteora.clients.base.BaseFileClient(*args, **kwargs)#

Base class for clients that operate on file URLs (e.g., CSV).

property pooch_kwargs: dict#

Keyword arguments to pass to pooch retrieval calls.

class meteora.clients.base.BaseJSONClient(*args, **kwargs)#

Base class for clients that return JSON-encoded responses.

class meteora.clients.base.BaseTextClient(*args, **kwargs)#

Base class for clients that return text-encoded (e.g., CSV) responses.

Mixins#

Authentication mixins.

class meteora.clients.mixins.auth.APIKeyHeaderMixin#

API key as request header mixin.

property request_headers: dict#

Request headers.

class meteora.clients.mixins.auth.APIKeyMixin#

API key mixin.

class meteora.clients.mixins.auth.APIKeyParamMixin#

API key as request parameter mixin.

property request_params: dict#

Request parameters.

Stations mixins.

class meteora.clients.mixins.stations.StationsEndpointMixin#

Stations endpoint mixin.

Variables mixins.

class meteora.clients.mixins.variables.VariablesEndpointMixin#

Variables endpoint mixin.

property variables_df: DataFrame#

Variables dataframe.

class meteora.clients.mixins.variables.VariablesHardcodedMixin#

Hardcoded variables mixin.

property variables_df: DataFrame#

Variables dataframe.

class meteora.clients.mixins.variables.VariablesMixin#

Variables mixin.