{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Climate indices with xclim integration\n",
    "\n",
    "In this notebook, we show how Meteora supports the computation of climate indices via the `climate_indices` module, which process the time-series data frames as obtained by Meteora's clients into the inputs of several [of the climate indices implemented in xclim](https://xclim.readthedocs.io/en/stable/indices.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "\n",
    "import contextily as cx\n",
    "import geopandas as gpd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from shapely import geometry\n",
    "\n",
    "from meteora import clients, climate_indices, units"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "We will use MeteoSwiss stations around Zurich and focus on the June to August periods (JJA) of 2024:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "region_bbox = [8.34, 47.28, 8.67, 47.54]\n",
    "region = gpd.GeoSeries(geometry.box(*region_bbox), crs=\"epsg:4326\")\n",
    "# select study period\n",
    "year = 2024\n",
    "# months to consider when querying the data\n",
    "start_month = 6\n",
    "end_month = 8\n",
    "# variables\n",
    "variables = [\"temperature\", \"relative_humidity\", \"precipitation\", \"wind_speed\"]\n",
    "# viz\n",
    "cmap = \"coolwarm\"\n",
    "figwidth = plt.rcParams[\"figure.figsize\"][0]\n",
    "figheight = plt.rcParams[\"figure.figsize\"][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "We will start by plotting the MeteoSwiss stations in the study area:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = clients.MeteoSwissClient(region)\n",
    "colors = sns.color_palette()\n",
    "ax = client.stations_gdf.plot(color=colors[0])\n",
    "client.region.plot(ax=ax, facecolor=(0, 0, 0, 0), edgecolor=colors[1])\n",
    "cx.add_basemap(ax=ax, crs=client.stations_gdf.crs, attribution=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "*(C) OpenStreetMap contributors, Tiles style by Humanitarian OpenStreetMap Team hosted by OpenStreetMap France*\n",
    "\n",
    "We can now get the JJA time series of measurements for the target year:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_df = client.get_ts_df(\n",
    "    variables,\n",
    "    dt.date(year, start_month, 1),\n",
    "    # get latest moment of the latest day of the month\n",
    "    # ACHTUNG: this won't work if `end_month` is 12\n",
    "    dt.datetime.combine(\n",
    "        dt.date(year, end_month + 1, 1) - dt.timedelta(days=1), dt.time.max\n",
    "    ),\n",
    ")\n",
    "ts_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "### Climate indices and input climate variables\n",
    "\n",
    "We can compute several climate indices using temperature data only, which can be as simple as the number of  \"hot days\" with daily maximum temperature exceeds a given threshold:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "hot_days_df = climate_indices.hot_days(\n",
    "    ts_df,\n",
    "    thresh=\"30.0 degC\",\n",
    "    freq=\"YS\",\n",
    ")\n",
    "hot_days_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "Let's visualize the number of hot days per station in the study area:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "hot_days_label = \"N. hot days\"\n",
    "hot_days_gdf = gpd.GeoDataFrame(hot_days_df.T, geometry=client.stations_gdf[\"geometry\"])\n",
    "hot_days_gdf.columns = [hot_days_label, \"geometry\"]\n",
    "ax = hot_days_gdf.plot(\n",
    "    hot_days_label,\n",
    "    cmap=\"coolwarm\",\n",
    "    edgecolor=\"k\",\n",
    "    legend=True,\n",
    "    legend_kwds={\"shrink\": 0.5, \"label\": hot_days_label},\n",
    ")\n",
    "ax.set_axis_off()\n",
    "cx.add_basemap(ax, crs=hot_days_gdf.crs, attribution=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "*(C) OpenStreetMap contributors, Tiles style by Humanitarian OpenStreetMap Team hosted by OpenStreetMap France*\n",
    "\n",
    "The computation of the hot days and other temperature-only indices will work as long as the input data frame contains a temperature column named \"temperature\" according to [the meteora nomenclature](overview.ipynb#selecting-variables). In the case where the temperature column has a different name, we need to provide that information explicitly through the `temperature_col` argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.hot_days(\n",
    "    ts_df.rename(columns={\"temperature\": \"tre200s0\"}), temperature_col=\"tre200s0\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "Other climate indices require multiple climate variables as input. For instance, the heat index {cite}`blazejczyk2012utci` requires temperature and relative humidity to estimate perceived temperature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.heat_index(ts_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "Here, the input data frame must contain both a temperature column named \"temperature\" and another column named \"relative_humidity\". If the columns have different names, we can provide them explicitly through the `temperature_col` and `relative_humidity_col` arguments.\n",
    "\n",
    "### Climate indices and their default arguments\n",
    "\n",
    "Most climate indices have additional arguments that can be used to customize their computation. For instance, note that while the heat index is computed at each time step, the hot days index aggregates the yearly counts. The yearly frequency is extracted from the signature of the underlying xclim method, in this case `xclim.indices.hot_days`, but we can choose another resampling frequency supported by xarray (e.g., [the \"month start\" `MS` frequency](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.hot_days(ts_df, freq=\"MS\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "Similarly, we can provide a different threshold value after which a day is considered \"hot\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.hot_days(\n",
    "    ts_df,\n",
    "    thresh=\"25 degC\",\n",
    "    freq=\"YS\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "### Units and xclim compatibility\n",
    "\n",
    "The xclim methods handle unit conversions when inputs include unit metadata, as it is the case with the time series returned by `get_ts_df` methods in Meteora - so indices work out of the box. For unit-naive data frames (e.g., custom data sources), you can pass explicit `*_unit` arguments, attach units with `meteora.units.attach_units`, or convert to canonical ECV units with `meteora.units.convert_units`.\n",
    "\n",
    "For instance, consider the `METARASOSIEMClient`, which provides temperature in Fahrenheit, precipitation in inches and wind speed in knots:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "metar_client = clients.METARASOSIEMClient(region)\n",
    "metar_client._variable_units_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "We can get the time series for the same period as before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "metar_ts_df = metar_client.get_ts_df(\n",
    "    variables,\n",
    "    dt.date(year, start_month, 1),\n",
    "    # get latest moment of the latest day of the month\n",
    "    # ACHTUNG: this won't work if `end_month` is 12\n",
    "    dt.datetime.combine(\n",
    "        dt.date(year, end_month + 1, 1) - dt.timedelta(days=1), dt.time.max\n",
    "    ),\n",
    ")\n",
    "metar_ts_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "Because `metar_ts_df` includes units metadata, we can compute the hot days index directly and xclim will convert Fahrenheit internally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.hot_days(\n",
    "    metar_ts_df,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "If you drop the units metadata (or start from a unit-naive data frame), you need to provide the correct units explicitly, otherwise it would lead to misleading computed values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "metar_ts_df_naive = metar_ts_df.copy()\n",
    "metar_ts_df_naive.attrs = {}\n",
    "\n",
    "climate_indices.hot_days(\n",
    "    metar_ts_df_naive,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28",
   "metadata": {},
   "source": [
    "For naive data frames, we can specify the units of each variable using the respective `*_unit` argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29",
   "metadata": {},
   "outputs": [],
   "source": [
    "climate_indices.hot_days(metar_ts_df_naive, temperature_unit=\"degF\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30",
   "metadata": {},
   "source": [
    "Alternatively, we can attach units metadata to the unit-naive data frame using `units.attach_units` and compute the hot days index without providing the `temperature_unit` argument, since the data frame already carries the units metadata:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31",
   "metadata": {},
   "outputs": [],
   "source": [
    "metar_ts_df_units = units.attach_units(\n",
    "    metar_ts_df_naive,\n",
    "    metar_client._variable_units_dict,\n",
    ")\n",
    "climate_indices.hot_days(\n",
    "    metar_ts_df_units,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32",
   "metadata": {},
   "source": [
    "## Supported indices (MeteoSwiss example)\n",
    "\n",
    "The climate indices from xclim currently implemented in `meteora.climate_indices` are:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "sorted(climate_indices.__all__)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34",
   "metadata": {},
   "source": [
    "## See also\n",
    "- The [xclim documentation on climate indices](https://xclim.readthedocs.io/en/stable/indices.html) for a detailed overview of ther arguments and default values.\n",
    "- The [heatwave detection notebook](heatwave-detection.ipynb) for heatwave detection and extraction of the underlying time series data of each heatwave.\n",
    "\n",
    "## References\n",
    "\n",
    "```{bibliography}\n",
    ":filter: docname in docnames\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (Pixi)",
   "language": "python",
   "name": "pixi-kernel-python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.11"
  },
  "pixi-kernel": {
   "environment": "doc"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}