Data structures for geospatial time series data#

import contextily as cx
import matplotlib.pyplot as plt
import osmnx as ox
import seaborn as sns
import xarray as xr

from meteora import clients, utils

region = "Switzerland"
start = "2021-08-13"
end = "2021-08-16"
variables = ["temperature", "precipitation", "wind_speed"]

client = clients.METARASOSIEMClient(region)

ts_df = client.get_ts_df(variables, start=start, end=end)
ts_df

		temperature	precipitation	wind_speed
station_id	time
LSGC	2021-08-13 00:20:00+00:00	60.8	0.0	1.0
	2021-08-13 00:50:00+00:00	60.8	0.0	2.0
	2021-08-13 01:20:00+00:00	60.8	0.0	2.0
	2021-08-13 01:50:00+00:00	59.0	0.0	0.0
	2021-08-13 02:20:00+00:00	59.0	0.0	0.0
...	...	...	...	...
LSZS	2021-08-15 21:50:00+00:00	51.8	0.0	4.0
	2021-08-15 22:20:00+00:00	51.8	0.0	2.0
	2021-08-15 22:50:00+00:00	51.8	0.0	4.0
	2021-08-15 23:20:00+00:00	50.0	0.0	2.0
	2021-08-15 23:50:00+00:00	50.0	0.0	1.0

2304 rows × 3 columns

There are different ways in which we can structure time series data frame shown above. The most common representation are the so-called long and wide table formats (see the “Data structures accepted by seaborn” section of the seaborn documentation for an overview).

Long data frames#

The get_ts_df method returns a long data frame where the station and time are used as multi-index levels, so that we can easily select the time series of a specific station:

station_id = "LSGG"
ts_df.loc[station_id]

	temperature	precipitation	wind_speed
time
2021-08-13 00:20:00+00:00	64.4	0.0	4.0
2021-08-13 00:50:00+00:00	62.6	0.0	1.0
2021-08-13 01:20:00+00:00	62.6	0.0	3.0
2021-08-13 01:50:00+00:00	62.6	0.0	2.0
2021-08-13 02:20:00+00:00	62.6	0.0	3.0
...	...	...	...
2021-08-15 21:50:00+00:00	77.0	0.0	2.0
2021-08-15 22:20:00+00:00	77.0	0.0	4.0
2021-08-15 22:50:00+00:00	77.0	0.0	10.0
2021-08-15 23:20:00+00:00	68.0	0.0	7.0
2021-08-15 23:50:00+00:00	68.0	0.0	3.0

144 rows × 3 columns

The key advantage of a long data frame for time series data is that it is very flexible, i.e., we can combine stations with different time ranges and temporal resolutions into a single data frame structure. However, performing simple time series operations requires grouping by stations, which may not be the most convenient and straight-forward.

Wide data frames#

The same data can be represented as a wide data frame, where the index is the time and the variables and stations are the multi-level columns. We can convert the long data frame to a wide data frame using the meteora.utils.long_to_wide function:

# in this case, equivalently:
# wide_ts_df = ts_df.unstack(level="station")
wide_ts_df = utils.long_to_wide(ts_df)
wide_ts_df

variable	temperature										...	wind_speed
station_id	LSGC	LSGG	LSGS	LSMA	LSMD	LSME	LSMM	LSMP	LSZA	LSZB	...	LSMM	LSMP	LSZA	LSZB	LSZC	LSZG	LSZH	LSZL	LSZR	LSZS
time
2021-08-13 00:20:00+00:00	60.8	64.4	64.4	69.8	68.0	69.8	66.2	68.0	71.6	64.4	...	2.0	4.0	8.0	3.0	4.0	2.0	7.0	5.0	6.0	1.0
2021-08-13 00:50:00+00:00	60.8	62.6	64.4	68.0	66.2	69.8	66.2	68.0	71.6	62.6	...	3.0	4.0	4.0	3.0	2.0	2.0	4.0	7.0	3.0	4.0
2021-08-13 01:20:00+00:00	60.8	62.6	64.4	66.2	60.8	68.0	64.4	68.0	71.6	62.6	...	4.0	3.0	2.0	1.0	2.0	2.0	5.0	8.0	6.0	2.0
2021-08-13 01:50:00+00:00	59.0	62.6	62.6	66.2	59.0	66.2	64.4	66.2	69.8	62.6	...	3.0	3.0	3.0	2.0	4.0	2.0	2.0	7.0	5.0	2.0
2021-08-13 02:20:00+00:00	59.0	62.6	62.6	64.4	59.0	64.4	62.6	64.4	69.8	62.6	...	3.0	0.0	4.0	2.0	3.0	2.0	4.0	3.0	4.0	1.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2021-08-15 21:50:00+00:00	62.6	77.0	75.2	73.4	-4.0	75.2	69.8	71.6	75.2	68.0	...	0.0	1.0	1.0	1.0	2.0	1.0	2.0	3.0	2.0	4.0
2021-08-15 22:20:00+00:00	64.4	77.0	75.2	71.6	12.2	73.4	68.0	71.6	75.2	66.2	...	3.0	0.0	2.0	2.0	9.0	4.0	2.0	2.0	4.0	2.0
2021-08-15 22:50:00+00:00	66.2	77.0	75.2	69.8	-13.0	71.6	68.0	71.6	73.4	66.2	...	6.0	2.0	2.0	2.0	15.0	3.0	2.0	2.0	5.0	4.0
2021-08-15 23:20:00+00:00	64.4	68.0	75.2	71.6	26.6	71.6	68.0	75.2	75.2	66.2	...	3.0	3.0	3.0	4.0	9.0	3.0	2.0	2.0	5.0	2.0
2021-08-15 23:50:00+00:00	66.2	68.0	71.6	73.4	23.0	71.6	68.0	69.8	73.4	68.0	...	1.0	2.0	1.0	2.0	8.0	5.0	2.0	3.0	3.0	1.0

144 rows × 48 columns

Accordingly, we can easily select the time series of a specific variable:

wide_ts_df["temperature"]

station_id	LSGC	LSGG	LSGS	LSMA	LSMD	LSME	LSMM	LSMP	LSZA	LSZB	LSZC	LSZG	LSZH	LSZL	LSZR	LSZS
time
2021-08-13 00:20:00+00:00	60.8	64.4	64.4	69.8	68.0	69.8	66.2	68.0	71.6	64.4	68.0	68.0	71.6	71.6	71.6	50.0
2021-08-13 00:50:00+00:00	60.8	62.6	64.4	68.0	66.2	69.8	66.2	68.0	71.6	62.6	69.8	66.2	71.6	73.4	71.6	50.0
2021-08-13 01:20:00+00:00	60.8	62.6	64.4	66.2	60.8	68.0	64.4	68.0	71.6	62.6	68.0	66.2	69.8	73.4	73.4	50.0
2021-08-13 01:50:00+00:00	59.0	62.6	62.6	66.2	59.0	66.2	64.4	66.2	69.8	62.6	66.2	64.4	69.8	73.4	69.8	50.0
2021-08-13 02:20:00+00:00	59.0	62.6	62.6	64.4	59.0	64.4	62.6	64.4	69.8	62.6	68.0	62.6	69.8	73.4	71.6	48.2
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2021-08-15 21:50:00+00:00	62.6	77.0	75.2	73.4	-4.0	75.2	69.8	71.6	75.2	68.0	73.4	71.6	71.6	73.4	73.4	51.8
2021-08-15 22:20:00+00:00	64.4	77.0	75.2	71.6	12.2	73.4	68.0	71.6	75.2	66.2	73.4	71.6	69.8	73.4	77.0	51.8
2021-08-15 22:50:00+00:00	66.2	77.0	75.2	69.8	-13.0	71.6	68.0	71.6	73.4	66.2	71.6	69.8	69.8	73.4	78.8	51.8
2021-08-15 23:20:00+00:00	64.4	68.0	75.2	71.6	26.6	71.6	68.0	75.2	75.2	66.2	73.4	71.6	69.8	73.4	78.8	50.0
2021-08-15 23:50:00+00:00	66.2	68.0	71.6	73.4	23.0	71.6	68.0	69.8	73.4	68.0	69.8	75.2	69.8	73.4	77.0	50.0

144 rows × 16 columns

Note that unlike in the long data frame, the records of all the stations and variables are indexed by a common and unique timestamp. Accordingly, a key advantage of the wide data frame is that time series operations are supported by pandas out-of-the-box. For example, we can easily select a time range and plot the temperature time series of all stations:

wide_ts_df.loc["2021-08-14 00:00:00":"2021-08-15 00:00:00"]["temperature"].plot(
    legend=False
)

<Axes: xlabel='time'>

../_images/3318bb1c7715f068a107f27a834c9417cea7d45efca24238dd24692b1ea08ae8.png

We can also resample the time series to a different frequency, e.g., we can compute the hourly mean temperature time series of all stations:

wide_ts_df.resample("3h")["temperature"].mean().plot(legend=False)

<Axes: xlabel='time'>

../_images/dd6ad213246b8dd7871149aacf390f4f0a110c5f9f2e5fd3b9959d6ce4b192ad.png

The main disadvantage is the wide data form may not be appropriate to store raw observations from different sources at different time resolutions, since it can result in many nan values.

Example geospatial operations with long and wide data frames#

We have so far reviewed the advantages and disatvantages that the long and wide data frame forms feature when dealing with time series data. However, meteorological observations also have an important geospatial component which we have not addressed yet. While it is possible to add a “geometry” column to the long data frame, it would result in many repeated values (since we likely have many observations for each station). On the other hand, stations are featured as rows in the wide data frame, which makes adding their geolocation very inconvenient.

Therefore, in most cases the best way to peform spatial operations on meteorological data is to use two separate data structures, i.e., one for the time series of measurements and another with the station locations (and potentially other attributes). As shown above, the station locations can be accessed in the stations_gdf property, which we can use to perform geospatial operations. For example, we can select stations within a buffer around a specific location:

query = "Canton de Vaud"
buffer_dist = 20e3  # in meters

extent_geom = (
    ox.projection.project_gdf(ox.geocode_to_gdf(query))
    .buffer(buffer_dist)
    .to_crs(client.stations_gdf.crs)
    .iloc[0]
)
station_ids = client.stations_gdf[client.stations_gdf.within(extent_geom)].index
station_ids

Index(['LSGC', 'LSGG', 'LSGS', 'LSMP'], dtype='str', name='station_id')

We can use this information to select the time series of the stations within the buffer. For the long data frame (the default in Meteora), this is straightforward since the stations are the first-level index:

ts_df.loc[station_ids]

		temperature	precipitation	wind_speed
station_id	time
LSGC	2021-08-13 00:20:00+00:00	60.8	0.0	1.0
	2021-08-13 00:50:00+00:00	60.8	0.0	2.0
	2021-08-13 01:20:00+00:00	60.8	0.0	2.0
	2021-08-13 01:50:00+00:00	59.0	0.0	0.0
	2021-08-13 02:20:00+00:00	59.0	0.0	0.0
...	...	...	...	...
LSMP	2021-08-15 21:50:00+00:00	71.6	0.0	1.0
	2021-08-15 22:20:00+00:00	71.6	0.0	0.0
	2021-08-15 22:50:00+00:00	71.6	0.0	2.0
	2021-08-15 23:20:00+00:00	75.2	0.0	3.0
	2021-08-15 23:50:00+00:00	69.8	0.0	2.0

576 rows × 3 columns

For the wide data frame, stations are in the second level of the columns, so we can either (i) swap the column levels and then simply pass station_ids to select the desired stations or (ii) use .loc at the second level of the columns:

# (i)
# wide_ts_df.swaplevel("variable", "station", axis="columns")[station_ids]
# (ii)
wide_ts_df.loc[:, (slice(None), station_ids)]

variable	temperature				precipitation				wind_speed
station_id	LSGC	LSGG	LSGS	LSMP	LSGC	LSGG	LSGS	LSMP	LSGC	LSGG	LSGS	LSMP
time
2021-08-13 00:20:00+00:00	60.8	64.4	64.4	68.0	0.0	0.0	0.0	0.0	1.0	4.0	4.0	4.0
2021-08-13 00:50:00+00:00	60.8	62.6	64.4	68.0	0.0	0.0	0.0	0.0	2.0	1.0	5.0	4.0
2021-08-13 01:20:00+00:00	60.8	62.6	64.4	68.0	0.0	0.0	0.0	0.0	2.0	3.0	0.0	3.0
2021-08-13 01:50:00+00:00	59.0	62.6	62.6	66.2	0.0	0.0	0.0	0.0	0.0	2.0	0.0	3.0
2021-08-13 02:20:00+00:00	59.0	62.6	62.6	64.4	0.0	0.0	0.0	0.0	0.0	3.0	2.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...
2021-08-15 21:50:00+00:00	62.6	77.0	75.2	71.6	0.0	0.0	0.0	0.0	1.0	2.0	3.0	1.0
2021-08-15 22:20:00+00:00	64.4	77.0	75.2	71.6	0.0	0.0	0.0	0.0	2.0	4.0	3.0	0.0
2021-08-15 22:50:00+00:00	66.2	77.0	75.2	71.6	0.0	0.0	0.0	0.0	5.0	10.0	3.0	2.0
2021-08-15 23:20:00+00:00	64.4	68.0	75.2	75.2	0.0	0.0	0.0	0.0	6.0	7.0	5.0	3.0
2021-08-15 23:50:00+00:00	66.2	68.0	71.6	69.8	0.0	0.0	0.0	0.0	6.0	3.0	1.0	2.0

144 rows × 12 columns

Note that if we are interested in a single variable, filtering by stations becomes much simpler:

wide_ts_df["temperature"][station_ids]

station_id	LSGC	LSGG	LSGS	LSMP
time
2021-08-13 00:20:00+00:00	60.8	64.4	64.4	68.0
2021-08-13 00:50:00+00:00	60.8	62.6	64.4	68.0
2021-08-13 01:20:00+00:00	60.8	62.6	64.4	68.0
2021-08-13 01:50:00+00:00	59.0	62.6	62.6	66.2
2021-08-13 02:20:00+00:00	59.0	62.6	62.6	64.4
...	...	...	...	...
2021-08-15 21:50:00+00:00	62.6	77.0	75.2	71.6
2021-08-15 22:20:00+00:00	64.4	77.0	75.2	71.6
2021-08-15 22:50:00+00:00	66.2	77.0	75.2	71.6
2021-08-15 23:20:00+00:00	64.4	68.0	75.2	75.2
2021-08-15 23:50:00+00:00	66.2	68.0	71.6	69.8

144 rows × 4 columns

Saving to disk#

There are several ways in which we can store the geospatial time series data that we deal with when using Meteora.

Using long or wide data frames#

When using long and wide data frames, the most straight-forward solution is to separately save the stations geo-data frame and time series data frame to disk, e.g., using a geospatial file format for the former and a pandas data frame compatible format (CSV, JSON…) for the latter:

ts_df.to_csv("ts-df.csv")
client.stations_gdf.to_file("stations.gpkg")

Note that for long data frames, it is possible to save a single object by adding the station attributes as columns, however note that in most cases this will be highly inefficient and result in many repeated values.

For large amounts of data, you may consider tstore, which is an interoperable specification based on Apache Parquet and GeoParquet. (note that the API is still experimental). The first tutorial illustrates different ways in which long data frames from Meteora can be efficiently$^1$ stored while keeping the geospatial information.

Using vector data cubes#

There exist several ways of saving vector data cubes to disk, which include (see the related section of the xvec documentation for more details):

geospatial file formats, which is essentially equivalent to transforming the vector data cube to a long data frame with the station attributes (including location), and thus has the exact same shortcomings in terms of the high degree of repetition.
binary serialization formats such as pickle and joblib, which have the advantage of allowing compression but may only work when read in a similar environment with similar package versions (therefore with restricted interoperability).
CF conventions with netCDF or zarr, which encode the geospatial information so that it is compatible with the xarray’s formats such as netCDF and zarr.

Overall, the latter option is likely the most convenient. Let us try it (note that in order to run the cells below, we need to install zarr which is not a Meteora - nor xvec - dependency):

encoded = ts_cube.xvec.encode_cf()
# encoded.to_netcdf("geo-encoded.nc", mode="w")
encoded.to_zarr("geo-encoded.zarr", mode="w")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 3
      1 encoded = ts_cube.xvec.encode_cf()
      2 # encoded.to_netcdf("geo-encoded.nc", mode="w")
----> 3 encoded.to_zarr("geo-encoded.zarr", mode="w")

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/core/dataset.py:2390, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, align_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
   2207 """Write dataset contents to a zarr group.
   2208 
   2209 Zarr chunks are determined in the following way:
   (...)   2386     The I/O user guide, with more details and examples.
   2387 """
   2388 from xarray.backends.writers import to_zarr
-> 2390 return to_zarr(  # type: ignore[call-overload,misc]
   2391     self,
   2392     store=store,
   2393     chunk_store=chunk_store,
   2394     storage_options=storage_options,
   2395     mode=mode,
   2396     synchronizer=synchronizer,
   2397     group=group,
   2398     encoding=encoding,
   2399     compute=compute,
   2400     consolidated=consolidated,
   2401     append_dim=append_dim,
   2402     region=region,
   2403     safe_chunks=safe_chunks,
   2404     align_chunks=align_chunks,
   2405     zarr_version=zarr_version,
   2406     zarr_format=zarr_format,
   2407     write_empty_chunks=write_empty_chunks,
   2408     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2409 )

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/writers.py:797, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, align_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
    795 # TODO: figure out how to properly handle unlimited_dims
    796 try:
--> 797     dump_to_store(dataset, zstore, writer, encoding=encoding)
    798     writes = writer.sync(
    799         compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
    800     )
    801 finally:

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/writers.py:491, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
    488 if encoder:
    489     variables, attrs = encoder(variables, attrs)
--> 491 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/zarr.py:1009, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
   1003     if self._append_dim not in existing_dims:
   1004         raise ValueError(
   1005             f"append_dim={self._append_dim!r} does not match any existing "
   1006             f"dataset dimensions {existing_dims}"
   1007         )
-> 1009 variables_encoded, attributes = self.encode(
   1010     {vn: variables[vn] for vn in new_variable_names}, attributes
   1011 )
   1013 if existing_variable_names:
   1014     # We make sure that values to be appended are encoded *exactly*
   1015     # as the current values in the store.
   1016     # To do so, we decode variables directly to access the proper encoding,
   1017     # without going via xarray.Dataset to avoid needing to load
   1018     # index variables into memory.
   1019     existing_vars, _, _ = conventions.decode_cf_variables(
   1020         variables={
   1021             k: self.open_store_variable(name=k) for k in existing_variable_names
   (...)   1025         attributes={},
   1026     )

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/common.py:454, in AbstractWritableDataStore.encode(self, variables, attributes)
    452 for k, v in variables.items():
    453     try:
--> 454         encoded_variables[k] = self.encode_variable(v)
    455     except Exception as e:
    456         e.add_note(f"Raised while encoding variable {k!r} with value {v!r}")

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/zarr.py:933, in ZarrStore.encode_variable(self, variable, name)
    932 def encode_variable(self, variable, name=None):
--> 933     variable = encode_zarr_variable(variable, name=name)
    934     return variable

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/backends/zarr.py:516, in encode_zarr_variable(var, needs_copy, name)
    495 def encode_zarr_variable(var, needs_copy=True, name=None):
    496     """
    497     Converts a Variable into another Variable which follows some
    498     of the CF conventions:
   (...)    513         A variable which has been encoded as described above.
    514     """
--> 516     var = conventions.encode_cf_variable(var, name=name)
    517     var = ensure_dtype_not_object(var, name=name)
    519     # zarr allows unicode, but not variable-length strings, so it's both
    520     # simpler and more compact to always encode as UTF-8 explicitly.
    521     # TODO: allow toggling this explicitly via dtype in encoding.
    522     # TODO: revisit this now that Zarr _does_ allow variable-length strings

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/conventions.py:102, in encode_cf_variable(var, needs_copy, name)
     90 ensure_not_multiindex(var, name=name)
     92 for coder in [
     93     CFDatetimeCoder(),
     94     CFTimedeltaCoder(),
   (...)    100     variables.BooleanCoder(),
    101 ]:
--> 102     var = coder.encode(var, name=name)
    104 for attr_name in CF_RELATED_DATA:
    105     pop_to(var.encoding, var.attrs, attr_name)

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/xarray/coding/times.py:1384, in CFDatetimeCoder.encode(self, variable, name)
   1383 def encode(self, variable: Variable, name: T_Name = None) -> Variable:
-> 1384     if np.issubdtype(variable.dtype, np.datetime64) or contains_cftime_datetimes(
   1385         variable
   1386     ):
   1387         dims, data, attrs, encoding = unpack_for_encoding(variable)
   1389         units = encoding.pop("units", None)

File ~/checkouts/readthedocs.org/user_builds/meteora/checkouts/stable/.pixi/envs/doc/lib/python3.13/site-packages/numpy/_core/numerictypes.py:534, in issubdtype(arg1, arg2)
    477 r"""
    478 Returns True if first argument is a typecode lower/equal in type hierarchy.
    479 
   (...)    531 
    532 """
    533 if not issubclass_(arg1, generic):
--> 534     arg1 = dtype(arg1).type
    535 if not issubclass_(arg2, generic):
    536     arg2 = dtype(arg2).type

TypeError: Cannot interpret 'datetime64[us, UTC]' as a data type
Raised while encoding variable 'time' with value <xarray.IndexVariable 'time' (time: 144)> Size: 1kB
<DatetimeArray>
['2021-08-13 00:20:00+00:00', '2021-08-13 00:50:00+00:00',
 '2021-08-13 01:20:00+00:00', '2021-08-13 01:50:00+00:00',
 '2021-08-13 02:20:00+00:00', '2021-08-13 02:50:00+00:00',
 '2021-08-13 03:20:00+00:00', '2021-08-13 03:50:00+00:00',
 '2021-08-13 04:20:00+00:00', '2021-08-13 04:50:00+00:00',
 ...
 '2021-08-15 19:20:00+00:00', '2021-08-15 19:50:00+00:00',
 '2021-08-15 20:20:00+00:00', '2021-08-15 20:50:00+00:00',
 '2021-08-15 21:20:00+00:00', '2021-08-15 21:50:00+00:00',
 '2021-08-15 22:20:00+00:00', '2021-08-15 22:50:00+00:00',
 '2021-08-15 23:20:00+00:00', '2021-08-15 23:50:00+00:00']
Length: 144, dtype: datetime64[us, UTC]

We can now read back the saved zarr store from disk into a vector data cube:

roundtripped = xr.open_zarr("geo-encoded.zarr").xvec.decode_cf()
roundtripped

which we can test to be identical to the original ts_cube (this may currently fail due to xvec/issues/141):

roundtripped.identical(ts_cube)

Footnotes#

See a benchmark comparing tstore, netCDF and zarr in terms of resulting file sizes as well as reading and writing times.

References#

[1]

Edzer Pebesma. Vector data cubes. r-spatial.org blog, 2022. URL: https://r-spatial.org/r/2022/09/12/vdc.html.