diff --git a/api-reference/tilebox.datasets/Collection.delete.mdx b/api-reference/tilebox.datasets/Collection.delete.mdx index 43ae441..f70615b 100644 --- a/api-reference/tilebox.datasets/Collection.delete.mdx +++ b/api-reference/tilebox.datasets/Collection.delete.mdx @@ -1,10 +1,10 @@ --- title: Collection.delete -icon: download +icon: layer-group --- ```python -def Collection.delete(datapoints: xr.Dataset) -> int +def Collection.delete(datapoints: DatapointIDs) -> int ``` Delete datapoints from the collection. @@ -17,8 +17,17 @@ Datapoints are identified and deleted by their ids. ## Parameters - - An [`xarray.Dataset`](/sdks/python/xarray) containing an "id" variable consisting of datapoint IDs to delete. + + Datapoint IDs to delete from the collection. + + Supported `DatapointIDs` types are: + - A `pandas.DataFrame` containing an `id` column. + - A `pandas.Series` containing datapoint IDs. + - An `xarray.Dataset` containing an "id" variable. + - An `xarray.DataArray` containing datapoint IDs. + - A `numpy.ndarray` containing datapoint IDs. + - A `Collection[UUID]` containing datapoint IDs as Python built-in `UUID` objects, e.g. `list[UUID]`. + - A `Collection[str]` containing datapoint IDs as strings, e.g. `list[str]`. ## Returns @@ -27,8 +36,19 @@ The number of datapoints that were deleted. ```python Python -datapoints = collection.load("2023-05-01 12:45:33.423") - -n_deleted = collection.delete(datapoints) +collection.delete([ + "0195c87a-49f6-5ffa-e3cb-92215d057ea6", + "0195c87b-bd0e-3998-05cf-af6538f34957", +]) ``` + +## Errors + + + One of the data points is not found in the collection. If any of the data points are not found, + nothing will be deleted. 
+ + + One of the specified ids is not a valid UUID + diff --git a/api-reference/tilebox.datasets/Collection.delete_ids.mdx b/api-reference/tilebox.datasets/Collection.delete_ids.mdx deleted file mode 100644 index efdfc14..0000000 --- a/api-reference/tilebox.datasets/Collection.delete_ids.mdx +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: Collection.delete_ids -icon: download ---- - -```python -def Collection.delete_ids(datapoints_ids: list[UUID]) -> int -``` - -Delete datapoint from the collection by their ids. - - - You need to have write permission on the collection to be able to delete datapoints. - - -## Parameters - - - The ids of the datapoints to delete. - - -## Returns - -The number of datapoints that were deleted. - - -```python Python -from uuid import UUID - -datapoints_ids=[ - UUID("29b29ade-db02-427a-be9c-a8ef8184f544"), - UUID("fa4a8e4e-6afe-41a3-b228-b867330669bd"), -] -n_deleted = collection.delete_ids(datapoints_ids) -``` - diff --git a/api-reference/tilebox.datasets/Collection.find.mdx b/api-reference/tilebox.datasets/Collection.find.mdx index 74b7235..502df97 100644 --- a/api-reference/tilebox.datasets/Collection.find.mdx +++ b/api-reference/tilebox.datasets/Collection.find.mdx @@ -1,6 +1,6 @@ --- title: Collection.find -icon: download +icon: layer-group --- ```python diff --git a/api-reference/tilebox.datasets/Collection.info.mdx b/api-reference/tilebox.datasets/Collection.info.mdx index dcc1841..3028d52 100644 --- a/api-reference/tilebox.datasets/Collection.info.mdx +++ b/api-reference/tilebox.datasets/Collection.info.mdx @@ -1,6 +1,6 @@ --- title: Collection.info -icon: download +icon: layer-group --- ```python diff --git a/api-reference/tilebox.datasets/Collection.ingest.mdx b/api-reference/tilebox.datasets/Collection.ingest.mdx new file mode 100644 index 0000000..4b769ba --- /dev/null +++ b/api-reference/tilebox.datasets/Collection.ingest.mdx @@ -0,0 +1,62 @@ +--- +title: Collection.ingest +icon: layer-group +--- + +```python +def 
Collection.ingest( + data: IngestionData, + allow_existing: bool = True +) -> list[UUID] +``` + +Ingest data into a collection. + + + You need to have write permission on the collection to be able to ingest datapoints. + + +## Parameters + + + The data to ingest. + + Supported `IngestionData` data types are: + - A `pandas.DataFrame`, mapping the column names to dataset fields. + - An `xarray.Dataset`, mapping variables and coordinates to dataset fields. + - `Iterable`, `dict` or `nd-array`: Ingest any object that can be converted to a `pandas.DataFrame` using + its constructor, equivalent to `ingest(pd.DataFrame(data))`. + + + Datapoint fields are used to generate a deterministic unique `UUID` for each + datapoint in a collection. Duplicate datapoints result in the same ID being generated. + If `allow_existing` is `True`, `ingest` will skip those datapoints, since they already exist. + If `allow_existing` is `False`, `ingest` will raise an error if any of the generated datapoint IDs already exist. + Defaults to `True`. + + +## Returns + +List of datapoint ids that were ingested, including the IDs of already existing datapoints in case of duplicates and +`allow_existing=True`. + + +```python Python +import pandas as pd + +collection.ingest(pd.DataFrame({ + "time": [ + "2023-05-01T12:00:00Z", + "2023-05-02T12:00:00Z", + ], + "value": [1, 2], + "sensor": ["A", "B"], +})) +``` + + +## Errors + + + If `allow_existing` is `False` and any of the datapoints being ingested already exist. 
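The deterministic datapoint IDs described above are what make `ingest` idempotent. As a rough local illustration only (this is not Tilebox's actual ID generation scheme), content-derived IDs can be sketched with Python's `uuid.uuid5`: identical field values always map to the same UUID, which is why re-ingesting a duplicate with `allow_existing=True` is a no-op.

```python
import uuid

def content_id(fields: dict) -> uuid.UUID:
    # Derive a deterministic UUID from a datapoint's field values.
    # Illustration only -- not the actual Tilebox ID scheme.
    canonical = repr(sorted(fields.items()))
    return uuid.uuid5(uuid.NAMESPACE_OID, canonical)

a = content_id({"time": "2023-05-01T12:00:00Z", "value": 1, "sensor": "A"})
b = content_id({"time": "2023-05-01T12:00:00Z", "value": 1, "sensor": "A"})
c = content_id({"time": "2023-05-02T12:00:00Z", "value": 2, "sensor": "B"})

assert a == b  # duplicate datapoint -> same ID, skipped when allow_existing=True
assert a != c  # different field values -> different ID
```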
+ diff --git a/api-reference/tilebox.datasets/Collection.load.mdx b/api-reference/tilebox.datasets/Collection.load.mdx index b24a516..d44fc66 100644 --- a/api-reference/tilebox.datasets/Collection.load.mdx +++ b/api-reference/tilebox.datasets/Collection.load.mdx @@ -1,6 +1,6 @@ --- title: Collection.load -icon: download +icon: layer-group --- ```python @@ -32,7 +32,7 @@ If no data exists for the requested time or interval, an empty `xarray.Dataset` - If `True`, the response contains only the [datapoint metadata](/datasets/timeseries) without the actual dataset-specific fields. Defaults to `False`. + If `True`, the response contains only the [required fields for the dataset type](/datasets/types/timeseries) without the actual dataset-specific fields. Defaults to `False`. diff --git a/api-reference/tilebox.datasets/Dataset.collection.mdx b/api-reference/tilebox.datasets/Dataset.collection.mdx index 519ce19..e7a0399 100644 --- a/api-reference/tilebox.datasets/Dataset.collection.mdx +++ b/api-reference/tilebox.datasets/Dataset.collection.mdx @@ -1,6 +1,6 @@ --- title: Dataset.collection -icon: layer-group +icon: database --- ```python diff --git a/api-reference/tilebox.datasets/Dataset.collections.mdx b/api-reference/tilebox.datasets/Dataset.collections.mdx index aa0347c..e18e2a1 100644 --- a/api-reference/tilebox.datasets/Dataset.collections.mdx +++ b/api-reference/tilebox.datasets/Dataset.collections.mdx @@ -1,6 +1,6 @@ --- title: Dataset.collections -icon: layer-group +icon: database --- ```python diff --git a/api-reference/tilebox.datasets/Dataset.create_collection.mdx b/api-reference/tilebox.datasets/Dataset.create_collection.mdx index f1b463e..98d15db 100644 --- a/api-reference/tilebox.datasets/Dataset.create_collection.mdx +++ b/api-reference/tilebox.datasets/Dataset.create_collection.mdx @@ -1,6 +1,6 @@ --- title: Dataset.create_collection -icon: layer-group +icon: database --- ```python diff --git 
a/api-reference/tilebox.datasets/Dataset.get_or_create_collection.mdx b/api-reference/tilebox.datasets/Dataset.get_or_create_collection.mdx index e1dac81..483aa17 100644 --- a/api-reference/tilebox.datasets/Dataset.get_or_create_collection.mdx +++ b/api-reference/tilebox.datasets/Dataset.get_or_create_collection.mdx @@ -1,6 +1,7 @@ --- title: Dataset.get_or_create_collection -icon: layer-group +sidebarTitle: Dataset.get_or_create_c... +icon: database --- ```python diff --git a/assets/guides/ingest/dataset-schema-dark.png b/assets/guides/ingest/dataset-schema-dark.png new file mode 100644 index 0000000..e3a9106 Binary files /dev/null and b/assets/guides/ingest/dataset-schema-dark.png differ diff --git a/assets/guides/ingest/dataset-schema-light.png b/assets/guides/ingest/dataset-schema-light.png new file mode 100644 index 0000000..c871b55 Binary files /dev/null and b/assets/guides/ingest/dataset-schema-light.png differ diff --git a/assets/guides/ingest/dataset-slug-dark.png b/assets/guides/ingest/dataset-slug-dark.png new file mode 100644 index 0000000..b713fad Binary files /dev/null and b/assets/guides/ingest/dataset-slug-dark.png differ diff --git a/assets/guides/ingest/dataset-slug-light.png b/assets/guides/ingest/dataset-slug-light.png new file mode 100644 index 0000000..4424f4d Binary files /dev/null and b/assets/guides/ingest/dataset-slug-light.png differ diff --git a/assets/guides/ingest/explorer-dark.png b/assets/guides/ingest/explorer-dark.png new file mode 100644 index 0000000..eda436a Binary files /dev/null and b/assets/guides/ingest/explorer-dark.png differ diff --git a/assets/guides/ingest/explorer-light.png b/assets/guides/ingest/explorer-light.png new file mode 100644 index 0000000..f701ade Binary files /dev/null and b/assets/guides/ingest/explorer-light.png differ diff --git a/assets/guides/ingest/modis-explore-dark.png b/assets/guides/ingest/modis-explore-dark.png new file mode 100644 index 0000000..6e8e587 Binary files /dev/null and 
b/assets/guides/ingest/modis-explore-dark.png differ diff --git a/assets/guides/ingest/modis-explore-light.png b/assets/guides/ingest/modis-explore-light.png new file mode 100644 index 0000000..3e6e0c9 Binary files /dev/null and b/assets/guides/ingest/modis-explore-light.png differ diff --git a/assets/jupyter/tilebox-banner-python.svg b/assets/jupyter/tilebox-banner-python.svg new file mode 100644 index 0000000..4d2935e --- /dev/null +++ b/assets/jupyter/tilebox-banner-python.svg @@ -0,0 +1,21 @@ + + + Layer 1 + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/assets/tilebox-banner-go.svg b/assets/tilebox-banner-go.svg new file mode 100644 index 0000000..7e2daa3 --- /dev/null +++ b/assets/tilebox-banner-go.svg @@ -0,0 +1,21 @@ + + + Layer 1 + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/assets/tilebox-banner-python.svg b/assets/tilebox-banner-python.svg new file mode 100644 index 0000000..ad2d949 --- /dev/null +++ b/assets/tilebox-banner-python.svg @@ -0,0 +1,21 @@ + + + Layer 1 + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/changelog.mdx b/changelog.mdx new file mode 100644 index 0000000..a64508d --- /dev/null +++ b/changelog.mdx @@ -0,0 +1,12 @@ +--- +title: Product Updates +description: New updates and improvements +icon: rss +--- + + + + ## Custom Datasets + + Coming soon! + diff --git a/datasets/collections.mdx b/datasets/concepts/collections.mdx similarity index 72% rename from datasets/collections.mdx rename to datasets/concepts/collections.mdx index bbe3438..87201be 100644 --- a/datasets/collections.mdx +++ b/datasets/concepts/collections.mdx @@ -1,6 +1,6 @@ --- title: Collections -description: Learn about time series dataset collections +description: Learn about dataset collections icon: layer-group --- @@ -10,13 +10,13 @@ Collections group data points within a dataset. 
They help represent logical groupings of data points. This section provides a quick overview of the API for listing and accessing collections. Below are some usage examples for different scenarios. -| Method | API Reference | Description | -| ---------------------------------- | ------------------------------------------------------------------------------------------------------ | ------------------------------------------------ | -| `dataset.collections` | [Listing collections](/api-reference/tilebox.datasets/Dataset.collections) | List all available collections for a dataset. | -| `dataset.create_collection` | [Creating a collection](/api-reference/tilebox.datasets/Dataset.create_collection) | Create a collection in a dataset. | -| `dataset.get_or_create_collection` | [Accessing or creating a collection](/api-reference/tilebox.datasets/Dataset.get_or_create_collection) | Get a collection, create it if it doesn't exist. | -| `dataset.collection` | [Accessing a collection](/api-reference/tilebox.datasets/Dataset.collection) | Access an individual collection by its name. | -| `collection.info` | [Collection information](/api-reference/tilebox.datasets/Collection.info) | Request information about a collection. | +| Method | Description | +| -------------------------------------------------------------------- | ---------------------------------------------------- | +| [`dataset.collections`](/api-reference/tilebox.datasets/Dataset.collections) | List all available collections for a dataset. | +| [`dataset.create_collection`](/api-reference/tilebox.datasets/Dataset.create_collection) | Create a collection in a dataset. | +| [`dataset.get_or_create_collection`](/api-reference/tilebox.datasets/Dataset.get_or_create_collection) | Get a collection, create it if it doesn't exist. | +| [`dataset.collection`](/api-reference/tilebox.datasets/Dataset.collection) | Access an individual collection by its name. 
| +| [`collection.info`](/api-reference/tilebox.datasets/Collection.info) | Request information about a collection. | Refer to the examples below for common use cases when working with collections. These examples assume that you have already [created a client](/datasets/introduction#creating-a-datasets-client) and [listed the available datasets](/api-reference/tilebox.datasets/Client.datasets). @@ -114,7 +114,7 @@ dataset.collection("Sat-X").info() # raises NotFoundError: 'No such collection S ## Next steps - - Learn how to load data points from a collection. + + Learn how to query data from a collection. diff --git a/datasets/concepts/datasets.mdx b/datasets/concepts/datasets.mdx new file mode 100644 index 0000000..3dfad9b --- /dev/null +++ b/datasets/concepts/datasets.mdx @@ -0,0 +1,158 @@ +--- +title: Datasets +description: Tilebox Datasets act as containers for data points. All data points in a dataset share the same type and fields. +icon: database +--- + +## Overview + +This section provides a quick overview of the API for listing and accessing datasets. + +| Method | Description | +| -------------------------------------------------------------------- | ---------------------------------------------------- | +| [`client.datasets`](/api-reference/tilebox.datasets/Client.datasets) | List all available datasets. | +| [`client.dataset`](/api-reference/tilebox.datasets/Client.dataset) | Access an individual dataset by its name. | + + + You can create your own custom datasets via the [Tilebox Console](/console). + + +## Related Guides + + + + Learn how to create a Timeseries dataset using the Tilebox Console. + + + Learn how to ingest an existing CSV dataset into a Timeseries dataset collection. + + + +## Dataset types + +Each dataset is of a specific type. Each dataset type comes with a set of required fields for each data point. +The dataset type also determines the query capabilities for a dataset, e.g. 
whether a dataset supports only time-based queries +or also spatially filtered queries. + +To find out which fields are required for each dataset type, check out the documentation for the available dataset types +below. + + + + Each data point is linked to a specific point in time. Common for satellite telemetry, or other time-based data. + Supports efficient time-based queries. + + + Each data point is linked to a specific point in time and a location on the Earth's surface. Common for satellite + imagery. Supports efficient time-based and spatially filtered queries. + + + +## Dataset specific fields + +Additionally, each dataset has a set of fields that are specific to that dataset. Fields are defined during dataset +creation. That way, all data points in a dataset are strongly typed and are validated during ingestion. +The required fields of the dataset type, together with the custom fields specific to each dataset, make up the +**dataset schema**. + +Once a **dataset schema** is defined, existing fields cannot be removed or edited after data has been ingested into it. +However, you can always add new fields to a dataset, since all fields are always optional. + + + The only exception to this rule is empty datasets. If you empty all collections in a dataset, you can freely + edit the data schema, since no conflicts with existing data points can occur. + + + +## Field types + +When defining the data schema, you can specify the type of each field. The following field types are supported. + +### Primitives + +| Type | Description | Example value | +| --- | --- | --- | +| string | A string of characters of arbitrary length. | `Some string` | +| int64 | A 64-bit signed integer. | `123` | +| uint64 | A 64-bit unsigned integer. | `123` | +| float64 | A 64-bit floating-point number. | `123.45` | +| bool | A boolean. | `true` | +| bytes | A sequence of arbitrary length bytes. 
| `0xAF1E28D4` | + +### Time + +| Type | Description | Example value | +| --- | --- | --- | +| Duration | A signed, fixed-length span of time represented as a count of seconds and fractions of seconds at nanosecond resolution. See [Duration](https://protobuf.dev/reference/protobuf/google.protobuf/#duration) for more information. | `12s 345ms` | +| Timestamp | A point in time, represented as seconds and fractions of seconds at nanosecond resolution in UTC Epoch time. See [Timestamp](https://protobuf.dev/reference/protobuf/google.protobuf/#timestamp) for more information. | `2023-05-17T14:30:00Z` | + +### Identifier + +| Type | Description | Example value | +| --- | --- | --- | +| UUID | A [universally unique identifier (UUID)](https://en.wikipedia.org/wiki/Universally_unique_identifier). | `126a2531-c98d-4e06-815a-34bc5b1228cc` | + +### Geospatial + +| Type | Description | Example value | +| --- | --- | --- | +| Geometry | Geospatial geometries of type Point, LineString, Polygon or MultiPolygon. | `POLYGON ((12.3 -5.4, 12.5 -5.4, ...))` | + +### Arrays + +Every type is also available as an array, allowing you to ingest multiple values of the underlying type for each data point. The size of the array is flexible, and can be different for each data point. + +## Creating a dataset + +You can create a dataset in Tilebox using the [Tilebox Console](/console). Check out the [Creating a dataset](/guides/datasets/create) guide for an example of how to achieve this. + +## Listing datasets + +You can use [your client instance](/datasets/introduction#creating-a-datasets-client) to access the datasets available to you. To list all available datasets, use the `datasets` method of the client. 
+ + +```python Python +from tilebox.datasets import Client + +client = Client() +datasets = client.datasets() +print(datasets) +``` + + +```plaintext Output +open_data: + asf: + ers_sar: European Remote Sensing Satellite (ERS) Synthetic Aperture Radar (SAR) Granules + copernicus: + landsat8_oli_tirs: Landsat-8 is part of the long-running Landsat programme ... + sentinel1_sar: The Sentinel-1 mission is the European Radar Observatory for the ... + sentinel2_msi: Sentinel-2 is equipped with an optical instrument payload that samples ... + sentinel3_olci: OLCI (Ocean and Land Colour Instrument) is an optical instrument used to ... + ... +``` + +Once you have your dataset object, you can use it to [list the available collections](/datasets/concepts/collections) for the dataset. + + + If you're using an IDE or an interactive environment with auto-complete, you can use it on your client instance to discover the datasets available to you. Type `client.` and trigger auto-complete after the dot to do so. + + +## Accessing a dataset + +Each dataset has an automatically generated *code name* that can be used to access it. The *code name* is the name of the group, followed by a dot, followed by the dataset name. +For example, the Sentinel-2 MSI dataset above is part of the `open_data.copernicus` group, so its *code name* is `open_data.copernicus.sentinel2_msi`. + +To access a dataset, use the `dataset` method of your client instance and pass the *code name* of the dataset as an argument. + + +```python Python +from tilebox.datasets import Client + +client = Client() +s2_msi_dataset = client.dataset("open_data.copernicus.sentinel2_msi") +``` + + +Once you have your dataset object, you can use it to [access available collections](/datasets/concepts/collections) for the dataset. 
+ diff --git a/datasets/delete.mdx b/datasets/delete.mdx new file mode 100644 index 0000000..01e1fed --- /dev/null +++ b/datasets/delete.mdx @@ -0,0 +1,84 @@ +--- +title: Deleting Data +sidebarTitle: Delete +description: Learn how to delete data points from Tilebox datasets. +icon: trash-can +--- + +import { CodeOutputHeader } from '/snippets/components.mdx'; + +## Overview + +This section provides an overview of the API for deleting data from a collection. + +| Method | Description | +| ------ | ----------- | +| [`collection.delete`](/api-reference/tilebox.datasets/Collection.delete) | Delete data points from a collection. | + + + You need to have write permission on the collection to be able to delete datapoints. + + +Check out the examples below for common scenarios of deleting data from a collection. + +## Deleting data by datapoint IDs + +To delete data from a collection, use the [delete](/api-reference/tilebox.datasets/Collection.delete) method. This method accepts a list of datapoint IDs to delete. + + +```python Python +from tilebox.datasets import Client + +client = Client() +datasets = client.datasets() +collections = datasets.my_custom_dataset.collections() +collection = collections["Sensor-1"] + +n_deleted = collection.delete([ + "0195c87a-49f6-5ffa-e3cb-92215d057ea6", + "0195c87b-bd0e-3998-05cf-af6538f34957", +]) + +print(f"Deleted {n_deleted} data points.") +``` + + + +```plaintext Output +Deleted 2 data points. +``` + + + + `delete` not only takes a list of datapoint IDs as strings, but also supports a wide range of other useful input types. + See the [delete](/api-reference/tilebox.datasets/Collection.delete) API documentation for more details. + + + +### Possible errors + +- `NotFoundError`: raised if one of the data points is not found in the collection. If any of the data points are not found, + nothing will be deleted. 
+- `ValueError`: raised if one of the specified ids is not a valid UUID. + + +## Deleting a time interval + +One common way to delete data is to first load it from a collection and then forward it to the `delete` method. For +this use case, it is often a good idea to query the datapoints with `skip_data=True` to avoid loading the data fields, +since we only need the datapoint IDs. See [fetching only metadata](/datasets/query#fetching-only-metadata) for more details. + + +```python Python +to_delete = collection.load(("2023-05-01", "2023-06-01"), skip_data=True) + +n_deleted = collection.delete(to_delete) +print(f"Deleted {n_deleted} data points.") +``` + + + +```plaintext Output +Deleted 104 data points. +``` + diff --git a/datasets/ingest-delete-data.mdx b/datasets/ingest-delete-data.mdx deleted file mode 100644 index 1c85149..0000000 --- a/datasets/ingest-delete-data.mdx +++ /dev/null @@ -1,75 +0,0 @@ ---- -title: Ingesting and Deleting Data -sidebarTitle: Ingesting and deleting data -description: Learn how to ingest and delete data from Time Series Dataset collections. -icon: download ---- - -## Overview - -This section provides an overview of the API for ingesting and deleting data from a collection. It includes usage examples for many common scenarios. - -| Method | API Reference | Description | -| ----------------------- | ------------------------------------------------------------------------------------ | ---------------------------------------------------- | -| `collection.delete` | [Deleting data points](/api-reference/tilebox.datasets/Collection.delete) | Delete data points from a collection. | -| `collection.delete_ids` | [Deleting data points by IDs](/api-reference/tilebox.datasets/Collection.delete_ids) | Delete data points from a collection by their ids. | - - - You need to have write permission on the collection to be able to ingest or delete datapoints. 
- - -Check out the examples below for common scenarios when ingesting and deleting data from collections. -The examples assume you have already [created a client](/datasets/introduction#creating-a-datasets-client) and [accessed a specific dataset collection](/datasets/collections) that you have write permissions on. - - -```python Python -from tilebox.datasets import Client - -client = Client() -datasets = client.datasets() -collections = datasets.open_data.copernicus.sentinel1_sar.collections() -collection = collections["S1A_IW_RAW__0S"] -``` - - -## Ingesting data - - -## Deleting data - -To delete data from a collection, use the [delete](/api-reference/tilebox.datasets/Collection.delete) or [delete_ids](/api-reference/tilebox.datasets/Collection.delete_ids) method. - -One common way to delete data is to load it from a collection and then forward it to the `delete` method. - - -```python Python -datapoints = collection.load(("2023-05-01", "2023-06-01")) - -n_deleted = collection.delete(datapoints) -print(f"Deleted {n_deleted} data points.") -``` - - -```plaintext Output -Deleted 104 data points. -``` - -In case you already have the list of datapoint IDs that you want to delete, you can use the `delete_ids` method. - - -```python Python -from uuid import UUID - -datapoints_ids=[ - UUID("29b29ade-db02-427a-be9c-a8ef8184f544"), - UUID("fa4a8e4e-6afe-41a3-b228-b867330669bd"), -] - -n_deleted = collection.delete_ids(datapoints_ids) -print(f"Deleted {n_deleted} data points.") -``` - - -```plaintext Output -Deleted 2 data points. -``` diff --git a/datasets/ingest.mdx b/datasets/ingest.mdx new file mode 100644 index 0000000..c956d5d --- /dev/null +++ b/datasets/ingest.mdx @@ -0,0 +1,345 @@ +--- +title: Ingesting Data +sidebarTitle: Ingest +description: Learn how to ingest data into Tilebox datasets. 
+icon: up-from-bracket +--- + +import { CodeOutputHeader } from '/snippets/components.mdx'; + +## Overview + +This section provides an overview of the API for ingesting data into a collection. It includes usage examples for many common scenarios. + +| Method | Description | +| ----------------------- | ---------------------------------------------------- | +| [`collection.ingest`](/api-reference/tilebox.datasets/Collection.ingest) | Ingest data into a collection. | + + + You need to have write permission on the collection to be able to ingest data. + + +Check out the examples below for common scenarios of ingesting data into a collection. + +## Dataset schema + +Tilebox Datasets are strongly-typed. This means you can only ingest data that matches the schema of a dataset. The schema is defined at dataset creation time. +The examples on this page assume that you have access to a [Timeseries dataset](/datasets/types/timeseries) that has the following schema: + + + Check out the [Creating a dataset](/guides/datasets/create) guide for an example of how to create such a dataset. + + +**MyCustomDataset schema** + +| Field name | Type | Description | +| ---------- | ---- | ----------- | +| `time` | Timestamp | Timestamp of the data point. Required by the [Timeseries dataset](/datasets/types/timeseries) type. | +| `id` | UUID | Auto-generated UUID for each datapoint. | +| `ingestion_time` | Timestamp | Auto-generated timestamp for when the data point was ingested into the Tilebox API. | +| `value` | float64 | A numeric measurement value. | +| `sensor` | string | The name of the sensor that generated the data point. | +| `precise_time` | Timestamp | A precise measurement time in nanosecond precision. | +| `sensor_history` | Array[float64] | The last few measurements of the sensor. | + + + A full overview of available data types can be found [here](/datasets/concepts/datasets#field-types). 
+ + +Once we've defined the schema and created a dataset, we can access it and create a collection to ingest data into. + + +```python Python +from tilebox.datasets import Client + +client = Client() +dataset = client.dataset("my_org.my_custom_dataset") +collection = dataset.get_or_create_collection("Measurements") +``` + + +## Preparing data for ingestion + +[`collection.ingest`](/api-reference/tilebox.datasets/Collection.ingest) supports a wide range of input types. Below are examples of using either a `pandas.DataFrame` or an `xarray.Dataset` as input. + +### pandas.DataFrame + +A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) is a representation of two-dimensional, potentially heterogeneous tabular data. It is a powerful tool for working with structured data, and Tilebox supports it as input for `ingest`. + +The example below shows how to construct a `pandas.DataFrame` from scratch that matches the schema of the `MyCustomDataset` dataset and can therefore be ingested into it. + + +```python Python +import pandas as pd + +data = pd.DataFrame({ + "time": [ + "2025-03-28T11:44:23Z", + "2025-03-28T11:45:19Z", + ], + "value": [45.16, 273.15], + "sensor": ["A", "B"], + "precise_time": [ + "2025-03-28T11:44:23.345761444Z", + "2025-03-28T11:45:19.128742312Z", + ], + "sensor_history": [ + [-12.15, 13.45, -8.2, 16.5, 45.16], + [300.16, 280.12, 273.15], + ], +}) +print(data) +``` + + + +```plaintext Python + time value sensor precise_time sensor_history +0 2025-03-28T11:44:23Z 45.16 A 2025-03-28T11:44:23.345761444Z [-12.15, 13.45, -8.2, 16.5, 45.16] +1 2025-03-28T11:45:19Z 273.15 B 2025-03-28T11:45:19.128742312Z [300.16, 280.12, 273.15] +``` + + +Once we have the data ready in this format, we can `ingest` it into a collection. 
+ + +```python Python +# now that we have the data frame in the correct format +# we can ingest it into the Tilebox dataset +collection.ingest(data) + +# To verify it now contains the 2 data points +print(collection.info()) +``` + + + +```plaintext Python +Measurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:45:19.000 UTC] (2 data points) +``` + + + + You can now also head on over to the [Tilebox Console](/console) and view the newly ingested data points there. + + +### xarray.Dataset + +[xarray.Dataset](/sdks/python/xarray) is the default format in which Tilebox Datasets returns data when +[querying data](/datasets/query) from a collection. +Tilebox also supports it as input for ingestion. The example below shows how to construct an `xarray.Dataset` +from scratch that matches the schema of the `MyCustomDataset` dataset and can therefore be ingested into it. +To learn more about `xarray.Dataset`, visit our dedicated [Xarray documentation page](/sdks/python/xarray). + + +```python Python +import numpy as np +import xarray as xr + +data = xr.Dataset({ + "time": ("time", [ + "2025-03-28T11:46:13Z", + "2025-03-28T11:46:54Z", + ]), + "value": ("time", [48.1, 290.12]), + "sensor": ("time", ["A", "B"]), + "precise_time": ("time", [ + "2025-03-28T11:46:13.345761444Z", + "2025-03-28T11:46:54.128742312Z", + ]), + "sensor_history": (("time", "n_sensor_history"), [ + [13.45, -8.2, 16.5, 45.16, 48.1], + [280.12, 273.15, 290.12, np.nan, np.nan], + ]), +}) +print(data) +``` + + + +```plaintext Python + Size: 504B +Dimensions: (time: 2, n_sensor_history: 5) +Coordinates: + * time (time) ... +``` + + + Array fields manifest in xarray using an extra dimension, in this case `n_sensor_history`. Therefore, in case + of different array sizes for each data point, remaining values are filled up with a fill value, depending on the + `dtype` of the array. For `float64` this is `np.nan` (not a number). 
Don't worry - when ingesting data into a Tilebox dataset, Tilebox will automatically skip those padding fill values + and not store them in the dataset. + + +Now that we have the `xarray.Dataset` in the correct format, we can ingest it into the Tilebox dataset collection. + + +```python Python +collection = dataset.get_or_create_collection("OtherMeasurements") +collection.ingest(data) + +# To verify it now contains the 2 data points +print(collection.info()) +``` + + + +```plaintext Python +OtherMeasurements: [2025-03-28T11:46:13.000 UTC, 2025-03-28T11:46:54.000 UTC] (2 data points) +``` + + + +## Copying or moving data + +Since [collection.load](/datasets/query) returns an `xarray.Dataset`, and `ingest` takes such a dataset as input, you +can easily copy or move data from one collection to another. + + + Copying data like this also works across datasets in case the dataset schemas are compatible. + + + +```python Python +src_collection = dataset.collection("Measurements") +data_to_copy = src_collection.load(("2025-03-28", "2025-03-29")) + +dest_collection = dataset.collection("OtherMeasurements") +dest_collection.ingest(data_to_copy) # copy the data to the other collection + +# To verify it now contains 4 datapoints (2 we ingested already, and 2 we copied just now) +print(dest_collection.info()) +``` + + + +```plaintext Python +OtherMeasurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:46:54.000 UTC] (4 data points) +``` + +## Idempotency + +Tilebox will auto-generate datapoint IDs based on the data of all of its fields - except for the auto-generated +`ingestion_time`, so ingesting the same data twice will result in the same ID being generated. By default, Tilebox +will silently skip any data points that are duplicates of existing ones in a collection. This behavior is especially +useful when implementing idempotent algorithms. That way, re-executions of certain ingestion tasks due to retries +or other reasons will never result in duplicate data points. 
+
+However, you can also request that an error be raised if any of the generated datapoint IDs already exist.
+This can be done by setting the `allow_existing` parameter to `False`.
+
+
+```python Python
+data = pd.DataFrame({
+    "time": [
+        "2025-03-28T11:45:19Z",
+    ],
+    "value": [45.16],
+    "sensor": ["A"],
+    "precise_time": [
+        "2025-03-28T11:44:23.345761444Z",
+    ],
+    "sensor_history": [
+        [-12.15, 13.45, -8.2, 16.5, 45.16],
+    ],
+})
+
+# we already ingested the same data point previously,
+# so this raises an ArgumentError
+collection.ingest(data, allow_existing=False)
+
+# we can still ingest it, by setting allow_existing=True
+# but the total number of datapoints will still be the same
+# as before in that case, since it already exists and therefore
+# will be skipped
+collection.ingest(data, allow_existing=True) # no-op
+```
+
+
+
+```plaintext Python
+ArgumentError: found existing datapoints with same id, refusing to ingest with "allow_existing=false"
+```
+
+
+## Ingestion from common file formats
+
+Through the usage of `xarray` and `pandas` you can also easily ingest existing datasets available in file
+formats, such as CSV, [Parquet](https://parquet.apache.org/), [Feather](https://arrow.apache.org/docs/python/feather.html) and more.
+
+### CSV
+
+Comma-separated values (CSV) is a common file format for tabular data. It is widely used in data science. Tilebox
+supports CSV ingestion using the `pandas.read_csv` function.
+
+Let's assume we have a CSV file named `ingestion_data.csv` with the following content. If you want to follow along, you can
+download the file [here](https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/ingestion_data.csv).
+
+```csv ingestion_data.csv
+time,value,sensor,precise_time,sensor_history,some_unwanted_column
+2025-03-28T11:44:23Z,45.16,A,2025-03-28T11:44:23.345761444Z,"[-12.15, 13.45, -8.2, 16.5, 45.16]","Unsupported"
+2025-03-28T11:45:19Z,273.15,B,2025-03-28T11:45:19.128742312Z,"[300.16, 280.12, 273.15]","Unsupported"
+```
+
+This data already conforms to the schema of the `MyCustomDataset` dataset, except for `some_unwanted_column` which
+we want to drop before we ingest it. Here is what this could look like:
+
+
+```python Python
+import ast
+
+import pandas as pd
+
+data = pd.read_csv("ingestion_data.csv")
+data = data.drop(columns=["some_unwanted_column"])
+# pandas reads the sensor_history arrays as strings, so parse them into lists
+data["sensor_history"] = data["sensor_history"].apply(ast.literal_eval)
+
+collection = dataset.get_or_create_collection("CSVMeasurements")
+collection.ingest(data)
+```
+
+
+### Parquet
+
+[Apache Parquet](https://parquet.apache.org/) is an open source, column-oriented data file format designed for efficient data storage and retrieval.
+Tilebox supports Parquet ingestion using the `pandas.read_parquet` function.
+
+The Parquet file used in this example [is available here](https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/ingestion_data.parquet).
+
+
+```python Python
+import pandas as pd
+
+data = pd.read_parquet("ingestion_data.parquet")
+
+# our data already conforms to the schema of the MyCustomDataset
+# dataset, so let's ingest it
+collection = dataset.get_or_create_collection("ParquetMeasurements")
+collection.ingest(data)
+```
+
+
+### Feather
+
+[Feather](https://arrow.apache.org/docs/python/feather.html) is a file format originating from the Apache Arrow project,
+designed for storing tabular data in a fast and memory-efficient way. It is supported by many programming languages,
+including Python. Tilebox supports Feather ingestion using the `pandas.read_feather` function.
+
+The Feather file used in this example [is available here](https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/ingestion_data.feather).
+
+
+```python Python
+import pandas as pd
+
+data = pd.read_feather("ingestion_data.feather")
+
+# our data already conforms to the schema of the MyCustomDataset
+# dataset, so let's ingest it
+collection = dataset.get_or_create_collection("FeatherMeasurements")
+collection.ingest(data)
+```
+
diff --git a/datasets/introduction.mdx b/datasets/introduction.mdx
index da2d080..5dd8c4c 100644
--- a/datasets/introduction.mdx
+++ b/datasets/introduction.mdx
@@ -1,30 +1,34 @@
 ---
-title: Introduction
-description: Learn about Tilebox Datasets
+title: Tilebox Datasets
+sidebarTitle: Introduction
+description: Tilebox Datasets provides structured and high-performance satellite metadata access.
 icon: house
 ---
 
-Time series datasets refer to datasets where each data point is linked to a timestamp. This format is common for data collected over time, such as satellite data.
+Tilebox Datasets ingests and structures metadata for efficient querying, significantly reducing data transfer and storage costs.
 
-This section covers:
+Create your own [custom datasets](/datasets/concepts/datasets) and easily set up a private, custom, strongly-typed and highly-available catalogue, or
+explore the wide range of [public open data datasets](/datasets/open-data) available on Tilebox.
+
+Learn more about datasets by exploring the following sections:
 
 
-  
-    Discover available time series datasets and learn how to list them.
-  
-  
-    Understand the common fields shared by all time series datasets.
+  
+    Learn what dataset types are available on Tilebox and how to create, list and access them.
   
-  
+  
     Learn what collections are and how to access them.
   
-  
+  
     Find out how to access data from a collection for specific time intervals.
   
+  
+    Learn how to ingest data into a collection.
+  
 
 
-  For a quick reference to API methods or specific parameter meanings, [check out the complete time series API Reference](/api-reference/tilebox.datasets/Client).
+ For a quick reference to API methods or specific parameter meanings, [check out the complete Datasets API Reference](/api-reference/tilebox.datasets/Client). ## Terminology @@ -33,10 +37,10 @@ Get familiar with some key terms when working with time series datasets. - Time series data points are individual entities that form a dataset. Each data point has a timestamp and consists of a set of fixed [metadata fields](/datasets/timeseries#common-fields) along with dataset-specific fields. + Data points are the individual entities that form a dataset. Each data point has a set of required [fields](/datasets/types/timeseries) determined by the dataset type, and can have additional, custom user-defined fields. - Time series datasets act as containers for data points. All data points in a dataset share the same type and fields. + Datasets act as containers for data points. All data points in a dataset share the same type and fields. Tilebox supports different types of datasets, currently those are [Timeseries](/datasets/types/timeseries) and [Spatio-temporal](/datasets/types/spatiotemporal) datasets. Collections group data points within a dataset. They help represent logical groupings of data points that are often queried together. @@ -106,7 +110,7 @@ datasets = client.datasets() # raises AuthenticationError ## Next steps - + diff --git a/datasets/managed-datasets/data-types.mdx b/datasets/managed-datasets/data-types.mdx deleted file mode 100644 index 8ee92c8..0000000 --- a/datasets/managed-datasets/data-types.mdx +++ /dev/null @@ -1,69 +0,0 @@ ---- -title: Data types -description: Learn about the data types supported by Tilebox -icon: file-binary ---- - -Tilebox supports primitives and data for data fields. These are defined in [well_known_types.proto](https://github.com/tilebox/tilebox-go/blob/main/apis/datasets/v1/well_known_types.proto). 
-Tilebox also uses Google well-known-types [Duration](https://protobuf.dev/reference/protobuf/google.protobuf/#duration) and [Timestamp](https://protobuf.dev/reference/protobuf/google.protobuf/#timestamp). - - - If there is a data type you want to see in Tilebox, - please get in touch. - - -## Primitive types - - - A string of characters. - - - - A 64-bit signed integer. - - - - A 64-bit unsigned integer. - - - - A 64-bit floating-point number. - - - - A boolean. - - - - A sequence of bytes. - - -## Time types - - - A duration of time. See [Duration](https://protobuf.dev/reference/protobuf/google.protobuf/#duration) for more information. - - - - A point in time. See [Timestamp](https://protobuf.dev/reference/protobuf/google.protobuf/#timestamp) for more information. - - -## Identifier types - - - A [universally unique identifier (UUID)](https://en.wikipedia.org/wiki/Universally_unique_identifier). - - -## Geospatial types - - - A latitude and longitude pair. - - - - A latitude, longitude, and altitude triplet. - - - - Geospatial geometries of type Point, LineString, Polygon or MultiPolygon. - diff --git a/datasets/loading-data.mdx b/datasets/query.mdx similarity index 72% rename from datasets/loading-data.mdx rename to datasets/query.mdx index c38e0f2..4c2d254 100644 --- a/datasets/loading-data.mdx +++ b/datasets/query.mdx @@ -1,20 +1,20 @@ --- -title: Loading Time Series Data -sidebarTitle: Loading Data -description: Learn how to load data from Time Series Dataset collections. -icon: download +title: Querying data +sidebarTitle: Query +description: Learn how to query and load data from Tilebox datasets. +icon: server --- ## Overview This section provides an overview of the API for loading data from a collection. It includes usage examples for many common scenarios. 
-| Method | API Reference | Description | -| ----------------- | ---------------------------------------------------------------------------- | ---------------------------------------------------- | -| `collection.load` | [Loading data](/api-reference/tilebox.datasets/Collection.load) | Load data points from a collection. | -| `collection.find` | [Loading a data point](/api-reference/tilebox.datasets/Collection.find) | Find a specific datapoint in a collection by its id. | +| Method | Description | +| -------------------------------------------------------------------- | ---------------------------------------------------- | +| [`collection.load`](/api-reference/tilebox.datasets/Collection.load) | Query data points from a collection. | +| [`collection.find`](/api-reference/tilebox.datasets/Collection.find) | Find a specific datapoint in a collection by its id. | -Check out the examples below for common scenarios when loading data from collections. The examples assume you have already [created a client](/datasets/introduction#creating-a-datasets-client) and [accessed a specific dataset collection](/datasets/collections). +Check out the examples below for common scenarios when loading data from collections. ```python Python @@ -27,11 +27,11 @@ collection = collections["S1A_IW_RAW__0S"] ``` -## Loading data - To load data points from a dataset collection, use the [load](/api-reference/tilebox.datasets/Collection.load) method. It requires a `time_or_interval` parameter to specify the time or time interval for loading. -### TimeInterval +## Filtering by time + +### Time interval To load data for a specific time interval, use a `tuple` in the form `(start, end)` as the `time_or_interval` parameter. Both `start` and `end` must be [TimeScalars](#time-scalars), which can be `datetime` objects or strings in ISO 8601 format. 
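
To make the accepted `TimeScalar` formats concrete, here is a small, self-contained sketch of how a `(start, end)` tuple could be normalized to UTC `datetime` objects. This illustrates the accepted input formats only; it is not the client's actual implementation:


```python Python
from datetime import datetime, timezone

def parse_time_scalar(value) -> datetime:
    # accept either a datetime object or an ISO 8601 string;
    # naive values (no timezone suffix) are treated as UTC
    if isinstance(value, str):
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)

# a (start, end) tuple as it could be passed to collection.load
start, end = "2017-01-01", datetime(2023, 1, 1)
interval = (parse_time_scalar(start), parse_time_scalar(end))
print(interval[0].isoformat())  # 2017-01-01T00:00:00+00:00
```
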
@@ -80,9 +80,9 @@ data = xr.concat(data, dim="time")
 
 Above example demonstrates how to split a large time interval into smaller chunks while loading data in separate requests. Typically, this is not necessary as the datasets client auto-paginates large intervals.
 
-### TimeInterval objects
+### Endpoint inclusivity
 
-For greater control over inclusivity of start and end times, you can use the `TimeInterval` dataclass instead of a tuple with the `load` parameter. This class allows you to specify the `start` and `end` times, as well as their inclusivity. Here's an example of creating equivalent `TimeInterval` objects in two different ways.
+For greater control over inclusivity of start and end times, you can use the `TimeInterval` dataclass instead of a tuple of two [TimeScalars](#time-scalars). This class allows you to specify the `start` and `end` times, as well as their inclusivity. Here's an example of creating equivalent `TimeInterval` objects in two different ways.
 
 
 ```python Python
@@ -94,6 +94,7 @@ interval1 = TimeInterval(
     end_inclusive=False
 )
 interval2 = TimeInterval(
+    # the granularity of python datetimes is microseconds
     datetime(2017, 1, 1), datetime(2022, 12, 31, 23, 59, 59, 999999),
     end_inclusive=True
 )
@@ -104,7 +105,7 @@ print(interval2)
 print(f"They are equivalent: {interval1 == interval2}")
 print(interval2.to_half_open())
 
-# Same operation as above
+# Query data for a time interval
 data = collection.load(interval1, show_progress=True)
 ```
 
@@ -119,9 +120,13 @@ They are equivalent: True
 
 ### Time scalars
 
-You can load all points for a specific time using a `TimeScalar` for the `time_or_interval` parameter to `load`. A `TimeScalar` can be a `datetime` object or a string in ISO 8601 format. When passed to the `load` method, it retrieves all data points matching the specified time. Note that the `time` field of data points in a collection may not be unique, so multiple data points could be returned.
If you want to fetch only a single data point, use [find](#loading-a-data-point-by-id) instead.
+You can load all datapoints linked to a specific timestamp by specifying a `TimeScalar` as the time query argument. A `TimeScalar` can be a `datetime` object or a string in ISO 8601 format. When passed to the `load` method, it retrieves all data points matching exactly that specified time, with millisecond precision.
+
+
+  A collection may contain multiple datapoints for one millisecond, so multiple data points could still be returned. If you want to fetch only a single data point, [query the collection by id](#loading-a-data-point-by-id) instead.
+
 
-Here's how to load a data point at a specific time from a [collection](/datasets/collections).
+Here's how to load a data point at a specific millisecond from a [collection](/datasets/concepts/collections).
 
 
 ```python Python
@@ -146,7 +151,7 @@ Data variables: (12/30)
 ```
 
-  Tilebox uses millisecond precision for timestamps. To load all data points for a specific second, it's a [time interval](/datasets/loading-data#time-intervals) request. Refer to the examples below for details.
+  Tilebox uses millisecond precision for timestamps. To load all data points for a specific second, use a [time interval](/datasets/query#time-interval) request. Refer to the examples below for details.
 
 The output of the `load` method is an `xarray.Dataset` object. To learn more about Xarray, visit the dedicated [Xarray page](/sdks/python/xarray).
@@ -182,9 +187,54 @@ Data variables: (12/30)
 This feature works by constructing a `TimeInterval` object from the first and last elements of the iterable, making both the start and end time inclusive.
+### Timezones
+
+All `TimeScalars` specified as a string are treated as UTC if they do not include a timezone suffix. If you want to query data for a specific time or time range
+in another timezone, it's recommended to use a `datetime` object.
In this case, the Tilebox API will convert the datetime to `UTC` before making the request. +The output will always contain UTC timestamps, which will need to be converted again if a different timezone is required. + + +```python Python +from datetime import datetime +import pytz + +# Tokyo has a UTC+9 hours offset, so this is the same as +# 2017-01-01 02:45:25.679 UTC +tokyo_time = pytz.timezone('Asia/Tokyo').localize( + datetime(2017, 1, 1, 11, 45, 25, 679000) +) +print(tokyo_time) +data = collection.load(tokyo_time) +print(data) # time is in UTC since API always returns UTC timestamps +``` + + +```plaintext Output +2017-01-01 11:45:25.679000+09:00 + Size: 725B +Dimensions: (time: 1, latlon: 2) +Coordinates: + ingestion_time (time) datetime64[ns] 8B 2024-06-21T11:03:33.852435 + id (time) + Spatio-temporal datasets - including spatial filtering capabilities - are currently in development and not available yet. Stay tuned for updates! + + ## Fetching only metadata -Sometimes, it may be useful to load only the [time series metadata](/datasets/timeseries#common-fields) without the actual data fields. This can be done by setting the `skip_data` parameter to `True` when using `load`. Here's an example. +Sometimes, it may be useful to load only dataset metadata fields without the actual data fields. This can be done by setting the `skip_data` parameter to `True`. +For example, when only checking if a datapoint exists, you may want to use `skip_data=True` to avoid loading the data fields. +If this flag is set, the response will only include the required fields for the given dataset type, but no additional custom data fields. ```python Python @@ -206,7 +256,7 @@ Data variables: ## Empty response -The `load` method always returns an `xarray.Dataset` object, even if there are no data points for the specified time. In such cases, the returned dataset will be empty, but no error will be raised. 
+The `load` method always returns an `xarray.Dataset` object, even if there are no data points for the specified query. In such cases, the returned dataset will be empty, but no error will be raised. ```python Python @@ -223,44 +273,11 @@ Data variables: *empty* ``` -## Timezone handling - -When a `TimeScalar` is specified as a string, the time is treated as UTC. If you want to load data for a specific time in another timezone, use a `datetime` object. In this case, the Tilebox API will convert the datetime to `UTC` before making the request. The output will always contain UTC timestamps, which will need to be converted again if a different timezone is required. - - -```python Python -from datetime import datetime -import pytz - -# Tokyo has a UTC+9 hours offset, so this is the same as -# 2017-01-01 02:45:25.679 UTC -tokyo_time = pytz.timezone('Asia/Tokyo').localize( - datetime(2017, 1, 1, 11, 45, 25, 679000) -) -print(tokyo_time) -data = collection.load(tokyo_time) -print(data) # time is in UTC since API always returns UTC timestamps -``` - - -```plaintext Output -2017-01-01 11:45:25.679000+09:00 - Size: 725B -Dimensions: (time: 1, latlon: 2) -Coordinates: - ingestion_time (time) datetime64[ns] 8B 2024-06-21T11:03:33.852435 - id (time) ```python Python @@ -268,9 +285,14 @@ This method always returns a single data point or raises an exception if no data datapoint = collection.find(datapoint_id) print(datapoint) ``` + ```go Golang + fmt.Println("test") + ``` -```plaintext Output +
Output
+ +```plaintext Python Size: 725B Dimensions: (latlon: 2) Coordinates: @@ -284,6 +306,10 @@ Data variables: (12/30) satellite object 8B 'SENTINEL-1' ... ... ``` +```plaintext Golang +Test ... +``` + Since `find` returns only a single data point, the output dataset does not include a `time` dimension. @@ -291,7 +317,7 @@ Since `find` returns only a single data point, the output dataset does not inclu You can also set the `skip_data` parameter when calling `find` to load only the metadata of the data point, same as for `load`. -### Possible exceptions +### Possible errors - `NotFoundError`: raised if no data point with the given ID is found in the collection - `ValueError`: raised if the specified `datapoint_id` is not a valid UUID diff --git a/datasets/timeseries.mdx b/datasets/timeseries.mdx deleted file mode 100644 index dc025d0..0000000 --- a/datasets/timeseries.mdx +++ /dev/null @@ -1,70 +0,0 @@ ---- -title: Time series data -description: Learn about time series datasets -icon: timeline ---- - -Time series datasets act as containers for data points. All data points in a dataset share the same type and fields. - -Additionally, all time series datasets include a few [common fields](#common-fields). One of these fields, the `time` field, allows you to perform time-based data queries on a dataset. - -## Listing datasets - -You can use [your client instance](/datasets/introduction#creating-a-datasets-client) to access the datasets available to you. For example, to access the `sentinel1_sar` dataset in the `open_data.copernicus` dataset group, use the following code. - - -```python Python -from tilebox.datasets import Client - -client = Client() -datasets = client.datasets() -dataset = datasets.open_data.copernicus.sentinel1_sar -``` - - -Once you have your dataset object, you can use it to [list the available collections](/datasets/collections) for the dataset. 
- - - If you're using an IDE or an interactive environment with auto-complete, you can use it on your client instance to discover the datasets available to you. Type `client.` and trigger auto-complete after the dot to do so. - - -## Common fields - -While the specific data fields between different time series datasets can vary, there are common fields that all time series datasets share. - - - The timestamp associated with each data point. Tilebox uses millisecond precision for storing and indexing data points. Timestamps are always in UTC. - - - - A [universally unique identifier (UUID)](https://en.wikipedia.org/wiki/Universally_unique_identifier) that uniquely identifies each data point. IDs are generated so that sorting them lexicographically also sorts them by their time field. - - - - The time the data point was ingested into the Tilebox API. Timestamps are always in UTC. - - -These fields are present in all time series datasets. Together, they make up the metadata of a data point. Each dataset also has its own set of fields that are specific to that dataset. - - - Tilebox uses millisecond precision timestamps for storing and indexing data points. If multiple data points share the same timestamp within one millisecond, they will all display the same timestamp. Each data point can have any number of timestamp fields with a higher precision. For example, telemetry data commonly includes timestamp fields with nanosecond precision. - - -## Example data point - -Below is an example data point from a time series dataset represented as an [`xarray.Dataset`](/sdks/python/xarray). It contains the common fields. When using the Tilebox Python client library, you receive the data in this format. 
- -```plaintext Example timeseries datapoint - -Dimensions: () -Coordinates: - time datetime64[ns] 2023-03-12 16:45:23.532 - id - The datatype ` diff --git a/datasets/types/spatiotemporal.mdx b/datasets/types/spatiotemporal.mdx new file mode 100644 index 0000000..784e860 --- /dev/null +++ b/datasets/types/spatiotemporal.mdx @@ -0,0 +1,52 @@ +--- +title: Spatio-temporal +description: Spatio-temporal datasets link each data point to a specific point in time and a location on the Earth's surface. +icon: earth-europe +--- + + + Spatio-temporal datasets are currently in development and not available yet. Stay tuned for updates! + + +Each spatio-temporal dataset comes with a set of required and auto-generated fields for each data point. + +## Required fields + +While the specific data fields between different time series datasets can vary, there are common fields that all time series datasets share. + + + The timestamp associated with each data point. Timestamps are always in UTC. + + + + For indexing and querying, Tilebox truncates timestamps to millisecond precision. However, Timeseries datasets may contain arbitrary custom `Timestamp` fields that store timestamps up to a nanosecond precision. + + + + A location on the earth's surface associated with each data point. Supported geometry types are `Polygon`, `MultiPolygon`, `Point` and `MultiPoint`. + + + +## Auto-generated fields + + + A [universally unique identifier (UUID)](https://en.wikipedia.org/wiki/Universally_unique_identifier) that uniquely identifies each data point. IDs are generated so that sorting them lexicographically also sorts them by time. + + + + IDs generated by Tilebox are deterministic, meaning that ingesting the exact same data values into the same collection will always result in the same ID. + + + + The time the data point was ingested into the Tilebox API. 
+ + +## Creating a spatio-temporal dataset + +To create a spatio-temporal dataset, use the [Tilebox Console](/console) and select `Spatio-temporal Dataset` as the dataset type. The required and auto-generated fields +outlined above will be automatically added to the dataset schema. + +## Spatio-temporal queries + +Spatio-temporal datasets support efficient time-based and spatially filtered queries. To query a specific location in a given time interval, +specify a time range and a geometry when [querying data points](/datasets/query) from a collection. diff --git a/datasets/types/timeseries.mdx b/datasets/types/timeseries.mdx new file mode 100644 index 0000000..5beca7b --- /dev/null +++ b/datasets/types/timeseries.mdx @@ -0,0 +1,43 @@ +--- +title: Timeseries +description: Timeseries datasets link each data point to a specific point in time. +icon: diamonds-4 +--- + +Each timeseries dataset comes with a set of required and auto-generated fields for each data point. + +## Required fields + +While the specific data fields between different time series datasets can vary, there are common fields that all time series datasets share. + + + The timestamp associated with each data point. Timestamps are always in UTC. + + + + For indexing and querying, Tilebox truncates timestamps to millisecond precision. However, Timeseries datasets may contain arbitrary custom `Timestamp` fields that store timestamps up to a nanosecond precision. + + + +## Auto-generated fields + + + A [universally unique identifier (UUID)](https://en.wikipedia.org/wiki/Universally_unique_identifier) that uniquely identifies each data point. IDs are generated so that sorting them lexicographically also sorts them by time. + + + + IDs generated by Tilebox are deterministic, meaning that ingesting the exact same data values into the same collection will always result in the same ID. + + + + The time the data point was ingested into the Tilebox API. 
+ + +## Creating a timeseries dataset + +To create a timeseries dataset, use the [Tilebox Console](/console) and select `Timeseries Dataset` as the dataset type. The required and auto-generated fields +outlined above will be automatically added to the dataset schema. + +## Time-based queries + +Timeseries datasets support time-based queries. To query a specific time interval, specify a time range when [querying data](/datasets/query) from a collection. diff --git a/datasets/managed-datasets/creating-dataset.mdx b/guides/datasets/create.mdx similarity index 59% rename from datasets/managed-datasets/creating-dataset.mdx rename to guides/datasets/create.mdx index 027242c..705f728 100644 --- a/datasets/managed-datasets/creating-dataset.mdx +++ b/guides/datasets/create.mdx @@ -13,16 +13,14 @@ This page guides you through the process of creating a dataset in Tilebox. Create a dataset in Tilebox by going to [My datasets](https://console.tilebox.com/datasets/my-datasets) and clicking the "Create dataset" button. - Choose a dataset kind from the dropdown menu. Prefilled fields for the selected dataset kind are automatically added. - Common fields for time series can be found [here](/datasets/timeseries#common-fields). - Support for STAC-compatible datasets is coming soon. + Choose a [dataset kind](/datasets/concepts/datasets#dataset-types) from the dropdown menu. Required fields for the selected dataset kind are automatically added. Tilebox Console Tilebox Console - + Complete these fields: - `Name` is the name of your dataset. @@ -34,8 +32,8 @@ This page guides you through the process of creating a dataset in Tilebox. Tilebox Console - - Common fields for each dataset kind are added automatically. Click "Add field" to add more fields. + + Specify the fields for your dataset. Each field has these properties: - `Name` is the name of the field (it should be snake_case). @@ -53,41 +51,32 @@ This page guides you through the process of creating a dataset in Tilebox. 
-The next steps are to create at least one collection and ingest data into it. +## Automatic dataset schema documentation + +By specifying the fields for your dataset, including the data type, description and an example value for each one, Tilebox +is capable of automatically generating a documentation page for your dataset schema. -It might then resemble this: - Tilebox Console - Tilebox Console + Dataset schema overview + Dataset schema overview -## Editing a dataset - -You can edit the dataset by clicking the edit pencil button on the dataset overview page. - -On this page, you can change the dataset name, description, add new fields, and edit both description and example for existing fields. - - - If you don't see the edit pencil button, you don't have the required permissions to edit the - documentation. - - -## Adding documentation +## Adding additional documentation -In addition to the auto-generated overview page, you can add custom documentation to your dataset. -This documentation is displayed on the dataset page and can be used to provide more context and details about the data. +You can also add custom documentation to your dataset, providing more context and details about the data included data. +This documentation supports rich formatting, including links, tables, code snippets, and more. - Tilebox Console - Tilebox Console + Tilebox Console + Tilebox Console To add documentation, click the edit pencil button on the dataset page to open the documentation editor. You can use Markdown to format your documentation; you can include links, tables, code snippets, and other Markdown features. - If you don't see the edit pencil button, you don't have the required permissions to edit the - documentation. + If you don't see the edit pencil button, you don't have the required permissions to edit the + documentation. Once you are done editing the documentation, click the `Save` button to save your changes. 
diff --git a/guides/datasets/ingest.mdx b/guides/datasets/ingest.mdx
new file mode 100644
index 0000000..7dc4c03
--- /dev/null
+++ b/guides/datasets/ingest.mdx
@@ -0,0 +1,235 @@
+---
+title: Ingesting data
+description: Learn how to ingest data into a Tilebox dataset
+icon: up-from-bracket
+---
+
+import { CodeOutputHeader } from '/snippets/components.mdx';
+
+This page guides you through the process of ingesting data into a Tilebox dataset. Starting from an existing
+dataset available as a file in the [GeoParquet](https://geoparquet.org/) format, we'll walk you through the process of
+ingesting that data into Tilebox as a [Timeseries](/datasets/types/timeseries) dataset.
+
+
+
+  This example is also available as a Google Colab notebook. Click here to navigate to an interactive example.
+
+
+
+## Downloading the example dataset
+
+The dataset used in this example is available as a [GeoParquet](https://geoparquet.org/) file. You can download it
+from here: [modis_MCD12Q1.geoparquet](https://storage.googleapis.com/tbx-web-assets-2bad228/docs/data-samples/modis_MCD12Q1.geoparquet).
+
+## Installing the necessary packages
+
+This example uses a couple of Python packages for reading Parquet files and for visualizing the dataset. Install the
+required packages using your preferred package manager. For new projects, we recommend using [uv](https://docs.astral.sh/uv/).
+
+
+```bash uv
+uv add tilebox-datasets geopandas folium matplotlib mapclassify
+```
+```bash pip
+pip install tilebox-datasets geopandas folium matplotlib mapclassify
+```
+```bash poetry
+poetry add tilebox-datasets="*" geopandas="*" folium="*" matplotlib="*" mapclassify="*"
+```
+```bash pipenv
+pipenv install tilebox-datasets geopandas folium matplotlib mapclassify
+```
+
+
+## Reading and previewing the dataset
+
+The dataset is available as a [GeoParquet](https://geoparquet.org/) file. You can read it using the `geopandas.read_parquet` function.
+ + +```python Python +import geopandas as gpd + +modis_data = gpd.read_parquet("modis_MCD12Q1.geoparquet") +modis_data.head(5) +``` + + + +```plaintext Python + time end_time granule_name geometry horizontal_tile_number vertical_tile_number tile_id file_size checksum checksum_type day_night_flag browse_granule_id published_at +0 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v08.061.2022146024956.hdf POLYGON ((-180 10, -180 0, -170 0, -172.62252 ... 0 8 51000008 275957 941243048 CKSUM Day None 2022-06-23 10:54:43.824000+00:00 +1 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v09.061.2022146024922.hdf POLYGON ((-180 0, -180 -10, -172.62252 -10, -1... 0 9 51000009 285389 3014510714 CKSUM Day None 2022-06-23 10:54:44.697000+00:00 +2 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v10.061.2022146032851.hdf POLYGON ((-180 -10, -180 -20, -180 -20, -172.6... 0 10 51000010 358728 2908215698 CKSUM Day None 2022-06-23 10:54:44.669000+00:00 +3 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v08.061.2022146025203.hdf POLYGON ((-172.62252 10, -170 0, -160 0, -162.... 1 8 51001008 146979 1397661843 CKSUM Day None 2022-06-23 10:54:44.309000+00:00 +4 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v09.061.2022146025902.hdf POLYGON ((-170 0, -172.62252 -10, -162.46826 -... 1 9 51001009 148935 2314263965 CKSUM Day None 2022-06-23 10:54:44.023000+00:00 +``` + + +## Exploring it visually + +Geopandas comes with a built in explorer to visually explore the dataset. + + +```python Python +modis_data.head(1000).explore(width=800, height=600) +``` + + + + Explore the MODIS dataset + Explore the MODIS dataset + + +## Create a Tilebox dataset + +Now we'll create a [Timeseries](/datasets/types/timeseries) dataset with the same schema as the given MODIS dataset. +To do so, we'll use the [Tilebox Console](/console), navigate to `My Datasets` and click `Create Dataset`. 
We then select +`Timeseries Dataset` as the dataset type. + + + For more information on creating a dataset, check out the [Creating a dataset](/guides/datasets/create) guide for a + step-by-step walkthrough. + + +Now, to match the given MODIS dataset, we'll specify the following fields: + +| Field | Type | Note | +| --- | --- | --- | +| granule_name | string | MODIS granule name | +| geometry | Geometry | Tile boundary coords of the granule | +| end_time | Timestamp | Measurement end time | +| horizontal_tile_number | int64 | Horizontal MODIS tile number (0-35) | +| vertical_tile_number | int64 | Vertical MODIS tile number (0-17) | +| tile_id | int64 | MODIS tile ID | +| file_size | uint64 | File size of the product in bytes | +| checksum | string | Hash checksum of the file | +| checksum_type | string | Checksum algorithm (MD5 / CKSUM) | +| day_night_flag | int64 | Day / Night / Both | +| browse_granule_id | string | Optional granule ID for browsing | +| published_at | Timestamp | The time the product was published | + +In the console, this will look like the following: + + + Tilebox Console + Tilebox Console + + +## Access the dataset from Python + +Our newly created dataset is now available. Let's access it from Python. For this, we'll need to know the dataset slug, +which was assigned automatically based on the specified `code_name`. To find out the slug, navigate to the dataset overview +in the console. + + + Explore the MODIS dataset + Explore the MODIS dataset + + +We can now instantiate the dataset client and access the dataset. + + +```python Python +from tilebox.datasets import Client + +client = Client() +dataset = client.dataset("tilebox.modis") # replace with your dataset slug +``` + + +## Create a collection + +Next, we'll create a collection to insert our data into. + + +```python Python +collection = dataset.get_or_create_collection("MCD12Q1") +``` + + +## Ingest the data + +Now, we'll finally ingest the MODIS data into the collection.
+ + +```python Python +datapoint_ids = collection.ingest(modis_data) +print(f"Successfully ingested {len(datapoint_ids)} datapoints!") +``` + + + +```plaintext Python +Successfully ingested 7245 datapoints! +``` + + +## Query the newly ingested data + +We can now query the newly ingested data. Let's query a subset of the data for a specific time range. + + + Since the data is now stored directly in the Tilebox dataset, you can query and access it from anywhere. + + + +```python Python +data = collection.load(("2015-01-01", "2020-01-01")) +data +``` + + + +```plaintext Python +<xarray.Dataset> Size: 403kB +Dimensions: (time: 1575) +Coordinates: + * time (time) datetime64[ns] 13kB 2015-01-01 ... 2019-01-01 +Data variables: (12/14) + id (time) ... +``` + + + For more information on accessing and querying data, check out [querying data](/datasets/query). + + + +## View the data in the console + +You can also view your data in the Console by navigating to the dataset, selecting the collection, and clicking +on one of the data points. + + + Explore the MODIS dataset + Explore the MODIS dataset + + +## Next steps + +Congrats! You've successfully ingested data into Tilebox. You can now explore the data in the console and use it for +further processing and analysis. + + + + Learn all about [querying your newly created dataset](https://docs.tilebox.com/datasets/query) + + + Explore the different dataset types available in Tilebox + + + Check out a growing number of publicly available open data datasets on Tilebox + + diff --git a/introduction.mdx b/introduction.mdx index 76e05e2..1b557a7 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -70,9 +70,9 @@ You can also start by looking through these guides: Learn about time-series datasets and their structure. Discover how to query and load data from a dataset.
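The ingest guide above ends with `collection.ingest` returning one ID per datapoint, and datapoint IDs are also what `Collection.delete` accepts. Since a delete request is rejected wholesale if any ID is not a valid UUID, it can be worth pre-validating a batch locally before issuing the call. A minimal standard-library sketch — the `validate_datapoint_ids` helper and the example IDs are illustrative, not part of the Tilebox SDK:

```python
from uuid import UUID

def validate_datapoint_ids(ids: list[str]) -> list[str]:
    """Return the IDs that are not valid UUIDs, without calling any API."""
    invalid = []
    for datapoint_id in ids:
        try:
            UUID(datapoint_id)  # raises ValueError for malformed IDs
        except ValueError:
            invalid.append(datapoint_id)
    return invalid

# One well-formed UUID string and one malformed string (made-up examples)
ids = ["0195c87a-49f6-5ffa-e3cb-92215d057ea6", "not-a-uuid"]
print(validate_datapoint_ids(ids))  # ['not-a-uuid']
```

Only once this returns an empty list would you pass the batch on to `collection.delete(ids)`.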
diff --git a/mint.json b/mint.json index 7178c93..2fe2d48 100644 --- a/mint.json +++ b/mint.json @@ -38,6 +38,14 @@ "url": "https://console.tilebox.com" }, "tabs": [ + { + "name": "User Guides", + "url": "guides" + }, + { + "name": "Languages & SDKs", + "url": "sdks" + }, { "name": "API Reference", "url": "api-reference" @@ -75,46 +83,35 @@ ] }, { - "group": "SDKs", + "group": "Datasets", "pages": [ + "datasets/introduction", { - "group": "Python", - "icon": "python", + "group": "Concepts", + "icon": "circle-nodes", "pages": [ - "sdks/python/install", - "sdks/python/sample-notebooks", - "sdks/python/xarray", - "sdks/python/async", - "sdks/python/geometries" + "datasets/concepts/datasets", + "datasets/concepts/collections" ] }, { - "group": "Go", - "icon": "golang", + "group": "Dataset Types", + "icon": "puzzle", "pages": [ - "sdks/go/introduction" + "datasets/types/timeseries", + "datasets/types/spatiotemporal" ] - } + }, + "datasets/query", + "datasets/ingest", + "datasets/delete", + "datasets/open-data" ] }, { - "group": "Datasets", + "group": "Storage", "pages": [ - "datasets/introduction", - "datasets/timeseries", - "datasets/collections", - "datasets/loading-data", - "datasets/ingest-delete-data", - { - "group": "Managed datasets", - "icon": "bars-progress", - "pages": [ - "datasets/managed-datasets/creating-dataset", - "datasets/managed-datasets/data-types" - ] - }, - "datasets/open-data", - "datasets/storage-clients" + "storage/clients" ] }, { @@ -152,6 +149,36 @@ } ] }, + { + "group": "Tilebox SDKs", + "pages": [ + "sdks/introduction" + ] + }, + { + "group": "Python", + "pages": [ + "sdks/python/install", + "sdks/python/sample-notebooks", + "sdks/python/xarray", + "sdks/python/async", + "sdks/python/geometries" + ] + }, + { + "group": "Go", + "icon": "golang", + "pages": [ + "sdks/go/introduction" + ] + }, + { + "group": "Datasets", + "pages": [ + "guides/datasets/create", + "guides/datasets/ingest" + ] + }, { "group": "tilebox.datasets", "pages": [ @@ 
-159,14 +186,14 @@ "api-reference/tilebox.datasets/Client.datasets", "api-reference/tilebox.datasets/Client.dataset", "api-reference/tilebox.datasets/Dataset.collections", + "api-reference/tilebox.datasets/Dataset.collection", "api-reference/tilebox.datasets/Dataset.create_collection", "api-reference/tilebox.datasets/Dataset.get_or_create_collection", - "api-reference/tilebox.datasets/Dataset.collection", - "api-reference/tilebox.datasets/Collection.info", - "api-reference/tilebox.datasets/Collection.load", - "api-reference/tilebox.datasets/Collection.find", "api-reference/tilebox.datasets/Collection.delete", - "api-reference/tilebox.datasets/Collection.delete_ids" + "api-reference/tilebox.datasets/Collection.find", + "api-reference/tilebox.datasets/Collection.info", + "api-reference/tilebox.datasets/Collection.ingest", + "api-reference/tilebox.datasets/Collection.load" ] }, { diff --git a/quickstart.mdx b/quickstart.mdx index 31a4950..0cc51fb 100644 --- a/quickstart.mdx +++ b/quickstart.mdx @@ -18,11 +18,26 @@ If you prefer to work locally, follow these steps to get started. - Install the Tilebox Python packages. The easiest way to do this is using `pip`: + Install the Tilebox Python packages. + + ```bash uv + uv add tilebox-datasets tilebox-workflows tilebox-storage ``` + ```bash pip pip install tilebox-datasets tilebox-workflows tilebox-storage ``` + ```bash poetry + poetry add tilebox-datasets="*" tilebox-workflows="*" tilebox-storage="*" + ``` + ```bash pipenv + pipenv install tilebox-datasets tilebox-workflows tilebox-storage + ``` + + + + For new projects, we recommend using [uv](https://docs.astral.sh/uv/). More information about installing the Tilebox Python SDKs can be found in the [Installation](/sdks/python/install) section. + Create an API key by logging into the [Tilebox Console](https://console.tilebox.com), navigating to [Account -> API Keys](https://console.tilebox.com/account/api-keys), and clicking the "Create API Key" button.
@@ -98,18 +113,12 @@ If you prefer to work locally, follow these steps to get started. Review the following guides to learn more about the modules that make up Tilebox: - - + + Learn how to create a Timeseries dataset using the Tilebox Console. + + + Learn how to ingest an existing GeoParquet dataset into a Timeseries dataset collection. + diff --git a/sdks/go/introduction.mdx b/sdks/go/introduction.mdx index 3e5ec9a..8cf4678 100644 --- a/sdks/go/introduction.mdx +++ b/sdks/go/introduction.mdx @@ -1,7 +1,9 @@ --- title: Introduction -description: Learn about the Tilebox GO SDK +description: Learn about the Tilebox Go SDK icon: wrench --- -Hang tight - Go support for Tilebox is coming soon. + + The Tilebox Go SDK is currently in development. Stay tuned for updates! + diff --git a/sdks/introduction.mdx b/sdks/introduction.mdx new file mode 100644 index 0000000..a2b082c --- /dev/null +++ b/sdks/introduction.mdx @@ -0,0 +1,27 @@ +--- +title: Tilebox languages and SDKs +sidebarTitle: Overview +description: Tilebox supports multiple languages and SDKs for accessing datasets and running workflows. +icon: book-open +--- + +import { HeroCard } from '/snippets/components.mdx'; + +The following language SDKs are currently available for Tilebox. Select one to learn more. + + + + Tilebox Python + + + Tilebox Go + +
+When working with external datasets, such as [Tilebox datasets](/datasets/concepts/datasets), loading data may take some time. To speed up this process, you can run requests in parallel. While you can use multi-threading or multi-processing, which can be complex, often a simpler option is to perform data loading tasks asynchronously using coroutines and `asyncio`. ## Switching to an async datasets client diff --git a/sdks/python/install.mdx b/sdks/python/install.mdx index 9a01583..2761838 100644 --- a/sdks/python/install.mdx +++ b/sdks/python/install.mdx @@ -19,15 +19,19 @@ Tilebox offers a Python SDK for accessing Tilebox services. The SDK includes sep ## Installation -Install the Tilebox python packages using your preferred package manager: +Install the Tilebox Python packages using your preferred package manager. + + + For new projects, we recommend using [uv](https://docs.astral.sh/uv/). + -```bash pip -pip install tilebox-datasets tilebox-workflows tilebox-storage -``` ```bash uv uv add tilebox-datasets tilebox-workflows tilebox-storage ``` +```bash pip +pip install tilebox-datasets tilebox-workflows tilebox-storage +``` ```bash poetry poetry add tilebox-datasets="*" tilebox-workflows="*" tilebox-storage="*" ``` diff --git a/sdks/python/xarray.mdx b/sdks/python/xarray.mdx index 416b3e9..eea7ea3 100644 --- a/sdks/python/xarray.mdx +++ b/sdks/python/xarray.mdx @@ -35,7 +35,7 @@ The Tilebox Python client provides access to satellite data as an [xarray.Datase ## Example dataset -To understand how Xarray functions, below is a quick a look at a sample dataset as it might be retrieved from a [Tilebox datasets](/datasets/timeseries) client. +To understand how Xarray functions, below is a quick look at a sample dataset as it might be retrieved from a [Tilebox datasets](/datasets/concepts/datasets) client.
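The xarray page changed above describes data being returned as an `xarray.Dataset` keyed by a `time` coordinate. As a rough, self-contained illustration of that shape — using entirely synthetic data, not an actual Tilebox response — you can build and slice a small time-indexed dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Build a small synthetic dataset mirroring the general shape of
# time-series query results: a "time" coordinate plus per-datapoint variables.
times = pd.date_range("2015-01-01", periods=10, freq="D")
ds = xr.Dataset(
    data_vars={
        "granule_name": ("time", [f"granule_{i}" for i in range(10)]),
        "file_size": ("time", np.arange(10) * 1000),
    },
    coords={"time": times},
)

# Label-based slicing by time range (both endpoints inclusive)
subset = ds.sel(time=slice("2015-01-03", "2015-01-06"))
print(subset.sizes["time"])  # 4
```

This is the same selection pattern you would apply to a dataset loaded from a collection.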
diff --git a/snippets/components.mdx b/snippets/components.mdx index 3c5fe3a..53267c8 100644 --- a/snippets/components.mdx +++ b/snippets/components.mdx @@ -10,3 +10,13 @@ export const HeroCard = ({ children, title, description, href }) => { ); }; + + +export const CodeOutputHeader = () => { + return ( +
+ + Output +
+ ) +} diff --git a/datasets/storage-clients.mdx b/storage/clients.mdx similarity index 98% rename from datasets/storage-clients.mdx rename to storage/clients.mdx index 8bf809c..62a5fc5 100644 --- a/datasets/storage-clients.mdx +++ b/storage/clients.mdx @@ -4,7 +4,7 @@ description: Learn about the different storage clients available in Tilebox to a icon: hard-drive --- -Tilebox does not host the actual open data satellite products but instead relies on publicly accessible storage providers for data access. Instead Tilebox ingests available metadata as [datasets](/datasets/timeseries) to enable high performance querying and structured access of the data as [xarray.Dataset](/sdks/python/xarray). +Tilebox does not host the actual open data satellite products; instead, it relies on publicly accessible storage providers for data access. Tilebox ingests the available metadata as [datasets](/datasets/concepts/datasets) to enable high-performance querying and structured access to the data as [xarray.Dataset](/sdks/python/xarray). Below is a list of the storage providers currently supported by Tilebox.