# How to use Pandas and GeoPandas

A `GeoPandasAdapter` is provided in the package *here-geopandas-adapter* to ease working with the *Pandas* and *GeoPandas* libraries. Once imported, instantiated and enabled in the `Platform`, many read and write functions of the HERE Data SDK for Python accept and return `pd.DataFrame`, `pd.Series`, `gpd.GeoDataFrame` and `gpd.GeoSeries` in place of Python `list` and `dict` objects.

* [Enabling the adapter](here-geopandas-adapter.md#enabling-the-adapter)
* [Read to DataFrame](here-geopandas-adapter.md#read-to-dataframe)
* [Write DataFrame to layer](https://docs.here.com/data-sdk/docs/here-platform-layers#write-dataframe-to-layer)
* [Manipulate DataFrames](here-geopandas-adapter.md#manipulate-dataframes-and-geodataframes)
## Enabling the adapter

The HERE GeoPandas Adapter can be applied in any of three ways:

* to all read/write operations
* on a per-catalog basis
* on a per-function-call basis

Below we illustrate these three options. To have the adapter apply to all catalogs and other entities created through a `Platform` object, specify the adapter when instantiating that `Platform` object:

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
```

It's also possible to enable the adapter only for selected catalogs, specifying it in the corresponding `get_catalog` call:

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

# These catalogs use the adapter
weather_eu = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu', adapter=adapter)
weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na', adapter=adapter)

# This catalog does not
sdii = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
```

Lastly, it's also possible to specify the use of the adapter in single function calls:

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform()
adapter = GeoPandasAdapter()

weather_na = platform.get_catalog('hrn:here:data::olp-here:live-weather-na')
live_layer = weather_na.get_layer('latest-data')

# This function uses the adapter
weather_df = live_layer.read_partitions([75477, 75648, 75391, 75562],
                                        adapter=adapter,
                                        record_path="weather_condition_tile")

# This function does not
weather_msgs = live_layer.read_partitions([75477, 75648, 75391, 75562])
```


## Read to DataFrame

When reading and decoding data, adapter-specific parameters are passed to `pd.read_csv`, `pd.read_parquet` and the similar *Pandas* functions that perform the actual decoding of each single partition. You can use them to fine-tune the decoding of single partitions, including how to handle the (Geo)DataFrame index, if present in the data.
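As a rough illustration of this forwarding (the `csv_layer` name and the sample data below are hypothetical), keyword arguments such as `sep` or `index_col` end up in the `pd.read_csv` call that decodes each CSV-encoded partition:

```python
import io

import pandas as pd

# Hypothetical call on a CSV-encoded layer; sep and index_col are
# forwarded to the Pandas decoder:
#
#     df = csv_layer.read_partitions([1, 2], sep=";", index_col="id")
#
# Per partition, the decoding is then roughly equivalent to:
raw = b"id;value\n1;10\n2;20\n"
df = pd.read_csv(io.BytesIO(raw), sep=";", index_col="id")
```

Here the `id` column becomes the DataFrame index instead of a regular column, exactly as if `index_col` had been passed to `pd.read_csv` directly.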


### Get partitions data and metadata from versioned layer in a DataFrame

Use [get_partitions_metadata](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.platform.layer.html#here.platform.layer.VersionedLayer.get_partitions_metadata) to obtain partitions metadata. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` is returned instead of a `list` or `dict`, as shown in the example below.

**Example: getting versioned metadata in a DataFrame**

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
versioned_layer = sdii_catalog.get_layer("sample-versioned-layer")
partitions_df = versioned_layer.get_partitions_metadata([377894434, 377894435, 377894440, 377894441])
```

Partitions metadata are returned in a DataFrame that is not indexed.

| | id | data_handle | checksum | data_size | crc |
| --: | --: | :-- | :-- | --: | :-- |
| 0 | 377894434 | e2eefcae-e695-4f98-8a55-6881ca1ef52d | | 7697 | |
| 1 | 377894435 | da494218-e5b9-4538-9860-624864a718a7 | | 11963 | |
| 2 | 377894440 | ef395fe1-51b4-4909-bd3c-3883d88d66b3 | | 569494 | |
| 3 | 377894441 | a5e1f634-7fbb-43f6-bbdb-7e91edc67879 | | 342066 | |

Use [read_partitions](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.platform.layer.html#here.platform.layer.VersionedLayer.read_partitions) to fetch and decode the data.
When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` or a `gpd.GeoDataFrame`, depending on the content, is returned instead of a `list` or `dict`, as shown in the example below.

**Example: reading versioned data in a DataFrame**

```python
partitions_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435])
```

Partitions data are returned in a DataFrame that is not indexed. Only one `pd.DataFrame` or `gpd.GeoDataFrame` is returned: data of multiple partitions are all included in the same output, and a `partition_id` column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no `partition_ids` are provided, the whole layer is read. This specific example reads content encoded in *Protobuf* format.

| | partition_id | tileId | messages | refs |
| --: | --: | --: | :-- | :-- |
| 0 | 377894434 | 377894434 | [{'messageId': 'ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro | [] |
| 1 | 377894435 | 377894435 | [{'messageId': '4418dfe4-091e-41fe-bb21-49d6524442af', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro | [] |

*(text truncated for clarity)*

Depending on the content type and actual schema, the returned DataFrame may be directly usable or require further manipulation to bring it to a usable form. *CSV*, *GeoJSON*, *Parquet* and schemaless content types are decoded and converted automatically to the most suitable format; for example, *GeoJSON* is decoded into a `gpd.GeoDataFrame`. *Protobuf*-encoded data usually has nested, composite and repeated fields, lists, dictionaries, and other complex data structures.

The documentation of [GeoPandasDecoder](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.geopandas_adapter.geopandas_adapter.html#here.geopandas_adapter.geopandas_adapter.GeoPandasDecoder) illustrates the parameters that can be used to fine-tune the decoding and improve the resulting output for every content type, in particular for *Protobuf*-encoded data. The `record_path` parameter is very common: when specified, only content at that path is decoded. If the field at the given path happens to be a repeated field, the function returns multiple rows per partition. Dictionaries are also unpacked automatically to multiple columns, when possible.
Continuing the example above, we read the same partitions again, specifying the `record_path` parameter and selecting only some columns for clarity:

```python
columns = ["messageId", "message.envelope.transientVehicleUUID",
           "message.path.positionEstimate", "metadata.receivedTime"]
messages_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435],
                                              record_path="messages", columns=columns)
```

results in:

| | partition_id | messageId | message.envelope.transientVehicleUUID | message.path.positionEstimate | metadata.receivedTime |
| --: | --: | :-- | :-- | :-- | --: |
| 0 | 377894434 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | [{'timeStampUTC_ms': '1506403044000', 'positionTyp | 1507151512491 |
| 1 | 377894434 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | [{'timeStampUTC_ms': '1506402922000', 'positionTyp | 1507151512491 |
| 2 | 377894434 | a86fb17f-27a6-4e47-b2fb-77ec61000625 | a86fb17f-27a6-4e47-b2fb-77ec61000625 | [{'timeStampUTC_ms': '1506403015000', 'positionTyp | 1507151512491 |
| 3 | 377894434 | 79bba846-b804-4026-a980-7d4045e7a493 | 79bba846-b804-4026-a980-7d4045e7a493 | [{'timeStampUTC_ms': '1506403037000', 'positionTyp | 1507151512491 |
| 4 | 377894434 | cc71d131-e8ed-4269-b1d1-d9c4c3108408 | cc71d131-e8ed-4269-b1d1-d9c4c3108408 | [{'timeStampUTC_ms': '1506402944000', 'positionTyp | 1507151512492 |

*(text and rows truncated for clarity)*

The `partition_id` column is always added automatically after decoding.
The column `message.path.positionEstimate` contains a list that can be further processed, turning the DataFrame from one row per message into one row per position estimate:

```python
from here.geopandas_adapter.utils.dataframe import unpack_columns

estimates_df = messages_df[["messageId", "message.path.positionEstimate"]].explode("message.path.positionEstimate")
estimates_df = unpack_columns(estimates_df, "message.path.positionEstimate", keep_prefix=False)
```

results in:

| | messageId | timeStampUTC_ms | positionType | longitude_deg | latitude_deg | horizontalAccuracy_m | heading_deg | speed_mps | mapMatchedLinkID | mapMatchedLinkIDOffset_m |
| --: | :-- | --: | :-- | --: | --: | --: | --: | --: | --: | --: |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403044000 | RAW_GPS | 13.3611 | 52.5099 | 0 | 90.8589 | 16 | 175536727 | 0 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403046000 | RAW_GPS | 13.3616 | 52.5099 | 0 | 91.4001 | 16 | 175536727 | 32 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403048000 | RAW_GPS | 13.3621 | 52.5098 | 0 | 91.5694 | 16 | 175536727 | 64 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403050000 | RAW_GPS | 13.3625 | 52.5098 | 0 | 91.5694 | 16 | 175536727 | 92.1063 |
| 1 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | 1506402922000 | RAW_GPS | 13.3731 | 52.5092 | 0 | 85.7321 | 16 | 180105322 | 0 |

*(columns and rows truncated for clarity)*

### Get partitions data and metadata from volatile layer in a DataFrame

Use [get_partitions_metadata](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.platform.layer.html#here.platform.layer.VolatileLayer.get_partitions_metadata) to obtain partitions metadata.
When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` is returned instead of a `list` or `dict`, as shown in the example below.

**Example: getting volatile metadata in a DataFrame**

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
weather_catalog = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu')
volatile_layer = weather_catalog.get_layer('latest-data')
partitions_df = volatile_layer.get_partitions_metadata(partition_ids=[81150, 81151])
```

Partitions metadata are returned in a DataFrame that is not indexed.

| | id | data_handle | checksum | data_size | crc |
| --: | --: | --: | :-- | :-- | :-- |
| 0 | 81150 | 81150 | | | |
| 1 | 81151 | 81151 | | | |

Use read_partitions to fetch and decode the data. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` or a `gpd.GeoDataFrame`, depending on the content, is returned instead of a `list` or `dict`, as shown in the example below.

**Note:** Volatile metadata and underlying data can occasionally be out of sync. When this occurs, metadata may indicate that data exists in a given partition, but at the current point in time there is no data residing there. If you call `read_partitions` and one or more of the requested partitions do not exist or contain no data, no rows are added to the returned DataFrame for those partitions. This could result in an empty DataFrame being returned.
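Given that possibility, a minimal defensive check sketched with plain *Pandas* (`partitions_df` below is only a stand-in for the value that `read_partitions` would return):

```python
import pandas as pd

# Stand-in for a read_partitions result where none of the requested
# volatile partitions currently holds data.
partitions_df = pd.DataFrame()

# Guard against the out-of-sync case before further processing.
if partitions_df.empty:
    print("No data available for the requested partitions")
```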
**Example: reading volatile data in a DataFrame**

```python
partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151],
                                               record_path="weather_condition_tile")
```

Partitions data are returned in a DataFrame that is not indexed. Only one `pd.DataFrame` or `gpd.GeoDataFrame` is returned. Data of multiple partitions are all included in the same output. A `partition_id` column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no `partition_ids` are provided, the whole layer is read. This specific example reads content encoded in *Protobuf* format.

```python
columns = ["tile_id", "center_point_geohash", "air_temperature.value",
           "dew_point_temperature.value", "humidity.value", "air_pressure.value",
           "visibility.value", "iop.value", "wind_velocity.value",
           "wind_velocity.direction", "precipitation_type.precipitation_type"]
partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151],
                                               record_path="weather_condition_tile",
                                               columns=columns)
```

In this example we select only some columns obtained from the *Protobuf* repeated field `weather_condition_tile`, resulting in:

| | partition_id | tile_id | center_point_geohash | air_temperature.value | dew_point_temperature.value | humidity.value | air_pressure.value | visibility.value | iop.value | wind_velocity.value | wind_velocity.direction | precipitation_type.precipitation_type |
| --: | --: | --: | :-- | --: | --: | --: | --: | --: | --: | --: | --: | :-- |
| 0 | 81150 | 332391761 | g7ybnf00 | 4.83 | 2 | 82.09 | 1003.09 | 9.99 | 0 | 33.5 | 22.81 | NONE |
| 1 | 81150 | 332391760 | g7ybn600 | 4.84 | 2 | 82.04 | 1003.08 | 9.99 | 0 | 33.47 | 22.73 | NONE |
| 2 | 81150 | 332391767 | g7ybpy00 | 4.8 | 2 | 82.26 | 1003.12 | 9.99 | 0 | 33.62 | 23.09 | NONE |
| 3 | 81150 | 332391765 | g7ybpf00 | 4.81 | 2 | 82.18 | 1003.11 | 9.99 | 0 | 33.57 | 22.97 | NONE |
| 4 | 81150 | 332391764 | g7ybp600 | 4.82 | 2 | 82.14 | 1003.1 | 9.99 | 0 | 33.53 | 22.89 | NONE |

*(rows truncated for clarity)*

### Get partitions data and metadata from index layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` is returned instead of a `list` or `dict`, as shown in the example below.

**Example: getting index metadata in a DataFrame**

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
index_layer = sdii_catalog.get_layer("sample-index-layer")
partitions_df = index_layer.get_partitions_metadata(query="hour_from=ge=10")
```

Partitions metadata are returned in a DataFrame that is not indexed. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.
| | id | data_handle | checksum | data_size | crc |
| --: | :-- | :-- | :-- | --: | :-- |
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | 0353f45622ac843ccabbc8af4ce6739d5baf171a | 290391 | |
| 1 | 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 | 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 | 1a1472a4de647291da7498407b59a2011af6c25c | 113261 | |
| 2 | 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 | 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 | 74b94f931c3bda3a7500eadaf34506445c0a10ba | 356674 | |
| 3 | 2fed9456-7275-4786-b600-0c4865854b79 | 2fed9456-7275-4786-b600-0c4865854b79 | ad68c63881bfeae3635d64270df4e13202049f54 | 115175 | |
| 4 | 3b0c053b-8988-4621-92d7-9daf65e7d4a7 | 3b0c053b-8988-4621-92d7-9daf65e7d4a7 | e7aca6afb0a37ed46d9e11a8c2ed73afa9eae1d0 | 114945 | |

Use read_partitions to fetch and decode the data. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` or a `gpd.GeoDataFrame`, depending on the content, is returned instead of a `list` or `dict`, as shown in the example below. If no `partition_ids` are provided, the whole layer is read.

**Example: reading index data in a DataFrame**

```python
partitions_df = index_layer.read_partitions(query="hour_from=ge=10")
```

Partitions data are returned in a DataFrame that is not indexed. Only one `pd.DataFrame` or `gpd.GeoDataFrame` is returned.
Data of multiple partitions are all included in the same output. A `partition_id` column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.

| | partition_id | envelope | path | pathEvents | pathMedia |
| --: | :-- | :-- | :-- | :-- | :-- |
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul | {'positionEstimate': array([{'timeStampUTC_ms': 15 | {'vehicleStatus': None, 'vehicleDynamics': None, ' | |
| 1 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul | {'positionEstimate': array([{'timeStampUTC_ms': 15 | {'vehicleStatus': None, 'vehicleDynamics': None, ' | |
| 2 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul | {'positionEstimate': array([{'timeStampUTC_ms': 15 | {'vehicleStatus': None, 'vehicleDynamics': None, ' | |
| 3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul | {'positionEstimate': array([{'timeStampUTC_ms': 15 | {'vehicleStatus': None, 'vehicleDynamics': None, ' | |
| 4 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul | {'positionEstimate': array([{'timeStampUTC_ms': 15 | {'vehicleStatus': None, 'vehicleDynamics': None, ' | |

*(text and rows truncated for clarity)*

In this specific example, as demonstrated for other layer types and described in detail in the section [Manipulate DataFrames and GeoDataFrames](https://docs.here.com/data-sdk/docs/here-geopandas-adapter#manipulate-dataframes-and-geodataframes), it's convenient to use the `unpack_columns` function to further unpack the dictionaries into proper columns:

```python
from here.geopandas_adapter.utils import dataframe

columns = ["partition_id", "pathEvents"]
events_df = dataframe.unpack_columns(partitions_df[columns], ["pathEvents"], keep_prefix=False)
```

resulting in:

| | partition_id | vehicleStatus | vehicleDynamics | signRecognition | laneBoundaryRecognition | exceptionalVehicleState | proprietaryInfo | environmentStatus |
| --: | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | | | [{'timeStampUTC_ms': 1506402914000, 'positionOffse | | | | |
| 1 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | | | [{'timeStampUTC_ms': 1506403395000, 'positionOffse | | | | |
| 2 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | | | [{'timeStampUTC_ms': 1506403082000, 'positionOffse | | | | |
| 3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | | | None | | | | |
| 4 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | | | [{'timeStampUTC_ms': 1506403131000, 'positionOffse | | | | |

*(text, columns and rows truncated for clarity)*

### Get partitions data and metadata from stream layer in a DataFrame

Use get_stream_metadata to consume partitions metadata from a stream subscription. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` is returned instead of a `list` or `dict`, as shown in the example below.
**Example: getting stream metadata in a DataFrame**

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
stream_layer = sdii_catalog.get_layer("sample-streaming-layer")

with stream_layer.subscribe() as subscription:
    partitions_df = stream_layer.get_stream_metadata(subscription=subscription)
```

Partitions metadata (stream messages) are returned in a DataFrame that is not indexed. Data can be inlined, as in this example, or stored via the Blob API if too large.

| | id | data_handle | data_size | data | checksum | crc | timestamp | kafka_partition | kafka_offset |
| --: | :-- | :-- | :-- | :-- | :-- | :-- | :-- | --: | --: |
| 0 | c755c5f5-3e01-4398-a3cd-f9a99393b5b4 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula | c5b9d6040e7cb1ca805f20e26e3c5e3f818d3cc59b9f637c443b9b7b90018fa0 | | 2021-11-26 14:00:52.695000 | 3 | 18856435 |
| 1 | b69f5967-1408-44d9-9f2a-6e6fd4ec274a | | | b'\nB\n\x031.0\x12\x15Probe Route Simula | bff2e955dff1d35c0a52916aafce8200ebf876c8055204b56d688929fae4ff70 | | 2021-11-26 14:00:57.833000 | 3 | 18856436 |
| 2 | 14eb5324-1c3b-44dc-8632-47cfa1dc051e | | | b'\nB\n\x031.0\x12\x15Probe Route Simula | 2463cf999a2d97d991adef6af957ed34a3902a1619b3b6f447c4f61c2dd162b6 | | 2021-11-26 14:01:01.933000 | 3 | 18856437 |
| 3 | 03c70b04-1f15-46a2-8745-15793cac4eb5 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula | ee4432e0d4a6d52727ab4c1ea38d61672172b30dd90598f3f9b7d082a601f3ab | | 2021-11-26 14:01:05.037000 | 3 | 18856438 |
| 4 | 2ba84d9e-a4fd-44b5-980b-8db2f04d80b6 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula | be4406f678f4ae882fe85e153f62ebab55270772dea094eae49a11358c6dd222 | | 2021-11-26 14:01:11.253000 | 3 | 18856439 |

*(text and rows truncated for clarity)*

Use read_stream to consume, fetch and decode the data from a stream subscription. When the `GeoPandasAdapter` is enabled, a `pd.DataFrame` or a `gpd.GeoDataFrame`, depending on the content, is returned instead of a `list` or `dict`, as shown in the example below.

**Example: reading stream data in a DataFrame**

In this example we show how adapter-specific parameters, such as `record_path`, can be used to customize the decoding. We're interested in only a selection of the properties of the data. This specific example reads content encoded in *Protobuf* format.

```python
with stream_layer.subscribe() as subscription:
    columns = ["timeStampUTC_ms", "latitude_deg", "longitude_deg", "heading_deg", "speed_mps"]
    partitions_df = stream_layer.read_stream(subscription=subscription,
                                             record_path="path.positionEstimate",
                                             columns=columns)
```

Partitions data are returned in a DataFrame that is not indexed. Only one `pd.DataFrame` or `gpd.GeoDataFrame` is returned. Data of multiple partitions are all included in the same output. A `partition_id` column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer.
| | partition_id | partition_timestamp | timeStampUTC_ms | latitude_deg | longitude_deg | heading_deg | speed_mps |
| --: | :-- | :-- | --: | --: | --: | --: | --: |
| 0 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934814720 | 52.5263 | 13.3499 | 276.471 | 16 |
| 1 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934816720 | 52.5263 | 13.3496 | 268.154 | 16 |
| 2 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934818720 | 52.5263 | 13.3491 | 268.179 | 16 |
| 3 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934820720 | 52.5263 | 13.3486 | 268.946 | 16 |
| 4 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934822720 | 52.5263 | 13.3482 | 269.345 | 16 |

*(rows truncated for clarity)*

### Get features from interactive map layer in a GeoDataFrame

Use search_features to retrieve features from one interactive map layer. When the `GeoPandasAdapter` is enabled, a `gpd.GeoDataFrame` is returned instead of a `list` or `dict`, as shown in the example below. The layer supports other functions, among which [get_features](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.platform.layer.html#here.platform.layer.InteractiveMapLayer.get_features) and [spatial_search](https://developer.here.com/documentation/sdk-python-v2/api_reference/here.platform.layer.html#here.platform.layer.InteractiveMapLayer.spatial_search), that query and retrieve features from the layer. A GeoDataFrame is returned from these functions as well.

When running in Jupyter notebooks, a GeoDataFrame enables an effortless, visual inspection of the features over a map, as demonstrated by using the [HERE Inspector](https://docs.here.com/data-sdk/docs/here-inspector) in the examples below.
**Example: reading features in a GeoDataFrame**

In this example we retrieve the districts (*Bezirk*) of Berlin from a sample catalog and a sample interactive map layer.

```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")
features_gdf = iml_layer.search_features()
```

`search_features` without parameters returns all the content, resulting in:

| | geometry | Bez | BezName | @ns:com:here:xyz |
| :-- | :-- | --: | :-- | :-- |
| pjB2hRwTpsW2ZAoP | MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028 | 01 | Mitte | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| bzuUAjSSniAlAza3 | MULTIPOLYGON Z (((13.491453 52.488265 0, 13.490708 | 02 | Friedrichshain-Kreuzberg | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| p6PdohLKy98613Yh | MULTIPOLYGON Z (((13.523023 52.645034 0, 13.522967 | 03 | Pankow | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| rBPLWN1rBqpn3e48 | MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344 | 04 | Charlottenburg-Wilmersdorf | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| Jawrgifeu6bFL4SE | MULTIPOLYGON Z (((13.282182 52.53405 0, 13.282092 | 05 | Spandau | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

*(text and rows truncated for clarity)*

It's also possible to specify search parameters, as in the following case:

```python
features_gdf = iml_layer.search_features(params={"p.BezName": "Pankow"}, force_2d=True)
```

resulting in the selection of just one district and the removal of the z-level from the coordinates:

| | geometry | Bez | BezName | @ns:com:here:xyz |
| :-- | :-- | --: | :-- | :-- |
| p6PdohLKy98613Yh | MULTIPOLYGON (((13.523023 52.645034, 13.522967 52. | 03 | Pankow | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

*(text truncated for clarity)*

The result can be rendered directly on a map when running in a Jupyter notebook, for example using the [HERE Inspector](https://docs.here.com/data-sdk/docs/here-inspector):

```python
from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts of Berlin", style=Color.BLUE)
```

**Example: geospatial search of features in a GeoDataFrame**

In this example we query the districts of Berlin within a 1000 m distance from a city landmark, the Zoologischer Garten railway station, located at the coordinates visible in the query.
```python
from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")
features_gdf = iml_layer.spatial_search(lng=13.33474, lat=52.50686, radius=1000)
```

resulting in:

| | geometry | Bez | BezName | @ns:com:here:xyz |
| :-- | :-- | --: | :-- | :-- |
| pjB2hRwTpsW2ZAoP | MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028 | 01 | Mitte | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| rBPLWN1rBqpn3e48 | MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344 | 04 | Charlottenburg-Wilmersdorf | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| jLrIE0BxQ6vj5U2a | MULTIPOLYGON Z (((13.427455 52.38578 0, 13.426965 | 07 | Tempelhof-Schöneberg | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

The result can be rendered directly in a Jupyter notebook using:

```python
from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts within 1000m from Berlin Zoologischer Garten railway station", style=Color.RED)
```

## Write DataFrame to layer

To write data and metadata to versioned, volatile, index, stream and interactive map layers, please familiarize yourself first with the write functions described in the [corresponding section](here-platform-layers.md#write-to-layer) of this user guide. For content types supported by the GeoPandas Adapter (see the [table](here-platform-layers.md#supported-formats)), the contents of a DataFrame or GeoDataFrame can be encoded and written to a layer with a single function.
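As a rough sketch of how the adapter maps rows to partitions when writing (plain *Pandas* only; the `versioned_layer` name is hypothetical and the actual upload requires a platform connection): rows are grouped by the `partition_id` column, each group is encoded and stored as a standalone partition, and rows without a partition identifier are discarded.

```python
import pandas as pd

# A DataFrame destined for a layer: the partition_id column determines
# how rows are grouped into partitions; the row with a missing
# partition_id is discarded by the adapter before writing.
df = pd.DataFrame({
    "partition_id": [377894434, 377894434, 377894435, None],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Conceptually, the adapter performs a split equivalent to this, then
# encodes and stores each group as a standalone partition:
groups = {
    pid: group.drop(columns="partition_id")
    for pid, group in df.dropna(subset=["partition_id"]).groupby("partition_id")
}

# versioned_layer.write_partitions(df)  # actual SDK call, needs a Platform
```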
For content types that are not supported, you need to pass `encode=False` and take care of the encoding yourself.

All the standard parameters of `set_partitions_metadata`, `write_partitions`, `append_stream_metadata`, `write_stream`, `write_features`, `update_features` and `delete_features` are supported, in addition to adapter-specific parameters that are forwarded to the adapter and its data encoder.

When writing and encoding data, the `GeoPandasAdapter` splits the (Geo)DataFrame into partitions according to the `partition_id` column. Each selection of rows is then encoded and stored as a standalone partition. Rows with no partition identifier set are discarded.

Adapter-specific parameters are passed to the `DataFrame.to_csv`, `DataFrame.to_parquet` and similar functions that perform the actual encoding of each single partition. You can use them to fine-tune the details of the encoding of single partitions, including how to handle the (Geo)DataFrame index. For more information on supported content types and exact parameters, please see the documentation of `GeoPandasEncoder`.

If `encode=False` is passed to `write_partitions` or `write_stream`, a plain Python collection containing `bytes`, and not a (Geo)DataFrame, must be passed as well, since the adapter is not used and no encoding takes place.

Write examples are symmetric to the read examples shown above.

## Manipulate DataFrames and GeoDataFrames

The commonly used [Pandas](https://pandas.pydata.org/docs/user_guide/index.html) and [GeoPandas](https://geopandas.org/docs.html) libraries are well documented, and many examples showing how to use them to perform data analysis and manipulation are publicly available.

Generally, data is in a tabular representation where each cell of the table contains one value with a defined data type (numeric, string, or other basic type). Map data and, in general, data stored in a catalog can sometimes be highly structured and follow a complex, nested schema.
Dealing with this complexity in *Pandas* can be difficult. Therefore, the HERE Data SDK for Python includes in the *here-geopandas-adapter* package utility functions to perform repetitive tasks and manipulate complex DataFrames, in particular DataFrames with columns that contain dictionaries instead of single values.

### Unpacking series and DataFrames

*Pandas* provides the [explode](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html) function to turn objects of type `list` contained in a column into multiple rows. Similarly, the HERE Data SDK for Python provides the [unpack](https://here-dni.github.io/HEREDataSDKforPythonv2/here.geopandas_adapter.utils.dataframe.html#here.geopandas_adapter.utils.dataframe.unpack) and [unpack\_columns](https://here-dni.github.io/HEREDataSDKforPythonv2/here.geopandas_adapter.utils.dataframe.html#here.geopandas_adapter.utils.dataframe.unpack_columns) functions to turn single columns containing `dict` objects into multiple columns. These are convenience functions to unpack data structures that sometimes result from reading data from catalogs or working with complex data models.

`unpack` is applied to a `Series` containing `dict` objects and returns a `DataFrame`. `unpack_columns` is applied to a `DataFrame` to replace one or more columns that contain `dict` objects with multiple columns, one for each field of the dictionaries. Unpacking is recursive, to deal easily with deeply nested data structures.
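To give an idea of the shape of the result when a `Series` of `dict` objects is flattened, plain pandas offers `pd.json_normalize`. This is only an analogy sketched here for illustration, not the adapter's `unpack` function itself:

```python
import pandas as pd

# A Series of dict objects, like a column read from a catalog.
zip_codes = pd.Series([
    {"min": 10115, "max": 14199},
    {"min": 75001, "max": 75020},
])

# json_normalize flattens each dict into one column per key, similar
# in spirit to applying unpack to a Series of dicts.
flat = pd.json_normalize(zip_codes.tolist())
# flat has columns ['min', 'max'], one row per input dict
```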
**Example: unpacking a DataFrame column that contains dictionaries**

Given the example DataFrame `df`, derived from structured objects:

```python
import pandas as pd

berlin = {
    "name": "Berlin",
    "location": {
        "longitude": 13.408333,
        "latitude": 52.518611,
        "country": {"name": "Deutschland", "code": "DE"}
    },
    "zip_codes": {"min": 10115, "max": 14199},
    "population": 3664088
}

paris = {
    "name": "Paris",
    "location": {
        "longitude": 2.351667,
        "latitude": 48.856667,
        "country": {"name": "France", "code": "FR"}
    },
    "zip_codes": {"min": 75001, "max": 75020},
    "population": 2175601
}

df = pd.DataFrame([berlin, paris])
```

resulting in:

| | name | location | zip\_codes | population |
| -: | :--- | :--- | :--- | ---: |
| 0 | Berlin | \{'longitude': 13.408333, 'latitude': 52.518611, 'country': \{'name': 'Deutschland', 'code': 'DE'}} | \{'min': 10115, 'max': 14199} | 3664088 |
| 1 | Paris | \{'longitude': 2.351667, 'latitude': 48.856667, 'country': \{'name': 'France', 'code': 'FR'}} | \{'min': 75001, 'max': 75020} | 2175601 |

We can unpack the columns `location` and `zip_codes`, which contain dictionaries that would otherwise be difficult to operate on. Unpacking is recursive and also unpacks nested dictionaries, for example `country` contained in `location`:

```python
from here.geopandas_adapter.utils.dataframe import unpack_columns

unpacked_df = unpack_columns(df, columns=["location", "zip_codes"])
```

resulting in:

| | name | location.longitude | location.latitude | location.country.name | location.country.code | zip\_codes.min | zip\_codes.max | population |
| -: | :--- | ---: | ---: | :--- | :--- | ---: | ---: | ---: |
| 0 | Berlin | 13.4083 | 52.5186 | Deutschland | DE | 10115 | 14199 | 3664088 |
| 1 | Paris | 2.35167 | 48.8567 | France | FR | 75001 | 75020 | 2175601 |

### Replacing a column with one or more columns

The function `replace_column` can be used to replace a single column of a `DataFrame` with one or more columns of another DataFrame.

**Example: replacing one column with multiple columns**

Given the example DataFrames `df` and `df2`:

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": [11, 31, 41],
    "col_B": [12, 32, 42],
    "col_C": [14, 34, 42]
}, index=[1, 3, 4])

df2 = pd.DataFrame({
    "col_Bx": [110, 130, 140],
    "col_By": [115, 135, 145]
}, index=[1, 3, 4])
```

resulting in:

| | col\_A | col\_B | col\_C |
| -: | ---: | ---: | ---: |
| 1 | 11 | 12 | 14 |
| 3 | 31 | 32 | 34 |
| 4 | 41 | 42 | 42 |

and:

| | col\_Bx | col\_By |
| -: | ---: | ---: |
| 1 | 110 | 115 |
| 3 | 130 | 135 |
| 4 | 140 | 145 |

We can replace `col_B` with `col_Bx` and `col_By`:

```python
from here.geopandas_adapter.utils.dataframe import replace_column

replaced_df = replace_column(df, "col_B", df2)
```

resulting in:

| | col\_A | col\_Bx | col\_By | col\_C |
| -: | ---: | ---: | ---: | ---: |
| 1 | 11 | 110 | 115 | 14 |
| 3 | 31 | 130 | 135 | 34 |
| 4 | 41 | 140 | 145 | 42 |

### Adding and removing prefixes to column names

The functions `prefix_columns` and `unprefix_columns` are used to add or remove a prefix from the names of selected columns of a `DataFrame`. A separator `.` is added between the prefix and the column names. This is useful to group related columns of a DataFrame under a common prefix (*prefix*), or to remove a lengthy, verbose prefix present in multiple columns (*unprefix*) to obtain a derived DataFrame that is more comfortable to work with.

**Example: prefixing columns with a common prefix**

Given the example DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sarah", "Vivek", "Marco"],
    "age": [41, 29, 35],
    "house_nr": ["1492", "34-35", "48A"],
    "road": ["SE 36th Ave", "Seshadri Road", "Via Giosuè Carducci"],
    "city": ["Portland", "Bengaluru", "Milan"],
    "zip": [97214, 560009, 20123],
    "state": ["OR", "KA", pd.NA],
    "country": ["US", "IN", "IT"],
})
```

resulting in:

| | name | age | house\_nr | road | city | zip | state | country |
| -: | :--- | --: | :--- | :--- | :--- | ---: | :--- | :--- |
| 0 | Sarah | 41 | 1492 | SE 36th Ave | Portland | 97214 | OR | US |
| 1 | Vivek | 29 | 34-35 | Seshadri Road | Bengaluru | 560009 | KA | IN |
| 2 | Marco | 35 | 48A | Via Giosuè Carducci | Milan | 20123 | | IT |

We can group the columns that are part of the address, prefixing them with `address`:

```python
from here.geopandas_adapter.utils.dataframe import prefix_columns

prefixed_df = prefix_columns(df, "address", ["house_nr", "road", "city", "zip", "country", "state"])
```

resulting in:

| | name | age | address.house\_nr | address.road | address.city | address.zip | address.state | address.country |
| -: | :--- | --: | :--- | :--- | :--- | ---: | :--- | :--- |
| 0 | Sarah | 41 | 1492 | SE 36th Ave | Portland | 97214 | OR | US |
| 1 | Vivek | 29 | 34-35 | Seshadri Road | Bengaluru | 560009 | KA | IN |
| 2 | Marco | 35 | 48A | Via Giosuè Carducci | Milan | 20123 | | IT |

**Example: removing a common prefix**

Continuing the example above, we can remove the `address` prefix and obtain the original DataFrame:

```python
from here.geopandas_adapter.utils.dataframe import unprefix_columns

unprefixed_df = unprefix_columns(prefixed_df, "address")
```

resulting in:

| | name | age | house\_nr | road | city | zip | state | country |
| -: | :--- | --: | :--- | :--- | :--- | ---: | :--- | :--- |
| 0 | Sarah | 41 | 1492 | SE 36th Ave | Portland | 97214 | OR | US |
| 1 | Vivek | 29 | 34-35 | Seshadri Road | Bengaluru | 560009 | KA | IN |
| 2 | Marco | 35 | 48A | Via Giosuè Carducci | Milan | 20123 | | IT |
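For comparison, a selective prefixing similar in spirit to `prefix_columns` can be approximated in plain pandas with `DataFrame.rename`; this sketch is only an analogy using a small made-up DataFrame, not the adapter functions:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Sarah"], "city": ["Portland"], "zip": [97214]})

# Prefix only the address-related columns with "address." using rename,
# leaving the other columns untouched.
address_cols = ["city", "zip"]
renamed = df.rename(columns={c: f"address.{c}" for c in address_cols})
# renamed has columns ['name', 'address.city', 'address.zip']
```

Unlike pandas' built-in `DataFrame.add_prefix`, which prefixes *all* columns, this selective approach and the adapter's `prefix_columns` touch only the columns you name.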