
To read data and metadata from supported layer types into a DataFrame or GeoDataFrame, enable the GeoPandasAdapter and use the standard HERE Data SDK read functions.

To read data and metadata from versioned, volatile, index, stream and interactive map layers, please familiarize yourself first with the read functions described in the corresponding section of this user guide.

All the standard parameters of get_partitions_metadata, read_partitions, read_stream_metadata, read_stream, get_features and iter_features are supported, in addition to adapter-specific parameters that are forwarded to the adapter and its data decoder.

When reading and decoding data, adapter-specific parameters are passed to pd.read_csv, pd.read_parquet and similar Pandas functions that perform the actual decoding of each single partition. You can use them to fine-tune how single partitions are decoded, including how the (Geo)DataFrame index is handled, if present in the data.

The GeoPandasAdapter assembles the output into a single DataFrame. For more information on supported content types and exact parameters, please see the documentation of GeoPandasDecoder. The partition name is saved in a partition_id column, to distinguish data read from one partition from data read from another when reading multiple partitions at once. The actual name of the partition_id column can be configured in the GeoPandasAdapter constructor, together with other parameters that fine-tune decoding of specific formats, such as content following a Protocol Buffers schema.
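The per-partition decoding and concatenation described above can be sketched in plain pandas. This is not the adapter's actual implementation, just a minimal illustration of the behavior, using hand-made CSV payloads for two hypothetical partitions:

```python
import io
import pandas as pd

# Hypothetical raw payloads of two CSV-encoded partitions, keyed by partition id.
raw = {
    "377894434": b"lon,lat\n13.36,52.51\n13.37,52.51\n",
    "377894435": b"lon,lat\n13.38,52.50\n",
}

frames = []
for pid, payload in raw.items():
    # Decode each partition individually, as the adapter does via pd.read_csv.
    df = pd.read_csv(io.BytesIO(payload))
    # Tag every row with the partition it came from.
    df.insert(0, "partition_id", pid)
    frames.append(df)

# Assemble a single DataFrame with a fresh, unindexed range index.
combined = pd.concat(frames, ignore_index=True)
```

Each row of the combined frame keeps its origin in the partition_id column, which is what makes reading multiple partitions at once unambiguous.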

In case decode=False is passed to read_partitions or read_stream, no decoding takes place, the adapter is not used, and a plain Python collection or iterator containing bytes is returned.
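With decode=False, any decoding is then up to the caller. The following sketch simulates such a result for a hypothetical GeoJSON-encoded partition; in real use the bytes would come from the layer:

```python
import json

# Simulated output of read_partitions(..., decode=False): raw bytes per
# partition id (the payload here is hand-made for illustration).
raw_partitions = {
    "377894434": b'{"type": "FeatureCollection", "features": []}',
}

# The adapter is bypassed, so the caller decodes the bytes directly.
decoded = {pid: json.loads(blob) for pid, blob in raw_partitions.items()}
```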

Get partitions data and metadata from a versioned layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict, as shown in the example below.

Example: getting versioned metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
versioned_layer = sdii_catalog.get_layer("sample-versioned-layer")

partitions_df = versioned_layer.get_partitions_metadata([377894434, 377894435, 377894440, 377894441])

Partitions metadata are returned in a DataFrame that is not indexed.

|   | id        | data_handle                          | checksum | data_size | crc |
|---|-----------|--------------------------------------|----------|-----------|-----|
| 0 | 377894434 | e2eefcae-e695-4f98-8a55-6881ca1ef52d |          | 7697      |     |
| 1 | 377894435 | da494218-e5b9-4538-9860-624864a718a7 |          | 11963     |     |
| 2 | 377894440 | ef395fe1-51b4-4909-bd3c-3883d88d66b3 |          | 569494    |     |
| 3 | 377894441 | a5e1f634-7fbb-43f6-bbdb-7e91edc67879 |          | 342066    |     |
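Since the metadata arrives as a regular pd.DataFrame, standard pandas operations apply directly, for example totaling the size of the data about to be downloaded. A sketch with a frame constructed by hand from the values above (in practice it comes from get_partitions_metadata):

```python
import pandas as pd

# Metadata frame built by hand for illustration.
meta_df = pd.DataFrame({
    "id": [377894434, 377894435, 377894440, 377894441],
    "data_size": [7697, 11963, 569494, 342066],
})

# Total bytes across the selected partitions.
total_bytes = meta_df["data_size"].sum()

# Id of the largest partition.
largest = meta_df.loc[meta_df["data_size"].idxmax(), "id"]
```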

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict, as shown in the example below.

Example: reading versioned data in a DataFrame

partitions_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435])

Partitions data are returned in a DataFrame that is not indexed. Only one pd.DataFrame or gpd.GeoDataFrame is returned: data from multiple partitions are all included in the same output, and a partition_id column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no partition_ids are provided, the whole layer is read.

This specific example reads content encoded in Protobuf format.

|   | partition_id | tileId    | messages | refs |
|---|--------------|-----------|----------|------|
| 0 | 377894434    | 377894434 | [{'messageId': 'ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro… | [] |
| 1 | 377894435    | 377894435 | [{'messageId': '4418dfe4-091e-41fe-bb21-49d6524442af', 'message': {'envelope': {'version': '1.0', 'submitter': 'Probe Ro… | [] |

(text truncated for clarity)

Depending on the content type and actual schema, the returned DataFrame may be directly usable or may require further manipulation to bring it into a usable form. CSV, GeoJSON, Parquet and schemaless content types are decoded and converted automatically to the best possible format for the user. For example, GeoJSON is decoded into a gpd.GeoDataFrame. Protobuf-encoded data usually have nested, composite and repeated fields, lists, dictionaries, and other complex data structures.

Documentation of GeoPandasDecoder illustrates parameters that can be used to fine-tune the decoding and improve the resulting output for every content type, but in particular for Protobuf-encoded data. Very common is the record_path parameter: when specified, only content in that path is decoded. If the field at the given path happens to be a repeated field, the function returns multiple rows per partition. Dictionaries are also unpacked automatically to multiple columns, when possible.
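The effect of record_path is analogous to how pandas' own pd.json_normalize treats a record path: a repeated field yields one row per element, and nested dictionaries are unpacked into dotted column names. A sketch with hand-made nested data that mimics the Protobuf structure above:

```python
import pandas as pd

# A hypothetical decoded partition with a repeated "messages" field.
partition = {
    "tileId": 377894434,
    "messages": [
        {"messageId": "m-1", "envelope": {"version": "1.0"}},
        {"messageId": "m-2", "envelope": {"version": "1.0"}},
    ],
}

# Without a record path: one row per partition, "messages" kept as a list column.
whole_df = pd.json_normalize(partition)

# With a record path: one row per repeated element, nested dictionaries
# unpacked into dotted columns such as "envelope.version".
messages_df = pd.json_normalize(partition, record_path="messages")
```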

Continuing the example above, we read again the same partitions, specifying the record_path parameter and selecting only some columns for clarity:

columns = ["messageId", "message.envelope.transientVehicleUUID", "message.path.positionEstimate", "metadata.receivedTime"]

messages_df = versioned_layer.read_partitions(partition_ids=[377894434, 377894435], record_path="messages", columns=columns)

results in:

|   | partition_id | messageId | message.envelope.transientVehicleUUID | message.path.positionEstimate | metadata.receivedTime |
|---|---|---|---|---|---|
| 0 | 377894434 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | [{'timeStampUTC_ms': '1506403044000', 'positionTyp… | 1507151512491 |
| 1 | 377894434 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | [{'timeStampUTC_ms': '1506402922000', 'positionTyp… | 1507151512491 |
| 2 | 377894434 | a86fb17f-27a6-4e47-b2fb-77ec61000625 | a86fb17f-27a6-4e47-b2fb-77ec61000625 | [{'timeStampUTC_ms': '1506403015000', 'positionTyp… | 1507151512491 |
| 3 | 377894434 | 79bba846-b804-4026-a980-7d4045e7a493 | 79bba846-b804-4026-a980-7d4045e7a493 | [{'timeStampUTC_ms': '1506403037000', 'positionTyp… | 1507151512491 |
| 4 | 377894434 | cc71d131-e8ed-4269-b1d1-d9c4c3108408 | cc71d131-e8ed-4269-b1d1-d9c4c3108408 | [{'timeStampUTC_ms': '1506402944000', 'positionTyp… | 1507151512492 |

(text and rows truncated for clarity)

The partition_id column is always added automatically after decoding.

The column message.path.positionEstimate contains a list that can be further processed, turning the DataFrame from one row per message into one row per position estimate:

from here.geopandas_adapter.utils.dataframe import unpack_columns

estimates_df = messages_df[["messageId", "message.path.positionEstimate"]].explode("message.path.positionEstimate")
estimates_df = unpack_columns(estimates_df, "message.path.positionEstimate", keep_prefix=False)

results in:

|   | messageId | timeStampUTC_ms | positionType | longitude_deg | latitude_deg | horizontalAccuracy_m | heading_deg | speed_mps | mapMatchedLinkID | mapMatchedLinkIDOffset_m |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403044000 | RAW_GPS | 13.3611 | 52.5099 | 0 | 90.8589 | 16 | 175536727 | 0 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403046000 | RAW_GPS | 13.3616 | 52.5099 | 0 | 91.4001 | 16 | 175536727 | 32 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403048000 | RAW_GPS | 13.3621 | 52.5098 | 0 | 91.5694 | 16 | 175536727 | 64 |
| 0 | ee7c8af4-fbe0-45e3-9c55-e170f0d2fa64 | 1506403050000 | RAW_GPS | 13.3625 | 52.5098 | 0 | 91.5694 | 16 | 175536727 | 92.1063 |
| 1 | eaa76f08-ed02-4893-b524-9bde9296b9f9 | 1506402922000 | RAW_GPS | 13.3731 | 52.5092 | 0 | 85.7321 | 16 | 180105322 | 0 |

(columns and rows truncated for clarity)
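For readers who want to see what explode plus unpack_columns amount to, the same transformation can be expressed in plain pandas. This sketch uses hand-made data with simplified column names, not the SDK utilities:

```python
import pandas as pd

# A frame shaped like messages_df above: one row per message, with a list
# of position-estimate dicts packed into a single column.
messages_df = pd.DataFrame({
    "messageId": ["m-1", "m-2"],
    "positionEstimate": [
        [{"longitude_deg": 13.36, "latitude_deg": 52.51},
         {"longitude_deg": 13.37, "latitude_deg": 52.51}],
        [{"longitude_deg": 13.38, "latitude_deg": 52.50}],
    ],
})

# One row per list element; the index repeats for rows from the same message.
exploded = messages_df.explode("positionEstimate")

# Unpack each dict into its own columns (akin to unpack_columns with
# keep_prefix=False) and drop the original packed column.
unpacked = pd.concat(
    [exploded.drop(columns="positionEstimate").reset_index(drop=True),
     pd.DataFrame(exploded["positionEstimate"].tolist())],
    axis=1,
)
```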

Get partitions data and metadata from a volatile layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict, as shown in the example below.

Example: getting volatile metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
weather_catalog = platform.get_catalog('hrn:here:data::olp-here:live-weather-eu')
volatile_layer = weather_catalog.get_layer('latest-data')

partitions_df = volatile_layer.get_partitions_metadata(partition_ids=[81150, 81151])

Partitions metadata are returned in a DataFrame that is not indexed.

|   | id    | data_handle | checksum | data_size | crc |
|---|-------|-------------|----------|-----------|-----|
| 0 | 81150 | 81150       |          |           |     |
| 1 | 81151 | 81151       |          |           |     |

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict, as shown in the example below.

Note

Volatile metadata and underlying data can occasionally be out of sync. When this occurs, metadata may indicate that data exists in a given partition but at the current point in time there is no data residing there. In the event you call read_partitions and one or more of the requested partitions do not exist or contain no data, no rows will be added to the returned DataFrame for that partition. This could result in an empty DataFrame being returned.
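To detect which requested partitions came back without data, the requested ids can be compared with the partition_id values actually present in the result. A plain-pandas sketch with a hand-made result frame (ids are illustrative):

```python
import pandas as pd

# A result frame as read_partitions might return it when one of the two
# requested volatile partitions held no data at read time.
requested = [81150, 81151]
partitions_df = pd.DataFrame({
    "partition_id": [81150, 81150],
    "air_temperature.value": [4.83, 4.84],
})

# Absent partitions contribute no rows, so the gap shows up as missing ids.
present = set(partitions_df["partition_id"].unique())
missing = [pid for pid in requested if pid not in present]

# If every requested partition was empty, the whole frame is empty too.
all_empty = partitions_df.empty
```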

Example: reading volatile data in a DataFrame

partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151], record_path="weather_condition_tile")

Partitions data are returned in a DataFrame that is not indexed. Only one pd.DataFrame or gpd.GeoDataFrame is returned: data from multiple partitions are all included in the same output, and a partition_id column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. If no partition_ids are provided, the whole layer is read.

This specific example reads content encoded in Protobuf format.

columns = ["tile_id",
           "center_point_geohash",
           "air_temperature.value",
           "dew_point_temperature.value",
           "humidity.value",
           "air_pressure.value",
           "visibility.value",
           "iop.value",
           "wind_velocity.value",
           "wind_velocity.direction",
           "precipitation_type.precipitation_type"]

partitions_df = volatile_layer.read_partitions(partition_ids=[81150, 81151], record_path="weather_condition_tile", columns=columns)

In this example we select only some columns obtained from the Protobuf repeated field weather_condition_tile, resulting in:

|   | partition_id | tile_id | center_point_geohash | air_temperature.value | dew_point_temperature.value | humidity.value | air_pressure.value | visibility.value | iop.value | wind_velocity.value | wind_velocity.direction | precipitation_type.precipitation_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81150 | 332391761 | g7ybnf00 | 4.83 | 2 | 82.09 | 1003.09 | 9.99 | 0 | 33.5 | 22.81 | NONE |
| 1 | 81150 | 332391760 | g7ybn600 | 4.84 | 2 | 82.04 | 1003.08 | 9.99 | 0 | 33.47 | 22.73 | NONE |
| 2 | 81150 | 332391767 | g7ybpy00 | 4.8 | 2 | 82.26 | 1003.12 | 9.99 | 0 | 33.62 | 23.09 | NONE |
| 3 | 81150 | 332391765 | g7ybpf00 | 4.81 | 2 | 82.18 | 1003.11 | 9.99 | 0 | 33.57 | 22.97 | NONE |
| 4 | 81150 | 332391764 | g7ybp600 | 4.82 | 2 | 82.14 | 1003.1 | 9.99 | 0 | 33.53 | 22.89 | NONE |

(rows truncated for clarity)

Get partitions data and metadata from an index layer in a DataFrame

Use get_partitions_metadata to obtain partitions metadata. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict, as shown in the example below.

Example: getting index metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
index_layer = sdii_catalog.get_layer("sample-index-layer")

partitions_df = index_layer.get_partitions_metadata(query="hour_from=ge=10")

Partitions metadata are returned in a DataFrame that is not indexed. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.

|   | id | data_handle | checksum | data_size | crc |
|---|---|---|---|---|---|
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | 0353f45622ac843ccabbc8af4ce6739d5baf171a | 290391 | |
| 1 | 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 | 1f9c8d0a-2519-4cd8-af4a-0fd0fa16b047 | 1a1472a4de647291da7498407b59a2011af6c25c | 113261 | |
| 2 | 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 | 2f9c978d-b6bc-4889-b7d4-a47849fb6a17 | 74b94f931c3bda3a7500eadaf34506445c0a10ba | 356674 | |
| 3 | 2fed9456-7275-4786-b600-0c4865854b79 | 2fed9456-7275-4786-b600-0c4865854b79 | ad68c63881bfeae3635d64270df4e13202049f54 | 115175 | |
| 4 | 3b0c053b-8988-4621-92d7-9daf65e7d4a7 | 3b0c053b-8988-4621-92d7-9daf65e7d4a7 | e7aca6afb0a37ed46d9e11a8c2ed73afa9eae1d0 | 114945 | |

Use read_partitions to fetch and decode the data. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict, as shown in the example below. If no partition_ids are provided, the whole layer is read.

Example: reading index data in a DataFrame

partitions_df = index_layer.read_partitions(query="hour_from=ge=10")

Partitions data are returned in a DataFrame that is not indexed. Only one pd.DataFrame or gpd.GeoDataFrame is returned: data from multiple partitions are all included in the same output, and a partition_id column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer. The data handle is used in place of the partition id, since the index layer doesn't have a proper identifier for partitions.

|   | partition_id | envelope | path | pathEvents | pathMedia |
|---|---|---|---|---|---|
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul… | {'positionEstimate': array([{'timeStampUTC_ms': 15… | {'vehicleStatus': None, 'vehicleDynamics': None, '… | … |
| 1 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul… | {'positionEstimate': array([{'timeStampUTC_ms': 15… | {'vehicleStatus': None, 'vehicleDynamics': None, '… | … |
| 2 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul… | {'positionEstimate': array([{'timeStampUTC_ms': 15… | {'vehicleStatus': None, 'vehicleDynamics': None, '… | … |
| 3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul… | {'positionEstimate': array([{'timeStampUTC_ms': 15… | {'vehicleStatus': None, 'vehicleDynamics': None, '… | … |
| 4 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | {'version': '1.0', 'submitter': 'Probe Route Simul… | {'positionEstimate': array([{'timeStampUTC_ms': 15… | {'vehicleStatus': None, 'vehicleDynamics': None, '… | … |

(text and rows truncated for clarity)

In this specific example, as demonstrated for other layer types and described in detail in the section Manipulate DataFrames and GeoDataFrames, it's convenient to use the unpack_columns function to further unpack the dictionaries into proper columns:

from here.geopandas_adapter.utils import dataframe

columns = ["partition_id", "pathEvents"]

events_df = dataframe.unpack_columns(partitions_df[columns], ["pathEvents"], keep_prefix=False)

resulting in:

|   | partition_id | vehicleStatus | vehicleDynamics | signRecognition | laneBoundaryRecognition | exceptionalVehicleState | proprietaryInfo | environmentStatus |
|---|---|---|---|---|---|---|---|---|
| 0 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | [{'timeStampUTC_ms': 1506402914000, 'positionOffse… | … | … | … | … | … | … |
| 1 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | [{'timeStampUTC_ms': 1506403395000, 'positionOffse… | … | … | … | … | … | … |
| 2 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | [{'timeStampUTC_ms': 1506403082000, 'positionOffse… | … | … | … | … | … | … |
| 3 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | None | … | … | … | … | … | … |
| 4 | 1d63cfb6-5b79-455a-8fda-1503b99253e3 | [{'timeStampUTC_ms': 1506403131000, 'positionOffse… | … | … | … | … | … | … |

(text, columns and rows truncated for clarity)

Get partitions data and metadata from a stream layer in a DataFrame

Use get_stream_metadata to consume partitions metadata from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame is returned instead of a list or dict, as shown in the example below.

Example: getting stream metadata in a DataFrame

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())
sdii_catalog = platform.get_catalog("hrn:here:data::olp-here:olp-sdii-sample-berlin-2")
stream_layer = sdii_catalog.get_layer("sample-streaming-layer")

with stream_layer.subscribe() as subscription:
    partitions_df = stream_layer.get_stream_metadata(subscription=subscription)

Partitions metadata (stream messages) are returned in a DataFrame that is not indexed. Data can be inlined, as in this example, or stored via the Blob API if too large.

|   | id | data_handle | data_size | data | checksum | crc | timestamp | kafka_partition | kafka_offset |
|---|---|---|---|---|---|---|---|---|---|
| 0 | c755c5f5-3e01-4398-a3cd-f9a99393b5b4 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula… | c5b9d6040e7cb1ca805f20e26e3c5e3f818d3cc59b9f637c443b9b7b90018fa0 | | 2021-11-26 14:00:52.695000 | 3 | 18856435 |
| 1 | b69f5967-1408-44d9-9f2a-6e6fd4ec274a | | | b'\nB\n\x031.0\x12\x15Probe Route Simula… | bff2e955dff1d35c0a52916aafce8200ebf876c8055204b56d688929fae4ff70 | | 2021-11-26 14:00:57.833000 | 3 | 18856436 |
| 2 | 14eb5324-1c3b-44dc-8632-47cfa1dc051e | | | b'\nB\n\x031.0\x12\x15Probe Route Simula… | 2463cf999a2d97d991adef6af957ed34a3902a1619b3b6f447c4f61c2dd162b6 | | 2021-11-26 14:01:01.933000 | 3 | 18856437 |
| 3 | 03c70b04-1f15-46a2-8745-15793cac4eb5 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula… | ee4432e0d4a6d52727ab4c1ea38d61672172b30dd90598f3f9b7d082a601f3ab | | 2021-11-26 14:01:05.037000 | 3 | 18856438 |
| 4 | 2ba84d9e-a4fd-44b5-980b-8db2f04d80b6 | | | b'\nB\n\x031.0\x12\x15Probe Route Simula… | be4406f678f4ae882fe85e153f62ebab55270772dea094eae49a11358c6dd222 | | 2021-11-26 14:01:11.253000 | 3 | 18856439 |

(text and rows truncated for clarity)
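Because the stream metadata is a regular pd.DataFrame, the Kafka coordinates can be used for ordinary pandas post-processing. A sketch with hand-made, illustrative values that sorts messages into consumption order per Kafka partition:

```python
import pandas as pd

# Hand-made metadata frame with the Kafka columns shown above; within one
# Kafka partition, offsets define the order in which messages were produced.
meta_df = pd.DataFrame({
    "id": ["c755c5f5", "b69f5967", "14eb5324"],
    "kafka_partition": [3, 3, 2],
    "kafka_offset": [18856436, 18856435, 901],
})

# Sort by Kafka partition and offset to process messages in stream order.
ordered = meta_df.sort_values(["kafka_partition", "kafka_offset"]).reset_index(drop=True)
```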

Use read_stream to consume, fetch and decode the data from a stream subscription. When the GeoPandasAdapter is enabled, a pd.DataFrame or a gpd.GeoDataFrame, depending on the content, is returned instead of a list or dict, as shown in the example below.

Example: reading stream data in a DataFrame

In this example we show how adapter-specific parameters, such as record_path, can be used to customize the decoding. We're interested in only a selection of the properties of the data.

This specific example reads content encoded in Protobuf format.

with stream_layer.subscribe() as subscription:
    columns = ["timeStampUTC_ms",
               "latitude_deg",
               "longitude_deg",
               "heading_deg",
               "speed_mps"]

    partitions_df = stream_layer.read_stream(subscription=subscription, record_path="path.positionEstimate", columns=columns)

Partitions data are returned in a DataFrame that is not indexed. Only one pd.DataFrame or gpd.GeoDataFrame is returned: data from multiple partitions are all included in the same output, and a partition_id column is added to disambiguate. The names of the columns depend on the content type, schema and actual content of the layer.

|   | partition_id | partition_timestamp | timeStampUTC_ms | latitude_deg | longitude_deg | heading_deg | speed_mps |
|---|---|---|---|---|---|---|---|
| 0 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934814720 | 52.5263 | 13.3499 | 276.471 | 16 |
| 1 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934816720 | 52.5263 | 13.3496 | 268.154 | 16 |
| 2 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934818720 | 52.5263 | 13.3491 | 268.179 | 16 |
| 3 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934820720 | 52.5263 | 13.3486 | 268.946 | 16 |
| 4 | ae93f978-777a-4afe-ab08-993162ef934a | 2021-11-26 13:56:18.727000 | 1637934822720 | 52.5263 | 13.3482 | 269.345 | 16 |

(rows truncated for clarity)
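The epoch-millisecond values in timeStampUTC_ms can be converted to proper datetimes with standard pandas tooling. A sketch with an illustrative frame:

```python
import pandas as pd

# Position rows as decoded above, reduced to the timestamp column;
# values are epoch milliseconds (hand-made for illustration).
positions_df = pd.DataFrame({"timeStampUTC_ms": [1637934814720, 1637934816720]})

# Convert epoch milliseconds to timezone-aware UTC datetimes for analysis.
positions_df["timestamp_utc"] = pd.to_datetime(
    positions_df["timeStampUTC_ms"], unit="ms", utc=True
)
```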

Get features from an interactive map layer in a GeoDataFrame

Use search_features to retrieve features from one interactive map layer. When the GeoPandasAdapter is enabled, a gpd.GeoDataFrame is returned instead of a list or dict, as shown in the example below.

The layer supports other functions, including get_features and spatial_search, which query and retrieve features from the layer. A GeoDataFrame is returned from these functions as well.

When running in Jupyter notebooks, a GeoDataFrame enables an effortless, visual inspection of the features over a map, as demonstrated by using the HERE Inspector in the examples below.

Example: reading features in a GeoDataFrame

In this example we retrieve the districts (Bezirk) of Berlin from a sample catalog and a sample interactive map layer.

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")

features_gdf = iml_layer.search_features()

search_features without parameters returns all the content, resulting in:

|   | geometry | Bez | BezName | @ns:com:here:xyz |
|---|---|---|---|---|
| pjB2hRwTpsW2ZAoP | MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028… | 01 | Mitte | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| bzuUAjSSniAlAza3 | MULTIPOLYGON Z (((13.491453 52.488265 0, 13.490708… | 02 | Friedrichshain-Kreuzberg | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| p6PdohLKy98613Yh | MULTIPOLYGON Z (((13.523023 52.645034 0, 13.522967… | 03 | Pankow | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| rBPLWN1rBqpn3e48 | MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344… | 04 | Charlottenburg-Wilmersdorf | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| Jawrgifeu6bFL4SE | MULTIPOLYGON Z (((13.282182 52.53405 0, 13.282092… | 05 | Spandau | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

(text and rows truncated for clarity)

It's also possible to specify search parameters, as in the following case:

features_gdf = iml_layer.search_features(params={"p.BezName": "Pankow"}, force_2d=True)

resulting in the selection of just one district and removal of z-level from the coordinates:

|   | geometry | Bez | BezName | @ns:com:here:xyz |
|---|---|---|---|---|
| p6PdohLKy98613Yh | MULTIPOLYGON (((13.523023 52.645034, 13.522967 52.… | 03 | Pankow | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

(text truncated for clarity)

The result can be rendered directly on a map when running in a Jupyter notebook, for example using the HERE Inspector:

from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts of Berlin", style=Color.BLUE)

Example: geospatial search of features in a GeoDataFrame

In this example we query the districts of Berlin within a 1000 m distance from a city landmark, the Zoologischer Garten railway station, located at the coordinates visible in the query.

from here.platform import Platform
from here.geopandas_adapter import GeoPandasAdapter

platform = Platform(adapter=GeoPandasAdapter())

sample_catalog = platform.get_catalog("hrn:here:data::olp-here:here-geojson-samples")
iml_layer = sample_catalog.get_layer("berlin-interactivemap")

features_gdf = iml_layer.spatial_search(lng=13.33474, lat=52.50686, radius=1000)

resulting in:

|   | geometry | Bez | BezName | @ns:com:here:xyz |
|---|---|---|---|---|
| pjB2hRwTpsW2ZAoP | MULTIPOLYGON Z (((13.429401 52.508571 0, 13.429028… | 01 | Mitte | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| rBPLWN1rBqpn3e48 | MULTIPOLYGON Z (((13.34142 52.504867 0, 13.341344… | 04 | Charlottenburg-Wilmersdorf | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |
| jLrIE0BxQ6vj5U2a | MULTIPOLYGON Z (((13.427455 52.38578 0, 13.426965… | 07 | Tempelhof-Schöneberg | {'createdAt': 1629098476655, 'updatedAt': 1629098476655} |

The result can be rendered directly in a Jupyter notebook using:

from here.inspector import inspect
from here.inspector.styles import Color

inspect(features_gdf, "Districts within 1000m from Berlin Zoologischer Garten railway station", style=Color.RED)