
The commonly used Pandas and GeoPandas libraries are well documented, and many examples showing how to use them to perform data analysis and manipulation are publicly available. Generally, data is in a tabular representation where each cell of the table contains one value with a defined data type (numeric, string, or other basic type).

Map data, and data stored in a catalog in general, is sometimes highly structured and follows a complex, nested schema. Dealing with this complexity in Pandas can be difficult. For this reason, the here-geopandas-adapter package of the HERE Data SDK for Python includes utility functions that perform repetitive tasks and manipulate complex DataFrames, in particular DataFrames with columns that contain dictionaries instead of single values.

Unpacking series and DataFrames

Pandas provides the explode function to turn the elements of list objects contained in a column into multiple rows. Similarly, the HERE Data SDK for Python provides the unpack and unpack_columns functions to turn a single column containing dict objects into multiple columns. These are convenience functions for unpacking the data structures that sometimes result from reading data from catalogs or working with complex data models.
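To make the contrast between rows and columns concrete, the following pandas-only sketch puts explode next to a plain pd.DataFrame(series.tolist()) expansion. The expansion is only a one-level illustration of turning dict keys into columns; it is not how the SDK implements unpack, and it does not recurse into nested dictionaries.

```python
import pandas as pd

# A column of lists: explode() produces one row per list element,
# repeating the original index label for each element.
tags = pd.Series([["a", "b"], ["c"]], name="tags")
exploded = tags.explode()
print(exploded.tolist())        # ['a', 'b', 'c']
print(exploded.index.tolist())  # [0, 0, 1]

# A column of dicts: building a DataFrame from the dicts produces one
# column per key (one level deep only, no recursion into nested dicts).
zip_codes = pd.Series([{"min": 10115, "max": 14199}, {"min": 75001, "max": 75020}])
expanded = pd.DataFrame(zip_codes.tolist(), index=zip_codes.index)
print(expanded.columns.tolist())  # ['min', 'max']
```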

unpack is applied to a Series containing dict objects and returns a DataFrame. unpack_columns is applied to a DataFrame and replaces one or more columns that contain dict objects with multiple columns, one for each field of the dictionaries. Unpacking is also recursive, so deeply nested data structures are easy to deal with.
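Recursive unpacking can be pictured as flattening each nested dictionary into dotted key paths. The standard-library sketch below shows the idea on a single record. The helper name flatten is hypothetical, and the "." separator is assumed from the dotted column names the SDK produces; this illustrates the technique and is not the SDK's code.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: recursively flatten nested dicts into dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested dictionary, extending the key path.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


location = {
    "longitude": 13.408333,
    "latitude": 52.518611,
    "country": {"name": "Deutschland", "code": "DE"},
}
print(flatten(location, prefix="location"))
# {'location.longitude': 13.408333, 'location.latitude': 52.518611,
#  'location.country.name': 'Deutschland', 'location.country.code': 'DE'}
```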

Example: unpacking a DataFrame column that contains dictionaries

Given the example DataFrame df, derived from structured objects:

import pandas as pd

berlin = {
    "name": "Berlin",
    "location": {
        "longitude": 13.408333,
        "latitude": 52.518611,
        "country": { "name": "Deutschland", "code": "DE" }
    },
    "zip_codes": { "min": 10115, "max": 14199 },
    "population": 3664088
}

paris = {
    "name": "Paris",
    "location": {
        "longitude": 2.351667,
        "latitude": 48.856667,
        "country": { "name": "France", "code": "FR" }
    },
    "zip_codes": { "min": 75001, "max": 75020 },
    "population": 2175601
}

df = pd.DataFrame([berlin, paris])

resulting in:

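|   | name   | location                                                                                            | zip_codes                    | population |
|---|--------|-----------------------------------------------------------------------------------------------------|------------------------------|------------|
| 0 | Berlin | {'longitude': 13.408333, 'latitude': 52.518611, 'country': {'name': 'Deutschland', 'code': 'DE'}}   | {'min': 10115, 'max': 14199} | 3664088    |
| 1 | Paris  | {'longitude': 2.351667, 'latitude': 48.856667, 'country': {'name': 'France', 'code': 'FR'}}         | {'min': 75001, 'max': 75020} | 2175601    |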

We can unpack the columns location and zip_codes, whose dictionaries would otherwise be difficult to operate on. Unpacking is recursive, so nested dictionaries such as country, contained in location, are unpacked as well:

from here.geopandas_adapter.utils.dataframe import unpack_columns

unpacked_df = unpack_columns(df, columns=["location", "zip_codes"])

resulting in:

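|   | name   | location.longitude | location.latitude | location.country.name | location.country.code | zip_codes.min | zip_codes.max | population |
|---|--------|--------------------|-------------------|-----------------------|-----------------------|---------------|---------------|------------|
| 0 | Berlin | 13.4083            | 52.5186           | Deutschland           | DE                    | 10115         | 14199         | 3664088    |
| 1 | Paris  | 2.35167            | 48.8567           | France                | FR                    | 75001         | 75020         | 2175601    |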
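For comparison, pandas' own pd.json_normalize flattens nested dictionaries recursively with "." as the default separator, so a similar layout can be approximated in plain pandas. The sketch below is an assumption-laden approximation: the helper name unpack_columns_sketch is hypothetical, it expects every value in the selected columns to be a dict, and it does not reproduce whatever handling the SDK's unpack_columns has for missing values or other edge cases.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "Berlin",
     "location": {"longitude": 13.408333, "latitude": 52.518611,
                  "country": {"name": "Deutschland", "code": "DE"}},
     "zip_codes": {"min": 10115, "max": 14199},
     "population": 3664088},
    {"name": "Paris",
     "location": {"longitude": 2.351667, "latitude": 48.856667,
                  "country": {"name": "France", "code": "FR"}},
     "zip_codes": {"min": 75001, "max": 75020},
     "population": 2175601},
])


def unpack_columns_sketch(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical plain-pandas approximation: replace dict columns with dotted columns."""
    parts = []
    for col in frame.columns:
        if col in columns:
            # json_normalize flattens nested dicts recursively using "." as separator.
            flat = pd.json_normalize(frame[col].tolist()).add_prefix(f"{col}.")
            flat.index = frame.index  # keep the original row labels
            parts.append(flat)
        else:
            parts.append(frame[[col]])  # untouched column, kept in place
    return pd.concat(parts, axis=1)


unpacked = unpack_columns_sketch(df, ["location", "zip_codes"])
print(unpacked.columns.tolist())
```

On this data the sketch yields the same dotted column names, in the same order, as the unpack_columns result shown above.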

Replacing a column with one or more columns

The function replace_column replaces a single column of a DataFrame with one or more columns of another DataFrame.

Example: replacing one column with multiple columns

Given the example DataFrames df and df2:

import pandas as pd

df = pd.DataFrame({
    "col_A": [11, 31, 41],
    "col_B": [12, 32, 42],
    "col_C": [14, 34, 42]
}, index = [1, 3, 4])

df2 = pd.DataFrame({
    "col_Bx": [110, 130, 140],
    "col_By": [115, 135, 145]
}, index = [1, 3, 4])

resulting in:

|   | col_A | col_B | col_C |
|---|-------|-------|-------|
| 1 | 11    | 12    | 14    |
| 3 | 31    | 32    | 34    |
| 4 | 41    | 42    | 42    |

and:

|   | col_Bx | col_By |
|---|--------|--------|
| 1 | 110    | 115    |
| 3 | 130    | 135    |
| 4 | 140    | 145    |

We can replace col_B with col_Bx and col_By:

from here.geopandas_adapter.utils.dataframe import replace_column

replaced_df = replace_column(df, "col_B", df2)

resulting in:

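|   | col_A | col_Bx | col_By | col_C |
|---|-------|--------|--------|-------|
| 1 | 11    | 110    | 115    | 14    |
| 3 | 31    | 130    | 135    | 34    |
| 4 | 41    | 140    | 145    | 42    |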
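A rough plain-pandas equivalent of this replacement splices the new columns in at the position of the old one. This is a sketch that assumes both DataFrames share the same index, because pd.concat aligns rows by index label. The helper name replace_column_sketch is hypothetical, and the SDK's replace_column may treat mismatched indexes differently.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_A": [11, 31, 41], "col_B": [12, 32, 42], "col_C": [14, 34, 42]},
    index=[1, 3, 4],
)
df2 = pd.DataFrame(
    {"col_Bx": [110, 130, 140], "col_By": [115, 135, 145]},
    index=[1, 3, 4],
)


def replace_column_sketch(frame: pd.DataFrame, column: str, other: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical plain-pandas version: swap one column for all columns of `other`."""
    pos = frame.columns.get_loc(column)  # integer position of the column to replace
    # Columns before the replaced one, then the new columns, then the rest.
    # pd.concat aligns rows on the index, so both frames should share it.
    return pd.concat([frame.iloc[:, :pos], other, frame.iloc[:, pos + 1:]], axis=1)


replaced = replace_column_sketch(df, "col_B", df2)
print(replaced.columns.tolist())  # ['col_A', 'col_Bx', 'col_By', 'col_C']
```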

Adding and removing column name prefixes

The functions prefix_columns and unprefix_columns add a prefix to, or remove a prefix from, the names of selected columns of a DataFrame. A . separator is placed between the prefix and the column name.

This is useful for grouping related columns of a DataFrame under a common prefix (prefix), or for removing a long, verbose prefix shared by multiple columns (unprefix) to obtain a derived DataFrame that is easier to work with.
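Both operations amount to renaming columns, which the following sketch expresses with DataFrame.rename in plain pandas. The helper names prefix_sketch and unprefix_sketch are hypothetical; the sketch assumes string column labels and the "." separator described above, and it is not the SDK's implementation. Because rename does not move columns, the prefixed columns keep their original positions.

```python
import pandas as pd


def prefix_sketch(frame: pd.DataFrame, prefix: str, columns: list, sep: str = ".") -> pd.DataFrame:
    """Hypothetical: rename the selected columns to '<prefix><sep><name>'."""
    return frame.rename(columns={c: f"{prefix}{sep}{c}" for c in columns})


def unprefix_sketch(frame: pd.DataFrame, prefix: str, sep: str = ".") -> pd.DataFrame:
    """Hypothetical: strip '<prefix><sep>' from every column name that starts with it."""
    head = f"{prefix}{sep}"
    return frame.rename(columns=lambda c: c[len(head):] if c.startswith(head) else c)


df = pd.DataFrame({"name": ["Sarah"], "city": ["Portland"], "zip": [97214]})
prefixed = prefix_sketch(df, "address", ["city", "zip"])
print(prefixed.columns.tolist())  # ['name', 'address.city', 'address.zip']

restored = unprefix_sketch(prefixed, "address")
print(restored.equals(df))  # True
```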

Example: prefixing columns with a common prefix

Given the example DataFrame df:

import pandas as pd

df = pd.DataFrame({
    "name": ["Sarah", "Vivek", "Marco"],
    "age": [41, 29, 35],
    "house_nr": ["1492", "34-35", "48A"],
    "road": ["SE 36th Ave", "Seshadri Road", "Via Giosuè Carducci"],
    "city": ["Portland", "Bengaluru", "Milan"],
    "zip": [97214, 560009, 20123],
    "state": ["OR", "KA", pd.NA],
    "country": ["US", "IN", "IT"],
})

resulting in:

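|   | name  | age | house_nr | road                | city      | zip    | state | country |
|---|-------|-----|----------|---------------------|-----------|--------|-------|---------|
| 0 | Sarah | 41  | 1492     | SE 36th Ave         | Portland  | 97214  | OR    | US      |
| 1 | Vivek | 29  | 34-35    | Seshadri Road       | Bengaluru | 560009 | KA    | IN      |
| 2 | Marco | 35  | 48A      | Via Giosuè Carducci | Milan     | 20123  | <NA>  | IT      |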

We can group columns that are part of the address, prefixing them with address:

from here.geopandas_adapter.utils.dataframe import prefix_columns

prefixed_df = prefix_columns(df, "address", ["house_nr", "road", "city", "zip", "country", "state"])

resulting in:

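|   | name  | age | address.house_nr | address.road        | address.city | address.zip | address.state | address.country |
|---|-------|-----|------------------|---------------------|--------------|-------------|---------------|-----------------|
| 0 | Sarah | 41  | 1492             | SE 36th Ave         | Portland     | 97214       | OR            | US              |
| 1 | Vivek | 29  | 34-35            | Seshadri Road       | Bengaluru    | 560009      | KA            | IN              |
| 2 | Marco | 35  | 48A              | Via Giosuè Carducci | Milan        | 20123       | <NA>          | IT              |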

Example: removing a common prefix

Continuing the example above, we can remove the address prefix and obtain the original DataFrame:

from here.geopandas_adapter.utils.dataframe import unprefix_columns

unprefixed_df = unprefix_columns(prefixed_df, "address")

resulting in:

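|   | name  | age | house_nr | road                | city      | zip    | state | country |
|---|-------|-----|----------|---------------------|-----------|--------|-------|---------|
| 0 | Sarah | 41  | 1492     | SE 36th Ave         | Portland  | 97214  | OR    | US      |
| 1 | Vivek | 29  | 34-35    | Seshadri Road       | Bengaluru | 560009 | KA    | IN      |
| 2 | Marco | 35  | 48A      | Via Giosuè Carducci | Milan     | 20123  | <NA>  | IT      |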