🔥 API

Data provided by the package

Population data

Population data are provided by ISTAT (Istituto Nazionale di Statistica). Starting from version 0.4.0, the database contains data for every age from 0 to 100 years, split into males and females. Male columns have the suffix _M and female columns have the suffix _F, while the sum of males and females has no suffix.

Important

Age 100 contains data for age ≥ 100 years.

Default label naming rule

Considering population_limits=[25, 50, 75], the default label prefix naming rule will be:

  • <25 for the first element.

  • 25-50 for the second element.

  • 50-75 for the penultimate element.

  • >=75 for the last element.

Remember that these prefixes are combined with the suffixes _M and _F (or with no suffix for the total), giving the following labels:

  • <25_M, <25_F and <25 for the first element (males, females, all).

  • 25-50_M, 25-50_F and 25-50 for the second element.

  • 50-75_M, 50-75_F and 50-75 for the penultimate element.

  • >=75_M, >=75_F and >=75 for the last element.
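The naming rule above can be sketched in a few lines of plain Python. This is only an illustration of the rule as described here, not the package's own code; the function names default_labels and with_sex_suffixes are hypothetical.

```python
def default_labels(population_limits):
    """Build the default label prefixes for already-sorted cutoffs."""
    limits = list(population_limits)
    prefixes = [f"<{limits[0]}"]  # group below the first cutoff
    prefixes += [f"{lo}-{hi}" for lo, hi in zip(limits, limits[1:])]
    prefixes.append(f">={limits[-1]}")  # group from the last cutoff upward
    return prefixes


def with_sex_suffixes(prefixes):
    """Expand each prefix into males, females and total column names."""
    return [f"{p}{suffix}" for p in prefixes for suffix in ("_M", "_F", "")]


prefixes = default_labels([25, 50, 75])
print(prefixes)
print(with_sex_suffixes(prefixes))
```

With three cutoffs the sketch produces four prefixes and twelve column names, matching the lists above.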

Customizing population group cutoffs

When using methods that take population_limits as a parameter, three behaviours are possible.

  1. population_limits='total' - returns the total population with the prefix 'population', so the returned columns will be population_M, population_F and population.

  2. population_limits='auto' - returns the population divided into the default age groups, using the default age cutoffs [3, 11, 19, 25, 50, 65, 75]. The generated groups are named according to the default label naming rule defined above.

  3. population_limits=[int] - returns the population divided into age groups using the provided cutoffs. population_limits is sorted in ascending order, converted to int, and duplicates are removed. The generated groups are named according to the default label naming rule defined above.

Important

Custom cutoffs generate groups whose lower bound is included and whose upper bound is excluded. For example, population_limits=[25, 50, 75] generates the groups [25, 50) and [50, 75). In addition, a group for ages below the first cutoff and a group for ages greater than or equal to the last cutoff are added; in this example, [0, 25) and [75, +∞).

Warning

Values less than or equal to 0 or greater than 100 will be ignored.

Some municipalities have no population data available; for these, italy-geopop returns numpy.nan.
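The cutoff handling described in this section (sorting, conversion to int, removal of duplicates and out-of-range values, half-open groups) can be illustrated with the sketch below. It is written from the description above, not from the package source; normalize_limits and age_intervals are hypothetical names, and the order in which conversion and filtering happen is an assumption.

```python
def normalize_limits(population_limits):
    """Convert to int, drop duplicates and values outside 1..100, then sort."""
    as_ints = {int(value) for value in population_limits}  # set removes duplicates
    return sorted(v for v in as_ints if 0 < v <= 100)


def age_intervals(limits):
    """Return (lower, upper) pairs: lower is included, upper is excluded.

    The last upper bound is None because the final group is open-ended.
    """
    lowers = [0, *limits]
    uppers = [*limits, None]
    return list(zip(lowers, uppers))


limits = normalize_limits([75, 25, 50.0, 25, 0, 120])
print(limits)
print(age_intervals(limits))
```

Here 0 and 120 are discarded, the duplicate 25 is collapsed, and the remaining cutoffs yield the four groups from the example above.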

Municipality data

  • municipality: str - municipality name, capitalized, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().

  • municipality_code: int - municipality ISTAT code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().

  • cadastral_code: str - municipality cadastral code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().

  • municipalities: list - a list of dictionaries with the following structure {'municipality_code': <municipality code:int>, 'municipality': <municipality name:str>}, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().

Province data

  • province: str - province name, capitalized, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().

  • province_short: str - province abbreviation, uppercase, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().

  • province_code: int - province ISTAT code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().

  • provinces: list - a list of dictionaries with the following structure {'province_code': <province code:int>, 'province': <province name:str>, 'municipalities': list[{'municipality_code': <municipality code:int>, 'municipality': <municipality name:str>}]}, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().
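To make the nested structure of the provinces field concrete, the sketch below walks a hand-written value with the shape described above and flattens it into (province, municipality) pairs. All codes and names in region_row are made up for illustration and are not real ISTAT data; flatten_municipalities is a hypothetical helper.

```python
# Hand-written example value; codes and names are placeholders, not real data.
region_row = {
    "region_code": 99,
    "region": "Example region",
    "provinces": [
        {
            "province_code": 901,
            "province": "Alpha",
            "municipalities": [
                {"municipality_code": 901001, "municipality": "Alpha Uno"},
                {"municipality_code": 901002, "municipality": "Alpha Due"},
            ],
        },
        {
            "province_code": 902,
            "province": "Beta",
            "municipalities": [
                {"municipality_code": 902001, "municipality": "Beta Uno"},
            ],
        },
    ],
}


def flatten_municipalities(row):
    """Return (province name, municipality name) pairs from a region row."""
    return [
        (province["province"], municipality["municipality"])
        for province in row["provinces"]
        for municipality in province["municipalities"]
    ]


pairs = flatten_municipalities(region_row)
print(pairs)
```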

Region data

  • region: str - region name, capitalized, available if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().

  • region_code: int - region ISTAT code, available if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().

Geospatial data

  • geometry: geometry types - geospatial data needed to plot geography, available only if accessor was activated with include_geometry=True.

italy_geopop.pandas_extension

class italy_geopop.pandas_extension.ItalyGeopop(pandas_obj: Any, include_geometry: bool = False, data_year: int | None = None)

Serves as the base for registering italy_geopop as a pandas accessor. You shouldn’t initialize it directly.

Instead, from a pandas.Series object you can access its methods via the italy_geopop attribute.

from_municipality(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame

Get data for municipalities. The input series can contain municipality names, municipality ISTAT codes or municipality cadastral codes (also known as Belfiore codes); data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.

Parameters:
  • return_cols (list[str] | str | re.Pattern | None, optional) – subsets the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string with regex=True, only the columns whose names match the regular expression are returned. The available fields are listed above; defaults to None.

  • regex (bool, optional) – if True, return_cols is interpreted as a regex pattern; defaults to False.

  • population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used; defaults to None.

Raises:

KeyError – if return_cols is or contains a column not listed above, or includes geometry while the accessor was initialized without geometry data.

Returns:

Requested data in a 2-dimensional dataframe that has the same index as the input data.

Return type:

pandas.DataFrame
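The three ways return_cols can be interpreted (None, explicit names, regular expression) are sketched below on a plain list of column names. This is an illustration of the documented behaviour, not the package's code: select_columns is a hypothetical helper, and the use of re.search (rather than re.match or re.fullmatch) is an assumption.

```python
import re


def select_columns(available, return_cols=None, regex=False):
    """Pick column names the way the return_cols parameter is described."""
    if return_cols is None:
        return list(available)  # no filter: everything is returned
    is_pattern = isinstance(return_cols, re.Pattern)
    if is_pattern or (isinstance(return_cols, str) and regex):
        pattern = re.compile(return_cols)  # a compiled pattern is passed through
        # Assumption: a column is kept if the pattern is found anywhere in it.
        return [col for col in available if pattern.search(col)]
    requested = [return_cols] if isinstance(return_cols, str) else list(return_cols)
    missing = [col for col in requested if col not in available]
    if missing:
        raise KeyError(f"columns not available: {missing}")
    return requested


available = ["municipality", "municipality_code", "province", "<25_M", "<25_F", "<25"]
print(select_columns(available, r"_code$", regex=True))
print(select_columns(available, re.compile(r"^<25")))
print(select_columns(available, ["municipality", "province"]))
```

Requesting a name that is not in the list, such as "geometry" here, raises KeyError, mirroring the Raises section above.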

from_province(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame

Get data for provinces. The input series can contain province names, province abbreviations or province ISTAT codes; data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.

Parameters:
  • return_cols (list[str] | str | re.Pattern | None, optional) – subsets the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string with regex=True, only the columns whose names match the regular expression are returned. The available fields are listed above; defaults to None.

  • regex (bool, optional) – if True, return_cols is interpreted as a regex pattern; defaults to False.

  • population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used; defaults to None.

Note

To understand how municipalities are grouped, see Municipality data above.

Raises:

KeyError – if return_cols is or contains a column not listed above, or includes geometry while the accessor was initialized without geometry data.

Returns:

Requested data in a 2-dimensional dataframe that has the same index as the input data.

Return type:

pandas.DataFrame

from_region(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame

Get data for regions. The input series can contain region names or region ISTAT codes; data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.

Parameters:
  • return_cols (list[str] | str | re.Pattern | None, optional) – subsets the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string with regex=True, only the columns whose names match the regular expression are returned. The available fields are listed above; defaults to None.

  • regex (bool, optional) – if True, return_cols is interpreted as a regex pattern; defaults to False.

  • population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used; defaults to None.

Note

To understand how provinces are grouped, see Province data above.

Raises:

KeyError – if return_cols is or contains a column not listed above, or includes geometry while the accessor was initialized without geometry data.

Returns:

Requested data in a 2-dimensional dataframe that has the same index as the input data.

Return type:

pandas.DataFrame

get_population_data(level: str = 'municipality', include_geometry: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame

Same as italy_geopop.geopop.Geopop.compose_df().

smart_from_municipality(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series

Same as from_municipality, but can understand more complex text. Values are returned only if the match is unequivocal.
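The idea of returning a value only when the match is unequivocal can be illustrated with a small sketch. This is not how the package necessarily implements it: KNOWN_NAMES is a tiny hand-picked list, unequivocal_match is a hypothetical function, whole-word case-insensitive search is an assumed matching strategy, and None stands in for the NaN the package returns.

```python
import re

# Tiny hand-picked list for illustration; the package uses its full dataset.
KNOWN_NAMES = ["Abano Terme", "Airasca", "Milano", "Verona"]


def unequivocal_match(text, names=KNOWN_NAMES):
    """Return the single known name found in text, or None if zero or many."""
    hits = [
        name
        for name in names
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
    ]
    # Only an unambiguous result is accepted.
    return hits[0] if len(hits) == 1 else None


for text in ["Comune di Abano Terme", "Comune di Milano o di Verona?", "Comune ignoto"]:
    print(text, "->", unequivocal_match(text))
```

A text naming two municipalities yields no result, in the same spirit as the third row of the example output for this method.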

>>> import pandas as pd
>>> from italy_geopop.pandas_extension import pandas_activate
>>> pandas_activate()
>>> s = pd.Series(["Comune di Abano Terme", "Comune di Airasca", "Comune di Milano o di Verona?", 1001])
>>> s.italy_geopop.smart_from_municipality(return_cols='municipality')
0    Abano Terme
1        Airasca
2            NaN
3          Agliè
Name: municipality, dtype: object

smart_from_province(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series

Same as from_province, but can understand more complex text. Values are returned only if the match is unequivocal.

>>> import pandas as pd
>>> from italy_geopop.pandas_extension import pandas_activate
>>> pandas_activate()
>>> s = pd.Series(["Citta' di Brescia", "Università degli studi di Milano", "Milano o Verona", 5])
>>> s.italy_geopop.smart_from_province(return_cols='province')
0    Brescia
1     Milano
2        NaN
3       Asti
Name: province, dtype: object

smart_from_region(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series

Same as from_region, but can understand more complex text. Values are returned only if the match is unequivocal.

>>> import pandas as pd
>>> from italy_geopop.pandas_extension import pandas_activate
>>> pandas_activate()
>>> s = pd.Series(["Regione Lombardia", "Regione del Veneto", "Piemonte o Umbria?", 15])
>>> s.italy_geopop.smart_from_region(return_cols='region')
0    Lombardia
1       Veneto
2          NaN
3     Campania
Name: region, dtype: object

italy_geopop.pandas_extension.pandas_activate(include_geometry=False, data_year: int | None = None)

Activates the pandas extension by registering the ItalyGeopop class as a pandas.Series accessor named italy_geopop.

Parameters:
  • include_geometry (bool, optional) – specifies whether the geometry column should also be returned when the accessor is used; defaults to False.

  • data_year (int, optional) – year of the data to use; if None, the latest available data is used; defaults to None.

Returns:

None

Warning

include_geometry=True has a cost in terms of speed, because the geospatial datasets need to be loaded.

italy_geopop.pandas_extension.pandas_activate_context(include_geometry=False, data_year: int | None = None)

Same as pandas_activate(), but the accessor is registered only within the context. This is useful if you want to register the accessor with different initialization options more than once in your code, or if you want to free up memory right after getting the data you need. The trade-off is that italy-geopop must be reinitialized every time you register and use the accessor.

Parameters:
  • include_geometry, data_year – same as in pandas_activate().

Yields:

Context with italy_geopop accessor registered to pd.Series.

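# pandas_activate_context example
import pandas as pd
from italy_geopop.pandas_extension import pandas_activate_context

with pandas_activate_context():
    # You can access italy_geopop here
    df = pd.Series(["Milano"]).italy_geopop.from_municipality()

# You cannot access italy_geopop here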
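The register-then-clean-up pattern behind a context like this can be sketched with contextlib alone. The sketch below is a generic illustration, not the package's implementation: FakeSeries is a stand-in for pandas.Series, and activate_context and its arguments are hypothetical.

```python
from contextlib import contextmanager


class FakeSeries:
    """Stand-in for pandas.Series, used only for this illustration."""


@contextmanager
def activate_context(cls, name, accessor):
    """Attach an attribute to cls for the duration of the with-block."""
    setattr(cls, name, accessor)
    try:
        yield
    finally:
        # Always remove the attribute, even if the block raised an error.
        delattr(cls, name)


with activate_context(FakeSeries, "italy_geopop", "accessor placeholder"):
    inside = hasattr(FakeSeries, "italy_geopop")

outside = hasattr(FakeSeries, "italy_geopop")
print(inside, outside)
```

Because cleanup sits in a finally clause, the attribute is removed whether or not the block completes normally, which is what lets the resources be released right after use.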

italy_geopop.geopop

class italy_geopop.geopop.Geopop(data_year: int | None = None)

Bases: object

A class that contains Italian geospatial and population data.

Parameters:

data_year (int, optional) – the year of the data you need; if None, the latest is automatically picked; defaults to None.

Raises:

ValueError – if data_year is not available

compose_df(level='municipality', include_geometry=False, population_limits: str | list = 'auto', population_labels: list | None = None)

Method to get a dataframe with administrative, geospatial and population data.

Parameters:
  • level (str, optional) – the level of detail of the dataframe; can be 'municipality', 'province' or 'region'; defaults to 'municipality'.

  • include_geometry (bool, optional) – if True, the dataframe will include geospatial data; defaults to False.

  • population_limits (str | list, optional) – a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list | None, optional) – a list of str that defines the label names; defaults to None.

get_italian_population_for_municipalites(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame

Method to get Italian population data for municipalities.

Parameters:
  • population_limits (str | list, optional) – a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; defaults to None.

Raises:

ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].

Returns:

a 2-dimensional dataframe with municipality_code as index and columns determined by population_limits and population_labels; see above for more information.

Return type:

pd.DataFrame

get_italian_population_for_provinces(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame

Method to get Italian population data for provinces.

Parameters:
  • population_limits (str | list, optional) – a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; defaults to None.

Raises:

ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].

Returns:

a 2-dimensional dataframe with province_code as index and columns determined by population_limits and population_labels; see above for more information.

Return type:

pd.DataFrame

get_italian_population_for_regions(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame

Method to get Italian population data for regions.

Parameters:
  • population_limits (str | list, optional) – a list of int, 'total' or 'auto'; defaults to 'auto'.

  • population_labels (list[str] | None, optional) – a list of strings that defines the label names; defaults to None.

Raises:

ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].

Returns:

a 2-dimensional dataframe with region_code as index and columns determined by population_limits and population_labels; see above for more information.

Return type:

pd.DataFrame

property italy_municipalities: DataFrame

Property to get Italian municipalities data.

Returns:

a 2-dimensional dataframe with municipality_code as index and municipality, province_code, province, province_short, region, region_code as columns.

Return type:

pd.DataFrame

property italy_municipalities_geometry: DataFrame

Property to get geospatial data for plotting municipalities.

Returns:

a 2-dimensional dataframe with municipality_code as index and geometry as column.

Return type:

pd.DataFrame

property italy_provinces: DataFrame

Property to get Italian provinces data.

Note

To understand how municipalities are grouped, see Municipality data above.

Returns:

a 2-dimensional dataframe with province_code as index and province, province_short, municipalities, region, region_code as columns.

Return type:

pd.DataFrame

property italy_provinces_geometry: DataFrame

Property to get geospatial data for plotting provinces.

Returns:

a 2-dimensional dataframe with province_code as index and geometry as column.

Return type:

pd.DataFrame

property italy_regions: DataFrame

Property to get Italian regions data.

Note

To understand how provinces are grouped, see Province data above.

Returns:

a 2-dimensional dataframe with region_code as index and region, provinces as columns.

Return type:

pd.DataFrame

property italy_regions_geometry: DataFrame

Property to get geospatial data for plotting regions.

Returns:

a 2-dimensional dataframe with region_code as index and geometry as column.

Return type:

pd.DataFrame

property population_df: DataFrame

Property to get Italian population data.

Returns:

a 2-dimensional dataframe with municipality_code as index and population data in long format (columns: age, F, M and tot).

Return type:

pd.DataFrame
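To show how long-format rows (age, F, M, tot) relate to the grouped columns described at the top of this page, the sketch below aggregates a few made-up rows into age groups with the standard-library bisect module. It is an illustration only: the row values are invented, group_population is a hypothetical function, and the package itself works on pandas dataframes rather than lists of dictionaries.

```python
from bisect import bisect_right
from collections import defaultdict


def group_population(rows, limits):
    """Sum long-format rows into age groups with lower bound included."""
    labels = (
        [f"<{limits[0]}"]
        + [f"{lo}-{hi}" for lo, hi in zip(limits, limits[1:])]
        + [f">={limits[-1]}"]
    )
    totals = defaultdict(int)
    for row in rows:
        # bisect_right puts an age equal to a cutoff into the upper group.
        label = labels[bisect_right(limits, row["age"])]
        totals[f"{label}_M"] += row["M"]
        totals[f"{label}_F"] += row["F"]
        totals[label] += row["tot"]
    return dict(totals)


# Invented numbers for a single municipality.
rows = [
    {"age": 24, "M": 10, "F": 12, "tot": 22},
    {"age": 25, "M": 5, "F": 5, "tot": 10},
    {"age": 100, "M": 1, "F": 2, "tot": 3},
]
grouped = group_population(rows, [25, 50, 75])
print(grouped)
```

Age 25 falls into the 25-50 group rather than <25, consistent with the included lower bound, and age 100 falls into the open-ended last group.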