API
Data provided by the package
Population data
Population data are provided by ISTAT (Istituto Nazionale di Statistica). Starting from version 0.4.0, the database contains data for every age from 0 to 100 years, split into males and females.
Male data have the suffix _M and female data the suffix _F, while the sum of males and females has no suffix at all.
Important
Age 100 contains data for ages ≥ 100 years.
Default label naming rule
Considering population_limits=[25, 50, 75], the default label prefix naming rule will be:
- <25 for the first element.
- 25-50 for the second element.
- 50-75 for the penultimate element.
- >=75 for the last element.
Don't forget that these prefixes are combined with the suffixes _M and _F, giving the following labels:
- <25_M, <25_F and <25 for the first element (males, females, all).
- 25-50_M, 25-50_F and 25-50 for the second element.
- 50-75_M, 50-75_F and 50-75 for the penultimate element.
- >=75_M, >=75_F and >=75 for the last element.
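The naming rule above can be written out as a small function. This is only an illustrative sketch of the rule as described here, not the package's own code; the helper names `default_label_prefixes` and `default_labels` are made up for the example.

```python
def default_label_prefixes(cutoffs):
    """Return one prefix per age group.

    Assumes `cutoffs` is a non-empty, sorted list of unique ints.
    """
    prefixes = [f"<{cutoffs[0]}"]
    prefixes += [f"{lo}-{hi}" for lo, hi in zip(cutoffs, cutoffs[1:])]
    prefixes.append(f">={cutoffs[-1]}")
    return prefixes


def default_labels(cutoffs):
    """Expand each prefix into its male, female and total labels."""
    labels = []
    for prefix in default_label_prefixes(cutoffs):
        labels += [f"{prefix}_M", f"{prefix}_F", prefix]
    return labels


print(default_label_prefixes([25, 50, 75]))  # ['<25', '25-50', '50-75', '>=75']
print(default_labels([25, 50, 75])[:3])      # ['<25_M', '<25_F', '<25']
```

Three cutoffs therefore give four groups and twelve labels.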
Customizing population group cutoffs
Methods that take population_limits as a parameter support three behaviours:
- population_limits='total' - returns the total population with the prefix 'population', so the returned columns are population_M, population_F and population.
- population_limits='auto' - returns the population divided into the default age groups, using the default age cutoffs [3, 11, 19, 25, 50, 65, 75]. The generated groups are named according to the default label naming rule defined above.
- population_limits=[int] - returns the population divided into age groups using the provided cutoffs. population_limits is sorted in ascending order, converted to int, and duplicates are removed. The generated groups are named according to the default label naming rule defined above.
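The three behaviours can be pictured as a single normalization step. The sketch below is an assumption about how such a step could look, not the package's implementation: `resolve_cutoffs` is a hypothetical helper, and representing 'total' as an empty cutoff list is a choice made only for this example.

```python
DEFAULT_CUTOFFS = [3, 11, 19, 25, 50, 65, 75]  # the 'auto' cutoffs listed above


def resolve_cutoffs(population_limits="auto"):
    """Translate a population_limits value into a sorted list of int cutoffs.

    Hypothetical helper: an empty list stands for 'total' (no age split).
    """
    if population_limits == "total":
        return []
    if population_limits == "auto":
        return list(DEFAULT_CUTOFFS)
    if isinstance(population_limits, str):
        raise ValueError("population_limits must be 'total', 'auto' or a list")
    # Convert to int, drop duplicates, sort ascending.
    return sorted({int(value) for value in population_limits})


print(resolve_cutoffs("total"))           # []
print(resolve_cutoffs([50, 25, 75, 25]))  # [25, 50, 75]
```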
Important
Remember that custom cutoffs generate groups whose lower bound is included and whose upper bound is excluded. For example, population_limits=[25, 50, 75] generates the groups [25, 50) and [50, 75). In addition, a group for ages below the first cutoff and a group for ages greater than or equal to the last cutoff are added: in this example [0, 25) and [75, 100], where age 100 stands for 100 years and over.
Warning
Values less than or equal to 0 or greater than 100 are ignored.
Some municipalities have no population data available; for these, italy-geopop returns numpy.nan.
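To make the interval rule concrete, the following sketch drops out-of-range cutoffs and then assigns single ages to half-open groups with the standard-library `bisect` module. It illustrates the behaviour described in this section under stated assumptions; `valid_cutoffs` and `group_index` are hypothetical names, not part of the package.

```python
from bisect import bisect_right


def valid_cutoffs(cutoffs):
    """Drop cutoffs <= 0 or > 100, then de-duplicate and sort."""
    return sorted({int(c) for c in cutoffs if 0 < int(c) <= 100})


def group_index(age, cutoffs):
    """Index of the group [lower, upper) that contains `age`."""
    # bisect_right puts an age equal to a cutoff into the group that
    # starts at that cutoff, so the lower bound is inclusive.
    return bisect_right(cutoffs, age)


cutoffs = valid_cutoffs([75, 0, 25, 50, 120])
print(cutoffs)  # [25, 50, 75]
print([group_index(age, cutoffs) for age in (0, 24, 25, 49, 50, 75, 100)])
# [0, 0, 1, 1, 2, 3, 3]
```

Index 0 is the group below the first cutoff and the highest index is the group from the last cutoff upwards.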
Municipality data
- municipality: str - municipality name, capitalized, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().
- municipality_code: int - municipality ISTAT code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().
- cadastral_code: str - municipality cadastral code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_municipality().
- municipalities: list - a list of dictionaries with the following structure {'municipality_code': <municipality code:int>, 'municipality': <municipality name:str>}, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().
Province data
- province: str - province name, capitalized, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().
- province_short - province abbreviation, uppercase, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().
- province_code: int - province ISTAT code, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_province().
- provinces: list - a list of dictionaries with the following structure {'province_code': <province code:int>, 'province': <province name:str>, 'municipalities': list[{'municipality_code': <municipality code:int>, 'municipality': <municipality name:str>}]}, available only if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().
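Because the provinces field nests municipalities inside provinces, it can help to see how such a structure flattens into one row per municipality. The records below are made-up placeholders that only mimic the documented shape; `flatten_provinces` is a hypothetical helper, not something the package provides.

```python
# Made-up placeholder records shaped like the documented `provinces` list.
provinces = [
    {
        "province_code": 1,
        "province": "Province A",
        "municipalities": [
            {"municipality_code": 1001, "municipality": "Municipality A1"},
            {"municipality_code": 1002, "municipality": "Municipality A2"},
        ],
    },
    {
        "province_code": 2,
        "province": "Province B",
        "municipalities": [
            {"municipality_code": 2001, "municipality": "Municipality B1"},
        ],
    },
]


def flatten_provinces(provinces):
    """Yield one flat dict per municipality, carrying its province fields."""
    for province in provinces:
        for municipality in province["municipalities"]:
            yield {
                "province_code": province["province_code"],
                "province": province["province"],
                **municipality,
            }


rows = list(flatten_provinces(provinces))
print(len(rows))  # 3
```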
Region data
- region: str - region name, capitalized, available if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().
- region_code: int - region ISTAT code, available if data is retrieved using italy_geopop.pandas_extension.ItalyGeopop.from_region().
Geospatial data
- geometry: geometry types - geospatial data needed to plot geography, available only if the accessor was activated with include_geometry=True.
italy_geopop.pandas_extension
- class italy_geopop.pandas_extension.ItalyGeopop(pandas_obj: Any, include_geometry: bool = False, data_year: int | None = None)
Serves as the base for registering italy_geopop as a pandas accessor. You shouldn't initialize it directly.
Instead, from a pandas.Series object you can access its methods via the italy_geopop attribute.
- from_municipality(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame
Get data for municipalities. The input series can contain municipality names, municipality ISTAT codes or municipality cadastral codes (also known as Belfiore's code); data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.
- Parameters:
return_cols (list[str] | None, optional) – used to subset the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string and the regex parameter is True, columns are returned only if their names match the regular expression. The available fields are listed above. Defaults to None.
regex (bool, optional) – if True, return_cols is interpreted as a regex pattern. Defaults to False.
population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used. Defaults to None.
- Raises:
KeyError – if return_cols is or contains a column not listed above, or includes geometry when the accessor was initialized without geometry data.
- Returns:
Requested data in a 2-dimensional dataframe that has the same index as the input data.
- Return type:
pandas.DataFrame
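The interplay between return_cols and regex can be sketched with plain column names. This is a reading of the parameter description above, not the package's code: `select_columns` is a hypothetical helper, and using `re.search` (rather than a full match) is an assumption.

```python
import re


def select_columns(available, return_cols=None, regex=False):
    """Pick column names following the return_cols description."""
    if return_cols is None:
        return list(available)
    is_pattern = isinstance(return_cols, re.Pattern)
    if is_pattern or (isinstance(return_cols, str) and regex):
        pattern = re.compile(return_cols)  # a compiled pattern passes through
        # Assumption: a column is kept if the pattern is found anywhere in it.
        return [name for name in available if pattern.search(name)]
    wanted = [return_cols] if isinstance(return_cols, str) else list(return_cols)
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"columns not available: {missing}")
    return wanted


available = ["municipality", "municipality_code", "<25_M", "<25_F", "<25"]
print(select_columns(available, "municipality"))      # ['municipality']
print(select_columns(available, r"_F$", regex=True))  # ['<25_F']
```

Without regex=True the string is treated as an exact column name, so a name that is not in the list raises KeyError.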
- from_province(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame
Get data for provinces. The input series can contain province names, province abbreviations or province ISTAT codes; data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.
- Parameters:
return_cols (list[str] | None, optional) – used to subset the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string and the regex parameter is True, columns are returned only if their names match the regular expression. The available fields are listed above. Defaults to None.
regex (bool, optional) – if True, return_cols is interpreted as a regex pattern. Defaults to False.
population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used. Defaults to None.
Note
To understand how municipalities are grouped, see Municipality data above.
- Raises:
KeyError – if return_cols is or contains a column not listed above, or includes geometry when the accessor was initialized without geometry data.
- Returns:
Requested data in a 2-dimensional dataframe that has the same index as the input data.
- Return type:
pandas.DataFrame
- from_region(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame
Get data for regions. The input series can contain region names or region ISTAT codes; data types can also be mixed. If an input value is not found in the Italian data, a row of NaNs is returned; this behaviour may change in the future.
- Parameters:
return_cols (list[str] | None, optional) – used to subset the returned data to the requested fields. If None, all available fields are returned. If it is an instance of re.Pattern, or a string and the regex parameter is True, columns are returned only if their names match the regular expression. The available fields are listed above. Defaults to None.
regex (bool, optional) – if True, return_cols is interpreted as a regex pattern. Defaults to False.
population_limits (list[int] | str, optional) – see above; can be a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names; if None, the default label naming rule is used. Defaults to None.
Note
To understand how provinces are grouped, see Province data above.
- Raises:
KeyError – if return_cols is or contains a column not listed above, or includes geometry when the accessor was initialized without geometry data.
- Returns:
Requested data in a 2-dimensional dataframe that has the same index as the input data.
- Return type:
pandas.DataFrame
- get_population_data(level: str = 'municipality', include_geometry: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame
Same as italy_geopop.geopop.Geopop.compose_df().
- smart_from_municipality(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series
Same as from_municipality but can understand more complex text. Values are returned only if the match is unequivocal.
>>> s = pd.Series(["Comune di Abano Terme", "Comune di Airasca", "Comune di Milano o di Verona?", 1001])
>>> s.italy_geopop.smart_from_municipality(return_cols='municipality')
0    Abano Terme
1        Airasca
2            NaN
3          Agliè
Name: municipality, dtype: object
- smart_from_province(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series
Same as from_province but can understand more complex text. Values are returned only if the match is unequivocal.
>>> s = pd.Series(["Citta' di Brescia", "Università degli studi di Milano", "Milano o Verona", 5])
>>> s.italy_geopop.smart_from_province(return_cols='province')
0    Brescia
1     Milano
2        NaN
3       Asti
Name: province, dtype: object
- smart_from_region(return_cols: list | str | Pattern | None = None, regex: bool = False, population_limits: list | str = 'auto', population_labels: list | None = None) DataFrame | Series
Same as from_region but can understand more complex text. Values are returned only if the match is unequivocal.
>>> s = pd.Series(["Regione Lombardia", "Regione del Veneto", "Piemonte o Umbria?", 15])
>>> s.italy_geopop.smart_from_region(return_cols='region')
0    Lombardia
1       Veneto
2          NaN
3     Campania
Name: region, dtype: object
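The "unequivocal match" rule used by the smart_from_* methods can be illustrated with a simple whole-word search: a value is returned only when exactly one known name appears in the text. How the package actually matches text is not documented on this page, so the sketch below is an assumption. `unequivocal_match` is a hypothetical helper and it ignores numeric codes.

```python
import re


def unequivocal_match(text, known_names):
    """Return the single known name found in `text`, else None."""
    found = [
        name
        for name in known_names
        if re.search(rf"\b{re.escape(name)}\b", str(text), flags=re.IGNORECASE)
    ]
    # Zero hits or several hits are both treated as "no answer".
    return found[0] if len(found) == 1 else None


names = ["Milano", "Verona", "Brescia"]
print(unequivocal_match("Citta' di Brescia", names))  # Brescia
print(unequivocal_match("Milano o Verona", names))    # None
```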
- italy_geopop.pandas_extension.pandas_activate(include_geometry=False, data_year: int | None = None)
Activate the pandas extension, registering the class ItalyGeopop as a pandas.Series accessor named italy_geopop.
- Parameters:
include_geometry (bool, optional) – specifies whether the geometry column should also be returned when the accessor is used. Defaults to False.
data_year (int, optional) – year of the data to use; if None, the latest available data is used. Defaults to None.
- Returns:
None
Warning
include_geometry=True comes at a cost in speed, as geospatial datasets need to be loaded.
- italy_geopop.pandas_extension.pandas_activate_context(include_geometry=False, data_year: int | None = None)
Same as pandas_activate, but the accessor lives only within the context. Useful if you want to register the accessor with different initialization options more than once in your code, or if you want to free up memory right after you get the needed data (the trade-off is that italy-geopop needs to be reinitialized every time you register and use the accessor).
- Parameters:
include_geometry – same as in pandas_activate.
data_year – same as in pandas_activate.
- Yields:
Context with the italy_geopop accessor registered to pd.Series.
# pandas_activate_context example
from italy_geopop.pandas_extension import pandas_activate_context

with pandas_activate_context():
    # You can access italy_geopop here
    ...
# You cannot access italy_geopop here
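The register-then-clean-up pattern behind a context like this can be sketched with the standard library alone. This page does not show how italy-geopop implements it, so the code below is a generic illustration: `Series` is a stand-in class and `activate_context` is a hypothetical helper.

```python
from contextlib import contextmanager


class Series:
    """Stand-in for pandas.Series so the sketch runs without pandas."""


@contextmanager
def activate_context(cls, name, accessor):
    """Attach `accessor` to `cls` as `name`, and detach it on exit."""
    setattr(cls, name, accessor)
    try:
        yield
    finally:
        # Runs even if the body raises, so the attribute never leaks.
        delattr(cls, name)


with activate_context(Series, "italy_geopop", object()):
    inside = hasattr(Series, "italy_geopop")
outside = hasattr(Series, "italy_geopop")
print(inside, outside)  # True False
```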
italy_geopop.geopop
- class italy_geopop.geopop.Geopop(data_year: int | None = None)
Bases: object
A class that contains Italian geospatial and population data.
- Parameters:
data_year (int, optional) – the year of the data you need; if None, the latest is picked automatically. Defaults to None.
- Raises:
ValueError – if data_year is not available.
- compose_df(level='municipality', include_geometry=False, population_limits: str | list = 'auto', population_labels: list | None = None)
Method to get a dataframe with administrative, geospatial and population data.
- Parameters:
level (str, optional) – the level of detail of the dataframe; can be municipality, province or region. Defaults to 'municipality'.
include_geometry (bool, optional) – if True, the dataframe will include geospatial data. Defaults to False.
population_limits (str | list, optional) – a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list | None, optional) – a list of str that defines the label names. Defaults to None.
- get_italian_population_for_municipalites(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame
Method to get Italian population data for municipalities.
- Parameters:
population_limits (str | list, optional) – a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names. Defaults to None.
- Raises:
ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].
- Returns:
a 2-dimensional dataframe with municipality_code as index and columns determined by population_limits and population_labels; see above for more information.
- Return type:
pd.DataFrame
- get_italian_population_for_provinces(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame
Method to get Italian population data for provinces.
- Parameters:
population_limits (str | list, optional) – a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names. Defaults to None.
- Raises:
ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].
- Returns:
a 2-dimensional dataframe with province_code as index and columns determined by population_limits and population_labels; see above for more information.
- Return type:
pd.DataFrame
- get_italian_population_for_regions(population_limits: str | list = 'auto', population_labels: list | None = None) DataFrame
Method to get Italian population data for regions.
- Parameters:
population_limits (str | list, optional) – a list of int, 'total' or 'auto'. Defaults to 'auto'.
population_labels (list[str] | None, optional) – a list of strings that defines the label names. Defaults to None.
- Raises:
ValueError – if population_limits is not a list of int or a string in ['total', 'auto'].
- Returns:
a 2-dimensional dataframe with region_code as index and columns determined by population_limits and population_labels; see above for more information.
- Return type:
pd.DataFrame
- property italy_municipalities: DataFrame
Property to get Italian municipalities data.
- Returns:
a 2-dimensional dataframe with municipality_code as index and municipality, province_code, province, province_short, region, region_code as columns.
- Return type:
pd.DataFrame
- property italy_municipalities_geometry: DataFrame
Property to get geospatial data for plotting municipalities.
- Returns:
a 2-dimensional dataframe with municipality_code as index and geometry as column.
- Return type:
pd.DataFrame
- property italy_provinces: DataFrame
Property to get Italian provinces data.
Note
To understand how municipalities are grouped, see Municipality data above.
- Returns:
a 2-dimensional dataframe with province_code as index and province, province_short, municipalities, region, region_code as columns.
- Return type:
pd.DataFrame
- property italy_provinces_geometry: DataFrame
Property to get geospatial data for plotting provinces.
- Returns:
a 2-dimensional dataframe with province_code as index and geometry as column.
- Return type:
pd.DataFrame
- property italy_regions: DataFrame
Property to get Italian regions data.
Note
To understand how provinces are grouped, see Province data above.
- Returns:
a 2-dimensional dataframe with region_code as index and region, provinces as columns.
- Return type:
pd.DataFrame
- property italy_regions_geometry: DataFrame
Property to get geospatial data for plotting regions.
- Returns:
a 2-dimensional dataframe with region_code as index and geometry as column.
- Return type:
pd.DataFrame
- property population_df: DataFrame
Property to get Italian population data.
- Returns:
a 2-dimensional dataframe with municipality_code as index and population data in long format (columns: age, F, M and tot).
- Return type:
pd.DataFrame
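To show how long-format rows (one row per municipality and age, with F, M and tot counts) relate to the grouped columns described earlier, here is a small standard-library sketch. The numbers are made up for illustration, and the aggregation is only an assumption about the general idea, not the package's implementation.

```python
from bisect import bisect_right
from collections import defaultdict

# Made-up long-format rows: (municipality_code, age, F, M, tot).
rows = [
    (1, 10, 5, 6, 11),
    (1, 30, 7, 8, 15),
    (1, 80, 4, 2, 6),
    (2, 30, 1, 1, 2),
]
cutoffs = [25, 50, 75]
prefixes = ["<25", "25-50", "50-75", ">=75"]  # default label prefixes

# wide[municipality_code][label] accumulates the counts of each age group.
wide = defaultdict(lambda: defaultdict(int))
for code, age, females, males, total in rows:
    prefix = prefixes[bisect_right(cutoffs, age)]
    wide[code][f"{prefix}_F"] += females
    wide[code][f"{prefix}_M"] += males
    wide[code][prefix] += total

print(wide[1]["25-50"], wide[1][">=75_F"], wide[2]["25-50_M"])  # 15 4 1
```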