SVInsight
A Census API key is needed to run this package. You can obtain an API key from the Census Bureau’s developer page.
- class svinsight.svi.SVInsight(project_name: str, file_path: str, api_key: str, geoids: list[str])
A class to calculate a Social Vulnerability Index.
- __init__(project_name: str, file_path: str, api_key: str, geoids: list[str])
Initialize the SVInsight class.
- Parameters:
project_name (str) – The name of the project. Will be used in file structure and names of saved files.
file_path (str) – The file path where the project will be saved.
api_key (str) – The Census API key for accessing data.
geoids (list[str]) – A list of geographic identifiers (GEOIDs). All entries must have the same length, either 2 or 5.
- Raises:
FileNotFoundError – If the file path does not exist.
ValueError – If the GEOIDs are not all of the same length, either 2 or 5.
TypeError – If a value is not of the expected type.
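The GEOID rules above can be checked before constructing the class. The following is a minimal sketch of those documented rules, not the package's own validation code; `check_geoids` is a hypothetical helper name, and the example codes are ordinary state and county FIPS codes used only for illustration.

```python
def check_geoids(geoids):
    """Hypothetical helper mirroring the documented GEOID rules."""
    if not isinstance(geoids, list) or not all(isinstance(g, str) for g in geoids):
        raise TypeError("geoids must be a list of strings")
    lengths = {len(g) for g in geoids}
    # Every GEOID must share one length, and that length must be 2 or 5.
    if len(lengths) != 1 or lengths.pop() not in (2, 5):
        raise ValueError("GEOIDs must all be length 2 or all be length 5")
    return geoids


state_level = check_geoids(["48", "22"])         # two-character codes
county_level = check_geoids(["48453", "48491"])  # five-character codes
```

Mixing lengths, for example `["48", "48453"]`, raises `ValueError` in this sketch, which matches the documented behavior of the constructor.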
- __weakref__
list of weak references to the object
- add_variable(boundary: str, year: int, name: str, num: list, den: list = [1], description: str = None)
Add an additional variable and collect the necessary raw data.
- Parameters:
boundary (str) – The boundary type for the variable. Should be either ‘bg’ or ‘tract’.
year (int) – The year for which the raw data is collected.
name (str) – The name of the variable.
num (list) – The list of numerator variables used to calculate the variable.
den (list, optional) – The list of denominator variables used to calculate the variable. Default is [1].
description (str, optional) – Optional description of the variable. Default is None.
- Raises:
ValueError – If the variable name already exists.
ValueError – If the boundary type is invalid or the year is not between 2013 and 2022.
FileNotFoundError – If the raw data file doesn’t exist. Run the census_data method first.
- Returns:
None
- Return type:
None
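To illustrate how the `num` and `den` lists could combine into a single value, here is a rough sketch. It assumes the numerator fields are summed and divided by the summed denominator fields, and that the default `[1]` acts as a constant divisor; the documentation above does not spell out this arithmetic, so treat it as an assumption. The function name and the field names are hypothetical.

```python
def ratio_variable(row, num, den=(1,)):
    """Sum numerator fields and divide by summed denominator fields.

    Assumption: a literal number in either list (such as the default 1)
    is used as a constant rather than looked up as a field name.
    """
    def total(terms):
        return sum(t if isinstance(t, (int, float)) else row[t] for t in terms)

    denominator = total(den)
    if denominator == 0:
        return None  # undefined ratio; the package's handling is not documented
    return total(num) / denominator


# Hypothetical raw fields for one block group.
row = {"POP_UNDER5": 120, "POP_OVER65": 180, "POP_TOTAL": 1500}

share = ratio_variable(row, ["POP_UNDER5", "POP_OVER65"], ["POP_TOTAL"])  # 300 / 1500
count = ratio_variable(row, ["POP_UNDER5"])  # default denominator of 1
```

Under this reading, leaving `den` at its default yields a plain count, while passing a population field yields a proportion.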
- boundaries_data(boundary: str = 'bg', year: int = 2019, overwrite: bool = False) → GeoDataFrame
Pulls block group or tract data from the Census FTP site.
- Parameters:
boundary (str) – The type of boundary. Defaults to ‘bg’. Acceptable values are ‘bg’, or ‘tract’.
year (int) – The year of the data. Defaults to 2019.
overwrite (bool) – Whether to overwrite an existing geopackage. Defaults to False.
- Returns:
The boundary data as a GeoDataFrame.
- Return type:
gpd.GeoDataFrame
- Raises:
ValueError – If the boundary type is invalid, the year is not between 2013 and 2022, or the GEOIDs are not properly formatted.
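Several methods on this page raise `ValueError` for the same two conditions. A minimal sketch of that check, assuming the 2013 to 2022 range is inclusive at both ends (the text does not say), could look like this; `check_boundary_and_year` is a hypothetical name and not part of the package.

```python
VALID_BOUNDARIES = ("bg", "tract")
FIRST_YEAR, LAST_YEAR = 2013, 2022  # assumed inclusive


def check_boundary_and_year(boundary, year):
    """Raise ValueError for an unsupported boundary type or year."""
    if boundary not in VALID_BOUNDARIES:
        raise ValueError(f"boundary must be one of {VALID_BOUNDARIES}, got {boundary!r}")
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be between {FIRST_YEAR} and {LAST_YEAR}, got {year}")
    return boundary, year


check_boundary_and_year("bg", 2019)     # passes
check_boundary_and_year("tract", 2022)  # passes
```

Running a check like this up front avoids starting a download that would later fail on an unsupported argument.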
- calculate_svi(config_file: str, boundary: str = 'bg', year: int = 2019)
Calculate the Social Vulnerability Index (SVI) using two different methods.
- Parameters:
config_file (str) – The name of the configuration file (without the extension) containing the SVI variables.
boundary (str) – The boundary type for the SVI calculation. Default is ‘bg’.
year (int) – The year for which the SVI is calculated. Default is 2019.
- Returns:
None
- Raises:
ValueError – If the boundary type is invalid or the year is not between 2013 and 2022.
This method reads a configuration file in YAML format, loads the raw data as a dataframe, calculates the SVI using two different methods, and saves the results to output files.
Method 1: Iterative Factor Analysis
Conducts factor analysis on the input variables to identify significant components.
Scales the data and calculates initial loading factors.
Iteratively refactors the data based on the Kaiser Criterion until all significant variables are included.
Calculates the SVI using the scaled factors and the ratio of variance.
Appends the SVI variables to the output dataframe.
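The Kaiser Criterion retains components whose eigenvalue exceeds 1. The sketch below shows only that selection step and one plausible way to weight the retained components by their share of variance; it does not reproduce the iterative refactoring described above, and the weighting formula is an assumption rather than the package's documented calculation. Both function names are hypothetical.

```python
def kaiser_weights(eigenvalues):
    """Keep components with eigenvalue > 1 and weight each by its variance share."""
    kept = {i: ev for i, ev in enumerate(eigenvalues) if ev > 1.0}
    total = sum(kept.values())
    return {i: ev / total for i, ev in kept.items()}


def composite_score(factor_scores, weights):
    """Variance-weighted sum of the retained factor scores for one observation."""
    return sum(weights[i] * factor_scores[i] for i in weights)


eigenvalues = [3.0, 1.5, 0.9, 0.6]           # illustrative values only
weights = kaiser_weights(eigenvalues)         # components 0 and 1 are retained
score = composite_score([0.9, 0.3, 2.0, 1.0], weights)
```

Components 2 and 3 fall below the threshold here, so their factor scores do not contribute to the composite.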
Method 2: Rank Method
Ranks the input variables in descending order for each observation.
Calculates the sum of ranks for each observation.
Calculates the SVI using the rank sum.
Appends the SVI variables to the output dataframe.
The output dataframe is saved as a GeoPackage file and a CSV file. Additionally, intermediate results from Method 1 such as significant components, loading factors, and variances are saved in an Excel file for documentation purposes.
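The Rank Method steps can be sketched in plain Python as follows. This is an illustration under stated assumptions: ties are not handled specially, and the final rescaling of the rank sum to a 0 to 1 score is a guess, since the text only says the SVI is calculated "using the rank sum". The function name and the variable names in the sample data are hypothetical.

```python
def rank_method(table):
    """table maps an observation id to a {variable: value} dict."""
    ids = list(table)
    variables = list(next(iter(table.values())))
    rank_sum = dict.fromkeys(ids, 0)
    for var in variables:
        # Descending order: the largest value receives rank 1.
        ordered = sorted(ids, key=lambda obs: table[obs][var], reverse=True)
        for position, obs in enumerate(ordered, start=1):
            rank_sum[obs] += position
    low, high = min(rank_sum.values()), max(rank_sum.values())
    span = (high - low) or 1  # avoid dividing by zero when all sums are equal
    # Assumed scaling: the smallest rank sum maps to 1.0, the largest to 0.0.
    score = {obs: (high - total) / span for obs, total in rank_sum.items()}
    return rank_sum, score


table = {
    "A": {"QPOVTY": 0.30, "QUNEMP": 0.10},
    "B": {"QPOVTY": 0.10, "QUNEMP": 0.05},
    "C": {"QPOVTY": 0.20, "QUNEMP": 0.20},
}
rank_sum, score = rank_method(table)
```

Observation B ranks last on both variables, so it has the largest rank sum and the lowest score in this sketch.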
- census_data(boundary: str = 'bg', year: int = 2019, interpolate: bool = True, verbose: bool = False, overwrite: bool = False)
Pulls Census data for a specific boundary and year. Requests to the Census API occasionally fail; waiting a few seconds to a few minutes and re-running the method usually resolves the issue.
- Parameters:
boundary (str) – The boundary type to retrieve data for. Valid options are ‘bg’ (block group) and ‘tract’ (census tract).
year (int) – The year of the Census data to retrieve. Valid options are from 2013 to 2022.
interpolate (bool, optional) – Whether to interpolate missing data. Defaults to True. Ignored (no interpolation is performed) if year is before 2014.
verbose (bool, optional) – Whether to display verbose output. Defaults to False.
overwrite (bool, optional) – Whether to overwrite existing data. Defaults to False.
- Raises:
ValueError – If the boundary type is invalid or the year is not between 2013 and 2022.
FileNotFoundError – If the shapefile for the specified boundary and year does not exist.
- Returns:
None
- Return type:
None
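Because the advice for a failed Census API request is to wait and run the method again, a small retry wrapper can automate that. The helper below is a generic sketch and not part of the package; `run_with_retries` is a hypothetical name, and it catches every `Exception` only because the specific error raised on an API failure is not documented here.

```python
import time


def run_with_retries(task, attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call task() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(wait_seconds * attempt)  # wait a little longer each time


# Intended use, assuming `project` is an SVInsight instance:
# run_with_retries(lambda: project.census_data(boundary="bg", year=2019))
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests; in normal use the default `time.sleep` applies.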
- configure_variables(config_file: str, exclude: list = None, include: list = None, inverse_vars: list = ['PERCAP', 'QRICH', 'MDHSEVAL'])
Configure variables and save them to a YAML file.
- Parameters:
config_file (str) – The name of the configuration file.
exclude (list, optional) – A list of variable names to exclude from the configuration. Defaults to None.
include (list, optional) – A list of variable names to include in the configuration; all other variables are left out. Defaults to None.
inverse_vars (list, optional) – A list of variable names to be inverted. Defaults to [‘PERCAP’, ‘QRICH’, ‘MDHSEVAL’].
- Returns:
None
- Raises:
ValueError – If exclude and include arguments are both passed.
ValueError – If a variable in the exclude list is not available in self.all_vars_eqs.
ValueError – If a variable in the include list is not available in self.all_vars_eqs.
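The three `ValueError` conditions describe a simple selection rule: `exclude` and `include` are mutually exclusive, and every requested name must be an available variable. A minimal sketch of that rule, using a plain list in place of `self.all_vars_eqs` and a hypothetical function name, might be:

```python
def select_variables(available, exclude=None, include=None):
    """Return the variable names that would go into the configuration."""
    if exclude is not None and include is not None:
        raise ValueError("pass either exclude or include, not both")
    requested = exclude if exclude is not None else (include or [])
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"unknown variables: {unknown}")
    if include is not None:
        return [name for name in available if name in include]
    return [name for name in available if name not in (exclude or [])]


# QPOVTY is a made-up name; the other three are the documented inverse_vars defaults.
available = ["PERCAP", "QRICH", "MDHSEVAL", "QPOVTY"]
without_rich = select_variables(available, exclude=["QRICH"])
only_two = select_variables(available, include=["PERCAP", "QPOVTY"])
```

With neither argument supplied, the sketch returns every available variable.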
- plot_svi(plot_option: int, geopackages: list)
Simple plotting method to quickly map an SVI variable or compare two SVIs.
- Parameters:
plot_option (int) – Which plot method to use: Either 1 (single SVI map), 2 (two side by side maps), or 3 (full comparison figure).
geopackages (list) – The required information for plotting. Must be in the format [year, boundary, config, variable] for plot_option 1, a nested list of two such entries for plot_option 2, and [year, boundary, config] for plot_option 3.
- Returns:
matplotlib figure object
- Raises:
ValueError – If the boundary type is invalid or the year is not between 2013 and 2022.
This method quickly creates an example SVI plot either by itself or in a comparative format. The plot options and their required information can be found below.
Plot Option 1: Single Plot
A single figure of a single SVI estimate. The geopackage parameter must be in the format [year, boundary, config, variable] where:
Year: SVI estimate year (int)
Boundary: Boundary of interest (‘bg’ or ‘tract’, str)
Config: Which config file was used to create the SVI estimate (str)
Variable: Which SVI variable to plot (i.e., the attributes of the SVI geopackages created, str).
Plot Option 2: Simple Comparative Plot
A simple two-by-one figure to visually compare two different SVI estimates. These estimates can be from the same or different geopackages. The geopackages parameter should be a nested list of the same variables as described in plot option 1: [[year, boundary, config, variable], [year, boundary, config, variable]].
Plot Option 3: Complete Comparative Plot
A more detailed plotting option that produces a difference plot and a linear regression. Because both require the same set of input GEOIDs (i.e., the same locations in the geopackage), the variables must currently come from the same geopackage. This option is therefore intended for comparing the Factor Analysis and Rank Method results produced from the same configuration. The geopackages input should be formatted as follows: [year, boundary, config]. The additional plots show the following information:
Difference plot: Shows FA_SVI_Rank minus RM_SVI_Rank to highlight areas where the factor analysis method under-predicts (negative) or over-predicts (positive) SVI rank relative to the rank method.
Linear regression: Shows the linear correlation between the factor analysis and rank method SVI estimates and automatically computes an r-squared value with p-value, 95% confidence interval, and 95% prediction interval.
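The shape of the `geopackages` argument differs for each plot option, which is easy to get wrong. The sketch below builds one example list per option and checks its shape; `check_geopackages` is a hypothetical helper and `my_config` is a placeholder configuration name, while `FA_SVI_Rank` and `RM_SVI_Rank` are the attribute names mentioned in the difference plot description.

```python
def check_geopackages(plot_option, geopackages):
    """Verify that geopackages has the shape described for the plot option."""
    def is_entry(item, size):
        return isinstance(item, list) and len(item) == size and isinstance(item[0], int)

    if plot_option == 1:
        ok = is_entry(geopackages, 4)
    elif plot_option == 2:
        ok = (isinstance(geopackages, list) and len(geopackages) == 2
              and all(is_entry(entry, 4) for entry in geopackages))
    elif plot_option == 3:
        ok = is_entry(geopackages, 3)
    else:
        raise ValueError("plot_option must be 1, 2, or 3")
    if not ok:
        raise ValueError(f"geopackages has the wrong shape for plot_option {plot_option}")
    return geopackages


single = check_geopackages(1, [2019, "bg", "my_config", "FA_SVI_Rank"])
pair = check_geopackages(2, [[2019, "bg", "my_config", "FA_SVI_Rank"],
                             [2019, "bg", "my_config", "RM_SVI_Rank"]])
full = check_geopackages(3, [2019, "bg", "my_config"])
```

Each validated list could then be passed as the second argument to `plot_svi` with the matching `plot_option`.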
- var_descriptions(vars: list = None)
Print the descriptions of the variables.
- Parameters:
vars (list, optional) – List of variables to print descriptions for. If not provided, descriptions for all variables are printed.
- Raises:
ValueError – If any variable in vars is not an available variable.
- Returns:
None
- Return type:
None
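Putting the methods on this page together, a typical run might look like the sketch below. The call order is inferred from the documented `FileNotFoundError` conditions (boundaries before Census data, Census data before the SVI calculation) and is not an official recipe. The project name, configuration name, and county GEOID are placeholders, and `run_workflow` is a hypothetical wrapper.

```python
def run_workflow(api_key, project_dir):
    """Hypothetical end-to-end run using the signatures documented above."""
    # Imported here so the sketch can be read without the package installed.
    from svinsight.svi import SVInsight

    project = SVInsight(
        project_name="example_project",  # placeholder name
        file_path=project_dir,           # must already exist
        api_key=api_key,
        geoids=["48453"],                # one five-character GEOID
    )
    project.boundaries_data(boundary="bg", year=2019)
    project.census_data(boundary="bg", year=2019, interpolate=True)
    project.var_descriptions()           # print all available variables
    project.configure_variables("example_config")
    project.calculate_svi("example_config", boundary="bg", year=2019)
    # Option 1: a single map of the factor analysis rank.
    return project.plot_svi(1, [2019, "bg", "example_config", "FA_SVI_Rank"])
```

Calling `run_workflow` requires a valid Census API key and network access, so it is defined here but not executed.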