Plotting Data on an Interactive Choropleth Map Using Python, GeoPandas, and Folium

Mar 7, 2018 09:00 · 752 words · 4 minutes read geopandas choropleth data-visualization geo-spatial folium census

Choropleth maps are a great way to visualize geo-spatial data, and luckily for us, there are several great packages to do this in Python. In this tutorial, I will be using Folium, but you can check out my high-level comparison of the other options in the Appendix.

import io
import requests
import zipfile

import folium
import geopandas as gpd
import pandas as pd
import seaborn as sns

You can navigate the various Census TIGER files through the website: https://www.census.gov/geo/maps-data/data/tiger.html
or the FTP site: https://www2.census.gov/geo/tiger/TIGER_DP/
I believe all of the Census TIGER files are zipped up in geodatabase format, so it’s fairly easy to pass the data into GeoPandas after it has been unzipped.

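url = "https://www2.census.gov/geo/tiger/TIGER_DP/2015ACS/ACS_2015_5YR_COUNTY.gdb.zip"
r = requests.get(url)
r.raise_for_status()  # fail early if the download did not succeed
z = zipfile.ZipFile(io.BytesIO(r.content))
# assumes every entry sits under one top-level .gdb folder; take its name from the first entry
zip_output_path = z.namelist()[0].split('/', 1)[0]
z.extractall()  # extracts into the current working directory
gdf = gpd.read_file(zip_output_path)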
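
The path handling above relies on all of the archive's entries sitting under one top-level folder. Here is a minimal sketch of that step on a small in-memory archive; the folder name `example.gdb` and the file names inside it are placeholders, not the real Census layout:

```python
import io
import zipfile

# Build a tiny in-memory archive; "example.gdb" and its files are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.gdb/a0000001.gdbtable", b"")
    archive.writestr("example.gdb/gdb", b"")

z = zipfile.ZipFile(io.BytesIO(buffer.getvalue()))
# Every entry starts with the folder name, so splitting the first entry on
# its first "/" recovers the directory that extractall() would create.
top_level = z.namelist()[0].split("/", 1)[0]
# Check the assumption that all entries share that one folder.
single_root = all(name.split("/", 1)[0] == top_level for name in z.namelist())
```

If `single_root` came back False for a real download, the first entry would not be a safe guide to the path that should be passed to `gpd.read_file`.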

The column headers can be a little cryptic. The document below gives us a basic data dictionary, but unfortunately it's only available as a PDF: https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2017/TGRSHP2017_TechDoc.pdf

Let's take a quick look at the data:

gdf.iloc[0] # print the first record
STATEFP                                                        31
COUNTYFP                                                      039
COUNTYNS                                                 00835841
GEOID                                                       31039
NAME                                                       Cuming
NAMELSAD                                            Cuming County
LSAD                                                           06
CLASSFP                                                        H1
MTFCC                                                       G4020
CSAFP                                                            
CBSAFP                                                           
METDIVFP                                                         
FUNCSTAT                                                        A
ALAND                                                  1.4779e+09
AWATER                                                1.04474e+07
INTPTLAT                                              +41.9158651
INTPTLON                                             -096.7885168
Shape_Length                                              1.62443
Shape_Area                                               0.161524
GEOID_Data                                           05000US31039
geometry        (POLYGON ((-97.01951600006333 42.0040969998767...
Name: 0, dtype: object

For the purposes of a choropleth, we want some numeric metric to plot. I calculate the percentage of each county's area that is water, which is the metric we will plot later.

We also need to specify our colors, and I am doing this manually using seaborn color palettes and a 'style' column on our GeoDataFrame. The 'style' column is special, because folium will look for a column named 'style' to provide the style attributes (similar to the 'geometry' column).

quantiles = 20
gdf['PCT_WATER'] = gdf['AWATER'] / (gdf['AWATER'] + gdf['ALAND'])
# assign quantiles and colors for each county
gdf['quantile'] = pd.qcut(gdf['PCT_WATER'], quantiles, labels=False)
colors = sns.color_palette("coolwarm", quantiles).as_hex()
# coolwarm runs from blue to red, so reverse the index to make high-water counties blue
gdf['style'] = gdf['quantile'].apply(
    lambda q: {
        'fillColor': colors[quantiles-1-int(q)],
        'fillOpacity': 0.7,
        'weight': 2,
        'color': 'black'})
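
To see what the binning and the color lookup do, here is a small sketch on made-up numbers. The eight values and the four-color list are placeholders chosen for illustration (the code above uses 20 quantiles and a seaborn palette); only `pd.qcut` and the reversed index mirror the real code:

```python
import pandas as pd

# Made-up water percentages for eight hypothetical counties.
pct_water = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
quantiles = 4

# labels=False returns the bin number (0 = lowest values) instead of an interval.
bins = pd.qcut(pct_water, quantiles, labels=False)

# Placeholder palette ordered blue -> red, standing in for "coolwarm".
colors = ['#3b4cc0', '#aac7fd', '#f7b89c', '#b40426']
# Reverse the lookup so the highest-water bin gets the first (blue) color.
fill = [colors[quantiles - 1 - int(b)] for b in bins]
```

Here the two driest counties end up with the red end of the palette and the two wettest with the blue end.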

One of the limitations of folium in a Jupyter notebook is that you might get an IOPub data rate error if you try to load too many geometries. You can raise your IOPub data rate limit, or you can simply load less data. In this case I am going to load less data by selecting just the counties in NY state (FIPS code 36).

STATE_CODE = '36' # You can pick any state code, but for this demo I will just use NY, state code 36
lat_avg = gdf[gdf['STATEFP'] == STATE_CODE]['INTPTLAT'].astype(float).mean()
lon_avg = gdf[gdf['STATEFP'] == STATE_CODE]['INTPTLON'].astype(float).mean()
m = folium.Map([lat_avg, lon_avg], zoom_start=6, tiles='cartodbpositron')
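
The `.astype(float)` calls are needed because the interior-point columns are stored as signed, zero-padded strings. A quick sketch in plain Python shows that `float()` parses that format directly; the first pair of values comes from the record printed earlier, and the second pair is made up for illustration:

```python
# INTPTLAT / INTPTLON arrive as strings such as these (first pair is from the
# record printed earlier; the second pair is made up for illustration).
lats = ['+41.9158651', '+43.0000000']
lons = ['-096.7885168', '-075.0000000']

# float() accepts both the leading "+" and the zero padding.
lat_avg = sum(float(v) for v in lats) / len(lats)
lon_avg = sum(float(v) for v in lons) / len(lons)
center = [lat_avg, lon_avg]  # [lat, lon], the order passed to folium.Map above
```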

By default, the geometries that you get from the Census Bureau are very detailed. This can cause problems loading in Chrome, so I use the GeoPandas simplify function to reduce the complexity of the geometries while still preserving the overall shape. Setting the tolerance takes a little trial and error. It is expressed in the units of the coordinates (degrees here), and a higher tolerance produces simpler geometry; in this particular example, I found that 0.01 worked well. Setting preserve_topology to False speeds up the operation and produces simpler geometries, at the cost that some simplified shapes may come out invalid.

gdf['geometry'] = gdf['geometry'].simplify(0.01, preserve_topology=False)
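
To build intuition for the tolerance parameter, here is a minimal pure-Python sketch of Douglas-Peucker line simplification, which is the algorithm I understand Shapely to use when preserve_topology is False. It is an illustration on a made-up line, not the library's implementation, and the function names are my own:

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the two endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep that vertex and simplify each half separately.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.005), (2, 0), (3, 1), (4, 0)]  # made-up coordinates
fine = douglas_peucker(line, 0.01)   # drops only the near-collinear vertex
coarse = douglas_peucker(line, 2.0)  # collapses the line to its endpoints
```

With the small tolerance only the barely perceptible bump at (1, 0.005) disappears, while the large tolerance also flattens the peak at (3, 1). That is the trade-off behind the trial and error mentioned above.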

One limitation of folium is the popup functionality. Other choropleth tools like Plotly and Bokeh provide very easy out-of-the-box popups, but the functionality is a bit more limited in folium, where a popup is set for the entire GeoJSON object. We can work around this by creating a separate GeoJSON object for each row in our GeoDataFrame:

for i in range(len(gdf)):
    if gdf['STATEFP'][i] == STATE_CODE:
        gs = folium.GeoJson(gdf.iloc[i:i+1])
        label = '{}: {}% water'.format(
            gdf['NAME'][i], round(gdf['PCT_WATER'][i]*100, 1))
        folium.Popup(label).add_to(gs)
        gs.add_to(m)
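
The same one-object-per-row idea can be sketched without folium on a plain GeoJSON dictionary. The helper name and the single sample feature (built from the rounded ALAND and AWATER values in the first record, with the geometry left empty) are illustrative only:

```python
def labelled_features(feature_collection):
    """Yield (label, single-feature collection) pairs, one per feature."""
    for feature in feature_collection['features']:
        props = feature['properties']
        label = '{}: {}% water'.format(
            props['NAME'], round(props['PCT_WATER'] * 100, 1))
        yield label, {'type': 'FeatureCollection', 'features': [feature]}

aland, awater = 1.4779e9, 1.04474e7  # rounded values from the first record
sample = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'geometry': None,  # geometry omitted; GeoJSON allows a null geometry
        'properties': {'NAME': 'Cuming',
                       'PCT_WATER': awater / (awater + aland)},
    }],
}

pairs = list(labelled_features(sample))
```

Each pair could then be handed to `folium.GeoJson` and `folium.Popup` in the same way as the loop above does for rows of the GeoDataFrame.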

Finally, we can display our map object! High water area counties are shown in blue, while high land area counties are shown in red. Try clicking on a county to see the exact water percentage!

m

Appendix

Folium leverages Leaflet.js on the front end, making it one of the most interactive and visually appealing options for plotting geographic data in my opinion. The drawback is that it lacks some of the functionality that the other packages offer.

Plotly is another great interactive option, with great popup functionality. Plotly uses a freemium model, and if you use the free version, your plot will link back to the corporate Plotly website.

Bokeh is very similar to plotly, but not quite as clean. Bokeh does not include the corporate link.

GeoPandas offers some really easy-to-use plotting functionality, but lacks the interactive functionality that the other packages offer.