Analyzing raw survey data and giving insights to non-technical stakeholders. 

State Farm Summer Innovation Research Fellowship | Summer 2019


Every 10 years, the City of Bloomington conducts a survey to determine the “slum and blight” areas in the city and demonstrate the need for CDBG (Community Development Block Grant) funding to the U.S. Department of Housing and Urban Development (HUD) via the Buildings and Conditions Report, which summarizes the study from this survey. To be eligible for funding, every CDBG-funded activity must qualify as meeting one of HUD’s three national objectives.

The three objectives are:

• Benefiting low- and moderate-income persons
• Preventing or eliminating slums or blight, or
• Meeting other community development needs that have a particular urgency because existing conditions pose a serious and immediate threat to the health and welfare of the community when financial resources are not available to meet such needs.

In 2008, the city determined the highlighted area above, bounded by Morris Avenue on the west, Taylor Street on the south, Lee Street on the east, and Locust Street on the north, to be “slum and blight”. The survey was done again in 2018, but the city had not been able to analyze the data and publish a report by the time of this fellowship (summer 2019).

West Bloomington Revitalization Project (WBRP), the non-profit I worked with through this fellowship, was formed in 2008 to help revitalize the area and bring it up to par with the rest of the city. My supervisor for this research is one of the board members of WBRP, and I wanted to enable them to make data-driven decisions and redirect their efforts where they could have maximum impact.


Challenge #1: There was no raw survey data from ’08 to compare with 2018’s.


We just had the report. Any data collected back then had not been stored digitally for easy access in the future. Recognizing this, some of the questions I wanted to answer by the end of this project had less to do with comparing the two datasets and more to do with determining where we stood right now and how much work was needed to get where we wanted to be.

Where does West Bloomington stand now?
Which aspects of housing have improved over the past decade with the help of WBRP and the City of Bloomington? And by how much?
Can we identify areas that could benefit from a particular type of programming, for example workshops for household repairs?


Through an extensive literature review, I learnt more about the efforts of the City and other non-profits to revitalize West Bloomington.


CDBG funds are allocated on a formula basis. Each year, CDBG funds are distributed to state and local governments according to their population, poverty, and other housing variables. Annual funding priorities are set by each community, subject to HUD eligibility, based on identified needs and priorities. In recent years, Bloomington and Normal have allocated their funds to a variety of programs, including homeownership assistance, residential rehabilitation and infrastructure improvements.

The 2008 Buildings and Conditions Report concluded by proposing that the funding from HUD (CDBG) be focused on “improvement to building and/or infrastructure conditions include activities such as: clearance, rehabilitation (i.e. costs of labor, materials, supplies for the rehabilitation of property, including repair or replacement of principal fixtures and components of existing structures, financing, refinancing, security devices, conservation, water and sewer, barrier removal, historic preservation, lead-based paint hazard evaluation and reduction,) public facilities and capital improvements (i.e. streets, sidewalks, curbs and gutters, parks, playgrounds, water and sewer lines, flood and drainage improvements, and utility lines.)”


Challenge #2: The rating scale in ’08 was different from the one in ’18. While in ’18 the rating scale was 1-4 (with 0 recorded for N/A), in ’08 it was 1-6. The number of components differed as well.


During the summer of 2008, twelve building components were ranked to evaluate the residential buildings (both single family and rental dwellings) within the West Bloomington Neighborhood Plan area. The twelve components include: foundation, roof, exterior wall, windows, screens/storms, chimney tower, porch, porch steps, guttering, sidewalk/driveway, garage, and accessory structure. Each of these items was given a rank between one (1) and six (6), with one being in the best condition and six being in the worst condition.

During the summer of 2018, thirteen building components were ranked to evaluate the residential buildings (both single family and rental dwellings) within the West Bloomington Neighborhood Plan area. The thirteen components include: foundation, roof, siding, windows, screens/storms, chimney, porch, entry stairs, guttering, sidewalk, driveway, public sidewalk, and accessory structure. Each of these items was given a rank between one (1) and four (4), with one being in the best condition and four being in the worst condition.


Survey data is often messy and requires strategic data cleaning. In my case, not all properties had a building to rate, and not all of them had a gutter or a chimney. Moreover, data collection for this survey was done on paper, nearly a year before I had a chance to look at it. My supervisor and her former student had rated all the properties in the West Bloomington neighborhood.

Between then and the time the data got to me digitally, a lot of errors were made. Originally, the properties were only given a rating of 1, 2, 3, or 4 but somehow the data I received also had zeroes and blank spaces.

During the data cleaning process, I proceeded, with guidance from my supervisor, by considering “0” to mean “N/A”, which was a valid category in the survey. I used Google Colab to run my Jupyter Notebook, where I did all the cleaning and analysis.
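A minimal sketch of what such a cleaning step might look like in Pandas. This is not the original notebook code: the column names and sample rows are hypothetical, and treating blank cells the same way as zeroes is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample of the digitized survey; column names are assumptions.
raw = pd.DataFrame({
    "address": ["A", "B", "C", "D"],
    "roof": [2, 0, 4, None],        # 0 and missing values mixed with ratings
    "chimney": ["3", "", "0", "1"],  # ratings typed in as text, with a blank
})

rating_cols = ["roof", "chimney"]


def clean_ratings(df, cols):
    """Return a copy in which 0 and blank entries become NaN (treated as N/A)."""
    out = df.copy()
    for col in cols:
        # Turn text into numbers; anything unparseable (e.g. "") becomes NaN.
        values = pd.to_numeric(out[col], errors="coerce")
        # 0 is not on the 1-4 scale, so mask it out as N/A.
        out[col] = values.where(values != 0)
    return out


cleaned = clean_ratings(raw, rating_cols)
print(cleaned)
```

Keeping N/A as NaN rather than 0 matters later on: Pandas skips NaN when computing means and counts, so a missing chimney does not drag a component's average down.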

Python has powerful libraries for data analysis and I taught myself Pandas and NumPy to be able to understand this data. As the first step, it made sense to look at the descriptive stats and understand if we could distinguish the categories that had better ratings than the rest.

As you can see in the image below, the number of properties rated differed considerably between components. Only public sidewalk, siding, and windows had the maximum number of ratings. The mean and standard deviation, however, were worth a closer look.

      • Public Sidewalks were the best rated, with a mean value of 2.08.
      • Every component had a maximum rating of 4, which means at least one property was rated poor in each category.
      • Medians were “fair” for driveways and chimneys.
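Descriptive statistics like these can be produced with Pandas' `describe()`. The sketch below uses a small made-up table, not the survey data, and the column names are assumptions.

```python
import pandas as pd

# Made-up ratings on the 1-4 scale; None stands for N/A after cleaning.
ratings = pd.DataFrame({
    "public_sidewalk": [1, 2, 2, 3, 4],
    "driveway": [3, 3, None, 4, 2],
    "chimney": [None, 3, 3, None, 4],
})

# describe() skips N/A values, so "count" shows how many properties
# were actually rated for each component.
summary = ratings.describe().T[["count", "mean", "std", "50%", "max"]]
summary = summary.rename(columns={"50%": "median"}).sort_values("mean")
print(summary)
```

Sorting by mean puts the best-rated component first, since lower numbers mean better condition on this scale.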



The 2008 Buildings and Conditions Report concluded by categorizing percentages of properties “sound”, “minor”, or “major/critical”. Of the 657 buildings ranked (exterior only); 125 (19%) are labeled “sound”, 282 (43%) are labeled “minor”, and 250 (38%) are labeled “major/critical”. The average overall building ranking was 2.76 which fell into the “minor” category.

The building component rankings were weighted for the overall structure ranking based on each component's importance to the structural integrity of the building. The most important components, vital to structural integrity, included: foundation, roof, exterior walls, and windows; these components were weighted to contribute 90% of the structure's overall ranking. The weights of the component rankings are listed below in percentages:

Foundation: 30%
Roof: 30%
Exterior Wall: 20%
Windows: 10%
Porch: 2%
Porch Steps: 2%
Screens/Storms: 1%
Chimney Tower: 1%
Guttering: 1%
Accessory Structure: 1%
Garage: 1%
Sidewalk/Drive: 1%
Total: 100%

The overall ranking was calculated (see equation below) by multiplying each component's rank by its assigned weight. The weighted rankings were then totaled and divided by the total of potential weights.
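In other words, the overall ranking is a weighted average: the sum of (weight × rank) divided by the sum of the weights. Here is a small sketch of that calculation using the 2008 weights listed above. It assumes that "total of potential weights" means the weights of the components a property actually has, so a property without a garage, for example, is not penalized; the report's exact handling may differ. The function and variable names are hypothetical.

```python
# Weights (in percent) from the 2008 Buildings and Conditions Report.
WEIGHTS_2008 = {
    "foundation": 30, "roof": 30, "exterior_wall": 20, "windows": 10,
    "porch": 2, "porch_steps": 2, "screens_storms": 1, "chimney_tower": 1,
    "guttering": 1, "accessory_structure": 1, "garage": 1, "sidewalk_drive": 1,
}


def overall_ranking(ranks, weights=WEIGHTS_2008):
    """Weighted average of component ranks; None marks a component as N/A."""
    rated = {name: rank for name, rank in ranks.items() if rank is not None}
    if not rated:
        raise ValueError("no rated components for this property")
    # Assumption: only components that were rated count as "potential" weight.
    potential = sum(weights[name] for name in rated)
    weighted = sum(weights[name] * rank for name, rank in rated.items())
    return weighted / potential


example = {"foundation": 2, "roof": 4, "exterior_wall": 3, "windows": 3,
           "garage": None}
print(overall_ranking(example))  # (60 + 120 + 60 + 30) / 90 = 3.0
```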


83 out of 516 properties had an overall score greater than 3, so 16% were “major/critical”, down from 38% in 2008.

Since exterior walls weren’t rated in 2018, we only have the foundation, roof, and windows as the most important components for these residential buildings. From the analysis in my Python notebook, I have:






Overall scores for 3 major components:
1.0 (Excellent) = 5.45%
2.0 (Good)      = 53.72%
3.0 (Fair)      = 29.91%
4.0 (Poor)      = 10.89%


Also in the notebook, I used histograms to visualize the pattern of rating across the neighborhood.

These visualizations confirmed that the majority of the ratings fell into the “good” category, indicated by the number 2, across all the elements. The rest of them (excellent (1), fair (3), and poor (4)) followed a similar trend across all the element ratings. More properties were rated poor than critical, except for driveways, where the same number were in critical condition as in poor condition.
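The numbers behind such histograms are simply counts of each rating per component. A rough sketch with made-up data follows; the column names and values are illustrative only, not taken from the survey.

```python
import pandas as pd

# Made-up ratings on the 1-4 scale; None stands for N/A.
ratings = pd.DataFrame({
    "roof": [2, 2, 3, 1, 4, 2],
    "gutter": [2, 3, 2, 2, None, 4],
})

LABELS = {1: "excellent", 2: "good", 3: "fair", 4: "poor"}

# One column per component, one row per rating label: the bar heights
# a histogram of that component would show.
counts = pd.DataFrame({
    col: {label: int((ratings[col] == value).sum())
          for value, label in LABELS.items()}
    for col in ratings.columns
})
print(counts)
```

Calling `counts.plot.bar()` on a table like this would draw the bars, assuming matplotlib is installed.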

Recognizing that histograms made little, if any, sense to the stakeholders (in this case, the board members of WBRP), I proceeded to present the data in a way that would promote intuitive understanding of where we stood.


Detailed visual analysis of gutter ratings

The 2008 Buildings and Conditions report was my only reference point when trying to measure the improvements in the West Bloomington neighborhood, so I started by putting maps from 2008 and 2018 side by side for a quick visual comparison.

This is a scaled-down version of this part of the study. For conciseness, I have only included the gutter analysis here.

Curb and Gutter conditions from 2008 as reported in the Buildings and Conditions report. Notice that Washington, Grove and Olive were all rated “poor”.

Geocoded ratings from 2018. Although we were missing data for part of Olive and Taylor streets, Grove and Washington were reported to have “good” and “fair” gutters.

But I was still comparing two visualizations reported on different scales.

The challenge here was comparing property-specific data points from the 2018 map with the street-specific visualization from 2008. So I proceeded by reducing the categories on the 2018 map to poor/fair, good, and excellent. Note that a non-existent gutter was also counted as being in a poor state.
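The reduction described above can be sketched as a simple mapping. This is an illustration rather than the original notebook code: it assumes a rating of 0 (N/A) marks a non-existent gutter and that None marks a property with no data at all, and the function name is hypothetical.

```python
def simplify_gutter_rating(rating):
    """Collapse a 2018 gutter rating into excellent, good, or poor/fair."""
    if rating is None:
        return None            # no data collected for this property
    if rating == 0:
        return "poor/fair"     # assumption: 0 (N/A) means no gutter at all
    if rating == 1:
        return "excellent"
    if rating == 2:
        return "good"
    if rating in (3, 4):
        return "poor/fair"
    raise ValueError(f"unexpected rating: {rating!r}")


sample = [1, 2, 3, 4, 0, None]
print([simplify_gutter_rating(r) for r in sample])
```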

To identify areas for improvement, we needed to be able to point to them in the map.

Gutters rated “excellent”.

Gutters rated “fair” and “poor” (including non-existent). Notice how many are fair/poor in contrast with the ones rated “excellent” (on the left).

Below is an image of non-existent and poor-rated gutter locations. I pulled the gutter and curb rating from 2008 again for a quick comparison of the properties that would need help with gutters. Notice how all of Washington St was rated worst in ’08 and has improved since.

All gutters reported to be “poor” or non-existent in 2018 in west Bloomington.

All gutters reported in 2008 in west Bloomington.

Finding overlap in ratings using visualization

To find better ways to redirect WBRP’s efforts for revitalization, I plotted overlaps between certain building components to visually note which properties or areas of the neighborhood could benefit from a revitalization program targeted at those components. As an example, we noticed a lot of overlap between poor ratings for roof and chimney.
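Before plotting, the overlapping properties have to be selected. One way to do that in Pandas is a boolean mask over both columns, sketched below with placeholder addresses and made-up ratings; the helper name `overlap` is hypothetical.

```python
import pandas as pd

# Placeholder addresses and made-up ratings; None stands for N/A.
properties = pd.DataFrame({
    "address": ["101 Example St", "102 Example St",
                "103 Example St", "104 Example St"],
    "roof": [4, 2, 3, 3],
    "chimney": [3, 4, None, 4],
})

FAIR_OR_POOR = [3, 4]


def overlap(df, first, second, levels=FAIR_OR_POOR):
    """Rows where both components fall in the given rating levels."""
    mask = df[first].isin(levels) & df[second].isin(levels)
    return df.loc[mask]


both = overlap(properties, "roof", "chimney")
print(both["address"].tolist())
```

Properties with N/A for either component drop out automatically, because a missing value never matches the fair/poor levels.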

Overlap between the fair/poor ratings of roof and chimney.




Zoomed in version of the overlap.


The data cleaning and analysis helped a non-technical audience, the WBRP board members, understand the survey data and decide how to proceed in the months to come. While statistical analysis confirmed WBRP’s assumptions about improvement in the past decade, the visualizations helped picture the current condition of the West Bloomington neighborhood. This project was conducted over 10 weeks.