Coding

Democracy Visualized

When it comes to getting academic work out to a wider audience, I think effective visualizations are the most valuable thing I can spend my time on. To practice this skill, I participated in the UC Davis Data Science Initiative's proposition visualization challenge along with a fellow sociology grad student. The goal of the challenge was to take a proposition on California's November 6, 2018 ballot and find a way to visualize it that would help people decide how to vote. You can see the result at the bottom of the post, and read about its creation below.

My partner and I decided to tackle Proposition 11, which concerned emergency medical workers. We chose this one because my partner studies labor and work, while I used to be an Emergency Medical Technician (EMT). In broad strokes, the proposition would allow ambulance providers to require workers to remain on call during their breaks, and would require the companies to provide certain mental health services and training. That all sounds good, so much so that there wasn't even an official No position submitted to the California voter guide. Things are rarely that simple.

The main supporter and near-sole funder of the proposition was American Medical Response (AMR), one of the largest private ambulance services in the US, which spent $30 million on the campaign. The company had pending lawsuits that would be dismissed if the proposition went through, and many of the proposition's features, like the training and mental health services, were already provided to medical personnel. Something was off.

It proved particularly difficult to create a meaningful and informative visualization. We thought about looking at response times, pay for workers, and the effect of breaks (or the lack thereof) on worker health and alertness, but some element was always missing. We knew we wanted to find some way to show what people on the No side were saying, since there was no official opposition statement in the voter guide. We settled on using Twitter to find out how people were talking about the proposition.

I created a bot to scrape Twitter every 5 hours for hashtags related to the proposition (#NoOn11, #YesOn11, etc.). After a few days we had a surprisingly low number of tweets, which may have been caused by the limitations of Twitter's free API. With what we did have, we wanted to showcase how people who supported and opposed the proposition were talking about it. We tried sorting purely by hashtag, but that got muddled when people insulted the other side's hashtag, or used the neutral #prop11 tag while taking a side in the text of the tweet. I then tried a simple string search for the words "yes" or "no" in the tweet, but that again produced false classifications. I ended up building an overly complicated solution, but one that I learned from.
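For the curious, here's a rough sketch of what that scraping loop looked like in spirit. The credentials, file name, and tweet cap are placeholders, and it assumes a recent version of Tweepy against the standard search endpoint; the real bot lives in the repository linked at the end of the post.

```python
import time
import json
import tweepy

# Placeholder credentials -- use your own keys from the Twitter developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

HASHTAGS = "#NoOn11 OR #YesOn11 OR #prop11"

def scrape_once(outfile="prop11_tweets.jsonl"):
    """Pull recent tweets matching the proposition hashtags and append them to a file."""
    with open(outfile, "a") as f:
        for tweet in tweepy.Cursor(api.search_tweets, q=HASHTAGS,
                                   tweet_mode="extended", lang="en").items(500):
            f.write(json.dumps({
                "id": tweet.id,
                "text": tweet.full_text,
                "favorites": tweet.favorite_count,
            }) + "\n")

if __name__ == "__main__":
    while True:
        scrape_once()
        time.sleep(5 * 60 * 60)  # wait five hours between pulls
```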

I implemented my first "complex" machine learning algorithm: a neural net classifier trained on the text of the tweets themselves, which sorted them into Yes, No, and Neutral. It's not perfect, and I know I had too little data and probably violated some assumptions, but this was a small, hacky challenge and a good space to experiment. It ended up working well enough. I then took the classified tweets and compared each one to the text of the official Yes statement, to see how much that statement was steering the conversation around the proposition, given that there was no opposing statement.
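The actual model is in the repository, but the general shape was something like the sketch below: turn each tweet into a TF-IDF vector, feed it to a small neural network, and measure similarity to the Yes statement on the same vectors. The training tweets, labels, and statement text here are made-up placeholders, not the real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics.pairwise import cosine_similarity

# Hand-labeled training tweets (placeholders -- the real set came from the scraper).
train_texts = ["Vote yes on 11, keep ambulances staffed", "No on 11, AMR is buying a law"]
train_labels = ["yes", "no"]

# TF-IDF turns each tweet into a sparse word-frequency vector.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

# A small feed-forward neural net; with this little data it will overfit badly,
# which is roughly what happened in the real project too.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_train, train_labels)

# Classify new tweets, then measure how close each is to the official Yes statement.
new_tweets = ["#prop11 lets AMR dodge its lawsuits", "Proud to support #YesOn11"]
X_new = vectorizer.transform(new_tweets)
predictions = clf.predict(X_new)

yes_statement = vectorizer.transform(["Text of the official Yes on 11 ballot statement..."])
similarities = cosine_similarity(X_new, yes_statement)

for tweet, label, sim in zip(new_tweets, predictions, similarities[:, 0]):
    print(f"{label:>7}  sim={sim:.2f}  {tweet}")
```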

The answer? None of the tweets really matched the Yes statement, whether they were for or against it. But the result at least provided a way to show the volume of tweets on each side, and to display the text of each tweet so that those who were opposed could have their position known. It was a quick challenge, and each group's entry had its own flavor of jankiness, but everyone involved learned something new. I think it was a great exercise, and our group did end up winning the challenge, which was nice.

Here you can see the visualization. The size of each dot represents the number of times that tweet was favorited, and while the classification wasn't perfect, you can mouse over the dots to read the text of the tweet! If you want to see how I made this, you can find the GitHub repository here.
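If you'd like to build something similar, a hover-able scatter plot like this can be put together quickly with Plotly Express. This is only a sketch of the idea, not the exact code behind the embedded chart, and the column names are assumptions about how the classified data is laid out.

```python
import pandas as pd
import plotly.express as px

# Assumed columns from the classification step: tweet text, predicted stance,
# similarity to the Yes statement, and favorite count.
df = pd.DataFrame({
    "text": ["#prop11 lets AMR dodge its lawsuits", "Proud to support #YesOn11"],
    "stance": ["no", "yes"],
    "similarity_to_yes": [0.05, 0.12],
    "favorites": [14, 3],
})

fig = px.scatter(
    df,
    x="similarity_to_yes",      # how close the tweet is to the official Yes statement
    y="stance",                 # classifier output: yes / no / neutral
    size="favorites",           # dot size scales with favorite count
    color="stance",
    hover_name="text",          # mousing over a dot shows the tweet text
    title="Prop 11 tweets by stance and similarity to the Yes statement",
)
fig.show()
```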