Top of my to-do list has been to build something using data from the City of Seattle data portal (data.seattle.gov), powered by Socrata. This one is not particularly useful, but I’ll share with you anyway!
One of the things I like least about homeownership is the need to keep up a yard. Thankfully I’ve never received nastygrams about yard care, but I’ve heard some horror stories. One couple in Texas was in the process of removing a dead tree from their yard — they had just cut it down and into transportable sized pieces, and piled and covered the logs up with a tarp, and THE NEXT DAY they got a note demanding they remove “tree debris” from their yard within three days or pay a penalty.
Anyway, in that frame of mind I giggled through the “weed and vegetation code citations” available via public record. Since Washington legalized marijuana last year, I had an idea in my mind to track code citations like these:
- Large overgrown blackberries and vegetation encroaching sidewalk
- Hazardous vegetation encroaching on sidewalk forcing pedestrians into street
- OBSERVED LARGE TREE ON PROPERTY NO SIGNS OF RODENTS AND NO BEES WERE PRESENT DURING A SUNNY DAY
…and present them as our city’s most pressing weed violations.
Clever? Maybe. Half-baked? Definitely. My first idea was to tweet out the description of the citation, with a Google Maps Street View image of the address in question. That might be visually interesting but seemed like a huge invasion of privacy, so I quickly shelved it. I have no interest in actually shaming the property owners.
So instead I started with the easiest path: tweet the generic descriptions and see where that led me. Here are the “get-started” steps:
- Register for a developer key with Socrata
- Identify the data set you want at data.seattle.gov (for me: code violations)
- Use developer key to access the API (I used a Ruby gem from Socrata) to fetch that data set
- Filter (for code group: “weed and vegetation”) and collect results in an array
- Set up a Twitter API client to talk to Twitter, via a new Twitter account if needed (I recycled an existing bot!)
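The filter-and-collect step above can be sketched roughly like this. It's a minimal sketch, not the bot's actual code: the field names `case_group` and `description` are assumptions for illustration (check the real column names in your data set), and the fetch itself is left out because it needs a key and a network call, so the records here are plain hashes.

```ruby
# Hypothetical field names: "case_group" and "description" are assumptions,
# not confirmed column names from data.seattle.gov.
WEED_GROUP = "weed and vegetation"

# Keep only weed-and-vegetation records and return their description text.
def weed_descriptions(records)
  records
    .select { |r| r["case_group"].to_s.strip.casecmp?(WEED_GROUP) }
    .map { |r| r["description"].to_s.strip }
    .reject(&:empty?)
end

# Made-up sample rows standing in for whatever the API returns.
sample = [
  { "case_group" => "Weed and Vegetation", "description" => "Vegetation blocking the sidewalk." },
  { "case_group" => "Housing", "description" => "Broken window." },
  { "case_group" => "WEED AND VEGETATION", "description" => "  " }
]

puts weed_descriptions(sample)
```

Matching the group name case-insensitively is a guess at what real data might need; blank descriptions are dropped so the bot never tries to tweet an empty string.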
Ok, now we have everything set up that we’ll need to tweet out our code violations, so let’s think through how this should work…
Every XX minutes, have the twitterbot check the data set for new records. If there are any, tweet the description and save the case number in a “tweeted” array (to prevent duplicates). If the first result has already been tweeted, skip it and move on through the array, tweeting the first item that hasn’t gone out yet. Repeat until infinity.
This should work! I ran this formula in print-only (no tweeting) to see what it would do, and reached two sad conclusions:
- Case numbers won’t help me prevent duplicates, since lots of the records have different case numbers and identical descriptions. If I want to avoid repeat text, I’m going to have to keep track of the entire text description.
- Some of the descriptions are interesting, but most are really boring. Maybe some emoji will help!
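Here's a small sketch of the first conclusion: keying the “tweeted” collection on the description text instead of the case number. The record shape, the case numbers, and the `next_untweeted` helper are all made up for illustration, and I've used a `Set` rather than an array so the lookup stays cheap.

```ruby
require "set"

# Return the first description that has not been tweeted yet and record it
# in `tweeted` so it is skipped next time. Returns nil when nothing is new.
def next_untweeted(records, tweeted)
  records.each do |record|
    text = record[:description].to_s.strip
    next if text.empty?
    # Set#add? returns nil when the text is already in the set.
    return text if tweeted.add?(text)
  end
  nil
end

# Hypothetical records: different case numbers, identical descriptions.
records = [
  { case_number: "1001", description: "Weed & Veg violation." },
  { case_number: "1002", description: "Weed & Veg violation." },
  { case_number: "1003", description: "Vegetation blocking the sidewalk." }
]

tweeted = Set.new
3.times { p next_untweeted(records, tweeted) }
```

Cases 1001 and 1002 would both pass a case-number check, but only one of them gets through here because the text is the key. The third call returns `nil` since nothing new is left.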
With help from the ‘twitter-text’ gem and my pal who runs Botgle, I set up an array of plant-like emoji and set the tweet builder to add zero to 4 random emoji to each citation description. So now we have:
? ST 11/16/15 Vegetation encroaches sidewalk. Sidewalk to be clear concrete edge to concrete edge up to 8 feet above sidewalk.
? Weed & Veg violation. ? ?
? 11/05/15 DW Vegetation overhanging into sidewalk r-o-w ? ?
? ? Bradrick 11.6.15 Vegetation blocking the sidewalk. ?
Better! Not great still, but better.
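The emoji step could look something like the sketch below. The emoji list and the `decorate` helper are stand-ins I picked for illustration, not the bot's real list, and this version skips the tweet-length check entirely.

```ruby
# Hypothetical plant-like emoji; the bot's actual list is not shown in the post.
PLANTS = ["🌿", "🌱", "🌵", "🍃", "🌻"].freeze

# Add zero to `max_emoji` random emoji around the text. Passing an `rng`
# makes the output repeatable, which is handy for print-only dry runs.
def decorate(text, rng: Random.new, max_emoji: 4)
  picks = Array.new(rng.rand(0..max_emoji)) { PLANTS.sample(random: rng) }
  # Split the picks at a random point: some go before the text, the rest after.
  cut = rng.rand(0..picks.length)
  [picks.take(cut).join(" "), text, picks.drop(cut).join(" ")]
    .reject(&:empty?)
    .join(" ")
end

puts decorate("Weed & Veg violation.")
puts decorate("Vegetation blocking the sidewalk.", rng: Random.new(42))
```

Since the emoji eat into the character budget, a real version would want to check the final length before tweeting.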
I might just call this a day if not for that same-old problem that Heroku has about restarting every 24 hours. When Heroku restarts, my “tweeted” array gets cleared, opening the door for duplicate tweets. I ran into this problem with an earlier bot and the solution was to randomize the text selection. In this case, I might need to set up a Twitter “listener” to have it run correctly (new tweet every time a new entry is recorded in the data set), but that’s more than I want to do and might not even work on Heroku. Instead, I thought, “ok, how about when it starts, I have it pull the last XX tweets from the Twitter account and store THOSE in the ‘tweeted’ array?” Done! Easy! But I’m still getting duplicates, can you see why… ?
here’s a hint:?
Because when the descriptions come out from the API, they’re pure text. I’m altering them by adding emoji before tweeting. Pulling a list of recent tweets to compare with the API results, it’s going to compare “? Weed & Veg violation. ??” to “Weed & Veg violation.” and say “these aren’t the same, tweet it!”
I looked to see if I could find a good regular expression solution for comparing the two, but I’m struggling with how Ruby ‘reads’ emoji. What I *can* do is store several trimmed permutations of each tweet in the “tweeted” array in the hopes that one will match up with the incoming text. It’s a memory hog but with a project this size it doesn’t really matter. And for a bigger project, hopefully I won’t be dealing with a server restart every night.
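For comparison, here's a different approach from the permutations one, sketched under one big assumption: the bot only ever adds emoji from its own known list. In that case no emoji regular expression is needed, because each known emoji can be removed as a plain string before comparing. The list and helper names are hypothetical.

```ruby
require "set"

# Hypothetical list; it must match whatever the bot actually adds to tweets.
KNOWN_EMOJI = ["🌿", "🌱", "🌵", "🍃", "🌻"].freeze

# Remove every known emoji, then tidy the whitespace. Run this on BOTH the
# recent tweets and the incoming API text so the two sides stay comparable.
def normalize(text, emoji = KNOWN_EMOJI)
  emoji.reduce(text) { |t, e| t.gsub(e, "") }.squeeze(" ").strip
end

# Rebuild the "tweeted" collection after a restart from recent tweet text.
def seed_tweeted(recent_tweets)
  Set.new(recent_tweets.map { |t| normalize(t) })
end

recent = ["🌿 Weed & Veg violation. 🌱 🌵"]
tweeted = seed_tweeted(recent)

puts tweeted.include?(normalize("Weed & Veg violation."))
puts tweeted.include?(normalize("Vegetation blocking the sidewalk."))
```

The tradeoff is that this breaks quietly if an emoji gets added to the bot's list but not to this one, so the two lists should really be the same constant.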
So that’s the bot! Weed Violations. I’m still mulling over how I can make it more interesting without losing the core objective, which is using text from Socrata’s API. I’m thinking about mashing it up with Little Shop of Horrors or Joyce Kilmer or something to play up the “killer weeds” angle. Or maybe I’ll just let it run for a while and see what happens.
If you’ve been wanting to play with city or county data, Socrata makes it easy to dive in!