The New York World’s interactive documenting serious health violations in state inspections of supermarkets made grocery shoppers all over the city cringe. Some found official confirmation of long-obvious problems; others discovered disgusting conditions that until now had been out of public view. It all started with a mountain of data that holds the dirty secrets of the city’s food stores.
So how did we make the sausage, so to speak? The New York World combed through more than 4,000 records from the New York State Department of Agriculture and Markets showing which supermarkets within New York City had violations known as “critical deficiencies” — issues state inspectors deem “an immediate threat to the public health and welfare.”
The interactive does not include citations for lesser violations such as grime, dust and debris, which are frequent and widespread at most grocery stores.
The raw data from the Department of Agriculture and Markets included bodegas, mom-and-pop shops, warehouses and other establishments that sell food, in addition to supermarkets. We narrowed the massive list down to 25 grocery store chains that have large stores in multiple New York City locations and had at least one critical deficiency on record. Search results return supermarkets that fall within a half-mile radius of the address, zip code or neighborhood center you search for.
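For readers curious how a half-mile search can work, here is a minimal sketch in Python. It is not the code behind the interactive. It assumes each store already has latitude and longitude, uses the standard haversine formula for distance, and runs on made-up sample rows; the names `haversine_miles`, `stores_within` and `STORES` are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stores_within(stores, center, radius_miles=0.5):
    """Return the stores whose distance from center is within the radius."""
    lat, lon = center
    return [s for s in stores
            if haversine_miles(lat, lon, s["lat"], s["lon"]) <= radius_miles]


# Hypothetical sample rows, not taken from the real dataset.
STORES = [
    {"name": "Store A", "lat": 40.7500, "lon": -73.9900},  # at the search point
    {"name": "Store B", "lat": 40.7530, "lon": -73.9900},  # roughly 0.2 miles north
    {"name": "Store C", "lat": 40.7700, "lon": -73.9900},  # roughly 1.4 miles north
]

# The center would normally come from geocoding the searched address.
nearby = stores_within(STORES, (40.7500, -73.9900))
print([s["name"] for s in nearby])
```

With these sample rows, only the first two stores fall inside the half-mile circle.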
The data was obtained by Patch through a Freedom of Information Law request to the New York State Department of Agriculture and Markets and is available for download as a SQL file. We imported the data into a MySQL database and queried only for major New York City supermarket chains going back to January 2008, then exported and cleaned the data using OpenRefine. From there, we imported the data into a Google Fusion table, which essentially acted as our database. We queried the data through Google’s SQL API and obtained geographic coordinates for searches using Google’s Geocoding API.
A few quirks in the data needed addressing. There wasn’t a clear and consistent way of determining whether a supermarket is still operating or whether it now operates under a different name than it did in 2008. We did our best to filter out defunct establishments. Multiple violations often appeared in one row of the data for a given supermarket, and these needed to be split into separate records in order to get a more accurate total for that location. Supermarket names were also inconsistent, so we needed to weed out the ones that weren’t relevant and normalize the ones that were. For example, Bravo Deli was not the same as Bravo Supermarket and was removed from the dataset, while La Bella Marketplace in Bensonhurst is actually a C Town, so our edited data calls it “C Town.”
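The two cleanup steps described above, splitting multi-violation rows and normalizing store names, can be sketched in a few lines of Python. This is an illustration only; we did the actual cleaning in OpenRefine. The column names, the semicolon delimiter, the addresses and the violation text are all assumptions made up for the example. Only the Bravo and C Town rules come from our data.

```python
# Hypothetical lookup: cleaned-up key -> canonical chain name.
# Stores missing from the map (such as Bravo Deli) are dropped.
NAME_MAP = {
    "la bella marketplace": "C Town",
    "c town": "C Town",
    "bravo supermarket": "Bravo Supermarket",
}


def normalize_name(raw):
    """Lowercase, collapse whitespace, then look up the canonical name."""
    key = " ".join(raw.lower().split())
    return NAME_MAP.get(key)  # None means "not a chain we track"


def explode_violations(rows, delimiter=";"):
    """Turn one row holding several violations into one record per violation."""
    out = []
    for row in rows:
        name = normalize_name(row["name"])
        if name is None:
            continue  # irrelevant establishment, skip it
        for violation in row["violations"].split(delimiter):
            violation = violation.strip()
            if violation:  # ignore empty pieces left by trailing delimiters
                out.append({"name": name,
                            "address": row["address"],
                            "violation": violation})
    return out


# Made-up sample rows; the delimiter and field names are assumptions.
RAW = [
    {"name": "La Bella  Marketplace", "address": "1 Example Ave",
     "violations": "Violation one; Violation two"},
    {"name": "Bravo Deli", "address": "2 Example St",
     "violations": "Violation one"},
    {"name": "BRAVO SUPERMARKET", "address": "3 Example Blvd",
     "violations": "Violation three;"},
]

CLEAN = explode_violations(RAW)
for record in CLEAN:
    print(record)
```

Counting the rows in `CLEAN` per address then gives a per-location total that a one-row-per-inspection layout would understate.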
Looking at thousands of rows of data on a computer screen can get a bit tedious. To keep things organized, we decided to look for trends and interesting tidbits using MySQL. “Running a query” in MySQL means asking the database for results: pulling out the relevant information by specifying certain search criteria.
We searched for the neighborhoods with the most violations and we looked for the most common violations in the database. We did the same for the chain that was mentioned most often — and least often — in the database. Interested in trends, we repeated the process multiple times, looking for the dirtiest zip codes and boroughs.
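To give a sense of what such a query looks like, here is a small sketch that counts violations per chain, zip code or violation type with a SQL `GROUP BY`. It uses Python’s built-in SQLite module as a stand-in for MySQL, and the table layout, column names and sample rows are hypothetical rather than our actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE violations (chain TEXT, zip TEXT, violation TEXT)")
conn.executemany(
    "INSERT INTO violations VALUES (?, ?, ?)",
    [  # made-up rows, one per violation
        ("Chain A", "10001", "Violation X"),
        ("Chain A", "10001", "Violation Y"),
        ("Chain A", "10002", "Violation X"),
        ("Chain B", "10002", "Violation X"),
    ],
)

ALLOWED_COLUMNS = {"chain", "zip", "violation"}


def top_counts(conn, column, limit=5):
    """Count rows per value of column, most frequent first."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    # The column name is checked against a whitelist above, so it is
    # safe to place in the SQL text; the limit is passed as a parameter.
    sql = (f"SELECT {column}, COUNT(*) AS n FROM violations "
           f"GROUP BY {column} ORDER BY n DESC, {column} ASC LIMIT ?")
    return conn.execute(sql, (limit,)).fetchall()


for column in ("chain", "zip", "violation"):
    print(column, top_counts(conn, column))
```

Reversing the sort order (`ORDER BY n ASC`) would surface the chain mentioned least often instead.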
If you don’t see your supermarket in our interactive, please email the name of the store and its location to nyworld@thenewyorkworld.com and we will investigate.