There’s been no shortage of hype about the relationship between cities and data, especially so-called big data. For large numbers of tech companies, cities, and even a growing number of urbanists, data promises to solve all manner of urban problems, from predictive policing to improving traffic flow to promoting energy efficiency.
An even bigger potential role for new kinds of data lies in helping researchers and policy-makers better understand how cities and neighborhoods grow and evolve—but only if done right.
The legitimately exciting use of new data
A growing number of researchers are using data from internet sources such as Google, Twitter, and Yelp to develop new insights into cities and urban change. The sociologists Robert Sampson and Jackelyn Hwang have used Street View images to examine the role of race in the process of gentrification and neighborhood transformation. Similarly, a study from the U.K. Spatial Economics Research Centre used geo-tagged photos on Flickr to determine levels of urbanity in London and Berlin. Mobility data from Uber and Lyft, and even taxicabs, has also been used in several recent studies, which my CityLab colleague Laura Bliss and former colleague Eric Jaffe have chronicled. Data from real estate sites such as Zillow and Trulia is also being used to analyze housing price trends across neighborhoods, cities, and metro areas.
Other research has used reviewer data from Yelp to study gentrification and unequal urban consumption patterns. One study used Yelp reviews to shed light on the connection between gentrification and race in Brooklyn. Another NBER study employed Yelp data to find out how ethnic and racial segregation affects consumption levels in New York City.
Twitter data has been used to chart regional preferences and patterns of behavior. A study from the Oxford Internet Institute mapped the flow of online content and ideas across cultures. The cartography blog Floating Sheep has used data from Twitter, Google, and Wikipedia to map everything from beer and pizza to weed, bowling, and strip clubs. And my own team has used data from MySpace to track the leading centers for popular music genres across the U.S. and the world.
More recently, a team of Italian researchers combined data from Foursquare and OpenStreetMap, among other sources, to test Jane Jacobs’ theories of urban vitality and diversity in six Italian cities. Their study confirmed many of Jacobs’ key insights about the importance of short blocks, mixed land uses, walkability, dense concentrations of talented workers, and urban public spaces.
In addition to data from websites, satellite data offers the possibility of amassing systematic and comparable data across global cities (little, if any, has been previously available). Several studies (including my own) have used satellite data to get at the economic output of cities and metros around the world. And a 2012 study in the American Economic Review uses light emissions from satellites as a proxy for the spatial organization and economic size of global cities. While this data is subject to considerable limits, it provides at least rough estimates of the overall size and economic scale of cities across the world.
Accurately characterizing “big data”
Not all data from new sources qualifies as “big data,” which, as its name implies, refers to truly massive amounts of information. Max Nathan of the London School of Economics breaks down actual big data into three key categories: internet data from sites like Yelp, Twitter, or Google and other commercial data; government-sponsored data collected by cities or towns; and Census and related data. One example is a 2014 NESTA study, which used big data from the London-based firm Growth Intelligence to map patterns of information and technology businesses in the U.K. Another comes from a forthcoming study in the American Journal of Sociology, which uses data from millions of 3-1-1 service requests to examine neighborhood conflict among residents of different ethnicities.
According to Nathan, big data can be thought of in terms of “four Vs”: variety, volume (millions or billions of observations), velocity (real-time data), and veracity (raw data). Actual big data often requires data analytics methods like machine learning to process and derive meaning from such large troves of information. The ongoing Livehoods Project from the School of Computer Science at Carnegie Mellon University, for instance, uses machine learning to analyze 18 million check-ins on Foursquare to determine the structure and characteristics of eight different cities. When used appropriately, big data and new data analytics can help researchers discern urban structures and patterns that traditional data and methods might not uncover on their own.
A particularly good example of the use of big data is a recent NBER study by Harvard and MIT researchers, which uses computer vision to better understand geographic differences in income and housing prices. Although the paper covers plenty of ground, perhaps the most interesting section involves the use of Google Street View to predict income levels and housing prices in Boston and New York between 2007 and 2014. The study links 12,200 images of New York City and over 3,600 images of Boston to data on median family income and home values from the 2006-2011 American Community Survey. It then examines the extent to which the positive physical attributes shown in these images (such as size and green space) attract more affluent residents and predict incomes and housing prices.
Ultimately, the study finds that “images can predict income at the block group level far better than race or education does.” The study notes that a key purpose of big data is to help illuminate the role of smaller geographic areas in our urban economies, which are harder to get at with traditional Census data. The authors conclude that big data offers “some hope that Google Street View and similar predictors will enable us to better understand patterns of wealth and poverty worldwide.”
Problems and limitations
While big data may ultimately be able to advance our observation of and theories about cities, a growing number of scholars urge caution in using it. A 2014 workshop, which brought together 40 or so leading urban social scientists and data users, identified six key issues surrounding big data, spanning data quality and compatibility, the use of new analytical techniques, and questions of privacy and security. As the workshop summary notes:
Developing theory to go with the new methods and data is critical, and is often sidelined. Engineering and control theory (or big data “without theory”) work well when there is a measurable outcome, a simple policy to correct for it, and fast enough reaction time that the correction can be implemented while it is still appropriate. In cities, this is the process used to optimize service delivery. But this theory does not work well for complex systems with long time horizons, like most social systems.
In other words, big data and new data analytics are only as good as the questions we pose and theories we generate to better understand them. No matter how powerful they may be, new data sources and analytic techniques are no real substitute for nuanced human reasoning about cities. The real power of course lies in using these new tools to test and deepen the insights of cutting-edge urban theory. My own hope is that we can eventually combine them in ways that deepen our understanding of the underlying “urban genomics” of neighborhoods, cities, and urban areas.
This article was originally published on www.citylab.com, where it can be viewed in full.

