Speed Data-ing Fall 2020

By Caroline Grand

On November 18th, 2020, the NULab for Texts, Maps, and Networks hosted the fifth annual “Speed Data-ing” event, a research showcase that brought together potential collaborators to discuss a range of digital humanities and computational social science research questions, methodologies, and data sets. This year, Speed Data-ing was held in collaboration with the Digital Scholarship Group as part of the Digital Humanities Open Office Hours series.

Rachael Grudt, a master’s student in Data Science, kicked off the round of talks with a presentation of her ongoing thesis work, “Quantifying the Impact of NSF ADVANCE Grants on Recipients’ Publication Careers.” ADVANCE is a National Science Foundation initiative that develops systematic approaches to increase the representation and advancement of women in academic STEM careers, both by helping them fit into existing structures and by changing those structures. Grudt’s research investigates the impact of this 19-year-long program on individual grant recipients’ publication careers, primarily using bibliometric data from the Microsoft Academic Graph. After normalizing the data, Grudt analyzes the grant recipients’ number of publications, their average number of co-authorships, and their average number of women co-authors. Overall, the publication data from faculty grantees shows a boost in publications and a sustained increase in their number of women co-authors over time, suggesting a positive relationship between an individual becoming involved in the ADVANCE ecosystem and that individual’s co-authorships shifting to include more women. If you’re interested in collaborating with Grudt, she can be reached at: grudt.r[at]northeastern[dot]edu.

Next, Syed Haque, a sixth-year PhD student in Network Science, discussed the rich dataset available from Wikipedia edit history. Because anyone with a registered account can update a Wikipedia article at any time, the continuous revision process constitutes the knowledge curation process on Wikipedia. Because the whole text of an article is repeated every time a revision happens, the edit history of an article connects all of its revisions through that full text, creating a linked progression of revisions in the dataset. Haque is currently working with a snapshot of the entire Wikipedia edit history as of August 8th, 2020, which he has converted from XML to JSON in order to index the data with Elasticsearch, which provides full-text search for single or multiple phrases. Through Elasticsearch, Haque can explore how certain concepts, like “flattening the curve,” have been assimilated into the discourse of Wikipedia over time. If you’re interested in working with Haque or his dataset, he can be reached at: haque.s[at]northeastern[dot]edu.
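
The XML-to-JSON step described above can be illustrated with a minimal sketch. This is not Haque’s pipeline: the sample XML is a hypothetical, heavily simplified stand-in for a Wikipedia history export, the function names are invented for illustration, and a plain substring match stands in for the full-text search that a search index would provide.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one page of an edit-history export.
# A real dump uses XML namespaces and many more fields per revision.
SAMPLE_XML = """
<page>
  <title>Example article</title>
  <revision>
    <id>1</id>
    <timestamp>2020-02-01T00:00:00Z</timestamp>
    <text>Early draft of the article.</text>
  </revision>
  <revision>
    <id>2</id>
    <timestamp>2020-03-15T00:00:00Z</timestamp>
    <text>Early draft of the article. Officials discuss flattening the curve.</text>
  </revision>
</page>
"""


def revisions_to_json(xml_text):
    """Return one JSON string per revision, each carrying the full article text."""
    page = ET.fromstring(xml_text.strip())
    title = page.findtext("title")
    docs = []
    for rev in page.findall("revision"):
        doc = {
            "title": title,
            "revision_id": int(rev.findtext("id")),
            "timestamp": rev.findtext("timestamp"),
            "text": rev.findtext("text") or "",
        }
        docs.append(json.dumps(doc))
    return docs


def first_revision_with_phrase(json_docs, phrase):
    """Return the earliest revision whose text contains the phrase, or None."""
    parsed = [json.loads(d) for d in json_docs]
    matches = [d for d in parsed if phrase.lower() in d["text"].lower()]
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return min(matches, key=lambda d: d["timestamp"], default=None)


docs = revisions_to_json(SAMPLE_XML)
hit = first_revision_with_phrase(docs, "flattening the curve")
print(hit["revision_id"], hit["timestamp"])
```

Because every revision repeats the full article text, finding the earliest revision that contains a phrase gives a rough date for when that phrase entered the article.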

Brennan Klein, a final-year PhD candidate in Network Science, discussed recent efforts to understand the ways in which collective social behavior has changed as a result of the COVID-19 pandemic. Klein has been analyzing large-scale datasets with Northeastern’s Network Science Institute, including datasets specific to COVID-19 and its impact, such as testing and county-level case numbers; datasets about behavioral response, taken from aggregated human mobility flows from mobile devices and large tech companies like Google, Facebook, and Waze; and datasets for exploratory analysis, such as articles, testing policies, and case numbers in university responses to the pandemic. Analyzing these datasets provides epidemiological insights: for instance, Klein has found that mobility and contacts today are correlated with new deaths 5–6 weeks later. If you’re interested in collaborating with Klein or using these datasets, he can be reached at[at]northeastern[dot]edu.
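
One common way to look for this kind of delayed relationship is a lagged correlation, sketched below. This is an illustration only, not Klein’s method: the weekly series are synthetic, built so that the second series trails the first by exactly five weeks, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlation(leading, trailing, lag):
    """Correlate leading[t] with trailing[t + lag] for a non-negative lag."""
    if lag < 0 or lag >= len(leading):
        raise ValueError("lag must be between 0 and len(leading) - 1")
    if lag == 0:
        return pearson(leading, trailing)
    return pearson(leading[:-lag], trailing[lag:])


def best_lag(leading, trailing, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(leading, trailing, k),
    )


# Synthetic weekly values, NOT real mobility or mortality data.
mobility = [3, 5, 2, 8, 6, 9, 4, 7, 1, 6, 8, 3, 5, 9, 2, 7]
# Constructed so that deaths in week t depend on mobility in week t - 5.
deaths = [0] * 5 + [2 * m + 1 for m in mobility[:-5]]

print(best_lag(mobility, deaths, max_lag=8))
```

On real data the peak would be far less sharp, and a correlation at a given lag does not by itself establish a causal link.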

Stefan McCabe, a fifth-year PhD student in Network Science, shared his ongoing research with the Lazer Lab on COVID-19 information sharing on Twitter. The Lazer Lab’s dataset collects profiles from users on Twitter and matches them with voter records to map the distribution of COVID-19 information sharing across demographic, racial, and partisan lines. The dataset includes profile information extracted from 10% of user accounts over a two-year period, with 2.2 billion tweets. McCabe’s research with the Lazer Lab also looks at the distribution of COVID-19-related tweets over time, as well as the distribution of other major keywords like Black Lives Matter (BLM). If you’re interested in working with McCabe, he can be reached at mccabe.s[at]northeastern[dot]edu.

Cara Marta Messina, a doctoral candidate in English, presented her ongoing work for her dissertation, The Critical Fan Toolkit. Messina’s work analyzes ideologies that permeate television shows and their fandoms, and examines how fans critically take up these ideologies in their fanfiction authorship. Messina’s large dataset comprises 36,000 fanfiction texts from the Game of Thrones and Legend of Korra fandoms, along with interviews with fans. In particular, Messina is investigating how white supremacist ideology is replicated in Game of Thrones fanfictions; she has found that only 4% of the Game of Thrones fanfiction corpus on Archive of Our Own incorporates characters of color. However, Messina has also identified authors who consciously critique these ideologies in their fanfictions, suggesting that fandoms are not homogeneous. If you’re interested in getting involved with the Critical Fan Toolkit, Messina can be reached at messina.c[at]northeastern[dot]edu.

Finally, Riley Tucker, a doctoral candidate in Criminology & Criminal Justice, shared his ongoing work with the Boston Area Research Initiative (BARI). Tucker has been analyzing several datasets that measure activity in Boston before, during, and after the COVID-19 pandemic across a variety of social dimensions. Online data sources include Craigslist, Yelp, Airbnb, Google Places, and Foursquare. Tucker has also been exploring administrative datasets, including property assessments, building permits, code violations, and food inspections across the city. In the near future, Tucker and BARI will be collecting and analyzing data from the Boston Public Health Commission and from COVID-19 surveys through UMass Boston. If you’re interested in collaborating with Tucker, he can be reached at tucker.r[at]northeastern[dot]edu.

If you have any leads for these promising projects or are interested in becoming involved with collaborative research at the NULab, please reach out to Sarah Connell at: sa.connell[at]northeastern[dot]edu.
