Digital Scholarship Group & NULab Fall Welcome 2019

Introduction

On October 30th, 2019, the NULab and the Digital Scholarship Group (DSG) hosted the sixth annual Fall Welcome event, featuring an exciting group of lightning talks from Northeastern graduate students and faculty, as well as a keynote lecture by Professor Laura K. Nelson of the Department of Sociology and Anthropology.

Lightning talks

In his presentation, “Applications of Natural Language Processing in Data Journalism,” School of Journalism Lecturer Aleszu Bajak discussed the work Storybench, a digital storytelling resource, has been doing on data journalism. Drawing upon computational data analysis, Storybench’s work has been featured in national news sources like CNN and the Wall Street Journal, including a story on the media’s negative treatment of women candidates in the 2020 Democratic presidential primaries. More information on Storybench can be found here.

In her talk, “XM<LGBT/>: A Queer method for Rhetoric and Composition Research,” Abbie Levesque questioned how a queer methodology may be implemented in practice when encoding and coding textual data. She described using a self-built XML schema to qualitatively code data, arguing that embodying queer methods in schema creation, encoding, and coding work means these schemas and categories are never static, but ever-changing and fluid. Her talk is a preview of her article, “XM<LGBT/>: A Schema for Encoding Queer Identities in Qualitative Research,” which is currently in press at the journal Computers and Composition.

Network Science Institute PhD Candidate Carolina Mattsson detailed her work on financial networks in East Africa and China, “Networks of Monetary Flow at Native Resolution.” Drawing upon proprietary data from local financial technology companies (similar to Venmo or bank peer-to-peer financial transfer apps), Mattsson showed the network structure of the connected financial transactions and how money “walks” through the network. Her work reveals the shape of the economy as it increasingly moves to online and peer-to-peer formats, and she was even able to find evidence of financial fraud in the process.

Professor of English and Dean’s Professor of Civic Sustainability Ellen Cushman spoke about the preservation of language in her talk, “Digital Archive for American Indian Language Preservation and Perseverance.” Professor Cushman’s Cherokee heritage has inspired her to employ digital archiving tools in her ongoing work to preserve the Cherokee language and to use these tools in Cherokee language pedagogy.

Cara Marta Messina presented on the Digital Integration Teaching Initiative (DITI), a new initiative created and supported by the College of Social Sciences and Humanities to partner with faculty across disciplines and integrate digital proficiencies in their classrooms through assignments and workshops. She provided an example from a First-Year Writing course, where DITI and the faculty member extended the faculty member’s previous assignment to incorporate Google Sheets and conversations about data collection, organization, and analysis, as well as the social and political impact of these results. To read more about the modules DITI has created and their faculty partnerships, visit the DITI site.

Keynote

The NULab and DSG began the new academic year by welcoming Dr. Laura K. Nelson, Assistant Professor of Sociology at Northeastern University, as the keynote speaker at Northeastern’s Digital Scholarship Fall Welcome. In her talk, “Machine Learning is Feminist,” Nelson began with the provocative claim that theory is not determined by the tools one uses, and that computational tools, though used by many in a positivistic and universalizing way, are feminist from an epistemic stance. Nelson clarified that while arguments can and should be made that machine learning is feminist from a social justice perspective, computational methods themselves maintain a feminist epistemic stance in that they allow for and recognize intersecting systems of cultural and structural power in a way that the mainstream dominance and tyranny of variable-based regression analysis in the social sciences obscures. Taking direct aim at the narrow-and-long styles of survey data and statistical analysis that have dominated the social sciences for decades, Nelson argued that these methods, though yielding important insights on their own, have produced a “view from nowhere” of the social world: a view that, while attempting to be totalizing, objective, and universal, obscures many of the contexts on the ground that give the social world its meaning. This critique fits within frameworks of feminist theory, like that of Dorothy Smith, which call attention to the contexts of marginalized groups whose labor makes this view from nowhere possible in the first place.

To develop this argument, Nelson pointed to a number of critiques lodged against traditional regression analysis which, taken together, show how it obscures the systems of power and the contexts that make the social world meaningful, highlighting the damage done by focusing only on this view from nowhere. Nelson focused on four specific critiques of regression analysis: its inability to recognize and account for intercategorical complexity, intracategorical complexity, and anticategorical complexity (McCall 2005), as well as the contextual processes underlying and giving meaning to the data itself. From this base, Nelson made the case that machine learning (computational methods) directly addresses the shortcomings of regression analysis brought out by these critiques. Nelson argued that computational methods can address and account for intercategorical complexity by providing more data than was traditionally available to bring out diversity within categories. These methods can also address intracategorical complexity by providing and bringing out higher dimensionality, increasing the number of variables at play, and producing “wide” and long data. Computational methods can also address anticategorical complexity, or the fact that the categories we apply to the social world might not be mutually exclusive in reality, through unsupervised learning, which can discover new categories as well as give percentage predictions on multiple categories, removing binary categorical assertions.
Finally, to confront the absence of focus on the processes underlying the meaning and context of data in the social world, computational methods rely on the oversharing of data producers. The data are not predetermined by the researcher but presented by respondents, with all the information they make available. This can bring out connections between the subject of interest and other processes that impinge on it, connections that a researcher in a more traditional survey study might not have thought played a role in the phenomenon in question.
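As a rough illustration of what “percentage predictions on multiple categories” can look like, the sketch below assigns one data point a share of membership in each of two clusters instead of a single either/or label. It is not taken from Nelson’s talk: the function name, the centroids, and the exponential weighting are all assumptions chosen only to keep the example small.

```python
import math


def soft_membership(point, centroids):
    """Return membership shares that sum to 1 (hypothetical helper).

    Closer centroids receive larger shares. The exp(-distance) weighting
    is an illustrative assumption, not a method described in the talk.
    """
    weights = [math.exp(-math.dist(point, c)) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


# Two made-up cluster centers in a two-dimensional feature space.
centroids = [(0.0, 0.0), (4.0, 0.0)]

# A point nearer the first center belongs mostly, but not wholly, to it.
shares = soft_membership((1.0, 0.0), centroids)

# A point exactly between the centers is split evenly.
midpoint_shares = soft_membership((2.0, 0.0), centroids)
```

The point is only that the output is a set of proportions across categories, so a case never has to be forced into one box.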

Computational methods, then, respond positively to the pitfalls facing traditional regression analysis, but what makes these methods particularly feminist? Nelson argued that the vector space these methods rely on is the epistemic ground upon which they can be considered feminist. Vectors represent quantities that have both magnitude and direction and that can be added together and scaled. Vector space allows for high dimensionality, providing general representations of complex data including texts, networks, images, and maps; it allows for clustering and scaling, making it possible to represent overlapping and interlocking phenomena in correspondence with one another. The vector space used by computational methods thus represents interlocking systems of power relationally, letting us see the interactions between these systems and revealing the lived experiences underlying them. It is both the ground upon which these methods rely and what establishes them as feminist from their epistemic base.
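For readers unfamiliar with vector space, a minimal sketch of the operations mentioned above may help. It treats three made-up documents as word-count vectors, then adds, scales, and compares them. The vocabulary, counts, and function names are hypothetical and serve only as illustration; nothing here comes from Nelson’s own analyses.

```python
import math


def add(u, v):
    """Add two vectors component by component."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every component of a vector by the number c."""
    return [c * a for a in v]


def cosine_similarity(u, v):
    """Compare the directions of two vectors, ignoring their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical word counts over the vocabulary ["care", "work", "power"].
doc_a = [3, 1, 0]
doc_b = [2, 1, 1]
doc_c = [0, 1, 4]

combined = add(doc_a, doc_b)   # pooled counts of two documents
doubled = scale(2, doc_a)      # same direction as doc_a, twice the length

# Documents with similar emphasis point in similar directions.
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Here `doc_a` is closer in direction to `doc_b` than to `doc_c`, which is the kind of relational comparison that clustering builds on.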

From this base, Nelson then advocated for the establishment of an Interpretive Data Science, one that preserves the epistemic grounding given by computational methods and pushes it forward into an active program guided by three principles: a data science that 1) imbues action with meaning, 2) seeks to measure not just how much of a phenomenon is present but to uncover what drives these differences in measurement, and 3) aims to reveal intersectional experiences. Nelson then provided a case study for each of these three guiding principles to show what an interpretive data science looks like in action. To highlight the ways in which action can be imbued with meaning, Nelson referred to a study she has been working on that examines social movement discourse to reveal goal orientations that group movements together in new ways not accounted for in the literature, letting discourse and context guide how we study social movements in connection with one another rather than as generalized occurrences of the same phenomenon. To get at what drives differences in quantitative measurement, Nelson referred to work she has conducted on gendered inequality in service labor performed in the medical profession, finding that not only do female physicians do more work than male physicians, but the content of their work is different: their comments are more helpful, specifically for struggling medical students, producing different outcomes and forcing us to ask new questions. To highlight the ways in which an interpretive data science can reveal interlocking systems of power, Nelson referred to a study of North American slave narratives that revealed differences in the language men and women used to represent their experiences, with each drawing on different rhetorical and value systems.

To begin bringing the insights from these various studies together, Nelson asked the audience: What does it all mean? For Nelson, the emergence of these more interpretively focused data projects represents a missed opportunity: they remain a series of exceptions to the research conducted in the mainstream under the tyranny of regression analysis and the epistemic conditions it relies on, namely a concern with generalizability and the quest for universal theories and laws. Not only are these interpretive exceptions important, but for Nelson they constitute what should be the center of data science, providing not only an alternative to mainstream data science models but, in many instances, models and understandings of the social world that are simply more accurate. The pillars of Nelson’s proposed interpretive data science aim to preserve and account for the role that context and interpretation play in our understanding of the social world. This context is critical and can be retained by making big data small (Foucault Welles 2014), moving against aggregation to bring more contextual elements into our analyses, and staying rooted in a qualitative understanding of the data. This approach should be further complemented by bringing in groups generally left out of, or under-represented within, traditional research, and it should fundamentally aim to reveal interlocking systems of power, not reproduce them. Nelson rounded out her talk by arguing that a program for interpretive data science should be concerned with meaning-making rather than patterns, with understanding rather than laws, and should aim to be contextual as opposed to universal, all of which are possible under the feminist epistemology presented to us by machine learning and computational methods.
