
Ethics of Social Media Content Moderation: Examining Warning Labels, Contextual Information, and UI/UX Tools

The primary aim of this ongoing project, begun in mid-2020, has been to provide robust social and ethical analysis and evaluation of different approaches to online content moderation, with an emphasis on warning labels, the curation of information, and techniques relating to user interface and user experience (UI/UX) design choices. Researchers have also drawn on global surveys to assess public attitudes toward content moderation techniques. The project’s central research question is: How can online social media platforms, such as YouTube, Facebook, TikTok, and Twitter, meet the needs and demands of society in terms of shaping informational content (and context) in an ethical and effective way? The overall goal is to create more knowledge in this emerging area of information ethics to help companies and policymakers make more informed, and more ethical, decisions.

Publications from the project have included articles in the Yale Journal of Law & Technology; Journal of the Association for Information Science and Technology; Harvard Kennedy School Misinformation Review; Journal of Experimental Psychology; as well as conference papers at the International Communication Association and the Midwest Political Science Association.

The project includes a focus on labeling to address scientific misinformation related to COVID-19, as well as a focus on how best to incorporate fact-checking into labeling efforts. The project draws on knowledge gained from current attempts at online content labeling in general, as well as from strong analogs to content labeling, such as in library sciences and food labeling.

Support for this project comes from Northeastern’s Office of the Provost; the College of Arts, Media & Design; Meta, Inc.; and X Corp.


Prof. John Wihbey, Lead Investigator, Associate Professor of Media Innovation

Prof. Myojung Chung, Assistant Professor of Journalism and Media Advocacy

Prof. Don Fallis, Professor of Philosophy and Computer Science

Prof. Kay Mathiesen, Associate Professor of Philosophy

Prof. Ronald Sandler, Professor of Philosophy, Director, Ethics Institute

Dr. Briony Swire-Thompson, Research Scientist, Network Science Institute

Current researchers:

Garrett Morrow (PhD in Political Science)

Project alums:

Dania Alnahdi (BS in Computer Science & Design)

Gabriela Compagni (BS in Philosophy, Minor in Data Science)

Jessica Montgomery Polny (MS in Media Advocacy)

Nicholas Miklaucic (BS in Data Science & Behavioral Neuroscience)

Roberto Patterson (MS in Media Advocacy)

Dr. Matthew Kopec, Associate Director, Ethics Institute


Prof. John Wihbey

As advanced artificial intelligence (AI) technologies are developed and deployed, core zones of information and knowledge that support democratic life will be mediated more comprehensively by machines. Chatbots and AI agents may structure most internet, media, and public informational domains. What humans believe to be true and worthy of attention – what becomes public knowledge – may increasingly be influenced by the judgments of advanced AI systems. This pattern will present profound challenges to democracy. A pattern of what we might consider “epistemic risk” will threaten the possibility of AI ethical alignment with human values. AI technologies are trained on data from the human past, but democratic life often depends on the surfacing of human tacit knowledge and previously unrevealed preferences. Accordingly, as AI technologies structure the creation of public knowledge, the substance may be increasingly a recursive byproduct of AI itself – built on what we might call “epistemic anachronism.” This paper argues that epistemic capture or lock-in and a corresponding loss of autonomy are pronounced risks, and it analyzes three example domains – journalism, content moderation, and polling – to explore these dynamics. The pathway forward for achieving any vision of ethical and responsible AI in the context of democracy means an insistence on epistemic modesty within AI models, as well as norms that emphasize the incompleteness of AI’s judgments with respect to human knowledge and values.

Prof. John Wihbey & Garrett Morrow

Based on representative national samples of ~1,000 respondents per country, we assess how people in three countries, the United Kingdom, the United States, and Canada, view the use of new artificial intelligence (AI) technologies such as large language models by social media companies for the purposes of content moderation. We find that about half of survey respondents across the three countries indicate that it would be acceptable for company chatbots to start public conversations with users who appear to violate rules or platform community guidelines. Persons who have more regular experiences with consumer-facing chatbots are less likely to be worried in general about the use of these technologies on social media. However, the vast majority of persons (80%+) surveyed across all three countries worry that if companies deploy chatbots supported by generative AI and engage in conversations with users, the chatbots may not understand context, may ruin the social experience of connecting with other humans, and may make flawed decisions. This study raises questions about a potential future where humans and machines interact a great deal more as common actors on the same technical surfaces, such as social media platforms.

Garrett Morrow, Prof. Myojung Chung, Prof. John Wihbey, Dr. Mike Peacey, Yushu Tian, Lauren Vitacco, Daniela Rincon Reyes, & Melissa Clavijo

Citizens and policymakers in many countries are voicing frustration with social media platform companies, which are, increasingly, host to much of the world’s public discourse. Many societies have considered regulation to address issues such as misinformation and hate speech. However, there is relatively little data on how countries compare precisely in terms of public attitudes toward social media regulation. This report provides an overview of public opinion across four diverse democracies – the United Kingdom, South Korea, Mexico, and the United States – furnishing comparative perspectives on issues such as online censorship, free speech, and social media regulation. We gathered nationally representative samples of 1,758 (South Korea), 1,415 (U.S.), 1,435 (U.K.), and 784 (Mexico) adults in the respective countries. Across multiple measures, respondents from the United States and Mexico are, on the face of it, more supportive of freedoms of expression than respondents from the United Kingdom and South Korea. Additionally, the United Kingdom, South Korea, and Mexico are more supportive of stricter content moderation than the United States, particularly if the content causes harm or distress for others. The data add to our understanding of the global dynamics of content moderation policy and speak to civil society efforts, such as the Santa Clara Principles, to articulate standards for companies that are fair to users and their communities. The findings underscore how different democracies may have varying needs and translate and apply their values in nuanced ways.

Prof. John Wihbey, Dr. Matthew Kopec & Prof. Ronald Sandler

Social media platforms have been rapidly increasing the number of informational labels they are appending to user-generated content in order to indicate the disputed nature of messages or to provide context. The rise of this practice constitutes an important new chapter in social media governance, as companies are often choosing this new “middle way” between a laissez-faire approach and more drastic remedies such as removing or downranking content. Yet information labeling as a practice has, thus far, been mostly tactical, reactive, and without strategic underpinnings. In this paper, we argue against defining success as merely the curbing of misinformation spread. The key to thinking about labeling strategically is to consider it from an epistemic perspective and to take as a starting point the “social” dimension of online social networks. The strategy we articulate emphasizes how the moderation system needs to improve the epistemic position and relationships of platform users — i.e., their ability to make good judgments about the sources and quality of the information with which they interact on the platform — while also appropriately respecting sources, seekers, and subjects of information. A systematic and normatively grounded approach can improve content moderation efforts by providing clearer accounts of what the goals are, how success should be defined and measured, and where ethical considerations should be taken into account. We consider implications for the policies of social media companies, propose new potential metrics for success, and review research and innovation agendas in this regard.

Garrett Morrow, Dr. Briony Swire-Thompson, Jessica Montgomery Polny, Dr. Matthew Kopec & Prof. John Wihbey

There is a toolbox of content moderation options available to online platforms such as labeling, algorithmic sorting, and removal. A content label is a visual and/or textual attachment to a piece of user-generated content intended to contextualize that content for the viewer. Examples of content labels are fact-checks or additional information. At their essence, content labels are simply information about information. If a social media platform decides to label a piece of content, how does the current body of social science inform the labeling practice? Academic research into content labeling is nascent, but growing quickly; researchers have already made strides toward understanding labeling best practices to deal with issues such as misinformation, conspiracy theories, and misleading content that may affect everything from voting to personal health. We set aside normative or ethical questions of labeling practice, and instead focus on surfacing the literature that can inform and contextualize labeling effects and consequences. This review of a kind of “emerging science” summarizes the labeling literature to date, highlights gaps for future research, and discusses important considerations for social media platforms. Specifically, this paper discusses the particulars of content labels, their presentation, and the effects of various label formats and characteristics. The current literature can help guide the usage and improvement of content labels on social media platforms and inform public debate and policy over platform moderation.


Prof. John Wihbey, Garrett Morrow, Prof. Myojung Chung, Dr. Mike Peacey

Social media companies have increasingly been using labeling strategies to identify, highlight, and mark content that may be problematic in some way but not sufficiently violating to justify removing it. Such labeling strategies, which are now being used by most major social platforms, present a host of new challenges and questions. This report, based on a national survey conducted in the U.S. in summer 2021 (N = 1,464), provides new insights into public preferences around social media company policy and interventions in the media environment. It is often assumed that there are highly polarized views about content moderation. However, we find relatively strong, bipartisan support for the basic strategy and general goals of labeling.

Prof. John P. Wihbey & Jessica Montgomery Polny

This archive presents a comprehensive overview of social media platform labeling strategies for moderating user-generated content. We examine how visual, textual, and user-interface elements have evolved as technology companies have attempted to inform users on misinformation, harmful content, and more. We study policy implementation across eight major social media platforms: Facebook, Instagram, Twitter, YouTube, Reddit, TikTok, WhatsApp, and Snapchat. This archive allows for comparison of different content labeling methods and strategies, starting when platforms first implemented labeling tactics, through the early months of 2021. We evaluate policy response to real-world events and the degree of friction labels have against user interaction with harmful content. This archive also serves as an appendix to other Ethics of Content Labeling publications.

See the archive’s project page overview. Individual platform links:




Gabriela Compagni

Analyzing how the news media cover the topic of content moderation and labeling, as well as how the public absorbs and spreads this coverage, can provide important insights into whether the interests and concerns of news producers line up with those of the public. In this study, we used the open-source analytical platform Media Cloud to filter thousands of news stories using targeted keywords relating to content moderation, and then searched for common links among articles, such as popular phrases, people, corporate entities, or themes. In tandem with this examination, we assessed the extent to which the news content was viewed by the public and shared on social media, and gauged the public’s interest in the topic over time. Our results should be of interest to corporate policymakers, researchers, and regulators along at least two dimensions. First, there is a substantial difference between what aspects of content moderation and labeling seem important to news producers (measured by media inlinks to articles) and what seems important to news consumers (measured by, for example, the number of Facebook shares). Second, examining keywords that appear most often across content moderation news stories, such as “misinformation” and “removal,” provides further insights into what concerns are salient to the public. It is possible that an over-reliance on news coverage has been causing stakeholders in the content moderation game to misjudge what really matters to the public, in which case our examination could help content moderators to better align their efforts with public concerns.

Dr. Briony Swire-Thompson, Nicholas Miklaucic, Prof. John Wihbey, Dr. David Lazer, Dr. Joseph DeGutis

The backfire effect occurs when a correction increases an individual’s belief in the misconception, and its possibility is often cited as a reason not to correct misinformation. The current study aimed to test whether correcting misinformation backfires more than a no-correction control, and whether item-level differences in backfire rates were associated with (1) measurement error or (2) theoretically meaningful attributes related to worldview and familiarity. In two near-identical longitudinal pre/post studies, we recruited 920 participants from Prolific Academic. Participants rated 21 misinformation items and 21 facts and were assigned to either a correction condition or a test-retest control. We found that no item backfired more in the correction condition compared to the test-retest control or initial belief rating. Item backfire rates were strongly correlated with item reliability and did not correlate with importance/worldview. Familiarity and backfire rate were significantly negatively correlated, though familiarity was highly correlated with reliability.

Garrett Morrow & Gabriela Compagni

Local news sources in the United States have been dwindling for years. Although newsrooms are shrinking, the American public generally trusts its local news sources. During crisis events like the COVID-19 pandemic, people actively search for information, and some of what they find will inevitably be misinformation, given the volume of misinformation being created and the affordances of social media services that encourage viral spread. It is critical to understand whether local news is spreading misinformation or acting as a cross-cutting information source. This study uses local news data from a media aggregator and mixed methods to analyze the relationship between local news and misinformation. Findings suggest that local news sources are serving as cross-cutting information sources but occasionally reinforce misinformation. We also find a worrying increase in anti-mask stories and an accompanying decrease in pro-mask stories after a mask mandate is enacted.

Prof. John Wihbey & Garrett Morrow

For over a century, Supreme Court Justice Oliver Wendell Holmes’ metaphor of the “Marketplace of Ideas” has been central to Americans’ conceptualization of the First Amendment. However, the metaphor has evolved, and today’s marketplace looks much different than the marketplace of the early twentieth century. We argue that the Marketplace of Ideas is now a dynamic environment of information exchange that is distributed throughout the internet and private applications and is guided by algorithms. The modern Marketplace of Ideas frames discussion of freedom of expression and content moderation. An updated understanding of the metaphor allows for an improved public sphere of discussion where free thought can flourish, truth can be tested, and ideas can be productively exchanged. This paper articulates three central evaluative criteria against which a given contemporary marketplace regime can be judged: instrumental value; epistemic value; and normative value. In this paper, we explain how the metaphor has evolved into marketplace 3.0 and the criteria necessary for judging the usefulness of the Marketplace of Ideas.


Tackling misinformation: What researchers could do with social media data

  • with Dr. Briony Swire-Thompson and Prof. John Wihbey as collaborators

Designing Solutions to Misinformation, CfD Conversations 03

  • Moderated by Prof. John Wihbey with Roberto Patterson as contributor
