The summer of 2024 featured several exciting conferences in computational social science (CSS), including the International AAAI Conference on Web and Social Media (ICWSM) in Buffalo, NY, and the International Conference on Computational Social Science (IC2S2) in Philadelphia, PA. Scholars from across the globe gathered to share their projects, expertise, and resources with the CSS community. Each conference offered a variety of knowledge-sharing opportunities, from hands-on workshops to live demos showcasing novel data sources and methods aimed at streamlining and enhancing access to CSS research.
One such resource showcased at ICWSM was Communalytic, a CSS research tool designed for studying online communities and discourse. Communalytic offers modules for collecting data from platforms such as Reddit, Telegram, and YouTube, along with analysis tools for topics, networks, toxicity, and sentiment. The tool is available in two account types, EDU (free) and PRO (paid), which differ in data storage capacity and collaboration features. It supports academic research and teaching by providing efficient, API-based access to social media data. What sets the tool apart is that it stores API keys for multiple social media platforms in one place and provides streamlined, structured ways to retrieve data through a graphical user interface (GUI). This lowers the technical startup cost for many social-media-related CSS research projects. Communalytic is developed and maintained by researchers at the Social Media Lab at Toronto Metropolitan University’s Ted Rogers School of Management.
Wikipedia is arguably the web’s single greatest repository of human knowledge, and we can learn a lot about human behavior and society by examining who or what has a Wikipedia page, what those pages include, and how individuals and events are discussed. To increase researcher access to this knowledge base, researchers from Northwestern University held a tutorial entitled “Wikimedia Data Tutorial: Using public data from Wikipedia and its sister projects for academic research”. The tutorial covers the technical concepts involved in working with Wikipedia data, potential pitfalls, and best practices and recommendations from researchers who have been working to increase access to this body of information.
Social media data access featured prominently at ICWSM. The tutorial “Scraping Reddit the Right Way: A Guide to Legal and Ethical Data Collection with RedditHarbor” introduced learners to ethical and legal methods for collecting Reddit data. Co-organized by Socius and the Open Data Institute (ODI), the session covered the ODI’s Global Data Infrastructure, ethical considerations for using Reddit data, and hands-on training with the RedditHarbor toolkit.
Additional sessions at IC2S2 do not have public-facing materials. These included a guide to the Meta Content Library for social science researchers and a guide to using the dark web for social science research. Both conferences were rich in information and displayed the scientific community’s commitment to increasing access to, and transparency around, social media data.
The overarching theme across these conferences was the scientific community’s growing commitment to responsible, accessible, and ethical data practices. The emphasis on tools like RedditHarbor, the Meta Content Library, and dark web research methods reflects the growing complexity of the data sources used in computational social science. Beyond these tools, both ICWSM and IC2S2 demonstrated the CSS community’s drive to expand the boundaries of research by making data collection and analysis more transparent, secure, and collaborative. The focus on responsible data access is a significant takeaway: it equips researchers with not only the methods but also the ethical frameworks needed to advance social media research in impactful ways.