Overview
Computers are powerful machines, and much of our world is oriented around interfacing with them. Computers only understand instructions in 0s and 1s (called binary code), but very few people can write instructions using 0s and 1s. Instead, most people use tools that others have written to interact with computers. There are both general and specialized tools for interacting with computers: general tools, such as programming languages, are more flexible but more difficult to learn; specialized tools are easier to learn but perform only a limited number of tasks (e.g., Word for word processing or GIS tools for working with maps).
In this glossary, we define some tools that enable social scientists and humanists to productively use computers, as well as other related concepts.
Key Terms
An API, or application programming interface, is a set of subroutine definitions, communication protocols, and tools for building software that ultimately allows applications to communicate with one another. An API may be for a web-based system, operating system, database system, computer hardware, or software library.
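As a rough illustration of how one program talks to another through a web-based API, the sketch below builds a request URL and reads a JSON reply. The endpoint `https://api.example.org/v1/books`, its parameters, and the response fields are all hypothetical, and the reply is a hard-coded string standing in for what a server might send back, so nothing is actually fetched.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only.
BASE_URL = "https://api.example.org/v1/books"


def build_request_url(base_url, **params):
    """Attach query parameters to a base URL in the form a web API expects."""
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_text):
    """Pull the title of each record out of a JSON response body."""
    payload = json.loads(response_text)
    return [record["title"] for record in payload["results"]]


url = build_request_url(BASE_URL, author="Austen", limit=2)

# Stand-in for the text a server might return (invented sample data).
sample_response = '{"results": [{"title": "Emma"}, {"title": "Persuasion"}]}'
titles = parse_titles(sample_response)

print(url)
print(titles)
```

The point of the sketch is the division of labor: the API defines which parameters may be sent and what structure comes back, so the calling program never needs to know how the other application stores its data.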
Citation software is used to manage different sources, analyze source usage, and produce citations. Citation software streamlines the processes of creating citations, managing citation formatting, and automatically filling in citation details via identifiers like ISBNs or DOIs. Examples of citation software include RefWorks, Zotero, Mendeley, and EndNote, all of which offer basic versions for free or more expansive choices for a fee.
Computational text analysis is how computers “read” texts. Text analysis as a general definition is a way to read, understand, and make inferences about texts. While people excel at making inferences about and understanding texts, computers are much more effective at counting and identifying patterns across texts.
For example, computers can count words or phrases (sometimes referred to as n-grams, where n is the number of words in the phrase), automatically label the parts of speech in a text (such as nouns, verbs, and prepositions), trace linguistic patterns using collocation and word trees, and perform more complex forms of analysis like topic modeling.
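The counting of words and phrases described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that words are separated by whitespace and that case can be ignored; real projects usually need more careful tokenization. The function name `ngrams` and the sample sentence are made up for the example.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in the text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "the cat sat on the mat and the cat slept"

word_counts = Counter(ngrams(sample, 1))    # n = 1: single words
bigram_counts = Counter(ngrams(sample, 2))  # n = 2: two-word phrases

print(word_counts.most_common(2))
print(bigram_counts["the cat"])
```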
Digital archives are collections of public and historic materials stored online, often for preservation and accessibility. These materials are usually digitized through scanning, photography, or transcription. Examples of digital archives can be found on most museum websites, which often have digital repositories of their art. Some specific examples include the Library of Congress Digital Collections or the Washington State Archives.
Digital ethics are the principles that guide users and producers as they research, collect data from, and make choices in digital spaces, many of which have real-world consequences. It is especially important to think about who has access to data, what they are doing with it (e.g., tools, algorithms), and how this may impact individuals and societies. Understanding the political, social, and economic contexts of how data is stored, shared, and analyzed can help researchers, users, and creators assess these impacts.
Digital ethics lead researchers to make informed decisions when they gather and analyze data produced online; guide users to think critically about the tools they use; help teachers choose the best pedagogical platforms for their students; shape creators’ choices for gathering and disseminating information; and highlight the potentially harmful ways data is collected, stored, and analyzed.
A Digital Object Identifier (DOI) is a unique identifier given to digital objects and publications like academic journal articles, government documentation, or research reports. DOIs are useful because a unique code is assigned to each publication, which makes it easy to look up a specific document and its details by its DOI. Unlike ISBNs, DOIs can also be used to link directly to the online document itself.
Excel is a Microsoft spreadsheet program that can read, analyze, and visualize spreadsheet data; Excel is mainly used for quantitative analysis. There are multiple types of files that work with Excel, including Excel files (e.g. .xlsx files) and delimiter-separated value files (e.g., comma separated value [.csv] or tab separated value [.tsv] files).
While Excel is not the only spreadsheet program (other options include Numbers for macOS users and free programs like Google Sheets and LibreOffice), it is often used in professions that deal with quantitative data because it can quickly organize data and enables statistical analyses and data visualizations. However, researchers who choose to work with Excel should be aware that its proprietary data format and auto-formatting behaviors are serious limitations that should be considered carefully. Excel is not free.
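The delimiter-separated formats mentioned above (.csv and .tsv) are plain text, so they can be read without Excel. The sketch below uses Python's standard `csv` module to show that the two formats differ only in the character separating the values. The table contents and the helper name `read_delimited` are invented for the example.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimiter-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Invented sample data: the same table written as CSV and as TSV.
csv_text = "name,age\nAda,36\nGrace,45\n"
tsv_text = "name\tage\nAda\t36\nGrace\t45\n"

csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")

# Values arrive as text, so convert them before doing arithmetic.
mean_age = sum(int(row["age"]) for row in csv_rows) / len(csv_rows)

print(csv_rows == tsv_rows)
print(mean_age)
```

Because the values are kept as plain text until they are explicitly converted, this route avoids the auto-formatting behavior noted above.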
Geospatial refers to data specific to a particular location. GIS, or geographic information systems, capture, store, manipulate, and manage spatial or geographic data, typically for research presentations or platforms. Examples of GIS and geospatial programs include ArcGIS, Carto, and Maptitude, which are not free, and QGIS, which is a free, open-source program with more limited options.
Git is a version control system that coders often use to store different versions of their code. This is a preventative and organizational measure that lets coders return to earlier versions of their code. Git can store these different versions on coders’ local machines or, through a hosting service such as GitHub, in cloud storage.
GitHub is a web-based hosting service for version-controlled repositories. It stores information in cloud storage and is typically used by coders to store, publish, and share their code, and to collaborate on projects. GitHub uses Git to maintain different versions of code and files.
What makes GitHub particularly useful for coding is its interface, which displays the inputs and outputs of code and encourages documentation from coders in the form of README files. GitHub can be collaborative in that multiple users can work on one repository, but a repository can also be published and maintained by only one person, while others can “fork” it (create their own copy of the code and information in that repository).
Graphical User Interfaces (GUIs) let users interact with computers through visual elements such as dropdown menus, icons, buttons, and toolbars. This is the most common method for users to interact with their computers and access software. Examples of applications with GUIs are Microsoft Word, Google Chrome, and your local file manager (e.g., Finder on Macs or File Explorer on PCs), all launched by clicking on a specific icon.
An alternative way of interacting with the computer is through the command line, which only allows keyed instructions instead of clicking icons with a cursor.
An Integrated Development Environment (IDE) is a type of software for coders and software engineers to build programs, analyze data, and edit code. Typically, IDEs have an environment that allows coders to input code, see outputs, view their data, receive error messages for debugging, and more. Some IDEs include Spyder, RStudio, and Oxygen.
An International Standard Book Number (ISBN) is a unique numeric identification code given to individual editions of books. ISBNs are useful because a unique code is assigned to each book, which makes it easy to look up a specific text and its details by this number. Publishers obtain ISBNs from national or regional agencies affiliated with the International ISBN Agency and assign them to their publications. ISBNs are commonly used in library references and citation software.
Machine learning is an approach in which coders guide computer systems to carry out tasks without explicitly telling the systems what to do. Coders provide a series of instructions and data to train a machine learning algorithm, and the algorithm then uses these instructions and data to perform tasks independently. An example of machine learning is facial recognition software, in which the computer is given thousands of examples of pictures with and without faces in order to learn how to recognize faces in pictures it has never seen before.
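The idea of learning from labeled examples and then handling unseen cases can be shown at toy scale. The sketch below is not facial recognition; it swaps in a much simpler technique, a one-nearest-neighbor classifier, which labels a new item with the label of the most similar stored example. The data (height and weight pairs labeled "cat" or "dog") and the function names are invented for illustration.

```python
def train(examples):
    """For a nearest-neighbor model, training just means storing labeled examples."""
    return list(examples)


def squared_distance(a, b):
    """Sum of squared differences between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(model, point):
    """Label a new point with the label of the closest stored example."""
    _, label = min(model, key=lambda example: squared_distance(example[0], point))
    return label


# Invented toy data: (height_cm, weight_kg) paired with a label.
labeled = [((20, 4), "cat"), ((25, 5), "cat"), ((60, 30), "dog"), ((70, 35), "dog")]
model = train(labeled)

# Neither of these points appears in the training data.
print(predict(model, (22, 4)))
print(predict(model, (65, 28)))
```

No rule such as "animals over 40 cm are dogs" is written anywhere in the code; the labels for new points come entirely from the examples supplied.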
Network analysis is the investigation of the elements, structures, and processes of networks. Networks are formed of nodes (the things or actors) and edges (the ties or relationships between the nodes). Networks can include social connections, road layouts, and organizational ties, among many other examples. Network analysis can be used to investigate phenomena that include social networks, the spread of diseases, traffic patterns, and migration flows. Examples of network analysis software include Gephi, VOSviewer, IBM i2 Analyst's Notebook, and R/Python libraries.
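The node-and-edge vocabulary above maps directly onto simple data structures. As a minimal sketch using only the Python standard library (dedicated libraries offer far more), the code below stores an invented friendship network as a list of edges and counts each node's ties, a measure usually called degree. The names and the helper `build_network` are made up for the example.

```python
from collections import defaultdict

# Invented edge list: each pair is a tie between two people (the nodes).
edges = [("Ana", "Ben"), ("Ana", "Cam"), ("Ben", "Cam"), ("Cam", "Dee")]


def build_network(edge_list):
    """Build an undirected network as a mapping from each node to its neighbors."""
    network = defaultdict(set)
    for a, b in edge_list:
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(edges)

# Degree: the number of edges attached to each node.
degree = {node: len(neighbors) for node, neighbors in network.items()}
most_connected = max(degree, key=degree.get)

print(degree)
print(most_connected)
```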
Notebooks, or computational notebooks/essays, are active programming environments that allow users to write and run code, and integrate narrative, code, and output in the same document. Notebooks are becoming increasingly popular for researchers to present their analyses and interpret their results. Because notebooks are more intuitive for some compared to standard IDEs (they look more similar to standard essays), these tools are increasingly used for pedagogy in the social sciences and humanities. Specific tools for composing notebooks include Jupyter Notebooks (formerly known as IPython Notebooks) and R Markdown.
Programming languages are the ways in which coders write instructions for their computers to follow. Computers only understand instructions in 0s and 1s (called binary code), but programming languages allow users to write instructions in a form that is more interpretable to humans and that the computer then translates into 0s and 1s. Programming languages range from low-level languages that are closer to binary code (e.g., C and C++) to high-level languages that are more user friendly (e.g., Python, R, and JavaScript).
Procedural programming languages are written as a series of instructions that the computer follows (e.g., the C language). Object-oriented programming languages define data structures, data types, and operations that can be performed on these data objects. Because of their orientation toward data objects, object-oriented languages are more popular for data analysis in the social sciences and humanities.
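The contrast between the two styles can be seen by solving one small task both ways. The sketch below is written in Python, which supports both styles, and computes an average first as a series of steps and then as an operation attached to a data object. The class name `Survey` and the numbers are invented for illustration.

```python
# Procedural style: a series of instructions operating on plain data.
def mean_procedural(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)


# Object-oriented style: the data and the operations on it are defined together.
class Survey:
    def __init__(self, responses):
        self.responses = list(responses)

    def mean(self):
        return sum(self.responses) / len(self.responses)


scores = [3, 4, 5]

print(mean_procedural(scores))
print(Survey(scores).mean())
```

Both calls produce the same number; what differs is where the logic lives. In the second version, anything that is a `Survey` carries its own `mean` operation with it.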
Often coders use Integrated Development Environments to facilitate the use of a programming language; this enables users to write and run code, view variables, and inspect output in the same application. More recently, those using programming for data analysis often display and share their analyses through notebooks, which integrate narrative, code, and output in one document. These documents are sometimes called computational essays.
Python is an object-oriented programming language. Python was originally built as a general-purpose language with the ability to work with a range of different data structures and was primarily used to build applications. Now, however, Python is one of the most popular languages used for data analysis and visualizations; it allows users to collect and structure data—and is also used for statistical analysis, natural language processing, machine learning, network analysis, and other forms of computational analyses. Popular IDEs for Python include Spyder and PyCharm. Jupyter Notebooks are popularly used to publish analyses carried out with Python, although Jupyter can also run other languages, including R.
R is an object-oriented programming language. R was originally built for statistical analysis, although it now works with all types of data and has the ability to build applications. R enables users to conduct statistical analysis, data visualization, natural language processing, machine learning, network analysis, and other forms of computational analysis. The most popular IDE for R is RStudio, and R Markdown is used to compose and publish notebooks in R.
Social media are applications and web services that allow users to create and share content like blog posts, images, or videos. Social media provide a method for online social networking. Examples include Facebook, Twitter, Pinterest, and LinkedIn.
Stata is general-purpose statistics software that allows users to manage data, analyze statistics, run regressions, and make graphics or simulations for research. The name “Stata” is a blend of “statistics” and “data.” Stata is not free.
The Text Encoding Initiative (TEI) is an organization that has created standards for representing humanities research materials, primarily texts, in digital forms. The TEI standard is a free application of XML to texts and is used to describe the features and structures (e.g., paragraphs, lines of verse, etc.) of textual content. XML is a metalanguage that provides a syntax for markup, while TEI is a language that provides a vocabulary and a grammar for encoding texts.
Web design encompasses the various skills and disciplines in the production and maintenance of websites, including graphic and interface design and web accessibility. Example platforms for web design include WordPress, Wix, Jekyll, and Omeka. All of these examples offer limited, free options (either with limited storage space, a restricted number of websites one could make, or set themes one could use), with more choices available for a monthly or annual fee.
Web scraping is the process of extracting large amounts of data from an internet source and downloading the data to a local repository. The scraping process can be done manually, but is usually automated by using software because of the large amount of data typically involved.
When web scraping, it is best to use an API whenever possible. Some websites prohibit web scraping and/or the use of robots. Even if a website allows web scraping, be responsible and do not overload its servers. Some web scraping tools include TAGS, Python, R, and JavaScript.
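Many websites state their rules for robots in a plain-text file named robots.txt. As a hedged sketch of checking those rules before scraping, the code below uses Python's standard `urllib.robotparser` module on an invented robots.txt, so it does not contact any server. The user-agent name `my-research-bot` and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one is served from the site's root.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical user-agent name and URLs, for illustration only.
allowed = parser.can_fetch("my-research-bot", "https://example.org/articles/1")
blocked = parser.can_fetch("my-research-bot", "https://example.org/private/data")
delay = parser.crawl_delay("my-research-bot")

print(allowed, blocked, delay)
```

A scraper that respects these rules would skip any URL for which `can_fetch` returns False and would pause for the stated delay (in seconds) between requests. A robots.txt file does not replace a site's terms of use, which may restrict scraping further.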
XML (eXtensible Markup Language) is a set of provisions for encoding documents so that they are readable by both computers and humans. XML uses elements to label and mark the boundaries of document components, and attributes to provide more information about elements. XML is a free standard format to create and share structured data.
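To make the distinction between elements and attributes concrete, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The document itself is invented for the example: `poem` and `line` are elements, while `title` and `n` are attributes that give more information about them.

```python
import xml.etree.ElementTree as ET

# Invented sample document, for illustration only.
document = """
<poem title="Sample">
  <line n="1">Roses are red</line>
  <line n="2">Violets are blue</line>
</poem>
"""

root = ET.fromstring(document.strip())

# Each <line> element has an attribute (n) and text content.
lines = [(line.get("n"), line.text) for line in root.findall("line")]

print(root.tag, root.get("title"))
print(lines)
```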
-
Type of Program
-
Links and Resources