These resources cover programming languages such as R and Python, introductions to computational text analysis, and natural language processing methods such as topic modeling and word embedding models.
Guides and Tutorials
- Codecademy for Python – Online teaching resources for learning Python and other programming languages.
- Download and Install PyCharm – Website for downloading and installing PyCharm, an Integrated Development Environment (IDE) for Python.
- Download and Install IPython – Website for downloading and installing IPython, an interactive shell for Python programming.
- Download and Install Python – Website for downloading and installing Python.
- Google Python Tutorial – Google’s free online Python programming class.
- The Hitchhiker’s Guide to Python – A textbook for both novice and expert developers by Kenneth Reitz and Tanya Schlusser.
- Machine Learning with Sci-Kit Learn – Video tutorials for machine learning in Python with the scikit-learn package.
- Practice Python – Beginner Python exercises that come with discussion topics.
- Python for Everybody – Open source Python textbook with exercises.
- Python for Informatics – An applied but comprehensive introductory Python textbook by Charles Severance, with sections on text parsing.
- Python NLP Course – Yandex Data School’s Natural Language Processing in Python Course.
- Python Practice Problems for Beginner Coders – Resource with a Google Colab notebook featuring practice problems for people already familiar with the basics.
- Python Programming for the Humanities – Interactive tutorial and introduction to Python programming for the humanities by Folgert Karsdorp.
- Python Text Analysis Course – Material on GitHub for Laura K. Nelson’s text analysis course.
- Python Text Analysis Tutorial – A list of tutorials on Python and text analysis assembled by Neal Caren of the University of North Carolina, Chapel Hill.
- Web-Scraping Tutorial – Tutorial on web scraping with Python.
- Computational Statistics in R Course – Material from Northeastern professor Nick Beauchamp’s Computational Social Science course.
- Data Science Specialization Course – A broad course in R covering data science techniques such as statistical inference, machine learning, regression models, and exploratory analysis.
- Download and Install R – Website for downloading and installing the R programming language.
- Download and Install RStudio – Website for downloading and installing RStudio, an Integrated Development Environment (IDE) for R.
- Humanities Data Analysis Class – Material from Ben Schmidt’s graduate seminar on data analysis for the humanities.
- A Light Introduction to Text Analysis in R – An introduction and overview of text analysis tools in R.
- Managing and Manipulating Data in R – Material from a UCLA programming with R course.
- RSeek – A search tool for finding resources on R programming.
- Simple Data Types in R – Information on basic data types in the R programming language.
- Text Analysis With R for Students of Literature – Matthew Lee Jockers’s introductory text, with a PDF available through the NEU Library.
- Text as Data R Course – Material on using textual data in R by Chris Bail of Duke University.
- Applied Text Analysis Course – Material from Justin Grimmer’s course on Applied Text Analysis for Social Scientists.
- Computational Text Analysis for Social Science – Article on the use of text analysis methods in social science by Brendan O’Connor, David Bamman, and Noah A. Smith of Carnegie Mellon University.
- Natural Language Processing with Deep Learning – Material from Stanford University’s “Natural Language Processing with Deep Learning” course.
- Python and Text Analysis for Absolute Beginners – A Jupyter Book on basic text analysis with Python created by the NULab and Research Data Services in the Northeastern Library.
- Stanford CS224N: NLP with Deep Learning – Video lectures from the Winter 2019 offering of Stanford University’s Natural Language Processing with Deep Learning course.
- Text Analysis Introduction – Basic guide to introductory text analysis from “Tooling Up for Digital Humanities.”
- Text Encoding Initiative (TEI) – TEI Guidelines are published as open-source software.
- Text Mining with R: A Tidy Approach – Textbook by Julia Silge and David Robinson.
- Where to Start – A guide on how to start text mining, written by Ted Underwood of the University of Illinois, Urbana-Champaign.
Tools and Methods
- AntConc – A concordancing and text analysis toolkit created by Laurence Anthony.
- CasualConc – A Mac OS X-native toolkit (AntConc’s Mac version is ported from the PC and has some bugs).
- Lexos – A tool for scrubbing, chunking, and tokenizing text, in addition to performing modest analysis and visualizing clusters. See: How to Create Topic Clouds with Lexos – Blog post by Scott Kleinman on using Lexos for topic modeling word clouds.
- Voyant Tools – A simple, yet powerful web-based text analysis and visualization tool.
- Word Tree – A tool that creates word trees from a block of text.
- TEI Publisher – Practice and view demos of text encoding with this online tool.
- Guided Tour – A comprehensive guide to topic modeling by Scott Weingart of Carnegie Mellon University.
- Topic Modeling Made Just Simple Enough – An introduction to topic modeling written by Ted Underwood of the University of Illinois, Urbana-Champaign.
- Topic Modeling Toolbox – An alternative to MALLET for LDA topic modeling from Stanford University.
- Pulling Out the Stops – An article by Alexandra Schofield, Måns Magnusson, and David Mimno questioning the utility of highly customized or comprehensive stop lists.
Journal of Digital Humanities’s Special Issue – Special issue of JDH on topic modeling in the humanities, published in 2012.
- Topic Modeling: A Basic Introduction – Introductory article by Megan R. Brett from JDH’s special issue explaining the basic concepts of topic modeling.
- Words Alone – Article on Latent Dirichlet Allocation’s (LDA’s) limitations by Ben Schmidt.
MALLET – Website for downloading and installing MALLET, an open-source, Java-based Latent Dirichlet Allocation (LDA) package.
- Topic Modeling Tutorial – Tutorial by Shawn Graham, Scott Weingart, and Ian Milligan on setting up a command-line environment for using MALLET.
GUI Tools that use MALLET:
- Google’s Topic Modeling Tool – A graphical user interface for doing topic modeling.
- Serendip – A system for visualizing topic models by Eric Alexander and Joe Kohlmann of the University of Wisconsin-Madison.
- Bookworm – A customizable corpus trend visualization tool.
- TextPlot – A Python package by David McClure that produces a force-directed network of words in a text, the nodes of which are clustered using estimated kernel densities. See also: (Mental)Maps of Text, a blog post explaining the concept of TextPlot; TextPlot Refresh – Python 3, PyPi, CLI App, a blog post on downloading and setting up TextPlot; and Literary MRIs (or, tuning TextPlot), a blog post on TextPlot’s parameters.
- Vector Space Models for the Digital Humanities – A blog post by Ben Schmidt that links to his R package wrapping word2vec (word2vec is written in C).
- Women Writers Vector Toolkit – Interface for querying terms in word2vec models trained on the Women Writers Project corpus.
- Word Vectors for the Thoughtful Humanist series – A blog series published by the Women Writers Project on word embedding models.