Northeastern University

College of Social Sciences and Humanities

The NULab for Digital Humanities and Computational Social Science

Digital Integration Teaching Initiative

Data Considerations for DITI Faculty Partnerships

Data Considerations

DITI team members come from multiple disciplines, so their familiarity with data outside their own fields may be limited. We recommend that faculty partners select data for workshops, projects, and presentations while planning the module in advance. Faculty partners should plan to provide or point to datasets that are relevant to the course, and to consult with DITI team members about what kinds of data different assignments or workshops require. In some cases, the DITI team may be able to assist with data gathering. If faculty are unable to provide data resources, DITI team members may be able to supply datasets; however, such data may not always be the most relevant for the course.

The DITI is guided by three principles: open data, analyzable data, and archivable data. If datasets are not stored in an open, standard, analyzable format, users may have difficulty accessing them in the short or long term. We strive to remove all unnecessary restrictions on the data that we create, use, and share in the classroom and on our GitHub Digital Showcase. Where possible, we ask faculty to follow these principles when selecting datasets, though we will do our best to work with files in any form. We prefer that faculty send us data that are analyzable and non-proprietary, which helps avoid problems such as major software updates making datasets inaccessible, files that cannot be read across operating systems or software, and proprietary data types becoming obsolete.

The DITI follows data and file formatting guidelines that support long-term storage and broad accessibility. If you have any questions about these data considerations, please contact a DITI member.

Below are lists outlining the data file formatting standards we recommend at the DITI:

Data File Formatting

Containers

TAR
GZIP
ZIP

Databases

CSV
XML

Tabular Data

CSV
TSV

Geospatial Vector Data

SHP
GeoJSON
KML
DBF
NetCDF

Geospatial Raster Data

GeoTIFF/TIFF
NetCDF
HDF-EOS

Moving Images

MOV
MPEG
AVI
MXF
MP4

Sounds

WAVE
AIFF
MP3
MXF

Still Images

TIFF
JPEG 2000
PDF
PNG
GIF
BMP

Text

XML
HTML
TXT
UTF-8

Web Archive

WARC

Below are the data values we observe at the DITI and more resources explaining the reasoning behind these guidelines:

Data Values & Resources

Data Values

Standardize all coded and null values within a dataset.
Use an explicit value for missing or no data, rather than an empty field.
For numeric fields, represent missing data with a specified extreme value (e.g., -9999), the IEEE floating point NaN value (Not a Number), or the database NULL. Be advised that NULL and NaN can cause problems, particularly with some older programs. For character fields, use NULL, “not applicable”, “n/a” or “none”.
If there are multiple reasons that cells might not contain values, include a separate code for each.
The null value(s) should be consistently applied within and among data files.
If data values are encoded, be sure to provide a definition in the metadata. We recommend using UTF-8 when possible.
Don’t include rows with summary statistics. It is best to put summary statistics, figures, analyses, and other summary content into a separate companion data file.
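
The guidance above on missing values can be sketched in Python using only the standard `csv` module. This is a minimal example under stated assumptions: the codes `-9999` and `n/a` are picked from the options listed above, and the sample data, column names, and the function `standardize_missing` are hypothetical.

```python
import csv
import io

MISSING_NUMERIC = "-9999"  # assumed explicit code for missing numeric data
MISSING_TEXT = "n/a"       # assumed explicit code for missing character data


def standardize_missing(rows, numeric_fields):
    """Yield copies of rows with empty fields replaced by explicit codes.

    rows is an iterable of dicts (for example, from csv.DictReader);
    numeric_fields is the set of column names that hold numbers.
    """
    for row in rows:
        cleaned = {}
        for field, value in row.items():
            if value is None or value.strip() == "":
                # Use one consistent code per field type instead of a blank.
                if field in numeric_fields:
                    cleaned[field] = MISSING_NUMERIC
                else:
                    cleaned[field] = MISSING_TEXT
            else:
                cleaned[field] = value.strip()
        yield cleaned


# Hypothetical sample data: the second record has two empty fields.
raw = "site,temperature,observer\nA,21.5,Lee\nB,,\n"
reader = csv.DictReader(io.StringIO(raw))
cleaned_rows = list(standardize_missing(reader, numeric_fields={"temperature"}))

# Write the cleaned records back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(cleaned_rows)
print(out.getvalue())
```

When saving to disk rather than to an in-memory buffer, opening the file with `open(path, "w", encoding="utf-8", newline="")` keeps the output in UTF-8. Whichever codes are chosen should also be documented in the dataset's metadata.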

Resources

“Data and File Formatting,” Axiom Data Science. 2017. https://www.axiomdatascience.com/best-practices/DataandFileFormatting.html
Tauberer, Joshua. “Analyzable Data in Open Formats (Principles 5 and 7).” Open Government Data: The Book. Second Edition, 2014. https://opengovdata.io/2014/analyzable-data-in-open-formats/


Contact

405 Lake Hall, Northeastern University
360 Huntington Avenue
Boston, MA 02115
