
Text and Data Mining Databases

Guidance on using library-licensed resources for text and data mining.

Linguistic Data Consortium


Georgetown University Library is a Linguistic Data Consortium (LDC) member and has access to linguistic corpora from 2003 to the present. Users must register for an LDC account using their Georgetown email address. Download access is granted within two business days of the account being verified. See this FAQ for more information about creating a user account. Once the account is approved, corpora can be downloaded from the LDC catalog.

For offline analysis, the Library also holds many of the licensed corpora on CD, DVD, or hard drive. These can be requested through HoyaSearch.

Quick Info:

Coverage: Content licensed by Georgetown University
Registration required? Yes (Catalog/Download) / No (Physical media)
API available?
Publication restrictions? Maybe. Always check the dataset's record in the LDC Catalog for publication restrictions and citation requirements.
Library permission required? Yes (Catalog/Download; LDC account required) / No (Physical media)

Library Resources Covered:

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.