Guides: Text and Data Mining Databases: Linguistic Data Consortium

Linguistic Data Consortium

Summary:

Georgetown University Library is a Linguistic Data Consortium (LDC) member and has access to linguistic corpora from 2003 to the present. To download corpora, users must register for an LDC account using their Georgetown email address. Download access is granted within two business days once the account is verified. See the FAQ for more information about creating a user account. After the account is approved, corpora can be downloaded from the LDC catalog.

For offline analysis, the Library also holds many of the licensed corpora on CD, DVD, or hard drive; these can be requested through HoyaSearch.

Quick Info:

Coverage	Content licensed by Georgetown University
Registration required?	Yes (Catalog/Download) / No (Physical media)
API available?	No
Publication restrictions?	Maybe - always check the dataset's record in the LDC Catalog for publication restrictions and citation requirements
Library permission required?	Yes (Catalog/Download - LDC account required) / No (Physical media)

Library Resources Covered:

Linguistic Data Consortium Corpus
Users must register for an account with the Linguistic Data Consortium (LDC) using their Georgetown email address. Download access is granted within two business days once the account is verified. See the FAQ for more information about creating a user account.