Guides: Text and Data Mining Databases: Getting Started

Which library resources permit text and data mining?

A small but growing number of library databases and subscriptions offer some form of access to corpora for text and data mining, usually through defined methods. This guide describes the collections licensed by Georgetown University Library that permit some form of text and data mining. Many publishers provide text and data mining access through an API that is separate from the search interface you use to read these titles day to day. Publishers may change this information at any time, so always review the provider's text and data mining documentation before beginning your project.

Note that many publishers do not permit text and data mining on their resources. These resources are listed under "Which library resources DON'T permit text and data mining?"

Some providers listed in this guide limit the content available for text and data mining to metadata only, to Georgetown-subscribed content only, to Open Access content only, or to a limited number of their products. Providers rarely permit traditional screen scraping.

A provider's terms of use may expressly forbid or limit text and data mining. If you are interested in a source not listed on this page, please contact the Electronic Resources & Serials Unit for more information.

Access Policies

Apart from the resources listed in this guide, which may be mined under the access policies their providers specify, electronic resources licensed by Georgetown University usually:

prohibit systematic (whether automated or manual) downloading, including screen scraping
prohibit the redistribution of content, including cleaned data
limit the number of articles or citations that can be downloaded at one time

Violating these terms can result in the provider shutting off access to the electronic resource for the entire campus, affecting your research and that of your fellow Georgetown University faculty and students around the world. Please review the Responsible Use of Electronic Resources policy for more information.

The Library actively advocates with its e-resource suppliers for text and data mining rights and support for Georgetown users.

If you are unsure of the text and data mining policies of a library resource, contact the Electronic Resources & Serials Unit in advance.

Which library resources DON'T permit text and data mining?

Many library database, e-book, and e-journal providers either do not permit text and data mining on their products or do not offer a text and data mining solution to Georgetown University. These include:

ProQuest (including current and historical newspapers)
Factiva
EBSCO (including current and historical newspapers)
LexisNexis (including Nexis Uni)
