A small but growing number of library databases and subscriptions offer access to corpora for text and data mining, usually through methods defined by the provider. This guide describes the collections licensed by Georgetown University Library that permit some form of text and data mining. Many publishers provide this access through an API that is separate from the search interface used for everyday access to these titles. This information is subject to change by the publisher, so always review the provider's text and data mining documentation before beginning your project.
Many publishers do not permit text and data mining of their resources. Find out more about those resources.
Some providers listed in this guide limit the content available for text and data mining to metadata only, Georgetown-subscribed content only, Open Access content only, or a subset of their products. Providers rarely permit traditional screen scraping.
If you are interested in a source not listed on this page, please contact the Electronic Resources & Serials Unit for more information. Text and data mining may be expressly forbidden or limited according to the provider's terms of use.