
Text Mining

Creating a Transcribed Corpus

When creating a corpus, you may need to transcribe audio or video recordings into text for analysis. Transcription can be done manually or with tools that assist with and speed up the process. See below for guidance on transcribing for free using Word for Web.

With any transcription tool, it is important to understand that these tools are imperfect: they often struggle with accents and dialects, which can perpetuate linguistic bias. The person transcribing can also introduce their own biases into the data. See the articles below for more information on ensuring accurate and efficient transcription:

Transcription with Word for Web

Microsoft 365 offers 300 minutes of free audio transcription per month through Word for Web. The resulting transcripts can be edited and exported as Word documents. Word for Web currently supports transcription of .mp4, .m4a, .mp3, and .wav files.
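Because the free allowance is capped at 300 minutes per month and only four file types are accepted, it can help to plan uploads before transcribing a larger set of recordings. The Python sketch below is a hypothetical planning helper, not part of Word for Web. It assumes you already know each recording's length in minutes (it does not read audio metadata) and that the monthly allowance is consumed by audio length. The function name `plan_uploads` and the sample filenames are illustrative only.

```python
from pathlib import Path

# File types this guide lists as supported by Word for Web transcription.
SUPPORTED_EXTENSIONS = {".mp4", ".m4a", ".mp3", ".wav"}
# Free monthly transcription allowance described in this guide, in minutes.
MONTHLY_QUOTA_MINUTES = 300


def plan_uploads(files_with_minutes, used_minutes=0):
    """Hypothetical helper: sort (filename, length_in_minutes) pairs into
    files that fit in the remaining monthly allowance, files to defer,
    and files in unsupported formats.

    Files are considered in the order given. A file that does not fit is
    deferred, but later, shorter files may still be scheduled this month.
    Durations are supplied by the user; no audio files are opened.
    """
    remaining = MONTHLY_QUOTA_MINUTES - used_minutes
    this_month, later, unsupported = [], [], []
    for name, minutes in files_with_minutes:
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            unsupported.append(name)
        elif minutes <= remaining:
            this_month.append(name)
            remaining -= minutes
        else:
            later.append(name)
    return this_month, later, unsupported, remaining


# Example: 20 minutes already used this month.
files = [
    ("interview01.mp3", 45),
    ("interview02.wav", 120),
    ("focus_group.mov", 90),
    ("interview03.m4a", 150),
]
print(plan_uploads(files, used_minutes=20))
# → (['interview01.mp3', 'interview02.wav'], ['interview03.m4a'], ['focus_group.mov'], 115)
```

The helper uses a simple first-come, first-served order so that recordings are transcribed in the sequence you list them; the `.mov` file is flagged because it is not one of the formats listed above and would need to be converted first.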

Notes: 

  • UIS has assigned a free Microsoft license to all members of the Georgetown community. No sign-up should be necessary; however, if you encounter access issues with your account, contact UIS for support. A guide for accessing Microsoft 365 via the university license can be found here.

  • Transcription can only be accessed through Word for Web in Microsoft Edge or Google Chrome. It is not available in the Word desktop application or in other web browsers. Before starting this tutorial, make sure one of these browsers is installed and available to you.

Steps for Transcription:

  1. Go to georgetown1-my.sharepoint.com. Sign in with your Georgetown email address and complete all necessary SSO prompts.

  2. Create a new Word document by selecting “Add New” in the top left corner and then choosing “Word document”.

  3. Word for Web should automatically open the new document; however, you may need to select and open the file manually from the file list.

  4. In the top right, select the arrow next to “Dictate”, and then select “Transcribe”.

This will open the transcription window. Note: at the bottom of this window, you can track how many minutes of free transcription you have used so far this month. 

  5. Click “Upload audio” and select the file you wish to transcribe.

  6. Wait for the transcription to finish; this may take a few minutes.

  7. Review the transcript in the pane. To edit text or add speaker names, hover over a section of the transcript and click the pen icon that appears.

  8. Click “Add to document” to move the transcript into the Word document. You can choose to add only the text, or to include speaker names and timestamps.

  9. Once your transcript is in Word, you can edit and adjust it using the Word interface. You can also click “File” and then “Save As” to choose a OneDrive location for your transcript file, or to download a copy to edit locally in the Word desktop application.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.