You can upload these transcript file types to Audio-Video:
- SubRip (.srt extension)
- .xml
- .txt
Support for .xml and .txt files is set up on a project-by-project basis. To request it, contact shanti-support@virginia.edu and tell them which format you use to encode your transcripts.
These formats assume you have a transcription of your file that includes timestamps – that is, start times and end times for the dialogue. If you don't have timestamped transcripts, you can paste the text of the transcript in the resource's description.
SubRip
Transcripts in .srt format are written in the following order:
- A number, which identifies the transcript in the sequence
- The start time for the transcript in hours:minutes:seconds,milliseconds, followed by -->, then the end time for the transcript
- The transcript text
- A blank line to mark the end of the transcript
Here's an example:
120
00:00:17,710 --> 00:00:19,820
I am speaking
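If you generate transcripts from your own data, the order described above can be sketched in Python. This is a minimal illustration, not part of Audio-Video: the function names `format_timestamp` and `format_entry` are hypothetical, and the sketch assumes your start and end times are stored as seconds.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as hours:minutes:seconds,milliseconds."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_entry(number: int, start: float, end: float, text: str) -> str:
    """Build one entry: number, time range, text, then a blank line."""
    time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{number}\n{time_range}\n{text}\n\n"


# Hypothetical usage: rebuild the example entry from times in seconds.
print(format_entry(120, 17.71, 19.82, "I am speaking"), end="")
```

Writing each part on its own line and ending every entry with a blank line keeps the entries separate when the file is read back.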
To mark a change in speakers, start the transcript text with >>, followed by the speaker's name and a colon. For example:
120
00:00:17,710 --> 00:00:19,820
>> JOHN: I want to upload a transcript.

121
00:00:20,710 --> 00:00:21,820
>> ALICE: I can help you with that.

122
00:00:22,910 --> 00:00:23,820
>> Do you have an Audio-Video account?

123
00:00:25,710 --> 00:00:26,820
>> JOHN: Yes, I do.
You can find out more with the external YouTube transcript guide.
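Before uploading, it can help to check that a .srt file follows the structure described in this section. The sketch below is one possible way to do that in Python; it is not how Audio-Video itself reads transcripts. The name `parse_srt` is hypothetical, and the sketch assumes that a speaker name is whatever sits between >> and the first colon.

```python
import re

# One entry: number, time range, then the transcript text.
ENTRY_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*)",
    re.DOTALL,
)
# Assumption: ">> NAME: text" marks a speaker change with a named speaker.
SPEAKER_RE = re.compile(r">>\s*([^:]+):\s*(.*)", re.DOTALL)


def parse_srt(content: str) -> list[dict]:
    """Split SubRip text into entries; skip blocks that do not match."""
    entries = []
    for block in re.split(r"\n\s*\n", content.strip()):
        match = ENTRY_RE.fullmatch(block.strip())
        if match is None:
            continue
        number, start, end, text = match.groups()
        speaker = None
        speaker_match = SPEAKER_RE.fullmatch(text)
        if speaker_match:
            speaker, text = speaker_match.groups()
        elif text.startswith(">>"):
            # Speaker change without a name: drop the marker only.
            text = text[2:].lstrip()
        entries.append(
            {"number": int(number), "start": start, "end": end,
             "speaker": speaker, "text": text}
        )
    return entries


sample = (
    "120\n00:00:17,710 --> 00:00:19,820\n>> JOHN: I want to upload a transcript.\n\n"
    "122\n00:00:22,910 --> 00:00:23,820\n>> Do you have an Audio-Video account?\n"
)
for entry in parse_srt(sample):
    print(entry["number"], entry["speaker"], entry["text"])
```

Because the speaker is taken from the text before the first colon, a line such as ">> Note: this is important" would be read as a speaker named "Note". Adjust the pattern if your transcripts use colons that way.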
Multilingual Transcripts
You can use multiple languages for SubRip transcripts in Audio-Video. Each language will have its own line in the public video interface. Viewers can then hide and display languages from transcripts or subtitles.
Enabling Multilingual Transcripts
To make this feature work for your video:
- Upload your transcript in Audio-Video
A “Language Tier” field will open
- Choose the languages in your transcript from the drop-down menu
- Click Connect
Audio-Video will process your transcript to let viewers choose languages
Formatting in InqScribe
You can make multilingual transcripts using InqScribe Transcription Software. To mark multilingual transcripts in InqScribe, separate the languages within each time-coded line with a /. Here's an example of a single time-coded line in English and Tibetan:
[00:00:00.14]??????????????? / ??????????????????????????????? / Now what is this we call "böd" (Tibet)?
If you want to mark who is speaking:
- keep the speaker within the / that separates the languages
- follow the speaker's name with a colon
Here's an example of the same single time-coded line in English and Tibetan with one speaker, Tsering Gyalpo:
[00:00:00.14]??????????????? / Tsering Gyalpo: ??????????????????????????????? / Now what is this we call "böd" (Tibet)?
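To check lines like the ones above before exporting, you could split them into their parts with a short script. The Python sketch below is only an illustration under stated assumptions: `parse_line` is a hypothetical name, the languages are assumed to be separated by a / with spaces around it, the speaker is assumed to end with a colon and a space, and placeholder text stands in for the non-English segments.

```python
import re

# Assumption: a line starts with a bracketed timecode such as [00:00:00.14].
LINE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2}\.\d{2})\](.*)")


def parse_line(line: str) -> dict:
    """Split one time-coded line into timecode, speaker, and language texts."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a time-coded line: {line!r}")
    timecode, body = match.groups()
    speaker = None
    texts = []
    for segment in re.split(r"\s+/\s+", body.strip()):
        name, sep, rest = segment.partition(": ")
        if sep and speaker is None:
            # First "Name: " found is treated as the speaker.
            speaker = name
            texts.append(rest)
        else:
            texts.append(segment)
    return {"timecode": timecode, "speaker": speaker, "texts": texts}


# Placeholder text replaces the first two language segments here.
line = ('[00:00:00.14]Text A / Tsering Gyalpo: Text B / '
        'Now what is this we call "böd" (Tibet)?')
print(parse_line(line))
```

A colon followed by a space inside the transcript text itself would be mistaken for a speaker name, so treat the result as a quick check and not a full validation.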
To export the multilingual transcript with speakers in InqScribe:
- From the menu, click File > Export > XML...
A window will open
- Check Export Speaker Names
- Enter a colon ( : ) for the speaker name delimiter
- Export the file
Your file is ready for Audio-Video