Exporting Microsoft Stream Transcript

Microsoft has changed the interface on Stream slightly, so my code to export the Stream transcript needed an update. Since copy/paste doesn’t seem to work for everyone, the script is also available as a text file.

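// Run this in the browser's developer console on the page showing the video's transcript.
// Pull the transcript line objects from the AngularJS scope behind the transcript list.
var objTranscriptionLines = window.angular.element(window.document.querySelectorAll('.transcript-list')).scope().$ctrl.transcriptLines;
var strRunningText = "";
for(var i = 0; i < objTranscriptionLines.length; i++){
    // Skip any empty entries in the list
    if( objTranscriptionLines[i] ){
        var strLineText = objTranscriptionLines[i].eventData.text;
        strRunningText = strRunningText + "\n" + strLineText;
    }
}
// Print the whole transcript so it can be copied out of the console
console.log(strRunningText);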

4 comments

  1. Chuck A says:

    This worked well, thank you, Lisa!
    I’d never been in the Firefox Developer Console before, but your instructions on the prior page https://www.rushworth.us/lisa/?p=5681 were very clear.
    Using this, I will be able to download the transcripts for 20-30 videos in a pre-recorded series that is of interest to my team but not easily mass-searchable.
    Is there a way to capture the time entry for each line as well?
    Is there a way to capture the “copy the video URL at current time” link (like YouTube has)? Ideally, someone searching the full transcripts would be able to find a matching line, click a link tied to its timestamp, and open their browser to the Stream video at that exact point.
    Thanks!

    • Lisa says:

      It looks like there’s a way to do what you’re looking for. If you inspect the objects created from the transcript list content, each one has a lot of additional components beyond the eventData.text value I’m using to export the transcript. To create a jump-to-timecode URL in Microsoft Stream, you need the unique ID that’s assigned to the video. Fortunately, that should be available from the URL you are using to view the video. Strip off any existing info that’s being passed along (the stuff after the ? in the URL) and you’ve got a base URL for creating the timecode links. For each line of content, you need to pull in the timecode (t.startSeconds in my example code) and append it to the video’s URL.
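A minimal sketch of the link-building step described above, not the code from the linked post. The function names are hypothetical, the "st" query parameter name is an assumption about how Stream accepts a start time, and the sketch assumes startSeconds sits on each transcript line object next to eventData. Check all three against what you see in your own browser.

```javascript
// Hypothetical helper: strip everything after the "?" (or "#") from the
// viewing URL and append a start time in whole seconds.
// ASSUMPTION: "st" is the start-time parameter; pass another name if not.
function buildTimecodeLink(videoUrl, startSeconds, paramName = "st") {
    const baseUrl = videoUrl.split(/[?#]/)[0];
    const wholeSeconds = Math.floor(Number(startSeconds));
    return baseUrl + "?" + paramName + "=" + wholeSeconds;
}

// Hypothetical helper: pair each transcript line with its jump link.
// ASSUMPTION: each line looks like { startSeconds, eventData: { text } }.
function transcriptWithLinks(transcriptLines, videoUrl) {
    return transcriptLines
        .filter(line => line && line.eventData)
        .map(line => buildTimecodeLink(videoUrl, line.startSeconds) + "\t" + line.eventData.text);
}
```

In the developer console, something like transcriptWithLinks(objTranscriptionLines, window.location.href).join("\n") should give one tab-separated link and line of text per row, provided the assumptions above hold.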

      The question then becomes how you want to display it to the users. The timecodes seem to be in seconds (e.g. an hour in, my timecode is 3600), which isn’t the most user-friendly view. There’s a t.start attribute that lists the time as PT36M23.225 at 36 minutes and 23 seconds into the video … but that’s hardly more comprehensible. Personally, I’d consider building a “pretty time” function that turns an integer number of seconds into an hour:minute:second format. For a quick example, though, I am just going to show the user the number of seconds and let them click on a link that says 3618 to get to a line that’s an hour into the video 🙂 Alternatively, you could just create the link around the transcript text and leave the timecode as a value that’s only visible if you’re looking at the URL you are about to click.
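One way the “pretty time” idea could look, as a sketch only. The name prettyTime and the h:mm:ss layout are illustrative choices, and fractional seconds are simply dropped.

```javascript
// Hypothetical "pretty time" helper: turn a number of seconds into h:mm:ss.
// Fractions are dropped and negative input is treated as zero.
function prettyTime(totalSeconds) {
    const whole = Math.max(0, Math.floor(Number(totalSeconds)));
    const hours = Math.floor(whole / 3600);
    const minutes = Math.floor((whole % 3600) / 60);
    const seconds = whole % 60;
    // Zero-pad minutes and seconds to two digits
    const pad = n => String(n).padStart(2, "0");
    return hours + ":" + pad(minutes) + ":" + pad(seconds);
}

console.log(prettyTime(3618));     // 1:00:18
console.log(prettyTime(2183.225)); // 0:36:23
```

The link text could then show 1:00:18 while the URL itself still carries 3618.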

      Unfortunately, the comment editor doesn’t take code very well … https://www.rushworth.us/lisa/?p=8395 includes code examples!

  2. Graham Kinahan says:

    The code to extract Stream subtitles works great, but it’s unclear why it only extracts twenty lines or so, regardless of where the hour-long video is paused or playing. Is there a step to force it to extract all of the lines generated during 60 minutes of play? Thanks

    • Lisa says:

      Thanks for the info — I’d been testing with a short video, so hadn’t noticed the problem. I had started a meeting with myself, talked for two or three minutes, and ended the meeting so I’d have a good sample for the documentation I was writing. I’ve updated the code in this article with something that exported the entire transcript for a real meeting — about 45 minutes long — so hopefully this works for your hour-long meetings too!
