Making WordPress.org

Opened 8 years ago

Last modified 2 months ago

#1510 new enhancement

Add full transcripts to videos on WordPress.tv

Reported by: mor10 Owned by:
Milestone: Priority: normal
Component: WordPress.tv Keywords:
Cc:

Description

The videos on WordPress.tv provide a wealth of information, but this information is largely invisible on the web: to search engines, to the site's internal search, and to users who don't want to watch or listen to the videos. The lack of transcripts is also a major accessibility concern.

Adding full transcripts of videos would remedy this situation and provide significant benefits:

  • Videos become accessible to a far greater audience
  • Presented content becomes searchable from within wordpress.tv
  • Presented content becomes indexable for search engines, resulting in organic traffic and (probably) exposing the videos to a larger audience
  • Visitors can choose to read the transcript rather than watch the video (reading is often faster than watching)
  • Translation (even automatic) of videos is vastly simplified, with the side effect of potential multilingual captions
  • Videos with poor audio quality become easier to digest for visitors
  • Transcripts provide an excellent opportunity for future enhancement of the site including transcript-based navigation similar to what's found on Lynda.com

Implementation

I propose a panel directly below the video player that shows the first few paragraphs and provides a "reveal more" link or similar to reveal the whole transcript. Ideally the DOM should contain the entire transcript to make the content accessible for search.
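As a rough illustration of the panel idea, here is a minimal Python sketch that renders a transcript so the first few paragraphs are visible and the remainder sits inside a collapsed `<details>` element, which keeps the full text in the DOM. The function name `render_transcript_panel`, the `video-transcript` class, and the blank-line paragraph convention are assumptions made for this example, not anything WordPress.tv currently provides.

```python
from html import escape


def render_transcript_panel(transcript: str, preview_paragraphs: int = 3) -> str:
    """Return an HTML fragment that contains the full transcript.

    The first `preview_paragraphs` paragraphs are shown directly; the rest
    go inside a <details> element, so they are collapsed by default but
    still present in the markup.
    """
    # Assumption: paragraphs in the source text are separated by blank lines.
    paragraphs = [p.strip() for p in transcript.split("\n\n") if p.strip()]
    as_html = [f"<p>{escape(p)}</p>" for p in paragraphs]

    preview = as_html[:preview_paragraphs]
    rest = as_html[preview_paragraphs:]

    # "video-transcript" is a hypothetical class name used for illustration.
    parts = ['<section class="video-transcript">', *preview]
    if rest:
        parts.append("<details>")
        parts.append("<summary>Reveal more</summary>")
        parts.extend(rest)
        parts.append("</details>")
    parts.append("</section>")
    return "\n".join(parts)
```

Using the native `<details>` element avoids a JavaScript dependency for the "reveal more" behaviour; how search engines weigh collapsed content is outside what this sketch can guarantee.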

Practical Implementation

The obvious question is "where are these transcripts going to come from"? Three options:

  1. Edited text dumps from CART captioning during events
  2. Edited dumps from volunteer video captions
  3. Volunteer text transcripts

Change History (13)

This ticket was mentioned in Slack in #wptv by mor10. View the logs.


8 years ago

This ticket was mentioned in Slack in #accessibility by mor10. View the logs.


8 years ago

#3 @mor10
8 years ago

Somewhat related to #1455

#4 follow-up: @Otto42
8 years ago

Question: how is this different than the subtitles file? For those that have such, of course.

#5 in reply to: ↑ 4 ; follow-up: @mor10
8 years ago

Replying to Otto42:

Question: how is this different than the subtitles file? For those that have such, of course.

The proposal here is to make the full transcript part of the page rather than a linked element that requires parsing by the user. When you visit the page for a video, you should see the video player directly followed by the full transcript of the video as part of the post. In feeds you should see the transcript, and it should also be exposed in the API (if available).

The current subtitles file is provided as a .ttml file, which browsers do not read natively and which therefore requires parsing. It also contains a large volume of timecode data that needs to be stripped out for it to work as a transcript.

#6 follow-up: @arush
8 years ago

I think this is a good idea, especially until the accessibility of the video player is fixed. Is the code for WPTV open source at this point? Because at the moment, in order to stream a video, you've got to tab to one of the linked copies (low, high, etc.), copy the link, open a media player locally, insert the URL and play it that way. Or download the video to your local machine for playback. This has resulted in quite a large archive of WPTV videos on my local machine, which is neat, but sometimes transcripts would definitely be useful. I do, however, realize that in the worst-case scenario of manual transcription, this would involve a ton of effort.

#7 in reply to: ↑ 6 @mor10
8 years ago

Replying to arush:

I do, however, realize that in the worst-case scenario of manual transcription, this would involve a ton of effort.

As proposed, this would be optional, in the same way that the captions are optional. Hopefully over time we'll have more CART at events, and the added exposure of transcripts will encourage speakers to caption and provide transcripts for their own videos.

#8 in reply to: ↑ 5 ; follow-up: @Otto42
8 years ago

Replying to mor10:

The proposal here is to make the full transcript part of the page rather than a linked element that requires parsing by the user.

No, I mean, if we have subtitles, then is that a suitable source for said transcript to be produced from?

#9 in reply to: ↑ 8 @mor10
8 years ago

Replying to Otto42:

Replying to mor10:

No, I mean, if we have subtitles, then is that a suitable source for said transcript to be produced from?

Parsing ttml is dead easy, and a script can convert it to some other format trivially.

Yes. The answer is yes. Subtitles and transcripts are essentially the same, just formatted differently, so auto-generation from subtitles to transcript would simplify the process significantly.

That said, there should be an option for submitting transcripts as well. When events have CART captioning, a transcript will always be available in txt format.
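To make the subtitles-to-transcript step concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the subtitle files follow the usual TTML layout, with caption text inside `<p>` elements and timing carried in attributes such as `begin` and `end`; the function name `ttml_to_transcript` is made up for this example, and real WordPress.tv files may need extra handling.

```python
import xml.etree.ElementTree as ET


def ttml_to_transcript(ttml: str) -> str:
    """Extract caption text from a TTML document, one caption per line."""
    root = ET.fromstring(ttml)
    lines = []
    for element in root.iter():
        # Match <p> regardless of namespace: tags look like "{namespace}p".
        if element.tag.rsplit("}", 1)[-1] != "p":
            continue
        # itertext() also yields text from nested <span> children and text
        # following <br/>. Timing lives in attributes, so it never shows up.
        text = " ".join(" ".join(element.itertext()).split())
        if text:
            lines.append(text)
    return "\n".join(lines)
```

The output is one caption per line, so a reviewer would still need to merge lines into sentences and paragraphs before publishing.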

This ticket was mentioned in Slack in #wptv by mor10. View the logs.


8 years ago

This ticket was mentioned in Slack in #wptv by casiepa. View the logs.


4 years ago

#13 @casiepa
4 years ago

@mor10 There is a 4th option now

The obvious question is "where are these transcripts going to come from"? Three options:
  1. Edited text dumps from CART captioning during events
  2. Edited dumps from volunteer video captions
  3. Volunteer text transcripts

We (WPTV moderators) started testing YouTube and AWS Transcribe, so a 4th option is needed:

  4. Automated transcription after the event.

While categories 2 and 3 could be seen as 'almost ready to publish', categories 1 and 4 would need moderator review, or at least a clear indication that they have not been reviewed. It is still to be decided whether those become visible immediately.
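One possible way to express this review rule in code is sketched below in Python. The enum members mirror the four numbered options; the names `TranscriptSource`, `Transcript`, and `needs_unreviewed_notice` are hypothetical and only illustrate the idea of flagging unreviewed CART or automated transcripts.

```python
from dataclasses import dataclass
from enum import Enum


class TranscriptSource(Enum):
    CART_DUMP = 1           # option 1: text dump from CART captioning
    VOLUNTEER_CAPTIONS = 2  # option 2: dump from volunteer video captions
    VOLUNTEER_TEXT = 3      # option 3: volunteer text transcript
    AUTOMATED = 4           # option 4: automated transcription


# Categories 1 and 4 are the ones described as needing moderator review.
NEEDS_REVIEW = {TranscriptSource.CART_DUMP, TranscriptSource.AUTOMATED}


@dataclass
class Transcript:
    source: TranscriptSource
    text: str
    reviewed: bool = False

    def needs_unreviewed_notice(self) -> bool:
        """True if the page should mark this transcript as not yet reviewed."""
        return self.source in NEEDS_REVIEW and not self.reviewed
```

Whether an unreviewed transcript is shown with a notice or held back entirely is left open here, matching the undecided point above.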

As there should be only one transcript per video (in the original language of the video), it could simply be a txt file uploaded to the media library and attached to the post that holds the video. This would allow moderators to attach or detach it when a newer (e.g. reviewed) version is available.

Adding the transcript next to the video as a show/hide collapsible field, while keeping it fully available to search engines, should indeed help people reach the correct videos.
/cc @dd32

This ticket was mentioned in Slack in #wptv by casiepa. View the logs.


4 years ago
