Using CCExtractor, modified version
It's best to just type "ccextractor" and hit return to see the help. Yes, it's long, but all options are there. If we add or remove options, we update the help accordingly.
CCExtractor extracts digital 608 captions by default to an srt file. If you want analog 608 (SCTE20), or digital 708 captions, or SMI output, you must specify flags for each.
By default the TN will always use the "-noru" and "-utf8" flags. These flags disable roll up commands, and enable UTF8 encoding, respectively. Make sure you add these two flags to every run of ccextractor.
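One way to avoid forgetting the two flags is a small shell wrapper. This is only a sketch: the function name cc_run and the CCEXTRACTOR override variable are made up for illustration and are not part of ccextractor or the TN.

```shell
# Hypothetical wrapper: always pass -noru and -utf8, then whatever the caller adds.
# CCEXTRACTOR is an assumed override for the binary name; it defaults to "ccextractor".
cc_run() {
    "${CCEXTRACTOR:-ccextractor}" -noru -utf8 "$@"
}
```

With this in place, "cc_run -scte20 608a_608d_captions.mp4" runs the same command as the analog 608 example further down.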
ccextractor always creates the digital 608 output file at the very start of a run and fills it only if the source contains digital 608 captions. If there are none, the file is left empty. This is the way ccextractor works and there's no way to turn it off. Kind of stupid, but that's how the code was when we got here.
When extracting SCTE-20 caption data, the scte20 output file is sometimes created and sometimes not, depending on the extension of the source (".mov", ".mpg", ".mp4", etc.). This is something we need to fix. The rule here, as with digital 608, is to check whether there is data in the file.
When extracting digital 708 captions, the output file is created conditionally. If there is 708 data you get a 708 output file, else there will not be a 708 output file.
Generated caption files are always written to the same directory as the source file. The output file name is always the source file name, without its extension, plus ".srt", ".smi", etc.
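The naming rule can be sketched as a small shell function. The function name caption_outputs is hypothetical, the "_scte20" and "_svc1" suffixes are the ones that show up in the examples further down, and the sketch assumes the source file name has an extension and that the suffixed files use the same extension as the main one.

```shell
# Hypothetical helper: print the caption file names to look for, given a source file.
# $1 = source path, $2 = output extension (assumed default: srt).
caption_outputs() {
    src=$1
    ext=${2:-srt}
    base=${src%.*}                          # drop the last extension, keep the directory
    printf '%s\n' "$base.$ext"              # digital 608 file, always created
    printf '%s\n' "${base}_scte20.$ext"     # analog 608 (SCTE-20), when -scte20 is used
    printf '%s\n' "${base}_svc1.$ext"       # 708 service 1, only if 708 data exists
}
```

This only predicts names. Whether each file actually exists, or has data, still depends on the rules described above.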
If you want other output types, use "-out=sami" for example. This will generate ".smi" files.
The time stamps in the generated 608 analog, 608 digital, and 708 digital output files should all match exactly. We put a lot of time into making this work correctly.
When the TN does caption extraction during preprocessing, it always tries to extract 608 analog, 608 digital, and 708 digital from the source. It uses these flags unless told to do something special by the profile: "-out=sami -1 -svc 1 -delay -0 -utf8 -noru -scte20"
Extracting 608 Digital Captions
ccextractor -noru -utf8 608d_captions.mp4
This will create a "608d_captions.srt" file. It will contain the digital 608 captions.
Extracting 608 Analog Captions
ccextractor -noru -utf8 -scte20 608a_608d_captions.mp4
This will create a "608a_608d_captions.srt" and a "608a_608d_captions_scte20.srt" file. Both files will contain data. Like digital 608 caption output, the "608a_608d_captions_scte20.srt" file is created first and filled with analog 608.
Extracting 708 Digital Captions
ccextractor -noru -utf8 -svc all 708d_captions.mp4
This will create a "708d_captions.srt" and a "708d_captions_svc1.srt" file. The "708d_captions.srt" file will be empty and the "708d_captions_svc1.srt" file should contain the 708 captions.
Extracting 608 Analog, 608 Digital, and 708 Digital
ccextractor -noru -utf8 -scte20 -svc all 608a_608d_708d_captions.mp4
This will create a "608a_608d_708d_captions.srt", a "608a_608d_708d_captions_scte20.srt" and a "608a_608d_708d_captions_svc1.srt" file. All three output files will have data.
Extracting 608 Analog, 608 Digital, and 708 Digital From A File With No Captions
ccextractor -noru -utf8 -scte20 -svc all no_captions.mpg
This will create a "no_captions.srt" and a "no_captions_scte20.srt". These two files will be empty. There will be no 708 output file.
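Since empty and missing output files both mean "no captions of that type", it can help to check the results after a run. The sketch below assumes srt output, assumes an empty caption file is zero bytes long, and uses a made-up function name, caption_report.

```shell
# Hypothetical helper: for one source file, report the state of each expected srt file.
# Assumes a caption file with no captions has a size of zero bytes.
caption_report() {
    base=${1%.*}                            # source path without its last extension
    for f in "$base.srt" "${base}_scte20.srt" "${base}_svc1.srt"; do
        if [ -s "$f" ]; then
            echo "$f: has data"             # file exists and is not empty
        elif [ -e "$f" ]; then
            echo "$f: empty"                # created, but no captions written
        else
            echo "$f: not created"          # e.g. no 708 data in the source
        fi
    done
}
```

For the no-captions example above, "caption_report no_captions.mpg" should report the first two files as empty and the svc1 file as not created.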
Hello Mike,
I use ccextractor 0.88 and am trying to extract SCTE20 information from a TS file with CC608 + SCTE20 inside.
The command with the -scte20 option is not valid in version 0.88:
"
E:\CCextractor\ccextractor.0.88-windows.binaries>ccextractorwinfull.exe --scte20 -out=txt -o "C:\dump_CC_ccextract88_.txt" -lf -bom -utf8 --nofontcolor C:\dump_CC.ts
Error: Error: Parameter --scte20 not understood.
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
"
I tried with -scte20 and --scte20, same error.
Where can I find a version that accepts this parameter?
Also, without the --noscte20 parameter, the only CC608 file created contained extra characters in almost all lines. It seems to be a mix of 608 + scte20.