The srt2CSV script converts one or two SubRip (.srt) files into a single CSV file.
It is useful when you are collaborating with others on a translation project.
The first argument is the .srt file in the original language (such as English). The -t option names a second .srt file that has been partially or fully translated into another language. The -o option names the output CSV file; without it, the result is written to standard output (STDOUT).
Usage : srt2csv INPUT_SRT_FILENAME [-t TARGET_LANG_SRT_FILENAME] [-o OUTPUT_CSV_FILENAME]
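As a rough illustration of the conversion (a minimal sketch, not the script itself), the snippet below turns one SRT segment held in memory into a CSV row. It uses the same column order as the script (no, s_time, e_time, source, target, comment) and the same replacement of double quotes. The helper name segment_to_csv_row is hypothetical and does not appear in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one SRT segment (passed as a list of lines)
# into a CSV row with the columns no,s_time,e_time,source,target,comment.
sub segment_to_csv_row {
    my ($no, $time, $src, @rest) = @_;
    $no =~ s/\D//g;                            # keep only the digits
    my ($s_time, $e_time) = $time =~ /^(.*?) --> (.*?)$/;
    s/,/./ for ($s_time, $e_time);             # 00:00:03,000 -> 00:00:03.000
    my $dest = join "", grep { !/^#/ } @rest;  # drop comment lines
    s/"/''/g for ($src, $dest);                # same quote replacement as the script
    return qq{$no,$s_time,$e_time,"$src","$dest",};
}

print segment_to_csv_row(
    "1",
    "00:00:00,000 --> 00:00:03,000",
    'My talk is "Flapping Birds and Space Telescopes."',
    "#comment line",
), "\n";
```

The segment here has no translation, so the target column comes out empty. Any further non-comment lines passed after the source line would be concatenated into the target column.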
#!/usr/bin/perl -w
# srt2CSV.pl
#
# Copyright(C) Since 2009 Akira KAKINOHANA All Rights Reserved
# Author : Akira KAKINOHANA <kira@kirameister.net>
# Distributed at : http://softwares.kirameister.net/
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# For GNU General Public License, see <http://www.gnu.org/licenses/>.
use strict;
use warnings;
use utf8;
use Getopt::Long; # Getopt::Std was also loaded originally but never used
my $version = "0.06 - Apr 3 2010";
binmode(STDOUT, ":utf8");
my $arg_num = @ARGV;
if ($arg_num < 1){
die
"Usage: perl $0 INPUT_SRT_FILENAME [-t TARGET_LANG_SRT_FILENAME] [-o OUTPUT_CSV_FILENAME]
Use --help for available options\n";
}
my $opt_target = 0;
my $opt_output = 0;
my $opt_help = 0;
my $opt_version = 0;
GetOptions(
'target=s' => \$opt_target,
'output=s' => \$opt_output,
'help' => \$opt_help,
'version' => \$opt_version
);
if ($opt_version){
print "srt2csv.pl written by Akira K.\n";
print "Version : " .$version."\n";
exit;
}
if ($opt_help){
print
"NAME :
srt2csv -- script that lets you convert SubRip (.srt) formatted file(s) into a CSV file.
SYNOPSIS :
perl srt2csv.pl INPUT_SRT_FILE -o OUTPUT_CSV_FILE
perl srt2csv.pl INPUT_SRT_FILE -t INPUT_SRT_FILE_TARGET_LANG
perl srt2csv.pl INPUT_SRT_FILE
You need to specify the input .srt file(s) and, if necessary, the output
.csv file.
When the -o (--output) option is set, the following argument is used as
the output CSV file.
When the -o option is not specified, the result is written to the standard
output (STDOUT).
When the -t (--target) option is specified, the following argument is read
as another input .srt file.
This file holds the translated language, and must also be in .srt
(SubRip) format.
When the -t option is not specified, the first argument (filename) is read
as the single source file. The file may contain translated text below the
original lines (a .srt file without any editing is acceptable
too). The format restrictions are:
o The original information must be kept exactly as it was.
o The translated text can be inserted ONLY after the original text line.
o A translation can be several lines long.
o No blank line may be inserted between the translated lines.
o Between segments, at least one blank line must be inserted.
o A comment line can be inserted by starting the line with #.
The single input file must follow this format:
===
1
00:00:00,000 --> 00:00:03,000
My talk is \"Flapping Birds and Space Telescopes.\"
私の話は「折鶴と宇宙望遠鏡」です

2
00:00:03,000 --> 00:00:05,000
And you would think that should have nothing to do with one another,
どちらも関係ないもののように
思われるかもしれませんが
#comment line

3
00:00:05,000 --> 00:00:08,000
but I hope by the end of these 18 minutes,

4
(snip..)
===
When no option is given, the first argument is considered to be the input
.srt file, and the result is shown in the standard output (STDOUT).
AUTHOR :
Akira KAKINOHANA
";
exit;
}
## "my $x = ... if (...)" has undefined behavior in Perl, so assign explicitly.
my $csv_filename = $opt_output ? $opt_output : undef;
my $target_filename = $opt_target ? $opt_target : undef;
my $srt_filename = $ARGV[0];
open (DATA, "<:utf8", $srt_filename) || die "Cannot open the file $srt_filename : $!\n";
my $output_file = 0;
if (defined($csv_filename)){
$output_file = 1;
open (OUT, ">:utf8" ,$csv_filename) || die "Cannot open the file $csv_filename : $!\n";
}
## if --target option is set..
my %dest_table;
if (defined($target_filename)){
open (T_DATA, "<:utf8", $target_filename) || die "Cannot open the file $target_filename : $!\n";
while (my $no = <T_DATA>){
my $time = <T_DATA>;
my $dest = <T_DATA>;
my $temp = <T_DATA>;
chomp($no);
$no =~ s/\D//g;
chomp($dest);
$dest =~ s/"/''/g; ## in order to avoid the CSV mismatch..
$dest_table{"$no"} = $dest;
}
}
my $line_no = 0;
if ($output_file){
print OUT "no,s_time,e_time,source,target,comment\n";
}else{
print "no,s_time,e_time,source,target,comment\n";
}
LOOP: while (my $line = <DATA>){
$line_no++;
chomp($line);
next LOOP if ($line =~ /^$/);
$line =~ s/\D//g;
my $no = $line; # number.
if ($no !~ m/^\d+$/){
print STDERR "Format ERROR at (or before) the line $line_no : $no \n";
}
$line = <DATA>;
$line_no++;
chomp($line);
$line =~ /^(.*?) --> (.*?)$/;
my $s_time = $1;
my $e_time = $2;
$s_time =~ s/,/./; # start time.
$e_time =~ s/,/./; # end time.
$line = <DATA>;
$line_no++;
chomp($line);
$line =~ s/"/''/g; ## in order to avoid the CSV mismatch..
$line = "\"" . $line . "\"";
my $src = $line; # en.
my $dest = "";
while ($line = <DATA>){
$line_no++;
chomp($line);
last if ($line =~ /^$/); # a blank line closes the segment
next if ($line =~ m/^#/); # skip comment lines
$line =~ s/"/''/g; ## in order to avoid the CSV mismatch..
$dest .= $line;
}
## printing out.. (done after the loop so that a final segment
## without a trailing blank line is not lost)
if (defined($target_filename)){
$dest = defined($dest_table{$no}) ? $dest_table{$no} : "";
}
my $row = $no . "," . $s_time . "," . $e_time . "," . $src . ",\"" . $dest . "\",\n";
if ($output_file){
print OUT $row;
}else{
print $row;
}
}
__END__
If you would rather not set up your own environment for running this script, please visit the Service page.