Fix subtitles offset with python!

[UPDATE – May 25, 2014] I revamped this script, moved it to GitHub, and wrote a new post about it!
[UPDATE – May 19, 2013] Script updated to support Python 3!

One of the most common problems with subtitle files, especially with TV series subtitles, is that they often start all too late because you have a version of the video file containing opening titles (or ‘previously on MyFavoriteSeries’ sequences) and the subtitles don’t account for them, or the other way around.

Of course, once you’ve fixed this offset the subtitles are fine, as the movie is played at the same rate in all versions.

My beloved XBMC has a function to sync subtitles, but it’s more of a fine-tuning thing, you can’t specify a very large offset (last time I checked) and it takes some time to actually reload the subtitles and show you the results.

I developed a small script in python to do just that, as I thought that it would have been quicker to write it than to look for it (and it was… at least the quick&dirty version :D). To use it, just open the subtitles with any text editor you like, look for the first dialog and take note of when that dialog takes place in the movie: your offset is the difference between the time in the movie and the one you found in the file. So if the .srt file states that Renly Baratheon says “Do you swear it?” at 00:02:08,883 but in the .avi file it’s actually at roughly 00:03:43,500, your offset is 3:43,5 - 2:08,883 = 94,617 = 1:34,617. Then, you run the script calling

python subslider.py MySubs.srt offset

and your new subs are in MySubs_offset.srt. That’s it!

You can specify positive offsets –like e.g. +15— for when subtitles should be delayed, or negative offsets –like e.g. -30— in case it’s the movie that should be delayed (and subs anticipated).

Offsets can be specified both with decimal notation (as in +94,617, subs delayed by 94.617 seconds) and with time notation (as in -5:07,324, video delayed by 5 minutes 7 seconds 324 milliseconds). Time notation follows the one used in .srt files, so you get a comma as decimal separator.

Here it is, you can save it to a file named subslider.py and run it with python 2.7 ([Update – May 19, 2013] or python 3!).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# SubSlider - a simple script to apply offsets to subtitles
#
# Copyright May 2nd 2012 - MB <https://somethingididnotknow.wordpress.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>
from __future__ import print_function
from datetime import timedelta, datetime
import os
import re
import sys

class SubSlider:
    """A simple script to apply offsets to subtitles.

    Subtitles can be delayed by specifying a positive offset (e.g. +12 or simply 12), or video can be delayed by specifying a negative offset (e.g. -12)"""

    def __init__(self, argv):
        if len(argv) < 2:
            self.usage()
        else:
            self.first_valid = 0
            self.parse_args(argv)
            self.parse_subs()
            self.fix_file()
            os.remove(self.output_temp)
            print('Success! Offset subs have been written to %s' % os.path.abspath(self.output_subs))

    def usage(self):
        print("""usage: subslider.py [-h] subs_file offset

Applies an offset to a subtitles file

positional arguments:
  subs_file             The input subtitles file, the one to which the offset
                        is to be applied
  offset                The offset to be applied to the input subtitles file.
                        Format is [+/-][MM:]SS[,sss] like +1:23,456 (new subs
                        will be displayed with a delay of 1 minute, 23 seconds,
                        456 milliseconds) or -100 (subs 100
                        seconds earlier) or +12,43 (subs delayed of 12 seconds
                        43 milliseconds)""")
        sys.exit(1)


    def parse_args(self, args):
        error = None
        if not os.path.isfile(args[0]):
            print('%s does not exist' % args[0])
            error = True
        else:
            self.input_subs = args[0]
            self.output_subs = '%s_offset.srt' % os.path.splitext(self.input_subs)[0]
            self.output_temp = '%s_temp.srt' % os.path.splitext(self.input_subs)[0]
        offset_ok = re.match('[\+\-]?(\d{1,2}\:)?\d+(\,\d{1,3})?$', args[1])
        if not offset_ok:
            print('%s is not a valid offset, format is [+/-][MM:]SS[,sss], see help dialog for some examples' % args[1])
            error = True
        else:
            offset = re.search('([\+\-])?((\d{1,2})\:)?(\d+)(\,(\d{1,3}))?', args[1])
            self.direction, self.minutes, self.seconds, self.millis = (offset.group(1), offset.group(3), offset.group(4), offset.group(6))
        if error:
            self.usage()

    def parse_subs(self):
        with open(self.input_subs, 'r') as input:
            with open(self.output_temp, 'w') as output:
                nsafe = lambda s: int(s) if s else 0 
                block = 0
                date_zero = datetime.strptime('00/1/1','%y/%m/%d')
                for line in input:
                    parsed = re.search('(\d{2}:\d{2}:\d{2},\d{3}) \-\-> (\d{2}:\d{2}:\d{2},\d{3})', line)
                    if parsed:
                        block += 1
                        start, end = (self.parse_time(parsed.group(1)), self.parse_time(parsed.group(2)))
                        offset = timedelta(minutes=nsafe(self.minutes), seconds=nsafe(self.seconds), microseconds=nsafe(self.millis) * 1000)
                        if '-' == self.direction:
                            start -= offset
                            end -= offset
                        else:
                            start += offset
                            end += offset
                        offset_start, offset_end = (self.format_time(start), self.format_time(end))
                        if not self.first_valid:
                            if end > date_zero:
                                self.first_valid = block
                                if start < date_zero:
                                    offset_start = '00:00:00,000'
                        output.write('%s --> %s\n' % (offset_start, offset_end))
                    else:
                        output.write(line)

    def fix_file(self):
        with open(self.output_temp, 'r') as input:
            with open(self.output_subs, 'w') as output:
                start_output = False
                for line in input:
                    if re.match('\d+$', line.strip()):
                        block_num = int(line.strip())
                        if block_num >= self.first_valid:
                            if not start_output:
                                start_output = True
                            output.write('%d\r\n' % (block_num - self.first_valid + 1))
                    elif start_output:
                        output.write(line)

    def format_time(self, value):
        formatted = datetime.strftime(value, '%H:%M:%S,%f')
        return formatted[:-3]

    def parse_time(self, time):
        parsed = datetime.strptime(time, '%H:%M:%S,%f')
        return parsed.replace(year=2000)

if __name__ == '__main__':
    SubSlider(sys.argv[1:])

as always, the same script is also on pastebin.

Whenever applying the offset moves some dialogs before 0:00:00,000 I decided to drop them altogether, starting with the first dialog ending after time 0, making it start at time 0 if start is negative.
The renumbering of dialogs (see fix_file) is something that is not needed, at least by VLC (which I used to test the script). You can have dialogs starting at, say, 42 and VLC is fine with that.

I was a little disappointed with the datetime.strptime function, in that it has no built-in support for milliseconds (only microseconds, and even that only on python2.7+!). The whole date/time/datetime system is not as pythonic as it seems at first sight, so I had to do a couple of little ugly things (as in parse_time and format_time).

Advertisements

19 thoughts on “Fix subtitles offset with python!

  1. I ran the script in the following manner and I’m getting an error

    F:\Users\ABC\Downloads\tmp>c:\Python33\python.exe subslider.py Side.Effects.2013
    .1080p.BluRay.x264.YIFY.srt -500
    File “subslider.py”, line 54
    43 milliseconds)”””
    ^
    SyntaxError: invalid syntax

    Am I running it incorrectly?

    Thanks

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s