Fix out of sync subtitles with Python!

This is an update of my old post from a couple of years ago.

[Edit – October 2015]: I created an in-browser version of subslider, called subslider.js. Just visit this page and follow the instructions, or read this blog post if you want to know more about it!

After using that script quite a few times, and loving it, I decided to give it a facelift and add the one feature that I’ve been wishing it had for all this time: the option to just tell it the timestamp of the first dialog without performing any math 🙂

Yeah, it’s simple math, but having to use base 60 means more brain CPU time wasted (and above all, it means more time separating me from my movie!).

I also moved the code to GitHub, so you can find it here: SubSlider. And this is the direct link to the python script, for the impatient.

The old way of specifying offsets using +/- has been replaced by a more argparse-standard system of flags. Also, the new feature I mentioned above can be used by running the script like this:

python subslider.py -s 1:23,450 MySubFile.srt

assuming your subtitles file is called MySubFile.srt and assuming that the first dialog in the movie takes place at 1:23,450. This time, there’s an “interactive” dialog that asks you to choose the first line among the first 10 lines in the .srt file. I added it because sometimes you get the equivalent of opening titles in the .srt, and that doesn’t help when you’re synchronizing.
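To give an idea of the base-60 math the -s flag saves you from, here is a minimal sketch of the arithmetic. It is not the code in SubSlider: the helper names parse_stamp and offset_for_first_dialog are made up for this example, the 00:00:58,200 value is invented, and the sketch assumes timestamps always carry exactly three millisecond digits after the comma.

```python
from datetime import timedelta


def parse_stamp(stamp):
    """Turn '1:23,450' or '00:01:23,450' into a timedelta.

    Hypothetical helper: assumes [[HH:]MM:]SS,mmm with exactly three
    millisecond digits after the comma.
    """
    clock, millis = stamp.split(',')
    parts = [int(part) for part in clock.split(':')]
    # Left-pad with zeros so that [1, 23] becomes [0, 1, 23].
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=int(millis))


def offset_for_first_dialog(heard_at, first_block_start):
    """Offset to add to every subtitle so that the first dialog lines up."""
    return parse_stamp(heard_at) - parse_stamp(first_block_start)


# The first dialog is heard at 1:23,450, while the .srt file (made-up value)
# places it at 00:00:58,200: the subs have to be delayed by the difference.
offset = offset_for_first_dialog('1:23,450', '00:00:58,200')
print(offset.total_seconds())
```

A negative result means the subtitles have to be moved earlier rather than delayed.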

If you want to get a different number of lines, you can simply change the LINES_TO_SHOW variable at line 43 to whatever number you prefer.

As always, feel free to contribute 🙂


Create a diff for i18n strings.xml files to manage localization on Android

Keeping all your strings.xml files synchronized in Android projects can be painful, as Eclipse doesn’t tell you which strings have no localized version in which language. Android is perfectly happy with it as well: it just uses the default (usually English) string in the app, much to the joy of your non-English users.

I came up with a simple Python script that just scans your res/values-** folders for strings.xml files and, using your default res/values/strings.xml as reference, outputs a list of missing strings for each file, along with the original value set for the key.

So, if for instance your res/values/strings.xml is this:

<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="app_name">My App</string>
    <string name="title_activity_main">My Activity</string>
    <string name="hello_world">Hello, World!</string>
</resources>

and your, say, res/values-it/strings.xml is this:

<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="app_name">La mia App</string>
    <string name="title_activity_main">La mia Activity</string>
</resources>

and your… res/values-fr/strings.xml? is this:

<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="app_name">Mon App</string>
    <string name="hello_world">Bonjour, Monde!</string>
</resources>

the script would output:

Missing in /home/whatever/wherever/.../App/res/values-it/strings.xml:
<string name="hello_world">Hello, World!</string>

Missing in /home/whatever/wherever/.../App/res/values-fr/strings.xml:
<string name="title_activity_main">My Activity</string>

So the idea is that you can cut and paste those lines in the appropriate files to translate them.

The script also outputs some warnings in case it finds duplicate keys in any of the strings.xml files.
Your localized strings.xml files may have more <string> items than the default, as no check is performed against that.
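The core of the comparison can be sketched in a few lines. This is only an illustration of the idea, not the i18n.py script itself: the function names are invented for the example, the XML is passed in as strings instead of being read from res/values-** folders, and values are printed as plain text (a value containing characters like & or < would need escaping before being pasted back).

```python
import xml.etree.ElementTree as ET
from collections import Counter

DEFAULT = """<resources>
    <string name="app_name">My App</string>
    <string name="title_activity_main">My Activity</string>
    <string name="hello_world">Hello, World!</string>
</resources>"""

ITALIAN = """<resources>
    <string name="app_name">La mia App</string>
    <string name="title_activity_main">La mia Activity</string>
</resources>"""


def string_entries(xml_text):
    """Return (name, value) pairs for every <string> element, in order."""
    root = ET.fromstring(xml_text)
    return [(el.get('name'), el.text or '') for el in root.iter('string')]


def missing_strings(default_xml, localized_xml):
    """Entries of the default file whose key is absent from the localized one."""
    translated = {name for name, _ in string_entries(localized_xml)}
    return [(name, value) for name, value in string_entries(default_xml)
            if name not in translated]


def duplicate_keys(xml_text):
    """Keys that appear more than once in a single strings.xml."""
    counts = Counter(name for name, _ in string_entries(xml_text))
    return sorted(name for name, seen in counts.items() if seen > 1)


# Print the missing entries in a form that can be pasted into the Italian file.
for name, value in missing_strings(DEFAULT, ITALIAN):
    print('<string name="%s">%s</string>' % (name, value))
```

Extra keys in the localized file are simply ignored, which matches the behaviour described above.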

I put the script in a folder within my Android projects that is simply ignored by Android (I usually call it not_in_apk or something like that), so if you put it elsewhere remember to change the path at line 23

path_to_default = '../res/values/strings.xml'

to the path to your default strings.xml file (absolute or relative, it should work anyway).

I didn’t do much testing, so it may not work for you… Worst thing that can happen is.. it doesn’t work 🙂
It won’t mess with your files, I promise you that.

Here’s the script! Run it with python i18n.py.

Last note: this script only takes strings.xml files into account, so you should run Android Lint to check for strings to be translated in other XML files (stringarrays.xml and other files).

Fix subtitles offset with python!

[UPDATE – May 25, 2014] I revamped this script, moved it to GitHub, and wrote a new post about it!
[UPDATE – May 19, 2013] Script updated to support Python 3!

One of the most common problems with subtitle files, especially with TV series subtitles, is that they often start all too late because you have a version of the video file containing opening titles (or ‘previously on MyFavoriteSeries’ sequences) and the subtitles don’t account for them, or the other way around.

Of course, once you’ve fixed this offset the subtitles are fine, as the movie is played at the same rate in all versions.

My beloved XBMC has a function to sync subtitles, but it’s more of a fine-tuning thing, you can’t specify a very large offset (last time I checked) and it takes some time to actually reload the subtitles and show you the results.

I developed a small script in python to do just that, as I thought that it would have been quicker to write it than to look for it (and it was… at least the quick&dirty version :D). To use it, just open the subtitles with any text editor you like, look for the first dialog and take note of when that dialog takes place in the movie: your offset is the difference between the time in the movie and the one you found in the file. So if the .srt file states that Renly Baratheon says “Do you swear it?” at 00:02:08,883 but in the .avi file it’s actually at roughly 00:03:43,500, your offset is 3:43,5 - 2:08,883 = 94,617 = 1:34,617. Then, you run the script calling

python subslider.py MySubs.srt offset

and your new subs are in MySubs_offset.srt. That’s it!

You can specify positive offsets (e.g. +15) for when subtitles should be delayed, or negative offsets (e.g. -30) in case it’s the movie that should be delayed (and the subs shown earlier).

Offsets can be specified both with decimal notation (as in +94,617, subs delayed by 94.617 seconds) and with time notation (as in -5:07,324, video delayed by 5 minutes 7 seconds 324 milliseconds). Time notation follows the one used in .srt files, so you get a comma as decimal separator.
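The two notations are just different spellings of the same number of milliseconds. The snippet below is a small sketch of that conversion, using the Renly Baratheon example from above. The function names are invented for the illustration and are not part of subslider.py, and the sketch assumes full .srt timestamps with three millisecond digits.

```python
def stamp_to_ms(stamp):
    """Convert an .srt timestamp 'HH:MM:SS,mmm' to integer milliseconds."""
    clock, millis = stamp.split(',')
    hours, minutes, seconds = (int(part) for part in clock.split(':'))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def decimal_notation(ms):
    """Signed offset as seconds with a comma separator, e.g. '+94,617'."""
    sign = '-' if ms < 0 else '+'
    seconds, millis = divmod(abs(ms), 1000)
    return '%s%d,%03d' % (sign, seconds, millis)


def time_notation(ms):
    """Signed offset as minutes:seconds,millis, e.g. '+1:34,617'."""
    sign = '-' if ms < 0 else '+'
    minutes, rest = divmod(abs(ms), 60 * 1000)
    seconds, millis = divmod(rest, 1000)
    return '%s%d:%02d,%03d' % (sign, minutes, seconds, millis)


# Time in the movie minus time in the .srt file.
offset_ms = stamp_to_ms('00:03:43,500') - stamp_to_ms('00:02:08,883')
print(decimal_notation(offset_ms))
print(time_notation(offset_ms))
```

Working in integer milliseconds keeps the sign handling in one place and avoids floating point rounding.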

Here it is, you can save it to a file named subslider.py and run it with python 2.7 ([Update – May 19, 2013] or python 3!).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# SubSlider - a simple script to apply offsets to subtitles
#
# Copyright May 2nd 2012 - MB <https://somethingididnotknow.wordpress.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>
from __future__ import print_function
from datetime import timedelta, datetime
import os
import re
import sys

class SubSlider:
    """A simple script to apply offsets to subtitles.

    Subtitles can be delayed by specifying a positive offset (e.g. +12 or simply 12), or video can be delayed by specifying a negative offset (e.g. -12)"""

    def __init__(self, argv):
        if len(argv) < 2:
            self.usage()
        else:
            self.first_valid = 0
            self.parse_args(argv)
            self.parse_subs()
            self.fix_file()
            os.remove(self.output_temp)
            print('Success! Offset subs have been written to %s' % os.path.abspath(self.output_subs))

    def usage(self):
        print("""usage: subslider.py [-h] subs_file offset

Applies an offset to a subtitles file

positional arguments:
  subs_file             The input subtitles file, the one to which the offset
                        is to be applied
  offset                The offset to be applied to the input subtitles file.
                        Format is [+/-][MM:]SS[,sss] like +1:23,456 (new subs
                        will be displayed with a delay of 1 minute, 23 seconds,
                        456 milliseconds) or -100 (subs 100 seconds earlier)
                        or +12,43 (subs delayed by 12 seconds, 43
                        milliseconds)""")
        sys.exit(1)


    def parse_args(self, args):
        error = None
        if not os.path.isfile(args[0]):
            print('%s does not exist' % args[0])
            error = True
        else:
            self.input_subs = args[0]
            self.output_subs = '%s_offset.srt' % os.path.splitext(self.input_subs)[0]
            self.output_temp = '%s_temp.srt' % os.path.splitext(self.input_subs)[0]
        offset_ok = re.match(r'[+-]?(\d{1,2}:)?\d+(,\d{1,3})?$', args[1])
        if not offset_ok:
            print('%s is not a valid offset, format is [+/-][MM:]SS[,sss], see help dialog for some examples' % args[1])
            error = True
        else:
            offset = re.search(r'([+-])?((\d{1,2}):)?(\d+)(,(\d{1,3}))?', args[1])
            self.direction, self.minutes, self.seconds, self.millis = (offset.group(1), offset.group(3), offset.group(4), offset.group(6))
        if error:
            self.usage()

    def parse_subs(self):
        with open(self.input_subs, 'r') as input:
            with open(self.output_temp, 'w') as output:
                nsafe = lambda s: int(s) if s else 0 
                block = 0
                date_zero = datetime.strptime('00/1/1','%y/%m/%d')
                for line in input:
                    parsed = re.search(r'(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})', line)
                    if parsed:
                        block += 1
                        start, end = (self.parse_time(parsed.group(1)), self.parse_time(parsed.group(2)))
                        offset = timedelta(minutes=nsafe(self.minutes), seconds=nsafe(self.seconds), microseconds=nsafe(self.millis) * 1000)
                        if '-' == self.direction:
                            start -= offset
                            end -= offset
                        else:
                            start += offset
                            end += offset
                        offset_start, offset_end = (self.format_time(start), self.format_time(end))
                        if not self.first_valid:
                            if end > date_zero:
                                self.first_valid = block
                                if start < date_zero:
                                    offset_start = '00:00:00,000'
                        output.write('%s --> %s\n' % (offset_start, offset_end))
                    else:
                        output.write(line)

    def fix_file(self):
        with open(self.output_temp, 'r') as input:
            with open(self.output_subs, 'w') as output:
                start_output = False
                for line in input:
                    if re.match(r'\d+$', line.strip()):
                        block_num = int(line.strip())
                        if block_num >= self.first_valid:
                            if not start_output:
                                start_output = True
                            output.write('%d\r\n' % (block_num - self.first_valid + 1))
                    elif start_output:
                        output.write(line)

    def format_time(self, value):
        formatted = datetime.strftime(value, '%H:%M:%S,%f')
        return formatted[:-3]

    def parse_time(self, time):
        parsed = datetime.strptime(time, '%H:%M:%S,%f')
        return parsed.replace(year=2000)

if __name__ == '__main__':
    SubSlider(sys.argv[1:])

As always, the same script is also on pastebin.

Whenever applying the offset moves some dialogs before 0:00:00,000, I decided to drop them altogether: the output starts with the first dialog that ends after time 0, and that dialog is made to start at time 0 if its start time is negative.
The renumbering of dialogs (see fix_file) is something that is not needed, at least by VLC (which I used to test the script). You can have dialogs starting at, say, 42 and VLC is fine with that.
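The drop, clamp and renumber policy can be shown on plain data. This is a simplified sketch rather than the code path used in the script: shift_blocks is a made-up name, blocks are represented as (start, end, text) tuples in milliseconds, and every block ending at or before time 0 is dropped, which gives the same result as the script when the blocks are in chronological order.

```python
def shift_blocks(blocks, offset_ms):
    """Apply offset_ms to (start_ms, end_ms, text) blocks.

    Blocks that end at or before time 0 are dropped, a block that straddles
    time 0 is clamped to start at 0, and the survivors are renumbered from 1.
    """
    kept = []
    for start, end, text in blocks:
        start, end = start + offset_ms, end + offset_ms
        if end <= 0:
            continue  # the whole dialog falls before the start of the video
        kept.append((max(start, 0), end, text))
    return [(number, start, end, text)
            for number, (start, end, text) in enumerate(kept, 1)]


blocks = [(1000, 2000, 'a'), (4000, 6000, 'b'), (7000, 8000, 'c')]
# Moving everything 5 seconds earlier drops 'a' and clamps the start of 'b'.
for block in shift_blocks(blocks, -5000):
    print(block)
```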

I was a little disappointed with the datetime.strptime function, in that it has no built-in support for milliseconds (only microseconds through %f, and even that only since Python 2.6!). The whole date/time/datetime system is not as pythonic as it seems at first sight, so I had to do a couple of ugly little things (as in parse_time and format_time).
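In isolation, the workaround looks like this. It is a short sketch of the same trick used in parse_time and format_time, with the timestamp taken from the example earlier in the post.

```python
from datetime import datetime

# %f accepts one to six digits and pads with zeros on the right, so the
# three millisecond digits of an .srt timestamp are read as microseconds.
parsed = datetime.strptime('00:02:08,883', '%H:%M:%S,%f')
print(parsed.microsecond)

# Going back, %f always prints six digits: cutting off the last three
# leaves the milliseconds that .srt files expect.
formatted = parsed.strftime('%H:%M:%S,%f')[:-3]
print(formatted)
```

Note that the slice truncates the microseconds rather than rounding them, which makes no difference here because the values only ever have millisecond precision.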

Some simple utility methods I use a lot in Java

Whenever I start a new project in Java I always find myself in need of some basic utility methods that are lacking in the standard library. I suppose that every programmer has his own little “bag of tricks” for every language (well, maybe not for python.. ❤ ).. and this is mine!

These are only some of the methods I use the most, of course, those that may be useful for many people 🙂

/*
 * This program is free software. It comes without any warranty, to
 * the extent permitted by applicable law. You can redistribute it
 * and/or modify it under the terms of the Do What The Fuck You Want
 * To Public License, Version 2, as published by Sam Hocevar. See
 * http://sam.zoy.org/wtfpl/COPYING for more details.
 */

import java.util.List;
import java.util.Set;

/**
 * Contains only a bunch of <code>static</code> utility methods.
 *
 * @author mb - somethingididnotknow.wordpress.com
 */
public final class Utilities {

    /**
     * Checks whether <strong>all</strong> the provided objects are
     * <code>null</code>.
     *
     * @param objects
     *            a number of objects of any kind that are to be checked
     *            against <code>null</code>
     * @return <code>true</code> in case <strong>all</strong> the argument
     *         objects are <code>null</code>, <code>false</code> otherwise.
     */
    public static boolean areAllNull(Object... objects) {
        for (Object o : objects) {
            if (o != null)
                return false;
        }
        return true;
    }

    /**
     * Checks whether <strong>any</strong> of the argument objects is
     * <code>null</code>.
     *
     * @param objects
     *            a number of objects of any kind that are to be checked
     *            against <code>null</code>.
     * @return <code>true</code> if at least one of the arguments is
     *         <code>null</code>.
     */
    public static boolean isAnyNull(Object... objects) {
        for (Object o : objects) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the two arguments are equal using a <em>null-safe</em>
     * comparison.
     *
     * In case only one of the two objects is <code>null</code>,
     * <code>false</code> is returned. In case both are not <code>null</code>
     * {@link Object#equals(Object)} is called on the first object using the
     * second as argument.
     *
     * @param first
     *            the first object to be checked.
     * @param second
     *            the second object to be checked.
     * @return <code>true</code> in case {@link Object#equals(Object)} returns
     *         <code>true</code> or the objects are both <code>null</code>,
     *         <code>false</code> otherwise.
     */
    public static boolean nsEquals(Object first, Object second) {
        if (first == null)
            return second == null;
        if (second == null)
            return false;
        return first.equals(second);
    }

    /**
     * Returns a String that is empty in case the argument <tt>string</tt> is
     * <code>null</code>, the unmodified <tt>string</tt> otherwise.
     *
     * @param string
     *            the string to be checked against <code>null</code>
     * @return the empty String if <tt>string</tt> is <code>null</code>, the
     *         argument <tt>string</tt> unmodified otherwise
     */
    public static String nonNull(final String string) {
        return string == null ? "" : string;
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument array. The separator between elements is the argument
     * <tt>toJoin</tt> string. The separator is only inserted between
     * elements: there's no separator before the first element or after the
     * last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param list
     *            a list of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the list
     */
    public static String join(String toJoin, Object[] list) {
        if (list == null || list.length == 0)
            return "";
        StringBuilder builder = new StringBuilder();
        String delimiter = nonNull(toJoin);
        int i = 0;
        for (; i < (list.length - 1); i++) {
            if (list[i] != null)
                builder.append(list[i]);
            builder.append(delimiter);
        }
        // a null last element is skipped, like the null elements before it
        if (list[i] != null)
            builder.append(list[i]);
        return builder.toString();
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument list. The separator between elements is the argument
     * <tt>toJoin</tt> string. The separator is only inserted between
     * elements: there's no separator before the first element or after the
     * last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param list
     *            a list of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the list
     */
    public static String join(String toJoin, List<?> list) {
        if (list == null || list.isEmpty())
            return "";
        StringBuilder builder = new StringBuilder();
        String delimiter = nonNull(toJoin);
        int i = 0;
        for (; i < list.size() - 1; i++) {
            if (list.get(i) != null)
                builder.append(list.get(i));
            builder.append(delimiter);
        }
        // a null last element is skipped, like the null elements before it
        if (list.get(i) != null)
            builder.append(list.get(i));
        return builder.toString();
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument set. The separator between elements is the argument
     * <tt>toJoin</tt> string. The separator is only inserted between
     * elements: there's no separator before the first element or after the
     * last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param set
     *            a set of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the set
     */
    public static String join(String toJoin, Set<?> set) {
        // same behavior as the other join methods: nothing to join
        if (set == null)
            return "";
        return join(toJoin, set.toArray());
    }

    /**
     * Checks whether the argument <tt>array</tt> contains at least a
     * <code>null</code> value.
     *
     * @param array
     *            the array to be checked.
     * @return <code>true</code> in case <em>at least</em> one of the values
     *         stored in the argument <tt>array</tt> is <code>null</code>, or
     *         in case the <tt>array</tt> itself is <code>null</code>.
     */
    public static boolean containsNull(Object[] array) {
        if (array == null)
            return true;
        for (Object o : array) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the argument <tt>list</tt> contains at least a
     * <code>null</code> value.
     *
     * @param list
     *            the list to be checked
     * @return <code>true</code> in case <em>at least</em> one of the values
     *         stored in the argument <tt>list</tt> is <code>null</code>, or
     *         in case the <tt>list</tt> itself is <code>null</code>
     */
    public static boolean containsNull(List<?> list) {
        if (list == null)
            return true;
        for (Object o : list) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the argument <tt>string</tt> is <code>null</code> or
     * empty. Please note that the <tt>string</tt> is
     * <strong>trimmed</strong>, so that a check on a string containing
     * white spaces only will always return <code>true</code>.
     *
     * @param string
     *            the string to be checked
     * @return <code>true</code> in case the argument <tt>string</tt> is
     *         <code>null</code>, empty ({@link String#length()} returns 0) or
     *         contains only white spaces (
     *         <tt>{@link String#trim()}.length()</tt> returns 0)
     */
    public static boolean isNullOrEmpty(String string) {
        return string == null || string.trim().length() == 0;
    }
}

I don’t know what’s wrong with the syntax highlighter… here’s the same class in pastebin!

Set fullscreen windows to go over gnome-shell’s (and Unity’s) top panel (top bar) with python/C++ and Glade

I know that there’s a function called fullscreen() that you can call on a window to set it fullscreen. I also know that my screen resolution is 1920×1080, so I supposed that explicitly setting the window’s height and width to the exact screen size would have forced mutter to put my window above everything else.

But…

No, it doesn’t. There’s a sneaky little option called Window Type that has to be set to Popup to make gnome’s top bar surrender and let your window dominate the screen. You can find it in the General tab in Glade (it’s the third field in Glade 3.10.0, I don’t know about the other versions).

Is this some kind of common knowledge that doesn’t have to be put on tutorials? I find the lack of documentation on GTK+3 disturbing.. 🙂 (I know that the API is quite well documented, but we’re still missing the plethora of examples you can find for GTK+2 on the web)

Retrieve the window object from widgets with pygobject / GTK+3

I’m trying to develop a simple app using PyGObject, pyGTK as we used to call it last time I used it.

I must say that I mildly hate Gnome 3 (and Unity, for that matter), and even though there’s certainly been an improvement in the overall quality of the API for GTK+3, the lack of documentation and good tutorials is very depressing. One must always fall back to C documentation and hope that a python function exists with a similar name, or struggle through the source code to look for it.

Rant aside, one thing I wanted to do is to draw on a DrawingArea using Cairo whenever the mouse cursor enters some region. Once I got Glade to signal the motion-notify-event correctly (you have to both set the signal on the ‘signals’ tab and check the option for ‘Pointer Motion’ in the ‘Common’ tab under ‘Events’) I realized that you must create a Cairo context every time the signal handler is called, so that you can draw on it.

Googling and googling (and after stumbling upon the best source for what I wanted to do… even though it’s in Japanese! :D) I found that the easiest solution is to call something like

def onMouseMove(self, widget, event):
  cr = widget.window.cairo_create()
  cr.arc(whatever...)

Unfortunately, an AttributeError: 'DrawingArea' object has no attribute 'window' came up, so after more googling I found out that the new fashion is to call get_property("propertyName") to retrieve its fields… which is not bad in and of itself (if a little too weakly-typed for my tastes), but it’s not that immediate if you were used to coding for GTK+2.

This is what I finally came up with:

def onMouseMove(self, widget, event):
  cr = widget.get_property('window').cairo_create()
  cr.arc(whatever...)

Now I just have to do the rest… 🙂