Find the most recent file/folder in a folder with Java 8

A simple snippet that shows how Java 8 lambdas can be really nice to replace operations that used to require a lot of boilerplate in earlier versions of Java.

If you want to find the most recent file or subfolder in a folder with Java 8, here’s what you need to do:

Path parentFolder = Paths.get("path", "to", "your", "file");

Optional<File> mostRecentFileOrFolder =
    Arrays
        .stream(parentFolder.toFile().listFiles())
        .max(
            (f1, f2) -> Long.compare(f1.lastModified(),
                f2.lastModified()));

if (mostRecentFolder.isPresent()) {
    File mostRecent = mostRecentFileOrFolder.get();
    System.out.println("most recent is " + mostRecent.getPath());
} else {
    System.out.println("folder is empty!");
}

The very nice thing is that you can take advantage of the flexibility of Java 8 streams to either make the operation parallel (just throw in a parallel() call after Arrays.stream()), or to filter results according to other criteria.

For example, if you’re only interested in one type of child elements (a file or a folder), you could…

// if you're only interested in files...
Optional<File> mostRecentFile =
    Arrays
        .stream(parentFolder.toFile().listFiles())
        .filter(f -> f.isFile())
        .max(
            (f1, f2) -> Long.compare(f1.lastModified(),
                f2.lastModified()));

// if you're interested in folders...
Optional<File> mostRecentFolder =
    Arrays
        .stream(parentFolder.toFile().listFiles())
        .filter(f -> f.isDirectory())
        .max(
            (f1, f2) -> Long.compare(f1.lastModified(),
                f2.lastModified()));
Advertisements

Fix subtitles offset with python!

[UPDATE – May 25, 2014] I revamped this script, moved it to GitHub, and wrote a new post about it!
[UPDATE – May 19, 2013] Script updated to support Python 3!

One of the most common problems with subtitle files, especially with TV series subtitles, is that they often start all too late because you have a version of the video file containing opening titles (or ‘previously on MyFavoriteSeries’ sequences) and the subtitles don’t account for them, or the other way around.

Of course, once you’ve fixed this offset the subtitles are fine, as the movie is played at the same rate in all versions.

My beloved XBMC has a function to sync subtitles, but it’s more of a fine-tuning thing, you can’t specify a very large offset (last time I checked) and it takes some time to actually reload the subtitles and show you the results.

I developed a small script in python to do just that, as I thought that it would have been quicker to write it than to look for it (and it was… at least the quick&dirty version :D). To use it, just open the subtitles with any text editor you like, look for the first dialog and take note of when that dialog takes place in the movie: your offset is the difference between the time in the movie and the one you found in the file. So if the .srt file states that Renly Baratheon says “Do you swear it?” at 00:02:08,883 but in the .avi file it’s actually at roughly 00:03:43,500, your offset is 3:43,5 - 2:08,883 = 94,617 = 1:34,617. Then, you run the script calling

python subslider.py MySubs.srt offset

and your new subs are in MySubs_offset.srt. That’s it!

You can specify positive offsets –like e.g. +15— for when subtitles should be delayed, or negative offsets –like e.g. -30— in case it’s the movie that should be delayed (and subs anticipated).

Offsets can be specified both with decimal notation (as in +94,617, subs delayed by 94.617 seconds) and with time notation (as in -5:07,324, video delayed by 5 minutes 7 seconds 324 milliseconds). Time notation follows the one used in .srt files, so you get a comma as decimal separator.

Here it is, you can save it to a file named subslider.py and run it with python 2.7 ([Update – May 19, 2013] or python 3!).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# SubSlider - a simple script to apply offsets to subtitles
#
# Copyright May 2nd 2012 - MB <https://somethingididnotknow.wordpress.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>
from __future__ import print_function
from datetime import timedelta, datetime
import os
import re
import sys

class SubSlider:
    """A simple script to apply offsets to subtitles.

    Subtitles can be delayed by specifying a positive offset (e.g. +12 or simply 12), or video can be delayed by specifying a negative offset (e.g. -12)"""

    def __init__(self, argv):
        if len(argv) < 2:
            self.usage()
        else:
            self.first_valid = 0
            self.parse_args(argv)
            self.parse_subs()
            self.fix_file()
            os.remove(self.output_temp)
            print('Success! Offset subs have been written to %s' % os.path.abspath(self.output_subs))

    def usage(self):
        print("""usage: subslider.py [-h] subs_file offset

Applies an offset to a subtitles file

positional arguments:
  subs_file             The input subtitles file, the one to which the offset
                        is to be applied
  offset                The offset to be applied to the input subtitles file.
                        Format is [+/-][MM:]SS[,sss] like +1:23,456 (new subs
                        will be displayed with a delay of 1 minute, 23 seconds,
                        456 milliseconds) or -100 (subs 100
                        seconds earlier) or +12,43 (subs delayed of 12 seconds
                        43 milliseconds)""")
        sys.exit(1)


    def parse_args(self, args):
        error = None
        if not os.path.isfile(args[0]):
            print('%s does not exist' % args[0])
            error = True
        else:
            self.input_subs = args[0]
            self.output_subs = '%s_offset.srt' % os.path.splitext(self.input_subs)[0]
            self.output_temp = '%s_temp.srt' % os.path.splitext(self.input_subs)[0]
        offset_ok = re.match('[\+\-]?(\d{1,2}\:)?\d+(\,\d{1,3})?$', args[1])
        if not offset_ok:
            print('%s is not a valid offset, format is [+/-][MM:]SS[,sss], see help dialog for some examples' % args[1])
            error = True
        else:
            offset = re.search('([\+\-])?((\d{1,2})\:)?(\d+)(\,(\d{1,3}))?', args[1])
            self.direction, self.minutes, self.seconds, self.millis = (offset.group(1), offset.group(3), offset.group(4), offset.group(6))
        if error:
            self.usage()

    def parse_subs(self):
        with open(self.input_subs, 'r') as input:
            with open(self.output_temp, 'w') as output:
                nsafe = lambda s: int(s) if s else 0 
                block = 0
                date_zero = datetime.strptime('00/1/1','%y/%m/%d')
                for line in input:
                    parsed = re.search('(\d{2}:\d{2}:\d{2},\d{3}) \-\-> (\d{2}:\d{2}:\d{2},\d{3})', line)
                    if parsed:
                        block += 1
                        start, end = (self.parse_time(parsed.group(1)), self.parse_time(parsed.group(2)))
                        offset = timedelta(minutes=nsafe(self.minutes), seconds=nsafe(self.seconds), microseconds=nsafe(self.millis) * 1000)
                        if '-' == self.direction:
                            start -= offset
                            end -= offset
                        else:
                            start += offset
                            end += offset
                        offset_start, offset_end = (self.format_time(start), self.format_time(end))
                        if not self.first_valid:
                            if end > date_zero:
                                self.first_valid = block
                                if start < date_zero:
                                    offset_start = '00:00:00,000'
                        output.write('%s --> %s\n' % (offset_start, offset_end))
                    else:
                        output.write(line)

    def fix_file(self):
        with open(self.output_temp, 'r') as input:
            with open(self.output_subs, 'w') as output:
                start_output = False
                for line in input:
                    if re.match('\d+$', line.strip()):
                        block_num = int(line.strip())
                        if block_num >= self.first_valid:
                            if not start_output:
                                start_output = True
                            output.write('%d\r\n' % (block_num - self.first_valid + 1))
                    elif start_output:
                        output.write(line)

    def format_time(self, value):
        formatted = datetime.strftime(value, '%H:%M:%S,%f')
        return formatted[:-3]

    def parse_time(self, time):
        parsed = datetime.strptime(time, '%H:%M:%S,%f')
        return parsed.replace(year=2000)

if __name__ == '__main__':
    SubSlider(sys.argv[1:])

as always, the same script is also on pastebin.

Whenever applying the offset moves some dialogs before 0:00:00,000 I decided to drop them altogether, starting with the first dialog ending after time 0, making it start at time 0 if start is negative.
The renumbering of dialogs (see fix_file) is something that is not needed, at least by VLC (which I used to test the script). You can have dialogs starting at, say, 42 and VLC is fine with that.

I was a little disappointed with the datetime.strptime function, in that it has no built-in support for milliseconds (only microseconds, and even that only on python2.7+!). The whole date/time/datetime system is not as pythonic as it seems at first sight, so I had to do a couple of little ugly things (as in parse_time and format_time).

Some simple utility methods I use a lot in Java

Whenever I start a new project in Java I always find myself in need of some basic utility methods that are lacking in the standard library. I suppose that every programmer has his own little “bag of tricks” for every language (well, maybe not for python.. ❤ ).. and this is mine!

These are only some of the methods I use the most, of course, those that may be useful for many people 🙂

/*
 * This program is free software. It comes without any warranty, to
 * the extent permitted by applicable law. You can redistribute it
 * and/or modify it under the terms of the Do What The Fuck You Want
 * To Public License, Version 2, as published by Sam Hocevar. See
 * http://sam.zoy.org/wtfpl/COPYING for more details.
 */

import java.util.List;
import java.util.Set;

/**
 * Contains only a bunch of <code>static</code> utility methods.
 *
 * @author mb - somethingididnotknow.wordpress.com
 */
public final class Utilities {

    /**
     * Checks whether <strong>all</strong> the provided objects are
     * <code>null</code>.
     *
     * @param objects
     *            a number of objects of any kind that are to be checked
     *            against <code>null</code>
     * @return <code>true</code> in case <strong>all</strong> the argument
     *         objects are <code>null</code>, <code>false</code> otherwise.
     */
    public static boolean areAllNull(Object... objects) {
        for (Object o : objects) {
            if (o != null)
                return false;
        }
        return true;
    }

    /**
     * Checks whether <strong>any</strong> of the argument objects is
     * <code>null</code>.
     *
     * @param objects
     *            a number of objects of any kind that are to be checked
     *            against <code>null</code>.
     * @return <code>true</code> if at least one of the arguments is
     *         <code>null</code>.
     */
    public static boolean isAnyNull(Object... objects) {
        for (Object o : objects) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the two arguments are equal using a <em>null-safe</em>
     * comparison.
     *
     * In case only one of the two objects is <code>null</code>,
     * <code>false</code> is returned. In case both are not <code>null</code>
     * {@link Object#equals(Object)} is called on the first object using the
     * second as argument.
     *
     * @param first
     *            the first object to be checked.
     * @param second
     *            the second object to be checked.
     * @return <code>true</code> in case {@link Object#equals(Object)} returns
     *         <code>true</code> or the objects are both <code>null</code>,
     *         <code>false</code> otherwise.
     */
    public static boolean nsEquals(Object first, Object second) {
        if (first == null)
            return second == null;
        if (second == null)
            return false;
        return first.equals(second);
    }

    /**
     * Returns a String that is empty in case the argument <tt>string</tt> is
     * <code>null</code>, the unmodified <tt>string</tt> otherwise.
     *
     * @param string
     *            the string to be checked against <code>null</code>
     * @return the empty String if <tt>string</tt> is <code>null</code>, the
     *         argument <tt>string</tt> unmodified otherwise
     */
    public static String nonNull(final String string) {
        return string == null ? "" : string;
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument array. The separator between elements is the argument
     * <tt>toJoin</tt> string. The separator is only inserted between
     * elements: there's no separator before the first element or after the
     * last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param list
     *            a list of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the list
     */
    public static String join(String toJoin, Object[] list) {
        if (list == null || list.length == 0)
            return "";
        StringBuilder builder = new StringBuilder();
        String delimiter = nonNull(toJoin);
        int i = 0;
        for (; i < (list.length - 1); i++) {
            if (list[i] != null)
                builder.append(list[i]);
            builder.append(delimiter);
        }
        builder.append(list[i]);
        return builder.toString();
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument list. The separator between elements is the string providing
     * this method. The separator is only inserted between elements: there's
     * no separator before the first element or after the last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param list
     *            a list of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the list
     */
    public static String join(String toJoin, List list) {
        if (list == null || list.isEmpty())
            return "";
        StringBuilder builder = new StringBuilder();
        String delimiter = nonNull(toJoin);
        int i = 0;
        for (; i < list.size() - 1; i++) {
            if (list.get(i) != null)
                builder.append(list.get(i));
            builder.append(delimiter);
        }
        builder.append(list.get(i));
        return builder.toString();
    }

    /**
     * An equivalent of Python's <code>str.join()</code> function on lists: it
     * returns a String which is the concatenation of the strings in the
     * argument list. The separator between elements is the string providing
     * this method. The separator is only inserted between elements: there's
     * no separator before the first element or after the last.
     *
     * @param toJoin
     *            the separator, if <code>null</code> the empty String is used
     * @param set
     *            a set of <code>Object</code>s on which
     *            {@link Object#toString()} will be called
     * @return the concatenation of String representations of the objects in
     *         the set
     */
    public static String join(String toJoin, Set set) {
        return join(toJoin, set.toArray());
    }
    /**
     * Checks whether the argument <tt>array</tt> contains at least a
     * <code>null</code> value.
     *
     * @param array
     *            the array to be checked.
     * @return <code>true</code> in case <em>at least</em> one of the values
     *         stored in the argument <tt>array</tt> is <code>null</code>, or
     *         in case the <tt>array</tt> itself is <code>null</code>.
     */
    public static boolean containsNull(Object[] array) {
        if (array == null)
            return true;
        for (Object o : array) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the argument <tt>list</tt> contains at least a
     * <code>null</code> value.
     *
     * @param list
     *            the list to be checked
     * @return <code>true</code> in case <em>at least</em> one of the values
     *         stored in the argument <tt>array</tt> is <code>null</code>, or
     *         in case the <tt>list</tt> itself is <code>null</code>
     */
    public static boolean containsNull(List list) {
        if (list == null)
            return true;
        for (Object o : list) {
            if (o == null)
                return true;
        }
        return false;
    }

    /**
     * Checks whether the argument <tt>string</tt> is <code>null</code> or
     * empty. Please note that the <tt>string</tt> is
     * <strong>trimmed</strong>, so that a check on a string containing
     * white spaces only will always return <code>true</code>.
     *
     * @param string
     *            the string to be checked
     * @return <code>true</code> in case the argument <tt>string</tt> is
     *         <code>null</code>, empty ({@link String#length()} returns 0) or
     *         contains only white spaces (
     *         <tt>{@link String#trim()}.length()</tt> returns 0)
     */
    public static boolean isNullOrEmpty(String string) {
        return string == null || string.trim().length() == 0;
    }
}

I don’t know what’s wrong with the syntax highlighter… here’s the same class in pastebin!