Batch converting raw image files to JPEG using Photoshop and Python

Recently I had to batch convert a bunch of raw image files to JPEG. I was given a folder containing a lot of image files: some were in JPEG format, and others were in NEF format. NEF is a raw image format written by Nikon digital cameras. Windows 7's built-in file browser could not display the NEF images, but Photoshop could. The NEF files were large compared to the JPEG files. My task was to convert all NEF files to JPEG and discard the NEF files to save disk space.

I thought I'd write a Python script to do the job and googled around, but there seemed to be no module for the conversion. PIL is a popular image library for Python, but it doesn't support reading NEF files.

So I wondered whether I could automate Photoshop, or whether Photoshop had some kind of API. I googled around further and found that Photoshop has a built-in batch conversion feature. You can access it from the File > Scripts > Image Processor… menu in Photoshop. See Getting To Know Photoshop: Image Processor.

The images folder I had to work with had many subfolders. Image files in some subfolders were all JPEG, but some subfolders only had NEF files in them, and then there were subfolders which had both JPEG files and NEF files. I made a simplified test folder called pic-folder which has two subfolders and six image files in it. The file list of pic-folder:

pic-folder\10\A.jpg
pic-folder\10\A.NEF
pic-folder\10\B.jpg
pic-folder\10\B.NEF
pic-folder\20\C.NEF
pic-folder\20\D.NEF

The subfolder 10 represents a subfolder where somebody else has already converted the NEF files to JPEG but not discarded the NEF files. The subfolder 20 represents a subfolder where conversion is not done.
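If you want to try the Python scripts in this post without real photos, the layout above can be recreated with empty placeholder files. This is only a sketch: the names make_test_tree and list_files are my own, and the zero-byte files are good for testing the file-moving logic only, since Photoshop cannot open them.

```python
import os
import tempfile

# Relative paths of the six test files described above.
TEST_FILES = [
    os.path.join('10', 'A.jpg'),
    os.path.join('10', 'A.NEF'),
    os.path.join('10', 'B.jpg'),
    os.path.join('10', 'B.NEF'),
    os.path.join('20', 'C.NEF'),
    os.path.join('20', 'D.NEF'),
]

def make_test_tree(parent_dir, name='pic-folder'):
    """Create the test layout under parent_dir using empty files
    and return the path of the new top-level folder."""
    top = os.path.join(parent_dir, name)
    for rel in TEST_FILES:
        path = os.path.join(top, rel)
        folder = os.path.dirname(path)
        if not os.path.isdir(folder):
            os.makedirs(folder)
        # Zero-byte placeholder; enough for code that only looks at names.
        open(path, 'w').close()
    return top

def list_files(top):
    """Return the sorted paths of all files under top, relative to top."""
    found = []
    for p, dirs, files in os.walk(top):
        for fn in files:
            found.append(os.path.relpath(os.path.join(p, fn), top))
    return sorted(found)

if __name__ == '__main__':
    # Build the tree in a fresh temporary folder and print its contents.
    top = make_test_tree(tempfile.mkdtemp())
    for rel in list_files(top):
        print(rel)
```

Building the tree inside a temporary folder keeps the experiment away from the real pictures.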

Let's see what Photoshop can do to that folder. Open Image Processor from Photoshop. In the "1. Select the images to process" section, select pic-folder, and check "Include All sub-folders". In the "2. Select location to save processed images" section, select a different, empty folder, say pic-folder-dest, and check "Keep folder structure". In the "3. File Type" section, make sure the option "Save as JPEG" is checked, which it is by default. In the "4. Preferences" section, make sure that "Run Action" is unchecked. I don't know what the option "Include ICC Profile" does, but it's checked by default, so let's leave it checked. Now click "Run". Photoshop will open each image file in the subfolders of pic-folder and save it as a JPEG in the corresponding subfolder of pic-folder-dest. After Photoshop finishes its job, the file list of pic-folder-dest should look like this:

pic-folder-dest\10\A.jpg
pic-folder-dest\10\A_1.jpg
pic-folder-dest\10\B.jpg
pic-folder-dest\10\B_1.jpg
pic-folder-dest\20\C.jpg
pic-folder-dest\20\D.jpg

Photoshop processed all six files, so we got six JPEG files in pic-folder-dest. It didn't skip the JPEG files. It first processed pic-folder\10\A.jpg, which is already in JPEG format, and saved the result as pic-folder-dest\10\A.jpg. Then it processed pic-folder\10\A.NEF and saved the result as pic-folder-dest\10\A_1.jpg, because the name pic-folder-dest\10\A.jpg was already taken by then, and so on. That's a problem: images that were already converted end up with duplicate JPEGs (A.jpg and A_1.jpg).

How do we skip JPEG files, and also skip NEF files that already have corresponding JPEG files? Photoshop has a scripting feature. Maybe there is a way to customize Image Processor further by scripting, but I had no time to learn how to script Photoshop: the folder I was given had to be processed within a few days. So what I did was write and run a pre-processing Python script that moves all NEF files without corresponding JPEG files to a separate folder, then run Image Processor on that separate folder, and finish with a post-processing Python script that takes the output JPEG files and puts them into the original folder.

Part of the pre-processing script:

import shutil, os, errno

# http://stackoverflow.com/questions/273192/python-best-way-to-create-directory-if-it-doesnt-exist-for-file-write
def ensuredirs(path):
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            pass
        else:
            raise

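def move_file(src, dst, dryrun=False):
    """Move src to dst, creating dst's folder if needed.
    With dryrun=True, only print what would be done."""
    if dryrun:
        # Single-argument print() calls behave the same on Python 2 and 3.
        print('os.rename(A, B)')
        print('A: %s' % src)
        print('B: %s' % dst)
    else:
        ensuredirs(os.path.dirname(dst))
        os.rename(src, dst)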

JPG_EXTS = ['.JPG', '.JPEG', '.jpg', '.jpeg']

def is_nef(fn):
    return fn.endswith('.nef') or fn.endswith('.NEF')

def is_jpg(fn):
    root, ext = os.path.splitext(fn)
    return ext in JPG_EXTS

def take_nefs_out(src_dir, dest_dir, dryrun=False):
    """Take out NEF files in src_dir with no accompanying JPEG files, 
    create folder dest_dir, and move the files to dest_dir,
    preserving the folder structure.
    """
    assert os.path.isdir(src_dir)
    assert not os.path.exists(dest_dir)
    for p, dirs, files in os.walk(src_dir):
        for fn in files:
            if is_nef(fn):
                root, ext = os.path.splitext(fn)
                if any(root + jpg_ext in files for jpg_ext in JPG_EXTS):
                    continue
                fullpath = os.path.join(p, fn)
                newfullpath = fullpath.replace(src_dir, dest_dir, 1)
                move_file(fullpath, newfullpath, dryrun)
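One thing to watch: the test `root + jpg_ext in files` compares names exactly, so a mixed-case name such as A.Jpg would not count as the companion of A.NEF, even though Windows itself treats file names case-insensitively. A possible variant of the check is sketched below. The names jpeg_stems and nefs_without_jpeg are hypothetical helpers, not part of the script I actually ran.

```python
import os

JPG_EXTS_LOWER = ('.jpg', '.jpeg')

def jpeg_stems(files):
    """Return the lowercased base names of all JPEG files in a listing."""
    stems = set()
    for fn in files:
        root, ext = os.path.splitext(fn)
        if ext.lower() in JPG_EXTS_LOWER:
            stems.add(root.lower())
    return stems

def nefs_without_jpeg(files):
    """Return the NEF files in a listing that have no JPEG with the
    same base name, comparing names case-insensitively."""
    done = jpeg_stems(files)
    result = []
    for fn in files:
        root, ext = os.path.splitext(fn)
        if ext.lower() == '.nef' and root.lower() not in done:
            result.append(fn)
    return result
```

For example, nefs_without_jpeg(['A.Jpg', 'A.NEF', 'C.nef']) returns ['C.nef']. The `files` argument is meant to be the per-folder list that os.walk yields, so the function could replace the is_nef/any() pair inside the loop.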

Try a dry run on the test folder.

os.chdir(parent_folder_of_pic_folder)
take_nefs_out("pic-folder", "nef-folder", dryrun=True)
os.rename(A, B)
A: pic-folder\20\C.NEF
B: nef-folder\20\C.NEF
os.rename(A, B)
A: pic-folder\20\D.NEF
B: nef-folder\20\D.NEF

OK, it's selecting the right NEF files, and it seems they'll be moved to the right places. Now run take_nefs_out for real.

os.chdir(parent_folder_of_pic_folder)
take_nefs_out("pic-folder", "nef-folder")

After that, the file list for pic-folder should be:

pic-folder\10\A.jpg
pic-folder\10\A.NEF
pic-folder\10\B.jpg
pic-folder\10\B.NEF

and the file list for nef-folder should be:

nef-folder\20\C.NEF
nef-folder\20\D.NEF

Run Image Processor on nef-folder to create JPEG files in jpg-folder. Then the file list for jpg-folder should be:

jpg-folder\20\C.jpg
jpg-folder\20\D.jpg
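Before moving anything back, it may be worth confirming that Image Processor really produced one JPEG for every NEF file. The following is a minimal sketch of such a check. The function names stems_by_folder and missing_jpegs are made up for this illustration, and the check assumes that the output JPEG keeps the base name of its NEF file, as in the listing above.

```python
import os

def stems_by_folder(top, exts):
    """Map each folder (relative to top) to the set of lowercased base
    names of the files in it whose extension is in exts."""
    result = {}
    for p, dirs, files in os.walk(top):
        rel = os.path.relpath(p, top)
        for fn in files:
            root, ext = os.path.splitext(fn)
            if ext.lower() in exts:
                result.setdefault(rel, set()).add(root.lower())
    return result

def missing_jpegs(nef_dir, jpg_dir):
    """Return sorted (folder, base name) pairs for NEF files in nef_dir
    that have no JPEG of the same name in the matching folder of jpg_dir."""
    nefs = stems_by_folder(nef_dir, ('.nef',))
    jpgs = stems_by_folder(jpg_dir, ('.jpg', '.jpeg'))
    missing = []
    for rel, stems in nefs.items():
        # Set difference: NEF names with no JPEG counterpart.
        for stem in stems - jpgs.get(rel, set()):
            missing.append((rel, stem))
    return sorted(missing)
```

If missing_jpegs("nef-folder", "jpg-folder") returns an empty list, every NEF file has a converted counterpart. Base names are reported in lowercase because the comparison ignores case.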

Finally, we need to move the JPEG files in jpg-folder to pic-folder and remove the NEF files.

Part of the post-processing script:

def move_jpgs(src_dir, dest_dir, dryrun=False):
    assert all(os.path.isdir(d) for d in [dest_dir, src_dir])
    for p, dirs, files in os.walk(src_dir):
        for fn in files:
            assert is_jpg(fn)
            src_path = os.path.join(p, fn)
            dst_path = src_path.replace(src_dir, dest_dir, 1)
            move_file(src_path, dst_path, dryrun)

move_jpgs("jpg-folder", "pic-folder", dryrun=True)

After running move_jpgs("jpg-folder", "pic-folder"), the file list of pic-folder should be:

pic-folder\10\A.jpg
pic-folder\10\A.NEF
pic-folder\10\B.jpg
pic-folder\10\B.NEF
pic-folder\20\C.jpg
pic-folder\20\D.jpg

To delete NEF files:

def remove_file(fn, dryrun=False):
    assert os.path.isfile(fn)
    if dryrun:
        # Formatted single-argument print() works on Python 2 and 3.
        print('os.remove on: %s' % fn)
    else:
        os.remove(fn)

def remove_nefs(adir, dryrun=False):
    """Remove NEF files from folder adir."""
    assert os.path.isdir(adir)
    for p, dirs, files in os.walk(adir):
        for fn in files:
            if is_nef(fn):
                fullpath = os.path.join(p, fn)
                remove_file(fullpath, dryrun)

remove_nefs("pic-folder")

After that, the file list of pic-folder should be:

pic-folder\10\A.jpg
pic-folder\10\B.jpg
pic-folder\20\C.jpg
pic-folder\20\D.jpg
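Note that remove_nefs deletes every NEF file it finds, whether or not a JPEG exists for it. A more cautious variant might delete only the NEF files that have a JPEG next to them. The sketch below shows one way to do that. removable_nefs and remove_converted_nefs are hypothetical names, not functions from my script.

```python
import os

def removable_nefs(adir):
    """Return paths of NEF files under adir that have a JPEG with the
    same base name in the same folder (compared case-insensitively)."""
    result = []
    for p, dirs, files in os.walk(adir):
        # Collect the base names of the JPEG files in this folder.
        jpg_stems = set()
        for fn in files:
            root, ext = os.path.splitext(fn)
            if ext.lower() in ('.jpg', '.jpeg'):
                jpg_stems.add(root.lower())
        for fn in sorted(files):
            root, ext = os.path.splitext(fn)
            if ext.lower() == '.nef' and root.lower() in jpg_stems:
                result.append(os.path.join(p, fn))
    return result

def remove_converted_nefs(adir, dryrun=False):
    """Delete only the NEF files reported by removable_nefs.
    With dryrun=True, only print what would be deleted."""
    for path in removable_nefs(adir):
        if dryrun:
            print('os.remove on: %s' % path)
        else:
            os.remove(path)
```

Also keep in mind that in this walkthrough C.NEF and D.NEF were moved to nef-folder, so they are not under pic-folder and would have to be deleted separately.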