img-DB project

After a few years of collecting code snippets for automatically renaming, resizing and compressing images, I decided to turn them into a proper project.
That’s how it all started, anyway…
As I’m writing this, img-DB is a full project and it’s (probably) ready to be used by other people.

*** Big disclaimer: There may be serious bugs…
If you want to use this library with your family photos, DON’T move them on import, so you don’t lose them. Better to copy (which is the default anyway and is a non-destructive operation).

img-DB is an open-source CLI app, written in Python, that helps you organize your photo (or, more generally, image) collection.
No more lock-in to a proprietary app like Apple Photos or Google Photos. All your photos are indexed inside a single HTML file that you can view in the browser.
… unless you imported more than 10k images, in which case you shouldn’t open the DB in your browser, because it may crash.

The most similar apps would be:

Of course I’m not trying to compete with Apple, Google or Adobe. They have hundreds of people working full-time on their products, and I’m just one guy with a family, a full-time job and a few hours to spare. All things considered, though, I think this project is in really good shape:

  • you can import images: the import copies (or moves) and renames the images into an archive folder and, most importantly, creates a DB with the extracted metadata
  • you can create folders of links from the archive, to another place on the disk, to have a different view of the images. For example, you can create folders with the images grouped by camera, model, by ISO or aperture or shutter speed, or by date, or by color similarity. Unlike all the other similar apps, you can use a regular file explorer to navigate the folders.
  • you can export HTML galleries that link to the archive, again to have a different view of the imported images. The template is flexible: with JavaScript you can make the gallery look super cool and also sort, filter or group the images in any way.
  • you can export the DB in many different formats, such as JSON, CSV or an HTML table, to explore different statistics in external programs
  • (as I’m writing this, the feature is not ready yet) you can mass rename/move files, so you can create folders with the images grouped by date, camera, model or anything else extracted from the images.
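The "folders of links" idea can be sketched in a few lines of plain Python. This is only an illustration, not img-DB’s actual code: the function name `link_by_key` and the record layout (a dict with a `path` plus metadata fields such as `iso`) are assumptions made up for the example.

```python
import os
from pathlib import Path


def link_by_key(records, key, out_dir):
    """Create one sub-folder per distinct value of `key`, filled with symlinks.

    `records` is a list of dicts holding a "path" entry plus metadata fields.
    This layout is a hypothetical stand-in for the real img-DB database.
    """
    out_dir = Path(out_dir)
    created = []
    for rec in records:
        # Images without the requested field end up in an "unknown" folder.
        value = str(rec.get(key, "unknown"))
        folder = out_dir / value
        folder.mkdir(parents=True, exist_ok=True)
        src = Path(rec["path"]).resolve()
        link = folder / src.name
        # The archive is never touched; only links are created.
        if not (link.is_symlink() or link.exists()):
            os.symlink(src, link)
        created.append(link)
    return created
```

Calling `link_by_key(records, "iso", "views/by-iso")` would give folders like `views/by-iso/100/` and `views/by-iso/400/` that any regular file explorer can browse. Note that creating symlinks on Windows may need extra privileges.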

My main goal was to replace Apple Photos and Google Photos for my family: to host the images myself and still have the features you’d expect, like viewing, sorting and searching (and much, much more).
It’s not ready yet, because the gallery templates don’t look too pretty and are not that functional yet, but I’m getting there.

I’m really, really excited about the idea of having a database of images inside an HTML file.
Initially it was more of a debug feature, so I could see what I was importing, but the lxml parser handles tens of thousands of IMG elements with ease, so it became a core feature.
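To give a feel for what "a database inside an HTML file" means, here is a small sketch that reads IMG elements back as records and filters them. The text above mentions lxml; this example uses the standard library’s `html.parser` instead, only so it runs without extra installs. The attribute names (`data-path`, `data-iso`, `data-date`) and the sample markup are invented for illustration and are probably not what img-DB really writes.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> element into a list of dicts."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


# Hypothetical DB fragment; the attribute names are assumptions.
SAMPLE_DB = """
<html><body>
<img id="a1f3" data-path="archive/a1f3.jpg" data-iso="100" data-date="2019-05-01">
<img id="b7c9" data-path="archive/b7c9.jpg" data-iso="800" data-date="2021-08-14">
</body></html>
"""

parser = ImgCollector()
parser.feed(SAMPLE_DB)

# Each image is now a plain dict, so filtering is an ordinary list comprehension.
high_iso = [img["id"] for img in parser.images if int(img["data-iso"]) >= 400]
print(high_iso)  # ['b7c9']
```

Since every record is just an element with attributes, the same file can be queried from Python and opened in a browser.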

As I’m writing this, I have imported most of my photo collection: 45k images in total, with a DB size of 400MB. Each thumb is 256px, and I extracted ALL metadata and ALL visual (perceptual) hashes.
I recently discovered there’s still a ton of duplicates in there actually, and I’m working on a smart way to eliminate them.
Because the UID is a cryptographic hash of the content, the DB can’t contain two byte-identical images. But a long time ago I was compressing the images with different apps like JPEGoptim, and I also kept some of the originals; so I have the same photo twice: one version is maybe 4MB (the original) and another is maybe 1MB (just a bit smaller and compressed).
Sometimes I have 3 or more copies of a photo, because I shared them on WhatsApp, Messenger or Telegram and still keep some of those copies. Because these apps resize and compress the originals, I have those versions too.
Most of the visual-hashes are identical, some are very close, so my plan is to keep the best quality version and remove the other copies.
This will take some time, I have to get the algorithm right, so I don’t lose my images :)
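One possible shape for that plan, as a rough sketch under stated assumptions and not the final algorithm: treat two images as the same photo when their visual hashes differ by only a few bits (Hamming distance), then keep the version with the most pixels, breaking ties by file size. The record fields (`id`, `vhash`, `width`, `height`, `bytes`), the 4-bit threshold and the "best quality" rule are all made up for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hash strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def pick_duplicates(records, max_distance=4):
    """Greedy grouping: a record joins the first group whose first member has
    a hash within `max_distance` bits. Returns (keep, remove) lists of ids."""
    groups = []
    for rec in records:
        for group in groups:
            if hamming(group[0]["vhash"], rec["vhash"]) <= max_distance:
                group.append(rec)
                break
        else:
            groups.append([rec])

    keep, remove = [], []
    for group in groups:
        # Assumed quality rule: most pixels first, then the largest file.
        best = max(group, key=lambda r: (r["width"] * r["height"], r["bytes"]))
        keep.append(best["id"])
        remove.extend(r["id"] for r in group if r is not best)
    return keep, remove


# Hypothetical records: an original, a recompressed copy, a messenger copy
# and an unrelated photo.
records = [
    {"id": "orig", "vhash": "ffee00aa11223344", "width": 4000, "height": 3000, "bytes": 4_000_000},
    {"id": "optim", "vhash": "ffee00aa11223344", "width": 4000, "height": 3000, "bytes": 1_000_000},
    {"id": "chat", "vhash": "ffee00aa11223345", "width": 1600, "height": 1200, "bytes": 200_000},
    {"id": "other", "vhash": "0123456789abcdef", "width": 4000, "height": 3000, "bytes": 3_500_000},
]

keep, remove = pick_duplicates(records)
print(keep)    # ['orig', 'other']
print(remove)  # ['optim', 'chat']
```

A dry run that only prints the `remove` list, without deleting anything, is the safe way to tune the threshold before touching real files.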

The project is public on GitHub:
The documentation is in the same repository:

@notes #python #image #project