libvips, and Sharp | Hemanth's Internet Home

Almost a year ago, I decided to tackle image preprocessing in my SSG (mettu, which I use to build this site). Since I was already using Vite and Python, I could pick between JavaScript and Python libraries. A quick (and haphazard) Google search narrowed it down to Sharp and Pillow (PIL). Sharp was chalked up to be really fast and light, so without debating the matter further, I picked Sharp and wrote my script. Now Vite had to call 2 different scripts in my SSG: an image preprocessor based on Sharp, and the build script, written in Python.

As it happens, I keep picking that script up and dropping it into other projects, including 2 different sites over the past year. The latest was the site of PES Innovation Lab, where I'm interning this summer. When a friend of mine there (check his site out) joked about using PIL because the lab's acronym is also PIL, I had a question: just why is Sharp so much faster than Pillow?

So I looked it up, and there it was: Sharp is a wrapper around the C library libvips, which is apparently one of the fastest and lightest image processing libraries out there.

libvips was initially developed in late 1989 for an EU-funded research project named VASARI, which aimed to digitise the collections of the National Gallery in London. The goal was a library that could perform operations on large images without loading the entire image into memory, given that it had to handle images as large as 700MiB on a system with less than 32MiB of RAM. Improvements were made over time, and I'll leave this wiki to explain that to you.

Now, I found a few benchmarks, including this paper, that evaluated how quick libvips actually is. In those benchmarks, it's almost 8x faster than ImageMagick and uses ~20x less memory too. I wanted to know how it works, and this is my understanding:

First, libvips builds a DAG (Directed Acyclic Graph) of the operations that have to be performed on the image. Then, instead of running each operation over the entire image, it breaks the image down into tiles and runs the operations on those tiles in a horizontally threaded fashion.

Horizontally threaded? Each thread takes one tile at a time and performs all the operations in the DAG on that particular tile before moving on to the next one. This comes with the added benefit of reducing the memory I/O bottleneck, because a tile is small enough to mostly stay in the CPU cache while the thread keeps accessing it.
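To make that concrete for myself, here is a toy sketch of the idea in plain Python. It is not libvips code and not how libvips is actually implemented: the names (`tiles`, `run_pipeline_on_tile`, `process`), the tiny tile size, and the two per-pixel operations are all made up for illustration, and the "DAG" is just a straight chain of operations. The point is only the loop order: the whole pipeline runs on one tile before the next tile is touched.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]      # rows of grayscale pixel values (0-255)
Op = Callable[[int], int]    # a hypothetical per-pixel operation
Tile = Tuple[int, int, int, int]  # left, top, width, height


def invert(p: int) -> int:
    return 255 - p


def brighten(p: int) -> int:
    return min(255, p + 40)


def tiles(width: int, height: int, size: int) -> Iterator[Tile]:
    """Yield tiles covering the image; edge tiles may be smaller."""
    for top in range(0, height, size):
        for left in range(0, width, size):
            yield left, top, min(size, width - left), min(size, height - top)


def run_pipeline_on_tile(src: Image, dst: Image, tile: Tile, pipeline: List[Op]) -> None:
    """Apply every operation in the chain to one tile before returning."""
    left, top, w, h = tile
    for y in range(top, top + h):
        for x in range(left, left + w):
            value = src[y][x]
            for op in pipeline:  # the whole chain runs while the pixel is "hot"
                value = op(value)
            dst[y][x] = value


def process(src: Image, pipeline: List[Op], tile_size: int = 2) -> Image:
    height, width = len(src), len(src[0])
    dst = [[0] * width for _ in range(height)]
    for tile in tiles(width, height, tile_size):
        run_pipeline_on_tile(src, dst, tile, pipeline)
    return dst


image = [[0, 10, 20], [30, 40, 50], [250, 251, 252]]
result = process(image, [invert, brighten])
```

The alternative would be to run `invert` over the whole image, store that intermediate image, and then run `brighten` over it. For big images, that intermediate copy is exactly the memory cost that the tile-by-tile approach avoids.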

In addition, libvips gains from having multiple processors available: it runs several of these threads at once, spread across the CPU cores, which speeds up the process. It also uses SIMD instructions to speed up these operations, utilising libraries like Highway.
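Here is a rough Python sketch of that threading structure, again purely illustrative and not libvips' actual scheduler. The function names and the strip-based splitting are my own assumptions, and I have left SIMD out entirely. Also, Python threads won't give a real speedup on CPU-bound work because of the GIL, so treat this as a picture of the shape only: a pool with one worker per core, where each worker runs the whole pipeline on its own piece of the image.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

Image = List[List[int]]  # rows of grayscale pixel values (0-255)


def pipeline(p: int) -> int:
    """A hypothetical fused pipeline: invert, then brighten by 40."""
    return min(255, (255 - p) + 40)


def process_strip(src: Image, dst: list, top: int, bottom: int) -> None:
    """Run the full pipeline on rows [top, bottom); each worker owns its rows."""
    for y in range(top, bottom):
        dst[y] = [pipeline(p) for p in src[y]]


def process_parallel(src: Image, strip_height: int = 2,
                     workers: Optional[int] = None) -> Image:
    workers = workers or os.cpu_count() or 1  # one worker per CPU core
    dst: list = [None] * len(src)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_strip, src, dst, top, min(top + strip_height, len(src)))
            for top in range(0, len(src), strip_height)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst


output = process_parallel([[0, 100], [200, 255], [50, 60]])
```

Because every strip writes to its own rows of `dst`, the workers never touch the same data and no locking is needed in this sketch.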

I really want to dig into Highway, but that's a topic for another day (and I have an exam tomorrow).