Clipping IIIF Images to a Path

Several people have approached me now about the work I have done as a member of the Scripta Qumranica Electronica (SQE) project with respect to displaying images served via iiif and then clipped to non-rectangular regions.  This can be desirable, since the material artefacts we humanists work with are rarely perfect squares, and people working on manuscripts may need to piece together fragments of various shapes.

My motivation

In the SQE project, we have many scroll fragments, none of which are rectangular.  These fragments may fit very closely together, such that portions of a rectangular cropping of one fragment would obscure material in another cropped fragment when the two are placed together. Or it may simply be a desideratum to fully isolate the object from its background.

In order to get an unobstructed view of the fragments alone, it is necessary to acquire images that are cropped to non-rectangular regions.  I present here an overview of the technologies involved and several possible implementations to turn something like this:

into something like this:

The examples presented here are based on an image of 1Q7 (a Qumran Samuel scroll) provided by the BnF. I had the good fortune to see this particular manuscript in person back in the Spring of 2017, when Laurent Hericher, Guide du lecteur du département des Manuscrits, kindly set up a viewing for me at the BnF in Paris. The manuscript was among the earliest published Dead Sea Scrolls and shares some peculiar morphology with the great Isaiah scroll.

A brief introduction to IIIF

The International Image Interoperability Framework (IIIF) was designed to standardize the methods of disseminating and acquiring data regarding material artefacts.  The great benefit of this technology is that, for example, a leaf from the BnF and another from the Vatican library can be displayed side-by-side in the same web application, or even better, can be placed together in their proper locations within a digital codex via a iiif manifest.  Such a manifest allows a researcher to read through that “object” (in this case a reconstructed manuscript) as if its constituent parts had never been split apart and distanced from each other into different institutions.  This is important, since it seems to me that gone are the days of shipping pieces of artefacts around so that scholars can view the whole object in a single physical location.

Access to images via IIIF

Rather than discuss the iiif framework at large, I will limit my comments here to the iiif image API.  Among other things, the iiif image API provides a protocol for requesting images using a standard syntax.  The iiif image API section 2.1 offers the following explanation of its syntax:

2.1. Image Request URI Syntax

The IIIF Image API URI for requesting an image must conform to the following URI Template:

{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

For example:

http://www.example.org/image-service/abcd1234/full/full/0/default.jpg

The parameters of the Image Request URI include region, size, rotation, quality and format, which define the characteristics of the returned image. These are described in detail in Image Request Parameters.

You should note that the {scheme}://{server}{/prefix}/ section of the address may look a bit different depending on whether the image server uses URL rewriting or not.

For instance, the BnF iiif URLs look like this:

http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/643,4934,1862.6,530/pct:52.54875/0/native.jpg

Here “ark:/12148/btv1b8551261c/f5” is the {identifier} (and “/iiif” is the {/prefix}). If we play with that a little bit we can see the functionality of the protocol:

We can get a full image at 5% scale as follows:

<img src="http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/full/pct:5/0/native.jpg"/>

We can access a rectangular region by inputting start-x, start-y, cropping-width, and cropping-height in the {region} section of the request as a comma-separated list /x,y,width,height/.  For example:

<img src="http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/643,4934,1863,530/pct:50/0/native.jpg"/>

This is useful, since we don’t waste bandwidth transferring irrelevant portions of the image.  We can even allow for scaling using a percent setting in the {size}, which is helpful when the images we are joining together are not of the same scale.  The image server does not experience much load, even with the heavy image processing, since these servers typically use image files that contain an image pyramid for fast scaling and are segmented into tiles for fast cropping (ptif is the standard format for such representations, but see also the recent improvements to the OpenJPEG JPEG 2000 codec).

We can even get a rotated version of the image:

<img src="http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/643,4934,1863,530/pct:30/27/native.jpg"/>

Depending on server support, we could change native.jpg to gray.jpg for a grayscale version, or even native.png for a png. By the way, all the above images are loaded directly from the BnF via the iiif image API; just have a peek at this page’s source code.
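The request patterns above can be collected into one small helper. This is only a minimal sketch: the function name iiifImageUrl and its option names are hypothetical, and the default quality of “native” simply mirrors the BnF examples above (the URL quoted from the specification uses “default” instead), so adjust it to whatever your server expects.

```javascript
// Hypothetical helper (not part of any IIIF library): assemble an image
// request URL following the template
// {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
function iiifImageUrl(base, identifier, options) {
  var o = options || {};

  // region: either the string "full" or an object {x, y, width, height}
  var region = o.region || 'full';
  if (typeof region === 'object') {
    region = [region.x, region.y, region.width, region.height].join(',');
  }

  // size: a percentage if given, otherwise a literal size string or "full"
  var size = o.pct !== undefined ? 'pct:' + o.pct : (o.size || 'full');
  var rotation = o.rotation !== undefined ? o.rotation : 0;
  var quality = o.quality || 'native'; // assumption: matches the BnF examples
  var format = o.format || 'jpg';

  return [
    base.replace(/\/+$/, ''), // drop any trailing slash on the base address
    identifier,
    region,
    size,
    rotation,
    quality + '.' + format
  ].join('/');
}

var base = 'http://gallica.bnf.fr/iiif';
var id = 'ark:/12148/btv1b8551261c/f5';
var crop = { x: 643, y: 4934, width: 1863, height: 530 };

var thumbUrl = iiifImageUrl(base, id, { pct: 5 });
var cropUrl = iiifImageUrl(base, id, { region: crop, pct: 50 });
var rotatedUrl = iiifImageUrl(base, id, { region: crop, pct: 30, rotation: 27 });
```

The three URLs built at the end are the same three requests used in the img tags above.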

Clipping methods:

After acquiring a rectangular cropped image from the iiif compliant image serving institution, we have several options for clipping it to a non-rectangular path.  The most useful approach will depend on each individual use case, and will be dependent upon the nature of the clipping mask, whether raster or vector.

Raster masks:

A raster mask would be a black and white image (most likely 1 bit), in which the white pixels cover the portions of the image that we do not want to see, and the black pixels cover the portions of the image that we do want to see.

If you use raster masks, your implementation will be complex from a programmatic perspective, perhaps a bit slow due to computational requirements, and impossible if the iiif server does not send CORS headers that permit your site to read the pixel data (e.g., by setting CORS to “*”).  If you wish to take the hard road, skip to Working with raster masks.  Otherwise, you can always convert your raster mask to a vector path using a tool like Peter Selinger’s famous Potrace, which even has a browser-side JavaScript implementation.  The in-browser JavaScript version is shockingly fast too, so just use that for real-time raster to vector conversion.

Vector masks:

A vector mask is textual data (it may be converted to binary) that lists, among other things, a series of points describing the path of a polygon.  For example, the following path gives us a square: “M10 10L10 50L50 50L50 10L10 10”.

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" height="60">
	<path d="M10 10L10 50L50 50L50 10L10 10"/>
</svg>

Such a path draws a line from an absolute x,y coordinate point of 10,10 to an absolute coordinate point at 10,50, then to 50,50, then to 50,10, and finally a line back to the starting point 10,10 to close the polygon. The capital L matters: a lowercase l would mean that the point is positioned relative to the previous one.  You can also close the shape with a Z instead of using coordinates for returning the line to its starting point.

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" height="60">
	<path d="M10 10L10 50L50 50L50 10Z"/>
</svg>

The path can be expanded with as many points as desired to make an appropriate mask.  For example, a hexagon:

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" height="90">
	<path d="M30.1 84.5L10.2 50L30.1 15.5L69.9 15.5L89.8 50L69.9 84.5"/>
</svg>

When representing a polygon with “holes” in it, another polygon is simply drawn inside the larger polygon.

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" height="90">
	<path d="M30.1 84.5L10.2 50L30.1 15.5L69.9 15.5L89.8 50L69.9 84.5M30 30L30 50L50 50L50 30L30 30"/>
</svg>

Note that some implementations will require that the inner polygon have its points listed in the opposite direction from the outer polygon, for instance clockwise for the outer polygon and counter-clockwise for the inner polygon. This is the case under the nonzero fill rule; under the evenodd fill rule the direction does not matter (see Joni Trythall’s explanation).  If you want to get really fancy, you can even add Bezier values to your path in order to represent curved regions.
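If you would rather not depend on which fill rule a given renderer applies, you can normalize the point order yourself. The following is a minimal sketch (the function names signedArea and ringsToPath are my own, not from any library): it uses the shoelace formula to detect the direction of each ring, reverses any hole that runs in the same direction as the outer ring, and writes the result as an SVG path string.

```javascript
// Shoelace formula. Only the sign matters here: it flips when the order
// of the points is reversed.
function signedArea(ring) {
  var sum = 0;
  for (var i = 0; i < ring.length; i++) {
    var a = ring[i];
    var b = ring[(i + 1) % ring.length];
    sum += a[0] * b[1] - b[0] * a[1];
  }
  return sum / 2;
}

// Build a path string from an outer ring and optional holes, each an
// array of [x, y] points. Holes are reversed when necessary so that they
// always run in the opposite direction to the outer ring.
function ringsToPath(outer, holes) {
  var outerPositive = signedArea(outer) > 0;
  var rings = [outer].concat((holes || []).map(function (hole) {
    var sameDirection = (signedArea(hole) > 0) === outerPositive;
    return sameDirection ? hole.slice().reverse() : hole;
  }));
  return rings.map(function (ring) {
    var points = ring.map(function (p) {
      return p[0] + ' ' + p[1];
    });
    return 'M' + points.join('L') + 'Z';
  }).join('');
}

// The hexagon and inner square from the example above.
var hexagon = [[30.1, 84.5], [10.2, 50], [30.1, 15.5], [69.9, 15.5], [89.8, 50], [69.9, 84.5]];
var square = [[30, 30], [30, 50], [50, 50], [50, 30]];
var hexagonWithHole = ringsToPath(hexagon, [square]);
```

In the example above the square already runs opposite to the hexagon, so it is emitted unchanged; had it been listed the other way around, it would have been reversed first.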

For a more in-depth introduction to SVG paths, see Chris Coyier’s introduction; for a great exposé on the SVG coordinate system, see Sara Soueidan’s write-up.

Working with vector masks:

So, let’s say you have a iiif image address now and a vector mask, how do you go about clipping the unwanted regions of the image?  Well, there are several possible approaches:

You could use a pre-existing library like polyclip.js https://github.com/zoltan-dulac/polyClip (see here and here).

If you need more programmatic control, you can use clip paths in CSS and SVG, or even a clipping path in HTML5 Canvas elements.

  1. CSS: See Sara Soueidan’s tutorial for CSS clipping paths defined either inline or within an SVG element.
  2. SVG: Or see Chris Coyier’s tutorial for clipping fully within an SVG element.
  3. Canvas: You can also clip an HTML5 Canvas element using context.clip();, but you must first draw the path on the context using procedural instructions, which is a bit of a pain when you have a good vector mask ready to go.  So forget that unless you really need an HTML5 Canvas for some reason.

Any of these methods will automatically ignore clicks on transparent regions of the image (polyclip requires an extra css directive), so the clipped images will behave just the way they look.  Very nice indeed!

I tried all three methods and found the clipped SVG approach to be the simplest since it is quite self-contained.  The approach involves creating an SVG element, placing a path definition inside of it, giving it the URL for our iiif image, and linking the clip-path of the element to the clipping path we just defined.  We may also need to deal with some sizing issues.  The basic layout of our element is as follows:

<svg id="SVG-2959" width="978.7730174999999" height="278.508375">
	<defs>
		<path id="Path-2959" d="M1531 528.8999999999996 L … L1531 528.8999999999996 M600.7 480.3000000000002 L … L600.7 480.3000000000002 …" transform="scale(0.5254875)"></path>
	</defs>
	<defs>
		<clipPath id="Clip-2959">
			<use stroke="none" fill="black" clip-rule="evenodd" fill-rule="evenodd" xlink:href="#Path-2959"></use>
		</clipPath>
	</defs>
	<g id="Container-2959" clip-path="url(#Clip-2959)" pointer-events="visiblePainted" style="opacity: 1;">
		<image id="ClippedImg-2959" class="clippedImg" draggable="false" xlink:href="http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/643,4934,1862.6,530/pct:52.54875/0/native.jpg" width="978.7730174999999" height="278.508375"></image>
	</g>
	<use stroke="blue" stroke-width="3" fill="none" fill-rule="evenodd" id="fragOutline-2959" class="fragOutline" xlink:href="#Path-2959" style="visibility: hidden;"></use>
</svg>

Check your browser code and you will see that this image was downloaded from the BnF and clipped via SVG path right here in your browser. Of course the code could have been a bit shorter by placing the clipping path directly in the clipPath element and removing the use element with id fragOutline-2959, but I did it this way so that you can add a simple CSS hover directive on the element to make the blue outline appear when the mouse moves over our fragment (e.g., svg g:hover + use {visibility: visible;}).
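When many fragments have to be displayed, markup like this is usually generated rather than written by hand. Here is a minimal sketch of one way to do that with plain string building; the function names and option names are hypothetical, the hover outline is left out for brevity, and clip-rule is used on the use element because that is the property a clipPath consults when deciding which regions count as inside.

```javascript
// Escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Hypothetical helper: build the clipped-image markup for one fragment.
// opts: { id, href, width, height, path, scale }
function clippedSvgMarkup(opts) {
  var id = escapeAttr(opts.id);
  var w = escapeAttr(opts.width);
  var h = escapeAttr(opts.height);
  return [
    '<svg id="SVG-' + id + '" width="' + w + '" height="' + h + '"' +
      ' xmlns="http://www.w3.org/2000/svg"' +
      ' xmlns:xlink="http://www.w3.org/1999/xlink">',
    '\t<defs>',
    '\t\t<path id="Path-' + id + '" d="' + escapeAttr(opts.path) + '"' +
      ' transform="scale(' + escapeAttr(opts.scale) + ')"></path>',
    '\t\t<clipPath id="Clip-' + id + '">',
    '\t\t\t<use clip-rule="evenodd" xlink:href="#Path-' + id + '"></use>',
    '\t\t</clipPath>',
    '\t</defs>',
    '\t<g id="Container-' + id + '" clip-path="url(#Clip-' + id + ')">',
    '\t\t<image id="ClippedImg-' + id + '" xlink:href="' + escapeAttr(opts.href) + '"' +
      ' width="' + w + '" height="' + h + '"></image>',
    '\t</g>',
    '</svg>'
  ].join('\n');
}

var markup = clippedSvgMarkup({
  id: 2959,
  href: 'http://gallica.bnf.fr/iiif/ark:/12148/btv1b8551261c/f5/643,4934,1862.6,530/pct:50/0/native.jpg',
  width: 931.3,
  height: 265,
  path: 'M0 0L100 0L100 100Z', // placeholder path, not the real fragment outline
  scale: 0.5
});
```

In a browser the resulting string can be assigned to the innerHTML of a container element; building the nodes with document.createElementNS would work just as well.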

Now, in my original implementation scaling was not working properly between the raster and vector parts of the SVG container, so I set the sizes programmatically.  If you look at the dimensions of my cropping region in the iiif url request “643,4934,1862.6,530” you see that my cropping region has a width of 1862.6px and a height of 530px.  Then, the sizing portion of my iiif request “pct:52.54875” shows the scale I have requested, about 53%.  From this we see the reasoning for the width and height dimensions I set for the SVG element and its child image element: 52.54875% of 1862.6 is 978.7730175 and 52.54875% of 530 is 278.508375.  The svg scale transform setting for my clipping path is, you guessed it, transform=“scale(0.5254875)”.  And now everything lines up perfectly, with no wasted resolution from shrinking the image I get from the iiif server, and I can even implement some smart caching for resized versions of the fragment.
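The arithmetic is simple enough to wrap in a function. A minimal sketch, with a hypothetical function name, that derives the SVG dimensions and the clip path transform from the {region} and {size} parts of the request:

```javascript
// Hypothetical helper: given the iiif region string "x,y,width,height" and
// the requested percentage, compute the values needed for the SVG element.
function clipGeometry(iiifRegion, pct) {
  var parts = iiifRegion.split(',').map(Number); // [x, y, width, height]
  var scale = pct / 100;
  return {
    x: parts[0],
    y: parts[1],
    width: parts[2] * scale,   // width of the svg and image elements
    height: parts[3] * scale,  // height of the svg and image elements
    transform: 'scale(' + scale + ')' // transform for the clipping path
  };
}

var geometry = clipGeometry('643,4934,1862.6,530', 52.54875);
// geometry.width is approximately 978.7730175 and geometry.height is
// approximately 278.508375 (floating point may add a stray final digit).
```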

Moving to a larger implementation

Now how did I find all those numbers and such?  Well, in the SQE project we have a large MySQL database (actually MariaDB) in which I have defined the clipping paths as POLYGON objects and linked those entries to a table that contains image information, such as the location, dimensions, and resolution of the relevant image.  So, when I want to load a clipped image, I gather all of that information together.  The database can even give me the bounding box of my polygon path, so that I have the data for the cropping section of my iiif request ready to go.

Here are some of the cool GIS functions available in MariaDB:

SELECT DISTINCT … ST_AsText(ST_Envelope(artefact.region_in_master_image)) as rect,
	ST_AsText(artefact.region_in_master_image) as poly,
	ST_AsText(artefact.position_in_scroll) as pos
FROM ...
WHERE ...

ST_Envelope() gives me a bounding box of my polygon clipping path as a rectangle polygon (MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY).  My table column artefact.region_in_master_image provides the polygon path of my clipping region, and artefact.position_in_scroll gives me the clipped image’s insertion point in the larger reconstructed scroll.  MariaDB outputs POLYGONs and POINTs in a format known as Well-Known Text (WKT), which is pretty easy to work with and can even interoperate with GeoJSON.  I parse all of this data browser side as follows:

// Get my clipped image position in scroll data:
var x_loc = parseFloat(artefact.pos.split(' ')[0].replace('POINT(', ''));
var y_loc = parseFloat(artefact.pos.split(' ')[1]);

// Parse the bounding box data for, x,y,width,height (I then use this directly in my
// iiif image request).
var rect = artefact.rect;
rect = rect.replace('POLYGON((', '');
var coords = rect.split(',');
var img_x = coords[0].split(' ')[0];
var img_y = coords[0].split(' ')[1];
var img_width = coords[2].split(' ')[0] - img_x;
var img_height = coords[2].split(' ')[1] - img_y;

// Grab the clipping path polygon and split into an array of individual
// polygons.
var data = artefact['poly'];
var polygons = data.split('),(');

Then I must translate the clipping path coordinates from their current location to a new point of origin, since they were originally created in relation to the coordinate system of the full size, full resolution image, but I will be applying them now to a rectangular cropping from that original image.  Since the coordinate system is top-left, that will almost invariably mean shifting every x coordinate of the path to the left (i.e., a lower number), and shifting every y pixel of the path up (also a lower number).  This is done by iterating over all the polygonal clipping regions and converting them to SVG syntax:

// This will be my new SVG path string
var new_polygons = '';

// Iterate over every polygon, make sure to keep things in scope with “this” at the end.
polygons.forEach(function(polygon, index) {

	// SVG paths start with M, which is the point from which a line will be drawn
	new_polygons += 'M';

	// Remove all the WKT-specific syntax
	polygon = polygon.replace(/POLYGON/g, "");
	polygon = polygon.replace(/\(/g, "");
	polygon = polygon.replace(/\)/g, "");

	// Now put each x y point into an array and format it for SVG syntax
	var points = polygon.split(",");
	points.forEach(function(point) {
		if (new_polygons.slice(-1) !== 'M'){
			new_polygons += 'L';
		}

		// Here is the code for translating each point, note that we just subtract the
		// img_x and img_y from the bounding box we parsed above.
		new_polygons += (point.split(' ')[0] - img_x) + ' ' + (point.split(' ')[1] - img_y);
	}, this);
}, this);
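For testing outside the browser, the same two steps (reading the bounding box and translating the polygon) can also be written as small pure functions. This is a sketch only: the function names are hypothetical, the sample WKT strings are invented for illustration rather than taken from the SQE database, and it handles plain POLYGON values only.

```javascript
// Parse "POLYGON((x y,x y,...),(x y,...))" into an array of rings, each an
// array of [x, y] number pairs.
function parseWktPolygon(wkt) {
  var body = wkt.trim()
    .replace(/^POLYGON\s*\(\(/i, '')
    .replace(/\)\)$/, '');
  return body.split(/\)\s*,\s*\(/).map(function (ring) {
    return ring.split(',').map(function (point) {
      var xy = point.trim().split(/\s+/);
      return [parseFloat(xy[0]), parseFloat(xy[1])];
    });
  });
}

// Turn the ST_Envelope rectangle into the x, y, width, height needed for
// the {region} part of the iiif request.
function envelopeToRegion(envelopeWkt) {
  var ring = parseWktPolygon(envelopeWkt)[0];
  var x = ring[0][0];
  var y = ring[0][1];
  return { x: x, y: y, width: ring[2][0] - x, height: ring[2][1] - y };
}

// Shift every point so that the top-left corner of the cropped region
// becomes 0,0, and emit an SVG path string.
function translatedPath(polygonWkt, region) {
  return parseWktPolygon(polygonWkt).map(function (ring) {
    var points = ring.map(function (p) {
      return (p[0] - region.x) + ' ' + (p[1] - region.y);
    });
    return 'M' + points.join('L');
  }).join('');
}

// Invented sample data: a triangle with a small triangular hole.
var sampleRect = 'POLYGON((100 200,400 200,400 500,100 500,100 200))';
var samplePoly = 'POLYGON((100 200,400 350,250 500,100 200),(200 300,220 300,220 320,200 300))';

var region = envelopeToRegion(sampleRect);
var svgPath = translatedPath(samplePoly, region);
```

With the sample data, region is {x: 100, y: 200, width: 300, height: 300} and every point in svgPath has been moved by that same offset.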

Lastly, we will want to define some abstract resolution that we will scale every clipped image to.  Doing all of this fancy clipping is meaningless if the clipped images are not properly sized in relation to each other.  This step may require some manual work if the images you are using do not provide their resolution, or they have an incorrect resolution.  If the image has a ruler, which it really should, you can always count the pixels.

Let me explain the theory.  Let’s say you have one image that is 1000 DPI, that is, 1000 pixels in the image should cover 1 inch of real world material.  Since we are working with relatively thin, flat artefacts, this approximation is close enough, so long as the camera lens is parallel to the artefact and any barrel distortion has been corrected.  Now let’s say we need to place that clipped image alongside one that is 500 DPI.  The 1000 DPI image will need to be shrunk by 50% in order to match the resolution of the 500 DPI image.  Simple!  So, pick a number, any number, to be your system default resolution. In SQE we use 1215 DPI, since that is the resolution of basically all the images we get from our partner the Israel Antiquities Authority.  You should pick a number that similarly makes sense for the type of images you generally access. 1200 might be a good number, or anything that is large enough and also a superior highly composite number, perhaps 2520.  Now no matter what image you load for clipping, just resize it in relation to that number and everything will be ok…famous last words.
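As a minimal sketch of that calculation (the function names are hypothetical, and the 2.54 simply converts a ruler measured in centimetres to inches):

```javascript
// Estimate an image's resolution by counting the pixels spanned by a known
// length on the ruler in the photograph.
function dpiFromRuler(pixelCount, lengthInCm) {
  return pixelCount / (lengthInCm / 2.54);
}

// Percentage to put in the iiif {size} parameter (as "pct:n") so that an
// image of imageDpi ends up at the chosen system resolution.
function pctForResolution(imageDpi, systemDpi) {
  return (systemDpi / imageDpi) * 100;
}

// The example from the text: a 1000 DPI image shown in a 500 DPI system.
var shrinkPct = pctForResolution(1000, 500); // 50

// An image that is already at the system resolution is left at 100%.
var samePct = pctForResolution(1215, 1215); // 100
```

The same factor, divided by 100, is what goes into the scale() transform of the clipping path, so the image and its mask stay aligned.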

For positioning and rotation of these SVG elements, I just nest them into two divs. The top level div gets translated via CSS to the proper XY coordinate, and the second level div gets rotated via a CSS rotate transform. This keeps everything nice and clear, though perhaps a bit overloaded. With CSS transforms, even realtime dragging and rotating works fantastically with no delays or stutters. (Edit: We now use a 2D transform matrix applied directly to the SVG element itself for translate/rotate/scale.) If you want to see this implementation in action, just scroll back to the top of the page and take a look at the image with the scroll fragments all cut out and arranged in a nice line. If you inspect that element, you will see that it is precisely this method, implemented right in the browser!
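For the matrix variant, the six numbers can be computed by hand. The sketch below is not the SQE code; it is a generic illustration with hypothetical function names, composing a translation, a rotation, and a uniform scale into the matrix(a, b, c, d, e, f) form that both CSS transforms and the SVG transform attribute accept.

```javascript
// Combine translate(x, y), rotate(degrees) and scale(scale) into the six
// values of a 2D affine matrix. A point is scaled first, then rotated
// about the origin, then translated.
function transformMatrix(x, y, degrees, scale) {
  var radians = degrees * Math.PI / 180;
  var cos = Math.cos(radians) * scale;
  var sin = Math.sin(radians) * scale;
  return [cos, sin, -sin, cos, x, y];
}

// Format for a CSS transform value or an SVG transform attribute.
function matrixString(m) {
  return 'matrix(' + m.join(',') + ')';
}

// Apply the matrix to a point [px, py]; handy for checking the result.
function applyMatrix(m, p) {
  return [
    m[0] * p[0] + m[2] * p[1] + m[4],
    m[1] * p[0] + m[3] * p[1] + m[5]
  ];
}

var identityAt = matrixString(transformMatrix(10, 20, 0, 1));
var quarterTurn = transformMatrix(10, 20, 90, 2);
var moved = applyMatrix(quarterTurn, [1, 0]); // approximately [10, 22]
```

In the browser the string would be applied with something like element.style.transform = matrixString(m).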

Working with raster masks:

If you insist on working completely in a raster environment, you will need to convert portions of the image you receive from the iiif server into transparent pixels.  This can be done using an HTML5 Canvas by working with that canvas’s drawing context.  You start by reading the iiif image into one off-screen HTML5 Canvas, then reading your mask into another off-screen HTML5 Canvas.  Then you must step through each pixel and compare it with the corresponding pixel in your raster mask.  If the pixel in the mask is white, then set the alpha value of the corresponding image pixel to 0 (the context is rgba, so the alpha value is the fourth channel).  If the pixel in the mask is black, then do nothing. Put your now cropped pixel data into a new canvas and display that one on screen.  When you finish this, you will have a lovely image with all areas outside of the mask being perfectly transparent.  Note that the code below tests the mask’s alpha channel rather than its color, so it expects a mask whose unwanted regions are fully transparent rather than white.

// Create image elements for our mask and the image to be clipped
var mask = new Image();
var image = new Image();

// Nested functions are necessary to make sure all image data is loaded before
// we begin to work on it.
mask.onload = function () {
	var tempCanvas = document.createElement("canvas");
	var tempContext = tempCanvas.getContext("2d");
	tempCanvas.width = mask.width;
	tempCanvas.height = mask.height;
	tempContext.drawImage(mask, 0, 0);
	var maskData = tempContext.getImageData(0, 0, mask.width, mask.height);

	// Now we can work on the image when it is loaded.
	image.onload = function () {
		tempContext.clearRect(0, 0, tempCanvas.width, tempCanvas.height);
		tempContext.drawImage(image, 0, 0);
		var imageData = tempContext.getImageData(0, 0, image.width, image.height);

		// Now we need some variables for finding the bounding box of our mask.
		var bound = {
			top: null,
			left: null,
			right: null,
			bottom: null
		};
		var x, y;

		// Let’s get looping.  We jump through our pixel data in steps of 4.
		// This is because we are only changing the alpha channel, and each
		// pixel has 4 channels (red, green, blue, alpha).
		// We start the array at three because arrays are 0 based, so 3
		// is the fourth element in the array: 0,1,2,3.
		for (var i = 3; i < maskData.data.length; i += 4) {

			// We check if the mask is present in this pixel, i.e., whether
			// the mask’s alpha value here is non-zero.
			// If so, we use it to find the bounding box.
			// If not, we jump to the else and set our image’s
			// corresponding pixel to transparent.
			if (maskData.data[i] != 0) {

				//Find our crop box
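				// (i - 3) / 4 is the zero-based index of the pixel that alpha byte i belongs to.
				x = ((i - 3) / 4) % image.width;
				y = Math.floor(((i - 3) / 4) / image.width);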
				if (bound.top === null) {
					bound.top = y;
				}
				if (bound.left === null) {
					bound.left = x;
				} else if (x < bound.left) {
					bound.left = x;
				}
				if (bound.right === null) {
					bound.right = x;
				} else if (bound.right < x) {
					bound.right = x;
				}
				if (bound.bottom === null) {
					bound.bottom = y;
				} else if (bound.bottom < y) {
					bound.bottom = y;
				}
			} else {

				//Apply mask transparency
				imageData.data[i] = 0;
			}
		}

		// Now we trim our image to the bounding box of the mask
		// (the + 1 keeps the last column and row, since the bounds are inclusive)
		var trimWidth = bound.right - bound.left + 1;
		var trimHeight = bound.bottom - bound.top + 1;
		tempCanvas.width = trimWidth;
		tempCanvas.height = trimHeight;
		tempContext.putImageData(imageData, 0 - bound.left, 0 - bound.top);

		// Now we copy our cropped and clipped image to a new
		// image element.
		var maskedImage = new Image();
		maskedImage.onload = function () {
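			// fragImage and canvas are not defined in this snippet; they are assumed
			// to exist already (they look like a fabric.js image object and canvas).
			fragImage.setElement(maskedImage);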

			//We need to remove the bounding box that Canvas.toDataURL() places around its images
			//Is there another better way to do this?
			fragImage.clipTo = function (ctx) {
				ctx.rect(-Math.abs(trimWidth / 2) + 2, -Math.abs(trimHeight / 2) + 2, trimWidth - 4, trimHeight - 4);
			};
			canvas.add(fragImage);
			canvas.renderAll();
		}
		maskedImage.src = tempCanvas.toDataURL();
	};
	image.src = "path/to/my/image/file";
};
mask.src = "path/to/my/mask/file";

Phew!  But your work may not be over.  The transparent sections of the image will still respond to event listeners.  I had code to deal with this, but have lost it in my rapid dash away from this method to something more sane.  Nevertheless, if you want the image to not only look like a cut out fragment, but also to act like one, you might only need to set the CSS for that canvas element to: pointer-events: none.  If that doesn’t work, then you need to capture every click event on your image, find the exact pixel where the click happened, and check the pixel data for that pixel to see if it has an alpha value of 0 or not.  If it does have an alpha value of 0, you need to allow the event to bubble up the DOM to the next responding element, if there is one, and repeat the process.  Fun!  You have now learned how to turn your computer into an effective space heater; but you were warned.
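Since my original code for this is lost, here is only a minimal sketch of the pixel check itself. The function name is hypothetical, and the argument is assumed to be an ImageData-like object (width, height, and an rgba data array) such as the one returned by getImageData.

```javascript
// Return true when the pixel at (x, y) is fully transparent, or when the
// coordinates fall outside the image altogether.
function isTransparentAt(imageData, x, y) {
  var px = Math.floor(x);
  var py = Math.floor(y);
  if (px < 0 || py < 0 || px >= imageData.width || py >= imageData.height) {
    return true;
  }
  // Four bytes per pixel (r, g, b, a); the alpha byte is the fourth.
  var alphaIndex = (py * imageData.width + px) * 4 + 3;
  return imageData.data[alphaIndex] === 0;
}

// A 2 x 1 test image: one opaque red pixel, one fully transparent pixel.
var sample = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 0, 0])
};
var hitOpaque = isTransparentAt(sample, 0, 0);      // false
var hitTransparent = isTransparentAt(sample, 1, 0); // true
```

In a click handler, x and y would be the event’s clientX and clientY minus the left and top of the canvas’s getBoundingClientRect(); when the function returns true, the event has to be handed on to whatever lies underneath.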
