The JPEGMetaData class parses metadata from JPG files. The primary use case is to quickly extract embedded thumbnails from a large JPG image.
The simplest invocation to retrieve the largest thumbnail is:
BufferedImage bi = JPEGMetaData.getThumbnail(myFile);
But if you want more granular control you can create your own JPEGMetaDataListener to receive each incoming key/value property, thumbnail and comment:
JPEGMetaDataListener listener = createListener();
try (InputStream in = url.openStream()) {
    JPEGMetaData.read(in, listener);
}
The JPEGMetaData object skims chunks of the JPG input stream for supported metadata. If you're interested in properties, comments, or more than one thumbnail, then you can implement your own JPEGMetaDataListener:
public interface JPEGMetaDataListener {
    boolean isThumbnailAccepted(String markerName, int width, int height);
    void addProperty(String markerName, String propertyName, Object value);
    void addThumbnail(String markerName, BufferedImage bi);
    void addComment(String markerName, String comment);
    void startFile();
    void endFile();
    void imageDescription(int bitsPerPixel, int width, int height, int numberOfComponents);
    void processException(Exception e, String markerCode);
}
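As a rough illustration of what implementing the interface might look like, here is a minimal sketch of a listener that simply collects whatever it is handed. The interface is copied locally from the listing above so the example compiles on its own; CollectingListener, its maxDimension parameter, and the "marker.property" key format are hypothetical names of mine. I'm also assuming that returning false from isThumbnailAccepted tells the parser to skip that thumbnail.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local copy of the interface from the text, so this sketch is self-contained.
// In real use you would implement the library's own JPEGMetaDataListener.
interface JPEGMetaDataListener {
    boolean isThumbnailAccepted(String markerName, int width, int height);
    void addProperty(String markerName, String propertyName, Object value);
    void addThumbnail(String markerName, BufferedImage bi);
    void addComment(String markerName, String comment);
    void startFile();
    void endFile();
    void imageDescription(int bitsPerPixel, int width, int height, int numberOfComponents);
    void processException(Exception e, String markerCode);
}

/** Hypothetical listener that stores every thumbnail, property and comment. */
class CollectingListener implements JPEGMetaDataListener {
    final List<BufferedImage> thumbnails = new ArrayList<>();
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<String> comments = new ArrayList<>();
    final List<Exception> errors = new ArrayList<>();
    private final int maxDimension;

    CollectingListener(int maxDimension) {
        this.maxDimension = maxDimension;
    }

    @Override
    public boolean isThumbnailAccepted(String markerName, int width, int height) {
        // Assumption: returning false asks the parser to skip this thumbnail.
        return width <= maxDimension && height <= maxDimension;
    }

    @Override
    public void addProperty(String markerName, String propertyName, Object value) {
        // Prefix with the marker name so equal property names do not collide.
        properties.put(markerName + "." + propertyName, value);
    }

    @Override
    public void addThumbnail(String markerName, BufferedImage bi) {
        thumbnails.add(bi);
    }

    @Override
    public void addComment(String markerName, String comment) {
        comments.add(comment);
    }

    @Override
    public void startFile() {
        // Reset state so one listener can be reused across files.
        thumbnails.clear();
        properties.clear();
        comments.clear();
        errors.clear();
    }

    @Override
    public void endFile() {
        // Nothing to finalize in this sketch.
    }

    @Override
    public void imageDescription(int bitsPerPixel, int width, int height, int numberOfComponents) {
        properties.put("image.width", width);
        properties.put("image.height", height);
    }

    @Override
    public void processException(Exception e, String markerCode) {
        // Record the problem and carry on, rather than aborting the whole file.
        errors.add(e);
    }
}
```

An instance of this could be passed as the listener argument to JPEGMetaData.read(in, listener), after which the collected lists can be inspected.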
This class was originally developed to help the user browse a folder of megapixel images.
This is not intended to be a fully functional JPEG parser. (I have no intention of recreating that wheel!) But it's a pretty lightweight solution to the problem.
When I say this is a "lightweight solution", I'm implicitly comparing this approach to an ImageIO-based solution. The documentation for ImageIO says:
3.3.4 Reading "Thumbnail" Images
Some image formats allow a small preview image (or multiple previews) to be stored alongside the main image. These "thumbnail" images are useful for identifying image files quickly, without the need to decode the entire image.
Applications can determine how many thumbnail images associated with a particular image are available by calling:
reader.getNumThumbnails(imageIndex);
If a thumbnail image is present, it can be retrieved by calling:
int thumbnailIndex = 0;
BufferedImage bi;
bi = reader.readThumbnail(imageIndex, thumbnailIndex);
That sounds great, but the catch is: it doesn't actually work. No built-in readers actually produced a thumbnail using the approach above.
I found a discussion online where a developer wrote:
I should have remarked that the JAI Image I/O Tools JPEG reader supports via the thumbnail method calls all thumbnails embedded in the JFIF APP0, JFXX APP0, and EXIF APP1 marker segments. Please see this javadoc for more information:
http://download.java.net/media/jai-imageio/javadoc/1.1/overview-summary.html#JPEG
I think that the only thumbnails supported by the Java SE Image I/O JPEG reader via the thumbnail method calls are those in the JFIF and JFXX marker segments. If you are unable to use JAI Image I/O Tools for some reason you could however derive the EXIF thumbnail by parsing the contents of the "unknown" node in the image metadata corresponding to the EXIF APP1 marker segment.
To paraphrase: we would need to install a separate library/extension (JAI - Java Advanced Imaging) for the ImageIO calls to "just work". If I run the following code (without JAI installed), I get an exception:
Iterator<ImageReader> iterator = ImageIO.getImageReadersBySuffix("jpeg");
while (iterator.hasNext()) {
    ImageReader reader = iterator.next();
    try {
        reader.setInput(ImageIO.createImageInputStream(jpeg));
        BufferedImage thumbnail = reader.readThumbnail(0, 0);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
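For comparison, here is a minimal sketch of the same ImageIO attempt with the call guarded by getNumThumbnails, so a file with no reported thumbnail yields null instead of an exception. It only uses standard javax.imageio calls; the class and method names (ImageIOThumbnails, readFirstThumbnail) are hypothetical. It does not make the built-in reader find any more thumbnails than before; it just fails quietly.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

class ImageIOThumbnails {
    /**
     * Returns the first thumbnail the installed reader reports for the first
     * image in the stream, or null if no reader or no thumbnail is available.
     */
    static BufferedImage readFirstThumbnail(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return null; // no stream provider could wrap the input
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null; // no installed reader recognizes this format
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                // Ask first: readThumbnail(0, 0) throws if there is nothing at index 0.
                if (reader.getNumThumbnails(0) == 0) {
                    return null;
                }
                return reader.readThumbnail(0, 0);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

A null result here is the cue to fall back to some other approach, such as parsing the marker segments directly.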
I have nothing against JAI, and honestly I forget why we ruled that out as the solution to our problem. (This project is over a decade old at this point.) But for whatever reason: we tried our hand at parsing the thumbnails ourselves.
It isn't too hard to tease out the blocks used for thumbnails. Like the developer suggested above: APP0 and APP1 blocks were easily identifiable. (They contained mini stand-alone JPEGs. ImageIO is perfectly capable of reading them; you just have to present the data as its own ByteArrayInputStream.)
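To give a feel for how these blocks can be located, here is a minimal sketch of a marker-segment scanner. It relies on the standard JPEG layout (a 0xFF prefix, a marker byte, then a two-byte big-endian length that counts itself) and stops at the start-of-scan marker. This is my own illustration, not the actual JPEGMetaData source; JpegSegmentScanner and Segment are hypothetical names, and the sketch assumes no length-less markers appear before the scan data.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class JpegSegmentScanner {
    /** One APPn segment: the marker byte (e.g. 0xE1 for APP1) and its payload. */
    static final class Segment {
        final int marker;
        final byte[] payload;

        Segment(int marker, byte[] payload) {
            this.marker = marker;
            this.payload = payload;
        }
    }

    /** Collects the APP0..APP15 segments that precede the compressed image data. */
    static List<Segment> readAppSegments(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readUnsignedByte() != 0xFF || data.readUnsignedByte() != 0xD8) {
            throw new IOException("Not a JPEG: missing SOI marker");
        }
        List<Segment> segments = new ArrayList<>();
        while (true) {
            int prefix = data.read();
            if (prefix < 0) {
                break; // end of stream
            }
            if (prefix != 0xFF) {
                throw new IOException("Expected marker prefix 0xFF");
            }
            int marker = data.readUnsignedByte();
            while (marker == 0xFF) {
                marker = data.readUnsignedByte(); // skip optional fill bytes
            }
            if (marker == 0xDA || marker == 0xD9) {
                break; // start of scan or end of image: no more metadata segments
            }
            int length = data.readUnsignedShort(); // includes the two length bytes
            if (length < 2) {
                throw new IOException("Corrupt segment length");
            }
            byte[] payload = new byte[length - 2];
            data.readFully(payload);
            if (marker >= 0xE0 && marker <= 0xEF) {
                segments.add(new Segment(marker, payload));
            }
        }
        return segments;
    }
}
```

Because the scan stops at the first start-of-scan marker, the megapixels of compressed image data are never read, which is where the speed advantage of this kind of approach comes from.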
A little more detective work also revealed a 3rd block: APP13. This is apparently Adobe's invention; it's also known as an Image Resource Block (IRB). I iterated over thousands of images (from various sources) and found that about 1/3 of them only used an IRB block for a thumbnail. Like the Netscape block for GIFs: this may have started with Adobe, but it looks like other folks are using it just because it's popular. But this is still easy to parse with the same basic approach: it's a mini-JPG embedded inside a JPG.
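The "mini-JPG inside a JPG" idea can be sketched as follows: search a segment's payload for the JPEG start-of-image signature (FF D8 FF) and hand everything from that point on to ImageIO through a ByteArrayInputStream. To be clear, this signature search is a crude heuristic I'm using for illustration; the class may well locate thumbnails differently (for example by following the offsets stored in the block's own header). EmbeddedJpegFinder and its methods are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class EmbeddedJpegFinder {
    /** Returns the index of the first FF D8 FF byte sequence, or -1 if absent. */
    static int indexOfJpegStart(byte[] payload) {
        for (int i = 0; i + 2 < payload.length; i++) {
            if ((payload[i] & 0xFF) == 0xFF
                    && (payload[i + 1] & 0xFF) == 0xD8
                    && (payload[i + 2] & 0xFF) == 0xFF) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Decodes the first embedded JPEG found in the payload, or returns null
     * if no start-of-image signature is present or ImageIO cannot decode it.
     */
    static BufferedImage decodeEmbeddedJpeg(byte[] payload) throws IOException {
        int start = indexOfJpegStart(payload);
        if (start < 0) {
            return null;
        }
        // Present the embedded bytes to ImageIO as if they were their own file.
        return ImageIO.read(new ByteArrayInputStream(payload, start, payload.length - start));
    }
}
```

The payload here would be the body of an APP0, APP1 or APP13 block. A false match on FF D8 FF inside unrelated binary data is possible, so a more careful parser would read the block's own structure instead.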
Adding properties and comments to the reader was more of an afterthought. So far I haven't used that information; I've been exclusively interested in thumbnails.
So overall: this meets the original goals/needs we set out to satisfy, but it is hackish in nature. There's probably a lot of metadata -- and maybe even some thumbnails -- we don't support yet.
See the ThumbnailGenerator Demo for details about performance. As of this writing: retrieving the thumbnail from a large megapixel JPG via JPEGMetaData is about 50x faster than reading and scaling the image.