Tuesday, March 22, 2011

Last night at the PDF Hospital

Lisa checked a 6-page file into the PDF Hospital last night on behalf of a colleague. The complaint was a page that rendered very slowly and was impossible to modify.

Triage showed that the file was only about 2 megs, and the page in question appeared to have only a few images. Exploratory surgery confirmed that the problem lay in the page's image components: we progressively removed the non-image components and still saw the same symptoms.

This left 4 images to explore in depth. Each image was composed of a single background image and between 1,024 and 16,384 copies of the foreground image mask and its associated clipping path. The background image was larger than the foreground image mask, so it was easy to select and move out of the way. When we moved a copy of the foreground image mask, it disappeared because it was now outside the associated clipping path. But with the background image and one copy of the foreground image mask out of the way, we could select and delete the remaining image masks and clipping paths in the original image location. Once all of the clipping paths were deleted, the foreground image mask appeared again, and we verified through on-screen inspection and test prints that it was complete. We could then move the remaining background image and foreground image mask back into their original location without disturbing the layering of any other items on the page.

Repeating this process on all 4 images resulted in a symptom-free page. We released the file back to its owner, and were left with the mystery of how you end up with thousands of identical copies of an image in a file. It’s almost certainly not the result of someone pressing a button 16,000 times.
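The surgery above was done by hand in an editor. As a rough triage aid, one could instead count how many times each image is painted in a page's content stream and how many clipping paths are set. The sketch below is hypothetical: it assumes the content stream has already been decompressed to plain bytes (for example with qpdf's `--qdf` option), that images are painted by name with the `Do` operator, and that clips use `W n` or `W* n`. The function name and sample stream are made up, and a regex is no substitute for a real PDF parser.

```python
import re
from collections import Counter

# A name operand followed by the Do operator, e.g. "/Im1 Do", which paints an XObject.
DO_PATTERN = re.compile(rb"/([^\s/\[\]()<>{}%]+)\s+Do\b")
# "W n" or "W* n": set a clipping path from the current path without painting it.
CLIP_PATTERN = re.compile(rb"\bW\*?\s+n\b")


def count_paints(content: bytes):
    """Roughly count XObject paints and clip operations in an uncompressed
    content stream. Regex-based, so inline images (BI ... EI), nested form
    XObjects, and operator-like text inside strings are not handled."""
    paints = Counter(name.decode("latin-1") for name in DO_PATTERN.findall(content))
    clips = len(CLIP_PATTERN.findall(content))
    return paints, clips


if __name__ == "__main__":
    # Hypothetical stream: one background image, then four clipped copies of a mask.
    stream = b"q 200 0 0 200 0 0 cm /Bg Do Q\n"
    stream += b"q 0 0 m 50 0 l 50 50 l h W n 50 0 0 50 0 0 cm /Mask Do Q\n" * 4
    paints, clips = count_paints(stream)
    print(paints, clips)
```

On a page like the one described, a single mask name with a paint count in the thousands, matched by a similar number of clips, would be the telltale sign.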

The key is probably that every copy count was a power of 2. If you are copying outside images and placing them into a file, the expected process is that selecting a new image to copy replaces the previously copied image in whatever bit of memory is being used as the temporary clipboard. Select A, place A, select B, place B. If, however, all previously placed images remain selected in memory along with the newly selected image, you get exactly this sort of exponential result. If the brackets show the items previously placed, then you end up placing A, [A]B, [AAB]C, [AAAABBC]D, etc. The number of copies of A doubles with every paste, so you’ll have 16,384 copies of A after placing only 15 items (A plus 14 more). Fortunately, the damage was treatable in this instance. But it would be nice to know what software workflow is resulting in exponential pastes.
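The arithmetic of that hypothesis is easy to check by simulation. This is a minimal sketch of the clipboard model described above, not of any particular application's behavior: the function names are hypothetical, and the "bug" is simply that each paste carries everything already placed plus the new item.

```python
from collections import Counter


def simulate_pastes(items):
    """Replay the hypothesized clipboard bug: every paste carries all
    previously placed items along with the newly selected one."""
    placed = []  # everything on the page so far, in paste order
    for item in items:
        clipboard = placed + [item]  # old selection is never cleared
        placed = placed + clipboard  # pasting doubles what is already there
    return placed


def copies_of_first(n):
    """Number of copies of the first item after placing n items."""
    return Counter(simulate_pastes(range(n)))[0]


if __name__ == "__main__":
    print(Counter(simulate_pastes("ABCD")))
    for n in (11, 15):
        print(n, copies_of_first(n))
```

Placing A, B, C, D yields 8 A's, 4 B's, 2 C's and 1 D, matching the bracket sequence. The first item reaches 1,024 copies after 11 placements and 16,384 after 15, which is the same range of copy counts found in the file.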

I’ve been thinking about opening the PDF Hospital for a while. We operate on PDF files all the time, and we’ve seen a lot of strange problems. Does your PDF file load or print really slowly, or give you PostScript errors? Do searches on the text not work? Do some pieces of text appear as different characters when printed than when they’re on screen? Do accents disappear, or capital W’s print as barcodes? Do some PDF viewers show little boxes in your PDF file? Do you want to move items around on the page, or make something larger or smaller? Do you want to remove or add items to the page? We deal with all of this frequently, so opening the PDF Hospital to outside business is the obvious next step. Not everything is treatable, but many problems are. The challenge is setting a price structure and managing expectations.
