To improve the accuracy of the automated results generated by InFocus, SSB Bart Group needed a way to classify images so they could be properly tested for alternative text. For example, a blank image that is not used as an active element on the page should generally have a null alt attribute. A photograph on a page may require a description of its contents, while a logo or other image containing text needs alternative text that reiterates the text in the image. To classify such images properly, the actual image data, as opposed to the declared data in the HTML document, must be reviewed, and an intelligent determination must then be made about how the image should be classified. It is worth noting that this effort focuses solely on information SSB can gather by analyzing the image data directly; it explicitly does not account for how the image is used in the document. Decisions about alternative text that rely on image use – for example, that all linked images must have alternative text describing the link target – are outside the scope of the classification and are test decisions SSB addressed in DCQL.
Over the last 10 years, SSB has explored a variety of methods for classifying images and determining their alternative text. These included:
- Looking at the size of the image
- Looking at the HTML declared attributes of the image
- Looking at the image content for specific image types – most notably GIF images – and acting accordingly.
The issue with these approaches is that they fail to take advantage of the most significant source of information available to us – the data of the image itself. With proper caching and multi-threaded loading, a fast testing engine can be built that actually loads all the images on a page into memory, providing the engine a significant amount of information that is not available as part of the page’s Document Object Model (DOM). This information is outlined in the Image Variables section of this document and covers a variety of highly detailed attributes of the images. Further, this information can then be exposed in DCQL in such a fashion that Organization Administrators and developers can write their own custom image tests without any knowledge of the programmatic or file system representations of images. This degree of abstraction allows powerful tests to be written that can readily be applied regardless of image type or deployment platform.
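The caching and multi-threaded loading described above can be sketched in a few lines of Python. This is illustrative only, not the InFocus implementation: `fetch_bytes` is a hypothetical placeholder for a real HTTP fetch, and the cache size and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical fetch helper; a real engine would issue an HTTP request
# and reuse the browser session. The lru_cache ensures an image URL
# referenced several times on a page is only fetched once.
@lru_cache(maxsize=512)
def fetch_bytes(url: str) -> bytes:
    # Placeholder body so the sketch is runnable without a network.
    return url.encode("utf-8")

def load_all(urls):
    """Load every image on a page into memory concurrently."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(urls, pool.map(fetch_bytes, urls)))
```

Once the raw bytes are in memory, each image can be analyzed directly rather than relying on whatever the HTML happens to declare.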
Our research and experience found that three factors were important for classifying images for the purposes of determining if their alternative text was valid:
- Decorative – Is the image purely decorative or is it meant to communicate information? For the problem of automatic determination, SSB focused on images that were decorative independent of use context – specifically those images used purely for page layout, coloration and background. For images that provide material information but are used in a decorative fashion on a given page – a common example is a thumbnail next to a text link targeting matching content – the onus would remain on the developer to properly set image alternative text.
- Contains Text – Is text the primary content of the image? Images are often used to concisely control the positioning and formatting of text on a page. For our purposes, images that contain text are those used predominantly to convey that text, with any other content serving solely decorative or illustrative purposes. For these images the alternative text should generally be set exactly to the text of the image.
- Complex – Is the image a complex one such as a chart, graph, schematic or other involved graphic? For these images a short alt attribute is unlikely to be sufficient; the longdesc attribute should be used, the image should be linked to an alternative page, or the content of the image should be made available in the text of the page itself.
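As an illustration of the first factor, a hand-written rule in the spirit of the decorative branch might look like the following sketch. The cutoffs here are assumptions chosen for illustration; the shipped classifier was a trained decision tree, not fixed thresholds.

```python
def looks_decorative(width: int, height: int, color_depth: int) -> bool:
    """Rough heuristic: spacer and background images tend to be very
    small in one dimension or nearly monochrome.

    width, height  -- pixel dimensions
    color_depth    -- number of unique colors in the image
    (Cutoff values are illustrative assumptions, not SSB's trained tree.)
    """
    return width <= 5 or height <= 5 or color_depth <= 2
```

A 1x1 transparent spacer GIF would be caught by the size test, while a single-color background tile would be caught by the color test.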
In determining the proper classification of an image, the following variables are considered:
- Height – The height of the image in pixels
- Width – The width of the image in pixels
- Color Depth – Number of unique colors in the image
- File Size – Size of the file in bytes
- Edge Count – Number of vertical and horizontal edges found in the image
- File Type – Type of the file based on the actual file sent by the server
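Several of these variables can be read straight from the raw bytes without a full decoder. The sketch below is an assumption-laden illustration, not SSB's code: it sniffs the file type from magic bytes and, for PNG files, reads width and height directly from the IHDR chunk (color depth and edge count would require actually decoding the pixels).

```python
import struct

# Magic-byte signatures for the common web image formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}

def image_variables(data: bytes) -> dict:
    """Extract the cheaply available Image Variables from raw bytes."""
    file_type = next(
        (t for magic, t in MAGIC.items() if data.startswith(magic)), "unknown"
    )
    info = {"file_size": len(data), "file_type": file_type}
    if file_type == "png" and len(data) >= 24:
        # IHDR is always the first chunk in a PNG: width and height are
        # big-endian 32-bit integers at byte offsets 16 and 20.
        info["width"], info["height"] = struct.unpack(">II", data[16:24])
    return info
```

Note that the type is taken from the bytes the server actually sent, so a JPEG served with a `.gif` extension is still classified as a JPEG.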
In validating the accuracy of our approach to classifying images, we found the following:
- Decorative – Images can be classified as decorative quickly and highly accurately. Our validation tests of the decision tree found that the tree classified images accurately about 98% of the time.
- Contains Text – Classifying images as containing text is difficult using only the Image Variables listed above; the decision tree never exceeded about 80% accuracy in determining whether an image contained text. More practically, since a determination could already be made about whether an image was decorative, the need for a secondary determination about whether the image contained text was lessened from a diagnostic perspective.
- Complex – Classifying images as complex turned out to be most efficiently accomplished by simply flagging all images over a certain file size and square pixel size for review by the user. This is a consequence of the exceptionally low occurrence rate of complex images in web pages. Across the sample set of 2,000 images, less than 1% were complex, and all of those exceeded a basic file size and square pixel size that could readily be entered statically into a DCQL test.
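The complex-image rule above reduces to a two-threshold check. The cutoff values below are placeholders for illustration only; the report notes the actual figures were entered statically into a DCQL test and does not state them.

```python
# Assumed cutoffs for illustration; the real values lived in a DCQL test.
COMPLEX_MIN_BYTES = 50_000     # file-size threshold in bytes
COMPLEX_MIN_PIXELS = 250_000   # square-pixel threshold (width * height)

def flag_complex(width: int, height: int, file_size: int) -> bool:
    """Flag an image for human review as potentially complex."""
    return (
        file_size >= COMPLEX_MIN_BYTES
        and width * height >= COMPLEX_MIN_PIXELS
    )
```

Because complex images are so rare, a static rule like this trades a small number of false positives for review against the cost of a trained classifier.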
By far the most interesting lesson SSB learned in implementing the automatic image classification scheme was that the relatively complex and computationally intensive algorithms for Sobel and regular edge detection produced no useful increase in the decision tree’s ability to classify images. While interesting to implement as a technology, edge detection was not a meaningful variable in the classification of images. The one exception initially seemed to be classifying images as man-made or photographic – edge counts tend to be higher in man-made images. Image type, however, turned out to be far better at making this same classification – GIF images tend to be man-made, JPEG images tend to be photographs – and is far less computationally expensive.