PDF Accessibility: Table of Contents Guidelines
May 5, 2017
If there is a Bible for PDF accessibility, it is the 14th chapter of ISO 32000-1:2008, the ur-text of the PDF standard. For any Scripture, of course, there is commentary, and the latest is ISO 14289-1-2016 (PDF/UA).
Nowhere in the core documents defining PDF accessibility is there any complete, definitive description of how to create a table of contents. That is why, when we review PDF tables of contents, there are so many variations in how they are tagged.
ISO 32000 describes the tags used to make a table of contents:
(From ISO 32000 14.8.4.2, Table 333)
TOC | (Table of contents) A list made up of table of contents item entries (structure type TOCI) and/or other nested table of contents entries (TOC). A TOC entry that includes only TOCI entries represents a flat hierarchy. A TOC entry that includes other nested TOC entries (and possibly TOCI entries) represents a more complex hierarchy. Ideally, the hierarchy of a top-level TOC entry reflects the structure of the main body of the document. NOTE 2: Lists of figures and tables, as well as bibliographies, can be treated as tables of contents for purposes of the standard structure types. |
TOCI | (Table of contents item) An individual member of a table of contents. This entry’s children may be any of the following structure types: Lbl: A label (see “List Elements” in 14.8.4.3, “Block-Level Structure Elements”); Reference: A reference to the title and the page number (see “Inline-Level Structure Elements” in 14.8.4.4, “Inline-Level Structure Elements”); NonStruct: Non-structure elements for wrapping a leader artifact (see “Grouping Elements” in 14.8.4.2, “Grouping Elements”); P: Descriptive text (see “Paragraph like Elements” in 14.8.4.3, “Block-Level Structure Elements”); TOC: Table of content elements for hierarchical tables of content, as described for the TOC entry. |
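The child types listed in the table lend themselves to a mechanical check. What follows is a minimal sketch, not a validator for real PDFs: it assumes the tag tree has already been copied out by hand into nested `(tag, children)` tuples, which is a hypothetical representation chosen only for illustration. The allowed-children sets are taken from the table above; the function name is made up.

```python
# Allowed child structure types, taken from the ISO 32000 table quoted above.
ALLOWED_CHILDREN = {
    "TOC": {"TOC", "TOCI"},
    "TOCI": {"Lbl", "Reference", "NonStruct", "P", "TOC"},
}


def find_invalid_children(node, path=""):
    """Return (parent_path, child_tag) pairs for children the table does not list.

    `node` is a hypothetical (tag, children) tuple, not an Acrobat or
    PDF library data structure.
    """
    tag, children = node
    here = f"{path}/{tag}"
    allowed = ALLOWED_CHILDREN.get(tag)  # None means "no rule for this tag"
    problems = []
    for child in children:
        child_tag = child[0]
        if allowed is not None and child_tag not in allowed:
            problems.append((here, child_tag))
        problems.extend(find_invalid_children(child, here))
    return problems


# The second entry puts a Link directly under TOCI, which the table does not list.
toc = ("TOC", [
    ("TOCI", [("Reference", [("Link", [])])]),
    ("TOCI", [("Link", [])]),
])
print(find_invalid_children(toc))
```

Only TOC and TOCI have rules here, so tags such as Reference are passed through unchecked.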
Lists and Links
A table of contents is a list. In other words, it should have the same kind of list structure that other PDF lists have, just with different tags. In this post, we will talk about lists and tables of contents together. Getting the list hierarchy right is a main issue in PDF accessibility for tables of contents, just as it is for lists and sublists.
Improper links are the second major issue that reviewers encounter with tables of contents. Take a look at the TOCI element in the table above. Nobody uses label elements anymore. The Reference tag is what is used today. NonStruct is used occasionally but is not really needed. P is a container tag that one can use within any block element to define text, but it is not essential either. Nothing in the above says that links must be present in a table of contents. That requirement comes instead from the Section 508 compliance rules of most government agencies, which call for working links in every table of contents.
Part 1: Proper List Structure
There are two main document flows into government PDFs: documents from Microsoft Word or Microsoft PowerPoint, and documents created by programs like Adobe InDesign. When documents are exported from Word to PDF, the tagging is about 90% accurate, but one problem occurs very often with nested lists such as the following:
- One potato
- Two carrots
  - One carrot should be sprouted
  - The other carrot should not be sprouted
When lists and sublists are exported from Word to Acrobat, the tag system should look like this. (For the sake of accessibility, I’m going to put descriptions of Adobe tags into a table so that it is easier for screen reader users to track the hierarchy.)
| Root Level | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| <L> | | | | | |
| | <LI> | | | | |
| | | <LBody> | One potato | | |
| | <LI> | | | | |
| | | <LBody> | Two carrots | | |
| | | <L> | | | |
| | | | <LI> | | |
| | | | | <LBody> | One carrot should be sprouted |
| | | | <LI> | | |
| | | | | <LBody> | The other carrot should not be sprouted |
I’ve left out the bullets because they make it difficult to format in this table.
What can go wrong? The sublist, which begins with an entirely separate list declaration, can be moved to the left so that it sits at the root level alongside the parent list. The more extensive the list is, and the more levels there are, the more confused this can get.
With a table of contents, it is especially important to preserve the list and sublist structure that the author has created because this is a visual representation of the arrangement of the ideas in the document.
Consider the following:
Table of Contents………………………………………………………………………….2
Introduction to 508 Compliance…………………………………………………….6
  What is 508 Compliance?………………………………………………………….6
  Purpose………………………………………………………………………………….7
  How does this law impact my committee?…………………………………..7
  Will NWCG assist committees in 508 Compliance?……………………..7
Microsoft Word Document Creation……………………………………………..7
This should be tagged as represented in the following table, where the table of contents item contents have been deleted for clarity:
| Root Level | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| <TOC> | | | | |
| | <TOCI> | Table of Contents | | |
| | <TOCI> | Introduction to 508 Compliance | | |
| | | <TOC> | | |
| | | | <TOCI> | What is 508 compliance? |
| | | | <TOCI> | Purpose |
| | | | <TOCI> | How does this law impact my committee? |
| | | | <TOCI> | Will NWCG assist committees in 508 compliance? |
| | <TOCI> | Microsoft Word Document Creation | | |
What happens all too frequently is that when tables of contents like this are exported from Word, they are flattened out so that the new <TOC> that signifies a sublist is moved to the root level. This defeats the author’s purpose in grouping their ideas the way they did.
So the first step in reviewing how your table of contents is tagged, before submitting it for 508 review, is to compare the visual hierarchy in the original document to the tag structure. Ensure that each sublist starts with an entirely new table of contents declaration, <TOC>, that is a child of the parent-level <TOCI>. Ensure that when the table of contents returns to the root level, the table of contents item also returns to the root level, as in the last row of the table.
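The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not a tool that reads PDFs: the tag tree is assumed to have been transcribed by hand into nested dictionaries, the visual outline is a hand-typed list of `(title, level)` pairs in the same order as the entries, and the helper names (`node`, `toci_depths`, `hierarchy_mismatches`) are made up for this example.

```python
def node(tag, title=None, children=()):
    """Build a hypothetical tag-tree node (not an Acrobat data structure)."""
    return {"tag": tag, "title": title, "children": list(children)}


def toci_depths(n, depth=0):
    """Yield (title, depth) for each TOCI; depth counts the enclosing TOC tags."""
    if n["tag"] == "TOC":
        depth += 1
    elif n["tag"] == "TOCI":
        yield n["title"], depth
    for child in n["children"]:
        yield from toci_depths(child, depth)


def hierarchy_mismatches(outline, root):
    """Compare visual levels with tag depths; assumes both list entries in the same order."""
    actual = list(toci_depths(root))
    return [
        (title, expected, got)
        for (title, expected), (_, got) in zip(outline, actual)
        if expected != got
    ]


# Visual hierarchy as the author laid it out: (title, level).
outline = [("Introduction to 508 Compliance", 1), ("What is 508 Compliance?", 2)]

# Correct tagging: the sublist <TOC> is a child of the parent <TOCI>.
nested = node("Document", children=[
    node("TOC", children=[
        node("TOCI", "Introduction to 508 Compliance", [
            node("TOC", children=[node("TOCI", "What is 508 Compliance?")]),
        ]),
    ]),
])

# Flattened tagging: the sublist <TOC> has drifted to the root level.
flattened = node("Document", children=[
    node("TOC", children=[node("TOCI", "Introduction to 508 Compliance")]),
    node("TOC", children=[node("TOCI", "What is 508 Compliance?")]),
])

print(hierarchy_mismatches(outline, nested))
print(hierarchy_mismatches(outline, flattened))
```

The correctly nested tree reports no mismatches, while the flattened one reports the sub-entry as sitting one level too high.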
InDesign Issues
When documents are exported from programs like InDesign, which use their own XML-based tagging systems, the tag structure will contain tags that may or may not resemble native Acrobat tags. To check for compliance, you will need to open the Role Map editor to see how the InDesign XML tags have been mapped to native Acrobat tags. For all InDesign products, this should be checked very carefully because parts of the role map may be incorrect (see my blog post on Role Mapping). For example, when you look for TOCI equivalents in the role map, you may see something like this:
/TOC-1 /P
/TOC-2 /P
This means that the first- and second-level table of contents entries that came over from InDesign have been mapped to simple text. What you want to see is something like this:
/TOC-1 /TOCI
/TOC-2 /TOCI
Make sure that everything that is intended to be in a table of contents is mapped to <TOC> or <TOCI>.
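A role map review like this can also be expressed as a simple lookup. The sketch below assumes the role map has been copied out of the Role Map editor into a plain dictionary of custom tag name to standard tag; the tag names shown are hypothetical, and the rule that table of contents tags start with "TOC" is a naming assumption that will not hold for every InDesign export.

```python
# Standard structure types that a table of contents tag should map to.
TOC_STANDARD_TAGS = {"TOC", "TOCI"}


def suspicious_toc_mappings(role_map):
    """Return custom tags that look like TOC tags but map to something else.

    Assumes custom table of contents tags start with "TOC" (a naming
    convention, not a rule of the PDF standard).
    """
    return {
        custom: standard
        for custom, standard in role_map.items()
        if custom.upper().startswith("TOC") and standard not in TOC_STANDARD_TAGS
    }


# Hypothetical role map, transcribed by hand from the Role Map editor.
role_map = {"TOC-1": "P", "TOC-2": "TOCI", "Body-Text": "P"}
print(suspicious_toc_mappings(role_map))
```

Anything the function returns still needs a human look, since a differently named style may also belong to the table of contents.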
Part 2: The Link Section
Once you have made sure that the list structure of the table of contents tags reflects what the author intended, it is time to turn your attention to how the link section is constructed. There is nothing in the PDF core documents about how this should be done, but there are practices and traditions based on years of knowledge of the evolution of both Acrobat and screen readers. Note: regarding screen readers, we are excluding Acrobat's Read Out Loud function. In a table of contents, it reads every single dot or period in the leader element aloud. If you need a free screen reader, try NVDA.
The best detailed guide on PDF accessibility available online was written by our own Cammie Truesdell and Suman Kaur for the Veterans Administration and can be found at: https://www.section508.va.gov/support/tutorials/pdf/index.asp. They have a fine screenshot of the proper tagging of the link section, which for accessibility purposes I have turned into a table as above so that the various levels and dependencies will be apparent for screen reader users:
| Root level | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| <TOC> | | | | |
| | <TOCI> | | | |
| | | <Reference> | | |
| | | | <Link> | |
| | | | | What is 508 compliance? |
| | | | | ….. |
| | | | | 5 |
| | | | | <Link - OBJR> |
In this example, there are both Reference and Link tags. Are both really necessary? A Reference is a specialized kind of internal link that one should always use if the reference points to a target that is within the document. A Link tag can be used both within a document and to refer to an outside target.
I tested the above configuration in the original document with JAWS 18, Window-Eyes, and NVDA 2017.1, in the following ways:
- As it is.
- Using the Reference tag with no Link tag.
- Using the Link tag with no Reference tag.
- Using a <NonStruct> tag for the dots that make up the leader.
Here is what I found:
- As it is, tables of contents will be read perfectly fine with all three screen readers.
- Tables of contents will be announced perfectly well with the Reference tag but no Link tag. In JAWS, with the virtual cursor, the link text will be read along with three of the leader dots, and the page reference will be read with another press of the down arrow. This is because the leader dots fill up the buffer that limits how many characters can be read at once. Using Ctrl+Down Arrow will get around this. Using the Tab key means everything will be announced at once as a link. This is true also for NVDA and Window-Eyes.
- With the Link tag in place of the Reference tag, table of contents items will be read as regular links in which the link text is read first before the item is announced as a link, whereas each item is announced as a link first with the Reference tag in place.
- Regarding the leader dots — in earlier editions of JAWS and Acrobat, it was necessary to go through and artifact all of the leader dots in the table of contents because they would all be read individually by the screen readers. Now, they all have evolved enough that nobody will be caught listening to all of those dots individually rendered, so there is no need for a <NonStruct> tag to tell the screen reader what they are. I did try inserting a <NonStruct> tag into the tag tree and testing with that. The result was uninspiring.
Conclusions:
- It is proper form to have a Reference tag.
- The Link tag in addition is not necessary. However, traditions in government agencies may require it.
- Tables of contents will announce perfectly well with a Link tag and no Reference tag, but this is not in compliance with ISO 32000-1:2008 regarding internal references.
- Ensure that there is always a Link-OBJR tag for every table of contents entry, and that this is a child of the Link or Reference tag. This ensures that each link is keyboard accessible. If you find that there is a link or reference that is missing this tag, go into the tag tree, activate the menu, and click Find. Choose Annotations (links and references are considered annotations by Acrobat). If an orphan annotation is found, a dialog box will appear giving you the option to tag the element. When you do so, the Link-OBJR tag will appear in the tag tree. You will need to make sure that it is moved so that it becomes the child of the appropriate Reference.
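The last conclusion can be turned into a quick check. The following is a sketch only, assuming the tag tree has been transcribed by hand into nested dictionaries and that the Link-OBJR item is represented by a node tagged `"OBJR"`; both are illustrative choices, and the function names are made up. It reports every table of contents entry that lacks an OBJR whose parent is a Link or Reference.

```python
def tag(name, *children, title=None):
    """Build a hypothetical tag-tree node (not an Acrobat data structure)."""
    return {"tag": name, "title": title, "children": list(children)}


def has_objr(n, parent_tag=None):
    """True if the subtree holds an OBJR whose parent is a Link or Reference."""
    if n["tag"] == "OBJR":
        return parent_tag in {"Link", "Reference"}
    return any(
        has_objr(child, n["tag"])
        for child in n["children"]
        if child["tag"] != "TOC"  # nested TOCs are checked entry by entry
    )


def entries_missing_objr(n):
    """Return titles of TOCI entries without a properly parented OBJR."""
    missing = []
    if n["tag"] == "TOCI" and not has_objr(n):
        missing.append(n["title"])
    for child in n["children"]:
        missing.extend(entries_missing_objr(child))
    return missing


toc = tag(
    "TOC",
    # Reference > Link > OBJR, as in the table above.
    tag("TOCI", tag("Reference", tag("Link", tag("OBJR"))), title="What is 508 compliance?"),
    # Reference with no OBJR at all: not keyboard accessible.
    tag("TOCI", tag("Reference"), title="Purpose"),
    # OBJR was tagged but left directly under the TOCI instead of the Reference.
    tag("TOCI", tag("Reference"), tag("OBJR"), title="Microsoft Word Document Creation"),
)
print(entries_missing_objr(toc))
```

The third entry mirrors the situation described above, where the newly tagged Link-OBJR still has to be moved under the appropriate Reference.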