ISC Conference 2012, Day 2—The re-indexing dilemma

Max McMaster, an award-winning indexer and representative of the Australian and New Zealand Society of Indexers (ANZSI), spoke at the ISC conference about re-indexing. For a new edition of a book, a subsequent annual report, or a bilingual document, do you adapt what has already been done, or do you index from scratch? Further, are there ethical issues in reusing an existing index?

In Australia, trade publishers will often buy foreign rights and revise a book for the Australian market. In many cases, this means that although the content of the two editions is similar, the terminology can vary substantially. Moreover, North American indexers tend to produce lengthier indexes than Australians are used to, so when a book is re-indexed for the Australian edition, the index may end up about 30% shorter.

If a new edition of an existing book is just a repagination, reusing the index headings is by far the most efficient approach. (McMaster adds 1000 to all the old page numbers to keep track of what he’s changed and what he hasn’t.) If changes to a new edition are minimal and you created the first index, reusing the existing index, with necessary revisions, may be easiest. If changes are substantial, however, it’s much more efficient to start from scratch. McMaster is emphatic, though, in dispelling the myth that re-indexing, in whatever form, is easier than creating an index for a new work.
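
To make the bookkeeping behind that +1000 trick concrete, here’s a toy sketch in Python (the data and helper function are invented for illustration; McMaster works in dedicated indexing software, not code):

    OFFSET = 1000  # any locator >= 1000 is still unverified

    # Index inherited from the old edition (invented data).
    old_index = {"kangaroos": [12, 87], "wombats": [34]}

    # Step 1: offset every inherited locator to flag it as unchecked.
    working = {head: [p + OFFSET for p in pages]
               for head, pages in old_index.items()}

    def verify(index, heading, old_page, new_page):
        """Swap an offset (unchecked) locator for its confirmed new page."""
        pages = index[heading]
        pages[pages.index(old_page + OFFSET)] = new_page

    verify(working, "kangaroos", 12, 14)  # topic now falls on page 14

    # Step 2: anything still carrying the offset remains to be checked.
    unchecked = {h: [p for p in ps if p >= OFFSET]
                 for h, ps in working.items()}
    print(unchecked)  # {'kangaroos': [1087], 'wombats': [1034]}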

If you didn’t create the first index and want to re-index a book, it’s still useful to see the existing index. The previous indexer may have found a way to solve a problem that will save you a lot of time, or you may spot weaknesses that you should avoid.

In Australia, all government annual reports are required to have an index, a boon for indexers in that country. Since the design and the components of an annual report rarely change from year to year, re-indexing is a snap. Basically, once you land a contract to do one annual report, you’ve got it for life. (McMaster has co-authored a guide for non-indexers on how to index annual reports.)

For bilingual documents, you can’t reuse pagination, since the structure and length of the two languages will differ. One possibility is translating an embedded index. However, Heather Ebbs pointed out that translating an index doesn’t really work, since there are cultural and contextual differences between languages.

As for ethical issues, McMaster once had a publisher reuse an index he had created for an Australian edition of a book that was then repackaged, with a different title, for New Zealand. In Australia, because an indexer works under contract, he or she doesn’t retain copyright of the work; however, McMaster would have appreciated at the very least being notified that his index would be reused (and a bit of additional compensation wouldn’t have hurt, either). Mary Newberry said that in Canada, copyright of the index does belong to the indexer.

McMaster’s presentation brought up the issue of credit; in one of his anecdotes he mentioned that his name was on a book’s copyright page, which led me to ask him whether crediting an indexer is standard practice in Australia. He says that an indexer is credited only maybe 5% of the time. Christine Jacobs had an interesting approach to the credit issue: she invoices for a credit line (and, incidentally, for a copy of the finished book). She asks for a credit on the copyright page, in the acknowledgements, or in the index itself, and lists this as a separate line item on her invoice. In cases where she doesn’t approve of the changes an editor, author, or publisher has made to the index, she simply removes that item, and her name doesn’t appear.

ISC Conference 2012, Day 2—What is the future of indexing?

Cheryl Landes is a technical writer and indexer who sees a changing role for indexers—one that is rife with possibilities.

Today people are consuming content in four main ways: in print, on e-readers, on tablets, and on smartphones. In the past year, more people have been moving to tablets and smartphones rather than e-readers, since those devices offer colour and other functionality. Many vendors of authoring tools are adding outputs to accommodate tablets, and more and more companies are publishing technical documentation that can be read on tablets or smartphones (Alaska Airlines, for example, replaced forty pounds of paper pilots’ manuals with iPads). Despite the movement towards mobile devices, however, Landes doesn’t believe that print will ever go away.

Digital content means users are able to search, but searching doesn’t yield the speed of information retrieval or context that an index offers. Indexers have to be proactive about educating others about the utility and importance of indexes, and emerging technologies are providing many opportunities for indexers to apply their skills beyond the scope of traditional back-of-the-book indexing.

Partnering with content strategists

Indexers can serve as consultants about taxonomies and controlled vocabularies, which are key to finding content. (An example of a taxonomy is the Legislative Assembly of British Columbia’s Index to Debates.)

Database indexing

Growth in this area is anticipated as more companies move their catalogues online, particularly in retail.

Embedded indexing

Embedded indexing tags content directly in the source file and allows for single-sourcing, which is ideal for publishers who want both print and digital outputs from the same content (a rough sketch follows). (Landes echoes Jan Wright in saying that for the past decade technical communicators have been grappling with issues trade publishers are facing only now, yet the two groups aren’t talking to each other. How do we start that conversation?)
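
As a rough illustration of single-sourcing, here is a minimal Python sketch. The {{idx:...}} marker syntax is invented for this example (real tools, such as Word’s XE fields or DocBook’s indexterm elements, differ in syntax but not in spirit): entries are tagged once in the source, and each output format derives its own index from the tags.

    import re

    # Invented {{idx:heading}} marker syntax; real embedded-indexing tools
    # differ in syntax but not in spirit.
    source_paragraphs = [
        "Emus {{idx:emus}} are flightless birds.",
        "The emu {{idx:emus}} appears on Australia's "
        "coat of arms {{idx:coat of arms}}.",
    ]

    MARKER = re.compile(r"\{\{idx:([^}]+)\}\}")

    index = {}  # heading -> anchor ids, one per tagged paragraph
    for n, para in enumerate(source_paragraphs, start=1):
        for heading in MARKER.findall(para):
            index.setdefault(heading, []).append(f"p{n}")

    # A print pipeline would resolve anchors to page numbers; an ebook
    # pipeline can hyperlink straight to the same anchors.
    for heading in sorted(index):
        print(f"{heading}: {', '.join(index[heading])}")
    # coat of arms: p2
    # emus: p1, p2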

Search engine optimization

Indexers understand what kinds of search terms certain target audiences use. Acting as consultants, they can create strategies for keywording in metadata.

Blog and wiki indexing

This area is likely to grow because more companies are turning to blogs to promote products and services, and they are using wikis for technical documentation.

Social media

Possible consulting opportunities abound in this quickly changing field. Facebook’s Timeline and Twitter’s hashtags are both attempts at indexing in social media, but one can envision the need for more sophisticated methods of retrieving information as more and more content is archived on these platforms.

ISC Conference 2012, Day 1—American Society for Indexing’s Digital Trends Task Force

ASI’s David Ream and Jan Wright gave the ISC a report on their work with the Digital Trends Task Force (DTTF), which came into being in the summer of 2011 after the issue of electronic publication indexing was brought up at the ASI conference earlier that year.

The task force actively participated in the International Digital Publishing Forum (IDPF), a consortium of businesses and organizations involved in defining the new EPUB 3.0 standard. By establishing a special indexers’ working group within the IDPF, and with the Australian and New Zealand Society of Indexers also a member, indexers made their presence known to a much wider community of players driving the future of electronic publishing. (EPUB is the open format that can be read on the iPad, Nook, Kobo, Sony readers, and other e-readers. The notable exception is the Kindle, which uses a different format.)

The task force also set out to do industry outreach at such events as the Digital Book World and O’Reilly’s Tools of Change conferences. With this kind of outreach, the ASI could establish itself as an authority on indexing in a digital age. At the latter conference, a recurring concern of electronic publishers was discovery, since traditional channels, like bookstores and libraries, are now out of the equation. Indexing, and indexers, Ream and Wright pointed out, are the gateway to discovery, and because discovery means money, publishers are more likely to listen to indexers who emphasize it. (Interestingly, Amazon did not participate in Tools of Change.)

Wright also presented at the WritersUA conference. WritersUA, based in the U.S., is a group of technical writers who had to deal with the issue of single-sourcing, and a move to XML, years ago. They have experience solving the kinds of problems trade publishers are only now beginning to face.

Wright’s outreach extended to being a guest on #ePrdctn Hour on Twitter, a platform that, Wright said, was more powerful than she could ever have imagined. After her Twitter hour established her as an expert in the nascent field of ebook indexing, Wright was able to reach organizations and companies that would otherwise have been much harder to access. For instance, she is now able to talk directly to Adobe engineers about InDesign’s scripts for ebooks.

The ASI is trying to get the Digital Trends Task Force to conferences that indexers don’t usually attend, focusing on the themes of monetization and semantic metadata.

To stay informed about digital trends affecting indexers, Wright and Ream suggest joining the DTTF’s LinkedIn group and following TidBITS, Peter Meyers (@petermeyers on Twitter), and Joe Wikert.

ISC Conference 2012, Day 1—Indexing National Film Board of Canada images

NFB librarian Katherine Kasirer showed ISC conference attendees what’s involved in indexing the National Film Board’s collection, particularly its Stock Shot library.

We all know the National Film Board as a Canadian institution. It was established in 1939 and has about 13,000 titles in its catalogue, including feature-length documentaries and short animated films. Only 2,500 are available through the NFB.ca website so far; these are the result of the NFB’s ongoing project to digitize all of its films and make them available for streaming.

The NFB also has what it calls the Stock Shot library (or the “Images” database), which is a collection of discarded footage (outtakes) that can be used in other productions. The database also includes

  • the Canadian Army Film and Photo Units (CAFPU) collection, deposited in 1946
  • the Associated Screen News collection
  • captured materials from World War II (German war propaganda)
  • the Canadian Government Motion Picture collection

Users might be, say, music video or commercial producers, researchers, or documentary and feature filmmakers. The database has very fine subject indexing to allow users to find exactly what they need. Since filmmakers often have to convey a particular mood or show a specific object or event, the indexing must capture a number of elements to help users retrieve the desired footage (a sample record sketch follows the list), including

  • subject
  • location
  • shooting conditions (e.g., foggy, sunny)
  • time of day, season
  • camera angles (e.g., close-up, aerial shot)
  • year of production
  • special effects (e.g., underwater, time-lapse)
  • camera operator
  • film (title of film that produced the outtakes)
  • technical details
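
Pulling those elements together, a single catalogue record might be modelled roughly like the hypothetical Python sketch below (the field names follow Kasirer’s list; the NFB’s actual database schema was not shown):

    from dataclasses import dataclass, field
    from typing import List

    # Hypothetical record for one stock-shot clip; fields follow Kasirer's
    # list, but the structure and example data are invented.
    @dataclass
    class StockShot:
        subjects: List[str]        # fine-grained subject terms
        location: str
        conditions: str            # e.g., "foggy", "sunny"
        time_of_day: str
        season: str
        camera_angle: str          # e.g., "close-up", "aerial shot"
        production_year: int
        special_effects: List[str] = field(default_factory=list)
        camera_operator: str = ""
        source_film: str = ""      # title of the film that produced the outtakes
        technical_details: str = ""

    clip = StockShot(              # entirely invented example data
        subjects=["harbour", "fishing boats"],
        location="Lunenburg, Nova Scotia",
        conditions="foggy",
        time_of_day="dawn",
        season="autumn",
        camera_angle="aerial shot",
        production_year=1962,
    )
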

The search is, of course, bilingual, and will bring up images and clips, not just a written description. Kasirer’s presentation really drove home how specific and often how nuanced image and footage indexing can be.

ISC Conference 2012, Day 1—Building a bilingual taxonomy for indexing ordinary images

Elaine Ménard gave ISC conference attendees a glimpse into the world of information science research. An assistant professor in the School of Information Studies at McGill University, Ménard embarked on a project to develop a bilingual taxonomy to see how controlled vocabularies can assist in both indexing and information retrieval. Taxonomies are inherently labour-intensive to create, and bilingualism adds a further complication.

Ménard’s Taxonomy for Image Indexing And RetrievAl (TIIARA) project consists of three phases:

  1. a best practices review,
  2. development of the taxonomy, and
  3. testing and refinement of the taxonomy.

Phase 3 is currently underway, and she gave us an overview of the first two phases.

In phase 1, Ménard and her team evaluated 150 resources: 70 image collections held by libraries, museums, image search engines, and commercial stock agencies, and 80 image-sharing platforms with user-generated tagging. They discovered that 40% of the metadata dealt with the image’s dimensions, material, and source and that 50% addressed copyright information, with the balance devoted to subject classification. This review of best practices formed the basis of phase 2.

In phase 2, Ménard’s team constructed an image database and developed the top-level categories and subcategories of the taxonomy. To create the database, they solicited voluntary submissions and ended up with a collection, called Images DOnated Liberally (IDOL), of over 6,000 photos from 14 contributors. Mindful of Miller’s law of seven plus or minus two, the taxonomy featured (after a series of revisions and refinements) nine top-level categories, designed to be as broad as possible while still helping users with retrieval, and a further forty-three second-level categories.
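
For concreteness, such a two-level structure amounts to a nested mapping. In the Python fragment below, the top-level names are ones reported in the study, but the second-level terms are invented placeholders, not TIIARA’s actual subcategories:

    # Fragment of a two-level image taxonomy. The top-level names appear in
    # the study; the second-level terms are invented placeholders.
    tiiara_fragment = {
        "Arts": ["architecture", "painting", "performing arts"],
        "Nature": ["animals", "landscapes", "plants"],
        "Places": ["cities", "countries", "landmarks"],
    }

    # Retrieval starts broad (nine top-level categories in the preliminary
    # version) and then narrows through the forty-three subcategories.
    for top, subs in sorted(tiiara_fragment.items()):
        print(top, "->", ", ".join(subs))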

After the category headings were translated, two volunteers, one anglophone and the other francophone, tested the preliminary taxonomy through a card-sorting game, in which they were instructed to sort the second-level cards according to whatever structure they desired and provide a heading for each sorted group. This pretest showed a polarization of “splitters” and “lumpers” and didn’t provide any practical recommendations for the taxonomy but did suggest revisions to the card-sorting exercise.

Ten participants (five male, five female; five anglophone, five francophone) were recruited to test the taxonomy and expose problematic categories in the structure. Half of the group was instructed to sort the second-level categories according to the existing first-level structure; the other half could sort the second-level categories as they pleased. Through this test Ménard hoped to assess how well each category and subcategory was understood; the differences between the French and English sorts would reveal nuances that had to be taken into account in the translation of the structure.

Results showed that the first-level categories of “Arts,” “Places,” and “Nature” were well understood but that “Abstractions,” “Activities,” and “Business and Industry” were problematic. Feedback from participants helped the researchers pare the structure down to seven first-level headings. Interestingly, Ménard found fewer disparities between the languages than expected.

The revised TIIARA structure was refined to include second-, third-, and fourth-level subcategories and was simultaneously developed in English and French.

In phase 3, underway now, two indexers—one English, one French—will index all images in the IDOL database according to the TIIARA structure. Iterative user testing will be carried out to validate and refine the taxonomy.

So far the study has shown that language barriers still prevent users from easily accessing information, including visual resources, and that a bilingual taxonomy is a definite benefit for image searchers. Eventually the aim is to implement TIIARA in an image search engine.

ISC Conference 2012, Day 1—More to come!

There were four other ISC sessions that I attended today, but I haven’t had the chance to write them up. I’ll post them as soon as I can piece together something coherent out of my notes. Thanks for your patience!

UPDATE (Sunday, June 3): Ack. Now I have three and a half days’ worth of conference sessions—for the ISC and the EAC—that I have to summarize and post. I took a heap of notes, and I got the speakers’ permission to post synopses, so I’ll eventually get everything up here, though perhaps not as quickly as I initially imagined. I’m hoping to work my way through the session notes over the next week or so. Right now, however, brain = toast.

ISC Conference 2012, Day 1—The glory and the nothing of a name

Noeline Bridge is the editor of Indexing Names, a book fresh off the press. She spoke today at the ISC conference about proper noun indexing, particularly the tricky problems that arise from people’s names.

Determining the order of the elements of a name with multiple components is the basic problem that a proper noun indexer must solve. For example, the indexer must know that many medieval names and names that indicate a patronymic are typically left as is and that German names with “von” are traditionally indexed under the part that follows “von.” Bridge gave attendees an extremely useful list of resources that guide the practice with respect to inverting names in a variety of languages.

Deciding how much information to include and exclude is also an indexer’s judgment call. We have to be sensitive to what a publisher or author may want. For instance, one of Bridge’s publisher clients insisted that all military titles be included. Bridge occasionally adds glosses with qualifying phrases for added specificity. As an index user, she explains, she likes to know right away which entries refer to human beings and which ones do not, and the glosses help establish that.

Watch for parts of a name that may actually be titles or honorifics. If an author uses only one name to refer to a person (e.g., Batista, versus Fulgencio Batista y Zaldívar), one school of thought holds that that’s all you need to include, but Bridge often prefers to look up and include all components of the person’s name for completeness.

Bridge uses glosses to help distinguish between people with similar names (a situation that comes up often in family histories or local histories) by place, by occupation, or by relationship. She uses these to keep them straight for herself and often simply leaves them in to help the reader. Sometimes she uses a family tree program to keep track of whom the text is referring to if there are many generations of people with the same name.

Changes in name can be a complicated category, because in some cases—for instance, when a writer adopts a pseudonym—the person is adopting a different persona, and an argument can be made to index these separately. In cases where a name evolves, once again, the indexer must use judgment to decide whether to use the most recent name/title or the one used predominantly in the book.

In the case of transliteration and romanization, the spelling decision has usually been made for you by the author. An exception is a collection or anthology in which different authors, writing on overlapping topics, may have romanized the same names differently.

A theme throughout Bridge’s talk was that you must be prepared to yield tactfully to an author’s preferences, and you must be sensitive to context. For example, whereas you would usually index a celebrity under the name by which she is most commonly known, at times it may be appropriate to use her birth name if you’re indexing a book about her family history.

ISC Conference 2012, Day 1—Ebook indexes: the devil is in the details

Jan Wright, a leader in the field of ebook indexing, gave the keynote address at the Indexing Society of Canada’s annual conference this morning. We are witnessing a watershed moment, she says, in which we are trying to define what the markup for our content should look like, no matter where that content ends up—whether on paper or on a device like an e-reader or smartphone. This development is in its infancy right now, with conflicting formats on different platforms, and Wright is part of a working group of indexers actively involved in shaping the EPUB 3.0 standard to include indexing concerns.

Current ebook indexing is either nonexistent or ineffectual. Ebook indexes may be missing or static, and there are almost no ebook indexes that index at a paragraph level. They are not an integrated navigational tool, they are difficult to get to, and they are hard to browse, especially if they’re typeset in two columns.

Existing platforms try to mimic certain features of indexing, but they don’t provide all of the functionality of a traditional index. For example, iBooks Author conflates an index with a glossary and limits the function of indexes as navigational tools. Amazon’s X-Ray, currently available only on the Kindle Touch, shows all occurrences of a particular term by page, chapter, or book, but it offers mere recall (every occurrence of a term) without the precision of an index, and it lists terms in order of occurrence. In other words, it’s a brute-force attempt at indexing.

When considering ebook indexes, we have to take into account a reader’s mental patterns and search behaviours. Some readers have never read the book and need to know whether it adequately covers a given topic; some have read the book and know that their search topic is in there, but they have to find it. We must also keep in mind that reading styles differ depending on whether you’re reading for education or for pleasure, fiction or nonfiction. Physical cues, such as position in a book or location on a page, and behaviours like skimming are disrupted in ebooks. Some platforms attempt to mimic a paper metaphor, but really, paper is just another interface. The key is to figure out what each interface does best and to play to those strengths, because the paper metaphor doesn’t carry over well onto a small screen. The danger with today’s ineffective ebook indexes is that they are training readers to believe indexes are unpredictable and thus to question why they should bother using them at all.

The ideal ebook index has features that have been implemented in other contexts before and so should be completely feasible. Wright gave us a demo of what an effective ebook index should do. It should be accessible from every page; the “Find” feature should surface the best hits, as identified by the index; it should show the search results with snippets of text to offer context; it should allow cross-references to help you refine search phrasing; and it should remember that you’ve been there before and let you go back. Ebooks could also offer new functionality, like bringing up all indexed terms in a highlighted swath of text in a kind of “mind map” showing how concepts are connected.

So what can we do now? The first step, Wright says, is to get ready for the eventual use of scripts and anchors in EPUB 3.0. A goal is to develop a way to add anchors or tags to content at the paragraph level, which would allow for hyperlinking directly to the relevant content (a rough sketch follows). Once prototypes of the interactive ebook index have been developed, we must assess their usability to ascertain what’s best for readers.
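
To make the anchor idea concrete, here’s a minimal Python sketch; the “para-NNNN” id scheme is invented for illustration, and EPUB 3 packaging and scripting details are deliberately omitted:

    # Each paragraph gets a stable anchor that an index entry can point to,
    # instead of a print page number. The "para-NNNN" id scheme is invented.
    paragraphs = [
        "Indexing is an act of interpretation.",
        "Search recalls strings; an index adds precision and context.",
    ]

    content = "\n".join(
        f'<p id="para-{n:04d}">{text}</p>'
        for n, text in enumerate(paragraphs, start=1)
    )

    # An index entry in the ebook then hyperlinks straight to the paragraph:
    entry = '<a href="chapter01.xhtml#para-0002">search vs. indexes</a>'
    print(content)
    print(entry)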

A big takeaway from this keynote speech is that advocacy and outreach are essential. With the standards at a nascent, malleable stage, this is the time for indexers to have their concerns addressed as the technology develops so that indexers’ workflow can be taken into account. (But more on this in a later post.)

Credit where credit’s due

EAC-BC’s professional development co-chair, Eva van Emden, has posted some thoughts of her own about low-cost ways book and magazine publishers can help keep their freelancers happy, following my posts about the care and feeding of freelancers and maximizing your freelance editors’ marketing potential.

Her point about acknowledging a freelancer’s role with a credit line is an excellent one. Although most of my book publisher clients will credit at least the designer and substantive editor, and sometimes the copy editor, I’m well aware that doing so is not standard within the industry, and I think that, as minor a point as it seems, making it standard is something worth fighting for. Our work as editors should be invisible, but we shouldn’t be.

I appreciate that my editorial credits often pop up in Google Books results when people search for my name, and that when my name appears on the copyright page I can easily point prospective clients to Amazon’s Look Inside feature to show that I’ve worked on a particular book. Omitting an editorial credit not only hurts my ability to promote myself; it also hurts the profession. We aren’t doing ourselves any favours by essentially agreeing to pretend that editors don’t contribute to a book.

Although I understand why the proofreader may not be credited (or wish to be credited), particularly unsung are indexers, who are only very rarely acknowledged for their contributions. In a way, I can understand why—the tight timelines involved in indexing and the fact that the author and editor will modify the index often mean that the indexer does not have direct control over the quality of the final printed index, and since the index is occasionally added in at bluelines, having to modify the copyright page to add a credit would mean additional costs for the publisher. Mostly, however, I think it’s just inertia that has prevented crediting indexers from becoming standard. The indexer for Derek Hayes’s Historical Atlas of the North American Railroad (2010), Judith Anderson, was delighted to be asked if she wanted a credit, and Hayes, who designs all of his own books, has acknowledged the indexer on the copyright page of his atlases ever since. Associating the index with a name is especially important, I think, to show that a person is involved in creating the index—there’s a common misconception that indexes can be computer generated without human input, and, again, perpetuating that myth can only damage the indexing profession.

I’m not suggesting that we need some sort of overt advocacy campaign to change the way publishers operate (although organizations like the Editors’ Association of Canada and Indexing Society of Canada are in a good position to raise awareness of this issue), but if we all begin requesting credits when we work with our clients, we can begin organically to define a new standard for giving all team members their due.