Creating Search Indexes

In order to search an index file, you must have one present on your computer, network server, some media storage device, or embedded in a PDF. When you install an Acrobat viewer, a help index file is included during your installation. You can use this file to search for words contained in any of the help documents.

If you want to search your own files, you need to create an index. To create an index file, you use Acrobat Catalog. Acrobat Catalog is available only in Acrobat Professional. Search indexes can be used by all Acrobat viewers including Adobe Reader.

To launch Acrobat Catalog from within Acrobat Professional, choose Advanced >> Document Processing >> Full Text Index with Catalog. Catalog is robust and provides many options for creating and modifying indexes. After a search index is created, users of any Acrobat viewer can use it to find words in the Search window.

However, before you begin to work with Acrobat Catalog, you need to take some preliminary steps to be certain all your files are properly prepared and ready to be indexed.

Preparation involves creating PDFs with all the necessary information to facilitate searches. All searchable document description information needs to be supplied in the PDF documents at the time of PDF creation or by modifying PDFs in Acrobat before you begin working with Catalog.

For workgroups and multiple user access to search indexes, this information needs to be clear and consistent. Other factors, such as naming conventions, file locations, and performance optimization, should all be thought out and planned prior to creating an index file.

Adding document descriptions is not a requirement for creating search indexes. You can index files without any information in the document description fields. Adding document descriptions merely adds more relevant information to your PDF documents and aids users in finding search results faster.

Document description information should be supplied in all PDF files to be searched. As discussed earlier, all document description data are searchable. Spending time creating document descriptions and defining the field types for consistent organization will facilitate searches performed by multiple users.

The first of the planning steps is to develop a flow chart or outline of company information and the documents to be categorized. This organization may or may not already be in place where you intend to develop a PDF workflow.

If your information flow is already in place, you may need to make some modifications to coordinate nomenclature and document identity with the document summary items in Acrobat. Document descriptions contained in the Title, Subject, Author, and Keywords fields should be consistent and intuitive.

They should also follow a hierarchy consistent with the company’s organizational structure and workflow. The document summary items should be mapped out and defined. When preparing files for indexing, consider the following:

  • Title. Title information might be thought of as the root of an outline—the parent statement, if you will. Descriptive titles should be used to help users narrow searches within specific categories. The Title field can also be used to display the title name at the top of the Acrobat window when you select viewing titles in the Initial View properties.
  • Author. Avoid using proper names for the Author field. Personnel change in companies and roles among employees change. Identify the author of PDF documents according to departments, work groups, facilities, and so on.
  • Subject. If the Title field is the parent item in an outline format, the Subject would be a child item nested directly below the title. Subjects might be considered subsets of titles. When creating document summaries, be consistent.

Don’t switch Subject and Title or Subject and Keywords information back and forth between documents. If an item, such as employee grievances, is listed as a Subject in some PDFs and as a Title in others, end users will be confused by the ordering, and searches will become unnecessarily complicated.

  • Keywords. If you have a forms identification system in place, be certain to use form numbers and identity as part of the Keywords field. You might start the Keywords field with a form number and then add additional keywords to help narrow searches.

Be consistent and always start the Keywords field with forms or document numbers. If you need to have PDF author names, add them here in the Keywords fields. If employees change roles or leave the company, the Author fields still provide the information relative to a department.
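The field guidelines above can be checked mechanically before indexing. The following is a minimal Python sketch, not part of Acrobat: it assumes the document descriptions have already been collected into plain dictionaries, and both the form-number pattern (such as "HR-101") and the list of department names are hypothetical conventions chosen for illustration.

```python
import re

# Hypothetical convention: a form number looks like "HR-101".
FORM_NUMBER = re.compile(r"[A-Z]{2,}-\d+")

REQUIRED_FIELDS = ("Title", "Subject", "Author", "Keywords")


def check_description(desc, departments):
    """Return a list of problems found in one document description dict."""
    problems = []
    # Every description field should be filled in.
    for field in REQUIRED_FIELDS:
        if not desc.get(field, "").strip():
            problems.append(f"{field} is empty")
    # Author should name a department, not a person.
    author = desc.get("Author", "").strip()
    if author and author not in departments:
        problems.append(f"Author '{author}' is not a known department")
    # Keywords should begin with a form or document number.
    first = desc.get("Keywords", "").split(",")[0].strip()
    if first and not FORM_NUMBER.fullmatch(first):
        problems.append("Keywords should start with a form number")
    return problems
```

Running the check over every description in a collection gives a quick list of files that need attention before Catalog is run.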

Legacy PDF files used in an organization may have been created without a document description, or you may reorganize PDFs and want to change document summaries. To update these documents efficiently, you can create a batch sequence that changes multiple PDF files and then run the sequence.

Place the PDFs whose document summaries are to be edited in a single folder. In the Edit Sequence dialog box, select the items to change and edit each document summary item. Run the sequence to update the entire folder of PDFs.

The content, filenames, and location of PDFs to be cataloged contribute to file structure items. All the issues related to file structure must be thought out and appropriately designed for the audience that you intend to support. The important considerations are as follows:

  • File naming conventions. Names provided for the PDF files are critical for distributing documents among users. If filenames get truncated, then either Acrobat Search or the end user will have difficulty finding a document when performing a search.

This is of special concern to Macintosh users who want to distribute documents across platforms. The best precaution is to always use standard DOS file-naming conventions. The standard eight-character maximum filename, with no more than three-character file extensions (filename.ext), will always work regardless of platform.

  • Folder names. Folder names should follow the same conventions as filenames. Macintosh users who want to keep filenames longer than standard DOS names must limit folder names to eight characters and no more than a three-character file extension for cross-platform compliance.
  • File and folder name identity. In previous versions of Acrobat and Acrobat Catalog you had to avoid using ASCII characters from 133 to 159 for any filename or folder name. Acrobat Catalog in earlier versions did not support some extended characters in this range, and you could experience problems when using files across platforms.

Now, in Acrobat 8 with its broader support for non-English languages, you don’t need to worry about file and folder names that use special characters.

  • Folder organization. Folders to be cataloged should have a logical hierarchy. Copy all files to be cataloged to a single folder or a single folder with nested folders in the same path. When nesting folders, be certain to keep the number of nested folders to a minimum.

Deeply nested folders slow down searches, and path names longer than 256 characters create problems. Extended characters from ASCII 133 to ASCII 159 used to be a problem when using Acrobat Catalog. In Acrobat 8 you’ll find support for creating index files and searching files and folders containing these characters.

  • Folder locations. Windows users must keep the location of folders on a local hard drive or a network server volume. Although Macintosh users can catalog information across computer workstations, creating separate indexes for files contained on separate drives would be advisable. Any files moved to different locations make searches inoperable.
  • PDF structure. File and folder naming should be handled before creating links and attaching files. If filenames are changed after the PDF structure has been developed, many links become inoperable. Be certain to complete all editing in the PDF documents before cataloging files.
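The naming and nesting guidelines in this list can be turned into a simple pre-flight check. The sketch below is an illustration only: the 8.3 pattern is a simplified view of DOS naming (letters, digits, underscore, and hyphen), paths are assumed to be relative to the folder being cataloged and written with forward slashes, and the nesting limit of three folders is a hypothetical threshold, since the text only advises keeping nesting to a minimum.

```python
import re
from pathlib import PurePosixPath

# Simplified 8.3 rule: 1-8 name characters, optional dot plus 1-3 extension characters.
DOS_NAME = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9]{1,3})?")
MAX_PATH_LENGTH = 256  # limit mentioned in the text
MAX_DEPTH = 3          # hypothetical nesting limit; adjust for your collection


def check_path(relative_path):
    """Return a list of warnings for one file path relative to the index folder."""
    warnings = []
    parts = PurePosixPath(relative_path).parts
    # Folder names follow the same convention as filenames.
    for part in parts:
        if not DOS_NAME.fullmatch(part):
            warnings.append(f"'{part}' is not a DOS-style 8.3 name")
    # Everything except the last part is a folder.
    depth = len(parts) - 1
    if depth > MAX_DEPTH:
        warnings.append(f"nested {depth} folders deep")
    if len(relative_path) > MAX_PATH_LENGTH:
        warnings.append("path is longer than 256 characters")
    return warnings
```

For example, `check_path("policies/hr/griev01.pdf")` returns an empty list, while a name such as `Employee Grievance Form.pdf` is flagged.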

Searches run very quickly if you take a little time to create the proper structure and organization. If you don’t avoid some pitfalls in the way that you organize files, searches run much more slowly. A few considerations include the following:

  • Optimize PDF files. Optimization should be performed on all PDF files as one of the last steps in your workflow. Enable the Save As optimizes for Fast Web View option, found in the General category of the Preferences dialog box, and run the PDF Optimizer located in the Advanced menu (Acrobat Professional only). Optimization is especially important for searches to be performed from CD-ROM files.
  • Break up long PDF files. Books, reports, essays, and other documents that contain many pages should be broken up into multiple PDF files. If you have books to be cataloged, break up the books into separate chapters. Acrobat Search runs much faster when finding information from several small files. It slows down when searching through long documents.
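When breaking a book into chapters, it can help to plan the page ranges and output filenames before extracting pages. The sketch below only computes that plan and does not touch any PDF; the function name, the "chap" filename prefix, and the example page numbers are assumptions made for illustration.

```python
def plan_chapter_files(chapter_starts, total_pages, prefix="chap"):
    """Map sorted, 1-based chapter start pages to (filename, first_page, last_page)."""
    plan = []
    for i, first in enumerate(chapter_starts):
        if i + 1 < len(chapter_starts):
            # A chapter ends one page before the next chapter starts.
            last = chapter_starts[i + 1] - 1
        else:
            # The final chapter runs to the end of the book.
            last = total_pages
        # Two-digit numbering keeps filenames short and sorted, e.g. chap01.pdf.
        plan.append((f"{prefix}{i + 1:02d}.pdf", first, last))
    return plan
```

For a 60-page book with chapters starting on pages 1, 15, and 40, the plan lists chap01.pdf (pages 1-14), chap02.pdf (pages 15-39), and chap03.pdf (pages 40-60).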

You can have multiple indexes for various uses and different workgroups. Personnel may use one index for department matters, another for company-wide information, and perhaps another for a research library. In a search, all relevant keywords will appear from indexes loaded in the Index Selection dialog box.

When using multiple indexes, employees may forget the structure of document summaries and what index is needed for a given search. You can create readme files and index help files to store key information about what search words can be used to find document summaries.

You can create a single PDF file, text files, or multiple files that serve as help. The figure below shows an example of a PDF help file that might be used to find documents related to a company’s personnel policies, procedures, and forms. In the top-right corner of the figure, the document summary for the help file is listed.

[Figure: Acrobat 8 help file example]

The Title fields for this company are broken into categories for policies, procedures, forms, and charts. The Subject fields break down the title categories into specific personnel items, and the Author fields contain the department that authored the documents. Form numbers appear for all Keywords fields.

When creating help files that guide a user in searching document information, use a common identifier in the Subject, Author, and Keywords fields that is reserved only for finding help files. In the figure above, the identifier is “Table.”

Whenever a user searches for the word “table” in the Author field, the only results returned in the Search Results dialog box will be help files. When using the Title and Author fields together, a user can find a specific help file for a given department. In the previous example, the Title is HR and the Author is Table.

When these words are searched for in the document information, the help file for the HR department is returned in the Search Results. If you reserve keywords for the document summary fields, any employee can easily find information by remembering only a few keywords.
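The reserved-identifier idea can be modeled in a few lines of Python. This is only a sketch of the lookup logic, not how Acrobat Search works internally; the filenames and the second help file for an Accounting department are made up for illustration.

```python
def search_fields(documents, **criteria):
    """Return documents whose description fields contain every given word (case-insensitive)."""
    matches = []
    for doc in documents:
        # Each criterion names a field and a word that must appear in that field.
        if all(word.lower() in doc.get(field, "").lower().split()
               for field, word in criteria.items()):
            matches.append(doc)
    return matches


# Hypothetical document summaries: two help files and one ordinary document.
documents = [
    {"File": "hrhelp.pdf", "Title": "HR", "Author": "Table"},
    {"File": "acchelp.pdf", "Title": "Accounting", "Author": "Table"},
    {"File": "griev01.pdf", "Title": "Policies", "Author": "HR"},
]

# Searching the Author field for the reserved word finds only help files.
help_files = search_fields(documents, Author="table")

# Adding the Title narrows the result to one department's help file.
hr_help = search_fields(documents, Title="HR", Author="Table")
```

Because "Table" never appears as the Author of an ordinary document, the first search cannot return anything but help files.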