Capturing Web Pages to PDF
To begin capturing Web pages, select From Web Page in the Create PDF task button pull-down menu, click the Create PDF from Web Page tool, or choose File >> Create PDF >> From Web Page. The Create PDF from Web Page dialog box opens.
Settings in the Create PDF from Web Page dialog box determine how a Web page is converted to PDF and how it appears in the Acrobat Document pane. This dialog box holds the first level of controls.
Additional buttons in this dialog box open other dialog boxes where you apply many more settings. If this is your first attempt at capturing a Web page, then leave the default values in the dialog box, and supply a URL in the URL field box. Click the Create button and watch the page appear in Acrobat.
Be certain the Levels field box is set to 1 on your first attempt. Entering any other value may keep you waiting for some time depending on how many pages download from additional levels.
Depending on the site, the number of different links from the site to other URLs, and the structure of the HTML pages, you often need to wade through the maze of dialog boxes that control settings for the PDF conversion from the HTML files. You don’t need to memorize all of these settings, but just use the following section as a reference when you capture Web pages.
The controls available to you in the Create PDF from Web Page dialog box begin with the URL you supply when downloading the first Web page. This URL identifies the site that hosts the pages to be converted. After you enter the URL, the remaining selections you need to set include:
- Get only x levels. Appended pages can contain more than one level. The URL link may go to another site hosted on another server or stay on the same server. Select the levels to be downloaded by clicking the up or down arrows or entering a numeric value in the field box.
- Get Entire Site. When you select this radio button, all levels on the Web site are downloaded.
- Stay on same path. When this option is enabled, all documents are confined to the directory path under the selected URL.
- Stay on same server. Links made to other servers are not downloaded when this option is enabled.
- Create. When you’re ready to convert Web pages from the site identified in the URL field box, click the Create button.
- Browse. Selecting this button enables you to capture a Web site residing on your computer or network server. Click Browse to open a navigation dialog box where you can find the directory where HTML pages are stored and capture the pages.
- Settings. Click this button to make choices for the conversion options.
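The way the level and path options narrow a capture can be illustrated with a small link filter. This is only a sketch of the rules as described above, not Acrobat's actual implementation: the function name `in_scope`, its parameters, the convention that the starting page counts as level 1, and the assumption that Stay on same path also implies the same server are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def in_scope(start_url, link_url, depth, max_levels=1, entire_site=False,
             same_path=False, same_server=False):
    """Return True if a link found at `depth` would be downloaded.

    Hypothetical model: the page at start_url is level 1, and each
    followed link adds one level.
    """
    # "Get only x levels" caps the depth unless "Get Entire Site" is chosen.
    if not entire_site and depth > max_levels:
        return False

    start, link = urlparse(start_url), urlparse(link_url)
    same_host = link.netloc.lower() == start.netloc.lower()

    # "Stay on same server": skip links that point to another host.
    if same_server and not same_host:
        return False

    # "Stay on same path": keep only documents under the start URL's directory.
    # Assumption: this option also implies the same server.
    if same_path:
        base_dir = posixpath.dirname(start.path).rstrip("/") + "/"
        if not same_host or not (link.path or "/").startswith(base_dir):
            return False

    return True


start = "http://example.com/docs/index.html"
print(in_scope(start, "http://example.com/docs/a.html", 1, same_path=True))
print(in_scope(start, "http://example.com/other/b.html", 1, same_path=True))
print(in_scope(start, "http://other.org/x.html", 1, same_server=True))
```

In this model, raising the level count or choosing Get Entire Site widens the capture, while the two Stay on options narrow it, which is why a first attempt with Levels set to 1 finishes quickly.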
An alternative to using the Create PDF from Web Page dialog box for a single Web page stored locally on your computer is to use drag and drop. Select the HTML document to convert to PDF and drag it to the top of the Acrobat window or program icon. If you have multiple HTML files to convert, you can also use the Create From Multiple Files menu command.
Although it may not be entirely practical, Web designers who are more comfortable with WYSIWYG (What You See Is What You Get) HTML editors than layout applications may find that creating layout assemblies in their favorite editor is beneficial. You can’t get control over image sampling, but you can achieve a layout for screen display.
Create the layout in a program such as Adobe GoLive, Microsoft FrontPage, or Macromedia Dreamweaver. When you’re finished with the pages, launch Acrobat and select Create PDF from Web Page. Click the Browse button in the Create PDF from Web Page dialog box and navigate to your HTML files.
Click the Create button to convert your pages to PDF. You can send these pages as an e-mail attachment to a colleague or print them to your desktop printer. It may sound a little crazy, but some people just don’t like to leave familiar ground.
Even though you may browse to a folder on your hard drive and convert a local Web site to PDF, any external links launch your Internet connection and capture pages on another site. If you want only local pages converted, be certain to enable the Stay on same server option in the Create PDF from Web Page dialog box.
Clicking the Settings button in the Create PDF from Web Page dialog box opens the Web Page Conversion Settings dialog box, which has two tabs on which you supply file conversion attributes and page layout settings. The General tab deals with the file attribute settings.
Under the File Description heading, the supported file types are listed. Select any file type in the list. Only the HTML and Plain Text file types offer more options, which you access by clicking the Settings button on the right side of the dialog box. If you select a file type other than HTML or Plain Text, the Settings button is grayed out. At the bottom of the dialog box are four PDF Settings check boxes:
- Create bookmarks. When you enable this option, pages converted to PDF have structured Bookmarks created for each page captured. The page’s title is used as the Bookmark name. If the page has no title, Acrobat supplies the URL as the Bookmark name.
- Place headers & footers on new pages. A header and footer are placed on all converted pages if this option is enabled. The header consists of the page title that appears in the <TITLE> tag of the HTML file. The footer retrieves the page’s URL, the page name, and a date stamp for the date and time the page was downloaded.
- Create PDF tags. The structure of the converted PDF matches that of the original HTML file. Items such as list elements, table cells, and similar HTML tags are preserved. The PDF document contains structured Bookmarks for each of the structured items. A tagged Bookmark then links to a table, list, or other HTML element.
- Save refresh commands. When this option is enabled, a list of all URLs in the converted PDF document is saved. When the capture is refreshed, these URLs are revisited and new PDF pages are converted for any new pages added to the site. If you want to append new pages to the PDF, you must enable this item for Acrobat to update the file.
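The Bookmark naming rule and the footer contents described in the list above can be sketched in a few lines. The function names, the separator, and the date format below are assumptions made for illustration; the text does not specify how Acrobat formats the footer.

```python
from datetime import datetime


def bookmark_name(title, url):
    """Use the page title as the Bookmark name, falling back to the URL."""
    title = (title or "").strip()
    return title if title else url


def footer_text(url, page_name, downloaded_at):
    """Combine the URL, page name, and download date stamp.

    The " | " separator and the date format are hypothetical.
    """
    return f"{url} | {page_name} | {downloaded_at:%Y-%m-%d %H:%M}"


print(bookmark_name("Company Home", "http://example.com/"))
print(bookmark_name("", "http://example.com/untitled.html"))
print(footer_text("http://example.com/a.html", "a.html",
                  datetime(2005, 1, 2, 3, 4)))
```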
If you look again at the top of the dialog box, the two items for which you can edit additional settings include HTML and Plain Text files. When you select HTML in the File Description list and click the Settings button, a dialog box opens for HTML Conversion Settings.
The two tabs in the HTML Conversion Settings dialog box are the General tab and the Fonts and Encoding tab. The first group of settings handles the general attributes assigned to the page layout:
- Default Colors. Use this option to assign new default colors for Text, Background color, Links, and Alt Text. You can choose a color from a set of preset colors, or choose the option for custom colors, from a palette that opens after you click the swatch.
- Force These Settings for All Pages. HTML pages may or may not have assigned color values. When this check box is disabled, only the items with no assigned color take the colors set in the Default Colors section. When this check box is enabled, all colors, including HTML-assigned colors, are changed to the Default Colors.
- Background Options. These include settings for the background colors used on the Web page, tiled image backgrounds, and table cells. When these check boxes are enabled, the original design is preserved in the PDF document.
If you find table cells, background colors, and tiled background images distracting when you’re reading Web pages either in a browser or converted to PDF, disable the Background Options check boxes before converting to PDF. The original design is changed, but the files are easily legible for both screen reading and when printed.
- Line Wrap. Enables you to choose a maximum distance for word-wrapping the text in an HTML file. When the <PRE> tag is used in HTML, the text is preformatted to preserve line breaks and indents. The field box for this option enables you to control the maximum length for text lines in inches.
- Multimedia. Enables you to set options for handling multimedia clips. From the pull-down menu you can choose from three options:
  - Disable multimedia capture. Movie and sound clips are ignored. Only the Web pages are converted to PDF and no links to the media are included in the capture.
  - Embed multimedia content when possible. Acrobat viewers 6 and above enable you to embed multimedia clips in the PDF document. Selecting this option captures the Web page and embeds any multimedia files that meet the compatibility requirements of Acrobat. Be aware that embedded multimedia files are available only to Acrobat viewers 6.0 or later.
  - Reference multimedia content by URL. The captured Web page contains a link to the URL where the multimedia files are hosted.
- Convert Images. If checked, graphics are converted. If unchecked, the graphics are not converted.
- Underline Links. Displays the text used in an <A HREF> tag with an underline. This option can be helpful if the text for a link is not a different color than the body copy.
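The interaction between Default Colors and Force These Settings for All Pages amounts to a simple precedence rule, sketched below. The color values in `DEFAULT_COLORS` and the function name `resolve_color` are hypothetical placeholders, not Acrobat's actual defaults or API.

```python
# Hypothetical default colors; the real defaults are whatever you set
# in the Default Colors section of the dialog box.
DEFAULT_COLORS = {
    "text": "#000000",
    "background": "#FFFFFF",
    "links": "#0000FF",
    "alt_text": "#808080",
}


def resolve_color(item, html_color, defaults=DEFAULT_COLORS, force=False):
    """Pick the color used for `item` in the converted page.

    html_color is the color assigned in the HTML, or None if the page
    assigns no color for this item.
    """
    # Forced defaults override everything, including HTML-assigned colors.
    if force or html_color is None:
        return defaults[item]
    return html_color


print(resolve_color("text", "#333333"))              # HTML color kept
print(resolve_color("text", None))                   # default fills the gap
print(resolve_color("text", "#333333", force=True))  # default forced
```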
After you choose the General settings, click the Fonts and Encoding tab to open the Fonts and Encoding portion of the HTML Conversion Settings dialog box. Options for font handling include the following:
- Input Encoding. Sets the Web page text encoding for body text, heads, and preformatted text. The default is consistent with the language you install. Other supported languages include Chinese, Japanese, Korean, and Unicode characters.
- Body Text, Headings, and Pre-Formatted text. The items appearing under the Language Specific Font Settings section contain editable fields for changing the text encoding and fonts used for the respective items. You make global changes by clicking the Change button, which opens a dialog box for font selections.
You choose fonts from all the fonts installed in your system. A pull-down menu is available for body text, headings, and preformatted text. You can assign fonts individually to each item.
- Base Font Size. You choose font sizes for each of the three text items from pull-down menus or by editing the field boxes.
- Embed Platform Fonts When Possible. Fonts used to view the pages are embedded when the check box is enabled. File sizes are larger with embedded fonts, but file integrity is preserved and the need for font substitution is eliminated. Embedded fonts ensure that the PDF documents display and print precisely as seen in the Web browser.
After choosing all the settings for how to convert HTML files, click OK in the HTML Conversion Settings dialog box. The dialog box disappears and returns you to the Web Page Conversion Settings dialog box. The other file format to which you apply settings is Plain Text files. Select Plain Text in the File Description list and click the Settings button.
The Plain Text Conversion Settings dialog box opens. Choices in this dialog box are similar to the choices available in the HTML Conversion dialog box for the Color, Font, and Line Wrap items, which were just discussed. Line Wrap behaves similarly to the Pre-Formatted text. One additional item appears in this dialog box:
- Text Layout. For large bodies of text, the number of lines on the page can be user-defined. Depending on point size, the standard number of lines on an 8.5 × 11–inch letter-sized page is 66. The default in Acrobat is 60 when the Limit Lines per Page check box is enabled. You can change the number of lines by editing the field box only after selecting the Limit Lines per Page check box.
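The line counts above follow from simple arithmetic: at 12 points per line and 72 points per inch, an 11-inch page holds 66 lines. The sketch below shows that calculation and how a lines-per-page limit splits a long text file into pages. The function names are hypothetical, the 12-point line spacing is an assumption, and margins are ignored.

```python
def lines_that_fit(page_height_in, leading_pt=12):
    """Lines that fit on a page, assuming 72 points per inch and no margins."""
    return int(page_height_in * 72 // leading_pt)


def paginate(lines, lines_per_page=60):
    """Split a list of text lines into pages of at most lines_per_page lines."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


text = [f"line {n}" for n in range(1, 151)]  # a 150-line text file
pages = paginate(text)                       # limit of 60 lines per page
print(lines_that_fit(11))                    # lines on an 11-inch page
print(len(pages), [len(p) for p in pages])
```

With the limit of 60, a 150-line file produces three pages, the last one holding the remaining 30 lines.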
You can also make font choices for plain text files. Click the Fonts and Encoding tab to reveal more options in the Plain Text Conversion Settings dialog box. Options available in the Fonts and Encoding tab are similar to the font options you have with HTML page conversions.
Make choices in this dialog box for text encoding, text font, and whether the fonts are to be embedded in the resulting PDF. After making changes in the Page Layout dialog box for Plain Text documents, click OK. Once again you return to the Web Page Conversion Settings dialog box.
All the settings discussed on the previous few pages were related to the General tab. The Web Page Conversion Settings dialog box has one more tab. Page Layout offers you options for describing the physical size and orientation of converted pages. Click the Page Layout tab to display these options in the Web Page Conversion Settings dialog box.
Page layout attributes enable you to force long HTML pages into more standard page sizes for viewing or printing. If an HTML page spans several letter-sized pages, you can determine where the page breaks occur and the orientation of the converted pages. Many options are available in the Page Layout tab of the Web Page Conversion Settings dialog box:
- Page Size. This pull-down menu provides a variety of default page sizes. Acrobat supports page sizes from 1-inch square to 200-inches square. You can supply any value between the minimum and maximum page sizes in the Width and Height field boxes below the pull-down menu to override the fixed sizes available from the pull-down menu.
To make changes in the field boxes, edit the text, click the up and down arrows in the dialog box, or click in a field box and press the up and down arrow keys on your keyboard. Press Tab and Shift+Tab to toggle between the field boxes.
- Margins. In the four Margins field boxes, you can set the amount of space on all four sides of the PDF page before any data appear. You make the changes for the margin sizes via the same methods described in the preceding bullet.
- Sample Page. The thumbnail at the right side of the dialog box displays a view of the converted page when sizes are established for the Width, Height, and Margin settings.
- Orientation. You choose portrait or landscape orientation from the radio button options. If a site’s Web pages all conform to screen sizes such as 640 × 480, you might want to change the orientation to landscape.
- Scale wide contents to fit page. Once again, because HTML documents don’t follow standard page sizes, images and text can be easily clipped when these documents are converted to a standard size. When this option is enabled, the page contents are reduced in size to fit within the page margins.
- Switch to landscape if scaled smaller than. The percentage value is user definable. When the page contents must be scaled below the percentage specified in the field box to fit on a portrait page, the PDF page is automatically converted to a landscape orientation.
The default is 70 percent. If the default value is used, any portrait page scaled lower than 70 percent is auto-switched to landscape, as long as Portrait is the selected orientation.
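The scaling and auto-landscape rules can be modeled as a short calculation. This is a sketch under stated assumptions, not Acrobat's algorithm: the function `layout_page`, the half-inch default margins, and the idea that the scale factor is the printable width divided by the content width are all hypothetical. Only the 1-inch to 200-inch size limits and the 70 percent default come from the text.

```python
MIN_SIDE_IN, MAX_SIDE_IN = 1.0, 200.0  # page size limits stated in the text


def layout_page(content_width_in, page_w=8.5, page_h=11.0,
                margin_left=0.5, margin_right=0.5,
                scale_to_fit=True, landscape_threshold=0.70):
    """Return (orientation, scale) for content of the given width.

    Assumes the Portrait radio button is selected and that margins
    (hypothetical 0.5-inch defaults) stay the same after rotation.
    """
    for side in (page_w, page_h):
        if not MIN_SIDE_IN <= side <= MAX_SIDE_IN:
            raise ValueError("page sides must be between 1 and 200 inches")

    def fit(width):
        printable = width - margin_left - margin_right
        return min(1.0, printable / content_width_in)

    if not scale_to_fit:
        return ("portrait", 1.0)  # wide content may be clipped

    scale = fit(page_w)
    if scale < landscape_threshold:
        # Too much reduction on a portrait page: rotate and rescale.
        return ("landscape", fit(page_h))
    return ("portrait", scale)


print(layout_page(6.0))   # fits as is
print(layout_page(10.0))  # scaled to 75 percent, stays portrait
print(layout_page(12.0))  # would need 62.5 percent, so switches to landscape
```

In this model, a 12-inch-wide layout would need 62.5 percent scaling on a portrait letter page, which is below the 70 percent default, so the page switches to landscape and is scaled against the 11-inch side instead.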
After you choose all settings and options in all the dialog boxes pertaining to converting Web sites to PDF, you can revisit the Create PDF from Web Page command from any one of the three methods discussed earlier. As pages are downloaded and converted to PDF, the Download Status dialog box opens displaying, appropriately, the download status.
After the first page downloads, the Download Status dialog box moves to the background behind the converted Web pages. The dialog box actually remains open, but hides behind the PDF pages being converted as the download continues.
If you want to bring the Download Status dialog box to the foreground, choose Advanced >> Web Capture >> Bring Status Dialogs to Foreground. The dialog box opens in the foreground while Acrobat continues to convert pages.