Creating PDF From Web Pages

You use the Create PDF From Web Page menu command to convert Web pages to PDF. You can choose the command or click the Create PDF From Web Page tool in the File toolbar to convert Web pages hosted on Web sites, as well as HTML files stored locally on your computer or on network servers.

Web Capture provides a complex set of preferences and tools with different options for converting a single Web page, a Web site, or multiple sites to PDF. When you capture a Web site, Acrobat converts its HTML pages, text files, and images to PDF and appends each Web page to the converted document.

Converting Web sites to PDF is useful for archiving information, analyzing data, creating search indexes, and any other task that requires the information to reside locally on a computer.

In Acrobat 6 and later, Web pages containing animation, such as Flash animation, can be converted to PDF. When animated pages are captured, the animation effects can be viewed in the PDF file in any Acrobat viewer.

To understand how to capture a Web site and convert its documents to PDF, you need a basic understanding of Web pages and site structure. A Web page is a file created with the Hypertext Markup Language (HTML), and it has no fixed length.

A page may fill a single 640 × 480-pixel screen or run to a length equivalent to several hundred letter-size pages. A Web page's length is usually determined by its content and the space needed to display it. PDF pages, on the other hand, have fixed dimensions, up to a maximum of 200 × 200 inches.

You can set the PDF page size before converting the Web site from HTML to PDF. Once the page size is set, all captured Web pages adhere to it. If a Web page is larger than the PDF page, the overflow automatically flows onto additional PDF pages.
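The overflow behavior can be approximated with simple arithmetic. The sketch below is a hypothetical model, not Acrobat's actual layout engine. It assumes the Web page's rendered height is already known in inches, ignores margins, headers, and any scaling options, and rounds up to count the fixed-size PDF pages needed. The function name `pdf_pages_needed` is illustrative only.

```python
import math

# Largest PDF page dimension mentioned in the text (200 x 200 inches).
MAX_PDF_PAGE_INCHES = 200.0


def pdf_pages_needed(web_page_height_in: float, pdf_page_height_in: float) -> int:
    """Estimate how many fixed-size PDF pages a captured Web page fills.

    Simplified model (assumption): margins, headers/footers, and scaling
    options are ignored; content that overflows one PDF page continues
    on the next.
    """
    if not 0 < pdf_page_height_in <= MAX_PDF_PAGE_INCHES:
        raise ValueError("PDF page height must be > 0 and <= 200 inches")
    if web_page_height_in <= 0:
        raise ValueError("Web page height must be positive")
    # Round up: any partial overflow still needs a whole additional page.
    return math.ceil(web_page_height_in / pdf_page_height_in)


# A 30-inch-tall Web page captured onto 11-inch (letter-height) PDF pages.
print(pdf_pages_needed(30, 11))
```

Under these assumptions the example prints 3: two full letter-height pages plus one page for the remaining 8 inches.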

Consequently, a single converted Web page may result in several PDF pages.

Web site design typically follows a hierarchical order. The home page sits at the top level, and pages linked directly from it form the second level. Pages linked from the second level form the third level, and so forth.

When you capture pages with Acrobat, you can specify the number of levels to convert. Be forewarned, however: even two levels of a Web site can contain a great many pages. The number of pages and the speed of your Internet connection determine how long a capture takes.
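Capturing a fixed number of levels behaves like a breadth-first traversal with a depth limit. The following sketch models a site as a hypothetical dictionary that maps each page to the pages it links to, with the home page counted as level 1. It illustrates why the page count grows quickly with each added level; it does not describe how Acrobat actually fetches pages, and `pages_within_levels` and `SAMPLE_SITE` are made-up names.

```python
from collections import deque


def pages_within_levels(site: dict[str, list[str]], home: str, levels: int) -> list[str]:
    """Return the pages reachable from `home` within `levels` levels.

    The home page is level 1, pages it links to are level 2, and so on.
    Pages are listed in breadth-first order, and each page appears once
    even if several pages link to it.
    """
    if levels < 1:
        return []
    seen = {home}
    order = [home]
    queue = deque([(home, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= levels:
            continue  # do not follow links past the requested depth
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order


# Hypothetical site: each key links to the pages in its list.
SAMPLE_SITE = {
    "index.html": ["about.html", "products.html"],
    "about.html": ["team.html", "history.html", "index.html"],
    "products.html": ["widget.html", "gadget.html"],
}

for n in (1, 2, 3):
    print(n, len(pages_within_levels(SAMPLE_SITE, "index.html", n)))
```

With this small sample site, one level yields 1 page, two levels yield 3, and three levels yield 7; a real site with dozens of links per page grows far faster.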

You choose the number of levels to convert in the Create PDF from Web Page dialog box. Converted pages are placed in a new PDF file or appended to an existing one. One nice feature of Create PDF From Web Page is that it can find and append only new pages that have not yet been downloaded.
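Conceptually, appending only new pages amounts to a set difference between the pages found in the current capture and the pages already in the PDF. The sketch below assumes that page URLs identify pages uniquely; `pages_to_append` is a hypothetical helper, and Acrobat's own change detection may work differently.

```python
def pages_to_append(current_capture: list[str], already_in_pdf: list[str]) -> list[str]:
    """Return pages found in this capture that the PDF does not contain yet.

    Assumes each page is identified by its URL; crawl order is preserved.
    """
    existing = set(already_in_pdf)  # set membership checks are O(1)
    return [url for url in current_capture if url not in existing]


already_in_pdf = ["index.html", "about.html"]
current_capture = ["index.html", "about.html", "news.html", "contact.html"]
print(pages_to_append(current_capture, already_in_pdf))
```

Here only `news.html` and `contact.html` would be appended; the two pages already in the PDF are skipped.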

After pages are converted to PDF, you can view them in Acrobat. URL links on the converted Web pages are preserved in the resulting PDF. You can use them to append the linked pages to the PDF or to open the link destinations in your Web browser. The file types that can be converted to PDF include the following:

  • Adobe PDF format. PDF files are not converted, because they are already in PDF, but they can be downloaded with Create PDF From Web Page.
  • FDF. Form Data Format files can be captured and converted to PDF. An FDF file might be form data exported from a PDF form.
  • GIF (Graphics Interchange Format) images. GIF images, including the last frame of an animated GIF, can be captured when you convert a Web site to PDF. Like JPEG images within the HTML file, GIFs can also appear on separate PDF pages.
  • HTML documents. HTML files can be converted to PDF. The hypertext links from the original HTML file are active in the PDF document as long as the destination documents and URLs have also been converted.
  • JPEG (Joint Photographic Experts Group) images. JPEG images used in HTML documents are captured and converted to PDF. They can appear as part of a captured HTML page and can also appear individually on separate PDF pages.
  • Plain text. Any text-only documents on a Web site, such as ASCII text files, can be converted to PDF. When capturing text-only files, you can control many text attributes and page formats.
  • PNG image format. Portable Network Graphics (PNG) contained in Web pages can be converted to PDF just like GIF and JPEG images.
  • XDP. Forms created with Adobe LiveCycle Designer can be saved in XDP (XML Data Package) format, which an XFA plug-in can interpret.
  • XFDF. XML-based FDF files, typically exported from PDF forms, can be converted to PDF.
  • Image maps. Image maps created in HTML are converted to PDF. The links associated with the map are active in the PDF as long as the link destinations are also converted.
  • Password-secure areas. A password-secure area of a Web site can also be converted to PDF. In order to access a secure site, however, you need the password(s).
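The rule stated in the list for HTML documents and image maps, that a link stays active in the PDF only when its destination was also converted, can be expressed as a simple membership check. In this hypothetical sketch, `resolve_links` takes (source, destination) pairs plus the list of captured pages; the names and data shapes are assumptions for illustration, not part of Acrobat.

```python
def resolve_links(
    links: list[tuple[str, str]], captured_pages: list[str]
) -> list[tuple[str, str, bool]]:
    """Mark each (source, destination) link as active or inactive.

    Simplified rule from the text: a link stays active in the PDF only
    when its destination page was also captured.
    """
    captured = set(captured_pages)
    return [(src, dst, dst in captured) for src, dst in links]


captured = ["index.html", "about.html"]
links = [("index.html", "about.html"), ("index.html", "products.html")]
for src, dst, active in resolve_links(links, captured):
    print(f"{src} -> {dst}: {'active' if active else 'not converted'}")
```

In this example the link to `about.html` stays active, while the link to `products.html` points to a page that was never captured.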

If a Web page contains a link to another Web page or URL, the link is preserved in the converted PDF document. Links to pages, sites, and various actions work much as they do on the original Web site. However, if a PDF document contains a link to another PDF document, the converted file doesn't preserve the link.

When a site is converted, the captured pages reside in a single PDF document. To maintain links that open other PDF documents, you need to capture the destination documents as individual files or extract and save them from the converted pages.

Links to other levels are also inactive if those levels were not converted during the capture. You can append individual linked pages to the converted PDF document by clicking Web links. Options for converting individual links can appear in a dialog box that opens when you click a Web link; from there, you can append one or more linked pages to the converted document.

For animation, such as an animated GIF or animation produced by another programming application, the download contains only the last image in the sequence. A mouseover effect that changes an image is preserved in the converted PDF document as long as you download both the original image and the image associated with the mouseover.

Additionally, you can capture sounds contained in documents. You can also convert form fields to PDF; field types such as radio buttons, check boxes, list boxes, and combo boxes often convert with their data intact.

For example, you might convert a form that contains a list of countries and reuse that form field in your own PDF forms.

The Acrobat implementation of JavaScript differs considerably from the JavaScript used in Web pages, so many scripts do not work in converted Web pages.

For Web pages that contain non-English characters, you need to have the appropriate resources loaded in order to download and convert the files. Japanese characters, for example, require installation of the Far East language files and additional system files.

Using non-English characters requires additional settings for Language Scripts. These options are available on the Fonts and Encoding tab of the HTML Conversion Settings dialog box.

After you convert a Web site to PDF, you can edit the document in Acrobat as you would any other PDF. Links to pages become editable links; that is, you can modify their properties. When a site is converted to PDF, Acrobat also creates Bookmarks linked to each of the converted pages.

The first Bookmark is a regular (unstructured) Bookmark that contains the domain name of the server from which the site was captured. All Bookmarks below it are structured Bookmarks linked to the converted pages.

With the exception of specific Web applications, you can edit these Bookmarks like any other Bookmarks created in Acrobat. You can also use structured Bookmarks for page editing, moving or deleting the Bookmarks together with their associated pages.