How the Web Works
The World Wide Web isn’t a particular place on the Internet, nor is it a particular computer or something that you can “log into.” Instead, the best way to describe the Web is as a service on the Internet. Using certain protocols, computers that are designated as Web server computers—because they’re connected to the Internet and run Web server software—can respond to requests from client computers running Web browser software.
Every computer on the Internet has an address. When a request comes into a Web server computer from a particular address, it responds by sending the requested file back to that address. When the browser application receives that file, it reacts accordingly, generally by displaying the file as a Web page, image, or multimedia element within the browser’s own window.
In other cases, the browser might recognize that it can’t handle the file, so it hands it off to a helper application or saves the file in a designated place on the client computer’s hard disk. During a typical Web surfing session, this back-and-forth communication happens repeatedly for each page: not only do the words need to be downloaded, but so does every image and multimedia feed (sounds, digital movies, and so on).
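The display-or-hand-off decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name browser_action and the three outcome strings are made up for this example, and the file type is guessed from the file extension, whereas real browsers normally rely on the content type reported by the Web server.

```python
import mimetypes


def browser_action(filename):
    """Return a rough guess at what a browser would do with a file.

    Hypothetical helper for illustration only.
    """
    # Assumption: guess the type from the file extension. Real browsers
    # normally use the content type that the Web server reports.
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown type: nothing to display it with, so save it.
        return "save to disk"
    if mime_type.startswith(("text/", "image/")):
        # Pages and images can be shown in the browser's own window.
        return "display in browser window"
    # Anything else goes to a helper application.
    return "hand off to helper application"


for name in ("index.html", "logo.gif", "archive.zip"):
    print(name, "->", browser_action(name))
```

The split between "text or image" and "everything else" is deliberately crude; it only mirrors the three outcomes the text describes.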
This is possible because both computers are connected to the Internet. They both recognize a protocol for transmitting and receiving commands, and the client computer can recognize the language that’s used to re-create and display the Web page in the Web browser application. So, we’re dealing with three different protocols or languages here. The first of these protocols is the Transmission Control Protocol/Internet Protocol, or TCP/IP.
This is how computers are connected to one another on the Internet. Each computer is given an address, which is then used to identify the computer and enable it to send commands and data from one place to another. If you have a computer that you plan to use on the Internet, you need to establish a TCP/IP connection for that computer, whether it’s via a telephone modem, cable modem, DSL connection, or some other means, such as a corporate or institutional network-based connection.
After you have that TCP/IP connection up and running, you can launch a Web browser application, which uses the Hypertext Transfer Protocol (HTTP) to trade commands and communication. Then, the Web server sends Hypertext Markup Language (HTML) documents to your Web browser, which displays them to you. Let’s look at these two, the HTTP protocol and the HTML language, more closely.
The Hypertext Transfer Protocol (HTTP) is the underlying protocol that makes it possible for Web browsers and Web servers to communicate. A fairly simple protocol, it isn’t terribly interesting to most Web designers because it’s used exclusively by the Web browser and Web server computers to communicate.
So, you don’t necessarily need to know the intimate details of how HTTP works in order to be a Web author. But the basics don’t hurt. The Web browser requests a connection, via an HTTP command, with the Web server computer. If the Web server computer is able to grant the request, the Web browser then requests particular files that it believes are available on that Web server computer.
If the files are available, the Web server computer sends them to the Web browser, which can then display the files if it’s capable of doing so. Note that HTTP isn’t the only protocol that’s in use on the Internet. There are also protocols designed for transferring files (File Transfer Protocol, or FTP) or transferring email messages (Post Office Protocol, Simple Mail Transfer Protocol), among others.
In fact, there are even secure variations on HTTP, such as HTTPS (the protocol behind https:// URLs), which uses an encrypted, or coded, communication between the Web browser and server to transmit and receive secure information, such as Web commerce information and credit card numbers. But the protocol tends to work behind the scenes. Indeed, the only place where you really need to worry about the protocol you’re using is when you’re creating hyperlinks.
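To make the request-and-response exchange described above a little more concrete, here is a minimal Python sketch of what the text of a simple HTTP request and reply can look like. The helper names build_get_request and parse_response, the host www.example.com, and the canned reply are assumptions made up for this illustration; nothing is sent over the network, and a real browser sends many more header lines.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # the file the browser is asking for
        f"Host: {host}",         # which Web server the request is for
        "Connection: close",
        "",                      # a blank line ends the request header
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_get_request("www.example.com", "/index.html")

# A canned reply standing in for what a Web server might send back.
canned_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
status, headers, body = parse_response(canned_response)
print(status, headers["content-type"])
print(body)
```

A status code of 200 means the server was able to grant the request; the body that follows the blank line is the file itself, which the browser then displays.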
The Hypertext Markup Language (HTML) is a series of standard codes and conventions designed to create pages and emphasize text for display in programs such as Web browsers. Using HTML, you can create a Web page that includes formatted text and commands that cause the Web browser to load and display images or other multimedia elements (movies, sounds, and animations) on that page.
HTML’s name gives you a hint as to what it is—it’s a markup language, which distinguishes it, primarily, from a programming language. In general terms, HTML is a set of instructions that tells a Web browser how certain text and images should be displayed on the page. In most cases, this is done using commands that dictate the organization of a document.
Even the images and multimedia you see on a Web page are part of the markup of HTML. Whereas in a Word document images are embedded as part of the document, an HTML document points to the location of image files, which must be individually available alongside it. Every HTML document is nothing more than a plain text document and no images or multimedia are a part of that HTML document.
So, when a Web browser reads an HTML document, it also reads instructions for loading and positioning any images or multimedia files that you’ve decided to include. Among those instructions, an HTML document almost invariably includes instructions for creating a hyperlink—a link to other HTML documents.
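Because an HTML document is plain text that merely points to its images and to other pages, a program can read those pointers straight out of the markup. The sketch below uses Python's standard html.parser module to do so. The sample page, its file names (images/logo.gif, products.html), and the class name ResourceCollector are invented for this illustration.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect image locations and hyperlink targets from HTML text."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "src" in attributes:
            # The image is not in the document, only its location is.
            self.images.append(attributes["src"])
        elif tag == "a" and "href" in attributes:
            # A hyperlink pointing to another document.
            self.links.append(attributes["href"])


# A made-up page: plain text containing markup, nothing more.
page = """<html>
<body>
<h1>Welcome</h1>
<p>Here is our logo: <img src="images/logo.gif" alt="Logo"></p>
<p>Visit the <a href="products.html">products page</a>.</p>
</body>
</html>"""

collector = ResourceCollector()
collector.feed(page)
print("Images to fetch:", collector.images)
print("Links to follow:", collector.links)
```

A browser does something similar on a much larger scale: for each image location it finds, it sends another request to the server.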
One of the keys to HTML—and, by extension, a key to the way the Web works—is its support of hypertext links. Using special commands in HTML, a Web author can change certain text to make it “clickable.” When the user clicks hypertext, that user’s Web browser generally responds by loading a new Web page. (I say “generally” because sometimes clicking hypertext will cause a helper application such as RealAudio or Telnet to appear, or it may cause a file to be downloaded to your hard disk.)
However, not all links are necessarily text—images can also be clickable. In that case, it’s more appropriate to call the link a hyperlink instead of hypertext, but it’s not terribly important. The terms are basically interchangeable.
What’s more important is recognizing what a big part hyperlinks play in Web publishing and the World Wide Web. Nearly every page on the Web links to other pages or is linked from them. On a smaller scale, hyperlinks make it important for you to consider the organization of your site. They also make it possible for your Web page to take part in a larger world of related pages.
How is it possible to link to all these pages? Using HTML markup, you simply create a link that points the Web browser to another address on the Web. Every page on the Web (and most other Internet resources) has a special address that uniquely identifies it, enabling the Web browser to specifically request that page. Those addresses are called Uniform Resource Locators (URLs).
Most Internet services have some sort of addressing scheme so you can find a particular resource easily. For each service, the format of these addresses tends to be a bit different. For example, you would send an email message to an America Online account using the address someone@aol.com in an email application.
On the other hand, to access the AOL public FTP site (where you might download the AOL software application), you would enter the following address in your FTP application: ftp.aol.com. Web browsers are capable of accessing many different types of Internet services, and the Web is about accessing individual documents.
So, the URL is a combination of an address, such as ftp.aol.com, and some additional elements that allow you to specify the type of Internet service and the particular document you’d like to retrieve. That way, URLs can be used to access, by address, almost any document or service that’s accessible via a Web browser. A URL follows the format protocol://internet_address/path/filename.ext or protocol:internet_address. Here’s an example of a URL to access a Web document: http://www.microsoft.com/windows/index.html
Look at this address carefully. According to the format for a URL, http:// is the protocol and www.microsoft.com is the address of Microsoft’s Web server computer. That’s followed by a slash (/) to suggest that a path statement is coming next. The path statement tells you that you’re looking at the document index.html, located in the directory windows. The two basic advantages to URLs are:
- First, they enable you to indicate explicitly the type of Internet service involved. HTTP, for example, indicates the Hypertext Transfer Protocol. However, a URL could easily include a different protocol. You’ll look at this part of the URL in a moment.
- Second, the URL system of addressing gives every single document, program, and file on the Internet its own unique address.
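The breakdown of the example URL above can be checked with Python's standard urllib.parse module. This is just a sketch for illustration; the variable names are arbitrary, and Python calls the protocol part the "scheme" and the server address the "netloc".

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.microsoft.com/windows/index.html"
parts = urlsplit(url)

# Split the path statement into its directory and document name.
directory, filename = posixpath.split(parts.path)

print("Protocol:      ", parts.scheme)
print("Server address:", parts.netloc)
print("Directory:     ", directory)
print("Document:      ", filename)
```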
HTTP is the protocol most often used by Web browsers to access HTML pages. The list below shows some of the other protocols that can be part of a URL.
- http:// - HTTP (Web) servers
- https:// - Some secure HTTP (Web) servers
- file:// - HTML documents on your hard drive
- ftp:// - FTP servers and files
- gopher:// - Gopher menus and documents
- news:// - A Usenet newsgroup server
- news: - A particular Usenet newsgroup
- mailto: - An email message addressed to a particular email address
- telnet: - A remote Telnet (login) server
By entering one of these protocols, followed by an Internet server address and a path statement, you can access nearly any document, directory, file, or program available on the Internet or on your own hard drive. As you can see, URLs extend beyond Web servers to other types of Internet protocols.
FTP servers are used specifically for transferring files (as opposed to viewing those files). Gopher servers were the (largely defunct) precursors of Web servers that made plain-text documents available for retrieval. A Telnet server is used for remote login connections, where you enter a username and password, and then use command-line syntax to accomplish things on the server computer.
Most Web browsers can display FTP site listings and Gopher menus, and some can send email messages, but most require a helper application for Telnet access.
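The protocol list above amounts to a lookup table: read the part of the URL before the first colon, then decide which kind of service to contact. A rough Python sketch follows. The SERVICES table paraphrases the list, and the function name describe and the newsgroup name news.answers are chosen only for this example.

```python
from urllib.parse import urlsplit

# Paraphrased from the protocol list in the text.
SERVICES = {
    "http": "HTTP (Web) server",
    "https": "secure HTTP (Web) server",
    "file": "document on your hard drive",
    "ftp": "FTP server",
    "gopher": "Gopher server",
    "news": "Usenet news",
    "mailto": "email message",
    "telnet": "remote Telnet (login) server",
}


def describe(url):
    """Name the kind of Internet service a URL refers to."""
    scheme = urlsplit(url).scheme
    return SERVICES.get(scheme, "unknown service")


for example in (
    "http://www.microsoft.com/windows/index.html",
    "ftp://ftp.aol.com",
    "mailto:someone@aol.com",
    "news:news.answers",
):
    print(example, "->", describe(example))
```

An address typed without a protocol, such as www.microsoft.com on its own, has no protocol part to look up, which is why the sketch reports it as an unknown service; browsers typically assume a Web address in that case.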