Besides specifying information about the file being transported, HTTP
also defines the phases of a request/response interaction.
HTTP provides two primary methods to request documents: GET and POST.
The foundation of HTTP/0.9 (the first implementation of the HTTP protocol)
was the definition of the GET method that was used by a web browser to
request a specific document.
For example, the following HTTP request would return the document
"index.html" located in the web server's root directory, called
"webdocs":
GET /webdocs/index.html CRLF
Notice that the GET request began with the GET keyword, included a
document to retrieve, and ended with a carriage return and line feed
combination.
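If it helps to see that spelled out, here is a minimal sketch in Python (my choice of language for illustration; the function name build_simple_get is made up) that assembles the request line described above:

```python
def build_simple_get(path: str) -> bytes:
    """Build an HTTP/0.9-style request: GET, the document path, then CRLF."""
    if not path.startswith("/"):
        raise ValueError("the document path should start with '/'")
    # \r is the carriage return and \n is the line feed.
    return f"GET {path}\r\n".encode("ascii")


request = build_simple_get("/webdocs/index.html")
print(repr(request))  # b'GET /webdocs/index.html\r\n'
```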
If you would like, you can try making a GET request by connecting to your
favorite web server and sending the GET request yourself (as if you were a
web browser).
Below is a GET session I cut and pasted from a telnet window. In this case,
I used telnet to contact the web server "www.extropia.com" and asked for
the file "irobot.html" in the "Scripts/Columns" directory (Don't forget the two
carriage returns at the end). The server responded by sending me the
contents of that file (the HTML code you see).
selena: telnet www.extropia.com 80
Trying 206.53.239.130...
Connected to www.extropia.com.
Escape character is '^]'.
GET /Scripts/Columns/irobot.html
<HTML>
<HEAD>
<TITLE>Hello there</TITLE>
</HEAD>
<BODY>
Hello there. My, you are awfully good-looking to be a web
browser!
</BODY>
</HTML>
Connection closed by foreign host.
selena:
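The same conversation can be sketched in Python with a plain socket instead of telnet. This is only a rough equivalent under my own assumptions: the function name fetch_simple and its parameters are hypothetical, and a modern server may well refuse a request this bare.

```python
import socket


def fetch_simple(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send a bare GET request and return whatever the server sends back."""
    # The second CRLF mirrors the "two carriage returns" tip above.
    request = f"GET {path}\r\n\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        # Keep reading until the server closes the connection.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_simple("www.extropia.com", "/Scripts/Columns/irobot.html") would mirror the telnet session, assuming that server and file are still there.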
The beauty of web browsers, of course, is that they take care of the HTTP
protocol specifications so that the user only needs to enter the URL of the
page they want to see. The web browser formulates the actual GET
request, sends it to the web server, receives the HTML document back,
and then displays the HTML document according to the HTML instructions.
Besides allowing web browsers (or you
pretending to be one) to get documents from a web server, the GET
method also gives a web browser a way to send optional
search parameters (it was used with ISINDEX HTML files
originally).
Search parameters are encoded in a
special way that the web server can deal with.
Encoding works like this:
The URL is differentiated from the
search parameters by a question mark (?). In other words, a URL
generically looks like the following:
http://www.domain.com/dir/file?search parameters
Since you may want to have multiple
search parameters, the GET method specifies that parameters are
differentiated by placing an ampersand sign (&) between them.
Thus, the encoded URL above becomes something like the
following:
http://www.domain.com/dir/file?search1&search2&search3
Next, search parameters themselves are
specified as "name/value pairs" separated by an equal sign (=)
such as in the following example that sets the variable "lname"
equal to "Sol" and the variable "fname" equal to
"Selena":
http://www.domain.com/dir/file?lname=Sol&fname=Selena
Further, any spaces in the encoding
string are replaced by plus signs (+) as in the following
example:
http://www.domain.com/dir/file?name=Selena+Sol&age=28
Finally, any non-alphanumeric characters
are replaced with their hexadecimal equivalents that are escaped
with the percent sign (%). For example, a single quote character
(') is encoded as %27 and a line break (which is a carriage
return plus a line feed) is encoded as %0D%0A. Thus, we might
see the following example that specifies that the variable
pageName is equal to "Selena Sol's Page":
http://www.domain.com/dir/file?pageName=Selena+Sol%27s+Page
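The encoding rules above can be tried out with Python's standard urllib.parse module. This is just a sketch using the example values from this section; note that Python leaves a few punctuation characters (such as "-", "_", "." and "~") unescaped, so its output is not guaranteed to match every browser character for character.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

# Name/value pairs are joined with "=" and "&"; spaces become "+".
print(urlencode({"lname": "Sol", "fname": "Selena"}))  # lname=Sol&fname=Selena
print(urlencode({"name": "Selena Sol", "age": 28}))    # name=Selena+Sol&age=28

# Other special characters become %XX hexadecimal escapes.
print(quote_plus("'"))     # %27
print(quote_plus("\r\n"))  # %0D%0A
print(urlencode({"pageName": "Selena Sol's Page"}))  # pageName=Selena+Sol%27s+Page

# Decoding reverses the process (each name maps to a list of values).
print(parse_qs("pageName=Selena+Sol%27s+Page"))  # {'pageName': ["Selena Sol's Page"]}
```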
Though the GET method was very useful, a couple of serious problems
remained.
First, in practice the GET method only allowed a limited amount of data
(about 1024 characters with many browsers and servers of the time) to be
sent as URL-encoded data.
If there were too many name/value pairs, some of them would be
clipped and data would get lost.
Further, since the information was sent as part of the URL, the user
could see all of that data. On the one hand, that made URLs look really
ugly and scary. On the other hand, it meant that the user got to see all of
the inner workings of your CGI input.
This all changed with the development of HTTP/1.0.
The HTTP/1.0 protocol was developed from 1992 to 1996 in order to
satisfy the need to exchange more than simple text information.
The first major change from the HTTP/0.9 specification was the use of
MIME-like headers in request and response messages.
The next HTTP change was the definition of new request methods:
HEAD and POST.
Let's look at both of these changes in greater depth.
Under HTTP/1.0 an HTTP transaction
consisted of a header followed by an empty line and then some
extra data.
We have already talked about the header.
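To make the "header, empty line, extra data" layout concrete, here is a small Python sketch that pulls such a message apart. The function name split_message and the sample response are made up for illustration, and a real parser would need to handle more cases than this one does.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.0 message into (start line, headers, body)."""
    # The first empty line (CRLF CRLF) marks the end of the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# A made-up response, just to exercise the function.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 13\r\n"
    b"\r\n"
    b"<HTML></HTML>"
)
start_line, headers, body = split_message(raw)
print(start_line)  # HTTP/1.0 200 OK
print(headers)     # {'content-type': 'text/html', 'content-length': '13'}
print(body)        # b'<HTML></HTML>'
```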
The POST method of input was the other important change brought about
by the introduction of HTTP/1.0.
The POST method allowed web browsers to send an unlimited amount of
data to a web server by allowing them to tag it on to an HTTP request
after the request headers as the message body.
Typically, the message body would be the same URL-encoded string that
followed the question mark (?) in a GET request.
Thus, it would not be strange for a web server to get a POST request that
looked something like the following:
POST /cgi-bin/phone_book.cgi HTTP/1.0
Referer: http://www.somedomain.com/Directory/file.html
User-Agent: Mozilla/1.22 (Windows: I: 32bit)
Accept: */*
Content-type: application/x-www-form-urlencoded
Content-length: 29

name=Selena+Sol&phone=7700404
Notice that the "Content-length" request header is equal to the number of
characters in the body of the request. This is important because it tells
a CGI script exactly how many characters to read from the body before
parsing out the variables.
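Here is roughly what that looks like from the script's side, sketched in Python. CGI passes the length in the CONTENT_LENGTH environment variable and the body on standard input; the function name read_form is hypothetical, and I simulate the environment and input stream so the example runs on its own.

```python
import io
from urllib.parse import parse_qs


def read_form(environ: dict, stdin) -> dict:
    """Read exactly Content-length bytes of body and decode the name/value pairs."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii")
    # parse_qs returns a list per name; keep only the first value for simplicity.
    return {name: values[0] for name, values in parse_qs(body).items()}


# Simulated CGI environment and input stream for the POST request shown above.
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "29"}
stdin = io.BytesIO(b"name=Selena+Sol&phone=7700404")
print(read_form(environ, stdin))  # {'name': 'Selena Sol', 'phone': '7700404'}
```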
Of course, as with the GET method, the user never needs to deal with the
protocol itself. Instead, the browser does all the work of preparing the
POST request headers and body.
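For the curious, the browser's side of that work can be approximated in a few lines of Python. This is only a sketch: the function name build_post is made up, and a real browser sends more headers (such as the Referer and User-Agent lines in the example above).

```python
from urllib.parse import urlencode


def build_post(path: str, fields: dict) -> bytes:
    """Assemble a minimal HTTP/1.0 POST request from name/value pairs."""
    body = urlencode(fields).encode("ascii")
    headers = [
        f"POST {path} HTTP/1.0",
        "Content-type: application/x-www-form-urlencoded",
        # Content-length is simply the number of bytes in the encoded body.
        f"Content-length: {len(body)}",
    ]
    # Headers end with an empty line; the body follows it.
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body


request = build_post("/cgi-bin/phone_book.cgi", {"name": "Selena Sol", "phone": "7700404"})
print(request.decode("ascii"))
```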
So the million-dollar question is: how does the browser get the
name/value pairs to put into the HTTP message body?
The answer to that is HTML Forms. Remember those things from the last
section?