An Instantaneous Introduction to CGI Scripts and HTML Forms
World Wide Web (WWW) browsers display hypertext documents written in the
Hypertext Markup Language (HTML). Web browsers can also display "HTML forms"
that allow users to enter data. By using forms, browsers can collect as well
as display information.
When information is collected by a browser, it is sent to a HyperText
Transfer Protocol (HTTP) server specified in the HTML form, and that server
starts a program, also specified in the HTML form, that can process the
collected information. Such programs are known as "Common Gateway Interface"
programs, or CGI scripts.
This document describes the Common Gateway Interface in some detail. It
focuses on the ways in which a form, a client browser, a server, and HTTP
work together. To understand this interaction, you must first understand
how a client and a server work together to deliver a "normal" HTML
document; this is the "canonical" Web activity. Then you need to understand
how scripts are executed in the Web environment without mediating forms.
Once these two processes are clear, the forms interface is straightforward.
The Canonical Browser-Server Interaction
During a "normal" document exchange, a WWW client (Netscape, Mosaic, Lynx,
etc.) requests a document from a WWW server and displays that document on
the user's display. If that document contains a link to another document,
and the user activates that link, the WWW client then fetches and displays
the linked document.
The following diagram shows a WWW client running on a desktop system,
Computer A, interacting with two servers: an HTTP server running on Computer
B and an HTTP server running on Computer C.
[Canonical File Exchange on the Web]
The client running on Computer A gets a document, stored in a file named
docu1.html, from the HTTP server running on Computer B. This document
contains a link to another document, stored in a file named docu2.html on
Computer C. The Uniform Resource Locator (URL) for that link might look
something like:
http://ComputerC.domain/docu2.html
If the user activates that link, the client retrieves the file from the HTTP
server running on Computer C and displays it on the monitor connected to
Computer A.
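As a minimal sketch of how a client might take such a URL apart before contacting a server, the following Python uses the standard library's urlsplit function. The URL is the example from the text; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://ComputerC.domain/docu2.html"
parts = urlsplit(url)

scheme = parts.scheme    # which protocol to speak to the server
host = parts.hostname    # which machine to contact (urlsplit lowercases it)
path = parts.path        # which document to ask that server for

print(scheme, host, path)
```

The host tells the client where to connect, and the path is what it places in the request it sends there.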
The HyperText Transfer Protocol defines communication between the client and
an HTTP server. The following example shows what an HTTP exchange between a
Lynx client and an HTTP server running on Computer C might look like as the
client fetches docu2.html.
The client sends the following text to the server:
GET /docu2.html HTTP/1.0
Accept: www/source
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.2 libwww/2.14
From: montulli@www.cc.ukans.edu
* a blank line *
The "GET" request indicates which file the client wants and announces that
it is using HTTP version 1.0 to communicate. The client also lists the
Multipurpose Internet Mail Extension (MIME) types it will accept in return,
and identifies itself as a Lynx client. (The "Accept:" list has been
truncated for brevity.) The client also identifies its user in the "From:"
field.
Finally, the client sends a blank line indicating it has completed its
request.
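The request described above can be assembled as plain text. The sketch below is one possible way to do so in Python; the function name build_get_request and its parameters are hypothetical, and the values are copied from the example request.

```python
def build_get_request(path, accept_types, user_agent, from_address):
    """Build the text of an HTTP/1.0 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.0"]
    # One "Accept:" header per MIME type the client can handle.
    lines += [f"Accept: {mime_type}" for mime_type in accept_types]
    lines.append(f"User-Agent: {user_agent}")
    lines.append(f"From: {from_address}")
    # Each line ends with CR LF; the extra CR LF is the blank line
    # that tells the server the request is complete.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request(
    "/docu2.html",
    ["www/source", "text/html", "image/gif"],
    "Lynx/2.2 libwww/2.14",
    "montulli@www.cc.ukans.edu",
)
print(request)
```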
The server then responds by sending:
HTTP/1.0 200 OK
Date: Wednesday, 02-Feb-94 23:04:12 GMT
Server: NCSA/1.1
MIME-version: 1.0
Last-modified: Monday, 15-Nov-93 23:33:16 GMT
Content-type: text/html
Content-length: 2345
* a blank line *
. . . . . .etc.
In this message the server agrees to use HTTP version 1.0 for communication
and sends the status 200 indicating it has successfully processed the
client's request. It then sends the date and identifies itself as an NCSA
HTTP server. It also indicates it is using MIME version 1.0 to describe the
information it is sending, and includes the MIME-type of the information
about to be sent in the "Content-type:" header. Finally, it sends the length
of the data in bytes in the "Content-length:" header, followed by a blank
line and the data itself.
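To make the structure of the response concrete, here is a rough Python sketch that splits a response into its status line, headers, and data. The function parse_response is hypothetical, and the sample response is shortened, with a five-byte body standing in for the document.

```python
def parse_response(raw):
    """Split a raw HTTP/1.0 response into its parts (illustrative helper)."""
    # The first blank line separates the headers from the data.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Wednesday, 02-Feb-94 23:04:12 GMT\r\n"
    "Server: NCSA/1.1\r\n"
    "MIME-version: 1.0\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 5\r\n"
    "\r\n"
    "hello"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers["content-type"], len(body))
```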
Things to note here:
* Client and server headers are RFC 822-compliant mail headers.
* A client may send any number of "Accept:" headers, and the server is
expected to convert the data into a form the client can accept.
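Because the headers follow the mail header format, a mail header parser can read them. The sketch below uses Python's standard email package to collect the repeated "Accept:" headers and then picks a type the client accepts. The list named available is a made-up set of formats a server might be able to produce, and the selection rule is only an assumption about how a server could choose.

```python
from email.parser import Parser

# Headers taken from the example request.
header_text = (
    "Accept: www/source\r\n"
    "Accept: text/html\r\n"
    "Accept: image/gif\r\n"
    "User-Agent: Lynx/2.2 libwww/2.14\r\n"
    "\r\n"
)

# Parse the text as mail-style headers.
message = Parser().parsestr(header_text, headersonly=True)

# A header may appear several times; get_all returns every value.
accepted = [value.strip() for value in message.get_all("Accept", [])]

# Hypothetical formats the server could send, in order of preference.
available = ["image/jpeg", "text/html"]
chosen = next((t for t in available if t in accepted), None)

print(accepted, chosen)
```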
Executing "scripts"
An HTTP URL may identify a file that contains a program or script rather
than an HTML document. That program may be executed when a user activates
the link containing the URL.
The diagram below shows a hypertext document on Computer B with a link to a
file on Computer C that holds the CGI program that will be executed if a
user activates the link. This link is a "normal" http: link, but the file is
stored in such a way that the HTTP server on Computer C can tell that the
file contains a program that is to be run, rather than a document that is to
be sent to the client as usual.
When the program runs, it prepares an HTML document on the fly, and sends
that document to the client, which displays the document as it would any
other HTML document.
[Data Flow with an HTTP Script]
Such programs are sometimes called HTTP scripts or "Common Gateway
Interface" (CGI) scripts. Note that CGI scripts may be written in scripting
languages (like Perl, Tcl, etc.) or in any other programming language (like
C, Pascal, BASIC).
On some HTTP servers these CGI programs are stored in a directory called
cgi-bin, and so they are also sometimes called "cgi-bin scripts."
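How a server decides that a file is a program depends on the server and its configuration. As one simplified possibility, a server could treat everything under a designated directory as a script. The function is_script_request and the SCRIPT_DIRS set below are hypothetical and do not describe any particular server.

```python
from pathlib import PurePosixPath

# Assumption: the server is configured to run anything under these
# top-level directories instead of sending it to the client.
SCRIPT_DIRS = {"cgi-bin"}


def is_script_request(url_path):
    """Return True if the requested path falls inside a script directory."""
    parts = PurePosixPath(url_path).parts
    # For an absolute path, parts[0] is "/", parts[1] is the top directory,
    # and at least one more part is needed to name the script itself.
    return len(parts) >= 3 and parts[1] in SCRIPT_DIRS


print(is_script_request("/cgi-bin/date"))
print(is_script_request("/docu2.html"))
```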
Here is a simple AppleScript program that can be run by a MacHTTP server
when it receives a request for the file containing the script. When it runs,
this program builds an HTML document containing the current time and returns
the document to the WWW client that requested it.
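set crlf to (ASCII character 13) & (ASCII character 10)
-- Status line and response headers, each terminated by CR LF
set header to "HTTP/1.0 200 OK" & crlf ¬
& "Server: MacHTTP" & crlf
set header to header & "MIME-Version: 1.0" ¬
& crlf & "Content-type: text/html"
-- A blank line separates the headers from the document
set header to header & crlf & crlf ¬
& "Server Script"
-- Document body containing the current date and time
set body to crlf & "The time is:" & crlf ¬
& ((current date) as string) & crlf
return header & body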
The program is stored in a file named "date", in a folder called "scripts".
When a user activates a link that points to this script, the Web client will
generate an HTTP request that might look like:
GET /scripts/date HTTP/1.0
Accept: www/source
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.2 libwww/2.14
From: montulli@www.cc.ukans.edu
* a blank line *
When the script runs it will generate an HTTP response that might look like:
HTTP/1.0 200 OK
Server: MacHTTP
MIME-Version: 1.0
Content-type: text/html
* a blank line *
Server Script
The time is:
September 15, 1994 3:15 pm
This looks just like any HTTP response from an HTTP server returning a
normal HTML document. It just happens to have been generated on the fly.
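For readers without access to MacHTTP, the same idea can be sketched in Python. This is not the author's script, only a rough equivalent: the function date_response is hypothetical, it reuses the header values from the example response, and it leaves the body as plain text in the same way the example does.

```python
from datetime import datetime

CRLF = "\r\n"


def date_response(now):
    """Build a complete HTTP/1.0 response reporting the given time."""
    header = CRLF.join([
        "HTTP/1.0 200 OK",
        "Server: MacHTTP",
        "MIME-Version: 1.0",
        "Content-type: text/html",
    ])
    # The document is generated at request time rather than read from a file.
    body = CRLF.join([
        "Server Script",
        "The time is:",
        now.strftime("%B %d, %Y %I:%M %p"),
    ])
    # A blank line separates the headers from the document.
    return header + CRLF + CRLF + body


# A fixed time is used here so that the output is repeatable; a real
# script would pass datetime.now() instead.
response = date_response(datetime(1994, 9, 15, 15, 15))
print(response)
```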
Executing a Script via an HTML Form
The ability to process fill-out forms within the Web required modifications
to HTML, Web clients, and Web servers (and eventually to HTTP, as well).
A set of tags was added to HTML to direct a WWW client to display a form to
be filled out by a user and then forward the collected data to an HTTP
server specified in the form.
Servers were modified so that they could then start the CGI program
specified in the form and pass the collected data to that program, which
could, in turn, prepare a response (possibly by consulting a pre-existing
database) and return a WWW document to the user.
The following diagram shows the various components of the process.
[Data Flow with an HTTP Form]
In this diagram, the Web client running on Computer A acquires a form from
some Web server running on Computer B. It displays the form, the user enters
data, and the client sends the entered information to the HTTP server
running on Computer C. There, the data is handed off to a CGI program which
prepares a document and sends it to the client on Computer A. The client
then displays that document.
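The text above does not say how the entered data is packaged for the trip to the server. A common choice, assumed here, is the "application/x-www-form-urlencoded" format, in which fields are sent as name=value pairs. The Python sketch below encodes some made-up field values as a client might, then decodes them as a CGI program might.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field names and values entered by a user.
fields = {"name": "Ada Lovelace", "topic": "CGI & forms"}

# Client side: spaces become "+", and characters such as "&" are
# percent-escaped so they cannot be confused with field separators.
encoded = urlencode(fields)

# Server side: the CGI program recovers the original values.
# parse_qs returns a list per name because a name may repeat.
decoded = parse_qs(encoded)

print(encoded)
print(decoded)
```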
HTML Tags Related to Forms Mode
The tags added to HTML to allow for HTML forms are:
<FORM> ... </FORM>
Define an input form.
Attributes: ACTION, METHOD, ENCTYPE
<INPUT>
Define an input field.
Attributes: NAME, TYPE, VALUE, CHECKED, SIZE, MAXLENGTH
<SELECT> ... </SELECT>
Define a selection list.
Attributes: NAME, MULTIPLE, SIZE
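To show how the FORM, INPUT, and SELECT tags and their attributes fit together, the sketch below holds a small made-up form in a Python string and lists the form-related tags with the standard library's HTMLParser. The ACTION URL, the field names, and the OPTION entries (which are not covered in the list above) are assumptions for illustration only.

```python
from html.parser import HTMLParser

# A hypothetical form; the URL and field names are invented.
SAMPLE_FORM = """
<FORM ACTION="http://ComputerC.domain/cgi-bin/lookup" METHOD="POST">
  <INPUT NAME="user" TYPE="text" SIZE="20" MAXLENGTH="40">
  <SELECT NAME="color" SIZE="1">
    <OPTION>red
    <OPTION>blue
  </SELECT>
  <INPUT TYPE="submit" VALUE="Send">
</FORM>
"""


class FormTagCollector(HTMLParser):
    """Record each FORM, INPUT, and SELECT tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag in ("form", "input", "select"):
            self.found.append((tag, dict(attrs)))


collector = FormTagCollector()
collector.feed(SAMPLE_FORM)
for tag, attributes in collector.found:
    print(tag, attributes)
```

The ACTION attribute names the program that will receive the data, which is how the form specifies both the server and the CGI script.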