An Instantaneous Introduction to CGI Scripts and HTML Forms

World Wide Web (WWW) browsers display hypertext documents written in the Hypertext Markup Language (HTML). Web browsers can also display "HTML forms" that allow users to enter data. By using forms, browsers can collect information as well as display it. When a browser collects information, it sends it to the HyperText Transfer Protocol (HTTP) server specified in the HTML form. That server then starts a program, also specified in the HTML form, that can process the collected information. Such programs are known as "Common Gateway Interface" programs, or CGI scripts.

This document describes the Common Gateway Interface in some detail. It focuses on the ways in which a form, a client browser, a server, and the HTTP protocol work together. To understand this complex interaction, you must first understand how a client and a server work together to deliver a "normal" HTML document. This is the "canonical" Web activity: the usual Web function. Then you need to understand how scripts are executed in the Web environment without mediating forms. Once these two processes are clear, the forms interface is straightforward.

The Canonical Browser-Server Interaction

During a "normal" document exchange, a WWW client (Netscape, Mosaic, Lynx, etc.) requests a document from a WWW server and displays that document on a user display device. If that document contains a link to another document, and the user activates that link, the WWW client then fetches and displays the linked document.

The following diagram shows a WWW client running on a desktop system, Computer A, interacting with two servers: an HTTP server running on Computer B and an HTTP server running on Computer C.

[Canonical File Exchange on the Web]

The client running on Computer A gets a document, stored in a file named docu1.html, from the HTTP server running on Computer B.
This document contains a link to another document, stored in a file named docu2.html on Computer C. The Uniform Resource Locator (URL) for that link might look something like:

    http://ComputerC.domain/docu2.html

If the user activates that link, the client retrieves the file from the HTTP server running on Computer C and displays it on the monitor connected to Computer A.

The HyperText Transfer Protocol defines communication between the client and an HTTP server. The following example shows what an HTTP exchange between a Lynx client and an HTTP server running on Computer C might look like as the client fetches docu2.html. The client sends the following text to the server:

    GET /docu2.html HTTP/1.0
    Accept: www/source
    Accept: text/html
    Accept: image/gif
    User-Agent: Lynx/2.2 libwww/2.14
    From: montulli@www.cc.ukans.edu
    * a blank line *

The "GET" request indicates which file the client wants and announces that it is using HTTP version 1.0 to communicate. The client also lists the Multipurpose Internet Mail Extensions (MIME) types it will accept in return, and identifies itself as a Lynx client. (The "Accept:" list has been truncated for brevity.) The client also identifies its user in the "From:" field. Finally, the client sends a blank line indicating it has completed its request. The server then responds by sending:

    HTTP/1.0 200 OK
    Date: Wednesday, 02-Feb-94 23:04:12 GMT
    Server: NCSA/1.1
    MIME-version: 1.0
    Last-modified: Monday, 15-Nov-93 23:33:16 GMT
    Content-type: text/html
    Content-length: 2345
    * a blank line *
    . . . . . .etc.

In this message the server agrees to use HTTP version 1.0 for communication and sends the status 200, indicating it has successfully processed the client's request. It then sends the date and identifies itself as an NCSA HTTP server. It also indicates it is using MIME version 1.0 to describe the information it is sending, and includes the MIME type of the information about to be sent in the "Content-type:" header.
Finally, it sends the length of the data in bytes (the "Content-length:" header), followed by a blank line and the data itself.

Things to note here:

  * Client and server headers follow the RFC 822 mail header format ("Name: value" lines).
  * A client may send any number of "Accept:" headers, and the server is expected to convert the data into a form the client can accept.

Executing "Scripts"

An HTTP URL may identify a file that contains a program or script rather than an HTML document. That program may be executed when a user activates the link containing the URL.

The diagram below shows a hypertext document on Computer B with a link to a file on Computer C that holds the CGI program to be executed if a user activates the link. This link is a "normal" http: link, but the file is stored in such a way that the HTTP server on Computer C can tell that the file contains a program to be run, rather than a document to be sent to the client as usual. When the program runs, it prepares an HTML document on the fly and sends that document to the client, which displays it as it would any other HTML document.

[Data Flow with an HTTP Script]

Such programs are sometimes called HTTP scripts or "Common Gateway Interface" (CGI) scripts. Note that CGI scripts may be written in scripting languages (like Perl, Tcl, etc.) or in any other programming language (like C, Pascal, BASIC). On some HTTP servers these CGI programs are stored in a directory called cgi-bin, so they are also sometimes called "cgi-bin scripts."

Here is a simple AppleScript program that can be run by a MacHTTP server when it receives a request for the file containing the script. When it runs, this program builds an HTML document containing the current time and returns the document to the WWW client that requested it.

    set crlf to (ASCII character 13) & (ASCII character 10)
    set header to "HTTP/1.0 200 OK" & crlf ¬
        & "Server: MacHTTP" & crlf
    set header to header & "MIME-Version: 1.0" ¬
        & crlf & "Content-type: text/html"
    set header to header & crlf & crlf ¬
        & "<title>Server Script</title>"
    set body to "<h1>The time is:</h1>" ¬
        & "<p>" & (current date) & "</p>"
    return header & body

The program is stored in a file named "date", in a folder called "scripts". When a user activates a link that points to this script, the Web client generates an HTTP request that might look like:

    GET /scripts/date HTTP/1.0
    Accept: www/source
    Accept: text/html
    Accept: image/gif
    User-Agent: Lynx/2.2 libwww/2.14
    From: montulli@www.cc.ukans.edu
    * a blank line *

When the script runs, it generates an HTTP response that might look like:

    HTTP/1.0 200 OK
    Server: MacHTTP
    MIME-Version: 1.0
    Content-type: text/html
    * a blank line *
    <title>Server Script</title><h1>The time is:</h1><p>September 15, 1994 3:15 pm</p>
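To make the response format concrete, here is a Python sketch that builds the same kind of response the date script returns and then splits it back into its parts: status line, headers, a blank line, and the body. It is not how MacHTTP works internally. The helper names `date_script_response` and `parse_response` are hypothetical, the HTML markup and date format are illustrative choices, and the parser deliberately ignores details such as repeated or folded header lines.

```python
from datetime import datetime

CRLF = "\r\n"  # HTTP header lines end with carriage return + line feed


def date_script_response(now: datetime) -> str:
    """Build a complete HTTP/1.0 response, mirroring the AppleScript 'date' script."""
    header_lines = [
        "HTTP/1.0 200 OK",
        "Server: MacHTTP",
        "MIME-Version: 1.0",
        "Content-type: text/html",
    ]
    body = (
        "<title>Server Script</title>"
        "<h1>The time is:</h1>"
        f"<p>{now:%B %d, %Y %I:%M %p}</p>"
    )
    # CRLF ends the last header; a second CRLF is the blank line before the body.
    return CRLF.join(header_lines) + CRLF + CRLF + body


def parse_response(raw: str):
    """Split a raw HTTP/1.0 response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)  # the blank line ends the headers
    status_line, *header_lines = head.split(CRLF)
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = date_script_response(datetime(1994, 9, 15, 15, 15))
version, status, reason, headers, body = parse_response(raw)
print(status, headers["content-type"])  # 200 text/html
```

Because the parser relies only on the status line, the "Name: value" header lines, and the blank line, it handles a script-generated response exactly as it would a response for a static file.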

The response above looks just like any HTTP response from an HTTP server returning a normal HTML document. It just happens to have been generated on the fly.

Executing a Script via an HTML Form

The ability to process fill-out forms within the Web required modifications to HTML, Web clients, and Web servers (and eventually to HTTP as well). A set of tags was added to HTML to direct a WWW client to display a form to be filled out by a user and then forward the collected data to an HTTP server specified in the form. Servers were modified so that they could start the CGI program specified in the form and pass the collected data to that program, which could, in turn, prepare a response (possibly by consulting a pre-existing database) and return a WWW document to the user. The following diagram shows the various components of the process.

[Data Flow with an HTTP Form]

In this diagram, the Web client running on Computer A acquires a form from a Web server running on Computer B. It displays the form, the user enters data, and the client sends the entered information to the HTTP server running on Computer C. There, the data is handed off to a CGI program, which prepares a document and sends it to the client on Computer A. The client then displays that document.

HTML Tags Related to Forms Mode

The tags added to HTML to allow for HTML forms are:

    <FORM ...> . . . </FORM>
        Define an input form. Attributes: ACTION, METHOD, ENCTYPE
    <INPUT ...>
        Define an input field. Attributes: NAME, TYPE, VALUE, CHECKED, SIZE, MAXLENGTH
    <SELECT ...> . . . </SELECT>
        Define a selection list. Attributes: NAME, MULTIPLE, SIZE