Web Programming 101

At some point along the way webmasters around the net realized that HTML (1) was too limited to do many of the things that they wanted to accomplish.

How could a webmaster display the current time and date on every page accessed by a client? How could she collect information about clients who were accessing her web site? How could she create a web site that was more than just an information warehouse, but a meaningful and dynamic conversation?

Certainly, HTML was great for distributing "pre-prepared" web pages on request. A client would use a web browser to contact a web server and use HTTP to ask the web server for a specific HTML document. (2) The web server would then send the requested document back to the web browser which, in turn, would display the document to the client as defined by the HTML.

Pretty nifty really, and far superior to older technologies like gopher and ftp. However, the interaction between the client and the server was still extremely trivial. The server could only provide HTML documents that had been specially encoded by a webmaster, and that had been placed in certain publicly-accessible directories. The interaction between web browser and web server was pretty mind-numbingly simplistic and the coolness of surfing through hyperlinks quickly became dull.

HTML fell short for anything truly "dynamic".

For example, to put the current date on every page using only HTML would require a webmaster to manually edit every file, every day. As you can imagine, this got tiring very quickly for sites with more than 5 pages. Webmasters needed a way to have HTML pages created and modified "on-the-fly," with information that could change weekly, daily, by the second, or for each and every request. And they needed those pages to be modified automatically, without their constant oversight.
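As a rough illustration of what "on-the-fly" means, here is a minimal sketch in Python (a language this article does not otherwise use; the function name render_page and the page layout are made up for the example). Instead of a webmaster editing files by hand, a program builds the page each time it is requested, so the date stamp is always current.

```python
from datetime import datetime


def render_page(title, body, now=None):
    """Return a complete HTML page stamped with the given (or current) time."""
    now = now or datetime.now()
    # A numeric format keeps the stamp independent of the server's locale.
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "<HTML><HEAD><TITLE>{0}</TITLE></HEAD>\n"
        "<BODY>\n"
        "<P>Page generated on {1}</P>\n"
        "{2}\n"
        "</BODY></HTML>"
    ).format(title, stamp, body)


if __name__ == "__main__":
    # Every run prints a page carrying the time of that run.
    print(render_page("Welcome", "<P>Hello, visitor.</P>"))
```

The optional "now" argument is only there so the function can be checked with a fixed time; a real page generator would simply use the server's clock.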

As it so happens, the hardware that web server software runs on typically has quite a few resources that can be utilized to help solve these problems. Not only do servers have processing power to spare, they also have a battery of applications (such as e-mail, database manipulation, or calendaring) already installed and ripe for utilization.

And thus was born CGI (Common Gateway Interface). (3)

CGI: The Birth of Server Side Web Programming

As with most computer jargon, the term Common Gateway Interface can be fairly meaningless at first glance. So, before getting into what CGI can do, let's take a moment to define what CGI actually is.

Let's look at the CGI processes in the following chart.

Okay, so that is probably a lot of abstract stuff to take in all at once, especially if you have not worked with CGI already. So let's back up a minute and go over what CGI is by looking at it in the wild. Let's look at some examples of CGI in action.

You can easily see that CGI makes for a much more profound surfing experience allowing web sites to offer useful and compelling services to surfers who may be interested in information or products offered. (5) However, there is a dark side!

CGI sucks!

Well, as you might expect, for all its dynamism, CGI was not a holy grail. In fact, there are a lot of sysadmins out there who would be ecstatic if CGI were outlawed. CGI simply causes too many problems.

Well, those are some pretty damning flaws. Like I said, many systems administrators would love to see CGI fall off the face of the Earth. Unfortunately for those system administrators, the fact is that CGI has continued to be the workhorse of the web, powering 90% of the dynamic web pages out there.

The fact is that CGI, especially CGI/Perl, is easy to work with and most non-technically oriented webmasters out there can get their needs filled, and filled right away. However amazingly, brand-fantasmagorically wonderful other technologies sound, they are still vaporware as far as the average web developer is concerned. Either the ISP does not provide those technologies, or the learning and development curve is too steep or expensive. And of course for small applications typical of most websites, the big guns of C or C++ are just overkill.

CGI, for all its flaws, works, and works pretty darn well if done carefully. "Intranet" developers with massive budgets can yack all they want to about servlets and SQL gateways and Server Side Includes and customized server applications written in Java, but for most "Internet" developers out there, CGI is the only tool available for solving their problems. And with creativity and care, CGI can also be the right tool.

Client Side Scripting

However, this is not to say that other technologies are not extremely useful. Several technologies have proven to be just as important as CGI for the average Internet developer. These technologies focus on putting the demands of computation in the hands of the client instead of the server. Thus, things like processing simple requests, maintaining state, and GUI (Graphical User Interface) presentation are handled by the web surfer's own computer instead of being handled by some web server hosting a site.

Client-side programming is based on the idea that the computer that the client is using to browse the web has quite a bit of CPU power sitting there doing nothing. Meanwhile, web servers are being tasked to death handling hundreds of CGI requests above and beyond their regular duties. Thus, it makes sense to share some of that burden between the client and server by taking some of the processing load off the server and giving it to the client.

As it so happens, much of what CGI does can be handled on the client's side. Typically, the only time the server needs to be involved is when the web application needs to send email or access datafiles. Things like maintaining state, filling out forms, error checking, or performing numeric calculation, on the other hand, can be handled by the client's own computer. The web browser need not check back with a CGI script every time the user wants to do something. A "script-enabled" HTML page can carry with it instructions on how to handle certain events.

In the following figure, client-side scripting has reduced server load by over 80% for every client accessing the CGI script. And of course, since most of the processing is handled locally, the application as a whole runs 5 times faster.

Obviously, this solves many of the problems posed by CGI. Client-side applications maintain security by keeping server processing to a minimum. They are not restricted by HTTP and the GUI can be as pretty and sleek as any traditional software package out there.

The two most popular languages for client-side scripting are JavaScript (Netscape Navigator) and VBScript (Microsoft's Internet Explorer). Both technologies allow web programmers to encode short program "snippets" into their HTML documents that can be executed by a web browser. JavaScript Made Easy[link has gone dead--ed.] provides several excellent examples of JavaScript in action and Reaz Hoque provides a very straightforward tutorial on JavaScript basics. On the other side of the coin, Microsoft provides a good list of samples for VBScript.

Actually, script-enabled HTML pages can be fairly dynamic and do indeed cut down on the work of the server. Of course, in any real application, there will need to be a CGI script on the server to email results or access data, but much of the work, perhaps 75% of it, is done by the client. This can cut down server load by 80% on complex applications.

Unfortunately, script-enabled HTML pages have their problems too. The most obvious problem, of course, is that the web browser program must be able to interpret the language used for scripting. And since Netscape and Microsoft are too knuckleheaded to build upon common standards, we are left in the cold. JavaScript programs continually break when viewed using Internet Explorer and VBScripts do the same when viewed with Netscape.

Thus, client-side scripting has remained primarily useful only for limited, controlled intranets where webmasters can be sure that all users are using the same browser software to view web pages.

Further, both JavaScript and VBScript are only limited languages meant for quick jobs with little complexity. Ticker tape animations and subtotaling are one thing, but a true web application requires a bit more umph.

Platform Independent Client-Side Applications with Java

That very umph comes with Java. Java was originally developed at Sun Microsystems in 1991 to provide a platform-independent programming language and operating system for consumer electronics (TV sets, toasters and VCRs).

In syntax and execution, Java is a lot like a simplified version of C++. ("simplified" should be read in the previous sentence as "an improved"). It is a highly robust, distributed, high performance, object-oriented, multi-threaded language with all of the usual features. As such, it builds upon years of C++ development, taking the good and dispensing with the bad.

As it so happened, however, Java did not make it into the consumer electronics market. Instead it wound up in our web browsers.

Java seemed to be a perfect fit for the web. The language itself was extremely small (as it was built to go inside toasters and alarm clocks with tiny amounts of memory). Thus it could quickly be transferred over the web.

Further, Java was platform independent. That is, any computer with a Java virtual machine can run a Java program. Programs can be written anywhere and be run anywhere. This is crucial because, as we saw in the case of the client-side scripting languages, if a language cannot run on any machine, it cannot be used on the web, which must service every machine, language, and environment imaginable.

Platform independence works because Java is interpreted at run time rather than compiled directly to machine code. Unlike C or C++ code, when Java is compiled, it is not compiled into platform-specific machine code, but into platform-independent bytecode. This bytecode is distributed over the web and interpreted by a virtual machine (typically built right into a web browser these days) on whichever platform it is being run. Perhaps a picture would be useful...

Thus, as a programmer, you need only concern yourself with the generic Java programming language and compile your applications into bytecode on whatever system you are using. You can then be assured that your bytecode will be executed correctly whether your clients are using Macs, PCs, Unix boxes or anything else.
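To make the idea of "bytecode interpreted by a virtual machine" a little more concrete, here is a toy sketch in Python. It is only an illustration of the principle: the three instructions (PUSH, ADD, MUL) are invented for this example and have nothing to do with the real Java instruction set. The point is that the same list of instructions gives the same answer on any machine that has the small interpreter.

```python
def run(bytecode):
    """Interpret a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            # Put a constant on top of the stack.
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    # The result of the program is whatever is left on top of the stack.
    return stack.pop()


# Hypothetical "compiled" form of the expression (2 + 3) * 4.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]

if __name__ == "__main__":
    print(run(PROGRAM))
```

A real virtual machine is vastly more elaborate, but the division of labor is the same: the program travels as portable instructions, and only the interpreter has to be written for each platform.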

Java, of course, demands books' worth of explanation and description. So, of course, we will not delve too deeply into the language here. Instead, I recommend that you browse through the resources collected at Gamelan which is the be-all and end-all of Java resource sites. There you can sample several Java programs yourself and see how amazing Java really is.

Did I say amazing? Well, Java is certainly a great addition to every web developer's toolbox, but as you might have expected, Java has as many drawbacks as any of the other tools we've discussed already.

Java Sucks

Though Java can create interfaces that go way beyond the capability of HTML, CGI, and JavaScript, and though the language is extremely powerful and portable, Java still has serious restrictions.

Of particular concern are the security restrictions built into Java such as the fact that Java programs (Java applets specifically) cannot easily write files to the local hard drive or get data from servers other than the ones they came from. While this may make the public more confident about the language (an important thing and perhaps worth the limitations it causes), it makes Java programs fairly useless for the average developer who absolutely needs such capabilities to create full-featured applications.

Further, Java programs with a lot of logic take longer to download. If you went to the Gamelan site linked above and tried to run some example Java apps, you certainly found that you had to wait quite a bit for them to download.

Similarly, because the programs run on the client's machine, they do not have access to resources on the server. Thus, a Java program cannot even query simple flat-file databases located remotely without a proxy (some other program working as a helper on the server).

Finally, Java is still a new language. As such, it is plagued by all the bugs, inconsistencies and incompatibilities that any new language is faced with. Though Java boasts platform independence, in reality, programs run differently from platform to platform...if they work at all. Further, though programs might be platform independent, they are not browser independent. Each browser, in fact each operating system, has its own buggy virtual machine that produces different output for the same program. Thus, when you distribute a Java program, you can never be sure exactly how it will run, or if it will be run at all.

Though the restrictions of Java are being addressed slowly, the picture looks bleak in the short term (next couple of years) for the Internet developer. Although code signatures and other security fixes are arriving, they will still cause complications for the average web developer with regard to centrally storing information and trusting it. Security will be a continuing thorn in our sides.

But even still, if all of the well-publicized inconveniences of Java were solved tomorrow, there would still be issues preventing the average web developer from writing all their web apps with Java. For example, not everyone has a database to program against or can afford the cost of JDBC (Java database connectivity) proxy servers. In fact, it is safe to say that "most" web developers do not have those tools to work with. Typically, Internet Service Providers do not allow customers to run servers of any kind through their account, let alone complicated database servers. Thus, in order to perform database management functions essential to many applications, the average web developer will still need to work with flat files on the server hardware...and this means CGI.

There are also issues preventing the spoiled "intranet" developer from using Java. For example, the JDBC standard will not necessarily help in a corporate environment in which some sort of proxy to a real database server, one that can communicate across a firewall with a web server, may still be needed. Not only will Java be blocked by a firewall, but it cannot use standard encryption methods to provide secure, encrypted transactions.

In short, though Java is a profound addition to our toolbox, it is not the answer to all our woes.

Conclusion: Stocking your Toolbox

As any good technician knows, there is no such thing as a "best" tool. The best tool is dependent on a whole host of factors from the type of task at hand to the personality of the marketing director. The best tool is a fantasy.

Instead, every web developer should have at her disposal a wide array of tools to solve problems. Sometimes a server-side solution will be appropriate, other times a client-side solution will be best. Your main goal as a web developer is to develop an intuition about when to use which.

That said, I would like to suggest one combination of tools that I see as becoming extremely important for all web developers. The combination is that of CGI and Java. Consider the following Problems and Solutions...

Problem: The average "Internet" web developer has probably already picked up Perl/CGI programming. Most have not picked up Java beyond being able to code GUIs using various visual tools such as Symantec's Visual Cafe or Microsoft J++.
Solution: A Java to CGI interface leverages existing Perl/CGI knowledge: the core program logic can be located on a server, and the developer merely has to code a thin (very small and easily downloaded) GUI Java client. In addition, a developer experienced in Perl will be able to whip out 80% of the program in a short period of time in Perl while leaving a mere 20% (the GUI) to Java (a hard language for most people).

Problem: Internet developers who do work for sites on a virtual web server or an ISP typically cannot use Sybase, Oracle, or another commercial database to store data via JDBC. Frequently, the ONLY option that these developers or consultants have is to do flat-file processing using CGI/Perl, which has generally precluded the use of Java.
Solution: A Java to CGI interface will allow applets to be created that can use flat-file databases that an average small business can afford (free).

Problem: Developers who have already invested a lot of time creating CGI/Perl for their site do not want to rewrite all their applications in Java.
Solution: A Java to CGI interface will allow existing applications to be leveraged by allowing a developer to create a Java applet on top of an existing CGI script, with minor modifications to make the CGI output data conducive to interpretation by the Java applet.
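The division of labor described in these problems and solutions can be sketched in a few lines. What follows is a hedged illustration in Python, standing in for both halves: in practice the server half would be an existing Perl/CGI script and the client half a Java applet. The pipe-delimited record layout, the key=value reply format, and every name here (FLAT_FILE, cgi_response, parse_response) are assumptions made up for the example, not part of any standard.

```python
import io

# Hypothetical flat-file "database": one record per line, fields split by "|".
FLAT_FILE = "1|Widget|9.95\n2|Gadget|19.50\n"


def cgi_response(flat_file, wanted_id):
    """Play the CGI script: look up one record and return a plain-text reply."""
    for line in io.StringIO(flat_file):
        record_id, name, price = line.rstrip("\n").split("|")
        if record_id == wanted_id:
            body = "status=ok\nname=%s\nprice=%s\n" % (name, price)
            break
    else:
        # The loop finished without finding the requested record.
        body = "status=not_found\n"
    # A CGI reply is a header block, a blank line, then the body.
    return "Content-type: text/plain\n\n" + body


def parse_response(response):
    """Play the thin client: split headers from body and read key=value lines."""
    _headers, _, body = response.partition("\n\n")
    return dict(line.split("=", 1) for line in body.splitlines() if line)


if __name__ == "__main__":
    print(parse_response(cgi_response(FLAT_FILE, "2")))
```

The design point is that the script answers in plain, line-oriented text rather than decorated HTML, so the client needs almost no logic to interpret it. That is the kind of "minor modification" that makes existing CGI output digestible by a thin GUI front end.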

As you can see, the benefits and flaws of Java and CGI complement each other very well. Using Java frontends and CGI backends presents an excellent opportunity for web developers on the Internet to create fully featured applications with the available resources. I would recommend that every web developer make sure to study up on the interaction of Java and CGI to be prepared for the contracts that will come forward over the next few years.

Footnotes

In this column we will use the word "client" to refer to a person who is using a "web browser" program like Netscape Navigator or Internet Explorer to display HTML documents received from a "web server".

A web server is a combination of hardware (an actual computer that stores all of the HTML files) and software (the program that listens for web browser requests and utilizes the hardware resources to fulfill those requests).

Web browsers and web servers communicate using HTTP (Hypertext Transfer Protocol) which provides a communication standard for efficient and intelligible dialog. Essentially HTTP allows a web browser to contact a web server somewhere on the web and ask for a specific document (or resource). It also allows the server to send the requested document (or the output of the executed resource) back to the web browser.

  • Truthfully, much of what is done by CGI can also be done using SSI (Server Side Includes) which is a service provided by web server software in which certain HTML comment tags can be used to execute commands. SSI will not be covered this month since it demands its own article. However, for the purposes of this introduction, SSI programs are similar enough in theory to CGI programs that they can be thought of as the same thing.

  • When you get some software for your computer and you have to get the special "Mac Version" or "Windows Version", you are getting a "platform dependent" program. Unfortunately, when you move from being a PC user to being a Mac user, you have to buy all new programs because the programs you bought for Windows will not work on Mac. The beauty of web programs is that they are typically "platform independent" which means that you can run them anywhere. Whether you use a PC, Mac or Unix box, the programs will work just fine.

  • CGI is not the only form of server-side scripting available, of course. For example, Netscape's LiveWire is an online development environment for Web site management and client-server application development. It uses JavaScript, Netscape's scripting language, to create server-based applications similar to CGI programs. Unlike CGI programs, however, LiveWire applications are closely integrated with the HTML pages that control them. However, non-CGI server side strategies are best covered in their own article.

  • You can think of an instance of a script as a unique and independent version of a generic script. It is called an "instance" because ten web surfers could all execute a CGI script at the same time. Though each web surfer would be using the same generic CGI script, each instance of that script would be personalized to that web surfer. Thus you may have ten instances of the exact same script running in parallel on the web server hardware.

  • Hidden variables allow you to maintain state using the HTML "Hidden" form tag. Essentially, you include information in your HTML form that will not be visible to the user when they look at the form in their web browser window, but which will be transferred to the CGI script with the user-supplied data. The format of the tag looks something like the following:
    <INPUT TYPE = "HIDDEN" NAME = "first_name" VALUE = "selena">
    <INPUT TYPE = "HIDDEN" NAME = "last_name" VALUE = "sol">
    When the CGI script processes the information that the user enters into the HTML form, it will also receive the variable "first_name" with the value of "selena" as well as "last_name" equal to "sol".

    If the user is not using a FORM tag to navigate through a site, the admin can still encode state information in the URL by using the HTTP standard for URL encoding. For example, the following hyperlink would send the same info as above to the CGI script.

    <A HREF = "http://www.extropia.com/test.cgi?first_name=selena&last_name=sol">click here</A>
    Notice that variables to be passed along are listed after the question mark, name/value pairs are separated by the ampersand sign, and each variable name is separated from its value by an equals sign.

    Finally, the CGI script may write out state information to a file on the server and then simply pass along the location of the file using one or both of the above methods. This is best when there is a large amount of state information.

    By the way, maintaining state can also be achieved using Netscape Cookies. However, we will not address cookies here because they require their own article.

  • Perl is a fun language to use because it keeps the nuts and bolts of machine code as invisible as possible. One of the ways Perl does this is by adding an extra step between you and the computer. This extra step is called a "Perl interpreter". This interpreter (which your sysadmin must install) reads a Perl program that you write and translates it "on the fly" into machine code that can be understood by your computer. Your "executable" can then be moved to any other system with a Perl interpreter and be run without problems. Further, the code can be easily modified and understood. Unfortunately, in order to run your executable, you must also run the interpreter and this can be expensive in terms of server resources.

    In more intense languages like C or C++, there is no interpreter. You must use a special "compiler" program to translate your code into machine code. This affords greater power to your programs since you do not need to run a separate interpreter when you run your executable, but it does mean that executables are specific to each operating system and that the source code is stored separately from the executable code.

  • Notice that CGI scripts must be smart enough to answer all sorts of questions.
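The hidden-variable and URL-encoding techniques described in the footnote on maintaining state can be sketched as follows. This is a minimal illustration in Python using its standard library; the helper names (hidden_fields, state_link, read_state) are invented for the example, and a real CGI script of the kind discussed in this article would more likely be written in Perl.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode


def hidden_fields(state):
    """Render each name/value pair as a hidden form tag."""
    return "\n".join(
        '<INPUT TYPE = "HIDDEN" NAME = "%s" VALUE = "%s">'
        % (escape(name, quote=True), escape(value, quote=True))
        for name, value in state.items()
    )


def state_link(script_url, state):
    """Append the state to a URL after the question mark, URL-encoded."""
    return script_url + "?" + urlencode(state)


def read_state(query_string):
    """Recover the name/value pairs a CGI script would receive."""
    return dict(parse_qsl(query_string))


if __name__ == "__main__":
    state = {"first_name": "selena", "last_name": "sol"}
    print(hidden_fields(state))
    print(state_link("http://www.extropia.com/test.cgi", state))
    print(read_state("first_name=selena&last_name=sol"))
```

Either way, the script gets back the same name/value pairs on the next request, which is all that "maintaining state" amounts to when the protocol itself remembers nothing between requests.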