Debugging Web Applications
"I downloaded an application and it won't work." We get this email
five or six times every day. The way we address the problem,
however, is not by providing an answer but by telling a story. We
tell a story about us: a mystery.
It is a story about how we debug applications on the zillions of different,
intractable, curmudgeonly systems that exist on the web.
It is a story about how we find the culprit bug when we are not exactly
sure about how the operating system works, which web browsers are trying to
run the application, what funky directives the local system administrator
has applied to the server, or any other number of big, hairy, ugly question
marks that stand between us and a programming-free weekend.
It is a mystery which, as most mysteries do, begins with Sir Arthur Conan
Doyle.
Let's see what Doyle has to say about debugging.
"By a man's finger-nails, by his coat-sleeve, by his
boots, by his trouser-knees, by the callosities of his
forefinger and thumb, by his expression, by his shirt-
cuffs -- by each of these things a man's calling is
plainly revealed. That all united should fail to
enlighten the competent inquirer in any case is almost
inconceivable." - From A Study In Scarlet
Well, this may not seem like a discussion of software debugging, but it
really is. What Doyle is trying to say is that all software and hardware
bugs want to be caught. In fact, they want to be caught so badly that they
carefully lay clues for you as to their whereabouts. Perhaps Doyle meant to
say something like the following:
"By an application’s error message on the command line,
its output to STDOUT, by the HTTP message it sends to
the web browser window, by its entry in the error log, by the
interaction of its algorithms, by the libraries it calls
and the responses sent by them -- by each of these things
applications’ failures are plainly revealed. That all
united should fail to enlighten the competent hacker
in any case is almost inconceivable." - From A Study In CGI
As a debugger, it's your job to listen to those clues, put them together
into a theory that can be tested, and test the theory against the software
package. In almost every case, you will bat yourself on the brow and say to
yourself, "Doh! Of course, how simple!" Because, when all is said and
done, computers are pretty simple creatures, and when they break down, there
are usually pretty simple reasons why.
Benjamin Hoff once related this interesting little story about Taoism, and
we thought we might pass it along to you.
"I am learning," Yen Hui said.
"How?" the Master asked.
"I forgot the rules of Righteousness and the levels of
Benevolence," he replied.
"Good, but could be better," the Master said.
A few days later, Yen Hui remarked, "I am making progress."
"How?" the Master asked.
"I forgot the Rituals and the Music," he answered.
"Better, but not perfect," the Master said.
Some time later, Yen Hui told the Master, "Now I sit
down and forget everything."
The Master looked up, startled. "What do you mean, you forget
everything?" he quickly asked.
"I forget my body and senses, and leave all appearance and
information behind," answered Yen Hui. "In the middle of
Nothing, I join the source of All Things."
The Master bowed. "You have transcended the limitations of
time and knowledge. I am far behind you. You have found the
Way!"
- From the Tao of Pooh
Benjamin added, "An empty sort of mind is valuable for finding Perls and
Tails and things because it can see what's in front of it. An Overstuffed
mind is unable to." (Well, he actually spelled it "Pearls", but we
know what he meant.)
What does this have to do with CGI debugging, you ask? Well, it has
everything to do with CGI debugging. CGI debugging is not a skill. It is
not a thing you learn in school. It is not something that is particularly
aided by FAQs, or books, or system administrators, or discussion boards.
If you have spent more than an hour on a problem, it is time to stop. Very
few problems take more than an hour to solve, so if you’ve been
sitting there for an hour, the problem is most likely not the bug
but yourself.
At this point, it is time to turn off the monitor, light a candle and some
incense, turn on some music, and relax.
You might even go out and walk around the block if it is warm and sunny.
About 20 minutes later you should be ready to get back to work having
achieved several crucial things:
1. You are not one application closer to a heart attack.
2. You are not angry or frustrated.
3. You have cleared your mind of all your preconceived ideas about what you
think the bug is saying and are prepared to "listen" to the bug to find
out what it has to say.
4. You are not intimidated by the application. Programming is like riding
horses: the minute the application thinks that it is in charge is the
minute it throws you off. (Well, most horses are not that mean, but you
know the expression.)
Upon returning from the void, the first thing you should do is to set aside
the program and start by coding something really, really small.
You see, debugging is an exercise in the scientific method. And in the
world of the scientific method, the best thing you can do is break
everything up into the smallest pieces you can because the whole is going
to be a summation of the parts and when you find the faulty part, you find
the problem.
NOTE: If you would like to get a quick set of debugging
test scripts, try the following URL:
http://www.extropia.com/scripts/debug_examples.html
NOTE: If you are not a Perl expert, debugging code can
seem pretty daunting. However, don’t let it frighten you.
Perl is one of the best languages for providing excellent
documentation. Not only are there a host of excellent
books from O’Reilly, but there are online tutorials such
as those at http://www.extropia.com/tutorials/ and
http://www.perl.com/.
Further, Perl provides its own online documentation in the form of perldoc.
Perldoc is easy to use. To learn about any standard installed module, type
the following from the command line:
|
|
$ perldoc [modulename]
|
Thus, to get documentation for the CGI.pm module, use the following:
|
|
$ perldoc CGI
|
To obtain documentation on any Perl function, use:
|
|
$ perldoc -f functionname
|
Thus, to get documentation on the use of the print() function,
use the following:
|
|
$ perldoc -f print
|
To learn about references, use perldoc perlref, and to learn about
object-oriented Perl features, try perldoc perltoot (replaced by perldoc
perlootut in Perl 5.16 and later). Finally, to get a list
of all the Perl tutorials, try:
|
|
$ perldoc perl
|
In other words, you should start by creating the most minimal CGI program
you can so that you will be able to determine what special traits your
local executing environment has that might cause a more complex program to
fall apart.
Try this little application out for size. Copy and paste the following
lines of Perl code into a plain text file, and save it as
hello_cyberspace.cgi somewhere in the cgi-bin directory tree.
|
|
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "<HTML>Hello Cyberspace</HTML>";
|
Okay, now set the permission for this little application so that it is
readable and executable by the web server. Typically, you will use the
following command on a UNIX-based web server.
|
|
chmod 755 hello_cyberspace.cgi
|
Next, run the "Hello Cyberspace" application from your browser. You will
probably need to access it with a URL something like the following:
http://www.yourdomain.com/cgi-bin/hello_cyberspace.cgi
Does it work? If not...
1. The first line (#!/usr/local/bin/perl) might be wrong, or you may have
accidentally put a blank line before it so that it is not "really" the
first line. Check out the section in the Installation Chapter in this guide
entitled 'Modifying the Perl Path Line'.
2. You mistyped the HTTP header. In order for your browser and server to
communicate, you must correctly follow the HTTP protocol. This protocol
specifies that an HTML-based response be preceded by "Content-type:
text/html" followed by two newline characters (the second newline produces
the blank line that ends the header).
3. You did not set the permissions correctly and the web server has not
been given the right permissions to execute the application. The
permissions should be 755.
4. You are not allowed to execute CGI applications from the directory in
which you created hello_cyberspace.cgi, and you either got a 500
server error or you received the text of the application in your web
browser. The system administrator has restricted you because CGI
applications can be dangerous and she wants to protect her system from your
incompetence.
Most likely, the system administrator has either created a special
directory like cgi-bin for you to put CGI applications in or has allowed you
to create special "access files" that tell the server that in this
special case, it is okay to run a CGI application. Either way, you should
check with your system administrator and ask her how she has decided to
deal with CGI applications and in which directories it is okay for you to
run them.
5. There are invisible embedded newline characters. Read the related
section earlier in the Installation Chapter of this guide entitled 'Unpacking
on Windows and Mac'.
At this point, you can be pretty sure that if the "Hello Cyberspace"
application did not run, it was because of one of the five reasons above.
After all, there is not much that can be wrong with three lines of code.
That is the reason that we are starting so small.
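Several of the five reasons above can be checked mechanically before you ever open a browser. The following is a minimal sketch, not part of the original toolkit: the helper name check_cgi_file() and the script name check_cgi.pl are our own inventions for illustration, and the checks assume a UNIX-style file system where 755 is the expected mode.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return a list of human-readable problems found in a CGI script file.
# An empty list means none of these particular checks failed.
# (check_cgi_file is a hypothetical helper name used for this sketch.)
sub check_cgi_file {
    my ($path) = @_;
    my @problems;

    open(my $fh, '<', $path) or return ("cannot open $path: $!");
    binmode($fh);                      # keep \r characters visible
    my $first = <$fh>;
    close($fh);

    if (!defined $first || $first !~ /^#!/) {
        push @problems, 'first line does not start with #!';
    }
    elsif ($first =~ /^#!\s*(\S+)/ && !-x $1) {
        push @problems, "interpreter $1 is not an executable file on this machine";
    }
    if (defined $first && $first =~ /\r/) {
        push @problems, 'first line contains a carriage return (DOS line endings)';
    }

    # Compare only the permission bits; 0755 means rwxr-xr-x.
    my $mode = (stat($path))[2] & 07777;
    push @problems, sprintf('mode is %04o, expected 0755', $mode)
        unless $mode == 0755;

    return @problems;
}

# Usage: perl check_cgi.pl hello_cyberspace.cgi
if (@ARGV) {
    my @problems = check_cgi_file($ARGV[0]);
    print "PROBLEM: $_\n" for @problems;
    print "No problems found\n" unless @problems;
}
```

This only covers reasons 1, 3, and 5. Whether the server allows CGI execution in a given directory (reason 4) is still a question for your system administrator.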
The next thing to do is to try to get the little application to talk to
external files. Since most likely, you will be using CGI.pm to interpret
incoming form data, we may as well start by talking to CGI.pm. To do that,
you will use the 'use' command.
|
|
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "<HTML>Hello World</HTML>";
use CGI;
|
Try it out and see if it works. If all went well, there should be no change
in the output of your program.
So what could go wrong with that?
For one, the Perl interpreter may not be able to load the requested module.
Suppose you got the following error:
|
|
$ perl hello_cyberspace.cgi
Can't locate CGi.pm in @INC (@INC contains: /usr/local/perl/lib
/usr/local/perl/site/lib .) at hello_cyberspace.cgi line 4.
BEGIN failed--compilation aborted at hello_cyberspace.cgi
line 4.
|
This error clearly notes that Perl was trying to locate CGi... what’s
that... CGi? Didn’t you mean CGI, with a capital I? Well, as you can see, if Perl
cannot load an external module, it will let you know. In this case, it was
a simple typo. However, it could also be a more difficult error to hunt
down.
For example, you can be pretty sure that when you issue a use command on a
Perl module in the standard distribution of Perl, that the Perl interpreter
will be able to find it, barring typos.
Well, on the other hand, if you are trying to locate modules that are not
part of the standard Perl distribution (like the eXtropia Modules) it can
be more difficult because Perl does not know where to look right off the
bat.
Thus suppose you had a directory structure that looked like the following:
|
|
apache
    cgi-bin
        Test
            hello_cyberspace.cgi
        Modules
            eXtropia
                Datasource.pm
|
Now suppose you make the following modifications to your script:
|
|
#!/usr/local/bin/perl
print "Content-type:text/html\n\n";
print "<HTML>Hello World</HTML>";
use CGI;
use Datasource;
|
What do you suppose will happen? Well, you’ll get an error much like the
following:
|
|
$ perl hello_cyberspace.cgi
Can't locate Datasource.pm in @INC (@INC contains:
/usr/local/perl/lib /usr/local/perl/site/lib .) at
hello_cyberspace.cgi line 5.
BEGIN failed--compilation aborted at hello_cyberspace.cgi line 5.
|
The problem is that the Perl interpreter is looking in its default array of
directories, @INC, in which it has been told to expect Perl modules, and
your module, Datasource.pm, is not in any of those directories. That is,
your copy of Datasource.pm is located in apache/cgi-bin/Modules/eXtropia,
but Perl is looking for it in /usr/local/perl/lib, /usr/local/perl/site/lib,
and the current directory (the "." entry).
What you need to do is give Perl some hints as to where it might find your
module. To do that, you use the 'use lib' command modifying your
application to read:
|
|
#!/usr/local/bin/perl
print "Content-type:text/html\n\n";
print "<HTML>Hello World</HTML>";
use lib qw(../Modules/eXtropia);
use CGI;
use Datasource;
|
Now you’ll be able to run your application without a hitch. The
use lib command tells the Perl interpreter to also look in the directory apache/cgi-bin/Modules/eXtropia.
So what if that still did not work? Well, you have two possible problems:
1. Permissions, permissions, permissions! Check that the directories in the
path are all executable by world so that the web server can traverse them
and that the files and directories are readable by world so that the web
server has permission to read them.
2. Some ISPs host all accounts as virtual servers. This means that every
account sees itself as the root server, when in actuality, there is one
root server which has aliases to each account. They may also implement a
CGI wrapper as discussed earlier.
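When a module still cannot be found, it helps to see exactly which directories Perl is searching and whether the file is readable in any of them. Here is a small sketch under our own assumptions: find_module() and the script name find_module.pl are hypothetical names, and the search simply mirrors the way Perl maps Foo::Bar to Foo/Bar.pm under each @INC directory.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Spec;

# Convert a module name such as Foo::Bar to the relative path Perl searches
# for (Foo/Bar.pm), then return the first @INC directory entry containing it.
# (find_module is a hypothetical helper name used for this sketch.)
sub find_module {
    my ($module) = @_;
    my @parts = split /::/, $module;
    return unless @parts;
    $parts[-1] .= '.pm';
    foreach my $dir (@INC) {
        next if ref $dir;              # skip code-reference hooks in @INC
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -r $candidate;
    }
    return;                            # not found, or found but not readable
}

# Usage: perl -I../Modules/eXtropia find_module.pl Datasource
if (@ARGV) {
    print "Directories searched:\n";
    print "  $_\n" for grep { !ref } @INC;
    foreach my $module (@ARGV) {
        my $where = find_module($module);
        print "$module: ", (defined $where ? $where : 'NOT FOUND or not readable'), "\n";
    }
}
```

The -I switch in the usage line adds a directory to @INC for that one run, which makes it a quick way to test a path before committing it to a use lib line. Run the script as the same user the web server runs as if you suspect a permissions problem.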
The problem with paths changing out from under your script is not just a
problem with CGI wrappers or virtual accounts. Some NT web servers such as
IIS, older versions of Netscape, and Website have a different conception of
working directory than other web servers.
For example, some web servers run CGI scripts from the point of view of the
directory containing .conf files. Others run scripts from the perspective
of the cgi-bin alias.
Whatever the case, you can mitigate this problem by using the
chdir() (change directory) command.
To do so, add the following code to your CGI application:
|
|
BEGIN {
chdir('some_absolute_path');
}
|
Note that this block of code should come directly after the first line that
points to the location of Perl so that all your code will be affected by
the directory change.
Essentially, the code changes the working directory to that specified by
'some_absolute_path'. In theory, you will set this equal to the actual
absolute path of the script itself.
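If you would rather not hard-code the absolute path, the core FindBin module can work it out for you. The following is a sketch, assuming a plain CGI setup in which each request starts a fresh Perl interpreter (FindBin is known to be unreliable in persistent environments such as mod_perl). The ../Modules/eXtropia path is taken from the directory layout used earlier in this chapter.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use FindBin qw($Bin);   # $Bin holds the absolute path of the script's directory

BEGIN {
    # Runs at compile time, before any later "use" lines are processed.
    chdir($Bin) or die "Cannot chdir to $Bin: $!";
}

# Build the library path from the script's own location, so it no longer
# depends on what the web server considers the working directory.
use lib "$Bin/../Modules/eXtropia";

print "Content-type: text/plain\n\n";
print "Script directory:  $Bin\n";
print "Working directory: ", getcwd(), "\n";
```

Run from the browser, the two directories printed should match, whatever the web server's own idea of the working directory was when it launched the script.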
Virtual servers are more secure for the ISPs, so they prefer them. The use
of virtual servers also allows you to have your own domain name instead of
the domain name of the ISP, so they are also nice for you. However, they can
cause lots of problems when trying to install applications that need to
talk to other files on the file system (like hello_cyberspace.cgi needs to
talk to Datasource.pm). Specifically, virtual servers can occasionally get
kind of screwy when it comes to what path is the "real" path (especially
Windows servers).
It is possible that the path that you see from the command line may be
totally different from what the web server sees when it runs. Thus, what
you may see as:
|
|
domainname/cgi-bin/hello_cyberspace.cgi
|
the web server may see as:
|
|
/usr/local/etc/httpd/cgi-bin/hello_cyberspace.cgi
|
And when you tell it to use something like ./Library/Datasource.pm, the web server may look for:
|
|
/usr/local/etc/httpd/cgi-bin/Library/Datasource.pm
|
instead of:
|
|
domainname/cgi-bin/Library/Datasource.pm
|
The solution is to ask your system administrator what path you should use
when loading files into your CGI application. Another way to find out what
path the web server is using is to use Cwd. You can try adding the following lines to your application:
|
|
#!/usr/local/bin/perl
print "Content-type: text/plain\n\n";
use Cwd;
my $dir = getcwd();
print "$dir\n\n";
|
This should echo back the current working directory as seen by your
web server. This path will help you determine what you need to type
in order to get your application to access a supporting file like
Datasource.pm. But remember that you can always just work this out
with your system administrator. That is what she is there for. That
is what you pay her to do.
Once you have successfully loaded CGI.pm, you can use it to make sure that your CGI application is actually getting
the information from the browser that it is supposed to get.
NOTE: Since the usage of CGI.pm is covered in depth in
Lincoln Stein's book and web site, we won't bother to
explain its usage. If you are not sure what $cgi->param()
is, just do some reading. It is a quick chapter and pretty
straightforward. You can also use perldoc as was discussed earlier.
To do so, we can add a couple of lines to our little CGI application.
|
|
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
use CGI;
my $cgi = new CGI();
my $param;
print "<HTML>";
foreach $param ($cgi->param()) {
print "$param = " . $cgi->param($param) . "<BR>\n";
}
print "Hello Cyberspace";
print "</HTML>";
|
So what did we add to the application? We simply added a small foreach loop
that goes through each of the incoming form variables stored by the CGI
object and prints out the name and value of the form variable.
If you try to run this from a web browser, you will need to pass in some
parameters. To do so, just use a URL-encoded string such as:
http://www.yourdomain.com/cgi-bin/hello_cyberspace.cgi?fname=Selena&lname=Sol
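Typing URL-encoded strings by hand gets error-prone as soon as a value contains a character such as @ or a space. Here is a minimal sketch of building such a test URL with core Perl only. The helper names url_encode() and query_string() are our own, the base URL is the placeholder used earlier in this chapter, and the sample values are illustrative.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Percent-encode everything except the characters RFC 3986 calls unreserved.
# Assumes the text is plain bytes (no wide characters).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build "name=value&name=value" from a list of name/value pairs,
# keeping the pairs in the order given.
sub query_string {
    my @pairs = @_;
    my @encoded;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @encoded, url_encode($name) . '=' . url_encode($value);
    }
    return join('&', @encoded);
}

my $base = 'http://www.yourdomain.com/cgi-bin/hello_cyberspace.cgi';
my $url  = $base . '?' . query_string(
    fname => 'Selena',
    lname => 'Sol',
    email => 'selena@extropia.com',
);
print "$url\n";
```

Paste the printed URL into your browser's location bar to send all three parameters to the application. Note that the @ in the email address comes out as %40.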
You may also be interested in running this script from the command line.
However, when you try it out, you’ll be in for a surprise. Instead of
running the application straight through, it will pause.
What you'll see is something like:
|
|
$ perl hello_cyberspace.cgi
(offline mode: enter name=value pairs on standard input)
|
What has happened is that CGI.pm has detected that you are running the
application from the command line rather than from a browser and will give
you the opportunity to input the form NAME/VALUE pairs that would be coming
in if the application had been called from the web.
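The application will not run. It will just sit there. In fact, it is
waiting for you to enter name value pairs and then hit
CTRL-D (or CTRL-Z on Windows) to continue. (Newer versions of CGI.pm pause
like this only when the module is loaded with use CGI qw(-debug); by
default they read name=value pairs from the command-line arguments instead.)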
If you type in some parameters then hit the CTRL sequence, you’ll get the
intended results:
|
|
$ perl hello_cyberspace.cgi
Content-type: text/html

(offline mode: enter name=value pairs on standard input)
fname=Selena
lname=Sol
email=selena@extropia.com [ HIT THE CTRL-D (or CTRL-Z) HERE]
<HTML>fname = Selena<BR>
lname = Sol<BR>
email = selena@extropia.com<BR>
Hello Cyberspace</HTML>
|
You should consult the documentation for CGI.pm if you want more information
on how to debug CGI applications more efficiently from the command line.
Most systems should have installed the documentation already. Thus, you
should be able to get the documentation by typing the following:
|
|
$ perldoc CGI
|
This little foreach loop is an invaluable tool when you want to check to
see what the application thinks its variables are. While debugging, you can
always temporarily add this foreach loop to zip through the current
variables and check to see what they are. It may be that one of the
following problems has occurred:
1. You have accidentally overwritten a variable.
2. The application has lost some values for variables you thought it had.
3. The application never received variables that it needs.
Often one forgets to pass state information from page to page via hidden
variables. If you forget to add state information to every HTML page, it is
easy to lose it along the way. Most of the time, that state information is
crucial. So anytime you have a CGI application that utilizes several
screens of info, you should print out your variables when debugging to make
sure they are all getting passed back to the application.
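One way to avoid dropping state is to generate the hidden fields from a single data structure instead of typing them into every page. The following is a sketch only: html_escape() and hidden_fields() are hypothetical helper names, and the %state hash stands in for whatever values your application needs to carry forward.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Escape the characters that would break out of an HTML attribute value.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn a hash of state values into hidden form fields, one per line,
# sorted by name so the output is the same on every run.
sub hidden_fields {
    my ($state) = @_;
    return join '', map {
        sprintf qq{<INPUT TYPE="hidden" NAME="%s" VALUE="%s">\n},
            html_escape($_), html_escape($state->{$_});
    } sort keys %$state;
}

# Hypothetical state carried over from the previous screen.
my %state = (fname => 'Selena', lname => 'Sol');
print hidden_fields(\%state);
```

Printing the result of hidden_fields() inside every form the application generates means a variable can only go missing if it never made it into the hash, which narrows the search considerably.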
Oh, and one more thing -- you can also get a listing of the current
environment variables by adding the following foreach loop:
|
|
foreach my $environment_variable (sort keys %ENV) {
    print "$environment_variable = $ENV{$environment_variable}<BR>\n";
}
|
So what happens if you introduce logical errors into the application while
you are debugging? Worse yet, what if there are 1000 lines of code and you
are not sure where the error is because you were coding excitedly and
jumping back and forth through sections without constantly checking
yourself to see what you did?
Well, this is actually pretty common and there are quite a few ways to go
about finding the error depending on your taste.
The first and most common way to check to see where an application is
failing is to run it from the command line because the command line will
give you much more information than the web browser when you are trying to
debug.
Perl makes it very easy for you to check the syntax of your CGI application
by offering you a syntax checker. In order to check the syntax of your CGI
application, simply type the following from the command line:
|
|
$ perl -c hello_cyberspace.cgi
|
Of course, if executing the code has no effect other than producing output,
you can also just try running the application itself, without the -c
switch, using the following command:
|
|
$ perl hello_cyberspace.cgi
|
Perl will attempt to execute your CGI application and will output errors if
there are any. Perl sends back a good deal of useful information about your
problem. Typically, it will do its best to analyze what the problem was as
well as give you a line number so that you can look into the problem
yourself.
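If an application is spread across several files, it can be handy to run the syntax checker over all of them in one go. Here is a small sketch that wraps perl -c. The helper name syntax_check() and the script name check_syntax.pl are our own, and the sketch assumes file names without quote characters in them.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Run "perl -c" on a file using the same interpreter that runs this script.
# Returns (1, output) if the syntax check passed and (0, output) if not.
# Note that -c still runs BEGIN blocks and "use" lines; it only skips the
# execution of the main program.
sub syntax_check {
    my ($file) = @_;
    my $perl   = $^X;                        # path of the running interpreter
    my $output = `"$perl" -c "$file" 2>&1`;  # capture STDERR along with STDOUT
    my $ok     = ($? == 0) ? 1 : 0;
    return ($ok, $output);
}

# Usage: perl check_syntax.pl hello_cyberspace.cgi Modules/eXtropia/Datasource.pm
foreach my $file (@ARGV) {
    my ($ok, $output) = syntax_check($file);
    print $ok ? "OK:   $file\n" : "FAIL: $file\n$output";
}
```

For each file that fails, the output includes Perl's own error text, line numbers and all.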
If you are testing taint-mode-enabled applications, make sure you use perl
-T when running applications from the command line or else you'll get the
error:
|
|
Too late for "-T" option at mlm.cgi line 1.
|
Thus, you might use something like the following:
|
|
$ perl -T hello_cyberspace.cgi
|
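Once taint mode is on, the errors you will meet most often come from using form data in a file name or system call without cleaning it first. The sketch below shows the usual remedy, which is to capture the acceptable part of the value with a regular expression. The helper name untaint_filename() and the sample inputs are our own, and the pattern is deliberately strict. The script itself runs with or without -T.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return an untainted copy of $value if it looks like a plain file name
# (starts with a letter, digit, or underscore, followed by letters, digits,
# underscores, dots, or hyphens), or undef otherwise.
# A regular-expression capture is the way Perl lets you untaint data.
sub untaint_filename {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A(\w[\w.\-]*)\z/ ? $1 : undef;
}

# ${^TAINT} is true when the interpreter was started with -T.
print "Taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

for my $input ('report.txt', '../../etc/passwd', 'a; rm -rf /') {
    my $clean = untaint_filename($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

Only the first of the three sample inputs is accepted. The other two contain characters outside the allowed set, so the function returns undef for them.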
Assuming that your system administrator has given you access to the log
files, another useful debugging tool is the error log of the web server you
are using. This text file lists all of the errors that have occurred while
the web server has been processing requests from the web. Each time your
CGI application produces an error, the web server adds a log entry.
If your system administrator does not allow access to the error log, you
may ask her to email you a version with only errors related to your work.
She can create such a version by using the grep command and it should not
be too difficult.
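On the other hand, if you do have access to the error log, it can usually
be found in the logs directory under the main web server root.
For example, on some Apache servers it can be found in the following
directory (the location varies by installation and is set by the ErrorLog
directive in the server configuration):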
|
|
/usr/local/etc/httpd/logs
|
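A busy error log can be long, so it is worth filtering it down to the entries that mention your application. The following sketch does in Perl what grep would do from the shell. The helper name log_lines_for() and the script name errors_for.pl are our own, and the simple substring match assumes that the server writes the script's name into each relevant log entry.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return the lines from an open filehandle that mention the given script name.
sub log_lines_for {
    my ($fh, $script) = @_;
    my @matches;
    while (my $line = <$fh>) {
        push @matches, $line if index($line, $script) >= 0;
    }
    return @matches;
}

# Usage: perl errors_for.pl /path/to/error_log hello_cyberspace.cgi
if (@ARGV == 2) {
    my ($log, $script) = @ARGV;
    open(my $fh, '<', $log) or die "Cannot open $log: $!";
    my @lines = log_lines_for($fh, $script);
    close($fh);
    # Keep only the most recent 20 matching entries.
    splice(@lines, 0, @lines - 20) if @lines > 20;
    print @lines;
}
```

Reload the failing page in your browser, then run the script straight away. The newest entry is printed last and is usually the one you want.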
In Teach Yourself CGI Programming in Perl by Eric Herrmann and in CGI
Programming on the World Wide Web by Shishir Gundavaram you can read about
a method to test your CGI applications using telnet. We recommend reading
these texts if you have the chance. In the meantime, here is a quick
explanation.
If you are able to use the TELNET program to contact your web server, you
can view the output of your CGI application by pretending to be a web
browser. This makes it easy to see "exactly" what is being sent to the
web browser.
The first step is to contact the web server using the telnet command:
|
|
telnet www.yourdomain.com 80
|
Typically, web servers are located on port 80 of your server hardware.
Thus, for most of you, you need only contact port 80 on the server. Once
you have established a connection with the HTTP server, you may formulate a
GET request:
|
|
GET /cgi-bin/test.cgi HTTP/1.0
|
Press Enter twice after typing the request, because the server waits for a
blank line before it responds. This command tells the server to send you
the output of the requested document, which in this case is a CGI
application. After your GET request, the web server will execute your CGI
application and send back the results.
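If telnet is not available, the same conversation can be scripted in a few lines of Perl using the core IO::Socket::INET module. This is a sketch under our own assumptions: build_request() and raw_fetch() are hypothetical helper names, the script name raw_fetch.pl is illustrative, and the Host header is added because servers that host several domains generally need it to pick the right site.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.0 request. The blank line at the end (the second
# CRLF pair) tells the server that the request headers are finished.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Send the request and return everything the server sends back,
# status line and headers included.
sub raw_fetch {
    my ($host, $path, $port) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port || 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "Cannot connect to $host: $@";
    print $socket build_request($host, $path);
    local $/;                          # slurp mode: read until the server closes
    my $response = <$socket>;
    close($socket);
    return $response;
}

# Usage: perl raw_fetch.pl www.yourdomain.com /cgi-bin/test.cgi
print raw_fetch(@ARGV) if @ARGV >= 2;
```

Unlike a browser, this prints the status line and every header exactly as the server sent them, which is the part you usually need when a CGI application misbehaves.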
Another method that you can use to find out where a logical error is when
it is not a "syntax error" but an HTTP error is to use
print "Content-type: text/html\n\ntest"; exit;. An HTTP error causes the
dreaded "document contains no data" error (which most servers report as a
500 Internal Server Error rather than a 404) that the command line and
error logs won't necessarily help with. The application will run fine from
the command line, but it won't run from the web.
Look at the hello world application with a couple of minor changes:
|
|
#!/usr/local/bin/perl
use CGI;
my $cgi = new CGI();
my $param;
foreach $param ($cgi->param()) {
print "$param = " . $cgi->param($param) . "<BR>\n";
}
print "Content-type: text/html\n\n";
print "Hello World<P>";
print "</HTML>";
|
When you run this application, you will get a "document contains no data"
(or 500 Internal Server Error) response because the program has sent text
to the browser (the variable names and values) before it has sent
the magic HTTP header line "Content-type: text/html\n\n". But
how would you find out that this is a problem?
The solution is to use the print "Content-type: text/html\n\ntest"; exit;
line to walk through your routine one step at a time to discover at
which point the problem begins. Let's try it.
|
|
#!/usr/local/bin/perl
print "Content-type: text/html\n\ntest";exit;
use CGI;
my $cgi = new CGI();
my $param;
foreach $param ($cgi->param()) {
print "$param = " . $cgi->param($param) . "<BR>\n";
}
print "Content-type: text/html\n\n";
print "Hello World<P>";
|
That is going to work just fine. The web browser will read "test" and we
will know that the error is not being caused by the first line of the
application. Notice that because we use the exit() function,
Perl will stop executing the application so we will not get any of the
other info.
Next, let's move the testing line down...
|
|
#!/usr/local/bin/perl
use CGI;
my $cgi = new CGI();
print "Content-type: text/html\n\ntest";exit;
my $param;
foreach $param ($cgi->param()) {
print "$param = " . $cgi->param($param) . "<BR>\n";
}
print "Content-type: text/html\n\n";
print "Hello World<P>";
|
That is going to work just fine too. You’re getting bold there jumping two
lines at a time, but when you actually use this method, you can feel free
to jump entire routines if you are sure they are not the cause of the bug.
Just don't jump too many at once.
Okay, now let's dump the line into the foreach loop.
|
|
#!/usr/local/bin/perl
use CGI;
my $cgi = new CGI();
my $param;
foreach $param ($cgi->param()) {
print "Content-type: text/html\n\ntest";exit;
print "$param = " . $cgi->param($param) . "<BR>\n";
}
print "Content-type: text/html\n\n";
print "Hello World<P>";
|
That works too. Remember to pass some variables as URL encoded data as
shown above.
Finally, we move the line to the end of the foreach loop and we see that we
get the "document contains no data" (or 500 Internal Server Error) problem.
|
|
#!/usr/local/bin/perl
use CGI;
my $cgi = new CGI();
my $param;
foreach $param ($cgi->param()) {
print "$param = " . $cgi->param($param) . "<BR>\n";
print "Content-type: text/html\n\ntest";exit;
}
print "Content-type: text/html\n\n";
print "Hello World<P>";
|
That is it. We’ve just discovered where the bug was. We can bonk ourselves
on the head and say, "Of course, the HTTP header MUST be the first thing
printed to the browser!"
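The same check can be made without moving a test line around, by capturing what the application prints and looking at the first thing in it. The following is a simplified sketch: starts_with_header() and the script name check_header.pl are our own names, and the test assumes that Content-type is the first header the application sends (an application that prints a Status or Location header first would need a looser pattern).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# True if the text begins with a Content-type header followed by a blank
# line, which is what the web server expects from a CGI application.
sub starts_with_header {
    my ($output) = @_;
    return $output =~ /\AContent-type:[ \t]*\S[^\n]*\r?\n\r?\n/i ? 1 : 0;
}

# Usage: perl hello_cyberspace.cgi fname=Selena > output.txt
#        perl check_header.pl output.txt
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $output = do { local $/; <$fh> };   # read the whole file at once
    close($fh);
    $output = '' unless defined $output;
    print starts_with_header($output)
        ? "Header looks fine\n"
        : "Something was printed before the Content-type header\n";
}
```

This tells you that there is a header problem, not where it is, so the step-by-step method above is still the way to pin down the offending line.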
Data::Dumper is an exceptionally cool Perl module that allows you to easily
print out the current state of any standard Perl data structure. Though
there are many features available with Data::Dumper, and although there are
many ways to use it, we generally prefer the simple approach when debugging.
Specifically, we use the syntax:
|
|
use Data::Dumper;
# First array ref: the values to dump. Second array ref: the names to label them with.
print Data::Dumper->Dump([$object_name],[*type_glob_name]);
|
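To make that concrete, here is a small sketch that dumps a nested data structure as plain text. The $datasource hash is a made-up stand-in for a real application object, and the two configuration variables are optional settings from the Data::Dumper module that we find make the output easier to compare between runs.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in a stable, sorted order
$Data::Dumper::Indent   = 1;   # use a simple fixed indentation

# Hypothetical data structure standing in for an application object.
my $datasource = {
    type   => 'File',
    fields => [ 'fname', 'lname', 'email' ],
};

# The second array reference supplies the name used to label the output.
my $dump = Data::Dumper->Dump([$datasource], ['datasource']);

# Send the dump as plain text so the browser does not reflow it.
print "Content-type: text/plain\n\n";
print $dump;
```

Using text/plain keeps the line breaks and indentation intact in the browser. If you are dumping into an HTML page instead, wrap the output in PRE tags.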
Finally, note that you can always get more detailed documentation on
Data::Dumper by using perldoc from the command line:
|
|
$ perldoc Data::Dumper
|
When you are working with objects, it can sometimes be difficult to use the
print "Content-type: text/html\n\ntest"; exit; method because object
relationships can often get very complex. A single
call from an application executable may seem simple enough, but it may open
a complex set of object relationships.
Thus, moving the debug line from one line of code to another can be a
little misleading about where the error is occurring. As a result, Perl
offers several useful debugging tools that are tuned to the needs of
object-oriented programming. These are die(), which is built into Perl,
and croak() and confess(), which come with the Carp module.
From a debugging perspective, you can
use the following guidelines to determine which tool to use. Use
die() for shallow errors, such as when you are editing the
application executable or the primary application object.
Use croak() or confess() if you are debugging
modules such as eXtropia drivers: croak() reports the error from the
point of view of the code that called the module, and confess() adds a
full stack backtrace.
NOTE: Within the context of debugging web applications,
you should add the fatalsToBrowser pragma in CGI::Carp
so that errors will be sent to the browser in their full
text form. For example, you should use:
use CGI::Carp qw(fatalsToBrowser);
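The difference between the three is easiest to see side by side. In the sketch below, My::Driver is a made-up package standing in for a module such as an eXtropia driver, and each of its three subroutines fails in a different way. The errors are trapped with eval so that all three messages can be printed in one run.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical module standing in for something like an eXtropia driver.
package My::Driver;
use Carp qw(croak confess);

sub open_die     { die     "no file name given" }   # reports this line
sub open_croak   { croak   "no file name given" }   # reports the caller's line
sub open_confess { confess "no file name given" }   # adds a full stack trace

package main;

my %message;
for my $style (qw(die croak confess)) {
    my $sub = My::Driver->can("open_$style");
    eval { $sub->() };                 # trap the error instead of exiting
    $message{$style} = $@;
    print "--- $style ---\n$message{$style}\n";
}
```

Compare the line numbers in the three messages: die() points inside the module, croak() points at the line in the main program that made the call, and confess() lists every call in between.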
Well, that's all folks. If you are comfortable with the debugging tools
outlined here and you are ready to get your mindset in gear, then you
should have no worries. Think of CGI debugging as fun. In fact, to get
practice, try going to a CGI discussion forum like the one at http://www.extropia.com/cgi-bin/prod/BBS/Scripts/bbs_entrance.cgi
and helping people solve their problems. You will not only hone your own
skills, but make the CGI community a happier group to be a part of. Good
luck.
Master Copy URL: http://www.extropia.com/support/docs/adt/
Copyright © 2000-2001 Extropia. All rights reserved.
Written by eXtropia. Last Modified at 10/19/2001