Improper Output Handling


 Project: WASC Threat Classification

 Threat Type: Weakness

Reference ID: WASC-22

 


 

Output handling refers to how an application generates outgoing data. If an application handles output improperly, the data consumer may interpret that data in ways that lead to vulnerabilities and actions never intended by the application developer. In many cases, this unintended interpretation is classified as one or more forms of critical application vulnerability.

 

Any location where data leaves an application boundary may be subject to improper output handling. Application boundaries exist where data leaves one context and enters another. This includes applications passing data to other applications via web services, sockets, the command line, environment variables, and so on. It also includes passing data between tiers within an application architecture, such as a database, directory server, HTML/JavaScript interpreter (browser), or operating system. More detail on where improper output handling can occur can be found in the section below titled "Common Data Output Locations".

 

Improper output handling may take various forms within an application. These forms can be categorized into protocol errors, application errors, and data-consumer-related errors. Protocol errors include missing or improper output encoding or escaping, and the output of invalid data. Application errors include logic errors such as outputting incorrect data or passing on malicious content unfiltered. Data-consumer-related errors arise when the application does not properly distinguish legitimate content from illegitimate content, or does not work around known vulnerabilities in the data consumer; the result may be abuse of the data consumer caused by improper output handling.

 

An application that does not provide data in the correct context may allow an attacker to abuse the data consumer. This can lead to specific threats referenced within the WASC Threat Classification, including Content Spoofing [6], Cross-Site Scripting [7], HTTP Response Splitting [8], HTTP Response Smuggling [9], LDAP Injection [10], OS Commanding [11], Routing Detour [12], SOAP Array Abuse [13], URL Redirector Abuse [14], XML Injection [15], XQuery Injection [16], XPath Injection [17], Mail Command Injection [18], Null Byte Injection [19] and SQL Injection [20].

 

Proper output handling prevents the unexpected or unintended interpretation of data by the consumer. To achieve this objective, developers must understand the application's data model, how the data will be consumed by other portions of the application, and how it will ultimately be presented to the user. Techniques for ensuring the proper handling of output include, but are not limited to, the filtering and sanitization of data (more detail on output sanitization and filtering can be found in the appropriately titled sections below). However, inconsistent use of the selected output handling techniques may actually increase the risk of improper output handling if some output data is overlooked or left untreated. To ensure "defense in depth", developers must assume that all data within an application is untrusted when choosing appropriate output handling strategies.

 

While proper output handling may take many different forms, an application cannot be secure unless it protects against unintended interpretations by the data consumer. This core requirement is essential for an application to securely handle output operations.

 

Common Data Output Locations

Depending on where user-controllable output is placed, various attacks can be executed. OWASP has a Cheat Sheet [4] outlining mitigations at the various stages of output. Listed below are several of the most common data output locations.

 

Inside HTTP Headers

HTTP headers exist in both the HTTP Request and HTTP Response and define various characteristics of the client and the requested resource. Attacks against HTTP headers typically involve the injection of Carriage Return/Line Feed (CR/LF) sequences in order to change the HTTP message structure. By changing the message structure it is possible to abuse both clients (e.g. browsers) and servers (application servers, proxies, and web servers). Notable attacks include HTTP Response Splitting [8], HTTP Response Smuggling [9], and URL Redirector Abuse [14].
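
As a minimal sketch of guarding header output, the following Python function refuses any value containing a carriage return, line feed, or NUL before it is written into a response header. The name safe_header_value and the decision to reject rather than strip are assumptions made for illustration; they are not taken from any particular framework.

```python
def safe_header_value(value: str) -> str:
    """Return value unchanged if it is safe to place in an HTTP header.

    Hypothetical helper: rejects CR, LF and NUL, the characters that could
    end the header line early and change the message structure.
    """
    if any(ch in value for ch in ("\r", "\n", "\x00")):
        raise ValueError("header value contains a forbidden control character")
    return value


# A redirect target built from user input is a typical injection point.
location = safe_header_value("/account/home")
```

Rejecting the value outright avoids guessing what the caller meant; a stripping variant would also be possible but can hide an attack attempt.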

 

Inside HTML Tags

Text between HTML tags, in the form <tag>text</tag>, is usually treated by the browser as text to be displayed to the user. If data is included in this text and is not properly escaped, the data may be unintentionally treated as HTML markup and lead to vulnerabilities. Data reflected into tags such as <script> and <style> requires additional care to prevent the introduction of additional vulnerabilities. Notable attacks include Cross-Site Scripting [7], Cross-Site Request Forgery [25], and Content Spoofing [6].
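
The escaping step for text between tags can be sketched with Python's standard html module. The function name render_comment and the surrounding <p> element are illustrative assumptions.

```python
import html


def render_comment(comment: str) -> str:
    """Place untrusted text between tags so it is displayed, not parsed."""
    # html.escape replaces &, < and > (and, by default, quotes) with
    # character entity references.
    return "<p>" + html.escape(comment) + "</p>"


print(render_comment("<script>alert(1)</script>"))
# prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```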

 

Inside HTML Attributes

Tag attribute content, in the form <tag attr="text">, is another common insertion point for application data in web applications. HTML attribute data always requires escaping to avoid the data being inadvertently treated as HTML markup. Many attributes have special meaning and require additional attention to avoid introducing vulnerabilities. For example, the "href" attribute, even if properly encoded, will be treated as a script if it starts with "javascript:" (e.g. <a href="javascript:...">link</a>). The "href", image "src", form "action", and other URL attributes may also be exploited to create cross-site request forgery attacks. The Web Application Security Consortium's Script Mapping Project [21] was created in an attempt to map out the script execution behaviors of particular HTML attributes. Notable attacks include Cross-Site Scripting [7], Cross-Site Request Forgery [25], and Content Spoofing [6].
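
Because escaping alone does not neutralize a "javascript:" URL, a URL attribute needs a scheme check in addition to escaping. The sketch below shows one way to combine the two in Python; the function render_link and the ALLOWED_SCHEMES policy are hypothetical choices for this example, and a real application may need a stricter policy.

```python
import html
from urllib.parse import urlsplit

# Assumed policy for this sketch: only web links and relative URLs are allowed.
ALLOWED_SCHEMES = {"", "http", "https"}


def render_link(url: str, label: str) -> str:
    """Build an anchor tag, rejecting script-bearing URL schemes."""
    url = url.strip()
    # Browsers may ignore embedded control characters, so refuse them outright.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        raise ValueError("control character in URL")
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError("URL scheme not allowed: " + repr(scheme))
    # quote=True also escapes " and ' so the value cannot close the attribute.
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(label)
    )
```

With this policy, render_link("javascript:alert(1)", "x") raises ValueError, while a relative URL such as "/profile?id=1&tab=2" is emitted with its ampersand escaped.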

 

Inside Client-side Script

Although <script> is only one of many HTML tags, application data placed inside <script> tags deserves special attention. Applications that include data as script variable content must quote and escape it, or in some other way ensure that the text is treated as data and not as executable script; otherwise they risk introducing a variety of attacks. Even when data is properly escaped, it may eventually be passed to a standard VBScript or JavaScript function such as "eval", which may lead to cross-site scripting and other attacks. Notable attacks include Cross-Site Scripting [7], Cross-Site Request Forgery [25], and Content Spoofing [6].
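
One way to quote and escape data destined for a script variable is to serialize it as a JSON literal and then escape the characters an HTML parser could still act on. The sketch below assumes server-side Python; the helper name js_literal is hypothetical.

```python
import json


def js_literal(value) -> str:
    """Serialize value as a JavaScript literal for use inside <script>."""
    text = json.dumps(value)  # quotes and backslashes are escaped here
    # Escape characters the HTML parser could act on before the script runs.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))


page = ("<script>var name = "
        + js_literal("</script><script>alert(1)//")
        + ";</script>")
```

The resulting page contains only the one closing </script> tag written by the template, so the injected text stays inside the string literal.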

 

Inside XML Messages

XML is ubiquitous and can be found at almost every layer of web applications, including web service messages, XHTML, XSL transforms, AJAX messages, and object serialization. Application data inserted into XML requires escaping or it risks being treated as XML markup, in much the same way as in HTML. Additionally, even when data is properly encoded, some XML message types give certain attributes and content special meaning that may be interpreted in such a way as to lead to a vulnerability. Notable attacks include XML Injection [15], SOAP Array Abuse [13], XML External Entities, XML Entity Expansion, and XML Attribute Blowup.
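
The escaping requirement for XML can be illustrated with the helpers in Python's standard xml.sax.saxutils module. The element and attribute names below (item, name) and the function build_item are made up for the example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_item(name: str, note: str) -> str:
    """Insert untrusted data into an XML element and attribute."""
    # quoteattr escapes the value and adds the surrounding quotes;
    # escape handles &, < and > in element content.
    return "<item name={}>{}</item>".format(quoteattr(name), escape(note))


print(build_item("widget", "5 < 6 & 7"))
# prints: <item name="widget">5 &lt; 6 &amp; 7</item>
```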

 

Inside SQL Queries

Web applications are often backed by relational databases to persist and report on data. Applications must ensure that SQL queries built from user-influenced data do not allow that data to be interpreted as instructions to the database. Notable attacks include SQL Injection [20].
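
A common way to keep user-influenced data from being interpreted as SQL is a parameterized query. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the users table and the find_user function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user(conn, name):
    """Look up a user; the ? placeholder keeps name as data, never as SQL."""
    cur = conn.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cur.fetchall()


print(find_user(conn, "alice"))        # [('alice', 'admin')]
print(find_user(conn, "' OR '1'='1"))  # []
```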

 

Inside JavaScript Object Notation (JSON) Messages

JSON is a data serialization format derived from the JavaScript language that is often used by Ajax developers. Because JSON messages are typically turned into objects with the JavaScript eval() function, a compromise of the DOM is likely if an attacker can influence the content or structure of a JSON message. All dynamic data needs to be properly sanitized prior to being included within a JSON message. In particular, single and double quotes need to be escaped when placed in keys or values to ensure the message structure cannot be compromised. Notable JSON attacks include Cross-Site Scripting [7], Cross-Site Request Forgery [25], and Content Spoofing [6].
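
Rather than assembling a JSON message by string concatenation, a serializer can be left to escape quotes in keys and values. The following sketch uses Python's json module; the function build_message and the field names are hypothetical.

```python
import json


def build_message(user: str, comment: str) -> str:
    """Serialize untrusted values; json.dumps escapes quotes and backslashes."""
    return json.dumps({"user": user, "comment": comment})


# An attempt to smuggle an extra key stays inside the "user" string value.
msg = build_message('bob", "admin": "true', "hi")
decoded = json.loads(msg)
print(sorted(decoded))  # ['comment', 'user']
```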

 

Inside Cascading Style Sheets (CSS)

Cascading style sheets (CSS) are typically utilized as external references for formatting the appearance of HTML pages. It is also common practice to auto-generate CSS and apply it to the page via the "style" HTML element or attribute. User-influenced data included within CSS should be explicitly sanitized to prevent the injection and execution of user-controlled CSS content. Notable attacks include Cross-Site Scripting [7], Cross-Site Request Forgery [25], and Content Spoofing [6].
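
For generated CSS, one conservative approach is to accept only values that match a narrow expected pattern. The sketch below assumes the only user-controlled value is a six-digit hexadecimal colour; the names HEX_COLOR, css_color, and style_block are illustrative.

```python
import re

# Assumption: the application only lets users choose a #rrggbb colour.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def css_color(value: str, default: str = "#000000") -> str:
    """Return value if it is a plain hex colour, otherwise a safe default."""
    return value if HEX_COLOR.fullmatch(value) else default


def style_block(user_color: str) -> str:
    """Generate a style element using only the validated colour."""
    return "<style>body { color: " + css_color(user_color) + "; }</style>"
```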

 

Character Set and Encoding Considerations

For a client to safely interpret data, the server must explicitly specify the appropriate character set [28]. A common mistake is failing to declare a character set either within the HTML content (in the meta 'content' attribute) or in the HTTP 'Content-Type' response header. In 2005 an XSS vulnerability was discovered in a major website [27] because it did not specify a character set/encoding [28] such as UTF-8. Due to the content inspection behavior of browsers such as Internet Explorer, an attacker was able to inject UTF-7 into a web page lacking a charset and execute a malicious payload without using the usual metacharacters. Before outputting user-controlled data to a consumer, ensure that the appropriate charset/encoding is specified.
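
As a small sketch of declaring the encoding explicitly, the function below encodes a page as UTF-8 and states that charset in the Content-Type header, so the browser has no reason to guess. The function name html_response and the header-list return shape are assumptions, not a specific server API.

```python
def html_response(body: str):
    """Return (headers, payload) with the character set stated explicitly."""
    payload = body.encode("utf-8")
    headers = [
        # The charset parameter tells the client how to decode the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    return headers, payload


headers, payload = html_response("<p>\u00ff</p>")
```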

 

Unicode and Internationalization

Most Unicode abuses involve attacking either how the data is visualized when presented to the user or how the data is transformed. Extensive information on Unicode visualization and transformation based attacks can be found in [29] and [31]. Notable Unicode attacks include Content Spoofing and Directory Traversal.
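
A transformation issue can be illustrated with compatibility normalization: fullwidth angle brackets pass through an HTML escaper untouched, but become ordinary "<" and ">" if the text is normalized afterwards. The sketch below, which assumes the application applies NFKC normalization somewhere in its pipeline, normalizes first and escapes last; the function name is hypothetical.

```python
import html
import unicodedata


def normalize_then_escape(text: str) -> str:
    """Apply NFKC normalization before escaping, never after."""
    return html.escape(unicodedata.normalize("NFKC", text))


fullwidth = "\uff1cscript\uff1e"  # fullwidth < and > around "script"
print(html.escape(fullwidth) == fullwidth)  # True: the escaper sees nothing to escape
print(normalize_then_escape(fullwidth))     # &lt;script&gt;
```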

 

Output Sanitization

Output sanitization can be performed by transforming data from its original form to an acceptable form either by removal of that data, or by encoding or decoding it. Common encoding methods used in web applications include the HTML entity encoding and URL Encoding schemes. HTML entity encoding serves the need for encoding literal representations of certain meta-characters to their corresponding character entity references. Character references for HTML entities are pre-defined and have the format &name;  where "name" is a case-sensitive alphanumeric string.

 

A common example of HTML entity encoding is where "<" is encoded as "&lt;" and ">" is encoded as "&gt;". URL encoding applies to parameters and their associated values that are transmitted as part of HTTP query strings. Characters that are not permitted in URLs are represented by the bytes of their encoded form (typically UTF-8), where each byte is written in hexadecimal as "%HH". For example, "<" is URL-encoded as "%3C" and "ÿ" is URL-encoded as "%C3%BF". Refer to [1] for comprehensive information on character encoding solutions.
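
Both encodings can be tried with Python's standard library. This is only an illustration of the two schemes described above, using html.escape for entity encoding and urllib.parse.quote for URL encoding (which uses UTF-8 by default).

```python
import html
from urllib.parse import quote

print(html.escape("<b>"))  # &lt;b&gt;
print(quote("<"))          # %3C
print(quote("\u00ff"))     # %C3%BF  (the two UTF-8 bytes of "ÿ")
print(quote("a b&c=d"))    # a%20b%26c%3Dd
```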

 

Output Filtering

Output filtering is a decision-making process that leads to either the acceptance or the rejection of output based on predefined criteria. In its most basic form, output filtering compares a data stream with a predefined set of characters to determine acceptability. Acceptable data is passed forward for processing, and unwanted characters are either blocked, stripped, or transformed, thus preventing the application from processing unrecognized and potentially malicious output. There are two major approaches to output filtering [2]: blacklist based filtering, which blocks output that matches a list of known bad characters or patterns, and whitelist based filtering, which accepts only output that matches a list of known good characters or patterns.

 

 

There are advantages and disadvantages to both approaches. Blacklist based filtering is widely used as it is fairly easy to implement, but it offers protection only from known threats. Because the filter blocks only known bad characters, an attacker can craft an attack that avoids those specific characters and so evades the filter. Researchers have demonstrated several ways of evading blacklist based filtering approaches. The XSS cheat sheet [5] and SQL cheat sheet [24] are classic examples of how filter evasion techniques can be used against blacklist based approaches. Both Mitre [22] and NVD [23] host several advisories describing vulnerabilities due to poor blacklist filtering implementations.

 

Whitelist based filtering is often more difficult to implement properly. Although it has proven effective in virus and malware protection techniques, it can be difficult to compile a list of all the good input that a system can accept.
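
The difference between the two approaches can be sketched in a few lines of Python. Both filters below are deliberately simplistic and hypothetical: the blacklist contains two known bad substrings, and the whitelist assumes that acceptable output is a short run of letters, digits, spaces, and a few punctuation characters.

```python
import re

# Naive blacklist: blocks only the exact patterns someone thought of.
BLACKLIST = ("<script", "javascript:")

# Whitelist: describes what is acceptable and rejects everything else.
WHITELIST = re.compile(r"[A-Za-z0-9 _.-]{1,64}")


def blacklist_ok(value: str) -> bool:
    return not any(bad in value for bad in BLACKLIST)


def whitelist_ok(value: str) -> bool:
    return WHITELIST.fullmatch(value) is not None


payload = "<ScRiPt>alert(1)</ScRiPt>"
print(blacklist_ok(payload))  # True: mixed case slips past the blacklist
print(whitelist_ok(payload))  # False: "<" is simply not an allowed character
```

The mixed-case payload shows the weakness described above: the blacklist fails open on anything it does not recognize, while the whitelist fails closed.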

 

A common approach to filtering, validation and sanitization is the use of regular expressions (regex) [23]. Regular expressions provide a concise and flexible means of identifying patterns in a given data set. Many ready-made regular expressions that deal with common input/output related attacks such as SQL Injection [20], OS Commanding [11] and Cross-Site Scripting [7] are available on the Internet. While these regular expressions may be simple to copy into an application, developers using them must check that the expressions actually match the requirements of their expected input streams.

 

For XML based applications, XML Schema Validation [30][32] is a popular approach for applying Input/Output Filtering to XML messages. XML Schemas provide formatting and processing instructions for parsers when interpreting XML documents. Schemas are used for all of the major XML standard grammars coming out of OASIS. A schema file is what an XML parser uses to understand the XML’s grammar and structure, and contains essential preprocessor instructions. Schema Validation is a method of checking to see if an XML document conforms to a set of constraints. Schema Validation used in a security context is often called schema hardening.

 

Commercial companies like Microsoft and open source communities like OWASP have ongoing efforts to provide protection tools against some of the common attacks mentioned above. Microsoft's Anti Cross-Site Scripting Library [26] not only guides its users and developers in putting measures in place to thwart cross-site scripting attacks, but also provides insight into alternatives for proper input and output encoding where its library routines may not apply. OWASP's ESAPI project [3] provides guidelines and primary defenses against SQL Injection attacks. It also provides details on database-specific SQL escaping requirements to help escape/encode user input before concatenating it with a SQL query. SQL escaping, as advocated in ESAPI, uses DBMS character escaping schemes to convert input so that the SQL engine treats it as data instead of code.

 

 

 

References

 

Character encodings in HTML

[1] http://en.wikipedia.org/wiki/Character_encodings_in_HTML

 

Secure input and output handling

[2] http://en.wikipedia.org/wiki/Secure_input_and_output_handling

 

OWASP Enterprise Security API

[3] http://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API

 

OWASP XSS (Cross-Site Scripting) Prevention Cheat Sheet

[4] http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

 

XSS (Cross-Site Scripting) Cheat Sheet

[5] http://ha.ckers.org/xss.html

 

Content Spoofing

[6] http://projects.webappsec.org/Content-Spoofing

 

Cross-Site Scripting

[7] http://projects.webappsec.org/Cross-Site-Scripting

 

HTTP Response Splitting 

[8] http://projects.webappsec.org/HTTP-Response-Splitting

 

HTTP Response Smuggling

[9] http://projects.webappsec.org/HTTP-Response-Smuggling

 

LDAP Injection

[10] http://projects.webappsec.org/LDAP-Injection

 

OS Commanding

[11] http://projects.webappsec.org/OS-Commanding

 

Routing Detour

[12] http://projects.webappsec.org/Routing-Detour

 

SOAP Array Abuse

[13] http://projects.webappsec.org/SOAP-Array-Abuse

 

URL Redirector Abuse

[14] http://projects.webappsec.org/URL-Redirector-Abuse

 

XML Injection

[15] http://projects.webappsec.org/XML-Injection

 

XQuery Injection

[16] http://projects.webappsec.org/XQuery-Injection

 

XPath Injection

[17] http://projects.webappsec.org/XPath-Injection

 

Mail Command Injection

[18] http://projects.webappsec.org/Mail-Command-Injection

 

Null Byte Injection

[19] http://projects.webappsec.org/Null-Byte-Injection

 

SQL Injection

[20] http://projects.webappsec.org/SQL-Injection

 

WASC Script Mapping Project

[21] http://projects.webappsec.org/Script-Mapping

 

CVE at Mitre

[22] http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=blacklist

 

National Vulnerability Database

[23] http://nvd.nist.gov/

 

SQL Cheat Sheet

[24] http://ha.ckers.org/sqlinjection/

 

Cross-Site Request Forgery

[25] http://projects.webappsec.org/Cross-Site-Request-Forgery

 

Microsoft Anti-Cross-Site Scripting Library V3.0

[26] http://www.microsoft.com/downloads/details.aspx?FamilyId=051ee83c-5ccf-48ed-8463-02f56a6bfc09&displaylang=en

 

Google's XSS Vulnerability

[27] http://shiflett.org/blog/2005/dec/googles-xss-vulnerability

 

Character Sets

[28] http://en.wikipedia.org/wiki/Universal_Character_Set

 

CasabaSecurity Unicode Vulnerability and Defense Research

[29] http://www.casabasecurity.com/category/categories/unicode

 

W3C XML Schema

[30] http://www.w3.org/XML/Schema

 

Attacking Internationalized Software

[31] https://www.isecpartners.com/files/iSEC-Attacking_Internationalized_Software.BH2006.pdf

 

XML Document Validation with an XML Schema

[32] http://onjava.com/pub/a/onjava/2004/09/15/schema-validation.html