Context-Aware Output Encoding
Output encoding is the primary defense against XSS, but it must match the specific context in which the data is rendered. HTML encoding is not a universal solution: data placed in JavaScript requires JavaScript encoding, data in URLs requires URL encoding, and data in CSS requires CSS encoding. Applying the wrong encoding for a context can leave an application vulnerable even though every value is encoded. Understanding these contexts and their encoding requirements is therefore essential for effective XSS prevention.
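Here is a small Python illustration of the wrong-encoding risk, using the standard library's html module. It assumes a hypothetical showMessage function called from an onclick attribute. The value is HTML-encoded, which looks safe. However, the browser decodes character references in an attribute value before handing the handler source to the JavaScript engine. In this sketch, html.unescape stands in for that decoding step, so it approximates the browser's behavior rather than simulating it.

```python
import html

# Attacker-supplied value intended to break out of a JavaScript string.
payload = "');alert(1);//"

# HTML-encode it and place it inside a JS string in an event-handler attribute
# (showMessage is a hypothetical function used only for illustration).
encoded = html.escape(payload, quote=True)
markup = f"<button onclick=\"showMessage('{encoded}')\">Show</button>"

# Extract the attribute value (naively; fine for this fixed markup), then decode
# character references as the HTML parser would before the handler runs.
attribute_value = markup.split('onclick="', 1)[1].split('"', 1)[0]
handler_source = html.unescape(attribute_value)

# The encoded quote has turned back into a real quote, so the string ends early
# and alert(1) becomes executable code.
print(handler_source)
```

The HTML encoding was correct for HTML, but this value's final context is a JavaScript string. That context needs JavaScript encoding, applied before the HTML layer.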
For HTML contexts, encode the characters that have special meaning in HTML: less-than (<), greater-than (>), ampersand (&), single quote ('), double quote ("), and forward slash (/). The exact encoding, however, depends on where in the document the data appears. Data placed between tags, as element content, needs basic HTML entity encoding, while data inside attribute values may need more, depending on whether the attribute is quoted. Unquoted attributes require the most aggressive encoding, replacing virtually every non-alphanumeric character, because a single space is enough to end the value and begin a new, attacker-controlled attribute.
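Both HTML cases can be sketched in Python as follows, assuming the standard library's html.escape handles the basic entities. The function names encode_html_content and encode_unquoted_attribute are hypothetical. Writing &#xHH; hexadecimal character references for the unquoted case is one common convention, not the only valid one.

```python
import html


def encode_html_content(value: str) -> str:
    """Encode a value for element content or a quoted attribute value."""
    # html.escape(quote=True) converts & < > " ' into &amp; &lt; &gt; &quot; &#x27;.
    # It leaves '/' alone, so encode that separately as &#x2F;.
    return html.escape(value, quote=True).replace("/", "&#x2F;")


def encode_unquoted_attribute(value: str) -> str:
    """Encode a value for an unquoted attribute (hypothetical helper)."""
    # Without quotes, whitespace or '>' ends the value, so conservatively turn
    # every character other than an ASCII letter or digit into &#xHH;.
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"&#x{ord(ch):X};"
        for ch in value
    )


print(encode_html_content("</b><script>"))
print(encode_unquoted_attribute("x onmouseover=alert(1)"))
```

In the unquoted case the encoded space (&#x20;) is decoded only after the attribute boundaries are parsed, so it can no longer split the value into a second attribute.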
JavaScript contexts present unique challenges because data may need more than one layer of encoding. If server-side code generates JavaScript that includes user data, that data needs JavaScript encoding (escaping quotes, backslashes, and control characters). If that JavaScript then inserts the data into HTML, for example through innerHTML, the data needs HTML encoding as well, applied first so that the JavaScript encoding wraps it. This double-encoding requirement often trips up developers who apply only one layer. Many modern template engines escape output automatically, and some choose the encoding based on context, but no server-side template can see what client-side code later does with a value. Understanding the underlying requirement helps developers use these tools correctly.
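A minimal Python sketch of the layered case follows. It assumes server code that emits a script which later assigns a user-supplied name to innerHTML. The helper names, the element id greeting, and the choice to \u-escape every character outside a small allow-list are all assumptions made for illustration. Note the order: the innermost context (HTML, where the value finally lands) is encoded first, and then JavaScript string encoding wraps the result.

```python
import html


def encode_js_string(value: str) -> str:
    """Escape a value for use inside a quoted JavaScript string (hypothetical helper)."""
    out = []
    for ch in value:
        if ch.isascii() and (ch.isalnum() or ch in " ,._"):
            out.append(ch)  # allow-listed characters pass through unchanged
            continue
        cp = ord(ch)
        if cp > 0xFFFF:
            # JavaScript \u escapes are 16-bit, so characters outside the BMP
            # are written as a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append(f"\\u{0xD800 + (cp >> 10):04x}\\u{0xDC00 + (cp & 0x3FF):04x}")
        else:
            # Quotes, backslashes, control characters, '<', '/' and the rest become
            # \uXXXX, so none can end the string literal or the <script> block.
            out.append(f"\\u{cp:04x}")
    return "".join(out)


def render_greeting_script(name: str) -> str:
    """Build a script that inserts `name` via innerHTML (hypothetical example)."""
    # 1. Innermost context first: innerHTML parses HTML, so HTML-encode.
    html_safe = html.escape(name, quote=True)
    # 2. Outer context: the result sits inside a JavaScript string literal.
    js_safe = encode_js_string(html_safe)
    return f"document.getElementById('greeting').innerHTML = 'Hello, {js_safe}';"


print(render_greeting_script("<img src=x onerror=alert(1)>"))
```

When the script runs, the JavaScript parser removes the \u layer, and innerHTML receives &lt;img ...&gt;. The browser then displays that as text instead of creating an element. Skipping either step reopens the hole.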