Filter Evasion Through Creative Encoding

Modern web applications often implement filters to block common XSS payloads, but creative encoding techniques can bypass many of these defenses. Attackers exploit the fact that browsers are extremely forgiving and will interpret content encoded in various ways. HTML entities, URL encoding, Unicode variations, and mixed encoding schemes (for example, an HTML entity that has itself been URL-encoded) all provide opportunities for filter bypass. Understanding how browsers and server-side components decode and interpret these encodings is essential for building effective defenses.
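One common defensive response to layered encodings is to canonicalize input before validating it. The sketch below assumes a Python back end and uses only the standard library (`urllib.parse.unquote` and `html.unescape`). The function name `canonicalize` and the round limit are hypothetical choices for illustration. It decodes repeatedly until the value stops changing, and it rejects input that is still changing after a fixed number of rounds instead of looping indefinitely.

```python
import html
import urllib.parse


def canonicalize(value: str, max_rounds: int = 5) -> str:
    """Repeatedly URL-decode and HTML-entity-decode until the string is stable.

    Hypothetical helper: the round limit is an illustrative assumption.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(urllib.parse.unquote(value))
        if decoded == value:
            return decoded  # no further decoding possible: canonical form
        value = decoded
    # Still changing after max_rounds: treat as suspicious rather than keep going.
    raise ValueError("input is still changing after repeated decoding")


# A '<script>' tag that was HTML-entity-encoded, then URL-encoded on top.
layered = "%26lt%3Bscript%26gt%3B"
print(canonicalize(layered))
```

The canonical form is intended for validation; whether the application stores or echoes the original or the decoded value is a separate decision. A blocklist check on the canonical form narrows the bypass surface but does not replace context-aware output encoding.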

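HTML entity encoding offers numerous bypass opportunities because browsers decode character references in different contexts at different times. A filter that blocks the literal string <script> may not catch the named-entity form &lt;script&gt; or the decimal form &#60;script&#62;. These encoded forms are harmless as plain text, but they become dangerous when a later processing stage decodes them before the content is rendered, or when they appear inside an attribute value, which the browser entity-decodes before using it. More sophisticated attacks use partial entities and rely on the parser's error handling: the HTML parsing rules recognize a small set of legacy named references, including &lt, even without a trailing semicolon, so a decoder following those rules turns &ltscript> into <script>. Filters must account for these quirks, normalizing input before validation to prevent encoding-based bypasses.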
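To make the entity variants concrete, the following sketch compares a hypothetical blocklist check (`naive_filter`, which looks for the literal `<script`) against the same check applied after Python's `html.unescape`. According to the Python documentation, `html.unescape` follows the HTML5 rules for valid and invalid character references, so it handles named, decimal, hexadecimal, and semicolon-less legacy forms alike. The filter itself is illustrative only, not a recommended defense.

```python
import html


def naive_filter(value: str) -> bool:
    """Hypothetical blocklist: True if the literal '<script' appears (case-insensitive)."""
    return "<script" in value.lower()


def decode_then_filter(value: str) -> bool:
    """Decode HTML character references first, then apply the same check."""
    return naive_filter(html.unescape(value))


variants = [
    "&lt;script&gt;",      # named entities
    "&#60;script&#62;",    # decimal character references
    "&#x3C;script&#x3E;",  # hexadecimal character references
    "&ltscript>",          # legacy &lt without the trailing semicolon
]

for v in variants:
    # The raw check misses every variant; the decoded check catches them.
    print(f"{v!r:24} naive={naive_filter(v)} decoded={decode_then_filter(v)}")
```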

Unicode provides a particularly rich source of bypass techniques. Applications, frameworks, and data stores sometimes normalize Unicode text, and that normalization can transform seemingly safe input into a dangerous payload. Some Unicode characters are visually identical to ASCII characters but have different code points; these look-alikes can fool human reviewers and pattern-based filters. Full-width characters, mathematical alphanumeric symbols, and other compatibility characters go further: under NFKC normalization they map back to plain ASCII, so U+FF1C FULLWIDTH LESS-THAN SIGN becomes an ordinary < and MATHEMATICAL BOLD SMALL S becomes s. When normalization happens after input has already passed through a filter, it can reintroduce exactly the characters the filter was meant to block, creating vulnerabilities in applications that don't account for this behavior.
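The normalization-after-filtering problem can be reproduced with Python's `unicodedata.normalize`. This sketch assumes the downstream component applies NFKC (NFC alone does not fold these compatibility characters). The helpers `naive_filter`, `to_block`, and `normalize_then_filter` are hypothetical names for illustration. The code builds the full-width and mathematical-bold spellings of `<script>`, shows that the raw strings slip past a literal check, and shows that NFKC turns both into plain ASCII.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical blocklist check on the string as given."""
    return "<script" in value.lower()


def to_block(word: str, base: int) -> str:
    """Map ASCII lowercase letters into a Unicode block whose 'a' is at `base`."""
    return "".join(chr(base + ord(ch) - ord("a")) for ch in word)


def normalize_then_filter(value: str) -> bool:
    """Apply NFKC first, so the check sees the same form later stages will see."""
    return naive_filter(unicodedata.normalize("NFKC", value))


samples = {
    # FULLWIDTH LESS-THAN / GREATER-THAN SIGN around FULLWIDTH LATIN SMALL letters
    "fullwidth": "\uff1c" + to_block("script", 0xFF41) + "\uff1e",
    # MATHEMATICAL BOLD SMALL letters (this block has no reserved gaps for a-z)
    "math bold": "<" + to_block("script", 0x1D41A) + ">",
}

for name, raw in samples.items():
    normalized = unicodedata.normalize("NFKC", raw)
    print(f"{name}: raw passes filter={not naive_filter(raw)} "
          f"normalized={normalized!r} caught={normalize_then_filter(raw)}")
```

The design point is ordering: the check has to run on the fully normalized form, so `normalize_then_filter` normalizes before it inspects the string rather than after.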