OWASP Podcast/Transcripts/067

Interview with Jeff Williams (OWASP Podcast 67)
Jim Manico
 * You are listening to the Open Web Application Security Project. This is an OWASP Top Ten 2010 podcast.  We have with us today Jeff Williams.  Jeff Williams is the CEO of Aspect Security.  He is the volunteer chair of the OWASP Foundation, and he is also the primary author of the OWASP XSS Cheat Sheet.  So Jeff, can you start by telling us what you think about the many changes in the OWASP Top Ten for 2010?

Jeff Williams
 * Well, let’s see. Technically the changes aren’t that significant.  There is only a couple of real new areas in the Top Ten.  Security misconfiguration, which we used to have in the Top Ten a long time ago, but we took it out and now it is back in because based on our new model, which I will talk about in a second, it deserved to be back in, and the other thing that we added in was invalidated redirects and forwards, and that has been a problem for awhile out there.  We see it quite a bit, but with the rise of malware as one of the massive threats to people using the Internet, we felt this one really deserved to be one of the Top Ten, and the data supports it, so we added it in there.  Now the big change to this year is that in previous years we called it the Top Ten Vulnerabilities, and that was fine.  It did a lot of good in helping get aware of the vulnerabilities they should be protecting against, but we feel like the market has moved past that a little bit, and in some cases they were more interested in really understanding the risk associated with some of these bad programming practices, and so we have changed the OWASP Top Ten.  It is now the OWASP Top Ten Application Security Risks, and we focus on not just the vulnerability, but also sort of the whole chain of the threat, from the threat agent who might attack something to the attack vector they might use to the security weakness, which is what we used to call the vulnerability, we talk about the security controls, and then we talk about the impact a little bit, like the technical impact of a problem and the business impact of the problem, and by putting that whole chain together, we’re hoping that people get an even better understanding of what this problem means to their business, and hopefully that will drive more people to fix these things instead of continuing to sort of ignore them.

Jim Manico
 * Jeff, the OWASP Top Ten team is on track to create a cheat sheet for every OWASP Top Ten category. You’re the author of the first one, the XSS, the cross-site scripting prevention cheat sheet.  Jeff, why do we even need this cheat sheet?  You only authored this last year but there is nothing else out there that provides this level of defensive description.  Why do you think that is?  Why do we need this?

Jeff Williams
 * Well, let’s see. There has been a huge amount of study into XSS attacks, so you’ve got…cheat sheet, on all the ways, all the different XSS attack vectors, there are tools that scan for XSS.  There has been this ton of research on the attack side, and basically, until I tried to put together this sheet, the guidance for developers was really pretty thin.  Even at OWASP and in the previous versions of the Top Ten, the guidance has basically been do the HTML into the encoding on your untrusted input and then you’re fine.  That guidance wasn’t good enough.  It’s not good enough to stop a large number of XSS attacks, and it really represented, I think, a fairly poor understanding of the problem for developers and it makes us look like we don’t know what we’re talking about.  It makes us look irrelevant, so I try to put this together to explain how XSS works a little better from the programmer’s perspective and really what they have to do and the underlying theme of the OWASP Top Ten Prevention Cheat Sheet is that you have to encode based on the context that you are putting untrusted data into, so there is different rules.  If you’re putting untrusted data into regular, standard HTML, you can use HTML into the escaping and there is different rules there, but if you’re putting it into an attribute or into JavaScript or into CSS or into a URL you need to use different escaping formats,  I mean completely different.  Like instead of using ampersand-LT-semicolon, you need to use percent-two-F, and you need to, if it’s a URL, if you’re in CSS you need to use backslash-27.  If you need to use a JavaScript segment, then you need to use backslash-X, so different rules for different places in the HTML document.

Jim Manico
 * Let’s talk about untrusted data for a moment. What is your vision of what untrusted data really is?  Is that just data that we get from the browser from a user, or is there more?

Jeff Williams
 * Well, certainly a lot of untrusted data does come from the browser, and hopefully developers are treating everything that comes from the browser in the HTTP request as untrusted data, but I don’t think we should stop there. There is a lot of different types of untrusted data.  If you are using back-end services, like you’ve got a database or you’re connecting to a web service or you’re pulling data out of documents, you have to consider where all that data came from and what you should do is think, am I guaranteed that this data doesn’t have scripts in it, and if you can’t guarantee that then you really should follow the simple rules in the prevention cheat sheet.

Jim Manico
 * So Jeff, would you care to tell us how you think input validation relates to cross-site scripting? To this day, when you go Google around and look for information on XSS defense, more often than not, folks say that XSS can be solved at the boundary through input validation routines. What do you think of that, Jeff?

Jeff Williams
 * Well, I mean, the straight up fact is that input validation can’t prevent all XSS. There are lots of situations where you need the characters that can be used to cause cross-site scripting problems in your application, so you might need less-thans and semicolons and single-ticks and those sorts of characters.  You might need those in your input, so straight input validation really shouldn’t be used to provide complete protection against XSS.  Now that said, I am a huge advocate of input validation.  I have talked about it several times at conferences.  It is very important, and so I think of it as an important defense in depth-factor, but it’s really not the perfect way to eliminate all cross-site scripting from the organization.

Jim Manico
 * So Jeff, before we go too deep into defensive theory, can we talk a little bit about injection attack theory? Would you care to tell us about injecting up and injecting down and how that relates to XSS?

Jeff Williams
 * When I started looking at cross-site scripting and trying to put this cheat sheet together, I came up with some theory about how this kind of injection really works, and cross-site scripting really is just a kind of injection. Now when I say injection, I mean something kind of specific.  I mean that the attacker’s data is breaking out of the data context and entering a code context.  Typically, that is used, they do that through the use of some special characters that are significant in whatever interpreter happens to be interpreting that data, so in HTML, those are interpreters like the HTML parser or the XML parser that underlies that, the CSS parser, the JavaScript parser, the URL parser all have different special characters that are relevant in those contexts, and what I noticed was that there is a couple of different types of injection within XSS that are happening here, so frequently what happens is that you are closing the data context and you are starting a new code context, so this is what happens when like if you are injecting into an attribute value, you are going to close that attribute with a quote.   You might close the whole HTML element with a greater-than or a backslash-greater-than and then you are going to start a new context where you might put in a script, so if you think about it in terms of the XML hierarchy, you are really going up a level and then back down a level when you are creating a new HTML element to contain the script, so that is the most common way.  Most of the examples that are out there are injecting up, but there is also examples of what I call injecting down, and that is when you don’t go up first, you don’t have to escape the current context.   Essentially, you create a code context within your current context, and you do this, for example when you’re in a URL context and you can control the URL.  If you go into a JavaScript URL context, you are really creating another level down from the URL context, so this is useful because, from an attacker’s point of view, when you are thinking about the possibilities for injecting cross-site script into a particular place in HTML, you can think about, well are there any ways to go up, meaning close the current context and create another context outside that, or you can think about are there ways to inject down, like can I create a code context within the current context that I’m in without closing the current context, so that’s what injecting up and injecting down are about, and it has a big effect on which characters you need to be very careful about escaping.

Jim Manico
 * Jeff, as you mentioned earlier, the real solution to XSS, the real defense, is outbound coding. I hear a lot of different terms used.  I hear outbounding coding, I hear outbound escaping, I see some folks who say just do HTML entity encoding and you’re all set.

Jeff Williams
 * Essentially, what they mean is changing some of the characters so that they no longer are the special characters that are significant in the interpreter to something that is safe, something that the interpreter will treat just as data. The problem is every different interpreter has a different syntax for escaping, so HTML has its own form, CSS has its own form, JavaScript has its own type of escaping.  You can read about some of these in the actual cheat sheet page, but it doesn’t really even stop there.  Operating systems all have their own escape format.  Every database vendor has their own escaping rules.  It’s really kind of ridiculous how many different forms of output escaping there are, and it makes it virtually impossible for the developer to keep all that in their head.  That’s why I have said numerous times that you need a security encoding or escaping library that knows how to escape for the different interpreters that you’re dealing with.  That way we can get it right once and everyone can use those same things, and I’ll probably mention that that’s exactly what we’re building as one part of the ESAPI project, also at OWASP.

Jim Manico
 * So Jeff, again, I see I a lot of folks who even today are still writing just HTML entity encode everything on output and you’re all set. Why is this terribly wrong?

Jeff Williams
 * Well, HTML entity encoding is pretty powerful. It disables the simple kinds of scripts if your untrusted data is just ending up in regular HTML, or even in HTML attributes in most cases, like if it’s a quoted attribute.  We might talk a little more about that later, but it’s not good enough to stop cross-site scripting in other contexts like in CSS or inside a JavaScript block or in an on mouse-over event.  HTML entity escaping doesn’t work in those contexts, so you can have all of the HTML entity encoding you want and the script will still run, it’s just not good enough.  I wish there was an easier way to do this.  We’ve tried very hard to make the Cross-Site Scripting Prevention Cheat Sheet as absolutely simple as possible and still work.

Jim Manico
 * So speaking of which, let’s start marching through these rules. I’m looking at your cheat sheet now and rule number zero says never insert untrusted data except in allowed locations.  What are some of these illegal locations that data can never be displayed in a safe way?

Jeff Williams
 * I think probably, if you thought hard enough about it, you could come up with a way to escape data in almost any location, untrusted data, in almost any location in an HTML document, but some of them are really tricky edge-cases where I can’t think of a really good use case for doing it. It would be like inside a Meta Tag, using a JavaScript.  I mean, there are these nested contexts where the encoding gets pretty tricky, and we just didn’t think that met the sort of 80-20 rule on what most developers ought to be doing.  If you need to put untrusted data in some crazy place in your HTML, then this is your heads up.  Really think hard about how it needs to be escaped.  You probably need to do some testing to make sure that untrusted data can’t possibly introduce a script in that context, so rule number zero is basically saying deny all.  It says don’t put untrusted data in your document unless you are putting it in one of the five contexts that we talk about in the rest of the document.  One of the biggest places where you really shouldn’t put untrusted data is in a script tag, like right in the middle of a JavaScript.  It’s very difficult to escape data if it’s right in the middle of a JavaScript block, and we do see this.  I’ve seen a number of apps that have something like a URL parameter called call-back, and it actually contains JavaScript code, and then when the app renders that page it takes that JavaScript code and sticks it into the web page.  Well, you’re never going to be able to escape that properly.  Just don’t do that anymore.   Please don’t pass around JavaScript in URL parameters or form parameters or hidden fields or anything or cookies.  That’s crazy.  Stick to some other approach for handling your call-backs.

Jim Manico
 * Yeah, that sounds like XSS as a feature.

Jeff Williams
 * It really is, but you’d be surprised how many apps actually do that.

Jim Manico
 * So the next rule, rule number one is HTML-escape before inserting untrusted data into the HTML element context.

Jeff Williams
 * Exactly, so this is the normal use case for building a web page. You’re taking untrusted data like the users name from a form field or something and you’re going to end up rendering that in the HTML body somewhere, like, Hi, Jeff.  In that case you want to take that untrusted data and you just want to HTML-entity encode it.  We’ve provided a list of six characters that are really important to escape in that context.  We included the slash because it is part of the XML browser, the XML documents parser, so we’re trying to be better safe than sorry here and over-encode just a little bit.  In general, I think you’ll see that the cheat sheet is conservative that way.  It instructs you to over-encode a little bit because, frankly, nobody knows exactly how all of the corner cases in the browser work, except for Garreth Hayes and he’s ridiculous.

Jim Manico
 * Now Jeff, I’m of the mind to encode every single character other than alpha-numerics. Why do you think we’ve moved away from that and are encoding less characters?

Jeff Williams
 * Well, we did get a lot of push-back on encoding everything because, well, it’s a couple of reasons. One, I think it makes your HTML look a little messy, and when we sat down and analyzed it we couldn’t think of a good reason why this would cause a problem in the HTML element context.  Now when we get to the attribute context the rules are going to be different and so, I think if you want to have one method that does HTML entity encoding or escaping, then you better encode a lot more than just these big six, but if you’re willing to bite off the difficulty of having two different methods that developers can keep track of, then it’s okay to only encode the big six in the regular HTML element context, but when we get to attributes, you’ve got to encode a little bit more.  Now it’s interesting, some environments have escape method already in there, PHP has one and dot net has one.  Interestingly, Java does not have an HTML entity encoding method and I have gone to great lengths to work with the Servlet spec team to try to get one built in there and I haven’t made much progress, unfortunately, but I don’t see how you can field a platform today without providing support for these kind of escaping formats.

Jim Manico
 * So Jeff, can you just quickly explain to us what you mean for this rule one by the HTML element context real quick?

Jeff Williams
 * Yeah, the HTML element context is the data that goes between tags like body or div. It’s your normal HTML body content.

Jim Manico
 * So moving onto rule two, it’s saying attribute escape before inserting untrusted data into HTML common attributes.

Jeff Williams
 * So here we’re looking at a different context within the HTML document. We’re looking at attributes.  Attributes are things like if you have a name or a value, it might be a form input value, so you’ll have a tag that says input, name equals something and value equals something.  If you’re taking untrusted data and putting it into those attributes, this is the rule that you’ll want to apply, and you’ll notice it says common attributes, so here we’re not talking about the more complex attributes like on blur and on mouse-click and all of those different things.  We’re talking about the simple attributes that aren’t sort of scripting or style related.

Jim Manico
 * So Jeff, why is this method encoding more characters than normal HTML entity encoding?

Jeff Williams
 * Well, the reason is that so many web applications don’t properly quote their attributes. The HTML browsers are fairly forgiving parsers and they don’t always require you to quote, even though the XML spec says you should and HTML Tidy says you should.  It doesn’t really matter.  The browsers render stuff unquoted just fine and the problem with that is that unquoted attributes can be terminated by a whole bunch of different characters like space and tab and vertical tab and all sorts of weird characters can terminate.  Even things like plus and percent and equals and less-than and so on can terminate attributes.  Now if you did guarantee that all your attributes were properly quoted either with single quotes or double quotes, then really the only thing you’d need to make sure you escape is the corresponding quote, but I think that is a very dangerous practice, so we’re recommending that if you are putting untrusted data in the attributes, just go ahead and escape everything, because in there are all the special characters.  You don’t have to escape characters like alpha-numerics, but escape all of the other stuff and make sure that even if a developer accidentally puts in an untrusted attribute it won’t allow an attacker to break out and introduce a script.

Jim Manico
 * Rule number three, JavaScript escape before inserting untrusted data into HTML JavaScript data values.

Jeff Williams
 * Okay, there is a lot of little words there, so this is for when you’re taking untrusted data and you’re putting it inside a JavaScript, and more specifically you’re putting it inside a data value inside a JavaScript. I want to be really clear.  I don’t want people to be taking data and just putting it straight inside a JavaScript context.  We’re talking about a very narrow use case, like when you need to set a variable to the value of some untrusted data like X equals five, and you happen to have gotten that five from the HTTP request.  That’s when you need to use JavaScript escaping to make sure that the attacker can’t break out of that value, and JavaScript escaping is considerably different than HTML entity escaping.  If you use HTML entity escaping it won’t work.  It won’t do what you want and, in fact, the attack might still work.  JavaScript escaping is the backslash-X-hex-hex format, and you need an output encoder that supports that if you want to generate data in that format, so again I will point to the…reference implementation.  We’ve got encoders for all of these different schemes.

Jim Manico
 * You mentioned Garreth Hayes, well, Garreth Hayes says I’m sure I can break your encoder, so he sent me a string, I encoded it, sent it back to him and he was like, well, what do you know, you’re doing it right, so we got the Garreth Hayes thumbs-up on this encoder as well by the way.

Jeff Williams
 * If you’ve got a Garreth Hayes…then you’re doing pretty good.

Jim Manico
 * So next we have rule number four. CSS escape before inserting untrusted data into HTML style property values.

Jeff Williams
 * Yeah, so this is another interesting context within HTML. This is a CSS context and it can either be an embedded style sheet or it might be a style tag that is associated with one of the divs or something.  We’re saying you are allowed to use untrusted data in here but only if in a certain context, in a style property, and so you will see a couple of examples in the sheet of just where you are allowed to put it, but it’s basically in the definition of a property, so in the place where you would put a color or a font style or something.  You are allowed to put some data there, but you’ve got to escape it to make sure that the attacker can’t break out of the context or introduce a sub context, so to break out of one of these contexts, there are some special characters that somebody might use, things like space and tab and percent and some of the attribute kinds of special characters, but there are also sub contexts to be considered here, like -moz-binding or some of the, in IE there is the expression context that can create a sub context here, so you need to be really careful, and there is a note in here about that, but you need to be really careful about how you deal with untrusted data in CSS and certainly use the escaping format.  Now CSS escaping is another different format.  It’s backslash-hex-hex, so you can’t just use HTML encoding or JavaScript encoding.  There is a separate escaping format for this.

Jim Manico
 * Next we have rule number five. URL escape before inserting untrusted data into HTML URL attributes.

Jeff Williams
 * Alright, so you may have heard of URL encoding or possibly percent encoding. We’re calling it URL escaping here, but it is all the same thing.  It means, in this case, the format is percent-hex-hex and when you’re inserting untrusted data into a URL, you need to make sure it’s escaped.  Now you want to be careful here to make sure you’re inserting this untrusted data within the right spot in the URL.  You shouldn’t let an attacker control the entire URL because then they could provide, and this is a URL, it might be used in an image tag, it might be used in an image or a link or a script source or an iFrame, any of those HREFs are URLs that need to be escaped in this way, and you don’t want to let the attacker provide the whole URL because then they might provide a JavaScript URL.  That would be an example of injecting down, so you want to be careful to make sure you put the untrusted data somewhere on the tail end of the URL, and then you want to make sure you escape it right using the percent escaping, and the last thing you need to be sure you don’t do is you want to make sure you don’t escape things like ampersands and question marks and equals.  You are intending to allow this URL to actually work because you will override the query string and the parameters within it, so you need to decide are you passing the entire URL as a parameter and so you want to escape those things or are you intending for someone to be able to click on this thing and you want to escape the data values within it.  That’s something where we’ve got one method available in this API to do this and we’ll probably extend that to provide a slightly different method if you want to escape the parts of a URL.

Jim Manico
 * Speaking of that, what do you think is the future for this XSS Cheat Sheet? Where do you go now?

Jeff Williams
 * Well, we’ve gotten a lot of great feedback from developers that find this really useful. This finally explains it, what they are actually supposed to do to stop cross-site scripting in a good way and we have taking it and we’re building it into a…so we’ve got actually the code to back this up, but really what we need to do is make sure that we get this into the hands of the developers that need it and what I would like to see is I would like to see more web apps move to a more templated kind of system where they are using components to generate their HTML and then the component designers will have to know these rules, but most developers won’t have to use these rules anymore, so I’d like to see JSF and the other templating systems or component systems start to build these rules into their frameworks so that most developers don’t need to know about this.  Currently, unfortunately, it’s really spotty.  OWASP did a project where we looked at all the different JSF components and unfortunately, it’s not clear which parts of the components do escaping properly and which don’t, and there are certainly some places that don’t do it properly, so there is still a lot of work here to do to get the world to a place where we can reasonably think about stamping out cross-site scripting.  This is just one piece of the foundation.  There is an awful lot of work that we’ve got left to do.  That said, I’ve done a lot of work with companies that are actively stamping out cross-site scripting.  They’ve taking these rules, they’ve built them into their own frameworks and libraries, and they’re making a big deal about it and it’s starting to work.  They are seeing significant reductions in the amount of cross-site scripting flaws that they have.  It just requires some diligence and you can stamp it out.  It’s an important thing to do because XSS represents a significant risk, particularly to the users of your system, but also to your system itself, so it’s a worthwhile goal.  Certainly from a risk perspective, we think it absolutely deserves to be right up there at the top of the OWASP Top Ten.

Jim Manico
 * You’ve been listening to the OWASP Top Ten 2010 XSS Cheat Sheet Podcast with Jeff Williams.  OWASP, the Open Web Application Security Project is a 501(c)(3) not for profit worldwide charitable organization focused on improving the security of application software.  Our mission is to make application security visible so that people and organizations can make informed decisions about true application security risks.  Everyone is free to participate in OWASP and all of our materials are available under a free and open software license.  For more information, please visit www.OWASP.org.