Hi everyone, and welcome to this bonus course on XML external entities. The video you're about to watch was recorded as part of the 2017 OWASP Top 10 list. XML external entities wasn't included as its own item on the 2021 list; it was merged into a broader category (A05:2021 Security Misconfiguration). But XML external entities, abbreviated XXE, is still a very relevant security risk today, so I hope you enjoy this video from the 2017 list.

In this course we're going to look at the number four security risk on the list: XML external entities. The acronym is XXE, which is a play on the sound of the word "external": the "ex" sound becomes the second X. So while you might expect the acronym to be XEE, you'll see it written XXE. XXE is the number four security risk on the list.

I'll mention this quickly: this is the first time XXE has appeared on the Top 10, and in its debut it jumped all the way to the number four spot. That's interesting. It shows that this is a fairly prevalent issue today, and that its prominence is fairly new; if it had been this prevalent and this dangerous several years ago, it would have been on the list back then. Either way, it's number four, so let's dive in.

Before we get into XML external entities, I want to go over a couple of quick things about XML itself. XML stands for eXtensible Markup Language. It's a popular data format used extensively across the internet and beyond. You can see different examples on the slide: SOAP and REST web services, XHTML, DOCX.
You'll sometimes see a Word or PowerPoint file with an extension like .docx or .pptx; that X stands for XML. Even some image formats, such as SVG, are XML. It's also used in SAML: if you use SAML for single sign-on and federated access, SAML uses XML for its assertions.

XML was designed to store and transport data, and to interpret the data in an XML file you need an XML parser, sometimes called an XML processor. In the context of a web application, your application needs some sort of XML processor to read the file and interpret its data.

You can see some example syntax on the slide, and I'll use my mouse pointer to point out a couple of things. There are some odd-looking characters, like less-than signs and question marks. The first line is the XML declaration, which states the XML version. Then come the fields: an opening <note> tag, and inside the note a to field, a from field, a heading, and a body. So this note is to Alice, from Bob, the heading is Greetings, and the body is Hi Alice!. Then </note> closes the note. This is a lot like HTML: tags are wrapped in less-than and greater-than signs, the opening tag has no slash, and the closing tag has a slash at the beginning. So this note is made up of these parts, and each one has a defined value inside it.
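To make "you need a parser" concrete, here is a minimal sketch that runs the note document through Python's standard-library xml.etree.ElementTree parser and reads back each field. The course doesn't use any particular programming language; Python is simply our choice for these examples, and read_note is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The note document from the slide: a root <note> with four child fields.
NOTE_XML = """<?xml version="1.0"?>
<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Greetings</heading>
  <body>Hi Alice!</body>
</note>"""


def read_note(xml_text: str) -> dict:
    """Parse a note document and return its fields as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)  # the parser turns raw text into an element tree
    # Each child element's tag becomes a key; its text content becomes the value.
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    for name, value in read_note(NOTE_XML).items():
        print(f"{name}: {value}")
```

The parser, not your code, deals with the angle brackets and closing tags; your code only sees the resulting tree.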
That gives you an idea of what an XML file might look like. Real ones can, of course, get extremely large and complex.

Now, in terms of a web application, here's a simple example of how an application accepts XML input and parses it. Remember, the parser is the XML processor, the engine that reads the XML and interprets it. In this specific example, the application then outputs the result.

A client request comes in using the POST method, sent to example.com/xml over HTTP/1.1, and its body contains a note that says Hello Alice, followed by the closing note tag. It's super simple, but technically it's an XML document being posted to the web application. The application accepts it as input, parses the XML, and outputs the result.

So what do you see in the server response? The status line here says HTTP/1.0 while the request used HTTP/1.1; that difference doesn't matter for this example. What matters is the 200 OK status, which means the server accepted the POST and everything went through, and the body says Hello Alice, exactly what the note said. Simple stuff, but it shows how a web application can accept XML input, parse it, and output the results.
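The accept, parse, output flow can be sketched without a real web server. handle_xml_post below is a hypothetical function standing in for the web application's request handler; it assumes the request body arrives as raw XML bytes and returns an (HTTP status, body) pair instead of writing a real HTTP response.

```python
import xml.etree.ElementTree as ET


def handle_xml_post(body: bytes) -> tuple:
    """Hypothetical handler: parse the posted XML and echo its text back."""
    try:
        root = ET.fromstring(body)  # accept the input and parse it
    except ET.ParseError:
        return 400, "Bad Request: malformed XML"
    # Output the result: all text inside the root element, joined together.
    return 200, "".join(root.itertext()).strip()


if __name__ == "__main__":
    status, text = handle_xml_post(b"<note>Hello Alice</note>")
    print(status, text)  # 200 Hello Alice
```

A real framework would route the POST to something like this; the status code and echoed text mirror the 200 OK / Hello Alice response on the slide.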
A couple of definitions to be aware of: XML documents can be of specific types, and a validating parser checks that a document adheres to its declared type before it processes the document. In other words, the parser makes sure the contents of the document follow the type definition before it parses anything.

Two type definitions you'll see for XML documents are the XML Schema Definition (XSD) and the Document Type Definition (DTD). XXE vulnerabilities occur in document type definitions, so that's important to know: the vulnerabilities we're talking about in this video live in DTDs. Because of that, we won't really talk about XSD here; to understand the risk of XXE, we need to dig into DTDs.

A DTD defines the structure, the legal elements, and the attributes of an XML document. It allows independent parties to agree on a standard for interchanging data. Think of it as the rules, the structure, of what the document is going to look like: it will have a heading field, to and from fields, and so on. It doesn't define the actual text of the document; it describes the format and structure of the document.
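Python's standard-library parsers don't validate documents against a DTD, so the sketch below hand-codes a single rule to show the "rule book" idea. Assume a DTD declares <!ELEMENT note (to,from,heading,body)> with each child as (#PCDATA); conforms_to_note_model is a hypothetical helper that checks a document against just that rule. It is not a general DTD validator; real validation would use a validating parser.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for <!ELEMENT note (to,from,heading,body)>:
# the root must be <note> and its children must be exactly these, in order.
NOTE_CHILDREN = ["to", "from", "heading", "body"]


def conforms_to_note_model(xml_text: str) -> bool:
    """Rough structural check inspired by the note DTD (not a real DTD validator)."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in root)
```

The point is the split of responsibilities: the DTD states the rules once, and the application (or a validating parser) checks each incoming document against them.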
A web application can then use the DTD to verify that the data is valid. That makes sense: if the DTD is the foundational definition of the document's structure, the application can treat it almost like a rule book. It says, "I've got this XML document, and its type is defined by this DTD," and based on the DTD's rules it can check whether the document is even structured correctly. That's what a DTD does.

Here's an example of a DTD; I'll use the mouse pointer again. It starts with the <?xml version="1.0"?> declaration. Then, for a DTD, you'll always see <!DOCTYPE followed by a name and an opening square bracket; this one is <!DOCTYPE note [, so it's called note. Several elements are defined inside it. First, <!ELEMENT note (to,from,heading,body)> says a note consists of to, from, heading, and body. Then <!ELEMENT to (#PCDATA)> defines the to element. An element can have different types of definitions, and (#PCDATA), parsed character data, meaning plain text, is just one of them; you can look up all the DTD element types and definitions, and there are many. Here, to is (#PCDATA), and from, heading, and body are the same. Below that is the actual XML document, which is based on the rules outlined in the DTD. You have <note>, the element, and it had better contain to, from, heading, and body. And here it is: <to>Alice</to>.
<from>Bob</from>, <heading>Greetings</heading>, and <body>Hi Alice!</body>, followed by </note>. <to>Alice</to> conforms to the element type defined above, the (#PCDATA) text. So you can see how the DTD forms the rules that this part of the XML document needs to live by. Keep that in mind.

External entities happen in a DTD. XML entities in general are used to define shortcuts to special characters or other text, and they can be declared as either internal or external. An entity reference has three parts: an ampersand (&), the name of the entity, and a semicolon (;).

When you declare an entity, you write <!ENTITY followed by the name of the entity. In this example the name is followed by the keyword SYSTEM; sometimes you'll see PUBLIC instead. Either keyword marks the entity as external: SYSTEM is followed by a URI, and PUBLIC is followed by a public identifier and then a URI. This one is external because it points to a URI/URL, which means the parser will look to that external location for the entity's content. So the last part of the declaration is either an address pointing to an external URL/URI or, for an internal entity, the actual text of the entity in quotation marks, with no SYSTEM or PUBLIC keyword. That's where internal versus external comes into play.
If the entity is internal, you just define its value right there within the quotation marks. If it's external, you point to a URL where the content is defined. So now you have a feel for XML as a data format, for entities within a DTD, and for the fact that entities can be internal or external.

With all that said, here is an example of a DTD with an external entity. It starts with <?xml version="1.0"?>. Remember, <!DOCTYPE means a DTD; here it's <!DOCTYPE book [, so this one is called book. Then there's <!ELEMENT chapter ANY>; it's like the (#PCDATA) definition, except ANY allows any content. But what I want to key in on is the entity declaration <!ENTITY Chapter1 SYSTEM "www.example.com/Chapter1.dtd">. It uses the SYSTEM keyword with a URL, which makes it an external entity: its content lives at that URL. To use it, you write <book>&Chapter1;</book>, where book is the element named in the DOCTYPE, and &Chapter1; is the ampersand, the entity name, and the semicolon. When the parser reaches that reference, it pulls in the content of Chapter1.dtd, the external entity defined at that URL.

So an XML document can have a DTD, the DTD can declare an entity, and that entity can point to an external location from which the parser pulls content into the document.
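To see the internal/external distinction from the parser's point of view, this sketch uses Python's low-level xml.parsers.expat module, whose EntityDeclHandler callback receives every entity declaration. For an internal entity the callback gets the inline value; for an external one the value is None and the system identifier (the URI) is set instead. The title entity is a hypothetical internal entity added for contrast, the http:// prefix on the slide's URL is an assumption, and classify_entities is a hypothetical helper.

```python
import xml.parsers.expat

# The book example from the slide, plus a hypothetical internal entity for contrast.
BOOK_XML = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book ANY>
  <!ENTITY title "XML Basics">
  <!ENTITY Chapter1 SYSTEM "http://www.example.com/Chapter1.dtd">
]>
<book>&Chapter1;</book>"""


def classify_entities(xml_text: str) -> dict:
    """Map each declared general entity to ("internal", value) or ("external", URI)."""
    found = {}

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        if is_parameter_entity:
            return  # only general entities (used as &name;) matter here
        if value is not None:
            found[name] = ("internal", value)      # inline text in quotes
        else:
            found[name] = ("external", system_id)  # SYSTEM/PUBLIC: content lives at a URI

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # Plain expat only reports the declarations; with no external-entity
    # handler installed, it does not fetch the URI.
    parser.Parse(xml_text, True)
    return found


if __name__ == "__main__":
    for name, (kind, detail) in classify_entities(BOOK_XML).items():
        print(f"{name}: {kind} -> {detail}")
```

Whether that URI is ever fetched is a decision made by the parser and its configuration, which is exactly where XXE problems come from.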
And when you reference that entity, you effectively make the XML parser go out to that external location while it parses the file. That's the idea of what's going on here.

With that, I want to spend a quick minute on the OWASP literature for XML external entities. Exploitability gets a 2, which means it's moderately exploitable: not super easy, but not that hard either. Prevalence is also a 2, so it's fairly widespread; not extremely common, but not hard to find either. Detectability is a 3, the highest rating, meaning it's easy to detect whether your web application is vulnerable to this risk. The literature mentions SAST and DAST tools for discovering this issue; you can also do manual testing, but you can run these SAST and DAST tools against your web application to see whether you're vulnerable to XXE. Finally, technical impact is a 3, because an attacker can do real damage: denial-of-service attacks, extracting sensitive information, executing remote requests from the server, and other seriously harmful things via an XXE exploit. You run all these numbers through OWASP's formula, land on a value for this security risk, and based on that ranking it landed at number four on the 2017 version of the OWASP Top 10.

A couple of things to look at with respect to external entity risks: an XML input containing a reference to an external entity could be processed by a weakly configured XML parser.
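Returning briefly to the OWASP ratings: the course doesn't spell out the formula. As a sketch, assuming the 2017 methodology averages the three likelihood factors (exploitability, prevalence, detectability) and multiplies the result by technical impact, the XXE ratings quoted above work out as follows. owasp_2017_score is a hypothetical helper.

```python
def owasp_2017_score(exploitability, prevalence, detectability, technical_impact):
    """Assumed 2017 method: average the likelihood factors, multiply by impact."""
    likelihood_sum = exploitability + prevalence + detectability
    likelihood = likelihood_sum / 3
    # Multiply before dividing so a whole-number score stays exact in floating point.
    score = likelihood_sum * technical_impact / 3
    return likelihood, score


# Ratings quoted for A4:2017 XXE (each factor is on a 1-3 scale).
likelihood, score = owasp_2017_score(2, 2, 3, 3)
print(f"likelihood = {likelihood:.2f}, score = {score:.1f}")  # likelihood = 2.33, score = 7.0
```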
We've already talked about this: if a document references an external entity and your XML parser isn't properly configured, the parser might process a malicious file or a malicious request. This can lead to disclosure of confidential data, denial of service, server-side request forgery (SSRF), port scanning, and more. An attacker can get about as creative as they want here; the message is that a lot of bad things are possible. Those are the risks associated with external entities.

Now I want to run through a couple of quick examples and show you a few attack types you could use with XML entities. This first one is actually an internal entity, not an external one. We know that because the entity, named var (as in variable), doesn't point to an external URL; it just gives the text World. So var is defined as the text World.

Say you have a web application that accepts a client request via the POST method, and you send it a typical POST to this URL. The body has a DOCTYPE (remember, that's the DTD) with an element called greeting, and an entity called var whose value is World. Down below, you use the greeting element: you write Hello and then the entity reference, which, remember, is the ampersand, the entity name, and the semicolon, and then you close the greeting. When that's parsed, it produces Hello followed by the value of the var entity.
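Python's ElementTree parser (built on expat) expands internal entities the same way, which makes for a quick local check of the Hello World example. The document below is reconstructed from the slide description.

```python
import xml.etree.ElementTree as ET

# Request body from the internal-entity example: &var; is defined inline as "World".
GREETING_XML = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY var "World">
]>
<greeting>Hello &var;</greeting>"""

root = ET.fromstring(GREETING_XML)
# The parser replaced &var; with the entity's value while building the tree.
print(root.text)  # Hello World
```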
Let's click through and look at the server response. It's HTTP 200 OK, so everything went through fine, and it says Hello World, because the value of the entity var was World.

Next is a denial-of-service attack that could be run against an improperly configured XML parser; if your web application includes one, you could be susceptible. It almost makes me laugh, but not really, because it's a denial of service that can totally crash your XML parser. It's built out of a bunch of lols (laugh out loud), which is why it's called the billion laughs DoS attack.

Again you have a POST to your web application, the XML declaration, and a DOCTYPE, so this is a DTD. The DOCTYPE is lolz, and it has an element lolz defined as (#PCDATA). Then a series of entities is declared in the DTD. The entity lol is defined as the text lol, so this is not an external entity; it's an internal entity, and the whole thing is simply the body of a client request to your web application. If your web application is willing to parse this XML, you will be DoS attacked.

Here's how it works. lol is the name of an entity whose value is the text lol. Then there's an entity lol1, which contains ten references to lol. So expanding lol1 means expanding lol ten times, which outputs lol, lol, lol, ten times over.
Then you have lol2, and you can start to see the nesting: layers of lols, each embedded inside the next. lol2 contains ten references to lol1, and each lol1 is ten lols, so lol2 is 100 lols. Then lol3 contains ten references to lol2, and so on, all the way up to lol9. Because every level has ten references to the level below, the total is 10 raised to the 9th power, and if you work out 10 to the 9th, you'll see why this is called the billion laughs attack.

To trigger it, you use the lolz element and reference one of these entities inside it. Which one? Not lol, because that would give you just one lol. You reference lol9. Expanding lol9 expands lol8 ten times, each lol8 expands lol7 ten times, and so on all the way back down. By the end you've got 10^9 lols output on your screen, or at least the parser has to chunk through all of them. The reason this is a denial-of-service attack is the memory the XML parser needs to expand every single one of these and output all those lols. You'll probably exhaust the parser's memory and crash it.
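The arithmetic is easy to check without ever expanding anything. The sketch below only builds the attack document as a string and computes, recursively, how many characters lol9 would expand to; it never hands the payload to a parser. build_billion_laughs, expanded_length and MAX_EXPANDED_CHARS are hypothetical names, and the 10-million-character cap is an arbitrary illustration of the kind of limit a hardened parser might enforce.

```python
from functools import lru_cache

LEVELS = 9                        # lol1 .. lol9, as on the slide
FAN_OUT = 10                      # each level references the level below ten times
MAX_EXPANDED_CHARS = 10_000_000   # hypothetical cap a hardened parser might enforce


def build_billion_laughs() -> str:
    """Return the attack document as plain text. Never feed it to an unhardened parser."""
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        '  <!ENTITY lol "lol">',
        "  <!ELEMENT lolz (#PCDATA)>",
    ]
    for level in range(1, LEVELS + 1):
        below = "lol" if level == 1 else f"lol{level - 1}"
        refs = f"&{below};" * FAN_OUT  # ten references to the level below
        lines.append(f'  <!ENTITY lol{level} "{refs}">')
    lines += ["]>", f"<lolz>&lol{LEVELS};</lolz>"]
    return "\n".join(lines)


@lru_cache(maxsize=None)
def expanded_length(level: int) -> int:
    """Characters produced by fully expanding lol<level>; level 0 is the base "lol"."""
    if level == 0:
        return len("lol")
    return FAN_OUT * expanded_length(level - 1)


payload = build_billion_laughs()
print(f"input size:    {len(payload)} characters")
print(f"expanded size: {expanded_length(LEVELS):,} characters")  # 3,000,000,000
print("over the cap:", expanded_length(LEVELS) > MAX_EXPANDED_CHARS)  # True
```

Under a kilobyte of input asks the parser for 10^9 copies of a three-character string, about 3 gigabytes of text at one byte per character, which is why expansion and memory limits matter.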
To give you a quick idea of what the server response looks like: it's HTTP 200 OK, and then just lots and lots of lols, with a dot-dot-dot. And 10 to the 9th equals 1 billion; that's just a math fact. So there are 1 billion lols for the parser to work through. The slide doesn't show anywhere close to that, of course, which is why I put the dot-dot-dot.

If you have an improperly configured XML parser, this attack will be allowed to run against your web application, and it could completely shut down your XML parser, and potentially your web application itself, because it takes up every bit of available memory. To defend against this type of attack, many XML parsers now cap the memory, or the amount of entity expansion, that a single document may consume. Once the parser hits that limit, it stops instead of eating all the memory, which leaves resources for the rest of your web application. So that's an example built entirely from internal XML entities, not external ones, and it's a DoS attack that could be run against your web application. Something to keep in mind.

Next is an example of a file-disclosure attack aimed at the /etc/passwd file. Again there's a POST against your web application, then the XML declaration and a DOCTYPE, which, remember, is a DTD. It declares the element greeting. This time the entity is not internal but external, because it uses the SYSTEM keyword with a URI. The entity is named xxe; I just chose to call it that.
But instead of an internally defined value, it points to an external location, a file: /etc/passwd on the server's own file system. If an attacker sends this POST to your web application and your XML parser is allowed to resolve the entity, it goes into the server's file system and pulls out the contents of /etc/passwd. Then you use the greeting element, put the entity reference inside it (remember: ampersand, entity name, semicolon), and close the greeting. Again, if that's allowed, the parser reaches into the file system and returns the contents of /etc/passwd.

Here's a quick picture of how that works. An attacker sends that POST to your web application. If the application allows the external entity to point at /etc/passwd, it passes the document to the XML parser, which parses it based on the DTD definitions. It checks that they're valid and that the file is built correctly, but it also grabs the contents of /etc/passwd, inserts them into the document, and the result goes back to the attacker's screen. That's the server response, just like Hello World or the pile of lols in the earlier examples. This time it contains whatever is in /etc/passwd: the root account, the bin account, maybe a user account like john, and so on.
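Whether a parser resolves external entities depends on the parser and its configuration. Python's ElementTree is documented (in the standard library's XML vulnerabilities notes) as not expanding external entities; it raises a ParseError instead. The sketch below feeds it the /etc/passwd payload to show that refusal. try_parse is a hypothetical helper, and the result says nothing about other parsers, which may resolve the entity unless configured not to.

```python
import xml.etree.ElementTree as ET

# The file-disclosure payload from the example: &xxe; points at a local file.
XXE_XML = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting ANY>
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<greeting>&xxe;</greeting>"""


def try_parse(xml_text: str) -> str:
    """Parse with ElementTree and report what happened."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ElementTree does not resolve external entities; it reports them as errors.
        return f"rejected: {err}"
    return f"parsed: {root.text!r}"


if __name__ == "__main__":
    print(try_parse(XXE_XML))
```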
This is a heavily simplified picture, but you get the idea: the contents of /etc/passwd come back to the attacker in the server response. That's not good, and it's one more reason XXE jumped all the way to number four: it can disclose very sensitive data. In some configurations, if you get creative with how you format the XML in the POST request, you may even be able to run system commands, so this can get really dangerous.

The actual server response might look something like this: HTTP 200 OK, followed by lines for root, daemon, bin, and so on. If you know the structure of /etc/passwd, you can read this. The x in the second field means the password hash isn't stored in this file; it's kept in the shadow file, /etc/shadow. You can look up the details of how /etc/passwd is structured and formatted. Regardless, returning the actual contents of /etc/passwd to an attacker is bad. So this is the server response from an attack against /etc/passwd via an improperly configured XML parser.

Now let's look at how your web application might be vulnerable. Look at your applications, in particular XML-based web services and downstream integrations, meaning how your application integrates with other downstream applications or systems. You could be vulnerable to attack if any of the following apply. You accept XML directly or XML uploads from untrusted sources, or you insert untrusted data into XML documents.
If that XML is then parsed by an XML processor, that's not good; you could well be vulnerable. You're also at risk if any of the XML processors in your web application or SOAP-based web services has DTDs enabled; remember, DTDs are a big part of this. Another case is if you use SAML for identity processing: as I mentioned earlier, SAML uses XML for identity assertions, and it could be vulnerable to this type of attack. If you use SOAP prior to version 1.2, it's likely susceptible to XXE attacks if XML entities are being passed to the SOAP framework; so if you use SOAP, check the version number, and if you're below 1.2, move to 1.2 or later. Finally, being vulnerable to XXE attacks means you're likely vulnerable to denial-of-service attacks, which goes back to the billion laughs attack we went through earlier. So there are a lot of things that could go wrong if your web application has some of these weaknesses.

A couple of other things to mention: developer training is essential to identify and mitigate XXE. If you think you're vulnerable based on what we've covered, make sure your developers get training on what XXE is, which XML parsers your web applications use, and how those parsers handle XML documents. To prevent XXE, whenever possible use less complex data formats, such as JSON, and avoid serializing sensitive data. You also need to patch or upgrade all of your XML processors and libraries, whether they're used by the application or by the underlying operating system, and use dependency checkers.
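One fix OWASP lists for this risk is disabling XML external entity and DTD processing in all XML parsers, and the OWASP prevention cheat sheet calls disabling DTDs completely the safest way to prevent XXE. Here is a minimal sketch of that idea with the standard library alone: reject any document that contains a DOCTYPE before building the tree. With no DTD there are no entity declarations, so neither external entities nor billion-laughs expansion can occur. parse_without_dtd and ForbiddenDTD are hypothetical names; the third-party defusedxml package is a maintained take on the same idea.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when a document contains a DOCTYPE (DTD) declaration."""


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Reject any document that carries a DTD, then parse it normally."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DTD for <{name}> is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_text, True)  # the exception from reject_doctype propagates out

    # Second pass: no DTD means no entity declarations to expand or fetch.
    return ET.fromstring(xml_text)


if __name__ == "__main__":
    note = parse_without_dtd("<note><to>Alice</to></note>")
    print(note.find("to").text)  # Alice
    try:
        parse_without_dtd('<!DOCTYPE greeting [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
                          "<greeting>&xxe;</greeting>")
    except ForbiddenDTD as err:
        print("blocked:", err)  # blocked: DTD for <greeting> is not allowed
```

Parsing twice costs a little speed; the first pass exists only to spot the DOCTYPE, and it stops as soon as one appears, before any entity is declared or expanded.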
And, like I said, if you use SOAP, you need to be on SOAP 1.2 or higher. If possible, disable XML external entity and DTD processing in all XML parsers in the application; if you can do it at all, just disable DTD processing and external entities. You can also implement positive (allowlist) server-side input validation, filtering, or sanitization to prevent hostile data within XML documents, headers, or nodes. Those are a few more things you can do to protect yourself.

A couple of last things. Verify that XML or XSL file upload functionality validates incoming XML using XSD validation or similar. XSD is the other XML document type we didn't really get into today, because we focused on DTDs. As I mentioned before, SAST tools can help detect XXE in your source code, although manual code review is the best alternative in large, complex applications; SAST and DAST tools are certainly good to use. And if these controls aren't possible for whatever reason, consider virtual patching, API security gateways, or web application firewalls (WAFs).

I'll take a quick minute on WAFs, since I haven't mentioned them before. Web application firewalls in general are built to defend against OWASP Top 10 style security risks and vulnerabilities, so if you can implement a WAF, that's a good thing. I've talked to security professionals around the world who used to say, "Just build secure web applications and you don't really need a WAF." They've come around: still build secure web applications, but put a WAF in front of them as well. It's going to catch a lot of this stuff.
That helps in case you miss something when building your web application. And the way life goes, you don't always get a greenfield project, a brand-new blank canvas on which to create a beautiful web application that's completely free of bugs and vulnerabilities. In reality, you're often taking over code someone else built, or a web application designed by a different team. Now you're the one administering or securing it, and you may not know exactly how it's built or why they did certain things. So having a WAF in front of your web application is a good idea, and it helps with a lot of these OWASP Top 10 security risks.

Finally, you can see the cheat sheet link at the bottom. I've mentioned cheat sheets before, and this is a really good one: the XML External Entity Prevention Cheat Sheet. Check it out. It covers everything from general guidance to very specific advice for different programming languages and code review, which you can dig into to protect your web application from XML external entity attacks.

That wraps up XML external entities. I hope you've learned a few things about what XML is, what external entities are, and how they can cause problems for your web application. Again, this one wasn't on the list at all before, and in its first appearance it jumped all the way to number four. It's out there, it's prevalent, and it can cause a lot of problems if you don't secure your web application against it. With that, thank you for hanging in there with us today.
Like I said, I hope you've learned a few things about XML external entities. Come on back and check out the next course in this skills learning path, where we'll look at the number five security risk on the Top 10. We're almost halfway there. Thanks again for hanging in there, and I hope you have a great day.