XSScrapy: fast, thorough XSS vulnerability spider

Posted on August 20, 2014 by Dan McInerney

https://github.com/DanMcInerney/xsscrapy

Unsatisfied with the current crop of XSS-finding tools, I wrote one myself and am very pleased with the results. I have tested this script against other spidering tools like ZAP, Burp, XSSer, XSSsniper, and others, and it has found more vulnerabilities in every case. This tool has scored me dozens of responsible disclosures in major websites, including an Alexa Top 10 homepage, major financial institutions, and large security firms' pages. Even the site of the Certified Ethical Hacker certificate progenitors fell victim, although that shouldn't impress you much if you actually know anything about EC-Council. For the record, they did not offer me a discounted CEH. Shame, but thankfully this script has rained rewards upon my head like Bush/Cheney on Halliburton: hundreds of dollars, loot, and Halls of Fame in just a few weeks of bug bounty hunting. I think I've had my fill of fun with it, so I'd like to publicly release it now. Technically I publicly released it the first day I started on it, since it's been on my github the whole time, but judging by the Github traffic graph it's not exactly the Bieber of security tools. Hopefully more people will find some use for it after this article, which will outline its logic, usage, and shortcomings.

Basic usage

Install the prerequisite python libraries, give it a URL, and watch it spider the entire site looking in every nook and cranny for XSS vulnerabilities.

apt-get install python-pip
git clone https://github.com/DanMcInerney/xsscrapy
cd xsscrapy
pip install -r requirements.txt
scrapy crawl xsscrapy -a url="http://example.com"

To login then scrape:

scrapy crawl xsscrapy -a url="http://example.com/login" -a user=my_username -a pw=my_password

All vulnerabilities it finds will be placed in formatted-vulns.txt. [Screenshot in the original post: example output when it finds a vulnerable User-Agent header.]

XSS attack vectors xsscrapy will test

- Referer header (way more common than I thought it would be!)
- User-Agent header
- Cookie header (added 8/24/14)
- Forms, both hidden and explicit
- URL variables
- End of the URL, e.g. www.example.com/<script>alert(1)</script>
- Open redirect XSS, e.g. looking for links where it can inject a value of javascript:prompt(1)

(A rough sketch of probing the header vectors by hand follows the next list.)

XSS attack vectors xsscrapy will not test

- Other headers: Let me know if you know of other headers you've seen XSS-exploitable in the wild and I may add checks for them in the script.
- Persistent XSSs reflected in pages other than the immediate response page: If you can create something like a calendar event with an XSS in it, but you can only trigger it by visiting a specific URL that's different from the immediate response page, then this script will miss it.
- DOM XSS: DOM XSS will go untested.
- CAPTCHA-protected forms: This should probably go without saying, but CAPTCHAs will prevent the script from testing forms that are protected by them.
- AJAX: Because Scrapy is not a browser, it will not render javascript, so if you're scanning a site that's heavily built on AJAX this scraper will not be able to travel to all the available links. I will look into adding this functionality in the future, although it is not a simple task.
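To make the header vectors concrete, something like the following sketch shows the idea. This is illustrative only, not xsscrapy's actual code; it uses the requests library, and reflected_headers is a hypothetical helper name. It injects a rare marker string (xsscrapy's own test string, introduced below) into each header and checks whether it reflects in the response body.

# Illustrative sketch only -- not xsscrapy's actual code.
# Probes the header injection vectors listed above.
import requests

MARKER = '9zqjx'  # the same rare, HTML-free test string xsscrapy uses

def reflected_headers(url):
    """Return the names of headers whose values reflect in the response."""
    hits = []
    for name in ('Referer', 'User-Agent', 'Cookie'):
        value = 'test=' + MARKER if name == 'Cookie' else MARKER
        body = requests.get(url, headers={name: value}).text
        if MARKER in body:
            hits.append(name)
    return hits

print(reflected_headers('http://example.com'))

A hit here only means the header reflects; whether it is exploitable depends on the filtering checks described next.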
Test strings

There are few XSS spiders out there, but the ones that do exist tend to just slam the target with hundreds to thousands of different XSS payloads, then look for the exact reflection. This is silly. If < and > are filtered, then <img src=x onerror=alert(1)> is going to fail just as hard as <script>alert(1)</script>, so I opted for some smart targeting more along the logical lines ZAP uses.

9zqjx

When doing the initial testing for reflection points in the source code, this is the string that is used. It is short, uses a very rare letter combination, and doesn't use any HTML characters, so lxml can accurately parse the response without missing it.

'"()=<x>

This string is useful as it has every character necessary to execute an XSS payload. The "x" between the angle brackets helps prevent false positives that may occur, like in some ASP filters that allow < and > but not if there are any characters between them.

'"(){}[];

Embedded javascript injection. The most important character for executing an XSS payload inside embedded javascript is ' or ", which is necessary to bust out of the variable you're injecting into. The other characters may be necessary to create functional javascript. This attack path is useful because it doesn't require < or >, which are the most commonly filtered characters.

JaVAscRIPT:prompt(99)

If we find an injection point like <a href="INJECTION">, then we don't need ', ", <, or > because we can just use the payload above to trigger the XSS. We add a few capital letters to bypass poorly written regex filters, use prompt rather than alert because alert is also commonly filtered, and use 99 since it doesn't require quotes, is short, and is not 1, which as you can guess is also filtered regularly.

Logic

Xsscrapy will start by pulling down the list of disallowed URLs from the site's robots.txt file to add to the queue, then start spidering the site, randomly choosing 1 of 6 common user agents. The script is built on top of the web spidering library Scrapy, which in turn is built on the asynchronous Twisted framework, since in Python asynchronicity is simpler, speedier, and more stable than threading. You can choose the number of concurrent requests in settings.py; by default it's 12. I also chose to use the lxml library for parsing responses since it's 3-4x as fast as the popular alternative BeautifulSoup.

With every URL the script spiders, it will send a second request to the URL with the 9zqjx test string as the Referer header, and a third request with /9zqjx tacked onto the end of the URL if there are no variables in the URL, in order to see if that's an injection point as well. It will also analyze the original response for a reflection of the user agent in the code. In order to save memory, I have also replaced the stock hash-lookup duplicate URL filter in Scrapy with a bloom filter.

When xsscrapy finds an input vector like a URL variable, it will load the value with the test string 9zqjx. Why 9zqjx? Because it has very few hits on Google and is only 5 characters long, so it won't have problems with being too long. The script will analyze the response using lxml+XPaths to figure out if the injection point is in between HTML tags like <title>INJECTION</title>, inside an HTML attribute like <input value="INJECTION">, or inside embedded javascript like var='INJECTION';. Once it's determined where the injection took place, it can figure out which characters are going to be the most important and apply the appropriate XSS test string. It will also figure out whether the majority of the HTML attributes are enclosed with ' or " and will apply that fact to its final verdict on what may or may not be vulnerable.
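To make that context detection concrete, here is a simplified sketch of classifying where the 9zqjx marker landed with lxml and XPath. Illustrative only, not xsscrapy's actual code; classify_reflections is a hypothetical helper name.

# Illustrative sketch only -- not xsscrapy's actual code. Classifies
# where the 9zqjx test string reflected, using lxml + XPath.
import lxml.html

MARKER = '9zqjx'

def classify_reflections(body):
    doc = lxml.html.fromstring(body)
    contexts = []
    # In between HTML tags, e.g. <title>9zqjx</title>
    for el in doc.xpath('//*[not(self::script)][contains(text(), "%s")]' % MARKER):
        contexts.append(('between tags', el.tag))
    # Inside an HTML attribute, e.g. <input value="9zqjx">
    for el in doc.xpath('//*[@*[contains(., "%s")]]' % MARKER):
        contexts.append(('attribute', el.tag))
    # Inside embedded javascript, e.g. var x = '9zqjx';
    for js in doc.xpath('//script/text()'):
        if MARKER in js:
            contexts.append(('embedded js', 'script'))
    return contexts

Once the context is known, only the matching test string from the section above needs to be sent: '"()=<x> for between-tag or attribute reflections, '"(){}[]; for embedded javascript.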
Once it's found a reflection point in the code and chosen 1 of the 3 XSS character strings from the section above, it will resend the request with the XSS character string surrounded by 9zqjx, like 9zqjx'"()=<x>9zqjx, then analyze the response from the server. There is a pitfall to this whole method, however. Since we're injecting HTML characters, we can't use lxml to analyze the response, so we must use regex instead (a rough sketch of this check closes out this post). Ideally we'd use regex for both preprocessing and postprocessing, but that would require more time and regex skill than I possess. That being said, I have not yet found an example in the wild where I believe this would have made a difference. That definitely doesn't mean they don't exist.

This script doesn't encode its payloads except for form requests. It will perform one request with the unencoded payload and one request with an HTML entity-encoded payload. This may change later, but for now it seems to me that at least 95% of XSS vulnerabilities in top sites lack any filtering at all, making encoding the payload mostly unnecessary.

After the XSS character string is sent and the response processed using regex, xsscrapy will report its finding to the DEBUG output. By default the output of the script is set to DEBUG and will be very verbose. You can change this within xsscrapy/settings.py if you wish. If it doesn't find anything you'll see "WARNING: Dropped: No XSS vulns in http://example.com". It should be noted that redirects are on, which means it will scrape some domains that aren't part of the original domain you set as the start URL if a domain within the start URL redirects to them. You can disable redirects by uncommenting REDIRECT_ENABLED in xsscrapy/settings.py.

Manually testing reported vulnerabilities

If you see a hit for a vulnerability in a site, you'll need to go manually test it. Tools of the trade: Firefox with the extensions HackBar and Tamper Data. Go to formatted-vulns.txt and check the "Type:" field.

- header: Use Firefox + Tamper Data to edit/create headers and payload the header value.
- url: Just hit the URL with Firefox and change the parameter seen in "Injection point" to a payload.
- form: Use Firefox + HackBar. Enter the value within the "POST url" field of the xsscrapy report into the top half of HackBar, then check the box "Enable Post data" and enter your variable and payload in the bottom box, e.g., var1="><sVG/OnLoaD=prompt(9)>&var2=<sVG/OnLoaD=prompt(9)>
- end of url: Use Firefox, go to the URL listed within the xsscrapy report, and add your payload to the end of the URL like example.com/page1/<sVG/OnLoaD=prompt(9)>

The "Line:" fields are just there to quickly identify whether the vulnerability is a false positive and to see if it's an HTML attribute injection or HTML tag injection.

Recommended payloads:

- Attribute injection: "><sVG/OnLoaD=prompt(9)>
- Between tag injection: <sVG/OnLoaD=prompt(9)>

Future

- I would like the payloaded request URLs to not automatically be URL encoded as they are now. This hasn't been a huge deal so far, but it would be a nice addition. It seems as easy as monkey-patching the Request class and eliminating safe_url_string, but I haven't had success yet.
- Payloads appended to the end of cookies should be added so we can exploit that vector. Done 8/24/14.
- Add the ability to scrape AJAX-heavy sites. ScrapingHub has some middleware, but it's still not the simplest task.
- iframe support. Done 8/25/14.
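As promised above, here is a rough sketch of the regex postprocessing idea: find what came back between the 9zqjx delimiters and check which payload characters the server let through unfiltered. Illustrative only, not xsscrapy's actual regex; surviving_chars is a hypothetical helper name.

# Illustrative sketch only -- not xsscrapy's actual code. Checks which
# payload characters survived unfiltered between the 9zqjx delimiters.
import re

DELIM = '9zqjx'
PAYLOAD_CHARS = '\'"()=<>'  # the characters from the '"()=<x> test string

def surviving_chars(body):
    """For each delimited reflection, map each payload character to
    whether it came back unfiltered."""
    results = []
    pattern = re.escape(DELIM) + '(.*?)' + re.escape(DELIM)
    for m in re.finditer(pattern, body, re.S):
        reflected = m.group(1)
        results.append({c: c in reflected for c in PAYLOAD_CHARS})
    return results

If every character of the chosen test string comes back intact, the injection point is worth a manual look with the recommended payloads above.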