Filtering the Crap, Content Security Policy (CSP) Reports

Nytro · April 15, 2020

Stuart Larsen

It's pretty well accepted that if you collect Content Security Policy (CSP) violation reports, you're going to have to filter through a lot of confusing and unactionable reports.

But it's not as bad as it used to be. Things are way better than they were six years ago, when I first started down the CSP path with Caspr. Browsers and other User Agents are way more thoughtful about what they report and when. And new additions to CSP such as 'report-sample' have made filtering reports pretty manageable.

This article will give a quick background, and then cover some techniques that can be used to filter Content Security Policy reports.

What is a Content Security Policy report?

If you're new to Content Security Policy, I'd recommend checking out An Introduction To Content Security Policy first.

Content Security Policy has a nifty feature called report-uri. When report-uri is enabled, the browser sends a JSON blob whenever it detects a violation of the CSP (for more info: An Introduction to report-uri). That JSON blob is the report.

Here's a random violation report from my personal website https://c0nrad.io:

Figure: Violation report from c0nrad.io on an inline style

The report has a number of tasty details:

blocked-uri: inline. The blocked resource was an 'inline' resource.
violated-directive: style-src-elem. The violated directive covers CSS style elements (a <style> block, as opposed to a style= attribute on an HTML element).
source-file / line-number: https://c0nrad.io/ / 8. The inline resource came from https://c0nrad.io/ on line 8. If you view the source of https://c0nrad.io, it's still there.
script-sample: .something { width: 100%}. The sample contains (up to) the first 40 characters of the blocked inline content.

These reports are a miracle when getting started with CSP. You can use them to quickly determine where your policy is lacking. You can even use them to build new policies from scratch. That's actually how tools like CSP Generator automatically build new content security policies: just by parsing these reports.
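To make the "just by parsing these reports" idea a little more concrete, here is a minimal Python sketch of turning reports into raw material for a policy. It assumes the report-uri JSON shape shown in this article (a body wrapped in a "csp-report" key); parse_report, directive_of and suggest_sources are hypothetical helper names, and this is not how CSP Generator itself is implemented.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def parse_report(body: str) -> dict:
    """Unwrap a report-uri payload of the form {"csp-report": {...}}."""
    return json.loads(body).get("csp-report", {})


def directive_of(report: dict) -> str:
    """Prefer effective-directive; older user agents may only send
    violated-directive, which can include a source list, so keep the first token."""
    name = report.get("effective-directive") or report.get("violated-directive") or ""
    return name.split(" ")[0]


def suggest_sources(reports: list) -> dict:
    """Group blocked origins by directive as raw material for a new policy."""
    sources = defaultdict(set)
    for report in reports:
        blocked = report.get("blocked-uri") or ""
        parts = urlsplit(blocked)
        if parts.scheme in ("http", "https") and parts.netloc:
            # Allow the origin, not the full URL of the blocked resource.
            sources[directive_of(report)].add(f"{parts.scheme}://{parts.netloc}")
        elif blocked:
            # 'inline', 'eval', 'data' and friends need a human decision.
            sources[directive_of(report)].add(f"review: {blocked}")
    return dict(sources)
```

Keeping 'inline' and 'eval' as review items, rather than automatically turning them into 'unsafe-inline' or 'unsafe-eval', is deliberate: those are exactly the reports the rest of this article is about filtering.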

Why filter Content Security Policy reports?

If the violation reports are so amazing, why would we want to filter them? It seems a little counterintuitive at first, but the sad reality is that not all reports are created equal.

Here are some of the inline reports that Csper has received on its own policy. Only three of them are from a real inline script in Csper (which I purposely injected):

Figure: Sample violation reports generated by Content Security Policy

For more fun, I highly recommend checking out this amazing list of fun CSP reports: csp-wtf

What's frustrating is that a large percentage of the reports received from CSP are unactionable; they aren't really related to the website. These unactionable reports can come from a lot of different places. The most common source is extensions and add-ons. There are also ads, content injected by ISPs, malware, corporate proxies, custom user scripts, browser quirks, and a sprinkle of serious "wtf" reports.

Filtering Techniques

The goal of filtering is to remove the unactionable reports, so that you're only left with reports that should be looked into. But of course you don't want to filter so much that you lose reports that really should have been analyzed (such as an XSS on your website).

The techniques below are listed roughly in order of importance, ease, and reliability.

Blacklists

The easiest way to filter out a huge number of reports is to apply some simple blacklisting rules.

I think everyone either directly or indirectly has taken a page from Neil Matatall's/Twitter's book back in 2014: https://matatall.com/csp/twitter/2014/07/25/twitters-csp-report-collector-design.html

Some more lists:

Depending on your use case, it may be better to classify reports first and then selectively filter out those classifications later (just in case you actually need the reports). Some buckets I've found to work well are 'extension' and 'unactionable'.

This technique alone cuts out ~50% of the weird reports.
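The classify-then-filter approach could look like the Python sketch below. The extension prefixes are the URL schemes browsers use for extension resources; the 'unactionable' prefixes are placeholder examples, and classify is a hypothetical helper. It only attaches a bucket name, so nothing is thrown away.

```python
from typing import Optional

# Illustrative patterns only; extend them from the lists referenced above.
EXTENSION_PREFIXES = (
    "chrome-extension",
    "moz-extension",
    "safari-extension",
    "ms-browser-extension",
)
UNACTIONABLE_PREFIXES = ("about", "resource:")  # hypothetical examples


def classify(report: dict) -> Optional[str]:
    """Return a bucket name for known-noise reports, or None to keep the report."""
    blocked = report.get("blocked-uri") or ""
    source = report.get("source-file") or ""
    if blocked.startswith(EXTENSION_PREFIXES) or source.startswith(EXTENSION_PREFIXES):
        return "extension"
    if blocked.startswith(UNACTIONABLE_PREFIXES):
        return "unactionable"
    return None
```

A dashboard can then hide the 'extension' and 'unactionable' buckets by default while still letting you open them when you need to.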

Malformed Reports

Another easy way to filter reports is to make sure they have all the necessary fields (and that fields like effective-directive actually contain a real directive). If a report is missing fields, it's probably not worth the time to investigate; it most likely came from a very old or incomplete user agent. All the fields can be found in the CSP3 spec.

You could argue that the users being XSS'ed might be on a very old browser that doesn't report all fields correctly, so filtering them out means missing an XSS that needs to be patched. That's definitely fair. But with browsers auto-updating, I think (and hope) most people are on a reasonably recent browser. Also (and this shouldn't be a full excuse not to care), people on very outdated browsers probably have plenty of other browser problems to worry about. And if multiple users are being XSS'ed, most of them are probably on a competent user agent that reports all the fields, so the attack will still be picked up.

It comes down to how much time and resources an organization can dedicate to CSP. Something is better than nothing, and in this case, this something can save you hours for a pretty small chance of something falling through the cracks. I recommend labeling reports that are missing important fields (or that contain egregious values) as 'malformed' and keeping them to the side so they can be skimmed every once in a while.
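A 'malformed' label could be computed along these lines. This Python sketch assumes a minimum field set of my own choosing (REQUIRED_FIELDS) and an illustrative, not exhaustive, set of real directive names; malformed_label is a hypothetical helper. When effective-directive is absent it falls back to the first token of violated-directive, which older user agents send.

```python
from typing import Optional

# Assumed minimum field set; the CSP3 spec lists every field a report can carry.
REQUIRED_FIELDS = ("document-uri", "violated-directive", "blocked-uri", "original-policy")

# Illustrative, not exhaustive, set of real directive names.
KNOWN_DIRECTIVES = {
    "default-src", "child-src", "connect-src", "font-src", "frame-src",
    "img-src", "manifest-src", "media-src", "object-src", "script-src",
    "script-src-elem", "script-src-attr", "style-src", "style-src-elem",
    "style-src-attr", "worker-src", "base-uri", "form-action", "frame-ancestors",
}


def malformed_label(report: dict) -> Optional[str]:
    """Return 'malformed' when fields are missing or the directive is bogus."""
    if any(field not in report for field in REQUIRED_FIELDS):
        return "malformed"
    name = report.get("effective-directive") or report.get("violated-directive") or ""
    if str(name).split(" ")[0] not in KNOWN_DIRECTIVES:
        return "malformed"
    return None
```

Returning a label instead of dropping the report keeps the "set aside and skim occasionally" workflow described above.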

Bots

Another easy way to filter out 'unactionable' reports is to check whether the User Agent belongs to a bot. A number of web scrapers inject JavaScript into the pages they are analyzing, and the bots also have CSP enabled, so the injected scripts generate reports. That seems silly at first, but they're probably just using headless Chrome or something similar.

Some example user agents:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; BuiltWith/1.0; +http://builtwith.com/biup) Chrome/60.0.3112.50 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/61.0.3163.59 Safari/537.36 Prerender (+https://github.com/prerender/prerender)
Mozilla/5.0 (compatible; woorankreview/2.0; +https://www.woorank.com/)
Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)

Since these user agents inject their own JavaScript unrelated to the website, their reports aren't worth the time to investigate.
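The User-Agent arrives as a header on the report request rather than inside the report JSON, so the check runs on that header. A minimal Python sketch, assuming a hand-maintained token list built from the example user agents above (BOT_TOKENS and is_bot are hypothetical names):

```python
import re

# Tokens taken from the example user agents above; extend as new bots show up.
BOT_TOKENS = ("BuiltWith", "HeadlessChrome", "Prerender", "woorankreview", "AhrefsBot")
BOT_PATTERN = re.compile("|".join(re.escape(token) for token in BOT_TOKENS), re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    """True when the User-Agent header contains a known bot token."""
    return BOT_PATTERN.search(user_agent or "") is not None
```

Matching specific tokens rather than a generic "bot" substring is a deliberate choice: a short generic substring can also match ordinary device or browser names.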

Script-Sample

If report-sample is enabled (and I highly recommend enabling it), you can start filtering reports on the first 40 characters in the script-sample field.

A good starting point is the csp-wtf list.

A quick note of caution, though, for websites running in 'Content-Security-Policy-Report-Only': if you automatically filter out anything that matches these script-samples, an attacker could try to use an XSS payload that starts with one of those strings to avoid detection. If it's a DOM based XSS, it'll be very hard to tell whether a report is an extension injection or the DOM based XSS itself (more on that later).
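Given that caution, a script-sample match may be better used as a label than as a silent drop. The Python sketch below assumes a placeholder prefix list (KNOWN_NOISE_PREFIXES is hypothetical; you would populate it from csp-wtf and your own triage) and strips the trailing ellipsis that some browsers append to truncated samples.

```python
from typing import Optional

# Placeholder entries; populate from csp-wtf and your own triage.
# Keep each entry at or below 40 characters, the length of a script-sample.
KNOWN_NOISE_PREFIXES = ("/* hypothetical extension snippet */",)


def normalize_sample(sample: str) -> str:
    """Drop the trailing ellipsis some browsers append, plus surrounding whitespace."""
    return sample.rstrip("\u2026").strip()


def sample_label(report: dict) -> Optional[str]:
    """Label reports whose script-sample starts with a known-noise prefix."""
    sample = normalize_sample(report.get("script-sample") or "")
    if sample and sample.startswith(KNOWN_NOISE_PREFIXES):
        return "known-noise"
    return None
```

On report-only deployments, 'known-noise' reports could still be surfaced occasionally, precisely because an attacker can choose a payload that starts with one of these prefixes.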

Browser Age

One filtering technique that Csper started supporting this week is filtering on browser age. Older (and less common) browsers have some fun CSP quirks (and, sadly, probably more malware, toolbars, etc., all of which cause unactionable reports), so if you're short on resources, their reports should probably get less attention.

So when a report is received, you take the User Agent and version, look up the release date of that User Agent, and if it's older than some time period (say, 2 years), label it as an older user agent. This cuts out roughly 15% of the reports.
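The lookup could be sketched in Python as follows. The release-date table is illustrative only (in practice you'd load it from a maintained dataset), the regex only understands Chrome and Firefox version tokens, and is_old_user_agent is a hypothetical helper that returns None when it can't tell.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Illustrative entries only; load real release dates from a maintained dataset.
RELEASE_DATES = {
    ("Chrome", 61): date(2017, 9, 5),
    ("Chrome", 80): date(2020, 2, 4),
    ("Firefox", 74): date(2020, 3, 10),
}
UA_VERSION = re.compile(r"(Firefox|Chrome)/(\d+)")
MAX_AGE = timedelta(days=2 * 365)  # the "2 years" cut-off from the text


def is_old_user_agent(user_agent: str, today: date) -> Optional[bool]:
    """True if older than MAX_AGE, False if newer, None if the release date is unknown."""
    match = UA_VERSION.search(user_agent or "")
    if match is None:
        return None
    released = RELEASE_DATES.get((match.group(1), int(match.group(2))))
    if released is None:
        return None
    return today - released > MAX_AGE
```

Treating "unknown" separately from "old" avoids quietly demoting reports from browsers the table doesn't cover yet.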

The same argument still holds: "what if the XSS victim is using an old browser?" Again, I think it's up to the website's security maturity and available resources to determine an appropriate level of effort. But for the average website, giving less attention to browsers more than 2 years old and more attention to the rest of the reports is infinitely better than being flooded by reports and doing nothing. The reports are still there for those who want to look at them.

line-number / column-number analysis

Modern browsers make a good attempt to add line-number/column-number to violation reports (the PR for Chrome).

So when a resource doesn't have a line-number/column-number, it's a good cause for a raised eyebrow. A lot of reports also use "1" or "0" as the line number. These can also be a great signal that something is odd.

I found that a line number of 0 or 1 usually signifies that the resource was "injected" after the fact (as in, it was not part of the original HTML). This could be things like SPAs (Angular/React) injecting resources, browsers injecting content scripts, or a DOM based XSS.

Unfortunately (at least for modern chrome), I couldn't find a way to determine the difference between a DOM based XSS, and something injected by a browser script.

For example, here's a report of a DOM based XSS I injected myself through an Angular innerHTML. It looks pretty much the same as a lot of extension injections, with a line-number of 1:

Still, it is interesting when a report has a line-number of 1. So inline reports can be split into two categories: "inline" or "injected". The injected category will contain most of the browser noise, but it could also contain DOM based XSS's, so it still needs to be looked at.

I hope that in the future source-file will more accurately reflect where the JavaScript came from, so we can filter out all extension noise with ease.
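The inline/injected split could be sketched in Python like this, assuming the report-uri field names used in this article (inline_category is a hypothetical helper). Treating a missing line number like 0 or 1 follows the "raised eyebrow" heuristic above.

```python
from typing import Optional


def inline_category(report: dict) -> Optional[str]:
    """Split inline reports into 'inline' and 'injected' by line-number."""
    if report.get("blocked-uri") != "inline":
        return None
    line = report.get("line-number")
    # Missing, 0 or 1 usually means the resource was added after the HTML was parsed:
    # SPA rendering, extension content scripts, or a DOM based XSS.
    if line in (None, 0, 1):
        return "injected"
    return "inline"
```

One caveat: on a page whose HTML is minified onto a single line, genuine inline blocks also sit on line 1, so this heuristic would put everything in 'injected' there.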

SourceFile / DocumentURI

In a somewhat related vein, stored or reflected XSS's should have a matching source-file/document-uri (obviously not the case for DOM based or more exotic XSS's). In some of the odd reports, the source file is something external (such as a script from Google Translate).

If you're specifically looking to detect a stored/reflected XSS, a mismatch can be a nice indication that the report may not be as useful.

Somewhat related, Firefox also doesn't include source-file on evals from extensions, which can help reduce eval noise (those reports can be placed in the extension bucket).
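A tri-state check covers all three cases above: match, mismatch, and a missing source-file. This Python sketch compares scheme, host and path only, ignoring query strings and fragments; that normalization is an assumption of mine, and source_file_status is a hypothetical helper.

```python
from urllib.parse import urlsplit


def _page(url: str) -> tuple:
    parts = urlsplit(url)
    # Compare scheme, host and path; treat an empty path as "/".
    return (parts.scheme, parts.netloc, parts.path or "/")


def source_file_status(report: dict) -> str:
    """Return 'match', 'mismatch', or 'missing' for source-file vs document-uri."""
    source = report.get("source-file")
    document = report.get("document-uri")
    if not source or not document:
        return "missing"  # e.g. Firefox extension evals arrive without source-file
    return "match" if _page(source) == _page(document) else "mismatch"
```

Ignoring the query string makes the comparison tolerant of reflected payloads that only show up in document-uri; a stricter variant could compare the full URL instead.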

Other Ideas

Similar Reports From Newer Browser Versions

Browsers are getting way better at fully specifying what content came from an extension. For example, with the report below it's pretty obvious that it came from an extension (thanks to the source-file starting with moz-extension). This report came from a user agent (Firefox/Windows/Desktop) released 22 days ago.

The next report most likely came from the same extension, but from the report itself it's not obvious where it came from. This user agent is also Firefox/Windows/Desktop, but was released 9 months ago.

{
  "csp-report": {
    "blocked-uri": "inline",
    "column-number": 1,
    "document-uri": "https://csper.io/blog/other-csp-security",
    "line-number": 1,
    "original-policy": "default-src 'self'; connect-src 'self' https://*.hotjar.com https://*.hotjar.io https://api.hubspot.com https://forms.hubspot.com https://rs.fullstory.com https://stats.g.doubleclick.net https://www.google-analytics.com wss://*.hotjar.com; font-src 'self' data: https://script.hotjar.com; frame-src 'self' https://app.hubspot.com https://js.stripe.com https://vars.hotjar.com https://www.youtube.com; img-src 'self' data: https:; object-src 'none'; script-src 'report-sample' 'self' http://js.hs-analytics.net/analytics/ https://edge.fullstory.com/s/fs.js https://js.hs-analytics.net/analytics/ https://js.hs-scripts.com/ https://js.hscollectedforms.net/collectedforms.js https://js.stripe.com/v3/ https://js.usemessages.com/conversations-embed.js https://script.hotjar.com https://static.hotjar.com https://www.google-analytics.com/analytics.js https://www.googletagmanager.com/gtag/js; style-src 'report-sample' 'self' 'unsafe-inline'; base-uri 'self'; report-uri https://csper-prod.endpoint.csper.io/",
    "referrer": "",
    "script-sample": "(() => {\n        try {\n            // co…",
    "source-file": "https://csper.io/blog/other-csp-security",
    "violated-directive": "script-src"
  }
}

It's not perfect, but it may be possible to group similar reports together and perform the analysis on the report from the latest user agent. But you have to be careful not to group reports so aggressively that an attacker could smuggle XSS payloads starting with (() => {\n try {\n // co past detection on report-only deployments.
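One way to sketch that grouping in Python: key each report on its directive, blocked-uri and truncated script-sample, then keep the member from the newest user agent as the group's representative. The grouping key and the (report, release date) input pairs are assumptions; group_key and newest_per_group are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date


def group_key(report: dict) -> tuple:
    """Similarity key: directive, blocked-uri and the truncated script-sample."""
    directive = report.get("effective-directive") or report.get("violated-directive") or ""
    sample = (report.get("script-sample") or "").rstrip("\u2026")
    return (directive.split(" ")[0], report.get("blocked-uri"), sample)


def newest_per_group(items: list) -> dict:
    """items: (report, user-agent release date) pairs. Returns the pair with the
    newest user agent for each group of similar reports."""
    groups = defaultdict(list)
    for report, released in items:
        groups[group_key(report)].append((report, released))
    return {key: max(members, key=lambda pair: pair[1]) for key, members in groups.items()}
```

Because the key is only as specific as a 40-character sample, any payload with the same opening characters lands in the same group, which is exactly the smuggling risk described above.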

Hopefully, as everyone moves to very recent browsers, we can just filter on the source-file. There was also a little chatter about adding the sha256 hash to the report, which would make this far more feasible (but people would need to be on recent browser versions to send the new hash, and by that point we'll already have the moz-extension indicator in the source-file).

'Crowd Sourced' Labeling

Another idea I've been mulling over is 'Crowd Sourced' labeling. What if people could mark reports as "unactionable" (somewhat like the csp-wtf list), or as "this report doesn't apply to my project"? These labels could be aggregated and then displayed to other users of a report-uri endpoint as "other users have marked this report as unactionable". For people just getting started with CSP, this could be nice validation for ignoring a report.

Or, if there are XSS's with a known payload, people could mark them as "this was a real XSS", and other people would get that indication when a similar report appears in their project.

Due to privacy/abuse concerns, this idea has been kicked down the road. It would need to be rock solid. As of right now (for Csper) there is no way for a customer to glean information about another customer, and obviously this is how things should be. Maybe in the future there could be an opt-in, anonymized feature flag for this, but not for many months at least. If this is interesting to you (whether you think it's a good idea or a terrible idea), I'd love to hear your thoughts: stuart@csper.io.

Conclusion

A dream I have is that one day almost everyone could use Content-Security-Policy-Report-Only and get value with almost no work. If individuals are using the latest user agents, and if an endpoint's classification is good enough, websites could roll out CSP in report-only mode for a few weeks to establish a baseline of known inline reports and their positions. The endpoint would then know where expected inline resources exist and only alert website owners about new reports that it thinks are XSS. XSS detection for any website, for almost no work.
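The baseline half of that idea could be sketched in Python roughly as below: during a learning window, record where inline violations normally occur, then flag reports whose position hasn't been seen. The fingerprint fields are my assumption, and InlineBaseline is a hypothetical class, not a Csper feature.

```python
class InlineBaseline:
    """Learn where inline violations normally occur, then flag anything new."""

    def __init__(self) -> None:
        self.known = set()

    @staticmethod
    def fingerprint(report: dict) -> tuple:
        directive = report.get("effective-directive") or report.get("violated-directive") or ""
        return (
            directive.split(" ")[0],
            report.get("document-uri"),
            report.get("source-file"),
            report.get("line-number"),
            report.get("column-number"),
        )

    def learn(self, report: dict) -> None:
        """Call during the report-only learning window."""
        self.known.add(self.fingerprint(report))

    def is_new(self, report: dict) -> bool:
        """After the window, True means a position the baseline has never seen."""
        return self.fingerprint(report) not in self.known
```

This keys on the full document-uri; a real endpoint would probably normalize URLs (and run the filters above first), or the baseline would grow with every new query string.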

We're not there yet. But browsers are getting better at what they send, and classification of reports is getting easier.

I hope this was useful! If you have any ideas or comments I would love to hear them! stuart at csper.io.
