Nytro Posted January 25, 2023

Krzysztof Pranczk, Jan 3

Security Drone: Scaling Continuous Security at Revolut

Intro

As we continue to grow and extend our product offerings, we face many technological challenges. Our engineers solve these challenges, and the result is a steady stream of features, changes and updates successfully delivered to our customers. However, every new feature also brings security challenges for the internal Application Security (AppSec) Team. This lovely bunch is responsible for the security assurance of every new feature our engineers develop.

To provide the highest level of security assurance for our products, we’ve implemented a number of processes at different stages of the Software Development Life Cycle (SDLC), including automated scans in our CI/CD pipelines. However, it wouldn’t be possible to efficiently triage every security finding produced by automated scanners: in July 2022 alone, nearly 39,000 commits were created by over 900 authors! To address this challenge, we developed Security Drone. In this article, we’d like to share our approach to providing the highest security assurance in fast-moving CI/CD environments.

Challenges faced by Revolut

The classic approach to security testing requires the security team to manually review every developed feature, with the help of automated security scans. Traditionally, this work was carried out on a risk and priority basis, and the demand was stretching the team beyond its capacity. Additionally, the team had to cope with numerous CI pipelines across the company. This approach wasn’t viable in terms of scaling, quality or coverage.

Some of you may already have an idea of the security challenges faced in a fast CI/CD environment.
If not, here is a short recap of the challenges we’ve been facing:
• Software changes keep increasing
• New changes are integrated and deployed every day
• Engineers tend to prioritise developing functionality over security
• The internal application security team can’t be big enough to have a dedicated security engineer for each project; both internal and external AppSec teams nowadays must automate the work they were doing manually 10 years ago
• With more tools integrated into the pipelines, each job takes longer, which negatively affects the development experience
• And many more challenges you probably observe in your own companies too!

First solution: the classic approach to CI/CD pipeline scans

Our first solution, and the simplest one, was to onboard automated security scanners such as Static Application Security Testing (SAST) and Software Composition Analysis (SCA) and have the AppSec team review the findings. It worked, but we soon faced another problem. The company was growing, and instead of an initial handful of projects, we now had to deal with thousands of projects and builds running non-stop. As a result, we had to manage hundreds of CI pipelines used for security purposes.

Let’s look at some numbers from our environment. Each working day saw about 950 new pull requests (PRs), with nearly 1.85 commits per PR. During working hours, automated scans were executed 3–4 times per minute on average, across various projects. The chart below shows how many security scans were performed in each 30-minute interval on the 14th of July 2022.

[Figure: Number of automated security scans per 30 minutes]

With those numbers, we faced another challenge: triaging all of the security findings. These scans produced a high number of false positives that had to be manually triaged by the security team.
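The figures above already imply a heavy daily load. The short Python sketch below reproduces that arithmetic. The 8-hour working window is our own assumption for illustration; the article only says the 3–4 scans per minute were observed during working hours.

```python
# Back-of-the-envelope load estimate from the figures quoted in the text.
PRS_PER_DAY = 950          # new pull requests per working day
COMMITS_PER_PR = 1.85      # average commits per PR
SCANS_PER_MINUTE = (3, 4)  # observed range during working hours
WORKING_HOURS = 8          # ASSUMPTION for illustration, not from the article


def commits_per_day(prs: int, commits_per_pr: float) -> float:
    """Approximate number of commits pushed per day via pull requests."""
    return prs * commits_per_pr


def scans_per_working_day(per_minute: int, hours: int) -> int:
    """Number of scans executed over the assumed working window."""
    return per_minute * hours * 60


if __name__ == "__main__":
    print(f"~{commits_per_day(PRS_PER_DAY, COMMITS_PER_PR):.0f} commits/day")
    low, high = (scans_per_working_day(r, WORKING_HOURS) for r in SCANS_PER_MINUTE)
    print(f"{low}-{high} scans per {WORKING_HOURS}h working window")
```

That works out to 950 × 1.85 = 1,757.5 commits a day. Under the 8-hour assumption, it also means 1,440 to 1,920 scans in a single working day, and any of them could produce findings that need triage.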
Initially, we thought we didn’t need to scan every software change; we should only scan the changes intended for the production environment. However, our analysis showed that about 81% of commits ended up in the main branch, so filtering would barely reduce the load and we were still facing the same challenge. Our scanners would still be completing at least three successful security scans of software changes every minute! This volume of scans would still produce a potentially high number of false positives, each of which had to be reviewed and triaged within a set period of time. On top of that, adding more and more security scanners to CI/CD pipelines would lengthen their run times.

So we thought: ‘Do we really have to go in this direction? Do we have to manage all of the pipelines for various projects and triage all of the identified security issues within the AppSec Team? Do we have to affect CI/CD timelines negatively?’

At this point, we had a lot of doubts and questions about this approach. In the Application Security team, we understood it wouldn’t scale in a fast-paced environment such as Revolut. We could implement some small improvements to address each of these questions, but those improvements would only postpone bigger problems. So we decided to go in a slightly different direction, with a shift-left approach to security in mind. Security static analysis tools can easily be shifted left to an earlier phase of the SDLC, since these tools don’t require the application to be running.

Second solution: Security Drone

You may expect our second solution to be something extremely smart and elaborately designed. So did we. But we quickly settled on an agile approach with lean implementations. We also started shifting security left, communicating security issues to developers as early as possible, before changes reach testing or production environments.
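Static analysis shifts left so easily because it inspects source code without executing it. The toy check below is not Semgrep or any tool used at Revolut. It is a minimal sketch built on Python's standard `ast` module, and the function name `find_eval_calls` is hypothetical. The check flags direct calls to `eval` in a snippet that is parsed but never run.

```python
import ast


def find_eval_calls(source: str) -> list[int]:
    """Return the line numbers of direct calls to eval() in `source`.

    Hypothetical toy check: the code is only parsed into an abstract syntax
    tree and never executed, which is why such checks can run before deployment.
    """
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # Match calls whose callee is the bare name `eval` (not obj.eval).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            lines.append(node.lineno)
    return sorted(lines)


if __name__ == "__main__":
    snippet = (
        "import os\n"
        "user_input = os.environ.get('EXPR', '1+1')\n"
        "result = eval(user_input)\n"
    )
    print(find_eval_calls(snippet))  # prints [3]
```

Real SAST tools apply the same idea with far richer rule languages and data-flow tracking. The key property is identical: no running application is needed.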
For our MVP, we decided to scan code changes during the pull request phase. In our opinion, this was the natural point, as it is where engineers propose software changes for their colleagues to approve. We also decided to communicate all identified security findings to developers, so the scanner’s results could be taken into account during code review, which is an integral part of the development process at Revolut. The scanner was triggered whenever a new PR was created or code was updated in an existing PR.

We also decided to run Security Drone in a Kubernetes cluster, so that it scans code independently of CI/CD pipelines and scan management is centralised. In many cases, these independent scans delivered results to developers before the CI jobs had even finished.

Initially, we implemented only a SAST scanner, to keep the process as fast as possible and deliver results to developers quickly. We aimed to provide results in under 300 seconds. We carefully researched the available SAST solutions to choose the one most suitable for our needs. It had to be fast and well-documented, and it had to let us write custom rules to identify potential security issues specific to our environment. Last but not least, we wanted the lowest possible false positive rate, to avoid producing irrelevant findings and to limit the manual work involved in triage. This mattered even more because we had decided to report all identified findings to developers on the PR page. We had to make sure their experience wasn’t harmed by having to review irrelevant findings.

Later, we added new scanners to Security Drone, such as Software Composition Analysis (SCA) and Infrastructure as Code (IaC), to bring more value to automated security testing. Security Drone’s high-level architecture is shown in the figure below.
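The following is a minimal sketch of the PR-triggered, CI-independent flow described above. It assumes GitHub-style `pull_request` webhook payloads, because the article does not name the code-hosting platform. In that format, the `opened` action fires when a PR is created and `synchronize` fires when new commits are pushed to it. Every name here is hypothetical: `should_scan`, `run_scanners`, `ScanResult`, the scanner callables and `SCAN_BUDGET_SECONDS`. The budget simply mirrors the 300-second target, and a thread pool stands in for the Kubernetes-hosted service.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: names are illustrative, not Revolut's implementation.
SCAN_BUDGET_SECONDS = 300  # mirrors the "under 300 seconds" target

# GitHub-style pull_request actions: PR created, or new commits pushed to it.
TRIGGER_ACTIONS = {"opened", "synchronize"}


@dataclass
class ScanResult:
    scanner: str
    findings: list[str] = field(default_factory=list)
    timed_out: bool = False


def should_scan(event: dict) -> bool:
    """Decide whether a webhook payload should trigger a scan."""
    return "pull_request" in event and event.get("action") in TRIGGER_ACTIONS


def run_scanners(
    repo_path: str,
    scanners: dict[str, Callable[[str], list[str]]],
    budget: float = SCAN_BUDGET_SECONDS,
) -> list[ScanResult]:
    """Run every scanner in parallel against one checkout, within a time budget."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(scanners)))
    try:
        futures = {name: pool.submit(fn, repo_path) for name, fn in scanners.items()}
        done, _ = wait(futures.values(), timeout=budget)
        results = []
        for name, fut in futures.items():
            if fut in done:
                # Exceptions raised by a scanner propagate to the caller here.
                results.append(ScanResult(name, fut.result()))
            else:
                # Report the timeout instead of holding up PR feedback.
                results.append(ScanResult(name, timed_out=True))
        return results
    finally:
        # Don't block on stragglers; queued-but-unstarted scans are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
```

In use, a webhook handler would call `should_scan(payload)`. If that returns `True`, the handler would check out the PR head and pass the checkout path to `run_scanners`, with one callable per scanner (for example, thin wrappers around the Semgrep, Snyk and Checkov CLIs). Running this outside the CI jobs is what leaves pipeline timelines untouched.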
[Figure: Architecture of Security Drone]

Currently, we scan all pull requests created by Revolut engineers. In July, Security Drone performed over 39,000 scans. The median scanning time is below 112 seconds and the average is below 110 seconds. Initially, we used 19 SAST and 63 IaC rules. Only high and critical SCA issues were reported directly to our developers. Our first MVP was released in Q1 2022 and has been extended and adjusted in the months since. Security Drone has now been operating in production for eight months without issues on both the operational and engineering distribution sides.

We use the following tools in Security Drone:
• Semgrep: Static Application Security Testing (SAST)
• Snyk Open Source: Software Composition Analysis (SCA)
• Checkov: Infrastructure as Code (IaC)

What have we achieved with Security Drone?
• We have adopted a shift-left approach to security, identifying and communicating security findings earlier in the SDLC, before changes reach testing or production environments
• Security issues can be fixed before going into production; as a result, they don’t have to be manually triaged by AppSec Team members
• Only security issues that get merged are reported to the AppSec Team to triage and feed into the vulnerability lifecycle process
• We lowered the false positive rate by carefully choosing the SAST solution and continually tuning our rules. We were able to achieve a ~3.8% FP rate!
• Our centrally managed scanner currently scans 100% of the code at Revolut, which saves us hundreds of hours of manual reviews. Here are some numbers from the last 24 hours:
  • Nearly 1700 pull requests were scanned
  • Over 3900 scans associated with the above PRs were performed
• The ability to find new vulnerabilities in other applications based on patterns
• The scans are fast and don’t disrupt the developer experience.
  They’re executed in parallel, with the following median scanning times:
  • SAST: 11 seconds
  • IaC: 22 seconds
  • SCA: 101 seconds
• Increased security awareness and continuous learning amongst engineers, who are also aware of the direction AppSec is moving in.

What is next?

Security Drone will always be under development, as new technologies keep emerging and the development experience can always be improved. Our roadmap includes, among other things:
• The ability to flag findings as false positives in a developer-friendly way
• Incremental SAST scans: scanning only the code changes in PRs
• Integration of more security scanners and development of more SAST/IaC rules

Keep your eyes peeled for our next blog post on Application Security at Revolut, as we may share more interesting tools and guides on how we solve the challenges we face every day.

Credits

Credit goes to every Revolut AppSec engineer involved in the design and development of Security Drone, especially: Arsalan Ghazi, Krzysztof Pranczk, Pedro Moura, Roger Norton

Source: https://medium.com/revolut/security-drone-scaling-continuous-security-at-revolut-862bcd55956e