Nytro

Windows CSRSS Write Up: Inter-process Communication


Windows CSRSS Write Up: Inter-process Communication (part 1/3)

In the second post of the Windows CSRSS Write Up series, I would like to explain how the practical communication between the Windows Subsystem and a user’s process takes place under the hood. Because some major improvements were introduced in Windows Vista and later, the article is split into two parts. The first one gives an insight into what the communication channel really is, and how it is used by both CSRSS and user processes. The second one walks through the modifications and new features shipped with Windows starting from Vista, since most of the basic ideas have remained the same for decades. Now that you know what to expect, proceed to the next section :-)

Local Procedure Calls

Before starting to analyze the mystery API interface implemented by CSRSS (otherwise known as CsrApi), one must first get some basic knowledge regarding the internal mechanism, used to establish a stable inter-process connection and actually exchange information.

The basics

LPC is a packet-based, inter-process communication mechanism implemented in the NT kernel (supported since the very first Windows NT versions – most likely 3.51). The mechanism was originally designed to allow communication between modules running at different processor privilege levels, i.e. process – process, process – driver and driver – driver connections are equally well supported. This is possible because the required API functions are exposed to both user-mode (via ntdll.dll) and kernel-mode (via ntoskrnl.exe). Even though we are mostly concerned with the first scenario (where numerous ring-3 processes communicate with csrss.exe), practical examples of the remaining two also exist, for instance the Kernel Mode Security Support Provider Interface (KSecDD.sys) communicating with LSASS.exe. Apart from being used by certain system processes talking to each other (e.g. Lsass verifying user credentials on behalf of Winlogon), LPC is also a part of the RPC (Remote Procedure Call) implementation.

It should also be noted that the LPC mechanism is directed towards synchronous communication, and therefore enforces a blocking scheme, where the client must wait until its request is dispatched and handled instead of continuing its execution. As mentioned in the introduction, Windows Vista brought some major changes in this matter. One of them was a brand new mechanism called ALPC (expanded as either Advanced or Asynchronous LPC, depending on the source), which deprecated the old LPC mechanism. Since then, client – server requests can be performed asynchronously, so that the client is not forced to wait for the response.

Underlying port objects

As it turns out, a great part of the Windows system functionalities internally rely on special, dedicated objects (implemented by the Object Manager) – let it be File System operations, Windows Registry management, thread suspension or whatever you can think of – the LPC mechanism isn’t any different. In this particular case, we have to deal with a port object, otherwise known as LpcPortObjectType. The OBJECT_TYPE structure, describing the object in consideration, is defined as follows:

kd> dt _OBJECT_TYPE 81feca90 /r
ntdll!_OBJECT_TYPE
   +0x000 Mutex : _ERESOURCE
   +0x038 TypeList : _LIST_ENTRY [ 0x81fecac8 - 0x81fecac8 ]
   +0x040 Name : _UNICODE_STRING "Port"
      +0x000 Length : 8
      +0x002 MaximumLength : 0xa
      +0x004 Buffer : 0xe1007110 "Port"
   +0x048 DefaultObject : 0x80560960 Void
   +0x04c Index : 0x15
   +0x050 TotalNumberOfObjects : 0xdb
   +0x054 TotalNumberOfHandles : 0xd9
   +0x058 HighWaterNumberOfObjects : 0xdb
   +0x05c HighWaterNumberOfHandles : 0xd9
   +0x060 TypeInfo : _OBJECT_TYPE_INITIALIZER
      +0x000 Length : 0x4c
      +0x002 UseDefaultObject : 0x1 ''
      +0x003 CaseInsensitive : 0 ''
      +0x004 InvalidAttributes : 0x7b2
      +0x008 GenericMapping : _GENERIC_MAPPING
      +0x018 ValidAccessMask : 0x1f0001
      +0x01c SecurityRequired : 0 ''
      +0x01d MaintainHandleCount : 0 ''
      +0x01e MaintainTypeList : 0 ''
      +0x020 PoolType : 1 ( PagedPool )
      +0x024 DefaultPagedPoolCharge : 0xc4
      +0x028 DefaultNonPagedPoolCharge : 0x18
      +0x02c DumpProcedure : (null)
      +0x030 OpenProcedure : (null)
      +0x034 CloseProcedure : 0x805904f3 void nt!ObReferenceObjectByName+0
      +0x038 DeleteProcedure : 0x805902e1 void nt!ObReferenceObjectByName+0
      +0x03c ParseProcedure : (null)
      +0x040 SecurityProcedure : 0x8056b84f long nt!CcUnpinDataForThread+0
      +0x044 QueryNameProcedure : (null)
      +0x048 OkayToCloseProcedure : (null)
   +0x0ac Key : 0x74726f50
   +0x0b0 ObjectLocks : [4] _ERESOURCE

This object can be considered a gateway between two modules: both sides of the communication channel use it, without seeing each other directly. More precisely, we are only concerned with named ports here, because the object must be easily accessible to every possible process. After the server correctly initializes a named port object (later utilized by the clients), it waits for an incoming connection. When a client eventually decides to connect, the server can verify whether further communication should be allowed (usually based on the client’s CLIENT_ID structure). If the request is accepted, the connection is considered established, and the client is able to send input messages and optionally wait for a response (depending on the packet type). Every single packet exchanged between the client and server (including the initial connection requests) begins with a PORT_MESSAGE structure, defined as follows:

//
// LPC Port Message
//
typedef struct _PORT_MESSAGE
{
    union
    {
        struct
        {
            CSHORT DataLength;
            CSHORT TotalLength;
        } s1;
        ULONG Length;
    } u1;
    union
    {
        struct
        {
            CSHORT Type;
            CSHORT DataInfoOffset;
        } s2;
        ULONG ZeroInit;
    } u2;
    union
    {
        LPC_CLIENT_ID ClientId;
        double DoNotUseThisField;
    };
    ULONG MessageId;
    union
    {
        LPC_SIZE_T ClientViewSize;
        ULONG CallbackId;
    };
} PORT_MESSAGE, *PPORT_MESSAGE;

The above header consists of the most essential information concerning the message, such as:

  • DataLength
    Determines the size of the buffer following the header structure (in bytes)
  • TotalLength
    Determines the entire size of the packet; must be equal to sizeof(PORT_MESSAGE) + DataLength
  • Type
    Specifies the packet type; can be one of the following:
    //
    // LPC Message Types
    //
    typedef enum _LPC_TYPE
    {
        LPC_NEW_MESSAGE,
        LPC_REQUEST,
        LPC_REPLY,
        LPC_DATAGRAM,
        LPC_LOST_REPLY,
        LPC_PORT_CLOSED,
        LPC_CLIENT_DIED,
        LPC_EXCEPTION,
        LPC_DEBUG_EVENT,
        LPC_ERROR_EVENT,
        LPC_CONNECTION_REQUEST,
        LPC_CONNECTION_REFUSED,
        LPC_MAXIMUM
    } LPC_TYPE;

  • ClientId
    Identifies the packet sender by Process ID and Thread ID
  • MessageId
    A unique value, identifying a specific LPC message
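The relation between the two length fields listed above can be sketched in portable C. This is only a model: the fixed-width types, the flattened ClientId members and the helper name model_init_header are assumptions made for illustration, chosen so that the header occupies 24 bytes as it does on 32-bit systems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified 32-bit model of PORT_MESSAGE. Field names follow the article;
 * the fixed-width types are an assumption made for portability. */
typedef struct {
    int16_t  DataLength;     /* bytes following the header */
    int16_t  TotalLength;    /* header plus data */
    int16_t  Type;
    int16_t  DataInfoOffset;
    uint32_t UniqueProcess;  /* ClientId: process part */
    uint32_t UniqueThread;   /* ClientId: thread part */
    uint32_t MessageId;
    uint32_t ClientViewSize;
} MODEL_PORT_MESSAGE;

/* Hypothetical helper: zero the header and fill in both length fields.
 * Returns 0 on success, -1 if the total does not fit in a signed 16-bit field. */
int model_init_header(MODEL_PORT_MESSAGE *h, size_t data_len)
{
    size_t total = sizeof(*h) + data_len;

    if (total > INT16_MAX)
        return -1;

    memset(h, 0, sizeof(*h));
    h->DataLength  = (int16_t)data_len;
    h->TotalLength = (int16_t)total;
    return 0;
}
```

Calling model_init_header(&h, 16) leaves DataLength at 16 and TotalLength at 40, i.e. the 24-byte header plus the payload.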

Because LPC can be used to send both small and large amounts of data, two distinct mechanisms of passing memory between the client and server were developed. If 304 bytes or fewer are to be sent, a special LPC buffer is used and sent together with the header (its size being described by DataLength and TotalLength), while larger messages are passed using shared memory sections, mapped in both parties taking part in the data exchange.

LPC API

Because LPC is an internal, undocumented mechanism (mostly employed by the system executables), one cannot make use of it based on the win32 API alone. However, a set of LPC-management native routines is exported by the ntdll module; using these functions, one is able to build a custom LPC-based protocol and use it to one’s own advantage (e.g. as a fast and convenient IPC technique). A complete list of the native calls follows:

  1. NtCreatePort
  2. NtConnectPort
  3. NtListenPort
  4. NtAcceptConnectPort
  5. NtCompleteConnectPort
  6. NtRequestPort
  7. NtRequestWaitReplyPort
  8. NtReplyPort
  9. NtReplyWaitReplyPort
  10. NtReplyWaitReceivePort
  11. NtImpersonateClientOfPort
  12. NtSecureConnectPort

The above list roughly corresponds to the cross-reference table for _LpcPortObjectType (excluding NtQueryInformationPort, NtRegisterThreadTerminatePort and a couple of other routines). All of the functions are more or less documented by independent researchers, Tomasz Nowak and Bo Branten – a brief description of each export is available on the net, though most of the symbols speak for themselves anyway. Having the function names, let’s take a look at how the functions can actually be used!

Server – Setting up a port

In order to make the server reachable for client modules, it must create a named port by calling NtCreatePort (specifying the object’s name and an optional security descriptor):

NTSTATUS
NTAPI
NtCreatePort(
    OUT PHANDLE PortHandle,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN ULONG MaxConnectInfoLength,
    IN ULONG MaxDataLength,
    IN OUT PULONG Reserved OPTIONAL);

When the LPC port is successfully created, it becomes visible to other, external modules – potential clients.

Server – Port Listening

In order to accept an inbound connection, the server starts listening on the newly created port, awaiting clients. This is achieved using the NtListenPort routine, defined as follows:

NTSTATUS
NTAPI
NtListenPort(
    IN HANDLE PortHandle,
    OUT PLPC_MESSAGE ConnectionRequest);

Being dedicated to the synchronous approach, the function blocks the thread and waits until someone tries to make use of the port. And so, while the server is waiting, some client eventually tries to connect…

Client – Connecting to a Port

Knowing that the port has already been created and is currently waiting (residing inside NtListenPort), our client process is able to connect, specifying the port name used during the creation process. The following function will take care of the rest:

NTSTATUS
NTAPI
NtConnectPort(
    OUT PHANDLE ClientPortHandle,
    IN PUNICODE_STRING ServerPortName,
    IN PSECURITY_QUALITY_OF_SERVICE SecurityQos,
    IN OUT PLPC_SECTION_OWNER_MEMORY ClientSharedMemory OPTIONAL,
    OUT PLPC_SECTION_MEMORY ServerSharedMemory OPTIONAL,
    OUT PULONG MaximumMessageLength OPTIONAL,
    IN PVOID ConnectionInfo OPTIONAL,
    IN PULONG ConnectionInfoLength OPTIONAL);

Server – Accepting (or not) the connection

When a client tries to connect at one side of the port, the server’s execution returns from NtListenPort, having the PORT_MESSAGE header filled with information. In particular, the server can access a CLIENT_ID structure, identifying the source process/thread. Based on that data, the server can make the final decision whether to allow or refuse the connection. Whatever option is chosen, the server calls the NtAcceptConnectPort function:

NTSTATUS
NTAPI
NtAcceptConnectPort(
    OUT PHANDLE ServerPortHandle,
    IN HANDLE AlternativeReceivePortHandle OPTIONAL,
    IN PLPC_MESSAGE ConnectionReply,
    IN BOOLEAN AcceptConnection,
    IN OUT PLPC_SECTION_OWNER_MEMORY ServerSharedMemory OPTIONAL,
    OUT PLPC_SECTION_MEMORY ClientSharedMemory OPTIONAL);

In case of a rejection, the execution ends here. The client returns from the NtConnectPort call with an adequate error code (most likely STATUS_PORT_CONNECTION_REFUSED), and the server ends up calling NtListenPort again. If, however, the server decides to proceed with the connection, another routine must be called:

NTSTATUS
NTAPI
NtCompleteConnectPort(
    IN HANDLE PortHandle);

After the above function is called, our connection is confirmed and ready to go!
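The order of the calls described so far can be summarised as a tiny state machine. The sketch below is a portable model only and does not touch any real port object; all MODEL_ and model_ names are hypothetical, and the refusal value mirrors STATUS_PORT_CONNECTION_REFUSED (0xC0000041).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_STATUS_SUCCESS                 0x00000000u
#define MODEL_STATUS_UNSUCCESSFUL            0xC0000001u
#define MODEL_STATUS_PORT_CONNECTION_REFUSED 0xC0000041u

typedef enum {
    PORT_LISTENING,        /* server blocked inside NtListenPort */
    PORT_REQUEST_PENDING,  /* client called NtConnectPort, server woke up */
    PORT_ACCEPTED,         /* NtAcceptConnectPort with AcceptConnection = TRUE */
    PORT_CONNECTED         /* NtCompleteConnectPort finished */
} MODEL_PORT_STATE;

typedef struct {
    MODEL_PORT_STATE state;
    uint32_t client_pid;   /* taken from the request's CLIENT_ID */
} MODEL_PORT;

void model_listen(MODEL_PORT *p)
{
    p->state = PORT_LISTENING;
    p->client_pid = 0;
}

uint32_t model_connect_request(MODEL_PORT *p, uint32_t pid)
{
    if (p->state != PORT_LISTENING)
        return MODEL_STATUS_UNSUCCESSFUL;
    p->client_pid = pid;
    p->state = PORT_REQUEST_PENDING;
    return MODEL_STATUS_SUCCESS;
}

/* Returns the status the client would observe from its connect call. */
uint32_t model_accept(MODEL_PORT *p, int accept)
{
    if (p->state != PORT_REQUEST_PENDING)
        return MODEL_STATUS_UNSUCCESSFUL;
    if (!accept) {
        model_listen(p);   /* the server goes back to listening */
        return MODEL_STATUS_PORT_CONNECTION_REFUSED;
    }
    p->state = PORT_ACCEPTED;
    return MODEL_STATUS_SUCCESS;
}

uint32_t model_complete(MODEL_PORT *p)
{
    if (p->state != PORT_ACCEPTED)
        return MODEL_STATUS_UNSUCCESSFUL;
    p->state = PORT_CONNECTED;
    return MODEL_STATUS_SUCCESS;
}
```

The point of the model is that completing a connection is only valid after an accepting call, while a refusal returns the server to the listening state.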

Server – Waiting for a message

After opening up a communication channel, the server must begin listening for incoming packets (or client-related events). Because of the specific nature of LPC, the server is unable to send messages by itself – rather than that, it must wait for the client to send a request, and then possibly respond with a piece of data. And so, in order to (as always – synchronously) await a message, the server should call the following function:

NTSTATUS
NTAPI
NtReplyWaitReceivePort(
    IN HANDLE PortHandle,
    OUT PHANDLE ReceivePortHandle OPTIONAL,
    IN PLPC_MESSAGE Reply OPTIONAL,
    OUT PLPC_MESSAGE IncomingRequest);

Client – Sending a message

Having the connection established, our client is now able to send regular messages at the time of its choice. Moreover, the application can choose between one-way packets and interactive requests. By sending the first type of message, the client does not expect the server to reply – most likely, it is a short, informational packet. On the other hand, interactive messages require the server to fill in a return buffer of a given size. These two packet types can be sent using different native calls:

NTSTATUS
NTAPI
NtRequestPort(
    IN HANDLE PortHandle,
    IN PLPC_MESSAGE Request);

or

NTSTATUS
NTAPI
NtRequestWaitReplyPort(
    IN HANDLE PortHandle,
    IN PLPC_MESSAGE Request,
    OUT PLPC_MESSAGE IncomingReply);

The difference between these two definitions is pretty much obvious :-)

Server – Replying to incoming packets

In case the client requests data from the server, the latter is obligated to respond providing some output data. In order to do so, the following function should be used:

NTSTATUS
NTAPI
NtReplyPort(
    IN HANDLE PortHandle,
    IN PLPC_MESSAGE Reply);

Client – Closing the connection

When, eventually, the client either terminates or decides to close the LPC connection, it can clean up the connection by simply dereferencing the port object – the NtClose (or better, documented CloseHandle) native call can be used:

NTSTATUS
NTAPI
NtClose(
    IN HANDLE ObjectHandle);

The entire IPC process has already been presented in a visual form – some very illustrative flow charts can be found here (LPC Communication) and here (LPC Part 1: Architecture). All of the described functions are actually used while maintaining the CSRSS connection – you can check it by yourself! What should be noted though, is that the above summary covers the LPC communication (which can be already used to create an IPC framework), but tells nothing about what data, in particular, is being sent over the named port. Obviously, the Windows Subsystem manages its own, internal communication protocol implemented by both client-side (ntdll.dll) and server-side (csrsrv.dll, winsrv.dll, basesrv.dll) system libraries. In order to make it more convenient for kernel32.dll to make use of the CSR packets, a special subset of routines dedicated to CSRSS-communication exists in ntdll.dll. The list of these functions includes, but is not limited to:

  1. CsrClientCallServer
  2. CsrClientConnectToServer
  3. CsrGetProcessId
  4. CsrpConnectToServer

Thanks to the above symbols, it is possible for kernel32.dll (and most importantly – us) to send custom messages on behalf of the current process, without a thorough knowledge of the protocol structure. Furthermore, ntdll.dll contains all the necessary technical information required while talking to CSRSS, such as the port name to connect to. The next post is going to cover both the client and server sides of the LPC initialization and usage, as performed in practice – watch out :)

Conclusion

winobj.gif

All in all, a great number of internal Windows mechanisms make use of LPC – both low-level ones, such as the Windows debugging facility or parts of the exception handling implementation, as well as high-level capabilities, including user credentials verification performed by LSASS. One can list all of the named (A)LPC port objects present in the system using the WinObj tool by Windows Sysinternals. It is also highly recommended to create one’s own implementation of an LPC-based inter-process communication protocol – a very instructive experience. Example source code can be found in the following package: link. Have fun, leave comments and stay tuned for the next entries ;D

References

  1. LPC Communication
  2. Local Procedure Calls (LPCs)
  3. LPC Part 1: Architecture
  4. Sysinternals WinObj
  5. Windows Privilege Escalation through LPC
  6. Ntdebugging on LPC interface

Windows CSRSS Write Up: Inter-process Communication (part 2/3)

A quick beginning note: My friend d0c_s4vage has created a technical blog and posted his first text just a few days ago. The post entry covers a recent, critical libpng vulnerability discovered by this guy; the interesting thing is that, among others, the latest Firefox and Chrome versions were vulnerable. Feel free to take a minute and read the article here. Additionally, the video and mp3 recordings from the presentation performed by me and Gynvael on the CONFidence 2010 conference, are now publicly available on the official website: link (Case study of recent Windows vulnerabilities).

Foreword

Most of the LPC basics (LPC supposedly being an acronym for Local Inter-Process Communication rather than Local Procedure Calls, as stated in WRK) have been described in the first post of the Inter-process Communication chapter, together with the corresponding, undocumented native functions related to LPC Ports. As you already have the knowledge required to understand higher abstraction levels, today I would like to shed some light on the internal Csr~ interface provided by NTDLL and extensively utilized by the Win32 API DLLs (kernel32 and user32).

API_levels.png

Introduction

As explained previously, LPC is an (officially) undocumented, packet-based IPC mechanism. It basically relies on two things – a Port Object and internal LPC structures, such as _PORT_MESSAGE – both unexposed to the Windows API layer. Because CSRSS implements its own protocol on top of LPC, it would be highly inconvenient (and impractical) for the win32 libraries to take care of both LPC and CSRSS internals at the same time. And so, an additional layer between the port-related functions and the high-level API was created – let’s call it the Native Csr Interface. This middle level of the call chain provides a set of helper functions, specifically designed to hide the internals of the communication channel from the high-level API implementation. Therefore, it should be theoretically possible to re-implement the Csr-Interface using a different communication mechanism with similar properties, without any alterations being applied on the API level. This has been partially accomplished by replacing the deprecated LPC with an improved version of the mechanism – Advanced / Asynchronous LPC on modern NT-family systems (Vista, 7).

This post focuses on the precise meaning, functionality and definitions of the crucial Csr~ routines. After reading the article, one should be able to recognize and understand specific CSR API calls found inside numerous documented functions related to console management, process / thread creation and others.

Connection Initialization

What has already been mentioned is the fact that every application belonging to the win32-subsystem is connected to the Windows Subsystem process (CSRSS) at its startup, by default. Although it is technically possible to disconnect from the port before the program is properly terminated, such behavior is beyond the scope of this post entry. However, some details regarding a security flaw related to CSRSS-port disconnection in the context of a live process, can be found here and here (discovered by me and Gynvael). From this point on, it will be assumed that when the process is given execution (i.e. Entry Point, imported module’s DllMain or TLS callback is called), the CSRSS connection is already established. And so, the question is – how, and where the connection is set up during the process initialization. This section provides answers for both of these questions.

Opening named LPC port

During process creation, numerous parts of the system come into play and perform their part of the job. It all starts with the parent application calling an API function (CreateProcess) – the execution then goes through the kernel, a local win32 subsystem, and finally – ring-3 process self-initialization (performed by the system libraries). A step-by-step explanation of the Windows process creation can be found in the Windows Internals 5 book, Chapter “Processes, Threads and Jobs”. As the CSRSS connection is not technically crucial for the process to exist (and execute), it can be performed later than other parts of the process initialization. And so, the story of establishing a connection with the subsystem begins in the context of a newly-created program – more precisely, inside the kernel32 entry point (kernel32!BaseDllInitialize). At this point, the CSRSS-related part of the routine performs the following call:

BOOL WINAPI _BaseDllInitialize(HINSTANCE, DWORD, LPVOID)
{
    (...)

    CsrClientConnectToServer(L"\\Windows",BASESRV_INDEX,...);

    (...)
}

thus forwarding the execution to the ntdll.dll module, where a majority of the subsystem-related activities are performed. Before we dive into the next routine, two important things should be noted here:

  1. The Base Dll (kernel32) has complete control over the Port Object directory and makes the final decision regarding the referenced port’s name prefix. As it turns out, it is also possible for a different Object Directory to be used – let’s take a look at the following pseudo-code listing:
    if(SessionId)
        swprintf(ObjectDirectory,L"%ws\\%ld%ws",L"\\Sessions",SessionId,L"\\Windows");
    else
        wcscpy(ObjectDirectory,L"\\Windows");
    The “SessionId” symbol represents a global DWORD variable, initialized inside the BaseDllInitialize function as well:
    mov eax, large fs:18h
    mov eax, [eax+30h]
    mov eax, [eax+1D4h]
    mov _SessionId, eax
    … which translates to the following high-level pseudo-code (fs:18h holds the TEB pointer, and offset 30h inside the TEB points to the PEB):
    SessionId = NtCurrentTeb()->ProcessEnvironmentBlock->SessionId;
    If one takes a look at the PEB structure definition, the field can indeed be found there:
    kd> dt _PEB
    nt!_PEB
    (...)
    +0x154 TlsExpansionBitmapBits : [32] Uint4B
    +0x1d4 SessionId : Uint4B
    +0x1d8 AppCompatFlags : _ULARGE_INTEGER
    (...)
  2. If one decides to connect to the win32 subsystem, one must specify a particular ServerDll to connect to (csrsrv, basesrv, winsrv); the identification number is passed as the second argument of CsrClientConnectToServer. As can be seen, kernel32 specifies the BASESRV_INDEX constant, as it wants to connect to a certain module – basesrv in this case. Basesrv.dll is the kernel32 equivalent on the subsystem side – a Csr connection between these two modules is required for some of the basic win32 API calls to work properly. On the other hand, all of the console-management functionality is implemented by winsrv (to be exact – the consrv part of the module). And so, in order to take advantage of functions such as AllocConsole, FreeConsole, SetConsoleTitle or WriteConsole, a valid connection with winsrv is also required. Fortunately, kernel32 takes care of that and issues a call to another internal function – ConDllInitialize() – after the LPC Port connection is successfully established. The routine’s purpose is to set up the console-related structures inside the Base dll image, and to call CsrClientConnectToServer with the second argument set to CONSRV_INDEX.
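The directory selection from the first item above can be reproduced in portable C. This is a sketch under stated assumptions: narrow strings and snprintf replace the wide-character calls of the original, and model_object_directory is a hypothetical helper name, not a kernel32 symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the object directory prefix for a given session.
 * Session 0 uses "\Windows"; any other session uses "\Sessions\<id>\Windows".
 * Returns 0 on success, -1 if the output buffer is too small. */
int model_object_directory(char *out, size_t cap, unsigned long session_id)
{
    int n;

    if (session_id != 0)
        n = snprintf(out, cap, "\\Sessions\\%lu\\Windows", session_id);
    else
        n = snprintf(out, cap, "\\Windows");

    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

For session 1 the helper yields \Sessions\1\Windows, to which the port name itself is appended later on.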

When we step into CsrClientConnectToServer and analyze further, a great amount of CSRSS-related initialization code surrounds us. Don’t worry – a huge part of the routine deals with user-mode structures and other irrelevant stuff – our interest begins where the following call is made:

if(!CsrPortHandle)
{
    ReturnCode = CsrpConnectToServer(ObjectDirectory); // ObjectDirectory is kernel32-controlled
    if(!NT_SUCCESS(ReturnCode))
        return (ReturnCode);
}

As the above indicates, the global CsrPortHandle variable is compared with zero – if this turns out to be true, CsrpConnectToServer is called, taking the object directory string as its only argument. So – let’s face another routine ;> The procedure starts with the following code:

CsrPortName.Length = 0;
CsrPortName.MaximumLength = 2*wcslen(ObjectDirectory)+18;
CsrPortName.Buffer = RtlAllocateHeap(CsrHeap,NtdllBaseTag,CsrPortName.MaximumLength);

RtlAppendUnicodeToString(&CsrPortName,ObjectDirectory);
RtlAppendUnicodeToString(&CsrPortName,L"\\");
RtlAppendUnicodeToString(&CsrPortName,L"ApiPort");
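The constant 18 in the listing is the byte size of the "\ApiPort" suffix (8 UTF-16 characters) plus a terminating null. A small model of the counted-string bookkeeping makes the arithmetic easy to check. The sketch assumes a simplified stand-in for UNICODE_STRING with 16-bit characters; model_append widens ASCII input and is a hypothetical helper, not the real RtlAppendUnicodeToString.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t  Length;         /* bytes in use, terminator excluded */
    uint16_t  MaximumLength;  /* bytes available in Buffer */
    uint16_t *Buffer;
} MODEL_UNICODE_STRING;

/* Appends an ASCII string as 16-bit characters; fails if it does not fit. */
int model_append(MODEL_UNICODE_STRING *dst, const char *src)
{
    size_t bytes = strlen(src) * sizeof(uint16_t);
    size_t pos = dst->Length / sizeof(uint16_t);
    size_t i;

    if (dst->Length + bytes > dst->MaximumLength)
        return -1;

    for (i = 0; src[i] != '\0'; i++)
        dst->Buffer[pos + i] = (uint16_t)(unsigned char)src[i];
    dst->Length = (uint16_t)(dst->Length + bytes);

    /* Null-terminate only when there is room left, as counted strings allow. */
    if (dst->Length + sizeof(uint16_t) <= dst->MaximumLength)
        dst->Buffer[dst->Length / sizeof(uint16_t)] = 0;
    return 0;
}

/* Forms "<dir>\ApiPort" in caller-provided storage of at least
 * 2*strlen(dir)+18 bytes. Returns 0 on success. */
int model_build_port_name(MODEL_UNICODE_STRING *name, uint16_t *storage,
                          const char *dir)
{
    name->Length = 0;
    name->MaximumLength = (uint16_t)(2 * strlen(dir) + 18);
    name->Buffer = storage;

    if (model_append(name, dir) != 0 ||
        model_append(name, "\\") != 0 ||
        model_append(name, "ApiPort") != 0)
        return -1;
    return 0;
}
```

For the "\Windows" directory this gives MaximumLength = 34 and Length = 32, leaving exactly two bytes for the terminator.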

Apparently, the final Port Object name is formed here, and stored inside a local “UNICODE_STRING CsrPortName” structure. Next, a special section is created, using an adequate native call:

LARGE_INTEGER SectionSize;
SectionSize.QuadPart = 0x10000;
NtStatus = NtCreateSection(&SectionHandle, SECTION_ALL_ACCESS, NULL, &SectionSize, PAGE_READWRITE, SEC_RESERVE, NULL);

if(!NT_SUCCESS(NtStatus))
    return NtStatus;

This section is essential to the process<->subsystem communication, as this memory area is mapped in both the client and the win32 server, and then used for exchanging large portions of data between these two parties. And so, when the section is successfully created, the routine eventually tries to connect to the named port!

/* SID Initialization */
NtStatus = RtlAllocateAndInitializeSid(...,&SystemSid);
if(!NT_SUCCESS(NtStatus))
    return NtStatus;

NtStatus = NtSecureConnectPort(&CsrPortHandle,&CsrPortName,...);
RtlFreeSid(SystemSid);
NtClose(SectionHandle); // NtClose takes the handle itself, not its address

For the sake of simplicity and reading convenience, I’ve stripped the remaining arguments from the listing; they describe some advanced connection characteristics, and are beyond the scope of this post. If everything is fine up to this point, we have an established connection (yay, CSRSS accepted our request) and an open handle to the port. Therefore, we can start sending the first packets, in order to let CSRSS (and its modules – ServerDlls) know about ourselves. So – after returning back to ntdll!CsrClientConnectToServer:

NtStatus = CsrpConnectToServer(ObjectDirectory);
if(!NT_SUCCESS(NtStatus))
    return NtStatus;

the following steps are taken:

if(ConnectionInformation)
{
    CaptureBuffer = CsrAllocateCaptureBuffer(1,InformationLength);
    CsrAllocateMessagePointer(CaptureBuffer,InformationLength,&conn.ConnectionInformation);
    RtlMoveMemory(conn.ConnectionInformation,ConnectionInformation,InformationLength);
}
CsrClientCallServer(&Message, CaptureBuffer, CSR_API(CsrpClientConnect), sizeof(ConnStructure));

First of all, the ConnectionInformation pointer is checked – in case it’s non-zero, the CsrAllocateCaptureBuffer, CsrAllocateMessagePointer and RtlMoveMemory functions are called, in that order. The purpose of these operations is to move the data into a shared heap in such a way that both our application and CSRSS can easily read its contents. After the “if” statement, the first real message is sent to the subsystem using CsrClientCallServer, which has the following prototype:

NTSTATUS CsrClientCallServer(PCSR_API_MSG m, PCSR_CAPTURE_HEADER CaptureHeader, CSR_API_NUMBER ApiNumber, ULONG ArgLength);

For a complete, cross-version compatible table and/or list of Csr APIs, check the following references: CsrApi List and CsrApi Table. And so, in the above snippet, the “CsrpClientConnect” API is used, providing additional information about the connecting process. This message is handled by an internal csrsrv.CsrSrvClientConnect routine, which redirects the message to an adequate callback function, specified by the ServerDll being connected to (in this case – basesrv!BaseClientConnectRoutine). After sending the above message, the connection between the client- and server-side DLLs (i.e. kernel32 and basesrv) can be considered fully functional.

As it turns out, parts of the execution path presented above can also be true for CSRSS itself! Because ntdll!CsrClientConnectToServer can be reached from inside the subsystem process, the CsrClientConnectToServer routine must handle such a case properly. And so, before any actions are actually taken by the function, the current process instance is checked first:

NtHeaders = RtlImageNtHeader(NtCurrentPeb()->ImageBaseAddress);
CsrServerProcess = (NtHeaders->OptionalHeader.Subsystem == IMAGE_SUBSYSTEM_NATIVE);

if(!CsrServerProcess)
{
    // Regular client: take normal steps
}
else
{
    // Running inside CSRSS: do nothing, except for the _CsrServerApiRoutine pointer initialization
    _CsrServerApiRoutine = GetProcAddress(GetModuleHandle("csrsrv"),"CsrCallServerFromServer");
}

Apparently, every process connecting to the LPC Port that has the SUBSYSTEM_NATIVE header value set, is assumed to be an instance of CSRSS. This, in turn, implies that CSRSS is the only native, system-critical process which makes use of the Csr API calls.

Data transmission

Having the connection up and running, a natural order of things is to exchange actual data. In order to achieve this, one native call is exported by ntdll – the CsrClientCallServer function, already mentioned in the text. Because of the fact that each Csr API requires a different amount of input/output data (while some don’t need these, at all) from the requestor, as well as due to the LPC packet-length limitations, the messages can be sent in a few, different ways. In general, all of the CSR-supported packets can be divided into three, main groups: empty, short, and long packets. Based on the group a given packet belongs to, it is sent using an adequate mechanism. This section provides a general overview of the data transmission-related techniques, as well as exemplary (practical) use of each type.

Empty packets

  • Description
    “Empty packets” form a relatively small group of purely informational messages, which are intended to make CSRSS perform a specific action. These packets don’t supply any input data – their API ID is the only information needed by the win32 subsystem. Truly empty packets don’t generate any output data, either.

  • Sending
    Because “empty packets” don’t supply any additional information, the only data being transferred is the internal _PORT_MESSAGE structure. The address of a correctly initialized header should then be passed as the first CsrClientCallServer parameter. The shared section doesn’t take part in sending and handling these packets. What is more, no serious input validation is required by the API handler, because there is no input in the first place. The routine is most often supposed to perform one specific action and then return. Unsupported APIs, statically returning the STATUS_UNSUCCESSFUL or STATUS_NOT_SUPPORTED error codes, can also be considered “empty packets”, as they always behave the same way, regardless of the input information.

  • Examples One, great example of an empty-packet is winsrv!SrvCancelShutdown. As the name implies, the APIs purpose is pretty straight-forward – cancelling the shutdown. Seemingly, no input / output arguments are necessary: view sourceprint?
    01.; __stdcall SrvCancelShutdown(x, x)
    02._SrvCancelShutdown@8 proc near
    03.call _CancelExitWindows@0 ; CancelExitWindows()
    04.neg eax
    05.sbb eax, eax
    06.and eax, 3FFFFFFFh
    07.add eax, 0C0000001h
    08.retn 8
    09._SrvCancelShutdown@8 endp
    As shown above, the handler issues a call to the CancelExitWindows() function and uses neither of its two parameters. Another CsrApi function of this kind is basesrv!BaseSrvNlsUpdateCacheCount, which always performs the same task:
    ; __stdcall BaseSrvNlsUpdateCacheCount(x, x)
    _BaseSrvNlsUpdateCacheCount@8 proc near
        cmp     _pNlsRegUserInfo, 0
        jz      short loc_75B28AFC
        push    esi
        mov     esi, offset _NlsCacheCriticalSection
        push    esi
        call    ds:__imp__RtlEnterCriticalSection@4 ; RtlEnterCriticalSection(x)
        mov     eax, _pNlsRegUserInfo
        inc     dword ptr [eax+186Ch]   ; increment a counter inside *pNlsRegUserInfo under the lock
        push    esi
        call    ds:__imp__RtlLeaveCriticalSection@4 ; RtlLeaveCriticalSection(x)
        pop     esi
    loc_75B28AFC:
        xor     eax, eax                ; return 0 (STATUS_SUCCESS)
        retn    8
    _BaseSrvNlsUpdateCacheCount@8 endp
    A few more examples can be found – looking for these is left as an exercise for the reader.
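The neg / sbb / and / add sequence in SrvCancelShutdown is a branchless way of turning a BOOL-like result into an NTSTATUS. Below is a minimal C sketch of that arithmetic, assuming 32-bit unsigned wrap-around. The helper name bool_to_ntstatus is made up for illustration; the two status constants are the standard NT values.

```c
#include <stdint.h>

#define STATUS_SUCCESS       0x00000000u
#define STATUS_UNSUCCESSFUL  0xC0000001u

/* Hypothetical helper mirroring the instruction sequence step by step. */
uint32_t bool_to_ntstatus(uint32_t result)
{
    /* neg eax: the carry flag is set iff eax was non-zero */
    uint32_t carry = (result != 0) ? 1u : 0u;

    /* sbb eax, eax: eax = eax - eax - CF, i.e. 0 or 0xFFFFFFFF */
    uint32_t eax = 0u - carry;

    /* and eax, 3FFFFFFFh: 0 stays 0, 0xFFFFFFFF becomes 0x3FFFFFFF */
    eax &= 0x3FFFFFFFu;

    /* add eax, 0C0000001h: 0x3FFFFFFF wraps around to 0, 0 becomes 0xC0000001 */
    eax += 0xC0000001u;

    return eax;
}
```

In other words, the handler behaves like `return CancelExitWindows() ? STATUS_SUCCESS : STATUS_UNSUCCESSFUL;` without using a conditional jump.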

Short packets

  • Description The “short packets” group covers a great part of the Csr messages. Every request that passes actual data to or from CSRSS, yet fits within the LPC packet-length restriction, belongs to this family. Most fixed-size structures (i.e. those that don’t contain variable-length text strings or other possibly long chunks of data) are indeed smaller than the 304-byte limit.

  • Sending As this type requires additional data to be appended after the _PORT_MESSAGE structure, a set of API-specific structs has been created. All of these types begin with the standard LPC PortMessage header, followed by the actual variables to send, e.g.:
    struct CSR_MY_STRUCTURE
    {
        struct _PORT_HEADER PortHeader;   /* standard LPC header, always first */
        BOOL Boolean;
        ULONG Data[0x10];
        DWORD Flags;
    };
    This amount of data can still be sent in a single LPC packet. A custom structure beginning with the _PORT_HEADER field must therefore be used as the first CsrClientCallServer argument. The Capture Buffer technique remains unused, so the second parameter should be set to NULL.

  • Examples As for the examples, it is really easy to list a couple:
    1. winsrv!SrvGetConsoleAliasExesLength
    2. winsrv!SrvSetConsoleCursorMode
    3. winsrv!SrvGetConsoleCharType
    4. basesrv!BaseSrvExitProcess
    5. basesrv!BaseSrvBatNotification

    The above handlers take a constant number of bytes as the input, and optionally return some data (of static length, as well).
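The short/long distinction boils down to a size check on the bytes that follow the header. The sketch below illustrates it under stated assumptions: the header type is a placeholder whose layout is invented (it is not the real LPC header), the 304-byte figure is simply the limit quoted above, and fits_in_short_packet is a hypothetical helper, not an ntdll export.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit quoted in the text above; treated here as an assumption. */
#define LPC_MAX_DATA_LENGTH 304u

/* Placeholder standing in for the real LPC header; layout is illustrative only. */
struct PORT_HEADER_PLACEHOLDER {
    uint16_t DataLength;
    uint16_t TotalLength;
    uint32_t Reserved[5];
};

/* The article's example message: header first, API-specific fields after. */
struct CSR_MY_STRUCTURE {
    struct PORT_HEADER_PLACEHOLDER PortHeader;
    int32_t  Boolean;
    uint32_t Data[0x10];
    uint32_t Flags;
};

/* Number of bytes following the header, i.e. sizeof(m) - sizeof(header). */
#define CSR_PAYLOAD_SIZE(type) \
    (sizeof(type) - sizeof(struct PORT_HEADER_PLACEHOLDER))

/* Returns 1 if the payload can travel in a single packet (short packet),
   0 if it would need a Capture Buffer (long packet). */
int fits_in_short_packet(size_t payload_size)
{
    return payload_size <= LPC_MAX_DATA_LENGTH;
}
```

With this layout the example structure carries 72 bytes of payload (4 + 64 + 4), comfortably below the limit, which is why it can be sent without touching the shared section.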

Long packets

  • Description From the researcher’s point of view, the “long packets” group is doubtlessly the most interesting one. Because they are used to send and receive large amounts of data (beyond the maximum size of an LPC message), a special mechanism called a Shared Section is used for transferring these messages. Let’s take a look at the details.

  • Initialization Do you remember the ntdll!CsrpConnectToServer function? At some point, between forming the port name and establishing the connection, we could see a weird NtCreateSection(0x10000) call. As it turns out, this section is a special memory area, mapped in both the client and server processes. After creating the section, its handle is passed to CSRSS through the NtSecureConnectPort native call. Once the win32 subsystem receives a connection request and accepts it, the section is mapped into the server’s virtual address space. CSRSS then provides its client with some basic memory mapping information, such as the server-side base address and view size. Based on the supplied info, a few global variables are initialized (CsrProcessId, CsrObjectDirectory), with CsrPortMemoryRemoteDelta being the most important one for us:
    CsrPortMemoryRemoteDelta = (CSRSS.BaseAddress - LOCAL.BaseAddress);
    Basically, the above variable is filled with the distance between the server and client mappings of the shared memory. This value will soon turn out to be crucial for exchanging information. Furthermore, a commonly known structure called a “heap” is created on top of the allocation:
    CsrPortHeap = RtlCreateHeap(0x8000u, LOCAL.BaseAddress, LOCAL.ViewSize, PageSize, 0, 0);
    From this point on, the shared heap is used throughout the whole communication session for passing data of various size and content. The functions taking advantage of the heap are:
    1. CsrAllocateCaptureBuffer
    2. CsrFreeCaptureBuffer
    3. CsrAllocateMessagePointer (indirect)
    4. CsrCaptureMessageBuffer (indirect)
    5. CsrCaptureMessageString (indirect)

    All of the above routines are related to the “Capture Buffer” mechanism, described in the following section.
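The role of CsrPortMemoryRemoteDelta can be illustrated with plain integer arithmetic. This is only a sketch: the shared_view structure and both helper names are hypothetical, and the addresses used with it are arbitrary sample values, not ones observed on a real system.

```c
#include <stdint.h>

/* Hypothetical description of one shared section as seen by both sides. */
struct shared_view {
    uintptr_t local_base;   /* where the client mapped the section */
    uintptr_t server_base;  /* base address reported back by CSRSS */
};

/* Mirrors CsrPortMemoryRemoteDelta = CSRSS.BaseAddress - LOCAL.BaseAddress.
   Unsigned arithmetic wraps, so the result stays usable even when the
   server mapping sits below the client one. */
uintptr_t remote_delta(const struct shared_view *v)
{
    return v->server_base - v->local_base;
}

/* Client address inside the view -> address of the same byte in the server. */
uintptr_t to_server_address(const struct shared_view *v, uintptr_t local_addr)
{
    return local_addr + remote_delta(v);
}
```

Because both processes map the same section, a byte at offset N from the local base is the byte at offset N from the server base; adding the delta is just a shortcut for "subtract my base, add yours".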

    • Capture Buffers In order to fully understand the idea behind Capture Buffers, one should see it as a special box, a container designed to hold data in such a way, that it can be easily accessed by both sides of the communication (i.e. be offset-based rather than VA-based etc). Such structure is determined by the following characteristics:
      1. Number of memory blocks: one Capture Buffer is able to hold multiple data blocks, e.g. a couple of strings describing a specific object (like a console window).
      2. Total size: the total size of the container, including its header, pointer table, and the data blocks themselves.

      So these “data boxes” are used to transfer data between the two parties. To illustrate this complex mechanism, suppose we’ve got the following structure:

      struct CSR_MESSAGE
      {
          _PORT_HEADER PortHeader;
          LPVOID FirstPointer;
          LPVOID SecondPointer;
          LPVOID ThirdPointer;
          LPVOID FourthPointer;
          LPVOID FifthPointer;
      } m;

      The above packet is going to be sent to CSRSS after the initialization takes place. With that declared, we can take a closer look at each of the Capture Buffer-related functions:

      1. CsrAllocateCaptureBuffer(ULONG PointerCount, ULONG Size) Allocates an adequate number of bytes from CsrPortHeap: (Size + sizeof(CAPTURE_HEADER) + PointerCount*sizeof(LPVOID)) and returns the resulting pointer to the user. Right after the allocation CaptureBuffer = CsrAllocateCaptureBuffer(5,20); the CaptureBuffer structure contents look like this: [figure: CaptureBuffer.png]
        Because no messages have been allocated from the CaptureBuffer yet, Capture.Memory is a single memory block, while the Capture.Pointers[] array remains empty.
      2. CsrFreeCaptureBuffer(LPVOID CaptureBuffer) Frees a given CaptureBuffer memory area by issuing a simple call:
        RtlFreeHeap(CsrPortHeap,0,CaptureBuffer);
      3. CsrAllocateMessagePointer(LPVOID CaptureBuffer, ULONG Length, PVOID* Pointer) The routine allocates “Length” bytes from the CaptureBuffer’s general memory block. The address of the newly allocated block is stored inside *Pointer, while Pointer itself is put into one of the Capture.Pointers[] items. Example:
        CsrAllocateMessagePointer(CaptureBuffer,3,&m.FirstPointer);
        [figure: CaptureBuffer2.png]
      4. Having three (out of twenty) bytes allocated, one can copy some data:
        RtlCopyMemory(m.FirstPointer,"\xcc\xcc\xcc",3);
        After all five allocations are made, the CaptureBuffer structure layout can look like this:
        [figure: CaptureBuffer3.png]
        It is important to keep in mind that the pointers into CaptureBuffer.Memory[] must reside in the actual LPC message being sent to the server. The reason for this requirement will be disclosed soon :-)
      5. CsrCaptureMessageBuffer(LPVOID CaptureBuffer, PVOID Buffer, ULONG Length, PVOID *OutputBuffer) The routine is intended to simplify things for the developer, by performing the CaptureBuffer allocation and copying the user-specified data at the same time. Pseudocode:
        CsrAllocateMessagePointer(CaptureBuffer,Length,OutputBuffer);
        RtlCopyMemory(*OutputBuffer,Buffer,Length);
      6. CsrCaptureMessageString(LPVOID CaptureBuffer, PCSTR String, ULONG Length, ULONG MaximumLength, PSTRING OutputString) Similar to the previous routine – allocates the requested memory space, and optionally copies a specific string into the new allocation.
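The bookkeeping performed by the routines above can be modelled in a few lines of portable C. This is a simplified sketch, not the ntdll implementation: the capture_buffer structure, its field names and the three capture_* functions are hypothetical, malloc stands in for the shared heap, and any alignment of individual blocks is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of a Capture Buffer; the real header
   layout in ntdll differs. */
struct capture_buffer {
    size_t    total_size;     /* header + pointer table + data area */
    size_t    max_pointers;   /* capacity of the pointer table */
    size_t    used_pointers;  /* entries filled in so far */
    uint8_t  *free_space;     /* next unused byte of the data area */
    uint8_t  *end;            /* one past the last byte of the data area */
    void   ***pointers;       /* table of addresses of message fields */
};

/* Counterpart of CsrAllocateCaptureBuffer: one block holding the header,
   the pointer table and `size` bytes of data. */
struct capture_buffer *capture_alloc(size_t pointer_count, size_t size)
{
    size_t total = sizeof(struct capture_buffer)
                 + pointer_count * sizeof(void **)
                 + size;
    struct capture_buffer *cb = malloc(total);
    if (cb == NULL)
        return NULL;
    cb->total_size    = total;
    cb->max_pointers  = pointer_count;
    cb->used_pointers = 0;
    cb->pointers      = (void ***)(cb + 1);
    cb->free_space    = (uint8_t *)(cb->pointers + pointer_count);
    cb->end           = cb->free_space + size;
    return cb;
}

/* Counterpart of CsrAllocateMessagePointer: carves `length` bytes out of
   the data area, stores their address in *field and remembers where that
   field lives. Returns 0 on success, -1 if the buffer is exhausted. */
int capture_alloc_pointer(struct capture_buffer *cb, size_t length, void **field)
{
    if (cb->used_pointers == cb->max_pointers ||
        length > (size_t)(cb->end - cb->free_space))
        return -1;
    *field = cb->free_space;
    cb->free_space += length;
    cb->pointers[cb->used_pointers++] = field;
    return 0;
}

/* Counterpart of CsrCaptureMessageBuffer: allocate and copy in one step. */
int capture_copy(struct capture_buffer *cb, const void *src, size_t length,
                 void **field)
{
    if (capture_alloc_pointer(cb, length, field) != 0)
        return -1;
    memcpy(*field, src, length);
    return 0;
}
```

Calling capture_alloc(5, 20) followed by capture_copy(cb, "\xcc\xcc\xcc", 3, &m.FirstPointer) reproduces the state from the example above: one pointer slot in use and seventeen data bytes left. The key design point is that the table records where the message fields live, not the data addresses themselves, so the fields can be rewritten later in one pass.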

      After the Capture Buffer is allocated and initialized (all N memory blocks are in use), it’s time to send the message, already! This time, we fill in the second parameter of the CsrClientCallServer routine with our CaptureBuffer pointer. When the following call is issued:

      CsrClientCallServer(&m,CaptureBuffer,API_NUMBER,sizeof(m)-sizeof(_PORT_HEADER));

      … and the second argument is non-zero, a couple of interesting conversions take place inside the routine. This is where the CsrPortMemoryRemoteDelta value comes into play. First of all, the data pointers residing in the CSR_MESSAGE structure (&m) are translated to server-compatible virtual addresses by adding the RemoteDelta. From now on, m.FirstPointer, m.SecondPointer, …, m.FifthPointer are invalid in the context of the local process, but correct in terms of the server-side memory mapping:

      for( UINT i=0;i<PointerCount;i++ )
          *CaptureBuffer.Pointers[i] += CsrPortMemoryRemoteDelta;

      Furthermore, the CaptureBuffer.Pointers[] array is altered, using the following pseudo-code:

      for( UINT i=0;i<PointerCount;i++ )
          CaptureBuffer.Pointers[i] -= &m;

    So, to sum everything up – after the address/offset translation is performed, we’ve got the following connection between the LPC message and shared buffer:

    • m.CaptureBuffer points to the server’s virtual address of the CaptureBuffer base,
    • CaptureBuffer->Pointers[] contain the relative offsets of the data pointers, i.e. (&m+CaptureBuffer->Pointers[0]) is the pointer to the first capture buffer,
    • (&m+CaptureBuffer->Pointers[n]) points to the server’s virtual address of the n-th capture buffer.

    Or, the same connection chain illustrated graphically: [figure: CaptureBuffer4.png]

    When both the local CSR_MESSAGE and shared CaptureBuffer structures are properly modified, ntdll!CsrClientCallServer calls the standard NtRequestWaitReplyPort LPC function and waits for an optional output. When the native call returns, all of the modified struct fields are restored to their original values, so that the user (or, more likely, the win32 APIs) can easily read the error code and the subsystem’s optional output.

    Because the VA- and offset-related conversions are hard to explain in words, I strongly advise you to verify the information presented in this post by yourself. This should give you an even better insight into how reliable cross-process data exchange is actually achieved.
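The two conversions and their reversal can be followed on a toy model. The sketch below makes several assumptions: the message holds only two data pointers (stored as integers so they can carry addresses from another process), the LPC header is omitted, and the message, pointer_table, translate_for_server and restore_after_reply names are all hypothetical. The actual send (NtRequestWaitReplyPort) is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_POINTERS 5

/* Hypothetical message: header omitted, data pointers kept as integers. */
struct message {
    uintptr_t first;
    uintptr_t second;
};

/* Hypothetical pointer table: before translation each slot holds the
   address of a message field, afterwards the offset of that field from
   the start of the message. */
struct pointer_table {
    size_t    count;
    uintptr_t slots[MAX_POINTERS];
};

/* Performed before the request is sent. */
void translate_for_server(struct message *m, struct pointer_table *t,
                          uintptr_t delta)
{
    for (size_t i = 0; i < t->count; i++) {
        uintptr_t *field = (uintptr_t *)t->slots[i];
        *field += delta;              /* client VA -> server VA */
        t->slots[i] -= (uintptr_t)m;  /* field address -> offset in message */
    }
}

/* Performed after the reply arrives: undo both steps in reverse order. */
void restore_after_reply(struct message *m, struct pointer_table *t,
                         uintptr_t delta)
{
    for (size_t i = 0; i < t->count; i++) {
        t->slots[i] += (uintptr_t)m;  /* offset -> field address */
        uintptr_t *field = (uintptr_t *)t->slots[i];
        *field -= delta;              /* server VA -> client VA */
    }
}
```

After translate_for_server, the table entries equal the field offsets inside the message, which is what lets the server locate every pointer without knowing the message layout of each individual API.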

  • Sending As already described: to make use of large data transfers, one must allocate a CaptureBuffer, specifying the number of memory blocks and the total byte count, fill it with the desired data (using CsrCaptureMessageBuffer or CsrCaptureMessageString), and call CsrClientCallServer, supplying an LPC structure (containing the data pointers into the CaptureBuffer) as the first parameter and the CaptureBuffer itself as the second one. The rest of the job is up to ntdll. Please keep in mind that a CaptureBuffer can technically be used only once, and should therefore be freed after its first (and last) use with CsrFreeCaptureBuffer.

  • Examples In this case, every CsrApi handler that uses the CsrValidateMessageBuffer import makes a good example, such as:
    • winsrv!SrvAllocConsole
    • winsrv!SrvSetConsoleTitle
    • winsrv!SrvAddConsoleAlias

    … and numerous other functions, which are pretty easy to find by oneself.

Conclusion

This post aimed to briefly present the “Native Csr Interface” in terms of the functions, structures and mechanisms that play a role in the Inter-Process Communication. As you must have noted, only the client-side perspective has been described here, as the precise way CSRSS receives, handles and responds to requests is a subject for another long article (or two). So, if you feel that some important Csr~ routines should have been described or mentioned here, let me know. I am going to cover the remaining, smaller functions (such as CsrGetProcessId) in a separate post called CSRSS Tips & Tricks. Watch out for part 3/3 and don’t hesitate to leave constructive comments! ;)

Cheers!

Source 1: Windows CSRSS Write Up: Inter-process Communication (part 1/3) | j00ru//vx tech blog

Source 2: Windows CSRSS Write Up: Inter-process Communication (part 2/3) | j00ru//vx tech blog

Edited by Nytro
