The Windows Win32 thread synchronization tutorial: Asynchronous Procedure Calls (APC), synchronization internals and critical section objects

Windows Thread Synchronization 7

Asynchronous Procedure Calls

An asynchronous procedure call (APC) is a function that executes asynchronously in the context of a particular thread. When an APC is queued to a thread, the system issues a software interrupt. The next time the thread is scheduled, it will run the APC function. An APC generated by the system is called a kernel-mode APC. An APC generated by an application is called a user-mode APC. A thread must be in an alertable state to run a user-mode APC. Each thread has its own APC queue. An application queues an APC to a thread by calling the QueueUserAPC() function. The calling thread specifies the address of an APC function in the call to QueueUserAPC(). The queuing of an APC is a request for the thread to call the APC function. When a user-mode APC is queued, the thread to which it is queued is not directed to call the APC function unless it is in an alertable state. A thread enters an alertable state when it calls the SleepEx(), SignalObjectAndWait(), MsgWaitForMultipleObjectsEx(), WaitForMultipleObjectsEx(), or WaitForSingleObjectEx() function. If the wait is satisfied before the APC is queued, the thread is no longer in an alertable wait state so the APC function will not be executed. However, the APC is still queued, so the APC function will be executed when the thread calls another alertable wait function. Note that the ReadFileEx(), SetWaitableTimer(), and WriteFileEx() functions are implemented using an APC as the completion notification callback mechanism.

Synchronization Internals

When an I/O request is issued, a structure is allocated to represent the request. This structure is called an I/O request packet (IRP). With synchronous I/O, the thread builds the IRP, sends it to the device stack, and waits in the kernel for the IRP to complete. With asynchronous I/O, the thread builds the IRP and sends it to the device stack. The stack might complete the IRP immediately, or it might return a pending status indicating that the request is in progress. When this happens, the IRP is still associated with the thread, so it will be canceled if the thread terminates or calls a function such as CancelIo(). In the meantime, the thread can continue to perform other tasks while the device stack continues to process the IRP. There are several ways that the system can indicate that the IRP has completed:

Update the overlapped structure with the result of the operation so the thread can poll to determine whether the operation has completed.
Signal the event in the overlapped structure so a thread can synchronize on the event and be woken when the operation completes.
Queue the IRP to the thread's pending APC so that the thread will execute the APC routine when it enters an alertable wait state and return from the wait operation with a status indicating that it executed one or more APC routines.
Queue the IRP to an I/O completion port, where it will be executed by the next thread that waits on the completion port.

Threads that wait on an I/O completion port do not wait in an alertable state. Therefore, if those threads issue IRPs that are set to complete as APCs to the thread, those IPC completions will not occur in a timely manner; they will occur only if the thread picks up a request from the I/O completion port and then happens to enter an alertable wait.

Critical Section Objects

A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process. Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned. A critical section is a piece of code that only one thread can execute at a time. If multiple threads try to enter a critical section, only one can run and the others will sleep. Imagine you have three threads that all want to enter a critical section.

Windows synchronization object: critical section

Only one thread can enter the critical section; the other two have to sleep. When a thread sleeps, its execution is paused and the OS will run some other thread.

Windows synchronization object: critical section - only one thread can access critical section at one time

Once the thread in the critical section exits, another thread is woken up and allowed to enter the critical section.

Windows synchronization object: critical section - another thread can use critical section when it is released

It’s important to keep the code inside a critical section as small as possible. The larger the critical section the longer it takes to execute, making the wait time longer for any additional threads that want access. Starting with Windows Server 2003 with Service Pack 1 (SP1), threads waiting on a critical section do not acquire the critical section on a first-come, first-serve basis. This change increases performance significantly for most code. However, some applications depend on FIFO ordering and may perform poorly or not at all on current versions of Windows (for example, applications that have been using critical sections as a rate-limiter). To ensure that your code continues to work correctly, you may need to add an additional level of synchronization. For example, suppose you have a producer thread and a consumer thread that are using a critical section object to synchronize their work. Create two event objects, one for each thread to use to signal that it is ready for the other thread to proceed. The consumer thread will wait for the producer to signal its event before entering the critical section, and the producer thread will wait for the consumer thread to signal its event before entering the critical section. After each thread leaves the critical section, it signals its event to release the other thread.

Windows Server 2003 and Windows XP/2000: Threads that are waiting on a critical section are added to a wait queue; they are woken and generally acquire the critical section in the order in which they were added to the queue. However, if threads are added to this queue at a fast enough rate, performance can be degraded because of the time it takes to awaken each waiting thread. The process is responsible for allocating the memory used by a critical section. Typically, this is done by simply declaring a variable of type CRITICAL_SECTION. Before the threads of the process can use it, initialize the critical section by using the InitializeCriticalSection() or InitializeCriticalSectionAndSpinCount() function. A thread uses the EnterCriticalSection() or TryEnterCriticalSection() function to request ownership of a critical section. It uses the LeaveCriticalSection() function to release ownership of a critical section. If the critical section object is currently owned by another thread, EnterCriticalSection() waits indefinitely for ownership. In contrast, when a mutex object is used for mutual exclusion, the wait functions accept a specified time-out interval. The TryEnterCriticalSection() function attempts to enter a critical section without blocking the calling thread.

When a thread owns a critical section, it can make additional calls to EnterCriticalSection() or TryEnterCriticalSection() without blocking its execution. This prevents a thread from deadlocking itself while waiting for a critical section that it already owns. To release its ownership, the thread must call LeaveCriticalSection() one time for each time that it entered the critical section. There is no guarantee about the order in which waiting threads will acquire ownership of the critical section. A thread uses the InitializeCriticalSectionAndSpinCount() or SetCriticalSectionSpinCount() function to specify a spin count for the critical section object. Spinning means that when a thread tries to acquire a critical section that is locked, the thread enters a loop, checks to see if the lock is released, and if the lock is not released, the thread goes to sleep. On single-processor systems, the spin count is ignored and the critical section spin count is set to 0 (zero). On multiprocessor systems, if the critical section is unavailable, the calling thread spins dwSpinCount times before performing a wait operation on a semaphore that is associated with the critical section. If the critical section becomes free during the spin operation, the calling thread avoids the wait operation. Any thread of the process can use the DeleteCriticalSection() function to release the system resources that are allocated when the critical section object is initialized. After this function is called, the critical section object cannot be used for synchronization. When a critical section object is owned, the only other threads affected are the threads that are waiting for ownership in a call to EnterCriticalSection(). Threads that are not waiting are free to continue running.

Windows Thread Synchronization 7

< Thread Synchronization 6 | Thread Synchronization Programming | Win32 Programming | Thread Synchronization 8 >