The Windows file input/output - buffering, caching, synchronous and asynchronous

The Windows File Management 14

Input and Output (I/O)

Windows provides the ability to perform input and output (I/O) operations on storage components located on local and remote computers.

I/O Concepts

File Buffering

This topic covers the various considerations for application control of file buffering, also known as unbuffered file input/output (I/O). File buffering is usually handled by the system behind the scenes and is considered part of file caching within the Windows operating system unless otherwise specified. Although the terms caching and buffering are sometimes used interchangeably, this topic uses the term buffering specifically in the context of explaining how to interact with data that is not being cached (buffered) by the system, where it is otherwise largely out of the direct control of user-mode applications. When opening or creating a file with the CreateFile() function, the FILE_FLAG_NO_BUFFERING flag can be specified to disable system caching of data being read from or written to the file. Although this gives complete and direct control over data I/O buffering, in the case of files and similar devices there are data alignment requirements that must be considered. Note that, this alignment information applies to I/O on devices such as files that support seeking and the concept of file position pointers (or offsets). For non-seeking devices, such as named pipes or communications devices, turning off buffering may not require any particular alignment. Any limitations or efficiencies that may be gained by alignment in that case are dependent on the underlying technology.

In a simple example, the application would open a file for write access with the FILE_FLAG_NO_BUFFERING flag and then perform a call to the WriteFile() function using a data buffer defined within the application. This local buffer is, in these circumstances, effectively the only file buffer that exists for this operation. Because of physical disk layout, file system storage layout, and system-level file pointer position tracking, this write operation will fail unless the locally-defined data buffers meet certain alignment criteria, discussed in the following section. Discussion of caching does not consider any hardware caching on the physical disk itself, which is not guaranteed to be within the direct control of the system in any case. This has no effect on the requirements specified in this topic.

Alignment and File Access Requirements

As previously discussed, an application must meet certain requirements when working with files opened with FILE_FLAG_NO_BUFFERING. The following specifics apply:

File access sizes, including the optional file offset in the OVERLAPPED structure, if specified, must be for a number of bytes that is an integer multiple of the volume sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1,024, 1,536, or 2,048 bytes, but not of 335, 981, or 7,171 bytes.
File access buffer addresses for read and write operations should be sector-aligned, which means aligned on addresses in memory that are integer multiples of the volume sector size. Depending on the disk, this requirement may not be enforced.

An application can determine a volume sector size by calling the GetDiskFreeSpace() function. Because buffer addresses for read and write operations must be sector-aligned, the application must have direct control of how these buffers are allocated. One way to sector-align buffers is to use the VirtualAlloc() function to allocate the buffers. Consider the following:

VirtualAlloc() allocates memory that is aligned on addresses that are integer multiples of the system's page size. Page size is 4,096 bytes on x64 and x86 or 8,192 bytes for the Intel Itanium processor family.
Sector size is typically 512 to 4,096 bytes for direct-access storage devices (hard drives) and 2,048 bytes for CD-ROMs.
Both page and sector sizes are powers of 2.

Therefore, in most situations, page-aligned memory will also be sector-aligned, because the case where the sector size is larger than the page size is rare. Another way to obtain manually-aligned memory buffers is to use the _aligned_malloc() function from the C Run-Time library.

File Caching

By default, Windows caches file data that is read from disks and written to disks. This implies that read operations read file data from an area in system memory known as the system file cache, rather than from the physical disk. Correspondingly, write operations write file data to the system file cache rather than to the disk, and this type of cache is referred to as a write-back cache. Caching is managed per file object.

Caching occurs under the direction of the cache manager, which operates continuously while Windows is running. File data in the system file cache is written to the disk at intervals determined by the operating system, and the memory previously used by that file data is freed, this is referred to as flushing the cache. The policy of delaying the writing of the data to the file and holding it in the cache until the cache is flushed is called lazy writing, and it is triggered by the cache manager at a determinate time interval. The time at which a block of file data is flushed is partially based on the amount of time it has been stored in the cache and the amount of time since the data was last accessed in a read operation. This ensures that file data that is frequently read will stay accessible in the system file cache for the maximum amount of time. This file data caching process is illustrated in the following figure.

The file data caching process illustration

As depicted by the solid arrows in the previous figure, a 256KB region of data is read into a 256KB cache slot in system address space when it is first requested by the cache manager during a file read operation. A user-mode process then copies the data in this slot to its own address space. When the process has completed its data access, it writes the altered data back to the same slot in the system cache, as shown by the dotted arrow between the process address space and the system cache. When the cache manager has determined that the data will no longer be needed for a certain amount of time, it writes the altered data back to the file on the disk, as shown by the dotted arrow between the system cache and the disk. The amount of I/O performance improvement that file data caching offers depends on the size of the file data block being read or written. When large blocks of file data are read and written, it is more likely that disk reads and writes will be necessary to finish the I/O operation. I/O performance will be increasingly impaired as more of this kind of I/O operation occurs.

In these situations, caching can be turned off. This is done at the time the file is opened by passing FILE_FLAG_NO_BUFFERING as a value for the dwFlagsAndAttributes parameter of CreateFile(). When caching is disabled, all read and write operations directly access the physical disk. However, the file metadata may still be cached. To flush the metadata to disk, use the FlushFileBuffers() function. The frequency at which flushing occurs is an important consideration that balances system performance with system reliability. If the system flushes the cache too often, the number of large write operations flushing incurs will degrade system performance significantly. If the system is not flushed often enough, then the likelihood is greater that either system memory will be depleted by the cache, or a sudden system failure (such as a loss of power to the computer) will happen before the flush. In the latter instance, the cached data will be lost.

To ensure that the right amount of flushing occurs, the cache manager spawns a process every second called a lazy writer. The lazy writer process queues one-eighth of the pages that have not been flushed recently to be written to disk. It constantly reevaluates the amount of data being flushed for optimal system performance, and if more data needs to be written it queues more data. Lazy writers do not flush temporary files, because the assumption is that they will be deleted by the application or system. Some applications, such as virus-checking software, require that their write operations be flushed to disk immediately; Windows provides this ability through write-through caching. A process enables write-through caching for a specific I/O operation by passing the FILE_FLAG_WRITE_THROUGH flag into its call to CreateFile(). With write-through caching enabled, data is still written into the cache, but the cache manager writes the data immediately to disk rather than incurring a delay by using the lazy writer. A process can also force a flush of a file it has opened by calling the FlushFileBuffers() function. File system metadata is always cached. Therefore, to store any metadata changes to disk, the file must either be flushed or be opened with FILE_FLAG_WRITE_THROUGH.

Synchronous and Asynchronous I/O

There are two types of input/output (I/O) synchronization: synchronous I/O and asynchronous I/O. Asynchronous I/O is also referred to as overlapped I/O. In synchronous file I/O, a thread starts an I/O operation and immediately enters a wait state until the I/O request has completed. A thread performing asynchronous file I/O sends an I/O request to the kernel by calling an appropriate function. If the request is accepted by the kernel, the calling thread continues processing another job until the kernel signals to the thread that the I/O operation is complete. It then interrupts its current job and processes the data from the I/O operation as necessary. The two synchronization types are illustrated in the following figure.

Synchronous and asynchronous I/O illustrations

In situations where an I/O request is expected to take a large amount of time, such as a refresh or backup of a large database or a slow communications link, asynchronous I/O is generally a good way to optimize processing efficiency. However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made. In this case, synchronous I/O would be better. The mechanisms and implementation details of how to accomplish these tasks vary depending on the type of device handle that is used and the particular needs of the application. In other words, there are usually multiple ways to solve the problem.

Synchronous and Asynchronous I/O Considerations

If a file or device is opened for synchronous I/O (that is, FILE_FLAG_OVERLAPPED is not specified), subsequent calls to functions such as WriteFile() can block execution of the calling thread until one of the following events occurs:

The I/O operation completes (in this example, a data write).
An I/O error occurs. (For example, the pipe is closed from the other end.)
An error was made in the call itself (for example, one or more parameters are not valid).
Another thread in the process calls the CancelSynchronousIo() function using the blocked thread's thread handle, which terminates I/O for that thread, failing the I/O operation.
The blocked thread is terminated by the system; for example, the process itself is terminated, or another thread calls the TerminateThread() function using the blocked thread's handle. (This is generally considered a last resort and not good application design.)

In some cases, this delay may be unacceptable to the application's design and purpose, so application designers should consider using asynchronous I/O with appropriate thread synchronization objects such as I/O completion ports. A process opens a file for asynchronous I/O in its call to CreateFile() by specifying the FILE_FLAG_OVERLAPPED flag in the dwFlagsAndAttributes parameter. If FILE_FLAG_OVERLAPPED is not specified, the file is opened for synchronous I/O. When the file has been opened for asynchronous I/O, a pointer to an OVERLAPPED structure is passed into the call to ReadFile() and WriteFile(). When performing synchronous I/O, this structure is not required in calls to ReadFile() and WriteFile().

If a file or device is opened for asynchronous I/O, subsequent calls to functions such as WriteFile() using that handle generally return immediately but can also behave synchronously with respect to blocked execution. For more information, see more information. Although CreateFile() is the most common function to use for opening files, disk volumes, anonymous pipes, and other similar devices, I/O operations can also be performed using a handle typecast from other system objects such as a socket created by the socket() or accept() functions. Handles to directory objects are obtained by calling the CreateFile() function with the FILE_FLAG_BACKUP_SEMANTICS attribute. Directory handles are almost never used, backup applications are one of the few applications that will typically use them. After opening the file object for asynchronous I/O, an OVERLAPPED structure must be properly created, initialized, and passed into each call to functions such as ReadFile() and WriteFile(). Keep the following in mind when using the OVERLAPPED structure in asynchronous read and write operations:

Do not deallocate or modify the OVERLAPPED structure or the data buffer until all asynchronous I/O operations to the file object have been completed.
If you declare your pointer to the OVERLAPPED structure as a local variable, do not exit the local function until all asynchronous I/O operations to the file object have been completed. If the local function is exited prematurely, the OVERLAPPED structure will go out of scope and it will be inaccessible to any ReadFile() or WriteFile() functions it encounters outside of that function.

You can also create an event and put the handle in the OVERLAPPED structure; the wait functions can then be used to wait for the I/O operation to complete by waiting on the event handle. As previously stated, when working with an asynchronous handle, applications should use care when making determinations about when to free resources associated with a specified I/O operation on that handle. If the handle is deallocated prematurely, ReadFile() or WriteFile() may incorrectly report that the I/O operation is complete. Further, the WriteFile() function will sometimes return TRUE with a GetLastError() value of ERROR_SUCCESS, even though it is using an asynchronous handle (which can also return FALSE with ERROR_IO_PENDING). Programmers accustomed to synchronous I/O design will usually release data buffer resources at this point because TRUE and ERROR_SUCCESS signify the operation is complete. However, if I/O completion ports are being used with this asynchronous handle, a completion packet will also be sent even though the I/O operation completed immediately. In other words, if the application frees resources after WriteFile() returns TRUE with ERROR_SUCCESS in addition to in the I/O completion port routine, it will have a double-free error condition. In this example, the recommendation would be to allow the completion port routine to be solely responsible for all freeing operations for such resources.

The system does not maintain the file pointer on asynchronous handles to files and devices that support file pointers (that is, seeking devices), therefore the file position must be passed to the read and write functions in the related offset data members of the OVERLAPPED structure. File pointer position for a synchronous handle is maintained by the system as data is read or written and can also be updated using the SetFilePointer() or SetFilePointerEx() function. An application can also wait on the file handle to synchronize the completion of an I/O operation, but doing so requires extreme caution. Each time an I/O operation is started, the operating system sets the file handle to the nonsignaled state. Each time an I/O operation is completed, the operating system sets the file handle to the signaled state. Therefore, if an application starts two I/O operations and waits on the file handle, there is no way to determine which operation is finished when the handle is set to the signaled state. If an application must perform multiple asynchronous I/O operations on a single file, it should wait on the event handle in the specific OVERLAPPED structure for each I/O operation, rather than on the common file handle. To cancel all pending asynchronous I/O operations, use either:

CancelIo() - this function only cancels operations issued by the calling thread for the specified file handle.
CancelIoEx() - this function cancels all operations issued by the threads for the specified file handle.

Use CancelSynchronousIo() to cancel pending synchronous I/O operations. The ReadFileEx() and WriteFileEx() functions enable an application to specify a routine to execute when the asynchronous I/O request is completed.

The Windows File Management 14

< Windows Files 13 | Win32 Programming | Win32 File Index | Windows Files 15 >