The file's current character set code page, directory, compression and decompression

The Windows File Management 7

Determining the Current Character Set Code Page

The AreFileApisANSI() function determines whether the file I/O functions are using the ANSI or OEM character set code page. The SetFileApisToANSI() function causes the functions to use the ANSI code page. The SetFileApisToOEM() function causes the functions to use the OEM code page. By default, file I/O functions use ANSI file names. Functions exported by Kernel32.dll that accept or return file names are affected by the file code page setting. Both SetFileApisToANSI() and SetFileApisToOEM() set the code page per process, rather than per thread or per computer.

AreFileApisANSI() Program Example

Create a new empty Win32 console application project. Give a suitable project name and change the project location if needed.

Then, add the source file and give it a suitable name.

Next, add the following source code.

#include <windows.h>
#include <stdio.h>

int wmain(int argc, WCHAR *argv[])
{
    // AreFileApisANSI() returns nonzero for the ANSI code page, zero for OEM
    BOOL bRetVal = AreFileApisANSI();

    if(bRetVal != 0)
        wprintf(L"File I/O functions are using the ANSI code page!\n");
    else
        wprintf(L"File I/O functions are using the OEM code page!\n");

    return 0;
}

Build and run the project. The following screenshot is a sample output.

[Screenshot: sample console output of the AreFileApisANSI() program example]

Reading From and Writing to Files

An application reads from and writes to a file by using the ReadFile(), ReadFileEx(), WriteFile(), and WriteFileEx() functions. These functions require a handle to a file to be opened for reading and writing, respectively. They read and write a specified number of bytes at the location indicated by the file pointer. The data is read and written exactly as specified; the functions do not format the data. When the file pointer reaches the end of a file and the application attempts to read from the file, no error occurs, but no bytes are read. Therefore, reading zero bytes without an error means the application has reached the end of the file. Writing zero bytes does nothing.

Positioning a File Pointer

When an application calls CreateFile() to open a file for the first time, Windows places the file pointer at the beginning of the file. As bytes are read from or written to the file, Windows advances the file pointer the number of bytes read or written. An application can position the file pointer to a specified offset by calling SetFilePointer(). The SetFilePointer() function can also be used to query the current file pointer position by specifying a move method of FILE_CURRENT and a distance of zero.

Reading From or Writing To Files Using a Scatter-Gather Scheme

A scatter-gather scheme uses the operating system to deliver in one operation multiple discrete chunks of data (such as database records) from a file to separate, noncontiguous buffers in memory. A scatter-gather scheme also writes the data from noncontiguous buffers in one operation. An application can implement a scatter-gather scheme with ReadFileScatter() and WriteFileGather().

Flushing System-Buffered I/O Data to Disk

Windows stores the data in file read and write operations in system-maintained data buffers to optimize disk performance. When an application writes to a file, the system usually buffers the data and writes the data to the disk on a regular basis. An application can force the operating system to write the contents of these data buffers to the disk by using the FlushFileBuffers() function. Alternatively, an application can specify that write operations are to bypass the data buffer and write directly to the disk by setting the FILE_FLAG_NO_BUFFERING flag when the file is created or opened using the CreateFile() function.

If there is data in the internal buffer when the file is closed, the operating system does not automatically write the contents of the buffer to the disk before closing the file. If the application does not force the operating system to write the buffer to disk before closing the file, the caching algorithm determines when the buffer is written.

Accessing a data buffer while a read or write operation is using it may corrupt the buffer. Applications must not read from, write to, reallocate, or free the data buffer that a read or write operation is using until the operation completes.

Truncating or Extending Files

An application can truncate or extend a file by calling SetEndOfFile(). This function sets the end-of-file marker to the current position of the file pointer. Note that when a file is extended, the contents between the old and new end-of-file locations are not defined.

File and Directory Linking

The NTFS file system provides the ability to create a system representation of a file or directory in a location in the directory structure that is different from the file or directory object that is being linked to. This process is called linking. There are two types of links supported in the NTFS file system: hard links and junctions. The NTFS file system also provides the distributed link tracking service, which automatically tracks links as they are moved.

File Compression and Decompression

The NTFS file system volumes support file compression on an individual file basis. The file compression algorithm used by the NTFS file system is Lempel-Ziv compression. This is a lossless compression algorithm, which means that no data is lost when compressing and decompressing the file, as opposed to lossy compression algorithms such as JPEG, where some data is lost each time data compression and decompression occur. Data compression reduces the size of a file by minimizing redundant data. In a text file, redundant data can be frequently occurring characters, such as the space character, or common vowels, such as the letters e and a; it can also be frequently occurring character strings. Data compression creates a compressed version of a file by minimizing this redundant data.

Each type of data-compression algorithm minimizes redundant data in a unique manner. For example, the Huffman encoding algorithm assigns a code to characters in a file based on how frequently those characters occur. Another compression algorithm, called run-length encoding, generates a two-part value for repeated characters: the first part specifies the number of times the character is repeated, and the second part identifies the character. Another compression algorithm, known as the Lempel-Ziv algorithm, converts variable-length strings into fixed-length codes that consume less space than the original strings.
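To make the run-length encoding description concrete, here is a small sketch in portable C. It is an illustration only and is not the algorithm NTFS uses. The function name RleEncode and the choice of a one-byte count (so runs longer than 255 are split into several pairs) are assumptions made for this example.

```c
#include <stddef.h>

// Hypothetical helper: encodes src as (count, value) byte pairs.
// The first byte of each pair is the repeat count (1..255), the
// second is the repeated byte. Returns the number of bytes written
// to dst, or 0 if dst is too small (or if srcLen is 0).
size_t RleEncode(const unsigned char *src, size_t srcLen,
                 unsigned char *dst, size_t dstCap)
{
    size_t in = 0, out = 0;

    while (in < srcLen)
    {
        unsigned char value = src[in];
        size_t run = 1;

        // Extend the run while the next byte matches, up to 255.
        while (in + run < srcLen && src[in + run] == value && run < 255)
            run++;

        if (out + 2 > dstCap)
            return 0;

        dst[out++] = (unsigned char)run;  // part 1: repeat count
        dst[out++] = value;               // part 2: the character
        in += run;
    }
    return out;
}
```

For input with few repeats the output can be larger than the input, since every run costs two bytes; this is why run-length encoding suits data with long runs of identical bytes.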

The NTFS File System File Compression

On the NTFS file system, compression is performed transparently. This means it can be used without requiring changes to existing applications. The compressed bytes of the file are not accessible to applications; they see only the uncompressed data. Therefore, applications that open a compressed file can operate on it as if it were not compressed. However, these files cannot be copied to another file system. If you compress a file that is larger than 30 gigabytes, the compression may not succeed. The following topics describe NTFS file compression:

Compression Attribute

On an NTFS file system volume, each file and directory has a compression attribute. Other file systems may also implement a compression attribute for individual files and directories. You can determine whether a file system supports a compression attribute for files and directories by calling the GetVolumeInformation() function and examining the FS_FILE_COMPRESSION bit flag. Use the GetFileAttributes() or GetFileAttributesEx() function to determine the compression attribute of a file or directory.

If a file's compression attribute is set (FILE_ATTRIBUTE_COMPRESSED), all data in the file is compressed. If the attribute is clear, none of the data in the file is compressed. There is no partially compressed state from a user-mode programming perspective; the compression attribute is a simple Boolean indicator of compression state.

A directory's compression attribute provides a default compression attribute for newly created files and subdirectories. When you call CreateFile() or CreateDirectory() to create a new file or directory, the new file or directory inherits the compression attribute of its parent directory. To modify the FILE_ATTRIBUTE_COMPRESSED attribute for a file or directory, you must use the DeviceIoControl() function with the FSCTL_SET_COMPRESSION control code.

Compression State

Each file and directory on a volume that supports compression for individual files and directories has a compression state. Whereas the compression attribute of a file or directory indicates simply whether the file or directory is compressed or not compressed, the compression state also specifies the format of any compressed data. Use the FSCTL_GET_COMPRESSION control code to determine the compression state of a file or directory.

Compression state is encoded as a 16-bit value. A compression state value of COMPRESSION_FORMAT_NONE indicates that a file is not compressed. A value of COMPRESSION_FORMAT_DEFAULT indicates that a file is compressed, using the default compression format. Any other value indicates that a file is compressed, using the compression format specified by the compression state value.

Use the FSCTL_SET_COMPRESSION control code to set the compression state of a file or directory. This operation also sets the compression attribute of the file or directory. Setting the compression state of a file to a nonzero value compresses the file, using the compression format encoded by the compression state value. Setting a file's compression state to zero decompresses the file. These are synchronous operations. The file is compressed or decompressed immediately when you set its compression state.

Setting a directory's compression state does not cause any immediate compression or decompression. Instead, setting a directory's compression state sets a default compression state that will be given to all newly created files and subdirectories.

Obtaining the Size of a Compressed File

Use the GetCompressedFileSize() function to obtain the compressed size of a file. If the file is compressed, its compressed size will be less than its uncompressed size. Use the GetFileSize() function to determine the uncompressed size of a file.
