The Windows File Management 2

 

 

 

 

 

Creating, Deleting, and Maintaining Files

 

File Names, Paths, and Namespaces

 

All file systems follow the same general naming conventions for an individual file: a base file name and an optional extension, separated by a period. However, each file system, such as NTFS and FAT, can have specific and differing rules about the formation of the individual components in a directory or file name. Character count limitations can also be different and can vary depending on the path name prefix format used. This is further complicated by support for backward compatibility mechanisms. Any Windows (Win32) application developer should be aware of these limitations and differences and know which file and path names are valid.

For example, the older MS-DOS FAT file system supports a maximum of 8 characters for the base file name and 3 characters for the extension, for a total of 12 characters including the dot separator. This is commonly known as an 8.3 file name. The Windows FAT and NTFS file systems are not limited to 8.3 file names, because they support long file names, but they still support an 8.3 version (short name) of each long file name.

Be aware that, as far as the file system is concerned, a directory is simply a special type of file; in certain contexts, reference material therefore uses the general term file to cover both directories and data files. Because this topic is written at a higher level, it uses the term file to refer to actual data files only. Some file systems, such as NTFS, support linked files and directories, which follow the same naming conventions and rules as a regular file or directory.
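The 8.3 length rule can be illustrated with a small check. This is only a sketch, written in Python for brevity: the function name is_8dot3_name is hypothetical, and it tests the length limits only, not which characters an 8.3 name may contain.

```python
def is_8dot3_name(name: str) -> bool:
    """Return True if `name` fits the 8.3 length limits described above.

    Only lengths are checked: a base of 1 to 8 characters and an optional
    extension of 1 to 3 characters. Character restrictions are out of scope.
    """
    base, dot, ext = name.partition(".")
    if "." in ext:
        # More than one period cannot be a plain 8.3 name.
        return False
    if dot and not ext:
        # A trailing period leaves an empty extension.
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3


if __name__ == "__main__":
    for candidate in ("AUTOEXEC.BAT", "README", "LongFileName.txt", "page.html"):
        print(candidate, is_8dot3_name(candidate))
```

AUTOEXEC.BAT is the longest shape that passes: 8 characters, the dot, and 3 characters, for 12 in total.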

 

Basic Naming Conventions

 

The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:

 

  1. Use a period to separate the base file name from the extension in the name of a directory or file.
  2. Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. For additional details about what a path is, see the Path Names and Namespaces section below.
  3. Use a backslash as required as part of volume names, for example, the "C:\" in "C:\path\file" or the "\\server\share" in "\\server\share\path\file" for Universal Naming Convention (UNC) names. You cannot use a backslash in the actual file or directory name components because it separates the names into components.
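  4. Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128 - 255), except for the following: the reserved characters < (less than), > (greater than), : (colon), " (double quote), / (forward slash), \ (backslash), | (vertical bar or pipe), ? (question mark), and * (asterisk); the integer value zero (the ASCII NUL character); characters whose integer values are in the range 1 through 31; and any other character that the target file system does not allow.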

  5. Use a period as a directory component in a path to represent the current directory, for example ".\tmp.txt".
  6. Use two consecutive periods (..) as a directory component in a path to represent the parent of the current directory, for example "..\tmp.txt".
  7. Do not use the following reserved device names for the name of a file:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.

  8. Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as different. Note that NTFS supports POSIX semantics for case sensitivity but this is not the default behavior.
  9. Do not end a file or directory name with a trailing space or a period. Although the underlying file system may support such names, the operating system does not. However, it is acceptable to start a name with a period.
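Several of the rules above can be checked mechanically before a name is ever passed to a Win32 function. The following is a minimal sketch in Python, not a complete validator. The function name check_component is hypothetical, and the reserved character set it uses (< > : " / \ | ? * plus the control characters 0 through 31) is the one given in Microsoft's file naming documentation. It does not query the target file system, so file-system-specific restrictions are not covered.

```python
# Characters that may not appear in a single name component.
RESERVED_CHARS = set('<>:"/\\|?*')

# Reserved device names from the list above.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def check_component(name: str) -> list:
    """Return a list of rule violations for one file or directory name."""
    if not name:
        return ["name is empty"]
    if name in (".", ".."):
        # Valid as directory components (current and parent directory).
        return []
    problems = []
    if any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in name):
        problems.append("contains a reserved or control character")
    if name[-1] in " .":
        problems.append("ends with a space or a period")
    # Device names are compared without regard to case, and a device name
    # followed by an extension (NUL.txt) is flagged as well.
    base = name.split(".", 1)[0].upper()
    if base in RESERVED_DEVICE_NAMES:
        problems.append("uses a reserved device name")
    return problems


if __name__ == "__main__":
    for candidate in ("report.txt", "NUL.txt", "what?.doc", "notes.", ".profile"):
        print(repr(candidate), check_component(candidate))
```

An empty list means that none of the checked rules is violated; it does not guarantee that a particular file system will accept the name.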

 

Path Names and Namespaces

 

The path to a specified file consists of one or more components, separated by special characters, with each component usually being a directory name or file name, with some notable exceptions discussed below. It is often critical to the system's interpretation of a path what the beginning of the path (the prefix) looks like and what special characters are used in which position within the path, including the last character. If a component of a path is a file name, it must be the last component. Each component of a path will also be constrained by the maximum length specified for a particular file system. In general, these rules fall into two categories: short and long. Note that directory names are stored by the file system as a special type of file, but naming rules for files also apply to directory names. A path is simply the string representation of the hierarchy between all of the directories that exist for a particular file or directory name.
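To make the idea of a prefix followed by components concrete, the sketch below splits a path string using PureWindowsPath from Python's standard pathlib module, which parses Windows-style paths on any platform without touching the disk. The helper name split_components is hypothetical. Note that pathlib collapses a single "." component while keeping "..", so this is an approximation of how paths are structured, not a reproduction of the Win32 parser.

```python
from pathlib import PureWindowsPath


def split_components(path: str):
    """Split a Windows path into (prefix, [component, ...]).

    The prefix is the volume or share part, such as "C:\\" or
    "\\\\server\\share\\"; it is empty for a relative path.
    """
    p = PureWindowsPath(path)
    parts = p.parts[1:] if p.anchor else p.parts
    return p.anchor, list(parts)


if __name__ == "__main__":
    for example in (r"C:\path\file.txt", r"\\server\share\path\file", r"..\tmp.txt"):
        print(example, "->", split_components(example))
```

In each result the last component is the only one that can be a file name; every component before it is a directory.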

Any discussion of path names needs to include the concept of a namespace in Windows. There are two main categories of namespace conventions used in the Win32 APIs, commonly referred to as the NT namespace and the Win32 namespace. The NT namespace was designed to be the lowest level namespace on which other subsystems and namespaces could exist, including the Win32 subsystem and, by extension, the Win32 file and device namespaces. POSIX is another example of a subsystem in Windows that is built on top of the NT namespace. Early versions of Windows also defined several predefined, or reserved, names for certain special devices, such as communications (serial and parallel) ports and the default display console, as part of what is now called the NT device namespace. These names are still supported in current versions of Windows for backward compatibility.

To make this concrete, the following items are examples of Win32 namespace prefixes and conventions, with a summary of how each is used. Note that these examples are intended for use with the Win32 API functions and do not all necessarily work with Windows shell applications such as Windows Explorer.

The "\\?\" prefix tells the Win32 APIs to disable all string parsing and to send the string that follows it straight to the file system. For example, if the file system supports large paths and file names, you can exceed the MAX_PATH limit (260 characters) that is otherwise enforced by the Win32 APIs. This prefix also turns off automatic expansion of ".." and "." in path names. Many, but not all, file APIs support "\\?\"; check the reference topic for each API to be sure.

The "\\.\" prefix accesses the device namespace instead of the file namespace. This is how you access physical disks and volumes directly, without going through the file system, if the API supports this type of access. You can access many other devices this way (using the CreateFile() and DefineDosDevice() functions, for example). Most APIs do not support "\\.\"; only those designed to work with the device namespace do.
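Because the prefix decides how the rest of the string is interpreted, it can help to classify a path by its leading characters before using it. The sketch below is an illustration only: the function name classify_prefix and the category labels it returns are hypothetical, and it looks at the string alone without calling any Win32 API.

```python
def classify_prefix(path: str) -> str:
    """Classify a path string by its leading characters."""
    if path.startswith("\\\\?\\"):
        # "\\?\" : string parsing is disabled, path goes to the file system.
        return "win32-file-no-parsing"
    if path.startswith("\\\\.\\"):
        # "\\.\" : Win32 device namespace.
        return "win32-device"
    if path.startswith("\\\\"):
        # "\\server\share\..." : UNC name.
        return "unc"
    if len(path) >= 3 and path[0].isalpha() and path[1:3] == ":\\":
        # "C:\..." : drive letter followed by the root.
        return "drive-absolute"
    return "other"


if __name__ == "__main__":
    for example in (r"\\?\C:\very\long\path", r"\\.\COM56",
                    r"\\server\share\path\file", r"C:\path\file", r".\tmp.txt"):
        print(example, "->", classify_prefix(example))
```

The order of the tests matters: both special prefixes also begin with two backslashes, so they must be checked before the general UNC case.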

For example, if you want to open the system's serial communications port 1, you can use either "\\.\COM1" or "COM1" in the call to the CreateFile() function. This works because COM1-COM9 are among the reserved names in the NT namespace, as previously mentioned. But if you have a 100 port serial expansion board and want to open COM56, you need to open it using "\\.\COM56". This works because "\\.\" goes to the device namespace, and there is no predefined reserved name for COM56. Similarly, calling the CreateFile() function on "\\.\PhysicalDriveX" (where X is a disk number) or "\\.\CdRom1" allows you to access those devices directly, bypassing the file system. It just happens that the device driver that implements the name "C:\" has its own namespace, which is the file system. APIs that go through the CreateFile() function should work, because CreateFile() is the same API used to open both files and devices. If you're working with Win32 functions, you should use "\\.\" only to access devices and not files.
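Since the bare name only works for COM1 through COM9 while the "\\.\" form works for any port number, one simple design choice is to always build the device-namespace form. The helper below is a hypothetical sketch that only constructs the string; the result would then be passed as the file name argument to CreateFile().

```python
def com_port_path(port: int) -> str:
    """Build the device-namespace name for a serial port, e.g. \\\\.\\COM56."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    # Always use the "\\.\" prefix so that ports above COM9 work as well.
    return "\\\\.\\COM" + str(port)


if __name__ == "__main__":
    print(com_port_path(1))
    print(com_port_path(56))
```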

There are also APIs that allow the use of the NT namespace convention, but the Windows Object Manager makes that unnecessary in most cases. To illustrate, it is useful to browse the Windows namespaces in the system object browser using the Windows Sysinternals WinObj tool. When you run this tool, what you see is the NT namespace rooted at "\". The subdirectory "Global??" is where the Win32 namespace resides. Named device objects reside in the NT namespace within the "Device" subdirectory. Here you may also find Serial0 and Serial1, the device objects representing the two COM ports if present on your system. The device object representing a volume would be something like "HarddiskVolume1", although the numeric suffix may vary. The name "DR0" under subdirectory "Harddisk0" would be the device object representing a disk, and so on.

To make these device objects accessible by Win32 applications, the device drivers create a symbolic link (symlink) in the Win32 namespace to their respective device objects. For example, COM0 and COM1 under the "Global??" subdirectory are simply symlinks to Serial0 and Serial1, "C:" is a symlink to HarddiskVolume1, "PhysicalDrive0" is a symlink to DR0, and so on. Without a symlink, a specified device "Xxx" will not be available to any Win32 application using Win32 namespace conventions as described previously. However, a handle could be opened to that device using any APIs that support the NT namespace absolute path of the format "\Device\Xxx".

With the addition of multi-user support via Terminal Services and virtual machines, it has further become necessary to virtualize the system-wide root device within the Win32 namespace. This was accomplished by adding the symlink named "GLOBALROOT" to the Win32 namespace, which you can see in the "Global??" subdirectory of the WinObj browser tool previously discussed, and can access via the path "\\?\GLOBALROOT". This prefix ensures that the path following it looks in the true root path of the system object manager and not a session-dependent path.
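As an illustration of how the "GLOBALROOT" symlink ties the two namespaces together, the sketch below turns an absolute NT namespace path into a string that uses the "\\?\GLOBALROOT" prefix. The function name nt_to_globalroot is hypothetical, and "HarddiskVolume1" is used only as an example name; whether a given API accepts the resulting path still depends on that API.

```python
def nt_to_globalroot(nt_path: str) -> str:
    """Prefix an absolute NT path such as \\Device\\Xxx with \\\\?\\GLOBALROOT."""
    if not nt_path.startswith("\\"):
        raise ValueError("expected an absolute NT path that starts with a backslash")
    return "\\\\?\\GLOBALROOT" + nt_path


if __name__ == "__main__":
    print(nt_to_globalroot(r"\Device\HarddiskVolume1"))
```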

 

 

 

 
