A number of COM files in IBM PC DOS 1.0

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system (similar to a batch file).[1] With the introduction of Digital Research's CP/M (a microcomputer operating system modeled after TOPS-10 for the PDP-10), the type of file commonly associated with the COM extension changed to executable files. This convention was later carried over to DOS. Even after the more general EXE file format for executables was introduced, compact COM files remained viable and were frequently used under DOS.

The .COM file name extension has no relation to the .com (for “commercial”) top-level Internet domain name. However, this similarity in name has been exploited by malware writers.[citation needed]

DOS binary format

The COM format is the original binary executable format used in CP/M (including SCP and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files),[2] and contains no standard metadata, only code and data. This simplicity exacts a price: the binary was designed to have a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KiB) and store all its code and data in one 8086 memory segment.
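The size limit follows directly from the figures above: one 64 KiB segment minus the 256 bytes below the load offset. A minimal Python sketch of that arithmetic follows; the names `MAX_COM_SIZE` and `fits_in_com` are illustrative and not part of any DOS API.

```python
SEGMENT_SIZE = 0x10000   # one 8086 segment: 65,536 bytes
RESERVED = 0x100         # 256 bytes below the load offset are not part of the image
MAX_COM_SIZE = SEGMENT_SIZE - RESERVED  # 0xFF00 = 65,280 bytes


def fits_in_com(image: bytes) -> bool:
    """Return True if a raw image is within the designed COM size limit."""
    return len(image) <= MAX_COM_SIZE
```

This checks only the designed maximum; it does not account for the stack space a running program also needs within the same segment.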

Although COM files are written as if they all start at the same fixed entry point (0100h),[nb 1] MS-DOS chooses which 64K memory segment to load each one into, so there is no problem having multiple COM files in memory at the same time.[3] The bytes are placed starting at offset 0100h (256 decimal) because the preceding bytes are reserved for system use (i.e., the DOS program segment prefix).
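To illustrate why the same offset does not cause a conflict, the sketch below applies the standard 8086 real-mode rule (physical address = segment × 16 + offset, wrapped to 20 bits) to two load segments. The segment values used in the usage note are hypothetical, chosen only for illustration.

```python
LOAD_OFFSET = 0x0100  # every COM image starts at this offset within its segment


def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def entry_point(segment: int) -> int:
    """Physical address of a COM program's first byte when loaded in `segment`."""
    return physical_address(segment, LOAD_OFFSET)
```

For example, `entry_point(0x1000)` and `entry_point(0x2000)` give 0x10100 and 0x20100: two programs that both "start at 0100h" occupy different physical memory.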

Before MS-DOS and the 8086 processor, in the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh, were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed.[nb 1] MS-DOS's COM files were designed to mimic that configuration to ease porting programs from CP/M.[4]

Although the file format is the same in DOS and CP/M, COM files for the two operating systems are not compatible; DOS COM files contain x86 instructions and possibly DOS system calls, while CP/M COM files contain 8080 instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for 8085 or Z80).

MS-DOS was designed to run on the Intel 8086 processor, which could address 1 MiB of memory through “segments” of 65,536 bytes each (the equivalent of sixteen non-overlapping segments). Since COM files were never intended to span multiple segments, Microsoft created a successor file format that allowed for larger programs: .EXE.[5]


When DOS starts a COM file, all x86 segment registers are set to the same value. The SP (stack pointer) register is set to the offset of the last word available in the first 64 KiB segment (typically FFFEh) or to the maximum size of memory available in the block the program is loaded into (which must hold the program plus at least 256 bytes of stack), whichever is smaller. The stack thus begins at the very top of the corresponding memory segment and works down from there.[6][7]

In the original DOS 1.x API, which was a derivative of the CP/M API, a COM file terminated by calling INT 20h (Terminate Program) or INT 21h Function 0, which served the same purpose. The programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although these calls could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segments to be set to the same value.
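As a rough illustration of how little a COM file needs, the Python sketch below hand-assembles a tiny image that prints a message and then terminates through INT 21h Function 4Ch. The use of INT 21h Function 09h (print a '$'-terminated string at DS:DX) is an addition not discussed above, and the function name `build_hello_com` is hypothetical.

```python
import struct

LOAD_OFFSET = 0x0100  # COM images are placed at this offset in their segment


def build_hello_com(message: bytes = b"Hello from a COM file\r\n") -> bytes:
    """Assemble a tiny COM image by hand and return its raw bytes."""
    if b"$" in message:
        raise ValueError("'$' terminates the string for INT 21h function 09h")
    code_length = 12  # total size of the five instructions below
    text_offset = LOAD_OFFSET + code_length  # where the message lands in memory
    code = (
        b"\xB4\x09"                                  # mov ah, 09h   ; print '$'-terminated string
        + b"\xBA" + struct.pack("<H", text_offset)   # mov dx, text_offset
        + b"\xCD\x21"                                # int 21h
        + b"\xB8\x00\x4C"                            # mov ax, 4C00h ; function 4Ch, return code 0
        + b"\xCD\x21"                                # int 21h
    )
    assert len(code) == code_length
    return code + message + b"$"
```

Writing the result to a file, for instance with `open("HELLO.COM", "wb").write(build_hello_com())`, should produce an image that a DOS emulator can run. The message address is computed relative to 0100h because that is where the loader places the first byte of the file.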

Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header;[2] since C9h corresponds to the 8080 instruction RET, the COM file will terminate immediately if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, but it has been the opcode for LEAVE since the 80188/80186. Although possible, LEAVE is unlikely to be used as the first instruction in a valid program. Thus the executable loader in some versions of DOS rejects COM files that start with C9h,[citation needed] avoiding a crash.

Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being 4D 5A (MZ in ASCII), the initials of Mark Zbikowski.
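The leading-byte checks described above can be sketched as a small classifier. This is a heuristic illustration only, not the logic of any actual DOS or CP/M loader; the function name and the returned labels are hypothetical.

```python
def classify_com_file(data: bytes) -> str:
    """Guess what a file named *.COM actually contains from its first bytes."""
    if data[:2] == b"MZ":      # 4D 5A: DOS executable format despite the name
        return "mz-executable"
    if data[:1] == b"\xC9":    # C9h: 8080 RET, marks a CP/M 3 header
        return "cpm3-header"
    return "plain-com"         # no header: raw code and data


def classify_path(path: str) -> str:
    """Read only the first two bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return classify_com_file(handle.read(2))
```

A DOS COM program could in principle begin with C9h (LEAVE), so the second branch can misreport such a file; the sketch accepts that ambiguity.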

Large programs

Under DOS there is no memory management provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. A COM file can therefore either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or EXE programs.

In the COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the COM program.

The advantage of using the COM rather than the EXE format is that the binary image is usually smaller and easier to program using an assembler.[8] Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the COM format for complex programs.

Platform support

The format is still executable on many modern Windows NT-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can also be executed on DOS emulators such as DOSBox, on any platform supported by these emulators.

Use for compatibility reasons

Windows NT-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days although they are in fact presently implemented as .exe files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect .com extension. (In fact any .exe file can be renamed .com and still execute correctly.) The use of the original .com extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are CHCP, DISKCOMP, DISKCOPY, FORMAT, MODE, MORE and TREE.[9]

Execution preference

In DOS, if a directory contains both a COM file and an EXE file with the same name, the COM file is preferentially selected for execution when no extension is specified. For example, if a directory in the system path contains two files named foo.com and foo.exe, the following would execute foo.com:

C:\>foo

A user wishing to run foo.exe can explicitly use the complete filename:

C:\>foo.exe

Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like notepad.com for their creations, hoping that, if such a file is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe. Again, these .com files may in fact contain a .exe format executable.

On Windows NT and derivatives (Windows 2000, Windows XP, Windows Vista, and Windows 7), the PATHEXT environment variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software’s line of extended command line processors 4DOS, 4OS2, and 4NT.
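The lookup order can be modelled with a short Python sketch. It is a simplified illustration limited to a single directory and is not how the Windows command processor is actually implemented; `resolve_command` and its default extension list are assumptions made for the example.

```python
import os
from typing import List, Optional


def resolve_command(name: str, directory_files: List[str],
                    pathext: str = ".COM;.EXE;.BAT") -> Optional[str]:
    """Pick the file a bare command name would run, trying extensions in order."""
    by_upper = {f.upper(): f for f in directory_files}  # case-insensitive lookup
    if os.path.splitext(name)[1]:
        # An explicit extension bypasses the preference order entirely.
        return by_upper.get(name.upper())
    for ext in pathext.split(";"):
        if not ext:
            continue  # ignore empty entries such as a trailing ';'
        match = by_upper.get((name + ext).upper())
        if match is not None:
            return match
    return None
```

With `["foo.exe", "foo.com"]`, calling `resolve_command("foo", ...)` selects foo.com, while `resolve_command("foo.exe", ...)` selects foo.exe; passing `pathext=".EXE;.COM"` reverses the preference.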

Malicious usage of the .com extension

Some computer virus writers have hoped to take advantage of modern computer users’ likely lack of knowledge of the .com file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mails have been sent with attachment names similar to “www.example.com”. Unwary Microsoft Windows users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/, but instead would run the attached binary command file named www.example, giving it full permission to do to their machine whatever its author had in mind.[citation needed]
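The name collision is easy to see by splitting such an attachment name the way a file system does. The sketch below is only an illustration; `describe_attachment` is a hypothetical helper, not part of any mail client or scanner.

```python
import os
from typing import Tuple


def describe_attachment(filename: str) -> Tuple[str, bool]:
    """Return the base name and whether the extension marks a COM command file."""
    base, extension = os.path.splitext(filename)
    return base, extension.lower() == ".com"
```

For the attachment name "www.example.com", this yields the base name "www.example" with the command-file flag set, matching the description above.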

There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.

Notes

  1. ^ a b In most versions of CP/M, the start of the TPA was at offset +100h, only preceded in memory by the zero page at offset +0h. Some versions differed for hardware reasons including CP/M for the Heath H89, where it started at offset +4300h (for compatibility, a Magnolia Microsystems hardware modification existed to map out the ROMs at +100h after startup), or CP/M for the TRS-80 Model I and TRS-80 Model III, where programs were loaded at offset +0h.

References

  1. ^ Christian, Brian; Markson, Tom; Skrenta, Rich (eds.). “Section 5.3”. The PDP-11 How-To Book (Revision 1 ed.). Archived from the original on 2018-08-01. Retrieved 2018-08-01. (NB. Has a reference for the RT-11 operating system running on the PDP-11 minicomputer, which shows in section 5.3 that .COM is used to refer to a command file.)
  2. ^ a b Elliott, John C.; Lopushinsky, Jim (2002) [1998-04-11]. “CP/M 3 COM file header”. Seasip.info. Archived from the original on 2018-08-01.
  3. ^ Rollins, Dan (1985). “Program Startup & Exit”. Tech Help Reference. Flambeaux Software. Retrieved 2026-06-01.
  4. ^ Necasek, Michal (2011-09-13). “Who needs the address wraparound, anyway?”. OS/2 Museum. Retrieved 2026-06-01. This form is provided to simplify translation of 8080/Z80 programs into 8086 code, and is not recommended for new programs.
  5. ^ Summers, Jason. “MS-DOS EXE”. Just Solve the File Format Problem. Retrieved 2026-06-01. MS-DOS EXE (or DOS EXE), also known as MZ format, is an executable file format used mainly by MS-DOS. It is the successor of COM.
  6. ^ Paul, Matthias R. (2002-10-07) [2000]. “Re: Run a COM file”. Newsgroup: alt.msdos.programmer. Retrieved 2017-09-03. (NB. Has details on the DOS COM program calling conventions.)
  7. ^ Lunt, Benjamin “Ben” D. (2020). “DOS .COM startup registers”. Forever Young Software. Archived from the original on 2020-11-12. Retrieved 2021-12-14.
  8. ^ Scanlon, Leo J. (1991). “Chapter 2”. Assembly Language Subroutines for MS-DOS (2 ed.). Windcrest Books. p. 16. ISBN 0-8306-7649-X.
  9. ^ “Windows Commands”. Microsoft. 2023-04-26.