How to differentiate 16 bit MZ and 32 bit MZ - c

I need to differentiate a 32 bit PE from a 16 bit DOS MZ.
What is the correct way to do it?
I can use heuristics like looking for the PE header, but I feel that's not necessarily deterministic.

All DOS style executables have an 'MZ' as the first two bytes.
To identify a plain MS-DOS executable vs. the multitude of other variants, the best bet seems to be to read the position of the relocation table at offset 0x0018 in the file. If this value is 0x0040 or greater, it is not just plain DOS: the extended formats reserve the first 0x40 bytes of the file for the enlarged header, so any relocation table must start at or beyond that point.
To specifically identify the executable as a 'PE' executable, there is a pointer at offset 0x003C in the file. This is an offset within the file that will hold the bytes 'PE' followed by two NUL bytes. Other MS-DOS 'MZ' variants use the same location for other signatures, e.g. 'NE', 'W3', 'LE', etc.
'PE' style executables also come in many forms, I expect you'll be interested in 32bit vs. 64bit at the very least.
Probably the ultimate authority on this sort of thing is the Unix 'file' command; it's designed to reliably identify ANY file type by investigating its contents. The MSDOS part is listed here. Microsoft is NOT a reliable authority on this because it ignores non-Microsoft information.

A plain DOS EXE header is only 28 (0x1C) bytes long and is usually followed by the DOS relocation table, if present. The IMAGE_DOS_HEADER struct of the NT PE header is much larger at 64 (0x40) bytes, as it was extended for the various other Windows executable formats. This header size difference is why the answer from user3710044 is not only the fastest but also reliable: an EXE is plain DOS if the relocation table offset (e_lfarlc) is less than 0x40.
As long as you realize that the e_lfanew member (an offset to a number of possible "extended" headers) does not exist in a plain DOS executable, you can also use the following logic to distinguish between the various MZ-style formats:
If the file does not begin with "MZ" or "ZM", it is not a DOS or Windows executable image. Otherwise you may have one of the following executable formats: plain DOS, NE (Windows 16-bit), LE (16-bit VXD), PE32, or PE32+ (PE64).
Determine whether you have a plain DOS executable by looking at the e_lfanew value. A plain DOS executable will have an e_lfanew that is zero, that points outside the limits of the file, or whose in-range offset holds bytes that don't match any of the signatures below.
Try to match the signature of the "in-range" offset pointed to by e_lfanew with the following WORD or DWORD values:
"PE" followed by two zero bytes if the image is a PE32 or PE32+ (PE64) and is further determined by the "magic" in the NT Optional Header
"NE" indicates the image is a 16-bit Windows executable
"LE" indicates the image is a 16-bit Virtual Device Driver (VXD)
More obscure signatures (referenced from Ralf Brown's Interrupt List, INT 21/AH=4Bh):
LX variant of LE used in OS/2 2.x
W3 Windows WIN386.EXE file; a collection of LE files
W4 Windows95 VMM32.VXD file
DL HP 100LX/200LX system manager compliant executable (.EXM)
MP old PharLap .EXP
P2 PharLap 286 .EXP
P3 PharLap 386 .EXP
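The checks above can be sketched in C. This is a minimal sketch that operates on an in-memory copy of the start of the file; the field offsets come from the standard IMAGE_DOS_HEADER layout, and the function name is my own:

```c
#include <stdint.h>
#include <string.h>

/* Classify an MZ-style image held in memory (sketch).
   buf must hold at least the first `len` bytes of the file. */
const char *classify_exe(const uint8_t *buf, size_t len)
{
    if (len < 0x40 || (memcmp(buf, "MZ", 2) != 0 && memcmp(buf, "ZM", 2) != 0))
        return "not a DOS/Windows executable";

    /* e_lfarlc: relocation table offset, little-endian WORD at 0x18 */
    uint16_t e_lfarlc = buf[0x18] | (buf[0x19] << 8);
    /* e_lfanew: extended-header offset, little-endian DWORD at 0x3C */
    uint32_t e_lfanew = buf[0x3C] | (buf[0x3D] << 8)
                      | ((uint32_t)buf[0x3E] << 16) | ((uint32_t)buf[0x3F] << 24);

    if (e_lfarlc < 0x40)
        return "plain DOS";                    /* header is only 0x1C bytes */
    if (e_lfanew == 0 || e_lfanew > len - 4)
        return "plain DOS (e_lfanew out of range)";

    const uint8_t *sig = buf + e_lfanew;
    if (memcmp(sig, "PE\0\0", 4) == 0) return "PE32 or PE32+";
    if (memcmp(sig, "NE", 2) == 0)     return "NE (16-bit Windows)";
    if (memcmp(sig, "LE", 2) == 0)     return "LE (VXD)";
    return "unknown MZ variant";
}
```

For a real file you'd read at least the first e_lfanew + 4 bytes from disk; PE32 vs. PE32+ is then decided by the "magic" WORD in the NT optional header (0x10B vs. 0x20B).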

Related

Create a C program of size 100 bytes with Visual Studio

I want to write a C application program which upon building will create an executable of size 100 bytes or less.
Even if I create a simple C program with just an empty main(), my output file is 11 KB with Visual Studio 2015. Is there a way to tell VS not to include any default libs, which would reduce my executable's size? Or is there any other way to reduce the executable file size?
A sensible Win32 executable cannot be less than some hundred bytes in size: What is the smallest possible Windows (PE) executable?
You can however write a plain old COM executable, which can only be run on x86-Windows. You would need appropriate toolchains: Looking for 16-bit c compiler for x86
You can create an executable whose text section (i.e. the section that holds the executable code) is under 100 bytes for a hello-world console output, but the .EXE file itself will be larger because it needs a valid PE structure, and that format requires quite a bit of padding. Linking with /NODEFAULTLIB will of course give you errors; you'll then have to reimplement whatever's missing, usually as no-ops, and you'll need to use link-time code generation so that all the calls to the empty functions get removed (as well as the functions themselves). You'll also need to find compiler flags that disable all the "cool" features. E.g. to make the compiler stop emitting buffer security checks (__security_check_cookie et al.), provide the /GS- option.
You'll probably need to use a custom tool to strip the .EXE file of the unnecessary PE cruft that the VS linker emits. And your executable will still be runtime-linked with at least KERNEL32.DLL, since without that you won't be able to do anything useful. If you're brave, you could use NTDLL.DLL (i.e. the native API) directly, but that will probably need more than 100 bytes of code too.
Your executable will also need to target the 32-bit architecture; the 64-bit one will be about 25% larger (at such small section sizes to start with).
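As a hypothetical starting point, here is what a CRT-free program looks like: with no default libraries, you supply the entry point yourself. The entry-point name and the MSVC flags in the comment are assumptions, not verified against any specific VS version:

```c
/* Sketch of a CRT-free program. With /NODEFAULTLIB there is no CRT
   startup code, so the linker's /ENTRY symbol is called directly and
   its return value becomes the process exit code. */
int entry(void)
{
    return 0;   /* nothing to do; no CRT setup or teardown ever runs */
}

/* Assumed MSVC build, roughly:
     cl /c /O1 /GS- tiny.c
     link tiny.obj /NODEFAULTLIB /ENTRY:entry /SUBSYSTEM:CONSOLE
*/
```

Note that without the CRT, `entry` receives no argc/argv; command-line access would have to go through GetCommandLine in KERNEL32.DLL.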
It's a nice challenge.

ARM linker ELF. Need a way to derive information

I'm using the ARM ADS tool chain to build ELF and bin files. The map file shows a function at 0x20253025. In the ELF file I see a branch instruction to the same function encoded as BB,6D,EB,FF (4 bytes). EB is interpreted as the branch opcode, so that part is fine, but the 24-bit address 6DEBFF does not correlate with #0x20253025. Any idea where to look or how I can get the pattern?
Branch instructions do not usually have the absolute target address: they are encoded with relative offsets. See the branch instruction encoding.
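As a sketch, the target of an ARM-state B/BL can be recovered from the instruction word like this. The low 24 bits are a signed word offset relative to the PC, which in ARM state reads as the instruction's address plus 8; the function name is mine:

```c
#include <stdint.h>

/* Compute the target of an ARM B/BL instruction (sketch).
   The low 24 bits are a signed offset in words from PC = addr + 8. */
uint32_t arm_branch_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t imm24 = instr & 0x00FFFFFFu;
    /* Sign-extend the 24-bit field and multiply by 4 in one step:
       shift left so bit 23 lands in the sign bit, then
       arithmetic-shift right by 6. */
    int32_t offset = (int32_t)(imm24 << 8) >> 6;
    return instr_addr + 8 + (uint32_t)offset;
}
```

For example, 0xEBFFFFFE at address A is a BL back to A itself: the encoded offset of -8 exactly cancels the +8 pipeline bias.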

File management in C?

I'm practicing file management in C. I saw that there are plenty of ways to open a file with fopen, using mode strings such as "a", "r", etc. That's all clear, but I also read that if I add a "b" to the mode string, the file is opened as a binary file. What does that mean? What are the differences from a normal file?
Opening a file in text mode causes the C library to do some text-specific handling. For example, line endings differ between Windows and Unix/Linux, but you can simply write '\n' because C handles that difference for you.
Opening a file in binary mode doesn't do any of this special handling, it just treats it as raw bytes. There's a bit of a longer explanation of this on the C FAQ
Note that this only matters on Windows; Unix/Linux systems don't (need to) differentiate between text and binary modes, though you can include the 'b' flag without them complaining.
If you open a regular file in the binary mode, you'll get all its data as-is and whatever you write into it, will appear in it.
OTOH, if you open a regular file in the text mode, things like ends of lines can get special treatment. For example, the sequence of bytes with values of 13 (CR or '\r') and 10 (LF or '\n') can get truncated to just one byte, 10, when reading or 10 can get expanded into 13 followed by 10 when writing. This treatment is platform-specific (read, compiler/OS-specific).
For text files, this is often unimportant. But if you apply the text mode to a non-text file, you risk data corruptions.
Also, reading and writing bytes at arbitrary offsets in files opened in the text mode isn't supported because of that special treatment.
The difference is explained here
A binary file is a raw sequence of bytes; the program that writes it decides what those bytes mean. (Machine code is one example: a compiled program is bytes the microprocessor can execute directly.) This is much more compact, but not readable by humans.
Text files, by contrast, are byte sequences designated to be displayed as more people-friendly characters, which lend themselves to language much better. ASCII is an example of one such designation. This reveals the truth of the matter: all files are binary at the lowest level.
But binary lends itself to any application which does not have to be textually legible to us lowly humans =] Example applications where binary is preferred are sound files, images, and compiled programs. The reason binary is preferred to text is that it is more efficient to store such data in its native representation than textually (which has to be translated back anyway).
There are two types of files: text files and binary files.
Binary files have two features that distinguish them from text files: You can jump instantly to any record in the file, which provides random access as in an array; and you can change the contents of a record anywhere in the file at any time. Binary files also usually have faster read and write times than text files, because a binary image of the record is stored directly from memory to disk (or vice versa). In a text file, everything has to be converted back and forth to text, and this takes time.
more info here
b is for working with binary files. However, it has no effect on POSIX-compliant operating systems.
from the manpage of fopen:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)

C: reading files which are > 4 GB

I have some kind of reader which only has a handle (FILE*) to a file.
Another process, which I don't control, keeps writing to the same file.
Now, as the other process appends images to that file, it is likely that the file size will soon cross the 4 GB limit.
The reader process reads the file using the handle, plus the offset and length of each image, which it gets from a DB.
My question is: how would the reader be able to read a chunk that lies beyond the 4 GB mark?
I'm working on Win32 machine.
EDIT:
I'm working on FreeBSD machine as well.
Just use the standard C API on Windows, fread, fwrite work just fine on large files. You will need _fseeki64 to seek to a 64-bit position.
You can alternatively use the plain WinAPI (ReadFile, etc.) which can also deal with >4 GiB files without problems.
[Edit]: The only thing you really need is a 64-bit seek, which ReadFile provides via the OVERLAPPED structure (as some commenters mentioned.) You can of course also get by using SetFilePointer which is the equivalent of _fseeki64. Reading/Writing is never a problem, no matter the file size, only seeking.
On FreeBSD the stdio API is not limited to 32 bits (4 GB).
You should have no problems reading past 4 GB as long as you use a 64-bit integer to manipulate the offsets and lengths.
If you're seeking in a FILE* , you'll have to use fseeko() and not fseek() if you're on a 32 bit host. fseek() takes a long which is 32 bit on 32 bit machines. fseeko() takes an off_t type which is 64 bits on all FreeBSD architectures.

Why does an EXE file that does *nothing* contain so many dummy zero bytes?

I've compiled a C file that does absolutely nothing (just a main that returns... not even a "Hello, world" gets printed), and I've compiled it with various compilers (MinGW GCC, Visual C++, Windows DDK, etc.). All of them link with the C runtime, which is standard.
But what I don't get is: When I open up the file in a hex editor (or a disassembler), why do I see that almost half of the 16 KB is just huge sections of either 0x00 bytes or 0xCC bytes? It seems rather ridiculous to me... is there any way to prevent these from occurring? And why are they there in the first place?
Thank you!
Executables in general contain a code section and at least one data section, and each section is padded out to the format's alignment boundaries (PE files typically use a 512-byte file alignment); unused space is filled with 0x00, and MSVC pads between functions with 0xCC (the int 3 breakpoint opcode), which is why you see runs of both values. Note also that an EXE written in a higher-level (than assembly) language contains some extra stuff on top of the direct translation of your own code and data:
startup and termination code (in C and its successors, this handles the input arguments, calls main(), then cleans up after exiting from main())
stub code and data (e.g. Windows executables contain a small DOS program stub whose only purpose is to display the message "This program cannot be run in DOS mode.").
Still, since executables are usually supposed to do something (i.e. their code and data sections do contain useful stuff), and storage is cheap, by default no one optimizes for your case :-)
However, I believe most of the compilers have command line parameters with which you can force them to optimize for space - you may want to check the results with that setting.
Here are more details on the EXE file formats.
As it turns out, I should've been able to guess this beforehand... the answer was the debug symbols and code; those were taking up most of the space. Not compiling with /DEBUG and /PDB (which I always do by default) reduced the 13 KB down to 3 KB.
