Two binaries were compiled, one on a 32-bit Windows 7 machine and the other on a 64-bit Windows 10 machine. All the source files and dependencies are the same; however, after compilation the checksums were compared and they are different. Could someone explain why?
If you're comparing a 32-bit build and a 64-bit build, keep in mind that the compiled code will be almost completely different. 64-bit x86_64 and 32-bit x86 machine code differ considerably, and even if they didn't, pointers are twice the size in 64-bit code, so a lot of the code will be laid out differently and addresses will be encoded as wider values.
The technical reason is that the machine code is not the same between the two architectures. Intel x86 and x86_64 differ considerably in how opcodes are encoded and in how programs are structured internally. If you want to read up on the differences, there is plenty of published material, such as Intel's own reference manuals.
As others have pointed out, building twice on the same machine may not even produce a byte-for-byte matching binary. There will be slight differences, especially if there's a code-signing step (embedded timestamps are another common source).
In short, you can't do this and expect them to match.
All the source files and dependencies are the same; however, after compilation the checksums were compared and they are different.
The checksum is calculated from the binary code that the compiler produces, not from the source code. If anything in that binary is different, you get a different checksum. Changes in the source code change the binary, hence a different checksum, but using a different compiler will also produce a different binary. Using the same compiler with different options will produce a different binary. Compiling the same program with the same compiler and the same compiler options on a different machine might produce a different binary. Compiling the same program for different processor architectures will definitely produce a different binary.
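To make the point concrete, here is a minimal sketch that computes a toy checksum (FNV-1a, chosen purely for illustration; real toolchains typically use MD5, SHA or similar) over the raw bytes of a file. Any single byte that differs between the two binaries changes the result:

    /* Toy checksum over a file's raw bytes (illustrative only). */
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <binary>\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }
        uint64_t hash = 14695981039346656037ULL;   /* FNV-1a offset basis */
        int c;
        while ((c = fgetc(f)) != EOF) {
            hash ^= (uint64_t)(unsigned char)c;    /* mix in every byte */
            hash *= 1099511628211ULL;              /* FNV-1a prime */
        }
        fclose(f);
        printf("%016llx\n", (unsigned long long)hash);
        return 0;
    }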
Related
I have read several times that the length of a data type in C depends on the architecture of the PC.
But recently I read the documentation of a compiler, and that documentation specifies the data types you can use and their lengths.
So my question is: does the data type length depend on the compiler you're using, on the architecture of the computer, or on both?
The final determination of the size of each type is up to the C implementation, subject to the constraints of the C standard. (The standard specifies things such as that char is one “byte,” and a byte is eight bits or more, and that certain types are the same size as each other.)
A C implementation typically consists of a compiler, header files, libraries, a linker, and a target system to run on, and possibly other components. The compiler has a large influence on the characteristics of types, especially the core types of C, but sometimes header files complete the definitions of types, so that one compiler may be used with different sets of header files that provide different types.
Most C implementations are influenced by the characteristics of the execution architecture they are designed for. An int will often be the same size as a general processor register, and a pointer (of any type) will often be the smallest size that is a power-of-two number of bytes and has enough bits to span the address space of the target architecture.
However, C implementations may be designed for other purposes. I have seen a C implementation that used a 32-bit pointer size for a 64-bit architecture, for the purpose of reducing memory usage in programs that had a great many pointers. A C implementation could also be designed to run older C code on a new processor, so the C implementation used the sizes of types that the older code was written for rather than sizes that would be more “natural” on the new processor.
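As a quick illustrative check of what a given implementation chose, a small program like the one below prints the sizes it decided on; the output is implementation-defined and will typically differ between a 32-bit and a 64-bit build:

    #include <stdio.h>

    int main(void)
    {
        /* Every value here is up to the implementation; the standard only
           guarantees minimum sizes and the relative ordering of the types. */
        printf("char:      %zu\n", sizeof(char));   /* always 1 by definition */
        printf("short:     %zu\n", sizeof(short));
        printf("int:       %zu\n", sizeof(int));
        printf("long:      %zu\n", sizeof(long));
        printf("long long: %zu\n", sizeof(long long));
        printf("void *:    %zu\n", sizeof(void *));
        return 0;
    }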
It can depend on either. Of course the executable produced by the compiler will need to run efficiently on the CPU architecture it is compiled for, so if (for example) you are cross-compiling software to run on a 16-bit microcontroller, then it is very likely that sizeof(int)==2 in the compiled code, even if you are using a modern 64-bit desktop PC to do the compilation.
At the same time, it's also up to the compiler writers (within limits) to decide what sizes various non-width-specific data types should have in their compiler. So, for example, some compilers set sizeof(int)==4 while others set sizeof(int)==8, for reasons particular to their own history and their users' needs.
I have a C program last compiled in 1990, that reads and writes some binary files. The executable still works, reading and writing them perfectly. I need to recompile the source, add some features, and then use the code, reading in some of the old data, and outputting it with additional information.
When I recompile the code, with no changes, and execute it, it fails reading the old files, giving segmentation faults when I try to process the data read into an area of memory. I believe the problem may be that the binary files written earlier used integers made of four 8-bit bytes, 8-byte longs, and 4-byte floats. The architecture on my machine now uses 64-bit words instead of 32-bit ones. Thus, when I extract an integer from the data read in, it is aligned incorrectly and sets an array index that is out of range for the program space.
On Mac OS X 10.12.6, using its C compiler, which might be:
Apple LLVM version 8.0.0 (clang-800.0.33.1)
Target: x86_64-apple-darwin16.7.0
Is there a compiler switch that would set the compiled lengths of integers and floats to the above values? If not, how do I approach getting the code to correctly read the data?
Welcome to the world of portability headaches!
If your program was compiled in 1990, there is a good chance it used 4-byte longs, and it is even possible that it used 2-byte ints, depending on the architecture it was compiled for.
The size of the basic C types is heavily system dependent, among a number of more subtle portability issues. long is now 64-bit on both 64-bit Linux and 64-bit OS X, but still 32-bit on Windows (for both the 32-bit and 64-bit versions!).
When reading binary files you must also deal with endianness, which changed from big-endian on the Mac of 1990 to little-endian on today's OS X, but is still big-endian on other systems.
To make matters worse, the C language has evolved over this long period, and some non-trivial semantic changes occurred between pre-ANSI C and Standard C. Some old syntax is no longer supported either...
There is no magic flag to address these issues; you will need to dive into the C code, understand what it does, and try to modernize it and make it more portable, independent of the target architecture. You can use the fixed-width types from <stdint.h> to ease this process (int32_t, ...), as in the sketch below.
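As a rough sketch of the direction this modernization takes (the helper name and the assumption that the old format stores big-endian 4-byte integers are mine, purely for illustration), reading each field with an explicit width and byte order removes any dependence on the host's int size:

    #include <stdint.h>
    #include <stdio.h>

    /* Read one 4-byte big-endian integer from the old file format,
       independent of the host's own int size or byte order. */
    static int read_be_int32(FILE *f, int32_t *out)
    {
        unsigned char b[4];
        if (fread(b, 1, 4, f) != 4)
            return -1;                      /* short read or EOF */
        *out = (int32_t)(((uint32_t)b[0] << 24) |
                         ((uint32_t)b[1] << 16) |
                         ((uint32_t)b[2] <<  8) |
                          (uint32_t)b[3]);
        return 0;
    }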
People answering C questions on Stack Overflow are usually careful to post portable code that works correctly for all target architectures, even some purposely vicious ones such as the DS9K (a fictitious computer that does everything in a correct but unexpected way).
Beforehand: this is just a nasty idea I had last night :-)
Think about the following scenario:
You have an arm-elf executable and for some reason you want to run it on your amd64 box without emulation.
To simplify the scenario, let's say we just want to deal with simple console applications which are just linked against libc and there are no additional architecture specific requirements.
If you want to transform binaries between different architectures you have to consider the following points:
Endianness of the architectures
bit-width of registers
functionality of different registers
Endianness should be one of the lesser problems.
If the bit-width of the destination registers is smaller than that of the source architecture, one could insert additional instructions to reproduce the same behaviour. The same applies to the functionality of the registers.
Finally (and before you bash this idea), have a look at the following simple code snippet and the corresponding disassembly of the objects.
C Code
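(The original snippet is not reproduced here; a hypothetical stand-in of the same general shape, a small function plus a libc call, would be something like:)

    #include <stdio.h>

    /* Hypothetical stand-in for the original snippet. */
    static int add(int a, int b)
    {
        return a + b;
    }

    int main(void)
    {
        printf("%d\n", add(2, 3));   /* the libc call to be mapped/wrapped */
        return 0;
    }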
Corresponding ARM Disassembly
Corresponding AMD64 Disassembly
In my opinion it should be possible to convert those objects between different architectures. Even function calls (like printf) could be mapped or wrapped to the destination architecture's libc.
And now my questions:
Has anyone already thought about realising this?
Is it actually possible?
Are there already some projects dealing with this issue?
Thanks in advance!
There is a library which I build against different 32-bit platforms. Now, 64-bit architectures must be supported as well. What are the most general strategies for extending existing 32-bit code to support 64-bit architectures? Should I use #ifdefs or something else?
The amount of effort involved will depend entirely on how well written the original code is. In the best possible case there will be no effort involved other than re-compiling. In the worst case you will have to spend a lot of time making your code "64 bit clean".
Typical problems are:
assumptions about sizes of int/long/pointer/etc
assigning pointers <=> ints (see the sketch after this list)
relying on default argument or function result conversions (i.e. no function prototypes)
inappropriate printf/scanf format specifiers
assumptions about size/alignment/padding of structs (particularly in regard to file or network I/O, or interfacing with other APIs, etc)
inappropriate casts when doing pointer arithmetic with byte offsets
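To make the pointer/int item above concrete, here is a minimal sketch (variable names are arbitrary) of code that breaks when pointers grow to 64 bits, together with the usual fix using intptr_t from <stdint.h>:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int value = 42;
        int *p = &value;

        /* Broken on 64-bit builds: an int cannot hold a 64-bit pointer,
           so the round trip may silently truncate the address. */
        /* int stored = (int)p; */

        /* 64-bit clean: intptr_t is wide enough to round-trip a pointer. */
        intptr_t stored = (intptr_t)p;
        int *q = (int *)stored;

        /* Also note the format specifiers: %p for pointers, %zu for sizeof. */
        printf("p=%p q=%p sizeof(p)=%zu\n", (void *)p, (void *)q, sizeof p);
        return 0;
    }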
Simply don't rely on assumptions about the machine word size; always use sizeof, stdint.h, etc. Unless you rely on different library calls for different architectures, there should be no need for #ifdefs.
The easiest strategy is to build what you have with 64-bit settings and test the heck out of it. Some code doesn't need to change at all. Other code, usually with wrong assumptions about the sizes of ints/pointers, will be much more brittle and will need to be modified to be independent of the architecture.
Very often, binary files containing binary records cause the most problems. This is especially true in environments where ints grow from 32-bit to 64-bit in the transition to a 64-bit build. Primarily this is because integers get written to files in their native (32-bit) length and are then read back using the wrong length in a 64-bit build where ints are 64-bit.
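As an illustration (this record layout is invented), a struct holding native long fields occupies a different number of bytes in a typical 32-bit build than in a 64-bit LP64 build, so old files no longer line up; fixed-width types keep the on-disk layout stable:

    #include <stdint.h>
    #include <stdio.h>

    /* Size changes between builds where long is 4 bytes and builds
       where long is 8 bytes (plus any padding differences). */
    struct record_native {
        long id;
        long timestamp;
    };

    /* Same logical record with an explicit, build-independent layout. */
    struct record_fixed {
        int32_t id;
        int32_t timestamp;
    };

    int main(void)
    {
        printf("native: %zu bytes, fixed: %zu bytes\n",
               sizeof(struct record_native), sizeof(struct record_fixed));
        return 0;
    }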
Is size of a datatype hardware architecture dependent or compiler dependent?
I want to know what factors actually determine the size of a datatype.
The compiler (more properly the "implementation") is free to choose the sizes, subject to the limits in the C standard (for instance int must be at least 16 bits). The compiler optionally can subject itself to other standards, like POSIX, which can add more constraints. For example I think POSIX says all data pointers are the same size, whereas the C standard is perfectly happy for sizeof(int*) != sizeof(char*).
In practice, the compiler-writer's decisions are strongly influenced by the architecture, because unless there's a strong reason otherwise they want the implementation to be efficient and interoperable. Processor manufacturers or OS vendors often publish a thing called a "C ABI", which tells you (among other things), how big the types are and how they're stored in memory. Compilers are never obliged to follow the standard ABI for their architecture, and CPUs often have more than one common ABI anyway, but to call directly from code out of one compiler to code out of another, both compilers have to be using the same ABI. So if your C compiler doesn't use the Windows ABI on Windows, then you'd need extra wrappers to call into Windows dlls. If your compiler supports multiple platforms, then it quite likely uses different ABIs on different platforms.
You often see abbreviations used to indicate which of several ABIs is in use. For instance, when a compiler on a 64-bit platform says it's LP64, that means long and pointers are 64-bit, and by omission int is 32-bit. If it says ILP64, that means int is 64-bit too.
In the end, it's more a case of the compiler-writer choosing from a menu of sensible options, than picking numbers out of the air arbitrarily. But the implementation is always free to do whatever it likes. If you want to write a compiler for x86 which emulates a machine with 9-bit bytes and 3-byte words, then the C standard allows it. But as far as the OS is concerned you're on your own.
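If your code does bake in assumptions about a particular ABI, one option (C11 or later) is to state them explicitly so the build fails loudly on a target where they don't hold; the LP64 expectations below are only an example:

    #include <assert.h>    /* static_assert (C11) */
    #include <limits.h>

    /* Example LP64 assumptions: 4-byte int, 8-byte long and pointers.
       On an ILP32 or LLP64 target these declarations stop the build. */
    static_assert(sizeof(int) == 4, "expected 32-bit int");
    static_assert(sizeof(long) == 8, "expected 64-bit long");
    static_assert(sizeof(void *) == 8, "expected 64-bit pointers");
    static_assert(CHAR_BIT == 8, "expected 8-bit bytes");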
The size is ultimately determined by the compiler. e.g. Java has a fixed set of sizes (8,16,32,64) while the set of sizes offered by C for its various types depends in part on the hardware it runs on; i.e. the compiler makes the choice but (except in cases like Java where datatypes are explicitly independent of underlying hardware) is strongly influenced by what the hardware offers.
The size of the different data types depends on the compiler and its configuration (different compilers, or different switches to the same compiler, on the same machine can give different sizes).
Usually the compiler is matched to the hardware it is installed on, so you could say that the sizes of the types are also hardware dependent. Making a compiler that emits 16-bit pointers on a machine where they are 48 bits would be counterproductive.
But it is possible to use a compiler on a computer to create a program meant to run on a different computer with different sizes.
It depends on the target hardware architecture, the operating system, and possibly the compiler.
The Intel compiler sizes a long integer as follows:
OS          arch       size
Windows     IA-32      4 bytes
Windows     Intel 64   4 bytes
Windows     IA-64      4 bytes
Linux       IA-32      4 bytes
Linux       Intel 64   8 bytes
Linux       IA-64      8 bytes
Mac OS X    IA-32      4 bytes
Mac OS X    Intel 64   8 bytes
Here is a link showing the sizes for the Microsoft Visual C++ compiler.
The size of the "native" datatypes is up to the compiler. While this in turn is influenced by the hardware, I wouldn't start guessing.
Have a look at <stdint.h> - that header has platform-independent typedefs that should cater for whatever needs you might have.
The answer to your question is yes, and I'll explain.
Consider common storage types, i.e. size_t, int64_t, etc. These are decided (defined, in most cases) at compile time, depending on the architecture you are building for. Don't have them? Fear not, the compiler has figured out the underlying meaning of int and how unsigned affects it.
Once the standard C headers figure out the desired word length, everything just adjusts to your system.
Unless, of course, you happen to be cross-compiling; then they are decided (or defined) by whatever architecture you specified.
In short, your compiler (well, mostly the preprocessor) is going to adjust the types to fit the target architecture, be it the one you are building on or the one you are cross-compiling for.
That is, if I understand your question correctly. This is one of the few 'magic' abstractions that the language provides, and part of the reason why it's often called 'portable assembly'.
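One way to see what the implementation (native or cross) actually decided is to look at the limits its standard headers expose for the target; a quick inspection sketch:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* These macros come from the implementation's own headers,
           so they describe the target being compiled for. */
        printf("CHAR_BIT       = %d\n", CHAR_BIT);
        printf("INT_MAX        = %d\n", INT_MAX);
        printf("LONG_MAX       = %ld\n", LONG_MAX);
        printf("sizeof(size_t) = %zu\n", sizeof(size_t));
        return 0;
    }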
It is exclusively compiler dependent.
Or, to say it more correctly, it depends on the C language standard (http://en.wikipedia.org/wiki/C99).
The standard assigns constraints to the sizes of the built-in types.
But they are not fixed there as "one size for all".
There is only a minimum size (e.g. char is at least 8 bits) that must be preserved by any compiler on any architecture.
But char can also be 16 or even 32 bits, depending on the architecture.
The relative sizes between the different types are also preserved.
That means, for example, that short can be 16 or 32 bits, but it cannot be longer than the wider type int on the same architecture.
It can only be narrower or the same length.
Those are the borders within which compiler developers must work if they want to make a standard-compatible C compiler.