Why are the MD5 registers different between the standard and the program? - md5

The MD5 algorithm has 4 standard registers, A = (01234567)16, B = (89abcdef)16, C = (fedcba98)16, D = (76543210)16, while in a Java program they are A = 0x67452301L, B = 0xEFCDAB89L, C = 0x98BADCFEL, D = 0x10325476L. Why is there a difference?

This is likely due to an interpretation of the values under two different byte orders. I suspect the register values referenced are the byte sequences as laid out in memory, low-order byte first (little-endian, as on x86), whereas the Java constants are the numeric 32-bit word values, so the bytes appear in the opposite order.
http://en.wikipedia.org/wiki/Endianness
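As a quick check, here is a minimal sketch (plain C, separate from the Java program in the question) that reassembles the standard's byte sequence 01 23 45 67 as a little-endian 32-bit word and prints the familiar constant:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* The standard lists register A as the byte sequence 01 23 45 67. */
    unsigned char a_bytes[4] = { 0x01, 0x23, 0x45, 0x67 };
    /* Reassemble it as a little-endian 32-bit word (low byte first). */
    uint32_t a = (uint32_t)a_bytes[0]
               | ((uint32_t)a_bytes[1] << 8)
               | ((uint32_t)a_bytes[2] << 16)
               | ((uint32_t)a_bytes[3] << 24);
    printf("0x%08X\n", (unsigned)a);   /* prints 0x67452301 */
    return 0;
}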

Related

How to swap the byte order for individual words in a vector in ARM/ACLE

I usually write portable C code and try to adhere to a strictly standard-conforming subset of the features supported by compilers.
However, I'm writing code that exploits the ARMv8 Cryptography extensions to implement SHA-1 (and SHA-256 some days later). A problem I face is that FIPS-180 specifies the hash algorithms using big-endian byte order, whereas most ARM-based OS ABIs are little-endian.
If it were a single integer operand (in a general-purpose register) I could use the APIs specified for the next POSIX standard, but I'm working with SIMD registers, since that's where the ARMv8 Crypto instructions operate.
So Q: how do I swap the byte order of the words in a vector register on ARM? I'm fine with assembly answers, but would prefer ACLE intrinsics.
The instructions are:
REV16 for byte-swapping short integers,
REV32 for byte-swapping 32-bit integers, and
REV64 for byte-swapping 64-bit integers.
They can be used to swap the byte AND word order of any length that's strictly less than what their name indicates. They're defined in sections C7.2.219–C7.2.221 of the Arm Architecture Reference Manual for Armv8-A ("DDI0487G_b_armv8_arm.pdf").
e.g. REV32 can be used to reverse the order of the two 16-bit halfwords within each 32-bit word:
[00][01][02][03][04][05][06][07]
to
[02][03][00][01][06][07][04][05]
Their intrinsics are defined in a separate document: the Arm Neon Intrinsics Reference ("advsimd-2021Q2.pdf").
To byte-swap each 32-bit word in a 128-bit vector, use the vrev32q_u8 intrinsic. The relevant vreinterpretq_* intrinsics need to be used to reinterpret the type of the operands.
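For instance, a minimal sketch (assuming an AArch64 target with the standard arm_neon.h intrinsics; the helper name is illustrative):
#include <arm_neon.h>

/* Byte-swap each 32-bit lane of a 128-bit vector. */
static inline uint32x4_t bswap32_lanes(uint32x4_t v)
{
    /* View the vector as 16 bytes, reverse the bytes within each
       32-bit container, then view it as 32-bit lanes again. */
    uint8x16_t bytes = vreinterpretq_u8_u32(v);
    bytes = vrev32q_u8(bytes);
    return vreinterpretq_u32_u8(bytes);
}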

Why do we use explicit data types? (from a low level point of view)

When we take a look at some fundamental data types, such as char and int, we know that a char is simply an unsigned byte (depending on the language), int is just a signed dword, bool is just a char that can only be 1 or 0, etc. My question is, why do we use these types in compiled languages instead of just declaring a variable of type byte, dword, etc., since the operations applied to the types mentioned above are pretty much all the same, once you differentiate signed and unsigned data, and floating point data?
To extend the context of the question: in the C language, if and while statements can take a boolean value as input, which is usually stored as a char, which eliminates the need for an explicit boolean type.
In practice, the two pieces of code should be equivalent at the binary level:
#include <stdio.h>

int main()
{
    int x = 5;
    char y = 'c';
    printf("%d %c\n", x - 8, y + 1);
    return 0;
}
//outputs: -3 d
-
signed dword main()
{
    signed dword x = 5;
    byte y = 'c';
    printf("%d %c\n", x - 8, y + 1);
    return 0;
}
//outputs: -3 d
My question is, why do we use these types in compiled languages
To make the code target-agnostic. Some platforms only have efficient 16-bit integers, and forcing your variables to always be 32-bit would make your code slower for no reason when compiled for such platforms. Or maybe you have a target with 36-bit integers, and a strict 32-bit type would require extra instructions to implement.
Your question sounds very x86-centric. x86 is not the only architecture, and for most languages it is not the one the language designers had in mind.
Even more recent languages that were designed in the era of x86 being widespread on desktops and servers were designed to be portable to other ISAs, like 8-bit AVR where a 32-bit int would take 4 registers vs. 2 for a 16-bit int.
A programming language defines an "abstract" data model that a compiler or machine designer is free to implement their own way. For instance, nothing mandates storing a Boolean in a byte; it could be "packed" as a single bit along with others. And if you read the C standard carefully, you will notice that a char has no fixed bit width (sizeof(char) is 1 by definition, but CHAR_BIT need only be at least 8).
[Anecdotally, I recall an old time when FORTRAN variables, including integers and floats but also booleans, were stored in 72 bits on IBM machines.]
Language designers should put few constraints on machine architecture, to leave opportunities for nice designs. In fact, languages have no "low level"; they implicitly describe a virtual machine not tied to particular hardware (it could be implemented with cogwheels and ropes).
As far as I know, only the Ada language went as far as specifying in detail all the characteristics of the arithmetic, but not to the point of enforcing a number of bits per word.
Ignoring the boolean type was one of the saddest design decisions in the C language. It took until C99 to integrate one :-(
Another sad decision was to stop considering the int type as the one that naturally fits in a machine word (it should have become 64 bits on current PCs).
The point of a high-level language is to provide some isolation from machine details. So, we speak of "integers", not some particular number of bytes of memory. The implementation then maps the higher-level types on whatever seems best suited to the target hardware.
And there are different semantics associated with different 4-byte types: for integers, signed versus unsigned is important to some classes of programs.
I understand this is a C question and it's arguable about how high-level C is or is not; but it is at least intended to be portable across machine architectures.
And, in your example, you assume 'int' is 32 bits. Nothing in the language says that has to be true. It has not always been true, and certainly was not true in the original PDP-11 implementation. And nowadays, for example, it is possibly appropriate to have 'int' be 64 bits on a 64-bit machine.
Note that it's not invariable that languages have types like "integer", etc. BLISS, a language at the same conceptual level as C, has the machine word as the only builtin datatype.
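To make the portability point concrete, here is a minimal sketch (standard C, nothing target-specific) that prints what the implementation actually chose for these "abstract" types; the numbers differ between, say, an 8-bit AVR toolchain and a 64-bit desktop compiler:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* All of these are implementation-defined, not fixed by the language. */
    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("sizeof(long)  = %zu\n", sizeof(long));
    printf("INT_MAX       = %d\n", INT_MAX);
    return 0;
}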

C code last compiled in 1990: the old executable still runs, but recompiled now it gets read errors on the old files

I have a C program, last compiled in 1990, that reads and writes some binary files. The executable still works, reading and writing them perfectly. I need to recompile the source, add some features, and then use the code, reading in some of the old data and outputting it with additional information.
When I recompile the code, with no changes, and execute it, it fails reading the old files, giving segmentation faults when I try to process the data read into an area of memory. I believe that the problem may be that the binary files written earlier used 4-byte (8-bit byte) integers, 8-byte longs, and 4-byte floats. The architecture on my machine now uses 64-bit words instead of 32. Thus when I extract an integer from the data read in, it is aligned incorrectly and sets an array index that is out of range for the program space.
This is on Mac OS X 10.12.6, using its C compiler, which is:
Apple LLVM version 8.0.0 (clang-800.0.33.1)
Target: x86_64-apple-darwin16.7.0
Is there a compiler switch that would set the compiled lengths of integers and floats to the above values? If not, how do I approach getting the code to correctly read the data?
Welcome to the world of portability headaches!
If your program was compiled in 1990, there is a good chance it uses 4-byte longs, and it is even possible that it uses 2-byte ints, depending on the architecture it was compiled for.
The size of the basic C types is heavily system-dependent, among a number of more subtle portability issues. long is now 64-bit on both 64-bit Linux and 64-bit OS X, but still 32-bit on Windows (for both the 32-bit and 64-bit versions!).
When reading binary files, you must also deal with endianness, which changed from big-endian on the 1990 Macintosh to little-endian on today's OS X, but is still big-endian on other systems.
To make matters worse, the C language has evolved over this long period and some non-trivial semantic changes occurred between pre-ANSI C and Standard C. Some old syntaxes are no longer supported either...
There is no magic flag to address these issues; you will need to dive into the C code, understand what it does, and try to modernize the code and make it more portable, independent of the target architecture. You can use the fixed-width types from <stdint.h> to ease this process (int32_t, ...).
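For illustration, a minimal sketch of reading one of the old 4-byte integers with an explicit byte order; the helper name and the assumption that the 1990 files were written little-endian are mine, not from the original program:
#include <stdint.h>
#include <stdio.h>

/* Read a 4-byte little-endian signed integer from the old file format. */
static int read_i32_le(FILE *f, int32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return -1;
    *out = (int32_t)((uint32_t)b[0]
                   | ((uint32_t)b[1] << 8)
                   | ((uint32_t)b[2] << 16)
                   | ((uint32_t)b[3] << 24));
    return 0;
}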
People answering C questions on Stack Overflow are usually careful to post portable code that works correctly for all target architectures, even some purposely vicious ones such as the DS9K (a fictitious computer that does everything in a correct but unexpected way).

Will a variable's size be the same on different microcontrollers? [duplicate]

This question already has answers here:
Different int sizes on my computer and Arduino
(5 answers)
Closed 9 years ago.
If we define an "integer" variable on a PIC microcontroller, will it be the same size when I define the same "int" variable on an Atmel microcontroller? Or will it be a different size?
This question was asked in an embedded systems interview. What should the answer be?
I'm a little confused!
Does it depend on the microcontroller or the programming language?
Is the same variable type, like integer, the same size in all programming languages?
It's not the same question as the linked one, as it's a little different for embedded controllers.
The answer to the interview question should be something like:
Possibly, where it matters one should use the types defined in stdint.h, or otherwise consult the compiler documentation or inspect the definitions in limits.h.
The interviewer is unlikely to be asking for a yes/no answer, and probably would not appreciate such terseness in an interview situation in any case - the questions are intended to get you talking until you have said something useful or interesting about yourself or your abilities and knowledge. What they are perhaps looking for is whether you are aware that the sizes of the standard types in C are a compiler/architecture dependency, and how you might handle that variability in portable code.
It is quite possible that an int differs between one PIC and another, or one Atmel and another, let alone between a PIC and an Atmel. An Atmel AVR32, for example, will certainly differ from an 8-bit AVR, and similarly the MIPS-based PIC32 differs from "classic" PICs.
Also, the size of built-in types is strictly a compiler-implementation issue, so it is possible that two different compilers for the same processor will differ (although it is highly improbable - no compiler vendor would sensibly go out of their way to be that perverse!).
Languages other than C and C++ (and assembler, of course) are less common on small microcontrollers because these are systems-level languages with minimal runtime environment requirements, but certainly the sizes of types may vary depending on the language definition.
The problem is that standard C types will tend to vary from implementation to implementation. Using the types found in stdint.h will allow you to specify how many bits you want.
It depends on whether the architecture is 32-bit or 64-bit.
On a 32-bit system, your integer would typically be 32 bits:
a signed 32-bit integer holds values between -2,147,483,648 and 2,147,483,647.
On a 64-bit system it may be 64 bits:
a signed 64-bit integer holds values between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
So, to answer your question, an integer can have different sizes depending on the architecture you are using.
TIP: If your code assumes that a specific type has a specific size, you can verify that assumption at compile time:
#define C_ASSERT_CAT_(a, b) a##b
#define C_ASSERT_CAT(a, b) C_ASSERT_CAT_(a, b)
#define C_ASSERT(cond) char C_ASSERT_CAT(c_assert_var_, __LINE__)[(cond) ? 1 : -1]
C_ASSERT(sizeof(int) == 4);
(The two-level concatenation is needed so that __LINE__ expands to the actual line number before being pasted.) At compile time, this will expand to something like:
char c_assert_var_350[(sizeof(int) == 4) ? 1 : -1];
which will not compile if sizeof(int) != 4.
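As a side note (not part of the original answer), on C11 compilers the same check can be written directly with the standard _Static_assert keyword:
_Static_assert(sizeof(int) == 4, "this code assumes a 4-byte int");  /* C11, no header needed */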
It depends on many things; I can't say yes or no for certain, but my answer leans towards no.
int is guaranteed to be at least 16 bits. However, on many later architectures int is a 32-bit number, and that doesn't break any rules. As far as I know, on Atmel's 8-bit microcontrollers int is 16-bit; I'm not sure about PIC.
Anyway, my suggestion would be to use the defined fixed-width types. I don't know what compiler you are using, but I'm using AVR Studio. It has defined types such as:
uint8_t
int8_t
uint16_t
...
int64_t
These types are guaranteed to have the same size on every processor; you just need to do a little research in your compiler's documentation.
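For instance, a minimal sketch (the variable names are just illustrative) using the fixed-width types from <stdint.h>, which is where these typedefs come from on most modern toolchains:
#include <stdint.h>

/* These widths stay the same whether the target is a PIC, an AVR,
   or a desktop CPU, as long as the toolchain provides the type. */
uint8_t  status_flags = 0;   /* exactly 8 bits  */
uint16_t adc_reading  = 0;   /* exactly 16 bits */
int32_t  accumulator  = 0;   /* exactly 32 bits */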

fwrite portability

Is fwrite portable? I'm not really facing the problem described below, but I'd like to understand the fundamentals of C.
Let's assume we have two machines, A8 (byte = 8 bits) and B16 (byte = 16 bits).
Will the following code produce the same output on both machines?
unsigned char chars[10];
...
fwrite(chars, sizeof(unsigned char), 10, mystream);
I guess A8 will produce 80 bits (10 octets) and B16 will produce 160 bits (20 octets).
Am I wrong?
This problem wouldn't appear if only uintN_t types were used, as their lengths in bits are independent of the size of the byte. But maybe uint8_t won't exist on B16.
What is the solution to this problem?
I guess building an array of uint32_t, putting my bytes into this array (with smart shifts and masks depending on the machine's architecture) and writing that array would solve the problem. But this is not really satisfactory. There is again an assumption that uint32_t exists on all platforms. The filling of this array will be very dependent on the current machine's architecture.
Thanks for any response.
fwrite() is a standard library function, so it must be portable for every C compiler:
it must be defined in that compiler's C standard library to support your machine.
So machines with 8-bit, 16-bit or 32-bit bytes give you the same high-level operation.
But if you want to design those library functions, then you have to consider the machine architecture and the memory organization of that machine.
As a C compiler user you should not have to bother about the internal behaviour.
I think you just want to use those C library functions, so there is no difference in the behaviour of the function on different machines.
A byte is 8 bits on almost every modern computer. But there is another reason fwrite isn't portable:
a file written on a little-endian machine can't be read correctly by a big-endian machine without byte-order conversion, and the other way around.
In C, char is defined as the smallest addressable unit of the machine. That is, char is not necessarily 8 bits.
In most cases it's safe enough to rely on the fact that char is 8 bits and not deal with the extreme cases.
Generally speaking, you probably won't be able to write "half of a byte" to a file on storage. Additionally, there will be portability issues at the hardware level between devices designed to work with machines of different byte sizes. If you are dealing with other devices (such as telecom equipment), you will have to implement bit streams.
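To sidestep both byte-size and byte-order differences, a common approach (a minimal sketch assuming CHAR_BIT == 8 on the writing machine; the helper name is illustrative) is to serialize each value into an explicit byte order before handing it to fwrite:
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as exactly 4 bytes, least-significant byte first,
   so the file layout does not depend on the host's endianness. */
static int write_u32_le(FILE *f, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
    return fwrite(b, 1, 4, f) == 4 ? 0 : -1;
}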
