C - how to make a variable big-endian by default?

I want my program to use big-endian byte order by default (right now it is little-endian).
That means that every time I declare a uint32_t/int, the value stored in it will be in big-endian order.
Is that possible (without calling ntohl() every time)?
I have searched Google and Stack Overflow for 3 days and haven't found an answer.
I would strongly appreciate any help!
Edit:
I have a server and a client.
The client uses big-endian byte order, and the server is little-endian.
Right now I am sending the server an MD5 value for a byte array, calling ntohl() on it on the server, and getting the expected MD5 value.
On the server, when I call the md5.c function (which is a DLL, by the way), I get a different value.
This value is not even remotely similar to the received value.
I assume this happens because of the endianness.
The byte array I send to the function does not change, because single bytes are not sensitive to endianness, but the variables I declare inside the function, which I use to manipulate my byte array, can cause a problem.
That is why big-endian is so important to me.

I want my program to use big-endian by default
You cannot. The endianness is a fixed property of the target processor (the one your C compiler targets) and is determined by its instruction set architecture.
You might in principle (but with C, that is really not worth the trouble) use a C compiler for some virtual machine (like WebAssembly, or MMIX) of the appropriate endianness, or even define a bytecode VM for a machine of desired endianness.
A variant of that might be to cross-compile for some other instruction set architecture (e.g. MIPS, ARM, etc...) and use some emulator to run the executable (and the necessary OS).
Regarding your edit, you could consider sending the md5sum over the network in alphanumeric form (e.g. like the output of the md5sum command).
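A minimal sketch of that idea, assuming the digest is already available as a 16-byte array (the name md5_to_hex is illustrative, not part of any MD5 library): converting the raw bytes to a hex string makes the value byte-order neutral, so both peers can compare it without any ntohl()/htonl() calls.

    #include <stdio.h>
    #include <stdint.h>

    /* Convert a 16-byte MD5 digest into a 32-character hex string
     * (plus terminating NUL). Text is byte-order neutral. */
    static void md5_to_hex(const uint8_t digest[16], char out[33])
    {
        for (int i = 0; i < 16; i++)
            sprintf(&out[i * 2], "%02x", digest[i]);
        out[32] = '\0';
    }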

You can compile the program for a different processor, some of which use big-endian instead of little-endian, or, on some targets, you can tell the compiler which endianness to generate code for. For example, GCC's MIPS options include -EB and -EL for big- and little-endian output: https://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html

Endianness is purely about the order in which a processor stores the bytes of a multi-byte value in memory. The only time a programmer needs to be aware of it is when serializing data or addressing individual bytes of an integer.
So unless you can change how the processor lays out multi-byte words in memory (some ARM cores allow you to switch endianness), you are stuck with how the processor works.
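As a small sketch of the serialization case mentioned above (the helper names put_be32/get_be32 are assumptions for illustration): writing a 32-bit value into a byte buffer with shifts produces big-endian (network) order regardless of the host's native byte order.

    #include <stdint.h>

    /* Store a 32-bit value into buf in big-endian (network) byte order.
     * The shifts operate on the value, not on its in-memory layout,
     * so this behaves the same on little- and big-endian hosts. */
    static void put_be32(uint8_t buf[4], uint32_t v)
    {
        buf[0] = (uint8_t)(v >> 24);
        buf[1] = (uint8_t)(v >> 16);
        buf[2] = (uint8_t)(v >> 8);
        buf[3] = (uint8_t)v;
    }

    /* Read it back, again independent of host endianness. */
    static uint32_t get_be32(const uint8_t buf[4])
    {
        return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
               ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    }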

Related

Low level languages and their dependencies

I am trying to understand exactly what it means that low-level languages are machine-dependent.
Let's take C for example: if it is machine-dependent, does that mean that a program compiled on one computer might not be able to run on another?
In the end, processors execute machine code, which is basically a collection of binary numbers. The processor decodes each binary number to figure out what it is supposed to do. One binary number could mean "Add register X to register Y and store the result in register Z". Another could mean "Store the contents of register X at the memory address held by register Y". And so on...
The complete description of these decoding rules (i.e. binary number into operation) is the processor's instruction set architecture (ISA).
A low-level language is one where the code you write maps very closely to a specific processor's instruction set. Assembly is one obvious example. Since different processors may have different instruction sets, it's clear that an assembly program written for one processor's ISA can't be used on a processor with a different ISA.
Let's take C for example: if it is machine-dependent, does that mean that a program compiled on one computer might not be able to run on another?
Correct. A program compiled for one processor (family) can't run on another processor with a (completely) different ISA. The program needs to be recompiled.
Also notice that the target OS plays a role. If you use the same processor but a different OS, you'll also need to recompile.
There are at least 3 different kinds of languages.
A language that is so close to the target system's ISA that the source code can only be used on that specific target. Example: Assembly.
A language that allows you to write code that can be used on many different targets via a target-specific compilation. Example: C.
A language that allows you to write code that can be used on many different targets without a target-specific compilation. These still require some kind of target-specific runtime environment to be installed. Example: Java.
High-level languages are portable, meaning every architecture can run high-level programs, but compared to low-level programs (written in assembly or even machine code) they are less efficient and consume more memory.
Low-level programs are said to be "closer to the hardware" and are optimized for a specific hardware architecture/processor; they are faster, but machine-dependent and not very portable.
So a program compiled for one type of processor is not valid for other types; it needs to be recompiled.
In the before
When the first processors came out, there was no programming language whatsoever; you had a very long and very complicated document with a list of "opcodes": the codes you had to put into memory for a given operation to be executed by your processor. To create a program, you had to put a long string of numbers into memory and hope everything worked as documented.
Later came assembly languages. The point wasn't really to make algorithms easier to implement, or to make programs readable by someone without experience on the specific processor model you were working with; it was to save you from spending days and days looking things up in the documentation. For this reason, there isn't "an assembly language" but thousands of them, one per instruction set (which, at the time, basically meant one per CPU model).
At this point in time, all languages were platform-dependent. If you decided to switch CPUs, you'd have to rewrite a significant portion (if not all) of your code. Recognizing that as a bit of a problem, someone created the first platform-independent language (according to this SE question it was FORTRAN in 1954) that could be compiled to run on any CPU architecture, as long as someone made a compiler for it.
Fast forward a bit and C was invented. C is a platform-independent programming language, in the sense that any C program (as long as it conforms to the standard) can be compiled to run on any CPU (as long as that CPU has a C compiler). Once a C program has been compiled, the resulting file is a platform-dependent binary and will only be able to run on the architecture it was compiled for.
C is platform-dependent
There's an issue though: a processor is more than just a list of opcodes. Most processors have hardware control devices like watchdogs or timers that can be completely different from one architecture to another, even the way to talk to other devices can change completely. As such, if you want to actually run a program on a CPU, you have to include things that make it platform-dependent.
A real-life example of this is the Linux kernel. The majority of the kernel is written in C, but there is still around 1% written in various kinds of assembly. This assembly is required to do things such as initialize the CPU or use timers. Using this approach means Linux can run on your desktop x86_64 CPU, your ARM Android phone or a RISC-V SoC, but adding a new architecture isn't as simple as "compile it with your architecture's compiler".
So... did I just say that the only way to run a platform-independent program on an actual processor is to use platform-dependent code? Yes, for most architectures, you have to.
Or is it?
But there's a catch! That's only true if you want to run your code on bare metal (meaning: without an OS). One of the great things about using an OS is how abstracted everything is: you don't need to know how the kernel initializes the CPU, nor how it gets its clock; you just need to know how to access those abstracted resources.
But the way of accessing resources depends on the OS, so aren't we back to square one? We could be, if not for the standard library! This library is used to access functions like printf in a well-defined way. It doesn't matter whether you're working on Linux running on PowerPC or on ARM Windows; printf will always print to the standard output in the same way.
If you write standard C using only the standard library (and intend for your program to run on an OS), C is completely platform-independent!
EDIT: As said in the comments below, even that is not enough. It doesn't really have anything to do with specific CPUs, but some things, such as the system function or the sizes of some types, are documented as implementation-defined. To make C really platform-independent you need to make sure you only use well-defined functions of the standard library and learn some best practices (never rely on sizeof(int)==4, for instance).
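As a hedged illustration of that last point: when a specific width matters, spell it out with the fixed-width types from <stdint.h> (and their printf macros from <inttypes.h>) instead of assuming what int happens to be on one platform.

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Risky: silently assumes int is exactly 32 bits
         * (true on many, but not all, platforms). */
        int counter = 0;

        /* Portable: int32_t is exactly 32 bits wherever it is provided. */
        int32_t portable_counter = 0;

        /* PRId32 expands to the right printf format for int32_t on each platform. */
        printf("%d %" PRId32 "\n", counter, portable_counter);
        return 0;
    }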
Thinking about 'what's a program' might help you understand your question. Is a program a collection of text (that you've typed in or otherwise manufactured) or is it something you run? Is it both?
In the case of a 'low-level' language like C I'd say that the text is the program source, and that this is turned into a program (aka executable) by a compiler. A program is something you can run. You need a C compiler for a system to be able to turn the program source into a program for that system. Once built, the program can only be run on systems close to the one it was compiled for. However, there is a more interesting, if more difficult, question: can you at least keep the program source the same, so that all you need to do is recompile? The answer to this is 'sort of, no', I think. For example you can't, in pure C, read the state of the shift key. Of course operating systems provide such facilities and you can interface to them in C, but then such code depends on the OS. There might be libraries (e.g. the curses library) that provide such facilities for many OSes and that can help to reduce the dependency, but no library can claim to portably cover all OSes.
In the case of a 'higher-level' language like Python I'd say the text is both the program and the program source. There is no separate compilation stage with such languages, but you do need an interpreter on a system to be able to run your Python program on that system. However, that this is happening may not be obvious to the user, as you may well appear to be able to run your Python 'program' just by naming it, the way you run your C programs. But this most likely comes down to the shell (the part of the OS that deals with commands) knowing about Python programs and invoking the interpreter for you. It can appear, then, that you can run your Python program anywhere, but in fact what you can do is pass the program to any Python interpreter.
In the zoo of programming there are not only many, very varied beasts, but new kinds of beasts arise all the time, and old beasts metamorphose. Terms like 'program', 'script' and even 'executable' are often used loosely.

Define size of long and time_t to 4 bytes

I want to synchronize two Raspberry Pis with a C program. It works fine if the program only runs on the Pis, but for development I want to use my PC (where it's also easier to debug). However, I send the timespec struct directly as binary over the wire. A Raspberry Pi uses 4 bytes for long and time_t, while my PC uses 8 bytes each... so they don't match up.
Is it possible to set long and time_t to 4 bytes each, only for this C program?
I know that the sizes of long, short, etc. are defined by the system.
Important: I only want to define this once in the program, not convert the values to uintXX or int each time.
In programming, it is not uncommon to need to treat the network transmission format as separate from the in-memory representation. In fact, it is pretty much the norm. So converting the data to a network format with a well-defined byte order and size is really the recommended approach, and it will help with the abstractions in your interfaces.
You might as well consider transforming it to plain text, if it is not a time-critical piece of data exchange. That makes debugging a lot easier.
C is probably not the best tool for the job here. It's much too low-level to provide automatic data serialization like JavaScript, Python or similar higher-level languages.
You cannot assume the definition of timespec will be identical on different platforms. For one thing, the sizes of its fields differ between 32-bit and 64-bit architectures, and you can have endianness problems too.
When you want to exchange data structures between heterogeneous platforms, you need to define your own protocol with unambiguous data types and a clear endianness convention.
One solution would be to send the numbers as ASCII. Not terribly efficient, but if it's just a couple of values, who cares?
Another would be to create an exchange structure with (u)intXX_t fields, as sketched after this answer.
You can assume a standard Raspberry Pi kernel will be little-endian like your PC, but if you're writing a small exchange protocol, you might as well add a couple of htonl()/ntohl() calls for good measure.
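A minimal sketch of that exchange-structure idea, under stated assumptions (the struct name wire_timespec, 64-bit seconds plus 32-bit nanoseconds on the wire, and big-endian field order are all choices made for illustration, not part of the original answer):

    #include <stdint.h>
    #include <time.h>
    #include <arpa/inet.h>   /* htonl/ntohl */

    /* Fixed-size, fixed-order representation sent over the wire.
     * Both peers agree on 64-bit seconds + 32-bit nanoseconds, big-endian. */
    struct wire_timespec {
        uint32_t sec_hi;   /* high 32 bits of the seconds field */
        uint32_t sec_lo;   /* low 32 bits of the seconds field  */
        uint32_t nsec;
    };

    static void timespec_to_wire(const struct timespec *ts, struct wire_timespec *w)
    {
        uint64_t sec = (uint64_t)ts->tv_sec;
        w->sec_hi = htonl((uint32_t)(sec >> 32));
        w->sec_lo = htonl((uint32_t)sec);
        w->nsec   = htonl((uint32_t)ts->tv_nsec);
    }

    static void wire_to_timespec(const struct wire_timespec *w, struct timespec *ts)
    {
        uint64_t sec = ((uint64_t)ntohl(w->sec_hi) << 32) | ntohl(w->sec_lo);
        ts->tv_sec  = (time_t)sec;
        ts->tv_nsec = (long)ntohl(w->nsec);
    }

Both sides send and receive sizeof(struct wire_timespec) bytes, so the 4-byte/8-byte difference between the Pi and the PC never reaches the wire.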

Simple 32 to 64-bit transition?

Is there a simple way to compile 32-bit C code into a 64-bit application, with minimal modification? The code was not set up to use fixed type sizes.
I am not interested in taking advantage of 64-bit memory addressing. I just need to compile into a 64-bit binary while maintaining 4 byte longs and pointers.
Something like:
#define long int32_t
But of course that breaks a number of long use cases and doesn't deal with pointers. I thought there might be some standard procedure here.
There seem to be two orthogonal notions of "portability":
My code compiles everywhere out of the box. Its general behaviour is the same on all platforms, but details of available features vary depending on the platform's characteristics.
My code contains a folder for architecture-dependent stuff. I guarantee that MYINT32 is always 32 bit no matter what. I successfully ported the notion of 32 bits to the nine-fingered furry lummoxes of Mars.
In the first approach, we write unsigned int n; and printf("%u", n), and we know that the code always works, but details like the numeric range of unsigned int are up to the platform and not our concern. (wchar_t comes in here, too.) This is what I would call the genuinely portable style.
In the second approach, we typedef everything and use types like uint32_t. Formatted output with printf triggers tons of warnings, and we must resort to monsters like PRIu32. In this approach we derive a strange sense of power and control from knowing that our integer is always 32 bits wide, but I hesitate to call this "portable" -- it's just stubborn.
The fundamental concept that requires a specific representation is serialization: The document you write on one platform should be readable on all other platforms. Serialization is naturally where we forgo the type system, must worry about endianness and need to decide on a fixed representation (including things like text encoding).
The upshot is this:
Write your main program core in portable style using standard language primitives.
Write well-defined, clean I/O interfaces for serialization.
If you stick to that, you should never even have to think about whether your platform is 32 or 64 bit, big or little endian, Mac or PC, Windows or Linux. Stick to the standard, and the standard will stick with you.
No, this is not, in general, possible. Consider, for example, malloc(). What is supposed to happen when it returns a pointer value that cannot be represented in 32 bits? How can that pointer value possibly be passed to your code as a 32 bit value, that will work fine when dereferenced?
This is just one example - there are numerous other similar ones.
Well-written C code isn't inherently "32-bit" or "64-bit" anyway - it should work fine when recompiled as a 64 bit binary with no modifications necessary.
Your actual problem is wanting to load a 32 bit library into a 64 bit application. One way to do this is to write a 32 bit helper application that loads your 32 bit library, and a 64 bit shim library that is loaded into the 64 bit application. Your 64 bit shim library communicates with your 32 bit helper using some IPC mechanism, requesting the helper application to perform operations on its behalf, and returning the results.
The specific case - a Matlab MEX file - might be a bit complicated (you'll need two-way function calling, so that the 64 bit shim library can perform calls like mexGetVariable() on behalf of the 32 bit helper), but it should still be doable.
The one area that will probably bite you is if any of your 32-bit integers are manipulated bit-wise. If you assume that some status flags are stored in a 32-bit register (for example), or if you are doing bit shifting, then you'll need to focus on those.
Another place to look would be any networking code that assumes the size (and endian) of integers passed on the wire. Once those get moved into 64-bit ints you'll need to make sure that you don't lose sign bits or precision.
Structures that contain integers will no longer be the same size. Any assumptions about size and alignment need to be cleaned out.

Adding 64 bit support to existing 32 bit code, is it difficult?

There is a library which I build against different 32-bit platforms. Now, 64-bit architectures must be supported. What are the most general strategies to extend existing 32-bit code to support 64-bit architectures? Should I use #ifdef's or anything else?
The amount of effort involved will depend entirely on how well written the original code is. In the best possible case there will be no effort involved other than re-compiling. In the worst case you will have to spend a lot of time making your code "64 bit clean".
Typical problems are:
assumptions about sizes of int/long/pointer/etc
assigning pointers <=> ints (a brief sketch follows this list)
relying on default argument or function result conversions (i.e. no function prototypes)
inappropriate printf/scanf format specifiers
assumptions about size/alignment/padding of structs (particularly in regard to file or network I/O, or interfacing with other APIs, etc)
inappropriate casts when doing pointer arithmetic with byte offsets
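A brief sketch of the pointer <=> int item above (a hedged illustration, not taken from the original answer): the double cast keeps the compiler quiet, but on a typical LP64 platform it still throws away the upper half of the address.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);

        /* Broken on LP64: a pointer is 8 bytes but unsigned int is 4,
         * so the high bits of the address are silently discarded. */
        unsigned int as_int = (unsigned int)(uintptr_t)p;

        /* 64-bit clean: uintptr_t is wide enough to hold any object pointer. */
        uintptr_t as_uintptr = (uintptr_t)p;

        printf("truncated: %#x  full: %#jx\n", as_int, (uintmax_t)as_uintptr);
        free(p);
        return 0;
    }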
Simply don't rely on assumptions about the machine word size; always use sizeof, stdint.h, etc. Unless you rely on different library calls for different architectures, there should be no need for #ifdefs.
The easiest strategy is to build what you have with 64-bit settings and test the heck out of it. Some code doesn't need to change at all. Other code, usually code with wrong assumptions about the sizes of ints/pointers, will be much more brittle and will need to be modified to not depend on the architecture.
Very often, binary files containing binary records cause the most problems. This is especially true in environments where integer types grow from 32 to 64 bits in the transition to a 64-bit build. Primarily this is because integers get written to files natively in their current (32-bit) length and are then read back with an incorrect length in a 64-bit build where they are 64 bits wide.
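A small hedged sketch of that file problem (on common LP64 systems it is long rather than int that grows, but the effect on binary records is the same; the struct names are illustrative):

    #include <stdint.h>

    /* Written natively with fwrite(), this record is 8 bytes on a typical
     * 32-bit build (long is 4 bytes) and 16 bytes on a typical 64-bit LP64
     * build (long is 8 bytes), so the two builds cannot read each other's files. */
    struct native_record {
        long id;
        long value;
    };

    /* Fixing the on-disk width keeps the record 8 bytes everywhere. */
    struct portable_record {
        int32_t id;
        int32_t value;
    };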

When to worry about endianness?

I have seen countless references to endianness and what it means. I have no problem with that...
However, my coding project is a simple game that runs on Linux and Windows, on standard "gamer" hardware.
Do I need to worry about endianness in this case? When would I need to worry about it?
My code is plain C with SDL+GL; the only complex data are basic media files (png+wav+xm), and the game data is mostly strings, integer booleans (for flags and such) and static-sized arrays. So far no user has had issues, so I am wondering whether adding checks is necessary (it will be done later, but there are more urgent issues IMO).
The times when you need to worry about endianness:
you are sending binary data between machines or processes (over a network or via a file). If the machines may have different byte orders, or the protocol used specifies a particular byte order (which it should), you'll need to deal with endianness.
you have code that accesses memory through pointers of different types (say, you access an unsigned int variable through a char*) - see the sketch after this answer.
If you do these things, you're dealing with byte order whether you know it or not - it might be that you're dealing with it by assuming it's one way or the other, which may work fine as long as your code doesn't have to deal with a different platform.
In a similar vein, you generally need to deal with alignment issues in those same cases and for similar reasons. Once again, you might be dealing with it by doing nothing and having everything work fine because you don't cross platform boundaries (which may come back to bite you down the road if that does become a requirement).
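A minimal sketch of the second case (inspecting an unsigned int through a char*), which also happens to be the classic way to detect the host's byte order at run time:

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 0x01020304;
        unsigned char *bytes = (unsigned char *)&x;  /* aliasing via char* is allowed */

        /* On a little-endian host the first byte in memory is 0x04;
         * on a big-endian host it is 0x01. */
        if (bytes[0] == 0x04)
            puts("little-endian");
        else if (bytes[0] == 0x01)
            puts("big-endian");
        else
            puts("unusual byte order");

        return 0;
    }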
If you mean a PC by "standard gamer hardware", then you don't have to worry about endianness, as it will always be little-endian on x86/x64. But if you want to port the project to other architectures, then you should design it to be endianness-independent.
Whenever you receive/transmit data over a network, remember to convert to/from network and host byte order. The C functions htons, htonl etc., or equivalents in your language, should be used here.
Whenever you read multi-byte values (like UTF-16 characters or 32-bit ints) from a file, since that file might have originated on a system with a different endianness. If the file is UTF-16 or UTF-32 it probably has a BOM (byte-order mark). Otherwise, the file format has to specify the endianness in some way.
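As a small sketch of the BOM case (the helper name utf16_bom and its return convention are assumptions for illustration): inspecting the first two bytes of a file to decide the byte order of its UTF-16 content.

    #include <stdio.h>

    /* Returns 1 for UTF-16LE (FF FE), 2 for UTF-16BE (FE FF),
     * 0 if no UTF-16 BOM was found (the file is rewound in that case). */
    static int utf16_bom(FILE *f)
    {
        int b0 = fgetc(f);
        int b1 = fgetc(f);

        if (b0 == 0xFF && b1 == 0xFE) return 1;  /* little-endian */
        if (b0 == 0xFE && b1 == 0xFF) return 2;  /* big-endian */

        rewind(f);  /* no BOM: start reading from the beginning again */
        return 0;
    }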
You only need to worry about it if your game needs to run on different hardware architectures. If you are positive that it will always run on Intel hardware then you can forget about it. If it runs on Linux, though, many people use architectures other than Intel, and you may end up having to think about it.
Are you distributing your game in source code form?
Because if you are distributing your game as a binary only, then you know exactly which processor families your game will run on. Also, the media files: are they user-generated (possibly via a level editor), or are they really only meant to be supplied by yourself?
If this is a truly closed environment (you distribute binaries and the game assets are not intended to be customized), then you know your own endianness risks, and I personally wouldn't fool with it.
However, if you are either distributing source and/or hoping people will customize their game, then you have a potential concern. However, with most of the desktop/laptop computers around these days moving to x86, I would think this is a diminishing concern.
The problem occurs with networking and how the data is sent, and when you are doing bit fiddling on different processors, since different processors may store the data differently in memory.
I believe PowerPC has the opposite endianness to the Intel boards. You might be able to have a routine that sets the endianness depending on the architecture? I'm not sure if you can actually tell what the hardware architecture is in code... maybe someone smarter than me knows the answer to that question.
Now, in reference to your statement about "standard" gamer H/W, I would say you're typically looking at consumer off-the-shelf solutions, which is really what most any standard gamer is using, so you're almost certainly going to get the same endianness across the board. I'm sure someone will disagree with me, but that's my $.02.
Ha... I just noticed that on the right there is a related link showing up for the suggestion I had above:
Find Endianness through a c program
