I am going to read a TDMS file in MATLAB using a MEX-function written in C, running on a 64-bit Windows machine, but I will develop the app on a 32-bit Windows machine. I know there is a difference between 32-bit and 64-bit machines in the sizes of variables, and I used a lot of fread(... sizeof(type) ...). Is that going to be a problem when it runs on the 64-bit machine? If so, how can I make it portable to the 64-bit machine?
Thanks.
ISO C99 provides the header <stdint.h>, which defines, amongst others, types of the form intN_t and uintN_t, where N is the width of the corresponding integer or unsigned integer type. If an implementation provides integer types of width 8, 16, 32 or 64, it should provide the corresponding typedefs.
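For example (a minimal sketch, not taken from the TDMS spec; the function name and field layout are invented purely for illustration), you can read a field of a known, fixed width and assemble it byte by byte so the result does not depend on the host's integer sizes:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example: read a 32-bit little-endian count field.
   The field name and layout are invented for illustration; consult
   the actual file format documentation for the real layout. */
int read_count(FILE *fp, uint32_t *count)
{
    unsigned char buf[4];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    /* Assemble the value byte by byte so the result does not depend
       on the host's integer size or endianness. */
    *count = (uint32_t)buf[0]
           | ((uint32_t)buf[1] << 8)
           | ((uint32_t)buf[2] << 16)
           | ((uint32_t)buf[3] << 24);
    return 0;
}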
The more general problem is that you will have to know what the size of the variables was on the machine that WROTE the file, not the machine that is reading them. In other words, you can say sizeof(int) and get, say, 8 on some crazy 64 bit system, but if the file was saved on a normal 32 bit machine, sizeof(int) may be 4 (or even 2, which ANSI C allows, I think). The sizeof operator will tell you the size of an int, or whatever, on your local machine at compile time. But it can't tell you anything about the machine that saved the file.
Your best bet is to see if the TDMS standard (I'm not familiar with it) defines variable sizes. If so, you should use those, rather than sizeof.
A poor second alternative is to have a test sequence at the beginning of the file and dynamically adjust your variable sizes until you can read the test sequence correctly.
Yes, there could potentially be an issue depending on what you do. For instance, if you rely on a pointer being 4 bytes or 8 bytes, this will be an issue. However, if you are doing something benign, then maybe not. I think we'd have to see the specific code to be able to tell you. In short, there should be a straightforward way to go about this without caring about whether you are on a 64-bit or 32-bit architecture.
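To illustrate the kind of assumption that breaks (a sketch, not taken from the asker's code), storing a pointer in a fixed 32-bit integer fails on a 64-bit build, whereas uintptr_t from <stdint.h> is sized for the platform's pointers:

#include <stdint.h>

void example(void *p)
{
    /* Breaks on a 64-bit build: a pointer no longer fits in 32 bits. */
    /* uint32_t bad = (uint32_t)p; */

    /* Portable: uintptr_t is defined to be wide enough to hold a
       pointer on whatever platform the code is compiled for. */
    uintptr_t ok = (uintptr_t)p;
    (void)ok;
}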
Related
Preface
So after a long time of C only work I came back to Delphi and found out that there are some new things in Delphi. One being NativeInt.
To my surprise I discovered that Delphi and C handle their "native integer"1 types differently for x86-64. Delphi's NativeInt seems to behave like C's void * and Delphi's Pointer, which is contrary to what I would expect from the names.
In Delphi NativeInt is 64 bit in size. Expressed in Code:
SizeOf(NativeInt) = SizeOf(Pointer) = SizeOf(Int64) = 8
C has only 64 bit pointers. int remains 32 bit. Expressed in Code2:
sizeof(int) == 4 != sizeof(void *) == 8
Even the Free Pascal Compiler3 agrees on the size of NativeInt.
Question
Why was 64 bit chosen for Delphi NativeInt and 32 bits for C int?
Of course both are valid according to the language documentation/specification. However, "the language allows for it" is not really a helpful answer.
I guess it has to do with speed of execution, as this is the main selling point of C today. Wikipedia and other sources all say that x86-64 has 64 bit operand registers. However, they also state that the default operand size is 32 bit. So maybe operations on 64 bit operands are slower compared to 32 bit operands? Or maybe the 64 bit registers can do two 32 bit operations at the same time? Is that a reason?
Is there maybe another reason the creators of the compilers did choose these sizes?
Footnotes
I am comparing Delphi NativeInt to C int because the name/specification suggests that they have a similar purpose. I know there is also Delphi Integer, which behaves like C int on x86 and x86-64 in Delphi.
sizeof() returns the size as a multiple of char in C. However, char is 8 bits (1 byte) on x86-64.
It does so in Delphi mode and default mode for NativeInt. The other integer types in default mode are a whole other can of worms.
NativeInt is simply an integer that is the same size as a pointer. Hence the fact that it changes size on different platforms. The documentation says exactly that:
The size of NativeInt is equivalent to the size of the pointer on the current platform.
The main use for NativeInt is to store things like operating system handles that behind the scenes are actually memory addresses. You are not expected to use it to perform arithmetic, store array lengths etc. If you attempt to do that then you make it much more difficult to share code between 32 and 64 bit versions of your program.
You can think of Delphi NativeInt as being directly equivalent to the .net type IntPtr. In C and C++ the OS handle types would commonly be declared as void*, which is a pointer type rather than an integer type. However, you could perfectly well use a type like intptr_t if you so wished.
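As a rough C sketch of that idea (the handle type and functions here are hypothetical, for illustration only), an address can be carried around in an intptr_t and converted back, with no arithmetic performed on it:

#include <stdint.h>

/* Hypothetical handle type: behind the scenes it is a memory address,
   so it is carried in an integer wide enough for a pointer on the
   current platform (32 or 64 bits). */
typedef intptr_t os_handle_t;

static int buffer[16];

os_handle_t get_handle(void)
{
    return (intptr_t)&buffer[0];   /* hand out the address as a handle */
}

int *resolve_handle(os_handle_t h)
{
    return (int *)h;               /* convert back; no arithmetic done */
}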
You use the term "native integer" to describe NativeInt, but in spite of the name it's very important to realise that NativeInt is not the native integer type of the language. That would be Integer. The native in NativeInt refers to the underlying hardware platform rather than the language.
The Delphi type Integer, the language native integer, matches up with the C type int, the corresponding language native type. And on Windows these types are 32 bits wide for both 32 and 64 bit systems.
When the Windows designers started working on 64 bit Windows, they had a keen memory of what had happened when int changed from 16 to 32 bits in the transition from 16 bit to 32 bit systems. That was no fun at all, although it was clearly the right decision. This time round, from 32 to 64, there was no compelling reason to make int a 64 bit type. Had the Windows designers done so, it would have made porting much harder work. And so they chose to leave int as a 32 bit type.
In terms of performance, the AMD64 architecture was designed to operate efficiently on 32 bit types. Since a 32 bit integer is half the size of a 64 bit integer, memory usage is reduced by keeping int at 32 bits on a 64 bit system, and that will have a performance benefit.
A couple of comments:
You state that "C has only 64 bit pointers". That is not so. A 32 bit C compiler will generally use a flat 32 bit memory model with 32 bit pointers.
You also say, "in Delphi NativeInt is 64 bit in size". Again that is not so. It is either 32 or 64 bits wide depending on the target.
Note that NativeInt is not meant to interact with a pointer!
The issue is that NativeInt is signed.
Normally this is not what you want, because the pointer points to the beginning of the data block; negative offsets will net you an access violation there.
If you have a pointer pointing to the middle (because you're doing indexing or something like that), then negative offsets apply and NativeInt aka IntPtr comes into view.
For standard pointers (pointing to the start): use UIntPtr, because it will not break down when the offset becomes bigger than 2^31/2^63.
(Likely on a 32-bit platform, not so much on 64-bit.)
For this reason there is a UIntPtr, which maps exactly to the C equivalent.
UIntPtr is a NativeUInt.
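In C terms, the same distinction might be sketched with uintptr_t and intptr_t (the buffer and offsets below are made up for illustration):

#include <stdint.h>

void offsets_example(unsigned char *base, unsigned char *middle)
{
    /* Pointer to the start of a block: offsets are never negative,
       so an unsigned pointer-sized integer is the natural fit. */
    uintptr_t start = (uintptr_t)base;
    unsigned char *fifth = (unsigned char *)(start + 5);

    /* Pointer into the middle of a block: offsets can be negative,
       which is where the signed intptr_t (IntPtr) comes in. */
    intptr_t mid = (intptr_t)middle;
    unsigned char *back_two = (unsigned char *)(mid - 2);

    (void)fifth;
    (void)back_two;
}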
Use cases
Which of the types you choose to use depends on the use case.
A: I want the fastest integer -> Choose Int32 aka integer;
B1: I want to have an integer to do pointer arithmetic -> Choose UIntPtr aka NativeUInt*.
B2: I do indexing with my pointer -> Choose IntPtr aka NativeInt.
C: I want a big integer, but don't want the big slowdown that Int64 gives me on X86 -> choose NativeInt.
D: I want a bigint: choose Int64. (but know that it will be slowish on X86).
*) if you want to make it clear to the reader of your code that you're messing with pointers you need to name it UIntPtr obviously.
Is fwrite portable? I'm not really facing the problem described below, but I'd like to understand the fundamentals of C.
Let's assume we have two machines, A8 (byte = 8 bits) and B16 (byte = 16 bits).
Will the following code produce the same output on both machines ?
unsigned char chars[10];
...
fwrite(chars,sizeof(unsigned char),10,mystream);
I guess A8 will produce 80 bits (10 octets) and B16 will produce 160 bits (20 octets).
Am I wrong?
This problem wouldn't appear if only uintN_t types were used, as their lengths in bits are independent of the size of the byte. But maybe uint8_t won't exist on B16.
What is the solution to this problem?
I guess building an array of uint32_t, putting my bytes in this array (with smart shifts and masks depending on the machine's architecture) and writing this array would solve the problem. But that's not really satisfactory: there is again an assumption that uint32_t exists on all platforms, and the filling of this array would be very dependent on the current machine's architecture.
Thanks for any response.
fwrite() is a standard library function, so it must be portable for each C compiler: it must be defined in that compiler's C standard library to support your machine.
So 8-bit, 16-bit and 32-bit machines all give you the same high-level operation.
But if you want to implement those library functions yourself, then you have to consider the machine's architecture and memory organization.
As a user of a C compiler you should not have to worry about the internal behaviour.
I think you just want to use those C library functions, so there is no difference in the behaviour of the function on different machines.
On almost every modern computer a byte is 8 bits. But there is another reason fwrite isn't portable:
A file which was written on a little-endian machine can't be read by a big-endian machine, and the other way around.
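One way around the endianness problem (a sketch under the assumption of 8-bit bytes; the function name is made up) is to pick a byte order for the file and write each value byte by byte instead of handing fwrite a whole integer:

#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value in a fixed (little-endian) byte order so the
   file reads back the same regardless of the writer's endianness. */
int write_u32_le(FILE *fp, uint32_t v)
{
    unsigned char buf[4];

    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);

    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}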
In C, char is defined as "smallest addressable unit of the machine". That is, char is not necessarily 8 bits.
In most cases, it's safe enough to rely on a fact that char is 8 bits, and not to deal with some extreme cases.
To speak generally, you probably won't be able to write "half of a byte" to a file in storage. Additionally, there will be portability issues at the hardware level between devices designed to work with machines of different byte sizes. If you are dealing with other devices (such as telecom gear), you will have to implement bit streams.
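If code does rely on char being 8 bits, it can at least say so explicitly; a minimal sketch using a compile-time check (C11's _Static_assert; older compilers need an equivalent trick):

#include <limits.h>

/* Fail the build on platforms where a byte is not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");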
Is there a simple way to compile 32-bit C code into a 64-bit application, with minimal modification? The code was not setup to use fixed type sizes.
I am not interested in taking advantage of 64-bit memory addressing. I just need to compile into a 64-bit binary while maintaining 4 byte longs and pointers.
Something like:
#define long int32_t
But of course that breaks a number of long use cases and doesn't deal with pointers. I thought there might be some standard procedure here.
There seem to be two orthogonal notions of "portability":
My code compiles everywhere out of the box. Its general behaviour is the same on all platforms, but details of available features vary depending on the platform's characteristics.
My code contains a folder for architecture-dependent stuff. I guarantee that MYINT32 is always 32 bit no matter what. I successfully ported the notion of 32 bits to the nine-fingered furry lummoxes of Mars.
In the first approach, we write unsigned int n; and printf("%u", n) and we know that the code always works, but details like the numeric range of unsigned int are up to the platform and not of our concern. (wchar_t comes in here, too.) This is what I would call the genuinely portable style.
In the second approach, we typedef everything and use types like uint32_t. Formatted output with printf triggers tons of warnings, and we must resort to monsters like PRIu32. In this approach we derive a strange sense of power and control from knowing that our integer is always 32 bits wide, but I hesitate to call this "portable" -- it's just stubborn.
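For what it's worth, that second style looks roughly like this (a small sketch; the variable is invented):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t n = 123456789u;

    /* PRIu32 expands to the correct printf conversion specifier
       for uint32_t on the current platform. */
    printf("n = %" PRIu32 "\n", n);
    return 0;
}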
The fundamental concept that requires a specific representation is serialization: The document you write on one platform should be readable on all other platforms. Serialization is naturally where we forgo the type system, must worry about endianness and need to decide on a fixed representation (including things like text encoding).
The upshot is this:
Write your main program core in portable style using standard language primitives.
Write well-defined, clean I/O interfaces for serialization.
If you stick to that, you should never even have to think about whether your platform is 32 or 64 bit, big or little endian, Mac or PC, Windows or Linux. Stick to the standard, and the standard will stick with you.
No, this is not, in general, possible. Consider, for example, malloc(). What is supposed to happen when it returns a pointer value that cannot be represented in 32 bits? How can that pointer value possibly be passed to your code as a 32 bit value, that will work fine when dereferenced?
This is just one example - there are numerous other similar ones.
Well-written C code isn't inherently "32-bit" or "64-bit" anyway - it should work fine when recompiled as a 64 bit binary with no modifications necessary.
Your actual problem is wanting to load a 32 bit library into a 64 bit application. One way to do this is to write a 32 bit helper application that loads your 32 bit library, and a 64 bit shim library that is loaded into the 64 bit application. Your 64 bit shim library communicates with your 32 bit helper using some IPC mechanism, requesting the helper application to perform operations on its behalf, and returning the results.
The specific case - a Matlab MEX file - might be a bit complicated (you'll need two-way function calling, so that the 64 bit shim library can perform calls like mexGetVariable() on behalf of the 32 bit helper), but it should still be doable.
The one area that will probably bite you is if any of your 32-bit integers are manipulated bit-wise. If you assume that some status flags are stored in a 32-bit register (for example), or if you are doing bit shifting, then you'll need to focus on those.
Another place to look would be any networking code that assumes the size (and endian) of integers passed on the wire. Once those get moved into 64-bit ints you'll need to make sure that you don't lose sign bits or precision.
Structures that contain integers will no longer be the same size. Any assumptions about size and alignment need to be cleaned out.
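For example (a sketch with a made-up struct), a structure holding a pointer typically grows when recompiled for 64 bits, and its padding changes too:

#include <stdio.h>

struct record {
    int   id;       /* 4 bytes under both common models        */
    void *payload;  /* 4 bytes under ILP32, 8 bytes under LP64 */
};

int main(void)
{
    /* Typically prints 8 on a 32-bit build and 16 on a 64-bit build
       (4 bytes of padding are added after id to align the pointer). */
    printf("sizeof(struct record) = %zu\n", sizeof(struct record));
    return 0;
}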
Variables of type int are allegedly "one machine-type word in length"
but in embedded systems, C compilers for 8-bit micros usually have a 16-bit int (and an 8-bit unsigned char); for wider parts, int behaves as you'd expect:
on 16-bit micros int is 16 bits too, on 32-bit micros int is 32 bits, and so on.
So, is there a standard way to test it, something like BITSIZEOF(int)?
Like sizeof is for bytes, but for bits.
This was my first idea:

#include <stdio.h>

int main(void)
{
    register c = 1;   /* implicit int (C89): the compiler takes c as int */
    int bitwidth = 0;

    do {
        bitwidth++;
    } while (c <<= 1);

    printf("Register bit width is: %d\n", bitwidth);
}
But it takes c as an int, and it's common for 8-bit compilers to use a 16-bit int, so it gives me 16 as the result. It seems there is no standard that makes int the register width (or it's not respected).
Why do I want to detect it? Suppose I need many variables that hold fewer than 256 values, so they could be 8, 16 or 32 bits; using the right size (matching memory and registers) will speed things up and save memory, and if this can't be decided in code, I have to re-write the function for every architecture.
EDIT
After reading the answers I found this good article:
http://embeddedgurus.com/stack-overflow/category/efficient-cc/page/4/
I will quote the conclusion (emphasis added):
Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is start using the C99 data types 'least' and 'fast'. If your compiler isn't C99 compliant then complain until it is – or change vendors. If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.
I have to re-write the function for every architecture
No you don't. Use C99's stdint.h, which has types like uint_fast8_t, which will be a type capable of holding 256 values, and quickly.
Then, no matter the platform, the types will change accordingly and you don't change anything in your code. If your platform has no set of these defined, you can add your own.
Far better than rewriting every function.
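A small sketch of that approach (the loop bound is arbitrary, just for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Holds 0..255 everywhere, but the compiler is free to pick a
       wider, register-sized type if that is faster on the target. */
    uint_fast8_t counter;

    for (counter = 0; counter < 200; counter++) {
        /* ... work ... */
    }

    printf("uint_fast8_t occupies %zu bytes here\n", sizeof(uint_fast8_t));
    return 0;
}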
To answer your deeper question more directly, if you have a need for very specific storage sizes that are portable across platforms, you should use something like stdint.h, which defines storage types specified by the number of bits.
For example, uint32_t is always unsigned 32 bits and int8_t is always signed 8 bits.
#include <limits.h>
const int bitwidth = sizeof(int) * CHAR_BIT;
The ISA you're compiling for is already known to the compiler when it runs over your code, so your best bet is to detect it at compile time. Depending on your environment, you could use everything from autoconf/automake style stuff to lower level #ifdef's to tune your code to the specific architecture it'll run on.
I don't exactly understand what you mean by "there is no standard that makes int the register width". In the original C language specification (C89/90) the type int is implied in certain contexts when no explicit type is supplied. Your register c is equivalent to register int c, and that is perfectly standard in C89/90. Note also that the C language specification requires type int to support at least the -32767...+32767 range, meaning that on any platform int will have at least 16 value-forming bits.
As for the bit width... sizeof(int) * CHAR_BIT will give you the number of bits in the object representation of type int.
Theoretically though, the value representation of type int is not guaranteed to use all bits of its object representation. If you need to determine the number of bits used for value representation, you can simply analyze the INT_MIN and INT_MAX values.
P.S. Looking at the title of your question, I suspect that what you really need is just the CHAR_BIT value.
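A small sketch of the INT_MIN/INT_MAX analysis mentioned above (assuming nothing beyond the standard headers):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Count the value-forming bits of int by walking INT_MAX,
       then add one for the sign bit. */
    int value_bits = 0;
    unsigned int max = (unsigned int)INT_MAX;

    while (max != 0) {
        value_bits++;
        max >>= 1;
    }

    printf("int: %d value bits + 1 sign bit, %zu bits of storage\n",
           value_bits, sizeof(int) * CHAR_BIT);
    return 0;
}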
Does an unsigned char or unsigned short suit your needs? Why not use that? If not, you should be using compile time flags to bring in the appropriate code.
I think that in this case you don't need to know how many bits your architecture has. Just use variables as small as possible if you want to optimize your code.
I've always used typedef in embedded programming to avoid common mistakes:
int8_t - 8 bit signed integer
int16_t - 16 bit signed integer
int32_t - 32 bit signed integer
uint8_t - 8 bit unsigned integer
uint16_t - 16 bit unsigned integer
uint32_t - 32 bit unsigned integer
The recent Embedded Muse (issue 177, not on the website yet) introduced me to the idea that it's useful to have some performance-specific typedefs. This standard suggests having typedefs that indicate you want the fastest type that has a minimum size.
For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor, as those would be the fastest types of at least 16 bits on those platforms. On an 8 bit processor it would be int16_t to meet the minimum size requirement.
Having never seen this usage before I wanted to know
Have you seen this in any projects, embedded or otherwise?
Any possible reasons to avoid this sort of optimization in typedefs?
For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor as those would be the fastest types of at least 16 bits on those platforms
That's what int is for, isn't it? Are you likely to encounter an 8-bit CPU any time soon, where that wouldn't suffice?
How many unique datatypes are you able to remember?
Does it provide so much additional benefit that it's worth effectively doubling the number of types to consider whenever I create a simple integer variable?
I'm having a hard time even imagining the possibility that it might be used consistently.
Someone is going to write a function which returns an int_fast16_t, and then someone else is going to come along and store that value into an int16_t.
Which means that in the obscure case where the fast variants are actually beneficial, it may change the behavior of your code. It may even cause compiler errors or warnings.
Check out stdint.h from C99.
The main reason I would avoid this typedef is that it allows the type to lie to the user. Take int16_t vs int_fast16_t. Both type names encode the size of the value into the name. This is not an uncommon practice in C/C++. I personally use the size specific typedefs to avoid confusion for myself and other people reading my code. Much of our code has to run on both 32 and 64 bit platforms and many people don't know the various sizing rules between the platforms. Types like int32_t eliminate the ambiguity.
If I had not read the 4th paragraph of your question and instead just saw the type name, I would have assumed it was some scenario specific way of having a fast 16 bit value. And I obviously would have been wrong :(. For me it would violate the "don't surprise people" rule of programming.
Perhaps if it had another distinguishing verb, letter, acronym in the name it would be less likely to confuse users. Maybe int_fast16min_t ?
When I am looking at int_fast16_t and am not sure about the native width of the CPU on which it will run, things may get complicated, for example with the ~ operator.
int_fast16_t i = 10;
int16_t j = 10;
if (~i != ~j) {
// scary !!!
}
Somehow, I would like to willfully use 32 bit or 64 bit based on the native width of the processor.
I'm actually not much of a fan of this sort of thing.
I've seen this done many times (in fact, we even have these typedefs at my current place of employment)... For the most part, I doubt their true usefulness... It strikes me as change for change's sake... (and yes, I know the sizes of some of the built-ins can vary)...
I commonly use size_t; it happens to be the fastest address-sized type, a tradition I picked up from embedded work. It never caused any issues or confusion in embedded circles, but it actually began causing me problems when I started working on 64-bit systems.