The title is actually a bit misleading, but I wanted to keep it short. I've read about why I should use size_t and I often found statements like this:
size_t is guaranteed to be able to express the maximum size of any object, including any array
I don't really understand what that means. Is there some kind of cap on how much memory you can allocate at once and size_t is guaranteed to be large enough to count every byte in that memory block?
Follow-up question:
What determines how much memory can be allocated?
Let's say the biggest object your compiler/platform can have is 4 GB. size_t is then 32 bits wide. Now let's say you recompile your program on a 64-bit platform able to support objects of size 2^43 - 1. size_t will then be at least 43 bits wide (but normally it will be 64 bits at this point). The point is that you only have to recompile the program. You don't have to change all your ints to long (if int is 32 bits and long is 64 bits) or from int32_t to int64_t.
(If you are asking yourself why 43 bits: Windows Server 2008 R2 64-bit doesn't support objects of size 2^63, nor objects of size 2^62... it supports 8 TB of addressable space, and 8 TB is 2^43 bytes, so 43 bits!)
Many programs written for Windows assumed a pointer to be the same size as a DWORD (a 32-bit unsigned integer). These programs can't be recompiled for 64 bits without rewriting large swaths of code. Had they used DWORD_PTR (an unsigned value guaranteed to be big enough to contain a pointer), they wouldn't have had this problem.
The size_t "point" is the similar. but different!
size_t isn't guaranteed to be able to contain a pointer!!
(the DWORD_PTR of Microsoft Windows is)
This, in general, is not guaranteed to work:
void *p = ...
size_t p2 = (size_t)p;
For example, on the old DOS "platform", the maximum size of an object was 64 KB, so size_t needed to be only 16 bits, BUT a far pointer needed to be at least 20 bits, because the 8086 had a memory space of 1 MB (in the end a far pointer was 16 + 16 bits, because the memory of an 8086 was segmented).
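If you do need an integer type that can round-trip a pointer, the portable choice (where available, since C99) is uintptr_t from stdint.h rather than size_t. A minimal sketch; the sizes printed will vary by platform:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    void *p = &x;

    /* uintptr_t (an optional type in the standard) is guaranteed, where it exists,
     * to hold a void pointer converted to an integer... */
    uintptr_t ip = (uintptr_t)p;
    void *q = (void *)ip;   /* converts back to a pointer that compares equal to p */

    /* ...whereas size_t is only guaranteed to cover object sizes. */
    printf("sizeof(void *) = %zu, sizeof(size_t) = %zu, sizeof(uintptr_t) = %zu\n",
           sizeof(void *), sizeof(size_t), sizeof(uintptr_t));
    printf("round-trip ok: %d\n", q == p);
    return 0;
}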
Basically it means that size_t is guaranteed to be large enough to index any array and to hold the size of any data type.
It is preferred over plain int, because the size of int and other integer types can be smaller than what needs to be indexed. For example, int is usually 32 bits wide, which is not enough to index large arrays on 64-bit machines. (This is actually a very common problem when porting programs to 64-bit.)
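As an illustration (the function name and buffer here are invented for the example), indexing with size_t keeps a loop correct even when the element count exceeds what an int can hold:

#include <stdlib.h>

/* Hypothetical example: sum a large byte buffer.
 * With `int i` the index could overflow once n exceeds INT_MAX;
 * size_t is guaranteed to cover the size of any single object. */
unsigned long long sum_bytes(const unsigned char *buf, size_t n) {
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}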
That is exactly the reason.
The maximum size of any object in a given programming language is determined by a combination of the OS, the CPU architecture and the compiler/linker in use.
size_t is defined to be big enough to hold the size value of the largest possible object.
This usually means that size_t is typedef'ed to an unsigned integer type as wide as the platform's pointers, such as unsigned long or unsigned long long.
So in a 32-bit environment it would typically be 4 bytes and on a 64-bit system 8 bytes.
size_t is defined for the platform that you are compiling for. Hence it can represent the maximum for that platform.
size_t is the type of the result of the sizeof operator (see 7.17 of C99); therefore it must be able to describe the size of the largest possible object the system can represent.
Have a look at
http://en.wikipedia.org/wiki/Size_t
My understanding is that size_t is a type large enough to represent (or address) any memory position in a given architecture.
For instance, on a 32-bit machine size_t should be able to represent at least 2^32 values. This means that sizeof(size_t) must be >= 4 on 32-bit architectures, right?
So what should sizeof(size_t) be in code that's meant to run on a GPU?
Since many GPUs have more than 4 GB of memory, sizeof(size_t) must be at least 5. But I imagine it's 8, for alignment purposes.
Roughly speaking, size_t should be able to represent the size of any single allocated object. This might be smaller than the total address space though.
For example, in a 16-bit MS-DOS program one memory model had a 16-bit size_t even though many megabytes of memory were available and pointers were 32 bits. But you could not allocate any particular chunk of memory larger than 64K.
It would be up to the compiler writer for the GPU to make size_t have some size that is large enough for the largest possible allocation on that GPU. As you say, this is likely to be a power of 2 (but not guaranteed).
The type used to represent any memory position is void *.
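A quick check along those lines (this is a plain host-side C sketch, not GPU code; the output depends entirely on the platform and toolchain):

#include <stddef.h>
#include <stdio.h>

int main(void) {
    /* size_t covers the size of the largest single object; void * covers addresses.
     * The two widths often coincide, but the C standard does not require it. */
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}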
Hi all, I want to use the following:
int i;
unsigned short int **graph;

graph = (unsigned short int **)malloc(sizeof(unsigned short int *) * 65535);
if (graph == NULL) fprintf(stderr, "out of memory\n");
for (i = 0; i < 65535; i++) {
    graph[i] = (unsigned short int *)malloc(sizeof(unsigned short int) * 65535);
    if (graph[i] == NULL) fprintf(stderr, "out of memory\n");
}
The size 65535 is constant; I need to build a graph of this size. Is it possible? Would it help if I split it up? Thanks!
There are four different issues to consider here:
1) The size of the argument of malloc. It is of type size_t, which is an unsigned integer type that is at least 16 bits and large enough to hold the size of any object or the index of any array. In practice it tends to be the platform's native word size, i.e., 32 bits on 32-bit platforms and 64 bits on 64-bit platforms, so you are likely to need at least a 32-bit platform, but this is almost certainly the case unless you are developing for embedded (or very retro) systems.
One should also remember that the argument may overflow and you can silently end up successfully allocating less memory than you thought you'd get (e.g., you may effectively call malloc(65534) when you thought you were calling malloc(2 * 65535)). But in this case it is very unlikely to be an issue for any platform capable of allocating this amount of memory.
2) Whether the malloc calls succeed. You are already checking for this, so simply running the code will answer this. You are allocating over 8 GB† of memory here, so it is likely that it will fail unless compiled for 64 bits (since the maximum addressable memory for 32 bits is 4 GB).
3) Whether you can actually use all the memory you've allocated. Some operating systems will overcommit memory and allow you to allocate much more memory than is actually available. You may run into trouble if you actually try to use all the memory you've allocated. This depends on the OS and the amount of memory actually available, possibly including swap.
4) Whether it is practical for the machine the program is run on to actually have that much data in memory. Even if the malloc calls succeed and the OS lets you use the memory allocated, it is still over 8 GB, which means that a typical machine should probably have at least 12 GB of RAM installed to accommodate this, the OS, and other programs. Otherwise it may swap like crazy, despite theoretically working.
You have revealed in comments that you are running a 64-bit machine with 4 GB of RAM installed, so if you compile for 64 bits the first two points are not an issue, but point 3 may be, and point 4 almost certainly will be. So, either install more RAM or figure out a different way to handle the data (e.g., if you are storing a graph as per the variable name, perhaps it is often sparse enough that you don't need to allocate for the worst case).
† “over 8 GB” comes from 65535 * sizeof(short *) + 65535 * 65535 * sizeof(short), where sizeof(short) is very likely to be 2, and sizeof(short *) (the pointer size) either 4 or 8. There is also some extra overhead for malloc's bookkeeping, but still it rounds to “over 8 GB”.
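For completeness, a tiny program that checks that arithmetic (the 2-byte short and 8-byte pointer are assumptions about a typical 64-bit platform):

#include <stdio.h>

int main(void) {
    /* Assuming sizeof(short) == 2 and sizeof(short *) == 8, as is typical on 64-bit platforms. */
    unsigned long long pointers = 65535ULL * 8;          /* the array of row pointers       */
    unsigned long long data     = 65535ULL * 65535 * 2;  /* 65535 rows of 65535 shorts each */
    printf("total = %llu bytes (about %.1f GB)\n",
           pointers + data, (pointers + data) / 1e9);
    return 0;
}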
Some stylistic observations (a combined sketch applying them follows this list):
It would be better style to use one of the types from stdint.h if you want specifically 16 bits, e.g., uint16_t or uint_least16_t
You should not cast the return value of malloc in C (unlike in C++)
You can replace sizeof(unsigned short int *) with sizeof(*graph) and sizeof(unsigned short int) with sizeof(**graph) to avoid repetition (and allow you to change the type of graph without changing the malloc calls)
You don't need the int in unsigned short int
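Putting those suggestions together, a minimal sketch of the allocation might look like the following (the error handling simply bails out; a real program would free what was already allocated):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 65535u

int main(void) {
    /* sizeof *graph / sizeof **graph adapt automatically if the element type changes. */
    uint16_t **graph = malloc(N * sizeof *graph);
    if (graph == NULL) {
        fprintf(stderr, "out of memory\n");
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < N; i++) {
        graph[i] = malloc(N * sizeof **graph);
        if (graph[i] == NULL) {
            fprintf(stderr, "out of memory\n");
            return EXIT_FAILURE; /* a real program would free graph[0..i-1] and graph here */
        }
    }
    /* ... use graph ... */
    return 0;
}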
The maximum allowed size is bounded by the range of size_t. So on a typical 16-bit platform the maximum is 65,535, and on a typical 32-bit platform it is 4,294,967,295.
The maximum size of non-arrayed data is SIZE_MAX. SIZE_MAX is at least 65535.
SIZE_MAX has the type of size_t, which is often the same type as unsigned but may be different.
The single largest allocation available using malloc() is SIZE_MAX.
void *malloc(size_t size);
sizeof(unsigned short int *) * 65535 may fail due to integer math overflow.
To allocate an array larger than SIZE_MAX (but each element is still <= SIZE_MAX), use calloc().
void *calloc(size_t nmemb, size_t size);
unsigned short int **graph = calloc(65535u, sizeof *graph);
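The practical point in this particular case: calloc() receives the element count and the element size as separate arguments, so the product cannot silently wrap the way a hand-written multiplication passed to malloc() can on a platform with a small size_t. A hedged sketch of the difference:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* With a 16-bit size_t this multiplication wraps before malloc ever sees it,
     * so malloc may "succeed" while handing back far less memory than intended. */
    void *a = malloc(sizeof(unsigned short *) * 65535);

    /* calloc gets the count and the element size separately, so a conforming
     * implementation returns NULL rather than allocating a wrapped-around size. */
    void *b = calloc(65535u, sizeof(unsigned short *));

    printf("malloc: %p, calloc: %p\n", a, b);
    free(a);
    free(b);
    return 0;
}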
In a 64-bit CPU, if the int is 32 bits whereas the long is 64 bits, would the long be more efficient than the int?
The main problem with your question is that you did not define "efficient". There are several possible efficiency related differences.
Of course if you need to use 64 bits, then there's no question. But sometimes you could use 32 bits and you wonder if it would be better to use 64 bits instead.
Data Size Efficiency
Using 32 bits will use less memory. This is more efficient, especially if you use a lot of them. Not only is it more efficient in the sense that you are less likely to end up swapping, but also in the sense that you'll have fewer cache misses. If you only use a few, the efficiency difference is irrelevant.
Code Size Efficiency
This is heavily dependent on the architecture. Some architectures need longer instructions to manipulate 32-bit values, others need longer instructions to manipulate 64-bit values, and for others it makes no difference. On Intel processors, for example, 32 bits is the default operand size even in 64-bit code. Smaller code may have a small advantage both in cache behavior and in pipeline usage. But which operand size produces smaller code depends on the architecture.
Execution Speed Efficiency
In general there should be no difference beyond the one implied by code size. Once an instruction has been decoded, the timing for mere execution is generally identical. However, once again, this is in fact architecture specific. There are architectures that do not have native 32-bit arithmetic, for example.
My suggestion:
If it's just some local variables or data in small structures that you do not allocate in huge quantities, use int and do it in a way that does not assume a size, so that a new version of the compiler or a different compiler that uses a different size for int will still work.
However if you have huge arrays or matrixes, then use the smallest type you can use and make sure its size is explicit.
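A sketch of that advice (the matrix dimensions here are invented): plain int for ordinary locals and loop counters, and an explicit-width type from stdint.h for the bulk data whose footprint matters:

#include <stdint.h>
#include <stdlib.h>

enum { ROWS = 10000, COLS = 10000 };  /* hypothetical sizes: ~400 MB of int32_t */

/* The explicit int32_t fixes the cost at 4 bytes per element, no matter
 * what size the compiler happens to pick for int. */
int32_t *make_matrix(void) {
    return calloc((size_t)ROWS * COLS, sizeof(int32_t));
}

/* Ordinary local variables can stay plain int; just don't assume its exact size. */
int32_t sum_row(const int32_t *m, int row) {
    int32_t total = 0;
    for (int c = 0; c < COLS; c++)
        total += m[(size_t)row * COLS + c];
    return total;
}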
On the common x86-64 architecture, 32-bit arithmetic is never slower than 64-bit arithmetic. So int is always the same speed or faster than long. On other architectures that don't actually have built-in 32-bit arithmetic, such as the MMIX, this might not hold.
Basic wisdom holds: Write it without considering such micro-optimizations and if necessary, profile and optimize.
If you are trying to store 64 bits of data, use a long. If you aren't going to need the 64 bits, use the regular 32-bit int.
Yes, a 64-bit number would be more efficient than a 32-bit number.
On a 64-bit CPU most compilers will give you 64 bits if you ask for a long int, though.
To see the size with your current compiler:
#include <stdio.h>

int main(int argc, char **argv){
    long int foo;
    printf("The size of a long int is: %zu bytes\n", sizeof(foo));
    printf("The size of a long int is: %zu bits\n", sizeof(foo) * 8);
    return 0;
}
If your CPU is running in 64-bit mode you can expect it to work in 64 bits regardless of what you ask for. All the registers are 64-bit and the operations are 64-bit, so if you want a 32-bit result it will generally truncate the 64-bit result to 32 bits for you.
The limits.h on my system defines long int as:
/* Minimum and maximum values a `signed long int' can hold. */
# if __WORDSIZE == 64
# define LONG_MAX 9223372036854775807L
# else
# define LONG_MAX 2147483647L
# endif
# define LONG_MIN (-LONG_MAX - 1L)
malloc is declared as follows:
void *malloc(size_t size);
http://pubs.opengroup.org/onlinepubs/009695399/functions/malloc.html
size_t definition (stddef.h):
size_t: Unsigned integer type of the result of the sizeof operator.
http://pubs.opengroup.org/onlinepubs/009604499/basedefs/stddef.h.html
But according to this page, the maximum limit of size_t is 65535.
(Section Limits of Other Integer Types):
Limit of size_t: SIZE_MAX 65535
http://pubs.opengroup.org/onlinepubs/007904975/basedefs/stdint.h.html
Does it mean I cannot allocate more than 65535 bytes if I want to respect the C standard?
SIZE_MAX must be at least 65535. If you're running something like MS-DOS, chances are it'll actually even be that small. On a typical, reasonably current desktop computer (say, anything less than 10 years old) you can expect it to be larger, typically at least around 4 billion (2^32 - 1, to be more exact).
Whether you need to (try to) deal with a more limited system will depend on the range of targets to which you might care about porting your code. If you really might need to deal with a 16-bit compiler on a system with less than, say, 1 megabyte of addressable memory, then you'll have to write your code with that in mind. In all honesty, however, for most people that's simply irrelevant -- even relatively small portable systems (e.g., an iPod) can address far more memory than that anymore. OTOH, if you're writing code for a singing greeting card, then yes, such limitations probably come with the territory (but in such cases, the standard is often something to treat more as a general guideline than an absolute law).
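If your code really does depend on larger allocations, one option is to state that requirement up front so the build fails early on a tiny target. A minimal sketch using the standard SIZE_MAX macro (the 1 MB threshold is just an example requirement):

#include <stdint.h>
#include <stdlib.h>

/* Refuse to build on platforms whose size_t cannot describe the buffers we need. */
#if SIZE_MAX < 1048576
#error "This program needs size_t to cover at least 1 MB allocations"
#endif

int main(void) {
    unsigned char *buf = malloc(1048576); /* 1 MB, representable given the check above */
    /* ... */
    free(buf);
    return 0;
}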
The minimum value of SIZE_MAX is 65535, but it can be (and usually is) more.
On most non-embedded platforms, size_t is a typedef for unsigned long (or unsigned long long) and SIZE_MAX is set to that type's maximum, e.g., ULONG_MAX.
On a 32-bit platform SIZE_MAX is usually 2^32 - 1 and on a 64 bit platform it is 2^64 - 1. Check with a printf if unsure.
printf("sizeof size_t = %zx, SIZE_MAX = %zx\n", sizeof(size_t), SIZE_MAX);
Include stdint.h to get the value of SIZE_MAX.
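Wrapped into a complete, compilable program (the values are printed in hex and will vary by platform):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* SIZE_MAX comes from <stdint.h>; %zx prints a size_t in hexadecimal. */
    printf("sizeof size_t = %zx, SIZE_MAX = %zx\n", sizeof(size_t), (size_t)SIZE_MAX);
    return 0;
}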
On my system, I get:
sizeof ( int ) = 4
sizeof ( long ) = 4
When I checked with a C program, both int & long overflowed to the negative after:
a = 2147483647;
a++;
If both can represent the same range of numbers, why would I ever use the long keyword?
int has a minimum range of -32767 to 32767, whereas long has a minimum range of -2147483647 to 2147483647.
If you are writing portable code that may have to compile on different C implementations, then you should use long if you need that range. If you're only writing non-portable code for one specific implementation, then you're right - it doesn't matter.
Because sizeof(int) == sizeof(long) isn't always true. int normally represents the fastest natural size, with at least 16 bits; long, on the other hand, is at least 32 bits.
C defines a number of integer types and specifies the relation of their sizes. Basically, what it says is that sizeof(long long) >= sizeof(long) >= sizeof(int) >= sizeof(short) >= sizeof(char), and that sizeof(char) == 1.
But the actual sizes are not defined, and depend on the architecture you are running on. On a 32-bit PC, int and long are typically four bytes and long long is 8 bytes. But on a 64-bit system, long is typically 8 bytes, and thus different from int.
There is also a type called uintptr_t (and intptr_t) that is guaranteed to be able to hold the value of a data pointer converted to an integer.
The important thing to remember is to not assume that you can, for example, store pointer values in a long or an int. Being portable is probably more important than you think, and it is likely that you will want to compile your code on a 64-bit system in the near future.
I think it's more of a compiler issue nowadays, since computers have become much faster and programs demand bigger numbers than they used to.
On a different platform or with a different compiler, int and long may be different sizes.
If you don't plan to port your code to anything else or use a different machine, then pick the one you want; it won't make a difference.
It depends on the compiler, and you might want to check this out: What does the C++ standard state the size of int, long type to be?
The size of the built-in data types varies depending on the C implementation, but they all have minimum ranges. Nowadays, int is typically 4 bytes (32 bits) on mainstream desktop and server platforms. Note that char is always 1 byte.
The size of a data type depends on the compiler; different compilers have different sizes for int and other data types. So if you write code that is going to run on different machines, you should use long, or choose the type based on the range of values your variable may hold.