Is it possible to allocate a huge array in C?

Hi all, I want to use the following:

int i;
unsigned short int **graph;

graph = (unsigned short int **) malloc(sizeof(unsigned short int *) * 65535);
if (graph == NULL)
    fprintf(stderr, "out of memory\n");
for (i = 0; i < 65535; i++) {
    graph[i] = (unsigned short int *) malloc(sizeof(unsigned short int) * 65535);
    if (graph[i] == NULL)
        fprintf(stderr, "out of memory\n");
}
The size 65535 is constant; I need to build a graph of this size. Is it possible? Will it help if I split it?
Thanks!

There are four different issues to consider here:
1) The size of the argument of malloc. It is of type size_t, which is an unsigned integer type that is at least 16 bits and large enough to hold the size of any object or the index of any array. In practice it tends to be the platform's native word size, i.e., 32 bits for 32-bit platforms and 64 bits for 64-bit platforms, so you are likely to need at least a 32-bit platform, but this is almost certainly the case unless you are developing for embedded (or very retro) systems.
One should also remember that the argument may overflow and you can silently end up successfully allocating less memory than you thought you'd get (e.g., you may effectively call malloc(65534) when you thought you were calling malloc(2 * 65535)). But in this case it is very unlikely to be an issue for any platform capable of allocating this amount of memory.
2) Whether the malloc calls succeed. You are already checking for this, so simply running the code will answer this. You are allocating over 8 GB† of memory here, so it is likely that it will fail unless compiled for 64 bits (since the maximum addressable memory for 32 bits is 4 GB).
3) Whether you can actually use all the memory you've allocated. Some operating systems will overcommit memory and allow you to allocate much more memory than is actually available. You may run into trouble if you actually try to use all the memory you've allocated. This depends on the OS and the amount of memory actually available, possibly including swap.
4) Whether it is practical for the machine the program is run on to actually have that much data in memory. Even if the malloc calls succeed and the OS lets you use the memory allocated, it is still over 8 GB, which means that a typical machine should probably have at least 12 GB of RAM installed to accommodate this, the OS, and other programs. Otherwise it may swap like crazy, despite theoretically working.
You have revealed in comments that you are running a 64-bit machine with 4 GB of RAM installed, so if you compile for 64 bits the first two points are not an issue, but point 3 may be, and point 4 almost certainly will be. So, either install more RAM or figure out a different way to handle the data (e.g., if you are storing a graph as per the variable name, perhaps it is often sparse enough that you don't need to allocate for the worst case).
† “over 8 GB” comes from 65535 * sizeof(short *) + 65535 * 65535 * sizeof(short), where sizeof(short) is very likely to be 2, and sizeof(short *) (the pointer size) either 4 or 8. There is also some extra overhead for malloc's bookkeeping, but still it rounds to “over 8 GB”.
Some stylistic observations:
It would be better style to use one of the types from stdint.h if you want specifically 16 bits, e.g., uint16_t or uint_least16_t.
You should not cast the return value of malloc in C (unlike in C++).
You can replace sizeof(unsigned short int *) with sizeof(*graph) and sizeof(unsigned short int) with sizeof(**graph) to avoid repetition (and to allow you to change the type of graph without changing the malloc calls).
You don't need the int in unsigned short int.
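Putting these observations together, a minimal sketch of the allocation (same dimensions as the question; the N macro and the unwinding error path are additions here, not part of the original code):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 65535u

int main(void)
{
    /* sizeof *graph and sizeof **graph track the type of graph automatically */
    uint16_t **graph = malloc(sizeof *graph * N);
    if (graph == NULL) {
        fprintf(stderr, "out of memory\n");
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < N; i++) {
        graph[i] = malloc(sizeof **graph * N);
        if (graph[i] == NULL) {
            fprintf(stderr, "out of memory\n");
            while (i > 0)
                free(graph[--i]);   /* release the rows already allocated */
            free(graph);
            return EXIT_FAILURE;
        }
    }

    /* ... use graph ... */

    for (size_t i = 0; i < N; i++)
        free(graph[i]);
    free(graph);
    return EXIT_SUCCESS;
}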

The maximum allowed size of a single allocation is bounded by the range of size_t. On a 16-bit platform that may be as small as 65535; on a typical 32-bit platform it is 4,294,967,295.

The maximum size of a single (non-array) object is SIZE_MAX bytes, and SIZE_MAX is at least 65535.
SIZE_MAX is of type size_t, which is often the same type as unsigned, but may be different.
The single largest allocation available using malloc() is SIZE_MAX bytes.
void *malloc(size_t size);
sizeof(unsigned short int *) * 65535 may fail due to integer math overflow.
To allocate an array larger than SIZE_MAX (but each element is still <= SIZE_MAX), use calloc().
void *calloc(size_t nmemb, size_t size);
unsigned short int **graph = calloc(65535u, sizeof *graph);
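Continuing in the same style for the rows, a sketch with the error handling left as a stub; calloc performs the element-count multiplication with an overflow check, unlike a hand-written sizeof(...) * 65535:

for (size_t i = 0; i < 65535u; i++) {
    graph[i] = calloc(65535u, sizeof **graph);   /* overflow-checked multiply */
    if (graph[i] == NULL) {
        /* handle out of memory */
    }
}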

Related

What's the sizeof of size_t on a GPU?

My understanding is that size_t is a type large enough to represent (or address) any memory position in a given architecture.
For instance, on a 32-bit machine size_t should be able to represent at least 2^32 values. This means that sizeof(size_t) must be >= 4 on 32-bit architectures, right?
So what should sizeof(size_t) be in code that's meant to run on a GPU?
Since many GPUs have more than 4 GB, sizeof(size_t) must be at least 5. But I imagine it's 8, for alignment purposes.
Roughly speaking, size_t should be able to represent the size of any single allocated object. This might be smaller than the total address space though.
For example, in a 16-bit MS-DOS program, one memory model had a 16-bit size_t even though many megabytes of memory were available and pointers were 32-bit. But you could not allocate any particular chunk of memory larger than 64K.
It would be up to the compiler writer for the GPU to make size_t have some size that is large enough for the largest possible allocation on that GPU. As you say, this is likely to be a power of 2 (but not guaranteed).
The type used to represent any memory position is void *.
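There is no single answer across GPU toolchains, but what a given compiler actually uses is easy to check; a minimal host-side sketch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(void *) = %zu\n", sizeof(void *));   /* may differ from size_t */
    printf("SIZE_MAX       = %zu\n", (size_t)SIZE_MAX);
    return 0;
}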

Are there any actual implementations that permit `char array[SIZE_MAX];` (or successful equivalent using `malloc`)?

The C99 standard suggests that the type size_t is large enough to store the size of any object, as it is the resulting type of the sizeof operator.
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. ...
The value of the result ... is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
Since SIZE_MAX (<stdint.h>) is defined to be the largest value that a size_t can store, it should follow that the largest object would have a size equal to SIZE_MAX. That would in fact be helpful, but alas it seems as though we'd be asking quite a lot to allocate anything even one quarter that size.
Are there any implementations where you can actually declare (or otherwise allocate) an object as large as SIZE_MAX?
It certainly doesn't seem to be the common case... In C11 the optional rsize_t type and its corresponding RSIZE_MAX macro were introduced. It's supposed to be a runtime constraint if any standard C function is used with a value greater than RSIZE_MAX as an rsize_t argument. This seems to imply that the largest object might be RSIZE_MAX bytes. However, this doesn't seem to be widely supported, either!
Are there any implementations where RSIZE_MAX exists and you can actually declare (or otherwise allocate) an object as large as RSIZE_MAX?
I think all C implementations will allow you to declare an object of that size. The OS may refuse to load the executable, as there isn't enough memory.
Likewise, all C runtime libraries will allow you to attempt to allocate that size of memory, however, it will probably fail as there isn't that much memory, neither virtual nor real.
Just think: if size_t is a type equal to the machine word size (32 bits, 64 bits), then the highest addressable memory cell (byte) is 2^32 (or 2^64). Given that low memory is taken up by interrupt vectors, the BIOS and the OS, not to mention the code and data of your own program, this amount of memory is never actually available.
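This is easy to probe empirically; a sketch (on systems that overcommit, even a non-NULL return does not prove the memory is usable):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(SIZE_MAX);   /* request the largest size expressible in size_t */
    printf("malloc(SIZE_MAX) returned %p\n", p);
    free(p);                      /* free(NULL) is a harmless no-op */
    return 0;
}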
Oh, now I see the issue. Assuming a realloc growth factor of ~1.5, there are two integer limits. The first is SIZE_MAX / 3, because above that size * 3 / 2 will overflow. But at sizes that large the low bits are insignificant, so we can invert the operator order to size / 2 * 3 and still grow by ~1.5 (which imposes the second limit, SIZE_MAX / 3 * 2). As a last resort, clamp to SIZE_MAX.
After that, we just have to search downward for the amount the system can actually allocate (in the range from the result of growing down to the minimal required size).
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc */

int
grow(char **data_p, size_t *size_p, size_t min)
{
    size_t size = *size_p;

    /* Grow by ~1.5x, switching formulas near the SIZE_MAX limits. */
    while (size < min)
        size = (size < 2 ? 4 :   /* seed: at 0 or 1, size * 3 / 2 would stall */
                size <= SIZE_MAX / 3 ? size * 3 / 2 :
                size <= SIZE_MAX / 3 * 2 ? size / 2 * 3 : SIZE_MAX);
    if (size != *size_p) {
        size_t ext = size - min;   /* headroom above the hard minimum */
        char *data;

        /* Halve the headroom until an allocation succeeds or it hits zero. */
        for (;; ext /= 2)
            if ((data = realloc(*data_p, min + ext)) || ext == 0)
                break;
        if (data == NULL) return -1; // ENOMEM
        *data_p = data;
        *size_p = min + ext;
    }
    return 0;
}
And no OS-dependent or manual limits!
As you can see, the original question is a consequence of a [probably] imperfect implementation that doesn't treat edge cases with respect. It doesn't matter whether any of the systems in question do or do not exist now: any algorithm that is supposed to work near the integer limits should take proper care of them.
(Please note that the code above assumes char data and does not account for other element sizes; implementing that would add more complex checks.)
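For illustration, a hypothetical caller, assuming the grow() above is in scope:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buf = NULL;     /* realloc(NULL, n) behaves like malloc(n) */
    size_t cap = 0;

    if (grow(&buf, &cap, 100000) != 0) {
        fprintf(stderr, "out of memory\n");
        return EXIT_FAILURE;
    }
    printf("requested 100000, got %zu bytes of capacity\n", cap);
    free(buf);
    return EXIT_SUCCESS;
}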

Declare array large enough to hold a type

Suppose I'm given a function with the following signature:
void SendBytesAsync(unsigned char* data, T length)
and I need a buffer large enough to hold a byte array of the maximum length that can be specified by type T. How do I declare that buffer? I can't just use sizeof, as it will return the size (in bytes) of type T, not the maximum value that the type could contain. I don't want to use limits.h, as the underlying type could change and leave my buffer too small. I can't use pow from math.h, because I need a constant expression. So how do I get a constant expression for the maximum value of a type at compile time in C?
Edit
The type will be unsigned. Since everyone seems to be appalled at the idea of a statically allocated buffer determined at compile time, I'll provide a little background. This is for an embedded application (on a microcontroller) where reliability and speed are the priorities. As such, I'm perfectly OK with wasting statically assigned memory for the sake of run-time integrity (no malloc issues) and performance (no overhead for memory allocation each time I need the buffer).

I understand the risk that if the max size of T is too large my linker will not be able to allocate a buffer that big, but that will be a compile-time failure, which can be accommodated, rather than a run-time failure, which cannot be tolerated. If, for example, I use size_t for the size of the payload and allocate the memory dynamically, there is a very real possibility that the system will not have that much memory available. I would much rather know this at compile time than at run time, where it will result in packet loss, data corruption, etc.

Looking at the function signature I provided, it is ridiculous to provide a type as a size parameter for a dynamically allocated buffer and not expect the possibility that a caller will use the max value of the type. So I'm not sure why there seems to be so much consternation about allocating that memory once, for good. I can see this being a huge problem in the Windows world, where multiple processes are fighting for the same memory resources, but in the embedded world there's only one task to be done, and if you can't do that effectively, then it doesn't matter how much memory you saved.
Use _Generic:
#define MAX_SIZE(X) _Generic((X), \
    long: LONG_MAX,               \
    unsigned long: ULONG_MAX,     \
    /* ... */)
Prior to C11 there isn't a portable way to find an exact maximum value of an object of type T (all calculations with CHAR_BIT, for example, may yield overestimates due to padding bits).
Edit: Do note that under certain conditions (think segmented memory in real-life older systems) you might not be able to allocate a buffer whose size equals the maximum value of a given type T.
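A sketch of how the macro might back the statically sized buffer from the question; the type list and the names T and buffer are illustrative, and the list must cover whatever T can actually be:

#include <limits.h>

typedef unsigned short T;   /* hypothetical length type from the signature */

#define MAX_SIZE(X) _Generic((X),       \
        unsigned short: USHRT_MAX,      \
        unsigned int:   UINT_MAX,       \
        unsigned long:  ULONG_MAX)

/* _Generic selects at translation time, so this is usable as a constant
   expression (accepted by GCC and Clang in C11 mode). */
static unsigned char buffer[MAX_SIZE((T)0)];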
if T is unsigned, then would ((T) -1) work?
(This is probably really bad, and if so, please let me know why :-) )
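It does: converting -1 to an unsigned type yields that type's maximum value, and it is a constant expression, so it can even size a static array. A sketch, with T as a stand-in:

typedef unsigned short T;            /* stand-in for the unsigned length type */
static unsigned char buf[(T)-1];     /* (T)-1 == maximum value of unsigned T */

If T were ever changed to a signed type, the array size would become negative and compilation would fail, which is at least a loud failure rather than a silent one.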
Is there a reason why you are allocating the maximum possible buffer size instead of a buffer that is only as large as you need? Why not have the caller simply specify the amount of memory needed?
Recall that the malloc() function takes an argument of type size_t. That means that (size_t)(-1) (which is SIZE_MAX in C99 and later) will represent the largest value that can be passed to malloc. If you are using malloc as your allocator, then this will be your absolute upper limit.
Maybe try using a bit shift?
Let's see:
unsigned long max_size = (1UL << (8 * sizeof(T))) - 1;
sizeof(T) gives you the number of bytes T occupies in memory. (Strictly, its in-memory size including any alignment padding, so for a struct this can be larger than the sum of its members.)
Breaking it down:
8 * sizeof(T) gives you the number of bits that T occupies (more portably, CHAR_BIT * sizeof(T)).
1 << x is the same as saying 2 to the x power, because every time you shift to the left you multiply by two, just as shifting to the left in base 10 multiplies by 10.
- 1: an 8-bit number can hold 256 values (0..255), and 1 << 8 is 256, so subtracting 1 gives the maximum, 255.
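One caveat: as written, the shift is undefined behaviour once T is as wide as unsigned long, because the shift count equals the type's width (and 8 should strictly be CHAR_BIT). A sketch of a variant that stays defined, assuming T is unsigned with no padding bits:

#include <limits.h>

/* Maximum value of an unsigned type T: shift one bit less than the full
   width, then patch up, so the shift count never reaches the width. */
#define UMAX_OF(T) ((((T)1 << (CHAR_BIT * sizeof(T) - 1)) - 1) * 2 + 1)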
Interesting question. I would start by looking in the limits header for the max value of a numeric type T. I have not tried it, but I would do something that uses T::max — note, though, that this (std::numeric_limits<T>::max()) is C++; C has no equivalent.

Can I allocate more than 65535 bytes according to the C standard?

malloc is declared as follows:
void *malloc(size_t size);
http://pubs.opengroup.org/onlinepubs/009695399/functions/malloc.html
size_t definition (stddef.h):
size_t: Unsigned integer type of the result of the sizeof operator.
http://pubs.opengroup.org/onlinepubs/009604499/basedefs/stddef.h.html
But according to this page, the maximum of size_t is 65535
(section "Limits of Other Integer Types"):
Limit of size_t: SIZE_MAX 65535
http://pubs.opengroup.org/onlinepubs/007904975/basedefs/stdint.h.html
Does it mean I can not allocate more than 65535 bytes when I want to respect C standard?
SIZE_MAX must be at least 65535. If you're running something like MS-DOS, chances are it'll actually even be that small. On a typical, reasonably current desktop computer (say, anything less than 10 years old) you can expect it to be larger, typically at least around 4 billion (2^32 - 1, to be more exact).
Whether you need to (try to) deal with a more limited system will depend on the range of targets to which you might care about porting your code. If you really might need to deal with a 16-bit compiler on a system with less than, say, 1 megabyte of addressable memory, then you'll have to write your code with that in mind. In all honesty, however, for most people that's simply irrelevant -- even relatively small portable systems (e.g., an iPod) can address far more memory than that these days. OTOH, if you're writing code for a singing greeting card, then yes, such limitations probably come with the territory (but in such cases, the standard is often something to treat more as a general guideline than an absolute law).
The minimum value of SIZE_MAX is 65535, but it can be (and usually is) more.
On most non-embedded platforms, size_t is a typedef for unsigned long and SIZE_MAX is set to ULONG_MAX.
On a 32-bit platform SIZE_MAX is usually 2^32 - 1 and on a 64 bit platform it is 2^64 - 1. Check with a printf if unsure.
printf("sizeof size_t = %zx, SIZE_MAX = %zx\n", sizeof(size_t), SIZE_MAX);
Include stdint.h to get the value of SIZE_MAX.
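If a program genuinely depends on allocations beyond the guaranteed minimum, C11's _Static_assert can turn the porting hazard into a compile-time error; the threshold below is illustrative:

#include <stdint.h>

/* Refuse to compile on platforms where size_t can't express what we need. */
_Static_assert(SIZE_MAX >= 4294967295u, "this code assumes at least a 32-bit size_t");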

Why is size_t better?

The title is actually a bit misleading, but I wanted to keep it short. I've read about why I should use size_t and I often found statements like this:
size_t is guaranteed to be able to express the maximum size of any object, including any array
I don't really understand what that means. Is there some kind of cap on how much memory you can allocate at once and size_t is guaranteed to be large enough to count every byte in that memory block?
Follow-up question:
What determines how much memory can be allocated?
Let's say the biggest object your compiler/platform can have is 4 GB. size_t then is 32 bits. Now let's say you recompile your program on a 64-bit platform able to support objects of size 2^43 - 1. size_t will be at least 43 bits long (but normally it will be 64 bits at this point). The point is that you only have to recompile the program. You don't have to change all your ints to long (if int is 32 bits and long is 64 bits) or from int32_t to int64_t.
(if you are asking yourself why 43 bit, let's say that Windows Server 2008 R2 64bit doesn't support objects of size 2^63 nor objects of size 2^62... It supports 8 TB of addressable space... So 43 bit!)
Many programs written for Windows considered a pointer to be as big as a DWORD (a 32-bit unsigned integer). These programs can't be recompiled for 64 bits without rewriting large swaths of code. Had they used DWORD_PTR (an unsigned value guaranteed to be as big as necessary to contain a pointer) they wouldn't have had this problem.
The size_t "point" is similar, but different!
size_t isn't guaranteed to be able to contain a pointer!!
(the DWORD_PTR of Microsoft Windows is)
This, in general, is not guaranteed to work (the pointer value may not fit in a size_t):
void *p = ...
size_t p2 = (size_t)p;
For example, on the old DOS "platform", the maximum size of an object was 64 KB, so size_t needed to be 16 bits, BUT a far pointer needed to be at least 20 bits, because the 8086 had a memory space of 1 MB (in the end a far pointer was 16 + 16 bits, because the memory of an 8086 was segmented).
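If what you actually need is an integer that can hold a pointer, the portable choice is uintptr_t from <stdint.h> (optional in the standard, though mainstream platforms provide it), not size_t; a sketch:

#include <stdint.h>

void demo(void)
{
    int x;
    void *p = &x;
    uintptr_t u = (uintptr_t)p;   /* uintptr_t is defined to round-trip void * */
    void *q = (void *)u;          /* q compares equal to p */
    (void)q;
}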
Basically it means that size_t is guaranteed to be large enough to index any array and to hold the size of any data type.
It is preferred over plain int because the size of int and other integer types can be smaller than what can be indexed. For example, int is usually 32 bits long, which is not enough to index large arrays on 64-bit machines. (This is actually a very common problem when porting programs to 64-bit.)
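A sketch of that habit in practice; a size_t index can walk any array that can exist, whereas an int index could overflow first on a 64-bit platform:

#include <stddef.h>

double sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)   /* size_t covers the full index range */
        s += a[i];
    return s;
}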
That is exactly the reason.
The maximum size of any object in a given programming language is determined by a combination of the OS, the CPU architecture and the compiler/linker in use.
size_t is defined to be big enough to hold the size value of the largest possible object.
This usually means that size_t is typedef'ed to an unsigned integer type as wide as the platform's pointers.
So on a 32 bit environment it would typically be 4 bytes and in a 64 bit system 8 bytes.
size_t is defined for the platform that you are compiling for. Hence it can represent the maximum for that platform.
size_t is the type of the result of the sizeof operator (see 7.17 of C99); therefore it must be able to describe the largest possible object the system can represent.
Have a look at
http://en.wikipedia.org/wiki/Size_t
