What is the sizeof(size_t) on a GPU?

My understanding is that size_t is a type large enough to represent (or address) any memory position in a given architecture.
For instance, on a 32-bit machine size_t should be able to represent at least 2^32 values. This means that sizeof(size_t) must be >= 4 on 32-bit architectures, right?
So what should sizeof(size_t) be in code that's meant to run on a GPU?
Since many GPUs have more than 4 GB of memory, sizeof(size_t) must be at least 5. But I imagine it's 8, for alignment purposes.

Roughly speaking, size_t should be able to represent the size of any single allocated object. This might be smaller than the total address space though.
For example in 16-bit MS-DOS program one memory model had a 16-bit size_t even though many megabytes of memory were available, and pointers were 32-bit. But you could not allocate any particular chunk of memory larger than 64K.
It would be up to the compiler writer for the GPU to make size_t have some size that is large enough for the largest possible allocation on that GPU. As you say, this is likely to be a power of 2 (but not guaranteed).
The type used to represent any memory position is void *.
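In the end it is whatever the compiler for that target chooses, so the most reliable way to find out is to ask the compiler itself. A minimal host-side sketch (building the equivalent check with a GPU toolchain, e.g. nvcc for device code, reports the value chosen for that target):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* These values are entirely implementation-defined. */
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("SIZE_MAX       = %ju\n", (uintmax_t)SIZE_MAX);
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}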

Related

Is it possible to allocate huge array in c?

Hi all, I want to use the following:
int i;
unsigned short int **graph;

graph = (unsigned short int **)malloc(sizeof(unsigned short int *) * 65535);
if (graph == NULL) fprintf(stderr, "out of memory\n");
for (i = 0; i < 65535; i++) {
    graph[i] = (unsigned short int *)malloc(sizeof(unsigned short int) * 65535);
    if (graph[i] == NULL) fprintf(stderr, "out of memory\n");
}
The size 65535 is constant; I need to build a graph of this size.
Is it possible? Will it help if I split it up?
Thanks!
There are four different issues to consider here:
1) The size of the argument of malloc. It is of type size_t, which is an unsigned integer type that is at least 16 bits and large enough to hold the size of any object or the index of any array. In practice it tends to be the platform's native word size, i.e., 32 bits for 32-bit platforms and 64 bits for 64-bit platforms, so you are likely to need at least a 32-bit platform, but this is almost certainly the case unless you are developing for embedded (or very retro) systems.
One should also remember that the argument may overflow and you can silently end up successfully allocating less memory than you thought you'd get (e.g., you may effectively call malloc(65534) when you thought you were calling malloc(2 * 65535)). But in this case it is very unlikely to be an issue for any platform capable of allocating this amount of memory.
2) Whether the malloc calls succeed. You are already checking for this, so simply running the code will answer this. You are allocating over 8 GB† of memory here, so it is likely that it will fail unless compiled for 64 bits (since the maximum addressable memory for 32 bits is 4 GB).
3) Whether you can actually use all the memory you've allocated. Some operating systems will overcommit memory and allow you to allocate much more memory than is actually available. You may run into trouble if you actually try to use all the memory you've allocated. This depends on the OS and the amount of memory actually available, possibly including swap.
4) Whether it is practical for the machine the program is run on to actually have that much data in memory. Even if the malloc calls succeed and the OS lets you use the memory allocated, it is still over 8 GB, which means that a typical machine should probably have at least 12 GB of RAM installed to accommodate this, the OS, and other programs. Otherwise it may swap like crazy, despite theoretically working.
You have revealed in comments that you are running a 64-bit machine with 4 GB of RAM installed, so if you compile for 64 bits the first two points are not an issue, but point 3 may be, and point 4 almost certainly will be. So, either install more RAM or figure out a different way to handle the data (e.g., if you are storing a graph as per the variable name, perhaps it is often sparse enough that you don't need to allocate for the worst case).
† “over 8 GB” comes from 65535 * sizeof(short *) + 65535 * 65535 * sizeof(short), where sizeof(short) is very likely to be 2, and sizeof(short *) (the pointer size) either 4 or 8. There is also some extra overhead for malloc's bookkeeping, but still it rounds to “over 8 GB”.
Some stylistic observations:
It would be better style to use one of the types from stdint.h if you want specifically 16 bits, e.g., uint16_t or uint_least16_t
You should not cast the return value of malloc in C (unlike in C++)
You can replace sizeof(unsigned short int *) with sizeof(*graph) and sizeof(unsigned short int) with sizeof(**graph) to avoid repetition (and allow you to change the type of graph without changing the malloc calls)
You don't need the int in unsigned short int
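Putting those observations together, the allocation might look roughly like this (a sketch only, with minimal error handling and no freeing of already-allocated rows on failure):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 65535u

int main(void) {
    uint16_t **graph = malloc(N * sizeof *graph);      /* no cast needed in C */
    if (graph == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    for (size_t i = 0; i < N; i++) {
        graph[i] = malloc(N * sizeof **graph);         /* element type spelled only once */
        if (graph[i] == NULL) {
            fprintf(stderr, "out of memory\n");
            return 1;
        }
    }
    /* ... use graph, then free each row and the pointer array ... */
    return 0;
}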
The maximum allowed size is bounded by the range of size_t. So on a 16-bit platform the maximum size is typically 65,535, and on a 32-bit platform it is typically 4,294,967,295.
The maximum size of a single non-array object is SIZE_MAX, and SIZE_MAX is at least 65535.
SIZE_MAX has type size_t, which is often the same type as unsigned int but may be different.
The single largest allocation available using malloc() is SIZE_MAX.
void *malloc(size_t size);
sizeof(unsigned short int *) * 65535 may silently wrap around due to integer overflow when size_t is narrow.
To allocate an array larger than SIZE_MAX (but each element is still <= SIZE_MAX), use calloc().
void *calloc(size_t nmemb, size_t size);
unsigned short int **graph = calloc(65535u, sizeof *graph);
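For malloc(), one possible way to guard against that multiplication overflow by hand is a small wrapper along these lines (a sketch; alloc_array is just an illustrative name, and calloc() performs an equivalent check internally):

#include <stdint.h>
#include <stdlib.h>

/* Returns NULL instead of silently wrapping when nmemb * size overflows size_t. */
void *alloc_array(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    return malloc(nmemb * size);
}

/* usage: unsigned short **graph = alloc_array(65535u, sizeof *graph); */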

Declare array large enough to hold a type

Suppose I'm given a function with the following signature:
void SendBytesAsync(unsigned char* data, T length)
and I need a buffer large enough to hold a byte array of the maximum length that can be specified by type T. How do I declare that buffer? I can't just use sizeof as it will return the size (in bytes) of type T and not the maximum value that the type could contain. I don't want to use limits.h as the underlying type could change and my buffer be too small. I can't use pow from math.h because I need a constant expression. So how do I get a constant expression for the maximum size of a type at compile time in C?
Edit
The type will be unsigned. Since everyone seems to be appalled at the idea of a statically allocated buffer determined at compile time, I'll provide a little background. This is for an embedded application (on a microcontroller) where reliability and speed are the priorities. As such, I'm perfectly OK with wasting statically assigned memory for the sake of run time integrity (no malloc issues) and performance (no overhead for memory allocation each time I need the buffer). I understand the risk that if the max size of T is too large my linker will not be able to allocate a buffer that big, but that will be a compile-time failure, which can be accommodated, rather than a run-time failure, which cannot be tolerated. If, for example I use size_t for the size of the payload and allocate the memory dynamically, there is a very real possibility that the system will not have that much memory available. I would much rather know this at compile time, than at run-time where this will result in packet loss, data corruption, etc. Looking at the function signature I provided, it is ridiculous to provide a type as a size parameter for a dynamically allocated buffer and not expect the possibility that a caller will use the max value of the type. So I'm not sure why there seems to be so much consternation about allocating that memory once, for good. I can see this being a huge problem in the Windows world where multiple processes are fighting for the same memory resources, but in the embedded world, there's only 1 task to be done and if you can't do that effectively, then it doesn't matter how much memory you saved.
Use _Generic:
#define MAX_SIZE(X) _Generic((X), \
    long: LONG_MAX,               \
    unsigned long: ULONG_MAX,     \
    /* ... */)
Prior to C11 there isn't a portable way to find an exact maximum value of an object of type T (all calculations with CHAR_BIT, for example, may yield overestimates due to padding bits).
Edit: Do note that under certain conditions (think of the segmented memory models found in real-life systems) you might not be able to allocate a buffer large enough to equal the maximum value of any given type T.
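A usage sketch of that macro, filled in for the unsigned types the asker is likely to care about (the type list, the typedef T, and the buffer name are assumptions; it also relies on the selected constant being usable as an integer constant expression, which common compilers accept):

#include <limits.h>
#include <stdio.h>

#define MAX_SIZE(X) _Generic((X),        \
    unsigned char:  UCHAR_MAX,           \
    unsigned short: USHRT_MAX,           \
    unsigned int:   UINT_MAX,            \
    unsigned long:  ULONG_MAX)

typedef unsigned short T;   /* hypothetical length type, for illustration only */

static unsigned char buffer[MAX_SIZE((T)0)];   /* one byte per length a T can express */

int main(void) {
    printf("buffer holds %zu bytes\n", sizeof buffer);
    return 0;
}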
if T is unsigned, then would ((T) -1) work?
(This is probably really bad, and if so, please let me know why :-) )
Is there a reason why you are allocating the maximum possible buffer size instead of a buffer that is only as large as you need? Why not have the caller simply specify the amount of memory needed?
Recall that the malloc() function takes an argument of type size_t. That means that (size_t)(-1) (which is SIZE_MAX in C99 and later) will represent the largest value that can be passed to malloc. If you are using malloc as your allocator, then this will be your absolute upper limit.
Maybe try using a bit shift?
let's see:
unsigned long max_size = (1UL << (8 * sizeof(T))) - 1;
sizeof(T) gives you the number of bytes T occupies in memory. (Strictly speaking this may overestimate the maximum value, since an integer type can contain padding bits, and the shift is undefined if T is as wide as unsigned long, so this only works for types narrower than unsigned long.)
Breaking it down:
8 * sizeof(T) gives you the number of bits in that size
1 << x is the same as 2 to the power x, because every shift to the left multiplies by two, just as every shift to the left in base 10 multiplies by 10.
- 1: an 8-bit number, for example, can hold 256 values, 0..255, so its maximum is 2^8 - 1.
Interesting question. I would start by looking in the limits header for the max value of a numeric type T. I have not tried it, but I would do something that uses T::max (note that this is C++'s std::numeric_limits; C only offers the <limits.h> macros).

Map memory to another address

X86-64, Linux, Windows.
Consider that I'd want to make some sort of "free lunch for tagged pointers". Basically I want to have two pointers that point to the same actual memory block but whose bits are different. (For example I want one bit to be used by the GC, or for some other reason.)
intptr_t ptr = (intptr_t)malloc(sizeof(int));
intptr_t ptr2 = map(ptr | GC_FLAG_REACHABLE); // some magic call
int *p = (int *)ptr;
int *p2 = (int *)ptr2;
*p = 10;
*p2 = 20;
assert(*p == 20);
assert(p != p2);
On Linux, mmap() the same file twice. Same thing on Windows really, but it has its own set of functions for that.
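One way to try this on Linux is a sketch along these lines (memfd_create() is Linux-specific; shm_open() or a temporary file would be the portable POSIX route; error handling omitted):

#define _GNU_SOURCE   /* for memfd_create() */
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = memfd_create("twice", 0);   /* anonymous file-backed memory */
    ftruncate(fd, 4096);

    int *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    *a = 20;              /* a write through one MAP_SHARED mapping ... */
    assert(*b == 20);     /* ... is visible through the other           */
    assert(a != b);       /* yet the two virtual addresses differ       */
    return 0;
}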
Mapping the same memory (mmap on POSIX as Ignacio mentions, MapViewOfFile on Windows) to multiple virtual addresses may provide you some interesting coherency puzzles (are writes at one address visible when read at another address?). Or maybe not. I'm not sure what all the platform guarantees are.
More commonly, one simply reserves a few bits in the pointer and shifts things around as necessary.
If all your objects are aligned to 8-byte boundaries, it's common to simply store tags in the 3 least-significant bits of a pointer, and mask them off before dereferencing (as thkala mentions). If you choose a higher alignment, such as 16 bytes or 32 bytes, then there are 4 or 5 least-significant bits that can be used for tagging. Equivalently, choose a few most-significant bits for tagging, and shift them off before dereferencing. (Sometimes non-contiguous bits are used, for example when packing pointers into the signalling NaNs of IEEE-754 floats (2^23 values) or doubles (2^51 values).)
Continuing on the high end of the pointer, current implementations of x86-64 use at most 48 bits out of a 64-bit pointer (0x0000000000000000-0x00007fffffffffff + 0xffff800000000000-0xffffffffffffffff) and Linux and Windows only hand out addresses in the first range to userspace, leaving 17 most-significant bits that can be safely masked off. (This is neither portable nor guaranteed to remain true in the future, though.)
Another approach is to stop considering "pointers" and simply use indices into a larger memory array, as the JVM does with -XX:+UseCompressedOops. If you've allocated a 512 MB pool and are storing 8-byte aligned objects, there are 2^26 possible object locations, so a 32-bit value has 6 bits to spare in addition to the index. A dereference will require adding the index times the alignment to the base address of the array, saved elsewhere (it's the same for every "pointer"). If you look at things carefully, this is simply a generalization of the previous technique (which always has a base of 0, where things line up with real pointers).
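A rough sketch of that index-based scheme, assuming a 512 MB pool of 8-byte slots (the names, sizes, and flag bit are illustrative only):

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT       8u                              /* objects aligned to 8 bytes           */
#define INDEX_BITS 26                              /* 512 MB / 8-byte slots = 2^26 slots   */
#define INDEX_MASK ((UINT32_C(1) << INDEX_BITS) - 1)

static unsigned char *pool;                        /* base address, stored once for all "pointers" */

static void *deref(uint32_t handle) {
    /* base + index * alignment; the 6 spare tag bits are simply ignored */
    return pool + (size_t)(handle & INDEX_MASK) * SLOT;
}

int main(void) {
    pool = malloc((size_t)1 << 29);                /* 512 MB arena; may fail on small systems */
    if (pool == NULL) return 1;

    uint32_t h = 42u | (UINT32_C(1) << 31);        /* slot 42 with one spare bit set as a tag */
    *(int *)deref(h) = 7;
    assert(*(int *)deref(42u) == 7);               /* same object, tag ignored on dereference */

    free(pool);
    return 0;
}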
Once upon a time I worked on a Prolog implementation that used the following technique to have spare bits in a pointer:
Allocate a memory area with a known alignment. malloc() usually allocates memory with a 4-byte or 8-byte alignment. If necessary, use posix_memalign() to get areas with a higher alignment size.
Since the resulting pointer is aligned to intervals of multiple bytes, but it represents byte-accurate addresses, you have a few spare bits that will by definition be zero in the memory area pointer. For example a 4-byte alignment gives you two spare bits on the LSB side of the pointer.
You OR (|) your flags with those bits and now have a tagged pointer.
As long as you take care to properly mask the pointer before using it for memory access, you should be perfectly fine.
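A small sketch of the low-bit tagging described above, assuming 8-byte-aligned allocations (the flag name and helper functions are illustrative, not a real GC API):

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK      ((uintptr_t)0x7)   /* 3 spare low bits with 8-byte alignment      */
#define GC_REACHABLE  ((uintptr_t)0x1)   /* hypothetical flag, named for illustration   */

static void *tag(void *p, uintptr_t flags) { return (void *)((uintptr_t)p | (flags & TAG_MASK)); }
static void *untag(void *tagged)           { return (void *)((uintptr_t)tagged & ~TAG_MASK); }

int main(void) {
    int *p = malloc(sizeof *p);          /* malloc() results are suitably aligned */
    int *t = tag(p, GC_REACHABLE);       /* stash the flag in the low bits        */

    *(int *)untag(t) = 20;               /* always mask before dereferencing      */
    assert(*p == 20);
    assert(((uintptr_t)t & GC_REACHABLE) != 0);

    free(p);
    return 0;
}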

Can I allocate more than 65535 bytes according to the C standard?

malloc is declared as follows:
void *malloc(size_t size);
http://pubs.opengroup.org/onlinepubs/009695399/functions/malloc.html
size_t definition (stddef.h):
size_t: Unsigned integer type of the result of the sizeof operator.
http://pubs.opengroup.org/onlinepubs/009604499/basedefs/stddef.h.html
But according to this page, the limit of size_t is 65535.
(Section Limits of Other Integer Types):
Limit of size_t: SIZE_MAX 65535
http://pubs.opengroup.org/onlinepubs/007904975/basedefs/stdint.h.html
Does it mean I cannot allocate more than 65535 bytes if I want to respect the C standard?
SIZE_MAX must be at least 65535. If you're running something like MS-DOS, chances are it'll actually even be that small. On a typical, reasonably current desktop computer (say, anything less than 10 years old) you can expect it to be larger, typically at least around 4 billion (2^32 - 1, to be more exact).
Whether you need to (try to) deal with a more limited system will depend on the range of targets to which you might care about porting your code. If you really might need to deal with a 16-bit compiler on a system with less than, say, 1 megabyte of addressable memory, then you'll have to write your code with that in mind. In all honesty, however, for most people that's simply irrelevant -- even relatively small portable systems (e.g., an iPod) can address far more memory than that any more. OTOH, if you're writing code for a singing greeting card, then yes, such limitations probably come with the territory (but in such cases, the standard is often something to treat more as a general guideline than an absolute law).
The minimum value of SIZE_MAX is 65535, but it can be (and usually is) more.
On most non-embedded platforms, size_t is a typedef for unsigned long and SIZE_MAX is set to ULONG_MAX.
On a 32-bit platform SIZE_MAX is usually 2^32 - 1 and on a 64 bit platform it is 2^64 - 1. Check with a printf if unsure.
printf("sizeof size_t = %zx, SIZE_MAX = %zx\n", sizeof(size_t), SIZE_MAX);
Include stdint.h to get the value of SIZE_MAX.

Why is size_t better?

The title is actually a bit misleading, but I wanted to keep it short. I've read about why I should use size_t and I often found statements like this:
size_t is guaranteed to be able to express the maximum size of any object, including any array
I don't really understand what that means. Is there some kind of cap on how much memory you can allocate at once and size_t is guaranteed to be large enough to count every byte in that memory block?
Follow-up question:
What determines how much memory can be allocated?
Let's say the biggest object your compiler/platform can have is 4 GB. size_t then is 32 bits. Now let's say you recompile your program on a 64-bit platform able to support objects of size 2^43 - 1. size_t will be at least 43 bits long (but normally it will be 64 bits at this point). The point is that you only have to recompile the program. You don't have to change all your ints to longs (if int is 32 bits and long is 64 bits) or from int32_t to int64_t.
(If you are asking yourself why 43 bits: Windows Server 2008 R2 64-bit doesn't support objects of size 2^63 nor objects of size 2^62... it supports 8 TB of addressable space... so 43 bits!)
Many programs written for Windows considered a pointer to be as big as a DWORD (a 32-bit unsigned integer). These programs can't be recompiled for 64 bits without rewriting large swaths of code. Had they used DWORD_PTR (an unsigned value guaranteed to be as big as necessary to contain a pointer) they wouldn't have had this problem.
The point of size_t is similar, but different!
size_t isn't guaranteed to be able to contain a pointer!!
(the DWORD_PTR of Microsoft Windows is)
This, in general, is illegal:
void *p = ...
size_t p2 = (size_t)p;
For example, on the old DOS "platform", the maximum size of an object was 64 KB, so size_t needed to be 16 bits, BUT a far pointer needed to be at least 20 bits, because the 8086 had a memory space of 1 MB (in the end a far pointer was 16 + 16 bits, because the memory of an 8086 was segmented).
Basically it means that size_t, is guaranteed to be large enough to index any array and get the size of any data type.
It is preferred over using just int, because the size of int and other integer types can be smaller than what can be indexed. For example, int is usually 32 bits long, which is not enough to index large arrays on 64-bit machines. (This is actually a very common problem when porting programs to 64 bits.)
That is exactly the reason.
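A small sketch of that porting pitfall: on a typical LP64 system int is 32 bits while size_t is 64 bits, so an int loop counter cannot reach every element of a very large array (and overflowing it is undefined behaviour), whereas size_t can:

#include <stddef.h>

/* size_t can index any object; an int counter would break once n exceeds INT_MAX. */
double sum(const double *a, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}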
The maximum size of any object in a given programming language is determined by a combination of the OS, the CPU architecture and the compiler/linker in use.
size_t is defined to be big enough to hold the size value of the largest possible object.
This usually means that size_t is typedef'ed to be the same as the largest int type available.
So on a 32 bit environment it would typically be 4 bytes and in a 64 bit system 8 bytes.
size_t is defined for the platform that you are compiling for. Hence it can represent the maximum for that platform.
size_t is the type of the result of the sizeof operator (see 7.17 of C99), therefore it must be able to describe the largest possible object the system can represent.
Have a look at
http://en.wikipedia.org/wiki/Size_t

Resources