When to cast size_t - c

I'm a little confused as to how to use size_t when other data types like int, unsigned long int and unsigned long long int are present in a program. I'll try to illustrate my confusion minimally. Imagine a program where I use
void *calloc(size_t nmemb, size_t size)
to allocate an array (one- or multidimensional). Let the call to calloc() be dependent on nrow and sizeof(unsigned long int). sizeof(unsigned long int) is obviously fine because it returns size_t. But let nrow be such that it needs to have type unsigned long int. What do I do in such a case? Do I cast nrow in the call to calloc() from unsigned long int to size_t?
Another case would be
char *fgets(char *s, int size, FILE *stream)
fgets() expects type int as its second parameter. But what if I pass it an array, let's say save, as its first parameter and use sizeof(save) to pass it the size of the array? Do I cast the result of sizeof() to int? That would be dangerous, since int isn't guaranteed to hold all possible values returned by sizeof().
What should I do in these two cases? Cast, or just ignore possible warnings from tools such as splint?
Here is an example regarding calloc() (I explicitly omit error-checking for clarity!):
long int **arr;
unsigned long int mrow;
unsigned long int ncol;
unsigned long int i;

arr = calloc(mrow, sizeof(long int *));
for (i = 0; i < mrow; i++) {
    arr[i] = calloc(ncol, sizeof(long int));
}
Here is an example for fgets() (Error-handling again omitted for clarity!):
char save[22];
char *ptr_save;
unsigned long int mrow;

if (fgets(save, sizeof(save), stdin) != NULL) {
    save[strcspn(save, "\n")] = '\0';
    mrow = strtoul(save, &ptr_save, 10);
}

I'm a little confused as how to use size_t when other data types like
int, unsigned long int and unsigned long long int are present in a
program.
It is never a good idea to ignore warnings. Warnings are there to direct your attention to areas of your code that may be problematic. It is much better to take a few minutes to understand what the warning is telling you -- and fix it -- than to get bitten by it later when you hit a corner case and stumble off into undefined behavior.
size_t itself is just a data type like any other. While its exact width can vary, it is simply an unsigned integer type large enough to hold the size (in bytes) of any object, including 0 (the type was intended to be used consistently across platforms, though its actual width may differ on each). Your choice of data type is a basic and fundamental part of programming. You choose the type based on the range of values your variable can represent (or should be limited to representing). So if whatever you are dealing with can't be negative, then an unsigned type or size_t is the proper choice. The choice then allows the compiler to help identify areas where your code would cause that to be violated.
When you compile with warnings enabled (e.g. -Wall -Wextra), which you should use on every compile, you will be warned about possible conflicts in your data-type use (e.g. comparisons between signed and unsigned values). These are important!
Virtually all modern x86 & x86_64 computers use the two's complement representation for signed values. In simple terms it means that if the leftmost bit of a signed number is 1, the value is negative. Herein lie the subtle traps you may fall into when mixing, casting or comparing numbers of varying type. If you cast an unsigned number to a signed number and that number happens to have the most significant bit set, your large number just became a negative number.
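A short sketch of that trap (assuming the common case of a 32-bit int and two's complement; the exact printed value is implementation-dependent):

#include <stdio.h>

int main(void)
{
    unsigned int u = 3000000000u;   /* fits in a 32-bit unsigned, MSB is set        */
    int s = (int)u;                 /* out of range for int: implementation-defined,
                                       typically wraps around                        */

    printf("u = %u\n", u);          /* 3000000000                                    */
    printf("s = %d\n", s);          /* typically -1294967296                         */
    return 0;
}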
What should I do in these two cases? Cast, or just ignore possible
warnings...
You do what you do each time you are faced with warnings from the compiler. You analyze what is causing the warning, and then you fix it (or, if you can't fix it -- e.g. it comes from some library you don't have access to -- you understand the warning well enough that you can make an educated decision to disregard it, knowing you will not hit any corner cases that would lead to undefined behavior).
In your examples (while neither should produce a warning, they may on some compilers):
arr = calloc (mrow, sizeof(long int *));
What is the range of sizeof(long int *)? Well -- it's the range of what the pointer size can be. So, what's that? (4 bytes on x86 or 8 bytes on x86_64.) So the range of values is 4-8; yes, that could be silenced with a cast to size_t if needed, or better, just:
arr = calloc (mrow, sizeof *arr);
Looking at the next example:
char save[22];
...
fgets(save, sizeof(save), stdin)
Here again, what is the possible range of sizeof save? From 22 to 22. So yes, if a warning is produced complaining that sizeof returns unsigned long while fgets calls for int, 22 can safely be cast to int.
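A minimal sketch of what that cast looks like (using the save buffer from the question):

#include <stdio.h>

int main(void)
{
    char save[22];

    /* sizeof save is 22, which always fits in an int, so the cast is safe */
    if (fgets(save, (int)sizeof save, stdin) != NULL)
        printf("%s", save);

    return 0;
}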

When to cast size_t
You shouldn't.
Use it where it's appropriate.
(As you already noticed,) the libc functions tell you where this is the case.
Additionally use it to index arrays.
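For instance, a minimal sketch of indexing with size_t (assuming C99 or later for the loop-scoped declaration; arr is just a made-up array):

#include <stddef.h>

int main(void)
{
    double arr[100];

    /* size_t matches the type sizeof yields, so there are no conversion warnings */
    for (size_t i = 0; i < sizeof arr / sizeof arr[0]; i++)
        arr[i] = 0.0;

    return 0;
}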
If in doubt whether the type suits your program's needs, you might go for the useful assertion statement as per Steve Summit's answer, and if it fails, rethink your program's design.
More on this here by Dan Saks: "Why size_t matters" and "Further insights into size_t"

My other answer got waaaaaaay too long, so here's a short one.
Declare your variables of natural and appropriate types. Let the compiler take care of most conversions. If you have something that is or might be a size, go ahead and use size_t. (Similarly, if you have something that's involved in file sizes or offsets, use off_t.)
Try not to mix signed and unsigned types.
If you're getting warnings about possible data loss because of larger types getting downconverted to possibly smaller types, and if you can't change the types to make the warnings go away, first (a) convince yourself that the values, in practice, will not ever actually overflow the smaller type, then (b) add an explicit downconversion cast to make the warning go away, and for extra credit (c) add an assertion to document and enforce your assumption:
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);

In general, you're right, you should not ignore the warnings! And in general, if you can, you should shy away from explicit casts, because they can make your code less reliable, or silence warnings which are really trying to tell you something important.
Most of the time, I believe, the compiler should do the right thing for you. For example, malloc() expects a size_t, and the compiler knows from the function prototype that it does, so if you write
int size_i_need = 10;
char *buf = malloc(size_i_need);
the compiler will insert the appropriate conversion from int to size_t, as necessary. (I don't believe I've had warnings here I had to worry about, either.)
If the variables you're using are already unsigned, so much the better!
Similarly, if buf were an array and you were to write
fgets(buf, sizeof(buf), ifp);
the compiler will again insert an appropriate conversion. Here, I guess I see what you're getting at, a 64-bit compiler might emit a warning about the downconversion from long to int. Now that I think about it, I'm not sure why I haven't had that problem, because this is a common idiom.
(You also asked about passing unsigned long to malloc, and on a machine where size_t is smaller than long, I suppose that might get you warnings, too. Is that what you were worried about?)
If you've got a downconversion that you can't avoid, and your compiler or some other tool is warning about it, and you want to get rid of the warning safely, you could use a cast and an assertion. That is, if you write
unsigned long long size_i_need = 23;
char *buf = malloc(size_i_need);
this might get a warning on a machine where size_t is 32 bits. So you could silence the warning with a cast (on the assumption that your unsigned long long values will never actually be too big), and then back up your assumption with a call to assert:
unsigned long long size_i_need = 23;
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);
In my experience, the biggest nuisance is printing these things out. If you write
printf("int size = %d\n", sizeof(int));
or
printf("string length = %d\n", strlen("abc"));
on a 64-bit machine, a modern compiler will typically (and correctly) warn you that "format specifies type 'int' but the argument has type 'unsigned long'", or something to that effect. You can fix this in two ways: cast the value to match the printf format, or change the printf format to match the value:
printf("int size = %d\n", (int)sizeof(int));
printf("string length = %lu\n", strlen("abc"));
In the first case, you're assuming that sizeof's result will fit in an int (which is probably a safe bet). In the second case, you're assuming that size_t is in fact unsigned long, which may be true on a 64-bit compiler but may not be true on some other. So it's actually safer to use an explicit cast in the second case, too:
printf("string length = %lu\n", (unsigned long)strlen("abc"));
The bottom line is that abstract types like size_t don't work so well with printf; this is where we can see that the C++ output style of cout << "string length = " << strlen("abc") << endl has its advantages.
To solve this problem, there are some special printf modifiers that are guaranteed to match size_t and I think off_t and a few other abstract types, although they're not so well known. (I wasn't sure where to look them up, but while I've been composing this answer, some commenters have already reminded me.) So the best way to print one of these things (if you can remember, and unless you're using old compilers) would be
printf("string length = %zu\n", strlen("abc"));
Bottom line:
You obviously don't have to worry about passing plain int or plain unsigned to a function like calloc that expects size_t.
When calling something that might result in a downcast, such as passing a size_t to fgets where size_t is 64 bits but int is 32, or passing unsigned long long to calloc where size_t is only 32 bits, you might get warnings. If you can't make the passed-in types smaller (which in the general case you're not going to be able to do), you'll have little choice but to insert a cast to silence the warnings. In this case, to be strictly correct, you might want to add some assertions (see the sketch below).
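A minimal sketch of that cast-plus-assertion pattern applied to the fgets case (bufsize and buf are made-up names for illustration):

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t bufsize = 22;                 /* imagine this arrives at run time */
    char *buf = malloc(bufsize);

    assert(buf != NULL);
    assert(bufsize <= INT_MAX);          /* document the assumed invariant   */
    if (fgets(buf, (int)bufsize, stdin) != NULL)
        printf("read: %s", buf);

    free(buf);
    return 0;
}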
With all of that said, I'm not sure I've actually answered your question, so if you'd like further clarification, please ask.

Related

Assigning a short to int * fails

I understand that I can assign a variable to a bigger type if it fits, and it's OK to do it. For example:
short s = 2;
int i = s;
long l = i;
long long ll = l;
When I try to do it with pointers it fails and I don't understand why. I have integers that I pass as arguments to functions expecting a pointer to a long long. And it hasn't failed, yet..
The other day I was going from short to int, and something weird happens, I hope someone can I explain it to me. This would be the minimal code to reproduce.
short s = 2;
int* ptr_i = &s; // here ptr_i is the pointer to s, OK, but *ptr_i is definitely not 2
When I try to do it with pointers it fails and I don't understand why.
A major purpose of the type system in C is to reduce programming mistakes. A default conversion may be disallowed or diagnosed because it is symptomatic of a mistake, not because the value cannot be converted.
In int *ptr_i = &s;, &s is the address of a short, typically a 16-bit integer. If ptr_i is set to point to the same memory and *ptr_i is used, it attempts to refer to an int at that address, typically a 32-bit integer. This is generally an error; loading a 32-bit integer from a place where there is a 16-bit integer, and we do not know what is beyond it, is not usually a desired operation. The C standard does not define the behavior when this is attempted.
In fact, there are multiple things that can go wrong with this:
As described above, using *ptr_i when we only know there is a short there may produce undesired results.
The short object may have alignment that is not suitable for an int, which can cause a problem either with the pointer conversion or with using the converted pointer.
The C standard does not define the result of converting short * to int * except that, if it is properly aligned for int, the result can be converted back to short * to produce a value equal to the original pointer.
Even if short and int are the same width, say 32 bits, and the alignment is good, the C standard has rules about aliasing that allow the compiler to assume that an int * never accesses an object that was defined as short. In consequence, optimization of your program may transform it in unexpected ways.
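If the goal is simply to use the short's value as an int, a plain value conversion avoids all of these problems; and if you genuinely need to reinterpret bytes, copying them with memcpy into a properly typed object is the defined way to do it. A minimal sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
    short s = 2;

    /* Value conversion: the compiler widens the value; always well defined. */
    int i = s;

    /* Byte reinterpretation without aliasing violations: copy only
       sizeof(short) bytes; the resulting value still depends on endianness. */
    int j = 0;
    memcpy(&j, &s, sizeof s);

    printf("%d %d\n", i, j);
    return 0;
}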
I have integers that I pass as arguments to functions expecting a pointer to a long long.
C does allow default conversions of integers to integers that are the same width or wider, because these are not usually mistakes.

Is using type unsigned char acceptable when only a small number is needed?

If writing a program, and I need a number that is lower than 255, is it OK to use type unsigned char as a number, such as when I need a counter for a loop? Is there any reason, including sticking to tradition and convention, to stick to declaring numbers with int, float, etc.?
For example, should code like this never be used? Or would it be good practice to use an unsigned char in this case, since it allocates less memory than a short int?
#include <stdio.h>

typedef unsigned char short_loop;

int main(int argc, char *argv[])
{
    short_loop i;

    for (i = 0; i < 138; i++)
        printf("StackOverflow Example Code\n");

    return 0;
}
I'm asking for future reference, and only using the code above to help illustrate.
I wouldn't do it.
If your program is working with large arrays of values, then there is a benefit in terms of memory usage of using an array of char rather than an array of int.
But, for the control variable of a single loop, there is unlikely to be much benefit, and potentially a performance hit, for several reasons.
Comparing i < 138 will promote i to int before doing the comparison, since 138 is of type int. Promotions (and down-conversions) also potentially occur with initialising and incrementing i as well.
int is typically the "native type" on the host machine - which normally means it is a type that is preferred on the host (e.g. hardware provides registers that are optimised for performance when operating on that type). So, even if some technique were used to prevent the promotion of unsigned char to int before doing the comparisons in the loop, operations on int may be more efficient anyway.
So, in the end, your approach might or might not give a performance benefit. The only way to be sure would be to profile the code ... and the benefits (or otherwise) would be compiler dependent (e.g. affected by optimisation approaches) and host dependent (e.g. how efficient operations on unsigned char are in comparison with operations on int).
Your approach also makes the code harder to understand, and therefore harder to get right. And, if some other programmer (or you) modify the code in future, any benefit may be negated .... for example, by accidentally reintroducing unintended type conversions.
Yes, using unsigned char is fine. I would not use the short_loop typedef though, it just obfuscates the code for no reason.
Some would recommend using uint8_t to emphasize that the intent is a small integer, instead of character data; even though uint8_t is very likely to be a typedef for unsigned char.
Theoretically you could use the typedef uint_fast8_t, which is supposed to mean "an unsigned integer type, at least 8 bits wide, and the fastest of the available types"; obviously the interpretation of "fastest" is a bit vague.
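A minimal sketch of the loop from the question written that way (assuming <stdint.h> is available, i.e. C99 or later):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* uint_fast8_t: at least 8 bits, whatever width the implementation
       considers fastest; 138 fits even if it is exactly 8 bits wide */
    uint_fast8_t i;

    for (i = 0; i < 138; i++)
        printf("StackOverflow Example Code\n");

    return 0;
}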
For use with PC - not very good practice.
For use in embedded device - depends.
If it is guaranteed that it will not exceed 255, then sure you can use it as it is converted to an int anyway, but the memory difference is not very significant in this example.
It hurts readability.
And, as others said, you are obfuscating your code unnecessarily with the type definition, and giving it an ugly name too.
Loop indexes are conventionally of type int, and that's all.

Does casting remove endian dependency in C/C++?

i.e. if we cast a C or C++ unsigned char array named arr as (unsigned short*)arr and then assign to it, is the result the same independent of machine endianness?
Side note - I saw the discussion on IBM and elsewhere on SO with example:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
...stating that the value of x will depend on the layout of endian, and hence the endianness of the machine. That means dereferencing an array is endian-dependent, but what about assigning to it?
*(short*) endian = 1;
Are all future short-casted dereferences then guaranteed to return 1, regardless of endianness?
After reading the responses, I wanted to post some context:
In this struct
struct pix {
    unsigned char r;
    unsigned char g;
    unsigned char b;
    unsigned char a;
    unsigned char y[2];
};
replacing unsigned char y[2] with unsigned short y makes no difference for a single struct, but if I make an array of these structs and put that in another struct, I've noticed that the size of the container struct tends to be higher for the "unsigned short" version. Since I intend to make a large array, I went with unsigned char[2] to save space. I'm not sure why, but I imagine it's easier to align the uchar[2] in memory.
Because I need to do a ton of math with that variable y, which is meant to be a single short-length numerical value, I find myself casting to short a lot just to avoid individually accessing the uchar bytes... sort of a fast way to avoid ugly byte-specific math, but then I thought about endianness and whether my math would still be correct if I just cast everything like
*(unsigned short*)this->operator()(x0, y0).y = (ySum >> 2) & 0xFFFF;
...which is a line from a program that averages 4-adjacent-neighbors in a 2-D array, but the point is that I have a bunch of these operations that need to act on the uchar[2] field as a single short, and I'm trying to find the lightest (i.e. without an endian-based if-else statement every time I need to access or assign), endian-independent way of working with the short.
Because of the strict aliasing rules it's undefined behaviour, so it might be anything. If you did the same with a union, however, the answer is no: the result depends on machine endianness.
Each possible value of short has a so-called "object representation"[*], which is a sequence of byte values. When an object of type short holds that value, the bytes of the object hold that sequence of values.
You can think of endianness as just being one of the ways in which the object representation is implementation-dependent: does the byte with the lowest address hold the most significant bits of the value, or the least significant?
Hopefully this answers your question. Provided you've safely written a valid object representation of 1 as a short into some memory, when you read it back from the same memory you'll get the same value again, regardless of what the object representation of 1 actually is in that implementation. And in particular regardless of endianness. But as the others say, you do have to avoid undefined behavior.
[*] Or possibly there's more than one object representation for the same value, on exotic architectures.
Yes, all future dereferences will return 1 as well: As 1 is in range of type short, it will end up in memory unmodified and won't change behind your back once it's there.
However, the code itself violates effective typing: It's illegal to access an unsigned char[2] as a short, and may raise a SIGBUS if your architecture doesn't support unaligned access and you're particularly unlucky.
However, character-wise access of any object is always legal, and a portable version of your code looks like this:
short value = 1;
unsigned char *bytes = (unsigned char *)&value;
How value is stored in memory is of course still implementation-defined, ie you can't know what the following will print without further knowledge about the architecture:
assert(sizeof value == 2); // check for size 2 shorts
printf("%i %i\n", bytes[0], bytes[1]);

scanf("%d", char*) - char-as-int format string?

What is the format string modifier for char-as-number?
I want to read in a number never exceeding 255 (actually much less) into an unsigned char type variable using sscanf.
Using the typical
char source[] = "x32";
char separator;
unsigned char dest;
int len;
len = sscanf(source,"%c%d",&separator,&dest);
// validate and proceed...
I'm getting the expected warning: argument 4 of sscanf is type char*, int* expected.
As I understand the specs, there is no length modifier for char (like %hd for short, or %lld for long long).
Is it dangerous? (Will an overflow just roll over the variable, or will it write outside the allocated space?)
Is there a prettier way to achieve that than allocating a temporary int variable?
...or would you suggest an entirely different approach altogether?
You can use %hhd under glibc's scanf(); MSVC does not appear to support integer storage into a char directly (see the MSDN scanf Width Specification for more information on the supported conversions).
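A minimal sketch for the unsigned char case from the question (assuming a C99-conforming library, which is where the hh length modifier comes from):

#include <stdio.h>

int main(void)
{
    const char source[] = "x32";
    char separator;
    unsigned char dest;

    /* %hhu stores an unsigned decimal value directly into an unsigned char */
    int len = sscanf(source, "%c%hhu", &separator, &dest);

    if (len == 2)
        printf("separator='%c' dest=%u\n", separator, (unsigned)dest);
    return 0;
}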
It is dangerous to use that. Since there is an implicit conversion from an unsigned char * to an int *, sscanf is going to write a full int's worth of bytes, spilling over onto the bytes (up to 3 of them) next to the variable on the stack and corrupting their values.
The issue with %hhd is that, depending on the size of an int (not necessarily 4 bytes), it might not be 1 byte.
It does not seem that sscanf supports storing numbers into a char, so I suggest you use an int instead. If you want the char roll-over behaviour, you can just cast the int into a char afterwards, like:
int dest;
int len;
len = sscanf(source,"%c%d",&separator,&dest);
dest = (unsigned char)dest;
I want to read in a number never
exceeding 255 (actually much less)
into an unsigned char type variable
using sscanf.
In most contexts you save little to nothing by using char for an integer.
It generally depends on the architecture and compiler, but most modern CPUs are not very good at handling data types whose size differs from that of a register. (A notable exception is the 32-bit int on 64-bit architectures.)
Add to that the penalties for memory access that isn't aligned to the CPU word (do not ask me why CPUs do that), and the use of char should be limited to the cases where one really needs a char or where memory consumption is a concern.
It is not possible to do.
sscanf will never write a single byte when reading an integer.
If you pass a pointer to a single allocated byte as a pointer to int, you will write out of bounds. This may be OK due to default alignment, but you should not rely on that.
Create a temporary. This way you will also be able to range-check it at run time.
Probably the easiest way would be to load the number into a temporary integer and store it only if it is within the required bounds. (And you would probably need something like unsigned char result = tempInt & 0xFF;)
It's dangerous. sscanf will write an int-sized value over the character-sized variable. In your example (most probably) nothing bad will happen unless you reorganize your stack variables (sscanf will partially overwrite len when trying to write the int-sized "dest", but then it returns the correct "len" and overwrites it with the "right" value).
Instead, do the "correct thing" rather than relying on compiler mood:
char source[] = "x32";
char separator;
unsigned char dest;
int temp;
int len;
len = sscanf(source,"%c%d",&separator,&temp);
// validate and proceed...
if (temp >= YOUR_MIN_VALUE && temp <= YOUR_MAX_VALUE) {
    dest = (unsigned char)temp;
} else {
    // validation failed
}

Int in a simulated memory array of uchar

In C, in a Unix environment (Plan 9), I have an array used as memory.
uchar mem[32*1024];
I need that array to contain different fields, such as an int (integer) to indicate the size of memory free and available. So, I've tried this:
uchar* memp=mem;
*memp=(int)250; //An example of size I want to assign.
I know the size of an int is 4, so I have to force, with a cast or something like that, the first four slots of mem to hold the number 250 in this case; it's big endian.
But the problem is that when I try to do what I've explained, it doesn't work. I suppose there is a mistake with the conversion of types. So I ask you: how could I force mem[0] to mem[3] to hold the size indicated, represented as an int and not as an uchar?
Thanks in advance
Like this:
*((int*) memp) = 250;
That says "Even though memp is a pointer to characters, I want you treat it as a pointer to integers, and put this integer where it points."
Have you considered using a union, as in:
union mem_with_size {
    int size;
    uchar mem[32*1024];
};
Then you don't have to worry about the casting. (You still have to worry about byte-ordering, of course, but that's a different issue.)
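A minimal usage sketch (assuming sizeof(int) is 4, as in the question; the byte written past the size field is just an arbitrary illustration):

typedef unsigned char uchar;   /* Plan 9's uchar, spelled out here for completeness */

union mem_with_size {
    int size;
    uchar mem[32*1024];
};

int main(void)
{
    union mem_with_size m;

    m.size = 250;               /* occupies the first sizeof(int) bytes of the block */
    m.mem[sizeof(int)] = 0xAB;  /* other data can start after the size field         */

    return 0;
}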
As others have pointed out, you need to cast to a pointer to int. You also need to make sure you take alignment of the pointer in consideration: on many architectures, an int needs to start at a memory location that is divisible by sizeof(int), and if you try to access an unaligned int, you get a SIGBUS. On other architectures, it works, but slowly. On yet others, it works quickly.
A portable way of doing this might be:
int x = 250;
memcpy(mem + offset, &x, sizeof(x));
Using unions may make this easier, though, so +1 to JamieH.
Cast the pointer to int *, not back to unsigned char again:
int *memp = (int *)mem;
*memp = 250; // An example of the size I want to assign.
