C pointers void * buffer problem - c

Sorry to bother you all with C basics.
write() takes a void *buff, and I need to call this function from main(), passing it the required data.
But when I print the data, the compiler complains. Please help me out, friends.
The code is as follows.
void write(int fd, void *buff,int no_of_pages)
{
// some code that writes buff into a file using system calls
}
Now i need to send the buff with the data i need.
#include "stdio.h"
#include "malloc.h"
int main()
{
int *x=(int*)malloc(1024);
*(x+2)=3192;
*(x+3)="sindhu";
printf("\n%d %s",*(x+2),*(x+3));
write(2,x,10); //(10=4bytes for int + 6 bytes for char "sindhu");
}
It warns me
warning: format ‘%s’ expects type ‘char *’, but argument 3 has type ‘int’
How can i remove this warning

By casting to a valid type:
printf("\n%d %s",*(x+2),(char*)(x+3));
Note: What you're doing looks evil. I'd reconsider this design!

Quite simply: do as the error says. Do not pass an integer to a string formatting sequence.
printf("\n%d %d", *(x+2), *(x+3));
^--- note the change

You need to use a char * to reference a string:
char * cp = "sindhu";
printf("\n%d %s", *(x+2), cp);
would be better.

There are actually a couple of interesting points in your question. Firstly, I am surprised that the printf call generates a warning at all; that is rather helpful of your compiler, since printf is inherently not type safe and no diagnostic is required. Secondly, I am actually amazed that your compiler is allowing this:
*(x+3) = "sindhu";
I am pretty sure that should be an error or at the very least a warning, without an explicit cast. Note that "sindhu" is a string literal of type char[7], which decays to char * here, while your array is an array of int. So essentially what you are doing here is putting the memory address of the string into the fourth integer in your array. Now the important thing here is that this makes the very dangerous assumption that:
sizeof(int) == sizeof(char*)
This can easily fail to be the case; most notably, many 64-bit systems have a 4-byte int and an 8-byte pointer.
Bitmask's answer will eliminate the warning you are receiving, however as he suggests, I strongly advise that you change the design of your program such that this is not necessary.
Also, as one final stylistic point: arrays and pointers in C are closely related (though not the same thing), and suffice it to say that *(x+2) is equivalent to x[2], which is rather easier on the eyes when reading the code.
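To make the size assumption discussed above concrete, here is a minimal sketch. The helper names pointer_fits_in_int and roundtrip_pointer are made up for illustration and are not from the question. It checks whether a char * fits in an int on the current platform, and shows the integer type meant for holding pointer values, uintptr_t from <stdint.h> (an optional type in the standard, though mainstream hosted implementations provide it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a char * fits into an int on this platform, 0 otherwise. */
int pointer_fits_in_int(void)
{
    return sizeof(char *) <= sizeof(int);
}

/* Round-trips a pointer through uintptr_t, the integer type intended for
   holding pointer values, and returns the recovered pointer. */
const char *roundtrip_pointer(const char *s)
{
    assert(s != NULL);
    uintptr_t bits = (uintptr_t)(const void *)s;   /* pointer -> integer */
    return (const char *)(const void *)bits;       /* integer -> pointer */
}
```

On a typical 64-bit system pointer_fits_in_int() returns 0, which is exactly why storing a pointer in an int element is fragile.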

int *x=(int*)malloc(1024);
Lose the cast; it's not necessary, and it will suppress a useful diagnostic if you forget to #include <stdlib.h> or otherwise don't have a declaration for malloc in scope. Secondly, it's generally better from a readability standpoint to specify the number of elements you need of a specific type, rather than a number of bytes. You'd do that like so:
int *x = malloc(N * sizeof *x);
which says "allocate enough memory to store N int values".
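As a rough sketch of that allocation idiom wrapped in a helper (alloc_ints and store_int are hypothetical names, not part of the question), with a guard against the multiplication overflowing and a bounds-checked store:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for n ints. Returns NULL if n is 0, if n * sizeof(int)
   would overflow size_t, or if malloc itself fails. */
int *alloc_ints(size_t n)
{
    int *x;
    if (n == 0 || n > SIZE_MAX / sizeof *x)
        return NULL;
    x = malloc(n * sizeof *x);   /* no cast needed in C */
    return x;
}

/* Stores value at index i of an n-element array, checking the index. */
void store_int(int *x, size_t n, size_t i, int value)
{
    assert(x != NULL && i < n);
    x[i] = value;
}
```

The caller still owns the memory and has to free() it.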
*(x+2)=3192;
Okay. You're assigning the integer value 3192 to x[2].
*(x+3)="sindhu";
Bad juju; I'm surprised the compiler didn't yak on this line. You're attempting to store a value of type char * to an int (since the type of x is int *, the type of *(x + 3) is int). I'm not sure what you're trying to accomplish here; if you're trying to store the value of the pointer at x[3], note that pointer values may not necessarily be representable as an int (for example, suppose a char * is 4 bytes wide but an int is 2 bytes wide). In either case the types are not compatible, and a cast is required:
*(x + 3) = (int) "sindhu"; // equivalent to writing x[3] = (int) "sindhu"
If you're trying to copy the contents of the string to the buffer starting at x[3], this is definitely the wrong way to go about it; to make this "work" (for suitably loose definitions of "work"), you would need to use either the strcpy or memcpy library functions:
strcpy((char *) (x + 3), "sindhu"); // note the cast, and the fact that
// x + 3 is *not* dereferenced.
As for the problem in the printf statement, the type of *(x + 3) is int, not char *, which is not compatible with the %s conversion specifier. Again, to make this "work", you'd do something like
printf("%d %s\n", *(x + 2), (char *) (x + 3));
You really don't want to store different types of data in the same memory buffer in such an unstructured way; unless you really know what you're doing, it leads to massive heartburn.
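If the goal is a byte buffer holding an int followed by a string, one more structured way to build it is to copy each piece in with memcpy and track the size explicitly. This is only a sketch under assumptions: pack_record and unpack_record are hypothetical helpers, and the layout (a native int immediately followed by the NUL-terminated string) is my choice, not something the question specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies an int, then the string's bytes including the terminator, into buf.
   Returns the number of bytes used, or 0 if buf is too small. */
size_t pack_record(void *buf, size_t cap, int value, const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t name_len = strlen(name) + 1;
    if (cap < sizeof value || cap - sizeof value < name_len)
        return 0;
    unsigned char *p = buf;
    memcpy(p, &value, sizeof value);
    memcpy(p + sizeof value, name, name_len);
    return sizeof value + name_len;
}

/* Reads the int back and returns a pointer to the string stored after it. */
const char *unpack_record(const void *buf, int *value)
{
    assert(buf != NULL && value != NULL);
    const unsigned char *p = buf;
    memcpy(value, p, sizeof *value);
    return (const char *)(p + sizeof *value);
}
```

The value returned by pack_record is the byte count to hand to whatever function writes the buffer out, so nothing has to be counted by hand.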

Related

strcpy()/strncpy() crashes on structure member with extra space when optimization is turned on on Unix?

When writing a project, I ran into a strange issue.
This is the minimal code I managed to write to recreate the issue. I am intentionally storing an actual string in the place of something else, with enough space allocated.
// #include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h> // For offsetof()
typedef struct _pack{
// The type of `c` doesn't matter as long as it's inside of a struct.
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// Version 1: crash
strcpy((char*)&(p->c), str);
// Version 2: crash
strncpy((char*)&(p->c), str, strlen(str)+1);
// Version 3: works!
memcpy((char*)&(p->c), str, strlen(str)+1);
// puts((char*)&(p->c));
free(p);
return 0;
}
The above code is confusing me:
With gcc/clang -O0, both strcpy() and memcpy() work on Linux/WSL, and the puts() below gives whatever I entered.
With clang -O0 on OSX, the code crashes with strcpy().
With gcc/clang -O2 or -O3 on Ubuntu/Fedora/WSL, the code crashes (!!) at strcpy(), while memcpy() works well.
With gcc.exe on Windows, the code works well whatever the optimization level is.
Also I found some other traits of the code:
(It looks like) the minimum input to reproduce the crash is 9 bytes (including zero terminator), or 1+sizeof(p->c). With that length (or longer) a crash is guaranteed (Dear me ...).
Even if I allocate extra space (up to 1MB) in malloc(), it doesn't help. The above behaviors don't change at all.
strncpy() behaves exactly the same, even with the correct length supplied to its 3rd argument.
The pointer does not seem to matter. If structure member char *c is changed into long long c (or int64_t), the behavior remains the same. (Update: changed already).
The crash message doesn't look regular. A lot of extra info is given along.
I tried all these compilers and they made no difference:
GCC 5.4.0 (Ubuntu/Fedora/OS X/WSL, all are 64-bit)
GCC 6.3.0 (Ubuntu only)
GCC 7.2.0 (Android, norepro???) (This is the GCC from C4droid)
Clang 5.0.0 (Ubuntu/OS X)
MinGW GCC 6.3.0 (Windows 7/10, both x64)
Additionally, this custom string copy function, which looks exactly like the standard one, works well with any compiler configuration mentioned above:
char* my_strcpy(char *d, const char* s){
char *r = d;
while (*s){
*(d++) = *(s++);
}
*d = '\0';
return r;
}
Questions:
Why does strcpy() fail? How can it?
Why does it fail only if optimization is on?
Why doesn't memcpy() fail regardless of -O level??
*If you want to discuss struct member access violation, please head over here.
Part of objdump -d's output of a crashing executable (on WSL):
P.S. Initially I want to write a structure, the last item of which is a pointer to a dynamically allocated space (for a string). When I write the struct to file, I can't write the pointer. I must write the actual string. So I came up with this solution: force store a string in the place of a pointer.
Also please don't complain about gets(). I don't use it in my project, but the example code above only.
What you are doing is undefined behavior.
The compiler is allowed to assume that you will never use more than sizeof(int64_t) bytes for the member int64_t c. So if you try to write more than sizeof(int64_t) (that is, sizeof c) bytes into c, you have an out-of-bounds access in your code. This is the case here because sizeof "aaaaaaaa" > sizeof(int64_t).
The point is, even if you allocate the correct memory size using malloc(), the compiler is allowed to assume you will never use more than sizeof(int64_t) bytes in your strcpy() or memcpy() call, because you pass the address of c (an int64_t).
TL;DR: You are trying to copy 9 bytes to a type consisting of 8 bytes (we suppose that a byte is an octet). (From @Kcvin)
If you want something similar use flexible array members from C99:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
size_t size;
char str[];
} string;
int main(void) {
char str[] = "aaaaaaaa";
size_t len_str = strlen(str);
string *p = malloc(sizeof *p + len_str + 1);
if (!p) {
return 1;
}
p->size = len_str;
strcpy(p->str, str);
puts(p->str);
strncpy(p->str, str, len_str + 1);
puts(p->str);
memcpy(p->str, str, len_str + 1);
puts(p->str);
free(p);
}
Note: For standard quote please refer to this answer.
I reproduced this issue on my Ubuntu 16.10 and I found something interesting.
When compiled with gcc -O3 -o ./test ./test.c, the program will crash if the input is longer than 8 bytes.
After some reversing I found that GCC replaced strcpy with __memcpy_chk, see this.
// decompile from IDA
int __cdecl main(int argc, const char **argv, const char **envp)
{
int *v3; // rbx
int v4; // edx
unsigned int v5; // eax
signed __int64 v6; // rbx
char *v7; // rax
void *v8; // r12
const char *v9; // rax
__int64 _0; // [rsp+0h] [rbp+0h]
unsigned __int64 vars408; // [rsp+408h] [rbp+408h]
vars408 = __readfsqword(0x28u);
v3 = (int *)&_0;
gets(&_0, argv, envp);
do
{
v4 = *v3;
++v3;
v5 = ~v4 & (v4 - 16843009) & 0x80808080;
}
while ( !v5 );
if ( !((unsigned __int16)~(_WORD)v4 & (unsigned __int16)(v4 - 257) & 0x8080) )
v5 >>= 16;
if ( !((unsigned __int16)~(_WORD)v4 & (unsigned __int16)(v4 - 257) & 0x8080) )
v3 = (int *)((char *)v3 + 2);
v6 = (char *)v3 - __CFADD__((_BYTE)v5, (_BYTE)v5) - 3 - (char *)&_0; // strlen
v7 = (char *)malloc(v6 + 9);
v8 = v7;
v9 = (const char *)_memcpy_chk(v7 + 8, &_0, v6 + 1, 8LL); // Fourth argument is 8!!
puts(v9);
free(v8);
return 0;
}
Your struct pack makes GCC believe that the element c is exactly 8 bytes long.
And __memcpy_chk will fail if the copy length is larger than the fourth argument!
So there are 2 solutions:
Modify your structure
Use the compile option -D_FORTIFY_SOURCE=0 (as in gcc test.c -O3 -D_FORTIFY_SOURCE=0 -o ./test) to turn off the fortified functions.
Caution: This will fully disable buffer overflow checking in the whole program!!
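To see where that 8 comes from, the following sketch asks the compiler directly how many bytes it believes are available at &p->c. It relies on __builtin_object_size, a GCC/Clang extension rather than standard C, and the function name destination_size_seen_by_compiler is made up for this example. What comes back depends on the compiler and optimization level: it is either the size of the member, or (size_t)-1 when the compiler cannot tell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t c;
} pack;

/* Reports how many bytes the compiler believes are available at &p->c.
   Mode 1 asks for the size of the closest enclosing subobject.
   __builtin_object_size is a GCC/Clang extension, not standard C. */
size_t destination_size_seen_by_compiler(void)
{
    pack *p = malloc(sizeof *p + 16);   /* extra room, as in the question */
    assert(p != NULL);
    size_t seen = __builtin_object_size(&p->c, 1);
    free(p);
    return seen;
}
```

Note that the extra 16 bytes in the malloc call make no difference to the answer, which matches the observation in the question that allocating more space does not help.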
No answer has yet talked in detail about why this code may or may not be undefined behaviour.
The standard is underspecified in this area, and there is a proposal active to fix it. Under that proposal, this code would NOT be undefined behaviour, and the compilers generating code that crashes would fail to comply with the updated standard. (I revisit this in my concluding paragraph below).
But note that based on the discussion of -D_FORTIFY_SOURCE=2 in other answers, it seems this behaviour is intentional on the part of the developers involved.
I'll talk based on the following snippet:
char *x = malloc(9);
pack *y = (pack *)x;
char *z = (char *)&y->c;
char *w = (char *)y;
Now, all three of x z w refer to the same memory location, and would have the same value and the same representation. But the compiler treats z differently to x. (The compiler also treats w differently to one of those two, although we don't know which as OP didn't explore that case).
This topic is called pointer provenance. It means the restriction on which object a pointer value may range over. The compiler is taking z as having a provenance only over y->c, whereas x has provenance over the entire 9-byte allocation.
The current C Standard does not specify provenance very well. The rule that pointer subtraction may only occur between two pointers to the same array object is one example of a provenance rule. Another provenance rule is the one that applies to the code we are discussing, C 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
The justification for bounds-checking of strcpy, memcpy always comes back to this rule - those functions are defined to behave as if they were a series of character assignments from a base pointer that's incremented to get to the next character, and the increment of a pointer is covered by (P)+1 as discussed in this rule.
Note that the term "the array object" may apply to an object that wasn't declared as an array. This is spelled out in 6.5.6/7:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
The big question here is: what is "the array object"? In this code, is it y->c, *y, or the actual 9-byte object returned by malloc?
Crucially, the standard sheds no light whatsoever on this matter. Whenever we have objects with subobjects, the standard does not say whether 6.5.6/8 is referring to the object or the subobject.
A further complicating factor is that the standard does not provide a definition for "array", nor for "array object". But to cut a long story short, the object allocated by malloc is described as "an array" in various places in the standard, so it does seem that the 9-byte object here is a valid candidate for "the array object". (In fact this is the only such candidate for the case of using x to iterate over the 9-byte allocation, which I think everyone would agree is legal).
Note: this section is very speculative and I attempt to provide an argument as to why the solution chosen by the compilers here is not self-consistent
An argument could be made that &y->c means the provenance is the int64_t subobject. But this does immediately lead to difficulty. For example, does y have the provenance of *y? If so, (char *)y should still have the provenance of *y, but then this contradicts the rule of 6.3.2.3/7 that casting a pointer to another type and back should return the original pointer (as long as alignment is not violated).
Another thing it doesn't cover is overlapping provenance. Can a pointer compare unequal to a pointer of the same value but a smaller provenance (which is a subset of the larger provenance) ?
Further, if we apply that same principle to the case where the subobject is an array:
char arr[2][2];
char *r = (char *)arr;
++r; ++r; ++r; // undefined behavior - exceeds bounds of arr[0]
arr is defined as meaning &arr[0] in this context, so if the provenance of &X is X, then r is actually bounded to just the first row of the array -- perhaps a surprising result.
It would be possible to say that char *r = (char *)arr; leads to UB here, but char *r = (char *)&arr; does not. In fact I used to promote this view in my posts many years ago. But I no longer do: in my experience of trying to defend this position, it just can't be made self-consistent, there are too many problem scenarios. And even if it could be made self-consistent, the fact remains that the standard doesn't specify it. At best, this view should have the status of a proposal.
To finish up, I would recommend reading N2090: Clarifying Pointer Provenance (Draft Defect Report or Proposal for C2x).
Their proposal is that provenance always applies to an allocation. This renders moot all the intricacies of objects and subobjects. There are no sub-allocations. In this proposal, all of x z w are identical and may be used to range over the whole 9-byte allocation. IMHO the simplicity of this is appealing, compared to what was discussed in my previous section.
This is all because of -D_FORTIFY_SOURCE=2 intentionally crashing on what it decides is unsafe.
Some distros build gcc with -D_FORTIFY_SOURCE=2 enabled by default. Some don't. This explains all the differences between different compilers. Probably the ones that don't crash normally will if you build your code with -O3 -D_FORTIFY_SOURCE=2.
Why does it fail only if optimization is on?
_FORTIFY_SOURCE requires compiling with optimization (-O) to keep track of object sizes through pointer casts / assignments. See the slides from this talk for more about _FORTIFY_SOURCE.
Why does strcpy() fail? How can it?
gcc calls __memcpy_chk for strcpy only with -D_FORTIFY_SOURCE=2. It passes 8 as the size of the target object, because that's what it thinks you mean / what it can figure out from the source code you gave it. Same deal for strncpy calling __strncpy_chk.
__memcpy_chk aborts on purpose. _FORTIFY_SOURCE may be going beyond things that are UB in C and disallowing things that look potentially dangerous. This gives it license to decide that your code is unsafe. (As others have pointed out, a flexible array member as the last member of your struct, and/or a union with a flexible-array member, is how you should express what you're doing in C.)
gcc even warns that the check will always fail:
In function 'strcpy',
inlined from 'main' at <source>:18:9:
/usr/include/x86_64-linux-gnu/bits/string3.h:110:10: warning: call to __builtin___memcpy_chk will always overflow destination buffer
return __builtin___strcpy_chk (__dest, __src, __bos (__dest));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(from gcc7.2 -O3 -Wall on the Godbolt compiler explorer).
Why doesn't memcpy() fail regardless of -O level?
IDK.
gcc fully inlines it as just an 8B load/store + a 1B load/store. (Seems like a missed optimization; it should know that malloc didn't modify it on the stack, so it could just store it from immediates again instead of reloading, or better, keep the 8B value in a register.)
Why make things complicated? Overcomplicating things the way you're doing just leaves more room for undefined behaviour, in this part:
memcpy((char*)&p->c, str, strlen(str)+1);
puts((char*)&p->c);
warning: passing argument 1 of 'puts' from incompatible pointer type [-Wincompatible-pointer-types]
puts(&p->c);
you're clearly ending up in an unallocated memory area or somewhere writable if you're lucky...
Optimizing or not may change the values of the addresses, and it may work (since the addresses match), or not. You just cannot do what you want to do (basically lying to the compiler)
I would:
allocate just what's needed for the struct, don't take the length of the string inside into account, it's useless
don't use gets as it's unsafe and obsolescent
use strdup instead of the bug-prone memcpy code you're using since you're handling strings. strdup won't forget to allocate the nul-terminator, and will set it in the target for you.
don't forget to free the duplicated string
read the warnings; puts(&p->c) is undefined behaviour
test.c:19:10: warning: passing argument 1 of 'puts' from incompatible pointer type [-Wincompatible-pointer-types]
puts(&p->c);
My proposal
int main(){
pack *p = malloc(sizeof(pack));
char str[1024];
fgets(str,sizeof(str),stdin);
p->c = strdup(str);
puts(p->c);
free(p->c);
free(p);
return 0;
}
Your pointer p->c is the cause of the crash.
First, allocate the struct with the size of "unsigned long long" plus the size of "*p".
Second, point p->c at a separately allocated area of the required size.
Then do the copy: strcpy(p->c, str);
Finally, free in order: first free(p->c), then free(p).
I think that was it.
[EDIT]
I'll insist.
The cause of the error is that your structure only reserves space for the pointer, but the pointer is never made to point at memory that can hold the data being copied. Take a look:
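int main()
{
pack *p; // assumes the original struct, whose member c is a char *
char str[1024];
if (fgets(str, sizeof str, stdin) == NULL) return 1; // fgets instead of the unsafe gets
str[strcspn(str, "\n")] = '\0'; // drop the trailing newline, if any
size_t len_struc = sizeof(*p) + sizeof(unsigned long long);
p = malloc(len_struc);
p->c = malloc(strlen(str) + 1); // + 1 for the terminating '\0'
strcpy(p->c, str); // This does not crash!
puts(p->c);
free(p->c);
free(p);
return 0;
}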
[EDIT2]
This is not a traditional way to store data but this works:
pack2 *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = sizeof(pack) + (strlen(str) + 1);
p = malloc(len);
// Version 1: crash
strcpy((char*)p + sizeof(pack), str);
free(p);

Not Understanding Void Pointer / How to cast? (C Programming)

Suppose I want to write a function:
int read_file (char *filename, void *abc)
that reads the file, and puts the numbers in an array, which abc points to.
I must use the void pointer - I know how I would do it if it were int *abc instead of void, as I could treat abc syntactically like an array, and do stuff like abc[0]=1, but here I can't do that, as it's a void pointer.
I'm not too familiar with void pointers, and how I should get this to work. Please help! I prefer not to post code, as this is for a school assignment, and just want to know how I would put the information in the file into an array pointed to by abc, maybe with casting (not sure how to do that though).
I am already familiar with putting file information into an array, if it's given by int abc.
If read_file is always called with an int array for the abc parameter, you can just copy it to an int pointer and work with that.
int *p = abc;
In most cases you need a cast when converting from one pointer type to another; however, a void * may be safely converted to or from any object (non-function) pointer type without a cast.
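Here is a minimal sketch of that pattern. It parses numbers from a string rather than a file, so it is not a solution to the assignment; store_numbers and its parameters are hypothetical names, and it assumes the caller really passes an int array and that every value fits in an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses whitespace-separated integers from text and stores them in the
   int array that abc points to. Returns how many numbers were stored. */
size_t store_numbers(const char *text, void *abc, size_t capacity)
{
    assert(text != NULL && abc != NULL);
    int *out = abc;                 /* implicit conversion from void * */
    size_t count = 0;
    while (count < capacity) {
        char *end;
        long v = strtol(text, &end, 10);
        if (end == text)            /* nothing parsed: no more numbers */
            break;
        out[count++] = (int)v;      /* assumes the value fits in an int */
        text = end;
    }
    return count;
}
```

Once the void * has been assigned to an int *, the array syntax from the question (out[0] = 1 and so on) works as usual.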
Under the hood, on typical platforms, all object pointers are represented the same way: as integers holding memory addresses. So you can cast between them freely. Just go:
int *p = (int *)abc;
And voila - you have your int * which you already know how to deal with. This is the "casting" thing, by the way - write the desired type in parentheses in front of your expression to "cast" it. It'll take the same bits and reinterpret them in a different way.
In a few cases C is actually smart enough to convert the data rather than blindly use the same bits. For example:
float f = 3.75;
int i = (int)f;
In this case i will contain 3, because the conversion discards the fractional part (truncation toward zero). An int is stored in memory differently than a float, so there is actual conversion going on here.
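A small sketch of that conversion, showing that a negative value also loses its fractional part rather than being rounded down. float_to_int is a hypothetical helper, and the range check assumes the usual case where values near INT_MAX are not exactly representable as float.

```c
#include <assert.h>
#include <limits.h>

/* Converts a float to an int by discarding the fractional part.
   The conversion is undefined if the value is out of range for int,
   hence the range check. */
int float_to_int(float f)
{
    assert(f >= (float)INT_MIN && f < (float)INT_MAX);
    return (int)f;
}
```

So 3.75f becomes 3, and -3.75f becomes -3 (not -4).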
And in other cases it will forbid you to cast at all, because of how little sense it makes:
char *c = "Hello, world!";
float f = (float)c;
But quite often you can get away with it. Especially with pointers, everything is fair game. Now, mind you, with great power comes great responsibility. Although you can do it, doesn't mean that the result will be sensible and that using it won't crash your program. Be careful.

Why am I getting warnings when returning array from a function?

I've seen a few examples of returning array from function on stackoverflow. I followed those examples but i am still getting warnings.
#include <stdio.h>
#include <stdlib.h>
char * getNum();
int main(){
int * num;
int * i;
num = getNum();
puts(num);
return 0;
}
char * getNum(){
FILE * fp;
fp = fopen("number", "r"); //getting a 1000 digit number from a file
char * n; //putting it in "array"
n = (char *)malloc(1000 * sizeof(char));
char x[100];
int i, j = 0;
while(!feof(fp)){
fgets(x, 100, fp);
for(i=0; i<50; i++){ //getting the first 50 characters in a line
n[j] = x[i]; //to avoid "new line"
j++;
}
}
fclose(fp);
n[1000] = '\0';
return n;
}
puts(num) gives the right number should I just ignore the warnings?
Why are they popping up?
I hope this isn't considered a duplicate.
cc 8.c -o 8
8.c: In function ‘main’:
8.c:11:9: warning: assignment from incompatible pointer type
num = getNum();
^
8.c:12:10: warning: passing argument 1 of ‘puts’ from incompatible pointer type
puts(num);
^
In file included from 8.c:1:0:
/usr/include/stdio.h:695:12: note: expected ‘const char *’ but argument is of type ‘int *’
extern int puts (const char *__s);
^
The problem is that you seem to be assigning the result of calling getNum to a variable of type int *, which doesn't make much sense as getNum returns a char *.
You also try to print with puts that variable of type int *, while puts accepts only a const char *.
What's more, you're getting out of bounds in your function's code, as I've already mentioned in the comments: n = (char *)malloc(1000 * sizeof(char)); allocates memory for exactly 1000 characters. n[1000] = '\0'; tries to access the 1001st (!) character. Remember, the array indices start from zero!
n = (char *)malloc(1000 * sizeof(char)); // ugly code
That should really be
n = malloc(1000);
if (n==NULL) {perror("malloc"); exit(EXIT_FAILURE);}
because sizeof(char) is always 1, and because you should not cast (in C) the result of malloc, and because malloc can fail and you should test against that.
Then you do later
n[1000] = '\0'; // wrong code
This changes the 1001st byte (not the 1000th and last one), so it is undefined behavior (a buffer overflow). You should be really scared: bad things could happen (and if they don't, you are unlucky, because the bug stays hidden). You probably want
n[999] = '\0'; // the 1000th byte
BTW, I guess you are using GCC or Clang on some POSIX-like system. You really should get into the habit of compiling with all warnings and debug info:
cc -Wall -g 8.c -o 8
(If using make you probably want CFLAGS= -Wall -g in your Makefile)
You might want to learn more about instrumentation options, e.g. you could also pass -fsanitize=address to gcc
Then you can use the gdb debugger and perhaps also valgrind. But be sure to improve your own code till you get no warnings.
Of course num = getNum(); is incorrect. You are assigning a char * to a variable declared as int * (so you have a type mismatch). That is often wrong (and in the rare cases, not here, where you would really mean it, you should have some explicit cast to show the reader of your code, perhaps yourself next year, what you really wanted to do).
If you want to convert a string to a number, read more of the documentation of the C standard library. You could declare num as an int and use atoi (type man atoi in a terminal, or read atoi(3)); keep in mind that a 1000-digit number is far too large for an int, so this only suits small values. Then code:
num = atoi(getNum());
You are also coding:
while(!feof(fp)) // bad code
This is wrong (and a bit tricky, see this). feof is only valid after some read operation (and not before, see feof(3)). Since you want to remove the ending \n (but read fgets(3) & strchr(3)...) you probably want instead
do {
if (NULL==fgets(x, 100, fp)) break;
char*eol = strchr(x, '\n');
if (eol) *eol = 0;
} while (!feof(fp));
Finally, your code is also incorrect because malloc does not initialize the returned memory. So your n might have a zero byte in some unexpected place.
Perhaps you should read about the (POSIX specific) strdup(3) or getline(3). Both could make your code simpler. And think about the adverse situation of files with very long lines (you might have a file with a line of many thousand bytes, and you could even have a file without any newlines in it).
Your puts(num) is also wrong, since it is also a type mismatch. Consider using printf(3) (but get into the habit of ending the format string with \n, or else use fflush(3), because <stdio.h> output is buffered), so printf("%d\n", num);
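Several of the points above (strip the newline, never write past the buffer, always leave room for the terminator) can be combined in one small helper. This is just a sketch: append_line is a hypothetical name, and it works on a line already read by fgets rather than doing the file I/O itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends one line, without its trailing newline, to dst. dst already holds
   `used` characters and has room for `cap` bytes in total. Returns the new
   number of characters stored; dst is always left NUL-terminated. */
size_t append_line(char *dst, size_t cap, size_t used, const char *line)
{
    assert(dst != NULL && line != NULL && used < cap);
    size_t len = strcspn(line, "\n");   /* length up to, not including, '\n' */
    size_t room = cap - 1 - used;       /* keep one byte for the terminator */
    if (len > room)
        len = room;                     /* truncate instead of overflowing */
    memcpy(dst + used, line, len);
    dst[used + len] = '\0';
    return used + len;
}
```

In the reading loop this would be called once per successful fgets, for example while (fgets(x, sizeof x, fp) != NULL) used = append_line(n, 1001, used, x); with n allocated as 1001 bytes so that 1000 digits plus the terminator fit.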
The problem is that you are assigning a pointer of type char * to one of type int *, hence "INCOMPATIBLE" pointers.
You also attempted to pass the int * pointer to puts() which expects a char * pointer, thus there is another warning.
Why are you doing this? If they are clearly pointers of different type it's very strange that you attempted it anyway.
It would be interesting to know, if you think this is correct for some reason.
You need to carefully read warnings and try to understand what they mean, it would help you learn a lot of the basics and good practice too.
Also, please note that while (!feof(fp)) is not a good way to check whether there is nothing else to read, because fgets() needs to fail first (by attempting a read after EOF), so you still run the for (i ...) loop after the bad read. You should change to while (fgets(x, 100, fp) != NULL)
Finally, in C there is no need to cast void * to any other pointer type, and it's considered bad practice to cast the return value of malloc() because of the reasons explained in this answer

When to cast size_t

I'm a little confused as how to use size_t when other data types like int, unsigned long int and unsigned long long int are present in a program. I try to illustrate my confusion minimally. Imagine a program where I use
void *calloc(size_t nmemb, size_t size)
to allocate an array (one- or multidimensional). Let the call to calloc() be dependent on nrow and sizeof(unsigned long int). sizeof(unsigned long int) is obviously fine because it returns size_t. But let nrow be such that it needs to have type unsigned long int. What do I do in such a case? Do I cast nrow in the call to calloc() from unsigned long int to size_t?
Another case would be
char *fgets(char *s, int size, FILE *stream)
fgets() expects type int as its second parameter. But what if I pass it an array, let's say save, as its first parameter and use sizeof(save) to pass it the size of the array? Do I cast the call to sizeof() to int? That would be dangerous since int isn't guaranteed to hold all possible returns from sizeof().
What should I do in these two cases? Cast, or just ignore possible warnings from tools such as splint?
Here is an example regarding calloc() (I explicitly omit error-checking for clarity!):
long int **arr;
unsigned long int mrow;
unsigned long int ncol;
arr = calloc(mrow, sizeof(long int *));
for(i = 0; i < mrow; i++) {
arr[i] = calloc(ncol, sizeof(long int));
}
Here is an example for fgets() (Error-handling again omitted for clarity!):
char save[22];
char *ptr_save;
unsigned long int mrow;
if (fgets(save, sizeof(save), stdin) != NULL) {
save[strcspn(save, "\n")] = '\0';
mrow = strtoul(save, &ptr_save, 10);
}
I'm a little confused as how to use size_t when other data types like
int, unsigned long int and unsigned long long int are present in a
program.
It is never a good idea to ignore warnings. Warnings are there to direct your attention to areas of your code that may be problematic. It is much better to take a few minutes to understand what the warning is telling you, and fix it, than to get bitten by it later when you hit a corner case and stumble off into undefined behavior.
size_t itself is just a data type like any other. It is an unsigned integer type, the type of the result of sizeof, wide enough to hold the size of any object; its width varies by platform (commonly 32 bits on 32-bit systems and 64 bits on 64-bit systems). Your choice of data type is a basic and fundamental part of programming. You choose the type based on the range of values your variable can represent (or should be limited to representing). So if whatever you are dealing with can't be negative, then an unsigned type or size_t is the proper choice. The choice then allows the compiler to help identify areas where your code would cause that to be violated.
When you compile with warnings enabled (e.g. -Wall -Wextra) which you should use on every compile, you will be warned about possible conflicts in your data-type use. (i.e. comparison between signed and unsigned values, etc...) These are important!
Virtually all modern x86 & x86_64 computers use the two's complement representation for signed values. In simple terms, it means that if the leftmost bit of a signed number is 1, the value is negative. Herein lie the subtle traps you may fall into when mixing, casting, or comparing numbers of varying type. If you choose to cast an unsigned number to a signed number and that number happens to have the most significant bit set, your large number just became a negative number.
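The comparison trap is easy to demonstrate. In the sketch below (function names are hypothetical), the naive version lets C convert the int to unsigned before comparing, so -1 turns into a huge value; the second version checks the sign first. Compilers typically flag the naive comparison under -Wextra or -Wsign-compare.

```c
#include <assert.h>

/* Naive comparison: a is converted to unsigned int before comparing,
   so a negative a becomes a very large value. */
int naive_less(int a, unsigned int b)
{
    return a < b;
}

/* Sign-aware comparison: any negative a is smaller than every unsigned b. */
int safe_less(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

/* Self-check showing the difference for a = -1, b = 1. */
void demonstrate_comparisons(void)
{
    assert(naive_less(-1, 1u) == 0);   /* surprising: -1 < 1 reported false */
    assert(safe_less(-1, 1u) == 1);    /* expected answer */
}
```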
What should I do in these two cases? Cast, or just ignore possible
warnings...
You do what you do each time you are faced with warnings from the compiler. You analyze what is causing the warning, and then you fix it. If you can't fix it (e.g. it comes from some library you don't have access to), you understand the warning well enough that you can make an educated decision to disregard it, knowing you will not hit any corner cases that would lead to undefined behavior.
In your examples (while neither should produce a warning, they may on some compilers):
arr = calloc (mrow, sizeof(long int *));
What is the range of sizeof(long int *)? Well, it's the range of what the pointer size can be. So, what's that? Typically 4 bytes on x86 or 8 bytes on x86_64. So the value is 4 or 8, which fits in any integer type, and any warning here can safely be fixed with a cast if needed, or better just:
arr = calloc (mrow, sizeof *arr);
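As a rough sketch of how the sizeof *arr idiom carries through a whole allocation, assuming arr is meant to be an array of mrow rows of long (the helper names matrix_new and matrix_free and the ncol parameter are mine, not from the question):

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an mrow x ncol matrix of long, zero-filled.
   Returns NULL if any allocation fails. */
long **matrix_new(size_t mrow, size_t ncol)
{
    long **arr = calloc(mrow, sizeof *arr);      /* mrow row pointers */
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < mrow; i++) {
        arr[i] = calloc(ncol, sizeof *arr[i]);   /* ncol longs per row */
        if (arr[i] == NULL) {
            while (i > 0)                        /* undo the rows made so far */
                free(arr[--i]);
            free(arr);
            return NULL;
        }
    }
    return arr;
}

/* Hypothetical helper: release a matrix created by matrix_new. */
void matrix_free(long **arr, size_t mrow)
{
    if (arr == NULL)
        return;
    for (size_t i = 0; i < mrow; i++)
        free(arr[i]);
    free(arr);
}
```

Because every count is a size_t and every element size comes from sizeof on the pointed-to object, there is nothing to cast and nothing to keep in sync if the element type changes later.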
Looking at the next example:
char save[22];
...
fgets(save, sizeof(save), stdin)
Here again, what is the possible range of sizeof save? From 22 to 22. So yes, if a warning is produced complaining that sizeof returns long unsigned and fgets calls for int, 22 can safely be cast to int.
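Putting the fgets call together with the strtoul parsing from the question, one possible shape is sketched below. The helpers parse_size and read_size are hypothetical, and I have swapped in strtoull plus a range check against SIZE_MAX so the result can be stored in a size_t without a silent narrowing.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a non-negative decimal string into a size_t. */
bool parse_size(const char *s, size_t *out)
{
    char *end;
    unsigned long long v;

    if (!isdigit((unsigned char)s[0]))   /* rejects "", "-1", " 5", "abc" */
        return false;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (*end != '\0' || errno == ERANGE || v > SIZE_MAX)
        return false;                    /* trailing junk or out of range */
    *out = (size_t)v;                    /* safe: range checked above */
    return true;
}

/* Hypothetical helper: read one line from `in` and parse it as a count. */
bool read_size(FILE *in, size_t *out)
{
    char save[22];

    /* sizeof save is the constant 22, so converting it to int is harmless */
    if (fgets(save, (int)sizeof save, in) == NULL)
        return false;
    save[strcspn(save, "\n")] = '\0';
    return parse_size(save, out);
}
```

The leading-digit check is there because the strtoul family accepts a minus sign and quietly wraps the result, which is rarely what you want for a row count.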
When to cast size_t
You shouldn't.
Use it where it's appropriate.
(As you already noticed) the libc-library functions tell you where this is the case.
Additionally use it to index arrays.
If in doubt whether the type suits your program's needs, you might go for the useful assertion as per Steve Summit's answer and, if it fails, start over with your program's design.
More on this here by Dan Saks: "Why size_t matters" and "Further insights into size_t"
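One thing to watch when indexing with size_t is counting downwards: an unsigned index is never negative, so a condition like i >= 0 is always true. A small sketch of a loop shape that works (reverse_copy is just an illustrative name):

```c
#include <stddef.h>

/* Copy src[0..n) into dst in reverse order, indexing with size_t. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    /* Not `for (i = n - 1; i >= 0; i--)`: with an unsigned i that
       condition never becomes false, and n - 1 wraps around when n is 0. */
    for (size_t i = n; i-- > 0; )
        dst[n - 1 - i] = src[i];
}
```

The test i-- > 0 checks the old value and then decrements, so the body sees n-1 down to 0 and the loop also behaves for n == 0.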
My other answer got waaaaaaay too long, so here's a short one.
Declare your variables of natural and appropriate types. Let the compiler take care of most conversions. If you have something that is or might be a size, go ahead and use size_t. (Similarly, if you have something that's involved in file sizes or offsets, use off_t.)
Try not to mix signed and unsigned types.
If you're getting warnings about possible data loss because of larger types getting downconverted to possibly smaller types, and if you can't change the types to make the warnings go away, first (a) convince yourself that the values, in practice, will never actually overflow the smaller type, then (b) add an explicit downconversion cast to make the warning go away, and for extra credit (c) add an assertion to document and enforce your assumption:
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);
In general, you're right, you should not ignore the warnings! And in general, if you can, you should shy away from explicit casts, because they can make your code less reliable, or silence warnings that are really trying to tell you something important.
Most of the time, I believe, the compiler should do the right thing for you. For example, malloc() expects a size_t, and the compiler knows from the function prototype that it does, so if you write
int size_i_need = 10;
char *buf = malloc(size_i_need);
the compiler will insert the appropriate conversion from int to size_t, as necessary. (I don't believe I've had warnings here I had to worry about, either.)
If the variables you're using are already unsigned, so much the better!
Similarly, if you were to write
fgets(buf, sizeof(buf), ifp);
the compiler will again insert an appropriate conversion. Here, I guess I see what you're getting at: a 64-bit compiler might emit a warning about the downconversion from unsigned long to int. Now that I think about it, I'm not sure why I haven't had that problem, because this is a common idiom.
(You also asked about passing unsigned long to malloc, and on a machine where size_t is smaller than long, I suppose that might get you warnings, too. Is that what you were worried about?)
If you've got a downsize that you can't avoid, and your compiler or some other tool is warning about it, and you want to get rid of the warning safely, you could use a cast and an assertion. That is, if you write
unsigned long long size_i_need = 23;
char *buf = malloc(size_i_need);
this might get a warning on a machine where size_t is 32 bits. So you could silence the warning with a cast (on the assumption that your unsigned long long values will never actually be too big), and then back up your assumption with a call to assert:
unsigned long long size_i_need = 23;
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);
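If this pattern shows up more than once, it can be wrapped up. The sketch below uses two made-up helpers, to_size and malloc_ull. SIZE_MAX comes from <stdint.h>. Keep in mind that assert is compiled out when NDEBUG is defined, so the second helper does the same check at run time for cases where the value comes from outside the program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: narrow to size_t, asserting that nothing is lost.
   The check disappears in builds that define NDEBUG. */
size_t to_size(unsigned long long n)
{
    assert(n <= SIZE_MAX);
    return (size_t)n;
}

/* Hypothetical helper: like malloc, but takes unsigned long long and
   fails cleanly (returns NULL) if the request cannot fit in a size_t. */
void *malloc_ull(unsigned long long n)
{
    if (n > SIZE_MAX)
        return NULL;
    return malloc((size_t)n);
}
```

A call then looks like char *buf = malloc(to_size(size_i_need)); or char *buf = malloc_ull(size_i_need); with the usual NULL check afterwards.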
In my experience, the biggest nuisance is printing these things out. If you write
printf("int size = %d\n", sizeof(int));
or
printf("string length = %d\n", strlen("abc"));
on a 64-bit machine, a modern compiler will typically (and correctly) warn you that "format specifies type 'int' but the argument has type 'unsigned long'", or something to that effect. You can fix this in two ways: cast the value to match the printf format, or change the printf format to match the value:
printf("int size = %d\n", (int)sizeof(int));
printf("string length = %lu\n", strlen("abc"));
In the first case, you're assuming that sizeof's result will fit in an int (which is probably a safe bet). In the second case, you're assuming that size_t is in fact unsigned long, which may be true on a 64-bit compiler but may not be true on some other. So it's actually safer to use an explicit cast in the second case, too:
printf("string length = %lu\n", (unsigned long)strlen("abc"));
The bottom line is that abstract types like size_t don't work so well with printf; this is where we can see that the C++ output style of cout << "string length = " << strlen("abc") << endl has its advantages.
To solve this problem, there are some special printf modifiers that are guaranteed to match size_t and a few other abstract types such as ptrdiff_t and intmax_t (though not off_t), although they're not so well known. (I wasn't sure where to look them up, but while I've been composing this answer, some commenters have already reminded me.) So the best way to print one of these things (if you can remember, and unless you're using old compilers) would be
printf("string length = %zu\n", strlen("abc"));
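For reference, a small sketch of the C99 length modifiers side by side: z for size_t, t for ptrdiff_t and j for intmax_t. As far as I know there is no dedicated modifier for off_t, so the usual approach there is to cast to intmax_t and print with %jd. The function format_sizes and its values are just for illustration; it uses snprintf so the result can be inspected.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a size_t, a ptrdiff_t and an intmax_t into buf.
   Returns what snprintf returns: the length of the formatted text. */
int format_sizes(char *buf, size_t cap)
{
    const char *s = "abc";
    size_t    len  = strlen(s);        /* size_t    -> %zu */
    ptrdiff_t diff = (s + len) - s;    /* ptrdiff_t -> %td */
    intmax_t  big  = -5;               /* intmax_t  -> %jd (e.g. a cast off_t) */

    return snprintf(buf, cap, "%zu %td %jd", len, diff, big);
}
```

Called with a 32-byte buffer, this writes "3 3 -5" and returns 6.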
Bottom line:
You obviously don't have to worry about passing plain int or plain unsigned to a function like calloc that expects size_t.
When calling something that might result in a downcast, such as passing a size_t to fgets where size_t is 64 bits but int is 32, or passing unsigned long long to calloc where size_t is only 32 bits, you might get warnings. If you can't make the passed-in types smaller (which in the general case you're not going to be able to do), you'll have little choice but to insert a cast to silence the warnings. In this case, to be strictly correct, you might want to add some assertions.
With all of that said, I'm not sure I've actually answered your question, so if you'd like further clarification, please ask.

Why do we need to cast what malloc returns?

int length = strlen(src);
char *structSpace = malloc(sizeof(String) + length + 1);
String *string = (String*) structSpace;
int *string = (int*) structSpace;
*I created a struct called String
You don't. In C, void* converts implicitly to whatever object pointer type you need. See also the C FAQ on why you would want to explicitly avoid casting malloc's return in C. @Sinan's answer further illustrates why this has been followed inconsistently.
Because malloc returns a pointer to void, i.e., it is simply allocating chunks of memory with no regard as to the data that will be stored there. In C++ your returned void* will not be implicitly cast to the pointer of your type. In your example, you have not cast what malloc has returned. Malloc returned a void* which was implicitly cast to a char*, but on the next line you... ok, it doesn't make much sense anymore.
The C FAQ list is an invaluable resource: Why does some code carefully cast the values returned by malloc to the pointer type being allocated?.
This is one of the few issues that make the statement "C++ is a superset of C" not completely true. In C, a void pointer can be implicitly converted to any other type of object pointer. However, C++ is a bit more strict with type safety, so you need to explicitly cast the return value of malloc to the appropriate type. Usually, this isn't much of an issue, because C++ code tends to use new, which doesn't require a cast, rather than malloc.
In C, casting the result from malloc is unnecessary and should not be done. Doing so can, for example, cover up the error of having failed to #include <stdlib.h>, so you don't have a prototype for malloc in scope. This, in turn, can lead to other errors and lack of portability (though the worst offenders in that respect are now mostly obsolete).
In C++, you must cast the result of malloc to assign it to a pointer to any type other than void. Unless you really need to write code that can be compiled as either C or C++, however, you should generally avoid using malloc in C++ at all and allocate memory using new.
You tend to see this kind of C code from novices (or C++ coders :-) ):
int main() {
int len = 40;
char *my_string = (char *) malloc(sizeof(char)*len);
return 0;
}
This cast is unnecessary and evil. The pointer returned by malloc converts implicitly, so just drop the cast and make sure you include stdlib.h so that malloc is properly declared:
#include <stdlib.h>
int main() {
int len = 40;
char *my_string = malloc(sizeof(char)*len);
return 0;
}
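Applied to the code in the question, a cast-free version might look like the sketch below. The question does not show its String struct, so the definition here (a length plus a flexible array member) and the function name string_new are assumptions made purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout; the question's real String struct is not shown. */
typedef struct {
    size_t length;
    char   data[];      /* flexible array member (C99) */
} String;

/* Hypothetical helper: allocate a String holding a copy of src. */
String *string_new(const char *src)
{
    size_t length = strlen(src);                  /* size_t, not int */
    String *s = malloc(sizeof *s + length + 1);   /* no cast needed in C */
    if (s == NULL)
        return NULL;
    s->length = length;
    memcpy(s->data, src, length + 1);             /* includes the '\0' */
    return s;
}
```

The result of malloc goes straight into a String *, so the intermediate char * and both casts from the question disappear. The caller releases it with a single free.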
You should strongly consider casting after using the malloc command because it provides for greater portability and greater compatibility with other parts of your program. If you do not do so, you may run the risk of incompatible data types which could result in errors.

Resources