Valgrind C: Argument of function has a fishy (possibly negative) value

I'm getting this error message in multiple places in my code where I call malloc or realloc. Here is one example.
void* reallocate_array(void* ptr, size_t size)
{
return realloc(ptr,size);
}
EDIT2: Looks like the problem is in the test case. I can't modify this
char* reallocated = (char*) reallocate_array(allocated,-1);
Here is my solution which got rid of the fishy value error
if((int)size < 0)
{
return NULL;
}
I was under the impression that size_t was an unsigned integer, meaning it could never be negative. Could this be a bug in Valgrind, or is it warning me of a possible wraparound?
EDIT: Valgrind output
==20841== 1 errors in context 1 of 3:
==20841== Argument 'size' of function realloc has a fishy (possibly negative) value: -1
==20841== at 0x4C2BB78: realloc (vg_replace_malloc.c:785)
==20841== by 0x4057B1: reallocate_array (allocation.c:24)
==20841== by 0x402A8A: reallocate_NegativeBytes_Test::TestBody() (tests.cpp:56)

Props to Valgrind: it is quite right that passing a negative actual argument to a parameter of unsigned type is fishy. The result in your particular case will be that the argument is converted to the largest representable value of type size_t, but that may very well be different from what was intended.
I suspect that the conversion to a large, positive unsigned value is indeed different from what was intended by your test case. Inasmuch as the test case expects the memory allocation to fail, the case probably was passing, but not for the reason I suspect its author anticipated. At minimum, it is a bad test case on account of being unclear about what it is intended to test.
As for your solution, it is fishy, too. The standard has this to say about your conversion of (size_t) -1 to type int:
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
(C2011 6.3.1.3/3)
Implementation-defined behavior and the possibility of a signal is not a comfortable place to hang your hat.
If you insist on validating the value inside the function, then you might consider this test:
if (size & ~(SIZE_MAX >> 1)) {
// ...
}
That tests whether the most-significant bit of size is set, which it will be if the value was converted from any negative number of a type no wider than size_t.
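To make the high-bit test concrete, here is a minimal sketch. The names size_is_fishy, reallocate_array_checked and check_fishy_examples are made up for illustration and are not part of the original code; the sketch only assumes that refusing the request by returning NULL is acceptable to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True when the most-significant bit of size is set, i.e. size > SIZE_MAX / 2.
   Such a value may have been converted from a negative number. */
static bool size_is_fishy(size_t size)
{
    return (size & ~(SIZE_MAX >> 1)) != 0;
}

/* Hypothetical checked variant of reallocate_array from the question. */
static void *reallocate_array_checked(void *ptr, size_t size)
{
    if (size_is_fishy(size)) {
        return NULL;            /* refuse the request; ptr is left untouched */
    }
    return realloc(ptr, size);
}

/* Small usage check. */
static void check_fishy_examples(void)
{
    assert(size_is_fishy((size_t)-1));   /* -1 converts to SIZE_MAX */
    assert(!size_is_fishy(16));
}
```

Unlike the (int) cast, this stays entirely within unsigned arithmetic, so nothing here is implementation-defined.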
Myself, however, I would try to get the test case changed or dropped. Use Valgrind's complaints about it to support your argument, if you wish.

Related

Assign result of sizeof() to ssize_t

It happened to me that I needed to compare the result of sizeof(x) to a ssize_t.
Of course GCC gave an error (lucky me (I used -Wall -Wextra -Werror)), and I decided to do a macro to have a signed version of sizeof().
#define ssizeof (ssize_t)sizeof
And then I can use it like this:
for (ssize_t i = 0; i < ssizeof(x); i++)
The problem is, do I have any guarantees that SSIZE_MAX >= SIZE_MAX? I imagine that sadly this is never going to be true.
Or at least that sizeof(ssize_t) == sizeof(size_t), which would cut half of the values but would still be close enough.
I didn't find any relation between ssize_t and size_t in the POSIX documentation.
Related question:
What type should be used to loop through an array?

There is no guarantee that SSIZE_MAX >= SIZE_MAX. In fact, it is very unlikely to be the case, since size_t and ssize_t are likely to be corresponding unsigned and signed types, so (on all actual architectures) SIZE_MAX > SSIZE_MAX. Converting an unsigned value to a signed type which cannot hold that value yields an implementation-defined result or raises an implementation-defined signal (C11 §6.3.1.3/3), so it is not something portable code can rely on. So technically, your macro is problematic.
In practice, at least on 64-bit platforms, you're unlikely to get into trouble if the value you are converting to ssize_t is the size of an object which actually exists. But if the object is theoretical (eg sizeof(char[3][1ULL<<62])), you might get an unpleasant surprise.
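If you do want a signed copy of a size, one option is to check the range before converting. The following is only a sketch: size_to_ssize and check_size_to_ssize are hypothetical names, and it assumes a POSIX system where <limits.h> provides SSIZE_MAX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Stores n in *out and returns true if n fits in ssize_t;
   otherwise returns false and leaves *out unchanged. */
static bool size_to_ssize(size_t n, ssize_t *out)
{
    if (n > (size_t)SSIZE_MAX) {
        return false;
    }
    *out = (ssize_t)n;
    return true;
}

/* Small usage check with the size of a real object. */
static void check_size_to_ssize(void)
{
    int x[4];
    ssize_t n = 0;
    assert(size_to_ssize(sizeof x, &n));
    assert(n == (ssize_t)sizeof x);
}
```

Because the comparison happens in size_t, the out-of-range conversion is never performed.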
Note that the only valid negative value of type ssize_t is -1, which is an error indication. You might be confusing ssize_t, which is defined by Posix, with ptrdiff_t, which is defined in standard C since C99. These two types are the same on most platforms, and are usually the signed integer type corresponding to size_t, but none of those behaviours is guaranteed by either standard. However, the semantics of the two types are different, and you should be aware of that when you use them:
ssize_t is returned by a number of Posix interfaces in order to allow the function to signal either a number of bytes processed or an error indication; the error indication must be -1. There is no expectation that any possible size will fit into ssize_t; the Posix rationale states that:
A conforming application would be constrained not to perform I/O in pieces larger than {SSIZE_MAX}.
This is not a problem for most of the interfaces which return ssize_t because Posix generally does not require interfaces to guarantee to process all data. For example, both read and write accept a size_t which describes the length of the buffer to be read/written and return an ssize_t which describes the number of bytes actually read/written; the implication is that no more than SSIZE_MAX bytes will be read/written even if more data were available. However, the Posix rationale also notes that a particular implementation may provide an extension which allows larger blocks to be processed ("a conforming application using extensions would be able to use the full range if the implementation provided an extended range"), the idea being that the implementation could, for example, specify that return values other than -1 were to be interpreted by casting them to size_t. Such an extension would not be portable; in practice, most implementations do limit the number of bytes which can be processed in a single call to the number which can be reported in ssize_t.
ptrdiff_t is (in standard C) the type of the result of the difference between two pointers. In order for subtraction of pointers to be well defined, the two pointers must refer to the same object, either by pointing into the object or by pointing at the byte immediately following the object. The C committee recognised that if ptrdiff_t is the signed equivalent of size_t, then it is possible that the difference between two pointers might not be representable, leading to undefined behaviour, but they preferred that to requiring that ptrdiff_t be a larger type than size_t. You can argue with this decision -- many people have -- but it has been in place since C90 and it seems unlikely that it will change now. (Current standard wording, §6.5.6/9: "If the result is not representable in an object of that type [ptrdiff_t], the behavior is undefined.")
As with Posix, the C standard does not define undefined behaviour, so it would be a mistake to interpret that as forbidding the subtraction of two pointers in very large objects. An implementation is always allowed to define the result of behaviour left undefined by the standard, so that it is completely valid for an implementation to specify that if P and Q are two pointers to the same object where P >= Q, then (size_t)(P - Q) is the mathematically correct difference between the pointers even if the subtraction overflows. Of course, code which depends on such an extension won't be fully portable, but if the extension is sufficiently common that might not be a problem.
As a final point, the ambiguity of using -1 both as an error indication (in ssize_t) and as a possibly castable result of pointer subtraction (in ptrdiff_t) is not likely to arise in practice provided that size_t is as large as a pointer. If size_t is as large as a pointer, the only way that the mathematically correct value of P-Q could be (size_t)(-1) (aka SIZE_MAX) is if the object that P and Q refer to is of size SIZE_MAX, which, given the assumption that size_t is the same width as a pointer, implies that the object plus the following byte occupy every possible pointer value. That contradicts the requirement that some pointer value (NULL) be distinct from any valid address, so we can conclude that the true maximum size of an object must be less than SIZE_MAX.

Please note that you can't actually do this.
The largest possible object in x86 Linux is just below 0xB0000000 in size, while SSIZE_MAX is 0x7FFFFFFF.
I haven't checked whether read and friends can actually handle the largest possible objects, but if they can, it would work like this:
ssize_t result = read(fd, buf, count);
if (result != -1) {
size_t offset = (size_t) result;
/* handle success */
} else {
/* handle failure */
}
You may find libc is busted. If so, this would work if the kernel is good:
ssize_t result = sys_read(fd, buf, count);
if (result >= 0 || result < -256) {
size_t offset = (size_t) result;
/* handle success */
} else {
errno = (int)-result;
/* handle failure */
}

ssize_t is a POSIX type, it's not defined as part of the C standard. POSIX defines that ssize_t must be able to handle numbers in the interval [-1, SSIZE_MAX], so in principle it doesn't even need to be a normal signed type. The reason for this slightly weird definition is that the only place ssize_t is used is as the return value for read/write/etc. functions.
In practice it's always a normal signed type of the same size as size_t. But if you want to be really pedantic about your types, you shouldn't use it for other purposes than handling return values for IO syscalls. For a general "pointer-sized" signed integer type C89 defines ptrdiff_t. Which in practice will be the same as ssize_t.
Also, if you look at the official spec for read(), you'll see that for the 'nbyte' argument it says that 'If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.'. So even if a size_t is capable of representing larger values than SSIZE_MAX, it's implementation-defined behavior to use larger values than that for the IO syscalls (the only places where ssize_t is used, as mentioned). And similar for write() etc.
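One way to stay clear of that implementation-defined territory is to cap the request before calling read(). This is a sketch only; read_clamped is a hypothetical helper name, and the caller still has to loop if it wants more bytes than one call returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <sys/types.h>  /* ssize_t (POSIX) */
#include <unistd.h>     /* read */

/* Like read(), but never asks for more than SSIZE_MAX bytes,
   so the byte count can always be reported in the ssize_t result. */
static ssize_t read_clamped(int fd, void *buf, size_t count)
{
    assert(fd >= 0);
    if (count > (size_t)SSIZE_MAX) {
        count = (size_t)SSIZE_MAX;
    }
    return read(fd, buf, count);   /* -1 still means error, with errno set */
}
```

The return value keeps the usual meaning: -1 for an error, otherwise the number of bytes actually read, which may be less than requested.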

I'm gonna take this on as an X-Y problem. The issue you have is that you want to compare a signed number to an unsigned number. Rather than casting the result of sizeof to ssize_t, you should check if your ssize_t value is less than zero. If it is, then you know it is less than your size_t value. If not, then you can cast it to size_t and then do a comparison.
For an example, here's a compare function that returns -1 if the signed number is less than the unsigned number, 0 if equal, or 1 if the signed number is greater than the unsigned number:
int compare(ssize_t signed_number, size_t unsigned_number) {
int ret;
if (signed_number < 0 || (size_t) signed_number < unsigned_number) {
ret = -1;
}
else {
ret = (size_t) signed_number > unsigned_number;
}
return ret;
}
If all you wanted was the equivalent of < operation, you can go a bit simpler with something like this:
(signed_number < 0 || (size_t) signed_number < unsigned_number)
That line will give you 1 if signed_number is less than unsigned_number and it limits the branching overhead. Just takes an extra < operation and a logical-OR.

Using + to check if multiple pointers are all NULL

Syntactically it makes sense (Although it looks like some other language, which I don't particularly enjoy), it can save a lot of typing and code space, but how bad is it?
if(p1 + (unsigned)p2 + (unsigned)p3 == NULL)
{
// all pointers are NULL, exit
}
Using pointer arithmetic with a pointer rvalue, I don't see how it could give a false result (the entire expression to evaluate to NULL even though not all pointers are NULL), but I don't exactly know how much evilness this potentially hides, so is it bad to do this, not-common way of checking if plenty of pointers are all NULL?

Regarding the original version of the question, which omitted the casts ...
it can save a lot of typing and code space, but how bad is it?
Very, very bad. Its behavior is altogether undefined, and if your compiler fails to reject it then you should get yourself a better one. Subtraction of one pointer from another is defined under some circumstances (and yields an integer result), but it is never meaningful to add two pointers.
Inasmuch as it shouldn't even compile, every keystroke used to type it instead of something that works is wasted, so no, it doesn't save typing or code space.
I don't see how it could give a false result.
If the compiler actually accepts it, the result can be anything at all. It is undefined.
so is it bad to do this, not-common way of checking if plenty of pointers are all NULL?
Yes.
Regarding the modified question in which all but one of the pointers are cast to integer:
The casts do not rescue the code -- multiple problems remain.
If the remaining pointer does not point to a valid object, or if the sum of the integers is negative or greater than the number of elements in the array to which the pointer points then the result of the pointer addition is still undefined (where a pointer to a scalar is treated as a pointer to a one-element array). Of course, the integer sum can't be negative in this particular case, but that's of minimal advantage.
C does not guarantee that casting a null pointer to an integer yields the value 0. It is common for it to do so, but the language does not require it.
C does not guarantee that non-null pointers convert to nonzero integers, and with your particular code that's a genuine risk. The type unsigned is not necessarily large enough to afford a distinct value to every distinct pointer.
Even if all of the foregoing were not a problem for some particular implementation -- that is, if you could safely perform arithmetic on a NULL pointer, and NULL pointers reliably converted to integers as zero, and non-NULL pointers reliably converted to nonzero -- the test could still go wrong because two nonzero unsigned integers can sum to zero. That happens where the arithmetic sum of the two is equal to UINT_MAX + 1.
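For comparison, the plain test is short and fully defined, and the wraparound mentioned in the last point is easy to reproduce. A small sketch; all_null, unsigned_sum_is_zero and check_all_null are names invented here for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Well-defined check: true only if every pointer is null. */
static bool all_null(const void *p1, const void *p2, const void *p3)
{
    return p1 == NULL && p2 == NULL && p3 == NULL;
}

/* Shows why summing is unreliable: unsigned arithmetic wraps
   modulo UINT_MAX + 1, so two nonzero values can add up to zero. */
static bool unsigned_sum_is_zero(unsigned a, unsigned b)
{
    return a + b == 0u;
}

/* Small usage check. */
static void check_all_null(void)
{
    int x = 0;
    assert(all_null(NULL, NULL, NULL));
    assert(!all_null(NULL, &x, NULL));
    assert(unsigned_sum_is_zero(1u, UINT_MAX));   /* 1 + UINT_MAX wraps to 0 */
}
```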

There are multiple reasons why this is not a reliable method.
First, when you add an integer to a pointer, the C standard does not say what happens if the result is outside of the array into which the pointer points. (For these purposes, pointing just one past the last element, the end of the array, counts as inside, not outside. Also, a pointer to a single object counts as an array of one object.) Note that the C standard does not just not say what the result of the addition is; it does not say what the behavior of the entire program is. So, once you execute an addition that goes outside of an array, you cannot predict (from the C standard) what your program will do at all.
One likely result is that the compiler will see pointer + integer + integer and reason (or, more technically, apply transformations as if this reasoning were used) that pointer + integer is valid only if pointer is not NULL, and then the result is never NULL, so the expression pointer + integer is never NULL. Similarly, pointer + integer + integer is never NULL. Therefore pointer + integer + integer == NULL is always false, and we can optimize the program by removing this code completely. Thus, the code to handle the case when all pointers are NULL will be silently removed from your program.
Second, even if the C standard did guarantee a result of the addition, this expression could, hypothetically, evaluate to NULL even if none of the pointers were NULL. For example, consider a 16-bit address space where the first pointer were represented with the address 0x7000, the second were 0x6000, and the third were 0x3000. (I will also suppose these are char * pointers, so one element is one byte.) If we add these, the mathematical result is 0x10000. In 16-bit arithmetic, that wraps, so the computed result is 0x0000. Thus, the expression could evaluate to zero, which is likely used for NULL.
Third, unsigned may be narrower than pointers (for example, it may be 32 bits while pointers are 64), so the cast may lose information—there may be non-zero bits in the bits that were lost during the conversion, so the test will fail to detect them.
There are situations where we want to optimize pointer tests, and there are legitimate but non-standard ways to do it. On some processors, branching can be expensive, so doing some arithmetic with one test and one branch may be faster than doing three tests and three branches. C provides an integer type intended for working with pointer representations: uintptr_t, declared in <stdint.h>. With that, we can write this code:
if (((uintptr_t) p1 | (uintptr_t) p2 | (uintptr_t) p3) == 0) …
What this does is convert each pointer to an unsigned integer of a width suitable for working with pointer representations. The C standard does not say what the result of this conversion is, but it is intended to be unsurprising, and C implementations for flat address spaces may document that the result is the memory address. They may also document that NULL is the zero address. Once we have these integers, we OR them together instead of adding them. The result of an OR has a bit set if either of the corresponding bits in its operands was set. Thus, if any one of the addresses is not zero, then the result will not be zero either. So this code, if executed in a suitable C implementation, will perform the test you desire.
(I have used such tests in special high-performance code to test whether all pointers were aligned as desired, rather than to test for NULL. In that case, I had direct access to the compiler developers and could ensure the compiler would behave as desired. This is not standard C code.)
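A sketch of what such an alignment test might look like, using the same OR trick. It is not the code referred to above: all_aligned is a hypothetical name, and the sketch assumes an implementation with a flat address space where converting a pointer to uintptr_t yields its address (the C standard does not promise this, and uintptr_t itself is optional).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if all three pointers are multiples of alignment.
   alignment must be a nonzero power of two. */
static bool all_aligned(const void *p1, const void *p2, const void *p3,
                        uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* A low bit set in any address survives the OR. */
    uintptr_t bits = (uintptr_t)p1 | (uintptr_t)p2 | (uintptr_t)p3;
    return (bits & (alignment - 1)) == 0;
}
```

One mask and one comparison replace three separate tests, which is the point of the technique.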

Using any sort of pointer arithmetic on non-array pointers is undefined behavior in C.

Using bit operations to "turn off" binary digits of a pointer

I was able to use bit operations to "turn off" binary digits of a number.
Ex:
x = x & ~(1<<0)
x = x & ~(1<<1)
(and repeat until desired number of digits starting from the right are changed to 0)
I would like to apply this technique to a pointer's address.
Unfortunately, the & operator cannot be used with pointers. Using the same lines of code as above, where x is a pointer, the compiler says "invalid operands to binary & (have int and int)."
I tried to typecast the pointers as ints, but that doesn't work as I assume the ints are too small (and I just realized I'm not allowed to cast).
(note: though this is part of a homework problem, I've already reasoned out why I need to turn off some digits after a good couple hours, so I'm fine in that regard. I'm simply trying to see if I can get a clever technique to do what I want to do here).
Restrictions: I cannot use loops, conditionals, any special functions, constants greater than 255, division, mod.
(edit: added restrictions to the bottom)

Use uintptr_t from <stdint.h>. You should always use unsigned types for bit twiddling, and (u)intptr_t is specifically chosen to be able to hold a pointer's value.
Note however that adjusting a pointer manually and dereferencing it is undefined behaviour, so watch your step. You must be able to recover the exact original value of the pointer (or another valid pointer) before doing so.
Edit : from your comment I understand that you don't plan on dereferencing the twiddled pointer at all, so no undefined behaviour for you. Here is how you can check if your pointers share the same 64-byte block :
uintptr_t p1 = (uintptr_t)yourPointer1;
uintptr_t p2 = (uintptr_t)yourPointer2;
uintptr_t mask = ~(uintptr_t)63u; // Shave off 6 low-order bits
return (p1 & mask) == (p2 & mask);
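Wrapped into a complete function, the fragment above might look like the sketch below. The names same_64_byte_block and check_same_block are invented here, and the sketch assumes that the uintptr_t value of a pointer is its address, which is typical for flat address spaces but not guaranteed by the standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a and b lie in the same 64-byte-aligned block. */
static bool same_64_byte_block(const void *a, const void *b)
{
    uintptr_t mask = ~(uintptr_t)63u;   /* clears the 6 low-order bits */
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}

/* Small usage check on a buffer that starts on a 64-byte boundary. */
static void check_same_block(void)
{
    static _Alignas(64) unsigned char buf[128];
    assert(same_64_byte_block(buf, buf + 63));        /* bytes 0 and 63 */
    assert(!same_64_byte_block(buf + 63, buf + 64));  /* block boundary */
}
```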

The C standard library includes the (optional) type intptr_t, for which there is a guarantee that "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Of course, if you perform bitwise operations on the integer, convert the result back to a pointer and dereference it, the behaviour is undefined.

Edit:
How unfortunate haha. I need a function to show two pointers are in
the same 64-byte block of memory. This holds true so long as every
digit but the least significant 6 digits of their binary
representations are equal. By making sure the last 6 digits are all
the same (ex: 0), I can return true if both pointers are equal. Well,
at least I hope so.
You should be able to check if they're in the same 64-byte block of memory by something like this:
if ((char *)high_pointer - (char *)low_pointer < 64) {
// do stuff
}
Edit2: This is likely to be undefined behaviour as pointed out by chris.
Original post:
You're probably looking for intptr_t or uintptr_t. The standard says you can cast to and from these types to pointers and have the value equal to the original.
However, despite it being a standard type, it is optional so some library implementations may choose not to implement it. Some architectures might not even represent pointers as integers so such a type wouldn't make sense.
It is still better than casting to and from an int or a long since it is guaranteed to work on implementations that supply it. Otherwise, at least you'll know at compile time that your program will break on a certain implementation/architecture.
(Oh, and as other answers have stated, manually changing the pointer when casted to an integer type and dereferencing it is undefined behaviour)

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read it on some page on SO that they are different,but how could the two be possibly different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In first case,we are initializing the integer i to 999,then initializing c with i,which is in fact 999.In the second case, we initialize c directly with 999.The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says --It's not the same thing. The first is an assignment, the second is an initialization
So isn't it a lot more than only a question of optimization by the compiler?

They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i is created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
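The unsigned case can be checked directly. The sketch below uses unsigned char rather than plain char so that the result does not depend on the signedness of char, and it uses explicit casts only to keep compilers quiet; to_unsigned_char and check_narrowing are names invented for this example.

```c
#include <assert.h>
#include <limits.h>

/* Conversion to an unsigned type is well defined:
   the value is reduced modulo UCHAR_MAX + 1. */
static unsigned char to_unsigned_char(int value)
{
    return (unsigned char)value;
}

/* A variable and a constant go through the same conversion. */
static void check_narrowing(void)
{
    int i = 999;
    unsigned char from_variable = (unsigned char)i;
    unsigned char from_constant = (unsigned char)999;
    assert(from_variable == from_constant);
#if CHAR_BIT == 8
    assert(to_unsigned_char(999) == 231);   /* 999 % 256 */
#endif
}
```

For a signed char the result is implementation-defined, so no fixed value is asserted for it here.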
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.

Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)

It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.

Aside from the fact that the first also defines an object i of type int, the semantics are identical.

i,which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.

What is "-1L" / "1L" in C?

What do "-1L", "1L" etc. mean in C ?
For example, in ftell reference, it says
... If an error occurs, -1L is returned ...
What does this mean ? What is the type of "1L" ?
Why not return NULL, if error occurs ?

The L specifies that the number is a long type, so -1L is a long set to negative one, and 1L is a long set to positive one.
As for why ftell doesn't just return NULL, it's because NULL is used for pointers, and here a long is returned. Note that 0 isn't used because 0 is a valid value for ftell to return.
Catching this situation involves checking for a non-negative value:
long size;
FILE *pFile;
...
size = ftell(pFile);
if(size > -1L){
// size is a valid value
}else{
// error occurred
}
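A slightly fuller sketch of the same check, for the common case of finding out how large a file is. stream_size is a hypothetical helper, not a standard function; it assumes a seekable stream, and note that the C standard does not require binary streams to support SEEK_END, although POSIX systems do.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the size of the stream in bytes, or -1L on error.
   The current file position is restored before returning. */
static long stream_size(FILE *fp)
{
    assert(fp != NULL);
    long saved = ftell(fp);
    if (saved == -1L) {
        return -1L;                      /* ftell failed */
    }
    if (fseek(fp, 0L, SEEK_END) != 0) {
        return -1L;
    }
    long size = ftell(fp);               /* may itself be -1L */
    if (fseek(fp, saved, SEEK_SET) != 0) {
        return -1L;
    }
    return size;
}
```

A caller tests the result the same way as above: any non-negative value is a valid size, and -1L means something went wrong.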

ftell() returns type long int; the L suffix applied to a literal forces its type to long rather than plain int.
NULL would be wholly incorrect because it is a macro representing a pointer, not an integer. Its value, when interpreted as an integer, may represent a valid file position, while -1 (or any negative value) cannot.
For all intents and purposes you can generally regard the error return as -1; the L suffix is not critical to correct operation in most cases because of the implicit conversion rules.

It means to return the value as a long, not an int.

That means -1 as a long (rather than the default type for numbers, which is an integer)

-1 written as a long int is -1L. Why not simply NULL? Because NULL is a normal result for this function, so it cannot also signal an error. Why is it a normal result? Because NULL == 0, and ftell returns the position in a stream: at the start of the stream it returns 0, which is a normal result, not an error. If you compared the return value to NULL to check for errors, you would get a false error whenever you are at the start of the stream.

Editing today implies more details are still wanted.
Mark has it right. The "L" suffix is long. -1L is thus a long -1.
My favored way to test is different from Mark's and is a matter of preference, not goodness.
if ( err >= 0L )
success
else
error
By general habit I do not like looking for explicit -1. If a -2 ever pops up in the future my code will likely not break.
Ever since I started using C, way back in the beginning of C, I noticed most library routines returning int values return 0 for success and -1 on error. Most.
NULL is not normally returned by integer functions as NULL is a pointer value. Besides the clash of types a huge reason for not returning NULL depends on a bit of history.
Things were not clean back when C was being invented, and maybe not even on small systems today. The original K&R C did not guarantee NULL would be zero as is usually the case on CPUs with virtual memory. On small "real memory" systems zero may be a valid address making it necessary for "invalid" addresses to be moved to some other OS dependent location. Such would really be accepted by the CPU, just not generated in the normal scheme of things. Perhaps a very high memory address. I can even see a hidden array called extern const long NULL[1]; allowing NULL to become the address of this otherwise unused array.
Back then you saw a lot of if ( ptr != NULL ) statements rather than if ( ptr ) for people serious about writing portable code.
