size_t used as a value in a formula - C

Here is a short snippet of a function reading lines.
How is it possible that it compares bufsize with ((size_t)-1)/2?
I thought comparing a variable to a type name such as int was simply impossible; comparing to INT_MAX, on the other hand, would be correct, I think.
So how can this code actually work and produce no errors?
int c;
size_t bufsize = 0;
size_t size = 0;
while ((c = fgetc(infile)) != EOF) {
    if (size >= bufsize) {
        if (bufsize == 0)
            bufsize = 2;
        else if (bufsize <= ((size_t)-1)/2)
            bufsize = 2*size;
        else {
            free(line);
            exit(3);
        }
        newbuf = realloc(line, bufsize);
        if (!newbuf) {
            free(line);
            abort();
        }
        line = newbuf;
    }
    /* some other operations */
}

(size_t)-1
This is casting the value -1 to a size_t. (type)value is a cast in C.
Since size_t is an unsigned type, this is actually the maximum value that size_t can hold, so it's used to make sure that the buffer size can actually be safely doubled (hence the subsequent division by two).

The code uses a well-known idiom for finding the maximum size_t value. In C, converting -1 to an unsigned type is fully defined: the result is reduced modulo one more than the type's maximum value, so (size_t)-1 always yields the largest value a size_t can hold, whatever the register width or signed-number representation.
Conceptually, you can picture it as filling a register with 1 bits and then truncating it to the width of size_t: any unused 1 bits are discarded, and you are left with the all-ones pattern, the largest unsigned number that fits in size_t.
After you have that, the code divides by two to get half of that number, and does the comparison to see whether it is safe to double the size without going over the maximum size_t. By then, it is dividing a size_t and comparing two size_t values (a type-safe operation).
If you really wanted to remove this bit-wizardry (OK, it's not the worst example of bit wizardry I've seen), consider that the following snippet
else if (bufsize <= ((size_t)-1)/2)
    bufsize = 2*size;
could be replaced with
else if (bufsize <= SIZE_MAX/2)
    bufsize = 2*size;
which is type safe without casting and more readable (SIZE_MAX comes from stdint.h).

(size_t)-1 casts -1 to the type size_t, which results in SIZE_MAX (a macro defined in stdint.h), the maximum value that the size_t type can hold.
So the comparison is checking whether bufsize is less than or equal to one half of the maximum value that can be contained in a size_t.

size_t isn't being interpreted as a value, it's being used to cast the value of negative one to the type size_t.
((size_t)-1)/2
is casting -1 to a size_t and then dividing by 2.

The size_t in ((size_t)-1)/2) is simply being used as a cast: casting -1 to size_t.
The trick here is that size_t is unsigned, so the cast (size_t)-1 yields the maximum value of size_t, i.e. SIZE_MAX. This is useful in the context of the loop. However, I'd prefer to see SIZE_MAX used directly rather than this trick.

Related

What are the ramifications of returning the value -1 as a size_t return value in C?

I am reading a textbook and one of the examples does this. Below, I've reproduced the example in abbreviated form:
#include <stdio.h>
#define SIZE 100

size_t linearSearch(const int array[], int searchVal, size_t size);

int main(void)
{
    int myArray[SIZE];
    int mySearchVal;
    size_t returnValue;

    // populate array with data & prompt user for the search value
    // call linear search function
    returnValue = linearSearch(myArray, mySearchVal, SIZE);

    if (returnValue != -1)
        puts("Value Found");
    else
        puts("Value Not Found");
}

size_t linearSearch(const int array[], int key, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (key == array[i])
            return i;
    }
    return -1;
}
Are there any potential problems with this? I know size_t is defined as an unsigned integral type so it seems as if this might be asking for trouble at some point if I'm returning -1 as a size_t return value.
There are a few APIs that come to mind which use the maximum signed or unsigned integer value as a sentinel value. For example, C++'s std::string::find() method returns std::string::npos if the value given to find() could not be found within the string, and std::string::npos is equal to (std::string::size_type)-1.
Similarly, on iOS and OS X, NSArray's indexOfObject: method returns NSNotFound when the object cannot be found in the array. Surprisingly, NSNotFound is actually defined as NSIntegerMax, which is either INT_MAX for 32-bit platforms or LONG_MAX for 64-bit platforms, even though NSArray indexes are typically NSUInteger (which is either unsigned int for 32-bit platforms or unsigned long for 64-bit platforms).
It does mean that there will be no distinction between “not found” and “element number 18,446,744,073,709,551,615” (for 64-bit systems), but whether that is an acceptable trade off is up to you.
An alternative is to have the function return the index through a pointer argument and have the function's return value indicate success or failure, e.g.
#include <stdbool.h>

bool linearSearch(const int array[], int val, size_t size, size_t *index)
{
    // find value and then
    if (found)
    {
        *index = indexOfFoundItem;
        return true;
    }
    else
    {
        *index = 0; // optional; in some cases, better to leave *index untouched
        return false;
    }
}
Your compiler may decide to complain about comparing signed with unsigned (GCC or Clang will if provoked*), but otherwise "it works". On two's-complement machines (most machines these days), (size_t)-1 is the same as SIZE_MAX; indeed, as discussed in extenso in the comments, it is the same even on one's-complement or sign-magnitude machines, because of the wording in §6.3.1.3 of the C99 and C11 standards.
Using (size_t)-1 to indicate 'not found' means that you can't distinguish between the last entry in the biggest possible array and 'not found', but that's seldom an actual problem.
So, it's just the one edge case where I could end up having a problem?
The array would have to be an array of char, though, to be big enough to cause trouble — and while you could have 4 GiB memory with a 32-bit machine, it's pretty implausible to have all that memory committed to a character array (and it's very much less likely to be an issue with 64-bit machines; most don't run to 16 exbibytes of memory). So it isn't a practical edge case.
In POSIX, there is a ssize_t type, a signed type the same size as size_t. You could consider using that instead of size_t. However, in my experience it causes the same angst that (size_t)-1 causes. Plus, on a 32-bit machine you could have a 3 GiB chunk of memory treated as an array of char, but with ssize_t as a return type you couldn't usefully index more than 2 GiB, or you'd need to use SSIZE_MIN (if it existed; I'm not sure it does) instead of -1 as the signal value.
* GCC or Clang has to be provoked fairly hard. Simply using -Wall is not sufficient; it takes -Wextra (or the specific -Wsign-compare option) to trigger a warning. Since I routinely compile with -Wextra, I'm aware of the issue; not everyone is as vigilant.
Comparing signed and unsigned quantities is fully defined by the standard, but can lead to counter-intuitive results (because small negative numbers appear very large when converted to unsigned values), which is why the compilers complain if requested to do so.
Normally, if you want to return negative values and still have some notion of a size type, you use ssize_t. GCC and Clang both complain, but the following compiles. Note that some of the following is undefined behavior...
#include <stdio.h>
#include <stdint.h>

size_t foo() {
    return -1;
}

void print_bin(uint64_t num, size_t bytes);
void print_bin(uint64_t num, size_t bytes) {
    int i = 0;
    for (i = bytes * 8; i > 0; i--) {
        (i % 8 == 0) ? printf("|") : 1;
        (num & 1) ? printf("1") : printf("0");
        num >>= 1;
    }
    printf("\n");
}

int main(void){
    long int x = 0;
    printf("%zu\n", foo());
    printf("%ld\n", foo());
    printf("%zu\n", ~(x & 0));
    printf("%ld\n", ~(x & 0));
    print_bin((~(x & 0)), 8);
}
The output is
18446744073709551615
-1
18446744073709551615
-1
|11111111|11111111|11111111|11111111|11111111|11111111|11111111|11111111
I'm on a 64-bit machine. The following binary pattern
|11111111|11111111|11111111|11111111|11111111|11111111|11111111|11111111
can mean -1 or 18446744073709551615; it depends on context, i.e. in what way the type that has that binary representation is being used.

fseek - fails skipping a large amount of bytes?

I'm trying to skip a large number of bytes before using fread to read the next bytes.
When size is small (#define size 6404168), it works:
long int x = ((long int)size)*sizeof(int);
fseek(fincache, x, SEEK_CUR);
When size is huge (#define size 649218227), it doesn't :( The next fread reads garbage; I can't really understand which offset it is reading from.
Using fread instead as a workaround works in both cases, but it's really slow:
temp = (int *) calloc(size, sizeof(int));
fread(temp,1, size*sizeof(int), fincache);
free(temp);
Assuming sizeof(int) is 4 and you are on a 32-bit system (where sizeof(long) is 4),
649218227*4 would overflow what a long can hold. Signed integer overflow is undefined behaviour. That's why it works for smaller values (those less than LONG_MAX).
You can use a loop instead, fseek()-ing the necessary bytes a chunk at a time:
long x;
intmax_t len = (intmax_t)size * sizeof(int);  /* total bytes to skip */
while (len > 0) {
    x = (long) (len > LONG_MAX ? LONG_MAX : len);
    fseek(fincache, x, SEEK_CUR);
    len -= x;
}
The offset argument of fseek is required to be a long, not a long long. So x must fit into a long, else don't use fseek.
Since your platform's int is most likely 32-bit, multiplying 649,218,227 with sizeof(int) results in a number that exceeds INT_MAX and LONG_MAX, which are both 2**31-1 on 32-bit platforms. Since fseek accepts a long int, the resulting overflow causes your program to seek to the wrong offset and read garbage.
You should consult your compiler's documentation to find if it provides an extension for 64-bit seeking. On POSIX systems, for example, you can use fseeko, which accepts an offset of type off_t.
Be careful not to introduce overflow before even calling the 64-bit seeking function. Careful code could look like this:
off_t offset = (off_t) size * (off_t) sizeof(int);
fseeko(fincache, offset, SEEK_CUR);
Input guidance for fseek:
http://www.tutorialspoint.com/c_standard_library/c_function_fseek.htm
int fseek(FILE *stream, long int offset, int whence)
offset − This is the number of bytes to offset from whence.
You are invoking undefined behavior by passing a long long (whose value is bigger than LONG_MAX) to fseek rather than the required long.
As is known, UB can do anything, including not work.
Try this; if the offset is too large to seek, you may have to read the bytes instead:
size_t toseek = 6404168; /* change the number to increase it */
while (toseek > 0)
{
    char buffer[4096];
    size_t toread = toseek < sizeof(buffer) ? toseek : sizeof(buffer);
    size_t nread = fread(buffer, 1, toread, stdin);
    if (nread == 0)
        break; /* EOF or error */
    toseek -= nread;
}

error: comparison between signed and unsigned integer expressions

I am getting the above-cited error when compiling the following code.
long _version_tag;
size_t _timing;
ssize_t bytes_read = read(fd, &_version_tag, sizeof(long));
if (bytes_read < sizeof(long) || _version_tag != TIMING_FILE_VERSION_TAG)
    return -1;
gcc complains at this point:
if (bytes_read < sizeof(long) || _version_tag != TIMING_FILE_VERSION_TAG)
^
I have even tried to explicitly cast bytes_read to long, but in vain. Can somebody please explain what's going on here?
read(3) returns -1 if it fails, so test for that first. If it succeeded, then you can test the length. Something like:
ssize_t bytes_read = read(fd, &_version_tag, sizeof(long));
if (bytes_read == -1 || (size_t)bytes_read < sizeof(long) || _version_tag != TIMING_FILE_VERSION_TAG)
    return -1;
The problem is that bytes_read is of type ssize_t which is signed while sizeof(long) is of type size_t which is unsigned.
As sizeof(long) will never be too big to be represented as ssize_t on any system I can imagine, I would recommend to cast the result of sizeof(long) to ssize_t:
bytes_read < (ssize_t)(sizeof(long)) // Or static_cast in C++
This is usually just a warning, not an error as the result is well defined by the standard. However, implicit comparison of signed and unsigned integer types can have surprising results and should thus be avoided.
Various answers address the warning caused by bytes_read < sizeof(long) by casting one side of the comparison to the type of the other side.
(size_t) bytes_read < sizeof(long) // or
bytes_read < (ssize_t) sizeof(long)
If the first approach is taken, then a preceding test should occur.
bytes_read == -1
Lots of comments reflect on the merits of casting one side or the other. Casting is always a bit tricky, for there exists the potential for a demoting cast (going to a reduced range) and losing information.
In OP's example of reading into long _version_tag, it makes no difference which side is cast, as sizeof(long) is certainly less than 32,767 (the smallest possible maximum value of ssize_t) and less than 65,535 (the smallest possible maximum value of size_t).
If looking for a general solution, not assuming _version_tag is a long but potentially something far larger, and using Ref, then code could use the following. Code knows the value in bytes_read, if not -1, must fit in a size_t, as it certainly will be no larger than what was passed to read().
foo _version_tag;
ssize_t bytes_read = read(fd, &_version_tag, sizeof _version_tag);
if (bytes_read == -1 || (size_t) bytes_read < sizeof _version_tag || ...)
    return -1;
For my part, I eschew casting and prefer multiplying to avoid down-casting.
if (bytes_read == -1 || ((size_t)1 * bytes_read) < sizeof _version_tag || ...)
    return -1;
ssize_t is a signed type, but the result of sizeof is size_t, which is an unsigned type. In C, if you want to avoid the warning/error, you can cast the sizeof result to ssize_t.
ssize_t is signed, where size_t is unsigned. You first need to make sure bytes_read is non-negative (meaning the call did not fail). If that check passes, you can just do bytes_read < static_cast<ssize_t>(sizeof(long)).

How the condition to check whether the link's size in a symbolic link file is too big, works in this code?

Here is a piece of code from the lib/xreadlink.c file in GNU Coreutils..
/* Call readlink to get the symbolic link value of FILENAME.
   SIZE is a hint as to how long the link is expected to be;
   typically it is taken from st_size. It need not be correct.
   Return a pointer to that NUL-terminated string in malloc'd storage.
   If readlink fails, return NULL (caller may use errno to diagnose).
   If malloc fails, or if the link value is longer than SSIZE_MAX :-),
   give a diagnostic and exit. */
char *xreadlink (char const *filename)
{
    /* The initial buffer size for the link value. A power of 2
       detects arithmetic overflow earlier, but is not required. */
    size_t buf_size = 128;

    while (1)
    {
        char *buffer = xmalloc(buf_size);
        ssize_t link_length = readlink(filename, buffer, buf_size);

        if (link_length < 0)
        {
            /* handle failure of system call */
        }
        if ((size_t) link_length < buf_size)
        {
            buffer[link_length] = 0;
            return buffer;
        }

        /* size not sufficient, allocate more */
        free(buffer);
        buf_size *= 2;

        /* Check whether increase is possible */
        if (SSIZE_MAX < buf_size || (SIZE_MAX / 2 < SSIZE_MAX && buf_size == 0))
            xalloc_die();
    }
}
The code is understandable except I could not understand how the check for whether the link's size is too big works, that is the line:
if (SSIZE_MAX < buf_size || (SIZE_MAX / 2 < SSIZE_MAX && buf_size == 0))
Further, how can the condition
(SIZE_MAX / 2 < SSIZE_MAX)
be true on any system???
SSIZE_MAX is the maximum value of the signed variety of size_t. For instance, if size_t is only 16 bits (very unlikely these days), SIZE_MAX is 65535 while SSIZE_MAX is 32767. More likely it is 32 bits (giving 4294967295 and 2147483647 respectively), or even 64 bits (giving numbers too big to type here :-) ).
The basic problem to solve here is that readlink returns a signed value (ssize_t) even though buffer sizes are unsigned (size_t); so once buf_size exceeds SSIZE_MAX, it's impossible to read the link, as a length that large cannot be represented in the signed return value.
As for the "furthermore" part: it quite likely can't, i.e., you're right. At least on any sane system, anyway. (It is theoretically possible to have, e.g., a 32-bit SIZE_MAX but a 33-bit signed integer so that SSIZE_MAX is also 4294967295. Presumably this code is written to guard against theoretically-possible, but never-actually-seen, systems.)

What's the easiest way to convert a long in C to a char*?

What is the clean way to do that in C?
wchar_t *ltostr(long value) {
    int size = string_size_of_long(value);
    wchar_t *wchar_copy = malloc((size + 1) * sizeof(wchar_t));
    swprintf(wchar_copy, size + 1, L"%li", value);
    return wchar_copy;
}
The solutions I have come up with so far are all rather ugly, especially allocate_properly_size_whar_t, which uses double-precision floating-point math.
A long won't have more than 64 digits on any platform (actually far fewer, but I'm too lazy to figure out the actual minimum right now). So just print to a fixed-size buffer, then use wcsdup rather than trying to calculate the length ahead of time.
wchar_t *ltostr(long value) {
    wchar_t buffer[64] = { 0 };
    /* swprintf takes a count of wide characters, not bytes */
    swprintf(buffer, sizeof(buffer) / sizeof(*buffer), L"%li", value);
    return wcsdup(buffer);
}
If you want a char*, it's trivial to translate the above:
char *ltostr(long value) {
    char buffer[64] = { 0 };
    snprintf(buffer, sizeof(buffer), "%li", value);
    return strdup(buffer);
}
This will be faster and less error-prone than calling snprintf twice, at the cost of a trivial amount of stack space.

Alternatively, you can call snprintf twice: first with a NULL buffer to compute the required length, then again to perform the conversion:
int charsRequired = snprintf(NULL, 0, "%ld", value) + 1;
char *long_str_buffer = malloc(charsRequired);
snprintf(long_str_buffer, charsRequired, "%ld", value);
The maximum number of digits is given by ceil(log10(LONG_MAX)). You can precompute this value for the most common ranges of long using the preprocessor:
#include <limits.h>
#if LONG_MAX < 1u << 31
#define LONG_MAX_DIGITS 10
#elif LONG_MAX < 1u << 63
#define LONG_MAX_DIGITS 19
#elif LONG_MAX < 1u << 127
#define LONG_MAX_DIGITS 39
#else
#error "unsupported LONG_MAX"
#endif
Now, you can use
wchar_t buffer[LONG_MAX_DIGITS + 2];
int len = swprintf(buffer, sizeof buffer / sizeof *buffer, L"%li", -42l);
to get a stack-allocated wide-character string. For a heap-allocated string, use wcsdup() if available or a combination of malloc() and memcpy() otherwise.
Many people would recommend you avoid this approach, because it's not apparent that the user of your function will have to call free at some point. The usual approach is to write into a supplied buffer.
Since you receive a long, you know its range will be –2,147,483,648 to 2,147,483,647 on platforms where long is 32 bits, and since swprintf() uses the "C" locale by default (you control that part), you only need 11 characters plus the terminating NUL. This saves you from string_size_of_long().
You could either (for locale C):
wchar_t *ltostr(long value) {
    wchar_t *wchar_copy = malloc(12 * sizeof(wchar_t));
    swprintf(wchar_copy, 12, L"%li", value);
    return wchar_copy;
}
Or more general but less portable, you could use _scwprintf to get the length of the string required (but then it's similar to your original solution).
PS: I'd simplify the memory allocation and freeing more than this "tool-box" function.
You can use the preprocessor to calculate an upper bound on the number of chars required to hold the text form of an integer type. The following works for signed and unsigned types (eg MAX_SIZE(int)) and leaves room for the terminating \0 and possible minus sign.
#define MAX_SIZE(type) ((CHAR_BIT * sizeof(type)) / 3 + 2)
