If size_t is unsigned, why does it print as -1, and is the code incorrect?
For example, in this strncat implementation, at the end of the function the size_t value prints as -1. Can it overflow?
Consider this example:
#include <stdio.h>
#include <string.h>
char *util_strncat(char *destination, const char *source, size_t len) {
    char *temp = destination + strlen(destination);
    while (len--) {
        *temp++ = *source++;
        if (*source == '\0') {
            break;
        }
    }
    *temp = '\0';
    printf("%zu\n", len); // -1
    return destination;
}
int main() {
    char dest[7] = "abcd";
    char src[] = "efg";
    util_strncat(dest, src, 2);
    printf("%s\n", dest);
    return 0;
}
What happens? Is size_t signed, like -1?
If the code truly printed "-1", then the compiler is not C compliant.
"in this strncat implementation at the end of the function, size_t is -1."
No, it is not -1 with a conforming C compiler.
size_t is an unsigned integer of some width - at least 16 bits.
The loop while (len--) { continues until len == 0. With the post-decrement len--, the value after the final evaluation wraps around to SIZE_MAX, some large positive value.
Can overflow?
Mathematically, yes, size_t math can overflow. In C, unsigned math is well defined to wrap around¹, in effect modulo SIZE_MAX + 1 for type size_t (C11 §6.2.5 ¶9).
I'd expect printf("%zu\n", len); to report 18446744073709551615, 4294967295, or some other Mersenne number.
size_t is "the unsigned integer type of the result of the sizeof operator" (C11 §7.19 ¶2).
¹ "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
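A minimal sketch of that wrap-around (the exact value printed depends on the width of size_t on your platform):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    size_t len = 0;
    len--;                            // 0 - 1 wraps, modulo SIZE_MAX + 1
    printf("%zu\n", len);             // prints SIZE_MAX, e.g. 18446744073709551615
    printf("%d\n", len == SIZE_MAX);  // prints 1
    return 0;
}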
Questionable code
Consider that util_strncat(ptr, "", size) can lead to undefined behavior (UB): the loop copies the '\0' first and only then tests *source, which by that point points one past the end of the source array.
int main() {
    char dest[7] = "abcd";
    char src[] = "\0xyz";
    util_strncat(dest, src, 2);
    printf("%s\n", dest);
    printf("%c\n", dest[5]); // 'x'!!
}
%zu cannot print a negative number under any circumstances. When an unsigned number of type size_t wraps around while being decremented, its value becomes SIZE_MAX.
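A repaired sketch of the loop above, testing *source before copying so that neither the empty-source read nor the misleading wrapped counter occurs (this is a fix of the code in question, not a drop-in replacement for the standard strncat):

char *util_strncat_fixed(char *destination, const char *source, size_t len) {
    char *temp = destination + strlen(destination);
    while (len > 0 && *source != '\0') {  // test first: no out-of-bounds read
        *temp++ = *source++;
        len--;                            // len stops at 0, never wraps
    }
    *temp = '\0';
    return destination;
}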
Related
In the code below, the length of string s is 3 and the length of t is 5. So 3 - 5 = -2, which is smaller than 0. Then why does the code print 3?
#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    unsigned int i = 0;
    int len = ((strlen(s) - strlen(t)) > i) ? strlen(s) : strlen(t);
    printf("%d", len);
}

int main()
{
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
    return 0;
}
When -2 is converted to an unsigned type, the result is the unsigned value UINT_MAX + 1 - 2 (i.e. UINT_MAX - 1), which is greater than i. strlen returns size_t, which is an unsigned type.
Also, size_t is the correct type for len, which we would then print with printf("%zu", len).
The surprise comes from comparing the result of the subtraction against i, where i is 0. You can do this instead:
size_t slen = strlen(s);
size_t tlen = strlen(t);
printf("%zu\n", (slen > tlen)? slen : tlen);
Your problem is subtracting a greater unsigned value from a smaller one.
`(unsigned) 3 - (unsigned) 5` = (unsigned) 4294967294, which is > 0.
Use proper types and proper logic for your calculations. Remember that strlen returns a value of type size_t.
There is no need to repeat the strlen operation for the same string.
An improved version of your program could look like this:
#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    size_t len;
    size_t sLen = strlen(s);
    size_t tLen = strlen(t);
    if (sLen > tLen)
        len = sLen - tLen;
    else
        len = tLen - sLen;
    printf("len = %zu\n\n", len);
    printf("NOTE: (unsigned) 3 - (unsigned) 5 = %u", 3u - 5u);
}

int main()
{
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
    return 0;
}
OUTPUT:
len = 2
NOTE: (unsigned) 3 - (unsigned) 5 = 4294967294
So, 3 - 5 = -2?
That's for signed ints. For size_t, which strlen() returns and which is unsigned, the result is a very large number.
The prototype of strlen() is:
size_t strlen(const char *);
Its return value type is size_t, which in most cases is an unsigned integer type (usually unsigned int or unsigned long).
When you do subtraction between two unsigned integers, the result wraps around if it would be lower than 0, the smallest unsigned integer. Therefore on a typical 32-bit system, 3U - 5U == 4294967294U, and on a typical 64-bit system, 3UL - 5UL == 18446744073709551614UL. Your test (strlen(s) - strlen(t)) > i therefore behaves exactly like strlen(s) != strlen(t) when i == 0, as the lengths being identical is the only case that could render the test false.
It's advisable to avoid subtraction when comparing integers. If you really want to do that, addition is better:
strlen(s) > strlen(t) + i
This way it's less likely to run into unsigned integer wrap-around.
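A small demonstration of why the added form behaves as intended while the subtracted form does not:

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "abc";    // length 3
    const char *t = "defgh";  // length 5
    unsigned int i = 0;
    // Subtraction wraps: 3 - 5 becomes a huge unsigned value, so this prints 1.
    printf("%d\n", (strlen(s) - strlen(t)) > i);
    // Addition keeps both sides in range, so this prints 0 as intended.
    printf("%d\n", strlen(s) > strlen(t) + i);
    return 0;
}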
By the way, if you save the lengths of the strings in variables, you can avoid an extra call to strlen(). And since you do not modify the strings in your function, it is better to declare the function parameters as const char *. It's also recommended that you write
const char *x ="abc";
const char *y ="defgh";
since string literals cannot be modified. Any attempt to modify a string literal invokes undefined behavior.
#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    unsigned int c = 0;
    int len = ((strlen(s) - strlen(t)) > c) ? strlen(s) : strlen(t);
    printf("%d\n", len);
}

void main() {
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
}
When I compile it, it gives 3, but I don't understand how the conversion is taking place here: ((strlen(s) - strlen(t)) > c)
This is very poor code: (strlen(s) - strlen(t)) is always >= 0, as it is unsigned math. The type returned by strlen() is size_t, some unsigned type. So unless the lengths are equal, the difference is always a positive number due to unsigned wrap-around.
Thus int len = strlen(s); is the result whenever the length of s differs from the length of t.
The better way to write similar code is to only add:
// ((strlen(s) - strlen(t)) > c)
(strlen(s) > (c + strlen(t)))
Note: On rare platforms with SIZE_MAX <= INT_MAX, the difference can be negative, as the math is then done with the signed type int. Yet the comparison with c happens as unsigned, so a negative difference is wrapped around to a very large number, greater than 0. @Paul Hankin
I made my bubble sort program generic. I went on testing it and it was working well until I placed a negative number in the array, and I was surprised that it was pushed to the end, as if it were bigger than the positive numbers.
Obviously memcmp is the reason, so why does memcmp() regard negative numbers as greater than positive numbers?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void bubblesortA(void *_Buf, size_t bufSize, size_t bytes);

int main(void)
{
    size_t n, bufsize = 5;
    int buf[5] = { 5, 1, 2, -1, 10 };
    bubblesortA(buf, 5, sizeof(int));
    for (n = 0; n < bufsize; n++)
    {
        printf("%d ", buf[n]);
    }
    putchar('\n');

    char str[] = "bzqacd";
    size_t len = strlen(str);
    bubblesortA(str, len, sizeof(char));
    for (n = 0; n < len; n++)
    {
        printf("%c ", str[n]);
    }
    putchar('\n');
    return 0;
}
void bubblesortA(void *buf, size_t bufSize, size_t bytes)
{
    size_t x, y;
    char *ptr = (char *)buf;
    void *tmp = malloc(bytes);
    for (x = 0; x < bufSize; x++)
    {
        ptr = (char *)buf;
        for (y = 0; y < (bufSize - x - 1); y++)
        {
            if (memcmp(ptr, ptr + bytes, bytes) > 0)
            {
                memcpy(tmp, ptr, bytes);
                memcpy(ptr, ptr + bytes, bytes);
                memcpy(ptr + bytes, tmp, bytes);
            }
            ptr += bytes;
        }
    }
    free(tmp);
}
Edit:
So, how do I modify the program to make it compare correctly?
memcmp compares bytes; it does not know whether the bytes represent ints, doubles, strings, ...
So it cannot do any better than treat the bytes as unsigned numbers. Because a negative integer is normally represented using two's complement, the highest bit of a negative integer is set, making its bytes compare greater than those of any positive signed integer.
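A one-liner showing the effect. On a two's-complement machine every byte of -1 is 0xFF, so memcmp() ranks it above +1 regardless of endianness:

#include <stdio.h>
#include <string.h>

int main(void) {
    int neg = -1, pos = 1;
    // All bytes of -1 are 0xFF; every byte of +1 is 0x00 or 0x01.
    printf("%d\n", memcmp(&neg, &pos, sizeof(int)) > 0);  // prints 1
    return 0;
}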
Answering OP's appended edit
How can I modify the program to make it compare correctly?
To compare two types as an anonymous bit pattern, memcmp() works fine. To compare two values of some type, code needs a compare function for that type. Following qsort() style:
void bubblesortA2(void *buf, size_t bufSize, size_t bytes,
                  int (*compar)(const void *, const void *))
{
    ....
    // if (memcmp(ptr, ptr + bytes, bytes) > 0)
    if ((*compar)(ptr, ptr + bytes) > 0)
    ....
To compare int, pass in a compare-int function. Notice that a and b are the addresses of the objects.

int compar_int(const void *a, const void *b) {
    const int *ai = (const int *)a;
    const int *bi = (const int *)b;
    return (*ai > *bi) - (*ai < *bi);
}
To compare char, pass in a compare-char function:

int compar_char(const void *a, const void *b) {
    const char *ac = (const char *)a;
    const char *bc = (const char *)b;
    return (*ac > *bc) - (*ac < *bc);
}
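Hooked together (assuming the elided parts of the bubblesortA2() sketch above are filled in like the original bubblesortA()), a call for the int array in the question would look like this:

int buf[5] = { 5, 1, 2, -1, 10 };
bubblesortA2(buf, 5, sizeof(int), compar_int);
// buf is now { -1, 1, 2, 5, 10 } -- the negative value sorts first.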
Negative numbers have the sign bit (the most significant bit) set to 1. The function memcmp compares bytes as unsigned values, so it treats the sign bit as a value bit. As a result, negative numbers sometimes compare greater than positive numbers.
In the representation used for signed numbers, the most significant bit is the sign bit, a bit that indicates whether a number is negative or positive.
It becomes clear if we look at the binary representations of two numbers. It's worth noting that memcmp just compares the two as if they were unsigned numbers of the length specified.
-27 in binary notation (8-bit, two's complement): 1110 0101
+56 in binary notation:                           0011 1000
If you compare the two as if they were unsigned, you'll note that the -27 representation is actually bigger.
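The same example in code, comparing the two bytes directly (assumes an 8-bit two's-complement signed char):

#include <stdio.h>
#include <string.h>

int main(void) {
    signed char a = -27;  // bit pattern 1110 0101, i.e. 0xE5 (229 unsigned)
    signed char b = 56;   // bit pattern 0011 1000, i.e. 0x38 (56 unsigned)
    printf("%d\n", memcmp(&a, &b, 1) > 0);  // prints 1: -27 compares greater
    return 0;
}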
I am converting an input raw PCM stream into MP3 using LAME. The encoding function in that library returns MP3-encoded samples in an array of type unsigned char. This MP3-encoded stream now needs to be placed in an FLV container, which uses a function that writes encoded samples from a char array. My problem is that I am passing the array from LAME (of type unsigned char) into the FLV library. The following piece of code (only symbolic) illustrates my problem:
/* cast from unsigned char to char. */
#include <stdio.h>
#include <stdlib.h>

void display(char *buff, int len) {
    int i = 0;
    for (i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = (unsigned char*) malloc(len * sizeof(unsigned char));
    int i = 0;
    for (i = 65; i < (len + 65); i++) {
        buff[i] = (unsigned char) i;
        printf("char = %c", (char) i);
    }
    printf("Displaying array in main.\n");
    for (i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, 'buff[i]');
    }
    printf("Displaying array in func.\n");
    display(buff, len);
    return 0;
}
My question(s):
1. Is the implicit type conversion in the code above (as demonstrated by passing buff into the function display) safe? Is some weird behaviour likely to occur?
2. Given that I have little option but to stick to the functions as they are, is there a "safe" way of converting an array of unsigned char into char?
The only problem with converting unsigned char * into char * (or vice versa) is that it's supposed to be an error. Fix it with a cast.
display((char *) buff, len);
Note: This cast is unnecessary:
printf("char = %c", (char) i);
This is fine:
printf("char = %c", i);
The %c formatter takes an int arg to begin with, since it is impossible to pass a char to printf() anyway (it will always get converted to int, or in an extremely unlikely case, unsigned int.)
You seem to worry a lot about type safety where there is no need for it. Since this is C and not C++, there is no strong typing system in place. Conversions from unsigned char to char are usually harmless, as long as the "sign bit" is never set. The key to avoiding problems is to actually understand them. The following problems/features exist in C:
The default char type has implementation-defined signedness (see the sketch after this list). One should never make assumptions about its signedness, nor use it in arithmetic of any kind, particularly not bit-wise operations. char should only be used for storing/printing ASCII characters. It should never be mixed with hex literals, or there is potential for subtle bugs.
The integer promotions in C implicitly promote all small integer types, among them char and unsigned char, to an integer type that can hold their result. This will in practice always be int.
Formally, pointer conversions between different types could be undefined behavior. But pointer conversions between unsigned char and char are in practice safe.
Character literals '\0' etc are of type int in C.
printf and similar functions default promote all character parameters to int.
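A tiny probe for the first point in the list, the implementation-defined signedness of plain char (what it prints depends on the compiler):

#include <stdio.h>

int main(void) {
    char c = '\xFF';   // bit pattern all ones
    // Prints -1 where plain char is signed, 255 where it is unsigned;
    // c is promoted to int before being passed to printf.
    printf("%d\n", c);
    return 0;
}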
You are also casting the void* result of malloc, which is completely pointless in C, and potentially harmful in older versions of the C standard, where a function with no visible prototype was assumed to return int.
And then you have various weird logic-related bugs and bad practice, which I have fixed but won't comment in detail. Use this modified code:
#include <stdio.h>
#include <stdlib.h>

void display(const char *buff, int len) {
    for (int i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = malloc(len * sizeof(unsigned char));
    if (buff == NULL)
    {
        // error handling
    }

    char ch = 'A';
    for (int i = 0; i < len; i++)
    {
        buff[i] = (unsigned char)ch + i;
        printf("char = %c\n", buff[i]);
    }

    printf("\nDisplaying array in main.\n");
    for (int i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, buff[i]);
    }

    printf("\nDisplaying array in func.\n");
    display((char*)buff, len);

    free(buff);
    return 0;
}
Casts in C/C++ from any integral type to a same-or-larger integral type are guaranteed not to lose bits. Casts between signed and unsigned types would generally create overflow and underflow hazards, but the buffer you're converting actually points to raw data whose real type is effectively void*.
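A round-trip sketch of that point: the bytes survive the pointer cast unchanged, only the interpretation of the pointed-to type differs:

#include <stdio.h>

int main(void) {
    unsigned char raw[3] = { 0x41, 0xFF, 0x00 };
    char *p = (char *)raw;                    // reinterpret; no bytes change
    unsigned char *back = (unsigned char *)p; // cast back
    printf("%d\n", back[1] == 0xFF);          // prints 1: bit pattern intact
    return 0;
}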
In the C language, how do I convert an unsigned long value to a string (char *) while keeping my source code portable, or at least able to be recompiled for another platform without rewriting it?
For example, if I have sprintf(buffer, format, value), how do I determine the size of buffer in a platform-independent manner?
Ask snprintf for the required length first (a buffer size of 0 makes it just count), then format into an exactly-sized buffer:

const int n = snprintf(NULL, 0, "%lu", ulong_value); // chars needed, excluding '\0'
assert(n > 0);
char buf[n+1];                                       // VLA with room for the terminator
int c = snprintf(buf, n+1, "%lu", ulong_value);
assert(buf[n] == '\0');
assert(c == n);
The standard approach is to use sprintf(buffer, "%lu", value); to write a string representation of value to buffer. However, overflow is a potential problem, as sprintf will happily (and unknowingly) write past the end of your buffer.
This is a real weakness of sprintf, partially fixed in C++ by using streams rather than buffers. The usual "answer" is to allocate a generous buffer that is unlikely to overflow, let sprintf output to that, use strlen to determine the actual string length produced, and then allocate a buffer of that size + 1 and copy the string into it.
snprintf (standard since C99) is the alternative that lets you specify a maximum buffer size.
You can write a function that converts an unsigned long to a string, similar to the (nonstandard) ltostr library function:
char *ultostr(unsigned long value, char *ptr, int base)
{
    unsigned long t = 0, res = 0;
    unsigned long tmp = value;
    int count = 0;

    if (NULL == ptr)
    {
        return NULL;
    }
    if (tmp == 0)
    {
        count++;
    }
    while (tmp > 0)
    {
        tmp = tmp / base;
        count++;
    }
    ptr += count;
    *ptr = '\0';
    do
    {
        res = value - base * (t = value / base);
        if (res < 10)
        {
            *--ptr = '0' + res;
        }
        else if ((res >= 10) && (res < 16))
        {
            *--ptr = 'A' - 10 + res;
        }
    } while ((value = t) != 0);
    return ptr;
}
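A quick usage sketch (the buffer size is chosen by hand here; 21 bytes covers a 64-bit unsigned long in base 10 plus the terminator):

char buf[21];
printf("%s\n", ultostr(4294967295UL, buf, 10));  // prints 4294967295
printf("%s\n", ultostr(255UL, buf, 16));         // prints FF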
char buffer[50];
unsigned long a = 5;
int n = sprintf(buffer, "%lu", a);
Try using sprintf:
unsigned long x = 1000000;
char buffer[21];
sprintf(buffer, "%lu", x);
Edit:
Notice that you have to allocate the buffer in advance, with no idea how long the numbers will actually be when you do so. I'm assuming 32-bit longs, which can produce numbers as big as 10 digits.
See Carl Smotricz's answer for a better explanation of the issues involved.
For a long value you need the length modifier 'l', combined with 'u' for an unsigned decimal integer.
For the available conversion options, see a sprintf reference.
#include <stdio.h>

int main()
{
    unsigned long lval = 123;
    char buffer[50];
    sprintf(buffer, "%lu", lval);
}
... how do I determine the size of buffer in a platform-independent manner?
One of the challenges of converting an unsigned long to a string is determining the string size that is needed.
Dynamically
Repeatedly divide the value by 10 until it reaches 0 to find size_needed:

unsigned long value_copy = value;
unsigned size_needed = 1;  // For the null character.
// For signed types, add one more: if (value_copy < 0) size_needed++;
do {
    size_needed++;         // Add 1 per digit.
    value_copy /= 10;
} while (value_copy != 0);
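Wrapped into a helper (a sketch; the function name is mine), the digit count sizes an exact malloc'd buffer which sprintf then fills:

#include <stdio.h>
#include <stdlib.h>

// Returns a malloc'd string; the caller frees it. NULL on allocation failure.
char *ulong_to_string_alloc(unsigned long value) {
    unsigned long value_copy = value;
    size_t size_needed = 1;   // for the null character
    do {
        size_needed++;        // one byte per digit
        value_copy /= 10;
    } while (value_copy != 0);
    char *buf = malloc(size_needed);
    if (buf != NULL) {
        sprintf(buf, "%lu", value);
    }
    return buf;
}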
Worst case
Find the string length of ULONG_MAX.
Start with the nifty IMAX_BITS(m), which returns the number of bits in a Mersenne number like ULONG_MAX. (This gives us the max bit width even if the type has padding.) Then scale by log10(2) (0.301...) to find the number of decimal digits, and add 2 for rounding and the null character.
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define LOG2_10_N 28
#define LOG2_10_D 93
#define UNSIGNED_LONG_STRING_SIZE (IMAX_BITS(ULONG_MAX)*LOG2_10_N/LOG2_10_D + 2)
// Slightly different for signed types, one more for the sign:
#define SIGNED_LONG_STRING_SIZE (IMAX_BITS( LONG_MAX)*LOG2_10_N/LOG2_10_D + 3)
Armed with the string size, there are many possible next steps. I like using C99's (and later) compound literal to form the needed space. The space is valid until the end of the block.
#include <limits.h>
#include <stdio.h>

char *unsigned_long_to_string(char *dest, unsigned long x) {
    sprintf(dest, "%lu", x);
    return dest;
}

// Compound literal v-----------------------------------v
#define UNSIGNED_LONG_TO_STRING(u) unsigned_long_to_string((char [UNSIGNED_LONG_STRING_SIZE]){0}, (u))

int main(void) {
    puts(UNSIGNED_LONG_TO_STRING(42));
    puts(UNSIGNED_LONG_TO_STRING(ULONG_MAX));
}
Output
42
18446744073709551615 // This varies