How is the conversion being done in this C code?

#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    unsigned int c = 0;
    int len = ((strlen(s) - strlen(t)) > c) ? strlen(s) : strlen(t);
    printf("%d\n", len);
}

void main() {
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
}
When I compile and run it, it prints 3, but I don't understand how the conversion is taking place here: (strlen(s) - strlen(t)) > c

This is very poor code. (strlen(s) - strlen(t)) is always >= 0 because the math is unsigned: the type returned by strlen() is size_t, some unsigned type. So unless the two lengths are equal, the difference is always a positive number due to unsigned wrap-around.
Then len = strlen(s) is chosen even when s is shorter than t.
A better way to write similar code is to only add:
// instead of ((strlen(s) - strlen(t)) > c)
strlen(s) > c + strlen(t)
Note: on rare platforms with SIZE_MAX <= INT_MAX, the difference can be negative, as the math is then done with the signed type int. Yet the comparison with c is unsigned, so it happens as unsigned, and a negative difference is "wrapped around" to a very large number, greater than 0. (Hat tip to @Paul Hankin.)
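For reference, a minimal corrected sketch of printlength along those lines (a sketch only: it keeps the question's names but switches to size_t and the additive comparison):
#include <stdio.h>
#include <string.h>

/* Sketch: compare by adding instead of subtracting, so no unsigned
   wrap-around can occur; %zu is the matching specifier for size_t. */
void printlength(char *s, char *t) {
    size_t c = 0;
    size_t len = (strlen(s) > c + strlen(t)) ? strlen(s) : strlen(t);
    printf("%zu\n", len);
}

int main(void) {
    printlength("abc", "defgh");  /* prints 5 */
    return 0;
}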


Conversion of string constant to numeric value using C

I have written a C program which uses two different algorithms to convert a string constant representing a numeric value to its integer value. For some reason, the first function, atoi(), doesn't execute properly on large values, while the second one, atoi_imp(), works fine. Is this an optimization issue or some other error? The problem is that the first function makes the program's process terminate with an error.
#include <stdio.h>
#include <string.h>

unsigned long long int atoi(const char[]);
unsigned long long int atoi_imp(const char[]);

int main(void) {
    printf("%llu\n", atoi("9417820179"));
    printf("%llu\n", atoi_imp("9417820179"));
    return 0;
}

unsigned long long int atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;
    for (i = strlen(str) - 1; i >= 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i - 1; ++j) {
            power *= 10;
        }
        num += (str[i] - '0') * power;
    }
    return num;
}

unsigned long long int atoi_imp(const char str[]) {
    unsigned long long int i, num = 0;
    for (i = 0; str[i] >= '0' && str[i] <= '9'; ++i) {
        num = num * 10 + (str[i] - '0');
    }
    return num;
}
atoi is part of the C standard library, with the signature int atoi(const char *);.
You are declaring that a function with that name exists, but giving it a different return type. Note that in C the function name is the only thing that matters for linking, and the toolchain can only trust what you tell it in the source code. If you lie to the compiler, like here, all bets are off.
You should select a different name for your own implementation to avoid issues.
As researched by @pmg, the C standard (C99 §7.1.3) says that using names from the C standard library for your own global symbols (functions or global variables) is explicitly undefined behavior. Beware of nasal demons!
OK, there is at least one problem with your function atoi.
You are looping down on an unsigned value and checking whether it is greater than or equal to zero, which is always true: decrementing it past zero wraps around (underflows) instead of going negative.
The easiest fix is index shifting, i.e.:
unsigned long long int my_atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;
    for (i = strlen(str); i != 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i; ++j) {
            power *= 10;
        }
        num += (str[i-1] - '0') * power;
    }
    return num;
}
Too late, but it may help. I did this for base 10; if you change the base you need to take care of how the digit value is computed in *p - '0'.
I would use Horner's rule to compute the value.
#include <stdio.h>

int main(void)
{
    char *a = "5363", *p = a;
    unsigned int base = 10;
    unsigned long x = 0;

    while (*p) {
        x *= base;
        x += (*p - '0');
        p++;
    }
    printf("%lu\n", x);
    return 0;
}
Your function has an infinite loop: as i is unsigned, i >= 0 is always true.
It can be improved in different ways:
you should compute the length of str just once: strlen() is not cheap, as it must scan the string until it finds the null terminator, and the compiler is not always capable of optimizing away redundant calls with the same argument.
power could be computed incrementally, avoiding the need for a nested loop.
you should not use the name atoi as it is a standard function in the C library. Unless you implement its specification exactly and correctly, you should use a different name.
Here is a corrected and improved version:
unsigned long long int atoi_power(const char str[]) {
    size_t i, len = strlen(str);
    unsigned long long int power = 1, num = 0;
    for (i = len; i-- > 0; ) {
        num += (str[i] - '0') * power;
        power *= 10;
    }
    return num;
}
Modified this way, the function should have performance similar to the atoi_imp version. Note however that they do not implement the same semantics: atoi_power must be given a string of digits only, whereas atoi_imp can handle trailing characters.
As a matter of fact, neither atoi_imp nor atoi_power implements the specification of atoi extended to handle larger unsigned integers:
atoi ignores any leading white-space characters,
atoi accepts an optional sign, either '+' or '-',
atoi consumes all following decimal digits; the behavior on overflow is undefined,
atoi ignores any trailing characters that are not decimal digits.
Given these semantics, the natural implementation of atoi is that of atoi_imp with extra tests. Note that even strtoull(), which you could use to implement your function, handles white space and an optional sign, although the conversion of negative values may give surprising results.
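For illustration only, here is a rough sketch (not from the answers above; the name atoi_ext and the wider signed return type are my own choices) of atoi_imp extended with the leading-whitespace and optional-sign handling just described; overflow is still left undetected:
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: atoi_imp extended with leading-whitespace and
   optional-sign handling. Overflow is still not checked. */
long long int atoi_ext(const char str[]) {
    size_t i = 0;
    int negative = 0;
    long long int num = 0;

    while (isspace((unsigned char)str[i]))   /* skip leading white space */
        ++i;
    if (str[i] == '+' || str[i] == '-')      /* optional sign */
        negative = (str[i++] == '-');
    for (; str[i] >= '0' && str[i] <= '9'; ++i)
        num = num * 10 + (str[i] - '0');     /* trailing non-digits are ignored */

    return negative ? -num : num;
}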

What happens if size_t is signed like -1

If a size_t value ends up as -1, is the code incorrect?
For example, in this strncat implementation, at the end of the function the size_t variable is -1. Can it overflow?
Consider this example:
#include <stdio.h>
#include <string.h>

char *util_strncat(char *destination, const char *source, size_t len) {
    char *temp = destination + strlen(destination);
    while (len--) {
        *temp++ = *source++;
        if (*source == '\0') {
            break;
        }
    }
    *temp = '\0';
    printf("%zu\n", len); // -1
    return destination;
}
int main() {
char dest[7] = "abcd";
char src[] = "efg";
util_strncat(dest, src, 2);
printf("%s\n", dest);
return 0;
}
what happens if size_t is signed like -1 (?)
If the code truly printed "-1", then the compilation is not C compliant.
in this strncat implementation at the end of function size_t is -1.
No, it is not -1 with a conforming C compiler.
size_t is an unsigned integer of some width - at least 16 bits.
The while (len--) loop continues until len == 0. With the post-decrement len--, the value after the final evaluation wraps around to SIZE_MAX, some large positive value.
Can it overflow?
Mathematically, yes, size_t math can overflow. In C, unsigned math is well defined to wrap around [1], in effect modulo SIZE_MAX + 1 for type size_t (§6.2.5 9).
I'd expect printf("%zu\n", len); to report 18446744073709551615, 4294967295, or some other Mersenne number.
size_t is "the unsigned integer type of the result of the sizeof operator" (C11dr §7.19 2).
[1] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Questionable code
Consider that util_strncat(ptr, "", size) can lead to undefined behavior (UB): the first iteration copies the '\0' and then evaluates *source one element past the end of the empty string. The check after the copy also lets characters beyond an embedded null through:
int main() {
    char dest[7] = "abcd";
    char src[] = "\0xyz";
    util_strncat(dest, src, 2);
    printf("%s\n", dest);
    printf("%c\n", dest[5]); // 'x'!!
}
The %zu specifier can't print a negative number under any circumstances. When an unsigned number of type size_t wraps around while being decremented, its value becomes SIZE_MAX.
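For reference, a minimal sketch of the wrap-around itself (the exact value printed depends on the platform's SIZE_MAX):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    size_t len = 0;
    len--;                           // wraps around: 0 - 1 becomes SIZE_MAX
    printf("%zu\n", len);            // e.g. 18446744073709551615 on a 64-bit system
    printf("%d\n", len == SIZE_MAX); // prints 1
    return 0;
}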

Why does the below code yield O/P as 3 instead of 5?

In the below code, the length of string s is 3 and the length of t is 5. So 3 - 5 = -2, which is smaller than 0. Why, then, does the code print 3?
#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    unsigned int i = 0;
    int len = ((strlen(s) - strlen(t)) > i ? strlen(s) : strlen(t));
    printf("%d", len);
}

int main()
{
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
    return 0;
}
strlen returns size_t, which is an unsigned type, so the subtraction is done in unsigned arithmetic: instead of -2 you get the huge wrapped value SIZE_MAX + 1 - 2 (i.e. SIZE_MAX - 1), which is greater than i.
Also, size_t is the correct type for len, which we would then print with printf("%zu", len).
So the comparison of the subtraction result with i (whose value is 0) is true whenever the lengths differ. You can do this instead:
size_t slen = strlen(s);
size_t tlen = strlen(t);
printf("%zu\n", (slen > tlen)? slen : tlen);
Your problem is with subtracting a greater unsigned value from a smaller one.
(unsigned) 3 - (unsigned) 5 == 4294967294 (with a 32-bit unsigned int), which is > 0.
Use proper types for your calculations and proper logic. Remember that strlen returns a value of type size_t.
There is no need to repeat the strlen operation for the same string.
An improved version of your program could look like this:
#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    size_t len;
    size_t sLen = strlen(s);
    size_t tLen = strlen(t);

    if (sLen > tLen)
        len = sLen - tLen;
    else
        len = tLen - sLen;

    printf("len = %zu\n\n", len);
    printf("NOTE: (unsigned) 3 - (unsigned) 5 = %u", -2);
}

int main()
{
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
    return 0;
}
OUTPUT:
len = 2
NOTE: (unsigned) 3 - (unsigned) 5 = 4294967294
So, 3-5 = -2
That's for signed ints; for size_t, which strlen() returns and which is unsigned, that's a pretty big number.
The prototype of strlen() is:
size_t strlen ( const char * );
Its return type is size_t, which in most cases is an unsigned integer type (usually unsigned int or unsigned long).
When you do subtraction between two unsigned integers, the result wraps around if it would go below 0, the smallest unsigned value. Therefore on a typical 32-bit system 3U - 5U == 4294967294U, and on a typical 64-bit system 3UL - 5UL == 18446744073709551614UL. Your test (strlen(s) - strlen(t)) > i has exactly the same behavior as strlen(s) != strlen(t) when i == 0, as the lengths being identical is the only case that could render the test false.
It's advised to avoid using subtraction when comparing integers. If you really want to do that, addition is better:
strlen(s) > strlen(t) + i
This way it's less likely to run into unsigned integer wrap-around.
By the way, if you save the lengths of the strings in variables, you can avoid extra calls to strlen(). And since you do not modify the strings in your function, it is better to declare the function parameters as const char *. It's also recommended that you write
const char *x ="abc";
const char *y ="defgh";
since string literals cannot be modified. Any attempt to modify a string literal invokes undefined behavior.
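Putting those recommendations together, a sketch of a corrected version (lengths computed once, size_t throughout, no subtraction) that prints 5:
#include <stdio.h>
#include <string.h>

/* Sketch: compute each length once, keep everything in size_t,
   and compare directly instead of subtracting. */
void printlength(const char *s, const char *t) {
    size_t slen = strlen(s);
    size_t tlen = strlen(t);
    size_t len = (slen > tlen) ? slen : tlen;
    printf("%zu\n", len);   /* prints 5 for "abc" and "defgh" */
}

int main(void) {
    const char *x = "abc";
    const char *y = "defgh";
    printlength(x, y);
    return 0;
}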

casting unsigned char to char would result in different binary representations?

I think the title is pretty self-explanatory, but basically what I'm asking is: if I have the following instruction:
a = (char) b;
knowing that a's type is char and b's is unsigned char, can that instruction result in making a and b have different binary representations?
The type char can be either signed or unsigned. Char types have no padding, so all bits are value bits.
If char is unsigned, then the value bits of a will be the same as those of b.
If char is signed, then...
if the value of b is representable by char, the common value bits of a and b will be the same;
otherwise, the conversion from an unrepresentable unsigned char value to char gives an implementation-defined result.
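For example, a minimal sketch (the exact result of the last conversion is implementation-defined when char is signed; -56 is merely what common two's-complement implementations produce):
#include <stdio.h>

int main(void) {
    unsigned char b = 200;   /* not representable by a signed 8-bit char */
    char a = (char)b;        /* implementation-defined result if char is signed;
                                on common two's-complement systems a becomes -56 */
    printf("b = %u, a = %d\n", (unsigned)b, (int)a);
    return 0;
}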
The answer, in general, is no: there is no difference. Here you can test it yourself; just supply the respective values for a and b:
#include <stdio.h>
#include <string.h>

const char *byte_to_binary(int x)
{
    static char b[9];
    b[0] = '\0';
    int z;
    for (z = 128; z > 0; z >>= 1) {
        strcat(b, ((x & z) == z) ? "1" : "0");
    }
    return b;
}
int main(void) {
    unsigned char b = -7;
    char a = -7;
    printf("1. %s\n", byte_to_binary(a));
    a = (char) b;
    printf("2. %s\n", byte_to_binary(a));
    return 0;
}

strlen is not always a good idea to be used

I'm not sure if I chose the right title, but today I discovered (as a beginner in C) that strlen is not always the right choice for what I need.
So I tried the following:
#include <stdio.h>
#include <string.h>

int foo(char *s){
    int len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if (c < len) {
        printf("True: C(%d) < Len(%d)\n", c, len);
    } else {
        printf("False: C(%d) > Len(%d)\n", c, len);
    }
    return 0;
}
Output:
False: C(10) > Len(5)
But when I compile with "-Wconversion" I get:
program.c:5:19: warning: conversion to ‘int’ from ‘size_t’ may alter its value [-Wconversion]
int len = strlen(s);
^
A quick fix would be to cast the result of strlen:
int len = (int)strlen(s);
But I did not agree with that, so I decided that I really needed something else; another approach, maybe?
I tried the following:
#include <stdio.h>
#include <string.h>

unsigned int size(char *s){
    unsigned int len;
    /* code here */
    len = (unsigned int)strlen(s);
    return len;
}

int main(void){
    char *name = "Michi";
    unsigned int len = size(name);
    int a = 20, b = 10, c = a - b;

    if (c < (signed int)len) {
        printf("True: C(%d) < Len(%d)\n", c, len);
    } else {
        printf("False: C(%d) > Len(%d)\n", c, len);
    }
    return 0;
}
But I still need to cast strlen because of its return type (size_t, which I know is an unsigned type, typically typedef long unsigned int size_t;).
Finally I settled on another approach: creating my own function, which makes things easier and with fewer possible future problems, and I got:
#include <stdio.h>

long int stringLEN(char *s){
    int i = 0;
    long int len = 0;
    while (s[i] != '\0') {
        len++;
        i++;
    }
    return len;
}

long int foo(char *s){
    long int len = stringLEN(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    long int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if (c < len) {
        printf("True: C(%d) < Len(%ld)\n", c, len);
    } else {
        printf("False: C(%d) > Len(%ld)\n", c, len);
    }
    return 0;
}
where no cast is needed anymore.
So my QUESTION is:
is this (for my case) a better approach?
If not, I need some explanation; my books (I have 3) do not explain these things in a way I can understand.
I only know that at some point a cast could become a big problem, somehow.
EDIT:
This code will also not compile cleanly (warnings treated as errors):
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    int a = 20, b = 10, c = a - b;

    if (c < len) {
        printf("True: C(%d) < Len(%zu)\n", c, len);
    } else {
        printf("False: C(%d) > Len(%zu)\n", c, len);
    }
    return 0;
}
Output:
error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]|
But if I cast len, it works. I realized that, if the size is bigger than what an int can hold, it will never fit.
Digging through all the other answers, your true question seems to be how to deal with a situation like this:
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (slen > value) // -Wconversion warning on this line
        foo();
    else
        bar();
}
... where you can't change the type of either slen or value, because both are correct for the API they're receiving the result of.
The -Wconversion warning is trying to tell you something meaningful. Comparison of signed and unsigned integer types in C does something very strange, not what you would expect from the laws of arithmetic in ℤ; a naive comparison like what I wrote above can and has caused catastrophic bugs. But the cure is not casts or inventing your own strlen; the cure is to fix the comparison so it does what you expect from the laws of arithmetic. The principles for this are:
First check whether the signed quantity is negative. If so, treat it as smaller than the unsigned quantity.
Otherwise, cast the smaller type to the larger type before comparing them.
In this case, size_t is almost certain to be larger than, or the same size as, int, so you would write
#include <assert.h>
#include <limits.h>
#include <stdint.h>   // SIZE_MAX
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

// Code below is correct only if size_t is at least as large as int.
static_assert(SIZE_MAX >= INT_MAX, "size_t must be at least as wide as int");

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (value < 0 || (size_t)value < slen)
        foo();
    else
        bar();
}
The static_assert is present because, if I remember correctly, the C standard does not guarantee that size_t is at least as large as unsigned int. I could, for instance, imagine an ABI for the 80286 where int was four bytes wide but size_t only two. In that situation you would need to do the casting the other way around:
void pick_foo_or_bar(unsigned short a, long b)
{
    if (b < 0 || b < (long)a)
        foo();
    else
        bar();
}
If you don't know which of the two types is bigger, or if you don't know which of them is signed, your only recourse in standard C is (u)intmax_t:
void pick_foo_or_bar(uid_t a, gid_t b)
{
    if (a < 0 && b < 0) {
        if ((intmax_t)a < (intmax_t)b)
            bar();
        else
            foo();
    } else if (a < 0) {
        bar();
    } else if (b < 0) {
        foo();
    } else {
        if ((uintmax_t)a < (uintmax_t)b)
            bar();
        else
            foo();
    }
}
... and, given the exceedingly unfortunate precedent set by C99 wrt long, there probably will come a day when (u)intmax_t is not the biggest integer type supported by the compiler, and then you're just hosed.
The length of a string can never be negative, whilst an integer could be. The warning is there because the range of values for size_t is different from int, and some values of size_t would be treated as negative if cast to an int. The better option is to have the return type of your function match: in this case, have foo return a size_t. You'll soon see that the data type permeates most of the code, and there are other oddities that could do odd things (size_t - size_t could wrap around...).
This will compile without warnings:
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    size_t a = 20, b = 10, c = a - b;

    if (c < len) {
        printf("True: C(%zu) < Len(%zu)\n", c, len);
    } else {
        printf("False: C(%zu) > Len(%zu)\n", c, len);
    }
    return 0;
}
as well explained in the answers and comments by @thomasdickey, @rolandshaw, @andreaghidini, @olaf, @juanchopanza and others.
Did you really make a better approach? No: why should a string-length function return values that can be negative? There is no such thing as a string with negative size.
The standard strlen function is already there, is more efficient, can deal with strings whose maximum size is twice the maximum size handled by stringLEN, and has a more precise definition of its return type.
There are 2 issues:
strlen() returns type size_t. size_t is some unsigned integer type likely as wide or wider than int. It is compiler/platform dependent.
The code needs to compare an int to a size_t. Since size_t is unsigned, and to prevent a warning about a mixed signed/unsigned comparison, explicitly change the int to an unsigned integer. To change a non-negative int to an unsigned integer, cast to (unsigned).
To compare, test whether c is negative and, if not, compare (unsigned)c directly to len. The compiler will convert the types as needed and produce an arithmetically correct answer.
size_t len = strlen("SomeString");
int c = 20; // some int
if (c < 0 || (unsigned)c < len) puts("c less than len");
else puts("c >= len");
The normal way to solve this is to use variables typed size_t, and choose an appropriate format for printing them. Then no cast is needed. For printf, see these:
printf format specifiers for uint32_t and size_t
What's the correct way to use printf to print a size_t?
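For instance, a minimal sketch of that approach:
#include <stdio.h>
#include <string.h>

int main(void) {
    size_t len = strlen("Michi");
    printf("len = %zu\n", len);  // %zu matches size_t, so no cast is needed
    return 0;
}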
I think this must vary from compiler to compiler, because I tried it on an online compiler and it didn't show any warning.

Resources