strlen is not always a good idea to use - C

I'm not sure if I chose the right title, but today I discovered (as a beginner in C) that strlen is not always the right choice when I need a string's length.
So I tried the following:
#include <stdio.h>
#include <string.h>

int foo(char *s){
    int len = strlen(s);
    /* code here */
    return len;
}
int main(void){
    char *name = "Michi";
    int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%d)\n", c, len);
    }else{
        printf("False: C(%d) > Len(%d)\n", c, len);
    }
    return 0;
}
Output:
False: C(10) > Len(5)
But when I compile with "-Wconversion" I get:
program.c:5:19: warning: conversion to ‘int’ from ‘size_t’ may alter its value [-Wconversion]
int len = strlen(s);
^
A quick fix would be to cast the result of strlen:
int len = (int)strlen(s);
But I did not agree with that, so I decided I really needed something else, maybe another approach.
I tried the following:
#include <stdio.h>
#include <string.h>

unsigned int size(char *s){
    unsigned int len;
    /* code here */
    len = (unsigned int)strlen(s);
    return len;
}
int main(void){
    char *name = "Michi";
    unsigned int len = size(name);
    int a = 20, b = 10, c = a - b;

    if(c < (signed int)len){
        printf("True: C(%d) < Len(%u)\n", c, len);
    }else{
        printf("False: C(%d) > Len(%u)\n", c, len);
    }
    return 0;
}
But I still need to cast strlen because of its return type, size_t, which I know is an unsigned type (typedef long unsigned int size_t;).
Finally I settled on another approach: creating my own function, which makes things easier and avoids possible future problems. I got:
#include <stdio.h>

long int stringLEN(char *s){
    int i = 0;
    long int len = 0;

    while (s[i] != '\0'){
        len++;
        i++;
    }
    return len;
}

long int foo(char *s){
    long int len = stringLEN(s);
    /* code here */
    return len;
}
int main(void){
    char *name = "Michi";
    long int len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%ld)\n", c, len);
    }else{
        printf("False: C(%d) > Len(%ld)\n", c, len);
    }
    return 0;
}
where no cast is needed anymore.
So my QUESTION is:
is this (for my case) a better approach?
If not, I need some explanation; my books (I have 3) don't explain these things in a way I can understand.
I only know that at some point a cast could become a big problem, somehow.
EDIT:
This code will also fail to compile when warnings are treated as errors:
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    int a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%d) < Len(%zu)\n", c, len);
    }else{
        printf("False: C(%d) > Len(%zu)\n", c, len);
    }
    return 0;
}
Error:
error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
But if I cast len, it works. I realized that if the size is bigger than what an int can hold, it will never fit.
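To see what that means in practice, here is a small demonstration (an illustration only: converting an out-of-range size_t to int is implementation-defined, but on typical platforms the result comes out negative):

#include <stdio.h>
#include <limits.h>

int main(void){
    size_t big = (size_t)INT_MAX + 2; /* a length that no int can represent */
    int truncated = (int)big;         /* the conversion -Wconversion warns about */

    printf("big = %zu, (int)big = %d\n", big, truncated);
    return 0;
}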

Digging through all the other answers, your true question seems to be how to deal with a situation like this:
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (slen > value) // -Wconversion warning on this line
        foo();
    else
        bar();
}
... where you can't change the type of either slen or value, because both are correct for the API they're receiving the result of.
The -Wconversion warning is trying to tell you something meaningful. Comparison of signed and unsigned integer types in C does something very strange, not what you would expect from the laws of arithmetic in ℤ; a naive comparison like what I wrote above can and has caused catastrophic bugs. But the cure is not casts or inventing your own strlen; the cure is to fix the comparison so it does what you expect from the laws of arithmetic. The principles for this are:
First check whether the signed quantity is negative. If so, treat it as smaller than the unsigned quantity.
Otherwise, cast the smaller type to the larger type before comparing them.
In this case, size_t is almost certain to be larger than, or the same size as, int, so you would write
#include <assert.h>
#include <limits.h>
#include <stdint.h>  // for SIZE_MAX
#include <string.h>
#include <libfoo.h>

extern void foo(void);
extern void bar(void);

// Code below is correct only if size_t is at least as large as int.
static_assert(SIZE_MAX >= INT_MAX, "size_t must be at least as wide as int");

void pick_foo_or_bar(const char *s)
{
    size_t slen = strlen(s);
    int value = libfoo_api_returning_an_int();

    if (value < 0 || (size_t)value < slen)
        foo();
    else
        bar();
}
The static_assert is present because, if I remember correctly, the C standard does not guarantee size_t being at least as large as unsigned int. I could, for instance, imagine an ABI for the 80286 where int was four bytes wide but size_t only two. In that situation you would need to do the casting the other way around:
void pick_foo_or_bar(unsigned short a, long b)
{
    if (b < 0 || b < (long)a)
        foo();
    else
        bar();
}
If you don't know which of the two types is bigger, or if you don't know which of them is signed, your only recourse in standard C is (u)intmax_t:
void pick_foo_or_bar(uid_t a, gid_t b)
{
    if (a < 0 && b < 0) {
        if ((intmax_t)a < (intmax_t)b)
            bar();
        else
            foo();
    } else if (a < 0) {
        bar();
    } else if (b < 0) {
        foo();
    } else {
        if ((uintmax_t)a < (uintmax_t)b)
            bar();
        else
            foo();
    }
}
... and, given the exceedingly unfortunate precedent set by C99 wrt long, there probably will come a day when (u)intmax_t is not the biggest integer type supported by the compiler, and then you're just hosed.

The length of a string can never be negative, whilst an integer could be - the warning arises because the range of values of size_t differs from that of int, and some positive values of size_t would be treated as negative if cast to an int. The better option is to have the return type of your function match: in this case, have foo return a size_t. You'll soon see that the datatype permeates most of the code, and it leaves some other oddities that can do odd things (size_t - size_t can underflow, as the sketch below shows...)
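A minimal demonstration of that underflow (the exact value printed depends on SIZE_MAX; on a typical 64-bit platform it is SIZE_MAX - 1):

#include <stdio.h>
#include <string.h>

int main(void){
    /* "abc" is shorter than "defgh": mathematically the difference is -2,
       but size_t arithmetic wraps around to a huge positive value */
    size_t diff = strlen("abc") - strlen("defgh");

    printf("%zu\n", diff);
    return 0;
}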

This will compile without warnings:
#include <stdio.h>
#include <string.h>

size_t foo(char *s){
    size_t len = strlen(s);
    /* code here */
    return len;
}

int main(void){
    char *name = "Michi";
    size_t len = foo(name);
    size_t a = 20, b = 10, c = a - b;

    if(c < len){
        printf("True: C(%zu) < Len(%zu)\n", c, len);
    } else {
        printf("False: C(%zu) > Len(%zu)\n", c, len);
    }
    return 0;
}
as is well explained in the answers and comments by @thomasdickey, @rolandshaw, @andreaghidini, @olaf, @juanchopanza and others.
Did you really make a better approach? No: why should a stringLEN function return values that can be negative? There is no such thing as a string with negative size.
The standard strlen function is already there, is more efficient, can deal with strings up to twice the maximum size stringLEN can handle, and has a more precise definition of the return type.
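To see that "twice the maximum size" point concretely, you can print the two limits side by side (the values assume a typical LP64 platform where both long and size_t are 64 bits wide):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void){
    /* stringLEN returns long int, so it tops out at LONG_MAX;
       strlen returns size_t, whose maximum SIZE_MAX is about twice that */
    printf("LONG_MAX = %ld\n", LONG_MAX);
    printf("SIZE_MAX = %zu\n", SIZE_MAX);
    return 0;
}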

There are 2 issues:
strlen() returns type size_t. size_t is some unsigned integer type, likely as wide as or wider than int. It is compiler/platform dependent.
Code needs to compare an int to a size_t. Since size_t is unsigned, to prevent a warning about a mixed signed/unsigned comparison, explicitly convert the int to an unsigned integer. To convert a non-negative int to an unsigned integer, cast to (unsigned).
To compare, test whether c is negative; if not, compare (unsigned)c directly to len. The compiler will convert types as needed and produce an arithmetically correct answer.
size_t len = strlen("SomeString");
int c = 20; // some int
if (c < 0 || (unsigned)c < len) puts("c less than len");
else puts("c >= len");

The normal way to solve this is to use variables typed size_t, and choose an appropriate format for printing them. Then no cast is needed. For printf, see these:
printf format specifiers for uint32_t and size_t
What's the correct way to use printf to print a size_t?
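For example, a size_t variable pairs with the C99 %zu specifier, so no cast is needed for printing either:

#include <stdio.h>
#include <string.h>

int main(void){
    size_t len = strlen("Michi");

    printf("len = %zu\n", len); /* %zu is the conversion specifier for size_t */
    return 0;
}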

I think this must vary from compiler to compiler, because I tried it on an online compiler and it didn't show any warning.

Related

Unsigned short int to Binary in C without using malloc() function?

I wrote a program that converts an unsigned int into binary. In my function I use the malloc() function. I was wondering if there is a way to do it without malloc().
#include <stdio.h>
#include <stdlib.h>

char *toBinary(unsigned n);

int main(void) {
    int n;

    printf("Enter an integer to convert: ");
    scanf("%d", &n);
    char *binary = toBinary(n);
    printf("%s", binary);
    return 0;
}

char *toBinary(unsigned n) {
    char *binary = (char*)malloc(sizeof(char) * 16);
    int j = 0;
    unsigned i;

    for (i = 1 << 16; i > 0; i = i / 2) {
        if(j == 8)
            binary[j++] = ' ';
        else
            binary[j++] = (n & i) ? '1' : '0';
    }
    binary[j] = '\0';
    return binary;
}
Pretty simple: Pass the pre-allocated buffer to the function:
typedef enum
{
    CE_NoError,
    CE_InsufficientMemory,
} ConversionError;

ConversionError toBinary(unsigned int n, size_t length, char binary[length])
{
    // check first if length suffices to hold all characters
    // plus the terminating null character
    // use the array passed in instead of the malloc'ed one...
}
Usage:
char binary[sizeof(unsigned int) * CHAR_BIT + /*1*/ 2]; // (!): intermediate space!

if(toBinary(theValue, sizeof(binary), binary) != CE_NoError)
{
    // appropriate error handling!
}
Instead of the enum you might return bool (you need to include stdbool.h for that) if you consider the enum overkill for just one single error type...
Side note: Replace unsigned int (as coming from your code) with unsigned short if you indeed want to convert the latter, as in your title (and as your for loop indicates).
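For illustration, here is one way the elided body might be filled in - a sketch only, not the answerer's exact implementation; it omits the intermediate space from the original code, so it needs only the bit count plus one byte:

#include <limits.h>
#include <stddef.h>

typedef enum
{
    CE_NoError,
    CE_InsufficientMemory,
} ConversionError;

ConversionError toBinary(unsigned int n, size_t length, char binary[length])
{
    size_t bits = sizeof n * CHAR_BIT;

    if(length < bits + 1) // all bits plus the terminating null character
        return CE_InsufficientMemory;

    size_t j = 0;
    for(unsigned int mask = 1u << (bits - 1); mask > 0; mask >>= 1)
        binary[j++] = (n & mask) ? '1' : '0';
    binary[j] = '\0';
    return CE_NoError;
}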

Conversion of string constant to numeric value using C

I have written a C program which uses two different algorithms to convert a string constant representing a numeric value to its integer value. For some reason, the first algorithm, atoi(), doesn't execute properly on large values, while the second algorithm, atoi_imp(), works fine. Is this an optimization issue or some other error? The problem is that the first function makes the program's process terminate with an error.
#include <stdio.h>
#include <string.h>

unsigned long long int atoi(const char[]);
unsigned long long int atoi_imp(const char[]);

int main(void) {
    printf("%llu\n", atoi("9417820179"));
    printf("%llu\n", atoi_imp("9417820179"));
    return 0;
}

unsigned long long int atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;

    for (i = strlen(str) - 1; i >= 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i - 1; ++j) {
            power *= 10;
        }
        num += (str[i] - '0') * power;
    }
    return num;
}

unsigned long long int atoi_imp(const char str[]) {
    unsigned long long int i, num = 0;

    for (i = 0; str[i] >= '0' && str[i] <= '9'; ++i) {
        num = num * 10 + (str[i] - '0');
    }
    return num;
}
atoi is part of the C standard library, with the signature int atoi(const char *);.
You are declaring that a function with that name exists, but giving it a different return type. Note that in C, the function name is the only thing that matters to the linker, and the toolchain can only trust what you tell it in the source code. If you lie to the compiler, like here, all bets are off.
You should select a different name for your own implementation to avoid issues.
As researched by @pmg, the C standard (C99 7.1.3) says that using names from the C standard library for your own global symbols (functions or global variables) is explicitly undefined behavior. Beware of nasal demons!
OK, there is at least one problem with your function atoi.
You are looping down on an unsigned value and checking whether it is greater than or equal to zero, which is always true: decrementing past zero wraps around (an underflow).
The easiest fix is index shifting, i.e.:
unsigned long long int my_atoi(const char str[]) {
    unsigned long long int i, j, power, num = 0;

    for (i = strlen(str); i != 0; --i) {
        power = 1;
        for (j = 0; j < strlen(str) - i; ++j) {
            power *= 10;
        }
        num += (str[i-1] - '0') * power;
    }
    return num;
}
Too late, but it may help. I did it for base 10; if you change the base, you need to take care about how to compute the digit 0 in *p - '0'.
I would use Horner's rule to compute the value.
#include <stdio.h>

int main(void)
{
    char *a = "5363", *p = a;
    unsigned int base = 10;
    unsigned long x = 0;

    while (*p) {
        x *= base;
        x += *p - '0';
        p++;
    }
    printf("%lu\n", x);
    return 0;
}
Your function has an infinite loop: as i is unsigned, i >= 0 is always true.
It can be improved in different ways:
you should compute the length of str just once. strlen() is not cheap, it must scan the string until it finds the null terminator. The compiler is not always capable of optimizing away redundant calls for the same argument.
power could be computed incrementally, avoiding the need for a nested loop.
you should not use the name atoi as it is a standard function in the C library. Unless you implement its specification exactly and correctly, you should use a different name.
Here is a corrected and improved version:
unsigned long long int atoi_power(const char str[]) {
    size_t i, len = strlen(str);
    unsigned long long int power = 1, num = 0;

    for (i = len; i-- > 0; ) {
        num += (str[i] - '0') * power;
        power *= 10;
    }
    return num;
}
Modified this way, the function should have performance similar to the atoi_imp version. Note however that they do not implement the same semantics: atoi_power must be given a string of digits, whereas atoi_imp tolerates trailing characters.
As a matter of fact, neither atoi_imp nor atoi_power implements the specification of atoi extended to handle larger unsigned integers:
atoi ignores any leading white-space characters,
atoi accepts an optional sign, either '+' or '-',
atoi consumes all following decimal digits; the behavior on overflow is undefined,
atoi ignores any trailing characters that are not decimal digits.
Given these semantics, the natural implementation of atoi is that of atoi_imp with extra tests, as sketched below. Note that even strtoull(), which you could use to implement your function, handles white space and an optional sign, although the conversion of negative values may give surprising results.
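A minimal sketch of atoi_imp with those extra tests added (extended to unsigned long long as in the question; overflow behavior is left undefined, just as for atoi itself):

#include <ctype.h>
#include <stddef.h>

unsigned long long int atoull_imp(const char str[]) {
    unsigned long long int num = 0;
    size_t i = 0;

    while (isspace((unsigned char)str[i]))
        i++;                          /* skip leading white space */
    if (str[i] == '+' || str[i] == '-')
        i++;                          /* optional sign; a true atoi would negate on '-' */
    for (; str[i] >= '0' && str[i] <= '9'; ++i)
        num = num * 10 + (unsigned)(str[i] - '0');
    return num;                       /* trailing non-digits are ignored */
}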

How is the conversion being done in this C code?

#include <stdio.h>
#include <string.h>

void printlength(char *s, char *t) {
    unsigned int c = 0;
    int len = ((strlen(s) - strlen(t)) > c) ? strlen(s) : strlen(t);
    printf("%d\n", len);
}

void main() {
    char *x = "abc";
    char *y = "defgh";
    printlength(x, y);
}
When I compile it, it gives 3, but I don't understand how the conversion is taking place here: (strlen(s) - strlen(t)) > c
This is very poor code: (strlen(s) - strlen(t)) is always >= 0 because it is unsigned math. The type returned by strlen() is size_t, some unsigned type. So unless the values are equal, the difference is always a positive number due to unsigned wrap-around.
Thus len is assigned strlen(s) whenever the lengths of s and t differ.
The better way to write similar code is to only add:
// ((strlen(s) - strlen(t)) > c)
(strlen(s) > (c + strlen(t))
Note: On rare platforms with SIZE_MAX <= INT_MAX, the difference can be negative, as the math is then done with the signed type int. Yet the comparison with c is unsigned, so it happens as unsigned, and a negative difference is "wrapped around" to a very large number greater than 0. @Paul Hankin

Casting from unsigned into signed char in C

I am converting an input raw pcm stream into mp3 using lame. The encoding function within that library returns mp3 encoded samples in an array of type unsigned char. This mp3-encoded stream now needs to be placed within an flv container which uses a function that writes encoded samples in char array. My problem is that I am passing the array from lame (of type unsigned char) into the flv library. The following piece of code (only symbolic) illustrates my problem:
/* cast from unsigned char to char. */
#include <stdio.h>
#include <stdlib.h>

void display(char *buff, int len) {
    int i = 0;

    for(i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = (unsigned char*) malloc(len * sizeof(unsigned char));
    int i = 0;

    for(i = 65; i < (len + 65); i++) {
        buff[i] = (unsigned char) i;
        printf("char = %c", (char) i);
    }

    printf("Displaying array in main.\n");
    for(i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, 'buff[i]');
    }

    printf("Displaying array in func.\n");
    display(buff, len);
    return 0;
}
My question(s):
1. Is the implicit type conversion in the code above (as demonstrated by passing buff into the function display) safe? Is some weird behaviour likely to occur?
2. Given that I have little option but to stick to the functions as they are, is there a "safe" way of converting an array of unsigned char into char?
The only problem with converting unsigned char * into char * (or vice versa) is that it's supposed to be an error. Fix it with a cast.
display((char *) buff, len);
Note: This cast is unnecessary:
printf("char = %c", (char) i);
This is fine:
printf("char = %c", i);
The %c formatter takes an int arg to begin with, since it is impossible to pass a char to printf() anyway (it will always get converted to int, or in an extremely unlikely case, unsigned int.)
You seem to worry a lot about type safety where there is no need for it. Since this is C and not C++, there is no strong typing system in place. Conversions from unsigned char to char are usually harmless, as long as the "sign bit" is never set. The key to avoiding problems is to actually understand them. The following problems/features exist in C:
The default char type has implementation-defined signedness. One should never make any assumptions of its signedness, nor use it in arithmetic of any kind, particularly not bit-wise operations. char should only be used for storing/printing ASCII letters. It should never be mixed with hex literals or there is a potential for subtle bugs.
The integer promotions in C implicitly promote all small integer types, among them char and unsigned char, to an integer type that can hold their result. This will in practice always be int.
Formally, pointer conversions between different types could be undefined behavior. But pointer conversions between unsigned char and char are in practice safe.
Character literals '\0' etc are of type int in C.
printf and similar functions default promote all character parameters to int.
You are also casting the void* result of malloc, which is completely pointless in C, and potentially harmful in older versions of the C standard that translated functions to "default int" if no function prototype was visible.
And then you have various weird logic-related bugs and bad practice, which I have fixed but won't comment in detail. Use this modified code:
#include <stdio.h>
#include <stdlib.h>

void display(const char *buff, int len) {
    for(int i = 0; i < len; i++) {
        printf("buff[%d] = %c\n", i, buff[i]);
    }
}

int main() {
    int len = 10;
    unsigned char* buff = malloc(len * sizeof(unsigned char));

    if(buff == NULL)
    {
        // error handling
    }

    char ch = 'A';
    for(int i = 0; i < len; i++)
    {
        buff[i] = (unsigned char)ch + i;
        printf("char = %c\n", buff[i]);
    }

    printf("\nDisplaying array in main.\n");
    for(int i = 0; i < len; i++) {
        printf("buff[%d] = %u\n", i, buff[i]);
    }

    printf("\nDisplaying array in func.\n");
    display((char*)buff, len);

    free(buff);
    return 0;
}
C/C++ casts from any integral type to any other same-or-larger integral type are guaranteed not to produce data loss. Casts between signed and unsigned types would generally create overflow and underflow hazards, but the buffer you're converting actually points to raw data whose type is really void*.

Is there a strtol equivalent that does not require a null-terminated string?

Is there a standard C function similar to strtol which will take a char* and a length for a non-null-terminated string?
I know that I could copy out the string into a null-terminated region, but for efficiency reasons that is undesirable.
No such function in the standard library. You will either have to use the temporary buffer method, or write your own function from scratch.
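The temporary-buffer method is short enough to sketch (a hypothetical helper, not a standard function; the fixed buffer size of 32 comfortably holds the longest possible long in base 10):

#include <stdlib.h>
#include <string.h>

long strtol_n(const char *s, size_t len, int base)
{
    char buf[32];

    if (len >= sizeof buf)
        len = sizeof buf - 1; /* clamp: a long never needs more characters */
    memcpy(buf, s, len);
    buf[len] = '\0';          /* now strtol can be used safely */
    return strtol(buf, NULL, base);
}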
To answer your question: no, there is no standard function, but it is simple enough to write your own:
#include <stdio.h>
#include <ctype.h>

int natoi(char *s, int n)
{
    int x = 0;

    /* check the length bound before reading the character, so we never
       look past the first n bytes */
    while(n-- > 0 && isdigit((unsigned char)s[0]))
    {
        x = x * 10 + (s[0] - '0');
        s++;
    }
    return x;
}

int main(int argc, char *argv[])
{
    int i;

    for(i = 1; i < argc; i++)
        printf("%d: %d\n", i, natoi(argv[i], 5));
    return 0;
}
strntol is probably what you're after... it's not standard C, though.
If you're that pressed for efficiency, you can probably justify the time to write and debug your own.
But: just do it with a copy; you probably have an upper bound for how long the string can be (a decimal numeral that fits in a long has a strict upper bound on its maximum length), so you can have a static buffer. Then profile your entire application, and see if the copying/conversion really is a bottleneck. If it really is, then you know you need to write your own.
Here's a rough (untested, browser-written) starting point:
#include <ctype.h>

long limited_strtol(const char *string, size_t len)
{
    long sign = 1;
    long value = 0;

    for(; len > 0 && *string == '-'; string++, len--)
        sign *= -1;

    /* string and len are already advanced in the for clause; advancing
       them again in the body would skip every other character */
    for(; len > 0 && isdigit((unsigned char)*string); string++, len--)
    {
        value *= 10;
        value += *string - '0';
    }
    return sign * value;
}
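A hypothetical call, reading only the first four characters of a longer digit string:

long v = limited_strtol("123456", 4); /* yields 1234; characters past len are never read */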
