I call strlen() all over my project. Until now I compiled it without the -Wall compiler option, but since I started using -Wall I get a lot of compiler warnings. About 80% of them are the strlen char * vs const char * warning.
I know I could add a cast to every strlen() call. Is there any other way to suppress the following warning?
./Proj.c:3126: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
C:/staging/usr/include/string.h:397: note: expected 'const char *' but argument is of type 'unsigned char *'
strlen takes a const char* as its input.
Unfortunately the C standard states that signedness of char is down to the compiler and platform. Many programmers therefore opt to set the signedness of char explicitly using signed char or unsigned char.
But doing that causes warnings to be emitted wherever a plain char* is expected, as in the standard string functions.
Luckily, in the context of strlen, a C-style cast is safe: use strlen((const char*)...);
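There's always the option to do:
#include <string.h>

/* static, so the inline definition needs no separate external definition */
static inline size_t u_strlen(const unsigned char *array)
{
    return strlen((const char *)array);
}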
This way you don't have to add a conversion everywhere in your code.
The question remains, though: why are you using unsigned char? I suppose it's a byte array for data packets sent over a network; in that case you should carry the length in the protocol anyway.
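A minimal sketch of what carrying the length in the protocol could look like, so that no terminating zero (and no strlen()) is needed. The wire format here, a 2-byte big-endian length followed by the payload, and the name packet_payload_len are assumptions made up for illustration; nothing in the question specifies them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: a 2-byte big-endian payload length,
 * followed by the payload bytes. Returns the payload length, or
 * (size_t)-1 if the buffer is too short to hold what it claims. */
size_t packet_payload_len(const uint8_t *pkt, size_t pkt_size)
{
    if (pkt_size < 2)
        return (size_t)-1;

    size_t len = ((size_t)pkt[0] << 8) | pkt[1];
    if (len > pkt_size - 2)
        return (size_t)-1;

    return len;
}
```

With this kind of layout the payload may contain zero bytes freely, which a strlen()-based length could not handle.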
It is not a matter of char* vs const char*; that is not the problem being reported (because it is not a problem). The problem is the fact that you are using unsigned char*. Whether or not a plain char is signed or unsigned is implementation-dependent, so on some platforms char has the same range as unsigned char and on others it doesn't; either way, char and unsigned char remain distinct types, so the pointer types never agree exactly.
The best solution is to ensure type agreement by not qualifying your strings and string pointers as unsigned; it almost certainly serves no useful purpose. For strings and characters the distinction between signed and unsigned is irrelevant - that is only of interest when performing arithmetic and using char as a "small integer".
Most compilers support a command line switch to specify the default signedness of char; however I would not recommend that as a solution, nor would I recommend casting; correct type agreement should always be your first choice.
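As a rough illustration of type agreement: the strings stay plain char, and the conversion to unsigned char happens per value, only where arithmetic is done. The names label_length and label_hash, and the djb2-style hash itself, are made up for this sketch.

```c
#include <string.h>

/* Strings stay plain char, so strlen() needs no cast. */
size_t label_length(const char *label)
{
    return strlen(label);
}

/* Hypothetical hash (djb2-style): the conversion to unsigned char
 * happens per value, at the point where arithmetic is done. */
unsigned long label_hash(const char *label)
{
    unsigned long h = 5381;
    for (; *label != '\0'; label++)
        h = h * 33 + (unsigned char)*label;
    return h;
}
```

No pointer casts are needed anywhere, and the hash gives the same result whether plain char is signed or unsigned.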
I'm trying to create my own versions of C functions and when I got to memcpy and memset I assumed that I should cast the destination and source pointers to char *. However, I've seen many examples where the pointers were cast to unsigned char * instead. Why is that?
void *mem_cpy(void *dest, const void *src, size_t n) {
    if (dest == NULL || src == NULL)
        return NULL;
    int i = 0;
    char *dest_arr = (char *)dest;
    char *src_arr = (char *)src;
    while (i < n) {
        dest_arr[i] = src_arr[i];
        i++;
    }
    return dest;
}
It doesn't matter for this case, but a lot of folks working with raw bytes will prefer to explicitly specify unsigned char (or with stdint.h types, uint8_t) to avoid weirdness if they have to do math with the bytes. char has implementation-defined signedness, and that means, when the integer promotions & usual arithmetic conversions are applied, a char with the high bit set is treated as a negative number if signed, and a positive number if unsigned.
While neither behavior is necessarily wrong for a given problem, the fact that the behavior can change between compilers or even with different flags set on the same compiler, means you often need to be explicit about signedness, using either signed char or unsigned char as appropriate, and 99% of the time, the behaviors of unsigned char are what you want, so people tend to default to it even when it's not strictly required.
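A small sketch of the difference described above. The function names are invented for illustration, and the signed result assumes the usual two's complement behaviour of mainstream compilers.

```c
/* Interprets a raw byte as a value in 0..255, whatever the
 * signedness of plain char on this platform. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Interprets the same byte as a signed value; in practice -128..127
 * (assumes a two's complement implementation). */
int signed_byte_value(char c)
{
    return (signed char)c;
}
```

For a byte with the high bit set, such as 0xFF, the first function yields 255 and the second yields -1; being explicit about which one you want removes the dependence on compiler and flags.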
There's no particular reason in this specific case, it's mostly stylistic.
But in general it is always best to stick to unsigned arithmetic when dealing with raw data. That is: unsigned char or uint8_t.
The char type is problematic because it has implementation-defined signedness, and it is therefore avoided in such code. See the question "Is char signed or unsigned by default?".
NOTE: this is dangerous and poor style:
char *src_arr = (char *)src;
(And the cast swept the problem under the carpet.)
Since you correctly used "const correctness" for src, the correct type is const char *src_arr. I'd change the code to:
unsigned char *dest_arr = dest;
const unsigned char *src_arr = src;
A good rule of thumb for beginners is to never use a cast. I'm serious. Some 90% of all casts we see on SO in beginner-level programs are wrong, in one way or the other.
Btw (advanced topic) there's a reason why memcpy has the prototype as:
void *memcpy(void * restrict s1,
const void * restrict s2,
size_t n);
The restrict qualifier on the pointers tells the user of the function "hey, I'm counting on you not to pass two pointers to the same object or pointers that may overlap". Doing so would cause problems in various situations and for various targets, so this is a good idea.
It's much more likely that the user passes overlapping pointers than null pointers, so if you are going to have slow, superfluous error checking against NULL, you should also restrict-qualify the pointers.
If the user passes null pointers I'd just let the function crash, instead of slowing it down with extra branches that are pointless bloat in some 99% of all use cases.
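Putting the points of this answer together, the function from the question might end up looking something like the sketch below. This is one possible reading of the advice (unsigned char, no casts, size_t index, restrict, no NULL check), not a drop-in replacement for the library memcpy.

```c
#include <stddef.h>

/* Sketch of a byte-wise copy. As with memcpy(), the caller must not
 * pass overlapping regions or null pointers. */
void *mem_cpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;          /* no cast needed from void * */
    const unsigned char *s = src;     /* keeps the const qualifier */

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];

    return dest;
}
```

Using size_t for the index also avoids the signed/unsigned comparison that int i < n would trigger under -Wall -Wextra.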
Why ... unsigned char* instead of char*?
Short answer: because the behavior differs in certain operations when char is signed, and the C spec specifies unsigned char-like behavior for the str...() and mem...() functions.
When does it make a difference?
When a function (like memcmp(), strcmp(), etc.) compares for order and one byte is negative while the other is positive, the order of the two bytes differs. Example: -1 < 1, yet when viewed as unsigned char: 255 > 1.
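To make the ordering point concrete, here is a rough sketch of a memcmp()-style comparison that reads each byte as unsigned char. The name mem_cmp is invented, and real library implementations are far more optimized than this.

```c
#include <stddef.h>

/* Sketch of a memcmp()-style ordering: each byte is read as
 * unsigned char, so 0xFF sorts after 0x01. */
int mem_cmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;
    }
    return 0;
}
```

Had the pointers been declared as signed char *, the byte 0xFF would compare as -1 and sort before 0x01, reversing the result.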
When does it not make a difference?
When copying data and comparing for equality*1.
Non-2's complement
*1 One's complement and sign-magnitude encoding are expected to be dropped in the upcoming version C2x. Until then, those signed encodings support 2 zeroes. For str...() and mem...() functions, C specifies data access as unsigned char. This means only the +0 is a null character and order depends on pure binary, unsigned, encoding.
The code I am working on has a lot of casts from uint8 to char, with the C library functions then called on the results of those casts. I was trying to understand why the author would prefer uint8 over char.
For example:
uint8 *my_string = "XYZ";
strlen((char*)my_string);
What happens to the \0, is it added when I cast?
What happens when I cast the other way around?
Is this a legit way to work, and why would anybody prefer working with uint8 over char?
The casts char <=> uint8 are fine. It is always allowed to access any defined memory as unsigned characters, including string literals, and then of course to cast a pointer that points to a string literal back to char *.
In
uint8 *my_string = "XYZ";
"XYZ" is an anonymous array of 4 chars - including the terminating zero. This decays into a pointer to the first character. This is then implicitly converted to uint8 * - strictly speaking, it should have an explicit cast though.
The problem with the type char is that the standard leaves it up to the implementation to define whether it is signed or unsigned. If there is lots of arithmetic with the characters/bytes, it might be beneficial to have them unsigned by default.
A particularly notorious example is the <ctype.h> with its is* character class functions - isspace, isalpha and the like. They require the characters as unsigned chars (converted to int)! A piece of code that does the equivalent of char c = something(); if (isspace(c)) { ... } is not portable and a compiler cannot even warn about this! If the char type is signed on the platform (default on x86!) and the character isn't ASCII (or, more properly, a member of the basic execution character set), then the behaviour is undefined - it would even abort on MSVC debug builds, but unfortunately just causes silent undefined behaviour (array access out of bounds) on glibc.
However, a compiler would be very loud about using unsigned char * or its alias as an argument to strlen, hence the cast.
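For the <ctype.h> pitfall in particular, the usual remedy is to convert the character to unsigned char at the call. A minimal sketch, with count_spaces as an invented helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace characters in a string. The cast to
 * unsigned char keeps the argument to isspace() in the range
 * the <ctype.h> functions accept, even if plain char is signed. */
size_t count_spaces(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            count++;
    }
    return count;
}
```

The string itself stays plain char, so strlen() and friends still accept it without a cast; only the value handed to isspace() is converted.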
We are using char * and the lib we use is using const unsigned char *, so we convert to const unsigned char *
// lib
int asdf(const unsigned char *vv);
//ours
char *str = "somestring";
asdf((const unsigned char*)str);
Is it safe? any pitfall?
It is safe.
char *str = "somestring";
str points to a string literal. You can make str point to another string:
str = "some"; // right
but you can't change the string that str currently points to:
str[0]='q'; // wrong
so it is safe to convert it to a pointer to const, because the data must not be modified anyway.
If asdf() only reads the string, as printf() or puts() do, the const is not strictly needed, because nothing changes the string.
Using const is safer, though. When you implement asdf(), it makes sure you can't write wrong code like "str[0]='q';", because it won't compile.
Without the const, you won't find the error until you run the program.
If it's being treated as a string by that interface, there should be no problem at all. You don't even need to add the const in your cast if you don't want to - that part is automatic.
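Technically, it is safe to convert pointers from one object type to another as long as the result is suitably aligned for the target type, which is always the case for the character types (converting between function pointers and data pointers is not, unless your implementation provides an extension that allows that). Beyond that, it's only the dereferencing that is unsafe.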
But dereferencing a pointer to any signedness of char is always safe (from a type aliasing perspective, at least).
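As a small illustration of that aliasing allowance, the sketch below inspects the bytes of an arbitrary object through an unsigned char pointer. The helper name all_bytes_zero is made up for this example.

```c
#include <stddef.h>

/* Returns 1 if every byte of the object is zero. Reading any object
 * through an unsigned char pointer is permitted by the aliasing rules. */
int all_bytes_zero(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

The same access through, say, a short * would not be covered by this allowance unless the object really is a short.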
It probably won't break anything in the compiler if you pass a char* where an unsigned char* was expected by using a cast. The const part is unproblematic -- it's just indicating that the function won't modify the argument via its pointer.
In many cases it doesn't matter whether char values are treated as signed or unsigned -- it's usually only a problem when performing arithmetic or comparing the sizes of values. However, if the function is expressly defined to take an unsigned char*, I guess there's a chance that it really requires the input data to be unsigned, for some arithmetical reason. If you're treating your character data as signed elsewhere, then it's possible that there is an incompatibility between your data and the data expected by the function.
In many cases, however, developers write "unsigned" to mean "I will not be doing arithmetic on this data", so the signedness probably won't matter.
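To show where signedness could matter, here is a sketch in which the library function does arithmetic on the bytes. The body of asdf below is purely a hypothetical stand-in (the question does not say what the real function does), and checksum_of is an invented caller-side wrapper.

```c
/* Hypothetical stand-in for the library function from the question:
 * sums the bytes of a zero-terminated string as values 0..255. */
int asdf(const unsigned char *vv)
{
    int sum = 0;
    while (*vv != 0)
        sum += *vv++;
    return sum;
}

/* Caller side: the data is plain char, so the pointer is cast once,
 * at the library boundary. */
int checksum_of(const char *str)
{
    return asdf((const unsigned char *)str);
}
```

Because the function reads the bytes as unsigned char, a byte such as 0xFF contributes 255 to the sum regardless of how the caller's plain char is signed.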
On some embedded device, I have passed an unsigned char pointer to atoi without a cast.
unsigned char c[10]="12";
atoi(c);
Question: is it well defined?
I saw somewhere it is ok for string functions, but was not sure about atoi.
Edit: By the way, one of the answers below raises the concern that this might not be OK even for string functions such as strcpy. If I understood correctly, though, the author also meant that it can work in practice.
While I'm here: is the following assignment to an unsigned char pointer OK too? I ask because a tool I use complains about "Type mismatch (assignment) (ptrs to signed/unsigned)".
unsigned char *ptr = strtok(unscharbuff,"-");
// is assignment also ok to unsigned char?
No, it's not well defined. It's a constraint violation, requiring a compile-time diagnostic. In practice it's very very likely to work as you expect, but it's not guaranteed to do so, and IMHO it's poor style.
The atoi function is declared in <stdlib.h> as:
int atoi(const char *nptr);
You're passing an unsigned char* argument to a function that expects a const char* argument. The pointed-to types are not compatible, and there is no implicit conversion from one pointer type to the other. A conforming compiler may issue a warning (that counts as a diagnostic) and then proceed to generate an executable, but the behavior of that executable is undefined.
As of C99, a call to a function with no visible declaration is a constraint violation, so you can't get away with it by omitting the #include <stdlib.h>.
C (before C23) does still permit calls to functions with a visible declaration where the declaration is not a prototype (i.e., doesn't define the number or type(s) of the parameters). So, rather than the usual #include <stdlib.h>, you could add your own declaration:
int atoi();
which would permit calling it with an unsigned char* argument.
This will almost certainly "work", and it might be possible to construct an argument from the standard that its behavior is well defined. The char and unsigned char values of '1' and '2' are guaranteed to have the same representation.
But it's far easier to add the cast than to prove that it's not necessary -- or, better yet, to define c as an array of char rather than as an array of unsigned char, since it's intended to hold a string.
unsigned char *ptr = strtok(unscharbuff,"-");
This is also a constraint violation. There is no implicit conversion from unsigned char* to char* for the first argument in the strtok call, and there is no implicit conversion from char* to unsigned char* for the initialization of ptr.
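If the buffer really has to stay unsigned char, the explicit-cast route could look roughly like this. The function name first_field is invented for illustration; the casts are the only thing the sketch is meant to show.

```c
#include <stdlib.h>
#include <string.h>

/* Parses the first '-'-separated field of buf as a decimal integer.
 * Note that strtok() modifies buf. Returns 0 if there is no field. */
int first_field(unsigned char *buf)
{
    /* Explicit casts in both directions satisfy the prototypes. */
    unsigned char *tok = (unsigned char *)strtok((char *)buf, "-");
    if (tok == NULL)
        return 0;
    return atoi((const char *)tok);
}
```

Declaring the buffer as plain char in the first place would make both casts unnecessary.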
Yes, these will function perfectly fine. Your compiler settings will determine whether you get a warning regarding type. I usually compile with -Wall, to turn on all warnings, and then use explicit casts in the code for each and every case, so that I know I have carefully examined them. The end result is zero errors and zero warnings, and any change that triggers a warning in the future will really stand out, not get lost in 100 tolerated messages.
My default char type is unsigned, as set by the gcc option -funsigned-char, so arguably I can use "char" wherever I need "unsigned char" in the code. But I am getting a warning for conversions between char* and unsigned char* (or signed char*):
"error: pointer targets in passing argument 1 of 'test2' differ in signedness" .
How can I avoid the warning when I pass an unsigned char* variable to a char* parameter (knowing that my system has unsigned char as the default, as set by the compiler option)?
static void test2(char* a) // char is unsigned by default, as set by the -funsigned-char gcc option
{
}
void test1(void)
{
    // This passes, but if I change it to unsigned char (or 'signed char') it fails.
    // I don't want it to fail for "unsigned char c", since the default char is unsigned.
    char c = 65;
    test2(&c);
}
The switches -funsigned-char and -fsigned-char do not refer to char *.
You might use -Wno-pointer-sign to switch off the warning you receive.
Use a cast:
unsigned char c = 65; // weird magic :-(
test2((char *)(&c));
All char types are layout compatible, and casting their pointers does not constitute type punning or violate aliasing rules, so you can do this freely.
The proper solution is to pass the correct type of variable to the function, i.e. if the function expects plain char, then declare a plain char and take its address, and so on.
The C standard says that "char", "signed char", and "unsigned char" are different types. "char" must have the same behaviour as either "signed char", or "unsigned char", as determined by compiler switches, but you can't use them interchangeably. You should be writing code that works exactly the same regardless of whether you use -funsigned-char or not.
The suggestion by another poster to use a cast is not good. All that does is suppress the warning, it would be clearer to explicitly disable the warning (e.g. with a pragma, or turn the warning off globally in your makefile).
The cast doesn't fix the problem with the code, it just stops the compiler pointing it out. This is a slightly academic point, but on a non 2's complement system, signed chars may have trap representations (they are layout compatible for values 0 <= x <= CHAR_MAX but not other values). So the code could crash.
Based on the details you've given, in practical terms your best solution is probably just to disable the warning and live with the fact that the code is non-portable in this respect.
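For completeness, a sketch of disabling the warning locally with a pragma rather than globally. The #pragma GCC diagnostic lines are GCC-specific (Clang accepts them too); the function name_length is invented for illustration, and the call still has the portability caveats described above.

```c
#include <string.h>

/* GCC/Clang-specific: silence -Wpointer-sign for this function only.
 * The code is unchanged; only the diagnostic is suppressed. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpointer-sign"
size_t name_length(const unsigned char *name)
{
    return strlen(name);
}
#pragma GCC diagnostic pop
```

Keeping the push/pop pair tight around the affected code means the warning stays active everywhere else in the file.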
Finally I got an answer:
-Wpointer-sign is implied by -Wall and by -pedantic. To avoid the warning, use -Wno-pointer-sign