On some embedded device, I have passed an unsigned char pointer to atoi without a cast.
unsigned char c[10]="12";
atoi(c);
Question: is it well defined?
I saw somewhere it is ok for string functions, but was not sure about atoi.
Edit: By the way, some concerns have been expressed in one of the answers below that it might not be OK even for string functions such as strcpy -- though, if I understood correctly (?), the author also meant that in practice it can be OK.
Also, while I am here: is the following assignment to an unsigned char pointer OK too? I ask because a tool I used complains about it with "Type mismatch (assignment) (ptrs to signed/unsigned)":
unsigned char *ptr = strtok(unscharbuff,"-");
// is assignment also ok to unsigned char?
No, it's not well defined. It's a constraint violation, requiring a compile-time diagnostic. In practice it's very very likely to work as you expect, but it's not guaranteed to do so, and IMHO it's poor style.
The atoi function is declared in <stdlib.h> as:
int atoi(const char *nptr);
You're passing an unsigned char* argument to a function that expects a char* argument. The two types are not compatible, and there is no implicit conversion from one to the other. A conforming compiler may issue a warning (that counts as a diagnostic) and then proceed to generate an executable, but the behavior of that executable is undefined.
As of C99, a call to a function with no visible declaration is a constraint violation, so you can't get away with it by omitting the #include <stdlib.h>.
C does still permit calls to functions with a visible declaration where the declaration is not a prototype (i.e., doesn't specify the number or types of the parameters). So, rather than the usual #include <stdlib.h>, you could add your own declaration:
int atoi();
which would permit calling it with an unsigned char* argument.
This will almost certainly "work", and it might even be possible to construct an argument from the standard that its behavior is well defined: the char and unsigned char values of '1' and '2' are guaranteed to have the same representation.
But it's far easier to add the cast than to prove that it's not necessary -- or, better yet, to define c as an array of char rather than as an array of unsigned char, since it's intended to hold a string.
unsigned char *ptr = strtok(unscharbuff,"-");
This is also a constraint violation. There is no implicit conversion from unsigned char* to char* for the first argument in the strtok call, and there is no implicit conversion from char* to unsigned char* for the initialization of ptr.
Yes, these will function perfectly fine. Your compiler settings determine whether you get a warning about the type mismatch. I usually compile with -Wall, to turn on all warnings, and then add an explicit cast in the code for each and every case, so that I know I have carefully examined them. The end result is zero errors and zero warnings, and any change that triggers a warning in the future will really stand out, not get lost in 100 tolerated messages.
Related
The code I am handling has a lot of casts from uint8 to char, and the C library functions are then called on the results. I was trying to understand why the writer would prefer uint8 over char.
For example:
uint8 *my_string = "XYZ";
strlen((char*)my_string);
What happens to the \0, is it added when I cast?
What happens when I cast the other way around?
Is this a legit way to work, and why would anybody prefer working with uint8 over char?
The casts char <=> uint8 are fine. It is always allowed to access any defined memory as unsigned characters, including string literals, and then of course to cast a pointer that points to a string literal back to char *.
In
uint8 *my_string = "XYZ";
"XYZ" is an anonymous array of 4 chars - including the terminating zero. This decays into a pointer to the first character. This is then implicitly converted to uint8 * - strictly speaking, it should have an explicit cast though.
The problem with the type char is that the standard leaves it up to the implementation to define whether it is signed or unsigned. If there is lots of arithmetic with the characters/bytes, it might be beneficial to have them unsigned by default.
A particularly notorious example is the <ctype.h> with its is* character class functions - isspace, isalpha and the like. They require the characters as unsigned chars (converted to int)! A piece of code that does the equivalent of char c = something(); if (isspace(c)) { ... } is not portable and a compiler cannot even warn about this! If the char type is signed on the platform (default on x86!) and the character isn't ASCII (or, more properly, a member of the basic execution character set), then the behaviour is undefined - it would even abort on MSVC debug builds, but unfortunately just causes silent undefined behaviour (array access out of bounds) on glibc.
However, a compiler would be very loud about using unsigned char * or its alias as an argument to strlen, hence the cast.
We are using char * and the lib we use is using const unsigned char *, so we convert to const unsigned char *
// lib
int asdf(const unsigned char *vv);
//ours
char *str = "somestring";
asdf((const unsigned char*)str);
Is it safe? any pitfall?
It is safe.

char *str = "somestring";

Here str points to a string literal. You may change str to point to another string:

str = "some"; // fine

but you must not modify the string it currently points to:

str[0] = 'q'; // undefined behavior

so it is safe to convert the pointer to a const-qualified one.

If asdf() only reads the string, like printf() or puts(), you don't strictly need the const, because nothing modifies the string. But using const is safer: when implementing asdf(), it ensures you can't write buggy code like str[0] = 'q';, because it won't compile.

Without const, you would only find the error when running the program.
If it's being treated as a string by that interface, there should be no problem at all. You don't even need to add the const in your cast if you don't want to - that part is automatic.
Technically, it is always safe to convert pointers from one type to another (except between function pointers and data pointers, unless your implementation provides an extension that allows that). It's only the dereferencing that is unsafe.
But dereferencing a pointer to any signedness of char is always safe (from a type aliasing perspective, at least).
It probably won't break anything in the compiler if you pass a char* where an unsigned char* was expected by using a cast. The const part is unproblematic -- it's just indicating that the function won't modify the argument via its pointer.
In many cases it doesn't matter whether char values are treated as signed or unsigned -- it's usually only a problem when performing arithmetic or comparing the sizes of values. However, if the function is expressly defined to take an unsigned char*, I guess there's a chance that it really requires the input data to be unsigned, for some arithmetical reason. If you're treating your character data as signed elsewhere, then it's possible that there is an incompatibility between your data and the data expected by the function.
In many cases, however, developers write "unsigned" to mean "I will not be doing arithmetic on this data", so the signedness probably won't matter.
I use strlen() calls all over my project. Until now I compiled the project without the -Wall compiler option, but when I started using -Wall I got many compiler warnings; 80% of them are the strlen char * vs. const char * warning.
I am aware I could cast every strlen() call. Is there any other way to suppress the following warning?
./Proj.c:3126: warning: pointer targets in passing argument 1 of
'strlen' differ in signedness
C:/staging/usr/include/string.h:397: note: expected 'const char *' but
argument is of type 'unsigned char *'
strlen takes a const char* as its input.
Unfortunately the C standard states that signedness of char is down to the compiler and platform. Many programmers therefore opt to set the signedness of char explicitly using signed char or unsigned char.
But doing that will cause warnings to be emitted if char* has the other sign convention to what you expect.
Luckily in the context of strlen, taking a C-style cast is safe: use strlen((const char*)...);
There's always the option to do :
inline size_t u_strlen(const unsigned char * array)
{
return strlen((const char*)array);
}
This way you don't have to add a conversion everywhere in your code.
Although the question remains: why are you using unsigned char? I suppose it's a byte array for data packets over networking; in that case you should take care of the length in the protocol anyway.
It is not a matter of char* vs const char*; that is not the problem being reported (because it is not a problem). The problem is the fact that you are using unsigned char*. Whether or not a plain char is signed or unsigned is implementation-dependent, so on some platforms unsigned char* will behave the same as char* and on others it won't.
The best solution is ensure type agreement by not qualifying your strings and string pointers as unsigned; it almost certainly serves no useful purpose. For strings and characters the distinction between signed and unsigned is irrelevant - that is only of interest when performing arithmetic and using char as a "small integer".
Most compilers support a command line switch to specify the default signedness of char; however I would not recommend that as a solution, nor would I recommend casting; correct type agreement should always be your first choice.
Say I have some utf8 encoded string. Inside it words are delimited using ";".
But each character (except ";") inside this string has utf8 value >128.
Say I store this string inside unsigned char array:
unsigned char buff[]="someutf8string;separated;with;";
Is it safe to pass this buff to strtok function? (If I just want to extracts words using ";" symbol).
My concern is that strtok (and also strcpy) expect char pointers, but some bytes inside my string will have a value > 128.
So is this behaviour defined?
No, it is not safe -- but if it compiles it will almost certainly work as expected.
unsigned char buff[]="someutf8string;separated;with;";
This is fine; the standard specifically permits arrays of character type (including unsigned char) to be initialized with a string literal. Successive bytes of the string literal initialize the elements of the array.
strtok(buff, ";")
This is a constraint violation, requiring a compile-time diagnostic. (That's about as close as the C standard gets to saying that something is illegal.)
The first parameter of strtok is of type char*, but you're passing an argument of type unsigned char*. These two pointer types are not compatible, and there is no implicit conversion between them. A conforming compiler may reject your program if it contains a call like this (and, for example, gcc -std=c99 -pedantic-errors does reject it).
Many C compilers are somewhat lax about strict enforcement of the standard's requirements. In many cases, compilers issue warnings for code that contains constraint violations -- which is perfectly valid. But once a compiler has diagnosed a constraint violation and proceeded to generate an executable, the behavior of that executable is not defined by the C standard.
As far as I know, any actual compiler that doesn't reject this call will generate code that behaves just as you expect it to. The pointer types char* and unsigned char* almost certainly have the same representation and are passed the same way as arguments, and the types char and unsigned char are explicitly required to have the same representation for non-negative values. Even for values exceeding CHAR_MAX, like the ones you're using, a compiler would have to go out of its way to generate misbehaving code. You could have problems on a system that doesn't use 2's complement for signed integers, but you're not likely to encounter such a system.
If you add an explicit cast:
strtok((char*)buff, ";")
the cast removes the constraint violation and will probably silence any warning -- but strictly speaking, the behavior is still undefined.
In practice, though, most compilers try to treat char, signed char, and unsigned char almost interchangeably, partly to cater to code like yours, and partly because they'd have to go out of their way to do anything else.
According to the C11 Standard (ISO/IEC 9899:2011 §7.24.1 String Handling Conventions, ¶3, emphasis added):
For all functions in this subclause, each character shall be
interpreted as if it had the type unsigned char (and therefore every
possible object representation is valid and has a different value).
Note: this paragraph was not present in the older C90 standard.
So I do not see a problem.
Why is it that I can assign a string to a variable of type int? Eg. the following code compiles correctly:
int main(int argv, char** argc){
int a="Hello World";
printf(a);
}
Also, the program doesn't compile when I assign a string to a variable of a different type, namely double and char.
I suppose that what is actually going on is that the compiler executes int* a = "Hello World"; and when I write double a="Hello World";, it executes that line of code as it is.
Is this correct?
In fact, that assignment is a constraint violation, requiring a diagnostic (possibly just a warning) from any conforming C implementation. The C language standard does not define the behavior of the program.
EDIT : The constraint is in section 6.5.16.1 of the C99 standard, which describes the allowed operands for a simple assignment. The older C90 standard has essentially the same rules. Pre-ANSI C (as described in K&R1, published in 1978) did allow this particular kind of implicit conversion, but it's been invalid since the 1989 version of the language.
What probably happens if it does compile is that
int a="Hello World";
is treated as it if were
int a = (int)"Hello World";
The cast takes a pointer value and converts it to int. The meaning of such a conversion is implementation-defined; if char* and int have different sizes, it can lose information.
Some conversions may be done explicitly, for example between different arithmetic types. This one may not.
Your compiler should complain about this. Crank up the warning levels until it does. (Tell us what compiler you're using, and we can tell you how to do that.)
EDIT :
The printf call:
printf(a);
has undefined behavior in C90, and is a constraint violation, requiring a diagnostic, in C99, because you're calling a variadic function with no visible prototype. If you want to call printf, you must have a
#include <stdio.h>
(In some circumstances, the compiler won't tell you about this, but it's still incorrect.) And even given a visible declaration, the call is a constraint violation, since printf's first parameter is of type const char* and you're passing it an int.
And if your compiler doesn't complain about
double a="Hello World";
you should get a better compiler. That (probably) tries to convert a pointer value to type double, which doesn't make any sense at all.
"Hello World" is an array of characters ending with char '\0'
When you assign its value to an int a, you assign the address of the first character in the array to a. GCC is trying to be kind to you.
When you print it, then it goes to where a points and prints all the characters until it reaches char '\0'.
It will compile because (on a typical 32-bit system) int, int *, and char * are all 32 bits wide, so compilers traditionally tolerate the pointer-to-integer conversion -- but double is 64 bits and not an integer type at all (C defines no conversion between pointers and floating types), and char is only 8 bits.
When compiled, I get the following warnings:
[11:40pm][wlynch#wlynch /tmp] gcc -Wall foo.c -o foo
foo.c: In function ‘main’:
foo.c:4: warning: initialization makes integer from pointer without a cast
foo.c:5: warning: passing argument 1 of ‘printf’ makes pointer from integer without a cast
foo.c:5: warning: format not a string literal and no format arguments
foo.c:6: warning: control reaches end of non-void function
As you can see from the warnings, "Hello World" is a pointer, and it is being converted to an integer automatically.
The code you've given will not always work correctly though. A pointer is sometimes larger than an int. If it is, you could get a truncated pointer, and then a very odd fault when you attempt to use that value.
It produces a warning on compilers like GCC and clang.
warning: initialization makes integer from pointer without a cast [enabled by default]
The string literal "Hello World" has type char[12] in C (it is const-qualified only in C++); it decays to a char *, so you are assigning a pointer to an int (i.e., treating the address of the first character in the string as an int value). On my compilers, gcc 4.6.2 and clang-mac-lion, assigning the string to int, unsigned long long, or char all produce warnings, not errors.
This is not behavior to rely on, quite frankly. Not to mention, your printf(a); is also a dangerous use of printf.