Pass char* to method expecting unsigned char* - c

I am working on an embedded device which has an SDK. It has a method like:
MessageBox(u8*, u8*); // u8 is typedef'd unsigned char, as I checked
But I have seen calling code like this in their examples:
MessageBox("hi","hello");
passing char pointers without a cast. Can this be well defined? I am asking because I ran a tool over the code, and it complained about the above mismatch:
messageBox("Status", "Error calculating \rhash");
diy.c 89 Error 64: Type mismatch (arg. no. 1) (ptrs to signed/unsigned)
diy.c 89 Error 64: Type mismatch (arg. no. 2) (ptrs to signed/unsigned)
I keep getting different opinions on this, which confuses me even more. So to sum up: is using their API the way described above a problem? Will it crash the program?
It would also be nice to hear what the correct way is to pass a string to SDK methods expecting unsigned char* without causing a constraint violation.

It is a constraint violation, so technically it is not well defined, but in practice it is not a problem. Still, you should cast these arguments to silence the warnings. An alternative to littering your code with ugly casts is to define an inline function:
static inline unsigned char *ucstr(const char *str) { return (unsigned char *)str; }
And use that function wherever you need to pass strings to the APIs that (mistakenly) take unsigned char * arguments:
messageBox(ucstr("hi"), ucstr("hello"));
This way you will not get warnings while keeping some type safety.
Also note that messageBox should take const char * arguments. This SDK uses questionable conventions.
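If you would rather keep const char * at the call sites, a small wrapper is another option (a sketch; only the MessageBox prototype above is taken from the SDK):

static void message_box(const char *title, const char *text)
{
    /* Hypothetical wrapper: the signedness cast is confined to this
     * one place instead of being scattered across the code base. */
    MessageBox((unsigned char *)title, (unsigned char *)text);
}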

The problem comes down to it being implementation-defined whether char is unsigned or signed.
Compilers that report no error will be those where char is actually unsigned. Some of those (notably ones that are actually C++ compilers, where char and unsigned char are distinct types) will issue a warning. With these compilers, converting the pointer to unsigned char * will be safe.
Compilers that report an error will be those where char is actually signed. If the compiler (or host) uses ASCII or a similar character set, and the characters in the string are printable, then converting the string to unsigned char * (or, better, to const unsigned char *, which avoids dropping constness from string literals) is technically safe. However, those conversions are potentially unsafe for implementations that use different character sets, or for strings that contain non-printable characters (e.g. signed char values that are negative, or unsigned char values greater than 127). I say potentially unsafe because what happens depends on what the called function does: does it check the values of individual characters? Does it check the individual bits of individual characters in the string? If the called function is well designed, the latter is one reason it would accept a pointer of type unsigned char * in the first place.
What you need to do therefore comes down to what you can assume about the target machine and its char and unsigned char types, and what the function does with its argument. The most general approach (in the sense that it works for all character sets, regardless of whether char is signed or unsigned) is to write a helper function that copies the array of char to a separate array of unsigned char. How that helper works depends on how (and whether) you need to handle the conversion of negative signed char values.
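A minimal sketch of such a helper, assuming a straight value copy is acceptable (all names here are mine, not from any SDK):

#include <stddef.h>
#include <string.h>

/* Copy a NUL-terminated char string into a caller-supplied buffer of
 * unsigned char. Returns 0 on success, -1 if the buffer is too small.
 * The assignment converts each char to unsigned char, which is well
 * defined even for negative char values (modular conversion). */
static int str_to_uchar(unsigned char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src) + 1; /* include the terminator */
    size_t i;
    if (n > dstlen)
        return -1;
    for (i = 0; i < n; i++)
        dst[i] = (unsigned char)src[i];
    return 0;
}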

Related

Back then, were char variables in C declared as unsigned by default?

Here is a code fragment from "The C Programming Language" by Kernighan and Ritchie.
#include <stdio.h>

/* copy input to output; 2nd version */
main()
{
    int c;

    while ((c = getchar()) != EOF) {
        putchar(c);
    }
}
Justification for using int c instead of char c:
...We can't use char since c must be big enough to hold EOF in addition to any possible char.
I think using int instead of char is only justifiable if c is modified with unsigned, because a signed char won't be able to hold the value of EOF, which is -1. Yet when I wrote this program, char c was interpreted as signed char c, and I had no problem.
Were char variables previously unsigned by default? And if so, then why did they alter it?
I think using int instead of char is only justifiable if c is modified with unsigned because a signed char won't be able to hold the value of EOF which is -1.
Who says that EOF is -1? It is specified to be negative, but it doesn't have to be -1.
In any case, you're missing the point. Signedness notwithstanding, getchar() needs to return a type that can represent more values than char can, because it needs to provide, in one way or another, for every char value, plus at least one value that is distinguishable from all the others, for use as EOF.
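To see why, here is a sketch of what goes wrong when c is declared as char instead of int:

#include <stdio.h>

int main(void)
{
    char c; /* WRONG: should be int */
    /* If char is unsigned, c can never compare equal to EOF (a
     * negative int), so this loop never terminates. If char is
     * signed, an input byte of 0xFF is stored as -1 and becomes
     * indistinguishable from EOF, ending the loop too early. */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}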
Were char variables previously unsigned by default? And if so, then why did they alter it?
No. But in C89 and pre-standard C, functions could be called without having first been declared, and the expected return type in such cases was int. This is among the reasons that so many of the standard library functions return int.
The C Standard does not define whether char is signed or unsigned; that's why we also have signed char and unsigned char. This was the case in K&R C and is still the case in C18. But it is not really relevant to your actual question: we simply need to use int here because we need a type that can hold more values than char, so that one of them can be used to signal EOF.
Values and representations aside, the main reason we don't use char to represent EOF is that it isn't a character: it is a condition signaled by the input function when it reaches the end of the stream. It is logically a different entity from a character value.
Why int is discussed here.
As for char being unsigned: I believe it is more common among architectures, and more devices exist with such architectures. But since one architecture (x86-16/32) has been popular among newbie programmers, this is often neglected. x86 is peculiar in that it has move-with-sign-extension and signed memory operands, which is why signed char makes more sense there, whereas on the majority of other architectures unsigned char is the more efficient choice.

Passing a pointer to place in array

Learning C and having some issues with pointers/arrays.
Using MPLAB X and a PIC24FJ128GB204 as the target device, but I don't really think it matters for this question.
The answer might be obvious, but without much background in C (yet), it is difficult for me to find a similar question that I understand well enough to draw conclusions from.
I have written an I2C library with the following function:
int I2C1_Write(char DeviceAddress, unsigned char SubAddress, unsigned char *Payload, char ByteCnt){
    int PayloadByte = 0;
    int ReturnValue = 0;
    char SlaveWriteAddr;

    // send start
    if(I2C1_Start() < 0){
        I2C1_Bus_SetDirty;
        return I2C1_Err_CommunicationFail;
    }

    // address slave
    SlaveWriteAddr = (DeviceAddress << 1) & ~0x1; // shift left, AND with 0xFE to keep bit0 clear
    ReturnValue = I2C1_WriteSingleByte(SlaveWriteAddr);
    switch (ReturnValue){
        case I2C1_OK:
            break;
        case I2C1_Err_NAK:
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_BadAddress;
        default:
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_CommunicationFail;
    }

    // part deleted for brevity

    // and finally payload
    for(PayloadByte = 0; PayloadByte < ByteCnt; PayloadByte++){
        // send byte one by one
        if(I2C1_WriteSingleByte(Payload[PayloadByte]) != I2C1_ACK){
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_CommunicationFail;
        }
    }
    return I2C1_OK;
}
I want to call this function from another one, using a predefined const:
const unsigned char CMD_SingleShot[3] = {2, 0x2C, 0x06};
This has the length of the command as first byte, then the command bytes.
The calling function is:
int SHT31_GetData(unsigned char MeasurementData[]){
    // address device and start measurement
    if(I2C1_Write(SHT31_Address,
                  0xFF,
                  CMD_SingleShot[1], // this is where the error message is given
                  CMD_SingleShot[0])
            < 1){
        return -1;
    }
    // code omitted for brevity
    return 1;
}
The error message:
../Sensirion_SHT31.c:40:17: warning: passing argument 3 of 'I2C1_Write' makes pointer from integer without a cast
../I2C1.h:66:5: note: expected 'unsigned char *' but argument is of type 'unsigned char'
The problem is clearly (unsigned char)CMD_SingleShot[1], where I try to give a pointer to the second byte of the unsigned char array.
I have tried:
reading up on pointers and arrays and trying to understand
searching for similar functions
given up on understanding and tried random things, hoping the error messages would lead me to the correct way of doing this. Things like:
CMD_SingleShot[1]
&CMD_SingleShot[1]
(unsigned char)CMD_SingleShot + 1
These just gave other error messages.
My questions:
given the I2C1_Write function as is (expecting an unsigned char *) (for instance, if it were not my code and I couldn't change it), how would I pass a pointer to the second byte of the const unsigned char array? My understanding was that an array is a pointer, so I am confused about why this doesn't work.
since this is my code, is there a better way to do this altogether?
First, don't do casting unless you know better than the compiler what is going on. Which, in your case, you don't. This is nothing to be ashamed of.
Doing &CMD_SingleShot[1] is a step in the right direction. The problem is that CMD_SingleShot[1] is of type const unsigned char, and taking its address gives you a pointer of type const unsigned char *. This cannot be passed as the Payload parameter, which expects unsigned char *. Luckily you don't modify whatever Payload points to, so there is no reason for it to be non-const. Just change the declaration of Payload to const unsigned char * and the compiler should be happy.
And by the way, in C, &Foo[n] is the same as Foo + n. Which one you write is a matter of taste.
Edit:
Sometimes you don't have access to the library source code, and because of bad library design you are forced to cast. In that case, it is up to you to get things right. The compiler will happily shoot you in the foot if you ask it to.
In your case, the correct cast would be (unsigned char *)&CMD_SingleShot[1] and NOT (unsigned char *)CMD_SingleShot[1]. The first interprets a pointer of one type as a pointer of a different type. The second interprets an unsigned character value as a pointer, which is very bad.
Passing the address of the second byte of your command is done with either
&CMD_SingleShot[1]
or
CMD_SingleShot+1
But then you will run into an invalid conversion error, since your command is defined as const unsigned char: &CMD_SingleShot[1] is of type const unsigned char *, but your function expects unsigned char *.
What you can do is either change the argument of your function:
int I2C1_Write(char DeviceAddress, unsigned char SubAddress, const unsigned char *Payload, char ByteCnt)
or cast your passing argument:
I2C1_Write(SHT31_Address, 0xFF, (unsigned char*)&CMD_SingleShot[1], CMD_SingleShot[0])
In the latter case, be aware that casting away constness results in undefined behaviour if the data is modified afterwards.
The function call is mostly correct, but since the 3rd parameter of the function is a pointer, you must pass an address accordingly, not a single character. Thus &CMD_SingleShot[1] rather than CMD_SingleShot[1].
if(I2C1_Write(SHT31_Address,
              0xFF,
              &CMD_SingleShot[1],
              CMD_SingleShot[0])
        < 1)
However when you do this, you claim that you get "discards qualifiers from pointer target" which is a remark about const correctness - apparently CMD_SingleShot is const (since it's a flash variable or some such?).
That compiler error in turn simply means that the function is incorrectly designed - an I2C write function should clearly not modify the data, just send it. So the most correct fix is to change the function to take const unsigned char *Payload. Study const correctness - if a function does not modify data passed to it by a pointer, then that pointer should be declared as const type*, "pointer to read-only data of type".
If it wouldn't be possible to change the function because you are stuck with an API written by someone else, then you'd have to copy the data into a read/write buffer before passing it to the function. "Casting away" const is almost never correct (though most often works in practice, but without guarantees).
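A sketch of that copy-before-call approach (the wrapper name and buffer size are my own; the I2C1_Write prototype is the one from the question):

#include <string.h>
#include "I2C1.h" /* I2C1_Write prototype, per the question */

static unsigned char scratch[16]; /* assumed large enough for any command */

/* Hypothetical wrapper: copy the const command bytes into a writable
 * buffer so the unmodifiable API can be called without casting away
 * const. */
static int I2C1_Write_Const(char DeviceAddress, unsigned char SubAddress,
                            const unsigned char *Payload, char ByteCnt)
{
    memcpy(scratch, Payload, (size_t)(unsigned char)ByteCnt);
    return I2C1_Write(DeviceAddress, SubAddress, scratch, ByteCnt);
}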
Other concerns:
When programming C in general, and embedded C in particular, you should use stdint.h instead of the default types of C, which are problematic since they have variable sizes.
Never use char (without unsigned) for anything else but actual strings. It has implementation-defined signedness and is generally dangerous - never use it for storing raw data.
When programming an 8 bit MCU, always use uint8_t/int8_t when you know in advance that the variable won't hold any larger values than 8 bits. There are plenty of cases where the compiler simply can't optimize 16 bit values down to 8 bit.
Never use signed or potentially signed operands with the bitwise operators. Code such as (DeviceAddress << 1) & ~0x1 is wildly dangerous. Not only is DeviceAddress potentially signed and could end up negative, it is also implicitly promoted to int. Similarly, 0x1 is of type int, so on a 2's complement PIC ~0x1 actually boils down to -2, which is not what you want.
Instead, try to add a u suffix to all integer constants, and study the implicit type promotion rules, as in the sketch below.
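For instance, a minimal sketch of the address computation rewritten along those lines (the function name is mine):

#include <stdint.h>

/* Build the I2C slave write address using only unsigned arithmetic
 * and a u-suffixed constant, so no operand ends up as a negative
 * signed int after the integer promotions. */
static uint8_t slave_write_addr(uint8_t device_address)
{
    return (uint8_t)(((uint32_t)device_address << 1) & ~0x1u);
}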

Unsigned char plaguing my code

In the C library an implementation of memcpy might look like this:
#include <stddef.h> /* size_t */

void *memcpy(void *dest, const void *src, size_t n)
{
    char *dp = dest;
    const char *sp = src;

    while (n--)
        *dp++ = *sp++;

    return dest;
}
Nice, clean and type agnostic, right? But when following a kernel tutorial, the prototype looks like this:
unsigned char *memcpy(unsigned char *dest, const unsigned char *src, int count);
I've attempted to implement it like this:
unsigned char *memcpy(unsigned char *dest, const unsigned char *src, int count)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    while (count--)
        *dp++ = *sp++;

    return dest;
}
But I'm quite wary of seeing unsigned char everywhere and potentially nasty bugs resulting from casts.
Should I attempt to use uint8_t and other variants wherever possible instead of unsigned TYPE?
Sometimes I see unsigned char * instead of const char*. Should this be considered a bug?
What you have seen in the C library implementation of memcpy conforms to the C standard; that is how the standard defines the memcpy interface. What you have seen in the kernel tutorial depends entirely on the tutorial writer and how he designed his kernel and code. As far as showing how to write a kernel and how its internal components need to be glued together and built, you can avoid thinking too much about whether to use uint8_t or unsigned char (that kind of decision matters mostly if you really want your kernel to expand to a certain level); they are the same, but the choice affects the readability of your project.
And regarding the second point: if you are really confident that const char * should be used instead of unsigned char *, then yes, fix it. But also concentrate on how the kernel works and how memory/video/peripheral devices are set up, initialized, etc.
Technically, as far as C99 is concerned, (u)int8_t must be available if the underlying machine has a primitive 8-bit integer type (with no padding)... and not otherwise. For C99 the basic unit of memory is the char, which technically may have any number of bits. So, from that perspective, unsigned char is more correct than uint8_t. POSIX, on the other hand, has decided that the 8-bit byte has achieved tablet-of-stone status, so CHAR_BIT == 8, forever, and the difference between (u)int8_t and (unsigned/signed) char is academic.
The ghastly legacy of the ambiguous signed-ness of char, we just get to live with.
I like to typedef unsigned char byte pretty early, and have done with.
I note that memcmp() is defined to compare on the basis of unsigned char. So I can see some logic in treating all memxxx() as taking unsigned char.
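A small sketch of why that definition matters: memcmp compares bytes as unsigned char, so a byte of 0x80 compares greater than 0x7F even on platforms where plain char is signed and (char)0x80 would be negative:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char a[1] = { 0x80 };
    unsigned char b[1] = { 0x7F };
    /* Prints 1: 128 > 127 in the unsigned comparison that memcmp is
     * required to perform. */
    printf("%d\n", memcmp(a, b, 1) > 0);
    return 0;
}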
I'm not sure I would describe memcpy(void* dst, ...) as "type agnostic"... what memcpy does is move chars, but by declaring the arguments this way the programmer is auto-magically relieved of casting to char *. Anyone who passes an int * and a count of ints, expecting memcpy to do the usual pointer-arithmetic magic, is destined for a short, sharp shock! So, in this case, if you are required to cast to unsigned char *, then I would argue that it is clearer and safer. (Everybody learns very quickly what the memxxx() functions do, so the extra clarity would generally be seen as a pain in the hindquarters. I agree that casts in general should be treated as "taking off the seat belt and pressing the pedal to the metal", but not so much in this case.)
Clearly unsigned char * and const char * are quite different from each other (how different depends on the signedness of char), but whether the appearance of one or the other is a bug depends on the context.

Why cast is needed in printf?

To print a number of type off_t it was recommended to use the following piece of code:
off_t a;
printf("%llu\n", (unsigned long long)a);
Why is the format string not enough?
What would the problem be if the value were not cast?
The format string doesn't tell the compiler to perform a cast to unsigned long long, it just tells printf that it's going to receive an unsigned long long. If you pass in something that's not an unsigned long long (which off_t might not be), then printf will simply misinterpret it, with surprising results.
The reason for this is that the compiler doesn't have to know anything about format strings. A good compiler will give you a warning message if you write printf("%d", 3.0), but what can a compiler do if you write printf(s, 3.0), with s being a string determined dynamically at run-time?
Edited to add: As Keith Thompson points out in the comments below, there are many places where the compiler can perform this sort of implicit conversion. printf is rather exceptional, in being one case where it can't. But if you declare a function to accept an unsigned long long, then the compiler will perform the conversion:
#include <stdio.h>
#include <sys/types.h>

int print_llu(unsigned long long ull)
{
    return printf("%llu\n", ull); // O.K.; already converted
}

int main()
{
    off_t a;

    printf("%llu\n", a);                     // WRONG! Undefined behavior!
    printf("%llu\n", (unsigned long long)a); // O.K.; explicit conversion

    print_llu((unsigned long long)a);        // O.K.; explicit conversion
    print_llu(a);                            // O.K.; implicit conversion

    return 0;
}
The reason for this is that printf is declared as int printf(const char *format, ...), where the ... is a "variadic" or "variable-arguments" notation, telling the compiler that it can accept any number and types of arguments after the format. (Obviously printf can't really accept any number and types of arguments: it can only accept the number and types that you tell it to, using format. But the compiler doesn't know anything about that; it's left to the programmer to handle it.)
Even with ..., the compiler does do some implicit conversions, such as promoting char to int and float to double. But these conversions are not specific to printf, and they do not, and cannot, depend on the format string.
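A small illustration of those default argument promotions (my example, not part of the original answer):

#include <stdio.h>

int main(void)
{
    char c = 'A';
    float f = 1.5f;
    /* In a variadic call, c undergoes the default argument promotions
     * to int and f to double before printf ever sees them, so %d and
     * %f are the matching conversions here. */
    printf("%d %f\n", c, f);
    return 0;
}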
The problem is that you don't know how big an off_t is. It could be a 64-bit type or a 32-bit type (or perhaps something else). If you use %llu and do not pass an (unsigned) long long type, you get undefined behavior; in practice it might just print garbage.
Not knowing how big it is, the easy way out is to cast it to the biggest reasonable type your system supports, e.g. an unsigned long long. That way using %llu is safe, as printf will receive an unsigned long long type because of the cast.
(e.g. on Linux, the size of an off_t is 32 bits by default on a 32-bit machine, and 64 bits if you enable large file support via #define _FILE_OFFSET_BITS 64 before including the relevant system headers)
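If you are curious what you actually have, a quick check is (a sketch; off_t requires a POSIX system):

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    /* %zu matches size_t, which is what sizeof yields */
    printf("sizeof(off_t) = %zu\n", sizeof(off_t));
    return 0;
}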
The signature of printf looks like this:
int printf(const char *format, ...);
The vararg ... indicates that anything can follow, and by the rules of C you can pass anything to printf as long as you include a format string. C simply does not have any construct to describe restrictions on the types of the objects passed. This is why you must use casts, so that the objects passed have exactly the needed type.
This is typical of C: it walks a line between rigidity and trusting the programmer. An unrelated example is that you may use char * (without const) to refer to string literals, but if you modify them, your program may crash.
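For example (a sketch of that string-literal pitfall):

int main(void)
{
    char *s = "hello"; /* legal in C: no const required */
    /* s[0] = 'H'; */  /* undefined behavior if uncommented: string
                          literals may live in read-only memory */
    return 0;
}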

Using size_t for specifying the precision of a string in C's printf

I have a structure to represent strings in memory looking like this:
typedef struct {
    size_t l;
    char *s;
} str_t;
I believe using size_t makes sense for specifying the length of a char string. I'd also like to print this string using printf("%.*s\n", str.l, str.s). However, the * precision expects an int argument, not a size_t. I haven't been able to find anything relevant about this. Is there some way to use this structure correctly, without a cast to int in the printf() call?
printf("%.*s\n", (int)str.l, str.s)
// ^^^^^ use a type cast
Edit
OK, I didn't read the question properly. You don't want to use a type cast, but I think, in this case: tough.
Either that or simply use fwrite
fwrite(str.s, str.l, 1, stdout);
printf("\n");
You could do a macro
#define STR2(STR) (int const){ (STR).l }, (char const*const){ (STR).s }
and then use this as printf("%.*s\n", STR2(str)).
Beware that this evaluates STR twice, so be careful with side effects, but you probably knew that already.
Edit:
I am using compound literals so that these are implicit conversions. If things go wrong, there is a better chance that the compiler will warn you than with an explicit cast.
E.g. if STR has a field .l that is a pointer and you had only put a cast to int, all compilers would happily convert that pointer to int. Similarly, the .s field really has to correspond to a char * or something compatible, otherwise you'd see a warning or error.
There is no guarantee that a size_t is an int, or that its value can be represented within an int. This is just part of C's legacy of not defining the exact size of an int, coupled with concerns that size_t may need to address large memory areas (ones holding more than INT_MAX values).
The most common error concerning size_t is to assume that it is equivalent to unsigned int. Such old bugs were common, and from personal experience they make porting from a 32-bit to a 64-bit architecture a pain, as you need to undo the assumption.
At best, you can use a cast. If you really want to get rid of the cast, you could alternatively discard the use of size_t.
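If you keep size_t, a defensive variant of the cast (a sketch reusing the question's str_t) clamps the length so the conversion to int cannot overflow:

#include <limits.h>
#include <stdio.h>
#include <stddef.h>

typedef struct {
    size_t l;
    char *s;
} str_t;

/* Print at most INT_MAX bytes of the string; the precision for %.*s
 * must be an int, so clamp before converting. */
static void str_print(const str_t *str)
{
    size_t n = str->l;
    if (n > INT_MAX)
        n = INT_MAX;
    printf("%.*s\n", (int)n, str->s);
}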
