Here is a code fragment from "The C Programming Language" by Kernighan and Ritchie.
#include <stdio.h>

/* copy input to output; 2nd version */
main()
{
    int c;

    while ((c = getchar()) != EOF) {
        putchar(c);
    }
}
Justification for using int c instead of char c:
...We can't use char since c must be big enough to hold EOF in addition to any possible char.
I think using int instead of char is only justifiable if c is modified with unsigned, because a signed char won't be able to hold the value of EOF, which is -1. Yet when I wrote this program, char c was interpreted as signed char c, so I had no problem.
Were char variables previously unsigned by default? And if so, then why did they alter it?
I think using int instead of char is only justifiable if c is modified
with unsigned because a signed char won't be able to hold the value of
EOF which is -1.
Who says that EOF is -1? It is specified to be negative, but it doesn't have to be -1.
In any case, you're missing the point. Signedness notwithstanding, getchar() needs to return a type that can represent more values than char can, because it needs to provide, in one way or another, for every char value, plus at least one value that is distinguishable from all the others, for use as EOF.
Were char variables previously unsigned by default? And if so, then
why did they alter it?
No. But in C89 and pre-standard C, functions could be called without having first been declared, and the expected return type in such cases was int. This is among the reasons that so many of the standard library functions return int.
The C Standard does not define whether char is signed or unsigned; that's why we also have signed char and unsigned char. This has been the case since K&R C and is still the case in C18. But this is not really relevant to your actual question: we simply need to use int here because we need a type that can hold more values than char can, so that one of them is free to signal EOF.
Values and representations aside, the main reason we don’t use char to represent EOF is that it isn’t a character - it is a condition signaled by the input function when it reaches the end of the stream. It is logically a different entity from a character value.
Why int is needed has already been covered in the other answers. As for signedness: I believe unsigned char is the more common default across architectures, and there are more devices in existence with such architectures. But since one architecture (x86-16/32) has been popular among newbie programmers, that fact is often neglected. x86 is peculiar in that it has move-with-sign-extension and signed memory operands, which is why signed char makes more sense there, whereas on the majority of other architectures unsigned char is the more efficient choice.
Learning C and having some issues with pointers/arrays.
Using MPLAB X and a PIC24FJ128GB204 as target device, but I don't really think it matters for this question.
The answer might be obvious, but without much background in C (yet), it is difficult to find a similar question that I understand enough to draw conclusions.
I have written an I2C library with the following function:
int I2C1_Write(char DeviceAddress, unsigned char SubAddress, unsigned char *Payload, char ByteCnt){
    int PayloadByte = 0;
    int ReturnValue = 0;
    char SlaveWriteAddr;

    // send start
    if(I2C1_Start() < 0){
        I2C1_Bus_SetDirty;
        return I2C1_Err_CommunicationFail;
    }
    // address slave
    SlaveWriteAddr = (DeviceAddress << 1) & ~0x1; // shift left, AND with 0xFE to keep bit0 clear
    ReturnValue = I2C1_WriteSingleByte(SlaveWriteAddr);
    switch (ReturnValue){
        case I2C1_OK:
            break;
        case I2C1_Err_NAK:
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_BadAddress;
        default:
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_CommunicationFail;
    }
    // part deleted for brevity
    // and finally payload
    for(PayloadByte = 0; PayloadByte < ByteCnt; PayloadByte++){
        // send byte one by one
        if(I2C1_WriteSingleByte(Payload[PayloadByte]) != I2C1_ACK){
            I2C1_Stop();
            I2C1_Bus_SetDirty;
            return I2C1_Err_CommunicationFail;
        }
    }
    return I2C1_OK;
}
I want to call this function from another one, using a predefined const:
const unsigned char CMD_SingleShot[3] = {2, 0x2C, 0x06};
This has the length of the command as first byte, then the command bytes.
The calling function is:
int SHT31_GetData(unsigned char MeasurementData[]){
    // address device and start measurement
    if(I2C1_Write(SHT31_Address,
                  0xFF,
                  CMD_SingleShot[1], // this is where the error message is given
                  CMD_SingleShot[0])
       < 1){
        return -1;
    }
    // code omitted for brevity
    return 1;
}
The error message:
../Sensirion_SHT31.c:40:17: warning: passing argument 3 of 'I2C1_Write' makes pointer from integer without a cast
../I2C1.h:66:5: note: expected 'unsigned char *' but argument is of type 'unsigned char'
The problem is clearly
(unsigned char)CMD_SingleShot[1] where I try to give a pointer to the second byte of the unsigned char array.
I have tried:
reading up on pointers and arrays and trying to understand
searching for similar functions
given up on understanding and tried random things, hoping the error messages would lead me to the correct way of doing this. Things like:
CMD_SingleShot[1]
&CMD_SingleShot[1]
(unsigned char)CMD_SingleShot + 1
this just gave other error messages.
My questions:
given the I2C1_Write function as-is (expecting an unsigned char *) (for instance, if it were not my code and I couldn't change it), how would I pass a pointer to the second byte of the const unsigned char array? My understanding is that an array is a pointer, so I expected this to work.
since this is my code, is there a better way to do this altogether?
First, don't cast unless you know better than the compiler what is going on. Which, in your case, you don't. That is nothing to be ashamed of.
Doing &CMD_SingleShot[1] is a step in the right direction. The problem is that CMD_SingleShot[1] is of type const unsigned char and therefore taking the address of that gives you a pointer of type const unsigned char *. This cannot be passed to the Payload parameter since it expects a pointer of unsigned char *. Luckily you don't modify whatever Payload points to, so there is no reason for that to be non-const. Just change the definition of Payload to const unsigned char * and the compiler should be happy.
And by the way, in C, &Foo[n] is the same as Foo + n. Which one you write is a matter of taste.
Edit:
Sometimes you don't have access to the library source code, and because of bad library design you are forced to cast. In that case, it is up to you to get things right. The compiler will happily shoot you in the foot if you ask it to.
In your case, the correct cast would be (unsigned char *)&CMD_SingleShot[1] and NOT (unsigned char *)CMD_SingleShot[1]. The first case interprets a pointer of one type as a pointer of different type. The second case interprets an unsigned character as a pointer, which is very bad.
Passing the address of the second byte of your command is done with either
&CMD_SingleShot[1]
or
CMD_SingleShot+1
But then you will run into an invalid conversion error since your command is defined as const unsigned char and then &CMD_SingleShot[1] is of type const unsigned char* but your function expects unsigned char*.
What you can do is either change the argument of your function:
int I2C1_Write(char DeviceAddress, unsigned char SubAddress, const unsigned char *Payload, char ByteCnt)
or cast your passing argument:
I2C1_Write(SHT31_Address, 0xFF, (unsigned char*)&CMD_SingleShot[1], CMD_SingleShot[0])
In the latter case be aware that casting away const'ness might result in undefined behaviour when changing it afterwards.
The function call is mostly correct, but since the 3rd parameter of the function is a pointer, you must pass an address to an array accordingly, not a single character. Thus &CMD_SingleShot[1] rather than CMD_SingleShot[1].
if(I2C1_Write(SHT31_Address,
              0xFF,
              &CMD_SingleShot[1],
              CMD_SingleShot[0])
   < 1)
However when you do this, you claim that you get "discards qualifiers from pointer target" which is a remark about const correctness - apparently CMD_SingleShot is const (since it's a flash variable or some such?).
That compiler error in turn simply means that the function is incorrectly designed - an I2C write function should clearly not modify the data, just send it. So the most correct fix is to change the function to take const unsigned char *Payload. Study const correctness - if a function does not modify data passed to it by a pointer, then that pointer should be declared as const type*, "pointer to read-only data of type".
If it wouldn't be possible to change the function because you are stuck with an API written by someone else, then you'd have to copy the data into a read/write buffer before passing it to the function. "Casting away" const is almost never correct (though most often works in practice, but without guarantees).
Other concerns:
When programming C in general, and embedded C in particular, you should use stdint.h instead of the default types of C, that are problematic since they have variable sizes.
Never use char (without unsigned) for anything else but actual strings. It has implementation-defined signedness and is generally dangerous - never use it for storing raw data.
When programming an 8 bit MCU, always use uint8_t/int8_t when you know in advance that the variable won't hold any larger values than 8 bits. There are plenty of cases where the compiler simply can't optimize 16 bit values down to 8 bit.
Never use signed or potentially signed operands to the bitwise operators. Code such as (DeviceAddress << 1) & ~0x1 is wildly dangerous. Not only is DeviceAddress potentially signed and could end up negative, it gets implicitly promoted to int. Similarly, 0x1 is of type int and so on 2's complement PIC ~0x1 actually boils down to -2 which is not what you want.
Instead, try to u-suffix all integer constants, and study the implicit type promotion rules.
I am working on some embedded device which has SDK. It has a method like:
MessageBox(u8*, u8*); // u8 is typedefed unsigned char when I checked
But I have seen in their examples calling code like:
MessageBox("hi","hello");
passing char pointer without cast. Can this be well defined? I am asking because I ran some tool over the code, and it was complaining about above mismatch:
messageBox("Status", "Error calculating \rhash");
diy.c 89 Error 64: Type mismatch (arg. no. 1) (ptrs to signed/unsigned)
diy.c 89 Error 64: Type mismatch (arg. no. 2) (ptrs to signed/unsigned)
Sometimes I get different opinions on this, and that confuses me even more. So to sum up: by using their API the way described above, is this a problem? Will it crash the program?
And also it would be nice to hear what is the correct way then to pass string to SDK methods expecting unsigned char* without causing constraint violation?
It is a constraint violation, so technically it is not well defined, but in practice, it is not a problem. Yet you should cast these arguments to silence these warnings. An alternative to littering your code with ugly casts is to define an inline function:
static inline unsigned char *ucstr(const char *str) { return (unsigned char *)str; }
And use that function wherever you need to pass strings to the APIs that (mistakenly) take unsigned char * arguments:
messageBox(ucstr("hi"), ucstr("hello"));
This way you will not get warnings while keeping some type safety.
Also note that messageBox should take const char * arguments. This SDK uses questionable conventions.
The problem comes down to it being implementation-defined whether char is unsigned or signed.
Compilers for which there is no error will be those for which char is actually unsigned. Some of those (notably the ones that are actually C++ compilers, where char and unsigned char are distinct types) will issue a warning. With these compilers, converting the pointer to unsigned char * will be safe.
Compilers which report an error will be those for which char is actually signed. If the compiler (or host) uses an ASCII or similar character set, and the characters in the string are printable, then converting the string to unsigned char * (or, better, to const unsigned char * which avoids dropping constness from string literals) is technically safe. However, those conversions are potentially unsafe for implementations that use different character sets OR for strings that contain non-printable characters (e.g. values of type signed char that are negative, and values of unsigned char greater than 127). I say potentially unsafe, because what happens depends on what the called function does - for example does it check the values of individual characters? does it check the individual bits of individual characters in the string? The latter is, if the called function is well designed, one reason it will accept a pointer to unsigned char *.
What you need to do therefore comes down to what you can assume about the target machine, and its char and unsigned char types - and what the function is doing with its argument. The most general approach (in the sense that it works for all character sets, and regardless of whether char is signed or unsigned) is to create a helper function which copies the array of char to a different array of unsigned char. The working of that helper function will depend on how (and if) you need to handle the conversion of signed char values with values that are negative.
In the C library an implementation of memcpy might look like this:
#include <stddef.h> /* size_t */

void *memcpy(void *dest, const void *src, size_t n)
{
    char *dp = dest;
    const char *sp = src;

    while (n--)
        *dp++ = *sp++;

    return dest;
}
Nice, clean and type agnostic right? But when following a kernel tutorial, the prototypes look like this:
unsigned char *memcpy(unsigned char *dest, const unsigned char *src, int count);
I've attempted to implement it like this:

unsigned char *memcpy(unsigned char *dest, const unsigned char *src, int count)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    while (count--)
        *dp++ = *sp++;

    return dest;
}
But I'm quite wary of seeing unsigned char everywhere and potentially nasty bugs resulting from casts.
Should I attempt to use uint8_t and other variants wherever possible instead of unsigned TYPE?
Sometimes I see unsigned char * instead of const char*. Should this be considered a bug?
What you have seen in the C library implementation of memcpy conforms to the C standard, i.e. that is how the standard defines the memcpy interface. What you have seen in the kernel tutorial depends entirely on the tutorial writer and how he designed his kernel and its code. AFAIU, for showing how to write a kernel and how its internal components need to be glued together and built, you can avoid thinking too much about whether to use uint8_t or unsigned char (that kind of decision needs to be made based on how far you really want your kernel to expand); they are the same, but uint8_t helps the readability of your project.
And regarding the second point: if you are really confident that const char * should be used instead of unsigned char *, then yes, fix it. But also concentrate on how the kernel works and how memory/video/peripheral devices are set up and initialized.
Technically, as far as C99 is concerned, (u)int8_t must be available if the underlying machine has a primitive 8-bit integer (with no padding)... and not otherwise. For C99 the basic unit of memory is the char, which technically may have any number of bits. So, from that perspective, unsigned char is more correct than uint8_t. POSIX, on the other hand, has decided that the 8-bit byte has achieved tablet-of-stone status, so CHAR_BIT == 8, forever, and the difference between (u)int8_t and (unsigned/signed) char is academic.
The ghastly legacy of the ambiguous signed-ness of char, we just get to live with.
I like to typedef unsigned char as byte pretty early, and be done with it.
I note that memcmp() is defined to compare on the basis of unsigned char. So I can see some logic in treating all memxxx() as taking unsigned char.
I'm not sure I would describe memcpy(void* dst, ...) as "type agnostic"... what memcpy is doing is moving chars, but by declaring the arguments in this way the programmer is auto-magically relieved of casting to char*. Anyone who passes an int* and a count of ints, expecting memcpy to do the usual pointer arithmetic magic, is destined for a short, sharp shock ! So, in this case, if you are required to cast to unsigned char*, then I would argue that it is clearer and safer. (Everybody learns very quickly what memxxx() do, so the extra clarity would generally be seen as a pain in the hindquarters. I agree that casts in general should be treated as "taking off the seat belt and pressing the pedal to the metal", but not so much in this case.)
Clearly unsigned char* and const char* are quite different to each other (how different would depend on the signed-ness of char), but whether the appearance of one or the other is a bug or not would depend on the context.
So I understand the use of typecasting: making a variable of one type act as another. But every time I attempt to do so it prints a diamond lol?
#include <stdio.h>
#include <strings.h>
#include <windows.h>

void loginscreen(void)
{
    printf("\nWelcome to the login screen...\n");
    int num = 4;
    printf("%c", (char)num);
    getchar();
}
Also can I get an explanation of malloc and why and how it uses typecasting.
You are casting the number 4 to the character with ASCII code 4, which happens to be EOT (End of Transmission). This is a special character that signals the end of the input. In Unix-like systems it can be generated by pressing Ctrl+D (Ctrl+Z in Windows). As this is a non-printable character, your terminal is probably displaying it as '�', the replacement character used for an unknown or unrepresentable character.
Addressing your other question, malloc() basically asks the system to give you a chunk of memory. There are plenty of wonderful resources on the web where you can find very good explanations.
Casting (not "typecasting") doesn't make a variable of one type act as another; it converts a value of one type to another type (or perhaps to the same type).
(Pointer conversions can be used to reinterpret an object as an object of a different type. Your code doesn't do that.)
Some conversions are implicit; others are explicit. A cast is an operator consisting of a parenthesized type name; it specifies an explicit conversion. (There's no such thing as an implicit cast.)
In your example:
printf("%c", (char)num);
the value of num (which is of type int) is converted to type char. It's then immediately converted (promoted) back to type int, because that's the behavior when something of a type narrower than int is passed as an argument to a variadic function like printf. It would behave exactly the same way without the cast:
printf("%c\n", num);
It prints the character whose value is 4, which happens to be a non-printable control character.
You asked about malloc, but since there's no call to malloc in your code, that's (a) a separate question, and (b) it's rather vague. If you have a more specific question about malloc, you can post it separately. But first, I suggest reading section 7 of the comp.lang.c FAQ, which discusses memory allocation. (In particular, you should not cast the result of malloc; it's unnecessary and can mask errors in some cases.)
I am using gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
The man page for isalnum() says:
SYNOPSIS
#include <ctype.h>
int isalnum(int c);
However, it also says:
These functions check whether c, which must have the value of an
unsigned char or EOF, ...
I have found that isalnum() will blow up for very large positive (or negative) int values (but it handles all short int values).
Is the man page saying the int passed in must have a value of an unsigned char because the C library writers are reserving the right to implement isalnum() in a way that will not handle all int values without blowing up?
The C standard says as much...
In ISO/IEC 9899:1999 (the old C standard), it says:
§7.4 Character handling
The header <ctype.h> declares several functions useful for classifying and mapping
characters. In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
(I've left out a footnote.) Both C89 and C11 say very much the same thing.
One common implementation is to use an array offset by 1 — a variation on the theme of:
int _Ctype_bits[257] = { ... };
#define isalpha(c) (_Ctype_bits[(c)+1]&_ALPHA)
As long as c is in the range of integers that an unsigned char can store (and there are 8 bits per character, EOF is -1, and the initialization is correct), then this works beautifully. Note that the macro expansion only uses the argument once, which is another requirement of the standard. But if you pass random values outside the stipulated range, you access random memory (or, at the least, memory that is not initialized to contain the correct information).