Quoted initializer for unsigned char array in C

This seems to work in GCC and Visual C without comment:
static const unsigned char foo[] = "bar";
This is a salt being used in a unit test. There are other ways to do it, but this is simplest and involves the least casting down the line.
Will this cause trouble with other compilers?

This is safe, at least if you're using a conforming C compiler.
N1570 6.7.9 paragraph 14 says:
An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
The C90 standard has essentially the same wording, so you don't need to worry about older compilers.
The character types are char, signed char, and unsigned char.
Interestingly, there's no corresponding guarantee for pointer initialization, so this:
const char *ptr = "hello";
is safe, but this:
const unsigned char *uptr = "hello";
is not -- and there doesn't seem to be a simple workaround.
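As a rough sketch of the difference (the names salt, salt_ptr and salt_length are made up for illustration), the array form copies the literal's bytes, terminator included, while the pointer form only compiles cleanly with an explicit cast:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Array form: the bytes 'b', 'a', 'r', '\0' are copied into the array. */
static const unsigned char salt[] = "bar";

/* Pointer form: a cast is needed because char * and unsigned char *
   are not compatible pointer types. */
static const unsigned char *salt_ptr = (const unsigned char *)"bar";

/* The terminating null character is counted in the array size. */
static_assert(sizeof salt == 4, "three letters plus the terminator");

/* Returns the number of bytes before the terminator. */
size_t salt_length(void)
{
    return sizeof salt - 1;
}
```

The cast is not elegant, but it is about as simple as a workaround for the pointer case gets.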

Related

Legal to initialize uint8_t array with string literal? [duplicate]

This question already has an answer here:
Why is it ok to use a string literal to initialize an unsigned char array but not to initialize an unsigned char pointer?
(1 answer)
Closed 1 year ago.
Is it OK to initialize a uint8_t array from a string literal? Does it work as expected or does it mangle some bytes due to signed-unsigned conversion? (I want it to just stuff the literal's bits in there unchanged.) GCC doesn't complain with -Wall and it seems to work.
const uint8_t hello[] = "Hello World";
I am using an API that takes a string as uint8_t *. Right now I am using a cast, otherwise I would get a warning:
const char* hello = "Hello World\n";
HAL_UART_Transmit(uart, (uint8_t *)hello, 12, 50);
// HAL_UART_Transmit(uart, hello, 12, 50);
// would give a warning such as:
// pointer targets in passing argument 2 of 'HAL_UART_Transmit' differ in signedness [-Wpointer-sign]
On this platform, char is 8 bits and signed. Is it under that circumstance OK to use uint8_t instead of char? Please don't focus on the constness issue, the API should take const uint8_t * but doesn't. This API call is just the example that brought me to this question.
Annoyingly this question is now closed, I would like to answer it myself. Apologies for adding this info here, I don't have the permission to reopen.
All of the following work with gcc -Wall -pedantic, but the fourth warns about converting signed to unsigned. The bit pattern in memory will be identical, and if you cast such an object to (uint8_t *) it will have the same behavior. According to the marked duplicate, this is because you may assign string literals to any char array.
const char string1[] = "Hello";
const uint8_t string2[] = "Hello";
uint8_t string3[] = "Hello";
uint8_t* string4 = "Hello";
char* string5 = "Hello";
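Of course, the two pointer forms are not recommendable, since they give non-const access to a string literal and you shouldn't attempt to modify string literals; the third form is a writable array that holds its own copy, so modifying it is fine. In the concrete case above, you could either create a wrapper function/macro, or just leave the cast inside as a concession to the API and call it a day.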
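A minimal sketch of the wrapper idea follows. Note that fake_transmit is a hypothetical stand-in that only records its input; it is not the real HAL function, and send_text is an invented name. The point is just that the cast lives in one place:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an API that takes uint8_t * (NOT the real HAL
   call): it records what it was given so the result can be inspected. */
static uint8_t last_sent[32];
static uint16_t last_size;

void fake_transmit(uint8_t *data, uint16_t size)
{
    if (size > sizeof last_sent)
        size = sizeof last_sent;   /* truncate rather than overflow */
    memcpy(last_sent, data, size);
    last_size = size;
}

/* Wrapper that keeps the cast in one place so callers can pass plain
   strings. The cast also drops const, which the API forces on us. */
void send_text(const char *text)
{
    assert(text != NULL);
    fake_transmit((uint8_t *)text, (uint16_t)strlen(text));
}
```

Callers then write send_text("Hello World\n") and never see the cast or the hand-counted length.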
C 2018 6.7.9 14 tells us “An array of character type may be initialized by a character string literal or UTF–8 string literal…”
C 2018 6.2.5 15 tells us “The three types char, signed char, and unsigned char are collectively called the character types.”
C 2018 6.2.5 4 and 6.2.5 6 say there may be extended integer types.
There is no statement that any extended integer types are character types.
C 2018 7.20 4 tells us “For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name…” and 7.20.1 5 tells us “When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5…”
Therefore, a C implementation could provide an unsigned 8-bit type that is an extended integer type, not an unsigned char, and may define uint8_t to be this type, and then 6.7.9 14 does not tell us that an array of this type may be initialized by a character string literal.
If an implementation allows you to initialize an array of uint8_t with a string literal, then either it defines uint8_t to be unsigned char, or it defines uint8_t to be an extended integer type but allows you to initialize the array as an extension to the C standard. It would be up to the C implementation to define the behavior of that extension, but I would expect it to work just like initializing an array of character type.
(Conceivably, defining uint8_t to be an extended integer type and disallowing its treatment as a character type could be useful for distinguishing the character types, which are allowed to alias any objects, from pure integer types, which would not allow such aliasing. This might allow the compiler to perform additional optimizations, since it would know the aliasing could not occur, or possibly to diagnose certain errors.)
The elements of a string literal have type char (by C 2018 6.4.5 6). C 2018 6.7.9 14 tells us that “Successive bytes of the string literal… initialize the elements of the array.” Each byte should initialize an array element in the usual way, including conversion to the destination type per C 2018 6.7.9 11. For the string you show, "Hello World", the character values are all non-negative, so there is no issue in converting their char values to uint8_t. If you had negative characters in the string, they should be converted to uint8_t in the usual way.
(If you have octal or hexadecimal escape sequences that have values not represented in a char, there could be some language-lawyer weirdness in the initialization.)
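Here is a small sketch of what happens in practice with a byte above 127, assuming uint8_t is a typedef for unsigned char (true on the mainstream compilers I know of, but an assumption nonetheless). The array names and the same_bytes helper are invented for the example. This is the corner case the parenthetical above alludes to; GCC and Clang simply store the byte 0xE9 in both arrays:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The same literal stored in a char array and in a uint8_t array.
   0xE9 is negative when read through a signed 8-bit char. */
static const char    as_char[] = "caf\xE9";
static const uint8_t as_u8[]   = "caf\xE9";

static_assert(sizeof as_char == sizeof as_u8,
              "same length, terminator included");

/* Returns 1 if both arrays hold the same bit patterns. memcmp compares
   the bytes as unsigned char, so the signedness of char does not matter. */
int same_bytes(void)
{
    return memcmp(as_char, as_u8, sizeof as_u8) == 0;
}
```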

C programming preferring uint8 over char

The code I am handling has a lot of casts from uint8 to char, and the C library functions are then called on the cast values. I was trying to understand why the writer would prefer uint8 over char.
For example:
uint8 *my_string = "XYZ";
strlen((char*)my_string);
What happens to the \0, is it added when I cast?
What happens when I cast the other way around?
Is this a legit way to work, and why would anybody prefer working with uint8 over char?
The casts char <=> uint8 are fine. It is always allowed to access any defined memory as unsigned characters, including string literals, and then of course to cast a pointer that points to a string literal back to char *.
In
uint8 *my_string = "XYZ";
"XYZ" is an anonymous array of 4 chars - including the terminating zero. This decays into a pointer to the first character. This is then implicitly converted to uint8 * - strictly speaking, it should have an explicit cast though.
The problem with the type char is that the standard leaves it up to the implementation to define whether it is signed or unsigned. If there is lots of arithmetic with the characters/bytes, it might be beneficial to have them unsigned by default.
A particularly notorious example is the <ctype.h> with its is* character class functions - isspace, isalpha and the like. They require the characters as unsigned chars (converted to int)! A piece of code that does the equivalent of char c = something(); if (isspace(c)) { ... } is not portable and a compiler cannot even warn about this! If the char type is signed on the platform (default on x86!) and the character isn't ASCII (or, more properly, a member of the basic execution character set), then the behaviour is undefined - it would even abort on MSVC debug builds, but unfortunately just causes silent undefined behaviour (array access out of bounds) on glibc.
However, a compiler would be very loud about using unsigned char * or its alias as an argument to strlen, hence the cast.
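To illustrate the <ctype.h> point, here is a minimal sketch of the portable pattern: convert each char to unsigned char before handing it to isspace. The function name count_spaces is made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace characters in a string. Each char is converted to
   unsigned char first, so a negative char value never reaches isspace. */
size_t count_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (isspace((unsigned char)*s))
            ++n;
    }
    return n;
}
```

With data stored as unsigned bytes in the first place, the cast at every call site disappears, which may be one reason the original author preferred uint8.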

In C11, string literals as char[], unsigned char[], char* and unsigned char*

Usually a string literal has type const char[]. But when I treat it as another type, I get a strange result.
unsigned char *a = "\355\1\23";
With this, the compiler throws a warning saying "pointer targets in initialization differ in signedness", which is quite reasonable since sign information can be discarded.
But with the following
unsigned char b[] = "\355\1\23";
There's no warning at all. I think there should be a warning for the same reason above. How can this be possible?
FYI, I use GCC version 4.8.4.
The type of string literals in C is char[], which decays to char*. Note that C is different from C++, where they are of type const char[].
In the first example, you try to assign a char* to an unsigned char*. These are not compatible types, so you get a compiler diagnostic message.
In the second example, the following applies, C11 6.7.9/14:
An array of character type may be initialized by a character string literal or UTF−8 string
literal, optionally enclosed in braces. Successive bytes of the string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.
Meaning that the code is identical to this:
unsigned char b[] =
{
'\355',
'\1',
'\23',
'\0'
};
This may yield warnings too, but is valid code. C has lax type safety when it comes to assignment [1] between different integer types, but is much stricter when it comes to assignment between pointer types.
This is the same reason we can write unsigned int x=1; instead of unsigned int x=1u;.
As a side note, I have no idea what you wish to achieve with an octal escape sequence of value 355. Perhaps you meant to write "\35" "5\1\23"?
[1] The type rules of initialization are the same as for assignment. 6.5.16.1 "Simple assignment" applies.
The first is the initialization of a pointer, the target types of pointers must agree on signedness.
The second is the initialization of an array. The special rules for initialization with string literals have it that the value of each character of the literal is taken to initialize the individual elements of the array.
BTW, contrary to what you state, string literals are not const-qualified in C. You don't have the right to modify them, but this is not reflected in the type.
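As a quick sanity check of the "identical to a brace list" claim, here is a sketch that compares the two forms byte for byte. The names from_literal, from_braces and same_contents are invented, and the brace list uses plain octal integer constants rather than character constants to keep the initializers free of conversions:

```c
#include <assert.h>
#include <string.h>

/* String-literal form: each byte of the literal, plus the terminator,
   initializes one element. */
static unsigned char from_literal[] = "\355\1\23";

/* Element-by-element form with the same values written as octal integers. */
static unsigned char from_braces[] = { 0355, 01, 023, 0 };

static_assert(sizeof from_literal == sizeof from_braces,
              "both arrays have four elements");

/* Returns 1 if the two arrays hold the same bytes. */
int same_contents(void)
{
    return memcmp(from_literal, from_braces, sizeof from_braces) == 0;
}
```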

What is the type of string literal in C? [duplicate]

This question already has answers here:
What is the type of string literals in C and C++?
(4 answers)
Closed 7 years ago.
Is the type of a string, like "hello, world" a char * or const char *, as of C99? I know that in C++ it is the latter, but what about in C?
String literals in C are not pointers, they are arrays of chars. You can tell this by looking at sizeof("hello, world"), which is 13, because null terminator is included in the size of the literal.
C99 allows string literals to be assigned to char *, which is different from C++, which requires const char *.
String literals are of type char[N] in C. For example, "abc" is an array of 4 chars (including the NUL terminator).
The type of a string literal in C is char[]. This can directly be assigned to a char*. In C++, the type is const char[] as all constants are marked with const in C++.
A string literal is always an array of read-only characters, with the array of course including the string terminator. Like all arrays it decays to a pointer to the first element, and since the characters are read-only it is best treated as a pointer to const. The construct originated in C and was inherited by C++.
The thing is that C99 allows the weaker char * (without const) which C++ (with its stronger type system) does not allow. Some compilers may issue a warning if making a non-constant char * point to a string literal, but it's allowed. Trying to modify the string through the non-const char * of course leads to undefined behavior.
I don't have a copy of the C11 specification in front of me, but I don't think that C11 makes this stronger.
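The array-versus-pointer point can be checked by the compiler itself. The following is a sketch for a C11 compiler; the macro IS_CHAR_ARRAY_4 and the function abc_size are names invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof sees the array, not a pointer: 12 characters plus the terminator. */
static_assert(sizeof "hello, world" == 13, "terminator is counted");

/* Taking the address of a literal avoids array-to-pointer decay, so
   _Generic can report the array type: "abc" is char[4], not const char[4]. */
#define IS_CHAR_ARRAY_4(lit) _Generic(&(lit), char (*)[4]: 1, default: 0)

/* Returns the size in bytes of the literal "abc". */
size_t abc_size(void)
{
    return sizeof "abc";
}
```

If string literals were const char[4] in C, as they are in C++, the _Generic selection would fall through to the default branch.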

Difference between char array declaration forms

We had this question in a programming exam, and we are all debating the correct answer, so what do you think?
3.1 Which of the following is an incorrect string initialization?
(a) char plant[] = "Tree";
(b) char plant[] = {'T','R','E','E'};
(c) char plant[80] = "Tree";
(d) char plant[80] = {'T','R','E','E'};
(e) None of the above
thanks in advance :)
They're all syntactically valid, but I'm assuming what the question is leaning towards is that (b) will simply create a char [4] - that is, it will not be null terminated, whereas the other three will be.
The C99 and draft C11 standards define explicitly that a string is null-terminated: 7 Library 7.1.1 Definitions “A string is a contiguous sequence of characters terminated by and including the first null character.” The term “string” being so defined – and more than just a convention in the libraries – an “incorrect string initialization” (as referred to in the question) could be one that does not include a null character.
The C11 standard stipulates in 6.7.9 ¶22 “If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer.” C99 6.7.8 ¶22 says the same. This is the case in (b), which therefore is unterminated and incorrect:
char plant[] = {'T','R','E','E'};
6.7.9 / 6.7.8 ¶21 says that “If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration”; ¶10 says that such objects are filled with (various) zeroes; this implies that (c) and (d) are null-terminated:
char plant[80] = "Tree";
char plant[80] = {'T','R','E','E'};
6.7.9 / 6.7.8 ¶14 says “An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.” This implies that (a) is null-terminated:
char plant[] = "Tree";
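The four cases can be checked mechanically. In this sketch the arrays are renamed plant_a to plant_d so that they can coexist in one file, and is_terminated is a helper invented for the example:

```c
#include <assert.h>
#include <string.h>

static char plant_a[]   = "Tree";
static char plant_b[]   = {'T','R','E','E'};
static char plant_c[80] = "Tree";
static char plant_d[80] = {'T','R','E','E'};

static_assert(sizeof plant_a == 5, "(a) has room for the terminator");
static_assert(sizeof plant_b == 4, "(b) holds exactly four chars");
static_assert(sizeof plant_c == 80 && sizeof plant_d == 80,
              "(c) and (d) keep their declared size");

/* Returns 1 if a null character appears within the first n bytes of buf. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

Only plant_b fails the check, so passing it to strlen or printf("%s") would read past the end of the array.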

Resources