Unknown escape sequence - c

I am trying to printf a string that shows a temperature table:
printf("TABLE 24A (20\°C)");
The degree sign is a constant I have defined as 0xDF, so the string looks like this: "TABLE 24A (20\xDF C)"
This works but looks wrong because of the space between the \xDF and the C.
If I remove the space, the compiler issues the warning "hex escape sequence out of range".
If I change the string to "TABLE 24A (20\xDF\C)", I get the correct result, but the compiler issues the warning "unknown escape sequence: '\C'".
Is there a way to get rid of the warnings and also lose the space between the two characters?

You can take advantage of the fact that consecutive string literals are automatically concatenated:
printf("TABLE 24A (20\xDF" "C)");
This prevents the parser from consuming more characters for the escape sequence than you want.
You could also pass in the character as a parameter and use the %c format specifier to print it:
printf("TABLE 24A (20%cC)", '\xDF');

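\x escape sequences consume as many adjacent hex digits as possible, so the C is being parsed as a hex digit.
With \x, you could combine two adjacent string literals:
printf("TABLE 24A (20\xDF""C)");
Or use a \unnnn universal character name, which is limited to exactly four hex digits:
printf("TABLE 24A (20\u00DFC)");
Note that \u00DF names the Unicode character U+00DF (ß) and is converted to the execution character set, so it does not necessarily produce the single byte 0xDF; with GCC's default UTF-8 execution character set it becomes the two bytes 0xC3 0x9F.
Or use an octal \nnn escape, which is limited to three digits (\337 is 0xDF):
printf("TABLE 24A (20\337C)");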

Related

Creating a C string parser

In C (and similar languages), a string is declared, for example, as "abc". Another example is "ab\"c". I have a file which contains such strings; that is, the file contents are "abc" or "ab\"c", etc. Any literal string that can be defined in a .c file can appear in the file I'm reading.
These strings can be malformed, e.g. "abc (no closing quote). What is the best way to write a parser that checks whether a string in the file is a valid C string literal? (That is, if I copy the file contents and paste them after char* str =, the resulting declaration should be accepted by the compiler at the top of a function.)
The strings are each on a separate line.
Alternatively, you can think of this as wanting to parse lines that declare string variables. Imagine I'm grepping a big file with char\* .* = (.*);$ and want to make sure the part captured by the parentheses will not cause compilation errors.
The grammar for C string literals is given in C 2018 6.4.5. Supposing you want to parse only plain strings, not those with encoding prefixes such as u in u"xyz", the grammar for a string-literal is " s-char-sequence_opt ", where the subscript "opt" means optional and s-char-sequence is one or more s-char tokens. An s-char is either any member of the source character set except ", \, or the new-line character, or an escape-sequence.
The basic source character set includes at least the 26 uppercase and 26 lowercase letters of the Latin alphabet, the ten decimal digits, space, horizontal tab, vertical tab, form feed, and these 29 graphic characters:
! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
However, a C implementation may include other characters in its source character set. Therefore, any character found in the string other than ", \, or the new-line character must be accepted as potentially valid in some C implementation.
An escape-sequence is defined in 6.4.4.4 1 to be one of:
\ followed by one of ', ", ?, \, a, b, f, n, r, t, or v,
\ followed by one to three octal digits, or
\x followed by one or more hexadecimal digits, or
a universal-character-name.
Paragraph 7 says:
Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.
A universal-character-name is defined in 6.4.3 to be \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits. Paragraph 2 limits these:
A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.
This part of the C grammar looks fairly simple to parse:
A string literal must start with a ".
If the next character is anything other than ", \, or a new-line character, then accept it.
If the next character is \ and it is followed by one of the single characters listed above, accept it and the following character.
If the next character is \ and it is followed by one to three octal digits, accept it and up to three octal digits.
If the next two characters are \x and are followed by a hexadecimal digit, accept them and all the hexadecimal digits that follow.
If the next two characters are \u and are followed by four hexadecimal digits, accept those six characters. However, if the value is one of those prohibited in the constraint above, this is not a valid C string literal.
If the next two characters are \U and are followed by eight hexadecimal digits, accept those ten characters. However, if the value is one of those prohibited in the constraint above, this is not a valid C string literal.
Repeat the above until the next character is not accepted.
If the next character is not ", this is not a valid C string literal.
If the next character is ", accept it.
If that is the end of the line read from the file, it is a valid C string literal. Otherwise, it is not.

Separating hexadecimal escape sequences in strings

Can a string constant like "foo" "\x01" "bar" be written as a single string literal (while keeping the hexadecimal notation)? With "foo\x01bar", the escape sequence seems to be interpreted as \x01ba, since I get the warning "hex escape sequence out of range".
"foo" "\x01" "bar" already behaves as a single string literal: adjacent string literals are concatenated into one during translation phase 6.
The C standard states that a hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence. Without the explicit concatenation (which is the common workaround for this problem), the compiler parses \x01ba, which is out of range for a char.
How about "foo\x01\142ar"? Is that cheating?
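Another solution is to simply write the escaped character in octal instead of hexadecimal:
"foo\1bar"
and there is no more ambiguity, since an octal escape has at most three digits and here stops at the b, which is not an octal digit.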

Hex value in C string

I need to prepare a constant array of ANSI C strings that contain bytes in the range 0x01 to 0x1A. I made a custom code page, so those values represent different characters (e.g. 0x09 represents Š). I'd like to initialise the array this way:
static const char* brands[] = {
"Škoda",
//etc...
};
How can I put 0x09 instead of Š in "Škoda"?
I recommend not relying on "\x09koda"; use octal escapes or separate strings instead. (It happens to work here because k is not a hexadecimal digit, but the next letter might be.)
The problem is that hexadecimal escape sequences are not limited in length, so if the next character is a hexadecimal digit, it becomes part of the escape. Octal escapes are limited to three digits. Alternatively, use separate strings: the compiler concatenates them, but the escape sequence cannot accidentally run too far.
// problematic
"\x09Czech"
^^^^^--- The escape sequence is \x09C, but \x09 was hoped for
// recommend octal
"\0111234"
^^^^--- The escape sequence is \011
// recommend spaced strings
"\x09" "Czech"
Very simple:
"Škoda" -> "\x09koda"
How can I put 0x09 instead of Š in "Škoda"?
"\x09koda"
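Have a look at escape sequences, e.g. \x09.
For a hex escape use \xnn (lowercase x, any number of hex digits), for octal \nnn (one to three digits), and for a Unicode universal character name \unnnn (four hex digits) or \Unnnnnnnn (eight hex digits).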

How to include a hex value in a string using sprintf

I want to include the value of i in hex format in C.
for(i=0;i<10;i++)
sprintf(s1, "DTLK\x%x\xFF\xFF\xFF\xFF\xFF\xFF",i);
but the compiler reports an error: \x used with no following hex digits
Can anyone suggest the proper way?
Supposing you don't literally want the text \x00..\x09 but the corresponding byte, you need
sprintf(s1, "DTLK%c\xFF\xFF\xFF\xFF\xFF\xFF",i);
Inserting \x%x works at the wrong level of abstraction: \x escapes are interpreted by the compiler when it translates the string literal, not by sprintf at run time.
If, on the other hand, you really want the literal hex characters in the output rather than the bytes they represent, the other answers might be more helpful.
You need to escape the backslash in front of the \x:
sprintf(s1, "DTLK\\x%x\xFF\xFF\xFF\xFF\xFF\xFF",i);
// ^------- Here
Depending on what output you would like to achieve, you may need to escape the remaining backslashes as well.
Currently, the snippet produces a sequence of six characters with the code 0xFF. If this is what you want, your code fragment is complete. If you would like to see a sequence of \xFF literals, i.e. a string that looks like DTLK\x5\xFF\xFF\xFF\xFF\xFF\xFF when i == 5, you need to escape all backslashes in the string:
sprintf(s1, "DTLK\\x%x\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF",i);
// ^ ^ ^ ^ ^ ^ ^
Finally, if you would like the value formatted as a two-digit hex code even when the value is less than 16, use the %02x conversion specification to tell sprintf that you want a leading zero.
\x expects a hex value, like \xC9.
If you want to include \x in your output, you need to escape the \ as \\:
sprintf(s1, "DTLK\\x%x\xFF\xFF\xFF\xFF\xFF\xFF",i);

Convert this kind of hex string to a NSData/NSString

I have this hex string:
\x5c30\x3032\x5f5c\x3337\x345c\x3334\x366f\x5c32\x3633\x5c30\x3136\x5c32\x3132\x5c32\x3234\x4e5c\x3236\x335c\x3231\x335c\x3337\x355c\x3335\x315c\x3232\x365c\x3337
How could I convert it to an NSString or NSData? I thought of using C methods, but I'm not experienced in C :(
Looks like Unicode characters (specifically, CJK ideographs) to me.
Use an NSScanner to scan the string. Scan up to a backslash, and add whatever you scanned to a mutable string. Then, scan the backslash and throw it away, and then scan the x and throw that away.
Then, scan four single characters, which will be the digits (NSScanner doesn't have a method to scan a single character, so you will need to get them yourself using characterAtIndex: and then adjust the scanner's scan location accordingly). Perform the appropriate conversion of the hexadecimal digit characters to numbers and the math to assemble a single number from them, and you will have the code point (character value) represented by the escape sequence. Add that single character to your string.
Repeat that until you run out of input string, and you will have converted the input string with all its escape sequences into a string with the unescaped characters.
