String literals for array

String literals for array - c

#include <stdio.h>
int main(void) {
char a[] = "125"; // (int)1, (int)2, (int)5. But array 'a' has a type char. So int is in char. ???
printf("%s", a);
}
In that code, each element of string literal has type int. But the array a has type char.
From C99 6.4.5 §2 (fragment):
The same considerations apply to each element of the sequence in a character string
literal or a wide string literal as if it were in an integer character constant or a wide
character constant
From C99 6.4.5 §5 (fragment):
For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence
I think the two are not compatible; it looks like a contradiction to me. What is wrong with my reasoning?

No, this is a string literal.
Quoting C11, chapter 6.4.5, String Literals:
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz".[...]
To elaborate, the acceptable syntax for a string literal is:
string-literal:
encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix:
u8
u
U
L
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except
the double-quote ", backslash \, or new-line character
escape-sequence
and then the "source character set" is described in chapter 5.2.1/P3:
Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
So, a construct like "123" is a string literal, not individual integers held by/in char.

char a[] = "125";
In that code, each element of string literal has type int. But the array a has type char.
No. The fact that a character looks like the digit 5 does not mean it has to be an int. Its type is determined by the context in which it appears and how it is declared.
In your case that 5 has type char because it is an element of the string literal.
Also note that a value of 5 can have any of several other types, such as unsigned int, unsigned short or double, so again you must look at how it is declared in the first place.

Related

sizeof operator in C behavior with strings [duplicate]

For the sizeof operator I am seeing the following results, and I am not able to understand the reason behind them.
What I understand is that the sizeof operator returns its result as a size_t.
Below are the results:
sizeof("6") -> 2
sizeof("a") -> 2
sizeof('a') -> 4
sizeof("something") -> 10
sizeof("some") -> 5

By definition (C11 3.6), 1 char requires 1 byte (may not be 1 octet in some exotic system)
"6" has type char[2], so 2 bytes
"a" has type char[2]
'a' has type int ==> in your system, int requires 4 bytes
"something" has type char[10]
"some" has type char[5]
Note that "a" and 'a' are very different things: "a" is an array of char with 2 elements; 'a' is an int value, very much like 42 or -1.

In an expression like this
sizeof("6") -> 2
a string literal is used as the operand.
String literals are character arrays that store a sequence of characters terminated with the zero character '\0'.
So for example the string literal "6" is stored in memory like
char literal_6[] = { '6', '\0' };
Or this declaration is equivalent to
char literal_6[2] = { '6', '\0' };
Note: for example the string literal "some" is stored in memory like a character array declared as
char literal_some[] = { 's', 'o', 'm', 'e', '\0' };
So the expression sizeof("6") is equivalent to the expression sizeof( char[2] ).
In this expression
sizeof('a') -> 4
the operand is the integer character constant 'a', which has type int.
So the expression sizeof('a') is equivalent to the expression sizeof( int ).
It is interesting to note that for example sizeof("something") is not equal to sizeof("something" + 0). In the last expression the character array that denotes the string literal is implicitly converted to pointer to its first element. So the last expression is equivalent to the expression sizeof( char * ).
Also note that if you have, for example,
int x = 10;
size_t n = sizeof( ++x );
then x will not be equal to 11 after the declaration of n because in this case the expression used as an operand of the sizeof operator is not evaluated. It is only the type of the expression that is important.
On the other hand if you have a variable length array then the operator sizeof will evaluate at run-time to determine its size. Here is a demonstrative program.
#include <stdio.h>
int main(void)
{
for ( int i = 1; i < 5; i++ )
{
int a[i];
printf( "sizeof( a[%d] ) = %zu\n", i, sizeof( a ) );
}
return 0;
}
The program output is
sizeof( a[1] ) = 4
sizeof( a[2] ) = 8
sizeof( a[3] ) = 12
sizeof( a[4] ) = 16

string manipulation with %s format

Could you explain the following output?
main()
{
char f[] = "qwertyuiopasd";
printf("%s\n", f + f[6] - f[8]);
printf("%s", f + f[4] - f[8]);
}
output:
uiopasd
yuiopasd
For example regarding the first printf:
f[8] should represent the char 'o'
f[6] should represent the char 'u'
%s format prints the string (printf("%s", f) is giving the whole "qwertyuiopasd")
So how does it come together, what is the byte manipulation here?

There are multiple problems in the code posted:
The missing return type for main is obsolete syntax. You should use int main().
The prototype for printf is not in scope when the calls are compiled. This has undefined behavior. You should include <stdio.h>.
The expression f + f[6] - f[8] has undefined behavior: addition is left associative, so f + f[6] - f[8] is evaluated as (f + f[6]) - f[8]. f[6], which is the letter u, is unlikely to have a value less than 14 (in ASCII, its value is 117), so f + f[6] points well beyond the end of the string and is an invalid pointer. Computing f + f[6] - f[8] therefore has undefined behavior, in spite of the fact that 'u' - 'o' has the value 6 in the ASCII character set. The expression should be changed to f + (f[6] - f[8]).
Assuming ASCII, the letters o, u and t have values 111, 117 and 116.
f + (f[6] - f[8]) is f + ('u' - 'o') which is f + (117 - 111) or f + 6.
f + 6 is the address of f[6], hence a pointer to the 7th character of the string "qwertyuiopasd". Printing this string produces uiopasd.

Assuming the characters follow the ASCII scheme, the values of the relevant characters are:
o (f[8]): 111
u (f[6]): 117
t (f[4]): 116
f designates the char array and is converted to a pointer to its first element. The first expression evaluates to f + 6, a pointer to the element at index 6 (the 7th character), and printing it outputs everything from that element up to the terminating \0.
Similarly, the second expression evaluates to f + 5, thus you get yuiopasd as output.
What does f + n mean?
You can apply the arithmetic operators ++, --, + and - to pointers. A pointer stores a memory address, and incrementing it advances the address by the size of the pointed-to type.
For example, if f points to address 1000 in an array of 4-byte int, then f + 1 points to address 1004, which is the next element in the array.

It is simple pointer arithmetic, which is easier to understand with this example:
#include <stdio.h>
int main(void)
{
    char f[] = "9876543210";
    /* the offset is parenthesized so that no out-of-range pointer is formed */
    printf("%s , f[6]=%d, f[8]=%d, f[6]-f[8]=%d, f + f[6] - f[8] = %s\n",f, f[6], f[8], f[6]-f[8], f + (f[6] - f[8]));
    return 0;
}
The result is :
9876543210 , f[6]=51, f[8]=49, f[6]-f[8]=2, f + f[6] - f[8] = 76543210
f[n] is the integer value of the element at index n of the array.
In this example the difference between the ASCII codes of the elements at indices 6 and 8 is 2.
When we add 2 to the char pointer, it refers to the element 2 chars ahead, which in our case is '7'.

This is all about pointer arithmetic. The expression f + f[6] - f[8] evaluates to a char* pointer (like its first operand, because an array name used in such an expression is converted to a pointer to its first element), and will expand to this:
f + (int)'u' - (int)'o'
(where 'u' and 'o' represent f[6] and f[8], respectively).
The values that represent the characters, 'u' and 'o', are (on almost all modern systems, which use the ASCII system), separated by 6, so the expression adds 6 to the f address and prints the string starting from its 7th element.
Similarly for the expression f + f[4] - f[8] - but here, the difference is only 5 ('t' - 'o').

What does "20"[1] do?

In a test exam, we were told to find the value of some expressions.
All but 1 were clear, which was "20"[1]. I thought it was the 1st index of the number, so 0, but testing with a computer it prints 48.
What exactly does that 'function' do?

It's not a function, it's just indexing an array.
"20" here is a character array, and we're taking the value at index 1 - which is '0' - the character '0'.
This is the same as
char chArr[] = "20"; // using a variable to hold the array
printf ("%d", chArr[1]); // use indexing on the variable, decimal value 48
printf ("%c", chArr[1]); // same as above, will print character representation, 0
The decimal value of '0' is 48, according to ASCII encoding, the most common encoding around these days.

Well, depending on your point of view it's either '0', 48, or 0x30.
#include <stdio.h>
int main()
{
printf("'%c' %d 0x%X\n", "20"[1], "20"[1], "20"[1]);
return 0;
}
The above prints
'0' 48 0x30

In this subscripting expression
"20"[1]
"20" is a string literal that has the type char[3]. Used in expressions the literal is converted to pointer to its first element.
So this expression
"20"[1]
yields the second element of the string literal that is '0'.
You can think of this expression as
char *p = "20";
char c = p[1];
48 is the ASCII value of the character '0'.
A more exotic way to write it is
1["20"]
which is equivalent to the previous expression.
From the C Standard (6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
Here is a demonstrative program
#include <stdio.h>
int main(void)
{
printf( "\"20\"[1] == '%c' and its ASCII value is %d\n", "20"[1], "20"[1] );
printf( "1[\"20\"] == '%c' and its ASCII value is %d\n", 1["20"], 1["20"] );
return 0;
}
Its output is
"20"[1] == '0' and its ASCII value is 48
1["20"] == '0' and its ASCII value is 48

How does printf(3+"excellent"+4) this line run?

I don't understand why the output is nt in this program.
Can anyone explain this program?
#include <stdio.h>
#include <stdlib.h>
int main(){
printf(3+"excellent"+4); //output is "nt"
return 0;
}

"excellent" is an array of type char[10], the elements of which are the 9 letters of the word and the terminating '\0'. And then, C11 6.3.2.1p3,
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. [...]
i.e. it is converted to a pointer to the first character of the string, (e), and then has the type char *.
Now we have two additions:
(3 + (char *)"excellent") + 4
The C standard says (simplified, C11 6.5.6p8) that when adding an integer and a pointer together, the result will be a pointer of the same type, and will be interpreted so that if the pointer p was pointing to element n of an array, then p + m will result in a pointer that will point to element n + m of the same array, or one past the end, or, if n + m is outside the bounds of the array or one past the end, the behaviour is undefined.
I.e. 3 + "excellent" will give a pointer that will point to the 2nd letter e of excellent. Now of course since the parenthesized expression has type char * and it points to the element 3 of the array, if we add 4 to it, we get a pointer that points to the element 7, i.e. 8th letter, the n.
<-------------- char [10] -------------->
+---+---+---+---+---+---+---+---+---+---+
| e | x | c | e | l | l | e | n | t | \0|
+---+---+---+---+---+---+---+---+---+---+
  ^           ^               ^
  |           |               |
  + first character, "excellent" after lvalue conversion
              |               |
              + 3 + "excellent"
                              |
                              + 3 + "excellent" + 4
Now finally, what will happen when we call printf giving such a pointer as an argument? printf will consider the argument as being a pointer to a first character of a null terminated string that is the format string. Other than special sequences that start with %, all characters are copied verbatim to the output until the terminating null is met.
Another way to look into these is to remember that
*(a + b)
is equal to
a[b] (or even b[a])
and since &*x is equivalent to x,
&*(a + b) == (a + b) == (b + a) == &a[b] == &b[a]
and we get that
3 + "excellent" + 4
equals
&"excellent"[3] + 4
which equals
&"excellent"[3 + 4]
i.e.
&"excellent"[7]

This
printf(3+"excellent"+4);
can be written in a slightly longer but much clearer way:
const char *str = "excellent";
const char *to_print = str + 3 + 4; // equivalent to &str[7] which points to 'n'
printf(to_print); // or printf("%s", to_print); which prints "nt"

It is because it prints everything after the first 7 characters. The additions tell it where to start printing. If you change it to printf(2+"excellent"+4), the offset is 6 and you get "ent".

What's the longest string that can be printed with "%1.17g" format for any double precision float?

I'm maintaining a C JSON library and I need to know the maximum number of characters sprintf will output with the "%1.17g" format string. Currently I'm allocating 1100 bytes (based on What is the maximum length in chars needed to represent any double value?), which seems quite wasteful. If I understand correctly, it should never be longer than 22 characters (1 for integer part, 1 for dot, 16 for mantissa, 4 for "e-XX"). However, problems with floating point numbers can be quite counterintuitive and I'm not sure whether I'm missing something. Is my reasoning correct?

Continuing from the comment,
The %1. (one before the '.') specifies the minimum field-width; it provides no limitation on the number of digits that can appear. If the number of digits exceeds the field-width, the field is expanded.
For g or G conversion specifiers the 17 specifies "the maximum number of significant digits". Further "Style e is used if the exponent from its conversion is less than -4 or greater than or equal to the precision."
e, E The double argument is rounded and converted in the style [-]d.ddde±dd where there is one digit before the decimal-point
character and the number of digits after it is equal to the precision;
if the precision is missing, it is taken as 6; if the precision is
zero, no decimal-point character appears. An E conversion uses the
letter 'E' (rather than 'e') to introduce the exponent. The
exponent always contains at least two digits; if the value is zero,
the exponent is 00.
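The maximum number of characters would then be at most:
'(+/-)' + 1 + '.' + 17 + 'e' + '(+/-)' + XXX + '\0' = 26-chars
(where XXX is a maximum of 308; since %g counts the leading digit among its 17 significant digits, only 16 digits follow the decimal point, so 26 is a safe upper bound rather than an exact maximum)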
For good measure a buffer of 32-chars should suffice. There is nothing wrong with an 1100-char buffer. I'd rather be 10,000 bytes too long, than 1-byte too short.

What's the longest string that can be printed with “%1.17g” format for any double
Using "%1.17g" prints the double using various styles:
// Large/small values in exponential notation
printf("%1.17g\n", -1.0e200/7);
printf("%1.17g\n", -1.0e-200/7);
printf("%1.17g\n", -1.0e-5/7);
-1.4285714285714286e+199
-1.4285714285714286e-201
-1.4285714285714286e-06
// middle values in fixed notation
printf("%1.17g\n", -1.0e0/7);
printf("%1.17g\n", -1.0e-2/7);
-0.14285714285714285
-0.0014285714285714286
// non-finite values
printf("%1.17g\n", -NAN);
printf("%1.17g\n", -INFINITY);
-nan /* this may be longer */
-inf
The longest apparent string size is 25 char:
sign  digit  point  fraction          e  sign  exponent  null
-     1      .      4285714285714286  e  +     199       \0
1     1      1      17-1              1  1     3         1
Why could this be longer?
C allows not-a-numbers to also include a payload, which may include many characters. (I doubt more than the payload written in decimal: 16 with binary64.)
The exponent may need more than 3 characters. (Perhaps a 4 or 5 digit exponent.)
double may require more than 17 digits to differentiate all double values. (Detectable with DBL_DECIMAL_DIG.)
The present locale may add extra characters for a double. (Not so likely.)
The leading 1 in "%1.17g" is the minimum number of characters to print. It serves scant purpose here.
Solution: estimate the longest buffer using generous considerations - and then double it.
#define G_SIZE (1 + 1 + 1 + DBL_DECIMAL_DIG-1 + 1 + 1 + 5 + 1)
char buf[G_SIZE * 2];
int cnt = snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, value);
if (cnt < 0 || (size_t)cnt >= sizeof buf) {
    unexpected_conversion_handler();
}
or use a variable length array and 2 calls to snprintf()
int cnt = snprintf(NULL, 0, "%.*g", DBL_DECIMAL_DIG, value);
char buf[cnt + 1];
snprintf(buf, sizeof buf , "%.*g", DBL_DECIMAL_DIG, value);
