According to the definition of printf, the first argument should be an array, i.e. char *, followed by an ellipsis (...), i.e. variable arguments after that. If I write:
printf(3+"helloWorld"); //Output is "loWorld"
According to the definition shouldn't it give an error?
Here is the definition of printf:
#include <libioP.h>
#include <stdarg.h>
#include <stdio.h>
#undef printf
/* Write formatted output to stdout from the format string FORMAT. */
/* VARARGS1 */
int __printf(const char *format, ...) {
va_list arg;
int done;
va_start(arg, format);
done = vfprintf(stdout, format, arg);
va_end (arg);
return done;
}
#undef _IO_printf
ldbl_strong_alias(__printf, printf);
/* This is for libg++. */
ldbl_strong_alias(__printf, _IO_printf);
This is not an error.
If you pass "helloWorld" to printf, the string literal is converted to a pointer to the first character.
If you pass 3+"helloWorld", you're adding 3 to a pointer to the first character, which results in a pointer to the 4th character. This is still a valid pointer to a string, it's just not the whole string that was defined.
3+"helloWorld" is of char * type (after conversion in call to printf). In C, the type of a string literal is char []. When passed as an argument to a function, char [] will convert to pointer to its first element (array to pointer conversion rule). Therefore, "helloWorld" will be converted to a pointer to the element h and 3+"helloWorld" will move the pointer to the 4th element of the array "helloWorld".
From Pointer Arithmetic:
If the pointer P points at an element of an array with index I, then
P+N and N+P are pointers that point at an element of the same array with index I+N
P-N is a pointer that points at an element of the same array with index I-N
The behavior is defined only if both the original pointer and the result pointer are pointing at elements of the same array or one past the end of that array. ....
The type of string literal is char[N], where N is size of string (including null terminator).
From C Standard#6.3.2.1p3 [emphasis mine]
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
So, in the expression
3+"helloWorld"
"helloWorld", which is of type char [11] (array of characters), is converted to a pointer to char that points to the initial element of the array object.
Which means, the expression is:
3 + P
where P is pointer to initial element of "helloWorld" string
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
  ^
  |
  P (pointer pointing to initial element of array)
When 3 is added to pointer P, the resulting pointer points to the 4th character:
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
              ^
              |
              P (after adding 3, the resulting pointer points to the 4th element of the array)
The pointer resulting from the expression 3+"helloWorld" is passed to printf(). Note that the first parameter of printf() is not an array but a pointer to a null-terminated string, and the expression 3+"helloWorld" yields a pointer to the 4th element of the "helloWorld" string. Hence, you get the output "loWorld".
The answers from dbush and haccks are concise and illuminating, and I upvoted the one from haccks in support of the bounty offered by dbush.
The only thing that I find left unsaid is that the way the question title is phrased makes me wonder if the OP would think that e.g. this also should produce an error:
char sometext[] = {'h', 'e', 'l', 'l', 'o', '\0'};
printf (sometext);
since there is no string literal involved at all. The OP needs to understand that one should never think that a function call that takes a char * argument can "only allow taking a string literal as [that] argument".
The answers from dbush and haccks hint at this by mentioning the conversion of a string literal to a char * (and how adding an integer to that evaluates), but I feel that it's worth pointing out explicitly that anything that is treated as a char * can be used, even things not converted from a string literal.
printf(3+"helloWorld"); //Output is "loWorld"
It will not give an error because a string in C is an array of characters, and an array name gives the base address of the array.
In the case of printf(3+"helloWorld");, 3+"helloWorld" gives the address of the fourth element of the array of characters, i.e. of the string.
This is still a valid pointer to a string, i.e. a char *.
printf only requires a char * as the first argument.
The first argument to printf is declared as const char *format. It means printf should be passed a pointer to char and the characters pointed to by this pointer will not be changed by printf. There are additional constraints on this first argument:
it should point to a proper C string, that is an array of characters terminated by a null byte.
it may contain conversion specifiers, which must be properly constructed and the corresponding arguments must be passed as extra arguments to printf, with the expected types and order as derived from the format string.
Passing a string constant such as "helloWorld" as the format argument is the most common way to invoke printf. String constants are arrays of char terminated by a null byte which should not be modified by the program. Passing them to functions expecting a pointer to char will cause a pointer to their first byte to be passed, as is the case for all arrays in C.
The expression "helloWorld" + 3 or 3 + "helloWorld" evaluates to a pointer to the 4th byte of the string. It is equivalent to the expression 3 + &("helloWorld"[0]), &("helloWorld"[3]) or simply &"helloWorld"[3]. As a matter of fact, it is also equivalent to &3["helloWorld"], but this last form is only used for pathological obfuscation.
printf does not use the bytes that precede the format argument, so passing 3 + "helloWorld" is equivalent to passing "loWorld" and produces the same output.
To be more precise, the first argument to the printf() function is not an array of chars but a pointer to the first char of such an array. The difference between them is like the difference between ByVal and ByRef in the VB world.
A pointer can be incremented and decremented using ++ and --, or by applying arithmetic operations (+ and -).
In your case you are passing a pointer to "helloWorld" incremented by three, so it points to the fourth element of the array of chars "helloWorld".
Let's simplify this in asm pseudocode:
MOV EAX, offset ("helloWorld")
ADD EAX, 3
PUSH EAX
CALL printf
You may think that 3+"hello world" concatenates 3 and "hello world", but in C concatenation is done differently. The simplest way to do it is sprintf(buff, "%d%s", 3, "hello world");
It's pretty strange, but not wrong. By doing 3 + you are moving your pointer to a different location.
The same thing works when you initialize a char *:
char *str1 = "Hello";
char *str2 = 2 + str1;
str2 now points to "llo".
Related
I got confused making a printing function.
void Printing(int* pi, char* pa)
{
printf("%d", *pi);
printf("%s", *pa);
}
Code above has an error in 2nd printf().
But code below doesn't have. It prints the integer and string well.
void Printing(int* pi, char* pa)
{
printf("%d", *pi);
printf("%s", pa);
}
So far, I gave variables to printf(). But I don't understand why I need to give the pointer to the 2nd printf().
In your code
printf("%s", *pa);
should be
printf("%s", pa);
as %s expects the starting address of a null-terminated character array (i.e., a pointer, not the char as you have supplied).
From C11, chapter 7.21.6.1, The fprintf function:
s
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. [...]
To add, *pa is same as pa[0], which is of type char. To print that, you'd need to use %c conversion specifier.
But I don't understand why I need to give the pointer to the 2nd printf().
Because strings work a little weirdly in C. Technically, there is no type for strings in C. So const char */char * is used for strings. The way this works is that the pointer points to the beginning of the string, and the string ends with a NUL character '\0'. To visualize it, say you call Printing with Printing(0, "Hello");, you pass a pointer to the beginning of a string literal which looks like this in memory:
+---+---+---+---+---+---+
| H | e | l | l | o |END|
+---+---+---+---+---+---+
And the pointer you pass points to the first character, H. If you understand this, you will understand why it needs a pointer. If you dereference it, you will only give the first character H, so it won't be able to print the whole string.
I am a beginner C programmer and I am having difficulties grasping the concept of pointers. My question is why does the program require char *lowercase to run normally and why, if I remove the *, it breaks the program?
char *lowercase(char a[]){
int c = 0;
char *lowercase_string = malloc(300);
strcpy(lowercase_string,a);
while (lowercase_string[c] != '\0') {
if( lowercase_string[c] >= 'A' && lowercase_string[c] <= 'Z' ){
lowercase_string[c] = lowercase_string[c]+32;
}
c++;
}
return lowercase_string;
}
In C, strings are just contiguous chunks of characters, ending with a null byte (the character with a value of 0, denoted by '\0' or '\x00'). To represent a string, a pointer to the first element is used, which is what the star means after char. When you return a pointer, you return where the string is, and people can use that information to get the whole string just by iterating (by adding to the pointer/looking past it) until they find a null byte. If you just return char, you only can return one character of the string, which is nowhere close to the full string.
As written in previous answers, the char pointer points to the first letter of the string, which gives you the ability to scan the whole string (moving 1 char at a time).
Additionally, since your string is allocated on the heap, it must be accessed through a pointer, as the allocated block of memory is nameless.
The root of the issue is how C treats arrays (not just arrays of character type, but any type).
Arrays are not pointers - they are contiguous sequences of objects of some type. No pointer is part of the array object itself. When you declare an array like int arr[5];, you get something like this in memory:
+–––+
arr: | | arr[0]
+–––+
| | arr[1]
+–––+
...
+–––+
| | arr[4]
+–––+
However, unless it is the operand of the sizeof or unary & operators or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array.
When you pass an array argument to a function, what the function actually receives is a pointer to the first element. When you attempt to return an array from a function, that array expression will be converted to a pointer (more on this later). In fact, you cannot declare a function to return an array type - something like
int foo( void )[10];
is not allowed. You can return pointers to arrays:
int (*foo( void ))[10];
but not arrays directly. This is why the *alloc functions return pointers instead of array types.
This is also why returning non-static local arrays from a function is a problem - you’re not returning the value of the array (that is, a copy of the array’s contents), you’re returning its address. After the function returns, though, that array no longer exists and that pointer value is invalid. That’s why you need to use malloc to allocate storage that will hang around after the function returns.
Strings are sequences of character values including a zero-valued terminator. The string "hello" is represented by the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings (including string literals) are stored in arrays of character type (char for ASCII, EBCDIC, and UTF-8 encodings, wchar_t for "wide" encodings like UTF-16).
So, that’s all background. In your specific case, you need to declare lowercase as char * because it is returning a value of that type (lowercase_string). It breaks when you leave the * off because you’re telling the rest of the program it returns a single character value, not a pointer, and the rest of the program is expecting it to return a pointer.
Your function returns a pointer, so you must declare it with the return type char *.
Also, in C a char array is treated as a char pointer in most expressions, which is why the parameter char a[] accepts a string.
Please take a look at this code.
#include <stdio.h>
int main()
{
char *p;
p = "%d";
p++;
p++;
printf(p-2,23);
return 0;
}
I have the following questions
1) How can a pointer to a character data type can hold a string data type?
2) What happens when p is incremented twice?
3) How can the printf()can print a string when no apparent quotation marks are used?
"How can a pointer to a character data type can hold a string data type?" Well, it's partly true that in C, type 'pointer to char' is the string type. Any function that operates on strings (including printf) will be found to accept these strings via parameters of type char *.
"How can printf() print a string when no apparent quotation marks are used?" There's no rule that says you need quotation marks to have a string! That thing with quotation marks is a string constant or string literal, and it's one way to get a string into your program, but it's not at all the only way. There are lots of ways to construct (and manipulate, and modify) strings that don't involve any quotation marks at all.
Let's draw some pictures representing your code:
char *p;
p is a pointer to char, but as you correctly note, it doesn't point anywhere yet. We can represent it graphically like this:
       +-----------+
    p: |    ???    |
       +-----------+
Next you set p to point somewhere:
p = "%d";
This allocates the string "%d" somewhere (it doesn't matter where), and sets p to point to it:
       +---+---+---+
       | % | d |\0 |
       +---+---+---+
         ^
         |
          \
           \
            \
             |
       +-----|-----+
    p: |     *     |
       +-----------+
Next, you start incrementing p:
p++;
As you said, this makes p point one past where it used to, to the second character of the string:
       +---+---+---+
       | % | d |\0 |
       +---+---+---+
             ^
             |
             |
             |
             |
             |
       +-----|-----+
    p: |     *     |
       +-----------+
Next,
p++;
Now we have:
       +---+---+---+
       | % | d |\0 |
       +---+---+---+
                 ^
                 |
                /
               /
              /
             |
       +-----|-----+
    p: |     *     |
       +-----------+
Next you called printf, but somewhat strangely:
printf(p-2,23);
The key to that is the expression p-2. If p points to the third character in the string, then p-2 points to the first character in the string:
          +---+---+---+
          | % | d |\0 |
          +---+---+---+
            ^       ^
       +----|----+  |
  p-2: |    *    | /
       +---------+/
                 /
                |
          +-----|-----+
       p: |     *     |
          +-----------+
And that pointer, p-2, is more or less the same pointer that printf would have received if you'd more conventionally called printf("%d", 23).
Now, if you thought printf received a string, it may surprise you to hear that printf is happy to receive a char * instead, and that in fact it always receives a char *. If this is surprising, ask yourself, what did you think printf did receive, if not a pointer to char?
Strictly speaking, a string in C is an array of characters (terminated with the '\0' character). But there's this super-important secret fact about C, which if you haven't encountered yet you will real soon (because it's really not a secret at all):
You can't do much with arrays in C. Whenever you mention an array in an expression in C, whenever it looks like you're trying to do something with the value of the array, what you get is a pointer to the array's first element.
That pointer is pretty much the "value" of the array. Due to the way pointer arithmetic works, you can use pointers to access arrays pretty much transparently (almost as if the pointer was the array, but of course it's not). And this all applies perfectly well to arrays of (and pointers to) characters, as well.
So since a string in C is an array of characters, when you write
"%d"
that's an array of three characters. But when you use it in an expression, what you get is a pointer to the array's first element. For example, if you write
printf("%d", 23);
you've got an array of characters, and you're mentioning it in an expression, so what you get is a pointer to the array's first element, and that's what gets passed to printf.
If we said
char *p = "%d";
printf(p, 23);
we've done the same thing, just a bit more explicitly: again, we've mentioned the array "%d" in an expression, so what we get as its value is a pointer to its first element, so that's the pointer that's used to initialize the pointer variable p, and that's the pointer that gets passed as the first argument to printf, so printf is happy.
Up above, I said "it's partly true that in C, type 'pointer to char' is the string type". Later I said that "a string in C is an array of characters". So which is it? An array or a pointer? Strictly speaking, a string is an array of characters. But like all arrays, we can't do much with an array of characters, and when we try, what we get is a pointer to the first element. So most of the time, strings in C are accessed and manipulated and modified via pointers to characters. All functions that operate on strings (including printf) actually receive pointers to char, pointing at the strings they'll manipulate.
The following explains each statement in the posted code:
#include <stdio.h> // include the header file that has the prototype for 'printf()'
int main( void ) // correct signature of 'main' function
{
    char *p; // declare a pointer to char, do not initialize
    p = "%d"; // assign address of string literal to pointer
    p++; // increment pointer (so it points to the second char in the string)
    p++; // increment pointer (so it points to the third char, the terminating '\0')
    printf(p-2,23); // use the string as the format string in the print statement,
                    // and pass a parameter of 23
    return 0; // exit the program, returning 0 to the OS
}
1) How can a pointer to a character data type can hold a string data type?
Ans: String is not a basic data type in C. String is nothing but a continuous placement of char in memory until '\0' is encountered.
2) What happens when p is incremented twice?
Ans: It now points to the '\0' character.
3) How can the printf()can print a string when no apparent quotation marks are used
Ans: The string was already written in quotation marks when the literal "%d" was assigned to p, so no further quotes are needed; p-2 points back at the start of that same string.
1. How can a pointer to a character data type can hold a string data type?
-> A char pointer holds the address of a char. Since a string is a sequence of chars, a char pointer to its first char can be used to refer to the whole string.
2. What happens when p is incremented twice?
-> When you assign a string to a char pointer, the pointer points to the first char. So when you increment the pointer twice, it holds the address of the 3rd char, which in your case is '\0'.
3. How can the printf()can print a string when no apparent quotation marks are used?
-> printf(p-2,23); uses the string that p-2 points to as the format string; in your case it is "%d".
I have a two part question:
Understand output from sizeof
Understand how strings are stored in variables (e.g. bits and ram)
Question 1
I'm trying to understand the output from the following piece of C code.
printf("a: %zu\n", sizeof("a")); // 2
printf("abc: %zu\n", sizeof("abc")); // 4
It always seems to be one larger than the actual number of characters specified.
The docs suggest that the returned value represents the size of the object (in this case a string) in bytes. So if the size of a gives us back 2 bytes, then I'm curious how a represents 16 bits of information.
If I look at the binary representation of the ASCII character a I can see it is 01100001. But that's only showing 3 bits out of 1 byte being used.
Question 2
Also, how do large strings get stored into a variable in C? Am I right in thinking that they have to be stored within an array, like so:
char my_string[5] = "hello";
Interestingly when I have some code like:
char my_string = "hello";
printf("my_string: %s\n", my_string);
I get two compiler errors:
- incompatible pointer to integer conversion initializing 'char' with an expression of type 'char [6]'
- format specifies type 'char *' but the argument has type 'char'
...which I don't understand. Firstly it states the type is presumed to be a size of [6] when there's only 5 characters. Secondly the mention of a pointer here seems odd to me? Why does printf expect a pointer and why does not specifying the length of the variable/array result in a pointer to integer error?
By the way I seemingly can set the length of the variable/array to 5 rather than 6 and it'll work as I'd expect it to char my_string[5] = "hello";.
I'm probably just missing something very basic/fundamental about how bits and strings work in C.
Any help understanding this would be appreciated.
The first part of the question comes down to the way strings are stored in C. Strings in C are nothing more than a series of characters (char) with a \0 added at the end, which is why you're seeing a +1 when you use sizeof. Notice that in your second part, if you were to write char my_string[4] = "hello"; the compiler would complain that the initializer string is too long for the array. That's also related to this.
Now onto the second part, strings themselves are a series of characters. However, you don't store every character by themselves in a variable. You instead have a pointer to these series of characters that will allow you to access them from some part of memory. Additional information regarding pointers and strings in C can be found here: Pointer to a String in C
In C, a string is a sequence of character values followed by a zero valued terminator. For example, the string "hello" is the sequence of character values {'h', 'e', 'l', 'l', 'o', 0 } [1]. Strings (including string literals) are stored as arrays of char (or wchar_t for wide-character strings). To account for the terminator, the size of the array must always be one greater than the number of characters in the string:
char greeting[6] = "hello";
The storage for greeting will look like
+---+
greeting: |'h'| greeting[0]
+---+
|'e'| greeting[1]
+---+
|'l'| greeting[2]
+---+
|'l'| greeting[3]
+---+
|'o'| greeting[4]
+---+
| 0 | greeting[5]
+---+
Storage for a string literal is largely the same [2]:
+---+
"hello": |'h'| "hello"[0]
+---+
|'e'| "hello"[1]
+---+
|'l'| "hello"[2]
+---+
|'l'| "hello"[3]
+---+
|'o'| "hello"[4]
+---+
| 0 | "hello"[5]
+---+
Yes, you can apply the subscript operator [] to a string literal just like any other array expression.
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. So, the string literal "hello" is an expression of type "6-element array of char". If I pass that literal as an argument to a function like
printf( "%s\n", "hello" );
then both of the string literal expressions "%s\n" and "hello" are converted from "4-element array of char" [3] and "6-element array of char" to "pointer to char", so what printf receives are pointer values, not array values.
You've already seen two exceptions to the conversion rule. You saw it in your code when you used the sizeof operator and got a value one more than you expected. sizeof evaluates to the number of bytes required to store the operand. Because of the zero terminator, it takes N+1 bytes to store an N-character string.
The second exception is the declaration of the greeting array above; since I'm using the string literal to initialize the array, the literal is not converted to a pointer value first. Note that you can write that declaration as
char greeting[] = "hello";
In that case, the size of the array is taken from the size of the initializer.
The third exception occurs when the array expression is the operand of the unary & operator. Instead of evaluating to a pointer to a pointer to char (char **), the expression &greeting evaluates to type "pointer to 6-element array of char", or char (*)[6].
The length of a string is the number of characters before the zero terminator. All the standard library functions that deal with strings expect to see that terminator. The size of the array to store that string must be at least one greater than the maximum length of the string you intend to store.
[1] Sometimes you'll see people write '\0' instead of a naked 0 to represent a string terminator; they mean the same thing.
[2] Storage for string literals is allocated at program startup and held until the program terminates. String literals may be stored in a read-only memory segment; attempting to modify the contents of a string literal results in undefined behavior.
[3] '\n' counts as a single character.
If
char d[3];
d[0] ='p';
d[1] ='o';
d[2] ='\0';
how come
printf("%s\n",d[0]);
won't work properly.
But if I have
char n[2][4];
n[0][0]='T'; n[0][1]='o'; n[0][2]='m'; n[0][3]=0;
n[1][0]='S'; n[1][1]='u'; n[1][2]='e'; n[1][3]=0;
printf("%s %s\n", n[0],n[1]);
it will print the entire string?
Because
d[0] - is a character
And
n - is an array of arrays of characters, i.e. an array of strings
d[0] is the first character contained in the array whereas printf requires the address of that first character.
It's that address that you get when you use d in your source code, or you can explicitly work it out with &(d[0]), the address of the character that's at the address at the start of the array :-).
The reason why your two-dimensional arrays work is exactly the same: n[0] is the address of n[0][0], the same way that d is the address of d[0].
If you were to pass n[0][0] (the character) to printf, you would have the same problem as when you passed d[0].
printf("%s\n",d[0]); is technically undefined behavior. The documentation for printf describes the various conversion specifiers.
s
If no l modifier is present: The const char * argument is expected to
be a pointer to an array of character type (pointer to a string).
Characters from the array are written up to (but not including) a
terminating null byte ('\0');
If you enable warnings, i.e. -Wall, you may get:
warning: format '%s' expects argument of type 'char *', but argument 2 has type
'int' [-Wformat=]
printf("%s\n",d[0]);
For why the second example works, read about array-to-pointer conversions. James McNellis writes:
In both C and C++, an array can be used as if it were a pointer to its
first element. Effectively, given an array named x, you can replace
most uses of &x[0] with just x.
[...]
void f(int* p);
int x[5];
f(x); // this is the same as f(&x[0])
So n[0] is equivalent to &n[0][0], just as d is equivalent to &d[0]. But d is not equivalent to d[0].
When you try to print your string d, you pass printf the first character of the array:
printf("%s\n",d[0]);
d[0] means "The thing stored in index 0 of array d". This is the literal character 'p' and not the beginning of the string.
Passing a character to printf with the string conversion (%s) is undefined behaviour and could cause problems. Don't do it.
When printing a string with printf you need to pass a pointer to the string, which is the pointer to the first element of the string (and not the element itself).
&d[0];
^^ ^
|| Index 0
|array d
Address of
This evaluates to the address of the 0th index of array d.
Additionally, we can take advantage of the fact that arrays decay to pointers to their first element.
&d[0] == &(*(d+0)) == d
This means that we can do away with the & and [] operators and just pass in d:
printf("%s\n", d);
This will print the string.
With your second example, the two-dimensional array n[2][4] is an array of two 4-element char arrays. (It is not a char **; in an expression, n decays to a pointer to its first row, of type char (*)[4].)
This means that when you were calling printf, when you were passing in n[0] and n[1], you were actually passing in the pointer to the first character in each string. i.e. &n[0][0] and &n[1][0].