Confusion about the way C handles strings - arrays

Why wouldn't
char *name = "asd";
printf("%p\n%p", (void *)&name, (void *)&name[0]);
give the same output as
char name[] = "asd";
printf("%p\n%p", (void *)&name, (void *)&name[0]);
I've read that C treats a string as a pointer to its first char, reading until it reaches the '\0', but the code above doesn't seem to behave that way, which is confusing for a C beginner.

First, let's give your two variables different names and contents, so we can clearly tell them apart.
char *namep = "asd";
char namea[] = "zxc";
These result in data structures which might look like this:
       +-------+
namep: |   *   |
       +---|---+
           |
          /
         |
         V
       +---+---+---+----+
       | a | s | d | \0 |
       +---+---+---+----+

       +---+---+---+----+
namea: | z | x | c | \0 |
       +---+---+---+----+
Now let's look at your two printf calls:
printf("%p\n%p", (void *)&namep, (void *)&namep[0]);
Now, &namep gives you the address of the namep pointer.
But &namep[0] gives you the address of the first character in the pointed-to string (a). If you had printed
printf("%p\n", (void *)namep);
you would have seen the same thing.
printf("%p\n%p", (void *)&namea, (void *)&namea[0]);
Here, &namea gives you the address of the array.
And &namea[0] gives you the address of its first character (z) — which is the same place.
And in fact, due to the special handling (the "decay" of arrays into pointers), if you had said
printf("%p\n", (void *)namea);
you would also have seen the same thing.
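If you want to check both cases without reading raw addresses by eye, here is a minimal sketch. The helper names are made up for this illustration; namep and namea are the variables from above.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when the pointer variable is stored at a
   different address than the characters it points to. */
int pointer_object_is_elsewhere(void)
{
    char *namep = "asd";
    /* &namep is where the pointer itself lives;
       &namep[0] is where the 'a' lives. */
    return (void *)&namep != (void *)&namep[0];
}

/* Hypothetical helper: returns 1 when the array, its decayed value and
   its first element all start at the same address. */
int array_starts_at_first_element(void)
{
    char namea[] = "zxc";
    return (void *)&namea == (void *)&namea[0]
        && (void *)namea == (void *)&namea[0];
}

/* Prints the four addresses so you can compare them yourself. */
void show_addresses(void)
{
    char *namep = "asd";
    char namea[] = "zxc";
    printf("%p\n%p\n", (void *)&namep, (void *)&namep[0]);
    printf("%p\n%p\n", (void *)&namea, (void *)&namea[0]);
}
```

Calling show_addresses() from main should print two different values for namep and the same value twice for namea; the exact numbers vary from run to run.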
You asked:
I've read that C takes strings as a pointer to their first char, till it reaches the '\0'.
That's correct.
Suppose you wrote the code
char *p;
for(p = namep; *p != '\0'; p++)
putchar(*p);
This would print your namep string, asd. There's no mystery here. namep was already a pointer, pointing at the first character of the string, so this scrap of code takes its own pointer p, which starts pointing where namep points, and prints the pointed-to characters until it gets to the terminating \0.
What's perhaps more surprising is that you can do exactly the same thing with namea:
char *p;
for(p = namea; *p != '\0'; p++)
putchar(*p);
This works, too, printing zxc, and if you don't believe me, I encourage you to type it into your C compiler and try it.
Now, you may be asking, if p is a pointer and namea is an array, how can the loop initialization p = namea work? And this, again, is due to the "decay" of arrays to pointers. Again, when you try to use namea's value like this, what you get — the value that gets assigned to p — is automatically a pointer to namea's first element. And of course that's just what you want. p starts there, and prints characters 'til it finds a \0, thus printing zxc.
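The same idea can be sketched as a function that takes a char pointer and works for both variables. The names count_until_nul and count_both are invented for this example; the loop is the one shown above, counting instead of printing.

```c
#include <stddef.h>

/* Hypothetical helper: walks a string the same way the loops above do,
   but counts the characters instead of printing them. */
size_t count_until_nul(const char *p)
{
    size_t n = 0;
    for (; *p != '\0'; p++)
        n++;
    return n;
}

/* Calls the helper once with a pointer and once with an array. */
size_t count_both(void)
{
    const char *namep = "asd";
    char namea[] = "zxc";
    /* namea decays to &namea[0] when it is passed as an argument. */
    return count_until_nul(namep) + count_until_nul(namea);
}
```

The function never learns whether its argument started life as a pointer or as an array; by the time it is called, both are just a pointer to the first character.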

Related

What is the output of the following code in C? [closed]

Please take a look at this code.
#include <stdio.h>
int main()
{
char *p;
p = "%d";
p++;
p++;
printf(p-2,23);
return 0;
}
I have the following questions
1) How can a pointer to a character data type hold a string data type?
2) What happens when p is incremented twice?
3) How can printf() print a string when no apparent quotation marks are used?

"How can a pointer to a character data type can hold a string data type?" Well, it's partly true that in C, type 'pointer to char' is the string type. Any function that operates on strings (including printf) will be found to accept these strings via parameters of type char *.
"How can printf() print a string when no apparent quotation marks are used?" There's no rule that says you need quotation marks to have a string! That thing with quotation marks is a string constant or string literal, and it's one way to get a string into your program, but it's not at all the only way. There are lots of ways to construct (and manipulate, and modify) strings that don't involve any quotation marks at all.
Let's draw some pictures representing your code:
char *p;
p is a pointer to char, but as you correctly note, it doesn't point anywhere yet. We can represent it graphically like this:
   +-----------+
p: |    ???    |
   +-----------+
Next you set p to point somewhere:
p = "%d";
This allocates the string "%d" somewhere (it doesn't matter where), and sets p to point to it:
   +---+---+---+
   | % | d |\0 |
   +---+---+---+
     ^
     |
      \
       \
        \
         |
   +-----|-----+
p: |     *     |
   +-----------+
Next, you start incrementing p:
p++;
As you said, this makes p point one past where it used to, to the second character of the string:
   +---+---+---+
   | % | d |\0 |
   +---+---+---+
         ^
         |
         |
         |
         |
         |
   +-----|-----+
p: |     *     |
   +-----------+
Next,
p++;
Now we have:
   +---+---+---+
   | % | d |\0 |
   +---+---+---+
             ^
             |
            /
           /
          /
         |
   +-----|-----+
p: |     *     |
   +-----------+
Next you called printf, but somewhat strangely:
printf(p-2,23);
The key to that is the expression p-2. If p points to the third character in the string, then p-2 points to the first character in the string:
        +---+---+---+
        | % | d |\0 |
        +---+---+---+
          ^       ^
     +----|----+  |
p-2: |    *    | /
     +---------+/
               /
              |
        +-----|-----+
     p: |     *     |
        +-----------+
And that pointer, p-2, is more or less the same pointer that printf would have received if you'd more conventionally called printf("%d", 23).
Now, if you thought printf received a string, it may surprise you to hear that printf is happy to receive a char * instead, and that in fact it always receives a char *. If this is surprising, ask yourself, what did you think printf did receive, if not a pointer to char?
Strictly speaking, a string in C is an array of characters (terminated with the '\0' character). But there's this super-important secret fact about C, which if you haven't encountered yet you will real soon (because it's really not a secret at all):
You can't do much with arrays in C. Whenever you mention an array in an expression in C, whenever it looks like you're trying to do something with the value of the array, what you get is a pointer to the array's first element.
That pointer is pretty much the "value" of the array. Due to the way pointer arithmetic works, you can use pointers to access arrays pretty much transparently (almost as if the pointer was the array, but of course it's not). And this all applies perfectly well to arrays of (and pointers to) characters, as well.
So since a string in C is an array of characters, when you write
"%d"
that's an array of three characters. But when you use it in an expression, what you get is a pointer to the array's first element. For example, if you write
printf("%d", 23);
you've got an array of characters, and you're mentioning it in an expression, so what you get is a pointer to the array's first element, and that's what gets passed to printf.
If we said
char *p = "%d";
printf(p, 23);
we've done the same thing, just a bit more explicitly: again, we've mentioned the array "%d" in an expression, so what we get as its value is a pointer to its first element, so that's the pointer that's used to initialize the pointer variable p, and that's the pointer that gets passed as the first argument to printf, so printf is happy.
Up above, I said "it's partly true that in C, type 'pointer to char' is the string type". Later I said that "a string in C is an array of characters". So which is it? An array or a pointer? Strictly speaking, a string is an array of characters. But like all arrays, we can't do much with an array of characters, and when we try, what we get is a pointer to the first element. So most of the time, strings in C are accessed and manipulated and modified via pointers to characters. All functions that operate on strings (including printf) actually receive pointers to char, pointing at the strings they'll manipulate.
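To see that a format string reached through a pointer behaves like a literal one, here is a small sketch. It uses snprintf instead of printf so the result can be inspected; the function name format_through_pointer is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: formats 23 into out using a format string that is
   reached through a pointer, as in the question. Returns what snprintf
   returns (the number of characters the result needs). */
int format_through_pointer(char *out, size_t size)
{
    const char *p = "%d";   /* points at '%' */
    p++;                    /* points at 'd' */
    p++;                    /* points at the terminating '\0' */
    /* p - 2 points at '%' again, so this call is equivalent to
       snprintf(out, size, "%d", 23). */
    return snprintf(out, size, p - 2, 23);
}
```

With a buffer of, say, 8 chars, out ends up holding the two digits of 23 followed by '\0'. Some compilers can warn about a format string that is not a literal when asked to; that warning is about safety, not about the code being invalid C.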

The following explains each statement in the posted code:
#include <stdio.h> // include the header file that has the prototype for 'printf()'
int main( void ) // correct signature of 'main' function
{
    char *p; // declare a pointer to char, do not initialize
    p = "%d"; // assign address of string to pointer
    p++; // increment pointer (so it points to the second char in the string)
    p++; // increment pointer (so it points to the third char in the string, the '\0')
    printf(p-2,23); // use the string as the 'format string' in the print statement,
                    // and pass a parameter of 23
    return 0; // exit the program, returning 0 to the OS
}

1) How can a pointer to a character data type hold a string data type?
Ans: String is not a basic data type in C. A string is nothing but a contiguous run of chars in memory, ending at the first '\0'. A char pointer holds the address of the first of those chars.
2) What happens when p is incremented twice?
Ans: It now points to the '\0' character.
3) How can printf() print a string when no apparent quotation marks are used?
Ans: The quotation marks appeared when the string literal "%d" was assigned to p. p-2 points at the start of that same string, so no further quotes are needed in the printf call.

1. How can a pointer to a character data type hold a string data type?
-> A char pointer holds the address of a char. Since a string is a sequence of chars, a pointer to the first one is enough to refer to the whole string.
2. What happens when p is incremented twice?
-> When you assign a string to a char pointer, the pointer points to the first char. So when you increment the pointer twice, it holds the address of the third char, which in your case is '\0'.
3. How can printf() print a string when no apparent quotation marks are used?
-> printf(p-2,23); uses the string as the format string, which in your case is "%d".
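A quick way to confirm where p ends up is to measure it. This is only a sketch; the two function names are invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns how far p has moved from the start of the
   string after the two increments from the question. */
ptrdiff_t offset_after_two_increments(void)
{
    const char *start = "%d";
    const char *p = start;
    p++;
    p++;
    return p - start;
}

/* Hypothetical helper: returns the character p points at after the two
   increments. */
char char_after_two_increments(void)
{
    const char *p = "%d";
    p++;
    p++;
    return *p;
}
```

The offset is 2 and the character is '\0', which is why p-2 is needed to get back to the '%'.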

Problems about C program char * pointer

There is a simple C program.
#include<stdio.h>
int main()
{
char *s = "abcde";
printf("%s\n", s);
printf("%s\n", *s); /* This is wrong */
return 0;
}
This is my thought:
The variable s is a char * pointer to the string abcde. So s holds a memory address, and the string abcde is stored at that address.
The %s format specifier in printf() formats a string. I don't understand why s is the string. The variable s is a char * pointer, so *s is the string abcde, isn't it?

In C, "strings" are NUL-terminated arrays of characters.
The code char *s = "abcde"; does two things:
First, it allocates (typically in read-only program data) some unnamed memory, and populates it with "abcde":
  1000   1001   1002   1003   1004   1005
 _________________________________________
|  a   |  b   |  c   |  d   |  e   |  \0  |
|______|______|______|______|______|______|
Then, in the stack frame of main, a pointer to char is allocated, named s, and its value is initialized with the address of your string. In my example, s = 1000.
  1000   1001   1002   1003   1004   1005
 _________________________________________
|  a   |  b   |  c   |  d   |  e   |  \0  |
|______|______|______|______|______|______|
   ^
   |
   |
s = 1000
The %s format specifier tells printf to expect the address of a NUL-terminated string as the corresponding argument.
When you pass s, you are doing just that: telling printf that your string lives at address 1000. printf goes to that address, and starts reading the characters there (a, b, c...) until it encounters a NUL ('\0') character, at which point it stops. It has now read your string.
When you pass *s, two things happen. First, the program dereferences the pointer. Since it is a pointer-to-char, that means it reads one character from the memory at 1000. The result is 'a', which in ASCII is the decimal number 97. Now this value is passed to printf, and printf still treats it as an address. However, 97 is almost certainly not a valid address, so the program will most likely crash (formally, the behavior is undefined).

and *s is the string of abcde, isn't it?
No. *s is a char, the first char in the string, so it's a. *(s + 1) is b, *(s + 2) is c, and so on. A char* is an address which points, or refers, to some number of chars.
You are lying to printf with that second call and invoking undefined behavior. The character a is passed and interpreted as an address using its integral value. That's going to lead to bad things.
printf expects a pointer to char when you use the %s format specifier and that's what you have to pass it. So, what ends up happening is that printf reads past (char*)'a' looking for a null terminator. It may or may not find one before segfaulting, but it's UB either way.
Turn your warnings on.
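The relationship between *s, s[i] and *(s + i) can be checked with a short sketch like the following. Both helper names are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[i] and *(s + i) agree for every
   character of s, including the terminating '\0'. */
int index_matches_pointer_arithmetic(const char *s)
{
    size_t i = 0;
    for (;;) {
        if (s[i] != *(s + i))
            return 0;
        if (s[i] == '\0')
            return 1;
        i++;
    }
}

/* *s is a single char: the first one in the string. */
char first_char(const char *s)
{
    return *s;
}
```

first_char("abcde") gives 'a', a single char, which is why it matches %c and not %s.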

In C, a string is just a sequence of characters in memory terminated with a null character ('\0'), which tells you where the string stops. There are no string objects. As such, when you provide a 'string' to printf you pass it a pointer to where the string is held, and it figures out the rest.

When printf is given a string with %s, it automatically prints the characters from the start address up to the terminating '\0'. That is the rule printf follows whenever it deals with a string. For a string s, *s is the first character of the string, not the whole string. So when you want to print a string, give its start address to printf. *s, *(s+1), ... give you the individual characters of the string.

Q: The variable s is a char * pointer and *s is the string of abcde, isn't it?
No.
*s is a specific character pointed to by s. In the question code, it is the character 'a'.
If it helps, *s results in the same character as s[0]. In other words, (for the question code) *s is the value of the first character in the array (or string) of characters.
On the other hand, s is a char * that can store an address. It can store any address in memory you like. In the question code, s was initialized to point at a specific static string (or array of characters).
You can print the specific address pointed to by s:
printf("s points to %p\n", (void *)s);
You can print the specific character pointed to by s:
printf("The character pointed to by s: %c\n", *s);
You can print the string (or array) of characters pointed to by s:
printf("s points to an array of characters: %s", s);
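As a compact illustration of which argument goes with which specifier, here is a sketch that formats into a buffer with snprintf. The function name describe and the "|" separator are arbitrary choices for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<first char>|<whole string>" into out and
   returns the number of characters the result needs. */
int describe(char *out, size_t size, const char *s)
{
    /* %c takes a single char (promoted to int);
       %s takes a pointer to the first char of a string. */
    return snprintf(out, size, "%c|%s", *s, s);
}
```

Swapping the two arguments would pass a char where a pointer is expected and a pointer where a char is expected, which is the mistake in the question.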

Pointer to pointer gives segmentation fault?

Here is the code
int main()
{
char s[]="prady";
char **p;
p=(char **)&s;
printf("%u %u\n",p,*p);
printf("%u %u\n",&s,s);
printf("%s\n",s);
printf("%s\n",&s);
printf("%u %u\n",s+1,&s+1);
printf("%s\n",p);
printf("%s\n",*p);
}
Output:
3217062327 1684107888
3217062327 3217062327
prady
prady
3217062328 3217062336
prady
Segmentation fault
My questions are as follows:
How are the addresses of s and &s the same?
If both are the same, why do they differ after adding 1?
Why do I get a segmentation fault with *p?

First, arrays are not pointers. Pointers are not arrays. Arrays decay into pointers.
1. How are the addresses of s and &s the same?
char s[]="prady";
   --------------------------
s: | p | r | a | d | y | \0 |
   --------------------------
The array s is a request for 6 characters to be set aside. In other words, at s there are 6 characters. s is a "thing"; it doesn't point at anything, it just is.
char *ptr = "prady";
------     --------------------------
|*ptr| --> | p | r | a | d | y | \0 |
------     --------------------------
The pointer ptr requests a place which holds a pointer. The pointer can point at any char or any string literal (continuous chars).
Another way to think about this:
int b; //this is an integer
&b; //this is the address of the int b, right?
int c[4]; //this is an array of ints
&c; //this would be the address of the array, right?
So that's pretty understandable. How about this:
*c; //that's the first element in the array
What does that line of code tell you? If I dereference c, then I get an int. That means plain c (once it decays) is an address. Since it's the start of the array, it's also the address of the array, thus (ignoring the difference in type):
(void *)c == (void *)&c;
2. If both are the same, why do they differ after adding 1?
From my answer to #1 I assume you see why they're not the same. So why do you get different values?
Look at the values you get:
s = 3217062327
s+1 = 3217062328 // It's 1 bigger, why? Because a char takes 1 byte, and s holds chars,
// so s (address of a char) + 1 (sizeof char) gives you one more than s
&s + 1 // This adds 1 * sizeof(array), which is bigger than the size of a char
3. Why do I get a segmentation fault with *p?
I think you can get this from my previous two answers...
But:
p is a pointer to a pointer to a character.
You set p to the address of the array (which, remember, is where the characters themselves live).
A dereference of p is supposed to be a pointer to a char (another address), but what is stored there is the characters of the array, not an address.
When you typecast, you tell the compiler "I know better than you, so just make these two work". When you segfault... it's because you didn't really know better.
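The difference between s + 1 and &s + 1 can be measured directly. This is a small sketch with invented helper names; it reports how many bytes each "+ 1" covers.

```c
#include <stddef.h>

/* Hypothetical helper: bytes covered by "+ 1" on the decayed array. */
size_t step_of_s(void)
{
    char s[] = "prady";
    /* s decays to char *, so s + 1 is one char further on. */
    return (size_t)((char *)(s + 1) - (char *)s);
}

/* Hypothetical helper: bytes covered by "+ 1" on the address of the array. */
size_t step_of_address_of_s(void)
{
    char s[] = "prady";
    /* &s has type char (*)[6], so &s + 1 is one whole array further on. */
    return (size_t)((char *)(&s + 1) - (char *)&s);
}
```

For this 6-byte array the first step is 1 and the second is 6: same starting address, different pointer types.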

In your case s isn't a pointer. It is an array!
A small change will fix it:
char *s = "prady";

1. How are the addresses of s and &s the same?
s is an array of characters. Arrays are converted to pointers, save in a few cases: when a string literal is used to initialize an array (e.g. your char s[]="prady"; line), when they are the operand of the unary & operator (plenty of cases in your code), and when they are the operand of the sizeof operator. So s becomes a pointer to the first element, &s is a pointer to the whole array, and both start at the same address.
2. If both are the same, why do they differ after adding 1?
They are not the same. They hold the same address but have different types, so adding 1 moves s by one char and &s by the size of the whole array.
3. Why do I get a segmentation fault with *p?
p contains the address of the array holding "prady". *p reads the bytes of "prady" as if they were a char *. Attempting to use those bytes as the address of a string causes a segfault.
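To make "reads the bytes as if they were a pointer" concrete, here is a sketch that copies the first four bytes of the string into a 32-bit integer. It assumes ASCII and that the asker's machine used 4-byte pointers, which the printed values suggest but the question does not state; the function name is made up.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the first four bytes of s into a 32-bit
   integer, which is roughly what *p did on a machine with 4-byte
   pointers. s must hold at least four bytes. */
uint32_t first_four_bytes_as_number(const char *s)
{
    uint32_t value;
    memcpy(&value, s, sizeof value);
    return value;
}
```

On a little-endian ASCII machine, first_four_bytes_as_number("prady") is 1684107888 (0x64617270, the bytes 'p', 'r', 'a', 'd'), the same number the question's output shows for *p. printf then treated that number as the address of a string, hence the segmentation fault.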

Is it possible to convert char[] to char* in C?

I'm doing an assignment where we have to read a series of strings from a file into an array. I have to call a cipher algorithm on the array (cipher transposes 2D arrays). So, at first I put all the information from the file into a 2D array, but I had a lot of trouble with conflicting types in the rest of my code (specifically trying to set char[] to char*). So, I decided to switch to an array of pointers, which made everything a lot easier in most of my code.
But now I need to convert char* to char[] and back again, but I can't figure it out. I haven't been able to find anything on google. I'm starting to wonder if it's even possible.

It sounds like you're confused between pointers and arrays. Pointers and arrays (in this case char * and char []) are not the same thing.
An array char a[SIZE] says that the value at the location of a is an array of length SIZE.
A pointer char *a; says that the value at the location of a is a pointer to a char. This can be combined with pointer arithmetic to behave like an array (e.g., a[10] is 10 entries past wherever a points).
In memory, it looks like this (example taken from the FAQ):
char a[] = "hello"; // array
   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
char *p = "world"; // pointer
   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+
It's easy to be confused about the difference between pointers and arrays, because in many cases, an array reference "decays" to a pointer to its first element. This means that in many cases (such as when passed to a function call) arrays become pointers. If you'd like to know more, this section of the C FAQ describes the differences in detail.
One major practical difference is that the compiler knows how long an array is. Using the examples above:
char a[] = "hello";
char *p = "world";
sizeof(a); // 6 - one byte for each character in the string,
// one for the '\0' terminator
sizeof(p); // whatever the size of the pointer is
// probably 4 or 8 on most machines (depending on whether it's a
// 32 or 64 bit machine)
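The sizeof difference can be wrapped in two small functions for checking. This is a sketch only; the function names are made up.

```c
#include <stddef.h>

/* Hypothetical helper: sizeof applied to the array itself. */
size_t size_of_array(void)
{
    char a[] = "hello";
    return sizeof a;    /* 6: five letters plus the '\0' terminator */
}

/* Hypothetical helper: sizeof applied to a pointer to a string. */
size_t size_of_pointer(void)
{
    const char *p = "world";
    return sizeof p;    /* size of the pointer, not of the string */
}
```

size_of_array() is 6 everywhere, while size_of_pointer() depends on the platform.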
Without seeing your code, it's hard to recommend the best course of action, but I suspect changing to use pointers everywhere will solve the problems you're currently having. Take note that now:
You will need to initialise memory wherever the arrays used to be. E.g., char a[10]; will become char *a = malloc(10 * sizeof(char));, followed by a check that a != NULL. Note that you don't actually need to say sizeof(char) in this case, because sizeof(char) is defined to be 1. I left it in for completeness.
Anywhere you previously had sizeof(a) for array length will need to be replaced by the length of the memory you allocated (if you're using strings, you could use strlen(), which counts up to the '\0').
You will need to make a corresponding call to free() for each call to malloc(). This tells the computer you are done using the memory you asked for with malloc(). If your pointer is a, just write free(a); at a point in the code where you know you no longer need whatever a points to.
As another answer pointed out, if you want to get the address of the start of an array, you can use:
char* p = &a[0];
You can read this as "char pointer p becomes the address of element [0] of a".
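The malloc/strlen/free advice above can be sketched as a helper that turns any string, including one held in a char array, into a heap copy reachable through a char *. The name copy_to_heap is invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of src, or NULL if the
   allocation fails. The caller must free() the result. */
char *copy_to_heap(const char *src)
{
    size_t len = strlen(src);       /* counts up to, not including, '\0' */
    char *copy = malloc(len + 1);   /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);     /* copy the characters and the '\0' */
    return copy;
}
```

A typical use would be char *p = copy_to_heap(a); followed by a NULL check, some work with p, and finally free(p);.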

If you have
char c[10];
then you can do
char *d = &c[0];
and access element c[1] by doing *(d+1), etc.

You don't need to declare them as arrays if you want to use them as pointers. You can simply reference pointers as if they were multi-dimensional arrays. Just create it as a pointer to a pointer and use malloc:
int i;
int M=30, N=25;
int ** buf;
buf = (int**) malloc(M * sizeof(int*));
for(i=0;i<M;i++)
buf[i] = (int*) malloc(N * sizeof(int));
and then you can reference buf[3][5] or whatever.
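The snippet above never checks the allocations or frees them. A slightly fuller sketch might look like this; alloc_grid and free_grid are hypothetical names, and using calloc to zero the rows is a choice made for this example, not something the answer requires.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the first `rows` rows and then the row table. */
void free_grid(int **grid, int rows)
{
    if (grid == NULL)
        return;
    for (int i = 0; i < rows; i++)
        free(grid[i]);
    free(grid);
}

/* Hypothetical helper: allocates rows x cols ints, zero-initialised.
   Returns NULL if any allocation fails. */
int **alloc_grid(int rows, int cols)
{
    int **grid = malloc((size_t)rows * sizeof *grid);
    if (grid == NULL)
        return NULL;
    for (int i = 0; i < rows; i++) {
        grid[i] = calloc((size_t)cols, sizeof **grid);
        if (grid[i] == NULL) {
            free_grid(grid, i);   /* release the rows allocated so far */
            return NULL;
        }
    }
    return grid;
}
```

After int **buf = alloc_grid(30, 25); you can use buf[3][5] as before, and call free_grid(buf, 30); when you are done.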

None of the above worked for me except strtok
#include <string.h>
Then use strtok
char some[] = "some string";
char *p = strtok(some, "");
strtok is used to split strings. But you can see that I split it on nothing ""
Now you have a pointer.

Well, I'm not sure to understand your question...
In C, Char[] and Char* are the same thing.
Edit : thanks for this interesting link.

Difference between char a[]="string"; char *p="string"; [duplicate]


The first one is an array, the other is a pointer.
The array declaration "char a[6];" requests that space for six characters be set aside, to be known by the name "a." That is, there is a location named "a" at which six characters can sit. The pointer declaration "char *p;" on the other hand, requests a place which holds a pointer. The pointer is to be known by the name "p," and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "hello";
char *p = "world";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location "a," move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location "p," fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently.
You can also search; there are tons of explanations of this subject on the internet.

char a[]="string"; // a is an array of characters.
char *p="string"; // p is a pointer to a string literal, which has static storage duration. Any attempt to modify the literal through p leads to undefined behavior, since string literals are often stored in a read-only section of memory.

No difference, unless you want to actually write to the array, in which case the whole world will explode if you try to use the second form.

The first declaration declares an array, while the second declares a pointer.
If you're interested in the difference in some particular aspect, please clarify your question.

One difference is that sizeof(a)-1 will be replaced with the length of the string at compile time. With p you need to use strlen(p) to get the length at runtime. Also, some compilers don't like char *p="string"; they want const char *p="string", in which case the memory for "string" is read-only but the memory for a is not. Even if the compiler does not require the const declaration, it's bad practice to modify the string pointed to by p (i.e. *p='a'). The pointer p can be changed to point to something else. With the array a, a new value has to be copied into the array (if it fits).
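Those points can be sketched in a few lines. The helper names are invented for this example, and only the array copy is modified, since writing through p would be undefined behavior.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if sizeof(a) - 1 equals strlen(p)
   for the same text. */
int lengths_agree(void)
{
    char a[] = "string";
    const char *p = "string";
    return sizeof(a) - 1 == strlen(p);
}

/* Hypothetical helper: modifies the array and returns its first char. */
char modify_array_copy(void)
{
    char a[] = "string";
    a[0] = 'S';     /* fine: a is the program's own writable copy */
    return a[0];
}
```

Declaring p as const char * lets the compiler reject an accidental *p = 'a'; instead of leaving it to fail at run time.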
