In C99 a string is typically initialized by using the char* data type since there is no primitive "string" data type. This effectively creates an array of chars by storing the address of the first char in the variable:
FILE* out = fopen("out.txt", "w");
char* s = argv[1];
fwrite(s, 12, 1, out);
fclose(out);
//successfully prints out 12 characters from argv[1] as a consecutive string.
How does the compiler know that char* s is a string and not just the address of a singular char? If I use int* it will only allow one int, not an array of them. Why the difference?
My main focus is understanding how pointers, referencing and de-referencing work, but the whole char* keeps messing with my head.
How does the compiler know that char* s is a string and not just the address of a singular char?
It doesn't. As far as "the compiler" is concerned, char* s is a pointer to char.
On the other hand, there are many library functions that assume that a char* points to an element of a null-terminated sequence of char (see for example strlen, strcmp etc.).
Note that fwrite does not make this assumption. It requires that you tell it how many bytes you want to write (and that this number doesn't take you beyond the bounds of the buffer pointed at by the first argument.)
If I use int* it will only allow one int, not an array of them. Why the difference?
That is incorrect. The C language does not have a special case for char*. An int* can also point to an element of an array of int. In fact, you could write a library that uses 0 or another sentinel value to indicate the end of a sequence of int, and use it much in the same was as char* are used by convention.
In your code
fwrite(s, 12, 1, out);
is equivalent to writing
write 12 elements of size 1 byte, location starting from address s to the file pointed by out.
Here. a char is of one byte exactly, so you get the desired output.
How does the compiler know that char* s is a string and not just the address of a singular char?
Well, it does not (and does not need to). You asked to (read from s and) write 12 bytes, so it will do that. If the memory is inaccessible, that's a programming mistake. fwrite() itself won't handle that.
Beware:
If s is not allocated memory to be accessed upto s[11] (technically), it will be undefined behaviour. It's upto the programmer to pass the valid values as argument.
In case of int, the size is 4 bytes (usually, on 32 bit system) and printing byte-by-byte won't give you the desired result.
In that case, you need to make use of fprintf() to print formatted output.
Compiler won't have any idea other than char* is an address of a character. We can make it read the characters following by incrementing the address of the first character. The case is similar to any pointer int*,long* etc., compiler just treat a pointer as something points to an address of its type.
Related
why this code does not seem to work the way I expect
char *c="hello";
char *x=malloc(sizeof(char)*5+1);
memcpy(x,(char(*)[2])c,sizeof("hello"));
printf("%s\n",x);
On this question I got comment you cannot cast a pointer to an array. But you can cast it to a pointer to array. Try (char*[2])c so I am just casting to pointer to array of two char so it will get first two characters from c becuase this is what (char(*)[2])c suppose to do. If not then am I missing anything? and I thought since Iam copying it the at index after 1 and 2 I get junk because i did not call memset. why I am getting full hello write with memcpy even though I just casted it t0 (char(*)[2])
how to extract specific range of characters from string with casting to array type-- What it can't be done?
Converting a pointer does not change the memory the pointer points to. Converting the c to char [2] or char (*)[2] will not separate two characters from c.
c is char * that points to the first character of "hello".
(char (*)[2]) c says to take that address and convert it to the type “pointer to an array of 2 char”. The result points to the same address as before; it just has a different type. (There are some technical C semantic issues involved in type conversions and aliasing, but I will not discuss those in this answer.)
memcpy(x,(char(*)[2])c,sizeof("hello")); passes that address to memcpy. Due to the declaration of memcpy, that address is automatically converted to const void *. So the type is irrelevant (barring the technical issues mentioned above); whether you pass the original c or the converted (char (*)[2]) c, the result is a const void * to the same address.
sizeof "hello" is 6, because "hello" creates an array that contains six characters, including the terminating null character. So memcpy copies six bytes from "hello" into x.
Then x[5]='\0'; is redundant because the null character is already there.
To copy n characters from position p in a string, use memcpy(x, c + p, n);. In this case, you will need to manually append a null character if it is not included in the n characters. You may also need to guard against going beyond the end of the string pointed to by c.
I'm confused about the way C handles strings and char * vs char[].
char name[10] = "asd";
printf("%p\n%p", &name, &name[0]); //0x7ffed617acd
//0x7ffed617acd
If this code gives the same addresses for both arguments, does it mean that the C compiler takes char arrays (strings) as a pointer to the first char in the array and moves in the memory till it gets the null terminator? Why wouldn't the same happen if we changed the char name[] to char *name? (I know they differ but what makes C take both in a different way?)
I know that arrays can't be assigned after declaration (unless you used something like strcpy, strcat) which is also confusing. Why wouldn't C take them as any other data type? (Something tells me the compiler has a specific addr for it while you can assign char* to whatever location in the mem since its a pointer).
I know that char * have fixed size unlike char[] which makes char * not usable for first argument of strcat.
in C a "string" is an array of type "char" (terminated with \0).
When you are referring to an array in C, you are using a pointer to the first element. In this case (char *).
According to the ANSI-C standard the name of an array is a pointer to the first element.
Being able to write name instead of &name[0] is syntactical sugar.
In the same way accessing an array element writing name[i] is analogue to writing *(name+i).
does it mean that the c compiler takes char arrays (strings) as a pointer to the first char in the array
An array is not a pointer. But an array will implicitly convert to a pointer to first element. Such conversion is called "decaying".
... and moves in the memory till it gets the null terminator???
You can write such loop if you know the pointer is to an element of null terminated string. If you write that loop, then the compiler will produce a program that does such thing.
Why wouldn't the same happen if we changed the char name[] to char *name?
Your premise is faulty. You can iterate an array directly, as well as using a pointer.
If this code gives the same addresses for both arguments, does it mean
The address of an object is the first byte of the object. What this "same address" means is that the first byte of the first element of the array is in the same address as the first byte of the array as a whole.
I know that arrays can't be assigned after declaration (unless you used something like strcpy, strcat) which is also confusing.
Neither strcpy nor strcat assign an array. They assign elements of the array which you can also do without calling those functions.
Why wouldn't C take them as any other data type?
This question is unclear. What do you mean by "C taking them"? Why do you think C should take another data type? Which data type do you think it should take?
char name[10] = "asd";
printf("%p\n%p", &name, &name[0]);
The arguments are of type char(*)[10] and char* respectively. The %p format specifier requires that the argument is of type similar to void* which isn't similar to those arguments. Passing an argument of a type other than required by the format specifier results in undefined behaviour. You should cast other pointer types to void* when using %p.
I read that when you declare strings using a pointer, the pointer contains the memory address of the string literal. Therefore I expected to get the memory address from this code but rather I got some random numbers. Please help me understand why it didn't work.
int main()
{
char *hi = "Greeting!";
printf("%p",hi);
return(0);
}
If the pointer hi contains the memory address of the string literal, then why did it not display the memory address?
It did work. It's just that you can consider the address as being arbitrarily chosen by the C runtime. hi is a pointer set to the address of the capital G in your string. You own all the memory from hi up to and including the nul-terminator at the end of that string.
Also, use const char *hi = "Greeting!"; rather than char *: the memory starting at hi is read-only. Don't try to modify the string: the behaviour on attempting to do that is undefined.
The "random numbers" you got are the Memory addresses. They are not constant, since on each execution of your program, other Memory addresses are used.
A pointer could be represented in several ways. The format string "%p" "writes an implementation defined character sequence defining a pointer" link. In most cases, it's the pointed object's address interpreted as an appropriately sized unsigned integer, which looks like "a bunch of random number".
A user readable pointer representation is generally only useful for debugging. It allows you to compare two different representations (are the pointers the same?) and, in some cases, the relative order and distance between pointers (which pointer comes "first", and how far apart are they?). Interpreting pointers as integers works well in this optic.
It would be helpful to us if you could clarify what output you expected. Perhaps you expected pointers to be zero-based?
Note that, while some compilers might accept your example, it would be wiser to use const char *. Using a char * would allow you to try to modify your string literal, which is undefined behavior.
This is a general question about C.(I dont have a lot of experience coding in C)
So, if I have a function that takes a char* as an argument. How to know whether its a pointer to a single char or a char array, because if it's a char array I can expect a \0 but if it's not a char array then I wouldn't want to search for \0.
Is char* in argument a pointer to a single char or a char array?
Yes.
A parameter of type char* is always a pointer to a char object (or a null pointer, not pointing to anything, if that's what the caller passes as the corresponding argument).
It's not a pointer to an array (that would be, for example, a pointer of type char(*)[42]), but the usual way to access the elements of an array is via a pointer to the element type, not to the whole array. Why? Because an actual pointer-to-array must always specify the length of the array (42 in my example), which is inflexible and doesn't let the same function deal with arrays of different lengths.
A char* parameter can be treated just as a pointer to a single char object. For example, a function that gets a character of input might be declared like this:
bool get_next_char(char *c);
The idea here is that the function's result tells you whether it was successful; the actual input character is "returned" via the pointer. (This is a contrived example; <stdio.h> already has several functions that read characters from input, and they don't use this mechanism.)
Compare the strlen function, which computes the length of a string:
size_t strlen(const char *s);
s points to the first element of an array of char; internally, strlen uses that pointer to traverse the array, looking for the terminating '\0' character.
Ignoring the const, there's no real difference between the char* parameters for these two functions. In fact, C has no good way to distinguish between these cases: a pointer that simply points to a single object vs. a pointer that points to the first element of an array.
It does have a bad way to make that distinction. For example, strlen could be declared as:
size_t strlen(const char s[]);
But C doesn't really have parameters of array type at all. The parameter declaration const char s[] is "adjusted" to const char *s; it means exactly the same thing. You can even declare a length for something that looks like an array parameter:
void foo(char s[42]);
and it will be quietly ignored; the above really means exactly the same thing as:
void foo(char *s);
The [42] may have some documentation value, but a comment has the same value -- and the same significance as far as the compiler is concerned.
Any distinction between a pointer to a single object and a pointer to the first element of an array has to be made by the programmer, preferably in the documentation for the function.
Furthermore, this mechanism doesn't let the function know how long the array is. For char* pointers in particular, it's common to use the null character '\0' as a marker for the end of a string -- which means it's the callers responsibility to ensure that that marker is actually there. Otherwise, you can pass the length as a separate argument, probably of type size_t. Or you can use any other mechanism you like, as long as everything is done consistently.
... because if it's a char array I can expect a \0 ...
No, you can't, at least not necessarily. A char* could easily point to the first element of a char array that's not terminated by a '\0' character (i.e., that doesn't contain a string). You can impose such a requirement if you like. The standard library functions that operate on strings impose that requirement -- but they don't enforce it. For example, if you pass a pointer to an unterminated array to strlen, the behavior is undefined.
Recommended reading: Section 6 of the comp.lang.c FAQ.
You cannot determine how many bytes are referenced by a pointer. You need to keep track of this yourself.
It is possible that a char array is NOT terminated with a \0 in which case you need to know the length of the array. Also, it is possible for an array to have a length of 1, in which case you have one character with no terminating \0.
The nice thing about C is that you get to define the details about data structures, thus you are NOT limited to a char array always ending with \0.
Some of the terms used to describe C data structures are synonymous. For example, an array is sequential series of data elements, an array of characters is a string, and a string can be terminated with a null char (\0).
Guys i have few queries in pointers. Kindly help to resolve them
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
question.1 : what is the difference between these 2 types of declaration ?
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
question.2 : i didn't get how is it working
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
question.3 how many bytes are being allocated to the pointer c?
when i try to input a string, it takes just a word and not the whole string. why so ?
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
question.4 when i try to input a charcter why does it throw a segmentation fault?
Thanks in advance.. Waiting for your reply guys..
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
the %s expects a pointer to array of chars.
char *c=malloc(sizeof(char)); // you are allocating only 1 byte aka char, not array of char!
scanf("%s",c); // you need pass a pointer to array, not a pointer to char
printf("%s",c);// you are printing a array of chars, but you are sending a char
you need do this:
int sizeofstring = 200; // max size of buffer
char *c = malloc(sizeof(char))*sizeofstring; //almost equals to declare char c[200]
scanf("%s",c);
printf("%s",c);
question.3 how many bytes are being allocated to the pointer c? when i
try to input a string, it takes just a word and not the whole string.
why so ?
In your code, you only are allocating 1 byte because sizeof(char) = 1byte = 8bit, you need allocate sizeof(char)*N, were N is your "string" size.
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
Here you are declaring two variables, a and b, and initializing them. "this is an array of characters" is a string literal, which in C has type array of char. a has type array of char. In this specific case, the array does not get converted to a pointer, and a gets initialized with the array "this is an array of characters". b has type pointer to char, the array gets converted to a pointer, and b gets initialized with a pointer to the array "this is an array of characters".
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
In an expression, *b dereferences the pointer b, so it evaluates to the char pointed by b, i.e: T. This is not an address (which is what "%s" is expecting), so you get undefined behavior, most probably a crash (but don't try to do this on embedded systems, you could get mysterious behaviour and corrupted data, which is worse than a crash). In the second case, %s expects a pointer to a char, gets it, and can proceed to do its thing.
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
In C, sizeof returns the size in bytes of an object (= region of storage). In C, a char is defined to be the same as a byte, which has at least 8 bits, but can have more (but some standards put additional restrictions, e.g: POSIX requires 8-bit bytes, i.e: octets). So, you are allocating 1 byte. When you call scanf(), it writes in the memory pointed to by d without restraint, overwriting everything in sight. scanf() allows maximum field widths, so:
Allocate more memory, at least enough for what you want + 1 terminating ASCII NUL.
Tell scanf() to stop, e.g: scanf("%19s") for a maximum 19 characters (you'll need 20 bytes to store that, counting the terminating ASCII NUL).
And last (if markdown lets me):
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
c is not a pointer, so you are trying to store an address where you shouldn't. In scanf, "%c" expects a pointer to char, which should point to an object (=region of storage) with enough space for the specified field width, 1 by default. Since c is not a pointer, the above may crash in some platforms (and cause worse things on others).
I see several problems in your code.
Question 1: The difference is:
a gets allocated in writable memory, the so-called data segment. Here you can read and write as much as you want. sizeof a is the length of the string plus 1, the so-called string terminator (just a null byte).
b, however, is just a pointer to a string which is located in the rodata. That means, in a data area which is read only. sizeof b is whatever is the pointer size on your system, maybe 4 or 8 on a PC or 2 on many embedded systems.
Question 2: The printf() format wants a pointer to a string. With *b, you dereferene the pointer you have and give it the first byte of data, which is a t (ASCII 84 or something like that). The callee, however, treats it as a pointer, dereferences it and BAM.
With b, however, everything goes fine, as it is exactly the right call.
Question 3: malloc(sizeof(char)) allocates exactly one byte. sizeof(char) is 1 by definition, so the call is effectively malloc(1). The input just takes a word because %s is defined that way.
Question 4:
char c=malloc(sizeof(char)); // 4)
shound give you a warning: malloc() returns a pointer which you try to put into a char. ITYM char *...
As you continue, you give that pointer to scanf(), which receives e.g. instead of 0x80043214 a mere 0x14, interprets it as a pointer and BAM again.
The correct way would be
char * c=malloc(1024);
scanf("%1024s", c);
printf("%s", c);
Why? Well, you want to read a string. 1 byte is too small, better allocate more.
In scanf() you should take care that you don't allow reading more than your buffer can hold - thus the limitation in the format specifier.
and on printing, you should use %s, because you want the whole string to be printed and not only the first character. (At least, I suppose so.)
Ad Q1: The first is an array of chars with a fixed pointer a pointing to it. sizeof(a) will return something like 20 (strlen(a)+1). Trying to assign something to a (like a = b) will fail, since a is fixed.
The second is a pointer pointing to an array of char and hence is the sizeof(b) usually 4 on 32-bit or 8 on 64-bit. Assigning something to b will work, since the pointer can take a new value.
Of course, *a or *b work on both.
Ad Q2: printf() with the %s argument takes a pointer to a char (those are the "strings" in C). Hence, printf("%s", *b) will crash, since the "pointer" used by printf() will contain the byte value of *b.
What you could do, is printf("%c", *b), but that would only print the first character.
Ad Q3: sizeof(char) is 1 (by definition), hence you allocate 1 byte. The scanf will most likely read more than one byte (remember that each string will be terminated by a null character occupying one char). Hence the scanf will trash memory, likely to cause memory sometime later on.
Ad 4: Maybe that's the trashed memory.
Both declaration are the same.
b point to the first byte so when you say *b it's the first character.
printf("%s", *b)
Will fail as %s accepts a pointer to a string.
char is one byte.