strcpy() on a string inside a struct - undefined? [duplicate] - arrays

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
(19 answers)
Closed 2 years ago.
I have a struct defined similar to the following:
struct csv_headers {
char field1[256];
char field2[256];
};
And then later in my code I have:
struct csv_headers fields;
strcpy(fields.field1, str_ary[0]);
strcpy(fields.field2, str_ary[1]);
Where str_ary[n] is some string.
Does this cause undefined behaviour the same way the following code would?
char str[] = "some text";
strcpy(str, "text");
Since strcpy should not be used for string literals?
If so, what is the accepted way to copy a string into a string declared inside a struct? My code example above compiles with no warnings using -Wextra (assuming I typed it correctly - I don't have the code in front of me right now).
I am aware of the potential dangers of using strcpy and don't wish to discuss that here, that is for illustrative purposes only. I just want to know the accepted way of copying a string into a struct containing strings.

Does this cause undefined behaviour the same way the following code would?
char str[] = "some text";
strcpy(str, "text");
Since strcpy should not be used for string literals?
str isn’t a string literal, it’s a char[10] (initialised by copying data from a string literal [1]), and it’s a valid target for strcpy. This code does not cause undefined behaviour; it’s well-defined and valid.
Likewise, fields.field1 and fields.field2 in your example are regular char[]s. That they’re members of a struct isn’t important in this context.
However, be sure that the source is valid: str_ary[0] looks fishy unless str_ary is an array of zero-terminated strings, i.e. something like char *str_ary[] or char str_ary[N][M], with each string no longer than 255 characters.
[1] The initialisation of a char array from a string literal happens as if by strcpy, i.e. it’s almost exactly the same as writing
char str[sizeof("some text")];
strcpy(str, "some text");
In fact, compilers could (and at least in some cases do) generate the same code for this initialisation as for the direct assignment. By contrast, char *str = "some text"; doesn’t perform copying, and here str is a pointer to a string literal, and therefore read-only, and not a valid target for strcpy.
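To make the struct case concrete, here is a minimal sketch of a length-checked copy into a fixed-size member. The helper name set_field is hypothetical (it is not from the question); it only illustrates that a struct member array is an ordinary writable char array and that the source must fit, including the terminator.

```c
#include <string.h>

struct csv_headers {
    char field1[256];
    char field2[256];
};

/* Hypothetical helper (not from the question): copies src into a
   fixed-size char array. Returns 0 on success, or -1 if src plus its
   terminating '\0' would not fit, in which case dst is left untouched. */
static int set_field(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* +1 also copies the terminating '\0' */
    return 0;
}
```

It would be called as set_field(fields.field1, sizeof fields.field1, str_ary[0]), so the size always comes from the member itself.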

Does this cause undefined behaviour the same way the following code would?
Well, your snippet does not cause problems.
char str1[] = "foo"; // str1 is a regular array initialized to "foo"
char *str2 = "bar"; // str2 is a pointer, pointing to a string literal
strcpy(str1, "FOO"); // Ok
strcpy(str2, "BAR"); // Undefined behavior

As long as the string fits in the target space, even
char str[] = "some text";
strcpy(str, "text");
is well-defined.
char field1[256]; declares an array with room for 256 characters. This allows you to store a null-terminated string of up to 255 characters (the array size, minus one for the null terminator).
char str[256] = "some text"; additionally initializes the array to contain the string "some text". You can still copy a string of up to 255 characters into the array, since that's how much room there is.
char str[] = "some text"; is exactly identical to char str[10] = "some text";. With the empty brackets, str is still an array of characters. What the empty brackets mean is that the size of the array (what would normally go inside the brackets) is determined automatically. There is no other difference compared with brackets with a size in it. For an array of characters, the size is the length of the string plus one for the null terminator.
In all cases involving an array, the string used to initialize the array (if there is one) is stored in the array itself. It isn't accessible in any other way, and if you change the array, the string that was used to initialize it is not present in memory anymore (at least in the sense that there's no way to find it — there's no guarantee that it won't be present in a memory dump, but you have no way to access it from your program).
The critical difference is not between char str[10] and char str[], but between char str[…] and char *p. With a * instead of brackets, char *p declares a pointer, not an array.
char *p = "some text";
does two things. One thing is that it arranges for a string literal "some text" to be stored somewhere that's accessible for the whole duration of the program. Another thing is that it declares a variable p that points to this string literal. You cannot modify the memory where "some text" is stored (in practice, on many platforms, it's in read-only memory). On the other hand, you can make something else point to that memory, and you can change p to point to something else.
char *p = "some text";
char *q;
puts(p); // prints "some text"
q = p;
p = "hello";
puts(q); // prints "some text"
puts(p); // prints "hello"
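The array side of this comparison can be checked directly. The following is a small sketch (the function names are made up for illustration) showing that the empty brackets yield a 10-byte array for "some text" and that overwriting it with a shorter string is fine.

```c
#include <string.h>

/* Size of an array whose length the compiler deduces from the initialiser. */
static size_t deduced_size(void)
{
    char str[] = "some text";   /* 9 characters plus the terminating '\0' */
    return sizeof str;
}

/* Overwrites the array with a shorter string and returns the new length. */
static size_t overwrite_array(void)
{
    char str[] = "some text";
    strcpy(str, "text");        /* 5 bytes into a 10-byte array: it fits */
    return strlen(str);
}
```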

Related

How can i copy to string literal in C [duplicate]

This question already has answers here:
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
(19 answers)
Closed 4 years ago.
char temp1[4] = "abc";
char *temp2 = "123";
strcpy(temp1,temp2);
If I want to copy a string literal to an array, it works well, but if I do it the other way round, I get an error:
char temp1[4] = "abc";
char *temp2 = "123";
strcpy(temp2,temp1);
The program compiles, but at run time it fails with "Segmentation fault".
So what's the difference? Is there any way to copy a string to a string literal?
Thx.
You need to understand the subtle difference between these 2 lines
char temp1[4] = "abc";
char *temp2 = "123";
The first one creates a 4 character variable and copies "abc\0" to it.
You can overwrite that if you want to. You can do e.g. temp1[0] = 'x' if you want.
The second one creates a pointer that points to the constant literal "123\0".
You cannot overwrite that; it's typically in memory that is declared read-only to the OS.
What you have is more complicated than a string literal and what you attempt to do cannot be described as "copying to string literal". Which is good, because copying to a string literal is literally impossible. (Excuse the pun.)
First, what you are successfully doing in the first code quote is copying from a string literal into an array of chars of size 4 (you knew that). You are however doing this with the added detail of copying via a pointer to that string literal (temp2). Also note that what the pointer is pointing to is not a variable which can be edited in any way. It is "just a string which the linker knows about".
In the second code quote you attempt to copy a string (strictly speaking a zero-terminated sequence of chars which is stored in an array, temp1, but not a string literal) to a place where a pointer to char (temp2) points to, but which happens not to be a variable which is legal to write to.
The types of the involved variables allow such an operation basically, but in this case it is forbidden/impossible; which causes the segmentation fault.
Now what IS possible and might be what you actually attempt, is to repoint temp2 to the address at the beginning of temp1. I believe that is what gives you the desired effect:
char temp1[4] = "abc";
char *temp2 = "123";
/* Some code, in which temp2 is used with a meaningful
initialisation value, which is represented by "123".
Then, I assume you want to change the pointer, so that it points
to a dynamically determined string, which is stored in a changeable
variable.
To do that: */
temp2=temp1;
/* But make sure to keep the variable, the name you chose makes me worry. */
Note that an array identifier can be used as a pointer to the type of the array entries.
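As a rough sketch of that last point (the function name is hypothetical): once temp2 has been repointed at writable storage, copying through it is fine, because the destination is now the array rather than the literal.

```c
#include <string.h>

/* buf must have room for at least 4 bytes. Returns the pointer that was
   used for the copy, which ends up equal to buf. */
static char *copy_through_pointer(char *buf)
{
    char *temp2 = "123";   /* points at a string literal: must not be written */
    temp2 = buf;           /* repointed: now refers to writable storage */
    strcpy(temp2, "xyz");  /* fine: 4 bytes into the caller's array */
    return temp2;
}
```

When the caller passes temp1, the array name is converted to a pointer to its first element, which is why temp2 = temp1 works in the snippet above.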

I don't understand the difference between a char array and a char * string [duplicate]

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Closed 7 years ago.
I made the following little program:
#include <stdio.h>
void strncpy(char * s, char * t, int n);
int
main()
{
char string1[]="Learning strings";
char string2[10];
strncpy(string2,string1,3);
printf("string1:%s\nstring2:%s\n",string1,string2);
return 0;
}
void strncpy(char * s, char * t, int n)
{
int i;
for(i=0; i<n && t[i]!=0;i++)
s[i]=t[i];
s[i]=0;
}
I was trying to learn the difference between doing something like:
char greeting[]="Hello!";
And
char * farewell="Goodbye!";
And I thought my program would work with either of the two types of 'strings'(correct way of saying it?), but it only works with the first one.
Why does this happen? What's the difference between the two types?
What would I have to do to my program to be able to use strings of the second type?
The statement
char greeting[] = "Hello!";
causes the compiler to work out the size of the string literal "Hello!" (7 characters including the terminating '\0'), create an array of that size, and then copy that string into that array. As a result, greeting can be modified (e.g. its characters overwritten).
The statement
char * farewell="Goodbye!";
creates a pointer that points at the first character in the string literal "Goodbye!". That string literal cannot be modified without invoking undefined behaviour.
Either greeting or farewell can be passed to any function that does not attempt to modify them. greeting can also be passed to any function which modifies it (as long as only characters greeting[0] through to greeting[6] are modified, and no others). If farewell is modified, the result is undefined behaviour.
Generally speaking, it is better to change the definition of farewell to
const char * farewell="Goodbye!";
which actually reflects its true nature (and will, for example, cause a compilation error if farewell is passed to a function expecting a non-const parameter). The fact that it is possible to define farewell as a non-const pointer while it points at (the first character of) a string literal is a historical anomaly.
And, of course, if you want farewell to be safely modifiable, declare it as an array, not as a pointer.
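A small sketch of how const in the parameter type documents this distinction. Both function names are made up for illustration: one only reads and so accepts a literal or an array, the other writes and so needs writable storage such as greeting.

```c
#include <stddef.h>

/* Read-only access: safe for string literals and arrays alike. */
static size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* Writes through s: the caller must pass writable storage, e.g. an array. */
static void replace_char(char *s, char from, char to)
{
    for (; *s != '\0'; ++s)
        if (*s == from)
            *s = to;
}
```

With farewell declared as const char *, passing it to replace_char draws a compiler diagnostic instead of failing at run time.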
The string literals "Hello" and "Goodbye" are stored as arrays of char such that they are allocated at program startup and released at program exit, and are visible over the entire program. They may be stored in such a way that they cannot be modified (such as in a read-only data segment). Attempting to modify the contents of a string literal results in undefined behavior, meaning the compiler isn't required to handle the situation in any particular way - it may work the way you want, it may result in a segmentation violation, or it may do something else.
The line
char greeting[] = "Hello";
allocates enough space to hold a copy of the literal and writes the contents of the literal to it. You may modify the contents of this array at will (although you can't store strings longer than "Hello" to it).
The line
char *farewell = "Goodbye";
creates a pointer and writes the address of the string literal "Goodbye" to it. Since this is a pointer to a string literal, we cannot write to the contents of the literal through that pointer.

strcpy behaving differently when two pointers are assigned strings in different ways

I am sorry, I might be asking a dumb question, but I want to understand: is there any difference between the assignments below? strcpy works in the first case but not in the second case.
char *str1;
*str1 = "Hello";
char *str2 = "World";
strcpy(str1,str2); //Works as expected
char *str1 = "Hello";
char *str2 = "World";
strcpy(str1,str2); //SEGMENTATION FAULT
How does compiler understand each assignment?Please Clarify.
Edit: In the first snippet you wrote *str1 = "Hello", which is equivalent to assigning to str1[0]. That is obviously wrong, because str1 is uninitialized and therefore an invalid pointer. If we assume that you meant str1 = "Hello", then you are still wrong:
According to C specs, Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals so both snippets that you provided will yield undefined behavior.
I can only guess that in the second snippet the compiler is storing the string in some read-only storage, while in the first one it doesn't, so it works, but it's not guaranteed.
Sorry, both examples are very wrong and lead to undefined behaviour, that might or might not crash. Let me try to explain why:
In the first snippet, str1 is an uninitialized pointer. It points to some arbitrary place in memory, so writing through it can have arbitrary consequences: a crash, or overwriting other data in memory (e.g. other local variables or variables in other functions; anything is possible).
The line *str1 = "Hello"; is also wrong (even if str1 were a valid pointer), as *str1 has type char (not char *) and is the first character that str1 points to. You are assigning it a pointer ("Hello", type char *), which is a type error that your compiler will tell you about.
In the second snippet, str1 is a valid pointer but points to a string literal, which presumably lives in read-only memory (hence the crash). Normally, constant strings are stored in read-only data in the binary; you cannot write to them, but that's exactly what strcpy(str1,str2); does.
A more correct example of what you want to achieve might be (with an array on the stack):
#define STR1_LEN 128
char str1[STR1_LEN] = "Hello"; /* array with space for 128 characters */
char *str2 = "World";
strncpy(str1, str2, STR1_LEN);
str1[STR1_LEN - 1] = 0; /* be sure to terminate str1 */
Other option (with dynamically managed memory):
#define STR1_LEN 128
char *str1 = malloc(STR1_LEN); /* allocate dynamic memory for str1 */
char *str2 = "World";
/* we should check here that str1 is not NULL, which would mean 'out of memory' */
strncpy(str1, str2, STR1_LEN);
str1[STR1_LEN - 1] = 0; /* be sure to terminate str1 */
free(str1); /* free the memory for str1 */
str1 = NULL;
EDIT: @chqrlie requested in the comments that the #define should be named STR1_SIZE, not STR1_LEN, presumably to reduce confusion, because it's not the length in characters of the "string" but the size of the buffer allocated. Furthermore, @chqrlie requested not to give examples with the strncpy function. That wasn't really my choice, as the OP used strcpy, which is very dangerous, so I picked the closest function that can be used correctly. But yes, I should probably have added that the use of strcpy, strncpy, and similar functions is not recommended.
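For completeness, one possible alternative to the strncpy-plus-manual-terminator pattern is snprintf, which always terminates the destination when the size is non-zero. This is only a sketch, and copy_truncating is a made-up name, not something from the answer above.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst, truncating if necessary.
   dst is always '\0'-terminated when dst_size > 0.
   Returns 1 if the string had to be truncated (or on error), else 0. */
static int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed < 0 || (size_t)needed >= dst_size;
}
```

The return value lets the caller detect truncation, which strncpy does not report.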
There seems to be some confusion here. Both fragments invoke undefined behaviour. Let me explain why:
char *str1; defines a pointer to characters, but it is uninitialized. If this definition occurs in the body of a function, its value is invalid. If this definition occurs at the global level, it is initialized to NULL.
*str1 = "Hello"; is an error: you are assigning a string pointer to the character pointed to by str1. str1 is uninitialized, so it does not point to anything valid, and you cannot assign a pointer to a character. You should have written str1 = "Hello";. Furthermore, the string "Hello" is constant, so the definition of str1 really should be const char *str1;.
char *str2 = "World"; Here you define a pointer to a constant string "World". This statement is correct, but it would be better to define str2 as const char *str2 = "World"; for the same reason as above.
strcpy(str1,str2); //Works as expected NO it does not work at all! str1 does not point to a char array large enough to hold a copy of the string "World" including the final '\0'. Given the circumstances, this code invokes undefined behaviour, which may or may not cause a crash.
You mention the code works as expected, but it only appears to. What really happens is this: str1 is uninitialized. If it pointed to an area of memory that cannot be written, writing to it would likely have crashed the program with a segmentation fault. If it happens to point to an area of memory where you can write, the statement *str1 = "Hello"; modifies the first byte of this area, and strcpy(str1, "World"); then modifies the first 6 bytes at that place. The string pointed to by str1 will then be "World", as expected, but you have overwritten some area of memory that may be used for other purposes, and your program may consequently crash later in unexpected ways: a very hard-to-find bug! This is definitely undefined behaviour.
The second fragment invokes undefined behaviour for a different reason:
char *str1 = "Hello"; No problem, but should be const.
char *str2 = "World"; OK too, but should also be const.
strcpy(str1,str2); //SEGMENTATION FAULT Of course it is invalid: you are trying to overwrite the constant character string "Hello" with the characters from the string "World". It would work if the string constant were stored in modifiable memory, and would cause even greater confusion later in the program, as the value of the string constant would have changed. Luckily, most modern environments prevent this by storing string constants in read-only memory. Trying to modify said memory causes a segmentation violation, i.e. you are accessing a memory segment in a way that is not permitted.
You should use strcpy() only to copy strings to character arrays you define as char buffer[SOME_SIZE]; or allocate as char *buffer = malloc(SOME_SIZE); with SOME_SIZE large enough to hold what you are trying to copy plus the final '\0'.
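As a sketch of the malloc variant, the buffer size can be derived from the source string instead of a fixed SOME_SIZE. The helper name copy_string is hypothetical; it simply sizes the allocation as strlen(src) + 1 so the final '\0' always fits.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of src, or NULL if
   the allocation fails. The caller is responsible for calling free(). */
static char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the final '\0' */
    char *buffer = malloc(size);
    if (buffer == NULL)
        return NULL;
    memcpy(buffer, src, size);       /* copies the terminator too */
    return buffer;
}
```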
Both snippets are wrong, even if "it works" in your first case. Hopefully this is only an academic question! :)
First let's look at *str1 which you are trying to modify.
char *str1;
This declares an uninitialized pointer, that is, a pointer holding some unspecified address in memory. Here the program is simple and nothing important is at stake, but you could have modified very critical data!
char *str = "Hello";
This declares a pointer to a string literal, which typically lives in a protected section of memory that even the program itself cannot change during execution; attempting to write there is what triggers the segmentation fault.
To use strcpy(), the first argument must point to writable memory large enough for the copy, such as a char array or a buffer dynamically allocated with malloc(). In fact, don't use strcpy(); learn to use strncpy() instead, because it bounds the number of bytes written (but remember that it may leave the result unterminated).

Confusion in basic concept of pointers and strings [duplicate]

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Closed 9 years ago.
In my GCC 32-bit compiler, the following code gives the output
char *str1="United";
printf("%s",str1);
output:
United
then should I consider char *str1="United"; the same as char str1[]="United"?
The two are not the same: char *str1="United" gives you a pointer to a literal string, which must not be modified (else it's undefined behavior). The type of a literal string should really be const but it's non-const for historical reasons. On the other hand, char str1[]="United" gives you a modifiable local string.
char* str = "United"; is read-only. You wouldn't be able to reach inside the string and change parts of it:
*str = 'A';
Will most likely give you a segmentation fault.
On the other hand char str1[] = "United"; is an array and so it can be modified as long as you don't exceed the space allocated for it (arrays cannot be resized). For example this is perfectly legal:
char str[] = "United";
str[0] = 'A';
printf("%s\n", str);
This will print Anited.
See the comp.lang.c.faq, question 1.32. It basically boils down to the fact that declaring the string in array form (char str[] = "foo") is identical to char str[] = {'f','o','o','\0'}, which (in ASCII) is identical to char str[] = {102, 111, 111, 0}; that is, it is a normal array, on the stack if it is a local variable. But when you use a string literal in any other context it becomes "an unnamed, static array of characters, [which] may be stored in read-only memory, and which therefore cannot necessarily be modified." (And trying to modify it results in undefined behavior wherever it happens to be stored, so don't).
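The equivalence of the two initialiser forms can be checked with a short sketch (the function name is made up for illustration). Note the explicit '\0' in the brace form, which the string-literal form adds implicitly.

```c
#include <string.h>

/* Returns 1 if the string-literal form and the brace form produce
   arrays of the same size with the same contents. */
static int same_initialisers(void)
{
    char a[] = "foo";
    char b[] = {'f', 'o', 'o', '\0'};
    return sizeof a == sizeof b && memcmp(a, b, sizeof a) == 0;
}
```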

Difference between char *str = "…" and char str[N] = "…"? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What is the difference between char s[] and char *s in C?
Question about pointers and strings in C
I'm reading about the strings in C and I'm confused. I can "declare" strings in two ways:
char *str = "This is string";
char str2[20] = "This is string";
What is the difference between the two declarations? When would char str2[20] be preferred over char *str?
In C, strings are represented as sequences of chars terminated by a null character (aka 0, '\0'). They are stored in memory and you work with a way of referencing them. You have identified the two ways of referencing them: a char *, which is a pointer to a sequence of chars, and an array, which holds the chars directly in an actual variable. Be aware that the string "abc" is 4 bytes long, as there is an additional null character to mark the end of the string.
In addition to this, you are initialising from string literals in your example, which means the strings are fixed at compile time.
So there are two questions. The first is about how you represent strings (char * vs char[]); the second is about compile-time strings.
To come to your examples:
The first one creates a constant string in the text of the program and a pointer to it. Depending on the compiler it may be stored anywhere. It is the equivalent of mallocing a string and storing a pointer to it, except you must not change the contents of the memory. It is a char *, so you can change the pointer to point to somewhere else, like another malloced string or to the start of an array that you defined in example 2.
The second creates a char array (which is a way of representing a string). The array is stored and allocated on the stack for the duration of the function, and you may change the contents. Because it is not a pointer, you cannot change it to point to a different string.
char *str = "This is string";
Puts the string in the constant data section (also known as .rdata) of the program. This data can't be modified.
char str2[20] = "This is string";
In this type of declaration the data is typically stored in the stack area of the program if declared inside a function, and in the data section if declared at global scope. This data can be modified.
So if you have a necessity to modify data then use the second approach.
C has no dedicated string type; there are only char arrays. In most expressions, an array is converted to a pointer to its first element.
The easiest way of doing it is in fact your first variant. Not specifying an explicit length of the array for literals will save you from accidentally doing something like
char str[3] = "abc"; /* no room for the terminating '\0' */
String literals are constant in memory, so:
char *str = "This is string";
Stores "This is string" in memory and it is not mutable, you can only assign another address to str.
However
char str2[20] = "This is string";
is a shorthand of
char str2[20]={'T','h','i','s',' ','i','s',' ','s','t','r','i','n','g','\0'};
which does not store a pointer to a string literal; it stores the individual bytes in the array itself.
If you want to use constant strings like messages, then use first line.
If you want to use and manipulate strings like in a word processor then use second.
Regards
char *str = "This is string"; - This will keep the string in text segment as a read only data and it will store the address in the local pointer variable str.
str[0] = 'a'; //This will likely crash, because string literals are typically in a read-only segment.
printf("%zu",sizeof(str)); //This will print 4 (on a 32-bit machine) or 8 (on a 64-bit machine)
char str2[20] = "This is string"; - This will keep the string as character array in local stack.
str2[0] = 'a'; //This will change the first character to a
printf("%zu",sizeof(str2)); //This will print 20
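The sizeof difference can be sketched without relying on a particular machine word size. The function names here are made up for illustration; the point is that sizeof on the array gives the full 20 bytes, while sizeof on the pointer gives only the size of a pointer, whatever the string length.

```c
#include <stddef.h>

/* sizeof applied to the array: the whole 20-byte object. */
static size_t array_size(void)
{
    char str2[20] = "This is string";
    return sizeof str2;
}

/* sizeof applied to the pointer: just the pointer, not the string. */
static size_t pointer_size(void)
{
    const char *str = "This is string";
    return sizeof str;
}
```

When printing these values, %zu is the printf conversion that matches size_t.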
