Explanation of how the memcpy function behaves [duplicate] - c

This question already has answers here:
No out of bounds error
(7 answers)
Closed 5 years ago.
#include <stdio.h>
#include <string.h>
char lists[10][25];
char name[10];
void main()
{
scanf("%s" , lists[0]);
memcpy(name , lists[0], 25);
printf("%s\n" , name);
}
In the above code I declare the character array "name" with a size of 10.
When I gave the input:
Input - abcdefghijklmnopqrstuvwxy
the output I got was the same string: abcdefghijklmnopqrstuvwxy
Shouldn't I get the output abcdefghij instead?
How is this possible even though the size of the array is limited to 10?

Because memcpy doesn't know the size of the memory it's writing into. Copying 25 bytes into a 10-byte array is undefined behavior; you simply got away with where the extra data was written. You might not on another platform, with a different compiler, or with different optimisation settings.
When passing the size parameter to memcpy(), take the size of the destination memory into account.
When using char arrays, strncpy() lets you cap the number of bytes written so you don't overrun the destination. Note, however, that it does not add the terminating null character when the source is at least as long as the limit, so you still have to set the last byte of the destination to '\0' yourself.
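A minimal sketch of a size-aware copy, assuming a hypothetical helper named copy_bounded (it is not a standard library function). It truncates the source when needed and always writes the terminator inside the destination:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without writing more than
 * dst_size bytes, always null-terminates, and returns the number of
 * characters copied (excluding the terminator). */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size) {
        len = dst_size - 1;   /* truncate so the terminator still fits */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With char name[10], a call such as copy_bounded(name, sizeof name, lists[0]) would store at most 9 characters plus the '\0', so the 25-character input from the question would be cut to "abcdefghi".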

To start with, arrays are not pointers, but in most expressions an array name is converted to a pointer to its first element. C performs no bounds checks, unlike Java for example.
When you write char a[2]; the compiler reserves space in memory for 2 chars.
For example, let the memory be
|1|2|3|4|5|6|7|8|9|10|11|12|
 a
Here a starts at address 1. The statement a[0] = 0; is equivalent to *(a+0) = 0, meaning: write 0 to the address of a plus offset 0.
So if you try to write to an address that you have not allocated, unexpected things can happen.
For example, let's say we have char a[2]; char b[2]; and the memory map is
|1|2|3|4|5|6|7|8|9|10|11|12|
 a   b
Then a[2] = 0 may end up having the same effect as b[0] = 0. If the address falls outside the memory mapped to your process, a segmentation fault will typically be raised instead.
Try this program (it may only behave this way with compiler optimizations disabled):
#include <stdio.h>
#include <string.h>
char a[4];
char b[4];
int main(void)
{
scanf("%s" , a); // input "12345678" overflows a (undefined behavior)
printf("%s\n" , b); // may print "5678" if b happens to follow a in memory
}
memcpy just copies the number of bytes you asked for from one address to another.
In your example, you were lucky because all the addresses you accessed were assigned to your program (inside your process's memory pages).
In C/C++ you are responsible for handling memory correctly. Also, keep in mind that strings end with the character \0, so an array char str[10]; holds at most 9 characters plus the \0.
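As a rough illustration of the "9 characters plus the \0" rule, here is a sketch that reads a word into a 10-byte buffer with a width limit. The helper name read_word10 is made up for this example, and sscanf is used instead of scanf only so the input can be passed as a string; the same "%9s" format works with scanf.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word from `input`
 * into a 10-byte buffer. The width 9 in "%9s" leaves room for the '\0'.
 * Returns 1 if a word was stored, 0 otherwise. */
int read_word10(const char *input, char out[10])
{
    assert(input != NULL && out != NULL);
    return sscanf(input, "%9s", out) == 1;
}
```

Given the 25-character input from the question, only the first 9 characters are stored and the rest is left unread, so the buffer is never overrun.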

Related

Why do I need "&" in printf when I want to print a pointer

So I wrote this code where I scan 2 strings. One is declared as an array and one as a pointer.
Now to my question: why do I need the "&" before Text2 in the printf statement when printing Text2, but not when printing Text1?
I thought that if I put "&" before the variable in printf, it prints the memory address. In this case it doesn't; it prints the string.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char Text1[45];
char *Text2;
scanf("%s" , &Text1);
printf("Text1: %s\n", Text1);
scanf("%s" , &Text2);
printf("Text2: %s\n", &Text2);
return 0;
}
char Text1[45] is an array of characters. The compiler reserves 45 bytes of program memory for it. The values of those bytes are not known at this point. scanf("%s", Text1) puts the input characters into this memory; that is fine as long as there are at most 44 of them (plus the terminating \0), otherwise it overruns the array, corrupts the stack and possibly crashes. To prevent this, use a width limit such as %44s.
There is no need to use & in this case. For an array declared this way, &Text1 yields the same address as Text1 but with a different type (char (*)[45] rather than char *), which is not what %s expects. It usually appears to work, but it is better to leave the & out, both in scanf and in printf("%s\n", Text1).
char *Text2 declares a pointer variable. It means that the compiler allocates enough space to contain the pointer value. The value of the pointer is not defined at this point, so it does not point anywhere valid. If you plan to use it with characters, you need to allocate space for them or assign the space in a different way. For example, Text2 = malloc(45) will allocate 45 bytes for use and set the pointer to those bytes. Or you can do Text2 = Text1, assigning the address of the first byte of the Text1 array to the pointer. This way the Text1 array will be used as the byte storage.
As a result, scanf("%s", Text2) will use the pointer to access bytes, either allocated by malloc or in Text1. Now you need to printf("%s\n", Text2).
You should not use & on Text2. It yields the address of the pointer variable, not the address of the array of bytes, and you need the latter. So scanning or printing with &Text2 is undefined behavior: it may appear to work, print garbage, or crash.
BTW, if you used malloc it is a good idea to free the memory which was allocated if it is not needed any longer: free(Text2).
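The allocate, read, free sequence described above could look roughly like this. The helper name read_text and the macro TEXT_CAPACITY are assumptions made for the sketch, and sscanf stands in for scanf so the input can be supplied as a string:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TEXT_CAPACITY 45   /* assumption: same size as Text1 in the question */

/* Hypothetical helper: allocates a buffer, reads one word of at most
 * 44 characters from `input` into it, and returns the buffer.
 * Returns NULL on allocation or read failure; the caller must free(). */
char *read_text(const char *input)
{
    assert(input != NULL);

    char *text = malloc(TEXT_CAPACITY);
    if (text == NULL) {
        return NULL;
    }
    /* Pass the pointer itself (no &): it already holds the buffer address. */
    if (sscanf(input, "%44s", text) != 1) {
        free(text);
        return NULL;
    }
    return text;
}
```

A caller would print the result with printf("%s\n", text), again without &, and call free(text) once the string is no longer needed.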
Let's get rid of the part dealing with Text1 for the moment, and focus solely on Text2. That leaves us with something like this:
char *Text2;
scanf("%s" , &Text2);
printf("Text2: %s\n", &Text2);
You've declared Text2 as a pointer, but you haven't initialized it to point to any available space. Then you pass the address of that pointer to scanf, and match it up with a format that tells scanf to read a string, and deposit it at the specified location, so instead of using the pointer as a pointer, scanf will try to use it as if it were an array of char.
To make this work sanely, we want to use the pointer as a pointer, and have it point at some available memory--and we want to tell scanf the size of that memory, so the user can't enter more data than we've provided space to store.
#define MAXSIZE 128
char *Text2 = malloc(MAXSIZE);
scanf("%127s", Text2); // note lack of ampersand here
printf("%s\n", Text2); // Now we don't need an ampersand here either.
Your program is exhibiting undefined behavior and although it is mostly pointless to speculate about undefined behavior, it may be interesting to consider the following:
$ cat a.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(void)
{
char text1[128] = {0};
char *text2;
char text3[128] = {0};
scanf("%44s" , &text2); /* Undefined behavior */
printf("Text1: %s\n", text1);
printf("Text3: %s\n", text3);
return 0;
}
$ echo abcdefghijklmnopHelloWorld | ./a.out
bash: child setpgid (96724 to 96716): Operation not permitted
Text1:
Text3: HelloWorld
The behavior shown above indicates that on my platform, the text written starting at &text2 spills over into text3 (this is a stack buffer overflow). This is simply because (on my platform) the variables text2 and text3 are placed 16 bytes apart on the stack when the program executes. To reiterate, the behavior of the code is undefined and the actual result will vary greatly depending on where it is run, but despite mythical warnings about demons flying out of your nose, experimenting with it is not likely to cause any harm.

Why does this program NOT segfault? [duplicate]

This question already has answers here:
Accessing an array out of bounds gives no error, why?
(18 answers)
Closed 2 years ago.
Usually, this question is probably phrased in a positive way, becoming the next member in the club of duplicate questions - this one hopefully isn't. I have written a simple program to reverse a string in C. Here it is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char arr[4] = "TEST";
char rev[2];
int j = 0;
for(int i = 0; i < 4; i++) {
j = 4 - i - 1;
rev[j] = arr[i];
}
printf("%s\n",rev);
}
When I define char arr and char rev to be of size 4, everything works fine. When I leave arr size out I get unexpected repeat output like "TSTTST". When I define rev to be an array of 2 chars, I do not get a segfault, yet in the loop I am trying to access its third and fourth element. As far as my relatively limited understanding tells me, accessing the third element in an array of length two should segfault, right? So why doesn't it?
EDIT:
Interestingly enough, when I leave the loop out like so
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char arr[4] = "TEST";
char rev[2] = "00";
printf("%s\n",rev);
}
it prints "00TEST". What happened here? Some kind of overflow? I even restarted the terminal, recompiled and ran again.
EDIT 2:
I have been made aware that this is indeed a duplicate. However, most of the suggested duplicates referred to C++, which this isn't. I think this is a good question for new C programmers to learn about and understand undefined behavior. I, for one, didn't know that accessing an array out of bounds does not always cause a SEGFAULT. Also, I learned that I have to terminate string literals myself, which I falsely believed was done automatically. This is partly wrong: it is added automatically - the C99 Standard (TC3) says in 6.4.5 String literals that terminating nulls are added in translation phase 7. As per this answer and the answers for this question, char arrays are also null-terminated, but this is only safe if the array has the correct length (string length + 1 for null-terminator).
char rev[2] reserves 2*sizeof(char) bytes of memory for the array rev. You are accessing memory beyond what was reserved for the array. It may or may not cause errors.
It might appear to work fine, but it isn't very safe at all. By writing data outside the allocated block of memory you are overwriting some data you shouldn't. This is one of the greatest causes of segfaults and other memory errors, and what you're observing with it appearing to work in this short program is what makes it so difficult to hunt down the root cause.
When you access rev[2] or rev[3] you are touching the addresses rev + 2 and rev + 3, which are not part of rev. Since it's a small program and nothing critical happens to live there, it does not cause any visible errors.
Regarding the edit:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char arr[4] = "TEST";
char rev[2] = "00";
printf("%s\n",rev);
}
%s prints until a null character is encountered. The sizes you have given arr and rev leave no room for the null terminator, so try changing the declarations as follows:
char arr[5] = "TEST";
char rev[3] = "00";
The program will then work as intended: arr will contain TEST\0 and rev will contain 00\0, where \0 is the null character in C.
Give this article a read, it'll solve most of your queries.
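For the reversal loop from the question, a sketch with correctly sized buffers might look like the following. The function name reverse_into is hypothetical; the point is that the destination must hold strlen(src) + 1 bytes and that the terminator is written explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the reverse of src into dst.
 * dst_size must be at least strlen(src) + 1 so the '\0' fits.
 * Returns 0 on success, -1 if dst is too small. */
int reverse_into(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (dst_size < len + 1) {
        return -1;            /* refuse instead of writing out of bounds */
    }
    for (size_t i = 0; i < len; i++) {
        dst[len - 1 - i] = src[i];
    }
    dst[len] = '\0';
    return 0;
}
```

Declaring the source as char arr[] = "TEST"; lets the compiler size it to 5 bytes, and char rev[sizeof arr]; then gives the destination the same room.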

Why doesn't the C compiler throw an error if I try storing more values than I specified during dynamic allocation?

This is what I have tried. I have not even ended my string with a \0 character.
#include <stdio.h>
#include <malloc.h>
#include <string.h>
int main()
{
int size=5;
char *str = (char *)malloc((size)*sizeof(char));
*(str+0) = 'a';
*(str+1) = 'b';
*(str+2) = 'c';
*(str+3) = 'd';
*(str+4) = 'e';
*(str+5) = 'f';
printf("%d %s", (int)strlen(str), str);
return 0;
}
According to the rules, it should only be able to store 4 characters plus one for the \0, since that is the size I specified in malloc.
It gives me the perfect output.
Output
6 abcdef
Check this out here : https://onlinegdb.com/B1UeOXbjH
You allocated your own memory, so it is up to you to manage it responsibly. In your example, you allocated 5 bytes and created a pointer to the first of them. Your pointer is not a string, and it is not an array. You then wrote 6 bytes, starting at the address held in the pointer. The 6th byte overflows into memory you did not allocate, which may be used for something else, so the write could cause unknown problems. This is a buffer overflow and undefined behavior. You also never freed the memory you allocated before the program ended. And you didn't add a \0 anywhere, so I honestly think you lucked out: strlen() keeps reading until it happens to find a zero byte, so there really isn't any way to know what it would return. If you want C to handle it for you, then write char *str = "abcdef"; that creates a string of length 6 plus the \0 (which you must not modify). But if you do it manually like you did, then you have to handle everything.
C does not track the size of its arrays: if you ask for a chunk of memory of so-many bytes, it gives you that chunk via a pointer, and it's entirely up to you to manage it responsibly.
This leads to a certain efficiency, with no overhead from the compiler or runtime checking all this for you, but it creates enormous challenges when the code is incorrect (as in the example you've shown).
Many of us very much like the down-to-the-metal efficiency of C, but there's a reason that so many prefer languages such as Java or C# that do manage this for you and enforce array bounds. It's a tradeoff.
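A small sketch of handling it manually: allocate one byte more than the number of characters and write the terminator yourself. The helper first_letters is invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated string holding the first
 * n lowercase letters (n <= 26), or NULL on failure. Caller must free(). */
char *first_letters(size_t n)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";

    if (n > 26) {
        return NULL;
    }
    char *str = malloc(n + 1);      /* n characters plus the '\0' */
    if (str == NULL) {
        return NULL;
    }
    memcpy(str, letters, n);
    str[n] = '\0';                  /* terminate explicitly */
    assert(strlen(str) == n);       /* sanity check on the terminator */
    return str;
}
```

With this layout, strlen() on the result of first_letters(6) is well defined, because the seventh byte belongs to the allocation and holds the terminator.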

What is the difference of these array declarations? [duplicate]

This question already has answers here:
Difference between char* and char[]
(8 answers)
String Literals
(3 answers)
Closed 9 years ago.
#include <stdio.h>
#include <string.h>
int main(void){
char s1[30]="abcdefghijklmnopqrstuvwxyz";
printf("%s\n",s1);
printf("%s",memset(s1,'b',7));
getch();
return 0;
}
The code above works, but when I declare s1 like this,
char *s1="abcdefghijklmnopqrstuvwxyz";
it compiles without errors but fails at run time.
I am using Visual Studio 2012.
Do you know why?
I found prototype of memset is:
void *memset( void *s, int c, size_t n );
char s1[30] reserves writable storage for the contents of the array; char *s1="abcdefghijklmnopqrstuvwxyz"; doesn't. The latter only sets a pointer to the address of a string constant, which the compiler will typically place in a read-only section of the object code.
String literals get space in the "read-only data" section, which is mapped into the process address space as read-only (so you can't change them).
char s1[30]="abcdefghijklmnopqrstuvwxyz";
This declares s1 as an array of char and initializes it.
char *s1="abcdefghijklmnopqrstuvwxyz";
This places "abcdefghijklmnopqrstuvwxyz" in a read-only part of memory and makes s1 point to it.
Modifying that literal through s1, for example with memset, is undefined behavior.
A very good question!
If you make gcc output the assembly, and compare the output, you could find out the answer, and the following is why:
char s1[30]="abcdef";
When defined inside a function, this defines an array of char named s1, and the program allocates its memory on the stack.
When defined globally, it defines an object in the program's data, and that object is not read-only.
char* s2 = "abcdef"; only defines a pointer to char, which points to constant characters stored in .rodata, the read-only data of the program.
To make programs run efficiently and simplify process management, the compiler generates different sections for a given piece of code. Constant character data, such as the literal in char* s2 = "abcdef"; and the printf format string, is stored in the .rodata section. After the OS loader loads it into main memory, this section is marked read-only. That is why, when you use memset to modify the memory that s2 points to, the program fails with a segmentation fault.
Here is an explanation: Difference between char* and char[]
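To make the distinction concrete, here is a sketch of a memset-based helper that is only ever given writable storage. The name fill_prefix is hypothetical; the idea is to copy a literal into an array first and modify the copy, never the literal itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrites up to n leading characters of the
 * string in buf with c, never touching the terminator.
 * buf must point to writable storage (an array, not a string literal). */
char *fill_prefix(char *buf, char c, size_t n)
{
    assert(buf != NULL);

    size_t len = strlen(buf);
    if (n > len) {
        n = len;              /* stay inside the existing string */
    }
    memset(buf, c, n);
    return buf;
}
```

Called on char s1[30] = "abcdefghijklmnopqrstuvwxyz"; with 'b' and 7, it turns the first seven letters into 'b'. Declaring literals as const char * lets the compiler reject an accidental attempt to pass them to a function like this.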

Creating one array of strings in C fails, why?

I tried to create one array of strings in C. Here is the code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
int main()
{
char *foo[100];
strcpy(foo[0], "Testing this");
printf("%s ", foo[0]);
return 1;
}
But when I compile it, it simply breaks. No error, no nothing, it simply doesn't work and breaks. Any suggestions? When I try char *foo[10] it works, but I can't work with just 10 strings.
You allocated an array of pointers but did not allocate any memory for them to point to. You need to call malloc to allocate memory from the heap.
char *foo[100];
foo[0] = malloc(13);
strcpy(foo[0], "Testing this");
Naturally you would need to free the memory at some later date when you were finished with it.
Your code invokes what is known as undefined behavior. Basically anything can happen, including the code working as you intended. If the version with char *foo[10] works as you intended that's simply down to luck.
As an aside, your main() definition is better written as int main(void).
You're writing through an uninitialized pointer. char *foo[100] is an array of 100 uninitialized pointers; they point to unknown locations in memory, ones which you probably cannot access.
You are creating 100 pointers that point nowhere. As explained by David, you need to dynamically allocate the memory. However, you can also have the compiler do this for you if you know the size of the strings (or their maximum size):
// Create an array of 10 strings of size 100, zero-initialized
char foo[10][100] = {0};
// Now you can use them and not worry about memory leaks
strcpy(foo[0], "text");
// Or use the bounds-checked version where available (MSVC, or C11 Annex K)
strcpy_s(foo[0], 100, "text");
Expanding upon other people's answers:
char *foo;
is a pointer to a character. It may be assigned the address of a single character or assigned the address of the first of a sequence of characters terminated by '\0'. It may also be assigned an address via malloc().
char foo[100];
is space for 100 characters or space for a string of up to 99 characters and a terminating '\0' character.
char *foo[100];
is an array of 100 character pointers, i.e., 100 objects of type char *.
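#include <stdlib.h>
#include <stdio.h>
#include <string.h> // needed for strcpy
int main(void){
char *foo[100];
foo[0] = malloc(13 * sizeof(char)); // 12 characters plus the '\0'
if (foo[0] == NULL) return 1; // allocation failed
strcpy(foo[0], "Testing this");
printf("%s ", foo[0]);
free(foo[0]);
return 0;
}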
This is the corrected version of your code.
MISTAKE: not allocating any memory for the string.
CORRECTION: using malloc to allocate 13 bytes of memory for the string.
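When more than one slot of the array is filled, it can help to wrap the allocation and cleanup in small functions. The following is only a sketch; copy_string and free_strings are hypothetical names, and the size is computed from the text rather than hard-coded as 13.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of text,
 * or NULL if allocation fails. The caller must free() the result. */
char *copy_string(const char *text)
{
    assert(text != NULL);

    size_t size = strlen(text) + 1;   /* characters plus the '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, text, size);
    }
    return copy;
}

/* Hypothetical helper: frees every slot and resets it to NULL.
 * Unused slots must be NULL; free(NULL) is a no-op. */
void free_strings(char *slots[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(slots[i]);
        slots[i] = NULL;
    }
}
```

Initializing the array as char *foo[100] = {0}; sets every pointer to NULL, so foo[0] = copy_string("Testing this"); followed later by free_strings(foo, 100); is safe even if only a few slots were ever filled.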

Resources