K&R code example confusion: copy string

K&R code example confusion: copy string - c

In chapter one, the K&R introduces a function copy as follows:
void copy(char to[], char from[]) {
/* copy from from[] to to[], assumes sufficient space */
int i = 0;
while ((to[i] = from[i]) != '\0') {
i++;
}
}
Tinkering around a little with this function, I had some unexpected results.
Example program:
int main() {
char a[3] = {'h', 'a', '\n'};
char b[3];
printf("a: %s", a); // prints ha
copy(b, a);
printf("a: %s", a); // prints nothing
printf("b: %s", b); // prints ha
return 0;
}
Now to my questions:
Why does the copying from a to b work, that is why does the while loop in copy ever terminate even though a does not contain a '\0'?
Why is a mutated?

You're probably experiencing a buffer overflow.
As a is not terminated properly (missing \0), copy copies from a to b as long as it finds no \0. Therefore, there are more bytes written to b as it can contain which then overflow into a (platform dependent, undefined behaviour).
The part that overflows is a \0 after a, thus making a a zero-length string.
Your stack probably looks like this:
b a
[ arbitrary memory ][ 0 0 0 ][ h a \n ][ 0 ? ? ? ? ]
The ? indicates unknown data as we don't know what lies there and it is nowhere specified.
But we know that there has to be a 0, or otherwise printf would print much more garbage.
copy copies until there is a zero found after the beginning a. There is, by chance, a
0 after the end of a, which then gets copied to b. As b is already filled with the
contents from a, b overflows into a, making your stack look like this:
b a
[ arbitrary memory ][ h a \n ][ 0 a \n ][ 0 ? ? ? ? ]
As there's a \0 at the beginning of a, printf assumes a to be empty.

This copy function relies on a terminating nullbyte to determine when it should stop copying. When you use a string constant, it is automatically null-terminated. However, normal arrays are not null-terminated, so the function continues to access memory behind the end of a until one of the bytes there happens to be a \0. When the copy function stops copying depends on the contents of that memory area. You do not know what, if anything, happens to be there, so you have no idea how long copy will keep on copying or what will happen.

Even though you did not explicitly put a '\0' in your variable a, it is very likely that a[3] (which is technically out of bounds) happens to be a zero.
Lots of memory is usually filled with Zeros / \0 values, although there is absolutely no guarantee of this.
That explains why you successfully copied the string to b.
Your copy function is wrong in the sense it "thinks" its copying arrays of ints:
void copy(int to[], int from[]) {
When it should be copying arrays of characters:
void copy(char to[], int char[]) {

It works, because your while loop isn't doing an logical operation, its doing an asignment.
So it's as long going through the loop, as long the asigned value is true.(and thats as long it's non zero) So its terminated when the '\0' (whats 0) is reached.
and it will anywhen probably terminate (with undefined behaviour) even if there is no '\0', because after leaving your boundarys the chance of the apearance of any zero valued byte is pretty big. but after leaving your array boundaries, the behavior of your program could be anything. (it could even let nasal dragons spawn ;) )
The reason why b doesn't print anything could be grounded in the Byte order, as you are wrting int values into a Byte array, so it could be that the first Byte of the Byteorder is placed in first Byte and is 0 so from outside it will be treat as a '\0' even if your purpose was to treat the whole int as a sign and not as 4 single values.
p.s.: This would even break strict-aliasing rules.

Better to examine whether to or from is NULL before the while loop below.
while ((to[i] = from[i]) != '\0'){
i++;
}
To your questions:
1.The loop terminates because something stored in memory after a equals to 0 (or '\0'), which is undefined.
2.a may change or not after calling copy which is undefined as well. Both variables stored after a and the position b stored at affect. In the case nemo mentioned above, a mutated after function copy was executed.
Always insert a '\0' at the end of a char array for security.

Related

string gets filled with garbage

i got a string and a scanf that reads from input until it finds a *, which is the character i picked for the end of the text. After the * all the remaining cells get filled with random characters.
I know that a string after the \0 character if not filled completly until the last cell will fill all the remaining empty ones with \0, why is this not the case and how can i make it so that after the last letter given in input all the remaining cells are the same value?
char string1 [100];
scanf("%[^*]s", string1);
for (int i = 0; i < 100; ++i) {
printf("\n %d=%d",i,string1[i]);
}
if i try to input something like hello*, here's the output:
0=104
1=101
2=108
3=108
4=111
5=0
6=0
7=0
8=92
9=0
10=68

You have an uninitialized array:
char string1 [100];
that has indeterminate values. You could initialize the array like
char string1 [100] = { 0 };
or
char string1 [100] = "";
In this call
scanf("%[^*]s", string1);
you need to remove the trailing character s, because %[] and %s are distinct format specifiers. There is no %[]s format specifier. It should look like this:
scanf("%[^*]", string1);
The array contains a string terminated by the zero character '\0'.
So to output the string you should write for example
for ( int i = 0; string1[i] != '\0'; ++i) {
printf( "%c", string1[i] ); // or putchar( string1[i] );
putchar( '\n' );
or like
for ( int i = 0; string1[i] != '\0'; ++i) {
printf("\n %d=%c",i,string1[i]);
putchar( '\n' );
or just
puts( string1 );
As for your statement
printf("\n %d=%d",i,string1[i]);
then it outputs each character (including non-initialized characters) as integers due to using the conversion specifier d instead of c. That is the function outputs internal ASCII representations of characters.

I know that a string after the \0 character if not filled completly
until the last cell will fill all the remaining empty ones with \0
No, that's not true.
It couldn't be true: there is no length to a string. No where neither the compiler nor any function can even know what is the size of the string. Only you do. So, no, string don't autofill with '\0'
Keep in minds that there aren't any string types in C. Just pointer to chars (sometimes those pointers are constant pointers to an array, but still, they are just pointers. We know where they start, but there is no way (other than deciding it and being consistent while coding) to know where they stop.
Sure, most of the time, there is an obvious answer, that make obvious for any reader of the code what is the size of the allocated memory.
For example, when you code
char string1[20];
sprintf(string1, "hello");
it is quite obvious for a reader of that code that the allocated memory is 20 bytes. So you may think that the compiler should know, when sprinting in it of sscaning to it, that it should fill the unused part of the 20 bytes with 0. But, first of all, the compiler is not there anymore when you will sscanf or sprintf. That occurs at runtime, and compiler is at compilation time. At run time, there is not trace of that 20.
Plus, it can be more complicated than that
void fillString(char *p){
sprintf(p, "hello");
}
int main(){
char string1[20];
string1[0]='O';
string1[1]='t';
fillString(&(string1[2]));
}
How in this case does sprintf is supposed to know that it must fill 18 bytes with the string then '\0'?
And that is for normal usage. I haven't started yet with convoluted but legal usages. Such as using char buffer[1000]; as an array of 50 length-20 strings (buffer, buffer+20, buffer+40, ...) or things like
union {
char str[40];
struct {
char substr1[20];
char substr2[20];
} s;
}
So, no, strings are not filled up with '\0'. That is not the case. It is not the habit in C to have implicit thing happening under the hood. And that could not be the case, even if we wanted to.
Your "star-terminated string" behaves exactly as a "null-terminated string" does. Sometimes the rest of the allocated memory is full of 0, sometimes it is not. The scanf won't touch anything else that what is strictly needed. The rest of the allocated memory remains untouched. If that memory happened to be full of '\0' before the call to scanf, then it remains so. Otherwise not. Which leads me to my last remark: you seem to believe that it is scanf that fills the memory with non-null chars. It is not. Those chars were already there before. If you had the feeling that some other methods fill the rest of memory with '\0', that was just an impression (a natural one, since most of the time, newly allocated memory are 0. Not because a rule says so. But because that is the most frequent byte to be found in random area of memory. That is why uninitialized variables bugs are so painful: they occur only from times to times, because very often uninitialized variables are 0, just by chance, but still they are)

The easiest way to create a zeroed array is to use calloc.
Try replacing
char string1 [100];
with
char *string1=calloc(1,100);

How does a char pointer differ from an int pointer in the below code?

1)
int main()
{
int *j,i=0;
int A[5]={0,1,2,3,4};
int B[3]={6,7,8};
int *s1=A,*s2=B;
while(*s1++ = *s2++)
{
for(i=0; i<5; i++)
printf("%d ", A[i]);
}
}
2)
int main()
{
char str1[] = "India";
char str2[] = "BIX";
char *s1 = str1, *s2=str2;
while(*s1++ = *s2++)
printf("%s ", str1);
}
The second code works fine whereas the first code results in some error(maybe segmentation fault). But how is the pointer variable s2 in program 2 working fine (i.e till the end of the string) but not in program 1, where its running infinitely....
Also, in the second program, won't the s2 variable get incremented beyond the length of the array?

The thing with strings in C is that they have a special character that marks the end of the string. It's the '\0' character. This special character has the value zero.
In the second program the arrays you have include the terminator character, and since it is zero it is treated as "false" when used in a boolean expression (like the condition in your while loop). That means your loop in the second program will copy characters up to and including the terminator character, but since that is "false" the loop will then end.
In the first program there is no such terminator, and the loop will continue and go out of bounds until it just randomly happen to find a zero in the memory you're copying from. This leads to undefined behavior which is a common cause of crashes.
So the difference isn't in how pointers are handled, but in the data. If you add a zero at the end of the source array in the first program (B) then it will also work well.

In str2, You have assigned String. Which means there will be end Of
String('\0' or NULL) due to which when you will increment Str2 and it will
reach to end of string, It will return null and hence your loop will break.
And with integer pointer, there is no end of string. thats why its going to infinite loop.

Joachim gave a good explanation about String terminal character \0 in C language.
Another thing to be aware of when working with pointer is pointer arithmetic.
Arithmetic unit for pointer is the size of the entity pointed.
With a char * pointer named charPtr, on system where char are stored on 1 byte, doing charPtr++ will increase the value in charPtr by *1 (1 byte) to make it ready to point to the next char in memory.
With a int * pointer named intPtr, on system where int are stored on 4 bytes, doing intPtr++ will increase the value in intPtr by 4 (4 bytes) to make it ready to point to the next int in memory.

why does this simple code gives not work at times????then when i restart my code blocks it does work at times

#include <stdio.h>
#include <stdlib.h>
main()
{
char *a,*b,*c={0};
int i=0,j=0;
a=(char *)malloc(20*sizeof(char));
b=(char *)malloc(20*sizeof(char));
c=(char *)malloc(20*sizeof(char));
printf("Enter two strings:");
gets(a);
gets(b);
while(a[i]!=NULL)
{
c[i]=a[i];
i++;
}
while(b[j]!=NULL)
{
c[i]=b[j];
i++;
j++;
}
printf("The concated string is %s",c);
}
this is crazy........i spend one whole night it didn't work and then next night it suddenly works perfectly....i'm confused

There are many things wrong with your code.
Not all of them matter if all you care about is getting the code to work.
However, I have tried here to show you different misconceptions that are clear from your code, and show you how to code it better.
You are misunderstanding what NULL means. a NULL pointer doesn't point at anything
Strings are terminated with '\0' which is an ASCII NUL, not the same thing, though both use the value 0.
char* s = "hello";
The above string is actually 6 characters long. 5 bytes for the hello, 1 for the '\0' that is stuck at the end. Incidentally, this means that you can only have strings up to 19 characters long because you need to reserve one byte for the terminal '\0'
char* r = NULL;
The pointer r is pointing at nothing. There is no '\0' there, and if you attempt to look at r[0], you will crash.
As Ooga pointed out, you missed terminating with '\0' which is going to create random errors because your printf will keep going to try to print until the first zero byte it finds. Whether you crash on any particular run is a matter of luck. Zeros are common, so usually you will stop before you crash, but you will probably print out some junk after the string.
Personally, I would rather crash than have the program randomly print out the wrong thing. At least when you crash, you know something is wrong and can fix it.
You also seem to have forgotten to free the memory you malloc.
If you are going to use malloc, you should use free at the end:
int* a = malloc(20);
...
free(a);
You also are only mallocing 20 characters. If you go over that, you will do horrible things in memory. 20 seems too short, you will have only 19 characters plus the null on the end to play with but if you do have 20 characters each in a and b, you would need 40 characters in c.
If this is an assignment to use malloc, then use it, but you should free when you are done. If you don't have to use malloc, this example does not show a reason for using it since you are allocating a small, constant amount of memory.
You are initializing c:
char* c = {0};
In a way that makes no sense.
The {0} is an array with a single zero value. c is pointing to it, but then you immediately point it at something else and never look at your little array again.
You probably mean that C is pointing to nothing at first.
That would be:
char* c = NULL;
but then you are immediately wiping out the null, so why initialize c, but not a and b?
As a general rule, you should not declare values and initialize them later. You can always do something stupid and use them before they are initialized. Instead, initialize as you declared the:
int* a = malloc(20);
int* b = malloc(20);
int* c = malloc(40);
Incidentally, the size of a char is by definition 1, so:
20* sizeof(char)
is the same as 20.
You probably saw an example like:
20 * sizeof(int)
Since sizeof(int) which is not 1 the above does something. Typically sizeof(int) is 4 bytes, so the above would allocate 80 bytes.
gets is unsafe, since it doesn't say how long the buffer is
ALWAYS use fgets instead of gets. (see below).
Many computers have been hacked using this bug (see http://en.wikipedia.org/wiki/Robert_Tappan_Morris)
Still, since malloc is not really needed, in your code, you really should write:
enum { SIZE = 128 };
char a[SIZE];
fgets(a, SIZE, STDIN);
char b[SIZE];
fgets(b, SIZE, STDIN);
char c[SIZE*2];
int i;
int j = 0;
for (i = 0; a[i] != '\0' && i < 127; i++)
c[j++] = a[i];
for (i; b[i] != '\0' && i < 127; i++)
c[j++] = a[i];
c[j] = '\0';
...
Last, I don't know if you are learning C or C++. I will simply point out that this kind of programming is a lot easier in C++ where a lot of the work is done for you. You can first get concatenation done the easy way, then learn all the pointer manipulation which is harder.
#include <string>
#include <iostream>
using namespace std;
int main() {
string a,b,c;
getline(cin, a); // read in a line
getline(cin, b);
c = a + b;
cout << c;
}
Of course, you still need to learn this low-level pointer stuff to be a sophisticated programmer in C++, but if the purpose is just to read in and concatenate lines, C++ makes it a lot easier.

You are not properly null-terminating c. Add this before the printf:
c[i] = '\0';
Leaving out null-termination will seem to work correctly if the char at i happens to be 0, but you need to set it to be sure.

The string c is not being terminated with a null char, this means printf does not know where to stop and will likely segfault your program when it overruns. The reason you may be getting sporadic success is that there is a random chance the malloced area has been pre zeroed when you allocate it, if this is the case it will succeed as a null char is represented as a literal 0 byte.
there are two solutions available to you here, first you could manually terminate the string with a null char as so:
c[i] = '\0';
Second you can use calloc instead of malloc, it guarantees the memory is always pre zeroed.
As a side note you should likely add some length checking to your code to ensure c will not overflow if A and B are both over 10. (or just make c 40 long)
I hope this helps.

Find String Length without recursion in C

#include<stdio.h>
#include<conio.h>
void main()
{
int str1[25];
int i=0;
printf("Enter a string\n");
gets(str1);
while(str1[i]!='\0')
{
i++;
}
printf("String Length %d",i);
getch();
return 0;
}
i'm always getting string length as 33. what is wrong with my code.

That is because, you have declared your array as type int
int str1[25];
^^^-----------Change it to `char`

You don't show an example of your input, but in general I would guess that you're suffering from buffer overflow due to the dangers of gets(). That function is deprecated, meaning it should never be used in newly-written code.
Use fgets() instead:
if(fgets(str1, sizeof str1, stdin) != NULL)
{
/* your code here */
}
Also, of course your entire loop is just strlen() but you knew that, right?
EDIT: Gaah, completely missed the mis-declaration, of course your string should be char str1[25]; and not int.

So, a lot of answers have already told you to use char str1[25]; instead of int str1[25] but nobody explained why. So here goes:
A char has length of one byte (by definition in C standard). But an int uses more bytes (how much depends on architecture and compiler; let's assume 4 here). So if you access index 2 of a char array, you get 1 byte at memory offset 2, but if you access index 2 of an int array, you get 4 bytes at memory offset 8.
When you call gets (which should be avoided since it's unbounded and thus might overflow your array), a string gets copied to the address of str1. That string really is an array of char. So imaging the string would be 123 plus terminating null character. The memory would look like:
Adress: 0 1 2 3
Content: 0x31 0x32 0x33 0x00
When you read str1[0] you get 4 bytes at once, so str1[0] does not return 0x31, you'll get either 0x00333231 (little-endian) or 0x31323300 (big endian).
Accessing str1[1] is already beyond the string.
Now, why do you get a string length of 33? That's actually random and you're "lucky" that the program didn't crash instead. From the start address of str1, you fetch int values until you finally get four 0 bytes in a row. In your memory, there's some random garbage and by pure luck you encounter four 0 bytes after having read 33*4=132 bytes.
So here you can already see that bounds checks are very important: your array is supposed to contain 25 characters. But gets may already write beyond that (solution: use fgets instead). Then you scan without bounds and may thus also access memory well beyond you array and may finally run into non-existing memory regions (which would crash your program). Solution for that: do bounds checks, for example:
// "sizeof(str1)" only works correctly on real arrays here,
// not on "char *" or something!
int l;
for (l = 0; l < sizeof(str1); ++l) {
if (str1[l] == '\0') {
// End of string
break;
}
}
if (l == sizeof(str1)) {
// Did not find a null byte in array!
} else {
// l contains valid string length.
}

I would suggest certain changes to your code.
1) conio.h
This is not a header that is in use. So avoid using it.
2) gets
gets is also not recommended by anyone. So avoid using it. Use fgets() instead
3) int str1[25]
If you want to store a string it should be
char str1[25]

The problem is in the string declaration int str1[25]. It must be char and not int
char str1[25]

void main() //"void" should be "int"
{
int str1[25]; //"int" should be "char"
int i=0;
printf("Enter a string\n");
gets(str1);
while(str1[i]!='\0')
{
i++;
}
printf("String Length %d",i);
getch();
return 0;
}

Strcpy: behaving more like 'strcut'

char a[3], b[3];
strcpy(a,"abc");
printf("a1 = %s\n", a);
strcpy(b,a);
printf("a2 = %s\n", a);
printf("b = %s\n", b);
From how I understand strcpy to work the output would be:
a1 = abc
a2 = abc
b = abc
Instead I obtain
a1 = abc
a2 =
b = abc
Why when I call strcpy the second time does it (apparently) erase the contents of a?
Thanks

This is a buffer overflow problem - your a and b are too short – they don't have room for the null terminator. What is happening is a is just after b in memory, so when strcpy(b,a) executes, the null terminator stored at the end of b is actually the same memory location as the first character of a. This makes a suddenly an empty string.
For starters, make the lengths of the arrays 4 instead of 3. This is okay in sandbox/play/learning mode, but consider in production code:
Use safer string functions (e.g. strncpy) to avoid buffer overflows.
Use character arrays/buffers that support variable size or pre-calculation of the size required to fit your data.

Since you arrays are too small and do not have room for the null terminator you are most likely overwriting a when you try to copy a to b since the strcpy does not know when to stop copying. This declaration would fix the problem for this particular program:
char a[4], b[4];
In the general case you need to ensure that your destination has enough to space to accommodate the source as well as the null terminator.
This example gives you a better idea of what is going on, this is just for demonstration purposes and you should use code like this for anything else but to learn. This works for me in ideone and you can see if live here but may not work properly in other compilers since we are invoking undefined behavior:
#include <stdio.h>
#include <string.h>
int main()
{
char a[3], b[4];
// a will have a lower address in memory than b
printf("%p %p\n", a, b);
// "abc" is a null terminated literal use a size of 4 to force a copy of null
strncpy(a,"abc",4);
// printf will not overrun buffer since we terminated it
printf("a2 = %s\n", a);
// explicitly only copy 3 bytes
strncpy(b,a,3);
// manually null terminate b
b[3] = '\0' ;
// So we can prove we are seeing b's contents
b[0] = 'z' ;
// This will overrun into b now since b[0] is no longer null
printf("a2 = %s\n", a);
printf("b = %s\n", b);
}

The first strcpy(a,"abc") is already wrong. Don't get confused with char array versus C-String... a C-String is a always a char array, but a char array is NOT always a C-String.
A C-String must have a '\0' char in the end. So when you do strcpy "abc" -> a[3] you are actually moving the following 4 bytes to your array { 'a', 'b', 'c', '\0' }
Because a and b were created together, b is right ahead a. When you print out a it goes fine IN THIS CASE because printf() still can find a '\0' to identify as the end of string, despite it's wrong... because your '\0' char is the the area reserved to b.
The following problems are all related to the same thing...
The solution is: the buffer to your C-String must the the maximum size of your string + 1, so you can guarantee you will have room for the '\0' char. If you need for more details, google for "C-String" or "null-terminated string".

You've made a very common beginners mistake. In C, there is no string primitive; when we talk about strings, we're really talking about null-terminated character arrays (or buffers, I don't care what nomenclature you like). so your char[3] will hold a string of 2 letters, plus the null terminator. Another subtle issue is that in memory, they will be laid out on the stack as a[0]a[1]a[2]b[0]b[1]b[2]--and this is the reason you didn't crash when you deserved to. See "abc" is REALLY "abc\0", so a[3] == c and b[0] == \0, and since the behavior is undefined when strings overlap (as these do), I suspect that your implementation just copied chars until it copies a \0. That being the case, strcpy(a, b) will result in a being an empty string.
On the other hand, your program works as it was written to. What you wrote isn't what you meant :)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

K&R code example confusion: copy string - c

Related

string gets filled with garbage

How does a char pointer differ from an int pointer in the below code?

why does this simple code gives not work at times????then when i restart my code blocks it does work at times

Find String Length without recursion in C

Strcpy: behaving more like 'strcut'

Categories

Resources