Concatenating strings - need clarification - c

char * a = (char *) malloc(10);
strcpy(a,"string1");
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Here, I allocated only 10B to a, but still after concatenating a and x (combined size is 16B), C prints the answer without any problem.
But if I do this:
char * a = "string1";
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Then I get a segfault. Why is this? Why does the first one work despite lower memory allocation? Does strcat reallocate memory for me? If yes, why does the second one not work? Is it because a & x declared that way are unmodifiable string literals?

In your first example, a is allocated in the heap. So when you're concatenating the other string, something in the heap will be overwritten, but there is no write-protection.
In your second example, a points to a region of the memory that contains constants, and is readonly. Hence the seg fault.

The first one doesn't always work; it has already overflowed the buffer. In the second one, a is a pointer to a constant string which is stored in the data section, in a read-only page.

In the second case, what you have is a pointer to an unmodifiable string literal.
In the first case, you are writing past the end of a heap allocation and then printing it. The behavior is undefined, so you cannot guarantee that it will work every time.
(Try running it in a very large loop; you may see the undefined behavior show up.)

Your code is writing beyond the buffer that it's permitted, which causes undefined behavior. This can work and it can fail, and worse: it can look like it worked but cause seemingly unrelated failures later. The language allows you to do things like this because you're supposed to know what you're doing, but it's not recommended practice.
In your first case, of having used malloc to allocate buffers, you're actually being helped but not in a manner you should ever rely on. The malloc function allocates at least as much space as you've requested, but in practice it typically rounds up to a multiple of 16... so your malloc(10); probably got a 16 byte buffer. This is implementation specific and it's never a good idea to rely on something like that.
In your second case, it's likely that the memory pointed to by your a (and x) variable(s) is non-writable, which is why you've encountered a segfault.
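A minimal sketch of a concatenation that stays inside its buffer, assuming you are free to allocate a new result. The helper name concat_alloc is hypothetical, not a standard library function; the point is only that the allocation is sized from both lengths plus one byte for the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a followed
   by b, or NULL if the allocation fails. The caller must free() the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);

    /* Room for both strings plus one byte for the terminating '\0'. */
    char *out = malloc(len_a + len_b + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, len_a);
    memcpy(out + len_a, b, len_b + 1); /* the +1 also copies b's '\0' */
    return out;
}
```

With this, concat_alloc("string1", "string2") needs 15 bytes and asks for exactly that, instead of relying on whatever slack malloc(10) happens to leave.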

Related

Dynamic memory allocation and pointers related concept doubts

First of all: this is a new concept to me!
I have been studying pointers and dynamic memory allocation and ran some programs recently. In the statement char*p="Computers" the string is stored in some memory location, and the base address, i.e. the starting address of the string, is stored in p. I noticed that I can perform any operation I like on the string, so my question is: why do we use special functions like malloc and calloc when we can just declare a string of the desired length like this?
If my understanding of the concept is wrong, please explain.
Thanks in advance.
In this declaration
char*p="Computers";
the pointer p is initialized by the address of the first character of the string literal "Computers".
String literals have the static storage duration. You may not change a string literal as for example
p[0] = 'c';
Any attempt to change a string literal results in undefined behavior.
The function malloc is used to allocate memory dynamically. For example if you want to create dynamically a character array that will contain the string "Computers" you should write
char *p = malloc( 10 ); // the same as `malloc( 10 * sizeof( char ) )`
strcpy( p, "Computers" );
You may change the created character array. For example
p[0] = 'c';
After the array is not required any more you should free the allocated memory like
free( p );
Otherwise the program can have a memory leak.
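The allocate, copy, modify, free sequence described above can be sketched as one small helper. The name mutable_copy is an assumption made for illustration; it is not part of the standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies a string (for example a string literal) into
   heap memory that the caller may modify and must later free().
   Returns NULL if the allocation fails. */
char *mutable_copy(const char *s)
{
    char *copy = malloc(strlen(s) + 1); /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

A caller would write char *p = mutable_copy("Computers"); check p for NULL, change p[0] = 'c'; safely, and finish with free(p);.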
A simple answer to that would be by doing
char *p = "Computers";
you are basically declaring a fixed constant string. That means you cannot edit anything inside the string; trying to do so may result in a segmentation fault. Using malloc or calloc gives us memory in which the string can be edited.
Simply try p[0] = 'c' and you will see the result.
A statement like
char *p = "Computers";
is not an example of dynamic memory allocation. The memory for the string literal is set aside when the program starts up and held until the program terminates. You can’t resize that memory, and you’re not supposed to modify it (the behavior on doing so is undefined - it may work as expected, it may crash outright, it may do anything in between).
We use malloc, calloc, and realloc to allocate memory at runtime that needs to be writable, resizable, and doesn’t go away until we explicitly release it.
We have to use pointers to reference dynamically-allocated memory because that’s just how the language is designed, but pointers play a much larger role in C programming than just tracking dynamic memory.
As a novice, here is my own understanding:
Dynamic memory depends completely on pointers; without knowing pointers you will not be able to cope with dynamic memory allocation. The functions calloc, malloc, realloc and free are declared in the stdlib header.
malloc does not initialize the memory it returns; calloc zero-initializes it and is mainly used for arrays.
realloc is used to increase or decrease the size of an allocation.
Simply put, it is not as hard as it first seems. If you declare an array[500] up front but only use 100 elements, the other 400 are wasted; dynamic memory lets you avoid that.

C null terminators throughout a char*: correct handling

This question is aimed at improving my understanding of
what I can and cannot do with pointers when allocating and freeing:
The code below is not meant to run, but just to set up a situation for the questions below.
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
var1 = "01234567";
var1[2] = '\0';
var1[5] = '\0';
//var1 = [0][1][\0][3][4][\0][6][7]
var2[0] = var1[0];
var2[1] = var1[3];
var2[2] = var1[6];
free(var1);
free(var2);
given the following snippet
1: is it ok to write to a location after the \0 if you know the size you allocated.
2: Can I do what I did with var2 , if it points to a block that another pointer is pointing at?
3: are the calls to free ok? or will free die due to the \0 located throughout var1.
I printed out all the variables after free, and only the ones up to the first null got freed (changed to null or other weird and normal looking characters). Is that ok?
4: Any other stuff you wish to point out that is completely wrong and should be avoided.
Thank you very much.
OK, let's just recap what you have done here:
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
So var1 is (a pointer to) a block of 8 chars, all set to zero \0.
And var2 is (a pointer to) a block of 3 pointers, all set to NULL.
So now it's the program's memory, it can do whatever it wants with it.
To answer your questions specifically ~
It's quite normal to write characters around inside your char block. It's a common programming pattern to parse string buffers by writing a \0 after a section of text to use everyday C string operations on it, but then point to the next character after the added \0 and continue parsing.
var2 is simply a bunch of char-pointers, it can point to whatever char is necessary, it doesn't necessarily have to be at the beginning of the string.
The calls to free() are somewhat OK (except for the bug - see below). It's normal for the content of free()d blocks to be overwritten when they are returned to the heap, so they often seem to have "rubbish" characters in them if printed out afterwards (reading them after free() is itself undefined behavior).
There are some issues with the assignment of var1 ~
var1 = "01234567";
Here you are saying "var1 now points to this constant string". Your compiler may have generated a warning about this. Firstly, the code assigns a const char* to a char* (hard-coded strings are const, but C compilers will only warn about this [EDIT: this is true for C++, not C, see comment from n.m.]). Secondly, the code has lost all references to the block of memory that var1 used to point to. You can now never free() this memory: it has leaked. Worse, at the end of the program, free() is called on a pointer to a block of memory (the "01234567") which was not allocated on the heap. This is BAD: it is undefined behavior. Since you're exiting immediately you may see no ill effects, but if this were in the middle of execution, the next allocation (or the 1000th after it!) could crash weirdly. These sorts of problems are hard to debug.
Probably what you should have done here (I'm guessing your intention though) is used a string copy:
strncpy(var1, "01234567", 8);
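With that operation instead of the assignment, the later writes and the free(var1) are OK, because the digits are stored in the memory allocated on the first line. Note that an 8-byte block leaves no room for a terminating \0 after the final 7; allocate 9 bytes if the last piece must also be usable as a string.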
Question 4 - what's wrong
You 'calloc' some memory and store a pointer to it in var1. Then later you execute var1 = "01234567" which stores a pointer to a literal string in var1, thus losing the calloc'd memory. I imagine you thought you were copying a string. Use strcpy or similar.
Then you write zero values into what var1 points to. Since that's a literal string, it may fail if the literal is in read-only memory. The result is undefined.
free(var1) is not going to go well with a pointer to a literal. Your code may fail or you may get heap corruption.
Pointers don't work this way.
If someone wrote
int a = 6*9;
a = 42;
you would wonder why they ever bothered to initialise a to 6*9 in the first place — and you would be right. There's no reason to. The value returned by * is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
int a = 42;
Now when pointers are involved, there's some kind of evil neural pathway in our brain that tries to tell us that a sequence of statements that is exactly like the one shown above is somehow working differently. Don't trust your brain. It isn't.
char *var1 = calloc(8,sizeof(char));
var1 = "01234567";
You would wonder why they ever bothered to initialise var1 to calloc(8,sizeof(char)); in the first place — and you would be right. There's no reason to. The value returned by calloc is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
char* var1 = "01234567";
... which is a problem, because you cannot modify string literals.
What you probably want is
char *var1 = calloc(8, 1); // note sizeof(char)==1, always
strncpy (var1, "01234567", 8); // note not strcpy — you would need 9 bytes for it
or some variation of that.
var1 = "01234567"; is not correct because you assign a pointer to const char to a pointer to mutable char (that is the C++ view; in C the literal has type char[], but it is still unmodifiable), and it causes a memory leak because the pointer to the calloc-allocated buffer of 8 chars stored in var1 is lost. It seems like you actually intended to initialize the allocated array with the value of the string literal instead (though that would require allocating an array of 9 items). The assignment var1[2] = '\0'; causes undefined behavior because the location var1 points to is not mutable. var2[0] = var1[0]; is wrong as well because you assign a value of type char to a pointer to char. Finally, free(var1); will try to deallocate a pointer to the buffer backing the string literal, not something you allocated.
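Putting the corrections from the answers together, here is a hedged sketch of what the question's snippet may have been aiming for. The function name make_pieces and the caller-supplied parts array are assumptions for illustration; the original used a calloc'd var2 instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "01", "34" and "67" inside one heap block and
   stores pointers to the three pieces in parts[0..2].
   Returns the block, which must be freed exactly once through this pointer,
   or NULL if the allocation fails. */
char *make_pieces(char *parts[3])
{
    char *buf = calloc(9, 1);   /* 8 digits + 1 terminating '\0' */
    if (buf == NULL)
        return NULL;

    strcpy(buf, "01234567");    /* copy INTO the block; never reassign buf */
    buf[2] = '\0';              /* writing after an earlier '\0' is fine */
    buf[5] = '\0';

    parts[0] = &buf[0];         /* store addresses, not the char values */
    parts[1] = &buf[3];
    parts[2] = &buf[6];
    return buf;
}
```

The pieces all live in the one block, so only the returned pointer is passed to free(); the entries of parts must not be freed separately and become invalid once the block is released.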

strcpy working no matter the malloc size?

I'm currently learning C programming and since I'm a python programmer, I'm not entirely sure about the inner workings of C. I just stumbled upon a really weird thing.
void test_realloc(){
    // So this is the original place allocated for my string
    char * curr_token = malloc(2*sizeof(char));
    // This is really weird because I only allocated 2x char size in bytes
    strcpy(curr_token, "Davi");
    curr_token[4] = 'd';
    // I guess it somehow overwrote data outside the allocated memory?
    // I was hoping this would result in an exception ( I guess not? )
    printf("Current token > %s\n", curr_token);
    // Looks like it's still printable, wtf???
    char *new_token = realloc(curr_token, 6);
    curr_token = new_token;
    printf("Current token > %s\n", curr_token);
}
int main(){
    test_realloc();
    return 0;
}
So the question is: how come I'm able to write more chars into a string than is its allocated size? I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
What I was trying to accomplish
Allocate a 4 char ( + null char ) string where I would write 4 chars of my name
Reallocate memory to accommodate the last character of my name
I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
Welcome to C programming :). In general, this is correct: you can do something wrong and receive no immediate feedback that was the case. In some cases, indeed, you can do something wrong and never see a problem at runtime. In other cases, however, you'll see crashes or other behaviour that doesn't make sense to you.
The key term is undefined behavior. This is a concept that you should become familiar with if you continue programming in C. It means just like it sounds: if your program violates certain rules, the behaviour is undefined - it might do what you want, it might crash, it might do something different. Even worse, it might do what you want most of the time, but just occasionally do something different.
It is this mechanism which allows C programs to be fast - since they don't at runtime do a lot of the checks that you may be used to from Python - but it also makes C dangerous. It's easy to write incorrect code and be unaware of it; then later make a subtle change elsewhere, or use a different compiler or operating system, and the code will no longer function as you wanted. In some cases this can lead to security vulnerabilities, since unwanted behavior may be exploitable.
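For comparison, here is a sketch of the grow-by-one-character step the question was trying to perform, done within the rules. The helper name append_char is hypothetical; it assumes s points to a null-terminated string obtained from malloc, calloc or realloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grows the heap string s by one character.
   Returns the (possibly moved) string, or NULL if realloc fails, in which
   case s is untouched and still owned by the caller. */
char *append_char(char *s, char c)
{
    size_t len = strlen(s);

    /* Existing characters + the new one + the terminating '\0'. */
    char *grown = realloc(s, len + 2);
    if (grown == NULL)
        return NULL;

    grown[len] = c;
    grown[len + 1] = '\0';
    return grown;
}
```

Starting from a 5-byte allocation that holds "Davi" and its terminator, append_char(tok, 'd') asks realloc for 6 bytes before the extra character is written, rather than after.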
Suppose that you have an array as shown below.
int arr[5] = {6,7,8,9,10};
From the basics of arrays: in most expressions the name of an array is converted to a pointer to its first element. Here, arr is the name of the array, and it yields a pointer to the first element, which is 6. Hence *arr, literally *(arr+0), gives you 6 as the output, *(arr+1) gives you 7, and so on.
Here, the size of the array is 5 integer elements. Now try accessing index 10, arr[10], even though the array holds only 5 integers. The compiler is not required to report an error; the behavior is undefined, and in practice you will often just read some garbage value. Because arr[i] is evaluated as *(arr+i), the dereference is done as arr+0, arr+1, arr+2 and so on, and in the same manner arr+10 can be formed from the base pointer without any check.
Now, try understanding your context with this example. Though you have allocated only 2 bytes for characters, you can access memory beyond those two bytes through the pointer. Hence, it is not throwing you an error. You may be able to predict the output on your machine, but it is not guaranteed that you can predict the output on another machine (maybe the memory you are allocating on your machine happens to be filled with zeros, or maybe those particular memory locations are being used for the first time ever!). In the statement
char *new_token = realloc(curr_token, 6); note that you are resizing the block pointed to by curr_token to 6 bytes; realloc may move the block, and the resulting pointer is stored in new_token. Now new_token points to 6 bytes that you are allowed to use.
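Since C performs no bounds check on arr[10], any check has to be written by hand. A minimal sketch, assuming the caller knows the array length; the name get_checked and its return convention are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: reads arr[idx] only if idx lies inside the array.
   Returns 1 and stores the element in *out on success; returns 0 and
   leaves *out unchanged if idx is out of range. */
int get_checked(const int *arr, size_t len, size_t idx, int *out)
{
    if (idx >= len)
        return 0;
    *out = arr[idx];
    return 1;
}
```

For the five-element arr above, get_checked(arr, 5, 1, &v) succeeds and stores 7 in v, while get_checked(arr, 5, 10, &v) refuses instead of reading past the end.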
Usually malloc is implemented in such a way that it allocates chunks of memory aligned to a paragraph (the fundamental alignment), which is equal to 16 bytes.
So when you request, for example, 2 bytes, malloc actually allocates 16 bytes. This allows the same chunk of memory to be reused when realloc is called.
According to the C Standard (7.22.3 Memory management functions)
...The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object
with a fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated
(until the space is explicitly deallocated).
Nevertheless, you should not rely on such behavior, because it is not normative; writing past the size you requested is still undefined behavior.
No automatic bounds checking is performed in C.
The program behaviour is unpredictable.
If you write to memory that is not mapped into your process, or that is write-protected, you will end up with a segmentation fault; otherwise you will only corrupt your own data, etc.

Why does this intentionally incorrect use of strcpy not fail horribly?

Why does the below C code using strcpy work just fine for me? I tried to make it fail in two ways:
1) I tried strcpy from a string literal into allocated memory that was too small to contain it. It copied the whole thing and didn't complain.
2) I tried strcpy from an array that was not NUL-terminated. The strcpy and the printf worked just fine. I had thought that strcpy copied chars until a NUL was found, but none was present and it still stopped.
Why don't these fail? Am I just getting "lucky" in some way, or am I misunderstanding how this function works? Is it specific to my platform (OS X Lion), or do most modern platforms work this way?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char *src1 = "123456789";
    char *dst1 = (char *)malloc( 5 );
    char src2[5] = {'h','e','l','l','o'};
    char *dst2 = (char *)malloc( 6 );
    printf("src1: %s\n", src1);
    strcpy(dst1, src1);
    printf("dst1: %s\n", dst1);
    strcpy(dst2, src2);
    printf("src2: %s\n", src2);
    dst2[5] = '\0';
    printf("dst2: %s\n", dst2);
    return 0;
}
The output from running this code is:
$ ./a.out
src1: 123456789
dst1: 123456789
src2: hello
dst2: hello
First, copying into an array that is too small:
C has no protection for going past array bounds, so if there is nothing sensitive at dst1[5..9], then you get lucky, and the copy goes into memory that you don't rightfully own, but it doesn't crash either. However, that memory is not safe, because it has not been allocated to your variable. Another variable may well have that memory allocated to it, and later overwrite the data you put in there, corrupting your string later on.
Secondly, copying from an array that is not null-terminated:
Even though we're usually taught that memory is full of arbitrary data, huge chunks of it are zeroed out. Even though you didn't put a null-terminator in src2, chances are good that src2[5] happens to be \0 anyway. This makes the copy succeed. Note that this is NOT guaranteed, and could fail on any run, on any platform, at any time. But you got lucky this time (and probably most of the time), and it worked.
Overwriting beyond the bounds of allocated memory causes Undefined Behavior.
So in a way yes you got lucky.
Undefined behavior means anything can happen and the behavior cannot be explained as the Standard, which defines the rules of the language, does not define any behavior.
EDIT:
On second thoughts, I would say you are really unlucky here that the program works fine and does not crash. That it works now does not mean it will always work. In fact, it is a ticking bomb.
As per Murphy's Law:
"Anything that can go wrong will go wrong"["and most likely at the most inconvenient possible moment"]
[ ]- Is my edit to the Law :)
Yes, you're quite simply getting lucky.
Typically, the heap is contiguous. This means that when you write past the malloced memory, you could be corrupting the following memory block, or some internal data structures that may exist between user memory blocks. Such corruption often manifests itself long after the offending code, which makes debugging this type of bugs difficult.
You're probably getting the NULs because the memory happens to be zero-filled (which isn't guaranteed).
As @Als said, this is undefined behaviour. This may crash, but it doesn't have to.
Many memory managers allocate in larger chunks of memory and then hand it to the "user" in smaller chunks, probably a multiple of 4 or 8 bytes. So your write over the boundary probably simply writes into the extra bytes allocated. Or it overwrites one of the other variables you have.
You're not malloc-ing enough bytes for the first copy. The first string, "123456789", needs 10 bytes (including the null terminator), but dst1 has only 5. {'h','e','l','l','o'} needs 6 bytes once a null terminator is added; dst2 is large enough for that, but src2 itself has no terminator, so strcpy reads past its end. You're currently clobbering memory with that code, which leads to undefined (i.e. odd) behavior.
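A sketch of how the second copy could be done without depending on a stray zero byte, assuming the number of bytes in the source is known. The helper name dup_bytes is hypothetical; it uses memcpy with an explicit length instead of strcpy, and adds the terminator itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies exactly n bytes from src, which need not be
   null-terminated, into a new heap buffer and terminates the copy.
   Returns NULL if the allocation fails; the caller must free() the result. */
char *dup_bytes(const char *src, size_t n)
{
    char *dst = malloc(n + 1);  /* n bytes of data + 1 for '\0' */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, n);        /* never reads beyond src[n - 1] */
    dst[n] = '\0';
    return dst;
}
```

For the unterminated array in the question this would be called as dup_bytes(src2, sizeof src2); for a real string, dup_bytes(src1, strlen(src1)) allocates the 10 bytes that "123456789" needs.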

Char* p, and scanf

I have been trying to look for a reason why the following code is failing, and I couldn't find one.
So please, excuse my ignorance and let me know what's happening here.
#include<stdio.h>
int main(void){
char* p="Hi, this is not going to work";
scanf("%s",p);
return 0;
}
As far as I understood, I created a pointer p to a contiguous area in the memory of the size 29 + 1(for the \0).
Why can't I use scanf to change the contents of that?
P.S Please correct me If I said something wrong about char*.
char* p="Hi, this is not going to work";
this does not allocate memory for you to write to.
It creates a string literal, and trying to change its contents results in undefined behaviour every time.
to use p as a buffer for your scanf do something like
char * p = malloc(sizeof(char) * 128); // 128 is an Example
OR
you could as well do:
char p[]="Hi, this is not going to work";
Which I guess is what you really wanted to do.
Keep in mind that this can still end up being UB because scanf() does not check whether the place you are using is indeed valid writable memory.
remember:
char *p = "..." points to a string literal, which should not be modified
char p[] = "..." allocates enough memory to hold the String inside the "..." and may be changed (its contents I mean).
Edit :
A nice trick to avoid UB is
char * p = malloc(sizeof(char) * 128);
scanf("%126s",p);
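The width trick can be sketched in a form that is easy to try without typing at a keyboard. This example uses sscanf, which applies the same %s field-width rule to a string instead of stdin; the names read_word and WORD_BUF_SIZE are assumptions made for illustration.

```c
#include <stdio.h>

#define WORD_BUF_SIZE 128 /* hypothetical size, matching the 128 used above */

/* Hypothetical helper: reads one whitespace-delimited word from input into
   buf, which must have room for WORD_BUF_SIZE bytes.
   Returns 1 if a word was stored, 0 otherwise. */
int read_word(const char *input, char buf[WORD_BUF_SIZE])
{
    /* A width of 127 leaves one byte for the '\0' that %s appends. */
    return sscanf(input, "%127s", buf) == 1;
}
```

The width has to be written into the format string by hand and kept at least one smaller than the buffer size, since %s stores a terminating '\0' on top of the characters it reads.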
p points to a constant literal, which may in fact reside in a read-only memory area (implementation dependent). At any rate, trying to overwrite that is undefined behaviour. I.e. it might result in nothing, or an immediate crash, or a hidden memory corruption which causes mysterious problems much later. Don't ever do that.
It is crashing because no writable memory has been allocated for p. Allocate memory for p and it should be OK. What you have is a constant memory area pointed to by p. When you attempt to write something into this data segment, the runtime environment will raise a trap which will lead to a crash.
Hope this answers your question
scanf() parses data entered from stdin (normally, the keyboard). I think you want sscanf().
However, the purpose of scanf() is to parse input according to predefined format specifiers, which your test string doesn't have. So that makes it a little unclear exactly what you are trying to do.
Note that sscanf() takes an additional argument as the first argument, which specifies the string being parsed.
