C memory issue with char*

C memory issue with char* - c

I need help with my C code. I have a function that sets a value to the spot in memory
to the value that you have input to the function.
The issue that I am facing is if the pointer moves past the allocated amount of memory
It should throw an error. I am not sure how to check for this issue.
unsignded char the_pool = malloc(1000);
char *num = a pointer to the start of the_pool up to ten spots
num[i] = val;
num[11] = val; //This should throw an error in my function which
So how can I check to see that I have moved into unauthorized memory space.

C will not catch this error for you. You must do it yourself.
For example, you could safely wrap access to your array in a function:
typedef struct
{
char *data;
int length;
} myArrayType;
void MakeArray( myArrayType *p, int length )
{
p->data = (char *)malloc(length);
p->length = length;
}
int WriteToArrayWithBoundsChecking( myArrayType *p, int index, char value )
{
if ( index >= 0 && index < p->length )
{
p->data[index] = value;
return 1; // return "success"
}
else
{
return 0; // return "failure"
}
}
Then you can look at the return value of WriteToArrayWithBoundsChecking() to see if your write succeeded or not.
Of course you must remember to clean up the memory pointed at by myArrayType->data when you are done. Otherwise you will cause a leak.

dont you mean?
num[11] = val
Yes there is no way to check that it is beyond bounds except doing it yourself, C provides no way to do this. Also note that arrays start at zero so num[10] is also beyond bounds.

The standard defines this as Undefined behavior.
It might work, it might not, you never know, when coding in C/C++, make sure you check for bounds before accessing your arrays

Common C compilers will not perform array bounds checking for you.
Some compilers are available that claim to support array bounds -- but their performance is usually poor enough compared to the normal compilers that they are usually not distributed far and wide.
There are even dialects of C intended to provide memory safety, but again, these usually do not get very far. (The linked Cyclone, for example, only supports 32 bit platforms, last time I looked into it.)
You may build your own datastructures to provide bounds checking if you wish. If you maintain a structure that includes a pointer to the start of your data, a data member that includes the allocated size, and functions that work on the structure, you can implement all this. But the onus is entirely on you or your environment to provide these datastructures.

I guess you could use sizeof to avoid your array access out of bound index. But c allows you to access some memory out of your array bound. That's OK for c compiler, and OS will manage the behavior when you do that.
C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
You could use array like this:
type name[size];

if you are using the Visual Studio 2010 ( or 2011 Beta ) it will till you after u try to free the allocated memory.
there is advanced tools to check for leaked memory.
in you example, you have actually moved to unauthorized memory space indeed. your indexes should be between 0 to ( including ) 999.

Related

Declaring arraz with dynamic content

I want to make a program which will say how many big and short letters is in the word and such, but run in to the problem I can't declare content of array dynamically. This is all C code.
I tried this:
char something;
scanf("%c",somethnig);
char somethingmore[]=something;
printf("%c",something[0])
but it wasn't possible to compile I also tried something like this:
char *something;
scanf("%c",something);
printf("%c",something[0]);
which was possible to compile but crushed when called array pointer(I apologize if the naming is wrong) I programing beginner so this is maybe silly question.
This is all just example of problem I run to not code of my program.

Well, disregarding the weirdly wrong syntax in your snippet, I think a good answer comes down to remind you of one thing:
C doesn't do any memory management for you.
Or, in other words, managing memory has to be done explicitly. As a consequence, arrays have a fixed size in C (must be known at compile time, so the compiler can reserve appropriate space in the binary, typically in a data segment, or on the stack for a local variable).
One notable exception is variable length arrays in c99, but even with them, the size of the array can be set only one time -- at initialization. It's a matter of taste whether to consider this a great thing or just a misfeature, but it will not solve your problem of changing the size of something at runtime.
If you want to dynamically grow something, there's only one option: make it an allocated object and manage memory for it yourself, using the functions malloc(), calloc(), realloc() and free(). All these functions are part of standard C, so you should read up on them. A typical usage (not related to your question) would be something like:
#include <stdlib.h>
int *list = 0;
size_t capacity = 0;
size_t count = 0;
void append(int value)
{
if (capacity)
{
if (count == capacity)
{
/* reserve more space, for real world code check realloc()
* return value */
capacity *= 2;
list = realloc(list, capacity * sizeof(int));
}
}
else
{
/* reserve an initial amount, for real world code check malloc()
* return value */
capacity = 16;
list = malloc(capacity * sizeof(int));
}
list[count++] = value;
}
This is very simplified, you'd probably define a container as a struct containing your pointer to the "array" as well as the capacity and count members and define functions working on that struct in some real world code. Or you could go and use predefined containers in a library like e.g. glib.

Does cyclone perform static or dynamic checks on fat pointers?

I am working my way thru Cyclone: A Safe Dialect of C for a PL class. The paper's authors explain that they've added a special 'fat' pointer that stores bounds information to prevent buffer overflows. But they don't specify if the check on this pointer is static or dynamic. The example they give seems to imply that the programmer must remember to check the size of the array in order to check that they don't exceed the buffer. This seems to open up the possibility of programming errors, just like in C. I thought the whole idea of Cyclone was to make such errors impossible. Does the language have a check? Does it just make it harder to make programming mistakes?
int strlen(const char ?s) {
int i, n;
if (!s) return 0;
n = s.size; //what if the programmer forgets to do this.. or accidentally adds an n++;
for (i = 0; i < n; i++,s++)
if (!*s) return i;
return n;
}

"Fat" pointers support pointer arithmetic with run-time bounds
checking.
Obtained from Wikipedia by googling for “fat pointers”.

How to prevent dangling pointers/junk in c?

I'm new to C and haven't really grasped when C decides to free an object and when it decides to keep an object.
heap_t is pointer to a struct heap.
heap_t create_heap(){
heap_t h_t = (heap_t)malloc(sizeof(heap));
h_t->it = 0;
h_t->len = 10;
h_t->arr = (token_t)calloc(10, sizeof(token));
//call below a couple of times to fill up arr
app_heap(h_t, ENUM, "enum", 1);
return h_t;
}
putting h_t through
int app_heap(heap_t h, enum symbol s, char* word, int line){
int it = h->it;
int len = h->len;
if (it + 1 < len ){
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
h->arr[it].word = word;
h->arr[it].line = line;
h->it = it + 1;
printf(h->arr[it].word);
return 1;
} else {
h->len = len*2;
h->arr = realloc(h->arr, len*2);
return app_heap(h, s, word, line);
}
}
Why does my h_t->arr fill up with junk and eventually I get a segmentation fault? How do I fix this? Any C coding tips/styles to avoid stuff like this?

First, to answer your question about the crash, I think the reason you are getting segmentation fault is that you fail to multiply len by sizeof(token) in the call to realloc. You end up writing past the end of the block that has been allocated, eventually triggering a segfault.
As far as "deciding to free an object and when [...] to keep an object" goes, C does not decide any of it for you: it simply does it when you tell it to by calling free, without asking you any further questions. This "obedience" ends up costing you sometimes, because you can accidentally free something you still need. It is a good idea to NULL out the pointer, to improve your chance of catching the issue faster (unfortunately, this is not enough to eliminate the problem altogether, because of shared pointers).
free(h->arr);
h -> arr = NULL; // Doing this is a good practice
To summarize, managing memory in C is a tedious task that requires a lot of thinking and discipline. You need to check the result of every allocation call to see if it has failed, and perform many auxiliary tasks when it does.

C does not "decide" anything, if you have allocated something yourself with an explicit call to e.g. malloc(), it will stay allocated until you free() it (or until the program terminates, typically).
I think this:
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
/* more accesses */
is very weird, the first two lines don't do anything sensible.
As pointed out by dasblinkenlight, you're failing to scale the re-allocation into bytes, which will cause dramatic shrinkage of the array when it tries to grow, and corrupt it totally.
You shouldn't cast the return values of malloc() and realloc(), in C.
Remember that realloc() might fail, in which case you will lose your pointer if you overwrite it like you do.
Lots of repetition in your code, i.e. realloc(h->arr, len*2) instead of realloc(h->arr, h->len * sizeof *h->arr) and so on.
Note how the last bullet point also fixes the realloc() scaling bug mentioned above.

You're not reallocating to the proper size, the realloc statement needs to be:
realloc(h->arr, sizeof(token) * len*2);
^^^^^^^^^^^^
(Or perhaps better realloc(h->arr, sizeof *h->arr * h->h_len);)
In C, you are responsible to free the memory you allocate. You have to free() the memory you've malloc/calloc/realloc'ed when it's suitable to do so. The C runtime never frees anything, except when the program has terminated(some more esoteric systems might not release the memory even then).
Also, try to be consistent, the general form for allocating is always T *foo = malloc(sizeof *foo), and dont duplicate stuff.
e.g.
h_t->arr = (token_t)calloc(10, sizeof(token));
^^^^^^^^ ^^ ^^^^^^^^^^^^^
Don't cast the return value of malloc in C. It's unncessesary and might hide a serious compiler warning and bug if you forget to include stdlib.h
the cast is token_t but the sizeof applies to token, why are they different, and are they the same type as *h_t->arr ?
You already have the magic 10 value, use h_t->len
If you ever change the type of h_t->arr, you have to remember to change the sizeof(..)
So make this
h_t->arr = calloc(h_t->len, sizeof *h_t->arr);

Two main problems in creating dangling pointers in C are the not assigning
NULL to a pointer after freeing its allocated memory, and shared pointers.
There is a solution to the first problem, of automatically nulling out the pointer.
void SaferFree(void *AFree[])
{
free(AFree[0]);
AFree[0] = NULL;
}
The caller, instead calling
free(p);
will call
SaferFree(&p);
In respect to the second and harder to be siolved issue:
The rule of three says:
If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.
Sharing a pointer in C is simply copying it (copy assignment). It means that using the rule of three (or the general rule of 0)
when programming in C obliges the programmer to supply a way to construct and especially destruct such an assignment, which is possible, but not an
easy task especially when C does not supply a descructor that is implicitly activated as in C++.

Why am i able to access other variables using array indexing?

Here len is at A[10] and i is at A[11]. Is there a way to catch these errors??
I tried compiling with gcc -Wall -W but no warnings are displayed.
int main()
{
int A[10];
int i, len;
len = sizeof(A) / sizeof(0[A]);
printf("Len = %d\n",len);
for(i = 0; i < len; ++i){
A[i] = i*19%7;
}
A[i] = 5;
A[i + 1] = 6;
printf("Len = %d i = %d\n",len,i);
return 0;
}
Output :
Len = 10
Len = 5 i = 6

You are accessing memory outside the bounds of the array; in C, there is no bounds checking done on array indices.
Accessing memory beyond the end of the array technically results in undefined behavior. This means that there are no guarantees about what happens when you do it. In your example, you end up overwriting the memory occupied by another variable. However, undefined behavior can also cause your application to crash, or worse.
Is there a way to catch these errors?
The compiler can catch some errors like this, but not many. It is often impossible to catch this sort of error at compile-time and report a warning.
Static analysis tools can catch other instances of this sort of error and are usually built to report warnings about code that is likely to cause this sort of error.

C does not generally do bounds-checking, but a number of people have implemented bounds-checking for C. For instance there is a patch for GCC at http://sourceforge.net/projects/boundschecking/. Of course bounds-checking does have some overhead, but it can often be enabled and disabled on a per-file basis.

The array allocation of A is adjacent in memory to i and len. Remember that when you address via an array, it's exactly like using pointers, and you're walking off the end of the array, bumping into the other stuff you put there.
C by default does not do bounds checking. You're expected to be careful as a programmer; in exchange you get speed and size benefits.
Usually external tools, like lint, will catch the problems via static code analysis. Some libraries, depending on compiler vendor, will add additional padding or memory protection to detect when you've walked off the end.
Lots of interesting, dangerous, and non-portable things reside in memory at "random spots." Most of the house keeping for heap memory allocations occur in memory locations before the one the compiler gives you.
The general rule is that if you didn't allocate or request it, don't mess with it.

i's location in memory is just past the end of A. That's not guaranteed with every compiler and every architecture, but most probably wouldn't have a reason to do it any other way.
Note that counting from 0 to 9, you have 10 elements.

Array indexing starts from 0. Hence the size of array is equal to one less than the declared value. You are overwriting the memory beyond what is allowed.
These errors may not be reported as warnings but you can use tools like prevent, sparrow, Klockworks or purify to find such "malpractices" if i may call them that.

The short answer is that local variables are al-located on stack, and indexing is just like *(ptr + index). So it could happen that the room for int y[N] is adjacent to the room for another int x; e.g. x is located after the last y. So, y[N-1] is this last y, while y[N] is the int past the last y, and in this case, by accident, it happens you get x (or whatever in your practical example). But it is absolutely not a sure fact what you can get going past the bounds of an array and so you can't rely on that. Even though undetected, it's a "index out of bound error", and a source of bugs.

C Memory Management

I've always heard that in C you have to really watch how you manage memory. And I'm still beginning to learn C, but thus far, I have not had to do any memory managing related activities at all.. I always imagined having to release variables and do all sorts of ugly things. But this doesn't seem to be the case.
Can someone show me (with code examples) an example of when you would have to do some "memory management" ?

There are two places where variables can be put in memory. When you create a variable like this:
int a;
char c;
char d[16];
The variables are created in the "stack". Stack variables are automatically freed when they go out of scope (that is, when the code can't reach them anymore). You might hear them called "automatic" variables, but that has fallen out of fashion.
Many beginner examples will use only stack variables.
The stack is nice because it's automatic, but it also has two drawbacks: (1) The compiler needs to know in advance how big the variables are, and (2) the stack space is somewhat limited. For example: in Windows, under default settings for the Microsoft linker, the stack is set to 1 MB, and not all of it is available for your variables.
If you don't know at compile time how big your array is, or if you need a big array or struct, you need "plan B".
Plan B is called the "heap". You can usually create variables as big as the Operating System will let you, but you have to do it yourself. Earlier postings showed you one way you can do it, although there are other ways:
int size;
// ...
// Set size to some value, based on information available at run-time. Then:
// ...
char *p = (char *)malloc(size);
(Note that variables in the heap are not manipulated directly, but via pointers)
Once you create a heap variable, the problem is that the compiler can't tell when you're done with it, so you lose the automatic releasing. That's where the "manual releasing" you were referring to comes in. Your code is now responsible to decide when the variable is not needed anymore, and release it so the memory can be taken for other purposes. For the case above, with:
free(p);
What makes this second option "nasty business" is that it's not always easy to know when the variable is not needed anymore. Forgetting to release a variable when you don't need it will cause your program to consume more memory that it needs to. This situation is called a "leak". The "leaked" memory cannot be used for anything until your program ends and the OS recovers all of its resources. Even nastier problems are possible if you release a heap variable by mistake before you are actually done with it.
In C and C++, you are responsible to clean up your heap variables like shown above. However, there are languages and environments such as Java and .NET languages like C# that use a different approach, where the heap gets cleaned up on its own. This second method, called "garbage collection", is much easier on the developer but you pay a penalty in overhead and performance. It's a balance.
(I have glossed over many details to give a simpler, but hopefully more leveled answer)

Here's an example. Suppose you have a strdup() function that duplicates a string:
char *strdup(char *src)
{
char * dest;
dest = malloc(strlen(src) + 1);
if (dest == NULL)
abort();
strcpy(dest, src);
return dest;
}
And you call it like this:
main()
{
char *s;
s = strdup("hello");
printf("%s\n", s);
s = strdup("world");
printf("%s\n", s);
}
You can see that the program works, but you have allocated memory (via malloc) without freeing it up. You have lost your pointer to the first memory block when you called strdup the second time.
This is no big deal for this small amount of memory, but consider the case:
for (i = 0; i < 1000000000; ++i) /* billion times */
s = strdup("hello world"); /* 11 bytes */
You have now used up 11 gig of memory (possibly more, depending on your memory manager) and if you have not crashed your process is probably running pretty slowly.
To fix, you need to call free() for everything that is obtained with malloc() after you finish using it:
s = strdup("hello");
free(s); /* now not leaking memory! */
s = strdup("world");
...
Hope this example helps!

You have to do "memory management" when you want to use memory on the heap rather than the stack. If you don't know how large to make an array until runtime, then you have to use the heap. For example, you might want to store something in a string, but don't know how large its contents will be until the program is run. In that case you'd write something like this:
char *string = malloc(stringlength); // stringlength is the number of bytes to allocate
// Do something with the string...
free(string); // Free the allocated memory

I think the most concise way to answer the question in to consider the role of the pointer in C. The pointer is a lightweight yet powerful mechanism that gives you immense freedom at the cost of immense capacity to shoot yourself in the foot.
In C the responsibility of ensuring your pointers point to memory you own is yours and yours alone. This requires an organized and disciplined approach, unless you forsake pointers, which makes it hard to write effective C.
The posted answers to date concentrate on automatic (stack) and heap variable allocations. Using stack allocation does make for automatically managed and convenient memory, but in some circumstances (large buffers, recursive algorithms) it can lead to the horrendous problem of stack overflow. Knowing exactly how much memory you can allocate on the stack is very dependent on the system. In some embedded scenarios a few dozen bytes might be your limit, in some desktop scenarios you can safely use megabytes.
Heap allocation is less inherent to the language. It is basically a set of library calls that grants you ownership of a block of memory of given size until you are ready to return ('free') it. It sounds simple, but is associated with untold programmer grief. The problems are simple (freeing the same memory twice, or not at all [memory leaks], not allocating enough memory [buffer overflow], etc) but difficult to avoid and debug. A hightly disciplined approach is absolutely mandatory in practive but of course the language doesn't actually mandate it.
I'd like to mention another type of memory allocation that's been ignored by other posts. It's possible to statically allocate variables by declaring them outside any function. I think in general this type of allocation gets a bad rap because it's used by global variables. However there's nothing that says the only way to use memory allocated this way is as an undisciplined global variable in a mess of spaghetti code. The static allocation method can be used simply to avoid some of the pitfalls of the heap and automatic allocation methods. Some C programmers are surprised to learn that large and sophisticated C embedded and games programs have been constructed with no use of heap allocation at all.

There are some great answers here about how to allocate and free memory, and in my opinion the more challenging side of using C is ensuring that the only memory you use is memory you've allocated - if this isn't done correctly what you end up with is the cousin of this site - a buffer overflow - and you may be overwriting memory that's being used by another application, with very unpredictable results.
An example:
int main() {
char* myString = (char*)malloc(5*sizeof(char));
myString = "abcd";
}
At this point you've allocated 5 bytes for myString and filled it with "abcd\0" (strings end in a null - \0).
If your string allocation was
myString = "abcde";
You would be assigning "abcde" in the 5 bytes you've had allocated to your program, and the trailing null character would be put at the end of this - a part of memory that hasn't been allocated for your use and could be free, but could equally be being used by another application - This is the critical part of memory management, where a mistake will have unpredictable (and sometimes unrepeatable) consequences.

A thing to remember is to always initialize your pointers to NULL, since an uninitialized pointer may contain a pseudorandom valid memory address which can make pointer errors go ahead silently. By enforcing a pointer to be initialized with NULL, you can always catch if you are using this pointer without initializing it. The reason is that operating systems "wire" the virtual address 0x00000000 to general protection exceptions to trap null pointer usage.

Also you might want to use dynamic memory allocation when you need to define a huge array, say int[10000]. You can't just put it in stack because then, hm... you'll get a stack overflow.
Another good example would be an implementation of a data structure, say linked list or binary tree. I don't have a sample code to paste here but you can google it easily.

(I'm writing because I feel the answers so far aren't quite on the mark.)
The reason you have to memory management worth mentioning is when you have a problem / solution that requires you to create complex structures. (If your programs crash if you allocate to much space on the stack at once, that's a bug.) Typically, the first data structure you'll need to learn is some kind of list. Here's a single linked one, off the top of my head:
typedef struct listelem { struct listelem *next; void *data;} listelem;
listelem * create(void * data)
{
listelem *p = calloc(1, sizeof(listelem));
if(p) p->data = data;
return p;
}
listelem * delete(listelem * p)
{
listelem next = p->next;
free(p);
return next;
}
void deleteall(listelem * p)
{
while(p) p = delete(p);
}
void foreach(listelem * p, void (*fun)(void *data) )
{
for( ; p != NULL; p = p->next) fun(p->data);
}
listelem * merge(listelem *p, listelem *q)
{
while(p != NULL && p->next != NULL) p = p->next;
if(p) {
p->next = q;
return p;
} else
return q;
}
Naturally, you'd like a few other functions, but basically, this is what you need memory management for. I should point out that there are a number tricks that are possible with "manual" memory management, e.g.,
Using the fact that malloc is guaranteed (by the language standard) to return a pointer divisible by 4,
allocating extra space for some sinister purpose of your own,
creating memory pools..
Get a good debugger... Good luck!

#Euro Micelli
One negative to add is that pointers to the stack are no longer valid when the function returns, so you cannot return a pointer to a stack variable from a function. This is a common error and a major reason why you can't get by with just stack variables. If your function needs to return a pointer, then you have to malloc and deal with memory management.

#Ted Percival:
...you don't need to cast malloc()'s return value.
You are correct, of course. I believe that has always been true, although I don't have a copy of K&R to check.
I don't like a lot of the implicit conversions in C, so I tend to use casts to make "magic" more visible. Sometimes it helps readability, sometimes it doesn't, and sometimes it causes a silent bug to be caught by the compiler. Still, I don't have a strong opinion about this, one way or another.
This is especially likely if your compiler understands C++-style comments.
Yeah... you caught me there. I spend a lot more time in C++ than C. Thanks for noticing that.

In C, you actually have two different choices. One, you can let the system manage the memory for you. Alternatively, you can do that by yourself. Generally, you would want to stick to the former as long as possible. However, auto-managed memory in C is extremely limited and you will need to manually manage the memory in many cases, such as:
a. You want the variable to outlive the functions, and you don't want to have global variable. ex:
struct pair{
int val;
struct pair *next;
}
struct pair* new_pair(int val){
struct pair* np = malloc(sizeof(struct pair));
np->val = val;
np->next = NULL;
return np;
}
b. you want to have dynamically allocated memory. Most common example is array without fixed length:
int *my_special_array;
my_special_array = malloc(sizeof(int) * number_of_element);
for(i=0; i
c. You want to do something REALLY dirty. For example, I would want a struct to represent many kind of data and I don't like union (union looks soooo messy):
struct data{
int data_type;
long data_in_mem;
};
struct animal{/*something*/};
struct person{/*some other thing*/};
struct animal* read_animal();
struct person* read_person();
/*In main*/
struct data sample;
sampe.data_type = input_type;
switch(input_type){
case DATA_PERSON:
sample.data_in_mem = read_person();
break;
case DATA_ANIMAL:
sample.data_in_mem = read_animal();
default:
printf("Oh hoh! I warn you, that again and I will seg fault your OS");
}
See, a long value is enough to hold ANYTHING. Just remember to free it, or you WILL regret. This is among my favorite tricks to have fun in C :D.
However, generally, you would want to stay away from your favorite tricks (T___T). You WILL break your OS, sooner or later, if you use them too often. As long as you don't use *alloc and free, it is safe to say that you are still virgin, and that the code still looks nice.

Sure. If you create an object that exists outside of the scope you use it in. Here is a contrived example (bear in mind my syntax will be off; my C is rusty, but this example will still illustrate the concept):
class MyClass
{
SomeOtherClass *myObject;
public MyClass()
{
//The object is created when the class is constructed
myObject = (SomeOtherClass*)malloc(sizeof(myObject));
}
public ~MyClass()
{
//The class is destructed
//If you don't free the object here, you leak memory
free(myObject);
}
public void SomeMemberFunction()
{
//Some use of the object
myObject->SomeOperation();
}
};
In this example, I'm using an object of type SomeOtherClass during the lifetime of MyClass. The SomeOtherClass object is used in several functions, so I've dynamically allocated the memory: the SomeOtherClass object is created when MyClass is created, used several times over the life of the object, and then freed once MyClass is freed.
Obviously if this were real code, there would be no reason (aside from possibly stack memory consumption) to create myObject in this way, but this type of object creation/destruction becomes useful when you have a lot of objects, and want to finely control when they are created and destroyed (so that your application doesn't suck up 1GB of RAM for its entire lifetime, for example), and in a Windowed environment, this is pretty much mandatory, as objects that you create (buttons, say), need to exist well outside of any particular function's (or even class') scope.