What are the benefits of using const pointers or const data? - c

How can a non-const pointer (one that allows access to nearby memory) or non-const data be exploited?
I have read that a non-const pointer can be exploited to access nearby memory.
But my question is: how?
I mean, once my compiled program is running, how can an attacker exploit it?
Is this a dangerous vulnerability, or is it not that important?

Constant Pointer
A constant pointer in C cannot change the address it holds: once it has been initialized to point to some variable, it cannot be made to point to any other variable. This is distinct from a pointer to const data, which can be re-pointed but cannot be used to modify what it points to.
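The two placements of const can be sketched as follows. The function names sum_ints and fill_ints are made up for this illustration; the commented-out lines show what a conforming compiler is expected to reject.

```c
#include <stddef.h>

/* Pointer to const data: the function may read through p but not write. */
int sum_ints(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];          /* reading is allowed */
    /* p[0] = 0;   rejected: the pointed-to data is const */
    return total;
}

/* Constant pointer to mutable data: the data may change, the pointer may not. */
void fill_ints(int *const p, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        p[i] = value;           /* writing through the pointer is allowed */
    /* p = NULL;   rejected: p itself is const */
}
```

Calling fill_ints(a, 3, 7) on a three-element array and then sum_ints(a, 3) gives 21; the const in each signature documents which of the two (pointer or data) the function promises not to touch.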
Constant Data
Code Maintenance
Say I need to use the speed of light for calculations throughout my project. The speed of light is not going to change, so there is no need for a variable. I could simply copy and paste 300,000,000 (obviously without the commas) any time I need the speed of light (in meters per second). This will work fine for now, but down the road my calculations may require more precision, and changing 300,000,000 to 299,792,458 will be a fairly painful process if I used this value a lot in my code. If I use a const, I can use the name of this const throughout the code, so if I need to change its value I can just change it once at the beginning of the code.
Memory
This isn’t usually relevant given the computing power of modern devices, but software on embedded devices may not have enough RAM for every variable. A const value can be placed in the program image when the code is compiled, so it can occupy read-only storage (typically flash or ROM on embedded targets) instead of RAM.
Accidental Changes
If you need a value to stay constant, use a const so that you don’t accidentally change it later on because you forgot that it needs to remain the same. Think of it this way: if its value is not going to change throughout the code, why wouldn’t you use a const?

Pointers are among the most dangerous features of C because they give direct access to memory. If you know the address a pointer holds, you can move forward or backward from it and read or write whatever is there within the process's address space, since C performs no bounds checking and has nothing like an out-of-bounds exception.
For that, you need to have a good understanding of pointers.
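Because the language itself does not check bounds, any check has to be written by hand. A minimal sketch, assuming a hypothetical helper named checked_store: an unchecked buf[index] = value with an index that an attacker can influence would write to whatever happens to sit next to the buffer, whereas this version refuses the write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at buf[index] only if index is in range. */
bool checked_store(int *buf, size_t len, size_t index, int value)
{
    if (buf == NULL || index >= len)
        return false;       /* refuse instead of overwriting neighbouring memory */
    buf[index] = value;
    return true;
}
```

With a four-element array, checked_store(a, 4, 3, 9) succeeds and checked_store(a, 4, 4, 1) returns false without touching memory.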

Related

Where will a local const variable be stored?

I have verified that everywhere in the function where the const variable is used, it gets replaced with its value (like immediate addressing mode). But if a pointer is assigned to it, then it gets stored on the stack. Here I do not understand one thing: how does the processor know that its value is constant? Is there a read-only section in the stack, like the one present in the .data section?
Generally, the processor does not know that an object is declared const in C.
Systems commonly have regions of memory that are marked read-only after a program is loaded, and static const objects are stored in such memory. For these objects, the processor enforces the read-only property.
Systems generally do not use read-only memory for the stack. This would be inherently difficult: the memory would need to be read-write when a function is starting, so that its stack frame can be constructed, but read-only at other times. So the program would be frequently changing the hardware memory protection settings. This would impair performance and is generally not considered worthwhile.
So programs generally have only a read-write stack available. When you declare an automatic (rather than static) const object, where can the compiler put it? As you note, it is often optimized into an immediate operand in instructions. However, when you take its address, it must have an address, so it must be in memory.
One idea might be that, since it is const, it will not change, so we only need one copy, so it can be stored in the static read-only section instead of on the stack. However, the C standard says that each different object has a different address. To comply with that requirement, the compiler has to create a different instance of the object in memory each time it is created in the C code. Putting it on the stack is an easy way to do this.
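The "different object, different address" requirement can be observed with recursion. This is only a sketch; the function name distinct_instances is invented for the illustration.

```c
#include <stddef.h>   /* NULL, for the first call */

/*
 * Every active call has its own `local`. Because its address is taken and
 * handed to the next call, each live instance needs distinct storage.
 * Returns 1 if the caller's instance and this call's instance differ.
 */
int distinct_instances(const int *callers_local, int depth)
{
    const int local = 42;

    if (depth == 0)
        return callers_local != &local;

    return distinct_instances(&local, depth - 1);
}
```

Calling distinct_instances(NULL, 1) returns 1: the two const objects are alive at the same time, so they cannot share one read-only slot.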
I think it totally depends on your tool-chain specific implementation. Variables are stored in RAM, program in Flash memory and constants either in RAM or Flash.
Correct me if I'm wrong.

Does making a variable a const or final save bytes or memory?

I've been working with a program and I've been trying to conserve bytes and storage space.
I have many variables in my C program, but I wondered if I could reduce the program's size by making some of the variables that don't change throughout the program const or final.
So my questions are these:
Is there any byte save when identifying static variables as constant?
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const just in case we need to change the variable later (especially if there is no downfall in doing so)?
Are there only some IDEs/Languages that save bytes with constant variables?
Thanks for any help, it is greatly appreciated.
I presume you're working on a deeply embedded system (like Cortex-M processors).
For these, you know that SRAM is a scarce resource whereas you have plenty of FLASH memory.
So, as much as you can, use the const keyword for any variable that doesn't change. On such toolchains this typically lets the compiler and linker place the variable in FLASH memory rather than in SRAM.
For example, to store a text on your system you can do this:
const char* const txtMenuRoot[] = { "Hello, this is the root menu", "Another text" };
Then not only is the text stored in FLASH, but so are the pointers to it.
All your questions depend heavily on compiler and environment. A C compiler intended for embedded environment can do a great job about saving memory, while others maybe not.
Is there any byte save when identifying static variables as constant?
Yes, it may be possible. But note that "const" generally isn't intended to specify how to store a variable; instead, its purpose is to help the programmer and the compiler better understand the source code (when the compiler "understands better", it can produce better object code). Some compilers can also use that information to store the variable in read-only memory, or to delete it and turn it into literals in the object code. But in the context of your question, a #define may be more suitable.
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
Variables declared in source code can go to different places in the object code, and different places when an object file is loaded in memory and executed. Note that, again, there are differences on various architectures - for example in a small 8/16 bits MCU (cpu for electronic devices), generally there is no "loading" of an object file. So the value of a variable is stored somewhere - anyway. But at low level the compiler can use literals instead of addresses, and this mostly saves some memory. Suppose you declare a constant variable GAIN=5 in source code. When that variable is used in some formula, the compiler emits something like "LD R12,GAIN" (loads register R12 with the content of the address GAIN, where variable GAIN is stored). But the compiler can also emit "LD R12,#5" (loads the value "5" in R12). In both cases an instruction is needed, but in the second case there is no memory for variables involved. This is a saving, and can also be faster.
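The GAIN example might look like this in source form. This is a sketch: apply_gain is a made-up function, and whether the constant ends up as an immediate operand depends on the compiler and its optimization settings.

```c
/* Assumed example constant, mirroring GAIN = 5 from the text. */
static const int GAIN = 5;

int apply_gain(int sample)
{
    /* Since GAIN is const and its value is visible here, a compiler is free
       to encode 5 directly in the instruction instead of loading it. */
    return sample * GAIN;
}
```

Inspecting the generated assembly (for instance with the compiler's assembly-output option) is the way to see which of the two forms a given toolchain actually emits.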
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const just in case we need to change the variable later (especially if there is no downfall in doing so)?
As said earlier, the "const" keyword is meant to better define what operations will be done on the variable. This is useful to programmers, for clarity. It clearly states that a variable is not intended to be modified, which is especially helpful when the variable is a formal parameter. In some environments there is memory that can only be read and not written to; if a variable (maybe a "system variable") is marked as "const", the situation is clear to both the programmer and the compiler, which can warn if it encounters code trying to modify that variable.
Are there only some IDEs/Languages that save bytes with constant variables?
Definitely yes. But don't talk about IDEs: they are only editors. And about languages, things are complicated: it depends entirely on implementation and optimization. Likely this kind of saving is used only in compilers (not interpreters), and depends a lot on optimization options/capabilities of the compiler.
Think of const this way (there is no such thing as final or constant in C, so I'll just ignore that). If it's possible for the compiler to save memory, it will (especially when you compile optimizing for size). const gives the compiler more information about the properties of an object. The compiler can make smarter decisions when it has more information and it doesn't prevent the compiler from making the exact same decision as before it had that information.
It can't hurt and may help and it also helps the programmers working with the code to easier reason about it. Both the compiler and the programmer are helped, no one gets hurt. It's a win-win.
Compilers can reduce the memory used based on their knowledge of the code, and const helps the compiler know the real behaviour of the code (if you enable warnings, you may get suggestions about where to put const).
But a struct can contain unused bytes due to the alignment restrictions of the hardware, and compilers cannot alter the order of a struct's members. This can be done only by changing the code.
struct wide
{
    int_least32_t i1;
    int_least8_t b;
    int_least32_t i2;
};

struct compact
{
    int_least32_t i1, i2;
    int_least8_t b;
};
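Due to alignment restrictions, struct wide can have empty space (padding) between members 'b' and 'i2'.
This is not the case in struct compact, because the members are listed from the widest, which can require greater alignment, to the narrowest. Note that the compiler may still pad the end of struct compact so that arrays of it stay aligned; the saving shows up mainly when several small members are grouped together.
In some cases struct compact even leads to faster code.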
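The effect of member order can be measured with sizeof. The sketch below is a variation on the structs above with two small members instead of one (the names wide2, compact2 and padding_bytes are invented here), because with a single small member the padding often just moves to the end and the total size stays the same.

```c
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with wide ones: padding is likely after each. */
struct wide2 {
    int_least32_t i1;
    int_least8_t  b1;
    int_least32_t i2;
    int_least8_t  b2;
};

/* Widest members first, small members grouped at the end. */
struct compact2 {
    int_least32_t i1, i2;
    int_least8_t  b1, b2;
};

/* Bytes of a struct of this shape that are not occupied by its members. */
size_t padding_bytes(size_t struct_size)
{
    return struct_size - (2 * sizeof(int_least32_t) + 2 * sizeof(int_least8_t));
}
```

On a typical ABI where int_least32_t is 4 bytes with 4-byte alignment, one would expect sizeof(struct wide2) to be 16 and sizeof(struct compact2) to be 12, but the exact figures are implementation-defined, so printing padding_bytes(sizeof(struct wide2)) and padding_bytes(sizeof(struct compact2)) on the target is the reliable check.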

Why Use Pointers in C?

I'm still wondering why in C you can't simply set something to be another thing using plain variables. A variable itself is a pointer to data, is it not? So why make pointers point to the data in the variable when you can simply use the original variable? Is it to access specific bits (or bytes, I guess) of data within said variable?
I'm sure it's logical, however I have never fully grasped the concept and when reading code seeing *pointers always throws me off.
One common place where pointers are helpful is when you are writing functions. Functions take their arguments 'by value', which means that they get a copy of what is passed in and if a function assigns a new value to one of its arguments that will not affect the caller. This means that you couldn't write a "doubling" function like this:
void doubling(int x)
{
    x = x * 2;
}
This makes sense because otherwise what would the program do if you called doubling like this:
doubling(5);
Pointers provide a tool for solving this problem because they let you write functions that take the address of a variable, for example:
void doubling2(int *x)
{
    (*x) = (*x) * 2;
}
The function above takes the address of an integer as its argument. The one line in the function body dereferences that address twice: on the left-hand side of the equal sign we are storing into that address, and on the right-hand side we are reading the integer value from that address and multiplying it by 2. The end result is that the value found at that address is now doubled.
As an aside, when we want to call this new function we can't pass in a literal value (e.g. doubling2(5)) as it won't compile because we are not properly giving the function an address. One way to give it an address would look like this:
int a = 5;
doubling2(&a);
The end result of this would be that our variable a would contain 10.
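Put side by side in one compilable unit, the by-value and by-pointer versions behave as follows (a sketch reusing the names from the answer above):

```c
/* By value: modifies only the function's private copy of the argument. */
void doubling(int x)
{
    x = x * 2;
}

/* By pointer: modifies the caller's variable through its address. */
void doubling2(int *x)
{
    *x = *x * 2;
}
```

After int a = 5; a call to doubling(a) leaves a at 5, while doubling2(&a) changes it to 10.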
A variable itself is a pointer to data
No, it is not. A variable represents an object, an lvalue. The concept of lvalue is fundamentally different from the concept of a pointer. You seem to be mixing the two.
In C it is not possible to "rebind" an lvalue to make it "point" to a different location in memory. The binding between lvalues and their memory locations is determined and fixed at compile time. It is not always 100% specific (e.g. absolute location of a local variable is not known at compile time), but it is sufficiently specific to make it non-user-adjustable at run time.
The whole idea of a pointer is that its value is generally determined at run time and can be made to point to different memory locations at run time.
No, a variable is not a pointer to data. If you declare two integers with int x, y;, then there is no way to make x and y refer to the same thing; they are separate.
Whenever you read or write from a variable, your computer has to somehow determine the exact location of that variable in your computer's memory. Your computer will look at the code you wrote and use that to determine where the variable is. A pointer can represent the situation where the location is not known at the time when you compile your code; the exact address is only computed later when you actually run your code.
If you weren't allowed to use pointers or arrays, every line of code you write would have to access specific variables that are known at compile time. You couldn't write a general piece of code that reads and writes from different places in memory that are specified by the caller.
Note: You can also use arrays with a variable index to access variables whose location is not known at compile time, but arrays are mostly just syntactical sugar for pointers. You can think about all array operations in terms of pointer operations instead. Arrays are not as flexible as pointers.
Another caveat: As AnT points out, the location of local variables is usually on the stack, so they are a type of variable where the location isn't known at compile time. But the only reason that the stack works for storing local variables in a reentrant function is that your compiler implements hidden pointers called the stack pointer and/or frame pointer, and functions use these pointers to find out which part of memory holds their arguments and local variables. Pointers are so useful that the compiler actually uses them behind your back without telling you.
Another reason: C was designed to build operating systems and lots of low-level code that deals with hardware. Every piece of hardware exposes its interface by means of registers, and on nearly all architectures registers are mapped into the CPU's memory space. They need not always be at the same address (thanks to jumper settings, PnP, autoconfig, and so on).
So the OS writer, while writing a driver for instance, needs a way to deal with what look like memory locations, except that they don't refer to RAM cells.
Pointers serve this purpose by allowing the OS writer to specify which memory location he or she wants to access.
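A driver-style access through a pointer might be sketched like this. The helper set_register_bit is hypothetical; on real hardware the address would come from the device's documentation or from bus enumeration, which is exactly why it has to be a run-time pointer value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sets one bit (0..31) in a 32-bit device register.
 * `volatile` tells the compiler that every access must really happen,
 * because the location may be hardware rather than ordinary RAM.
 */
void set_register_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg |= (UINT32_C(1) << bit);
}
```

On a desktop system an ordinary uint32_t variable can stand in for the register when trying this out: starting from 0, set_register_bit(&fake_reg, 3) leaves the value 8.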

Finding roots for garbage collection in C

I'm trying to implement a simple mark and sweep garbage collector in C. The first step of the algorithm is finding the roots. So my question is how can I find the roots in a C program?
In programs that use malloc, I'll be using the custom allocator. This custom allocator is all that will be called from the C program, and maybe a custom init().
How does the garbage collector know what all the pointers (roots) in the program are? Also, given a pointer to a custom type, how does it get all the pointers inside that?
For example, if there's a pointer p pointing to a class list, which has another pointer inside it, say q, how does the garbage collector know about it, so that it can mark it?
Update: How about if I send all the pointer names and types to GC when I init it? Similarly, the structure of different types can also be sent so that GC can traverse the tree. Is this even a sane idea or am I just going crazy?
First off, garbage collectors in C, without extensive compiler and OS support, have to be conservative, because you cannot distinguish between a legitimate pointer and an integer that happens to have a value that looks like a pointer. And even conservative garbage collectors are hard to implement. Like, really hard. And often, you will need to constrain the language in order to get something acceptable: for instance, it might be impossible to correctly collect memory if pointers are hidden or obfuscated. If you allocate 100 bytes and only keep a pointer to the tenth byte of the allocation, your GC is unlikely to figure out that you still need the block since it will see no reference to the beginning. Another very important constraint to control is the memory alignment: if pointers can be on unaligned memory, your collector can be slowed down by a factor of 10x or worse.
To find roots, you need to know where your stacks start, and where your stacks end. Notice the plural form: each thread has its own stack, and you might need to account for that, depending on your objectives. To know where a stack starts, without entering into platform-specific details (that I probably wouldn't be able to provide anyways), you can use assembly code inside the main function of the current thread (just main in a non-threaded executable) to query the stack register (esp on x86, rsp on x86_64 to name those two only). Gcc and clang support a language extension that lets you assign a variable permanently to a register, which should make it easy for you:
register void* stack asm("esp"); // replace esp with the name of your stack reg
(register is a standard language keyword that is most of the time ignored by today's compilers, but coupled with asm("register_name"), it lets you do some nasty stuff.)
To ensure you don't forget important roots, you should defer the actual work of the main function to another one. (On x86 platforms, you can also query ebp/rbp, the stack frame base pointers, instead, and still do your actual work in the main function.)
int main(int argc, const char** argv, const char** envp)
{
    register void* stack asm("esp");
    // put stack somewhere
    return do_main(argc, argv, envp);
}
Once you enter your GC to do collection, you need to query the current stack pointer for the thread you've interrupted. You will need design-specific and/or platform-specific calls for that (though if you get something to execute on the same thread, the technique above will still work).
The actual hunt for roots starts now. Good news: most ABIs will require stack frames to be aligned on a boundary greater than the size of a pointer, which means that if you trust every pointer to be on aligned memory, you can treat your whole stack as an intptr_t* and check if any pattern inside looks like any of your managed pointers.
Obviously, there are other roots. Global variables can (theoretically) be roots, and fields inside structures can be roots too. Registers can also have pointers to objects. You need to separately account for global variables that can be roots (or forbid that altogether, which isn't a bad idea in my opinion) because automatic discovery of those would be hard (at least, I wouldn't know how to do it on any platform).
These roots can lead to references on the heap, where things can go awry if you don't take care.
Since not all platforms provide malloc introspection (as far as I know), you need to implement the concept of scanned memory, that is, memory that your GC knows about. It needs to know at least the address and the size of each such allocation. When you get a reference to one of these, you simply scan it for pointers, just like you did for the stack. (This means that you should take care that your pointers are aligned. This is normally the case if you let your compiler do its job, but you still need to be careful when you use third-party APIs.)
This also means that you cannot put references to collectable memory in places where the GC can't reach them. This is where it hurts the most and where you need to be extra careful. Otherwise, if your platform supports malloc introspection, you can easily tell the size of each allocation you get a pointer to and make sure you don't overrun it.
This just scratches the surface of the topic. Garbage collectors are extremely complex, even when single-threaded. When you add threads to the mix, you enter a whole new world of hurt.
Apple has implemented such a conservative GC for the Objective-C language and dubbed it libauto. They have open-sourced it, along with a good part of the low-level technologies of Mac OS X, and you can find the source here.
I can only quote Hot Licks here: good luck!
Okay, before I go even further, I forgot something very important: compiler optimizations can break the GC. If your compiler is not aware of your GC, it can very well never put certain roots on the stack (only dealing with them in registers), and you're going to miss them. This is not too problematic for single-threaded programs if you can inspect registers, but again, a huge mess for multithreaded programs.
Also be very careful about the interruptibility of allocations: you must make sure that your GC cannot kick in while you're returning a new pointer because it could collect it right before it is assigned to a root, and when your program resumes it would assign that new dangling pointer to your program.
And here's an update to address the edit:
Update: How about if I send all the pointer names and types to GC when
I init it? Similarly, the structure of different types can also be
sent so that GC can traverse the tree. Is this even a sane idea or am
I just going crazy?
I guess you could allocate your memory and then register it with the GC to tell it that it should be a managed resource. That would solve the interruptibility problem. But then, be careful about what you send to third-party libraries, because if they keep a reference to it, your GC might not be able to detect it since they won't register their data structures with your GC.
And you likely won't be able to do that with roots on the stack.
The roots are basically all static and automatic object pointers. Static pointers would be linked inside the load modules. Automatic pointers must be found by scanning stack frames. Of course, you have no idea where in the stack frames the automatic pointers are.
Once you have the roots you need to scan objects and find all the pointers inside them. (This would include pointer arrays.) For that you need to identify the class object and somehow extract from it information about pointer locations. Of course, in C many objects are not virtual and do not have a class pointer within them.
Good luck!!
Added: One technique that could vaguely make your quest possible is "conservative" garbage collection. Since you intend to have your own allocator, you can (somehow) keep track of allocation sizes and locations, so you can pick any pointer-sized chunk out of storage and ask "Might this possibly be a pointer to one of my objects?" You can, of course, never know for sure, since random data might "look like" a pointer to one of your objects, but still you can, through this mechanism, scan a chunk of storage (like a frame in the call stack, or an individual object) and identify all the possible objects it might address.
With a conservative collector you cannot safely do object relocation/compaction (where you modify pointers to objects as you move them) since you might accidentally modify "random" data that looks like an object pointer but is in fact meaningful data to some application. But you can identify unused objects and free up the space they occupy for reuse. With proper design it's possible to have a very effective non-compacting GC.
(However, if your version of C allows unaligned pointers scanning could be very slow, since you'd have to try every variation on byte alignment.)
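The conservative "might this be a pointer to one of my objects?" test can be sketched as below. Everything here is an assumption made for illustration: the names gc_alloc, find_block and gc_mark_range, the fixed-size table, and the choice to accept pointers into the middle of a block. It scans a caller-supplied array of aligned words rather than a real stack, and it does not go on to scan the contents of newly marked blocks, which a full marker would have to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BLOCKS 64               /* arbitrary limit for this sketch */

struct block { void *start; size_t size; int marked; };

static struct block blocks[MAX_BLOCKS];
static size_t block_count;

/* Tracking allocator: remembers the address and size of every block. */
void *gc_alloc(size_t size)
{
    if (block_count == MAX_BLOCKS)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        blocks[block_count++] = (struct block){ p, size, 0 };
    return p;
}

/* Returns the tracked block that `word` could point into, or NULL. */
static struct block *find_block(uintptr_t word)
{
    for (size_t i = 0; i < block_count; i++) {
        uintptr_t lo = (uintptr_t)blocks[i].start;
        if (word >= lo && word - lo < blocks[i].size)
            return &blocks[i];
    }
    return NULL;
}

/*
 * Treats each aligned word in [words, words + count) as a possible pointer
 * and marks the block it could address. Returns how many blocks were
 * newly marked. Non-pointer data that happens to match is marked too,
 * which is what makes the collector conservative.
 */
size_t gc_mark_range(const uintptr_t *words, size_t count)
{
    size_t newly_marked = 0;
    for (size_t i = 0; i < count; i++) {
        struct block *b = find_block(words[i]);
        if (b != NULL && !b->marked) {
            b->marked = 1;
            newly_marked++;
        }
    }
    return newly_marked;
}
```

A sweep phase would then free every tracked block whose marked flag is still 0 and clear the flags on the rest. The linear search in find_block keeps the sketch short; a sorted table or tree would be the obvious replacement once there are many blocks.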

In C if a variable is not assigned a value then why does it take garbage value?

Why do the variables take garbage values?
I guess the rationale for this is that your program will be faster.
If the compiler automatically reset your variables (i.e. initialized them to 0, or to NaN for floats/doubles, and so on), it would take some time doing that (it would have to write to memory).
In many cases initializing variables could be unneeded: maybe you will never access your variable, or will write to it the first time you access it.
Today this optimization is arguable: the overhead of initializing variables is maybe not worth the problems caused by variables left uninitialized by mistake, but when C was designed things were different.
Unassigned variables have a so-called indeterminate value, which can be implemented in whatever way, usually by just leaving unchanged whatever data was in the memory now occupied by the variable.
It just takes whatever is in memory at the address the variable occupies.
When you allocate a variable you are allocating some memory. If you don't overwrite it, the memory will contain whatever "random" information was there before, and that is called a garbage value.
Why would it not? A better question might be "Can you explain how it comes about that a member variable in C# which is not initialised has a known default value?"
When a variable is declared in C, memory is assigned to it but no implicit assignment takes place. Thus, when you read its value, you get whatever is stored in that memory, interpreted as your variable's data type. We call that value a garbage value. It remains so because C implementations do not clear the memory for you.
This happens with local variables and with memory allocated from the heap with malloc(). Local variables are the more typical mishap. They are stored in the stack frame of the function, which is created simply by adjusting the stack pointer by the amount of storage required for the local variables.
The values those variables will have upon entry of the function is essentially random, whatever happened to be stored in those memory locations from a previous function call that happened to use the same stack area.
It is a nasty source of hard-to-diagnose bugs, not least because the values aren't really random. As long as the program has predictable call patterns, it is likely that the initial value repeats well. A compiler often has a debug feature that lets it inject code into the preamble of the function to initialize all local variables with a value that's likely to produce bizarre calculation results or a protected-mode access violation.
Notable perhaps as well is that managed environments initialize local variables automatically. That isn't done to help the programmer fall into the pit of success, it's done because not initializing them is a security hazard. It lets code that runs in a sandbox access memory that was written by privileged code.
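For contrast, here are the cases in C where the initial value is defined. This is a sketch with invented function names; it deliberately never reads an uninitialized local or a fresh malloc() block, since the result of doing so is not something a program can rely on.

```c
#include <stdlib.h>

static int counter;         /* static storage: zero-initialized at program start */

int read_static(void)
{
    return counter;
}

/* Sums n ints obtained from calloc, which zero-fills the block it returns.
   Returns -1 if the allocation fails. */
long sum_calloc(size_t n)
{
    int *p = calloc(n, sizeof *p);
    if (p == NULL)
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    free(p);
    return total;
}

int read_local(void)
{
    int x = 0;              /* explicit initializer; without it x would be indeterminate */
    return x;
}
```

All three functions return 0 (for sum_calloc, provided the allocation succeeds): static objects and calloc memory start out zeroed, and the local is zero only because the code says so.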

Resources