By printing "garbage values" in uninitialized data segment (bss) can we map out all values from previous program - c

I have a weird question, and I am not sure I will be able to explain it, but here we go. While learning and using C you often come across the term "trash" or "garbage" value. My first question is: is that data left over at that memory address from some other program (or from anything else), or is it actually some 'random' value? If it really is a leftover value, why are we still able to read from that memory address? Suppose we just declare int x; and it is now stored in bss at some memory address; if we print its value, we get whatever resides at that address. If all of that is true, couldn't we declare many, many variables without initializing them and so map out all the values previously stored in bss by some earlier program?
I am fairly sure this would be a big security threat, so there is probably some measure against it, but I want to know: what prevents this?

No, the contents of the .bss section are zeroed out before your program starts. This is to satisfy C's guarantee that global and static variables, if not explicitly initialized, will be initialized to zero.
Indeed, on a typical multitasking system, all memory allocated by your process will be zeroed by the operating system before you are given access to it. This is to avoid precisely the security hole you mention.
The values of local (auto) variables, on the stack, do typically contain "garbage" if not initialized, but it would be garbage left over from the execution of your own program up to this point. If your program happens not to have written anything to that particular location on the stack, then it will still contain zero (again on a typical OS); it will never contain memory contents from other programs.
The same goes for memory allocated by malloc. If it is coming straight from the OS, it contains zeros. If it happens to be a block that was previously allocated and freed, it might contain garbage from your previous use of that memory, or from malloc's internal data, but again it will never contain another program's data.

Nothing in the C language itself prevents you from doing almost exactly as you say. The only thing you said that was wrong, considering only the requirements of C standard, was talking about variables "in bss". Objects with static storage duration and no initializer (which is the standardese equivalent of variables in bss) are guaranteed to be initialized to zero at program startup, so you cannot access the data of no-longer-running programs that way. But, in an environment like good old-fashioned MS-DOS or CP/M, there was nothing whatsoever to stop you from setting a pointer to the base of physical RAM, scanning to the end, and finding data from previous programs.
All modern operating systems for full-featured computers, however, provide memory protection which means, among other things, that they guarantee that no process can read another process's memory, whether or not the other process is still running, except via well-defined APIs that enforce security policy. The "Spectre" family of hardware bugs are a big deal just because they break this guarantee.
The details of how memory protection works are too complex to fit into this answer box, but one of the things that's almost always done is that, whenever you allocate more memory from the operating system, that memory is initialized, either to all-bits-zero or to the contents of a file on disk. Either way you can't get at "garbage".

Related

What does uninitialized read mean?

Someone said an uninitialized read is accessing an unwritten but allocated memory space.
Someone else said it is accessing an unallocated memory space. So I am here to double-check the meaning and, BTW, could you briefly explain what "written" and "allocated" mean?
Hard to say without full context, but here are my best guesses --
uninitialized read -- you would say this when a variable or structure is read from memory without a value or default having been written to it. Thus you are reading uninitialized (random) data. If a hacker could write to that memory location, they could cause your system to act unexpectedly.*
TO FIX: make sure all allocated data and structures have default values written to them.
unallocated memory -- this is memory that has not specifically been marked as used by your application. This means any application or system could write to this memory and impact your system (since you are not reading from space that is designated for your application).
TO FIX: make sure you allocate all memory you use using your memory management system of choice.
*It has been pointed out that the system might behave unexpectedly anyway, but the fact that the system could be controlled by an outside agency was my point.
Could you briefly explain what do "written" and "allocated" mean.
“Allocated” means the memory has been designated for a specific use.
When int x; appears inside a function in a C program, memory is automatically allocated for it. (It is automatic in that the compiler arranges for the memory to be reserved for x, so the author of this function does not have to do anything else to get that memory.) Memory can also be allocated in other ways, such as by explicit request, and C has rules for which declarations do or do not reserve memory that can be somewhat complicated.
When memory is automatically allocated in this way, it is not automatically initialized. This means the program has decided a certain part of memory will be used for x but it has not put any value into it. That memory could contain a value left over from prior use, or it could contain zero from when the operating system cleared it before assigning it to the program, or it could contain something else. (Additionally, due to the rules of the C standard and the complexities of modern compilers, memory that is not initialized can cause complications in your program. It may act in ways that are confusing to beginners.)
To ensure the memory has a defined value, you should initialize it. This can be done in the definition, as with int x = 3;, or it can be done later, as with x = 3;.
Setting an object to a value is also called writing to memory, storing to memory, storing to an object, and assigning a value. So, if you have written a value to an object, you have initialized it. (“Initialization” generally refers to the first time a value is written to a new object, but we can also say we are “reinitializing” something when we are resetting its value to a state we consider “earlier” in some sense.)
Someone said an uninitialized read is accessing an unwritten but allocated memory space. Someone else said it is accessing an unallocated memory space.
“Uninitialized read” is a somewhat crude term. Properly, we might say a “read of uninitialized memory,” and that is indeed reading memory that is uninitialized. Even if the memory assigned for a new object, say x, was previously used for something else, we refer to that memory as uninitialized once it has been newly designated for the new object and not yet written to.
“Uninitialized read” does not mean accessing unallocated memory.

What exactly is the value contained in an uninitialized local variable in C?

If we have a function in C with a simple uninitialized int variable in it, we know that this variable may not always be initialized to zero. Instead, it may contain some "garbage" value.
My question is: what exactly could that value represent? Can it be some information left behind (unfreed memory) by a process that was terminated earlier?
If yes, then wouldn't be this an extremely major security breach? Because in that way any process can read information left by processes that used the same address space as the current process (passwords, tokens, etc.).
My assumption is that for each new process, the kernel zeroes the memory allocated for that new process (at least for the stack) and then it loads the executable into memory. Those "garbage" values are actually values generated by the loading procedure of the current process (so that there is no way to access any left data from other processes that used the same address space).
I'm arguing with some fellows on this topic and I really want a clear and comprehensive answer to this (I'm sure there is one). We are assuming that the kernel is debian/centos based. It would be great to know if there are differences in behaviour for different kernels / OS-es.
Thank you respectfully.
This should be separated into two questions:
What does the C standard say about the value of an uninitialized object?
What is in memory when main is called?
The first question is discussed in other Stack Overflow questions and answers. A full answer is complicated and involves a discussion of a variety of circumstances, and this question does not seem to be asking about that particularly, so I will leave it for the other Stack Overflow questions. For this question, suffice it to say that using the value of an uninitialized object is prone to undefined behavior. Further, this is not simply because the memory of the object might have troublesome values but because the C standard permits a C implementation to treat a program that reads an uninitialized value as a misbehaving program in various ways, and optimizations can then disrupt the program further.
As far as what is in memory is concerned (supposing we have a supported way to examine it, perhaps by using assembly language instead of C), then every multiuser system that provides any sort of security erases (or otherwise initializes) memory before making it available to a process. Any values that are in memory at the time main is called are, as the question contemplates, either the result of the loading process or of initialization by the operating system. (Note that the result of the loading process includes both loading of constant data and program text—so we would expect to find the defined values there—and whatever data is leftover from the work done by the loading code—its variables and so on.)
The question asks for a clear answer, so let me be clear about this: An operating system that provides security for user processes must erase data of previous processes from memory before making that memory available to another process. Security cannot be provided by trusting a program not to examine the memory it is given and doing whatever it wants with it.
Rudimentary systems not intended for sharing by untrusted users can of course skip the initialization of memory when creating new processes and allocating memory for them.
Well, local variables are stored in stack space, so once the call to the current routine finishes, the stack pointer moves up to free all of that routine's local variables and, for efficiency reasons, the previous contents are not erased (only the stack pointer is moved).
If you enter a new routine, the compiler moves the stack pointer down (it doesn't push anything onto the local variable space, it just skips over that space to make room for the new set of local variables) and doesn't use that space until a local variable is needed in the code. What you are asking is how to interpret the bit pattern left in the stack segment by previous use, and that depends on how the stack was used before entering the current routine. It can be:
remnants of temporary data used to calculate a complex expression.
parameter data from a previous call to another routine.
return addresses of previously called routines.
local variables of a previously called routine that has ended, so they are no longer in use.
anything else.
As that memory is now used in a different way (as the local space of the current routine dictates), there is no valid interpretation of its contents other than as trashed data from old code.

Global variable seems to not occupy any memory space

I want to understand exactly where the global variables are stored in my program. On the stack? On the heap? Somewhere else?
So for that I wrote this small code:
int global_vector[1000000];
int main () {
global_vector[0] = 1; // just to avoid a compilation warning
while(true); // to give me time to check the amount of RAM used by my program
return 0;
}
No matter how large I make global_vector, the program only uses a really tiny amount of RAM. I do not understand the reason for this. Could someone please explain?
This is completely implementation-dependent, but typically global variables are stored in a special memory segment that is separate from the stack and the heap. This memory could be allocated as a fixed-size buffer inside of the executable itself, or in a segment that is given to the program at startup by the operating system.
The reason that you're not seeing the memory usage go up probably has to do with how virtual memory is handled by the OS. As an optimization, the operating system won't actually give any memory to the program for that giant array unless you actually use it. Try changing your program to for-loop over the entire contents of the array and see if that causes the RAM usage to go up. (It's also possible that the optimizer in your compiler is eliminating the giant array, since it's almost completely unused. Putting a loop to read/write all the values might also force the compiler to keep it).
Hope this helps!
The optimizer is probably removing the array entirely since you never use it.
Global variables that are not given explicit initializers, like yours in this case, are initialized to 0's by default. They are placed into an area of memory called the .bss segment, and no additional data is stored in the object file/executable file indicating the initial value of the data (unlike explicitly initialized data, which has to have its initial value stored somewhere).
When the OS loads the program, it reads in the descriptions of all of the segments and allocates memory for that. Since it knows that the .bss segment is initialized to all 0's, it can do a sneaky trick to avoid having to actually allocate tons of memory and then initialize it to all 0's: it allocates address space for the segment in the process's page table, but all of the pages point to the same page, filled with 0's.
That single zero-page is also set to read-only. Then, if and when the process writes to some data in the .bss segment, a page fault occurs. The OS intercepts the page fault, figures out what's going on, and then actually allocates unique memory for that page of data. It then restarts the instruction, and the code continues on its merry way as if the memory had been allocated all along.
So, the end result is that if you have a zero-initialized global variable or array, each page-sized chunk of data (typically 4 KB) that never gets written to will never actually have memory allocated for it.
Note: I'm being a little fuzzy here with the word "allocated". If you dig into this sort of thing, you're likely to encounter words such as "reserved" and "committed". See this question and this page for more info on those terms in the context of Windows.

Are uninitialized values ever a security risk?

While learning C, I made some mistakes and printed elements of a character array that were uninitialized.
If I expand the size of the array to be quite large, say 1 million elements in size and then print the contents, what comes out is not always user unreadable, but seems to contain some runtime info.
Consider the following code:
#include <stdio.h>
main() {
char s[1000000];
int c, i;
printf("Enter input string:\n");
for (i = 0; ( c = getchar()) != '\n'; i++) {
s[i] = c;
}
printf("Contents of input string:\n");
for (i = 0; i < 999999; i++) {
putchar(s[i]);
}
printf("\n");
return 0;
}
Just scrolling through the output, I find things such as:
???l????????_dyldVersionNumber_dyldVersionString_dyld_all_image_infos_dyld_fatal_error_dyld_shared_cache_ranges_error_string__mh_dylinker_header_stub_binding_helper_dyld_func_lookup_offset_to_dyld_all_image_infos__dyld_start__ZN13dyldbootstrapL30randomizeExecutableLoadAddressEPK12macho_headerPPKcPm__ZN13dyldbootstrap5startEPK12macho_headeriPPKcl__ZN4dyldL17setNewProgramVarsERK11ProgramVars__ZN4dyld17getExecutablePathEv__ZN4dyld22mainExecutablePreboundEv__ZN4dyld14mainExecutableEv__ZN4dyld21findImageByMachHeaderEPK11mach_header__ZN4dyld26findImageContainingAddressEPKv
and also,
Apple Inc.1&0$U ?0?*?H??ot CA0?"0ple Certification Authority10U
?䑩 ??GP??^y?-?6?WLU????Kl??"0?>?P ?A?????f?$kУ????z
?G?[?73??M?i??r?]?_???d5#KY?????P??XPg? ?ˬ,
op??0??C??=?+I(??ε??^??=?:??? ?b??q?GSU?/A????p??LE~LkP?A??tb
?!.t?<
?A?3???0X?Z2?h???es?g^e?I?v?3e?w??-??z0?v0U?0U?0?0U+?iG?v ??k?.#??GM^0U#0?+?iG?v ??k?.#??GM^0?U
0?0? ?H??cd0??0+https://www.apple.com/appleca/0?+0????Reliance on
this certificate by any party assumes acceptance of the then
applicable standard terms and conditions of use, certificate
poli?\6?L-x?팛??w??v?w0O????=G7?#?,Ա?ؾ?s???d?yO4آ>?x?k??}9??S ?8ı??O
01?H??[d?c3w?:,V??!ںsO??6?U٧??2B???q?~?R??B$*??M?^c?K?P????????7?uu!0?0??0
I believe one time my $PATH environment variable was even printed out.
Can the contents of an uninitialized variable ever pose a security risk?
Update 1
Update 2
So it seems clear from the answers that this is indeed a security risk. This surprises me.
Is there no way for a program to declare its memory content protected to allow the OS to restrict any access to it other than the program that initialized that memory?
Most C programs use malloc to allocate memory. A common misunderstanding is that malloc zeros out the memory returned. It actually does not.
As a result, due to the fact that memory chunks are "recycled" it is quite possible to get one with information of "value".
An example of this vulnerability was the tar program on Solaris, which emitted contents of /etc/passwd. The root cause was that the memory allocated by tar to read a block from disk was not initialized, and before getting this memory chunk the tar utility had made an OS system call to read /etc/passwd. Because of the memory recycling and the fact that tar did not initialize the chunk, fragments of /etc/passwd were printed to logs. This was solved by replacing malloc with calloc.
This is an actual example of security implication if you don't explicitly and properly initialize memory.
So yes, do initialize your memory properly.
Update:
Is there no way for a program to declare its memory content protected
to allow the OS to restrict any access to it other than the program
that initialized that memory?
The answer is yes (see in the end) and no.
I think that you view it the wrong way here. The more appropriate question would be for example, why doesn't malloc initialize the memory on request or clears the memory on release but instead recycles it?
The answer is that the designers of the API explicitly decided not to initialize (or clear) memory, as doing this for large blocks of memory 1) would impact performance and 2) is not always necessary (for example, in your application, or in several parts of it, you may not be dealing with data whose exposure you actually care about). So the designers decided not to do it, as it would needlessly impact performance, and to leave the decision to the programmer.
So carrying this also to the OS, why should it be the OS's responsibility to clear the pages? You expect from your OS to hand you memory in a timely manner but security is up to the programmer.
Having said that, there are some mechanisms you can use to make sure that sensitive data is not written to swap, such as mlock on Linux.
mlock() and mlockall() respectively lock part or all of the calling
process's virtual address space into RAM, preventing that memory
from being paged to the swap area. munlock() and munlockall()
perform the converse operation, respectively unlocking part or all
of the calling process's virtual address space, so that pages in the
specified virtual address range may once more to be swapped out if
required by the kernel memory manager. Memory locking and unlocking
are performed in units of whole pages.
Yes, at least on systems where the data may be transmitted to outside users.
There has been a whole series of attacks on webservers (and even iPods) where you get the target to dump the contents of memory from other processes, and so obtain details of the type and version of the OS, the data in other apps, and even things like password tables.
It's quite possible to perform some sensitive work in an area of memory, and not clear that buffer.
A future invocation can then retrieve that uncleared work via a call to malloc() or by checking the heap (via an uninitialised buffer/array declaration). It could inspect it (maliciously) or inadvertently copy it. If you're doing anything sensitive, it thus makes sense to clear that memory before binning it (memset() or similar), and perhaps before using/copying it.
From the C standard:
6.7.8 Initialization
"If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate."
indeterminate value is defined as:
either an unspecified value or a trap representation.
Trap representation is defined as:
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called a trap
representation.
Accessing such a value leads to undefined behaviour and can pose security threats.
The paper "Attacks on uninitialized variables" can give some insight into how they can be used to exploit a system.
If you are concerned about security, the safest way is to always initialize every variable you're going to use. It may even help you find some bugs.
There may be some good reasons for not initializing memory, but in most cases initializing every variable/memory will be a good thing.
Reading uninitialized memory leads to undefined behavior. Bear in mind that what it means to be initialized depends on the invariant of a particular type. For example, it may be required for some pointer to be non-null, some enum to be from a valid range or a certain parameter to be a power of two. Situation complicates further with compound structures. An arbitrary sequence of bytes may not represent a valid object. This is why zeroing memory is not enough. If the expected invariant is broken, some code path relying on it will behave in an undefined manner and may pose a security issue.

Garbage values in a multiprocess operating system

Does allocated memory hold its garbage value from the start of the OS session? Did that value have some significance before we called it a garbage value in our program's runtime session? If so, why?
I need some advice on study materials regarding linux kernel programming, device driver programming and also want to develop an understanding on how the computer devices actually work. I get stuck into the situations like the "garbage value" and feel like I have to study something else also for better understanding of the programming language. I am studying by myself and getting a lot of confusing situations. Any advice will be really helpful.
"Garbage value" is a slang term, meaning "I don't know what value is there, or why, and for that reason I will not use the value". It is "garbage" in the sense of "useless nonsense", and sometimes it is also "garbage" in the sense of "somebody else's leavings".
Formally, uninitialized memory in C takes "indeterminate values". This might be some special value written there by the C implementation, or it might be something "left over" by an earlier user of the same memory. So for examples:
A debug version of the C runtime might fill newly-allocated memory with an eye-catcher value, so that if you see it in the debugger when you were expecting your own stored data, you can reasonably conclude that either you forgot to initialize it or you're looking in the wrong place.
The kernel of a "proper" operating system will overwrite memory when it is first assigned to a process, to avoid one process seeing data that "belongs" to another process and that for security reasons should not leak across process boundaries. Typically it will overwrite it with some known value, like 0.
If you malloc memory, write something in it, then free it and malloc some more memory, you might get the same memory again with its previous contents largely intact. But formally your newly-allocated buffer is still "uninitialized" even though it happens to have the same contents as when you freed it, because formally it's a brand new array of characters that just so happens to have the same address as the old one.
One reason not to use an "indeterminate value" in C is that the standard permits it to be a "trap representation". Some machines notice when you load certain impossible values of certain types into a register, and you'd get a hardware fault. So if the memory was previously used for, say, an int, but then that value is read as a float, who is to say whether the left-over bit pattern represents a so-called "signalling NaN", that would halt the program? The same could happen if you read a value as a pointer and it's mis-aligned for the type. Even integer types are permitted to have "parity bits", meaning that reading garbage values as int could have undefined behavior. In practice, I don't think any implementation actually does have trap representations of int, and I doubt that any will check for mis-aligned pointers if you just read the pointer value -- although they might if you dereference it. But C programmers are nothing if not cautious.
What is garbage value?
When you encounter values at a memory location and cannot conclusively say what these values should be then those values are garbage value for you. i.e: The value is Indeterminate.
Most commonly, when you use a variable and do not initialize it, the variable has an Indeterminate value and is said to possess a garbage value. Note that using an Uninitialized variable leads to an Undefined Behavior, which means the program is not a valid C/C++ program and it may show(literally) any behavior.
Why the particular value exists at that location?
Most operating systems today use the concept of virtual memory. The memory address a user program sees is a virtual memory address and not the physical address. Implementations of virtual memory divide a virtual address space into pages, blocks of contiguous virtual memory addresses. These pages are usually at least 4 kilobytes. Once done with usage, these pages are not explicitly wiped of their contents; they are only marked as free for reuse, and hence they still contain the old contents if not properly initialized.
On a typical OS, your userspace application only sees a range of virtual memory. It is up to the kernel to map this virtual memory to actual, physical memory.
When a process requests a piece of (virtual) memory, it will initially hold whatever is left in it -- it may be a reused piece of memory that another part of the process was using earlier, or it may be memory that a completely different process had been using... or it may never have been touched at all and be in whatever state it was when you powered on the machine.
Usually nobody goes and wipes a memory page with zeros (or any other equally arbitrary value) on your behalf, because there'd be no point. It's entirely up to your application to use the memory in whatever way you please, and if you're going to write to it anyway, then you don't care what was in it before.
Consequently, in C it is simply not allowed to read a variable before you have written to it, under pain of undefined behaviour.
If you declare a variable without initialising it to a particular value, it may contain a value which was previously assigned by a different program that has since released that piece of memory, or it may simply be a random value from when the computer was booted (iirc, PCs used to initialise all RAM to 0 on bootup because early versions of DOS required it, but new computers no longer do this). You can't assume the value will be zero, for instance.
Garbage value, e.g. in C, typically refers to the fact that if you just reserve memory but never initialize it, it will hold random values, since it simply is not initialized yet (C doesn't do that for you automatically; it would just be overhead, and C is designed for as little overhead as possible).
The random values in the memory are leftovers from whatever was in there before.
These previous values are left in there because there is usually not much use in going around setting memory to zero (or any other value) when it will later be overwritten again anyway. In the general case, there is no use in reading uninitialized memory (except if you e.g. want to exploit possible security issues; see the special cases where memory is actually zeroed: Kernel zeroes memory?).

Resources