FILE in C and different output - c

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("loop.txt", "r");
    printf("%p\n", (void *)fp);  /* %p expects a void pointer */
}
Output:
Run 1: 0x101d010
Run 2: 0x13f9010
Run 3: 0xeaf010
Why is the output different every time?

The fopen() function call returns a pointer to a FILE structure that 'describes' the file, in terms of what the operating system needs in order to access that file on disk. That FILE structure will be located somewhere in memory (allocated at run-time); the actual location (address) of that memory block will vary between different runs of the program - which is exactly why you need to keep track of it in your fp (pointer) variable.
All other calls to library functions (such as fwrite(), fread() and fclose()), which access that file, will need that fp variable as a parameter; this indicates to the functions (and to the system) which file object you are working with.
To give an authoritative and detailed explanation about why your program receives a different address in the file pointer, each time you run it, would require equally detailed and authoritative knowledge of your system's implementation of the fopen() call (and related I/O support code) – and that is knowledge that I don't have.
However, here are two possible explanations:
Each time you call fopen(), the system allocates space for the required FILE structure by calling malloc(sizeof(FILE)); this will return the address of the first available chunk of system memory of sufficient size, which will clearly vary between runs, depending on what other programs and/or services are using the system's memory pool.
The I/O subsystem has a fixed, internal table of FILE structures, each with its (fixed) starting address; when you call fopen(), the system assigns the first available table entry to your opened file and the function returns the address of that. But this can also vary between runs, depending on what other programs/services are using entries in that table.
If I had to make a guess (and that's all it would be), the large differences between the addresses you show in your example would tend to make me favour the first possibility. But there are numerous other ways your system could handle the task.
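To see the same effect without files, you can watch malloc() itself hand back different addresses from run to run. This is a small illustration (not from the original post), assuming a system that randomizes the heap's starting address:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Two small allocations; their addresses typically differ between
       runs because the heap's starting location is not fixed. */
    void *p = malloc(64);
    void *q = malloc(64);

    printf("p = %p\n", p);
    printf("q = %p\n", q);

    free(p);
    free(q);
    return 0;
}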

Related

Custom heap/memory allocation ranges

I am writing a 64-bit application in C (with GCC) and NASM under Linux.
Is there a way to specify where I want my heap and stack to be located? Specifically, I want all my malloc'ed data to be anywhere in the range 0x00000000-0x7FFFFFFF. This can be done at either compile time, link time or runtime, via C code or otherwise. It doesn't matter.
If this is not possible, please explain, why.
P.S. For those interested, what the heck I am doing:
The program I am working on is written in C. During runtime it generates NASM code, compiles it and dynamically links it into the already running program. This is needed for extreme optimization, because that code will be run thousands-if-not-billions of times, and it is not known at compile time. The reason I need addresses in 0x00000000-0x7FFFFFFF is that they fit in immediates in assembler code. If I don't need to load the addresses separately, I can just about halve the number of memory accesses needed and increase locality.
For Linux, the standard way of acquiring any Virtual Address range is using the mmap(2) function.
You can specify the starting virtual address and the size. If the address is not already in use and is not reserved by prior calls (or by the kernel), you will get access to that virtual address range.
Whether you got the address you asked for can be checked by comparing the return value to the start address you passed. If the call fails, mmap() returns MAP_FAILED (that is, (void *) -1), not NULL.
In general mmap is used to map virtual addresses to file descriptors, but this mapping has to go through physical pages in RAM, since applications cannot access the disk directly.
Since you do not want any file backing, you can use the MAP_ANONYMOUS flag in the mmap call (also pass -1 as the fd).
This is the excerpt for the related part of the man-page -
MAP_ANONYMOUS
The mapping is not backed by any file; its contents are
initialized to zero. The fd argument is ignored; however,
some implementations require fd to be -1 if MAP_ANONYMOUS (or
MAP_ANON) is specified, and portable applications should
ensure this. The offset argument should be zero. The use of
MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on
Linux only since kernel 2.4.
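As a rough sketch of how this could look for the low-address requirement in the question: on Linux/x86-64, glibc exposes a MAP_32BIT flag that asks the kernel to place the mapping in the first 2 GiB of the address space. Whether that flag is available and suitable depends on your platform, so treat the following as an assumption-laden example rather than a recipe:

#define _GNU_SOURCE          /* needed for MAP_32BIT with glibc */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;    /* 1 MiB */

    /* MAP_ANONYMOUS: no file backing; MAP_32BIT: place the mapping in the
       low 2 GiB so its address fits in a 32-bit immediate (Linux/x86-64). */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);

    if (p == MAP_FAILED) {   /* note: MAP_FAILED, not NULL */
        perror("mmap");
        return 1;
    }
    printf("mapped at %p\n", p);

    munmap(p, len);
    return 0;
}

You would then carve your own allocations out of that region, or pass a hint address as the first argument instead of NULL if you want a specific range.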

Why do all system calls in Linux pass arguments to the kernel using "call by reference"?

If we look at the syscalls.h file in the Linux kernel, we can see that almost all the arguments of the system calls are passed by reference. For example
asmlinkage long sys_open_by_handle_at(int mountdirfd,
                                      struct file_handle __user *handle,
                                      int flags);
Here, file_handle is passed as a pointer. Why is the value not simply passed to the kernel?
Efficiency.
Many (most?) systems implement function calls by pushing argument values onto a stack. If you pass a struct or any other complex data type by value, you'd need to copy it to the stack. There's no reason to do this, since the kernel has access to the entire memory space of the process. Aside from the copy cost, you'd also increase the stack space needed.
In addition, the kernel will need to copy any data it needs to retain into the kernel memory space. The kernel can't rely on user space code behavior. (It's also not going to free anything obtained from user space, which eliminates any concerns over mixing up responsibility for reclaiming memory.)
Finally, realistically, coders working in the kernel need to be very comfortable with working with pointers. There's really no advantage to passing by value once you're completely comfortable with pointers.
This part is a bit more of an opinion, but I think there's also a strong legacy effect. The Unix kernel and C developed somewhat in tandem. See https://en.wikipedia.org/wiki/C_(programming_language) for some of the history. It's been a long time, but if I recall correctly, older versions of C wouldn't allow you to pass a struct by value. Regardless, working with pointers was highly idiomatic in C (and I would say still is). In other words, this is just how things have always been done.
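To make the copying point concrete, here is a heavily simplified, hypothetical kernel-side sketch (not the real fs/fhandle.c code; the struct and function names are made up) showing how a handler might pull a struct passed by reference into kernel memory:

#include <linux/uaccess.h>   /* copy_from_user() */
#include <linux/errno.h>

struct handle_hdr {          /* made-up stand-in for struct file_handle */
    unsigned int handle_bytes;
    int handle_type;
};

long example_handler(struct handle_hdr __user *uhandle)
{
    struct handle_hdr hdr;

    /* The user pointer is only an address in the caller's address space;
       the kernel copies the bytes it needs before trusting or using them. */
    if (copy_from_user(&hdr, uhandle, sizeof(hdr)))
        return -EFAULT;

    /* ... validate hdr.handle_bytes, then copy the variable-length tail
       with a second copy_from_user() into a kernel-allocated buffer ... */
    return 0;
}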
The memory spaces for user mode and kernel mode are different. When you make a system call, the MMU (under the kernel's control) ensures that the user-space process's virtual addresses, which live in the process's own virtual address space, are properly mapped onto physical memory that the kernel can reach.
Variables in user mode live in the process's virtual address space. They can't simply be passed by value in system calls and be expected to get mapped into the physical address space.
This is my understanding; I'd be happy to discuss and clarify if needed.
Principally I understand that the struct file_handle parameter of the function sys_open_by_handle_at() is an "in" parameter, i.e. it is not modified by the function. Therefore it could just as well be passed by value. I see about three reasons why this is not done. All reasons are surely valid for this particular function; at least the last argument (K&R) applies to all struct arguments, in all system calls.
The struct can have a size of e.g. 128 bytes which would be slow to copy to the stack.
Passing a pointer obviates the need to know the struct definition on the caller side. The struct is an "opaque handle" filled by a previous call to [sys_]name_to_handle_at(). The caller doesn't want to and actually shouldn't be burdened with the details of the struct's contents. (Leaving the caller innocent obviates the need to recompile the program because the struct's layout changes. I can also imagine that the contents differs between file system types.)
Unix and even its open source complement Linux are older than C99. I suppose that for the longest time K&R C was the smallest common denominator C standard the kernel sources adhered to. In K&R C it is simply not possible to pass structs by value.

How are function calls resolved?

When a function is called, execution is shifted to a point indicated by the function pointer. At the start of execution, the executable code has to be loaded from disk.
How is the correct function pointer called? The executable code is not mapped into virtual memory at the same location every time, right? So how does the runtime make sure that a call to a function always calls the correct function even if the location of the executable code is different for each execution?
Consider the following code:
void func(void); // func defined in another dynamic library

int main(void)
{
    func();
    // How is the pointer to func known if the file containing func
    // is loaded from disk at run time?
}
The way that function pointers are resolved is really quite simple. When the compiler chain spits out an executable binary, all internal addresses are relative to a "base address." In some executable formats, this base address is specified, in others it is implied.
Basically, the compiler says that it assumes execution will start at address A. The runtime decides that it should actually start at B. The runtime then subtracts A and adds B to all non-relative addresses in the binary before executing it.
This process also applies to things like DLLs. Dynamic libraries store a list of addresses relative to the base pointer that point to each exported function. Names are often also associated with the list, so that you can reference a function by name. When the library is loaded, the address translation is applied to everything, including the address table. At that point, a caller just has to look up the address in the table that was translated, and then they'll have the absolute address of a given function.
In older operating systems, long long ago (and, in some cases, even today), well before things like address space layout randomization, memory pages, and multitasking operating systems, programs would just be copied to the specified base address in memory where it would then be executed.
In modern operating systems, one of a few things can happen, depending on the capabilities or requirements of the platform and application. Most operating systems handle native binaries as I described in the second paragraph, however some applications (such as running 16-bit x86 on later architectures) can involve more complex strategies. One such strategy involves giving the code a static virtual address space. This has various limitations, such as the need for an emulation/compatibility layer if you want it to interact with external code (like a windowed console or the network stack).
As the need for 16-bit support declines though, that sort of scheme is used less and less. Giving all programs their own unique address space (rather than letting it overlap) promotes the use of shared libraries, services, and other shared goodies.
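For the case in the question, where func lives in a dynamic library, you can also resolve the address explicitly at run time. The following is an illustrative sketch using POSIX dlopen()/dlsym(); "libfoo.so" is a placeholder library name, and on Linux you would link with -ldl:

#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *lib = dlopen("libfoo.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* dlsym() looks the name up in the library's (already relocated)
       export table and returns the absolute address for this run. */
    void (*func)(void) = (void (*)(void))dlsym(lib, "func");
    if (!func) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    func();
    dlclose(lib);
    return 0;
}

The implicit case (just calling func() as in the question) works the same way under the hood: the dynamic linker fills in the resolved address, typically via the PLT/GOT, when the program or library is loaded.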
In general, function calls are resolved statically. When you compile a file, a .o (or .obj) object file is created first. The known addresses are those of local functions (defined in that file); the unknown ones are "extern" functions.
Then linking is performed. Linking completes the address mapping for every "extern" function. If any names are missing, a linker error occurs.
How is the correct function pointer called?
A function pointer is a function address, and a function name evaluates to a function address. Both are values, not l-values; &func and func give exactly the same address.
Loading of PE (or ELF) files is the process of loading the executable into memory. That is too much to explain in detail here. Basically, just for clarification, consider this: every function has its own address in the process address space.
You can print func and see whether it has the same address on every execution, like this:
printf("%p\n", (void *)func);
For me it's the same address every time (virtual-memory wise).

Where is the FILE struct allocated?

In C, when opening a file with
FILE *fin;
fin=fopen("file.bin","rb");
I only have a pointer to a structure of FILE. Where is the actual FILE struct allocated on Windows machine? And does it contain all the necessary information for accessing the file?
My aim is to dump the whole data segment to disk and then to reload the dumped file back to the beginning of the data segment. The code that reloads the dumped file is placed in a separate function. This way, the fin pointer is local and is on the stack, thus is not being overwritten on reload. But the FILE struct itself is not local. I take care not to overwrite the memory region of size sizeof(FILE) that starts at the address fin.
The
fread(DataSegStart,1,szTillFin,fin);
fread(dummy,1,sizeof(FILE),fin);
fread(DataSegAfterFin,1,szFinTillEnd,fin);
operations complete successfully, but I get an assertion failure on
fclose(fin)
Do I overwrite some other necessary file data other than in the FILE struct?
The actual instance of the FILE structure exists within the standard library. Typically the standard library allocates some number of FILE structures, which may or may not be a fixed number of them. When you call fopen(), it returns a pointer to one of those structures.
The data within the FILE structure likely contains pointers to other things such as buffers. You're unlikely to be able to save and restore those structures to disk without some really deep integration with your standard library implementation.
You may be interested in something like CryoPID which does process save and restore at a different level.
It seems like you're trying to do something dangerous, unlikely to work.
fopen allocates a FILE structure and initializes it. fclose releases it. How it allocates it and what it puts in it is implementation dependent. It could contain a pointer to another piece of memory, which is also allocated somewhere (since it's buffered I/O, I guess it does allocate a buffer somewhere).
Writing code that relies on the internals of fopen is dangerous, most likely won't work, and surely won't be stable and portable.
Well, you have a pointer to a FILE object, so technically you know where it is but you should be aware that FILE is deliberately an opaque type. You shouldn't need to know what it contains, you just need to know that you can pass it to functions that know about it to perform certain actions. Additionally, FILE may not be a complete type so sizeof(FILE) might not be correct and, additionally, the object might contain pointers to other structures. Simply avoiding overwriting the FILE object is not likely to be sufficient for you to avoid corrupting the program by writing over most of its memory.
FILE is defined in stdio.h. It contains the bookkeeping information about the open file but, looking at the code you show, I think you misunderstand its purpose. It is created by the C library in cooperation with the operating system, which fills the FILE with information about the open file; that information is not contained in the file itself.
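If the goal is just to survive the dump and reload of the data segment, a safer approach than preserving the FILE object is to record what you need to reopen the file and reposition it. A minimal sketch, assuming the file name is known and a byte offset is enough to restore the state:

#include <stdio.h>

int main(void)
{
    FILE *fin = fopen("file.bin", "rb");
    if (!fin)
        return 1;

    /* ... read some data ... */

    long pos = ftell(fin);      /* remember the current offset */
    fclose(fin);                /* let the library release its FILE */

    /* later, after the data segment has been restored: */
    fin = fopen("file.bin", "rb");
    if (!fin)
        return 1;
    fseek(fin, pos, SEEK_SET);  /* continue from the saved offset */

    fclose(fin);
    return 0;
}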

Storing struct array in kernel space, Linux

I believe I may be over-thinking this problem a bit... I've got a text file located on my filesystem which I am parsing at boot and storing the results into an array of structs. I need to copy this array from user space into kernel space (copy_from_user), and must have this data accessible by the kernel at any time. The data in kernel space will need to be accessed by the Sockets.c file. Is there a special place to store an array within kernel space, or can I simply add a reference to the array in Sockets.c? My C is a bit rusty...
Thanks for any advice.
I believe there are two main parts in your problem:
Passing the data from userspace to kernelspace
Storing the data in the kernelspace
For the first issue, I would suggest using a Netlink socket, rather than the more traditional system call (read/write/ioctl) interface. Netlink sockets allow configuration data to be passed to the kernel using a socket-like interface, which is significantly simpler and safer to use.
Your program should perform all the input parsing and validation and then pass the data to the kernel, preferably in a more structured form (e.g. entry-by-entry) than a massive data blob.
Unless you are interested in high throughput (megabytes of data per second), the netlink interface is fine. The following links provide an explanation, as well as an example:
http://en.wikipedia.org/wiki/Netlink
http://www.linuxjournal.com/article/7356
http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO
http://www.kernel.org/doc/Documentation/connector/
As far as the array storage goes, if you plan on storing more than 128KB of data you will have to use vmalloc() to allocate the space, otherwise kmalloc() is preferred. You should read the related chapter of the Linux Device Drivers book:
http://lwn.net/images/pdf/LDD3/ch08.pdf
Please note that buffers allocated with vmalloc() are not suitable for DMA to/from devices, since the memory pages are not contiguous. You might also want to consider a more complex data structure like a list if you do not know how many entries you will have beforehand.
As for accessing the storage globally, you can do it as with any C program:
In a header file included by all .c files that you need to access the data put something like:
extern struct my_struct *unique_name_that_will_not_conflict_with_other_symbols;
The extern keyword indicates that this declares a variable that is defined in another source file. This makes the pointer accessible to all C files that include this header.
Then in a C file, preferably the one with the rest of your code, if one exists:
struct my_struct *unique_name_that_will_not_conflict_with_other_symbols = NULL;
This is the actual definition of the variable declared in the header file.
PS: If you are going to work with the Linux kernel, you really need to brush up on your C. Otherwise you will be in for some very frustrating moments and you WILL end up sorry and sore.
PS2: You will also save a lot of time if you at least skim through the whole Linux Device Drivers book. Despite its name and its relative age, it has a lot of information that is both current and important when writing any code for the Linux Kernel.
You can just define an extern pointer somewhere in the kernel (say, in the sockets.c file where you're going to use it). Initialise it to NULL, and include a declaration for it in some appropriate header file.
In the part of the code that does the copy_from_user(), allocate space for the array using kmalloc() and store the address in the pointer. Copy the data into it. You'll also want a mutex to be locked around access to the array.
The memory allocated by kmalloc() will persist until freed with kfree().
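A minimal sketch of that pattern, assuming made-up names and abbreviated error handling (this is not code from any existing driver):

#include <linux/slab.h>      /* kmalloc(), kfree() */
#include <linux/uaccess.h>   /* copy_from_user() */
#include <linux/mutex.h>

struct my_struct {
    int id;
    char name[32];
};

struct my_struct *my_table;          /* declared extern in a shared header */
static size_t my_table_count;
static DEFINE_MUTEX(my_table_lock);  /* protects my_table */

long load_table(struct my_struct __user *ubuf, size_t count)
{
    struct my_struct *tmp;

    tmp = kmalloc(count * sizeof(*tmp), GFP_KERNEL);
    if (!tmp)
        return -ENOMEM;

    if (copy_from_user(tmp, ubuf, count * sizeof(*tmp))) {
        kfree(tmp);
        return -EFAULT;
    }

    mutex_lock(&my_table_lock);
    kfree(my_table);                 /* drop any previous table; kfree(NULL) is a no-op */
    my_table = tmp;
    my_table_count = count;
    mutex_unlock(&my_table_lock);
    return 0;
}

Any other file (such as sockets.c) that includes the header can then lock the mutex and read my_table.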
Your question is basic and vague enough that I recommend you work through some of the exercises in this book. The whole of chapter 8 is dedicated to allocating kernel memory.
Initializing the array as a global variable in your kernel module will keep it accessible for as long as the kernel is running, i.e. as long as your system is running.

Resources