C: Store a large "virtual" array

I am translating a 32-bit CPU emulator from Python to C.
A 32-bit address space means 4 GB of memory, which is more than many machines can spare. For that reason the Python emulator used a dict: it gave access to the entire address space, but only the small subset actually in use was ever stored.
In C, I would like to preserve access to the whole address space (since a C-based emulator could read or write across the whole address space in a matter of seconds), keep the memory usage manageable (so no 4 GB array), and maintain high performance (the main reason for rewriting the emulator in C).
One solution I have thought of is a paging system, so that only a small part of the array is kept in memory and the rest on disk. How could I implement this (I am new to C), and are there better solutions?

Consider looking into mmap and memory-mapped storage.
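One way to read that suggestion, as a minimal sketch assuming a 64-bit Linux host (the anonymous mapping and names here are mine, not from the question): reserve the whole 4 GiB guest address space up front and let the kernel commit physical pages lazily, only when the emulator first touches them.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define GUEST_SPACE ((size_t)1 << 32)   /* full 32-bit guest address space */

int main(void)
{
    /* MAP_NORESERVE + anonymous mapping: the kernel hands out zeroed
     * physical pages only for the guest pages that actually get touched. */
    uint8_t *mem = mmap(NULL, GUEST_SPACE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    mem[0xDEADBEEF] = 0x42;             /* first touch commits one host page */
    printf("0x%02X\n", mem[0xDEADBEEF]);

    munmap(mem, GUEST_SPACE);
    return EXIT_SUCCESS;
}

Resident memory then grows only with the pages the guest actually writes, which is essentially the dict behaviour from the Python version without the per-access overhead. Backing the mapping with a file instead of MAP_ANONYMOUS would give the on-disk paging described in the question, with the kernel deciding which pages stay resident.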

Related

Partitioned memory for writing a soft CPU for a virtual machine

I'm trying to write a simple soft CPU in C that will work on an imaginary machine for an embedded application. I'm new to this, so bear with me.
I've been trying to do this in an IDE, but I run into an issue: I need to malloc the memory and am not getting a consistent memory address for my registers, so I'm unable to run tests and debug. On an actual piece of hardware, I understand the documentation would give me the addresses of specific registers, main memory, and hard disk memory, correct? I'd like to be able to define macros for my registers that I can then pass around to read/write, but this seems impossible without static memory addresses.
So it seems like I need a good way to allocate a static chunk of memory with static addresses, either in an IDE or on my own machine with a text editor. What would be the best way to do this? For reference, I'm using the Cloud9 IDE but can't figure out how to do it on that platform.
Thanks!
You should do something like uint8_t* const address_space = calloc( memory_size, sizeof(uint8_t) );, check the return value of course, and then make all your machine addresses indices into the array, like address_space[dest] = register[src];. If your emulated CPU can handle data of different sizes or has less strict alignment restrictions than your host CPU, you would need to use memcpy() or pointer casts to transfer data.
Your debugger will understand expressions like address_space[i] whether address_space is statically or dynamically allocated, but you can statically allocate it if you know the exact size in advance, such as to emulate a machine with 16-bit addresses that always has exactly 65,536 bytes of RAM.
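A minimal sketch along those lines; the helper names (mem_read32, mem_write32) and the 64 KiB size are mine, chosen to match the 16-bit example above.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MEM_SIZE 65536u                     /* 16-bit machine: exactly 64 KiB */

static uint8_t *address_space;

static uint32_t mem_read32(uint16_t addr)   /* 32-bit load, unaligned-safe */
{
    uint32_t v;
    memcpy(&v, &address_space[addr], sizeof v);
    return v;
}

static void mem_write32(uint16_t addr, uint32_t v)  /* 32-bit store */
{
    memcpy(&address_space[addr], &v, sizeof v);
}

int main(void)
{
    address_space = calloc(MEM_SIZE, sizeof(uint8_t));
    if (address_space == NULL)
        return EXIT_FAILURE;

    mem_write32(0x0100, 0xCAFEBABEu);       /* "store" from an emulated register */
    uint32_t r0 = mem_read32(0x0100);       /* "load" into an emulated register  */
    (void)r0;

    free(address_space);
    return EXIT_SUCCESS;
}

Using memcpy() for the 32-bit accesses keeps loads and stores well-defined even when the guest address is not aligned for the host, as the answer above notes.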

Fortran: insufficient virtual memory

I - not a professional software engineer - am currently extending a rather large piece of scientific software.
At runtime I get an error stating "insufficient virtual memory".
At that point the program is using about 550 MB of working memory, and the error occurs when a rather big three-dimensional array is dynamically allocated. The array, if it could be allocated, would be about 170 MB in size. Adding this to the 550 MB already in use, the program would still be well below the 2 GB boundary set for 32-bit applications. There is also more than enough working memory available on the system.
Visual Studio is currently configured to allocate arrays on the stack; allocating them on the heap does not make any difference anyway.
Splitting the array into several smaller arrays (adding up to the size of the one big array) makes the program run just fine. So I guess the dynamically allocated memory has to be available as one contiguous block.
So there I am, and I have no clue how to solve this. I cannot deallocate any of the 550 MB already in use, as the data is still required. I also cannot change much of the configuration (e.g. the compiler).
Is there a solution to my problem?
Thank you so much in advance and best regards
phroth248
The virtual memory is the memory your program can address. It is usually the sum of the physical memory and the swap space. For example, if you have 16GB of physical memory and 4GB of swap space, the virtual memory will be 20GB. If your Fortran program tries to allocate more than those 20 addressable GB, you will get an "insufficient virtual memory" error.
To get an idea of the required memory of your 3D array:
allocate (A(nx,ny,nz))
You have nx*ny*nz elements, and each element takes 8 bytes in double precision or 4 bytes in single precision. I'll let you do the math.
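For example, with made-up dimensions nx = ny = nz = 280 in double precision, that is 280*280*280*8 bytes ≈ 176 MB, roughly the 170 MB figure from the question.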
Some things:
1. It is usually preferable to allocate huge arrays using operating system services rather than language facilities. That sidesteps any problems in the underlying runtime library (see the sketch after this list).
2. You may have a problem fitting this alongside 550 MB in a 32-bit process. Usually there is some division of the 4 GB address space into dedicated regions.
3. You need to make sure you have enough virtual memory:
a) Make sure your page file is large enough.
b) Make sure your system is not configured to limit process address space to less than what you need.
c) Make sure your account settings are not limiting your process address space to less than the system allows.
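As a hedged sketch of point 1: since the question mentions Visual Studio, this is roughly what asking the OS directly looks like on Windows. The 170 MB size comes from the question; everything else here is illustrative, not part of the original answer.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T bytes = (SIZE_T)170 * 1024 * 1024;   /* ~170 MB, as in the question */

    /* Reserve and commit the region in one call; this fails cleanly if no
     * contiguous range of that size exists in the process address space. */
    void *block = VirtualAlloc(NULL, bytes, MEM_RESERVE | MEM_COMMIT,
                               PAGE_READWRITE);
    if (block == NULL) {
        fprintf(stderr, "VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    /* ... hand the block to the Fortran code, e.g. via C interoperability ... */

    VirtualFree(block, 0, MEM_RELEASE);
    return 0;
}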

Does using FreeDOS allow my program to access more than 64 K of memory?

I am interested in programming in C on FreeDOS while learning some basic ASM in the process. Will using FreeDOS allow my program to access more than the standard 640K of memory?
And secondly, about the ASM: I know it is hard to program modern processors in assembly due to the complexity of the CPU architecture, but does using FreeDOS limit me to the presumably simpler 16-bit instruction set?
MS-DOS and FreeDOS use the "HIMEM" areas. These are:
Some memory areas above 0xA000:0x0000 that are reserved for expansion cards but are backed by RAM where no card is installed (upper memory blocks).
The memory from 0xFFFF:0x0010 to 0xFFFF:0xFFFF, which is located above 1 MB but can be accessed with 16-bit real-mode code (if the so-called A20 line is active).
The maximum memory that can be gained this way is about 800K.
Using XMS and EMS you can use up to 64M:
XMS will allocate memory blocks above the area that can be accessed via 16-bit real-mode code. There are special functions that copy data from that memory to the low 640K of memory and vice versa.
EMS is similar; however, with EMS it is possible to "map" the high memory to a low address (a feature of 32-bit CPUs), which means that you can access some memory above the 1 MB area as if it were located at an address below 1 MB.
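Purely as an illustration of the copy-in/copy-out pattern those XMS functions imply, here is a hedged, portable stand-in (ordinary C with memcpy in place of the real driver calls; all names and sizes are made up):

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16384u                    /* size of the low-memory window  */

static uint8_t high_mem[64][BLOCK_SIZE];     /* stands in for extended memory  */
static uint8_t window[BLOCK_SIZE];           /* stands in for a low-640K buffer */

/* "Bring block n down" so real-mode code can work on it, then push it back. */
static void fetch_block(unsigned n) { memcpy(window, high_mem[n], BLOCK_SIZE); }
static void store_block(unsigned n) { memcpy(high_mem[n], window, BLOCK_SIZE); }

int main(void)
{
    fetch_block(3);          /* copy high block 3 into the low-memory window */
    window[0] ^= 0xFF;       /* work on it with ordinary code                */
    store_block(3);          /* copy it back up                              */
    return 0;
}

Only one block's worth of data is visible to the program at a time, which is the "small window" behaviour the next answer describes.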
Without an extender, a program can use at most 640 KB of low memory in DOS, and each object is limited to the size of a segment, 64 KB. That means you can have ten large arrays of 64 KB each. Of course you can have multiple arrays in one segment, but their total size must not exceed the segment size. Some compilers also handle addresses spanning multiple segments automatically, so you can use objects larger than 64 KB seamlessly; you can do the same yourself if you're writing in assembly.
To access more memory you need an extender such as EMS or XMS. Note that the address space is still 20 bits wide: the extenders just map high memory regions into some segments of the addressable space, so you only ever see a small window of your data at a time.
Regarding assembly, you can use 32-bit registers in 16-bit mode; the 66h and 67h prefixes change the operand and address size, respectively. That doesn't mean writing 16-bit code is easier, though. It has lots of idiosyncrasies to remember, like the limited register choices in memory addressing. The 32-bit x86 instruction set is a lot cleaner, with saner addressing modes as well as a flat address space that is much easier to use.

Virtual memory management in Fortran under Mac OS X

I'm writing a Fortran 90 program (compiled with gfortran) to run under Mac OS X. I have 13 data arrays, each comprising about 0.6 GB of data. My machine is maxed out at 8 GB of real memory, and if I try to hold all 13 arrays in memory at once I'm essentially trying to use all 8 GB, which I know isn't possible in view of other system demands. So I know the arrays would be subject to swapping. What I DON'T know is how this is managed by the operating system. In particular,
Does the OS swap out entire data structures (e.g., arrays) when it needs to make room for other data structures, or does it rather do it on a page-by-page basis? That is, does it swap out partial arrays, based on which portions of the array have been least-recently accessed?
The answer may determine how I organize the arrays. If partial arrays can get swapped out, then I could store everything in one giant array (with indexing to select which of the 13 subarrays I need) and trust the OS to manage everything efficiently. Otherwise, I might preserve separate and distinct arrays, each one individually fitting comfortably within the available physical memory.
Operating systems are not typically made aware of structures (like arrays) in user memory. Most operating systems I'm aware of, including Mac OS X, swap out memory on a page-by-page basis.
Although the process is often wrongly called swapping, on x86 as well as on many modern architectures the OS performs paging to what is still called the swap device (mostly for historical reasons). The virtual memory space of each process is divided into pages, and a special table, the process page table, holds the mapping between pages in virtual memory and frames in physical memory. Each page can be mapped or not mapped, and mapped pages can be present or not present. Access to an unmapped page results in a segmentation fault. Access to a non-present page results in a page fault, which is then handled by the OS: it takes the page from the swap device and installs it in a frame in physical memory (if one is available). The standard page size is 4 KiB on x86 and almost every other widespread architecture nowadays. Modern MMUs (memory management units, often an integral part of the CPU) also support huge pages (e.g. 2 MiB), which reduce the number of page-table entries and thus leave more memory for user processes.
So paging is really fine-grained compared to your data structures, and one often has loose or no control whatsoever over how the OS does it. Still, most Unices allow you to give instructions and hints to the memory manager using the C API in the <sys/mman.h> header. There are functions that allow you to lock a certain portion of memory and prevent the OS from paging it out to disk, and functions that allow you to hint to the OS that a certain memory access pattern is to be expected, so that it can optimise the way it moves pages in and out. You can combine these with carefully designed data structures to get some control over paging and the best performance out of a given OS.
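A minimal C sketch of those <sys/mman.h> calls (POSIX; the 0.6 GB size mirrors the question, the specific hints are just examples, and on older systems MAP_ANONYMOUS may be spelled MAP_ANON):

#include <stdlib.h>
#include <sys/mman.h>

#define ARRAY_BYTES ((size_t)600 * 1024 * 1024)   /* ~0.6 GB, one of the arrays */

int main(void)
{
    /* Page-aligned anonymous mapping, so madvise()/mlock() can act on it. */
    double *a = mmap(NULL, ARRAY_BYTES, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return EXIT_FAILURE;

    /* Hint mostly-sequential access so the pager can prefetch ahead and
     * drop pages behind the sweep sooner. */
    madvise(a, ARRAY_BYTES, MADV_SEQUENTIAL);

    /* Pin a small, hot region so it is never paged out; mlock() is subject
     * to per-process limits and may fail, so check it in real code. */
    mlock(a, 4096);

    /* ... work on the array ... */

    munlock(a, 4096);
    munmap(a, ARRAY_BYTES);
    return EXIT_SUCCESS;
}

From Fortran the same calls can be reached through ISO_C_BINDING, though whether that is worth the trouble depends on how predictable the access pattern really is.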

How did 16-bit C compilers work?

C's memory model, with its pointer arithmetic and all, seems to assume a flat address space, while 16-bit computers used segmented memory access. How did 16-bit C compilers deal with this and simulate a flat address space from the perspective of the C programmer? For example, roughly what assembly language instructions would the following code compile to on an 8086?
long arr[65536]; // Assume 32 bit longs.
long i;
for(i = 0; i < 65536; i++) {
    arr[i] = i;
}
How did 16-bit C compilers deal with this issue and simulate a flat address space from the perspective of the C programmer?
They didn't. Instead, they made segmentation visible to the C programmer, extending the language by having multiple types of pointers: near, far, and huge. A near pointer was an offset only, while far and huge pointers were a combined segment and offset. There was a compiler option to set the memory model, which determined whether the default pointer type was near or far.
In Windows code, even today, you'll often see typedefs like LPCSTR (for const char*). The "LP" is a holdover from the 16-bit days; it stands for "Long (far) Pointer".
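For flavour, roughly what that looked like in source (hedged: far, huge and MK_FP are non-standard extensions of 16-bit DOS compilers such as Turbo C / Borland C, so this will not build with a modern compiler):

#include <dos.h>                       /* MK_FP() in Borland/Turbo C headers */

char near *np;                         /* 16-bit offset into the default data segment */
char far  *video = (char far *)MK_FP(0xB800, 0x0000);  /* explicit segment:offset */

void demo(void)
{
    video[0] = 'A';                    /* character cell at B800:0000 (text mode) */
    video[1] = 0x1F;                   /* attribute byte: white on blue           */
}

The memory-model switch (Borland's -ms, -ml, -mh and so on, if I remember the flags correctly) only changes which of these flavours pointers default to; the keywords above override it per declaration.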
The C memory model does not in any way imply a flat address space. It never did. In fact, the C language specification is specifically designed to allow non-flat address spaces.
In the most trivial implementation with a segmented address space, the size of the largest contiguous object would be limited by the size of a segment (65,536 bytes on a 16-bit platform). This means that size_t in such an implementation would be 16 bits, and that your code simply would not compile, since you are attempting to declare an object larger than the allowed maximum.
A more complex implementation would support the so-called huge memory model. There's really no problem addressing contiguous memory blocks of any size on a segmented memory model; it just requires some extra effort in pointer arithmetic. So, within the huge memory model, the implementation would make that extra effort, which would make the code a bit slower but would also allow addressing objects of virtually any size. Your code would then compile perfectly fine.
The true 16-bit environments use 16 bit pointers which reach any address. Examples include the PDP-11, 6800 family (6802, 6809, 68HC11), and the 8085. This is a clean and efficient environment, just like a simple 32-bit architecture.
The 80x86 family forced upon us a hybrid 16-bit/20-bit address space in so-called "real mode", the native 8086 addressing scheme. The usual mechanism for dealing with this was to split pointers into two basic types: near (16-bit pointer) and far (32-bit pointer). The defaults for code and data pointers can be set in bulk by a "memory model": tiny, small, compact, medium, large, and huge (some compilers do not support all models).
The tiny memory model is useful for small programs in which the entire space (code + data + stack) is less than 64K. All pointers are (by default) 16 bits or near; a pointer is implicitly associated with a segment value for the whole program.
The small model assumes that data + stack is less than 64K and in the same segment; the code segment contains only code, so can have up to 64K as well, for a maximum memory footprint of 128K. Code pointers are near and implicitly associated with CS (the code segment). Data pointers are also near and associated with DS (the data segment).
The medium model has up to 64K of data + stack (like small), but can have any amount of code. Data pointers are 16 bits and are implicitly tied to the data segment. Code pointers are 32 bit far pointers and have a segment value depending on how the linker has set up the code groups (a yucky bookkeeping hassle).
The compact model is the complement of medium: less than 64K of code, but any amount of data. Data pointers are far and code pointers are near.
In the large and huge models, the default pointer type is a 32-bit far pointer. The main difference is that huge pointers are always automatically normalized, so that incrementing them avoids problems with 64K wrap-arounds.
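A small, hedged sketch in ordinary portable C of the arithmetic that far/huge pointers hide: a 20-bit real-mode address is formed as segment * 16 + offset, and "normalizing" means folding as much of the address as possible into the segment part.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t seg = 0x1234, off = 0x2345;
    uint32_t linear = ((uint32_t)seg << 4) + off;   /* 0x14685: 20-bit address */

    /* The same linear address, re-normalized so the offset is < 16.
     * Huge-pointer arithmetic does this after every increment, which is
     * what avoids the 64K offset wrap-around described above. */
    uint16_t nseg = (uint16_t)(linear >> 4);
    uint16_t noff = (uint16_t)(linear & 0xF);

    printf("%04X:%04X -> linear %05lX -> normalized %04X:%04X\n",
           seg, off, (unsigned long)linear, nseg, noff);
    return 0;
}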
In 16-bit DOS, I don't remember being able to do that. You could have multiple things that were each 64K (bytes), because the segment could be adjusted and the offset zeroed, but I don't remember whether you could cross the boundary with a single array. The flat memory space where you could willy-nilly allocate whatever you wanted and reach as deep as you liked into an array didn't happen until we could compile 32-bit DOS programs (on 386 or 486 processors). Perhaps operating systems and compilers other than Microsoft's and Borland's could generate flat arrays greater than 64 KB. On Win16 I don't remember that freedom until Win32 hit, but perhaps my memory is getting rusty. You were lucky or rich to have a megabyte of memory anyway; a 256 KB or 512 KB machine was not unheard of. Your floppy drive held a fraction of a megabyte to 1.44 MB eventually, and your hard disk, if any, had a dozen or so megabytes, so you just didn't compute things that large very often.
I remember the particular challenge I had learning about DNS, back when you could download the entire DNS database of all registered domain names on the planet; in fact you had to, in order to put up your own DNS server, which was almost required at the time to have a web site. That file was 35 MB, and my hard disk was 100 MB, with DOS and Windows chewing up some of that. I probably had 1 or 2 MB of memory, and might have been able to do 32-bit DOS programs at the time. Part of it was me wanting to parse the ASCII file, which I did in multiple passes, but each pass the output had to go to another file, and I had to delete the prior file to make room on the disk for the next one. A standard motherboard had two disk controllers, one for the hard disk and one for the CD-ROM drive; here again this stuff wasn't cheap, and there were not a lot of spare ISA slots even if you could afford another hard disk and controller card.
There was even the problem of reading 64 KB with C: you passed fread the number of bytes you wanted to read in a 16-bit int, which meant 0 to 65535, not 65536 bytes, and performance dropped dramatically if you didn't read in evenly sized sectors, so you just read 32 KB at a time to maximize performance. 64K reads didn't come until well into the DOS32 days, when you were finally convinced that the value passed to fread was now a 32-bit number and the compiler wasn't going to chop off the upper 16 bits and only use the lower 16 (which happened often enough, depending on the compilers and versions you used). We are currently suffering similar problems in the 32-to-64-bit transition as we did in the 16-to-32-bit transition. What is most interesting is the code from folks like me who learned that int changed size going from 16 to 32 bits but unsigned char and unsigned long did not, so you adapted and rarely used int so that your programs would compile and work for both 16 and 32 bit. (Code from folks of that generation kind of stands out to others who also lived through it and used the same trick.) But the 32-to-64 transition is the other way around, and code not refactored to use uint32 type declarations is suffering.
Reading wallyk's answer that just came in: the huge-pointer thing that wrapped around does ring a bell, as does not always being able to compile for huge. Small was the flat memory model we are comfortable with today, and, as with today, it was easy because you didn't have to worry about segments. So it was desirable to compile for small when you could. You still didn't have a lot of memory or disk or floppy space, so you just didn't normally deal with data that large.
And agreeing with another answer: the segment:offset thing was 8088/8086 Intel. The whole world was not yet dominated by Intel, so there were other platforms that just had a flat memory space, or used other tricks, perhaps in hardware outside the processor, to solve the problem. Because of segment:offset, Intel was able to ride the 16-bit thing longer than it probably should have. Segment:offset allowed some cool and interesting things, but it was as much a pain as anything else. You either simplified your life and lived in a flat memory space, or you constantly worried about segment boundaries.
Really pinning down the address size on old x86s is sort of tricky. You could say it's 16-bit, because the arithmetic you can perform on an address must fit in a 16-bit register. You could also say it's 32-bit, because actual addresses are computed from a 16-bit general-purpose register and a 16-bit segment register (and all 32 bits are significant). You could also just say it's 20-bit, because segment registers are shifted 4 bits left and added to the general-purpose registers for hardware addressing.
It actually doesn't matter much which of these you choose, because they are all roughly equal approximations of the C abstract machine. Some compilers let you pick the memory model per compilation, while others just assume 32-bit addresses and then carefully check that operations that could overflow 16 bits emit instructions that handle that case correctly.
Check out the Wikipedia entry on far pointers. Basically, it's possible to specify a segment and an offset, making it possible to jump into another segment.

Resources