A program to control memory for a task in C

I have got a task to solve that is a bit cryptic. The task is to write a program in C that handles text messages. The program should simulate a system with a small amount of memory: the system should only be able to hold X messages of at most X characters each, where every character takes 1 byte (ASCII). To manage the messages I should build a system held entirely in primary memory (to simulate a system with limited memory). When the program starts, it should allocate ONE memory area for all message information.
This is called the metadata structure in the task:
The memory area used for storage should be contiguous in its entirety, but divided into 32-byte data blocks; the number of data blocks in the system is limited to 512.
The task also says that I should create X data blocks, where X depends on how many messages the system is configured to hold.
I believe I need to create a structure like a ring buffer to hold every message (data block?).
This is called the bitmap for data blocks :
To keep track of which data blocks are free and which are busy, I have to implement a bitmap with 1 bit per data block. The bit value is 0 (busy) / 1 (free). This bitmap should be used to find free data blocks when I want to add a message, and it must be kept up to date whenever the system deletes or creates a data block for a message.
The allocated memory for this system should be divided into 3 areas: one for the metadata structure, one for the bitmap over the data blocks, and one for the data blocks themselves.
I need help thinking aloud about solutions and how this can be solved in C.
Thanks

At the beginning of your program, malloc one large block. The returned pointer is where it starts, and since you know how big a block you asked for, you know where it ends.
That's your memory store.
Write an allocator and de-allocator that use the store (and only the store), and call them from the rest of your program instead of calling malloc and free...
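Concretely, a minimal sketch of the one-big-malloc layout might look like this (all the names and the metadata fields are my own guesses at the task, not something it prescribes):

#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 32
#define MAX_BLOCKS 512

/* Hypothetical per-message metadata record. */
typedef struct {
    int in_use;       /* is this message slot occupied? */
    int first_block;  /* index of the message's first data block */
    int length;       /* message length in bytes */
} msg_meta_t;

static uint8_t    *store;  /* the ONE allocated area */
static msg_meta_t *meta;   /* area 1: metadata structure */
static uint8_t    *bitmap; /* area 2: one bit per data block */
static uint8_t    *blocks; /* area 3: the data blocks */

int store_init(int max_messages, int n_blocks)
{
    size_t meta_sz   = max_messages * sizeof(msg_meta_t);
    size_t bitmap_sz = (n_blocks + 7) / 8;          /* round up to whole bytes */
    size_t data_sz   = (size_t)n_blocks * BLOCK_SIZE;

    store = malloc(meta_sz + bitmap_sz + data_sz);  /* one single allocation */
    if (store == NULL)
        return -1;

    meta   = (msg_meta_t *)store;          /* area 1 sits at the front */
    bitmap = store + meta_sz;              /* area 2 follows it */
    blocks = store + meta_sz + bitmap_sz;  /* area 3 is the rest */
    return 0;
}

Plain pointer arithmetic on the single contiguous region gives you the three required areas.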
This task can also be done with a whopping big array, using array offsets as pointer equivalents, but that would be silly in C. I only mention it because I used one constructed that way in Fortran for years, in a major piece of particle physics software called PAW.
Concerning the bitmap:
Your allocator must know at all times which parts of the store are in use and which are not. That's the only way it can reliably give you a currently unused block, right? Maintaining a bitmap is one way to do that.
Why is this good? Imagine that you've been using this memory for a while. Many objects have been allocated in that space and some have been freed. The available space is no longer contiguous, but instead is rather patchy.
Suddenly you need to allocate a big object.
Where do you find a large chunk of contiguous free memory to put it in? Scanning the bitmap will be faster than walking a complicated data structure.
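A minimal sketch of that bookkeeping, using the task's convention of 1 = free and 0 = busy (the helper names are mine):

/* Scan the bitmap for a free data block; returns its index, or -1 if
   every block is busy. */
int find_free_block(const uint8_t *bitmap, int n_blocks)
{
    for (int i = 0; i < n_blocks; i++)
        if (bitmap[i / 8] & (1u << (i % 8)))   /* bit set means free */
            return i;
    return -1;
}

void mark_busy(uint8_t *bitmap, int i) { bitmap[i / 8] &= ~(1u << (i % 8)); }
void mark_free(uint8_t *bitmap, int i) { bitmap[i / 8] |=  (1u << (i % 8)); }

Call mark_busy when a message claims a block and mark_free when the message is deleted, and the bitmap stays up to date.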

Related

How did malloc manage to grab the memory that was already allocated in the SSL heartbeat?

The recent Heartbleed vulnerability is caused by this particular unchecked execution:
buffer = OPENSSL_malloc(1 + 2 + payload + padding);
(according to http://java.dzone.com/articles/everything-you-need-know-about-2)
But how could malloc at any point grab memory that was already dished out somewhere else? The payload and padding variables are filled in with user-supplied values, but it seems to me that these could only cause an out-of-memory error (with a very large value), not a shift in address space that would let the response read the server's RAM outside of this very buffer.
OpenSSL uses its own memory allocator (for speed reasons, so they say). Therefore memory never gets passed back to the operating system. Instead they pool unused buffers and re-use them.
If you call OPENSSL_malloc the chances are almost 100% that the buffer you get contains data previously used by OpenSSL. This could be encrypted data, unencrypted data or even the private encryption keys.
It doesn't. It grabs a small block of memory and then proceeds to copy a much larger amount of data out of it, reading past the end of the malloc'd block into whatever happens to lie after it on the heap -- other malloc'd blocks (both alive and dead) that have been used for other things (other clients or system stuff) -- and copies that raw data to the attacker.
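Reduced to a sketch in plain C (an illustration of the pattern only, not the actual OpenSSL source; build_response and its parameters are hypothetical):

#include <stdlib.h>
#include <string.h>

/* 'pl' points at the received heartbeat payload, 'claimed' is the length
   field the peer sent, 'actual' is how many bytes really arrived. */
unsigned char *build_response(const unsigned char *pl,
                              size_t claimed, size_t actual)
{
    unsigned char *buffer = malloc(1 + 2 + claimed);  /* sized by the claim */
    if (buffer == NULL)
        return NULL;
    buffer[0] = 2;                      /* heartbeat response type */
    buffer[1] = (claimed >> 8) & 0xff;  /* echo back the claimed length */
    buffer[2] = claimed & 0xff;
    /* The bug: nothing compares 'claimed' against 'actual', so this copy
       reads past the end of the received record into whatever heap data
       happens to follow it. */
    memcpy(buffer + 3, pl, claimed);
    (void)actual;  /* the missing bounds check would have used this */
    return buffer;
}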

What is lazy space allocation in the Google File System

I was going through the Google File System (GFS) paper. It mentions that GFS uses lazy space allocation to reduce internal fragmentation.
Can someone explain how lazy space allocation reduces internal fragmentation?
Source: http://research.google.com/archive/gfs-sosp2003.pdf
With lazy space allocation, the physical allocation of space is delayed as long as possible, until data amounting to the chunk size (64 MB in GFS's case, according to the 2003 paper) has accumulated. In other words, the decision process that precedes the allocation of a new chunk on disk is heavily influenced by the size of the data that is to be written. This preference for waiting, instead of allocating more chunks based on some other characteristic, minimizes the chance of internal fragmentation (i.e. unused portions of the 64 MB chunk).
In the Google paper, it also says: "Most chunks are full because most files contain many chunks, only the last of which may be partially filled." So, the same approach is applied to file creation.
It is analogous to this:
http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
I have not read the entire paper, but I hope the following helps in a small way.
The first question I would ask is: what is the effect of having large block sizes in a file system? Let us say the FS block size is 64 MB. The good news is that we write in nice contiguous chunks to hard disks (more data written per seek), there is less metadata to keep in indirect blocks, etc. The bad news is internal fragmentation: if the file is 1 MB but the minimum block size is 64 MB, there is internal fragmentation of 63 MB. So, how do we get the good news and avoid the bad news?
One way is lazy space allocation, also called delayed space allocation. Here we keep the block size small (say 1 MB), but write a big chunk of data, i.e. many 1 MB blocks together, when we write to disk. This way we get the goodness of large block writes. Note that this means we write to an in-core buffer but tell the write() sys call that it is done writing to disk, just like writing to the buffer cache.
NOTE: when the "time" has come to do the real block allocation, we need guaranteed space on disk. So delayed block allocation means space reservation is done at the time of write(), but space allocation is done later, once enough data blocks have accumulated in-core.
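A toy illustration of that idea in C, where a plain FILE stands in for the real on-disk block allocator (all names and sizes are mine):

#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)  /* stand-in for the large allocation unit */

typedef struct {
    unsigned char buf[CHUNK_SIZE];  /* in-core accumulation buffer */
    size_t used;
    FILE *disk;                     /* stand-in for real block allocation */
} lazy_writer_t;

/* Returns immediately, like write() completing against the buffer cache;
   the "real" allocation happens only when a full chunk has accumulated. */
void lazy_write(lazy_writer_t *w, const void *data, size_t n)
{
    const unsigned char *p = data;
    while (n > 0) {
        size_t room = CHUNK_SIZE - w->used;
        size_t take = n < room ? n : room;
        memcpy(w->buf + w->used, p, take);
        w->used += take;
        p += take;
        n -= take;
        if (w->used == CHUNK_SIZE) {                 /* only now write one */
            fwrite(w->buf, 1, CHUNK_SIZE, w->disk);  /* full, contiguous chunk */
            w->used = 0;
        }
    }
}

Because chunks are only written out once full (except possibly the last one at flush time), almost no chunk ends up mostly empty, which is exactly the internal fragmentation being avoided.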
Data is first written into a buffer. So instead of allocating space the moment the file is created, the system waits until an actual write occurs, as in XFS's delayed allocation: http://en.wikipedia.org/wiki/XFS#Delayed_allocation
You don't have to fix the file size at creation time, and you can keep appending to grow the file.

Memory allocation issues with the LPC1788 microcontroller

I'm fairly new to programming microcontrollers; I've been working with the LPC1788 for a couple of weeks now.
One problem I've been having recently is that I'm running out of memory much sooner than I expect to. I've checked how much memory seems to be available by testing how large a block of contiguous memory I can malloc, and the result is 972 bytes. Allocation is done starting at address 0x10000000 (the start of the on-chip SRAM, which should be around 64 kB on this board).
The program I'm working on at the moment is meant to act as a simple debugger that utilises the LCD and allows messages to be printed to it. I have one string that new messages are constantly appended to, and the whole string is then printed on the LCD. When the text extends past the bottom of the screen, the program deletes the oldest messages (the ones nearer the top) until it fits. However, I can only add about 7 additional messages before it refuses to allocate more memory. If needed, the main.c for the project is hosted at http://pastebin.com/bwUdpnD3
Earlier I also started work on a project that uses the ThreadX RTOS to create and execute several threads. When I tried including use of the LCD in that program, I found memory to be very limited there as well. The LCD seems to store all pixel data starting from the SDRAM base address, but I'm not sure if that's the same thing as the SRAM I'm using.
What I need is a way to allocate memory enough to allow several threads to function or large strings to be stored, while being able to utilise the LCD. One possibility might be to use buffers or other areas of memory, but I'm not quite sure how to do that. Any help would be appreciated.
tl;dr: Quickly running out of allocatable memory on SRAM when trying to print out large strings on the LCD.
EDIT 1: A memory leak was noticed with the variable currMessage. I think that's been fixed now:
strcpy(&trimMessage[1], &currMessage[trimIndex+1]);
// Frees up the memory allocated to currMessage from the last iteration
// before assigning new memory.
free(currMessage);
currMessage = malloc((msgSize - trimIndex) * sizeof(char));
for (int i = 0; i < msgSize - trimIndex; i++)
{
    currMessage[i] = trimMessage[i];
}
EDIT 2: Implemented memory leak fixes. Program works a lot better now, and I feel pretty stupid.
You need to be careful when choosing to use dynamic memory allocation in an embedded environment, especially with constrained memory. You can very easily end up fragmenting the memory space such that the biggest hole left is 972 bytes.
If you must allocate from the heap, do it once, and then hang onto the memory--almost like a static buffer. If possible, use a static buffer and avoid the allocation all together. If you must have dynamic allocation, keeping it to fixed sized blocks will help with the fragmentation.
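For instance, a fixed-size block pool carved from a static buffer (a minimal sketch; the sizes are placeholders) has nothing to fragment, because every free block can satisfy every request:

#include <stddef.h>

#define POOL_BLOCKS     16
#define POOL_BLOCK_SIZE 64

static unsigned char pool[POOL_BLOCKS][POOL_BLOCK_SIZE];  /* static, no heap */
static unsigned char pool_used[POOL_BLOCKS];

void *pool_alloc(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            return pool[i];  /* all blocks are interchangeable */
        }
    }
    return NULL;             /* pool exhausted */
}

void pool_free(void *p)
{
    for (int i = 0; i < POOL_BLOCKS; i++)
        if (p == (void *)pool[i])
            pool_used[i] = 0;
}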
Unfortunately, it does take a bit of engineering effort to overcome the fragmentation issue. It is worth the effort, and it does make the system much more robust though.
As for SRAM vs SDRAM, they aren't the same. I'm not familiar with ThreadX, or whether it has a Board Support Package (BSP) for your board, but in general SDRAM has to be set up: the boot code has to initialize the memory controller, set up the timings, and then enable that space. Depending on your heap implementation, you either need to add the region dynamically or, more likely, compile with your heap space pointing to where it will ultimately live (in SDRAM space). Then you have to make sure the memory controller is configured and activated before actually using the heap.
One other thing to watch out for: you may actually be running code from the SRAM space, and some of that space is also reserved for processor exception tables. That whole space may not be available, and it may be visible at two different addresses (0x00000000 and 0x10000000, for instance). I know this is common on some other ARM9 processors: you boot from flash, which is mapped initially into the 0x00000000 space, and then you do a song and dance to copy the booter into SRAM and map SRAM into that space. At that point you can boot into something like Linux, which expects to be able to update the tables that live down at 0.
BTW, it does look like you have some memory leaks in the code you posted. Namely, currMessage is never freed... only overwritten with a new pointer. Those blocks are then lost forever.

CUDA shared memory not faster than global?

Hi, I have a kernel function where I need to compare bytes. The area I want to search is divided into blocks, so an array of 4k bytes is divided into 4k/256 = 16 blocks. Each thread in a block reads the array at idx and compares it with another array holding what I want to search for. I've done this in two ways:
1. Compare the data in global memory, but often threads in a block need to read the same address.
2. Copy the data from global memory to shared memory, and compare the bytes in shared memory in the same way as above. Still the same problem with reads of the same address.
Copy to shared memory looks like this:
myArray[idx] = global[someIndex-idx];
whatToSearch[idx] = global[someIndex+idx];
The rest of the code is the same; only the operations on the data in version 2 are performed on the shared arrays.
But the first option is about 10% faster than the one with shared memory. Why? Thank you for any explanations.
If you are only using the data once and there is no data reuse between different threads in a block, then using shared memory will actually be slower. The reason is that when you copy data from global memory to shared, it still counts as a global transaction. Reads are faster when you read from shared memory, but it doesn't matter because you already had to read the memory once from global, and the second step of reading from shared memory is just an extra step that doesn't provide anything of value.
So, the key point is that using shared memory is only useful when you need to access the same data more than once (whether from the same thread, or from different threads in the same block).
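For contrast, a sketch of a case where shared memory does pay off: the search pattern is read from global memory once per block and then reused by every thread (names and sizes are illustrative, and it assumes the haystack is padded so idx + i stays in bounds):

__global__ void search(const unsigned char *haystack,
                       const unsigned char *needle,
                       int needleLen, int *hits)
{
    __shared__ unsigned char sNeedle[256];  /* assumes blockDim.x == 256 */

    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    /* One global read per thread; afterwards all 256 threads reuse the
       same shared copy instead of re-reading global memory. */
    if (threadIdx.x < needleLen)
        sNeedle[threadIdx.x] = needle[threadIdx.x];
    __syncthreads();

    int match = 1;
    for (int i = 0; i < needleLen; i++)             /* needleLen reads per thread, */
        match &= (haystack[idx + i] == sNeedle[i]); /* all served from shared */
    if (match)
        atomicAdd(hits, 1);
}

Here each byte of needle crosses the global-memory bus once per block instead of needleLen times per thread.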
You are using shared memory to save on accesses to global memory, but each thread is still making two accesses to global memory, so it won't be faster. The speed drop is probably because the threads that access the same location in global memory within a block try to read it into the same location in shared memory, and this needs to be serialized.
I'm not sure of exactly what you are doing from the code you posted, but you should ensure that the number of times global is read from and written to, aggregated across all the threads in a block, is significantly lower when you use shared memory. Otherwise you won't see a performance improvement.

malloc and obtaining recently freed memory

I am allocating an array and freeing it on every callback of an audio thread. The main user thread (a web browser) is constantly allocating and deallocating memory based on user input. I am sending the uninitialized float array to the audio card (there's an example on the page linked from my profile). The idea is to hear program state changes.
When I call malloc(sizeof(float)*256*13) or smaller, I get an array filled with a wide range of floats with a seemingly random distribution. It is not right to call it random; presumably it comes from whatever the memory block previously held. This is the behavior I expected and want to exploit. However, when I do malloc(sizeof(float)*256*14) or larger, I get an array filled only with zeros. I would like to know why this cliff exists and whether there's something I can do to get around it. I know it is undefined behavior per the standard, but I'm hoping someone who knows the implementation of malloc on some system might have an explanation.
Does this mean malloc is also memsetting the block to zero for larger sizes? That would be surprising, since it wouldn't be efficient. Even if there were more zeroed chunks of memory, I'd expect something nonzero to show up sometimes, since the arrays are constantly changing.
If possible I would like to be able to obtain chunks of memory that are reallocated over recently freed memory, so any alternatives would be welcomed.
I guess this is a strange question for some, because my goal is to explore undefined behavior and to use bad programming practices deliberately, but this is the application I am interested in making, so please bear with the usage of uninitialized arrays. I know the behavior of such usage is undefined, so please don't tell me not to do it. I'm developing on Mac OS X 10.5.
Most likely, the larger allocations result in the heap manager directly requesting pages of virtual address space from the kernel. Freeing will return that address space back to the kernel. The kernel must zero all pages that are allocated for a process - this is to prevent data leaking from one process to another.
Smaller allocations are handled by the user-mode heap manager within the process by taking these larger page allocations from the kernel, carving them up into smaller blocks, and reusing blocks on subsequent allocations. These do not need to be zero-initialized, since the memory contents always comes from your own process.
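A quick experiment along those lines (the result is implementation-specific, but it is a way to observe the cliff on your own system):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void probe(size_t n)
{
    float *p = malloc(n * sizeof(float));
    if (p == NULL) return;
    memset(p, 0xAB, n * sizeof(float));  /* scribble on the block */
    free(p);

    /* A same-sized request may or may not get the recycled block back. */
    unsigned char *q = malloc(n * sizeof(float));
    if (q == NULL) return;
    printf("%zu floats: first byte = 0x%02X\n", n, q[0]);
    /* 0xAB suggests a recycled user-mode block; 0x00 suggests fresh,
       kernel-zeroed pages. */
    free(q);
}

int main(void)
{
    probe(256 * 13);  /* below the questioner's observed cliff */
    probe(256 * 14);  /* above it */
    return 0;
}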
What you'll probably find is that the previous, smaller requests could be filled by joining smaller free blocks together. But when you request the bigger size, the existing free memory can't cover it, which trips the allocator's built-in threshold for requesting fresh pages directly from the OS.
