Memory management in Lua - arrays

I am making a level editor for a simple game in Lua, and the tiles are represented by integers in a 2D array. When I read the level description in from a file, it may happen that this 2D array is sparsely populated. How does Lua manage memory? Will it keep those holes in the array, or will it be smart about it and not waste any space?

The question itself is irrelevant in a practical sense. You have one of two cases:
1. Your tilemaps are reasonably small.
2. Your tilemaps are big enough that compressing them matters for fitting within your memory constraints.
If #1 is the case, then you shouldn't care. It doesn't matter how memory efficient Lua is or isn't, because your tilemaps aren't big enough for it to ever matter.
If #2 is the case, then you shouldn't care either. Why? Because if fitting in memory is important to you, and you're likely to run out, then you shouldn't leave it to the vagaries of how Lua happens to manage the memory for arrays.
If memory is important, you should build a specialized data structure that Lua can use, but is written in C. That way, you can have explicit control over memory management; your tilemaps will therefore take up as much or as little memory as you choose for them to.
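For illustration, here is a minimal sketch of that kind of C-backed structure, exposed to Lua as a userdata. It assumes Lua 5.3 or later; the names (Tilemap, tilemap_new, and so on) are made up, and registering the functions as a module plus adding a metatable are left out:

/* Sketch of a C-backed tilemap exposed to Lua as userdata (Lua 5.3+).   */
/* Registration of the functions and a metatable are omitted here.       */
#include <lua.h>
#include <lauxlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int w, h;
    uint16_t tiles[];            /* one 2-byte tile id per cell */
} Tilemap;

static int tilemap_new(lua_State *L) {
    int w = (int)luaL_checkinteger(L, 1);
    int h = (int)luaL_checkinteger(L, 2);
    size_t bytes = sizeof(Tilemap) + (size_t)w * (size_t)h * sizeof(uint16_t);
    Tilemap *tm = (Tilemap *)lua_newuserdata(L, bytes);   /* GC-managed block */
    tm->w = w;
    tm->h = h;
    memset(tm->tiles, 0, (size_t)w * (size_t)h * sizeof(uint16_t));
    return 1;                    /* the userdata itself is the return value */
}

static int tilemap_get(lua_State *L) {
    Tilemap *tm = (Tilemap *)lua_touserdata(L, 1);
    int x = (int)luaL_checkinteger(L, 2);
    int y = (int)luaL_checkinteger(L, 3);
    luaL_argcheck(L, x >= 0 && x < tm->w && y >= 0 && y < tm->h, 2, "out of range");
    lua_pushinteger(L, tm->tiles[(size_t)y * tm->w + x]);
    return 1;
}

static int tilemap_set(lua_State *L) {
    Tilemap *tm = (Tilemap *)lua_touserdata(L, 1);
    int x = (int)luaL_checkinteger(L, 2);
    int y = (int)luaL_checkinteger(L, 3);
    luaL_argcheck(L, x >= 0 && x < tm->w && y >= 0 && y < tm->h, 2, "out of range");
    tm->tiles[(size_t)y * tm->w + x] = (uint16_t)luaL_checkinteger(L, 4);
    return 0;
}

Each tile costs exactly two bytes here, and the whole map is a single garbage-collected block, so you control the memory footprint precisely.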
As for the actual question, it rather depends on how you build your "array". Lua tables are associative arrays by nature, but their implementation is split between an "array part" and a "hash part". In general, if you store elements sparsely, Lua will not reserve memory for the holes: keys outside the dense range end up in the hash part, so only the entries you actually assign take up space (for some definition of "sparse"). As long as you don't do something silly like:
for i = 1, max_table_size do
    my_tilemap[i] = 0
end
Then again, you may want to do exactly that for performance reasons: pre-filling ensures that you get one big array part rather than a sparse hash part. Each array slot is just a small tagged value, so it only takes up around 16 bytes per element, and once you decide to put something real in an entry (an actual tile), you can. Indexing into the array part is fast in this case, though since the hash part is a hash table, sparse access isn't exactly slow either.

Related

Why would you use a LIFO stack over an array?

I was recently in an interview that required me to choose between two data structures for a problem, and it left me with this question:
What is the reasoning for using a stack over an array if the only operations needed are push and pop? An array provides constant time for appending and popping the last element, and it generally takes up less memory than implementing a stack with a linked list. It also provides random access should it be required. Is the only reason that an array is typically of fixed size, so we need to dynamically resize it for each element we push? That is still amortized constant time, though, isn't it, unless the resize penalty is disproportionate?
There are several aspects to consider here...
First, a Stack is an abstract data type. It doesn't define how to implement itself.
An array, by contrast, is (generally) a well-defined concrete data structure, and may even be fixed-size unless explicitly declared to be dynamic.
A dynamic array can be implemented such that it automatically grows by some factor when exhausted, and it might also shrink when the fill rate drops. These operations are not individually constant time, but they amortize to constant time because the array doesn't grow or shrink on every operation. In terms of memory usage, it's hard to imagine an array being more expensive than a linked list unless it is extremely underused.
The main problem with an array is that it needs one large contiguous allocation. That is both a maximum-size limitation and a memory-fragmentation concern. Using a linked list avoids both issues because every entry has a small, independent memory footprint.
In some languages like C++, the underlying container that the 'stack' class uses can actually be changed between a dynamic array (vector), a linked list (list), or even a double-ended queue (deque). I only mention this because it's typically not fair to compare a stack to an array: one is an interface, the other is a data structure.
Most dynamic array implementations will allocate more space than is needed, and upon filling the array they will resize to 2x the current size, and so on. This avoids frequent allocations and keeps push generally constant time. The occasional resize does require copying elements, which is O(n), but this is usually said to amortize to constant time. So in general, you are correct that this is efficient.
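As a rough sketch of that doubling strategy in C (illustrative names, error handling reduced to asserts):

#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *data;
    size_t len, cap;
} Stack;                                   /* initialize with Stack s = {0}; */

static void stack_push(Stack *s, int v) {
    if (s->len == s->cap) {                /* array is full: double its capacity */
        size_t ncap = s->cap ? s->cap * 2 : 8;
        int *p = realloc(s->data, ncap * sizeof *p);
        assert(p != NULL);                 /* real code would handle failure */
        s->data = p;
        s->cap = ncap;
    }
    s->data[s->len++] = v;                 /* the common O(1) path */
}

static int stack_pop(Stack *s) {
    assert(s->len > 0);
    return s->data[--s->len];
}

Each element is copied O(1) times on average across all the resizes, which is where the amortized constant time comes from.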
Linked lists, on the other hand, typically require an allocation for every push, which can be somewhat expensive, and the nodes they create are larger than a single element in the array.
One possible advantage of linked lists, however, is that they do not require contiguous memory. If you have very many elements, it's possible that you will fail to allocate a large enough block of memory for an array. Having said that, linked lists take up more memory overall, so it's a bit of a wash.
In C++, for example, the stack uses the deque container by default. A deque is typically implemented as a dynamic array of fixed-size 'pages' of memory. Because each page is fixed in size, the container can still offer random access, and because the pages are separate, the container does not require one contiguous block of memory, which means it can store very many elements. Growth is also cheap for a deque because it simply allocates another page, making it a great choice for a large stack.
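To make the page idea concrete, here is a rough C sketch of a stack built from fixed-size pages; it is a simplification for illustration, not how any particular standard library implements deque:

#include <assert.h>
#include <stdlib.h>

#define PAGE_ELEMS 1024                       /* fixed page size */

typedef struct {
    int **pages;                              /* small growable table of page pointers */
    size_t npages, len;
} PagedStack;                                 /* initialize with PagedStack s = {0}; */

static void pstack_push(PagedStack *s, int v) {
    if (s->len == s->npages * PAGE_ELEMS) {   /* last page is full: add one page */
        int **t = realloc(s->pages, (s->npages + 1) * sizeof *t);
        assert(t != NULL);
        t[s->npages] = malloc(PAGE_ELEMS * sizeof(int));
        assert(t[s->npages] != NULL);
        s->pages = t;
        s->npages++;
    }
    s->pages[s->len / PAGE_ELEMS][s->len % PAGE_ELEMS] = v;
    s->len++;
}

static int pstack_pop(PagedStack *s) {        /* empty pages are kept for reuse */
    assert(s->len > 0);
    s->len--;
    return s->pages[s->len / PAGE_ELEMS][s->len % PAGE_ELEMS];
}

Only the small table of page pointers ever gets reallocated; the pages themselves never move, and an index still splits cleanly into a page and an offset, preserving random access.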

What is the most suitable alternative for Linked List?

I am working in embedded C on task-related functionality in an OS, and I have implemented a linked list. The implementation now needs to minimize the use of pointers to satisfy MISRA C, so I am searching for the best alternative to a linked list for task operations in an embedded OS.
It'd be easy to use a static array of structures to completely avoid pointers (you'd just use array indexes instead of pointers); a minimal sketch in C follows the two lists below. This has both advantages and disadvantages.
The disadvantages are:
you have to implement your own allocator (to allocate and free "array elements" within the static array)
the memory used for the array can't be used for any other purpose when it's not being used for the linked list
you have to determine a "max. number of elements that could possibly be needed"
it has all the same problems as pointers. E.g. you can access an array element that was freed, free the same array element multiple times, use an index that's out of bounds (including the equivalent of NULL if you decide to do something like use -1 to represent NULL_ELEMENT), etc.
The advantages are:
by implementing your own allocator you can catch mistakes that malloc() silently accepts, e.g. by checking that an element isn't already free when freeing it and returning an error instead of trashing your own metadata
allocation can typically be simpler/faster, because you're only allocating/freeing one "thing" (array element) at a time and don't need to worry about allocating/freeing a variable number of contiguous "things" (bytes) at a time
entries in your list are more likely to be closer (in memory) to each other (unlike for malloc() where your entries are scattered among everything else you allocate), and this can improve performance (cache locality)
you have a "max. number of elements that could possibly be needed" to make it far easier to track down problems like (e.g.) memory leaks; and (where memory is limited) make it easier to determine things like worst case memory footprint
it satisfies pointless requirements (like "no pointers") despite not avoiding anything these requirements are intended to avoid
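Here is the minimal sketch referred to above: a statically allocated pool with a free list threaded through it, using int16_t indexes in place of pointers. All names and sizes (MAX_TASKS, slot_alloc, and so on) are illustrative:

#include <stdint.h>

#define MAX_TASKS 64
#define NIL ((int16_t)-1)                /* plays the role of NULL */

typedef struct {
    uint32_t task_id;                    /* payload */
    int16_t  next;                       /* index of the next element, or NIL */
} Node;

static Node    pool[MAX_TASKS];          /* the static array: no pointers anywhere */
static int16_t free_head = NIL;          /* head of the free list */
static int16_t list_head = NIL;          /* head of the task list */

static void pool_init(void) {
    int16_t i;
    for (i = 0; i < MAX_TASKS; i++) {    /* thread every slot onto the free list */
        pool[i].next = (int16_t)(i + 1);
    }
    pool[MAX_TASKS - 1].next = NIL;
    free_head = 0;
    list_head = NIL;
}

static int16_t slot_alloc(void) {        /* returns an index, or NIL when exhausted */
    int16_t i = free_head;
    if (i != NIL) {
        free_head = pool[i].next;
    }
    return i;
}

static void slot_free(int16_t i) {       /* no double-free check: add one if needed */
    pool[i].next = free_head;
    free_head = i;
}

static void list_push_front(int16_t i) { /* links by index instead of by pointer */
    pool[i].next = list_head;
    list_head = i;
}

Note that slot_free has exactly the double-free and use-after-free hazards listed among the disadvantages; the advantage is that your own allocator is a handful of lines you can instrument and check.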
"Now it needs to minimize the use of pointers to satisfy MISRA C"
I used to work with some embedded engineers. They built low-end (and high-end) routers and gateways. Rather than dynamically allocating memory, they used fixed buffers provisioned at boot. They then tracked indexes into the array of provisioned buffers.
Static arrays and indexes beg for a cursor data structure. Your first search hit will be "Cursor Implementation of Linked Lists" from Data Structures and Algorithm Analysis in C++, 2nd ed., by Mark Weiss. (I actually used that book in college years ago.)

Memory consumption of a concurrent data structure in C

I would like to know how much memory a given data structure is consuming. So suppose I have a concurrent linked list. I would like to know how big the list is. I have a few options: malloc_hooks, which I do not think is thread-safe, and getrusage's ru_maxrss, but I don't really know what that gives me (how much memory the whole process consumed during its execution?). I would like to know if someone has actually measured memory consumption in this way. Is there a tool to do this? How does massif fare?
To get an idea of how many bytes it actually costs to malloc some structure, like a linked list node, make an isolated test case (non-concurrent!) which allocates thousands of them, and look at the delta in the program's memory usage. There are various ways to do that. If your C library has a mallinfo structure, like the GNU C Library found on GNU/Linux systems, you can look at the statistics before and after. Another way is to trace the program's system calls to watch its pattern of allocating from the OS. If, say, we allocate 10,000,000 list nodes and the program performs about 39,000 sbrk() calls, increasing the size of the process by 8192 bytes in each call, then a list node takes up about 32 bytes, overhead and all.
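For example, a sketch of the mallinfo approach on glibc (newer glibc also offers mallinfo2, which does the same job with wider fields); the node layout is just an example:

#include <malloc.h>                      /* glibc-specific: mallinfo() */
#include <stdio.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

int main(void) {
    enum { N = 1000000 };
    struct mallinfo before = mallinfo(); /* heap statistics before allocating */

    struct node *head = NULL;
    for (int i = 0; i < N; i++) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) return 1;
        n->next = head;
        n->value = i;
        head = n;
    }

    struct mallinfo after = mallinfo();  /* ...and after */
    printf("approx bytes per node: %.1f\n",
           (double)(after.uordblks - before.uordblks) / N);
    return 0;
}

On a typical 64-bit glibc this tends to land near the 32 bytes per node suggested by the sbrk() arithmetic above.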
Keep in mind that allocating thousands of objects of the same size in a single thread does not realistically represent the memory usage of a real program, which also includes fragmentation.
If you want to allocate small structures and come close to not wasting a byte (or at least not causing any waste that you don't know about and control), and to control fragmentation, then allocate large arrays of the objects from malloc (or your system allocator of choice) and break them up yourself. There is still unknown overhead in each malloc call, but it is divided over a large number of objects, making it negligible.
Or, generally, write your own allocator whose behavior and overheads you understand in detail, and which itself takes big chunks from the system.
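A crude sketch of that idea: a node pool that grabs one big chunk from malloc and hands out fixed-size nodes from it (illustrative only; it never returns memory to the system and is not thread-safe):

#include <stdlib.h>

#define CHUNK_NODES 4096

struct node {
    struct node *next;
    int value;
};

static struct node *free_list = NULL;

static struct node *pool_alloc(void) {
    if (free_list == NULL) {                         /* refill from one big malloc */
        struct node *chunk = malloc(CHUNK_NODES * sizeof *chunk);
        if (chunk == NULL) return NULL;
        for (size_t i = 0; i + 1 < CHUNK_NODES; i++) /* thread the chunk into a free list */
            chunk[i].next = &chunk[i + 1];
        chunk[CHUNK_NODES - 1].next = NULL;
        free_list = chunk;
    }
    struct node *n = free_list;                      /* pop one node off the free list */
    free_list = n->next;
    return n;
}

static void pool_release(struct node *n) {           /* back onto the free list */
    n->next = free_list;
    free_list = n;
}

The allocator's per-allocation overhead is now paid once per 4096 nodes instead of once per node.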
Conceptually speaking, you need to know the number of items you are working with and the size of each data type used in your data structure. You also have to take into account the size of pointers or anything else in a node that consumes memory.
Then you can come up with a formula that looks like the following:
consumption = N * (sum of the sizes of the data types in one item)
In other words, add together the sizes of the data types in a single item and multiply by the number of items.
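The easiest way to get the per-item size right, padding included, is to let the compiler report it; the node layout here is just an example:

#include <stdio.h>

struct node {
    struct node *next;   /* 8 bytes on a typical 64-bit system */
    int value;           /* 4 bytes, plus 4 bytes of padding */
};

int main(void) {
    size_t n = 1000000;  /* number of items */
    printf("sizeof(struct node) = %zu bytes\n", sizeof(struct node));
    printf("estimated list size = %zu bytes\n", n * sizeof(struct node));
    return 0;
}

Remember that this estimate excludes the per-allocation overhead added by malloc, which the measurement approach above does capture.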

Is it possible to mmap a very big file and use qsort?

I have to sort a large amount of data that cannot fit in memory, and one approach I know of is an "external sort". But I am wondering: is it possible to mmap this large data file and use qsort on it as if it were a normal in-memory array? If that's feasible, what are the differences from an external sort?
If the file will fit in a contiguous mapping in your address space, you can do this. If it won't, you can't.
As to the differences:
if the file just about fits, and then you add some more data, the mmap will fail. A normal external sort won't suddenly stop working because you have a little more data.
if you don't map it with MAP_PRIVATE, sorting will mutate the original file; a normal external sort won't (necessarily).
if you do map it with MAP_PRIVATE, you could crash at any time if the VM doesn't have room to duplicate the whole file. Again, a strictly external sort's memory requirements don't scale linearly with the data size.
tl;dr
It is possible, but it may fail unpredictably and unrecoverably; you almost certainly shouldn't do it.
It should definitely work if the data fits in address space (it almost certainly does on 64-bit machines; it might or might not on 32-bit ones), but performance will depend a lot on the underlying algorithm used by qsort and its data locality properties. One issue to consider is whether it's the number of elements that's huge, or whether each element is large on disk. In the latter case, you'd be better off doing the mmap but allocating a separate array of pointers to the elements, then sorting the pointer array with a comparison function that compares what they point to. This drastically reduces the amount of data moved around in memory, though it takes a little extra work at the end if you want to store the output back to the same file.
Yes, this is possible as long as you have fixed-length records in the file and the file fits within a range of contiguous VM addresses, and in fact this can be considered a naive approach to external sorting. It may not be the fastest algorithm in town, though, since qsort implementations will not be tuned for this use case.
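A sketch of the fixed-length-record case on a POSIX system, here sorting a file of int64_t records in place (MAP_SHARED writes the sorted order back to the file; the record type is just an example):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int cmp_i64(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s file-of-int64-records\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    size_t n = (size_t)st.st_size / sizeof(int64_t);   /* fixed-length records */

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);                 /* sorted result lands in the file */
    if (p == MAP_FAILED) { perror("mmap"); return 1; } /* e.g. not enough address space */

    qsort(p, n, sizeof(int64_t), cmp_i64);             /* treat the mapping as an array */

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}

If the mapping cannot be established, the failure shows up immediately as MAP_FAILED, which is the address-space limitation discussed above.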

Fortran global work array vs. local dynamically allocated arrays

I am working with an older F77 code that has been upgraded to F9X. It still has some older "legacy" code structure and I'm curious on the performance aspect towards adding code in the legacy way or modern way. We have a separate F9x code that we are trying to integrate into this older code and use as many of their procedures as possible instead of rewriting our own versions. Also note, assume that all of these procedures are NOT explicitly interfaced.
Specifically, the old code has one large rank-1 work array that is allocated in the main program and as this array is passed deeper into procedures, it is split apart and used where it is needed. Essentially there is one allocation/deallocation and the only overhead with this array involves finding the starting indices (trivial) of needed temporary arrays and passing these sections of the work array into the procedure.
Our new code generally uses lower level procedures from the old code in which multiple dummy arrays originated from the older code's global work array. Instead of the hassle of creating our own work array, finding starting indices, and passing all these array sections with their starting indices, I could just create dynamically allocated arrays where they are needed. However, these procedures can be called thousands (possibly millions for some lower level routines) of times during the code execution and I am concerned with the overhead of allocating and deallocating each time any of these procedures are used. Also, these temporary arrays could contain many millions of double precision elements.
I've also dabbled with automatic arrays but stopped when I started encountering stack overflow issues and now almost exclusively use dynamic arrays. I've heard different things about the stack and heap with regards to how memory for different kinds of arrays is stored but I really don't know the difference and which is better (performance, efficiency, etc.).
Long story short, are these dynamically allocated (or automatic) arrays going to be significantly less efficient due to overhead issues? I also realize that dynamically allocated arrays are more robust in the life span of the code but what I am really after is performance. A 5% performance gain could mean many hours saved in code execution.
I realize I might not get a definitive answer to this due to differences in compiler optimizations and other factors but I'm curious if anyone might have some knowledge/experience with anything similar. Thanks for your help.
I think that any answers are going to be guesses and speculation. My guess: array creation is going to be a very low CPU load. Unless these subroutines do only a negligible amount of computation, the differing overhead of the different array types won't be noticeable. But the only way to be sure is to try both methods and time them, e.g., with the Fortran intrinsic cpu_time.
Automatic arrays are usually placed on the stack, but some compilers place large automatic arrays on the heap, and some compilers have an option to change this behavior. Allocatable arrays most likely live on the heap.
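The stack/heap distinction is easier to see in a small C analogue (not Fortran, and only an illustration of the trade-off): an automatic array lives on the stack and costs nothing to "allocate" but can overflow it, while a heap allocation goes through the allocator and can fail gracefully:

#include <stdio.h>
#include <stdlib.h>

void work_automatic(int n) {
    double tmp[n];                     /* automatic (VLA): lives on the stack */
    for (int i = 0; i < n; i++) tmp[i] = i;
    printf("%f\n", tmp[n - 1]);
}                                      /* freed implicitly on return, no allocator call */

void work_allocated(int n) {
    double *tmp = malloc((size_t)n * sizeof *tmp);   /* heap: one allocator call */
    if (tmp == NULL) return;           /* a too-large request fails cleanly here */
    for (int i = 0; i < n; i++) tmp[i] = i;
    printf("%f\n", tmp[n - 1]);
    free(tmp);                         /* explicit deallocation: a second allocator call */
}

int main(void) {
    work_automatic(1000);              /* ~8 KB of stack: fine */
    work_allocated(1000);              /* same data on the heap */
    /* work_automatic(10000000) would need ~80 MB of stack and likely crash,  */
    /* while work_allocated(10000000) just asks the allocator and may fail.   */
    return 0;
}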
