Dynamic datastructure array for storing large data

I read an article about Dynamically Sized Arrays on ITJungle and was wondering whether this is a case of "making an easy thing much more complex".
As I understand it, if I define a static variable, including an array, the space it needs is reserved at runtime. So when I define an array of CHAR(10) DIM(10), the whole space is reserved when the program starts.
The article says that if I want a dynamically growing array that resizes itself to fit the data, like a List<String> in C#, I have to create a CHAR(10) DIM(10) and then re-allocate new space only when needed.
Why? The space is already reserved. What reason would someone have to base an array of (let's say) 100 bytes on a pointer when only needing, e.g., 80 bytes?
Am I just missing something? Is the "init-value" for sizing the array only there to calm down the compiler, so that I don't get an error saying the compiler doesn't know the size at compile time?

For normal arrays, you are correct that the space gets allocated at runtime as soon as the particular array's scope is reached (start of the program for globals, start of the subprocedure for locals).
However, you will notice that the data structure is declared with based(pInfo). based is the keyword that causes the memory NOT to be allocated. Instead, the compiler assumes that all the memory for the data structure (including the array member) is already allocated at the location specified by the pointer passed to the based keyword (pInfo in this case).
Effectively, once you use the based keyword you are simply telling the compiler how you would like the memory at the specified pointer to be used but it is up to you to actually manage that memory.
In summary, if I understand your question properly, the statement you made about "knowing the size at compile time" is correct. RPG does not support pointer/array duality or array-like objects like some languages so you essentially just have to declare to RPG that you will NEVER go beyond "init-value" bounds.

Related

Shrink memory of an array of pointers, possible?

I am having difficulty finding a possible solution, so I decided to post my question. I am writing a program in C.
I am generating a huge array containing a lot of pointers to ints. It is allocated dynamically and filled during runtime, so I don't know beforehand which pointers will be added or how many. The problem is that there are just too many of them, so I need to shrink the space somehow.
Is there any package or tool available that could encode my entries or change the representation so that I save space?
Another question: I also thought about writing my information to a file. Is this then kept in memory the whole time, or only if I reopen the file again?

It seems like you are looking for a simple dynamic array (the abstract data type "dynamic array", that is). There are many implementations of this out there. You can simply start with a small dynamic array and push new items to the back, just like you would with a vector in C++ or Java. One implementation is GArray. You will only allocate the memory you need.
If you have to (or want to) do it manually, the usual method is to store the capacity and the size of the array you allocated along with the pointer in a struct, and to call realloc() from within push_back() whenever you need more space. Usually you should increase the size of your array by a factor of 1.3 to 1.4, but a factor of 2 will do if you're not expecting a HUGE array. If you call remove and your size falls below a certain threshold (e.g. capacity/2), you shrink the array again with realloc().
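The manual approach described above could look roughly like the following sketch. The names IntVec, intvec_push_back, intvec_pop_back and intvec_free are made up for illustration and are not from any library. It grows by a factor of 2 and, as a deviation from the capacity/2 threshold mentioned above, only shrinks once the array is less than a quarter full, so that alternating push and remove calls near the boundary do not trigger a realloc() every time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints: pointer, element count, capacity. */
typedef struct {
    int *data;
    size_t size;      /* number of elements in use */
    size_t capacity;  /* number of elements allocated */
} IntVec;

/* Append one value, doubling the capacity when the array is full.
   Returns 0 on success, -1 if realloc() fails (the old data stays valid). */
static int intvec_push_back(IntVec *v, int value)
{
    assert(v != NULL);
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return 0;
}

/* Remove the last value; halve the capacity once less than a quarter is used. */
static void intvec_pop_back(IntVec *v)
{
    assert(v != NULL && v->size > 0);
    v->size--;
    if (v->capacity > 4 && v->size < v->capacity / 4) {
        size_t new_cap = v->capacity / 2;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p != NULL) {          /* on failure just keep the larger block */
            v->data = p;
            v->capacity = new_cap;
        }
    }
}

/* Release the storage and reset the struct to its empty state. */
static void intvec_free(IntVec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->size = v->capacity = 0;
}
```

A vector is started with `IntVec v = {0};`, since realloc() on a null pointer behaves like malloc().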

When is malloc necessary in C?

I think all malloc(sizeof(structure)) can be replaced this way:
char[sizeof(structure)]
Then when is malloc necessary?

When you don't know how many objects of some kind you need (e.g. linked list elements);
when you need to have data structures of size known only at runtime (e.g. strings based on unknown input); this is somewhat mitigated by the introduction of VLAs in C99, but see the next point:
when you know at compile time their size (or you can use VLAs), but it's just too big for the stack (typically a few MBs at most) and it would make no sense to make such thing global (e.g. big vectors to manipulate);
when you need to have an object whose lifetime is different than what automatic variables, which are scope-bound (=>are destroyed when the execution exits from the scope in which they are declared), can have (e.g. data that must be shared between different objects with different lifetimes and deleted when no one uses it anymore).
Notice that it isn't completely impossible to do without dynamic memory allocation (e.g. the whole rockbox project works almost without it), but there are cases in which you actually need to emulate it by using a big static buffer and writing your own allocator.
By the way, in C++ you would normally not use malloc()/free(), but the operators new and delete.
Related: a case in which trying to work without malloc has proven to be a big mess.

You will use malloc to dynamically allocate memory, either because:
you don't know at compile-time how much memory will be required,
you want to be able to reallocate memory later on (for instance using realloc),
you want to be able to discard the allocated memory earlier than by waiting for its release based on the scope of your variable.
I can see your point. You might think you could always use a declarative syntax for all of these, even using variables to declare the size of your memory spaces, but that would:
be non-standard,
give you less control,
possibly use more memory as you will need to do copies instead of re-allocating.
You will probably get to understand this in time, don't worry.
Also, you should try to learn more about the memory model. You don't use the same memory spaces when using a dynamic allocation and when using a static allocation.
For some first pointers, see:
Dynamic Memory Allocation
Static Memory Allocation
Stack vs Heap
Stack vs Heap?
How C Programming Works - Dynamic Data Structures
Friendly advice: I don't know if you develop C on *NIX or Windows, but in any case if you use gcc, I recommend using the following compilation flags when you teach yourself:
-Wall -ansi -pedantic -Wstrict-prototypes

You should read about dynamic memory allocation. You obviously don't know what it is.
The main difference between the two is that memory allocated with malloc() exists until you free it. An automatic (local) array such as char buff[10]; only exists within the scope in which it is declared.
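A minimal sketch of that lifetime difference, using a hypothetical helper named copy_to_heap: the heap block outlives the function that created it, whereas returning the address of a local buffer would leave the caller with a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of `text`, or NULL if allocation fails.
   The copy stays valid after this function returns; the caller must free() it. */
static char *copy_to_heap(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* By contrast, this would be a bug, because buff stops existing on return:
 *
 *     char buff[10];
 *     strcpy(buff, "hello");
 *     return buff;
 */
```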

malloc is a dynamic memory allocator that lets you assign memory to your variables according to your needs, and therefore reduces wasted memory. It is complemented by the realloc() function, through which you can change the amount of memory you obtained earlier through malloc() or calloc(). So, in short, malloc() can be used to manage memory space and use only the memory that is necessary, without wasting it.

You never should do this the way you are proposing. Others already told you about the difference of allocating storage on the heap versus allocation on the function stack. But if and when you are allocating on the stack you should just declare your variable:
structure A = { /* initialize correctly */ };
There is no sense or point in doing that as a (basically) untyped char array. If you also need the address of that beast, well, take its address with &A.

When you don't know how much memory to allocate at compile time. Take a very simple program in which you need to store the numbers entered by the user in a linked list. Here you don't know how many numbers the user will enter. So each time the user enters a number, you create a node for it using malloc and store it in the linked list.
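The linked-list case could be sketched like this. Node, list_push, list_length and list_free are illustrative names, and reading the numbers from the user is left out; each call to list_push stands for one number entered.

```c
#include <assert.h>
#include <stdlib.h>

/* One list element, allocated on demand for each number. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Allocate a node for `value` and put it at the front of the list.
   Returns 0 on success, -1 if malloc() fails (the list is left unchanged). */
static int list_push(Node **head, int value)
{
    assert(head != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/* Count the nodes currently in the list. */
static size_t list_length(const Node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Free every node; the list must not be used afterwards. */
static void list_free(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```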

If you use char[sizeof(structure)] instead of malloc, then I think no dynamic memory allocation is done.

Besides the fact that your char[] method cannot resize or determine the size at runtime, your array might not be properly aligned for the type of structure you want to use it for. This can result in undefined behaviour.
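To illustrate the alignment point, here is a small sketch assuming a C11 compiler. struct Sample and is_aligned_for_sample are hypothetical names. A plain char array is only guaranteed the alignment of char, so the buffer below requests the struct's alignment explicitly with alignas; memory returned by malloc() is already suitably aligned for any object that fits in it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure whose alignment is stricter than that of char. */
struct Sample {
    double d;
    int i;
};

/* A byte buffer that is explicitly aligned for struct Sample (C11 alignas). */
static alignas(struct Sample) unsigned char aligned_buf[sizeof(struct Sample)];

/* malloc() aligns for max_align_t, which must cover struct Sample. */
static_assert(alignof(max_align_t) >= alignof(struct Sample),
              "malloc alignment is sufficient for struct Sample");

/* Returns nonzero if p satisfies the alignment requirement of struct Sample. */
static int is_aligned_for_sample(const void *p)
{
    return (uintptr_t)p % alignof(struct Sample) == 0;
}
```

Even with correct alignment, declaring a real `struct Sample` variable or using malloc() remains the simpler choice.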

Variable Length Array

I would like to know how a variable length array is managed (what extra variables or data structures are kept on the stack in order to have variable length arrays).
Thanks a lot.

It's just a dynamically sized array (implementation-dependent, but most commonly on the stack). It's pretty much like alloca in the old days, with the exception that sizeof will return the actual size of the array, which implies that the size of the array must also be stored somewhere (implementation-dependent as well, but probably on the stack too).

The size of a variable length array is determined at run time instead of at compile time.
The way it's managed depends on the compiler.
GCC, for instance, allocates the memory on the stack. But there is no special structure. It's just a normal array whose size is only known at run time.
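A short sketch of the run-time sizing, assuming a compiler that supports VLAs (they are mandatory in C99 and optional from C11 on). The function name vla_bytes is made up for this example. Note that sizeof applied to a VLA is evaluated when the program runs, not by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of a variable length array of n ints. */
static size_t vla_bytes(size_t n)
{
    assert(n > 0);          /* a VLA must have at least one element */
    int values[n];          /* size chosen at run time */
    values[0] = 0;
    return sizeof values;   /* computed at run time: n * sizeof(int) */
}
```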

Alternatively, you could use a container, e.g. ArrayList in Java or std::vector in C++.

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?

Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
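#include <stddef.h>
#define DECK_SIZE 52
void play(int *deck, size_t n);   /* assumed to be defined elsewhere */
int main(void) {
    int deck[DECK_SIZE];          /* size fixed at compile time */
    play(deck, DECK_SIZE);
    return 0;
}
Example of malloc()
#include <stdlib.h>
void play(int *deck, size_t n);   /* assumed to be defined elsewhere */
int main(void) {
    size_t len = 52;
    int *deck = malloc(len * sizeof *deck);   /* requested at run time */
    if (deck) {
        play(deck, len);
    }
    free(deck);                   /* free(NULL) is a harmless no-op */
    return 0;
}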
In the array version, the space for the deck array was laid out by the compiler when the program was built (though, of course, the memory is only actually occupied while the program is running). In the malloc() version, space for the deck array has to be requested on every run of the program.
Arrays can never change size; malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.

It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.

Arrays have their uses and should be used when you can: static allocation helps make programs more stable, and it is sometimes a necessity because you need to be sure that memory leaks can't happen.
They exist because some requirements demand them.
In a language such as BASIC, the set of allowed commands is fixed and known in advance by the language definition. So what would be the benefit of using malloc to create the arrays and then filling them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so its designers had to ensure that it had the constructs needed for writing operating systems as well as embedded systems.
Array syntax is, for example, a shorthand for working from a pointer to the beginning of a malloc'd block.
But imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to hard-to-find errors.

See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea; I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
It's also important (as others have pointed out) to understand what is part of C and what extends it into various standards (e.g. POSIX).

Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?

Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words: if name above were just a pointer, every instance of that struct would contain an additional pointer, destroying its perfect mapping to an external object (like a directory entry).
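The rule can be checked with a small sketch in today's C. The tag dir_entry and the helper name_decays_to_first_element are made up here; the member layout follows the struct quoted above. The array lives inside the struct with no hidden pointer, and a pointer to its first element only comes into being when the array name is used in an expression.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the directory entry quoted above, with a tag added. */
struct dir_entry {
    int inumber;
    char name[14];
};

/* Returns 1 if using the array name in an expression yields
   a pointer to the first element of the array. */
static int name_decays_to_first_element(struct dir_entry *e)
{
    assert(e != NULL);
    char *p = e->name;            /* array-to-pointer conversion happens here */
    return p == &e->name[0];
}
```

Here `sizeof e.name` is 14, the size of the whole array rather than of a pointer, because sizeof is one of the contexts in which the conversion does not take place.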

Can I increase the size of a statically allocated array?

I know it's possible to increase the size of a dynamically allocated array.
But can I increase the size of a statically allocated array?
If yes, how?
EDIT: Though this question is intended for the C language, consider other languages too. Is it possible in any other language?

Simple answer is no, this cannot be done. Hence the name "static".
Now, lots of languages have things that look like statically allocated arrays but are actually statically allocated references to a dynamically allocated array. Those you could resize.
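In C terms, the distinction might be sketched as follows. The names fixed, table, table_len and table_resize are hypothetical. The first declaration is a true static array whose size is baked in; the second is a static pointer whose target lives on the heap and can therefore be resized.

```c
#include <assert.h>
#include <stdlib.h>

static int fixed[4];          /* true static array: always 4 elements */

static int *table = NULL;     /* static pointer to a heap-allocated array */
static size_t table_len = 0;  /* current number of elements in table */

/* Resize the heap array behind `table`, zero-filling any new elements.
   Returns 0 on success, -1 if realloc() fails (table is left unchanged). */
static int table_resize(size_t new_len)
{
    assert(new_len > 0);
    int *p = realloc(table, new_len * sizeof *p);
    if (p == NULL)
        return -1;
    for (size_t i = table_len; i < new_len; i++)
        p[i] = 0;
    table = p;
    table_len = new_len;
    return 0;
}
```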

In VB .NET it would be:
ReDim Preserve ArrayName(NewSize)
Not sure what language you're after, though...
And I wouldn't use this command a lot; it's terribly inefficient. Linked lists and growing data structures are much more efficient.

No, it is not. There are two options here:
Use a dynamic array,
or, at the risk of wasting memory, if you have an idea of the maximum number of elements the array will store, statically allocate accordingly.
Yes, that was C.

If you're careful, you can use alloca(). The array is allocated on the stack, but in terms of the code style it's a lot like if you used malloc (you don't have to free it though, that's done automatically). I'll let you decide whether to call that a "static" array.

No. Static allocation gives the compiler permission to make all kinds of assumptions which are then baked into the program during compilation.
Among those assumptions are that:
it is safe to put other data immediately after the array (not leaving you room to grow), and
that the array starts at a certain address, which then becomes part of the machine code of the program; you can't allocate a new array somewhere (and use it) because the references to the address can't be updated.
(Well, references could be updated if the program was stored in RAM, but self-modifying programs are highly frowned upon, and surely more trouble than dynamic arrays.)

Technically, in C it isn't even possible to increase the size of a dynamically allocated array.
In fact, realloc() performs a kind of "create new object and copy the data" routine. In terms of the C standard it does not modify the size of an existing heap object at all: the old object is deallocated and a pointer to a new object is returned, even if an implementation happens to reuse the same address.
So the answer is simply that you cannot change the size of any object or array of objects after it has been allocated, whether it was allocated dynamically or statically.
What you can do is use the same strategy: write a function that creates another statically allocated array of objects with the desired size and copies the data into it. If the new array is smaller than the old one, the values that do not fit are discarded.
The only difference is that the size of the new array, like the size of the old one, needs to be fixed at compile time.
