Which one to use: linked list or static arrays? - C

I have a structure in C which resembles that of a database table record.
Now when I query the table using select, I do not know how many records I will get.
I want to store all the returned records from the select query in an array of my structure's data type.
Which method is best?
Method 1: find array size and allocate
First get the record count with select count(*) from table,
then allocate an array of that size,
then run select * from table and store each record in my structure in a loop.
Method 2: use single linked list
while (records returned)
{
    create new node
    store the record in node
}
Which implementation is best?
My requirement is that when I have all the records,
I will probably make copies of them or something.
But I do not need random access and I will not be doing any search of a particular record.
Thanks

And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or by keeping a higher-level array that holds pointers to the data arrays. This two-level scheme is great when you need random access: you just break your index into two parts.
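As a rough sketch of that two-level scheme in C (all names and sizes here are mine, purely illustrative):

#include <stdlib.h>

#define CHUNK_SIZE 256  /* records per fixed-size data array (illustrative) */

typedef struct { int id; char name[64]; } record;  /* stand-in for your record struct */

typedef struct {
    record **chunks;  /* higher-level array holding pointers to the data arrays */
    size_t nchunks;   /* data arrays allocated so far */
    size_t count;     /* total records stored */
} chunk_list;

/* Append one record, allocating a fresh data array when the last one is full.
 * Initialize with: chunk_list cl = {0}; */
int chunk_list_push(chunk_list *cl, record r)
{
    if (cl->count == cl->nchunks * CHUNK_SIZE) {
        record **tmp = realloc(cl->chunks, (cl->nchunks + 1) * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        cl->chunks = tmp;
        cl->chunks[cl->nchunks] = malloc(CHUNK_SIZE * sizeof(record));
        if (cl->chunks[cl->nchunks] == NULL)
            return -1;
        cl->nchunks++;
    }
    /* random access works by breaking the index into two parts */
    cl->chunks[cl->count / CHUNK_SIZE][cl->count % CHUNK_SIZE] = r;
    cl->count++;
    return 0;
}

Note that, unlike a grow-and-copy scheme, existing records never move, so pointers into the chunks stay valid as the list grows.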

A problem with 'select count(*)' is that the value might change between calls, so your "real" select may return a different number of rows than the count led you to expect.
I think the best solution is your "2".
Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord> in C++ and List<myrecord> in C#).

You forgot option 3; it's a little more complicated, but it might be best for your particular case. This is the way it's typically done in C++'s std::vector.
Allocate an array of any comfortable size. When that array is filled, allocate a new, larger array of 1.5x to 2x the size of the filled one, then copy the filled array into it. Free the original array and replace it with the new one. Lather, rinse, repeat.
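A minimal sketch of that grow-and-copy cycle (the record type and growth factor are illustrative, not a drop-in library):

#include <stdlib.h>
#include <string.h>

typedef struct { int id; char name[64]; } record;  /* stand-in for your record struct */

typedef struct {
    record *data;
    size_t count;     /* records stored */
    size_t capacity;  /* records allocated */
} record_vec;

/* Append one record, doubling the array whenever it is full.
 * Initialize with: record_vec v = {0}; */
int record_vec_push(record_vec *v, record r)
{
    if (v->count == v->capacity) {
        size_t newcap = v->capacity ? v->capacity * 2 : 16;
        record *tmp = malloc(newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        if (v->count > 0)
            memcpy(tmp, v->data, v->count * sizeof *tmp);  /* copy the filled array */
        free(v->data);                                     /* free the original */
        v->data = tmp;
        v->capacity = newcap;
    }
    v->data[v->count++] = r;
    return 0;
}

Because the capacity grows geometrically, each element is copied only a constant number of times on average, so n appends cost O(n) overall.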

There are a good many possible critiques that should be made.
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMSs put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists), they allow you to specify the sequence (ORDER BY clauses), and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
Beware of quadratic behaviour if you grow your array in fixed-size chunks: each time you reallocate, there's a decent chance the old memory will have to be copied to the new location, so over n elements the total copying works out to O(n²). It will also fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall, but its geometric growth avoids the quadratic behaviour I referred to. Of course, you can (and would) abstract that away behind a set of suitable functions.
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.

Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's better tested than anything you or I could roll from scratch. :)
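To sketch what that might look like in C (a hypothetical interface; every name here is made up for illustration), the header exposes only an opaque handle plus accessors, so callers never depend on the layout:

/* record_list.h - hypothetical interface; callers never see the layout */
#include <stddef.h>

typedef struct record_list record_list;  /* opaque: defined only in record_list.c */

record_list *record_list_new(void);
void         record_list_free(record_list *rl);
int          record_list_append(record_list *rl, const void *rec); /* copies one record in */
size_t       record_list_count(const record_list *rl);
void        *record_list_get(record_list *rl, size_t i);

Whether record_list.c grows a single array, chains fixed-size chunks, or keeps a linked list is then an implementation detail you can change later without touching callers.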

record_struct *record;                   /* assumes get_record() returns a pointer */
record_struct *records_array = NULL;
size_t records = 0;

while ((record = get_record()) != NULL) {
    records++;
    records_array = realloc(records_array, sizeof(record_struct) * records);
    records_array[records - 1] = *record;
}
This is strictly an example; please don't use realloc() like this (growing one element at a time, with no error checking) in production.

The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.
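For completeness, a minimal sketch of the linked-list option itself (the node layout is illustrative):

#include <stdlib.h>

typedef struct { int id; char name[64]; } record;  /* stand-in for your record struct */

typedef struct node {
    record data;
    struct node *next;
} node;

/* Prepend a copy of r; returns the new head, or NULL on allocation failure.
 * Prepending is O(1); keep a tail pointer as well if you need insertion order. */
node *list_push(node *head, record r)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = r;
    n->next = head;
    return n;
}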

Related

C - Ways to free() groups of elements in a hash table?

I'm currently fiddling with a program that's trying to solve a 2D Rubik's cube. The program uses a hash table as a memory of sorts, where it saves different categories of information, and it runs on repeat. From run to run there are certain categories of information I'd like to free/remove instead of freeing the whole table at the end of each run (which is what I'm currently doing).
I've come up with two ways and I'm unsure which to use. Either I make one array/stack for each of the categories, where I save a pointer that I can later free, or I make separate hash tables for all of the different categories and free each one at my discretion.
Are there other options? Somewhere I read about a pointer pool and I'm not sure what that might be. Any ideas or helpful comments would be great!
Do you have more memory or more time? If you use a hash table (including separate ones per category), you have to walk the whole table to check every element, which costs a lot of time. I think the best way is to record each object in a second structure as you create it. To free everything, you then run over that simple array without any checks, and zero the hash table's memory to flush it. You need a little more memory, but it works more efficiently.
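If I read that right, it amounts to keeping a side array of pointers per category, so a whole category can be freed without scanning the table. A rough sketch (all names are mine):

#include <stdlib.h>

/* Records every allocation made for one category of information. */
typedef struct {
    void **ptrs;
    size_t count;
    size_t capacity;
} ptr_pool;

/* Remember an object so it can be bulk-freed later. */
int ptr_pool_add(ptr_pool *p, void *obj)
{
    if (p->count == p->capacity) {
        size_t newcap = p->capacity ? p->capacity * 2 : 64;
        void **tmp = realloc(p->ptrs, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        p->ptrs = tmp;
        p->capacity = newcap;
    }
    p->ptrs[p->count++] = obj;
    return 0;
}

/* Free one whole category in a single pass, with no table scan.
 * The hash table's buckets still hold dangling pointers afterwards,
 * so zero them out (the "memory set zero" step mentioned above). */
void ptr_pool_free_all(ptr_pool *p)
{
    for (size_t i = 0; i < p->count; i++)
        free(p->ptrs[i]);
    p->count = 0;
}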

Efficient methods for storing many similar arrays?

Recently I was faced with having to store many 'versions' of an array in memory (think of an undo system or changes to a file in version-control - but could apply elsewhere too).
In case this isn't clear:
Arrays may be identical, share some data, or none at all.
Elements may be added or removed at any point.
The goal is to avoid storing an entirely new array when there are large sections of the array that are identical.
For the purpose of this question, changes such as adding a number to each value can be ignored (treated as different data).
I've looked into writing my own solution, in principle this can be done fairly simply:
divide the array into small blocks.
if nothing changes, reuse the blocks in each new version of the array.
if only one block changes, make a new block with changed data.
retrieving an array can be done by allocating the memory, then filling it with the data from each block (a rough sketch of this block-sharing scheme follows below).
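For concreteness, here's that block-sharing idea under simplifying assumptions (fixed-length int blocks, reference counting; all names are mine and error handling is omitted):

#include <stdlib.h>
#include <string.h>

#define BLOCK_LEN 64  /* elements per block (illustrative) */

/* A block is treated as immutable once shared; versions share blocks
 * by bumping a reference count instead of copying the data. */
typedef struct {
    int data[BLOCK_LEN];
    int refcount;
} block;

typedef struct {
    block **blocks;  /* one pointer per block of this array version */
    size_t nblocks;
} version;

/* Derive a new version that replaces block i with new data;
 * every other block is shared with the parent version. */
version *version_edit(const version *parent, size_t i, const int *newdata)
{
    version *v = malloc(sizeof *v);          /* malloc checks omitted for brevity */
    v->nblocks = parent->nblocks;
    v->blocks = malloc(v->nblocks * sizeof *v->blocks);
    for (size_t j = 0; j < v->nblocks; j++) {
        if (j == i) {
            block *b = malloc(sizeof *b);
            memcpy(b->data, newdata, sizeof b->data);
            b->refcount = 1;
            v->blocks[j] = b;
        } else {
            v->blocks[j] = parent->blocks[j];
            v->blocks[j]->refcount++;        /* share, don't copy */
        }
    }
    return v;
}

Freeing a version would decrement each block's refcount and release blocks that reach zero; handling length changes and re-ordering is where the duplicate-block search comes in.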
Things become more involved when the array length changes or when the data is re-ordered.
Then it becomes a trade-off for how much time it's worth spending searching for duplicate blocks (in my case I hashed some data at the beginning of each block to help identify candidates to use).
I've got my implementation working (and can link to it if it's useful, though I'd rather avoid discussing my specific code, since it distracts from the general case).
I suspect my own code could be improved (using tried-and-tested memory hashing & searching methods). Possibly I'm not using the right terms, but I wasn't able to find information on this searching online.
So my questions are:
Which methods are most efficient for recognizing and storing arrays that share some contiguous data?
Are there known, working methods which are considered best-practice to solve this problem?
Update: I wrote a small(ish) single-file library and tests, as well as a Python reference version.

How does Swift manage Arrays internally?

I would like to know how Swift manages arrays internally. Apple's language guide only covers usage, but does not elaborate on internal structures.
As a Java developer I am used to looking at "bare" arrays as a very static and fixed data structure. I know that this is not true in Swift. In Swift, unlike in Java, you can mutate the length of an array and also perform insert and delete operations. In Java I am used to deciding what data structure I want to use (simple arrays, ArrayList, LinkedList etc.) based on what operations I want to perform with that structure, and thus optimising my code for better performance.
In conclusion, I would like to know how arrays are implemented in Swift. Are they internally managed as (double) linked lists? And is there anything available comparable to Java's Collection Framework in order to tune for better performance?
You can find a lot of information on Array in the comment above it in the Swift standard library. To see this, you can cmd-opt-click Array in a playground, or you could look at it in the unofficial SwiftDoc page.
To paraphrase some of the info from there to answer your questions:
Arrays created in Swift hold their values in a contiguous region of memory. For this reason, you can efficiently pass a Swift array into a C API that requires that kind of structure.
As you mention, an array can grow as you append values to it, and at certain points that means that a fresh, larger region of memory is allocated and the previous values are copied into it. It is for this reason that it's stated that operations like append may be O(n) – that is, the worst-case time to perform an append operation grows in proportion to the current size of the array (because of the time taken to copy the values over).
However, when the array has to grow its storage, the amount of new storage it allocates each time grows exponentially, which means that reallocations become rarer and rarer as you append, which means the "amortized" time to append over all calls approaches constant time.
Arrays also have a method, reserveCapacity, that allows you to preemptively avoid reallocations on calling append by requesting the array allocate itself some minimum amount of space up front. You can use this if you know ahead of time how many values you plan to hold in the array.
Inserting a new value into the middle of an array is also O(n), because arrays are held in contiguous memory, so inserting a new value involves shuffling subsequent values along to the end. Unlike appending though, this does not improve over multiple calls. This is very different from, say, a linked list where you can insert in O(1) i.e. constant time. But bear in mind the big tradeoff is that arrays are also randomly accessible in constant time, unlike linked lists.
Changes to single values in the array in-place (i.e. assigning via a subscript) should be O(1) (subscript doesn't actually have a documenting comment but this is a pretty safe bet). This means if you create an array, populate it, and then don't append or insert into it, it should behave similarly to a Java array in terms of performance.
There's one caveat to all this – arrays have "value" semantics. This means if you have an array variable a, and you assign it to another array variable b, this is essentially copying the array. Subsequent changes to the values in a will not affect b, and changing b will not affect a. This is unlike "reference" semantics where both a and b point to the same array and any changes made to it via a would be reflected to someone looking at it via b.
However, Swift arrays are actually "Copy-on-Write". That is, when you assign a to b no copying actually takes place. It only happens when one of the two variables is changed ("mutated"). This brings a big performance benefit, but it does mean that if two arrays are referencing the same storage because neither has performed a write since the copy, a change like a subscript assign does have a one-off cost of duplicating the entire array at that point.
For the most part, you shouldn't need to worry about any of this except in rare circumstances (especially when dealing with small-to-modest-size arrays), but if performance is critical to you it's definitely worth familiarizing yourself with all of the documentation in that link.

How do structures affect efficiency?

Pointers make the program more efficient and faster. But how do structures affect the efficiency of the program? Do they make it faster? Are they just for readability of the code, or what? And may I have an example of how they do so?
A pointer just holds a memory address; by itself it has nothing to do with efficiency or speed (it is just a variable that stores some address which is required/helpful for some instruction to execute, nothing more).
But yes, data structures affect the efficiency of a program/code in multiple ways. They can increase or decrease the time complexity and space complexity of your algorithm, and ultimately of the code in which you implement that algorithm.
For example, take arrays and linked lists:
Array: a block of space allocated sequentially in memory.
Linked list: blocks of space allocated anywhere in memory but connected via pointers.
In many cases either can be used (assuming the space demands are not too heavy), but because an array is a contiguous allocation, retrieval is faster than chasing the scattered allocations of a linked list (where every access means getting the address of the next allocated block and then fetching its data).
Choosing the right structure thus improves the speed/efficiency of your code.
There are many such examples that show why data structures matter (if they were not so important, why would new algorithms keep being designed, and why would you be learning them?).
A link to refer to:
What are the lesser known but useful data structures?
Structures have little to do with efficiency; they're used for abstraction. They allow you to keep all related data together and refer to it by a single name.
There are some performance-related features, though. If you have all your data in a structure, you can pass a pointer to that structure as one argument to a function. This is better than passing lots of separate arguments to the function, for each value that would have been a member of the structure. But this isn't the primary reason we use structures, it's mainly an added benefit.
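A trivial sketch of that point (names are illustrative):

typedef struct {
    double x, y;    /* position */
    double vx, vy;  /* velocity */
} particle;

/* One pointer argument instead of four separate value arguments. */
void step(particle *p, double dt)
{
    p->x += p->vx * dt;
    p->y += p->vy * dt;
}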
Pointers do not contribute anything to a program's efficiency or execution time/speed. A structure provides a way of storing different variables under the same name. These variables can be of different types, and each has a name which is used to select it from the structure. For example, if you want to store data about a student, it may consist of Student_Id, Name, Sex, School, Address etc., where Student_Id is an int, Name is a string, Sex is a char (M/F) etc., but all the variables are grouped together as a single structure 'Student' in a single block of memory. So every time you need to fetch or update a student's data, you deal with the structured data only. Now imagine how much trouble you would face if you tried to store all those int, char, and char[] variables separately and update them individually, because you would need to update each field at a different memory location for every student's record.
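For instance, the student record described above might look like this (field sizes are illustrative):

struct Student {
    int  Student_Id;
    char Name[50];
    char Sex;          /* 'M' or 'F' */
    char School[60];
    char Address[100];
};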
But if you use data structures to organize your whole data set as abstract data types, choosing among different kinds of linked lists, trees, graphs, or array implementations, then your algorithm plays a vital role in deciding the time and space complexity of the program. In that sense you can make your program more efficient.
When you want to optimize your memory/cache usage, structures can increase the efficiency of your code (make it faster). This is because data is loaded from memory into the cache in words (32-64 bits); by fitting your data to these word boundaries you can ensure that when your first int is loaded, so is your second, for a two-int structure (maybe a series of coordinates).

Is it bad form to shuffle data instead of pointers to it?

This isn't actually a homework question per se, just a question that keeps nagging me as I do my homework. My book sometimes gives an exercise about rearranging data, and will explicitly say to do it by changing only pointers, not moving the data (for example, in a linked list making use of a "node" struct with a data field and a next/pointer field, only change the next field).
Is it bad form to move data instead? Sometimes it seems to make more sense (either for efficiency or clarity) to move the data from one struct to another instead of changing pointers around, and I guess I'm just wondering if there's a good reason to avoid doing that, or if the textbook is imposing that constraint to more effectively direct my learning.
Thanks for any thoughts. :)
Here are 3 reasons:
Genericness / Maintainability:
If you can get your algorithm to work by modifying pointers only, then it will always work regardless of what kind of data you put in your "node".
If you do it by modifying data, then your algorithm will be married to your data structure, and may not work if you change your data structure.
Efficiency:
Further, you mention efficiency, and you will be hard-pressed to find a more efficient operation than copying a pointer, which is just an integer, typically already the size of a machine word.
Safety:
And further still, the pointer-manipulation route will not cause confusion with other code which has its own pointers to your data, as #caf points out.
It depends. It generally makes sense to move the smaller thing, so if the data being shuffled is larger than a pointer (which is usually the case), then it makes more sense to shuffle pointers rather than data.
In addition, if other code might have retained pointers to the data, then it wouldn't expect the data to be changed from underneath, so this again points towards shuffling pointers rather than data.
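To make both points concrete, here's a small sketch: exchanging pointers moves one machine word each way regardless of record size, and any other code holding pointers to the records still sees the same data afterwards (names are illustrative):

typedef struct { int key; char payload[4096]; } bigrec;

/* Swap two slots in an array of pointers: copies one word each way. */
void swap_ptrs(bigrec **a, bigrec **b)
{
    bigrec *tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Swap the records themselves: copies the whole multi-KB record three times,
 * and anyone holding a pointer to either slot sees its contents change. */
void swap_data(bigrec *a, bigrec *b)
{
    bigrec tmp = *a;
    *a = *b;
    *b = tmp;
}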
Shuffling pointers or indexes is done when copying or moving the actual objects is difficult or inefficient. There's nothing wrong with shuffling the objects themselves if that's more convenient.
In fact by eliminating the pointers you eliminate a whole bunch of potential problems that you get with pointers, such as whether and when and how to delete them.
Moving data takes more time, and depending on the nature of your data, the data itself may not tolerate relocation (for example, a structure containing pointers into itself for whatever reason).
If you have pointers, I assume the data exists in dynamic memory...
In other words, it's already there, so why bother copying the data from one place to another, reallocating if necessary?
Usually, the purpose of a list is to gather values from everywhere, from a memory perspective, into one continuous list.
With such a structure, you can re-arrange and re-order the list without having to move the data.
You have to understand that moving data implies reading from and writing to memory (not even speaking of reallocation).
That's resource-consuming... so re-ordering only the addresses is a lot more efficient!
It depends on the data. If you're just moving around ints or chars, it would be no more expensive to shuffle the data than the pointers. However, once you pass a certain size or complexity, you start to lose efficiency quickly. Moving objects by pointer will work for any contained data, so getting used to using pointers, even on the toy structs that are used in your assignments, will help you handle those large, complex objects without paying that cost.
It is especially idiomatic to handle things by pointer when dealing with something like a linked list. The whole point of the linked list is that the Node part can be as large or complex as you like, and the semantics of shuffling, sorting, inserting, or removing nodes all stay the same. This is the key to templated containers in C++ (which I know is not the primary target of this question). C++ also encourages you to consider and limit the number of times you shuffle things by data, because that involves calling a copy constructor on each object each time you move it. This doesn't work well with many C++ idioms, such as RAII, which makes a constructor a rather expensive but very useful operation.
