How do structures affect efficiency? - c

Pointers make a program more efficient and faster. But how do structures affect the efficiency of a program? Do they make it faster? Are they just for readability of the code, or what? And may I have an example of how they do so?

Pointers just point to a memory address; they have nothing to do with efficiency or speed (a pointer is just a variable that stores an address which is required or helpful for some instruction to execute, nothing more).
Data structures, however, affect the efficiency of a program in multiple ways: they can increase or decrease the time complexity and space complexity of your algorithm, and ultimately of the code in which you implement it.
For example, take an array and a linked list:
Array: a block of space allocated sequentially in memory.
Linked list: nodes allocated at arbitrary locations in memory, connected via pointers.
In many cases either can be used (assuming the allocation is not too large), but because an array is a contiguous allocation, retrieval is faster than with the scattered allocation of a linked list (where every access means fetching the address of the next block and then fetching its data).
This is how the choice of data structure improves the speed/efficiency of your code.
There are many such examples that show why data structures are so important (if they were not, why would new algorithms keep being designed, and why would you be learning them?).
A link to refer to:
What are the lesser known but useful data structures?

Structures have little to do with efficiency, they're used for abstraction. They allow you to keep all related data together, and refer to it by a single name.
There are some performance-related features, though. If you have all your data in a structure, you can pass a pointer to that structure as one argument to a function. This is better than passing lots of separate arguments, one for each value that would have been a member of the structure. But this isn't the primary reason we use structures; it's mainly an added benefit.

Pointers do not contribute anything to a program's efficiency or execution speed. A structure provides a way of storing different variables under the same name. These variables can be of different types, and each has a name which is used to select it from the structure. For example, if you want to store data about a student, it may consist of Student_Id, Name, Sex, School, Address, etc., where Student_Id is an int, Name is a string, Sex is a char ('M'/'F'), and so on, but all the variables are grouped together as a single structure, Student, in a single block of memory. So every time you need to fetch or update a student's data, you deal with the structured data only. Now imagine how much trouble you would face if you stored all those int, char, and char[] variables separately and updated them individually, because you would need to update each one at a different memory location for every student's record.
But if you consider how to structure your whole data set using abstract data types (a linked list, tree, graph, array implementation, and so on), then your algorithm plays a vital role in deciding the time and space complexity of the program. In that sense you can make your program more efficient.

When you want to optimize your memory/cache usage, structures can increase the efficiency of your code (make it faster). This is because when data is loaded from memory into the cache, it is loaded in words (32-64 bits); by fitting your data to these word boundaries you can ensure that when your first int is loaded, so is your second, for a two-int structure (perhaps a series of coordinates).

Related

How does Swift manage Arrays internally?

I would like to know how Swift manages arrays internally. Apple's language guide only covers usage, and does not elaborate on internal structure.
As a Java developer I am used to looking at "bare" arrays as a very static and fixed data structure. I know that this is not true in Swift: unlike in Java, you can mutate the length of an array and also perform insert and delete operations. In Java I am used to deciding which data structure to use (plain arrays, ArrayList, LinkedList, etc.) based on the operations I want to perform, and thus optimising my code for better performance.
In conclusion, I would like to know how arrays are implemented in Swift. Are they internally managed as (doubly) linked lists? And is there anything comparable to Java's Collections Framework available for tuning performance?
You can find a lot of information on Array in the comment above it in the Swift standard library. To see this, you can cmd-opt-click Array in a playground, or you could look at it in the unofficial SwiftDoc page.
To paraphrase some of the info from there to answer your questions:
Arrays created in Swift hold their values in a contiguous region of memory. For this reason, you can efficiently pass a Swift array into a C API that requires that kind of structure.
As you mention, an array can grow as you append values to it, and at certain points that means a fresh, larger region of memory is allocated and the previous values are copied into it. It is for this reason that it's stated that operations like append may be O(n): the worst-case time to perform an append operation grows in proportion to the current size of the array (because of the time taken to copy the values over).
However, when the array has to grow its storage, the amount of new storage it allocates each time grows exponentially, which means that reallocations become rarer and rarer as you append, which means the "amortized" time to append over all calls approaches constant time.
Arrays also have a method, reserveCapacity, that allows you to preemptively avoid reallocations on calling append by requesting the array allocate itself some minimum amount of space up front. You can use this if you know ahead of time how many values you plan to hold in the array.
Inserting a new value into the middle of an array is also O(n), because arrays are held in contiguous memory, so inserting a new value involves shuffling subsequent values along to the end. Unlike appending though, this does not improve over multiple calls. This is very different from, say, a linked list where you can insert in O(1) i.e. constant time. But bear in mind the big tradeoff is that arrays are also randomly accessible in constant time, unlike linked lists.
Changes to single values in the array in-place (i.e. assigning via a subscript) should be O(1) (subscript doesn't actually have a documenting comment but this is a pretty safe bet). This means if you create an array, populate it, and then don't append or insert into it, it should behave similarly to a Java array in terms of performance.
There's one caveat to all this – arrays have "value" semantics. This means if you have an array variable a, and you assign it to another array variable b, this is essentially copying the array. Subsequent changes to the values in a will not affect b, and changing b will not affect a. This is unlike "reference" semantics where both a and b point to the same array and any changes made to it via a would be reflected to someone looking at it via b.
However, Swift arrays are actually "Copy-on-Write". That is, when you assign a to b no copying actually takes place. It only happens when one of the two variables is changed ("mutated"). This brings a big performance benefit, but it does mean that if two arrays are referencing the same storage because neither has performed a write since the copy, a change like a subscript assign does have a one-off cost of duplicating the entire array at that point.
For the most part, you shouldn't need to worry about any of this except in rare circumstances (especially when dealing with small-to-modest-size arrays), but if performance is critical to you it's definitely worth familiarizing yourself with all of the documentation in that link.

Data design: better to nest structures or pointers to structures?

Working in plain C, is it better to nest structures inside other structures, or to use pointers to structures? Using pointers makes it easier to get good alignment, but accessing the inner structures then requires an additional dereference. To put this in concrete terms:
typedef struct {
    unsigned int length;
    char* string;
} SVALUE;

typedef struct {
    unsigned int key;
    SVALUE* name;
    SVALUE* surname;
    SVALUE* date_of_birth;
    SVALUE* date_of_death;
    SVALUE* place_of_birth;
    SVALUE* place_of_death;
    SVALUE* floruit;
} AUTHOR;

typedef struct {
    SVALUE media_type;
    SVALUE title;
    AUTHOR author;
} MEDIA;
Here we have some nested structures, in some cases nesting pointer to the internal structure and in others embedding the structure itself.
One issue besides alignment and dereferencing is how memory is allocated. If I do not use pointers, and use pure nested structures, then when the instance of the structure is allocated, the entire nested tree is allocated in one step (and must also be freed in one step). However, if I use pointers, then I have to allocate and free the inner members separately, which means more lines of code but potentially more flexibility because I can, for example, leave members null if the record has no value for that field.
Which approach is preferable?
Nesting structures ensures their spatial locality, since the entire object is actually just a big block of memory even though it is made up of several structures; in memory, the tree is flattened and all members are stored contiguously. This might result in better use of fast memory such as processor caches. If you nest pointers to other structures, this level of indirection might mean the nested data is stored in a far away location, which might prevent such optimizations; by dereferencing the pointer the data would have to be fetched from main memory. Directly nesting data also simplifies access of structure members for purposes such as serialization and transmission.
It also has other implications, such as the impact on the size of your structure and the effects of passing its objects around by value. If you directly nest structures, the sizeof your structure will likely be much bigger than if you had nested pointers. Bigger structures have a larger memory footprint, which can grow noticeably if copies are being made all the time. If the objects are not opaque, they can be allocated on the stack and quickly overflow it. The larger the struct, the more fitting it is for dynamic allocation and indirect access through pointers. I speculate that copying around large amounts of data also carries a cost in speed, but I'm not sure.
Pointers provide additional semantics which may or may not be desirable in your case. They:
Can be NULL, indicating that the data is not available or is possibly optional
Create links between separate structures and allow one structure to exist without the other
Allow two different structures to be allocated differently and to have distinct lifetimes
Allow many different structures to share one possibly big common nested value without wasting memory
Let you point to data which has not even been fully defined yet
Can point to opaque structures, which cannot be instantiated on the stack because the compiler does not know their size
There are too many factors involved in making such decisions. Most of the time it is not a matter of preference. It is a matter of ownership, lifetime and memory management.
Every object "lives" somewhere and is owned by someone or something. Whoever owns an object has control over its lifetime, among other things. Everybody else can only refer to that object through pointers.
When a struct object is directly nested into another struct object, the nested object is owned by the object it is nested into. In your example each MEDIA object owns its media_type, title and author subobjects. They begin their lives together with their owning MEDIA object and they die together with that object.
Meanwhile, at first sight an AUTHOR object does not own its name, surname, and other subobjects; it simply refers to them. The name, surname, and other SVALUE subobjects live somewhere else; they are owned and managed by someone or something else.
At first sight it looks like a strange design. Why doesn't AUTHOR own its name? One possible reason is that we are dealing with a database where many authors have identical names, surnames, etc. In that case, to save memory it might make sense to store these SVALUE objects in some external container (a hash set, for example) which keeps only one copy of each specific SVALUE, while AUTHOR objects simply refer to those SVALUE objects. I.e. all AUTHOR objects with name "John" will refer to the same SVALUE "John".
In such case it is that hash set that owns these SVALUE objects.
But if AUTHOR is actually supposed to own its name, yet a pointer is used just to have an opportunity to leave it null... this does not strike me as a particularly good design, especially considering that SVALUE object already has its own capacity for representing null values. Unless you are looking at significant memory savings from the ability to leave some fields null, it would be a better idea to store name directly in AUTHOR.
Now, if you don't need any sort of cross-referencing between different data structures, then you simply don't need pointers. In other words, if the object is only known to its owner and no one else, then using pointers and allocating sub-objects independently make very little sense. In such cases it makes much more sense to nest structures directly.
On the other hand, some designs might not allow you to nest objects directly. Such designs might declare opaque struct types, which can only be instantiated through an API allocator function returning a pointer. In such designs you are forced to use pointers. But this is not the case in your example, I believe.

Linked list or sequential memory?

I'm not really 100% sure how to describe this, but I will try my best. I am currently working on a project that has a struct (called set) containing a pointer to a set of structs (called objs). To access these structs, one has to iterate through their memory addresses (like an array). The main struct stores the number of structs in its set. This is how I do it:
objs = set->objs;
for (n = 0; n < set->numObjs; n++)
{
    do_something(objs);
    objs++;
}
My question is, would a linked list be safer, faster, or in any way better? How about an array of structs instead?
Thanks.
An array is usually a lot faster to traverse and manipulate element-wise, since all data sits contiguously in memory and will thus use the CPU cache very efficiently. By contrast, a linked list is more or less the worst in terms of cache usage, since every list node may easily end up in an entirely separate part of memory and occupy a whole cache line all by itself.
On the other hand, a linked list is easier to manipulate as a container, since you can insert and remove elements at very little cost, while you cannot really do so with an array at all unless you're willing to move an entire segment of the array around.
Take your pick.
Or better, try both and profile.
This source snippet is somewhat incomplete; however, it appears that objs is a pointer of the same type as set->objs. What you are doing is iterating over a list or array of these objs using pointer arithmetic rather than array indexing. The objs must be stored in sequential memory, or incrementing the pointer would not give you the next obj in the sequence.
The real question is what kinds of operations you want to perform to maintain and change the list. For instance, if the list is basically static and rarely changes, a sequential list should work fine. If the only major operation is adding to the list, a sequential list is probably also fine, provided you know the maximum number of elements and can allocate that much sequential memory.
Where a linked list shines is in the following areas: (1) inserting and/or deleting elements from the list, especially elements that are not at the front or back, and (2) being able to grow without depending on a specific number of elements in the list.
In order to grow a fixed size sequential list, you would typically have to allocate a new region of memory and copy the list to the new memory area.
Another option is a data structure that is basically a set of linked sequential lists. As one sequential list fills up and you need more room, you just allocate another sequential area and link the two. With this approach, however, you may need additional code for managing empty space, depending on whether you need to delete items or keep them in sorted order as you insert new ones.
Here is a Wikipedia article on linked lists.
A linked list would be slower, since you probably won't be using memory caches as efficiently (The list nodes may be on different memory pages, unlike with an array), however using a linked list is probably both easier and safer. I would recommend you only use arrays if you find that the linked list solution is too slow.

Is it bad form to shuffle data instead of pointers to it?

This isn't actually a homework question per se, just a question that keeps nagging me as I do my homework. My book sometimes gives an exercise about rearranging data, and will explicitly say to do it by changing only pointers, not moving the data (for example, in a linked list making use of a "node" struct with a data field and a next/pointer field, only change the next field).
Is it bad form to move data instead? Sometimes it seems to make more sense (either for efficiency or clarity) to move the data from one struct to another instead of changing pointers around, and I guess I'm just wondering if there's a good reason to avoid doing that, or if the textbook is imposing that constraint to more effectively direct my learning.
Thanks for any thoughts. :)
Here are 3 reasons:
Genericness / Maintainability:
If you can get your algorithm to work by modifying pointers only, then it will always work regardless of what kind of data you put in your "node".
If you do it by modifying data, then your algorithm will be married to your data structure, and may not work if you change your data structure.
Efficiency:
Further, you mention efficiency, and you will be hard-pressed to find a more efficient operation than copying a pointer, which is just an integer, typically already the size of a machine word.
Safety:
And further still, the pointer-manipulation route will not cause confusion with other code which has its own pointers to your data, as #caf points out.
It depends. It generally makes sense to move the smaller thing, so if the data being shuffled is larger than a pointer (which is usually the case), then it makes more sense to shuffle pointers rather than data.
In addition, if other code might have retained pointers to the data, then it wouldn't expect the data to be changed from underneath, so this again points towards shuffling pointers rather than data.
Shuffling pointers or indexes is done when copying or moving the actual objects is difficult or inefficient. There's nothing wrong with shuffling the objects themselves if that's more convenient.
In fact by eliminating the pointers you eliminate a whole bunch of potential problems that you get with pointers, such as whether and when and how to delete them.
Moving data takes more time, and depending on its nature, the data may also not tolerate relocation (for example, a structure containing pointers into itself for whatever reason).
If you have pointers, I assume the data exists in dynamic memory...
In other words, it is already there, so why bother copying the data from one place to another, reallocating if necessary?
Usually, the purpose of a list is to gather values from everywhere, from a memory perspective, into one continuous list.
With such a structure, you can rearrange and reorder the list without having to move the data.
You have to understand that moving data implies reading from and writing to memory (not to mention reallocation).
That is resource-consuming, so reordering only the addresses is a lot more efficient!
It depends on the data. If you're just moving around ints or chars, it is no more expensive to shuffle the data than the pointers. However, once you pass a certain size or complexity, you start to lose efficiency quickly. Moving objects by pointer works for any contained data, so getting used to using pointers, even on the toy structs in your assignments, will help you handle those large, complex objects without trouble.
It is especially idiomatic to handle things by pointer when dealing with something like a linked list. The whole point of the linked list is that the Node part can be as large or complex as you like, and the semantics of shuffling, sorting, inserting, or removing nodes all stay the same. This is the key to templated containers in C++ (which I know is not the primary target of this question). C++ also encourages you to consider and limit the number of times you shuffle things by data, because that involves calling a copy constructor on each object each time you move it. This doesn't work well with many C++ idioms, such as RAII, which makes a constructor a rather expensive but very useful operation.

which one to use linked list or static arrays?

I have a structure in C which resembles that of a database table record.
Now when I query the table using select, I do not know how many records I will get.
I want to store all the returned records from the select query in a array of my structure data type.
Which method is best?
Method 1: find array size and allocate
first get the count of records by doing select count(*) from table
allocate a static array
run select * from table and then store each record in my structure in a loop.
Method 2: use single linked list
while (records returned)
{
    create new node
    store the record in node
}
Which implementation is best?
My requirement is that when I have all the records,
I will probably make copies of them or something.
But I do not need random access and I will not be doing any search of a particular record.
Thanks
And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or having a higher level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access, you just need to break your index into two parts.
A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.
I think the best solution is your "2".
Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord> in C++ and List<myrecord> in C#).
You forgot option 3, it's a little more complicated but it might be best for your particular case. This is the way it's typically done in C++ std::vector.
Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.
There are a good many possible critiques that should be made.
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.
Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
This is the way Java and other languages do it. Internally, this is even how Perl's arrays are implemented in C.
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)
while ((record = get_record()) != NULL)
{
    records++;
    records_array = (record_struct *) realloc(records_array, sizeof(record_struct) * records);
    records_array[records - 1] = *record;
}
This is strictly an example — please don't use realloc() in production.
The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.
