Distinguishing Data Types and Data structures - c

Well, somehow even after reading in a lot of textbooks (really a lot) and on the Internet for a long while I still can’t completely comprehend what the difference between the two mentioned things is.
To simplify the question, according to let’s say Wikipedia a data type is:
a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
with it being mostly the implementation of some Abstract data type like the real numbers or the whole numbers.
All good, then comes the data structure:
is a particular way of organizing data in a computer so that it can be used efficiently.[1][2] Data structures can implement one or more particular abstract data types (ADTs), which are the means of specifying the contract of operations and their complexity. In comparison, a data structure is a concrete implementation of the contract provided by an ADT.
So a data structure is an implementation of an ADT like say a stack or queue.
But wouldn’t that make it a data type as well??
All that I can truly see is that a data type can range from really simple things without any structural organization to sophisticated structures of data. What really counts is that it is an implementation of an ADT that mirrors the ADT's important aspects, and that it can be envisioned as a single entity (like a list or a tree). A data structure, on the other hand, must have at least some sort of logical or mathematical organization to qualify as a data structure. Sadly, this difference would make many entities both a data structure and a data type simultaneously.
So what is the solid difference between a simple plain (data type) and a (data structure)?
I would gladly accept an answer that points to a specific book on this topic which goes deep enough to explain all these matters. I would also appreciate recommendations for good books about data structures in C.

In C, a data type is a language-level construct. There are a limited number of predefined types (int, char, double, etc.), and a practically unlimited number of derived types (array types, structure types, union types, function types, pointer types, atomic types (the latter are new in C11)).
Any type may be given a simple name via a typedef declaration. For any type other than a function type or an incomplete type, you can have objects of that type; each object occupies a contiguous region of memory.
The types that can exist in C are completely described in section 6.2.5 of the C standard; see, for example, the N1570 draft.
A data structure, on the other hand, is a construct defined by your own code. The language does not define the concept of a linked list, or a binary tree, or a hash table, but you can implement such a data structure, usually by building it on top of derived data types. Typically there is no such thing as an object that's a linked list. An instance of a linked list data structure consists of a collection of related objects, and only the logic of your code turns that collection into a coherent entity. But you'll typically have an object of some data type that your program uses to refer to the linked list data structure, perhaps a structure or a pointer to a structure.
You'll typically have a set of functions that operate on instances of a data structure. Whether those functions are part of the data structure is a difficult question that I won't try to answer here.
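As a rough sketch of what that looks like in practice, here is a tiny singly linked list of int. All the names (struct node, struct int_list, list_push_front, list_length, list_free) are made up for this illustration and do not come from the standard library. Note that no single object "is" the list: each node is a separate object, and struct int_list is just the handle the program uses to refer to the collection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One element of the list: an ordinary object of a structure type. */
struct node {
    int value;
    struct node *next;
};

/* The object the program uses to refer to the whole list. */
struct int_list {
    struct node *head;
};

/* Insert a value at the front; returns 0 on success, -1 if allocation fails. */
int list_push_front(struct int_list *list, int value)
{
    assert(list != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = list->head;
    list->head = n;
    return 0;
}

/* Count the nodes by following the links. */
size_t list_length(const struct int_list *list)
{
    assert(list != NULL);
    size_t count = 0;
    for (const struct node *n = list->head; n != NULL; n = n->next)
        count++;
    return count;
}

/* Release every node and leave the list empty. */
void list_free(struct int_list *list)
{
    assert(list != NULL);
    struct node *n = list->head;
    while (n != NULL) {
        struct node *next = n->next;
        free(n);
        n = next;
    }
    list->head = NULL;
}
```

A caller would start with struct int_list list = { NULL };, push values with list_push_front(&list, 42), and call list_free(&list) when done.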
An array, for example, can be considered both a data type and a data structure; more precisely, you can think of it as a data structure implemented using the existing array type.

Referring to C99 and later:
There are two kinds of data types:
intrinsic: char, int, float, double, _Complex, _Bool, void (some of them also come in long and unsigned variations)
derived: arrays, structures, unions, pointers, functions
The latter are built from the former and/or the latter.
So to answer your question:
So what is the solid difference between a simple plain (data type) and a (data structure)?
A "data structure [type]" is derived from "simple plain data type"(s) and/or other "data structure [type]"(s).

A data type specifies the values and operations allowed for a single expression or object; a data structure is a region of storage and algorithms that organize objects in that storage.
An example of a data type is int; objects of this type may store whole number values in at least the range [-32767, 32767], and the usual arithmetic operations may be performed on these objects (although the result of integer division is also an integer, which trips people up the first time around). You can't use the subscript operator [] on an int, nor may you use the function call () operator on an int object.
For an example of a data structure, we can look at a simple stack. We'll use an array as our region of storage. We'll define an additional integer item to serve as a stack pointer - it will contain the index of the element most recently added to the array. We define two algorithms - push and pop - that will add and remove items to the stack in a specific order.
(sp starts at -1 for an empty stack; array indices run from 0 to stack size - 1)
push: if sp is less than stack size - 1 then
          add 1 to sp
          write input to array[sp]
      else
          stack overflow
      end if
pop:  if sp is greater than or equal to 0 then
          get value from array[sp]
          subtract 1 from sp
          return value
      else
          stack underflow
      end if
Our stack data structure stores a number of objects of some data type such that the last item added is always the first item removed, a.k.a. a last-in first-out (LIFO) queue. If we push the values 1, 2, and 3 onto the stack, they will be popped off in the order 3, 2, and 1.
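A minimal C sketch of that stack, assuming 0-based array indices so that sp starts at -1 when the stack is empty. The names (struct int_stack, stack_init, stack_push, stack_pop) and the capacity STACK_SIZE are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_SIZE 8   /* arbitrary capacity chosen for illustration */

struct int_stack {
    int items[STACK_SIZE];   /* the region of storage */
    int sp;                  /* index of the most recently pushed item; -1 when empty */
};

void stack_init(struct int_stack *s)
{
    assert(s != NULL);
    s->sp = -1;
}

/* Returns false on overflow instead of writing past the array. */
bool stack_push(struct int_stack *s, int value)
{
    assert(s != NULL);
    if (s->sp >= STACK_SIZE - 1)
        return false;
    s->sp++;
    s->items[s->sp] = value;
    return true;
}

/* Returns false on underflow; otherwise stores the popped value in *out. */
bool stack_pop(struct int_stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->sp < 0)
        return false;
    *out = s->items[s->sp];
    s->sp--;
    return true;
}
```

The array and the int are plain data types; it is only the pair of functions that makes them behave as a stack.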
Note that it's the algorithms that distinguish data structures from each other, not the type of storage. If your algorithms add items to one end of the array and remove them from the other end, you have a first-in first-out (FIFO) queue. If your algorithms add items to the array such that for each element i in the array a[i] >= a[2*i] and a[i] >= a[2*i+1] are both true, you have a heap.
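To make the heap condition concrete, here is a small checker. The conditions a[i] >= a[2*i] and a[i] >= a[2*i+1] correspond to numbering the elements from 1; with C's 0-based arrays the children of element i sit at 2*i+1 and 2*i+2, which is what this sketch assumes. The function name is_max_heap is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a[0..n-1] satisfies the max-heap property
   (every element is >= each of its children that exists). */
bool is_max_heap(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t left = 2 * i + 1;    /* 0-based index of the left child */
        size_t right = 2 * i + 2;   /* 0-based index of the right child */
        if (left < n && a[i] < a[left])
            return false;
        if (right < n && a[i] < a[right])
            return false;
    }
    return true;
}
```

The storage is the same plain int array in every case; only the invariant the code maintains makes it a heap.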

Basically,
A data type defines a certain domain of values and the operations allowed on those values. All the basic data types defined by the compiler are called primitive data types.
A data structure is rather a user-defined data type and is a systematic way to organize data so that it can be used efficiently. The operations and values of these are not specified in the language itself, but are specified by the user.
The book to learn more about this is "The C Programming Language" by Brian Kernighan and Dennis Ritchie.

A data type represents the type of data that is going to be stored in a variable. It specifies that a variable will only be assigned values of a specific type.
A data structure is a collection that holds various forms of data.

Related

Fundamental limitations of cell arrays, arrays of structs, and scalar structs?

I've been using Matlab on and off for decades. I thought I had a good grip on arrays, structs, cell arrays, tables, an array of structs, and a struct in which each field is an array. For the latter two, I assumed that each field needed to be of uniform type. I'm finding that no such limitation exists:
Perhaps Matlab is becoming more flexible with the years (I'm using 2015b), but it does undermine my confidence in choosing the best type of variable for a task if I find that understanding of the limitations of each type is wrong. For the purpose of this question, I can't really articulate the needs of the task because the manner in which I break down a large to-do into tasks depends on my understanding of the data types at my disposal, and their advantages/limitations.
I can (and have) read online documentation ad nauseam, and while it will walk you through code to illustrate what the data types are able to do, I haven't yet come across a succinct description of the comparative limitations between cell arrays, arrays of structs, and structs whose fields are themselves arrays -- to the point that I can use that knowledge to choose the best structure in a given situation. Basic stuff, I do find, e.g., the same field names will occur in each struct of a struct array (but as the above example shows, each field of each struct can contain highly heterogeneous data types and/or array sizes).
THE QUESTION
Can anyone point to such a comparison of limitations between cell arrays, arrays of structs, and scalar structs whose fields are themselves arrays? I'm looking for a treatment at a level that informs a coder in deciding on the best trade-off between (i) speed, (ii) memory, and (iii) readability, maintainability, and evolvability.
I've deliberately left out tables because, although I'm enamoured of their convenient access to, and subsetting of, data sets (and presentation thereof), they have proved rather slow for manipulation of data. They have their uses, and I use them liberally, but I'm not interested in them for the purpose of this comparison, which is under-the-hood algorithm coding.
I think your question eventually narrows down to these three "types" of data structures:
comparative limitations between cell arrays, arrays of structs, and structs whose fields are themselves arrays
[Note that "structs whose fields are themselves arrays" I translate as "scalar structs" here. An array of structs can also contain arbitrary arrays. My thinking becomes clear below, I hope.]
To me, these are not very different. All three are containers for heterogeneous data. (Heterogeneous data is non-uniform data, each data element is potentially of a different type and size.) Each of these statements can return an array of any type, unrelated to the type of any other array in the container:
cell array: array{i,j}
struct array: array(i,j).value
scalar struct: array.value
So it all depends on how you want to index:
array(i,j).value
      ^     ^
      A     B
If you want to index using A only, use a cell array (though you then need curly braces, of course). If you want to index using B only, use a scalar struct. If you want both A and B, use a struct array.
There is no difference in cost that I'm aware of. Each of the arrays contained in these containers takes up some space. The spatial overhead of the various containers is similar, and I have never noted a time overhead difference.
However, there is a huge difference between these two:
array(i).value % s1
array.value(i) % s2
I think that the question deals with this difference also. s1 has a lot more spatial overhead than s2:
>> s1=struct('value',num2cell(1:100))
s1 =
  1×100 struct array with fields:
    value
>> s2=struct('value',1:100)
s2 =
  struct with fields:
    value: [1×100 double]
>> whos
  Name      Size       Bytes  Class     Attributes
  s1        1x100      12064  struct
  s2        1x1          976  struct
The data needs 800 bytes, so s2 has 176 bytes of overhead, whereas s1 has 11264 (1408%)!
The reason is not the container, but the fact that we're storing one array with 100 elements in one, and 100 arrays with one element in the other. Each array has a header of a certain size that MATLAB uses to know what type of array it is, what sizes it has, to manage its storage and the delayed copy mechanism. The fewer arrays one has, the less memory one uses.
So, don't use a heterogeneous container to store scalars! These things only make sense when you need to store larger arrays, or arrays of different type or size.
The heterogeneous container that is not explicitly asked about (and after the edit explicitly not asked about) is the table. A table is similar to a scalar struct in that each column of the table is a single array, and different columns can have different types. Note that it is possible to use a cell array as a column, allowing heterogeneous elements to be stored in a column, but tables make most sense if this is not the case.
One difference with a scalar struct is that each column must have the same number of rows. Another difference is that indexing can look like that of a cell array, a scalar struct, or a struct array.
Thus, the table imposes some constraints on the contained data, which is very beneficial in some circumstances.
However, and as the OP noted, working with tables is slower than working with structs. This is because table is a custom class, not a native type like structs and cell arrays. If you type edit table in MATLAB, you'll see the source code, how it's implemented. It's a classdef file, just like something any of us could write. Consequently, it has the same speed limitations: the JIT is not optimized for it, indexing into a table implies running a function written as an M-file, etc.
One more thing: Don't create cell arrays of structs, or scalar structs with cell arrays. This increases the levels of containers, which increases overhead (both in space and time), and makes the contents more difficult to use. I have seen questions here on SO related to difficulty accessing data, caused by this type of construct:
data{i,j}.value % A cell array with structs. Don't do this!
data.value{i,j} % A struct with cell arrays. Don't do this!
The first example is equal to a struct array (with a lot more overhead), except there is no control over the struct fields within each cell. That is, it is possible for one of the cells to not have a .value field.
The second example makes sense only if value is a different size than a second struct field. If all struct fields are (supposed to be) cell arrays of the same size like this, then use a struct array. Again, less overhead and more uniformity.

Data structure - Array

Here it says:
Arrays are useful mostly because the element indices can be computed
at run time. Among other things, this feature allows a single
iterative statement to process arbitrarily many elements of an array.
For that reason, the elements of an array data structure are required
to have the same size and should use the same data representation.
Is this still true for modern languages?
For example, in Java, you can have an array of Objects or Strings, right? Each object or string can have a different length. Do I misunderstand the above quote, or do languages like Java implement arrays differently? How?
In Java, all types except primitives are reference types, meaning that a variable of such a type holds a reference (in effect a pointer) to a memory location managed by the JVM.
There are mainly two kinds of programming languages here: statically typed ones like Java and C++, and dynamically typed ones like Python and PHP. In statically typed languages, all elements of an array must have the same declared type, whether String, Object, or something else,
but in dynamically typed ones there's a bit more abstraction and you can have different data types in an array (I don't know the actual implementation though).
An array is a regular arrangement of data in memory. Think of an array of soldiers, all in a line, with exactly equal spacing between each man.
So they can be indexed by lookup from a base address. But all items have to be the same size. So if they are not, you store pointers or references to make them the same size. All languages use that underlying structure, except for what are sometimes called "associative arrays", indexed by key (strings usually), where you have what is called a hash table. Essentially the hash function converts the key into an array index, with a fix-up to resolve collisions.
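The "store pointers to make them the same size" idea can be sketched in C like this. The array names and the helper functions names_count and name_length are invented for the example. The strings have different lengths, but every array element is a pointer of the same size, so indexing still works by simple arithmetic from the base address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three strings of different lengths, reached through same-sized pointers. */
static const char *const names[] = { "Al", "Barbara", "Christopher" };

/* Number of elements, computed from the fixed element size. */
size_t names_count(void)
{
    return sizeof names / sizeof names[0];
}

/* Length of the string that element i points to. */
size_t name_length(size_t i)
{
    assert(i < names_count());
    return strlen(names[i]);
}
```

Roughly speaking, an array of String references in Java has the same shape: the array holds fixed-size references and the variable-length data lives elsewhere.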

When is the best time to use a Structure or an array

I am a little new to C programming. I was writing a C program which has 3 integers to handle. I had all of them inside an array and suddenly I had a thought of why should I not use a structure.
My question here is when is the best time to use a structure and when to use an array. And is there any memory usage difference between the two in this particular case.
Any help regarding this is appreciated. Thanks!
An array is best when you want to loop through the values (which, essentially, means they're strongly related). Otherwise a structure allows you to give them meaningful names and avoids the need to document that array, e.g. myVar[1] is the name of the company and myVar[0] is its phone number, etc. as opposed to companyName, companyPhone.
The difference is about semantic information. If you want to store your information as a list where there is no semantic distinction between different members of that list, then use an array. Perhaps each member of the list represents a different value for the same thing.
If each of those integers represents something special or different, use a struct. Note the implications of using a struct, such as the fact that people expect the members to be closely related semantically.
struct has other advantages over array which can make it more powerful. For example, its ability to encapsulate multiple data types.
If you are passing this information between many functions, a structure is likely more practical (because there is no need to pass the size). It would be bad to pass an array (which decays to a pointer) and expect the callee to know how many items are in the array. Using a struct implicitly makes this part of the function contract.
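A small sketch of that difference, with hypothetical names (struct triple, sum_array, sum_triple). In the array version the callee only gets a pointer and has to trust the separately passed count; in the struct version the number of members is part of the type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type holding exactly three related integers. */
struct triple {
    int x, y, z;
};

/* Array version: the array has decayed to a pointer, so the count
   must be passed separately and trusted by the callee. */
int sum_array(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Struct version: the type itself says there are three members. */
int sum_triple(struct triple t)
{
    return t.x + t.y + t.z;
}
```

Calling sum_array(values, 4) on a 3-element array compiles fine and reads out of bounds; there is no equivalent mistake to make with sum_triple.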
In terms of size, there is no difference. A 4 byte int would typically be 4-byte aligned.
You can think of structure like an object in OOP languages, a structure ties related data into a single type and allows you to access each member of the structure using the member's name instead of array indices. If you can think of a singular name that could unify the related data then you should be using a structure.
An array can be thought of as a list of items. If the name you thought of above contains the word list or collection, or is a plural, then you should be using arrays or other collection types. The primary use of an array is to loop over it and apply the same operation to every item in the array, or to a range of items. If you used an array but never looped over it, that's an indication that an array may not be the best data type.
I would suggest to use an array if the different things you store are logically the same data, but different instance of this. (like a list of telephone numbers or ages). And use a struct when they mean different things (like age and size) bound together because they are related to the same thing (a person).
The size is typically equal, since both store 3 integers and nothing else (a compiler is allowed to add padding to a struct, but for three ints it normally does not). You could actually cast the struct to an array and use it like that (although you shouldn't do that for its ugliness).
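You could test that with this simple program:
#include <stdio.h>
struct three_numbers {
    int x;
    int y;
    int z;
};
int main(void) {
    int test[3];
    /* In C the type must be written as "struct three_numbers";
       sizeof yields a size_t, which is printed with %zu. */
    printf("struct: %zu, array: %zu\n", sizeof(struct three_numbers), sizeof(test));
    return 0;
}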
prints on my system:
struct: 12, array: 12
In my opinion, you should think first from the perspective of the design to decide which one to use. In your question you have mentioned that "I have three integers to handle". The point here is that how did you arrive at three integers?
Just as many others have noted, let's say you need to store details of a person. First you need to think of the person as an object, then decide what information relevant to that person you will need, and then decide what data type to use for each of those details. What you are doing instead is deciding on the data types first and then trying to work your way up.
To put the difference between a structure and an array in simple words: a structure is a composite data type (or a user-defined data type), whereas an array is just a collection of data of the same type.
Use structures to group information about a single object. Use arrays to group information about multiple objects.

What is the difference between an Array Data Structure and an Array Data-type in the context of a programming language like C?

Wikipedia differentiates an Array Data Structure and an Array Data-type.
What is the difference between an Array Data Structure and an Array Data-type in the context of a programming language like C?
What is this : int array[]={1, 2, 3, 4, 5}; ?
Is it an Array Data Structure or an Array Data-type? Why?
Short answer: Do yourself a favor and just ignore both articles. I don't doubt the good intentions of the authors, but the articles are confusing at best.
What is this : int array[]={1, 2, 3, 4, 5}; ?
Is it an Array Data Structure or an Array Data-type? Why?
It's both. The array data structure discussed in the article by that name is supposed to relate specifically to arrays as implemented in C. The array data type concept is supposed to be more abstract, but C arrays certainly are one implementation of array data type.
Long answer: The difference those two articles consider is the difference between behavior and implementation. As used in the articles, array data structure refers to elements stored sequentially in memory, so that you can calculate the address of any element by:
address = (base address) + (element index * size of a single element)
where 'base address' is the address of the element at index 0.
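In C you can check that formula directly. The helper below (element_address is a name made up for this sketch) does the computation in bytes by hand and should land on the same address as the built-in subscript.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the address of element `index` by hand:
   base address + index * size of a single element. */
const int *element_address(const int *base, size_t index)
{
    assert(base != NULL);
    const unsigned char *bytes = (const unsigned char *)base;
    return (const int *)(const void *)(bytes + index * sizeof(int));
}
```

For int array[] = {1, 2, 3, 4, 5};, element_address(array, 3) compares equal to &array[3], which is exactly what the compiler computes for array[3].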
Array data type, on the other hand, refers to any data type that provides a logical sequence of elements accessed by index. For example, C++ provides std::deque, and Objective-C provides NSArray and NSMutableArray, none of which are required to store their elements as a single contiguous sequence in memory. (std::vector, by contrast, is guaranteed to be contiguous.)
The terminology used in the articles isn't very helpful. The definition given at the top of the array data structure article is:
an array data structure or simply array is a data structure consisting
of a collection of elements (values or variables), each identified by
at least one index
while the definition given for array data type is:
an array type is a data type that is meant to describe a collection of
elements (values or variables), each selected by one or more indices
that can be computed at run time
It doesn't help that the array data structure article, which is apparently supposed to be about the C-style implementation of arrays, includes discussion of associative arrays and other material that would be far more appropriate in the array data type article. You can learn why this is by reading the discussion page, particularly Proposal to split the article and Array structure. The only thing that's clear about these articles is that the various authors can't make up their collective mind about how 'array' should be defined and explained.
A type is something that the programmer sees; a data structure is how something is implemented behind the scenes. It's conceivable that an array type is implemented behind the scenes with e.g. a hashtable (this is the case for PHP, I think).
In C, there is no distinction; an array type must be implemented with a contiguous block of memory.
The structure of your array determines how the array is implemented (storage and access); the data type refers to the types of data contained within the array. For your reading pleasure read each of these links.
Brackets [] is how you designate an Array Data Type in C
Similarly, * is how you designate a Pointer Data Type in C
int array[]={1, 2, 3, 4, 5}; is an example of an Array Data Structure in C
Specifically, you have defined a data structure which has 5 integers arranged contiguously, you have allocated sufficient memory on the stack for that data structure, and you have initialized that data structure with values 1, 2, 3, 4, 5.
A Data Structure in C has a non-zero size, which can be found by applying the sizeof operator to an instance of that structure.

C - How to implement Set data structure?

Is there any tricky way to implement a set data structure (a collection of unique values) in C? All elements in a set will be of the same type, and there is plenty of RAM available.
As far as I know, for integers it can be done really fast'N'easy using value-indexed arrays. But I'd like to have a very general Set data type. And it would be nice if a set could include itself.
There are multiple ways of implementing set (and map) functionality, for example:
tree-based approach (ordered traversal)
hash-based approach (unordered traversal)
Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.
Beware of the advantages and disadvantages of hash-based vs. tree-based approaches.
You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:
all hashables in a bucket have the same hash value
a bucket can be implemented as a dynamic array or linked list of hashables
a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)
With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash method, drastically reduce the probability of collision, achieving virtually constant-time performance.
You would have to implement:
the hash function for the type being hashed
an equality function for the type being used to test whether two hashables are equal or not
the hash-set contains/insert/remove functionality.
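A compact sketch of those three pieces, under some assumptions of mine: the bucket array has a fixed size (BUCKET_COUNT), buckets are singly linked lists, the set stores pointers without owning the pointed-to items, and the hash value is reduced modulo the bucket count. All names (struct hash_set, set_init, set_contains, set_insert, set_remove, int_hash, int_equal) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_COUNT 64   /* fixed number of buckets, chosen arbitrarily */

typedef size_t (*hash_fn)(const void *item);
typedef bool (*equal_fn)(const void *a, const void *b);

struct entry {
    const void *item;     /* the set stores pointers; it does not own the items */
    struct entry *next;   /* next entry in the same bucket */
};

struct hash_set {
    struct entry *buckets[BUCKET_COUNT];
    hash_fn hash;
    equal_fn equal;
};

void set_init(struct hash_set *set, hash_fn hash, equal_fn equal)
{
    assert(set != NULL && hash != NULL && equal != NULL);
    for (size_t i = 0; i < BUCKET_COUNT; i++)
        set->buckets[i] = NULL;
    set->hash = hash;
    set->equal = equal;
}

bool set_contains(const struct hash_set *set, const void *item)
{
    const struct entry *e = set->buckets[set->hash(item) % BUCKET_COUNT];
    for (; e != NULL; e = e->next)
        if (set->equal(e->item, item))
            return true;
    return false;
}

/* Returns true if the item was added, false if already present or out of memory. */
bool set_insert(struct hash_set *set, const void *item)
{
    if (set_contains(set, item))
        return false;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return false;
    size_t b = set->hash(item) % BUCKET_COUNT;
    e->item = item;
    e->next = set->buckets[b];
    set->buckets[b] = e;
    return true;
}

/* Returns true if an equal item was found and removed. */
bool set_remove(struct hash_set *set, const void *item)
{
    struct entry **link = &set->buckets[set->hash(item) % BUCKET_COUNT];
    for (; *link != NULL; link = &(*link)->next) {
        if (set->equal((*link)->item, item)) {
            struct entry *dead = *link;
            *link = dead->next;
            free(dead);
            return true;
        }
    }
    return false;
}

/* Example hash and equality functions for items that point to int. */
size_t int_hash(const void *item)
{
    return (size_t)*(const int *)item * 2654435761u;
}

bool int_equal(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}
```

Because membership is decided by the equality function rather than by the pointer value, two different int objects holding 5 count as the same element. Supporting another element type only means supplying a different hash/equality pair.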
You can also use open addressing as an alternative to maintaining and managing buckets.
Sets are usually implemented as some variety of a binary tree. Red black trees have good worst case performance.
These can also be used to build a map to allow key / value lookups.
This approach requires some sort of ordering on the elements of the set and the key values in a map.
I'm not sure how you would manage a set that could possibly contain itself using binary trees if you limit set membership to well defined types in C ... comparison between such constructs could be problematic. You could do it easily enough in C++, though.
The way to get genericity in C is by void *, so you're going to be using pointers anyway, and pointers to different objects are unique. This means you need a hash map or binary tree containing pointers, and this will work for all data objects.
The downside of this is that you can't enter rvalues independently. You can't have a set containing the value 5; you have to assign 5 to a variable, which means it won't match a random 5. You could enter it as (void *) 5, and for practical purposes this is likely to work with small integers, but if your integers can get into large enough sizes to compete with pointers this has a very small probability of failing.
Nor does this work with string values. Given char a[] = "Hello, World!"; char b[] = "Hello, World!";, a set of pointers would find a and b to be different. You would probably want to hash the values, but if you're concerned about hash collisions you should save the string in the set and do a strncmp() to compare the stored string with the probing string.
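The pointer-versus-value problem with strings can be seen in a few lines. The two helper names (same_pointer, same_string) are invented for this sketch; a set keyed on pointers effectively uses the first comparison, while a set of string values needs the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Identity comparison: true only if both pointers refer to the same object. */
bool same_pointer(const char *a, const char *b)
{
    return a == b;
}

/* Value comparison: true if the two strings have the same contents. */
bool same_string(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

With the two arrays a and b from the text, same_pointer(a, b) is false because they are distinct objects, while same_string(a, b) is true.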
(There's similar problems with floating-point numbers, but trying to represent floating-point numbers in sets is a bad idea in the first place.)
Therefore, you'd probably want a tagged value, one tag for any sort of object, one for integer value, and one for string value, and possibly more for different sorts of values. It's complicated, but doable.
If the maximum number of elements in the set (the cardinality of the underlying data type) is small enough, you might want to consider using a plain old array of bits (or whatever you call them in your favourite language).
Then you have a simple set membership check: bit n is 1 if element n is in the set. You could even count 'ordinary' members from 1, and only make bit 0 equal to 1 if the set contains itself.
This approach will probably require some sort of other data structure (or function) to translate from the member data type to the position in the bit array (and back), but it makes basic set operations (union, intersection, membership test, difference, insertion, removal, complement) very easy. It is only suitable for relatively small sets, though; I don't suppose you would want to use it for sets of 32-bit integers.
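For a universe of at most 64 possible members, the whole set fits in one uint64_t and each set operation is a single bitwise instruction. This is only a sketch; the type name bitset64 and the function names are made up, and a larger universe would need an array of words instead of a single one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A set over the universe 0..63, one bit per possible member. */
typedef uint64_t bitset64;

bitset64 bitset_insert(bitset64 s, unsigned n)
{
    assert(n < 64);
    return s | (UINT64_C(1) << n);
}

bitset64 bitset_remove(bitset64 s, unsigned n)
{
    assert(n < 64);
    return s & ~(UINT64_C(1) << n);
}

/* Membership test: bit n is 1 if element n is in the set. */
bool bitset_contains(bitset64 s, unsigned n)
{
    assert(n < 64);
    return (s >> n) & 1u;
}

bitset64 bitset_union(bitset64 a, bitset64 b)        { return a | b; }
bitset64 bitset_intersection(bitset64 a, bitset64 b) { return a & b; }
bitset64 bitset_difference(bitset64 a, bitset64 b)   { return a & ~b; }
bitset64 bitset_complement(bitset64 a)               { return ~a; }
```

The functions return the new set by value, so an empty set is just 0 and sets can be compared with ==.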

Resources