Why does each data type have two choices in pgAdmin 4? - pgadmin-4

In pgAdmin 4, when defining the data type for a table column, I noticed that each type has two choices. What's the difference?
For example, char has two choices.

Because Postgres does not work like the data structures in a programming language. It keeps data on disk, not in memory, so each type is offered both on its own and as a fixed-length array value. See for more information: https://stackoverflow.com/a/42484838/13638824
Edit, to answer your follow-up question (it was too long for a comment):
Writing data structures to memory and writing them to disk are totally different. Memory is dynamic and writing to it is very cheap, so it can absorb changes in data very quickly. Writing to disk is not cheap; when database systems decide to write data, they try to be as efficient as possible, and they usually use fixed-size allocations behind the scenes to improve performance. When you define a Text column, the DBMS will not create a string; it will write it to disk as an expandable data structure to ensure flexibility. But if you use char[], the length is predefined, so it can be more performant.
TL;DR:
For in-memory data structures:
char[] == string.
But for a DBMS, char[] == fixed-size byte array and text == flexible-size byte array.
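To illustrate the analogy in C (a minimal sketch of the idea, not how Postgres actually lays out its pages): a fixed-size buffer plays the role of char(n), while a heap buffer sized to the value plays the role of text.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Fixed-size storage, the analogue of char(8): always 8 bytes,
       shorter values are padded, longer ones would be truncated. */
    char fixed[8];
    memset(fixed, ' ', sizeof fixed);
    memcpy(fixed, "abc", 3);

    /* Flexible storage, the analogue of text: sized to the value and
       reallocated whenever the value grows. */
    const char *value = "a considerably longer value";
    char *flexible = malloc(strlen(value) + 1);
    if (flexible == NULL)
        return 1;
    strcpy(flexible, value);

    printf("fixed slot: %zu bytes, flexible value: %zu bytes\n",
           sizeof fixed, strlen(flexible) + 1);

    free(flexible);
    return 0;
}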

Related

How to deal with HUGE (32GB+) multi-dimensional arrays?

I have a multi-dimensional array of strings (arbitrary length but generally no more than 5 or 6 characters) which needs to be random-read capable. Currently my issue is that I cannot create this array in any programming language (which keeps or loads the entire array into memory) since the array would far exceed my 32GB RAM.
All values in this database are interrelated, so if I break the database up into smaller more manageable pieces, I will need to have every "piece" with related data in it, loaded, in order to do computations with the held values. Which would mean loading the entire database, so we're back to square one.
The array dimensions are: [50,000] * [50,000] * [2] * [8] (I'll refer to this structure as X*Y*Z*M)
The array needs to be infinitely resizable on the X, Y, and M dimensions, though the M dimension would very rarely be changed, so a sufficiently high upper-bound would be acceptable.
While I do have a specific use-case for this, this is meant to be a more general and open-ended question about dealing with huge multi-dimensional arrays - what methods, structures, or tricks would you recommend to store and index the values? The array itself clearly needs to live on disk somewhere, as it is far too large to keep in memory.
I've of course looked into the basic options like an SQL database or a static file directory structure.
SQL doesn't seem like it would work since there is an upper-bound limitation on column widths, and it only supports tables with columns and rows - SQL doesn't seem to support the kind of multidimensionality that I require. Perhaps there's another DBMS system for things like this which someone recommends?
The static file structure seemed to work when I first created the database, but upon shutting the PC down and everything being lost from the read cache, the PC will no longer read from the disk. It returns zero on every read and doesn't even attempt to actually read the files. Any attempt to enumerate the database's contents (right-clicking properties on the directory) will BSOD the entire PC instantly. There are just too many files and directories; Windows can't handle it.
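For what it's worth, one common way to attack this kind of problem (not taken from this thread, and assuming hypothetical fixed-width 8-byte slots in a single flat file) is to never load the array at all: every cell lives at a computable offset, and you seek to it on demand. The dimensions, names and fseeko/off_t usage below are assumptions for illustration; fseeko is POSIX, and the offsets here do not fit in 32 bits.

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>   /* off_t; compile with -D_FILE_OFFSET_BITS=64 on 32-bit systems */

/* Hypothetical dimensions and slot width: every string occupies a
   fixed 8-byte slot, so each cell has a computable file offset. */
enum { DIM_Y = 50000, DIM_Z = 2, DIM_M = 8, SLOT = 8 };

static int64_t cell_offset(int64_t x, int64_t y, int64_t z, int64_t m)
{
    return (((x * DIM_Y + y) * DIM_Z + z) * DIM_M + m) * SLOT;
}

/* Read a single cell (SLOT bytes) without loading anything else. */
static int read_cell(FILE *f, int64_t x, int64_t y, int64_t z, int64_t m,
                     char buf[SLOT])
{
    if (fseeko(f, (off_t) cell_offset(x, y, z, m), SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, SLOT, f) == SLOT ? 0 : -1;
}

With X as the outermost dimension, growing X just means appending to the file; growing Y or M would force a rewrite, so you would pick generous upper bounds for those.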

How do structures affect efficiency?

Pointers make the program more efficient and faster. But how do structures affect the efficiency of the program? Does it make it faster? Is it just for the readability of the code, or what? And may I have an example of how it does so?
Pointers just point to a memory address; by themselves they have nothing to do with efficiency or speed (a pointer is just a variable that stores an address required by some instruction, nothing more).
Data structures, however, affect the efficiency of your program/code in multiple ways: they can increase or decrease the time complexity and space complexity of your algorithm and, ultimately, of the code that implements it.
For example, take an array and a linked list:
Array: a block of space allocated contiguously in memory.
Linked list: space allocated at scattered locations in memory, connected via pointers.
In most cases either can be used (assuming the allocation is not too heavy), but because an array is one contiguous allocation, retrieval is faster than in a linked list, where every access means getting the address of the next allocated block and then fetching its data.
This is how the choice of structure improves the speed/efficiency of your code, as the sketch below illustrates.
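A minimal C sketch of the two traversals being contrasted (the names here are made up for illustration):

#include <stddef.h>

/* Array: elements are contiguous, so the next value is usually
   already in the CPU cache when you reach it. */
static long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Linked list: each node may live anywhere in memory, so every step
   chases a pointer and may miss the cache. */
struct node { int value; struct node *next; };

static long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}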
There are many such examples that show why data structures matter (if they were not so important, new ones would not keep being designed, and you would not be learning them).
Link to refer to:
What are the lesser known but useful data structures?
Structures have little to do with efficiency; they're used for abstraction. They allow you to keep all related data together and refer to it by a single name.
There are some performance-related features, though. If you have all your data in a structure, you can pass a pointer to that structure as one argument to a function. This is better than passing lots of separate arguments to the function, for each value that would have been a member of the structure. But this isn't the primary reason we use structures, it's mainly an added benefit.
Pointers do not contribute anything to a program's efficiency or execution time/speed. A structure provides a way of storing different variables under the same name. These variables can be of different types, and each has a name which is used to select it from the structure. For example, if you want to store data about a student, it may consist of Student_Id, Name, Sex, School, Address etc., where Student_Id is an int, Name is a string, Sex is a char (M/F) and so on, but all the variables are grouped together as a single structure 'Student' in a single block of memory. So every time you need to fetch or update a student's data, you deal with that one structured record only. Now imagine how much trouble you would face if you tried to store all those int, char and char[] variables separately and update them individually, because you would have to update each student's record at several different memory locations.
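For example, the Student record described above might look like this in C (the field sizes are arbitrary):

#include <string.h>

/* All of a student's fields live together in one block of memory. */
typedef struct {
    int  student_id;
    char name[64];
    char sex;              /* 'M' or 'F' */
    char school[64];
    char address[128];
} Student;

/* Updating a record means touching one structure through one pointer,
   not several separately stored variables. */
static void change_address(Student *s, const char *new_address)
{
    strncpy(s->address, new_address, sizeof s->address - 1);
    s->address[sizeof s->address - 1] = '\0';
}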
But if you mean data structures in the sense of organizing your whole data set into abstract data types, where you might choose between different kinds of linked lists, trees, graphs, arrays and so on, then that choice plays a vital role in deciding the time and space complexity of the program. In that sense you can make your program more efficient.
When you want to optimize your memory/cache usage, structures can increase the efficiency of your code (make it faster). This is because data is loaded from memory into the cache in fixed-size units (words or cache lines); by fitting your data to these boundaries you can ensure that when the first int of a two-int structure (say, a pair of coordinates) is loaded, the second one is loaded along with it.
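A sketch of that idea (a pair of ints that are always used together; the names are invented):

#include <stddef.h>

/* Packing the two ints into one struct keeps them adjacent in memory,
   so fetching x typically pulls y into the same cache line. */
struct point { int x; int y; };

static long sum_products(const struct point *pts, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long) pts[i].x * pts[i].y;   /* x and y fetched together */
    return total;
}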

Which data type do I use in T-SQL if the field can be very small or very large?

Suppose I have a column in a table which may hold information as short as a single char 'a' or as big as a huge binary chunk which translates to a jpg, png, mp3, whatever.
Which data type should I use for this?
I thought varbinary(max) or varchar(max), but will it occupy unused space if I just store a single char or a short string?
How is data stored when a field has a data-type that may have variable lengths?
According to this Q&A, https://dba.stackexchange.com/questions/1767/how-do-too-long-fields-varchar-nvarchar-impact-performance-and-disk-usage-ms, it shouldn't matter, except for this:
Memory
If the client application allocates memory using the maximum size, the application would allocate significantly more memory than is necessary. Special considerations would have to be done to avoid this.
How do I know this? Sorry if I'm being dumb but it seems too vague.
What I would do if I were you is use a FILESTREAM-enabled database.
You can learn more about it here:
http://technet.microsoft.com/en-us/library/bb933993(v=sql.105).aspx
There is also a lot of information on the net, so you should face no issues using it.
Generally, you will have a table with as many columns as you need and one column that stores the information in binary format as a BLOB (Binary Large Object). The good thing is that the amount of information it can store is limited only by your hard drive space, because the data is stored on the drive. In the other columns you can have a type field (for example mp3/jpeg/avi/etc.) that helps your application convert the BLOB back to its original type.

Need a fast way to write large blocks of data to file in C

I am not at all good when it comes to writing large chunks of data to file. I have a simulation which has structs like so
typedef struct
{
    int   age;
    float height;
    float weight;
    int   friends[250000];
} Person;
And I can have as many as 250,000 persons, each with 250000 friends (a clique). Obviously this is a great deal of data. If I want to save each struct so I can later load them, what is the most efficient way in C? Here is what I have considered so far
I don't want to create a HUGE string with 250,000 groups of data and then do a single write as this will use a great deal of memory
I also don't want to create 250,000 different files as doing so may be slow.
Appending the files based on index (ie person 1, then person 2...), but this might be slow too.
Saving the data as binary (is this more efficient?)
EDIT: I am looking for efficient approaches to using fwrite(), namely whether it's faster to collect all the data and write to a single file, or to create multiple files and avoid the overhead of collecting all the data beforehand.
You can loop over the people and just store the age, height and weight members (3 fwrites), then a friend_count and then loop over the friends and write them one by one. All of this with fwrite. You don't need to care about optimizing I/O, as the C library will buffer for you and do a big "write" when needed.
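A sketch of that loop (the function name and error handling are mine, not the answer's; the fields come from the Person struct in the question):

#include <stdio.h>

/* One variable-length record per person: the fixed fields, then a
   count, then that many friend indexes. stdio buffers the many small
   fwrite calls and issues large writes underneath. */
static int save_person(FILE *f, int age, float height, float weight,
                       const int *friends, int friend_count)
{
    if (fwrite(&age,    sizeof age,    1, f) != 1) return -1;
    if (fwrite(&height, sizeof height, 1, f) != 1) return -1;
    if (fwrite(&weight, sizeof weight, 1, f) != 1) return -1;
    if (fwrite(&friend_count, sizeof friend_count, 1, f) != 1) return -1;
    for (int i = 0; i < friend_count; i++)
        if (fwrite(&friends[i], sizeof friends[i], 1, f) != 1)
            return -1;
    return 0;
}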
I think you are trying to [partially] reinvent an RDBMS (database). Reinventing is usually a bad idea. Consider storing your data in a free database system (e.g. Postgres). It will have other benefits -- you'll be able to interrogate your data without writing C code.
If a database sounds like overkill, use a simpler, file-based database storage library such as Berkeley DB or SQLite.
I am not very clear about your structure.
You have an array of Person structures, and friends[] contains indexes into that Persons array?
The best way would be to distinguish between a Person and his friends.
This way you have a Person of fixed size, and can store all Persons in a single file, and quickly read back data of Person 12345 - it's at filepos 12345*sizeof(Person) from the beginning of the file.
The friends arrays can be kept in memory through a
int *Friends[MAXFRIENDS]
array -- you need MAXFRIENDS*sizeof(int *) more bytes of memory; for 250,000 friends that is 2 megabytes on a 64-bit system. Small change. Each pointer holds the friend[] array for that person.
Then the friends of a Person go into a file in a directory called, say, /dd/cc/aabbccdd, where aabbccdd is obtained from sprintf("%08x", PersonIndex). Using the dd/cc subdirectories leads to a slightly more balanced tree. To write the friends file, just point to Friends[PersonIndex] and write as many friend indexes as needed (I'd store FriendsNumber in the Person struct).
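A rough sketch of the fixed-size-record half of that scheme (the struct layout and names are assumptions; the friend indexes are left to their separate per-person files):

#include <stdio.h>

typedef struct {
    int   age;
    float height;
    float weight;
    int   friends_number;   /* the friend indexes live in a separate file */
} PersonRec;

/* Every record has the same size, so Person N starts at
   N * sizeof(PersonRec) and can be read directly. */
static int read_person(FILE *f, long index, PersonRec *out)
{
    if (fseek(f, index * (long) sizeof(PersonRec), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}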
I'd look at a library like HDF5 so you can not only read the file back on this machine, but give the file to someone else and have the platform portability problem taken care of for you.

Which one to use: a linked list or static arrays?

I have a structure in C which resembles that of a database table record.
Now when I query the table using select, I do not know how many records I will get.
I want to store all the returned records from the select query in a array of my structure data type.
Which method is best?
Method 1: find array size and allocate
first get the count of records by doing select count(*) from table
allocate a static array
run select * from table and then store each record in my structure in a loop.
Method 2: use single linked list
while ( records returned )
{
    create new node
    store the record in node
}
Which implementation is best?
My requirement is that when I have all the records,
I will probably make copies of them or something.
But I do not need random access and I will not be doing any search of a particular record.
Thanks
And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or by having a higher-level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access; you just need to break your index into two parts.
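A rough sketch of that two-level scheme (the names and chunk size are made up for illustration):

#include <stdlib.h>

enum { CHUNK = 1024 };              /* records per fixed-size block */

struct record { int id; /* ... other columns ... */ };

typedef struct {
    struct record **chunks;         /* top-level array of block pointers */
    size_t nchunks;
    size_t count;                   /* total records stored */
} chunk_array;

/* Append: only a new CHUNK-sized block is allocated when the last one
   fills up; existing records are never copied or moved. */
static int chunk_append(chunk_array *a, struct record r)
{
    if (a->count == a->nchunks * CHUNK) {
        struct record **t = realloc(a->chunks, (a->nchunks + 1) * sizeof *t);
        if (t == NULL) return -1;
        a->chunks = t;
        a->chunks[a->nchunks] = malloc(CHUNK * sizeof **a->chunks);
        if (a->chunks[a->nchunks] == NULL) return -1;
        a->nchunks++;
    }
    a->chunks[a->count / CHUNK][a->count % CHUNK] = r;
    a->count++;
    return 0;
}

/* Random access: split the index into (block, offset within block). */
static struct record *chunk_get(chunk_array *a, size_t i)
{
    return &a->chunks[i / CHUNK][i % CHUNK];
}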
A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.
I think the best solution is your "2".
Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord> in C++ and List<myrecord> in C#).
You forgot option 3; it's a little more complicated, but it might be best for your particular case. This is the way it's typically done in C++'s std::vector.
Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.
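A sketch of that growth strategy in C (the row type and names are invented for illustration):

#include <stdlib.h>

typedef struct { int id; char name[64]; } row;    /* one query result row */

typedef struct {
    row    *data;
    size_t  count;
    size_t  capacity;
} row_vec;

/* Grow geometrically: the array is reallocated O(log n) times in
   total instead of once per record. */
static int row_vec_push(row_vec *v, const row *r)
{
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 16;
        row *t = realloc(v->data, new_cap * sizeof *t);
        if (t == NULL)
            return -1;
        v->data = t;
        v->capacity = new_cap;
    }
    v->data[v->count++] = *r;
    return 0;
}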
There are a good many possible critiques that should be made.
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.
Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)
record_struct *record;
while ((record = get_record()) != NULL) {
    records++;
    records_array = realloc(records_array, sizeof(record_struct) * records);
    records_array[records - 1] = *record;
}
This is strictly an example; reallocating on every record (and never checking realloc's return value) is not something you'd do in production.
The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.
