I'm currently fiddling with a program that's trying to solve a 2d rubix cube.The program is using a hash table as a memory of sorts where it saves different categories of information and it's runs on repeat. From run to run there are certain categories of information I'd like to free/remove instead of freeing the whole table at the end of each run (which is what I'm currently doing).
I've come up two ways and I'm unsure which to use. Either i basically make one array/stack for each of the categories where I save a pointer that i can later free. Or i make separate hash tables for all of the different categories and free each one at my discretion.
Are there other options? Some where i read about a pointer pool and I'm not sure what that might be. Any ides or helpful comments would be great!
Did you have more memory or time ?) If you use hash table (include separate) then you need use if for check all element in your hash table. It is very more time. I think best way create second struct when you save object after create. For free all table you need run on simple array without check and use memory set zero for flush your hash table. You need little bit more memory but more effective work.
Related
I'm a student and starting to relearn again the basics of programming.
The problem I stated above starts when I have read some Facebook posts that most of the programmers use arrays in their application and arrays are useful. And I started to realize that I never use arrays in my program.
I read some books but they only show the syntax of array and didn't discuss on when to apply them in creating real world applications. I tried to research this on the Internet but I cannot find any. Do you guys have circumstance when you use arrays. Can you please share it to me so I can have an idea.
Also, to clear my doubts can you please explain to me why arrays are good to store information because database can also store information. When is the right time for me to use database and arrays?
I hope to get a clear answer because I have one remaining semester before the internship and I want to clear my head on this. I do not include any specific programming language because I know most of the programming language have arrays.
I hope to get an answer that can I can easily understand.
When is the right time for me to use database and arrays?
I can see how databases and arrays may seem like competing solutions to the same problem, but you're comparing apples and oranges. Arrays are a way to represent structured data in memory. Databases are a tool to store data on disk until you need to retrieve it.
The question you pose is kind of like asking: "When is the right time to use an integer to hold a value, vs a piece of paper?" One of them is a structural representation in memory; the other is a storage tool.
Do you guys have circumstance when you use arrays
In most applications, databases and arrays work together. Applications often retrieve data from a database, and hold it in an array for easy processing. Here is a simple example:
Google allows you to receive an alert when something of interest is mentioned on the news. Let's call it the event. Many people can be interested in the event, so Google needs to keep a list of people to alert. How? Probably in a database.
When the event occurs, what does Google do? Well it needs to:
Retrieve the list of interested users from the DB and place it in an array
Loop through the array and send a notification to each user.
In this example, arrays work really well because users form a collection of similarly shaped data structures that needs to be put through a similar process. That's exactly what arrays are for!
Some other common uses of arrays
A bank wants to send invoice and payment due reminders at the end of the day. So it retrieves the users with past due payments from the DB, and loops through the users' array sending notifications.
An IT admin panel wants to check whether all critical websites in a list are still online. So it loops through the array of domains, pings each one and records the results in a log
An educational program wants to perform statistical functions on student test results. So it puts the results in an array to easily perform operations such as average, sum, standardDev...
Arrays are also awesome at keeping things in a predictable order. You can be certain that as you loop forward through an array, you encounter values in the order you put them in. If you're trying to simulate a checkout line at the store, the customers in a queue are a perfect candidate to represent in an array because:
They are similarly shaped data: each customer has a name, cart contents, wait time, and position in line
They will be put through a similar process: each customer needs methods for enter queue, request checkout, approve payment, reject payment, exit queue
Their order should be consistent: When your program executes next(), you should expect that the next customer in line will be the one at the register, not some customer from the back of the line.
Trying to store the checkout queue in a database doesn't make sense because we want to actively work with the queue while we run our simulation, so we need data in memory. The database can hold a historical record of all customers and their checkout outcomes, perhaps for another program to retrieve and use in another way (maybe build customized statistical reports)
There are two different points. Let's me try to explain the simple way:
Array: container objects to keep a fixed number of values. The array is stored in your memory. So it depends on your requirements but when you need a fixed and fast one, just use array.
Database: when you have a relational data or you would like to store it in somewhere and not really worry about the size of the objects. You can store 10, 100, 1000 records to you DB. It's also flexible and you can select/query/update the data flexible. Simple way to use is: have a relational data, large amount and would like to flexible it, use database.
Hope this help.
There are a number of ways to store data when you have multiple instances of the same type of data. (For example, say you want to keep information on all the people in your city. There would be some sort of object to hold the information on each person, and you might want to have a data structure that holds the information on every person.)
Java has two main ways to store multiple instances of data in memory: arrays and Collections.
Databases are something different. The difference between a database and an array or collection, as I see it, are:
databases are persistent, i.e. the data will stay around after your program has finished running;
databases can be shared between programs, often programs running in all different parts of the world;
databases can be extremely large, much, much larger than could fit in your computer's memory.
Arrays and collections, however, are intended only for use by one program as it runs. Your program may want to keep track of some information in order to do its calculations. But the data will be in your computer's memory, and therefore other programs on other computers won't be able to access it. And when your program is done running, the data is gone. However, since the data is in memory, it's much faster to use it than data in a database, which is stored on some sort of external device. (This is really an overgeneralization, and doesn't consider things like virtual memory and caching. But it's good enough for someone learning the basics.)
The Java run time gives you three basic kinds of collections: sets, lists, and maps. A set is an unordered collection of unique elements; you use that when the data doesn't belong in any particular order, and the main operations you want are to see if something is in the set, or return all the data in the set without caring about the order. A list is ordered, though; the data has a particular order, and provides operations like "get the Nth element" for some number N, and adding to the ends of the list or inserting in a particular place in the list. A map is unordered like a set, but it also attaches keys to the data, so that you can look for data by giving the key. (Again, this is an overgeneralization. Some sets do have order, like SortedSet. And I haven't even gotten into queues, trees, multisets, etc., some of which are in third-party libraries.)
Java provides a List type for ordered lists, but there are several ways to implement it. One is ArrayList. Like all lists, it provides the capability to get the Nth item in the list. But an ArrayList provides this capability faster; under the hood, it's able to go directly to the Nth item. Some other list implementations don't do that--they have to go through the first, second, etc., items, until they get to the Nth.
An array is similar to an ArrayList, but it has a different syntax. For an array x, you can get the Nth element by referring to x[n], while for an ArrayList you'd say x.get(n). As far as functionality goes, the biggest difference is that for an array, you have to know how big it is before you create it, while an ArrayList can grow. So you'd want to use an ArrayList if you don't know beforehand how big your list will be. If you do know, then an array is a little more efficient, I think. Although you can probably get by mostly with just ArrayList, arrays are still fundamental structures in every computer language. The implementation of ArrayList depends on arrays under the hood, for instance.
Think of an array as a book, and database as library. You can't share the book with others at the same time, but you can share a library. You can't put the entire library in one book, but you can checkout 1 book at a time.
I'm using the search.h library to define a hash table through the hcreate function.
How can I go through all the keys in that table? hsearch always expects an entry to search for (or store).
This is the documentation to all the three functions that manage the hash table ( hcreate, hsearch and hdestroy) but there's not mention of how to iterate through the structure to obtain all the stored keys.
When storing an entry in the table, I malloc the key value and so would like to have an easy way to free those malloc'd values.
Can I avoid having to store those in a separate structure such as an array?
I wouldn't expect hdestroy to be doing this automatically for me, as it has no way of knowing if key points to dynamically allocated or static memory (or indeed if I haven't already freed that memory).
Switching to a different hash search table library is not an option. I have to work with this. I'm on CentOS and use GCC 4.1.2.
There is no standard functionality of iterating through the entries of the hash table. This question is addressed here (in the hdestroy section):
It is important to remember that the elements contained in the hashing
table at the time hdestroy is called are not freed by this function.
It is the responsibility of the program code to free those strings (if
necessary at all). Freeing all the element memory is not possible
without extra, separately kept information since there is no function
to iterate through all available elements in the hashing table. If it
is really necessary to free a table and all elements the programmer
has to keep a list of all table elements and before calling hdestroy
s/he has to free all element’s data using this list. This is a very
unpleasant mechanism and it also shows that this kind of hashing
tables is mainly meant for tables which are created once and used
until the end of the program run.
Without looking at the actual source of the library, I would say there is no way walk the hash table after it has been created. You would be required to remember the pointers for your malloc'd memory in a separate structure.
Frankly, I don't think I'd touch that library with a ten foot pole. The API has numerous problems
Atrocious documentation
The library can only support a single hash table (note that hcreate does not return a handle which is then passed to hsearch or hdestroy)
The inability to walk the table, or retrieve the keys severely limits its uses.
Instead, depending on your platform (you don't say whether you are on Windows or a Unix-based OS), I'd take a good long look at glib which supports a rich set of data-structures (documentation home)
The docs for hash tables are here. That's for v2.42 of the library - they don't have a generic link for the "latest version".
glib is the core of GNOME (the Ubuntu UI) but you don't need to use any of the gmainloop or event pump related features.
In some (horrible 3rd party) code we're working with there is a dictionary lookup routine that scans through a table populated with "'name-string' -> function_pointer" pairs, basically copy-pasted from K&R Section 6.6.
I've had to extend this, and while reading the code was struck by the seemingly pointless inclusion of hashing routines that iterate through the source data structure and create a hash table.
Given that the source data structure is fixed at compile time (so will never be added to or changed when running), is there any point in having hashing routines in there?
I'm just having one of those moments when I can't tell if the author was doing something clever that I've missed, or was being lazy and not thinking (so far the latter has been the case more often than not).
Is there a reason to have a hash table for data that will never change?
Is there a reason to have a hash table for data that will never
change?
Probably the hash table code was already there and working fine, and the programmer just wanted to get the job done (e.g. looking up a function pointer from a string). If this function is not performance critical I see no reason to change it.
If you want to change it, then I suggest to take a look at perfect hash tables.
These are hash tables were the hash function is created from a fixed set of predefined keys. The good thing about them: They are often faster than a tree data-structure.
GPERF is a tool that does just this. It creates C-code from a set of strings: https://www.gnu.org/software/gperf/
This program I'm doing is about a social network, which means there are users and their profiles. The profiles structure is UserProfile.
Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside, there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever it's needed), a pointer to the next Edge and a pointer to the Vertex owner.
I have a 2 sample files with data to process (in CSV style) and insert into the Graph. The first one is the user data (one user per line); the second one is the user relations (for the graph). The first file is quickly inserted into the graph cause I always insert at the head and there's like ~18000 users. The second file takes ages but I still insert the edges at the head. The file has about ~520000 lines of user relations and takes between 13-15mins to insert into the Graph. I made a quick test and reading the data is pretty quickly, instantaneously really. The problem is in the insertion.
This problem exists because I have a Graph implemented with linked lists for the vertices. Every time I need to insert a relation, I need to lookup for 2 vertices, so I can link them together. This is the problem... Doing this for ~520000 relations, takes a while.
How should I solve this?
Solution 1) Some people recommended me to implement the Graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex and the insertion is probably going to drop considerably. But, I don't like the idea of allocating an array with [18000] elements. How practically is this? My sample data has ~18000, but what if I need much less or much more? The linked list approach has that flexibility, I can have whatever size I want as long as there's memory for it. But the array doesn't, how am I going to handle such situation? What are your suggestions?
Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.
Any thoughts about this solution?
Solution 2) This project also demands that I have some sort of data structures that allows quick lookup based on a name index and an ID index. For this I decided to use Hash Tables. My tables are implemented with separate chaining as collision resolution and when a load factor of 0.70 is reach, I normally recreate the table. I base the next table size on this Link.
Currently, both Hash Tables hold a pointer to the UserProfile instead of duplication the user profile itself. That would be stupid, changing data would require 3 changes and it's really dumb to do it that way. So I just save the pointer to the UserProfile. The same user profile pointer is also saved as value in each Graph Vertex.
So, I have 3 data structures, one Graph and two Hash Tables and every single one of them point to the same exact UserProfile. The Graph structure will serve the purpose of finding the shortest path and stuff like that while the Hash Tables serve as quick index by name and ID.
What I'm thinking to solve my Graph problem is to, instead of having the Hash Tables value point to the UserProfile, I point it to the corresponding Vertex. It's still a pointer, no more and no less space is used, I just change what I point to.
Like this, I can easily and quickly lookup for each Vertex I need and link them together. This will insert the ~520000 relations pretty quickly.
I thought of this solution because I already have the Hash Tables and I need to have them, then, why not take advantage of them for indexing the Graph vertices instead of the user profile? It's basically the same thing, I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.
But, do you see any cons on this second solution against the first one? Or only pros that overpower the pros and cons on the first solution?
Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to be wasting with this right now, I need to move on with this project, so, if I'm doing to do such a change, I need to understand exactly what to change and if that's really the way to go.
Hopefully no one fell asleep reading this and closed the browser, sorry for the big testament. But I really need to decide what to do about this and I really need to make a change.
P.S: When answering my proposed solutions, please enumerate them as I did so I know exactly what are you talking about and don't confuse my self more than I already am.
The first approach is the Since the main issue here is speed, I would prefer the array approach.
You should, of course, maintain the hash table for the name-index lookup.
If I understood correctly, you only process the data one time. So there is no dynamic data insertion.
To deal with the space allocation problem, I would recommend:
1 - Read once the file, to get the number of vertex.
2 - allocate that space
If you data is dynamic, you could implement some simple method to increment the array size in steps of 50%.
3 - In the Edges, substitute you linked list for an array. This array should be dynamically incremented with steps of 50%.
Even with the "extra" space allocated, when you increment the size with steps of 50%, the total size used by the array should only be marginally larger than with the size of the linked list.
I hope I could help.
I have a structure in C which resembles that of a database table record.
Now when I query the table using select, I do not know how many records I will get.
I want to store all the returned records from the select query in a array of my structure data type.
Which method is best?
Method 1: find array size and allocate
first get the count of records by doing select count(*) from table
allocate a static array
run select * from table and then store each records in my structure in a loop.
Method 2: use single linked list
while ( records returned )
{
create new node
store the record in node
}
Which implementation is best?
My requirement is that when I have all the records,
I will probably make copies of them or something.
But I do not need random access and I will not be doing any search of a particular record.
Thanks
And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or having a higher level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access, you just need to break your index into two parts.
A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.
I think the best solution is your "2".
Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord> in C++ and List<myrecord> in C#).
You forgot option 3, it's a little more complicated but it might be best for your particular case. This is the way it's typically done in C++ std::vector.
Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.
There are a good many possible critiques that should be made.
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.
Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)
while(record = get_record()) {
records++;
records_array = (record_struct *) realloc(records_array, (sizeof record_struct)*records);
*records_array[records - 1] = record;
}
This is strictly an example — please don't use realloc() in production.
The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.