Cheers, I am trying to find an algorithm/data structure I can use to rank elements by their frequency.
For example, let's say I am given 5 names and I want to rank them based on their frequency. I am given the names consecutively, and every insertion and query I perform MUST be in O(log(n)) time, where n is the number of given names.
For example let's say I am given:
"foo"
"bar"
"bar"
"pop"
"foo"
"bar"
Then, ranked by frequency, the 1st should be "bar" (3 times), the 2nd "foo" (2 times), and the 3rd "pop" (1 time). Keep in mind that when two or more elements have the same frequency (and therefore the same ranking), whichever one I return is correct.
I have tried using a Map (hash) to keep the frequency with which the strings are given (for example, given "foo" I can return 3, but NOT the rank), and I have also thought of using a Set (backed by an AVL tree) to arrange them by their frequency, but again I can't turn that into a ranking data structure in logarithmic time. Any ideas?
Return rating by name.
You can do both insert and query in constant time, O(1). For this you need two structures: a hash-map and what I call a doubly-linked list of buckets.
The hash-map contains pairs: a name and a pointer to the list item/bucket holding that name's statistics.
Each bucket of the doubly-linked list stores two numbers: the number of names that point to lower buckets (Rating) and the repetition count of the names in it (RepCount).
Initialization:
Create the first bucket, put all names into the hash-map, and initialize their pointers with the address of the first bucket. Create another bucket with RepCount = INFINITY and Rating = #names.
OPERATIONS:
Insert name. Find the address of the corresponding bucket Target and check whether a bucket OneMore with OneMore.RepCount == Target.RepCount + 1 exists. If it exists, do --OneMore.Rating; if not, create one with RepCount = Target.RepCount + 1 and Rating = NextToTarget.Rating - 1. Observe that NextToTarget always exists thanks to the initialization. Repoint the hash-map entry to OneMore.
Query rating. Extract the appropriate pointer from the hash-map and read Target.Rating.
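Here is a rough sketch of this bucket-list idea in Python (the class names are just for illustration, and, as above, it assumes the full set of names is known at initialization; rating(name) returns how many names are strictly less frequent):

class Bucket:
    def __init__(self, rep, rating):
        self.rep = rep        # RepCount: how many times each name in this bucket was seen
        self.rating = rating  # Rating: number of names sitting in strictly lower buckets
        self.prev = self.next = None

class FrequencyRanker:
    def __init__(self, names):
        names = set(names)
        self.head = Bucket(0, 0)                          # every name starts unseen
        self.tail = Bucket(float('inf'), len(names))      # sentinel: RepCount = INFINITY, Rating = #names
        self.head.next, self.tail.prev = self.tail, self.head
        self.where = {name: self.head for name in names}  # hash-map: name -> bucket

    def insert(self, name):                               # O(1)
        target = self.where[name]
        nxt = target.next                                 # NextToTarget always exists (sentinel)
        if nxt.rep == target.rep + 1:                     # bucket OneMore already exists
            nxt.rating -= 1
            one_more = nxt
        else:                                             # create OneMore between target and nxt
            one_more = Bucket(target.rep + 1, nxt.rating - 1)
            one_more.prev, one_more.next = target, nxt
            target.next = nxt.prev = one_more
        self.where[name] = one_more

    def rating(self, name):                               # O(1): names strictly less frequent than `name`
        return self.where[name].rating

ranker = FrequencyRanker(["foo", "bar", "pop"])
for n in ["foo", "bar", "bar", "pop", "foo", "bar"]:
    ranker.insert(n)
print(ranker.rating("bar"), ranker.rating("foo"), ranker.rating("pop"))  # prints 2 1 0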
Return name by rating (and rating by name)
You need two hash-maps and a doubly-linked list. In the hash-map names store name => pointer-to-name-in-list; in the hash-map ratings store, for each rating, pointers to the first and last name with this rating in the list: rating => (first, last). In the list store (name, rating) pairs in the order described below.
Initialization:
Insert all names into the list. Insert a single entry (0, (list.head, list.tail)) into the ratings hash-map.
OPERATIONS:
Insert name. Recover the name's list node using names. Using ratings, find where node.rating's block finishes and move the node next to it, increasing its rating by one. Compare the new rating with the next node's rating to see whether you need to update an existing rating or create a new one in ratings. Remove the old ratings entry if it is now empty, or update it if the node was its first or last.
Query name. Use ratings[..].first, or return null if it does not exist.
Query rating. Return names[..].rating.
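A simplified sketch of the two-hash-map part of this idea in Python (it drops the doubly-linked list, so you lose the ability to walk names in rating order, but both lookups stay O(1); NameRatings is just an illustrative name):

from collections import defaultdict

class NameRatings:
    def __init__(self):
        self.names = {}                    # name -> rating (number of times inserted)
        self.ratings = defaultdict(set)    # rating -> names currently at that rating

    def insert(self, name):                # O(1)
        old = self.names.get(name, 0)
        if old:
            self.ratings[old].discard(name)
            if not self.ratings[old]:
                del self.ratings[old]      # drop the entry once the old rating is empty
        self.names[name] = old + 1
        self.ratings[old + 1].add(name)

    def name_by_rating(self, rating):      # any name with this rating, or None
        bucket = self.ratings.get(rating)
        return next(iter(bucket)) if bucket else None

    def rating_by_name(self, name):
        return self.names.get(name)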
Related
I have a doubly linked list with a descriptor, where the information in each node is a soccer team (also a structured type, with a name, origin, and an identification number). I have to separate the teams into two groups randomly, keeping an equal number in each (except when the total is odd).
Select one team at random and assign it to group 1. Then pick another team at random and assign to group 2.
Repeat until all teams have been assigned to a group.
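One way to realize that alternating random assignment is a sketch like this (in Python; a single shuffle is equivalent to repeatedly picking a random remaining team):

import random

def split_into_two_groups(teams):
    shuffled = list(teams)
    random.shuffle(shuffled)                  # random order = repeated random picks
    group1, group2 = [], []
    for i, team in enumerate(shuffled):
        (group1 if i % 2 == 0 else group2).append(team)   # alternate group 1, group 2
    return group1, group2                     # sizes differ by at most one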
You can iterate over your list, compute a hash value for each item (which should be almost random), and sort each item into a group based on the hash value modulo the number of groups you want (here, 2). It seems to me that could do the job.
So I have a CUSTOM PriorityQueue (implemented using a max-heap) containing songs. Each song is an object that has a number of likes, the name of the song, and a unique ID. The priority of each song inside the PriorityQueue is defined by the number of likes it has; if two songs have the same number of likes, the names of the songs are compared.
I am asked to create a "remove" function that takes as an argument the ID of the song and removes it from the PriorityQueue. The problem is that I have to search for the song that has this particular ID in order to remove it, which happens in O(n) since it is a linear search. I am supposed to make the remove function O(log n), but from what I have read I understand it is impossible to search an unsorted array in less than O(n)... can someone offer any ideas?
(the hint that we have is that we may have to add some things inside the getMax function (getMax returns the root of the PQ which contains the most popular song) and the insert function).
I am new to Swift, have seen lots of tutorials, but it's not clear: what's the main difference between the Array, Set and Dictionary collection types?
Here are the practical differences between the different types:
Arrays are effectively ordered lists and are used to store lists of information in cases where order is important.
For example, posts in a social network app being displayed in a tableView may be stored in an array.
Sets are different in the sense that order does not matter; they are used in cases where order is not important.
Sets are especially useful when you need to ensure that an item only appears once in the set.
Dictionaries are used to store key, value pairs and are used when you want to easily find a value using a key, just like in a dictionary.
For example, you could store a list of items and links to more information about these items in a dictionary.
Hope this helps :)
(For more information and to find Apple's own definitions, check out Apple's guides at https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/CollectionTypes.html)
Detailed documentation can be found in Apple's guide here. Below are some quick definitions extracted from there:
Array
An array stores values of the same type in an ordered list. The same value can appear in an array multiple times at different positions.
Set
A set stores distinct values of the same type in a collection with no defined ordering. You can use a set instead of an array when the order of items is not important, or when you need to ensure that an item only appears once.
Dictionary
A dictionary stores associations between keys of the same type and values of the same type in a collection with no defined ordering. Each value is associated with a unique key, which acts as an identifier for that value within the dictionary. Unlike items in an array, items in a dictionary do not have a specified order. You use a dictionary when you need to look up values based on their identifier, in much the same way that a real-world dictionary is used to look up the definition for a particular word.
Old thread, yet it is worth talking about performance.
With N elements inside an array or a dictionary, it is worth considering the performance when you access elements or add or remove objects.
Arrays
Accessing a random element costs the same as accessing the first or last: elements follow each other sequentially, so they are addressed directly. It costs you 1 cycle.
Inserting an element is costly. Appending to the end costs 1 cycle if you have enough room in the array; otherwise the whole array is copied, which costs N cycles. This is why it is important to reserve enough space for the array at the beginning of the operation. Inserting anywhere else means the remainder has to be shifted, which can cost you as much as N cycles in the worst case (N/2 cycles on average).
Deleting from the end costs 1 cycle. Deleting from the beginning or the middle requires shifting the remainder; on average it costs N/2 cycles.
Finding an element with a given property costs N/2 cycles on average.
So be very cautious with huge arrays.
Dictionaries
While dictionaries are unordered, they can bring you some benefits here. As keys are hashed and stored in a hash table, any given operation costs 1 cycle. The only exception is finding an element with a given property other than the key: that is a linear search and costs around N/2 cycles. With clever design, however, you can use property values as dictionary keys, so the lookup costs only 1 cycle no matter how many elements are inside.
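For illustration, a small sketch of that "key by the property you look things up by" idea (shown here in Python; the same applies to a Swift Dictionary, and Song / songs_by_id are just illustrative names):

class Song:
    def __init__(self, song_id, name):
        self.song_id = song_id
        self.name = name

songs = [Song(1, "First"), Song(2, "Second"), Song(3, "Third")]

# Linear scan: cost grows with the number of songs (about N/2 on average).
match = next((s for s in songs if s.song_id == 2), None)

# Keyed by the property we look up by: the lookup is a single hash access.
songs_by_id = {s.song_id: s for s in songs}
match = songs_by_id.get(2)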
Swift Collections - Array, Dictionary, Set
Every collection is dynamic, which is why it has some extra steps for expanding and collapsing: an Array has to allocate more memory and copy the old data into the new storage, and a Dictionary additionally has to recalculate the bucket index for every object inside.
Big O notation describes the performance of a function.
Array - ArrayList - a dynamic array of objects. It is based on a plain array. It is used for tasks where you very often need access by index.
get by index - O(1)
find element - O(n) - in the worst case you are looking for the last element
insert/delete - O(n) - every time the tail of the array has to be shifted/copied
Dictionary - HashTable, HashMap - stores key/value pairs. It contains buckets/baskets (an array structure, accessed by index), each of which contains another structure (array list, linked list, tree). Collisions are solved by separate chaining. The main idea is:
calculate the key's hash code (Hashable), and from this hash code calculate the index of the bucket (for example by using modulo).
Since the Hashable function returns an Int, it cannot guarantee that two different objects will have different hash codes. Moreover, the number of buckets is not Int.max. When two different objects have the same hash code, or two objects with different hash codes end up in the same bucket, it is a collision. That is why, once we know the index of the bucket, we have to check whether anything there is equal to our key, and Equatable comes to the rescue. If two objects are equal, the key/value object is replaced; otherwise a new key/value object is added.
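A tiny illustration of the bucket-index step (in Python; num_buckets is an arbitrary assumption):

num_buckets = 8
for key in ["foo", "bar", "pop"]:
    # hash code -> bucket index via modulo; two keys sharing an index is a collision
    print(key, hash(key) % num_buckets)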
find element - O(1) to O(n)
insert/delete - O(1) to O(n)
O(n) - in the case where the hash code is the same for every object, so that there is only one bucket. That is why the hash function should distribute the elements evenly.
As you can see, a HashMap does not support access by index, but in other respects it has better performance.
Set - HashSet - based on a HashTable without values.
*You can also implement something like Java's TreeMap/TreeSet, which is a sorted structure but with O(log(n)) complexity to access an element.
I am using VBScript to find a quarterback's passer rating. This rating is based on a single season. I would like to find a QB's overall career passer rating by:
Running the program to get the single-season passer rating and recording the result from the given input.
Running the program again based on a subsequent or previous season's stats.
Taking the collective passer ratings and finding the average of all seasons.
My logic is that I would place each season's passer rating into an array.
Dim arrPasserRating
arrPasserRating = Array(Passer1, Passer2, ....)
This array would have no upper limit so I assume I would have to do something like this:
ReDim Preserve arrPasserRating(UBound(arrPasserRating) + 1)
arrPasserRating(UBound(arrPasserRating)) = "..."
where "..." is the next passer rating I calculated.
Now I would have the problem of a naming scheme, and of creating a For ... Next loop in order to create a new array value named after the previous item in the array plus the number of the index it occupies in the array.
For instance passerRating would stay passerRating, the next inputted passerRating would be passerRating1, etc.
Does anyone have a starting point for me or a solution to this problem?
I have a big array / list of 1 million IDs, and I need to find the first free ID that can be used. It can be assumed that there are a couple of modules which refer to this data structure and take an ID (at which point it shall be marked as used) and then return it later (at which point it shall be marked as free).
I want to know what different data structures can be used, and what algorithm I can use to do this efficiently in time and in space (considered separately).
Please excuse me if this has already been asked here; I did search before posting.
One initial idea that might work would be to store a priority queue of all the unused IDs, sorted so that low IDs are dequeued before high IDs. Using a standard binary heap, this would make it possible to return an ID to the unused ID pool in O(log n) time and to find the next free ID in O(log n) time as well. This has the disadvantage that it requires you to explicitly store all of the IDs, which could be space-inefficient if there are a huge number of IDs.
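A rough sketch of this heap-of-unused-IDs idea in Python, using the standard heapq module (FreeIDPool is just an illustrative name):

import heapq

class FreeIDPool:
    def __init__(self, num_ids):
        self.free = list(range(num_ids))   # ascending list is already a valid min-heap

    def acquire(self):                     # lowest free ID in O(log n), or None if exhausted
        return heapq.heappop(self.free) if self.free else None

    def release(self, id_):               # return an ID to the unused pool, O(log n)
        heapq.heappush(self.free, id_)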
One potential space-saving optimization would be to try to coalesce consecutive ID values into ID ranges. For example, if you have free IDs 1, 3, 4, 5, 6, 8, 9, 10, and 12, you could just store the ranges 1, 3-6, 8-10, and 12. This would require you to change the underlying data structure a bit. Rather than using a binary heap, you could use a balanced binary search tree which stores the ranges. Since these ranges won't overlap, you can compare ranges as less than, equal to, or greater than other ranges. Since BSTs are stored in sorted order, you can find the first free ID by taking the minimum element of the tree (in O(log n) time) and looking at the low end of its range. You would then update the range to exclude that first element, which might require you to remove a now-empty range from the tree. When returning an ID to the pool of unused IDs, you could do a predecessor and successor search to determine the ranges that come immediately before and after the ID. If either one can be extended to include that ID, you just extend the range (you might need to merge two ranges as well). This also takes only O(log n) time.
Hope this helps!
A naive but efficient method would be to store all your ids in a stack.
Getting an id is a constant-time operation: pop the top item off the stack.
When the task is over just push the id on the stack.
If the lowest free id must be returned (and not any free id) you can use a min heap with insertion and pop lowest in O(log N).
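For concreteness, a rough sketch of the stack version in Python (a plain list used as a stack); the min-heap version looks like the heapq sketch further up:

free_ids = list(range(1_000_000))   # all IDs start out free
an_id = free_ids.pop()              # get an ID in O(1)
# ... use the ID ...
free_ids.append(an_id)              # push it back when the task is over, O(1)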
Try using a linked list (a linked list of IDs). Link up all the IDs and have the head point to the first free one (let's say at init all are free). Whenever an ID is marked as used, remove it and place it at the end of the list, making the head point to the next free ID. That way your list stays structured "free first, used after", and you can get a free ID in O(1). Also, when an ID is marked as free, put it back as the first member of the linked list (since it has become free, it is usable again), i.e. make the head point to it. Hope this helps!
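A simplified sketch of this free-list idea in Python (it keeps only the chain of free IDs rather than physically moving used IDs to the back, but the head-based O(1) take/release is the same; FreeList and next_free are just illustrative names):

class FreeList:
    def __init__(self, num_ids):
        # next_free[i] = the ID that follows i in the free chain; head = first free ID
        self.next_free = list(range(1, num_ids)) + [None]
        self.head = 0 if num_ids else None

    def acquire(self):                 # take a free ID in O(1)
        id_ = self.head
        if id_ is not None:
            self.head = self.next_free[id_]
        return id_

    def release(self, id_):            # the freed ID becomes the new head, O(1)
        self.next_free[id_] = self.head
        self.head = id_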
Preamble: a binary heap does indeed seem to be the best answer. I'll present an alternative here that may have advantages in some scenarios.
One possible way is to use a Fenwick Tree. You can store either 0 or 1 in each position, indicating whether that position is already used or not, and you can find the first empty position with a binary search (find the first prefix [1..n] whose sum is n-1). The complexity of this operation is O(log^2 n), which is worse than a binary heap, but this approach has other advantages:
You can implement a Fenwick Tree in less than 10 lines of code
You can now calculate the density (number of used / total ids) of a range in O(log n)
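A minimal sketch of this Fenwick-tree approach in Python (positions are 1-based, 1 = used; first_free does the O(log^2 n) binary search described above):

class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def _add(self, i, delta):          # point update, O(log n)
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def mark_used(self, i):
        self._add(i, 1)

    def mark_free(self, i):
        self._add(i, -1)

    def prefix(self, i):               # number of used positions in [1..i], O(log n)
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def first_free(fen):
    # Smallest i with prefix(i) < i, i.e. the first position still free (None if all used).
    lo, hi, answer = 1, fen.n, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fen.prefix(mid) < mid:      # a free position exists at or before mid
            answer, hi = mid, mid - 1
        else:
            lo = mid + 1
    return answer

The density of a range [a, b] is then (fen.prefix(b) - fen.prefix(a - 1)) / (b - a + 1), also in O(log n).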
If you do not strictly need the lowest ID, you can allocate IDs to modules in batches of 1000. When IDs are freed, they can be added to the back of the list, and once in a while you would sort the list to make sure that, again, the IDs you hand out are from the low end.
Well, an array probably isn't the best structure. A hash would be better, speed-wise at least. As for the structure for each "node", all I can see you need is just the ID and whether it is being used or not.