Algorithm to find specific char/letter in array of strings? - arrays

So I'm trying to come up with an algorithm to find words with a specific char/letter in an array of strings.
If I want words with vowel e, I would get word apple and hello from below set.
{apple, bird, hello}
To find words with a specific character, will I need to go through every word in the array and look at every character of each word?
Is there a clever way maybe by sorting the list and then searching somehow?
Also, what would be the running time of this algorithm?
Will it be considered as O(n) or O(n*m)?
Where n is the number of words in the dictionary and m is the length of each word in the array.

To find the words that contain a specific character, you have to read every character at least once, because any character you skip could be the one you are looking for. So you must visit each character of each word once, giving a runtime of O(n*m), where n is the number of words and m is the average word length. So yes, you need to look at each character of each word.
Now if you're going to be doing lots of queries for different letters, you can do a single pass over all the words and map each character to the words that contain it, i.e. apple goes into the a, p, l and e sets. You would then have 26 sets, each holding all the words with that character ('a' : [apple], 'b' : [bird], 'c' : [], ... 'l' : [apple, hello], ...). As the number of queries grows relative to the size of your word set, you end up with an amortized lookup time of O(1) per query, though you still pay the O(n*m) initialization cost.
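The precomputed mapping described above could be sketched as follows. This is a minimal illustration, not the answerer's code: the names LetterIndex, buildLetterIndex and wordsWith are made up for the example, and it assumes the words contain only lowercase ASCII letters.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical type: slot k holds every word containing the letter 'a' + k.
using LetterIndex = std::array<std::vector<std::string>, 26>;

// One O(n*m) pass over all words builds the 26 buckets.
// Assumption: only 'a'..'z' are indexed; any other character is skipped.
LetterIndex buildLetterIndex(const std::vector<std::string>& words) {
    LetterIndex index;
    for (const std::string& word : words) {
        std::array<bool, 26> seen{};  // avoids listing a word twice for a repeated letter
        for (char c : word) {
            if (c < 'a' || c > 'z') continue;
            int slot = c - 'a';
            if (!seen[slot]) {
                seen[slot] = true;
                index[slot].push_back(word);
            }
        }
    }
    return index;
}

// Locating the bucket is O(1). Assumption: `letter` is in 'a'..'z'.
const std::vector<std::string>& wordsWith(const LetterIndex& index, char letter) {
    return index[letter - 'a'];
}
```

With the set {apple, bird, hello}, wordsWith(index, 'e') would hold apple and hello. The index only pays off when many queries are made against the same word list; for a single query the plain scan is just as fast.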

Related

How to replace a specific character in an array with two characters

So I just came back from a job interview, and one of the questions I was asked was:
"Given an array of characters and three characters for example :
Array : [a,b,c,z,s,w,y,z,o]
Char 1: 'z'
Char 2 : 'R'
Char 3 : 'R'
Your goal is to replace each 'z' in the array with two 'R' characters, within O(N) time complexity.
so your input will be Array : [a,b,c,z,s,w,y,z,o]
and your output array will be : [a,b,c,R,R,s,w,y,R,R,o]
assume that there is no 'R' in the array before.
You are not allowed to use other arrays or other variables.
The algorithm should be in-line algorithm.
Your final array must be a characters array."
My solution had O(N^2) time complexity, but there is a solution with O(N) time complexity.
The interview is over but I am still thinking about this problem. Can anyone help me solve it?
First scan the input to count how many occurrences of char 1 exist. This has a linear time complexity.
From that you know that the length of the final array will be the input length + the number of occurrences.
Then extend the array to its new length, leaving the new slots empty (or whatever value). The exact nature of the operation depends on how the array data structure is implemented. This can surely be done with at worst a linear time complexity.
Use two indexes, i and j, where i references the last character of the input array and j references the very last index in the array (potentially to an empty slot).
Start copying from i to j, decreasing both indices by one after each copy. When the character at i is the matching letter (char 1), do not copy it: write char 3 at j and char 2 at j - 1 instead, so j is reduced by two and i by one. This again has a linear time complexity.
The algorithm will end with both i and j equal to -1.
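A minimal C++ sketch of this backward two-index pass, under some assumptions: the array is a std::vector<char>, the function name expandInPlace is made up for the example, and growing the vector with resize counts as "extending the array" (it may reallocate internally, which a strict reading of the interview rules might not allow).

```cpp
#include <cstddef>
#include <vector>

// Replaces every `target` with the pair (first, second), working back to front.
void expandInPlace(std::vector<char>& data, char target, char first, char second) {
    // Pass 1: count the occurrences so the final length is known.
    std::size_t extra = 0;
    for (char c : data) {
        if (c == target) ++extra;
    }

    std::size_t i = data.size();        // one past the last original character
    data.resize(data.size() + extra);   // new slots at the end, contents irrelevant
    std::size_t j = data.size();        // one past the last slot of the result

    // Pass 2: copy from the back. j never drops below i, so no unread
    // character is overwritten.
    while (i > 0) {
        char c = data[--i];
        if (c == target) {
            data[--j] = second;
            data[--j] = first;
        } else {
            data[--j] = c;
        }
    }
}
```

Here i and j are kept as "one past" positions, so the loop ends with both equal to 0 rather than -1, which sidesteps unsigned underflow.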
Do two iterations.
First, count the number of char1s ('z' in your example).
Now you know how long your array should be at the end: array.size() + num_char1s
Then, go from last to first with input and output iterators. If the element is char1, write the two new chars at the output iterator; otherwise, just copy the element.
Pseudo code:
num_char1s = 0
for x in array:
    if x == char1:
        num_char1s++
// Assuming array has sufficient memory already allocated.
out_iterator = num_char1s + size - 1
in_iterator = size - 1
while (in_iterator >= 0):
    if (array[in_iterator] == char1):
        array[out_iterator--] = char3
        array[out_iterator--] = char2
    else:
        array[out_iterator--] = array[in_iterator]
    in_iterator--
In your question, two constraints are very important:
you can't use a new variable
you can't use a new array
So we have to work within the given array.
First, double the size of the given array. Why? Because the output is at most given_array_size*2 long (when every character equals char 1).
Next, shift the contents of the array n positions to the right, where n = given_array_size.
Then iterate over the shifted input, from i = n to 2*n-1.
Keep a write index j, starting at 0. If array[i] is char 1, set
array[j++]=char 2 and array[j++]=char 3.
If the character is not 'z', simply copy it: array[j++]=array[i]
At the end, positions 0 to j-1 hold the answer.
Complexity: O(n)
No new array is needed, only the two indices i and j.

What is the time complexity of this solution for the longest common prefix string among an array of strings?

This is the problem statement -
Write a function to find the longest common prefix string amongst an array of strings.
I'm not sure about the time complexity of my solution below. I'm guessing it's O(n*m*nlogn): nlogn for the sort, and n*m for iterating through the array and making m comparisons on the strings.
string longestCommonPrefix(vector<string>& strs) {
    if(strs.size() == 0) return "";
    sort(strs.begin(), strs.end());
    string prefix = strs[0];
    int length = prefix.length();
    for (int i = 0; i < strs.size(); ++i) {
        if (prefix == strs[i].substr(0, length)) continue;
        if (prefix != strs[i].substr(0, length)) {
            prefix = prefix.substr(0, --length);
            length = prefix.length();
            i = 0;
        }
        if (prefix == "") break;
    }
    return prefix;
}
Assuming that everything after the sort is O(nm), then it would be O(nlogn + nm) (since you are doing the sort just once), which might be able to be simplified further based on the relationship between m and logn.
When you’re working with algorithms on strings, there are (usually) two separate quantities that we care about. First, there’s the number of strings, which we’ll denote n. Then, there’s the length of the strings. Let’s have the longest string’s length be m.
The reason this comes up is that we have to remember that m term throughout the algorithm. For example, the cost of the sort is not O(n log n), but rather O(mn log n), because while there are O(n log n) comparisons made, each comparison between strings might require scanning all the characters of those strings and thus we need to multiply in a factor of m to account for the comparisons.
The next part of your algorithm scans across the strings from left to right, comparing the longest viable prefix match found so far against each string until a match is detected. The cost of extracting the substring in this step is O(m), since we might need to grab a full substring. However, in the event that a mismatch is found, we’ll grab a shorter substring and try this again. In the worst case, we’ll first try a prefix of length m, then m-1, then m-2, etc, and that means we may have to do
m + (m-1) + (m-2) + ... + 1 = Θ(m^2)
work. Though, if that happens, we’d then do less work on all future strings because the prefix length would be shorter. The worst case here, I think, is one where we have n-1 copies of the same string of length m followed by one last string also of length m that has only the empty string as a prefix. In that case, we’d do Θ(m) work for all the first n-1 strings plus Θ(m^2) work for the last, for a net total of O(mn + m^2) work in the worst case.
Overall, that gives you a time complexity of O(mn log n + m^2), which isn’t great.
There are faster alternatives you could consider here. For example, imagine throwing all the strings into a trie data structure. If you did this, their longest common prefix would be found by walking the trie until you found a branching node. Building the trie takes time O(mn), since you need to read each character of the input strings, for a net runtime of O(mn). This could be improved even more by realizing that you don’t actually need to store the whole trie - as soon as you find a branching node, you can discard everything else. You could simulate this by just tracking the longest match found so far, at each step comparing what you have against that matched string and truncating it if a mismatch is found. In fact, doing that starting with the shortest string first will take time O(nl), where l is the length of the shortest string, which is about as good as things can get here!
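The "track the longest match so far and truncate on mismatch" idea could look like the sketch below. It is an illustration, not the answerer's code: the name longestCommonPrefixLinear is made up, and it uses the first string as the reference instead of first locating the shortest one, so its bound is O(mn) in general.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Keeps only the length of the prefix matched so far; no sorting, no substr per step.
std::string longestCommonPrefixLinear(const std::vector<std::string>& strs) {
    if (strs.empty()) return "";
    const std::string& reference = strs[0];
    std::size_t length = reference.size();  // current candidate prefix length
    for (const std::string& s : strs) {
        length = std::min(length, s.size());
        // Compare character by character and truncate at the first mismatch.
        std::size_t k = 0;
        while (k < length && s[k] == reference[k]) ++k;
        length = k;
        if (length == 0) break;  // nothing in common, no need to look further
    }
    return reference.substr(0, length);
}
```

Each string is scanned at most up to the current prefix length, and the prefix never grows, so no string is revisited after a truncation.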
Hope this helps!
Old question, but I'll answer since no one has answered correctly.
Your solution uses a sort, which is O(nlogn), and the loop traverses the entire list, albeit in a confusing manner (why set i=0 when you find that the prefix doesn't fit as a whole? Every previous word has already been confirmed to have the entire prefix up to this point, so there is no need to re-check them after shortening the prefix), with each iteration taking a substring of length m (where m is the length of the minimum string in "strs[]"). Since substring is linear in its length, your solution is O(nlogn + nm), which is not good.
You can easily improve upon your solution.
(Hint)
Question: why are you sorting the strs[]?
Answer: to find the minimum string, which you then compare to every other string.
So if you only need the minimum string, it can be found in linear time (O(n)); sorting is unnecessary. This would reduce your algorithm's time complexity from O(nlogn + nm) to O(n + nm) --> O(nm), with O(1) space complexity.
edit: @templatetypedef left a comment, correctly pointing out that O(nlogn+nm) does not simplify to O(nlogn). This was an oversight of mine; it wasn't the root of my answer, so I didn't think much about it when writing the post. I've corrected the post to better serve people learning big-O in the future.

algorithm on sorting array of pointers to strings in O(n)

This is my algorithms assignment, and I do not know how to proceed.
Given an array A of m strings, where different strings may have different numbers of
characters, but the total number of characters over all the strings in the array is n. Show
how to sort the strings in O(n) time. Note that the desired order here is the standard
alphabetical order; for example, a < ab < b.
More technically speaking, A is an array of pointers each pointing to a string (which is
another array of characters); you can think about how strings are used in C. Also, we
assume that each character can be viewed as an integer ranging from 0 to 255.
Since this is an assignment I won't provide a complete answer, merely some ideas on how to proceed.
Since the strings can be any length you need to use an O(n) sorting algorithm.
One such algorithm is bucket sort.
So how do we arrange the buckets for variable length strings?
Create 256 buckets for the first character.
Let each bucket have a counter + a set of 256 buckets for the second character and so on.
Important note: Don't create any bucket set until you need it, or the memory consumption will explode. Let an empty bucket set be NULL.
Once the bucket system is set up, how do we sort a word into it?
Let's say we have the word Yes.
First character is Y so we go to the top level bucket set. The set is NULL so we create the top level and select bucket 'Y'.
Next character is e. The bucket set under Y is NULL so we create the set and select bucket 'e'.
Next character is s. The bucket set under Ye is NULL so we create the set and select bucket 's'.
The string ends. Increase the count for the current bucket Y->e->s.
Note that the task will be simpler if you use unsigned char, because then you can use the value directly as an index into an array of length 256.
The bucket struct could look like this:
typedef struct bucket {
    int count;
    struct bucket *next; // points to NULL or array of 256 buckets.
} bucket;
Time Complexity:
The maximum amount of work for each character is:
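end-of-string check + NULL check + (allocation and initialization of an array of 256 buckets (I would use calloc for this), or incrementing one bucket count) + incrementing the loop variable. Each of these is bounded by a constant, so the total work is proportional to the number of characters.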
Memory Usage
Here comes the disadvantage of bucket sort. It uses a lot of memory if you need many buckets, and this solution will use quite a number.
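To make the bucket scheme concrete, here is a rough C++ sketch of the insert and read-back steps. Several things are assumptions for the example: it uses std::unique_ptr instead of calloc, the names Bucket, insertWord, collectSorted and bucketSort are made up, and it returns copies of the strings rather than rearranging an array of pointers as the assignment asks.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Bucket {
    int count = 0;                   // how many strings end exactly at this bucket
    std::unique_ptr<Bucket[]> next;  // null, or an array of 256 child buckets
};

// Walks one level per character, creating a bucket set only when it is needed.
void insertWord(Bucket& root, const std::string& s) {
    Bucket* node = &root;
    for (unsigned char c : s) {      // unsigned char indexes 0..255 directly
        if (!node->next) node->next = std::make_unique<Bucket[]>(256);
        node = &node->next[c];
    }
    ++node->count;
}

// Depth-first walk in character order; a string ending at a bucket comes out
// before any longer string passing through it, so a < ab < b.
void collectSorted(const Bucket& node, std::string& prefix, std::vector<std::string>& out) {
    for (int k = 0; k < node.count; ++k) out.push_back(prefix);
    if (!node.next) return;
    for (int c = 0; c < 256; ++c) {
        prefix.push_back(static_cast<char>(c));
        collectSorted(node.next[c], prefix, out);
        prefix.pop_back();
    }
}

std::vector<std::string> bucketSort(const std::vector<std::string>& input) {
    Bucket root;
    for (const std::string& s : input) insertWord(root, s);
    std::vector<std::string> out;
    std::string prefix;
    collectSorted(root, prefix, out);
    return out;
}
```

The read-back visits 256 slots for every bucket set that was allocated, and at most one set is allocated per input character, so the whole thing stays linear in n, though with a large constant factor and the memory cost noted above.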
You may think of a string as a number in base 256. So, if the distribution is uniform, bucket sort would give linear time; radix sort is also linear, but you need some preparation before the sort (to transform each string into a number).

Word ranking efficiency

I am not sure how to solve this problem within the constraints.
Consider a "word" as any sequence of capital letters A-Z (not limited to just "dictionary words"). For any word with at least two different letters, there are other words composed of the same letters but in a different order (for instance, STATIONARILY/ANTIROYALIST, which happen to both be dictionary words; for our purposes "AAIILNORSTTY" is also a "word" composed of the same letters as these two). We can then assign a number to every word, based on where it falls in an alphabetically sorted list of all words made up of the same set of letters. One way to do this would be to generate the entire list of words and find the desired one, but this would be slow if the word is long. Write a program which takes a word as a command line argument and prints to standard output its number. Do not use the method above of generating the entire list. Your program should be able to accept any word 25 letters or less in length (possibly with some letters repeated), and should use no more than 1 GB of memory and take no more than 500 milliseconds to run. Any answer we check will fit in a 64-bit integer.
Sample words, with their rank:
ABAB = 2
AAAB = 1
BAAA = 4
QUESTION = 24572
BOOKKEEPER = 10743
examples:
AAAB - 1
AABA - 2
ABAA - 3
BAAA - 4
AABB - 1
ABAB - 2
ABBA - 3
BAAB - 4
BABA - 5
BBAA - 6
I thought about using a binary search for a word and all the possible words built from the characters (1 - permutation(word)) but I think that would take too long. O(logN) might be too slow.
I found this solution but I am a bit confused and need a bit of help understanding it:
Consider the n-letter word { x1, x2, ... , xn }. My solution is based on the idea that the word number will be the sum of two quantities:
The number of combinations starting with letters lower in the alphabet than x1, and
how far we are into the arrangements that start with x1.
The trick is that the second quantity happens to be the word number of the word { x2, ... , xn }. This suggests a recursive implementation.
Getting the first quantity is a little complicated:
Let uniqLowers = { u1, u2, ... , um } = all the unique letters lower than x1
For each uj, count the number of permutations starting with uj.
Add all those up.
The solution says that the answer is the sum of two numbers. Look at the following picture describing the words that can be made from the word QUESTION:
EINOQSTU (first word lexicographically, rank 1)
...
...
... (last word before Q, rank A)
QEINOSTU
....
....
QUESTION (our given word, rank x)
...
The phrase "how far we are into the arrangements that start with x1" is the quantity (x-A); call it B. The thing is, B is exactly equal to the word rank of "UESTION", which is our original word with the first letter cut off. This is asking the same question but with a subset of our input, suggesting a recursive solution.
It then remains to find A, which is the number of words beginning with a letter that comes before Q. So A = number of words beginning with one of {E, I, N, O}.
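The same decomposition can be written as a loop instead of a recursion: at each position, add the number of arrangements that start with a smaller remaining letter, then drop the current letter and continue. The sketch below is only an illustration of that idea. The names arrangements and wordRank are made up, the input is assumed to consist of capital letters A-Z, and it assumes every intermediate arrangement count fits in 64 bits, which is not guaranteed for a 25-letter word with many distinct letters.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Number of distinct arrangements of a multiset of letters, built up as a
// product of binomial coefficients so that each division is exact.
std::uint64_t arrangements(const std::array<int, 26>& counts) {
    std::uint64_t result = 1;
    int placed = 0;
    for (int c : counts) {
        for (int i = 1; i <= c; ++i) {
            ++placed;
            result = result * placed / i;
        }
    }
    return result;
}

// 1-based rank of `word` among all distinct arrangements of its letters.
std::uint64_t wordRank(const std::string& word) {
    std::array<int, 26> counts{};
    for (char ch : word) ++counts[ch - 'A'];

    std::uint64_t rank = 1;
    for (char ch : word) {
        int x = ch - 'A';
        // The "A" part: arrangements starting with each unique smaller letter.
        for (int u = 0; u < x; ++u) {
            if (counts[u] == 0) continue;
            --counts[u];
            rank += arrangements(counts);
            ++counts[u];
        }
        // The "B" part: continue with the word minus its first letter.
        --counts[x];
    }
    return rank;
}
```

For QUESTION the first step adds 4 * 7! = 20160 for the words starting with E, I, N or O, and the remaining steps handle UESTION in the same way.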

Finding number of occurences of an integer in a particular range

Given an unsorted array of n integers and an integer 'X'. We are also given a range (low,high).
How can we find the number of occurrences of 'X' in the range (low,high)? Can we do it with a segment tree?
As the array is unsorted you need to iterate through the whole array at least once. Therefore there is no better solution than to iterate through the whole array and just count the occurrences of 'X'. The solution requires O(n) time and O(1) memory.
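As a small sketch of the linear scan, assuming that (low,high) means a pair of inclusive 0-based indices into the array (the question does not say so explicitly) and using a made-up function name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts how often x occurs in values[low..high].
// Assumption: low <= high < values.size(); no bounds checking is done.
std::ptrdiff_t countInRange(const std::vector<int>& values,
                            std::size_t low, std::size_t high, int x) {
    return std::count(values.begin() + low, values.begin() + high + 1, x);
}
```

This touches each element in the range once and uses no extra memory.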
Regards,
Damjan

Resources