Appending string much faster than appending character - arrays

I was working on the https://www.hackerrank.com/challenges/30-review-loop problem on HackerRank and ran into a timeout that I resolved in a roundabout way. I was hoping someone here could explain why one approach is faster than the other, or point me to documentation that explains it.
If you don't have an account, here's a description of the problem: you are given the number of test cases and then, for each test case, a string. Your code must build one string from the characters at the even indices and another from the characters at the odd indices. Example input:
2
Hacker
Rank
returns
Hce akr
Rn ak
Simple, right? Here's the code I wrote.
if let line = readLine(), numOftests = Int(line) {
    for iter in 0..<numOftests {
        var evenString = ""
        var oddString = ""
        var string = readLine()!
        var arrChars = [Character](string.characters) //1
        for idx in 0..<string.characters.count {
            if idx % 2 == 0 {
                oddString.append(arrChars[idx]) //1
                //oddString.append(string[string.startIndex.advancedBy(idx)]) //2 <= Times out
            }
            else {
                evenString.append(arrChars[idx]) //1
                //evenString.append(string[string.startIndex.advancedBy(idx)]) //2 <= Times out
            }
        }
        print("\(oddString) \(evenString)")
    }
}
Originally I used the commented-out code, which led to a timeout. To sum up, my problem is that subscripting a string is a lot slower than indexing an array of characters. It caught me by surprise, and if it weren't for the discussion group on HackerRank I wouldn't have found a solution. Now it bugs me because I don't know why this would make a difference.

The issue isn't the speed of appending a string vs. appending a character. The issue is how long it takes to locate the value you are appending.
Indexing an array is O(1) which means it happens in the same time whether you are accessing the first character or the 97th. It is efficient because Swift knows the size of the array elements, so it can just multiply the index by the size of the element to find the nth element.
string.startIndex.advancedBy(idx) is O(idx). It takes longer the further you go into the string: accessing the 97th character will take about 97 times as long as accessing the first character. Why? Because characters in Swift are not uniform in size. Swift strings are fully Unicode-compatible, and a "😀" takes more bytes to represent than "A". So it is necessary to look at every character from startIndex up to the one you are accessing.
That said, there is no reason to start at startIndex each time. If you kept the current index in a variable, you could advance it by 1 each time, which would make the string indexing version about the same speed as the character array indexing version.
var currentIndex = string.startIndex
for idx in 0..<string.characters.count {
    if idx % 2 == 0 {
        oddString.append(string[currentIndex])
    }
    else {
        evenString.append(string[currentIndex])
    }
    currentIndex = currentIndex.successor()
}
That said, I would probably write it like this:
for (idx, char) in string.characters.enumerate() {
    if idx % 2 == 0 {
        oddString.append(char)
    }
    else {
        evenString.append(char)
    }
}
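The same cost difference can be reproduced outside Swift. Below is a minimal C sketch that uses UTF-8 code points as a stand-in for Swift characters (an analogy only: Swift's Character is a grapheme cluster, and this is not how Swift is implemented). The names utf8_next, utf8_at and split_even_odd are made up for this example, and the input is assumed to be valid UTF-8. utf8_at has to walk from the start on every call, like startIndex.advancedBy(idx), while split_even_odd keeps a running cursor, like the currentIndex version above.

```c
#include <stddef.h>

/* Advance p past one UTF-8 encoded code point (assumes valid UTF-8). */
const char *utf8_next(const char *p) {
    if (*p == '\0') return p;
    p++;
    /* Continuation bytes have the bit pattern 10xxxxxx. */
    while (((unsigned char)*p & 0xC0) == 0x80) p++;
    return p;
}

/* O(n): walks from the start of the string on every call. */
const char *utf8_at(const char *s, size_t n) {
    while (n-- > 0 && *s != '\0') s = utf8_next(s);
    return s;
}

/* Splits s into the code points at even and odd positions using a running
   cursor, so each step costs O(1). The caller must supply buffers that are
   at least strlen(s) + 1 bytes each. */
void split_even_odd(const char *s, char *even, char *odd) {
    size_t idx = 0;
    while (*s != '\0') {
        const char *next = utf8_next(s);
        char **dst = (idx % 2 == 0) ? &even : &odd;
        while (s < next) *(*dst)++ = *s++;   /* copy every byte of this code point */
        idx++;
    }
    *even = '\0';
    *odd = '\0';
}
```

Calling utf8_at(s, idx) inside a loop over idx is quadratic overall, which is the same shape as the version that timed out; the cursor version touches each byte once.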

Related

Why does this simple array access not work in Swift?

var word = "morning"
var arr = Array(word)
for s in 0...word.count {
    print(arr[s])
}
This will not print. Of course, if I substitute a number for s, the code works fine.
Why will it not accept a variable in the array access braces? Is this peculiar to Swift?
I've spent a long time trying to figure this out, and it's nothing to do with s being optional.
Anyone understand this?
You are using the closed range operator ... instead of the half-open range operator ..<, so s goes from 0 to 7, not 0 to 6.
However, the indices of arr go from 0 to 6 because there are 7 characters.
Thus, when the program tries to access arr[7], it crashes with an index out of range error.
If you had run this in Xcode, the debugger would have told you that there is no arr[7].
As for the code, here is a better way to print every item in arr than using an index counter:
var word = "morning"
var arr = Array(word)
for s in arr {
    print(s)
}
This is called a "for-each" loop: it assigns each item in arr to s in turn, runs the loop body, and moves on to the next item.
When you have to access every element in an array or a collection, a for-each loop is generally considered the more elegant way to do so. If you also need the index of an item during the loop, you can keep the range-based for loop (which you are using) or iterate over arr.enumerated().
Happy coding!
When I run it, it prints the array then throws the error Fatal error: Index out of range. To fix this, change the for loop to:
for s in 0..<word.count {
    print(arr[s])
}
Try this.
When you use the length of the word as the size of the array, remember that array indices start at 0, so the last index must be equal to word.count - 1.
var word = "morning"
var arr = Array(word)
for s in 0...(word.count-1) {
    print(arr[s])
}
Basically avoid index based for loops as much as possible.
To print each character of a string simply use
var word = "morning"
for s in word { // in Swift 3 word.characters
    print(s)
}
To solve the index issue you have to use the half-open range operator ..< since indexes are zero-based.

Swift Dictionary of Arrays

I am making an app that has different game modes, and each game mode has a few scores. I am trying to store all the scores in a dictionary of arrays, where the dictionary's key is a game's id (a String), and the associated array has the list of scores for that game mode. But when I try to initialize the arrays' values to random values, Swift breaks, giving me the error below. This chunk of code will break in a playground. What am I doing wrong?
let modes = ["mode1", "mode2", "mode3"]
var dict = Dictionary<String, [Int]>()
for mode in modes
{
    dict[mode] = Array<Int>()
    for j in 1...5
    {
        dict[mode]?.append(j)
        let array:[Int] = dict[mode]!
        let value:Int = array[j] //breaks here
    }
}
ERROR:
Execution was interrupted, reason: EXC_BAD_INSTRUCTION(code=EXC_I386_INVOP, subcode=0x0).
Your problem is array subscripts are zero-based. So when you write:
var a: [Int] = []
for i in 1...5 {
    a.append(42)
    println(a[i])
}
you will get a runtime error, because first time around the loop you are subscripting a[1] when there is only an a[0]. In your code, you either need to do for j in 0..<5 or let value = array[j-1].
By the way, even though it’s perfectly safe to do dict[mode]! (since you just added it), it’s a habit best avoided as one of these days your code won’t be as correct as you think, and that ! will explode in your face. There’s almost always a better way to write what you want without needing !.
Also, generally speaking, whenever you use array subscripts you risk addressing an out-of-bounds index by accident, like here. There are lots of alternatives that make it easy to avoid a[i] altogether:
If you want the indices for a collection (like an array), instead of:
for i in 0..<a.count { }
you can write
for i in indices(a) { }
If you want to number the elements in an array, instead of
for i in indices(a) { println("item \(i) is \(a[i])") }
you can write
for (i, elem) in enumerate(a) { println("item \(i) is \(elem)") }
If the collection happens to have an Int for an index (such as Array), you can use i as an index, but if it doesn’t (such as String) an alternative to get the index and element is:
let s = "hello"
for (idx, char) in Zip2(indices(s),s) { }
If you want the first or last element of an array, instead of:
if a.count > 0 { let x = a[0] }
if a.count > 0 { let x = a[a.count - 1] }
you can write
if let first = a.first { let x = first }
if let last = a.last { let x = last }
Prefer map, filter and reduce to for loops in general (but don’t obsess over it, sometimes a for loop is better)
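For comparison, C has no optionals, but the idea behind a.first and a.last can be approximated with accessors that report success instead of indexing blindly. This is only a sketch; int_array_first and int_array_last are hypothetical helper names, not library functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Writes the first element to *out and returns true,
   or returns false (leaving *out untouched) if the array is empty. */
bool int_array_first(const int *a, size_t count, int *out) {
    if (count == 0) return false;
    *out = a[0];
    return true;
}

/* Same for the last element. Indices are zero-based,
   so the last valid index is count - 1, never count. */
bool int_array_last(const int *a, size_t count, int *out) {
    if (count == 0) return false;
    *out = a[count - 1];
    return true;
}
```

The caller writes if (int_array_last(a, n, &x)) { ... }, which mirrors if let last = a.last and keeps the bounds check in one place.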

Find longest suffix of string in given array

Given a string and an array of strings, find the longest suffix of the string that appears in the array.
For example:
string = google.com.tr
array = tr, nic.tr, gov.nic.tr, org.tr, com.tr
returns com.tr
I have tried to use binary search with specific comparator, but failed.
C-code would be welcome.
Edit:
I should have said that I'm looking for a solution where I can do as much work as possible in a preparation step (when I only have the array of suffixes, and I can sort it any way I like, build any data structure around it, etc.), and then, for a given string, find its suffix in this array as fast as possible. I also know that I can build a trie out of this array, and that this will probably give me the best possible performance, BUT I'm very lazy, and maintaining a trie in raw C in a huge piece of tangled enterprise code is no fun at all. So some binsearch-like approach would be very welcome.
Assuming constant-time addressing of characters within strings, this problem is isomorphic to finding the largest prefix.
1. Let i = 0.
2. Let S = null.
3. Let c = prefix[i].
4. Remove strings a from A if a[i] != c, and if A is empty return S. Replace S with a if a.Length == i + 1.
5. Increment i.
6. Go to step 3.
Is that what you're looking for?
Example:
prefix = rt.moc.elgoog
array = rt.moc, rt.org, rt.cin.vog, rt.cin, rt
Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.
Pass 1: prefix[1] is 't' and array[j][1] == 't' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.
Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.
Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.org, rt.cin.vog, and rt.cin so those strings are removed.
etc.
Another naïve, pseudo-answer.
Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.
Not very efficient....
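Since the question asks for C and for something binsearch-like, the segment-by-segment idea above can be combined with a sorted table: sort the suffixes once in the preparation step, then try each dot-separated tail of the input, longest first, with bsearch. This is a minimal sketch under the assumption that suffixes only ever match on label boundaries (after a '.'); prepare_suffixes and longest_suffix are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort/bsearch over an array of char pointers:
   both arguments point at array elements, i.e. at char pointers. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Preparation step: sort the suffix table once. */
void prepare_suffixes(const char **suffixes, size_t n) {
    qsort(suffixes, n, sizeof suffixes[0], cmp_str);
}

/* Returns the longest entry of the sorted table that equals a
   dot-separated tail of host, or NULL if none matches. */
const char *longest_suffix(const char *host, const char **suffixes, size_t n) {
    const char *tail = host;
    while (tail != NULL && *tail != '\0') {
        const char **hit = bsearch(&tail, suffixes, n, sizeof suffixes[0], cmp_str);
        if (hit != NULL) return *hit;   /* tails are tried longest first */
        tail = strchr(tail, '.');       /* move to the next label boundary */
        if (tail != NULL) tail++;       /* skip the '.' itself */
    }
    return NULL;
}
```

For google.com.tr this tries google.com.tr, then com.tr, then tr, so a lookup costs one binary search per label rather than one pass over the whole array per label.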
Naive, pseudo-answer:
Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
Iterate over array and see if suffix is in given string
If it is, exit the loop because you are done! If not, continue.
Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.
Edit 0:
Maybe you could improve this by looking at your array beforehand and considering "minimal" elements that need to be checked.
For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.
Edit 1:
On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...
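The "skip the sorting and just iterate" variant described above might look like this in C. It is a rough sketch: ends_with and longest_suffix_scan are made-up names, and the check is a plain string-suffix test, so it does not care about '.' boundaries (xcom.tr would also match com.tr).

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if suffix is a suffix of s. */
int ends_with(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    return lx <= ls && strcmp(s + (ls - lx), suffix) == 0;
}

/* Linear scan: remember the longest candidate that matched so far.
   Returns NULL if no candidate is a suffix of s. */
const char *longest_suffix_scan(const char *s, const char *const *candidates, size_t n) {
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(candidates[i]);
        /* Only do the comparison if this candidate could beat the current best. */
        if (len > best_len && ends_with(s, candidates[i])) {
            best = candidates[i];
            best_len = len;
        }
    }
    return best;
}
```

This needs no preparation step at all, at the price of visiting every candidate on each lookup.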
If your array of strings is something along the following:
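char string[STRINGS][MAX_STRING_LENGTH];
strcpy(string[0], "google.com.tr");  /* needs <string.h> */
strcpy(string[1], "nic.tr");
etc, then you can simply do this:
/* Find the index of the longest string. */
int x, longest = 0;
for (x = 1; x < STRINGS; x++) {
    if (strlen(string[x]) > strlen(string[longest])) {
        longest = x;
    }
}
/* Skip to the first '.' in that string (or to its end if there is none). */
x = 0;
while (string[longest][x] != '\0' && string[longest][x] != '.') {
    x++;
}
if (string[longest][x] == '.') {
    x++;  /* step over the '.' itself */
}
/* Copy everything after the '.' into output. */
char output[MAX_STRING_LENGTH];
int y = 0;
while (string[longest][x] != '\0') {
    output[y++] = string[longest][x++];
}
output[y] = '\0';
(The above code may not actually work (errors, etc.), but you should get the general idea.)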
Why don't you use suffix arrays? They work well when you have a large number of suffixes.
Complexity: O(n (log n)^2); there are O(n log n) versions too.
Implementation in C here. You can also try googling suffix arrays.

C - Returning the most repeated/occurring string in an array of char pointers

I have almost completed the code for this problem, which I shall state as under:
Given:
Array of length 'n' (say n = 10000) declared as below,
char **records = malloc(10000*sizeof(*records));
Each records[i] is a char pointer and points to a non-empty string.
records[i] = malloc(11);
The strings are of fixed length (10 chars + '\0').
Requirement:
Return the most frequently occurring string in the above array.
But now, I am interested in obtaining a slightly less brutal algorithm than the primitive one which I have currently, which is to sift through the entire array in two for loops :(, storing strings encountered by the two loops in a temporary array of similar size ('n' - in case all are unique strings) for comparison with the next strings. The inner loop iterates from 'outer loop position + 1' to 'n'. At the same time, I have an integer array, of similar size - 'n', for counting repeat occurrences, with each i th element corresponding to the i th (unique) string in the comparison array. Then find the largest integer and use its index in the comparison array to return the most frequently occurring string.
I hope I am clear enough. I am quite ashamed of the algo myself, but it had to be done. I am sure there is a much smarter way to do this in C.
Have a great Sunday,
Cheers!
Without being good at nice algorithms (Google, Wikipedia and Stackoverflow are good enough for me), one solution that comes out at the top of my head is to sort the array, then use a single loop to go through the entries. As long as the current string is the same as the previous, increase a counter for that string. When done you have a "list" of strings and their occurrence, which can then be sorted if needed.
In most languages, the usual approach would be to construct a hashtable, mapping strings to counts. This has O(N) complexity.
For example, in Python (although usually you would use collections.Counter for this, and even this code can be made more concise using more specialised Python knowledge, but I've made it explicit for demonstration).
def most_common(strings):
    counts = {}
    for s in strings:
        if s not in counts:
            counts[s] = 0
        counts[s] += 1
    return max(counts, key=counts.get)
But in C, you don't have a hashtable in the standard library (although in C++ you can use std::unordered_map), so a sort and scan can be done instead. It's O(N log N) complexity, which is worse than optimal, but quite practical.
Here's some C (actually C99) code that implements this.
#include <stdlib.h>
#include <string.h>

// qsort passes pointers to the array elements, i.e. pointers to char pointers.
int compare_strings(const void*s0, const void*s1) {
    return strcmp(*(const char *const *)s0, *(const char *const *)s1);
}
const char *most_common(const char **records, size_t n) {
    qsort(records, n, sizeof(records[0]), compare_strings);
    const char *best = 0; // The most common string found so far.
    size_t max = 0; // The longest run found.
    size_t run = 0; // The length of the current run.
    for (size_t i = 0; i < n; i++) {
        // records[i - run] is the first string of the current run.
        if (!strcmp(records[i], records[i - run])) {
            run += 1;
        } else {
            run = 1;
        }
        if (run > max) {
            best = records[i];
            max = run;
        }
    }
    return best;
}
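If you would rather have the hashtable approach in C after all, a small hand-rolled table is enough for this problem. The following is a sketch under stated assumptions, not a general-purpose implementation: it uses open addressing with linear probing and an FNV-1a hash (both my choice for the example), and most_common_hashed, slot_t and hash_str are hypothetical names. Unlike the sort-based version it does not reorder records.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash of a NUL-terminated string. */
static uint64_t hash_str(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

typedef struct {
    const char *key;   /* NULL marks an empty slot */
    size_t count;
} slot_t;

/* Returns the most frequent string, or NULL if n == 0 or allocation fails. */
const char *most_common_hashed(const char *const *records, size_t n) {
    if (n == 0) return NULL;
    size_t cap = 1;
    while (cap < 2 * n) cap *= 2;      /* power of two, at most half full */
    slot_t *table = malloc(cap * sizeof *table);
    if (table == NULL) return NULL;
    for (size_t i = 0; i < cap; i++) {
        table[i].key = NULL;
        table[i].count = 0;
    }

    const char *best = NULL;
    size_t best_count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = (size_t)(hash_str(records[i]) & (uint64_t)(cap - 1));
        /* Linear probing: stop at an empty slot or at the matching key. */
        while (table[j].key != NULL && strcmp(table[j].key, records[i]) != 0) {
            j = (j + 1) & (cap - 1);
        }
        if (table[j].key == NULL) table[j].key = records[i];
        table[j].count++;
        if (table[j].count > best_count) {
            best_count = table[j].count;
            best = table[j].key;       /* points into records, not into table */
        }
    }
    free(table);
    return best;
}
```

Expected running time is O(N) string operations, at the cost of a temporary table with at least 2N slots.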

How do I remove duplicate strings from an array in C?

I have an array of strings in C and an integer indicating how many strings are in the array.
char *strarray[MAX];
int strcount;
In this array, the highest index (where 10 is higher than 0) is the most recent item added and the lowest index is the most distant item added. The order of items within the array matters.
I need a quick way to check the array for duplicates, remove all but the highest index duplicate, and collapse the array.
For example:
strarray[0] = "Line 1";
strarray[1] = "Line 2";
strarray[2] = "Line 3";
strarray[3] = "Line 2";
strarray[4] = "Line 4";
would become:
strarray[0] = "Line 1";
strarray[1] = "Line 3";
strarray[2] = "Line 2";
strarray[3] = "Line 4";
Index 1 of the original array was removed and indexes 2, 3, and 4 slid downwards to fill the gap.
I have one idea of how to do it. It is untested and I am currently attempting to code it, but even from my faint understanding I am sure this is a horrendous algorithm.
The algorithm presented below would be run every time a new string is added to strarray.
For the interest of showing that I am trying, I will include my proposed algorithm below:
1. Search entire strarray for match to str
2. If no match, do nothing
3. If match found, put str in strarray
4. Now we have a strarray with a max of 1 duplicate entry
5. Add highest index strarray string to lowest index of temporary string array
6. Continue downwards into strarray and check each element
7. If duplicate found, skip it
8. If not, add it to the next highest index of the temporary string array
9. Reverse temporary string array and copy to strarray
Once again, this is untested (I am currently implementing it now). I just hope someone out there will have a much better solution.
The order of items is important and the code must utilize the C language (not C++). The lowest index duplicates should be removed and the single highest index kept.
Thank you!
The typical efficient unique function is to:
Sort the given array.
Collapse each run of consecutive equal items so that only one remains.
I believe you can use qsort in combination with strcmp to accomplish the first part; writing an efficient remove would be all on you though.
Unfortunately I don't have specific ideas here; this is kind of a grey area for me because I'm usually using C++, where this would be a simple:
std::vector<std::string> src;
std::sort(src.begin(), src.end());
src.erase(std::unique(src.begin(), src.end()), src.end());
I know you can't use C++, but the implementation should essentially be the same.
Because you need to save the original order, you can have something like:
typedef struct
{
    int originalPosition;
    char * string;
} tempUniqueEntry;
Do your first sort with respect to string, remove the duplicates from the sorted set, then re-sort with respect to originalPosition. This way you still get O(n lg n) performance, yet you don't lose the original order.
EDIT2:
Simple C implementation example of std::unique:
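tempUniqueEntry* unique ( tempUniqueEntry * first, tempUniqueEntry * last )
{
    /* An empty range has nothing to collapse. */
    if (first == last) return last;
    tempUniqueEntry *result=first;
    while (++first != last)
    {
        /* Keep *first only if it differs from the last element kept. */
        if (strcmp(result->string,first->string))
            *(++result)=*first;
    }
    return ++result;
}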
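Putting the three steps together for the question's requirement (keep the highest-index copy) might look like the sketch below. It is one possible arrangement, not the only one: the comparator breaks ties between equal strings by descending original position, so the copy that survives the unique pass is the most recent one. entry_t, by_string, by_position and dedupe_keep_last are names invented for this example, and the dropped strings are not freed.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int pos;        /* index in the original array */
    char *string;
} entry_t;

/* Order by string; among equal strings, highest original position first. */
static int by_string(const void *a, const void *b) {
    const entry_t *x = a, *y = b;
    int c = strcmp(x->string, y->string);
    if (c != 0) return c;
    return (y->pos > x->pos) - (y->pos < x->pos);
}

/* Order by original position, ascending. */
static int by_position(const void *a, const void *b) {
    const entry_t *x = a, *y = b;
    return (x->pos > y->pos) - (x->pos < y->pos);
}

/* Removes duplicates in place, keeping the highest-index copy of each string
   and the relative order of the survivors.
   Returns the new count, or -1 if allocation fails. */
int dedupe_keep_last(char *strarray[], int strcount) {
    if (strcount <= 1) return strcount;
    entry_t *e = malloc((size_t)strcount * sizeof *e);
    if (e == NULL) return -1;
    for (int i = 0; i < strcount; i++) {
        e[i].pos = i;
        e[i].string = strarray[i];
    }
    qsort(e, (size_t)strcount, sizeof *e, by_string);

    /* Unique pass: keep the first entry of each run of equal strings. */
    int kept = 1;
    for (int i = 1; i < strcount; i++) {
        if (strcmp(e[kept - 1].string, e[i].string) != 0) e[kept++] = e[i];
    }

    qsort(e, (size_t)kept, sizeof *e, by_position);
    for (int i = 0; i < kept; i++) strarray[i] = e[i].string;
    free(e);
    return kept;
}
```

On the question's example (Line 1, Line 2, Line 3, Line 2, Line 4) this leaves Line 1, Line 3, Line 2, Line 4 in the first four slots.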
I don't quite understand your proposed algorithm (I don't understand what it means to add a string to an index in step 5), but what I would do is:
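/* n is the number of strings in strarray (strcount in the question). */
unsigned int i;
for (i = n; i > 0; i--)
{
    unsigned int j;
    if (strarray[i - 1] == NULL)
    {
        continue;
    }
    for (j = i - 1; j > 0; j--)
    {
        /* Skip entries already cleared by an earlier pass; strcmp must not see NULL. */
        if (strarray[j - 1] != NULL && strcmp(strarray[i - 1], strarray[j - 1]) == 0)
        {
            strarray[j - 1] = NULL;
        }
    }
}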
Then you just need to filter the null pointers out of your array (which I'll leave as an exercise).
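For completeness, the filtering step can be done in a single pass with a read index and a write index. A minimal sketch, with compact_nulls as a hypothetical name:

```c
#include <stddef.h>

/* Slides the non-NULL entries down over the NULL ones, preserving their order.
   Returns the new number of strings; the vacated tail slots are set to NULL. */
int compact_nulls(char *strarray[], int strcount) {
    int write = 0;
    for (int read = 0; read < strcount; read++) {
        if (strarray[read] != NULL) {
            strarray[write++] = strarray[read];
        }
    }
    for (int i = write; i < strcount; i++) {
        strarray[i] = NULL;
    }
    return write;
}
```

After the loop above has marked the duplicates, strcount = compact_nulls(strarray, strcount); collapses the array.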
A different approach would be to iterate backwards over the array and to insert each item into a (balanced) binary search tree as you go. If the item is already in the binary search tree, flag the array item (such as setting the array element to NULL) and move on. When you've processed the entire array, filter out the flagged elements as before. This would have slightly more overhead and would consume more space, but its running time would be O(n log n) instead of O(n^2).
Can you control the input as it is going into the array? If so, just do something like this:
#include <stdlib.h>
#include <string.h>

int addToArray(const char * toadd, char * strarray[], int strcount)
{
    const size_t toaddlen = strlen(toadd);
    // Add new string to end.
    // Remember to add one for the \0 terminator.
    strarray[strcount] = malloc(sizeof(char) * (toaddlen + 1));
    strncpy(strarray[strcount], toadd, toaddlen + 1);
    // Search for a duplicate.
    // Note that we are cutting the new array short by one.
    for(int i = 0; i < strcount; ++i)
    {
        if (strncmp(strarray[i], toadd, toaddlen + 1) == 0)
        {
            // Found duplicate.
            // Remove it and compact.
            // Note use of new array size here.
            free(strarray[i]);
            for(int k = i + 1; k < strcount + 1; ++k)
                strarray[k - 1] = strarray[k];
            strarray[strcount] = NULL;
            return strcount;
        }
    }
    // No duplicate found.
    return (strcount + 1);
}
You can always use the above function looping over the elements of an existing array, building a new array without duplicates.
PS: If you are doing this type of operation a lot, you should move away from an array as your storage structure and use a linked list instead. Linked lists are much more efficient for removing elements from a location other than the end.
Sort the array with qsort (run man 3 qsort in the terminal to see how it should be used) and then use strcmp to compare adjacent strings and find duplicates.
If you want to maintain the original order, you could use an O(N^2) algorithm with two nested for loops: the outer loop picks an element, and the inner loop scans the rest of the array to check whether the chosen element is a duplicate.

Resources