Compare C char arrays of varying lengths - c

I want to create a function that will determine if a C character array (not a string, no null terminator) is less than another. For my uses, if one array ends and the other doesn't, you keep testing the last character of the array that ended against the rest of the other array. Capitals don't matter, so I'm trying to use tolower().
Examples.
Aaah should be less than America
Watermelon should be less than wade
Watermelon should NOT be less than water.
Hello should be less than HelloWorld
How can I write a function to do this, given the size of both arrays?

Go through both arrays starting at index 0 and compare the characters at each index. If the characters are the same, go to the next index, and once they're not the same then you'll know which array is less than the other.
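The index-by-index walk described above can be sketched as follows. This is only a sketch: the name array_less is hypothetical, and it assumes plain dictionary order where a shorter array that is a prefix of the longer one counts as less. It does not implement the question's "keep testing the last character" rule. Under that assumption it reproduces the Aaah, water and Hello examples from the question, but not the wade one.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a[0..alen) sorts before b[0..blen), ignoring case.
   Assumption: if one array is a prefix of the other, the shorter is "less". */
bool array_less(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++) {
        /* cast to unsigned char: tolower() is undefined for negative values */
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb)
            return ca < cb;      /* the first difference decides */
    }
    return alen < blen;          /* all shared positions were equal */
}
```

Since the arrays are not null-terminated, both lengths are passed explicitly, for example array_less("Hello", 5, "HelloWorld", 10).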

Related

How to compare two strings and return the number of words that are the same?

I am writing a program in C and have not found an efficient way to make this comparison. If someone could help me I would be very grateful.
EXAMPLE:
W1: Big house with white walls
W2: house walls
return: 2
When thinking how to solve the problem, it helps to break it down into steps that you can see clearly how to proceed, and how they progress toward the solution. The usual approach for something like this would be
make a copy of each string
for each string, make an array of char* pointers, which
point into the copied string, which is in turn
parsed into "words" (putting '\0' at all of the non-word characters)
run qsort on the array of strings
Then, having two sorted arrays of pointers to words, you can write a loop using strcmp for checking equality of the words. I suggested strcmp because (since the arrays are sorted) the sign of its result tells you when a word is missing from one array or the other, and therefore which array to advance.
The copy/parse/sort part would naturally be a function, given a string and returning the array of pointers. The caller should free that (and the chopped-up string to which it points).
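A minimal sketch of the copy/parse/sort function and the strcmp loop might look like this. The names split_sorted and common_words are hypothetical, and the sketch assumes that a "word" is a run of alphanumeric characters, that comparison is case-sensitive, and that a word repeated in both strings is counted once per matching pair.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char* */
static int cmp_words(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Copies s, overwrites every non-alphanumeric byte with '\0', and returns a
   sorted array of pointers into the copy. The caller frees both *copy and
   the returned array. Returns NULL on allocation failure. */
static char **split_sorted(const char *s, char **copy, size_t *count)
{
    size_t len = strlen(s);
    int in_word = 0;
    char **words = malloc((len / 2 + 1) * sizeof *words); /* upper bound on words */
    *count = 0;
    *copy = malloc(len + 1);
    if (!*copy || !words) {
        free(*copy); free(words);
        *copy = NULL;
        return NULL;
    }
    memcpy(*copy, s, len + 1);
    for (size_t i = 0; i < len; i++) {
        if (isalnum((unsigned char)(*copy)[i])) {
            if (!in_word) words[(*count)++] = *copy + i; /* start of a word */
            in_word = 1;
        } else {
            (*copy)[i] = '\0';                           /* chop here */
            in_word = 0;
        }
    }
    qsort(words, *count, sizeof *words, cmp_words);
    return words;
}

/* Counts words present in both strings by walking the two sorted arrays. */
size_t common_words(const char *s1, const char *s2)
{
    char *c1, *c2;
    size_t n1, n2, i = 0, j = 0, matches = 0;
    char **w1 = split_sorted(s1, &c1, &n1);
    char **w2 = split_sorted(s2, &c2, &n2);
    if (w1 && w2) {
        while (i < n1 && j < n2) {
            int c = strcmp(w1[i], w2[j]);
            if (c < 0) i++;            /* w1[i] is missing from w2 */
            else if (c > 0) j++;       /* w2[j] is missing from w1 */
            else { matches++; i++; j++; }
        }
    }
    free(w1); free(c1); free(w2); free(c2);
    return matches;
}
```

With the example from the question, common_words("Big house with white walls", "house walls") counts "house" and "walls".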

Finding the number of occurrences of each character in a String or character array

I am going over some interview preparation material and I was wondering what the best way to solve this problem would be if the characters in the String or array can be Unicode characters. If they were strictly ASCII, you could make an int array of size 256 and map each ASCII character to an index, and that position in the array would hold the number of occurrences. If the string has Unicode characters, is it still possible to do so, i.e. does a Unicode character have a reasonable size so that you could represent it using the indexes of an integer array? Since Unicode characters can be more than 1 byte in size, what data type would you use to represent them? What would be the most optimal solution for this case?
Since Unicode only defines code points in the range [0, 2^21), you only need an array of 2^21 (i.e. about 2 million) elements, which should fit comfortably into memory.
An array wouldn't be practical when using Unicode. This is because Unicode defines (less than) 2^21 characters.
Instead, consider using two parallel vectors, one for the character and one for the count. The setup would look something like this:
<'c', '$', 'F', '¿', '¤'> //unicode characters
< 1 , 3 , 1 , 9 , 4 > //number of times each character has appeared.
EDIT
After seeing Kerrek's answer, I must admit, an array of size 2 million would be reasonable. The amount of memory it would take up would be in the Megabyte range.
But as it's for an interview, I wouldn't recommend having an array 2 million elements long, especially if many of those slots will be unused (not all Unicode characters will appear, most likely). They're probably looking for something a little more elegant.
SECOND EDIT
As per the comments here, Kerrek's answer does indeed seem to be more efficient as well as easier to code.
While others here are focusing on data structures, you should also know that the notion of "Unicode character" is somewhat ill-defined. That's a potential interview trap. Consider: are å and å the same character? The first one is a "latin small letter a with ring above" (codepoint U+00E5). The second one is a "latin small letter a" (codepoint U+0061) followed by a "combining ring above" (U+030A). Depending on the purpose of the count, you might need to consider these as the same character.
You might want to look into Unicode normalization forms. It's great fun.
Convert string to UTF-32.
Sort the 32-bit characters.
Getting character counts is now trivial.
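The sort-then-count steps above could be sketched like this, assuming the text has already been decoded into an array of 32-bit code points (the UTF-32 conversion itself is not shown). The function name count_code_points is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for 32-bit code points */
static int cmp_cp(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts cps in place, prints each distinct code point with its count,
   and returns the number of distinct code points. */
size_t count_code_points(uint32_t *cps, size_t n)
{
    size_t distinct = 0;
    qsort(cps, n, sizeof *cps, cmp_cp);
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && cps[j] == cps[i]) j++;   /* equal values are adjacent */
        printf("U+%04lX: %zu\n", (unsigned long)cps[i], j - i);
        distinct++;
        i = j;
    }
    return distinct;
}
```

This needs no table sized to the whole code space, at the cost of an O(n log n) sort. It counts code points, so the precomposed and combining forms mentioned above would still be counted separately unless the input is normalized first.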

Find the first non repetitive character in a string

I was asked this question in an interview. My answer: "create an array of size 26. Traverse the string character by character and keep a count for each character in the array. Then find the first non-repetitive character by traversing the string again and checking in the array whether each character is repeated." What if the string draws on a large set of characters, say 10,000 distinct characters instead of the 26 letters of the alphabet?
You can implement your original algorithm using less memory. Since you don't care about how many times the character repeated above 1, you only need 2 bits per character in the alphabet. When incrementing a value that is already above 1, just leave the value alone. The rest of your algorithm remains unchanged.
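As a rough sketch of the 2-bit idea, assuming an alphabet of 256 byte values (the ALPHABET constant and the function names are hypothetical), the counters can be packed four to a byte and saturated at 2:

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET 256   /* assumption: one symbol per byte */

/* 2-bit counters packed four per byte: 0 = unseen, 1 = seen once, 2 = repeated */
static unsigned get2(const unsigned char *t, unsigned c)
{
    return (t[c / 4] >> (c % 4 * 2)) & 3u;
}

static void set2(unsigned char *t, unsigned c, unsigned v)
{
    unsigned shift = c % 4 * 2;
    t[c / 4] = (unsigned char)((t[c / 4] & ~(3u << shift)) | (v << shift));
}

/* Returns the index of the first non-repeated byte in s[0..n), or -1 if none. */
long first_unique_2bit(const char *s, size_t n)
{
    unsigned char table[ALPHABET / 4];
    memset(table, 0, sizeof table);
    for (size_t i = 0; i < n; i++) {
        unsigned c = (unsigned char)s[i];
        unsigned v = get2(table, c);
        if (v < 2) set2(table, c, v + 1);   /* leave the value alone once it is 2 */
    }
    for (size_t i = 0; i < n; i++)          /* second pass, as in the original answer */
        if (get2(table, (unsigned char)s[i]) == 1) return (long)i;
    return -1;
}
```

The table is 64 bytes here instead of 256 ints. For a 10,000-symbol alphabet the same layout would need 2,500 bytes.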
If memory is not a restriction, there is a faster algorithm that doesn't require another pass over the string. Allow each letter in the alphabet to be represented by a ListNode. Then have two lists, list1 and list2, that start out as empty. list1 contains letters that have only occurred once, and list2 contains letters that have occurred more than once. For each letter in the input string, get the corresponding ListNode, say node. If node is not in either list, put it at the end of list1. If node is already in list1, take it out, and put it in list2. After the input string is processed, if list1 is empty, there are no non-repeating characters. Otherwise, the character that corresponds with the first node in list1 is the first non-repeating character.
Follow this link to IDEONE for an implementation of the list based algorithm.
You gave the brute-force answer. A clever solution keeps an array of the size of the alphabet, we'll call it x, with each item initially -1. Then pass through the string once; if the current character has an x-value of -1, change it to the index of the current character, otherwise if the current character has an x-value greater than or equal to 0, change it to -2. At the end of the string, examine each location in the x array, keeping track of the smallest non-negative value and the associated character. I implement that algorithm in Scheme at my blog.
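In C, the index-array approach might look like the sketch below. It assumes a 256-symbol byte alphabet and zero-based string indices, so index 0 is a valid first position. The function name first_unique_one_pass is hypothetical.

```c
#include <stddef.h>

#define ALPHABET 256   /* assumption: one symbol per byte */

/* Returns the index of the first non-repeated byte in s[0..n), or -1 if none.
   x[c] is -1 if c is unseen, -2 if c is repeated, else the index where c occurred. */
long first_unique_one_pass(const char *s, size_t n)
{
    long x[ALPHABET];
    long best = -1;
    for (int c = 0; c < ALPHABET; c++) x[c] = -1;
    for (size_t i = 0; i < n; i++) {        /* single pass over the string */
        unsigned char c = (unsigned char)s[i];
        if (x[c] == -1) x[c] = (long)i;     /* first sighting: remember where */
        else if (x[c] >= 0) x[c] = -2;      /* second sighting: mark as repeated */
    }
    for (int c = 0; c < ALPHABET; c++)      /* scan the alphabet, not the string */
        if (x[c] >= 0 && (best < 0 || x[c] < best)) best = x[c];
    return best;
}
```

The final scan costs time proportional to the alphabet size rather than the string length, which pays off when the string is much longer than the alphabet.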
You could use a tree-like data structure instead of an array.
If the strings are not too long, loop through the string character by character and check whether you can find each character again. Drawback: n^2 runtime in the worst case.
Build a hash of each character in the first pass and then check each bucket separately. This should be combined with the method above, and I'm not sure whether it reduces runtime significantly; it depends on your real-world data.
Start reading the string character-by-character
Put each character in HashMap
Return the first character which has conflict
Pros:
You don't need to create a BitMap/Bit-Array in advance
Cons:
The HashMap can grow to as much as number of characters in string, if it does not encounter repeating character (or if there is no repeating character)

iterating string in C, word by word

I just started learning C. What I am trying to do right now is this: I have two strings in which the words are separated by white space, and I have to return the number of matching words in both strings. So, is there any function in C with which I can take each word and compare it to every other word in another string? If not, any idea on how I can do that?
Break up the first string into words. You can do this in any number of ways, from looping through the character array and inserting \0 at each space to using strtok.
For each word found, go through the other string using strstr, which checks whether a string is in there. Just check the return value from strstr: if it is != NULL, it found it.
I'd not use strtok but stick with pointer arithmetic, length comparison, and memcmp to compare strings of equal length.
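One way to sketch the pointer-arithmetic and memcmp variant is shown below. The helper next_word and the function count_matching_words are hypothetical names, and the sketch assumes words are separated by spaces, tabs or newlines. Because lengths are compared before memcmp, only whole words match, whereas a plain strstr search would also find "cat" inside "concatenate".

```c
#include <stddef.h>
#include <string.h>

/* Skips white space at *p and returns the start of the next word, storing its
   length in *len and advancing *p past it. Returns NULL at the end of the string. */
static const char *next_word(const char **p, size_t *len)
{
    const char *s = *p;
    s += strspn(s, " \t\n");
    if (*s == '\0') return NULL;
    *len = strcspn(s, " \t\n");
    *p = s + *len;
    return s;
}

/* Counts the words of a that also occur as whole words in b.
   Neither string is modified or copied. */
size_t count_matching_words(const char *a, const char *b)
{
    size_t count = 0, la, lb;
    const char *wa, *wb;
    while ((wa = next_word(&a, &la)) != NULL) {
        const char *q = b;                  /* rescan b for every word of a */
        while ((wb = next_word(&q, &lb)) != NULL) {
            if (la == lb && memcmp(wa, wb, la) == 0) {
                count++;
                break;                      /* count each word of a at most once */
            }
        }
    }
    return count;
}
```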
There are two problems here:
1) splitting each string into words
The strtok() function can split a string into words.
It is a meaningful exercise to imagine how you might write your own equivalent to strtok.
The rosetta project shows both a strtok and a custom method approach to precisely this problem.
I would naturally write my own parser, as it's the kind of code that appeals to me. It could be a fun exercise for you.
2) finding those words in one string that are also in another
If you iterate over each word in one string for each word in another, it has O(n*n) complexity.
If you index the words in one string it will take just O(n) which is substantially quicker (if your input is large enough to make this interesting). It is worth imagining how you might build a hashtable of the words in one string so that you can look for the words in the other.
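As a rough illustration of indexing one string, here is a small open-addressing hash table keyed on (pointer, length) pairs into the first string. Everything here is an assumption made for the sketch: the Slot type, the function names, the FNV-style hash, and white space as the only separator. It counts each word of b that appears in a, so a word repeated in b is counted each time.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *s; size_t len; } Slot;   /* empty when s == NULL */

/* Simple FNV-style hash over len bytes */
static size_t hash_bytes(const char *s, size_t len)
{
    size_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= (unsigned char)s[i]; h *= 16777619u; }
    return h;
}

/* Linear probing; cap must be a power of two. Returns the slot holding the
   word, or the empty slot where it would be stored. */
static Slot *probe(Slot *t, size_t cap, const char *s, size_t len)
{
    size_t i = hash_bytes(s, len) & (cap - 1);
    while (t[i].s && !(t[i].len == len && memcmp(t[i].s, s, len) == 0))
        i = (i + 1) & (cap - 1);
    return &t[i];
}

/* Counts the words of b that occur in a, in roughly O(n) expected time. */
size_t count_indexed_matches(const char *a, const char *b)
{
    size_t need = strlen(a) + 2, cap = 16, count = 0;
    Slot *t;
    while (cap < need) cap *= 2;        /* more slots than words, so probing ends */
    t = calloc(cap, sizeof *t);
    if (!t) return 0;
    for (const char *p = a; *p; ) {     /* index every word of a */
        p += strspn(p, " \t\n");
        size_t len = strcspn(p, " \t\n");
        if (len) { Slot *sl = probe(t, cap, p, len); sl->s = p; sl->len = len; }
        p += len;
    }
    for (const char *p = b; *p; ) {     /* look up every word of b */
        p += strspn(p, " \t\n");
        size_t len = strcspn(p, " \t\n");
        if (len && probe(t, cap, p, len)->s) count++;
        p += len;
    }
    free(t);
    return count;
}
```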

Help with this ragged array/substring program in C?

so, I have a ragged array (called names) of length 200. each pointer in the array points to a string that is no longer than 50 characters and has no white spaces. I also have a string given through user-input called inname, with length 50, inname will be one of the strings stored in names. I need to find a way to go through the strings in my ragged array and find the string with the largest substring overlap with inname, excluding inname itself as that will be in the file. If no string has an overlap then we print out "no recommendation".
I've been melting trying to puzzle this out for many hours now, helps? O:) SO basically, the program finds the name in the array with the largest substring overlap with inname.
will edit to provide additional info if you need it
You should probably start on the smaller problem of determining the size of the overlap between inname and a single string.
Wikipedia goes over some algorithms for solving the longest common substring problem (including pseudocode!)
This won't lead you to the most efficient way to find overlap (dynamic programming is one way--there are other crazy methods like suffix trees), but it should get you started:
First, think about how you would find the length of the overlap with the beginning of both strings aligned. For instance, find the longest overlap between these two:
programming
ungrammatical
In this case, only a single m overlaps--a length of 1.
Then think about how you would "shift" the strings and look for the overlap when they are aligned differently. (Don't actually change the strings: just change how you loop through them to compare them.) What is the overlap between these two?
programming
 ungrammatical
Think about how to look at all possible alignments. If you keep track of the longest one you find, you have the longest alignment between two particular strings.
After that, move on to checking all the different strings. Keep track of the one with the best match, and again, once you have looked at all of them, you have your answer.
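The alignment-shifting idea above might be sketched like this. The function names are hypothetical, and best_match assumes names is an array of null-terminated strings and that a NULL result means "no recommendation". This is the simple approach, not the dynamic-programming or suffix-tree one.

```c
#include <stddef.h>
#include <string.h>

/* Length of the longest run of matching characters over all alignments of a and b. */
size_t longest_overlap(const char *a, const char *b)
{
    long la = (long)strlen(a), lb = (long)strlen(b);
    size_t best = 0;
    /* with a given shift, b[j] is compared against a[j + shift] */
    for (long shift = -lb + 1; shift < la; shift++) {
        size_t run = 0;
        for (long j = 0; j < lb; j++) {
            long i = j + shift;
            if (i < 0 || i >= la) { run = 0; continue; }   /* no character opposite */
            if (a[i] == b[j]) { if (++run > best) best = run; }
            else run = 0;
        }
    }
    return best;
}

/* Returns the entry of names with the largest overlap with inname, skipping
   inname itself, or NULL if no entry shares even one character. */
const char *best_match(const char *const *names, size_t n, const char *inname)
{
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t k = 0; k < n; k++) {
        if (strcmp(names[k], inname) == 0) continue;
        size_t len = longest_overlap(names[k], inname);
        if (len > best_len) { best_len = len; best = names[k]; }
    }
    return best;
}
```

For the pair used above, the best alignment lines up "gramm" in both words.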
