Find the first non repetitive character in a string - c

I was asked this question in an interview. My answer: "Create an array of size 26. Traverse the string character by character and count each character in the array. Then traverse the string again and return the first character whose count shows that it is not repeated." What if the string can contain a large set of characters, say 10000 distinct characters instead of the 26 letters of the alphabet?

You can implement your original algorithm using less memory. Since you don't care about how many times the character repeated above 1, you only need 2 bits per character in the alphabet. When incrementing a value that is already above 1, just leave the value alone. The rest of your algorithm remains unchanged.
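A minimal sketch of the two-bit idea, assuming 8-bit characters (an alphabet of 256 symbols). The names `get2`, `bump2`, and `first_unique_2bit` are made up for illustration. Four counters are packed into each byte, and a counter stops increasing once it reaches 2.

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET 256  /* assumption: one counter per unsigned char value */

/* Read the 2-bit counter for symbol c (four counters per byte). */
static unsigned get2(const unsigned char *tbl, unsigned c)
{
    return (tbl[c / 4] >> ((c % 4) * 2)) & 3u;
}

/* Increment the counter for c, saturating at 2 ("seen more than once"). */
static void bump2(unsigned char *tbl, unsigned c)
{
    unsigned v = get2(tbl, c);
    if (v < 2) {
        tbl[c / 4] &= (unsigned char)~(3u << ((c % 4) * 2));
        tbl[c / 4] |= (unsigned char)((v + 1) << ((c % 4) * 2));
    }
}

/* Returns the index of the first non-repeating character, or -1 if none. */
long first_unique_2bit(const char *s)
{
    unsigned char tbl[ALPHABET / 4];
    size_t i, n = strlen(s);

    memset(tbl, 0, sizeof tbl);
    for (i = 0; i < n; i++)          /* first pass: count */
        bump2(tbl, (unsigned char)s[i]);
    for (i = 0; i < n; i++)          /* second pass: find first count of 1 */
        if (get2(tbl, (unsigned char)s[i]) == 1)
            return (long)i;
    return -1;
}
```

For a 10000-symbol alphabet the same layout would need 2500 bytes instead of 64; only `ALPHABET` and the symbol type would change.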
If memory is not a restriction, there is a faster algorithm that doesn't require another pass over the string. Allow each letter in the alphabet to be represented by a ListNode. Then have two lists, list1 and list2, that start out as empty. list1 contains letters that have only occurred once, and list2 contains letters that have occurred more than once. For each letter in the input string, get the corresponding ListNode, say node. If node is not in either list, put it at the end of list1. If node is already in list1, take it out, and put it in list2. After the input string is processed, if list1 is empty, there are no non-repeating characters. Otherwise, the character that corresponds with the first node in list1 is the first non-repeating character.
Follow this link to IDEONE for an implementation of the list based algorithm.
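Since the linked implementation is not reproduced here, the following is a rough sketch of the list-based idea rather than that code. It assumes 8-bit characters, and the names (`struct node`, `first_unique_list`) are hypothetical. One simplification: list2 is represented only by a `REPEATED` flag, because nothing is ever read back from it.

```c
#include <stddef.h>

#define ALPHABET 256  /* assumption: 8-bit characters */

enum { UNSEEN, ONCE, REPEATED };

struct node {
    struct node *prev, *next;
    int state;
};

/* Returns the first non-repeating character as an int (0..255), or -1. */
int first_unique_list(const char *s)
{
    struct node nodes[ALPHABET] = {0};  /* one node per symbol, all UNSEEN */
    struct node head;                   /* sentinel of the circular list1 */

    head.prev = head.next = &head;
    head.state = UNSEEN;

    for (; *s != '\0'; s++) {
        struct node *n = &nodes[(unsigned char)*s];
        if (n->state == UNSEEN) {
            /* first occurrence: append to the tail of list1 */
            n->prev = head.prev;
            n->next = &head;
            head.prev->next = n;
            head.prev = n;
            n->state = ONCE;
        } else if (n->state == ONCE) {
            /* second occurrence: unlink from list1 in O(1) */
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->state = REPEATED;
        }
        /* REPEATED: nothing to do */
    }
    if (head.next == &head)
        return -1;                      /* list1 is empty */
    return (int)(head.next - nodes);    /* node position equals symbol value */
}
```

The string is read once; the cost is the per-symbol node table, which is why this variant only makes sense when memory is not a restriction.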

You gave the brute-force answer. A clever solution keeps an array the size of the alphabet, which we'll call x, with each item initially -1. Then pass through the string once: if the current character has an x-value of -1, change it to the index of the current character; otherwise, if the current character has an x-value of 0 or greater, change it to -2. At the end of the string, examine each location in the x array, keeping track of the smallest non-negative value and the associated character. I implement that algorithm in Scheme at my blog.
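Here is a hedged C sketch of that index-array approach (the blog version is in Scheme and is not reproduced here). It assumes 8-bit characters and treats index 0 as a valid first occurrence; `first_unique_index` is a made-up name.

```c
#include <stddef.h>

#define ALPHABET 256  /* assumption: 8-bit characters */

/* Returns the index of the first non-repeating character, or -1 if none. */
long first_unique_index(const char *s)
{
    long x[ALPHABET];
    long best = -1;
    size_t i;
    int c;

    for (c = 0; c < ALPHABET; c++)
        x[c] = -1;                      /* -1: not seen yet */

    for (i = 0; s[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)s[i];
        if (x[ch] == -1)
            x[ch] = (long)i;            /* remember the first position */
        else if (x[ch] >= 0)
            x[ch] = -2;                 /* -2: seen more than once */
    }

    /* The smallest stored position belongs to the answer. */
    for (c = 0; c < ALPHABET; c++)
        if (x[c] >= 0 && (best < 0 || x[c] < best))
            best = x[c];
    return best;
}
```

The final scan is over the alphabet, not the string, so for a long string over a small alphabet the second pass is cheap.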

You could use a tree-like data structure instead of an array.
If the strings are not too long, loop through the string character by character and check whether each character occurs again. Drawback: n^2 runtime in the worst case.
Build a hash of each character in a first pass and then check each bucket separately. This should be combined with the method above, and I'm not sure whether it reduces runtime significantly; it depends on your real-world data.

Start reading the string character-by-character
Put each character in HashMap
Return the first character which has conflict
Pros:
You don't need to create a BitMap/Bit-Array in advance
Cons:
The HashMap can grow to as many entries as there are characters in the string if it never encounters a repeating character.

Related

Is checking the size of a list linear?

I've seen many answers online saying that checking the size of a list is constant time, but I don't understand why.
My understanding was that a list isn't stored in contiguous memory chunks (like an array), meaning there is no way of getting the size of a list (last element index + 1), without first traversing through every element.
Thoughts?
I've seen many answers online saying that checking the size of a list is constant time.
This fallacy may originate in a fundamental flaw in the Python language: arrays are called lists in Python. As Python gains popularity, the word list has become ambiguous.
Computing the length of a linked list is an O(n) operation, unless the length has been stored separately and maintained properly.
Retrieving the size of an array is performed in constant time if the size is stored along with the array, as is the case in Python, so a=[1,2,3]; len(a) is indeed very fast.
Computing the length of an array may be an O(n) operation if the array must be scanned for a terminating value, such as a null pointer or a null byte. Thus strlen() in C, which computes the number of bytes in a C string (a null terminated array of char) operates in linear time.
The only way to find the length of a linked list without storing it beforehand is to iterate through the list. The nodes are not stored in one contiguous chunk, so you have to follow the links until you reach the last element. Therefore the time complexity is O(n). Of course, if you keep a variable holding the length and increment it every time an element is added (and decrement it when an element is removed), you do not need to iterate: getting the length becomes constant time, because you only read it from wherever it is stored. For example, the root element could hold the length. In short, you are correct. The reason one might say constant is that the length is stored beforehand.
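To make the difference concrete, here is a small sketch of a singly linked list that keeps its length alongside the head pointer. The type and function names are hypothetical, not from any particular library.

```c
#include <stddef.h>
#include <stdlib.h>

struct item {
    int value;
    struct item *next;
};

struct list {
    struct item *head;
    size_t length;   /* maintained on every insert and removal */
};

/* O(n): walk every node. */
size_t list_count(const struct list *l)
{
    size_t n = 0;
    const struct item *it;
    for (it = l->head; it != NULL; it = it->next)
        n++;
    return n;
}

/* O(1): read the stored length. */
size_t list_length(const struct list *l)
{
    return l->length;
}

/* Insert at the front; returns 0 on success, -1 if allocation fails. */
int list_push(struct list *l, int value)
{
    struct item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->value = value;
    it->next = l->head;
    l->head = it;
    l->length++;
    return 0;
}

/* Remove the front element, if any. */
void list_pop(struct list *l)
{
    struct item *it = l->head;
    if (it == NULL)
        return;
    l->head = it->next;
    free(it);
    l->length--;
}
```

`list_count` and `list_length` always agree as long as every modification goes through `list_push` and `list_pop`; that bookkeeping is the price of the constant-time answer.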

Compare C char arrays of varying lengths

I want to create a function that will determine if a C character array (not a string, no null terminator) is less than another. For my uses, if one array ends and the other doesn't, you keep testing the last character of the shorter array against the rest of the other array. Capitals don't matter, so I'm trying to use tolower().
Examples.
Aaah should be less than America
Watermelon should be less than wade
Watermelon should NOT be less than water.
Hello should be less than HelloWorld
How can I write a function to do this, given the size of both arrays?
Go through both arrays starting at index 0 and compare the characters at each index. If the characters are the same, go to the next index, and once they're not the same then you'll know which array is less than the other.
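A minimal sketch of that index-by-index comparison, using tolower() as the question suggests. The name `array_less` is hypothetical. One assumption to flag: when one array is a prefix of the other, this sketch treats the shorter one as less (plain lexicographic order), which is not the "keep testing the last character" rule from the question; that tie-break would need to be swapped in if the stricter rule is required.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if a (length na) sorts before b (length nb), ignoring case. */
int array_less(const char *a, size_t na, const char *b, size_t nb)
{
    size_t i;
    size_t n = na < nb ? na : nb;   /* only compare the common part */

    for (i = 0; i < n; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb)
            return ca < cb;         /* first difference decides */
    }
    return na < nb;  /* assumption: a proper prefix counts as less */
}
```

The casts to `unsigned char` matter because passing a negative `char` value to tolower() is undefined behavior.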

What happens when a string contains only '\0'? C

In my program I'm reading words from a .txt file and I will be inserting them into both a linked list and a hash table.
If two '\n' characters are read in a row after a word then the second word the program will read will be '\n', however I then overwrite it with '\0', so essentially the string contains only '\0'.
Is it worth me putting an if statement so the next part of my program only executes if the word is a real word (i.e. word[0] != '\n')? Would the string '\0' use up space in the hash table/linked list?
In C a character array with first element being \0 is an empty string, i.e. of length zero. There's not much sense in keeping empty strings in containers, if that's what you are asking.
It depends if you consider an empty string a valid entry. You seem to be storing words so I would guess that an empty string is of no interest, but that is application specific.
For example, an environment variable can be present (getenv returns a valid pointer) but the value can be "unset": an empty string. In that case the fact that the value is an empty string might be significant.
So, if an empty string is not significant is it worth adding an if statement to ignore it? Generally that would be a "yes", since the overhead of storing and maintaining the empty string could be significantly more than one if statement per entry. But of course that is only a guess, I don't know what your overheads are, how many times that if would get executed, and how many empty string entries you would be saving. You might not know that either, so my fallback position would be only to store data that is significant.
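As a small illustration, assuming the words come from a line-oriented read and empty words are not significant, a helper could strip the newline and report the remaining length, so the caller can skip empty words. `chomp` is a made-up name here, not a standard C function.

```c
#include <stddef.h>
#include <string.h>

/* Replace a trailing newline with '\0'; returns the resulting length. */
size_t chomp(char *word)
{
    size_t n = strlen(word);
    if (n > 0 && word[n - 1] == '\n')
        word[--n] = '\0';
    return n;
}
```

The caller would then insert into the list and hash table only when `chomp(word) > 0`, which costs one comparison per word.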

iterating string in C, word by word

I just started learning C. What I am trying to do right now is this: I have two strings in which the words are separated by white space, and I have to return the number of matching words in both strings. Is there any function in C that lets me take each word and compare it to every other word in another string? If not, any idea on how I can do that?
Break up the first string into words. You can do this in any number of ways, from looping through the character array and inserting \0 at each space to using strtok.
For each word found, search the other string with strstr, which checks whether a string occurs in there: if the return value is not NULL, it found it. Note that strstr matches substrings, so "cat" would also be found inside "category"; check the characters on either side of the match if you need whole words.
I'd not use strtok but stick with pointer arithmetic, a length comparison, and memcmp to compare strings of equal length.
There are two problems here:
1) splitting each string into words
The strtok() function can split a string into words.
It is a meaningful exercise to imagine how you might write your own equivalent to strtok.
The rosetta project shows both a strtok and a custom method approach to precisely this problem.
I would naturally write my own parser, as it's the kind of code that appeals to me. It could be a fun exercise for you.
2) finding those words in one string that are also in another
If you iterate over each word in one string for each word in another, it has O(n*n) complexity.
If you index the words in one string it will take just O(n) which is substantially quicker (if your input is large enough to make this interesting). It is worth imagining how you might build a hashtable of the words in one string so that you can look for the words in the other.
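For reference, here is a sketch of the simple quadratic approach (not the hash table), using pointer arithmetic, a length comparison, and memcmp as suggested above, so neither string is modified. The delimiter set and the names `has_word` and `count_matching_words` are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

static const char DELIMS[] = " \t\n";  /* assumption: what counts as white space */

/* Does the word [w, w + len) occur as a whole word in text? */
static int has_word(const char *text, const char *w, size_t len)
{
    const char *p = text + strspn(text, DELIMS);
    while (*p != '\0') {
        size_t n = strcspn(p, DELIMS);      /* length of the current word */
        if (n == len && memcmp(p, w, len) == 0)
            return 1;
        p += n;
        p += strspn(p, DELIMS);             /* skip to the next word */
    }
    return 0;
}

/* Counts the words of a that also appear in b.
   A word repeated in a is counted each time it appears. */
size_t count_matching_words(const char *a, const char *b)
{
    size_t count = 0;
    const char *p = a + strspn(a, DELIMS);
    while (*p != '\0') {
        size_t n = strcspn(p, DELIMS);
        if (has_word(b, p, n))
            count++;
        p += n;
        p += strspn(p, DELIMS);
    }
    return count;
}
```

Replacing `has_word` with a lookup in a hash table built once from `b` is what would bring the cost down from O(n*n) to roughly O(n).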

Remove spaces from a string, but not at the beginning or end

I am trying to remove spaces from a string in C: not from the end, nor from the beginning, just runs of multiple spaces inside the string.
For example
hello  everyone this     is a test
has two spaces between hello and everyone, and five spaces from this to is. Ultimately I would want to remove 1 space from the 2 and 4 from the 5, so every gap has 1 space exactly. Make sense?
This is what I was going to do:
create a pointer and point it at the first element of the string, char[0].
do a for loop over the length of the string
then my logic is: if the character at [i] is a space and the character at [i+1] is also a space, then do something
I am not quite sure what would be a good solution from here, bearing in mind I won't be using any pre-built functions. Does anyone have any ideas?
One way is to do it in place. Loop through the string from beginning to end, keeping a write pointer and a read pointer. On each iteration, copy the character at the read pointer to the write pointer and advance both by one. When you encounter a space, transfer it as normal, but then keep incrementing the read pointer until a non-space is found (or the end of the string, obviously). Don't forget to add a '\0' at the end, and you now have the same string without the extra spaces.
Are you allowed to use extra memory to create a duplicate of the string or you need to do the processing in place?
The easiest approach is to allocate memory equal to the size of the original string and copy all characters there. If you meet an extra space, do not copy it.
If you need to do it in place, then create two pointers: one pointing to the character being read and one to the position being written. When you meet an extra space, move the read pointer forward to the next non-space character. Copy the character pointed to by the read pointer to the write position. Then advance the read pointer to the character after the one just copied. The write pointer is incremented by one whenever a copy is performed.
Example:
         write
          V
xxxx_xxxx__xxx
           ^
          Read
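The read/write pointer scheme can be sketched as follows, without any library calls, in keeping with the question's constraint. `collapse_spaces` is a hypothetical name. Note that this sketch also collapses runs at the very beginning or end of the string to a single space rather than leaving them untouched.

```c
/* Collapse each run of spaces in s to a single space, in place. */
void collapse_spaces(char *s)
{
    char *read = s;
    char *write = s;

    while (*read != '\0') {
        *write++ = *read;           /* copy the current character */
        if (*read == ' ') {
            while (*read == ' ')    /* skip the rest of the run */
                read++;
        } else {
            read++;
        }
    }
    *write = '\0';                  /* terminate the shortened string */
}
```

Because the write pointer never overtakes the read pointer, no extra buffer is needed. The argument must be a writable array, not a string literal.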
A hard part here is that you can not remove an element from the array of characters easily. You could of course make a function that returns a char[] that has one particular element removed. Another option is to make an extra array that indicates which characters you should keep and afterward go over the char[] one more time only copying the characters you want to keep.
This is based on what Goz said, but I think he had finger trouble, because I'm pretty sure what he described would strip out all spaces (not just the second onwards of each run).
EDIT - oops - wrong about Goz, though the "extra one" wording would only cover runs of two spaces correctly.
EDIT - oops - pre-written solution removed...
The general idea, though, is to use the "from" and "to" pointers as others did, but also to preserve some information (state) from one iteration to the next so that you can decide whether you're in a run of spaces already or not.
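Since the pre-written solution was removed, here is one possible reading of the state-carrying idea, not the original code. The flag `in_run` (a made-up name, like `collapse_spaces_state`) remembers whether the previous character was a space, so each character is examined exactly once with no inner loop.

```c
/* Collapse runs of spaces using a one-bit state carried between iterations. */
void collapse_spaces_state(char *s)
{
    char *from = s;
    char *to = s;
    int in_run = 0;                 /* nonzero while inside a run of spaces */

    for (; *from != '\0'; from++) {
        if (*from == ' ') {
            if (!in_run)
                *to++ = ' ';        /* keep only the first space of a run */
            in_run = 1;
        } else {
            *to++ = *from;
            in_run = 0;
        }
    }
    *to = '\0';
}
```

Keeping the state explicit makes it easy to extend, for example by treating tabs as part of a run as well.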
You could do a find and replace of "  " with " ", and keep doing it until no more matches are found. Inefficient, but logical.
