Trimming a char array in C - c

I am working on a coding assignment and have run into a problem. It is not clear to me how to tackle it, probably due to a lack of sleep, but anyway. I need to trim a char array of its white spaces for this assignment.
The solution I thought of involved a second char array: simply copy the non-white-space characters to that array and I'm done. But how can I create a char array without knowing its size? At that moment I do not yet know the size. I still need to trim the original in order to know how many characters need to be copied to the new array, and that number varies in the assignment.
I know there are a lot of good questions out here on Stack Overflow, but I think this has more to do with the thought process than with the correct syntax.
My second problem is how to perform an fscanf/fgetc on a char array, since those functions need a stream. Is it sufficient to give them a pointer rather than a stream?

If making the change in place, simply shift every character after a space back by one, and repeat until the end of the array. This is very inefficient.
If making a new copy, make a new array of the same length, and then do as you were doing (copy all the non-space characters). If you copy the \0 character as well, then there will be no string termination issue. This is much more efficient.
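A minimal sketch of the copy approach described above, assuming the goal is to drop every whitespace character. The function name copy_without_spaces is made up for illustration. Because the result can never be longer than the input, allocating strlen(src) + 1 bytes avoids having to know the trimmed size in advance.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `src` with every
   whitespace character removed, or NULL if the allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_without_spaces(const char *src)
{
    assert(src != NULL);

    /* The result is never longer than the input, so strlen(src) + 1
       bytes (including the terminator) are always enough. */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;

    size_t w = 0;
    for (size_t r = 0; src[r] != '\0'; r++) {
        /* Cast to unsigned char: isspace() is undefined for negative values. */
        if (!isspace((unsigned char)src[r]))
            dst[w++] = src[r];
    }
    dst[w] = '\0';   /* terminate the shortened string */
    return dst;
}
```

If memory matters, the buffer could afterwards be shrunk with realloc to w + 1 bytes, but for an assignment that is probably unnecessary.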
Going by your comments, it appears you may have the option to input the array in any form you wish. I would then recommend that instead of doing text manipulations later on, just input the string in the form you need.
You can simply use scanf or fscanf repeatedly, to input the separate words into the same array. This will take care of all the whitespaces.
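As a rough illustration of the repeated-scanf idea, here is a sketch that uses sscanf, the variant of scanf that reads from a char array instead of a stream. The helper name collect_words, the 63-character word limit and the buffer sizes are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends every whitespace-separated word of `input`
   to `out` (capacity `outsize`), dropping the whitespace in between.
   Returns the number of words copied. */
size_t collect_words(const char *input, char *out, size_t outsize)
{
    assert(input != NULL && out != NULL && outsize > 0);

    char word[64];
    size_t used = 0, count = 0;
    int consumed = 0;

    out[0] = '\0';
    /* %63s skips leading whitespace and reads one word of at most 63
       characters; %n stores how many input characters were consumed. */
    while (sscanf(input, "%63s%n", word, &consumed) == 1) {
        size_t len = strlen(word);
        if (used + len + 1 > outsize)
            break;                        /* not enough room for this word */
        memcpy(out + used, word, len + 1); /* copies the terminator too */
        used += len;
        count++;
        input += consumed;                /* continue after this word */
    }
    return count;
}
```

The same loop works on a stream by replacing sscanf(input, ...) with fscanf(fp, ...) and dropping the %n bookkeeping. Words longer than 63 characters would be read in several pieces with this sketch.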

Here is one partial idea: You can make a first pass on the char array and count the blanks, then take the string length minus the blanks for the second array, then perform your copy across skipping the blanks.
You could also create a pass through the array:
Test until end of array:
Is my (Current/Index) position blank? (A space)
If so, grab next available non-blank value and put it there.
then index++
If not, index++
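One way to realize the single-pass idea above is with separate read and write indexes, so that each character is moved at most once. This is only a sketch: the name remove_spaces_in_place is hypothetical, and it treats only the space character as blank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: removes every space character from `s` in place
   and returns the new length of the string. */
size_t remove_spaces_in_place(char *s)
{
    assert(s != NULL);

    size_t write = 0;                       /* next position to fill */
    for (size_t read = 0; s[read] != '\0'; read++) {
        if (s[read] != ' ')
            s[write++] = s[read];           /* keep non-blank characters */
    }
    s[write] = '\0';                        /* terminate the shorter string */
    return write;
}
```

The returned length is also the count the two-pass idea would compute in its first pass (string length minus blanks).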
Not sure on the second, will do some checking and see if I can find a good answer there too.

Related

Check if a string has only whitespace characters in C

I am implementing a shell in C11, and I want to check if the input has the correct syntax before doing a system call to execute the command. One of the possible inputs that I want to guard against is a string made up of only white-space characters. What is an efficient way to check if a string contains only white spaces, tabs or any other white-space characters?
The solution must be in C11, and preferably use only standard libraries. The string is read from the command line using readline() from readline.h, and it is saved in a char array (char[]). So far, the only solution that I've thought of is to loop over the array, and check each individual char with isspace(). Is there a more efficient way?
So far, the only solution that I've thought of is to loop over the array, and check each individual char with isspace().
That sounds about right!
Is there a more efficient way?
Not really. You need to check each character if you want to be sure only space is present. There could be some trick involving bitmasks to detect non-space characters in a faster way (like strlen() does to find a NUL terminator), but I would definitely not advise it.
You could make use of strspn() or strcspn() checking the returned value, but that would surely be slower since those functions are meant to work on arbitrary accept/reject strings and need to build lookup tables first, while isspace() is optimized for its purpose using a pre-built lookup table, and will most probably also get inlined by the compiler using proper optimization flags. Other than this, vectorization of the code seems like the only way to speed things up further. Compile with -O3 -march=native -ftree-vectorize (see also this post) and run some benchmarks.
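For comparison, the strspn() variant mentioned above might be sketched as follows. The accept string lists the six characters isspace() recognizes in the default "C" locale; the function name is an assumption for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `s` consists only of whitespace. */
bool is_only_whitespace_strspn(const char *s)
{
    assert(s != NULL);

    /* strspn() returns the length of the leading run made up only of the
       listed characters; if that run reaches the terminator, nothing
       else is in the string. */
    return s[strspn(s, " \t\n\v\f\r")] == '\0';
}
```

It is shorter to write, but as noted above there is no reason to expect it to be faster than the isspace() loop.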
"loop over the array, and check each individual char with isspace()" --> Yes go with that.
The time to do that is trivial compared to readline().
I'm going to provide an alternative solution to your problem: use strtok. It splits a string into substrings based on a specific set of ignored delimiters. With an empty string, you'd just get no tokens at all.
If you need more complicated matching than that for your shell (e.g. to handle quoted arguments) you're best off writing a small tokenizer/lexer. The strtok method is basically to just look for any of the delimiters you've specified, temporarily replace them with \0, return the substring up to that point, put the old character back, and repeat until it reaches the end of the string.
Edit:
As the busybee points out in the comment below, strtok does not put back the character that it replaces with \0. The above paragraph was worded poorly, but my intent was to explain how to implement your own simple tokenizer/lexer if you needed to, not to explain exactly how strtok works down to the smallest detail.
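A small sketch of the strtok check, under the assumption that the input buffer may be modified (the buffer returned by readline() is writable). The helper name has_any_token is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `line` contains at least one token.
   Note that strtok() overwrites the delimiter that ends the first token
   with '\0' and does not restore it, and that it keeps internal state,
   so it is not safe to interleave with other strtok() users. */
bool has_any_token(char *line)
{
    assert(line != NULL);

    /* strtok() returns NULL when the string holds only delimiters. */
    return strtok(line, " \t\n\v\f\r") != NULL;
}
```

If the original line is still needed afterwards, run the check on a copy.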

What happens when a string contains only '\0'? C

In my program I'm reading words from a .txt file and I will be inserting them into both a linked list and a hash table.
If two '\n' characters are read in a row after a word then the second word the program will read will be '\n', however I then overwrite it with '\0', so essentially the string contains only '\0'.
Is it worth me putting an if statement so the next part of my program only executes if the word is a real word (i.e. word[0] != '\n')? Would the string '\0' use up space in the hash table/linked list?
In C a character array with first element being \0 is an empty string, i.e. of length zero. There's not much sense in keeping empty strings in containers, if that's what you are asking.
It depends if you consider an empty string a valid entry. You seem to be storing words so I would guess that an empty string is of no interest, but that is application specific.
For example, an environment variable can be present (getenv returns a valid pointer) but the value can be "unset": an empty string. In that case the fact that the value is an empty string might be significant.
So, if an empty string is not significant is it worth adding an if statement to ignore it? Generally that would be a "yes", since the overhead of storing and maintaining the empty string could be significantly more than one if statement per entry. But of course that is only a guess, I don't know what your overheads are, how many times that if would get executed, and how many empty string entries you would be saving. You might not know that either, so my fallback position would be only to store data that is significant.
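The check itself is tiny. A sketch, assuming each word arrives in a writable buffer that may end in '\n'; the name prepare_word is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: cuts `word` at its first newline (if any) and
   reports whether anything is left that is worth storing. */
bool prepare_word(char *word)
{
    assert(word != NULL);

    /* strcspn() returns the index of the first '\n', or the string
       length if there is none, so this write is always in bounds. */
    word[strcspn(word, "\n")] = '\0';

    return word[0] != '\0';     /* empty string: nothing to store */
}
```

The caller would then insert into the list and the hash table only when prepare_word(buffer) returns true.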

iterating string in C, word by word

I just started learning C. What I am trying to do right now is this: I have two strings in which the words are separated by white space, and I have to return the number of matching words in both strings. Is there any function in C that lets me take each word and compare it to every other word in another string? If not, any idea on how I can do that?
Break up the first string into words. You can do this in any number of ways, from looping through the character array and inserting \0 at each space to using strtok.
For each word found, go through the other string using strstr, which checks whether a string occurs in there. Just check the return value from strstr: if it is != NULL, it found a match. Note that strstr matches substrings, so also check that the match starts and ends at a word boundary.
I'd not use strtok but stick with pointer arithmetic, length comparison, and memcmp to compare strings of equal length.
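A sketch of the pointer-arithmetic approach, comparing lengths first and then using memcmp, so that neither string is modified. The function names and the delimiter set are assumptions for this example. It counts the words of the first string that occur as whole words in the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WORD_DELIMS " \t\n"

/* True if the word [w, w + len) occurs as a whole word in `text`. */
static bool contains_word(const char *text, const char *w, size_t len)
{
    for (;;) {
        text += strspn(text, WORD_DELIMS);      /* skip separators */
        if (*text == '\0')
            return false;
        size_t tlen = strcspn(text, WORD_DELIMS); /* length of this word */
        if (tlen == len && memcmp(text, w, len) == 0)
            return true;
        text += tlen;
    }
}

/* Hypothetical helper: counts the words of `a` that also occur in `b`. */
size_t count_matching_words(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t matches = 0;
    for (;;) {
        a += strspn(a, WORD_DELIMS);
        if (*a == '\0')
            return matches;
        size_t len = strcspn(a, WORD_DELIMS);
        if (contains_word(b, a, len))
            matches++;
        a += len;
    }
}
```

Because whole words are compared, "cat" does not match inside "concatenate", which a bare strstr call would report as a hit.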
There are two problems here:
1) splitting each string into words
The strtok() function can split a string into words.
It is a meaningful exercise to imagine how you might write your own equivalent to strtok.
The rosetta project shows both a strtok and a custom method approach to precisely this problem.
I would naturally write my own parser, as it's the kind of code that appeals to me. It could be a fun exercise for you.
2) finding those words in one string that are also in another
If you iterate over each word in one string for each word in another, it has O(n*n) complexity.
If you index the words in one string it will take just O(n) which is substantially quicker (if your input is large enough to make this interesting). It is worth imagining how you might build a hashtable of the words in one string so that you can look for the words in the other.
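To make the indexing idea concrete, here is a deliberately small sketch: a fixed-size open-addressing hash table that stores pointers into the first string, then one lookup per word of the second string. The table size, hash function (a djb2-style hash) and all names are assumptions chosen for brevity; a real implementation would grow the table instead of ignoring words once it is half full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 256               /* hypothetical capacity, a power of two */
#define DELIMS " \t\n"

struct slot { const char *start; size_t len; };  /* start == NULL: empty */

static size_t hash_word(const char *s, size_t len)
{
    size_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h & (SLOTS - 1);
}

/* Finds the slot holding the word, or the empty slot where it would go. */
static struct slot *find_slot(struct slot *table, const char *s, size_t len)
{
    size_t i = hash_word(s, len);
    while (table[i].start != NULL &&
           !(table[i].len == len && memcmp(table[i].start, s, len) == 0))
        i = (i + 1) & (SLOTS - 1);          /* linear probing */
    return &table[i];
}

/* Hypothetical helper: counts the words of `b` that also occur in `a`. */
size_t count_matching_words_indexed(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    struct slot table[SLOTS] = {{NULL, 0}};
    size_t stored = 0, matches = 0;

    /* Pass 1: index the distinct words of `a`. The table is kept at most
       half full so that probing always reaches an empty slot. */
    for (;;) {
        a += strspn(a, DELIMS);
        if (*a == '\0')
            break;
        size_t len = strcspn(a, DELIMS);
        struct slot *s = find_slot(table, a, len);
        if (s->start == NULL && stored < SLOTS / 2) {
            s->start = a;
            s->len = len;
            stored++;
        }
        a += len;
    }

    /* Pass 2: one expected O(1) lookup per word of `b`. */
    for (;;) {
        b += strspn(b, DELIMS);
        if (*b == '\0')
            break;
        size_t len = strcspn(b, DELIMS);
        if (find_slot(table, b, len)->start != NULL)
            matches++;
        b += len;
    }
    return matches;
}
```

For short strings the quadratic loop is likely just as fast; the index only pays off once the inputs are large.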

OK to use a terminator to manage fixed length arrays?

I'm working in ANSI C with lots of fixed length arrays. Rather than setting an array length variable for every array, it seems easier just to add a "NULL" terminator at the end of the array, similar to character strings. For my current app I'm using "999999", which would never occur in the actual arrays. I can execute loops and determine array lengths just by looking for the terminator. Is this a common approach? What are the issues with it? Thanks.
This approach is technically used by your main arguments, where the last value is a terminal NULL, but it's also accompanied by an argc that tells you the size.
Using just terminals sounds like it's more prone to mistakes in the future. What's wrong with storing the size along with an array?
Something like:
struct fixed_array {
unsigned long len;
int arr[];
};
This will also be more efficient and less error-prone.
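A sketch of how such a struct might be allocated and used. The trailing int arr[] is a flexible array member, which requires C99 or later rather than strict ANSI C (C89). The function names are hypothetical, and overflow of the size computation is not checked here.

```c
#include <assert.h>
#include <stdlib.h>

struct fixed_array {
    unsigned long len;
    int arr[];              /* flexible array member (C99 and later) */
};

/* Hypothetical helper: allocates room for `len` zero-initialized ints
   right after the header. Returns NULL if the allocation fails. */
struct fixed_array *fixed_array_new(unsigned long len)
{
    struct fixed_array *fa = calloc(1, sizeof *fa + len * sizeof fa->arr[0]);
    if (fa != NULL)
        fa->len = len;
    return fa;
}

/* The stored length makes loops simple and needs no sentinel value. */
long fixed_array_sum(const struct fixed_array *fa)
{
    assert(fa != NULL);

    long total = 0;
    for (unsigned long i = 0; i < fa->len; i++)
        total += fa->arr[i];
    return total;
}
```

A single free(fa) releases both the header and the elements.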
The main problem I can think of is that keeping track of the length is useful: many built-in C functions take a length as a parameter, and you also need the length to know where to add the next element.
In practice it depends on the size of your array. If it is a huge array, then you should keep track of the length; otherwise, looping through it to determine the length every time you want to add an element to the end would be very expensive: O(n) instead of the O(1) time you normally get with arrays.
The main problem with this approach is that you can't know the length in advance without looping to the end of the array - and that can affect the performance quite negatively if you only want to determine the length.
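To illustrate the cost, this is roughly what every length query looks like with a sentinel. The value 999999 comes from the question; the macro and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SENTINEL 999999     /* terminator value used in the question */

/* Hypothetical helper: walks the array until the sentinel is found.
   This is O(n) on every call, and it reads out of bounds if the
   sentinel is ever missing or overwritten. */
size_t sentinel_length(const int *arr)
{
    assert(arr != NULL);

    size_t n = 0;
    while (arr[n] != SENTINEL)
        n++;
    return n;
}
```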
Why don't you just
Initialize it with a const int that you can use later in the code to check the size, or
Use int len = sizeof(my_array) / sizeof(the_type).
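The sizeof approach is often wrapped in a macro, as in the sketch below. One caveat worth stating: it only works where the real array is in scope. Once the array has been passed to a function it has decayed to a pointer, and sizeof then yields the size of the pointer. The macro and the example array are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const int primes[] = {2, 3, 5, 7, 11};

/* Hypothetical accessor that uses the computed length as a bounds check. */
int prime_at(size_t i)
{
    assert(i < ARRAY_LEN(primes));
    return primes[i];
}
```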
Since you're using 2-dimensional arrays to hold a ragged array, you could just use a ragged array: type *my_array[];. Or you could put the length in element 0 of each row and treat the rows as 1-indexed arrays. With some evil trickery you could even put the lengths at element -1 of each row![1]
Left as exercise ;)

Remove spaces from a string, but not at the beginning or end

I am trying to remove spaces from a string in C, not from the end, nor the beginning, just multiple spaces in a string
For example
hello  everyone this     is a test
has two spaces between hello and everyone, and five spaces from this to is. Ultimately I would want to remove 1 space from the 2 and 4 from the 5, so every gap has 1 space exactly. Make sense?
This is what I was going to do:
create a pointer, point it to the string at element 1 char[0].
do a for loop through the length of the string
then my logic is, if my pointer at [i] is a space and my pointer at element [i+1] space then to do something
I am not quite sure what would be a good solution from here, bearing in mind I won't be using any pre-built functions. Does anyone have any ideas?
One way is to do it in place. Loop through the string from beginning to end, keeping a write pointer and a read pointer. On each iteration both pointers advance by one. When you encounter a space, transfer it as normal, but then keep incrementing the read pointer until a non-space is found (or the end of the string, obviously). Don't forget to add a '\0' at the end, and you now have the same string without the extra spaces.
Are you allowed to use extra memory to create a duplicate of the string or you need to do the processing in place?
The easiest will be to allocate memory equal to the size of the original string and copy all characters there. If you meet an extra space, do not copy it.
If you need to do it in place, then create two pointers. One pointing to the character being read and one to the character being copied. When you meet an extra space, then adapt the 'read' pointer to point to the next non space character. Copy to the write position the character pointed by the read character. Then advance the read pointer to the character after the character being copied. The write pointer is incremented by one, whenever a copy is performed.
Example:
write
V
xxxx_xxxx__xxx
^
Read
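The read/write pointer scheme described above might be sketched like this. It uses no library string functions, in keeping with the question (assert is only a sanity check); the function name is hypothetical, and only the space character is treated as a blank.

```c
#include <assert.h>

/* Hypothetical helper: collapses every run of spaces in `s` to a single
   space, in place. */
void collapse_spaces_in_place(char *s)
{
    assert(s != 0);

    char *read = s;     /* character currently being examined */
    char *write = s;    /* position the next kept character goes to */

    while (*read != '\0') {
        *write++ = *read;           /* copy the character, space or not */
        if (*read == ' ') {
            while (*read == ' ')    /* then skip the rest of the run */
                read++;
        } else {
            read++;
        }
    }
    *write = '\0';                  /* terminate the shortened string */
}
```

Since write never overtakes read, no character is overwritten before it has been examined.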
A hard part here is that you can not remove an element from the array of characters easily. You could of course make a function that returns a char[] that has one particular element removed. Another option is to make an extra array that indicates which characters you should keep and afterward go over the char[] one more time only copying the characters you want to keep.
This is based on what Goz said, but I think he had finger trouble, because I'm pretty sure what he described would strip out all spaces (not just the second onwards of each run).
EDIT - oops - wrong about Goz, though the "extra one" wording would only cover runs of two spaces correctly.
EDIT - oops - pre-written solution removed...
The general idea, though, is to use the "from" and "to" pointers as others did, but also to preserve some information (state) from one iteration to the next so that you can decide whether you're in a run of spaces already or not.
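A sketch of the state-based variant: a flag remembers whether the previous character was a space, so each character is looked at exactly once. The names are assumptions for this example, and `to` must have room for at least as many bytes as `from` occupies, including the terminator.

```c
#include <assert.h>

/* Hypothetical helper: copies `from` to `to`, writing only the first
   space of every run of spaces. */
void collapse_spaces_copy(const char *from, char *to)
{
    assert(from != 0 && to != 0);

    int in_run = 0;     /* state: was the previous character a space? */

    for (; *from != '\0'; from++) {
        if (*from == ' ') {
            if (!in_run)
                *to++ = ' ';    /* first space of a run: keep it */
            in_run = 1;         /* later spaces of the run are dropped */
        } else {
            *to++ = *from;
            in_run = 0;
        }
    }
    *to = '\0';
}
```

Because the write position never gets ahead of the read position, passing the same buffer for both arguments also works and gives the in-place behaviour.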
You could do a find and replace for "  " and " ", and keep doing it until no more matches are found. Inefficient, but logical.
