What happens when a string contains only '\0' in C?

In my program I'm reading words from a .txt file and I will be inserting them into both a linked list and a hash table.
If two '\n' characters are read in a row after a word then the second word the program will read will be '\n', however I then overwrite it with '\0', so essentially the string contains only '\0'.
Is it worth me putting an if statement so the next part of my program only executes if the word is a real word (i.e. word[0] != '\n')? Would the string '\0' use up space in the hash table/linked list?

In C a character array with first element being \0 is an empty string, i.e. of length zero. There's not much sense in keeping empty strings in containers, if that's what you are asking.

It depends if you consider an empty string a valid entry. You seem to be storing words so I would guess that an empty string is of no interest, but that is application specific.
For example, an environment variable can be present (getenv returns a valid pointer) but the value can be "unset": an empty string. In that case the fact that the value is an empty string might be significant.
So, if an empty string is not significant is it worth adding an if statement to ignore it? Generally that would be a "yes", since the overhead of storing and maintaining the empty string could be significantly more than one if statement per entry. But of course that is only a guess, I don't know what your overheads are, how many times that if would get executed, and how many empty string entries you would be saving. You might not know that either, so my fallback position would be only to store data that is significant.
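A minimal sketch of the check discussed above, assuming each word arrives in a writable buffer that may end in '\n' (for example, one filled by fgets). The function name strip_newline_and_check is made up for illustration and does not come from the question.

```c
#include <stdbool.h>
#include <string.h>

/* Replace the first '\n' (if any) with '\0' and report whether any
 * characters remain. Returns false when the result is an empty string. */
bool strip_newline_and_check(char *word)
{
    word[strcspn(word, "\n")] = '\0';  /* harmless when there is no '\n' */
    return word[0] != '\0';
}
```

A caller would wrap the insertion code in `if (strip_newline_and_check(word)) { ... }`, so that blank lines never reach the linked list or the hash table.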

Related

What is wrong with adding null character to non null-terminated string?

Why I shouldn't add a null character to the end of a non null-terminated string like in this answer? I mean if I have a non null-terminated string and add null character to the end of the string, I now have a null-terminated string which should be good, right?
Is there any security problem I don't see?
Here's the code in case the answer gets deleted:
char letters[SIZE + 1]; // Leave room for the null-terminator.
// ...
// Populate letters[].
// ...
letters[SIZE] = '\0'; // Null-terminate the array.
To find the end of a string, the string must be null-terminated; otherwise there is no way to know where it ends.
There is nothing technically wrong with terminating the string with \0 this way. However, the approaches you can use to populate the array before adding \0 are prone to error. Consider some situations:
Suppose you decide to populate letters char by char. What happens if you forget to add some letters? What if you add more letters than the expected size?
What if there are thousands of letters to populate the array?
What if you need to populate letters with Unicode characters that (often) require more than one byte per symbol?
Of course you can address these situations very carefully but they still will be prone to error when maintaining the code.
To be clear: a string in C always has one and only one null character - it is the last character of the string. A string is an array of characters. If an array of characters does not have a null character, it is not a string.
"A string is a contiguous sequence of characters terminated by and including the first null character." (C11 draft, 7.1.1, paragraph 1)
There is nothing wrong with adding a null character to an array of characters as OP coded.
This is a fine way to form a string if:
All the preceding characters are defined.
String functions are not called until after a null character is written.
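The two conditions above can be sketched as follows. This is only an illustration: the value of SIZE and the helper name fill_and_terminate are assumptions, and the source bytes are assumed to be exactly SIZE long with no terminator.

```c
#include <string.h>

#define SIZE 5  /* hypothetical size, chosen only for this example */

/* Copy exactly SIZE bytes from src into dst, then terminate.
 * dst must have room for SIZE + 1 chars; src must supply SIZE bytes. */
void fill_and_terminate(char dst[SIZE + 1], const char *src)
{
    memcpy(dst, src, SIZE);  /* elements 0..SIZE-1 are now all defined */
    dst[SIZE] = '\0';        /* from this point on, dst is a string */
}
```

No string function touches dst before the terminator is written, which is what makes the later strlen or printf calls safe.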
You shouldn't use it, to avoid errors (or security holes) due to mixing C and Pascal strings.
C-style string: an array of char, terminated by the null character NUL ('\0').
Pascal-style string: a kind of structure, with an int holding the size of the string and an array holding the string itself.
The Pascal style doesn't use in-band control, so it can hold any char, including NUL. C strings can't, as they use NUL as the in-band terminator.
The problem arises when you mix them, assume one style when the data is in the other, or convert between them.
Converting a C string to a Pascal string does no harm. But if you have a legitimate Pascal string with one or more NUL characters inside it, converting it to C style causes a problem, as a C string cannot represent anything past the first NUL.
A good example of this is the X.509 Null Char Exploit, where you could register an SSL certificate for:
www.mysimplesite.com\0www.bigbank.com
The X.509 certificate uses a Pascal-style string, so this is valid. But when checking, the CA could use or assume C code or C string style that sees only the first www.mysimplesite.com and signs the certificate. And some browsers parse this certificate as also valid for www.bigbank.com.
So, you CAN use it, but you SHOULDN'T, as it risks causing a bug or even a security breach.
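To make the mismatch concrete, here is a small sketch of a counted string next to the C view of the same bytes. The struct counted_str and the function has_embedded_nul are hypothetical names for illustration; this is not how any particular X.509 library represents names.

```c
#include <stddef.h>
#include <string.h>

/* A counted ("Pascal-style") string: the length is stored out of band. */
struct counted_str {
    size_t len;
    const char *data;
};

/* Returns 1 if the counted data contains a '\0' before its end, meaning
 * C string functions would see only a prefix of it. Returns 0 otherwise. */
int has_embedded_nul(struct counted_str s)
{
    return memchr(s.data, '\0', s.len) != NULL;
}
```

For the certificate name shown earlier, the counted length is 36 bytes, while strlen on the same buffer stops at the first NUL and reports 20. A converter could reject any counted string for which has_embedded_nul returns 1 instead of silently truncating it.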
More details and info:
https://www.blackhat.com/presentations/bh-usa-09/MARLINSPIKE/BHUSA09-Marlinspike-DefeatSSL-SLIDES.pdf
https://sites.google.com/site/cse825maninthemiddle/odds-and-ends/x-509-null-char-exploit
In general, there are two ways of keeping track of an array of some variable number of things:
Use a terminator. Of course, this is the C approach to representing strings: an array of characters of some unknown size, with the actual string length given by a null terminator.
Use an explicit count stored somewhere else. (As it happens, this is how Pascal traditionally represents strings.)
If you have an array containing a known but not null-terminated sequence of characters, and if you want to turn it into a proper null-terminated string, and if you know that the underlying array is allocated big enough to contain the null terminator, then yes, explicitly setting array[N] to '\0' is not only acceptable, it is the way to do it.
Bottom line: it's a fine technique (if the constraints are met). I don't know why that earlier answer was criticized and downvoted.

Finding a string in a file [C]

Could anyone tell me how to find a string (which you enter in a program) in a .txt file without using a library function for the search? (I just need an algorithm, nothing else.) EXAMPLE: I have a file named NAMES.txt with names on the first line, separated by spaces, like this:
John Peter Paul
and in my program I enter a name, for example Paul, and it finds it in that file and writes "the name is there".
name = Paul;
I have one method in mind: if I enter, for example, Paul, the program scans the characters in that file one by one. When a character matches the first letter of the name (P), it starts comparing the following letters, increasing a counter p by one (p++) each time they match, and if p equals the length of the name then the name is there. (One bug comes to mind: if you enter Paul and the file contains the name Paula, it will still write "The name is there" with this method, but that should not be impossible to fix.)
Could anyone also tell me whether the method I described is feasible?
I suggest avoiding reading the entire file into memory. Large files might result in large memory consumption, which is far from ideal.
Presumably you have the string to search for in memory somewhere; it's already in an allocation. Create another allocation of the size of that string, and read that many bytes into it... Don't forget to account for the '\0' string terminator.
Check to see if it matches. If the string matches, well, obviously you've found a match within that file. If it doesn't, shift the array one byte left, read another byte onto the end of it. Rinse, lather, repeat until you find a match.
The bug you mentioned implies that you need a string terminator, in the file, somewhere. Technically speaking, a string terminator is a '\0', but you could substitute any terminal value(s). Just replace the value(s) you choose (perhaps whitespace?) with a '\0' as you're reading.
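The sliding-window idea from this answer might look like the sketch below. The function name stream_contains is an assumption, and memcmp and memmove are used only for brevity; both can be replaced by short hand-written loops if library helpers are off limits. Note that this version matches substrings, so searching for Paul would also succeed on a file containing Paula.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if needle occurs anywhere in the stream, 0 if it does not,
 * and -1 if the window buffer cannot be allocated. */
int stream_contains(FILE *fp, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return 1;                       /* empty needle matches trivially */

    char *window = malloc(n);
    if (window == NULL)
        return -1;

    int found = 0;
    if (fread(window, 1, n, fp) == n) { /* file holds at least n bytes */
        for (;;) {
            if (memcmp(window, needle, n) == 0) {
                found = 1;
                break;
            }
            int c = fgetc(fp);
            if (c == EOF)
                break;
            memmove(window, window + 1, n - 1); /* shift one byte left */
            window[n - 1] = (char)c;            /* append the new byte */
        }
    }
    free(window);
    return found;
}
```

Only strlen(needle) bytes of the file are held in memory at any time, which is the point of the approach.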
The function fgets() will be useful for this task, as it stores a whole line in a buffer instead of reading one character at a time.
fgets(name,100,fp)
Here name is a pointer to the buffer where the line is stored, 100 is the size of that buffer (so at most 99 characters are read, plus the terminating '\0'), and fp is the FILE pointer you want to read from. Note that fgets keeps the trailing newline, if one was read.
You can then use strcmp() to compare each name in the line with the name you want to search for. Because strcmp() compares whole strings, this eliminates the possibility of matching a different, longer name.
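A possible way to combine fgets() and strcmp() for the NAMES.txt layout is sketched below. Splitting the line with strtok is my assumption about how to get from one line to individual names; the function name file_has_name is hypothetical. Lines longer than 99 characters would be read in pieces, so a name straddling that boundary could be missed.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name appears as a whole whitespace-separated word
 * somewhere in fp, 0 otherwise. */
int file_has_name(FILE *fp, const char *name)
{
    char line[100];
    const char *delims = " \t\r\n";  /* also strips the newline fgets keeps */

    while (fgets(line, sizeof line, fp) != NULL) {
        for (char *tok = strtok(line, delims); tok != NULL;
             tok = strtok(NULL, delims)) {
            if (strcmp(tok, name) == 0)
                return 1;
        }
    }
    return 0;
}
```

Because each token is compared in full, searching for Paul does not match a file that only contains Paula.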

Find the first non repetitive character in a string

I was asked this question in an interview. My answer: "Create an array of size 26. Traverse the string character by character and count each character in the array. Then traverse the string again and return the first character whose count in the array is 1." What if the string contains a large set of characters, say 10,000 distinct characters instead of the 26 letters of the alphabet?
You can implement your original algorithm using less memory. Since you don't care about how many times the character repeated above 1, you only need 2 bits per character in the alphabet. When incrementing a value that is already above 1, just leave the value alone. The rest of your algorithm remains unchanged.
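As a rough sketch of the two-bits-per-character idea, the version below packs four saturating counters into each byte. It assumes 8-bit chars (256 possible values, hence 64 bytes of counters), and the name first_unique_index is made up for illustration.

```c
#include <stddef.h>

/* Returns the index of the first byte in s that occurs exactly once,
 * or -1 if every byte repeats (or s is empty). */
long first_unique_index(const char *s)
{
    unsigned char counts[256 / 4] = {0};  /* 2 bits per byte value */

    /* Pass 1: count occurrences, saturating at 2 ("more than once"). */
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned shift = (c % 4u) * 2u;
        unsigned n = (counts[c / 4] >> shift) & 3u;
        if (n < 2)
            counts[c / 4] = (unsigned char)
                ((counts[c / 4] & ~(3u << shift)) | ((n + 1u) << shift));
    }

    /* Pass 2: the first byte whose counter is exactly 1 wins. */
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (((counts[c / 4] >> ((c % 4u) * 2u)) & 3u) == 1u)
            return (long)i;
    }
    return -1;
}
```

For a larger alphabet the same layout applies with a bigger table: 10,000 symbols would need 20,000 bits, or 2,500 bytes.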
If memory is not a restriction, there is a faster algorithm that doesn't require another pass over the string. Allow each letter in the alphabet to be represented by a ListNode. Then have two lists, list1 and list2, that start out as empty. list1 contains letters that have only occurred once, and list2 contains letters that have occurred more than once. For each letter in the input string, get the corresponding ListNode, say node. If node is not in either list, put it at the end of list1. If node is already in list1, take it out, and put it in list2. After the input string is processed, if list1 is empty, there are no non-repeating characters. Otherwise, the character that corresponds with the first node in list1 is the first non-repeating character.
Follow this link to IDEONE for an implementation of the list based algorithm.
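Independently of the linked implementation, the list-based algorithm can be sketched as below. This is my own simplified reading of it, not the answerer's code: nodes live in a fixed table indexed by byte value, list1 is a doubly linked list threaded through that table, and list2 is replaced by a REPEATED state flag because its order is never used. All names are hypothetical and 8-bit chars are assumed.

```c
enum { UNSEEN, ONCE, REPEATED };

struct node {
    int state;
    int prev, next;  /* neighbours in list1 as byte values, -1 = none */
};

/* Returns the first non-repeating byte of s as an int, or -1 if none. */
int first_unique_char(const char *s)
{
    struct node nodes[256];
    int head = -1, tail = -1;  /* ends of list1 */

    for (int i = 0; i < 256; i++) {
        nodes[i].state = UNSEEN;
        nodes[i].prev = nodes[i].next = -1;
    }

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;
        struct node *n = &nodes[c];

        if (n->state == UNSEEN) {          /* append to the end of list1 */
            n->state = ONCE;
            n->prev = tail;
            n->next = -1;
            if (tail != -1) nodes[tail].next = c; else head = c;
            tail = c;
        } else if (n->state == ONCE) {     /* unlink from list1 */
            n->state = REPEATED;
            if (n->prev != -1) nodes[n->prev].next = n->next; else head = n->next;
            if (n->next != -1) nodes[n->next].prev = n->prev; else tail = n->prev;
        }
    }
    return head;  /* first node of list1, or -1 when list1 is empty */
}
```

Each input character costs a constant amount of work, and no second pass over the string is needed.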
You gave the brute-force answer. A clever solution keeps an array of the size of the alphabet, we'll call it x, with each item initially -1. Then pass through the string once; if the current character has an x-value of -1, change it to the index of the current character, otherwise if the current character has an x-value greater than or equal to 0, change it to -2. At the end of the string, examine each location in the x array, keeping track of the smallest non-negative value and the associated character. I implement that algorithm in Scheme at my blog.
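A C sketch of the first-index array approach follows. It treats index 0 as a valid position, so the checks are against -1 and -2 rather than against zero. The function name first_unique_by_index is an assumption, and the alphabet is taken to be all unsigned char values.

```c
#include <limits.h>

/* Returns the index of the first non-repeating byte in s, or -1. */
long first_unique_by_index(const char *s)
{
    long x[UCHAR_MAX + 1];

    for (int c = 0; c <= UCHAR_MAX; c++)
        x[c] = -1;                     /* -1: not seen yet */

    for (long i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (x[c] == -1)
            x[c] = i;                  /* remember first position */
        else
            x[c] = -2;                 /* -2: seen more than once */
    }

    long best = -1;
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (x[c] >= 0 && (best == -1 || x[c] < best))
            best = x[c];               /* smallest recorded position */
    return best;
}
```

The string is traversed once; the final scan is over the alphabet, not the input.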
You could use a tree-like data structure instead of an array.
If your strings are not too long, loop through the string character by character and check whether each character occurs again. Drawback: n^2 runtime in the worst case.
Build a hash of each character in a first pass and then check each bucket separately. This should be combined with the method above, and I'm not sure whether it reduces runtime significantly; it depends on your real-world data.
Start reading the string character-by-character
Put each character in HashMap
Return the first character which has conflict
Pros:
You don't need to create a BitMap/Bit-Array in advance
Cons:
The HashMap can grow to as much as number of characters in string, if it does not encounter repeating character (or if there is no repeating character)

Trimming a char array in C

I am working on an assignment and I have noticed a problem in my code. It is not clear to me how to tackle it, probably due to a lack of sleep. I need to trim a char array of its white space for this assignment.
The solution I thought of involves a second char array: simply copy the non-white-space characters to that array and I'm done. But how can I create a char array without knowing its size? At that moment I do not yet know the size; I still need to trim the original in order to know how many characters need to be copied to the new array, and that varies in the assignment.
I know there are a lot of good questions here on Stack Overflow, but I think this has more to do with the thought process than with the correct syntax.
My second problem is how to perform an fscanf/fgetc on a char array, since those need a stream. Is it sufficient to give them a pointer rather than a stream?
If making the change in place, simply shift every character after a space back by one, and repeat until the end of the array. This is very inefficient.
If making a new copy, make a new array of the same length, and then do as you were doing (copy all the non-space characters). If you copy the \0 character as well, then there will be no string termination issue. This is much more efficient.
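The new-copy approach could be sketched like this. The result can never be longer than the input, so allocating strlen(src) + 1 bytes is always enough. The function name remove_spaces_copy is hypothetical, and isspace is used on the assumption that tabs and newlines should go as well as plain spaces.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of src with all white space removed,
 * or NULL if allocation fails. The caller must free the result. */
char *remove_spaces_copy(const char *src)
{
    char *dst = malloc(strlen(src) + 1);  /* worst case: nothing removed */
    if (dst == NULL)
        return NULL;

    char *out = dst;
    for (; *src != '\0'; src++) {
        if (!isspace((unsigned char)*src))
            *out++ = *src;
    }
    *out = '\0';                          /* terminate the shorter copy */
    return dst;
}
```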
Going by your comments, it appears you may have the option to input the array in any form you wish. I would then recommend that instead of doing text manipulations later on, just input the string in the form you need.
You can simply use scanf or fscanf repeatedly, to input the separate words into the same array. This will take care of all the whitespaces.
Here is one partial idea: You can make a first pass on the char array and count the blanks, then take the string length minus the blanks for the second array, then perform your copy across skipping the blanks.
You could also create a pass through the array:
Test until end of array:
Is my (Current/Index) position blank? (A space)
If so, grab next available non-blank value and put it there.
then index++
If not, index++
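A single-pass variant of the steps above is sketched below. Instead of hunting for the next non-blank each time a blank is found, it keeps a separate write index, which amounts to the same result with less bookkeeping. The name remove_spaces_in_place is made up, and only the space character ' ' is treated as blank.

```c
#include <stddef.h>

/* Removes every ' ' from s in place. */
void remove_spaces_in_place(char *s)
{
    size_t w = 0;                      /* next position to write */
    for (size_t r = 0; s[r] != '\0'; r++) {
        if (s[r] != ' ')
            s[w++] = s[r];             /* keep non-blank characters */
    }
    s[w] = '\0';
}
```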
Not sure on the second, will do some checking and see if I can find a good answer there too.

Remove spaces from a string, but not at the beginning or end

I am trying to remove spaces from a string in C, not from the end, nor the beginning, just multiple spaces in a string
For example
hello  everyone this     is a test
has two spaces between hello and everyone, and five spaces from this to is. Ultimately I would want to remove 1 space from the 2 and 4 from the 5, so every gap has 1 space exactly. Make sense?
This is what I was going to do:
create a pointer and point it at the first element of the string, char[0].
do a for loop through the length of the string
then my logic is: if the character at [i] is a space and the character at [i+1] is also a space, then do something
I am not quite sure what would be a good solution from here, bearing in mind I won't be using any pre-built functions. Does anyone have any ideas?
One way is to do it in place. Loop through the string from beginning to end, keeping a write pointer and a read pointer. On each iteration the write pointer and the read pointer each advance by one. When you encounter a space, transfer it as normal, but then keep incrementing the read pointer until a non-space is found (or the end of the string, obviously). Don't forget to add a '\0' at the end, and you now have the same string without the extra spaces.
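A sketch of that read/write pointer loop, using no library functions, is shown below. The function name squeeze_spaces is an assumption. Runs at the very start or end of the string are also reduced to one space rather than removed.

```c
/* Collapses every run of spaces in s to a single space, in place. */
void squeeze_spaces(char *s)
{
    char *read = s;
    char *write = s;

    while (*read != '\0') {
        *write++ = *read;              /* transfer the character as normal */
        if (*read == ' ') {
            while (*read == ' ')       /* skip the rest of the run */
                read++;
        } else {
            read++;
        }
    }
    *write = '\0';                     /* terminate the shortened string */
}
```

The write pointer never overtakes the read pointer, so no character is overwritten before it has been read.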
Are you allowed to use extra memory to create a duplicate of the string or you need to do the processing in place?
The easiest approach is to allocate memory equal to the size of the original string and copy the characters there. If you meet an extra space, do not copy it.
If you need to do it in place, then create two pointers. One pointing to the character being read and one to the character being copied. When you meet an extra space, then adapt the 'read' pointer to point to the next non space character. Copy to the write position the character pointed by the read character. Then advance the read pointer to the character after the character being copied. The write pointer is incremented by one, whenever a copy is performed.
Example:
          write
          V
xxxx_xxxx__xxx
           ^
           Read
A hard part here is that you can not remove an element from the array of characters easily. You could of course make a function that returns a char[] that has one particular element removed. Another option is to make an extra array that indicates which characters you should keep and afterward go over the char[] one more time only copying the characters you want to keep.
This is based on what Goz said, but I think he had finger trouble, because I'm pretty sure what he described would strip out all spaces (not just the second onwards of each run).
EDIT - oops - wrong about Goz, though the "extra one" wording would only cover runs of two spaces correctly.
EDIT - oops - pre-written solution removed...
The general idea, though, is to use the "from" and "to" pointers as others did, but also to preserve some information (state) from one iteration to the next so that you can decide whether you're in a run of spaces already or not.
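Since the pre-written solution was removed, here is one possible sketch of the state-keeping idea. It is not the answerer's original code. The flag in_run records whether the previous character was a space; the names squeeze_spaces_stateful, from, to and in_run are all assumptions.

```c
/* Collapses runs of spaces to one space, remembering between
 * iterations whether we are already inside a run. */
void squeeze_spaces_stateful(char *s)
{
    char *to = s;
    int in_run = 0;                    /* 1 while inside a run of spaces */

    for (const char *from = s; *from != '\0'; from++) {
        if (*from == ' ') {
            if (!in_run)
                *to++ = ' ';           /* keep only the first of the run */
            in_run = 1;
        } else {
            *to++ = *from;
            in_run = 0;
        }
    }
    *to = '\0';
}
```

Compared with an inner skip loop, this keeps exactly one read per iteration, at the cost of one extra variable.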
You could do a find and replace of "  " (two spaces) with " " (one space), and keep doing it until no more matches are found. Inefficient, but logical.
