Finding a string in a file [C] - c

Could anyone tell me how to find a string (entered in a program) in a .txt file without using a library function for that? (I just need an algorithm, nothing else.) EXAMPLE: I have a file named NAMES.txt with surnames on the first line, separated by spaces, like this:
John Peter Paul
and in my program I enter a name, for example Paul, and it finds it in that file and writes "the name is there"
name = Paul;
One method I have in mind: if I enter, for example, Paul, the program scans the characters in that file one by one, and when it sees a character equal to name[0] (P) it starts comparing the following letters. For each matching letter it increases a counter p by one (p++), and if p equals the length of the name then the name is there. (One bug comes to mind: if you enter Paul and the file contains the name Paula, it will still write "The name is there" with this method, but that should not be impossible to fix.)
Could anyone also tell me whether the method I described is feasible?

I suggest avoiding reading the entire file into memory. Large files might result in large memory consumption, which is far from ideal.
Presumably you have the string to search for in memory somewhere; it's already in an allocation. Create another allocation of the size of that string, and read that many bytes into it... Don't forget to account for the '\0' string terminator.
Check to see if it matches. If the string matches, well, obviously you've found a match within that file. If it doesn't, shift the array one byte left, read another byte onto the end of it. Rinse, lather, repeat until you find a match.
The bug you mentioned implies that you need a string terminator, in the file, somewhere. Technically speaking, a string terminator is a '\0', but you could substitute any terminal value(s). Just replace the value(s) you choose (perhaps whitespace?) with a '\0' as you're reading.
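As a rough sketch of the sliding-window idea above, the following C function keeps only a few bytes of the file in memory at a time. It is a variation rather than a literal transcription: instead of substituting '\0', it treats every whitespace byte (and the start and end of the file) as a single space and looks for the word with a space on each side, so that Paul does not match Paula. The name stream_has_word is hypothetical, and the word is assumed to contain no whitespace.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `word` occurs in the stream as a whole
   whitespace-delimited word, 0 if not, -1 if memory allocation fails.
   Only strlen(word) + 2 bytes of the file are held in memory at once. */
int stream_has_word(FILE *fp, const char *word)
{
    size_t n = strlen(word);
    if (n == 0)
        return 0;                      /* nothing to search for */

    size_t m = n + 2;                  /* separator + word + separator */
    char *pattern = malloc(m);
    char *window = malloc(m);
    if (pattern == NULL || window == NULL) {
        free(pattern);
        free(window);
        return -1;
    }

    /* The pattern is the word with one separator on each side. */
    pattern[0] = ' ';
    memcpy(pattern + 1, word, n);
    pattern[m - 1] = ' ';

    /* Start with a virtual separator just before the first byte of the file. */
    memset(window, 0, m);
    window[m - 1] = ' ';

    int found = 0;
    int at_eof = 0;
    while (!found && !at_eof) {
        int c = fgetc(fp);
        if (c == EOF) {
            c = ' ';                   /* virtual separator after the last byte */
            at_eof = 1;
        } else if (isspace(c)) {
            c = ' ';                   /* normalise all whitespace to a space */
        }
        memmove(window, window + 1, m - 1);   /* shift one byte left */
        window[m - 1] = (char)c;              /* append the new byte */
        found = (memcmp(window, pattern, m) == 0);
    }

    free(pattern);
    free(window);
    return found;
}
```

With NAMES.txt opened via fopen(), stream_has_word(fp, "Paul") would report whether Paul appears as a complete name. Shifting the window with memmove() on every byte is simple but not the fastest option; a circular buffer would avoid the copying.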

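The function fgets() will be useful for this task, as you can store a whole line in a buffer instead of reading one character at a time.
fgets(name, 100, fp);
Here name is a pointer to the buffer where the line is stored, 100 is the size of that buffer (fgets() reads at most 99 characters and then adds the '\0' terminator), and fp is the FILE pointer you want to read from.
You can then use the function strcmp() to compare the text you read with the name you want to search for. Because strcmp() only reports equality when the strings match completely, this eliminates the possibility of matching a different, longer name. Note that fgets() keeps the trailing newline and reads the whole line, so a line holding several names must be split into individual names before strcmp() can match one of them.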
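A minimal sketch of this approach, assuming the names are separated by spaces, tabs or newlines and that no line is longer than 99 characters (a longer line would be read in pieces, and a name could be cut in two at the boundary). The function name file_has_name is hypothetical, and strtok() is used here to split each line, which the answer does not mention.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a complete
   whitespace-separated token anywhere in the stream, otherwise 0. */
int file_has_name(FILE *fp, const char *name)
{
    char line[100];
    const char *seps = " \t\r\n";

    while (fgets(line, sizeof line, fp) != NULL) {
        /* strtok() overwrites each separator with '\0', so every token
           is a proper C string that strcmp() can compare exactly. */
        for (char *tok = strtok(line, seps); tok != NULL; tok = strtok(NULL, seps)) {
            if (strcmp(tok, name) == 0)
                return 1;
        }
    }
    return 0;
}
```

Because strcmp() compares whole tokens, searching for Paul does not match Paula.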

Related

Removing a substring from a char array without using any libraries in C

I am a first-year computer science student and our teachers gave us a binary pattern search task. We have to remove a substring from a string without using any libraries or built-ins (like memmove or strstr). Our only hint is that it has something to do with '\0'. I don't see how we are going to achieve this because, as far as I know, the null character only ends a string; it does not remove anything. And given an unknown input, it gets even harder to get around. I need help with the usage of the null character. EDIT: Oh, and we are also not allowed to create new arrays. EDIT2: The problem is much more complicated; if you are here for the solution, read the comments under this thread and under the marked solution as well.
C strings are "null terminated" which means they are considered to end wherever a null (written '\0' in C) appears.
If I start with the string "Stack Overflow" and I overwrite the space with '\0', I now have the string "Stack". The storage for "Overflow" still exists, but it is not part of the string according to C functions like strlen(), printf() etc. In fact, if I hold a pointer to the "O" part of the original string, it will be just as if there are two strings: "Stack" and "Overflow", and you can still use both of them.
It's like if I come to where you live and I build a huge wall across the road just before your house. The road is now shortened, and people on my side of it won't know you are there.
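The "Stack Overflow" example can be tried directly. The sketch below uses a hypothetical helper, split_at_space, which overwrites the first space with '\0' and returns a pointer to whatever follows it. It only illustrates how the terminator shortens a string; it is not a solution to the substring-removal task.

```c
#include <stddef.h>

/* Hypothetical helper: overwrites the first space in `s` with '\0' and
   returns a pointer to the character after it, or NULL if `s` contains
   no space. The array itself is not resized; only the terminator moves. */
char *split_at_space(char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == ' ') {
            *s = '\0';        /* the first string now ends here */
            return s + 1;     /* the second string starts right after */
        }
    }
    return NULL;
}
```

Given char text[] = "Stack Overflow";, calling split_at_space(text) leaves text reading as "Stack" and returns a pointer that reads as "Overflow". The buffer must be a modifiable array, not a string literal.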

Is there a way to compare every line in one text file to one line in another text file in C?

For example, I have an index text file that has 400+ English words, and then I have another text file with decrypted text on each line.
I want to check each English word in my index file with each line of my decrypted text file (so checking 400+ English words for a match per line of decrypted text)
I was thinking of using strncmp(decryptedString, indexString, 10) because I know that strncmp terminates if the next character is NULL.
Each line of my decrypted text file is 352 characters long, and there's ~40 million lines of text stored in there (each line comes from a different output).
This is to decrypt a playfair cipher; I know that my decryption algorithm works because my professor gave us an example to test our program against and it worked fine.
I've been working on this project for six days straight and this is the only part I've been stuck on. I simply can't get it to work. I've tried using
while(getline(&line, &len, decryptedFile) != -1){
    while(getline(&line2, &len2, indexFile) != -1){
        if(strncmp(decryptedString, indexString, 10) == 0){
            fprintf(potentialKey, "%s", key);
        }
    }
}
But I never get any matches. I've tried storing each string in arrays and testing them one character at a time, and that didn't work for me either, since it would list all the English words on one line. I'm simply lost, so any help or pointers in the right direction would be much appreciated. Thank you in advance.
EDIT: Based on advice from Clifford in the comments, here's an example of what I'm trying to do
Let's say indexFile contains:
HELLO
WORLD
PROGRAMMING
ENGLISH
And the decryptedFile contains
HEVWIABAKABWHWHVWC
HELLOHEGWVAHSBAKAP
DHVSHSBAJANAVSJSBF
WORLDHEEHHESBVWJWU
PROGRAMMINGENGLISH
I'm trying to compare each word from indexFile to decryptedFile, one at a time. So all four words from indexFile will be compared to line 1, line2, line 3, line 4, and line 5 respectively.
If what you are trying to do is check whether an input line starts with a word, you should use the following (a result of 0 means that it does):
strncmp(line, word, strlen(word));
If you know that line is at least as long as word, you can use
memcmp(line, word, strlen(word));
If you are doing that repeatedly with the same word(s), you'd be better off saving the length of the word in the same data structure as the word itself, to avoid recomputing it each time.
This is a common use case for strncmp. Note that your description of strncmp is slightly inaccurate. It will stop when it hits a NUL in either argument, but it only returns equal if both arguments have a NUL in the same place or if the count is exhausted without encountering a difference.
strncmp is safer than depending on the fact that line is longer than word, given that the speed difference between memcmp and strncmp is very small.
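A small sketch of the prefix test with the word length stored next to the word, as suggested above. The struct word type and the function find_prefix_word are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* A word together with its precomputed length, so that strlen()
   is not called again for every line. */
struct word {
    const char *text;
    size_t len;
};

/* Hypothetical helper: returns the index of the first word that `line`
   starts with, or -1 if no word in the table is a prefix of `line`. */
int find_prefix_word(const char *line, const struct word *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* strncmp() stops at a NUL in `line`, so a line that is shorter
           than the word cannot be read past its end. */
        if (strncmp(line, words[i].text, words[i].len) == 0)
            return (int)i;
    }
    return -1;
}
```

With the four index words from the question loaded into such a table, the line HELLOHEGWVAHSBAKAP starts with entry 0 (HELLO), while HEVWIABAKABWHWHVWC matches nothing.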
However, with that much data and that many words to check, you should try something which reduces the number of comparisons you need to do. You could put the words into a Trie, for example. Or, if that seems like too much work, you could at least categorize them by their first letter and only use the ones whose first letter matches the first letter of the line, if there are any.
If you are looking for an instance of the word(s) anywhere in the line, then you'll need a more sophisticated search strategy. There are lots of algorithms for this problem; Aho-Corasick is effective and simple, although there are faster ones.
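For comparison, the most naive way to look for words anywhere in a line is one strstr() call per word. This is explicitly not Aho-Corasick; it is only a baseline sketch that may be acceptable for a few hundred short words, and it assumes the index words were read into memory once with their trailing newlines removed (getline() keeps the newline). The name count_words_in_line is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many of the `count` index words occur
   somewhere in `line`. Each word costs one pass over the line, so the
   work grows with (number of words) x (line length). */
size_t count_words_in_line(const char *line, const char *const *words, size_t count)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if (strstr(line, words[i]) != NULL)
            hits++;
    }
    return hits;
}
```

On the question's sample data, PROGRAMMINGENGLISH contains two of the four index words and HELLOHEGWVAHSBAKAP contains one.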
If a line of decrypted text is 352 characters long and each word in the index is not 352 characters long, then a line of decrypted text will never match any word in the index.
From this I think you've misunderstood the requirements and asked a question based on the misunderstanding.
Specifically, I suspect that you want to compare each individual word in the decrypted line (and not the whole line) with each word in your index, to determine whether all words in the decrypted line are acceptable. To do that, the first step would be to break the decrypted line of characters into individual words, e.g. by finding the characters that separate words (spaces, tabs, commas?) within the decrypted text and replacing them with a zero terminator (so that you can use strcmp() and don't need to worry about "foobar" incorrectly matching "foo" just because the first letters match).
Note that there are probably potential optimisations. E.g. if you know that a word from the decrypted text is 8 characters long (which you would have had to know to place the zero terminator in the right spot) and if your index is split into "one list for each word length" (e.g. a list of index words with 3 characters, a list of index words with 4 characters, etc.), then you might be able to skip a lot of string comparisons (and only compare the word from the decrypted line with index words of the same length). In this case (where you already know both words have the same length) you can also avoid modifying the original 352 characters (you won't need to insert the zero terminator after each word).
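A sketch of the word-by-word check without modifying the line. It assumes that spaces, tabs, commas and line endings separate words, and it uses a simple length comparison in place of the per-length lists described above (a real index would precompute the lengths or keep one list per length). The name all_words_known is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every separator-delimited word in
   `line` equals some entry of `index`, otherwise 0. A line containing
   no words at all returns 1. The line is never modified. */
int all_words_known(const char *line, const char *const *index, size_t count)
{
    const char *seps = " \t,\r\n";   /* assumed separators */
    const char *p = line;

    for (;;) {
        p += strspn(p, seps);            /* skip any separators */
        if (*p == '\0')
            return 1;                    /* no more words */

        size_t len = strcspn(p, seps);   /* length of the current word */
        int known = 0;
        for (size_t i = 0; i < count && !known; i++) {
            /* Only words of equal length can match, so memcmp() is safe
               and "foobar" can never match "foo". */
            known = (strlen(index[i]) == len && memcmp(p, index[i], len) == 0);
        }
        if (!known)
            return 0;
        p += len;
    }
}
```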

I need someone to explain extracting these string from a inputbox

Delphi 2010
I have a procedure in which the user enters their name and surname, and then I extract the surname and name into two different strings. Can someone please explain the significance of the +1, the 3 and the pos(' ') in the code, and when those values would need to be changed (e.g. why is it +1 and not +2)? Thank you.
procedure TForm1.GenerateOnceoffPassword1Click(Sender: TObject);
var
  suser, ssurname, sname, spassword : string;
  arrpassword : array[1..150] of string;
begin
  inc(icounter);
  suser := inputbox('Enter name and surname','lower case ONLY','');
  ssurname := copy(suser,pos(' ',suser)+1, 3);
  sname := copy(suser, 1, pos(' ',suser)-1);
I assume you've looked up the Copy and Pos functions in the online help or elsewhere. So, dealing with the points in your question and comment:
a. The "+1" in "copy(suser,pos(' ',suser)+1, 3)" means that the call to Copy should start at the first character after the first occurrence of a space character in suser, as returned by the call to Pos(). For example, if suser is 'john smith', Pos() returns 5, so copying starts at character 6, the 's' of 'smith'. If Pos() finds no space in suser, it will return 0, so copying would then start at the first character of suser. See also point 2 below.
b. The "3" means that Copy should copy (at most) 3 characters from where it has been told to start copying by "pos() + 1". I say "at most" because that's how Copy() works and nothing in your code compels the user to enter a string having 3 or more characters after the first space. It seems a bit odd that a surname should be restricted to a maximum of 3 characters, by the way.
c. Presumably referring to "1,=1" in your comment, you actually meant "1,=-1". Anyway, the "1" in the second call to Copy() means "start copying from the first character of suser", and the "pos() - 1" means copy at most X characters, where X is one less than the value returned by the call to pos(); in other words, copy the characters from suser up to one before the first occurrence of a space. If there is no space in suser, this will result in sname being empty.
Be aware that:
1. When using functions like Pos() and Copy() to split strings up, it's a good idea to get into the habit of using the Trim() function to remove any leading or trailing spaces from the substring(s). In point a. above, your code as written overlooks the possibility that the user might type two (or more) consecutive spaces.
2. Rather than prompt the user to use lower-case only, it would be better to get into the habit of writing code which works regardless of case. Obviously this isn't an issue with the specific code in your question, but anyway.
3. Traditionally, strings in Delphi have been 1-based, meaning that, if non-blank, inter alia the string can be accessed as if it were an array with a starting index of 1. Newer versions of the compiler (newer than D2010, that is) for mobile platforms like Android use 0-based strings, which cause the arithmetic of code like yours to be problematic if used unmodified.

What happens when a string contains only '\0'? C

In my program I'm reading words from a .txt file and I will be inserting them into both a linked list and a hash table.
If two '\n' characters are read in a row after a word then the second word the program will read will be '\n', however I then overwrite it with '\0', so essentially the string contains only '\0'.
Is it worth me putting an if statement so the next part of my program only executes if the word is a real word (i.e. word[0] != '\n')? Would the string '\0' use up space in the hash table/linked list?
In C a character array with first element being \0 is an empty string, i.e. of length zero. There's not much sense in keeping empty strings in containers, if that's what you are asking.
It depends if you consider an empty string a valid entry. You seem to be storing words so I would guess that an empty string is of no interest, but that is application specific.
For example, an environment variable can be present (getenv returns a valid pointer) but the value can be "unset": an empty string. In that case the fact that the value is an empty string might be significant.
So, if an empty string is not significant is it worth adding an if statement to ignore it? Generally that would be a "yes", since the overhead of storing and maintaining the empty string could be significantly more than one if statement per entry. But of course that is only a guess, I don't know what your overheads are, how many times that if would get executed, and how many empty string entries you would be saving. You might not know that either, so my fallback position would be only to store data that is significant.
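A minimal sketch of such a check, assuming each word arrives in a modifiable buffer that may still end with a line ending. The name clean_word is hypothetical; the caller would insert into the linked list and hash table only when it returns non-zero.

```c
#include <string.h>

/* Hypothetical helper: cuts `word` at the first '\r' or '\n' and returns
   1 if anything is left, 0 if the result is the empty string. */
int clean_word(char *word)
{
    word[strcspn(word, "\r\n")] = '\0';   /* strip the line ending, if any */
    return word[0] != '\0';               /* empty strings are not worth storing */
}
```

An empty string still occupies one byte for its terminator, plus whatever node or bucket overhead the container adds, which is the cost this single test avoids.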

Remove spaces from a string, but not at the beginning or end

I am trying to remove spaces from a string in C: not from the end, nor the beginning, just the extra spaces where several occur in a row inside the string.
For example
hello everyone this is a test
has two spaces between hello and everyone, and five spaces from this to is. Ultimately I would want to remove 1 space from the 2 and 4 from the 5, so every gap has 1 space exactly. Make sense?
This is what I was going to do:
create a pointer and point it at the first element of the string (index 0)
do a for loop through the length of the string
then my logic is: if the character at [i] is a space and the character at [i+1] is also a space, then do something
I am not quite sure what would be a good solution from here, bearing in mind I won't be using any pre-built functions. Does anyone have any ideas?
One way is to do it in place. Loop through the string from beginning to end, keeping a write pointer and a read pointer. On each iteration, both the write pointer and the read pointer advance by one. When you encounter a space, transfer it as normal, but then keep incrementing the read pointer until a non-space is found (or the end of the string, obviously). Don't forget to add a '\0' at the end, and you now have the same string without the extra spaces.
Are you allowed to use extra memory to create a duplicate of the string, or do you need to do the processing in place?
The easiest approach is to allocate memory equal to the size of the original string and copy all characters there. If you meet an extra space, do not copy it.
If you need to do it in place, then create two pointers: one pointing to the character being read and one to the position being written. When you meet an extra space, move the 'read' pointer to the next non-space character. Copy the character at the read pointer to the write position, then advance the read pointer to the character after the one just copied. The write pointer is incremented by one whenever a copy is performed.
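The in-place idea can be sketched as follows, without any library calls. The function name collapse_spaces is hypothetical. Note that a run of spaces at the very beginning or end of the string is also reduced to a single space rather than removed.

```c
/* Hypothetical helper: reduces every run of consecutive spaces in `s`
   to a single space, working in place with a read and a write pointer. */
void collapse_spaces(char *s)
{
    char *read = s;
    char *write = s;

    while (*read != '\0') {
        *write++ = *read;            /* transfer the current character */
        if (*read == ' ') {
            while (*read == ' ')     /* skip the rest of the run */
                read++;
        } else {
            read++;
        }
    }
    *write = '\0';                   /* terminate the shortened string */
}
```

The write pointer never gets ahead of the read pointer, so no character is overwritten before it has been read.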
Example:
        write
          V
xxxx_xxxx__xxx
           ^
          read
A hard part here is that you can not remove an element from the array of characters easily. You could of course make a function that returns a char[] that has one particular element removed. Another option is to make an extra array that indicates which characters you should keep and afterward go over the char[] one more time only copying the characters you want to keep.
This is based on what Goz said, but I think he had finger trouble, because I'm pretty sure what he described would strip out all spaces (not just the second onwards of each run).
EDIT - oops - wrong about Goz, though the "extra one" wording would only cover runs of two spaces correctly.
EDIT - oops - pre-written solution removed...
The general idea, though, is to use the "from" and "to" pointers as others did, but also to preserve some information (state) from one iteration to the next so that you can decide whether you're in a run of spaces already or not.
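One possible reading of that idea, as a sketch: a single flag records whether the previous character was a space, and a space is copied only when the flag is clear. The name copy_collapsed and the parameter names are assumptions; the destination must have room for strlen(from) + 1 characters, and because "to" never overtakes "from", passing the same buffer for both works in place.

```c
/* Hypothetical helper: copies `from` into `to`, writing only the first
   space of each run of spaces. `in_run` is the state carried from one
   iteration to the next. */
void copy_collapsed(char *to, const char *from)
{
    int in_run = 0;                  /* 1 while inside a run of spaces */

    for (; *from != '\0'; from++) {
        if (*from == ' ') {
            if (!in_run)
                *to++ = ' ';         /* keep only the first space of a run */
            in_run = 1;
        } else {
            *to++ = *from;
            in_run = 0;
        }
    }
    *to = '\0';
}
```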
You could do a find and replace, substituting one space for every occurrence of two consecutive spaces, and keep doing it until no more matches are found. Inefficient, but logical.
