Searching for string that contains escaped characters - c

I have a binary file that has this particular string in it: ^#^Aname^#Team Fortress 2
This is how I tried to find it using memmem:
char *game = "Team Fortress 2";
sprintf(searchString,"\1\1name\1%s\0",game);
...
if(pos = memmem(buffer,result,searchString,strlen(searchString)))
How do I match the escaped characters ^# and ^A?
It seems to find \1\1name, but not with game in searchString.

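If your search string contains null bytes it isn't a valid C string, so functions that rely on the terminator, such as strlen and strstr, stop at the first null. memmem takes explicit lengths and copes with such data, but you must pass it the real length rather than strlen(searchString). Where memmem is unavailable, you'll have to roll your own version.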
The simplest way is to loop through each index of the string, then use a second loop to check it against the string you're searching for. There are fancier and faster methods, but they are more difficult to understand and implement properly if you don't need the extra speed. See Wikipedia for an overview of the subject.

Isn't that caret notation?
char *game = "Team Fortress 2";
/* sprintf returns the number of bytes written; strlen would stop at the leading zero byte */
int len = sprintf(searchString,"%c%cname%c%s",(char)0, (char)1, (char)0, game);
...
if(pos = memmem(buffer,result,searchString,len))


Creating an array that only contains the letters from a string

So I'm trying to change the string \t\n into an array of all of the word characters in the string. The array I want would look like this: ["t","n"].
So far I've tried:
input = " \t\n"
array = input.scan(/\w/)
I've tried this regular expression on this string on rubular and it matches with all of the word characters as I'd like it to.
However, when using input.scan(/\w/) an empty array is returned.
Please forgive my ignorance as I'm still new to this, but why is this?
Here you go! You were really close.
input = " \t\n"
array = input.dump.scan(/\w/)
=> ["t", "n"]
The key is to use String#dump (see: https://ruby-doc.org/core-2.6.5/String.html#method-i-dump)
I am not familiar with Ruby, but you seem to be confused about escape sequences in string literals.
Per https://www.ruby-forum.com/t/new-line-in-string/176797
input = " \t\n"
Gives you a string with a space, tab, and newline.
You probably want to use single quotes to literally get the string you wrote:
input = ' \t\n'
If you sorely want to stick with double-quotes then I believe this would work:
input = " \\t\\n"
You should read https://blog.appsignal.com/2016/12/21/ruby-magic-escaping-in-ruby.html to learn more about escaping in Ruby strings. I would link you to the official docs, but my lack of Ruby experience translates to a lack of experience with the official docs.
As colleagues explained in the comments, the letters in the string "\t\n" are not ordinary letters but parts of escape sequences, so there is no direct way to pull them out of the string: \t is a single character.
With a normal string like tn you could do something like this:
"tn".split("")
and that gives you the array you want.
For special characters like those in the example, you could do something like this:
a = "\t\n".split("")
a.map! do |e|
  if e == "\t"
    "t"
  elsif e == "\n"
    "n"
  end
end
which, I believe, gives you the result you want.

Efficient way to find and then copy a substring in C

I just want to find a special sub-string in another string and save it as another string. Here is the code:
char sub[13]={};
char *ptr = sub;
char src[100] = "SOME DATE HERE CBC: 2345,23, SOME OTHER DATA";
// |----------|
ptr = strstr(src,"CBC:");
strncpy(sub,ptr,sizeof(sub)-1);
Is this an efficient way or does a better method exist for this?
Thanks.
Is this an efficient way or does a better method exist for this?
Pros:
Uses strstr(), which is likely more efficient and correct than coding your own search.
Cons:
Does not handle the case when strstr() returns NULL, resulting in undefined behavior (UB). @Paul Ogilvie
ptr = strstr(src,"CBC:");
// add test
if (ptr) {
// copy
} else {
// Handle not found, perhaps `sub[0] = '\0';`
}
char sub[13]={}; is not compliant C code before C23. @pmg. A full initialization of the array is not needed, even though it is a common good practice.
Code does not quite fulfill "want to find a special sub-string in another string and save it as another string". It's more like "want to find a special sub-string in another string and save it and more as another string".
strncpy(sub,ptr,sizeof(sub)-1) can unnecessarily fill most of the array with null characters. This is inefficient when ptr points to a string much shorter than sizeof(sub). Code could use strncat(), but that is tricky. See this good answer @AnT.
// alternative
char src[100] = "SOME DATE HERE CBC: 2345,23, SOME OTHER DATA";
char sub[13];
sub[0] = '\0';
const char *ptr = strstr(src, "CBC:");
if (ptr) {
    strncat(sub, ptr, sizeof sub - 1);
}
Unless this piece of code is in a critical path and it's the actual source of a performance bottleneck, then just stick to what you have. Default strstr implementation should be quite adequate for the task.
You can squeeze some peanuts by pre-computing the end of src, for example, so you could use memcpy (an unconditional loop) instead of strncpy (which has a conditional loop) when extracting the sub. If you know beforehand the substring you are searching for, you can optimize around that too, especially if it's exactly 4 characters. And so on and so forth.
But if you are after these peanuts, you might be (much) better off by redoing the code to not extract sub to begin with and use something like ranged strings to keep track of where it is in the source string.

C Trying to match the exact substring and nothing more

I have tried different functions including strtok(), strcmp() and strstr(), but I guess I'm missing something. Is there a way to match the exact substring in a string?
For example:
If I have a name: "Tan"
And I have 2 file names: "SomethingTan5346" and "nothingTangyrs634"
So how can I make sure that I match the first string and not both? Because the second file is for the person Tangyrs. Or is it impossible with this approach? Am I going at it the wrong way?
If, as seems to be the case, you just want to identify strings that have your text but are immediately followed by a digit, your best bet is probably to get yourself a good regular expression implementation and just search for Tan[0-9].
It could be done simply by using strstr() to find the string and then checking the character following that with isdigit(), but the actual code to do that would be:
not as easy as you think, since you may have to do multiple searches (e.g., TangoTangoTan42 would need three checks); and
inadvisable if there's a chance the searches may become more complex (such as Tan followed by 1-3 digits or exactly two # characters and an X).
A regular expression library will make this much easier, provided you're willing to invest a little effort into learning about it.
If you don't want to invest the time in learning regular expressions, the following complete test program should be a good starting point to evaluate a string based on the requirements in the first paragraph:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int hasSubstrWithDigit(char *lookFor, char *searchString) {
    // Cache length and set initial search position.
    size_t lookLen = strlen(lookFor);
    char *foundPos = searchString;

    // Keep looking for string until none left.
    while ((foundPos = strstr(foundPos, lookFor)) != NULL) {
        // If at end, no possibility of following digit.
        if (strlen(foundPos) == lookLen) return 0;

        // If followed by digit, return true.
        // (Cast needed: isdigit() is undefined for negative char values.)
        if (isdigit((unsigned char)foundPos[lookLen])) return 1;

        // Otherwise keep looking, from next character.
        foundPos++;
    }

    // Not found, return false.
    return 0;
}

int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("Usage: testprog <lookFor> <searchIn>...\n");
        return 1;
    }
    for (int i = 2; i < argc; ++i) {
        printf("Result of looking for '%s' in '%s' is %d\n", argv[1], argv[i], hasSubstrWithDigit(argv[1], argv[i]));
    }
    return 0;
}
Though, as you can see, it's not as elegant as a regex search, and is likely to become even less elegant if your requirements change :-)
Running that with:
./testprog Tan xyzzyTan xyzzyTan7 xyzzyTangy4 xyzzyTangyTan12
shows it in action:
Result of looking for 'Tan' in 'xyzzyTan' is 0
Result of looking for 'Tan' in 'xyzzyTan7' is 1
Result of looking for 'Tan' in 'xyzzyTangy4' is 0
Result of looking for 'Tan' in 'xyzzyTangyTan12' is 1
The solution depends on your definition of exact matching.
This might be useful for you:
Traverse all matches of the target substring.
C find all occurrences of substring
Finding all instances of a substring in a string
find the count of substring in string
https://cboard.cprogramming.com/c-programming/73365-how-use-strstr-find-all-occurrences-substring-string-not-only-first.html
etc.
Having the span of the match, verify that the previous and following characters match/do not match your criterion for "exact match".
Or,
You could take advantage of regex in C++ (I know the tag is "C"), with #include <regex>, or POSIX #include <regex.h>.
You may want to use strstr(3) to search a substring in a string, strchr(3) to search a character in a string, or even regular expressions with regcomp(3).
You should read more about parsing techniques, notably about recursive descent parsers. In some cases, sscanf(3) with %n can also be handy. You should take care of the return count.
You could loop to read then parse every line, perhaps using getline(3), see this.
You need first to document your input file format (or your file name conventions, if SomethingTan5346 is some file path), perhaps using EBNF notation.
(you probably want to combine several approaches I am suggesting above)
BTW, I recommend limiting (for your convenience) file paths to a restricted set of characters. For example using * or ; or spaces or tabs in file paths is possible (see path_resolution(7)) but should be frowned upon.

c find exact word from string starting with "

I am doing an exercise in C for my C programming course. I have to read data from a text file into a linked list and look for matches, then print the result out.
Example of the text file:
"Apple/Orange",1
"Banana/Watermelon/Lemon",2
"Watermelon/Strawberry",3
"Orange/Grape/Watermelon",4
"Blueberry", 5
I stored them in my linked list by using fgets(), sscanf() and a void function, so each string starts with a quotation mark.
The problem is that when I tried to use strncmp() to find a word in the string, it didn't work because of the quotation mark.
I did something like:
void findFruits(List *list){
    Node *position = list->first;
    while(position != NULL){
        if(strncmp(position->fruits, "Watermelon", 10)==0){
            printf("%s, %d\n", position->fruits, position->number);
        }
        position = position->next;
    }
}
I literally have no clue for finding an exact word from the string which is beginning with a quotation mark, any help would be appreciated, thanks.
Solved now, thanks to Barmar's idea. It worked perfectly when I used strstr() instead of strncmp().
if(strstr(position->fruits, "Watermelon")){
printf("%s, %d\n", position->fruits, position->number);
}
If you want to search for a single fruit word, for instance Watermelon in the string Banana/Watermelon/Lemon, you can't compare the two strings directly. Instead, split the string on the separator / and compare each word between two separators with the fruit you are looking for; alternatively, walk through the string and compare it against your fruit character by character.

What's the easiest way to parse a string in C?

I have to parse this string in C:
XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n
And be able to get the 207.46.106.118 part and 1863 part (the first ip address).
I know I could go char by char and eventually find my way through it, but what's the easiest way to get this information, given that the IP address in the string could change to a different format (with less digits)?
You can use sscanf() from the C standard lib. Here's an example of how to get the ip and port as strings, assuming the part in front of the address is constant:
#include <stdio.h>

int main(void)
{
    const char *input = "XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n";
    const char *format = "XFR 3 NS %15[0-9.]:%5[0-9]";
    char ip[16] = { 0 }; // ip4 addresses have max len 15
    char port[6] = { 0 }; // port numbers are 16bit, ie 5 digits max

    if(sscanf(input, format, ip, port) != 2)
        puts("parsing failed");
    else printf("ip = %s\nport = %s\n", ip, port);

    return 0;
}
The important parts of the format strings are the scanset patterns %15[0-9.] and %5[0-9], which will match a string of at most 15 characters composed of digits or dots (ie ip addresses won't be checked for well-formedness) and a string of at most 5 digits respectively (which means invalid port numbers above 2^16 - 1 will slip through).
Depends on what defines the format of the document. In this case, it may be as simple as tokenizing the string and looking through the tokens for what you want. Simply use strtok and split on spaces to grab the 207.46.106.118:1863 and then you can tokenize that again (or simply scan for the : manually) to get the proper components.
You could use strtok to tokenize breaking on space, or you could use one of the scanf family to pull out data as well.
There is a big caveat in all of this though, these are functions that are notorious for security and mishandling bad input. YMMV.
Loop through until you get the first '.', and loop back until you find ' '. Then loop forward until you find ':', building sub-strings every time you meet '.' or ':'. You can check the number of substrings and their lengths as simple error checking. Then loop until you find a ' ' and you have the 1863 part.
This would be robust if the beginning of the string doesn't vary much. And also very easy. You could make it even simpler if the string always begins with "XFR 3 NS ".
In this case, strtok() is trivial to use and would be my choice. For safety, you might count the ':' in your string and proceed only if there is exactly one ':'.
If the strings to be parsed are well-formatted then I'd go with Daniel and Ukko's suggestion to use strtok().
A word of warning though: strtok() modifies the string that it parses. Not always what you want.
This may be overkill, since you said you didn't want to use a regex library, but the re2c program will give you regex parsing without the library: it generates the DFSM for a regular expression as C code. The regexps are specified in comments embedded in C code.
And what seems like overkill now may become a comfort to you later should you have to parse the rest of the string; it is a lot easier to modify a few regexps to adjust or add new syntax than to modify a bunch of ad hoc tokenizing code. And it makes the structure of what you are parsing a lot clearer in your code.
