Compare two char arrays without CR LF


I would like to use the following function to compare two char arrays:
if(strcmp((PtrTst->cDatVonCom),szGeraeteAntwort)==0)
Now my problem is that PtrTst->cDatVonCom[5000] has a different size than szGeraeteAntwort[255], and the contents look slightly different
(excerpt from the log file).
PtrTst->cDatVonCom:
04/16/19 12:53:36 AB A{CR}{LF}
0 0{CR}{LF}
szGeraeteAntwort:
04/16/19 12:53:36 AB A 0 0{CR}{LF}
Could I check if the command (in this case AB A) is the same in both?
The command can change, and it must be the same in both for the if statement to pass.
UPDATE:
Both char arrays are always present, and I need to check whether szGeraeteAntwort is contained in PtrTst->cDatVonCom.
In C# I would use cDatVonCom.Contains(...) or something similar to check whether they are the same.

You have two strings whose logical content you want to compare, but their literal presentation may vary. In particular, there may be CR/LF line termination sequences inserted into one or both, which are not significant for the purposes of the comparison. There are many ways to approach this kind of problem, but a common one is to define a unique canonical form for your strings, convert both strings to that form, and compare the results. In this case, the canonical form would presumably be one without any CR or LF characters.
The most general way to approach this is to create canonicalized copies of your strings. This accounts for the case where you cannot modify the strings in-place. For example:
/*
* src - the source string
* dest - a pointer to the first element of an array that should receive the result.
* dest_size - the capacity of the destination buffer
* Returns 0 on success, -1 if the destination array has insufficient capacity
*/
int create_canonical_copy(const char src[], char dest[], size_t dest_size) {
static const char to_ignore[] = "\r\n";
const char *start = src;
size_t dest_length = 0;
int rval = 0;
while (*start) {
size_t segment_length = strcspn(start, to_ignore);
if (dest_length + segment_length + 1 > dest_size) { /* segment plus terminator must fit */
rval = -1;
break;
}
memcpy(dest + dest_length, start, segment_length);
dest_length += segment_length;
start += segment_length;
start += strspn(start, to_ignore);
}
dest[dest_length] = '\0';
return rval;
}
You might use that like so:
char tmp1[255], tmp2[255];
if (create_canonical_copy(PtrTst->cDatVonCom, tmp1, 255) != 0) {
// COMPARISON FAILS: cDatVonCom has more non-CR/LF data than szGeraeteAntwort
// can even accommodate
return -1;
} else if (create_canonical_copy(szGeraeteAntwort, tmp2, 255) != 0) {
// should not happen, given that szGeraeteAntwort's capacity is the same as tmp2's.
// If it does, then szGeraeteAntwort must not be properly terminated
assert(0);
return -1;
} else {
return strcmp(tmp1, tmp2);
}
That assumes you are comparing the strings for equality only. If you were comparing them for order as well, you could still use this approach, but you would need to be more careful about canonicalizing as much data as the destination can accommodate, and about properly handling the data-too-large case.
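The question's update asks for a containment check rather than strict equality. A minimal sketch of that variant follows, assuming that silently truncating over-long input is acceptable and that the buffer sizes from the question apply. The names copy_without_crlf and contains_ignoring_crlf are hypothetical and not part of the answer above.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dest, dropping every CR and LF. Input that does not fit is
 * truncated. Returns the number of characters written, not counting the
 * terminator. */
size_t copy_without_crlf(const char *src, char *dest, size_t dest_size) {
    size_t n = 0;
    if (dest_size == 0) {
        return 0;
    }
    for (; *src != '\0' && n + 1 < dest_size; ++src) {
        if (*src != '\r' && *src != '\n') {
            dest[n++] = *src;
        }
    }
    dest[n] = '\0';
    return n;
}

/* Returns 1 if needle, with CR/LF removed, occurs in haystack with CR/LF
 * removed, and 0 otherwise. */
int contains_ignoring_crlf(const char *haystack, const char *needle) {
    char h[5000]; /* assumed: sized after cDatVonCom in the question */
    char n[255];  /* assumed: sized after szGeraeteAntwort in the question */
    copy_without_crlf(haystack, h, sizeof h);
    copy_without_crlf(needle, n, sizeof n);
    return strstr(h, n) != NULL;
}
```

Because both inputs are canonicalized first, strstr plays the role that Contains would play in C#.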

A function that compares the strings while skipping over some characters could be used.
#include <stdio.h>
#include <string.h>
int strcmpskip ( char *match, char *against, char *skip) {
if ( ! match && ! against) { //both are NULL
return 0;
}
if ( ! match || ! against) {//one is NULL
return 1;
}
while ( *match && *against) {//both are not zero
while ( skip && strchr ( skip, *match)) {//skip not NULL and *match is in skip
match++;
if ( ! *match) {//zero
break;
}
}
while ( skip && strchr ( skip, *against)) {//skip not NULL and *against is in skip
against++;
if ( ! *against) {//zero
break;
}
}
if ( *match != *against) {
break;
}
if ( *match) {//not zero
match++;
}
if ( *against) {//not zero
against++;
}
}
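//skip any trailing characters to ignore, so that "abc\r\n" also matches "abc"
while ( skip && *match && strchr ( skip, *match)) {
match++;
}
while ( skip && *against && strchr ( skip, *against)) {
against++;
}
return *match - *against;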
}
int main( void) {
char line[] = "04/16/19 12:53:36 AB A\r\n 0 0\r\n";
char text[] = "04/16/19 12:53:36 AB A 0 0\r\n";
char ignore[] = "\n\r";
if ( strcmpskip ( line, text, ignore)) {
printf ( "do not match\n");
}
else {
printf ( "match\n");
}
return 0;
}

There are several things you can do; here are two:
Parse both strings (e.g. using sscanf() or something fancier), ignoring the newlines during parsing. You will then have the different fields (or an indication that one of the lines can't be parsed properly, which is an error anyway). Then you can compare the commands.
Use a regular expression matcher on those two strings, to obtain just the command while ignoring everything else (treating CR and LF as newline characters essentially), and compare the commands. Of course you'll need to write an appropriate regular expression.
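The first suggestion could be sketched as follows. This assumes that the command is always the third and fourth whitespace-separated tokens of the line (as in "04/16/19 12:53:36 AB A") and that each token is at most 15 characters long; extract_command and same_command are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the command from a line such as "04/16/19 12:53:36 AB A\r\n 0 0\r\n".
 * Assumption: the command is the third and fourth whitespace-separated tokens.
 * cmd must have room for at least 32 bytes.
 * Returns 0 on success, -1 if the line cannot be parsed. */
int extract_command(const char *line, char *cmd) {
    char first[16], second[16];
    /* %*s reads and discards a token. %s skips leading whitespace,
     * which includes CR and LF. */
    if (sscanf(line, "%*s %*s %15s %15s", first, second) != 2) {
        return -1;
    }
    sprintf(cmd, "%s %s", first, second); /* at most 15 + 1 + 15 + 1 bytes */
    return 0;
}

/* Returns 1 if both lines parse and carry the same command, 0 otherwise. */
int same_command(const char *a, const char *b) {
    char cmd_a[32], cmd_b[32];
    if (extract_command(a, cmd_a) != 0 || extract_command(b, cmd_b) != 0) {
        return 0;
    }
    return strcmp(cmd_a, cmd_b) == 0;
}
```

Only the command is compared here; the date, time, and trailing values are ignored, which may or may not be what the comparison needs.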

Related

Reading from a CSV file and separating the fields to store in a struct in C

I am trying to read from a CSV file and store each field in a variable inside a struct. I am using fgets and strtok to separate the fields. However, I cannot handle a special field that contains a comma inside the field.
typedef struct {
char name[20+1];
char surname[20+1];
char uniqueId[10+1];
char address[150+1];
} employee_t;
void readFile(FILE *fp, employee_t *employees[]){
int i=0;
char buffer[205];
char *tmp;
while (fgets(buffer,205,fp) != NULL) {
employee_t *new = (employee_t *)malloc(sizeof(*new));
tmp = strtok(buffer,",");
strcpy(new->name,tmp);
tmp = strtok(buffer,",");
strcpy(new->surname,tmp);
tmp = strtok(buffer,",");
strcpy(new->uniqueId,tmp);
tmp = strtok(buffer,",");
strcpy(new->address,tmp);
employees[i++] = new;
free(new);
}
}
The inputs are as follows:
Jim,Hunter,9239234245,"8/1 Hill Street, New Hampshire"
Jay,Rooney,92364434245,"122 McKay Street, Old Town"
Ray,Bundy,923912345,NOT SPECIFIED
I tried printing the tokens with this code and I get this:
Jim
Hunter
9239234245
"8/1 Hill Street
New Hampshire"
I am not sure how to handle the address field, since some of them might have a comma inside them. I tried reading character by character but not sure how to insert the strings in the struct using a single loop. Can someone help me with some ideas on how to fix this?

strcspn can be used to find either double quotes or double quote plus comma.
The original string is not modified, so string literals can be used.
The position of the double quotes is not significant. They can be in any field.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main( void) {
char *string[] = {
"Jim,Hunter,9239234245,\"8/1 Hill Street, New Hampshire\""
, "Jay,Rooney,92364434245,\"122 McKay Street, Old Town\""
, "Ray,Bundy,923912345,NOT SPECIFIED"
, "Ray,Bundy,\" double quote here\",NOT SPECIFIED"
};
for ( int each = 0; each < 4; ++each) {
char *token = string[each];
char *p = string[each];
while ( *p) {
if ( '\"' == *p) {//at a double quote
p += strcspn ( p + 1, "\"");//advance to next double quote
p += 2;//to include the opening and closing double quotes
}
else {
p += strcspn ( p, ",\"");//advance to a comma or double quote
}
int span = ( int)( p - token);
if ( span) {
printf ( "token:%.*s\n", span, token);//print span characters
//copy to another array
}
if ( *p) {//not at terminating zero
++p;//do not skip consecutive delimiters
token = p;//start of next token
}
}
}
return 0;
}
EDIT: copy to variables
A counter can be used to keep track of fields as they are processed.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZENAME 21
#define SIZEID 11
#define SIZEADDR 151
typedef struct {
char name[SIZENAME];
char surname[SIZENAME];
char uniqueId[SIZEID];
char address[SIZEADDR];
} employee_t;
int main( void) {
char *string[] = {
"Jim,Hunter,9239234245,\"8/1 Hill Street, New Hampshire\""
, "Jay,Rooney,92364434245,\"122 McKay Street, Old Town\""
, "Ray,Bundy,923912345,NOT SPECIFIED"
, "Ray,Bundy,\"quote\",NOT SPECIFIED"
};
employee_t *employees = malloc ( sizeof *employees * 4);
if ( ! employees) {
fprintf ( stderr, "problem malloc\n");
return 1;
}
for ( int each = 0; each < 4; ++each) {
char *token = string[each];
char *p = string[each];
int field = 0;
while ( *p) {
if ( '\"' == *p) {
p += strcspn ( p + 1, "\"");//advance to a delimiter
p += 2;//to include the opening and closing double quotes
}
else {
p += strcspn ( p, ",\"");//advance to a delimiter
}
int span = ( int)( p - token);
if ( span) {
++field;
if ( 1 == field) {
if ( span < SIZENAME) {
strncpy ( employees[each].name, token, span);
employees[each].name[span] = 0;
printf ( "copied:%s\n", employees[each].name);//print span characters
}
}
if ( 2 == field) {
if ( span < SIZENAME) {
strncpy ( employees[each].surname, token, span);
employees[each].surname[span] = 0;
printf ( "copied:%s\n", employees[each].surname);//print span characters
}
}
if ( 3 == field) {
if ( span < SIZEID) {
strncpy ( employees[each].uniqueId, token, span);
employees[each].uniqueId[span] = 0;
printf ( "copied:%s\n", employees[each].uniqueId);//print span characters
}
}
if ( 4 == field) {
if ( span < SIZEADDR) {
strncpy ( employees[each].address, token, span);
employees[each].address[span] = 0;
printf ( "copied:%s\n", employees[each].address);//print span characters
}
}
}
if ( *p) {//not at terminating zero
++p;//do not skip consecutive delimiters
token = p;//start of next token
}
}
}
free ( employees);
return 0;
}

In my view, this kind of problem calls for a "proper" tokenizer, perhaps based on a finite state machine (FSM). In this case you'd scan the input string character by character, assigning each character to a class. The tokenizer would start in a particular state and, according to the class of the character read, it might stay in the same state, or move to a new state. That is, the state transitions are controlled by the combination of the current state and the character under consideration.
For example, if you read a double-quote in the starting state, you transition to the "in a quoted string" state. In that state, the comma would not cause a transition to a new state -- it would just get added to the token you're building. In any other state, the comma would have a particular significance, as denoting the end of a token. You'd have to figure out when you needed to swallow additional whitespace between tokens, whether there was some "escape" that allowed a double-quote to be used in some other token, whether you could escape the end-of-line to make longer lines, and so on.
The important point is that, if you implement this as an FSM (or another, real tokenizer) you actually can consider all these things, and implement them as you need. If you use ad-hoc applications of strtok() and string searching, you can't -- not in an elegant, maintainable way, anyway.
And if, one day, you end up needing to do the whole job using wide characters, that's easy -- just convert the input into wide characters and iterate it one wide character (not byte) at a time.
It's easy to document the behaviour of an FSM parser using a state transition diagram -- easier, at least, than trying to explain it by documenting code in text.
My experience is that the first time somebody implements an FSM tokenizer, it's horrible. After that, it's easy. And you can use the same technique to parse input of much greater complexity when you know the method.
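To make the idea concrete, here is a minimal sketch of such a state machine for one CSV line. It rests on several assumptions that are not in the answer: quotes are removed from the stored field, a doubled quote inside a quoted field stands for one literal quote, characters that do not fit are dropped, and the names split_csv, MAX_FIELD, and the three states are invented for illustration.

```c
#include <stddef.h>

#define MAX_FIELD 151 /* assumed: matches the largest field in the question */

enum state { UNQUOTED, QUOTED, QUOTE_SEEN };

/* Splits one CSV line into at most max_fields fields and returns the number
 * of fields stored. Characters that do not fit into a field are dropped. */
int split_csv(const char *line, char fields[][MAX_FIELD], int max_fields) {
    enum state st = UNQUOTED;
    int count = 0;
    size_t len = 0;

    if (max_fields <= 0) {
        return 0;
    }
    for (;; ++line) {
        char c = *line;
        int store = 0;  /* append c to the current field? */
        int finish = 0; /* terminate the current field? */
        int stop = 0;   /* end of input reached? */

        if (st == QUOTE_SEEN && c != '"') {
            st = UNQUOTED; /* the previous quote closed the quoted part */
        }
        switch (st) {
        case QUOTE_SEEN: /* "" inside quotes is one literal quote */
            st = QUOTED;
            store = 1;
            break;
        case QUOTED: /* commas are ordinary characters here */
            if (c == '"') {
                st = QUOTE_SEEN;
            } else if (c == '\0') {
                finish = stop = 1; /* unterminated quote: keep what we have */
            } else {
                store = 1;
            }
            break;
        case UNQUOTED:
            if (c == '"') {
                st = QUOTED;
            } else if (c == ',') {
                finish = 1;
            } else if (c == '\0' || c == '\r' || c == '\n') {
                finish = stop = 1;
            } else {
                store = 1;
            }
            break;
        }
        if (store && len + 1 < MAX_FIELD) {
            fields[count][len++] = c;
        }
        if (finish) {
            fields[count++][len] = '\0';
            len = 0;
            if (stop || count == max_fields) {
                return count;
            }
        }
    }
}
```

Every decision depends only on the current state and the current character, so a new rule (for example, trimming whitespace between tokens) becomes one more state or transition rather than another special case.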

Split string by one of few delimiters? [closed]

I have strings like this:
A > B, or A < B, or A==B
Using strtok would destroy the data. My goal is to get some kind of structure in which I can:
examine what kind of delimiter I had
get access to both sides of it (A and B).
so:
if ( > )
do something with A and B
else if (==)
do something with A and B
I know it sounds simple, but it always turns out to be cumbersome.
EDIT:
This is what I did; it seems too long for the task:
for (int k=1;k<strlen(p);k++)
{
char left[4]="" ;
char right[12]="" ;
switch(p[k])
{
case '>' :
{
long num =strstr(p,">") - p ;
strncpy(left,p,num);
strncpy(right,p+num+1,strlen(p)-num-1);
break;
}
case '<' :
{
long num =strstr(p,"<") - p ;
strncpy(left,p,num);
strncpy(right,p+num+1,strlen(p)-num-1);
break;
}
case '=' :
{
long num =strstr(p,"=") - p ;
strncpy(left,p,num);
strncpy(right,p+num+1,strlen(p)-num-1);
break;
}
case '!' :
{
long num =strstr(p,"!") - p ;
strncpy(left,p,num);
strncpy(right,p+num+1,strlen(p)-num-1);
break;
}
default :
{}
}
}

For simple situations where you just want to parse simple strings consisting of two operands and one operator, with no nested "expressions", this might work:
#include <stdio.h>
#include <string.h>
int
main(void)
{
const char *string = "A > B";
char lho[100];
char op[3];
char rho[100];
if (sscanf(string, "%99[^=><]%2[=><]%99[^=><]", lho, op, rho) == 3) {
fprintf(stdout, "left hand operand: %s\n", lho);
fprintf(stdout, "operator: %s\n", op);
fprintf(stdout, "right hand operand: %s\n", rho);
}
return 0;
}
This is by no means the best way to do it; it just shows that sscanf can be used for this. I didn't think about it much and wrote the code only to show a possible solution. I don't actually like it, and I wouldn't use it myself.

Here is a generalized procedure:
1. For a given set of delimiters, use strstr to check whether each one appears in the input string. As a bonus, my code below allows 'double' entries such as < and <>; it checks all of them and uses the longest possible.
2. After determining the best delimiter to use, you have a pointer to its start. Then you can
3. copy everything to its left into a left variable;
4. copy the delimiter itself into a delim variable (for consistency); and
5. copy everything to the right of the delimiter into a right variable.
Point 4 is 'for consistency' with the other two variables. You could also create an enumeration (LESS, EQUALS, MORE, NOT_EQUAL (in my example)) and return that instead, because the set of possibilities is limited to these.
In code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
const char *delimiters[] = {
"<", ">", "==", "<>", NULL
};
int split_string (const char *input, char **dest_left, char **dest_delim, char **dest_right)
{
int iterator;
int best_fit_delim;
char *ptr;
/* (optionally) clean whitespace at start */
while (isspace(*input))
input++;
/* look for the longest delimiter we can find */
best_fit_delim = -1;
iterator = 0;
while (delimiters[iterator])
{
ptr = strstr (input, delimiters[iterator]);
if (ptr)
{
if (best_fit_delim == -1 || strlen(delimiters[iterator]) > strlen(delimiters[best_fit_delim]))
best_fit_delim = iterator;
}
iterator++;
}
/* did we find anything? */
if (best_fit_delim == -1)
return 0;
/* reset ptr to this found one */
ptr = strstr (input, delimiters[best_fit_delim]);
/* copy left hand side */
iterator = ptr - input;
/* clean whitespace at end */
while (iterator > 0 && isspace(input[iterator-1]))
iterator--;
*dest_left = malloc (iterator + 1);
memcpy (*dest_left, input, iterator);
(*dest_left)[iterator] = 0;
/* the delimiter itself */
*dest_delim = malloc(strlen(delimiters[best_fit_delim])+1);
strcpy (*dest_delim, delimiters[best_fit_delim]);
/* update the pointer to point to *end* of delimiter */
ptr += strlen(delimiters[best_fit_delim]);
/* skip whitespace at start */
while (isspace(*ptr))
ptr++;
/* copy right hand side */
*dest_right = malloc (strlen(ptr) + 1);
strcpy (*dest_right, ptr);
return 1;
}
int main (void)
{
char *source_str = "A <> B";
char *left, *delim, *right;
if (!split_string (source_str, &left, &delim, &right))
{
printf ("invalid input\n");
} else
{
printf ("left: \"%s\"\n", left);
printf ("delim: \"%s\"\n", delim);
printf ("right: \"%s\"\n", right);
free (left);
free (delim);
free (right);
}
return 0;
}
resulting, for A <> B, in
left: "A"
delim: "<>"
right: "B"
The code can be a bit smaller if you only need to check your list of <, ==, and >; then you can use strchr, for single characters (and if = is found, check the next character). You can also forget the best_fit length check, as there can be only one that fits.
The code removes whitespace only around the comparison operator. For consistency, you may want to remove all whitespace at the start and end of the input; then, invalid input can be detected by the returned left or right variables having a length of 0, i.e., they only contain the 0 string terminator. You still need to free those zero-length strings.
For fun, you can add "GT","LT","GE","LE" to the delimiters and see how it does on strings such as A GT B, ALLEQUAL, and FAULTY<MATCH.
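The smaller variant mentioned above, combined with the enumeration idea, might look like the sketch below. It uses strpbrk rather than repeated strchr calls (a substitution, not what the answer names), handles only <, > and ==, and the names op_kind and find_operator are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { OP_NONE, OP_LESS, OP_MORE, OP_EQUALS } op_kind;

/* Finds the first of <, > or == in input. On success, stores where the
 * operator starts and how long it is, and returns its kind. On failure,
 * returns OP_NONE and leaves *start and *length untouched. */
op_kind find_operator(const char *input, const char **start, size_t *length) {
    const char *p = strpbrk(input, "<>=");
    if (p == NULL) {
        return OP_NONE;
    }
    if (*p == '=' && p[1] != '=') {
        return OP_NONE; /* a lone '=' is not an operator here */
    }
    *start = p;
    *length = (*p == '=') ? 2 : 1;
    if (*p == '<') {
        return OP_LESS;
    }
    return (*p == '>') ? OP_MORE : OP_EQUALS;
}
```

The caller can then copy input up to start as the left side and everything from start + length onward as the right side, trimming whitespace as in the longer version.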

Converting long linear data into data in C by modifying indexes in array

I'm trying to organize chunks of an apache log file into an array. For example, assume my apache file has a line like this:
[a] [b] [ab] [abc] file not found: /something
What I want to achieve is an array (let's name it ext) so that:
ext[0] = a
ext[1] = b
ext[2] = ab
ext[3] = abc
I then reserve enough space for 20 entries at 5000 characters each via:
char ext[20][5000];
Then I attempt to call my extraction function as follows:
extract("[a] [b] [ab] [abc]",18,ext);
Ideally, the string is replaced with the variable holding the data and the 18 is replaced with the variable showing the actual string size, but I'm using this data as an example.
The extract function won't compile.
It's complaining that in:
char s[20][5000]=*extr,*p,*l=longstring;
there's an invalid initializer. I'm guessing s[20][5000]=*extr is it, but I'm trying to initialize a character array with index values then I want to pass it onto the function caller
It then complains:
warning: passing argument 3 of 'extract' from incompatible pointer type
Am I forced to strictly use pointers and mathematics to calculate offsets or is there a way to pass actual char array with the ability to modify them using index values like I tried to do?
long extract(char* longstring,long sz,char **extr){
unsigned long sect=0,si=0,ssi=0;
char s[20][5000]=*extr,*p,*l=longstring;
while (sz-- > 0){
if (*l=='['){sect=1;p=s[si++];if (si > 20){break;}}
if (*l==']'){sect=0;}else{
if (sect==1){*p++=*l;}
}
l++;
}
}
UPDATE:
As per suggested, I made minor changes and my code is now as follows:
Mainline:
char ext[20][5000];
extract("[a] [b] [ab] [abc]",18,(char**)ext);
printf("%s\n",ext);
return 0;
Function:
long extract(char* longstring, long sz, char **extr) {
unsigned long sect = 0, si = 0, ssi = 0;
char **s = extr, *p, *l = longstring;
while (sz-- > 0) {
if (*l == '[') {
sect = 1;
p = s[si++];
if (si > 20) {
break;
}
}
if (*l == ']') {
sect = 0;
} else {
if (sect == 1) {
*p++ = *l;
}
}
l++;
}
}
And now I receive a segmentation fault. I'm not sure why when I set the offset of one string via p=s[si++] and then incremented it as I add data. I even changed p=s[si++] to p=s[si++][0] in an attempt to specifically want the address of the first character of a particular index but then the compiler shows "warning: assignment makes pointer from integer without a cast".

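This uses a scanset, %[], to parse the string. The scan skips leading whitespace and then scans a [. Then the scanset reads characters that are not a ]. Finally a ] is scanned. The %n specifier reports the number of characters processed, and that is added to offset to advance through the string. The 4999 prevents writing more characters than each char[5000] element can hold. The parameter is declared as char (*extr)[5000] because a char[20][5000] array decays to a pointer to char[5000], not to a char **; casting it to char ** is what led to the segmentation fault in the question.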
#include <stdio.h>
#include <stdlib.h>
int extract ( char* longstring,char (*extr)[5000]) {
int used = 0;
int offset = 0;
int si = 0;
while ( ( sscanf ( longstring + offset, " [%4999[^]]]%n", extr[si], &used)) == 1) {
//one item successfully scanned
si++;
offset += used;
if ( si >= 20) {//all 20 rows are used; stop before writing to extr[20]
break;
}
}
return si;
}
int main( int argc, char *argv[])
{
char ext[20][5000];
int i = 0;
int result = 0;
result = extract("[a] [b] [ab] [abc]", ext);
for ( i = 0; i < result; i++) {
printf("ext[%d] %s\n",i,ext[i]);
}
return 0;
}

Appending a char to a char* in C?

I'm trying to make a quick function that gets a word/argument in a string by its number:
char* arg(char* S, int Num) {
char* Return = "";
int Spaces = 0;
int i = 0;
for (i; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
//Want to append S[i] to Return here.
}
else if (Spaces > Num) {
return Return;
}
}
printf("%s-\n", Return);
return Return;
}
I can't find a way to put the characters into Return. I have found lots of posts that suggest strcat() or tricks with pointers, but every one segfaults. I've also seen people saying that malloc() should be used, but I'm not sure how I'd use it in a loop like this.

I will not claim to understand what it is that you're trying to do, but your code has two problems:
You're assigning a read-only string to Return; that string will be in your binary's data section, which is read-only, and if you try to modify it you will get a segfault.
Your for loop is O(n^2), because strlen() is O(n)
There are several different ways of solving the "how to return a string" problem. You can, for example:
Use malloc() / calloc() to allocate a new string, as has been suggested
Use asprintf(), which is similar but gives you formatting if you need
Pass an output string (and its maximum size) as a parameter to the function
The first two require the calling function to free() the returned value. The third allows the caller to decide how to allocate the string (stack or heap), but requires some sort of contract about the minimum size needed for the output string.

In your code, Return points to a string literal, and modifying a string literal is undefined behavior. It might appear to work, but you should never rely on it.
Typically in C, you'd want to pass the "return" string as an argument instead, so that you don't have to free it all the time. Both require a local variable on the caller's side, but malloc'ing it will require an additional call to free the allocated memory and is also more expensive than simply passing a pointer to a local variable.
As for appending to the string, just use array notation (keep track of the current char/index) and don't forget to add a null character at the end.
Example:
int arg(char* ptr, char* S, int Num) {
int i, Spaces = 0, cur = 0;
for (i=0; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
ptr[cur++] = S[i]; // append char
}
else if (Spaces > Num) {
ptr[cur] = '\0'; // insert null char
return 0; // returns 0 on success
}
}
ptr[cur] = '\0'; // insert null char
return (cur > 0 ? 0 : -1); // returns 0 on success, -1 on error
}
Then invoke it like so:
char myArg[50];
if (arg(myArg, "this is an example", 3) == 0) {
printf("arg is %s\n", myArg);
} else {
// arg not found
}
Just make sure you don't overflow ptr (e.g.: by passing its size and adding a check in the function).
There are a number of ways you could improve your code, but let's just start by making it meet the standard. ;-)
P.S.: Don't malloc unless you need to. And in this case you don't.
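The size check suggested above could be added roughly like this. It is a sketch under assumptions: words are separated by single spaces, word numbers are 0-based as in the example, and the name arg_n and the out_size parameter are hypothetical. It also walks the string once instead of calling strlen() on every iteration.

```c
#include <stddef.h>

/* Copies word number num (0-based, words separated by single spaces) of s
 * into out, which can hold out_size bytes.
 * Returns 0 on success, -1 if the word is missing or does not fit. */
int arg_n(char *out, size_t out_size, const char *s, int num) {
    int spaces = 0;
    size_t cur = 0;

    if (out_size == 0) {
        return -1;
    }
    /* Stop at the terminator or as soon as the wanted word has ended. */
    for (; *s != '\0' && spaces <= num; ++s) {
        if (*s == ' ') {
            ++spaces;
        } else if (spaces == num) {
            if (cur + 1 >= out_size) { /* keep one byte for the terminator */
                out[0] = '\0';
                return -1;
            }
            out[cur++] = *s; /* append char */
        }
    }
    out[cur] = '\0';
    return cur > 0 ? 0 : -1;
}
```

A call such as arg_n(myArg, sizeof myArg, "this is an example", 3) would then replace the call in the snippet above, with the buffer size travelling alongside the buffer.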

char * Return; //by the way horrible name for a variable.
Return = malloc(<some size>);
......
......
*(Return + index) = *(S+i);

You can't assign anything to a string literal such as "".
You may want to use your loop to determine the offset of the start of the word you're looking for in your string. Then find its length by continuing through the string until you encounter the end or another space. Then, you can malloc an array of chars with size equal to the length of the word + 1 (for the null terminator). Finally, copy the substring into this new buffer and return it.
Also, as mentioned above, you may want to remove the strlen call from the loop condition. Some compilers can hoist it out, but otherwise it is a linear operation performed for every character in the array, making the loop O(n**2).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *arg(const char *S, unsigned int Num) {
char *Return = "";
const char *top, *p;
unsigned int Spaces = 0;
int i = 0;
Return=(char*)malloc(sizeof(char));
*Return = '\0';
if(S == NULL || *S=='\0') return Return;
p=top=S;
while(Spaces != Num){
if(NULL!=(p=strchr(top, ' '))){
++Spaces;
top=++p;
} else {
break;
}
}
if(Spaces < Num) return Return;
if(NULL!=(p=strchr(top, ' '))){
int len = p - top;
Return=(char*)realloc(Return, sizeof(char)*(len+1));
strncpy(Return, top, len);
Return[len]='\0';
} else {
free(Return);
Return=strdup(top);
}
//printf("%s-\n", Return);
return Return;
}
int main(){
char *word;
word=arg("make a quick function", 2);//quick
printf("\"%s\"\n", word);
free(word);
return 0;
}

How to check if a string starts with another string in C?

Is there something like startsWith(str_a, str_b) in the standard C library?
It should take pointers to two strings that end with nullbytes, and tell me whether the first one also appears completely at the beginning of the second one.
Examples:
"abc", "abcdef" -> true
"abcdef", "abc" -> false
"abd", "abdcef" -> true
"abc", "abc" -> true

There's no standard function for this, but you can define
#include <stdbool.h>
#include <string.h>
bool prefix(const char *pre, const char *str)
{
return strncmp(pre, str, strlen(pre)) == 0;
}
We don't have to worry about str being shorter than pre because according to the C standard (7.21.4.4/2):
The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2.

Apparently there's no standard C function for this. So:
#include <stdbool.h>
#include <string.h>
bool startsWith(const char *pre, const char *str)
{
size_t lenpre = strlen(pre),
lenstr = strlen(str);
return lenstr < lenpre ? false : memcmp(pre, str, lenpre) == 0;
}
Note that the above is nice and clear, but if you're doing it in a tight loop or working with very large strings, it does not offer the best performance, as it scans the full length of both strings up front (strlen). Solutions like wj32's or Christoph's may offer better performance (although this comment about vectorization is beyond my ken of C). Also note Fred Foo's solution which avoids strlen on str (he's right, it's unnecessary if you use strncmp instead of memcmp). Only matters for (very) large strings or repeated use in tight loops, but when it matters, it matters.

I'd probably go with strncmp(), but just for fun a raw implementation:
_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
while(*prefix)
{
if(*prefix++ != *string++)
return 0;
}
return 1;
}

Use the strstr() function: stra == strstr(stra, strb)
Reference
The strstr() function finds the first occurrence of string2 in string1. The function ignores the null character (\0) that ends string2 in the matching process.
https://www.ibm.com/docs/en/i/7.4?topic=functions-strstr-locate-substring

I'm no expert at writing elegant code, but...
int prefix(const char *pre, const char *str)
{
char cp;
char cs;
if (!*pre)
return 1;
while ((cp = *pre++) && (cs = *str++))
{
if (cp != cs)
return 0;
}
if (!cs)
return 0;
return 1;
}

Optimized (v.2. - corrected):
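#include <stdint.h>
uint32_t startsWith( const void* prefix_, const void* str_ ) {
uint8_t _cp, _cs;
const uint8_t* _pr = (const uint8_t*) prefix_;
const uint8_t* _str = (const uint8_t*) str_;
//bitwise & is intentional: both reads happen on every iteration, and equal non-zero bytes always give a non-zero result
while ( ( _cs = *_str++ ) & ( _cp = *_pr++ ) ) {
if ( _cp != _cs ) return 0;
}
return !_cp;
}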

I noticed the following function definition in the Linux Kernel. It returns true if str starts with prefix, otherwise it returns false.
/**
* strstarts - does @str start with @prefix?
* @str: string to examine
* @prefix: prefix to look for.
*/
bool strstarts(const char *str, const char *prefix)
{
return strncmp(str, prefix, strlen(prefix)) == 0;
}

Because I ran the accepted version and had a problem with a very long str, I had to add in the following logic:
bool longEnough(const char *str, int min_length) {
int length = 0;
while (str[length] && length < min_length)
length++;
if (length == min_length)
return true;
return false;
}
bool startsWith(const char *pre, const char *str) {
size_t lenpre = strlen(pre);
return longEnough(str, lenpre) ? strncmp(str, pre, lenpre) == 0 : false;
}

Or a combination of the two approaches:
_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
const char * const prefix_end = prefix + 13;
while (1)
{
if ( 0 == *prefix )
return 1;
if ( *prefix++ != *string++)
return 0;
if ( prefix_end <= prefix )
return 0 == strncmp(prefix, string, strlen(prefix));
}
}
EDIT: The code below does NOT work because if strncmp returns 0 it is not known if a terminating 0 or the length (block_size) was reached.
An additional idea is to compare block-wise. If the block is not equal compare that block with the original function:
_Bool starts_with_big(const char *restrict string, const char *restrict prefix)
{
size_t block_size = 64;
while (1)
{
if ( 0 != strncmp( string, prefix, block_size ) )
return starts_with( string, prefix);
string += block_size;
prefix += block_size;
if ( block_size < 4096 )
block_size *= 2;
}
}
The constants 13, 64, 4096, as well as the exponentiation of the block_size are just guesses. It would have to be selected for the used input data and hardware.

I use this macro:
#define STARTS_WITH(string_to_check, prefix) (strncmp(string_to_check, prefix, ((sizeof(prefix) / sizeof(prefix[0])) - 1)) ? 0:((sizeof(prefix) / sizeof(prefix[0])) - 1))
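It returns the prefix length if the string starts with the prefix, and 0 otherwise. This length is evaluated at compile time (sizeof), so there is no runtime overhead. Note that prefix must be a string literal or a char array; with a char pointer, sizeof would yield the pointer size instead.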
