Efficient way to find and then copy a substring in C

I just want to find a special sub-string in another string and save it as another string. Here is the code:
char sub[13]={};
char *ptr = sub;
char src[100] = "SOME DATE HERE CBC: 2345,23, SOME OTHER DATA";
// |----------|
ptr = strstr(src,"CBC:");
strncpy(sub,ptr,sizeof(sub)-1);
Is this an efficient way or does a better method exist for this?
Thanks.

Is this an efficient way or does a better method exist for this?
Pros:
Uses strstr(), which is likely more efficient and correct than coding your own search.
Cons:
Does not handle the case when strstr() returns NULL, resulting in undefined behavior (UB). @Paul Ogilvie
ptr = strstr(src,"CBC:");
// add test
if (ptr) {
// copy
} else {
// Handle not found, perhaps `sub[0] = '\0';`
}
char sub[13]={}; is not compliant C code. @pmg. A full initialization of the array is not needed, even though it is a common good practice.
Code does not quite fulfill "want to find a special sub-string in another string and save it as another string". It's more like "want to find a special sub-string in another string and save it and more as another string".
strncpy(sub,ptr,sizeof(sub)-1) can unnecessarily fill most of the array with null characters. This is inefficient when ptr points to a string much shorter than sizeof(sub). Code could use strncat() but that is tricky. See this good answer @AnT.
// alternative
char src[100] = "SOME DATE HERE CBC: 2345,23, SOME OTHER DATA";
char sub[13];
sub[0] = '\0';
const char *ptr = strstr(src, "CBC:");
if (ptr) {
strncat(sub, ptr, sizeof sub - 1);
}
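The "save it and more" point above can be handled by finding where the wanted text ends before copying. The following is only a sketch: copy_tagged() and its parameters are hypothetical names, and it assumes the field ends at a marker string such as ", " (the original question does not say what ends the field).

```c
#include <stddef.h>
#include <string.h>

/* Copies the text that starts at `tag` and ends just before `end_mark`
 * (or at the end of `src` if `end_mark` never appears) into `dst`.
 * Returns 1 on success, 0 if the tag is missing or the text does not fit.
 * copy_tagged() is a hypothetical helper, not a library function. */
int copy_tagged(const char *src, const char *tag, const char *end_mark,
                char *dst, size_t dstsize)
{
    if (dstsize == 0)
        return 0;
    dst[0] = '\0';                          /* leave an empty string on failure */

    const char *p = strstr(src, tag);
    if (p == NULL)
        return 0;                           /* tag not found */

    /* Look for the end marker only after the tag itself. */
    const char *e = strstr(p + strlen(tag), end_mark);
    size_t len = (e != NULL) ? (size_t)(e - p) : strlen(p);

    if (len >= dstsize)
        return 0;                           /* no room for text plus '\0' */

    memcpy(dst, p, len);                    /* copy exactly len characters */
    dst[len] = '\0';
    return 1;
}
```

With the source string from the question, copy_tagged(src, "CBC:", ", ", sub, sizeof sub) would leave "CBC: 2345,23" in sub. It writes only the bytes it needs, unlike strncpy().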

Unless this piece of code is in a critical path and it's the actual source of a performance bottleneck, then just stick to what you have. Default strstr implementation should be quite adequate for the task.
You can squeeze some peanuts by pre-computing the end of src, for example, so you could use memcpy (an unconditional loop) instead of strncpy (which has a conditional loop) when extracting the sub. If you know beforehand the substring you are searching for, you can optimize around that too, especially if it's exactly 4 characters. And so on and so forth.
But if you are after these peanuts, you might be (much) better off by redoing the code to not extract sub to begin with and use something like ranged strings to keep track of where it is in the source string.
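One way to read "ranged strings" is a pointer-plus-length pair that refers into the source string instead of copying out of it. This is a minimal sketch under that assumption; struct str_range and find_range() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* A hypothetical "ranged string": a pointer into existing storage plus a
 * length. Nothing is copied, so the source must outlive the range. */
struct str_range {
    const char *ptr;   /* first character, or NULL if nothing was found */
    size_t      len;   /* number of characters in the range */
};

/* Returns the range that starts at `tag` and spans `len` characters,
 * clipped to the end of `src`. */
struct str_range find_range(const char *src, const char *tag, size_t len)
{
    struct str_range r = { NULL, 0 };
    const char *p = strstr(src, tag);
    if (p != NULL) {
        size_t avail = strlen(p);           /* characters left in src */
        r.ptr = p;
        r.len = (len < avail) ? len : avail;
    }
    return r;
}
```

A range can be printed without copying via printf("%.*s\n", (int)r.len, r.ptr). The trade-off is that the range is not null-terminated, so it cannot be passed to functions that expect a C string.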

Related

Printing arithmetic operator '+' from user input in C [duplicate]

I am searching a character at first occurence in string using the following code.
But it takes some time when the string is very long or the character that I am
searching for is far into the string, which delays other operations. How can I tackle this problem? The code is below.
Note: attrPtr is a char * which holds a reference to a string containing '"' character at far extent.
int position = 0;
char qolon = '"';//character to search
while (*(attrPtr + position++) != qolon);
char* attrValue = NULL;
attrValue = (char*)malloc(position * sizeof(char));
strncpy(attrValue, attrPtr, position-1);

strchr will usually be somewhat faster. Also, you need to check for the NUL terminator, which strchr will handle for you.
char *quotPtr = strchr(attrPtr, qolon);
if(quotPtr == NULL)
{
... // Handle error
}
int position = quotPtr - attrPtr;
char* attrValue = (char*) malloc((position + 1) * sizeof(char));
memcpy(attrValue, attrPtr, position);
attrValue[position] = '\0';
I haven't tested, though.
EDIT: Fix off-by-one.

C has a built-in function for searching for a character in a string - strchr(). strchr() returns a pointer to the found character, not the array position, so you have to subtract the pointer to the start of the string from the returned pointer to get that. You could rewrite your function as:
char qolon = '"';//character to search
char *found;
char *attrVal = NULL;
found = strchr(attrPtr, qolon);
if (found)
{
size_t len = found - attrPtr;
attrVal = malloc(len + 1);
memcpy(attrVal, attrPtr, len);
attrVal[len] = '\0';
}
This may be faster than your original by a small constant factor; however, you aren't going to get an order-of-magnitude speedup. Searching for a character within an un-ordered string is fundamentally O(n) in the length of the string.

Two important things:
1) Always check for a NULL terminator when searching a string this way:
while (*(attrPtr + position++) != qolon);
should be:
while (attrPtr[position] && attrPtr[position++] != qolon);
(If passed a string lacking your searched character, it could take a very long time as it scans all memory.) Edit: I just noticed someone else posted this before me, but oh well. I disagree, btw: strchr() is fine, but a simple loop that also checks for the terminator is fine (and often has advantages), too.
2) BEWARE of strncpy()!
strncpy(attrValue, attrPtr, position-1);
strlen(attrPtr)>=(position-1) so this will NOT null terminate the string in attrValue, which could cause all kinds of problems (including incredible slowdown in code later on). As a related note, strncpy() is erm, uniquely designed, so if you do something like:
char buf[512];
strncpy(buf,"",4096);
You will be writing 4096 bytes of zeroes.
Personally, I use lstrcpyn() on Win32, and on other platforms I have a simple implementation of it. It is much more useful for me.
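The answer does not show its "simple implementation", so the following is only a guess at what such a helper might look like: a bounded copy that always terminates the destination and never pads it. copy_bounded() is a hypothetical name, and no claim is made that it matches lstrcpyn() in every detail.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dstsize - 1 characters of src into dst and always
 * terminates dst. Returns the number of characters copied.
 * copy_bounded() is a hypothetical helper, not a library function. */
size_t copy_bounded(char *dst, const char *src, size_t dstsize)
{
    if (dstsize == 0)
        return 0;                           /* no room even for '\0' */

    size_t n = 0;
    while (n < dstsize - 1 && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';                          /* always terminated, never padded */
    return n;
}
```

Unlike strncpy(), this stops writing right after the terminator, so a short source never costs a full buffer's worth of zero bytes.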

It requires an O(n) algorithm to search for a character in the string. So you can't do much better than what you are already doing. Also, note that you are missing memset(attrValue, 0, position); , otherwise your string attrValue will not be null terminated.

The algorithm you posted doesn't properly handle the case where the character doesn't exist in the string. If that happens, it will just merrily march through memory until it either randomly happens to find a byte that matches your char, or you blow past your allocated memory and get a segfault. I suspect that is why it seems to be "taking too long" sometimes.
In C, strings are usually terminated with a 0 (ascii nul, or '\0'). Alternatively, if you know the length of the string ahead of time, you can use that.
Of course, there is a standard C library routine that does exactly this: strchr(). A wise programmer would use that rather than risk bugs by rolling their own.
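For the case where the length is known ahead of time, memchr() is the length-bounded counterpart of strchr(). A minimal sketch, assuming the caller supplies that length; dup_until() is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the text before the first `stop`
 * character within the first `len` bytes of `buf`, or NULL if `stop` is
 * absent or allocation fails. The caller frees the result.
 * dup_until() is a hypothetical helper. */
char *dup_until(const char *buf, size_t len, char stop)
{
    const char *found = memchr(buf, stop, len);   /* never reads past len */
    if (found == NULL)
        return NULL;

    size_t n = (size_t)(found - buf);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, buf, n);
    out[n] = '\0';
    return out;
}
```

Because the search is bounded by len, a missing character yields NULL instead of a walk through unrelated memory.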

removing multi-char constants in C

Here's some code I found in a very old C library that's trying to eat whitespace from a file...
while(
(line_buf[++line_idx] != ' ') &&
(line_buf[ line_idx] != ' ') &&
(line_buf[ line_idx] != ',') &&
(line_buf[ line_idx] != '\0') )
{
This great thread explains what the problem is, but most of the answers are "just ignore it" or "you should never do this". What I don't see, however, is the canonical solution. Can anyone offer a way to code this test using the "proper way"?
UPDATE: to clarify, the question is "what is the proper way to test for the presence of a string of one or more characters at a given index in another string". Forgive me if I am using the wrong terminology.

Original question
There is no canonical or correct way. Multi-character constants have always been implementation defined. Look up the documentation for the compiler used when the code was written and figure out what was meant.
Updated question
You can match multiple characters using strchr().
while (!strchr( " ,", line_buf[++line_idx] )) /* strchr() also matches '\0', so the loop stops at the terminator */
{
Again, this does not account for that multi-char constant. You should figure out why that was there before simply removing it.
Also, strchr() does not handle Unicode. If you are dealing with a UTF-8 stream, for example, you will need a function capable of handling it.
Finally, if you are concerned about speed, profile. The compiler might get you better results using the three (or four) individual test expressions in the ‘while’ condition.
In other words, the multiple tests might be the best solution!
Beyond that, I smell some uncouth indexing: the way that line_idx is updated depends on the surrounding code to actuate the loop properly. Make sure that you don’t create an off-by-one error when you update stuff.
Good luck!

UPDATE: to clarify, the question is "what is the proper way to test
for the presence of a string of one or more characters at a given
index in another string". Forgive me if I am using the wrong
terminology.
Well, there are a number of ways, but the standard way is using strspn which has the prototype:
size_t strspn(const char *s, const char *accept);
and it cleverly:
calculates the length (in bytes) of the initial segment of s
which consists entirely of bytes in accept.
This allows you to test for "the presence of a string of one or more characters at a given index in another string" and tells you how many of the characters from that string were sequentially matched.
For example, if you had another string, say char *s = "somestring";, and wanted to know if it contained the letters r, s, t, say, in char *accept = "rst"; beginning at the 5th character, you could test:
size_t n;
if ((n = strspn (&s[4], accept)) > 0)
printf ("matched %zu chars from '%s' at beginning of '%s'\n",
n, accept, &s[4]);
To compare in order, you can use strncmp (&s[4], accept, strlen (accept));. You can also simply use nested loops to iterate over s with the characters in accept.
All of the ways are "proper", so long as they do not invoke Undefined Behavior (and are reasonably efficient).
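strspn() has a standard complement, strcspn(), which measures the initial run of characters that are not in the set. Together they can replace the hand-written delimiter loop from the question. This is a sketch with hypothetical helper names, and it assumes start never exceeds the length of the line.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first delimiter (any character in `delims`) at
 * or after `start`, or the index of the terminating '\0' if none is found.
 * next_delim() is a hypothetical helper. */
size_t next_delim(const char *line, size_t start, const char *delims)
{
    return start + strcspn(line + start, delims);
}

/* Returns the index of the first character at or after `start` that is
 * NOT in `delims`. skip_delims() is a hypothetical helper. */
size_t skip_delims(const char *line, size_t start, const char *delims)
{
    return start + strspn(line + start, delims);
}
```

Both functions stop at the terminator on their own, so no separate test for '\0' is needed.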

C strings initialization. Possible trouble

Good day everyone. I have the following c-string initialization:
*(str) = 0; where str is declared as char str[255];. The questions are:
1) is it the best way to initialize a string this way?
2) should I expect trouble with this code on 64-bit platform?
Thanks in advance.

The instruction *(str) = 0 is completely equivalent to str[0] = 0;.
The initialization simply places the integer 0 (which is equivalent to '\0') in the first position, making the string empty.
1) There's no "best way", the key is being consistent so if you choose the "*(str) = 0;" path, always do the same everywhere.
2) No trouble could ever come from that statement. Whatever the architecture, the compiler will take care of everything.

What this does is change the uninitialized char array str, of which each element has some undefined value, to have its first element set to the null terminator character.
As far as most string related functions go, they will traverse this string from left to right, and immediately see the null terminator.
There are surely functions that don't care about null terminators, and are given a maximum size. In these cases, you are reading undefined values, which is Bad™.
Use this instead:
char str[255] = { 0 }; // zero every element
or initialize the string to something useful on first use.
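To make the distinction concrete, a small check can show that an "empty" string and a fully zeroed array are different states. all_zero() is a hypothetical helper written only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every one of the n bytes in buf is zero, otherwise 0.
 * all_zero() is a hypothetical helper. */
int all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

For char a[255] = { 0 }; the call all_zero(a, sizeof a) returns 1. For an array that held other data and then had only its first element set to 0, strlen() reports 0 but all_zero() returns 0, which matters to functions that work on a byte count rather than on the terminator.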

Showing the difference between 2 char * strings in C

I am new to C, and I am trying to figure out the best way to approach this problem. I have 2 strings that are both char *'s.
They have multiple \n characters within the strings themselves, and they are usually about 1000 characters in length. I want to display only single lines that are different. Typically only one character (or a relatively small number) would be different in the entire string. So I was hoping to make it so that I could display only that one changed line (the whole string from \n to \n).
I'm not asking for anybody to write the code, or even supply code examples, just in theory what would be the most efficient way to do this?
I've been looking into using strtok, using the '\n' symbol as a delimiter, and then using strcmp to compare the two strings, and if they were not equal then I could add that string to a "old_data" and "new_data" array. Would this be a bad way to do this?
Any advice would be a huge help.

It sounds like you're on the right track: strsep will let you chunk the string up by newline. One thing to keep in mind is that it operates on the original string in place, and doesn't allocate any new memory, which can be both a blessing and a curse.
Probably the most memory efficient way to do this would be to look into allocating arrays of pointers to hold your "old_data" and "new_data" values, and then just save the pointers that point directly into the original string rather than copying the strings themselves over. As long as your original two strings are going to stick around/not get freed from under you, this could save you a decent chunk of memory.
If you aren't ever going to be removing strings from the arrays, one naïve (but effective) way to implement your arrays is to maintain two state variables — a count, and a capacity — and double the capacity each time you're about to overflow the array. E.g.:
char **strArray = NULL;
unsigned int capacity = 10;
unsigned int count = 0;
strArray = malloc(capacity * sizeof(char *));
/* on insert */
if (count == capacity)
{
capacity *= 2;
strArray = realloc(strArray, capacity * sizeof(char *));
}
strArray[count++] = pointerIntoOriginalString;
Good luck!
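The pointer-into-the-original idea can also be done without modifying either string, by recording each line as a start pointer and a length. This is a sketch only: struct line_ref, next_line() and diff_lines() are hypothetical names, the output arrays are fixed-size to keep it short, and lines are paired by position (an assumption that fits "one changed character" but not inserted or deleted lines).

```c
#include <stddef.h>
#include <string.h>

/* One line inside a larger string: start pointer and length, excluding
 * the '\n'. Hypothetical type for this sketch. */
struct line_ref {
    const char *ptr;
    size_t      len;
};

/* Reads the line starting at *cursor and advances *cursor past its '\n'
 * (or to the terminating '\0'). Returns 0 when no text is left. */
static int next_line(const char **cursor, struct line_ref *out)
{
    const char *s = *cursor;
    if (*s == '\0')
        return 0;
    out->ptr = s;
    out->len = strcspn(s, "\n");            /* length up to '\n' or '\0' */
    s += out->len;
    if (*s == '\n')
        s++;                                /* step over the newline */
    *cursor = s;
    return 1;
}

/* Compares the two texts line by line and stores up to `max` differing
 * line pairs in old_out/new_out. Returns how many pairs were stored.
 * Extra lines in the longer text are ignored. */
size_t diff_lines(const char *old_text, const char *new_text,
                  struct line_ref *old_out, struct line_ref *new_out,
                  size_t max)
{
    size_t count = 0;
    struct line_ref a, b;

    while (count < max &&
           next_line(&old_text, &a) && next_line(&new_text, &b)) {
        if (a.len != b.len || memcmp(a.ptr, b.ptr, a.len) != 0) {
            old_out[count] = a;             /* pointers into the originals */
            new_out[count] = b;
            count++;
        }
    }
    return count;
}
```

Each stored line can be printed with printf("%.*s\n", (int)ref.len, ref.ptr). Since nothing is copied or overwritten, both original strings stay intact and usable afterwards.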

strtok() is not reentrant. If you were going to do this with strtok, you'd have to iterate over the arrays one after the other. I recommend using strtok_r(), which is the reentrant implementation of strtok.
The other consideration you need to worry about is making sure that your old_data and new_data arrays are big enough, or resizable. Matt's answer shows a simple example of resizing the array, although if you're new to C you might just want to declare something like:
char *new_data[2000];
char *old_data[2000];
Especially since it sounds like you have a good idea of how many lines are in your buffer.

Proper way to empty a C-String

I've been working on a project in C that requires me to mess around with strings a lot. Normally, I do program in C++, so this is a bit different than just saying string.empty().
I'm wondering what would be the proper way to empty a string in C. Would this be it?
char buffer[80] = "Hello World!\n";
// ...
strcpy(buffer, "");

It depends on what you mean by "empty". If you just want a zero-length string, then your example will work.
This will also work:
buffer[0] = '\0';
If you want to zero the entire contents of the string, you can do it this way:
memset(buffer,0,strlen(buffer));
but this will only work for zeroing up to the first NULL character.
If the string is a static array, you can use:
memset(buffer,0,sizeof(buffer));

Two other ways are strcpy(str, ""); and string[0] = 0
To really clear the variable's contents (in case you have dirty code which is not working properly with the snippets above :P ), use a loop like the one in the example below.
#include <string.h>
...
size_t len = strlen(string); /* measure once: after the first write strlen() would return 0 */
for (size_t i = 0; i < len; i++)
{
string[i] = 0;
}
If you want a dynamically allocated array of chars to start out cleared,
you may either use a combination of malloc() and memset() or use calloc(), which is often faster and does the same thing as malloc() but also initializes the whole array to zero bytes.
Finally, keep your runtime in mind.
This matters all the more if you're handling huge arrays (six digits and above): try setting just the first element to zero instead of running memset() through the whole string.
It may look dirtier at first, but it is much faster. You just need to pay more attention to your code ;)
I hope this was useful for anybody ;)

Depends on what you mean by emptying. If you just want an empty string, you could do
buffer[0] = 0;
If you want to set every element to zero, do
memset(buffer, 0, 80);

If you are trying to clear out a receive buffer for something that receives strings I have found the best way is to use memset as described above. The reason is that no matter how big the next received string is (limited to sizeof buffer of course), it will automatically be an asciiz string if written into a buffer that has been pre-zeroed.
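A sketch of that pre-zeroing pattern, assuming the receive routine hands over a pointer and a byte count; store_received() and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Zeroes the whole buffer, then stores up to bufsize - 1 received bytes.
 * Because the buffer was pre-zeroed, the result is always terminated.
 * store_received() is a hypothetical helper; `data` and `n` stand in for
 * whatever the receive routine delivered. Returns the bytes stored. */
size_t store_received(char *buf, size_t bufsize, const char *data, size_t n)
{
    if (bufsize == 0)
        return 0;

    memset(buf, 0, bufsize);                /* clear any earlier message */
    if (n > bufsize - 1)
        n = bufsize - 1;                    /* keep the last byte as '\0' */
    memcpy(buf, data, n);
    return n;
}
```

Clearing first also means a short message that follows a long one leaves no stale tail in the buffer.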

I'm a beginner but... To my knowledge, the best way is
strncpy(dest_string,"",strlen(dest_string));

For those who clicked into this question to see some information about bzero and explicit_bzero:
bzero: this function is deprecated
explicit_bzero: this is not a standard c function and is absent on some of the BSDs
Further explanation:
why-use-bzero-over-memset
bzero(3)

needs name of string and its length
will zero all characters
other methods might stop at the first zero they encounter
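/* u8 is assumed to be an unsigned 8-bit type, e.g. typedef uint8_t u8; from <stdint.h> */
void strClear(char p[], u8 len)
{
    for (u8 i = 0; i < len; i++)
        p[i] = 0; /* zero every element, not just up to the first '\0' */
}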
