Finding first and last pattern in buffer - c

I have a character buffer which will contain text in this format.
somecontent...boundary="abc_is_the_boundary"
content-length=1234
--abc_is_the_boundary
somecontent
--abc_is_the_boundary
This buffer is stored in char * buf;
Now my objective is to identify the boundary value, which is abc_is_the_boundary in this case, pass all the contents of the buffer under that boundary to a function, and get back a new string which will replace it. Even --abc_is_the_boundary will be sent to the function.
So in this case the buffer passed to the function will be
--abc_is_the_boundary
somecontent
--abc_is_the_boundary
After processing, say it returns xyz.
The content-length has changed to 3 and now the resulting buffer must look like this
somecontent...boundary="abc_is_the_boundary"
content-length=3
xyz
I can identify the boundary value using strstr. But how do I find the first and the last instance of the boundary? The boundary can appear multiple times, but only the first and the last have to be found. The content-length can be modified by using strstr again to go to the specific location and change it. Is that the best way?
I hope you have understood

You can use simple pointer arithmetic for finding the first and the last occurrence of the pattern. Think about it this way: For the first appearance of the pattern you use the first result of strstr, since this is exactly what this function was designed for. Then you ask yourself "is there another occurrence of the pattern after the first one" and use strstr again for this. You repeat this until you find no further occurrence. The last one you found must then be the last one in the whole buffer.
It would then look somewhat like this. The code below is neither compiled, nor tested, but the idea should be clear:
char *buf, *pattern, *firstOcc, *lastOcc, *temp;
// ... extract pattern from buffer
firstOcc = strstr(buf, pattern);
lastOcc = firstOcc; // both stay NULL if the pattern never occurs
temp = firstOcc;
while (temp != NULL) {
    lastOcc = temp;
    temp = strstr(lastOcc + 1, pattern);
}
By searching from the last found location + 1 you exclude that location itself, so strstr returns the next occurrence after it, or a null pointer if there is none.
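The same loop can be wrapped in a small function. The following is only a sketch: the name find_first_last and its signature are made up for illustration, and it assumes the buffer is a null-terminated string and the pattern is not empty.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locates the first and the last occurrence of
 * `pattern` in the null-terminated string `buf`.
 * Returns 1 and fills *first / *last when at least one occurrence exists,
 * returns 0 otherwise (the outputs are then left untouched).
 */
static int find_first_last(const char *buf, const char *pattern,
                           const char **first, const char **last)
{
    const char *hit;

    if (*pattern == '\0')
        return 0;                         /* an empty pattern matches everywhere */

    hit = strstr(buf, pattern);
    if (hit == NULL)
        return 0;

    *first = hit;
    while (hit != NULL) {
        *last = hit;
        hit = strstr(hit + 1, pattern);   /* resume one byte past the hit */
    }
    return 1;
}
```

With these two pointers, the region to hand to the processing function would start at first and end at last + strlen(pattern).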

Related

Finding one array in another Array

We have two arrays
char A[]="ABABABABBBABAB";
And the other is
char B[]="BABA";
How can I find B in A, and where it starts and ends for every occurrence?
For example for this one
Between 2-5
Between 4-7
Between 10-13
Yes, you can do this using the strstr function.
This function returns a pointer to the first occurrence in haystack of the entire sequence of characters specified in needle, or a null pointer if the sequence is not present in haystack.
So you get a pointer to the beginning of the match. To find the next occurrence, call strstr again with a first parameter that skips past the start of the occurrence you just found. A simple illustration:
char haystack[]="abismyabnameab";
char needle[]="ab";
char *ret;
ret = strstr(haystack, needle);
while(ret != NULL){
    /* do work: print the rest of the string and the start/end indices of this match */
    printf("%s (%zu,%zu)\n", ret, (size_t)(ret-haystack), (size_t)(ret-haystack)+strlen(needle)-1);
    ret = strstr(ret+1, needle); /* continue one character after this match */
}
The printf call prints the indices of each occurrence of the needle. The start is ret - haystack, and since the length of the needle is known, the end is ret - haystack + strlen(needle) - 1.
Note that the search resumes one character after the start of the previous match rather than after the whole match. That matters for needles that can overlap with themselves: when BB is searched in BBBBB, an occurrence starts at each of the first four positions, and resuming after the whole match would skip every second one.
A better solution is to compute the failure function and search with KMP, which runs in O(n+m). The repeated strstr approach is O(n*m) in the worst case.
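For reference, a KMP search that reports every occurrence (overlapping ones included) could look roughly like this. It is a sketch, not a tuned implementation: the function name kmp_find_all, its signature and the (size_t)-1 error value are assumptions made for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of a KMP search (helper name and signature are made up here).
 * Stores the 0-based start index of every occurrence of `needle` in
 * `haystack`, overlapping ones included, in out[0..max_out-1].
 * Returns the total number of occurrences found, or (size_t)-1 if the
 * needle is empty or memory for the failure table cannot be allocated.
 */
static size_t kmp_find_all(const char *haystack, const char *needle,
                           size_t *out, size_t max_out)
{
    size_t n = strlen(haystack), m = strlen(needle);
    size_t *fail, i, k, count = 0;

    if (m == 0)
        return (size_t)-1;
    fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return (size_t)-1;

    /* fail[i] = length of the longest proper prefix of needle[0..i]
       that is also a suffix of it */
    fail[0] = 0;
    for (i = 1, k = 0; i < m; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }

    /* scan the haystack once; k is the number of needle bytes matched */
    for (i = 0, k = 0; i < n; i++) {
        while (k > 0 && haystack[i] != needle[k])
            k = fail[k - 1];
        if (haystack[i] == needle[k])
            k++;
        if (k == m) {                     /* a match ends at index i */
            if (count < max_out)
                out[count] = i + 1 - m;
            count++;
            k = fail[k - 1];              /* keep going, allow overlaps */
        }
    }

    free(fail);
    return count;
}
```

For A = "ABABABABBBABAB" and B = "BABA" this yields the 0-based start indices 1, 3 and 9, which correspond to 2-5, 4-7 and 10-13 in the 1-based numbering used in the question.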

Getting strange characters from strncpy() function

I am supposed to load a list of names from a file, and then find those names in the second file and load them in a structure with some other data (for simplicity, I will load them to another array called "test").
The first part is just fine, I am opening a file and loading all the names into a 2dimensional array called namesArr.
The second part is where unexpected characters occur, and I can't understand why. Here is the code of the function:
void loadStructure(void){
char line[MAX_PL_LENGTH], *found;
int i, j=0;
char test[20][20];
FILE *plotPtr=fopen(PLOT_FILE_PATH, "r");
if (plotPtr==NULL){perror("Error 05:\nError opening a file in loadStructure function. Check the file path"); exit(-5);}
while(fgets(line, MAX_PL_LENGTH, plotPtr)!=NULL){ // This will load each line from a file to an array "line" until it reaches the end of file.
for(i=0; i<numOfNames; i++){ // Looping through the "namesArr" array, which contains the list of 20 character names.
if((found=strstr(line, namesArr[i]))!=NULL){ // I use strstr() to find if any of those names appear in the particular line.
printf("** %s", found); // Used for debugging.
strncpy(test[j], found, strlen(namesArr[i])); j++; // Copying the newly found name to test[j] (copying only the name, by defining its length, which is calculated by strlen function).
}
}
}
fclose(plotPtr);
printf("%s\n", test[0]);
printf("%s\n", test[1]);
printf("%s\n", test[2]);
}
This is the output I get:
...20 names were loaded from the "../Les-Mis-Names-20.txt".
** Leblanc, casting
** Fabantou seems to me to be better," went on M. Leblanc, casting
** Jondrette woman, as she stood
Leblanct╕&q
Fabantou
Jondretteⁿ  └
Process returned 0 (0x0) execution time : 0.005 s
Press any key to continue.
The question is, why am I getting characters like "╕&q" and "ⁿ  └" in the newly created array? And also, is there any other more efficient way to achieve what I am trying to do?
The problem is that strncpy does not store a null in the target array if the length specified is less than the length of the source string (as is always the case here). So whatever garbage happened to be in the test array will remain there.
You can fix this specific problem by zeroing the test array, either when you declare it:
char test[20][20] = { { 0 } };
or as you use it:
memset(test[j], 0, 20);
strncpy(test[j], found, strlen(namesArr[i]));
but in general, it is best to avoid strncpy for this reason.
The length limitation for strncpy should be based on the target size, not the source length: that's the point of using it over strcpy, which uses only the source length. In your code
strncpy(test[j], found, strlen(namesArr[i]));
the length parameter is from the source array, which defeats the purpose of using strncpy. In addition, the nul terminator will not be present if the function copies the full limit of bytes, so the code should be
strncpy(test[j], found, 19); // limit to target size, leaving room for terminator
test[j][19] = '\0'; // add terminator (if copy did not complete)
Whether you loaded namesArr[] from file correctly is another potential issue, since you do not show the code.
Edited:
Slight modification to a previous answer:
1) Since you are working with C strings, make sure (since strncpy(...) does not do it for you) that you null terminate the buffer.
2) When using strncpy the length argument should represent the target string byte capacity - 1 (space for null terminator), not the source string length.
...
size_t len = strlen(found);
memset(test[j], 0, 20);
strncpy(test[j], found, 19);//maximum length (19) matches array size
//of target string -1 ( test[j] ).
if(len > 19) len = 19; //in case length of found is longer than the target string.
test[j][len] = 0; //len is at most 19, the last valid index of test[j]
...
In addition to what Chris Dodd said, quoted from man strncpy:
The strncpy() function is similar [to the strcpy() function], except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Since the size parameter in your strncpy call is the length of the string, this will not include the null byte at the end of the string and thus your destination string will not be null-terminated from this call.
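One way to sidestep strncpy here is a small helper that takes both the destination capacity and the number of bytes wanted, and always terminates the result. This is a sketch under assumptions: copy_prefix is a made-up name, not a standard function, and the destination capacity must be at least 1.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copies at most `n` bytes of `src` into `dst`
 * (capacity `dst_size`, which must be at least 1) and always writes a
 * terminating '\0'. Copying stops early at the end of `src` or when
 * `dst` is full.
 * Returns the number of bytes copied, not counting the terminator.
 */
static size_t copy_prefix(char *dst, size_t dst_size, const char *src, size_t n)
{
    size_t len = 0;

    while (len < n && len + 1 < dst_size && src[len] != '\0') {
        dst[len] = src[len];
        len++;
    }
    dst[len] = '\0';
    return len;
}
```

In the question's loop this would be called as copy_prefix(test[j], sizeof test[j], found, strlen(namesArr[i])); so that only the name is copied and the entry is always a valid string.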

Subtracting from pointer to get length

I wanted to find the length of a part of a string after searching for it within a bigger string.
I cannot use strlen since I am dealing with binary data.
char *temp= "this is some random text";
char *temp1 = strstr(temp,"some");
int len = strlen(temp);
int len1 =0;
len1 = temp+len - temp1;
to get length of "some random text"
len1 returns negative value (even the positive value of it is wrong)
If your data is not null-terminated, then you cannot call strstr() on it for the same reason you can't call strlen(). If you do that, you can end up scanning past the end of your data. If you find a match there (which is quite possible; reading past the end of arrays is not guaranteed to crash the program), then your pointer arithmetic is going to give you a negative value, because you're subtracting a larger address from a smaller one.
On the other hand, if your data is actually properly null-terminated, then your problem is probably that strstr() doesn't find the substring and thus returns NULL. Are you checking for NULL? Otherwise, what you end up doing is:
len1 = temp + len - (char*)NULL;
Final answer:
You're looking for len - (temp1 - temp). The length of the first part is temp1 - temp. Subtract it from the length of the entire string to get the length of the remaining part.
Longer answer:
Since strlen (which is what you have used in your example, even if it only works for proper text messages) goes until it finds a \0 character you can simply use strlen(temp1) for the length of the last part of the input. If you are really concerned that calling strlen twice will harm your performance (really?) then you can use len - (temp1 - temp).
You only need to do pointer subtraction if you are interested in the length of the first part of the input.
If you want to work with binary arrays which contain \0 at non-terminal positions, you cannot use strlen at all in your code. However, you have to have a way to specify the length of the entire input. Either you have this in an integer variable, or you have a specific delimiter and a length-computing function.
If you have the integer variable for the length then, since the length of the first part of the input is obtained by pointer subtraction, you only have to do len - (temp1 - temp).
If you have a length-computing function, simply call it with temp1 as argument.
PS: Don't forget to check if strstr returns NULL (by the way, you cannot use strstr if you have binary data with \0 inside the buffer)
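For binary data, the search itself has to work on explicit lengths as well. The following is a minimal sketch of such a search; find_bytes is a hypothetical name (glibc offers the non-standard memmem for the same job), and the caller is assumed to know the length of both buffers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for binary data: finds the first occurrence of the
 * `needle_len` bytes at `needle` inside the `hay_len` bytes at `hay`.
 * Both lengths are passed explicitly, so embedded '\0' bytes are fine.
 * Returns a pointer to the match, or NULL if there is none.
 */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const unsigned char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0 || needle_len > hay_len)
        return NULL;
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }
    return NULL;
}
```

If the returned pointer match is not NULL, the length of the remaining part is hay_len - (size_t)(match - hay), which is the same len - (temp1 - temp) arithmetic as above.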

second memcpy() attaches previous memcpy() array to it

I have a little problem here with memcpy()
When I write this
char ipA[15], ipB[15];
size_t b = 15;
memcpy(ipA,line+15,b);
It copies b bytes from array line starting at 15th element (fine, this is what i want)
memcpy(ipB,line+31,b);
This copies b bytes from line starting at 31st element, but it also attaches to it the result for previous command i.e ipA.
Why? ipB size is 15, so it shouldnt have enough space to copy anything else. whats happening here?
result for ipA is 192.168.123.123
result for ipB becomes 205.123.123.122 192.168.123.123
Where am I wrong? I dont actually know alot about memory allocation in C.
It looks like you're not null-terminating the string in ipA. The compiler has put the two variables next to one another in memory, so string operations assume that the first null terminator is sometime after the second array (whenever the next 0 occurs in memory).
Try:
char ipA[16], ipB[16];
size_t b = 15;
memcpy(ipA,line+15,b);
ipA[15] = '\0';
memcpy(ipB,line+31,b);
ipB[15] = '\0';
printf("ipA: %s\nipB: %s\n", ipA, ipB);
This should confirm whether this is the problem. Obviously you could make the code a bit more elegant than my test code above. As an alternative to manually terminating, you could use printf("%.*s\n", (int)b, ipA); or similar to force printf to print the correct number of characters (the precision argument must be an int, hence the cast).
Are you checking the content of the arrays by doing printf("%s", ipA)? If so, you'll end up with the described effect, since your array is interpreted as a C string which is not null-terminated. Do this instead: printf("%.*s", (int)sizeof(ipA), ipA)
Character strings in C require a terminating mark. It is the char value 0.
As your two character strings are contiguous in memory, if you don't terminate the first character string, then when reading it, you will continue until memory contains the end-of-string character.
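The precision trick also works for copying: snprintf with "%.*s" reads at most the given number of bytes and always terminates the destination. A sketch, assuming the fields have a fixed width as in the question (extract_field is a made-up helper name):

```c
#include <stdio.h>

/*
 * Hypothetical helper: copies a fixed-width field of up to `width` bytes
 * starting at `src` into `dst` (capacity `dst_size`) as a terminated
 * C string. snprintf terminates the output whenever dst_size > 0 and
 * truncates if the field does not fit.
 * Returns the snprintf result, i.e. the length of the field it tried
 * to write.
 */
static int extract_field(char *dst, size_t dst_size, const char *src, int width)
{
    return snprintf(dst, dst_size, "%.*s", width, src);
}
```

With char ipA[16], ipB[16]; the two copies become extract_field(ipA, sizeof ipA, line + 15, 15); and extract_field(ipB, sizeof ipB, line + 31, 15);. Note that the arrays need 16 bytes to hold 15 characters plus the terminator.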

Remove redundant whitespace from string (in-place)

Ok so i posted earlier about trying to (without any prebuilt functions) remove additional spaces so
"this is <insert many spaces!> a test" would return
"this is a test"
Remove spaces from a string, but not at the beginning or end
As it was homework i asked for no full solutions and some kind people provided me with the following
"If you need to do it in place, then create two pointers. One pointing to the character being read and one to the character being copied. When you meet an extra space, then adapt the 'write' pointer to point to the next non space character. Copy to the read position the character pointed by the write character. Then advance the read and write pointers to the character after the character being copied."
The problem is i now want to fully smash my computer in to pieces as i am sooooo irritated by it. I didnt realise at the time i couldnt utilise the char array so was using array indexes to do it, i thought i could suss how to get it to work but now i am just using pointers i am finding it very hard. I really need some help, not full solutions. So far this is what i am doing;
1) create a pointer called write, so I know where to write to
2) create a pointer called read, so I know where to read from
(both of these pointers will now point to the first element of the char array)
while (read != '\0')
if read == a space
add one to read
if read equals a space now
add one to write
while read != a space {
set write to = read
}
add one to both read and write
else
add one to read and write
write = read
Try just doing this yourself character by character on a piece of paper and work out what it is you are doing first, then translate that into code.
If you are still having trouble, try doing something simpler first, for example just copying a string character for character without worrying about the "remove duplicate spaces" part of it - just to make sure that you haven't made a silly mistake elsewhere.
The advice of trying to do it with pen and paper first is good. And it really doesn't matter if you do it with pointers or array indexing; you can either use a reader and a writer pointer, or a reader and a writer index.
Think about when you want to move the indices forward. You always move the write index forward after you write a character. You move the read index forward when you've read a character.
Perhaps you could start with some code that just moves over the string, but actually doesn't change it. And then you add the logic that skips additional spaces.
char p[] = "this is a test";
char *readptr = &p[0];
char *writeptr = &p[0];
int inspaces = 0;
while(*readptr) {
    if(isspace((unsigned char)*readptr)) { /* isspace is declared in <ctype.h> */
        inspaces ++;
    } else {
        inspaces = 0;
    }
    if(inspaces <= 1) { /* keep only the first space of each run */
        *writeptr = *readptr;
        writeptr++;
    }
    readptr++;
}
*writeptr = 0;
Here's a solution that only has a single loop (no inner loop to skip spaces) and no state data:
dm9pZCBTdHJpcFNwYWNlcyAoY2hhciAqdGV4dCkNCnsNCiAgY2hhciAqc3JjID0gdGV4dCwgKmRl
c3QgPSB0ZXh0Ow0KICB3aGlsZSAoKnNyYykNCiAgew0KICAgICpkZXN0ID0gKihzcmMrKyk7DQog
ICAgaWYgKCpzcmMgIT0gJyAnIHx8ICpkZXN0ICE9ICcgJykNCiAgICB7DQogICAgICArK2Rlc3Q7
DQogICAgfQ0KICB9DQogICpkZXN0ID0gMDsNCn0NCg==
The above is Base64 encoded, as this is a homework question. To decode it, copy the above block and paste it into any Base64 decoder.
assuming you're trying to write a function like:
void removeSpaces(char * str){
/* ... stuff that changes the contents of str[] */
}
You want to scan the string for consecutive spaces, so that your write pointer is always trailing your read pointer. Advance your read pointer to a place where you are pointing at a space, but the next character is not a space. If your read and write pointers are not the same, then your write pointer ought to be pointing at the beginning of a sequence of spaces. The difference between your write and read pointer (i.e. read_pointer - write_pointer) will tell you the number of consecutive spaces that need to be overwritten to close the gap. When there is a difference of greater than zero, (prefix) advance both pointers along by that many positions, copying characters as you go. When your read pointer is at the end of the string ('\0'), you should be done.
Can you use regular expressions? If so, it might be really easy. Use a regex to replace \s{2,}? with a single space. The \s means any white space (tabs, spaces, carriage feeds...); {2,} means 2 or more; ? means non-greedy. (Disclaimer: my regex might not be the best one you could write, since I'm no regex pro. Also, it's .net syntax, so the regex library for C might have slightly different syntax.)
Why don't you do something like this:
Make a second char array that is the same length as the first. As you run through the first array with your pointers, keep an extra pointer on the last space you've seen. If the last space you saw was the previous element and the current character is also a space, then you don't copy that char to the second array.
But I would start with something that runs through the char array character by character and prints out each char. If you can make that happen, then you can work on actually copying them into a second char array.

Resources