Working with strings in C


Could someone please help me understand these lines of code in the program below?
According to the writer, this program takes a string such as hello world and contains a function that reverses it to world hello. My question is: what does this code do?
char * p_divs = divs; //what does divs do
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1;
also this code in the void function
*dest = '\0';//what does this pointer do?
int source_len = strlen(source); //what is source
if (source_len == 0) return;
char * p_source = source + source_len - 1;
char * p_dest = dest;
while(p_source >= source){
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
this is the main program
#include <stdio.h>
#include <string.h>
int inDiv(char c, char * divs){
char * p_divs = divs;
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1;
return 0;
}
void reverse(char * source, char * dest, char * divs){
*dest = '\0';
int source_len = strlen(source);
if (source_len == 0) return;
char * p_source = source + source_len - 1;
char * p_dest = dest;
while(p_source >= source){
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
if (p_source < source) break;
char * w_end = p_source;
while((p_source >= source) && (!inDiv(*p_source, divs))) p_source--;
char * w_beg = p_source + 1;
for(char * p = w_beg; p <= w_end; p++) *p_dest++ = *p;
*p_dest++ = ' ';
}
*p_dest = '\0';
}
#define MAS_SIZE 100
int main(){
char source[MAS_SIZE], dest[MAS_SIZE], divs[MAS_SIZE];
printf("String : "); gets(source);
printf("Dividers : "); gets(divs);
reverse(source, dest, divs);
printf("Reversed string : %s", dest);
return 0;
}

Here, inDiv can be called to search for the character c in the string divs, for example:
inDiv('x', "is there an x character in here somewhere?") will return 1
inDiv('x', "ahhh... not this time") will return 0
Working through it:
int inDiv(char c, char * divs)
{
char * p_divs = divs; // remember which character we're considering
char tmp;
while(tmp = *p_divs++) // copy that character into tmp, and move p_divs to the next character
// but if tmp is then 0/false, break out of the while loop
if (tmp == c) return 1; // if tmp is the character we're searching for, return "1" meaning found
return 0; // must be here because tmp == 0 indicating end-of-string - return "0" meaning not-found
}
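For comparison, the same membership test can be written with the standard strchr. This is only a sketch: the name in_div_strchr is made up here, and the extra check for '\0' is there because strchr also matches the terminator, which inDiv never reports as found.

```c
#include <string.h>

/* Hypothetical name: same contract as inDiv, built on the standard strchr. */
int in_div_strchr(char c, const char *divs)
{
    /* strchr(divs, '\0') returns a pointer to the terminator, whereas
       inDiv returns 0 for '\0', so that case is excluded explicitly. */
    return c != '\0' && strchr(divs, c) != NULL;
}
```

Called with the two example strings above, it should give the same 1 and 0 as inDiv.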
We can infer things about reverse by looking at the call site:
int main()
{
char source[MAS_SIZE], dest[MAS_SIZE], divs[MAS_SIZE];
printf("String : ");
gets(source);
printf("Dividers : ");
gets(divs);
reverse(source, dest, divs);
printf("Reversed string : %s", dest);
We can see gets() called to read from standard input into character arrays source and divs -> those inputs are then provided to reverse(). The way dest is printed, it's clearly meant to be a destination for the reversal of the string in source. At this stage, there's no insight into the relevance of divs.
Let's look at the source...
void reverse(char * source, char * dest, char * divs)
{
*dest = '\0'; //what does this pointer do?
int source_len = strlen(source); //what is source
if (source_len == 0) return;
char* p_source = source + source_len - 1;
char* p_dest = dest;
while(p_source >= source)
{
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
Here, *dest = '\0' writes a NUL character into the character array dest - that's the normal sentinel value encoding the end-of-string position - putting it in at the first character *dest implies we want the destination to be cleared out. We know source is the textual input that we'll be reversing - strlen() will set source_len to the number of characters therein. If there are no characters, then return as there's no work to do and the output is already terminated with NUL. Otherwise, a new pointer p_source is created and initialised to source + source_len - 1 -> that means it's pointing at the last non-NUL character in source. p_dest points at the NUL character at the start of the destination buffer.
Then the loop says: while (p_source >= source) - for this to do anything p_source must initially be >= source - that makes sense as p_source points at the last character and source is the first character address in the buffer; the comparison implies we'll be moving one or both towards the other until they would cross over - doing some work each time. Which brings us to:
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
This is the same test we've just seen, but this time we only move p_source backwards towards the start of the string while inDiv(*p_source, divs) is also true, meaning that the character at *p_source is one of the characters in the divs string. In other words: move backwards until you've gone past the start of the string or until you find a character that's not one of the "divider" characters. (As Michael Burr points out in the comments, this test has undefined behaviour once p_source has been decremented to before the start of the array, and it really might not work if the string happens to be allocated at address 0, even if relative to some specific data segment, as the pointer could go from 0 to something like FFFFFFFF hex without ever seeming to be less than 0.)
Here we get some real insight into what the code's doing... dividing the input into "words" separated by any of a set of characters in the divs input, then writing them in reverse order with space delimiters into the destination buffer. That's getting ahead of ourselves a bit - but let's track it through:
The next line is...
if (p_source < source) break;
...which means that if the divider-skipping loop backed past the front of the source string, only divider characters remained, so there are no more words to copy: break out of the outer loop (looking ahead, we see the code then just puts a NUL on the end of the already-generated output and returns). Note that this break can only trigger while skipping dividers. When we back through the "hello" in "hello world" we are in the next inner loop, the one that scans the non-divider characters of a word, so "hello" is still copied to the output even though p_source ends up before the start of the string.
Otherwise:
char* w_end = p_source; // remember where the non-divider character "word" ends
// move backwards until there are no more characters (p_source < source) or you find a divider character
while((p_source >= source) && (!inDiv(*p_source, divs))) p_source--;
// either way that loop exited, the "word" begins at p_source + 1
char * w_beg = p_source + 1;
// append the word between w_beg and w_end to the destination buffer
for(char* p = w_beg; p <= w_end; p++) *p_dest++ = *p;
// also add a space...
*p_dest++ = ' ';
This keeps happening for each "word" in the input, then the final line adds a NUL terminator to the destination.
*p_dest = '\0';
Now, you said:
according [to] the writer it writes a string of hello world then there is a function in it that also reverses the string to world hello
Well, given inputs "hello world" and divider characters including a space (but none of the other characters in the input), then the output would be "world hello " (note the space at the end).
For what it's worth - this code isn't that bad... it's pretty normal for C handling of ASCIIZ buffers, though the assumptions about the length of the input are dangerous and it always leaves a trailing space after the last word....
** How to fix the undefined behaviour **
Regarding the undefined behaviour - the smallest change to address that is to change the loops so they terminate when at the start of the buffer, and have the next line explicitly check why it terminated and work out what behaviour is required. That will be a bit ugly, but isn't rocket science....
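A rough sketch of one way to do that, assuming the same contract as the original reverse() (dest must be large enough for strlen(source) + 2 bytes). The name reverse_no_ub is made up for illustration; instead of a pointer that can run off the front, it keeps an index that is always one past the current character, so it never goes below 0.

```c
#include <string.h>

/* inDiv as in the question, with const added to the string parameter. */
static int inDiv(char c, const char *divs)
{
    char tmp;
    while ((tmp = *divs++) != '\0')
        if (tmp == c) return 1;
    return 0;
}

/* Hypothetical rewrite of reverse(): `end` is an index one past the
   character under consideration, so no pointer is ever formed before
   source[0]. */
void reverse_no_ub(const char *source, char *dest, const char *divs)
{
    size_t end = strlen(source);
    while (end > 0) {
        /* skip dividers to the left of `end` */
        while (end > 0 && inDiv(source[end - 1], divs)) end--;
        if (end == 0) break;             /* only dividers were left */
        size_t beg = end;
        /* walk back over the word itself */
        while (beg > 0 && !inDiv(source[beg - 1], divs)) beg--;
        for (size_t i = beg; i < end; i++) *dest++ = source[i];
        *dest++ = ' ';                   /* same trailing space as the original */
        end = beg;
    }
    *dest = '\0';
}
```

The loops now explicitly stop at index 0, and the `end == 0` check after the first inner loop is the "why did it terminate" test mentioned above.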

char * p_divs = divs; //what does divs do
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1;
divs is a pointer to a char array (almost certainly a string). p_divs just points to the same string, and within the while loop a single character is extracted and written to tmp, and then the pointer is incremented, meaning that the next character will be extracted on the next iteration. If tmp matches c the function returns 1.
Edit: You should learn more about pointers, have a look at Pointer Arithmetic.

As I pointed out in the comments, I don't think C is really the ideal tool for this task (given a choice, I'd use C++ without a second thought).
However, I suppose if I'm going to talk about how horrible the code is, the counter-comment really was right: I should post something better. Contrary to the comment in question, however, I don't think this represents a compromise in elegance, concision, or performance.
The only part that might be open to real argument is elegance, but I think this is enough simpler and more straightforward that there's little real question in that respect. It's clearly more concise -- using roughly the same formatting convention as the original, my rev_words is 14 lines long instead of 17. As most people would format them, mine is 17 lines and his is 21.
For performance, I'd expect the two to be about equivalent under most circumstances. Mine avoids running off the beginning of the array, which saves a tiny bit of time. The original contains an early exit, which will save a tiny bit of time on reversing an empty string. I'd consider both insignificant though.
I think one more point is far more important though: I'm reasonably certain mine doesn't use/invoke/depend upon undefined behavior like the original does. I suppose some people might consider that justified if it provided a huge advantage in another area, but given that it's roughly tied or inferior in the other areas, I can't imagine how anybody would consider it (even close to) justified in this case.
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int contains(char const *input, char val) {
while (*input != val && *input != '\0')
++input;
return *input == val;
}
void rev_words(char *dest, size_t max_len, char const *input, char const *delims) {
char const *end = input + strlen(input);
char const *start;
char const *pos;
do {
for (; end>input && contains(delims, end[-1]); --end);
for (start=end; start>input && !contains(delims,start[-1]); --start);
for (pos=start; pos<end && max_len>1; --max_len)
*dest++=*pos++;
if (max_len > 1) { --max_len; *dest++ = ' '; }
end=start;
} while (max_len > 1 && start > input);
*dest++ = '\0';
}
int main(){
char reversed[100];
rev_words(reversed, sizeof(reversed), "This is an\tinput\nstring with\tseveral words in\n it.", " \t\n.");
printf("%s\n", reversed);
return 0;
}
Edit: The:
if (max_len > 1) { --max_len; *dest++ = ' '; }
should really be:
if (max_len > 1 && end-start > 0) { --max_len; *dest++ = ' '; }
If you want to allow for max_len < 1, you can change:
*dest++ = '\0';
to:
if (max_len > 0) *dest++ = '\0';
If the buffer length could somehow be set via input from a (possibly hostile) user, that would probably be worthwhile. For many purposes it's sufficient to simply require a positive buffer size.

Related

Scanning data from text file, that doesn't have spacing between each item of data

I have encountered a problem with my homework. I need to scan some data from a text file, to a struct.
The text file looks like this.
012345678;danny;cohen;22;M;danny1993;123;1,2,4,8;Nice person
223325222;or;dan;25;M;ordan10;1234;3,5,6,7;Singer and dancer
203484758;shani;israel;25;F;shaninush;12345;4,5,6,7;Happy and cool girl
349950234;nadav;cohen;50;M;nd50;nadav;3,6,7,8;Engineer very smart
345656974;oshrit;hasson;30;F;osh321;111;3,4,5,7;Layer and a painter
Each item of data to its matching variable.
id = 012345678
first_name = danny
etc...
Now I can't use a plain fscanf because there is no whitespace between the items, and fgets reads the whole line.
I found a solution using %[^;]s, but then I would need to write one block of code and copy and paste it 9 times, once for each item of data.
Is there any other option, without changing the text file, that is similar to the code I would write with fscanf if there were spaces between the items of data?
************* UPDATE **************
Hey, first of all, thanks everyone for the help, I really appreciate it.
I didn't understand all your answers, but here's something I did use.
Here's my code :
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
char *idP, *firstNameP, *lastNameP;
int age;
char gender, *userNameP, *passwordP, hobbies, *descriptionP;
}user;
void main() {
FILE *fileP;
user temp;
char test[99];
temp.idP = (char *)malloc(99);
temp.firstNameP = (char *)malloc(99);
temp.lastNameP = (char *)malloc(99);
temp.age = (int )malloc(4);
temp.gender = (char )malloc(sizeof(char));
temp.userNameP = (char *)malloc(99);
fileP = fopen("input.txt", "r");
fscanf(fileP, "%9[^;];%99[^;];%99[^;];%d;%c", temp.idP,temp.firstNameP,temp.lastNameP,&temp.age, temp.gender);
printf("%s\n%s\n%s\n%d\n%c", temp.idP, temp.firstNameP, temp.lastNameP, temp.age, temp.gender);
fgets(test, 60, fileP); // Just testing where it stop scanning
printf("\n\n%s", test);
fclose(fileP);
getchar();
}
It all works well until I scan the int variable, right after that it doesn't scan anything, and I get an error.
Thanks a lot.

As discussed in the comments, fscanf is probably the shortest option (although fgets followed by strtok, and manual parsing are viable options).
You need to use the %[^;] specifier for the string fields (meaning: a string of characters other than ;), with the fields separated by ; to consume the actual semicolons (which we specifically requested not to be consumed as part of the string field). The last field should be %[^\n] to consume up to the newline, since the input doesn't have a terminating semicolon.
You should also (always) limit the length of each string field read with a scanf family function to one less than the available space (the terminating NUL byte is the +1). So, for example, if the first field is at most 9 characters long, you would need char field1[10] and the format would be %9[^;].
It is usually a good idea to put a single space in the beginning of the format string to consume any whitespace (such as the previous newline).
And, of course you should check the return value of fscanf, e.g., if you have 9 fields as per the example, it should return 9.
So, the end result would be something like:
if (fscanf(file, " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%99[^;];%99[^\n]",
s.field1, s.field2, s.field3, &s.field4, …, s.field9) != 9) {
// error
break;
}
(Alternatively, the field with numbers separated by commas could be read as four separate fields as %d,%d,%d,%d, in which case the count would go up to 12.)
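To make that concrete, here is a minimal sketch that parses one line which has already been read (for example with fgets) using sscanf and the same kind of format. The struct layout, the field sizes and the name parse_user are all assumptions made for illustration. The password is read as text because one of the sample passwords (nadav) is not a number, and the hobbies are kept as a single text field.

```c
#include <stdio.h>

/* Hypothetical record type; field names and sizes are assumptions. */
struct user {
    char id[10];            /* 9 characters + NUL */
    char first_name[100];
    char last_name[100];
    int  age;
    char gender;
    char user_name[100];
    char password[100];
    char hobbies[100];      /* kept as text, e.g. "1,2,4,8" */
    char description[100];
};

/* Parses one line of the file; returns 1 on success, 0 on a malformed line. */
int parse_user(const char *line, struct user *u)
{
    /* Every string width is one less than the array it is stored in.
       Note &u->age and &u->gender: %d and %c need addresses. */
    int n = sscanf(line,
                   " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%99[^;];%99[^\n]",
                   u->id, u->first_name, u->last_name, &u->age, &u->gender,
                   u->user_name, u->password, u->hobbies, u->description);
    return n == 9;          /* all nine fields must have been converted */
}
```

With fixed-size arrays inside the struct there is nothing to malloc, and a short or malformed line simply makes parse_user return 0.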

Here is a simple tokenizer. As I see it, you have more than one delimiter here (; and ,).
str - string to be tokenized
del - string containing delimiters (in your case ";," or ";" only)
allowempty - if true, allows empty tokens when there are two or more consecutive delimiters
The return value is a NULL-terminated table of pointers to the tokens.
char **mystrtok(const char *str, const char *del, int allowempty)
{
char **result = NULL;
const char *end = str;
size_t size = 0;
int extrachar;
while(*end)
{
if((extrachar = !!strchr(del, *end)) || !*(end + 1))
{
/* add temp variable and malloc / realloc checks */
/* free allocated memory on error */
if(!(!allowempty && !(end - str)))
{
extrachar = !extrachar * !*(end + 1);
result = realloc(result, (++size + 1) * sizeof(*result));
result[size] = NULL;
result[size -1] = malloc(end - str + 1 + extrachar);
strncpy(result[size -1], str, end - str + extrachar);
result[size -1][end - str + extrachar] = 0;
}
str = end + 1;
}
end++;
}
return result;
}
To free the memory allocated by the tokenizer:
void myfree(char **ptr)
{
char **savedptr = ptr;
while(*ptr)
{
free(*ptr++);
}
free(savedptr);
}
The function is simple, but you can use any separators and any number of separators.

Reversing strings in C - Memory Direction

Recently I've been learning about strings and pointers in C and have learned that you can do some pretty cool things using pointer arithmetic, reversing a string being one of them. The following is the code that I am working with:
#include <stdio.h>
#include <string.h>
void print_reverse_string(char s[])
{
size_t length = strlen(s);
char *beginning = s;
char *end = beginning + length - 1;
while(end >= beginning)
{
printf("%c", *end);
end = end - 1;
}
}
int main()
{
printf("Please enter string to reverse: ");
char input_string[20];
fgets(input_string, 20, stdin);
/* Get rid of new line character from fgets function */
input_string[strlen(input_string) - 1] = '\0';
printf("Reversed string: ");
print_reverse_string(input_string);
return 0;
}
My concern begins with the following line of code:
char *end = beginning + length - 1;
This assumes that the ending memory location of an array will always be greater than the beginning memory location. Is this something that I should be concerned about as a C programmer or can I always be guaranteed that
&randomArray[0] < &randomArray[1] < &randomArray[2] < .... < &randomArray[lastElement]
It's just that I have been reading about different memory spaces and how certain spaces grow upwards while others grow downwards, for example, the stack growing downwards and the heap growing upwards, and thought that there might be a possibility of arrays growing downward in size.
Could this occur on a certain architecture or am I overthinking this possibility?

char *end = beginning + length - 1; leads to undefined behavior when length == 0. Your assumption that the ending memory location of an array is always greater is true, but the loop only stops once end >= beginning becomes false, that is, once end has been decremented to beginning - 1. Pointer arithmetic such as beginning + length - 1 is only valid from the beginning of an object to one element past its end. So beginning + 0 - 1 is UB.
The addresses of successive elements of an array compare in increasing order, regardless of the underlying address values, but the arithmetic is valid only in that narrow range.
Better to do
char *end = beginning + length;
while(end > beginning) {
end = end - 1;
printf("%c", *end);
}
Side issue: Should the first character read via fgets() be '\0', the below code attempts input_string[SIZE_MAX] = '\0'
// do not use
input_string[strlen(input_string) - 1] = '\0';
// alternatives
if (*input_string) input_string[strlen(input_string) - 1] = '\0';
// or
input_string[strcspn(input_string, "\n")] = '\0';
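The same "one past the end" idea also works if the string is to be reversed in place rather than printed. A small sketch (the helper name reverse_in_place is made up here): the range [lo, hi) is half-open, so hi starts at the terminator and no pointer ever moves below s, even for an empty string.

```c
#include <string.h>

/* Hypothetical helper: reverses s in place using a half-open range
   [lo, hi), so no pointer is ever formed before the start of s. */
void reverse_in_place(char *s)
{
    char *lo = s;
    char *hi = s + strlen(s);   /* one past the last character: valid */
    while (hi - lo > 1) {       /* at least two characters left to swap */
        --hi;
        char tmp = *lo;
        *lo++ = *hi;
        *hi = tmp;
    }
}
```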

C arrays are always allocated contiguously in memory, from lowest address to highest.

This assumes that the ending memory location of an array will always be greater than the beginning memory location.
Yes, your assumption is correct. An array stores its elements in contiguous memory locations and in increasing order.

All answers above me were correct. When you write
char input_string[20];
you get 20*sizeof(char) bytes of storage (typically on the stack), and in most expressions input_string is converted to a pointer to its first element and handled like a pointer. For example input_string[10] means *(input_string + 10), etc.
Instead of repeating others, here's a tricky solution of the same task. Recursion in C is lovely to me, or black magic to the uninitiated :) It takes some time to understand, though. Not for serious use, but for fun!
The two downsides of recursion are the danger of stack overflow and inefficiency. Neither of them is relevant to your question.
#include <stdio.h>
void reverse()
{
int c = getchar();
if (c != '\n' && c != EOF)
{
reverse();
putchar(c);
}
else
printf("Reversed string: ");
}
int main(void)
{
printf("Please enter string to reverse: ");
reverse();
return 0;
}

C—Infinite loop, I think?

I'm having an issue with a program in C and I think that a for loop is the culprit, but I'm not certain. The function is meant to take a char[] that has already been passed through a reverse function, and write it into a different char[] with all trailing white space characters removed. That is to say, any ' ' or '\t' characters that lie between a '\n' and any other character shouldn't be part of the output.
It works perfectly if there are no trailing white space characters, as in re-writing an exact duplicate of the input char[]. However, if there are any, there is no output at all.
The program is as follows:
#include<stdio.h>
#define MAXLINE 1000
void trim(char output[], char input[], int len);
void reverse(char output[], char input[], int len);
main()
{
int i, c;
int len;
char block[MAXLINE];
char blockrev[MAXLINE];
char blockout[MAXLINE];
char blockprint[MAXLINE];
i = 0;
while ((c = getchar()) != EOF)
{
block[i] = c;
++i;
}
printf("%s", block); // for debugging purposes
reverse(blockrev, block, i); // reverses block for trim function
trim(blockout, blockrev, i);
reverse(blockprint, blockout, i); // reverses it back to normal
// i also have a sneaking suspicion that i don't need this many arrays?
printf("%s", blockprint);
}
void trim(char output[], char input[], int len)
{
int i, j;
i = 0;
j = 0;
while (i <= len)
{
if (input[i] == ' ' || input[i] == '\t')
{
if (i > 0 && input[i-1] == '\n')
for (; input[i] == ' ' || input[i] == '\t'; ++i)
{
}
else
{
output[j] = input[i];
++i;
++j;
}
}
else
{
output[j] = input[i];
++i;
++j;
}
}
}
void reverse(char output[], char input[], int len)
{
int i;
for (i = 0; len - i >= 0; ++i)
{
output[i] = input[len - i];
}
}
I should note that this is a class assignment that doesn't allow the use of string functions, hence why it's so roundabout.

Change
for (i; input[i] == ' ' || input[i] == '\t'; ++i);
to
for (; i <= len && (input[i] == ' ' || input[i] == '\t'); ++i);
With the first method, if the whitespace is at the end, the loop will iterate indefinitely. Not sure how you didn't get an out of bounds access exception, but that's C/C++ for you.
Edit As Arkku brought up in the comments, make sure your character array is still NUL-terminated (the \0 character), and you can check on that case instead. Make sure you're not trimming the NUL character from the end either.

Declaring your main() function simply as main() is an obsolete style that should not be used. The function must be declared either as int main(void) or as int main(int argc, char *argv[]).
Your input process does not null-terminate your input. This means that what you're working with is not a "string", because a C string, by definition, is an array of char whose last element is a null character ('\0'). Instead, what you've got are simple arrays of char. This wouldn't be a problem as long as you're expecting that, and indeed your code is passing array lengths about, but you're also trying to print it with printf(), which requires C strings, not simple arrays of char.
Your reverse() function has an off-by-one error, because you aren't accounting for the fact that C arrays are zero-indexed, so what you're reversing is always one byte longer than your actual input.
What this means is that if you call reverse(output, input, 10), your code will start by assigning the value at input[10] to output[0], but input[10] is one past the end of your data, and since you didn't initialize your arrays before starting to fill them, that's an indeterminate value. In my testing, that indeterminate value happens, coincidentally, to have zero values much of the time, which means that output[0] gets filled with a null ('\0').
You need to subtract one more from the index into the input than you currently do. The loop-termination condition in the reverse() function is also wrong, to compensate for that: it should be len - i > 0, not len - i >= 0.
Your trim() function is unnecessarily complex. Additionally, it too has an incorrect loop condition to compensate for the off-by-one error in reverse(). The loop should be while ( i < len ), not while ( i <= len ).
Additionally, the trim() function has the ability to reduce the size of your data, but you don't provide a way to retain that information. (I see in the comments of Arkku's answer that you've corrected for this already. Good.)
Once you've fixed the issue with not keeping track of your data's size changes, and the off-by-one error which is copying indeterminate data (which happens, coincidentally, to be a null) from the end of the blockout array to the beginning of the blockprint array when you do the second reverse(), and you fix the incorrect <= condition in trim() and the incorrect >= condition in reverse(), and you null-terminate your byte array before passing it to printf(), your program will work.
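As a rough illustration of those fixes, still without any string functions: the reverse() below has the index and loop condition corrected and terminates its output, and trim_trailing() is a hypothetical replacement for trim() (my naming, not part of the original program) that returns the new length, so the caller can keep track of the size change. It also works on the un-reversed text, which would make both reversals unnecessary. Both assume output has room for len + 1 chars.

```c
/* reverse() with the off-by-one fixed: copies input[len-1] .. input[0]
   into output[0] .. output[len-1], then null-terminates output. */
void reverse(char output[], const char input[], int len)
{
    int i;
    for (i = 0; i < len; ++i)
        output[i] = input[len - 1 - i];
    output[len] = '\0';
}

/* Hypothetical replacement for trim(): copies input[0..len-1] to output,
   dropping spaces and tabs that sit directly before a '\n' or at the very
   end of the data. Returns the new length so the caller can track it. */
int trim_trailing(char output[], const char input[], int len)
{
    int i, k, j = 0;
    int pending = 0;              /* blanks seen but not yet copied */
    for (i = 0; i < len; ++i) {
        if (input[i] == ' ' || input[i] == '\t') {
            ++pending;            /* decide later whether these are trailing */
        } else {
            if (input[i] != '\n') {
                /* blanks were followed by text, so they are not trailing */
                for (k = i - pending; k < i; ++k)
                    output[j++] = input[k];
            }
            pending = 0;
            output[j++] = input[i];
        }
    }
    output[j] = '\0';             /* blanks still pending here are dropped */
    return j;
}
```

In main() you would then terminate block with block[i] = '\0'; after the input loop and carry on with the length returned by trim_trailing() rather than the original i.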

(Moved from comments to an answer)
My guess is that the problem is outside this function, and is caused by the fact that in the described problem cases the output is shorter than the input. Since you are passing the length of the string as an argument, you need to calculate the length of the string after trim, because it may have changed...
For instance, passing an incorrect length to reverse can cause the terminating NUL character (and possibly some leftover whitespace) to end up at the beginning of the string, thus making the output appear empty.
edit: After seeing the edited question with the code of reverse included, in addition to the above problem, your reverse puts the terminating NUL as the first character of the reversed string, which causes it to be the empty string (in some cases your second reverse puts it back, so you don't see it without printing the output of the first reverse). Note that input[len] contains the '\0', not the last character of the string itself.
edit 2: Furthermore, you are not actually terminating the string in block before using it. It may be the case that the uninitialised array often happens to contain zeroes that serve to terminate the string, but for the program to be correct you absolutely need to terminate it with block[i] = '\0'; immediately after the input loop. Similarly you need to ensure NUL-termination of the outputs of reverse and trim (in case of trim it seems to me that this already happens as a side-effect of having the loop condition i <= len instead of i < len, but it's not a sign of good code that it's hard to tell).

C reverse function won't work with 32 characters

To test my skills, I'm trying to write my own version of a few standard library functions. I wrote a replacement for strlen(), strlength():
int strlength(const char *c){
int len = 0;
while (*c != '\0') {
c++;
len++;
}
return len;
}
which doesn't include the null-terminator, and I am trying to write a function to reverse a string. This:
char *reverse(const char *s){
char *str = (char *)malloc(sizeof(char) * strlength(s));
int i = 0;
while (i < strlength(s)) {
str[i] = s[(strlength(s) - 1) - i];
i++;
}
str[strlength(s)] = '\0';
return str;
}
works for every string except for one with 32 characters (not including null-terminator) like foofoofoofoofoofoofoofoofoofoofo. It hangs in the reverse() function's while-loop. For all other amounts of characters, it works. Why is this happening?

Your buffer for str is off by 1. Your writes are overflowing into the rest of your heap.
As for why it works for values other than 32, my guess is that it has to do with the heap's memory alignment. The allocator is adding extra padding for smaller buffer sizes, but 32 bytes is nicely aligned (it's a power of 2, multiple of 8, etc.), so it doesn't add the extra padding and that's causing your bug to manifest. Try some other multiples of 8 and you'll probably get the same behavior.

char *reverse(const char *s){
here you allocate N characters (where N is the length of s without \0):
char *str = (char *)malloc(sizeof(char) * strlength(s));
then you iterate N times over all characters of s
int i = 0;
while (i < strlength(s)) {
str[i] = s[(strlength(s) - 1) - i];
i++;
}
and finally you add \0 at N+1 characters
str[strlength(s)] = '\0';
return str;
}
so you should do instead:
char *str = malloc(sizeof(*str) * (strlength(s) + 1)); // +1 for final `\0`
and the funny thing is that I just tested your code, and it works fine for me (with one character off) and your 32-character string. As @JoachimPileborg says, "That's the fun thing about undefined behavior"!
As suggested by others, the problem is certainly due to memory alignment, when you get your data aligned with your memory it overflows, whereas when it is not aligned it overwrites padding values.

You asked:
But this works for every other string length. Why won't it for 32?
Most likely because the runtime allocates memory in blocks of 32 bytes. So a 1-character buffer overrun when the buffer size is, say, 22 bytes, isn't a problem. But when you allocate 32 bytes and try to write to the 33rd byte, the problem shows up.
I suspect you'd see the same error with a string of 64 characters, 96, 128, etc . . .

Replace
malloc(sizeof(char) * strlength(s))
by
malloc(sizeof(char) * (1+strlength(s)));
The line:
str[strlength(s)] = '\0';
often works because the malloc library routine typically reserves a block rounded up to an alignment boundary and hands out only as much of it as was requested in the call, so the one-byte overrun lands in unused padding. When the overflowing data is written beyond the reserved block, the behaviour is undefined, and examining the disassembly in a debugger is the best way to understand what your particular toolchain and target do. Since the line follows the while loop rather than sitting inside it, how the while loop turns into an infinite one cannot be predicted without the disassembly.

What everybody else said about buffer overflow and the vagaries of your runtime's memory allocation implementation/strategy. The size of a C-string is 1 more than its length, due to the NUL-termination octet.
Something like this ought to do you (a little cleaner and easier to read):
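#include <stdlib.h>
#include <string.h>
char *reverse( const char *s )
{
size_t len = strlen(s) ;
char *buf = malloc( 1+len ) ; /* +1 for the NUL terminator */
if ( buf == NULL ) return NULL ;
char *tgt = buf + len ;
const char *src = s ;
*tgt = '\0' ;
while ( *src )
{
*(--tgt) = *(src++) ; /* fill the buffer from the back */
}
return tgt ; /* tgt is now back at the start of the buffer */
}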
Your strlen() implementation is more complicated than it needs to be, too. This is about all you need:
int string_length( const char *s )
{
const char *p = s ;
while ( *p ) ++p ;
return p - s ;
}

Fast C comparison

As part of a protocol I'm receiving C strings of the following format:
WORD * WORD
Where both WORDs are the same given string.
And, * - is any string of printable characters, NOT including spaces!
So the following are all legal:
WORD asjdfnkn WORD
WORD 234kjk2nd32jk WORD
And the following are illegal:
WORD akldmWORD
WORD asdm zz WORD
NOTWORD admkas WORD
NOTWORD admkas NOTWORD
Where (1) is missing a trailing space; (2) has 3 or more spaces; (3)/(4) do not open/end with the correct string (WORD).
Of course this could be implemented in a pretty straightforward way, however I'm not sure that what I'm doing is the most efficient.
Note: WORD is pre-set for a whole run, however could change from run to run.
Currently I'm strncmping each string against "WORD ".
If that check passes, I manually run over the string (char-by-char) to look for the second space char.
[If found] I then strcmp (all the way) with "WORD".
Would love to hear your solution, with an emphasis on efficiency, as I'll be running over millions of these in real time.

I'd say, have a look at the algorithms in Handbook of Exact String-Matching Algorithms, compare the complexities and choose the one that you like best, implement it.
Or you can use some ready-made implementations.
You have some really classical algorithms for searching strings inside another string here:
KMP(Knuth-Morris-Pratt)
Rabin-Karp
Boyer-Moore
Hope this helps :)

Have you profiled?
There's not much gain to be had here, since you're doing basic string comparisons. If you want to go for the last few percent of performance, I'd change out the str... functions for mem... functions.
char *bufp, *bufe; // pointer to buffer, one past end of buffer
if (bufe - bufp < wordlen * 2 + 2)
error();
if (memcmp(bufp, word, wordlen) || bufp[wordlen] != ' ')
error();
bufp += wordlen + 1;
char *datap = bufp;
char *datae = memchr(bufp, ' ', bufe - bufp);
if (!datae || bufe - datae < wordlen + 1)
error();
if (memcmp(datae + 1, word, wordlen))
error();
// Your data is in the range [datap, datae).
The performance gains are likely less than spectacular. You have to examine each character in the buffer since each character could be a space, and any character in the delimiters could be wrong. Changing a loop to memchr is slick, but modern compilers know how to do that for you. Changing a strncmp or strcmp to memcmp is also probably going to be negligible.

There is probably a tradeoff to be made between the shortest code and the fastest implementation. Choices are:
1. The regular expression ^WORD \S+ WORD$ (requires a regex engine)
2. strchr on "WORD " and a strrchr on " WORD" with a lot of messy checks (not really recommended)
3. Walking the whole string character by character, keeping track of the state you are in (scanning first word, scanning first space, scanning middle, scanning last space, scanning last word, expecting end of string).
Option 1 requires the least code but backtracks near the end, and Option 2 has no redeeming qualities. I think you can do option 3 elegantly. Use a state variable and it will look okay. Remember to manually enter the last two states based on the length of your word and the length of your overall string and this will avoid the backtracking that a regex will most likely have.
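A minimal sketch of the state-variable approach (option 3), assuming the length of the string is known up front and the middle part must be non-empty. The function name scan_line and the state names are made up for illustration. The position of the last space is computed from the two lengths, so the scanner enters the final states without any backtracking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names for the single pass over the string. */
enum scan_state { FIRST_WORD, FIRST_SPACE, MIDDLE, LAST_WORD };

/* Returns true when s[0..len) is "<word> <middle> <word>" and <middle>
 * is non-empty and contains no space. */
static bool scan_line(const char *s, size_t len,
                      const char *word, size_t wordlen)
{
    if (wordlen == 0 || len < 2 * wordlen + 3)
        return false;

    /* Index at which the closing " <word>" has to start. */
    const size_t last_space = len - wordlen - 1;
    enum scan_state st = FIRST_WORD;
    size_t w = 0; /* position inside word */

    for (size_t i = 0; i < len; i++) {
        const char c = s[i];
        switch (st) {
        case FIRST_WORD:
            if (c != word[w])
                return false;
            if (++w == wordlen)
                st = FIRST_SPACE;
            break;
        case FIRST_SPACE:
            if (c != ' ')
                return false;
            st = MIDDLE;
            break;
        case MIDDLE:
            if (i == last_space) {
                if (c != ' ')
                    return false;
                st = LAST_WORD;
                w = 0;
            } else if (c == ' ') {
                return false; /* a space too early means too many spaces */
            }
            break;
        case LAST_WORD:
            if (c != word[w++])
                return false;
            break;
        }
    }
    return st == LAST_WORD && w == wordlen;
}
```

Every character is examined exactly once, and the function returns at the first character that cannot belong to a legal string.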

Do you know how long the string that is to be checked is? If not, you are somewhat limited in what you can do. If you do know how long the string is, you can speed things up a bit. You have not specified for sure that the '*' part has to be at least one character. You've also not stipulated whether tabs are allowed, or newlines, or ... is it only alphanumerics (as in your examples) or are punctuation and other characters allowed? Control characters?
You know how long WORD is, and can pre-construct both the start and end markers. The function error() reports an error (however you need it to be reported) and returns false. The test function might be bool string_is_ok(const char *string, int actstrlen);, returning true on success and false when there is a problem:
#include <stdbool.h>
#include <string.h>
bool error(const char *msg); // Reports the problem and returns false (defined elsewhere)
// Preset variables characterizing the search
// (file-scope initializers must be constants in C, so the derived values are spelled out)
static int wordlen = 4;
static int marklen = 5;    // wordlen + 1
static int minstrlen = 11; // 2 * marklen + 1: two blanks and one other character.
static char bword[] = "WORD "; // Start marker
static char eword[] = " WORD"; // End marker
static char verboten[] = " "; // Forbidden characters
bool string_is_ok(const char *string, int actstrlen)
{
    if (actstrlen < minstrlen)
        return error("string too short");
    if (strncmp(string, bword, marklen) != 0)
        return error("string does not start with WORD");
    if (strcmp(string + actstrlen - marklen, eword) != 0)
        return error("string does not finish with WORD");
    // strcspn() returns size_t, so compare against an unsigned value
    if (strcspn(string + marklen, verboten) != (size_t)(actstrlen - 2 * marklen))
        return error("string contains verboten characters");
    return true;
}
You probably can't reduce the tests by much if you want your guarantees. The part that would change most depending on the restrictions in the alphabet is the strcspn() line. That is relatively fast for a small list of forbidden characters; it will likely be slower as the number of characters forbidden is increased. If you only allow alphanumerics, you have 62 OK and 193 not OK characters, unless you count some of the high-bit set characters as alphabetic too. That part will probably be slow. You might do better with a custom function that takes a start position and length and reports whether all characters are OK. This could be along the lines of:
#include <stdbool.h>
static bool ok_chars[256] = { false };
static void init_ok_chars(void)
{
    // The "..." is a placeholder for the remaining permitted characters.
    const unsigned char *ok = (const unsigned char *)"abcdefghijklmnopqrstuvwxyz...0123456789";
    int c;
    while ((c = *ok++) != 0)
        ok_chars[c] = true;
}
static bool all_chars_ok(const char *check, int numchars)
{
    for (int i = 0; i < numchars; i++)
        if (!ok_chars[(unsigned char)check[i]]) // cast: plain char may be negative
            return false;
    return true;
}
You can then use:
return all_chars_ok(string + marklen, actstrlen - 2 * marklen);
in place of the call to strcspn().

If your "stuffing" should contain only '0'-'9', 'A'-'Z' and 'a'-'z' and is in some ASCII-based encoding (like most Unicode-based encodings), then you can skip two comparisons in one of your loops, since only one bit differs between upper-case and lower-case letters.
Instead of
(ch>='0' && ch<='9') || (ch>='A' && ch<='Z') || (ch>='a' && ch<='z')
you get
ch2 = ch & ~('a' ^ 'A');
(ch>='0' && ch<='9') || (ch2>='A' && ch2<='Z')
But you had better look at the assembler code your compiler generates and do some benchmarking; depending on computer architecture and compiler, this trick could give slower code.
If branching is expensive compared to comparisons on your computer, you can also replace the && and || with & and |. But most modern compilers know this trick in most situations.
If, on the other hand, you test for any printable glyph from some large character encoding, then it is most likely less expensive to test for white-space glyphs rather than printable glyphs.
Also, compile specifically for the computer that the code will run on, and don't forget to turn off any generation of debugging code.
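The case-folding test described above can be sketched as a small function. This is only an illustration under the stated assumption of an ASCII-based encoding; the name is_ascii_alnum is made up, and whether it beats the three-range version has to be measured on your own compiler and machine.

```c
#include <stdbool.h>

/* Hypothetical helper. Assumes an ASCII-based encoding, where 'a' and 'A'
 * differ in exactly one bit (0x20). */
static bool is_ascii_alnum(unsigned char ch)
{
    /* Clear the case bit so that 'a'..'z' map onto 'A'..'Z'. */
    const unsigned char folded = (unsigned char)(ch & ~('a' ^ 'A'));

    /* Digits are tested on the original byte, letters on the folded one. */
    return (ch >= '0' && ch <= '9') || (folded >= 'A' && folded <= 'Z');
}
```

Clearing the bit also changes non-letters (for example '{' becomes '['), but none of them land inside 'A'..'Z', so the test stays exact for ASCII input.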
Added:
Don't make subroutine calls within your scan loops, unless it is worth it.
Whatever trick you use to speed up your loops, its benefit will diminish if you have to make a subroutine call within one of them. It is fine to use built-in functions that your compiler inlines into your code, but if you use something like an external regex library and your compiler is unable to inline those functions (gcc can do that, sometimes, if you ask it to), then making that subroutine call will shuffle a lot of memory around, in the worst case between different types of memory (registers, CPU buffers, RAM, hard disk, etc.), and may mess up CPU predictions and pipelines. Unless your text snippets are very long, so that you spend much time parsing each of them, and the subroutine is efficient enough to compensate for the cost of the call, don't do that. Some parsing functions use callbacks; that might be more efficient than making a lot of subroutine calls from your loops (since the function can scan several pattern matches in one sweep and bunch several callbacks together outside the critical loop), but that depends on how someone else has written that function, and basically it is the same thing as you making the call.

WORD is 4 characters, so with a uint32_t you could do a quick comparison. You will need a different constant depending on system endianness. The rest seems to be fine.
Since WORD can change you have to precalculate the uint32_t, uint64_t, ... you need depending on the length of the WORD.
Not sure from the description, but if you trust the source you could just chomp the first n+1 and last n+1 characters.
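A minimal sketch of the uint32_t idea for a 4-character WORD. The helper names load32 and starts_with_word4 are hypothetical. Instead of hard-coding an endianness-dependent constant, this variant builds the key at run time with the same load that is used on the input, so both sides have the same byte order on any machine.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack the four bytes at p into a uint32_t.
 * memcpy avoids unaligned reads and strict-aliasing problems; compilers
 * typically turn it into a single load. */
static uint32_t load32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Compare the first four bytes of s with a precomputed key.
 * Assumes s has at least four readable bytes. */
static bool starts_with_word4(const char *s, uint32_t word_key)
{
    return load32(s) == word_key;
}
```

The key would be computed once per run, for example with load32("WORD"), and reused for every string; the same comparison at offset length - 4 checks the trailing word.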

#include <stdbool.h>
#include <stddef.h>
#include <string.h>
bool check_legal(
    const char *start, const char *end,
    const char *delim_start, const char *delim_end,
    const char **content_start, const char **content_end
) {
    const size_t delim_len = delim_end - delim_start;
    const char *p = start;
    // Shortest legal line: delimiter, space, empty content, space, delimiter.
    if ((size_t)(end - start) < delim_len + 1 + 0 + 1 + delim_len)
        return false;
    if (memcmp(p, delim_start, delim_len) != 0)
        return false;
    p += delim_len;
    if (*p != ' ')
        return false;
    p++;
    *content_start = p;
    // Scan the content, stopping where the closing " delimiter" must begin.
    while (p < end - 1 - delim_len && *p != ' ')
        p++;
    // A space must sit exactly delim_len + 1 bytes before the end.
    if (p + 1 + delim_len != end || *p != ' ')
        return false;
    *content_end = p;
    p++;
    if (memcmp(p, delim_start, delim_len) != 0)
        return false;
    return true;
}
And here is how to use it:
const char *line = "who is who";
const char *delim = "who";
const char *start, *end;
if (check_legal(line, line + strlen(line), delim, delim + strlen(delim), &start, &end)) {
    // "%.*s" limits the output to the content; "%*s" would only set a minimum width
    printf("this %.*s nice\n", (int) (end - start), start);
}
(It's all untested.)

Using the STL, count the number of spaces; if there are not exactly two, the string is obviously wrong. Using std::find (from <algorithm>) you can get the positions of the two spaces and therefore the middle word. Check for WORD at the beginning and at the end, and you are done.

This should return the true/false condition in O(n) time
int sameWord(char *str)
{
    char *word1, *word2;
    word1 = word2 = str;
    // word1 and word2 both point to the beginning of the line, where the first word is
    while (*word2 && *word2 != ' ') ++word2; // skip to first space
    if (*word2 == ' ') ++word2; // skip space
    // word1 points to the first word, word2 points to the middle filler
    while (*word2 && *word2 != ' ') ++word2; // skip to second space
    if (*word2 == ' ') ++word2; // skip space
    // word1 points to the first word, word2 points to the last word
    // Now just compare that word1 and word2 point to identical strings.
    while (*word1 != ' ' && *word2)
        if (*word1++ != *word2++) return 0; // false
    return *word1 == ' ' && (*word2 == 0 || *word2 == ' ');
}
