C Get complete words from first 15 characters of string - c

I have a function that will return either the first 13 characters of a string or the second 13 characters of a string:
char* get_headsign_text(char* string, int position) {
if (position == 1){
char* myString = malloc(13);
strncpy(myString, string, 13);
myString[13] = '\0'; //null terminate destination
return myString;
free(myString);
} else {
char* myString = malloc(13);
string += 13;
strncpy(myString, string, 13);
myString[13] = '\0'; //null terminate destination
return myString;
free(myString);
}
}
I would like to have it so that the function will return only complete words (not cutoff words in the middle).
Example:
If the string is "Hi I'm Christopher"
get_headsign_text(string, 1) = "Hi I'm "
get_headsign_text(string, 2) = "Christopher"
So if the function would cut in the middle of a word, it should instead cut before that word; and when it fetches the second 13 characters, it should include the word that would have been cut.

When taking various edge cases into consideration, the structure of the code needs to change considerably.
For instance:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
static inline int min_int(int a, int b) {
    return a < b ? a : b;
}
static inline int is_word_char(char c) {
    /* isgraph() requires a value representable as unsigned char */
    return isgraph((unsigned char)c);
}
char* get_headsign_text(char* string, int position) {
int start_index, end_index;
if (position == 1) {
start_index = 0;
} else {
start_index = 13;
}
end_index = min_int((int)strlen(string), start_index + 13); /* never index past the terminator */
start_index = min_int(start_index, end_index);
int was_word_char = 1;
while(start_index > 0 && (was_word_char = is_word_char(string[start_index]))) {
--start_index;
}
if(!was_word_char && start_index < end_index) {
++start_index; /* step past the separator that stopped the scan */
}
while(end_index > start_index && is_word_char(string[end_index])) {
--end_index;
}
int myStringLen = end_index - start_index;
char *myString = malloc(myStringLen + 1);
strncpy(myString, string + start_index, myStringLen);
myString[myStringLen] = '\0';
return myString;
}
int main(void) {
char s[] = "Hi, I\'m Christopher";
char *r1 = get_headsign_text(s, 1);
char *r2 = get_headsign_text(s, 2);
printf("<%s>\n<%s>\n", r1, r2);
free(r1);
free(r2);
return 0;
}
That said, there are numerous other problems/concerns with the code snippet you posted:
In the assignment myString[13] = '\0';, you are assigning to memory which you have not allocated. Although you have allocated 13 bytes, myString[13] refers to one byte past the last allocated byte.
Nothing after the return statement gets executed, and the calls to free are never reached.
You shouldn't be returning a block of memory only to free it immediately! It's quite counter-productive to give something to the caller only to take it away. :)
You do not validate the size of the string. Unless you are absolutely certain this will only be called on strings of sufficient length, your function will segfault when, say, position is 2 and your string buffer is only, say, 10 bytes long.

You need to check whether the last character of the chunk is a space (' '). If it is not, search backward for the previous space and cut your string at that index.

Keep track of your spaces using an index variable. If the character count reaches 13 and the current character is not a space or the null terminator, shorten the chunk so that it ends at the last space index. Save that string, and then continue from the last space index.
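The approach described above can be sketched roughly as follows. This is only an illustration, not the asker's function: the name next_chunk, the cursor parameter and the rule of splitting a single word that is longer than the width are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the next chunk of at most `width` characters
   from *cursor, cutting after the last space if a word would be split.
   Advances *cursor past the chunk. Returns a malloc'd string (caller frees),
   or NULL when no text is left or on allocation failure. */
char *next_chunk(const char **cursor, size_t width)
{
    assert(cursor != NULL && *cursor != NULL && width > 0);

    const char *s = *cursor;
    size_t remaining = strlen(s);
    if (remaining == 0)
        return NULL;

    size_t len = remaining < width ? remaining : width;

    /* If the cut lands inside a word, back up to just after the last space. */
    if (len < remaining && s[len] != ' ') {
        size_t cut = len;
        while (cut > 0 && s[cut - 1] != ' ')
            --cut;
        if (cut > 0)        /* cut == 0: one word longer than width, split it */
            len = cut;
    }

    char *out = malloc(len + 1);    /* + 1 for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';

    *cursor = s + len;              /* the next call continues from here */
    return out;
}
```

Calling it twice with a width of 13 on "Hi I'm Christopher" should give "Hi I'm " and then "Christopher"; the caller owns each returned string and has to free it.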

You have a multitude of issues.
First issue: you call free after returning myString, so the free is never reached and this function will not free the string.
Second issue: you allocate 13 chars, then you set the char at index 13 (the 14th one) to null. Are you sure this does what you expect?
Third issue - why are you adding 13 to the string pointer? What should that do?
Lastly, you should think about which character is used to separate words - when you've figured which one that is, try to scan for it and cut your string where it is.

Related

split a char array into tokens where the separator is NUL char

I want to split a char array into tokens using the NUL char as the separator.
I have a char array that I've received over the network from a recv command, so I know the length of the char array. In that char array there are bunch of strings that are separated by the NUL char (\0).
Because the separator is the NUL char, I can't use strtok: it treats NUL as the end of the string.
So I want to iterate through all the strings starting from byte 8 (the strings are preceded by 2 32 bit integers).
I was thinking I could iterate though all the characters looking for the \0 character and then doing a memcpy of the length I have found so far, but I figured there must be a nicer method than this.
What other approach can I take?
Here is some simple code showing how you can get the contained strings:
#include <stdio.h>
#include <string.h>
int main(void) {
char recbuf[7] = {'a', 'b', 'c', '\0', 'd', 'e', '\0'};
int recbuf_size = 7;
int j = 0;
char* p = recbuf;
while(j < recbuf_size)
{
printf("%s\n", p); // print the string found
// Here you could copy the string if needed, e.g.
// strcpy(mySavedStrings[stringCount++], p);
int t = strlen(p); // get the length of the string just printed
p += t + 1; // move to next string - add 1 to include string termination
j += t + 1; // remember how far we are
}
return 0;
}
Output:
abc
de
If you need to skip some bytes in the start of the buffer then just do:
int number_of_bytes_to_skip = 4;
int j = number_of_bytes_to_skip;
char* p = recbuf + number_of_bytes_to_skip;
Notice:
The code above assumes that the receive buffer is always correctly terminated with a '\0'. In real world code, you should check that before running the code and add error handling, e.g.:
if (recbuf[recbuf_size-1] != '\0')
{
// Some error handling...
}
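If the buffer cannot be trusted to end in '\0', one option is to search with memchr, which is given an explicit byte count, instead of strlen. The following is a minimal sketch under that assumption; split_nul and its parameters are made-up names, and the pointers it stores refer into the original buffer rather than to copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores pointers to the NUL-terminated strings found in
   buf[skip .. size) into out[], without copying. Stops after max_out strings
   or at the first string that is not terminated inside the buffer.
   Returns the number of pointers stored. */
size_t split_nul(const char *buf, size_t size, size_t skip,
                 const char **out, size_t max_out)
{
    assert(buf != NULL && out != NULL);

    size_t count = 0;
    size_t pos = skip;
    while (pos < size && count < max_out) {
        /* memchr never reads past the `size - pos` bytes it is given. */
        const char *end = memchr(buf + pos, '\0', size - pos);
        if (end == NULL)
            break;                      /* unterminated tail: ignore it */
        out[count++] = buf + pos;
        pos = (size_t)(end - buf) + 1;  /* step over the terminator */
    }
    return count;
}
```

For the case in the question, skip would be 8 (the two 32-bit integers) and size would be the byte count returned by recv.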
NUL separation actually makes your job a lot easier.
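char* DestStrings[MAX_STRINGS];
int j = 0;          /* number of NUL terminators skipped so far */
int length = 0;     /* total number of characters consumed so far */
int prevLength = 0;
int offset = 8;     /* skip the two leading 32-bit integers */
for (int i = 0; i < MAX_STRINGS; i++)
{
    const char *start = &srcbuffer[j + offset + length];
    length += strlen(start);
    if (length == prevLength)
    {
        break;      /* empty string: no more data */
    }
    else
    {
        DestStrings[i] = malloc(length - prevLength + 1);
        strcpy(DestStrings[i], start);
        prevLength = length;
        j++;
    }
}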
You need to add a few additional checks to avoid potential buffer overflow errors.
Hope this code gives you an idea of how to go ahead.
EDIT 1:
This is code to start from, not the entire solution. Following the down-votes, I have modified the indexing.
EDIT 2:
As the length of the received data buffer is already known, please append a NUL to the received data to make this code work as it is. Alternatively, the length of the received data can itself be compared with the length copied so far.
I'd suggest using a structure implementing the tokenizer for doing such kind of work. It'll be easier to read and to maintain because it looks similar to object oriented code. It isolates the memcpy, so I think it's "nicer".
First, the headers I'll use:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
The Tokenizer structure has to remember the beginning of the string (so that we can free the memory once it is no longer needed), the current index, and the end index to check whether we have already parsed the whole string:
struct Tokenizer {
char *string;
char *actual_index;
char *end_index;
};
I suggest using a factory-like function to create a tokenizer. It's constructed here, copying the input string using memcpy because string.h functions stop on the first '\0' character.
struct Tokenizer getTokenizer(char string[], unsigned length) {
    struct Tokenizer tokenizer;
    /* one extra byte so the copy is always NUL-terminated */
    tokenizer.string = (char *)malloc(length + 1);
    tokenizer.actual_index = tokenizer.string;
    tokenizer.end_index = tokenizer.string + length;
    memcpy(tokenizer.string, string, length);
    tokenizer.string[length] = '\0';
    return tokenizer;
}
Now the function responsible for getting the tokens. It returns new allocated strings, which have a '\0' character at their end. It also changes the address actual_index is pointing to. It takes the address of the tokenizer as its argument, so it can change its values:
char * getNextToken(struct Tokenizer *tokenizer) {
    char * token;
    size_t length;
    if(tokenizer->actual_index >= tokenizer->end_index)
        return NULL;
    length = strlen(tokenizer->actual_index);
    token = (char *)malloc(length + 1);
    // + 1 because the '\0' character has to fit in
    if(token == NULL)
        return NULL;
    memcpy(token, tokenizer->actual_index, length + 1);
    // skip the token and its terminator to reach the next position
    tokenizer->actual_index += length + 1;
    return token;
}
Sample use of the tokenizer, to show how to handle the memory allocation and how to use it:
int main() {
char c[] = "Lorem\0ipsum dolor sit amet,\0consectetur"
" adipiscing elit. Ut\0rhoncus volutpat viverra.";
char *temp;
struct Tokenizer tokenizer = getTokenizer(c, sizeof(c));
while((temp = getNextToken(&tokenizer))) {
puts(temp);
free(temp);
}
free(tokenizer.string);
return 0;
}
Assuming this input data:
char input[] = {
0x01, 0x02, 0x0a, 0x0b, /* A 32bit integer */
'h', 'e', 'l', 'l', 'o', 0x00,
'w', 'o', 'r', 'l', 'd', 0x00,
0x00 /* Necessary to mark the end of the payload. */
};
A 32-bit integer at the beginning gives:
const size_t header_size = sizeof (uint32_t);
Parsing the input can be done by locating the first character of each string, storing a pointer to it, then advancing by the length of that string plus 1 (for the terminator), and repeating until the end of the input has been reached.
size_t strings_elements = 1; /* Set this to which ever start size you like. */
size_t delta = 1; /* 1 is conservative and slow for larger input,
increase as needed. */
/* Result as array of pointers to "string": */
char ** strings = malloc((strings_elements + 1) * sizeof *strings); /* + 1 for the NULL stopper */
{
char * pc = input + header_size;
size_t strings_found = 0;
/* Parse input, if necessary increase result array, and populate its elements: */
while ('\0' != *pc)
{
if (strings_found >= strings_elements)
{
strings_elements += delta;
void * pvtmp = realloc(
strings,
(strings_elements + 1) * sizeof *strings /* Allocate one more to have a
stopper, being set to NULL as a sentinel.*/
);
if (NULL == pvtmp)
{
perror("realloc() failed");
exit(EXIT_FAILURE);
}
strings = pvtmp;
}
strings[strings_found] = pc;
++strings_found;
pc += strlen(pc) + 1;
}
strings[strings_found] = NULL; /* Set a stopper element.
NULL terminate the pointer array. */
}
/* Print result: */
{
char ** ppc = strings;
for(; NULL != *ppc; ++ppc)
{
printf("%td: '%s'\n", ppc - strings + 1, *ppc);
}
}
/* Clean up: */
free(strings);
If you need to copy on split, replace this line
strings[strings_found] = pc;
by
strings[strings_found] = strdup(pc);
and add clean-up code after using and before free()ing strings:
{
char ** ppc = strings;
for(; NULL != *ppc; ++ppc)
{
free(*ppc);
}
}
The code above assumes that at least one '\0' (NUL, aka the null character) follows the payload.
If that condition is not met, you need either some other terminating sequence to be defined, or to know the size of the input from some other source. If you have neither, the problem is not solvable.
The code above needs the following headers:
#include <inttypes.h> /* for uint32_t */
#include <stdio.h> /* for printf(), perror() */
#include <string.h> /* for strlen() */
#include <stdlib.h> /* for realloc(), free(), exit() */
It might also need one of the following defines:
#define _POSIX_C_SOURCE 200809L
#define _GNU_SOURCE
or what else your C compiler requires to make strdup() available.

Buffer Overflow - Char Array not removed from stack after exiting function

I am trying to concatenate a few strings to a buffer. However, if I call the function repeatedly, the size of my buffer will keep growing.
void print_message(char *str) {
char message[8196];
sender *m = senderlist;
while(m) {
/* note: stricmp() is a case-insensitive version of strcmp() */
if(stricmp(m->sender,str)==0) {
strcat(message,m->sender);
strcat(message,", ");
}
m = m->next;
}
printf("strlen: %i",strlen(message));
printf("Message: %s\n",message);
return;
}
The size of message will continuously grow until the length will be 3799.
Example:
1st. call: strlen = 211
2nd call: strlen = 514
3rd call: strlen = 844
...
nth call: strlen = 3799
nth +1 call: strlen = 3799
nth +2 call: strlen = 3799
My understanding was, that statically allocated variables like char[] will automatically be freed upon exiting the function, and I'm not dynamically allocating anything on the heap.
And why does it suddenly stop growing at 3799 bytes? Thanks for any pointers.
Add one more statement after the buffer definition
char message[8196];
message[0] = '\0';
Or initialize the buffer when it is defined
char message[8196] = { '\0' };
or
char message[8196] = "";
that is fully equivalent to the previous initialization.
The problem with your code is that the compiler does not initialize the buffer unless you specify an initializer explicitly. So the array message contains garbage, yet strcat first searches the buffer for the terminating zero in order to append the new string after it. Your program therefore has undefined behaviour.
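Once the buffer starts out as an empty string, the appends themselves can also be bounded. The sketch below is one possible way to do that with snprintf; append_name is a hypothetical helper, not part of the original code, and it assumes the destination already holds a valid NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends `src` followed by ", " to the string in `dst`,
   which has room for `cap` bytes in total. `dst` must already hold a valid
   NUL-terminated string (for example ""). Returns 1 on success, or 0 if the
   text would not fit, in which case `dst` is left unchanged. */
int append_name(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t used = strlen(dst);
    size_t need = strlen(src) + 2;          /* the name plus ", " */
    if (used >= cap || need >= cap - used)  /* keep one byte for the NUL */
        return 0;
    snprintf(dst + used, cap - used, "%s, ", src);
    return 1;
}
```

In print_message this would replace the two strcat calls, with message declared as char message[8196] = ""; so that the first strlen sees an empty string.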
What you are seeing is the growing of the senderlist or likely garbage in message. Fortunately not exceeding 8196.
The message array must start with the empty string. At the moment doing a strcat adds to garbage.
char message[8196];
sender *m = senderlist;
int len = 0;
*message = '\0';
while(m) {
/* note: stricmp() is a case-insensitive version of strcmp() */
if(stricmp(m->sender,str)==0) {
int sender_len = strlen(m->sender);
if (len + sender_len + 2 + 1 < sizeof(message)) {
strcpy(message + len, m->sender);
len += sender_len;
strcpy(message + len, ", ");
len += 2;
} else {
// Maybe appending "..." instead (+ 3 + 1 < ...).
break;
}
}
m = m->next;
}
printf("strlen: %zu",strlen(message));
printf("Message: %s\n",message);
"Deallocation" is not the same as wiping the data; in fact, C generally leaves the data unerased for performance reasons.

Return the contiguous block in c

I create an array (char *charheap;) of length 32 bytes in the heap, and initialize all the elements to be \0. Here is my main function:
int main(void) {
char *str1 = alloc_and_print(5, "hello");
char *str2 = alloc_and_print(5, "brian");
}
char *alloc_and_print(int s, const char *cpy) {
char *ncb = char_alloc(s);// allocate the next contiguous block
if (ncb == NULL) {
printf("Failed\n");
} else {
strcpy(ncb, cpy);
arr_print();// print the array
}
return ncb;
}
Here is what I implement:
/* char_alloc(s): find the FIRST contiguous block of s+1 NULL ('\0')
characters in charheap that does not contain the NULL terminator
of some previously allocated string. */
char *char_alloc(int s) {
int len = strlen(charheap);
for (int i = 0; i < len; i++) {
if (charheap[0] == '\0') {
char a = charheap[0];
return &a;
} else if (charheap[i] == '\0') {
char b = charheap[i+1];
return &b;
}
}
return NULL;
}
Expected Output: (\ means \0)
hello\\\\\\\\\\\\\\\\\\\\\\\\\\\
hello\brian\\\\\\\\\\\\\\\\\\\\\
This solution is completely wrong and I just print out two failed. :(
Actually, the char_alloc should return a pointer to the start of contiguous block but I don't know how to implement it properly. Can someone give me a hint or clue ?
Your function is returning a pointer to a local variable, therefore the caller receives a pointer to invalid memory. Just return the pointer into the charheap, which is what you want.
return &charheap[0]; /* was return &a; which is wrong */
return &charheap[i+1]; /* was return &b; which is wrong */
Your for loop uses i < len for the terminating condition, but, since charheap is \0 filled, strlen() will return a size of 0. You want to iterate through the whole charheap, so just use the size of that array (32 in this case).
int len = 32; /* or sizeof(charheap) if it is declared as an array */
The above two fixes should be enough to get your program to behave as you expect (see demonstration).
However, you do not check that there is enough room in your heap to accept the allocation. Your allocation should fail if the distance between the start of the available memory and the end of the charheap is less than or equal to the desired size. You can enforce this easily enough by setting len to be the last point you are willing to check before you know there will not be enough space.
int len = 32 - s;
Finally, when you try to allocate a third string, your loop will skip over the first allocated string, but will overwrite the second allocated string. Your loop logic needs to change to skip over each allocated string. You first check if the current location in your charheap is free or not. If it is not, you advance your position by the length of the string, plus one more to skip over the '\0' terminator for the string. If the current location is free, you return it. If you are not able to find a free location, you return NULL.
char *char_alloc(int s) {
int i = 0;
int len = 32 - s;
while (i < len) {
if (charheap[i] == '\0') return &charheap[i];
i += strlen(charheap+i) + 1;
}
return NULL;
}
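Put together as one unit, the final version could look like the sketch below. The HEAP_SIZE macro and the static array are assumptions made to keep the example self-contained (the question allocates charheap on the heap instead), and the scheme still relies on every earlier allocation holding a non-empty string.

```c
#include <assert.h>
#include <string.h>

#define HEAP_SIZE 32

static char charheap[HEAP_SIZE];   /* static storage: starts zero-filled */

/* Returns the first free position with room for s characters plus the
   terminator, or NULL if there is none. Assumes every earlier allocation
   already holds a non-empty string. */
char *char_alloc(int s)
{
    assert(s > 0);

    int i = 0;
    int limit = HEAP_SIZE - s;     /* the last usable start is limit - 1 */
    while (i < limit) {
        if (charheap[i] == '\0')
            return &charheap[i];
        i += (int)strlen(charheap + i) + 1;   /* skip the string and its NUL */
    }
    return NULL;
}
```

After copying "hello" and "brian" into two 5-character allocations, a third call such as char_alloc(3) should return the position just after the terminator of "brian", and a request that no longer fits should return NULL.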

Having trouble reading strings from stdin

I need to create program that takes input from stdin in this format:
abcde //number of characters in word = number of words => square shape
fghij
klmno
pqrst
uvwxy
// \n separates first half from second
word1word //any amount of characters, any amount of words
word
word2
sdf
// \n to end input
My code works, but only about 50% of the time. I have couple of example inputs, that I use for testing, but for some of them my readwords function fails.
Here is my function, that reads words. Since I have no idea how many words or how long they are going to be, I use dynamic arrays and getchar() function.
void readWords(char **p,int *n,int w) /* before calling: n = 50; w = 20; p = 50x20 char array */
{
int i = 0,j = 0,x;
char tmp,prevtmp;
while (1)
{
prevtmp = tmp;
tmp = getchar();
if ((prevtmp == '\n' && tmp == '\n') || feof(stdin))
break; /* no more words to read */
if (tmp == '\n') /* end of word */
{
p[i][j] = '\0'; /* add \0 to create string format */
i++;
j = 0;
if (i == *n) /* if there is more words than there is space for them, double the size */
if (realloc(p,*n*2) != NULL)
*n*=2;
continue;
}
p[i][j] = tmp;
j++;
if (j == w) /* if width of word is larger than allocated space, double it */
{
for (x = 0; x < *n;x++);
if(realloc (p[x],w*2) != NULL);
w=w*2;
}
}
*n = i;
}
This is example of input for which this works (note:this function only reads second half after line with only \n):
dsjellivhsanxrr
riemjudhgdffcfz
<skipping>
atnaltapsllcelo
ryedunuhyxhedfy
atlanta
saltlakecity
<skipping 15 words>
hartford
jeffersoncity
And this is input that my function doesn't read properly:
<skipping>
...oywdz.ykasm.pkfwb.zazqy...
....ynu...ftk...zlb...akn....
missouri
delaware
<skipping>
minnesota
southdakota
What my function reads from this input:
e
yoming
xas
florida
lvania
ana
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
There is no difference between those two inputs (except different words and different amount and length of words), the first half gets read properly no matter what, but only the second half bugs out. How do I fix this?
P.S. sorry for long post, in case you want to see full input without skipped bytes, here is pastebin: http://pastebin.com/hBGn2tej
realloc() returns the address of the newly allocated memory, it does not update the argument passed into it. So this (and the other use of realloc()) is incorrect:
if (realloc(p,*n*2) != NULL)
and will result in the code accessing memory incorrectly, causing undefined behaviour. Store the result of realloc() in a temporary variable and check that it is non-NULL before updating p. The size argument to realloc() is also a number of bytes, not a number of elements, so the calculation is incorrect: p is an array of char*, so it should be realloc(p, sizeof(char*) * (*n * 2));. However, the change to p would not be visible to the caller. Also note that the only legal arguments to realloc() are NULL and pointers obtained from a previous call to malloc(), realloc() or calloc(). The comment p = 50x20 char array in the code suggests this is not the case.
Here is a small example that allocates an array of char* which should be helpful:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void f(char*** p)
{
/* Allocate space for two 'char*' elements.
Add a NULL pointer element as sentinel value
so caller knows where to find end of list. */
*p = malloc(sizeof(**p) * 3);
/* Allocate space for the two strings
and populate. */
(*p)[0] = malloc(10);
(*p)[1] = malloc(10);
strcpy((*p)[0], "hello");
strcpy((*p)[1], "world");
(*p)[2] = NULL;
/* Add a third string. */
char** tmp = realloc(*p, sizeof(**p) * 4);
if (tmp)
{
*p = tmp;
(*p)[2] = malloc(10);
strcpy((*p)[2], "again");
(*p)[3] = NULL;
}
}
int main()
{
char** word_list = 0;
f(&word_list);
if (word_list)
{
for (int i = 0; word_list[i]; i++)
{
printf("%s\n", word_list[i]);
free(word_list[i]);
}
}
free(word_list);
return 0;
}
Additionally:
prevtmp has an unknown value upon its first use.
getchar() actually returns an int and not a char.
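The two points above, together with the realloc() advice, can be combined in a small line reader. This is a sketch only: read_line is a hypothetical name, it reads from any FILE* (pass stdin for the case in the question), and it returns an empty string for an empty line so the caller can detect the blank separator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from `in` into a growing buffer.
   Returns a malloc'd string without the newline (caller frees), or NULL on
   end of input with nothing read, or on allocation failure. */
char *read_line(FILE *in)
{
    assert(in != NULL);

    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                               /* int, so EOF is distinguishable */
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {           /* the old block is still valid */
                free(buf);
                return NULL;
            }
            buf = tmp;                   /* only now replace the pointer */
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Because c is an int and is initialized by the first fgetc call, there is no uninitialized "previous character" to compare against, and the buffer pointer is only replaced after realloc() has succeeded.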

Appending a char to a char* in C?

I'm trying to make a quick function that gets a word/argument in a string by its number:
char* arg(char* S, int Num) {
char* Return = "";
int Spaces = 0;
int i = 0;
for (i; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
//Want to append S[i] to Return here.
}
else if (Spaces > Num) {
return Return;
}
}
printf("%s-\n", Return);
return Return;
}
I can't find a way to put the characters into Return. I have found lots of posts that suggest strcat() or tricks with pointers, but every one segfaults. I've also seen people saying that malloc() should be used, but I'm not sure of how I'd used it in a loop like this.
I will not claim to understand what it is that you're trying to do, but your code has two problems:
You're assigning a read-only string to Return; that string will be in your binary's data section, which is read-only, and if you try to modify it you will get a segfault.
Your for loop is O(n^2), because strlen() is O(n)
There are several different ways of solving the "how to return a string" problem. You can, for example:
Use malloc() / calloc() to allocate a new string, as has been suggested
Use asprintf(), which is similar but gives you formatting if you need
Pass an output string (and its maximum size) as a parameter to the function
The first two require the calling function to free() the returned value. The third allows the caller to decide how to allocate the string (stack or heap), but requires some sort of contract about the minimum size needed for the output string.
In your code, Return points to the string literal "", so any attempt to append to it writes to memory you must not modify, and the behavior is undefined. It might appear to work, but you should never rely on it.
Typically in C, you'd want to pass the "return" string as an argument instead, so that you don't have to free it all the time. Both require a local variable on the caller's side, but malloc'ing it will require an additional call to free the allocated memory and is also more expensive than simply passing a pointer to a local variable.
As for appending to the string, just use array notation (keep track of the current char/index) and don't forget to add a null character at the end.
Example:
int arg(char* ptr, char* S, int Num) {
int i, Spaces = 0, cur = 0;
for (i=0; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
ptr[cur++] = S[i]; // append char
}
else if (Spaces > Num) {
ptr[cur] = '\0'; // insert null char
return 0; // returns 0 on success
}
}
ptr[cur] = '\0'; // insert null char
return (cur > 0 ? 0 : -1); // returns 0 on success, -1 on error
}
Then invoke it like so:
char myArg[50];
if (arg(myArg, "this is an example", 3) == 0) {
printf("arg is %s\n", myArg);
} else {
// arg not found
}
Just make sure you don't overflow ptr (e.g.: by passing its size and adding a check in the function).
There are numbers of ways you could improve your code, but let's just start by making it meet the standard. ;-)
P.S.: Don't malloc unless you need to. And in this case you don't.
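The size check mentioned above could look roughly like this. It is a sketch under a few assumptions: arg_bounded and out_size are made-up names, words are separated by single spaces, and a word that does not fit is treated as an error rather than truncated.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: copies word number `num` (0-based, words separated by single
   spaces) from `s` into `out`, which holds `out_size` bytes.
   Returns 0 on success, -1 if the word is missing or does not fit. */
int arg_bounded(char *out, size_t out_size, const char *s, int num)
{
    assert(out != NULL && s != NULL && out_size > 0);

    int spaces = 0;
    size_t cur = 0;
    /* Testing s[i] directly avoids calling strlen() on every iteration. */
    for (size_t i = 0; s[i] != '\0' && spaces <= num; i++) {
        if (s[i] == ' ') {
            spaces++;
        } else if (spaces == num) {
            if (cur + 1 >= out_size) {   /* no room for char + terminator */
                out[0] = '\0';
                return -1;
            }
            out[cur++] = s[i];           /* append char */
        }
    }
    out[cur] = '\0';
    return cur > 0 ? 0 : -1;
}
```

The caller passes sizeof on its local array, for example arg_bounded(myArg, sizeof myArg, "this is an example", 3).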
char * Return; //by the way horrible name for a variable.
Return = malloc(<some size>);
......
......
*(Return + index) = *(S+i);
You can't assign anything to a string literal such as "".
You may want to use your loop to determine the offset of the start of the word you're looking for in your string. Then find its length by continuing through the string until you encounter the end or another space. Then you can malloc an array of chars with size equal to that length + 1 (for the null terminator). Finally, copy the substring into this new buffer and return it.
Also, as mentioned above, you may want to remove the strlen call from the loop condition. A compiler may be able to hoist it out, but otherwise it is a linear operation performed for every character in the array, making the loop O(n**2).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *arg(const char *S, unsigned int Num) {
char *Return;
const char *top, *p;
unsigned int Spaces = 0;
Return=(char*)malloc(sizeof(char)); /* start with an empty string */
if(Return == NULL) return NULL;
*Return = '\0';
if(S == NULL || *S=='\0') return Return;
p=top=S;
while(Spaces != Num){
if(NULL!=(p=strchr(top, ' '))){
++Spaces;
top=++p;
} else {
break;
}
}
if(Spaces < Num) return Return;
if(NULL!=(p=strchr(top, ' '))){
int len = p - top;
Return=(char*)realloc(Return, sizeof(char)*(len+1));
strncpy(Return, top, len);
Return[len]='\0';
} else {
free(Return);
Return=strdup(top);
}
//printf("%s-\n", Return);
return Return;
}
int main(){
char *word;
word=arg("make a quick function", 2);//quick
printf("\"%s\"\n", word);
free(word);
return 0;
}

Resources