So I have a buffer (array):
char *buf;
buf = malloc(1024);
The buffer looks like "foo\0bar\0foo\0bar\0\0\0\0\0\0\0..."
It contains strings separated by the null terminator. I need to separate every string. I tried using strtok() with \0 as the delimiter, but of course it didn't work. How can I achieve that? Afterwards, each string also needs to be "copied" somewhere else.
You can walk through the array and copy the characters into another array/struct (depending on what that "somewhere else" needs to be), treating each \0 as the end of one string and the start of the next.
Since what you have is not actually a string but a character array that may contain nulls, you can use the memchr function to search for nulls in the array. Then you can use strncpy or strcpy to copy out the individual strings.
char *p = buf;
char *list[1024];
int cnt = 0;
while (p && p < buf + 1024) { /* stop at the end of buf, so list never gets more than 1024 entries */
char *n = memchr(p, 0, 1024 - (p-buf));
if (n) {
list[cnt++] = strdup(p);
} else {
int size = 1024 - (p-buf);
list[cnt] = malloc(size + 1);
strncpy(list[cnt], p, size);
list[cnt++][size] = 0;
}
p = n;
if (p) p++;
}
We start by setting p to the beginning of buf. Then on each iteration, we use memchr to look for the next null byte between p and the end of the array. If we find one, we can treat p as a string and use strdup to allocate space for and duplicate the string. If we don't find a null, we copy the remaining bytes to a newly allocated buffer and manually add a null byte.
Note that you'll need to know how large your buffer is so that you don't read past the end of it.
EDIT:
There was an issue with the code as originally written. After one iteration, p was pointing to a null byte, so memchr would keep returning a pointer to that byte. I added an increment past that byte at the end of the loop so it isn't checked again.
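For what it's worth, here is a minimal sketch of the same memchr idea wrapped in a function. The name split_nul_buffer and its parameters are made up for illustration. It also assumes that an empty string marks the end of the data (which is what the run of trailing \0 bytes in the question's buffer suggests), so the padding is not returned as a pile of empty strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies each NUL-terminated string found in
 * buf[0..size) into out[]. Stops at the first empty string (assumed to
 * mark the end of the data), at the end of the buffer, or when max_out
 * entries are filled. Returns the number of strings stored in out[];
 * each one must be released with free(). */
size_t split_nul_buffer(const char *buf, size_t size,
                        char **out, size_t max_out)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < size && count < max_out) {
        /* Look for the terminator without reading past the buffer. */
        const char *end = memchr(buf + pos, '\0', size - pos);
        size_t len = end ? (size_t)(end - (buf + pos)) : size - pos;

        if (len == 0)
            break;                  /* empty string: end of data */

        char *copy = malloc(len + 1);
        if (!copy)
            break;                  /* out of memory: return what we have */
        memcpy(copy, buf + pos, len);
        copy[len] = '\0';           /* terminate even if buf had no NUL here */
        out[count++] = copy;

        pos += len + 1;             /* skip the string and its terminator */
    }
    return count;
}
```

You would call it as split_nul_buffer(buf, 1024, list, 1024) and free each list[i] when you are done with it.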
I have a char array and I am trying to copy part of it (tokenization) into index 0 of an array of pointers to char using the strncpy function. But at runtime a segmentation fault occurs.
Code example:
char array[30] = "ls -l";
char* args[10];
strncpy(args[0], array + 0, 2);
char *args[10] has the following declaration:
declare args as array 10 of pointer to char
That is to say, we have an array of uninitialized pointers. We'll need to make those pointers point somewhere first, before trying to place characters there. Remembering that we must NUL terminate ('\0') C strings, we can simultaneously allocate and NUL out space for our string by using calloc.
This will make space for just 'l', 's', and our mandatory '\0'.
char original_command[30] = "ls -l";
char *args[10] = { 0 };
args[0] = calloc(3, sizeof (char));
strncpy(args[0], original_command, 2);
Alternatively, we can use malloc, but we must remember the NUL terminating byte.
args[0] = malloc(3);
strncpy(args[0], original_command, 2);
args[0][2] = '\0';
It's generally a good idea to always initialize our variables - see how we initialize our args array to be full of NULL pointers (0). Makes it very clear they don't point anywhere useful yet.
Also note that strncpy does not place a NUL terminating byte if it was not found in the first n bytes of our source string. This is why it's very important to manually terminate our destination string.
Additionally, any call to an *alloc function must be matched later by a call to free, when we are finished using that memory.
/* Do whatever needs to be done */
/* ... */
free(args[0]);
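The points above (allocate, copy, terminate manually, free later) can be folded into one small function. This is only a sketch under my own assumptions: copy_prefix is a made-up name, and it copies the bytes by hand instead of calling strncpy so that the terminator is always written in one obvious place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of
 * at most n characters of src, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *copy_prefix(const char *src, size_t n)
{
    /* Length of the prefix: stop at n or at src's terminator. */
    size_t len = 0;
    while (len < n && src[len] != '\0')
        len++;

    char *dst = malloc(len + 1);    /* +1 for the terminating NUL */
    if (!dst)
        return NULL;

    memcpy(dst, src, len);
    dst[len] = '\0';                /* always terminate by hand */
    return dst;
}
```

With the arrays from the question, args[0] = copy_prefix(original_command, 2); would give "ls", and free(args[0]); releases it afterwards.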
You need to allocate space for the copied string content; char* args[10] reserves only space for holding the pointer to the content, not for the content itself. And don't forget to reserve space for the string terminating character '\0' then.
args[0] = malloc(2+1);
strncpy(args[0],array+0,2+1);
args[0][2] = '\0';
I have the following code in C now
int length = 50;
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere(); // read a string from somewhere
// with length, say 20
memcpy(target_str, source_str, length);
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy source_str to target_str, I use memcpy() as above with length 50, which is the size of target_str. The reason I use length in memcpy is that source_str can be at most length bytes long but is usually shorter (in the above example it's 20).
Now, if I want to copy only up to the end of source_str, based on its terminating character ('\0'), even if the memcpy length is more than the index of the terminating character, is the above code the right way to do it? Or is there an alternative?
Thanks for any help.
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy source_str to target_str, I use memcpy() as above with length 50, which is the size of target_str.
Currently you ask memcpy to read about 30 bytes past the end of the source string, because memcpy does not care about a null terminator in the source. Unless the source array really is at least 50 bytes long, this is undefined behavior.
Because you are copying a string, you can use strcpy rather than memcpy.
But the size problem can also be reversed: the target can be smaller than the source, and without protection you again have undefined behavior.
So you can use strncpy with the length of the target; just take care to add a final null character in case the source does not fit into the target:
int length = 50;
char *target_str = malloc(length);
char *source_str = read_string_from_somewhere(); // length unknown
strncpy(target_str, source_str, length - 1); // -1 to leave room for \0
target_str[length - 1] = 0; // force a null character at the end in case the source was truncated
If I want to copy source_str to target_str, I use memcpy() as above
with length 50, which is the size of target_str. The reason I use
length in memcpy is that source_str can be at most length bytes long
but is usually shorter (in the above example it's 20).
It is crucially important to distinguish between
the size of the array to which source_str points, and
the length of the string, if any, to which source_str points (+/- the terminator).
If source_str is certain to point to an array of length 50 or more then the memcpy() approach you present is ok. If not, then it produces undefined behavior when source_str in fact points to a shorter array. Any result within the power of your C implementation may occur.
If source_str is certain to point to a (properly-terminated) C string of no more than length - 1 characters, and if it is its string value that you want to copy, then strcpy() is more natural than memcpy(). It will copy all the string contents, up to and including the terminator. This presents no problem when source_str points to an array shorter than length, so long as it contains a string terminator.
If neither of those cases is certain to hold, then it's not clear what you want to do. The strncpy() function may cover some of those cases, but it does not cover all of them.
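If what you want is "copy the string, but never write more than the target can hold, and tell me whether it fit", one possible shape is sketched below. The name copy_bounded is hypothetical, and the sketch assumes src points to a properly terminated string; it does nothing to rescue an unterminated source.

```c
#include <string.h>

/* Hypothetical helper: copies the string src into dst, which has room
 * for dst_size bytes. The result is always NUL-terminated when
 * dst_size > 0. Returns 1 if the whole string fit, 0 if it was
 * truncated (or dst_size was 0). Assumes src is NUL-terminated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;

    size_t len = strlen(src);                        /* string length, not array size */
    size_t n = len < dst_size ? len : dst_size - 1;  /* leave room for the NUL */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < dst_size;
}
```

Unlike memcpy(target_str, source_str, length), this never reads past the source's terminator, and unlike a bare strncpy it always terminates the target.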
Now, if I want to copy only up to the end of source_str, based on its terminating character ('\0'), even if the memcpy length is more than the index of the terminating character, is the above code the right way to do it?
No; you'd be copying the entire content of source_str, even past the null-terminator if it occurs before the end of the allocated space for the string it is pointing to.
If your concern is minimizing the auxiliary space used by your program, what you could do is use strlen to determine the length of source_str, and allocate target_str based on that. Also, strcpy is similar to memcpy but is specifically intended for null-terminated strings (observe that it has no "size" or "length" parameter):
char *target_str = NULL;
char *source_str = read_string_from_somewhere();
size_t len = strlen(source_str);
target_str = malloc(len + 1);
strcpy(target_str, source_str);
// ...
free(target_str);
target_str = NULL;
memcpy is used to copy fixed-size blocks of memory, so if you want to copy something shorter that is terminated by '\0', you don't want to use memcpy.
There are other functions, like strncpy or strlcpy, that do similar things.
Best to check what the implementations do. I removed the optimized versions from the original source code for the sake of readability.
This is an example memcpy implementation: https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c
void *memcpy(void *restrict dest, const void *restrict src, size_t n)
{
unsigned char *d = dest;
const unsigned char *s = src;
for (; n; n--) *d++ = *s++;
return dest;
}
It's clear that here both pieces of memory are visited n times, regardless of the length of the source or destination string. If your string is shorter than n, memory past the end of it gets copied, which is bad and can cause various unwanted behavior.
this is strlcpy from: https://git.musl-libc.org/cgit/musl/tree/src/string/strlcpy.c
size_t strlcpy(char *d, const char *s, size_t n)
{
char *d0 = d;
if (!n--) goto finish;
for (; n && (*d=*s); n--, s++, d++);
*d = 0;
finish:
return d-d0 + strlen(s);
}
The trick here is that n && (*d = *s) evaluates to false as soon as the character just copied is the null terminator (or n reaches zero), which ends the loop early.
Hence this gives you the wanted behaviour.
Use strlen to determine the exact size of source_str and allocate accordingly, remembering to add an extra byte for the null terminator. Here's a full example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
const char *source_str = "string_read_from_somewhere";
size_t len = strlen(source_str);
char *target_str = malloc(len + 1);
if (!target_str) {
fprintf(stderr, "%s:%d: malloc failed\n", __FILE__, __LINE__);
return 1;
}
memcpy(target_str, source_str, len + 1);
puts(target_str);
free(target_str);
return 0;
}
Also, there's no need to cast the result of malloc. Don't forget to free the allocated memory.
As mentioned in the comments, you probably want to restrict the size of the malloced string to a sensible amount.
I need help with a char array. I want to create an array of length n and initialize its values, but after the malloc() call the array is longer than n*sizeof(char), and its content isn't only the chars I assign... There are a few random chars in the array and I don't know how to solve that... I need this part of the code for an exam project at school, and I have to finish by Sunday... Please help :P
#include<stdlib.h>
#include<stdio.h>
int main(){
char *text;
int n = 10;
int i;
if((text = (char*) malloc((n)*sizeof(char))) == NULL){
fprintf(stderr, "allocation error");
}
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
int test = strlen(text);
printf("\n%d\n", test);
puts(text);
free(text);
return 0;
}
Well, before using strcat, set
text[0]=0;
strcat expects a null-terminated char array for its first argument, too.
From standard 7.24.3.1
#include <string.h>
char *strcat(char * restrict s1,
const char * restrict s2);
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1.
How do you think strcat will know where the first string ends if you don't put a \0 in s1?
Also don't forget to allocate an extra byte for the \0 character. Otherwise you are writing past what you have allocated for. This is again undefined behavior.
And earlier you had undefined behavior.
Note:
You should check the return value of malloc to know whether the malloc invocation was successful or not.
Casting the return value of malloc is not needed. Conversion from void* to relevant pointer is done implicitly in this case.
strlen returns size_t, not int, so print it with %zu: printf("%zu", strlen(text));
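Putting those fixes together (one extra byte for the \0, an empty string before the first strcat, a checked malloc), a minimal sketch could look like this. The function name repeat_with_strcat is made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a string consisting of `times` copies of
 * `piece` using strcat. Returns NULL if allocation fails; the caller
 * must free() the result. */
char *repeat_with_strcat(const char *piece, size_t times)
{
    size_t piece_len = strlen(piece);

    /* Room for every copy plus one byte for the terminating '\0'. */
    char *text = malloc(piece_len * times + 1);
    if (!text)
        return NULL;

    text[0] = '\0';                 /* make it a valid empty string first */
    for (size_t i = 0; i < times; i++)
        strcat(text, piece);        /* now strcat can find the end of text */

    return text;
}
```

For the question's case, repeat_with_strcat("A", 10) yields a string for which strlen reports 10, and puts prints ten A characters with no stray bytes after them.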
To start with, your way of using malloc in
text = (char*) malloc((n)*sizeof(char));
is not ideal. You can change that to
text = malloc(n * sizeof *text); // Don't cast; using *text is straightforward and easy.
So the statement could be
if(NULL == (text = malloc(n * sizeof *text))){
fprintf(stderr, "allocation error");
}
But the actual problem lies in
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
The strcat documentation says
dest − This is pointer to the destination array, which should contain
a C string, and should be large enough to contain the concatenated
resulting string.
Just to point out why the above method is flawed: the C string "A" actually contains two characters, A and the terminating \0 (the null character). Each strcat call therefore writes the new A plus a terminator, so when i is n-1 the terminator lands one byte past the end of the n-byte allocation, which is an out-of-bounds access (buffer overrun). If you wanted to fill the entire text array with A, you could have done
for(i = 0; i < n; i++){
// Note: in n bytes you can store n-1 chars plus the terminating null
text[i] = (i < n-1) ? 'A' : '\0'; // the last slot (index n-1) holds the terminator
}
//Then print the null terminated string
printf("Filled string : %s\n",text); // You're all good :-)
Note: Use a tool like valgrind to find memory leaks & out of bound memory accesses.
I am trying to convert a string (example: "hey there mister") into a double pointer that points to every word in the sentence.
So: split_string->|pointer1|pointer2|pointer3| where pointer1->"hey", pointer2->"there" and pointer3->"mister".
char **split(char *s) {
char **nystreng = malloc(strlen(s));
char str[strlen(s)];
int i;
for(i = 0; i < strlen(s); i++){
str[i] = s[i];
}
char *temp;
temp = strtok(str, " ");
int teller = 0;
while(temp != NULL){
printf("%s\n", temp);
nystreng[teller] = temp;
temp = strtok(NULL, " ");
}
nystreng[teller++] = NULL;
//free(nystreng);
return nystreng;
}
My question is, why isn't this working?
Your code has multiple problems. Among them:
char **nystreng = malloc(strlen(s)); is just wrong. The amount of space you need is the size of a char * times the number of pieces into which the string will be split, plus one (for the NULL pointer terminator).
You fill *nystreng with pointers obtained from strtok() operating on local array str. Those pointers are valid only for the lifetime of str, which ends when the function returns.
You do not allocate space for a string terminator in str, and you do not write one, yet you pass it to strtok() as if it were a terminated string.
You do not increment teller inside your tokenization loop, so each token pointer overwrites the previous one.
You have an essential problem here in that you do not know before splitting the string how many pieces there will be. You could nevertheless get an upper bound on that by counting the number of delimiter characters and adding 1. You could then allocate space for that many char pointers plus one. Alternatively, you could build a linked list to handle the pieces as you tokenize, then allocate the result array only after you know how many pieces there are.
As for str, if you want to return pointers into it, as apparently you do, then it needs to be dynamically allocated, too. If your platform provides strdup() then you could just use
char *str = strdup(s);
Otherwise, you'll need to check the length, allocate enough space with malloc() (including space for the terminator), and copy the input string into the allocated space, presumably with strcpy(). Normally you would want to free the string afterward, but you must not do that if you are returning pointers into that space.
On the other hand, you might consider returning an array of strings that can be individually freed. For that, you must allocate each substring individually (strdup() would again be your friend if you have it), and in that event you would want to free the working space (or allow it to be cleaned up automatically if you use a VLA).
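A rough sketch of that last variant (individually allocated substrings, working copy freed before returning) might look like the following. The names split_words, free_words and dup_string are invented for this example, dup_string stands in for strdup() on platforms that lack it, and the space-count upper bound is the one described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): malloc'd copy of s, or NULL. */
static char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;    /* include the terminator */
    char *copy = malloc(size);
    if (copy)
        memcpy(copy, s, size);
    return copy;
}

/* Frees a NULL-terminated array returned by split_words(). */
void free_words(char **words)
{
    if (!words)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/* Splits s on spaces. Returns a NULL-terminated array of individually
 * allocated strings, or NULL on allocation failure. */
char **split_words(const char *s)
{
    /* Upper bound on the number of pieces: spaces + 1. */
    size_t max_pieces = 1;
    for (const char *p = s; *p != '\0'; p++)
        if (*p == ' ')
            max_pieces++;

    char **words = malloc((max_pieces + 1) * sizeof *words); /* +1 for NULL */
    char *work = dup_string(s);     /* strtok modifies its input */
    if (!words || !work) {
        free(words);
        free(work);
        return NULL;
    }

    size_t count = 0;
    for (char *tok = strtok(work, " "); tok != NULL; tok = strtok(NULL, " ")) {
        words[count] = dup_string(tok);
        if (!words[count]) {        /* words[count] is NULL: array is terminated */
            free(work);
            free_words(words);
            return NULL;
        }
        count++;
    }
    words[count] = NULL;            /* terminator for the pointer array */

    free(work);                     /* safe: words hold their own copies */
    return words;
}
```

The caller walks the result until it hits NULL and hands the whole thing to free_words() when finished.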
There are two things you need to do:
char str[strlen(s)]; //size should be equal to strlen(s)+1
The extra 1 is for '\0', which you also have to copy or write into str. Right now you pass str (not terminated with '\0') to strtok, which causes undefined behaviour.
Second, you also need to allocate memory for each pointer in nystreng and then use strcpy instead of pointing to temp (don't forget space for the null terminator).
Just brushing up on some C for a class and I've run across a little something that makes me scratch my head. For this code:
char * findString(const char * s){
/* Allocate space */
char * ret = malloc(strlen(s) + 1);
/* Copy characters */
char * n;
n = ret;
for ( ;*s != 0; s++)
if (isLetter(*s))
*n++ = *s;
*n = 0;
/* return pointer to beginning of string */
return ret;
}
(We're just assuming an isLetter that returns a 1/0).
The idea of the snippet is to take a string with a bunch of crap in it, and return a string that contains only the letters.
So, how does 'ret' work in this instance? I'm very confused by the returning of 'ret' when 'n = ret' is declared above the for loop and 'ret' never gets set to anything afterwards. Obviously I'm missing something here. Help!
-R. L.
Both ret and n are pointers into the same block of memory. Their 'values' are simply memory addresses: when you write through n, you change the memory that ret points to, even though ret itself keeps its original value.
//make n point to the beginning of the block of memory pointed
//to by ret
n = ret;
//iterate through the string which was passed to
//the function
for ( ;*s != 0; s++)
//if the current character is a letter:
if (isLetter(*s))
//set the character pointed to by n to
//the current character in the string, and then
//make n point to the next one.
*n++ = *s;
Note that the loop increments n, and then after the loop sets the last character to 0 (to null terminate the string). Now n points to the end of the string, but since ret was never changed it still points to the beginning of the memory that you malloced before the loop. When you return it, you're returning a pointer to the new string, which is the string you passed to the function minus all non-letters.
Note that after this function returns, it is the caller's responsibility to free() the memory allocated by the function, lest ye roam into memory leaks.
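To make the "ret stays put, n walks" point concrete, here is a small variation on the function from the question. It is a sketch, not the original code: letters_only and its out_len parameter are made up, and the standard isalpha stands in for the question's isLetter. The distance n - ret at the end is exactly the number of letters written.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of findString: returns a malloc'd string holding
 * only the letters of s, and optionally reports its length via out_len.
 * The caller must free() the result. */
char *letters_only(const char *s, size_t *out_len)
{
    char *ret = malloc(strlen(s) + 1);  /* ret: fixed start of the block */
    if (!ret)
        return NULL;

    char *n = ret;                      /* n: moving write position */
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s)) /* isalpha assumed in place of isLetter */
            *n++ = *s;                  /* write through n, then advance n */
    *n = '\0';

    if (out_len)
        *out_len = (size_t)(n - ret);   /* how far n moved away from ret */
    return ret;                         /* still the beginning of the string */
}
```

Calling it on "a1b2c3" gives back "abc" with a reported length of 3, and the caller frees the pointer that was returned, which is ret's value, not n's.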
ret, while semantically being a string, is actually a pointer to the first character of the string. n is used as a pointer to the current position in that string. So ret stays pointing to the start of the string, while n moves along the string as it is filled in. The *n = 0 then adds the null terminator. Thus, while ret itself doesn't get set to anything, the contents of the string it points to are set.
n and ret are pointers, which means they contain addresses. In this case they both contain the address of the same character buffer that's being allocated with malloc. In this sense, n and ret are interchangeable.
Not much to it, really: line 3 allocates a buffer, ret, that is long enough to hold the argument even if it is all letters. That buffer will eventually be returned. The function then iterates through the argument and, if a character is a letter, puts it into the return string by way of an intermediate pointer, n, which keeps track of the current position in the returned string.
The ret is a pointer to the beginning of the string to be returned. You need to create another pointer, n, because this pointer will not always point to the beginning of the string to be returned, it will walk on it, changing its characters. And, to be able to return a string, you must return a pointer to the beginning of the string, and you need to know where it ends (that's why you need the 0 added to the end).
Hope I helped!