i have strings like "folder1/file1.txt" or "foldername1/hello.txt" and i need to take the substring that identify the folder name with the slash (/) included (example: from "folder1/file1.txt" i need "folder1/"). The folders name are not all with the same length.
How can i do this in C?? thanks
First, find the position of the slash with strchr:
char * f = "folder/foo.txt"; // or whatever
char * pos = strchr( f, '/' );
then copy into a suitable place:
char path[1000]; // or whatever
strncpy( path, f, (pos - f) + 1 );
path[(pos-f)+1] = 0; // null terminate
You should really write a function to do this, and you need to decide what to do if strchr() returns NULL, indicating the slash is not there.
Find the last '/' character, advance one character then truncate the string. Assuming that the string is modifiable, and is pointed to by char *filename;:
char *p;
p = strrchr(filename, '/');
if (p)
{
p[1] = '\0';
}
/* filename now points to just the path */
You could use the strstr() function:
char *s = "folder1/file1.txt";
char folder[100];
char *p = strstr(s, "/");
if (0 != p)
{
int len = p - s + 1;
strncpy(folder, s, len);
folder[len] = '\0';
puts(folder);
}
If you work on a POSIX machine, you can look up the dirname() function, which does more or less what you want. There are some special cases that it deals with that naïve code does not - but beware the weasel words in the standard too (about modifying the input string and maybe returning a pointer to the input string or maybe a pointer to static memory that can be modified later).
It (dirname()) does not keep the trailing slash that you say you require. However, that is seldom a practical problem; adding a slash between a directory name and the file is not hard.
Related
Hello I've come upon a problem. Im not very experienced in C.
I am trying to concatenate one char to my path variable.
But when I am running this line of code my other string variable gets "overriden" or behaves weird afterwards. When commented out everything works normally. I don't want to post the whole code here inseat I am just curios if this single line is somehow unsafe to run.
strcat(path, "/");
I also tried:
//edit i actually tried strcat but later strncat copied the line while reversing the changes//
char temp = '/';
strncat(path, &temp);
I am stuck wayyy to long on this so maybe someone can help.
For starters the function strncat has three parameters
char *strncat(char * restrict s1, const char * restrict s2, size_t n);
So this call
strncat(path, "/");
will not compile.
Apart from this error this code snippet
char temp = '/';
strncat(path, &temp);
has one more error that is the expression &temp does not point to a string.
You can append a character to a string only if the array containing the string has enough space to accommodate one more character. For example you may not change a string literal.
If the array containing the string has enough memory to accommodate the character '/' then you can write
strcat( path, "/" );
or
size_t n = strlen( path );
path[n++] = '/';
path[n] = '\0';
Or as #Barmar correctly pointed in his comment to the answer you could use strncat the following way
char temp = '/';
strncat(path, &temp, 1);
I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. e.g when tokenizing the following string "a,b,c,,,d,e" I need to know about the two empty slots between 'd' and 'e'... which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
In the above quote we can read you cannot use strtok as a solution to your specific problem, since it will treat any sequential characters found in delims as a single token.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
int current_token= 0;
int token_length;
for (i = 0; i < num_of_fields; i++, token_length = strcspn(line + current_token,delim))
{
if(token_length)
sprintf(arr_fields[i], "%.*s", token_length, line + current_token);
else
sprintf(arr_fields[i], "%s", "-");
current_token += token_length;
}
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find out the locations of the , symbols. Tokenize manually your string up to the token you found (using memcpy or strncpy) and then use again strchr. You will be able to see if two or more commas are next to each other this way (strchr will return numbers that their subtraction will equal 1) and you can write an if statement to handle that case.
I have a function that takes as its input a string containing a hyperlink and is attempting to output that same hyperlink except that if it contains a question mark, that character and any characters that follow it are purged.
First, I open a text file and read in a line containing a link and only a link like so:
FILE * ifp = fopen(raw_links,"r");
char link_to_filter[200];
if(ifp == NULL)
{
printf("Could not open %s for writing\n", raw_links);
exit(0);
}
while(fscanf(ifp,"%s", link_to_filter) == 1)
{
add_link(a,link_to_filter, link_under_wget);
};
fclose(ifp);
Part of what add_link does is strip the unnecessary parts of the link after a question mark (like with xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/online-giving/step1.php?new=1) which are causing an issue with my calls to wget. It does this by feeding link_to_filter through this function remove_extra, seen below.
char * remove_extra(char * url)
{
char * test = url;
int total_size;
test = strchr(url,'?');
if (test != NULL)
{
total_size = test-url;
url[total_size] = '\0';
}
return url;
}
at the end of remove_extra, upon returning from remove_extra and immediately prior to using strcpy a call to printf like so
printf("%s",url);
will print out what I expect to see (e.g. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/online-giving/step1.php without the '?' or trailing characters), but immediately after this block of code runs
struct node * temp = (struct node *)malloc(sizeof(struct node));
char * new_link = remove_extra(url);
temp->hyperlink =(char *)malloc(strlen(new_link) * sizeof(char));
strncpy(temp->hyperlink, new_link, strlen(new_link));
the result of a printf on member hyperlink occasionally has a single, junk character at the end (sometimes 'A' or 'Q' or '!', but always the same character corresponding to the same string). If this were happening with every link or with specific types of links, I could figure something out,
but it's only for maybe every 20th link and it happens to links both short and long.
e.g.
xxxxxxxxxxxxxxxxxxxx/hr/ --> xxxxxxxxxxxxxxxxxxxx/hr/!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/windows-to-the-past/ --> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/windows-to-the-past/Q
This is happening both with strcpy and homemade string copy loops so I'm inclined to believe that it's not strcpy's fault, but I can't think of where else the error would lie.
If you compute len as the length of src, then the naive use of strncpy -- tempting though it might be -- is incorrect:
size_t len = strlen(src);
dest = malloc(len); /* DON'T DO THIS */
strncpy(dest, src, len); /* DON'T DO THIS, EITHER */
strncpy copies exactly len bytes; it does not guarantee to put a NUL byte at the end. Since len is precisely the length of src, there is no NUL byte within the first len bytes of src, and no NUL byte will be inserted by strncpy.
If you had used strcpy instead (the supposedly "unsafe" interface):
strcpy(dest, src);
it would have been fine, except for the fact that dest is not big enough. What you really need to do is this:
dest = malloc(strlen(src) + 1); /* Note: include space for the NUL */
strcpy(dest, src);
or, if you have the useful strdup function:
dest = strdup(src);
Mostly likely You forgot to copy a null terminator at the end of string, remember that strlen(str) gives the number of visible characters, You also need '\0' at the end.
You need to do
temp->hyperlink =(char *)malloc(strlen(new_link)+1);//char is one byte, sizeof(char)=1
and
strcpy(temp->hyperlink, new_link); Should work fine.
Why not use strdup?
I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. e.g when tokenizing the following string "a,b,c,,,d,e" I need to know about the two empty slots between 'd' and 'e'... which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
In the above quote we can read you cannot use strtok as a solution to your specific problem, since it will treat any sequential characters found in delims as a single token.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
int current_token= 0;
int token_length;
for (i = 0; i < num_of_fields; i++, token_length = strcspn(line + current_token,delim))
{
if(token_length)
sprintf(arr_fields[i], "%.*s", token_length, line + current_token);
else
sprintf(arr_fields[i], "%s", "-");
current_token += token_length;
}
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find out the locations of the , symbols. Tokenize manually your string up to the token you found (using memcpy or strncpy) and then use again strchr. You will be able to see if two or more commas are next to each other this way (strchr will return numbers that their subtraction will equal 1) and you can write an if statement to handle that case.
My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects
a C string as argument for str, whose
first character is used as the
starting location to scan for tokens.
In subsequent calls, the function
expects a null pointer and uses the
position right after the end of last
token as the new starting location for
scanning.
To determine the beginning and the end
of a token, the function first scans
from the starting location for the
first character not contained in
delimiters (which becomes the
beginning of the token). And then
scans starting from this beginning of
the token for the first character
contained in delimiters, which becomes
the end of the token.
What this say is that it will skip any '|' characters at the beginning of a token. Making 5523 the 5th token, which you already knew. Just thought I would explain why (I had to look it up myself). This also says that you will not get any empty tokens.
Since your data is setup this way you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
threadsafe
strictly ANSI conform
needs an unused help-pointer from calling
context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
your work :)
implement char *c as parameter 3
Look into using strsep instead: strsep reference
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokeninzing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter answer I made this function, it is supposed to be thread safe and support empty tokens and doesn't change the original string
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
you can use it like
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
printf("%s\n", tok);
}
This suppose to output
1
2
3
5
I tested it for simple strings but didn't use it in production yet, and posted it on code review too, so you can see what do others think about it
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
//Save string after the first instance of p2, as strAccInfo - for looping.
strcpy(strAccInfo,p2+1);
//Start conditional loop for checking for last value in the string.
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}