Comma Delimiting C - c

I need my program to take a series of file names (stored in a single "String" and separated by commas) and act on them.
The pseudocode would be:
for each filename in some_string
open filename
operate on contents of filename
close filename
The issue is that I'm stuck separating some_string ("filename1,filename2,...,filenamen") into [filename 1], [filename 2], ... [filename n].
Edit: to clarify, it seems simpler to keep some_string intact and extract each file name as needed, which is what I'm attempting to do.
My code, as it stands, is pretty clunky (and quite disgusting...)
int j = 0;
char *tempS = strdup(filenames);
while (strchr(tempS, ',')) {
char *ptr = strchr(tempS, ',');
*ptr++ = '.';
numFiles++;
}
for (; j < numFiles; j++) {
char *ptr = strchr(tempS, ',');
//don't know where to go from here...
fin = openFile(tempS);
if (fin != NULL) {
//do something
}
fclose(fin);
}
It's not done, obviously. I correctly find the number of files, but I'm a little lost when it comes to figuring out how to separate one at a time from the source string and operate on it.

You can use strtok for this:
char *fname = strtok(tempS, ",");
while (fname != NULL) {
/* process filename */
fname = strtok(NULL, ",");
}
strtok delivers the strings separated by comma, one by one.

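The usual way to split a string in C is the strtok() function from the C standard library.
#include <string.h>
...
char *token;
char line[] = "string1,string2,string3"; /* writable array: strtok modifies its input, so a string literal must not be passed */
char *search = ",";
token = strtok(line, search); /* first call: pass the string; returns "string1" */
token = strtok(NULL, search); /* later calls: pass NULL; returns "string2" */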

strtok() is not thread-safe. If that matters to you, use strtok_r() instead. For example:
char *savedptr = NULL; /* to be passed back to strtok_r in follow-on calls */
char *tempS = strdup( some_string ); /* to keep your original intact */
char *fname = strtok_r(tempS, ",", &savedptr);
while (fname != NULL) {
    /* process fname ... */
    fname = strtok_r(NULL, ",", &savedptr); /* pass savedptr back to strtok_r */
}
free(tempS); /* release the copy once every name has been processed */

Related

Taking only a part of a text to an array [duplicate]

How to split a string into tokens and then save them in an array?
Specifically, I have a string "abc/qwe/jkh". I want to split it on "/", and then save the tokens into an array.
Output will be such that
array[0] = "abc"
array[1] = "qwe"
array[2] = "jkh"
please help me
#include <stdio.h>
#include <string.h>
int main ()
{
char buf[] ="abc/qwe/ccd";
int i = 0;
char *p = strtok (buf, "/");
char *array[3];
while (p != NULL)
{
array[i++] = p;
p = strtok (NULL, "/");
}
for (i = 0; i < 3; ++i)
printf("%s\n", array[i]);
return 0;
}
You can use strtok()
char string[] = "abc/qwe/jkh";
char *array[10];
int i = 0;
array[i] = strtok(string, "/");
while(array[i] != NULL)
array[++i] = strtok(NULL, "/");
Why strtok() is a bad idea
Do not use strtok() in normal code. strtok() keeps its state in static variables, which causes problems. There are some use cases on embedded microcontrollers where static variables make sense, but avoid them in most other cases. strtok() behaves unexpectedly when more than one thread uses it, when it is used in an interrupt handler, or in any other circumstance where more than one input is processed between successive calls to strtok().
Consider this example:
#include <stdio.h>
#include <string.h>
//Splits the input by the / character and prints the content in between
//the / character. The input string will be changed
void printContent(char *input)
{
char *p = strtok(input, "/");
while(p)
{
printf("%s, ",p);
p = strtok(NULL, "/");
}
}
int main(void)
{
char buffer[] = "abc/def/ghi:ABC/DEF/GHI";
char *p = strtok(buffer, ":");
while(p)
{
printContent(p);
puts(""); //print newline
p = strtok(NULL, ":");
}
return 0;
}
You may expect the output:
abc, def, ghi,
ABC, DEF, GHI,
But you will get
abc, def, ghi,
This is because the call to strtok() in printContent() resets the internal state of strtok() that was set up in main(). After printContent() returns, that state points at the end of the first group, so the next call to strtok() in main() returns NULL.
What you should do instead
On a POSIX system you can use strtok_r(); this version does not need static variables. If your library does not provide strtok_r(), you can write your own version of it. This should not be hard, and Stack Overflow is not a coding service, so you can write it on your own.

Segfault resulting from strdup and strtok

I've been assigned a homework from my college professor and I seem to have found some strange behavior of strtok
Basically, we have to parse a CSV file for my class, where the number of tokens in the CSV is known and the last element may have extra "," characters.
An example of a line:
Hello,World,This,Is,A lot, of Text
Where the tokens should be output as
1. Hello
2. World
3. This
4. Is
5. A lot, of Text
For this assignment we MUST use strtok. Because of this I found on some other SOF post that using strtok with an empty string (or passing "\n" as the second argument) results in reading until the end of the line. This is perfect for my application since the extra commas always appear in the last element.
I've created this code which works:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#define NUM_TOKENS 5
const char *line = "Hello,World,This,Is,Text";
char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);
// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS; token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n"))
{
tokens[i++] = strdup(token);
}
free(copy);
return tokens;
}
int main()
{
char **tokens = split_line(line, NUM_TOKENS);
for (int i = 0; i < NUM_TOKENS; i++)
{
printf("%s\n", tokens[i]);
free(tokens[i]);
}
}
Now this works and should get me full credit but I hate this ternary that shouldn't be needed:
token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n");
I'd like to replace the method with this version:
char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);
// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
tokens[i] = strdup(strtok(NULL, "\n"));
free(copy);
return tokens;
}
This tickles my fancy much nicer since it is much easier to see that there is a final case. You also get rid of the strange ternary operator.
Sadly though, this segfaults! I can't for the life of me figure out why.
Edit: Add some output examples:
[11:56:06] gravypod:test git:(master*) $ ./test_no_fault
Hello
World
This
Is
Text
[11:56:10] gravypod:test git:(master*) $ ./test_seg_fault
[1] 3718 segmentation fault (core dumped) ./test_seg_fault
[11:56:14] gravypod:test git:(master*) $
Please check the return value from strtok before you risk passing NULL to another function. Your loop is calling strtok one more time than you think.
It is more usual to use this return value to control your loop, so that you are not at the mercy of your data. As for the delimiters, it is best to keep it simple and not try anything fancy.
char **split_line(const char *line, int num_tokens)
{
    char *copy = strdup(line);
    char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
    int i = 0;
    char *token;
    char delim1[] = ",\r\n";
    char delim2[] = "\r\n";
    char *delim = delim1; // start with a comma in the delimiter set
    token = strtok(copy, delim);
    while(token != NULL && i < num_tokens) { // strtok result controls the loop; never overrun tokens[]
        tokens[i++] = strdup(token);
        if(i == num_tokens - 1) {
            delim = delim2; // the next token is the last one: stop splitting on commas
        }
        token = strtok(NULL, delim);
    }
    free(copy);
    return tokens;
}
Note that you should also check the return values from malloc and strdup, and free your memory properly.
When you get to the last iteration of
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
the order of events is:
1. loop body
2. loop increment step, i.e. token = strtok(NULL, ",\n") (with the wrong second arg)
3. loop continuation check i < NUM_TOKENS - 1
i.e. it has still called strtok even though you're now out-of-range. You've also got an off-by-one on your array indices here: you'd want to initialise i=0 not 1.
You could avoid this by e.g.
making the initial strtok a special case outside the loop, e.g.
int i = 0;
tokens[i++] = strdup(strtok(copy, ",\n"));
then moving the strtok(NULL, ",\n") inside the loop
I'm also surprised you want the \n there at all, or even need to call the last strtok (wouldn't that already just point to the rest of the string? If you're just trying to chop a trailing newline there are easier ways), but I haven't used strtok in years.
(As an aside you're also not freeing the malloced array you store the string pointers in. That said since it's the end of the program at that point that doesn't matter so much.)
Remember that strtok identifies a token when it finds any of the characters in the delimiter string (the second argument to strtok()) - it doesn't try to match the entire delimiter string itself.
Thus, for the sample line in the question (which has no extra commas in its last field), the ternary operator makes no difference - the string will be tokenized based on the occurrence of , OR \n in the input string, so the following works for that input:
for (token = strtok(copy, ",\n"); i < NUM_TOKENS; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
The second example segfaults because it's already tokenized the input to the end of the string by the time it exits the for loop. Calling strtok() again sets token to NULL, and the segfault is generated when strdup() is called on the NULL pointer. Removing the extra call to strtok gives the expected results:
for (token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
tokens[i] = strdup(token);

How to populate a 2D array with Strings (char*) from a CSV file in C

I am working on a project where I am trying to read in a CSV file and check it for all sorts of different parameters. C is a new language for me so I am still getting used to it, and my code may not be as streamlined as it could be. Anyways, my issue is this: when I read the file to make sure each row has the same number of entries it works fine, but when I try to put the values into a 2D array everything gets messed up. The value at arr[0][0] should be the first value in the CSV file, but when I print out to see what it is, it is always the first value in the last row. I can't for the life of me figure this out, and I have been looking at this code and tinkering with it for hours. Any help would be much appreciated!
int parseFile(int width, int height, char*fileName) {
char*** matrix = malloc(width*height*sizeof(char*));
int counter = 0;
int pointer = 0;
FILE *file = fopen(fileName, "r");
const size_t line_size = 1024;
char* line = (char*) malloc(line_size);
while (fgets(line, line_size, file) != NULL) {
char** data = malloc(width*sizeof(char*));
char *delim = ",";
char *string = strtok(line, delim);
while (string != NULL) {
size_t ln = strlen(string) - 1;
if (string[ln] == '\n') {
string[ln] = '\0';
}
data[counter] = string;
string = strtok(NULL, delim);
counter++;
}
matrix[pointer] = data;
counter = 0;
pointer++;
}
printf("%s", matrix[0][0]);
return 0;
}
This looks nearly right, but strtok returns pointers into line, and fgets overwrites that same buffer on every iteration. Every row you stored therefore points into the one buffer, which ends up holding only the last line.
Consider allocating a new buffer for each line, or strdup the result from strtok.

Print delim used by strtok_r

I have this text for example:
I know,, more.- today, than yesterday!
And I'm extracting words with this code:
while(getline(&line, &len, fpSourceFile) > 0) {
last_word = NULL;
word = strtok_r(line, delim, &last_word);
while(word){
printf("%s ", word);
word = strtok_r(NULL, delim, &last_word);
// delim_used = ;
}
}
The output is:
I know more today than yesterday
But is there any way to get the delimiter used by strtok_r()? I want to replace identical words with one integer, and do the same with delimiters. I can get one word at a time with strtok_r(), but how do I get the delimiter used by that function?
Fortunately, strtok_r() is a pretty simple function - it's easy to create your own variant that does what you need:
#include <string.h>
/*
* public domain strtok_ex() based on a public domain
* strtok_r() by Charlie Gordon
*
* strtok_r from comp.lang.c 9/14/2007
*
* http://groups.google.com/group/comp.lang.c/msg/2ab1ecbb86646684
*
* (Declaration that it's public domain):
* http://groups.google.com/group/comp.lang.c/msg/7c7b39328fefab9c
*/
/*
strtok_ex() is an extended version of strtok_r() that optionally
returns the delimiter that was used to terminate the token
the first 3 parameters are the same as for strtok_r(), the last
parameter:
char* delim_found
is an optional pointer to a character that will get the value of
the delimiter that was found to terminate the token.
*/
char* strtok_ex(
char *str,
const char *delim,
char **nextp,
char* delim_found)
{
char *ret;
char tmp;
if (!delim_found) delim_found = &tmp;
if (str == NULL)
{
str = *nextp;
}
str += strspn(str, delim);
if (*str == '\0')
{
*delim_found = '\0';
return NULL;
}
ret = str;
str += strcspn(str, delim);
*delim_found = *str;
if (*str)
{
*str++ = '\0';
}
*nextp = str;
return ret;
}
#include <stdio.h>
int main(void)
{
char delim[] = " ,.-!";
char line[] = "I know,, more.- today, than yesterday!";
char delim_used;
char* last_word = NULL;
char* word = strtok_ex(line, delim, &last_word, &delim_used);
while (word) {
printf("word: \"%s\" \tdelim: \'%c\'\n", word, delim_used);
word = strtok_ex(NULL, delim, &last_word, &delim_used);
}
return 0;
}
Getting any skipped delimiters would be a bit more work. I don't think it would be a lot of work, but I do think the interface would be unwieldy (strtok_ex()'s interface is already clunky), so you'd have to put some thought into that.
No, you cannot identify the delimiter (by means of the call to strtok_r() itself).
From man strtok_r:
BUGS
[...]
The identity of the delimiting byte is lost.

Parsing a file with strtok in C

I'm writing a short function to parse through a file by checking string tokens. It should stop when it hits "visgroups", which is the 9th line of the file I am using to test (which is in the buffer called *source). "versioninfo" is the first line. When I run this code it just repeatedly prints out "versioninfo" until I cancel the program manually. Why isn't the strtok function moving on?
I will be doing some different manipulation of the source when I reach this point, that's why the loop control variable is called "active". Would this have anything to do with the fact that strtok isn't thread-safe? I'm not using source in any other threads.
int countVisgroups(int *visgroups, char *source) {
const char delims[] = {'\t', '\n', ' '};
int active = 0;
char *temp;
while (!active){
temp = strtok(source, delims);
if (temp == NULL) {
printf("%s\n", "Reached end of file while parsing.");
return(0);
}
if (strncmp(temp, "visgroups", 9) == 0) {
active = 1;
return(0);
}
printf("%s\n", temp);
}
return(0);
}
Your delims array needs to be nul terminated. Otherwise how can strtok know how many separators you passed in? Normally you'd just use const char *delims = "\t\n " but you could simply add ..., 0 to your initializer.
After the first call to strtok with the string you want to tokenize, all subsequent calls must be done with the first parameter set to NULL.
temp = strtok(NULL, delims);
And no, it probably has nothing to do with thread safety.
Try to rewrite it like this:
int countVisgroups(int *visgroups, char *source) {
const char delims[] = {'\t', '\n', ' ', '\0'};
int active = 0;
char *temp;
temp = strtok(source, delims);
while (!active){
if (temp == NULL) {
printf("%s\n", "Reached end of file while parsing.");
return(0);
}
if (strncmp(temp, "visgroups", 9) == 0) {
active = 1;
return(0);
}
printf("%s\n", temp);
temp = strtok(NULL, delims);
}
return(0);
}

Resources