Strsep, Parsing CSV Input further - c

Using strsep to split a CSV with a bunch of useless junk in it ("," is the delimiter). One of the entries has quotes on either side (i.e. Florida,"Bob",1999) and I'd like to strip those quotes before I save the value in my array.
How do I remove the quotes from the name? Thanks!
for (int i = 0; i < 19; i++)
{
token = strsep(&copy, ","); //Split token
if (i == 3) {one_spot->location = token;}
if (i == 17) {one_spot->name = token;}//the Name is in quotes in CSV
if (i == 18) {one_spot->year = atoi(token);}
}
all_spots[j] = one_spot; //Add to array.

You could do something like this:
look for the first " using strchr
If found, look for the next "
Use memcpy to copy the strings between the quotes.
if (i == 17)
{
    char *firstq = strchr(token, '"');
    if(firstq == NULL)
    {
        one_spot->name = strdup(token);
        continue;
    }
    firstq++; // step past the opening quote before looking for the closing one
    char *lastq = strchr(firstq, '"');
    if(lastq == NULL)
    {
        // does not end in ", copy everything
        one_spot->name = strdup(token);
        continue;
    }
    size_t len = lastq - firstq;
    char *word = calloc(len + 1, 1);
    if(word == NULL)
    {
        // error handling, do not continue
    }
    memcpy(word, firstq, len); // do not worry about \0 because of calloc
    one_spot->name = word;
}
Note that I use strdup for the assignment (one_spot->name = strdup(token);)
and calloc to allocate memory. strsep returns a pointer into copy (copy plus an
offset). Depending on how you created or allocated copy, this memory might be
invalid once the function exits. That's why it's better to make a copy of the
token before you assign it to the struct.
This code is very simple: it does not handle spaces at the beginning and end of
the string. It can distinguish between abc and "abc", but it fails on inputs such as
"abc"d or "abc"def". It also does not handle escaped quotes, etc. This code only shows you one way of extracting a string from between quotes. It's not my job to write your exercise for you, but I can show you how to start.

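The quote-stripping steps described in the answer can also be wrapped in a small helper. The following is a minimal sketch, not the original author's code: strip_quotes is a hypothetical name, it assumes at most one pair of quotes per field and no escaped quotes, and it avoids strdup so that only standard C is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of field with one pair
 * of double quotes removed. A field without a complete pair of quotes is
 * copied unchanged. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *strip_quotes(const char *field)
{
    assert(field != NULL);

    const char *start = field;
    size_t len = strlen(field);

    const char *opening = strchr(field, '"');
    if (opening != NULL) {
        const char *closing = strchr(opening + 1, '"');
        if (closing != NULL) {
            /* keep only the text between the two quotes */
            start = opening + 1;
            len = (size_t)(closing - start);
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

Inside the loop from the question this could be used as one_spot->name = strip_quotes(token);, with the result passed to free() when the record is discarded.
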
Related

Variable empty after comparison

In the code, when I print var_value, it shows its content but when I need to assign it later on the if else statements, it's empty only IN THE LAST IF and I have no idea why it is. If I delete the last statement, the other three pass without problems.
while ((read = getline(&line, &len, f)) != -1){
printf("%s\n", line);
char *token;
token = strtok(line, "=");
var_name = token;
/* Separate every line by the '=' character */
while( token != NULL ) {
var_value = token;
token = strtok(NULL, "=");
}
printf("%s\n", var_name);
printf("%s\n", var_value);
// Obtain the parameters
if (strcmp(var_name, "puerto") == 0) {
puerto = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "tamano_tabla") == 0) {
tamano_tabla = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "periodo_archivo") == 0) {
periodo_archivo = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "archivo_tabla") == 0) {
printf("%s var val\n", var_value);
strcpy(archivo_tabla, strtok(var_value, "\n")); //Remove \n and copy to destination variable
parameters_count += 1;
printf("%s filetabla\n", archivo_tabla);
}
}
Edit: Results in the console; after the final one there is a segmentation fault:
puerto=1212
puerto
1212
archivo_tabla=tabla.xml
archivo_tabla
tabla.xml
tabla.xml
var val
Your output does not support your contention that
In the code, when I print var_value, it shows its content but when I
need to assign it later on the if else statements, it's empty only IN
THE LAST IF [...].
In fact, your output shows the opposite: var_value is printed just fine. I have to assume that you are confused by the fact that its value ends with a newline, which is printed too. Thus, " var val" appears at the beginning of a line. Here's the expected value of var_value, including the newline, followed by " var val":
tabla.xml
var val
The presence of a newline in the string provided by getline() is the whole point of the strtok(var_value, "\n") that happens next, after all. Or so I assume.
Note also that although the output you present appears to be truncated relative to the code you present, in my tests the contents of var_value are successfully copied to the variable archivo_tabla too, minus that pesky newline.
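This line is suspect: strcpy(archivo_tabla, strtok(var_value, "\n")); //Remove \n and copy to destination variable
strtok modifies var_value in place: it overwrites the "\n" with a terminator and returns a pointer to the text before it, so that part does what the comment says. The risk is that strtok returns NULL when var_value contains nothing but delimiters (for example for a line such as archivo_tabla= with an empty value), and passing NULL to strcpy is undefined behavior. strcpy also assumes that archivo_tabla is large enough to hold the value.
https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm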
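One way to sidestep both the trailing newline and a possible NULL from strtok is to cut the line with strcspn and split it at the first '=' with strchr. The sketch below is an illustration under assumptions, not the asker's code: split_setting is a hypothetical helper, and unlike the loop in the question it treats everything after the first '=' as the value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits a line of the form "name=value\n" in place.
 * On success it returns 0 and stores pointers into line in *name and *value.
 * It returns -1 when the line contains no '=' (for example a blank line). */
int split_setting(char *line, char **name, char **value)
{
    assert(line != NULL && name != NULL && value != NULL);

    /* Cut the line at the first CR or LF, if there is one. */
    line[strcspn(line, "\r\n")] = '\0';

    char *eq = strchr(line, '=');
    if (eq == NULL)
        return -1;

    *eq = '\0';        /* terminate the name */
    *name = line;
    *value = eq + 1;   /* may be an empty string, but never NULL */
    return 0;
}
```

Because the value is never NULL, a later strcmp or atoi on it is safe even for a line such as archivo_tabla= with nothing after the equals sign.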

Parse HTTP Request Line In C

This is the problem that will never end. The task is to parse a request line in a web server -- of indeterminate length -- in C. I pulled the following off of the web as an example with which to work.
GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1
I must extract the absolute path: /path/script.cgi and the query: ?field1=value1&field2=value2. I'm told the following functions hold the key: strchr, strcpy, strncmp, strncpy, and/or strstr.
Here's what has happened so far: I've learned that using functions like strchr and strstr will absolutely allow me to truncate the request line at certain points, but will never allow me to get rid of portions of the request line I do not want, and it doesn't matter how I layer them.
For example, here's some code that gets me close to isolating the query, but I can't eliminate the HTTP version.
bool parse(const char* line)
{
// request line w/o method
const char ch = '/';
char* lineptr = strchr(line, ch);
// request line w/ query and HTTP version
char ch_1 = '?';
char* lineptr_1 = strchr(lineptr, ch_1);
// request line w/o query
char ch_2 = ' ';
char* lineptr_2 = strchr(lineptr_1, ch_2);
printf("%s\n", lineptr_2);
if (lineptr_2 != NULL)
return true;
else
return false;
}
Needless to say, I have a similar issue trying to isolate the absolute path (I can ditch the method, but not the ? or anything thereafter), and I see no occasion on which I can use the functions that require me to know a priori how many chars I'd like to copy from one location (usually an array) to another because, when this is run in real time, I will have no clue what the request line will look like in advance. If someone sees something that I am missing and could point me in the right direction, I would be most grateful!
A more elegant solution.
#include <stdio.h>
#include <string.h>
int parse(const char* line)
{
    /* Find out where everything is; give up if a delimiter is missing */
    const char *start_of_path = strchr(line, ' ');
    if (start_of_path == NULL) return -1;
    start_of_path++;
    const char *start_of_query = strchr(start_of_path, '?');
    if (start_of_query == NULL) return -1;
    const char *end_of_query = strchr(start_of_query, ' ');
    if (end_of_query == NULL) return -1;
    /* Get the right amount of memory (one extra byte for the terminator) */
    size_t path_len = start_of_query - start_of_path;
    size_t query_len = end_of_query - start_of_query;
    char path[path_len + 1];
    char query[query_len + 1];
    /* Copy the strings into our memory */
    strncpy(path, start_of_path, path_len);
    strncpy(query, start_of_query, query_len);
    /* Null terminators (because strncpy does not provide them) */
    path[path_len] = '\0';
    query[query_len] = '\0';
    /* Print */
    printf("%s\n", query);
    printf("%s\n", path);
    return 0;
}
int main(void)
{
    parse("GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
    return 0;
}
I wrote some functions in C a while back that manually parse c-strings up to a delimiter, similar to getline in C++.
#include <stdbool.h>
// Trims all leading whitespace along with consecutive whitespace from provided cstring into destination char*. WARNING: ensure size <= sizeof(destination)
void Trim(char* destination, char* source, int size)
{
    bool trim = true;
    int index = 0;
    int i;
    for (i = 0; i < size; ++i)
    {
        if (source[i] == '\n' || source[i] == '\0')
        {
            destination[index++] = '\0';
            break;
        }
        else if (source[i] != ' ' && source[i] != '\t')
        {
            destination[index++] = source[i];
            trim = false;
        }
        else if (trim)
            continue;
        else
        {
            if (index > 0 && destination[index - 1] != ' ')
                destination[index++] = ' ';
        }
    }
}
// Parses text up to the provided delimiter (or a space, newline, or end of string) into the destination char*. WARNING: ensure size <= sizeof(destination)
void ParseUpToSymbol(char* destination, char* source, int size, char delimiter)
{
    int index = 0;
    int i;
    for (i = 0; i < size; ++i)
    {
        if (source[i] != delimiter && source[i] != '\n' && source[i] != '\0' && source[i] != ' ')
        {
            destination[index++] = source[i];
        }
        else
        {
            destination[index] = '\0';
            break;
        }
    }
    Trim(destination, destination, size);
}
Then you could parse your c-string with something along these lines:
char* buffer = (char*)malloc(64);
char* temp = (char*)malloc(256);
strcpy(temp, "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
Trim(temp, temp, 256);
ParseUpToSymbol(buffer, temp, 64, '?');
temp = temp + strlen(buffer) + 1;
Trim(temp, temp, 256);
The code above trims any leading and trailing whitespace from the target string, in this case "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1", and then stores the parsed value into the variable buffer. Running this the first time should put the word "GET" inside of buffer. When you do the "temp = temp + strlen(buffer) + 1" you are readjusting the temp char-pointer so you can call ParseUpToSymbol again with the remaining part of the string. If you were to call it again, you should get the absolute path leading up to the first question mark. You could repeat this to get each individual query string or change the delimiter to a space and get the entire query string portion of the URL. I think you get the idea. This is just one of many solutions of course.

Blank Line Seg. Fault (Writing Own Shell)

I'm still learning how to write a simple shell.
I want this shell to allow blank lines and comments.
I did some coding and ran into a problem: if I just press Enter (a blank line), it immediately crashes with a segmentation fault (core dumped).
I don't know exactly where the mistake is, because I print everything and it all seems fine. The only thing I'm suspicious of is these lines:
if (args[0] == NULL || !(strncmp(args[0],"#",1))) {
exitstat = 0;
}
I get args from a basic split-command function. The weird thing is that comments work just fine.
Below are my functions for reading what the user types and splitting it (tokenizing, if I'm not mistaken). They are really basic because I learned them from an internet tutorial.
char *commandInput() {
    char *command = NULL;
    size_t bufsize = 0; // getline expects a size_t *
    getline(&command, &bufsize, stdin);
    return command;
}
char **splitLine(char *command) {
    int bufsize = 64;
    int position = 0;
    char **tokens = malloc(bufsize * sizeof(char*));
    char *token;
    token = strtok(command, DELIMITER);
    while (token != NULL) {
        tokens[position] = token;
        position++;
        if (position >= bufsize) {
            bufsize += 64;
            tokens = realloc(tokens, bufsize * sizeof(char*));
        }
        token = strtok(NULL, DELIMITER);
    }
    tokens[position] = NULL;
    return tokens;
}
Could anybody help me figure out what makes it segfault when I enter a blank line? Thank you.
EDIT
I used a debugger (I finally got it working after several tries) and it turns out the error is at a line that I didn't expect to cause any problem (see ---UPDATE----).
The way I call my commandInput function is in main(), where I write
int main () {
......
char * command = NULL
char **args;
command = commandInput();
args= splitLine(command);
------------------ UPDATE!(CAUSING ERROR IF STATEMENT) ---------------
background = 0
numbarguments = 0
// Condition to check whether there is a start program running in backgrond
if (!(strncmp(args[numbarguments - 1], "&",1))) {
background = 1;
args[numbarguments - 1] = NULL;
}
----------------------------------------------
if (args[0] == NULL || !(strncmp(args[0],"#",1))) {
exitstat = 0;
}
....... //(comparing the arguments other than null)
}
So, any advice regarding the if condition that is causing the segfault? Thank you.
Note that the parameter you pass to splitLine is modified: strtok inserts \0's into the string it is given and returns pointers to substrings inside it. Those pointers stay valid only while command itself is neither freed nor overwritten, so if the tokens need to outlive that buffer you have to make copies of them.
token = strtok(command, DELIMITER);
while (token != NULL)
{
    tokens[position] = malloc(strlen(token)+1);
    strcpy(tokens[position],token);
    ...
So in other words, allocating the array of pointers to strings is not always enough; to keep the tokens independently of command you also need to allocate space to hold the strings that you tokenize with strtok.
The code
if (!(strncmp(args[numbarguments - 1], "&",1))) {
background = 1;
args[numbarguments - 1] = NULL;
}
looks wrong: numbarguments is initially 0, so you are comparing args[-1] with "&" and then assigning args[-1] = NULL. Both are out-of-bounds accesses, which is probably what causes the segfault.
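A guarded version of that check could count the arguments first and only then look at the last one. This is a minimal sketch under assumptions: count_args and strip_background_flag are hypothetical names, args is assumed to be the NULL-terminated array returned by splitLine, and the comparison is an exact match on "&" rather than the prefix match that strncmp(..., "&", 1) performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the entries of a NULL-terminated argument vector. */
size_t count_args(char **args)
{
    assert(args != NULL);
    size_t n = 0;
    while (args[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: if the last argument is "&", removes it and returns 1.
 * Returns 0 otherwise, including for an empty vector (a blank line). */
int strip_background_flag(char **args)
{
    size_t n = count_args(args);
    if (n == 0)
        return 0;                  /* nothing was typed: never touch args[-1] */
    if (strcmp(args[n - 1], "&") != 0)
        return 0;
    args[n - 1] = NULL;            /* drop the "&" from the command */
    return 1;
}
```

With this shape, main could set background = strip_background_flag(args); and a blank line simply yields 0 without indexing before the start of the array.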

How to convert my malloc + strcpy to strdup in C?

I am trying to save csv data in an array for use in other functions. I understand that strdup is good for this, but am unsure how to make it work for my situation. Any help is appreciated!
The data is stored in a struct:
typedef struct current{
char **data;
}CurrentData;
Function call:
int main(void){
int totalProducts = 0;
CurrentData *AllCurrentData = { '\0' };
FILE *current = fopen("C:\\User\\myfile.csv", "r");
if (current == NULL){
puts("current file data not found");
}
else{
totalProducts = getCurrentData(current, &AllCurrentData);
}
fclose(current);
return 0;
}
How I allocated memory:
int getCurrentData(FILE *current, CurrentData **AllCurrentData){
*AllCurrentData = malloc(totalProducts * sizeof(CurrentData));
/*allocate struct data memory*/
while ((next = fgetc(current)) != EOF){
if (next == '\n'){
(*AllCurrentData)[newLineCount].data = malloc(colCount * sizeof(char*));
newLineCount++;
}
}
newLineCount = 0;
rewind(current);
while ((next = fgetc(current)) != EOF && newLineCount <= totalProducts){
if (ch != '\0'){
buffer[i] = ch;
i++;
characterCount++;
}
if (ch == ',' && next != ' ' || ch == '\n' && ch != EOF){
if (i > 0){
buffer[i - 1] = '\0';
}
length = strlen(buffer);
/*(*AllCurrentData)[newLineCount].data[tabCount] = malloc(length + 1); /* originally was using strcpy */
strcpy((*AllCurrentData)[newLineCount].data[tabCount], buffer);
*/
(*AllCurrentData)[newLineCount].data[tabCount] = strdup(buffer); /* something like this? */
i = 0;
tabCount++;
for (j = 0; j < BUFFER_SIZE; j++){
buffer[j] = '\0';
}
}
You define a pointer AllCurrentData, but you should set it to NULL.
CurrentData* AllCurrentData = NULL;
In getCurrentData you use totalProducts, which seems a bit
odd since it is a local variable in main(): either you have another
global variable with the same name or there is an error.
The **data inside the structure seems odd; maybe you want instead
to parse the csv line and create proper members for the fields. You already
have an array of CurrentData, so it seems odd to have another array
inside the struct. I am just guessing because you haven't explained
that part.
Since a csv file is line based, use fgets() to read one line
from the file, then parse the string by using e.g. strtok or just by
scanning the buffer for delimiters. Here strdup can come into play:
when you have taken out a token, do a strdup on it and store it in your
structure.
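char line[255];
if ( fgets(line,sizeof(line),current) != NULL )
{
    char* field = strtok( line, "," );
    /* strtok returns NULL when the line holds no token; strdup(NULL) is undefined */
    char* token = (field != NULL) ? strdup(field) : NULL;
    ...
}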
Instead of allocating a big buffer that may be enough (or not) use
realloc to increase your buffer as you read from the file.
That said there are faster ways to extract data from a csv-file e.g.
you can read in the whole file with fread, then look for delimiters
and set these to \0 and create an array of char pointers into the buffer.
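The in-place approach from the last point can be sketched as follows. This is an illustration under assumptions rather than a complete CSV reader: split_fields is a hypothetical helper, the buffer is assumed to have been filled already (for example with fread) and to be terminated with '\0' at buf[len], and quoted or escaped delimiters are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits buf in place on ',' and '\n' by overwriting each
 * delimiter with '\0' and recording where each field starts.
 * Stores at most max_fields pointers and returns how many were stored.
 * Requires buf[len] == '\0'. */
size_t split_fields(char *buf, size_t len, char **fields, size_t max_fields)
{
    assert(buf != NULL && fields != NULL);
    assert(buf[len] == '\0');

    size_t count = 0;
    size_t start = 0;   /* index where the current field begins */

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ',' || buf[i] == '\n') {
            buf[i] = '\0';
            if (count < max_fields)
                fields[count++] = buf + start;
            start = i + 1;
        }
    }
    /* Last field when the data does not end with a newline. */
    if (start < len && count < max_fields)
        fields[count++] = buf + start;

    return count;
}
```

No field is copied here: every pointer refers into the one buffer, so the buffer must stay allocated for as long as the fields are in use.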
Okay, I won't comment on other parts of your code, but note that strdup replaces only the malloc(length + 1) and strcpy pair. Keep the line (*AllCurrentData)[newLineCount].data = malloc(colCount * sizeof(char*));, because data is a char ** and still needs its array of pointers,
and store each field with this: (*AllCurrentData)[newLineCount].data[tabCount] = strdup(buffer);
For the function to read in the array of strings I would start with the following approach. This has not been tested or even compiled however it is a starting place.
There are a number of issues not addressed by this sample. The temporary buffer size of 4K characters may or may not be sufficiently large for all lines in the file. There may be more lines of text in the file than elements in the array of pointers and there is no indication from the function that this has happened.
Improvements to this would be better error handling. Also it might be modified so that the array of pointers is allocated in the function with some large amount and then if there are more lines in the file than array elements, using the realloc() function to enlarge the array of pointers by some size. Perhaps also a check on the file size and using an average text line length would be appropriate to provide an initial size for the array of pointers.
// Read lines of text from a text file returning the number of lines.
// The caller will provide an array of char pointers which will be used
// to return the list of lines of text from the file.
int GetTextLines (FILE *hFile, char **pStringArrays, int nArrayLength)
{
    int iBuffSize = 4096;
    int iLineCount = 0;
    char tempBuffer [4096];
    while (fgets (tempBuffer, iBuffSize, hFile) && iLineCount < nArrayLength) {
        pStringArrays[iLineCount] = malloc ((strlen(tempBuffer) + 1) * sizeof (char));
        if (! pStringArrays[iLineCount])
            break;
        strcpy (pStringArrays[iLineCount], tempBuffer);
        iLineCount++;
    }
    return iLineCount;
}
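The realloc-based improvement mentioned above might look like the following. This is a sketch under assumptions, not a tested replacement for GetTextLines: read_all_lines is a hypothetical name, the starting capacity of 2 is arbitrary, and lines longer than the 4K temporary buffer are still split across several entries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: reads every line of fp into a heap-allocated array
 * that is grown with realloc() as needed. Stores the array in *out and
 * returns the number of lines, or -1 on allocation failure (in which case
 * everything allocated so far is freed and *out is set to NULL). */
long read_all_lines(FILE *fp, char ***out)
{
    assert(fp != NULL && out != NULL);

    char buf[4096];
    char **lines = NULL;
    size_t count = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (count == cap) {
            /* Double the capacity each time the array is full. */
            size_t new_cap = cap ? cap * 2 : 2;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            lines = tmp;
            cap = new_cap;
        }
        size_t n = strlen(buf) + 1;
        lines[count] = malloc(n);
        if (lines[count] == NULL)
            goto fail;
        memcpy(lines[count], buf, n);
        count++;
    }
    *out = lines;
    return (long)count;

fail:
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
    *out = NULL;
    return -1;
}
```

The caller owns the result and is expected to free each line and then the array itself.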

Ignoring spaces in a string unless it's in quotes

char *args[32];
char **next = args;
char *temp = NULL;
char *quotes = NULL;
temp = strtok(line, " \n&");
while (temp != NULL) {
if (strncmp(temp, "\"", 1) == 0) {
//int i = strlen(temp);
printf("first if");
quotes = strtok(temp, "\"");
} else if (strncmp(temp, "\"", 1) != 0) {
*next++ = temp;
temp = strtok(NULL, " \n&");
}
}
I'm having trouble with trying to understand with how to still keep spaces if a part of the string is surrounded with quotes. For example, if I want execvp() to execute this: diff "space name.txt" sample.txt, it should save diff at args[0], space name.txt at args[1] and sample.txt at args[2].
I'm not really sure on how to implement this, I've tried a few different ways of logic with if statements, but I'm not quite there. At the moment I am trying to do something simple like: ls "folder", however, it gets stuck in the while loop of printing out my printf() statement.
I know this isn't worded as a question - it's more explaining what I'm trying to achieve and where I'm up to so far, but I'm having trouble and would really appreciate some hints of how the logic should be.
Instead of using strtok process the string char by char. If you see a ", set a flag. If flag is already set - unset it instead. If you see a space - check the flag and either switch to next arg, or add space to current. Any other char - add to current. Zero byte - done processing.
With some extra effort you'll be able to handle even stuff like diff "file \"one\"" file\ two (you should get diff, file "one" and file two as results)
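The flag-based approach described above can be sketched like this. It is a minimal illustration, not a full shell parser: split_args is a hypothetical name, the line is modified in place, the quotes themselves are dropped, and backslash escapes (the "extra effort" part) are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place into at most max_args - 1
 * arguments and NULL-terminates args. Spaces, tabs and newlines separate
 * arguments unless they appear between double quotes.
 * Returns the number of arguments stored. */
size_t split_args(char *line, char **args, size_t max_args)
{
    assert(line != NULL && args != NULL && max_args > 0);

    size_t argc = 0;
    int in_quotes = 0;   /* the flag: set while between two double quotes */
    int in_arg = 0;      /* set while an argument is being collected */
    char *out = line;    /* write position; never ahead of the read position */

    for (const char *in = line; *in != '\0'; in++) {
        char c = *in;

        if (!in_quotes && strchr(" \t\n", c) != NULL) {
            if (in_arg) {            /* whitespace ends the current argument */
                *out++ = '\0';
                in_arg = 0;
            }
            continue;
        }
        if (!in_arg) {               /* first character of a new argument */
            if (argc + 1 >= max_args)
                break;               /* no room left: ignore the rest */
            args[argc++] = out;
            in_arg = 1;
        }
        if (c == '"')
            in_quotes = !in_quotes;  /* toggle the flag and drop the quote */
        else
            *out++ = c;
    }
    if (in_arg)
        *out = '\0';
    args[argc] = NULL;               /* execvp() expects a NULL-terminated array */
    return argc;
}
```

For the line diff "space name.txt" sample.txt this stores diff, space name.txt and sample.txt, followed by the terminating NULL.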
I'm having trouble understanding what you are trying to do. Are you trying to tokenize the input string into space-separated tokens?
Just split the input string on spaces, and when you encounter a double quote character, use a second inner loop that handles the quoted string.
There is more to quoted strings than searching for the closing quote. You need to handle backslashes, for example backslash-escaped quotes and backslash-escaped backslashes.
Just consider the following:
diff "space name \" with quotes.txt\\" foo
Which refers to a (trashy) filename space name " with quotes.txt\. Use this as a test case, then you know when you are done with the basics. Note that shell command line splitting is a lot more crazy than that.
Here is my idea:
1. Make two pointers A and B, initially pointing at the first char of the string.
2. Iterate through the string with pointer A, copying every char into an array as long as it's not a space.
3. Once you have reached a ", take the pointer B starting from the position A+1 and go forward until you reach the next ", copying everything including spaces.
4. Now repeat from step 2, starting from the char B+1.
5. Repeat as long as you haven't reached \0.
Note: You'll have to consider what to do if there are nested quotes though.
You can also use a flag (an int that is 1 or 0) and a pointer to denote whether you're in a quote or not, following two separate rules based on the flag.
Write three functions. All of these should return the number of bytes they process. Firstly, the one that handles quoted arguments.
size_t handle_quoted_argument(char *str, char **destination) {
    assert(*str == '\"');
    /* discard the opening quote */
    *destination = str + 1;
    /* find the closing quote (or a '\0' indicating the end of the string) */
    size_t length = strcspn(str + 1, "\"") + 1;
    assert(str[length] == '\"'); /* NOTE: You really should handle mismatching quotes properly, here */
    /* discard the closing quote */
    str[length] = '\0';
    return length + 1;
}
... then a function to handle the unquoted arguments:
size_t handle_unquoted_argument(char *str, char **destination) {
    size_t length = strcspn(str, " \n");
    char c = str[length];
    *destination = str;
    str[length] = '\0';
    return c == ' ' ? length + 1 : length;
}
... then a function to handle (possibly repetitive) whitespace:
size_t handle_whitespace(char *str) {
    /* This will count consecutive whitespace characters, eg. tabs, newlines, spaces... */
    /* strspn returns the length of the leading run made up only of the listed characters */
    return strspn(str, " \t\n\v\f\r");
}
Combining these three should be simple:
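size_t n = 0, argc = 0;
while (line[n] != '\0') {
    n += handle_whitespace(line + n);
    if (line[n] == '\0')
        break; /* only trailing whitespace was left */
    n += line[n] == '\"' ? handle_quoted_argument(line + n, args + argc++)
                         : handle_unquoted_argument(line + n, args + argc++);
}
args[argc] = NULL; /* execvp() expects a NULL-terminated array */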
By breaking this up into four separate algorithms, can you see how much simpler this task becomes?
So here is where I read in the line:
while((qtemp = fgets(line, size, stdin)) != NULL ) {
if (strcmp(line, "exit\n") == 0) {
exit(EXIT_SUCCESS);
}
spaceorquotes(qtemp);
}
Then I go to this: (I haven't added my initializers, you get the idea though)
length = strlen(qtemp);
for(i = 0; i < length; i++) {
    position = strcspn(qtemp, " \"\n");
    while (strncmp(qtemp, " ", 1) == 0) {
        memmove(qtemp, qtemp+1, length-1);
        position = strcspn(qtemp, " \"\n");
    } /*this while loop is for handling multiple spaces*/
    if (strncmp(qtemp, "\"", 1) == 0) { /*this is for handling quotes */
        memmove(qtemp, qtemp+1, length-1);
        position = strcspn(qtemp, "\"");
        stemp = calloc(position + 1, sizeof(char)); /* zeroed, with room for the terminator, so strncat starts from an empty string */
        strncat(stemp, qtemp, position);
        args[i] = stemp;
    } else { /*otherwise handle it as a (single) space*/
        stemp = calloc(position + 1, sizeof(char));
        strncat(stemp, qtemp, position);
        args[i] = stemp;
    }
    //printf("args: %s\n", args[i]);
    length = strlen(qtemp);
    memmove(qtemp, qtemp+position+1, length-position);
}
args[i-1] = NULL; /*the last position seemed to be a space, so I overwrote it with a null to terminate */
if (execvp(args[0], args) == -1) {
    perror("execvp");
    exit(EXIT_FAILURE);
}
I found that using strcspn helped, as modifiable lvalue suggested.
