Semicolon-separated contents in a text file - c

I need a C program that reads the contents of a text file. The fields in the file are semicolon-separated, as shown:
CatId;1;CatName;CLOTHS;Prefix;CH;ActiveStatus;Y;......
Can anyone suggest a good, simple way to read the contents and store them in a buffer?
Thanks in advance

I'm not sure if it's the best way to do it, but I would:
Use fgets to read the file line by line
Use strtok to tokenize the string (or do it manually, depending on how lazy I feel)
Something like this:
char line[MAXLINE];   /* MAXLINE and fp are assumed to be defined elsewhere */
char *p;
while (fgets(line, MAXLINE, fp)) {
    p = strtok(line, ";");
    while (NULL != p) {
        /* p is a token */
        p = strtok(NULL, ";");
    }
}
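The sample line looks like alternating name/value fields, although the question does not say so explicitly. Under that assumption, a minimal sketch of storing the tokens could pair them up as they come out of strtok. The struct kv type and the parse_pairs name are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type: both pointers point into the caller's line buffer. */
struct kv {
    const char *key;
    const char *value;
};

/* Splits `line` in place on ';' and '\n', treating the tokens as
 * alternating key/value fields (an assumption about the file format).
 * Stores at most `max_pairs` pairs and returns how many were stored. */
size_t parse_pairs(char *line, struct kv *out, size_t max_pairs)
{
    size_t n = 0;
    char *key = strtok(line, ";\n");
    while (key != NULL && n < max_pairs) {
        char *value = strtok(NULL, ";\n");
        if (value == NULL)
            break;                  /* dangling key without a value */
        out[n].key = key;
        out[n].value = value;
        n++;
        key = strtok(NULL, ";\n");
    }
    return n;
}
```

Because strtok writes NUL bytes into the buffer and the stored pointers refer to that buffer, the pairs are only valid until the next fgets call overwrites the line. Note also that strtok treats consecutive delimiters as one, so an empty field such as ;; would be skipped rather than reported.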

Related

C - Nested loop using strtok

I am trying to use strtok to split up a text file into strings that I can pass to a spell check function, the text file includes characters such as '\n', ' ?!,.' etc...
I need to print any words that fail the spell check and the line number that they are on. Keeping track of the line is what I'm struggling with.
I have tried this so far but it only returns results for the first line of the text file:
char str[409377];
fread(str, noOfChars, 1, file);
fclose(file);
int lines=1;
char *token;
char *line;
char splitLine[] = "\n";
char delimiters[] = " ,.?!(){}*&^%$£_-+=";
line = strtok(str, splitLine);
while(line!=NULL){
    token = strtok(line, delimiters);
    while(token != NULL){
        //print is just to test if I can loop through all the words
        printf("%s", token);
        //spellCheck function & logic here
        token = strtok(NULL, delimiters);
    }
    line = strtok(NULL, splitLine);
    lines++;
}
Is using the nested while loop and strtok possible? Is there a better way to keep track of the line number?
The strtok function is not reentrant! It cannot be used to tokenize multiple strings simultaneously, because it keeps internal state about the string currently being tokenized.
If your standard library provides it, you could use strtok_s instead (an optional Annex K function in C11; Microsoft's strtok_s takes different arguments) or POSIX strtok_r. Otherwise you have to come up with another solution.
You can use strtok, but it's not very easy to use. It's a stupid function; all it really does is replace delimiters with NULs and return a pointer to the start of the sequence it has delimited, so it's destructive. It can't handle special cases like English words being allowed one apostrophe (we're is a word, we'r'e is not), and you have to make sure you list all the delimiters explicitly.
It's probably best to write mystrtok yourself, so you understand how it works. Then use that as the basis for your own word extractor.
The reason for your bug is that you chop off the first line, and then that is all that strtok sees on the subsequent calls.
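One way to follow the "do the splitting yourself" advice while keeping strtok for the words is to find the line boundaries by hand with strchr, so that strtok only ever tracks one string at a time. The sketch below is an illustration under that approach, not the asker's code; struct word_ref and collect_words are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a word (pointing into the text buffer) and the
 * 1-based number of the line it was found on. */
struct word_ref {
    const char *word;
    int line_no;
};

/* Walks `text` in place. Lines are cut by hand at each '\n', and strtok
 * is used only for the words inside the current line, so its single
 * internal state is never shared between two strings.
 * Stores at most `max_out` words and returns how many were stored. */
size_t collect_words(char *text, const char *delims,
                     struct word_ref *out, size_t max_out)
{
    size_t n = 0;
    int line_no = 1;
    for (char *line = text; line != NULL; line_no++) {
        char *next = strchr(line, '\n');
        if (next != NULL)
            *next++ = '\0';         /* end this line, step past the '\n' */
        for (char *tok = strtok(line, delims);
             tok != NULL && n < max_out;
             tok = strtok(NULL, delims)) {
            out[n].word = tok;
            out[n].line_no = line_no;
            n++;
        }
        line = next;                /* NULL once the last line is done */
    }
    return n;
}
```

Cutting lines with strchr also keeps the line count honest: strtok(str, "\n") treats consecutive newlines as a single delimiter, so empty lines would not be counted.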

Reading text files into an array in C

Just a question: when you read lines of text from a text file, how would you separate the words and store them in an array?
For example, if I have two lines of text in my text file that look like this:
1005; AndyCool; Andy; Anderson; 23; LA
1006; JohnCool; John; Anderson; 23; LA
How would you split them on the ';'
and then store them in a 2D array?
Sorry, I haven't started coding yet, so I have nothing to paste here.
Cheers ...
Use the strsep function (a BSD/glibc extension, not part of standard C):
char *token;
char *line;   /* assumed to point at a writable line loaded from the file */
if (line != NULL) {
    while ((token = strsep(&line, ";")) != NULL) {
        /*
         * token points to the current extracted string;
         * use it to fill your array
         */
    }
}
First read using fgets, then use strtok to split the string: http://www.cplusplus.com/reference/cstring/strtok/
Look at the manual pages for the fopen, fgets, strstr, strchr and strspn functions ... The strtok and strsep functions also work for most things you will do.
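As a rough sketch of the fgets-then-strtok suggestion applied to the sample records, the helper below copies each field of one line into a row of fixed-size strings. The name parse_row and the MAX_FIELDS and FIELD_LEN limits are assumptions made for this example; fields longer than FIELD_LEN - 1 characters are truncated.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 8    /* assumed upper bound on fields per line */
#define FIELD_LEN  32   /* assumed upper bound on field length, incl. NUL */

/* Copies the ';'-separated fields of `line` into `row`, dropping the
 * leading spaces that follow each ';' in the sample data.
 * `line` is modified by strtok. Returns the number of fields stored. */
size_t parse_row(char *line, char row[MAX_FIELDS][FIELD_LEN])
{
    size_t n = 0;
    for (char *tok = strtok(line, ";\n");
         tok != NULL && n < MAX_FIELDS;
         tok = strtok(NULL, ";\n")) {
        tok += strspn(tok, " ");                    /* skip leading spaces */
        snprintf(row[n], FIELD_LEN, "%s", tok);     /* bounded copy */
        n++;
    }
    return n;
}
```

To hold the whole file, one could declare something like char table[ROWS][MAX_FIELDS][FIELD_LEN] and call parse_row(line, table[i]) for each line returned by fgets, where ROWS is another limit you would have to choose.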

Strtok - reading empty string at end of line

In my code below I use strtok to parse a line from a file that looks like:
1023.89,863.19 1001.05,861.94 996.44,945.67 1019.28,946.92 1023.89,863.19
As the file can have lines of different lengths, I don't use fscanf. The code below works except for one small glitch. It loops around one time too many and reads in a long empty string " " before looping again, recognizing the null token "" and exiting the while loop. I don't know why this happens.
Any help would be greatly appreciated.
fgets(line, sizeof(line), some_file);
while (line != OPC_NIL) {
    token = strtok(line, "\t"); //Pull the string apart into tokens using the commas
    input = op_prg_list_create();
    while (token != NULL) {
        test_token = strdup(token);
        if (op_prg_list_size(input) == 0)
            op_prg_list_insert(input,test_token,OPC_LISTPOS_HEAD);
        else
            op_prg_list_insert(input,test_token,OPC_LISTPOS_TAIL);
        token = strtok (NULL, "\t");
    }
    fgets(line, sizeof(line), some_file);
}
You must use the correct list of delimiters. Your code contradicts comments:
token = strtok(line, "\t"); //Pull the string apart into tokens using the commas
If you want to separate tokens by commas, use "," instead of "\t". In addition, you certainly don't want the tokens to contain the newline character \n (which appears at the end of each line read from file by fgets). So add the newline character to the list of delimiters:
token = strtok(line, ",\n"); //Pull the string apart into tokens using the commas
...
token = strtok (NULL, ",\n");
You might want to add the space character to the list of delimiters too (is 863.19 1001.05 a single token or two tokens? Do you want to remove spaces at end of line?).
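If each whitespace-separated item is meant to be one "x,y" pair, one possible reading of the sample line is to let strtok split on whitespace (including the trailing newline) and let sscanf pull the two numbers out of each token. This is only a sketch under that assumption; struct point and parse_points are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for one "x,y" pair from the input line. */
struct point {
    double x;
    double y;
};

/* Splits `line` in place on whitespace, then parses each token as
 * "x,y". Tokens that do not match are skipped.
 * Stores at most `max_pts` points and returns how many were stored. */
size_t parse_points(char *line, struct point *pts, size_t max_pts)
{
    size_t n = 0;
    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && n < max_pts;
         tok = strtok(NULL, " \t\r\n")) {
        if (sscanf(tok, "%lf,%lf", &pts[n].x, &pts[n].y) == 2)
            n++;
    }
    return n;
}
```

With the newline and the space in the delimiter list, a line that contains only whitespace yields no tokens at all, so no empty string reaches the list-building code.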
Your use of sizeof(line) tells me that line is a fixed size array living on the stack. In this case, (line != OPC_NIL) will never be false. However, fgets() will return NULL when the end of file is reached or some other error occurs. Your outer while loop should be rewritten as:
while(fgets(line, sizeof(line), some_file)) {
...
}
Your input file likely also has a newline character at the end of the last input line resulting in a single blank line at the end. This is the difference between this:
1023.89,863.19 1001.05,861.94 996.44,945.67 1019.28,946.92 1023.89,863.19↵
<blank line>
and this:
1023.89,863.19 1001.05,861.94 996.44,945.67 1019.28,946.92 1023.89,863.19
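The first thing you should do in the while loop is check that the string is actually in the format you expect. If it's not, then break. Note that fgets keeps the trailing newline, so a blank line arrives as "\n" rather than as an empty string:
while (fgets(line, sizeof(line), some_file)) {
    if (line[strspn(line, " \t\r\n")] == '\0') // whitespace only; or other checks such as "contains tab characters"
        break;
    ...
}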

strtok only returning one token

I'm writing a simple shell in C that accepts some standard commands like cd and ls. I'm trying to implement a feature where the user can enter a ";" between commands, so that several commands can be written on the same line and executed separately. So if I input "cd Desktop; ls", the shell should cd to Desktop and print what's in the directory. The problem is that it only executes the first command. Here's my main method:
char input[1024];
while(1)
{
    printf("%s ", prompt);
    fgets(input, 1024, stdin);
    char delims[] = ";";
    char *result = NULL;
    result = strtok( input, delims );
    while( result != NULL )
    {
        printf("%s\n", result);
        char * copy = malloc(strlen(result) + 1); //Create a copy of the input token
        strcpy(copy, result);
        format(copy);
        if(programs)
        {
            handle();
            cleanup(programs);
            programs = NULL;
        }
        free(copy);
        result = strtok( NULL, delims );
        cmdno++;
    }
}
First I try to break up the input into tokens based on ";" and then feed the token to the format() method which looks like this:
int format(char input[])
{
    input = strtok(input, "\n");
    ...
}
I know that strtok makes changes to the original string, which is why I create a copy of the token first before passing it to format. Is what I'm doing correct??
You can't mix multiple strtok calls. Here's what's happening:
You start splitting input so strtok takes note and stores stuff internally
You take a break from splitting input
You start splitting copy so again strtok takes note, thereby destroying the previous info
At this point strtok only knows about the copy business and doesn't know anything about the original input.
The main problem is that strtok doesn't know that you're doing two things at the same time. From its point of view, you simply started processing a different string before finishing the first string.
Possible solutions:
Use strtok_r if you have it. It's not standard C (but it is standard POSIX). The r stands for reentrant.
Use your own splitting function (strchr / looping etc)
Change your program logic such that you don't need to split copy before finishing with input
About that last point:
Keep an array of char * and fill it with strtok without pausing to split sub-tokens. So each element should be a different command
When you're done with the ";" split, start processing each of the array elements
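The two-phase idea from that last point can be sketched with plain strtok, since the first split is completely finished before the second one starts. The helper name split_tokens is hypothetical, and the pointers it stores refer to the caller's buffer rather than to copies.

```c
#include <stddef.h>
#include <string.h>

/* Splits `s` in place on any character in `delims` and stores up to
 * `max_out` token pointers in `out`. The whole string is tokenized in
 * one go, so no other strtok sequence can interrupt it.
 * Returns the number of tokens stored. */
size_t split_tokens(char *s, const char *delims, char **out, size_t max_out)
{
    size_t n = 0;
    for (char *tok = strtok(s, delims);
         tok != NULL && n < max_out;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

A shell loop could first call split_tokens(input, ";\n", cmds, max) to collect every command, and only afterwards call split_tokens(cmds[i], " \t", args, max) on each element to break it into arguments. Each call runs to completion, so the second phase no longer destroys the state of the first.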
What about this:
char line[1024];
char *token;
while (1) {
    printf("$ ");
    if (fgets(line, sizeof line, stdin) == NULL)
        break;   /* stop on EOF or read error */
    token = strtok(line, ";");
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok(NULL, ";");
    }
}

How I can skip a blank line in an input file when using strtok?

I want to parse lines of a file using strtok; the values are comma separated. However, strtok also reads blank lines which contain only spaces. Isn't it supposed to return a null pointer in such a situation?
How can I ignore such a line? I tried to check for NULL, but as mentioned above it doesn't work.
void function_name(void)
{
    const char delimiter[] = ",";
    char line_read[9000];
    char keep_me[9000];
    int i = 0;
    while(fgets(line_read, sizeof(line_read), filename) != NULL)
    {
        /*
         * Check if the line read in contains anything
         */
        if(line_read != NULL){
            keep_me[i] = strtok(line_read, delimiter);
            i++;
        }
    }
}
So, to explain:
You're reading your file with a while loop that reads it line by line (fgets) into the array line_read.
Every time it reads a line, it checks whether the line contains anything (the NULL check). Note, however, that line_read is an array, so line_read != NULL is always true and cannot detect a blank line.
If the line does contain something, it will parse it using strtok and store the result in keep_me; otherwise the line stays in the line_read array, which you obviously don't use in your program.
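strtok returns NULL only when nothing but delimiter characters remain, and with "," as the sole delimiter a line of spaces still counts as one token. One way to skip such lines is to test for whitespace before tokenizing. The sketch below uses a hypothetical helper named is_blank.

```c
#include <string.h>

/* Returns 1 if `s` contains only spaces, tabs, or line-ending
 * characters (including the case of an empty string), 0 otherwise.
 * strspn gives the length of the leading run of those characters;
 * if that run reaches the terminating NUL, the line is blank. */
int is_blank(const char *s)
{
    return s[strspn(s, " \t\r\n")] == '\0';
}
```

Inside the fgets loop this would be used as if (is_blank(line_read)) continue; before the first strtok call. An alternative with similar effect is to add the space and newline characters to the delimiter string, at the cost of also splitting values that contain spaces.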

Resources