Breaking a string in C with multiple spaces

Ok, so my code currently splits a single string like this: "hello world" into:
hello
world
But when there are multiple spaces between, before, or after the words, my code misbehaves. It takes the extra space and counts it as a word/number to be analyzed. For example, if I put two spaces between hello and world, my code produces:
hello
(a space character)
world
The space is actually counted as a word/token.
int counter = 0;
int index = strcur->current_index;
char *string = strcur->myString;
char token_buffer = string[index];
while(strcur->current_index <= strcur->end_index)
{
counter = 0;
token_buffer = string[counter+index];
while(!is_delimiter(token_buffer) && (index+counter)<=strcur->end_index)//delimiters are: '\0','\n','\r',' '
{
counter++;
token_buffer = string[index+counter];
}
char *output_token = malloc(counter+1);
strncpy(output_token,string+index,counter);
printf("%s \n", output_token);
TKProcessing(output_token);
//update information
counter++;
strcur->current_index += counter;
index += counter;
}
I can see the problem area in my loop, but I'm a bit stumped as to how to fix this. Any help would be much appreciated.

From a coding standpoint, if you want to do this without a library as an exercise: your inner loop stops at the first delimiter. On the next pass the current character is the second delimiter, so the inner while loop is never entered, counter stays 0, and an empty token is printed. You can skip the whole run of delimiters in the update step:
//update information
while(is_delimiter(token_buffer) && (index+counter)<=strcur->end_index)
{
counter++;
token_buffer = string[index+counter];
}

Use the standard C library function strtok() rather than reimplementing such a standard function.
See the strtok manual page for details. In your case it can be used as follows:
#include <string.h>
char *token;
token = strtok (string, " \r\n");
// token now points at the first token, or is NULL if there is none
while (token != NULL)
{
// process the current token here
token = strtok (NULL, " \r\n");
}
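As you can see, each subsequent call to strtok() with NULL as the first argument returns a char * pointing to the next token. Consecutive delimiters are treated as a single separator, so runs of spaces do not produce empty tokens. Note that strtok() modifies the string it parses.
If you're working on a threaded program, use the strtok_r() function instead.
It takes an extra char ** argument in which it keeps its position between calls. As with strtok(), the first call passes the string, and subsequent calls pass NULL as the first argument: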
#include <string.h>
char *token;
char *saveptr;
token = strtok_r(string, " \r\n", &saveptr);
// token now points at the first token, or is NULL if there is none
while (token != NULL)
{
// process the current token here
token = strtok_r(NULL, " \r\n", &saveptr);
}

Just put the token-processing logic inside an if (counter > 0) {...} block, so that malloc happens only when there is a real token, like this:
if (counter > 0) { // a real word, not just delimiters
char *output_token = malloc(counter+1);
strncpy(output_token,string+index,counter);
output_token[counter] = '\0'; // strncpy does not terminate the copy here
printf("%s \n", output_token);
TKProcessing(output_token);
}

Related

Segmentation fault in C (C90), what's the problem?

Here's my main.c:
int main() {
char *x = "add r3,r5";
char *t;
char **end;
t = getFirstTok(x,end);
printf("%s",t);
}
And the function getFirstTok:
/* getFirstTok function returns a pointer to the start of the first token. */
/* Also makes *endOfTok (if it's not NULL) to point at the last char after the token. */
char *getFirstTok(char *str, char **endOfTok)
{
char *tokStart = str;
char *tokEnd = NULL;
/* Trim the start */
trimLeftStr(&tokStart);
/* Find the end of the first word */
tokEnd = tokStart;
while (*tokEnd != '\0' && !isspace(*tokEnd))
{
tokEnd++;
}
/* Add \0 at the end if needed */
if (*tokEnd != '\0')
{
*tokEnd = '\0';
tokEnd++;
}
/* Make *endOfTok (if it's not NULL) to point at the last char after the token */
if (endOfTok)
{
*endOfTok = tokEnd;
}
return tokStart;
}
Why do I get a segmentation fault when running this main program?
I'm writing a two-pass assembler and I need a function that parses a string by a delimiter, in this case whitespace. Is it better to use strtok for this purpose?
I need a command parser that extracts "add", and an operand parser (splitting on the , delimiter) that extracts "r3" and "r5". I wanted to check whether this getFirstTok function is suitable for this purpose, but when I try to run it I get a segmentation fault:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Thank you.
As pointed out in the comments, string literals are read-only, as they are baked into the compiled program, so writing '\0' into one (as getFirstTok does) is undefined behavior and typically crashes. If you don't want to go with the suggested solution of making your "source program" a stack-allocated array of characters (char x[] = "add r3,r5"), you can use a function like strdup(3) to make a readable/writable copy like so:
#include <string.h>
[...]
char *rw_code = strdup(x);
t = getFirstTok(rw_code, end);
printf("%s", t);
free(rw_code); /* NOTE: invalidates _all_ references pointing at it! */
[...]
And as a little aside, I always declare pointers to string literals as const (const char *lit = "..."), as the compiler will then usually warn me if I attempt to write through them later on.

Why are the values in my dynamic 2D char array being overwritten?

I'm trying to build a process logger in C on Linux, but I'm having trouble getting it right. I'd like it to have 3 columns: USER, PID, COMMAND. I'm using the output of ps aux and trying to dynamically append it to an array. That is, for every line ps aux outputs, I want to add a row to my array.
This is my code. (To keep the output short, I only grep for sublime. But this could be anything.)
#define _BSD_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char** processes = NULL;
char* substr = NULL;
int n_spaces = 0;
int columns = 1;
char line[1024];
FILE *p;
p = popen("ps -eo user,pid,command --sort %cpu | grep sublime", "r");
if(!p)
{
fprintf(stderr, "Error");
exit(1);
}
while(fgets(line, sizeof(line) - 1, p))
{
puts(line);
substr = strtok(line, " ");
while(substr != NULL)
{
processes = realloc(processes, sizeof(char*) * ++n_spaces);
if(processes == NULL)
exit(-1);
processes[n_spaces - 1] = substr;
// first column user, second PID, third all the rest
if(columns < 2)//if user and PID are already in array, don't split anymore
{
substr = strtok(NULL, " ");
columns++;
}
else
{
substr = strtok(NULL, "");
}
}
columns = 1;
}
pclose(p);
for(int i = 0; i < (n_spaces); i++)
printf("processes[%d] = %s\n", i, processes[i]);
free(processes);
return 0;
}
The output of the for loop at the end looks like this.
processes[0] = user
processes[1] = 7194
processes[2] = /opt/sublime_text/plugin_host 27184
processes[3] = user
processes[4] = 7194
processes[5] = /opt/sublime_text/plugin_host 27184
processes[6] = user
processes[7] = 27194
processes[8] = /opt/sublime_text/plugin_host 27184
processes[9] = user
processes[10] = 27194
processes[11] = /opt/sublime_text/plugin_host 27184
But, from the puts(line) I get that the array should actually contain this:
user 5016 sh -c ps -eo user,pid,command --sort %cpu | grep sublime
user 5018 grep sublime
user 27184 /opt/sublime_text/sublime_text
user 27194 /opt/sublime_text/plugin_host 27184
So, apparently all the values are being overwritten and I can't figure out why... (Also, I don't get where the value 7194 in processes[0] = 7194 and processes[4] = 7194 comes from).
What am I doing wrong here? Is it somehow possible to make the output look like the output of puts(line)?
Any help would be appreciated!
The return value of strtok is a pointer into the string that you tokenise. (The token has been made null-terminated by overwriting the first separator after it with '\0'.)
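When you read a new line with fgets, you overwrite the contents of line. All saved tokens, not just the ones from the last parse, point into that same buffer, so they now show whatever the latest line put at those positions. (The pointers to the previous tokens remain valid, but the content at these locations changes.) This is probably also where 7194 comes from: a pointer saved for an earlier line now points into the middle of 27194.
There are several ways to fix this.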
You could make the tokens you save arrays of chars and strcpy the parsed contents.
You could duplicate the parsed tokens with (the non-standard) strdup, which allocates memory for the strings on the heap.
You could read an array of lines, so that the tokens are really unique.
strtok returns a pointer into the string it processes. That string is always stored in the variable line, so all the pointers in your array point into the same buffer.
As @Lundin wrote in a comment, there is no two-dimensional array here, just a one-dimensional array of pointers.

char * disappears if I don't print it?

I have a very odd problem here. I basically have a function which takes in a char* and it splits the string and returns the substring. My problem is if I print length of the char* THEN return it then the value remains but if I don't call the function to get the length then when it comes out of the function it disappears.
I've probably explained it poorly above so I'll copy and paste segments of the code below:
void processFile (char *currentLine, int currentLineNumber)
{
int type;
char *accountName, *secCodeRef, *secCode = NULL, *reference = NULL;
if ((type = getType(currentLine)) == TYPE_HEADER)
{
accountName = strtok (currentLine, " "); //Remove "Type"
accountName = strtok (NULL, " "); //Get Account Name
secCodeRef = strtok (NULL, " "); //get Security code and reference
secCode = getSecCode(secCodeRef, secCode); //Get Security Code
printf("TEST:%s\n", secCode);
}
Basically secCodeRef is a string that contains both the security code and the reference, e.g. GB0007980592REFERENCE1. The first 12 characters are the security code and the rest is the reference. So I pass this string into a function called getSecCode: (SECCODELENGTH is 13 btw)
char *getSecCode (char *secCodeRef, char *secCode)
{
char SecCode[SECCODELENGTH];
char *SecuCode = (char*)&SecCode;
memcpy(SecCode, &secCodeRef[START], SECCODELENGTH-1);
SecCode[SECCODELENGTH-1] = '\0';
printf("%d\n", getStringLength(SecuCode));
return SecuCode;
}
It extracts SecuCode ok when this line runs:
printf("%d\n", getStringLength(SecuCode));
The result is: (I'm reading from a file btw with different data in it)
12
TEST:GB0007980592
12
TEST:GB0007980593
12
TEST:GB0007980594
Which is correct
But when I comment out:
//printf("%d\n", getStringLength(SecuCode));
The output is:
TEST:
TEST:
TEST:
Why does the print statement affect the return value at all?
char SecCode[SECCODELENGTH];
char *SecuCode = (char*)&SecCode;
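SecCode is an array local to the function getSecCode(), and you return the address of this local array. Its lifetime ends when the function returns, so using the returned pointer leads to undefined behavior. The printf call most likely only changes what happens to be left in that stack memory by the time the caller reads it; the code is incorrect with or without it.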

Strtok + If Statement

I'm new here and I'm having a headache with my program. I need to read a line of input from the keyboard and split it using strtok, but the tokens have to be separated in 4 different cases, and in each case I need to print the result and save it to a string, like this:
input String : Label Instruction #50,Y; Label <with>
and the output should look like this:
Label: Label
Instruction: Instruction
Character [1]: #50
Character [2]: Y
Comentaries: Label <with>
It also has to be able to recognize when part of the instruction is missing, like this:
Input String: adda
Output String
Label: -----
Instruction: adda
Character 1: -----
Comentaries: -----
My code accepts the first, correct instruction, but when I type an incomplete one like the second input, it ignores that and continues as in the first attempt, sometimes just adding to it. I have tried using if to separate each token by its delimiter, but every time I compile it, it ignores the if statement no matter what condition I give it. I don't know what else to do.
Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <ctype.h>
int main() { char word[256];
fgets(word,256,stdin);
char *token;
while (token != NULL){
char delimiter[]="\n , ;";
token=strtok (word,delimiter);
//if(token != "\n") //{
//char delimiter[]="\n , ;";
//token=strtok (word,delimiter);
//if (delimiter != " "||"\t" || "_")
printf("Label \"%s\"\n", token);
token = strtok (NULL, "\n , ;"); //(NULL, "_,.-")//}
//token=strtok (word,delimiter); //}
//printf("Label ----------\n");
if (delimiter != "\n"||"\t")//{
printf("Instruction \"%s\"\n", token);
token = strtok (NULL, "\n , ;"); //(NULL, "_,.-")//}
printf("Character \"%s\"\n", token);
token = strtok (NULL, "\n , ;"); //(NULL, "_,.-")
printf("Character 2 \"%s\"\n", token);
token = strtok (NULL, "\n , ;"); //(NULL, "_,.-")
printf("Comentaries \"%s\"\n",token);
token = strtok (NULL, ";");
// printf("Character 2\"%s\"\n", token);
// token = strtok (NULL, "\n , ;"); //(NULL, "_,.-")
token = NULL;}
//token = NULL;
//printf("Comentaries \"%s\"\n", token);
//token = NULL;
return(0);
}
The // comments are all my failed attempts to make it work =(
Can someone help me, please?
char *token;
while (token != NULL){
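Where is token initialized? It is read in the while condition before anything has been assigned to it, which is undefined behavior.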
Then:
if (delimiter != "\n"||"\t")
You are only comparing pointers in the controlling expression of the if, and "\t" on its own is a non-null pointer, so the condition is always true. Use the strcmp function to compare strings.
I can see various problems with your code, as listed below:
1. Your while loop tests token, which is not initialized. Better use a do/while.
2. delimiter is a string and cannot be compared using the != operator. Use strcmp/strncmp.
3. What is the point of checking delimiter in every iteration when it is assigned the same value every time? I am not sure what you are trying to achieve by doing that. AFAIK, the value of delimiter is not changed by a call to strtok.
4. token should pass through a NULL check after every call to strtok, before it is used, as the user is allowed to enter a "wrong" string.
5. Nothing wrong with using fgets to read the input from stdin; it is in fact the safer choice here, since scanf with %s would stop at the first whitespace and would not read the whole line.
Hope above solves your issues.

Creating a terminal menu with a challenge

What I want to do is create a terminal menu that takes various types of arguments and places them in an array, param. The code is below. Here are the problems I can't find a good solution for:
If I just type 'list' I get "Not a valid command"; I have to type “list “ (list and a space).
The menu choice new should work like this: new “My name is hello” gives param[0] = new and param[1] = My name is hello (so I can create a message with spaces).
How can I accomplish this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
int menu()
{
printf(">");
char line[LINE_MAX];
int i = 0;
char *param[4];
while(fgets(line, LINE_MAX, stdin) != NULL) {
param[i++] = strtok(line, " \n");
if(param[0] != NULL) {
char *argument;
while((argument = strtok(NULL, "\n")) != NULL) {
param[i++] = argument;
}
}
if(strcmp(param[0], "new") == 0) {
//new(param[1]);
menu();
} else if(strcmp(param[0], "list") == 0) {
//list();
menu();
} else {
printf("Not a valid command.\n\n");
menu();
}
}
return 0;
}
You're delimiting on " ".
fgets reads the ENTER.
So, when you type "listENTER" and tokenise at spaces you get one token, namely "listENTER". Later you compare with "list" and, of course, it doesn't match.
Try
strtok(line, " \n"); /* maybe include tabs too? */
PS. Why are you calling menu recursively? You already have a while in the function ...
Your problem is that param[i++] = strtok(line, " "); will only split on space, not on \n (newline). Try adding it to your set of delimiters.
Oh, and congratulations for some decent looking code that's clean and well formatted. A pleasant change.
I'm not sure if this causes your problem but these lines
/*new(param[1]);
/*list();
Start a comment that is never terminated.
If you want one line comments you can use:
// comment
(at least in C++ and from C99 on)
But comments starting with /* must be ended with */ and cannot be nested:
/* comment */
/* also multi line
allowed */
Since you start a comment inside a comment, your compiler should have emitted a warning; actually, this shouldn't compile at all.
The reason you need to type "list " is that your first strtok tokenizes until a space character, so you need to enter one in this case. Try allowing both '\n' and space as separators, i.e. replace the second parameter of strtok with " \n".
As for quotes, you need to re-combine parameters starting from the one beginning with a quote to the one ending with one by replacing the characters in between them with spaces. Or do away with strtok and parse by manually iterating through the characters in line.
