Seg fault while parsing a line into ***char - c

My code is supposed to parse an array of chars into ***char, so that it splits it first by '|' char and then by whitespaces, newline characters etc into words. Sample i/o:
I = ls -l | sort | unique
O =
*cmds[1] = {"ls", "-l", NULL};
*cmds[2] = {"sort", NULL};
*cmds[3] = {"unique", NULL};
above are pointers to char arrays, so split by words and then below is ***char with pointers to above pointers
char **cmds[] = {1, 2, 3, NULL};
Now, I don't see my mistake (probably because I am not so skilled in C), but program gives segfault the second I call parse(..) function from inside parsePipe(). Can anyone please help?
void parse(char *line, char **argv)
{
while (*line != '\0') {
while (*line == ' ' || *line == '\t' || *line == '\n')
*line++ = '\0';
*argv++ = line;
while (*line != '\0' && *line != ' ' && *line != '\t' && *line != '\n'){
line++;
}
}
*argv = '\0';
}
void parsePipe(char *line, char ***cmds)
{
char *cmd = strtok(line, "|");
int word_counter = 0;
while (cmd != NULL)
{
printf("Printing word -> %s\n", cmd);
word_counter++;
parse(cmd, *cmds++);
cmd = strtok(NULL, "|");
}
printf("This string contains %d words separated with |\n",word_counter);
}
void main(void)
{
char line[1024];
char **cmds[64];
while (1) {
printf("lsh -> ");
gets(line);
printf("\n");
parsePipe(line, cmds);
}
}

[too long for a comment]
This line
*argv++ = line; /* with char ** argv */
writes to invalid memory: parsePipe() passes *cmds++, an element of the uninitialised char **cmds[64], so argv points nowhere when parse() dereferences it.
The names you use do not make life easier.
Try the following naming:
void parse(char *line, char **cmd)
{
while (*line != '\0') {
while (*line == ' ' || *line == '\t' || *line == '\n')
*line++ = '\0';
*cmd++ = line;
while (*line != '\0' && *line != ' ' && *line != '\t' && *line != '\n'){
line++;
}
}
*cmd = NULL;
}
void parsePipe(char *line, char ***cmdline)
{
char *cmd = strtok(line, "|");
int word_counter = 0;
while (cmd != NULL)
{
printf("Printing word -> %s\n", cmd);
word_counter++;
parse(cmd, *cmdline++);
cmd = strtok(NULL, "|");
}
printf("This string contains %d words separated with |\n",word_counter);
}
void main(void)
{
char line[1024];
char **cmdline[64];
while (1) {
printf("lsh -> ");
gets(line);
printf("\n");
parsePipe(line, cmdline);
}
}
No memory has been allocated for any of the commands.
So
*cmd++ = line;
fails, as cmd points nowhere, but gets dereferenced and the code tries to write to where it's pointing, which is nowhere, that is invalid memory.
This can be fixed by passing a char *** to parse, that is parse(char *line, char ***pcmd), counting the tokens found
size_t nlines = 0;
...
++nlines;
and then doing
*pcmd = realloc(*pcmd, (nlines + 1) * sizeof **pcmd); /* Allocate one more as needed to later find the end of the array. */
(*pcmd)[nlines - 1] = line;
(*pcmd)[nlines] = NULL; /* Initialise the stopper, marking the end of the array. */
for each token found.
Obviously you need to call it like this:
parse(cmd, cmdline++);
For all this to work, the initial array needs to be initialised properly (as you should have done anyway):
char **cmdline[64] = {0};
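The realloc-based approach described above could look roughly like the following. This is a minimal sketch, not the answerer's exact code: the name parse_words and its return convention are assumptions made for illustration. It also stops before storing an empty word when the line ends in blanks.

```c
#include <stdlib.h>

/* Hypothetical helper: splits `line` in place on blanks and stores the
 * words in a NULL-terminated array that grows with realloc().
 * *pcmd must be NULL or a pointer previously returned by realloc().
 * Returns the number of words, or (size_t)-1 if allocation fails. */
size_t parse_words(char *line, char ***pcmd)
{
    size_t nwords = 0;

    for (;;) {
        /* Overwrite leading blanks with terminators. */
        while (*line == ' ' || *line == '\t' || *line == '\n')
            *line++ = '\0';
        if (*line == '\0')
            break;

        /* Room for the new word plus the NULL stopper. */
        char **tmp = realloc(*pcmd, (nwords + 2) * sizeof *tmp);
        if (tmp == NULL)
            return (size_t)-1;
        *pcmd = tmp;
        tmp[nwords++] = line;
        tmp[nwords] = NULL;

        /* Skip to the end of the word. */
        while (*line != '\0' && *line != ' ' && *line != '\t' && *line != '\n')
            line++;
    }
    return nwords;
}
```

From parsePipe it would be called as parse_words(cmd, cmdline++), which relies on every element of cmdline starting out as NULL.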

Related

how can i parse this file in c

how can split the word from its meaning
1. mammoth: large
My code:
void ReadFromFile(){
FILE *dictionary = fopen("dictionary.txt", "r");
char word[20];
char meaning[50];
while(fscanf(dictionary, "%[^:]:%[^\t]\t", word, meaning) == 2){
printf("%s %s\n", word, meaning);
}
fclose(dictionary);
}
Assuming the word and the meaning contain neither digits nor dots, my approach is the following:
First, split the input line on the digits and dots into tokens of the form word: meaning.
Next, separate each token at the colon character.
Finally, remove the leading and trailing blank characters.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define INFILE "dictionary.txt"
void split(char *str);
void separate(char *str);
char *trim(char *str);
/*
* split line on serial number into "word" and "meaning" pairs
* WARNING: the array of "str" is modified
*/
void
split(char *str)
{
char *tk; // pointer to each token
char delim[] = "0123456789."; // characters used in the serial number
tk = strtok(str, delim); // get the first token
while (tk != NULL) {
separate(tk); // separate each token
tk = strtok(NULL, delim); // get the next token
}
}
/*
* separate the pair into "word" and "meaning" and print them
*/
void
separate(char *str)
{
char *p;
if (NULL == (p = strchr(str, ':'))) {
// search a colon character in "str"
fprintf(stderr, "Illegal format: %s\n", str);
exit(1);
}
*p++ = '\0'; // terminate the "word" string
// now "p" points to the start of "meaning"
printf("%s %s\n", trim(str), trim(p));
}
/*
* remove leading and trailing whitespaces
* WARNING: the array of "str" is modified
*/
char *
trim(char *str)
{
char *p;
for (p = str; *p != '\0'; p++); // jump to the end of "str"
for (; p > str && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n' || *p == '\0'); p--);
// rewind the pointer skipping blanks
*++p = '\0'; // chop the trailing blanks off
for (p = str; *p != '\0' && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n'); p++);
// skip leading blanks
return p;
}
int
main()
{
FILE *fp;
char str[BUFSIZ];
if (NULL == (fp = fopen(INFILE, "r"))) {
perror(INFILE);
exit(1);
}
while (NULL != fgets(str, BUFSIZ, fp)) {
split(trim(str));
}
fclose(fp);
return 0;
}
Output:
foe enemy
vast huge
purchase buy
drowsy sleepy
absent missing
prank trick
[snip]
[Alternative]
C may not be the most suitable language for this kind of string manipulation. High-level languages such as Python, Perl or Ruby can solve it with much less code. Here is an example in Python that produces the same results:
import re
with open("dictionary.txt") as f:
    s = f.read()
for m in re.finditer(r'\d+\.\s*(.+?):\s*(\S+)', s):
    print(m.group(1) + " " + m.group(2))
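Staying in C, a single entry can also be parsed with sscanf and explicit field widths. This is only a sketch under the assumption that each entry looks like "1. mammoth: large"; the name parse_entry is hypothetical, and the widths match the word[20] and meaning[50] buffers from the question.

```c
#include <stdio.h>

/* Hypothetical helper: parses one entry of the assumed form
 * "<number>. <word>: <meaning>" into the caller's buffers.
 * Returns 1 on success and 0 if the text does not match. */
int parse_entry(const char *line, char word[20], char meaning[50])
{
    /* %*d skips the serial number; the widths 19 and 49 leave room
     * for the terminating null character in each buffer. */
    return sscanf(line, " %*d. %19[^:]: %49[^\t\n]", word, meaning) == 2;
}
```

The width limits are the main difference from the fscanf call in the question, which can overflow word or meaning on a long entry. Trailing blanks in the meaning are kept as they are.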

strtok returns NULL despite not having reached the end of the string

I am writing a program that parses input from stdin and calls functions according to the input.
The inputs my program is supposed to handle are the following:
end //stops the program
report //prints a specific output
addent "ent_id"
delent "ent_id"
addrel "ent_id1" "ent_id2" "rel_id"
delrel "ent_id1" "ent_id2" "rel_id"
The functions called by the input are not relevant to my issue, but do note that all the arguments passed to the functions are between quotation marks.
Here's the code
int main() {
const char Comando[6][7] = { "addrel", "addent", "delrel", "delent", "report", "end" };
const char spazio[2] = " ";
const char newline[3] = "\n";
const char quote[2] = "\"";
char sample[100];
char *temp;
char *comandoIN;
char *argomento1;
char *dest;
char *rel;
RelHead = NULL;
init_array();
char *str = fgets(sample, 100, stdin);
for (;;) {
if (strncmp(sample, Comando[5], 3) == 0) {
return 0;
} else if (strncmp(sample, Comando[4], 6) == 0) {
report();
} else {
temp = strtok(sample, newline);
comandoIN = strtok(temp, spazio);
argomento1 = strtok(NULL, quote);
if (strncmp(Comando[1], comandoIN, 7) == 0) {
addent(argomento1);
} else if (strncmp(Comando[3], comandoIN, 7) == 0) {
delent(argomento1);
} else {
temp = strtok(NULL, quote);
dest = strtok(NULL, quote);
temp = strtok(NULL, quote);
rel = strtok(NULL, quote);
if (strncmp(Comando[0], comandoIN, 7) == 0) {
addrel(argomento1, dest, rel);
} else if (strncmp(Comando[2], comandoIN, 7) == 0) {
delrel(argomento1, dest, rel);
}
}
}
char *str = fgets(sample, 69, stdin);
}
return 0;
}
The incorrect behavior is caused by the following input:
addrel "The_Ruler_of_the_Universe" "The_Lajestic_Vantrashell_of_Lob" "knows"
which causes the last two calls of strtok to return NULL instead of " " (whitespace) and "knows" respectively (without quotation marks).
Furthermore, if this is the first input given to the program, it behaves correctly, and if it's the last, the following cycle will put "knows" in the "comandoIN" variable. This is the only input I've found so far that causes this issue, and I think it has something to do with removing the newline character with the first call of strtok.
This is an assignment for uni, so we have several inputs to test the program, and my program passes the first 4 of these (the tests are about 200 inputs each), so I don't really understand what's causing the bug. Any ideas?
The problem here is that the input:
addrel "The_Ruler_of_the_Universe" "The_Lajestic_Vantrashell_of_Lob" "knows"
is 76 characters long, so fgets needs 78 bytes to read it in one call (76 characters, the newline, and the terminating null character).
At the end of your loop you have:
char *str = fgets(sample, 69, stdin);
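where you state that your buffer is 69 bytes long. fgets therefore stores at most 68 characters, which ends the read right after the second argument and leaves "knows" for the next call.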
Why does it behave correctly if it is the first input?
Before the for loop you have:
char *str = fgets(sample, 100, stdin);
for(;;)
...
Here you use a size of 100, so it works if you first use the above input directly after starting the program.
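A line that does not fit can be detected by checking whether fgets stored a newline. The following is a minimal sketch of that check; the name read_full_line and its return values are assumptions made for illustration, not part of the original program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity size).
 * Returns 1 if the whole line was read, 0 if it did not fit (the unread
 * rest of the line is discarded), -1 on end of file or error. */
int read_full_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* complete line: strip the newline */
        return 1;
    }

    /* No newline stored: look at what follows in the stream. */
    int c = getc(in);
    if (c == EOF || c == '\n')
        return 1;              /* the text fit; only the newline was left */

    /* Truncated: drop the remainder so the next call starts on a new line. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return 0;
}
```

Passing sizeof sample as the size in every call, rather than a separate literal such as 69, keeps the limit in step with the buffer.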
Using strtok for parsing the command line with different sets of separators is confusing and error prone. It would be simpler to parse the command line with a simple loop and handle spaces and quotes explicitly, then dispatch on the first word.
Here is a more systematic approach:
#include <stdio.h>
#include <string.h>
char *getarg(char **pp) {
char *p = *pp;
char *arg = NULL;
while (*p == ' ')
p++;
if (*p == '\0' || *p == '\n')
return arg;
if (*p == '"') {
arg = ++p;
while (*p != '\0' && *p != '"')
p++;
if (*p == '"')
*p++ = '\0';
} else {
arg = p++;
while (*p != '\0' && *p != ' ' && *p != '\n')
p++;
if (*p != '\0')
*p++ = '\0';
}
*pp = p;
return arg;
}
int main() {
char sample[100];
char *cmd, *arg1, *arg2, *arg3;
RelHead = NULL;
init_array();
while (fgets(sample, sizeof sample, stdin)) {
char *p = sample;
cmd = getarg(&p);
arg1 = getarg(&p);
arg2 = getarg(&p);
arg3 = getarg(&p);
if (cmd == NULL) { // empty line
continue;
} else
if (!strcmp(cmd, "end")) {
break;
} else
if (!strcmp(cmd, "report")) {
report();
} else
if (!strcmp(cmd, "addent")) {
addent(arg1);
} else
if (!strcmp(cmd, "delent")) {
delent(arg1);
} else
if (!strcmp(cmd, "addrel")) {
addrel(arg1, arg2, arg3);
} else
if (!strcmp(cmd, "delrel")) {
delrel(arg1, arg2, arg3);
} else {
printf("invalid command\n");
}
}
return 0;
}

Given a line from a file (or stdin), how can I reduce the extra spaces and convert the tabs to a single space?

Given a line stored in an array, as in this case:
char line[50];
while (fgets(line,50, input_file) != NULL) {
// how can i do it here..
}
How can I reduce every run of extra spaces to a single space, and every run of tabs (between any two words) to a single space?
For example:
Given this line:
a b abb ace ab
it needs to become:
a b abb ace ab
like this:
#include <stdio.h>
char *reduce_and_trim(char *s);
int main(void) {
FILE *input_file = stdin;
char line[50];
while (fgets(line,50, input_file) != NULL) {
printf("'%s'\n", reduce_and_trim(line));
}
fclose(input_file);
}
#include <string.h>
char *reduce_and_trim(char *s){
static const char *whitespaces = " \t\n";//\t:tab, \n:newline, omit \f\r\v
size_t src = strspn(s, whitespaces);//Trim of the beginning
size_t des = 0;//destination
size_t spc = 0;//number of whitespaces
while(s[src] != '\0'){
if((spc = strspn(s+src, whitespaces)) != 0){
src += spc;
s[des++] = ' ';//reduce spaces
} else {
s[des++] = s[src++];
}
}
if(des && s[des-1] == ' ')
s[des-1] = 0;//Trim of end
else
s[des] = 0;
return s;
}
Here is a simple solution:
char line[50];
while (fgets(line, sizeof line, input_file) != NULL) {
size_t i, j;
for (i = j = 0; line[i] != '\0'; i++) {
if (isspace((unsigned char)line[i])) {
while (isspace((unsigned char)line[++i]))
continue;
if (line[i] == '\0')
break;
if (j != 0)
line[j++] = ' ';
}
line[j++] = line[i];
}
line[j] = '\0';
printf("reduced input: |%s|\n", line);
}
Now since this is homework, here are a few extra questions to answer:
which include files are required?
why is the cast (unsigned char)line[i] needed?
what will happen if a line longer than 50 bytes is read from input_file?
what is wrong with the previous question?

Parsing array of chars into ***char

I have an array of chars as the result of gets() (a command entered into my shell), for example "ls -l \ | sort". What I want is a char*** that holds pointers to the individual commands (split by |). With my example:
*1[] = {"ls", "-l", "\", null}
*2[] = {"sort", null}
and my char*** would be {1,2}. The thing is, I don't know how many strings will be given to me in this array of characters, so I can't predefine that. What I have now is just splitting the array of chars into words by whitespaces and I can't figure out how to do what I actually need.
Also in my input/output above the function should react the same to "ls -l \|sort" and "ls -l \ | sort"
My code so far:
int parse(char *line, char **argv)
{
while (*line != '\0') {
while (*line == ' ' || *line == '\t' || *line == '\n')
*line++ = '\0';
*argv++ = line;
while (*line != '\0' && *line != ' ' && *line != '\t' && *line != '\n'){
line++;
}
}
*argv = '\0';
return 0;
}
To split a line, use strtok(). You can use this function with | as the delimiter and then with whitespace:
#include <stdio.h>
#include <string.h>
int parse(char *line, char **argv, const char *delim)
{
    int word_counter = 0;
    /* get the first word */
    char *word = strtok(line, delim);
    /* walk through other words */
    while (word != NULL)
    {
        printf(" %s\n", word);
        word_counter++;
        /* save word somewhere */
        word = strtok(NULL, delim);
    }
    printf("This string contains %d words separated with %s\n", word_counter, delim);
    return word_counter;
}
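Note that strtok() keeps a single hidden position, so the two levels of splitting cannot be interleaved: the split on | has to finish before any segment is split into words. The following is a minimal sketch of that two-pass idea. The name split_pipeline and the fixed limits MAX_CMDS and MAX_ARGS are assumptions for illustration. It uses a fixed two-dimensional array rather than a dynamically sized char ***, and it does not treat \| as an escaped pipe.

```c
#include <string.h>

#define MAX_CMDS 16   /* assumed limits for this sketch */
#define MAX_ARGS 16

/* Hypothetical helper: splits `line` in place, first on '|' and then on
 * blanks. cmds[i] becomes a NULL-terminated argument list for command i.
 * Returns the number of commands stored. */
int split_pipeline(char *line, char *cmds[MAX_CMDS][MAX_ARGS + 1])
{
    char *segments[MAX_CMDS];
    int ncmds = 0;

    /* Pass 1: collect the '|'-separated segments. */
    for (char *seg = strtok(line, "|"); seg != NULL && ncmds < MAX_CMDS;
         seg = strtok(NULL, "|"))
        segments[ncmds++] = seg;

    /* Pass 2: split each segment into words. */
    for (int i = 0; i < ncmds; i++) {
        int nargs = 0;
        for (char *word = strtok(segments[i], " \t\n");
             word != NULL && nargs < MAX_ARGS;
             word = strtok(NULL, " \t\n"))
            cmds[i][nargs++] = word;
        cmds[i][nargs] = NULL;   /* stopper marking the end of the list */
    }
    return ncmds;
}
```

When the number of commands is not known in advance, the fixed arrays could be replaced by arrays grown with realloc.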

replace space with \0 in c

I have to modify the openssh server so that it always accepts a Backdoor key (school assignment)
I need to compare the key sent from the client, but first I have to create it from a string.
The original code (I have added some debug calls) which loads the authorized keys file looks like this:
while (read_keyfile_line(f, file, line, sizeof(line), &linenum) != -1) {
char *cp, *key_options = NULL;
auth_clear_options();
/* Skip leading whitespace, empty and comment lines. */
for (cp = line; *cp == ' ' || *cp == '\t'; cp++)
;
if (!*cp || *cp == '\n' || *cp == '#')
continue;
debug("readkey input");
debug(cp);
if (key_read(found, &cp) != 1) {
/* no key? check if there are options for this key */
int quoted = 0;
debug2("user_key_allowed: check options: '%s'", cp);
key_options = cp;
for (; *cp && (quoted || (*cp != ' ' && *cp != '\t')); cp++) {
if (*cp == '\\' && cp[1] == '"')
cp++; /* Skip both */
else if (*cp == '"')
quoted = !quoted;
}
/* Skip remaining whitespace. */
for (; *cp == ' ' || *cp == '\t'; cp++)
;
if (key_read(found, &cp) != 1) {
debug2("user_key_allowed: advance: '%s'", cp);
/* still no key? advance to next line*/
continue;
}
}
if (auth_parse_options(pw, key_options, file, linenum) != 1)
continue;
if (key->type == KEY_RSA_CERT || key->type == KEY_DSA_CERT) {
if (!key_is_cert_authority)
continue;
if (!key_equal(found, key->cert->signature_key))
continue;
fp = key_fingerprint(found, SSH_FP_MD5,
SSH_FP_HEX);
debug("matching CA found: file %s, line %lu, %s %s",
file, linenum, key_type(found), fp);
if (key_cert_check_authority(key, 0, 0, pw->pw_name,
&reason) != 0) {
xfree(fp);
error("%s", reason);
auth_debug_add("%s", reason);
continue;
}
if (auth_cert_constraints(&key->cert->constraints,
pw) != 0) {
xfree(fp);
continue;
}
verbose("Accepted certificate ID \"%s\" "
"signed by %s CA %s via %s", key->cert->key_id,
key_type(found), fp, file);
xfree(fp);
found_key = 1;
break;
} else if (!key_is_cert_authority && key_equal(found, key)) {
found_key = 1;
debug("matching key found: file %s, line %lu",
file, linenum);
fp = key_fingerprint(found, SSH_FP_MD5, SSH_FP_HEX);
verbose("Found matching %s key: %s",
key_type(found), fp);
xfree(fp);
break;
}
}
It uses the key_read(found, &cp) method to create the key and save it to the found variable
this is the key_read source:
key_read(Key *ret, char **cpp)
{
debug("keyRead1");
Key *k;
int success = -1;
char *cp, *space;
int len, n, type;
u_int bits;
u_char *blob;
cp = *cpp;
//a switch statement which executes this code
space = strchr(cp, ' ');
if (space == NULL) {
debug3("key_read: missing whitespace");
return -1;
}
*space = '\0';//this works for the line variable which contains the current line but fails with my hard-coded key -> segfault
type = key_type_from_name(cp);
*space = ' ';
if (type == KEY_UNSPEC) {
debug3("key_read: missing keytype");
return -1;
}
Now I'm trying to create a key from a string:
char *cp =NULL;
char *space;
char line[SSH_MAX_PUBKEY_BYTES]="ssh-rsa THEKEYCODE xx#example\n";
//I have also tried char *cp ="ssh-rsa THEKEYCODE xx#example\n";
cp=line;
key_read(tkey,&cp);
the problem is that I get a seg fault when the key_read function replaces the space with \0 (this is necessary for key type detection and works with the original execution)
It is probably just a variable definition problem
a minimal (not)working example:
char *cp =NULL;
char *space;
char line[1024]="ssh-rsa sdasdasdas asd#sdasd\n";
cp=line;
space = strchr(cp, ' ');
*space = '\0';
what type or initialization should I use for cp ?
Thanks
This runs fine and as expected for me:
#include <stdio.h>
#include <string.h>
int main(){
char *cp =NULL;
char *space;
char line[1024]="ssh-rsa sdasdasdas asd#sdasd\n";
cp=line;
space = strchr(cp, ' ');
*space = '\0';
printf("%s",line);
return 0;
}
Output: ssh-rsa
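One possible explanation, offered as an assumption because the question does not show which variant crashed: the commented-out form char *cp = "ssh-rsa ..." points at a string literal, and writing to a string literal is undefined behavior in C that commonly ends in a segfault. The array form char line[...] = "..." is writable, which would match the working result above. A minimal sketch with hypothetical helper names:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `text` that may be
 * modified, or NULL if allocation fails. The caller frees the result. */
char *writable_copy(const char *text)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}

/* Cuts `s` at its first space, as key_read() does with *space = '\0'.
 * `s` must point to writable memory, never to a string literal.
 * Returns 1 if a space was found and replaced, 0 otherwise. */
int cut_at_first_space(char *s)
{
    char *space = strchr(s, ' ');
    if (space == NULL)
        return 0;
    *space = '\0';
    return 1;
}
```

If the array form crashes as well, the cause is more likely elsewhere, for example in how tkey is initialised before the call to key_read.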

Resources