Parsing command-line entries in C: implementing a shell

I'm implementing a shell in C, and I ran into some problems parsing command-line entries. I want my parser method to split command-line entries on the whitespace character and return the result as a double char pointer. I.e., if I have "ls -l >ls.txt", my parser should return a char **r with r[0]="ls", r[1]="-l", and r[2]=">ls.txt".
Here is the code for my current parse method, which is, by the way, segfaulting, and I'm out of ideas as to how to fix that:
char **parser(int *argc, char *s)
{
char **r;
char *t, *m;
int i,n,size;
t = malloc(strlen(s)); // first I used this instead of *r, but I ran
// into trouble when I had more than two
// args. (You see why, right?)
//strcpy(t,s);
i = 0;
size = 5;
r = malloc(size*sizeof(char *));
while (( m = strchr(s, ' '))) {
n = ((int)m) - ((int)s);
if (i==0) {
*r = malloc(n);
} else {
*r = realloc(*r, n);
}
strncpy(*r, s, n);
*r[n]= '\0';
s = (char*)(s+n+1);
if (i == size)
r = realloc(r, (size = 2*size)*sizeof(char*));
i++;
r = (char **)(r + sizeof(char*));
}
s[strlen(s)-1] = '\0';
if ((i<1) || (strlen(s)>1)) {
*r = s;
}
*argc = ++i;
return r;
}
I know my code isn't ideal. It could be made better using strsep, but my main concern is how to manage memory for the double char pointer I want to return.
Thanks for the help!

This is a quick stab.
My C is so rusty, all of the hinges are stuck, so.
The premise is that you will end up with a pointer to an array of pointers. The key detail, though, is that the argument data itself sits at the end of that list of pointers, in the same block. So when you're done, you simply need to free the returned pointer.
Untested. There may well be an off-by-one error sneaking in here.
Edit: I compiled and quickly tested it.
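#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* On entry *argc is the maximum number of arguments to store; on return it
holds the number actually found. The pointer array and the copied argument
text share one block, so a single free() on the result releases everything. */
char **parser(int *argc, const char *s) {
char **r, **rp;
char *t, *p, *w;
void *vp;
size_t l;
l = strlen(s); // size of cmd line
vp = malloc(*argc * sizeof(char *) + l + 1); // pointer array + string + its '\0'
if (vp == NULL)
return NULL;
r = (char **)vp; // start of buffer, start of pointer array to arguments
t = (char *)vp + *argc * sizeof(char *); // offset into buffer for argument copy (no arithmetic on void *)
strcpy(t, s); // copy arguments in to buffer
p = t; // parsing pointer
w = NULL; // word pointer; NULL until we see a non-space, so leading spaces don't create an empty arg
rp = r; // next free slot in the pointer array
while (*p) { // while not at end of string
if (*p == ' ') { // if we find a space
if (w) { // if we have a word pointer assigned
if (rp - r == *argc) // pointer array is full, stop
break;
*rp++ = w; // store the word pointer
w = NULL; // set word pointer to null
*p = '\0'; // terminate argument with a 0
} // else do nothing, continue to skip spaces
} else if (w == NULL) { // if we haven't got a new arg yet
w = p; // set it
} // otherwise, just keep scanning
p++; // move along the string
}
if (w && rp - r < *argc) // clean up at the end if we have an arg
*rp++ = w; // no need to write a 0 at the end, it's already there from strcpy
*argc = (int)(rp - r); // report how many arguments were stored
return r;
}
int main(void) {
const char *cmd = "arg1 arg2";
int argc = 2; // capacity on entry, count on return
char **r = parser(&argc, cmd);
if (r == NULL)
return 1;
for (int i = 0; i < argc; i++)
printf("%s\n", r[i]);
free(r); // one free() releases both the pointers and the strings
return 0;
}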

Related

Is there an easy way to remove specific chars from a char*?

char * deleteChars = "\"\'.“”‘’?:;-,—*($%)! \t\n\x0A\r";
I have this and I'm trying to remove any of these characters from a given char*. I'm not sure how I would go about comparing a char* to it.
For example, if the char* is "hello,", how would I go about removing that comma with my deleteChars?
So far I have
void removeChar(char*p, char*delim){
char*holder = p;
while(*p){
if(!(*p==*delim++)){
*holder++=*p;
p++;
}
}
*holder = '\0';
}
A simple one-by-one approach:
You can use strchr to decide if the character is present in the deletion set. You then copy the character back into the buffer at the next unassigned position, but only if it is not a filtered character.
It might be easier to understand this using two indices, instead of using pointer arithmetic.
#include <stdio.h>
#include <string.h>
void remove_characters(char *from, const char *set)
{
size_t i = 0, j = 0;
while (from[i]) {
if (!strchr(set, from[i]))
from[j++] = from[i];
i++;
}
from[j] = 0;
}
int main(void) {
const char *del = "\"\'.“”‘’?:;-,—*($%)! \t\n\x0A\r";
char buf[] = "hello, world!";
remove_characters(buf, del);
puts(buf);
}
stdout:
hello world
If you have many delimiters/characters to ignore, it's better to use a look-up table, which replaces the scan of the whole set for every character with a single array index.
void remove_chars (char* str, const char* delims)
{
if (!str || !delims) return;
char* ans = str;
int dlt[256] = {0};
while (*delims)
dlt[(unsigned char)*delims++] = 1;
while (*str) {
if (dlt[(unsigned char)*str])
++str; // skip it
else //if (str != ans)
*ans++ = *str++;
}
*ans = '\0';
}
You could use a double loop, but depending on what you want to handle, it might not be ideal. And since the string can only shrink, you don't need to malloc a new one (provided the original is writable, e.g. already malloc'd). I'd initialize a table like this.
#include <string.h>
...
char del[256];
memset(del, 0, sizeof del);
for (int i = 0; deleteChars[i]; i++) del[(unsigned char)deleteChars[i]] = 1; // cast: plain char may be signed, and non-ASCII bytes would give a negative index
Then in a function:
void delChars(const char *del, char *string) {
int i, offset;
for (i = 0, offset = 0; string[i]; i++) {
string[i - offset] = string[i];
if (del[(unsigned char)string[i]]) offset++; // a deleted char gets overwritten by the next copy
}
string[i - offset] = 0;
}
This will not work on string literals (e.g. char *x = "...") though, because you'd end up writing to read-only memory, which is undefined behavior and will probably segfault. You can tweak it if that's your need: allocate a new buffer with char *newString = malloc(strlen(string) + 1);, copy with newString[i - offset] = string[i], and terminate it with newString[i - offset] = 0.
Apply strchr(delim, p[i]) to each element in p[].
Let us take advantage of the fact that strchr(delim, '\0') always returns a non-NULL pointer (to delim's own terminator) to eliminate the separate null-character test on every iteration.
void removeChar(char *p, const char *delim) {
size_t out = 0;
for (size_t in = 0; /* empty */; in++) {
// p[in] in the delim set?
if (strchr(delim, p[in])) {
if (p[in] == '\0') {
break;
}
} else {
p[out++] = p[in];
}
}
p[out] = '\0';
}
Variation on @Oka's good answer.
A better way: return the string with the unwanted characters removed.
#include <string.h>
char * remove_chars(char * str, const char * delim) {
for ( char * p = strpbrk(str, delim); p; p = strpbrk(p, delim) )
memmove(p, p + 1, strlen(p));
return str;
}

string replace using dynamically allocated memory

I am using the below function to replace a sub-string in a given string
void ReplaceSubStr(char **inputString, const char *from, const char *to)
{
char *result = NULL;
int i, cnt = 0;
int tolen = strlen(to);
int fromlen = strlen(from);
if (*inputString == NULL)
return;
// Counting the number of times old word
// occur in the string
for (i = 0; (*inputString)[i] != '\0'; i++)
{
if (strstr((&(*inputString)[i]), from) == &(*inputString)[i])
{
cnt++;
// Jumping to index after the old word.
i += fromlen - 1;
}
}
// Making new string of enough length
result = (char *)malloc(i + cnt * (tolen - fromlen) + 1);
if (result == NULL)
return;
memset(result, 0, i + cnt * (tolen - fromlen) + 1);
i = 0;
while (&(*inputString))
{
// compare the substring with the result
if (strstr(*inputString, from) == *inputString)
{
strncpy(&result[i], to, strlen(to));
i += tolen;
*inputString += fromlen;
}
else
{
result[i++] = (*inputString)[0];
if ((*inputString)[1] == '\0')
break;
*inputString += 1;
}
}
result[i] = '\0';
*inputString = result;
return;
}
The problem with the above function is a memory leak. Whatever memory was allocated for inputString will be lost after this line:
*inputString = result;
Since I am using strstr and advancing the pointer with *inputString += fromlen;, *inputString no longer points to the start of the original block by the time that line runs. So how do I handle the memory leak here?
Note: I don't want to return the newly allocated memory from the function. I need to alter the inputString memory based on the new length.
You should use a local variable to iterate over the input string and avoid modifying *inputString before the final step where you free the previous string and replace it with the newly allocated pointer.
With the current API, ReplaceSubStr must be called with the address of a pointer to a block allocated with malloc() or similar. Passing a pointer to local storage or a string literal will have undefined behavior.
Here are a few ideas for improvement:
You could return the new string and leave it to the caller to free the previous one. In this case, you would take the input string by value instead of by address:
char *ReplaceSubStr(const char *inputString, const char *from, const char *to);
If the from string is empty, you should either insert the to string between each character of the input string or do nothing. As posted, your code has undefined behavior for this border case.
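To check if the from string is present at offset i, use strncmp (or memcmp, once you know at least fromlen characters remain) instead of strstr, which needlessly searches the whole rest of the string.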
If cnt is 0, there is nothing to do.
You should return an error status for the caller to determine if memory could be allocated or not.
There is no need to initialize the result array.
Avoid using strncpy(). This function has counter-intuitive semantics and is very often misused. Read this: https://randomascii.wordpress.com/2013/04/03/stop-using-strncpy-already/
Here is an improved version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int ReplaceSubStr(char **inputString, const char *from, const char *to) {
char *input = *inputString;
char *p, *q, *result;
size_t cnt;
size_t tolen = strlen(to);
size_t fromlen = strlen(from);
if (input == NULL || fromlen == 0)
return 0;
// Counting the number of times old word occurs in the string
for (cnt = 0, p = input; (p = strstr(p, from)) != NULL; cnt++) {
p += fromlen;
}
if (cnt == 0) // no occurrence, nothing to do.
return 0;
// Making new string of enough length
result = (char *)malloc(strlen(input) + cnt * (tolen - fromlen) + 1);
if (result == NULL)
return -1;
for (p = input, q = result;;) {
char *p0 = p;
p = strstr(p, from);
if (p == NULL) {
strcpy(q, p0);
break;
}
memcpy(q, p0, p - p0);
q += p - p0;
memcpy(q, to, tolen);
q += tolen;
p += fromlen;
}
free(*inputString);
*inputString = result;
return 0;
}
int main() {
char *p = strdup("Hello world!");
ReplaceSubStr(&p, "l", "");
printf("%s\n", p); // prints Heo word!
free(p);
return 0;
}
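As a worked check of the allocation size in the improved version, take its main() example: input "Hello world!" (12 characters), from = "l", to = "", so cnt = 3, tolen = 0 and fromlen = 1:

$$
\text{bytes} = \operatorname{strlen}(\text{input}) + \text{cnt}\cdot(\text{tolen} - \text{fromlen}) + 1 = 12 + 3\cdot(0 - 1) + 1 = 10
$$

That is exactly "Heo word!" (9 characters) plus the terminating '\0'. In the C code, tolen - fromlen is computed in size_t, so 0 - 1 wraps to SIZE_MAX; because unsigned arithmetic is modular, the whole expression still evaluates to 10.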
Obviously you cannot free the input, as it could be a literal or some other memory you don't control. That would cripple your function even more than it is now.
You could return the old value of inputString so you'd be able to free it if needed.
char *ReplaceSubStr(char **inputString, const char *from, const char *to)
{
char *old_string = *inputString;
...
return old_string;
}
The caller is responsible for freeing the contents of old_string if needed.
If freeing isn't needed (we have to work around the char ** parameter by assigning a valid writable array to a pointer, so that we can pass this pointer's address):
char input[]="hello world";
char *ptr = input;
ReplaceSubStr(&ptr, "hello", "hi");
// ptr now points to "hi world" in newly allocated memory; input itself is unchanged
free(ptr); // when replaced string isn't needed
If freeing is needed:
char *input = strdup("hello world");
char *old_input = ReplaceSubStr(&input, "hello", "hi");
free(old_input);
or just
free(ReplaceSubStr(&input, "hello", "hi"));
Then always (when the replaced string isn't needed):
free(input);
The only constraint is that you cannot use a constant string literal as input (const char *input = "hello world"), because of the prototype and the possible return of a char * to pass to free.

Issue working with double pointers

I'm new to C and having trouble wrapping my head around double pointers and keep getting segmentation fault errors. I've debugged the program a bit and located where things go wrong, but can't for the life of me figure out why. I'll post my code first:
int main() {
printf("Enter string to be split: \n");
a = readline();
String *st = newString(a);
String **split;
int num;
num = string_split(st, ',', split);
for (i=0; i<num; i++) { print_string(*(split+i)); }
}
readline() produces a pointer to an array of chars (entered by the user) and appends '\0' to it. newString and print_string definitely work. Here's the struct for string:
typedef struct {
char *chars;
int length;
int maxSize;
} String;
And here is the code for string_split which is causing me all this trouble.
int string_split(String *s, char delim, String **arrayOfStructs) {
char *c = getCharacters(s);
int len = length(s);
int begin = 0;
int end;
int arraycount = 0;
String **temp = (String**)malloc(sizeof(String*));
for (end=0; end<len+1; end++) {
if ((*(c+end) == delim || *(c+end) == '\0') && begin != end) {
String *st = substring(s,begin,end-1);
*(temp + arraycount) = st;
begin = end + 1;
arraycount++;
temp = (String**)realloc(temp, 1+arraycount*sizeof(String*));
}
}
arrayOfStructs = temp;
return arraycount;
}
In main, when I get back split, all the String*'s that it points to are gone. When print_string gets an individual String* and tries to grab one of its members, a segmentation fault occurs. I don't understand why, because I feel like I allocate memory every time it is necessary, but I feel like I'm missing something. Also, when debugging, if I step through string_split, temp is produced exactly like I expect, so I think I'm just not malloc'ing somewhere I'm supposed to and it's not a problem with the logic of the function. Here is the code in substring, although I'm pretty sure it works since I've been able to return String* from substring and pass them to print_string just fine.
String *substring(String *s1, int begin, int end) {
String *s = (String*)malloc(sizeof(String));
int length = 0;
s->maxSize = 20;
char *temp = (char*)malloc(20*sizeof(char));
char *arr = s1->chars;
int i;
for (i=begin; i <= end; i++) {
*(temp+length) = *(arr+i);
length++;
if (length == s1->maxSize-1) {
s1->maxSize = s1->maxSize+20;
temp = (char*)realloc(temp, s1->maxSize*sizeof(char));
}
}
*(temp+length) = '\0';
s->length = length;
s->chars = temp;
return s;
}
Any help is greatly appreciated!
You need to pass the argument arrayOfStructs by reference and not by value. As C doesn't actually have proper references, you have to pass a pointer to the variable:
int string_split(String *s, char delim, String ***arrayOfStructs) {
...
*arrayOfStructs = temp;
return arraycount;
}
Call it using the address-of operator &:
num = string_split(st, ',', &split);
As it is now, you pass the argument by value, which means that the variable arrayOfStructs is just a local copy inside the function. Any changes to it are only made to the copy, and are lost once the variable goes out of scope when the function returns.
String **temp = (String**)malloc(sizeof(String*));
*(temp + arraycount) = st;
temp + arraycount only points into memory you own if temp has room for at least arraycount + 1 pointers. The first malloc gives temp room for exactly one String* (8 bytes on a 64-bit machine). The later realloc(temp, 1+arraycount*sizeof(String*)) asks for 1 + arraycount*sizeof(String*) bytes: because * binds tighter than +, that is one extra byte, not one extra pointer. So the next *(temp + arraycount) = st writes past the end of the block. Use (arraycount + 1) * sizeof(String*) instead.

printf overwriting seemingly unrelated data?

EDIT: I should add how I have this all set up. The struct definition and prototypes are in mystring.h. The function definitions are in mystring.c. The main is in mystringtest.c. For mystring.c and mystringtest.c, I have #include "mystring.h" at the top. I'm compiling with gcc -o test.exe mystring.c mystringtest.c. Not sure if any of that matters, but I'm new to C, so I'm just trying to include everything.
I have a good deal of experience with Java but am pretty new to C. I imagine this is related to pointers and memory but I'm totally at a loss here for what's going on. Here's my code:
typedef struct {
char *chars;
int length;
int maxSize;
} String;
int main() {
char *a;
a = readline();
String *s = newString(a);
int b = length(s);
printf("length is %d \n", b);
}
I run the program and enter "hello" (as prompted by readline()). I've stepped through the program, and after length(s), s->chars is still a pointer to the array of chars 'hello'. After the print statement, s->chars becomes a pointer to the array of chars 'length is %d \n'. I'm totally at a loss for what I'm doing wrong. I'm working on a virtual machine, if that matters at all. Any help is greatly appreciated. I'll give the code for newString and length too.
int length(String *s) {
char *temp = s->chars;
char b = *temp;
int count;
if (b == '\0') { count = 0; }
else { count = 1; }
while (b != '\0') {
b = *(temp+count);
count++;
}
return count;
}
String *newString(char *s) {
String st;
st.length = 20;
st.maxSize = MAXCHAR;
char *temp = malloc(20 * sizeof(char));
char b = *s;
int count = 0;
while (b != '\0') {
*(temp + count) = b;
count++;
b = *(s+count);
if (count == st.maxSize) { break; }
if (count == st.length) {
st.length = st.length + 20;
temp = realloc(temp, st.length * sizeof(char));
}
}
st.chars = temp;
return &st;
}
String *newString(char *s) {
String st;
...
return &st;
}
You are returning a pointer to a local variable. After newString returns, the local variable no longer exists, so you have a dangling pointer.
Either allocate st with malloc, or return it by value.
You must null-terminate the string after the while loop; you have not left space for the null terminator. Also, I don't see why you need to realloc.
//using strlen will eliminate the need for realloc, +1 is for the null terminator
int len = strlen(s);
char *temp = malloc((len * sizeof(char)) +1);
//null terminate
*(temp+count) = '\0';
st.chars = temp;

memory leak for simple program, how can I free allocs?

I am learning C and am having a problem figuring out how to free my malloc()'s.
The program runs correctly, but I'm using valgrind and it reports 8 allocs and 5 frees. I need to be able to free 3 more. I commented where I believe I am not freeing, but I am not sure of a solution.
Is there a way I can free those allocs, or do I need to consider rewriting tokenize()?
Here is the code to the whole file.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char *substr(const char *s, int from, int nchars) {
char *result = (char *) malloc((nchars * sizeof(char))+1);
strncpy(result, s+from, nchars);
return result;
}
/**
Extracts white-space separated tokens from s.
@param s A string containing 0 or more tokens.
@param ntokens The number of tokens found in s.
@return A pointer to a list of tokens. The list and tokens must be freed
by the caller.
*/
char **tokenize(const char *s, int *ntokens) {
int fromIndex = 0;
int toIndex = 0;
char **list;
int finalCount = *ntokens;
int count = 0;
list = malloc(*ntokens * sizeof(char*));
while ( count < finalCount) {
char *m = strchr(s,' ');
toIndex = m - s;
if(toIndex >= 0) {
list[count] = substr(s,fromIndex,toIndex); // This substr() gets free'ed from main()
s = substr(s, toIndex+1, strlen(s)); // I believe This is where I am making extra mallocs that are not being freed
count++;
} else {
list[count] = substr(s,fromIndex,strlen(s)); // This substr() gets free'ed from main()
count++;
}
}
return list;
}
int main(int argc, char **argv) {
char **list;
char *string = "terrific radiant humble pig";
int count = 4; // Hard-Coded
list = tokenize(string, &count);
for (int i=0;i<count;i++) {
printf("list[%d] = %s\n", i, list[i]);
}
// Free mallocs()'s
for (int i=0;i<count;i++) {
free(list[i]);
}
// Free List
free(list);
return 0;
}
You don't need to call substr on s every time after getting a token. That is wasteful in terms of both time and space. You can just change the value of s to make it point to the part of the string you need.
//s = substr(s, toIndex+1, strlen(s)); // You don't need to generate a new string
s = s + toIndex + 1; // You can just change the value of s to make it point to the string you need
The problem is exactly where you thought it was!
Luckily, in C it is very easy to move the point at which a string starts: because of pointers, you do not need to call substr again ;-)
// s = substr(s, toIndex+1, strlen(s));
s += toIndex+1;
A simple workaround I can think of is to store the current value of s in another pointer before you overwrite it, and make sure not to free the original value of s that was passed as the parameter to tokenize().
#include <stdbool.h> // for bool
char **tokenize(const char *s, int *ntokens) {
int fromIndex = 0;
int toIndex = 0;
char **list;
int finalCount = *ntokens;
int count = 0;
bool firstTime = true; // Use this to make sure you do not free up the memory for the initial s passed as the function arg
list = malloc(*ntokens * sizeof(char*));
while ( count < finalCount) {
char *m = strchr(s,' ');
toIndex = m - s;
if(toIndex >= 0) {
const char* previous_s = s; // Store the current value of s
list[count] = substr(s,fromIndex,toIndex); // This substr() gets free'ed from main()
s = substr(previous_s, toIndex+1, strlen(previous_s));
if (!firstTime)
{
free((char *)previous_s); // Since we're done with the previous_s, we can free up the memory (cast drops const for free)
}
firstTime = false;
count++;
} else {
list[count] = substr(s,fromIndex,strlen(s)); // This substr() gets free'ed from main()
count++;
}
}
if (!firstTime)
{
free((char *)s); // There could be a block allocated last time which needs to be freed as well
}
return list;
}
