Shell program pipes in C

I am trying to run a small shell program, and the first step in checking that my code works properly is to make sure I get the correct command and parameters:
//Split the command and store each string in parameter[]
cp = (strtok(command, hash)); //Get the initial string (the command)
parameter[0] = (char*) malloc(strlen(cp)+ 1); //Allocate some space to the first element in the array
strncpy(parameter[0], cp, strlen(cp)+ 1);
for(i = 1; i < MAX_ARG; i++)
{
cp = strtok(NULL, hash); //Check for each string in the array
parameter[i] = (char*) malloc(strlen(cp)+ 1);
strncpy(parameter[i], cp, strlen(cp)+ 1); //Store the result string in an indexed off array
if(parameter[i] == NULL)
{
break;
}
if(strcmp(parameter[i], "|") == 0)
{
cp = strtok(NULL, hash);
parameter2[0] = (char*) malloc(strlen(cp)+ 1);
strncpy(parameter2[0], cp, strlen(cp)+ 1);
//Find the second set of commands and parameters
for (j = 1; j < MAX_ARG; j++)
{
cp = strtok(NULL, hash);
if (strlen(cp) == NULL)
{
break;
}
parameter2[j] = (char*) malloc(strlen(cp)+ 1);
strncpy(parameter2[j], cp, strlen(cp)+ 1);
}
break;
}
}
I am having a problem: when I compare cp and NULL, my program crashes. What I want is to exit the loop once the entries for the second set of parameters have finished (which is what I tried doing with the if(strlen(cp) == NULL)).

I may have misunderstood the question, but your program won't ever see the pipe character, |.
The shell processes the entire command line, and your program will only be given its share of the command line, so to speak.
Example:
cat file1 file2 | sed s/frog/bat/
In the above example, cat is invoked with only two arguments, file1, and file2. Also, sed is invoked with only a single argument: s/frog/bat/.

Let's look at your code:
parameter[0] = malloc(255);
Since strtok() carves up the original command array, you don't have to allocate extra space with malloc(); you could simply point the parameter[n] pointers to the relevant sections of the original command string. However, once you move beyond space-separated commands (in a real shell, the | symbol does not have to be surrounded by spaces, but it does in yours), then you will probably need to copy the parts of the command string around, so this is not completely wrong.
You should check for success of memory allocation.
cp = strtok(command, " "); //Get the initial string (the command)
strncpy(parameter[0], cp, 50);
You allocated 255 characters; you copy at most 49. It might be better to wait until you have the parameter isolated, and then duplicate it - allocating just the space that is needed. Note that if the (path leading to the) command name is 50 characters or more, you won't have a null-terminated string - the space allocated by malloc() is not zeroed and strncpy() does not write a trailing zero on an overlong string.
for (i = 1; i < MAX_ARG; i++)
It is not clear that you should have an upper limit on the number of arguments that is as simple as this. There is an upper limit, but it is normally on the total length of all the arguments.
{
parameter[i] = malloc(255);
Similar comments about memory allocation - and checking.
cp = strtok(NULL, " ");
parameter[i] = cp;
Oops! There goes the memory. Sorry about the leak.
if (strcmp(parameter[i], "|") == 0)
I think it might be better to do this comparison before copying... Also, you don't want the pipe in the argument list of either command; it is a notation to the shell, not part of the command's argument lists. You should also ensure that the first command's argument list is terminated with a NULL pointer, especially since i is set just below to MAX_ARG so you won't know how many arguments were specified.
{
i = MAX_ARG;
cp = strtok(NULL, " ");
parameter2[0] = malloc(255);
strncpy(parameter2[0], cp, 50);
This feels odd; you isolate the command and then process its arguments separately. Setting i = MAX_ARG seems funny too since your next action is to break the loop.
break;
}
if(parameter[i] == NULL)
{
break;
}
}
//Find the second set of commands and parameter
//strncpy(parameter2[0], cp, 50);
for (j = 1; j < MAX_ARG; j++)
{
parameter2[j] = malloc(255);
cp = strtok(NULL, " ");
parameter2[j] = cp;
}
You should probably only enter this loop if you found a pipe. Then this code leaks memory like the other one does (so you're consistent - and consistency is important; but so is correctness).
You need to review your code to ensure it handles 'no pipe symbol' properly, and 'pipe but no following command'. At some point, you should consider multi-stage pipelines (three, four, ... commands). Generalizing your code to handle that is possible.
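To make the bookkeeping concrete, here is a rough sketch of that splitting step (not your code: the name split_pipeline and the token handling are purely illustrative, and it assumes space-separated tokens and at most one pipe). It leaves the tokens pointing into the original command buffer and NULL-terminates both lists, which is what execvp() will eventually want:
#include <stdio.h>
#include <string.h>

#define MAX_ARG 32

/* Split command into argv-style lists for the left and right sides of a pipe.
   Tokens point into command; both lists are NULL-terminated.
   Returns 1 if a pipe was seen, 0 otherwise. */
static int split_pipeline(char *command, char *left[], char *right[])
{
    int nl = 0, nr = 0, piped = 0;
    for (char *tok = strtok(command, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n"))
    {
        if (strcmp(tok, "|") == 0)
            piped = 1;                      /* the pipe itself goes in neither list */
        else if (!piped && nl < MAX_ARG - 1)
            left[nl++] = tok;
        else if (piped && nr < MAX_ARG - 1)
            right[nr++] = tok;
    }
    left[nl] = NULL;
    right[nr] = NULL;
    return piped;
}

int main(void)
{
    char cmd[] = "cat file1 file2 | sed s/frog/bat/";
    char *left[MAX_ARG], *right[MAX_ARG];
    int piped = split_pipeline(cmd, left, right);
    for (int i = 0; left[i] != NULL; i++)
        printf("left[%d] = %s\n", i, left[i]);
    if (piped)
        for (int i = 0; right[i] != NULL; i++)
            printf("right[%d] = %s\n", i, right[i]);
    return 0;
}
Whether you then keep the tokens pointing into command or strdup() each one is a separate decision from the parsing itself.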
When writing code for Bash or an equivalent shell, I frequently use notations such as this script, which I used a number of times today.
ct find /vobs/somevob \
-branch 'brtype(dev.branch)' \
-version 'created_since(2011-10-11T00:00-00:00)' \
-print |
grep -v '/0$' |
xargs ct des -fmt '%u %d %Vn %En\n' |
grep '^jleffler ' |
sort -k 4 |
awk '{ printf "%-8s %s %-25s %s\n", $1, $2, $3, $4; }'
It doesn't much matter what it does (but it finds all checkins I made since 11th October on a particular branch in ClearCase); it's the notation that I was using that is important. (Yes, it could probably be optimized - it wasn't worth doing so.) Equally, this is not necessarily what you need to deal with now - but it does give you an inkling of where you need to go.

Related

Why are the values in my dynamic 2D char array being overwritten?

I'm trying to build a process logger in C on Linux, but I'm having trouble getting it right. I'd like it to have 3 columns: USER, PID, COMMAND. I'm using the output of ps aux and trying to dynamically append it to an array. That is, for every line ps aux outputs, I want to add a row to my array.
This is my code. (To keep the output short, I only grep for sublime. But this could be anything.)
#define _BSD_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char** processes = NULL;
char* substr = NULL;
int n_spaces = 0;
int columns = 1;
char line[1024];
FILE *p;
p = popen("ps -eo user,pid,command --sort %cpu | grep sublime", "r");
if(!p)
{
fprintf(stderr, "Error");
exit(1);
}
while(fgets(line, sizeof(line) - 1, p))
{
puts(line);
substr = strtok(line, " ");
while(substr != NULL)
{
processes = realloc(processes, sizeof(char*) * ++n_spaces);
if(processes == NULL)
exit(-1);
processes[n_spaces - 1] = substr;
// first column user, second PID, third all the rest
if(columns < 2)//if user and PID are already in array, don't split anymore
{
substr = strtok(NULL, " ");
columns++;
}
else
{
substr = strtok(NULL, "");
}
}
columns = 1;
}
pclose(p);
for(int i = 0; i < (n_spaces); i++)
printf("processes[%d] = %s\n", i, processes[i]);
free(processes);
return 0;
}
The output of the for loop at the end looks like this.
processes[0] = user
processes[1] = 7194
processes[2] = /opt/sublime_text/plugin_host 27184
processes[3] = user
processes[4] = 7194
processes[5] = /opt/sublime_text/plugin_host 27184
processes[6] = user
processes[7] = 27194
processes[8] = /opt/sublime_text/plugin_host 27184
processes[9] = user
processes[10] = 27194
processes[11] = /opt/sublime_text/plugin_host 27184
But, from the puts(line) I get that the array should actually contain this:
user 5016 sh -c ps -eo user,pid,command --sort %cpu | grep sublime
user 5018 grep sublime
user 27184 /opt/sublime_text/sublime_text
user 27194 /opt/sublime_text/plugin_host 27184
So, apparently all the values are being overwritten and I can't figure out why... (Also, I don't get where the value 7194 in processes[1] = 7194 and processes[4] = 7194 comes from).
What am I doing wrong here? Is it somehow possible to make the output look like the output of puts(line)?
Any help would be appreciated!
The return value of strtok is a pointer into the string that you tokenise. (The token has been made null-terminated by overwriting the first separator after it with '\0'.)
When you read a new line with fgets, you overwrite the contents of this buffer, and all tokens, not just the ones from the last parse, point to the current content of line. (The pointers to the previous tokens remain valid, but the content at those locations changes.)
There are several ways to fix this.
You could make the tokens you save arrays of chars and strcpy the parsed contents.
You could duplicate the parsed tokens with (the non-standard) strdup, which allocates memory for the strings on the heap.
You could read an array of lines, so that the tokens are really unique.
strtok returns a pointer into the string it processes. That string is always stored in the variable line, so all the pointers in your array point into that same buffer.
As #Lundin wrote in a comment, there is no two-dimensional array here, just a one-dimensional array of pointers.
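For example, the strdup() route is a small change to the original loop (a sketch only; error handling kept minimal, and remember to free each processes[i] at the end, before freeing processes itself):
processes = realloc(processes, sizeof(char*) * ++n_spaces);
if(processes == NULL)
    exit(-1);
processes[n_spaces - 1] = strdup(substr); /* copy the token out of line[] */
if(processes[n_spaces - 1] == NULL)
    exit(-1);
With _BSD_SOURCE (or _DEFAULT_SOURCE on newer glibc) defined before the includes, as in your program, strdup() is declared in <string.h>.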

Why do these two methods of counting words differ significantly?

I wrote a program that allows a user to find the number of instances of a word or collection of words in any text file. The user can enter something like this in the command line:
$ ./wordCount Mars TripToMars.txt
to search for the number of instances of the word "Mars" in the book Trip To Mars, or
$ ./wordCount -f collectionOfSearchWords.txt TripToMars.txt
to search for the number of instances of several words on individual lines in collectionOfSearchWords.txt.
To ensure that the program was correct, I used the grep commands:
$ grep -o 'Mars' TripToMars.txt | wc -w
and
$ grep -o -w 'Mars' TripToMars.txt | wc -w
The first command finds the number of instances of the word anywhere, which would include terms like "Marsa", "Marseen", "Marses", etc., while the second command finds only instances of "Mars" as a standalone word, which would include trailing punctuation such as "Mars.", "Mars!", "Mars?", etc.
Both grep commands return 49 as the number of instances of "Mars" in the book.
When I use the code in the while loop below (for simplicity, I'm only including the relevant code), the program returns 49. Awesome!
FILE *textToSearch;
char *readMode = "r";
int count = 0;
char nextWord[100];
int d;
textToSearch = fopen(argVector[argCount-1], readMode);
if (textToSearch == NULL) {
fprintf(stderr, "Cannot open %s to be searched\n", argVector[argCount-1]);
return 1;
} else {
while (fscanf(textToSearch, "%*[^a-zA-Z]"), fscanf(textToSearch, "%80[a-zA-Z]", nextWord) > 0) {
// increment the counter if the word is a match
if (strcmp(nextWord, argVector[word]) == 0) {
count++;
}
}
}
But when I substitute this while loop for the previous one, the program returns 17.
while(1) {
d = fscanf(textToSearch, "%s", nextWord);
if (d == EOF) break;
// increment the counter if the word is a match
if (strcmp(nextWord, argVector[word]) == 0) {
count++;
}
}
So, what's the big difference between
while (fscanf(textToSearch, "%*[^a-zA-Z]"), fscanf(textToSearch, "%80[a-zA-Z]", nextWord) > 0) {}
and
while(1) {
d = fscanf(textToSearch, "%s", nextWord);
if (d == EOF) break;
}
?
EDIT:
I added this code:
if (strcmp(nextWordDict, nextWord) == 0 ||
strcmp(nextWordDict, strcat(nextWord, ".")) == 0 ||
strcmp(nextWordDict, strcat(nextWord, "?")) == 0 ||
strcmp(nextWordDict, strcat(nextWord, "!")) == 0 ||
strcmp(nextWordDict, strcat(nextWord, ",")) == 0) {
count++;
}
to the code producing 17 for Mars to try and account for cases with trailing punctuation, and there was no change. Still 17.
EDIT2:
As John Bollinger correctly points out below, this code does nothing because the string buffered into nextWord would already have the trailing punctuation, and the code would only be adding more. This is faulty thinking on my part.
You are incorrect when you say that the command ...
$ grep -o -w 'Mars' TripToMars.txt | wc -w
... "finds only instances of 'Mars' as a standalone word", or at least that statement is misleading in context. The command finds instances of "Mars" that are not part of a larger word, where a "word" is defined as a contiguous string of letters, digits, and/or underscores. In particular, it will match "Mars" where it is followed by a punctuation mark, which conflicts with what you seem to be claiming.
But what is the difference between your two scanning approaches? Well, this ...
while (fscanf(textToSearch, "%*[^a-zA-Z]"),
fscanf(textToSearch, "%80[a-zA-Z]", nextWord) > 0) { /* ... */ }
... scans zero or more characters that are not Latin letters, ignoring whether any are matched and whether any input error occurs, then scans a contiguous sequence of up to 80 Latin letters, recording that sequence in the nextWord buffer.
On the other hand, this ...
while(1) {
d = fscanf(textToSearch, "%s", nextWord);
if (d == EOF) break;
}
... ignores leading whitespace, then scans the next contiguous string of non-whitespace into nextWord.
The two differ significantly in what they do with characters that are neither Latin letters nor whitespace: the former ignores them, whereas the latter includes them in nextWord. When you then compare nextWord with the string "Mars", the latter misses on
Going to Mars.
and
The name "Mars"
and
Is there water on Mars?
because the adjacent punctuation is included in the comparison. Your text is quite likely to have many constructions similar to those, and your grep commands do not demonstrate otherwise.
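If you want to keep the %s-style loop, one option (a rough sketch, assuming plain ASCII text; the helper name strip_nonletters is mine, not something from your program) is to strip everything that is not a letter from each token before the strcmp():
#include <ctype.h>
#include <stddef.h>

/* Copy src into dst keeping only Latin letters; dst must be at least as large as src. */
static void strip_nonletters(char *dst, const char *src)
{
    size_t k = 0;
    for (size_t j = 0; src[j] != '\0'; j++)
        if (isalpha((unsigned char)src[j]))
            dst[k++] = src[j];
    dst[k] = '\0';
}
Declare a char cleaned[100] next to nextWord, call strip_nonletters(cleaned, nextWord) inside the loop, and compare cleaned against argVector[word]; "Mars.", "Mars!" and "Mars?" then count as matches. It is still not identical to the scanset version - a token like "Mars-bound" becomes "Marsbound" rather than being split into two words - but it removes the trailing-punctuation problem.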

Filter text from huge .csv files, in C

I have the raw and unfiltered records in a CSV file (more than 1000000 records), and I am supposed to filter those records against a list of files (each weighing more than 282 MB; approx. more than 2000000 records). I tried using strstr in C. This is my code:
while (!feof(rawfh)) //loop to read records from raw file
{
j=0; //counter
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh)) //read a line from raw file
{
line[j] = c; line[j+1] = '\0'; j++;
}
//function to extract the element in the specified column, in the CSV
extractcol(line, relcolraw, entry);
printf("\nWorking on : %s", entry);
found=0;
//read a set of 4000 bytes; this is the target file
while( fgets(buffer, 4000, dncfh)!=NULL && !found )
{
if( strstr(buffer, entry) !=NULL) //compare it
found++;
}
rewind(dncfh); //put the file pointer back to the start
// if the record was not found in the target list, write it into another file
if(!found)
{
fprintf(out, "%s,\n", entry); printf(" *** written to filtered ***");
}
else
{
found=0; printf(" *** Found ***");
}
//I hope this is the right way to null out a string
entry[0] = '\0'; line[0] ='\0';
//just to display a # on the screen, to let the user know that the program
//is still alive and running.
rawreccntr++;
if(rawreccntr>=10)
{
printf("#"); rawreccntr=0;
}
}
This program takes approximately 7 to 10 seconds, on average, to search for one entry in the target file (282 MB). So, 10*1000000 = 10000000 seconds :( God knows how long that is going to take if I decide to search in 25 files.
I was thinking of writing a program myself, rather than going for spoon-fed solutions (grep, sed, etc.). Oh, and sorry, but I am using Windows 8 (64-bit, 4 GB RAM, AMD Radeon 2-core processor - 1000 MHz). I used DevC++ (gcc) to compile this.
Please enlighten me with your ideas.
Thanks in advance, and sorry if I sound stupid.
Update by Ali, the key information extracted from a comment:
I have a raw CSV file with details of customers' phone numbers and addresses. I have the target file(s) in CSV format: the Do Not Call list. I am supposed to write a program to filter out the phone numbers that are not present in the Do Not Call list. The phone numbers (for both files) are in the 2nd column. I, however, don't know of any other method. I searched for the Boyer-Moore algorithm, but could not implement it in C. Any suggestions on how I should go about searching for records?
EDITED
I would recommend you try the ready-made tools present on any Unix/Linux system, grep and awk. You'll probably find they are just as fast and much more easily maintained. I haven't seen your data format, but you say the phone numbers are in the second column, so you can get the phone numbers on their own like this:
awk '{print $2}' DontCallFile.csv
If your phone numbers are in double quotes, you can remove those like this:
awk '{print $2}' DontCallFile.csv | tr -d '"'
Then you can use fgrep with the -f option, to search whether strings listed in one file are present in a second file, like this:
fgrep -f file1.csv file2.csv
or you can invert the search and search for strings NOT present in another file, by adding the -v switch to fgrep.
So, your final command would probably end up like this:
fgrep -v -f <(awk '{print $2}' DontCallFile.csv | tr -d '"') file2.csv
That says... search, in file2.csv for all strings not present (-v option) in column 2 of file "DontCallFile.csv". If you want to understand the bit in <() it is called process substitution and it basically makes a pseudo-file out of the result of running the command inside the brackets. And we need a pseudo-file because fgrep -f expects a file.
ORIGINAL ANSWER
Why are you using fgetc() anyway? Surely you would use getline(), like this:
char *line = NULL;
size_t len = 0;
while (getline(&line, &len, myfile) != -1)
{
...
}
Are you really reading the whole "target" file from the start for every single line in your main file? That will kill you! And why are you doing it in chunks of 4,000 bytes? And what if one of your strings straddles the 4,000 bytes you compare it with - i.e. the first 8 bytes are in one 4k chunk and the last however many bytes are in the next 4k chunk?
I think you will get better help on here if you take the time to explain properly what you are trying to do - and maybe do it with awk or grep (at least figuratively) so we can see what you are actually trying to achieve. Your description doesn't mention the "target" file you use in the code, for example.
You can do this with awk, like this:
awk -F, '
FNR==NR {gsub(/"/,"",$2);dcn[$2]++;next}
{gsub(/ /,"",$2);if(!dcn[$2])print}
' DontCallFile.csv x.csv
That says... the field separator is a comma (-F,). Now read the first file (DontCallFile.csv) and process according to the part in curly braces after FNR==NR. Remove the double quotes from around the phone number in field 2, using gsub (global substitution). Then increment the element in the associative array (i.e. hash) as indexed by unquoted field 2 and then move to next record. So basically, after file "DontCallFile.csv" is processed, the array dcn[] will hold a hash of all the numbers not to call (dcn=dontcallnumbers). Then, the code in the second set of curly braces is executed for each line of the second file ("x.csv"). That says... remove all spaces from around the phone number in field 2. Then, if that phone number is not present in the array dcn[] that we built earlier, print the line.
Here is one idea for improvement...
In the code below, what's the point in setting line[j+1] = '\0' at every iteration?
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
{
line[j] = c; line[j+1] = '\0'; j++;
}
You might as well do it outside the loop:
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
line[j++] = c;
line[j] = '\0';
My advice is the following.
Put all don't call phone numbers into an array.
Sort this array.
Use binary search to check if a given phone number is among the sorted don't call numbers.
In the code below, I just hard-coded the numbers. In your application, you will have to replace that with the corresponding code.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int compare(const void* a, const void* b) {
return (strcmp(*(char **)a, *(char **)b));
}
int binary_search(const char** first, const char** last, const char* val) {
ptrdiff_t len = last - first;
while (len > 0) {
ptrdiff_t half = len >> 1;
const char** middle = first;
middle += half;
if (compare(&*middle, &val) < 0) {
first = middle;
++first;
len = len - half - 1;
}
else
len = half;
}
return first != last && !compare(&val,&*first);
}
int main(int argc, char** argv) {
size_t i;
/* Read _all_ of your don't call phone numbers into an array. */
/* For the sake of the example, I just hard-coded it. */
char* dont_call[] = { "908-444-555", "800-200-400", "987-654-321" };
/* in your program, change length to the number of dont_call numbers actually read. */
size_t length = sizeof dont_call / sizeof dont_call[0];
qsort(dont_call, length, sizeof(char *), compare);
printf("The don\'t call numbers sorted\n");
for (i=0; i<length; ++i)
printf("%lu %s\n", i, dont_call[i]);
/* For each phone number, check if it is in the sorted dont_call list. */
/* Use binary search to check it. */
char* numbers[] = { "999-000-111", "333-444-555", "987-654-321" };
size_t n = sizeof numbers / sizeof numbers[0];
printf("Now checking if we should call a given number\n");
for (i=0; i<n; ++i) {
int should_call = binary_search((const char **)dont_call, (const char **)dont_call+length, numbers[i]);
char* as_text = should_call ? "no" : "yes";
printf("Should we call %s? %s\n",numbers[i], as_text);
}
return 0;
}
This prints:
The don't call numbers sorted
0 800-200-400
1 908-444-555
2 987-654-321
Now checking if we should call a given number
Should we call 999-000-111? yes
Should we call 333-444-555? yes
Should we call 987-654-321? no
The code is definitely not perfect but it is sufficient to get you started.
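As an aside (not part of the answer above): since compare() already has the signature that qsort() and bsearch() expect, the standard bsearch() from <stdlib.h> could replace the hand-written binary_search(); inside the checking loop it would look roughly like this:
int should_call = bsearch(&numbers[i], dont_call, length,
                          sizeof dont_call[0], compare) != NULL;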
The problem with your algorithm is complexity. Your approach is O(n*m), where n is the number of customers and m is the number of do_not_call records (or the size of the file, in your case). You need to reduce this complexity. (The Boyer-Moore algorithm suggested by Ali would not help there; it improves only the constant, not the asymptotic complexity.) Even binary search, as Ali suggests in his answer, is not the best: it would be O((n+m)*log m). We can do better. Nice solutions are the ones using fgrep and awk suggested by Mark Setchell in his answers. (I would choose the one using fgrep, which I guess should perform better, but it is only a guess.) I can provide a similar solution in Perl which gives more robust CSV parsing and should handle your data sizes with ease on decent HW. This type of solution has complexity O(n+m).
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Text::CSV_XS;
use constant PHN_COL_DNC => 1;
use constant PHN_COL_CUSTOMERS => 1;
die "Usage: $0 dnc_file [customers]" unless #ARGV>0;
my $dncfile = shift #ARGV;
my $csv = Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1, binary=>1});
my %dnc;
open my $dnc, '<', $dncfile;
while(my $row = $csv->getline($dnc)){
$dnc{$row->[PHN_COL_DNC]} = undef;
}
close $dnc;
while(my $row = $csv->getline(*ARGV)){
$csv->print(*STDOUT, $row) unless exists $dnc{$row->[PHN_COL_CUSTOMERS]};
}
If that does not meet your performance expectations you can go down the C road, but I would definitely recommend using good CSV parsing and hashmap libraries. I would try libcsv and khash.h.
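If you do end up going the C route, a rough sketch of the same O(n+m) idea with khash.h (from klib) might look like the following. Everything specific here is a placeholder: the file names, the 4096-byte line buffer, and the second_field() helper, which naively assumes the phone number is the second comma-separated field with no quoting or embedded commas.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "khash.h"

KHASH_SET_INIT_STR(dnc)                 /* a hash set of C strings */

/* Return a malloc'ed copy of the second comma-separated field, or NULL. */
static char *second_field(const char *line)
{
    const char *p = strchr(line, ',');
    if (p == NULL)
        return NULL;
    p++;
    const char *q = strchr(p, ',');
    size_t n = q ? (size_t)(q - p) : strlen(p);
    char *out = malloc(n + 1);
    if (out) {
        memcpy(out, p, n);
        out[n] = '\0';
    }
    return out;
}

int main(void)
{
    char line[4096];
    khash_t(dnc) *set = kh_init(dnc);

    FILE *dncfh = fopen("DontCallFile.csv", "r");        /* placeholder name */
    while (dncfh && fgets(line, sizeof line, dncfh)) {
        line[strcspn(line, "\r\n")] = '\0';
        char *num = second_field(line);
        if (num) {
            int absent;
            kh_put(dnc, set, num, &absent);
            if (!absent)
                free(num);                               /* duplicate entry */
        }
    }
    if (dncfh)
        fclose(dncfh);

    FILE *custfh = fopen("customers.csv", "r");          /* placeholder name */
    while (custfh && fgets(line, sizeof line, custfh)) {
        line[strcspn(line, "\r\n")] = '\0';
        char *num = second_field(line);
        if (num && kh_get(dnc, set, num) == kh_end(set))
            puts(line);                                  /* not on the list: keep it */
        free(num);
    }
    if (custfh)
        fclose(custfh);

    kh_destroy(dnc, set);   /* the set's keys are not freed; fine for a one-shot filter */
    return 0;
}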

Parsing commands shell-like in C

I want to parse user input commands in my C (just C) program. Sample commands:
add node ID
add arc ID from ID to ID
print
exit
and so on. Then I want to do some validation with IDs and forward them to specified functions. Functions and validations are of course ready. It's all about parsing and matching functions...
I've made it with many ifs and strtoks, but I'm sure it's not the best way... Any ideas (libs)?
I think what you want is something like this:
while (1)
{
char *line = malloc(128); // we need to be able to increase the pointer
char *origLine = line;
fgets(line, 128, stdin);
char command[21];
sscanf(line, "%20s ", command);
line = strchr(line, ' ');
printf("The Command is: %s\n", command);
unsigned argumentsCount = 0;
char **arguments = malloc(sizeof(char *));
while (1)
{
char arg[21];
if (line && (sscanf(++line, "%20s", arg) == 1))
{
arguments[argumentsCount] = malloc(sizeof(char) * 21);
strncpy(arguments[argumentsCount], arg, 21);
argumentsCount++;
arguments = realloc(arguments, sizeof(char *) * (argumentsCount + 1));
line = strchr(line, ' ');
}
else {
break;
}
}
for (int i = 0; i < argumentsCount; i++) {
printf("Argument %i is: %s\n", i, arguments[i]);
}
for (int i = 0; i < argumentsCount; i++) {
free(arguments[i]);
}
free(arguments);
free(origLine);
}
You can do what you wish with 'command' and 'arguments' just before you free it all.
It depends on how complicated your command language is. It might be worth going to the trouble of womping up a simple recursive descent parser if you have more than a couple of commands, or if each command can take multiple forms, such as your add command.
I've done a couple of RDPs by hand for some projects in the past. It's a bit of work, but it allows you to handle some fairly complex commands that wouldn't be straightforward to parse otherwise. You could also use a parser generator like lex/yacc or flex/bison, although that may be overkill for what you are doing.
Otherwise, it's basically what you've described; strtok and a bunch of nested if statements.
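For a command set this small, the hand-rolled approach can stay quite readable if you isolate one parse function per line; here is a rough sketch (parse_line, add_node(), add_arc() and valid_id() are placeholders for your existing functions, and it assumes whitespace-separated tokens in a writable buffer such as one filled by fgets()):
#include <stdio.h>
#include <string.h>

/* Placeholders for the real handlers and ID validation. */
static int valid_id(const char *s) { return s != NULL && *s != '\0'; }
static void add_node(const char *id) { printf("node %s\n", id); }
static void add_arc(const char *id, const char *from, const char *to)
{
    printf("arc %s: %s -> %s\n", id, from, to);
}

/* Parse one command line; returns 0 on success, -1 on a syntax error. */
static int parse_line(char *line)
{
    const char *sep = " \t\n";
    char *cmd = strtok(line, sep);
    if (cmd == NULL)
        return 0;                                    /* blank line */
    if (strcmp(cmd, "add") == 0) {
        char *what = strtok(NULL, sep);
        if (what && strcmp(what, "node") == 0) {
            char *id = strtok(NULL, sep);
            if (!valid_id(id))
                return -1;
            add_node(id);
            return 0;
        }
        if (what && strcmp(what, "arc") == 0) {
            char *id   = strtok(NULL, sep);
            char *kw1  = strtok(NULL, sep);          /* expect the word "from" */
            char *from = strtok(NULL, sep);
            char *kw2  = strtok(NULL, sep);          /* expect the word "to" */
            char *to   = strtok(NULL, sep);
            if (!valid_id(id) || !kw1 || strcmp(kw1, "from") != 0 ||
                !valid_id(from) || !kw2 || strcmp(kw2, "to") != 0 || !valid_id(to))
                return -1;
            add_arc(id, from, to);
            return 0;
        }
        return -1;
    }
    if (strcmp(cmd, "print") == 0 || strcmp(cmd, "exit") == 0) {
        printf("%s\n", cmd);                         /* dispatch to your functions instead */
        return 0;
    }
    return -1;
}
Each branch reads exactly the tokens its form requires, which is essentially what a hand-written recursive descent parser would do for a grammar this flat.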
I just wanted to add something to Richard Ross's reply: Check the returned value from malloc and realloc. It may lead to hard-to-find crashes in your program.
All your command-line parameters will be stored in an array of strings called argv.
You can access those values using argv[0], argv[1] ... argv[n].

Way to pass argv[] to CreateProcess()

My C Win32 application should allow passing a full command line for another program to start, e.g.
myapp.exe /foo /bar "C:\Program Files\Some\App.exe" arg1 "arg 2"
myapp.exe may look something like
int main(int argc, char**argv)
{
int i;
for (i=1; i<argc; ++i) {
if (!strcmp(argv[i], "/foo") {
// handle /foo
} else if (!strcmp(argv[i], "/bar")) {
// handle /bar
} else {
// not an option => start of a child command line
break;
}
}
// run the command
STARTUPINFO si;
PROCESS_INFORMATION pi;
// customize the above...
// I want this, but there is no such API! :(
CreateProcessFromArgv(argv+i, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
// use startup info si for some operations on a process
// ...
}
I can think of some workarounds:
use GetCommandLine() and find a substring corresponding to argv[i]
write something similar to ArgvToCommandLine(), also mentioned in another SO question
Both of them are lengthy and re-implement the cumbersome Windows command-line parsing logic, which is already part of CommandLineToArgvW().
Is there a "standard" solution for my problem? A standard (Win32, CRT, etc.) implementation of workarounds counts as a solution.
It's actually easier than you think.
1) There is an API, GetCommandLine() that will return you the whole string
myapp.exe /foo /bar "C:\Program Files\Some\App.exe" arg1 "arg 2"
2) CreateProcess() allows to specify the command line, so using it as
CreateProcess(NULL, "c:\\hello.exe arg1 arg2 etc", ....)
will do exactly what you need.
3) By parsing your command line, you can just find where the exe name starts, and pass that address to CreateProcess(). It can easily be done with
char* cmd_pos = strstr(GetCommandLine(), argv[3]);
and finally: CreateProcess(NULL, strstr(GetCommandLine(), argv[i]), ...);
EDIT: now I see that you've already considered this option. If you're concerned about performance penalties, they are nothing compared to process creation.
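Put together, that idea looks roughly like this (a sketch only, inside the main() from the question with <windows.h> included; it assumes an ANSI build, as in the code above, and that the child command appears verbatim in the raw command line):
STARTUPINFO si = { sizeof(si) };
PROCESS_INFORMATION pi;
char *childCmd = strstr(GetCommandLine(), argv[i]);   /* start of the child's command line */
if (childCmd != NULL &&
    CreateProcess(NULL, childCmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
{
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
}
The obvious weakness is the one already hinted at: if argv[i] contained quotes or escapes that the CRT stripped, strstr() will not find it verbatim, which is where the parsing answers below come in.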
The only standard function which you have not yet included in your question is PathGetArgs, but it does not do very much. The functions PathQuoteSpaces and PathUnquoteSpaces can also be helpful. In my opinion the usage of CommandLineToArgvW in combination with GetCommandLineW is what you really need. The use of UNICODE during the parsing of the command line is, in my opinion, mandatory if you want a general solution.
I solved it as follows: with your Visual Studio install you can find a copy of some of the standard code used to create the C library. In particular, if you look in VC\crt\src\stdargv.c you will find the implementation of the "wparse_cmdline" function, which creates argc and argv from the result of the GetCommandLineW API. I created an augmented version of this code which also created a "cmdv" array of pointers which pointed back into the original string at the place where each argv pointer begins. You can then act on argv arguments as you wish, and when you want to pass the "rest" on to CreateProcess you can just pass in cmdv[i] instead.
This solution has the advantages that is uses the exact same parsing code, still provides argv as usual, and allows you to pass on the original without needing to re-quote or re-escape it.
I have faced the same problem as you. The thing is, we don't need to parse the whole string: if we can separate the result of GetCommandLine() into its parameters, we can put the rest back together afterwards.
According to Microsoft's documentation, you only need to consider backslashes and quotes.
You can find their documentation here.
Then you can call Solve to get the start of the next parameter.
E.g.
"a b c" d e
First Part: "a b c"
Next Parameter Start: d e
I solved the examples in the Microsoft documentation, so don't worry about compatibility.
By calling the Solve function recursively, you can get the whole argv array.
Here's the file test.c
#include <stdio.h>
extern char* Solve(char* p);
void showString(char *str)
{
char *end = Solve(str);
char *p = str;
printf("First Part: ");
while(p < end){
fputc(*p, stdout);
p++;
}
printf("\nNext Parameter Start: %s\n", p + 1);
}
int main(){
char str[] = "\"a b c\" d e";
char str2[] = "a\\\\b d\"e f\"g h";
char str3[] = "a\\\\\\\"b c d";
char str4[] = "a\\\\\\\\\"b c\" d e";
showString(str);
showString(str2);
showString(str3);
showString(str4);
return 0;
}
Running result are:
First Part: "a b c"
Next Parameter Start: d e
First Part: a\\b
Next Parameter Start: d"e f"g h
First Part: a\\\"b
Next Parameter Start: c d
First Part: a\\\\"b c"
Next Parameter Start: d e
Here's all the source code of Solve function, file findarg.c
/**
This is an FSM for quote recognition.
Status will be
1. Quoted. (STATUS_QUOTE)
2. Normal. (STATUS_NORMAL)
3. End. (STATUS_END)
Quoted can be ended with a " or \0
Normal can be ended with a " or space( ) or \0
Slashes
*/
#ifndef TRUE
#define TRUE 1
#endif
#define STATUS_END 0
#define STATUS_NORMAL 1
#define STATUS_QUOTE 2
typedef char * Pointer;
typedef int STATUS;
static void MoveSlashes(Pointer *p){
/*According to Microsoft's note, http://msdn.microsoft.com/en-us/library/17w5ykft.aspx */
/*Backslashes are interpreted literally, unless they immediately precede a double quotation mark.*/
/*Here we skip every backslashes, and those linked with quotes. because we don't need to parse it.*/
while (**p == '\\'){
(*p)++;
//You always need to check the next element
//Skip \" as well.
if (**p == '\\' || **p == '"')
(*p)++;
}
}
/* Quoted can be ended with a " or \0 */
static STATUS SolveQuote(Pointer *p){
while (TRUE){
MoveSlashes(p);
if (**p == 0)
return STATUS_END;
if (**p == '"')
return STATUS_NORMAL;
(*p)++;
}
}
/* Normal can be ended with a " or space( ) or \0 */
static STATUS SolveNormal(Pointer *p){
while (TRUE){
MoveSlashes(p);
if (**p == 0)
return STATUS_END;
if (**p == '"')
return STATUS_QUOTE;
if (**p == ' ')
return STATUS_END;
(*p)++;
}
}
/*
Solve the problem and return the end pointer.
@param p The start pointer
@return The target pointer.
*/
Pointer Solve(Pointer p){
STATUS status = STATUS_NORMAL;
while (status != STATUS_END){
switch (status)
{
case STATUS_NORMAL:
status = SolveNormal(&p); break;
case STATUS_QUOTE:
status = SolveQuote(&p); break;
case STATUS_END:
default:
break;
}
//Move pointer to the next place.
if (status != STATUS_END)
p++;
}
return p;
}
I think it's actually harder than you think for a general case.
See What's up with the strange treatment of quotation marks and backslashes by CommandLineToArgvW.
It's ultimately up to the individual programs how they tokenize the command-line into an argv array (and even CommandLineToArgv in theory (and perhaps in practice, if what one of the comments said is true) could behave differently than the CRT when it initializes argv to main()), so there isn't even a standard set of esoteric rules that you can follow.
But anyway, the short answer is: no, there unfortunately is no easy/standard solution. You'll have to roll your own function to deal with quotes and backslashes and the like.
