How do I cover unintuitive code blocks?

For some reason, I'm having a hard time trying to cover the block of code below. This code is an excerpt from the UNIX uniq command. I'm trying to write test cases to cover all blocks, but can't seem to reach this block:
if (nfiles == 2)
{
// Generic error routine
}
In context:
int main (int argc, char **argv)
{
int optc = 0;
bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
int nfiles = 0;
char const *file[2];
file[0] = file[1] = "-";
program_name = argv[0];
skip_chars = 0;
skip_fields = 0;
check_chars = SIZE_MAX;
for (;;)
{
/* Parse an operand with leading "+" as a file after "--" was
seen; or if pedantic and a file was seen; or if not
obsolete. */
if (optc == -1 || (posixly_correct && nfiles != 0) || ((optc = getopt_long (argc, argv, "-0123456789Dcdf:is:uw:", longopts, NULL)) == -1))
{
if (optind == argc)
break;
if (nfiles == 2)
{
// Handle errors
}
file[nfiles++] = argv[optind++];
}
else switch (optc)
{
case 1:
{
unsigned long int size;
if (optarg[0] == '+' && posix2_version () < 200112 && xstrtoul (optarg, NULL, 10, &size, "") == LONGINT_OK && size <= SIZE_MAX)
skip_chars = size;
else if (nfiles == 2)
{
// Handle error
}
else
file[nfiles++] = optarg;
}
break;
}
}
}
Any help would be greatly appreciated. Thanks.

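This block is reached when more than two files are supplied on the command line. nfiles becomes 2 once the name of the second file has been stored in file[1]. When the nfiles == 2 check runs for a third operand, the value is already 2, so the error handling executes.
There are two if statements in question. Because the option string begins with "-", GNU getopt_long returns 1 for each non-option argument, so ordinary file operands go through switch case 1 and hit the check there. The check in the first branch is reached only once option parsing has stopped, for example after "--" or when POSIXLY_CORRECT is set and a file has already been seen, so a command line such as uniq -- a b c should reach it.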
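To see why it takes a third operand to trigger the check, here is a simplified model of the operand-collection logic. It is only a sketch: collect_operands is a hypothetical helper, not part of uniq, and it leaves out option parsing entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelling how uniq stores at most two file operands.
   Returns the number of operands stored, or -1 when an extra operand is
   seen (the point where the real code runs its error routine). */
int collect_operands(int argc, char **argv, const char *file[2])
{
    int nfiles = 0;

    assert(argv != NULL && file != NULL);
    file[0] = file[1] = "-";
    for (int i = 1; i < argc; i++) {
        if (nfiles == 2)
            return -1;              /* third operand: the block in question */
        file[nfiles++] = argv[i];
    }
    return nfiles;
}
```

With two operands the check is evaluated twice and is false both times. Only a third operand finds nfiles already equal to 2.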

I just thought I would mention in passing that the automatic generation of test cases to satisfy coverage criteria is transitioning from being a research topic to useful applications. One prototype that I know of is PathCrawler.
The difficulties in the source code that may prevent this or a similar tool from working as hoped are the usual suspects: aliasing and dynamic memory allocation.

Related

How to take whitespace into consideration in C when dealing with command line args

I am trying to handle command line arguments, but the arguments can include whitespace, which I need to take into account.
Here is my code. I am trying to handle the following command line arguments:
If I type "-upper", the text is changed to upper case.
If I type "-lower", the text is changed to lower case.
If I type "-upper -lower", the text is only changed to lower case.
If I type "-lower -upper", the text is only changed to upper case.
A snippet of my program:
int main(int argc, char *argv[])
{
int ch;
int arg;
char buf[100];
int upper = 0;
int lower = 0;
for( arg = 1; arg < argc; arg++ )
{
if( strcmp( argv[arg] , "-upper") == 0 )
{
upper = 1;
}
if( strcmp( argv[arg] , "-lower") == 0)
{
lower = 1;
}
if( strcmp( argv[arg] , "-lower -upper") == 0 )
{
upper = 1;
}
else if( strcmp( argv[arg] , "-upper -lower") == 0)
{
lower = 1;
}
else
{
fprintf(stderr,"Invalid command line option\n" );
return 0;
}
}
while ((ch = getchar()) != EOF) //deal with cmd args
However, strcmp doesn't handle the whitespace. The problem is that the program does not treat "-lower -upper" as "to uppercase". How can I compare command line arguments that include whitespace, so that "-lower -upper" is handled the same as "-upper"?
Thanks for helping!
You can simply clear lower when upper is set and vice versa, i.e. the last argument overrides the earlier ones:
int upper = 0;
int lower = 0;
for( arg = 1; arg < argc; arg++ )
{
printf("parsing %s\n", argv[arg]);
if( strcmp( argv[arg] , "-upper") == 0 )
{
printf("Now in upper mode\n");
lower = 0;
upper = 1;
continue;
}
if( strcmp( argv[arg] , "-lower") == 0)
{
printf("Now in lower mode\n");
upper = 0;
lower = 1;
continue;
}
fprintf(stderr,"Invalid command line option\n" );
return 0;
}
For command line utilities on Unix, it's generally the last flag on the command line that takes precedence over the previous flags. Thus, ls -C -1 will list the files in the current directory in a single column, not in multiple columns, while ls -1 -C has the opposite effect.
You should not need to compare pairs or other combinations of command line flags.
If flag -upper is present, set a variable do_upper=1. If next you find -lower set do_upper=0.
Since -upper and -lower are mutually exclusive, it doesn't make sense to keep track of two variables in your code. If you do, you end up with silly things like
if (do_upper && !do_lower) { ... }
In fact, you could have your code default to one of these things, -lower for example, and only have a single flag, -upper, that changes the default behaviour. I don't know if that makes sense in your particular application though.
Personally, on Unix, I would stick with single-letter flags and use getopt() for command line parsing. It makes my life so much easier than trying to do string manipulation of user-provided data.
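As a rough sketch of that getopt() approach, assuming the flags are renamed to the single letters -u and -l: parse_mode and the text_mode enum are made-up names for illustration, and resetting optind to 1 is just a common way to let the function be called more than once.

```c
#define _POSIX_C_SOURCE 200809L   /* request POSIX declarations such as getopt */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>               /* getopt, optind */

enum text_mode { MODE_NONE, MODE_UPPER, MODE_LOWER, MODE_ERROR };

/* Hypothetical helper: -u selects upper case, -l selects lower case.
   A single variable is overwritten on each flag, so the last one wins. */
enum text_mode parse_mode(int argc, char *argv[])
{
    enum text_mode mode = MODE_NONE;
    int opt;

    assert(argv != NULL);
    optind = 1;                       /* restart scanning at the first argument */
    while ((opt = getopt(argc, argv, "ul")) != -1) {
        switch (opt) {
        case 'u': mode = MODE_UPPER; break;
        case 'l': mode = MODE_LOWER; break;
        default:  return MODE_ERROR;  /* unknown flag reported by getopt */
        }
    }
    return mode;
}
```

Because there is only one mode variable, "-l -u", "-u -l" and the clustered form "-ul" need no special cases: whichever flag comes last decides.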

Prints new line after '\0' character in C

I'm currently doing an assignment where we are to recreate three switches of the cat command, -n/-T/-E. We are to compile and enter in two parameters, the switch and the file name. I store the textfile contents into a buffer.
int main(int argc, char *argv[]){
int index = 0;
int number = 1;
int fd, n, e, t;
n = e = t = 0;
char command[5];
char buffer[BUFFERSIZE];
strcpy(command, argv[1]);
fd = open(argv[2], O_RDONLY);
if( fd == -1)
{
perror(argv[2]);
exit(1);
}
read(fd, buffer,BUFFERSIZE);
if( !strcmp("cat", command)){
printf("%s\n", buffer);
}
else if( !strcmp("-n", command)){
n = 1;
}
else if( !strcmp("-E", command)){
e = 1;
}
else if( !strcmp("-T", command)){
t = 1;
}
else if( !strcmp("-nE", command) || !strcmp("-En", command)){
n = e = 1;
}
else if( !strcmp("-nT", command) || !strcmp("-Tn", command)){
n = t = 1;
}
else if( !strcmp("-ET", command) || !strcmp("-TE", command)){
t = e = 1;
}
else if( !strcmp("-nET", command) || !strcmp("-nTE", command) ||
!strcmp("-TnE", command) || !strcmp("-EnT", command) ||
!strcmp("-ETn", command) || !strcmp("-TEn", command)){
n = e = t = 1;
}
else{
printf("Invalid Switch Entry");
}
if(n){
printf("%d ", number++);
}
while(buffer[index++] != '\0' && ( n || e || t)){
if(buffer[index] == '\n' && e && n){
printf("$\n%d ", number++);
}
else if(buffer[index] == '\n' && e){
printf("$\n");
}
else if(buffer[index] == '\t' && t){
printf("^I");
}
else if(buffer[index] == '\n' && n){
printf("\n%d ", number++);
}
else {
printf("%c", buffer[index]);
}
}
printf("\n");
close(fd);
return 0;
}
Everything works perfectly except when I try to use the -n command. It adds an extra new line. I use a textfile that has
hello
hello
hello world!
instead of
1 hello
2 hello
3 hello world!
it will print out this:
1 hello
2 hello
3 hello world!
4
For some reason it adds the extra line after the world!
Am I missing something simple?
This might not fix your problem, but I don't see any code to put the terminating null character in buffer. Try:
// Reserve one character for the null terminator.
// Named bytes_read so it does not clash with the n flag in your code.
ssize_t bytes_read = read(fd, buffer, BUFFERSIZE-1);
if ( bytes_read == -1 )
{
// Deal with error.
fprintf(stderr, "Unable to read the contents of the file.\n");
exit(1);
}
buffer[bytes_read] = '\0';
The three cat options that you implement have different "modes":
-T replaces a character (no tab is written);
-E prepends a character with additional output (the new-line character is still written);
-n prepends each line with additional output.
You can handle the first two modes directly. The third mode requires information from the character before: A new line starts at the start of the file and after a new-line character has been read. So you need a flag to keep track of that.
(Your code prints a line number after a new-line character is found. That means that you have to treat the first line explicitly and that you get one line number too many at the end. After all, a file with n lines has n new-line characters and you print n + 1 line numbers.)
Other issues:
As R Sahu has pointed out, your input isn't null-terminated. You don't really need a null terminator here: read returns the number of bytes read or an error code. You can use that number as limit for index.
You increment index in the while condition, which means that inside the loop you look at the character after the one you checked, which might well be the null character. You will also miss the first character in the file.
In fact, you don't need a buffer here. When the file is larger than your buffer, you truncate it. You could call read in a loop until you read fewer bytes than BUFFERSIZE, but the simplest way in this case is to read one byte after the other and process it.
You use too many compound conditions. This isn't wrong per se, but it makes for complicated code. Your main loop reads like a big switch when there are in fact only a few special cases to treat.
The way you determine the flags is both too complicated and too restricted. You check all combinations of flags, which is 6 for the case that all flags are given. What if you add another flag? Are you going to write 24 more strcmps? Look for the minus sign as first character and then at the letters one by one, setting flags and printing error messages as you go.
You don't need to copy argv[1] to command; you are only inspecting it. And you are introducing a source of error: if argv[1] is longer than 4 characters, you will get undefined behaviour, very likely a crash.
If you don't give any options, the file name should be argv[1] instead of argv[2].
Putting this (sans the flag parsing) into practice:
FILE *f = fopen(argv[2], "r");
int newline = 1; // marker for line numbers
// Error checking
for (;;)
{
int c = fgetc(f); // read one character
if (c == EOF) break; // terminate loop on end of file
if (newline) {
if (n) printf("%5d ", number++);
newline = 0;
}
if (c == '\n') {
newline = 1;
if (e) putchar('$');
}
if (c == '\t' && t) {
putchar('^');
putchar('I');
} else {
putchar(c);
}
}
fclose(f);
Edit: If you are restricted to using the Unix open, close and read, you can still use the approach above. You need an additional loop that reads blocks of a certain size with read. The read function returns the number of bytes read. If that is less than the number of bytes asked for, stop the loop.
The example below adds one more loop, which makes it possible to concatenate several files.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFFERSIZE 0x400
int main(int argc, char *argv[])
{
int n = 0;
int e = 0;
int t = 0;
int number = 1; /* line numbers start at 1 */
int first = 1;
while (first < argc && *argv[first] == '-') {
char *str = argv[first] + 1;
while (*str) {
switch (*str) {
case 'n': n = 1; break;
case 'E': e = 1; break;
case 'T': t = 1; break;
default: fprintf(stderr, "Unknown switch -%c.\n", *str);
exit(1);
}
str++;
}
first++;
}
while (first < argc) {
int fd = open(argv[first], O_RDONLY);
int newline = 1;
int bytes;
if (fd == -1) {
fprintf(stderr, "Could not open %s.\n", argv[first]);
exit(1);
}
do {
char buffer[BUFFERSIZE];
int i;
bytes = read(fd, buffer,BUFFERSIZE);
for (i = 0; i < bytes; i++) {
int c = buffer[i];
if (newline) {
if (n) printf("%5d ", number++);
newline = 0;
}
if (c == '\n') {
newline = 1;
if (e) putchar('$');
}
if (c == '\t' && t) {
putchar('^');
putchar('I');
} else {
putchar(c);
}
}
} while (bytes == BUFFERSIZE);
close(fd);
first++;
}
return 0;
}

Why doesn't the program enter the if statement when it should?

I'm trying to implement an 'ls' command that lists files and directories. I have set the incoming argument array to the following:
argv[0] = "./a.out"
argv[1] = "-l"
argv[2] = "test.c"
Here is my code (assume that the main function passes argc and argv to the function I_AM_LS):
#include "ls.h"
int I_AM_LS(int argc, char ** argv)
{
// 'INCLUDING_HIDDEN_FILE' indicates program performs ls including hidden files
// 'EXCLUDING_HIDDEN_FILE' indicates program performs ls excluding.
int hidden_flag = EXCLUDING_HIDDEN_FILE;
int detail_flag = SIMPLY; // default option in ls.
// 'IN_DETAIL' indicates program performs ls with additional information.
// 'SIMPLY' indicates program performs ls without.
char option;
int i;
DIR * dp;
while ((option = getopt(argc, argv, "al")) != -1)
{
switch (option)
{
case 'a':
hidden_flag = INCLUDING_HIDDEN_FILE;
break;
case 'l':
detail_flag = IN_DETAIL;
break;
default: /* '?' */
printf("invaild option.\n");
return -1;
}
}
if( argv[optind] != NULL && argv[optind + 1] != NULL) // multiple argument
{
; // I have not finished the corresponding code yet.
}
else
{
if( argv[optind] == NULL) // case 1
I_REALLY_CALL_ls("./", hidden_flag, detail_flag);
else
I_REALLY_CALL_ls(argv[optind], hidden_flag, detail_flag);
}
printf("optind %d %d\n", optind, argv[optind]);
return 0;
}
}
int main(int argc, const char * argv[])
{
I_AM_LS(argc, argv);
return 0;
}
After the initial parsing loop, the program doesn't enter the if statement 'argv[optind] != NULL'. We know that optind is 2 and argv[optind] points to "test.c", not NULL. The same behaviour is seen in debug mode.
Are there any problems with passing argv and argc to the function I_AM_LS? What should I do?
Note : I'm working on Xcode on OS X.
if( argv[optind] == NULL) // case 1
I_REALLY_CALL_ls("./", hidden_flag, detail_flag);
else if( argv[optind] != NULL && argv[optind] != NULL)
{
;
}
The condition in this else if is argv[optind] != NULL, evaluated twice for no good reason (the second test was presumably meant to be argv[optind + 1] != NULL). So whenever the first condition doesn't hold, this one does, you do nothing (;), and
else if( argv[optind] != NULL)
{
// single non-option argument.
I_REALLY_CALL_ls(argv[optind], hidden_flag, detail_flag);
}
is unreachable.
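One way to keep the three cases apart is to test the second index explicitly. The following is a minimal sketch, not the asker's code: classify_operands and the operand_count enum are hypothetical names, and first stands in for optind after getopt() has finished.

```c
#include <assert.h>
#include <stddef.h>

enum operand_count { NO_OPERAND, ONE_OPERAND, MANY_OPERANDS };

/* Hypothetical helper: classify what follows the options in argv.
   Relies on argv being NULL-terminated, as the argv passed to main is. */
enum operand_count classify_operands(char *const argv[], int first)
{
    assert(argv != NULL && first >= 0);
    if (argv[first] == NULL)
        return NO_OPERAND;            /* list the current directory */
    else if (argv[first + 1] != NULL)
        return MANY_OPERANDS;         /* note the + 1 on the second test */
    else
        return ONE_OPERAND;           /* this branch is now reachable */
}
```

With the argument array from the question ("./a.out", "-l", "test.c") and first equal to 2, this returns ONE_OPERAND.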

strcmp failing to compare strings properly

I'm having trouble using strcmp in C.
I'm trying to compare a program's arguments using strcmp but even though the strings are the same it doesn't work. Here is the portion of code.
while(strcmp(argv[i], "-e") != 0)
So for i = 11 if I print the value of argv[i] I get
printf("String %s i %d", argv[i],i);
>> String -e i 11
But the while keeps on going. Any ideas why this is happening?
Code:
while(strcmp(argv[i], "-e") != 0 || i != argc)
{
printf("String %s i %d", argv[i],i);
if(!isdigit((unsigned char)*argv[i]) && strcmp(argv[i], "-t") != 0)
{
archivo = fopen(argv[i] , "r");
TOT_IMG = TOT_IMG + 1;
for(t=0;t<NUM_FUNC_TRAZO;t++)
{
for(d=0;d<NUM_FUNC_DIAMETRICA;d++)
{
for(c=0;c<NUM_FUNC_CIRCO;c++)
{
if (fscanf(archivo, "%s",el) != EOF)
{
par->vector_circo[t][d][c] = strtod(el,NULL);
par->clase = clase;
}
else
{
break;
}
}
}
}
par_temp = par;
par->siguiente = (parametros_lista) malloc(sizeof(parametros_elem));
par = par->siguiente;
par->anterior = par_temp;
}
else
{
if(strcmp(argv[i], "-t") != 0)
{
clase = atoi(argv[i]);
CLASES = CLASES + 1;
}
}
i = i + 1;
}
Let's look at this:
while(strcmp(argv[i], "-e") != 0 || i != argc)
OK, so let's assume strcmp correctly returns 0 when argv[i] is "-e". We'll assume this because it's exceedingly unlikely that there's a bug in your library implementation of strcmp.
What happens if strcmp returns 0? Well, things don't just stop, your code checks whether i != argc is true. Is it? My psychic debugging skills tell me that you should look into that second part of the while.
You may also want to note that it's possible that your code could, potentially, access argv[argc], which is NULL. You may get lucky if strcmp is lenient when the input is NULL, but it's a bug that you should fix.
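A loop that stops at "-e" or at the end of the arguments, whichever comes first, needs && rather than ||, with the bounds check placed first. Here is a small sketch of that condition on its own (find_flag is a hypothetical helper, not part of the code in the question):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the index of the first argument equal to
   flag, or argc if there is none. The bounds check comes first, so
   strcmp never sees argv[argc], which is NULL. */
int find_flag(int argc, char **argv, const char *flag)
{
    int i = 1;

    assert(argv != NULL && flag != NULL);
    while (i < argc && strcmp(argv[i], flag) != 0)
        i++;
    return i;
}
```

With ||, the loop keeps running as long as either test is true, so reaching "-e" is not enough to stop it while i still differs from argc.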
I'd recommend using getopt(3) instead. It is a widely used, POSIX-conforming approach to parameter parsing.
There is also another question about getting the getopt.h interface on Windows: getopt.h: Compiling UNIX C-Code in Windows. Importantly, it has an answer (Xgetopt), so portability should not be an issue.

Exercise 1-24 from K&R - Rudimentary Syntax Checking

The exercise reads "Write a program to check a C program for rudimentary syntax errors like unbalanced parentheses, brackets, and braces. Don't forget about quotes, both single and double, escape sequences, and comments."
I chose to go about solving the problem by putting parentheses, brackets, and braces on a stack and making sure everything was LIFO along with various counters for marking whether we're in a comment, quote, etc.
The issue is that I feel my code, although it works, is poorly structured and not particularly idiomatic. I tried implementing the state variables (the stack, escaped, inString, etc.) within a struct and breaking apart the tests into subroutines. It didn't help much. Is there a way to solve this problem in a cleaner way while still handling escaped characters and the like correctly?
#include <stdio.h>
#include <stdlib.h>
#define INITIALSTACK 8
#define FALSE 0
#define TRUE 1
typedef struct {
int position;
int maxLength;
char* array;
} stack;
int match(char, char);
stack create();
void delete(stack*);
void push(stack*, char);
char pop(stack*);
int main() {
int c; /* int rather than char, so EOF can be told apart from any character */
char out;
stack elemStack = create();
int escaped, inString, inChar, inComment, startComment, i, lineNum;
int returnValue;
escaped = inString = inChar = inComment = startComment = 0;
lineNum = 1;
while ((c = getchar()) != EOF) {
if (c == '\n')
lineNum++;
/* Test if in escaped state or for escape character */
if (escaped) {
escaped = FALSE;
}
else if (c == '\\') {
escaped = TRUE;
}
/* Test if currently in double/single quote or a comment */
else if (inString) {
if (c == '"' && !escaped) {
inString = FALSE;
}
}
else if (inChar) {
if (escaped)
escaped = FALSE;
else if (c == '\'' && !escaped) {
inChar = FALSE;
}
}
else if (inComment) {
if (c == '*')
startComment = TRUE;
else if (c == '/' && startComment)
inComment = FALSE;
else
startComment = FALSE;
}
/* Test if we should be starting a comment, quote, or escaped character */
else if (c == '*' && startComment)
inComment = TRUE;
else if (c == '/')
startComment = TRUE;
else if (c == '"') {
inString = TRUE;
}
else if (c == '\'') {
inChar = TRUE;
}
/* Accept the character and check braces on the stack */
else {
startComment = FALSE;
if (c == '(' || c == '[' || c == '{')
push(&elemStack, c);
else if (c == ')' || c == ']' || c == '}') {
out = pop(&elemStack);
if (out == -1 || !match(out, c)) {
printf("Syntax error on line %d: %c matched with %c\n.", lineNum, out, c);
return -1;
}
}
}
}
if (inString || inChar) {
printf("Syntax error: Quote not terminated by end of file.\n");
returnValue = -1;
}
else if (!elemStack.position) {
printf("Syntax check passed on %d line(s).\n", lineNum);
returnValue = 0;
}
else {
printf("Syntax error: Reached end of file with %d unmatched elements.\n ",
elemStack.position);
for(i = 0; i < elemStack.position; ++i)
printf(" %c", elemStack.array[i]);
printf("\n");
returnValue = -1;
}
delete(&elemStack);
return returnValue;
}
int match(char left, char right) {
return ((left == '{' && right == '}') ||
(left == '(' && right == ')') ||
(left == '[' && right == ']'));
}
stack create() {
stack newStack;
newStack.array = malloc(INITIALSTACK * sizeof(char));
newStack.maxLength = INITIALSTACK;
newStack.position = 0;
return newStack;
}
void delete(stack* stack) {
free(stack -> array);
stack -> array = NULL;
}
void push(stack* stack, char elem) {
if (stack -> position >= stack -> maxLength) {
char* newArray = malloc(2 * (stack -> maxLength) * sizeof(char));
int i;
for (i = 0; i < stack -> maxLength; ++i)
newArray[i] = stack -> array[i];
free(stack -> array);
stack -> array = newArray;
stack -> maxLength *= 2; /* record the new capacity so later pushes grow correctly */
}
stack -> array[stack -> position] = elem;
(stack -> position)++;
}
char pop(stack* stack) {
if (!(stack -> position)) {
printf("Pop attempted on empty stack.\n");
return -1;
}
else {
(stack -> position)--;
return stack -> array[stack -> position];
}
}
Your solution is not that bad. It is very straightforward, which is a good thing. To learn a bit more from this exercise, I would probably implement this as a state machine. You have a few states, such as code, comment and string, and you define transitions between them. This gets much easier because the logic depends on the state, so you don't end up with a blob of code like the one in your main function. After that you can parse your code depending on the state. For example: if you're in the comment state, you ignore everything until you encounter the end-of-comment marker. Then you change the state back to code, and so forth.
In pseudo code it could look like this:
current_state = CODE
while(...) {
switch(current_state) {
case CODE:
if(input == COMMENT_START) {
current_state = COMMENT
break
}
if(input == STRING_START) {
current_state = STRING
break
}
// handle your {, [, ( stuff...
break
case COMMENT:
if(input == COMMENT_END) {
current_state = CODE
break
}
// handle comment.. i.e. ignore everything
break
case STRING:
// ... string stuff like above with state transitions..
break
}
}
Of course this can be done with e.g. yacc. But as I stated in a comment, I wouldn't suggest you use that. Maybe you could do that if you have enough time and want to learn as much as possible, but first I would implement it "the hard way".
I would probably approach this quite differently, by making use of a parser generator like yacc, combined with a lexer generator like lex.
You could start from existing input files for these tools for ANSI C. This lex specification and yacc grammar, for example, can be a starting point. Alternatively, K&R contains a yacc-compatible C grammar too in appendix A, or you could of course work directly with the grammar in the C standard.
For this exercise, you would only use those parts of the grammar that are of interest to you, and ignore the rest. The grammar will ensure that the syntax is correct (all braces matched etc.), and lex/yacc will take care of all the code generation. That leaves you with only having to specify some glue code, which will mostly be error messages in this case.
It will be a complete re-write of your code, but will probably give you a better understanding of the C grammar, and at the very least, you'll have learned to work with the great tools lex/yacc, which never hurts.

Resources