I'm writing a program that will have a command prompt where the user can infinitely input command strings and I will process them as needed.
I have a command-line limit of 200 characters, but for now, I am performing a hello world test with a limit of 4 characters per command to see how my system would handle an input overflow. To my absolute surprise and confusion, I'm seeing that even though I am declaring my command[5] input array as to only allocate 5 characters, I am able to write outside those bounds and read command[7] without getting any exception or runtime error. In the example below, I input hello world as a command and reading command[7] returns the letter o which is the correct answer (I was expecting an error after trying to read outside the 5 character bound of my array).
Can someone explain what's going on? How can I make sure that the input gets truncated as I was expecting when the user goes over the buffer size that I've established?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
char command[5]; //commands can't be longer than 4 characters
int c; //int rather than char, so the comparison with EOF below is reliable
while (1)
{
printf("# "); //print command prompt
scanf("%[^\n]4s", command); //read command
while ((c = getchar()) != '\n' && c != EOF)
{
/*discard overflow input*/;
}
printf("received command: %c\n", command[7]); //echo character from command
if (strcmp(command, "exit") == 0)
{
break;
}
memset(command, 0, sizeof(command)); //clean command buffer
}
return 0;
}
Accessing memory outside of what you've allocated is undefined behavior: the C standard doesn't specify what happens. It could read correctly, it could read something else (if something that owns that memory overwrites it), or it could cause a segmentation fault (if you access memory outside of your program's allocated space).
One option would be to use the width modifier on scanf to ensure you receive at most 4 characters:
scanf("%4s", command);
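Since C does no bounds checking on its own, an error for an out-of-range index only happens if the program checks for it. A minimal sketch of such a check follows; checked_get is a name made up for this example, not a standard function.

```c
#include <stddef.h>

/* Hypothetical helper: copies buf[index] into *out and returns 1 when
   index lies inside the array of `size` elements; returns 0 and leaves
   *out untouched when it does not. */
int checked_get(const char *buf, size_t size, size_t index, char *out)
{
    if (index >= size)
        return 0;          /* out of bounds: refuse instead of reading */
    *out = buf[index];
    return 1;
}
```

With char command[5], calling checked_get(command, sizeof command, 7, &ch) returns 0 instead of silently reading whatever happens to sit after the array.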
As multiple people mentioned in the comments and other answers, the C standard establishes that accessing memory outside the allocated bounds results in undefined behavior, but that doesn't mean it will always give an error. I guess that was an interpretation error by me.
Regarding the specific issue I was having, where scanf was reading more than 4 characters, the advice provided by @Weather Vane in the comments worked well. All that was needed was changing my scanf call.
From this: scanf("%[^\n]4s", command); //read command
To this: scanf("%4[^\n]s", command); //read command
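This way, scanf will only write up to 4 characters (plus the terminating null character) into the buffer command, and the contiguous memory will be left untouched. The width has to come directly after the %, before the scanset; in the original format the 4s after %[^\n] was just literal text to match, so the read was unbounded. (The trailing s in the new format is likewise literal text and can be dropped.) Reading command[7] is still undefined behavior, so I would get garbage or possibly an error.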
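Combining the bounded scanset with the loop that discards overflow, the prompt could be reduced to a helper like the one below. This is only a sketch: read_cmd4 is a name invented here, and it takes a FILE * (pass stdin) so that it can also be tried on a file.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in`, keeps at most 4
   characters of it in cmd (5 bytes including the terminator) and throws
   the rest of the line away.
   Returns 1 if a line was read (possibly empty), 0 at end of input. */
int read_cmd4(FILE *in, char cmd[5])
{
    int c;

    cmd[0] = '\0';                        /* an empty line leaves cmd as "" */
    if (fscanf(in, "%4[^\n]", cmd) == EOF)
        return 0;                         /* nothing left to read */

    /* discard overflow input, including the newline itself */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;
    return 1;
}
```

Typing hello world at the prompt then leaves "hell" in the buffer, and the next call starts cleanly on the following line.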
For anyone wondering about the while loop that discards overflow, see the comment section.
Related
I need to create a method that gets commands from users using scanf and runs a function. The command can be as simple as help or list, but it can also be a command that has an argument, like look DIRECTION or take ITEM. What is the best way to go about this? I could just loop through the characters of a single given string and check it manually, but I was wondering if there was a better way of doing this.
scanf("%s %s", command, argument);
This won't work if there's no argument. Is there a way around this?
There is a 'method' that may work. In fact, two come to mind.
Both rely on whitespace characters (in plain English, '\n', ' ' and '\t') separating the arguments, and I assume this is good enough.
1
First, the relatively easy one - using main(int argc,char *argv[]) as most CLI programs do.
Then, run a long string of if()s/else if()s which check whether the input string matches a valid argument, by testing if strcmp(argv[x],expected_command) returns 0.
You may not yet have been taught how to use this, and it may appear scary, but it's quite easy if you are already familiar with string.h, arrays and pointers.
Google searches and YouTube videos may be of help, and it won't take more than 20 or so minutes.
2
Second, if your program has a real CLI 'UI' and runs in a loop instead of terminating once output is generated (unlike, say, cat or ls), then you take input of 'command' strings within the program.
This means you will have to, apart from and before the if-ed strcmp()s, ensure that you take input with scanf() safely, and that you are able to take multiple strings as input, since you talk of sub-arguments like look DIRECTION.
The way I have done this myself (in the past) is as follows:
1. Declare a command string, say char cmd[21] = ""; and (optionally) initialise it to be empty, since reading an uninitialised string is UB (and the user may enter EOF).
2. Declare a function (for convenience) to check scanf(), say like so:
int handle_scanf(int returned,int expected){
    if(returned==expected)
        return 0;
    if(returned==EOF){
        puts("\n Error : Input Terminated Prematurely.");
        /* you may alternatively do perror() but then
           will have to deal with resetting errno=0 and
           including errno.h */
        return -1;
    }
    else{
        puts("\n Error : Insufficient Input.");
        return -2;
    }
}
Which can be used as: if(handle_scanf(scanf(xyz,&xyz),1)==0) {...}
This works because scanf() returns the number of items 'taken' (items that matched the expected format string and were hence saved), and here only 1 item is expected.
3. Declare a function (for convenience) to clear/flush stdin so that if and when unnecessary input is left in the input stream , (which if not dealt with, will be passed to the next place where input is taken) it can be 'eaten'.
I do it like so :
void eat(void)
{
    int c;
    while ((c = getchar()) != '\n' && c != EOF)
        ; /* discard */
}
This essentially clears input until a newline or EOF is read. Since '\n' and EOF mark the end of a line and the end of the file, and terminal input normally reaches the program one line at a time through stdin, it makes sense to stop upon reading them.
EDIT : You may alternatively use a macro, for slightly better performance.
4. Print a prompt and take input, like so :
fputs("\n >>> ",stdout);
int check = handle_scanf(scanf("%20s",cmd),1);
Notice what I did here ?
"%20s" does two things - stops buffer overflow (because more than 20 chars won't be scanned into cmd) and also stops scanning when a whitespace char is encountered. So, your main command must be one-word.
5. Check if the command is valid.
This is to be done with the aforementioned list of checks of whether strcmp(cmd,"expected_cmd")==0, for all possible expected commands.
If there is no match, then in the final else, display an error message and call eat(); (arguments to an invalid command can be ignored), but only if(check != -1).
If check==-1, this may mean that the user has sent EOF to the program, in which case every further read fails immediately, so carrying on with the loop (with or without eat()) results in an infinite loop displaying the error message, something which you don't want. Break out of the loop instead.
6. If there is a match, absorb the whitespace separating char and then scanf() into a char array ( if the user entered, look DIRECTION, DIRECTION is still in the input stream and will only now be saved to said char array ). This can be done like so :
#define SOME_SIZE 100 // use an appropriate size
if(strcmp(cmd,"look")==0 && check==0){ // (do if(check==0) before these ifs; done here just for my convenience)
    getchar(); // absorb whitespace separator
    char strbuff[SOME_SIZE] = ""; // string buffer of appropriate size
    if(handle_scanf(scanf("%99[^\n]",strbuff),1)==0){ // 99 is SOME_SIZE - 1
        eat();
        /* look at DIRECTION :) */
    }
    // handle_scanf() generated appropriate error msg if it doesn't return 0
}
Result
All in all, this code handles scanf mostly safely and can indeed be used in such a way that the user only has to type, say:
$ ./myprogram
>>> look DIRECTION
# output
>>> | #cursor
If it is all done within a big loop inside main() .
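As for the original worry that "%s %s" doesn't work when there is no argument, one alternative is to read the whole line first and then split it with sscanf, whose return value says how many fields were present. What follows is only a sketch under a few assumptions: parse_line is a name made up for this example, the sizes 21 and 100 are borrowed from the steps above, and the 256-byte line buffer is arbitrary (a longer line would be split across calls).

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in` and splits it into a
   one-word command and an optional argument (the rest of the line).
   Returns 2 for command + argument, 1 for a command only,
   0 for a blank line and -1 at end of input. */
int parse_line(FILE *in, char cmd[21], char arg[100])
{
    char line[256];
    int n;

    if (fgets(line, sizeof line, in) == NULL)
        return -1;

    cmd[0] = arg[0] = '\0';
    /* %20s: command word; the space skips the separator;
       %99[^\n]: everything up to the end of the line */
    n = sscanf(line, "%20s %99[^\n]", cmd, arg);
    return n < 0 ? 0 : n;    /* sscanf gives EOF for a blank line */
}
```

Inside the big loop, a return value of 1 for look would mean the user forgot DIRECTION, while help is happy with 1 and simply ignores arg.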
Conclusion
In reality, you may end up needing to use both together if your program is complex enough :)
I hope my slightly delayed answer is of help :)
In case of any inaccuracies , or missing details, please comment and I will get back to you ASAP
Here's a good way to parse an input string using scanf and strtok, with a limit of 99 characters:
#include <stdio.h>
#include <string.h>
char command[100] = ""; //99 characters plus the terminating null character
scanf("%99[^\n]%*c", command); //This gets up to 99 characters of the line, spaces included, then discards one more character (normally the newline)
char *token;
token = strtok(command, " "); //token = the first string separated by a " " (NULL if nothing was entered)
if (token == NULL){
//empty command, nothing to do
}
else if (strcmp(token, "help") == 0){
//do function
}
else if (strcmp(token, "go") == 0){ //if the command has an argument, you have to get the next string
token = strtok(NULL, " "); //this gets the next string separated by a space
if (token != NULL && strcmp(token, "north") == 0){ //token is NULL if the argument is missing
//do function
}
}
You can keep calling token = strtok(NULL, " "); until token == NULL, which signifies the end of the string.
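The "keep calling strtok until it returns NULL" idea can be wrapped in a small function that collects the tokens into an array. This is just a sketch; split_tokens is a name made up here, and it treats tabs and newlines as separators too, which is an assumption beyond the single space used above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces, tabs and
   newlines, stores up to max_tokens pointers into tokens[] and returns
   how many were stored. The pointers point into `line`, so the buffer
   must outlive them. */
size_t split_tokens(char *line, char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *tok = strtok(line, " \t\n");

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " \t\n");   /* NULL continues with the same string */
    }
    return count;
}
```

After the call, tokens[0] is the command and any remaining entries are its arguments, so checking the returned count replaces the individual NULL checks.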
Say I make an input :
"Hello world" // hit a new line
"Goodbye world" // second input
How could I scan through the two lines and store them separately in two different arrays? I believe I need to use getchar until it hits a '\n'. But how do I scan for the second input?
Thanks in advance. I am a beginner in C, so it would be helpful to do it without pointers, as I haven't covered that topic.
Try this code out :
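#include <stdio.h>
#include <stdlib.h> // for exit()
int main(void)
{
    int flx=0,fly=0;
    int a; // int, not char, so that EOF can be detected reliably
    char b[10][100] = {{0}}; // zero-filled, so every stored line is null-terminated
    while(flx<10) // stop once all 10 rows are used
    {
        a=getchar();
        if(a==EOF) exit(0);
        else if(a=='\n')
        {
            flx++;
            fly=0;
        }
        else if(fly<99) // keep room for the terminating null; extra characters are dropped
        {
            b[flx][fly++]=(char)a;
        }
    }
    return 0;
}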
Here I use a two-dimensional array to store the strings. I read the input character by character. First I create a loop which keeps reading characters. If the user enters the end-of-file character, the input stops. If there is a newline character, the flx variable is incremented and the following characters are stored in the next row of the array. You can refer to the stored strings as b[n], where n is the index.
The function that you should probably look at is fgets. At least on my system, the definition is as follows:
char *fgets(char * restrict str, int size, FILE * restrict stream);
So a very simple program to read input from the keyboard would run something like this:
#include <stdio.h>
#include <stdlib.h>
#define MAXSTRINGSIZE 128
int main(void)
{
char array[2][MAXSTRINGSIZE];
int i;
char *result; /* fgets returns a char * */
for (i = 0; i < 2; i++)
{
printf("Input String %d: ", i);
result = fgets(&array[i][0], MAXSTRINGSIZE, stdin);
if (result == NULL) exit(1);
}
printf("String 1: %s\nString 2: %s\n", &array[0][0], &array[1][0]);
exit(0);
}
That compiles and runs correctly on my system. The only issue with fgets, though, is that it retains the newline character \n in the string. So if you don't want that, you will need to remove it. As for the FILE * parameter, stdin is a predefined FILE * that indicates standard input, or file descriptor 0. There are also stdout for standard output (file descriptor 1) and stderr for error messages and diagnostics (file descriptor 2). The file descriptor numbers correspond to the ones used in a shell, like so:
$$$-> cat somefile > someotherfile 2>&1
What that does is take the output of file descriptor 2 and redirect it to 1, with 1 in turn being redirected to a file. In addition, I am using the & operator because we are addressing parts of an array, and the functions in question (fgets, printf) require pointers. As for the result, the man page for gets and fgets states the following:
RETURN VALUES
Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read,
they return NULL and the buffer contents remain unchanged. If an
error occurs, they return NULL and the buffer contents are
indeterminate. The fgets() and gets() functions do not distinguish
between end-of-file and error, and callers must use feof(3) and
ferror(3) to determine which occurred.
So to make your code more robust, if you get a NULL result, you need to check for errors using ferror or for end of file using feof, and respond appropriately. Furthermore, never EVER use gets. It has no way of knowing how large the buffer is, so the only way to use it securely would be to see into the future and know how long the input will be, which clearly nobody can do. It will just open you up to a buffer overflow attack.
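Removing the retained newline and telling end of file apart from an error can be folded into one small wrapper. This is a sketch only; read_line is a name made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets: reads one line into buf and strips
   the trailing newline, if there is one.
   Returns 1 on success, 0 at end of file and -1 on a read error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return ferror(in) ? -1 : 0;

    /* strcspn gives the index of the first '\n', or the string length
       when there is none, so this is safe in both cases */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

For the two-line input in the question, calling read_line(stdin, array[0], MAXSTRINGSIZE) and then read_line(stdin, array[1], MAXSTRINGSIZE) would fill the two rows without the newlines.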
After Mark Lakata pointed out that the garbage isn't properly defined in my question I came up with this. I'll keep this updated to avoid confusions.
I am trying to get a function that I can call before a prompt for user input, like printf("Enter your choice:"); followed by a scanf, and be sure that only the things entered after the prompt would be scanned in by scanf as valid input.
As far as I can understand the function that is needed is something that flushes standard input completely. That is what I want. So for the purpose of this function the "garbage" is everything in user input i.e. the whole user input before that user prompt.
While using scanf() in C there is always the problem of extra input lying in the input buffer. So I was looking for a function that I call after every scanf call to remedy this problem. I used this, this, this and this to get these answers
//First approach
scanf("%*[^\n]\n");
//2nd approach
scanf("%*[^\n]%*c");
//3rd approach
int c;
while((c = getchar()) != EOF)
if (c == '\n')
break;
All three are working as far as I could find by hit-and-trial and going by the references. But before using any of these in all of my codes I wanted to know whether any of these have any bugs?
EDIT:
Thanks to Mark Lakata for one bug in 3rd. I corrected it in the question.
EDIT2:
After Jerry Coffin answered, I tested the first 2 approaches using this program in the Code::Blocks IDE 12.11 using the GNU GCC compiler (version not stated in the compiler settings).
#include<stdio.h>
int main()
{
int x = 3; //Some arbitrary value
//1st one
scanf("%*[^\n]\n");
scanf("%d", &x);
printf("%d\n", x);
x = 3;
//2nd one
scanf("%*[^\n]%*c");
scanf("%d", &x);
printf("%d", x);
}
I used the following 2 inputs
First Test Input (2 Newlines but no spaces in the middle of garbage input)
abhabdjasxd
23
bbhvdahdbkajdnalkalkd
46
For the first I got the following output by the printf statements
23
46
i.e. both codes worked properly.
Second Test input: (2 Newlines with spaces in the middle of garbage input)
hahasjbas asasadlk
23
manbdjas sadjadja a
46
For the second I got the following output by the printf statements
23
3
Hence I found that the second one won't be taking care of extra garbage input whitespaces. Hence, it isn't foolproof against garbage input.
I decided to try out a 3rd test case (garbage includes newline before and after the non-whitespace character)
``
hahasjbas asasadlk
23
manbdjas sadjadja a
46
The answer was
3
3
i.e. both failed in this test case.
The first two are subtly different: they both read and ignore all the characters up to a new-line. Then the first skips all consecutive white space so after it executes, the next character you read will be non-whitespace.
The second reads and ignores characters until it encounters a new-line then reads (and discards) exactly one more character.
The difference will show up if you have (for example) double-spaced text, like:
line 1
line 2
Let's assume you read to somewhere in the middle of line 1. If you then execute the first one, the next character you read in will be the 'l' on line 2. If you execute the second, the next character you read in will be the new-line between line 1 and line 2.
As for the third, if I were going to do this at all, I'd do something like:
int ch;
while ((ch=getchar()) != EOF && ch != '\n')
;
...and yes, this does work correctly -- && forces a sequence point, so its left operand is evaluated first. Then there's a sequence point. Then, if and only if the left operand evaluated to true, it evaluates its right operand.
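To make the differences concrete, here are the three approaches as functions reading from a FILE * (pass stdin for keyboard input). The function names are made up for this sketch. One more case is worth knowing: when the very next character is already a newline, %*[^\n] fails to match anything, so scanf stops right there and neither the \n directive nor the %*c is ever processed. Only the loop consumes that newline.

```c
#include <stdio.h>

/* 1st approach: skip the rest of the line, then ALL following white space. */
int skip_v1(FILE *in)
{
    return fscanf(in, "%*[^\n]\n");
}

/* 2nd approach: skip the rest of the line, then exactly one more character. */
int skip_v2(FILE *in)
{
    return fscanf(in, "%*[^\n]%*c");
}

/* 3rd approach: read and discard up to and including the newline. */
void skip_loop(FILE *in)
{
    int ch;
    while ((ch = getc(in)) != EOF && ch != '\n')
        ;
}
```

That corner case is consistent with the questioner's third test, where the garbage starts with a newline and both scanf forms left it in place.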
As for performance differences: since you're dealing with I/O to start with, there's little reasonable question that all of these will always be I/O bound. Despite its apparent complexity, scanf (and company) are usually code that's been used and carefully optimized over years of use. In this case, the hand-rolled loop may be quite a bit slower (e.g., if the code for getchar doesn't get expanded inline) or it may be about the same speed. The only way it stands any chance of being significantly faster is if the person who wrote your standard library was incompetent.
As for maintainability: IMO, anybody who claims to know C should know the scan set conversion for scanf. This is neither new nor rocket science. Anybody who doesn't know it really isn't a competent C programmer.
The first 2 examples use a feature of scanf that I didn't even know existed, and I'm sure a lot of other people didn't know. Being able to support a feature in the future is important. Even if it was a well known feature, it will be less efficient and harder to read the format string than your 3rd example.
The third example looks fine.
(edit history: I made a mistake saying that ANSI-C did not guarantee left-to-right evaluation of && and proposed a change. However, ANSI-C does guarantee left-to-right evaluation of &&. I'm not sure about K&R C, but I can't find any reference to it and no one uses it anyways...)
Many other solutions have the problem that they cause the program to hang and wait for input when there is nothing left to flush. Waiting for EOF is wrong because you don't get that until the user closes the input completely!
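On Linux, the following will do a non-blocking flush, provided standard input is a socket (on a terminal, pipe or file, recv fails immediately with ENOTSOCK and nothing is flushed):
#include <stdio.h>
#include <sys/socket.h>
// flush any data from the internal buffers
fflush (stdin);
// read any data from the kernel buffers
char buffer[100];
while (recv (0, buffer, sizeof buffer, MSG_DONTWAIT) > 0) // stops on "no data" (-1) and on end of input (0)
{
}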
The Linux man page says that fflush on stdin is non-standard, but "Most other implementations behave the same as Linux."
The MSG_DONTWAIT flag is also non-standard (it causes recv to return immediately if there is no data to be delivered).
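When descriptor 0 is a terminal or a pipe rather than a socket, a different technique can serve the same purpose: poll() with a zero timeout followed by read(). Note that this swaps recv for another POSIX mechanism and is only a sketch; drain_fd is a name made up here. It does not touch anything already sitting in stdio's own buffer, and on a terminal in the usual line mode it only sees lines that have been completed with Enter.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: discards whatever can be read from fd right now,
   without ever blocking. Returns the number of bytes thrown away. */
long drain_fd(int fd)
{
    char buffer[100];
    long total = 0;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    /* A timeout of 0 makes poll() return immediately. */
    while (poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN)) {
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n <= 0)        /* end of file or error: nothing more to drain */
            break;
        total += n;
    }
    return total;
}
```

Calling drain_fd(0) just before printing a prompt would then drop pending input and return at once when there is none.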
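You should use getline/getchar:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */
int main()
{
    ssize_t bytes_read; /* getline (POSIX, not standard C) returns ssize_t */
    size_t nbytes = 100; /* and expects a size_t * for the buffer size */
    char *my_string;
    puts ("Please enter a line of text.");
    /* These 2 lines are the heart of the program. */
    my_string = malloc (nbytes + 1);
    bytes_read = getline (&my_string, &nbytes, stdin); /* grows the buffer if the line is longer */
    if (bytes_read == -1)
    {
        puts ("ERROR!");
    }
    else
    {
        puts ("You typed:");
        puts (my_string);
    }
    free (my_string);
    return 0;
}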
If you look carefully at the right-hand side of this page, you will see many questions similar to yours. You can use fflush(stdin) on Windows, though the C standard leaves fflush on an input stream undefined.
Given the following program:
#include <stdio.h>
int main()
{
char buf[1024];
scanf("%s", buf);
printf("----> %s", buf);
return 0;
}
which is executed as follows:
grep ....| a.out
or
echo ....| a.out
I get a Segmentation fault error. Can anyone explain why?
Whatever you are echoing or grepping must contain more than 1023 characters. (1024 - 1 for the null terminator.)
Instead of using scanf, use fgets and specify a size. Alternatively, use scanf but specify the field width. You can do scanf("%1023s", buf);. If there are more bytes available, you can always do it again to read in the rest.
Given your test input, you should not receive a segfault. I just tried it locally and it worked fine. If you are on Linux, since you wrote a.out instead of ./a.out, depending on how your path is configured you may be running the wrong program (some sort of a.out in your bin folder?)
Don't ever use scanf with unbounded strings. fgets provides a much safer alternative, especially if you provide an intelligent wrapper function like the one in this answer.
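A wrapper of that kind might look roughly like the sketch below. It is not the function from the linked answer; get_line and the LINE_* codes are names made up here. It reads with fgets, strips the newline, and reports (and discards) anything that did not fit.

```c
#include <stdio.h>
#include <string.h>

enum { LINE_OK = 0, LINE_EOF = 1, LINE_TOO_LONG = 2 };

/* Hypothetical fgets wrapper: stores at most size-1 characters of the
   next line in buf, without the newline. Overlong lines are truncated,
   the excess is discarded and LINE_TOO_LONG is returned. */
int get_line(FILE *in, char *buf, int size)
{
    size_t len;
    int c;

    if (fgets(buf, size, in) == NULL)
        return LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* whole line fit, newline included */
        return LINE_OK;
    }

    /* No newline stored: the line either ended exactly here or is too long. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return LINE_OK;

    while ((c = getc(in)) != '\n' && c != EOF)
        ;                             /* throw away the rest of the line */
    return LINE_TOO_LONG;
}
```

For the program in the question, get_line(stdin, buf, sizeof buf) would replace the scanf call, and a LINE_TOO_LONG result can be reported instead of overrunning buf.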
I'm assuming that's just sample code here but, just in case it isn't, you can achieve the same effect with:
WhateverYourCommandIs | sed 's/^/----> /'
without having to write your own tool to do the job. In fact, with sed, awk and the likes, you probably never need to write text processing tools yourself.
from scanf man:
s Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null character ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
Specifying a maximum field width will prevent the buffer on the stack from being overrun:
scanf("%1023s", buf);
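And to ensure printf does not read past the end of the buffer when scanf stores nothing (for example, on empty input), zero the buffer with memset first: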
memset(buf,0,1024);
So the program will be:
#include <stdio.h>
#include <string.h>
int main()
{
char buf[1024];
memset(buf,0,1024);
scanf("%1023s", buf);
printf("----> %s", buf);
return 0;
}
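An alternative to zeroing the buffer is to look at what scanf returns and only print when a word was actually stored. A minimal sketch, with read_word being a name made up for this example:

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most
   1023 characters into buf. Returns 1 if a word was stored, 0 if the
   input ended before any word was found (buf is then left untouched). */
int read_word(FILE *in, char buf[1024])
{
    return fscanf(in, "%1023s", buf) == 1;
}
```

In main this becomes if (read_word(stdin, buf)) printf("----> %s", buf); so an empty pipe prints nothing rather than the contents of an unset buffer.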
Why does the following have the effect it does - it prints a terminal full of random characters and then exits leaving a command prompt that produces garbage when you type in it. (I tried it because I thought it would produce a seg fault).
#include <stdio.h>
int main(){
char* s = "lololololololol";
while(1){
printf("%c", *s);
s++;
}
}
it was compiled with:
gcc -std=c99 hello.c
It will eventually seg fault, but before that it'll print out whatever bytes are in the same page. That's why you see random chars on the screen.
Those may well include escape sequences to change (say) the character encoding of the console. That's why you end up with gibberish when you type on the console after it's exited, too.
Because you have an infinite loop (while(1)), and you keep getting the current value of pointer (*s), and then moving the pointer one char forward (s++). This has the effect of marching well past the end of the string into "garbage" (uninitialized memory), which gets printed to the console as a result.
In addition to what everyone else said in regards to you ignoring the string terminal character and just printing willy-nilly what's in memory past the string, the reason why your command prompt is also "garbage" is that by printing a particular "unprintable" character, your terminal session was left in a strange character mode. (I don't know which character it is or what mode change it does, but maybe someone else can pipe in about it that knows better than I.)
You are just printing out what is in memory because your loop doesn't stop at the end of the string. Each random byte is interpreted as a character. It will seg fault when you reach the end of the memory page (and get into unreadable territory).
Expanding ever so slightly on the answers given here (which are all excellent) ... I ran into this more than once myself when I was just beginning with C, and it's an easy mistake to make.
A quick tweak to your while loop will fix it. Everyone else has given you the why, I'll hook you up with the how:
#include <stdio.h>
int main() {
char *s = "lolololololololol";
while (*s != '\0') {
printf("%c", *s);
s++;
}
}
Note that instead of an infinite loop (while(1)), we're doing a loop check to ensure that the pointer we're pulling isn't the null-terminator for the string, thus avoiding the overrun you're encountering.
If you're stuck absolutely needing while(1) (for example, if this is homework and the instructor wants you to use it), use the break keyword to exit the loop. The following code smells, at least to me, but it works:
#include <stdio.h>
int main() {
char *s = "lolololololololol";
while (1) {
if (*s == '\0')
break;
printf("%c", *s);
s++;
}
}
Both produce the same console output, with no line break at the end:
lolololololololol
Your loop doesn't terminate, so printf prints whatever is in the memory after the text you wrote; eventually it will access memory it is not allowed to read, causing it to segfault.
You can change the loop as the others suggested, or you can take advantage of the fact that in C, zero is false and the null character (which terminates all strings) is also zero, so you can construct the loop as:
while (*s) {
Rather than:
while (*s != '\0')
The first one may be more difficult to understand, but it does have the advantage of brevity so it is often used to save a bit of typing.
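As a self-contained illustration of the shorter form, here is the same loop wrapped in a function. This is just a sketch; put_chars is a name made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the characters of s to out one at a time
   and returns how many were written. The loop ends at the terminating
   null character because its value, 0, is false. */
size_t put_chars(FILE *out, const char *s)
{
    size_t n = 0;

    while (*s) {
        putc(*s, out);
        s++;
        n++;
    }
    return n;
}
```

Calling put_chars(stdout, "lolololololololol") prints the string and stops at its end instead of marching on through memory.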
Also, you can usually get back to your command prompt by using the 'reset' command, typing blindly of course. (type Enter, reset, Enter)