I am working on a project in which a driver program's output is redirected into the standard input of my program. How can I read what the driver feeds into my program and have my program respond accordingly? I was thinking of using scanf; would that work?
Additional info:
In the first line of the input (out of many lines), the driver gives a number ending in a new line character (\n). Depending on that number, my program will parse the rest of the lines in the input and output a response. Each line will be a string of random letters and my program will need to dynamically allocate memory for each string. These strings will be part of a struct in a linked list.
You can treat the input just like regular console input; all the console stdio input calls will work. Just don't try to have a conversation with it, i.e. don't go
Enter foodle count : dd
Invalid number
Enter Foodle count :
because there is nobody on the other end.
It is generally recommended to use a more robust method than scanf, as it is full of edge cases, and offers little in the way of recovering from bad input.
A beginners' guide away from scanf() is a decent read, although I would offer stricter advice: forget scanf exists.
Given the requirement of
Each line will be a string of random letters and my program will need to dynamically allocate memory for each string. These strings will be part of a struct in a linked list.
POSIX getline (or getdelim) is a solid choice, if available. This function handles reading input from a file, and will dynamically allocate memory if requested.
Here is an example skeleton program, with functionality vaguely similar to what you have described.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
char *input(FILE *file) {
char *line = NULL;
size_t n = 0;
if (-1 == getline(&line, &n, file)) {
free(line); /* getline may have allocated a buffer even when it fails */
return NULL;
}
return line;
}
size_t parse_header(char *header) {
return 42;
}
void use_data(char *data) {
free(data);
}
int main(void) {
char *header = input(stdin);
char *data;
if (!header)
return EXIT_FAILURE;
size_t lines_expected = parse_header(header);
size_t lines_read = 0;
free(header);
while ((data = input(stdin))) {
lines_read++;
use_data(data);
}
if (lines_read != lines_expected)
fprintf(stderr, "Mismatched lines.\n");
}
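The use_data() stub above simply frees each line. Since the question wants the strings stored in a linked list, here is a minimal sketch of what that could look like. The names struct node, list_append and list_free are made up for illustration and are not part of the skeleton; the list takes ownership of each malloc'd line.

```c
#include <stdlib.h>

/* Hypothetical list node: owns one malloc'd line. */
struct node {
    char *text;
    struct node *next;
};

/* Append `text` to the end of the list, updating *head and *tail.
 * Returns 0 on success, -1 if the node could not be allocated
 * (in that case `text` is left untouched for the caller to free). */
int list_append(struct node **head, struct node **tail, char *text) {
    struct node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->text = text;
    n->next = NULL;
    if (*tail)
        (*tail)->next = n;   /* link after the current last node */
    else
        *head = n;           /* first node in the list */
    *tail = n;
    return 0;
}

/* Free every node together with the line it owns. */
void list_free(struct node *head) {
    while (head) {
        struct node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Keeping a tail pointer makes each append constant-time and preserves the order in which the driver sent the lines.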
If POSIX is not available on your system, or you just want to explore alternatives, one of the better methods the standard library offers for reading input is fgets. Combined with malloc, strlen, and strcpy, you will have roughly implemented getline, but do note the caveats in the fgets manual.
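As a rough illustration of the fgets-based approach, the sketch below grows a buffer with realloc until a full line has been read. The name read_line and the starting capacity of 64 are arbitrary choices, and this is only one possible way to do it. It inherits the fgets caveat about embedded null bytes, because it uses strlen to find out how much was read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one full line from `file`, growing the
 * buffer as needed. Returns a malloc'd string (newline kept, as with
 * getline), or NULL on end-of-file, read error, or allocation failure.
 * The caller frees the result. */
char *read_line(FILE *file) {
    size_t cap = 64, len = 0;
    char *line = malloc(cap);
    if (!line)
        return NULL;

    while (fgets(line + len, (int)(cap - len), file)) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;               /* complete line */
        if (cap - len < 2) {           /* buffer full: double it */
            char *tmp = realloc(line, cap * 2);
            if (!tmp) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
    }
    if (len > 0)
        return line;                   /* last line without a newline */
    free(line);
    return NULL;
}
```

It can be dropped in where the skeleton calls getline, for example `char *header = read_line(stdin);`.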
While scanf is a poor choice for both reading and parsing input, sscanf remains useful for parsing strings, as you have greater control over the state of your data.
The strtol/strtod family of functions are usually preferred over atoi style functions for parsing numbers, though they can be difficult to use properly.
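To show what using strtol properly tends to involve, here is a hedged sketch for parsing the numeric header line. The function name parse_count and its error convention are assumptions made for this example; the checks themselves (no digits, ERANGE, trailing junk) are the usual ones.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal count from a header
 * line such as "42\n". Returns 0 on success and stores the value in *out;
 * returns -1 if the text is not a number, has trailing junk, is negative,
 * or is out of range for long. */
int parse_count(const char *text, long *out) {
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text)
        return -1;                 /* no digits at all */
    if (errno == ERANGE)
        return -1;                 /* overflow or underflow */
    if (*end == '\n')
        end++;                     /* tolerate the newline kept by getline/fgets */
    if (*end != '\0')
        return -1;                 /* trailing garbage, e.g. "42abc" */
    if (value < 0)
        return -1;
    *out = value;
    return 0;
}
```

Unlike atoi, this lets the caller distinguish a header of "0" from a header that is not a number at all.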
GNU manual
This quote is from the GNU manual
Warning: If the input data has a null character, you can’t tell. So
don’t use fgets unless you know the data cannot contain a null. Don’t
use it to read files edited by the user because, if the user inserts a
null character, you should either handle it properly or print a clear
error message. We recommend using getline instead of fgets.
As I usually do, I spent time searching before asking a question, and I did find a similar question on Stack Overflow from five years ago:
Why is the fgets function deprecated?
Although GNU recommends getline over fgets, I noticed that getline in stdio.h takes any size line. It calls realloc as needed. If I try to set the size to 10 char:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *buffer;
size_t bufsize = 10;
size_t characters;
buffer = (char *)malloc(bufsize * sizeof(char));
if( buffer == NULL)
{
perror("Unable to allocate buffer");
exit(1);
}
printf("Type something: ");
characters = getline(&buffer,&bufsize,stdin);
printf("%zu characters were read.\n",characters);
printf("You typed: '%s'\n",buffer);
return(0);
}
In the code above, type any size string, over 10 char, and getline will read it and give you the right output.
There is no need to even malloc, as I did in the code above — getline does it for you. I'm setting the buffer to size 0, and getline will malloc and realloc for me as needed.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *buffer = NULL; /* must be NULL (not uninitialized) for getline to allocate */
size_t bufsize = 0;
size_t characters;
printf("Type something: ");
characters = getline(&buffer,&bufsize,stdin);
printf("%zu characters were read.\n",characters);
printf("You typed: '%s'\n",buffer);
return(0);
}
If you run this code, again you can enter any size string, and it works. Even though I set the buffer size to 0.
I've been looking at safe coding practices from CERT guidelines www.securecoding.cert.org
I was thinking of switching from fgets to getline, but the issue I am having is that I cannot figure out how to limit the input in getline. Couldn't a malicious attacker use a loop to send an unlimited amount of data and use up all the RAM available in the heap?
Is there a way of limiting the input size that getline uses or does getline have some limit within the function?
Using fgets is not necessarily problematic. All the GNU manual tells you is that if there is a '\0' byte in the file, there will be one in your buffer too. You won't be able to tell whether a null in your buffer is the terminator that fgets added or a null byte that came from the file. This means you can read a 100-char file into a 200-char buffer and end up with what looks like a 50-char C string.
The stdio.h getline in fact doesn't appear to have any sane length limitation, so fread might be a viable alternative.
Unlike C getline and C++ std::getline(), C++ std::istream::getline() is limited to count characters.
The GNU manual is just bad. Limiting the input length is usually the right thing to do, especially if input is untrusted, and fgets does this correctly. getline cannot be used safely in such a context.
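If a hard cap on line length is wanted without giving up clean line boundaries, one option is a small reader built on getc. The sketch below is only an illustration: the name read_line_limited and its return convention are invented here. It stores at most cap - 1 characters and throws away the rest of an over-long line, so memory use stays fixed no matter what the sender does.

```c
#include <stdio.h>

/* Hypothetical helper: read one line into buf (capacity cap, cap >= 1),
 * storing at most cap - 1 characters and dropping the newline. Input past
 * the limit is read and discarded, so the next call starts on a fresh line.
 * Returns 1 if the line fit, 0 if it was truncated, and -1 if end-of-file
 * (or an error) was hit before any character was read. */
int read_line_limited(FILE *file, char *buf, size_t cap) {
    size_t len = 0;
    int truncated = 0;
    int c;

    while ((c = getc(file)) != EOF && c != '\n') {
        if (len + 1 < cap)
            buf[len++] = (char)c;
        else
            truncated = 1;         /* over the limit: discard */
    }
    buf[len] = '\0';
    if (c == EOF && len == 0 && !truncated)
        return -1;
    return truncated ? 0 : 1;
}
```

A caller can treat a return value of 0 as a protocol violation and reject the input instead of processing a truncated line.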
I am trying to search for a string in a text file, where the text file looks like this:
"Naveen; Okies
PSG; Diploma
SREC; BECSE"
When the console asks for an input string and I type Naveen, it prints Okies; when I type PSG, it prints Diploma. This works fine with the code below:
fscanf(fp, "%[^;];%s\n", temp, Mean);
However, the text file below does not work:
"Naveen; Okies Is it working
PSG; Diploma Is it working
SREC; BECSE Is it working"
My code still gives me Okies as the output for Naveen, where I need "Okies Is it working" as the output.
So I changed my code to fscanf(fp, "%[^;];%[^\n]s", temp, Mean); and now I get 'Okies Is it working' as the output. But when searching for a string, it does not search the next line. When I search for PSG, I don't get any output.
Kindly help me to understand my issue.
Side-bar
Note that you should check the return value from fscanf().
You say you tried:
fscanf(fp, "%[^;];%[^\n]s", temp, Mean);
This is probably a confused format. The %[^\n] scan set conversion specification looks for a sequence of 'non-newlines'; it only stops when the next character is a newline, or at EOF. The s at the end is therefore a literal s that will never be matched. But the return value from fscanf() is the number of successful conversions, which would probably be 2, so you have no way of spotting whether that s was read. It should be removed from the format string. Note also that nothing in this format consumes the newline, so on the next call the %[^;] conversion reads it as the first character of temp (for example "\nPSG"), which is why the search for PSG finds nothing.
Main answer
To address your main question, the %s format stops at the first blank. If you want to process the whole line, don't use %s. Use either POSIX getline() or standard C fgets() to read the line, and then analyze it.
You can analyze it with strtok(). I wouldn't do that in any library code because any library function that calls strtok() cannot be used from code that might also be using strtok(), nor can it call any function where that function, or one of the functions it calls directly or indirectly, uses strtok(). The strtok() function is poisonous — you can only use it in one function at a time. These comments do not apply to strtok_r() or the analogous Microsoft-provided variant strtok_s() — which is similar to but different from the strtok_s() defined in optional Annex K of C11. The variants with a suffix are reentrant and do not poison the system like strtok() does.
I'd probably use strchr() in this context; you could also look at strstr(), strpbrk(), strspn(), strcspn(). All of these are standard C functions, and have been since C89/C90.
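As a sketch of the strchr() approach, the function below splits a line such as "Naveen; Okies Is it working" at the first semicolon, after the line has been read with fgets() or getline(). The name split_entry is made up for this example, and it assumes the key never contains a ';'.

```c
#include <string.h>

/* Hypothetical helper: split a line of the form "key; value text\n" in
 * place. On success returns 0 and sets *key and *value to point into
 * `line`; returns -1 if the line contains no ';'. */
int split_entry(char *line, char **key, char **value) {
    char *sep = strchr(line, ';');
    if (!sep)
        return -1;
    *sep = '\0';                          /* terminate the key */
    char *val = sep + 1;
    val += strspn(val, " \t");            /* skip blanks after ';' */
    val[strcspn(val, "\r\n")] = '\0';     /* drop the line ending */
    *key = line;
    *value = val;
    return 0;
}
```

The caller can then compare key against the search term with strcmp() and print value, which keeps its embedded spaces.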
I think Jonathan has explained it pretty well, however, I am adding a sample to show how to deal with your example for learning purposes. Bear in mind you might wanna change some functions as I used the deprecated (insecure) ones, probably this could be an exercise for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
FILE *fp = NULL;
char szBuffer[1024] = { '\0' };
char szChoice[256] = { '\0' };
char szResult[256] = { '\0' };
if ((fp = fopen("test.txt", "r")) == NULL) {
printf("Error opening file\n");
return EXIT_FAILURE;
}
printf("Enter your choice: ");
scanf("%255s", szChoice); /* pass the array itself, and cap the length */
while (fgets(szBuffer, sizeof(szBuffer), fp) != NULL) {
if (!strncmp(szBuffer, szChoice, strlen(szChoice))) {
char *pch = szBuffer;
pch += (strlen(szChoice) + 1);
printf("Result: %s", pch);
}
}
getchar();
return EXIT_SUCCESS;
}
Say I make an input :
"Hello world" // hit a new line
"Goodbye world" // second input
How could I scan through the two lines and store them separately in two different arrays? I believe I need to use getchar until it hits a '\n'. But how do I scan for the second input?
Thanks in advance. I am a beginner in C, so it would be helpful to do it without pointers, as I haven't covered that topic yet.
Try this code out :
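#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int flx = 0, fly = 0;
    int a;                      /* int, not char, so EOF can be told apart from data */
    char b[10][100] = {{0}};    /* zero-filled, so every row is a valid string */
    while (1)
    {
        a = getchar();
        if (a == EOF) exit(0);
        else if (a == '\n')
        {
            b[flx][fly] = '\0'; /* terminate the current string */
            fly = 0;
            if (++flx == 10) break; /* array is full */
        }
        else if (fly < 99)      /* leave room for the terminator */
        {
            b[flx][fly++] = (char)a;
        }
    }
    return 0;
}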
Here I use a two-dimensional array to store the strings. I read the input character by character. First I create an infinite loop which continues reading characters. If the user enters the end-of-file character, the input stops. If there is a newline character, the flx variable is incremented and the next characters are stored in the next array position. You can refer to the strings stored with b[n], where n is the index.
The function that you should probably look at is fgets. At least on my system, the definition is as follows:
char *fgets(char * restrict str, int size, FILE * restrict stream);
So a very simple program to read input from the keyboard would run something like this:
#include <stdio.h>
#include <stdlib.h>
#define MAXSTRINGSIZE 128
int main(void)
{
char array[2][MAXSTRINGSIZE];
int i;
void *result;
for (i = 0; i < 2; i++)
{
printf("Input String %d: ", i);
result = fgets(&array[i][0], MAXSTRINGSIZE, stdin);
if (result == NULL) exit(1);
}
printf("String 1: %s\nString 2: %s\n", &array[0][0], &array[1][0]);
exit(0);
}
That compiles and runs correctly on my system. The only issue with fgets, though, is that it retains the newline character \n in the string. So if you don't want that, you will need to remove it. As for the FILE * parameter, stdin is a predefined FILE * that refers to standard input, or file descriptor 0. There are also stdout for standard output (file descriptor 1) and stderr for error messages and diagnostics (file descriptor 2). The file descriptor numbers correspond to the ones used in a shell like so:
$$$-> cat somefile > someotherfile 2>&1
What that does is redirect file descriptor 1 to a file and then redirect the output of file descriptor 2 to wherever 1 now points, so both end up in the file. In addition, I am using the & operator because we are addressing parts of an array, and the functions in question (fgets, printf) require pointers. As for the result, the man page for gets and fgets states the following:
RETURN VALUES
Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read,
they return NULL and the buffer contents remain unchanged. If an
error occurs, they return NULL and the buffer contents are
indeterminate. The fgets() and gets() functions do not distinguish
between end-of-file and error, and callers must use feof(3) and
ferror(3) to determine which occurred.
So to make your code more robust, if you get a NULL result, you need to check for errors using ferror or end of file using feof and respond appropriately. Furthermore, never EVER use gets. The only way that you can use it securely is to have the ability to see into the future, which clearly nobody can do, so it cannot be used securely. It will just open you up to a buffer overflow attack.
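A minimal sketch of that NULL handling might look like the following. The function name count_lines and the 128-byte buffer are arbitrary choices for this example; the point is only the ferror() check after the loop ends.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read `stream` to the end with fgets and count the
 * newline-terminated lines. Returns 0 if the loop stopped because of
 * end-of-file, or -1 if it stopped because of a read error. */
int count_lines(FILE *stream, long *lines) {
    char buf[128];

    *lines = 0;
    while (fgets(buf, sizeof buf, stream) != NULL) {
        if (strchr(buf, '\n'))
            (*lines)++;        /* this chunk completed a line */
    }
    if (ferror(stream))
        return -1;             /* NULL was caused by an I/O error */
    /* fgets returned NULL with no error flagged: this is end-of-file */
    return 0;
}
```

Note that a line longer than the buffer arrives in several chunks, which is why the count is tied to seeing a '\n' and not to the number of fgets calls.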
I'm brand new to C and trying to learn how to read a file. My file is a simple file (just for testing) which contains the following:
this file
has been
successfully read
by C!
So I read the file using the following C code:
#include <stdio.h>
int main() {
char str[100];
FILE *file = fopen("/myFile/path/test.txt", "r");
if(file == NULL) {
puts("This file does not exist!");
return -1;
}
while(fgets(str, 100, file) != '\0') {
puts(str);
}
fclose(file);
return 0;
}
This prints my text like this:
this file
has been
successfully read
by C!
When I compile and run it, I pipe its output to hexdump -C and can see an extra 0a at the end of each line.
Finally, why do I need to declare an array of chars to read from a file? What if I don't know how much data is on each line?
fgets() reads up to the newline and keeps the newline in the string and puts() always adds a newline to the string it is given to print. Hence you get double-spaced output when used as in your code.
Use fputs(str, stdout) instead of puts(); it does not add a newline.
The obsolete function gets() — removed from the 2011 version of the C standard — read up to the newline but removed it. The gets() and puts() pair worked well together, as do fgets() and fputs(). However, you should certainly NOT use gets(); it is a catastrophe waiting to happen. (The first internet worm in 1988 used gets() to migrate — Google search for 'morris internet worm').
In comments, inquisitor asked:
Why does the line need to be read into a char array of a specific size?
Because you need to make sure you don't overrun the space that is available. C does not do automatic allocation of space for strings. That is one of its weaknesses from some viewpoints; it is also a strength, but it routinely confuses newcomers to the language. If you want the input code to allocate enough space for a line, use the POSIX function getline().
So is it better to just read and output until I hit a '\0' since I won't always know the amount of chars on a given line?
No. In general, you won't hit '\0'; most text files do not contain any of those. If you don't want to allocate enough space for a line, then use:
int c;
while ((c = getchar()) != EOF)
putchar(c);
which reads one character at a time in the user code, but the underlying standard I/O packages buffer the input up so it isn't too costly — it is perfectly feasible to implement a program that way. If you need to work on lines, either allocate enough space for lines (I use char buffer[4096]; routinely) or use getline().
And Charlie Burns asked in a comment:
Why don't we see getline() suggested more often?
I think it is not mentioned all that often because getline() is relatively new, and not necessarily available everywhere yet. It was added to POSIX 2008; it is available on Linux and BSD. I'm not sure about the other mainline Unix variants (AIX, HP-UX, Solaris). It isn't hard to write for yourself (I've done it), but it is a nuisance if you need to write portable code (especially if 'portable' includes 'Microsoft'). One of its merits is that it tells you how long the line it read actually was.
Example using getline()
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char *line = 0;
size_t length = 0;
char const name[] = "/myFile/path/test.txt";
FILE *file = fopen(name, "r");
if (file == NULL)
{
fprintf(stderr, "%s: failed to open file %s\n", argv[0], name);
return -1;
}
while (getline(&line, &length, file) > 0)
fputs(line, stdout);
free(line);
fclose(file);
return 0;
}
fgets saves the newline character at the end of the line when reading line by line. This allows you to determine whether a whole line was actually read or your buffer was just too small.
puts always adds a newline when printing.
Either trim off the newline from fgets or use printf
printf("%s", str);
I want to read line by line from a given input file, process each line (i.e. its words), and then move on to the next line.
So I am using fscanf(fptr,"%s",words) to read a word, and it should stop once it encounters the end of the line.
But this is not possible in fscanf, I guess, so please tell me what to do.
I need to read all the words in the given line (i.e. until the end of the line is encountered) and then move on to the next line, repeating the same process.
Use fgets(). Yeah, link is to cplusplus, but it originates from c stdio.h.
You may also use sscanf() to read words from string, or just strtok() to separate them.
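One way the sscanf() suggestion can be made to walk through a line is with the %n directive, which reports how many characters have been consumed so far. The sketch below is an illustration only; count_words and the 63-character word limit are assumptions, and a longer word would be split into several pieces.

```c
#include <stdio.h>

/* Hypothetical helper: count the whitespace-separated words in `line`
 * (a string already read with fgets or getline). */
int count_words(const char *line) {
    char word[64];
    int used = 0;
    int count = 0;

    /* %63s skips leading whitespace and reads one word (at most 63 chars);
     * %n stores the number of characters consumed by this call. */
    while (sscanf(line, "%63s%n", word, &used) == 1) {
        count++;               /* `word` holds the current word here */
        line += used;          /* continue right after that word */
    }
    return count;
}
```

Because the scan works on a single line held in memory, it cannot run past the end of the line the way fscanf(fptr, "%s", ...) does on the file.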
In response to comment: this behavior of fgets() (leaving \n in the string) allows you to determine if the actual end-of-line was encountered. Note, that fgets() may also read only part of the line from file if supplied buffer is not large enough. In your case - just check for \n in the end and remove it, if you don't need it. Something like this:
// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
str[len-1] = 0;
}
Simple as that.
If you are working on a system with the GNU extensions available, there is a function called getline (man 3 getline) which lets you read a file line by line and allocates extra memory for you as needed. The manpage contains an example, which I modified to split the line using strtok (man 3 strtok).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
FILE * fp;
char * line = NULL;
size_t len = 0;
ssize_t read;
fp = fopen("/etc/motd", "r");
if (fp == NULL)
{
printf("File open failed\n");
return 0;
}
while ((read = getline(&line, &len, fp)) != -1) {
// At this point we have a line held within 'line'
printf("Line: %s", line);
const char * delim = " \n";
char * ptr;
ptr = (char * )strtok(line,delim);
while(ptr != NULL)
{
printf("Word: %s\n",ptr);
ptr = (char *) strtok(NULL,delim);
}
}
if (line)
{
free(line);
}
fclose(fp);
return 0;
}
Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...
If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty) then fgets() probably is your friend as other answers point out, but even then the getc() approach may be acceptable as long as the input stream is running in line-buffered mode which is common for stdin if stdin is on a tty.
Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often, even that performance can be beat by arranging the input to be a memory mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.
Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best in your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.
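The getc() state machine described above can be sketched roughly as follows. The struct counts type and the count_stream name are invented for this example, and it only counts words and lines; real processing would go where the counters are incremented.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type for the sketch below. */
struct counts {
    long lines;
    long words;
};

/* Read `file` one character at a time and track two states: inside a
 * word or between words. A word starts on the first non-space character
 * after whitespace; a line ends on each '\n'. */
struct counts count_stream(FILE *file) {
    struct counts result = {0, 0};
    int in_word = 0;
    int c;

    while ((c = getc(file)) != EOF) {
        if (c == '\n')
            result.lines++;
        if (isspace(c)) {
            in_word = 0;            /* whitespace ends any current word */
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            result.words++;
        }
    }
    return result;
}
```

No line buffer is involved, so there is nothing to overflow regardless of how long a line or word is.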
but this is not possible in fscanf,
It is, with a bit of wickedness ;)
Update: More clarification on evilness
but unfortunately a bit wrong. I assume [^\n]%*[^\n] should read [^\n]%*. Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly
Note that xstr(MAXLINE) [^\n] reads up to MAXLINE characters, which can be anything except the newline character (i.e. \n). The second part of the specifier, i.e. *[^\n], rejects anything (that's why the * character is there) if the line has more than MAXLINE characters, up to but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is that scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:
#include <stdio.h>
#define MAXLINE 255
/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)
int main() {
char line[ MAXLINE + 1 ];
/*
Wickedness explained: we read from `stdin` to `line`.
The format specifier is the only tricky part: We don't
bite off more than we can chew -- hence the specification
of maximum number of chars i.e. MAXLINE. However, this
width has to go into a string, so we stringify it using
macros. The careful reader will observe that once we have
read MAXLINE characters we discard the rest up to, but not
including, the newline; getchar() below consumes the newline.
*/
int n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
while (n == 1) {
printf("[line:] %s\n", line);
n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
if (!feof(stdin)) {
getchar();
}
}
return 0;
}