Read arbitrary number of characters from stdin until pressing enter in C

I want to make a program that accepts an arbitrary number of characters from user input, until "enter" is pressed, and stores them in a buffer.
Is there a way to read characters from stdin without extracting them, count the characters, allocate a buffer of the precise size, and finally copy the characters into the buffer?
Basically, I do NOT want to use getc in a loop and double the buffer size whenever it runs out.
EDIT:
To make my intentions more clear, let me express my intuition. I imagine the stdin buffer the same as a file (which may or may not grow dynamically). So I should be able to seek to the end of it (representing the end of user input), counting the offset then rewind back. Something like
long const start = ftell(stdin);
fseek(stdin, 0, SEEK_END);
long const length = ftell(stdin) - start;
rewind(stdin);

The readline() command is supported on some systems, and can be added to those that don't support it.
From the man page:
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
char *
readline (const char *prompt);
readline will read a line from the terminal and return it, using prompt as a prompt. If prompt is NULL or the empty string, no prompt is issued. The line returned is allocated with malloc(3); the caller must free it when finished. The line returned has the final newline removed, so only the text of the line remains.

So, from reading the comments and other responses, you don't like the solution of using readline because it's not standard. (well, it's not an ISO or ANSI standard, but it is pretty popular anyway)
You don't like the convention of doubling the buffer size (most probably the approach used by readline internally) but you don't specify why you don't like it. There are more approaches than doubling the buffer size, you can increment it a fixed amount for example, but I think you'll not be glad with that solution either.
What can we do then, if you receive 1 EB (one exabyte, 1.0E18 bytes) of data before you receive the first newline character? How can we deal with that?
How can any standardization office define a way to deal with this, and specify a way to proceed correctly?
Do you actually believe you are asking the right question?

Related

How to read multiple words in a line in C?

I want the user to be able to type
start < read_from_old_file.c > write_to_new_file.c
//or whatever type of file
So once the user types the command "start" followed by a "<", this will indicate reading a file, whereas ">" will indicate writing to a new file.
Problem is, I know you can use
scanf("%s", buff);
But this would read one word and not go onto the next.
You can use:
scanf("%[^\n]%*c",buff);
If you wish to use scanf() for this.
%*c is to get rid of the newline due to hitting enter.
In general, using the scanf family of functions is a bad idea. Instead, you should use fgets or fread to read in data, then perform your own processing on it. fgets will read a line at a time, whereas fread is for when you don't care about lines and just want n bytes of data (good for, say, implementing cat).
Most programs will want to read a line at a time. You'll need to allocate your own buffer to pass to fgets.
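As a rough illustration of the "read a line with fgets, then do your own processing" advice, here is a minimal sketch. The helper names split_words and read_words are made up for this example, and it assumes words are separated by spaces or tabs and that the whole command fits in the caller's buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces, tabs and newlines.
   Stores up to max_words pointers in words[] and returns how many it found. */
static size_t split_words(char *line, char *words[], size_t max_words)
{
    size_t n = 0;
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && n < max_words;
         tok = strtok(NULL, " \t\n"))
    {
        words[n++] = tok;
    }
    return n;
}

/* Hypothetical helper: reads one line from `in` into buf with fgets,
   then splits it. Returns the word count, or 0 at end of input. */
static size_t read_words(FILE *in, char *buf, size_t bufsize,
                         char *words[], size_t max_words)
{
    if (fgets(buf, (int)bufsize, in) == NULL)
        return 0;
    return split_words(buf, words, max_words);
}
```

Called as read_words(stdin, buf, sizeof buf, words, 8) on the input "start < old.c > new.c", this should yield five words, with "<" and ">" as separate entries that the program can then inspect.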

What's the difference between gets and scanf?

If the code is
scanf("%s\n",message)
vs
gets(message)
what's the difference? It seems that both of them get input to message.
The basic difference [in reference to your particular scenario],
scanf() ends taking input upon encountering a whitespace, newline or EOF
gets() considers a whitespace as a part of the input string and ends the input upon encountering newline or EOF.
However, to avoid buffer overflow errors and to avoid security risks, it's safer to use fgets().
Disambiguation: In the following context I'd consider "safe" if not leading to trouble when correctly used. And "unsafe" if the "unsafetyness" cannot be maneuvered around.
scanf("%s\n",message)
vs
gets(message)
What's the difference?
In terms of safety there is no difference: both read from standard input and might very well overflow message if the user enters more data than message provides memory for.
However, scanf() can be used safely by specifying the maximum amount of data to be scanned in:
char message[42];
...
scanf("%41s", message); /* Read at most one fewer than the buffer (message here)
provides, as one byte is necessary to store the
C-"string"'s 0-terminator. */
With gets() it is not possible to specify the maximum number of characters to be read in; that's why the latter shall not be used!
The main difference is that gets reads until EOF or \n, while scanf("%s") reads until any whitespace has been encountered. scanf also provides more formatting options, but at the same time it has worse type safety than gets.
Another big difference is that scanf is a standard C function, while gets has been removed from the language, since it was both superfluous and dangerous: there was no protection against buffer overruns. The very same security flaw exists with scanf however, so neither of those two functions should be used in production code.
You should always use fgets, the C standard itself even recommends this, see C11 K.3.5.4.1
Recommended practice
6 The fgets function allows properly-written
programs to safely process input lines too long to store in the result
array. In general this requires that callers of fgets pay attention to
the presence or absence of a new-line character in the result array.
Consider using fgets (along with any needed processing based on
new-line characters) instead of gets_s.
(emphasis mine)
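The "pay attention to the presence or absence of a new-line character" advice might look roughly like the sketch below. The name read_line and its return convention are assumptions made for this example, not a standard API, and embedded '\0' bytes in the input are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a standard function.
   Returns  1 if a whole line was read (its newline is stripped),
            0 if the line did not fit: buf holds the truncated start and
              the rest of the line has been discarded,
           -1 at end of input or on a read error. */
static int read_line(FILE *in, char *buf, size_t size)
{
    if (size < 2 || fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");   /* index of '\n', or of '\0' if none */
    if (buf[len] == '\n') {
        buf[len] = '\0';
        return 1;
    }

    /* No newline stored: either the input ended, or the line was too long.
       Discard whatever is left of the line and note whether anything was. */
    int c, extra = 0;
    while ((c = getc(in)) != '\n' && c != EOF)
        extra = 1;
    return extra ? 0 : 1;
}
```

A caller can then treat a return value of 0 as "input too long" and decide whether to reject the line or work with the truncated text.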
There are several. One is that gets() will only get character string data. Another is that gets() will get only one variable at a time. scanf() on the other hand is a much, much more flexible tool. It can read multiple items of different data types.
In the particular example you have picked, there is not much of a difference.
gets - Reads characters from stdin and stores them as a string.
scanf - Reads data from stdin and stores them according to the format specified in the scanf statement like %d, %f, %s, etc.
gets:->
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0').
BUGS:->
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
scanf:->
The scanf() function reads input from the standard input stream stdin;
BUG:->
Sometimes scanf causes out-of-bounds problems when dealing with arrays and strings.
In the case of scanf you need to specify a format, unlike with gets. So with gets you can enter characters, strings, numbers and spaces.
In the case of scanf, your input ends as soon as whitespace is encountered.
But then in your example you are using '%s', so neither gets() nor scanf() checks that the strings are valid pointers to arrays of sufficient length to hold the characters you are sending to them. Hence either can easily cause a buffer overflow.
Tip: use fgets() , but that all depends on the use case
The idea that scanf does not accept white space is completely wrong. If you use this code, it will accept white space as well:
#include<stdio.h>
int main()
{
char name[25];
printf("Enter your name :\n");
scanf("%24[^\n]",name); /* width 24 leaves room for the terminator; a scanset takes no trailing 's' */
printf("%s",name);
return 0;
}
Here only a newline stops the input. That means it stops taking input only when you press enter.
So, there is basically no difference between scanf and gets functions. It is just a tricky way of implementation.
scanf() is much more flexible tool while gets() only gets one variable at a time.
gets() is unsafe, for example: char str[1]; gets(str)
If you input more than the buffer can hold, the behavior is undefined; typically the program ends with SIGSEGV.
If you can only use gets, use a malloc'ed buffer as the destination.

Scan whole line from file in C Programming

I was writing a program to input multiple lines from a file.
the problem is i don't know the length of the lines, so i cant use fgets cause i need to give the size of the buffer and cant use fscanf cause it stops at a space token
I saw a solution where he recommended using malloc and realloc for each character taken as input but i think there's an easier way and then i found someone suggesting using
fscanf(file,"%[^\n]",line);
Does anyone have a better solution or can someone explain how the above works?(i haven't tested it)
i use GCC Compiler, if that's needed
You can use getline(3). It allocates memory on your behalf, which you should free when you are finished reading lines.
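A minimal sketch of a getline(3) loop, assuming a POSIX.1-2008 system. The function longest_line is only an illustrative wrapper invented for this example; the point is the allocate-by-library, free-once pattern.

```c
#define _POSIX_C_SOURCE 200809L   /* asks the headers to declare getline */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical example function: reads every line of `in` with getline
   and returns the length of the longest one, not counting the newline. */
static size_t longest_line(FILE *in)
{
    char *line = NULL;    /* getline allocates and grows this buffer */
    size_t cap = 0;       /* current capacity, maintained by getline */
    ssize_t len;
    size_t longest = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';            /* strip the trailing newline */
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);           /* one free covers all the lines read */
    return longest;
}
```

The same buffer is reused for every line, so the contents must be copied if a line needs to outlive the next getline call.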
and then i found someone suggesting using fscanf(file,"%[^\n]",line);
That's practically an unsafe version of fgets(line, sizeof line, file);. Don't do that.
If you don't know the file size, you have two options.
There's a LINE_MAX macro defined somewhere in the C library (AFAIK it's POSIX-only, but some implementations may have equivalents). It's a fair assumption that lines don't exceed that length.
You can go the "read and realloc" way, but you don't have to realloc() for every character. A conventional solution to this problem is to exponentially expand the buffer size, i. e. always double the allocated memory when it's exhausted.
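The "read and realloc" approach with doubling could be sketched as follows. The name read_line_dyn and the starting capacity of 16 are arbitrary choices for this example, and overflow of the capacity for absurdly long lines is not checked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from `in`.
   Returns a malloc'ed string without the newline (the caller frees it),
   or NULL at end of input or if memory runs out. */
static char *read_line_dyn(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;                      /* double only when exhausted */
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

With doubling, a line of n characters costs on the order of log2(n) calls to realloc rather than n.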
A simple format specifier for scanf or fscanf follows this prototype:
%specifier
Some of the specifiers:
d: the format specifier for decimal integers.
[characters]: scanset. Any number of the characters specified between the brackets.
A dash (-) that is not the first character may produce non-portable behavior in some library implementations.
[^characters]: negated scanset. Any number of characters, none of them specified as characters between the brackets.
fscanf(file,"%[^\n]",line);
This reads characters until the first occurrence of any character in the negated scanset, in this case the newline character.
As others suggested you can use getline() or fgets() and see example
The line fscanf(file,"%[^\n]",line); means that it will read anything other than \n into line. This should work on Linux and Windows, I think, but may not work with classic Mac OS text files, which use \r to end a line (OS X itself uses \n).
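If the scanset form is used at all, it needs a field width and some handling for empty lines, since %[^\n] matches nothing when the next character is a newline. A sketch under those assumptions, where scan_line and LINE_CAP are names invented for this example:

```c
#include <stdio.h>

#define LINE_CAP 100   /* hypothetical size; the width below must be LINE_CAP - 1 */

/* Hypothetical helper: reads one line into `line`, without the newline.
   Lines longer than LINE_CAP - 1 characters are truncated.
   Returns 1 if a line (possibly empty) was read, 0 at end of input. */
static int scan_line(FILE *file, char line[LINE_CAP])
{
    line[0] = '\0';
    int got = fscanf(file, "%99[^\n]", line);
    if (got == EOF)
        return 0;

    /* got == 0 means an empty line: the scanset matched nothing and line
       is still "". Either way, consume the rest of the line, including
       the newline that the scanset left in the stream. */
    int c;
    while ((c = getc(file)) != '\n' && c != EOF)
        ;
    return 1;
}
```

Compared with fgets this is more code for the same result, which is roughly the point made in the answers above.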

Limit Console Input Length in C:

I am beginning to design a shell application to run within a Linux terminal for a class I am taking.
This, of course, will involve reading variable-length input strings (commands) from the user. I know that I can simply read each command into a buffer of a size that I would consider appropriate, but this has the potential to either a) truncate the command or b) cause a buffer overflow.
If possible, how can I limit the length of user input to the console?
Say, if I set the command length to 3, 123 would be allowed, but if 123 were already present in the input string (before the user has pressed enter) and the user attempted to add 4, no character would print to the console, perhaps even with an 'error ping'.
I realize that I could design such functionality, but if that is needed, I am not sure where to start to do such a thing.
Either a pre-existing solution or advice on implementing my own solution would be greatly appreciated.
Edit:
I suppose a cheap and easy solution would be to read a command one character at a time until an enter signal is reached or the maximum length is reached. Would problems arise with a solution of this sort?
I have little experience with readline, but here's what you could try:
Write a function that checks rl_end (the number of characters in rl_line_buffer)
If you want to allow more, just return rl_getc
If not, you can use rl_ding
Set the rl_getc_function to call your function as described above
As a side note, if you do use readline, you don't need to limit the input at all (the library manages its memory as it goes). Another (simpler) function you might be interested in is getline.
That kind of low-level control of the console is not something that's included in C's rather basic built-in I/O model.
You need to look into something platform-specific, such as ncurses for Unix-like systems.
Without digging into platform-specific controls, you cannot limit how many characters a user may type in a console before hitting "Enter".
What you can do is check for the presence of a newline character in your input buffer; if it isn't there, then the user typed in more characters than you're prepared to deal with. You can reject that input, and then read stdin repeatedly until you see the newline.
Example:
#include <stdio.h>
#include <string.h>
...
char buf[SIZE];
...
printf("Gimme something: ");
fflush(stdout);
if (fgets(buf, sizeof buf, stdin))
{
char *newline = strchr(buf, '\n');
if (!newline)
{
printf("Input too long: \"%s\"\n", buf);
while (!newline && fgets(buf, sizeof buf, stdin))
newline = strchr(buf, '\n');
}
else
{
// do something with buf
}
}
In response to your edit, terminals are usually line-buffered, allowing users to enter as much as they want before hitting enter without you even knowing about it. You could set the terminal to raw or cbreak mode, but then you're entering platform-specific territory.
Instead, I would suggest that you avoid this problem, and accept that a terminal is a silly vestige from 2 million years ago. Most platforms define LINE_MAX to be the maximum line size any program needs to handle. Beyond that, you can simply assume your user is messing with you, and truncate.

C: Reading a text file (with variable-length lines) line-by-line using fread()/fgets() instead of fgetc() (block I/O vs. character I/O)

Is there a getline function that uses fread (block I/O) instead of fgetc (character I/O)?
There's a performance penalty to reading a file character by character via fgetc. We think that to improve performance, we can use block reads via fread in the inner loop of getline. However, this introduces the potentially undesirable effect of reading past the end of a line. At the least, this would require the implementation of getline to keep track of the "unread" part of the file, which requires an abstraction beyond the ANSI C FILE semantics. This isn't something we want to implement ourselves!
We've profiled our application, and the slow performance is isolated to the fact that we are consuming large files character by character via fgetc. The rest of the overhead actually has a trivial cost by comparison. We're always sequentially reading every line of the file, from start to finish, and we can lock the entire file for the duration of the read. This probably makes an fread-based getline easier to implement.
So, does a getline function that uses fread (block I/O) instead of fgetc (character I/O) exist? We're pretty sure it does, but if not, how should we implement it?
Update Found a useful article, Handling User Input in C, by Paul Hsieh. It's a fgetc-based approach, but it has an interesting discussion of the alternatives (starting with how bad gets is, then discussing fgets):
On the other hand the common retort from C programmers (even those considered experienced) is to say that fgets() should be used as an alternative. Of course, by itself, fgets() doesn't really handle user input per se. Besides having a bizarre string termination condition (upon encountering \n or EOF, but not \0) the mechanism chosen for termination when the buffer has reached capacity is to simply abruptly halt the fgets() operation and \0 terminate it. So if user input exceeds the length of the preallocated buffer, fgets() returns a partial result. To deal with this programmers have a couple choices; 1) simply deal with truncated user input (there is no way to feed back to the user that the input has been truncated, while they are providing input) 2) Simulate a growable character array and fill it in with successive calls to fgets(). The first solution, is almost always a very poor solution for variable length user input because the buffer will inevitably be too large most of the time because its trying to capture too many ordinary cases, and too small for unusual cases. The second solution is fine except that it can be complicated to implement correctly. Neither deals with fgets' odd behavior with respect to '\0'.
Exercise left to the reader: In order to determine how many bytes was really read by a call to fgets(), one might try by scanning, just as it does, for a '\n' and skip over any '\0' while not exceeding the size passed to fgets(). Explain why this is insufficient for the very last line of a stream. What weakness of ftell() prevents it from addressing this problem completely?
Exercise left to the reader: Solve the problem determining the length of the data consumed by fgets() by overwriting the entire buffer with a non-zero value between each call to fgets().
So with fgets() we are left with the choice of writing a lot of code and living with a line termination condition which is inconsistent with the rest of the C library, or having an arbitrary cut-off. If this is not good enough, then what are we left with? scanf() mixes parsing with reading in a way that cannot be separated, and fread() will read past the end of the string. In short, the C library leaves us with nothing. We are forced to roll our own based on top of fgetc() directly. So lets give it a shot.
So, does a getline function that's based on fgets (and doesn't truncate the input) exist?
Don't use fread. Use fgets. I take it this is a homework/classproject problem so I'm not providing a complete answer, but if you say it's not, I'll give more advice. It is definitely possible to provide 100% of the semantics of GNU-style getline, including embedded null bytes, using purely fgets, but it requires some clever thinking.
OK, update since this isn't homework:
memset your buffer to '\n'.
Use fgets.
Use memchr to find the first '\n'.
If no '\n' is found, the line is longer than your buffer. Enlarge the buffer, fill the new portion with '\n', and fgets into the new portion, repeating as necessary.
If the character following '\n' is '\0', then fgets terminated due to reaching end of a line.
Otherwise, fgets terminated due to reaching EOF, the '\n' is left over from your memset, the previous character is the terminating null that fgets wrote, and the character before that is the last character of actual data read.
You can eliminate the memset and use strlen in place of memchr if you don't care about supporting lines with embedded nulls (either way, the null will not terminate reading; it will just be part of your read-in line).
There's also a way to do the same thing with fscanf and the "%123[^\n]" specifier (where 123 is your buffer limit), which gives you the flexibility to stop at non-newline characters (ala GNU getdelim). However it's probably slow unless your system has a very fancy scanf implementation.
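The memset/fgets/memchr steps listed in this answer can be sketched roughly as below. The name fgets_line, the starting capacity of 64 and the doubling growth are choices made for this example. Like the answer, it assumes fgets leaves the bytes after its terminating '\0' untouched, and it does not handle read errors or lines longer than INT_MAX.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a library function. Reads one line of any
   length using only fgets. Returns a malloc'ed buffer (the caller frees
   it) holding the line, including its '\n' if one was read, followed by
   '\0'. *len_out receives the number of data bytes, so embedded '\0'
   bytes are preserved. Returns NULL at end of input or on allocation
   failure. */
static char *fgets_line(FILE *in, size_t *len_out)
{
    size_t cap = 64;     /* arbitrary starting size */
    size_t off = 0;      /* where the next fgets call writes */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        memset(buf + off, '\n', cap - off);       /* pre-fill with '\n' */
        if (fgets(buf + off, (int)(cap - off), in) == NULL) {
            if (off == 0) {                       /* nothing read at all */
                free(buf);
                return NULL;
            }
            buf[off] = '\0';                      /* EOF right after a full chunk */
            *len_out = off;
            return buf;
        }
        char *nl = memchr(buf + off, '\n', cap - off);
        if (nl == NULL) {                         /* chunk full, line continues */
            off = cap - 1;                        /* next read overwrites the old '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
            continue;
        }
        if (nl + 1 < buf + cap && nl[1] == '\0')  /* real newline from the stream */
            *len_out = (size_t)(nl - buf) + 1;
        else                                      /* leftover fill byte: fgets hit EOF */
            *len_out = (size_t)(nl - buf) - 1;
        return buf;
    }
}
```

Because the length is reported separately, callers should use *len_out rather than strlen when the data may contain null bytes.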
There isn't a big performance difference between fgets and fgetc/setvbuf.
Try:
int c;
FILE *f = fopen("blah.txt","r");
setvbuf(f,NULL,_IOLBF,4096); /* !!! check other values for last parameter in your OS */
while( (c=fgetc(f))!=EOF )
{
if( c=='\n' )
...
else
...
}

Resources