Reading an unknown length line from stdin in c with fgets - c

I am trying to read an unknown length line from stdin using the C language.
I have seen this when looking on the net:
char** str;
gets(&str);
But it seems to cause me some problems and I don't really understand how it is possible to do it this way.
Can you explain me why this example works/doesn't work
and what will be the correct way to implement it (with malloc?)

You don't want a pointer to pointer to char, use an array of chars
char str[128];
or a pointer to char
char *str;
if you choose a pointer you need to reserve space using malloc
str = malloc(128);
Then you can use fgets
fgets(str, 128, stdin);
and remove the trailling newline
char *ptr = strchr(str, '\n');
if (ptr != NULL) *ptr = '\0';
To read an arbitrary long line, you can use getline (a function added to the GNU version of libc):
#define _GNU_SOURCE
#include <stdio.h>
char *foo(FILE * f)
{
int n = 0, result;
char *buf;
result = getline(&buf, &n, f);
if (result < 0) return NULL;
return buf;
}
or your own implementation using fgets and realloc:
char *getline(FILE * f)
{
size_t size = 0;
size_t len = 0;
size_t last = 0;
char *buf = NULL;
do {
size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
buf = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
/* Actually do the read. Note that fgets puts a terminal '\0' on the
end of the string, so we make sure we overwrite this */
if (buf == NULL) return NULL;
fgets(buf + last, BUFSIZ, f);
len = strlen(buf);
last = len - 1;
} while (!feof(f) && buf[last] != '\n');
return buf;
}
Call it using
char *str = getline(stdin);
if (str == NULL) {
perror("getline");
exit(EXIT_FAILURE);
}
...
free(str);
More info

Firstly, gets() provides no way of preventing a buffer overrun. That makes it so dangerous it has been removed from the latest C standard. It should not be used. However, the usual usage is something like
char buffer[20];
gets(buffer); /* pray that user enters no more than 19 characters in a line */
Your usage is passing gets() a pointer to a pointer to a pointer to char. That is not what gets() expects, so your code would not even compile.
That element of prayer reflected in the comment is why gets() is so dangerous. If the user enters 20 (or more) characters, gets() will happily write data past the end of buffer. There is no way a programmer can prevent that in code (short of accessing hardware to electrocute the user who enters too much data, which is outside the realm of standard C).
To answer your question, however, the only ways involve allocating a buffer of some size, reading data in some controlled way until that size is reached, reallocating if needed to get a greater size, and continuing until a newline (or end-of-file, or some other error condition on input) is encountered.
malloc() may be used for the initial allocation. malloc() or realloc() may be used for the reallocation (if needed). Bear in mind that a buffer allocated this way must be released (using free()) when the data is no longer needed - otherwise the result is a memory leak.

use the getline() function, this will return the length of the line, and a pointer to the contents of the line in an allocated memory area. (be sure to pass the line pointer to free() when done with it )

"Reading an unknown length line from stdin in c with fgets"
Late response - A Windows approach:
The OP does not specify Linux or Windows, but the viable answers posted in response for this question all seem to have the getline() function in common, which is POSIX only. Functions such as getline() and popen() are very useful and powerful but sadly are not included in Windows environments.
Consequently, implementing such a task in a Windows environment requires a different approach. The link here describes a method that can read input from stdin and has been tested up to 1.8 gigabytes on the system it was developed on. (Also described in the link.)_ The simple code snippet below was tested using the following command line to read large quantities on stdin:
cd c:\dev && dir /s // approximately 1.8Mbyte buffer is returned on my system
Simple example:
#include "cmd_rsp.h"
int main(void)
{
char *buf = {0};
buf = calloc(100, 1);//initialize buffer to some small value
if(!buf)return 0;
cmd_rsp("dir /s", &buf, 100);//recursive directory search on Windows system
printf("%s", buf);
free(buf);
return 0;
}
cmd_rsp() is fully described in the links above, but it is essentially a Windows implementation that includes popen() and getline() like capabilities, packaged up into this very simple function.

if u want to input an unknown length of string or input try using following code.
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
int main()
{
char *m;
clrscr();
printf("please input a string\n");
scanf("%ms",&m);
if (m == NULL)
fprintf(stderr, "That string was too long!\n");
else
{
printf("this is the string %s\n",m);
/* ... any other use of m */
free(m);
}
getch();
return 0;
}
Note that %ms, %as are GNU extensions..

Related

As I create a string in C with the exact size of the word entered by console

I am new to C and I was wondering how I can do so that when a word of X characters is entered through the console, an array of characters is created with the right memory allocation for it, for example I enter "hello" and it is saved in a string of 6-space characters.Thanks!
There is nothing in the ISO C standard library specification or Microsoft C library that directly does what you seek.
In the C POSIX library, however (what's the difference?), the scanf family of functions support an m modifier for string s inputs. When m is used, scanf() will allocate a buffer of an appropriate size, and assign it to the pointer that you provide:
From the Linux scanf() man page:
An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.
Here's an example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *input = NULL;
if (scanf("%ms", &input) != 1) {
fprintf(stderr, "scanf() failed\n");
return 1;
}
printf("scanf() allocated string: \"%s\"\n", input);
free(input);
return 0;
}
Supported C libraries:
glibc -- since 2.7 (267c54dcef41)
musl -- since 0.9.11 (16a1e036)
Unsupported C libraries:
dietlibc -- as of at least 0.34
Microsoft C runtime library (CRT)
Looked interesting to try to directly meet OP's goal of <enter "hello" and it is saved in a string of 6-space character> using standard library functions, even if it took some code.
Sadly standard C does not offer the nice option as suggested with #Jonathon Reinhart good answer.
Instead, below is an approach not suited for learners, but does the job. It does not have a prior line length limit. It recursively forms a link list of characters entered (recursion not suitable for a long line), then allocates an array based on depth, and assigns the array.
#include <stdlib.h>
#include <stdio.h>
typedef struct link_s {
struct link_s *prev;
char ch;
} llink;
char* foo(llink *prior) {
int ch = fgetc(stdin);
if (ch == '\n' || ch == '\r' || ch == EOF) {
llink *ll = prior;
size_t len = 1;
while (ll) {
len++;
ll = ll->prev;
}
char *buf = malloc(len);
if (buf) {
buf[--len] = '\0';
llink *ll = prior;
while (ll) {
buf[--len] = ll->ch;
ll = ll->prev;
}
}
return buf;
}
llink node = {.prev = prior, .ch = (char) ch};
return foo(&node);
}
int main() {
char *s = foo(NULL);
printf("<%s>\n", s);
free(s);
return 0;
}
Input
Hello<enter>
Output
<Hello>
If you only need to read until a newline, you can use getline defined in stdio.h. It does automatic memory allocation. The function will allocate an extra character to store the newline, you can remove it using buf[strcspn(buf, '\n')] = '\0'.
If you want it to read until a whitespace, sad to say there's no standard C library function to meet your needs for arbitrary string lengths (without writing a bunch of your own code). If you are a beginner learning C and need simple I/O, using scanf("%s", buf) into a fixed sized buffer should be plenty.

obstack, gets and getline

I am trying to get a line from stdin. as far as I understand, we should never use gets as said in man page of gets:
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and
because gets() will continue to store characters past the end of the
buffer, it is extremely dangerous to use. It has been used to
break computer security. Use fgets() instead.
it suggests that we can use fgets() instead. the problem with fgets() is that we don't know the size of the user input in advance and fgets() read exactly one less than size bytes from the stream as man said:
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops
after an EOF or a newline. If a newline is read, it is stored into
the buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
There is also another approach which is using POSIX getline() which uses realloc to update the buffer size so we can read any string with arbitrary length from input stream as man said:
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
and finally there is another approach which is using obstack as libc manual said:
Aside from this one constraint of order of freeing, obstacks are
totally general: an obstack can contain any number of objects of
any size. They are implemented with macros, so allocation is
usually very fast as long as the objects are usually small. And the
only space overhead per object is the padding needed to start each
object on a suitable boundary...
So we can use obstack for any object of any size an allocation is very fast with a little space overhead which is not a big deal. I wrote this code to read input string without knowing the length of it.
#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
int main(){
unsigned char c;
struct obstack * mystack;
mystack = (struct obstack *) malloc(sizeof(struct obstack));
obstack_init(mystack);
c = fgetc(stdin);
while(c!='\r' && c!='\n'){
obstack_1grow(mystack,c);
c = fgetc(stdin);
}
printf("the size of the stack is: %d\n",obstack_object_size(mystack));
printf("the input is: %s\n",(char *)obstack_finish(mystack));
return 0;
}
So my question is :
Is it safe to use obstack like this?
Is it like using POSIX getline?
Am I missing something here? any drawbacks?
Why shouldn't I using it?
thanks in advance.
fgets has no drawbacks over gets. It just forces you to acknowledge that you must know the size of the buffer. gets instead requires you to somehow magically know beforehand the length of the input a (possibly malicious) user is going to feed into your program. That is why gets was removed from the C programming language. It is now non-standard, while fgets is standard and portable.
As for knowing the length of the line beforehand, POSIX says that an utility must be prepared to handle lines that fit in buffers that are of LINE_MAX size. Thus you can do:
char line[LINE_MAX];
while (fgets(line, LINE_MAX, fp) != NULL)
and any file that produces problems with that is not a standard text file. In practice everything will be mostly fine if you just don't blindly assume that the last character in the buffer is always '\n' (which it isn't).
getline is a POSIX standard function. obstack is a GNU libc extension that is not portable. getline was built for efficient reading of lines from files, obstack was not, it was built to be generic. With obstack, the string is not properly contiguous in memory / in its final place, until you call obstack_finish.
Use getline if on POSIX, use fgets in programs that need to be maximally portable; look for an emulation of getline for non-POSIX platforms built on fgets.
Why shouldn't I using it?
Well, you shouldn't use getline() if you care about portability. You should use getline() if you're specifically targeting only POSIX systems.
As for obstacks, they're specific to the GNU C library, which might already be a strong reason to avoid them (it further restricts portability). Also, they're not meant to be used for this purpose.
If you aim for portability, just use fgets(). It's not too complicated to write a function similar to getline() based on fgets() -- here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CHUNKSIZE 1024
char *readline(FILE *f)
{
size_t bufsize = CHUNKSIZE;
char *buf = malloc(bufsize);
if (!buf) return 0;
char *pos = buf;
size_t len = 0;
while (fgets(pos, CHUNKSIZE, f))
{
char *nl = strchr(pos, '\n');
if (nl)
{
// newline found, replace with string terminator
*nl = '\0';
char *tmp = realloc(buf, len + strlen(pos) + 1);
if (tmp) return tmp;
return buf;
}
// no newline, increase buffer size
len += strlen(pos);
char *tmp = realloc(buf, len + CHUNKSIZE);
if (!tmp)
{
free(buf);
return 0;
}
buf = tmp;
pos = buf + len;
}
// handle case when input ends without a newline
char *tmp = realloc(buf, len + 1);
if (tmp) return tmp;
return buf;
}
int main(void)
{
char *input = readline(stdin);
if (!input)
{
fputs("Error reading input!\n", stderr);
return 1;
}
puts(input);
free(input);
return 0;
}
This one removes the newline if it was found and returns a newly allocated buffer (which the caller has to free()). Adapt to your needs. It could be improved by increasing the buffer size only when the buffer was filled completely, with just a bit more code ...

How to put a char into a empty pointer of a string in pure C

I want to store a single char into a char array pointer and that action is in a while loop, adding in a new char every time. I strictly want to be into a variable and not printed because I am going to compare the text. Here's my code:
#include <stdio.h>
#include <string.h>
int main()
{
char c;
char *string;
while((c=getchar())!= EOF) //gets the next char in stdin and checks if stdin is not EOF.
{
char temp[2]; // I was trying to convert c, a char to temp, a const char so that I can use strcat to concernate them to string but printf returns nothing.
temp[0]=c; //assigns temp
temp[1]='\0'; //null end point
strcat(string,temp); //concernates the strings
}
printf(string); //prints out the string.
return 0;
}
I am using GCC on Debain (POSIX/UNIX operating system) and want to have windows compatability.
EDIT:
I notice some communication errors with what I actually intend to do so I will explain: I want to create a system where I can input a unlimited amount of characters and have the that input be store in a variable and read back from a variable to me, and to get around using realloc and malloc I made it so it would get the next available char until EOF. Keep in mind that I am a beginner to C (though most of you have probably guess it first) and haven't had a lot of experience memory management.
If you want unlimited amount of character input, you'll need to actively manage the size of your buffer. Which is not as hard as it sounds.
first use malloc to allocate, say, 1000 bytes.
read until this runs out.
use realloc to allocate 2000
read until this runs out.
like this:
int main(){
int buf_size=1000;
char* buf=malloc(buf_size);
char c;
int n=0;
while((c=getchar())!= EOF)
buf[n++] = c;
if(n=>buf_size-1)
{
buf_size+=1000;
buf=realloc(buf, buf_size);
}
}
buf[n] = '\0'; //add trailing 0 at the end, to make it a proper string
//do stuff with buf;
free(buf);
return 0;
}
You won't get around using malloc-oids if you want unlimited input.
You have undefined behavior.
You never set string to point anywhere, so you can't dereference that pointer.
You need something like:
char buf[1024] = "", *string = buf;
that initializes string to point to valid memory where you can write, and also sets that memory to an empty string so you can use strcat().
Note that looping strcat() like this is very inefficient, since it needs to find the end of the destination string on each call. It's better to just use pointers.
char *string;
You've declared an uninitialised variable with this statement. With some compilers, in debug this may be initialised to 0. In other compilers and a release build, you have no idea what this is pointing to in memory. You may find that when you build and run in release, your program will crash, but appears to be ok in debug. The actual behaviour is undefined.
You need to either create a variable on the stack by doing something like this
char string[100]; // assuming you're not going to receive more than 99 characters (100 including the NULL terminator)
Or, on the heap: -
char string* = (char*)malloc(100);
In which case you'll need to free the character array when you're finished with it.
Assuming you don't know how many characters the user will type, I suggest you keep track in your loop, to ensure you don't try to concatenate beyond the memory you've allocated.
Alternatively, you could limit the number of characters that a user may enter.
const int MAX_CHARS = 100;
char string[MAX_CHARS + 1]; // +1 for Null terminator
int numChars = 0;
while(numChars < MAX_CHARS) && (c=getchar())!= EOF)
{
...
++numChars;
}
As I wrote in comments, you cannot avoid malloc() / calloc() and probably realloc() for a problem such as you have described, where your program does not know until run time how much memory it will need, and must not have any predetermined limit. In addition to the memory management issues on which most of the discussion and answers have focused, however, your code has some additional issues, including:
getchar() returns type int, and to correctly handle all possible inputs you must not convert that int to char before testing against EOF. In fact, for maximum portability you need to take considerable care in converting to char, for if default char is signed, or if its representation has certain other allowed (but rare) properties, then the value returned by getchar() may exceed its maximum value, in which case direct conversion exhibits undefined behavior. (In truth, though, this issue is often ignored, usually to no ill effect in practice.)
Never pass a user-provided string to printf() as the format string. It will not do what you want for some inputs, and it can be exploited as a security vulnerability. If you want to just print a string verbatim then fputs(string, stdout) is a better choice, but you can also safely do printf("%s", string).
Here's a way to approach your problem that addresses all of these issues:
#include <stdio.h>
#include <string.h>
#include <limits.h>
#define INITIAL_BUFFER_SIZE 1024
int main()
{
char *string = malloc(INITIAL_BUFFER_SIZE);
size_t cap = INITIAL_BUFFER_SIZE;
size_t next = 0;
int c;
if (!string) {
// allocation error
return 1;
}
while ((c = getchar()) != EOF) {
if (next + 1 >= cap) {
/* insufficient space for another character plus a terminator */
cap *= 2;
string = realloc(string, cap);
if (!string) {
/* memory reallocation failure */
/* memory was leaked, but it's ok because we're about to exit */
return 1;
}
}
#if (CHAR_MAX != UCHAR_MAX)
/* char is signed; ensure defined behavior for the upcoming conversion */
if (c > CHAR_MAX) {
c -= UCHAR_MAX;
#if ((CHAR_MAX != (UCHAR_MAX >> 1)) || (CHAR_MAX == (-1 * CHAR_MIN)))
/* char's representation has more padding bits than unsigned
char's, or it is represented as sign/magnitude or ones' complement */
if (c < CHAR_MIN) {
/* not representable as a char */
return 1;
}
#endif
}
#endif
string[next++] = (char) c;
}
string[next] = '\0';
fputs(string, stdout);
return 0;
}

Why would scanf crash while reading a string?

This is just a small program I wrote to find a problem with a larger one. Everything changes when I add the line with scanf. I know it is not safe, I read other threads concerning printf errors that suggest other functions. Anything but cin is fine. Btw, I didn't choose the type definitions of the 'messages', that came from my teachers, so I cannot change them.
#include <stdio.h>
#include <string.h>
char message1 [] = "amfdalkfaklmdklfamd.";
char message2 [] = "fnmakajkkjlkjs.";
char initializer [] = ".";
char* com;
char* word;
int main()
{
com = initializer;
int i = 1;
while (i !=4)
{
printf ("%s \n", com);
scanf("%s",word);
i++;
};
return 0;
}
The problem: after a single iteration the program exits, nothing is printed.
The reason the scanf will crash is buffer is not initialized: word has not been assigned a value, so it is pointing nowhere.
You can fix it by allocating some memory to your buffer, and limiting scanf to a certain number of characters, like this:
char word[20];
...
scanf("%19s", word);
Note that the number between % and s, which signifies the maximum number of characters in a string, is less by 1 than the length of the actual buffer. This is because of null terminator, which is required for C strings.
com is a pointer whose value is the address of the literal string initializer. Literal strings are contained within read-only memory areas, but the scanf function will attempt to write into the address given to it, this is an access-violation and causes the OS to kill your process, hence the crash you're seeing.
Change your scanf code to resemble this, note the addition of width limit in the %s placeholder, as well as the use of the scanf_s version to ensure there is no buffer overflow.
static int const BufferLength = 2048; // 2KiB should be sufficient
char* buffer = calloc( BufferLength , 1 );
if( buffer == null ) exit(1);
int fieldCount = scanf_s("%2047s", buffer, BufferLength );
if( fieldCount == 1 ) {
// do stuff with `buffer`
}
free( buffer );
Note that calloc zeroes memory before returning, which means that buffer can serve as a null-terminated string directly, whereas a string allocated with malloc cannot (unless you zero it yourself).
word has no memory associated with it.
char* word;
scanf("%s",word);
Could use
char word[100];
word[0] = '\0';
scanf("%99s",word);
If available, use getline().
Although not standard C, getline() will dynamicaly allocate memory for arbitrarily long user input.
char *line = NULL;
size_t len = 0;
ssize_t read;
while ((read = getline(&line, &len, stdin)) != -1) {
printf("%s", line);
}
free(line);
Linux Programmer's Manual GETLINE(3)

How to read the standard input into string variable until EOF in C?

I am getting "Bus Error" trying to read stdin into a char* variable.
I just want to read whole stuff coming over stdin and put it first into a variable, then continue working on the variable.
My Code is as follows:
char* content;
char* c;
while( scanf( "%c", c)) {
strcat( content, c);
}
fprintf( stdout, "Size: %d", strlen( content));
But somehow I always get "Bus error" returned by calling cat test.txt | myapp, where myapp is the compiled code above.
My question is how do i read stdin until EOF into a variable? As you see in the code, I just want to print the size of input coming over stdin, in this case it should be equal to the size of the file test.txt.
I thought just using scanf would be enough, maybe buffered way to read stdin?
First, you're passing uninitialized pointers, which means scanf and strcat will write memory you don't own. Second, strcat expects two null-terminated strings, while c is just a character. This will again cause it to read memory you don't own. You don't need scanf, because you're not doing any real processing. Finally, reading one character at a time is needlessly slow. Here's the beginning of a solution, using a resizable buffer for the final string, and a fixed buffer for the fgets call
#define BUF_SIZE 1024
char buffer[BUF_SIZE];
size_t contentSize = 1; // includes NULL
/* Preallocate space. We could just allocate one char here,
but that wouldn't be efficient. */
char *content = malloc(sizeof(char) * BUF_SIZE);
if(content == NULL)
{
perror("Failed to allocate content");
exit(1);
}
content[0] = '\0'; // make null-terminated
while(fgets(buffer, BUF_SIZE, stdin))
{
char *old = content;
contentSize += strlen(buffer);
content = realloc(content, contentSize);
if(content == NULL)
{
perror("Failed to reallocate content");
free(old);
exit(2);
}
strcat(content, buffer);
}
if(ferror(stdin))
{
free(content);
perror("Error reading from stdin.");
exit(3);
}
EDIT: As Wolfer alluded to, a NULL in your input will cause the string to be terminated prematurely when using fgets. getline is a better choice if available, since it handles memory allocation and does not have issues with NUL input.
Since you don't care about the actual content, why bother building a string? I'd also use getchar():
int c;
size_t s = 0;
while ((c = getchar()) != EOF)
{
s++;
}
printf("Size: %z\n", s);
This code will correctly handle cases where your file has '\0' characters in it.
Your problem is that you've never allocated c and content, so they're not pointing anywhere defined -- they're likely pointing to some unallocated memory, or something that doesn't exist at all. And then you're putting data into them. You need to allocate them first. (That's what a bus error typically means; you've tried to do a memory access that's not valid.)
(Alternately, since c is always holding just a single character, you can declare it as char c and pass &c to scanf. No need to declare a string of characters when one will do.)
Once you do that, you'll run into the issue of making sure that content is long enough to hold all the input. Either you need to have a guess of how much input you expect and allocate it at least that long (and then error out if you exceed that), or you need a strategy to reallocate it in a larger size if it's not long enough.
Oh, and you'll also run into the problem that strcat expects a string, not a single character. Even if you leave c as a char*, the scanf call doesn't make it a string. A single-character string is (in memory) a character followed by a null character to indicate the end of the string. scanf, when scanning for a single character, isn't going to put in the null character after it. As a result, strcpy isn't going to know where the end of the string is, and will go wandering off through memory looking for the null character.
The problem here is that you are referencing a pointer variable that no memory allocated via malloc, hence the results would be undefined, and not alone that, by using strcat on a undefined pointer that could be pointing to anything, you ended up with a bus error!
This would be the fixed code required....
char* content = malloc (100 * sizeof(char));
char c;
if (content != NULL){
content[0] = '\0'; // Thanks David!
while ((c = getchar()) != EOF)
{
if (strlen(content) < 100){
strcat(content, c);
content[strlen(content)-1] = '\0';
}
}
}
/* When done with the variable */
free(content);
The code highlights the programmer's responsibility to manage the memory - for every malloc there's a free if not, you have a memory leak!
Edit: Thanks to David Gelhar for his point-out at my glitch! I have fixed up the code above to reflect the fixes...of course in a real-life situation, perhaps the fixed value of 100 could be changed to perhaps a #define to make it easy to expand the buffer by doubling over the amount of memory via realloc and trim it to size...
Assuming that you want to get (shorter than MAXL-1 chars) strings and not to process your file char by char, I did as follows:
#include <stdio.h>
#include <string.h>
#define MAXL 256
main(){
char s[MAXL];
s[0]=0;
scanf("%s",s);
while(strlen(s)>0){
printf("Size of %s : %d\n",s,strlen(s));
s[0]=0;
scanf("%s",s);
};
}

Resources