obstack, gets and getline - c

I am trying to get a line from stdin. As far as I understand, we should never use gets, as the man page of gets says:
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and
because gets() will continue to store characters past the end of the
buffer, it is extremely dangerous to use. It has been used to
break computer security. Use fgets() instead.
It suggests that we can use fgets() instead. The problem with fgets() is that we don't know the size of the user input in advance, and fgets() reads at most one less than size characters from the stream, as the man page says:
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops
after an EOF or a newline. If a newline is read, it is stored into
the buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
There is also another approach: POSIX getline(), which uses realloc to grow the buffer, so we can read a string of arbitrary length from the input stream, as the man page says:
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
Finally, there is another approach, which is using obstack, as the libc manual says:
Aside from this one constraint of order of freeing, obstacks are
totally general: an obstack can contain any number of objects of
any size. They are implemented with macros, so allocation is
usually very fast as long as the objects are usually small. And the
only space overhead per object is the padding needed to start each
object on a suitable boundary...
So we can use obstack for any object of any size, and allocation is very fast, with a little space overhead that is not a big deal. I wrote this code to read an input string without knowing its length.
#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
int main(){
unsigned char c;
struct obstack * mystack;
mystack = (struct obstack *) malloc(sizeof(struct obstack));
obstack_init(mystack);
c = fgetc(stdin);
while(c!='\r' && c!='\n'){
obstack_1grow(mystack,c);
c = fgetc(stdin);
}
printf("the size of the stack is: %d\n",obstack_object_size(mystack));
printf("the input is: %s\n",(char *)obstack_finish(mystack));
return 0;
}
So my question is :
Is it safe to use obstack like this?
Is it like using POSIX getline?
Am I missing something here? Any drawbacks?
Why shouldn't I use it?
Thanks in advance.

fgets has no drawbacks over gets. It just forces you to acknowledge that you must know the size of the buffer. gets instead requires you to somehow magically know beforehand the length of the input a (possibly malicious) user is going to feed into your program. That is why gets was removed from the C standard in C11. It is now non-standard, while fgets is standard and portable.
As for knowing the length of the line beforehand, POSIX says that a utility must be prepared to handle lines that fit in buffers of LINE_MAX size (defined in <limits.h>). Thus you can do:
char line[LINE_MAX];
while (fgets(line, LINE_MAX, fp) != NULL)
and any file that produces problems with that is not a standard text file. In practice everything will be mostly fine if you just don't blindly assume that the last character in the buffer is always '\n' (which it isn't).
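The newline check mentioned above can be sketched as follows. This is only an illustration: count_lines and too_long are made-up names, and the fallback value for LINE_MAX is an assumption for systems whose <limits.h> does not define it.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef LINE_MAX
#define LINE_MAX 2048 /* assumed fallback; POSIX requires at least 2048 */
#endif

/* Counts the lines in fp. Sets *too_long to 1 if some line did not fit
   into a LINE_MAX-sized buffer, instead of assuming every read ends in '\n'. */
long count_lines(FILE *fp, int *too_long)
{
    char line[LINE_MAX];
    long count = 0;

    *too_long = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\n')
            count++;                 /* a complete line */
        else if (len == sizeof line - 1)
            *too_long = 1;           /* buffer filled without reaching '\n' */
        else
            count++;                 /* final line without a newline */
    }
    return count;
}
```

A caller would typically treat too_long as "not a standard text file" and either reject the input or switch to a growing buffer.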
getline is a POSIX standard function. obstack is a GNU libc extension and is not portable. getline was built for efficient reading of lines from files; obstack was not, it was built to be generic. With obstack, the string is not properly contiguous in memory or in its final place until you call obstack_finish.
Use getline if on POSIX, use fgets in programs that need to be maximally portable; look for an emulation of getline for non-POSIX platforms built on fgets.
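For the POSIX case, a minimal sketch of a getline wrapper might look like this. The name next_line is hypothetical, and stripping the trailing newline is a choice made here, not something getline does itself.

```c
#define _POSIX_C_SOURCE 200809L /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

/* Returns the next line without its trailing newline, or NULL at end of
   input or on error. The caller must free() the result. */
char *next_line(FILE *fp)
{
    char *line = NULL;  /* NULL and 0 tell getline() to allocate for us */
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    if (len < 0) {      /* EOF or error; the buffer may still be allocated */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```

When reading many lines, it is usually cheaper to keep one buffer and pass it to getline repeatedly, so that it is only reallocated when a longer line shows up.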

Why shouldn't I use it?
Well, you shouldn't use getline() if you care about portability. You should use getline() if you're specifically targeting only POSIX systems.
As for obstacks, they're specific to the GNU C library, which might already be a strong reason to avoid them (it further restricts portability). Also, they're not meant to be used for this purpose.
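If you want to experiment with obstacks anyway, the loop from the question needs a few changes: fgetc returns an int (storing it in an unsigned char makes EOF undetectable), the string must be terminated explicitly before obstack_finish, and the obstack has to be released. The following is a sketch assuming glibc's <obstack.h>; read_line_obstack is a made-up name, and copying the result out with malloc is just one way to give the caller an ordinary pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <obstack.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Reads one line (without the newline) by growing an object in an obstack,
   then copies it into a malloc'ed buffer that the caller must free(). */
char *read_line_obstack(FILE *fp)
{
    struct obstack ob;
    int c; /* int, so that EOF can be told apart from valid bytes */

    obstack_init(&ob);
    while ((c = fgetc(fp)) != EOF && c != '\n')
        obstack_1grow(&ob, (char)c);
    obstack_1grow(&ob, '\0');           /* obstack_finish adds no terminator */

    size_t n = obstack_object_size(&ob);
    char *in_obstack = obstack_finish(&ob);
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, in_obstack, n);

    obstack_free(&ob, NULL);            /* releases everything in the obstack */
    return copy;
}
```

Note that this does more work than getline for the same result, which is one more reason to prefer getline or fgets here.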
If you aim for portability, just use fgets(). It's not too complicated to write a function similar to getline() based on fgets() -- here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CHUNKSIZE 1024
char *readline(FILE *f)
{
size_t bufsize = CHUNKSIZE;
char *buf = malloc(bufsize);
if (!buf) return 0;
buf[0] = '\0'; // so that an immediate EOF yields an empty string
char *pos = buf;
size_t len = 0;
while (fgets(pos, CHUNKSIZE, f))
{
char *nl = strchr(pos, '\n');
if (nl)
{
// newline found, replace with string terminator
*nl = '\0';
char *tmp = realloc(buf, len + strlen(pos) + 1);
if (tmp) return tmp;
return buf;
}
// no newline, increase buffer size
len += strlen(pos);
char *tmp = realloc(buf, len + CHUNKSIZE);
if (!tmp)
{
free(buf);
return 0;
}
buf = tmp;
pos = buf + len;
}
// handle case when input ends without a newline
char *tmp = realloc(buf, len + 1);
if (tmp) return tmp;
return buf;
}
int main(void)
{
char *input = readline(stdin);
if (!input)
{
fputs("Error reading input!\n", stderr);
return 1;
}
puts(input);
free(input);
return 0;
}
This one removes the newline if it was found and returns a newly allocated buffer (which the caller has to free()). Adapt to your needs. It could be improved by increasing the buffer size only when the buffer was filled completely, with just a bit more code ...

Related

C redirect fprintf into a buffer or char array

I have the following function and I am wondering if there is a way to pass string or char array instead of stdout into it so I can get the printed representation as a string.
void print_Type(Type t, FILE *f)
{
fprintf(f,"stuff ...");
}
print_Type(t, stdout);
I have already tried this:
int SIZE = 100;
char buffer[SIZE];
print_Type(t, buffer);
But this is what I am seeing:
�����
Something like this
FILE* f = fmemopen(buffer, sizeof(buffer), "w");
print_Type(t, f);
fclose(f);
The fmemopen(void *buf, size_t size, const char *mode) function opens a stream. The stream allows I/O to be performed on the string or memory buffer pointed to by buf.
Yes, there is sprintf(); notice the leading s rather than f.
int SIZE = 100;
char buffer[SIZE];
sprintf(buffer, "stuff %d", 10);
This function prints to a string (the s) rather than a file (the f). It takes the same format string and arguments as fprintf(); the only difference is the destination, which must be a char array (either declared as an array or dynamically allocated, usually via malloc).
Note: This function is dangerous, as it does not check the length and can easily overrun the end of the buffer if you are not careful.
If you are using C99 or later, a better function is snprintf, which adds the buffer length check.
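As a small illustration of that length check, here is a sketch that reports truncation instead of overrunning the buffer. The helper name format_stuff is made up for this example.

```c
#include <stdio.h>

/* Formats "stuff <n>" into buf, which holds size bytes.
   Returns 0 on success, -1 if the text did not fit or formatting failed. */
int format_stuff(char *buf, size_t size, int n)
{
    /* snprintf returns the length the full text would need,
       not counting the terminating null byte */
    int needed = snprintf(buf, size, "stuff %d", n);

    if (needed < 0)
        return -1;              /* output error */
    if ((size_t)needed >= size)
        return -1;              /* truncated; buf holds a terminated prefix */
    return 0;
}
```

With a 100-byte buffer the call succeeds; with a 4-byte buffer it returns -1 and the buffer contains only "stu" plus the terminator.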
The problem with fmemopen is that it cannot resize the buffer. fmemopen did exist in Glibc for quite some time, but it was standardized only in POSIX.1-2008. But that revision included another function that handles dynamic memory allocation: open_memstream(3):
char *buffer = NULL;
size_t size = 0;
FILE* f = open_memstream(&buffer, &size);
print_Type(t, f);
fclose(f);
buffer will now point to a null-terminated buffer, with size bytes before the extra null terminator! I.e. if you didn't write any null bytes, then strlen(buffer) == size.
Thus the only merit of fmemopen is that it can be used to write to a memory buffer at a fixed location or of a fixed length, whereas open_memstream should be used everywhere else, where the location of the buffer does not matter.
fmemopen has yet another undesirable feature: writes may fail only when the buffer is flushed, not when they are made. Since the target is in memory, there is no point in buffering the writes, so if you choose to use fmemopen, the Linux manual page fmemopen(3) recommends disabling buffering with setbuf(f, NULL);
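Putting the open_memstream approach into a complete function might look like the sketch below. Since Type and print_Type are not shown in the question, print_value and value_to_string are stand-ins invented for this example; any function that writes to a FILE * can take their place.

```c
#define _POSIX_C_SOURCE 200809L /* expose open_memstream() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for print_Type(): writes a representation of v to f. */
static void print_value(int v, FILE *f)
{
    fprintf(f, "value=%d", v);
}

/* Returns a malloc'ed string with the printed form of v, or NULL on
   failure. The caller must free() it. */
char *value_to_string(int v)
{
    char *buffer = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buffer, &size);

    if (f == NULL)
        return NULL;
    print_value(v, f);
    /* buffer and size are only guaranteed up to date after fflush/fclose */
    if (fclose(f) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```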

Reading a line from file in C, dynamically

#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *input_f;
input_f = fopen("Input.txt", "r"); //Opens the file in read mode.
if (input_f != NULL)
{
char line[2048];
while( fgets(line, sizeof line, input_f) != NULL )
{
//do something
}
fclose(input_f); //Close the input file.
}
else
{
perror("File couldn't opened"); //Will print that file couldn't opened and why.
}
return 0;
}
Hi. I know I can read line by line with this code in C, but I don't want to limit line size, say like in this code with 2048.
I thought about using malloc, but I don't know the size of the line before I read it, so IMO it cannot be done.
Is there a way to avoid limiting the line size?
This question is just for my curiosity, thank you.
When you are allocating memory dynamically, you will want to change:
char line[2048];
to
#define MAXL 2048 /* the use of a define will become apparent when you */
size_t maxl = MAXL; /* need to check to determine if a realloc is needed */
char *line = malloc (maxl * sizeof *line);
if (!line) { /* always check to ensure allocation succeeded */
    perror ("malloc");
    return 1;
}
You read up to (maxl - 1) chars or a newline (if using fgetc, etc.), or read the line and then check whether line[strlen (line) - 1] == '\n' to determine whether you read the entire line (if using fgets). (POSIX requires that every line end with a newline.) If you read maxl - 1 characters without reaching a newline (fgetc) or did not read the newline (fgets), then it is a short read and more characters remain. Your choice is to realloc (generally doubling the size) and continue reading. To realloc:
char *tmp = realloc (line, 2 * maxl);
if (tmp) {
    line = tmp;
    maxl *= 2;
}
Note: never assign the result of realloc directly to your original pointer (e.g. line = realloc (line, 2 * maxl)), because if realloc fails it returns NULL while the original block stays allocated. Overwriting line then loses your only pointer to the data that existed in line, and the memory leaks. Also note that maxl is typically doubled each time you realloc. However, you are free to choose whatever size increasing scheme you like. (If you are concerned about zeroing all new memory allocated, you can use memset to initialize the newly allocated space to zero. This is useful in some situations where you want to ensure your line is always null-terminated.)
That is the basic dynamic allocation/reallocation scheme. Note you are reading until you read the complete line, so you will need to restructure your loop test. And lastly, since you allocated the memory, you are responsible for freeing the memory when you are done with it. A tool you cannot live without is valgrind (or similar memory checker) to confirm you are not leaking memory.
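Put together, the scheme described above could look roughly like this sketch. The function name read_line_doubling and the deliberately small starting capacity are choices made for the example, not part of the original answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from fp with fgets(), doubling the buffer
   whenever it fills up. The trailing newline is kept. Returns NULL at end
   of input, or if memory runs out. The caller must free() the result. */
char *read_line_doubling(FILE *fp)
{
    size_t cap = 16;            /* small on purpose so that growth happens */
    size_t len = 0;
    char *line = malloc(cap);

    if (!line)
        return NULL;
    while (fgets(line + len, (int)(cap - len), fp)) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;        /* complete line */
        if (len + 1 == cap) {   /* buffer full and the line continues */
            char *tmp = realloc(line, cap * 2);
            if (!tmp) {         /* the old block is still valid: free it */
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
    }
    if (len == 0) {             /* nothing was read before EOF or an error */
        free(line);
        return NULL;
    }
    return line;                /* last line had no trailing newline */
}
```

The loop test is restructured as the answer suggests: it ends when a newline has been stored or when fgets reports end of input.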
Tip: if you are reading and want to ensure your string is always null-terminated, then after allocating your block of memory, zero (0) all characters. As mentioned earlier, memset is available, but if you choose calloc instead of malloc it will zero the memory for you. However, on realloc the new space is NOT zeroed either way, so calling memset on the newly added part is required regardless of what function originally allocated the block.
Tip 2: Look at the POSIX getline. getline will handle the allocation/reallocation needed so long as line is initialized to NULL and the size variable to 0. getline also returns the number of characters actually read, dispensing with the need to call strlen after fgets to determine the same.
Let me know if you have additional questions.
Consider 2 thoughts:
An upper bound of allocated memory is reasonable. The nature of the task should have some idea of a maximum line length, be it 80, 1024 or 1 Mbyte.
With a clever OS, actual usage of allocated memory may not occur until needed. See Why is malloc not "using up" the memory on my computer?
So let code allocate 1 big buffer to limit pathological cases and let the underlying memory management (re-)allocate real memory as needed.
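#define N (1000000)
char *buf = malloc(N);
...
while (fgets(buf, N, stdin) != NULL) {
    size_t len = strlen(buf);
    /* a full buffer that does not end in a newline means the line was cut off */
    if (len == N-1 && buf[len-1] != '\n') {
        fputs("Excessively long line\n", stderr);
        exit(EXIT_FAILURE);
    }
}
free(buf);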

Reading an unknown length line from stdin in c with fgets

I am trying to read an unknown length line from stdin using the C language.
I have seen this when looking on the net:
char** str;
gets(&str);
But it seems to cause me some problems and I don't really understand how it is possible to do it this way.
Can you explain why this example works or doesn't work,
and what the correct way to implement it would be (with malloc?)
You don't want a pointer to pointer to char, use an array of chars
char str[128];
or a pointer to char
char *str;
if you choose a pointer you need to reserve space using malloc
str = malloc(128);
Then you can use fgets
fgets(str, 128, stdin);
and remove the trailing newline
char *ptr = strchr(str, '\n');
if (ptr != NULL) *ptr = '\0';
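To read an arbitrarily long line, you can use getline (originally a GNU libc extension, standardized in POSIX.1-2008):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
char *foo(FILE * f)
{
    size_t n = 0;     /* getline expects a size_t, not an int */
    ssize_t result;
    char *buf = NULL; /* must be NULL (with n == 0) so that getline allocates */
    result = getline(&buf, &n, f);
    if (result < 0) {
        free(buf);    /* getline may have allocated even though it failed */
        return NULL;
    }
    return buf;
}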
or your own implementation using fgets and realloc (called read_line here, because on POSIX systems <stdio.h> already declares a getline with a different signature):
char *read_line(FILE * f)
{
    size_t size = 0;
    size_t len = 0;
    char *buf = NULL;
    do {
        size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
        char *tmp = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        /* Actually do the read. Note that fgets puts a terminal '\0' on the
           end of the string, so we read at buf + len to overwrite it */
        if (fgets(buf + len, BUFSIZ, f) == NULL) {
            if (len == 0) { /* EOF or error before any data was read */
                free(buf);
                return NULL;
            }
            break;
        }
        len += strlen(buf + len);
    } while (len > 0 && buf[len - 1] != '\n');
    return buf;
}
Call it using
char *str = read_line(stdin);
if (str == NULL) {
    perror("read_line");
    exit(EXIT_FAILURE);
}
...
free(str);
Firstly, gets() provides no way of preventing a buffer overrun. That makes it so dangerous it has been removed from the latest C standard. It should not be used. However, the usual usage is something like
char buffer[20];
gets(buffer); /* pray that user enters no more than 19 characters in a line */
Your usage is passing gets() a pointer to a pointer to a pointer to char. That is not what gets() expects, so your code would not even compile.
That element of prayer reflected in the comment is why gets() is so dangerous. If the user enters 20 (or more) characters, gets() will happily write data past the end of buffer. There is no way a programmer can prevent that in code (short of accessing hardware to electrocute the user who enters too much data, which is outside the realm of standard C).
To answer your question, however, the only ways involve allocating a buffer of some size, reading data in some controlled way until that size is reached, reallocating if needed to get a greater size, and continuing until a newline (or end-of-file, or some other error condition on input) is encountered.
malloc() may be used for the initial allocation. malloc() or realloc() may be used for the reallocation (if needed). Bear in mind that a buffer allocated this way must be released (using free()) when the data is no longer needed - otherwise the result is a memory leak.
Use the getline() function. It returns the length of the line and stores a pointer to the contents of the line in an allocated memory area. (Be sure to pass the line pointer to free() when done with it.)
"Reading an unknown length line from stdin in c with fgets"
Late response - A Windows approach:
The OP does not specify Linux or Windows, but the viable answers posted in response for this question all seem to have the getline() function in common, which is POSIX only. Functions such as getline() and popen() are very useful and powerful but sadly are not included in Windows environments.
Consequently, implementing such a task in a Windows environment requires a different approach. The link here describes a method that can read input from stdin and has been tested up to 1.8 gigabytes on the system it was developed on (also described in the link). The simple code snippet below was tested using the following command line to read large quantities on stdin:
cd c:\dev && dir /s // approximately 1.8Mbyte buffer is returned on my system
Simple example:
#include "cmd_rsp.h"
int main(void)
{
char *buf = {0};
buf = calloc(100, 1);//initialize buffer to some small value
if(!buf)return 0;
cmd_rsp("dir /s", &buf, 100);//recursive directory search on Windows system
printf("%s", buf);
free(buf);
return 0;
}
cmd_rsp() is fully described in the links above, but it is essentially a Windows implementation that includes popen() and getline() like capabilities, packaged up into this very simple function.
If you want to read a string of unknown length, try the following code. Note that %s reads a single whitespace-delimited word, not a whole line.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char *m = NULL;
    printf("please input a string\n");
    /* with the m modifier, scanf allocates the buffer itself */
    if (scanf("%ms", &m) != 1)
        fprintf(stderr, "Could not read a string!\n");
    else
    {
        printf("this is the string %s\n", m);
        /* ... any other use of m */
        free(m);
    }
    return 0;
}
Note that %ms is specified by POSIX.1-2008 and supported by glibc, while the older %as is a GNU extension; neither is part of standard C.

Proper memory allocation?

How would I only allocate as much memory as really needed without knowing how big the arguments to the function are?
Usually, I would use a fixed size, and calculate the rest with sizeof (note: the code isn't supposed to make sense, but to show the problem):
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
int test(const char* format, ...)
{
char* buffer;
int bufsize;
int status;
va_list arguments;
va_start(arguments, format);
bufsize = 1024; /* fixed size */
bufsize = sizeof(arguments) + sizeof(format) + 1024;
buffer = (char*)malloc(bufsize);
status = vsprintf(buffer, format, arguments);
fputs(buffer, stdout);
va_end(arguments);
return status;
}
int main()
{
const char* name = "World";
test("Hello, %s\n", name);
return 0;
}
However, I don't think this is the way to go... so, how would I calculate the required buffersize properly here?
If you have vsnprintf available to you, I would make use of that. It prevents buffer overflow since you provide the buffer size, and it returns the actual size needed.
So allocate your 1K buffer, then attempt to use vsnprintf to write into that buffer, limiting the size. If the size returned was less than your buffer size, then it worked and you can just use the buffer. (The return value does not count the terminating null byte, so a value equal to the buffer size means the output was truncated.)
If the size returned was greater than or equal to the buffer size, then call realloc to get a bigger buffer (the returned size plus one) and try it again. A va_list cannot be reused once it has been passed to vsnprintf, so make a copy with va_copy beforehand, or restart it with va_end and va_start. Provided the data hasn't changed (e.g., threading issues), the second call will work fine since you already know how big the output will be.
This is relatively efficient provided you choose your default buffer size carefully. If the vast majority of your outputs are within that limit, very few reallocations have to take place (see below for a possible optimisation).
If you don't have a vsnprintf-type function, a trick we've used before is to open a file handle to /dev/null and use that for the same purpose (checking the size before outputting to a buffer). Use vfprintf to that file handle to get the size (the output goes to the bit bucket), then allocate enough space based on the return value, and vsprintf to that buffer. Again, it should be large enough since you've figured out the needed size.
An optimisation to the methods above would be to use a local buffer, rather than an allocated buffer, for the 1K chunk. This avoids having to use malloc in those situations where it's unnecessary, assuming your stack can handle it.
In other words, use something like:
int test(const char* format, ...)
{
char buff1k[1024];
char *buffer = buff1k; // default to local buffer, no malloc.
:
int need = 1 + vsnprintf (buffer, sizeof (buff1k), format, arguments);
if (need > sizeof (buff1k)) {
buffer = malloc (need);
// Now you have a big-enough buffer, vsprintf into there.
}
// Use string at buffer for whatever you want.
...
// Only free buffer if it was allocated.
if (buffer != buff1k)
free (buffer);
}
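For comparison, here is a complete sketch of a variation on the same idea. Instead of trying a 1K local buffer first, it asks vsnprintf for the required length up front (C99 allows a NULL buffer with size 0) and then allocates exactly that much. The name format_alloc is invented for this example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'ed string holding the formatted output, or NULL on
   failure. The caller must free() it. */
char *format_alloc(const char *format, ...)
{
    va_list args, args_copy;
    char *buffer = NULL;

    va_start(args, format);
    va_copy(args_copy, args);   /* a va_list cannot be reused after a v*printf call */

    /* First pass: size 0 writes nothing and returns the length needed. */
    int needed = vsnprintf(NULL, 0, format, args);
    va_end(args);

    if (needed >= 0)
        buffer = malloc((size_t)needed + 1);
    if (buffer != NULL)
        vsnprintf(buffer, (size_t)needed + 1, format, args_copy);
    va_end(args_copy);
    return buffer;
}
```

This trades the stack buffer optimisation for simplicity: every call formats twice and allocates once.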

How to read the standard input into string variable until EOF in C?

I am getting "Bus Error" trying to read stdin into a char* variable.
I just want to read whole stuff coming over stdin and put it first into a variable, then continue working on the variable.
My Code is as follows:
char* content;
char* c;
while( scanf( "%c", c)) {
strcat( content, c);
}
fprintf( stdout, "Size: %d", strlen( content));
But somehow I always get "Bus error" returned by calling cat test.txt | myapp, where myapp is the compiled code above.
My question is how do i read stdin until EOF into a variable? As you see in the code, I just want to print the size of input coming over stdin, in this case it should be equal to the size of the file test.txt.
I thought just using scanf would be enough, maybe buffered way to read stdin?
First, you're passing uninitialized pointers, which means scanf and strcat will write memory you don't own. Second, strcat expects two null-terminated strings, while c is just a character. This will again cause it to read memory you don't own. You don't need scanf, because you're not doing any real processing. Finally, reading one character at a time is needlessly slow. Here's the beginning of a solution, using a resizable buffer for the final string, and a fixed buffer for the fgets call
#define BUF_SIZE 1024
char buffer[BUF_SIZE];
size_t contentSize = 1; // includes the terminating NUL
/* Preallocate space. We could just allocate one char here,
but that wouldn't be efficient. */
char *content = malloc(sizeof(char) * BUF_SIZE);
if(content == NULL)
{
perror("Failed to allocate content");
exit(1);
}
content[0] = '\0'; // make null-terminated
while(fgets(buffer, BUF_SIZE, stdin))
{
char *old = content;
contentSize += strlen(buffer);
content = realloc(content, contentSize);
if(content == NULL)
{
perror("Failed to reallocate content");
free(old);
exit(2);
}
strcat(content, buffer);
}
if(ferror(stdin))
{
free(content);
perror("Error reading from stdin.");
exit(3);
}
EDIT: As Wolfer alluded to, a NUL byte in your input will cause the string to be terminated prematurely when using fgets. getline is a better choice if available, since it handles memory allocation and does not have issues with NUL input.
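If getline is not available and the input may contain NUL bytes, another option is to read raw blocks with fread and track the length yourself. The sketch below assumes that is acceptable for your use; read_all is a made-up name, and the terminator it appends is only a convenience for inputs without embedded NUL bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp until EOF. On success returns a malloc'ed
   buffer, stores the byte count in *out_len, and appends a '\0'.
   Returns NULL on read error or if memory runs out. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);

    if (!data)
        return NULL;
    for (;;) {
        /* always keep one byte spare for the terminator */
        size_t got = fread(data + len, 1, cap - len - 1, fp);
        len += got;
        if (got == 0)
            break;              /* EOF or error */
        if (len + 1 == cap) {   /* full: double the capacity */
            char *tmp = realloc(data, cap * 2);
            if (!tmp) {
                free(data);
                return NULL;
            }
            data = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(data);
        return NULL;
    }
    data[len] = '\0';
    *out_len = len;
    return data;
}
```

Because the length is returned separately, the size printed for cat test.txt | myapp matches the file size even when the file contains '\0' bytes.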
Since you don't care about the actual content, why bother building a string? I'd also use getchar():
int c;
size_t s = 0;
while ((c = getchar()) != EOF)
{
s++;
}
printf("Size: %zu\n", s);
This code will correctly handle cases where your file has '\0' characters in it.
Your problem is that you've never allocated c and content, so they're not pointing anywhere defined -- they're likely pointing to some unallocated memory, or something that doesn't exist at all. And then you're putting data into them. You need to allocate them first. (That's what a bus error typically means; you've tried to do a memory access that's not valid.)
(Alternately, since c is always holding just a single character, you can declare it as char c and pass &c to scanf. No need to declare a string of characters when one will do.)
Once you do that, you'll run into the issue of making sure that content is long enough to hold all the input. Either you need to have a guess of how much input you expect and allocate it at least that long (and then error out if you exceed that), or you need a strategy to reallocate it in a larger size if it's not long enough.
Oh, and you'll also run into the problem that strcat expects a string, not a single character. Even if you leave c as a char*, the scanf call doesn't make it a string. A single-character string is (in memory) a character followed by a null character to indicate the end of the string. scanf, when scanning for a single character, isn't going to put in the null character after it. As a result, strcat isn't going to know where the end of the string is, and will go wandering off through memory looking for the null character.
The problem here is that you are dereferencing pointer variables that have had no memory allocated to them (for example via malloc), so the results are undefined. On top of that, calling strcat on an uninitialized pointer, which could be pointing anywhere, is what gave you the bus error!
This would be the fixed code required....
char* content = malloc (100 * sizeof(char));
int c; /* int, so that EOF can be told apart from valid characters */
size_t len = 0;
if (content != NULL){
    content[0] = '\0'; // Thanks David!
    while ((c = getchar()) != EOF)
    {
        if (len < 99){ /* leave room for the terminator */
            content[len++] = (char)c;
            content[len] = '\0';
        }
    }
}
/* When done with the variable */
free(content);
The code highlights the programmer's responsibility to manage memory: for every malloc there must be a free; if not, you have a memory leak!
Edit: Thanks to David Gelhar for his point-out at my glitch! I have fixed up the code above to reflect the fixes...of course in a real-life situation, perhaps the fixed value of 100 could be changed to perhaps a #define to make it easy to expand the buffer by doubling over the amount of memory via realloc and trim it to size...
Assuming that you want to get strings (of at most MAXL-1 chars) and not process your file char by char, I did as follows:
#include <stdio.h>
#include <string.h>
#define MAXL 256
int main(void){
    char s[MAXL];
    /* %255s limits each word to MAXL-1 chars plus the terminator */
    while(scanf("%255s",s)==1){
        printf("Size of %s : %zu\n",s,strlen(s));
    }
    return 0;
}

Resources