Reading from stdin (file of variable length)


So I've been trying to get this assignment to work in various ways, but each time I get different errors. Basically, we have a program that needs to read, byte by byte, the contents of a file that will be piped in (the file could be humongous, so we can't just call malloc and allocate one large chunk of space). We are required to use realloc to expand the amount of allocated memory until we reach the end of the file. The final result should be one long C string (array) containing each byte (and we can't disregard null bytes either if they are part of the file). What I have at the moment is:
char *buff;
int n = 0;
char c;
int count;
if (ferror (stdin))
{
fprintf(stderr, "error reading file\n");
exit (1);
}
else
{
do {
buff = (char*) realloc (buff, n+1);
c = fgetc (stdin);
buff[n] = c;
if (c != EOF)
n++;
}
while (c != EOF);
}
printf("characters entered: ");
for (count = 0; count < n; count++)
printf("%s ", buff[count]);
free (buff);
It should keep reading until the end of the file, expanding the memory each time but when I try to run it by piping in a simple text file, it tells me I have a segmentation fault. I'm not quite sure what I'm doing wrong.
Note that we're allowed to use malloc and whatnot, but I couldn't see how to make that work since we have no idea how much memory is needed.

You are passing an unassigned pointer, buff, to your first call to realloc. Change the declaration to
char *buff = malloc(100);
to avoid this problem.
Once you get it working, you'll notice that your program is rather inefficient, with a realloc per character. Consider realloc-ing in larger chunks to reduce the number of reallocations.
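A minimal sketch of the chunked approach, assuming a hypothetical helper named read_all and an arbitrary CHUNK of 4096 bytes (neither comes from the question). It reports the byte count separately because the input may contain NUL bytes, so strlen cannot be trusted on the result:

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096  /* assumed growth step, not from the question */

/* Hypothetical helper: reads all of `in` into a malloc'd buffer.
 * Stores the byte count in *out_len; returns NULL on allocation failure
 * or read error. A '\0' is appended after the data for convenience. */
char *read_all(FILE *in, size_t *out_len)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;
    int c;                       /* int, so EOF is distinguishable */

    for (;;) {
        if (len + 1 >= cap) {    /* keep room for this byte and the '\0' */
            char *grown = realloc(buf, cap + CHUNK);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap += CHUNK;
        }
        c = fgetc(in);
        if (c == EOF)
            break;
        buf[len++] = (char)c;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

A caller would pass stdin and a size_t variable, then free the returned pointer when done.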

char* buff;
...
buff = (char*) realloc (buff, n+1);
You're trying to reallocate an uninitialized pointer, which leads to undefined behaviour. Change to
char* buff = 0;
...
buff = (char*) realloc (buff, n+1);
But as has been pointed out, this is very inefficient.

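It seems the answers by @dasblinkenlight and @smocking cover the cause of the current crash, but to avoid the next ones:
Change char c; to int c;. fgetc returns an int so that EOF, a negative value, can be told apart from every valid byte value; a char cannot represent both.
It is a bad idea to call realloc for one char at a time. Instead, increase the size by X bytes (say 100) each time; this will be MUCH more efficient.
If you want to print the buffer with %s, you need to add the null terminator ('\0') at the end of it; otherwise printf() has undefined behavior. Note also that printf("%s ", buff[count]) passes a single char where %s expects a char *; use %c to print one character.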
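To illustrate the int-versus-char point, here is a small sketch with a hypothetical count_bytes helper (the name is an assumption, not from the thread). With char c, a 0xFF byte would either compare equal to EOF (where char is signed) or EOF would never match (where char is unsigned); with int c neither happens:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a stream.
 * `c` is an int because fgetc returns either an unsigned char value
 * (0..UCHAR_MAX) or the negative value EOF; a char cannot hold both ranges. */
size_t count_bytes(FILE *in)
{
    size_t n = 0;
    int c;
    while ((c = fgetc(in)) != EOF)
        n++;                 /* a 0xFF byte arrives as 255, not as EOF */
    return n;
}
```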

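Here's what I came up with for reading stdin into a char[] or char* (including when stdin has embedded NUL bytes):
char *content = NULL;
int c; /* int, not char, so EOF can be told apart from any byte */
size_t contentSize = 0;
while ((c = fgetc(stdin)) != EOF) {
    /* Grow by one byte; keep the old pointer until realloc succeeds. */
    char *grown = realloc(content, contentSize + 1);
    if (grown == NULL) {
        perror("realloc failed");
        free(content);
        exit(2);
    }
    content = grown;
    content[contentSize++] = (char)c; /* store at the old end, then count it */
}
for (size_t i = 0; i < contentSize; ++i) {
    printf("%c", content[i]);
}
free(content);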


C - Copying text from a file results in unknown characters being copied over as well

When running the following C file, copying the character from fgetc to my tmp pointer results in unknown characters being copied over for some reason. The characters received from fgetc() are the expected characters. However, when I assign them through my tmp pointer, unknown characters get copied over as well.
I've tried looking online for the reason, but haven't had any luck. From what I have read, it could be something to do with UTF-8 and ASCII issues. However, I'm not sure about the fix. I'm a relatively new C programmer and still new to memory management.
Output:
TMP: Hello, DATA!�
TEXT: Hello, DATA!�
game.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <allegro5/allegro5.h>
#include <allegro5/allegro_font.h>
const int WIN_WIDTH = 1366;
const int WIN_HEIGHT = 768;
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
int main(int argc, char **argv) {
al_init();
al_install_keyboard();
ALLEGRO_TIMER* timer = al_create_timer (1.0 / 30.0);
ALLEGRO_EVENT_QUEUE *queue = al_create_event_queue();
ALLEGRO_DISPLAY* display = al_create_display(WIN_WIDTH, WIN_HEIGHT);
ALLEGRO_FONT* font = al_create_builtin_font();
al_register_event_source(queue, al_get_keyboard_event_source());
al_register_event_source(queue, al_get_display_event_source(display));
al_register_event_source(queue, al_get_timer_event_source(timer));
int redraw = 1;
ALLEGRO_EVENT event;
al_start_timer(timer);
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
while (1) {
al_wait_for_event(queue, &event);
if (event.type == ALLEGRO_EVENT_TIMER)
redraw = 1;
else if ((event.type == ALLEGRO_EVENT_KEY_DOWN) || (event.type == ALLEGRO_EVENT_DISPLAY_CLOSE))
break;
if (redraw && al_is_event_queue_empty(queue)) {
al_clear_to_color(al_map_rgb(0, 0, 0));
al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, text);
al_flip_display();
redraw = false;
}
}
free(text);
al_destroy_font(font);
al_destroy_display(display);
al_destroy_timer(timer);
al_destroy_event_queue(queue);
return 0;
}
game.DATA file:
Hello, DATA!
What I use to run the program:
gcc game.c -o game $(pkg-config allegro-5 allegro_font-5 --libs --cflags)
--EDIT--
I tried taking the file-reading code and running it in a new C file. For some reason it works there, but not in the game.c file with the Allegro code.
test.c:
#include <stdlib.h>
#include <stdio.h>
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
void main() {
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
free(text);
return 0;
}
Produces the correct output always:
TMP: Hello, DATA!
TEXT: Hello, DATA!

When you write a loop that updates various things each time through, like you do with tmpSize in your loop here, it's important to have a handle on what the theoretical computer science types call your "loop invariants". That is, what is it that's true each time through the loop? It's important not only to maintain your loop invariants properly, but also to pick your loop invariants so that they're easy to maintain, and easy for a later reader to understand and to verify.
Since tmpSize starts out as 1, I'm guessing your loop invariant is trying to be, "tmpSize is always one more than the size of the string I've read so far". A reason for picking that slightly-strange loop invariant is, of course, that you'll need that extra byte for the terminating \0. The other clue is that you're setting tmp[tmpSize-1] = c;.
But here's the first problem. When we exit the loop, and if tmpSize is still one more than the size of the string you've read so far, let's see what happens. Suppose we read three characters. So tmpSize should be 4. So we'll set tmp[4] = 0;. But wait! Remember, arrays in C are 0-based. So the three characters we read are in tmp[0], tmp[1], and tmp[2], and we want the terminating \0 character to go into tmp[3], not tmp[4]. Something is wrong.
But actually, it's worse than that. I wasn't at all sure I understood the loop invariant, so I cheated, and inserted a few debugging printouts. Right before the realloc call, I added
printf("realloc %zu\n", tmpSize);
and at the end, right before the tmp[tmpSize] = 0; line, I added
printf("final %zu\n", tmpSize);
The last few lines it printed (while reading a game.DATA file containing "Hello, DATA!" just like yours) were:
...
realloc 10
realloc 11
realloc 12
final 13
But this is off by two! If the last reallocation gave the array a size of 12, the valid indices are from 0 to 11. But somehow we end up writing the \0 into cell 13.
It took me a while to figure it out, but the second problem is that you do the reallocation at the top of the loop, before you've incremented tmpSize.
To me, the loop invariant of "one more than the size of the string read so far" is just too hard to think about. I very much prefer to use a loop invariant where the "size" variable keeps track of the number of characters I have read, not +1 or -1 off of that. Let's see how that loop might look. (I've also cleaned up a few other things.)
size_t tmpSize = 0;
char *tmp = malloc(tmpSize+1);
if (tmp == NULL) {
printf("malloc() failed.\n");
exit(1);
}
for (int c = getc(file); c != EOF; c = getc(file)) {
printf("realloc %zu\n", tmpSize+1+1);
tmp = realloc(tmp, tmpSize+1+1); /* +1 for c, +1 for \0 */
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
tmp[tmpSize] = c;
tmpSize++;
}
printf("final %zu\n", tmpSize);
tmp[tmpSize] = '\0';
There's still something fishy here -- I said I didn't like "fudge factors" like +1, and here I've got two -- but at least now the debugging printouts go
...
realloc 11
realloc 12
realloc 13
final 12
so it looks like I'm not overrunning the allocated memory any more.
To make this even better, I want to take a slightly different approach. You're not supposed to worry about efficiency at first, but I can tell you that a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient. So let's make a few more changes:
size_t nchAllocated = 0;
size_t nchRead = 0;
char *tmp = NULL;
for (int c = getc(file); c != EOF; c = getc(file)) {
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
Now there are two separate variables: nchAllocated keeps track of exactly how many characters I've allocated, and nchRead keeps track of exactly how many characters I've read. And although I've doubled the number of "counter" variables, in doing so I've simplified a lot of other things, so I think it's a net improvement.
First of all, notice that there are no +1 fudge factors any more, at all.
Second, this loop doesn't call realloc every time -- instead it allocates characters 10 at a time. And because there are separate variables for the number of characters allocated versus read, it can keep track of the fact that it may have allocated more characters than it has read so far. For this code, the debugging printouts are:
realloc 10
realloc 20
final 12
Another little improvement is that we don't have to "preallocate" the array -- there's no initial malloc call. One of our loop invariants is that nchAllocated is the number of characters allocated, and we start this out as 0, and if there are no characters allocated, then it's okay that tmp starts out as NULL. This relies on the fact that when you call realloc for the first time, with tmp equal to NULL, realloc is fine with that, and essentially acts like malloc.
But there's one question you might be asking: If I got rid of all my fudge factors, where do we arrange to allocate one extra byte to hold the terminating \0 character? It's there, but it's subtle: it's lurking in the test
if(nchAllocated <= nchRead)
The very first time through the loop, nchAllocated will be 0, and nchRead will be 0, but this test will be true, so we'll allocate our first chunk of 10 characters, and we're off and running. (If we didn't care about the \0 character, the test nchAllocated < nchRead would have sufficed.)
...But, actually, I've made a mistake! There's a subtle bug here!
What if the file being read is empty? tmp will start out NULL, and we'll never make any trips through the loop, so tmp will remain NULL, so when we assign tmp[nchRead] = 0 it'll blow up.
And actually, it's worse than that. If you trace through the logic very carefully, you'll find that any time the file size is an exact multiple of 10, not enough space gets allocated for the \0, after all.
And this indicates a significant drawback of the "allocate characters 10 at a time" scheme. The code is now harder to test, because the control flow is different for files whose size is a multiple of 10. If you never happen to test that case, you won't realize that this program has a bug in it.
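One way to make that boundary case visible without hunting for the right test file is to replay the bookkeeping alone. The sketch below uses two hypothetical helpers (the names are assumptions) that mirror the nchAllocated/nchRead logic of the grow-by-10 loop without touching any memory:

```c
#include <stddef.h>

/* Hypothetical helper: replays the "grow by 10 when nchAllocated <= nchRead"
 * bookkeeping for a file of file_len bytes and returns the final capacity. */
size_t capacity_after_reading(size_t file_len)
{
    size_t nchAllocated = 0;
    size_t nchRead = 0;
    while (nchRead < file_len) {      /* one iteration per byte read */
        if (nchAllocated <= nchRead)
            nchAllocated += 10;
        nchRead++;
    }
    return nchAllocated;
}

/* Returns 1 if there is room for the terminating '\0', 0 otherwise. */
int has_room_for_terminator(size_t file_len)
{
    return capacity_after_reading(file_len) >= file_len + 1;
}
```

For a 12-byte file the capacity ends at 20, so the terminator fits; for lengths 0, 10 and 20 the check comes out false, which is exactly the bug described.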
The way I usually fix this is to notice that the \0 byte I have to add to terminate the string is sort of balanced by the EOF value I read that indicated the end of the file. Maybe, when I read the EOF, I can use it to remind me to allocate space for the \0. That's actually easy enough to do, and it looks like this:
int c;
while(1) {
c = getc(file);
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
if(c == EOF)
break;
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
The trick here is that we don't test for EOF until after we've checked that there's enough space in the buffer, and called realloc if necessary. It's as if we allocate space in the buffer for the EOF -- except then we use that space for the \0 instead. This is what I meant by "use it to remind me to allocate space for the \0".
Now, I have to admit that there's still a drawback here, in that the loop is now somewhat unconventional. A loop that has while(1) at the top looks like an infinite loop. This one has
if(c == EOF) break;
down in the middle of it, so it is literally a "break in the middle" loop. (This is by contrast to conventional for and while loops, which are "break at the top", or a do/while loop, which is "break at the bottom".) Personally, I find this to be a useful idiom, and I use it all the time. But some programmers, and perhaps your instructor, would frown on it, because it's "weird", it's "different", it's "unconventional". And to some extent they're right: unconventional programming is somewhat dangerous programming, and is bad if later maintenance programmers can't understand it because they don't recognize or don't understand the idioms in it. (It's sort of the programming equivalent of the English word "ain't", or a split infinitive.)
Finally, if you're still with me, I have one more point to make. (And if you are still with me, thank you. I realize this answer has gotten very long, but I hope you're learning something.)
Earlier I said that "a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient." It turns out that a loop that makes the buffer bigger by 10 isn't much better, and can still be significantly inefficient. You can do a little better by incrementing it by 50 or 100, but if you're dealing with input that might be really big (thousands of characters or more), you're usually better off increasing the buffer size by leaps and bounds, perhaps by multiplying it by some factor, rather than adding. So here's the final version of that part of the loop:
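if(nchAllocated <= nchRead) {
    if(nchAllocated == 0) nchAllocated = 10;   /* first trip: 0 * 2 would stay 0 */
    else nchAllocated *= 2;
    printf("realloc %zu\n", nchAllocated);
    tmp = realloc(tmp, nchAllocated);
    if (tmp == NULL) {
        printf("realloc() failed.\n");
        exit(1);
    }
}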
And even this improvement -- multiplying by 2, rather than adding something -- comes with a cost: we need an extra test, to special-case the first trip through the loop, because nchAllocated started out as 0, and 0 × 2 = 0.
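To get a feel for the difference, the number of realloc calls can be counted without reading any file. This is a sketch with two hypothetical counting functions (the names are assumptions); the step of 10 and the starting size of 10 mirror the code above:

```c
#include <stddef.h>

/* Number of realloc calls needed to reach a capacity of at least n bytes
 * when the buffer grows by a fixed step (step must be > 0). */
size_t reallocs_additive(size_t n, size_t step)
{
    size_t cap = 0, calls = 0;
    while (cap < n) {
        cap += step;
        calls++;
    }
    return calls;
}

/* Same, but the capacity starts at `initial` (must be > 0) and doubles
 * on every later call. */
size_t reallocs_doubling(size_t n, size_t initial)
{
    size_t cap = 0, calls = 0;
    while (cap < n) {
        cap = (cap == 0) ? initial : cap * 2;
        calls++;
    }
    return calls;
}
```

For one million bytes, adding 10 at a time takes 100000 calls, while doubling from 10 takes 18.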

Your reallocation scheme is incorrect: the array is always too short by one byte and the null terminator is written one position past the end of the string, instead of at the end of the string. This causes an extra byte to be printed, with whatever value happens to be in memory in the block returned by realloc(), which is uninitialized.
It is less confusing to use tmpLen as the length of the string read so far and to allocate 2 extra bytes: one for the newly read character and one for the null terminator.
Furthermore, the test c != NULL makes no sense: c is a byte value and NULL is a pointer. Similarly, tmp[tmpSize - 1] = (char *)c; is incorrect: you should just write
tmp[tmpSize - 1] = c;
Here is a corrected version:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
char *tmp = (char *)malloc(tmpLen + 1);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
char *new_tmp = (char *)realloc(tmp, tmpLen + 2);
if (new_tmp == NULL) {
printf("realloc() failure for %zu bytes.\n", tmpLen + 2);
free(tmp);
fclose(file);
return NULL;
}
tmp = new_tmp;
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
It is usually better to reallocate in chunks or with a geometric size increment. Here is a simple implementation:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
size_t tmpSize = 16;
char *tmp = (char *)malloc(tmpSize);
char *newTmp;
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
if (tmpSize - tmpLen < 2) {
size_t newSize = tmpSize + tmpSize / 2;
newTmp = (char *)realloc(tmp, newSize);
if (newTmp == NULL) {
printf("realloc() failure for %zu bytes.\n", newSize);
free(tmp);
fclose(file);
return NULL;
}
tmpSize = newSize;
tmp = newTmp;
}
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
// try to shrink allocated block to the minimum size
// if realloc() fails, return the current block
// it seems impossible for this reallocation to fail
// but the C Standard allows it.
newTmp = (char *)realloc(tmp, tmpLen + 1);
return newTmp ? newTmp : tmp;
}

reading an unbounded line from the console with scanf

I need to read a finite yet unbounded-in-length string.
We learned only about scanf so I guess I cannot use fgets.
Anyway, I've run this code on an input with length larger than 5.
char arr[5];
scanf("%s", arr);
char *s = arr;
while (*s != '\0')
printf("%c", *s++);
scanf keeps scanning and writing the overflowed part, but it seems like a hack. Is that good practice? If not, how should I read it?
Note: We have learned about the alloc functions family.

Buffer overflows are a plague, among the most famous and yet most elusive bugs. So you should definitely not rely on them.
Since you've learned about malloc() and friends, I suppose you're expected to make use of them.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
// Array growing step size
#define CHUNK_SIZE 8
int main(void) {
size_t arrSize = CHUNK_SIZE;
char *arr = malloc(arrSize);
if(!arr) {
fprintf(stderr, "Initial allocation failed.\n");
goto failure;
}
// One past the end of the array
// (next insertion position)
size_t arrEnd = 0u;
for(char c = '\0'; c != '\n';) {
if(scanf("%c", &c) != 1) {
fprintf(stderr, "Reading character %zu failed.\n", arrEnd);
goto failure;
}
// No more room, grow the array
// (-1) takes into account the
// nul terminator.
if(arrEnd == arrSize - 1) {
arrSize += CHUNK_SIZE;
char *newArr = realloc(arr, arrSize);
if(!newArr) {
fprintf(stderr, "Reallocation failed.\n");
goto failure;
}
arr = newArr;
// Debug output
arr[arrEnd] = '\0';
printf("> %s\n", arr);
// Debug output
}
// Append the character and
// advance the end index
arr[arrEnd++] = c;
}
// Nul-terminate the array
arr[arrEnd++] = '\0';
// Done !
printf("%s", arr);
free(arr);
return 0;
failure:
free(arr);
return 1;
}

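%ms (POSIX.1-2008) can be used for this purpose if you are using gcc with glibc; older glibc versions spelled it %as, which conflicts with the C99 %a conversion. Neither is part of the C standard.
#include <stdio.h>
#include <stdlib.h>
int main(void){
    char *s = NULL;
    /* %ms makes scanf allocate a buffer large enough for the word */
    if (scanf("%ms", &s) != 1)
        return 1;
    printf("%s\n", s);
    free(s);
    return 0;
}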

scanf is the wrong tool for this job (as for most jobs). If you are required to use this function, read one char at a time with scanf("%c", &c).
Your code misuses scanf(): you are passing arr, the address of an array of pointers to char instead of an array of char.
You should allocate an array of char with malloc, read characters into it and use realloc to extend it when it is too small, until you get a '\n' or EOF.
If you can rewind stdin, you can first compute the number of chars to read with scanf("%*s%n", &n);, then allocate the destination array to n+1 bytes, rewind(stdin); and re-read the string into the buffer with scanf("%s", buf);.
It is risky business, as some streams, such as console input, cannot be rewound.
For example:
fpos_t pos;
int n = 0;
char *buf;
fgetpos(stdin, &pos);
scanf("%*[^\n]%n", &n);
fsetpos(stdin, &pos);
buf = calloc(n+1, 1);
scanf("%[^\n]", buf);
Since you are supposed to know just some basic C, I doubt this solution is what is expected from you, but I cannot think of any other way to read an unbounded string in one step using standard C.
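If you are using glibc and may use extensions, you can do this (recent glibc versions spell the modifier m, as in %m[^\n], because a clashes with the C99 %a conversion):
scanf("%a[^\n]", &buf);
PS: all error checking and handling is purposely omitted here, but should be handled in your actual assignment.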

Try limiting the amount of characters accepted:
scanf("%4s", arr);

The problem is simply that you're writing beyond the end of arr (its last valid element is arr[4]). "Hopefully" the extra bytes land in memory that belongs to the process, but if you go far enough you'll end up with a segmentation fault.

Consider
1) malloc() on many systems only reserves memory; it does not use it. The underlying physical memory is not consumed until the memory is actually written to. See Why is malloc not "using up" the memory on my computer?
2) Unbounded user input is not realistic. Since some upper bound should be enforced anyway to guard against hackers and nefarious users, simply use a large buffer.
If your system can work with these two ideas:
char *buf = malloc(1000000);
if (buf == NULL) return NULL; // Out_of_memory
if (scanf("%999999s", buf) != 1) { free(buf); return NULL; } //EOF
// Now right-size buffer
size_t size = strlen(buf) + 1;
char *tmp = realloc(buf, size);
if (tmp == NULL) { free(buf); return NULL; } // Out_of_memory
return tmp;
Fixed up per @chqrlie's comments.

How do I use scanf when I don't know how many values it will assign in C?

These are the instructions:
"Read characters from standard input until EOF (the end-of-file mark) is read. Do not prompt the user to enter text - just read data as soon as the program starts."
So the user will be entering characters, but I don't know how many. I will later need to use them to build a table that displays the ASCII code of each value entered.
How should I go about this?
This is my idea
int main(void){
int inputlist[], i = -1;
do {++i;scanf("%f",&inputlist[i]);}
while(inputlist[i] != EOF)

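You said characters, so this might be used:
char arr[10000];
int i = 0;
int ch = getchar(); /* int, so EOF can be detected */
while (ch != EOF && i < 9999) /* leave room for a terminating 0 */
{
    arr[i++] = ch;
    ch = getchar();
}
//arr[i]=0; to make it a string, if necessary.
And to print the ASCII codes:
for (int j = 0; j < i; j++)
    printf("%d\n", arr[j]);
If you are particular about using an integer array, use
int arr[1000];
int i = 0;
while (i < 1000 && scanf("%d", &arr[i]) == 1)
    i++;
PS: This works only if your input consists of whitespace-separated integers. scanf returns EOF at end of input and 0 when the next token is not a number, so the loop tests for a return value of 1.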

You have a reasonable attempt at a start to the solution, with a few errors. You can't define an array without specifying a size, so int inputlist[] shouldn't even compile. Your scanf() specifier is %f for float, which is wrong twice (once because you declared inputlist with an integer type, and twice because you said your input is characters, so you should be telling scanf() to use %c or %s), and really if you're reading input unconditionally until EOF, you should use an unconditional input function, such as fgets() or fread(). (or read(), if you prefer).
You'll need two things: A place to store the current chunk of input, and a place to store the input that you've already read in. Since the input functions I mentioned above expect you to specify the input buffer, you can allocate that with a simple declaration.
char input[1024];
However, for the place to store all input, you'll want something dynamically allocated. The simplest solution is to simply malloc() a chunk of storage, keep track of how large it is, and realloc() it if and when necessary.
char *all_input;
int pool_size = 16384;
all_input = malloc(pool_size);
Then, just loop on your input function until the return value indicates that you've hit EOF, and on each iteration of the loop, append the input data to the end of your storage area, increment a counter by the size of the input data, and check whether you're getting too close to the size of your input storage area. (And if you are, then use realloc() to grow your storage.)
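The loop described above might look like the following sketch. It assumes fread() as the input function and a hypothetical helper named slurp; the 1024 and 16384 sizes are the ones used earlier in this answer, and the doubling policy is just one reasonable choice:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: slurps `in` using a fixed 1024-byte staging buffer
 * and a heap pool that doubles when it runs out of room.
 * Returns the pool (caller frees) and stores the byte count in *total. */
char *slurp(FILE *in, size_t *total)
{
    char input[1024];
    size_t pool_size = 16384;
    size_t used = 0;
    char *all_input = malloc(pool_size);
    if (all_input == NULL)
        return NULL;

    size_t got;
    while ((got = fread(input, 1, sizeof input, in)) > 0) {
        while (used + got > pool_size) {       /* not enough room: grow */
            char *grown = realloc(all_input, pool_size * 2);
            if (grown == NULL) {
                free(all_input);
                return NULL;
            }
            all_input = grown;
            pool_size *= 2;
        }
        memcpy(all_input + used, input, got);  /* append this chunk */
        used += got;
    }
    if (ferror(in)) {
        free(all_input);
        return NULL;
    }
    *total = used;
    return all_input;
}
```

The result is a byte buffer, not a C string; add one to the size check and write a '\0' at all_input[used] if a terminated string is needed.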

You could read the input with getchar until you reach EOF. Since you don't know the size of the input, you should use a dynamically sized buffer on the heap.
char *buf = NULL;
long size = 1024;
long count = 0;
int r; /* int, not char, so that EOF can be detected */
buf = (char *)malloc(size);
if (buf == NULL) {
fprintf(stderr, "malloc failed\n");
exit(1);
}
while( (r = getchar()) != EOF) {
buf[count++] = r;
// leave one space for '\0' to terminate the string
if (count == size - 1) {
buf = realloc(buf,size*2);
if (buf == NULL) {
fprintf(stderr, "realloc failed\n");
exit(1);
}
size = size * 2;
}
}
buf[count] = '\0';
printf("%s \n", buf);
return 0;

Here is a full solution for your needs, with comments.
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
// Number of elements
#define CHARNUM 3
int main(int argc, char **argv) {
// Allocate memory for storing input data
// We calculate requested amount of bytes by the formula:
// NumElement * SizeOfOneElement
size_t size = CHARNUM * sizeof(int);
// Call function to allocate memory
int *buffer = (int *) calloc(1, size);
// Check that calloc() returned a valid pointer
// It can: 1. Return a pointer on success or NULL on failure
// 2. Return a pointer or NULL if size is 0
// (implementation-dependent).
// In that case we can't use the pointer later.
if (!buffer || !size)
{
exit(EXIT_FAILURE);
}
int curr_char;
int count = 0;
while ((curr_char = getchar()) != EOF)
{
if (count >= size/sizeof(int))
{
// If we put more characters than now our buffer
// can hold, we allocate more memory
fprintf(stderr, "Reallocate memory buffer\n");
size_t tmp_size = size + (CHARNUM * sizeof(int));
int *tmp_buffer = (int *) realloc(buffer, tmp_size);
if (!tmp_buffer)
{
fprintf(stderr, "Can't allocate enough memory\n");
exit(EXIT_FAILURE);
}
size = tmp_size;
buffer = tmp_buffer;
}
buffer[count] = curr_char;
++count;
}
// Here you get buffer with the characters from
// the standard input
fprintf(stderr, "\nNow buffer contains characters:\n");
for (int k = 0; k < count; ++k)
{
fprintf(stderr, "%c", buffer[k]);
}
fprintf(stderr, "\n");
// TODO: do something with the data
// Free all resources before exit
free(buffer);
exit(EXIT_SUCCESS);
}
Compile with -std=c99 option if you use gcc.
You can also use the getline() function (POSIX), which reads from standard input line by line and allocates enough memory to store each line. Just call it until end-of-file.
errno = 0;
ssize_t nread;
char *buffer = NULL;
size_t len = 0;
while ((nread = getline(&buffer, &len, stdin)) != -1) {
    /* Process the line in buffer (nread bytes, including any '\n') */
}
if (errno) {
    /* Handle the read error */
}
free(buffer);
Note that even with getline() you still need dynamically allocated memory if you want to keep all the lines: not for storing characters, but for storing pointers to the strings.
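A sketch of that last point, assuming a POSIX system and a hypothetical helper named read_lines: each call to getline() is given a NULL buffer so that it allocates a fresh string, and the pointers are collected in an array that doubles as needed.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads every line of `in` into a growing array of
 * malloc'd strings and stores the line count in *count. The caller frees
 * each string and then the array. Returns NULL on allocation failure, or
 * when the input has no lines (in which case *count is 0). */
char **read_lines(FILE *in, size_t *count)
{
    char **lines = NULL;
    size_t n = 0, cap = 0;
    char *line = NULL;      /* getline allocates this when it is NULL */
    size_t len = 0;

    *count = 0;
    while (getline(&line, &len, in) != -1) {
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 8;
            char **grown = realloc(lines, new_cap * sizeof *grown);
            if (grown == NULL) {
                free(line);
                for (size_t i = 0; i < n; i++)
                    free(lines[i]);
                free(lines);
                return NULL;
            }
            lines = grown;
            cap = new_cap;
        }
        lines[n++] = line;  /* hand the buffer over to the array */
        line = NULL;        /* force getline to allocate a fresh one */
        len = 0;
    }
    free(line);
    *count = n;
    return lines;
}
```

Each stored line keeps its trailing '\n' if the input had one.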

Tokenizing user input in C (store in **arg)?

I'm attempting to write a simple shell-like interface that takes in a user's input (char by char) and stores it via a pointer to a pointer (exactly how argv works). Here's my code:
char input[100];
char **argvInput;
char ch;
int charLoop = 0;
int wordCount = 0;
argvInput = malloc(25 * sizeof(char *));
while((ch = getc(stdin))) {
if ((ch == ' ' || ch == '\n') && charLoop != 0) {
input[charLoop] = '\0';
argvInput[wordCount] = malloc((charLoop + 1) * sizeof(char));
argvInput[wordCount] = input;
charLoop = 0;
wordCount++;
if (ch == '\n') {
break;
}
} else if (ch != ' ' && ch != '\n') {
input[charLoop] = ch;
charLoop++;
} else {
break;
}
}
If I loop through argvInput via:
int i = 0;
for (i = 0; i < wordCount; i++)
printf("Word %i: %s\n", i, argvInput[i]);
All of the values of argvInput[i] are whatever the last input assignment was. So if I type:
"happy days are coming soon", the output of the loop is:
Word 0: soon
Word 1: soon
Word 2: soon
Word 3: soon
Word 4: soon
I'm at a loss. Clearly each loop is overwriting the previous value, but I'm staring at the screen, unable to figure out why...

This line is your bane:
argvInput[wordCount] = input;
Doesn't matter that you allocate new space, if you're going to replace the pointer to it with another one (i.e. input).
Rather, use strncpy to extract parts of the input into argvInput[wordCount].

argvInput[wordCount] = input; only makes argvInput[wordCount] point to the memory of input instead of copying the content of input into the newly allocated memory. You should use memcpy or strcpy to correct your program.
After the pointer assignment, the memory allocated by malloc((charLoop + 1) * sizeof(char)); can no longer be accessed by your program, which is a memory leak. Please take care of that.

I suggest printing your argvInput pointers with %p, instead of %s, to identify this problem: printf("Word %i: %p\n", i, (void *) argvInput[i]);
What do you notice about the values it prints? How does this differ from the behaviour of argv? Try printing the pointers of argv: for (size_t x = 0; x < argc; x++) { printf("Word %zu: %p\n", x, (void *) argv[x]); }
Now that you've observed the problem, explaining it might become easier.
This code allocates memory, and stores a pointer to that memory in argvInput[wordCount]: argvInput[wordCount] = malloc((charLoop + 1) * sizeof(char)); (by the way, sizeof(char) is always 1 in C, so you're multiplying by 1 unnecessarily).
This code replaces that pointer to allocated memory with a pointer to input: argvInput[wordCount] = input; ... Hence, all of your items contain a pointer to the same array: input, and your allocated memory leaks because you lose reference to it. Clearly, this is the problematic line; It doesn't do what you initially thought it does.
It has been suggested that you replace your malloc call with a strdup call, and remove the problematic line. I don't like this suggestion, because strdup wasn't in the C standard before C23 (it comes from POSIX), and so it isn't required to exist on older implementations.
strncpy will work, but it's unnecessarily complex. strcpy is guaranteed to work just as well because the destination array is allocated to be large enough to store the string. Hence, I recommend replacing the problematic line with strcpy(argvInput[wordCount], input);.
Another option that hasn't been explained in detail is strtok. It seems this is best left unexplored for now, because it would require too much modification to your code.
I have a bone to pick with this code: char ch; ch = getc(stdin); is wrong. getc returns an int for a reason: Any successful character read will be returned in the form of an unsigned char value, which can't possibly be negative. If getc encounters EOF or an error, it'll return a negative value. Once you assign the return value to ch, how do you differentiate between an error and a success?
Have you given any thought as to what happens if the first character is ' '? Currently, your code would break out of the loop. This seems like a bug, if your code is to mimic common argv parsing behaviours. Adapting this code to solve your problem might be a good idea:
for (int c = getc(stdin); c >= 0; c = getc(stdin)) {
    if (c == '\n') {
        /* Terminate your argv array and break out of the loop */
    }
    else if (c != ' ') {
        /* Copy c into input */
    }
    else if (charLoop != 0) {
        /* Allocate argvInput[wordCount] and copy input into it,
         * reset charLoop and increment wordCount */
    }
}
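One way the skeleton above could be filled in is sketched here. The function name split_line and the limits MAX_WORDS and MAX_WORD_LEN are assumptions made for this example; charLoop, wordCount and input mirror the names used in the question. Unlike the skeleton, this version also stores the last word when the line ends without a trailing space.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 64       /* assumed limit, not from the original code */
#define MAX_WORD_LEN 255   /* assumed limit; longer words are truncated */

/* Reads one line from stream and splits it on spaces. Stores malloc'd
 * copies of the words in words, followed by a NULL terminator, and
 * returns the number of words stored. The caller frees each word. */
size_t split_line(FILE *stream, char *words[MAX_WORDS + 1])
{
    char input[MAX_WORD_LEN + 1];
    size_t charLoop = 0, wordCount = 0;

    for (;;) {
        int c = getc(stream);
        int end = (c == EOF || c == '\n');

        if (!end && c != ' ') {
            if (charLoop < MAX_WORD_LEN) {
                input[charLoop++] = (char) c;   /* copy c into input */
            }
            continue;
        }
        /* A space, newline or EOF ends the current word, if there is one. */
        if (charLoop != 0 && wordCount < MAX_WORDS) {
            input[charLoop] = '\0';
            words[wordCount] = malloc(charLoop + 1);
            if (words[wordCount] == NULL) {
                break;
            }
            strcpy(words[wordCount], input);
            wordCount++;
        }
        charLoop = 0;
        if (end) {
            break;
        }
    }
    words[wordCount] = NULL;   /* terminate the array like argv */
    return wordCount;
}
```

A caller would declare char *argvInput[MAX_WORDS + 1]; and pass it together with stdin. Leading and repeated spaces produce no empty words, because a word is only stored when charLoop is nonzero.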

How can I read a file word by word using stdio.h in C?

I'm new to C and I can't quite get it without a segmentation fault.
Here's my idea so far:
#include<stdio.h>
#include<string.h>
char *nextWord(FILE *stream) {
char *word;
char c;
while ( (c = (char)fgetc(stream)) != ' ' && c != '\n' && c != '\0') {
strcat(word, &c);
}
return word;
}
int main() {
FILE *f;
f = fopen("testppm.ppm", "r");
char *word;
word = nextWord(f);
printf("%s",word);
}

In your nextWord function, you never initialize the local variable word to point at anything, so when you try to write to the pointed-at memory with strcat, you get a segfault.
You need to allocate memory to store the word that you are going to read. The problem is, you don't know how big that word will be, so you don't know how much space to allocate. There are a number of possible approaches:
Use a (large) fixed size buffer on the stack to hold the word as you read it, then copy it to a malloc'd area of the appropriate size when you return it. There will be problems if you encounter a word that is too big for your fixed size buffer.
Allocate a small block to read the word into, and keep track of how much is used as you read characters. When the block is full, realloc it to be bigger.
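The second approach could look roughly like the sketch below. The name next_word, the starting capacity of 16 and the doubling strategy are choices made for this example rather than anything prescribed above, and words are assumed to be separated by whitespace.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the grow-as-you-go approach. Returns a malloc'd word, or NULL
 * at end of file or on allocation failure. The caller frees the result. */
char *next_word(FILE *stream)
{
    size_t cap = 16, len = 0;   /* 16 is an arbitrary starting size */
    char *word;
    int c;                      /* int, so EOF stays distinguishable */

    /* Skip leading whitespace. */
    while ((c = fgetc(stream)) != EOF && isspace(c)) {
    }
    if (c == EOF) {
        return NULL;
    }
    word = malloc(cap);
    if (word == NULL) {
        return NULL;
    }
    do {
        if (len + 1 == cap) {   /* always keep one slot for '\0' */
            char *tmp = realloc(word, cap * 2);
            if (tmp == NULL) {
                free(word);
                return NULL;
            }
            word = tmp;
            cap *= 2;
        }
        word[len++] = (char) c;
    } while ((c = fgetc(stream)) != EOF && !isspace(c));
    word[len] = '\0';
    return word;
}
```

Assigning the result of realloc to a temporary first means the original block can still be freed if the reallocation fails.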

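Or you can use fscanf to read each word into a fixed buffer, and grow the result with realloc in a while loop.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *nextWord(FILE *stream) {
    char buffer[124];
    char *word = NULL;
    size_t size = 0;
    /* %123s reads at most 123 characters, leaving room for '\0' */
    while (fscanf(stream, "%123s", buffer) == 1) {
        size_t n = strlen(buffer);
        char *tmp = realloc(word, size + n + 1);
        if (tmp == NULL) {
            free(word);
            return NULL;
        }
        word = tmp;
        memcpy(word + size, buffer, n + 1);
        size += n;
        /* Keep going only if buffer filled up and the word continues */
        int c = getc(stream);
        if (c != EOF)
            ungetc(c, stream);
        if (n < sizeof buffer - 1 || c == EOF || isspace(c))
            break;
    }
    return word;
}
A little explanation. fscanf returns the number of input items it assigned (1 here, or EOF when nothing is left), not the number of characters read, so the length of each piece comes from strlen. When word is NULL, realloc behaves like malloc, so the first pass allocates the word; later passes, which only happen for words longer than the buffer, grow it by the length of the new piece. size tracks how many characters are already stored.
There is no need to clear buffer between reads, because fscanf always null-terminates it, but don't forget to free the returned word.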
