Reading From A Buffer and Storing the line in an array - c

I am trying to make a simple client and server. Right now I am able to print the contents of a file out to the screen. I would now like to store every line I read from the buffer into an array. I have attempted this, but for some reason it always just adds the last line received from the buffer. Can anyone point out where I have gone wrong?
int getFile (char path[256], int fd)
{
char buffer[256];
char bufferCopy[256];
char arguments[1000][1000];
int total = 0;
char * ptr;
while(read(fd, buffer, 256) != NULL)
{
char * temp;
strcpy(arguments[total], buffer);
total++;
}
for(int i = 0; i < total; i++)
{
printf("\n %s", arguments[i]);
}
}

Your read call doesn't read lines, it reads up to 256 bytes from fd. read also doesn't know anything about null terminators so there is no guarantee that buffer will hold a string (i.e. have a null terminator) and hence no guarantee that strcpy will stop copying at a sensible place. You're almost certainly scribbling all over your stack and once you do that, all bets are off and you can't expect anything sensible to happen.
If you want to read lines then you might want to switch to fgets or keep using read and figure out where the EOLs are yourself.
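As a rough sketch of the fgets route, assuming a POSIX system where fdopen is available: wrap the descriptor in a stdio stream and let fgets find the line endings. The names read_lines, MAX_LINES and MAX_LINE_LEN are made up for this illustration and are not from the question.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen; must come before any #include */
#include <stdio.h>
#include <string.h>

#define MAX_LINES 1000      /* hypothetical limits, chosen for illustration */
#define MAX_LINE_LEN 256

/* Reads up to MAX_LINES lines from fd into lines[]. Returns the number of
   lines stored, or -1 if the descriptor could not be wrapped in a stream. */
int read_lines(int fd, char lines[][MAX_LINE_LEN])
{
    FILE *fp = fdopen(fd, "r");   /* layer a stdio stream over the descriptor */
    if (fp == NULL)
        return -1;

    int total = 0;
    while (total < MAX_LINES && fgets(lines[total], MAX_LINE_LEN, fp) != NULL) {
        /* fgets always null-terminates; drop the trailing newline if present */
        lines[total][strcspn(lines[total], "\n")] = '\0';
        total++;
    }
    fclose(fp);                   /* note: this also closes fd */
    return total;
}
```

Lines longer than MAX_LINE_LEN - 1 characters end up split across several entries, and a million-byte array like this is better declared static or allocated with malloc than placed on the stack.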

Related

Find and replace a word in a file, how to avoid reading the entire file into a buffer?

I have an assignment where I'm supposed to write to a file, then perform a find and replace on it, with the condition that the old word must have the same length as the new one.
What I'm currently doing is finding the file size, then allocating a memory of that size and assign it to a buffer, read the entire file into the buffer, change the words, then write it back on the file.
This would fail if the files are too big, the only thing I can think of to avoid this is:
Check if the buffer contains \n
If it doesn't (the entire line wasn't read), then use realloc to increase its size by any amount (the original for example)
Delete the last n characters in the buffer, where n is the length of the word we want to replace. (To avoid reading the same data again)
Set the file pointer back by n. (Because the word could be cut)
Is there any other method? This feels complicated, and realloc causes some issues that might make the program need new buffers.
This is the current code where I read the entire file at once:
void replace_word(const char *s, const char *old_word, const char *new_word){
FILE *original_file;
if((original_file = fopen(s, "r+")) == NULL){
perror(s);
exit(EXIT_FAILURE);
}
const int BUFFER_SIZE = fsize(s);
char *buffer = malloc(BUFFER_SIZE);
char *init_loc = buffer;
int word_len = strlen(old_word);
int word_frequency = 0;
fgets(buffer, BUFFER_SIZE, original_file);
while((buffer = strstr(buffer, old_word))){
memcpy(buffer, new_word, word_len);
word_frequency++;
}
buffer = init_loc;
rewind(original_file);
fputs(buffer, original_file);
printf("'%s' found %i times\n", old_word, word_frequency);
fclose(original_file);
free(buffer);
}
You can do it with a "sliding window" algorithm using just one fixed buffer of any length that you want, as long as the buffer is longer than the word you are looking for.
The pseudocode to search for a word of length N would look as follows:
Begin with a buffer full of data from the file.
Loop:
  Search for the word in the buffer; if found:
    calculate the offset of the word in the file
    write the replacement over it.
  Move the last N - 1 characters from the end of the buffer to the beginning of the buffer. (That's because these characters may contain part of the word, and the remaining part may be in the beginning of the next buffer that you will read.)
  Fill the remainder of the buffer from the file.
Repeat the above loop until you reach the end of the file.
For this to perform well, the buffer must be much longer than the word. So, if your word is up to 100 characters long, the buffer should be at least 4 kilobytes long. But 64 and even 128 kilobyte buffers work well in modern systems.
Do not forget to seek to the right offset before each read operation.
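The steps above could be sketched in C roughly as follows. This is only one way to do it: replace_windowed and its window parameter are names invented for the example, the file is assumed to be opened in binary update mode ("r+b"), and matches are counted without overlap.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: overwrites every occurrence of old_word with new_word
   (which must have the same length) in a stream opened for update, holding
   at most `window` bytes in memory. Returns the replacement count or -1. */
long replace_windowed(FILE *fp, const char *old_word, const char *new_word,
                      size_t window)
{
    size_t n = strlen(old_word);
    if (n == 0 || n != strlen(new_word) || window <= n)
        return -1;

    char *buf = malloc(window);
    if (buf == NULL)
        return -1;

    long base = 0;     /* file offset of buf[0] */
    size_t have = 0;   /* bytes carried over from the previous window */
    long count = 0;

    for (;;) {
        /* Seek before every read: the previous operation may have been a write. */
        if (fseek(fp, base + (long)have, SEEK_SET) != 0)
            goto fail;
        size_t got = fread(buf + have, 1, window - have, fp);
        if (got == 0)
            break;     /* end of file (or read error): nothing new to scan */
        have += got;

        size_t i = 0;
        while (i + n <= have) {
            if (memcmp(buf + i, old_word, n) == 0) {
                /* file offset of the match = offset of buf[0] + index */
                if (fseek(fp, base + (long)i, SEEK_SET) != 0 ||
                    fwrite(new_word, 1, n, fp) != n)
                    goto fail;
                count++;
                i += n;
            } else {
                i++;
            }
        }

        /* Keep the unscanned tail (at most n - 1 bytes): it may hold the
           start of a word that continues in the next chunk. */
        memmove(buf, buf + i, have - i);
        base += (long)i;
        have -= i;
    }
    free(buf);
    return count;

fail:
    free(buf);
    return -1;
}
```

A call might look like replace_windowed(fp, "cat", "dog", 65536). The window size only affects speed, not the result, as long as it is larger than the word.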
I don't know if this is the best solution or not, but I would just look at one word at a time. When you find the word you want to change, seek back by the length of the word you just read and overwrite it. As long as the replacement is the same size, it should work.
Use fgetc to get one char at a time from your file; replace getchar with fgetc in the code below.
The code comes from K&R's famous book on C, which I read 10 months ago to learn C. I've used it a few times in my own code, and it works fine.
#include <stdio.h>
#include <ctype.h>
/* getword: get next word or character from input */
int getword(char *word, int lim)
{
int c, getch(void);
void ungetch(int);
char *w = word;
while (isspace(c = getch()))
;
if (c != EOF)
*w++ = c;
if (!isalpha(c)) {
*w = '\0';
return c;
}
for ( ; --lim > 0; w++)
if (!isalnum(*w = getch())) {
ungetch(*w);
break;
}
*w = '\0';
return word[0];
}
#define BUFSIZE 100
char buf[BUFSIZE]; /* buffer for ungetch */
int bufp = 0; /* next free position in buf */
int getch(void) /* get a (possibly pushed-back) character */
{
return (bufp > 0) ? buf[--bufp] : getchar(); //change to fgetc
}
void ungetch(int c) /* push character back on input */
{
if (bufp >= BUFSIZE)
printf("ungetch: too many characters\n");
else
buf[bufp++] = c;
}
You can make the max size of the array anything you want. It's set to 100 since there should be no words longer than 100 chars.
Just modify the code to read from fgetc, and end when you hit EOF.

Storing strings in array in C

I have read a lot of questions on this, and using them I have altered my code and have created code which I thought would work.
I think it's my understanding of C, which is failing me here as I can't see where I'm going wrong.
I get no compilation errors, but when I run it I receive 'FileReader.exe has stopped working' from the command prompt.
My code is :
void storeFile(){
int i = 0;
char allWords [45440][25];
FILE *fp = fopen("fileToOpen.txt", "r");
while (i <= 45440){
char buffer[25];
fgets(buffer, 25, fp);
printf("The word read into buffer is : %s",buffer);
strcpy(allWords[i], buffer);
printf("The word in allWords[%d] is : %s", i, allWords[i]);
//allWords[i][strlen(allWords[i])-1] = '\0';
i = i + 1;
}
fclose(fp);
}
There are 45440 lines in the file, and no words longer than 25 chars in length. I'm trying to read each word into a char array named buffer, then store that buffer in an array of char arrays named allWords.
I am trying to get this part working, before I refactor to return the array to the main method (which I feel won't be a fun experience).
You are trying to allocate more than a megabyte (45440*25) worth of data in automatic storage. On many architectures this results in stack overflow before your file-reading code even gets to run.
You can work around this problem by allocating allWords statically, like this
static char allWords [45440][25];
or dynamically, like this:
char (*allWords)[25] = malloc(45440 * sizeof(*allWords));
Note that using buffer in the call to fgets is not required, because allWords[i] can be used instead, without strcpy:
fgets(allWords[i], sizeof(*allWords), fp); // fgets already reserves one byte for the '\0'
Also note that an assumption about file size is unnecessary: you can continue calling fgets until it returns NULL; this indicates that the end of the file has been reached, so you can exit the loop using break.
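Putting these suggestions together, a minimal sketch might look like the one below. load_words, word_buf and WORD_LEN are names made up for this example; WORD_LEN is set to 32 on the assumption that a 25-character word plus '\n' and '\0' has to fit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 45440   /* line count taken from the question */
#define WORD_LEN  32      /* assumption: room for a word, '\n' and '\0' */

typedef char word_buf[WORD_LEN];

/* Hypothetical helper: reads lines from fp until fgets returns NULL or the
   array is full. Stores the line count in *count; the caller frees the result. */
word_buf *load_words(FILE *fp, size_t *count)
{
    word_buf *words = malloc(MAX_WORDS * sizeof *words);  /* heap, not stack */
    if (words == NULL)
        return NULL;

    size_t i = 0;
    while (i < MAX_WORDS && fgets(words[i], sizeof *words, fp) != NULL) {
        words[i][strcspn(words[i], "\n")] = '\0';  /* drop the newline, if any */
        i++;
    }
    *count = i;
    return words;
}
```

Because the array lives on the heap, returning it to main is just a matter of returning the pointer and calling free(words) once you are done with it.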

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.
First of all, scanf format strings do not use regular expressions, so I don't think something close to what you want will work. As for the error you get, according to my trusty manual, the %a conversion flag is for floating point numbers, but it only works on C99 (and your compiler is probably configured for C90)
But then you have a bigger problem. scanf expects that you pass it a previously allocated buffer for it to fill in with the read input. It does not malloc the string for you, so your attempts at initializing str to NULL and the corresponding frees will not work with scanf.
The simplest thing you can do is to give up on arbitrary-length strings. Create a large buffer and forbid inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a "\n".
char str[256+1];
for(;;){
    printf("> ");
    if(!fgets(str, sizeof str, stdin)){
        //error or end of file
        break;
    }
    size_t len = strlen(str);
    if(len + 1 == sizeof str && str[len - 1] != '\n'){
        //buffer is full and the newline did not fit: user typed something too long
        exit(1);
    }
    printf("user typed %s", str);
}
Another alternative is you can use a nonstandard library function. For example, in Linux there is the getline function that reads a full line of input using malloc behind the scenes.
No error checking, don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
    int l = 256;
    char *buf = malloc(l);
    int p = 0;
    int ch; /* int, not char, so that EOF can be told apart from real data */
    ch = getchar();
    while(ch != '\n' && ch != EOF) {
        buf[p++] = ch;
        if (p == l) {
            l += 256;
            buf = realloc(buf, l);
        }
        ch = getchar();
    }
    buf[p] = '\0';
    return buf;
}
int main(int argc, char *argv[]) {
printf("> ");
char *buf = readInfiniteString();
printf("%s\n", buf);
free(buf);
}
If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
int main(void)
{
char *line = NULL;
size_t nline = 0;
for (;;) {
ssize_t n; // getline returns ssize_t (-1 on failure or end of file)
printf("> ");
// read line, allocating as necessary
n = getline(&line, &nline, stdin);
if (n < 0) break;
// remove trailing newline
if (n && line[n - 1] == '\n') line[n - 1] = '\0';
// do stuff
printf("'%s'\n", line);
if (strcmp("quit", line) == 0) break;
}
free(line);
printf("\nBye\n");
return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.
Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When no '\n' is present, the remaining bytes of the line are yet to have been read. Thus, fgets will always parse the line at least once, internally. When you introduce extra logic, to check for a '\n', to that, you're parsing the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
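For the "discard the remainder" option, a small sketch under my own assumptions could look like this. The helper name read_line_truncating and its return convention are invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if a whole line was stored in buf, 0 if the
   line was too long (the excess is discarded), -1 on end of file or error. */
int read_line_truncating(char *buf, size_t size, FILE *f)
{
    if (fgets(buf, (int)size, f) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {       /* the newline fitted, so the line is complete */
        buf[len] = '\0';
        return 1;
    }

    /* No newline stored. Peek at the next byte to find out why. */
    int c = fgetc(f);
    if (c == EOF || c == '\n')    /* last line without '\n', or an exact fit */
        return 1;

    /* Truncated: skip to the end of the line, then eat the newline itself. */
    if (fscanf(f, "%*[^\n]") != EOF)
        (void)fgetc(f);
    return 0;
}
```

The caller can warn the user whenever 0 is returned; the next call then starts cleanly at the following line.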
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
        /* +2, not +1: with room for the terminator only, fgets could never store a byte */
        size_t alloc_size = bytes_read * 2 + 2;
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        bytes[bytes_read] = '\0'; /* keeps strcspn well-defined if fgets stores nothing */
        temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
        bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It may return typically one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255)
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL;
do {
if ((bytes_read & (bytes_read + 1)) == 0) {
void *temp = realloc(bytes, bytes_read * 2 + 1);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
}
int c = fgetc(f);
bytes[bytes_read] = c >= 0 && c != '\n'
? c
: '\0';
} while (bytes[bytes_read++]);
return bytes;
}

Array Not Filling Properly

I am trying to deconstruct a document into its respective paragraphs, and input each paragraph, as a string, into an array. However, each time a new value is added, it overwrites all previous values in the array. The last "paragraph" read (as denoted by newline) is the value of each non-null value of the array.
Here is the code:
char buffer[MAX_SIZE];
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
int pp = 0;
int i;
FILE *doc;
doc = fopen(argv[1], "r+");
assert(doc);
while((i = fgets(buffer, sizeof(buffer), doc) != NULL)) {
if(strncmp(buffer, "\n", sizeof(buffer))) {
paragraphs[pp++] = (char*)buffer;
}
}
printf("pp: %d\n", pp);
for(i = 0; i < MAX_SIZE && paragraphs[i] != NULL; i++) {
printf("paragraphs[%d]: %s", i, paragraphs[i]);
}
The output I receive is:
pp: 4
paragraphs[0]: paragraph four
paragraphs[1]: paragraph four
paragraphs[2]: paragraph four
paragraphs[3]: paragraph four
when the program is run as follows: ./prog.out doc.txt, where doc.txt is:
paragraph one
paragraph two
paragraph three
paragraph four
The behavior of the program is otherwise desired. The paragraph count works properly, ignoring the line that contains ONLY the newline character (line 4).
I assume the problem occurs in the while loop, however am unsure how to remedy the problem.
Your solution is pretty sound. Your paragraphs array is supposed to hold each paragraph, and since each element is just a small pointer (typically 4 or 8 bytes) you can afford to define a reasonable max number of them. However, since this max number is a constant, it is of little use to allocate the array dynamically.
The only meaningful use of dynamic allocation would be to read the whole text once to count the actual number of paragraphs, allocate the array accordingly and re-read the whole file a second time, but I doubt this is worth the effort.
The downside of using fixed-size paragraph array is that you must stop filling it once you reach the maximal number of elements.
You can then re-allocate a bigger array if you absolutely want to be able to process the whole Bible, but for an educational exercise I think it's reasonable to just stop recording paragraphs (thus producing a code that can store and count paragraphs up to a maximal number).
The real trouble with your code is, you don't store the paragraph contents anywhere. When you read the actual lines, it's always inside the same buffer, so each paragraph will point to the same string, which will contain the last paragraph read.
The solution is to make a unique copy of the buffer and have the current paragraph point to that.
C being already messy enough as it is, I suggest using the strdup() function, which duplicates a string (basically computing string length, allocating sufficient memory, copying the string into it and returning the new block of memory holding the new copy). You just need to remember to free this new copy once you're done using it (in your case at the end of your program).
This is not the most time-efficient solution, since each string will require a strlen and a malloc performed internally by strdup while you could have pre-allocated a big buffer for all paragraphs, but it is certainly simpler and probably more memory-efficient (only the minimal amount of memory will be allocated for each string, though each malloc consumes a few extra bytes for internal allocator housekeeping).
The bloody awkward fgets also stores the trailing \n at the end of the line, so you'll probably want to get rid of that.
Your last display loop would be simpler, more robust and more efficient if you simply used pp as a limit, instead of checking uninitialized paragraphs.
Lastly, you'd better define two different constants for max line size and max number of paragraphs. Using the same value for both makes little sense, unless you're processing perfectly square texts :).
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_SIZE 82 // max nr of characters in a line (including trailing \n and \0)
#define MAX_PARAGRAPHS 100 // max number of paragraphs in a file
int main (int argc, char *argv[])
{
    char buffer[MAX_LINE_SIZE];
    char * paragraphs[MAX_PARAGRAPHS];
    int pp = 0;
    int i;
    FILE *doc;
    assert(argc > 1); // the file name is expected as the first argument
    doc = fopen(argv[1], "r+");
    assert(doc != NULL);
    while((fgets(buffer, sizeof(buffer), doc) != NULL)) {
        if (pp != MAX_PARAGRAPHS // make sure we don't overflow our paragraphs array
            && strcmp(buffer, "\n")) {
            // fgets awkwardly collects the ending \n, so get rid of it
            buffer[strcspn(buffer, "\n")] = '\0';
            // current paragraph references a unique copy of the actual text
            paragraphs[pp++] = strdup (buffer);
        }
    }
    fclose(doc);
    printf("pp: %d\n", pp);
    for(i = 0; i != pp; i++) {
        printf("paragraphs[%d]: %s\n", i, paragraphs[i]);
        free(paragraphs[i]); // release memory allocated by strdup
    }
    return 0;
}
What is the proper way to allocate the necessary memory? Is the malloc on line 2 not enough?
No. That malloc only allocates the array of pointers; each string those pointers refer to needs storage of its own. The following line is therefore not enough by itself:
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
If you have: (for a simple explanation)
char **array = {0}; //array of C strings, before memory is allocated
Then you can create memory for it like this:
#include <stdlib.h>
#define MAX_SIZE 256 // placeholder value; use the constant from your own code
char ** allocMemory(char ** a, int numStrings, int maxStrLen);
void freeMemory(char ** a, int numStrings);
int main(void)
{
    int numStrings = 10;// for example, change as necessary
    int maxLen = MAX_SIZE; //for example, change as necessary
    char **array = NULL;
    array = allocMemory(array, numStrings, maxLen);
    if(array == NULL) return 1;
    //use the array, then free it
    freeMemory(array, numStrings);
    return 0;
}
char ** allocMemory(char ** a, int numStrings, int maxStrLen)
{
    int i;
    // calloc takes (element count, element size)
    a = calloc(numStrings + 1, sizeof(char*));
    if(a == NULL) return NULL;
    for(i=0;i<numStrings; i++)
    {
        a[i] = calloc(maxStrLen + 1, sizeof(char));
    }
    return a;
}
void freeMemory(char ** a, int numStrings)
{
    int i;
    for(i=0;i<numStrings; i++)
        if(a[i]) free(a[i]);
    free(a);
}
Note: you can determine the number of lines in a file in several ways. One way, for example, is to open it with FILE *fp = fopen(filepath, "r");, then call ret = fgets(lineBuf, lineLen, fp) in a loop until ret == NULL, incrementing a counter on each iteration, and finally call fclose() (which you did not do either). This step is not included in the code example above, but you can add it if that is the approach you want to use.
Once you have memory allocated, Change the following in your code:
paragraphs[pp++] = (char*)buffer;
To:
strcpy(paragraphs[pp++], buffer);//no need to cast buffer, it is already char *
Also, do not forget to call fclose() when you are finished with the open file.

reading an int and storing it in a char * buffer in C

I have a file with integers. I want to write in a buffer those integers as chars (its ascii number). Because it is part of a bigger project please do not post different but please help me on that. What I especially need is chars to be stored in a buffer of type char *.
These are my declarations.
FILE *in;
long io_len = 1000;
char * buffer;
in=fopen("input.txt","a+");
buffer = malloc(io_len * sizeof(*buffer));
if(buffer == NULL){
perror("malloc");
exit(EXIT_FAILURE);
}
I have come up with two possible solutions.
If I write this one:
read_ret = read(in, buffer, io_len);
it reads from file in, io_len bytes and stores them in buffer. But it reads characters. So for example if I write 123 it will write to buffer 1,2,3 not the character with ascii number 123.
So I did this:
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
}
which reads the integers as I want. Now I am a little bit confused on how I will store them in buffer, as characters. I have tried this but it get me a segmentation fault.
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
buffer=(char) i;
printf("Character in Buffer:%s\n",buffer);
buffer++;
}
Keep in mind that later in my program I am writing my buffer somewhere else, so whatever I do I want the pointer to still be at the start of my char array (if what I am saying makes sense).
Your final code should at least give you a warning about assigning an integer to a pointer in the line buffer=(char) i;. It looks like you want to dereference the pointer.
You are also printing a string when it looks like you really only want to print a character at a time.
Your code should probably look like this:
int character_index = 0;
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
buffer[character_index]=(char) i;
printf("Character in Buffer:%c\n",buffer[character_index]);
character_index++;
}
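Wrapped up as a function with a bounds check, the same idea might look like the sketch below. ints_to_bytes is a name I made up, and skipping values outside 0..255 is my own assumption about what should happen to numbers that do not fit in one byte.

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated integers from `in` and
   stores each one as a single byte in buffer. Returns the number stored. */
size_t ints_to_bytes(FILE *in, char *buffer, size_t capacity)
{
    size_t used = 0;
    int value;
    while (used < capacity && fscanf(in, "%d", &value) == 1) {
        if (value < 0 || value > 255)
            continue;   /* assumption: skip values that do not fit in one byte */
        buffer[used++] = (char)value;
    }
    return used;
}
```

Because the code indexes into buffer instead of incrementing it, the pointer still refers to the start of the array afterwards. The data is not null-terminated, so write it out with something like fwrite(buffer, 1, used, out) rather than printing it with %s.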
