Append a character at the end of a malloc'd string - c

I am looking for a '.' character in a file.
I use the function read to look for it.
The problem is that I have a buffer size and I need to loop as long as I don't find the '.' character with read(fd, buffer, bufferSize).
I used malloc to allocate the memory for 1 buffer size (char *buffer = malloc((bufferSize + 1) * sizeof(char))), but I don't know how many times I'll have to loop.
How can I grow the buffer as the number of loop iterations increases?

A string is not needed.
Instead, when using read(), use its return value to determine how many bytes to search with memchr().
#define BUFFER_SIZE 1000
char buffer[BUFFER_SIZE];
long long loop_count = 0;
ssize_t size_read;
while ((size_read = read(fd, buffer, sizeof buffer)) > 0) {
    /* char * rather than void *, so the pointer subtraction below is valid */
    char *dot_address = memchr(buffer, '.', (size_t)size_read);
    if (dot_address) {
        loop_count += dot_address - buffer;
        printf("Found '.' at offset %lld\n", loop_count);
        break;
    }
    loop_count += size_read;
}
OP later adds in a comment, "I have to save all the characters until the dot in the buffer so I have to add memory every time I loop"
Not sure when the continually added requirements will stop.
Lost interest.
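For the follow-up requirement of keeping every character up to the dot, one possible approach is to grow the buffer with realloc() by one chunk before each read(). The following is a minimal sketch, not a definitive implementation: read_until_dot and chunk_size are hypothetical names, bytes read after the dot in the final chunk are simply discarded, and interrupted reads (EINTR) are not retried.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd in chunks of chunk_size bytes until a
 * '.' is seen or input ends. Returns a malloc'd, NUL-terminated string with
 * everything before the first '.', or NULL on read or allocation failure. */
char *read_until_dot(int fd, size_t chunk_size)
{
    char *text = NULL;   /* accumulated data */
    size_t used = 0;     /* bytes stored in text so far */

    for (;;) {
        /* Make room for one more chunk plus the terminating NUL. */
        char *grown = realloc(text, used + chunk_size + 1);
        if (!grown) {
            free(text);
            return NULL;
        }
        text = grown;

        ssize_t got = read(fd, text + used, chunk_size);
        if (got < 0) {
            free(text);
            return NULL;
        }
        if (got == 0)
            break;                       /* end of input, no dot found */

        /* Only the newly read bytes need to be searched. */
        char *dot = memchr(text + used, '.', (size_t)got);
        if (dot) {
            used = (size_t)(dot - text); /* keep only what precedes the dot */
            break;
        }
        used += (size_t)got;
    }
    text[used] = '\0';
    return text;
}
```

The caller owns the returned string and releases it with free(). Growing by one chunk per iteration keeps the sketch short; doubling the capacity instead would reduce the number of realloc() calls for long inputs.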

Related

Find and replace a word in a file, how to avoid reading the entire file into a buffer?

I have an assignment where I'm supposed to write to a file, then perform a find and replace on it, with the condition that the old word must have the same length as the new one.
What I'm currently doing is finding the file size, allocating a buffer of that size, reading the entire file into the buffer, changing the words, then writing it back to the file.
This would fail if the file is too big. The only thing I can think of to avoid this is:
Check if the buffer contains \n
If it doesn't (the entire line wasn't read), then use realloc to increase its size by any amount (the original for example)
Delete the last n characters in the buffer, where n is the length of the word we want to replace. (To avoid reading the same data again)
Set the file pointer back by n. (Because the word could be cut)
Is there any other method? This feels complicated, and realloc causes some issues that might make the program need new buffers.
This is the current code where I read the entire file at once:
void replace_word(const char *s, const char *old_word, const char *new_word){
    FILE *original_file;
    if((original_file = fopen(s, "r+")) == NULL){
        perror(s);
        exit(EXIT_FAILURE);
    }
    const int BUFFER_SIZE = fsize(s);
    char *buffer = malloc(BUFFER_SIZE);
    char *init_loc = buffer;
    int word_len = strlen(old_word);
    int word_frequency = 0;
    fgets(buffer, BUFFER_SIZE, original_file);
    while((buffer = strstr(buffer, old_word))){
        memcpy(buffer, new_word, word_len);
        word_frequency++;
    }
    buffer = init_loc;
    rewind(original_file);
    fputs(buffer, original_file);
    printf("'%s' found %i times\n", old_word, word_frequency);
    fclose(original_file);
    free(buffer);
}
You can do it with a "sliding window" algorithm using just one fixed buffer of any length that you want, as long as the buffer is longer than the word you are looking for.
The pseudocode to search for a word of length N would look as follows:
Begin with a buffer full of data from the file.
Loop:
    Search for the word in the buffer; if found:
        calculate the offset of the word in the file
        write the replacement over it.
    Move the last N - 1 characters from the end of the buffer to the beginning of the buffer. (That's because these characters may contain part of the word, and the remaining part may be in the beginning of the next buffer that you will read.)
    Fill the remainder of the buffer from the file.
Repeat the above loop until you reach the end of the file.
For this to perform well, the buffer must be much longer than the word. So, if your word is up to 100 characters long, the buffer should be at least 4 kilobytes long. But 64 and even 128 kilobyte buffers work well in modern systems.
Do not forget to seek to the right offset before each read operation.
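The sliding-window steps above could be sketched in C roughly as follows. This is an illustration under stated assumptions rather than a definitive implementation: replace_in_file and WINDOW_SIZE are hypothetical names, the file is assumed to be opened in "r+b" mode, and offsets are assumed to fit in a long. One small deviation from the pseudocode: instead of always carrying N - 1 bytes, it carries only the bytes that have not yet been tried as a match start, so text that was just replaced is never scanned again.

```c
#include <stdio.h>
#include <string.h>

#define WINDOW_SIZE 4096   /* assumed size; must exceed the word length */

/* Hypothetical helper: replaces every occurrence of old_word with new_word
 * (same length) in a file opened with "r+b". Returns the number of
 * replacements made, or -1 on error. */
long replace_in_file(FILE *fp, const char *old_word, const char *new_word)
{
    char window[WINDOW_SIZE];
    size_t n = strlen(old_word);
    size_t kept = 0;          /* bytes carried over from the previous pass */
    long window_start = 0;    /* file offset of window[0] */
    long read_pos = 0;        /* file offset where the next read begins */
    long count = 0;

    if (n == 0 || n != strlen(new_word) || n >= WINDOW_SIZE)
        return -1;

    for (;;) {
        /* Seek before every read: a write may have moved the position. */
        if (fseek(fp, read_pos, SEEK_SET) != 0)
            return -1;
        size_t got = fread(window + kept, 1, sizeof window - kept, fp);
        if (got == 0)
            break;
        read_pos += (long)got;
        size_t len = kept + got;

        size_t i = 0;
        while (i + n <= len) {
            if (memcmp(window + i, old_word, n) == 0) {
                /* File offset of the match = window start + index. */
                if (fseek(fp, window_start + (long)i, SEEK_SET) != 0 ||
                    fwrite(new_word, 1, n, fp) != n)
                    return -1;
                count++;
                i += n;       /* do not rescan the replaced text */
            } else {
                i++;
            }
        }

        /* Carry the unscanned tail (at most n - 1 bytes) to the front. */
        kept = len - i;
        memmove(window, window + i, kept);
        window_start += (long)i;
    }
    return ferror(fp) ? -1 : count;
}
```

A call such as replace_in_file(fp, "cat", "dog") rewrites the file in place using a fixed 4 KB window, regardless of the file size.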
I don't know if this is the best solution or not, but I would just look at one word at a time. Then when you find the word you want to change, go back by the size of the word you read and overwrite it. As long as the word is the same size, it should work.
Use fgetc to get one char at a time from your file. Replace getchar with fgetc in the code below.
Just modify this code to work with fgetc. It is from K&R's famous book on C, which I read 10 months ago to learn C. I've used it a few times in my own code, and it works fine.
#include <stdio.h>
#include <ctype.h>
/* getword: get next word or character from input */
int getword(char *word, int lim)
{
    int c, getch(void);
    void ungetch(int);
    char *w = word;
    while (isspace(c = getch()))
        ;
    if (c != EOF)
        *w++ = c;
    if (!isalpha(c)) {
        *w = '\0';
        return c;
    }
    for ( ; --lim > 0; w++)
        if (!isalnum(*w = getch())) {
            ungetch(*w);
            break;
        }
    *w = '\0';
    return word[0];
}
#define BUFSIZE 100
char buf[BUFSIZE]; /* buffer for ungetch */
int bufp = 0; /* next free position in buf */
int getch(void) /* get a (possibly pushed-back) character */
{
    return (bufp > 0) ? buf[--bufp] : getchar(); //change to fgetc
}
void ungetch(int c) /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
You can make the maximum size of the array anything you want. It's set to 100, since there should be no words bigger than 100 chars.
Just modify the code to read from fgetc, and end when you hit EOF.

Invalid Argument Reported By getdelim

I'm trying to use the getdelim function to read an entire text file's contents into a string.
Here is the code I am using:
ssize_t bytesRead = getdelim(&buffer, 0, '\0', fp);
This is failing however, with strerror(errno) saying "Error: Invalid Argument"
I've looked at all the documentation I could and just can't get it working, I've tried getline which does work but I'd like to get this function working preferably.
buffer is NULL initialised as well so it doesn't seem to be that
fp is also not reporting any errors and the file opens perfectly
EDIT: My implementation is based on an answer from this stackoverflow question Easiest way to get file's contents in C
Kervate, please enable compiler warnings (-Wall for gcc), and heed them. They are helpful; why not accept all the help you can get?
As pointed out by WhozCraig and n.m. in comments to your original question, the getdelim() man page shows the correct usage.
If you wanted to read records delimited by the NUL character, you could use
FILE *input; /* Or, say, stdin */
char *buffer = NULL;
size_t size = 0;
ssize_t length;
while (1) {
    length = getdelim(&buffer, &size, '\0', input);
    if (length == (ssize_t)-1)
        break;
    /* buffer has length chars, including the trailing '\0' */
}
free(buffer);
buffer = NULL;
size = 0;
if (ferror(input) || !feof(input)) {
    /* Error reading input, or some other reason
     * that caused an early break out of the loop. */
}
If you want to read the contents of a file into a single character array, then getdelim() is the wrong function.
Instead, use realloc() to dynamically allocate and grow the buffer, appending to it using fread(). To get you started -- this is not complete! -- consider the following code:
FILE *input; /* Handle to the file to read, assumed already open */
char *buffer = NULL;
size_t size = 0;
size_t used = 0;
size_t more;
while (1) {
    /* Grow buffer when less than 500 bytes of space. */
    if (used + 500 >= size) {
        size_t new_size = used + 30000; /* Allocate 30000 bytes more. */
        char *new_buffer;
        new_buffer = realloc(buffer, new_size);
        if (!new_buffer) {
            free(buffer); /* Old buffer still exists; release it. */
            buffer = NULL;
            size = 0;
            used = 0;
            fprintf(stderr, "Not enough memory to read file.\n");
            exit(EXIT_FAILURE);
        }
        buffer = new_buffer;
        size = new_size;
    }
    /* Try reading more data, as much as fits in buffer. */
    more = fread(buffer + used, 1, size - used, input);
    if (more == 0)
        break; /* Could be end of file, could be error */
    used += more;
}
Note that the buffer in this latter snippet is not a string. There is no terminating NUL character, so it's just an array of chars. In fact, if the file contains binary data, the array may contain lots of NULs (\0, zero bytes). Assuming there was no error and all of the file was read (you need to check for that, see the former example), buffer contains used chars read from the file, with enough space allocated for size. If used > 0, then size > used. If used == 0, then size may or may not be zero.
If you want to turn buffer into a string, you need to decide what to do with the possibly embedded \0 bytes -- I recommend either convert to e.g. spaces or tabs, or move the data to skip them altogether --, and add the string-terminating \0 at end to make it a valid string.
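As an illustration of that last step, here is a minimal sketch that replaces embedded NUL bytes with spaces and appends the terminator. The name bytes_to_string is hypothetical, and replacing with spaces is just one of the options mentioned above.

```c
#include <stdlib.h>

/* Hypothetical helper: turns `used` raw bytes into a NUL-terminated string,
 * replacing each embedded '\0' with a space. Returns the (possibly moved)
 * buffer, or NULL if realloc() fails, in which case the original buffer is
 * still valid and still owned by the caller. */
char *bytes_to_string(char *buffer, size_t used)
{
    /* Resize to exactly the data plus one byte for the terminator. */
    char *text = realloc(buffer, used + 1);
    if (!text)
        return NULL;

    for (size_t i = 0; i < used; i++)
        if (text[i] == '\0')
            text[i] = ' ';

    text[used] = '\0';
    return text;
}
```

After the read loop finishes without error, buffer = bytes_to_string(buffer, used) would leave a string that is safe to pass to functions such as strlen() or printf("%s").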

Scanning string with length restriction

Using the standard C library, is there a way to scan a string (containing no whitespace) from standard input only if it fits in a buffer? In the following example I would like scanCount to be 0 if the input string is larger than 32:
char str[32];
int scanCount;
scanCount = scanf("%32s", str);
Edit: I also need file pointer rollback when the input string is too large.
You specified a requirement to only read if the whole data fits your buffer. This requirement makes no sense at all as it doesn't provide any functionality to your program. You can easily achieve the same sort of tasks without it. It also is not how operating systems present files to the user applications.
You can simply create a buffer of any size you see fit and then you can keep the data in the buffer until you can handle it, or you can do magic like actually resizing the buffer to accommodate more incoming data.
You can read any number of characters from a file using the ANSI fread() function:
size_t count;
char buffer[50];
count = fread(buffer, 1, sizeof buffer, stdin);
You can then see how many characters have actually been read by looking at the count variable. If it's less than the buffer size, you can fill in the final NUL character; if the whole buffer has been filled and more data may be available, you can decide what to do next. You could of course read sizeof buffer - 1 instead, to always be able to terminate the string. When the count is smaller than your specified value, feof() and ferror() can be used to see what happened. You can also look at the actual data and check for LF characters to see how many lines you have read.
When using an enlarging buffer, you will need malloc() or just create a NULL pointer that will later be allocated using realloc():
/* Set initial size and offset. */
size_t offset = 0;
size_t size = 0;
char *buffer = NULL;
When you need to change the size of the buffer, you can use realloc():
/* Change the size. */
size = 100;
buffer = realloc(buffer, size);
(The first time it's equivalent to buffer = malloc(size).)
You can then read data into the buffer:
size_t count = fread(buffer + offset, 1, size - offset, stdin);
count += offset;
(The first time it's equivalent to fread(buffer, 1, size, stdin).)
When finished, you should free the buffer:
free(buffer);
At any time, you still have all the already read data somewhere in a buffer, so you can get back to it at any time, you just decouple the reading and processing, where the above examples are all about reading.
The processing then depends on what you need. You generally need to identify the start and end of the data that you want to extract.
Example start and end, where end means one character after the last one you want, so the arithmetics work better:
size_t start = 0;
size_t end = 10;
Extract the data (using bits of C99):
char data[end - start + 1];
memcpy(data, buffer + start, end - start);
data[end - start] = '\0';
Now you have a NUL-terminated string containing the data you wanted to extract. Sometimes you just assume start = 0 and then want to consume the data from the buffer to make place for new data:
char data[end + 1];
/* copy out the data */
memcpy(data, buffer, end);
/* move data between end and offset to the beginning */
memmove(buffer, buffer + end, offset - end);
/* adjust the offset accordingly */
offset -= end;
Now you have your data extracted but you still have the buffer ready with the rest of the data you haven't processed yet. This effectively achieves what you wanted: by keeping the data in an intermediate buffer, you're effectively peeking into an arbitrary part of the data received on input and taking out the data only if it fits your expectations, doing whatever else if it doesn't. Of course you should carefully test all return values to check for exceptional conditions and such stuff.
I personally would also turn all indexes in the examples into pointers directly to the memory and adjust the arithmetic accordingly, but not everyone enjoys pointer arithmetic as I do ;). I also tend to prefer the low-level POSIX API over the intermediate layer in the form of the ANSI API. Ready to fix bugs or improve explanations, please comment.
Your comment that you need the file pointer reset on scan failure makes this impossible to do with scanf().
scanf() is basically specified as "fscanf( stdin, ... )", and fscanf() is defined to "[push] back at most one input character onto the input stream" (C99, footnote 242). (I assume this is for the same reason that ungetc() is only required to support one byte of push-back: So that it can be conveniently buffered in memory.)
*scanf() is a poor choice to read uncertain inputs, for the reason described above and several other shortcomings when it comes to recovery-from-error. Generally speaking, if there is any chance that the input might not conform to the expected format, read input into an internal memory buffer first and then parse it from there.
Just read and store one character too many, and test for that.
char str[34]; // 33 characters + NUL terminator
int scanCount = scanf("%33s", str);
if (scanCount > 0 && strlen(str) > 32)
{
    scanCount = 0;
}
When scanning a stream such as stdin, code is only allowed to "put back" up to 1 char. So scanning 32 or 33 chars and then undoing is not possible.
If your input supports ftell() and fseek() (available when stdin is redirected from a file), code could use
long pos = ftell(input);
char str[32+1];
int scanCount;
scanCount = fscanf(input, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
    fseek(input, pos, SEEK_SET);
    scanCount = fscanf(input, some_new_format, ....);
}
Otherwise use fgets() to read a maximal line and use sscanf()
char buf[1024];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
char str[32+1];
int scanCount;
scanCount = sscanf(buf, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
    scanCount = sscanf(buf, some_new_format, ....);
}

Arduino serial message with unknown length

How can I store the result of Serial.readBytesUntil(character, buffer, length) in a buffer when I don't know the length of the incoming message?
Here is a bit of code that uses realloc() to keep growing your buffer. You will have to free() buf when you're done with it.
int length = 8;
char * buf = (char *)malloc(length); // cast needed: Arduino sketches compile as C++
int total_read = 0;
total_read = Serial.readBytesUntil(character, buf, length);
while(length == total_read) {
    length *= 2;
    buf = (char *)realloc(buf, length);
    // Bug in this line:
    // total_read += Serial.readBytesUntil(character, buf+total_read, length);
    // Should be
    total_read += Serial.readBytesUntil(character, buf+total_read, length-total_read);
}
*Edit: fixed a bug where readBytesUntil would have read off the end of buf by reading length bytes instead of length-total_read bytes.
Make the buffer big enough for the message. Don't know the maximum length of the message? Use length to control the number of characters read, then continue reading until character is encountered.
int bytesRead = Serial.readBytesUntil(character, buffer, length);
You could create a buffer that is just smaller than the remaining RAM and use that. The call to find the remaining ram (as I've posted elsewhere) is:
int freeRam () {
    extern int __heap_start, *__brkval;
    int v;
    int fr = (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
    Serial.print("Free ram: ");
    Serial.println(fr);
    return fr; // the function is declared int, so return the value
}
Regardless, you should make sure you only read into as much RAM as you actually have.
One answer is that when a program reads serial bytes it typically does NOT store them verbatim. Rather, the program examines each byte and determines what action to take next. This logic is typically implemented as a finite state machine.
So, what does your specific serial stream represent? Can it be analyzed in sequential chunks? For example: "0008ABCDEFGH" says that 8 chars follow the 4 character length field. In this silly example your code would read 4 chars, then know how much space to allocate for the rest of the serial stream!
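The length-prefixed framing from that example could be parsed along these lines. This is a sketch in plain C, with a FILE * standing in for the serial stream (an assumption made only so the example is self-contained) and a hypothetical helper name, read_framed_message.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a message framed as four ASCII decimal digits
 * followed by that many payload bytes, e.g. "0008ABCDEFGH". Returns a
 * malloc'd, NUL-terminated payload and stores its length in *out_len, or
 * returns NULL on a malformed header, short input, or allocation failure. */
char *read_framed_message(FILE *in, size_t *out_len)
{
    char header[4];
    size_t len = 0;

    /* Step 1: the fixed-size length field. */
    if (fread(header, 1, sizeof header, in) != sizeof header)
        return NULL;
    for (size_t i = 0; i < sizeof header; i++) {
        if (!isdigit((unsigned char)header[i]))
            return NULL;
        len = len * 10 + (size_t)(header[i] - '0');
    }

    /* Step 2: now the payload size is known, so allocate exactly enough. */
    char *payload = malloc(len + 1);
    if (!payload)
        return NULL;
    if (fread(payload, 1, len, in) != len) {
        free(payload);
        return NULL;
    }
    payload[len] = '\0';
    if (out_len)
        *out_len = len;
    return payload;
}
```

Because the header announces the payload size, no guessing or buffer growth is needed. On an Arduino the two fread() calls would be replaced by reads from the serial port.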

strange C program fgets

I was going through an open source codebase and I see the following:-
char *buf;
char *line;
#define BUFSIZE 5000
buf = malloc(BUFSIZE);
line = buf;
while(fgets(line, (unsigned)(buf + BUFSIZE -line), in) != NULL) {
    // do stuff
    // ....
}
Why is the second argument to fgets given as buf + BUFSIZE - line?
That gives the number of characters from line to end of buf. Your //do stuff likely increments line
buf + BUFSIZE is a char * pointing to the first char after the memory allocated for buf
buf + BUFSIZE - line is an integer (of type ptrdiff_t): the number of chars from line to buf + BUFSIZE, and therefore the number of characters you can safely write to line without overflowing buf
buf + BUFSIZE - line gives the free space in the buffer.
This way line can be a scrolling pointer pointing to the first free byte, where the next read operation can put the data.
line will probably get incremented during the loop. Thus this expression reduces BUFSIZE by the size of the text already read.
It's a guess, since you didn't post the loop body.
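To make the pattern concrete, here is a small sketch of a loop in which line advances past each line that was read, so that buf + bufsize - line is always the space still free. The helper name read_lines_into is hypothetical, and the sketch assumes the buffer size fits in an int, since that is the type fgets() takes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: packs successive lines from `in` into buf, one after
 * another, until input ends or the buffer is full. Returns the number of
 * characters stored, not counting the terminating NUL. */
size_t read_lines_into(char *buf, size_t bufsize, FILE *in)
{
    char *line = buf;              /* scrolling pointer: first free byte */

    if (bufsize > 0)
        buf[0] = '\0';             /* valid empty string if nothing is read */

    /* fgets() needs room for at least one character plus the NUL. */
    while (buf + bufsize - line > 1 &&
           fgets(line, (int)(buf + bufsize - line), in) != NULL) {
        line += strlen(line);      /* skip past what was just read */
    }
    return (size_t)(line - buf);
}
```

Each call to fgets() is told exactly how much room remains, so the buffer cannot overflow no matter how many lines arrive.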
