Reading strings and integers from one binary text in C - c

I'm using C and I want to read from a binary file.
I know that it contains strings laid out as follows: the length of a string, the string itself, the length of the next string, that string itself, and so on.
I want to count the number of times the string Str appears in the binary file.
So I want to do something like this:
int N;
while (!feof(file)){
if (fread(&N, sizeof(int), 1, file)==1)
...
Now I need to get the string itself. I know its length. Should I write a 'for'
loop and read it char by char with fgetc? I know I'm not allowed to use fscanf since
it's not a text file, but can I use fgetc? And would I get what I'm expecting for
my string? (Should I dynamically allocate a char* of that length
and use strcpy to add each character to the current string?)

You could allocate some memory with malloc then fread into that buffer:
char *str;
/* ... */
if (fread(&N, sizeof(int), 1, file)==1)
{
/* check that N > 0 */
str = malloc(N+1);
if (fread(str, sizeof(char), N, file) == N)
{
str[N] = '\0'; /* terminate str */
printf("Read %d chars: %s\n", N, str);
}
free(str);
}

You should probably loop on:
while (fread(&N, sizeof(int), 1, file) == 1)
{
// Check N for sanity
char *buffer = malloc(N+1);
// Check malloc succeeded
if (fread(buffer, N, 1, file) != 1)
...process error...
buffer[N] = '\0'; // Null terminate for sanity's sake
...store buffer (the pointer) for later processing so you aren't leaking...
...or free it if you won't need it later...
}
You could use getc() or fgetc() in a loop; that would work. However, the direct fread() is much simpler (and is coded as if it uses getc() in a loop).
You might want to do some sanity checking on N before blindly using it with malloc(). In particular, negative values are likely to lead to much unhappiness.
The file format as written is tied to one class of machine — either big-endian or little-endian, and with the fixed size of int (probably 32-bits). Writing more portable data is slightly fiddlier, but eminently doable — but probably not relevant to you just yet.
Using feof() is seldom the correct way to test for whether to continue with a loop. Indeed, there is not often a need to use feof() in code. When it is used, it is because an I/O operation 'failed' and you need to disambiguate between 'it was not an error — just EOF' and 'there was some sort of error on the device'.
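Putting the pieces from the answers above together, here is a minimal sketch of the counting loop the question asks for. The function name count_matches and the MAX_RECORD_LEN limit are made up for illustration, and the file is assumed to have been written on the same kind of machine (same int size and byte order) as the one reading it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary sanity limit on a record length (an assumption, not part of the format). */
#define MAX_RECORD_LEN 4096

/* Hypothetical helper: counts the records equal to target in a stream of
 * (int length, bytes) records. Returns -1 on a malformed record, an
 * allocation failure, or a read error. */
long count_matches(FILE *fp, const char *target)
{
    size_t target_len = strlen(target);
    long count = 0;
    int n;

    /* Loop on the fread() result instead of feof(). */
    while (fread(&n, sizeof n, 1, fp) == 1) {
        if (n < 0 || n > MAX_RECORD_LEN)
            return -1;                      /* length fails the sanity check */

        char *buf = malloc((size_t)n + 1);
        if (buf == NULL)
            return -1;

        if (n > 0 && fread(buf, 1, (size_t)n, fp) != (size_t)n) {
            free(buf);
            return -1;                      /* truncated record */
        }
        buf[n] = '\0';

        if ((size_t)n == target_len && memcmp(buf, target, target_len) == 0)
            count++;
        free(buf);
    }
    return ferror(fp) ? -1 : count;
}
```

Call it as count_matches(file, Str) on a stream opened with "rb". Comparing with memcmp() after checking the length avoids being fooled by strings that contain embedded zero bytes.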

Related

How to avoid calling fopen() with a buffer that is not null-terminated in C?

Let's look at this example:
static FILE *open_file(const char *file_path)
{
char buf[80];
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}
Here, the assert() is off-by-one. From the manpage for snprintf:
"Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings)."
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0. This will cause a problem because fopen() assumes it is null terminated.
What is the best way to prevent this?
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0
That is incorrect: the string would be null-terminated no matter what you pass for file_path. Obviously, the string would be cut off at the sizeof(buf)-1.
Note that snprintf could return a number above 80 as well. This would mean that the string you wanted to print was longer than the buffer you have provided.
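The behaviour described here can be checked with a small sketch. The helper name format_path is hypothetical; it only wraps snprintf() and reports whether the whole string fit.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copies path into buf with snprintf().
 * Returns 1 if the whole string fit, 0 if it was truncated or
 * snprintf() reported an error. When bufsize > 0 and no error
 * occurred, buf is null-terminated in both cases. */
int format_path(char *buf, size_t bufsize, const char *path)
{
    int n = snprintf(buf, bufsize, "%s", path);

    /* n is the length the full output would have had, excluding the '\0'. */
    return n >= 0 && (size_t)n < bufsize;
}
```

With an 8-byte buffer, a 7-character path fits exactly, while an 8-character path is cut to 7 characters plus the terminator and the function returns 0.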
What is the best way to prevent this?
You are already doing it: the assert is not necessary for preventing unterminated strings. You can use the return value to decide if any truncation has happened, and pass a larger buffer to compensate:
// Figure out the size
size_t n = snprintf(NULL, 0, "%s", file_path);
// Allocate the buffer and print into it
char *tmpBuf = malloc(n+1);
snprintf(tmpBuf, n+1, "%s", file_path);
// Prepare the file to return
FILE *res = fopen(tmpBuf, "r");
// Free the temporary buffer
free(tmpBuf);
return res;
What is the best way to prevent this?
Simple, don't give it a non-null terminated string. Academic questions aside, you are in control of the code you write. You don't have to protect against yourself sabotaging the project in every conceivable way, you just have to not sabotage yourself.
If everyone checked and double checked everything in code, the performance loss would be incredible. There's a reason why fopen doesn't do it.
There are a couple of issues here.
First assert() is used to catch issues as a part of designer testing. It is not meant to be used in production code.
Secondly if the file path is not complete then do you really want to call fopen()?
Normally what is done is to add one to the expected number of characters.
static FILE *open_file(const char *file_path)
{
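char buf[80 + 1] = {0};
int n = snprintf(buf, sizeof buf, "%s", file_path); /* snprintf returns int, negative on error */
assert(n >= 0 && (size_t)n < sizeof buf); /* fails if the path was truncated */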
return fopen(buf, "r");
}

Scanning string with length restriction

Using the standard C library, is there a way to scan a string (containing no whitespace) from standard input only if it fits in a buffer? In the following example I would like scanCount to be 0 if the input string is larger than 32:
char str[32];
int scanCount;
scanCount = scanf("%32s", str);
Edit: I also need file pointer rollback when the input string is too large.
You specified a requirement to only read if the whole data fits your buffer. This requirement makes no sense at all as it doesn't provide any functionality to your program. You can easily achieve the same sort of tasks without it. It also is not how operating systems present files to the user applications.
You can simply create a buffer of any size you see fit and keep the data in the buffer until you can handle it, or you can resize the buffer to accommodate more incoming data.
You can read any number of characters from a file using the ANSI fread() function:
size_t count;
char buffer[50];
count = fread(buffer, 1, sizeof buffer, stdin);
You can then see how many characters were actually read by looking at the count variable. If it's less than the buffer size, you can fill in the final NUL character; if the whole buffer has been filled and more data may be available, you can decide what to do next. You could of course read sizeof buffer - 1 instead, so that you can always terminate the string. When the count is smaller than the value you asked for, feof() and ferror() can be used to see what happened. You can also look at the actual data and check for LF characters to see how many lines you have read.
When using an enlarging buffer, you will need malloc() or just create a NULL pointer that will later be allocated using realloc():
/* Set initial size and offset. */
size_t offset = 0;
size_t size = 0;
char *buffer = NULL;
When you need to change the size of the buffer, you can use realloc():
/* Change the size. */
size = 100;
buffer = realloc(buffer, size);
(The first time it's equivalent to buffer = malloc(size).)
You can then read data into the buffer:
size_t count = fread(buffer + offset, 1, size - offset, stdin);
offset += count; /* offset now marks the end of the valid data in the buffer */
(The first time it's equivalent to fread(buffer, 1, size, stdin).)
When finished, you should free the buffer:
free(buffer);
At any time, you still have all the already read data somewhere in a buffer, so you can get back to it at any time, you just decouple the reading and processing, where the above examples are all about reading.
The processing then depends on what you need. You generally need to identify the start and end of the data that you want to extract.
Example start and end, where end means one character after the last one you want, so the arithmetics work better:
size_t start = 0;
size_t end = 10;
Extract the data (using bits of C99):
char data[end - start + 1];
memcpy(data, buffer + start, end - start);
data[end - start] = '\0';
Now you have a NUL-terminated string containing the data you wanted to extract. Sometimes you just assume start = 0 and then want to consume the data from the buffer to make place for new data:
char data[end + 1];
/* copy out the data */
memcpy(data, buffer, end);
/* move data between end and offset to the beginning */
memmove(buffer, buffer + end, offset - end);
/* adjust the offset accordingly */
offset -= end;
Now you have your data extracted, but you still have the buffer ready with the rest of the data you haven't processed yet. This effectively achieves what you wanted: by keeping the data in an intermediate buffer, you're peeking into an arbitrary part of the input and taking out the data only if it fits your expectations, doing something else if it doesn't. Of course you should carefully test all return values to check for exceptional conditions.
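The "copy out, then shift the rest down" step can be wrapped in one function. This is only a sketch: the name consume_prefix is hypothetical, and it assumes that offset counts the valid bytes currently held in buffer.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: copies the first `end` bytes of buffer into out
 * (adding a NUL terminator), then moves the remaining valid bytes to the
 * front of buffer and updates *offset. out must have room for end + 1
 * bytes. Returns 0 on success, -1 if fewer than `end` bytes are buffered. */
int consume_prefix(char *buffer, size_t *offset, size_t end, char *out)
{
    if (end > *offset)
        return -1;

    memcpy(out, buffer, end);
    out[end] = '\0';

    /* memmove, not memcpy: source and destination overlap. */
    memmove(buffer, buffer + end, *offset - end);
    *offset -= end;
    return 0;
}
```

Checking end against *offset first is what lets the caller "take the data only if it is there" and otherwise go back to reading more input.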
I personally would also turn all indexes in the examples into pointers directly into the memory and adjust the arithmetic accordingly, but not everyone enjoys pointer arithmetic as I do ;). I also tend to prefer the low-level POSIX API over the intermediate layer in the form of the ANSI API. Ready to fix bugs or improve explanations, please comment.
Your comment that you need the file pointer reset on scan failure makes this impossible to do with scanf().
scanf() is basically specified as "fscanf( stdin, ... )", and fscanf() is defined to "[push] back at most one input character onto the input stream" (C99, footnote 242). (I assume this is for the same reason that ungetc() is only required to support one byte of push-back: So that it can be conveniently buffered in memory.)
*scanf() is a poor choice to read uncertain inputs, for the reason described above and several other shortcomings when it comes to recovery-from-error. Generally speaking, if there is any chance that the input might not conform to the expected format, read input into an internal memory buffer first and then parse it from there.
Just read and store one character too many, and test for that.
char str[34]; // 33 characters + NUL terminator
int scanCount = scanf("%33s", str);
if (scanCount > 0 && strlen(str) > 32)
{
scanCount = 0;
}
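The same idea as a function that works on any stream, as a rough sketch. The name read_word32 is made up. Note that it does not roll the stream back: when the token is too long, its first 33 characters have already been consumed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the next whitespace-delimited token from in.
 * Returns 1 and stores it in out if it has at most 32 characters,
 * otherwise returns 0 and leaves out untouched. */
int read_word32(FILE *in, char out[33])
{
    char tmp[34];                 /* 33 characters + NUL terminator */

    if (fscanf(in, "%33s", tmp) != 1)
        return 0;                 /* EOF or read error */
    if (strlen(tmp) > 32)
        return 0;                 /* token is too long for out */

    strcpy(out, tmp);
    return 1;
}
```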
A stream such as stdin only guarantees that 1 char can be "put back", so scanning 32 or 33 chars and then undoing is not possible.
If your input supports ftell() and fseek() (available when stdin is redirected from a file), code could do this:
long pos = ftell(input);
char str[32+1];
int scanCount;
scanCount = fscanf(input, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
fseek(input, pos, SEEK_SET);
scanCount = fscanf(input, some_new_format, ....);
}
Otherwise use fgets() to read a maximal line and use sscanf()
char buf[1024];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
char str[32+1];
int scanCount;
scanCount = sscanf(buf, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
scanCount = sscanf(buf, some_new_format, ....);
}

How to copy one line from long string C

I'm looking to copy the FIRST line from a LONG string P into a buffer
I have no idea how to make it.
while (*pros_id != '/n'){
*pros_id_line=*pros_id;
pros_id++;
pros_id_line++;
}
And tried
fgets(pros_id_line, sizeof(pros_id_line), pros_id);
Both are not working. Can I get some help please?
Note, as Adriano Repetti pointed out in a comment and an answer, that the newline character is '\n' and not '/n'.
Your initial code can be fixed up to work, provided that the destination buffer is big enough:
while (*pros_id != '\n' && *pros_id != '\0')
*pros_id_line++ = *pros_id++;
*pros_id_line = '\0';
This code does not include the newline in the copied buffer; it is easy enough to add it if you need it.
One advantage of this code is that it makes a single pass through the data up to the newline (or end of string). An alternative makes two passes through the data, one to find the newline and another to copy to the newline:
if ((end = strchr(pros_id, '\n')) != 0)
{
memmove(pros_id_line, pros_id, end - pros_id);
pros_id_line[end - pros_id] = '\0';
}
This ensures that the string is null-terminated; again, it omits the newline, and assumes there is enough space in the pros_id_line buffer for the data. You have to decide what is the correct behaviour when there is no newline in the buffer. It might be sufficient to copy the buffer without the newline into the target area, or you might prefer to report a problem.
You can use strncpy() instead of memmove() but it has a more complex loop condition than memmove() — it has to check for a null byte as well as the count, whereas memmove() only has to check the count. You can use memcpy() instead of memmove() if you're sure there's no overlap between source and target, but memmove() always works and memcpy() sometimes doesn't (though only when the source and target areas overlap), and I prefer reliability over possible misbehaviour.
Note that setting a buffer to zero before copying a string to it is a waste of energy. The parts that you're about to overwrite with data didn't need to be zeroed. The parts that you aren't going to overwrite with data didn't need to be zeroed either. You should know exactly which byte needs to be zeroed, so why waste the time on zeroing anything except the one byte that needs to be zeroed?
(One exception to this is if you are dealing with sensitive data and are concerned that some function that your code will call may deliberately read beyond the end of the string and come across parts of a password or other sensitive data. Then it may be appropriate to wipe the memory before writing new data to it. On the whole, though, most people aren't writing such code.)
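Both versions above assume the destination is big enough. A minimal sketch that checks the size first could look like this; the name copy_first_line is hypothetical, and it uses strcspn() in place of strchr() so that the "no newline" case needs no special handling.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: copies the first line of src (without the newline)
 * into dst and null-terminates it. Returns the number of characters
 * copied, or (size_t)-1 if dst is too small to hold the line. */
size_t copy_first_line(char *dst, size_t dstsize, const char *src)
{
    /* Length of the initial run that contains no '\n';
     * equals strlen(src) when there is no newline at all. */
    size_t len = strcspn(src, "\n");

    if (len >= dstsize)
        return (size_t)-1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Only the one byte after the copied data is zeroed, in line with the point about not clearing the whole buffer beforehand.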
The newline character is \n, not /n. Anyway, I'd use strchr for this:
char* endOfFirstLine = strchr(inputString, '\n');
if (endOfFirstLine != NULL)
{
strncpy(yourBuffer, inputString,
endOfFirstLine - inputString);
}
else // Input is one single line
{
strcpy(yourBuffer, inputString);
}
Here inputString is your char* multiline string and yourBuffer is your required output (the first line of inputString). This assumes yourBuffer is big enough to contain all the data from inputString and has been zeroed, since strncpy() does not add a terminating null byte when it stops at the count.
If you're going to be doing a lot of reading from long text buffers, you could try using a memory stream, if your system supports them: https://www.gnu.org/software/libc/manual/html_node/String-Streams.html
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
static char buffer[] = "foo\nbar";
int
main()
{
char arr[100];
FILE *stream;
stream = fmemopen(buffer, strlen(buffer), "r");
fgets(arr, sizeof arr, stream);
printf("First line: %s\n", arr);
fgets(arr, sizeof arr, stream);
printf("Second line: %s\n", arr);
fclose (stream);
return 0;
}
POSIX 2008 (e.g. most Linux systems) has getline(3) which heap-allocates a buffer for a line.
So you could code
FILE* fil = fopen("something.txt","r");
if (!fil) { perror("fopen"); exit(EXIT_FAILURE); };
char *linebuf=NULL;
size_t linesiz=0;
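if (getline(&linebuf, &linesiz, fil) >= 0) { /* getline returns -1 on failure or EOF */
do_something_with(linebuf);
free(linebuf); /* the buffer was heap-allocated by getline */
}
else { perror("getline"); exit(EXIT_FAILURE); }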
If you want to read an editable line from stdin in a terminal consider GNU readline.
If you are restricted to pure C99 code you have to do the heap allocation yourself (malloc or calloc or perhaps -with care- realloc)
If you just want to copy the first line of some existing buffer char*bigbuf; which is non-NULL, valid, and zero-byte terminated:
char*line = NULL;
char *eol = strchr(bigbuf, '\n');
if (!eol) { // bigbuf is a single line so duplicate it
line = strdup(bigbuf);
if (!line) { perror("strdup"); exit(EXIT_FAILURE); }
} else {
size_t linesize = eol - bigbuf;
line = malloc(linesize+1);
if (!line) { perror("malloc"); exit(EXIT_FAILURE); }
memcpy (line, bigbuf, linesize);
line[linesize] = '\0';
}

fscanf overwriting next bytes in memory (C)

The basic gist is, I'm reading words from a text file, storing them as a string, running a function, and then looping over this multiple times, rewriting that string with every new line read. After this loop is done, I need to deal with a different string. The problem is, the second string's bytes, even though I've memset them to 0 at declaration, are getting overwritten by the extra letters in words longer than the space I've allocated to the first string:
char* currDictWord = malloc(9 * sizeof(char));
char* currBrutWord = malloc(9 * sizeof(char));
memset(currBrutWord, 0, 9);
memset(currDictWord, 0, 9);
...
while (stuff) {
fscanf(dictionary, "%s", currDictWord);
}
...
printf("word: %s\n", currBrutWord);
currBrutWord will not be empty anymore. The two ways I've dealt with this are by either making sure currDictWord is longer than the longest word in the dictionary file (kind of a ghetto solution), and doing a new memset on currBrutWord after the loop. Is there no way to tell C to stop writing stuff into memory I've specifically allocated for a different variable?
Yes: stop using fscanf (and preferably the whole scanf-family), and use fgets instead, it lets you pass the maximum number of bytes to read into the variable.
EDIT: (in response to the comment)
fgets stops reading once count - 1 bytes have been read or a newline has been found; the newline, if found, is stored in the string. So after calling fgets, check whether there is a newline at the end of the string (and remove it if necessary). If there is no newline in the string, fgetc from the file until you've found one, like this:
if (fgets(currDictWord, 9, dictionary) != NULL) {
size_t len = strlen(currDictWord);
if (len == 0 || currDictWord[len - 1] != '\n') {
int c;
while ((c = fgetc(dictionary)) != '\n' && c != EOF); /* no body necessary */
/* the stream pointer is now at the beginning of the next line (or at EOF) */
}
}
The problems are improper string assignment and not validating data read from a file.
currBrutWord is overwritten because too many chars were written into currDictWord, overrunning its allocation. The same kind of overrun would have happened had you done:
strcpy(currDictWord, "123456789"); // Bad, as this copies 9+1 chars into currDictWord
When using fscanf(), one could limit the data read via:
fscanf(dictionary, "%8s", currDictWord);
This prevents fscanf() from putting too much data into currDictWord. That part is good, but you still have unexpected data coming from the file. You need to challenge any data from the outside world.
if (NULL == fgets(bigbuf, sizeof bigbuf, dictionary)) {
/* handle EOF or I/O error */
}
// now parse and validate bigbuf using various tools: strtok(), sscanf(), etc.
int n;
if ((sscanf(bigbuf, "%8s%n", currDictWord, &n) < 1) || (bigbuf[n] != '\n')) {
/* handle error */
}
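The fgets() plus sscanf() validation above can be packaged as one function. This is a sketch under a few assumptions: the name read_dict_word and the 256-byte line buffer are made up, the dictionary holds one word per line, and lines longer than the buffer are not handled specially (the remainder is simply read as the next line).

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from fp and extracts a word of at
 * most 8 characters into word (9 bytes including the terminator).
 * Returns 1 on success, 0 if the line is blank, holds a longer word, or
 * has trailing text, and EOF at end of input or on a read error. */
int read_dict_word(FILE *fp, char word[9])
{
    char line[256];
    int n = 0;

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;

    /* %8s never stores more than 8 characters plus the '\0';
     * %n records how many characters of line were consumed. */
    if (sscanf(line, "%8s%n", word, &n) != 1)
        return 0;

    /* The word must be followed directly by the end of the line. */
    return line[n] == '\n' || line[n] == '\0';
}
```

A 10-letter word makes the function return 0 instead of spilling into whatever was allocated next to word, which is the overrun the question describes.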

C: Writing and Reading a string to and from a binary file

I want to store strings in a binary file, along with a lot of other data. I'm using the code below (when I use it for real the strings will be malloc'd). I can write to the file; I've looked at it in a hex editor. I'm not sure I'm writing the null terminator correctly (or if I need to). When I read back out I get the same string length that I stored, but not the string. What am I doing wrong?
FILE *fp = fopen("mybinfile.ttt", "wb");
char drumCString[6] = "Hello\0";
printf("%s\n", drumCString);
//the string length + 1 for the null terminator
unsigned short sizeOfString = strlen(drumCString) + 1;
fwrite(&sizeOfString, sizeof(unsigned short), 1, fp);
//write the string
fwrite(drumCString, sizeof(char), sizeOfString, fp);
fclose(fp);
fp = fopen("mybinfile.ttt", "rb");
unsigned short stringLength = 0;
fread(&stringLength, sizeof(unsigned short), 1, fp);
char *drumReadString = malloc(sizeof(char) * stringLength);
int count = fread(&drumReadString, sizeof(char), stringLength, fp);
//CRASH POINT
printf("%s\n", drumReadString);
fclose(fp);
The mistake is in the read.
You have put & in front of the pointer variable, which is why it gives a segmentation fault.
After removing it, the code works fine and prints Hello correctly.
int count = fread(drumReadString, sizeof(char), stringLength, fp);
I see a couple of issues, some problematic, some stylistic.
You should really test the return values from malloc, fread and fwrite since it's possible that the allocation can fail, and no data may be read or written.
sizeof(char) is always 1, there's no need to multiply by it.
The character array "Hello\0" is actually 7 bytes long. You don't need to add a superfluous null terminator.
I prefer the idiom char x[] = "xxx"; rather than specifying a definite length (unless you want an array longer than the string of course).
When you fread(&drumReadString ..., you're actually overwriting the pointer, not the memory it points to. This is the cause of your crash. It should be fread(drumReadString ....
A couple of tips:
1
A terminating \0 is implicit in any double-quoted string literal, so by adding another one at the end you end up with two. The following two initializations are identical (both arrays are 7 bytes long):
char str1[] = "Hello\0";
char str2[] = { 'H', 'e', 'l', 'l', 'o', '\0', '\0'};
So
char drumCString[] = "Hello";
is enough. Specifying the size of the array is optional when it is initialized like this; the compiler will figure out the required size (6 bytes).
2
When writing a string, you might just as well just write all characters in one go (instead of writing one by one character sizeOfString times):
fwrite(drumCString, sizeOfString, 1, fp);
3
Even though it is not so common in a normal desktop PC scenario, malloc can return NULL, and you will benefit from developing a habit of always checking the result, because in embedded environments getting NULL is not an unlikely outcome.
char *drumReadString = malloc(sizeof(char) * stringLength);
if (drumReadString == NULL) {
fprintf(stderr, "drumReadString allocation failed\n");
return;
}
You don't need to write the terminating NUL, but if you leave it out you have to remember to add it when reading, i.e. malloc stringLength + 1 chars, read stringLength chars, and add a \0 at the end of what has been read.
Now the usual warning: if you are writing binary file the way you are doing here, you have lots of unstated assumptions which make your format difficult to port, sometimes even to another version of the same compiler -- I've seen default alignment in struct changes between compiler versions.
Some more to add to paxdiablo and AProgrammer: if you are going to use malloc in the future, just do it from the get-go. It's better form and means you won't have to debug when you switch over.
Additionally I'm not fully seeing the use of the unsigned short, if you are planning on writing a binary file, consider that the unsigned char type is generally of size byte, making it very convenient for that purpose.
Just replace &drumReadString with drumReadString in the fread call, as ganesh mentioned. drumReadString is already a pointer to the allocated memory, so passing its address makes fread overwrite the pointer itself rather than the memory it points to.
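Pulling the suggestions from the answers together, a write/read pair might look like the sketch below. The names write_string and read_string are hypothetical. This variant stores the length without the terminating NUL and adds the NUL back when reading, and it keeps the question's unsigned short prefix, so the file is only portable between machines with the same short size and byte order.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes an unsigned short length followed by the
 * characters of s (no terminator). Returns 0 on success, -1 on failure. */
int write_string(FILE *fp, const char *s)
{
    size_t len = strlen(s);
    if (len > USHRT_MAX)
        return -1;                          /* does not fit in the prefix */

    unsigned short prefix = (unsigned short)len;
    if (fwrite(&prefix, sizeof prefix, 1, fp) != 1)
        return -1;
    if (len > 0 && fwrite(s, 1, len, fp) != len)
        return -1;
    return 0;
}

/* Hypothetical helper: reads one string written by write_string().
 * Returns a malloc'd, null-terminated copy that the caller must free,
 * or NULL on EOF, read error, or allocation failure. */
char *read_string(FILE *fp)
{
    unsigned short len;
    if (fread(&len, sizeof len, 1, fp) != 1)
        return NULL;

    char *s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;

    /* Pass the pointer itself, not its address. */
    if (len > 0 && fread(s, 1, len, fp) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```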

Resources