How to wrap fscanf() using only fread() and vsscanf()


I'm porting some code on an embedded platform that uses a C-like API. The original code uses fscanf() to read and parse data from files. Unfortunately on my API I don't have a fscanf() equivalent, so prior to the actual porting I'm trying to obtain the same behavior of fscanf() using fread() and vsscanf() (which I do have). I also have the equivalent of fseek() and ftell().
EDIT: please keep in mind that the access to the embedded filesystem is very limited (fread - fseek - ftell - fgetc - fgets), so I need a solution that works with strings in memory rather than accessing the file in some other way.
The code looks something like this:
int main()
{
[...] /* variable declarations and definitions */
do
{
read = wrapped_fscanf(pFile, "%d %s", &val, str);
} while (read == 2);
fclose(pFile);
return 0;
}
int wrapped_fscanf(FILE *f, const char *template, ...)
{
va_list args;
va_start(args, template);
char tmpstr[50];
fread(tmpstr, sizeof(char), sizeof(tmpstr), f);
int ret = vsscanf(tmpstr, template, args);
long offset = /* ??? */
fseek(f, offset, SEEK_CUR);
va_end(args);
return ret;
}
The problem is that fscanf() moves the pointer to the position in the file stream at the end of the match, whereas with fread() I'm reading a fixed amount of data (in this case 50 bytes) and I should find a way to move the pointer back to the end of the matched string.
Let's assume that the 50-char string I read from the file is the following:
12 bar 13 foo 56789012345678901234567890123456789
fscanf() would match the int 12 , the string bar and the pointer would point right after the "r" in "bar" so I can call it again and read 13 foo
On the other hand fread() puts the pointer after the last char in the 50-element sequence, which is wrong: I still have to read 13 foo but if I call wrapped_fscanf() again the pointer is in the 51st position.
I have to use fseek() to roll back to the end of the first match, but how do I do that? How do I calculate the value of offset ?
vsscanf() returns the number of matches, not the length of the string, and I have no way of knowing how many whitespace characters separate the elements of the match (or do I?)
I.e. I get the same outputs( {var,str,read} == {9,"xyz",2} ) with
9 xyz
and
9     xyz
Is there some trick that I'm not aware of or do I have to find another solution other than wrapping fscanf() with fread() vsscanf() ftell() and fseek()?
Thank you

Supposing that your vsscanf() implementation supports it, your substitute for fscanf() can append a %n field descriptor to the end of the provided format. As long as there is no failure prior to vsscanf() reaching that field, it will store the number of characters consumed up to that point in the corresponding argument. You could then use that result to reposition the stream appropriately. That would require a bit of varargs wrangling and probably some macro assistance, but I think it could be made to work.
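A minimal sketch of that idea, assuming vsscanf() supports %n and that the format is always passed as a string literal (so the macro can paste "%n" onto it). The names wrapped_fscanf_impl and wf_consumed are hypothetical, and the 50-byte chunk is taken from the question:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: the format must already end in "%n" and the last
 * variadic argument must be `consumed`, so vsscanf() can report how many
 * characters of the buffer it used. */
static int wrapped_fscanf_impl(FILE *f, int *consumed, const char *fmt, ...)
{
    char buf[50];                        /* same fixed chunk size as the question */
    long start = ftell(f);
    size_t got;
    va_list args;
    int ret;

    if (start < 0)
        return EOF;

    got = fread(buf, 1, sizeof buf - 1, f);
    buf[got] = '\0';                     /* fread() does not terminate the buffer */
    *consumed = -1;                      /* stays -1 if %n is never reached */

    va_start(args, fmt);
    ret = vsscanf(buf, fmt, args);
    va_end(args);

    /* Seek to just past the match; if %n was not reached, restore the start. */
    fseek(f, start + (*consumed >= 0 ? *consumed : 0), SEEK_SET);
    return ret;
}

static int wf_consumed;  /* scratch slot written by the trailing %n */

/* Appends "%n" to the literal format and the matching int* to the arguments. */
#define wrapped_fscanf(f, fmt, ...) \
    wrapped_fscanf_impl((f), &wf_consumed, fmt "%n", __VA_ARGS__, &wf_consumed)
```

This differs from a real fscanf() in at least two ways: on a partial match it rewinds to where the call started instead of leaving the stream after the last consumed character, and a token that straddles the end of the 50-byte chunk is silently cut short. Both would need extra handling in production code.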

You will need some intermediary buffering code that grabs chunks of data (using fread) and scans your buffer for the pattern. If the pattern is found, truncate the buffer; if the pattern is not found, append some more data. This is effectively what fscanf will do.

Related

puts and printf do not give out full text (text containing CJK characters), when the text is read from a local file, on Windows, MSVC

The text contains:
..... (some characters can't be posted on SO)
xxxxxxxx=xxx xxxxxxx=xxxxx://xxx..xxx/xxxxx/xx9528994
(for full text & data please see https://github.com/ggaarder/snippets/raw/master/x.txt)
which ends in xxxxx://xxx..xxx/xxxxx/xx9528994. However, when I read it and then call puts, the output is only
..... (some characters can't be posted on SO)
xxxxxxxx=xxx xxxxxxx=xxxxx:/
which stops at xxxxx:/, so /xxx..xxx/xxxxx/xx9528994 is missing.
Code to test:
#include <stdio.h>
int main(void)
{
char s[30000];
FILE *f = fopen("x.txt", "r");
fread(s, sizeof(s), 1, f);
puts(s);
return 0;
}
The buffer size 30000 is adequate. x.txt is 1049 bytes.
You can download x.txt at https://github.com/ggaarder/snippets/raw/master/x.txt, for convenience I have packed everything to https://github.com/ggaarder/snippets/raw/master/foo.zip.
It would be very kind of you to download x.txt and take a look at it, since most of it can't be posted on SO because of the special characters, including some CJK.
Attempts:
1. The whole file is read properly. @pmg notices that fread returns zero, while @Someprogrammerdude points out that if fread's size and count arguments are swapped, fread returns 1049, which supports the guess.
2. If the CJK characters are removed, the output is completely fine, so I think there is no '\0' in the middle.
3. By adding
ret = puts(s);
printf("\nret: %d, %s", ret, strerror(errno));
we get ret: 0, No error. puts returns zero and nothing is set in errno.
You may notice the leading \n in 3. Yes, puts doesn't print the newline as usual. Does this suggest that puts failed?
But then why does it return zero with nothing in errno?
4. Could it be related to the Windows NT cmd? Maybe some special terminal control characters are being output unintentionally.
5. Reading with rb gives the same result. x.txt is an XML text; for convenience I removed the irrelevant parts, so it looks like spam.
I guess this is just yet another encoding issue, plus some magical secret Windows commandline control sequence .... I'm not taking it. I will just erase all non-ASCII characters.

The order of the "size" and "count" arguments to fread is crucial.
The first argument is the "element" size, and the second argument is the number of elements to attempt to read.
In the case of a text file, the element size is a single character, usually a single byte. The number of elements to attempt to read is the size of the destination array.
So your call should be
fread(s, 1, sizeof s, f);
instead.
With the arguments the other way around, you are saying that the "element" size is 30000 bytes and that fread should read one such element. Since the file is smaller than 30000 bytes, fread cannot read even a single complete element, so it returns 0 to indicate that.
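The difference in return values can be seen with a small sketch. The two helper names below are hypothetical and only wrap the two argument orders:

```c
#include <stdio.h>

/* Asks for ONE element of `cap` bytes: the result is 0 unless the file
 * holds at least `cap` bytes. */
static size_t read_as_one_element(FILE *f, char *buf, size_t cap)
{
    rewind(f);
    return fread(buf, cap, 1, f);
}

/* Asks for up to `cap` elements of ONE byte each: the result is the number
 * of bytes actually read. */
static size_t read_as_bytes(FILE *f, char *buf, size_t cap)
{
    rewind(f);
    return fread(buf, 1, cap, f);
}
```

For a 10-byte file and a 30-byte buffer, the first helper returns 0 and the second returns 10, which is the value needed to terminate the buffer before passing it to puts().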

Open the file in binary mode, swap the arguments, and check the return value of fread().
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char s[30000];
    FILE *f = fopen("x.txt", "rb"); // binary mode
    if (f == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    // switch args, check value; leave one byte free for the terminator
    size_t len = fread(s, 1, sizeof(s) - 1, f);
    fclose(f);
    if (len < 1) {
        perror("bad fread");
        exit(EXIT_FAILURE);
    }
    s[len] = 0; // properly terminate s
    puts(s);
    return 0;
}

This is just another everyday encoding issue. Call SetConsoleOutputCP(65001), compile with /utf-8, or set the execution character set with a #pragma, and everything will be fine.

C - moving back the pointer in the file using lseek

I am writing an academic project in C, and I can use only the <fcntl.h> and <unistd.h> headers for file operations.
I have a function that reads a file line by line. The algorithm is:
Set the pointer at the beginning of the file and get the current position.
Read data into a constant-size buffer (char buf[100]), iterate character by character, and detect the end of line '\n'.
Increment the current position: curr_pos = curr_pos + length_of_read_line;
Set the pointer to the current position using lseek(fd, current_position, SEEK_SET);
SEEK_SET sets the pointer to the given offset from the beginning of the file. In my pseudocode, current_position is the offset.
This works fine, but I always move the pointer relative to the beginning of the file with SEEK_SET, which isn't optimal.
lseek also accepts SEEK_CUR, which is relative to the current position. How can I move the pointer back from its current position (SEEK_CUR)? I tried a negative offset, but it didn't work.

The most efficient way to read lines of data from a file is typically to read a large chunk of data that may span multiple lines, process lines of data from the chunk until one reaches the end, move any partial line from the end of the buffer to the start, and then read another chunk of data. Depending upon the target system and task to be performed, it may be better to read enough to fill whatever space remains after the partial line, or it may be better to always read a power-of-two number of bytes and make the buffer large enough to accommodate a chunk that size plus a maximum-length partial line (left over from the previous read). The one difficulty with this approach is that all data must be read from the stream using the same buffer. In cases where that is practical, however, it will often allow better performance than using many separate calls to fread, and may be nicer than using fgets.
While it should be possible for a standard-library function to facilitate line input, the design of fgets is rather needlessly hostile, since it provides no convenient indication of how much data it has read. After reading each line, code that wants a string containing the printable portion will have to use strlen to try to ascertain how much data was read (hopefully the input won't contain any zero bytes) and then check the byte before the trailing zero to see if it's a newline. Not impossible, but awkward at the very least. If the fread-and-buffer approach will satisfy an application's needs, it's likely to be at least as efficient as using fgets, if not more so, and since the effort required to use fgets() robustly will be comparable to that required to use a buffering approach, one may as well use the latter.
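The chunk-and-carry-over scheme described above could look roughly like this with read(). This is a sketch, not a tuned implementation: for_each_line, line_fn, line_stats and count_line are hypothetical names, and <string.h> is assumed to be available for memchr() and memmove() (both are easy to replace with loops if only <fcntl.h> and <unistd.h> are allowed):

```c
#include <string.h>
#include <unistd.h>

#define CHUNK 100  /* buffer size from the question; longer lines are split */

/* Hypothetical callback type: receives one line without its '\n'. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Reads fd in chunks and hands each line to `fn`.
 * Returns 0 at end of file, -1 on a read error. */
static int for_each_line(int fd, line_fn fn, void *ctx)
{
    char buf[CHUNK];
    size_t used = 0;                 /* bytes currently held in buf */

    for (;;) {
        ssize_t n = read(fd, buf + used, sizeof buf - used);
        if (n < 0)
            return -1;
        if (n == 0) {                /* end of file: flush an unterminated last line */
            if (used > 0)
                fn(buf, used, ctx);
            return 0;
        }
        used += (size_t)n;

        size_t start = 0;
        char *nl;
        while ((nl = memchr(buf + start, '\n', used - start)) != NULL) {
            size_t len = (size_t)(nl - (buf + start));
            fn(buf + start, len, ctx);
            start += len + 1;        /* skip the newline itself */
        }

        if (start == 0 && used == sizeof buf) {
            fn(buf, used, ctx);      /* over-long line: emit it in pieces */
            used = 0;
        } else {
            memmove(buf, buf + start, used - start);  /* keep the partial line */
            used -= start;
        }
    }
}

/* Example callback: counts lines and sums their lengths. */
struct line_stats { size_t lines, bytes; };

static void count_line(const char *line, size_t len, void *ctx)
{
    struct line_stats *s = ctx;
    (void)line;
    s->lines++;
    s->bytes += len;
}
```

Because the partial line is carried over in memory, the file offset only ever moves forward and no lseek() call is needed at all.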

Since your question is tagged as posix, I would go with getline(), without having to manually take care of moving the file pointer.
Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE* fp;
char* line = NULL;
size_t len = 0;
ssize_t read;
fp = fopen("input.txt", "r");
if(fp == NULL)
return -1;
while((read = getline(&line, &len, fp)) != -1)
{
printf("Read line of length %zd:\n", read);
printf("%s", line);
}
fclose(fp);
if(line)
free(line);
return 0;
}
Output with custom input:
Read line of length 11:
first line
Read line of length 12:
second line
Read line of length 11:
third line

fgets not reading the beginning of a line

I am having trouble reading a few lines of text from a file using fgets. The file is some basic user data that is written to a file within the bundle the first time the plugin is launched. Any subsequent launch of the plugin should result in the user data being read and cross-referenced to check the user's authenticity.
The data is always 3 lines long, is written with fwrite exactly as it should be, and is opened with fopen.
My original approach was to call fgets 3 times, reading each line into its own char array, which is part of a data struct. The problem is that the first line is read correctly, but the second line is read as though the position indicator starts on the next line, offset by the number of characters read from line 1. The third line is then not read at all.
fgets is not returning any errors and is behaving as though it has read the data it should have, so I'm obviously missing something.
Anyway, here's a portion of my code; hopefully someone can shed some light on my mistakes!
int length;
fgets(var.n, 128, regFile);
length = strlen(var.n);
var.n[length-1] = NULL;
fgets(var.em, 128, regFile);
length = strlen(var.em);
var.em[length-1] = NULL;
fgets(var.k, 128, regFile);
length = strlen(var.k);
var.k[length-1] = NULL;
fclose(regFile);
Setting the last character in each string to NULL is just to remove the \n
This sequence of code outputs the whole of line 1, the second half of line 2 and none of line 3.

Thanks to @alvits for the answer to this one:
fwrite() is not compatible with fgets(). Files created using fwrite() should use fread() to read them back in. Both fwrite() and fread() operate on binary streams unless explicitly converted to and from strings. fgets() is compatible with fputs(); both operate on strings.
I used fputs() to write my data instead and it read back in perfectly.

In POSIX systems, including Linux, there is no differentiation between binary and text files. When opening a file stream, the b flag is ignored. This is described in fopen().
You might ask "how would you differentiate text from binary files?". The contents differentiate them. How the contents are written makes them a binary or text file.
Look at the signature size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream). It writes nmemb members, each size bytes long, from the memory at ptr. The written data is not converted to a string. If you write a byte with the value 97, it writes the binary value 97, which in ASCII is the letter a. Binary data does not obey string termination: any \n and \0 in the data are written literally, as is.
Now look at the signature int fputs(const char *s, FILE *stream). It writes the string pointed to by s. If you were to write 97, it would have to be the string "97", which is not the letter a. String termination is obeyed. On a stream opened in text mode, \n is automatically converted to the O/S supported newline (CRLF or LF).
You can coerce fwrite() to behave like fputs() but not the other way around. For example, if you declare ptr as a pointer to a string and pass a size exactly equal to the length of the content, excluding the string terminator, you can write it out as text instead of binary. You will also need to handle \0 and \n yourself. Writing the entire string buffer writes everything, including the string terminator and whatever lies past it.
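As a rough illustration of that last paragraph, the sketch below writes a string with fwrite() using strlen() as the byte count, so that fgets() can read it back line by line. The helper names write_line and read_line are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Writes `line` plus a newline with fwrite(), passing strlen() as the byte
 * count so that neither the terminator nor unused buffer space reaches
 * the file. Returns 0 on success, -1 on a write error. */
static int write_line(FILE *f, const char *line)
{
    size_t len = strlen(line);

    if (fwrite(line, 1, len, f) != len)
        return -1;
    return fputc('\n', f) == EOF ? -1 : 0;
}

/* Reads one line with fgets() and strips the trailing newline, if any.
 * Returns 0 on success, -1 at end of file or on error. */
static int read_line(FILE *f, char *out, size_t cap)
{
    if (fgets(out, (int)cap, f) == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

Using strcspn() to strip the newline also avoids indexing out[length-1], which misbehaves when the line is empty or has no trailing newline.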

C, format file for data of HTTP response

I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.
My goal is to send a string back to the client with an HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.
The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors.".
This is the code that is supposed to read a file into httpData:
int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
fclose(file);
printf("httpData = %s\n", httpData);
Edit: This is what I send:
char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);
The data part produces ???? for the image but correct html for an html file.

Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
Thus, we might have a function:
int readHTTPData(const char *filename, size_t size, char *httpData)
{
FILE *fp = fopen(filename, "rb");
size_t n;
if (fp == 0)
return E_FILEOPEN;
n = fread(httpData, size, 1, fp);
fclose(fp);
if (n != 1)
return E_SHORTREAD;
fputs("httpData = ", stdout);
fwrite(httpData, size, 1, stdout);
putchar('\n');
return 0;
}
The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:
Open the file; report error if that fails.
Read the file in a single operation.
Close the file.
Report error if the read did not get all the data that was expected.
Report on the data that was read (debugging only — and printing binary data to standard output raw is not the best idea in the world, but it parallels what the code in the question does).
Report on success.
In the original code, there is a loop:
int i = 0;
...
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
This loop has a lot of problems:
You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
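Consequently, when the last character has been read, feof() still reports 'false', so the loop calls fscanf() once more for the next (non-existent) character. That call fails, yet i is still incremented, so the buffer appears to hold one more byte than the file contains (reading with getc() into a char without checking for EOF would instead store a byte such as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS).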
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.
size_t i = 0;
int c;
/* check the bound first so that no byte is read and then discarded */
while (i < size && (c = getc(file)) != EOF)
    httpData[i++] = c;
You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition — either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can reverse the order of the size arguments. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though — don't skimp on input checking.)
At some point, for plain text data, you need to think about CRLF vs NL line endings. Text files handle that automatically; binary files do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings (but I'm not an expert on this — I've not done low-level HTTP stuff recently (not in this millennium), so the rules may have changed).
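To make the point about the mem*() functions concrete, here is a minimal sketch of joining the header and the binary data with memcpy() instead of strcpy()/strcat(). The function name build_response and its parameters are hypothetical; the body length is assumed to come from stat() or from the fread() result:

```c
#include <stdlib.h>
#include <string.h>

/* Joins a text header and a binary body into one malloc'd buffer.
 * Stores the total byte count in *out_len; returns NULL if malloc() fails.
 * One extra byte is allocated and set to '\0' as a convenience only; it is
 * not counted in *out_len. */
static char *build_response(const char *header, const char *body,
                            size_t body_len, size_t *out_len)
{
    size_t header_len = strlen(header);    /* the header is an ordinary string */
    char *resp = malloc(header_len + body_len + 1);

    if (resp == NULL)
        return NULL;
    memcpy(resp, header, header_len);
    memcpy(resp + header_len, body, body_len);  /* copies null bytes too */
    resp[header_len + body_len] = '\0';
    *out_len = header_len + body_len;
    return resp;
}
```

The caller then sends exactly *out_len bytes, since calling strlen() on the result would stop at the first null byte of the image data.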

How do I use strlen() on a formatted string?

I'd like to write a wrapper function for the mvwprintw/mvwchgat ncurses functions which prints the message in the specified window and then changes its attributes.
However, mvwchgat needs to know how many characters it should change - and I have no idea how to tell mvwchgat how long the formatted string is, since a strlen() on, for instance, "abc%d" obviously returns 5, because strlen doesn't know what %d stands for ...

In C99 or C11, you can use a line like this:
length = snprintf(NULL, 0, format_string, args);
From the manual of snprintf (emphasis mine):
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated.
Since we give snprintf 0 as the size, the output is always truncated, so the return value of snprintf is the number of characters that would have been written, which is exactly the length of the formatted string.
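Inside a variadic wrapper like the one in the question, the arguments arrive as a va_list, so the same trick would use vsnprintf() instead. A minimal sketch, with formatted_length as a hypothetical name:

```c
#include <stdarg.h>
#include <stdio.h>

/* Returns the number of characters the format would produce, or a negative
 * value if vsnprintf() reports an error. Nothing is written anywhere. */
static int formatted_length(const char *fmt, ...)
{
    va_list args;
    int len;

    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    return len;
}
```

The returned value is what the wrapper would pass to mvwchgat as the character count.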
In C89, you don't have snprintf. A workaround is to create a temporary file, or if you are in *nix open /dev/null and write something like this:
FILE *throw_away = fopen("/dev/null", "w"); /* On windows should be "NUL" but I haven't tested */
if (throw_away)
{
fprintf(throw_away, "<format goes here>%n", <args go here>, &length);
fclose(throw_away);
} /* else, try opening a temporary file */

You can't know in advance how long your string will be:
printf("abc%d", 0); //4 chars
printf("abc%d", 111111111);//12 chars
All with the same format string.
The only sure way is to sprintf the text in question into a buffer, take strlen(buffer) of the result, and print it to the screen with printf("%s", buffer);.
This solution avoids double formatting at the cost of allocating a long enough buffer.
