Implementing scanf using read() - c

I want to implement scanf function using read system call, but I have a problem - when you use read you have to specify how many bits it will read, and I am not sure what number to use. I have for example implemented flag %b for binary numbers, %x for hex etc, should I just use a constant amount of bits like 256?

You will probably have to implement a buffering layer. When you call read(), you get a number of bytes from the stream. From the point of view of the system, those bytes are now gone and will not be returned by a subsequent call to read(). The problem is, if you have a format specifier like %d, you can't know when you are done reading the number until you actually read the next non-digit byte. You'll have to store that already-read input somewhere so you can use it later, for example for a following %s.

You'll basically have to do your own buffering, since any number you choose may be too large in some cases (unless you choose something ridiculously small) and too small in others (for example, lots of whitespace), so there is no right number.

read(), without a layer of buffering code, will not work to implement scanf() as scanf() needs the ability to "unget" characters back to stdin.
Instead, scanf() could be implemented with fgetc()/unget(), but it is a great deal to implement a complete scanf(), no matter what route is taken.

Related

fgetc vs getline or fgets - which is most flexible

I am reading data from a regular file and I was wondering which would allow for the most flexibility.
I have found that both fgets and getline both read in a line (one with a maximum number of characters, the other with dynamic memory allocation). In the case of fgets, if the length of the line is bigger than the given size, the rest of the line would not be read but remain buffered in the stream. With getline, I am worried that it may attempt to assign a large block of memory for an obscenely long line.
The obvious solution for me seems to be turning to fgetc, but this comes with the problem that there will be many calls to the function, thereby resulting in the read process being slow.
Is this compromise in either case between flexibility and efficiency unavoidable, or can it worked through?
The three functions you mention do different things:
fgetc() reads a single character from a FILE * descriptor, it buffers input and so, you can process the file in a buffered way without having the overhelm of making a system call for each character. when your problem can be handled in a character oriented way, it is the best.
fgets() reads a single line from a FILE * descriptor, it's like calling fgetc() to fill the character array you pass to it in order to read line by line. It has the drawback of making a partial read in case your input line is longer than the buffer size you specify. This function buffers also input data, so it is very efficient. If you know that your lines will be bounded, this is the best to read your data line by line. Sometimes you want to be able to process data in an unbounded line size way, and you must redesign your problem to use the available memory. Then the one below is probably better election.
getline() this function is relatively new, and is not ANSI-C, so it is possible you port your program to some architecture that lacks it. It's the most flexible, at the price of being the less efficient. It requires a reference to a pointer that is realloc()ated to fill more and more data. It doesn't bind the line length at the cost of being possible to fill all the memory available on a system. Both, the buffer pointer and the size of the buffer are passed by reference to allow them to be updated, so you know where the new string is located and the new size. It mus be free()d after use.
The reason of having three and not only one function is that you have different needs for different cases and selecting the mos efficient one is normally the best selection.
If you plan to use only one, probably you'll end in a situation where using the function you selected as the most flexible will not be the best election and you will probably fail.
Much is case dependent.
getline() is not part of the standard C library. Its functionality may differ - depends on the implementation and what other standards it follows - thus an advantage for the standard fgetc()/fgets().
... case between flexibility and efficiency unavoidable, ...
OP is missing the higher priorities.
Functionality - If code cannot function right with the selected function, why use it? Example: fgets() and reading null characters create issues.
Clarity - without clarity, feel the wrath of the poor soul who later has to maintain the code.
would allow for the most flexibility. (?)
fgetc() allows for the most flexibility at the low level - yet helper functions using it to read lines tend to fail corner cases.
fgets() allows for the most flexibility at mid level - still have to deal with long lines and those with embedded null characters, but at least the low level of slogging in the weeds is avoided.
getline() useful when high portability not needed and risks of allowing the user to overwhelm resources is not a concern.
For robust handing of user/file input to read a line, create a wrapping function (e.g. int my_read_line(size_t buf, char *buf, FILE *f)) and call that and only that in user code. Then when issues arise, they can be handled locally, regardless of the low level input function selected.

How to use sscanf correctly and safely

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers
for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If the erroneous input is supplied, the program must ask the user to input again until
the correct input is given, but the previous correct input must be kept (only incorrect
input must be cleared from the input stream).
Everything must conform to C99 standard.
The scanf family of function cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors ?
Well, let sscanf accept all inputs as %s (i.e. strings) and then program analyze them
If you must use scanf to accept the input, I think you start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf tells you how many format codes it parsed successfully. So,
int matches = scanf("%d", &array[i]);
if (matches == 0) {
/* no integer in the input stream */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input12a3. The first time through the loop, scanf would parse '12as an integer 12, leaving the remaininga3` for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than maxint, eg, "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.
The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.
scanf("%s", string) into long int_string = strtol(string, &end_pointer, base:10)

good way to read text file in C

I need to read a text file which may contain long lines of text. I am thinking of the best way to do this. Considering efficiency, even though I am doing this in C++, I would still choose C library functions to do the IO.
Because I don't know how long a line is, potentially really really long, I don't want to allocate a large array and then use fgets to read a line. On the other hand, I do need to know where each line ends. One use case of such is to count the words/chars in each line. I could allocate a small array and use fgets to read, and then determine whether there is \r, \n, or \r\n appearing in the line to tell whether a full line has been read. But this involves a lot of strstr calls (for \r\n, or there are better ways? for example from the return value of fgets?). I could also do fgetc to read each individual char one at a time. But does this function have buffering?
Please suggest compare these or other different ways of doing this task.
The correct way to do I/O depends on what you're going to do with the data. If you're counting words, line-based input doesn't make much sense. A more natural approach is to use fgetc and deal with a character at a time and let stdio worry about the buffering. Only if you need the whole line in memory at the same time to process it should you actually allocate a buffer big enough to contain it all.

How can I get how many bytes sscanf_s read in its last operation?

I wrote up a quick memory reader class that emulates the same functions as fread and fscanf.
Basically, I used memcpy and increased an internal pointer to read the data like fread, but I have a fscanf_s call. I used sscanf_s, except that doesn't tell me how many bytes it read out of the data.
Is there a way to tell how many bytes sscanf_s read in the last operation in order to increase the internal pointer of the string reader? Thanks!
EDIT:
And example format I am reading is:
|172|44|40|128|32|28|
fscanf reads that fine, so does sscanf. The only reason is that, if it were to be:
|0|0|0|0|0|0|
The length would be different. What I'm wondering is how fscanf knows where to put the file pointer, but sscanf doesn't.
With scanf and family, use %n in the format string. It won't read anything in, but it will cause the number of characters read so far (by this call) to be stored in the corresponding parameter (expects an int*).
Maybe I´m silly, but I´m going to try anyway. It seems from the comment threads that there's still some misconception. You need to know the amount of bytes. But the method returns only the amount of fields read, or EOF.
To get to the amount of bytes, either use something that you can easily count, or use a size specifier in the format string. Otherwise, you won't stand a chance finding out how many bytes are read, other then going over the fields one by one. Also, what you may mean is that
sscanf_s(source, "%d%d"...)
will succeed on both inputs "123 456" and "10\t30", which has a different length. In these cases, there's no way to tell the size, unless you convert it back. So: use a fixed size field, or be left in oblivion....
Important note: remember that when using %c it's the only way to include the field separators (newline, tab and space) in the output. All others will skip the field boundaries, making it harder to find the right amount of bytes.
EDIT:
From "C++ The Complete Reference" I just read that:
%n Receives an integer value equal to
the nubmer of characters read so far
Isn't that precisely what you were after? Just add it in the format string. This is confirmed here, but I haven't tested it with sscanf_s.
From MSDN:
sscanf_s, _sscanf_s_l, swscanf_s, _swscanf_s_l
Each of these functions returns the number of fields successfully converted and assigned; the return value does not include fields that were read but not assigned. A return value of 0 indicates that no fields were assigned. The return value is EOF for an error or if the end of the string is reached before the first conversion.

will 'printf' always do its job?

printf("/*something else*/"); /*note that:without using \n in printf*/
I know printf() uses a buffer which prints whatever it contains when, in the line buffer, "\n" is seen by the buffer function. So when we forget to use "\n" in printf(), rarely, line buffer will not be emptied. Therefore, printf() wont do its job. Am I wrong?
The example you gave above is safe as there are no variable arguments to printf. However it is possible to specify a format string and supply variables that do not match up with the format, which can deliver unexpected (and unsafe) results. Some compilers are taking a more proactive approach with printf use case analysis, but even then one should be very, very careful when printf is used.
From my man page:
These functions return the number of characters printed (not including
the trailing \0 used to end output to strings) or a negative value
if an output error occurs, except for snprintf() and vsnprintf(), which
return the number of characters that would have been printed if the n
were unlimited (again, not including the final \0).
So, it sounds like the can fail with a negative error.
Yes, output to stdout in C (using printf) is normally line buffered. This means that printf() will collect output until either:
the buffer is full, or
the output contains a \n newline
If you want to force the output of the buffer, call fflush(stdout). This will work even if you have printed something without a newline.
Also printf and friends can fail.
Common implementations of C call malloc() in the printf family of the stdC library.
malloc can fail, so then will printf. In UNIX the write() call can be interrupted by EINTR, so context switching in UNIX will trigger faults (EINTR). Windows can and will do similar things.
And... Although you do not see it posted here often you should always check the return code from any system or library function that returns a value.
Like that, no. It won't always work as you expect, especially if you're using user input as the format string. If the first argument has %s or %d or other format specifiers in it, they will be parsed and replaced with values from the stack, which can easily break if it's expecting a pointer and gets an int instead.
This way tends to be a lot safer:
printf("%s", "....");
The output buffer will be flushed before exit, or before you get input, so the data will make it regardless of whether you send a \n.
printf could fail for any number of reasons. If you're deep in recursion, calling printf may blow your stack. The C and C++ standards have little to say on threading issues and calling printf while printf is executing in another thread may fail. It could fail because stdout is attached to a file and you just filled your filesystem, in which case the return value tells you there was a problem. If you call printf with a string that isn't zero terminated then bad things could happen. And printf can apparently fail if you're using buffered I/O and your buffer hasn't been flushed yet.

Resources