Being new to C, I just came across the C11 addition getenv_s. Here is what I'm actually trying to do:
Handling POST data sent by html form in CGI C
I'm trying to sanitize both CONTENT_LENGTH and the message body (stdin) in my case. That is the objective here.
So, in order to limit the upper bound (against a malformed CONTENT_LENGTH that tries to cause an overflow), I tried using an array instead of a pointer, like this:
char some[512];
some = getenv("CONTENT_LENGTH");
It naturally threw an error (incompatible types when assigning to type char[512] from type char *). So I assume,
Q1. getenv is already a string?
Then I came across "getenv_s"
http://en.cppreference.com/w/c/program/getenv
Q2. Can anyone tell me a safe-as-rocksolid way of using this? To avoid underflow, overflow, etc.
First off, do not use any of the _s functions. They are an optional feature of C11 which, to my knowledge, has never been fully implemented by anyone, not even Microsoft, which invented them, and it has been proposed to remove them again; even more importantly, they do not actually solve the problems they purport to address. (The intention was to have a bunch of drop-in replacements for dangerous string-related functions, but it turns out that that doesn't work; fixing string-related security bugs in C programs requires actual redesign with thought put into it. The functions that genuinely could not be used safely already had portable replacements, e.g. fgets instead of gets, snprintf instead of sprintf, strsep instead of strtok -- sometimes the replacement is not in ISO C but it's usually widespread enough not to worry about, or you can get a shim implementation from gnulib.)
getenv is guaranteed to return a valid NUL-terminated C string (or a null pointer), but the string could be arbitrarily long. In the context of a CGI program written in C, the correct way to "sanitize" the value of the CONTENT_LENGTH environment variable is to feed it to strtol and carefully check for errors:
#include <errno.h>
#include <stdlib.h>

/* Returns a valid content_length, or -1 on error. */
long get_content_length(void)
{
    char *content_length, *endp;
    long rv;
    content_length = getenv("CONTENT_LENGTH");
    if (!content_length) return -1;
    errno = 0;
    rv = strtol(content_length, &endp, 10);
    if (endp == content_length || *endp || errno || rv <= 0)
        return -1;
    return rv;
}
Each of the four clauses in the if statement after the strtol call checks for a different class of ill-formed input. You have to clear errno explicitly before the call, because the value strtol returns when it reports an overflow is also a value it can return when there was no overflow, so the only way to distinguish is to look at errno, but errno could have a stale nonzero value from some earlier operation.
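To make the four checks easier to see one at a time, here is a sketch of the same logic with each clause split out. The name parse_content_length and the idea of passing the string as a parameter are assumptions made for illustration (it lets the function be tried without a CGI environment); the checks themselves are the ones from the function above.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: same checks as get_content_length(), but the
   string is a parameter. Returns the parsed length, or -1 on error. */
long parse_content_length(const char *s)
{
    char *endp;
    long rv;

    if (!s) return -1;          /* variable not set */
    errno = 0;                  /* clear any stale error first */
    rv = strtol(s, &endp, 10);
    if (endp == s) return -1;   /* no digits at all, e.g. "" or "abc" */
    if (*endp) return -1;       /* trailing junk, e.g. "12abc" */
    if (errno) return -1;       /* out of range for long (ERANGE) */
    if (rv <= 0) return -1;     /* zero or negative length */
    return rv;
}
```

With this, "42" yields 42, while "", "12abc", "-5", "0" and a 26-digit number all yield -1.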
Note that even if CONTENT_LENGTH is syntactically valid, it might not be trustworthy. That is, the actual amount of POST data available to you might be either less or more than CONTENT_LENGTH. Make sure to pay attention to the numbers returned by read as well. (This is an example of how swapping out string functions for "hardened" ones doesn't solve all your problems.)
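As a rough sketch of "paying attention to the numbers returned by read", the loop below reads at most `want` bytes and reports how many actually arrived. The name read_body is hypothetical, and the sketch assumes the caller has already capped `want` to the size of its own buffer; read(2) is POSIX, not ISO C.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads up to `want` bytes from fd into buf, retrying on short reads.
   Returns the number of bytes actually read (possibly fewer than
   `want` if the client sent less than it promised), or -1 on error. */
long read_body(int fd, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: just retry */
            return -1;
        }
        if (n == 0) break;   /* EOF before CONTENT_LENGTH bytes arrived */
        got += (size_t)n;
    }
    return (long)got;
}
```

A caller would compare the return value against the parsed CONTENT_LENGTH and treat a mismatch as a malformed request. Because the loop never asks for more than `want` bytes, extra data beyond CONTENT_LENGTH is simply left unread.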
I ran a code analysis on my embedded C code with SonarQube and the sonar.cxx plugin.
I also have SonarQube parse the XML generated by the Rough Auditing Tool for Security (RATS), and I get this error:
This function does not properly handle non-NULL terminated strings. This does not result in exploitable code, but can lead to access violations.
This is the code that generates the above error:
if( (machineMarket == NULL) || (strlen(machineMarket) > VALUE_MARKET_MAX_LEN) )
What is the best practice for handling a string that is not NUL-terminated?
The auditing tool is warning that the call to strlen will keep reading bytes until it finds a zero byte. If the contents of machineMarket do not contain a zero, it is possible that strlen will keep reading right off the end of legal memory and cause an access violation.
You say you are declaring the variable like this
char machineMarket[VALUE_MARKET_MAX_LEN + 1];
So you can either use the strnlen function to ensure you never read too far, or use @Zan Lynx's method of forcibly inserting a 0 at the end.
With either method, you'll probably need to handle the case where the original string is/was not terminated.
The way that I handle it is whenever I get a string from outside my module, from a network read or a call into my library, I set a 0 on the end of it. Now, no matter what, it is a valid C string.
So if my library function accepts int func(char *output, size_t output_len) then right up front, before I use it for anything, I always validate with if (!output || !output_len) return -1; and then output[output_len-1] = 0;
Then even if they passed me complete garbage, it is at least a valid string.
If the contiguous block of memory that you own starting from machineMarket does not have a \0 then the behaviour of your code is undefined.
Use strnlen instead, passing a limit on the order of VALUE_MARKET_MAX_LEN + 1 as the second parameter, and adjust your > comparison accordingly.
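A minimal sketch of the strnlen variant, assuming the buffer is declared as char machineMarket[VALUE_MARKET_MAX_LEN + 1] as above. The value 8 for VALUE_MARKET_MAX_LEN and the helper name market_is_valid are made up for illustration. Note that strnlen is POSIX (2008), not ISO C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VALUE_MARKET_MAX_LEN 8   /* assumed value, for illustration only */

/* Returns true if buf (VALUE_MARKET_MAX_LEN + 1 bytes long) holds a
   NUL-terminated string of at most VALUE_MARKET_MAX_LEN characters. */
bool market_is_valid(const char *buf)
{
    if (buf == NULL)
        return false;
    /* strnlen never reads more than VALUE_MARKET_MAX_LEN + 1 bytes and
       returns that limit if no terminator was found within it. */
    return strnlen(buf, VALUE_MARKET_MAX_LEN + 1) <= VALUE_MARKET_MAX_LEN;
}
```

An unterminated buffer makes strnlen return VALUE_MARKET_MAX_LEN + 1, so the same comparison rejects both "too long" and "not terminated" without reading past the array.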
The manual says that
Upon successful return, these functions [printf, dprintf etc.] return the number of characters printed.
The manual does not say whether this number may be less than the length of the final string (after substitutions and formatting) while still being nonnegative. Nor does it say how to check that the string was completely written, or how to make sure that it is.
The dprintf function operates on a file descriptor, similarly to the write function, for which the manual does mention that
On success, the number of bytes written is returned (zero indicates nothing was written). It is not an error if this number is smaller than the number of bytes requested;
So if I want to write a string completely, I have to enclose the n = write() in a while loop. Do I have to do the same for dprintf or printf?
My understanding of the documentation is that dprintf would either fail or output all the output. But I agree that it is some gray area (and I might not understand well); I'm guessing that a partial output is some kind of failure (so returns a negative size).
Here is the implementation of musl-libc:
In stdio/dprintf.c the dprintf function just calls vdprintf
But in stdio/vdprintf.c you just have:
static size_t wrap_write(FILE *f, const unsigned char *buf, size_t len)
{
    return __stdio_write(f, buf, len);
}
int vdprintf(int fd, const char *restrict fmt, va_list ap)
{
    FILE f = {
        .fd = fd, .lbf = EOF, .write = wrap_write,
        .buf = (void *)fmt, .buf_size = 0,
        .lock = -1
    };
    return vfprintf(&f, fmt, ap);
}
So dprintf is returning a size like vfprintf (and fprintf....) does.
However, if you really are concerned, you'd better use snprintf or asprintf to output into some memory buffer, and explicitly use write(2) on that buffer.
Look into stdio/__stdio_write.c the implementation of __stdio_write (it uses writev(2) with a vector of two data chunks in a loop).
In other words, I would often not really care; but if you really need to be sure that every byte has been written as you expect it (for example if the file descriptor is some HTTP socket), I would suggest to buffer explicitly (e.g. by calling snprintf and/or asprintf) yourself, then use your explicit write(2).
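The "buffer explicitly, then write(2)" approach could look roughly like the sketch below. The names write_all and checked_dprintf, and the 1024-byte limit, are assumptions made for this example and not part of any library.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Writes all len bytes to fd, looping over short writes.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical dprintf replacement: formats into a fixed local buffer,
   then writes it out completely. Returns the number of bytes written,
   or -1 on formatting error, truncation, or write failure. */
int checked_dprintf(int fd, const char *fmt, ...)
{
    char buf[1024];               /* assumed upper bound on message size */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    return write_all(fd, buf, (size_t)n) == 0 ? n : -1;
}
```

The design choice here is that a nonnegative return value always means the whole formatted string reached the kernel, which is exactly the guarantee the question asks about.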
PS. You might check yourself the source code of your particular C standard library providing dprintf; for GNU glibc see notably libio/iovdprintf.c
With stdio, returning the number of partially written bytes doesn't make much sense because stdio functions work with a (more or less) global buffer whose state is unknown to you and gets dragged in from previous calls.
If stdio functions allowed you to work with that, the error return values would need to be more complex as they would not only need to communicate how many characters were or were not outputted, but also whether the failure was before your last input somewhere in the buffer, or in the middle of your last input and if so, how much of the last input got buffered.
The d-functions could theoretically give you the number of partially written characters easily, but POSIX specifies that they should mirror the stdio functions, and so they only give you an otherwise unspecified negative value on error.
If you need more control, you can use the lower level functions.
Concerning printf(), it is quite clear.
The printf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred. C11dr §7.21.6.3 3
A negative value is returned if an error occurred. In that case 0 or more characters may have printed. The count is unknowable via the standard library.
If the value returned is not negative, that is the number sent to stdout.
Since stdout is often buffered, that may not be the number received at the output device on the conclusion of printf(). Follow printf() with an fflush(stdout):
int r1 = printf(....);
int r2 = fflush(stdout);
if (r1 < 0 || r2 != 0) Handle_Failure();
For finest control, "print" to a buffer and use putchar() or various non-standard functions.
My bet is no. (After looking into the - obfuscated - source of printf.) So any nonnegative return value means that printf was fully successful (it reached the end of the format string, and everything was passed to kernel buffers).
But some (authentic) people should confirm it.
So I was going through K&R second edition doing the exercises. Feeling pretty confident after doing few exercises I thought I'd check the actual implementations of these functions. It was then my confidence fled the scene. I could not understand any of it.
For example I check the getchar():
Here is the prototype in libio/stdio.h
extern int getchar (void);
So I followed it through and got this:
__STDIO_INLINE int
getchar (void)
{
    return _IO_getc (stdin);
}
Again I follow it to the libio/getc.c:
int
_IO_getc (fp)
    FILE *fp;
{
    int result;
    CHECK_FILE (fp, EOF);
    _IO_acquire_lock (fp);
    result = _IO_getc_unlocked (fp);
    _IO_release_lock (fp);
    return result;
}
And I'm taken to another header file libio/libio.h, which is pretty cryptic:
#define _IO_getc_unlocked(_fp) \
(_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
Which is where I finally ended my journey.
My question is pretty broad. What does all this mean? I could not for the life of me figure out anything logical by looking at the code. It looks like a bunch of code abstracted away layer after layer.
More importantly, when does it really get the character from stdin?
_IO_getc_unlocked is an inlinable macro. The idea is that you can get a character from the stream without having to call a function, making it hopefully fast enough to use in tight loops, etc.
Let's take it apart one layer at a time. First, what is _IO_BE?
/usr/include/libio.h:# define _IO_BE(expr, res) __builtin_expect ((expr), res)
_IO_BE is a hint to the compiler, that expr will usually evaluate to res. It's used to structure code flow to be faster when the expectation is true, but has no other semantic effect. So we can get rid of that, leaving us with:
#define _IO_getc_unlocked(_fp) \
    ( ( (_fp)->_IO_read_ptr >= (_fp)->_IO_read_end ) \
      ? __uflow(_fp) : *(unsigned char *)(_fp)->_IO_read_ptr++ )
Let's turn this into an inline function for clarity:
inline int _IO_getc_unlocked(FILE *fp) {
    if (fp->_IO_read_ptr >= fp->_IO_read_end)
        return __uflow(fp);
    else
        return *(unsigned char *)(fp->_IO_read_ptr++);
}
In short, we have a pointer into a buffer, and a pointer to the end of the buffer. We check if the pointer is outside the buffer; if not, we increment it and return whatever character was at the old value. Otherwise we call __uflow to refill the buffer and return the newly read character.
As such, this allows us to avoid the overhead of a function call until we actually need to do IO to refill the input buffer.
Keep in mind that standard library functions can be complicated like this; they can also use extensions to the C language (such as __builtin_expect) that are NOT standard and may NOT work on all compilers. They do this because they need to be fast, and because they can make assumptions about what compiler they're using. Generally speaking your own code should not use such extensions unless absolutely necessary, as it'll make porting to other platforms more difficult.
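To see the same pattern without glibc's internals, here is a toy version. Everything in it (struct toy_stream, toy_uflow, toy_getc, toy_open) is invented for this sketch and is not glibc's actual data structure; the "file" is just a string, and the buffer is deliberately tiny so that refills happen often.

```c
#include <stddef.h>
#include <string.h>

/* Toy stream: NOT glibc's FILE, just the same idea in miniature. */
struct toy_stream {
    const char *src;          /* stands in for the underlying file */
    size_t src_left;          /* bytes not yet "read" from src */
    unsigned char buf[4];     /* deliberately tiny buffer */
    unsigned char *read_ptr;  /* next unread byte in buf */
    unsigned char *read_end;  /* one past the last valid byte in buf */
};

/* Slow path: refill the buffer and return its first byte, or -1 at end. */
static int toy_uflow(struct toy_stream *s)
{
    size_t n = s->src_left < sizeof s->buf ? s->src_left : sizeof s->buf;
    if (n == 0)
        return -1;            /* plays the role of EOF */
    memcpy(s->buf, s->src, n);
    s->src += n;
    s->src_left -= n;
    s->read_ptr = s->buf;
    s->read_end = s->buf + n;
    return *s->read_ptr++;
}

/* Fast path: a pointer comparison and an increment, no function call. */
#define toy_getc(s) \
    ((s)->read_ptr >= (s)->read_end ? toy_uflow(s) : *(s)->read_ptr++)

void toy_open(struct toy_stream *s, const char *text)
{
    s->src = text;
    s->src_left = strlen(text);
    s->read_ptr = s->read_end = s->buf;   /* empty: first getc refills */
}
```

Reading "hello!" through toy_getc calls toy_uflow only twice for data (once per 4-byte refill) plus once more to discover the end; the other characters come straight out of the buffer.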
Going from pseudo-code to real code we can break it down:
if (there is a character in the buffer)
return (that character)
else
call a function to refill the buffer and return the first character
end
Let's use the ?: operator:
#define getc(f) (is_there_buffered_stuff(f) ? *pointer++ : refill())
A bit closer:
#define getc(f) (is_there_buffered_stuff(f) ? *f->pointer++ : refill(f))
Now we are almost there. To determine if there is something buffered already, it uses the file structure pointer and a read pointer within the buffer
_fp->_IO_read_ptr >= _fp->_IO_read_end ?
This actually tests the opposite condition to my pseudo-code, "is the buffer empty", and if so, it calls __uflow(_fp) // "underflow", otherwise, it just reaches directly into the buffer with a pointer, gets the character, and then increments the pointer:
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
I can highly recommend The Standard C Library by P.J. Plauger. He provides background on the standard and provides an implementation of every function. The implementation is simpler than what you'll see in glibc or a modern C compiler, but does still make use of macros like the _IO_getc_unlocked() you posted.
The macro is going to pull a character from buffered data (which could be the ungetc buffer) or read it from the stream (which may read and buffer multiple bytes).
The reason there is a standard library is that you should not need to know the exact implementation details of these functions. The code that implements the library calls has to use nonstandard system calls at some point, and it has to deal with issues you may not be concerned with. If you are learning C, make sure you can understand other C programs before tackling the stdlib. Once you are a little more advanced, look at the stdlib, but it still won't make a lot of sense until you understand the system calls involved.
The definition of getchar() redefines the request as a specific request for a character from stdin.
The definition of _IO_getc() runs a sanity check on the FILE* (CHECK_FILE makes the function return EOF if the check fails), then locks the stream so that other threads cannot interfere with the call to _IO_getc_unlocked().
The macro definition of _IO_getc_unlocked() simply checks whether the read pointer is at or past the end of the buffered data, and either calls __uflow if it is, or returns the char at the read pointer (and advances the pointer) if it is not.
This is standard stuff for all stdlib implementations. You are not supposed to ever look at it. In fact, many stdlib implementations will use assembly language for optimal processing, which is even more cryptic.
I have an application which prints strings to a buffer using snprintf and vsnprintf. Currently, if it detects an overflow, it appends a > to the end of the string as a sign that the string was chopped and prints a warning to stderr. I'm trying to find a way to have it resume the string [from where it left off] in another buffer.
If this was using strncpy, it would be easy; I know how many bytes were written, and so I can start the next print from *(p+bytes_written). However, with printf, I have two problems: first, the format specifiers may take up more or less space in the final string than in the format string, and second, my va_list may be partially parsed.
Does anyone have an easy-ish solution to this?
EDIT: I should probably clarify that I'm working on an embedded system with limited memory and no dynamic allocation [i.e., I don't want to use dynamic allocation]. I can print messages of 255 bytes, but no more, although I can print as many of those as I want. I don't, however, have room for large allocations on the stack, and my print function needs to be thread-safe, so I can't allocate just one global / static array.
I don't think you can do what you're looking for (other than by the straightforward way of reallocating the buffer to the necessary size and performing the entire operation again).
The reasons you listed are a couple of contributors to this, but the real killer is that the formatter might have been in the middle of formatting an argument when it ran out of space, and there's no reasonable way to restart that.
For example, say there are 3 bytes left in the buffer, and the formatter starts working on a "%d" conversion for the value -1234567. It'll put "-1\0" into the buffer, then do whatever else it needs to do to return the size of buffer you really need.
In addition to you being able to determine which specifier the formatter was working on, you'd need to be able to figure out that instead of passing in -1234567 on the second round you need to pass in 234567. I defy you to come up with a reasonable way to do that.
Now if there's a real reason you don't want to restart the operation from the top, you probably could wrap the snprintf()/vsnprintf() call with something that breaks down the format string, sending only a single conversion specifier at a time and concatenating that result to the output buffer. You'd have to come up with some way for the wrapper to keep some state across retries so it knows which conversion spec to pick up from.
So maybe it's doable in a sense, but it sure seems like it would be an awful lot of work to avoid the much simpler 'full retry' scheme. I could see maybe (maybe) trying this on a system where you don't have the luxury of dynamically allocating a larger buffer (an embedded system, maybe). In that case, I'd probably argue that what's needed is a much simpler/restricted scope formatter that doesn't have all the flexibility of printf() formatters and can handle retrying (because their scope is more limited).
But, man, I would try very hard to talk some sense into whoever said it was a requirement.
Edit:
Actually, I take some of that back. If you're willing to use a customized version of snprintf() (let's call it snprintf_ex()) I could see this being a relatively simple operation:
int snprintf_ex( char* s, size_t n, size_t skipChars, const char* fmt, ...);
snprintf_ex() (and its companion functions such as vsnprintf_ex()) will format the string into the provided buffer (as usual) but will skip outputting the first skipChars characters.
You could probably rig this up pretty easily using the source from your compiler's library (or using something like Holger Weiss' snprintf()) as a starting point. Using this might look something like:
int bufSize = sizeof(buf);
char* fmt = "some complex format string...";
int needed = snprintf_ex( buf, bufSize, 0, fmt, arg1, arg2, etc, etc2);
if (needed >= bufSize) {
    // dang truncation...
    // do whatever you want with the truncated bits (send to a logger or whatever)
    // format the rest of the string, skipping the bits we already got
    needed = snprintf_ex( buf, bufSize, bufSize - 1, fmt, arg1, arg2, etc, etc2);
    // now the buffer contains the part that was truncated before. Note that
    // you'd still need to deal with the possibility that this is truncated yet
    // again - that's an exercise for the reader, and it's probably trickier to
    // deal with properly than it might sound...
}
One drawback (that might or might not be acceptable) is that the formatter will do all the formatting work over again from the start - it'll just throw away the first skipChars characters that it comes up with. If I had to use something like this, I'd think that would almost certainly be an acceptable thing (it's what happens when someone deals with truncation using the standard snprintf() family of functions).
The C99 functions snprintf() and vsnprintf() both return the number of characters needed to print the whole format string with all the arguments.
If your implementation conforms to C99, you can create an array large enough for your output strings then deal with them as needed.
int chars_needed = snprintf(NULL, 0, fmt_string, v1, v2, v3, ...);
char *buf = malloc(chars_needed + 1);
if (buf) {
    snprintf(buf, chars_needed + 1, fmt_string, v1, v2, v3, ...);
    /* use buf */
    free(buf);
} else {
    /* no memory */
}
If you're on a POSIX-ish system (which I'm guessing you may be since you mentioned threads), one nice solution would be:
First try printing the string to a single buffer with snprintf. If it doesn't overflow, you've saved yourself a lot of work.
If that doesn't work, create a new thread and a pipe (with the pipe() function), fdopen the writing end of the pipe, and use vfprintf to write the string. Have the new thread read from the reading end of the pipe and break the output string into 255-byte messages. Close the pipe and join with the thread after vfprintf returns.
In the simplest way possible, how can I check if an integer initialized from function scanf is a number?
http://www.cplusplus.com/reference/clibrary/cstdio/scanf/
On success, [scanf] returns the number of items successfully read. This count can match the expected number of readings or fewer, even zero, if a matching failure happens. In the case of an input failure before any data could be successfully read, EOF is returned.
So you could do something like this:
#include <stdio.h>
int main()
{
    int v;
    if (scanf("%d", &v) == 1) {
        printf("OK\n");
    } else {
        printf("Not an integer.\n");
    }
    return 0;
}
But it is suggested that you use fgets and strtol instead.
Your question is weirdly worded. An initialized integer is always a number (aside from exotic cases of trap representations), which means that there's no need to check anything.
I would guess that you need to check whether the given string is a valid representation of a number. For that, you first need to define what the valid representation should look like. Do you allow sign? Is a redundant + allowed (as a sign)? What about 0x prefix for hexadecimals? And so on.
C language offers its own set of rules that define the language's idea of a valid string representation of an integer. If that's what you need, then in order to verify whether the given string satisfies these rules, the best way is to use string-to-integer conversion function like strtol (and other functions from strto... group) and then check for the error condition.
Beware of the answers that suggest writing your own function that would verify these rules. This just doesn't make any sense, since the standard function exists already. Also, strictly speaking, in real-life programming there's rarely a need to perform verification without the actual conversion. strto... will do both.
Also, stay away from functions from the scanf group to perform string-to-integer conversion. These functions produce undefined behavior on overflow (i.e. if the number in the input string does not fit in the target type). In the general case, the only proper way to do such a conversion in C is with functions from the strto... group.
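As a sketch of the strto... approach for a line obtained with fgets, something like the following could be used. The helper name parse_int and its return convention are assumptions for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a whole line as a decimal int.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *line, int *out)
{
    char *endp;
    long v;

    errno = 0;
    v = strtol(line, &endp, 10);
    if (endp == line)
        return 0;                   /* no digits */
    if (*endp == '\n')
        endp++;                     /* allow the newline fgets keeps */
    if (*endp != '\0')
        return 0;                   /* trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                   /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

Typical use would be char line[64]; followed by if (fgets(line, sizeof line, stdin) && parse_int(line, &v)). Unlike scanf("%d", ...), an out-of-range input is reported as a failure instead of invoking undefined behavior.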