Just as other C library functions like strcpy and strcat have versions that limit the size of the string (strncpy, etc.), I am wondering: why is there no such variant for strchr?
It does exist -- it is called memchr:
http://en.wikipedia.org/wiki/C_string_handling
In C, the term "string" usually means "null terminated array of characters", and the str* functions operate on those kinds of strings. The n in the functions you mention is mostly for the sake of controlling the output.
If you want to operate on an arbitrary byte sequence without any implied termination semantics, use the mem* family of functions; in your case memchr should serve your needs.
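For illustration, a bounded search with memchr might look like this (a minimal sketch; the buffer and its contents are just for demonstration):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Deliberately not null-terminated: memchr does not care,
       but strchr would happily read past the end. */
    char buf[8] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };

    /* Search at most sizeof(buf) bytes for 'e'. */
    char *p = memchr(buf, 'e', sizeof(buf));
    if (p != NULL)
        printf("found at offset %d\n", (int)(p - buf));
    return 0;
}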
strnchr and also strnlen are defined in some Linux environments, for example https://manpages.debian.org/experimental/linux-manual-4.11/strnchr.9.en.html. Such a function really is necessary: a program may crash by running off the end of a memory area if strlen or strcmp never finds a \0 terminator. Unfortunately, such things are often not standardized, or are standardized too late and in too complicated a form. strnlen_s exists in C11, but strnchr_s is not available.
You may find some more information about such problems on my web page: https://www.vishia.org/emcdocu/html/portability_emC.html. I have defined some C functions, strnchr_emC... etc., which deliver the required functionality. To achieve compatibility you can define
#define strnchr strnchr_emC
in a common but platform-specific header. Refer to the further content at https://www.vishia.org/emc/. You can find the sources at https://github.com/JzHartmut
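For readers who do not have such a header available, a minimal portable sketch of a bounded strchr could look like this (my own illustration, not code from the pages above):

#include <stddef.h>

/* Search at most n bytes of s for c, stopping early at a '\0' terminator.
   Returns NULL if c is not found within the limit. */
char *strnchr_sketch(const char *s, int c, size_t n)
{
    size_t i;
    for (i = 0; i < n && s[i] != '\0'; ++i) {
        if (s[i] == (char)c)
            return (char *)&s[i];
    }
    return NULL;
}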
When I try to use the strcpy function, Visual Studio gives me an error:
error C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
After searching online and reading many answers on Stack Overflow, the summary is that strcpy_s is safer than strcpy when copying a large string into a shorter one.
So, I tried the following code for copying into a shorter string:
char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(a), a);
printf("String = %s", b);
The code compiles successfully. However, there is still a runtime error.
So, how is strcpy_s safe?
Am I understanding the safety concept wrong?
Why is strcpy_s() "safer"? Well, it's actually quite involved. (Note that this answer ignores any specific code issues in the posted code.)
First, when MSVC tells you standard functions such as strcpy() are "deprecated", at best Microsoft is being incomplete. At worst, Microsoft is downright lying to you. Ascribe whatever motivation you want to Microsoft here, but strcpy() and a host of other functions that MSVC calls "deprecated" are standard C functions and they are most certainly NOT deprecated by anyone other than Microsoft. So when MSVC warns you that a function required to be implemented in any conforming C compiler (most of which then flow by requirement into C++...) is "deprecated", it omits the "by Microsoft" part.
The "safer" functions that Microsoft is "helpfully" suggesting you use - such as strcpy_s() - would be standard, as they are part of the optional Annex K of the C standard, had Microsoft implemented them per the standard.
Per N1967 - Field Experience With Annex K — Bounds Checking Interfaces
Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.
Outside of a few specific cases (of which strcpy() is one), whether Microsoft's version of Annex K's "safer" bounds-checking functions are safer is debatable. Per N1967 (bolding mine):
Suggested Technical Corrigendum
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
Note, however, that in the case of strcpy(), strcpy_s() is actually more akin to strncpy(): strcpy() is just a bog-standard C string function that doesn't do bounds checking, while strncpy() is a perverse function in that it always writes exactly as many bytes as its size argument - it copies data from the source string and then fills the rest of the target buffer with '\0' char values. Unless the source string fills the entire target buffer, in which case strncpy() will NOT terminate it with a '\0' char value.
I'll repeat that: strncpy() does not guarantee a properly terminated copy.
It's hard not to be "safer" than strncpy(). In this case strcpy_s() does not violate the principle of least astonishment like strncpy() does. I'd call that "safer".
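A small demonstration of the strncpy() pitfall (not from the original post):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char dst[4];

    /* The source fills the entire target buffer, so strncpy writes
       no '\0' terminator at all. */
    strncpy(dst, "abcd", sizeof(dst));

    /* dst is NOT a valid C string here; the usual repair is to
       terminate it manually, losing the last character: */
    dst[sizeof(dst) - 1] = '\0';
    printf("%s\n", dst);   /* prints "abc" */
    return 0;
}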
But using strcpy_s() - and all the other "suggested" functions - makes your code de facto non-portable, as Microsoft is the only significant implementation of any form of Annex K's bounds-checking functions.
The header definition for C is:
errno_t strcpy_s(char *dest, rsize_t dest_size, const char *src);
The invocation for your example should be:
#include <string.h>   /* strcpy_s is declared in string.h, not stdlib.h */
#include <stdio.h>

char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(b), a);   /* pass the size of the destination, not the source */
printf("String = %s", b);
strcpy_s needs the size of the destination, which is smaller than the source in your example.
strcpy_s(b, sizeof(b), a);
would be the way to go.
As for the safety concept: many checks are now performed at runtime (null pointers, a zero or too-small destination size), and errors are reported instead of silently corrupting memory.
In your example, had you used strcpy, you would have triggered a buffer overflow. Other functions, like strncpy, would have copied the first 3 characters without any null terminator, which in turn would have triggered a buffer overflow in reading, this time. (strlcpy, by contrast, truncates but always null-terminates its result.)
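For example, the failure can be handled through strcpy_s's return value instead of crashing (a sketch; the exact behavior on failure depends on the installed constraint handler, and on MSVC strcpy_s is declared in <string.h>):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[50] = "void";
    char b[3];

    /* strcpy_s reports the undersized destination instead of silently
       overflowing it; if the constraint handler returns, we get a
       nonzero error code back. */
    errno_t err = strcpy_s(b, sizeof(b), a);
    if (err != 0)
        printf("copy failed with error %d\n", (int)err);
    return 0;
}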
There are many unsafe functions in C, like gets, printf, and others.
When these vulnerabilities are well known, why haven't the functions been modified to make them safe?
In most cases, you can't change a function's signature (i.e. the number and type of the arguments) without breaking existing code.
Vulnerable functions tend to get fixed by introducing alternatives that have additional arguments (such as buffer sizes). These alternative functions are designed to be non-exploitable when used correctly.
Compare, for example, sprintf() and snprintf().
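For example (illustrative):

#include <stdio.h>

int main(void)
{
    char buf[8];

    /* sprintf(buf, "%s", "this is far too long");   -- overflows buf,
       because sprintf has no idea how big buf is */

    /* snprintf takes the buffer size, truncates the output, and returns
       the length that would have been written without truncation. */
    int n = snprintf(buf, sizeof(buf), "%s", "this is far too long");
    printf("wanted %d bytes, got \"%s\"\n", n, buf);
    return 0;
}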
In a Systems Programming class I took last semester, we had to implement a basic client/server in C. When initializing structs like sockaddr_in, or the char buffers we used to send data back and forth between client and server, the professor instructed us to use only bzero and never memset to initialize them. He never explained why, and I'm curious whether there is a valid reason for this?
I see here: http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown that bzero is more efficient due to the fact that it is only ever going to be zeroing memory, so it doesn't have to do any additional checking that memset may do. That still doesn't necessarily seem like a reason to absolutely avoid memset for zeroing memory, though.
bzero is considered deprecated, and furthermore is not a standard C function. According to the manual, memset is preferred over bzero for this reason. So why would anyone still want to use bzero over memset? Just for the efficiency gains, or is it something more? Likewise, what are the benefits of memset over bzero that make it the de facto preferred option for newer programs?
I don't see any reason to prefer bzero over memset.
memset is a standard C function while bzero has never been a standard C function. The rationale is probably that you can achieve exactly the same functionality using the memset function.
Now regarding efficiency: compilers like gcc use builtin implementations of memset which switch to a specialized implementation when a constant 0 is detected. The same goes for glibc when builtins are disabled.
I'm guessing you used (or your teacher was influenced by) UNIX Network Programming by W. Richard Stevens. He uses bzero frequently instead of memset, even in the most up-to-date edition. The book is so popular, I think it's become an idiom in network programming which is why you still see it used.
I would stick with memset simply because bzero is deprecated and reduces portability. I doubt you would see any real gains from using one over the other.
The one advantage that I think bzero() has over memset() for setting memory to zero is that there's a reduced chance of a mistake being made.
More than once I've come across a bug that looked like:
memset(someobject, size_of_object, 0); // clear object
The compiler won't complain (though cranking up the warning levels might produce a diagnostic on some compilers), and the effect will be that the memory isn't cleared. Because this doesn't trash the object - it just leaves it alone - there's a decent chance that the bug might not manifest into anything obvious.
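For contrast, the intended call puts the fill value second and the length third:

memset(someobject, 0, size_of_object); // clear object (correct argument order)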
The fact that bzero() isn't standard is a minor irritant. (FWIW, I wouldn't be surprised if most function calls in my programs are non-standard; in fact writing such functions is kind of my job).
In a comment to another answer here, Aaron Newton cited the following from Unix Network Programming, Volume 1, 3rd Edition by Stevens, et al., Section 1.2 (emphasis added):
bzero is not an ANSI C function. It is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments). Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition in our unp.h header.

Indeed, the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type. (Actually, the second argument is an int and the third argument is size_t, which is typically an unsigned int, but the values specified, 0 and 16, respectively, are still acceptable for the other type of argument.) The call to memset still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0. Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
I also believe that the vast majority of calls to memset() are to zero memory, so why not use an API that is tailored to that use case?
A possible drawback to bzero() is that compilers might be more likely to optimize memset() because it's standard and so they might be written to recognize it. However, keep in mind that correct code is still better than incorrect code that's been optimized. In most cases, using bzero() will not cause a noticeable impact on your program's performance, and bzero() can be a macro or inline function that expands to memset().
I wanted to mention something about the bzero vs. memset argument: install ltrace and then compare what each call does under the hood.
On Linux with libc6 (2.19-0ubuntu6.6), the calls made are exactly the same (via ltrace ./test123):
long m[] = {0}; // generates a call to memset(0x7fffefa28238, '\0', 8)
int* p;
bzero(&p, 4); // generates a call to memset(0x7fffefa28230, '\0', 4)
I've been told that unless I am working in the deep bowels of libc or on some kernel/syscall interface, I don't have to worry about the difference.
All I should worry about is that the call satisfies the requirement of zeroing the buffer. Others have mentioned which one is preferable over the other, so I'll stop here.
You probably shouldn't use bzero, it's not actually standard C, it was a POSIX thing.
And note that word "was" - it was deprecated in POSIX.1-2001 and removed in POSIX.1-2008 in deference to memset so you're better off using the standard C function.
Have it any way you like. :-)
#ifndef bzero
#define bzero(d,n) memset((d),0,(n))
#endif
Note that:
The original bzero returns nothing, while memset returns a void pointer (d). This can be fixed by adding a cast to void in the definition (see the variant after these notes).
#ifndef bzero does not prevent you from hiding the original function even if it exists. It tests the existence of a macro. This may cause lots of confusion.
It’s impossible to create a function pointer to a macro. When using bzero via function pointers, this will not work.
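The first point can be addressed by discarding memset's return value inside the macro:

#define bzero(d,n) ((void)memset((d),0,(n)))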
For the memset function, the second argument is an int and the third argument is a size_t:

void *memset(void *s, int c, size_t n);

size_t is typically an unsigned int. If the intended values, 0 for the second argument and 16 for the third, are entered in the wrong order as 16 and 0, such a call to memset can still compile and run, but it will do nothing, because the number of bytes to initialize is specified as 0.
void bzero(void *s, size_t n);
Such an error can be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
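A quick illustration (the erroneous call is commented out so the snippet compiles):

#include <string.h>
#include <strings.h>   /* bzero lives here on POSIX systems */

void demo(void)
{
    char buf[16];

    memset(buf, 16, 0);    /* swapped: compiles fine, fills 0 bytes, does nothing */
    memset(buf, 0, 16);    /* correct: zeroes all 16 bytes */

    /* bzero(16, buf); */  /* swapped: rejected by the compiler, because an
                              int is not a void * and a pointer is not a size_t */
    bzero(buf, 16);        /* correct */
}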
In short: memset requires more assembly operations than bzero.
This is the source:
http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown
memset takes 3 parameters, bzero takes 2
In memory-constrained environments that extra parameter would take 4 more bytes, and most of the time the call will be used to set everything to 0 anyway.
I am reading about the use of safe strings at the following location:
https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=5111861
It is mentioned as below.
SafeStr strings, when used properly, can eliminate many of these errors and provide backward compatibility to legacy code as well.
My question is: what does the author mean by "provide backward compatibility to legacy code as well"? Please explain with an example.
Thanks for your time and help
It means that functions from the standard libc (and others) which expect plain, null-terminated char arrays will work even on those SafeStrs. This is probably achieved by putting a control structure at a negative offset (or some other trick) from the start of the string.
Examples: strcmp(), printf(), etc. can be used directly on the strings returned by SafeStr.
In contrast, there are also other string libraries for C which are very "smart" and dynamic, but whose strings cannot be passed to "old school" functions without conversion.
From that page:
The library is based on the safestr_t type which is completely compatible with char *. This allows casting of safestr_t structures to char *.
That's some backward compatibility with all the existing code that takes char * or const char * pointers.
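A hypothetical sketch of how that plays out in code (safestr_create and safestr_free are my assumptions about the library's API; only the cast to char * is supported by the quote above):

safestr_t s = safestr_create("hello world", 0);  /* hypothetical constructor */
printf("%s\n", (char *)s);   /* legacy code can consume it directly */
safestr_free(s);             /* hypothetical cleanup call */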
Why this distinction? I ran into terrible problems after assuming itoa to be in stdlib.h, and finally ended up linking a custom version of itoa with a different prototype, producing some crazy errors.
So, why isn't itoa a standard function? What's wrong with it? And why is the standard partial towards its twin brother atoi?
No itoa has ever been standardised, so to add it to the standard you would need a compelling reason and a good interface.
Most itoa interfaces that I have seen either use a static buffer which has re-entrancy and lifetime issues, allocate a dynamic buffer that the caller needs to free or require the user to supply a buffer which makes the interface no better than sprintf.
An "itoa" function would have to return a string. Since strings aren't first-class objects, the caller would have to pass a buffer + length and the function would have to have some way to indicate whether it ran out of room or not. By the time you get that far, you've created something similar enough to sprintf that it's not worth duplicating the code/functionality. The "atoi" function exists because it's less complicated (and arguably safer) than a full "scanf" call. An "itoa" function wouldn't be different enough to be worth it.
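Indeed, a caller-supplied-buffer itoa reduces to little more than a wrapper around snprintf (a sketch of one of the many possible interfaces):

#include <stdio.h>
#include <stddef.h>

/* Returns buf on success, or NULL if the buffer was too small. */
char *itoa_sketch(int n, char *buf, size_t size)
{
    int written = snprintf(buf, size, "%d", n);
    if (written < 0 || (size_t)written >= size)
        return NULL;   /* ran out of room */
    return buf;
}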
The itoa function isn't standard, probably because there is no consistent definition of it. Different compiler and library vendors have introduced subtly different versions of it, possibly as an invention to serve as a complement to atoi.
If some non-standard function is widely provided by vendors, the standard's job is to codify it: basically add a description of the existing function to the standard. This is possible if the function has more or less consistent argument conventions and behavior.
Because multiple flavors of itoa are already out there, such a function cannot be added into ISO C. Whatever behavior is described would be at odds with some implementations.
itoa has existed in forms such as:
void itoa(int n, char *s); /* Given in _The C Programming Language_, 1st ed. (K&R1) */
void itoa(int input, void (*subr)(char)); /* Ancient Unix library */
void itoa(int n, char *buf, int radix);
char *itoa(int in, char *buf, int radix);
Microsoft provides it in their Visual C Run Time Library under the altered name: _itoa.
Not only have C implementations historically provided it under differing definitions, C programs also provide a function named itoa for themselves, which is another source of possible clashes.
Basically, the itoa identifier is "radioactive" with regard to standardization as an external name or macro. If such a function is standardized, it will have to be under a different name.