Why use bzero over memset?

In a Systems Programming class I took last semester, we had to implement a basic client/server in C. When initializing structs like sockaddr_in, or char buffers (that we used to send data back and forth between client and server), the professor instructed us to only use bzero and not memset to initialize them. He never explained why, and I'm curious: is there a valid reason for this?
I see at http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown that bzero is more efficient because it is only ever going to be zeroing memory, so it doesn't have to do any additional checking that memset may do. That still doesn't necessarily seem like a reason to absolutely avoid memset for zeroing memory, though.
bzero is considered deprecated, and furthermore is not a standard C function. According to the manual, memset is preferred over bzero for this reason. So why would you want to still use bzero over memset? Just for the efficiency gains, or is it something more? Likewise, what are the benefits of memset over bzero that make it the de facto preferred option for newer programs?

I don't see any reason to prefer bzero over memset.
memset is a standard C function, while bzero has never been a standard C function. The rationale is probably that you can achieve exactly the same functionality with memset.
Now regarding efficiency: compilers like gcc use builtin implementations of memset that switch to a specialized code path when a constant 0 is detected. The same goes for glibc when builtins are disabled.
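As an illustration, a minimal sketch of the kind of call this applies to; the exact code generated depends on the compiler, target, and optimization flags:

#include <string.h>

char buf[64];

void clear_buf(void)
{
    /* Both the fill value (0) and the size are compile-time constants,
       so an optimizing compiler typically expands this call inline
       (e.g. as a few wide stores) instead of calling into libc. */
    memset(buf, 0, sizeof buf);
}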

I'm guessing you used (or your teacher was influenced by) UNIX Network Programming by W. Richard Stevens. He uses bzero frequently instead of memset, even in the most up-to-date edition. The book is so popular, I think it's become an idiom in network programming which is why you still see it used.
I would stick with memset simply because bzero is deprecated and reduces portability. I doubt you would see any real gains from using one over the other.

The one advantage that I think bzero() has over memset() for setting memory to zero is that there's a reduced chance of a mistake being made.
More than once I've come across a bug that looked like:
memset(someobject, size_of_object, 0); // clear object
The compiler won't complain (though cranking up the warning level might get a diagnostic on some compilers), and the effect will be that the memory isn't cleared. Because this doesn't trash the object (it just leaves it alone), there's a decent chance the bug might not manifest into anything obvious.
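For contrast, the intended call with the arguments in the right order, alongside the bzero equivalent (a sketch reusing the hypothetical names from above):

memset(someobject, 0, size_of_object);  /* value first, then size */
bzero(someobject, size_of_object);      /* only two arguments, so none to transpose */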
The fact that bzero() isn't standard is a minor irritant. (FWIW, I wouldn't be surprised if most function calls in my programs are non-standard; in fact writing such functions is kind of my job).
In a comment to another answer here, Aaron Newton cited the following from Unix Network Programming, Volume 1, 3rd Edition by Stevens, et al., Section 1.2 (emphasis added):
bzero is not an ANSI C function. It is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments). Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition in our unp.h header.
Indeed, the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type. (Actually, the second argument is an int and the third argument is size_t, which is typically an unsigned int, but the values specified, 0 and 16, respectively, are still acceptable for the other type of argument.) The call to memset still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0. Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
I also believe that the vast majority of calls to memset() are to zero memory, so why not use an API that is tailored to that use case?
A possible drawback to bzero() is that compilers might be more likely to optimize memset() because it's standard and so they might be written to recognize it. However, keep in mind that correct code is still better than incorrect code that's been optimized. In most cases, using bzero() will not cause a noticeable impact on your program's performance, and bzero() can be a macro or inline function that expands to memset().

I wanted to mention something about the bzero vs. memset argument: install ltrace and compare what they do under the hood.
On Linux with libc6 (2.19-0ubuntu6.6), the calls made are exactly the same (via ltrace ./test123):
long m[] = {0}; // generates a call to memset(0x7fffefa28238, '\0', 8)
int* p;
bzero(&p, 4); // generates a call to memset(0x7fffefa28230, '\0', 4)
I've been told that unless I am working in the deep bowels of libc or at the kernel/syscall interface, I don't have to worry about the difference.
All I should worry about is that the call satisfies the requirement of zeroing the buffer. Others have already covered which one is preferable, so I'll stop here.

You probably shouldn't use bzero. It's not actually standard C; it was a POSIX thing.
And note the word "was": it was deprecated in POSIX.1-2001 and removed in POSIX.1-2008 in favor of memset, so you're better off using the standard C function.

Have it any way you like. :-)
#ifndef bzero
#define bzero(d,n) memset((d),0,(n))
#endif
Note that:
The original bzero returns nothing, while memset returns a void pointer (d). This can be fixed by adding a cast to (void) in the definition.
#ifndef bzero does not prevent you from hiding the original function even if it exists, because it only tests for the existence of a macro named bzero. This may cause lots of confusion.
It is impossible to create a function pointer to a macro, so calling bzero through a function pointer will not work.
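If those caveats matter, a small static inline function avoids all three; a minimal sketch, under a hypothetical name to avoid colliding with any real bzero:

#include <string.h>

static inline void my_bzero(void *d, size_t n)  /* hypothetical name */
{
    memset(d, 0, n);  /* returns nothing, and its address can be taken */
}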

For memset, the second argument is an int and the third argument is a size_t:
void *memset(void *s, int c, size_t n);
size_t is typically an unsigned int. But if values like 0 and 16 for the second and third arguments respectively are entered in the wrong order, as 16 and 0, then such a call to memset can still compile and run, but will do nothing, because the number of bytes to initialize is specified as 0.
void bzero(void *s, size_t n);
Such an error can be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
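A short sketch of the difference (buffer name hypothetical):

#include <string.h>
#include <strings.h>   /* bzero, where still provided */

char addr[16];

void demo(void)
{
    memset(addr, 16, 0);    /* swapped: compiles cleanly but zeroes nothing */
    bzero(addr, 16);        /* correct */
    /* bzero(16, addr); */  /* swapped: int/pointer type mismatch, so a
                               prototype-aware compiler must diagnose it */
}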

In short: memset requires more assembly operations than bzero.
This is the source:
http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown

memset takes 3 parameters, bzero takes 2.
In memory-constrained code, that extra parameter would take roughly 4 more bytes per call, and most of the time the call will be used to set everything to 0 anyway.

Related

Why is strcpy_s safer than strcpy?

When I try to use the strcpy function, Visual Studio gives me an error:
error C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
After searching online and reading many answers on Stack Overflow, the summary is that strcpy_s is safer than strcpy when copying a large string into a shorter one.
So, I tried the following code for copying into a shorter string:
char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(a), a);
printf("String = %s", b);
The code compiles successfully. However, there is still a runtime error.
So, how is strcpy_s safe?
Am I understanding the safety concept wrong?
Why is strcpy_s() "safer"? Well, it's actually quite involved. (Note that this answer ignores any specific code issues in the posted code.)
First, when MSVC tells you standard functions such as strcpy() are "deprecated", at best Microsoft is being incomplete. At worst, Microsoft is downright lying to you. Ascribe whatever motivation you want to Microsoft here, but strcpy() and a host of other functions that MSVC calls "deprecated" are standard C functions, and they are most certainly NOT deprecated by anyone other than Microsoft. So when MSVC warns you about a function that every conforming C implementation is required to provide (most of which then flow by requirement into C++ as well), it omits the "by Microsoft" part.
The "safer" functions that Microsoft is "helpfully" suggesting that you use - such as strcpy_s() would be standard, as they are part of the optional Annex K of the C standard, had Microsoft implemented them per the standard.
Per N1967 - Field Experience With Annex K — Bounds Checking Interfaces
Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.
Outside of a few specific cases (of which strcpy() is one), whether Microsoft's version of Annex K's "safer" bounds-checking functions are safer is debatable. Per N1967 (bolding mine):
Suggested Technical Corrigendum
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
Note, however, that in the case of strcpy(), strcpy_s() is actually more akin to strncpy(). strcpy() is just a bog-standard C string function that doesn't do bounds checking, whereas strncpy() is a perverse function: it copies from the source string and then pads the rest of the target buffer with '\0' char values. Unless the source string fills the entire target buffer, in which case strncpy() will NOT terminate it with a '\0' char value.
I'll repeat that: strncpy() does not guarantee a properly terminated copy.
It's hard not to be "safer" than strncpy(). In this case strcpy_s() does not violate the principle of least astonishment like strncpy() does. I'd call that "safer".
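To make the strncpy() hazard concrete, a minimal sketch (buffer and names hypothetical):

#include <string.h>

char dst[8];

void copy_name(const char *src)
{
    strncpy(dst, src, sizeof dst);  /* leaves no '\0' if src is 8 chars or longer */
    dst[sizeof dst - 1] = '\0';     /* terminate manually to be safe */
}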
But using strcpy_s() - and all the other "suggested" functions - makes your code de facto non-portable, as Microsoft is the only significant implementation of any form of Annex K's bounds-checking functions.
The header definition for C is:
errno_t strcpy_s(char *dest,rsize_t dest_size,const char *src)
The invocation for your example should be:
#include <stdio.h>
#include <string.h>  /* strcpy_s is declared in <string.h>, not <stdlib.h> */

char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(b), a);
printf("String = %s", b);
strcpy_s needs the size of the destination, which is smaller than the source in your example.
strcpy_s(b, sizeof(b), a);
would be the way to go.
As for the safety concept, many checks are now done, and there are better ways to handle errors.
In your example, had you used strcpy, you would have triggered a buffer overflow. A function like strncpy would have copied the first 3 characters without any null terminator, which in turn could trigger an out-of-bounds read later. (strlcpy, where available, truncates but does terminate.)
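For completeness, here is a sketch of checking the result, assuming the MSVC flavor of strcpy_s (which returns an errno_t that is 0 on success) and a destination buffer that is actually large enough:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[50] = "void";
    char b[8];                               /* large enough this time */
    errno_t err = strcpy_s(b, sizeof(b), a); /* 0 on success */
    if (err != 0) {
        fprintf(stderr, "copy failed: %d\n", (int)err);
        return 1;
    }
    printf("String = %s\n", b);
    return 0;
}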

Why do Kernighan and Ritchie include seemingly unnecessary typecasts?

Second edition. I'm looking at their hash table example in section 6.6. I found the full source transcribed here. This is the part I'm puzzling over:
struct nlist *np;
if((np=lookup(name))==NULL){
np=(struct nlist *)malloc(sizeof(*np));
Why the cast to (struct nlist *) on the last line? I can remove it without getting any compiler warnings.
I'm similarly confused by
free((void *)np->def);
Are these intended to aid readability somehow?
Casting the result of malloc was necessary in some pre-ANSI dialects of C, and the usage was retained in K&R2 (which covers the language as of the 1989 ANSI C standard).
This is mentioned in the errata list for the book. I've seen it via Dennis Ritchie's home page, which isn't currently available (I hope AT&T hasn't permanently removed it), but I found it elsewhere:
142(§6.5, toward the end): The remark about casting the return value of malloc ("the proper method is to declare ... then explicitly coerce") needs to be rewritten. The example is correct and works, but the advice is debatable in the context of the 1988-1989 ANSI/ISO standards. It's not necessary (given that coercion of void * to ALMOSTANYTYPE * is automatic), and possibly harmful if malloc, or a proxy for it, fails to be declared as returning void *. The explicit cast can cover up an unintended error. On the other hand, pre-ANSI, the cast was necessary, and it is in C++ also.
Despite the opinion of the legions of posters here who will immediately jump on any code with an unnecessary (but harmless) cast of malloc(), the truth is that it just doesn't matter. Yes, assignment to and from void * does not require casting, but nor is it forbidden, and the arguments for leaving it in or taking it out really aren't that strong.
There are more important things to spend brain cells on. It just doesn't matter.
For that example to be completely correct today, you have to put
#include <stdlib.h>
so you get the proper prototype for malloc(3). In this case it doesn't matter whether you cast or not, as malloc is declared there as returning void *, and no cast is needed from that type to another pointer type (but you can add one if you wish).
Today, it's better not to do the cast, because it can hide a frequent error. If you do the cast and don't provide a prototype for malloc, the compiler (under pre-C99 rules) assumes malloc is implicitly declared as int malloc(); (returning an int instead of a void *, and taking an unspecified number of arguments), and the cast says you explicitly want that int converted to a pointer. The compiler will call malloc, take the supposed int result (say, a 32-bit value, not the actual 64-bit pointer malloc returned; depending on the architecture's calling conventions the two may or may not be related, but the int space is smaller than the pointer space), and blindly convert it to the type named in the cast, without warning, since you wrote the cast explicitly. This is truly dangerous whenever integer types are not the same size as pointer types, as on 64-bit platforms, or in old MS-DOS compilers with large memory models. And it hides the real problem, which is that you did not provide the proper #include file. You will be lucky if it works at all, since that would require every pointer malloc returns to happen to fit in an int. This is the actual source of undefined behaviour you should expect.
Normally, with modern compilers, you'll get some kind of warning (or an error) for using a function with no declaration, but older compilers would still compile the program and generate an executable, probably not the one you want. This is one of the main differences between C and C++ here: C++ doesn't let you compile a call to a function whose prototype hasn't been declared, so you get a hard error instead of a possible warning for the invalid malloc use described above.
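A sketch of that hazard under pre-C99 implicit-declaration rules (modern compilers reject or at least warn about this, which is the point):

/* #include <stdlib.h> deliberately missing */

int main(void)
{
    /* Under pre-C99 rules malloc is implicitly declared as int malloc(),
       so the compiler believes an int comes back. The cast silences the
       mismatch and can truncate the pointer where int is narrower than
       a pointer (e.g. LP64 platforms). */
    char *p = (char *)malloc(10);
    (void)p;  /* silence unused-variable warnings in this sketch */

    /* Without the cast, a C89 compiler must diagnose assigning int to
       char *, which exposes the missing #include. */
    return 0;
}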
These kinds of errors are very difficult to track down, and that is exactly why the people who advocate not casting do so.
The cast is unnecessary today: void * can be assigned directly to any object pointer type, and you actually should assign it directly. K&R is a bit outdated in some respects, and you should definitely get something more recent that covers the newer standards (C99 and upwards).
See n1570: 6.3.2.3/1.

When do I need to cast the result of malloc in C language? [duplicate]

Based on this old question, malloc returns a pointer to void that "is automatically and safely promoted to any other pointer type".
But reading K&R, I've found the following code:
char *strdup(char *s)  /* make a duplicate of s */
{
    char *p;

    p = (char *) malloc(strlen(s)+1);  /* +1 for '\0' */
    if (p != NULL)
        strcpy(p, s);
    return p;
}
What is the difference?
For any implementation conforming to C89 or later, casting the result of malloc() is never necessary.
Casting the result of malloc() is mentioned in the Errata for The C Programming Language, Second Edition:
142(§6.5, toward the end): The remark about casting the return value of malloc ("the proper method is to declare ... then explicitly coerce") needs to be rewritten. The example is correct and works, but the advice is debatable in the context of the 1988-1989 ANSI/ISO standards. It's not necessary (given that coercion of void * to ALMOSTANYTYPE * is automatic), and possibly harmful if malloc, or a proxy for it, fails to be declared as returning void *. The explicit cast can cover up an unintended error. On the other hand, pre-ANSI, the cast was necessary, and it is in C++ also.
(That link is no longer valid. The late Dennis Ritchie's home page is now here; it points to an updated location for the home page for the book, but that link is also invalid.)
Best practice is to assign the result of a malloc() call directly to a pointer object, and let the implicit conversion from void* take care of type consistency. The declaration of malloc() needs to be made visible via #include <stdlib.h>.
(It's barely conceivable that you might want to use malloc() in a context where an explicit conversion is needed. The only case I can think of is passing the result of malloc() directly to a variadic function that needs an argument of a pointer type other than void*. In such an unusual and contrived case, the cast can be avoided by assigning the result to a pointer object and passing that object's value to the function.)
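To illustrate the recommended style, a minimal sketch (function and names hypothetical):

#include <stdlib.h>

double *make_array(size_t n)
{
    double *arr = malloc(n * sizeof *arr);  /* no cast: void * converts implicitly */
    if (arr == NULL)
        return NULL;                        /* handle allocation failure */
    return arr;
}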
The only good reasons to cast the result of malloc() are (a) compatibility with C++ (but the need to write code that compiles both as C and as C++ is rarer than you might expect), and (b) compatibility with pre-ANSI C compilers (but the need to cater to such compilers is rare and becoming rarer).
Firstly: K&R C is ancient.
Secondly: I don't know if that's the full code in that example, but in K&R C, functions are assumed to return int if not otherwise declared. If he's not declaring that malloc, it means it returns int as far as the compiler is concerned, hence the cast. Note that this is undefined behavior even in C89 (the earliest standardized version) and extremely bad practice; he may have done it this way for brevity, or perhaps due to laziness, or maybe for some other historical reason. I don't know all the intricacies of K&R C, and it might be that void* to char* casts were not implicit back then.
I should add that this is a very serious problem, since int and void* may have different sizes (and often do; the most common example is x86_64 code, where pointers are 64-bit but ints are 32-bit).
UPDATE: #KeithThompson has informed me that the code is from the 2nd edition, and malloc is declared there. That one is (unlike the 1st edition) still relevant, if very much outdated in places.
I'm guessing that the cast was likely done for compatibility with non-conforming compilers, which mattered at the time, as many [most?] compilers were not fully conforming yet. Nowadays, you'd have to go out of your way to find one that needs the cast (and no, C++ compilers don't count; they're C++ compilers, not C compilers).
People who want (or may want) to cross-compile as both C and C++ often cast malloc(.) because C++ won't automatically promote void*.
As others point out, in the good old days when function prototypes were optional, the compiler would assume malloc() returned int if it hadn't seen the prototype, with potentially catastrophic run-time outcomes.
I try to see some point in that argument, but I simply can't believe such situations occur with a regularity that warrants the flaming people get for casting malloc(.).
Look at any question on this site that casts malloc(.): it doesn't matter what it's about, someone will angrily demand the casts be removed, as though they make their eyes bleed.
NB: Casting malloc(.) does no good and makes nothing wrong right, so it is clutter. In fact, the only (real) argument I know is that it leads people to look at the cast and not at the argument, which is what matters!
I saw code on here today:
int** x=(int**)malloc(sizeof(int)*n);
It's very easy to be lulled into a false sense of security.
The sizeof(int) is doing (or in this case failing to do) the work to make sure what comes back from malloc(.) is right for the purpose. (int**) does no work.
I'm not saying cast it. I am appealing for calm. I certainly think that if a novice is failing to call free(.), we should teach them that lesson before we get into the nuances of casting malloc(.).
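To make that concrete, a small sketch contrasting the two spellings (names hypothetical):

#include <stdlib.h>

void demo(size_t n)
{
    int **bad  = (int **)malloc(sizeof(int) * n);  /* the cast "passes", but the element size is wrong */
    int **good = malloc(n * sizeof *good);         /* sizeof *good tracks the pointer's type automatically */
    free(bad);
    free(good);
}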

Why is there no strnchr function?

As with other C library functions like strcpy and strcat, which have variants that limit the size of the string (strncpy, etc.), I am wondering why there is no such variant for strchr?
It does exist -- it is called memchr:
http://en.wikipedia.org/wiki/C_string_handling
In C, the term "string" usually means "null terminated array of characters", and the str* functions operate on those kinds of strings. The n in the functions you mention is mostly for the sake of controlling the output.
If you want to operate on an arbitrary byte sequence without any implied termination semantics, use the mem* family of functions; in your case, memchr should serve your needs.
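A minimal sketch of memchr standing in for the missing strnchr:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[] = { 'h', 'e', 'l', 'l', 'o' };  /* deliberately not null-terminated */
    char *p = memchr(buf, 'l', sizeof buf);    /* scans at most sizeof buf bytes */
    if (p != NULL)
        printf("found at offset %d\n", (int)(p - buf));
    return 0;
}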
strnchr and also strnlen are defined in some Linux environments, for example https://manpages.debian.org/experimental/linux-manual-4.11/strnchr.9.en.html. They really are necessary: a program may run off the end of a memory area and crash if strlen or strcmp never finds a \0 terminator. Unfortunately such things are often not standardized, or are standardized too late and in too sophisticated a form. strnlen_s exists in C11, but strnchr_s is not available.
You may find more information about such problems on my internet page: https://www.vishia.org/emcdocu/html/portability_emC.html. I have defined some C functions strnchr_emC... etc. which deliver the required functionality. To achieve compatibility you can define
#define strnchr strnchr_emC
in a common, but platform-specific, header. Refer to the further content on https://www.vishia.org/emc/. You can find the sources at https://github.com/JzHartmut

Is it good practice to ALWAYS cast variables in C?

I'm writing some C code that uses the Windows API. I was wondering whether it is in any way good practice to cast between types that are obviously the same but have different names? For example, when passing a TCHAR * to strcmp(), which expects a const char *. Should I, assuming I want to write strict and in every way correct C, write strcmp((const char *)my_tchar_string, "foo")?
Don't. But also don't use strcmp() but rather _tcscmp() (or even the safe alternatives).
_tcs* denotes a whole set of C runtime (string) functions that will behave correctly depending on how TCHAR gets translated by the preprocessor.
Concerning safe alternatives: look for functions with a trailing _s but otherwise named like the classic string functions from the C runtime. There is another set of functions that return HRESULT, but they are not as compatible with the C runtime.
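A minimal sketch, assuming a Windows build environment with the C runtime's tchar.h mappings:

#include <windows.h>
#include <stdio.h>
#include <tchar.h>

int main(void)
{
    /* TCHAR and _tcscmp map to char/strcmp or wchar_t/wcscmp depending
       on whether UNICODE/_UNICODE are defined at build time. */
    TCHAR my_tchar_string[] = _T("foo");
    if (_tcscmp(my_tchar_string, _T("foo")) == 0)
        _tprintf(_T("match\n"));
    return 0;
}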
No, casting that away is not safe because TCHAR is not always equal to char. Instead of casting, you should pick a function that works with a TCHAR. See http://msdn.microsoft.com/en-us/library/e0z9k731(v=vs.71).aspx
Casting is generally a bad idea. Casting when you don't need to is terrible practice.
Think what happens if you change the type of the variable you are casting? Suppose that at some future date you change my_tchar_string to be wchar_t* rather than char*. Your code will still compile but will behave incorrectly.
One of your primary goals when writing C code is to minimise the number of casts in your code.
My advice would be to just avoid TCHAR (and associated functions) completely. Their real intent was to allow a single code base to compile natively for either 16-bit or 32-bit versions of Windows -- but the 16-bit versions of Windows are long gone, and with them the real reason to write code like this.
If you want/need to support wide characters, do it. If you're fine with only narrow/multibyte characters, do that. At least IME, trying to sit on the fence and do some of both generally means you end up not doing either one well. It also means roughly doubling the amount of testing necessary without even coming close to doubling the functionality you provide to the user.
