Why does memset take an int as the second argument instead of a char, whereas wmemset takes a wchar_t instead of something like long or long long?
memset predates (by quite a bit) the addition of function prototypes to C. Without a prototype, you can't pass a char to a function -- when/if you try, it'll be promoted to int when you pass it, and what the function receives is an int.
It's also worth noting that in C, (but not in C++) a character literal like 'a' does not have type char -- it has type int, so what you pass will usually start out as an int anyway. Essentially the only way for it to start as a char and get promoted is if you pass a char variable.
In theory, memset could probably be modified so it receives a char instead of an int, but there's unlikely to be any benefit, and a pretty decent possibility of breaking some old code or other. With an unknown but potentially fairly high cost, and almost no chance of any real benefit, I'd say the chances of it being changed to receive a char fall right on the line between "slim" and "none".
Edit (responding to the comments): The CHAR_BIT least significant bits of the int are used as the value to write to the target.
Probably the same reason why the functions in <ctype.h> take ints and not chars.
On most platforms, a char is too small to be pushed on the stack by itself, so one usually pushes the type closest to the machine's word size, i.e. int.
As the link in @Gui13's comment points out, doing that also increases performance.
See fred's answer, it's for performance reasons.
On my side, I tried this code:
#include <stdio.h>
#include <string.h>
int main(int argc, const char *argv[])
{
    char c = 0x00;
    printf("Before: c = 0x%02x\n", c);
    memset(&c, 0xABCDEF54, 1);
    printf("After: c = 0x%02x\n", c);
    return 0;
}
And it gives me this on a 64-bit Mac:
Before: c = 0x00
After: c = 0x54
So as you see, only the last byte gets written. This is actually guaranteed rather than architecture-dependent: memset converts its int argument to unsigned char, so only the low byte is ever used, regardless of endianness.
I was wondering if it is possible to return the same result with only one explicit cast?
void *begin(void *pt, size_t size)
{
    return (void *)((size_t)pt & -size);
}
Every time I tried I got a BAD_ACCESS (code 1) error.
Example:
void *begin(void *pt, size_t size)
{
    size_t *tmp = pt;
    size_t res = *tmp & -size;
    return (void *)res;
}
It can be done with zero casts, with implementation-dependent code. However, this is akin to writing code without using the letter “e”: It may be a challenge, but it serves no purpose in production code. If it is posed as an academic exercise, it can be useful because artificial constraints can induce a student to think about things they might not otherwise think about so much, like alternative ways to do things or the technical rules of the language. However, in practice, this is generally pointless.
Your sample code uses size_t, but the preferred type for working with addresses as integers is uintptr_t, if it is defined. If it is defined, it is defined in <stdint.h>, and any normal C implementation of even modest quality should define it.
Your sample code assumes that converting an address to size_t yields a plain integer address in memory. (The address & -size operation is a common way of finding an address aligned to a multiple of size, which must be a power of two, by clearing the low bits, and so we recognize that your (size_t) pt must be a plain address, at least in its low bits.) Instead, let us assume that a pointer is represented in memory using a plain integer for the address and is the same size as uintptr_t. In any C implementation in which either of these is true, the other is likely true too. Before using the following code, you should confirm this for your target C implementations.
Given that assumption, we can implement your begin routine with no casts:
#include <stdint.h>
#include <string.h>

void *begin(void *pt, size_t size)
{
    uintptr_t us = size;        // Convert size to uintptr_t to ensure it is at least as wide.
    uintptr_t x;                // Make space to copy pointer.
    memcpy(&x, &pt, sizeof x);  // Copy bytes.
    x &= -us;                   // Zero low bits.
    memcpy(&pt, &x, sizeof pt); // Copy bytes back.
    return pt;
}
If the assumption is not true, it is nonetheless possible to implement begin for any chosen C implementation by setting an unsigned char pointer with unsigned char *p = (unsigned char *) &pt; and then using p to examine and manipulate the bytes of pt. The C standard requires each implementation to document its representation of types, so the meanings of the bytes in the void * pt must be documented, which enables writing code to compute with them as desired.
That uses one cast. It could be reduced to zero with void *v = &pt; unsigned char *p = v;.
I'm in a sophomore C class and this project is about dealing with pointers and designing a memory dump function. So I've been able to struggle through the pointers and got a beginning and ending address to dump, even bitmasked it, and I wanted to initialize a char array with the beginning memory address. I initialize it with the same variable storing my masked beginning address but when I print the array, it contains a different memory address. Here's the function:
void memDump(void *base, int bytes)
{
    unsigned char *begin;
    begin = base;           // beginning of range of memory
    unsigned char *end;     // ending range of memory
    end = base + bytes;
    int a, b;
    long long int d = base;
    d = d & 0xFFFFF0;       // trying to bitmask
    long long int e = end;
    e = e & 0xFFFF0;        // masked off the beginning and ending range
    char c[16] = {d};       // loop variables
    printf("%x", c);
    for (a = begin; a <= end; a += 16)
    {
        printf("\n%016X\n", d);
        printf("%016X\n", a);
        printf("%016X", e);
    }
}
Sorry, I can't find anything similar and this is my last resort. Thanks!
Update: Thanks for the insight everyone, reading some more about C and some articles on how to debug helped me out.
You cannot "initialize a char array" with some "memory address." A char array can only be initialized with characters.
Stackoverflow is not about doing your homework for you, so I will give you some advice, and then you can try implementing it. If you cannot put the advice into code, then you do not deserve to turn in a completed assignment.
First of all, once you have bitmasked your "d", you need to store it back into "begin", so that you have a pointer from which you can start reading bytes to dump.
This instruction:
printf( "%08p ", begin );
will render the hexadecimal representation of your "begin" address in 8 characters, followed by a space. This is how you need to begin each row of your memory dump.
The instruction:
printf( "%02x ", *(begin++) );
gets the byte pointed to by "begin", and renders the hexadecimal representation of that byte in two characters, followed by a space. It then increments "begin" to point to the next byte. You need to do this 8 or 16 times, depending on how wide you want your memory dump to be, then do a printf( "\n" ) to move to the next line.
Then you need to keep repeating the above until your "begin" has exceeded your "end". (So, you are looking at an outer loop, for each row, and an inner loop, for each byte within the row.)
I hope this helps.
As @Jean-FrançoisFabre observed,
char c[16]={d};
probably does not do what you think it does. That is, unless what you think it does is convert the long long int value stored in d to type char (producing an implementation-defined result drawn from a much smaller range than that of d itself), initializing the first element of array c with that value, and initializing the other fifteen with 0. I can't imagine what you would want to do with the result, but since you actually don't do anything with it, that's probably moot.
As I observed myself,
printf("%x", c);
also probably does not do what you think it does. Indeed, you cannot rely on it to do any particular thing, because its behavior is undefined. You are passing a pointer to the first element of c as the second argument, but a value of type unsigned int will be expected instead (based on the format). In any case, this neither "print[s] the array" nor tells you anything about what it contains.
I suspect that what you actually had in mind was to declare c as an array whose address -- not contents -- is that designated by base, truncated to a 16-byte-aligned address. You cannot do that, because you cannot specify the address of any variable you declare, but you can declare c as a pointer, like this:
unsigned char *c = d;
(Oh no, more pointers!) There's some implementation-dependency there, but it probably has the result I think you want. Or if you want to be really clever, you might do this:
unsigned char (*c16)[16] = d;
That declares c16 as a pointer to an array of 16 unsigned char. It's as close as you can get to declaring an array at an address specified by you. I suspect you'll find it easier to work with the other declaration, however.
If you want to print the contents of the memory to which such a pointer points (as a "memory dump" function seems wont to do) then you'll need to do a little more work. The standard library's formatted I/O functions do not provide directly for printing arrays (for good reasons that I'll not go into here), except C strings, and you do not appear to want to print the data as a C string. Do, however, consider this call, and how you might modify it for or adapt it to your purpose (assuming my above declaration for c):
printf("%02x", *c);
I'm a little confused as to how to use size_t when other data types like int, unsigned long int and unsigned long long int are present in a program. I try to illustrate my confusion minimally. Imagine a program where I use
void *calloc(size_t nmemb, size_t size)
to allocate an array (one- or multidimensional). Let the call to calloc() be dependent on nrow and sizeof(unsigned long int). sizeof(unsigned long int) is obviously fine because it returns size_t. But let nrow be such that it needs to have type unsigned long int. What do I do in such a case? Do I cast nrow in the call to calloc() from unsigned long int to size_t?
Another case would be
char *fgets(char *s, int size, FILE *stream)
fgets() expects type int as its second parameter. But what if I pass it an array, let's say save, as its first parameter and use sizeof(save) to pass it the size of the array? Do I cast the result of sizeof to int? That would be dangerous, since int isn't guaranteed to hold all possible values returned by sizeof.
What should I do in these two cases? Cast, or just ignore possible warnings from tools such as splint?
Here is an example regarding calloc() (I explicitly omit error-checking for clarity!):
long int **arr;
unsigned long int mrow;
unsigned long int ncol;
unsigned long int i;

arr = calloc(mrow, sizeof(long int *));
for (i = 0; i < mrow; i++) {
    arr[i] = calloc(ncol, sizeof(long int));
}
Here is an example for fgets() (Error-handling again omitted for clarity!):
char save[22];
char *ptr_save;
unsigned long int mrow;

if (fgets(save, sizeof(save), stdin) != NULL) {
    save[strcspn(save, "\n")] = '\0';
    mrow = strtoul(save, &ptr_save, 10);
}
I'm a little confused as to how to use size_t when other data types like int, unsigned long int and unsigned long long int are present in a program.
It is never a good idea to ignore warnings. Warnings are there to direct your attention to areas of your code that may be problematic. It is much better to take a few minutes to understand what the warning is telling you -- and fix it -- than to get bitten by it later when you hit a corner case and stumble off into undefined behavior.
size_t itself is just a data type like any other. It is an unsigned integer type, wide enough to hold the size of the largest object the implementation supports; its exact width varies between platforms, but the intent is consistent across them. Your choice of data type is a basic and fundamental part of programming. You choose the type based on the range of values your variable can represent (or should be limited to representing). So if whatever you are dealing with can't be negative, then an unsigned type or size_t is the proper choice. That choice then allows the compiler to help identify areas where your code would violate it.
When you compile with warnings enabled (e.g. -Wall -Wextra) which you should use on every compile, you will be warned about possible conflicts in your data-type use. (i.e. comparison between signed and unsigned values, etc...) These are important!
Virtually all modern x86 and x86_64 computers use the two's-complement representation for signed values. In simple terms this means that if the leftmost bit of a signed number is 1, the value is negative. Herein lie the subtle traps you may fall into when mixing, casting or comparing numbers of varying type. If you cast an unsigned number to a signed number and that number happens to have the most significant bit set, your large number just became a very small (negative) number.
What should I do in these two cases? Cast, or just ignore possible
warnings...
You do what you do each time you are faced with warnings from the compiler. You analyze what is causing the warning, and then you fix it (or, if you can't fix it -- e.g. it comes from some library you don't have access to -- you understand the warning well enough that you can make an educated decision to disregard it, knowing you will not hit any corner cases that would lead to undefined behavior).
In your examples (while neither should produce a warning, they may on some compilers):
arr = calloc (mrow, sizeof(long int *));
What is the range of sizeof(long int *)? Well, it's the range of what a pointer's size can be (4 bytes on x86 or 8 bytes on x86_64). So the range of values is 4-8; yes, that can be properly fixed with a cast to size_t if needed, or better just:
arr = calloc (mrow, sizeof *arr);
Looking at the next example:
char save[22];
...
fgets(save, sizeof(save), stdin)
Here again, what is the possible range of sizeof save? From 22 to 22. So yes, if a warning is produced complaining about the fact that sizeof returns long unsigned and fgets calls for int, 22 can safely be cast to int.
When to cast size_t
You shouldn't.
Use it where it's appropriate.
(As you already noticed) the libc-library functions tell you where this is the case.
Additionally use it to index arrays.
If in doubt whether the type suits your program's needs, you might go for the useful assertion statement as per Steve Summit's answer and, if it fails, start over with your program's design.
More on this here by Dan Saks: "Why size_t matters" and "Further insights into size_t"
My other answer got waaaaaaay too long, so here's a short one.
Declare your variables of natural and appropriate types. Let the compiler take care of most conversions. If you have something that is or might be a size, go ahead and use size_t. (Similarly, if you have something that's involved in file sizes or offsets, use off_t.)
Try not to mix signed and unsigned types.
If you're getting warnings about possible data loss because of larger types getting downconverted to possibly smaller types, and if you can't change the types to make the warnings go away, first (a) convince yourselves that the values, in practice, will not ever actually overflow the smaller type, then (b) add an explicit downconversion cast to make the warning go away, and for extra credit (c) add an assertion to document and enforce your assumption:
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);
In general, you're right, you should not ignore the warnings! And in general, if you can, you should shy away from explicit casts, because they can make your code less reliable, or silence warning which are really trying to tell you something important.
Most of the time, I believe, the compiler should do the right thing for you. For example, malloc() expects a size_t, and the compiler knows from the function prototype that it does, so if you write
int size_i_need = 10;
char *buf = malloc(size_i_need);
the compiler will insert the appropriate conversion from int to size_t, as necessary. (I don't believe I've had warnings here I had to worry about, either.)
If the variables you're using are already unsigned, so much the better!
Similarly, if you were to write
fgets(buf, sizeof(buf), ifp);
the compiler will again insert an appropriate conversion. Here, I guess I see what you're getting at, a 64-bit compiler might emit a warning about the downconversion from long to int. Now that I think about it, I'm not sure why I haven't had that problem, because this is a common idiom.
(You also asked about passing unsigned long to malloc, and on a machine where size_t is smaller than long, I suppose that might get you warnings, too. Is that what you were worried about?)
If you've got a downsize that you can't avoid, and your compiler or some other tool is warning about it, and you want to get rid of the warning safely, you could use a cast and an assertion. That is, if you write
unsigned long long size_i_need = 23;
char *buf = malloc(size_i_need);
this might get a warning on a machine where size_t is 32 bits. So you could silence the warning with a cast (on the assumption that your unsigned long long values will never actually be too big), and then back up your assumption with a call to assert:
unsigned long long size_i_need = 23;
assert(size_i_need <= SIZE_MAX);
char *buf = malloc((size_t)size_i_need);
In my experience, the biggest nuisance is printing these things out. If you write
printf("int size = %d\n", sizeof(int));
or
printf("string length = %d\n", strlen("abc"));
on a 64-bit machine, a modern compiler will typically (and correctly) warn you that "format specifies type 'int' but the argument has type 'unsigned long'", or something to that effect. You can fix this in two ways: cast the value to match the printf format, or change the printf format to match the value:
printf("int size = %d\n", (int)sizeof(int));
printf("string length = %lu\n", strlen("abc"));
In the first case, you're assuming that sizeof's result will fit in an int (which is probably a safe bet). In the second case, you're assuming that size_t is in fact unsigned long, which may be true on a 64-bit compiler but may not be true on some other. So it's actually safer to use an explicit cast in the second case, too:
printf("string length = %lu\n", (unsigned long)strlen("abc"));
The bottom line is that abstract types like size_t don't work so well with printf; this is where we can see that the C++ output style of cout << "string length = " << strlen("abc") << endl has its advantages.
To solve this problem, there are some special printf modifiers that are guaranteed to match size_t and I think off_t and a few other abstract types, although they're not so well known. (I wasn't sure where to look them up, but while I've been composing this answer, some commenters have already reminded me.) So the best way to print one of these things (if you can remember, and unless you're using old compilers) would be
printf("string length = %zu\n", strlen("abc"));
Bottom line:
You obviously don't have to worry about passing plain int or plain unsigned to a function like calloc that expects size_t.
When calling something that might result in a downcast, such as passing a size_t to fgets where size_t is 64 bits but int is 32, or passing unsigned long long to calloc where size_t is only 32 bits, you might get warnings. If you can't make the passed-in types smaller (which in the general case you're not going to be able to do), you'll have little choice but to insert a cast to silence the warnings. In this case, to be strictly correct, you might want to add some assertions.
With all of that said, I'm not sure I've actually answered your question, so if you'd like further clarification, please ask.
#include <stdio.h>
#include <conio.h>

int main(void)
{
    char i;
    int c;

    scanf("%i", &c);
    scanf("%c", &i); // catch the newline or character introduced before the number
    printf("%i", i); // value of that character
    getch();
    return 0;
}
The program behaves in the same way with the following variable declarations instead of the ones above:
this:
int c;
int *x;
int i;
or this:
int *x;
int c;
int i;
And only this way: the c variable and the x pointer before the i variable.
I know that those last declarations don't make sense: int i instead of char i, and an added pointer that isn't even needed.
But this occurred accidentally and I'm wondering if it's only a coincidence.
The order in which you declare your variables should make no difference at all, assuming there's nothing wrong with the rest of your code. The order of declaration needn't have anything at all to do with the way they're laid out in memory. And even if it did, you refer to variables by name; as long as your code is correct, a reference to i is a reference to i, and the compiler will generate whatever code is needed to access the variable correctly.
Now if you do this:
int i;
scanf("%c", &i);
then you're doing something wrong. scanf with a "%c" format requires a char* argument, which points to the char object into which the value will be stored. You're giving it an int* rather than a char*. As a result, your program's behavior is undefined; the language standard says nothing about how it will behave.
So why does it appear to work correctly? What's probably happening is that scanf treats the address of the int object i as if it were a pointer to a char. It will probably point to the first byte of the representation of i; for example, i might be 32 bits, and the pointer will point to the first 8 of those bits. (They could be the high-order or low-order bits, depending on the system.)
Now when you print the value of i:
printf("%d\n", i);
the contents of i are, for example, 1 byte consisting of whatever character you just read into it, and 3 bytes of garbage. Those 3 garbage bytes may well all be zeros, but they could be anything. If the garbage bytes happen to be 0, and the first byte happens to be the low-order byte (i.e., you're on a little-endian machine), then you're likely to get the "correct" output.
But don't do that. Since the behavior is undefined, it can work "correctly" for years, and then fail spectacularly at the worst possible moment.
The lesson here is that C tends to assume that you know what you're doing. There are a lot of constructs that have undefined behavior, which means that they're invalid, but neither the compiler nor the runtime system is required to tell you that there's a problem. In C, more than in most other languages, it's up to you as a programmer to get things right. The compiler (and other tools) will tell you about some errors, but not all of them.
And in the presence of undefined behavior, the order in which you declare your variables can make a difference. For example, if you write code that reads or writes past the end of a variable, it can matter what happens to be stored there. But don't be tempted to shuffle your declarations around until the program works. Get rid of the undefined behavior so the order doesn't matter.
The solution: Don't make mistakes in the first place. (Of course that's much easier said than done.)
And naming conventions can be helpful. If you had called your char variable c, and your int variable i, rather than vice versa, it would have been easier to keep track of which is which.
But c is a reasonable name for an int variable used to hold input character values -- not for scanf, but for getchar(), as in:
int c;
while ((c = getchar()) != EOF) {
    /* ... */
}
The function expects a sequence of references as additional arguments, each one pointing to an object of the type specified by its corresponding %-tag within the format string, in the same order. Read about scanf.
These can additionally help you:
I don't understand why I can't get three inputs in c
scanf() leaves the new line char in buffer?
Regarding the last portion of your question: an int is at least as wide as a char (and wider on virtually all platforms), so it won't cause a problem.
What is the format string modifier for char-as-number?
I want to read in a number never exceeding 255 (actually much less) into an unsigned char type variable using sscanf.
Using the typical
char source[] = "x32";
char separator;
unsigned char dest;
int len;

len = sscanf(source, "%c%d", &separator, &dest);
// validate and proceed...
I'm getting the expected warning: argument 4 of sscanf has type unsigned char *, but int * is expected.
As I understand the specs, there is no modifier for char (like %hd for short, or %lld for 64-bit long long).
is it dangerous? (will overflow just overflow (roll-over) the variable or will it write outside the allocated space?)
is there a prettier way to achieve that than allocating a temporary int variable?
...or would you suggest an entirely different approach altogether?
You can use %hhd (standardized in C99) under glibc's scanf(); MSVC does not appear to support integer storage into a char directly (see MSDN's scanf Width Specification for more information on the supported conversions).
It is dangerous to use that. With the implicit conversion from unsigned char * to int *, sscanf will write a full int through the pointer, so the bytes next to the variable on the stack (up to 3 of them) get overwritten and their values corrupted.
The issue with %hhd is that it is not universally supported (it was only standardized in C99), so it cannot be relied on with every implementation.
It does not seem that sscanf supports storage of numbers into a char everywhere, so I suggest you use an int instead. If you want the char roll-over behavior, you can just cast the int to unsigned char afterward, like:

int dest;
int len;

len = sscanf(source, "%c%d", &separator, &dest);
dest = (unsigned char)dest;
I want to read in a number never exceeding 255 (actually much less) into an unsigned char type variable using sscanf.
In most contexts you save little to nothing by using char for an integer.
It generally depends on architecture and compiler, but most modern CPUs are not very good at handling data types that differ in size from the machine register. (A notable exception is the 32-bit int on 64-bit archs.)
Adding in the penalties for non-word-aligned memory access (do not ask me why CPUs do that), use of char should be limited to the cases when one really needs a char or memory consumption is a concern.
It is not possible to do portably.
sscanf will never write a single byte when reading an integer.
If you pass a pointer to a single allocated byte as a pointer to int, the write will run out of bounds. This may happen to be okay due to default alignment and padding, but you should not rely on that.
Create a temporary instead. This way you will also be able to range-check the value at run time.
Probably the easiest way would be to read the number into a temporary int and store it only if it is within the required bounds. (If you want the roll-over behavior, you would do something like unsigned char result = tempInt & 0xFF;)
It's dangerous. sscanf will write an int-sized value over the character-sized variable. In your example (most probably) nothing bad will happen unless you reorganize your stack variables (sscanf will partially overwrite len when writing the int-sized "dest", but it then returns the correct "len" and overwrites it with the "right" value).
Instead, do the correct thing rather than relying on the compiler's mood:
char source[] = "x32";
char separator;
unsigned char dest;
int temp;
int len;

len = sscanf(source, "%c%d", &separator, &temp);
// validate and proceed...
if (temp >= YOUR_MIN_VALUE && temp <= YOUR_MAX_VALUE) {
    dest = (unsigned char)temp;
} else {
    // validation failed
}