memset sets some elements to *almost* zero or nan - c

I have the line memset(C, 0, N*M);, where C is a matrix of doubles. (I am using WSL.)
However if I look in gdb, certain elements in the matrix are set to either -nan(0xffffffffffff8) and others are set to for example 9.0096750001652956e-314.
The first one doesn't give any errors but a += doesn't seem to change anything (or at the very least doesn't seem to make the nan thing disappear), while the second is an issue if the element is not changed or only has += 0, as the comparison if (0 == C[i][j]) then fails.
If I set the values to 0 manually, then these issues do not arise at all.
Is this a WSL thing, or is there something about memset that I do not understand?

You do not fully initialize the matrix: memset() expects a number of bytes. Assuming the matrix layout is linear, either 1D or 2D, you should clear sizeof(double) * N * M bytes.
If your matrix is defined as a 2D array, you could write:
#define N 10
#define M 20
double C[N][M];
memset(C, 0, sizeof C);
If the matrix is received as a function argument, you actually get a pointer, so you must be more careful:
void clear_matrix(double C[N][M]) {
memset(C, 0, sizeof(*C) * N);
}
or possibly more readable:
void clear_matrix(double C[N][M]) {
memset(C, 0, sizeof(C[0][0]) * N * M);
}
or simply, as suggested by Lundin, but likely to break if the matrix element type changes:
void clear_matrix(double C[N][M]) {
memset(C, 0, sizeof(double[N][M]));
}
Note however that memset() will clear the matrix data to all bits zero, which sets the double values to +0.0 if the system uses IEEE-754 representation, but is not fully portable. A portable version would use a nested loop and a good compiler will generate the same memset call or inline code if appropriate for the target system:
#include <stddef.h>
#define N 10
#define M 20
void clear_matrix(double C[N][M]) {
for (size_t i = 0; i < N; i++) {
for (size_t j = 0; j < M; j++) {
C[i][j] = 0.0;
}
}
}

The M*N is the number of items (doubles) but memset expects the number of bytes. For an array double C[M][N], you need to do memset(C, 0, sizeof(C));.

Related

Why do I have a runtime #2 failure in C when I have enough space and there isn't much data in the array?

I'm writing this code in C for some offline games but when I run this code, it says "runtime failure #2" and "stack around the variable has corrupted". I searched the internet and saw some answers but I think there's nothing wrong with this.
#include <stdio.h>
int main(void) {
int a[16];
int player = 32;
for (int i = 0; i < sizeof(a); i++) {
if (player+1 == i) {
a[i] = 254;
}
else {
a[i] = 32;
}
}
printf("%d", a[15]);
return 0;
}
Your loop runs from 0 to sizeof(a), and sizeof(a) is the size in bytes of your array.
Each int is (typically) 4-bytes, and the total size of the array is 64-bytes. So variable i goes from 0 to 63.
But the valid indices of the array are only 0-15, because the array was declared [16].
The standard way to iterate over an array like this is:
#define count_of_array(x) (sizeof(x) / sizeof(*x))
for (int i = 0; i < count_of_array(a); i++) { ... }
The count_of_array macro calculates the number of elements in the array by taking the total size of the array, and dividing by the size of one element.
In your example, it would be (64 / 4) == 16.
sizeof(a) is not the size of a, but rather how many bytes a consumes.
a has 16 ints. The size of int depends on the implementation. Many C implementations use a 4-byte int, but some use a 2-byte int, so sizeof(a) == 64 or sizeof(a) == 32. Either way, that's not what you want.
You define int a[16];, so the size of a is 16.
So, change your for loop into:
for (int i = 0; i < 16; i++)
You're indexing past the end of the array, touching memory that doesn't belong to it. sizeof(a) returns 64 (depending on the C implementation), which is the total number of bytes your int array takes up.
There are good reasons for trying not to statically declare the number of iterations in a loop when iterating over an array.
For example, you might realloc memory (if you've declared the array using malloc) in order to grow or shrink the array, thus making it harder to keep track of the size of the array at any given point. Or maybe the size of the array depends on user input. Or something else altogether.
There's no good reason to avoid saying for (int i = 0; i < 16; i++) in this particular case, though. What I would do is declare const int foo = 16; and then use foo instead of any number, both in the array declaration and the for loop, so that if you ever need to change it, you only need to change it in one place. Else, if you really want to use sizeof() (maybe because one of the reasons above) you should divide the return value of sizeof(array) by the return value of sizeof(type of array). For example:
#include <stdio.h>
const int ARRAY_SIZE = 30;
int main(void)
{
int a[ARRAY_SIZE];
for(int i = 0; i < sizeof(a) / sizeof(int); i++)
a[i] = 100;
// I'd use for(int i = 0; i < ARRAY_SIZE; i++) though
}

Why am I getting a floating point exception?

This program worked fine when I manually iterated over 5 individual variables, but when I replaced them with these arrays and a for loop, I started getting floating point exceptions. I have tried debugging, but I can't find where the error comes from.
#include <stdio.h>
int main(void) {
long int secIns;
int multiplicadors[4] = {60, 60, 24, 7};
int variables[5];
int i;
printf("insereix un aquantitat entera de segons: \n");
scanf("%ld", &secIns);
variables[0] = secIns;
for (i = 1; i < sizeof variables; i++) {
variables[i] = variables[i - 1]/multiplicadors[i - 1];
variables[i - 1] -= variables[i]*multiplicadors[i - 1];
}
printf("\n%ld segons són %d setmanes %d dies %d hores %d minuts %d segons\n", secIns, variables[4], variables[3], variables[2], variables[1], variables[0]);
return 0;
}
The problem is you're iterating past the ends of your arrays. The reason is that your sizeof expression isn't what you want. sizeof returns the size in bytes, not the number of elements.
To fix it, change the loop to:
for (i = 1; i < sizeof(variables)/sizeof(*variables); i++) {
On an unrelated note, you might consider changing secIns from long int to int, since it's being assigned to an element of an int array, so the added precision isn't really helping.
Consider this line of code:
for (i = 1; i < sizeof variables; i++) {
sizeof isn't doing what you think it's doing. You've declared an array of 5 ints. In this case, ints are 32-bit, which means they each use 4 bytes of memory. If you print the output of sizeof variables you'll get 20 because 4 * 5 = 20.
You'd need to divide the sizeof variables by the size of its first element.
As mentioned before, sizeof returns the size of the array in bytes.
This is unlike Java's .length, which returns the number of elements in the array. C requires a little more awareness of how many bytes each type occupies.
https://www.geeksforgeeks.org/data-types-in-c/
This link tells you a bit more about data types and the memory (bytes) they take up.
You could also write sizeof(yourArrayName) / sizeof(int). sizeof(datatype) returns the number of bytes the data type takes up.
sizeof will give the size (in bytes) of the variables and will yield different results depending on the data type.
Try:
for (i = 1; i < 5; i++) {
...
}

Using memset for integer array in C

char str[] = "beautiful earth";
memset(str, '*', 6);
printf("%s", str);
Output:
******ful earth
Like the above use of memset, can we initialize only a few integer array index values to 1 as given below?
int arr[15];
memset(arr, 1, 6);
No, you cannot use memset() like this. The manpage says (emphasis mine):
The memset() function fills the first n bytes of the memory area pointed to by s with the constant byte c.
Since an int is usually 4 bytes, this won't cut it.
If you (incorrectly!!) try to do this:
int arr[15];
memset(arr, 1, 6*sizeof(int)); //wrong!
then the first 6 ints in the array will actually be set to 0x01010101 = 16843009.
The only time it's really acceptable to write over a "blob" of data with non-byte datatype(s) is memset(thing, 0, sizeof(thing)); to "zero-out" the whole structure/array. This works because, on virtually all common platforms, NULL, 0x00000000, and 0.0 are all represented as all-zero bytes.
The solution is to use a for loop and set it yourself:
int arr[15];
int i;
for (i=0; i<6; ++i) // Set the first 6 elements in the array
arr[i] = 1; // to the value 1.
Short answer, NO.
Long answer, memset sets bytes and works for characters because they are single bytes, but integers are not.
On Linux, OS X, and other UNIX-like operating systems, wchar_t is 32 bits, so you can use wmemset() instead of memset().
#include<wchar.h>
...
int arr[15];
wmemset((wchar_t *)arr, 1, 6); /* assumes sizeof(wchar_t) == sizeof(int) */
Note that wchar_t on MS-Windows is 16 bits so this trick may not work.
The third argument of memset is a size in bytes, so to cover the whole array you would pass the total byte size of arr:
memset(arr, 1, sizeof(arr));
However, you probably want to set every element of arr to the value 1, and for that you are better off using a loop:
for (i = 0; i < sizeof(arr)/sizeof(arr[0]); i++) {
arr[i] = 1;
}
This is because memset() writes 1 into each byte, which is not what you expect.
Since nobody mentioned it...
Although you cannot initialize the integers with value 1 using memset, you can initialize them with value -1 and simply change your logic to work with negative values instead.
For example, to initialize the first 6 numbers of your array with -1, you would do
memset(arr, -1, 6 * sizeof(int));
Furthermore, if you only need to do this initialization once, you can actually declare the array to start with values 1 from compile time.
int arr[15] = {1,1,1,1,1,1};
Actually, it is possible with memset_pattern4 (a non-standard function available on macOS), which sets 4 bytes at a time. It takes a pointer to the 4-byte pattern:
memset_pattern4(your_array, &your_number, sizeof(your_array));
No, you can't [portably] use memset for that purpose, unless the desired target value is 0. memset treats the target memory region as an array of bytes, not an array of ints.
A fairly popular hack for filling a memory region with a repetitive pattern is actually based on memcpy. It critically relies on the expectation that memcpy copies data in forward direction
int arr[15];
arr[0] = 1;
memcpy(&arr[1], &arr[0], sizeof arr - sizeof *arr);
This is, of course, a pretty ugly hack, since the behavior of standard memcpy is undefined when the source and destination memory regions overlap. You can write your own version of memcpy though, making sure it copies data in forward direction, and use in the above fashion. But it is not really worth it. Just use a simple cycle to set the elements of your array to the desired value.
memset sets values one byte at a time, but integers usually occupy 4 bytes, so it won't work and you'll get unexpected values.
It's mostly used when you are working with char and string types.
You cannot use memset to set your array to all 1s, because memset works on bytes and sets every byte to 1.
memset(hash, 1, cnt);
So when read back as an int, each element will show 16843009 = 0x01010101 = 1000000010000000100000001 (binary),
not 0x00000001.
But if your requirement is only for bool or binary values, then you can use the bool type from C99:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h> //Use C99 standard for C language which supports bool variables
int main()
{
int i, cnt = 5;
bool *hash = NULL;
hash = malloc(cnt);
memset(hash, 1, cnt);
printf("Hello, World!\n");
for(i=0; i<cnt; i++)
printf("%d ", hash[i]);
return 0;
}
Output:
Hello, World!
1 1 1 1 1
The following program shows that we can initialize the array using memset() with -1 and 0 only
#include<stdio.h>
#include<string.h>
void printArray(int arr[], int len)
{
int i=0;
for(i=0; i<len; i++)
{
printf("%d ", arr[i]);
}
puts("");
}
int main()
{
int arrLen = 15;
int totalNoOfElementsToBeInitialized = 6;
int arr[arrLen];
printArray(arr, arrLen);
memset(arr, -1, totalNoOfElementsToBeInitialized*sizeof(arr[0]));
printArray(arr, arrLen);
memset(arr, 0, totalNoOfElementsToBeInitialized*sizeof(arr[0]));
printArray(arr, arrLen);
memset(arr, 1, totalNoOfElementsToBeInitialized*sizeof(arr[0]));
printArray(arr, arrLen);
memset(arr, 2, totalNoOfElementsToBeInitialized*sizeof(arr[0]));
printArray(arr, arrLen);
memset(arr, -2, totalNoOfElementsToBeInitialized*sizeof(arr[0]));
printArray(arr, arrLen);
return 0;
}

Allocate contiguous memory

I'm trying to allocate a large space of contiguous memory in C and print this out to the user. My strategy for doing this is to create two pointers (one a pointer to double, one a pointer to pointer to double), malloc one of them to the entire size (m * n) in this case the pointer to pointer to double. Then malloc the second one to the size of m. The last step will be to iterate through the size of m and perform pointer arithmetic that would ensure the addresses of the doubles in the large array will be stored in contiguous memory. Here is my code. But when I print out the address it doesn't seem to be in contiguous (or in any sort of order). How do i print out the memory addresses of the doubles (all of them are of value 0.0) correctly?
/* correct solution, with correct formatting */
/*The total number of bytes allocated was: 4
0x7fd5e1c038c0 - 1
0x7fd5e1c038c8 - 2
0x7fd5e1c038d0 - 3
0x7fd5e1c038d8 - 4*/
double **dmatrix(size_t m, size_t n);
int main(int argc, char const *argv[])
{
int m,n,i;
double ** f;
m = n = 2;
i = 0;
f = dmatrix(sizeof(m), sizeof(n));
printf("%s %d\n", "The total number of bytes allocated was: ", m * n);
for (i=0;i<n*m;++i) {
printf("%p - %d\n ", &f[i], i + 1);
}
return 0;
}
double **dmatrix(size_t m, size_t n) {
double ** ptr1 = (double **)malloc(sizeof(double *) * m * n);
double * ptr2 = (double *)malloc(sizeof(double) * m);
int i;
for (i = 0; i < n; i++){
ptr1[i] = ptr2+m*i;
}
return ptr1;
}
Remember that memory is just memory. Sounds trite, but so many people seem to think of memory allocation and memory management in C as being some magic-voodoo. It isn't. At the end of the day you allocate whatever memory you need, and free it when you're done.
So start with the most basic question: If you had a need for 'n' double values, how would you allocate them?
double *d1d = calloc(n, sizeof(double));
// ... use d1d like an array (d1d[0] = 100.00, etc.) ...
free(d1d);
Simple enough. Next question, in two parts, where the first part has nothing to do with memory allocation (yet):
How many double values are in a 2D array that is m*n in size?
How can we allocate enough memory to hold them all.
Answers:
There are m*n doubles in a m*n 2D-matrix of doubles
Allocate enough memory to hold (m*n) doubles.
Seems simple enough:
size_t m=10;
size_t n=20;
double *d2d = calloc(m*n, sizeof(double));
But how do we access the actual elements? A little math is in order. Knowing m and n, you can simply do this:
size_t i = 3; // value you want in the major index (0..(m-1)).
size_t j = 4; // value you want in the minor index (0..(n-1)).
d2d[i*n+j] = 100.0;
Is there a simpler way to do this? In standard C, yes; in C++, no. Standard C supports a very handy capability that generates the proper code to declare dynamically-sized indexable arrays:
size_t m=10;
size_t n=20;
double (*d2d)[n] = calloc(m, sizeof(*d2d));
Can't stress this enough: Standard C supports this, C++ does NOT. If you're using C++ you may want to write an object class to do this all for you anyway, so it won't be mentioned beyond that.
So what does the above actually do? Well first, it should be obvious we are still allocating the same amount of memory we were allocating before. That is, m*n elements, each sizeof(double) large. But you're probably asking yourself, "What is with that variable declaration?" That needs a little explaining.
There is a clear and present difference between this:
double *ptrs[n]; // declares an array of `n` pointers to doubles.
and this:
double (*ptr)[n]; // declares a pointer to an array of `n` doubles.
The compiler is now aware of how wide each row is (n doubles in each row), so we can now reference elements in the array using two indexes:
size_t m=10;
size_t n=20;
double (*d2d)[n] = calloc(m, sizeof(*d2d));
d2d[2][5] = 100.0; // does the 2*n+5 math for you.
free(d2d);
Can we extend this to 3D? Of course, the math starts looking a little weird, but it is still just offset calculations into a big'ol'block'o'ram. First the "do-your-own-math" way, indexing with [i,j,k]:
size_t l=10;
size_t m=20;
size_t n=30;
double *d3d = calloc(l*m*n, sizeof(double));
size_t i=3;
size_t j=4;
size_t k=5;
d3d[i*m*n + j*n + k] = 100.0;
free(d3d);
You need to stare at the math in that for a minute to really gel on how it computes where the double value in that big block of ram actually is. Each step in i skips a whole m*n table, and each step in j skips one row of n values. Using the above dimensions and desired indexes, the "raw" index is:
i*m*n = 3*20*30 = 1800
j*n = 4*30 = 120
k = 5 = 5
======================
i*m*n+j*n+k = 1925
So we're hitting the element at index 1925 in that big linear block. Let's do another. What about [0,1,2]?
i*m*n = 0*20*30 = 0
j*n = 1*30 = 30
k = 2 = 2
======================
i*m*n+j*n+k = 32
I.e. the element at index 32 in the linear array.
It should be obvious by now that so long as you stay within the self-prescribed bounds of your array, i:[0..(l-1)], j:[0..(m-1)], and k:[0..(n-1)] any valid index trio will locate a unique value in the linear array that no other valid trio will also locate.
Finally, we use the same array pointer declaration like we did before with a 2D array, but extend it to 3D:
size_t l=10;
size_t m=20;
size_t n=30;
double (*d3d)[m][n] = calloc(l, sizeof(*d3d));
d3d[3][4][5] = 100.0;
free(d3d);
Again, all this really does is the same math we were doing before by hand, but letting the compiler do it for us.
I realize it may be a bit much to wrap your head around, but it is important. If it is paramount you have contiguous memory matrices (like feeding a matrix to a graphics rendering library like OpenGL, etc), you can do it relatively painlessly using the above techniques.
Finally, you might wonder why anyone would do the whole pointer arrays to pointer arrays to pointer arrays to values thing in the first place if you can do it like this. There are a lot of reasons. Suppose you're replacing rows: swapping a pointer is easy; copying an entire row is expensive. Suppose you're replacing an entire table-dimension (m*n) in your 3D array (l*m*n): even more so, swapping a pointer is easy; copying an entire m*n table is expensive. And the not-so-obvious answer: what if the row widths need to be independent from row to row (i.e. row0 can be 5 elements, row1 can be 6 elements)? A fixed l*m*n allocation simply doesn't work then.
Best of luck.
Never mind, I figured it out.
/* The total number of bytes allocated was: 8
0x7fb35ac038c0 - 1
0x7fb35ac038c8 - 2
0x7fb35ac038d0 - 3
0x7fb35ac038d8 - 4
0x7fb35ac038e0 - 5
0x7fb35ac038e8 - 6
0x7fb35ac038f0 - 7
0x7fb35ac038f8 - 8 */
double ***d3darr(size_t l, size_t m, size_t n);
int main(int argc, char const *argv[])
{
int m,n,l,i;
double *** f;
m = n = l = 10; i = 0;
f = d3darr(sizeof(l), sizeof(m), sizeof(n));
printf("%s %d\n", "The total number of bytes allocated was: ", m * n * l);
for (i=0;i<n*m*l;++i) {
printf("%p - %d\n ", &f[i], i + 1);
}
return 0;
}
double ***d3darr(size_t l, size_t m, size_t n){
double *** ptr1 = (double ***)malloc(sizeof(double **) * m * n * l);
double ** ptr2 = (double **)malloc(sizeof(double *) * m * n);
double * ptr3 = (double *)malloc(sizeof(double) * m);
int i, j;
for (i = 0; i < l; ++i) {
ptr1[i] = ptr2+m*n*i;
for (j = 0; j < l; ++j){
ptr2[i] = ptr3+j*n;
}
}
return ptr1;
}

Strange behaviour of an elementary CUDA code.

I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int num_elements = 16;
int num_bytes = num_elements * sizeof(int);
int *device_array = 0;
int *host_array = 0;
// malloc host memory
host_array = (int*)malloc(num_bytes);
// cudaMalloc device memory
cudaMalloc((void**)&device_array, num_bytes);
// Constant out the device array with cudaMemset
cudaMemset(device_array, 3, num_bytes);
// copy the contents of the device array to the host
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
// print out the result element by element
for(int i = 0; i < num_elements; ++i)
printf("%i\n", *(host_array+i));
// use free to deallocate the host array
free(host_array);
// use cudaFree to deallocate the device array
cudaFree(device_array);
return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call then I get correct behaviour. i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0
I'm no cuda expert but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cuda memset (to pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended I think you have to write a loop and initialize each array element.
As others have pointed out, cudaMemset works like the standard C memset: it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
size_t stride = blockDim.x * gridDim.x;
for(size_t i=tid; i<count; i+=stride) {
x[i] = value;
}
}
which could be launched with enough blocks to cover the number of MP in your GPU, and each thread will do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.
memset sets bytes, and an int is 4 bytes, so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words, you are using it wrong, and it has nothing to do with CUDA.
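This is a classic memset limitation: it fills memory one byte at a time, so it only produces the value you pass for 1-byte types such as char. Here it writes 3 into every byte of the memory. You can confirm this with a simple C++ program:
#include <cstdio>
#include <cstdlib>
#include <cstring>
int main ()
{
int x=16;
size_t bytes = x*sizeof(int);
int *M = (int*)malloc(bytes);
memset(M,3,bytes); // writes the byte 0x03 into every byte
for (int i = 0; i < x; ++i) {
printf("%d\n", M[i]); // prints 50529027 (0x03030303) for a 4-byte int
}
free(M);
return 0;
}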
The only case in which memset works for all data types is when you set the value to 0 (it sets every byte to 0 and hence all data to 0). If you change the data type to char, you'll see the desired output. cudaMemset behaves just like memset, the only difference being that it takes a GPU pointer as input.
So memset or cudaMemset probably sets every byte to the integer value (in your case 3) of whole memory space defined by the third argument regardless of the datatype.
Tip:
Google: 50529027 in binary and you'll get the answer :)
