I'm trying to fill zero regions in a matrix using memset() in this way:
unsigned short *ptr;
for(int i=0; i < nRows; ++i)
{
ptr = DepthMat.ptr<unsigned short>(i); /* OCV matrix of uint16 values */
for(int j=0; j<nCols; ++j)
{
uint n=0;
if(ptr[j] == 0) /* zero region found */
{
d_val = ptr[j-1]; /* saves last non-zero value */
while(ptr[j+n] == 0)
{ /* looks for non zero */
++n;
}
d_val = (d_val < ptr[j+n] ? d_val : ptr[j+n]);
memset( ptr+j, d_val, n*sizeof( ptr[0]) );
j += n;
}
}
}
I look for sequences of zero, then I store the positions (ptr+j-1 and ptr+j+n) and the values of the zero regions boundaries, and finally I use memset() to replace the zeros with d_val.
The problem is that when I check the values stored they don't match with d_val, for example, I put the value '222' but I get '57054'.
Any clue?
The value argument to memset() is only a single byte, even though the type for the argument is int.
The manual page describes the function as:
memset - fill memory with a constant byte
So, no more than the least-significant 8 bits of d_val will be written to memory. Since you're treating the memory as an array of short, you get "mangled" values that consist of the same byte repeated through the bytes of the short.
In ... short, don't do this; use a for loop to do a repeated write of actual shorts.
memset writes one char at a time, while you're accessing them as short. 57054 = 222*256+222
Related
Let's say we reserved 32 bytes for an array in C, but it turns out we are only using 24 bytes, how can I reduce the reserved memory that is not currently in use? Is this even possible?
I am not using malloc, but I could.
This is the working minimal reproducible example:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main() {
FILE *input;
input = fopen("1.txt", "r");
int arr[4][300];
if (input == NULL) {
printf( "ERROR. Coundn't open the file.\n" ) ;
} else {
for (int i = 0; i < 300; i++) {
fscanf(input, "%d %d %d %d", &arr[0][i], &arr[1][i],
&arr[2][i], &arr[3][i]);
}
fclose(input);
int sze = 0;
for (int i = 0; i < 300; i++) {
for (int j = 0; j < 4; j++) {
if (abs(arr[j][i]) >= 1)
sze += log10(abs(arr[j][i])) + 1;
else
sze++;
if (arr[j][i] < 0)
sze++;
}
}
printf("Size %d kB\n", sze);
}
return 0;
}
Clarification: What I need is to reduce the memory used by each element in the array, if possible. Let's say I have the number 45 stored, it doesn't take up all 4 bytes of an int, so I need to reduce the memory allocated to only 1 byte. As I said, I am not currently using malloc, but I could switch to malloc, if there's a way to what I want to.
If you want to reduce the used space for a value, you need to assign it to an object of different type.
In your example, you start with an int that probably uses 4 bytes on your system. Then you store the value "45" in it, which needs just one byte. Types with size of 1 byte are for example int8_t or signed char.
First, you cannot change the type of a variable, once it is defined. You may store it in another variable.
Second, all elements of an array have to be of the same type.
So the answer for the given example is simply "No."
If you want to "compress" the stored values, you need to roll your own type. You can invent some kind of "vector" that stores each value in as few bytes as necessary. You will need to store the size of each value, too. And you will need to implement access function to each vector element. This is not a simple task.
I am trying to write a function which uses only pointer-based logic to search through a region of memory (baseAddr) for a certain byte (Byte), counts the occurrences, and stores the offsets in an array (Offsets). Here is what I have so far:
uint8_t findBytes( uint16_t* const Offsets, const uint8_t* const baseAddr,
uint32_t length, uint8_t Byte) {
uint8_t bytesRead = 0;
int count = 0;
while (bytesRead < length) {
if ((baseAddr + bytesRead) == Byte) {
*(Offsets + count) = bytesRead;
count++;
}
bytesRead++;
}
return count;
}
For whatever reason, I always get stuck in an infinite loop if I use (bytesRead < length). Length is always 1024. What exactly is wrong with this code? I am still learning the proper uses of pointers, so I'm 99% sure it's to do with that.
bytesRead is a uint8_t, which has a maximum value of 255. If you have a uint32_t length, your offset should be the same data type. And since potentially every byte could be a match, your count (return value) also needs to be the same type, as does the offsets array.
More problems:
if ((baseAddr + bytesRead) == Byte)
You're not dereferencing the pointer here, you're only checking its value. Your compiler should be issuing a warning here for the type mismatch.
Try:
if (*(baseAddr + bytesRead) == Byte)
Finally, you need to make sure offsets points at enough memory. Again, potentially *every byte could match, which means offsets should be the same length (in elements) as the input data.
I want to read the values of the memory locations of the entire program flash memory of an MCU, in particular, the CC2538 on the OpenMote-CC2538. The read values are then computed into, currently, a large sum of all the values.
At this moment, I have the following code working to traverse the memory and get the values
uint64_t readMemory() {
unsigned char * bytes = (char *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i++) {
amount += bytes[i];
}
return amount;
}
uint64_t readFlashMemory() {
unsigned int * bytes = (int *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i+=4) {
amount += FlashGet(bytes);
bytes++;
}
return amount;
}
address 0x200000 and its size is 0x0007FFD4. The first function works with a char and goes to each address one by one, while the second one uses an existing function FlashGet(uint32_t) from the flash.c file, which is a direct access to a register (HWREG).
FlashGet requires a uint32_t address and returns a uint32_t value, as such it has a length of 4 and the address should be moved with 4 in the loop .The first function uses char for the addressing, which is a length of 1 and so the address should also move by 1 in the loop. Am I correct in these statements? If so, am I executing them correctly? For the second function, incrementing the pointer with 1 should move it with 4 due to it being of type uint32_t (similar to int).
However, the functions return a different value.
The first one returns: 674426297757
The second one returns: 8213668631160
As both functions should be doing the same, one or both must be incorrect and is not reading the entire program flash memory.
How can I fix both functions? Is there a better or easier way to read the entire memory when you have the starting address and size?
Consider you have a 4-byte flash memory with content
00 01 02 03
Adding by byte values will give you 0x000000000000006
Adding by 32-bit int values will give you 0x0000000003020100 assuming little-endian.
I have an issue with passing data between arrays for processing that I can't seem to iron out. (I'm running the code on a Nios II processor)
HAL Type Definitions:
alt_u8 : Unsigned 8-bit integer.
alt_u32 : Unsigned 32-bit integer.
The core in my FPGA takes in a 128 bits at a time for data processing. I have this working in my original code by passing 4 x 32 bit unsigned int to the function:
alt_u32 load[4] = {0x10101010, 0x10101010, 0x10101010, 0x10101010};
The function processes this data and using another array I retrieve the info.
data_setload(&context,&load); //load data
data_process(&context); //process
memcpy(resultdata,context.result,4*sizeof(unsigned int));
for(i=0; i<4 ; i++){
printf("received 0x%X \n",resultdata[i]); //print to screen
}
Above works perfectly, but when I try combine it with the second part it does not work.
I have a buffer used to store data:
alt_u8 rbuf[512];
When the data buffer becomes full I'm trying to transfer the contents of 'rbuf' to the array 'load'. The main problem is load[4] takes 4 by 32 bit unsigned int for processing. So I want to 'fill up' these 4 by 32 bit unsigned int with data from rbuf, process the data and save the result to an array. Then loop again and fill the array load[4] with the next set of data (from rbuf) and continue until rbuf is empty. (and pad with zeros if necessary)
alt_u8 rbuf[512];
alt_u8 store[512];
alt_u32 resultdata[512];
alt_u32 *reg;
int d, k, j;
for (j=0; j<512; j++){
read_byte(&ch); //gets data
rbuf[j]=ch; //stores to array rbuf
}
printf(" rbuf is full \n");
memcpy(store,rbuf,512*sizeof(alt_u8)); //store gets the value in rbuf.
for(k=0;k<16;k++) //for loop used take in 4 chars to one unsigned 32 bit int
{
for(d=0;d<4;d++) //store 4 chars into an one 32 bit unsigned int
{
*reg = (*reg<<8 | store[d]) ;
}
reg =+1; //increment pointer to next address location(not working properly)
} //loop back
reg = 0; //set pointer address back to 0
for(j=0;j<16;j++) //trying to process data from here
{
memcpy(load,reg,4*sizeof(alt_u32)); //copy first 4 locations from 'reg' to 'load'
data_setload(&context,&load); //pass 'load' to function
data_process(&context); //process 128 bits
memcpy(resultdata,context.result,4*sizeof(alt_u32)); //results copied to 'resultdata'
*reg = *reg + 4; //increment pointer address by 4?
*resultdata = *resultdata+4; //increment resultdata address by 4 and loop again
}
/** need to put data back in char form for displaying***/
for(k=0;k<16;k++) //for loop used take chars from 32 unsigned int
{
for(d=4;d>=0;d--) //loads 4 chars FROM A 32 unsigned int
{
store[d] = *resultdata;
*resultdata = *resultdata>>8;
}
resultdata =+1; //increment pointer next address location
}
for(d=0; d<512 ; d++){
printf("received 0x%X ",store[d]);
The end goal is to take:
Array_A of unsigned 8 bit copy it into an Array_B[4] of unsigned 32 bit >> Process the Array_B[4] with my HDL code. It requires the input to be 128bits.
Then loop back and take the next 128 bits and process them.
reg is defined but not initialized, so it will be a null pointer and you are triying to write a value to it (*reg assigns value, reg assigns address).
Also, the k-d loop is wrong. If you got reg initialized correctly, then a really easy way to do that is:
for(k=0;k<16;k += 4) //for loop used take chars from 32 unsigned int
{
*rbuf = *((alt_u32*)&store[k]);
rbuf++;
}
that loop will take the four intengers stored as bytes in the beginning of store and copies them to where rbuf is pointing.
I'm nearly shure that's not what you want to achieve, but is what your code was trying to do. If you want to fully copy the store to where rbuf points then you can do this:
for(k=0;k<512;k += 4) //for loop used take chars from 32 unsigned int
{
*rbuf = *((alt_u32*)&store[k]);
rbuf++;
}
That will copy all the values stored at store to rbuf.
Also, a better, faster, and cleaner way:
memcpy(rbuf, &store, 512);
rbuf += 512 / sizeof(alt_u32);
Finally, if you just want to fill load with the first four integers, then you can do that:
for(k = 0; k < 4; k++)
{
load[k] = *((alt_u32*)&rbuf[k * 4]);
}
or
memcpy(&load, &rbuf, 4 * sizeof(alt_u32));
then you don't need store for noting.
Finally, here is a full rewriten function with the minimum memory usage and best performance:
alt_u8 rbuf[512];
alt_u32 resultdata[128]; //fixed its size to 128, (512 / sizeof(alt_u32))
int j;
//Do the loop to load data in rbuf
for (j=0; j<512; j++)
read_byte(&rbuf[j]);
printf(" rbuf is full \n");
//Loop through rbuf in 4 * 32 bits per iteration (4*4 bytes)
for(j = 0; j < 512; j+= sizeof(alt_u32) * 4)
{
data_setload(&context, (alt_u32*)&rbuf[j]); //I assume this function expects an alt_u32 pointer to 4 alt_u32 values
data_process(&context);
memcpy(&resultdata[j / sizeof(alt_u32)], context.result, sizeof(alt_u32) * 4);//I assume context.result is a pointer, if not then add & before it
}
//Print received data
for(j=0; j<512 ; j++){
printf("received 0x%X ",rbuf[d]);
I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int num_elements = 16;
int num_bytes = num_elements * sizeof(int);
int *device_array = 0;
int *host_array = 0;
// malloc host memory
host_array = (int*)malloc(num_bytes);
// cudaMalloc device memory
cudaMalloc((void**)&device_array, num_bytes);
// Constant out the device array with cudaMemset
cudaMemset(device_array, 3, num_bytes);
// copy the contents of the device array to the host
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
// print out the result element by element
for(int i = 0; i < num_elements; ++i)
printf("%i\n", *(host_array+i));
// use free to deallocate the host array
free(host_array);
// use cudaFree to deallocate the device array
cudaFree(device_array);
return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call then I get correct behaviour. i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0
I'm no cuda expert but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cuda memset (to pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended I think you have to write a loop and initialize each array element.
As others have pointed out, cudaMesetworks like the standard C memset- it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
size_t stride = blockDim.x * gridDim.x;
for(int i=tid; i<count; i+=stride) {
x[i] = value;
}
}
which could be launched with enough blocks to cover the number of MP in your GPU, and each thread will do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.
memset sets bytes, and integer is 4 bytes.. so what you get is 50529027 decimal, which is 0x3030303 in hex... In other words - you are using it wrong, and it has nothing to do with CUDA.
This is a classic memset shortcoming; it works only on data type with 8-bit size i.e char. This means it sets (probably) 3 to every 8-bits of the total memory. You can confirm this by a simple C++ code:
int main ()
{
int x=16;
size_t bytes = x*sizeof(int);
int *M = (int*)malloc(bytes);
memset(M,3,bytes);
for (int i = 0; i < x; ++i) {
printf("%d\n", M[i]);
}
return 0;
}
The only case in which memset works on all data types is when you set it to 0. (it sets every byte to 0 and hence all data to 0). If you change the data type to char, you'll see the desired output. cudaMemset is ditto copy of memset with the only difference that it takes a GPU pointer in input.
So memset or cudaMemset probably sets every byte to the integer value (in your case 3) of whole memory space defined by the third argument regardless of the datatype.
Tip:
Google: 50529027 in binary and you'll get the answer :)