I have a program which allocates a 32-bit int and then tries to read 4 bytes from a socket into the int using read(2).
Sometimes the read is incomplete and returns having read, say, 2 bytes. Is there any way of recovering from this? I suppose I have to produce a pointer halfway into the int to be able to perform another read.
How are you supposed to handle this situation? I can imagine a couple of ugly ways, but no elegant one.
You will first need to make sure you have read all 4 bytes. You can do this with a function similar to this one (slightly modified):
#include <sys/types.h>
#include <unistd.h>

int readall(int s, char *buf, int *len)
{
    int total = 0;        // how many bytes we've read
    int bytesleft = *len; // how many we have left to read
    ssize_t n = -1;

    while (total < *len) {
        n = read(s, buf + total, bytesleft);
        if (n <= 0) { break; }
        total += n;
        bytesleft -= n;
    }

    *len = total; // return number actually read here

    return (n <= 0) ? -1 : 0; // return -1 on failure, 0 on success
}
And afterwards you can assemble an integer using these four bytes.
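For example, a minimal sketch using the readall() above (read_int32 is a hypothetical helper name, and it assumes the sender transmitted the integer in network byte order; ntohl on the copied bytes would work equally well):

#include <stdint.h>

// Read exactly 4 bytes into a char array, then assemble the int afterwards.
// Returns 0 on success, -1 on failure.
int read_int32(int s, uint32_t *out)
{
    unsigned char buf[4];
    int len = 4;

    if (readall(s, (char *)buf, &len) == -1 || len != 4)
        return -1;

    // Shifts operate on values, not byte representations, so this
    // works regardless of the host's endianness.
    *out = ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
           (uint32_t)buf[3];
    return 0;
}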
Disclaimer: the following is bad code. Don't do it.
int value = 0;
int bytes = 0;
// fd is the socket descriptor
while ((bytes += read(fd, ((char*)&value) + bytes, 4 - bytes)) < 4);
So with the information here and at What's the correct way to add 1 byte to a pointer in C/C++? I concluded that:
1. Addressing into an int can cause alignment problems on some architectures.
2. You can try to circumvent this by using type punning (a union between an int and a char[]), but in practice this technique seems no more reliable than 1.
3. The only truly portable way of doing byte pointer arithmetic is using a char* into a char[]. Consequently the correct way of handling this situation is to read all the bytes into a char array and construct the int afterwards.
4. Reconstruction into an int can be done with ntohl or bit-shifting. Bit-shifting is portable regardless of host endianness, since it is defined in terms of multiplication on values rather than on byte layout.
So, where does that leave us? Well, the portable option is to read into a char[] and reconstruct with bit-shifting or ntohl. The non-portable way is to point a char* into your int as @Patrick87 suggested.
I went down the pragmatic road and just created a char* into my int. Worked fine on my target platform.
Related
I have built a Winsock2 server. Part of that program has a function that receives data from clients. Originally the receive function I built would peek at the incoming data and determine if there was any additional data to be read before allowing recv() to pull the data from the buffer. This worked fine in the beginning of the project but I am now working on improving performance.
Here is a portion of the code I've written to eliminate the use of peek:
unsigned char recv_buffer[4096];
unsigned char *pComplete_buffer = malloc(sizeof(recv_buffer) * sizeof(unsigned char*));
int offset = 0;
int i = 0;
...
for (; i < sizeof(recv_buffer); i++) {
    if (recv_buffer[i] == NULL) {
        break;
    }
    pComplete_buffer[offset] = recv_buffer[i];
    offset++;
}
...
...
This code would work great, but the problem is that NULL == 0. If the client happens to send a 0, this loop will break prematurely. I thought I would be clever and leave the buffer uninitialized, since the debug runtime fills it with 0xcc, and use that value to detect the end of the data in recv_buffer, but it seems that clients sometimes send that as part of their data as well.
Question:
Is there a character I can initialize recv_buffer to and reliably break on?
If not, is there another way I can eliminate the use of peek?
The correct solution is to keep track of how many bytes you store in recv_buffer to begin with. sizeof() gives you the TOTAL POSSIBLE size of the buffer, but it does not tell you HOW MANY bytes actually contain valid data.
recv() tells you how many bytes it returns to you. When you recv() data into recv_buffer, use that return value to increment a variable you define to indicate the number of valid bytes in recv_buffer.
For example:
unsigned char recv_buffer[4096];
int num_read, recv_buffer_size = 0;
int offset = 0;
const int max_cbuffer_size = sizeof(recv_buffer) * sizeof(unsigned char*);
unsigned char *pComplete_buffer = malloc(max_cbuffer_size);
...
num_read = recv(..., recv_buffer, sizeof(recv_buffer), ...);
if (num_read <= 0) {
    // error handling...
    return;
}
recv_buffer_size = num_read;
...
int available = max_cbuffer_size - offset;
int num_to_copy = min(recv_buffer_size, available);
memcpy(pComplete_buffer + offset, recv_buffer, num_to_copy);
offset += num_to_copy;
memmove(recv_buffer, recv_buffer + num_to_copy, recv_buffer_size - num_to_copy);
recv_buffer_size -= num_to_copy;
...
Is there a character I can initialize recv_buffer to and reliably break on?
Nope. If the other side can send any character at any time, you'll have to examine them.
If you know the sender will never send two NULs in a row (\0\0), you could check for that. But then some day the sender will decide to do that.
If you can change the message structure, I'd send the message length first (as a byte, network-ordered short or int depending on your protocol). Then, after parsing that length, the receiver will know exactly how long to keep reading.
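A minimal sketch of that scheme for a 4-byte length prefix (recv_all is a hypothetical helper that loops until exactly the requested number of bytes has arrived and returns -1 on error; error handling is abbreviated):

#include <winsock2.h>
#include <stdint.h>
#include <stdlib.h>

// Receive one length-prefixed message. Returns a malloc'd payload
// (caller frees) and stores its size in *out_len, or NULL on error.
unsigned char *recv_message(SOCKET s, uint32_t *out_len)
{
    uint32_t net_len;
    if (recv_all(s, (char *)&net_len, sizeof(net_len)) == -1)
        return NULL;

    uint32_t len = ntohl(net_len); // prefix travels in network byte order
    // A real program should sanity-check len against a protocol maximum
    // before trusting it with an allocation.
    unsigned char *msg = malloc(len);
    if (msg == NULL)
        return NULL;

    if (recv_all(s, (char *)msg, len) == -1) {
        free(msg);
        return NULL;
    }

    *out_len = len;
    return msg;
}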
Also if you're using select, that will block until there's something to read or the socket closes (mostly -- read the docs).
Let me start by saying that I openly admit this is for a homework assignment, but what I am asking is not related to the purpose of the assignment, just something I don't understand in C. This is just a very small part of a large program.
So my issue is, I have a set of data that consists of various data types, as follows:
[16 bit number][16 bit number][16 bit number][char[234]][128 bit number]
where each block represents a variable from elsewhere in the program.
I need to send that data 8 bytes at a time into a function that accepts uint32_t[2] as an input. How do I convert my 234-byte char array into uint32_t without losing the char values?
In other words, I need to be able to convert back from the uint32_t version to the original char array later on. I know a char is 1 byte, and the value can also be represented as a number in relation to its ASCII value, but I'm not sure how to convert between the two, since some letters have a 3-digit ASCII value and others have 2.
I tried to use sprintf to grab 8-byte blocks from the data set and store that value in a uint32_t[2] variable. It works, but then I lose the original char array because I can't figure out a way to go back/undo it.
I know there has to be a relatively simple way to do this, I'm just lacking enough skill in C to make it happen.
Your question is very confusing, but I am guessing you are preparing some data structure for encryption by a function that requires 8 bytes or 2 uint32_t's.
You can copy the char array into a suitably padded uint64_t array (one uint64_t holds the same 8 bytes as a uint32_t[2]) as follows:
#include <stdint.h>
#include <string.h>

#define NELEM 234

char a[NELEM];
uint64_t b[(NELEM + sizeof(uint64_t) - 1) / sizeof(uint64_t)]; // rounds up to a multiple of 8 bytes
size_t i;

memset(b, 0, sizeof(b)); // zero the pad bytes at the end
memcpy(b, a, NELEM);

for (i = 0; i < sizeof(b) / sizeof(b[0]); i++) {
    encryption_thing(b[i]);
}
If you need to change endianness or something, that is more complicated.

Alternatively, you can cast into the char array directly (alignment permitting):
#include <stdint.h>

void f(uint32_t a[2]) {}

int main() {
    char data[234]; /* GCC can explicitly align with this: __attribute__ ((aligned (8))) */
    int i = 0;
    int stride = 8;
    for (; i < 234 - stride; i += stride) {
        f((uint32_t*)&data[i]);
    }
    /* note: 234 is not a multiple of 8, so the final 2 bytes are never passed to f */
    return 0;
}
I need to send that data 8 bytes at a time into a function that accepts uint32_t[2] as an input. How do I convert my 234-byte char array into uint32_t without losing the char values?
you could use a union for this
typedef union
{
    unsigned char arr[128]; // use unsigned char
    uint32_t uints[32];     // 128 bytes / 4 bytes per uint32_t
} myvaluetype;

myvaluetype value;
memcpy(value.arr, your_array, sizeof(value.arr));
say the prototype that you want to feed 2 uint32_t at a time is something like
foo(uint32_t* p);
you can now send the data 8 bytes at a time by
for (int i = 0; i < 32; i += 2)
{
    foo(value.uints + i);
}
then use the same union to convert back.
of course some care must be taken about padding/alignment, and you also don't mention whether it is sent over a network etc., so there are other factors to consider.
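for the reverse direction a single copy back out of the union is enough (again using the hypothetical your_array):

memcpy(your_array, value.arr, sizeof(value.arr)); // the bytes come back out unchanged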
I am having trouble understanding the output of the following simple CUDA code. All the code does is allocate two integer arrays, one on the host and one on the device, each of size 16. It then sets the device array elements to the integer value 3, copies these values into host_array, and prints out all the elements.
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int num_elements = 16;
    int num_bytes = num_elements * sizeof(int);

    int *device_array = 0;
    int *host_array = 0;

    // malloc host memory
    host_array = (int*)malloc(num_bytes);

    // cudaMalloc device memory
    cudaMalloc((void**)&device_array, num_bytes);

    // fill the device array with a constant via cudaMemset
    cudaMemset(device_array, 3, num_bytes);

    // copy the contents of the device array to the host
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

    // print out the result element by element
    for(int i = 0; i < num_elements; ++i)
        printf("%i\n", *(host_array+i));

    // use free to deallocate the host array
    free(host_array);

    // use cudaFree to deallocate the device array
    cudaFree(device_array);

    return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call, I get the correct behaviour, i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0.
I'm no CUDA expert, but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cudaMemset (you pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended, I think you have to write a loop and initialize each array element.
As others have pointed out, cudaMemset works like the standard C memset: it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T *x, T value, size_t count)
{
    size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
    size_t stride = blockDim.x * gridDim.x;

    for (size_t i = tid; i < count; i += stride) {
        x[i] = value;
    }
}
which could be launched with enough blocks to cover the number of MP in your GPU, and each thread will do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.
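A minimal sketch of such a launch for the question's int case (the block and grid sizes here are illustrative, not tuned):

int threads = 128; // threads per block; illustrative
int blocks = 32;   // ideally derived from the device's multiprocessor count
myMemset<int><<<blocks, threads>>>(device_array, 3, num_elements);
cudaDeviceSynchronize(); // wait for the kernel to finish before copying back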
memset sets bytes, and an integer is 4 bytes... so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words: you are using it wrong, and it has nothing to do with CUDA.
This is classic memset behaviour; it works only on byte-sized units, i.e. char. This means it sets every 8 bits of the total memory to 3. You can confirm this with a simple C program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int x = 16;
    size_t bytes = x * sizeof(int);
    int *M = (int*)malloc(bytes);
    memset(M, 3, bytes);
    for (int i = 0; i < x; ++i) {
        printf("%d\n", M[i]);
    }
    free(M);
    return 0;
}
The only case in which memset works on all data types is when you set it to 0 (it sets every byte to 0, and hence all data to 0). If you change the data type to char, you'll see the desired output. cudaMemset is a ditto copy of memset, with the only difference being that it takes a GPU pointer as input.
So memset and cudaMemset set every byte of the memory region given by the third argument to the byte value you pass (in your case 3), regardless of the datatype.
Tip:
Google: 50529027 in binary and you'll get the answer :)
I'm working on Project Euler #14 in C and have figured out the basic algorithm; however, it runs insufferably slowly for large inputs, e.g. the 2,000,000 the problem asks for; I presume because it has to generate each sequence over and over again, even though there should be a way to store known sequences (e.g., once we get to a 16, we know from previous experience that the next numbers are 8, 4, 2, then 1).
I'm not exactly sure how to do this with C's fixed-length arrays, but there must be a good way (that's amazingly efficient, I'm sure). Thanks in advance.
Here's what I currently have, if it helps.
#include <stdio.h>
#define UPTO 2000000
int collatzlen(int n);
int main(){
    int i, l=-1, li=-1, c=0;
    for(i=1; i<=UPTO; i++){
        if( (c=collatzlen(i)) > l) l=c, li=i;
    }
    printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
    return 1;
}

/* n != 0 */
int collatzlen(int n){
    int len = 0;
    while(n>1) n = (n%2==0 ? n/2 : 3*n+1), len+=1;
    return len;
}
Your original program needs 3.5 seconds on my machine. Is it insufferably slow for you?
My dirty and ugly version needs 0.3 seconds. It uses a global array to store the values already calculated, and uses them in future calculations.
#include <stdio.h>

#define UPTO 2000000

int collatzlen2(unsigned long n);
static unsigned long array[UPTO + 1]; // to store those already calculated

int main()
{
    int i, l = -1, li = -1, c = 0;
    int x;
    for (x = 0; x < UPTO + 1; x++) {
        array[x] = -1; // use -1 to denote not-calculated yet
    }
    for (i = 1; i <= UPTO; i++) {
        if ((c = collatzlen2(i)) > l) l = c, li = i;
    }
    printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
    return 1;
}

int collatzlen2(unsigned long n)
{
    unsigned long len = 0;
    unsigned long m = n;
    while (n > 1) {
        if (n > UPTO || array[n] == -1) { // outside range or not calculated yet
            n = (n % 2 == 0 ? n / 2 : 3 * n + 1);
            len += 1;
        }
        else { // if already calculated, use the value
            len += array[n];
            n = 1; // to get out of the while-loop
        }
    }
    array[m] = len;
    return len;
}
Given that this is essentially a throw-away program (i.e. once you've run it and got the answer, you're not going to be supporting it for years :), I would suggest having a global variable to hold the lengths of sequences already calculated:
int lengthfrom[UPTO] = {0};
If your maximum size is a few million, then we're talking megabytes of memory, which should easily fit in RAM at once.
The above will initialise the array to zeros at startup. In your program, for each iteration check whether the array entry for the current value is zero. If it is, you'll have to keep going with the computation. If not, then you know that carrying on would go on for that many more iterations, so just add that to the number you've done so far and you're done. And then store the new result in the array, of course.
Don't be tempted to use a local variable for an array of this size: that will try to allocate it on the stack, which won't be big enough and will likely crash.
Also - remember that with this sequence the values go up as well as down, so you'll need to cope with that in your program (probably by having the array longer than UPTO values, and using an assert() to guard against indices greater than the size of the array).
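A minimal sketch of that scheme (collatzlen_memo and MEMO_SIZE are hypothetical names; values that wander above the array are simply computed without being memoised rather than asserted on, and unsigned long long keeps the intermediate values from overflowing 32 bits):

#include <assert.h>

#define UPTO 2000000
#define MEMO_SIZE (4 * UPTO) // headroom for excursions above UPTO; a guess

static int lengthfrom[MEMO_SIZE] = {0}; // 0 means "not computed yet"

int collatzlen_memo(unsigned long long n)
{
    assert(n != 0);
    if (n == 1)
        return 0;
    if (n < MEMO_SIZE && lengthfrom[n] != 0)
        return lengthfrom[n]; // already known, stop here
    unsigned long long next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    int len = 1 + collatzlen_memo(next);
    if (n < MEMO_SIZE)
        lengthfrom[n] = len; // store the new result
    return len;
}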
If I recall correctly, your problem isn't a slow algorithm: the algorithm you have now is fast enough for what PE asks you to do. The problem is overflow: you sometimes end up multiplying your number by 3 so many times that it will eventually exceed the maximum value that can be stored in a signed int. Use unsigned ints, and if that still doesn't work (but I'm pretty sure it does), use 64 bit ints (long long).
This should run very fast, but if you want to do it even faster, the other answers already addressed that.
Following my previous question (Why do I get weird results when reading an array of integers from a TCP socket?), I have come up with the following code, which seems to work, sort of. The code sample works well with a small number of array elements, but once the array becomes large, the data is corrupted toward the end.
This is the code to send the array of int over TCP:
#define ARRAY_LEN 262144

long *sourceArrayPointer = getSourceArray();
long sourceArray[ARRAY_LEN];
for (int i = 0; i < ARRAY_LEN; i++)
{
    sourceArray[i] = sourceArrayPointer[i];
}
int result = send(clientSocketFD, sourceArray, sizeof(long) * ARRAY_LEN, 0);
And this is the code to receive the array of int:
#define ARRAY_LEN 262144
long targetArray[ARRAY_LEN];
int result = read(socketFD, targetArray, sizeof(long) * ARRAY_LEN);
The first few numbers are fine, but further down the array the numbers come out completely different. At the end, the numbers should look like this:
0
0
0
0
0
0
0
0
0
0
But they actually come out as this:
4310701
0
-12288
32767
-1
-1
10
0
-12288
32767
Is this because I'm using the wrong send/receive size?
The call to read(..., len) doesn't read len bytes from the socket, it reads a maximum of len bytes. Your array is rather big and it will be split over many TCP/IP packets, so your call to read probably returns just a part of the array while the rest is still "in transit". read() returns how many bytes it received, so you should call it again until you received everything you want. You could do something like this:
long targetArray[ARRAY_LEN];
char *buffer = (char*)targetArray;
size_t remaining = sizeof(long) * ARRAY_LEN;
while (remaining) {
    ssize_t recvd = read(socketFD, buffer, remaining);
    // TODO: check for read errors etc here...
    remaining -= recvd;
    buffer += recvd;
}
Is the following ok?
for (int i = 0; sourceArrayPointer < i; i++)
You are comparing apples and oranges (read: pointers and integers). This loop does not get executed, since the pointer to the array of longs is > 0 (almost always). So the receiving end ends up reading values from an uninitialized array, which results in those incorrect numbers being passed around.
It'd rather be:
for (int i = 0; i < ARRAY_LEN; i++)
Use the byte-order functions (htonl, ntohl and friends) from <arpa/inet.h>:
http://en.wikipedia.org/wiki/Endianness#Endianness_in_networking
Not related to this question, but you also need to take care of platform endianness if you want to use TCP between different platforms.
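A minimal sketch, reusing the question's arrays and assuming the values actually fit in 32 bits (htonl and ntohl operate on uint32_t, so a 64-bit long would need different handling):

#include <arpa/inet.h> // htonl, ntohl

// Sender: convert each element to network byte order before send().
for (int i = 0; i < ARRAY_LEN; i++)
    sourceArray[i] = htonl(sourceArray[i]);

// Receiver: convert each element back to host byte order after reading.
for (int i = 0; i < ARRAY_LEN; i++)
    targetArray[i] = ntohl(targetArray[i]);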
It is much simpler to use some networking library like curl or ACE, if that is an option (additionally, you learn a lot more at a higher level, like design patterns).
There is nothing to guarantee how TCP will packetize the data you send to a stream; it only guarantees that the bytes will arrive in the correct order at the application level. So you need to check the value of result and keep on reading until you have read the right number of bytes. Otherwise you won't have read the whole of the data. You're making this more difficult for yourself by using a long array rather than a byte array: the data may be sent in any number of chunks, which may not be aligned to long boundaries.
I see a number of problems here. First, this is how I would rewrite your send code as I understand it. I assume getSourceArray always returns a valid pointer to a static or malloced buffer of size ARRAY_LEN. I'm also assuming you don't need sourceArrayPointer later in the code.
#define ARRAY_LEN 262144

long *sourceArrayPointer = getSourceArray();
long sourceArray[ARRAY_LEN];
long *sourceArrayIdx = sourceArray;
for (; sourceArrayIdx < sourceArray + ARRAY_LEN; )
    *sourceArrayIdx++ = *sourceArrayPointer++;

int result = send(clientSocketFD, sourceArray, sizeof(long) * ARRAY_LEN, 0);
if (result < (int)(sizeof(long) * ARRAY_LEN))
    printf("send returned %d\n", result);
Looking at your original code, I'm guessing that your for loop was messed up and never executed, resulting in you sending whatever random junk happened to be in the memory sourceArray occupies. Basically your condition
sourceArrayPointer < i;
is pretty much guaranteed to fail the first time through.