Broadcasting (Sharing) an array in MPI


I am trying to share an array of characters among processes in MPI using MPI_Bcast(). Can anyone please let me know what must be passed as count and datatype in this case?
Array:
char * Var;
Broadcast function in MPI:
int MPI_Bcast( void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm )

You want the MPI_Datatype MPI_BYTE and int count should be strlen(Var) + 1
Here is the explanation why:
For the datatype you need something that resembles a char, much like you would use MPI_INT to transport integer arrays. This is what MPI_BYTE is for (MPI_CHAR, which corresponds to the C type char, would also work):
MPI_BYTE:
This is an 8-bit positive integer between 0 and 255, i.e., a byte.
As for the length, that would have to be the size of the array. If I understand you correctly, this could vary depending on the length of the string you are sending. You therefore have to get the length of the string first by calling strlen(Var). As the strlen man page puts it:
The strlen() function calculates the length of the string s,
excluding the terminating null byte ('\0').
You have to add one byte for the null character terminator of the string.
Problem: the above is likely to lead to memory errors. It lies in the nature of MPI_Bcast that only one of the participating processes, the root, knows the message being transmitted. Therefore only this process is able to determine the correct count; all other processes are at a loss, as they have nothing to call strlen on. So what can we do about this?
1: Broadcast the length of the string, determined by the root process via strlen, first and then broadcast the actual string.
2: Use MPI_Send in combination with MPI_Recv, allowing you to poll for the message size on the receiving processes (for example with MPI_Probe and MPI_Get_count).
I would suggest using the second alternative as this avoids unnecessary barriers. A good how-to on this topic can be found here.

Related

How to iterate character by character in strings which lengths are larger than INT_MAX, or SIZE_MAX?

In C, how do I iterate character by character over strings whose lengths are larger than INT_MAX or SIZE_MAX?
How can I find out that the string length has exceeded any maximum size applicable to the code below?
int len = strlen(item);
int i=0;
while (i <= len ) {
//do smth
i++;
}

You can access characters in a string (or elements in an array generally) without integer indices by using pointers:
for (char *p = item; *p; ++p)
{
// Do something.
// *p is the current character.
}

int len = strlen(item);
First, this is not an impediment to having a string longer than INT_MAX, but it becomes one as soon as you have to deal with its length. Note that strlen() returns a size_t, so storing the result in an int can truncate it. If you think about the implementation of strlen(), you'll see that, given how strings are defined (a sequence of chars in memory bounded by the presence of a null char), the only possible implementation is to walk the string, incrementing the length as you go, until you find the first null char. This makes your code very inefficient, because you first traverse the string searching for its end, and then traverse it a second time to do the useful work.
int i=0;
while (i <= len ) {
//do smth
i++;
}
It would be better to use a pointer directly, in a for loop like this one:
char *p;
for (p = item; *p; p++) {
// do something, knowing that `*p` is the current char.
}
In this way, you navigate the string and stop when you find the null char, and you will not have to traverse it twice.
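As a small illustration of the pointer loop above, here is a minimal sketch that counts occurrences of a character. The function name count_char is made up for this example; the point is only that the string is walked once and the tally is kept in a size_t rather than an int.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often `target` occurs in the null-terminated string `s`.
 * The string is traversed once through a pointer, and the tally is a
 * size_t, so no int index or int length is involved.
 * (count_char is a hypothetical helper, not a standard function.) */
size_t count_char(const char *s, char target)
{
    assert(s != NULL);              /* precondition: a valid string */
    size_t hits = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        if (*p == target) {
            ++hits;
        }
    }
    return hits;
}
```

A size_t can represent the size of any single object, so the counter cannot overflow before the string itself ends.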
By the way, having strings longer than INT_MAX is quite difficult, because normally (even on the newer 64-bit architectures) you are not allowed to create such a large contiguous memory structure: if you try to create a static array of that size, you will be fighting with the compiler, and if you try to malloc() such a huge amount of memory, you will end up fighting with the operating system.
It is more usual for developers who have to deal with huge amounts of data to use a dedicated structure to hold large numbers of characters. Just imagine that you need to insert one char and this forces you to move one gigabyte of memory by one position because you have no other way to make room for it. It is simply impractical to use a single block of that size. A simple approach is to use a structure similar to the one used for file data on disk in a Unix system. The data has a series of direct pointers that point to fixed-size blocks of memory holding characters, but at some point those pointers become double pointers, pointing to an array of simple pointers, then triple pointers, etc. This way you can handle strings as short as one byte (with just one memory page) up to more than INT_MAX bytes, by selecting an appropriate size for the page and the number of pointers.
Another approach is the mbuf_t approach used by BSD software to handle network packets. This is especially appropriate when you have to add data in front of the string (e.g. to add a new protocol header) or at the rear of the packet (to add payload and/or a checksum or trailing data).
One last thing: if you create an array of 5 GB, most probably any current operating system will swap parts of it out as soon as you stop using them. This will make your application start swapping as soon as you move through the array, and you will probably not be able to run your application on a computer with a limited address space (as a 32-bit machine is today).

Can `snprintf()` read out of bounds when a string argument is not null-terminated?

I have the following piece of code, which a colleague claims may contain an out-of-bounds read, which I do not agree with. Could you help settle this argument and explain why?
char *test_filename = malloc(Size + 1);
sprintf(test_filename, "");
if (Size > 0 && Data)
snprintf(test_filename, Size + 1, "%s", Data);
where Data is a non-null-terminated string of type const uint8_t *Data and Size is the size of Data, i.e., number of bytes in Data, of type size_t.
It may read out-of-bounds because the format string is %s, perhaps?

Your colleague is correct. Perhaps unintuitively, snprintf(test_filename, Size + 1, "%s", Data) is guaranteed to read bytes starting at Data until a 0 byte is encountered, in your case typically resulting in an out-of-bounds read.
It will only write Size of these bytes to test_filename and null terminate them, respecting the size limit of the destination; but it will continue to read on. The reason for that is a design choice which enables the caller to determine the needed destination size for dynamic allocation before anything is actually written: snprintf() returns the number of bytes which would be written if the destination had infinite space. This feature is supposed to be used with a destination size of 0 (and potentially a null pointer as the destination). This functionality is useful for arguments which are not strings: With numbers etc. the size of the output is difficult to predict (e.g. locale dependent) and best left to the function at run time.
At the same time the return value indicates whether the output was truncated: If it is greater than or equal to the size parameter, not all of the input was used in the output. In your case, what was left out were the bytes starting at Data[Size] and ending with the first 0 byte, or a segmentation fault ;-).
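The sizing feature described above can be sketched as follows. This is only an illustration under stated assumptions: format_int_alloc and the "value=%d" format are invented for the example, and the argument is a number precisely because that is the case where the output length is hard to predict.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats `value` into a freshly allocated string that the caller must free.
 * Returns NULL on failure. (Hypothetical helper, for illustration only.) */
char *format_int_alloc(int value)
{
    /* First pass: destination size 0 and a null destination, so nothing is
     * written; the return value is the length the output would have. */
    int needed = snprintf(NULL, 0, "value=%d", value);
    if (needed < 0) {
        return NULL;
    }

    size_t size = (size_t)needed + 1;   /* +1 for the terminating null byte */
    char *out = malloc(size);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: the buffer is now known to be large enough. */
    int written = snprintf(out, size, "value=%d", value);
    if (written != needed) {
        free(out);
        return NULL;
    }
    assert(strlen(out) == (size_t)needed);
    return out;
}
```

Note that this pattern is safe here only because every argument is something snprintf() can fully read; it does not help with a %s argument that lacks a terminator.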
Suggestion for a fix: First of all it is unclear why you would use the printf family to print a string; simply copy it. And then Andrew has a point in his comments that since Data is not null terminated it is not really a string (even if all bytes are printable); so don't start fiddling with strcpy and friends but simply memcpy() the bytes, and null terminate manually.
Oh, and the preceding sprintf(test_filename, ""); does not serve any discernible purpose. If you want to write a null byte to *test_filename, simply do so; but since you are not using strcat, which would rely on a terminated destination string to extend, it is quite unnecessary.
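A minimal sketch of the memcpy() approach suggested above, assuming Data and Size have the types given in the question. The function name bytes_to_string is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies exactly `size` bytes from `data` into a new buffer and appends a
 * terminator. `data` does not need to be null-terminated, and nothing past
 * data[size - 1] is read. The caller frees the result; NULL means failure.
 * (bytes_to_string is a hypothetical name, not a library function.) */
char *bytes_to_string(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size == SIZE_MAX) {
        return NULL;                /* size + 1 would overflow */
    }
    char *out = malloc(size + 1);
    if (out == NULL) {
        return NULL;
    }
    if (size > 0) {
        memcpy(out, data, size);    /* reads exactly `size` bytes */
    }
    out[size] = '\0';               /* terminate manually */
    return out;
}
```

If the data can contain 0 bytes, the result is still a valid buffer, but string functions will only see the part before the first 0 byte.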

from the MAN page for snprintf()
The functions snprintf() and vsnprintf() write at most size bytes (including the terminating null byte ('\0')) to str.
Note the at most size bytes
This means that snprintf() will stop transferring bytes after the parameter Size bytes are transferred.
this statement;
sprintf(test_filename, "");
is completely unneeded and has no effect on the operation of the second call to snprintf()
If you want the result to be a 'proper' string, suggest:
char *test_filename = calloc( Size + 1, sizeof( char ) );
if (Size > 0 && Data)
snprintf(test_filename, Size + 1, "%s", Data);
However, the function snprintf() keeps reading until a NUL byte is encountered. This can create problems, up to and including a seg fault event.
The function memcpy() is made for this kind of job. Suggest replacing the call to snprintf() with
memcpy( test_filename, Data, Size );
Because calloc() zeroed the buffer, the copied bytes are already followed by a NUL byte.

Why does dynamically allocated array does not update with the new data coming?

I am trying to receive a message from the socket server which sends a large file of around 7MB. Thus in the following code, I try to concatenate all data into one array s from buffer. But as I try the following, I see that the length of s does not change at all, although the total bytes received continue to increase.
char buffer[300];
char* s = calloc(1, sizeof(char));
size_t n = 1;
while ((b_recv = recv(socket_fd,
buffer,
sizeof(buffer), 0)) > 0) {
char *temp = realloc(s, b_recv + n);
s = temp;
memcpy(s + n -1, buffer, b_recv);
n += b_recv;
s[n-1] = '\0';
printf("%s -- %zu",s, strlen(s));
}
free(s);
Is this not the correct way to update receive data of varying sizes? Also when I try to print s, it gives some random question mark characters. What is the mistake that I am making?

Why does dynamically allocated array does not update with the new data coming?
You have not presented any reason to believe that the behavior is as the question characterizes it. You are receiving binary data and storing it in memory, which is fine, but you cannot expect sensible results from treating such data as if it were a C string. Not even when you replace the last byte with a string terminator.
Binary data can and generally does contain bytes with value 0. C strings use such bytes as terminators marking the end of the string data, so, for example, strlen will measure only the number of bytes before the first zero byte, regardless of how many additional bytes have been stored after it. The terminator your code appends after each chunk does not change this: it sits after the received data, so any zero byte inside the data still ends the string earlier.
You may attempt to print such data to the console as if it were text, but if in fact it does not consist of text encoded according to the runtime character encoding then there is no reason to expect the resulting display to convey useful information. Instead, examine it in memory via a debugger, or write the raw bytes to a file and examine the result with a hex editor, or write them (still raw) through a filter that converts to hexadecimal or some other text representation, or similar. And you have as many bytes to examine as you have copied to the allocated space. You're keeping track of that already, so you don't need strlen() to tell you how many that is.
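To make the last point concrete, here is a minimal sketch of a growable byte buffer that tracks its own length instead of relying on strlen(). The struct and function names are invented for this example. In the receive loop from the question, one would call something like byte_buf_append(&buf, buffer, (size_t)b_recv) for each chunk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A byte buffer with an explicit length; no terminator is stored.
 * (Hypothetical type for illustration.) */
struct byte_buf {
    unsigned char *data;
    size_t len;
};

/* Appends `n` bytes from `chunk`. Returns 0 on success and -1 if the
 * allocation fails, in which case the buffer is left unchanged. */
int byte_buf_append(struct byte_buf *buf, const void *chunk, size_t n)
{
    assert(buf != NULL);
    if (n == 0) {
        return 0;
    }
    unsigned char *grown = realloc(buf->data, buf->len + n);
    if (grown == NULL) {
        return -1;                  /* the old block is still valid */
    }
    memcpy(grown + buf->len, chunk, n);
    buf->data = grown;
    buf->len += n;
    return 0;
}
```

Start from an empty buffer, struct byte_buf buf = { NULL, 0 };, and free(buf.data) when done. Unlike the code in the question, the realloc() result is checked before the old pointer is overwritten.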

What does write() write if null terminator is already reached?

For write(fd[1], string, size) - what would happen if string is shorter than size?
I looked up the man page but it doesn't clearly specify that situation. I know that for read, it would simply stop there and read whatever string is, but it's certainly not the case for write. So what is write doing? The return value is still size so is it appending null terminator? Why doesn't it just stop like read.

When you call write(), the system assumes you are writing generic data to some file - it doesn't care that you have a string. A null-terminated string is seen as a bunch of non-zero bytes followed by a zero byte - the system will keep writing out until it's written size bytes.
Thus, specifying a size that is longer than your string could be dangerous. It's likely that the system will read data beyond the end of the string and write it out to your file, and that data is probably garbage.

write will write size bytes of data starting at string. If you define string to be an array shorter than size, the call has undefined behaviour. But in your previous question, char *line = "apple"; contains 6 characters (i.e. a, p, p, l, e and the null character).
So it is best to call write with size set to the correct value.

write(int fildes, const void *buf, size_t nbyte) does not write null terminated strings. It writes the content of a buffer. If there are any null characters in the buffer they will be written as well.
read(int fildes, void *buf, size_t nbyte) also pays no attention to null characters. It reads a number of bytes into the given buffer, up to a maximum of nbyte. It does not add any null terminating bytes.
These are low level routines, designed for reading and writing arbitrary data.

The write call outputs a buffer of the given size. It does not attempt to interpret the data in the buffer. That is, you give it a pointer to a memory location and a number of bytes to write (the length) then, as long as those memory locations exist in a legal portion of your program's data, it will copy those bytes to the output file descriptor.
Unlike the string manipulation routines, write (and read, for that matter) ignores null bytes, that is, bytes with the value zero. read does pay attention to end-of-file and, on certain devices, will only read the amount of data available at the time, perhaps returning less data than requested, but both operate on raw bytes without interpreting them as "strings".
If you attempt to write more data than the buffer contains, it may or may not work depending on the position of the memory. At best the behavior is undefined. At worst you'll get a segmentation fault and your program will crash.
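Putting the answers above together, a minimal sketch of writing a C string with write() is to pass strlen() as the size, so that only the characters of the string are written. The helper name write_string is made up for this example, and POSIX is assumed.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes the characters of `s` to `fd`, without the terminating null byte.
 * Returns whatever write() returns: the number of bytes written, or -1.
 * (write_string is a hypothetical helper, not part of POSIX.) */
ssize_t write_string(int fd, const char *s)
{
    assert(s != NULL);
    return write(fd, s, strlen(s));     /* size derived from the string */
}
```

If the receiver needs the terminator as well, pass strlen(s) + 1 instead; either way the size never exceeds the memory the string actually occupies.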

How do I send an array of integers over TCP in C?

I'm led to believe that write() can only send data buffers of byte (i.e. signed char), so how do I send an array of long integers using the C write() function in the sys/socket.h library?
Obviously I can't just cast or convert long to char, as any numbers over 127 would be malformed.
I took a look at the question, how to decompose integer array to a byte array (pixel codings), but couldn't understand it - please could someone dumb it down a little if this is what I'm looking for?
Follow up question:
Why do I get weird results when reading an array of integers from a TCP socket?

the prototype for write is:
ssize_t write(int fd, const void *buf, size_t count);
so while it writes in units of bytes, it can take a pointer of any type. Passing an int* will be no problem at all.
EDIT:
I would, however, recommend that you also send the number of integers you plan to send first, so the receiver knows how much to read. Something like this (error checking omitted for brevity):
int x[10] = { ... };
int count = 10;
write(sock, &count, sizeof(count));
write(sock, x, sizeof(x));
NOTE: if the array is from dynamic memory (like you malloced it), you cannot use sizeof on it. In this case the byte count passed to write would be: sizeof(int) * element_count
EDIT:
As Brian Mitchell noted, you will likely need to be careful of endian issues as well. This is the case when sending any multibyte value (as in the count I recommended as well as each element of the array). This is done with the: htons/htonl and ntohs/ntohl functions.

Write can do what you want it to, but there's some things to be aware of:
1: You may get a partial write that's not on an int boundary, so you have to be prepared to handle that situation
2: If the code needs to be portable, you should convert your array to a specific endianness, or encode the endianness in the message.
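For the first point, a common way to deal with partial writes is to loop until every byte has been accepted. The following is a minimal sketch assuming POSIX write(); the name write_all is invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Keeps calling write() until all `len` bytes of `buf` have been written.
 * Returns 0 on success and -1 on error (errno is left as set by write()).
 * (write_all is a hypothetical helper, not part of POSIX.) */
int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal: retry */
            }
            return -1;
        }
        p += n;                     /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

Because the loop works on bytes, it does not matter whether a partial write happens to stop in the middle of an int.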

The simplest way to send a single int (assuming 4-byte ints) is :
int tmp = htonl(myInt);
write(socket, &tmp, 4);
where htonl is a function that converts the int to network byte order. (Similarly, when you read from the socket, the function ntohl can be used to convert back to host byte order.)
For an array of ints, you would first want to send the count of array members as an int (in network byte order), then send the int values.
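The count-then-values layout described above could be sketched like this. The function names are made up, and the sketch assumes the values fit in 32 bits and uses uint32_t instead of int so that the wire size is fixed at 4 bytes per value. It fills a byte buffer that can then be handed to write().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encodes `count` followed by each value, all as 4-byte integers in network
 * byte order. `out` must have room for 4 + 4 * count bytes.
 * Returns the number of bytes produced. (Hypothetical helper.) */
size_t encode_u32_array(const uint32_t *values, uint32_t count,
                        unsigned char *out)
{
    assert(out != NULL && (values != NULL || count == 0));
    uint32_t be = htonl(count);
    memcpy(out, &be, sizeof be);
    size_t off = sizeof be;
    for (uint32_t i = 0; i < count; ++i) {
        be = htonl(values[i]);
        memcpy(out + off, &be, sizeof be);
        off += sizeof be;
    }
    return off;
}

/* Decodes a message produced by encode_u32_array. Stores at most `max`
 * values and returns the count found in the message. (Hypothetical helper.) */
uint32_t decode_u32_array(const unsigned char *in, uint32_t *values,
                          uint32_t max)
{
    assert(in != NULL && (values != NULL || max == 0));
    uint32_t be;
    memcpy(&be, in, sizeof be);
    uint32_t count = ntohl(be);
    uint32_t n = count < max ? count : max;
    for (uint32_t i = 0; i < n; ++i) {
        memcpy(&be, in + 4 + (size_t)i * 4, sizeof be);
        values[i] = ntohl(be);
    }
    return count;
}
```

In a real receiver the count would be read first, so that the program knows how many more bytes to read from the socket before decoding the values.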

Yes, you can just cast a pointer to your buffer to a pointer to char, and call write() with that. Casting a pointer to a different type in C doesn't affect the contents of the memory being pointed to -- all it does is indicate the programmer's intention that the contents of memory at that address be interpreted in a different way.
Just make sure that you supply write() with the correct size in bytes of your array -- that would be the number of elements times sizeof (long) in your case.

It would be better to have serialize/de-serialize functionality in your client/server program.
Whenever you want to send data, serialize the data into a byte buffer and send it over TCP together with the byte count.
When receiving data, de-serialize the data from the buffer into your own representation.
You can interpret the byte buffer in any form you like. It can contain basic data types, objects etc.
Just make sure to take care of endianness and also alignment.
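One way to sidestep both endianness and alignment is to build the bytes with shifts instead of copying the in-memory representation. Below is a minimal sketch for 64-bit values, which htonl does not cover; the function names are invented for this example and big-endian order on the wire is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores `v` into out[0..7], most significant byte first. Works the same on
 * any host byte order and has no alignment requirement on `out`.
 * (Hypothetical helper.) */
void put_u64_be(unsigned char *out, uint64_t v)
{
    assert(out != NULL);
    for (int i = 7; i >= 0; --i) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Reads 8 bytes, most significant first, back into a uint64_t.
 * (Hypothetical helper.) */
uint64_t get_u64_be(const unsigned char *in)
{
    assert(in != NULL);
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | in[i];
    }
    return v;
}
```

Both sides only have to agree on the byte order and the width of each field; the size of long on either machine no longer matters.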

Declare a character array. In each location of the array, store integer numbers, not characters.
Then you just send that.
For example:
char tcp[100];
tcp[0] = 0;
tcp[1] = 0xA;
tcp[2] = 0xB;
tcp[3] = 0xC;
.
.
// Send the character array
write(sock, tcp, sizeof(tcp));

I think what you need to come up with here is a protocol.
Suppose your integer array is:
100, 99, 98, 97
Instead of writing the ints directly to the buffer, I would "serialize" the array by turning it into a string representation. The string might be:
"100,99,98,97"
That's what would be sent over the wire. On the receiving end, you'd split the string by the commas and build the array back up.
This is more standardised, is human readable, and means people don't have to think about hi/lo byte orders and other silly things.
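A minimal sketch of such a text protocol in C, assuming comma-separated decimal numbers with no spaces. The function names join_longs and split_longs are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes the values as text such as "100,99,98" into `out`, which has room
 * for `cap` bytes. Returns the text length, or -1 if it does not fit.
 * (Hypothetical helper.) */
int join_longs(const long *values, size_t count, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        const char *sep = (i == 0) ? "" : ",";
        int n = snprintf(out + used, cap - used, "%s%ld", sep, values[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;              /* error or output truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}

/* Parses text produced by join_longs. Stores at most `max` values and
 * returns how many were stored. Parsing stops at the first field that does
 * not start with a number. (Hypothetical helper.) */
size_t split_longs(const char *text, long *values, size_t max)
{
    assert(text != NULL && (values != NULL || max == 0));
    size_t count = 0;
    const char *p = text;
    while (count < max && *p != '\0') {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p) {
            break;                  /* no digits found */
        }
        values[count++] = v;
        p = (*end == ',') ? end + 1 : end;
    }
    return count;
}
```

The receiver still needs to know where the text ends, for example by sending its length first or by ending each message with a newline.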
// Sarcasm
If you were working in .NET or Java, you'd probably encode it in XML, like this:
<ArrayOfInt><Int>100</Int><Int>99</Int><Int>98</Int><Int>97</Int></ArrayOfInt>
:)
