Range of MPI_Offset in MPI (C)

I am using the MPI 2.2 standard to write a parallel program in C on a 64-bit machine.
/* MPI_Offset is long long */
MPI_Offset my_offset;
printf("%3d: my offset = %lld\n", my_rank, my_offset);

int count;
MPI_Get_count(&status, MPI_BYTE, &count);
printf("%3d: read = %d\n", my_rank, count);
I am reading a very large file byte by byte. To read the file in parallel, I set the offset for each process using an offset variable. I am confused about the data type of MPI_Offset: is it signed or unsigned long?
My second question is about the range of the count variable used in the MPI_Get_count() function. Since this function is used here to read all the elements from each process's buffer, I think it should also be of type long long to handle such a large file.

MPI_Offset's size isn't defined by the standard - it is, roughly, as large as possible. ROMIO, a widely used underlying implementation of MPI-IO, uses 8-byte integers on systems which support them. You can probably find out for sure by looking in your system's mpi.h.
MPI_Offset is very definitely signed; there are functions like MPI_File_seek where it is perfectly reasonable for values of type MPI_Offset to take negative values.
MPI_Get_count returns an integer of normal integer size, and this can certainly cause problems for some large-file IO strategies.
Generally, it's better for a number of reasons not to use small low-level units of IO like bytes when doing MPI-IO; it's better in terms of performance and code readability to express the IO in units of your underlying data types. In doing so, these size limitations become less of an issue. If your underlying data type really is bytes, though, there aren't many options.
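For instance, here is a minimal sketch of reading in units of doubles rather than bytes; the file name "data.bin" and the per-rank count are hypothetical, error checking is omitted, and the fragment belongs inside your program after MPI_Init:
#include <mpi.h>

/* Each rank reads `count` doubles at an offset measured in elements,
   not bytes, so MPI_Offset values stay far from overflow. */
MPI_File fh;
MPI_Status status;
double buf[1024];
int count = 1024;   /* elements per rank, hypothetical */
int my_rank;

MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDONLY,
              MPI_INFO_NULL, &fh);
/* the view makes MPI_DOUBLE the elementary type for offsets */
MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
MPI_File_read_at(fh, (MPI_Offset)my_rank * count, buf, count,
                 MPI_DOUBLE, &status);
MPI_File_close(&fh);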

Did you try to interleave MPI_File_read with something like MPI_File_seek(mpiFile, mpiOffset, MPI_SEEK_CUR)? That way you may be able to avoid MPI_Offset overflow.
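A sketch of that pattern, assuming fh is an already-open MPI_File; the chunk size, nsteps, and nprocs are hypothetical:
/* Read a chunk, then advance with a relative seek instead of
   accumulating one ever-growing absolute offset. */
enum { CHUNK = 4096 };
unsigned char chunk[CHUNK];
MPI_Status status;
MPI_Offset stride = (MPI_Offset)(nprocs - 1) * CHUNK;  /* skip other ranks */

for (int step = 0; step < nsteps; step++) {
    MPI_File_read(fh, chunk, CHUNK, MPI_BYTE, &status); /* advances pointer */
    MPI_File_seek(fh, stride, MPI_SEEK_CUR);
}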

Related

Reading from a binary file after x bytes in C

I am trying to read double values from a binary file in C, but the file starts with an integer, followed by the doubles I am looking for.
How do I skip those first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said, you can use sizeof(int) safely if the file was generated on the same system architecture as the program you are writing. Otherwise, you should manually specify the size of the integer on the system where the file originated.
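A complete minimal sketch of that approach; the file name data.bin is hypothetical, and a fixed-width int32_t is used for the header so the skip does not depend on the local sizeof(int):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *input = fopen("data.bin", "rb");
    if (input == NULL)
        return 1;

    /* skip the leading 32-bit integer header */
    fseek(input, sizeof(int32_t), SEEK_SET);

    double value;
    while (fread(&value, sizeof value, 1, input) == 1)
        printf("%f\n", value);

    fclose(input);
    return 0;
}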
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream);
Of course you have to declare ptr before calling fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET);
Beware that fseek measures the offset in bytes (in the line above, sizeof(int) bytes from the beginning of the file); an integer can be 4 or some other number of bytes, which is system specific.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating-point representations.
If the file you're reading is created on the same machine, make sure that the program that writes it addresses the type sizes correctly.
If both writer and reader are developed in C and are supposed to run only on the same machine, use fseek() with the sizeof(type) used in the writer as the offset parameter.
If the machine that writes the binary isn't the same one that will read it, you probably don't want to read the doubles with fread() at all, as their format may differ due to possibly different architectures.
Many architectures rely on IEEE 754 for floating-point format, but if the application is supposed to support multiple platforms, you should make sure that the serialized format can be read on all architectures (or converted while deserializing).
Just read those 4 unneeded bytes, like so:
void *buffer = malloc(sizeof(double));
fread(buffer, 4, 1, input);               /* to skip those four bytes */
fread(buffer, sizeof(double), 1, input);  /* then read the first double */
double *data = (double *)buffer;          /* then convert it to double */
And so on

Using RNDADDENTROPY to add entropy to /dev/random

I have a device which generates some noise that I want to add to the entropy pool for the /dev/random device in an embedded Linux system.
I'm reading the man page on /dev/random and I don't really understand the structure that you pass into the RNDADDENTROPY ioctl call.
RNDADDENTROPY
Add some additional entropy to the input pool, incrementing
the entropy count. This differs from writing to /dev/random
or /dev/urandom, which only adds some data but does not
increment the entropy count. The following structure is used:
struct rand_pool_info {
    int   entropy_count;
    int   buf_size;
    __u32 buf[0];
};
Here entropy_count is the value added to (or subtracted from)
the entropy count, and buf is the buffer of size buf_size
which gets added to the entropy pool.
Is entropy_count in this structure the number of bits that I am adding? Why wouldn't this just always be buf_size * 8 (assuming that buf_size is in terms of bytes)?
Additionally why is buf a zero size array? How am I supposed to assign a value to it?
Thanks for any help here!
I am using a hardware RNG to stock my entropy pool. My struct has a static size
and looks like this (my kernel has a slightly different random.h; just copy what
you find in yours and increase the array size to whatever you want):
#define BUFSIZE 256
/* WARNING - this struct must match random.h's struct rand_pool_info */
typedef struct {
    int bit_count;               /* number of bits of entropy in data */
    int byte_count;              /* number of bytes of data in array */
    unsigned char buf[BUFSIZE];
} entropy_t;
Whatever you pass in buf will be hashed and will stir the entropy pool.
If you are using /dev/urandom, it does not matter what you pass for bit_count,
because /dev/urandom never blocks; it behaves as if the count were zero and
just keeps on going.
What bit_count does is push out the point at which /dev/random will block
and wait for a physical RNG source to add more entropy.
Thus, it's okay to guesstimate bit_count. If you guess low, the worst
that will happen is that /dev/random will block sooner than it otherwise
would have. If you guess high, /dev/random will operate like /dev/urandom
for a little longer than it otherwise would have before blocking.
You can guesstimate based on the "quality" of your entropy source.
If it's low, like characters typed by humans, you can set it to 1 or 2 bits
per byte. If it's high, like values read from a dedicated hardware RNG,
you can set it to 8 bits per byte.
If your data is perfectly random, then I believe it would be appropriate for entropy_count to be the number of bits in the buffer you provide. However, many (most?) sources of randomness aren't perfect, and so it makes sense for the buffer size and amount of entropy to be kept as separate parameters.
buf being declared to be size zero is a standard C idiom. The deal is that when you actually allocate a rand_pool_info, you do malloc(sizeof(rand_pool_info) + size_of_desired_buf), and then you refer to the buffer using the buf member. Note: With some C compilers, you can declare buf[*] instead of buf[0] to be explicit that in reality buf is "stretchy".
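A minimal sketch of that malloc idiom feeding the ioctl; the function name and its parameters are hypothetical, the noise buffer is assumed to come from your device, and the call needs root:
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/random.h>

/* Credit `nbytes` of device noise to the input pool, claiming a
   conservative `bits_per_byte` of entropy per byte. */
int add_entropy(const unsigned char *noise, size_t nbytes, int bits_per_byte)
{
    struct rand_pool_info *info;
    int fd, ret;

    info = malloc(sizeof(*info) + nbytes);   /* struct + "stretchy" buf */
    if (info == NULL)
        return -1;

    info->entropy_count = bits_per_byte * (int)nbytes;  /* bits credited */
    info->buf_size = (int)nbytes;                       /* bytes in buf */
    memcpy(info->buf, noise, nbytes);

    fd = open("/dev/random", O_WRONLY);
    if (fd < 0) {
        free(info);
        return -1;
    }

    ret = ioctl(fd, RNDADDENTROPY, info);
    close(fd);
    free(info);
    return ret;
}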
The number of bytes you have in the buffer correlates with the entropy of the data, but the entropy cannot be calculated from that data or its length alone.
Sure, if the data came from a good, unpredictable, uniformly distributed hardware random noise generator, the entropy (in bits) is 8 * the size of the buffer (in bytes).
But if the bits are not uniformly distributed, or are somehow predictable, the entropy is lower.
See https://en.wikipedia.org/wiki/Entropy_(information_theory)
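As a rough illustration, here is a sketch of a Shannon-entropy estimate over a byte buffer; it assumes independent bytes, so treat it as a sanity check rather than a real entropy measurement (link with -lm):
#include <math.h>
#include <stddef.h>

double entropy_bits_per_byte(const unsigned char *buf, size_t len)
{
    size_t counts[256] = {0};
    double h = 0.0;

    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;

    for (int b = 0; b < 256; b++) {
        if (counts[b] == 0)
            continue;
        double p = (double)counts[b] / (double)len;
        h -= p * log2(p);   /* -sum p*log2(p); 8.0 for uniform data */
    }
    return h;
}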
I hope that helps.

How do I force the program to use unaligned addresses?

I've heard reads and writes of aligned ints are atomic and safe. I wonder when the system makes non-malloc'd globals unaligned, other than in packed structures and when casting/doing pointer arithmetic on byte buffers.
[x86-64 Linux] In all of my normal cases, the system always chooses integer locations that don't get word torn, for example two bytes on one word and the other two bytes on the other word. Can anyone post a program/snippet (C or assembly) that forces a global variable to an unaligned address such that the integer gets torn and the system has to use two reads to load one integer value?
When I run the program below, the addresses are close to each other, such that multiple variables sit within 64 bits, but not once is word tearing seen (smartness in the system or the compiler?).
#include <stdio.h>
#include <stdint.h>

int a;
char b;
char c;
int d;
int e = 0;

int isaligned(void *p, int N)
{
    if (((uintptr_t)p % N) == 0)
        return 1;
    else
        return 0;
}

int main()
{
    printf("processor is %zu byte mode \n", sizeof(int *));
    printf("a=%p/b=%p/c=%p/d=%p/f=%p\n",
           (void *)&a, (void *)&b, (void *)&c, (void *)&d, (void *)&e);
    printf("check for 64bit alignment of test result of 0x80 = %d \n",
           isaligned((void *)0x80, 64));
    printf("check for 64bit alignment of a result = %d \n", isaligned(&a, 64));
    printf("check for 64bit alignment of d result = %d \n", isaligned(&e, 64));
    return 0;
}
Output:
processor is 8 byte mode
a=0x601038/b=0x60103c/c=0x60103d/d=0x601034/f=0x601030
check for 64bit alignment of test result of 0x80 = 1
check for 64bit alignment of a result = 0
check for 64bit alignment of d result = 0
How does a read of a char happen in the above case? Does it read from the 8-byte-aligned boundary (in my case 0x601030) and then go to 0x60103c?
Memory access granularity is always the word size, isn't it?
Thx.
1) Yes, there is no guarantee that unaligned accesses are atomic, because [at least sometimes, on certain types of processors] the data may be written as two separate writes - for example if you cross over a memory page boundary [I'm not talking about 4KB pages for virtual memory, I'm talking about DDR2/3/4 pages, which are some fraction of the total memory size, typically 16 Kbits times whatever the width is of the actual memory chip - which will vary depending on the memory stick itself]. Equally, on processors other than x86, you get a trap for reading unaligned memory, which will either cause the program to abort or the read to be emulated in software as multiple reads to "fix" the unaligned read.
2) You could always make an unaligned memory region by something like this:
char *ptr = malloc(sizeof(long long) * (number + 1));
long long *unaligned = (long long *)&ptr[2];   /* deliberately misaligned */
long long temp;
for (size_t i = 0; i < number; i++)
    temp = unaligned[i];
By the way, your alignment check checks if the address is aligned to 64 bytes, not 64 bits. You'll have to divide by 8 to check that it's aligned to 64 bits.
3) A char is a single byte read, and the address will be on the actual address of the byte itself. The actual memory read performed is probably for a full cache-line, starting at the target address, and then cycling around, so for example:
0x60103d is the target address, so the processor will read a cache line of 32 bytes, starting at the 64-bit word we want: 0x601038 (and as soon as that read completes, the processor goes on to the next instruction while the remaining reads fill the cache line with 0x601020, 0x601028, 0x601030). But should we turn the cache off [if you want your 3 GHz latest x86 processor to be slightly slower than a 66 MHz 486, disabling the cache is a good way to achieve that], the processor would just read one byte at 0x60103d.
4) Not on x86 processors, they have byte addressing - but for normal memory, reads are done on a cacheline basis, as explained above.
Note also that "may not be atomic" is not at all the same as "will not be atomic" - so you'll probably have a hard time making it go wrong on purpose. You really need to get the timing of two different threads just right, and straddle cache lines, memory page boundaries, and so on, to make it go wrong - it will happen when you don't want it to, but making it happen deliberately can be darn hard [trust me, I've been there, done that].
It probably doesn't, outside of those cases.
In assembly it's trivial. Something like:
.org 0x2
myglobal:
.word SOME_NUMBER
But on Intel, the processor can safely read unaligned memory. It might not be atomic, but that might not be apparent from the generated code.
Intel, right? The Intel ISA has single-byte read/write opcodes. Disassemble your program and see what it's using.
Not necessarily - you might have a mismatch between memory word size and processor word size.
1) This answer is platform-specific. In general, though, the compiler will align variables unless you force it to do otherwise.
2) The following will require two reads to load one variable when run on a 32-bit CPU:
uint64_t huge_variable;
The variable is larger than a register, so it will require multiple operations to access. You can also do something similar by using packed structures:
struct __attribute__((packed)) unaligned
{
    char buffer[2];
    int  unaligned;
    char buffer2[2];
} sample_struct;
3) This answer is platform-specific. Some platforms may behave like you describe. Some platforms have instructions capable of fetching a half-register or quarter-register of data. I recommend examining the assembly emitted by your compiler for more details (make sure you turn off all compiler optimizations first).
4) The C language allows you to access memory with byte-sized granularity. How this is implemented under the hood and how much data your CPU fetches to read a single byte is platform-specific. For many CPUs, this is the same as the size of a general-purpose register.
The C standard guarantees that malloc(3) returns a memory area that complies with the strictest alignment requirements, so this just can't happen in that case. If there is unaligned data, it is probably read/written in pieces (that depends on the exact guarantees the architecture provides).
On some architectures unaligned access is allowed, on others it is a fatal error. When allowed, it is normally much slower than aligned access; when not allowed the compiler must take the pieces and splice them together, and that is even much slower.
Characters (really bytes) are normally allowed to have any byte address. The instructions working with bytes just get/store the individual byte in that case.
No, memory access is according to the width of the data. But real memory access is in terms of cache lines (read up on CPU cache for this).
Non-aligned objects can never come into existence without you invoking undefined behavior. In other words, there is no sequence of actions, all having well-defined behavior, which a program can take that will result in a non-aligned pointer coming into existence. In particular, there is no portable way to get the compiler to give you misaligned objects. The closest thing is the "packed structure" many compilers have, but that only applies to structure members, not independent objects.
Further, there is no way to test alignedness in portable C. You can use the implementation-defined conversions of pointers to integers and inspect the low bits, but there is no fundamental requirement that "aligned" pointers have zeros in the low bits, or that the low bits after conversion to integer even correspond to the "least significant" bits of the pointer, whatever that would mean. In other words, conversions between pointers and integers are not required to commute with arithmetic operations.
If you really want to make some misaligned pointers, the easiest way to do it, assuming alignof(int)>1, is something like:
char buf[2 * sizeof(int) + 1];
int *p1 = (int *)buf, *p2 = (int *)(buf + sizeof(int) + 1);
It's impossible for both buf and buf+sizeof(int)+1 to be simultaneously aligned for int if alignof(int) is greater than 1. Thus at least one of the two (int *) casts gets applied to a misaligned pointer, invoking undefined behavior, and the typical result is a misaligned pointer.

Declare array large enough to hold a type

Suppose I'm given a function with the following signature:
void SendBytesAsync(unsigned char* data, T length)
and I need a buffer large enough to hold a byte array of the maximum length that can be specified by type T. How do I declare that buffer? I can't just use sizeof as it will return the size (in bytes) of type T and not the maximum value that the type could contain. I don't want to use limits.h as the underlying type could change and my buffer be too small. I can't use pow from math.h because I need a constant expression. So how do I get a constant expression for the maximum size of a type at compile time in C?
Edit
The type will be unsigned. Since everyone seems to be appalled at the idea of a statically allocated buffer determined at compile time, I'll provide a little background. This is for an embedded application (on a microcontroller) where reliability and speed are the priorities. As such, I'm perfectly OK with wasting statically assigned memory for the sake of run-time integrity (no malloc issues) and performance (no overhead for memory allocation each time I need the buffer).
I understand the risk that if the max size of T is too large my linker will not be able to allocate a buffer that big, but that will be a compile-time failure, which can be accommodated, rather than a run-time failure, which cannot be tolerated. If, for example, I use size_t for the size of the payload and allocate the memory dynamically, there is a very real possibility that the system will not have that much memory available. I would much rather know this at compile time than at run time, where it will result in packet loss, data corruption, etc.
Looking at the function signature I provided, it is ridiculous to provide a type as a size parameter for a dynamically allocated buffer and not expect the possibility that a caller will use the max value of the type. So I'm not sure why there seems to be so much consternation about allocating that memory once, for good. I can see this being a huge problem in the Windows world where multiple processes are fighting for the same memory resources, but in the embedded world there's only one task to be done, and if you can't do that effectively, then it doesn't matter how much memory you saved.
Use _Generic:
#define MAX_SIZE(X) _Generic((X), \
    long: LONG_MAX,               \
    unsigned long: ULONG_MAX,     \
    /* ... */)
Prior to C11 there isn't a portable way to find an exact maximum value of an object of type T (all calculations with CHAR_BIT, for example, may yield overestimates due to padding bits).
Edit: Do note that under certain conditions (think segmented memory, or real-life memory limits) you might not be able to allocate a buffer large enough to equal the maximum value of a given type T.
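A sketch of how the macro can size a static buffer, assuming the association list covers the length type you actually use; unsigned short here is purely an example, and mainstream compilers accept _Generic inside a constant expression like this:
#include <limits.h>

typedef unsigned short T;            /* hypothetical length type */

#define MAX_SIZE(X) _Generic((X), \
    unsigned short: USHRT_MAX,    \
    unsigned int:   UINT_MAX,     \
    unsigned long:  ULONG_MAX)

/* large enough for any length a T can express: 65535 bytes here */
static unsigned char buffer[MAX_SIZE((T)0)];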
if T is unsigned, then would ((T) -1) work?
(This is probably really bad, and if so, please let me know why :-) )
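It does work for unsigned types: converting -1 to an unsigned type yields that type's maximum value (C guarantees modular conversion), and the result is a constant expression, so it can size a static array. A sketch, with unsigned short standing in for T:
#include <stdio.h>

typedef unsigned short T;            /* hypothetical length type */

static unsigned char buffer[(T)-1]; /* 65535 bytes on typical systems */

int main(void)
{
    printf("max length = %u\n", (unsigned)(T)-1);
    printf("buffer size = %zu\n", sizeof buffer);
    return 0;
}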
Is there a reason why you are allocating the maximum possible buffer size instead of a buffer that is only as large as you need? Why not have the caller simply specify the amount of memory needed?
Recall that the malloc() function takes an argument of type size_t. That means that (size_t)(-1) (which is SIZE_MAX in C99 and later) will represent the largest value that can be passed to malloc. If you are using malloc as your allocator, then this will be your absolute upper limit.
Maybe try using a bit shift?
let's see:
unsigned long long max_size = (1ULL << (8 * sizeof(T))) - 1;
sizeof(T) gives you the number of bytes T occupies in memory. (Not strictly the whole story: for structs the compiler may add padding, so this reasoning is only safe for plain integer types. Note also that the shift overflows, and is undefined, if T is as wide as unsigned long long itself.)
Breaking it down:
8 * sizeof(T) gives you the number of bits that size represents.
1 << x is the same as saying 2 to the power x, because every time you shift to the left, you multiply by two, just as every time you shift to the left in base 10, you multiply by 10.
- 1: an 8-bit number can hold 256 values, 0..255, so the maximum is one less than the power of two.
Interesting question. I would start by looking in the 'limits' header for the max value of a numeric type T. I have not tried it, but I would do something that uses T::max (that's C++, though, not C).

int vs. long in C

On my system, I get:
sizeof ( int ) = 4
sizeof ( long ) = 4
When I checked with a C program, both int & long overflowed to the negative after:
a = 2147483647;
a++;
If both can represent the same range of numbers, why would I ever use the long keyword?
int has a minimum range of -32767 to 32767, whereas long has a minimum range of -2147483647 to 2147483647.
If you are writing portable code that may have to compile on different C implementations, then you should use long if you need that range. If you're only writing non-portable code for one specific implementation, then you're right - it doesn't matter.
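A quick sketch to see what your own implementation provides; limits.h has the authoritative values:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* The standard only guarantees minimum ranges; these are the
       actual ones for this compiler/platform. */
    printf("int : %d .. %d\n", INT_MIN, INT_MAX);
    printf("long: %ld .. %ld\n", LONG_MIN, LONG_MAX);
    printf("sizeof(int) = %zu, sizeof(long) = %zu\n",
           sizeof(int), sizeof(long));
    return 0;
}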
Because sizeof(int)==sizeof(long) isn't always true. int normally represents the fastest size with at least 2*8 bits; long, on the other hand, is at least 4*8 bits.
C defines a number of integer types and specifies the relation of their sizes. Basically, what it says is that sizeof(long long) >= sizeof(long) >= sizeof(int) >= sizeof(short) >= sizeof(char), and that sizeof(char) == 1.
But the actual sizes are not defined, and depend on the architecture you are running on. On a 32-bit PC, int and long are typically four bytes and long long is 8 bytes. But on a 64-bit system, long is typically 8 bytes, and thus different from int.
There is also a type called uintptr_t (and intptr_t) that is guaranteed to be large enough to hold a data pointer.
The important thing to remember is to not assume that you can, for example, store pointer values in a long or an int. Being portable is probably more important than you think, and it is likely that you will want to compile your code on a 64-bit system in the near future.
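For example, a sketch of the round trip that is portable, and of what to avoid:
#include <stdint.h>

int x;

void demo(void)
{
    /* uintptr_t is defined to round-trip a pointer; int or long may be
       too narrow (e.g. long is 4 bytes on 64-bit Windows). */
    uintptr_t bits = (uintptr_t)&x;
    int *p = (int *)bits;   /* compares equal to &x */
    (void)p;
}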
I think it's more of a compiler issue nowadays; computers have become much faster and programs demand larger numbers than they used to.
On different platform or with a different compiler, the int and long may be different.
If you don't plan to port your code to anything else or use a different machine, then pick the one you want, it won't make a difference.
It depends on the compiler, and you might want to check this out: What does the C++ standard state the size of int, long type to be?
The size of built-in data types varies with the C implementation, but they all have minimum ranges. Nowadays, int is typically 4 bytes (32 bits), since most operating systems are 32-bit. Note that char is always 1 byte.
The size of a data type depends upon the compiler. Different compilers have different sizes for int and other data types.
So if you write code that is going to run on different machines, you should use long, or choose based on the range of values that your variable may have.
