long int high and low bits pointers - c

I'm trying to implement the improved sequential multiplication algorithm in C, where the product register is twice the size of the multiplicand and the multiplier. On my platform an int is 4 bytes and a long long int is 8 bytes. I wanted to access the upper and lower 32 bits independently, so I pointed at them like this:
long long int product = 0;
int* high = (int*)&product;
int* low = (int*)&product;
low++;
This didn't work. I assumed that since an int occupies 4 bytes, a long long int occupies 8 bytes and its address would point to the most significant byte of the allocated memory. I'm not sure that is actually how memory is laid out. Can anyone please help me clear up this confusion?
I solved the problem by doing this:
long long int product=0;
int* low = (int*)&product;
int* high = (int*)&product;
high++;
but I'm still confused about why it works correctly.

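You are probably using a computer that is little-endian. On a little-endian machine the least significant byte is stored first (at the lowest address), so &product points at the low 32 bits and the next int after it holds the high 32 bits.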
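If all you need is to read or write the two 32-bit halves of the product, a portable alternative (a sketch, not what the asker used) is to work on the value with shifts and masks instead of pointer casts. This gives the same result on little- and big-endian machines and avoids accessing a long long through an int*. The helper names split_u64 and join_u64 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit product register into its high and
   low 32-bit halves, and join two halves back together. Shifts and masks
   act on the value, not on its memory layout, so endianness does not matter. */
void split_u64(uint64_t product, uint32_t *high, uint32_t *low)
{
    *low  = (uint32_t)(product & 0xFFFFFFFFu);  /* bits 0..31  */
    *high = (uint32_t)(product >> 32);          /* bits 32..63 */
}

uint64_t join_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

In a shift-and-add multiplier you would typically keep the product in a uint64_t and only split it when you need to inspect or update one half.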

Related

Best way to store 256 bit AVX vectors into unsigned long integers

I was wondering what the best way is to store a 256-bit AVX vector into four 64-bit unsigned long integers. From the functions listed at https://software.intel.com/sites/landingpage/IntrinsicsGuide/ the only way I could figure out was a masked store (code below). Is this the best way to do it, or are there other methods?
#include <immintrin.h>
#include <stdio.h>
#include <stdlib.h> // rand
int main() {
unsigned long long int i,j;
unsigned long long int bit[32][4];//256 bit random numbers
unsigned long long int bit_out[32][4];//256 bit random numbers for test
for(i=0;i<32;i++){ //load with 64 bit random integers
for(j=0;j<4;j++){
bit[i][j]=rand();
bit[i][j]=bit[i][j]<<32 | rand();
}
}
//--------------------load masking-------------------------
__m256i v_bit[32];
__m256i mask;
unsigned long long int mask_ar[4];
mask_ar[0]=~(0ULL);mask_ar[1]=~(0ULL);mask_ar[2]=~(0ULL);mask_ar[3]=~(0ULL); // all 64 bits set (0UL is only 32 bits on some platforms)
mask = _mm256_loadu_si256 ((__m256i const *)mask_ar);
//--------------------load masking ends-------------------------
//--------------------------load the vectors-------------------
for(i=0;i<32;i++){
v_bit[i]=_mm256_loadu_si256 ((__m256i const *)bit[i]);
}
//--------------------------load the vectors ends-------------------
//--------------------------extract from the vectors-------------------
for(i=0;i<32;i++){
_mm256_maskstore_epi64 ((long long *)bit_out[i], mask, v_bit[i]);
}
//--------------------------extract from the vectors end-------------------
for(i=0;i<32;i++){ //check that the stored values match the originals
for(j=0;j<4;j++){
if(bit[i][j]!=bit_out[i][j])
printf("----ERROR----\n");
}
}
return 0;
}
As others said in the comments, you do not need a masked store in this case. The following loop produced no errors in your program:
for(i=0;i<32;i++){
_mm256_storeu_si256 ((__m256i *) bit_out[i], v_bit[i]);
}
So the instruction you are looking for is _mm256_storeu_si256, which stores a __m256i vector to an unaligned address. If your data are 32-byte aligned, you can use _mm256_store_si256 instead. To see the values in your vectors, you can use this function:
#include <immintrin.h>
#include <stdalign.h>
#include <stdio.h>
alignas(32) unsigned long long int tempu64[4]; // 32-byte aligned, as _mm256_store_si256 requires
void printVecu64(__m256i vec)
{
_mm256_store_si256((__m256i *)&tempu64[0], vec);
printf("[0]=%llu, [1]=%llu, [2]=%llu, [3]=%llu\n\n", tempu64[0], tempu64[1], tempu64[2], tempu64[3]);
}
_mm256_maskstore_epi64 lets you choose which elements are stored to memory: an element is written only if the most significant bit of its 64-bit mask element is set, and otherwise the memory location is left unchanged. This is useful when you need per-element control over whether an element is stored or the existing memory value is kept.
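To make the masking rule concrete, here is a plain-C model of the behaviour described above. It is a hypothetical helper (maskstore_epi64_model is not an Intel API) and uses no intrinsics, so it runs anywhere; the real _mm256_maskstore_epi64 requires AVX2.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_maskstore_epi64: for each of the four
   64-bit lanes, copy src[i] to mem[i] only if bit 63 of mask[i] is set;
   otherwise mem[i] keeps its old value. */
void maskstore_epi64_model(uint64_t mem[4], const uint64_t mask[4],
                           const uint64_t src[4])
{
    for (int i = 0; i < 4; i++) {
        if (mask[i] >> 63)      /* only the most significant bit matters */
            mem[i] = src[i];
    }
}
```

This also shows why an all-ones mask, as in the question, behaves like a plain store, and why a mask filled from ~0UL on a platform where unsigned long is 32 bits (such as 64-bit Windows) would store nothing: the widened value 0x00000000FFFFFFFF has bit 63 clear.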
I was reading the Intel 64 and IA-32 Architectures Optimization Reference Manual (248966-032), 2016, page 410, and found that unaligned stores are still a performance killer:
11.6.3 Prefer Aligned Stores Over Aligned Loads
There are cases where it is possible to align only a subset of the
processed data buffers. In these cases, aligning data buffers used for
store operations usually yields better performance than aligning data
buffers used for load operations. Unaligned stores are likely to cause
greater performance degradation than unaligned loads, since there is a
very high penalty on stores to a split cache-line that crosses pages.
This penalty is estimated at 150 cycles. Loads that cross a page
boundary are executed at retirement. In Example 11-12, unaligned store
address can affect SAXPY performance for 3 unaligned addresses to
about one quarter of the aligned case.
I'm sharing this because some people said there is no difference between aligned and unaligned stores except when debugging!
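Following that advice means the destination buffer must sit on a 32-byte boundary before _mm256_store_si256 can be used. A minimal sketch using only standard C11 aligned_alloc (from <stdlib.h>, which expects the size to be a multiple of the alignment); the helper names alloc_aligned_rows and is_aligned32 are hypothetical. MSVC does not provide aligned_alloc; _aligned_malloc is the usual substitute there.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `rows` groups of four unsigned long longs
   (one 256-bit vector each) on a 32-byte boundary. Each row is exactly
   32 bytes, so every row is then suitable for _mm256_store_si256.
   Returns NULL on failure; release the block with free(). */
unsigned long long *alloc_aligned_rows(size_t rows)
{
    size_t bytes = rows * 4 * sizeof(unsigned long long);
    if (bytes == 0 || bytes % 32 != 0)
        return NULL;  /* aligned_alloc expects a multiple of the alignment */
    return aligned_alloc(32, bytes);
}

/* Returns nonzero if p lies on a 32-byte boundary. */
int is_aligned32(const void *p)
{
    return ((uintptr_t)p % 32) == 0;
}
```

For fixed-size arrays, alignas(32) (as used for tempu64 in printVecu64) is simpler; the heap version matters when the number of rows is only known at run time.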

How can I split integers into bytes without using arithmetic in C?

I am implementing the four basic arithmetic functions (addition, subtraction, division, multiplication) in C. The basic structure I imagined is: the program gets two operands from the user using scanf, then splits these values into bytes and computes! I've completed addition and subtraction, but I forgot that I shouldn't use arithmetic operators, so when splitting an integer into single bytes I wrote code like
while(quotient!=0){
bin[i]=quotient%2;
quotient=quotient/2;
i++;
}
but since that uses arithmetic operators I shouldn't use, I have to rewrite the splitting part, and I really have no idea how to split an integer into single bytes without using % or /.
To access the bytes of a variable, you can use type punning.
According to Standard C (C99 and C11), only unsigned char guarantees that this can be done safely.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;
union {
myint_t val;
unsigned char byte[sizeof(myint_t)];
} u;
Now you can access the bytes of x like this:
u.val = x;
for (int j = 0; j < sizeof(myint_t); j++)
printf("%d ",u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
You cannot assume the bytes are stored in any particular order.
So, before doing any computation with bytes, your program needs to determine the platform's byte order:
#include <limits.h> /* To use UCHAR_MAX */
unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */
u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0 ; j--)
u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have value 0, the next one 1, and so on.
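Put together, those steps give a small self-contained probe. This sketch follows the answer's technique (fill each byte with its own significance using 1u + UCHAR_MAX as the byte factor, then read the bytes back through the union); the function name byte_significance is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

typedef unsigned int myint_t;

/* Hypothetical helper: for each byte position k in memory (0 = lowest
   address), store in order[k] the significance of that byte within a
   myint_t (0 = least significant). A little-endian machine yields
   0,1,2,3 and a big-endian one 3,2,1,0 for a 4-byte myint_t. */
void byte_significance(unsigned char order[sizeof(myint_t)])
{
    union {
        myint_t val;
        unsigned char byte[sizeof(myint_t)];
    } u;
    const myint_t byte_factor = 1u + UCHAR_MAX;  /* 256 almost everywhere */

    u.val = 0;
    for (int j = (int)sizeof(myint_t) - 1; j >= 0; j--)
        u.val = u.val * byte_factor + (myint_t)j;  /* byte of significance j holds j */

    for (size_t k = 0; k < sizeof(myint_t); k++)
        order[k] = u.byte[k];                      /* read back in memory order */
}
```

Once order[] is known, byte k of any myint_t in memory can be interpreted as the digit of significance order[k] in base 1u + UCHAR_MAX.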
Assuming 32-bit integers (if that is not the case, just change the sizes), there are several approaches:
BYTE pointer
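#include <stdio.h>
typedef unsigned char BYTE; // use unsigned char if BYTE is not already defined
int main(void)
{
int x; // your integer or whatever else data type
BYTE *p = (BYTE *)&x;
x = 0x11223344;
printf("%x\n", p[0]);
printf("%x\n", p[1]);
printf("%x\n", p[2]);
printf("%x\n", p[3]);
return 0;
}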
Just take the address of your data as a BYTE pointer and access the bytes directly as a 1D array.
union
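#include <stdio.h>
typedef unsigned char BYTE; // use unsigned char if BYTE is not already defined
int main(void)
{
union
{
int x; // your integer or whatever else data type
BYTE p[4];
} a;
a.x = 0x11223344;
printf("%x\n", a.p[0]);
printf("%x\n", a.p[1]);
printf("%x\n", a.p[2]);
printf("%x\n", a.p[3]);
return 0;
}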
Then access the bytes directly as a 1D array through the union member p.
[notes]
If you do not have BYTE defined, replace it with unsigned char.
With the ALU you can use not only % and / but also >> and &, which are much faster but are still arithmetic.
Depending on the platform's endianness, the output can be 11,22,33,44 or 44,33,22,11, so keep that in mind (especially for code used on multiple platforms).
You need to handle the sign of the number. For unsigned integers there is no problem,
but signed integers are stored in two's complement on practically every platform, so it is better to separate the sign before splitting, like:
int s;
if (x<0) { s=-1; x=-x; } else s=+1;
// now split ...
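One caveat with x=-x: it overflows (undefined behaviour) when x == INT_MIN, because the positive value does not fit in an int. A sketch, assuming the usual 32-bit int, that avoids this by doing the negation in unsigned arithmetic, where wrap-around is well defined; split_sign is a hypothetical helper name.

```c
/* Hypothetical helper: separate the sign of x from its magnitude.
   Returns -1 or +1 and writes |x| to *magnitude as an unsigned int.
   The negation happens in unsigned arithmetic, which wraps modulo
   UINT_MAX + 1, so even INT_MIN produces the correct magnitude. */
int split_sign(int x, unsigned int *magnitude)
{
    if (x < 0) {
        *magnitude = 0u - (unsigned int)x;
        return -1;
    }
    *magnitude = (unsigned int)x;
    return +1;
}
```

The unsigned magnitude can then be split into bytes without any further sign handling.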
[edit2] logical/bit operations
x<<n, x>>n: shift x left or right by n bits
x&y: bitwise AND (performs a logical AND on each bit separately)
So when you have, for example, a 32-bit unsigned int (called DWORD), you can split it into BYTEs like this:
#include <stdint.h>
typedef uint32_t DWORD; // 32 bit unsigned int, if not already defined
typedef uint8_t BYTE;
int main(void)
{
DWORD x; // input 32 bit unsigned int
BYTE a0,a1,a2,a3; // output BYTES: a0 is the least significant, a3 the most significant
x=0x11223344;
a0=(BYTE)((x    )&255); // should be 0x44
a1=(BYTE)((x>> 8)&255); // should be 0x33
a2=(BYTE)((x>>16)&255); // should be 0x22
a3=(BYTE)((x>>24)&255); // should be 0x11
return 0;
}
This approach is not affected by endianness, but it does use the ALU.
The point is to shift the bits you want into bit positions 0..7 and mask out the rest.
The &255 mask and the explicit cast are not needed with every compiler, but some compilers do weird stuff without them, especially with signed variables like char or int.
For unsigned x: x>>n is the same as x/pow(2,n) = x/(1<<n)
and x&((1<<n)-1) is the same as x%pow(2,n) = x%(1<<n),
so (x>>8) == x/256 and (x&255) == x%256.
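Applying those identities to the loop from the question gives a version without % or /. It still uses the shift and AND operators, so whether that counts as "arithmetic" depends on the assignment's rules. A sketch; to_bits is a hypothetical name, and it takes an unsigned input so that >> is well defined.

```c
#include <stddef.h>

/* Hypothetical helper: write the binary digits of `value` into bin[],
   least significant bit first, and return how many digits were written
   (at most max_bits). quotient % 2 becomes quotient & 1, and
   quotient / 2 becomes quotient >> 1. */
size_t to_bits(unsigned int value, unsigned char bin[], size_t max_bits)
{
    size_t i = 0;
    unsigned int quotient = value;
    while (quotient != 0 && i < max_bits) {
        bin[i] = (unsigned char)(quotient & 1u);  /* was: quotient % 2 */
        quotient >>= 1;                           /* was: quotient / 2 */
        i++;
    }
    return i;
}
```

For example, 6 (binary 110) produces bin[0]=0, bin[1]=1, bin[2]=1 and a count of 3.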

Is it possible to compare and assign an unsigned char to an integer?

Variables T, max_gray and qtd_px are always greater than 0, and numeros_px is an unsigned char vector that stores values from 0-255.
If it's possible, please explain why: it seems to work fine here in CodeBlocks, but it doesn't make sense to me, because they are of different types: one is unsigned char and the other is int.
void filtro(unsigned char *numeros_px, int qtd_px, int T, int max_gray){
int i;
for(i=0; i<qtd_px; i++){
if(numeros_px[i]>= T) numeros_px[i]=max_gray;
else numeros_px[i]=0;
}
}
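Yes. In numeros_px[i] >= T the unsigned char operand is promoted to int (the usual integer promotions), so the comparison is done between two ints. Assigning an unsigned char to an int is also safe, because char is 8 bits wide and int is wider (it depends on the architecture: 32 bits on PCs and 32-bit ARMs, 16 bits on some small chips).
What you cannot safely do is the opposite, as you may lose data.
In your code this line is risky: numeros_px[i]=max_gray; because max_gray is a 32-bit int and you store it into an 8-bit variable. This is not a problem as long as max_gray <= 255; larger values are reduced modulo 256.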
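Both conversions can be seen in a short sketch, assuming 8-bit unsigned char as on PCs. The helpers store_gray and at_or_above are hypothetical and simply mirror the assignment and the comparison from the question.

```c
/* Hypothetical helper mirroring numeros_px[i] = max_gray. Converting an
   int to unsigned char is well defined: the result is the value modulo
   UCHAR_MAX + 1 (256 with 8-bit chars), so 300 becomes 44. */
unsigned char store_gray(int max_gray)
{
    unsigned char px = (unsigned char)max_gray;
    return px;
}

/* The comparison from the question: px is promoted to int before >=,
   so an unsigned char and an int are compared as two ints. */
int at_or_above(unsigned char px, int T)
{
    return px >= T;
}
```

If max_gray could ever exceed 255, clamping it before the assignment avoids the silent modulo wrap.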

How to make sure that two addresses have the least significant 4 bits the same?

So I have two pointers:
unsigned char * a;
unsigned char * b;
Let's assume that I used malloc and both point to blocks of a certain size.
I want the least significant 4 bits of the two addresses to be the same... but I really don't know how.
First of all I want to take the least significant 4 bits from a. I tried something like
int least = (&a) & 0x0f;
but I get an error about invalid operands to &. I was thinking of allocating more memory for b and searching it for an address whose least significant 4 bits are the same as a's, but I really have no idea how to do that.
#include <stdint.h> /* uintptr_t */
#include <stdlib.h>
#include <stdio.h>
int main()
{
unsigned char *a;
unsigned char *b;
a = malloc(8);
b = malloc(8);
if (((uintptr_t)a & 0x0F) == ((uintptr_t)b & 0x0F)) {
printf("Yeah, the least 4 bits are the same.\n");
} else {
printf("Nope, the least 4 bits are not the same.\n");
}
free(a);
free(b);
return EXIT_SUCCESS;
}
Try this:
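#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned char *a, *b;
a = malloc(32);
if (a == NULL)
return EXIT_FAILURE;
b = a + 16;
printf("%p %p\n", (void *)a, (void *)b); // You should see that their least significant
// 4 bits are equal
free(a); // b points into the same block, so only a is freed
return EXIT_SUCCESS;
}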
Since a and b are 16 bytes apart and part of one contiguous memory block, their addresses will have the property you want.
One possible way to solve this problem is to use an allocation function that only returns allocations aligned on 16-byte boundaries (so the least significant 4 bits will always be zero).
Some platforms have such alignment-guaranteed allocation functions such as _aligned_malloc() in MSVC or posix_memalign() on Unix variants. If you don't have such an allocator available, returning an aligned block of memory using plain vanilla malloc() is a common interview question - an internet search will net you many possible solutions.
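For reference, one common shape of that answer, sketched under the assumption that the alignment is a power of two: over-allocate with malloc, round the address up to the next multiple of the alignment, and store the original pointer just before the aligned block so it can be freed later. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical; prefer the platform functions above when they exist.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return `size` bytes aligned to `align` (a power of
   two) using only malloc. The original malloc pointer is kept in the
   pointer-sized slot immediately before the returned address. */
void *aligned_malloc_sketch(size_t size, size_t align)
{
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Leave room for the stored pointer, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from aligned_malloc_sketch. */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With align = 16, both a and b get addresses whose least significant 4 bits are zero, so they trivially match.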
What about this:
uintptr_t least; // needs <stdint.h>
least = (uintptr_t)a ^ (uintptr_t)b; //bitwise XOR of the two addresses: 0 wherever the bits are the same
if ((least & 0x0F) == 0)
{
//the low four bits are all zero, meaning they all match
}

How is data stored at the bit level according to "endianness"?

I read about Endianness and understood squat...
so I wrote this
typedef unsigned char BYTE;
int main()
{
int k = 0xA5B9BF9F;
BYTE *b = (BYTE*)&k; //value at *b is 9f
b++; //value at *b is BF
b++; //value at *b is B9
b++; //value at *b is A5
return 0;
}
k was equal to A5 B9 BF 9F
and walking through it with a BYTE pointer gave 9F BF B9 A5,
so I get it: the bytes are stored backwards... OK.
~
So then I wondered how it is stored at the BIT level...
I mean, is "9f" (1001 1111) stored as "f9" (1111 1001)?
so I wrote this
int _tmain(int argc, _TCHAR* argv[])
{
int k = 0xA5B9BF9F;
void *ptr = &k;
bool temp= TRUE;
cout<<"ready or not here I come \n"<<endl;
for(int i=0;i<32;i++)
{
temp = *( (bool*)ptr + i );
if( temp )
cout<<"1 ";
if( !temp)
cout<<"0 ";
if(i==7||i==15||i==23)
cout<<" - ";
}
}
I get random output;
even for numbers like 32 I don't get anything sensible.
Why?
Just for completeness, machines are described in terms of both byte order and bit order.
The Intel x86 is called Consistent Little Endian because it stores multi-byte values in LSB to MSB order as memory address increases. Its bit numbering convention is b0 = 2^0 and b31 = 2^31.
The Motorola 68000 is called Inconsistent Big Endian because it stores multi-byte values in MSB to LSB order as memory address increases. Its bit numbering convention is b0 = 2^0 and b31 = 2^31 (same as Intel, which is why it is called 'Inconsistent' Big Endian).
The 32-bit IBM/Motorola PowerPC is called Consistent Big Endian because it stores multi-byte values in MSB to LSB order as memory address increases. Its bit numbering convention is b0 = 2^31 and b31 = 2^0.
Under normal high level language use the bit order is generally transparent to the developer. When writing in assembly language or working with the hardware, the bit numbering does come into play.
Endianness, as you discovered by your experiment, refers to the order in which bytes are stored in an object.
Bits do not get stored differently: a byte is always 8 bits, and its value always reads the "human-readable" way (high->low).
Now that we've established that you don't need that code... about your code:
for(int i=0;i<32;i++)
{
temp = *( (bool*)ptr + i );
...
}
This isn't doing what you think it's doing. You're iterating over 0-31, the indices of the bits in a 32-bit word - good. But your temp assignment is all wrong :)
It's important to note that pointer arithmetic works in units of the pointed-to type, never in bits: (bool*)ptr + i advances by i * sizeof(bool) bytes, which is i bytes on common compilers. So each iteration reads a whole byte as a bool, not a single bit.
When i>3, you're reading past the end of k... this is undefined behavior and could even cause a segfault.
What you want to use is bit-masks. Something like this should work:
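for (int i = 0; i < 32; i++) {
unsigned int mask = 1u << i; // unsigned, so shifting into bit 31 is well defined
bool bit_is_one = (*static_cast<unsigned int*>(ptr) & mask) != 0; // test bit i of the value k, not of the address
...
}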
Your machine almost certainly can't address individual bits of memory, so the layout of bits inside a byte is meaningless. Endianness refers only to the ordering of bytes inside multibyte objects.
To make your second program make sense (though there isn't really any reason to, since it won't give you any meaningful results) you need to learn about the bitwise operators - particularly & for this application.
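A sketch of what the second program seems to be after, using a mask on the value instead of pointer arithmetic (written in C like the first program; bits_to_string is a hypothetical name). Printing from most to least significant is a display choice, not a statement about how the hardware stores the bits.

```c
#include <stdint.h>

/* Hypothetical helper: write the 32 bits of `value` into out[] as '0'/'1'
   characters, most significant bit first, followed by '\0'.
   out must have room for 33 characters. */
void bits_to_string(uint32_t value, char out[33])
{
    for (int i = 31; i >= 0; i--) {
        uint32_t mask = (uint32_t)1 << i;          /* selects bit i */
        out[31 - i] = (value & mask) ? '1' : '0';
    }
    out[32] = '\0';
}
```

For k = 0xA5B9BF9F this yields 10100101101110011011111110011111; the last eight characters are the 9f byte, 1001 1111, not f9.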
Byte Endianness
On different machines this code may give different results:
union endian_example {
unsigned long u;
unsigned char a[sizeof(unsigned long)];
} x;
x.u = 0x0a0b0c0d;
int i;
for (i = 0; i< sizeof(unsigned long); i++) {
printf("%u\n", (unsigned)x.a[i]);
}
This is because different machines are free to store values in any byte order they wish. This is fairly arbitrary. There is no backwards or forwards in the grand scheme of things.
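Because the in-memory order is arbitrary, code that writes values to files or networks usually picks an order explicitly and builds the bytes with shifts, which gives the same bytes on every machine. A minimal sketch for big-endian order (most significant byte first); put_be32 and get_be32 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helpers: store/load a 32-bit value as 4 bytes in big-endian
   order (most significant byte at index 0), independent of the host's
   own byte order. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

With x.u = 0x0a0b0c0d from the example above, put_be32 always produces the bytes 0a 0b 0c 0d, whatever the union prints on a given machine.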
Bit Endianness
Usually you never have to worry about bit endianness. The most common way to access individual bits is with shifts (>>, <<), but those are really tied to values, not bytes or bits. They perform an arithmetic operation on a value. That value is stored in bits (which are in bytes).
Where you may run into a problem in C with bit endianness is if you ever use a bit field. This is a rarely used (for this reason and a few others) "feature" of C that allows you to tell the compiler how many bits a member of a struct will use.
struct thing {
unsigned y:1; // y will be one bit and can have the values 0 and 1
signed z:1; // z can only have the values 0 and -1
unsigned a:2; // a can be 0, 1, 2, or 3
unsigned b:4; // b is just here to take up the rest of the byte
};
Here the bit ordering is compiler-dependent. Should y be the most or least significant bit in a thing? Who knows? If you care about the bit ordering (describing things like the layout of an IPv4 packet header, the control registers of a device, or just a storage format in a file), then you probably don't want to worry about some different compiler doing this the wrong way. Also, compilers aren't always as smart about how they work with bit fields as one would hope.
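When bit positions matter, a common alternative is to pack the fields yourself with shifts and masks, so the layout is whatever you write rather than whatever the compiler picks. The sketch below packs fields shaped like y, z, a and b from struct thing into one byte, with a layout chosen here as an assumption (y in bit 0, z in bit 1, a in bits 2-3, b in bits 4-7). pack_thing and the thing_* accessors are hypothetical.

```c
#include <stdint.h>

/* Hypothetical explicit layout for the fields of struct thing in one byte:
   bit 0 = y, bit 1 = z (set means -1), bits 2-3 = a, bits 4-7 = b.
   The layout is fixed by this code, not by the compiler. */
uint8_t pack_thing(unsigned y, int z, unsigned a, unsigned b)
{
    return (uint8_t)((y & 0x1u)
                   | ((z ? 1u : 0u) << 1)
                   | ((a & 0x3u) << 2)
                   | ((b & 0xFu) << 4));
}

unsigned thing_y(uint8_t v) { return v & 0x1u; }
int      thing_z(uint8_t v) { return ((v >> 1) & 0x1u) ? -1 : 0; }
unsigned thing_a(uint8_t v) { return (v >> 2) & 0x3u; }
unsigned thing_b(uint8_t v) { return (v >> 4) & 0xFu; }
```

For example, pack_thing(1, -1, 2, 9) is 0x9B on every compiler, which is exactly the property a file format or register description needs.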
This line here:
temp = *( (bool*)ptr + i );
... when you do pointer arithmetic like this, the compiler moves the pointer on by the number you added times the sizeof the thing you are pointing to. Because you are casting your void* to a bool*, the compiler moves the pointer along by the size of one bool per step (sizeof(bool) is implementation-defined: 1 byte on common compilers, while the Windows BOOL type is an int), so each read fetches at least a whole byte rather than one bit, and you'll be printing out memory from further along than you thought.
You can't address the individual bits in a byte, so it's almost meaningless to ask which way round they are stored. (Your machine can store them whichever way it wants and you won't be able to tell). The only time you might care about it is when you come to actually spit bits out over a physical interface like I2C or RS232 or similar, where you have to actually spit the bits out one-by-one. Even then, though, the protocol would define which order to spit the bits out in, and the device driver code would have to translate between "an int with value 0xAABBCCDD" and "a bit sequence 11100011... [whatever] in protocol order".
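To illustrate that translation step, here is a sketch that lists the bits of one byte in the order a serial interface would send them. Which order applies is set by the protocol; for instance, UART framing as used with RS-232 normally sends the least significant bit first, while I2C sends the most significant bit first. wire_order is a hypothetical helper and assumes 8-bit bytes.

```c
/* Hypothetical helper: write the bits of `v` in transmission order as
   '0'/'1' characters plus '\0'. lsb_first != 0 sends bit 0 first;
   otherwise bit 7 goes out first. out must hold 9 characters. */
void wire_order(unsigned char v, int lsb_first, char out[9])
{
    for (int k = 0; k < 8; k++) {
        int bit = lsb_first ? k : 7 - k;   /* which bit goes out k-th */
        out[k] = ((v >> bit) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}
```

The value 0xA0 goes out as 10100000 MSB-first and as 00000101 LSB-first; the byte in memory is the same either way.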

Resources