I cannot understand this behavior of struct pointers and XOR - c

I'm working with struct pointers for the first time, and I can't make sense of what's happening here. My test applies the basic property of XOR that says x ^ y ^ y = x, but it doesn't seem to hold in C?
The below code is in my main program, and accurately restores all of the letters of "test" (which I proceed to print on screen, but I've omitted a lot of junk so as to keep this question short(er)). The struct "aes" refers to this definition:
typedef uint32_t word;

struct aes {
    word iv[4];
    word key[8];
    word state[4];
    word schedule[56];
};
As the context might suggest, the encapsulating project is an AES implementation (I'm trying to speed up my current one by trying new techniques).
In my testing, make_string and make_state work reliably, even in the functions in question, but for reference's sake:
void make_string(word in[], char out[]) {
    for (int i = 0; i < 4; i++) {
        out[(i * 4) + 0] = (char) (in[i] >> 24);
        out[(i * 4) + 1] = (char) (in[i] >> 16);
        out[(i * 4) + 2] = (char) (in[i] >> 8);
        out[(i * 4) + 3] = (char) (in[i]);
    }
}
void make_state(word out[], char in[]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (word) (in[(i * 4) + 0] << 24) ^
                 (word) (in[(i * 4) + 1] << 16) ^
                 (word) (in[(i * 4) + 2] << 8) ^
                 (word) (in[(i * 4) + 3]);
    }
}
Anyway, here is the block that DOES work. It's this functionality that I'm trying to modularize by stowing it away in a function:
char test[16] = {
    'a', 'b', 'c', 'd',
    'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l',
    'm', 'n', 'o', 'p'
};

aes cipher;
struct aes *work;
work = &cipher;

make_state(work->state, test);
work->state[0] ^= 0xbc6378cd;
work->state[0] ^= 0xbc6378cd;
make_string(work->state, test);
And while this code works, doing the same thing by passing it to a function does not:
void encipher_block(struct aes *work, char in[]) {
    make_state(work->state, in);
    work->state[0] ^= 0xff00cd00;
    make_string(work->state, in);
}

void decipher_block(struct aes *work, char in[]) {
    make_state(work->state, in);
    work->state[0] ^= 0xff00cd00;
    make_string(work->state, in);
}
Yet, by removing the make_state and make_string calls in both encipher and decipher, it works as expected!
make_state(work->state, test);
encipher_block(&cipher, test);
decipher_block(&cipher, test);
make_string(work->state, test);
So to clarify, I do not have a problem! I just want to understand this behavior.

Change char to unsigned char. char may be signed, and likely is on your system, which causes problems when converting to other integer types and when shifting.
In the expression (char) (in[i] >> 24) in make_string, an unsigned 32-bit integer is converted to a signed 8-bit integer (in your C implementation). This expression may convert values to a char that are not representable in a char, notably the values from 128 to 255. According to C 2011 6.3.1.3 3, the result is implementation-defined or an implementation-defined signal is raised.
In the expression (word) (in[(i * 4) + 3] ) in make_state, in[…] is a char, which is a signed 8-bit integer (in your C implementation). This char is converted to an int, per the usual integer promotions defined in C 2011 6.3.1.1 2. If the char is negative, then the resulting int is negative. Then, when it is converted to a word, which is unsigned, the effect is that the sign bit is replicated in the high 24 bits. For example, if the char has value -112 (bits 0x90), the result will be 0xffffff90, but you want 0x00000090.
Change char to unsigned char throughout this code.
Additionally, in make_state, in[(i * 4) + 0] should be cast to word before the left shift. This is because it will start as an unsigned char, which is promoted to int before the shift. If it has some value with the high bit set, such as 0x80, then shifting it left 24 bits produces a value that cannot be represented in an int, such as 0x80000000. Per C 2011 6.5.7 4, the behavior is then undefined.
This will not be a problem in most C implementations; two’s complement is commonly used for signed integers, and the result will wrap as desired. Additionally, I expect this is a model situation that the compiler developers design for, since it is a very common code structure. However, to improve portability, casting to word will avoid the possibility of overflow.
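Putting both fixes together, the pair might look like this (a sketch of the advice above, not code from the question):

// Sketch: unsigned char throughout, and a cast to word *before*
// the left shift so a byte >= 0x80 cannot overflow an int.
void make_string(word in[], unsigned char out[]) {
    for (int i = 0; i < 4; i++) {
        out[(i * 4) + 0] = (unsigned char) (in[i] >> 24);
        out[(i * 4) + 1] = (unsigned char) (in[i] >> 16);
        out[(i * 4) + 2] = (unsigned char) (in[i] >> 8);
        out[(i * 4) + 3] = (unsigned char) (in[i]);
    }
}

void make_state(word out[], unsigned char in[]) {
    for (int i = 0; i < 4; i++) {
        out[i] = ((word) in[(i * 4) + 0] << 24) ^
                 ((word) in[(i * 4) + 1] << 16) ^
                 ((word) in[(i * 4) + 2] << 8) ^
                  (word) in[(i * 4) + 3];
    }
}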

The make_state() function overwrites the array passed in the first argument. If you put the encipher_block() and decipher_block() bodies inline, you get this:
/* encipher_block inline */
make_state(work->state, in);
work->state[0] ^= 0xff00cd00;
make_string(work->state, in);

/* decipher_block inline */
make_state(work->state, in); /* <-- Here's the problem */
work->state[0] ^= 0xff00cd00;
make_string(work->state, in);

Between the two XORs, the state takes a round trip through the char array (make_string, then make_state again). Because of the signed-char conversion issues described in the other answer, that round trip does not reproduce the XORed words, so the second XOR no longer cancels the first. In the version without the inner calls, the state is built once, XORed twice, and converted back once, which is why that one behaves.
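Combining the two answers (a sketch, not the original poster's code): once make_state and make_string use unsigned char as shown above and become lossless, the function pair behaves exactly like the inline block:

void encipher_block(struct aes *work, unsigned char in[]) {
    make_state(work->state, in);
    work->state[0] ^= 0xff00cd00;
    make_string(work->state, in);
}

void decipher_block(struct aes *work, unsigned char in[]) {
    make_state(work->state, in);   // now rebuilds the XORed state losslessly
    work->state[0] ^= 0xff00cd00;  // so this XOR cancels the first one
    make_string(work->state, in);
}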

Related

How to shift a character by another character in C

How would I go about circularly left-shifting every character in a string by the corresponding character in a 'password' string?
char *shift_encrypt(char *plaintext, char *password) {
    for (int i = 0; plaintext[i] != '\0'; i++) {
        plaintext[i] = (plaintext[i] << password[i]) | (plaintext[i] >> (8 - password[i]));
    }
    return plaintext;
}
EDIT:
To clarify what I am asking: if I wanted to circularly shift, for example, the character 'A' by the character 'p', I mean something along the lines of:
0100 0001 ('A') << 0x70 ('p')
shown bit-shifted left bit by bit
1. 1000 0010
2. 0000 0101
.
.
.
110. 0101 0000
111. 1010 0000
112. 0100 0001
So basically shifting by 1, 112 times?
To circular-shift an 8-bit object by large values like 112 ('p'), reduce the shift modulo 8. Note that % applied to a negative char and 8 is a remainder, not a true mod, so use unsigned math (hence the 8u).
Access plaintext[i] through an unsigned char * to avoid sign extension on right shifts.
Use size_t to index the string, to handle even very long strings.
Sample fix:
char *shift_encrypt2(char *plaintext, const char *password) {
    unsigned char *uplaintext = (unsigned char *) plaintext;

    for (size_t i = 0; uplaintext[i]; i++) {
        unsigned shift = password[i] % 8u;
        uplaintext[i] = (uplaintext[i] << shift) | (uplaintext[i] >> (8u - shift));
    }
    return plaintext;
}
Note: if the password string is shorter than the plaintext string, we have trouble. A possible fix would be to cycle back through password[], as shift_encrypt3 below does.
Advanced: use restrict to allow the compiler to assume plaintext[] and password[] do not overlap and emit potentially faster code.
char *shift_encrypt2(char * restrict plaintext, const char * restrict password) {
Advanced: Code really should access password[] as an unsigned char array too, yet with common and ubiquitous two's complement, password[i] % 8u makes no difference.
char *shift_encrypt3(char * restrict plaintext, const char * restrict password) {
    if (password[0]) {
        unsigned char *uplaintext = (unsigned char *) plaintext;
        const unsigned char *upassword = (const unsigned char *) password;

        for (size_t i = 0; uplaintext[i]; i++) {
            if (*upassword == 0) {
                upassword = (const unsigned char *) password;
            }
            unsigned shift = *upassword++ % 8u;
            uplaintext[i] = (uplaintext[i] << shift) | (uplaintext[i] >> (8u - shift));
        }
    }
    return plaintext;
}
Disclaimer: as pointed out in the comments and explained here, the C standard does not guarantee that letters are contiguous. However, the idea behind this answer still holds.
Characters are stored as numbers: each character is represented by its entry in the ASCII table. Adding 1 to a character brings you to the next one, and so on.
You should also be familiar with modular arithmetic. What is 'Z' + 1? In a wrap-around scheme it cycles back to the first character.
Putting this information together, the first printable character in the ASCII table is represented by the number 33 decimal ('!') and the last one by 126 ('~').
You can then make a shift function to shift a letter by n:
shift_letter(L, n)
    ret 33 + (((L - 33) + n) % (126 - 33 + 1))

The L - 33 is done to start from 0.
Then we add n.
We cycle back in case the result is greater than the number of possible characters: % (126 - 33 + 1). (The range 33..126 contains 94 characters, hence the + 1.)
Then we add the 33 offset back.
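In C, that might look like the following (a sketch; 94 is the count of printable characters from '!' (33) to '~' (126)):

#include <stdio.h>

// Shift printable character L by n positions, wrapping around
// within the printable ASCII range 33 ('!') to 126 ('~').
char shift_letter(char L, unsigned n) {
    return (char) (33 + ((L - 33) + n) % 94);
}

int main(void) {
    printf("%c\n", shift_letter('A', 112)); // 'A' shifted by 'p' (112) prints 'S'
    return 0;
}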
PS:
As I said in the comments, you are shifting in the mathematical sense, which not only makes no sense for the operation you want to do, but is also undefined behavior, because shifting by 112 means multiplying by 2^112, which is just a bit too much.

sprintf to convert hexadecimal array to decimal char array only reads first byte

I have an array:
unsigned char datalog[4];
datalog[0] = 0;
datalog[1] = 0xce;
datalog[2] = 0x50;
datalog[3] = 0xa3;
These represent the hex value 0xce50a3. Its decimal value is 13521059.
I need to convert this hex value to a decimal array, preferably using sprintf, so that the final outcome will be:
finalarray[0] = '1';
finalarray[1] = '3';
finalarray[2] = '5';
finalarray[3] = '2';
finalarray[4] = '1';
finalarray[5] = '0';
finalarray[6] = '5';
finalarray[7] = '9';
I've tried several combinations of sprintf inputs, including concatenating my hex array into unsigned long datalogvalue = 0xce50a3. But sprintf only reads its first byte when it converts.
ex:
sprintf(finalarray, "%d", *(unsigned long *)datalog);
yields:
finalarray[0] = '2';
finalarray[1] = '0';
finalarray[2] = '6';
finalarray[3] = ' ';
.....
206 is the decimal representation of 0xce. So it's only converting the first hex byte and not the rest.
Any thoughts on how to convert the entire unsigned long into a decimal array?
As some others have mentioned, attempting to read the bytes of an array in order as a number will be system-dependent as Big Endian and Little Endian systems will give different results.
Furthermore, type-punning through pointer-trickery is undefined behavior as it breaks strict aliasing. The legal way to type pun to a type other than a char-family array involves using unions to represent the data in more than one fashion. Due to the above Endian issue, though, you should not do that for this problem and instead do the bit-shifting method as mentioned in R Sahu's answer.
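For illustration, a union-based pun might look like this (a sketch; the printed value depends on the machine's byte order, which is exactly why it's the wrong tool for this problem):

#include <stdio.h>
#include <stdint.h>

// Type-punning through a union is legal in C, but the result
// depends on how the machine orders bytes in memory.
union pun {
    uint8_t  bytes[4];
    uint32_t word;
};

int main(void) {
    union pun p = { .bytes = { 0, 0xce, 0x50, 0xa3 } };
    printf("%u\n", (unsigned) p.word); // 2739981824 little-endian, 13521059 big-endian
    return 0;
}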
A simple solution that does not depend on endianness, int sizes, or pointer tricks:
Form the value:

// LU to use unsigned long math
((datalog[0]*256LU + datalog[1])*256 + datalog[2])*256 + datalog[3]

Print it:

sprintf(finalarray, "%lu", value);

Altogether:

sprintf(finalarray, "%lu",
        ((datalog[0]*256LU + datalog[1])*256 + datalog[2])*256 + datalog[3]);
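Wrapped in a complete program (a sketch; note that finalarray needs room for up to 10 digits plus the terminating null):

#include <stdio.h>

int main(void) {
    unsigned char datalog[4] = { 0, 0xce, 0x50, 0xa3 };
    char finalarray[11]; // a 32-bit value has at most 10 decimal digits

    sprintf(finalarray, "%lu",
            ((datalog[0]*256LU + datalog[1])*256 + datalog[2])*256 + datalog[3]);
    printf("%s\n", finalarray); // prints 13521059
    return 0;
}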
The outcome of casting a char* to unsigned long* and dereferencing that pointer depends on the endianness of your system. Unless efficiency of this particular calculation is critical for performance of your program, don't use such tricks. Use simple logic.
int res = (datalog[0] << 24) +
          (datalog[1] << 16) +
          (datalog[2] << 8) +
           datalog[3];
sprintf(finalarray, "%d", res);
If you are required to use unsigned long for your type, make sure to use the right format specifier for unsigned long in the call to sprintf.
unsigned long res = ((unsigned long) datalog[0] << 24) +  // cast before shifting so a
                    ((unsigned long) datalog[1] << 16) +  // byte >= 0x80 can't overflow int
                    ((unsigned long) datalog[2] << 8) +
                     (unsigned long) datalog[3];
sprintf(finalarray, "%lu", res);
First and foremost, endianness makes things a bit troublesome here.
In order to be able to reinterpret your buffer as a 32 bit int you would have to take endianness into consideration when packing.
For example, on my system which is little-endian, datalog would be interpreted as: 2739981824 if converted to a 32 bit unsigned int.
Hence I would have to pack my data according to datalog2 in the example below in order to get the desired 13521059.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main() {
    uint8_t datalog[4];
    datalog[0] = 0;
    datalog[1] = 0xce;
    datalog[2] = 0x50;
    datalog[3] = 0xa3;

    uint32_t temp = *((uint32_t*) datalog);
    printf("%u\n", temp); // 2739981824

    uint8_t datalog2[4];
    datalog2[0] = 0xa3;
    datalog2[1] = 0x50;
    datalog2[2] = 0xce;
    datalog2[3] = 0;

    uint32_t temp2 = *((uint32_t*) datalog2);
    printf("%u\n", temp2); // 13521059

    return 0;
}
There is, however, another problem with what you are asking.
If I interpret your question correctly, you would like to end up with another array where each base-10 digit of 13521059 gets its own index.
A reinterpretation of the bytes cannot do that: each decimal digit carries log2(10) ≈ 3.32 bits of information, so the digits do not line up with the bytes (or any whole number of bits) of the stored integer.
Therefore, to get an array with the packing that you suggest, you have to convert the value manually.
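Such a manual conversion might look like this (a sketch; it peels off decimal digits with % 10 and fills the array from the back):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 13521059;
    char digits[8]; // 13521059 has exactly 8 digits

    for (int i = 7; i >= 0; i--) {
        digits[i] = (char) ('0' + value % 10); // take the lowest decimal digit
        value /= 10;
    }
    printf("%.8s\n", digits); // prints 13521059
    return 0;
}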
Due to endianness, the bytes do not appear in the order you think they do:
IDEOne Link
#include <stdio.h>

int main(void) {
    unsigned char datalog[4];
    char finalarray[20] = {0};

    datalog[0] = 0xa3;
    datalog[1] = 0x50;
    datalog[2] = 0xce;
    datalog[3] = 0x00;

    sprintf(finalarray, "%lu", *(unsigned long*)datalog);
    printf("Answer: %s\n", finalarray);

    return 0;
}
Output
Answer: 13521059

How do I find unsigned BB[] such that (char *) (BB + 1) = "Red Ross!"

I have this short code fragment:
unsigned BB[] = ???;
printf("%s\n", (char *) (BB + 1));
I want the output of that printf to be "Red Ross!". I do not know how to approach this kind of problem; I think it has to do with the ASCII table.
Here are some assumptions that can be used:
32-bit little-endian platform
sizeof(char) == 1
sizeof(unsigned) == 4
Start by taking the target string and printing out the value of each byte:
char *str = "Red Ross!";
int i;
for (i=0; i<strlen(str); i++) {
printf("%02x ", str[i]);
}
This will tell you what values to use. Then you can use those values to populate your unsigned array.
Because unsigned is 32 bits wide on this little-endian platform, each element can store 4 bytes, and those bytes need to be stored in the reverse of their display order due to little-endian byte ordering.
For example, if you wanted to store 0x01, 0x02, 0x03, and 0x04 in that order in a single unsigned, you would do so as follows:

unsigned value = 0x04030201;

So using those two pieces of information, you can construct your unsigned array. Also, because the string starts at BB + 1, the value of the first element of the array doesn't matter.
Lastly, make sure the last element of the array is 0 so that your string is null terminated.
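Carrying those steps out (my own arithmetic, offered as an illustration): the bytes of "Red Ross!" are 52 65 64 20 52 6f 73 73 21, which pack four at a time, reversed, into:

#include <stdio.h>

int main(void) {
    // BB[0] is skipped by BB + 1, so its value is arbitrary.
    // 0x20646552 holds ' ','d','e','R'  -> bytes "Red " in memory
    // 0x73736f52 holds 's','s','o','R'  -> bytes "Ross" in memory
    // 0x00000021 holds '!' plus the terminating null bytes
    unsigned BB[] = { 0, 0x20646552, 0x73736f52, 0x00000021 };
    printf("%s\n", (char *) (BB + 1)); // prints "Red Ross!"
    return 0;
}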
Well, given that you've sacrificed a lot of portability already (around sizes of built-in types, complement schemes, and endianness), why not go the whole hog and use multicharacter constants?
int main(void) {
    unsigned BB[] = { [1] = ' deR', 'ssoR', '!' }; // little endian
    printf("%s\n", (char *) (BB + 1));
}
will do it, where I've made some fairly commonplace assumptions about your compiler's implementation of multicharacter constants. Note the use of the designated initialiser [1].
Saves all that fishing around in your character tables.
Working example at https://ideone.com/9DiFgy
Taking a look at which codes each letter corresponds to in the ASCII table might be helpful, but not necessary. You can assign character literals to an integer just fine, by using bitwise OR and bit shift:
uint32_t x = ('A' << 24) | ('B' << 16) | ('C' << 8) | 'D';
This puts 'A' in the most significant byte. Where that byte lands in memory depends on endianness; on little-endian, reading the bytes of x in memory order gives "DCBA".
This should be enough to solve the assignment. Do remember that strings are null terminated, so you need to end the "string" with a zero.
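To see the byte reversal concretely, a little demonstration (a sketch; the output shown assumes a little-endian machine):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t x = ('A' << 24) | ('B' << 16) | ('C' << 8) | 'D';
    char s[5] = { 0 };

    memcpy(s, &x, 4);  // copy the word's bytes in memory order
    printf("%s\n", s); // "DCBA" on little-endian
    return 0;
}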

Convert char to short

I need to copy the data from 2 chars (8 bits each) into a single short (16 bits). I tried two different ways, but can't get either to work.
void char2short(char* pchar, short* pshort)
{
    memcpy(pshort,     pchar + 1, 1);
    memcpy(pshort + 1, pchar,     1);
}
And the other one:
void char2short(char* pchar, short* pshort)
{
    short aux;
    aux = ((*pchar & 0x00FF) << 8) | ((*(pchar+1) & 0xFF00) >> 8);
    *pshort = aux;
}
#include <stdio.h>

void char2short(unsigned char* pchar, unsigned short* pshort)
{
    *pshort = (pchar[0] << 8) | pchar[1];
}

int main()
{
    unsigned char test[2];
    unsigned short result = 0;

    test[0] = 0xAB;
    test[1] = 0xCD;
    char2short(test, &result);
    printf("%#X\n", result);

    return 0;
}

This will do the job.
Assuming pchar is an array that contains your 2 chars, how about:

*pshort = (uint16_t)(((unsigned int)pchar[0]) |
                     (((unsigned int)pchar[1]) << 8));

P.S. This works for little endianness.
Others didn't explain why your code didn't work, so I'll take a quick stab at it:
memcpy(pshort , pchar + 1 , 1);
memcpy(pshort + 1, pchar , 1);
Adding to a pointer TYPE * p moves the pointer in increments of sizeof( TYPE ) (so it points at the next element; remember this is only defined inside an array). So while pchar + 1 is correct, pshort + 1 is not: it addresses the next short, one whole short past your two bytes.
aux = ((*pchar & 0x00FF) << 8) | ((*(pchar+1) & 0xFF00) >> 8);
Errr.... the right-hand side is broken in more ways than one. First, *(pchar+1) is a char that gets promoted to int before the & is applied, and & 0xFF00 throws away the low 8 bits, the only bits that ever held the char's value (for a non-negative char the result is exactly 0; for a negative char you keep nothing but sign-extension bits). And then you shift that 8 bits to the right...?
And just in case you weren't aware of it: the operands of & undergo the integer promotions, so even *pchar & 0xFF yields an int, not a char. The point of masking with 0x00FF (or 0xFF) is not to widen the type but to clear the sign-extension bits that a negative char picks up during promotion; without the mask, those high bits would survive the << 8 and corrupt the result.
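For completeness, a fixed-up version of the memcpy attempt might look like this (a sketch; it still hard-codes a byte order, namely pchar[1] into the lower-addressed byte):

#include <string.h>

void char2short_memcpy(const char *pchar, short *pshort)
{
    unsigned char *p = (unsigned char *) pshort; // address individual bytes

    memcpy(p,     pchar + 1, 1);
    memcpy(p + 1, pchar,     1);
}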
Another way to go about this not mentioned yet is the union:
#include <stdio.h>

struct chars_t
{
    // could also go for char[2] here,
    // whichever makes more sense semantically...
    char first;
    char second;
};

union combo_t
{
    // elements of a union share the memory, i.e.
    // reside at the same address, not consecutive ones
    short shrt;
    struct chars_t chrs;
};

int main()
{
    union combo_t x;
    x.chrs.first  = 0x01;
    x.chrs.second = 0x02;
    printf( "%x", x.shrt );
    return 0;
}
If you're using this in a larger context, beware of struct padding.
When doing bitwise operations, use robust code with real fixed-size integers, of known signedness. This will prevent you from writing bugs related to implicit type conversions, causing unintended signedness. The char type is particularly dangerous, since it has implementation-defined signedness. It should never be used for storing numbers.
#include <stdint.h>

void char2short(const uint8_t* pchar, uint16_t* pshort)
{
    *pshort = ((uint16_t)pchar[0] << 8) | (uint16_t)pchar[1];
}

Algorithm to write two's complement integer in memory portably

Say I have the following:
int32 a = ...;                   // value of variable irrelevant; can be negative
unsigned char *buf = malloc(4);  /* assuming octet bytes, this is just big
                                    enough to hold an int32 */
Is there an efficient and portable algorithm to write the two's complement big-endian representation of a to the 4-byte buffer buf in a portable way? That is, regardless of how the machine we're running represents integers internally, how can I efficiently write the two's complement representation of a to the buffer?
This is a C question so you can rely on the C standard to determine if your answer meets the portability requirement.
Yes, you can certainly do it portably:
int32_t a = ...;
uint32_t b = a;  // conversion to unsigned is well-defined: wraps modulo 2^32
unsigned char *buf = malloc(sizeof a);
uint32_t mask = (1U << CHAR_BIT) - 1;  // one-byte mask

for (int i = 0; i < sizeof a; i++)
{
    int shift = CHAR_BIT * (sizeof a - i - 1);  // downshift amount to put next
                                                // byte in low bits
    buf[i] = (b >> shift) & mask;               // save current byte to buffer
}
At least, I think that's right. I'll make a quick test.
unsigned long tmp = a;  // converts to "two's complement"
unsigned char *buf = malloc(4);

buf[0] = tmp >> 24 & 255;
buf[1] = tmp >> 16 & 255;
buf[2] = tmp >> 8  & 255;
buf[3] = tmp       & 255;

You can drop the & 255 parts if you're assuming CHAR_BIT == 8.
If I understand correctly, you want to store the 4 bytes of an int32 inside a char buffer, in a specific order (e.g. lower byte first), regardless of how int32 is represented.
Let's first make those assumptions explicit: CHAR_BIT == 8 (sizeof(char) is 1 by definition), two's complement, and sizeof(int32) == 4.
No, there is NO portable way in your code because you are trying to convert it to char instead of unsigned char. Storing a byte in char is implementation defined.
But if you store it in an unsigned char array, there are portable ways: right-shift the value by 8 bits at a time, and mask each byte out with the bitwise AND operator &:

// a is unsigned
1st byte = a       & 0xFF
2nd byte = a >> 8  & 0xFF
3rd byte = a >> 16 & 0xFF
4th byte = a >> 24 & 0xFF
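As actual C, that pseudocode might become the following (a sketch; note it writes the least significant byte first, so reverse the indices if you want the big-endian order the question asks about):

#include <stdint.h>

// Write the 4 bytes of a 32-bit value, least significant byte first.
void write_bytes_le(int32_t a, unsigned char buf[4])
{
    uint32_t u = (uint32_t) a;  // well-defined: wraps modulo 2^32

    buf[0] = u         & 0xFF;
    buf[1] = (u >> 8)  & 0xFF;
    buf[2] = (u >> 16) & 0xFF;
    buf[3] = (u >> 24) & 0xFF;
}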
