Is my Blowfish algorithm "standard"? - c

I wrote a program, in C, with which I can encrypt and decrypt a text file.
The Blowfish algorithm itself is the standard one.
But here is my doubt: if I assemble each group of 4 chars from the file myself, say as 0x12345678, I can decode it later because I know the order in which I read the file.
On the other hand, if I read the same bytes with a pre-made function like memcpy(), the content ends up byte-reversed, as 0x78563412, not as my own routine produces. But the algorithm used is the same.
Is there a "standard" way to read and acquire data from a file, or are both of the previous approaches fine? On an online site (blowfish online), the memcpy() version does not match the site's output when using ECB mode, while the version that acquires the data as 0x12345678 works fine with it. (Working means that a file encrypted with my program decrypts correctly online.)
In other words, should anything I encrypt or decrypt with my program be decryptable (knowing the key) by other people who don't know my program, as a general rule at least?
EDIT: with memcpy(), the lowest index of the array ends up at the right end (the least significant byte) of the int.
This is the code that loads the data for a 64-bit block:
/* each half of the 64-bit block is meant to be 4 bytes; note that
   sizeof(unsigned long) is 8 on many 64-bit platforms, so copying an
   explicit 4 bytes is safer */
memcpy(&cl, &file_cache[i], 4);
memcpy(&cr, &file_cache[i + 4], 4);
And this is the core of the alternative version, which uses bitwise operations instead of memcpy() and handles the endianness problem by correctly rearranging the read from the buffer, i.e. looping 8 times per block, one byte at a time (this one works fine):
if (i == 0) {
    cl <<= 24;
    L |= 0xff000000 & cl;
}
else if (i == 1) {
    cl <<= 16;
    L |= 0x00ff0000 & cl;
}
else if (i == 2) {
    cl <<= 8;
    L |= 0x0000ff00 & cl;
}
else if (i == 3) {
    L |= 0x000000ff & cl;
}
else if (i == 4) {
    cl <<= 24;
    R |= 0xff000000 & cl;
}
else if (i == 5) {
    cl <<= 16;
    R |= 0x00ff0000 & cl;
}
else if (i == 6) {
    cl <<= 8;
    R |= 0x0000ff00 & cl;
}
else if (i == 7) {
    R |= 0x000000ff & cl;
}
Then L and R are sent to be encrypted. This last implementation agrees with other Blowfish implementations online, so in principle it should be the better one.
Which implementation is faster/better/lighter/stronger?
If the memcpy() version is the advised one, is there a convenient and fast way to reverse/mirror the contents of cl and cr?

Note that in cryptography the leftmost byte is usually the "first byte sent/received"; i.e. if you have an array, then the lowest index maps to the leftmost byte. If nothing has been specified, this is the de facto standard.
However, the Blowfish test vectors - as indicated by GregS - explicitly specify this default order, so there is no need to guess:
...
All data is shown as a hex string with 012345 loading as
data[0]=0x01;
data[1]=0x23;
data[2]=0x45;
...
As long as your code produces the same test vectors then you're OK, keeping in mind that your input / output should comply with the order of the test vectors.
It is highly recommended to make any cryptographic API operate on bytes (or rather, octets), not on other data types even if those bytes are internally handled as 32 or 64 bit words. The time required for conversion to/from bytes should be minimal compared to the actual encryption/decryption.
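As a concrete illustration of the byte-oriented approach, the whole 8-way if/else chain from the question can be collapsed into two big-endian loads built from shifts. This is only a sketch reusing the names from the question (file_cache, i, L, R) and assuming file_cache is an unsigned char buffer holding one 8-byte block starting at index i:
/* Sketch: load the two 32-bit halves of one 8-byte block big-endian
   (first byte = most significant), matching the Blowfish test vectors,
   independent of the host's endianness. */
uint32_t L, R;

L = ((uint32_t)file_cache[i]     << 24) |
    ((uint32_t)file_cache[i + 1] << 16) |
    ((uint32_t)file_cache[i + 2] <<  8) |
     (uint32_t)file_cache[i + 3];

R = ((uint32_t)file_cache[i + 4] << 24) |
    ((uint32_t)file_cache[i + 5] << 16) |
    ((uint32_t)file_cache[i + 6] <<  8) |
     (uint32_t)file_cache[i + 7];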

If you read the file as a sequence of 4-byte words then you would need to account for the endianness of those words in the memory layout, swapping the bytes as required to ensure the individual bytes are handled in a consistent order.
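If you do keep the word-at-a-time reads, a portable way to do that swap is a plain shift-and-mask helper; a sketch (compilers commonly recognise this pattern and emit a single byte-swap instruction):
/* Sketch: reverse the byte order of a 32-bit value (e.g. cl or cr after memcpy). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}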
However, if you read/write your file as a sequence of bytes and store them directly in sequence in memory (in an unsigned char array, for example), then the data in the file has the same layout as in memory. That way you get a consistent encoding/decoding whether you encode directly from/to memory or from/to the file.

Related

Improving speed of bit copying in a lossless audio encoding algorithm (written in C)

I'm trying to implement a lossless audio codec that will be able to process data coming in at roughly 190 kHz to then be stored to an SD card using SPI DMA. I've found that the algorithm basically works, but has certain bottlenecks that I can't seem to overcome. I was hoping to get some advice on how to best optimize a certain portion of the code that I found to be the "slowest". I'm writing in C on a TI DSP and am using -O3 optimization.
for (j = packet_to_write.bfp_bits; j > 0; j--)
{
    encoded_data[(filled / 16)] |= ((buf_filt[i] >> (j - 1)) & 1) << (filled++ % 16);
}
In this section of code, I am taking X number of bits from the original data and fitting it into a buffer of encoded data. I've found that the loop is fairly costly and when I am working with a set of data represented by 8+ bits, then this code is too slow for my application. Loop unrolling doesn't really work here since each block of data can be represented by a different number of bits. The "filled" variable represents a bit counter filling up Uint16 indices in the encoded_data buffer.
I'd like some help understanding where bottlenecks may come from in this snippet of code (and hopefully I can take those findings and apply that to other areas of the algo). The authors of the paper that I'm reading (whose algorithm I'm trying to replicate) noted that they used a mixture of C and assembly code, but I'm not sure how assembly would be useful in this case.
Finally, the code itself is functional and I have done some extensive testing on actual audio samples. It's just not fast enough for real-time!
Thanks!
You really need to change the representation that you use for the output data. Instead of just a target buffer and the number of bits written, expand this to:
//complete words that have been written
uint16_t *encoded_data;
//number of complete words that have been written
unsigned filled_words;
//bits waiting to be written to encoded_data, LSB first
uint32_t encoded_bits;
//number of bits in encoded_bits
unsigned filled_bits;
This uses a single 32-bit word to buffer bits until we have enough to write out a complete uint16_t. This greatly simplifies the shifting and masking, because you always have at least 16 free bits to write into.
Then you can write out n bits of any source word like this:
void write_bits(uint16_t bits, unsigned n) {
    uint32_t mask = ((uint32_t)0x0FFFF) >> (16 - n);
    encoded_bits |= (bits & mask) << filled_bits;
    filled_bits += n;
    if (filled_bits >= 16) {
        encoded_data[filled_words++] = (uint16_t)encoded_bits;
        encoded_bits >>= 16;
        filled_bits -= 16;
    }
}
and instead of your loop, you just write
write_bits(buf_filt[i], packet_to_write.bfp_bits);
No one-bit-at-a-time operations are required.
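One detail the snippet above leaves out: when the stream ends, filled_bits may still be non-zero, so the last partial word needs to be flushed before the buffer is handed to the SD-card/DMA layer. A possible sketch (padding the final word with zero bits is an assumption; use whatever your file format expects):
/* Sketch: flush any bits still buffered in encoded_bits at end of stream. */
void flush_bits(void) {
    if (filled_bits > 0) {
        encoded_data[filled_words++] = (uint16_t)encoded_bits;
        encoded_bits = 0;
        filled_bits = 0;
    }
}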

C code that reads a 4-byte little-endian number from a buffer

I encountered this piece of existing C code and I am struggling to understand it.
It supposedly reads a 4-byte unsigned value, passed in a buffer in little-endian format, into a variable of type "long".
This code runs on a 64-bit, little-endian x86 machine, where sizeof(long) is 8 bytes.
My guess is that this code is also intended to run on a 32-bit x86 machine, so a variable of type long is used instead of int for the sake of storing the value of the four input bytes.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer the questions below in that context.
void read_Value_From_Four_Byte_Buff(char *input)
{
    /* use long so on 32 bit machine, can still accommodate 4 bytes */
    long intValueOfInput;

    /* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ? */
    /* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ? */
    intValueOfInput = 0xFF & input[0];

    /* left shift byte-1 eight times, bitwise "or" places in 2nd byte frm right */
    intValueOfInput |= ((0xFF & input[1]) << 8);

    /* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
    intValueOfInput |= ((0xFF & input[2]) << 16);
    intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in little endian. But from the code it looks like the assumption is Byte 0 = MSB, Byte 1, Byte 2, Byte 3 = LSB. I thought so because the code reads bytes starting from Byte 0, and the subsequent bytes (1 onwards) are placed in the target variable after left shifting. Is that how it is, or am I getting it wrong?
2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy the value from a 4-byte buffer into a long variable?
3) Does the assumption "that this code will run on a 64 bit machine" have any bearing on how easily I can do this alternatively? I mean, is all this trouble just to keep it agnostic to word size (I assume it is word-size agnostic now - not sure though)?
Thanks for your enlightenment :-)
You have it backwards. When you left shift, you're putting bits into more significant positions. So ((0xFF & input[3]) << 24) puts Byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.
The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian and I think it might be undefined behavior according to the C standard.
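As an aside to (2): if you are happy to assume a little-endian target but want to avoid the alignment and strict-aliasing questions that the pointer cast raises, you can let memcpy do the copy. A sketch (the function name is just for illustration):
#include <string.h>
#include <stdint.h>

/* Sketch: still assumes the host is little endian, like the cast above,
   but copies byte-for-byte instead of reinterpreting the pointer. */
uint32_t read_le32_memcpy(const char *input)
{
    uint32_t x;
    memcpy(&x, input, sizeof x);
    return x;
}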
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char *input)
{
    uint32_t x;
    x = 0xFF & input[0];
    x |= (0xFF & input[1]) << 8;
    x |= (0xFF & input[2]) << 16;
    x |= (uint32_t)(0xFF & input[3]) << 24;
    return x;
}
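For illustration, with a hypothetical 4-byte buffer:
/* Hypothetical example: the buffer holds 0x78 0x56 0x34 0x12 (little endian). */
char buf[4] = { 0x78, 0x56, 0x34, 0x12 };
uint32_t v = read_value_from_four_byte_buff(buf);   /* v == 0x12345678 */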
From the code, Byte 0 is the LSB and Byte 3 is the MSB. Be careful with the shift amounts, though; the lines for the last two bytes must be
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping the 0xFF masks if you declare the argument as unsigned char * instead.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    intValueOfInput |= (unsigned long)(unsigned char)input[i] << shift; /* cast before shifting to avoid overflowing int */

logic operators & bit separation calculation in C (PIC programming)

I am programming a PIC18F94K20 to work in conjunction with an MCP7941X I2C RTCC chip and a 24AA128 I2C CMOS serial EEPROM device. Currently I have code which successfully initialises the seconds/days/etc. values of the RTCC and starts the timer, toggling an LED upon the turnover of every second.
I am attempting to augment the code to read back the correct data for these values; however, I am running into trouble when I try to account for the various 'extra' bits in the values. The device's memory map may help elucidate my problem somewhat.
Take, for example, the hours column, at the 02h address. Bit 6 is set to 1 to select 12-hour time, adding 01000000 (binary) to the hours byte. I can read back the entire contents of the byte at this address, but I want to employ an if statement to detect whether 12- or 24-hour time is in use, and adjust accordingly. I'm not worried about the 10-hour bits, as I can calculate those easily enough with a BCD conversion loop (I think).
I earlier used the bitwise OR operator in C to combine the hours value with that control bit. I initialised the hours in this particular case to 0x11, and set the 12-hour control bit, which is 0x40. When setting the time:
WriteI2C(0x11|0x40);
which as you can see uses the bitwise OR.
When reading back the hours, how can I incorporate operators into my code to separate the superfluous bits from the actual time bits? I tried doing something like this:
current_seconds = ReadI2C();
current_seconds = ST & current_seconds;
but that completely ruins everything. It compiles, but the device gets 'stuck' on this sequence.
How do I separate the ST / AMPM / VBATEN bits from the actual data I need, and what would be a good way of handling the various circumstances they present (e.g. reading back 12-hour time if bit 6 = 1 and 24-hour time if bit 6 = 0, and so on)?
I'm a bit of a C novice and this is my first foray into electronics so I really appreciate any help. Thanks.
To remove (zero) a bit, you can AND the value with a mask having all other bits set, i.e., the complement of the bits that you wish to zero, e.g.:
value_without_bit_6 = value & ~(1<<6);
To isolate a bit within an integer, you can AND the value with a mask having only those bits set. For checking flags this is all you need to do, e.g.,
if (value & (1<<6)) {
// bit 6 is set
} else {
// bit 6 is not set
}
To read the value of a small integer stored at some offset within a larger one, first isolate the bits, and then shift them right by the index of the lowest bit (to get the least significant bit into the correct position), e.g.:
value_in_bits_4_and_5 = (value & ((1<<4)|(1<<5))) >> 4;
For more readable code, you should use constants or #defined macros to represent the various bit masks you need, e.g.:
#define BIT_VBAT_EN (1<<3)
if (value & BIT_VBAT_EN) {
// VBAT is enabled
}
Another way to do this is to use bitfields to define the organisation of bits, e.g.:
typedef union {
    struct {
        unsigned ones:4;
        unsigned tens:3;
        unsigned st:1;
    } seconds;
    uint8_t byte;
} seconds_register_t;

seconds_register_t sr;
sr.byte = READ_ADDRESS(0x00);
unsigned int seconds = sr.seconds.ones + sr.seconds.tens * 10;
A potential problem with bitfields is that the code generated by the compiler may be unpredictably large or inefficient, which is sometimes a concern with microcontrollers, but obviously it's nicer to read and write. (Another problem often cited is that the organisation of bit fields, e.g., endianness, is largely unspecified by the C standard and thus not guaranteed portable across compilers and platforms. However, it is my opinion that low-level development for microcontrollers tends to be inherently non-portable, so if you find the right bit layout I wouldn't consider using bitfields “wrong”, especially for hobbyist projects.)
Yet you can accomplish similarly readable syntax with macros; it's just the macro itself that is less readable:
#define GET_SECONDS(r) ( ((r) & 0x0F) + (((r) & 0x70) >> 4) * 10 )
uint8_t sr = READ_ADDRESS(0x00);
unsigned int seconds = GET_SECONDS(sr);
Regarding the bit masking itself, you are going to want to make a model of that memory map in your microcontroller. The simplest, crudest way to do that is to #define a number of bit masks, like this:
#define REG1_ST 0x80u
#define REG1_10_SECONDS 0x70u
#define REG1_SECONDS 0x0Fu
#define REG2_10_MINUTES 0x70u
...
And then when reading each byte, mask out the data you are interested in. For example:
bool st = (data & REG1_ST) != 0;
uint8_t ten_seconds = (data & REG1_10_SECONDS) >> 4;
uint8_t seconds = (data & REG1_SECONDS);
The important part is to minimize the amount of "magic numbers" in the source code.
Writing data:
reg1 = 0;
reg1 |= st ? REG1_ST : 0;
reg1 |= (ten_seconds << 4) & REG1_10_SECONDS;
reg1 |= seconds & REG1_SECONDS;
Please note that I left out the I2C communication of this.
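Applied to the hours register the question asks about, the same masking pattern might look like the sketch below. Only bit 6 (the 12/24-hour select) is taken from the question; the AM/PM and 10-hour field positions are assumptions, so check them against the MCP7941X datasheet before relying on them:
#define HOURS_12_24   0x40u   /* bit 6: 1 = 12-hour mode (from the question) */
#define HOURS_AMPM    0x20u   /* assumed: AM/PM flag, valid in 12-hour mode */
#define HOURS_10H_12  0x10u   /* assumed: 10-hour digit in 12-hour mode */
#define HOURS_10H_24  0x30u   /* assumed: 10-hour digits in 24-hour mode */
#define HOURS_ONES    0x0Fu

uint8_t data = ReadI2C();     /* raw contents of the hours register (02h) */
bool    pm   = false;         /* assumes <stdbool.h>/<stdint.h> are available */
uint8_t hours;

if (data & HOURS_12_24) {     /* 12-hour mode */
    pm    = (data & HOURS_AMPM) != 0;
    hours = ((data & HOURS_10H_12) >> 4) * 10 + (data & HOURS_ONES);
} else {                      /* 24-hour mode */
    hours = ((data & HOURS_10H_24) >> 4) * 10 + (data & HOURS_ONES);
}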

Convert two 8-bit uint to one 12-bit uint

I'm reading two registers from a microcontroller. One holds the 4 MSBs (the other 4 bits of that register are used for something else) and the other holds the 8 LSBs. I want to combine them into one 12-bit uint (16-bit, to be precise). So far I've done it like this:
UINT16 x;
UINT8 RegValue1 = 0;
UINT8 RegValue2 = 0;

ReadRegister(Register01, &RegValue1);
ReadRegister(Register02, &RegValue2);

x = RegValue1 & 0x000F;
x = x << 8;
x = x | (RegValue2 & 0x00FF);
Is there any better way to do that?
To be more precise: ReadRegister is I2C communication with an external ADC. Register01 and Register02 are different addresses. RegValue1 is 8 bits, but only its 4 LSBs are needed; they are concatenated with RegValue2 (the 4 LSBs of RegValue1 followed by all 8 bits of RegValue2).
If you know the endianness of your machine, you can read the bytes
directly into x like this:
ReadRegister(Register01, (UINT8*)&x + 1);
ReadRegister(Register02, (UINT8*)&x);
x &= 0xfff;
Note that this is not portable and the performance gain (if any) will
likely be small.
The RegValue2 & 0x00FF mask is unnecessary since RegValue2 is already 8 bits wide.
Breaking it down into three statements may be good for clarity, but this expression is probably simple enough to implement in one statement:
x = ((RegValue1 & 0x0Fu) << 8u) | RegValue2;
The use of an unsigned literal (0x0Fu) makes little difference but emphasises that we are dealing with unsigned 8-bit data. It is in fact an unsigned int even with only two digits, but again this emphasises to the reader perhaps that we are only dealing with 8 bits, and is purely stylistic rather than semantic. In C there is no 8-bit literal constant type (though in C++ '\x0f' has type char). You can force better type agreement as follows:
#define LS4BITMASK ((UINT8)0x0fu)
x = ((RegValue1 & LS4BITMASK) << 8u) | RegValue2;
The macro merely avoids repetition and clutter in the expression.
None of the above is necessarily "better" than your original code in terms of performance or actual generated code, and is largely a matter of preference or local coding standards or practices.
If the registers are adjacent to each other, they will most likely also be in the correct order with respect to target endianness. That being the case, they can be read as a single 16-bit register and masked accordingly, assuming that Register01 is the lower address value:
ReadRegister16(Register01, &x ) ;
x &= 0x0fffu ;
Of course I have invented here the ReadRegister16() function, but if the registers are memory mapped, and Register01 is simply an address then this may simply be:
UINT16 x = *Register01 ;
x &= 0x0fffu ;

Why do both utf-16le and utf-16be exist? endianness efficiency - C

I was wondering why both utf-16le and utf-16be exist. Is it considered "inefficient" for a big-endian environment to process little-endian data?
Currently, this is what I use to store a 2-byte variable locally:
unsigned char octets[2];
short int shortint = 12345; /* (assuming short int = 2 bytes) */
octets[0] = shortint & 255;
octets[1] = (shortint >> 8) & 255;
I know that as long as I store and read with a fixed endianness locally, there is no endianness risk. I was wondering whether it's considered "inefficient"? What would be the most "efficient" way to store a 2-byte variable (while restricting the data to the environment's endianness, local use only)?
Thanks, Doori Bar
Having both encodings allows code to write large amounts of Unicode data to a file without conversion. During loading, you must always check the endianness. If you're lucky, you need no conversion. So in roughly 66% of cases you need no conversion, and only in 33% must you convert.
In memory, you can then access the data using the native datatypes of your CPU which allows for efficient processing.
That way, everyone can be as happy as possible.
So in your case, you need to check the encoding when loading the data but in RAM, you can use an array of short int to process it.
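For example, the endianness check on load might look something like the sketch below. It assumes the data starts with a BOM (U+FEFF), that the buffer length is even, and that data, len and an IS_BIG_ENDIAN macro describing the host exist; none of those are given in the question:
/* Sketch: detect the byte order from the BOM and swap in place when it
   differs from the host's native order. */
unsigned char *p = data;
int needs_swap = 0;
size_t i;

if (p[0] == 0xFF && p[1] == 0xFE)        /* little-endian BOM */
    needs_swap = IS_BIG_ENDIAN;
else if (p[0] == 0xFE && p[1] == 0xFF)   /* big-endian BOM */
    needs_swap = !IS_BIG_ENDIAN;

if (needs_swap) {
    for (i = 0; i + 1 < len; i += 2) {   /* swap each 16-bit code unit */
        unsigned char t = p[i];
        p[i]     = p[i + 1];
        p[i + 1] = t;
    }
}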
[EDIT] The fastest way to convert a 16bit value to 2 octets is:
char octet[2];
short *ptr = (short*)&octet[0];
*ptr = 12345;
Now you don't know if octet[0] holds the low or the upper 8 bits. To find out, write a known value and then examine it.
This will give you one of the encodings; the native one of your CPU.
If you need the other encoding, you can either swap the octets as you write them to a file (i.e. write them octet[1], octet[0]) or swap them in your code.
If you have several octets, you can use 32bit integers to swap two 16bit values at once:
char octet[4];
short *ptr = (short*)&octet[0];
*ptr++ = 12345;
*ptr++ = 23456;
int *ptr32 = (int*)&octet[0];
int val = ((*ptr32 << 8) & 0xff00ff00) | ((*ptr32 >> 8) & 0x00ff00ff);
