Encrypting a Large File of irregular size using AES algorithm - c

Currently in my project, i am required to encrypt a large file of variable size (around 1 to 1.5 GB)
I am using the aes algorithm from the openssl project. But i am not using the entire library, but just a few functions which generate keys from "passwords" and use those keys to encrypt a fixed block of 128 bytes
In short,
void aes_encrypt(char* in, char* out , AES_KEY ekey);
void aes_decrypt(char* in, char* out , AES_KEY dkey);
The main problem now is that these functions work with a block size of 128 bytes only.
So i must write a wrapper function which takes my file and divides it into chunks of 128 bytes, and feed it to these encryption/decryption routines.
So my question is,
In my wrapper, how do i handle the case where the file size is not an
integer multiple of 128
Do i need to pad the encrypted file with 0's to make a multiple of
If this is the case, how do i recognize the amount zero padding i have done, as i understand that just removing the last bits of 0
from a file, may result in the file losing integrity especially if
the file happens to contain a 0 at the end.
Is it a better approach to prepend a header to the encrypted file,
containing the size information of the file and possibly its
checksum.
Thanks.
ps: I am new to encryption(especially AES)

The block length is 128 bit or 16 bytes. You can for example use PKCS7 padding (see section 10.3 of RFC 2315) to make the last block 16 bytes long.
It works like this: if one byte needs to be added you add a byte with value (all values shown in hex) 01, if two bytes need to be added you add two bytes with values 02, and so on. In the case that no padding is required you still have to add a block with 16 padding bytes with value 10.
To remove the padding bytes, just look at the last byte of the file, it gives the number of bytes to remove.
Also note that ECB mode (encrypting blocks independently of each other) is probably not the best to use, have a look at CBC mode as well.

Related

How to encrypt variable size of data by using AES (CBC) 128 Algorithm

I want encrypt 4 byte of data, but AES takes 16 byte of data as input and gives 16 bytes as output,
So How to overcome this problem, If some have source code please share
Thanks
To encrypt 4 byte of data we need to add 12 more bytes and make a block of 16 bytes, its take more data length to transfer via wireless, So How to send Encrypted data with Real data length
On encryption, append 12 bytes of random junk.
On decryption, ignore the last 12 bytes.
If length is not fixed, then
[1-bit junk][7-bit bit-length][n-bits of data][128-8-n junk bits to fill to 128]
Cipher block chaining (CBC mode) requires that the plaintext be a mulitple of the block size. If it is not, you'll need to pad it out to a multiple of the block size.
Cipher-feedback mode (CFB mode) exists primarily to avoid this -- as it uses the block cipher as a way of generating key bits to XOR with the the plaintext to encrypt it, it can be used with any bit length and padding is not needed. With CFB mode, however, it is critical to not reuse IVs, as that will directly leak the first block of the plaintext. In CBC mode, reusing IVs is still bad (will leak info about correlated inputs), but arguably not as bad.

Encryption algorithm for 14 byte data

I have an application, where I have to transfer an encrypted packet over BLE advertisement which is fixed 14 byte in length. AES algorithm restricts data to be minimum of 16 bytes long and DES requires it to be in multiples of 8 byte. I have a odd length of 14 byte which I cannot change. Is there any encryption algorithm which can be used to encrypt this 14 byte data. Also it would be good, if someone could point out any C based implementation of the algorithm?
Assuming you mean both plaintext and ciphertext are exactly 14 bytes in length:
Use AES in CTR mode. This yields the same 16 byte chunks of data on each side. You can use 14 of the 16 bytes as an XOR key and discard the last two.
Initialization of the IV is done with 14 bytes of IV and two bytes of zeros.
However, there's a wrinkle here. The underlying protocol is stateless broadcast. The only way to get an IV is to use a unique packet identifier, and there might not be one. Without about 10 additional bytes I would have a very hard time coming up with a unique IV generator.
You can use ECB-based Ciphertext Stealing (CTS) with a block cipher that has a block size smaller than 14 byte. I won't go into detail how CTS works, because the Wikipedia article does a pretty good job.
If you heard that ECB is bad then you are correct. Sadly any other mode requires some sort of IV which will eat away your payload constraint. Since CTS moves a part of the ciphertext to the last plaintext block the bad property of ECB goes away.
Block ciphers with small block sizes such as 64-bit have worse security properties than say block ciphers with 128-bit blocks. Just look at the Sweet32 attack. In your case I would guess that this is not really an issue since an attacker cannot get you to generate many many encrypted broadcasts and if they tried it would take them a really long time.
A popular block cipher with 64-bit block size is DES. You might have heard that DES is easily brute forceable due to the small key size and you would be correct. Triple-DES (EEE or EDE) comes to the rescue which has a key size that is three times larger and has a much better protection against brute force attacks.
Sadly, you cannot use AES because CTS needs at least two blocks to work and a single block of AES already breaks your size constraint.
Both AES and DES are Block Cipher algorithms, so you're right that they have minimum sizes for the blocks they can handle. BUT there are many different ways to use block encryption, some of which can process streams of data of arbitrary length. Check out "s-bit Cipher Feedback Mode" where the s is the size of the actual chunks that are individually being encrypted. Here you could use s=2.
Why not frame your data so that it features:
header
payload
padding
The header would contain the length of the data, plus maybe a "magic" code to check synchronisation. The payload is your data. The padding is then a block of zeroes (or random numbers if you so wish) in order to fit the necessary multiple of blockside for the block-cipher.
Assuming you want to transmit in the same 14 bytes you are a bit stuck. The suggestions above mean that you would need to transmit 16 bytes (or even more), then ignore the last 2 bytes.
How secure do you need the data to be? One answer might be to use DES to encrypt all the full blocks (one block of 8 bytes in your case), then use something weaker to encode the remaining (6 in your case) bytes.
A reasonably secure way would be to simply xor the previous 6-byte decoded byte values over your remaining 6 bytes of tail data. Until the DES protected data has been decoded, the xor value cannot be determined.
Of course, if the DES portion was all-zeros then your tail codes are going to be visible. If this is a rare occurrence this may not be an issue as the hacker does not know that the DES data is zeros. You don't have a lot of data, so it is hard to think of much more that you could do.
In general, you could generate a 64-bit (8-byte) checksum of the previous 1K of original data, and use this as the xor. In general you will have a number less than 8 bytes of tail data, which you would xor, and only transmit the number of bytes. On decode, the remaining bytes are ignored.

Do I need output buffer to be multiple of AES block size for CTR mode?

In my C code, I use OpenSSL's AES in CTR mode for encryption via EVP interface, that is I have something like this:
ret = EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv))
...
ret = EVP_EncryptUpdate(ctx, out, &outlen, in, inlen);
Length of input data isn't multiple of AES block size, so if used, say, CBC mode, I would have to allocate additional bytes for padding in output buffer.
Do I need to do the same thing in CTR mode? And also I'm curious about CFB and OFB modes.
OpenSSL man on EVP_EncryptUpdate doesn't say anything specific about "stream-like" modes, they just warn additional bytes for padding are needed.
TLDR: EVP does not pad CTR (and other stream modes/ciphers)
Actually EVP_EncryptUpdate itself never pads, although it can 'carry' a partial block from one call to the next; thus if the ... in your code does not include any previous EncryptUpdate then this (first) EncryptUpdate will always be block-aligned and its output will never be longer:
EVP_EncryptUpdate() encrypts inl bytes from the buffer in and writes the encrypted version to out. This function can be called multiple times to encrypt successive blocks of data. The amount of data written depends on the block alignment of the encrypted data: as a result the amount of data written may be anything from zero bytes to (inl + cipher_block_size - 1) so out should contain sufficient room. The actual number of bytes written is placed in outl. It also checks if in and out are partially overlapping, and if they are 0 is returned to indicate failure.
It is EVP_EncryptFinal[_ex] that adds the padding and thus 'extra' output when needed as described in the next paragraph of the man page:
If padding is enabled (the default) then EVP_EncryptFinal_ex() encrypts the "final" data, that is any data that remains in a partial block. It uses standard block padding (aka PKCS padding) as described in the NOTES section, below. The encrypted final data is written to out which should have sufficient space for one cipher block. The number of bytes written is placed in outl. After this function is called the encryption operation is finished and no further calls to EVP_EncryptUpdate() should be made.
If padding is disabled then EVP_EncryptFinal_ex() will not encrypt any more data and it will return an error if any data remains in a partial block: that is if the total data length is not a multiple of the block size.
However, since the ciphertext actually consists of the output of all the Update calls (or the call if only one) plus the output of the Final call, the total buffer does need to be (up to) one block larger than input in cases where padding applies.
What the man page doesn't say (and should) is that padding doesn't apply to stream ciphers (like RC4) or stream modes (including CTR OFB* CFB*) at all. Each cipher/mode combination is described by an object of type EVP_CIPHER which is a typedef for struct evp_cipher_st; this is what EVP_aes_256_ctr() and similar routines return a pointer to. Among other things this structure contains a field block_size which contains, you guessed it, the block size (in bytes). For stream modes and ciphers it contains a dummy value 1 which is treated specially: EncryptFinal_ex does not add padding even if enabled, and DecryptFinal_ex does not remove and check padding even if enabled.
Do I need output buffer to be multiple of AES block size for CTR mode?
No, CTR mode does not have that requirement. In this respect, CTR mode is like OFB anb CFB mode. In contrast ECB and CBC mode requires a multiple of the block size.
OpenSSL uses the Init/Update/Final pattern, and both input and output bytes are buffered until needed. You can insert one byte at a time, if you'd like.
Internally, the EVP object will buffer input until there's enough to perform an encryption. Since CTR mode can be streamed, one output byte will be available after processing one input byte (its a simple XOR of the encrypted counter with the plain text). The EVP object will also buffer output bytes until you call Final.

Encryption and Decryption using AES in Linux Kernel

I want to encrypt files during their creation and decrypt files during their read operation using AES algorithm. I have also written code in vfs_write() and vfs_read() for encryption and decryption respectively and also it is working nicely, but the only problem now is that k whenever I pass a file to vfs_write() whose length is not multiple of 16 (AES BLOCK SIZE) den AES does padding to it to make it a multiple of 16 and bcz of this the size of the file increases but the write() function does not know about this and so it rejects
e.g.:- suppose i enter data as "123", here length is 4 (3 length of data + 1 '\0' character) and so AES padds 12 bytes to it to make it 16 bytes (as AES works on 16 bytes blocks), but write() only takes original length which was 4, and so i want to know how to change the file size to 16 (in this case) and also where to do it in kernel code.
i tried this
inode->i_size=new_length;
inode->i_op->truncate(inode);
also i tried
if(file->f_flags & O_APPEND)
*pos=i_size_read(inode);
but this is not working bcz kernel hangs and also i am not understanding where to do such things i.e., in which function and how.
also i tried changing the count variable with new length in vfs_write() but then it gives error as "cat write error: No space left on device".
it works fine when i pass file which is a multiple of 16.
Your current mode requires padding; and it can be hard to do padding in some cases (like what you meet). Usually, disk/filesystem encryption is done in other mode, which require no padding (and easy to do random reads/writes).
Overview of block cipher modes: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
Modes: ECB, CBC, PCBC, CFB, OFB require padding
and mode CTR (Counter) doesn't require padding.
This mode is easy and it is even easier to implement it in wrong (unsafe, easy-to-break) way.
There is an overview of Disk encryption http://en.wikipedia.org/wiki/Disk_encryption_theory with even more advanced modes (XEX, XTS). Some of them still require padding.
Even with universal cipher mode you will get a lot of problems.. Some of them are covered in "Cryptfs: A Stackable Vnode Level Encryption File System"
why dont you deliberately make it 16 bytes, encrypt it and then after decryption discard the padding. I mean instead of relying on aes alone, do it yourself. By this you wil be sure that the data you are getting is correct

Understanding `read, write` system calls in Unix

My Systems Programming project has us implementing a compression/decompression program to crunch down ASCII text files by removing the zero top bit and writing the output to a separate file, depending on whether the compression or decompression routine is working. To do this, the professor has required us to use the binary files and Unix system calls, which include open, close, read, write, etc.
From my understanding of read and write, it reads the binary data by defined byte chunks. However, since this data is binary, I'm not sure how to parse it.
This is a stripped down version of my code, minus the error checking:
void compress(char readFile[]){
char buffer[BUFFER] //buffer size set to 4096, but tunable to system preference
int openReadFile;
openReadFile= open(readFile, O_RDONLY);
}
If I use read to read the data into buffer, will the data in buffer be in binary or character format? Nothing I've come across addresses that detail, and its very relevant to how I parse the contents.
read() will read the bytes in without any interpretation (so "binary" mode).
Being binary, and you want to access the individual bytes, you should use a buffer of unsigned char
unsigned char buffer[BUFFER]. You can regard char/unsigned char as bytes, they'll be 8 bits on linux.
Now, since what you're dealing with is 8 bit ascii compressed down to 7 bit, you'll have to convert those 7 bits into 8 bits again so you can make sense of the data.
To explain what's been done - consider the text Hey .That's 3 bytes. The bytes will have 8 bits each, and in ascii that's the bit patterns :
01001000 01100101 01111001
Now, removing the most significant bit from this, you shift the remaining bits one bit to the left.
X1001000 X1100101 X1111001
Above, X is the bit to removed. Removing those, and shifting the others you end up with bytes with this pattern:
10010001 10010111 11001000
The rightmost 3 bits is just filled in with 0. So far, no space is saved though. There's still 3 bytes.
With a string of 8 bytes, we'd saved 1 byte as that would compress down to 7 bytes.
Now you have to do the reverse on the bytes you've read back in
I'll quote the manual of the fopen function (that is based on the open function/primitive) from http://www.kernel.org/doc/man-pages/online/pages/man3/fopen.3.html
The mode string can also include the
letter 'b' either as a last character
or as a character between the
characters in any of the two-character
strings described above. This is
strictly for compatibility with C89
and has no effect; the 'b' is ignored
on all POSIX conforming systems,
including Linux
So even the high level function ignores the mode :-)
It will read the binary content of the file and load it in the memory buffer points to. Of course, a byte is 8 bits, and that's why a char is 8 bits, so, if the file was a regular plain text document you'll end up with a printable string (be careful with how it ends, read returns the number of bytes (characters in a ascii-encoded plain text file) read).
Edit: in case the file you're reading isn't a text file, and is a collection of binary representations, you can make the type of the buffer the one of the file, even if it's a struct.

Resources