Determine size of decrypted data from gcry_cipher_decrypt? - c

I am using AES/GCM, but the following is a general question for other modes, like AES/CBC. I have the following call into libgcrypt:
#define COUNTOF(x) ( sizeof(x) / sizeof(x[0]) )
#define ROUNDUP(x, b) ( (x) ? (((x) + (b - 1)) / b) * b : b)
const byte cipher[] = { 0xD0,0x6D,0x69,0x0F ... };
byte recovered[ ROUNDUP(COUNTOF(cipher), 16) ];
...
err = gcry_cipher_decrypt(
    handle,              // gcry_cipher_hd_t
    recovered,           // void *
    COUNTOF(recovered),  // size_t
    cipher,              // const void *
    COUNTOF(cipher));    // size_t
I cannot figure out how to determine what the size of the resulting recovered text is. I've checked the Working with cipher handles reference, and it's not discussed (and there are 0 hits for 'pad'). I also checked the libgcrypt self tests in tests/basic.c and tests/fipsdrv.c, but they use the same oversized buffer and never prune the buffer to the actual size.
How do I determine the size of the data returned to me in the recovered buffer?

You need to apply a padding scheme to your input, and remove the padding after the decrypt. gcrypt doesn't handle it for you.
The most common choice is PKCS#7. A high level overview is that you fill the unused bytes in your final block with the number of padded bytes (block_size - used_bytes). If your input length is a multiple of the block size, you follow it with a block filled with block_size bytes.
For example, with 8-byte blocks and 4 bytes of input, your raw input would look like:
AB CD EF FF 04 04 04 04
When you do the decrypt, you take the value of the last byte of the last block, and remove that many bytes from the end.
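A minimal sketch of how that looks in code, assuming the 16-byte AES block size; the helper names are illustrative and not part of the libgcrypt API — you pad before gcry_cipher_encrypt and unpad after gcry_cipher_decrypt yourself:
#include <string.h>

#define BLOCK_SIZE 16

/* Returns the padded length; dst must have room for len + BLOCK_SIZE bytes. */
size_t pkcs7_pad(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t pad = BLOCK_SIZE - (len % BLOCK_SIZE);   /* always 1..BLOCK_SIZE */
    memcpy(dst, src, len);
    memset(dst + len, (int)pad, pad);               /* fill with the pad count */
    return len + pad;
}

/* Returns the unpadded length, or 0 if the padding is malformed. */
size_t pkcs7_unpad(const unsigned char *buf, size_t len)
{
    if (len == 0 || len % BLOCK_SIZE != 0)
        return 0;
    size_t pad = buf[len - 1];                      /* last byte is the pad count */
    if (pad == 0 || pad > BLOCK_SIZE)
        return 0;
    for (size_t i = len - pad; i < len; i++)        /* every pad byte must match */
        if (buf[i] != pad)
            return 0;
    return len - pad;
}
With the buffers from the question, after gcry_cipher_decrypt returns, pkcs7_unpad(recovered, COUNTOF(cipher)) would give the actual plaintext length for a block mode such as CBC, where the ciphertext length is a multiple of the block size.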

Related

Hash a file with SHA on a memory-constrained system using mbedtls

I want to calculate the SHA256 value of a file, which has a size of more than 1M. In order to get this hash value with the mbedtls library, I need to copy the whole file to the memory. But my memory size is only 100K. So I want to know if there is some method that calculates the file hash value in sections.
In order to get this hash value with the mbedtls library, I need to copy the whole file to the memory.
This is not accurate. The mbedtls library supports incremental calculation of hash values.
To calculate a SHA-256 hash with mbedtls, you would have to take the following steps (reference):
Create an instance of the mbedtls_sha256_context struct.
Initialize the context with mbedtls_sha256_init and then mbedtls_sha256_starts_ret.
Feed data into the hash function with mbedtls_sha256_update_ret.
Calculate the final hash sum with mbedtls_sha256_finish_ret.
Free the context with mbedtls_sha256_free.
Note that this does not mean that the mbedtls_sha256_context struct holds the entire data until mbedtls_sha256_finish_ret is called. Instead, mbedtls_sha256_context only holds the intermediate result of the hash calculation. When feeding additional data into the hash function with mbedtls_sha256_update_ret, the state of the calculation is updated and the new intermediate result is stored in the mbedtls_sha256_context.
The total size of an mbedtls_sha256_context, as determined by sizeof(mbedtls_sha256_context), is 108 bytes on my system. We can also see this from the mbedtls source code (reference):
typedef struct mbedtls_sha256_context
{
    uint32_t total[2];          /*!< The number of Bytes processed.  */
    uint32_t state[8];          /*!< The intermediate digest state.  */
    unsigned char buffer[64];   /*!< The data block being processed. */
    int is224;                  /*!< Determines which function to use:
                                     0: Use SHA-256, or 1: Use SHA-224. */
}
mbedtls_sha256_context;
We can see that the struct holds a counter of size 2 * 32 bits = 8 bytes that keeps track of the total number of bytes processed so far. 8 * 32 bits = 32 bytes are used to track the intermediate result of the hash calculation. 64 bytes are used to track the current data block being processed. As you can see, this is a fixed-size buffer that does not grow with the amount of data being hashed. Finally, an int is used to distinguish between SHA-224 and SHA-256; on my system sizeof(int) == 4. So in total, we get 8 + 32 + 64 + 4 = 108 bytes.
Consider the following example program, which reads a file step by step into a buffer of size 4096 and feeds the buffer into the hash function in each step:
#include <mbedtls/sha256.h>

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 4096
#define HASH_SIZE 32

int main(void) {
    int ret;

    // Initialize hash
    mbedtls_sha256_context ctx;
    mbedtls_sha256_init(&ctx);
    mbedtls_sha256_starts_ret(&ctx, /*is224=*/0);

    // Open file (binary mode, so the bytes are read unchanged)
    FILE *fp = fopen("large_file", "rb");
    if (fp == NULL) {
        ret = EXIT_FAILURE;
        goto exit;
    }

    // Read file in chunks of size BUFFER_SIZE
    uint8_t buffer[BUFFER_SIZE];
    size_t read;
    while ((read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
        mbedtls_sha256_update_ret(&ctx, buffer, read);
    }

    // Calculate final hash sum
    uint8_t hash[HASH_SIZE];
    mbedtls_sha256_finish_ret(&ctx, hash);

    // Simple debug printing. Use MBEDTLS_SSL_DEBUG_BUF in a real program.
    for (size_t i = 0; i < HASH_SIZE; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");

    // Cleanup
    fclose(fp);
    ret = EXIT_SUCCESS;

exit:
    mbedtls_sha256_free(&ctx);
    return ret;
}
When running the program on a large sample file, the following behavior can be observed:
$ dd if=/dev/random of=large_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 5.78353 s, 177 MB/s
$ sha256sum large_file
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216 large_file
$ gcc -O3 -static test.c /usr/lib/libmbedcrypto.a
$ ./a.out
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216
We can see that the program calculates the correct SHA-256 hash. We can also inspect the memory used by the program:
$ command time -v ./a.out
...
Maximum resident set size (kbytes): 824
...
We can see that the program consumed at most 824 KB of memory. Thus, we have calculated the hash of a 1 GB file with < 1MB of memory. This shows that we do not have to load the entire file into memory at once to calculate its hash with mbedtls.
Keep in mind this measurement was done on a 64 bit desktop computer, not an embedded platform. Also, no further optimizations were performed besides -O3 and static linking (the latter approximately halved the memory usage of the program). I would expect the memory footprint to be even smaller on an embedded device with a smaller address size and a tool chain performing further optimizations.

size of cache and calculating cache set

I'm trying to understand cache basics.
If I have
#define OFFSET_BITS (6) // 64 bytes cache line
#define SET_INDEX_BITS (5) // 32 sets
#define TAG_BITS (64 - OFFSET_BITS - SET_INDEX_BITS) // 53 tag bits
#define NWAYS (8) // 8 ways cache.
What is the size of cache in this machine?
Is it just adding the offset, set and tag bits?
Also, lets say I have an address 0x40000100, what is the cache set for the address? How do I calculate that?
Assume you have an array, like this:
uint8_t myCache[1 << SET_INDEX_BITS][NWAYS][1 << OFFSET_BITS];
For NWAYS = 8, SET_INDEX_BITS = 5 and OFFSET_BITS = 6; the size of the array (the size of the cache) would be 16384 bytes (16 KiB).
Note that "cache size" only refers to how much data the cache can store, and excludes the cost of storing the tags needed to find the data.
The tags could be represented by a second array, like this:
uint64_t myCacheTags[1 << SET_INDEX_BITS][NWAYS];
If one tag costs 53 bits, then 256 tags will cost 13568 bits; so to actually implement the cache you'd need a minimum of 18080 bytes. Of course in C (where you can't have an array of 53-bit integers) it'd cost a little more for padding/alignment (the array of tags would end up costing 64 bits per tag instead).
To find a cache line in the cache you'd do something like:
uint8_t *getCacheLine(uint32_t address) {
int setIndex = (address >> OFFSET_BITS) & (( 1 << SET_INDEX_BITS) - 1);
int myTag = address >> (OFFSET_BITS + SET_INDEX_BITS);
for(int way = 0; way < NWAYS; way++) {
if(myTag == myCacheTags[setIndex][way]) {
return myCache[setIndex][way];
}
}
return NULL; // Cache miss
}
Note: Typically the tag contains some kind of "valid or invalid" flag (in case an entry in the cache contains nothing at all), and typically the tag also contains something to represent the how recently used the cache line is (for some kind of "least recently used" eviction algorithm). The example code I've provided is incomplete - it doesn't mask off these extra bits when doing if(myTag == myCacheTags[setIndex][way]), it doesn't check any valid/invalid flag, and it doesn't update the tag to indicate that the cache line was recently used.
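To answer the second part of the question directly, plugging 0x40000100 into the same index math (a quick worked example using the macros from the question) gives set 4:
uint32_t address = 0x40000100;

int setIndex = (address >> OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1);  // (0x40000100 >> 6) & 0x1F = 4
int tag      = address >> (OFFSET_BITS + SET_INDEX_BITS);               // 0x40000100 >> 11 = 0x80000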

Reading SQLite header

I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:
struct Header_info {
    char *filename;
    char *sql_string;
    uint16_t page_size;
};

int read_header(FILE *db, struct Header_info *header)
{
    assert(db);

    uint8_t sql_buf[100] = {0};

    /* load the header */
    if (fread(sql_buf, 100, 1, db) != 1) {
        return ERR_SIZE;
    }

    /* copy the string */
    header->sql_string = strdup((char *)sql_buf);

    /* verify that we have a proper header */
    if (strcmp(header->sql_string, "SQLite format 3") != 0) {
        return ERR_NOT_HEADER;
    }

    memcpy(&header->page_size, (sql_buf + 16), 2);

    return 0;
}
Here are the relevant bytes of the file I'm testing it on:
0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300 SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e .....@  ........
Following this spec, the code looks correct to me.
Later I print header->page_size with this line:
printf("\tPage size: %"PRIu16"\n", header->page_size);
But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.
It's an endianness problem. x86 is little-endian, that is, in memory, the least significant byte is stored first. When you load 10 00 into memory on a little-endian architecture, you therefore get 00 10 in human-readable form, which is 16 instead of 4096.
Your problem is therefore that memcpy is not an appropriate tool to read the value.
See the following section of the SQLite file format spec:
1.2.2 Page Size
The two-byte value beginning at offset 16 determines the page size of
the database. For SQLite versions 3.7.0.1 and earlier, this value is
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported. The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value
at offset 16 is 0x00 0x01. This value can be interpreted as a
big-endian 1 and thought of as a magic number to represent the
65536 page size. Or one can view the two-byte field as a little endian
number and say that it represents the page size divided by 256. These
two interpretations of the page-size field are equivalent.
It seems to be an endianness issue. If you are on a little-endian machine this line:
memcpy(&header->page_size, (sql_buf + 16), 2);
copies the two bytes 10 00 into an uint16_t which will have the low-order byte at the lower address.
You can do this instead:
header->page_size = sql_buf[17] | (sql_buf[16] << 8);
Update
For the record, note that the solution I propose will work regardless of the endianness of the machine (see this article by Rob Pike).
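As a small extension (not part of the original answers), the same read can be wrapped in a helper that also handles the 0x00 0x01 == 65536 special case quoted from the spec above:
#include <stdint.h>

/* Decode the page size from a 100-byte SQLite header buffer.
 * The two bytes at offset 16 are big-endian; the raw value 1 is the
 * spec's magic encoding for a 65536-byte page size. */
static uint32_t sqlite_page_size(const uint8_t *sql_buf)
{
    uint16_t raw = (uint16_t)((sql_buf[16] << 8) | sql_buf[17]);
    return (raw == 1) ? 65536u : raw;
}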

Using bzip2 low-level routines to compress chunks of data

The Overview
I am using the low-level calls in the libbzip2 library: BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd() to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest — my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream, compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
The Problem
The issue is that my bz_stream structure instance, which keeps track of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;
/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;
int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);
/* bzError checking... */
do
{
    /* read some bytes into myBuffer... */

    /* compress bytes in myBuffer */
    bzStream.next_in = myBuffer;
    bzStream.avail_in = myBufferLength;
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    do
    {
        bzStream.next_out = bzBuffer;
        bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

        bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
        /* error checking... */

        bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
        cumulativeBytesWritten += bzBytesWritten;

        /* write compressed data in bzBuffer to standard output */
        fwrite(bzBuffer, 1, bzBytesWritten, stdout);
        fflush(stdout);
    }
    while (bzError == BZ_OK);
}
while (/* while there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read in the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`... */
/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
    /* bzError error checking... */

    /* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
    bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
    cumulativeBytesWritten += bzBytesWritten;

    /* write compressed data in bzBuffer to standard output */
    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError != BZ_STREAM_END);
/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);
/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that?
I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Alternatively, am I not understanding the correct use of the bz_stream state flags?
For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data, so long as there are at least some bytes available to access from the bzStream.next_in pointer (BZ_RUN), with the stream wrapped up when there are no more bytes available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
Or, am I not using these low-level calls correctly at all? Should I go back to using the higher-level calls to continuously append a grouping of compressed chunks of data to one main file?
There's probably a simple solution to this, but I've been banging my head on the table for a couple days in the course of debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
In answer to my own question, it appears I am miscalculating the number of bytes written. I should not use the total_out_* members. The following correction works properly:
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.
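For context, the inner compression loop from the pseudocode above would then look roughly like this with the correction applied (error handling still elided, as in the original sketch):
do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
    /* bzError checking... */

    /* bytes produced by this call only: buffer capacity minus what is left */
    bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
    cumulativeBytesWritten += bzBytesWritten;

    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError == BZ_OK);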

Why does fread mess with my byte order?

I'm trying to parse a BMP file with fread(), and when I begin to parse, it reverses the order of my bytes.
typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;
...
BMPHeader header;
...
The hex data is 42 4D 36 00 03 00 00 00 00 00 36 00 00 00;
I am loading the hex data into the struct by fread(&header,14,1,fileIn);
My problem is that where the magic number should be 0x424d // 'BM', fread() flips the bytes to 0x4d42 // 'MB'.
Why does fread() do this, and how can I fix it?
EDIT: If I wasn't specific enough, I need to read the whole chunk of hex data into the struct not just the magic number. I only picked the magic number as an example.
This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.
Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:
/* CHAR_BIT == 8 assumed */
uint16_t le16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
}

uint16_t be16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
}
You do your fread into an uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:
/* note adjustments to type definition */
typedef struct BMPHeader
{
    uint8_t  magic_number[2];
    uint32_t file_size;
    uint8_t  reserved[4];
    uint32_t data_offset;
} BMPHeader;

/* in general this is _not_ equal to sizeof(BMPHeader) */
#define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)

/* returns 0=success, -1=error */
int read_bmp_header(BMPHeader *hdr, FILE *fp)
{
    uint8_t buf[BMP_WIRE_HDR_LEN];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    hdr->magic_number[0] = buf[0];
    hdr->magic_number[1] = buf[1];

    hdr->file_size   = le32_to_cpu(buf + 2);

    hdr->reserved[0] = buf[6];
    hdr->reserved[1] = buf[7];
    hdr->reserved[2] = buf[8];
    hdr->reserved[3] = buf[9];

    hdr->data_offset = le32_to_cpu(buf + 10);

    return 0;
}
You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.
You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.
Conversion in the opposite direction, btw, looks like this:
void cpu_to_le16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0x00FF);
    buf[1] = (val & 0xFF00) >> 8;
}

void cpu_to_be16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0xFF00) >> 8;
    buf[1] = (val & 0x00FF);
}
Conversion of 32-/64-bit quantities left as an exercise.
I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value, but your system is little-endian, which treats the first byte of a multi-byte integer as the least significant byte rather than the most significant one.
Demonstrated in this code:
#include <stdio.h>

int main()
{
    union {
        short sval;
        unsigned char bval[2];
    } udata;

    udata.sval = 1;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    udata.sval = 0x424d;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    udata.sval = 0x4d42;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    return 0;
}
Gives the following output
DEC[ 1] HEX[0001] BYTES[01][00]
DEC[16973] HEX[424d] BYTES[4d][42]
DEC[19778] HEX[4d42] BYTES[42][4d]
So if you want to be portable you will need to detect the endianness of your system and then do a byte shuffle if required. There are plenty of examples around the internet of swapping the bytes around; one such helper is sketched below.
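(A trivial sketch of such a byte-swap helper, not part of the original answer:)
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, e.g. 0x4D42 -> 0x424D. */
uint16_t swap16(uint16_t val)
{
    return (uint16_t)((val << 8) | (val >> 8));
}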
Subsequent question:
I ask only because my file size is 3 instead of 196662
This is due to memory alignment issues. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int etc. to not be split over multiple memory words. So intuitively you think your struct is laid out in memory like:
Offset
short magic_number; 00 - 01
int file_size; 02 - 05
short reserved_bytes[2]; 06 - 09
int data_offset; 0A - 0D
BUT on a 32 bit system that means file_size has two bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the way the structure is laid out in memory is actually like:
short magic_number; 00 - 01
<<unused padding>> 02 - 03
int file_size; 04 - 07
short reserved_bytes[2]; 08 - 0B
int data_offset; 0C - 0F
So when you read your byte stream in, the 36 00 goes into the padding area, which leaves file_size getting 03 00 00 00. Now if you had used fwrite to create this data it would have been OK, as the padding bytes would have been written out. But if your input is always going to be in the format you have specified, it is not appropriate to read the whole struct as one with fread. Instead you will need to read each of the elements individually.
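A quick way to see the padding for yourself (a small sketch; the exact numbers assume a typical ABI with 4-byte int alignment, as described above):
#include <stdio.h>
#include <stddef.h>

typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;

int main(void)
{
    /* On a typical ABI this prints 16 rather than the 14 bytes in the file,
     * and shows file_size starting at offset 4 rather than 2. */
    printf("sizeof(BMPHeader)   = %zu\n", sizeof(BMPHeader));
    printf("offsetof(file_size) = %zu\n", offsetof(BMPHeader, file_size));
    return 0;
}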
Writing a struct to a file is highly non-portable -- it's safest to just not try to do it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never a sequence of bytes) and b) it's always both written and read on the same (type of) machine. Not only are there "endian" issues with different CPUs (which is what it seems you've run into), there are also "alignment" issues. Different hardware implementations have different rules about placing integers only on even 2-byte or even 4-byte or even 8-byte boundaries. The compiler is fully aware of all this, and inserts hidden padding bytes into your struct so it always works right. But as a result of the hidden padding bytes, it's not at all safe to assume a struct's bytes are laid out in memory like you think they are. If you're very lucky, you work on a computer that uses big-endian byte order and has no alignment restrictions at all, so you can lay structs directly over files and have it work. But you're probably not that lucky -- certainly programs that need to be "portable" to different machines have to avoid trying to lay structs directly over any part of any file.
