How to create an MD5 hash of a string in C?

I've found some md5 code that consists of the following prototypes...
I've been trying to find out where I have to put the string I want to hash, what functions I need to call, and where to find the string once it has been hashed. I'm confused with regards to what the uint32 buf[4] and uint32 bits[2] are in the struct.
struct MD5Context {
uint32 buf[4];
uint32 bits[2];
unsigned char in[64];
};
/*
* Start MD5 accumulation. Set bit count to 0 and buffer to mysterious
* initialization constants.
*/
void MD5Init(struct MD5Context *context);
/*
* Update context to reflect the concatenation of another buffer full
* of bytes.
*/
void MD5Update(struct MD5Context *context, unsigned char const *buf, unsigned len);
/*
* Final wrapup - pad to 64-byte boundary with the bit pattern
* 1 0* (64-bit count of bits processed, MSB-first)
*/
void MD5Final(unsigned char digest[16], struct MD5Context *context);
/*
* The core of the MD5 algorithm, this alters an existing MD5 hash to
* reflect the addition of 16 longwords of new data. MD5Update blocks
* the data and converts bytes into longwords for this routine.
*/
void MD5Transform(uint32 buf[4], uint32 const in[16]);

I don't know this particular library, but I've used very similar calls. So this is my best guess:
unsigned char digest[16];
const char* string = "Hello World";
struct MD5Context context;
MD5Init(&context);
MD5Update(&context, (unsigned char const *)string, strlen(string));
MD5Final(digest, &context);
This gives you the hash as 16 raw bytes in digest. You can then turn this into a hex representation if you want to pass it around as a string.
char md5string[33];
for(int i = 0; i < 16; ++i)
sprintf(&md5string[i*2], "%02x", (unsigned int)digest[i]);
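The hex conversion can also be wrapped in a small bounds-checked helper. This is only a sketch: digest_to_hex is a hypothetical name, not part of any MD5 library, and it assumes the caller supplies an output buffer of at least 2 * len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes 2 * len lowercase hex digits plus a
 * terminating NUL into out, which must hold at least 2 * len + 1 bytes. */
static void digest_to_hex(const unsigned char *digest, size_t len,
                          char *out, size_t out_size)
{
    assert(digest != NULL && out != NULL);
    assert(out_size >= 2 * len + 1);
    for (size_t i = 0; i < len; ++i) {
        /* Each call writes two digits and a NUL; the next call
         * overwrites that NUL with its first digit. */
        snprintf(out + 2 * i, out_size - 2 * i, "%02x",
                 (unsigned int)digest[i]);
    }
    out[2 * len] = '\0'; /* also covers len == 0 */
}
```

For a 16-byte MD5 digest, the caller would pass a char[33] buffer and sizeof on it.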

Here's a complete example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if defined(__APPLE__)
# define COMMON_DIGEST_FOR_OPENSSL
# include <CommonCrypto/CommonDigest.h>
# define SHA1 CC_SHA1
#else
# include <openssl/md5.h>
#endif
char *str2md5(const char *str, int length) {
int n;
MD5_CTX c;
unsigned char digest[16];
char *out = (char*)malloc(33);
MD5_Init(&c);
while (length > 0) {
if (length > 512) {
MD5_Update(&c, str, 512);
} else {
MD5_Update(&c, str, length);
}
length -= 512;
str += 512;
}
MD5_Final(digest, &c);
for (n = 0; n < 16; ++n) {
snprintf(&(out[n*2]), 3, "%02x", (unsigned int)digest[n]); /* 2 hex digits + terminator */
}
return out;
}
int main(int argc, char **argv) {
char *output = str2md5("hello", strlen("hello"));
printf("%s\n", output);
free(output);
return 0;
}

As other answers have mentioned, the following calls will compute the hash:
struct MD5Context md5;
MD5Init(&md5);
MD5Update(&md5, data, datalen);
MD5Final(digest, &md5);
The purpose of splitting it up into that many functions is to let you stream large datasets.
For example, if you're hashing a 10GB file that doesn't fit into RAM, you would read the file in smaller chunks and call MD5Update on each one. Here fp is assumed to be a FILE * already opened in binary mode:
struct MD5Context md5;
unsigned char data[4096];
size_t datalen;
MD5Init(&md5);
// Read the file block by block and feed each block to MD5Update.
while ((datalen = fread(data, 1, sizeof data, fp)) > 0) {
MD5Update(&md5, data, (unsigned)datalen);
}
// Now finish to get the final hash value.
MD5Final(digest, &md5);
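To try the init/update/final streaming pattern without linking an MD5 library, here is a minimal sketch that uses 32-bit FNV-1a as a stand-in hash. Everything named toy_* is hypothetical and exists only for illustration. The point is that feeding the data in chunks gives the same result as feeding it all at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for struct MD5Context: FNV-1a keeps a single 32-bit state. */
struct toy_ctx { uint32_t state; };

static void toy_init(struct toy_ctx *c) { c->state = 2166136261u; }

static void toy_update(struct toy_ctx *c, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        c->state ^= buf[i];
        c->state *= 16777619u;
    }
}

static uint32_t toy_final(const struct toy_ctx *c) { return c->state; }

/* Hash a whole stream in fixed-size blocks.
 * Returns 0 on success, -1 on a read error. */
static int toy_hash_stream(FILE *fp, uint32_t *out)
{
    unsigned char block[4096];
    struct toy_ctx c;
    size_t n;

    assert(fp != NULL && out != NULL);
    toy_init(&c);
    while ((n = fread(block, 1, sizeof block, fp)) > 0)
        toy_update(&c, block, n);
    if (ferror(fp))
        return -1;
    *out = toy_final(&c);
    return 0;
}
```

With a real MD5 implementation, the three toy_* calls would be replaced by MD5Init, MD5Update and MD5Final, and the loop itself would stay the same.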

To be honest, the comments accompanying the prototypes seem clear enough. Something like this should do the trick:
void compute_md5(char *str, unsigned char digest[16]) {
struct MD5Context ctx;
MD5Init(&ctx);
MD5Update(&ctx, (unsigned char const *)str, strlen(str));
MD5Final(digest, &ctx);
}
where str is a C string you want the hash of, and digest is the resulting MD5 digest.

It would appear that you should:
1. Create a struct MD5Context and pass it to MD5Init to get it into a proper starting condition.
2. Call MD5Update with the context and your data.
3. Call MD5Final to get the resulting hash.
These three functions and the structure definition make a nice abstract interface to the hash algorithm. I'm not sure why you were shown the core transform function in that header as you probably shouldn't interact with it directly.
The author could have done a little more implementation hiding by making the structure an abstract type, but then you would have been forced to allocate the structure on the heap every time (as opposed to now where you can put it on the stack if you so desire).

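The existing answers use MD5Init(), MD5Update(), and MD5Final(), or their OpenSSL equivalents MD5_Init(), MD5_Update(), and MD5_Final(), which are deprecated as of OpenSSL 3.0.
With OpenSSL, use EVP_DigestInit_ex(), EVP_DigestUpdate(), and EVP_DigestFinal_ex() instead, e.g.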
// example.c
//
// gcc example.c -lssl -lcrypto -o example
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>
void bytes2md5(const char *data, int len, char *md5buf) {
// Based on https://www.openssl.org/docs/manmaster/man3/EVP_DigestUpdate.html
EVP_MD_CTX *mdctx = EVP_MD_CTX_new();
const EVP_MD *md = EVP_md5();
unsigned char md_value[EVP_MAX_MD_SIZE];
unsigned int md_len, i;
EVP_DigestInit_ex(mdctx, md, NULL);
EVP_DigestUpdate(mdctx, data, len);
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_free(mdctx);
for (i = 0; i < md_len; i++) {
snprintf(&(md5buf[i * 2]), 3, "%02x", md_value[i]); // 2 hex digits + terminator
}
}
int main(void) {
const char *hello = "hello";
char md5[33]; // 32 characters + null terminator
bytes2md5(hello, strlen(hello), md5);
printf("%s\n", md5);
}

Is there a way to "synchronize" the identifiers in an enum and function names called by a function pointer array?

I am writing C code on STM32 (specifically the STM32756-EVAL) and I have created an enum which reads an incoming char array and assigns an enum to it. The value of this enum is then placed as the index for a function pointer array.
The reason why I have this code is to be able to decide on what function to call based on the receiving char array, without relying on a giant if-else stack reading through the char arrays one by one.
The relevant code:
enum cmd cmd_Converter(unsigned char* inputCmd){//Converts the input cmd from uint8_t to an enum.
switch (inputCmd[0]){ //Currently we are using a switch-case. I expect this list to expand to something like 50.
case 'F':
if (memcmp(inputCmd, "FIRMV", COMMAND_LENGTH) == 0) return FIRMV;
else return INVAL;
break;
case 'V':
if (memcmp(inputCmd, "VALCN", COMMAND_LENGTH) == 0) return VALCN;
else return INVAL;
break;
default:
return INVAL;
}
}
void process_Message(uint8_t* message, uint16_t Len){
unsigned char inputCmd[COMMAND_LENGTH];
unsigned char inputData[DATA_LENGTH];
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
//Function that separates message, inputCmd, and inputMessage.
memcpy((char*) inputCmd, (const char*)message + COMMAND_CHAR, COMMAND_LENGTH);
memcpy((char*) inputData, (const char*)message + DATA_CHAR, DATA_LENGTH);
enum cmd enumCmd = cmd_Converter(inputCmd);
void (*cmd_Function_Pointer[])(unsigned char* inputData) = {FIRMV_F, VALCN_F, INVAL_F}; //Is this even needed?
(*cmd_Function_Pointer[enumCmd])(inputData);
// message_Received(message, Len);
// send_Message(outputCmd, outputData);
}
void FIRMV_F(unsigned char *inputData){
//Do thing
}
void VALCN_F(unsigned char *inputData){
//Do thing
}
void INVAL_F(unsigned char *inputData){
//Do thing
}
The enum is there to improve code readability, so that anyone reading the code can see the enum and the function pointer and go "enum FIRMV will call FIRMV_F from (*cmd_Function_Pointer[enumCmd])(inputData)". One of the weaknesses I've identified is that it relies on the sequence of enum cmd and cmd_Function_Pointer[] to be identical, and if the list of enums gets too long it will be hard to maintain this identical sequence.
I am wondering whether there are any methods within C that would allow for "synchronizing" the identifiers in an enum and function names called by a function pointer?
The full code:
usbd_cdc_if.c
/**
* @brief Data received over USB OUT endpoint are sent over CDC interface
* through this function.
*
* @note
* This function will issue a NAK packet on any OUT packet received on
* USB endpoint until exiting this function. If you exit this function
* before transfer is complete on CDC interface (ie. using DMA controller)
* it will result in receiving more data while previous ones are still
* not sent.
*
* @param Buf: Buffer of data to be received
* @param Len: Number of data received (in bytes)
* @retval Result of the operation: USBD_OK if all operations are OK else USBD_FAIL
*/
static int8_t CDC_Receive_FS(uint8_t* Buf, uint32_t *Len)
{
/* USER CODE BEGIN 6 */
USBD_CDC_SetRxBuffer(&hUsbDeviceFS, &Buf[0]);
USBD_CDC_ReceivePacket(&hUsbDeviceFS);
memset (buffer, '\0', 64); // clear the buffer
uint8_t len = (uint8_t)*Len; //Converts Len as uint32_t to len as uint8_t
memcpy(buffer, Buf, len); // copy the data to the buffer
memset(Buf, '\0', len); // clear the Buf also
//Code used to send message back
process_Message(buffer, len);
return (USBD_OK);
/* USER CODE END 6 */
}
/**
* @brief CDC_Transmit_FS
* Data to send over USB IN endpoint are sent over CDC interface
* through this function.
* @note
*
*
* @param Buf: Buffer of data to be sent
* @param Len: Number of data to be sent (in bytes)
* @retval USBD_OK if all operations are OK else USBD_FAIL or USBD_BUSY
*/
uint8_t CDC_Transmit_FS(uint8_t* Buf, uint16_t Len)
{
uint8_t result = USBD_OK;
/* USER CODE BEGIN 7 */
USBD_CDC_HandleTypeDef *hcdc = (USBD_CDC_HandleTypeDef*)hUsbDeviceFS.pClassData;
if (hcdc->TxState != 0){ //If TxState in hcdc is not 0, return USBD_BUSY.
return USBD_BUSY;
}
USBD_CDC_SetTxBuffer(&hUsbDeviceFS, Buf, Len); //SetTxBuffer sets the size of the buffer, as well as the buffer itself.
result = USBD_CDC_TransmitPacket(&hUsbDeviceFS); //USBD_CDC_TransmitPacket(&hUsbDeviceFS) transmits
/* USER CODE END 7 */
return result;
}
messageprocesser.c
#include "messageprocesser.h"
#include "main.h"
#include "usbd_cdc_if.h"
#include <string.h>
#include <stdlib.h> //itoa (non-standard extension)
#include <ctype.h> //toupper
//Sample cmd: TOARM_FIRMV_00000000_4C\r\n
#define MESSAGE_LENGTH 25
#define COMMAND_CHAR 6 //See SW Protocol or sample cmd
#define COMMAND_LENGTH 5
#define DATA_CHAR 12
#define DATA_LENGTH 8
#define CHECKSUM_CHAR 21
#define CHECKSUM_LENGTH 2
enum cmd {FIRMV, VALCN, INVAL};
enum cmd cmd_Converter(unsigned char* inputCmd){//Converts the input cmd from uint8_t to an enum.
switch (inputCmd[0]){
case 'F':
if (memcmp(inputCmd, "FIRMV", COMMAND_LENGTH) == 0) return FIRMV;
else return INVAL;
break;
case 'V':
if (memcmp(inputCmd, "VALCN", COMMAND_LENGTH) == 0) return VALCN;
else return INVAL;
break;
default:
return INVAL;
}
}
void process_Message(uint8_t* message, uint16_t Len){
//HAL_GPIO_TogglePin(GPIOF, GPIO_PIN_10);
unsigned char inputCmd[COMMAND_LENGTH]; //These are not null-terminated strings.
unsigned char inputData[DATA_LENGTH]; //These are just an array of chars.
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
//Function that separates message, inputCmd, and inputMessage.
memcpy((char*) inputCmd, (const char*)message + COMMAND_CHAR, COMMAND_LENGTH);
memcpy((char*) inputData, (const char*)message + DATA_CHAR, DATA_LENGTH);
enum cmd enumCmd = cmd_Converter(inputCmd);
void (*cmd_Function_Pointer[])(unsigned char* inputData) = {FIRMV_F, VALCN_F, INVAL_F};
(*cmd_Function_Pointer[enumCmd])(inputData);
}
void FIRMV_F(unsigned char *inputData){
//HAL_GPIO_TogglePin(GPIOF, GPIO_PIN_10);
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
memcpy(outputCmd, "FIRMV", COMMAND_LENGTH);
memcpy(outputData, "01050A00", DATA_LENGTH);
send_Message(outputCmd, outputData);
}
void VALCN_F(unsigned char *inputData){
//HAL_GPIO_TogglePin(GPIOF, GPIO_PIN_10);
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
memcpy(outputCmd, "VALCN", COMMAND_LENGTH);
memcpy(outputData, "00000000", DATA_LENGTH);
send_Message(outputCmd, outputData);
}
void INVAL_F(unsigned char *inputData){
HAL_GPIO_TogglePin(GPIOF, GPIO_PIN_10);
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
memcpy(outputCmd, "REEEE", COMMAND_LENGTH);
memcpy(outputData, "99999999", DATA_LENGTH);
send_Message(outputCmd, outputData);
}
void send_Message(uint8_t* cmd, uint8_t* data){
uint8_t outputMessage[MESSAGE_LENGTH] = "TOWST_";
memcpy((char*) outputMessage + COMMAND_CHAR, (const char*) cmd, COMMAND_LENGTH);
outputMessage[COMMAND_CHAR + COMMAND_LENGTH] = '_';
memcpy((char*) outputMessage + DATA_CHAR, (const char*) data, DATA_LENGTH);
outputMessage[DATA_CHAR + DATA_LENGTH] = '_';
//Deal with checksum
int outputCheckSum = checkSum_Generator(outputMessage);
char outputCheckSumHex[3] = {'0', '0', '\0'}; //2 hex digits plus the terminator that itoa writes
itoa (outputCheckSum, outputCheckSumHex, 16);
if (outputCheckSum < 16) { //Adds a 0 if CS has fewer than 2 numbers
outputCheckSumHex[1] = outputCheckSumHex[0];
outputCheckSumHex[0] = '0';
}
outputCheckSumHex[0] = toupper (outputCheckSumHex[0]);
outputCheckSumHex[1] = toupper (outputCheckSumHex[1]);
memcpy((char*) outputMessage + CHECKSUM_CHAR, (const char*) outputCheckSumHex, CHECKSUM_LENGTH);
outputMessage[23] = '\r';
outputMessage[24] = '\n';
//return a processed message array
CDC_Transmit_FS(outputMessage, sizeof(outputMessage));
}
int checkSum_Generator(uint8_t* message){
int checkSum = 0;
for (int i = 0; i < CHECKSUM_CHAR; i++){ //Gives the cs of TOARM_COMND_DATA0000_.
checkSum ^= message[i];
}
return checkSum;
}
The attempts at solving this issue
I have looked into another question about "linking function pointers and enum", but the solution offered there doesn't seem to fix the issue I have mentioned; it only circumvents it by having smaller code.
So far I have tried changing the names of the functions to become identical to their enum counterparts, renaming FIRMV_F() to FIRMV(). Unsurprisingly, I got this:
../Core/Src/messageprocesser.c:59:6: error: 'FIRMV' redeclared as different kind of symbol
I have also tried assigning the function pointer array in a way similar to conventional arrays:
void process_Message(uint8_t* message, uint16_t Len){
//HAL_GPIO_TogglePin(GPIOF, GPIO_PIN_10);
unsigned char inputCmd[COMMAND_LENGTH]; //These are not null-terminated strings.
unsigned char inputData[DATA_LENGTH]; //These are just an array of chars.
unsigned char outputCmd[COMMAND_LENGTH];
unsigned char outputData[DATA_LENGTH];
//Function that separates message, inputCmd, and inputMessage.
memcpy((char*) inputCmd, (const char*)message + COMMAND_CHAR, COMMAND_LENGTH);
memcpy((char*) inputData, (const char*)message + DATA_CHAR, DATA_LENGTH);
enum cmd enumCmd = cmd_Converter(inputCmd);
void (*cmd_Function_Pointer[INVAL + 1])(unsigned char* inputData);
(*cmd_Function_Pointer[FIRMV]) = FIRMV_F;
(*cmd_Function_Pointer[VALCN]) = VALCN_F;
(*cmd_Function_Pointer[INVAL]) = INVAL_F;
(*cmd_Function_Pointer[enumCmd])(inputData); //What is going on here?
}
I have gotten the following errors.
../Core/Src/messageprocesser.c:52:33: error: lvalue required as left operand of assignment
52 | (*cmd_Function_Pointer[FIRMV]) = FIRMV_F;
| ^
../Core/Src/messageprocesser.c:53:33: error: lvalue required as left operand of assignment
53 | (*cmd_Function_Pointer[VALCN]) = VALCN_F;
| ^
../Core/Src/messageprocesser.c:54:33: error: lvalue required as left operand of assignment
54 | (*cmd_Function_Pointer[INVAL]) = INVAL_F;
This makes sense since my understanding is that the functions FIRMV_F do not have an lvalue, but I do not know how to fix it, assuming it is possible.
Please let me know if more detail or clarity is needed.
To Less Determined Readers: Using enum as index in function pointer array for readability. In current code enum sequencing needs to be identical to function pointer sequence. This seems vulnerable. Want method of making sure enum sequencing stays identical to function pointer sequence.
There's a common recommended practice, assuming enum values are just sequential. Make an enum like this:
typedef enum
{
INVAL,
FIRMV,
VALCN,
CMD_N // number of items in the enum
} cmd_t;
And a function type like this:
typedef void cmd_func_t (unsigned char *inputData);
Then you can create an array of function pointers where each index corresponds to the relevant enum, by using designated initializers:
static cmd_func_t* const cmd_list[] = // size not specified on purpose
{
[INVAL] = INVAL_F,
[FIRMV] = FIRMV_F,
[VALCN] = VALCN_F,
};
Verify integrity with:
_Static_assert(sizeof cmd_list/sizeof cmd_list[0] == CMD_N,
"cmd_list has wrong size compared to cmd_t");
Function call usage, for example:
cmd_list[FIRMV](param);
Also, just for completeness, we can go completely loco with "do not repeat yourself" and generate a lot of this through X-macros: https://godbolt.org/z/zY1nh5M5T. Not really recommended since it makes the code look obscure, but quite powerful. For example strings like "FIRMV" could be generated at compile-time, as shown in that example.
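Putting the pieces of this answer together, a minimal self-contained sketch might look as follows. The handler bodies, the last_cmd variable and the dispatch() wrapper are assumptions added for illustration and are not part of the original code. Note that the _Static_assert only checks the array length, so a forgotten entry in the middle would be a NULL hole, which is why dispatch() checks for NULL too. As an aside, the "lvalue required" error in the question comes from the extra dereference: cmd_Function_Pointer[FIRMV] = FIRMV_F; (without the leading *) assigns the pointer itself.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INVAL, FIRMV, VALCN, CMD_N } cmd_t;
typedef void cmd_func_t(unsigned char *inputData);

/* Placeholder handlers: each one just records which command ran. */
static cmd_t last_cmd = CMD_N;
static void INVAL_F(unsigned char *d) { (void)d; last_cmd = INVAL; }
static void FIRMV_F(unsigned char *d) { (void)d; last_cmd = FIRMV; }
static void VALCN_F(unsigned char *d) { (void)d; last_cmd = VALCN; }

/* Designated initializers tie each slot to its enum constant, so the
 * order of the lines below no longer matters. */
static cmd_func_t *const cmd_list[] = {
    [VALCN] = VALCN_F,
    [INVAL] = INVAL_F,
    [FIRMV] = FIRMV_F,
};

_Static_assert(sizeof cmd_list / sizeof cmd_list[0] == CMD_N,
               "cmd_list has wrong size compared to cmd_t");

/* Hypothetical wrapper: out-of-range or unset commands fall back to INVAL. */
static void dispatch(cmd_t cmd, unsigned char *data)
{
    if ((unsigned)cmd >= CMD_N || cmd_list[cmd] == NULL)
        cmd = INVAL;
    assert(cmd_list[cmd] != NULL);
    cmd_list[cmd](data);
}
```

The initializer is deliberately written out of order here to show that the mapping depends only on the enum names.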
You can have your cmd_Converter() function just return the function pointer that is required. You can still use a switch statement to remove the need to have a long if...else if... chain. This way you don't need any enums, nor arrays of function pointers, and there is nothing to keep in sync.
#include <string.h>
#include <stdlib.h>
#define COMMAND_LENGTH 5 /* as in the question; must not exceed the length of the literals compared below */
typedef void (*handler_t)(const unsigned char *arg);
void FIRMV_F(const unsigned char *inputData);
handler_t cmd_Converter(const unsigned char *inputCmd)
{
handler_t result = NULL;
switch (inputCmd[0])
{
case 'F':
if (memcmp(inputCmd, "FIRMV", COMMAND_LENGTH) == 0)
result = FIRMV_F;
break;
// more cases here.
default:
break;
}
return result;
}
Edit: If you want a purely data-driven approach that avoids the switch statement altogether, you can have an array of structures mapping command names to function pointers. This still removes the need for enums - the mapping is explicit:
#include <string.h>
#include <stdlib.h>
#define COMMAND_LENGTH 5
typedef void (*handler_t)(const unsigned char *arg);
typedef struct
{
const char *name;
handler_t fn_p;
} cmd_t;
void FIRMV_F(const unsigned char *inputData);
void VALCN_F(const unsigned char *inputData);
static const cmd_t commands[] =
{
{"FIRMV", FIRMV_F},
{"VALCN", VALCN_F}
};
handler_t cmd_Converter(const unsigned char *inputCmd)
{
handler_t result = NULL;
for (size_t i = 0; i < sizeof commands / sizeof commands[0]; ++i)
{
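/* inputCmd is not null-terminated in the question, so compare only the name's length */
if (strncmp((const char *)inputCmd, commands[i].name, strlen(commands[i].name)) == 0)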
{
result = commands[i].fn_p;
break;
}
}
return result;
}
This is using a linear search. If you don't have many commands, this might be enough. If there are a lot, you could sort the array and do a binary search.
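The binary search variant could be sketched with the standard bsearch() function. This is only an illustration under a few assumptions: commands are exactly COMMAND_LENGTH (5) bytes as in the question, the table is kept sorted by name by hand, and cmd_entry_t, cmp_cmd and find_handler are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COMMAND_LENGTH 5

typedef void (*handler_t)(const unsigned char *arg);

typedef struct {
    const char *name;
    handler_t fn_p;
} cmd_entry_t;

/* Placeholder handlers for the sketch. */
static void FIRMV_F(const unsigned char *d) { (void)d; }
static void VALCN_F(const unsigned char *d) { (void)d; }

/* Must stay sorted by name (byte order) for bsearch to work. */
static const cmd_entry_t commands[] = {
    {"FIRMV", FIRMV_F},
    {"VALCN", VALCN_F},
};

/* bsearch passes the key first and the table element second. */
static int cmp_cmd(const void *key, const void *elem)
{
    const cmd_entry_t *e = elem;
    return memcmp(key, e->name, COMMAND_LENGTH);
}

/* Returns the handler for a COMMAND_LENGTH-byte command, or NULL. */
static handler_t find_handler(const unsigned char *inputCmd)
{
    assert(inputCmd != NULL);
    const cmd_entry_t *hit = bsearch(inputCmd, commands,
                                     sizeof commands / sizeof commands[0],
                                     sizeof commands[0], cmp_cmd);
    return hit ? hit->fn_p : NULL;
}
```

Using memcmp with a fixed length means the input does not need to be null-terminated.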

A potential memory overflow but not sure what's causing it

I am seeing a potential overflow: streamBuffer is a struct object (part of the FreeRTOS lib) and upon executing the following line in OutputToSerial(), I see streamBuffer.xHead's value is set to an extremely large value even though it's not being modified at the time.
LONG_TO_STR(strData, txStr);
Note that I didn't have any issues when I called nRF24_ReadReg() multiple times before.
Also, often times I see that printf doesn't print the entire text that's being printed (prior to calling the time when I see a potential overflow) - instead misses on some chars.
Any way to get a better understanding of the cause? I don't see any hardfaults or anything to look in the registers...
For a reference, the following is the struct's definition:
typedef struct StreamBufferDef_t /*lint !e9058 Style convention uses tag. */
{
volatile size_t xTail; /* Index to the next item to read within the buffer. */
volatile size_t xHead; /* Index to the next item to write within the buffer. */
size_t xLength; /* The length of the buffer pointed to by pucBuffer. */
size_t xTriggerLevelBytes; /* The number of bytes that must be in the stream buffer before a task that is waiting for data is unblocked. */
volatile TaskHandle_t xTaskWaitingToReceive; /* Holds the handle of a task waiting for data, or NULL if no tasks are waiting. */
volatile TaskHandle_t xTaskWaitingToSend; /* Holds the handle of a task waiting to send data to a message buffer that is full. */
uint8_t *pucBuffer; /* Points to the buffer itself - that is - the RAM that stores the data passed through the buffer. */
uint8_t ucFlags;
#if ( configUSE_TRACE_FACILITY == 1 )
UBaseType_t uxStreamBufferNumber; /* Used for tracing purposes. */
#endif
} StreamBuffer_t;
// file.c
#define PRI_UINT64_C_Val(value) ((unsigned long) (value>>32)), ((unsigned long)value)
#define LONG_TO_STR(STR, LONG_VAL) (sprintf(STR, "%lx%lx", PRI_UINT64_C_Val(LONG_VAL)))
unsigned long long concatData(uint8_t *arr, uint8_t size)
{
long long unsigned value = 0;
for (uint8_t i = 0; i < size; i++)
{
value <<= 8;
value |= arr[i];
}
return value;
}
void nRF24_ReadReg(nrfl2401 *nrf, uint8_t reg, const uint8_t rxSize, uint8_t *rxBuffer, char *text)
{
uint8_t txBuffer[1] = {0};
uint8_t spiRxSize = rxSize;
if (reg <= nRF24_CMD_W_REG)
{
txBuffer[0] = nRF24_CMD_R_REG | (reg & nRF24_R_W_MASK);
spiRxSize++;
}
else
{
txBuffer[0] = reg;
}
nRF24_SendCommand(nrf, txBuffer, rxBuffer, spiRxSize);
OutputToSerial(txBuffer, rxBuffer, spiRxSize, text);
}
void OutputToSerial(uint8_t *writeBuffer, uint8_t *readBuffer, uint8_t size, char *text)
{
char strData[100] = {0}, rxStrData[100] = {0};
long long unsigned txStr = concatData(writeBuffer, size);
long long unsigned rxStr = concatData(readBuffer, size);
LONG_TO_STR(strData, txStr); // POTENTIAL ERROR.....!
LONG_TO_STR(rxStrData, rxStr);
char outputMsg[60] = {0};
strcpy(outputMsg, text);
strcat(outputMsg, ": 0x%s ----------- 0x%s\n");
printf (outputMsg, strData, rxStrData);
}
// main.c
StreamBufferHandle_t streamBuffer;
There may be other issues, but LONG_TO_STR() itself is simply a mess.
Consider that the value 0x123400005678 would print as "12345678", because the leading zeros of the low 32 bits are dropped. That ref code is broken.
Yes, it's too bad the code has long long yet no "%llx". It is easy enough to rewrite it all as a clean function.
//#define PRI_UINT64_C_Val(value) ((unsigned long) (value>>32)), ((unsigned long)value)
//#define LONG_TO_STR(STR, LONG_VAL) (sprintf(STR, "%lx%lx", PRI_UINT64_C_Val(LONG_VAL)))
#include <stdio.h>
#include <string.h>
#include <limits.h>
// Good for base [2...16]
void ullong_to_string(char *dest, unsigned long long x, int base) {
char buf[sizeof x * CHAR_BIT + 1]; // Worst case size
char *p = &buf[sizeof buf - 1]; // set to last element
*p = '\0';
do {
p--;
*p = "0123456789ABCDEF"[x % (unsigned) base];
x /= (unsigned) base;
} while (x);
strcpy(dest, p);
}
int main(void) {
char buf[100];
ullong_to_string(buf, 0x123400005678, 16); puts(buf);
ullong_to_string(buf, 0, 16); puts(buf);
ullong_to_string(buf, ULLONG_MAX, 16); puts(buf);
ullong_to_string(buf, ULLONG_MAX, 10); puts(buf);
return 0;
}
Output
123400005678
0
FFFFFFFFFFFFFFFF
18446744073709551615
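If only hex output is needed and "%llx" is unavailable, another option is to keep the two-halves idea but zero-pad the low half. The following is a sketch under that assumption; u64_to_hex is a hypothetical name, and the caller is assumed to provide at least 17 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats x as lowercase hex without using %llx.
 * dest must hold at least 17 bytes (16 digits plus terminator). */
static void u64_to_hex(char *dest, size_t dest_size, unsigned long long x)
{
    unsigned long hi = (unsigned long)((x >> 32) & 0xFFFFFFFFul);
    unsigned long lo = (unsigned long)(x & 0xFFFFFFFFul);

    assert(dest != NULL && dest_size >= 17);
    if (hi != 0) {
        /* Pad the low half to 8 digits so its leading zeros survive. */
        snprintf(dest, dest_size, "%lx%08lx", hi, lo);
    } else {
        snprintf(dest, dest_size, "%lx", lo);
    }
}
```

The "%08lx" is the piece the original macro was missing.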

Save or Move OpenSSL context to continue hash "later" or "elsewhere"

I'll say up front, that my aim is not for security, but rather verifying data integrity.
I want to compute a standard hash for several terabytes of data using OpenSSL routines. The data is logically serialized (i.e. one array of bytes), but is distributed over many computers.
Rather than move the huge dataset to a single process (and memory space) for hashing, I want to pass around the hashing "context" so that the hash computation can take place locally everywhere the data is held.
Another way to frame the problem would be, I want to add bytes to the dataset later and continue the hash computation but result in the same hash answer as though all the bytes were available initially.
I've run into a sticking point, because I can't figure out how the "context" is stored by the library. I don't know what I would need to pass around or save to accomplish my intent.
(It is safe to assume that all hashing participants will be using the same version of the OpenSSL library.)
I adapted the documentation example to process two chunks of bytes to facilitate discussion. Now I want to know is there a way to pull the plug on the computer between chunk1 and chunk2 and still get the same answer? Or what if chunk1 and chunk2 are stored on different computers?
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>
void print_bytes(const unsigned char * bytes, size_t count, const char * prefix)
{
printf("%s",prefix);
for(size_t i = 0; i < count; ++i) {
printf("%02x", bytes[i]);
}
printf(" (%zu Bytes)\n", count);
}
unsigned int md5_digest_process(const EVP_MD* type, const unsigned char *input, size_t numBytes, unsigned char* hash_out) {
EVP_MD_CTX mdctx;
unsigned int hash_len = 0;
size_t chunk1_size = numBytes/2;
size_t chunk2_size = numBytes - chunk1_size;
EVP_MD_CTX_init(&mdctx);
print_bytes( (const unsigned char *)&mdctx, sizeof(EVP_MD_CTX), "EVP_MD_CTX: " );
EVP_DigestInit_ex(&mdctx, type, NULL);
print_bytes( (const unsigned char *)&mdctx, sizeof(EVP_MD_CTX), "EVP_MD_CTX: " );
// Hash chunk 1:
EVP_DigestUpdate(&mdctx, input, chunk1_size);
print_bytes( (const unsigned char *)&mdctx, sizeof(EVP_MD_CTX), "EVP_MD_CTX: " );
// Hash chunk 2:
EVP_DigestUpdate(&mdctx, input+chunk1_size, chunk2_size);
print_bytes( (const unsigned char *)&mdctx, sizeof(EVP_MD_CTX), "EVP_MD_CTX: " );
EVP_DigestFinal_ex(&mdctx, hash_out, &hash_len);
EVP_MD_CTX_cleanup(&mdctx);
return hash_len;
}
int main(int argc, char *argv[]) {
if (argc!=2) {
fprintf(stderr, "One argument string expected.\n");
return 1;
}
//OpenSSL_add_all_digests();
//const EVP_MD *md = EVP_get_digestbyname("MD5");
const EVP_MD *md = EVP_md5(); //EVP_sha256();
if(!md) {
fprintf(stderr, "Unable to init MD5 digest\n");
return 1;
}
unsigned char hash[EVP_MAX_MD_SIZE];
const unsigned char * allBytes = (const unsigned char *)argv[1];
size_t numBytes = strlen(argv[1]); // skip null terminator, consistent with command line md5 -s [string]
printf("Hashing %zu bytes. EVP_MAX_MD_SIZE is %d, EVP_MD_CTX size is %zu bytes.\n", numBytes, EVP_MAX_MD_SIZE, sizeof(EVP_MD_CTX));
int hash_len = md5_digest_process(md, allBytes, numBytes, hash);
print_bytes(hash, hash_len, "Hash: " );
return 0;
}
With output:
$ ./a.out foobar
Hashing 6 bytes. EVP_MAX_MD_SIZE is 64, EVP_MD_CTX size is 48 bytes.
EVP_MD_CTX: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (48 Bytes)
EVP_MD_CTX: 609d9d01010000000000000000000000000000000000000080274062cd7f0000000000000000000040de8e0101000000 (48 Bytes)
EVP_MD_CTX: 609d9d01010000000000000000000000000000000000000080274062cd7f0000000000000000000040de8e0101000000 (48 Bytes)
EVP_MD_CTX: 609d9d01010000000000000000000000000000000000000080274062cd7f0000000000000000000040de8e0101000000 (48 Bytes)
Hash: 3858f62230ac3c915f300c664312c63f (16 Bytes)
$ md5 -s foobar
MD5 ("foobar") = 3858f62230ac3c915f300c664312c63f
Since EVP_MD_CTX is not changing after each update, I infer that the algorithm state is actually stored elsewhere and I cannot simply copy the 48 EVP_MD_CTX bytes.
I've seen EVP_MD_CTX_copy_ex() in the manual, but I don't see how to use it for my purpose.

copy a const char* into array of char (facing a bug)

I have the following method:
static void setName(const char* str, char buf[16])
{
int sz = MIN(strlen(str), 16);
for (int i = 0; i < sz; i++) buf[i] = str[i];
buf[sz] = 0;
}
int main()
{
const char* string1 = "I am getting bug for this long string greater than 16 length";
char mbuf[16];
setName(string1, mbuf);
// if I use buf in my code it is leading to spurious characters since length is greater than 16 .
}
Please let me know the correct way to code the above, given that the buffer length is restricted to 16 in static void setName(const char* str, char buf[16]).
When an array is passed as an argument, it decays into a pointer to the first element of the array, so the function needs some other rule to know the number of elements.
You declare char mbuf[16] and pass it to setName(), but setName() does not receive a char[]; it receives a char* instead.
So the declaration is equivalent to
static void setName(const char* str, char* buf)
Next, char mbuf[16] can only store 15 chars, because the last char has to be 'null terminator', which is '\0'. Otherwise, the following situation will occur:
// if I use buf in my code it is leading to spurious characters since length is greater than 16 .
Perhaps this will help you understand:
char str[] = "foobar"; // = {'f','o','o','b','a','r','\0'};
So the code should be
static void setName(const char* str, char* buf)
{
int sz = MIN(strlen(str), 15); // not 16
for (int i = 0; i < sz; i++) buf[i] = str[i];
buf[sz] = '\0'; // assert that you're assigning 'null terminator'
}
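Also, I would recommend not reinventing the wheel: why not use strncpy instead? Note that strncpy does not write a terminator when the source is longer than the count, so set it yourself:
char mbuf[16];
strncpy(mbuf, "12345678901234567890", 15);
mbuf[15] = '\0'; // strncpy leaves the result unterminated when it truncates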
The following code passes the size of the memory allocated to the buffer, to the setName function.
That way the setName function can ensure that it does not write outside the allocated memory.
Inside the function either a for loop or strncpy can be used. Both will be controlled by the size parameter sz and both will require that a null terminator character is placed after the copied characters. Again, sz will ensure that the null terminator is written within the memory allocated to the buffer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void setName(const char *str, char *buf, int sz);
int main()
{
const int a_sz = 16;
const char* string = "This bit is OK!! but any more than 15 characters are dropped";
/* allocate memory for a buffer & test successful allocation*/
char *mbuf = malloc(a_sz);
if (mbuf == NULL) {
printf("Out of memory!\n");
return(1);
}
/* call function and pass size of buffer */
setName(string, mbuf, a_sz);
/* print resulting buffer contents */
printf("%s\n", mbuf); // printed: This bit is OK!
/* free the memory allocated to the buffer */
free(mbuf);
return(0);
}
static void setName(const char *str, char *buf, int sz)
{
int i;
/* size of string or max 15 */
if (strlen(str) > sz - 1) {
sz--;
} else {
sz = strlen(str);
}
/* copy a maximum of 15 characters into buffer (0 to 14) */
for (i = 0; i < sz; i++) buf[i] = str[i];
/* null terminate the string - won't be more than buf[15] */
buf[i] = '\0';
}
Changing one value const int a_sz allows different numbers of characters to be copied. There is no 'hard coding' of the size in the function, so reducing the risk of errors if the code is modified later on.
I replaced MIN with a simple if ... else structure so that I could test the code.
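A third option, not used in the answers above, is to let snprintf do the truncation, since it never writes more than the given size and always terminates the result when the size is non-zero. This is a minimal sketch; set_name_snprintf is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variant of setName: copies str into buf, truncating to
 * buf_size - 1 characters and always null-terminating. */
static void set_name_snprintf(const char *str, char *buf, size_t buf_size)
{
    assert(str != NULL && buf != NULL && buf_size > 0);
    /* snprintf writes at most buf_size bytes including the terminator. */
    snprintf(buf, buf_size, "%s", str);
}
```

Called as set_name_snprintf(string, mbuf, sizeof mbuf) with char mbuf[16], this keeps at most the first 15 characters.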

Storing values of an array in one variable

I am trying to use md5 code to calculate checksums of files. Now the given function prints out the (previously calculated) checksum on screen, but I want to store it in a variable, to be able to compare it later on.
I guess the main problem is that I want to store the content of an array in one variable.
How can I manage that?
This is probably a very stupid question, but maybe someone can help.
Below is the function to print out the value. I want to modify it to store the result in one variable.
static void MDPrint (MD5_CTX* mdContext)
{
int i;
for (i = 0; i < 16; i++)
{
printf ("%02x", mdContext->digest[i]);
} // end of for
} // end of function
For reasons of completeness the used struct:
/* typedef a 32 bit type */
typedef unsigned long int UINT4;
/* Data structure for MD5 (Message Digest) computation */
typedef struct {
UINT4 i[2]; /* number of _bits_ handled mod 2^64 */
UINT4 buf[4]; /* scratch buffer */
unsigned char in[64]; /* input buffer */
unsigned char digest[16]; /* actual digest after MD5Final call */
} MD5_CTX;
and the used function to calculate the checksum:
static int MDFile (char* filename)
{
FILE *inFile = fopen (filename, "rb");
MD5_CTX mdContext;
int bytes;
unsigned char data[1024];
if (inFile == NULL) {
printf ("%s can't be opened.\n", filename);
return -1;
} // end of if
MD5Init (&mdContext);
while ((bytes = fread (data, 1, 1024, inFile)) != 0)
MD5Update (&mdContext, data, bytes);
MD5Final (&mdContext);
MDPrint (&mdContext);
printf (" %s\n", filename);
fclose (inFile);
return 0;
}
Declare an array and memcpy the result.
Example:
unsigned char old_md5_dig[16]; // <-- binary format
...
MD5_CTX mdContext;
MD5Init(&mdContext);
MD5Update(&mdContext, data, bytes);
MD5Final(&mdContext);
memcpy(old_md5_dig, mdContext.digest, 16); // <--
Edit: to compare the previous MD5 hash with the new one, you can use memcmp:
if (memcmp(old_md5_dig, mdContext.digest, 16)) {
// different hashes
}
Just pass a char buffer and its size to this function:
static void MDGen (mdContext, buf, size)
MD5_CTX *mdContext;
char *buf;
size_t size;
{
int i;
int minSize = 33; // 16 pairs of hex digits plus terminator
if ((buf != NULL) && (size >= minSize))
{
memset(buf, 0, size);
for (i = 0; i < 16; i++)
{
snprintf(buf + (i*2), size - (i*2), "%02x", mdContext->digest[i]);
}
}
}
You can define a variable:
char **md5sums;
You will then need to modify MDPrint to instead return a malloced null-terminated string with the 32 hex digits. You can basically reuse your existing loop, but with sprintf instead.
Then have main add each md5sum (a char*) to md5sums. You will need to use realloc to allocate memory for md5sums because you don't know the number of elements up front.
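To make the realloc part concrete, here is a minimal sketch of a growable list of hex strings. The names struct md5list, md5list_add and md5list_free are my own (hypothetical, not part of the MD5 code in the question), and it assumes each checksum is already available as a null-terminated hex string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the char **md5sums idea: a growable
   list of heap-allocated hex strings. */
struct md5list {
    char **sums;   /* plays the role of md5sums from the answer */
    size_t count;  /* number of strings currently stored */
};

/* Append a private copy of hex to the list.
   Returns 0 on success, -1 on allocation failure; on failure the
   list is left unchanged. */
int md5list_add(struct md5list *list, const char *hex)
{
    size_t len = strlen(hex) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, hex, len);

    /* realloc through a temporary so the old block is not lost on failure */
    char **grown = realloc(list->sums, (list->count + 1) * sizeof *grown);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[list->count] = copy;
    list->sums = grown;
    list->count++;
    return 0;
}

/* Release every stored string and the array itself. */
void md5list_free(struct md5list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->sums[i]);
    free(list->sums);
    list->sums = NULL;
    list->count = 0;
}
```

Start with a list initialised to { NULL, 0 }, call md5list_add once per file, and call md5list_free when you are done. Growing by one element per call is simple but not the fastest; doubling the capacity is a common alternative if you expect many files.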
It should be:
static char* MDString (mdContext)
MD5_CTX *mdContext;
{
int i;
char *digest = malloc(sizeof(char) * 33);
if(digest == NULL)
{
return NULL;
}
for (i = 0; i < 16; i++)
{
sprintf(digest + (i * 2), "%02x", mdContext->digest[i]);
}
return digest;
}
Also, you should modify your code by editing your question. And why are you using K&R syntax?
EDIT: I fixed some incorrect counts.
Since you want to duplicate, store, compare, free, and probably do more with the MD5 digest, just create an md5_t type and write appropriate functions to manipulate it, i.e.:
typedef unsigned char md5_t[16]; /* unsigned char, to match MD5_CTX.digest */
md5_t *md5_new( MD5_CTX *pMD5Context )
{
md5_t *pMD5 = malloc( sizeof( md5_t ) );
if( pMD5 == NULL )
return NULL ;
memcpy( pMD5, pMD5Context->digest, 16 );
return pMD5 ;
}
int md5_cmp( md5_t *pMD5A, md5_t *pMD5B )
{
return memcmp( pMD5A, pMD5B, 16 );
}
void md5_print( md5_t *pMD5 )
{
...
}
void md5_free( md5_t *pMD5 )
{
free( pMD5 );
}
And so on. Next, create a type for your MD5 array and simple functions to manipulate it:
typedef struct md5array_t {
unsigned int uSize ;
md5_t **ppMD5 ;
} md5array_t;
md5array_t *md5array_new( void )
{
md5array_t *pArray = malloc( sizeof( md5array_t ) );
if( pArray == NULL )
return NULL ;
pArray->uSize = 0 ;
pArray->ppMD5 = NULL ;
return pArray ;
}
md5array_t *md5array_add( md5array_t *pArray, md5_t *pMD5 )
{
/* grow the table of pointers, not the struct itself */
md5_t **ppNew = realloc( pArray->ppMD5, ( pArray->uSize + 1 ) * sizeof( md5_t * ) );
if( ppNew == NULL )
return NULL ;
pArray->ppMD5 = ppNew ;
pArray->ppMD5[ pArray->uSize ] = pMD5 ;
pArray->uSize ++ ;
return pArray ;
}
md5_t *md5array_get( md5array_t *pArray, unsigned int uIndex )
{
return pArray->ppMD5[ uIndex ];
}
void md5array_free( md5array_t *pArray )
{
/* I let you find what to write here.
Be sure to read AND understand the previous
functions. */
}
To summarize: create a type and the functions you need to manipulate it as soon as you want to do more than one operation on a datum. You don't need to create a real, generic type with full-blown functions for every operation you can imagine on that type: just code what you need. For example, with md5array_t you can add an md5_t * but you cannot delete one (unless you write the function void md5array_del( md5array_t *pArray, int iIndex )).
P.S.: my C code is here to "illustrate" my point of view, not to be usable by just copying and pasting it as is ...
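As an illustration of "just code what you need", here is one possible sketch of the md5array_del function mentioned above. It is only a guess at what such a function could look like: it repeats the type definitions so that it compiles on its own, it uses an unsigned index and an int return code (both my assumptions), and it assumes the array owns its entries, i.e. each md5_t was obtained from malloc as in md5_new.

```c
#include <stdlib.h>
#include <string.h>

/* Types repeated here so the sketch is self-contained. */
typedef unsigned char md5_t[16];

typedef struct md5array_t {
    unsigned int uSize;
    md5_t **ppMD5;
} md5array_t;

/* Hypothetical md5array_del: free the entry at uIndex and close the gap.
   Returns 0 on success, -1 if pArray is NULL or uIndex is out of range.
   The pointer table is not shrunk; only uSize is decremented. */
int md5array_del(md5array_t *pArray, unsigned int uIndex)
{
    if (pArray == NULL || uIndex >= pArray->uSize)
        return -1;

    free(pArray->ppMD5[uIndex]);

    /* shift the following pointers down by one slot
       (moves zero bytes when the last entry is removed) */
    memmove(&pArray->ppMD5[uIndex], &pArray->ppMD5[uIndex + 1],
            (pArray->uSize - uIndex - 1) * sizeof(md5_t *));
    pArray->uSize--;
    return 0;
}
```

memmove is used rather than memcpy because the source and destination ranges overlap.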
Store it as a string and then use strcmp() to compare.
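A minimal sketch of that idea, assuming the digest is available as 16 bytes (as in mdContext->digest). The helper names digest_to_hex and md5_hex_equal are hypothetical, not part of the MD5 code in the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the 16 digest bytes as 32 lowercase hex
   digits plus a terminating '\0' into out, which must hold 33 bytes. */
void digest_to_hex(const unsigned char digest[16], char out[33])
{
    for (int i = 0; i < 16; i++)
        snprintf(out + i * 2, 3, "%02x", (unsigned int)digest[i]);
}

/* Returns 1 if both hex strings are identical, 0 otherwise.
   strcmp is case-sensitive, so both strings must use the same case. */
int md5_hex_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The string form is convenient for printing or writing to a file, but it takes 33 bytes instead of 16, and a checksum read from elsewhere in uppercase would not match unless you normalise the case first.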
Just leave it in an array!
You don't have to store it in a variable, because it is ALREADY in a variable.
Just create a global variable, store the MD5 hash in it, and compare against it later.
What you need is an MD5IsEqual function, which takes two arrays like this.
int MD5IsEqual(unsigned char *x, unsigned char* y)
{
int i;
for(i=0; i<16;i++)
if(x[i] != y[i])
return 0;
return 1;
}
Why not make the function
int MD5File(char *filename, unsigned char *digest){
/* as before */
memcpy(digest, mdContext.digest, 16); /* mdContext is a local struct, not a pointer */
return 0;
}
so that outside the function you have access to the digest (after printing it)?
The digest is just an array of 16 unsigned char...
You know where the sum is stored from the way you print it: through ->digest[i], which is defined as
unsigned char digest[16];
So you basically just need to copy these 16 unsigned char into another array of unsigned char (at least 16 elements long). The function memcpy can do it. If you need to compare two sums, you can use memcmp (the size to compare is 16*sizeof(unsigned char), which is 16, since sizeof(unsigned char) is 1).
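The copy and the comparison described above could be wrapped like this. It is a sketch that only assumes the digest is a plain array of 16 unsigned char; MD5_DIGEST_LEN, digest_save and digest_equal are names I made up for the example.

```c
#include <string.h>

#define MD5_DIGEST_LEN 16  /* hypothetical name for the digest size */

/* Copy a finished digest (e.g. mdContext.digest after MD5Final)
   into caller-owned storage. */
void digest_save(unsigned char dst[MD5_DIGEST_LEN],
                 const unsigned char src[MD5_DIGEST_LEN])
{
    memcpy(dst, src, MD5_DIGEST_LEN);
}

/* Returns 1 when the two digests are byte-for-byte identical.
   memcmp returns 0 on equality, hence the explicit comparison. */
int digest_equal(const unsigned char a[MD5_DIGEST_LEN],
                 const unsigned char b[MD5_DIGEST_LEN])
{
    return memcmp(a, b, MD5_DIGEST_LEN) == 0;
}
```

You would call digest_save right after MD5Final for the first file, then digest_equal against the digest of the second file.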

Resources