Quick question for those more experienced in C...
I want to compute a SHA256 checksum, using the functions from OpenSSL, of the current time at which an operation takes place. My code consists of the following:
time_t cur_time = 0;
char t_ID[40];
char obuf[40];
char * timeBuf = malloc(sizeof(char) * 40 + 1);
sprintf(timeBuf, "%s", asctime(gmtime(&cur_time)));
SHA256(timeBuf, strlen(timeBuf), obuf);
sprintf(t_ID, "%02x", obuf);
And yet, when I print out the value of t_ID in a debug statement, it looks like 'de54b910'. What am I missing here?
Edited to fix my typo around malloc and also to say I expected to see the digest form of a sha256 checksum, in hex.
Since obuf is an array, printing its value causes it to decay to a pointer and prints the value of the memory address that the array is stored at. Write sensible code to print a 256-bit value.
Maybe something like:
for (int i = 0; i < 32; ++i)
    printf("%02X", (unsigned char)obuf[i]); // cast so bytes >= 0x80 don't print as FFFFFFxx when char is signed
This is not really intended as an answer, I'm just sharing a code fragment with the OP.
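If you want the digest in a string such as t_ID rather than printed directly, keep in mind that a SHA-256 digest is 32 bytes, which becomes 64 hex digits plus a terminating NUL, so t_ID[40] cannot hold it. A minimal sketch of a conversion helper follows; the name digest_to_hex is hypothetical, and %02x is used so bytes below 0x10 keep their leading zero.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes 2*len lowercase hex digits plus a NUL into out.
   out must hold at least 2*len + 1 bytes (65 bytes for a SHA-256 digest). */
void digest_to_hex(const unsigned char *digest, size_t len, char *out)
{
    size_t i;
    for (i = 0; i < len; i++)
        sprintf(out + 2 * i, "%02x", digest[i]);   /* exactly two digits per byte */
    out[2 * len] = '\0';                           /* also covers len == 0 */
}
```

Usage would look like `unsigned char obuf[SHA256_DIGEST_LENGTH]; char t_ID[2 * SHA256_DIGEST_LENGTH + 1];` followed by `digest_to_hex(obuf, sizeof obuf, t_ID);` after the SHA256() call.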
To hash the binary time_t directly without converting the time to a string, you could use something like (untested):
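#include <openssl/sha.h>
#include <time.h>
time_t cur_time;
char t_ID[40];
unsigned char obuf[SHA256_DIGEST_LENGTH];
time(&cur_time); // time() reads the clock; gmtime() would not set cur_time
SHA256((const unsigned char *)&cur_time, sizeof(cur_time), obuf);
// You know this doesn't work:
// sprintf(t_ID, "%02x", obuf);
// Instead see https://stackoverflow.com/questions/6357031/how-do-you-convert-buffer-byte-array-to-hex-string-in-c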
This doesn't address byte order. You could use network byte order functions, see:
htons() function in socket programing
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
One complication: the size of time_t is not specified by the C standard; it can vary by platform. It was traditionally 32 bits, but on 64-bit machines it is often 64 bits. It's also usually the number of seconds since the Unix epoch, midnight UTC, January 1, 1970.
If you're willing to live with the assumption that the resolution is seconds and don't have to worry about the code working in 20 years (see: https://en.wikipedia.org/wiki/Year_2038_problem), then you might use (untested):
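#include <netinet/in.h>  // htonl()
#include <openssl/sha.h>
#include <stdint.h>
#include <time.h>
time_t cur_time;
uint32_t net_cur_time; // cur_time converted to network byte order
unsigned char obuf[SHA256_DIGEST_LENGTH];
time(&cur_time); // read the current time; gmtime() would not set cur_time
net_cur_time = htonl((uint32_t)cur_time);
SHA256((const unsigned char *)&net_cur_time, sizeof(net_cur_time), obuf);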
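If the 32-bit truncation is a concern, one alternative (a sketch, not part of the answer above) is to serialize the time as 8 bytes, most significant first, which gives the same byte sequence on every platform regardless of sizeof(time_t) or endianness. The helper name time_to_be64 is hypothetical, and it assumes time_t is an integer type whose values fit in int64_t.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: stores t as 8 big-endian bytes.
   Assumes time_t is an integer type that fits in int64_t. */
void time_to_be64(time_t t, unsigned char out[8])
{
    /* Widen first; converting a negative int64_t to uint64_t is
       well defined (modulo 2^64), so pre-1970 times also round-trip. */
    uint64_t v = (uint64_t)(int64_t)t;
    int i;
    for (i = 7; i >= 0; i--) {   /* fill from the least significant end */
        out[i] = (unsigned char)(v & 0xFFu);
        v >>= 8;
    }
}
```

The hash input would then be `unsigned char be[8]; time_to_be64(cur_time, be); SHA256(be, sizeof be, obuf);`.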
I'll repeat what I mentioned in a comment: it's hard to understand what you possibly hope to gain from this hash, or why you can't use the timestamp directly. Cryptographically secure hashes such as SHA256 go through a lot of work to ensure the hash is not reversible. You can't benefit from that because the input data is from a limited, known set. At the very least, why not use CRC32 instead, since it's much faster?
Good luck.
Related
Today I am trying to copy an unsigned long variable into the contents of an unsigned char * variable.
The reason is that I wrote an RC4 cipher which requires the key input to be an unsigned char *. I am using the SYSTEMTIME structure to obtain a value and combining it with a randomly generated long value to obtain my key for RC4; I am using it as a timestamp for a user-created account, to mark in my SQLite databases.
Anyway, the problem I ran into is that I cannot copy the ULONG into a PUCHAR.
I've tried
wsprintfA(reinterpret_cast<LPSTR>(ucVar), "%lu", ulVar);
and I've tried
wsprintfA((LPSTR)ucVar, "%lu", ulVar);
However, after executing my program, ucVar is either empty or the application crashes.
[edit 1]
I thought maybe the memcpy approach would work, so I tried declaring another variable and copying it into ucVar, but it still crashed the application, i.e. it didn't reach the MessageBox():
unsigned char *ucVar;
char tmp[64]; // since ulVar will never be bigger than 63 characters + 1 for '\0'
wsprintfA(tmp, "%lu", ulVar);
memcpy(ucVar, tmp, sizeof(tmp));
MessageBox(0, (LPSTR)ucVar, "ucVar", 0);
[/edit 1]
[edit 2]
HeapAlloc() on ucVar with a size of 64 fixed my problem, thank you ehnz for your suggestion!
[/edit 2]
Can anyone give me some approach to this problem? It is greatly appreciated!
Regards,
Andrew
Unless you have ownership of memory you're trying to use, all kinds of things can happen. These may range from the error going unnoticed because nothing else already owns that memory, to an instant crash, to a value that disappears because something else overwrites the memory between the time that you set it and the time that you attempt to retrieve a value from it.
Fairly fundamental concepts when dealing with dynamic memory allocation, but quite the trap for the uninitiated.
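In portable C, the same fix (make sure ucVar points at memory you own before writing into it) can be sketched with malloc() and a C99 snprintf() in place of HeapAlloc() and wsprintfA(). The function name ulong_to_key is hypothetical; the caller owns the returned buffer and must free() it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats v as decimal text in a freshly allocated
   buffer owned by the caller. Returns NULL on failure. */
unsigned char *ulong_to_key(unsigned long v)
{
    /* A first call with a NULL buffer only measures the length needed. */
    int needed = snprintf(NULL, 0, "%lu", v);
    unsigned char *buf;
    if (needed < 0)
        return NULL;
    buf = malloc((size_t)needed + 1);   /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;
    snprintf((char *)buf, (size_t)needed + 1, "%lu", v);
    return buf;
}
```

Measuring first avoids guessing a fixed size like 64, and the pointer is never used before it refers to allocated memory.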
I am trying to compute the SHA1 value of a given string in C. I am using the OpenSSL library via #include <openssl/sha.h>. The relevant part of the program is below.
but it shouldn't cause any issues.
void checkHash(char* tempString) {
unsigned char testHash[SHA_DIGEST_LENGTH];
unsigned char* sha1String = (unsigned char*)tempString;
SHA1(sha1String, sizeof(sha1String), testHash);
printf("String: %s\nActual hash: 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8\nComputed hash: ", tempString);
// I verified the actual hash for "a" using multiple online hash generators.
for (i = 0; i < SHA_DIGEST_LENGTH; i++)
printf("%x", testHash[i]);
printf("\n");
}
Running the program with checkHash("a"); yields the following output:
String: a
Actual hash: 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8
Computed hash: 16fac7d269b6674eda4d9cafee21bb486556527c
How come these hashes do not match? I am running in a 64-bit Linux VM on top of a 64-bit Windows 7 machine. That has caused some problems with poor hashing implementations for me in the past but I doubt that is the issue using the OpenSSL version.
sizeof(sha1String) is the same thing as sizeof(unsigned char*), i.e. the size of a data pointer. You want to pass the string's length there: use strlen instead of sizeof, otherwise you won't be hashing what you think you're hashing.
If tempString isn't a null-terminated string but arbitrary data, you need to pass the length of the data to checkHash; in that case there's no way to tell the length from within that function.
Anyone know why certain fields in proc.h in Minix are char, when I thought they'd be int?
char p_ticks_left;   /* number of scheduling ticks left */
char p_quantum_size; /* quantum size in ticks */
So, if we want to add a new "int" field should we make it a char?
If char is big enough to hold all the necessary values, why not use it? Of course, int may be somewhat more performant, but at the same time char is usually smaller.
I believe you can use any type that makes sense.
Consider it from a design point of view: maybe a char is enough to store the "number of scheduling ticks left" and the "quantum size in ticks", and a char is smaller than an int.
I was wondering if there's a really good (performant) way to convert a whole file to lowercase in C.
I use fgetc, convert each char to lowercase, and write it to a temp file with fputc. At the end I remove the original and rename the temp file to the original's name. But I think there must be a better solution.
This doesn't really answer the question (community wiki), but here's an (over?)-optimized function to convert text to lowercase:
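#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Copies in to out, lowercasing every byte.
   Returns 0 on success, 1 on a read or write error. */
int fast_lowercase(FILE *in, FILE *out)
{
    char buffer[65536];          /* large stack buffer: few fread/fwrite calls */
    size_t readlen, wrotelen;
    char *p, *e;
    char conversion_table[256];  /* tolower() precomputed for every byte value */
    int i;

    for (i = 0; i < 256; i++)
        conversion_table[i] = tolower(i);

    for (;;) {
        readlen = fread(buffer, 1, sizeof(buffer), in);
        if (readlen == 0) {
            if (ferror(in))
                return 1;        /* read error */
            assert(feof(in));
            return 0;            /* clean end of file */
        }
        /* cast to unsigned char so bytes >= 0x80 index the table safely */
        for (p = buffer, e = buffer + readlen; p < e; p++)
            *p = conversion_table[(unsigned char) *p];
        wrotelen = fwrite(buffer, 1, readlen, out);
        if (wrotelen != readlen)
            return 1;            /* write error */
    }
}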
This isn't Unicode-aware, of course.
I benchmarked this on an Intel Core 2 T5500 (1.66GHz), using GCC 4.6.0 and i686 (32-bit) Linux. Some interesting observations:
It's about 75% as fast when buffer is allocated with malloc rather than on the stack.
It's about 65% as fast using a conditional rather than a conversion table.
I'd say you've hit the nail on the head. Using a temp file means that you don't delete the original until you're sure you're done processing it, so on error the original remains. I'd say that's the correct way of doing it.
As suggested by another answer, if the file size permits, you can memory-map the file with the mmap function and have it readily available in memory. (There's no real performance difference if the file is smaller than a page, as it will probably be read into memory on the first read anyway.)
You can usually get a little faster on big inputs by using fread and fwrite to read and write big chunks of the input/output. You should also probably read a bigger chunk (the whole file, if possible) into memory, convert it, and then write it all at once.
edit: I just remembered one more thing. Sometimes programs can be faster if you select a prime number (or at the very least not a power of 2) as the buffer size. I seem to recall this has to do with specifics of the caching mechanism.
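The temp-file approach can be sketched as below. The function name lowercase_file and the "<path>.tmp" naming scheme are hypothetical; the inner getc/putc loop is kept short for clarity and could be replaced by a buffered loop like fast_lowercase() above. The original is replaced only after the whole copy has been written and closed successfully.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: lowercases the file at path via a temp file.
   Returns 0 on success, -1 on error (the original is then untouched). */
int lowercase_file(const char *path)
{
    char tmp_path[4096];
    FILE *in, *out;
    int c, n, failed = 0;

    /* Temp file next to the original so rename() stays on one filesystem. */
    n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;
    if ((in = fopen(path, "rb")) == NULL)
        return -1;
    if ((out = fopen(tmp_path, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    while ((c = getc(in)) != EOF) {
        if (putc(tolower(c), out) == EOF) {
            failed = 1;
            break;
        }
    }
    if (ferror(in))
        failed = 1;
    fclose(in);
    if (fclose(out) != 0)   /* buffered write errors can surface here */
        failed = 1;
    if (failed) {
        remove(tmp_path);
        return -1;
    }
    /* POSIX rename() atomically replaces an existing target; on Windows it
       fails if the target exists, so the original would need remove() first. */
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```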
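An in-place mmap variant might look like the following POSIX-only sketch (the function name lowercase_file_mmap is hypothetical). Note the trade-off against the temp-file approach: if the process dies midway, the file is left partly converted.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: lowercases a file in place through a shared mapping.
   Returns 0 on success, -1 on error. */
int lowercase_file_mmap(const char *path)
{
    struct stat st;
    unsigned char *p;
    size_t i, len;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    if (len == 0) {            /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (i = 0; i < len; i++)  /* writes go straight to the mapped file pages */
        p[i] = (unsigned char)tolower(p[i]);
    if (msync(p, len, MS_SYNC) != 0) {   /* flush changes before returning */
        munmap(p, len);
        close(fd);
        return -1;
    }
    munmap(p, len);
    close(fd);
    return 0;
}
```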
If you're processing big files (big as in, say, multi-megabytes) and this operation is absolutely speed-critical, then it might make sense to go beyond what you've inquired about. One thing to consider in particular is that a character-by-character operation will perform less well than using SIMD instructions.
For example, with SSE2 you could code a tolower_parallel loop like this (pseudocode):
for (cur_parallel_word = begin_of_block;
cur_parallel_word < end_of_block;
cur_parallel_word += parallel_word_width) {
/*
* in SSE2, parallel compares are either about 'greater' or 'equal'
* so '>=' and '<=' have to be constructed. This would use 'PCMPGTB'.
* The 'ALL' macro is supposed to replicate into all parallel bytes.
*/
mask1 = parallel_compare_greater_than(*cur_parallel_word, ALL('A' - 1));
mask2 = parallel_compare_greater_than(ALL('Z' + 1), *cur_parallel_word); // 'Z' + 1 so that 'Z' itself is included
/*
* vector op - and all bytes in two vectors, 'PAND'
*/
mask = mask1 & mask2;
/*
* vector op - add a vector of bytes. Would use 'PADDB'.
*/
new = parallel_add(cur_parallel_word, ALL('a' - 'A'));
/*
* vector op - zero bytes in the original vector that will be replaced
*/
*cur_parallel_word &= ~mask; // bitwise NOT, not logical '!'; that'd become 'PANDN'
/*
* vector op - extract characters from new that replace old, then or in.
*/
*cur_parallel_word |= (new & mask); // PAND / POR
}
I.e. you'd use parallel comparisons to check which bytes are uppercase, and then mask both the original value and the 'lowercased' version (the original with the inverse of the mask, the lowercased version with the mask) before you OR them together to form the result.
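The pseudocode maps fairly directly onto SSE2 intrinsics from <emmintrin.h>. This is a sketch for ASCII input only, assuming GCC or Clang, which define __SSE2__ when SSE2 is enabled (the default on x86-64); other compilers and targets fall back to the scalar loop. The function name ascii_tolower_sse2 is hypothetical. Bytes at or above 0x80 are negative under the signed PCMPGTB compare, so they never match the 'A'..'Z' mask and pass through unchanged.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: lowercases ASCII letters in buf in place. */
void ascii_tolower_sse2(char *buf, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i below_a = _mm_set1_epi8('A' - 1);    /* ALL('A' - 1) */
    const __m128i above_z = _mm_set1_epi8('Z' + 1);    /* ALL('Z' + 1) */
    const __m128i delta   = _mm_set1_epi8('a' - 'A');  /* ALL('a' - 'A') */
    for (; i + 16 <= len; i += 16) {
        __m128i word = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i mask = _mm_and_si128(_mm_cmpgt_epi8(word, below_a),   /* PCMPGTB */
                                     _mm_cmpgt_epi8(above_z, word));  /* PAND */
        __m128i lowered = _mm_add_epi8(word, delta);                  /* PADDB */
        word = _mm_or_si128(_mm_andnot_si128(mask, word),             /* PANDN */
                            _mm_and_si128(mask, lowered));            /* PAND, POR */
        _mm_storeu_si128((__m128i *)(buf + i), word);
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    for (; i < len; i++)
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += 'a' - 'A';
}
```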
If you use mmap'ed file access, this could even be performed in-place, saving on the bounce buffer, and saving on many function and/or system calls.
There is a lot to optimize when your starting point is a character-by-character 'fgetc' / 'fputc' loop; even shell utilities are highly likely to perform better than that.
But I agree that if your need is very special-purpose (i.e. something as clear-cut as ASCII input to be converted to lowercase), then a handcrafted loop as above, using vector instruction sets (like SSE intrinsics/assembly, ARM NEON, or PPC AltiVec), is likely to make a significant speedup possible over existing general-purpose utilities.
Well, you can definitely speed this up a lot, if you know what the character encoding is. Since you're using Linux and C, I'm going to go out on a limb here and assume that you're using ASCII.
In ASCII, we know A-Z and a-z are contiguous and always 32 apart. So we can skip the safety and locale checks of the tolower() function and do something like this:
(pseudo code)
foreach (int) char c in the file:
c += 32.
Or, if there may be both upper and lowercase letters, do a check like
if (c > 64 && c < 91) // the uppercase ASCII range, 'A'..'Z'
then do the addition and write it out to the file.
Also, batch writes are faster, so I would suggest first writing to an array, then all at once writing the contents of the array to the file.
This should be considerably faster.
How can I make a checksum of a file using C? I don't want to use any third-party library, just standard C, and speed is also very important (the files are less than 50 MB, but still).
thanks
I would suggest starting with the simple one and then only worrying about introducing the fast requirement if it turns out to be an issue.
Far too much time is wasted on solving problems that do not exist (see YAGNI).
By simple, I mean starting a checksum character (all characters here are unsigned) at zero, then reading in every character and subtracting it from the checksum until the end of the file is reached. Because the result is stored back into an unsigned char, it wraps modulo 256, which is well defined in C.
Something like in the following program:
#include <stdio.h>
unsigned char checksum (unsigned char *ptr, size_t sz) {
unsigned char chk = 0;
while (sz-- != 0)
chk -= *ptr++;
return chk;
}
int main(int argc, char* argv[])
{
unsigned char x[] = "Hello_";
unsigned char y = checksum (x, 5);
printf ("Checksum is 0x%02x\n", y);
x[5] = y;
y = checksum (x, 6);
printf ("Checksum test is 0x%02x\n", y);
return 0;
}
which outputs:
Checksum is 0x0c
Checksum test is 0x00
That checksum function actually does both jobs. If you pass it a block of data without a checksum on the end, it will give you the checksum. If you pass it a block with the checksum on the end, it will give you zero for a good checksum, or non-zero if the checksum is bad.
This is the simplest approach and will detect most random errors. It won't detect edge cases like two swapped characters, so if you need more reliability, use something like Fletcher or Adler.
Both of those Wikipedia pages have sample C code you can either use as-is, or analyse and re-code to avoid IP issues if you're concerned.
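For reference, a straightforward Adler-32 looks like this sketch: two running sums modulo 65521, the largest prime below 2^16. It does one modulo per byte for clarity (optimized versions such as zlib's defer the modulo). The name adler32_update is hypothetical; start with 1 and feed the previous result back in to process a file chunk by chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */

/* Hypothetical helper: continues an Adler-32 over len more bytes.
   Pass 1 as the initial value. */
uint32_t adler32_update(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xFFFFu;   /* low half: sum of bytes + 1 */
    uint32_t b = adler >> 16;       /* high half: sum of the a values */
    while (len--) {
        a = (a + *buf++) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

The Wikipedia article's example, "Wikipedia", gives 0x11E60398.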
Determine which algorithm you want to use (CRC32 is one example)
Look up the algorithm on Wikipedia or other source
Write code to implement that algorithm
Post questions here if/when the code doesn't correctly implement the algorithm
Profit?
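Following those steps for CRC32, one possible result is the common table-driven, reflected CRC-32 (polynomial 0xEDB88320, the variant used by zlib and PNG), sketched below. The names crc32_update and crc32_file are hypothetical. The lazy table initialization is not thread-safe; call crc32_update once before starting threads if that matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)   /* one step per bit, LSB first */
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Continues a CRC-32 over len more bytes; start with crc = 0. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (!crc_table_ready)
        crc32_init_table();
    crc = ~crc;                        /* undo the previous final inversion */
    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}

/* Reads fp to the end in large chunks; returns 0 and stores the CRC in *out,
   or -1 on a read error. */
int crc32_file(FILE *fp, uint32_t *out)
{
    unsigned char buf[65536];
    size_t n;
    uint32_t crc = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    if (ferror(fp))
        return -1;
    *out = crc;
    return 0;
}
```

The standard check value for this variant is 0xCBF43926 for the ASCII string "123456789". A file would be opened with fopen(path, "rb") and passed to crc32_file().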
Simple and fast
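FILE *fp = fopen("yourfile", "rb");
unsigned char checksum = 0;
int c;                               /* int, so EOF is distinct from byte 0xFF */
if (fp != NULL) {
    while ((c = fgetc(fp)) != EOF)   /* test fgetc()'s result, not feof() beforehand */
        checksum ^= (unsigned char)c;
    fclose(fp);
}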
Generally, CRC32 with a good polynomial is probably your best choice for a non-cryptographic-hash checksum. See here for some reasons: http://guru.multimedia.cx/crc32-vs-adler32/ Click on the error correcting category on the right-hand side to get a lot more crc-related posts.
I would recommend using a BSD implementation. For example, http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.bin/cksum/