I am looking for a hash function that can hash a list of non-repeating integers while ignoring their order.
Example
I want the two lists
l1 = [0, 1, 3, 7]
l2 = [7, 3, 1, 0]
to have the same hash.
Background
I have an algorithm that finds a list of vertices on a graph. In an undirected graph, the algorithm will find certain lists multiple times in different orders. With my current understanding of the algorithm, it is easier to filter out the duplicates rather than re-inventing the algorithm. For performance reasons, I understand it to be easier to hash the found lists of vertices rather than comparing the whole lists.
Possible answers
Now, I see that
an XOR or a simple sum might be an answer.
Unfortunately, both offer too much potential for hash collisions, as I see it.
The not-very-efficient working method is to sort a list, and then use this sorted list to compare the new list (also sorted) against.
Other Thoughts
Given that
The lists contain only integers.
The integers will be the vertex indices, and the graph can have billions of vertices.
The integers in a list are non-repeating, and their order doesn't matter.
The lists can and will consist of between 2 and 100 (and in some cases > 1000) entries.
No need for cryptographically-secure randomness.
I have this feeling that there should be a relatively easy and straight-forward answer, and I just have not found it.
Use a combination of the product, the sum and XOR (^). All three are commutative (order-independent) in unsigned arithmetic.
unsigned long long product = 1;
unsigned sum = 0; // Maybe unsigned long long
unsigned x = 0;
for (size_t i = 0; i < array_element_count; i++) {
    product *= l[i];
    sum += l[i];
    x ^= l[i];
}
unsigned long long pre_hash = product + sum + ((unsigned long long) x << 32);
unsigned hash = pre_hash % hash_table_size;
Tip: hash_table_size should be a prime to effectively use all pre_hash bits.
If array_element_count was high, I would consider product *= shift_right_until_odd(l[i]), else product will too often become 0.
If l[i] == 0, product *= l[i] zeroes the product, so that case deserves something different. A simple mitigation is product *= l[i] | 1, but that is something pulled out of the air.
Hashing takes time for good design and the above are candidate building blocks for OP.
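As a rough illustration of how these building blocks could fit together, here is a minimal sketch. The function names, the choice of uint32_t elements (enough for about 4.29 billion vertex indices) and the mapping of 0 to 1 inside shift_right_until_odd are assumptions made for this example, not a tuned design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: strip trailing zero bits so the factor is odd.
 * Zero is mapped to 1 (an assumption) so it cannot wipe out the product. */
static uint64_t shift_right_until_odd(uint64_t v)
{
    if (v == 0)
        return 1;
    while ((v & 1) == 0)
        v >>= 1;
    return v;
}

/* Order-independent pre-hash: product, sum and XOR are all commutative
 * in unsigned (wrapping) arithmetic, so any permutation of l gives the
 * same result. */
static uint64_t unordered_pre_hash(const uint32_t *l, size_t count)
{
    uint64_t product = 1;
    uint64_t sum = 0;
    uint32_t x = 0;

    for (size_t i = 0; i < count; i++) {
        product *= shift_right_until_odd(l[i]);
        sum += l[i];
        x ^= l[i];
    }
    return product + sum + ((uint64_t)x << 32);
}
```

The table index would then be unordered_pre_hash(l, count) % hash_table_size, with hash_table_size prime as suggested in the tip.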
Any CRC will do the job. Just XOR the elements together: XOR is commutative, so this eliminates any dependency on the order. (I have used 64-bit numbers with a 32-bit CRC, but it should also work with a full 64-bit XOR/CRC or a 32-bit XOR/CRC.) Reduce the result mod 2^31, then take a CRC32 of it. The CRC spreads the set of values uniformly, as it guarantees (or tries to) that a change in one bit will affect half of the bits in the result. See here for sample code and several CRC tables. The repository is under a BSD license, so you can use it as desired.
Below is a sample implementation that generates random lists, and reorders them, comparing their hashes:
crc32ieee8023.h
#ifndef CRC32IEEE8023_H
#define CRC32IEEE8023_H
#include "crc.h"
extern CRC_STATE crc32ieee8023[];
#endif /* CRC32IEEE8023_H */
crc.h
#ifndef CRC_H
#define CRC_H
#include <stdlib.h>
#include <stdint.h>
#define CRC_TABLE_SIZE 256
#define CRC_BYTE_SIZE 8
#define CRC_BYTE_MASK 0xff
typedef uint8_t CRC_BYTE;
typedef uint64_t CRC_STATE;
CRC_STATE do_crc(
CRC_STATE state,
CRC_BYTE *buff,
size_t nbytes,
CRC_STATE *table);
#endif /* CRC_H */
test_xor_crc_hash.c
(This is the important file, where all the stuff is included.)
/* test_crc_table -- program to test a crc hash algorithm that
* checks a list of numbers and generates the same crc in a form
* that is independent on the list order presented.
* Program generates a list of random numbers (32bit) then it
* generates a random permutation of the list and a sorted list,
* calculates the hash over the three lists, and compares them.
*/
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "crc.h"
#include "crc32ieee8023.h"
#define DFLT_N 10
#define RANDOM_DEV "/dev/urandom"
int long_compare(
const void *_a,
const void *_b);
void print(
const char *name,
const uint64_t *v,
int vsz,
CRC_STATE crc,
uint64_t xor);
int
main(int argc, char **argv)
{
int opt;
int n = DFLT_N,
res;
/* process options */
while ((opt = getopt(argc, argv, "n:")) != EOF) {
switch (opt) {
case 'n': res = sscanf(optarg, "%d", &n); /* n is an int, so use %d */
if (res != 1) {
fprintf(stderr,
"%s: invalid format (-n)\n",
optarg);
}
break;
} /* switch */
} /* while */
/* initialization of random number generator */
unsigned short random_state[3];
int fd = open(RANDOM_DEV, O_RDONLY);
if (fd < 0) {
fprintf(stderr,
"open: %s: %s\n",
RANDOM_DEV, strerror(errno));
exit(EXIT_FAILURE);
}
res = read(fd, random_state, sizeof random_state);
if (res < 0) { /* error */
fprintf(stderr,
"read: %s: %s\n",
RANDOM_DEV, strerror(errno));
exit(EXIT_FAILURE);
}
if ((size_t)res < sizeof random_state) {
fprintf(stderr,
"read: %s: incomplete read (%d/%zu)\n",
RANDOM_DEV, res, sizeof random_state);
exit(EXIT_FAILURE);
}
seed48(random_state);
close(fd);
/* generate a list of random numbers and make two copies */
uint64_t *original = calloc(n, sizeof *original),
*copy_sorted = calloc(n, sizeof *copy_sorted),
*random_sort = calloc(n, sizeof *random_sort);
/* make two copies */
for (int i = 0; i < n; i++) {
original[i] = copy_sorted[i]
= random_sort[i]
= ((uint64_t)lrand48() << 32) | (uint64_t)lrand48();
}
/* sort the numbers */
qsort(copy_sorted, n, sizeof *copy_sorted, long_compare);
/* and random permutation */
for (int i = 0; i < n-1; i++) {
int j = i + lrand48() % (n - i); /* pick from the not-yet-placed tail */
if (i != j) {
uint64_t temp = random_sort[i];
random_sort[i] = random_sort[j];
random_sort[j] = temp;
}
}
/* calculate the xors */
uint64_t xor_original = 0, xor_sorted = 0, xor_random = 0;
for (int i = 0; i < n; i++) {
xor_original ^= original[i];
xor_sorted ^= copy_sorted[i];
xor_random ^= random_sort[i];
}
/* now, calculate the crc's (a crc64 would be better for long) */
CRC_STATE
crc_original = do_crc(0xffffffff, (unsigned char *)&xor_original,
sizeof xor_original, crc32ieee8023),
crc_sorted = do_crc(0xffffffff, (unsigned char *)&xor_sorted,
sizeof xor_sorted, crc32ieee8023),
crc_random = do_crc(0xffffffff, (unsigned char *)&xor_random,
sizeof xor_random, crc32ieee8023);
print("original", original, n, crc_original, xor_original);
print(" sorted", copy_sorted, n, crc_sorted, xor_sorted);
print(" random", random_sort, n, crc_random, xor_random);
if (crc_original != crc_sorted || crc_sorted != crc_random) {
fprintf(stderr, "crc's don't match (crc_original == 0x%08lx, "
"crc_sorted == 0x%08lx, crc_random == 0x%08lx)\n",
crc_original, crc_sorted, crc_random);
}
/* change only one bit in one element to see how it changes the hash */
int bit_to_change = lrand48() % (n * 64),
elem_to_change = bit_to_change % n;
bit_to_change %= 64;
original[elem_to_change] ^= (1UL << bit_to_change); /* change the bit */
/* we should do the calculation over all elements, but just
* changing a bit in one element will change just the same bit in the
* xor_original accumulation variable */
uint64_t xor_original_new = xor_original;
xor_original_new ^= (1UL << bit_to_change);
printf("element=%d, bit=%d\n", elem_to_change, bit_to_change);
uint64_t crc_original_new = do_crc(0xffffffff, (unsigned char *)&xor_original_new, sizeof xor_original_new, crc32ieee8023);
print(" chg1bit", original, n, crc_original_new, xor_original_new);
}
int long_compare(const void *_a, const void *_b)
{
const uint64_t *a = _a, *b = _b;
return *a == *b
? 0
: *a > *b
? +1
: -1;
}
void print(const char *name, const uint64_t *v, int vsz, CRC_STATE crc, uint64_t xor)
{
printf("%s: { ", name);
char *sep = "";
for (int i = 0; i < vsz; i++) {
printf("%s0x%016lx", sep, v[i]);
sep = ", ";
}
printf(" }\n"
" xor = 0x%016lx, crc = 0x%08lx\n",
xor, crc);
}
crc.c
#include <sys/types.h>
#include "crc.h"
/* table based CRC calculation */
CRC_STATE do_crc(
CRC_STATE state,
CRC_BYTE *buff,
size_t nbytes,
CRC_STATE *table)
{
CRC_STATE index;
while (nbytes--) {
state ^= *buff++;
index = state & CRC_BYTE_MASK;
state >>= CRC_BYTE_SIZE;
state ^= table[index];
} /* while */
return state;
} /* do_crc */
crc32ieee8023.c
#include "crc.h"
/* variables */
CRC_STATE crc32ieee8023[] = {
/* Command used: mkcrc -gpedb88320 */
/* Polynomial: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
/* 0 */ 0x0, 0x77073096, 0xee0e612c, 0x990951ba,
/* 4 */ 0x76dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
/* 8 */ 0xedb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
/* 12 */ 0x9b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
/* 16 */ 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
/* 20 */ 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
/* 24 */ 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
/* 28 */ 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
/* 32 */ 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
/* 36 */ 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
/* 40 */ 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
/* 44 */ 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
/* 48 */ 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
/* 52 */ 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
/* 56 */ 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
/* 60 */ 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
/* 64 */ 0x76dc4190, 0x1db7106, 0x98d220bc, 0xefd5102a,
/* 68 */ 0x71b18589, 0x6b6b51f, 0x9fbfe4a5, 0xe8b8d433,
/* 72 */ 0x7807c9a2, 0xf00f934, 0x9609a88e, 0xe10e9818,
/* 76 */ 0x7f6a0dbb, 0x86d3d2d, 0x91646c97, 0xe6635c01,
/* 80 */ 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
/* 84 */ 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
/* 88 */ 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
/* 92 */ 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
/* 96 */ 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
/* 100 */ 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
/* 104 */ 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
/* 108 */ 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
/* 112 */ 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
/* 116 */ 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
/* 120 */ 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
/* 124 */ 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
/* 128 */ 0xedb88320, 0x9abfb3b6, 0x3b6e20c, 0x74b1d29a,
/* 132 */ 0xead54739, 0x9dd277af, 0x4db2615, 0x73dc1683,
/* 136 */ 0xe3630b12, 0x94643b84, 0xd6d6a3e, 0x7a6a5aa8,
/* 140 */ 0xe40ecf0b, 0x9309ff9d, 0xa00ae27, 0x7d079eb1,
/* 144 */ 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
/* 148 */ 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
/* 152 */ 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
/* 156 */ 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
/* 160 */ 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
/* 164 */ 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
/* 168 */ 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
/* 172 */ 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
/* 176 */ 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
/* 180 */ 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
/* 184 */ 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
/* 188 */ 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
/* 192 */ 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x26d930a,
/* 196 */ 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x5005713,
/* 200 */ 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0xcb61b38,
/* 204 */ 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0xbdbdf21,
/* 208 */ 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
/* 212 */ 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
/* 216 */ 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
/* 220 */ 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
/* 224 */ 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
/* 228 */ 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
/* 232 */ 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
/* 236 */ 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
/* 240 */ 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
/* 244 */ 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
/* 248 */ 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
/* 252 */ 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
}; /* crc32ieee8023 */
Makefile
targets = test_xch
toclean = $(targets)
test_xch_deps =
test_xch_objs = crc32ieee8023.o crc.o test_xor_crc_hash.o
test_xch_libs =
test_xch_ldfl =
toclean += $(test_xch_objs)
all: $(targets)
clean:
	$(RM) $(toclean)
test_xch: $(test_xch_deps) $(test_xch_objs)
	$(CC) $(LDFLAGS) $($@_ldfl) -o $@ $($@_objs) $($@_libs) $(LIBS)
To make the program, just run:
$ make
and to run it, you can use option -n that allows you to specify the number of random elements to generate.
I think you will have to invent one to avoid the slow sorting option. In addition to XOR and arithmetic addition, there are bit rotations and bit masks you could use. If you need high collision resistance, you could combine more than one hash function. For example, assuming the d_i and the arithmetic are modular (as with uint32_t):
H_1 = sum_{i = 1 to n} d_i
H_2 = xor_{i = 1 to n} d_i
H_3 = xor_{i = 1 to n} (rotl(d_i, d_i & 0x1f) + c)
Then take the concatenation of H_1, H_2 and H_3 as a 12-byte hash.
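The three formulas above could be computed in a single pass, roughly as follows. This is only a sketch: the struct hash96_t, the function names and the value chosen for the constant c are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the 12-byte hash (H_1, H_2, H_3). */
typedef struct {
    uint32_t h1, h2, h3;
} hash96_t;

/* Rotate left by r bits; r == 0 is handled separately to avoid an
 * undefined shift by 32. */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    r &= 31;
    return r ? (v << r) | (v >> (32 - r)) : v;
}

static hash96_t hash_unordered(const uint32_t *d, size_t n)
{
    const uint32_t c = 0x9e3779b9u; /* assumed value for the constant c */
    hash96_t h = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        h.h1 += d[i];                             /* H_1: modular sum */
        h.h2 ^= d[i];                             /* H_2: XOR */
        h.h3 ^= rotl32(d[i], d[i] & 0x1f) + c;    /* H_3: XOR of rotated values */
    }
    return h;
}
```

Two lists collide only if all three components agree, so comparing the full struct is stricter than comparing any single component.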
I want to know if there is any way to convert a string like "5ABBF13A000A01" to the following struct using the struct and union method:
struct xSensorData2{
unsigned char flags;
unsigned int refresh_rate;
unsigned int timestamp;
};
Data should be:
flags = 0x01 (last byte);
refresh_rate = 0x000A (two bytes);
timestamp = 5ABBF13A (four bytes);
I'm thinking of the following data struct:
struct xData{
union{
struct{
unsigned char flags:8;
unsigned int refresh_rate:16;
unsigned int timestamp:32;
};
char pcBytes[8];
};
}__attribute__ ((packed));
But I get a struct of 12 bytes; I think that is because bit-fields don't pack across different types of data. My plan was to convert the string to an array of bytes, copy it to pcBytes and then access each field.
Update:
On the STM32 platform I used this code:
bool bDecode_HexString(char *p)
{
uint64_t data = 0; /* exact width type for conversion */
char *endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
data = strtoull (p, &endptr, 16); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return false;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return false;
}
//printf ("data: 0x%" PRIx64 "\n\n", data); /* output converted string */
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
return true;
/* output sensor struct */
// printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
// "\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
// sensor.tstamp);
}
and call the function:
char teste[50] = "5ABBF13A000A01";
bDecode_HexString(teste);
I get data = 0x3a000a01005abbf1
If you are still struggling with the separation of your input into flags, rate & timestamp, then the suggestions in the comments regarding using an unsigned type to hold your input string converted to a value, and using the exact width types provided in <stdint.h>, will avoid potential problems inherent in manipulating signed types (e.g. potential sign-extension, etc.)
If you want to separate the values and coordinate those in a struct, that is 100% fine. The work comes in separating the individual flags, rate & timestamp. How you choose to store them so they are convenient within your code is up to you. A simple struct utilizing exact-width types could be:
typedef struct { /* struct holding sensor data */
uint8_t flags;
uint16_t rate;
uint32_t tstamp;
} sensor_t;
In a conversion from char * to uint64_t with strtoull, you have two primary validation checks:
Utilize both the pointer to the string to convert and the endptr parameter to validate that digits were in fact converted (strtoull sets endptr to point one character past the last digit converted). You compare endptr with the original pointer to confirm that a conversion took place (if no digits were converted, the original pointer and endptr will be the same and the value returned will be zero); and
Set errno = 0; before the conversion and then check it again after the conversion to ensure no error occurred during the conversion itself. If strtoull encounters an error (value exceeds range, etc.), errno is set to a positive value.
(and if you have specific range validations, e.g. you want to store the result in a size less than that of the conversion, like uint32_t instead of uint64_t, you need to validate the final value can be stored in the smaller type)
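To make the three checks concrete (digits converted, errno, and the extra range check for a smaller destination type), here is a minimal sketch. The helper name parse_hex_u32 and the choice of uint32_t as the destination are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a hex string into a uint32_t.
 * Returns false if no digits were converted, if strtoull reported an
 * error, or if the value does not fit in 32 bits. */
static bool parse_hex_u32(const char *s, uint32_t *out)
{
    char *endptr = NULL;

    errno = 0;                          /* reset errno before the call */
    unsigned long long v = strtoull(s, &endptr, 16);

    if (endptr == s)                    /* no digits converted */
        return false;
    if (errno != 0)                     /* e.g. ERANGE on overflow */
        return false;
    if (v > UINT32_MAX)                 /* range check for the smaller type */
        return false;

    *out = (uint32_t)v;
    return true;
}
```

With this helper, "5ABBF13A" is accepted, while the full 14-digit string "5ABBF13A000A01" is rejected because it does not fit in 32 bits.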
A simple approach would be:
uint64_t data = 0; /* exact width type for conversion */
...
char *p = argc > 1 ? argv[1] : "0x5ABBF13A000A01", /* input */
*endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
...
data = strtoull (p, &endptr, 0); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return 1;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return 1;
}
printf ("data: 0x%" PRIx64 "\n\n", data); /* output converted string */
Finally, separating the value in data into the individual values of flags, rate & timestamp, can be done with simple shifts & ANDS, e.g.
sensor_t sensor = { .flags = 0 }; /* declare instance of sensor */
...
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
/* output sensor struct */
printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
"\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
sensor.tstamp);
Putting it all together, you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <errno.h>
#include <limits.h>
typedef struct { /* struct holding sensor data */
uint8_t flags;
uint16_t rate;
uint32_t tstamp;
} sensor_t;
int main (int argc, char **argv) {
uint64_t data = 0; /* exact width type for conversion */
sensor_t sensor = { .flags = 0 }; /* declare instance of sensor */
char *p = argc > 1 ? argv[1] : "0x5ABBF13A000A01", /* input */
*endptr = p; /* endptr for strtoul validation */
errno = 0; /* reset errno */
data = strtoull (p, &endptr, 0); /* convert input to uint64_t */
if (p == endptr && data == 0) { /* validate digits converted */
fprintf (stderr, "error: no digits converted.\n");
return 1;
}
if (errno) { /* validate no error occurred during conversion */
fprintf (stderr, "error: conversion failure.\n");
return 1;
}
printf ("data: 0x%" PRIx64 "\n\n", data); /* output converted string */
sensor.flags = data & 0xFF; /* set flags */
sensor.rate = (data >> CHAR_BIT) & 0xFFFF; /* set rate */
sensor.tstamp = (data >> (3 * CHAR_BIT)) & 0xFFFFFFFF; /* set timestamp */
/* output sensor struct */
printf ("sensor.flags : 0x%02" PRIx8 "\nsensor.rate : 0x%04" PRIx16
"\nsensor.tstamp: 0x%08" PRIx32 "\n", sensor.flags, sensor.rate,
sensor.tstamp);
return 0;
}
Example Use/Output
$ ./bin/struct_sensor_bitwise
data: 0x5abbf13a000a01
sensor.flags : 0x01
sensor.rate : 0x000a
sensor.tstamp: 0x5abbf13a
Look things over and let me know if you have further questions.
Here, you have a string of length 14, representing a 7-byte value written as 14 hexadecimal digits. Consider this code using strtoull, which parses your hexadecimal string into an unsigned 64-bit integer and then splits it into fields with division and modulo.
uint64_t n = strtoull(str, NULL, 16); // Parse the string as hexadecimal
unsigned flags = n % 0x100; // lowest byte
unsigned refresh_rate = (n / 0x100) % 0x10000; // next two bytes
unsigned timestamp = n / 0x1000000; // remaining four bytes
Here is a test case (with #include <stdio.h>, <stdint.h> and <stdlib.h>):
char str[16] = "5ABBF13EE00AFF";
uint64_t n = strtoull(str, NULL, 16); // Parse the string as hexadecimal
// Use divide and mod to extract each digit segment
unsigned flags = n % 0x100;
unsigned refresh_rate = (n / 0x100) % 0x10000;
unsigned timestamp = n / 0x1000000;
// Print timestamp, refresh rate, and flags
printf("0x%x 0x%x 0x%x\n", timestamp, refresh_rate, flags);
The result is 0x5abbf13e 0xe00a 0xff, as expected.
I understand the general idea of C and how making a log file would go. Reading/writing to a file and such.
My concern is the following format that is desired:
[![enter image description here][1]][1]
I've gotten a good chunk done now but am concerned with how to append to my log file after the first record. I increment the file's record count (in the top 2 bytes) and write the first record after it. How would I then set things up so that the 2nd/3rd/etc. records show up one after another?
//confirm a file exists in the directory
bool fileExists(const char* file)
{
struct stat buf;
return (stat(file, &buf) == 0);
}
int rightBitShift(int val, int space)
{
return ((val >> space) & 0xFF);
}
int leftBitShift(int val, int space)
{
return (val << space);
}
int determineRecordCount(char * logName)
{
unsigned char record[2];
FILE *fp = fopen(logName, "rb");
fread(record, sizeof(record), 1, fp);
//display the record number
int recordNum = (record[0] << 8) | record[1];
recordNum = recordNum +1;
return (recordNum);
}
void createRecord(int argc, char **argv)
{
int recordNum;
int aux = 0;
int dst = 0; /* must start at 0: it is only set when tm_isdst is true */
char* logName;
char message[30];
memset(message,' ',30);
//check argument count and validation
if (argc == 7 && strcmp("-a", argv[2]) ==0 && strcmp("-f", argv[3]) ==0 && strcmp("-t", argv[5]) ==0)
{
//aux flag on
aux = 1;
logName = argv[4];
strncpy(message, argv[6],strlen(argv[6]));
}
else if (argc == 6 && strcmp("-f", argv[2]) ==0 && strcmp("-t", argv[4]) ==0)
{
logName = argv[3];
strncpy(message, argv[5],strlen(argv[5]));
}
else
{
printf("Invalid Arguments\n");
exit(0);
}
//check if log exists to get latest recordNum
if (fileExists(logName))
{
recordNum = determineRecordCount(logName);
printf("%i\n",recordNum);
}
else
{
printf("Logfile %s not found\n", logName);
recordNum = 1;
}
//Begin creating record
unsigned char record[40]; /* One record takes up 40 bytes of space */
memset(record, 0, sizeof(record));
//recordCount---------------------------------------------------------------------
record[0] = rightBitShift (recordNum, 8); /* Upper byte of sequence number */
record[1] = rightBitShift (recordNum, 0); /* Lower byte of sequence number */
//get aux/dst flags---------------------------------------------------------------
//get date and time
time_t timeStamp = time(NULL);
struct tm *date = localtime( &timeStamp );
if (date->tm_isdst)
dst = 1;
record[2] |= aux << 7; //set 7th bit
record[2] |= dst << 6; //set 6th
//timeStamp-----------------------------------------------------------------------
record[3] |= rightBitShift(timeStamp, 24);//high byte
record[4] |= rightBitShift(timeStamp, 16);
record[5] |= rightBitShift(timeStamp, 8);
record[6] |= rightBitShift(timeStamp, 0); //low byte
//leave bytes 7-8, set to 0 -----------------------------------------
record[7] = 0;
record[8] = 0;
//store message--------------------------------------------
memcpy(&record[9], message, sizeof(message)); /* message is space-padded, not NUL-terminated, so strlen() is unsafe here */
//write record to log-----------------------------------------------------------------
FILE *fp = fopen(logName, "w+");
unsigned char recordCount[4];
recordCount[0] = rightBitShift (recordNum, 8); /* Upper byte of sequence number */
recordCount[1] = rightBitShift (recordNum, 0); /* Lower byte of sequence number */
recordCount[2] = 0;
recordCount[3] = 0;
fwrite(recordCount, sizeof(recordCount), 1, fp);
fwrite(record, sizeof(record), 1, fp);
fclose(fp);
printf("Record saved successfully\n");
}
NOTE: I've never had to do this before in C, take it with a grain of salt.
This is a very specific binary formatting where each bit is precisely accounted for. It's using the Least-Significant-Bit numbering scheme (LSB 0) where the bits are numbered from 7 to 0.
Specifying that the "upper byte" comes first means this format is big-endian. The most significant bits come first. This is like how we write our numbers, four thousand, three hundred, and twenty one is 4321. 1234 would be little-endian. For example, the Number Of Records and Sequence are both 16 bit big-endian numbers.
Finally, the checksum is a number calculated from the rest of the record to verify there were no mistakes in transmission. The spec defines how to make the checksum.
Your job is to precisely reproduce this format, probably using the fixed-sized types found in stdint.h or unsigned char. For example, the sequence would be a uint16_t or unsigned char[2].
The function to produce a record might have a signature like this:
unsigned char *make_record( const char *message, bool aux );
The user only has to supply you with the message and the aux flag. The rest can be figured out by the function. You might decide to let them pass in the timestamp and sequence. The point is, the function needs to be passed just the data; it takes care of the formatting.
This byte-ordering means you can't just write out integers, they might be the wrong size or the wrong byte order. That means any multi-byte integers must be serialized before you can write them to the record. This answer covers ways to do that and I'll be using the ones from this answer because they proved a bit more convenient.
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
unsigned char *make_record( const char *message, bool aux ) {
// Allocate and zero memory for the buffer.
// Zeroing means no risk of accidentally sending garbage.
unsigned char *buffer = calloc( 40, sizeof(unsigned char) );
// As we add to the buffer, pos will track the next byte to be written.
unsigned char *pos = buffer;
// I decided not to make the user responsible for
// the sequence number. YMMV.
static uint16_t sequence = 1;
pos = serialize_uint16( pos, sequence++ );
// Get the timestamp and DST.
time_t timestamp = time(NULL);
struct tm *date = localtime( &timestamp );
// 2nd row is all flags and a bunch of 0s. Start with them all off.
uint8_t flags = 0;
if( aux ) {
// Flip the 7th bit on.
flags |= 0x80;
}
if( date->tm_isdst ) {
// Flip the 6th bit on.
flags |= 0x40;
}
// An 8-bit integer has no endianness; this is just to ensure
// pos is consistently incremented.
pos = serialize_uint8(pos, flags);
// I don't know what their timestamp format is.
// This is just a guess. It's probably wrong.
pos = serialize_uint32(pos, (uint32_t)timestamp);
// "Spare" is all zeros.
// The spec says this is 3 bytes, but only gives it bytes
// 7 and 8. I'm going with 2 bytes.
pos = serialize_uint16(pos, 0);
// Copy the message in, 30 bytes.
// strncpy() does not guarantee the message will be null
// terminated. This is probably fine as the field is fixed width.
// More info about the format would be necessary to know for sure.
strncpy( (char *)pos, message, 30 );
pos += 30;
// Checksum the first 39 bytes.
// Sorry, I don't know how to do 1's complement sums.
pos = serialize_uint8( pos, record_checksum( buffer, 39 ) );
// pos has moved around, but buffer remains at the start
return buffer;
}
int main() {
unsigned char *record = make_record("Basset hounds got long ears", true);
fwrite(record, sizeof(unsigned char), 40, stdout);
}
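The listing above calls serialize_uint8, serialize_uint16 and serialize_uint32 without showing them. A minimal sketch of what such helpers might look like follows, assuming each one writes its value in big-endian order and returns the position just past the bytes it wrote. The bodies are an assumption for illustration, not the code from the linked answer.

```c
#include <stdint.h>

/* Hypothetical helpers: write the value most significant byte first
 * and return a pointer to the next free byte in the buffer. */
static unsigned char *serialize_uint8(unsigned char *pos, uint8_t value)
{
    pos[0] = value;
    return pos + 1;
}

static unsigned char *serialize_uint16(unsigned char *pos, uint16_t value)
{
    pos[0] = (unsigned char)(value >> 8);     /* upper byte first */
    pos[1] = (unsigned char)(value & 0xFF);   /* then lower byte */
    return pos + 2;
}

static unsigned char *serialize_uint32(unsigned char *pos, uint32_t value)
{
    pos[0] = (unsigned char)(value >> 24);
    pos[1] = (unsigned char)((value >> 16) & 0xFF);
    pos[2] = (unsigned char)((value >> 8) & 0xFF);
    pos[3] = (unsigned char)(value & 0xFF);
    return pos + 4;
}
```

Because the bytes are produced with shifts rather than by copying the integer's memory, the output is the same on little-endian and big-endian hosts.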
At this point my expertise is exhausted; I've never had to do this before. I'd appreciate folks fixing up the little mistakes in edits and suggesting better ways to do it in the comments, like what to do with the timestamp. And maybe someone else can cover how to do 1's complement checksums in another answer.
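For what it's worth, one common reading of an 8-bit one's complement checksum is: add the bytes with end-around carry, then invert the result. The sketch below follows that reading. Whether it matches the spec in the question is an assumption, since the spec's exact definition is not reproduced here; record_checksum is simply the name used in the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed definition: 8-bit one's complement sum of the data bytes
 * (carry out of bit 7 is added back into bit 0), inverted at the end. */
static uint8_t record_checksum(const unsigned char *data, size_t len)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        /* fold any carry out of the low byte back in */
        sum = (uint16_t)((sum & 0xFF) + (sum >> 8));
    }
    return (uint8_t)~sum;
}
```

Under this definition, the one's complement sum of the data bytes plus the checksum byte comes out as 0xFF, which is what a receiver would verify.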
As a byte is composed of 8 bits (numbered 0 to 7), you can use bitwise operations to modify them as asked in your specification. For general information, see https://en.wikipedia.org/wiki/Bitwise_operations_in_C. As a preview, you can use the >> and << operators to select which bit to modify, and the bitwise operators | and & to set its value.
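As a small illustration of the operators mentioned above, here is a sketch with three helpers for setting, clearing and testing a single bit of a byte. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Turn bit number `bit` (0..7) on. */
static uint8_t set_bit(uint8_t byte, unsigned bit)
{
    return (uint8_t)(byte | (1u << bit));
}

/* Turn bit number `bit` (0..7) off. */
static uint8_t clear_bit(uint8_t byte, unsigned bit)
{
    return (uint8_t)(byte & ~(1u << bit));
}

/* Report whether bit number `bit` (0..7) is on. */
static bool test_bit(uint8_t byte, unsigned bit)
{
    return ((byte >> bit) & 1u) != 0;
}
```

For the flags byte in the question, set_bit(flags, 7) would turn on the aux flag and set_bit(flags, 6) the DST flag, giving 0xC0 when both are set.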