Integer data compression for transfer in C without external libraries - c

I googled ans searched here a bunch without a fitting solution. The title is maybe a bit weird or not fully accurate, but let me explain:
My IoT device collects a bunch of data every second that I can represent as a list of integer. Here is an example of one row of sensor reads (the zeros are not always 0 btw):
230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0
On trigger I want to send the whole table (all rows until then) to my computer. I could just stringify it and concatenate everything, but wonder if there is a more efficient encoding/compression to reduce the byte count, both when storing in RAM/flash and for reduced transfer volume. Ideally this could be achieved with integrated functions, ie no external compression libraries. I am not that strong with encoding/compression, hope you can give me a hint.

Zlib/Zstd libraries are better suited for doing general purpose compression. If I may assume that you don't want to use any third party libraries, here is a hand coded version of some naive compression method, which saves half of the bytes of the input string.
The basic idea is very simple. Your strings will at most have 16 different characters which can be mapped to 4-bits rather than typical 8-bits. SEE THE ASSUMPTIONS BELOW. You can try base16, base64, base128 encodings too, but this is the simplest.
Assumptions:
First you'll convert all your numbers into a string in decimal format.
The string won't contain any other characters than 0,1,2,3,4,5,6,7,8,9,+,-,.,space, and a comma.
============================================================================
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static inline char map(char c)
{
switch(c) {
case ' ' : return ('/' - '*');
case '\0': return 0;
default : return c - '*';
}
return 0;
}
static inline char revmap(char c)
{
switch(c) {
case '\0' : return 0;
case '/' - '*': return ' ';
default : return c + '*';
}
return 0;
}
char *compress(const char *s, int len)
{
int i, j;
char *compr = malloc((len+1)/2 + 1);
j = 0;
for (i = 1; i < len; i += 2)
compr[j++] = map(s[i-1]) << 4 | map(s[i]);
if (i-1 < len)
compr[j++] = map(s[i-1]) << 4;
compr[j] = '\0';
return compr;
}
char *decompress(const char *s, int len)
{
int i, j;
char *decompr = malloc(2*len + 1);
for (i = j = 0; i < len; i++) {
decompr[j++] = revmap((s[i] & 0xf0) >> 4);
decompr[j++] = revmap(s[i] & 0xf);
}
decompr[j] = '\0';
return decompr;
}
int main()
{
const char *input = "230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0 ";
int plen = strlen(input);
printf("plain(len=%d): %s\n", plen, input);
char *compr = compress(input, plen);
int clen = strlen(compr);
char *decompr = decompress(compr, clen);
int dlen = strlen(decompr);
printf("decompressed(len=%d): %s\n", dlen, decompr);
free(compr);
free(decompr);
}

Simplest solution is to simply dump data out in binary form. It may be smaller or bigger than string form depending on your data, but you don't have to do any data processing on device.
If most of your data is small, you can use variable length data encoding for serialization. There are several, but CBOR is fairly simple.
If your data changes only very little, you could send only first row as absolute values, and remaining rows as delta of previous row. This would result in many small numbers, which typically are more efficient in previously mentioned encoding systems.
I wouldn't try to implement any general purpose compression algorithms without any experience and external libraries, unless you absolutely need it. Finding suitable algorithm that compresses your data well enough and with reasonable resource usage can be time consuming.

Related

How to reverse strings that have been obfuscated using floats and double?

I'm working on a crackme , and having a bit of trouble making sense of the flag I'm supposed to retrieve.
I have disassembled the binary using radare2 and ghidra , ghidra gives me back the following pseudo-code:
undefined8 main(void)
{
long in_FS_OFFSET;
double dVar1;
double dVar2;
int local_38;
int local_34;
int local_30;
int iStack44;
int local_28;
undefined2 uStack36;
ushort uStack34;
char local_20;
undefined2 uStack31;
uint uStack29;
byte bStack25;
long local_10;
local_10 = *(long *)(in_FS_OFFSET + 0x28);
__printf_chk(1,"Insert flag: ");
__isoc99_scanf(&DAT_00102012,&local_38);
uStack34 = uStack34 << 8 | uStack34 >> 8;
uStack29 = uStack29 & 0xffffff00 | (uint)bStack25;
bStack25 = (undefined)uStack29;
if ((((local_38 == 0x41524146) && (local_34 == 0x7b594144)) && (local_30 == 0x62753064)) &&
(((iStack44 == 0x405f336c && (local_20 == '_')) &&
((local_28 == 0x665f646e && (CONCAT22(uStack34,uStack36) == 0x40746f31)))))) {
dVar1 = (double)CONCAT26(uStack34,CONCAT24(uStack36,0x665f646e));
dVar2 = (double)CONCAT17((undefined)uStack29,CONCAT43(uStack29,CONCAT21(uStack31,0x5f)));
__printf_chk(0x405f336c62753064,1,&DAT_00102017);
__printf_chk(dVar1,1,"y: %.30lf\n");
__printf_chk(dVar2,1,"z: %.30lf\n");
dVar1 = dVar1 * 124.8034902710365;
dVar2 = (dVar1 * dVar1) / dVar2;
round_double(dVar2,0x1e);
__printf_chk(1,"%.30lf\n");
dVar1 = (double)round_double(dVar2,0x1e);
if (1.192092895507812e-07 <= (double)((ulong)(dVar1 - 4088116.817143337) & 0x7fffffffffffffff))
{
puts("Try Again");
}
else {
puts("Well done!");
}
}
if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
/* WARNING: Subroutine does not return */
__stack_chk_fail();
}
return 0;
}
It is easy to see that there's a part of the flag in plain-sight , but the other part is a bit more interesting :
if (1.192092895507812e-07 <= (double)((ulong)(dVar1 - 4088116.817143337) & 0x7fffffffffffffff))
From what I understand , I have to generate the missing part of the flag depending on this condition . The problem is that I absolutely have no idea how to do this .
I can assume this missing part is 8 bytes of size , according to this line :
dVar2=(double)CONCAT17((undefined)uStack29,CONCAT43(uStack29,CONCAT21(uStack31,0x5f)));`
Considering flags are usually ascii , with some special characters , let's say , each byte will have values from 0x21 to 0x7E , that's almost 8^100 combinations , which will clearly take too much time to compute.
Do you guys have an idea on how I should proceed to solve this ?
Edit : Here is the link to the binary : https://filebin.net/dpfr1nocyry3sijk
You can tweak the Ghidra reverse result by edit variable type. Based on scanf const string %32s your local_38 should be char [32].
Before the first if, there are some char swap.
And the first if statment give you a long constrain of flag
At this point, you can confirm part of flag is FARADAY{d0ubl3_#nd_f1o#t, then is ther main part of this challenge.
It print x, y, z based on the flag, but you'll quickly find x and y is constrain by the if, so you only need to solve z to get the flag, so you think you need to bruteforce all double value limit by printable ascii.
But there are a limitaion in if statment says byte0 of this double must be _ and a math constrain there, simple math tell dVar2 - 4088116.817143337 <= 1.192092895507813e-07 and it comes dVar2 is very close 4088116.817143337
And byte 3 and byte 7 in this double will swap
By reverse result: dVar2 = y*y*x*x/z, solve this equation you can say z must near 407.2786840401004 and packed to little endian is `be}uty#. Based on double internal structure format, MSB will affect exponent, so you can make sure last byte is # and it shows byte0 and byte3 is fixed now by constrain and flag common format with {} pair.
So finally, you only need to bureforce 5 bytes of printable ascii to resolve this challenge.
import string, struct
from itertools import product
possible = string.ascii_lowercase + string.punctuation + string.digits
for nxt in product(possible, repeat=5):
n = ''.join(nxt).encode()
s = b'_' + n[:2] + b'}' + n[2:] + b'#'
rtn = struct.unpack("<d", s)[0]
rtn = 1665002837.488342 / rtn
if abs(rtn - 4088116.817143337) <= 0.0000001192092895507812:
print(s)
And bingo the flag is FARADAY{d0ubl3_#nd_f1o#t_be#uty}

c - Avoid if in loop

Context
Debian 64.
Core 2 duo.
Fiddling with a loop. I came with different variations of the same loop but I would like to avoid conditional branching if possible.
But, even if I think it will be difficult to beat.
I thought about SSE or bit shifting but still, it would require a jump (look at the computed goto below). Spoiler : a computed jump doesn't seems to be the way to go.
The code is compiled without PGO. Because on this piece of code, it makes the code slower..
flags :
gcc -march=native -O3 -std=c11 test_comp.c
Unrolling the loop didn't help here..
63 in ascii is '?'.
The printf is here to force the code to execute. Nothing more.
My need :
A logic to avoid the condition. I assume this as a challenge to make my holydays :)
The code :
Test with the sentence. The character '?' is guaranteed to be there but at a random position.
hjkjhqsjhdjshnbcvvyzayuazeioufdhkjbvcxmlkdqijebdvyxjgqddsyduge?iorfe
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char **argv){
/* This is quite slow. Average actually.
Executes in 369,041 cycles here (cachegrind) */
for (int x = 0; x < 100; ++x){
if (argv[1][x] == 63){
printf("%d\n",x);
break;
}
}
/* This is the slowest.
Executes in 370,385 cycles here (cachegrind) */
register unsigned int i = 0;
static void * restrict table[] = {&&keep,&&end};
keep:
++i;
goto *table[(argv[1][i-1] == 63)];
end:
printf("i = %d",i-1);
/* This is slower. Because of the calculation..
Executes in 369,109 cycles here (cachegrind) */
for (int x = 100; ; --x){
if (argv[1][100 - x ] == 63){printf("%d\n",100-x);break;}
}
return 0;
}
Question
Is there a way to make it faster, avoiding the branch maybe ?
The branch miss is huge with 11.3% (cachegrind with --branch-sim=yes).
I cannot think it is the best one can achieve.
If some of you manage assembly with enough talent, please come in.
Assuming you have a buffer of well know size being able to hold the maximum amount of chars to test against, like
char buffer[100];
make it one byte larger
char buffer[100 + 1];
then fill it with the sequence to test against
read(fileno(stdin), buffer, 100);
and put your test-char '?' at the very end
buffer[100] = '?';
This allows you for a loop with only one test condition:
size_t i = 0;
while ('?' != buffer[i])
{
++i;
}
if (100 == i)
{
/* test failed */
}
else
{
/* test passed for i */
}
All other optimisation leave to the compiler.
However I couldn't resist, so here's a possible approach to do micro optimisation
char buffer[100 + 1];
read(fileno(stdin), buffer, 100);
buffer[100] = '?';
char * p = buffer;
while ('?' != *p)
{
++p;
}
if ((p - buffer) == 100)
{
/* test failed */
}
else
{
/* test passed for (p - buffer) */
}

How to optimize C for loop for font rendering on oled display

I need to optimize this function: Any strange way to optimize the for loop? (early break i think can't be possible)
void SeeedGrayOLED::putChar(unsigned char C)
{
if(C < 32 || C > 127) //Ignore non-printable ASCII characters. This can be modified for multilingual font.
{
C=' '; //Space
}
uint8_t k,offset = 0;
char bit1,bit2,c = 0;
for(char i=0;i<16;i++)
{
for(char j=0;j<32;j+=2)
{
if(i>8){
k=i-8;
offset = 1;
}else{
k=i;
}
// Character is constructed two pixel at a time using vertical mode from the default 8x8 font
c=0x00;
bit1=(pgm_read_byte(&hallfetica_normal[C-32][j+offset]) >> (8-k)) & 0x01;
bit2=(pgm_read_byte(&hallfetica_normal[C-32][j+offset]) >> ((8-k)-1)) & 0x01;
// Each bit is changed to a nibble
c|=(bit1)?grayH:0x00;
c|=(bit2)?grayL:0x00;
sendData(c);
}
}
}
I've got a font in the array hallfetica_normal, is an array of array of uint8_t, that maybe compressed or something like that?
This code run on a arduino, ad i've to run a countdown from 500 to 0 with one unit down every 10/20ms.
EDIT
This is the new code after yours indication, thanks all:
I'm looking to organise the font differently to permit less call to pgm_read_byte.. (something like changing the orientation... i wonder)
void SeeedGrayOLED::putChar(unsigned char C)
{
if(C < 32 || C > 127) //Ignore non-printable ASCII characters. This can be modified for multilingual font.
{
C=' '; //Space
}
char c,byte = 0x00;
unsigned char nibble_lookup[] = { 0, grayL, grayH, grayH | grayL };
for(int ii=0;ii<2;ii++){
for(int i=0;i<8;i++)
{
for(int j=0;j<32;j+=2)
{
byte = pgm_read_byte(&hallfetica_normal[C-32][j+ii]);
c = nibble_lookup[(byte >> (8-i)) & 3];
sendData(c);
}
}
}
}
Well, you seem to be reading the same byte twice in a row unnecessarily via pgm_read_byte(&hallfetica_normal[C-32][j+offset]). You could load that once into a local variable.
Additionally, you could avoid the if(i>8){ check per iteration by breaking up the code into two loops; one where i goes from 0 to 8 and another where it goes from 9 to 15. (Although I suspect you really intended >= here, making the loop boundaries 0-7 then 8-15.) That also means things like offset become constant values, which will help.
In an effort to make the inner loop as fast as possible, I'd try to get rid of all branching with a lookup table and see whether that helped.
First, I'd define the lookup table outside the loop:
/* outside the loop */
unsigned char h_lookup[] = { 0, grayH };
unsigned char l_lookup[] = { 0, grayL };
Then inside the loop, since you're testing the least-significant bit, you can use that as an index into the lookup table. If it's clear, then the lookup index will be 0. If it's set, then the lookup index will be 1:
/* inside the loop */
byte = pgm_read_byte(&hallfetica_normal[C-32][j+offset]);
c = h_lookup[((byte >> (8-k)) & 0x01)] |
l_lookup[((byte >> (8-k-1)) & 0x01)]
sendData(c);
Since you're masking and testing 2 adjacent bits, 8-k and 8-k-1, you could list all 4 possibilities in a single lookup table:
/* Outside loop */
unsigned char nibble_lookup[] = { 0, grayL, grayH, grayH | grayL };
And then the lookup becomes dramatically simplified.
/* loop */
byte = pgm_read_byte(&hallfetica_normal[C-32][j+offset]);
c = nibble_lookup[(byte >> (8-k)) & 3];
sendData(c);
The other answer has addressed what to do about the branches in the top part of your inner loop.

Seeking single file encryption implemenation which can handle whole file en/de-crypt in Delphi and C

[Update] I am offering a bonus for this. Frankly, I don't care which encryption method is used. Preferably something simple like XTEA, RC4, BlowFish ... but you chose.
I want minimum effort on my part, preferably just drop the files into my projects and build.
Idealy you should already have used the code to en/de-crypt a file in Delphi and C (I want to trade files between an Atmel UC3 micro-processor (coding in C) and a Windows PC (coding in Delphi) en-and-de-crypt in both directions).
I have a strong preference for a single .PAS unit and a single .C/.H file. I do not want to use a DLL or a library supporting dozens of encryption algorithms, just one (and I certainly don't want anything with an install program).
I hope that I don't sound too picky here, but I have been googling & trying code for over a week and still can't find two implementations which match. I suspect that only someone who has already done this can help me ...
Thanks in advance.
As a follow up to my previous post, I am still looking for some very simple code with why I can - with minimal effort - en-de crypt a file and exchange it between Delphi on a PC and C on an Atmel UC3 u-processor.
It sounds simple in theory, but in practice it's a nightmare. There are many possible candidates and I have spend days googling and trying them out - to no avail.
Some are humonous libraries, supporting many encryption algorithms, and I want something lightweight (especially on the C / u-processor end).
Some look good, but one set of source offers only block manipulation, the other strings (I would prefer whole file en/de-crypt).
Most seem to be very poorly documented, with meaningless parameter names and no example code to call the functions.
Over the past weekend (plus a few more days), I have burned my way through a slew of XTEA, XXTEA and BlowFish implementations, but while I can encrypt, I can't reverse the process.
Now I am looking at AES-256. Dos anyone know of an implementation in C which is a single AES.C file? (plus AES.H, of course)
Frankly, I will take anything that will do whole file en/de-crypt between Delphi and C, but unless anyone has actually done this themselves, I expect to hear only "any implementation that meets the standard should do" - which is a nice theoory but just not working out for me :-(
Any simple AES-256 in C out there? I have some reasonable looking Delphi code, but won't be sure until I try them together.
Thanks in advance ...
I would suggest using the .NET Micro Framework on a secondary microcontroller (e.g. Atmel SAM7X) as a crypto coprocessor. You can test this out on a Netduino, which you can pick up for around $35 / £30. The framework includes an AES implementation within it, under the System.Security.Cryptography namespace, alongside a variety of other cryptographic functions that might be useful for you. The benefit here is that you get a fully tested and working implementation, and increased security via type-safe code.
You could use SPI or I2C to communicate between the two microcontrollers, or bit-bang your own data transfer protocol over several I/O lines in parallel if higher throughput is needed.
I did exactly this with an Arduino and a Netduino (using the Netduino to hash blocks of data for a hardware BitTorrent device) and implemented a rudimentary asynchronous system using various commands sent between the devices via SPI and an interrupt mechanism.
Arduino is SPI master, Netduino is SPI slave.
A GPIO pin on the Netduino is set as an output, and tied to another interrupt-enabled GPIO pin on the Arduino that is set as an input. This is the interrupt pin.
Arduino sends 0xF1 as a "hello" initialization message.
Netduino sends back 0xF2 as an acknolwedgement.
When Arduino wants to hash a block, it sends 0x48 (ASCII 'H') followed by the data. When it is done sending data, it sets CS low. It must send whole bytes; setting CS low when the number of received bits is not divisible by 8 causes an error.
The Netduino receives the data, and sends back 0x68 (ASCII 'h') followed by the number of received bytes as a 2-byte unsigned integer. If an error occurred, it sends back 0x21 (ASCII '!') instead.
If it succeeded, the Netduino computes the hash, then sets the interrupt pin high. During the computation time, the Arduino is free to continue its job whilst waiting.
The Arduino sends 0x52 (ASCII 'R') to request the result.
The Netduino sets the interrupt pin low, then sends 0x72 (ASCII 'r') and the raw hash data back.
Since the Arduino can service interrupts via GPIO pins, it allowed me to make the processing entirely asynchronous. A variable on the Arduino side tracks whether we're currently waiting on the coprocessor to complete its task, so we don't try to send it a new block whilst it's still working on the old one.
You could easily adapt this scheme for computing AES blocks.
Small C library for AES-256 by Ilya Levin. Short implementation, asm-less, simple usage. Not sure how would it work on your current micro CPU, though.
[Edit]
You've mentioned having some delphi implementation, but in case something not working together, try this or this.
Also I've found arduino (avr-based) module using the Ilya's library - so it should also work on your micro CPU.
Can you compile C code from Delphi (you can compile Delphi code from C++ Builder, not sure about VV). Or maybe use the Free Borland Command line C++ compiler or even another C compiler.
The idea is to use the same C code in your Windows app as you use on your microprocessor.. That way you can be reasonably sure that the code will work in both directions.
[Update] See
http://www.drbob42.com/examines/examin92.htm
http://www.hflib.gov.cn/e_book/e_book_file/bcb/ch06.htm (Using C++ Code in Delphi)
http://edn.embarcadero.com/article/10156#H11
It looks like you need to use a DLL, but you can statically link it if you don't want to distribute it
Here is RC4 code. It is very lightweight.
The C has been used in a production system for five years.
I have added lightly tested Delphi code. The Pascal is a line-by-line port with with unsigned char going to Byte. I have only run the Pascal in Free Pascal with Delphi option turned on, not Delphi itself. Both C and Pascal have simple file processors.
Scrambling the ciphertext gives the original cleartext back.
No bugs reported to date. Hope this solves your problem.
rc4.h
#ifndef RC4_H
#define RC4_H
/*
* rc4.h -- Declarations for a simple rc4 encryption/decryption implementation.
* The code was inspired by libtomcrypt. See www.libtomcrypt.org.
*/
typedef struct TRC4State_s {
int x, y;
unsigned char buf[256];
} TRC4State;
/* rc4.c */
void init_rc4(TRC4State *state);
void setup_rc4(TRC4State *state, char *key, int keylen);
unsigned endecrypt_rc4(unsigned char *buf, unsigned len, TRC4State *state);
#endif
rc4.c
void init_rc4(TRC4State *state)
{
int x;
state->x = state->y = 0;
for (x = 0; x < 256; x++)
state->buf[x] = x;
}
void setup_rc4(TRC4State *state, char *key, int keylen)
{
unsigned tmp;
int x, y;
// use only first 256 characters of key
if (keylen > 256)
keylen = 256;
for (x = y = 0; x < 256; x++) {
y = (y + state->buf[x] + key[x % keylen]) & 255;
tmp = state->buf[x];
state->buf[x] = state->buf[y];
state->buf[y] = tmp;
}
state->x = 255;
state->y = y;
}
unsigned endecrypt_rc4(unsigned char *buf, unsigned len, TRC4State *state)
{
int x, y;
unsigned char *s, tmp;
unsigned n;
x = state->x;
y = state->y;
s = state->buf;
n = len;
while (n--) {
x = (x + 1) & 255;
y = (y + s[x]) & 255;
tmp = s[x]; s[x] = s[y]; s[y] = tmp;
tmp = (s[x] + s[y]) & 255;
*buf++ ^= s[tmp];
}
state->x = x;
state->y = y;
return len;
}
int endecrypt_file(FILE *f_in, FILE *f_out, char *key)
{
TRC4State state[1];
unsigned char buf[4096];
size_t n_read, n_written;
init_rc4(state);
setup_rc4(state, key, strlen(key));
do {
n_read = fread(buf, 1, sizeof buf, f_in);
endecrypt_rc4(buf, n_read, state);
n_written = fwrite(buf, 1, n_read, f_out);
} while (n_read == sizeof buf && n_written == n_read);
return (n_written == n_read) ? 0 : 1;
}
int endecrypt_file_at(char *f_in_name, char *f_out_name, char *key)
{
int rtn;
FILE *f_in = fopen(f_in_name, "rb");
if (!f_in) {
return 1;
}
FILE *f_out = fopen(f_out_name, "wb");
if (!f_out) {
close(f_in);
return 2;
}
rtn = endecrypt_file(f_in, f_out, key);
fclose(f_in);
fclose(f_out);
return rtn;
}
#ifdef TEST
// Simple test.
int main(void)
{
char *key = "This is the key!";
endecrypt_file_at("rc4.pas", "rc4-scrambled.c", key);
endecrypt_file_at("rc4-scrambled.c", "rc4-unscrambled.c", key);
return 0;
}
#endif
Here is lightly tested Pascal. I can scramble the source code in C and descramble it with the Pascal implementation just fine.
type
RC4State = record
x, y : Integer;
buf : array[0..255] of Byte;
end;
KeyString = String[255];
procedure initRC4(var state : RC4State);
var
x : Integer;
begin
state.x := 0;
state.y := 0;
for x := 0 to 255 do
state.buf[x] := Byte(x);
end;
procedure setupRC4(var state : RC4State; var key : KeyString);
var
tmp : Byte;
x, y : Integer;
begin
y := 0;
for x := 0 to 255 do begin
y := (y + state.buf[x] + Integer(key[1 + x mod Length(key)])) and 255;
tmp := state.buf[x];
state.buf[x] := state.buf[y];
state.buf[y] := tmp;
end;
state.x := 255;
state.y := y;
end;
procedure endecryptRC4(var buf : array of Byte; len : Integer; var state : RC4State);
var
x, y, i : Integer;
tmp : Byte;
begin
x := state.x;
y := state.y;
for i := 0 to len - 1 do begin
x := (x + 1) and 255;
y := (y + state.buf[x]) and 255;
tmp := state.buf[x];
state.buf[x] := state.buf[y];
state.buf[y] := tmp;
tmp := (state.buf[x] + state.buf[y]) and 255;
buf[i] := buf[i] xor state.buf[tmp]
end;
state.x := x;
state.y := y;
end;
procedure endecryptFile(var fIn, fOut : File; key : KeyString);
var
nRead, nWritten : Longword;
buf : array[0..4095] of Byte;
state : RC4State;
begin
initRC4(state);
setupRC4(state, key);
repeat
BlockRead(fIN, buf, sizeof(buf), nRead);
endecryptRC4(buf, nRead, state);
BlockWrite(fOut, buf, nRead, nWritten);
until (nRead <> sizeof(buf)) or (nRead <> nWritten);
end;
procedure endecryptFileAt(fInName, fOutName, key : String);
var
fIn, fOut : File;
begin
Assign(fIn, fInName);
Assign(fOut, fOutName);
Reset(fIn, 1);
Rewrite(fOut, 1);
endecryptFile(fIn, fOut, key);
Close(fIn);
Close(fOut);
end;
{$IFDEF TEST}
// Very small test.
const
key = 'This is the key!';
begin
endecryptFileAt('rc4.pas', 'rc4-scrambled.pas', key);
endecryptFileAt('rc4-scrambled.pas', 'rc4-unscrambled.pas', key);
end.
{$ENDIF}
It looks easier would be to get reference AES implementation (which works with blocks), and add some code to handle CBC (or CTR encryption).
This would need from you only adding ~30-50 lines of code, something like the following (for CBC):
aes_expand_key();
first_block = iv;
for (i = 0; i < filesize / 16; i++)
{
data_block = read(file, 16);
data_block = (data_block ^ iv);
iv = encrypt_block(data_block);
write(outputfile, iv);
}
// if filesize % 16 != 0, then you also need to add some padding and encrypt the last block
Assuming the encryption strength isn't an issue, as in satisfying an organization's Chinese Wall requirement, the very simple "Sawtooth" encryption scheme of adding (i++ % modulo 256) to fgetc(), for each byte, starting at the beginning of the file, might work just fine.
Declaring i as a UCHAR will eliminate the modulo requirement, as the single byte integer cannot help but cycle through its 0-255 range.
The code is so simple it's not worth posting. A little imagination, and you'll have some embellishments that can add a lot to the strength of this cypher. The primary vulnerability of this cypher is large blocks of identical characters. Rectifying this is a good place to start improving its strength.
This cypher works on every possible file type, and is especially effective if you've already 7Zipped the file.
Performance is phenomenal. You won't even know the code is there. Totally I/O bound.

C Library for compressing sequential positive integers

I have the very common problem of creating an index for an in-disk array of strings. In short, I need to store the position of each string in the in-disk representation. For example, a very naive solution would be an index array as follows:
uint64 idx[] = { 0, 20, 500, 1024, ..., 103434 };
Which says that the first string is at position 0, the second at position 20, the third at position 500 and the nth at position 103434.
The positions are always non-negative 64 bits integers in sequential order. Although the numbers could vary by any difference, in practice I expect the typical difference to be inside the range from 2^8 to 2^20. I expect this index to be mmap'ed in memory, and the positions will be accessed randomly (assume uniform distribution).
I was thinking about writing my own code for doing some sort of block delta encoding or other more sophisticated encoding, but there are so many different trade-offs between encoding/decoding speed and space that I would rather get a working library as a starting point and maybe even settle for something without any customizations.
Any hints? A c library would be ideal, but a c++ one would also allow me to run some initial benchmarks.
A few more details if you are still following. This will be used to build a library similar to cdb (http://cr.yp.to/cdb/cdbmake.html) on top the library cmph (http://cmph.sf.net). In short, it is for a large disk based read only associative map with a small index in memory.
Since it is a library, I don't have control over input, but the typical use case that I want to optimize have millions of hundreds of values, average value size in the few kilobytes ranges and maximum value at 2^31.
For the record, if I don't find a library ready to use I intend to implement delta encoding in blocks of 64 integers with the initial bytes specifying the block offset so far. The blocks themselves would be indexed with a tree, giving me O(log (n/64)) access time. There are way too many other options and I would prefer to not discuss them. I am really looking forward ready to use code rather than ideas on how to implement the encoding. I will be glad to share with everyone what I did once I have it working.
I appreciate your help and let me know if you have any doubts.
I use fastbit (Kesheng Wu LBL.GOV), it seems you need something good, fast and NOW, so fastbit is a highly competient improvement on Oracle's BBC (byte aligned bitmap code, berkeleydb). It's easy to setup and very good gernally.
However, given more time, you may want to look at a gray code solution, it seems optimal for your purposes.
Daniel Lemire has a number of libraries for C/++/Java released on code.google, I've read over some of his papers and they are quite nice, several advancements on fastbit and alternative approaches for column re-ordering with permutated grey codes's.
Almost forgot, I also came across Tokyo Cabinet, though I do not think it will be well suited for my current project, I may of considered it more if I had known about it before ;), it has a large degree of interoperability,
Tokyo Cabinet is written in the C
language, and provided as API of C,
Perl, Ruby, Java, and Lua. Tokyo
Cabinet is available on platforms
which have API conforming to C99 and
POSIX.
As you referred to CDB, the TC benchmark has a TC mode (TC support's several operational constraint's for varying perf) where it surpassed CDB by 10 times for read performance and 2 times for write.
With respect to your delta encoding requirement, I am quite confident in bsdiff and it's ability to out-perform any file.exe content patching system, it may also have some fundimental interfaces for your general needs.
Google's new binary compression application, courgette may be worth checking out, in case you missed the press release, 10x smaller diff's than bsdiff in the one test case I have seen published.
You have two conflicting requirements:
You want to compress very small items (8 bytes each).
You need efficient random access for each item.
The second requirement is very likely to impose a fixed length for each item.
What exactly are you trying to compress? If you are thinking about the total space of index, is it really worth the effort to save the space?
If so one thing you could try is to chop the space into half and store it into two tables. First stores (upper uint, start index, length, pointer to second table) and the second would store (index, lower uint).
For fast searching, indices would be implemented using something like B+ Tree.
I did something similar years ago for a full-text search engine. In my case, each indexed word generated a record which consisted of a record number (document id) and a word number (it could just as easily have stored word offsets) which needed to be compressed as much as possible. I used a delta-compression technique which took advantage of the fact that there would be a number of occurrences of the same word within a document, so the record number often did not need to be repeated at all. And the word offset delta would often fit within one or two bytes. Here is the code I used.
Since it's in C++, the code may is not going to be useful to you as is, but can be a good starting point for writing compressions routines.
Please excuse the hungarian notation and the magic numbers strewn within the code. Like I said, I wrote this many years ago :-)
IndexCompressor.h
//
// index compressor class
//
#pragma once
#include "File.h"
const int IC_BUFFER_SIZE = 8192;
//
// index compressor
//
class IndexCompressor
{
private :
File *m_pFile;
WA_DWORD m_dwRecNo;
WA_DWORD m_dwWordNo;
WA_DWORD m_dwRecordCount;
WA_DWORD m_dwHitCount;
WA_BYTE m_byBuffer[IC_BUFFER_SIZE];
WA_DWORD m_dwBytes;
bool m_bDebugDump;
void FlushBuffer(void);
public :
IndexCompressor(void) { m_pFile = 0; m_bDebugDump = false; }
~IndexCompressor(void) {}
void Attach(File& File) { m_pFile = &File; }
void Begin(void);
void Add(WA_DWORD dwRecNo, WA_DWORD dwWordNo);
void End(void);
WA_DWORD GetRecordCount(void) { return m_dwRecordCount; }
WA_DWORD GetHitCount(void) { return m_dwHitCount; }
void DebugDump(void) { m_bDebugDump = true; }
};
IndexCompressor.cpp
//
// index compressor class
//
#include "stdafx.h"
#include "IndexCompressor.h"
void IndexCompressor::FlushBuffer(void)
{
ASSERT(m_pFile != 0);
if (m_dwBytes > 0)
{
m_pFile->Write(m_byBuffer, m_dwBytes);
m_dwBytes = 0;
}
}
void IndexCompressor::Begin(void)
{
ASSERT(m_pFile != 0);
m_dwRecNo = m_dwWordNo = m_dwRecordCount = m_dwHitCount = 0;
m_dwBytes = 0;
}
void IndexCompressor::Add(WA_DWORD dwRecNo, WA_DWORD dwWordNo)
{
ASSERT(m_pFile != 0);
WA_BYTE buffer[16];
int nbytes = 1;
ASSERT(dwRecNo >= m_dwRecNo);
if (dwRecNo != m_dwRecNo)
m_dwWordNo = 0;
if (m_dwRecordCount == 0 || dwRecNo != m_dwRecNo)
++m_dwRecordCount;
++m_dwHitCount;
WA_DWORD dwRecNoDelta = dwRecNo - m_dwRecNo;
WA_DWORD dwWordNoDelta = dwWordNo - m_dwWordNo;
if (m_bDebugDump)
{
TRACE("%8X[%8X] %8X[%8X] : ", dwRecNo, dwRecNoDelta, dwWordNo, dwWordNoDelta);
}
// 1WWWWWWW
if (dwRecNoDelta == 0 && dwWordNoDelta < 128)
{
buffer[0] = 0x80 | WA_BYTE(dwWordNoDelta);
}
// 01WWWWWW WWWWWWWW
else if (dwRecNoDelta == 0 && dwWordNoDelta < 16384)
{
buffer[0] = 0x40 | WA_BYTE(dwWordNoDelta >> 8);
buffer[1] = WA_BYTE(dwWordNoDelta & 0x00ff);
nbytes += sizeof(WA_BYTE);
}
// 001RRRRR WWWWWWWW WWWWWWWW
else if (dwRecNoDelta < 32 && dwWordNoDelta < 65536)
{
buffer[0] = 0x20 | WA_BYTE(dwRecNoDelta);
WA_WORD *p = (WA_WORD *) (buffer+1);
*p = WA_WORD(dwWordNoDelta);
nbytes += sizeof(WA_WORD);
}
else
{
// 0001rrww
buffer[0] = 0x10;
// encode recno
if (dwRecNoDelta < 256)
{
buffer[nbytes] = WA_BYTE(dwRecNoDelta);
nbytes += sizeof(WA_BYTE);
}
else if (dwRecNoDelta < 65536)
{
buffer[0] |= 0x04;
WA_WORD *p = (WA_WORD *) (buffer+nbytes);
*p = WA_WORD(dwRecNoDelta);
nbytes += sizeof(WA_WORD);
}
else
{
buffer[0] |= 0x08;
WA_DWORD *p = (WA_DWORD *) (buffer+nbytes);
*p = dwRecNoDelta;
nbytes += sizeof(WA_DWORD);
}
// encode wordno
if (dwWordNoDelta < 256)
{
buffer[nbytes] = WA_BYTE(dwWordNoDelta);
nbytes += sizeof(WA_BYTE);
}
else if (dwWordNoDelta < 65536)
{
buffer[0] |= 0x01;
WA_WORD *p = (WA_WORD *) (buffer+nbytes);
*p = WA_WORD(dwWordNoDelta);
nbytes += sizeof(WA_WORD);
}
else
{
buffer[0] |= 0x02;
WA_DWORD *p = (WA_DWORD *) (buffer+nbytes);
*p = dwWordNoDelta;
nbytes += sizeof(WA_DWORD);
}
}
// update current setting
m_dwRecNo = dwRecNo;
m_dwWordNo = dwWordNo;
// add compressed data to buffer
ASSERT(buffer[0] != 0);
ASSERT(nbytes > 0 && nbytes < 10);
if (m_dwBytes + nbytes > IC_BUFFER_SIZE)
FlushBuffer();
CopyMemory(m_byBuffer + m_dwBytes, buffer, nbytes);
m_dwBytes += nbytes;
if (m_bDebugDump)
{
for (int i = 0; i < nbytes; ++i)
TRACE("%02X ", buffer[i]);
TRACE("\n");
}
}
void IndexCompressor::End(void)
{
FlushBuffer();
m_pFile->Write(WA_BYTE(0));
}
You've omitted critical information about the number of strings you intend to index.
But given that you say you expect the minimum length of an indexed string to be 256, storing the indices as 64% incurs at most 3% overhead. If the total length of the string file is less than 4GB, you could use 32-bit indices and incur 1.5% overhead. These numbers suggest to me that if compression matters, you're better off compressing the strings, not the indices. For that problem a variation on LZ77 seems in order.
If you want to try a wild idea, put each string in a separate file, pull them all into a zip file, and see how you can do with zziplib. This probably won't be great, but it's nearly zero work on your part.
More data on the problem would be welcome:
Number of strings
Average length of a string
Maximum length of a string
Median length of strings
Degree to which the strings file compresses with gzip
Whether you are allowed to change the order of strings to improve compression
EDIT
The comment and revised question makes the problem much clearer. I like your idea of grouping, and I would try a simple delta encoding, group the deltas, and use a variable-length code within each group. I wouldn't wire in 64 as the group size–I think you will probably want to determine that empirically.
You asked for existing libraries. For the grouping and delta encoding I doubt you will find much. For variable-length integer codes, I'm not seeing much in the way of C libraries, but you can find variable-length codings in Perl and Python. There are a ton of papers and some patents on this topic, and I suspect you're going to wind up having to roll your own. But there are some simple codes out there, and you could give UTF-8 a try—it can code unsigned integers up to 32 bits, and you can grab C code from Plan 9 and I'm sure many other sources.
Are you running on Windows? If so, I recommend creating the mmap file using naive solution your originally proposed, and then compressing the file using NTLM compression. Your application code never knows the file is compressed, and the OS does the file compression for you. You might not think this would be very performant or get good compression, but I think you'll be surprised if you try it.

Resources