Difference between two buffers in C/C++ - c

This development is being done on Windows in usermode.
I have two (potentially quite large) buffers, and I would like to know the number of bytes different between the two of them.
I wrote this myself, checking byte by byte, but the result was quite slow. As I'm comparing on the order of hundreds of megabytes, this is undesirable. I'm aware that I could optimize this through many different means, but this seems like a common problem that probably already has optimized solutions out there, and there's no way I'm going to optimize it as effectively as optimization experts would.
Perhaps my Googling is inadequate, but I'm unable to find any C or C++ function that counts the number of differing bytes between two buffers. Is there such a built-in function in the C standard library, WinAPI, or C++ standard library that I just don't know of? Or do I need to optimize this manually?
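For reference, the plain byte-by-byte count described above fits in a single counting loop. This is a minimal sketch, and count_diff_bytes is a hypothetical name. With optimization enabled, GCC and Clang can often auto-vectorize a loop of this shape, but that is not guaranteed, so check the generated assembly before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Count the positions at which two equally sized buffers differ. */
static size_t count_diff_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] != b[i]);  /* the comparison yields 0 or 1 */
    return count;
}
```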

I ended up writing this (perhaps somewhat poorly) optimized code to do the job for me. I was hoping the compiler would vectorize it under the hood, but unfortunately that doesn't appear to be happening, and I didn't feel like digging into the SIMD intrinsics to do it manually. As a result, my bit-fiddling tricks may end up making it slower, but it's still fast enough that it accounts for no more than about 4% of my code's runtime (and almost all of that was memcmp). Whether or not it could be better, it's good enough for me.
I'll note that this is designed to be fast for my use case, where I'm expecting only rare differences.
#include <sal.h>     /* SAL annotations such as _In_reads_bytes_ (Windows SDK) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
inline size_t ComputeDifferenceSmall(
    _In_reads_bytes_(size) const char* buf1,
    _In_reads_bytes_(size) const char* buf2,
    size_t size) {
    /* size should be <= 0x1000 bytes */
    /* In my case, I expect frequent differences if any at all are present. */
    size_t res = 0;
    size_t i = 0;
    /* One uint64_t (8 bytes) per iteration, advancing through both buffers */
    for (; i < (size & ~size_t(7)); i += 8) {
        uint64_t w1, w2;
        /* memcpy avoids unaligned and strict-aliasing-violating loads */
        memcpy(&w1, buf1 + i, sizeof w1);
        memcpy(&w2, buf2 + i, sizeof w2);
        uint64_t diff1 = w1 ^ w2;
        if (!diff1) continue;
        /* Bit fiddle to make each byte 1 if they're different and 0 if the same */
        diff1 = ((diff1 & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | (diff1 & 0x0F0F0F0F0F0F0F0FULL);
        diff1 = ((diff1 & 0x0C0C0C0C0C0C0C0CULL) >> 2) | (diff1 & 0x0303030303030303ULL);
        diff1 = ((diff1 & 0x0202020202020202ULL) >> 1) | (diff1 & 0x0101010101010101ULL);
        /* Sum the bytes (the total is at most 8, so three steps suffice) */
        diff1 = (diff1 >> 32) + (diff1 & 0xFFFFFFFFULL);
        diff1 = (diff1 >> 16) + (diff1 & 0xFFFFULL);
        diff1 = (diff1 >> 8) + (diff1 & 0xFFULL);
        res += diff1;
    }
    /* Remaining tail bytes */
    for (; i < size; i++) {
        res += (buf1[i] != buf2[i]);
    }
    return res;
}
size_t ComputeDifference(
    _In_reads_bytes_(size) const char* buf1,
    _In_reads_bytes_(size) const char* buf2,
    size_t size) {
    size_t res = 0;
    /* I expect most pages to be identical, and both buffers should be page aligned if
     * larger than a page. memcmp has more optimizations than I'll ever come up with,
     * so I can just use that to determine if I need to check for differences
     * in the page. */
    for (size_t pn = 0; pn < (size & ~size_t(0xFFF)); pn += 0x1000) {
        if (memcmp(&buf1[pn], &buf2[pn], 0x1000)) {
            res += ComputeDifferenceSmall(&buf1[pn], &buf2[pn], 0x1000);
        }
    }
    return res + ComputeDifferenceSmall(
        &buf1[size & ~size_t(0xFFF)], &buf2[size & ~size_t(0xFFF)], size & 0xFFF);
}
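The word-at-a-time idea can also be pulled out into a standalone C helper that is easy to test in isolation. It loads each 8-byte word with memcpy, which sidesteps alignment and strict-aliasing problems, and folds each byte to 0 or 1 with the same masks as above. As a swapped-in alternative to the shift-and-add steps, it sums the eight byte flags by multiplying by 0x0101010101010101 and keeping the top byte. This works because the total never exceeds 8, so no carries cross byte boundaries. The names diff_bytes_u64 and count_diff_swar are hypothetical, and this sketch has not been benchmarked against the version above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of byte lanes that differ between two 64-bit words. */
static unsigned diff_bytes_u64(uint64_t a, uint64_t b)
{
    uint64_t d = a ^ b;
    if (d == 0)
        return 0;
    /* Fold each byte so that its lowest bit is set iff the byte was nonzero. */
    d = ((d & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | (d & 0x0F0F0F0F0F0F0F0FULL);
    d = ((d & 0x0C0C0C0C0C0C0C0CULL) >> 2) | (d & 0x0303030303030303ULL);
    d = ((d & 0x0202020202020202ULL) >> 1) | (d & 0x0101010101010101ULL);
    /* Each byte is now 0 or 1; the multiply accumulates all eight into the top byte. */
    return (unsigned)((d * 0x0101010101010101ULL) >> 56);
}

/* Count differing bytes eight at a time, then finish the tail bytewise. */
static size_t count_diff_swar(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0, res = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);  /* safe for unaligned addresses */
        memcpy(&wb, b + i, sizeof wb);
        res += diff_bytes_u64(wa, wb);
    }
    for (; i < n; i++)
        res += (a[i] != b[i]);
    return res;
}
```

In ComputeDifference, count_diff_swar could stand in for ComputeDifferenceSmall on pages where memcmp reports a difference.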

Related

About one line in an implementation of MD5

I'm confused by one line of code in an implementation of MD5:
void MD5_Update(MD5_CTX *ctx, const void *data, unsigned long size)
{
    MD5_u32plus saved_lo;
    unsigned long used, available;
    saved_lo = ctx->lo;
    if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo)
        ctx->hi++;
    ctx->hi += size >> 29;
    used = saved_lo & 0x3f;
    if (used)
    {
        available = 64 - used;
        if (size < available)
        {
            memcpy(&ctx->buffer[used], data, size);
            return;
        }
        memcpy(&ctx->buffer[used], data, available);
        data = (const unsigned char *)data + available;
        size -= available;
        body(ctx, ctx->buffer, 64);
    }
    if (size >= 64)
    {
        data = body(ctx, data, size & ~(unsigned long)0x3f);
        size &= 0x3f;
    }
    memcpy(ctx->buffer, data, size);
}
The line in question is if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo). It seems that size counts bytes, but ctx->lo and saved_lo count bits, so why add them together? There is similar code on GitHub, and some projects use it. Can anyone explain?
The remarks about "bit counters" are likely misleading - ctx->hi and ctx->lo count bytes, just like size does.
You correctly notice that you're just adding size (bytes) to ctx->lo (and then checking for overflow/propagating overflow into ctx->hi). The overflow check is pretty simple - lo is used as a 29-bit integer, and if the result after adding/masking is less than the original value, then overflow occurred.
The checks around used are also evidence for ctx->lo and ctx->hi being byte counters -- body processes data 64 bytes at a time, and the lo counter is ANDed with 0x3F (i.e. 63).
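The 29-bit width is consistent with the counter being in bytes: 2^29 bytes is 2^32 bits. So once the total is converted to a bit count for the MD5 length field, lo << 3 fills exactly 32 bits and hi (bytes >> 29) is exactly the upper 32 bits. The sketch below isolates the counter update from MD5_Update. It assumes that the finalization step converts bytes to bits by shifting lo left by 3. byte_count, byte_count_add and byte_count_bits are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the lo/hi fields of MD5_CTX. */
typedef struct {
    uint32_t lo; /* low 29 bits of the total byte count */
    uint32_t hi; /* total byte count >> 29 */
} byte_count;

/* Same update as MD5_Update: add a byte count to the 29-bit lo field and
   propagate the carry (plus any bits of size above bit 28) into hi. */
static void byte_count_add(byte_count *c, unsigned long size)
{
    uint32_t saved_lo = c->lo;
    /* If the masked sum wrapped past 2^29, it is smaller than the old value. */
    if ((c->lo = (uint32_t)((saved_lo + size) & 0x1fffffff)) < saved_lo)
        c->hi++;
    c->hi += (uint32_t)(size >> 29);
}

/* The 64-bit message length in bits: lo << 3 fits in 32 bits because lo
   has only 29 bits, and hi already equals (bytes * 8) >> 32. */
static uint64_t byte_count_bits(const byte_count *c)
{
    assert(c->lo <= 0x1fffffff);
    return ((uint64_t)c->hi << 32) | ((uint64_t)c->lo << 3);
}
```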

How can I implement paging, and find the physical memory address for a known virtual address?

I want to implement paging initialisation.
I referred to some OSDev wiki pages (https://wiki.osdev.org/Paging, https://wiki.osdev.org/Setting_Up_Paging), but my own version is very different.
Looking at the page directory, they say that 12 bits are for the flags and the rest is the address of the page table, so I tried something like this:
void init_paging() {
    unsigned int i = 0;
    unsigned int __FIRST_PAGE_TABLE__[0x400] __attribute__((aligned(0x1000)));
    for (i = 0; i < 0x400; i++) __PAGE_DIRECTORY__[i] = PAGE_PRESENT(0) | PAGE_READ_WRITE;
    for (i = 0; i < 0x400; i++) __FIRST_PAGE_TABLE__[i] = ((i * 0x1000) << 12) | PAGE_PRESENT(1) | PAGE_READ_WRITE;
    __PAGE_DIRECTORY__[0] = ((unsigned int)__FIRST_PAGE_TABLE__ << 12) | PAGE_PRESENT(1) | PAGE_READ_WRITE;
    _EnablingPaging_();
}
This function should give me the physical address for a given virtual address:
void *get_phyaddr(void *virtualaddr) {
    unsigned long pdindex = (unsigned long)virtualaddr >> 22;
    unsigned long ptindex = (unsigned long)virtualaddr >> 12 & 0x03FF;
    unsigned long *pd = (unsigned long *)__PAGE_DIRECTORY__[pdindex];
    unsigned long *pt = (unsigned long *)pd[ptindex];
    return (void *)(pt + ((unsigned int)virtualaddr & 0xFFF));
}
Am I going in the wrong direction, or is this essentially the same?
Assuming you're trying to identity map the first 4 MiB of the physical address space:
a) For unsigned int __FIRST_PAGE_TABLE__[0x400] __attribute__((aligned(0x1000)));, it's a local variable (i.e. likely put on the stack), so it will not survive after the function returns (the stack space it was using will be overwritten by other functions later), causing the page table to become corrupted. That isn't likely to end well.
b) For __FIRST_PAGE_TABLE__[i] = ((i * 0x1000) << 12) | PAGE_PRESENT(1) | PAGE_READ_WRITE;, you're shifting i twice, once with * 0x1000 (which is the same as << 12) and again with the << 12. This is too much, and it needs to be more like __FIRST_PAGE_TABLE__[i] = (i << 12) | PAGE_PRESENT(1) | PAGE_READ_WRITE;.
c) For __PAGE_DIRECTORY__[0] = ((unsigned int)__FIRST_PAGE_TABLE__ << 12) | PAGE_PRESENT(1) | PAGE_READ_WRITE;, the address is already an address (and not a "page number" that needs to be shifted), so it needs to be more like __PAGE_DIRECTORY__[0] = ((unsigned int)__FIRST_PAGE_TABLE__) | PAGE_PRESENT(1) | PAGE_READ_WRITE;.
Beyond that, I'd very much prefer better use of types. Specifically, you should probably get in the habit of using uint32_t (or uint64_t, or a typedef of your own) for physical addresses, so that you don't accidentally confuse a virtual address with a physical address (and so that the compiler complains about the wrong type when you make a mistake). Even though it's not very important now because you're identity mapping, it will become important "soon". I'd also recommend using uint32_t for page table entries and page directory entries, because they must be 32 bits and not "whatever size the compiler felt like int should be" (note that this is a difference in how you think about the code, which is more important than what the compiler actually does or whether int happens to be 32 bits anyway).
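Here is a sketch that applies points a) to c) and the typing advice together. It assumes a GCC-style toolchain, a 32-bit kernel, and that the page table's physical address is known. The tables are static, so they outlive the setup function. Entries are uint32_t. The frame number is shifted once, and the page table's address goes into the directory entry unshifted. PG_PRESENT, PG_RW, physaddr_t, make_pde and setup_identity_paging are hypothetical names, not the asker's PAGE_PRESENT(x) macros. Loading CR3 and actually enabling paging are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t physaddr_t;   /* hypothetical: physical addresses as plain integers */

#define PG_PRESENT 0x001u      /* hypothetical name: bit 0, Present */
#define PG_RW      0x002u      /* hypothetical name: bit 1, Read/Write */

/* Static storage: unlike a local array, these outlive the setup function.
   The aligned attribute is GCC/Clang syntax, as in the question. */
static uint32_t page_directory[1024] __attribute__((aligned(4096)));
static uint32_t first_page_table[1024] __attribute__((aligned(4096)));

/* Directory entry for a page table: the table's address is already
   4 KiB aligned, so it is OR'd in without shifting. */
static uint32_t make_pde(physaddr_t table_phys)
{
    assert((table_phys & 0xFFFu) == 0);
    return table_phys | PG_PRESENT | PG_RW;
}

/* Identity-map the first 4 MiB. first_table_phys is the physical address of
   first_page_table; in an identity-mapped 32-bit kernel that would be
   (physaddr_t)(uintptr_t)first_page_table. */
static void setup_identity_paging(physaddr_t first_table_phys)
{
    for (uint32_t i = 0; i < 1024; i++)
        page_directory[i] = PG_RW;                            /* not present */
    for (uint32_t i = 0; i < 1024; i++)
        first_page_table[i] = (i << 12) | PG_PRESENT | PG_RW; /* shift once */
    page_directory[0] = make_pde(first_table_phys);
}
```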
Accessing a page that is not present triggers a page fault. To avoid that, we can first check whether the page directory entry and the page table entry are present, and return 0x0 if either is not:
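physaddr_t get_phyaddr(void *virtualaddr) {
    /* physaddr_t is assumed to be a typedef for uint32_t (32-bit, non-PAE paging) */
    uint32_t pdindex = (uint32_t)virtualaddr >> 22;
    uint32_t ptindex = (uint32_t)virtualaddr >> 12 & 0x03FF;
    uint32_t offset = (uint32_t)virtualaddr & 0xFFF;
    uint32_t *pt;
    /* bit 0 of a directory or table entry is the Present flag */
    if (!(page_directory[pdindex] & 0x1))
        return 0x0;
    /* identity mapped, so the page table's physical address is usable as a pointer */
    pt = (uint32_t *)(page_directory[pdindex] & 0xFFFFF000);
    if (!(pt[ptindex] & 0x1))
        return 0x0;
    /* frame address plus byte offset: integer arithmetic, not pointer arithmetic */
    return (physaddr_t)((pt[ptindex] & 0xFFFFF000) | offset);
}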
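The index and offset arithmetic can be checked in user space, where physical addresses cannot be dereferenced, by splitting it into pure functions. This sketch assumes 32-bit non-PAE paging. va_pd_index, va_pt_index, va_offset and pte_to_phys are hypothetical helpers. The key point is that the physical address is computed with integer arithmetic (frame | offset), whereas adding the offset to a uint32_t * would scale it by 4.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t physaddr_t;      /* hypothetical: physical addresses as integers */

#define PG_PRESENT 0x001u         /* bit 0 of a PDE/PTE */
#define PG_FRAME   0xFFFFF000u    /* bits 31..12: 4 KiB-aligned frame address */

/* Split a 32-bit virtual address into directory index, table index and offset. */
static uint32_t va_pd_index(uint32_t va) { return va >> 22; }
static uint32_t va_pt_index(uint32_t va) { return (va >> 12) & 0x3FFu; }
static uint32_t va_offset(uint32_t va)   { return va & 0xFFFu; }

/* Combine a page-table entry with the offset inside the page.
   Returns 0 (not mapped) if the Present bit is clear, 1 otherwise. */
static int pte_to_phys(uint32_t pte, uint32_t offset, physaddr_t *out)
{
    assert(offset <= 0xFFFu);
    if (!(pte & PG_PRESENT))
        return 0;
    *out = (pte & PG_FRAME) | offset;  /* offset added in bytes */
    return 1;
}
```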

Get the dominant character in a string

As the title says, I am trying to write a function that returns the character that occurs most often in a string. I thought I might be able to figure it out, but something seems to be wrong with my logic and I failed. If someone can come up with a working version, I will be extremely grateful. Thank you.
I say "in a string" to keep it simple. I am actually working on buffered data containing a BMP image, trying to output the base color (the most common pixel).
Here is the unfinished function I have so far:
RGB
bitfox_get_primecolor_direct
(char *FILE_NAME)
{
    dword size = bmp_dgets(FILE_NAME, byte);
    FILE* fp = fopen(convert(FILE_NAME), "rb");
    BYTE *PIX_ARRAY = malloc(size-54+1), *PIX_CUR = calloc(sizeof(RGB), sizeof(BYTE));
    dword readed, i, l;
    RGB color, prime_color;
    fseek(fp, 54, SEEK_SET); readed = fread(PIX_ARRAY, 1, size-54, fp);
    for(i = 54; i<size-54; i+=3)
    {
        color = bitfox_pixel_init(PIX_ARRAY[i], PIX_ARRAY[i+1], PIX_ARRAY[i+2]);
        memmove(PIX_CUR, &color, sizeof(RGB));
        for(l = 54; l<size-54; l+=3)
        {
            if (PIX_CUR[2] == PIX_ARRAY[l] && PIX_CUR[1] == PIX_ARRAY[l+1] &&
                PIX_CUR[0] == PIX_ARRAY[l+2])
            {
                /* unfinished: count the match here */
            }
        }
    }
    /* unfinished */
}
Note that RGB is a struct containing 3 bytes (R, G and B).
I know that's not much, but it's all I have for now.
Is there any way I can finish this?
If you want this done fast, throw a stack of RAM at it (if available, of course). You can use a large direct-lookup table, using the RGB trio to build a 24-bit index into a contiguous array of counters. In part pseudocode, part real code, something like this:
// create a zero-filled 2^24 array of unsigned counters.
// (check counts for NULL: this is a 64 MB request)
uint32_t *counts = calloc(256*256*256, sizeof(*counts));
uint32_t max_count = 0, max_idx = 0;
// enumerate your buffer of RGB values, three bytes at a time:
unsigned char rgb[3];
while (getNextRGB(src, rgb)) // returns false when no more data.
{
    uint32_t idx = (((uint32_t)rgb[0]) << 16) | (((uint32_t)rgb[1]) << 8) | (uint32_t)rgb[2];
    if (++counts[idx] > max_count)
    {
        max_count = counts[idx]; // new highest count...
        max_idx = idx;           // ...and the color that has it
    }
}
R = (max_idx >> 16) & 0xFF;
G = (max_idx >> 8) & 0xFF;
B = max_idx & 0xFF;
// free when you have no more images to process. for each new
// image you can memset the buffer to zero and reset the max
// for a fresh start.
free(counts);
That's it. If you can afford to throw a big hunk of memory at this (64 MB in this case: 4 bytes per entry times 16.7M entries), then this becomes O(N). If you have a succession of images to process, you can simply memset() the array back to zeros, clear max_count and max_idx, and repeat for each additional file. Finally, don't forget to free your memory when finished.
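Here is a complete, testable version of this approach, written as a sketch under a few assumptions. The pixels are tightly packed 3-byte triplets with no row padding (real 24-bit BMP rows are padded to a multiple of 4 bytes, so strip that first). They are stored blue, green, red, as in BMP files. The 64 MB table is allocated per call for simplicity. dominant_color is a hypothetical name. On ties, the first color to reach the highest count wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Find the most frequent 24-bit color among n_pixels packed B,G,R triplets.
   Returns 1 on success, 0 on empty input or allocation failure. */
static int dominant_color(const unsigned char *pix, size_t n_pixels,
                          unsigned char *r, unsigned char *g, unsigned char *b)
{
    if (n_pixels == 0)
        return 0;
    assert(pix != NULL && r != NULL && g != NULL && b != NULL);
    uint32_t *counts = calloc((size_t)1 << 24, sizeof *counts);  /* 64 MB, zeroed */
    if (counts == NULL)
        return 0;
    uint32_t max_count = 0, max_idx = 0;
    for (size_t p = 0; p < n_pixels; p++) {
        const unsigned char *px = pix + 3 * p;
        /* Index as 0xRRGGBB; BMP stores blue first. */
        uint32_t idx = ((uint32_t)px[2] << 16) | ((uint32_t)px[1] << 8) | px[0];
        if (++counts[idx] > max_count) {
            max_count = counts[idx];
            max_idx = idx;
        }
    }
    free(counts);
    *r = (unsigned char)(max_idx >> 16);
    *g = (unsigned char)(max_idx >> 8);
    *b = (unsigned char)max_idx;
    return 1;
}
```

For a batch of images, allocating the table once and clearing it with memset between images, as the answer suggests, avoids repeating the 64 MB allocation.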
Best of luck.

Storing input values in structs for fastest comparison later

I'm sampling eight input ports and comparing the values up to ten times a second.
These inputs will be XORed against a similar field indicating which signals are "Active Low", and then ANDed with a mask to exclude input signals that are not going to be compared (all signals are sampled, whether compared or not).
So this is an example of the sampling. I've created a struct where the signals will be stored and then saved in memory. This struct contains a lot of other values, so replacing the whole struct is not an option. Anyway, these input values need to be saved in an efficient way so that I can later perform fast XOR and AND operations with my masks.
void SampleData(){
    // These are not all values to be sampled, only inputs
    currentSample.i0 = RD13_bit;
    currentSample.i1 = RD12;
    currentSample.i2 = RD11;
    currentSample.i3 = RD10;
    currentSample.i4 = RE12;
    currentSample.i5 = RE13;
    currentSample.i6 = RF8;
    currentSample.i7 = RF9;
}
This is an example of the comparison I need:
void checkInputSignals(){
    activated = ((inputValues ^ activeLowInputs) & activeInputsMask);
    if (activated) {
        importantMethod();
    }
}
I've tried a bit-field, but I couldn't get the operators to work, and I don't know how efficient bit-fields are. Efficiency in this project is not about memory, but about speed and ease of use. How should I store my three fields? If it helps, I am using a dsPIC33EP microprocessor.
If I use a 'char' or 'uint8_t', my sample method would look like this, right? This does not seem to be the most elegant solution.
unsigned char inputValues;
void SampleData(){
    currentSample.i0 = RD13_bit;
    currentSample.i1 = RD12;
    currentSample.i2 = RD11;
    currentSample.i3 = RD10;
    currentSample.i4 = RE12;
    currentSample.i5 = RE13;
    currentSample.i6 = RF8;
    currentSample.i7 = RF9;
    // For the masking: i7 ends up in bit 7, i0 in bit 0
    inputValues = currentSample.i7;
    inputValues = (inputValues << 1) + currentSample.i6;
    inputValues = (inputValues << 1) + currentSample.i5;
    inputValues = (inputValues << 1) + currentSample.i4;
    inputValues = (inputValues << 1) + currentSample.i3;
    inputValues = (inputValues << 1) + currentSample.i2;
    inputValues = (inputValues << 1) + currentSample.i1;
    inputValues = (inputValues << 1) + currentSample.i0;
}
And I would have to do the same for my masks, for example.
void ConfigureActiveLowInputs(){
    // Same bit order as inputValues: I7 in bit 7, I0 in bit 0
    activeLowInputs = currentCalibration->I7_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I6_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I5_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I4_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I3_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I2_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I1_activeLow;
    activeLowInputs = (activeLowInputs << 1) + currentCalibration->I0_activeLow;
}
Is there a better solution than bit shifting?
Some things I think you need to know.
Don't use bit fields. Apart from being non-portable, they make this kind of bit-twiddling harder, not easier.
Don't use run-time shifts. Get the compiler to do your work.
Do read code, study and practice. Learning bit-twiddling can be hard, and from your code I don't think you're quite there yet.
If we're going to help there are some things we need to know.
You mention 8 ports. Are they single bit ports, or single ports with multiple bits?
You mention 3 fields. What are they?
Your sample code uses + operators, which are rarely used in bit operations. Why?
In C the code usually ends up with a set of macros and defines, plus a few small functions. It's all quite simple, generates good code, and runs fast without too much effort. If we only knew what you were trying to do.
You seem to be storing individual bits in separate structure members, and then packing them into a word on the fly to be able to apply masks; it is probably more efficient to pack them into a word in the first place, and use a mask to access the individual bits when necessary.
The members i0, i1 etc. are probably unnecessary. It would be simpler to pack the bits directly into a uint8_t member, then write functions or macros to return individual bits where necessary.
uint8_t SampleData(void)
{
    // Bit n holds input in: i0 (RD13_bit) in bit 0 ... i7 (RF9) in bit 7
    return (RD13_bit << 0) |
           (RD12 << 1) |
           (RD11 << 2) |
           (RD10 << 3) |
           (RE12 << 4) |
           (RE13 << 5) |
           (RF8 << 6) |
           (RF9 << 7);
}
Then:
currentSample.i = SampleData() ;
Then you can apply masks to that directly. If you need to access individual bits (and if you don't, why make them separate members in the first place?), then for example:
#include <stdbool.h>
#define GETBIT( word, bit ) ((((word) >> (bit)) & 1u) != 0)
bool i6 = GETBIT( currentSample.i, 6 ) ;
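Putting the pieces together, here is a sketch of the XOR-then-AND check in which every mask is a compile-time constant, as suggested above. It assumes bit n holds input In (i0 in bit 0 through i7 in bit 7). The IN0..IN7 macros, the calibration struct and the function names are hypothetical stand-ins for the project's own currentCalibration fields. Each conditional selects a constant mask, so no run-time shifts are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments: input n lives in bit n. */
#define IN0 (1u << 0)
#define IN1 (1u << 1)
#define IN2 (1u << 2)
#define IN3 (1u << 3)
#define IN4 (1u << 4)
#define IN5 (1u << 5)
#define IN6 (1u << 6)
#define IN7 (1u << 7)

/* Hypothetical calibration record mirroring currentCalibration->In_activeLow. */
typedef struct {
    bool I0_activeLow, I1_activeLow, I2_activeLow, I3_activeLow;
    bool I4_activeLow, I5_activeLow, I6_activeLow, I7_activeLow;
} calibration;

/* Build the active-low mask once, when the calibration changes. */
static uint8_t active_low_mask(const calibration *c)
{
    assert(c != NULL);
    return (uint8_t)((c->I0_activeLow ? IN0 : 0u) | (c->I1_activeLow ? IN1 : 0u) |
                     (c->I2_activeLow ? IN2 : 0u) | (c->I3_activeLow ? IN3 : 0u) |
                     (c->I4_activeLow ? IN4 : 0u) | (c->I5_activeLow ? IN5 : 0u) |
                     (c->I6_activeLow ? IN6 : 0u) | (c->I7_activeLow ? IN7 : 0u));
}

/* Set bits are inputs that are asserted (after polarity correction) and enabled. */
static uint8_t activated_inputs(uint8_t raw, uint8_t active_low, uint8_t enabled)
{
    /* XOR flips active-low inputs so that 1 always means "asserted";
       AND then clears the inputs that are not being compared. */
    return (uint8_t)((raw ^ active_low) & enabled);
}
```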

Base64 encoding in c - where is this one stray "X" coming from?

I'm using Ryyst's code from here - How do I base64 encode (decode) in C? - to base64 encode an image file and insert it into an HTML document.
It works! - except on the second line of base64-encoded output there is a single stray "X" at the end of the line.
It's always the second line, and only the second line, no matter how large the binary file (I've tried many).
If I remove the stray "X" manually, the encoded data exactly matches the output of the base64 utility, and the image is correctly decoded by the browser.
I've tried adding "\0" to the ends of each char array to make sure they are properly terminated (made no difference). I've checked that "buffer" is always 60 bytes, and that output_length is always 80 bytes (they are). I've read and re-read Ryyst's code to see if anything there could cause it (didn't see anything, but I am a C n00b). I did a rain dance. I searched for a virgin to toss down a volcano (can't find either one around here). The bug is still there.
Here are the important bits of the code -
while (cgiFormFileRead(CoverImageFile, buffer, BUFFERLEN, &got) == cgiFormSuccess)
{
    if (got > 0)
    {
        fputs(base64_encode(buffer, got, &output_length), targetfile);
        fputs("\n", targetfile);
    }
}
And the base64_encode function is -
char *base64_encode(const unsigned char *data, size_t input_length,
                    size_t *output_length)
{
    *output_length = 4 * ((input_length + 2) / 3);
    char *encoded_data = malloc(*output_length);
    if (encoded_data == NULL)
        return NULL;
    int i = 0, j = 0;
    for (i = 0, j = 0; i < input_length;)
    {
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;
        uint32_t triple = (octet_a << 0x10) + (octet_b << 0x08) + octet_c;
        encoded_data[j++] = encoding_table[(triple >> 3 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 2 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 1 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 0 * 6) & 0x3F];
    }
    for (i = 0; i < mod_table[input_length % 3]; i++)
        encoded_data[*output_length - 1 - i] = '=';
    return encoded_data;
}
(As you can see, I'm also using the cgic library v 205, but I don't think the problem comes from there, because it's giving the right number of bytes.)
(And BUFFERLEN is a constant, equals 60.)
What am I doing wrong, guys?
(Even more frustratingly, I /did/ get Ryyst's algorithm to work flawlessly once before, so his code /does/ work.)
I'm compiling using gcc on an ARM-based Debian Linux system, if that makes any difference.
Comparing your function with the original, you've deleted this line:
encoded_data[j++] = encoding_table[(triple >> 0 * 6) & 0x3F];
Apart from that, the function is the same; I'm guessing that's just a copy error.
The problem is that you are using BUFFERLEN rather than looking at got, which returns the amount of data actually read. The second line doesn't read the full 60 characters, so you are encoding whatever junk is at the end of the buffer.
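The code exactly as posted already passes got to base64_encode, so one more problem is worth checking. base64_encode mallocs exactly output_length bytes and never writes a terminating '\0', but fputs expects a NUL-terminated string. It therefore keeps reading past the 80 encoded bytes until it happens to hit a zero byte in the heap. A stray character at the end of a line is consistent with that, although this is an inference from the posted code rather than a confirmed diagnosis. The returned buffer is also never freed. Below is a minimal sketch that allocates one extra byte for the terminator and frees each buffer after writing it. base64_encode_cstr and write_base64_line are hypothetical names. encoding_table and mod_table are filled in with the standard Base64 alphabet and padding table, since the question doesn't show them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char encoding_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const int mod_table[] = {0, 2, 1};  /* '=' padding per input_length % 3 */

/* Same encoding as base64_encode, but NUL-terminated. Caller must free(). */
static char *base64_encode_cstr(const unsigned char *data, size_t input_length,
                                size_t *output_length)
{
    assert(output_length != NULL);
    *output_length = 4 * ((input_length + 2) / 3);
    char *encoded_data = malloc(*output_length + 1);  /* +1 for '\0' */
    if (encoded_data == NULL)
        return NULL;
    size_t i = 0, j = 0;
    while (i < input_length) {
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;
        uint32_t triple = (octet_a << 16) | (octet_b << 8) | octet_c;
        encoded_data[j++] = encoding_table[(triple >> 18) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 12) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 6) & 0x3F];
        encoded_data[j++] = encoding_table[triple & 0x3F];
    }
    for (int k = 0; k < mod_table[input_length % 3]; k++)
        encoded_data[*output_length - 1 - k] = '=';
    encoded_data[*output_length] = '\0';  /* makes it safe for fputs */
    return encoded_data;
}

/* Encode one chunk, write it as a line, and release the buffer. */
static int write_base64_line(FILE *out, const unsigned char *data, size_t len)
{
    size_t out_len;
    char *enc = base64_encode_cstr(data, len, &out_len);
    if (enc == NULL)
        return -1;
    int rc = (fputs(enc, out) == EOF || fputc('\n', out) == EOF) ? -1 : 0;
    free(enc);
    return rc;
}
```

In the read loop, write_base64_line(targetfile, buffer, got) would replace the two fputs calls.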
