Get the character dominant from a string - c

As the title says, I am trying to write a function that returns the character that dominates (occurs most often) in a string. I might be able to figure it out, but something seems to be wrong with my logic and I failed at it. If someone can come up with a working version I will be extremely glad, thank you.
I say "in a string" to keep it simple. I am actually doing this on buffered data containing a BMP image, trying to output the base color (the dominant pixel).
What I have for now is this unfinished function I started:
RGB
bitfox_get_primecolor_direct
(char *FILE_NAME)
{
dword size = bmp_dgets(FILE_NAME, byte);
FILE* fp = fopen(convert(FILE_NAME), "r");
BYTE *PIX_ARRAY = malloc(size-54+1), *PIX_CUR = calloc(sizeof(RGB), sizeof(BYTE));
dword readed, i, l;
RGB color, prime_color;
fseek(fp, 54, SEEK_SET); readed = fread(PIX_ARRAY, 1, size-54, fp);
for(i = 54; i<size-54; i+=3)
{
color = bitfox_pixel_init(PIXEL_ARRAY[i], PIXEL_ARRAY[i+1], PIXEL_ARRAY[i+2);
memmove(PIX_CUR, color, sizeof(RGB));
for(l = 54; l<size-54; l+=3)
{
if (PIX_CUR[2] == PIXEL_ARRAY[l] && PIX_CUR[1] == PIXEL_ARRAY[l+1] &&
PIX_CUR[0] == PIXEL_ARRAY[l+2])
{
}
Note that RGB is a struct containing 3 bytes (R, G and B).
I know that's not much, but that's all I have for now.
Is there any way I can finish this?

If you want this done fast, throw a stack of RAM at it (if available, of course). You can use a large direct-lookup table, packing each RGB trio into a 24-bit index into a contiguous array of counters. In partial pseudocode, partial code, something like this:
// create a zero-filled 2^24 array of unsigned counters.
uint32_t *counts = calloc(256*256*256, sizeof(*counts));
uint32_t max_count = 0; // highest count seen so far
uint32_t max_idx = 0;   // packed RGB index that reached that count
// enumerate your buffer of RGB values, three bytes at a time:
unsigned char rgb[3];
while (getNextRGB(src, rgb)) // returns false when no more data.
{
    uint32_t idx = (((uint32_t)rgb[0]) << 16) | (((uint32_t)rgb[1]) << 8) | (uint32_t)rgb[2];
    if (++counts[idx] > max_count)
    {
        max_count = counts[idx];
        max_idx = idx;
    }
}
// unpack the winning index back into its three components
R = (max_idx >> 16) & 0xFF;
G = (max_idx >> 8) & 0xFF;
B = max_idx & 0xFF;
// free when you have no more images to process. for each new
// image you can memset the buffer to zero and reset the max
// for a fresh start.
free(counts);
That's it. If you can afford to throw a big hunk of memory at this (it would be 64 MB in this case, at 4 bytes per entry and 16.7M entries), then performing this becomes O(N). If you have a succession of images to process, you can simply memset() the array back to zeros, reset the maximum, and repeat for each additional file. Finally, don't forget to free your memory when finished.
Best of luck.
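For reference, the counting idea above could be packaged as a single function along these lines. This is only a sketch under assumptions: the name dominant_color and its signature are made up for illustration, the pixel data is assumed to be a tightly packed run of 3-byte pixels with no row padding, and the three bytes are treated as opaque (for BMP data they would come out in stored order, typically B, G, R).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: finds the most frequent 3-byte pixel in `pixels`.
 * Returns 0 on success, -1 on empty input or allocation failure.
 * On a tie, the color that reached the top count first wins. */
int dominant_color(const unsigned char *pixels, size_t n_pixels,
                   unsigned char out[3])
{
    assert(pixels != NULL && out != NULL);
    if (n_pixels == 0)
        return -1;

    /* One zero-initialized counter per possible 24-bit value (about 64 MB). */
    uint32_t *counts = calloc((size_t)1 << 24, sizeof *counts);
    if (counts == NULL)
        return -1;

    uint32_t max_count = 0; /* highest count seen so far */
    uint32_t max_idx = 0;   /* packed pixel value that reached it */

    for (size_t i = 0; i < n_pixels; i++) {
        const unsigned char *p = pixels + 3 * i;
        uint32_t idx = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
        if (++counts[idx] > max_count) {
            max_count = counts[idx];
            max_idx = idx;
        }
    }
    free(counts);

    /* Unpack the winning index in the same byte order it was packed. */
    out[0] = (unsigned char)((max_idx >> 16) & 0xFF);
    out[1] = (unsigned char)((max_idx >> 8) & 0xFF);
    out[2] = (unsigned char)(max_idx & 0xFF);
    return 0;
}
```

Keeping the count and the index in separate variables is the important detail: the comparison is made on counts, while the result is decoded from the index.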

Related

About one line in an implementation of MD5

I'm confused by one line of code in an implementation of MD5,
void MD5_Update(MD5_CTX *ctx, const void *data, unsigned long size)
{
MD5_u32plus saved_lo;
unsigned long used, available;
saved_lo = ctx->lo;
if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo)
ctx->hi++;
ctx->hi += size >> 29;
used = saved_lo & 0x3f;
if (used)
{
available = 64 - used;
if (size < available)
{
memcpy(&ctx->buffer[used], data, size);
return;
}
memcpy(&ctx->buffer[used], data, available);
data = (const unsigned char *)data + available;
size -= available;
body(ctx, ctx->buffer, 64);
}
if (size >= 64)
{
data = body(ctx, data, size & ~(unsigned long)0x3f);
size &= 0x3f;
}
memcpy(ctx->buffer, data, size);
}
The line in question is if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo). It seems that 'size' counts bytes, but 'ctx->lo' and 'saved_lo' count bits. Why add them together? There is similar code on GitHub, and some projects use this code as well. Can anyone give an explanation?
The remarks about "bit counters" are likely misleading - ctx->hi and ctx->lo count bytes, just like size does.
You correctly notice that you're just adding size (bytes) to ctx->lo (and then checking for overflow/propagating overflow into ctx->hi). The overflow check is pretty simple - lo is used as a 29-bit integer, and if the result after adding/masking is less than the original value, then overflow occurred.
The checks around used are also evidence for ctx->lo and ctx->hi being byte counters -- body processes data 64 bytes at a time, and the lo counter is ANDed with 0x3F (i.e. 63).

Determine if a message is too long to embed in an image

I created a program that embeds a message in a PPM file by changing the last bit of each byte in the file. The problem I have right now is that I don't know whether I am correctly checking if a message is too long. Here's what I've got so far:
int hide_message(const char *input_file_name, const char *message, const char *output_file_name)
{
unsigned char * data;
int n;
int width;
int height;
int max_color;
//n = 3 * width * height;
int code = load_ppm_image(input_file_name, &data, &n, &width, &height, &max_color);
if (code)
{
// return the appropriate error message if the image doesn't load correctly
return code;
}
int len_message;
int count = 0;
unsigned char letter;
// get the length of the message to be hidden
len_message = (int)strlen(message);
if (len_message > n/3)
{
fprintf(stderr, "The message is longer than the image can support\n");
return 4;
}
for(int j = 0; j < len_message; j++)
{
letter = message[j];
int mask = 0x80;
// loop through each byte
for(int k = 0; k < 8; k++)
{
if((letter & mask) == 0)
{
//set right most bit to 0
data[count] = 0xfe & data[count];
}
else
{
//set right most bit to 1
data[count] = 0x01 | data[count];
}
// shift the mask
mask = mask>>1 ;
count++;
}
}
// create the null character at the end of the message (00000000)
for(int b = 0; b < 8; b++){
data[count] = 0xfe & data[count];
count++;
}
// write a new image file with the message hidden in it
int code2 = write_ppm_image(output_file_name, data, n, width, height, max_color);
if (code2)
{
// return the appropriate error message if the image doesn't load correctly
return code2;
}
return 0;
}
So I'm checking to see if the length of the message (len_message) is longer than n/3, which is the same thing as width*height. Does that seem correct?
The check you're currently doing tests whether the message has more bytes than the image has pixels. Because you're only using 1 bit per pixel to encode the message, you need to check whether the message has more bits than the image has pixels.
So you need to do this:
if (len_message*8 > n/3)
In addition to @dbush's remarks about checking the number of bits in your message, you appear not to be accounting for all the bytes available to you in the image. Normal ("raw", P6-format) PPM images use three color samples per pixel, at either 8 or 16 bits per sample. Thus, the image contains at least 3 * width * height bytes of color data, and maybe as many as 6 * width * height.
On the other hand, the point of steganography is to make the presence of a hidden message difficult to detect. In service to that objective, if you have a PPM with 16 bits per sample then you probably want to avoid modifying the more-significant bytes of the samples. Or if you don't care about that, then you might as well use the whole low-order byte of each sample in that case.
Additionally, PPM files record the maximum possible value of any sample, which does not need to be the same as the maximum value of the underlying type. It is possible for your technique to change the actual maximum value to be greater than the recorded maximum, and if you do not then change the maximum-value field as well then the inconsistency could be a tip-off that the file has been tampered with.
Furthermore, raw PPM format affords the possibility of multiple images of the same size in one file. The file header does not express how many there are, so you have to look at the file size to tell. You can use the bytes of every image in the file to hide your message.
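Putting the capacity remarks together, a check-then-embed routine might look like the sketch below. It makes several assumptions that are not stated in the thread: the function name embed_message is hypothetical, every one of the n data bytes carries one message bit (8 bits per sample), and the 8-bit zero terminator written by the question's code is counted against the capacity, so the requirement is (length + 1) * 8 <= n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: hides `message` plus its '\0' terminator in the
 * least-significant bits of `data`, most-significant message bit first.
 * Returns 0 on success, -1 if the carrier is too small. */
int embed_message(unsigned char *data, size_t n, const char *message)
{
    assert(data != NULL && message != NULL);

    size_t len = strlen(message);
    /* Each carrier byte holds one bit; guard the multiplication too. */
    if (len >= SIZE_MAX / 8 || (len + 1) * 8 > n)
        return -1;

    size_t count = 0;
    for (size_t j = 0; j <= len; j++) { /* j == len writes the terminator */
        unsigned char letter = (unsigned char)message[j];
        for (int k = 7; k >= 0; k--) {
            unsigned bit = (letter >> k) & 1u;
            data[count] = (unsigned char)((data[count] & 0xFEu) | bit);
            count++;
        }
    }
    return 0;
}
```

With this layout a one-character message needs 16 carrier bytes: 8 for the character and 8 for the terminator.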

How to optimize C for loop for font rendering on oled display

I need to optimize this function. Is there any trick to speed up the for loops? (I don't think an early break is possible.)
void SeeedGrayOLED::putChar(unsigned char C)
{
if(C < 32 || C > 127) //Ignore non-printable ASCII characters. This can be modified for multilingual font.
{
C=' '; //Space
}
uint8_t k,offset = 0;
char bit1,bit2,c = 0;
for(char i=0;i<16;i++)
{
for(char j=0;j<32;j+=2)
{
if(i>8){
k=i-8;
offset = 1;
}else{
k=i;
}
// Character is constructed two pixel at a time using vertical mode from the default 8x8 font
c=0x00;
bit1=(pgm_read_byte(&hallfetica_normal[C-32][j+offset]) >> (8-k)) & 0x01;
bit2=(pgm_read_byte(&hallfetica_normal[C-32][j+offset]) >> ((8-k)-1)) & 0x01;
// Each bit is changed to a nibble
c|=(bit1)?grayH:0x00;
c|=(bit2)?grayL:0x00;
sendData(c);
}
}
}
The font is in the array hallfetica_normal, which is an array of arrays of uint8_t. Maybe it could be compressed or something like that?
This code runs on an Arduino, and I have to run a countdown from 500 to 0, decrementing by one every 10 to 20 ms.
EDIT
This is the new code after your suggestions, thanks all:
I'm also looking at organising the font differently to allow fewer calls to pgm_read_byte (something like changing the orientation, I wonder).
void SeeedGrayOLED::putChar(unsigned char C)
{
if(C < 32 || C > 127) //Ignore non-printable ASCII characters. This can be modified for multilingual font.
{
C=' '; //Space
}
char c,byte = 0x00;
unsigned char nibble_lookup[] = { 0, grayL, grayH, grayH | grayL };
for(int ii=0;ii<2;ii++){
for(int i=0;i<8;i++)
{
for(int j=0;j<32;j+=2)
{
byte = pgm_read_byte(&hallfetica_normal[C-32][j+ii]);
c = nibble_lookup[(byte >> (8-i)) & 3];
sendData(c);
}
}
}
}
Well, you seem to be reading the same byte twice in a row unnecessarily via pgm_read_byte(&hallfetica_normal[C-32][j+offset]). You could load that once into a local variable.
Additionally, you could avoid the if(i>8){ check per iteration by breaking up the code into two loops; one where i goes from 0 to 8 and another where it goes from 9 to 15. (Although I suspect you really intended >= here, making the loop boundaries 0-7 then 8-15.) That also means things like offset become constant values, which will help.
In an effort to make the inner loop as fast as possible, I'd try to get rid of all branching with a lookup table and see whether that helped.
First, I'd define the lookup table outside the loop:
/* outside the loop */
unsigned char h_lookup[] = { 0, grayH };
unsigned char l_lookup[] = { 0, grayL };
Then inside the loop, since you're testing the least-significant bit, you can use that as an index into the lookup table. If it's clear, then the lookup index will be 0. If it's set, then the lookup index will be 1:
/* inside the loop */
byte = pgm_read_byte(&hallfetica_normal[C-32][j+offset]);
c = h_lookup[((byte >> (8-k)) & 0x01)] |
l_lookup[((byte >> (8-k-1)) & 0x01)];
sendData(c);
Since you're masking and testing 2 adjacent bits, 8-k and 8-k-1, you could list all 4 possibilities in a single lookup table:
/* Outside loop */
unsigned char nibble_lookup[] = { 0, grayL, grayH, grayH | grayL };
And then the lookup becomes dramatically simplified.
/* loop */
byte = pgm_read_byte(&hallfetica_normal[C-32][j+offset]);
c = nibble_lookup[(byte >> (8-k-1)) & 3]; /* bits 8-k and 8-k-1 form the 2-bit index */
sendData(c);
The other answer has addressed what to do about the branches in the top part of your inner loop.
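To make the combined table concrete, here is a small host-side sketch that compares a branching version with the table version. Both function names are hypothetical, pgm_read_byte is left out so the code runs anywhere, and `shift` is defined here as the position of the lower bit of the pair (0 to 6), so the pair consists of bits shift+1 and shift.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: test the two adjacent bits separately. */
uint8_t two_pixels_branchy(uint8_t byte, unsigned shift,
                           uint8_t grayH, uint8_t grayL)
{
    assert(shift <= 6);
    uint8_t c = 0;
    if ((byte >> (shift + 1)) & 0x01) /* upper bit selects the high nibble */
        c |= grayH;
    if ((byte >> shift) & 0x01)       /* lower bit selects the low nibble */
        c |= grayL;
    return c;
}

/* Table version: the two bits together index a 4-entry table laid out
 * as { 0, grayL, grayH, grayH | grayL }. */
uint8_t two_pixels_lookup(uint8_t byte, unsigned shift,
                          const uint8_t table[4])
{
    assert(shift <= 6);
    return table[(byte >> shift) & 3];
}
```

The table order matters: index 1 means only the lower bit is set (grayL) and index 2 means only the upper bit is set (grayH). Whether this is actually faster on a given microcontroller would need measuring.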

Using bzip2 low-level routines to compress chunks of data

The Overview
I am using the low-level calls in the libbzip2 library: BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd() to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest — my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream, compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
The Problem
The issue is that my bz_stream structure instance, which keeps track of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;
/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;
int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);
/* bzError checking... */
do
{
/* read some bytes into myBuffer... */
/* compress bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
/* error checking... */
bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
cumulativeBytesWritten += bzBytesWritten;
/* write compressed data in bzBuffer to standard output */
fwrite(bzBuffer, 1, bzBytesWritten, stdout);
fflush(stdout);
}
while (bzError == BZ_OK);
}
while (/* while there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read in the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`... */
/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
/* bzError error checking... */
/* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
cumulativeBytesWritten += bzBytesWritten;
/* write compressed data in bzBuffer to standard output */
fwrite(bzBuffer, 1, bzBytesWritten, stdout);
fflush(stdout);
}
while (bzError != BZ_STREAM_END);
/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);
/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that?
I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Alternatively, am I not understanding the correct use of the bz_stream state flags?
For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data so long as at least some bytes are available to access from the bzStream.next_in pointer (BZ_RUN), and then wrap up the stream when there are no more bytes available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
Or, am I not using these low-level calls correctly at all? Should I go back to using the higher-level calls to continuously append a grouping of compressed chunks of data to one main file?
There's probably a simple solution to this, but I've been banging my head on the table for a couple days in the course of debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
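In answer to my own question, it appears I was miscalculating the number of bytes written. I should not use the total_out_* members, which hold the running total for the whole stream rather than the bytes produced by the latest call. The following correction works properly: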
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.
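The difference between the two ways of counting can be shown with a toy model that does not call libbzip2 at all. Everything here is hypothetical: stream_model only imitates the avail_out and total_out bookkeeping of a bz_stream, and model_compress pretends that a call produced a given number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bz_stream fields that matter here. */
typedef struct {
    size_t avail_out; /* space left in the output buffer after a call */
    size_t total_out; /* running total of bytes produced by the stream */
} stream_model;

/* Pretend compress call that fills `produced` bytes of the buffer. */
void model_compress(stream_model *s, size_t capacity, size_t produced)
{
    assert(s != NULL && produced <= capacity);
    s->avail_out = capacity - produced;
    s->total_out += produced;
}

/* Correct accounting: buffer capacity minus the space left, per call. */
size_t sum_per_call(const size_t *produced, size_t n, size_t capacity)
{
    stream_model s = {0, 0};
    size_t cumulative = 0;
    for (size_t i = 0; i < n; i++) {
        model_compress(&s, capacity, produced[i]);
        cumulative += capacity - s.avail_out;
    }
    return cumulative;
}

/* Accounting from the question: adds the running total on every call. */
size_t sum_running_totals(const size_t *produced, size_t n, size_t capacity)
{
    stream_model s = {0, 0};
    size_t cumulative = 0;
    for (size_t i = 0; i < n; i++) {
        model_compress(&s, capacity, produced[i]);
        cumulative += s.total_out;
    }
    return cumulative;
}
```

For three calls producing 100, 50 and 25 bytes, the per-call sum is 175 while the sum of running totals is 425, which matches the symptom of a reported size larger than the file.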

hunting for a particular pair of bits '10' or '01' in a character array

This may be a slightly theoretical question. I have a char array of bytes containing network packets. I want to check for the occurrence of a particular pair of bits ('01' or '10') every 66 bits. That is to say, once I locate the first pair of bits, I have to skip 66 bits and check for the presence of the same pair of bits again. I am trying to implement a program with masks and shifts and it is getting complicated. I want to know if someone can suggest a better way to do the same thing.
The code I have written so far looks something like this. It is not complete though.
test_sync_bits(char *rec, int len)
{
uint8_t target_byte = 0;
int offset = 0;
int save_offset = 0;
uint8_t *pload = (uint8_t*)(rec + 24);
uint8_t seed_mask = 0xc0;
uint8_t seed_shift = 6;
uint8_t value = 0;
uint8_t found_sync = 0;
const uint8_t sync_bit_spacing = 66;
/*hunt for the first '10' or '01' combination.*/
target_byte = *(uint8_t*)(pload + offset);
/*Get all combinations of two bits from target byte.*/
while(seed_shift)
{
value = ((target_byte & seed_mask) >> seed_shift);
if((value == 0x01) || (value == 0x10))
{
save_offset = offset;
found_sync = 1;
break;
}
else
{
seed_mask = (seed_mask >> 2) ;
seed_shift-=2;
}
}
offset = offset + 8;
seed_shift = (seed_shift - 4) > 0 ? (seed_shift - 4) : (seed_shift + 8 - 4);
seed_mask = (seed_mask >> (6 - seed_shift));
}
Another idea I came up with was to use a structure defined below
typedef struct
{
int remainder_bits;
int extra_bits;
int extra_byte;
}remainder_bits_extra_bits_map_t;
static remainder_bits_extra_bits_map_t sync_bit_check [] =
{
{6, 4, 0},
{5, 5, 0},
{4, 6, 0},
{3, 7, 0},
{2, 8, 0},
{1, 1, 1},
{0, 2, 1},
};
Is my approach correct? Can anyone suggest any improvements for the same?
Lookup Table Idea
There are only 256 possible bytes. That is few enough that you can construct a lookup table of all the possible bit combinations that can happen in one byte.
The lookup table value could record the bit position of the pattern and it could also have special values that mark possible continuation start or continuation finish values.
Edit:
I decided that continuation values would be silly. Instead, to check for a pattern that straddles a byte boundary, shift the byte and OR in the bit from the other byte, or manually check the end bits at each byte. For example, with unsigned bytes, ((bytes[i] & 0x01) << 1) | (bytes[i+1] >> 7) evaluates to 2 for a '10' pair and to 1 for a '01' pair split across bytes i and i+1.
You didn't say, so I am also assuming that you are looking for the first match in any byte. If you are looking for every match and then checking for the end pattern at +66 bits, that's a different problem.
To create the lookup table, I would write a program to do it for me. It could be in your favorite script language or it could be in C. The program would write a file that looked something like:
/* each value is the bit position of a possible pattern OR'd with a pattern ID bit. */
/* 0 is no match */
#define P_01 0x00
#define P_10 0x10
const char byte_lookup[256] = {
/* 0: 0000_0000, 0000_0001, 0000_0010, 0000_0011 */
0, 2|P_01, 3|P_01, 3|P_01,
/* 4: 0000_0100, 0000_0101, 0000_0110, 0000_0111, */
4|P_01, 4|P_01, 4|P_01, 4|P_01,
/* 8: 0000_1000, 0000_1001, 0000_1010, 0000_1011, */
5|P_01, 5|P_01, 5|P_01, 5|P_01,
};
Tedious. That's why I would write a program to write it for me.
This is a variation of the classic de-blocking problem that often comes up when reading from a stream. That is, data comes in discrete units that don't match up to the unit size that you wish to scan. The challenges in this are 1) buffering (which doesn't affect you because you have access to the whole array) and 2) managing all of the state (as you found out). A good approach is to write a consumer function that acts something like fread() and fseek() which maintains its own state. It returns the requested data you're interested in, aligned properly to the buffers you give it.
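A consumer of that kind might look like the following sketch. All names are hypothetical, bits are numbered from the most significant bit of the first byte, and the 66-bit spacing is assumed to be measured from the start of one pair to the start of the next, which may not be what the question intends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: a byte buffer plus a cursor in bits. */
typedef struct {
    const uint8_t *buf;
    size_t len_bits;
    size_t pos_bits;
} bit_reader;

/* Read n bits (1..8), MSB first, and advance. Returns -1 at end of data. */
int read_bits(bit_reader *r, unsigned n)
{
    assert(r != NULL && r->buf != NULL && n >= 1 && n <= 8);
    if (n > r->len_bits - r->pos_bits)
        return -1;
    int value = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t p = r->pos_bits++;
        int bit = (r->buf[p / 8] >> (7 - p % 8)) & 1;
        value = (value << 1) | bit;
    }
    return value;
}

/* Advance the cursor by n bits. Returns -1 if that would pass the end. */
int skip_bits(bit_reader *r, size_t n)
{
    assert(r != NULL);
    if (n > r->len_bits - r->pos_bits)
        return -1;
    r->pos_bits += n;
    return 0;
}

/* Count consecutive '01' or '10' pairs spaced `spacing` bits apart. */
size_t count_sync_pairs(const uint8_t *buf, size_t len_bits,
                        size_t start_bit, size_t spacing)
{
    assert(buf != NULL && spacing >= 2);
    bit_reader r = { buf, len_bits, 0 };
    size_t found = 0;
    if (skip_bits(&r, start_bit) != 0)
        return 0;
    for (;;) {
        int pair = read_bits(&r, 2);
        if (pair != 1 && pair != 2) /* 1 is '01', 2 is '10' */
            break;
        found++;
        if (skip_bits(&r, spacing - 2) != 0)
            break;
    }
    return found;
}
```

Because the reader owns the cursor, the caller never has to compute byte offsets or masks by hand; hunting for the first pair becomes a loop over start_bit values.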

Resources