Calculate the size of a Base 64 encoded message - c

I have a binary string that I am encoding in Base 64. I need to know beforehand what the size of the final Base 64 encoded string will be.
Is there any way to calculate that?
Something like:
BinaryStringSize is 64Kb
EncodedBinaryStringSize will be 127Kb after encoding.
Oh, the code is in C.
Thanks.

If you do Base64 exactly right, and that includes padding the end with = characters, and you break it up with a CR LF every 72 characters, the answer can be found with:
code_size = ((input_size * 4) / 3);
padding_size = (input_size % 3) ? (3 - (input_size % 3)) : 0;
crlfs_size = 2 + (2 * (code_size + padding_size) / 72);
total_size = code_size + padding_size + crlfs_size;
In C, you may also terminate with a \0-byte, so there'll be an extra byte there, and you may want to length-check at the end of every code as you write them, so if you're just looking for what you pass to malloc(), you might actually prefer a version that wastes a few bytes, in order to make the coding simpler:
output_size = ((input_size * 4) / 3) + (input_size / 96) + 6;

geocar's answer was close, but could sometimes be off slightly.
There are 4 bytes output for every 3 bytes of input. If the input size is not a multiple of three, we must round it up to the next multiple of three. Otherwise leave it alone.
input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0)
Divide this by 3, then multiply by 4. That is our total output size, including padding.
code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4
As I said in my comment, the total size must be divided by the line width before doubling to properly account for the last line. Otherwise the number of CRLF characters will be overestimated. I am also assuming there will only be a CRLF pair if the line is 72 characters. This includes the last line, but not if it is under 72 characters.
newline_size = ((code_padded_size) / 72) * 2
So put it all together:
unsigned int code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;
unsigned int total_size = code_padded_size + newline_size;
Or to make it a bit more readable:
unsigned int adjustment = ( (input_size % 3) ? (3 - (input_size % 3)) : 0);
unsigned int code_padded_size = ( (input_size + adjustment) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;
unsigned int total_size = code_padded_size + newline_size;
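The calculation above can be wrapped in two small functions. This is only a sketch: the names b64_padded_size and b64_total_size are made up for illustration, the line width is passed in as a parameter (assumed non-zero), and a CRLF is only counted for completely filled lines, as assumed in this answer. Note that (input_size + 2) / 3 gives the same result as adding the adjustment and then dividing by 3. A terminating \0 is not included.

```c
#include <stddef.h>

/* Padded Base64 length: 4 output characters per started group of 3 input
 * bytes. (input_size + 2) / 3 rounds the number of groups up. */
size_t b64_padded_size(size_t input_size)
{
    return ((input_size + 2) / 3) * 4;
}

/* Padded length plus one CRLF (2 bytes) for every completely filled line of
 * line_width characters. line_width is assumed to be non-zero. */
size_t b64_total_size(size_t input_size, size_t line_width)
{
    size_t code_padded_size = b64_padded_size(input_size);
    size_t newline_size = (code_padded_size / line_width) * 2;
    return code_padded_size + newline_size;
}
```

For a malloc() call you would still add 1 to the result if the output is going to be a \0-terminated string.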

Here is a simple C implementation (without modulus and ternary operators) for raw base64 encoded size (with standard '=' padding):
int output_size;
output_size = ((input_size - 1) / 3) * 4 + 4;
To that you will need to add any additional overhead for CRLF if required. The standard base64 encoding (RFC 3548 or RFC 4648) allows CRLF line breaks (at either 64 or 76 characters) but does not require them. The MIME variant (RFC 2045) limits encoded lines to at most 76 characters, so a line break is needed at least every 76 characters.
For example, the total encoded length using 76 character lines building on the above:
int final_size;
final_size = output_size + (output_size / 76) * 2;
See the base64 wikipedia entry for more variants.

Check out the b64 library. The function b64_encode2() can give a maximum estimate of the required size if you pass NULL, so you can allocate memory with certainty, and then call again passing the buffer and have it do the conversion.

I ran into a similar situation in Python, and using codecs.iterencode(text, "base64") the correct calculation was (with // so the divisions stay integer divisions in Python 3 as well):
adjustment = 3 - (input_size % 3) if (input_size % 3) else 0
code_padded_size = ( (input_size + adjustment) // 3) * 4
newline_size = ((code_padded_size) // 76) * 1
return code_padded_size + newline_size

Base 64 transforms 3 bytes into 4.
If your set of bits does not happen to be a multiple of 24 bits, you must pad it out so that it has a multiple of 24 bits (3 bytes).

I think this formula should work:
b64len = (size * 8 + 5) / 6
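As far as I can tell, this formula is the rounded-up value of size * 8 / 6, i.e. the number of Base64 characters when no '=' padding is written. A small sketch that puts it next to the padded length (the function names are invented for this example):

```c
#include <stddef.h>

/* Characters needed when no '=' padding is written: ceil(size * 8 / 6). */
size_t b64_unpadded_len(size_t size)
{
    return (size * 8 + 5) / 6;
}

/* Round the unpadded count up to a multiple of 4 to account for the
 * '=' padding characters. */
size_t b64_padded_len(size_t size)
{
    return (b64_unpadded_len(size) + 3) / 4 * 4;
}
```

So for 1 input byte the unpadded length is 2 and the padded length is 4. The two only agree when the input size is a multiple of 3.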

if (inputSize == 0) return 0;
int size = ((inputSize - 1) / 3) * 4 + 4;
int nlines = (size - 1)/ maxLine + 1;
return size + nlines * 2;
Here maxLine is the maximum line length (76 for MIME, RFC 2045). This formula counts one CRLF for every line, including the last one, whether or not the last line is completely filled.

The actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be a lot higher because of the overhead of the headers. Very roughly, the final size of base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers).
In other words, you can approximate the size of the decoded data with this formula:
BytesNeededForEncoding = (string_length(base_string) * 1.37) + 814;
BytesNeededForDecoding = (string_length(encoded_string) - 814) / 1.37;
Source: http://en.wikipedia.org/wiki/Base64

Related

C math function with _BYTE and

I used IDA to decompile a function in a program and I don't know what exactly this code does.
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
How does this work?
This could be used for XOR-"decrypting" (or encrypting; the operation is symmetric) a buffer with a 4-byte key. See the following code, which might be a bit more readable than the decompiler output:
char flag[SIZE];
char key[4];
for (int i = 0; i < SIZE; i++) {
flag[i] = flag[i] ^ key[i%4];
}
So if your data is "ZGUIHUOHJIOJOPIJMXAR" and your key is "akey", then the snippet basically does
ZGUIHUOHJIOJOPIJMXA
^ akeyakeyakeyakeyake
=====================
yourplaintextresult (<- not really the result here, but you get the idea)
(_BYTE *)&v2
This takes the address of v2 and treats it as a pointer to a byte (_BYTE).
(signed int)i % 4
This is the remainder of the integer i divided by 4 (i is probably a loop counter).
(_BYTE *)&v2 + (signed int)i % 4
This advances that byte pointer by (i % 4) bytes, so it points at one of the first 4 bytes starting at v2.
*((_BYTE *)&v2 + (signed int)i % 4)
This dereferences the pointer, i.e. it reads the byte in memory at address (&v2 + i % 4).
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
This XORs the i-th element of the flag array with that byte and stores the result back in flag[i].
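To tie the pieces together, here is a minimal sketch of what the decompiled line probably does inside a loop. It assumes v2 is a 32-bit integer holding the key and that _BYTE is an unsigned 8-bit type. The function name and parameters are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of buf with the bytes of a 32-bit key, cycling through the
 * key's four bytes in memory order (so the result depends on endianness). */
void xor_with_key(unsigned char *buf, size_t len, uint32_t key)
{
    /* View the key as 4 individual bytes, like (_BYTE *)&v2 does. */
    const unsigned char *key_bytes = (const unsigned char *)&key;

    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key_bytes[i % 4];   /* same as *((_BYTE *)&v2 + i % 4) */
    }
}
```

Calling it twice with the same key gives back the original buffer, which is why the same code serves for both "encrypting" and "decrypting".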

Circular buffer increment using alternate method

I am not able to understand how the last statement increments the pointer. Can somebody explain it to me with a few examples?
The code, as shown:
aptr = (aptr + 1) & (void *)(BUFFERSIZE - 1);
// |________| incremented here
Since it is a circular buffer AND the buffer size is a power of 2, then the & is an easy and fast way to roll over by simply masking. Assuming that the BUFFERSIZE is 256, then:
num & (256 - 1) == num % 256
num & (0x100 - 1) == num % 0x100
num & (0x0ff) == num % 0x100
When the number is not a power of 2, then you can't use the masking technique:
num & (257 - 1) != num % 257
num & (0x101 - 1) != num % 0x101
num & 0x100 != num % 0x101
The (void *) allows the compiler to choose an appropriate width for the BUFFERSIZE constant based on your pointer width... although it is generally best to know - and use! - the width before a statement like this.
I added the hex notation to make it clearer why the & results in an emulated rollover event. Note that 0xff is binary 11111111, so the AND operation is simply masking off the upper bits.
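The rollover can be checked with a small sketch. Note that it works on an index rather than on the pointer itself, which is a deliberate swap from the statement in the question. The function names are invented for the example, and BUFFERSIZE is assumed to be 256.

```c
#include <stddef.h>

#define BUFFERSIZE 256u   /* must be a power of two for the mask to work */

/* Advance a ring-buffer index by one, wrapping with a bit mask. */
size_t next_index_mask(size_t index)
{
    return (index + 1) & (BUFFERSIZE - 1);
}

/* Same step using the remainder operator; works for any buffer size. */
size_t next_index_mod(size_t index)
{
    return (index + 1) % BUFFERSIZE;
}
```

Both functions return 1 for index 0 and wrap around to 0 for index 255.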
2 problems with this approach.
A) Using a pointer with a bit-wise operation is not portable code. (@Ilja Everilä)
char *aptr;
// error: invalid operands to binary & (have 'char *' and 'void *')
// The following increments the index: (not really)
// aptr = (aptr + 1) & (void *)(BUFFERSIZE-1);
B) With compilers that support the non-standard math on a void * akin to a char *, the math is wrong if aptr points to an object wider than char and BUFFERSIZE is the number of elements in the buffer and not the byte-size. Of course this depends on how the non-standard compiler implements some_type * & void *. Why bother to unnecessarily code to use some implementation specific behavior?
Instead use i % BUFFERSIZE. This portable approach works when BUFFERSIZE is a power-of-2 as well as when it is not. When a compiler sees i % power-of-2 and i is some unsigned type, then the same code is certainly emitted as i & (power-of-2 - 1).
For compilers that do not recognize this optimization, then one should consider a better compiler.
#include <stddef.h> /* size_t */
#define BUFFERSIZE 256
int main(void) {
char buf[BUFFERSIZE];
// pointer solution
char *aptr = buf;
aptr = &buf[(aptr - buf + 1) % BUFFERSIZE];
// index solution
size_t index = 0;
index = (index + 1) % BUFFERSIZE;
}

Formula for memory alignment

While browsing through some kernel code, I found a formula for memory alignment as
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
So I even wrote a program for this:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
long long operand;
long long alignment;
if(argv[1]) {
operand = atol(argv[1]);
} else {
printf("Enter value to be aligned!\n");
return -1;
}
if(argv[2]) {
alignment = strtol(argv[2],NULL,16);
} else {
printf("\nDefaulting to 1MB alignment\n");
alignment = 0x100000;
}
long long aligned = ((operand + (alignment - 1)) & ~(alignment - 1));
printf("Aligned memory is: 0x%.8llx [Hex] <--> %lld\n",aligned,aligned);
return 0;
}
But I don't get this logic at all. How does this work?
Basically, the formula rounds an integer operand (an address) up to the next address that is a multiple of alignment.
The expression
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
is basically the same as this formula, which is a bit easier to understand:
aligned = int((operand + (alignment - 1)) / alignment) * alignment
For example, having operand (address) 102 and alignment 10 we get:
aligned = int((102 + 9) / 10) * 10
aligned = int(111 / 10) * 10
aligned = 11 * 10
aligned = 110
First we add 9 to the address and get 111. Then, since our alignment is 10, we basically zero out the last digit, i.e. 111 / 10 * 10 = 110 (using integer division).
Please note that for each power-of-10 alignment (i.e. 10, 100, 1000 etc.) we are basically zeroing out the last digits.
On most CPUs, division and multiplication operations take much more time than bitwise operations, so let us get back to the original formula:
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
The second part of the formula makes sense only when alignment is a power of 2. For example:
... & ~(2 - 1) will zero the last bit of the address.
... & ~(64 - 1) will zero the last 6 bits of the address.
etc.
Just as we zero out the last few digits of an address for power-of-10 alignments, we zero out the last few bits for power-of-2 alignments.
Hope it makes sense to you now.
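As a small sketch of the formula in use: the function names here are invented, and 64-bit unsigned values are assumed for both the address and the alignment. Since the mask trick only works for power-of-2 alignments, a helper to check for that is included.

```c
#include <stdint.h>

/* Nonzero if x is a power of two (exactly one bit set). */
int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round operand up to the next multiple of alignment.
 * Only valid when alignment is a power of two. */
uint64_t align_up(uint64_t operand, uint64_t alignment)
{
    return (operand + (alignment - 1)) & ~(alignment - 1);
}
```

For example, align_up(102, 16) is 112, and a value that is already aligned, such as align_up(64, 64), stays at 64.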

lpc2148 RTC print issue

void DateTimeConversion (void)
{
unsigned char TempDay,TempMonth,TempYear,i;
unsigned char TempHour,TempMinute,TempSecond;
TempDay=0;TempMonth=0;TempYear=0;TempHour=0;TempMinute=0;TempSecond=0;
col=1;
for(i=0;i<10;i++)
{
sprintf(MyStr,"%c",(unsigned int)StoreUserID[i]);
ClcdGoto(col,2);ClcdPutS_P(MyStr);
col++;
}
TempDay=((unsigned int)(StoreUserID[6] * 10) + (unsigned int)(StoreUserID[7] * 1));
TempMonth=((unsigned int)(StoreUserID[8] * 10) + (unsigned int)(StoreUserID[9] * 1));
TempYear=((unsigned int)(StoreUserID[4] * 10) + (unsigned int)(StoreUserID[5] * 1));
TempHour = ((unsigned int)(StoreUserID[0] *10) + (unsigned int)(StoreUserID[1] * 1));
TempMinute = ((unsigned int)(StoreUserID[2] *10) + (unsigned int)(StoreUserID[3] * 1));
TempSecond = ((unsigned int)(StoreUserID[4] *10) + (unsigned int)(StoreUserID[5] * 1));
}
I am using the LPC2148 RTC.
Description:
I use a single key to enter multiple values (0-9) in the same column (a 2-line LCD display is used).
Each value read is stored in the StoreUserID array (the array index increases along with col++).
The function above is called to save the values for second, minute and hour.
StoreUserID is printed to cross-check that the values were entered correctly.
But after the conversion (see the multiplication by 10), TempSecond, TempMinute and TempHour show random values. I can't work out where the issue is.
Your problem is almost certainly here:
sprintf(MyStr,"%c",(unsigned int)StoreUserID[i]);
I can't tell how you declared the StoreUserID array (as char? as int?) or why you consistently cast to unsigned int (for small positive numbers such as a year or seconds the cast has no effect), but these are the 2 most probable corrections:
sprintf(MyStr,"%c", StoreUserID[i]); // if StoreUserID[] holds characters
or - and I think that this is your case -
sprintf(MyStr,"%c",StoreUserID[i] + '0'); // if it holds numbers - the conversion is needed
Explanation:
Number 0 has character representation '0', which is some number (in ASCII it is 48).
Number 1 has character representation '1', which is the next number (in ASCII it is 49).
... and so on.
So you need to add value '0' to obtain the character representation from the bare number.
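A minimal sketch of the conversion in both directions. The helper names are made up, and since the declaration of StoreUserID is not shown, which direction you actually need is an assumption: if the array holds characters such as '4' and '2' rather than the numbers 4 and 2, then '0' has to be subtracted before multiplying by 10.

```c
/* Convert a numeric value 0..9 to its character, e.g. 7 -> '7'. */
char digit_to_char(unsigned int digit)
{
    return (char)('0' + digit);
}

/* Convert two digit characters to a number, e.g. '4','2' -> 42.
 * Subtracting '0' turns each character back into its numeric value. */
unsigned int two_chars_to_number(char tens, char ones)
{
    return (unsigned int)(tens - '0') * 10u + (unsigned int)(ones - '0');
}
```

Without the subtraction, '4' * 10 + '2' in ASCII is 52 * 10 + 50 = 570, which does not fit in an unsigned char and would look like a random value.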

Getting a double word from binary data

char * data = 0xFF000010FFFFFFFFFFFFFFFFFFFFFFFFFFF;
I want to get DOUBLE WORD in data[1] (0x00000010) and store it in var int i.
Would this do the trick?
int i = (int) data[1]+data[2]+data[3]+data[4]
You are attempting to just add four bytes rather than position the values into the correct part of the integer. Without specifying the endianness of your platform, it's not possible to provide a final answer.
The general approach is to place each byte in the correct position of the int, something like this:
int i = 256 * 256 * 256 * data[0] + 256 * 256 * data[1] + 256 * data[2] + data[3]
(big endian example)
Note that the indices are 0-based, not 1-based as in your example. The "base" in this example is 256 because each byte can represent 256 values.
To understand why this is so, consider the decimal number
5234
You can re-write that as:
5000 + 200 + 30 + 4
or 10 * 10 * 10 * 5 + 10 * 10 * 2 + 10 * 3 + 4
As you process data for each digit, you multiply the value by the-number-base-to-the-power-of-the-digit-position (rightmost digit for base 10 is 10^0, then 10^1, 10^2, etc).
You must convert from your string into the actual data. Consider using atol() or something similar to get the value in memory, then worry about editing it.
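Placing each byte at its position is often written with shifts instead of multiplications by 256. A sketch for both byte orders, assuming the data is already available as an array of unsigned char (with plain char, bytes such as 0xFF may become negative) and that a double word is 32 bits. The function names are invented for the example.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes, most significant byte first
 * (big endian). Shifting by 8 bits is the same as multiplying by 256. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Same 4 bytes, least significant byte first (little endian). */
uint32_t read_u32_le(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

With the bytes FF 00 00 00 10 FF in memory, read_u32_be(data + 1) gives 0x00000010 and read_u32_le(data + 1) gives 0x10000000.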
