Binary/Integer representation of file in C

I am trying to port some plain Ruby code to C for speed, but I am struggling because C is not my primary language and I can't work out what is going wrong. Here is the Ruby code, to make the purpose clear:
source_file = open("/tmp/image.jpg")
content = source_file.read
content_binary_string = content.unpack("b*")[0]
content_integer = content_binary_string.to_i(2)
I've been then playing around with several attempts in C to achieve the same results and somehow I am stuck on the binary comparison between the Ruby and C outputs. Here is the C code I've got so far:
// gcc -Wall -Wextra -std=c99 xxd.c
#include <stdio.h>
#include <string.h>
int main() {
char buffer[32];
FILE *image;
image = fopen("/tmp/image.jpg","rb");
while (!feof(image)) {
fread(buffer, sizeof(buffer), 1, image);
const size_t len = strlen(buffer);
for (size_t j = 0; j < len; j++ ) {
for (int i = 7; i >= 0; i--) {
printf(buffer[j] & (1 << i) ? "1" : "0");
}
}
}
return 0;
}
Please notice that the C code is not yet complete and that I am for now printing the binary value so I can compare the output with the current working code and I believe I am missing some basic concept that is abstracted by Ruby like byte size or something else that is not obvious to me yet. The output is quite different at this point.
After I get that step right, then the goal is to produce an integer based value from that.
Any clues towards understanding why that output isn't accurate are highly appreciated!

feof() should be used only after an input function has indicated a failed read. It tells you whether the failure was due to end-of-file.
strlen() reports the length (not including the null character) of a string. In C, a string is a sequence of characters up to and including the null character. Do not use it to determine how many bytes are in a buffer, because binary data can contain zero bytes anywhere. Use the return value of fread(). @subin
Check the return value of input functions. @user694733
Avoid using a string that is not a format string as the first argument to printf().
Better to use unsigned constants like 1u with bit operations.
Minor: Avoid magic numbers like 7. CHAR_BIT is available in <limits.h>.
Minor: Use fclose(). Put your toys away when done.
image = fopen("/tmp/image.jpg","rb");
if (image == NULL) Handle_OpenFailure();
size_t count;
while ((count = fread(buffer, sizeof buffer[0], sizeof buffer, image)) > 0) {
for (size_t j = 0; j < count; j++ ) {
for (int i = CHAR_BIT-1; i >= 0; i--) {
fputc(buffer[j] & (1u << i) ? '1' : '0', stdout);
}
}
}
fclose(image);
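The inner loops above can also be packaged as a small helper that writes the bits into a string instead of printing them, which makes the result easier to compare or to convert further. This is only a sketch: the name bytes_to_bits and the caller-supplied output buffer are assumptions, not part of the original answer. It emits the most significant bit first, like the loop above. Ruby's "b*" directive is documented as ascending (least significant bit first) order, so matching the question's Ruby output exactly may require running the inner loop from 0 up to CHAR_BIT-1 instead.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write the bits of buf[0..n-1], most significant bit
   first, into out as '0'/'1' characters. out must have room for
   n * CHAR_BIT + 1 chars. Returns the number of bit characters written. */
size_t bytes_to_bits(const unsigned char *buf, size_t n, char *out)
{
    size_t k = 0;
    for (size_t j = 0; j < n; j++) {
        for (int i = CHAR_BIT - 1; i >= 0; i--) {
            out[k++] = (buf[j] & (1u << i)) ? '1' : '0';
        }
    }
    out[k] = '\0';   /* terminate so the result can be used as a C string */
    return k;
}
```

Because the length n is passed in explicitly (for example, the value returned by fread()), zero bytes inside the data are converted like any other byte.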

Related

Removing duplicate characters from two argument strings in C

I'm trying to optimize a problem that I have to make it more readable with the same speed optimization. My problem consists in this:
Allowed function: write.c, nothing else.
Write a program that takes two strings and displays, without doubles, the
characters that appear in either one of the strings.
The display will be in the order characters appear in the command line, and
will be followed by a \n.
As you can see, main passes the two argument strings (argv[1] and argv[2]) to our function, void remove_dup(char *str, char *str2). A temporary array, indexed by each character's ASCII value, records which characters have already been seen. For example, with str1 = "hello" and str2 = "laoblc", the expected output written with the write function is "heloabc".
However, GCC complained about an array subscript of type char when I indexed the zero-filled temporary array directly with the characters of my strings. To stop the compiler from complaining, I had to cast each character to an int before using it as an index. The array is our checker: it determines whether a character is a duplicate based on its value. This is the error I get when compiling without the cast, using warning flags (gcc -Wextra -Werror -Wall remove_dup.c):
remove_dup:11 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
if (temp[str[i]] == 0)
^~~~~~~
remove_dup:13 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
temp[str[i]] = 1;
^~~~~~~
remove_dup:21 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
if (temp[str2[i]] == 0)
^~~~~~~~
remove_dup.c:23 error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
temp[str2[i]] = 1;
^~~~~~~~
Now my real question is, how can I have the same time efficiency BUT without using any kind of casting into my array? This program is running as O(m + n) where m is our first string and n is our second string.
This is the code:
void remove_dup(char *str, char *str2)
{
int temp[10000] = {0};
int i;
i = 0;
while (str[i])
{
if (temp[(int)str[i]] == 0)
{
temp[(int)str[i]] = 1;
write(1, &str[i], 1);
}
i++;
}
i = 0;
while (str2[i])
{
if (temp[(int)str2[i]] == 0)
{
temp[(int)str2[i]] = 1;
write(1, &str2[i], 1);
}
i++;
}
}
int main(int argc, char *argv[])
{
if (argc == 3)
remove_dup(argv[1], argv[2]);
write(1, "\n", 1);
return (0);
}
I hope this is clear enough with the logic structure I explained. I might have grammar mistakes, so bear with me :).
Casting here will have no performance penalty.
However, as a rule of thumb, it is generally best to avoid explicit casts whenever possible. You can do this, for example, by changing:
temp[(int)str[i]]
to:
temp[+str[i]]
This works because unary + applies the integer promotions, converting the char to int.
However, your code has another problem. You could ask: why would gcc bother to issue such an annoying warning message?
One answer is that they just like to be annoying. A better guess is that on most platforms char is signed (see "Is char signed or unsigned by default?"), so if your string happens to contain a character with a value greater than 127 (which is stored as a negative number), you will index before the start of the array and have a bug.
One way to fix this is to replace:
temp[(int)str[i]]
with:
temp[str[i] + 128]
(and change int temp[10000] = {0} to int temp[256 + 128] = {0}). This will work regardless of the default sign of char.
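A minimal sketch of the offset idea follows. It subtracts CHAR_MIN instead of adding a literal 128, which gives the same result where char is signed and 8 bits wide, and leaves the index unchanged where char is unsigned. The function name append_unseen is hypothetical, and collecting the result in a buffer instead of calling write() is an assumption made only so the behaviour is easy to check.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: append to out every character of str that has not
   been seen before, marking it in seen. seen must have UCHAR_MAX + 1
   zero-initialized entries on first use. Returns the new length of out. */
size_t append_unseen(const char *str, unsigned char seen[], char *out, size_t len)
{
    for (size_t i = 0; str[i] != '\0'; i++) {
        /* 0..UCHAR_MAX whether char is signed or unsigned, with no cast */
        int slot = str[i] - CHAR_MIN;
        if (!seen[slot]) {
            seen[slot] = 1;
            out[len++] = str[i];
        }
    }
    out[len] = '\0';
    return len;
}
```

Calling it first with "hello" and then with "laoblc", sharing one seen array, yields "heloabc", the output expected in the question. Each string is still walked exactly once, so the running time stays O(m + n).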
Now my real question is, how can I have the same time efficiency BUT without using any kind of casting into my array?
I don't believe casting in C has a runtime penalty. Everything in C is a number anyway. I believe it's just telling the compiler that yes, you know you're using the wrong type and believe it's ok.
Note that char can be signed. It is possible for a negative number to sneak in there.
This program is running as O(m * n) where m is our first string and n is our second string.
No, it's running as O(n). O(m*n) would be if you were iterating over one string for every character of the other.
for( int i = 0; i < strlen(str1); i++ ) {
for( int j = 0; j < strlen(str2); j++ ) {
...
}
}
But you're looping over each string one after the other in two independent loops. This is O(m + n) which is O(n).
On to improvements. First, temp only ever needs to hold the char range which is, at most, 256. Let's give it a variable name that describes what it does, chars_seen.
Second, there's no need to store a full integer. Normally we'd use bool from stdbool.h, but we can define our own using signed char if it is unavailable. We wrap the typedef in an #ifndef bool so that the system-supplied one is used when stdbool.h has been included (in C99 and C11 it defines bool as a macro); it will know better than we do what type to use for a boolean.
#ifndef bool
typedef signed char bool;
#endif
bool chars_seen[256] = {0};
You might be able to get a bit more performance by eliminating i and incrementing the pointer directly. Beyond performance, this makes many string and array operations simpler.
for( ; *str != '\0'; str++ ) {
    if( !chars_seen[(unsigned char)*str] ) {
        chars_seen[(unsigned char)*str] = 1;
        write(1, str, 1);
    }
}
Note that I'm casting to unsigned char, not int or size_t: where char is signed, a negative value cast straight to size_t becomes a huge index, while unsigned char always lands in 0..UCHAR_MAX (255 on typical platforms).
You might be able to shave a touch off by using post-increment; whether this helps is going to depend on your compiler. Be aware that a narrow counter type can wrap back to zero if a character repeats often enough, which would print it again.
if( !chars_seen[(unsigned char)*str]++ ) {
    write(1, str, 1);
}
Finally, to avoid repeating your code and to extend it to work with any number of strings, we can write a function which takes in the set of characters seen and displays one string. And we'll give the compiler the hint to inline it (static inline, so that no separate external definition is needed), though it's of questionable use.
static inline void display_chars_no_dups( const char *str, bool chars_seen[]) {
    for( ; *str != '\0'; str++ ) {
        /* explicit assignment instead of ++ so the flag can never wrap to 0 */
        if( !chars_seen[(unsigned char)*str] ) {
            chars_seen[(unsigned char)*str] = 1;
            write(1, str, 1);
        }
    }
}
Then main allocates the array of seen characters and calls the function as many times as necessary.
int main(int argc, char *argv[]) {
bool chars_seen[256] = {0};
for( int i = 1; i < argc; i++ ) {
display_chars_no_dups( argv[i], chars_seen );
}
write(1, "\n", 1);
}

fseek creates infinite loop at run time

WHAT THE CODE DOES: I read a binary file and sort it, using a frequency array to do so.
UPDATE: it does run the loop now, but it doesn't write the numbers correctly...
This is the code. I want to write to the file after reading from it. I will overwrite what is already there, and that is okay. The problem is that I get no errors when compiling, but at run time it seems I have an infinite loop.
The file is binary. I read it byte by byte and that's also the way I want to write it.
while(fread(readChar, sizeof(readChar)/2, 1, inFile)){
bit = atoi(readChar);
array[bit] = array[bit] + 1;
}
fseek(inFile, 0, SEEK_SET);
for( i = 0; i < 256; i++)
while(array[i] > 0){
writeChar[0] = array[i]; //do I correctly convert int to char?
fwrite(writeChar, sizeof(readChar)/2, 1, inFile);
array[i] = array[i] -1;
}
The inFile file declaration is:
FILE* inFile = fopen (readFile, "rb+");
It reads from the file, but does not write!
Undefined behavior:
fread() is used to read a binary representation of data. atoi() takes a textual representation of data: a string (a pointer to an array of char that is terminated with a '\0').
Unless the data read into readChar has one of its bytes set to 0, calling atoi() may access data outside readChar.
fread(readChar, sizeof(readChar)/2, 1, inFile);
bit = atoi(readChar);
The code is not reading data "bit by bit". As @Jens comments: "The smallest unit is a byte," and that is at least 8 bits.
The only possible reason I see for an infinite loop is that your array is not initialized.
After declaration with:
int array[256];
the elements can have any integer value, including very large ones.
So there are no infinite loops, but some loops can run for a very large number of iterations.
You should initialize your array with zeros:
int array[256]={0};
I don't know the number of elements in your array or whether this is how you declare it, but if you declare your array as shown, then ={0} will initialize all members to 0. You can also use a loop:
for(int i=0; i < COUNT_OF_ELEMENTS;i++) array[i] = 0;
EDIT: I forgot to mention that your code can only sort files that contain nothing but digits.
For that, you also have to change the conversion when writing:
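char writeChar[2]={0};
for( int i = 0; i < 256; i++)
while(array[i] > 0){
// Turn the digit value back into its character. This is a portable
// stand-in for the MSVC-specific _itoa(i, writeChar, 10); it is valid
// here because atoi() on a single digit only ever yields 0..9.
writeChar[0] = (char)('0' + i);
fwrite(writeChar, sizeof(char), 1, inFile);
array[i] = array[i] -1;
}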
File content before:
12345735280735612385478504873457835489
File content after:
00112223333334444455555556777778888889
Is that what you want?
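If the goal is to sort arbitrary bytes and not only digit characters, the same frequency-array idea works without atoi() at all. The following is a sketch under assumptions: the function name sort_stream_bytes is hypothetical, and the stream is assumed to be opened for both reading and writing in binary mode (for example "rb+", as in the question).

```c
#include <stdio.h>

/* Hypothetical helper: sort the bytes of an open read/write binary stream
   in place with a counting sort. Returns 0 on success, -1 on an I/O error. */
int sort_stream_bytes(FILE *fp)
{
    unsigned long counts[256] = {0};   /* one counter per byte value, zeroed */
    int c;                             /* int, so EOF differs from any byte */

    rewind(fp);
    while ((c = fgetc(fp)) != EOF)
        counts[c]++;                   /* fgetc returns the byte as 0..255 */
    if (ferror(fp))
        return -1;

    /* A positioning call is required between reading and writing. */
    rewind(fp);
    for (int value = 0; value < 256; value++) {
        for (unsigned long k = 0; k < counts[value]; k++) {
            if (fputc(value, fp) == EOF)
                return -1;
        }
    }
    return fflush(fp) == EOF ? -1 : 0;
}
```

Since exactly as many bytes are written back as were read, the file keeps its length and is simply overwritten from the start.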

another char shift C

So I've looked through here and on Google and tried various approaches to accomplish this. It doesn't seem like it should be hard. I've tried getting a value from the char, and I've tried just using math on it, since I've read that a char in C is a number to the compiler anyway. What I have is an array of 4 strings; each element is another array of 20 + 1 characters (to include the null \0). What I'm trying to do is shift the value of each character in each string by a predefined amount, using a variable "decryption_shift". What I thought I was doing is using 2 for loops: one to go through one string at a time, the other to change each character in the string. I've tried using pointers and tmp variables. Yes, this is a homework assignment; the problem is that it's a higher-level class and they aren't teaching us methods/functions/syntax, they want us to research and learn on our own how to do it. I've already spent 2 hours trying to figure out this one snippet and don't know where else to turn. Any help is greatly appreciated.
~justin
void decrypt_chunks()
{
for (m = 0; m < 0; m++)
{
for (n = 0; n < 20; n++)
{
// int *chunksp = &chunks[m][n];
chunks[m][n] = chunks[m][n] - DECRYPTION_SHIFT;
// *chunksp[m][n]=tmp;
// chunks[m][n]=tmp;
}
}
}
Your problem is here:
for (m = 0; m < 0; m++)
The loop will never execute because the termination condition is met on initialization. Try
for (m = 0; m < 4; m++)
I can't see where DECRYPTION_SHIFT or chunks is defined (and initialized), so make sure you actually define them globally or in the decrypt_chunks() function. (Note: variables are usually written in lowercase and macros in uppercase, so if DECRYPTION_SHIFT is a variable you should write it in lowercase letters.)
for (m = 0; m < 0; m++) will never run. In words, this statement would be something like: set m to zero (by the way, where did you define m?) and do the following things as long as m is less than zero (never the case, as you just set it to zero).
As this seems like a very basic problem, make sure you actually understand what a programming language is and how it works, and consider reading one or two books about C (or almost any other programming language, as this homework would be pretty much the same in most modern languages).
To really make this thing interesting: what you are basically doing is encrypting like Caesar, so to implement this, the code could look similar to this:
#include <stdio.h>
#include <string.h>
void decrypt_chunks(int decryption_shift);
char chunks[4][21];
int main(int stdr, char *stdv[])
{
strcpy(chunks[0],"Hello World! And Bye");
printf("message string: %s\n", chunks[0]);
decrypt_chunks(1);
printf("encrypted string: %s\n", chunks[0]);
decrypt_chunks(-1);
printf("decrypted string: %s\n", chunks[0]);
}
void decrypt_chunks(int decryption_shift)
{
for (int m = 0; m < 4; m++)
{
for (int n = 0; n < 20; n++)
{
chunks[m][n] = chunks[m][n] - decryption_shift;
}
}
}
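As a variation on the code above, the shift can also be written to work on one string at a time and to stop at the terminating null character instead of assuming exactly 20 characters. This is only a sketch; the name shift_string is hypothetical and not part of the assignment.

```c
#include <string.h>

/* Hypothetical helper: subtract shift from every character of a
   NUL-terminated string, in place. The length is measured once up front,
   so the loop covers the whole original string. */
void shift_string(char *s, int shift)
{
    size_t len = strlen(s);
    for (size_t n = 0; n < len; n++) {
        s[n] = (char)(s[n] - shift);
    }
}
```

Calling shift_string(chunks[m], 1) and then shift_string(chunks[m], -1) restores the original text. Note that a shift which happens to turn a character into '\0' would shorten the string as seen by later strlen() or printf() calls, so the length may need to be tracked separately in that case.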

How to XOR scramble a string in C and back again with the same function?

I am trying to obfuscate a string in a program. Currently, I only have a simple string reversal working. I would like to be able to perform XOR scrambling on the data to make it much more secure, however the method I have tried is not working.
The same function and input type is used to decode the string. This is no problem with string reversal, as it just reverses back, but can this be done easily with XORing without getting too complex? I would prefer if the process kept just the one string, like the reversal does. Here is my reversal function.
void reverse_string(unsigned char *buf, int length)
{
int i;
unsigned char temp;
for (i = 0; i < length / 2; i++)
{
temp = buf[i];
buf[i] = buf[length - i - 1];
buf[length - i - 1] = temp;
}
}
And here is the attempt at a XOR function
void charxor(char * text, int len) {
const unsigned char enc[8]={173,135,131,121,110,119,187,143};
char ch;
int i;
int ind=0;
for (i=0;i<len;i++) {
ch=*text++;
if (ch)
*text = ch ^ enc[ind++];
ind %=8;
}
}
Can anyone help? Would be much appreciated!
You seem to be overcomplicating things a bit. Try this instead:
void charxor (unsigned char *text, int len) {
const unsigned char enc[8] = {173,135,131,121,110,119,187,143};
int i;
for (i = 0; i < len; i++) {
text[i] ^= enc[i % 8];
}
}
Note that the XOR operation can introduce null chars into the string, so you really do need to keep track of its length instead of just relying on the presence of a trailing null char.
Also keep in mind that, while this may indeed be relatively speaking "much more secure" than just reversing the string, any reasonably clever person with access to enough samples of the output can probably figure out how to decode it in around fifteen minutes or so.
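To illustrate the point about tracking the length, here is a small sketch of the same XOR idea with the key passed in as a parameter. The name xor_with_key and its signature are assumptions made for illustration; the key bytes in the usage note are the ones from the question.

```c
#include <stddef.h>

/* Hypothetical helper: XOR buf[0..len-1] in place with a repeating key.
   Calling it a second time with the same key restores the original bytes.
   key_len must be greater than zero. */
void xor_with_key(unsigned char *buf, size_t len,
                  const unsigned char *key, size_t key_len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key[i % key_len];
    }
}
```

The caller keeps the length (for example, sizeof of a byte array or a stored strlen() result taken before scrambling) and passes the same value for both the scramble and the unscramble call, because the scrambled data may contain zero bytes.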
This is a P-box. It requires a non-repeating integer key (random, the same size as the block). The last block would start with the offset, which could be just random data. It doesn't cover null terminators, so decide where the data is going and what you're doing with it. You could realloc(buff, "A") to use memmove. Make three 64-bit boxes and a subset of sixteen 4-bit boxes from the output of the 64, and it starts to look like a poor implementation of DES, which OpenSSL has built in. The fundamental advantage is being able to encrypt and decrypt with the same function and address space. This could also allow you to encrypt in place without an extra buffer. KSZ is the length of your block(s)/key.
/* KSZ is the block/key length; ky must hold each index 0..KSZ-1 exactly once */
char *zecr(const char *bff, char *zbf, const unsigned int ky[], short ze)
{
    /* main encrypt/decrypt function */
    for (int i = 0; i < KSZ; i++) {
        unsigned int dx = ky[i];
        if (ze == 1) {
            /* encrypt: output position i takes input byte dx */
            zbf[i] = bff[dx];
        } else {
            /* decrypt: put byte i back at its original position dx */
            zbf[dx] = bff[i];
        }
    }
    return zbf;
}
XORing is a binary operation, which will yield vastly different results depending on how you cast it. You got the right idea using ocdec, but if the idea is to keep it simple, I'm going to assume you don't actually know assembly despite the requested reference. Stick with C calls; that is simpler for how you are most likely going to be using the data.

Caesar Cipher Program - Absurd Number in Array Output

I'm actually writing about the same program as before, but I feel like I've made significant progress since the last time. I have a new question however; I have a function designed to store the frequencies of letters contained within the message inside an array so I can do some comparison checks later. When I ran a test segment through the function by outputting all of my array entries to see what their values are, it seems to be storing some absurd numbers. Here's the function of issue:
void calcFreq ( float found[] )
{
char infname[15], alpha[27];
char ch;
float count = 0;
FILE *fin;
int i = 0;
while (i < 26) {
alpha[i] = 'A' + i++;
}
printf("Please input the name of the file you wish to scan:\n");
scanf("%s", infname);
fin = fopen ( infname, "r");
while ( !feof(fin) ) {
fscanf(fin, "%c", &ch);
if ( isalpha(ch) ) {
count += 1;
i = 0;
if ( islower(ch) ) { ch = toupper(ch); }
while ( i < 26 ) {
if ( ch == alpha[i] ) {
found[i]++;
i = 30;
}
i++;
}
}
}
fclose(fin);
i = 0;
while ( i < 26 ) {
found[i] = found[i] / count;
printf("%f\n", found[i]);
i++;
}
}
At like... found[5], I get this hugely absurd number stored in there. Is there anything you can see that I'm just overlooking? Also, some array values are 0 and I'm pretty certain that every character of the alphabet is being used at least once in the text files I'm using.
I feel like a moron - this program should be easy, but I keep overlooking simple mistakes that cost me a lot of time >.> Thank you so much for your help.
EDIT So... I set the entries to 0 of the frequency array and it seems to turn out okay - in a Linux environment. When I try to use an IDE from a Windows environment, the program does nothing and Windows crashes. What the heck?
Here are a few pointers besides the most important one of initializing found[], which was mentioned in other comments.
The alpha[] array complicates things, and you don't need it. See below for a modified file-read loop that doesn't need the alpha[] array to count the letters in the file.
And strictly speaking, the expression you're using to initialize the alpha[] array:
alpha[i] = 'A' + i++;
has undefined behavior because you modify i as well as use it as an index in two different parts of the expression. The good news is that since you don't need alpha[] you can get rid of its initialization entirely.
The way you're checking for EOF is incorrect - it'll result in you acting on the last character in the file twice (since the fscanf() call that results in an EOF will not change the value of ch). feof() won't return true until after the read that occurs at the end of the file. Change your ch variable to an int type, and modify the loop that reads the file to something like:
// assumes that `ch` is declared as `int`
while ( (ch = fgetc(fin)) != EOF ) {
if ( isalpha(ch) ) {
count += 1;
ch = toupper(ch);
// the following line is technically non-portable,
// but works for ASCII targets.
// I assume this will work for you because the way you
// initialized the `alpha[]` array assumed that `A`..`Z`
// were consecutive.
int index = ch - 'A';
found[index] += 1;
}
}
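Putting these points together (zero-initialized counters, an int for the fgetc() result, no alpha[] array), a self-contained version of the counting step might look like the sketch below. The function name letter_frequencies, the use of integer counters with a final division, and the extra range check are assumptions for illustration, not the original poster's code.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: fill freq[0..25] with the relative frequency of each
   letter A-Z read from fin. Returns the number of letters counted; if that
   is 0, every entry of freq is set to 0. */
long letter_frequencies(FILE *fin, double freq[26])
{
    long counts[26] = {0};   /* explicitly zeroed, so no garbage values */
    long total = 0;
    int ch;                  /* int, so EOF is distinguishable from any byte */

    while ((ch = fgetc(fin)) != EOF) {
        if (isalpha(ch)) {
            int index = toupper(ch) - 'A';    /* assumes A..Z are consecutive */
            if (index >= 0 && index < 26) {   /* skip locale-specific letters */
                counts[index]++;
                total++;
            }
        }
    }
    for (int i = 0; i < 26; i++)
        freq[i] = total ? (double)counts[i] / (double)total : 0.0;
    return total;
}
```

The caller is still responsible for checking that fopen() did not return NULL before passing the stream in.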
alpha[i] = 'A' + i++;
This is undefined behavior in C. Anything can happen when you do this, including crashes.
Generally I would advise you to replace your while loops with for loops, when the maximum number of iterations is already known. This makes the code easier to read and possibly faster as well.
Is there a reason you are using float for counter variables? That doesn't make sense.
'i = 30;' What is this supposed to mean? If your intention was to end the loop, use a break statement instead of some mysterious magic number. If your intention was something else, then your code isn't doing what you think it does.
You should include some error handling if the file was not found. fin = fopen(..) and then if(fin == NULL) handle errors. I would say this is the most likely cause of the crash.
Check the definition of found[] in the caller function. You're probably running out of bounds.
