Random text obfuscation algorithm failure - c

I have been experimenting with a simple XOR-based text obfuscation algorithm. Supposedly, when the algorithm is run twice in a series, I should get back the original input - yet in my implementation that only happens sometimes. Here's my code, with some random text to demonstrate the problem:
#include <stdio.h>
void obfuscate(char *text) {
char i = 0, p = 0;
while (text[i] != 0) {
text[i] = (text[i] ^ (char)0x41 ^ p) + 0xfe;
p = i++;
}
}
int main(int argc, char **argv) {
char text[] = "Letpy,Mprm` Nssl'w$:0==!";
printf("%s\n", text);
obfuscate(text);
printf("%s\n", text);
obfuscate(text);
printf("%s\n", text);
return 0;
}
How can I fix this algorithm so that it is indeed its own inverse? Any suggestions to improve the level of obfuscation?

I see two problems here:
The operation + 0xfe is not its own inverse. If you remove it and leave only XORs, each byte will be restored to its original value as expected.
A more subtle problem: Encrypting the text could create a zero byte, which will truncate the text because you use null-terminated strings. The best solution is probably to store the text length separately instead of null-terminating the encrypted text.
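Both points can be illustrated together. The following is only a minimal sketch, not the asker's algorithm: obfuscate_n is a hypothetical name, the key mixes 0x41 with the byte position (an assumption made for illustration), and the length is passed explicitly so that a zero byte in the output does no harm.

```c
#include <stddef.h>

/* Hypothetical helper: XOR each byte with a fixed key and its position.
 * XOR is its own inverse, so calling this twice restores the input.
 * The length is explicit because the output may contain zero bytes. */
void obfuscate_n(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (unsigned char)(0x41u ^ (i & 0xffu));
}
```

Calling obfuscate_n(buf, len) twice with the same len gives back the original bytes, even when the intermediate buffer contains a zero.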

You're doing more than just a simple XOR here (if you left it at text[i] = text[i] ^ (char)0x41 it would work; you could even leave in the ^ p if you want, but the + 0xfe breaks it).
Why do you want to use this kind of text obfuscation? Common methods of non-secure obfuscation are Base64 (needs separate encode and decode) and Rot13 (apply a second time to reverse).

First, to decode
text[i] = (text[i] ^ (char)0x41 ^ p) + 0xfe;
you need its inverse function, that would be
text[i] = (text[i] - 0xfe) ^ (char)0x41 ^ p;
Second, char i will be able to work only with short strings, use int.
And last (and most important!), is that after such an "obfuscation" the string could get zero terminated before its original end, so you should also check its original length or ensure that you cannot get zeroes in the middle.

Why "add +0xfe"? That is (at least one) source of your non-reversibility.
I see that you obfuscate it by using XOR of the previous text value p, which means that repeated letters will cause bouncing back and forth between the values.

Related

CamelCase to snake_case in C without tolower

I want to write a function that converts CamelCase to snake_case without using tolower.
Example: helloWorld -> hello_world
This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';.
I get hell_world. I don't know how to get it to work.
void snake_case(char *string)
{
int i = strlen(string);
while (i != 0)
{
if (string[i] >= 65 && string[i] <= 90)
{
string[i] = string[i] + 32;
string[i-1] = '_';
}
i--;
}
}
This conversion means, aside from converting a character from uppercase to lowercase, inserting a character into the string. This is one way to do it:
iterate from left to right,
if an uppercase character is found, use memmove to shift all characters from this position to the end of the string one position to the right, and then assign the current character the to-be-inserted value,
stop when the null-terminator (\0) has been reached, indicating the end of the string.
Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.
A basic implementation may look like this:
#include <stdio.h>
#include <string.h>
void snake_case(char *string)
{
for ( ; *string != '\0'; ++string)
{
if (*string >= 65 && *string <= 90)
{
*string += 32;
memmove(string + 1U, string, strlen(string) + 1U);
*string = '_';
}
}
}
int main(void)
{
char string[64] = "helloWorldAbcDEFgHIj";
snake_case(string);
printf("%s\n", string);
}
Output: hello_world_abc_d_e_fg_h_ij
Note that:
The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
The latter is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:
1. First determine how long the resulting string will be, then reallocate the array, and only then modify the string. Requires two passes.
2. Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
3. Same as 2, but whenever the array is too small, reallocate the array to twice its current size. Requires a single pass, and less frequent (but larger) reallocations. Finally reallocate the array to the length of the string it actually contains.
In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known, and the algorithm can be split into two distinct parts: find the new length, and modify the string. I can add it to the answer on request.
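A rough sketch of the two-pass approach might look like the following. It is an illustration under assumptions, not the answerer's code: snake_case_dup is a hypothetical name, it returns a newly allocated string instead of resizing the caller's array, and it handles ASCII letters only.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the two-pass idea: count first, allocate once, then fill.
 * snake_case_dup is a hypothetical name; the caller must free() the result. */
char *snake_case_dup(const char *src)
{
    size_t len = strlen(src);
    size_t extra = 0;

    /* Pass 1: every uppercase letter needs one extra byte for '_'. */
    for (size_t i = 0; i < len; i++)
        if (src[i] >= 'A' && src[i] <= 'Z')
            extra++;

    char *dst = malloc(len + extra + 1);
    if (dst == NULL)
        return NULL;

    /* Pass 2: copy, inserting '_' and lowercasing (ASCII only). */
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] >= 'A' && src[i] <= 'Z') {
            dst[j++] = '_';
            dst[j++] = (char)(src[i] + ('a' - 'A'));
        } else {
            dst[j++] = src[i];
        }
    }
    dst[j] = '\0';
    return dst;
}
```

Because the exact size is known before any byte is written, no write can land outside the allocation.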

Inserting characters in the middle of char array

I have a char array filled with some characters. Let's say I have "HelloWorld" in my char array. (not string. taking up index of 0 to 9)
What I'm trying to do is insert a character in the middle of the array, and push the rest to the side to make room for the new character that is being inserted.
So, I can make the char array to have "Hello.World" in it.
char ch[15]; // assume it has "HelloWorld" in it
for(int i=0; i<=strlen(ch)-1; i++) {
if(ch[i]=='o' && ch[i+1]=='W') {
for(int j=strlen(ch)-1; j>=i+2; j--) {
ch[j] = ch[j-1]; // pushing process?
}
ch[i+1] = '.';
break;
}
}
Would this work? Would there be an easier way? I might just be thinking way too complicated on this.
You need to start the inner loop from strlen(ch) + 1, not strlen(ch) - 1, because you need to move the null terminator to the right one place as well. Remember that strlen returns the length of the string such that string[strlen(string)] == '\0'; you can think of strlen as a function for obtaining the index of the null terminator of a C string.
If you want to move all the characters up by one, then you could do it using memmove.
#include <string.h>
char ch[15];
int make_room_at = 5;
int room_to_make = 1;
memmove(
ch + make_room_at + room_to_make,
ch + make_room_at,
15 - (make_room_at + room_to_make)
);
Simply do:
#define SHIFT 1
char bla[32] = "HelloWorld"; // We reserve enough room for this example
char *ptr = bla + 5; // pointer to "World"
memmove(ptr + SHIFT, ptr, strlen(ptr) + 1); // +1 for the trailing null
The initial starting value for the inner loop is one short. It should be something like the following. Note too that since the characters are moved to the right, a new null terminator needs to be added:
ch[strlen(ch) + 1] = '\0';
for(j=strlen(ch); j>=i+2; j--) { // note no "-1" after the strlen
Edit As far as the "Is this a good way?" part, I think it is reasonable; it just depends on the intended purpose. A couple thoughts come to mind:
Reducing the calls to strlen might be good. It could depend on how good the optimizer is (perhaps some might be optimized out). But each call to strlen require a scan of the string looking for the null terminator. In high traffic code, that can add up. So storing the initial length in a variable and then using the variable elsewhere could help.
This type of operation has the chance for buffer overflow. Always make sure the buffer is long enough (it is in the OP).
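Both thoughts can be folded into one small helper. This is a sketch under assumptions: insert_char and its cap parameter (the total size of the buffer in bytes) are hypothetical names, not something from the question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert c at index pos of the string in buf,
 * where cap is the total size of the buffer in bytes.
 * Returns false (and leaves buf untouched) if it would not fit. */
bool insert_char(char *buf, size_t cap, size_t pos, char c)
{
    size_t len = strlen(buf);          /* computed once, not per iteration */

    if (pos > len || len + 2 > cap)    /* need room for c and the '\0' */
        return false;

    /* Shift the tail, including the terminator, one place to the right. */
    memmove(buf + pos + 1, buf + pos, len - pos + 1);
    buf[pos] = c;
    return true;
}
```

With char ch[15] = "HelloWorld", the call insert_char(ch, sizeof ch, 5, '.') leaves "Hello.World" in the array; with an 11-byte array the same call is refused instead of overflowing.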
If you're going to manipulate a char array you shouldn't make it static. By doing this:
char ch[15];
you're hardcoding the array to always have 15 characters in it. Making it a pointer would be step 1:
char* ch;
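This way you can allocate the storage with malloc and resize it with realloc as need be; the pointer must point to allocated memory before you write through it.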

Fast C comparison

As part of a protocol I'm receiving C string of the following format:
WORD * WORD
Where both WORDs are the same given string.
And, * - is any string of printable characters, NOT including spaces!
So the following are all legal:
WORD asjdfnkn WORD
WORD 234kjk2nd32jk WORD
And the following are illegal:
WORD akldmWORD
WORD asdm zz WORD
NOTWORD admkas WORD
NOTWORD admkas NOTWORD
Where (1) is missing a trailing space; (2) has 3 or more spaces; (3)/(4) do not open/end with the correct string (WORD).
Of course this could be implemented in a pretty straightforward way, however I'm not sure what I'm doing is the most efficient.
Note: WORD is pre-set for a whole run, however could change from run to run.
Currently I'm strncmping each string against "WORD ".
If that check passes, I manually (char-by-char) run over the string, looking for the second space char.
[If found] I then strcmp (all the way) with "WORD".
Would love to hear your solution, with an emphasis on efficiency as I'll be running over millions of these in real time.
I'd say, have a look at the algorithms in Handbook of Exact String-Matching Algorithms, compare the complexities and choose the one that you like best, implement it.
Or you can use some ready-made implementations.
You have some really classical algorithms for searching strings inside another string here:
KMP(Knuth-Morris-Pratt)
Rabin-Karp
Boyer-Moore
Hope this helps :)
Have you profiled?
There's not much gain to be had here, since you're doing basic string comparisons. If you want to go for the last few percent of performance, I'd change out the str... functions for mem... functions.
char *bufp, *bufe; // pointer to buffer, one past end of buffer
if (bufe - bufp < wordlen * 2 + 2)
error();
if (memcmp(bufp, word, wordlen) || bufp[wordlen] != ' ')
error();
bufp += wordlen + 1;
char *datap = bufp;
char *datae = memchr(bufp, ' ', bufe - bufp);
if (!datae || bufe - datae < wordlen + 1)
error();
if (memcmp(datae + 1, word, wordlen))
error();
// Your data is in the range [datap, datae).
The performance gains are likely less than spectacular. You have to examine each character in the buffer since each character could be a space, and any character in the delimiters could be wrong. Changing a loop to memchr is slick, but modern compilers know how to do that for you. Changing a strncmp or strcmp to memcmp is also probably going to be negligible.
There is probably a tradeoff to be made between the shortest code and the fastest implementation. Choices are:
The regular expression ^WORD \S+ WORD$ (requires a regex engine)
strchr on "WORD " and a strrchr on " WORD" with a lot of messy checks (not really recommended)
Walking the whole string character by character, keeping track of the state you are in (scanning first word, scanning first space, scanning middle, scanning last space, scanning last word, expecting end of string).
Option 1 requires the least code but backtracks near the end, and Option 2 has no redeeming qualities. I think you can do option 3 elegantly. Use a state variable and it will look okay. Remember to manually enter the last two states based on the length of your word and the length of your overall string and this will avoid the backtracking that a regex will most likely have.
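When the overall length is known, the positions of both markers follow from the lengths alone, so no backtracking is needed. The sketch below is one possible reading of that idea rather than the answerer's code; the function name and parameters are made up for illustration, and it assumes the filler must be at least one character long.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch: validate "WORD <filler> WORD" where filler is non-empty and
 * contains no spaces. The function name and parameters are illustrative. */
bool is_word_filler_word(const char *s, size_t len,
                         const char *word, size_t wordlen)
{
    /* Shortest legal input: WORD + ' ' + one filler char + ' ' + WORD. */
    if (len < 2 * wordlen + 3)
        return false;

    const char *filler = s + wordlen + 1;
    const char *filler_end = s + len - wordlen - 1;

    /* Both markers sit at positions known from the lengths alone. */
    if (memcmp(s, word, wordlen) != 0 || s[wordlen] != ' ')
        return false;
    if (*filler_end != ' ' || memcmp(filler_end + 1, word, wordlen) != 0)
        return false;

    /* The filler must not contain another space. */
    return memchr(filler, ' ', (size_t)(filler_end - filler)) == NULL;
}
```

Each byte of the input is looked at at most once, and the two marker checks can reject most bad input before the filler is scanned.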
Do you know how long the string that is to be checked is? If not, you are somewhat limited in what you can do. If you do know how long the string is, you can speed things up a bit. You have not specified for sure that the '*' part has to be at least one character. You've also not stipulated whether tabs are allowed, or newlines, or ... is it only alphanumerics (as in your examples) or are punctuation and other characters allowed? Control characters?
You know how long WORD is, and can pre-construct both the start and end markers. The function error() reports an error (however you need it to be reported) and returns false. The test function might be bool string_is_ok(const char *string, int actstrlen);, returning true on success and false when there is a problem:
#include <stdbool.h>
#include <string.h>
bool error(const char *msg); // Reports the problem and returns false
// Preset values characterizing the search. They are enum constants because
// C does not allow a static int to be initialized from another variable.
enum { wordlen = 4, marklen = wordlen + 1 };
enum { minstrlen = 2 * marklen + 1 }; // Two blanks and one other character.
static char bword[] = "WORD "; // Start marker
static char eword[] = " WORD"; // End marker
static char verboten[] = " "; // Forbidden characters
bool string_is_ok(const char *string, int actstrlen)
{
    if (actstrlen < minstrlen)
        return error("string too short");
    if (strncmp(string, bword, marklen) != 0)
        return error("string does not start with WORD");
    if (strcmp(string + actstrlen - marklen, eword) != 0)
        return error("string does not finish with WORD");
    // The first forbidden character must be the blank of the end marker.
    if (strcspn(string + marklen, verboten) != (size_t)(actstrlen - 2 * marklen))
        return error("string contains verboten characters");
    return true;
}
You probably can't reduce the tests by much if you want your guarantees. The part that would change most depending on the restrictions in the alphabet is the strcspn() line. That is relatively fast for a small list of forbidden characters; it will likely be slower as the number of characters forbidden is increased. If you only allow alphanumerics, you have 62 OK and 193 not OK characters, unless you count some of the high-bit set characters as alphabetic too. That part will probably be slow. You might do better with a custom function that takes a start position and length and reports whether all characters are OK. This could be along the lines of:
#include <stdbool.h>
static bool ok_chars[256] = { false };
static void init_ok_chars(void)
{
    const unsigned char *ok = (const unsigned char *)"abcdefghijklmnopqrstuvwxyz...0123456789";
    int c;
    while ((c = *ok++) != 0)
        ok_chars[c] = true;
}
static bool all_chars_ok(const char *check, int numchars)
{
    for (int i = 0; i < numchars; i++)
        // Cast so that a char with the high bit set cannot give a negative index.
        if (!ok_chars[(unsigned char)check[i]])
            return false;
    return true;
}
You can then use:
return all_chars_ok(string + marklen, actstrlen - 2 * marklen);
in place of the call to strcspn().
If your "stuffing" should contain only '0'-'9', 'A'-'Z' and 'a'-'z', and is in some encoding based on ASCII (like most Unicode-based encodings), then you can skip two comparisons in one of your loops, since only one bit differs between uppercase and lowercase letters.
Instead of
(ch>='0' && ch<='9') || (ch>='A' && ch<='Z') || (ch>='a' && ch<='z')
you get
ch2 = ch & ~('a' ^ 'A');
(ch>='0' && ch<='9') || (ch2>='A' && ch2<='Z')
But you had better look at the assembler code your compiler generates and do some benchmarking; depending on computer architecture and compiler, this trick could give slower code.
If branching is expensive compared to comparisons on your computer, you can also replace the && with & (and the || with |). But most modern compilers know this trick in most situations.
If, on the other hand, you test for any printable glyph from some large character encoding, then it is most likely less expensive to test for white-space glyphs rather than printable glyphs.
Also, compile specifically for the computer that the code will run on, and don't forget to turn off any generation of debugging code.
Added:
Don't make subroutine calls within your scan loops, unless it is worth it.
Whatever trick you use to speed up your loops, its benefit will diminish if you have to make a subroutine call within one of them. It is fine to use built-in functions that your compiler inlines into your code, but if you use something like an external regex library and your compiler is unable to inline those functions (gcc can do that, sometimes, if you ask it to), then making that subroutine call will shuffle a lot of memory around, in the worst case between different types of memory (registers, CPU buffers, RAM, hard disk, etc.), and may mess up CPU predictions and pipelines. Unless your text snippets are very long, so that you spend much time parsing each of them, and the subroutine is efficient enough to compensate for the cost of the call, don't do that. Some parsing functions use callbacks; that might be more efficient than making a lot of subroutine calls from your loops (since the function can scan several pattern matches in one sweep and bunch several callbacks together outside the critical loop), but that depends on how someone else has written that function, and basically it is the same thing as making the call yourself.
WORD is 4 characters, with uint32_t you could do a quick comparison. You will need a different constant depending on system endianness. The rest seems to be fine.
Since WORD can change you have to precalculate the uint32_t, uint64_t, ... you need depending on the length of the WORD.
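As a sketch of the idea for a 4-byte WORD only: if the key is built from the bytes of WORD at startup with the same load that is used on the input, no endianness-specific constant has to be written by hand. The names load32 and starts_with_word4 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Load 4 bytes into a uint32_t. memcpy avoids alignment and aliasing
 * problems; compilers typically turn it into a single load. */
static uint32_t load32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* True if the first 4 bytes of s equal the 4 bytes the key was built from.
 * The key is expected to come from load32(word), computed once per run. */
bool starts_with_word4(const char *s, size_t len, uint32_t key)
{
    return len >= 4 && load32(s) == key;
}
```

The key would be computed once per run, for example uint32_t key = load32("WORD");, and then reused for every incoming string. Whether this beats memcmp is something to measure, since compilers often already inline small fixed-size comparisons.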
Not sure from the description, but if you trust the source you could just chomp the first n+1 and last n+1 characters.
bool check_legal(
const char *start, const char *end,
const char *delim_start, const char *delim_end,
const char **content_start, const char **content_end
) {
const size_t delim_len = delim_end - delim_start;
const char *p = start;
if (start + delim_len + 1 + 0 + 1 + delim_len > end)
return false;
if (memcmp(p, delim_start, delim_len) != 0)
return false;
p += delim_len;
if (*p != ' ')
return false;
p++;
*content_start = p;
while (p < end - 1 - delim_len && *p != ' ')
p++;
if (p + 1 + delim_len != end || *p != ' ')
return false;
*content_end = p;
p++;
if (memcmp(p, delim_start, delim_len) != 0)
return false;
return true;
}
And here is how to use it:
const char *line = "who is who";
const char *delim = "who";
const char *start, *end;
if (check_legal(line, line + strlen(line), delim, delim + strlen(delim), &start, &end)) {
printf("this %.*s nice\n", (int) (end - start), start);
}
(It's all untested.)
Using the STL (this assumes C++), count the spaces; if there are not exactly two, the string is invalid. With std::find from <algorithm> you can get the positions of the two spaces and hence the middle word. Then check for WORD at the beginning and the end, and you are done.
This should return the true/false condition in O(n) time
int sameWord(char *str)
{
char *word1, *word2;
word1 = word2 = str;
// Word1, Word2 points to beginning of line where the first word is found
while (*word2 && *word2 != ' ') ++word2; // skip to first space
if (*word2 == ' ') ++word2; // skip space
// Word1 points to first word, word2 points to the middle-filler
while (*word2 && *word2 != ' ') ++word2; // skip to second space
if (*word2 == ' ') ++word2; // skip space
// Word1 points to first word, word2 points to the second word
// Now just compare that word1 and word2 point to identical strings.
while (*word1 != ' ' && *word2)
if (*word1++ != *word2++) return 0; //false
return *word1 == ' ' && (*word2 == 0 || *word2 == ' ');
}

robust string reverse

I am trying to code a trivial interview question of reversing a string.
This is my code:
#include <string.h>
char* rev( char* str)
{
int i,j,l;
l = strlen(str);
for(i=0,j=l-1; i<l/2 ; i++, j--)
{
str[i] = (str[i] + str[j]);
str[j] = str[i] - str[j];
str[j] = str[i] - str[j];
}
return str;
}
int main()
{
char *str = " hello";
printf("\nthe reverse is %s ...", rev(str));
return 1;
}
Basically, this one gives a segmentation fault.
I have following questions:
I get segmentation fault probably because, the characters add up to something not defined in ascii and hence I cannot store them back as characters, I am using www.codepad.org [I wonder if it supports just ascii !!] . Is my understanding correct or there is something else to it.
How do I correct the problem , for the same platform [I mean swapping in place for codepad.org]
Here I have to use an additional integer l to calculate length. So to save a single char space by swapping in place .. I am using an extra int !!! .. just to impress the interviewer :) ... Is this approach even worth it !!!
This one is for those who are interested in writing unit tests/API tests . I want to have a robust implementation so what can be possible test cases. I assume that if interviewer asks such a simple question .. he definitely wants some very robust implementation and test cases. Few that I thought of:
passing empty strings
passing integer strings
passing integer array instead of char array
very long string
single char string
string of special chars
Any advice/suggestions would be helpful.
This line:
char *str = " hello";
Probably points to read-only memory. Try this:
char str[] = " hello";
(You have some other bugs too, but this change will fix your segfault).
Use a temporary variable rather than your approach for the swap. The compiler will probably use a register for the temporary variable due to optimizations.
Either way, you implemented the swap algorithm wrong. It should be
str[i] = str[i] + str[j];
str[j] = str[i] - str[j];
str[i] = str[i] - str[j];
Page 62 of Kernighan & Ritchie's The C Programming Language shows an algorithm for in-place string reversal with a temporary variable.
Similar to this:
char* rev_string(char* const str)
{
int i, j;
char tmp;
for(i = 0, j = strlen(str)-1; i < j; i++, j--)
{
tmp = str[i];
str[i] = str[j];
str[j] = tmp;
}
return str;
}
This algorithm is easier to understand than the one without a temporary variable, imho.
As for item #3 in your list of questions:
As an interviewer, I would want to see simple, clear, and well structured code. That's impressive. Trickery will not impress me. Especially when it comes to premature optimization. BTW, my solution reverses the string in place with one additional char instead of an int. Impressive? :)
And item #4:
One other test case would be an unterminated string. Is your function robust enough to handle this case? Your function will only be as robust as the least robust part of it. Passing an unterminated string into my solution causes a Segmentation Fault, due to strlen reporting an incorrect string length. Not very robust.
The important point about robustness is, your code might be robust but you have to make sure all other external functions you use are, too!
Where to start...
OK, first you should be aware that your routine reverses a string in place, in other words changes are made to the original buffer.
This means that you could do
int main()
{
char str[] = "hello";
rev(str);
printf("\nthe reverse is %s ...", str);
return 0;
}
and the string would be reversed.
The other alternative is to create a new string that is a reversed copy of the original string. The algorithm is somewhat different, you should be able to do this too.
Next point:
str[i] = (str[i] + str[j]);
str[j] = str[i] - str[j];
str[j] = str[i] - str[j];
is broken. It should be
str[i] = str[i] + str[j];
str[j] = str[i] - str[j];
str[i] = str[i] - str[j];
But, as ~mathepic said, you should do this instead:
temp = str[i];
str[i] = str[j];
str[j] = temp;
Also: codepad makes it difficult to debug your code. Install a compiler and debugger (such as gcc and gdb) on your own computer.
the characters add up to something not defined in ascii and hence I cannot store them back as characters, I am using www.codepad.org [I wonder if it supports just ascii !!] . Is my understanding correct or there is something else to it.
In most C implementations (those that run on 32-bit PCs anyway), a char is an 8-bit integer. An int is a 32-bit integer. When you add or subtract two chars and the result is more than 8 bits, it will "wrap around" to some other value, but this process is reversible.
For instance, 255 + 1 gives 0, but 0 - 1 = 255. (Just an illustrative example.) This means that "I cannot store them back as characters" is not the problem here.
I want to have a robust implementation
You want to show that you take into account the costs and benefits of different design choices. Perhaps it is better to cause a segmentation fault if your routine is supplied with a NULL, because this very quickly alerts the programmer of a bug in his code.
passing empty strings
You must make sure your code works in a case like that.
passing integer
passing integer array instead
You can't pass integers, or an int [] to a function expecting a char *. In C, you cannot tell whether a char * really is a string or something else.
single char string
Make sure your routine works for a single char string, also for both strings with an odd number and an even number of chars.
string of special chars
There are no special chars in C (except, by convention, the null terminator '\0'). However, multi-char sequences are something that must be considered (reversing a UTF-8 string is different from reversing a regular string). However if the question does not specify I don't think you should be concerned about this.
Three final points:
In main(), return 1; usually indicates your program failed. return 0; is more common but return EXIT_SUCCESS; is best, though you may need to #include <stdlib.h>.
Consider using more descriptive variable names.
Consider making a strnrev() function, similar to the strncpy() and similar functions, where the function will not go beyond n characters if the null terminator is not found there.
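A possible shape for such a function is sketched below. strnrev is not a standard library function; the name, the signature, and the choice to tolerate NULL are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical strnrev: reverse at most n characters of str in place.
 * Scanning stops at the terminator or after n bytes, whichever comes
 * first, so an unterminated buffer of known size n is never overrun. */
char *strnrev(char *str, size_t n)
{
    if (str == NULL)
        return NULL;

    size_t len = 0;
    while (len < n && str[len] != '\0')
        len++;

    if (len < 2)                /* empty and single-char strings */
        return str;

    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = str[i];      /* plain temporary-variable swap */
        str[i] = str[j];
        str[j] = tmp;
    }
    return str;
}
```

For a char array the natural call is strnrev(buf, sizeof buf), which bounds the scan by the size of the array.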
If you are going to implement the swap of two characters without a temporary variable (which is a neat trick but not something that you should actually use in practice), it would be prudent to either use the "bitwise exclusive or" instead of addition/subtraction, or use unsigned char instead of char. The char operands are promoted to int, so the addition itself does not overflow, but converting an out-of-range result back to a signed char is implementation-defined in C99. More generally, overflow in signed arithmetic is undefined in the C99 standard, and guess what, gcc started to make use of this undefinedness for optimization purposes. I was just ranting about another case of unwanted optimization in another question.
I get segmentation fault probably because, the characters add up to something not defined in ascii and hence I cannot store them back as characters
I don't think so. They're all just numbers in C (albeit only 1 byte long), but you shouldn't have any problem there.
I think (but I'm not sure) that the problem is with this:
char *str = " hello";
printf("\nthe reverse is %s ...", rev(str));
What you're actually doing is creating the char array " hello", which is a constant array. That means, basically, that you're not supposed to change it. When you call rev, it actually changes the array in-place, so it's trying to assign new values to a constant char.
Since C gives the string literal "hello" the type char[6] rather than const char[6], assigning it to a char * is not treated as a compile-time error. But because "hello" is what's called a "string literal", it's being created as part of the executable file itself, i.e. it's not in the memory your program can freely change. That is why you're actually getting the run-time seg-fault, and not a compile-time error (although your compiler may be able to warn about this, e.g. gcc with -Wwrite-strings).
Thanks all for the reply. Here is the code with changes everyone suggested:
#include <string.h>
char* rev( char* str)
{
int start ,end ,len;
len = strlen(str);
for(start =0,end =len-1; start <len/2 ; start ++, end --)
{
str[start ] = str[start ] + str[end ];
str[end ] = str[start ] - str[end ];
str[start] = str[start ] - str[end ];
}
return str;
}
int main()
{
char str[] = " hello there !";
printf("\n the reverse string is %s ...", rev(str));
return 1;
}
The segmentation fault was because *str was pointing to read-only memory; I changed it to str[]. Thanks Carl Norum for pointing that out.
Any test cases [specifically for API testing] ?
as for testing:
null argument
empty string argument
length 1 string argument
various other lengths - perhaps one long string
you can of course implement the test method with the following strategy:
a generic verify method
verifyEquals( expected, actual ) { ... }
a test method with various cases:
testReverse() {
verifyEquals(NULL, rev(NULL));
verifyEquals("", rev(""));
verifyEquals("a", rev("a"));
verifyEquals("ba", rev("ab"));
verifyEquals("zyx", rev("xyz"));
verifyEquals("edcba", rev("abcde"));
}
You can also refactor the swap "algorithm" into a separate procedure and unit test it as well.
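In C the cases above cannot be passed as string literals, because rev modifies its argument in place. One way around that, sketched here with hypothetical names (reverses_to, and a rev variant that tolerates NULL), is to copy each input into a writable buffer first:

```c
#include <stdbool.h>
#include <string.h>

/* The reversal under test: temporary-variable swap, NULL tolerated. */
char *rev(char *str)
{
    if (str == NULL)
        return NULL;
    size_t len = strlen(str);
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = str[i];
        str[i] = str[len - 1 - i];
        str[len - 1 - i] = tmp;
    }
    return str;
}

/* Hypothetical helper: copy input into a writable buffer, reverse it,
 * and compare with expected. Inputs are limited to 63 characters here. */
bool reverses_to(const char *input, const char *expected)
{
    char buf[64];
    if (strlen(input) >= sizeof buf)
        return false;
    strcpy(buf, input);
    return strcmp(rev(buf), expected) == 0;
}
```

Each case from the list then becomes a single check such as reverses_to("abcde", "edcba"), with rev(NULL) == NULL tested separately.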

Best way to do binary arithmetic in C?

I am learning C and writing a simple program that will take 2 string values assumed to each be binary numbers and perform an arithmetic operation according to user selection:
Add the two values,
Subtract input 2 from input 1, or
Multiply the two values.
My implementation assumes each character in the string is a binary bit, e.g. char bin5[] = "0101";, but it seems too naive an approach to parse through the string a character at a time. Ideally, I would want to work with the binary values directly.
What is the most efficient way to do this in C? Is there a better way to treat the input as binary values rather than scanf() and get each bit from the string?
I did some research but I didn't find any approach that was obviously better from the perspective of a beginner. Any suggestions would be appreciated!
Advice:
There's not much that's obviously better than marching through the string a character at a time and making sure the user entered only ones and zeros. Keep in mind that even though you could write a really fast assembly routine if you assume everything is 1 or 0, you don't really want to do that. The user could enter anything, and you'd like to be able to tell them if they screwed up or not.
It's true that this seems mind-bogglingly slow compared to the couple cycles it probably takes to add the actual numbers, but does it really matter if you get your answer in a nanosecond or a millisecond? Humans can only detect 30 milliseconds of latency anyway.
Finally, it already takes far longer to get input from the user and write output to the screen than it does to parse the string or add the numbers, so your algorithm is hardly the bottleneck here. Save your fancy optimizations for things that are actually computationally intensive :-).
What you should focus on here is making the task less manpower-intensive. And, it turns out someone already did that for you.
Solution:
Take a look at the strtol() manpage:
long strtol(const char *nptr, char **endptr, int base);
This will let you convert a string (nptr) in any base to a long. It checks errors, too. Sample usage for converting a binary string:
#include <stdlib.h>
char buf[MAX_BUF];
get_some_input(buf);
char *err;
long number = strtol(buf, &err, 2);
if (*err) {
// bad input: try again?
} else {
// number is now a long converted from a valid binary string.
}
Supplying base 2 tells strtol to convert binary literals.
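A slightly fuller sketch of the same idea is shown below. parse_binary is a hypothetical wrapper, not part of the C library; besides trailing garbage it also rejects empty input (where strtol consumes nothing) and values that do not fit in a long.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch: parse a base-2 string into *out. Returns false on empty input,
 * trailing garbage, or overflow. parse_binary is an illustrative name. */
bool parse_binary(const char *s, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, 2);

    if (end == s)          /* no digits were consumed */
        return false;
    if (*end != '\0')      /* something other than 0/1 followed */
        return false;
    if (errno == ERANGE)   /* value does not fit in a long */
        return false;

    *out = value;
    return true;
}
```

Note that strtol itself skips leading whitespace and accepts an optional sign, so this wrapper does too; a stricter parser would check the first character before calling it.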
First off, I do recommend that you use stuff like strtol as recommended by tgamblin;
it's better to use things that the lib gives you instead of reinventing the wheel over and over again.
But since you are learning C, I did a little version without strtol.
It's neither fast nor safe, but I did play a little with bit manipulation as an example.
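#include <stdio.h>
#include <string.h>
int main(void)
{
    unsigned int data = 0;
    char str[] = "1001";
    size_t len = strlen(str);
    /* Walk from the last character (least significant bit) to the first,
       without ever stepping before the start of the array. */
    for (size_t i = 0; i < len; i++)
    {
        char c = str[len - 1 - i];
        if (c != '0' && c != '1')
            break; /* stop at the first non-binary character */
        data += (unsigned int)(c - '0') << i;
    }
    printf("data %u\n", data);
    return 0;
}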
In order to get the best performance, you need to distinguish between trusted and untrusted input to your functions.
For example, a function like getBinNum() which accepts input from the user should be checked for valid characters and compressed to remove leading zeroes. First, we'll show a general purpose in-place compression function:
// General purpose compression removes leading zeroes.
void compBinNum (char *num) {
char *src, *dst;
// Find first non-'0' and move chars if there are leading '0' chars.
for (src = dst = num; *src == '0'; src++);
if (src != dst) {
while (*src != '\0')
*dst++ = *src++;
*dst = '\0';
}
// Make zero if we removed the last zero.
if (*num == '\0')
strcpy (num, "0");
}
Then provide a checker function that returns either the passed in value, or NULL if it was invalid:
// Check untested number, return NULL if bad.
char *checkBinNum (char *num) {
char *ptr;
// Check for valid number.
for (ptr = num; *ptr != '\0'; ptr++)
if ((*ptr != '1') && (*ptr != '0'))
return NULL;
return num;
}
Then the input function itself:
#define MAXBIN 256
// Get number from (untrusted) user, return NULL if bad.
char *getBinNum (char *prompt) {
char *num, *ptr;
// Allocate space for the number.
if ((num = malloc (MAXBIN)) == NULL)
return NULL;
// Get the number from the user.
printf ("%s: ", prompt);
if (fgets (num, MAXBIN, stdin) == NULL) {
free (num);
return NULL;
}
// Remove newline if there.
if (num[strlen (num) - 1] == '\n')
num[strlen (num) - 1] = '\0';
// Check for valid number then compress.
if (checkBinNum (num) == NULL) {
free (num);
return NULL;
}
compBinNum (num);
return num;
}
Other functions to add or multiply should be written to assume the input is already valid since it will have been created by one of the functions in this library. I won't provide the code for them since it's not relevant to the question:
char *addBinNum (char *num1, char *num2) {...}
char *mulBinNum (char *num1, char *num2) {...}
If the user chooses to source their data from somewhere other than getBinNum(), you could allow them to call checkBinNum() to validate it.
If you were really paranoid, you could check every number passed in to your routines and act accordingly (return NULL), but that would require relatively expensive checks that aren't necessary.
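For illustration only, an addition that trusts its input in this way might be sketched as follows. add_bin_sketch is a hypothetical stand-in for addBinNum, not the answerer's code; it assumes both arguments are non-empty, contain only '0' and '1', and have no leading zeroes.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of an addition on trusted binary strings (only '0'/'1',
 * non-empty, no leading zeroes). The caller must free() the result. */
char *add_bin_sketch(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;   /* one extra digit for the carry */
    char *sum = malloc(n + 1);
    if (sum == NULL)
        return NULL;

    sum[n] = '\0';
    int carry = 0;
    for (size_t k = 0; k < n; k++) {
        /* Take digits from the right; treat missing digits as 0. */
        int da = k < la ? a[la - 1 - k] - '0' : 0;
        int db = k < lb ? b[lb - 1 - k] - '0' : 0;
        int t = da + db + carry;
        sum[n - 1 - k] = (char)('0' + (t & 1));
        carry = t >> 1;
    }

    /* Drop the leading zero if the carry digit was not needed. */
    if (sum[0] == '0' && n > 1)
        memmove(sum, sum + 1, n);   /* n bytes: n - 1 digits plus '\0' */
    return sum;
}
```

No per-character validation happens here, which is the point of separating trusted from untrusted input: the checks are paid once, at the boundary.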
Wouldn't it be easier to parse the strings into integers, and then perform your maths on the integers?
I'm assuming this is a school assignment, but I'm upvoting you because you appear to be giving it a good effort.
Assuming that a string is a binary number simply because it consists only of digits from the set {0,1} is dangerous. For example, when your input is "11", the user may have meant eleven in decimal, not three in binary. It is this kind of carelessness that gives rise to horrible bugs. Your input is ambiguously incomplete and you should really request that the user specifies the base too.

Resources