Caesar Cipher in C Encryption & Decryption, ASCII values too high


I am working on an encryption/decryption code for C.
Whilst I have the code encrypting and decrypting well enough there are still some errors.
I want the user's input to be between '!' (ASCII value 33) and '~' (ASCII value 126). However, when the user enters a character such as '~' with a fairly high offset/key value, the output lands in the extended ASCII range. Is there any way I can force the output to stay out of the extended range and still encrypt/decrypt properly?
I have messed around with changing the values for "cipher" but it hasn't helped much.
Below is my code:
void Encryption(char* OriginalText, int offset) {
    int i = 0;
    int cipher;
    char encrypt;
    printf("Encrypted string: \n");
    while (OriginalText[i] != '\0') {
        if (OriginalText[i] == ' ') {
            encrypt = ' ';
            printf("%c", encrypt);
            i = i + 1;
        }
        cipher = ((int)OriginalText[i] + (offset % 26) % -26);
        encrypt = (char)(cipher);
        printf("%c", encrypt);
        i = i + 1;
    }
    printf("\n");
}

The '%' operator has higher precedence than '+'.
Therefore (offset % 26) % -26 is evaluated first, and only then is the result added to OriginalText[i]. (In C the result of % takes the sign of its left operand, so % -26 behaves exactly like % 26 here.)
To apply the modulo to the whole sum, add another pair of parentheses in the assignment to cipher:
cipher = (((int)OriginalText[i] + (offset % 26)) % -26);
Keep in mind that this yields a number from 0 to 25, not a printable character, so you still have to add a base character back (for example 'a') before printing it.
Moreover, when you're dealing with ASCII characters, to be on the safe side and prevent annoying bugs, you should treat your I/O as unsigned char, hence:
unsigned char cipher;
unsigned char encrypt;
...
cipher = ((OriginalText[i] + (offset % 26)) % 26);
encrypt = cipher;

You should first convert your character to an index between zero and the size of the alphabet (exclusive), i.e. from 0 to 25 for A-Z. Only then do the modular addition or subtraction. Finally, convert back. The simplest way to do this is to compute character - 'a' (for lowercase characters, of course). Do not perform modular operations on the ASCII values themselves.
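Applied to the printable range from the question, the same idea looks roughly like the sketch below. It assumes an ASCII character set, treats the 94 characters from '!' (33) to '~' (126) as the alphabet, and leaves everything else (including spaces) unchanged; the names shift_char and caesar are hypothetical.

```c
#include <stdio.h>

/* Alphabet: printable ASCII from '!' (33) to '~' (126), i.e. 94 characters.
   Assumes an ASCII execution character set. */
#define PRINTABLE_FIRST '!'
#define PRINTABLE_LAST  '~'
#define PRINTABLE_COUNT (PRINTABLE_LAST - PRINTABLE_FIRST + 1)

/* Shift one character inside [PRINTABLE_FIRST, PRINTABLE_LAST];
   any other character is returned unchanged. */
char shift_char(char c, int offset)
{
    if (c < PRINTABLE_FIRST || c > PRINTABLE_LAST)
        return c;
    int index = c - PRINTABLE_FIRST;         /* 0 .. PRINTABLE_COUNT-1 */
    int shift = offset % PRINTABLE_COUNT;    /* may be negative in C */
    /* Adding PRINTABLE_COUNT keeps the sum non-negative before the final %. */
    index = (index + shift + PRINTABLE_COUNT) % PRINTABLE_COUNT;
    return (char)(PRINTABLE_FIRST + index);
}

/* Encrypt in place with offset; decrypt by calling again with -offset. */
void caesar(char *text, int offset)
{
    for (size_t i = 0; text[i] != '\0'; i++)
        text[i] = shift_char(text[i], offset);
}
```

Because decryption is just the same shift with the offset negated, the index is normalised back into 0..93 before the base character is added, so the output can never leave the printable range.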

Related

How to extract values from a string in hexadecimal in a C program?

I have a hexadecimal string like:
char str[] = "40004A0060007A0034006600";
I want to extract individual values from it like 0x40, 0x00, 0x4A, 0x00 etc.
How can I do it?

Copy the 2 characters of interest into a temporary 3-byte array.
Null-terminate the 3-byte array to turn it into a string.
Call strtoul from stdlib.h on this array, with base 16.
Alternatively, you could decode it manually, since it's a trivial thing to do: mask out the nibbles, subtract the appropriate ASCII values or use a lookup table, then multiply the most significant nibble by 16 and add the least significant one.
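A minimal sketch of the copy-and-strtoul steps, decoding every pair in the string. The function name hex_pairs_to_bytes and its output-array interface are assumptions for illustration; the isxdigit check is added so that strtoul's tolerance of leading spaces and signs cannot slip through.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode a string such as "40004A..." into the bytes 0x40, 0x00, 0x4A, ...
   Writes at most max_out bytes and stops at the first pair that is not
   two hex digits. Returns the number of bytes written. */
size_t hex_pairs_to_bytes(const char *hex, unsigned char *out, size_t max_out)
{
    size_t n = 0;
    size_t len = strlen(hex);

    for (size_t i = 0; i + 1 < len && n < max_out; i += 2) {
        if (!isxdigit((unsigned char)hex[i]) || !isxdigit((unsigned char)hex[i + 1]))
            break;

        char pair[3];              /* two hex digits plus the terminator */
        memcpy(pair, hex + i, 2);
        pair[2] = '\0';            /* now a valid C string for strtoul */

        out[n++] = (unsigned char)strtoul(pair, NULL, 16);
    }
    return n;
}
```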

NOTE: This answer applies to revision 3 of the question. Meanwhile, the question has been modified, thereby invalidating option #1 of my answer. As pointed out in the comments section of the question, this was not OP's fault, though.
You have two options:
Convert the string to an integer type, for example using the function strtoul or strtoull, and then use bit-shifting (>> operator) and bit-masking (& operator) to obtain the desired values. However, due to limitations in the range of values that the data types unsigned long and unsigned long long can represent, this option is only guaranteed to work with up to 8 hexadecimal digits with strtoul and 16 digits with strtoull. EDIT: Meanwhile, the question has been modified in such a way that the string is longer than 16 digits, so this solution is no longer viable.
Obtain the desired values by looking them up directly in the string. For example, if you are looking for the 3rd group of hexadecimal digits, then you will find them using str[4] and str[5]. This will give you two character values. If you want to convert these two hexadecimal characters to the number that they represent, then you can create a string from these two values and then use strtoul on that string.
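For inputs short enough for option 1, a sketch of the shift-and-mask approach might look like this. It assumes an even number of digits and at most 16 of them (unsigned long long is guaranteed to hold at least 64 bits); the name byte_at is hypothetical, and index 0 is the leftmost pair.

```c
#include <stdlib.h>
#include <string.h>

/* Return the byte at position `index` (0 = leftmost pair) of a hex string
   with an even number of digits, at most 16, or -1 if out of range. */
int byte_at(const char *hex, size_t index)
{
    size_t digits = strlen(hex);
    size_t bytes = digits / 2;
    if (digits > 16 || index >= bytes)
        return -1;

    unsigned long long value = strtoull(hex, NULL, 16);
    /* The leftmost pair is the most significant byte of the value. */
    unsigned shift = (unsigned)(8 * (bytes - 1 - index));
    return (int)((value >> shift) & 0xFF);
}
```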

Since you seem to be a beginner, I broke the task up into its constituent parts. This is a very simple hex dump facility where each step your code needs to take is its own routine. It is a quick and dirty and rather imperfect implementation, but understanding how to improve it will help you learn and write your own.
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
int
nibble(uint8_t ch) {
    if ((ch >= '0') && (ch <= '9')) {
        return ch - '0';
    }
    if ((ch >= 'A') && (ch <= 'F')) {
        return 10 + (ch - 'A');
    }
    if ((ch >= 'a') && (ch <= 'f')) {
        return 10 + (ch - 'a');
    }
    /* should never get here if isxdigit was called first */
    return -1;
}
int
next_byte(const char *in)
{
    uint8_t hi = 16 * nibble(*in);
    uint8_t lo = nibble(*(in + 1));
    return hi + lo;
}
int points_to_byte(const char *in) {
    /* cast to unsigned char: isxdigit is undefined for negative char values */
    return ((*in) && isxdigit((unsigned char)*in))
        && (*(in + 1)) && isxdigit((unsigned char)*(in + 1));
}
void
dump(const char *in) {
    /* Decide what to do with input that is not a string of hex bytes */
    for (int i = 0; points_to_byte(in + i); i += 2) {
        printf("%d\n", next_byte(in + i));
    }
}
int
main(int argc, char *argv[]) {
    if (argc < 2) {
        puts("Need hex strings as arguments");
    }
    for (int i = 1; i < argc; ++i) {
        dump(argv[i]);
    }
}
When I compile this into an executable called t and run it with your input, this is the output I get:
$ ./t 400004a005b002000113efb29f73f57589343e70e5244162edf312e303030322e313420200043472d58585858000000000032303139303833585858585858000000505230474C5043343554334C3343
64
0
4
160
...
67
52
195
52
You also said you want to convert it into another string like:
char str1[] = "0x40,0x00,0x4A,0x00,0x60";
Since you do not control the input string, you are going to need to malloc the buffer for the output. That and storing the transformed output is left as an exercise.
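One possible solution to that exercise, offered only as a sketch: it assumes the input has already been validated as an even number of hex digits, and it copies each pair verbatim rather than re-formatting it, so the letter case of the input is preserved. to_c_list is a hypothetical name, and the caller must free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Turn "40004A" into "0x40,0x00,0x4A".
   Assumes `in` holds an even number of hex digits (validate it first).
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *to_c_list(const char *in)
{
    size_t pairs = strlen(in) / 2;
    /* Each pair needs "0x" plus 2 digits, a comma between pairs, and a
       terminator: 4 * pairs + (pairs - 1) + 1 == 5 * pairs bytes. */
    size_t size = pairs ? 5 * pairs : 1;
    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < pairs; i++) {
        if (i > 0)
            *p++ = ',';
        *p++ = '0';
        *p++ = 'x';
        *p++ = in[2 * i];
        *p++ = in[2 * i + 1];
    }
    *p = '\0';
    return out;
}
```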

Do char's in C have pre-assigned zero indexed values?

Sorry if my title is a little misleading; I am still new to a lot of this, but:
I recently worked on a small cipher project where the user can give the program an argument at the command line, but it must be alphabetical. (Ex: ./file abc)
This argument will then be used in a formula to encipher a message of plain text you provide. I got the code to work, thanks to a friend's help, but I'm not 100% sure about one specific part of this formula.
#include <stdio.h>
#include <cs50.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <ctype.h>
int main (int argc, string argv[])
{
    //Make sure exactly one argument (the key) was given
    if (argc != 2)
    {
        printf("Please Submit a Valid Argument.\n");
        return 1;
    }
    //Store the given argument (our key) inside a string var 'k' and check if it is alpha
    string k = (argv[1]);
    //Store how long the key is
    int kLen = strlen(k);
    //Tell the user we are checking their key
    printf("Checking key validation...\n");
    //Pause the program for 2 seconds
    sleep(2);
    //Check to make sure the key submitted is alphabetical
    for (int h = 0, strlk = strlen(k); h < strlk; h++)
    {
        if isalpha(k[h])
        {
            printf("Character %c is valid\n", k[h]);
            sleep(1);
        }
        else
        {
            //Telling the user the key is invalid and returning them to the console
            printf("Key is not alphabetical, please try again!\n");
            return 0;
        }
    }
    //Store the user's soon-to-be enciphered text in a string var 'pt'
    string pt = get_string("Please enter the text to be enciphered: ");
    //A prompt that the encrypted text will display on
    printf("Printing encrypted text: ");
    sleep(2);
    //Encipher Function
    for(int i = 0, j = 0, strl = strlen(pt); i < strl; i++)
    {
        //Get the letter 'key'
        int lk = tolower(k[j % kLen]) - 'a';
        //If the char is uppercase, run the V formula and increment j by 1
        if isupper(pt[i])
        {
            printf("%c", 'A' + (pt[i] - 'A' + lk) % 26);
            j++;
        }
        //If the char is lowercase, run the V formula and increment j by 1
        else if islower(pt[i])
        {
            printf("%c", 'a' + (pt[i] - 'a' + lk) % 26);
            j++;
        }
        //If the char is a symbol just print said symbol
        else
        {
            printf("%c", pt[i]);
        }
    }
    printf("\n");
    printf("Closing Script...\n");
    return 0;
}
The encipher function:
It uses 'A' as a placeholder char, but does 'A' hold a zero-indexed value automatically? (B = 1, C = 2, ...)

In C, character literals like 'A' are of type int, and represent whatever integer value encodes the character A on your system. On the vast majority of systems, which use ASCII or an ASCII-compatible encoding, that's the number 65. On an IBM mainframe using EBCDIC, it is something else (193). You'll notice that the code subtracts 'A' to make 0-based values.
This does make the assumption that the letters A-Z occupy 26 consecutive codes. This is true of ASCII (A=65, B=66, etc.), but not of all codes, and not guaranteed by the language.

does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)
No. Strictly conforming C code cannot depend on any property of the character encoding other than the digits 0-9 being represented consecutively, even though the common ASCII character set does also represent the letters consecutively.
The only guarantee regarding character sets is per 5.2.1 Character sets, paragraph 3 of the C standard:
... the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous...
Character sets such as EBCDIC don't represent letters consecutively.

char is a numeric type that happens to also often be used to represent visible characters (or special non-visible pseudo-characters). 'A' is a value (with actual type int) that can be converted to a char without overflow or underflow. That is, it's really some number, but you usually don't need to know what number, since you generally use a particular char value either as just a number or as just a character, not both.
But this program is using char values in both ways, so it somewhat does matter what the numeric values corresponding to visible characters are. One way it's very often done, but not always, is using the ASCII values which are numbered 0 to 127, or some other scheme which uses those values plus more values outside that range. So for example, if the computer uses one of those schemes, then 'A'==65, and 'A'+1==66, which is 'B'.
This program is assuming that all the lowercase Latin-alphabet letters have numeric values in consecutive order from 'a' to 'z', and all the uppercase Latin-alphabet letters have numeric values in consecutive order from 'A' to 'Z', without caring exactly what those values are. This is true of ASCII, so it will work on many kinds of machines. But there's no guarantee it will always be true!
C does guarantee that the ten digit characters from '0' to '9' are in consecutive order, which means that if n is an integer from zero to nine inclusive, then n + '0' is the character for displaying that digit, and if c is such a digit character, then c - '0' is the number from zero to nine it represents. But that's the only guarantee the C language makes about the relative order of character values.
For one counter-example, see EBCDIC, which is not in much use now, but was used on some older computers, and C supports it. Its alphabetic characters are arranged in clumps of consecutive letters, but not with all 26 letters of each case all together. So the program would give incorrect results running on such a computer.
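If the program has to work even where letters are not contiguous, one portable alternative (sketched here as an assumption, not as what the original program does) is to look each letter up in a string literal instead of subtracting 'a' or 'A'. The table names and shift_letter are hypothetical.

```c
#include <string.h>

static const char lower_letters[] = "abcdefghijklmnopqrstuvwxyz";
static const char upper_letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Shift a Latin letter by `key` positions, keeping its case; any other
   character is returned unchanged. The alphabet position comes from a
   table lookup, so this does not assume 'a'..'z' are consecutive codes. */
char shift_letter(char ch, int key)
{
    const char *table = lower_letters;
    const char *p;

    if (ch == '\0')            /* strchr would match the terminator */
        return ch;
    p = strchr(lower_letters, ch);
    if (p == NULL) {
        table = upper_letters;
        p = strchr(upper_letters, ch);
    }
    if (p == NULL)
        return ch;             /* not a Latin letter: pass it through */

    int idx = (int)(p - table);                 /* 0..25 in any encoding */
    int shifted = ((idx + key) % 26 + 26) % 26; /* also handles negative keys */
    return table[shifted];
}
```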

Sequentiality is only one aspect of concern.
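Proper use of isalpha(ch) is another, and OP's code does not get it quite right.
isalpha(ch) expects ch to be in the range of unsigned char or EOF. k[h] is a char, so its value could be negative, which is undefined behavior. Ensure a non-negative value with:
// if isalpha(k[h])
if (isalpha((unsigned char) k[h]))
The added parentheses matter too: if isalpha(...) without them only compiles when isalpha happens to be a macro that expands to a parenthesized expression, which the standard does not guarantee.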

Mixed UTF-16 and ASCII string

I have strings that mix ASCII and other UTF-16 characters, and the main problem is that I need to split them into individual characters.
For example, assuming we're on Windows, where the default wide-character encoding is (in most cases) UTF-16:
const wchar_t msg[] = L"AД诶B";
I have defined a total of 4 characters.
A = 2 bytes.
Д = 2 bytes.
诶 = 4 bytes.
B = 2 bytes.
I need to take the 4th character from the string (ASCII B), but if I do msg[4] it will split the Chinese character and return the wrong result. How can I solve that without any additional libraries?

As you've already discovered, UTF-16 is really a variable-width encoding. So, you will have to scan across the string to perform accurate character indexing.
Luckily, it is very easy to tell if a character is part of a multi-word sequence: the only multiword sequences in UTF-16 (as currently defined) are surrogate pairs: a word in the range [D800-DBFF] followed by a word in the range [DC00-DFFF]. So, when you encounter such a sequence, treat it as a single character.
This may work for your needs:
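#include <stddef.h>
#include <stdint.h>
/* uint32_t stands in for ICU's UChar32, since no extra libraries are wanted.
   Assumes a 16-bit wchar_t holding UTF-16 (as on Windows).
   Returns the code point at character position `index`, or 0 if the string ends first. */
uint32_t utf16_char_at_index(const wchar_t *s, size_t index) {
    while (s[0] != 0) {
        if (s[0] >= 0xd800 && s[0] <= 0xdbff) {
            /* First half of surrogate pair; check next half */
            if (s[1] >= 0xdc00 && s[1] <= 0xdfff) {
                /* surrogate pair: skip or return */
                if (index == 0) {
                    /* supplementary code points start at U+10000 */
                    return 0x10000 + (((uint32_t)s[0] - 0xd800) << 10) + ((uint32_t)s[1] - 0xdc00);
                }
                s += 2;
                index--;
                continue;
            }
            /* Otherwise, decoding error...may want to flag error here */
        }
        if (index == 0) {
            return s[0];
        }
        s++;
        index--;
    }
    return 0; /* index is past the end of the string */
}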
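The arithmetic in the surrogate branch is worth checking on its own: a pair maps to the code point 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00). For example, D83D DE00 gives 0x10000 + (0x3D << 10) + 0x200 = 0x1F600. A small sketch of just that step, with hypothetical helper names:

```c
#include <stdint.h>

/* True for a leading (high) surrogate, D800..DBFF. */
int is_high_surrogate(uint16_t w) { return w >= 0xD800 && w <= 0xDBFF; }

/* True for a trailing (low) surrogate, DC00..DFFF. */
int is_low_surrogate(uint16_t w) { return w >= 0xDC00 && w <= 0xDFFF; }

/* Combine a valid high/low surrogate pair into a code point
   in the range U+10000..U+10FFFF. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10) + ((uint32_t)lo - 0xDC00u);
}
```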

Random text obfuscation algorithm failure

I have been experimenting with a simple XOR-based text obfuscation algorithm. Supposedly, when the algorithm is run twice in succession, I should get back the original input, yet in my implementation that only happens sometimes. Here's my code, with some random text to demonstrate the problem:
#include <stdio.h>
void obfuscate(char *text) {
    char i = 0, p = 0;
    while (text[i] != 0) {
        text[i] = (text[i] ^ (char)0x41 ^ p) + 0xfe;
        p = i++;
    }
}
int main(int argc, char **argv) {
    char text[] = "Letpy,Mprm` Nssl'w$:0==!";
    printf("%s\n", text);
    obfuscate(text);
    printf("%s\n", text);
    obfuscate(text);
    printf("%s\n", text);
    return 0;
}
How can I fix this algorithm so that it is indeed its own inverse? Any suggestions to improve the level of obfuscation?

I see two problems here:
The operation + 0xfe is not its own inverse. If you remove it and leave only XORs, each byte will be restored to its original value as expected.
A more subtle problem: Encrypting the text could create a zero byte, which will truncate the text because you use null-terminated strings. The best solution is probably to store the text length separately instead of null-terminating the encrypted text.
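A sketch that applies both fixes: only XOR operations, so running it twice restores the input, and an explicit length so that a zero byte in the output cannot truncate anything. It keeps the question's function name but changes its signature, and keeping the ^ p term (p being the previous index) is an assumption.

```c
#include <stddef.h>

/* XOR-only variant of the question's obfuscate(): every step is an XOR with
   a value that depends only on the position, so applying it twice restores
   the original bytes. The length is explicit because the output may
   contain zero bytes, which would cut a C string short. */
void obfuscate(unsigned char *data, size_t len)
{
    unsigned char p = 0;               /* previous index, like p = i++ in the question */
    for (size_t i = 0; i < len; i++) {
        data[i] ^= (unsigned char)(0x41 ^ p);
        p = (unsigned char)i;
    }
}
```

Since the obfuscated data is binary, print it as hex or with fwrite rather than with %s.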

You're doing more than just a simple XOR here (if you left it at text[i] = text[i] ^ (char)0x41 it would work; you could even leave in the ^ p if you want, but the + 0xfe breaks it).
Why do you want to use this kind of text obfuscation? Common methods of non-secure obfuscation are Base64 (needs separate encode and decode) and ROT13 (apply a second time to reverse).

First, to decode
text[i] = (text[i] ^ (char)0x41 ^ p) + 0xfe;
you need its inverse function, that would be
text[i] = (text[i] - 0xfe) ^ (char)0x41 ^ p;
Second, char i only works for short strings (at most 127 characters when char is signed); use int or size_t instead.
And last (and most important!), after such an "obfuscation" the string could contain a zero byte before its original end, so you should store the original length separately or ensure that no zero bytes can be produced.
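Keeping the + 0xfe means encoding and decoding become two different functions. A sketch of that pair, assuming unsigned char so the addition and subtraction wrap modulo 256 in a well-defined way, and an explicit length; obf_encode and obf_decode are hypothetical names.

```c
#include <stddef.h>

/* Forward transform from the question, done on unsigned bytes so that
   "+ 0xfe" simply wraps around modulo 256. */
void obf_encode(unsigned char *data, size_t len)
{
    unsigned char p = 0;
    for (size_t i = 0; i < len; i++) {
        data[i] = (unsigned char)((data[i] ^ 0x41 ^ p) + 0xfe);
        p = (unsigned char)i;
    }
}

/* Inverse transform: undo the addition first, then the XORs. */
void obf_decode(unsigned char *data, size_t len)
{
    unsigned char p = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char t = (unsigned char)(data[i] - 0xfe); /* wraps mod 256 */
        data[i] = (unsigned char)(t ^ 0x41 ^ p);
        p = (unsigned char)i;
    }
}
```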

Why "add +0xfe"? That is (at least one) source of your non-reversibility.
Also note that p = i++ stores the previous index, not the previous character, so the XOR with p depends only on the position in the string; on its own, that XOR part is still reversible.

How to convert a hexadecimal number into Ascii in C

I plan to make a program like this:
loop
read first character
read second character
make a two-digit hexadecimal number from the two characters
convert the hexadecimal number into decimal
display the ascii character corresponding to that number.
end loop
The problem I'm having is turning the two characters into a hexadecimal number and then turning that into a decimal number. Once I have a decimal number I can display the ascii character.

Unless you really want to write the conversion yourself, you can read the hex number with [f]scanf using the %x conversion, or you can read a string, and convert with (for one possibility) strtol.
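A minimal sketch of the scanf route, using sscanf on an in-memory string (scanf with the same format reads from stdin the same way). The field width 2 makes each %x consume one pair of digits, and %n reports how far to advance. The name decode_hex_text and its caller-supplied output buffer are assumptions for illustration.

```c
#include <stdio.h>

/* Decode pairs of hex digits from `hex` into characters in `out`, which
   must have room for strlen(hex) / 2 + 1 chars. Stops at the first pair
   %2x cannot parse. Note that, like all scanf conversions, %x skips
   leading whitespace. */
void decode_hex_text(const char *hex, char *out)
{
    unsigned int value;
    int used;
    size_t n = 0;

    /* %2x reads at most two hex digits; %n stores the chars consumed. */
    while (sscanf(hex, "%2x%n", &value, &used) == 1) {
        out[n++] = (char)value;
        hex += used;
    }
    out[n] = '\0';
}
```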
If you do want to do the conversion yourself, you can convert individual digits something like this:
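/* needs <ctype.h> and <stdio.h>; the casts keep negative char values away from <ctype.h> */
if (isxdigit((unsigned char)ch)) {
    if (isdigit((unsigned char)ch))
        value = (16 * value) + (ch - '0');
    else
        value = (16 * value) + (tolower((unsigned char)ch) - 'a' + 10);
} else {
    fprintf(stderr, "%c is not a valid hex digit\n", ch);
}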

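char a, b;
// ...read them in however you like, e.g. with getchar()
// validation (the casts keep negative char values away from <ctype.h>)
if (!isxdigit((unsigned char)a) || !isxdigit((unsigned char)b))
    fatal_error(); // placeholder for your own error handling
a = tolower((unsigned char)a);
b = tolower((unsigned char)b);
int a_digit_value = a >= 'a' ? (a - 'a' + 10) : a - '0';
int b_digit_value = b >= 'a' ? (b - 'a' + 10) : b - '0';
int value = a_digit_value * 0x10 + b_digit_value;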

Put your two characters into a char array, null-terminate it, and use strtol() from <stdlib.h> to convert it to an integer.
#include <stdlib.h>
char s[3];
s[0] = '2';
s[1] = 'a';
s[2] = '\0';
long i = strtol(s, NULL, 16); /* i == 42 */