I was writing a Caesar cypher in C for practice and I made a functioning one, but it shows a strange behavior. The code is as follows:
#define lenght 18
char * caesar ( char * cyphertext, int key){
    static char result [lenght];
    for ( int i= 0; i < lenght ; i++){
        result [i] =(char)(((int) cyphertext[i]) + key) % 255;
    }
    return result;
}
int main(){
    char * text = caesar("Hola buenas tardes", 23 );
    printf("%s \n" , text );
    char * check = caesar( text , 256 - 23);
    printf("%s \n" , check);
    return 0;
}
The encrypted version is _x7y|x7x{|; a shorter number; but when I run the second Caesar cypher with the decryption
key, it decrypts it back to the original text with no problem. I have been looking around, and it probably has to do with how
the characters are stored. I would be very grateful for any help.
The encrypted version is _x7y|x7x{|; a shorter number;
No, what printf prints is the above. Or even more precisely, that's how your terminal displays what printf prints. If you want to be certain exactly what the encrypted version is then you should run your program in a debugger, and use it to examine the bytes of the encoded version.
Your approach will encode some character codes from the printable ASCII range (codes 32 - 126 decimal) as codes outside that range. How your terminal handles those bytes depends on your configuration and environment, but if it expects UTF-8-encoded data then it will trip over invalid code sequences in the output, and if it expects an ISO-8859 encoding then some of the output codes will be interpreted as control characters from the C1 set. There are other possibilities.
Usually, a Caesar-cypher program is careful to map all printable characters to other printable characters, leaving others alone. The typical academic exercise is even more narrowly scoped, asking for a program that maps only the upper- and lowercase Latin letters, preserving case, and leaves all others (punctuation, digits, control characters) alone. This is, accordingly, left as an exercise.
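For reference, here is a minimal sketch of that narrower letters-only variant, assuming an ASCII-compatible character set in which 'a'..'z' and 'A'..'Z' are contiguous. The name caesar_letters is made up for illustration, and unlike the code in the question it works in place, so it needs a writable array rather than a string literal.

```c
#include <stddef.h>

/* Shift only the 26 unaccented Latin letters, preserving case.
   Assumes an ASCII-compatible execution character set.
   caesar_letters is a hypothetical name, not a library function. */
void caesar_letters(char *text, int key)
{
    /* Normalize the key into 0..25 so that negative keys decrypt. */
    int shift = ((key % 26) + 26) % 26;

    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        if (c >= 'a' && c <= 'z')
            text[i] = (char)('a' + (c - 'a' + shift) % 26);
        else if (c >= 'A' && c <= 'Z')
            text[i] = (char)('A' + (c - 'A' + shift) % 26);
        /* Spaces, digits, punctuation and control characters are left alone. */
    }
}
```

With char s[] = "Hola buenas tardes"; a call to caesar_letters(s, 23) leaves s holding "Elix yrbkxp qxoabp", every byte of which is printable, and caesar_letters(s, -23) restores the original.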
Don't use printf with %s to display the ciphertext: it writes the raw bytes, many of which are unprintable, and leaves it to the terminal to make sense of them. Consider converting it to a hexadecimal string instead.
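A minimal sketch of such a conversion, assuming the caller knows how many bytes the ciphertext holds (18 in the question). bytes_to_hex is a hypothetical helper, not a standard function.

```c
#include <stdio.h>
#include <stddef.h>

/* Format len bytes of src as lowercase hex pairs separated by spaces,
   e.g. "48 6f 6c 61". dst must have room for 3 * len bytes (1 if len is 0).
   bytes_to_hex is a hypothetical helper. */
void bytes_to_hex(const char *src, size_t len, char *dst)
{
    dst[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        if (i > 0)
            *dst++ = ' ';
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        dst += sprintf(dst, "%02x", (unsigned)(unsigned char)src[i]);
    }
}
```

Passing the length explicitly matters here, because the encrypted buffer may contain a zero byte anywhere and is not reliably a C string.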
Related
I am trying to read non-printable characters from a text file, print out the characters' ASCII code, and finally write these non-printable characters into an output file.
However, I have noticed that for every non-printable character I read, there is always an extra non-printable character existing in front of what I really want to read.
For example, the character I want to read is "§".
And when I print out its ASCII code in my program, instead of printing just "167", it prints out "194 167".
I looked it up in the debugger and saw "Â§" in the char array. But I don't have "Â" anywhere in my input file.
screenshot of debugger
And after I write the non-printable character into my output file, I have noticed that it is also "Â§", not just "§".
There is an extra character being attached to every single non-printable character I read. Why is this happening? How do I get rid of it?
Thanks!
Code as follows:
case 1:
    mode = 1;
    FILE *fp;
    fp = fopen ("input2.txt", "r");
    int charCount = 0;
    while(!feof(fp)) {
        original_message[charCount] = fgetc(fp);
        charCount++;
    }
    original_message[charCount - 1] = '\0';
    fclose(fp);
    k = strlen(original_message);//split the original message into k input symbols
    printf("k: \n%lld\n", k);
    printf("ASCII code:\n");
    for (int i = 0; i < k; i++)
    {
        ASCII = original_message[i];
        printf("%d ", ASCII);
    }
C's getchar (and getc and fgetc) functions are designed to read individual bytes. They won't directly handle "wide" or "multibyte" characters such as occur in the UTF-8 encoding of Unicode.
But there are other functions which are specifically designed to deal with those extended characters. In particular, if you wish, you can replace your call to fgetc(fp) with fgetwc(fp), and then you should be able to start reading characters like § as themselves.
You will have to #include <wchar.h> to get the prototype for fgetwc. And you may have to add the call
setlocale(LC_CTYPE, "");
at the top of your program to synchronize your program's character set "locale" with that of your operating system.
Not your original code, but I wrote this little program:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main()
{
    wint_t c;   /* fgetwc returns wint_t, which can also hold WEOF */
    setlocale(LC_CTYPE, "");
    while((c = fgetwc(stdin)) != WEOF)
        printf("%lc %d\n", c, (int)c);
}
When I type "A", it prints A 65.
When I type "§", it prints § 167.
When I type "Ƶ", it prints Ƶ 437.
When I type "†", it prints † 8224.
Now, with all that said, reading wide characters using functions like fgetwc isn't the only or necessarily even the best way of dealing with extended characters. In your case, it carries a number of additional consequences:
Your original_message array is going to have to be an array of wchar_t, not an array of char.
Your original_message array isn't going to be an ordinary C string — it's a "wide character string". So you can't call strlen on it; you're going to have to call wcslen.
Similarly, you can't print it using %s, or its characters using %c. You'll have to remember to use %ls or %lc.
So although you can convert your entire program to use "wide" strings and "w" functions everywhere, it's a ton of work. In many cases, and despite anomalies like the one you asked about, it's much easier to use UTF-8 everywhere, since it tends to Just Work. In particular, as long as you don't have to pick a string apart and work with its individual characters, or compute the on-screen display length of a string (in "characters") using strlen, you can just use plain C strings everywhere, and let the magic of UTF-8 sequences take care of any non-ASCII characters your users happen to enter.
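If you stay with plain char strings and only need to know how many characters (rather than bytes) a UTF-8 string holds, one common trick is to skip the continuation bytes. This is a sketch that assumes the input is valid UTF-8; utf8_length is a made-up name.

```c
#include <stddef.h>

/* Count code points in a null-terminated UTF-8 string by counting every
   byte that is NOT a continuation byte (continuation bytes look like
   10xxxxxx). Assumes valid UTF-8; utf8_length is a hypothetical helper. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the two bytes 194 167 that encode "§", strlen reports 2 while utf8_length reports 1. Note that this counts code points, which is still not the same as on-screen width once combining characters are involved.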
I've been trying to make a program on Vernam Cipher which requires me to XOR two strings. I tried to do this program in C and have been getting an error. The lengths of the two strings are the same.
#include<stdio.h>
#include<string.h>
int main()
{
    printf("Enter your string to be encrypted ");
    char a[50];
    char b[50];
    scanf("%s",a);
    printf("Enter the key ");
    scanf("%s",b);
    char c[50];
    int q=strlen(a);
    int i=0;
    for(i=0;i<q;i++)
    {
        c[i]=(char)(a[i]^b[i]);
    }
    printf("%s",c);
}
Whenever I run the code, I get output as ????? in boxes. What is the method to XOR these two strings ?
I've been trying to make a program on Vernam Cipher which requires me to XOR two strings
Yes, it does, but that's not the only thing it requires. The Vernam cipher involves first representing the message and key in the ITA2 encoding (also known as Baudot-Murray code), and then computing the XOR of each pair of corresponding character codes from the message and key streams.
Moreover, to display the result in the manner you indicate wanting to do, you must first convert it from ITA2 to the appropriate character encoding for your locale, which is probably a superset of ASCII.
The transcoding to and from ITA2 is relatively straightforward, but not so trivial that I'm inclined to write them for you. There is a code chart at the ITA2 link above.
Note also that ITA2 is a stateful encoding that includes shift codes and a null character. This implies that the enciphered message may contain non-printing characters, which could cause some confusion, including a null character, which will be misinterpreted as a string terminator if you are not careful. More importantly, encoding in ITA2 may increase the length of the message as a result of a need to insert shift codes.
Additionally, as a technical matter, if you want to treat the enciphered bytes as a C string, then you need to ensure that it is terminated with a null character. On a related note, scanf() will do that for the strings it reads, which uses one character, leaving you only 49 each for the actual message and key characters.
What is the method to XOR these two strings ?
The XOR itself is not your problem. Your code for that is fine. The problem is that you are XORing the wrong values, and (once the preceding is corrected) outputting the result in a manner that does not serve your purpose.
Whenever I run the code, I get output as ????? in boxes...
XORing two printable characters does not always result in a printable value.
Consider the following:
the ^ operator operates at the bit level.
there is a limited range of values that are printable. (from here):
Control Characters (0–31 & 127): Control characters are not printable characters. They are used to send commands to the PC or the
printer and are based on telex technology. With these characters, you
can set line breaks or tabs. Today, they are mostly out of use.
Special Characters (32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters
nor numbers. These include punctuation or technical, mathematical
characters. ASCII also includes the space (a non-visible but printable
character), and, therefore, does not belong to the control characters
category, as one might suspect.
Numbers (48–57): These numbers include the ten Arabic numerals from 0-9.
Letters (65–90 / 97–122): Letters are divided into two blocks, with the first group containing the uppercase letters and the second
group containing the lowercase.
Using the following two strings:
char str1[] = {"asdf"};
char str2[] = {"jkl;"};
The following code demonstrates XORing the elements of the strings:
#include <stdio.h>
int main(void)
{
    char str1[] = {"asdf"};
    char str2[] = {"jkl;"};
    /* sizeof counts the terminating null character too, so this loops 5 times */
    for(size_t i=0;i<sizeof(str1)/sizeof(str1[0]);i++)
    {
        printf("%d ^ %d: %d\n", str1[i],str2[i], str1[i]^str2[i]);
    }
    getchar();
    return 0;
}
While all of the input characters are printable (except the terminating null character), not all of the XOR results of corresponding characters are:
97 ^ 106: 11 //not printable
115 ^ 107: 24 //not printable
100 ^ 108: 8 //not printable
102 ^ 59: 93
0 ^ 0: 0
This is why you are seeing the odd output. While all of the values may be completely valid for your purposes, they are not all printable.
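One way to work with such values is to stop treating the result as a C string: keep the length separately and display the bytes as numbers. The sketch below does a plain byte-wise XOR (not the ITA2-based Vernam scheme), and xor_bytes is a hypothetical helper name.

```c
#include <stddef.h>

/* XOR len bytes of msg with the corresponding bytes of key into out.
   The caller keeps track of len, because the result may contain zero
   bytes and therefore cannot be handled as a null-terminated string.
   xor_bytes is a hypothetical helper. */
void xor_bytes(const char *msg, const char *key, char *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (char)(msg[i] ^ key[i]);
}
```

Calling it on "asdf" and "jkl;" with len 4 yields the bytes 11, 24, 8 and 93, and XORing that result with the same key gives back "asdf". To show the encrypted bytes, print each one with a numeric format such as "%02x" instead of "%s".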
To investigate how C deals with UTF-8 / Unicode characters, I did this little experiment.
It's not that I'm trying to solve anything particular at the moment, but I know that Java deals with the whole encoding situation in a transparent way to the coder and I was wondering how C, that is a lot lower level, treats its characters.
The following test seems to indicate that C is entirely ignorant of encoding concerns, and that it's just up to the display device to know how to interpret the sequence of chars when showing them on screen. The later tests (when printing the characters surrounded by _) seem particularly telling.
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "João"; // ã does not belong to the standard
                         // (or extended) ASCII characters
    printf("number of chars = %d\n", (int)strlen(str)); // 5
    int len = 0;
    while (str[len] != '\0')
        len++;
    printf("number of bytes = %d\n", len); // 5
    for (int i = 0; i < len; i++)
        printf("%c", str[i]);
    puts("");
    // "João"
    for (int i = 0; i < len; i++)
        printf("_%c_", str[i]);
    puts("");
    // _J__o__�__�__o_ -> wow!!!
    str[2] = 'X'; // let's change this special character
                  // and see what happens
    for (int i = 0; i < len; i++)
        printf("%c", str[i]);
    puts("");
    // JoX�o
    for (int i = 0; i < len; i++)
        printf("_%c_", str[i]);
    puts("");
    // _J__o__X__�__o_
}
I know how ASCII / UTF-8 work; what I'm really unsure about is at what moment the characters get interpreted as "compound" characters, as it seems that C just treats them as dumb bytes. What's really the science behind this?
The printing isn't a function of C, but of the display context, whatever that is. For a terminal there are UTF-8 decoding functions which map the raw character data into the character to be shown on screen using a particular font. A similar sort of display logic happens in graphical applications, though with even more complexity relating to proportional font widths, ligatures, hyphenation, and numerous other typographical concerns.
Internally this is often done by decoding UTF-8 into some intermediate form first, like UTF-16 or UTF-32, for look-up purposes. In extremely simple terms, each character in a font has a Unicode identifier. In practice this is a lot more complicated as there is room for character variants, and multiple characters may be represented by a singular character in a font, like "fi" and "ff" ligatures. Accented characters like "ç" may be a combination of characters, as allowed by Unicode. That's where things like Zalgo text come about: you can often stack a truly ridiculous number of Unicode "combining characters" together into a single output character.
Typography is a complex world with complex libraries required to render properly.
You can handle UTF-8 data in C, but for anything beyond passing it through you need special libraries. The ordinary string functions that C ships with in the Standard Library don't understand it: to them it's just a series of bytes, and they assume a byte is equivalent to a character for the purposes of length. That is, strlen and the like work with bytes as a unit, not characters.
C++, as an example, has much better support for this distinction between byte and character. Other languages have even better support, with languages like Swift having exceptional support for UTF-8 specifically and Unicode in general.
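To make the "decoding UTF-8 into some intermediate form" step concrete, here is a simplified sketch of turning one UTF-8 sequence into a UTF-32 code point, which is roughly what a terminal has to do before it can look up a glyph. It is an illustration only: utf8_decode is a made-up name, and the function does not reject overlong encodings or surrogates as a production decoder would.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s into *cp. Returns the number of
   bytes consumed, or 0 if the bytes are not a well-formed sequence.
   Simplified: overlong forms and surrogates are not rejected.
   utf8_decode is a hypothetical helper. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t n;
    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;   /* stray continuation byte or invalid lead byte */

    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;   /* not a continuation byte (this also catches an early '\0') */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}
```

Fed the two bytes 0xC3 0xA3 from the "João" example, it consumes 2 bytes and produces 0xE3, the code point of "ã". Printing those same two bytes one at a time with %c, separated by underscores, is what splits the sequence and makes the terminal show replacement characters.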
printf("_%c_", str[i]); prints the character associated with each str[i] - one at a time.
The value of char str[i] is converted to an int when passed to a ... (variadic) function. The int value is then converted to unsigned char as directed by "%c", "and the resulting character is written".
char str[] = "João"; is not guaranteed to produce a UTF-8 sequence. That is an implementation detail. A specified way is to use char str[] = u8"João";, available since C11.
printf() does not specify a direct way to print UTF-8 strings.
I have a snippet of code that goes through the first 256 characters of what I thought was ASCII, outputs each character, and outputs the number of occurrences of that character in a text file. What is curious is that the characters it outputs don't correspond to any ASCII table online. The first character (i = 0) is empty, but the second and third characters are smiley faces followed by a heart, diamond, club, and spade. What is even more curious is that when I check the alphabet ((char)65 = 'A', ...), everything works fine and corresponds to ASCII. Why is this? It only messes up before and after the more standard symbols, saying (char)254 = an integral sign. This is definitely not ASCII...
If it helps, I am running this program through Code::Blocks on a Windows 8 machine.
My code:
void display ()
{
    int i;
    for(i=0; i<256; i++)
    {
        printf("Character: %c", (char)i);
        printf("\tOccurrences: %d", characterCount[i]);
        printf("\n");
    }
}
ASCII designates the first 32 of its 128 characters (codes 0 through 31), along with code 127, as non-printable control characters. Some encodings which are based on ASCII assign graphical representations to these characters. They also assign graphical representations to characters 128 and above, which are not part of ASCII at all. For example, a common PC encoding called Code Page 437 assigns smiley faces to characters 1 and 2, card suits to characters 3 through 6, and so on.
What you described looks very much like Code Page 437. However, this behavior is very much system-dependent.
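If you want output that does not depend on the console's code page, one option is to print a character only when isprint accepts it and to fall back to its numeric code otherwise. This is a sketch, with describe_byte as a hypothetical helper name; in the default "C" locale isprint accepts exactly codes 32 through 126.

```c
#include <ctype.h>
#include <stdio.h>

/* Write a display label for byte value i into buf: the character itself
   in quotes if it is printable in the current locale, otherwise its code
   in hex. i must be in the range 0..255.
   describe_byte is a hypothetical helper. */
void describe_byte(int i, char *buf, size_t size)
{
    if (isprint(i))
        snprintf(buf, size, "'%c'", i);
    else
        snprintf(buf, size, "0x%02X", (unsigned)i);
}
```

In the display loop you would then print the label instead of passing (char)i to %c, so code 65 shows up as 'A' while codes 1 and 254 show up as 0x01 and 0xFE on any system.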
I am writing a simple Caesar implementation for Amharic- one of the languages widely spoken in Ethiopia. Here is the code
main(){
    setlocale(LC_ALL, " ");
    int i, key=0;
    wchar_t message[20];
    wprintf(L"Enter message:>");
    fgetws(message, sizeof(message), stdin);
    wprintf(L"Enter cipher key:>");
    wscanf(L"%d", &key);
    /* basic input validation goes here */
    for(i=0;message[i]!='\0'; i++){
        message[i]=message[i]+key;
    }
    wprintf(L"The cipher is: %ls", message);
    wprintf(L"\n");
    return 0;
}
The code compiles without a warning and works fine if key is less than or equal to 5. The problem comes when the key value is 6 or above: it prints one additional char, as far as I have tested. I ran it through gdb to see where it's picking up the additional char, but couldn't make much sense of it. Here are my questions:
Why does it work for keys 0 to 5 but not above?
Where does it get the additional char it prints for keys greater than 5?
If it helps, sizeof(wchar_t) on my machine is 4 bytes.
Thank you
EDIT:
sample input:
message: ተደለ
key: 6
output: ቶዶሎ 0010
output I expect ቶዶሎ
The 0010 is displayed like those chars without a corresponding symbol on the unicode table.
Thanks again.
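The extra char you see is the wide-char newline (0x000a) that is kept at the end of the string you read with fgetws and that you shift by 6 chars, resulting in 0x0010. That doesn't seem to be a printable character and the terminal decides to print the character code as plain hex numbers. With keys 0 to 5 the newline becomes 0x000a to 0x000f, which are control characters that your terminal evidently handles without drawing anything, so the problem only shows up from 6 upward.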
Remove the trailing new-line before shifting or shift only printable characters.
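A minimal sketch of the first option, using wcscspn from <wchar.h> to cut the string at the newline before applying the same shift as in the question. shift_line is a made-up name.

```c
#include <wchar.h>

/* Cut the string at the first newline, if there is one, then shift the
   remaining characters by key exactly as the question's loop does.
   shift_line is a hypothetical helper. */
void shift_line(wchar_t *message, int key)
{
    /* wcscspn returns the index of the first L'\n', or the string length
       if there is none, so this is safe either way. */
    message[wcscspn(message, L"\n")] = L'\0';

    for (size_t i = 0; message[i] != L'\0'; i++)
        message[i] = (wchar_t)(message[i] + key);
}
```

Called right after fgetws with key 6, this turns the three input characters into the expected three output characters and nothing more, since the 0x000a never reaches the loop.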