I have a snippet of code that goes through the first 256 characters of what I thought was ASCII, outputs each character, and outputs the number of occurrences of that character in a text file. What is curious is that the characters it outputs don't correspond to any ASCII table online. The first character (i = 0) is empty, but the second and third characters are smiley faces, followed by a heart, diamond, club, and spade. What is even more curious is that when I check the alphabet ((char)65 = 'A', ...), everything works fine and corresponds to ASCII. Why is this? It only goes wrong before and after the more standard symbols, for example showing (char)254 as an integral sign. This is definitely not ASCII...
If it helps, I am running this program through Code::Blocks on a Windows 8 machine.
My code:
void display ()
{
    int i;
    for(i=0; i<256; i++)
    {
        printf("Character: %c", (char)i);
        printf("\tOccurrences: %d", characterCount[i]);
        printf("\n");
    }
}
ASCII designates the first 32 of its 128 characters (codes 0 through 31), along with code 127, as non-printable control characters. Some encodings that are based on ASCII assign graphical representations to these characters. They also assign graphical representations to characters 128 and above, which are not part of ASCII at all. For example, a common PC encoding called code page 437 assigns smiley faces to characters 1 and 2, card suits to characters 3 through 6, and so on.
What you described looks very much like code page 437. However, this behavior is system-dependent: it depends on the console, its code page, and its font.
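To see which codes plain ASCII actually defines as printable, independent of whatever code page the console uses, something like the following sketch may help. The names `ascii_class` and `display_classes` are made up for illustration; the ranges are the standard 7-bit ASCII ones.

```c
#include <stdio.h>

/* Hypothetical helper: classify a byte value the way plain 7-bit ASCII sees it. */
const char *ascii_class(int code)
{
    if (code < 0 || code > 255)
        return "out of range";
    if (code < 32 || code == 127)
        return "control";      /* no glyph defined by ASCII */
    if (code < 127)
        return "printable";    /* space through '~' */
    return "not ASCII";        /* 128..255 depend on the code page */
}

/* Print the code and its class instead of the raw byte, so the output
   does not depend on the console's code page. */
void display_classes(void)
{
    for (int i = 0; i < 256; i++)
        printf("%3d  %s\n", i, ascii_class(i));
}
```

Calling `display_classes()` from `main` lists all 256 values; only the rows marked "printable" are guaranteed to look the same on every ASCII-based system.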
I can't understand why this code's output is weird. I wrote it out of curiosity, and now if I enter 55 it shows a leaf, and many other things depending on the number. I searched Google but didn't find any explanation.
#include <stdio.h>
int main(){
    char input='*';
    int x;
    scanf("%d",&x);
    printf("%c",input*x);
    return 0;
}
Characters are encoded as integers, usually 8-bit by default in C. Some are defined by ASCII and others depend on your OS, screen font, etc.
The * character has code 42. If you enter 55, your code computes 42*55=2310 and uses the low 8 bits of this value, which is 6, as the character to print. Character 6 is ACK which is not defined as a printable character by ASCII, but it sounds like your system is using something like the legacy IBM code page 437, in which character 6 displays as the spade symbol ♠.
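The arithmetic above can be checked with a small sketch. `as_printed_byte` is a hypothetical helper name, and the example assumes 8-bit bytes and an ASCII execution character set (so that `'*'` is 42). It mirrors what `%c` does: the `int` argument is converted to `unsigned char` before being written.

```c
/* Hypothetical helper: the byte that printf("%c", value) ends up writing.
   Converting to unsigned char keeps only the low 8 bits when CHAR_BIT is 8. */
unsigned char as_printed_byte(int value)
{
    return (unsigned char)value;
}
```

With these assumptions, `as_printed_byte('*' * 55)` is 6, since 42 * 55 = 2310 = 9 * 256 + 6.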
Multiplying a character code by an integer is not a very useful thing to do. I'm not sure what you were expecting to accomplish with this program. If you thought it would print 55 copies of the * character, like Perl's x operator, well it doesn't. C has no built-in way to do that; you would just write a loop that prints * once per iteration, and iterates 55 times.
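If repetition was the goal, a loop along these lines would do it. This is only a sketch; `print_repeated` is a name invented here, and it returns the number of characters written so the caller can check it.

```c
#include <stdio.h>

/* Hypothetical helper: write `c` to `out` `count` times, then a newline.
   Returns how many copies of `c` were written. */
int print_repeated(FILE *out, char c, int count)
{
    int written = 0;
    for (int i = 0; i < count; i++) {
        if (fputc(c, out) == EOF)
            break;             /* stop on a write error */
        written++;
    }
    fputc('\n', out);
    return written;
}
```

For example, `print_repeated(stdout, '*', 55)` prints a row of 55 asterisks.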
Is there something wrong with my program that prints the characters for the ASCII Code?
My program doesn't print characters from 0 to 32. Otherwise, it works.
Code:
#include <stdio.h>
int main(){
    int i;
    for(i=0; i<=127; i++){
        if(i%10 == 0){
            printf("--------------------\n");
            printf(" Dec Char \n");
            printf("--------------------\n");
        }
        printf(" %d %5c \n", i, i);
    }
    return 0;
}
Thank you!
My program doesn't print characters from 0 to 32.
Your program does print all characters from 0 to 127, inclusive. Presuming your C implementation uses ASCII, which is likely, most of the characters from 0 to 32 and character 127 do not have entirely visible effects.
The characters from 0 to 31 are control codes of various kinds. Character 7 is an “alert” character. You likely heard a ding, beep, or other sound when it was “printed.”
Character 9 is a horizontal tab. If you print a visible character before it and another after it, you will see some amount of space between them. The space will be such that the character after it is printed in the text “tab column,” often column 8, 16, 24, 32, and so on with default settings (counting the first column as column 0).
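The tab-stop rule described above can be written as a one-line calculation. This is a sketch with a hypothetical function name; it assumes columns are counted from 0 and that tab stops are evenly spaced, which is the usual default but can be changed in most terminals.

```c
/* Hypothetical helper: the column the cursor lands on after a horizontal tab,
   given the current 0-based column and the tab width (commonly 8). */
int next_tab_stop(int column, int tab_width)
{
    return (column / tab_width + 1) * tab_width;
}
```

So a tab printed at column 3 moves to column 8, and a tab printed at column 8 moves to column 16.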
Character 10 is a line-feed or new-line character, which moved the paper up in old teletype machines. You probably saw a blank line after “10” and before “11” due to this character. 13 is a carriage return. (Originally it moved the carriage, the mechanism holding the paper, back to align the left side with the printing mechanism. Later, it moved the printing mechanism back to the left side of the paper.) 11 is a vertical tab and 12 is a form feed. 8 is a backspace. In modern terminal window software, printing “a”, backspace, and “b” would likely backspace over the “a” and replace it with “b”. In old teletypes, it would physically print the “b” over the “a”.
32 is a space. It was printed normally, but your output does not give a clear way to make it visible. (It could be visible by copying the text from the terminal window, pasting it into a text editor, and examining the specific characters in detail. That could reveal there is a space character in that line that is not present in the line above it.) If you print “a”, space (character 32), and “b”, you will see the space.
Character 127 is a delete character.
Many of the other codes are archaic for text output to humans. However, some of them have uses for input, such as signaling requests to pause or resume program output, to interrupt or to suspend a program, to undo the character most recently typed, and so on. In addition, the escape character, 27, is used to start “escape sequences” that some terminal software interprets as requests of various kinds, such as to change the text color, put new text in the window title bar, or to change or report the cursor position.
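One way to make these codes visible in a table is to print a short name instead of the raw byte. The sketch below covers only the codes mentioned in this answer; `describe_char` is a hypothetical name, and the abbreviations are the conventional ASCII ones.

```c
#include <stddef.h>

/* Hypothetical helper: conventional ASCII abbreviation for a few codes that
   have no visible glyph, or NULL if the code is not in this short list. */
const char *describe_char(int code)
{
    switch (code) {
    case 7:   return "BEL";  /* alert */
    case 8:   return "BS";   /* backspace */
    case 9:   return "HT";   /* horizontal tab */
    case 10:  return "LF";   /* line feed */
    case 11:  return "VT";   /* vertical tab */
    case 12:  return "FF";   /* form feed */
    case 13:  return "CR";   /* carriage return */
    case 27:  return "ESC";  /* starts escape sequences */
    case 32:  return "SP";   /* space */
    case 127: return "DEL";  /* delete */
    default:  return NULL;
    }
}
```

In the printing loop, the program could print `describe_char(i)` when it is not `NULL` and fall back to `%c` otherwise.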
You can print the ASCII characters from 32 onward. The first 32 codes (0 through 31) are control codes, such as start of text and end of text.
I was writing a Caesar cipher in C for practice and got a functioning one, but it shows a strange behavior. The code is as follows:
#define lenght 18
char * caesar ( char * cyphertext, int key){
    static char result [lenght];
    for ( int i= 0; i < lenght ; i++){
        result [i] =(char)(((int) cyphertext[i]) + key) % 255;
    }
    return result;
}
int main(){
    char * text = caesar("Hola buenas tardes", 23 );
    printf("%s \n" , text );
    char * check = caesar( text , 256 - 23);
    printf("%s \n" , check);
    return 0;
}
The encrypted version is _x7y|x7x{|; a shorter number; but when I run the second Caesar cipher with the decryption key, it decrypts it with no problem to the original state. I have been looking around, and it is probably about how the characters are stored. I would be very grateful for any help.
The encrypted version is _x7y|x7x{|; a shorter number;
No, what printf prints is the above. Or even more precisely, that's how your terminal displays what printf prints. If you want to be certain exactly what the encrypted version is then you should run your program in a debugger, and use it to examine the bytes of the encoded version.
Your approach will encode some character codes from the printable ASCII range (codes 32 - 126 decimal) as codes outside that range. How your terminal handles those bytes depends on your configuration and environment, but if it expects UTF-8-encoded data then it will trip over invalid code sequences in the output, and if it expects an ISO-8859 encoding then some of the output codes will be interpreted as control characters from the C1 set. There are other possibilities.
Usually, a Caesar-cypher program is careful to map all printable characters to other printable characters, leaving others alone. The typical academic exercise is even more narrowly scoped, asking for a program that maps only the upper- and lowercase Latin letters, preserving case, and leaves all others (punctuation, digits, control characters) alone. This is, accordingly, left as an exercise.
You should not print the ciphertext with printf and %s: the result is only readable for printable ASCII characters, and your ciphertext contains arbitrary unprintable bytes. Consider converting it to a hexadecimal string.
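A hexadecimal conversion could look roughly like this. It is a sketch, not the only way to do it; `to_hex` is a hypothetical name, and the caller is assumed to supply an output buffer of at least `2 * len + 1` bytes.

```c
#include <stddef.h>

/* Hypothetical helper: write two lowercase hex digits per input byte into
   `out`, followed by a terminating null. `out` must hold 2 * len + 1 bytes. */
void to_hex(const char *data, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        /* Go through unsigned char so bytes >= 128 are not sign-extended. */
        unsigned char byte = (unsigned char)data[i];
        out[2 * i]     = digits[byte >> 4];
        out[2 * i + 1] = digits[byte & 0x0F];
    }
    out[2 * len] = '\0';
}
```

Passing the length explicitly, rather than relying on `strlen`, matters here because ciphertext may legitimately contain zero bytes.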
I've been trying to write a program for the Vernam cipher, which requires me to XOR two strings. I tried to do this in C and have been getting an error. The two strings have the same length.
#include<stdio.h>
#include<string.h>
int main()
{
    printf("Enter your string to be encrypted ");
    char a[50];
    char b[50];
    scanf("%s",a);
    printf("Enter the key ");
    scanf("%s",b);
    char c[50];
    int q=strlen(a);
    int i=0;
    for(i=0;i<q;i++)
    {
        c[i]=(char)(a[i]^b[i]);
    }
    printf("%s",c);
}
Whenever I run the code, I get output as ????? in boxes. What is the method to XOR these two strings ?
I've been trying to make a program on Vernam Cipher which requires me to XOR two strings
Yes, it does, but that's not the only thing it requires. The Vernam cipher involves first representing the message and key in the ITA2 encoding (also known as Baudot-Murray code), and then computing the XOR of each pair of corresponding character codes from the message and key streams.
Moreover, to display the result in the manner you indicate wanting to do, you must first convert it from ITA2 to the appropriate character encoding for your locale, which is probably a superset of ASCII.
The transcoding to and from ITA2 is relatively straightforward, but not so trivial that I'm inclined to write them for you. There is a code chart at the ITA2 link above.
Note also that ITA2 is a stateful encoding that includes shift codes and a null character. This implies that the enciphered message may contain non-printing characters, which could cause some confusion, including a null character, which will be misinterpreted as a string terminator if you are not careful. More importantly, encoding in ITA2 may increase the length of the message as a result of a need to insert shift codes.
Additionally, as a technical matter, if you want to treat the enciphered bytes as a C string, then you need to ensure that it is terminated with a null character. On a related note, scanf() will do that for the strings it reads, which uses one character, leaving you only 49 each for the actual message and key characters.
What is the method to XOR these two strings ?
The XOR itself is not your problem. Your code for that is fine. The problem is that you are XORing the wrong values, and (once the preceding is corrected) outputting the result in a manner that does not serve your purpose.
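As a rough sketch of the byte-level XOR only (it does not attempt the ITA2 transcoding discussed above), the operation can be written with an explicit length so that zero bytes in the result are not mistaken for a string terminator. `xor_bytes` is a name chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: XOR `len` bytes of `msg` with `key` into `out`.
   The caller keeps track of `len`, because `out` may contain zero bytes
   and is therefore not a C string. */
void xor_bytes(const char *msg, const char *key, char *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (char)(msg[i] ^ key[i]);
}
```

When reading the inputs, `scanf("%49s", a)` limits the input to 49 characters plus the terminating null for a 50-byte array. Applying `xor_bytes` a second time with the same key recovers the original bytes.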
Whenever I run the code, I get output as ????? in boxes...
XORing two printable characters does not always result in a printable value.
Consider the following:
the ^ operator operates at the bit level.
there is a limited range of values that are printable. (from here):
Control Characters (0–31 & 127): Control characters are not printable characters. They are used to send commands to the PC or the printer and are based on telex technology. With these characters, you can set line breaks or tabs. Today, they are mostly out of use.
Special Characters (32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters nor numbers. These include punctuation or technical, mathematical characters. ASCII also includes the space (a non-visible but printable character), which therefore does not belong to the control characters category, as one might suspect.
Numbers (48–57): These numbers include the ten Arabic numerals from 0-9.
Letters (65–90 / 97–122): Letters are divided into two blocks, with the first group containing the uppercase letters and the second group containing the lowercase.
Using the following two strings and the following code:
char str1[] = {"asdf"};
char str2[] = {"jkl;"};
The following demonstrates XORing the elements of the strings:
#include <stdio.h>
int main(void)
{
    char str1[] = {"asdf"};
    char str2[] = {"jkl;"};
    /* sizeof includes the terminating null, so the loop runs 5 times */
    for(size_t i=0;i<sizeof(str1)/sizeof(str1[0]);i++)
    {
        printf("%d ^ %d: %d\n", str1[i],str2[i], str1[i]^str2[i]);
    }
    getchar();
    return 0;
}
While all of the input characters are printable (except the terminating null character), not all of the XOR results of corresponding characters are:
97 ^ 106: 11 //not printable
115 ^ 107: 24 //not printable
100 ^ 108: 8 //not printable
102 ^ 59: 93
0 ^ 0: 0
This is why you are seeing the odd output. While all of the values may be completely valid for your purposes, they are not all printable.
I need to find the non-ASCII characters from a UTF-8 string.
my understanding:
UTF-8 is a superset of ASCII in which the values 0-127 are the ASCII characters.
So if, in a UTF-8 string, a character's value is not between 0 and 127, then it is not an ASCII character, right? Please correct me if I'm wrong here.
On the above understanding i have written following code in C :
Note:
I'm using the Ubuntu gcc compiler to run C code
utf-string is x√ab c
long i;
char arr[] = "x√ab c";
printf("length : %lu \n", sizeof(arr));
for(i=0; i<sizeof(arr); i++){
    char ch = arr[i];
    if (isascii(ch))
        printf("Ascii character %c\n", ch);
    else
        printf("Not ascii character %c\n", ch);
}
Which prints the output like:
length : 9
Ascii character x
Not ascii character
Not ascii character �
Not ascii character �
Ascii character a
Ascii character b
Ascii character
Ascii character c
Ascii character
To the naked eye, the length of x√ab c seems to be 6, but in code it comes out as 9?
The correct answer for x√ab c is 1, i.e. it has only 1 non-ASCII character, but in the above output it comes out as 3 (three "Not ascii character" lines).
How can I correctly find the non-ASCII characters in a UTF-8 string?
Please guide on the subject.
What C calls a char is actually a byte. A UTF-8 character can be made up of several bytes.
In fact only the ASCII characters are represented by a single byte in UTF-8 (which is why all valid ASCII-encoded text is also effectively UTF-8 encoded).
So to count the number of UTF-8 characters you have to do a partial decoding: count the number of bytes that start a code point.
See the Wikipedia article on UTF-8 to find out how they are encoded.
Basically there are 3 categories:
single-byte codes 0b0xxxxxxx
start bytes: 0b110xxxxx, 0b1110xxxx, 0b11110xxx
continuation bytes: 0b10xxxxxx
To count the number of Unicode code points, simply count all bytes that are not continuation bytes.
However, Unicode code points don't always have a 1-to-1 correspondence to "characters" (depending on your exact definition of character).
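The counting rule above might be sketched as follows. Both function names are invented for this example, and the code assumes the input is valid, null-terminated UTF-8; it does no validation.

```c
#include <stddef.h>

/* Continuation bytes have the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Hypothetical helper: number of code points = bytes that are not continuations. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (!is_continuation((unsigned char)*s))
            count++;
    return count;
}

/* Hypothetical helper: number of non-ASCII code points = start bytes of
   multi-byte sequences, which all have the bit pattern 11xxxxxx. */
size_t utf8_count_non_ascii(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) == 0xC0)
            count++;
    return count;
}
```

For the string x√ab c from the question, the first function gives 6 and the second gives 1, matching what the eye sees.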
When UTF-8 text is stored in a character array, the first byte of each multi-byte character encodes how many bytes that character occupies: the number of consecutive 1s starting from the MSB of the first byte is the total number of bytes taken by the non-ASCII character. In the case of '√' the binary form is 11100010, 10001000, 10011010. Counting the leading 1s in the first byte gives the number of bytes occupied as 3. Something like the code below would work for this:
int get_count(char non_ascii_char){
    /*
    The function returns the number of bytes occupied by the UTF-8 character.
    It takes the first byte of the non-ASCII character as the input and
    returns the length to the calling function.
    */
    /* Work on an unsigned copy: right-shifting a negative char is
       implementation-defined. */
    unsigned char byte=(unsigned char)non_ascii_char;
    int bit_counter=7,count=0;
    /*
    bit_counter - is the counter initialized to traverse through each bit of the
    non ascii character
    count - stores the number of bytes occupied by the character
    */
    for(;bit_counter>=0;bit_counter--){
        if((byte>>bit_counter)&1){
            count++;// increments on the number of consecutive 1s in the byte
        }
        else{
            break;// breaks on encountering the first 0
        }
    }
    return count;// returns the count to the calling function
}