Why does multiplying a character produce other seemingly random characters? - c

I can't understand why this code's output is weird. I wrote it out of curiosity, and now if I enter 55 it shows a leaf, and many other things depending on the number. I searched Google but didn't find any explanation.
#include <stdio.h>
int main(){
char input='*';
int x;
scanf("%d",&x);
printf("%c",input*x);
return 0;
}

Characters are encoded as integers, usually 8-bit by default in C. Some are defined by ASCII and others depend on your OS, screen font, etc.
The * character has code 42. If you enter 55, your code computes 42*55=2310 and uses the low 8 bits of this value, which is 6, as the character to print. Character 6 is ACK which is not defined as a printable character by ASCII, but it sounds like your system is using something like the legacy IBM code page 437, in which character 6 displays as the spade symbol ♠.
Multiplying a character code by an integer is not a very useful thing to do. I'm not sure what you were expecting to accomplish with this program. If you thought it would print 55 copies of the * character, like Perl's x operator, well it doesn't. C has no built-in way to do that; you would just write a loop that prints * once per iteration, and iterates 55 times.

Related

What happens when we make an array defined using characters instead of integers in C?

This is a code I have used to define an array:
int characters[126];
following which I wanted to get a record of the frequencies of all the characters recorded for which I used the while loop in this format:
while((a=getchar())!=EOF){
characters[a]=characters[a]+1;
}
Then using a for loop I print the values of integers in the array.
How exactly is this working?
Does C assign a specific number for letters ie. a,b,c, etc in the array?
What happens when we make an array defined using characters instead of integers in C?
Let's be sure we are clear: you are using integer values returned by getchar() as indexes into your array. This is not defining the array, it is just accessing its elements.
Does C assign a specific number for letters ie. a,b,c, etc in the array?
There are no letters in the array. There are ints. However, yes, the characters read by getchar() are encoded as integer values, so they are, in principle, suitable array indexes. Thus, this line ...
characters[a]=characters[a]+1;
... reads the int value then stored at index a in array characters, adds 1 to it, and then assigns the result back to element a of the array, provided that the value of a is a valid index into the array.
More generally, it is important to understand that although one of its major uses is to represent characters, type char is an integer type. Its values are numbers. The mapping from characters to numbers is implementation and context dependent, but it is common enough for the mapping to be consistent with the ASCII code that you will often see programs that assume such a mapping.
Indeed, your code makes exactly such an assumption (and others) by allowing only for character codes less than 126.
You should also be aware that if your characters array is declared inside a function then it is not initialized. The code depends on all elements being zero initially. I would recommend this declaration instead (UCHAR_MAX is defined in <limits.h>):
int characters[UCHAR_MAX + 1] = {0};
That upper bound will be sufficient for all the non-EOF values returned by getchar(), and the explicit zero-initialization will ensure the needed initial values regardless of where the array is declared.
I have realized that the character set that can serve as input for getchar() is part of the ASCII table, and that each character is represented as an int. I used the following code to find that out:
#include <stdio.h>
int main(){
int a[128];
a['b']=4;
printf("%d",a[98]); //it is 98 as according to the table 'b' is assigned the value of 98
}
Executing this code, I get the output 4.
I am really new to coding so feel free to correct me.
Character values are represented using some kind of integer encoding - ASCII (very common), EBCDIC (mostly IBM mainframes), UTF-8 (backward-compatible to ASCII), etc.
The character value 'a' maps to some integer value - 97 in ASCII and UTF-8, 129 in EBCDIC. So yes, you can use a character value to index into an array - arr['a']++ would be equivalent to arr[97]++ if you were using ASCII or UTF-8.
The C language does not dictate this - it's determined by the underlying platform.

Logical XOR in character arrays

I've been trying to make a program on Vernam Cipher which requires me to XOR two strings. I tried to do this program in C and have been getting an error. The two strings have the same length.
#include<stdio.h>
#include<string.h>
int main()
{
printf("Enter your string to be encrypted ");
char a[50];
char b[50];
scanf("%s",a);
printf("Enter the key ");
scanf("%s",b);
char c[50];
int q=strlen(a);
int i=0;
for(i=0;i<q;i++)
{
c[i]=(char)(a[i]^b[i]);
}
printf("%s",c);
}
Whenever I run the code, I get output as ????? in boxes. What is the method to XOR these two strings ?
I've been trying to make a program on Vernam Cipher which requires me to XOR two strings
Yes, it does, but that's not the only thing it requires. The Vernam cipher involves first representing the message and key in the ITA2 encoding (also known as Baudot-Murray code), and then computing the XOR of each pair of corresponding character codes from the message and key streams.
Moreover, to display the result in the manner you indicate wanting to do, you must first convert it from ITA2 to the appropriate character encoding for your locale, which is probably a superset of ASCII.
The transcoding to and from ITA2 is relatively straightforward, but not so trivial that I'm inclined to write them for you. There is a code chart at the ITA2 link above.
Note also that ITA2 is a stateful encoding that includes shift codes and a null character. This implies that the enciphered message may contain non-printing characters, which could cause some confusion, including a null character, which will be misinterpreted as a string terminator if you are not careful. More importantly, encoding in ITA2 may increase the length of the message as a result of a need to insert shift codes.
Additionally, as a technical matter, if you want to treat the enciphered bytes as a C string, then you need to ensure that it is terminated with a null character. On a related note, scanf() will do that for the strings it reads, which uses one character, leaving you only 49 each for the actual message and key characters.
What is the method to XOR these two strings ?
The XOR itself is not your problem. Your code for that is fine. The problem is that you are XORing the wrong values, and (once the preceding is corrected) outputting the result in a manner that does not serve your purpose.
Whenever I run the code, I get output as ????? in boxes...
XORing two printable characters does not always result in a printable value.
Consider the following:
the ^ operator operates at the bit level.
there is a limited range of values that are printable. (from here):
Control Characters (0–31 & 127): Control characters are not printable characters. They are used to send commands to the PC or the printer and are based on telex technology. With these characters, you can set line breaks or tabs. Today, they are mostly out of use.
Special Characters (32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters nor numbers. These include punctuation or technical, mathematical characters. ASCII also includes the space (a non-visible but printable character), which therefore does not belong to the control characters category, as one might suspect.
Numbers (48–57): These numbers include the ten Arabic numerals from 0-9.
Letters (65–90 / 97–122): Letters are divided into two blocks, with the first group containing the uppercase letters and the second group containing the lowercase.
Take the following two strings:
char str1[] = {"asdf"};
char str2[] = {"jkl;"};
The following code demonstrates XORing the elements of the strings:
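#include <stdio.h>
int main(void)
{
    char str1[] = {"asdf"};
    char str2[] = {"jkl;"};
    /* sizeof counts the terminating null character, so it is XORed too */
    for(size_t i=0;i<sizeof(str1)/sizeof(str1[0]);i++)
    {
        printf("%d ^ %d: %d\n", str1[i],str2[i], str1[i]^str2[i]);
    }
    getchar(); /* wait for a key press before exiting */
    return 0;
}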
While all of the input characters are printable (except the terminating null character), not all of the XOR results of corresponding characters are:
97 ^ 106: 11 //not printable
115 ^ 107: 24 //not printable
100 ^ 108: 8 //not printable
102 ^ 59: 93
0 ^ 0: 0
This is why you are seeing the odd output. While all of the values may be completely valid for your purposes, they are not all printable.

Why are integers not converting to ASCII in C?

I have a snippet of code that goes through the first 256 characters of what I thought was ASCII, outputs the character, and outputs the occurrences of that character in a text file. What is curious is that the characters it outputs don't correspond to any ASCII table online. The first character (i = 0) is empty, but the second and third characters are smiley faces followed by a heart, diamond, club, and spade. What is even more curious is that when I check the alphabet ((char)65 = 'A', ...), everything works fine and corresponds to ASCII. Why is this? It only messes up before and after the more standard symbols, saying (char)254 = an integral sign. This is definitely not ASCII...
If it helps, I am running this program through Code::Blocks on a Windows 8 machine.
My code:
void display ()
{
int i;
for(i=0; i<256; i++)
{
printf("Character: %c", (char)i);
printf("\tOccurrences: %d", characterCount[i]);
printf("\n");
}
}
ASCII designates the first 32 of its 128 characters (codes 0 through 31), along with character 127, as non-printable control characters. Some encodings based on ASCII assign graphical representations to these codes. They also assign graphical representations to characters 128 and above, which are not part of ASCII at all. For example, a common PC encoding called code page 437 assigns smiley faces to characters 1 and 2, the card suits (heart, diamond, club, spade) to characters 3 through 6, and so on.
What you described looks very much like Page 437. However, this behavior is very much system-dependent.

C programming - integers and characters

I ripped this from an ebook on C programming.
I understand that ASCII representations of the characters '0' and '9' are integers, so I understand the compatibility with the integer array. I am simply not sure how the shown output is computed. The input is the code itself.
What does this statement mean?
++ndigit[c-'0'];
So, is the program essentially checking if the input is one of the first 10 entries of the ASCII code table?
No, it doesn't.
c - '0' subtracts the (not necessarily ASCII) character code of the character 0 from that of c. This will yield a number between 0 and 9 if c is a digit. Then, the resulting integer is used to index the zero-initialized ndigit array using the [] operator, and the prefix increment operator (++) is then used to increment the element at that particular index.
By the way, the code is erroneous at multiple places. I suggest you switch to another book because this one appears to be either outdated and/or encouraging the use of several types of bad programming practice.
First, main() doesn't have a return type, which is an error. It needs to be declared as int main() or int main(void) or int main(int, char **). Older compilers assumed an implicit int return type if it was omitted, but implicit int was removed from the language in C99.
Second, it would be better to initialize the ndigit array, like this:
int ndigit[10] = { 0 };
The for loop is superfluous because the array can be zeroed in its initializer; the loop is also less readable than the initialization syntax, and it is dangerous: the author doesn't calculate the element count of the array using sizeof(ndigit) / sizeof(ndigit[0]) but hardcodes its length, which may cause a buffer overrun if the length of the array is later decreased and the hard-coded value in the for loop is forgotten.
The program computes the number of times a digit between 0 and 9 was introduced as input, how many white spaces and how many other characters were in the input.
++ndigit[c-'0'];
'0' - as integer is the ASCII code for 0.
c - is the read character (its ASCII code)
c - '0' = the actual digit (between 0 and 9) represented by the ASCII code c.
For example '3'(ASCII) would be 3(digit=integer) + '0'(ASCII)
So that's how you obtain the index in the array for your digit and you increment the number of times that digit showed up.

figure out 2 strings similar or not

Rules:
2 strings, a and b, both of them consist of ASCII chars and non-ASCII chars (say, Chinese Characters gbk-encoded).
If every non-ASCII char contained in b also shows up in a, at least as many times as it appears in b, then we say b is similar with a.
For example:
a = "ab中ef日jkl中本" //non-ASCII chars:'中'(twice), '日'(once), '本'(once)
b = "bej中中日" //non-ASCII chars:'中'(twice), '日'(once)
c = "lk日日日" //non-ASCII chars:'日'(3 times, but only once in a)
according to the rule, b is similar with a, but c is not.
Here is my question:
We don't know how many non-ASCII chars are there in a and b, probably many.
So to find out how many times a non-ASCII char appears in a and b, am I supposed to use a Hash-Table to store their appearing-times?
Take string a as an example:
[non-ASCII's hash-value]:[times]
中's hash-val : 2
日's hash-val : 1
本's hash-val : 1
Check string b, if we encounter a non-ASCII char in b, then hash it and check a's hash-table, if the char is present in a's hash-table, then its appearing-times decrements by 1.
If the appearing-times is less than 0 (-1), then we say b is not similar with a.
Or is there any better way?
PS:
I read string a byte by byte; if the byte is less than 128, then I take it as an ASCII char, otherwise I take it as part of a non-ASCII char (multi-byte).
This is what I am doing to find out the non-ASCII chars.
Is it right?
You have asked two questions:
Can we count the non-ASCII characters using a hashtable? Answer: sure. As you read the characters (not the bytes), examine the codepoints. For any codepoint greater than 127, put it into a counting hashtable. That is for a character c, add (c,1) if c is not in the table, and update (c,x) to (c, x+1) if c is in the table already.
Is there a better way to solve this problem than your approach of incrementing counts in a and decrementing as you run through b? If your hashtable implementation gives nearly O(1) access, then I suspect not. You are looking at each character in the string exactly once, and for each character you are doing either a hashtable insert or lookup, an addition or subtraction, and a check against 0. With unsorted strings, you have to look at all the characters in both strings anyway, so you've given, I think, the best solution.
The interviewer might be looking for you to say things like, "Hmmmmm, if these strings were actually massive files that could not fit in memory, what would I do?" Or for you to ask "Well, are the strings sorted? Because if they are, I can do it faster...".
But now let's say the strings are massive. The only thing you are storing in memory is the hashtable. Unicode has only around 1 million codepoints and you are storing an integer count for each, so even if you are getting data from gigabyte sized files you only need around 4MB or so for your hash table (or a small multiple of this, as there will be overhead).
In the absence of any other conditions, your algorithm is nice. Sorting the strings beforehand isn't good; it takes up more memory and isn't a linear-time operation.
ADDENDUM
Since your original comments mentioned the type char as opposed to wchar_t, I thought I'd show an example of using wide strings. See http://codepad.org/B3MXOgqc
Hope that helps.
ADDENDUM 2
Okay here is a C program that shows exactly how to go through a widestring and work at the character level:
http://codepad.org/QVX3QPat
It is a very short program so I will also paste it here:
#include <stdio.h>
#include <string.h>
#include <wchar.h>
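const char *s1 = "abd中日";
const wchar_t *s2 = L"abd中日";
int main(void) {
    size_t i, n;
    /* strlen and wcslen return size_t, which needs %zu rather than %d */
    printf("length of s1 is %zu\n", strlen(s1));
    printf("length of s2 using wcslen is %zu\n", wcslen(s2));
    printf("The codepoints of the characters of s2 are\n");
    for (i = 0, n = wcslen(s2); i < n; i++) {
        /* cast so the argument matches the %lx conversion */
        printf("%02lx\n", (unsigned long)s2[i]);
    }
    return 0;
}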
Output:
length of s1 is 9
length of s2 using wcslen is 5
The codepoints of the characters of s2 are
61
62
64
4e2d
65e5
What can we learn from this? A couple things:
If you use plain old char for CJK characters then the string length will be wrong.
To use Unicode characters in C, use wchar_t
String literals have a leading L for wide strings
In this example I defined a string with CJK characters and used wchar_t and a for-loop with wcslen. Please note here that I am working with real characters, NOT BYTES, so I get the correct count of characters, which is 5. Now I print out each codepoint. In your interview question, you will be looking to see if the codepoint is >= 128. I showed them in Hex, as is the culture, so you can look for > 0x7F. :-)
ADDENDUM 3
A few notes in http://tldp.org/HOWTO/Unicode-HOWTO-6.html are worth reading. There is a lot more to character handling than the simple example above shows. In the comments below J.F. Sebastian gives a number of other important links.
One of the things that needs to be addressed is normalization. For example, does your interviewer care whether two strings, one containing just a Ç and the other a C followed by a COMBINING CEDILLA (U+0327), are treated as the same? They represent the same character, but one uses one codepoint and the other uses two.

Resources