Why does this variable equal a number and a character?

While debugging this code snippet here:
int main () {
char str[] = "Stackoverflow";
char a = *str;
return 0;
}
Why does a show as 83 'S'?

I think you might want to have more than one thing clarified:
First, str is an array holding a sequence of character values in memory, i.e. S, t, a, ... When used in an expression such as *str, it is converted to a pointer to its first character.
Then, *str dereferences this pointer, i.e. it reads the value of the character to which str points. Hence *str yields S.
Statement char a = *str assigns the value S to variable a of type char, which represents a portion of memory capable of storing one character. Usually, char is an 8-bit signed (or unsigned) integral type, so a char is represented by a value between -128 and +127 if it is signed, or between 0 and 255 if it is unsigned. The character value S, for example, is represented as integral value 83 in ASCII. Whether a system uses ASCII or some other character set is implementation-defined, but ASCII is by far the most common character set today.
So S and 83 are actually the same thing; it's just that when a terminal interprets value 83 to be printed as a character, it prints S. The other way round, if we interpret S as an integral value, a terminal prints 83:
#include <stdio.h>
int main() {
printf("'S' as integral value: %d\n", 'S');
printf("83 as character value: %c\n", 83);
char c1 = 'S';
char c2 = 83;
if (c1 == c2) {
printf("c1 and c2 are equal.\n");
} else {
printf("c1 and c2 are not equal.\n");
}
}
Output:
'S' as integral value: 83
83 as character value: S
c1 and c2 are equal.

83 is the ASCII code for the uppercase letter 'S'.
*str is equivalent to str[0], so in this case it is the first element of the array str, which holds the character 'S'.

Computers understand everything as numbers: Characters, strings, photos, videos, audio ... etc. Everything is a number inside a computer and thus people wondered how to represent characters.
And because of this fact, they decided to encode characters as numbers so that every character has a corresponding number that encodes it inside the computer.
Throughout history, many character encoding schemes (mappings between characters and numbers) have been worked out, but one of them is very well known and used almost everywhere: ASCII. ASCII is a 7-bit encoding that represents the digits and the Latin alphabet (uppercase and lowercase), along with some other symbols.
By default, your system provides ASCII input to your C program, so internally this input is stored in memory as the ASCII standard says. For instance, when you type A on your keyboard, your program receives the value 65 (this is the decimal value of the character A in the ASCII standard; in binary it is 1000001). Your program stores this value (65) inside a memory location specified by a variable (char c;). When you ask the computer to print this character, it checks the ASCII value stored in the character's variable and then figures out how to draw the matching symbol on the screen.
In C, strings are just a sequence (or an array) of characters. When you hold a pointer to a string, it actually points to the first character of the string (the character array). If you advance the pointer by 1, you will point to the second character, and so on. So, if you dereference your original pointer (which points to the first character), you will get the ASCII value of the character stored in that position (the first position), and thus in your case you get 83, which corresponds to the symbol 'S'.
The program below prints all ASCII characters alongside their numeric values. A few characters have no visual representation because they are used for controlling input and the terminal, namely the first 32 values (0 to 31) and the last one (127).
#include <stdio.h>
int main ()
{
/* Unsigned to avoid integer overflow in the loop below */
unsigned char c;
/* ASCII is 7-bit so it can represent
2^7 = 128 (from 0 to 127) symbols */
for (c = 0; c < 128; c++)
printf ("ASCII value of %c = %d\n", c, c);
return 0;
}

Related

Logical XOR in character arrays

I've been trying to make a program on Vernam Cipher which requires me to XOR two strings. I tried to do this program in C and have been getting an error. The lengths of the two strings are the same.
#include<stdio.h>
#include<string.h>
int main()
{
printf("Enter your string to be encrypted ");
char a[50];
char b[50];
scanf("%s",a);
printf("Enter the key ");
scanf("%s",b);
char c[50];
int q=strlen(a);
int i=0;
for(i=0;i<q;i++)
{
c[i]=(char)(a[i]^b[i]);
}
printf("%s",c);
}
Whenever I run the code, I get output as ????? in boxes. What is the method to XOR these two strings ?

I've been trying to make a program on Vernam Cipher which requires me to XOR two strings
Yes, it does, but that's not the only thing it requires. The Vernam cipher involves first representing the message and key in the ITA2 encoding (also known as Baudot-Murray code), and then computing the XOR of each pair of corresponding character codes from the message and key streams.
Moreover, to display the result in the manner you indicate wanting to do, you must first convert it from ITA2 to the appropriate character encoding for your locale, which is probably a superset of ASCII.
The transcoding to and from ITA2 is relatively straightforward, but not so trivial that I'm inclined to write them for you. There is a code chart at the ITA2 link above.
Note also that ITA2 is a stateful encoding that includes shift codes and a null character. This implies that the enciphered message may contain non-printing characters, which could cause some confusion, including a null character, which will be misinterpreted as a string terminator if you are not careful. More importantly, encoding in ITA2 may increase the length of the message as a result of a need to insert shift codes.
Additionally, as a technical matter, if you want to treat the enciphered bytes as a C string, then you need to ensure that it is terminated with a null character. On a related note, scanf() will do that for the strings it reads, which uses one character, leaving you only 49 each for the actual message and key characters.
What is the method to XOR these two strings ?
The XOR itself is not your problem. Your code for that is fine. The problem is that you are XORing the wrong values, and (once the preceding is corrected) outputting the result in a manner that does not serve your purpose.

Whenever I run the code, I get output as ????? in boxes...
XORing two printable characters does not always result in a printable value.
Consider the following:
the ^ operator operates at the bit level.
there is a limited range of values that are printable. (from here):
Control Characters (0–31 & 127): Control characters are not printable characters. They are used to send commands to the PC or the printer and are based on telex technology. With these characters, you can set line breaks or tabs. Today, they are mostly out of use.
Special Characters (32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters nor numbers. These include punctuation or technical, mathematical characters. ASCII also includes the space (a non-visible but printable character), and, therefore, does not belong to the control characters category, as one might suspect.
Numbers (48–57): These numbers include the ten Arabic numerals from 0-9.
Letters (65–90 / 97–122): Letters are divided into two blocks, with the first group containing the uppercase letters and the second group containing the lowercase.
Using the following two strings and the following code:
char str1[] = {"asdf"};
char str2[] = {"jkl;"};
Following demonstrates XORing the elements of the strings:
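#include <stdio.h>
int main(void)
{
char str1[] = {"asdf"};
char str2[] = {"jkl;"};
/* sizeof counts the terminating null character too, so the last pair is 0 ^ 0 */
for(size_t i=0;i<sizeof(str1)/sizeof(str1[0]);i++)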
{
printf("%d ^ %d: %d\n", str1[i],str2[i], str1[i]^str2[i]);
}
getchar();
return 0;
}
While all of the input characters are printable (except the terminating null character), not all of the XOR results of corresponding characters are:
97 ^ 106: 11 //not printable
115 ^ 107: 24 //not printable
100 ^ 108: 8 //not printable
102 ^ 59: 93
0 ^ 0: 0
This is why you are seeing the odd output. While all of the values may be completely valid for your purposes, they are not all printable.

Does strlen return same value for a binary and ascii data

Please find the below code snippet.
unsigned char bInput[20];
unsigned char cInput[20];
From a function, I get binary data in bInput, and I determined its length using strlen(bInput).
I converted bInput from binary to ASCII, stored the result in cInput, and printed its length. The two lengths are different.
I am new to programming. Please guide regarding its behaviour.

Function strlen returns the index of the first character in memory with a value of 0 (AKA '\0'), starting from the memory address indicated by the input argument passed to this function.
If you pass a memory address of "something else" other than a zero-terminated string of characters (which has been properly allocated at that memory address), then the behavior is undefined, and there's a fair chance that it will result in a memory-access violation (AKA segmentation fault).

The result won't be the same in both cases.
Below is one sample scenario:
Null is valid UTF-8, it just doesn't work with C 'strings'.
char temp[8] = "abcde\0f";
What we have here is a buffer of length 8, which contains these char values:
97 98 99 100 101 0 102 0
Here, strlen(temp) is equal to 5 as per the strlen design; however, the actual length of the buffer is eight.

strlen() counts each byte until it reaches the null character ('\0', i.e. a byte whose value is zero). So if you are getting different lengths for the binary and ASCII data, you need to check the two points below in your conversion logic:
what you are doing if a binary value is zero.
whether you are converting any nonzero binary value to zero.

C: sizeof() related doubts?

#include <stdio.h>
#include <string.h>
main()
{
printf("%d \n ",sizeof(' '));
printf("%d ",sizeof(""));
}
output:
4
1
Why is the output 4 for the first printf? Moreover, if I write '' instead, I get error: empty character constant, but an empty pair of double quotes (without any space) is fine and gives no error. Why?

' ' is an example of an integer character constant, which has type int (it's not converted, it has that type). The second, "", is a string literal, which contains only one character, i.e. the null character, and since sizeof(char) is guaranteed to be 1, the size of the whole array is 1 as well.

' ' is an integer character constant (hence 4 bytes on your machine); "" is an empty character array, which still holds the 1-byte terminator '\0'.

Check the difference below:
#include<stdio.h>
int main()
{
char a= 'b';
/* sizeof yields a size_t, so the matching format specifier is %zu */
printf("%zu %zu %zu", sizeof(a),sizeof('b'), sizeof("a"));
return 0;
}
Here a is defined as a char, whose size is 1 byte.
But 'b' is a character constant. A character constant is an integer: its value is the numeric value of the character in the machine's character set. The size of a character constant is therefore sizeof(int), which is 4 bytes on most current platforms.
"a" is a string literal, i.e. a character array whose size is the number of characters plus the terminating '\0'. Here it is 2.

This is answered in Size of character ('a') in C/C++
In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.

The 'space', or 'any single character', is actually of type int, with a value equal to the ASCII value of that character. So its size will be sizeof(int), typically 4 bytes.
Only if you create a char variable and store a character in it is it stored in 1 byte of memory.
char ch;
ch=' ';
printf("%zu",sizeof(ch));
//outputs 1
For anything to be a string, it must be terminated with a null character, represented as '\0'.
If we write a string "hello", it is actually stored as 'h' 'e' 'l' 'l' 'o' '\0', so that the system knows the string ends after the 'o' in "hello" and stops reading when the null character comes. The length of this string is still 5 if you use the strlen() function, but sizeof("hello") is 6 bytes.
When we create an empty string, like "", its length is 0 but its size is 1 byte, as it must terminate where it starts, i.e. at the 0th character.
Hence an empty string consists of only one character, the null character, giving a size of 1 byte.

From C Traps and Pitfalls
Single and double quotes mean very different things in C.
A character enclosed in single quotes is just another way of writing the integer that corresponds to the given character. Thus, in an ASCII implementation, ' ' means exactly the same thing as 32.
On the other hand, a string enclosed in double quotes is a short-hand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. Thus "", the empty string, still contains the '\0' character, whose size is one.

In the first case the operand is a character constant, so the sizeof operator sees its ASCII value as an integer (type int), and the result is 4.
In the second case the operand is a string with no data in it, i.e. an empty string. An empty string still contains the terminating null character, so its size is 1.

Confused about C string constants

When I came across this C language implementation of Porters Stemming algorithm I found a C-ism I was confused about.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void test( char *s )
{
int len = s[0];
printf("len= %i\n", len );
printf("s[len] = %c\n", s[len] );
}
int main()
{
test("\07" "abcdefg");
return 0;
}
and output:
len = 7
s[len] = g
However, when I input
test("\08" "abcdefgh");
or any string constant that is longer than 7 with the corresponding length in the first pair of parenthesis ( i.e. test("\09" "abcdefghi"); the output is
len = 0
s[len] =
But any input like test("\01" "abcdefgh"); prints out the character in that position ( if we call the first character position 1 and not 0 for the moment )
It appears if test( char *s ) reads the number in the first pair of parenthesis ( how it does this I am not sure since I thought s[0] would be able to only read a single char, i.e. the '\' ) and prints the last character at that index + 1 of the string constant in the second pair of parenthesis.
My question is this: It seems as if we are passing two string constants into test( char *s ). What exactly is happening here, meaning, how does the compiler seem to "split" up the string over two pairs of parenthesis? Another question one might have is, is a string of the form "blah" "abcdefg" one consecutive block of memory? It may be the case that I have overlooked something elementary, but even so I would like to know what I overlooked. I know this is a basic concept but I could not find a clear example or situation on the web that explains this and in all honesty I don't follow the output. Any helpful comments are welcomed.

There are at least three things going on here:
Literal strings juxtaposed against one another are concatenated by the compiler. "a" "b" is exactly the same as "ab".
The backslash is an escape character, which means it is not copied literally into the resulting string. The notation \01 means "the character with ASCII value 1".
The notation \0... means an octal character constant. Octal numbers are base 8, made up from digits that range from 0 through 7 inclusive. 8 is not a valid octal constant, so "\08" does not follow "\07".

The problem is not in the length of the string, but in the \o syntax for specifying non-printable values in string literals. \o, \oo, and \ooo denote octal constants, i.e. a single character whose value is written in base 8. Since 08 in \08 doesn't represent a valid base 8 number, it is interpreted as \0 followed by the ASCII character 8.
To fix the problem, represent 8 as \10 or \010:
test("\007" "abcdefg");
test("\010" "abcdefgh");
...or switch to hexadecimal, where the \x prefix makes the base more explicit to the casual reader:
test("\x07" "abcdefg");
test("\x08" "abcdefgh");
test("\x09" "abcdefghi");
test("\x0a" "abcdefghij");
...

\number in a character or string literal means the character whose code is the value number. number is interpreted in octal, so the first non-octal digit terminates the number. So "\07" is a one-character string containing the character with code 7, but "\08" is a two-character string containing the character with code 0 followed by the digit 8.
Additionally, code 0 is the null terminator that's used in C to indicate the end of the string. So that second string ends at the beginning, because its first byte is the terminator. This is why the length of the string in your second example is 0.

When two or more string literals are adjacent (separated only by white-space), the compiler will join them into a single string. Therefore "\07" "abcdefg" is equivalent to "\07abcdefg".
"\07" is an octal escape. An octal escape ends after three digits or at the first non-octal character. So, when you enter "\08", 8 is a non-octal character, therefore the escape ends and 0 is stored at s[0].
Now, len is 0 and printing s[len] prints s[0], which is the null character (value 0) and has no visible representation. (In ASCII, only the values 32 through 126 are printable.)

Looping through an array in C

Just wondering if someone could explain this to me? I have a program that asks a user to input a sentence. The program then reads the user input into an array and changes all of the vowels to a $ sign. My question is how does the for loop work? When initialising char c = 0; does that not mean that the array element is an int? I can't understand how it functions.
#include <stdio.h>
#include <string.h>
int main(void)
{
char words[50];
char c;
printf("Enter any number of words: \n");
fgets(words, 50, stdin);
for(c = 0; words[c] != '\n'; c++)
{
if(words[c] =='a'||words[c]=='e'||words[c]=='i'||words[c]=='o'||words[c]=='u')
{
words[c] = '$';
}
}
printf("%s", words);
return 0;
}

The code treats c as an integer variable (in C, char is basically a very narrow integer). In my view it would be cleaner to declare it as int (perhaps unsigned int). However, given that words is at most 50 characters long, char c works fine.
As to the loop:
c = 0 initializes c to zero.
words[c] != '\n' checks -- right at the start and also after each iteration -- whether the current character (words[c]) is a newline, and stops if it is.
c++ increments c after each iteration.

An array is like a building, you have several floors each one with a number.
In the floor 1 lives John.
In floor 2 lives Michael.
If you want to go to John's apartment you press 1 in the elevator. If you want to go to Michael's you press 2.
It's the same with arrays. Every position in the array stores a value, in this case a letter.
Every position has an associated index. The first position is 0.
When you want to access a position of the array you use array[position], where position is the index in the array that you want to access.
The variable c holds the position to be accessed. When you do words[c] you're actually accessing position c in the array and retrieving its value.
Suppose the word is cool
word[1] results in o,
word[0] results in c
To find the end of the input, the loop looks for the character \n, which fgets stores after the last character of the line when the whole line fits in the array.

Not really, char and int are implicitly converted.
You can look at a char in this case as a smaller int. sizeof(char) == 1, so it's smaller than an int, which is probably the reason it was used. Programmatically, there's no difference in this case, unless the input string is very long, in which case the char index will overflow before an int would.

Number literals (such as 0 in your case) are compatible with variables of type char. In fact, even a character literal enclosed in single quotes (for example '\n') is of type int but is implicitly converted to a char when assigned or compared to another char.
Number literals are interchangeable with character literals, as long as the former do not exceed the range of a character.
The following should result in a compiler warning:
char c = 257;
whereas this will not:
char c = 127;

A char in C is an integral type, as are short, int, long, and long long (and many other types):
It is defined as the smallest addressable unit on the machine you are compiling for and will usually be 8 bits, which means a signed char can hold values -128 to 127, and an unsigned char can hold values 0 - 255. Whether plain char is signed or unsigned is implementation-defined.
It works as an iterator in the above since the loop always stops before 50 and a char can hold values up to at least 127. An int can usually hold values up to 2,147,483,647, but takes up 4 times the space of an 8-bit char. An int is only guaranteed to be at least 16 bits in C, which means values between −32,768 and 32,767, or 0 - 65,535 for an unsigned int.
So your loop is just accessing elements in your array, one after the other: words[0] at the beginning to look at the first character, then words[1] to look at the next character. I'm assuming char is 8 bits and signed on your machine, as that is very common. Your char will be enough to store the iterator for your loop until it gets above 127. If you read in more than 127 characters (instead of just 50) and used a char to iterate, you would run into weird problems, since the char can't hold 128 and will typically wrap around to -128, causing you to access words[-128]. That is undefined behavior and may well result in a segmentation fault.
