What is the size of a given char? - c

I've got some C code that I cannot understand.
char c is a character that is supposed to be randomized.
Here is my question: 26 is supposed to be the size of a range of values starting from 97. That is easy to understand for an integer, but for a char I have no clue what it is supposed to mean.
char c = (char) rand() % 26 + 97;

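That generates a random character. In ASCII, the lowercase letters start at 97 ('a'). So the code takes a random number between 0 and 25 and adds 97 to it, which yields a random lowercase letter. Note that the cast binds more tightly than %, so (char) is applied to the result of rand() before the remainder is taken, not to the final sum. Where char is signed, that intermediate value can be negative, so writing (char)(rand() % 26 + 97) is safer.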

97 = ascii 'a'
It generates a random character between 'a' and 'z' inclusive.
Ref: ASCII values

It's a bad, non-portable way to generate a random character by directly computing the ASCII code.
A better, more portable, way is to randomize the index into a table of characters. This pushes the responsibility for what code is used to represent each character into the compiler, where it belongs:
char random_char(void)
{
const char alpha[] = "abcdefghijklmnopqrstuvwxyz";
return alpha[rand() % (sizeof alpha - 1)]; /* - 1 skips the terminating '\0' */
}
Any decent compiler will very likely inline the above.
NOTE Using % to range-limit the return value of rand() is generally frowned upon, but that's not the focus here.

Related

When trying to create a cipher, how do you go about changing the characters in a string to integers

What I am trying to ask is this: when you are coding a cipher and asking for user input, how would you go about changing the characters in the string to numbers so that you could plug them into a formula and then get another letter out?
string s = get_string("Plain Text:");
while (s[a] != '\0')
{
if (isalpha(s[a]))
{
for (c = 0, e = strlen(s); c < e; c++)
{
if (isupper(s[a]))
{
printf("C\n");
a++;
}
if (islower(s[a])) // (x=(?+(argv[1][i]))%26) This is the formula, the ? is where im trying to figure out how to change characters into numbers and then back in characters
{
printf("c\n");
a++;
}
}
}
}
The printf calls were there to make sure that the code was checking for lower or upper case properly. If I'm not mistaken, I'd need two formulas with different ranges, one for upper case and one for lower case, and with that I would be able to plug them both into one cipher formula. I'm pretty sure that you could also do without the whole check for upper or lower case, but I don't really understand how you would go about creating the range or the loop so that the result always stays within alphabetical characters, whether capital or lower case.
The char type is a small integer type (8 bits on virtually every current platform). So a string is an array of small integers.
Because you mention a formula that applies c % 26 to the character values, I'm assuming that you are only interested in ASCII text for this exercise. If you care about non-ASCII encodings, the whole approach will need to change. You can't do byte-by-byte analysis with strings in general. For example, in UTF8, some glyphs will span more than one byte, and in UTF16, each glyph is at least two bytes.
The upper-case A–Z characters are in the range 65..90 and the lower-case a-z characters are in the range 97..122. So you will want to get those values in the range 0..25 before applying an arithmetic formula to them.
You can do this like this:
When isupper(s[a]) is true: int c = s[a] - 65;
When islower(s[a]) is true: int c = s[a] - 97;
Then apply your formula.
As an aside, normally when doing arithmetic with chars, you'll need to take special care with byte values between 128 and 255, because plain char may be a signed type, in which case those bytes show up as negative numbers. And with Unicode strings, some of your glyphs may span more than one byte each. But because we're only considering ASCII strings, this doesn't apply here.

Clear explanation of ndigit[c - '0']

I've finally started to read the K&R, and I've just arrived to the Array part.
But in the example of this section, there's a piece of code I really don't understand completely, so I'd like to ask you for a clear explanation of it, as I wanna understand every concept of C as I know it's the fundamental for a fast learning of C++.
By the way I already have a decent knowledge of JAVA, hope this will help you in your explanation setup.
Question:
In this piece of code, ndigit[c - '0'], I don't understand what it's trying to do. I know from other Stack Overflow questions that '0' refers to the ASCII standard and should be 48, but I still don't understand how c and that '0' are related.
Short answer:
With ndigit[c - '0'] you are not doing arithmetic on the digits as you see them, e.g. [5 + 5] = [10]. The characters are stored as character codes, and the arithmetic is done on those codes.
Put simply: in English we write five with the sign "5". Other languages have their own sign for five, e.g. 五 is five in Japanese. Likewise, a computer represents the character 5 with the code 53 (assuming ASCII; ASCII is like a language). So in ndigit[c - '0'] the arithmetic is done on character codes.
Long answer:
Lets go through ndigit[c - '0'] first.
ndigit = name of the array
[ ] = subscript; here it selects which element of the array to access.
c - '0' = arithmetic operation. c is a variable, - is minus, and '0' is not 0 but 48.
Now let me add some more code from the book:
int ndigit[10];
...//fill in the array with 0s
while((c = getchar()) != EOF)
if(c >= '0' && c <= '9')
++ndigit[c - '0']; //<== unable to understand this part
Something to note here is getchar(). getchar() returns an int. It does not return the character as you typed it but its character code. Let me give an example:
#include <stdio.h>
int main(){
int c;
c = getchar();
printf("%d\n", c);
}
output:
#mix:~
$ cc test.c
#mix:~
$ ./a.out
5 ;wrote 5 in terminal
53 ; printed 53 instead of 5
5 is the character. And 53 is the character code of 5.
Now let's get back to our main topic, ndigit[c - '0']. c gets its value from getchar(), and getchar() reads from the input. Let's say the input is 5. Because of how getchar() behaves, c will contain 53 instead of 5. So
ndigit[c - '0'] == `ndigit[53 - '0']` != `ndigit[5 - '0']`
Also notice we are not using 0 but '0'. With a plain 0 the arithmetic would be
ndigit[53 - 0]
= ndigit[53] ;**wrong**
Using '0' (in single quotes) means we want the character code. As described above, '0' = 48 (according to ASCII), so
ndigit[53 - '0']
= ndigit[53 - 48]
= ndigit[5]
Now we get back the digit that was read by getchar(). But why do we need it back? Would ndigit[c] work instead of ndigit[c - '0']?
ndigit[c] won't work because we declared the array as int ndigit[10]. It holds 10 elements, so the only valid indices are 0 through 9, and ndigit[53] is out of bounds. That's why we use ndigit[c - '0']: subtracting the character codes gives a value from 0 to 9.
If this is still unclear, search for and read about the following:
character encoding
arrays in C
In this code c holds a character code (K&R declares it as int so that it can also hold EOF), presumably representing a digit. In C the character types are integer types, so you can perform arithmetic operations on them.
Digits are encoded with numbers from a consecutive range: if the code for '0' is k, the code for '1' is k+1, the code for '2' is k+2, and so on. That is why by subtracting '0' from a character representing a digit you get the numeric value of that digit.
For example, by subtracting '5'-'0' you get a numeric 5 instead of character '5'.
If you make an array ndigit[10], then ndigit[c - '0'] lets you access an array element corresponding to the digit. This can be used, for example, to count the number of different digits in the input.
As you said, '0' is equal to 48 (assuming ASCII encoding). Thus the other digits are equal to 49 through 57 respectively. So '1' is equal to 49, '2' to 50 etc. Thus '1' - '0' is equal to 49 - 48, which is 1, and '2' - '0' is equal to 50 - 48, which is 2, and so on.
In other words c - '0' converts a digit like '5' to its integer equivalent (which would be 5 for '5').

Differences between int/char arrays/strings

I'm still new to the forum so I apologize in advance for forum - etiquette issues.
I'm having trouble understanding the differences between int arrays and char arrays.
I recently wrote a program for a Project Euler problem that originally used a char array to store a string of numbers, and later called specific characters and tried to use int operations on them to find a product. When I used a char string I got a ridiculously large product, clearly incorrect. Even if I converted what I thought would be compiled as a character (str[n]) to an integer in-line ((int)str[n]) it did the exact same thing. Only when I actually used an integer array did it work.
Code is as follows
for the char string
char str[21] = "73167176531330624919";
This did not work. I got an answer of about 1.5 trillion for an answer that should have been about 40k.
for the int array
int str[] = {7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9};
This is what did work. I took off the in-line type casting too.
Any explanation as to why these things worked/did not work and anything that can lead to a better understanding of these ideas will be appreciated. Links to helpful stuff are as well. I have researched strings and arrays and pointers plenty on my own (I'm self taught as I'm in high school) but the concepts are still confusing.
Side question, are strings in C automatically stored as arrays or is it just possible to do so?
To elaborate on WhozCraig's answer, the trouble you are having does not have to do with strings, but with the individual characters.
Strings in C are stored by and large as arrays of characters (with the caveat that there exists a null terminator at the end).
The characters themselves are encoded in a system called ASCII, which assigns codes from 0 to 127 to the characters used in English (only). Thus '7' is not stored as 7 but as the ASCII code of '7', which is 55.
I think now you can see why your product got so large.
One elegant way to fix would be to convert
int num = (int) str[n];
to
int num = str[n] - '0';
//thanks for fixing, ' ' is used for characters, " " is used for strings
This solution subtracts the ASCII code for '0' from the ASCII code for your character, say '7'. Since the digits are encoded consecutively, this works for single digits. For multi-digit numbers, you should use atoi or strtol from stdlib.h.
Strings are just character arrays with a null terminating byte.
There is no separate string data type in c.
When using a char as an integer, the numeric ascii value is used. For example, saying something like printf("%d\n", (int)'a'); will result in 97 (the ascii value of 'a') being printed.
You cannot do numeric calculations directly on a string of digit characters; each character has to be converted to its integer value first. To convert a digit character into its integer form, you can do something like this:
char a = '2';
int a_num = a - '0';
//a_num now stores integer 2
This causes the ascii value of '0' (48) to be subtracted from ascii value '2' (50), finally leaving 2.
char str[21] = "73167176531330624919";
this code is equivalent to
char str[21] = {'7','3','1','6','7','1','7','6','5','3',
'1','3','3','0','6','2','4','9','1','9','\0'};
so what is stored in str is not the numbers themselves but chars, whose ASCII codes differ from the digit values.
Side question answer: yes and no. Strings are automatically stored as char arrays, but a string has an extra character ('\0') as its last element, which a plain char array need not have.

String parsing in C

how would you parse the string, 1234567 into individual numbers?
char mystring[] = "1234567";
Each digit is going to be mystring[n] - '0'.
What Delan said. Also, it's arguably bad practice for maintainability to rely on character-code tricks (although the C standard does guarantee that the digits '0' to '9' are consecutive). To convert a whole string at once, the standard library has:
int atoi ( const char * str );
EDIT: Much better idea (the one above has been pointed out to me as being a slow way to do it) Put a function like this in:
int ASCIIdigitToInt(char c){
return (int) c - '0';
}
and iterate this along your string.
Don't forget that a string in C is actually an array of type 'char'. You could walk through the array, and grab each individual character by array index and subtract from that character's ascii value the ascii value of '0' (which can be represented by '0').

Char to int conversion in C

If I want to convert a single numeric char to its numeric value, for example, if:
char c = '5';
and I want c to hold 5 instead of '5', is it 100% portable doing it like this?
c = c - '0';
I heard that all character sets store the numbers in consecutive order so I assume so, but I'd like to know if there is an organized library function to do this conversion, and how it is done conventionally. I'm a real beginner :)
Yes, this is a safe conversion. C requires it to work. This guarantee is in section 5.2.1 paragraph 3 of the ISO C standard (C11), a late draft of which is N1570:
Both the basic source and basic execution character sets shall have the following
members:
[...]
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
[...]
In both the source and execution basic character sets, the
value of each character after 0 in the above list of decimal digits shall be one greater than
the value of the previous.
Both ASCII and EBCDIC, and character sets derived from them, satisfy this requirement, which is why the C standard was able to impose it. Note that letters are not contiguous in EBCDIC, and C doesn't require them to be.
There is no library function to do it for a single char, you would need to build a string first:
int digit_to_int(char d)
{
char str[2];
str[0] = d;
str[1] = '\0';
return (int) strtol(str, NULL, 10);
}
You could also use the atoi() function to do the conversion, once you have a string, but strtol() is better and safer.
As commenters have pointed out though, it is extreme overkill to call a function to do this conversion; your initial approach to subtract '0' is the proper way of doing this. I just wanted to show how the recommended standard approach of converting a number as a string to a "true" number would be used, here.
Try this:
char c = '5';
int i = c - '0';
You should be aware that this doesn't perform any validation of the character. For example, if the character was 'a' then you would get 97 - 48 = 49. Especially if you are dealing with user or network input, you should probably perform validation to avoid bad behavior in your program. Just check the range:
if ('0' <= c && c <= '9') {
i = c - '0';
} else {
/* handle error */
}
Note that if you want your conversion to handle hex digits you can check the range and perform the appropriate calculation.
if ('0' <= c && c <= '9') {
i = c - '0';
} else if ('a' <= c && c <= 'f') {
i = 10 + c - 'a';
} else if ('A' <= c && c <= 'F') {
i = 10 + c - 'A';
} else {
/* handle error */
}
That will convert a single hex character, upper or lowercase independent, into an integer.
You can use atoi, which is part of the standard library.
Since you're only converting one character, the function atoi() is overkill. atoi() is useful if you are converting string representations of numbers. The other posts have given examples of this. If I read your post correctly, you are only converting one numeric character, that is, a character in the range '0' to '9'. In that case, your suggestion to subtract '0' will give you the result you want. This works because the ASCII values of the digits are consecutive (like you said). So subtracting the ASCII value of '0' (48, see an ASCII table for values) from a numeric character gives the value of the digit. In your example of c = c - '0' where c = '5', what really happens is 53 (the ASCII value of '5') - 48 (the ASCII value of '0') = 5.
When I first posted this answer, I didn't take into consideration your comment about being 100% portable between different character sets. I did some further looking around and it seems like your answer is still mostly correct. The problem is that you are using a char, which is an 8-bit data type and can't represent every character in every character set. Read the article by Joel Spolsky on Unicode for a lot more information. In this article, he says that he uses wchar_t for characters. This has worked well for him and he publishes his web site in 29 languages. So, you would need to change your char to a wchar_t. Other than that, he says that the characters with values of 127 and below are basically the same everywhere. This includes the characters that represent digits. This means the basic math you proposed should work for what you were trying to achieve.
Yes. This is safe as long as you are using standard ascii characters, like you are in this example.
Normally, if there's no guarantee that your input is in the '0'..'9' range, you'd have to perform a check like this:
if (c >= '0' && c <= '9') {
int v = c - '0';
// safely use v
}
An alternative is to use a lookup table. You get simple range checking and conversion with less (and possibly faster) code:
// one-time setup of an array of 256 integers;
// all slots set to -1 except for ones corresponding
// to the numeric characters
static const int CHAR_TO_NUMBER[] = {
-1, -1, -1, ...,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // '0'..'9'
-1, -1, -1, ...
};
// Now, all you need is:
int v = CHAR_TO_NUMBER[(unsigned char)c]; // cast so a signed char never yields a negative index
if (v != -1) {
// safely use v
}
P.S. I know that this is an overkill. I just wanted to present it as an alternative solution that may not be immediately evident.
As others have suggested, but wrapped in a function:
int char_to_digit(char c) {
return c - '0';
}
Now just use the function. If, down the line, you decide to use a different method, you just need to change the implementation (performance, charset differences, whatever), you wont need to change the callers.
This version assumes that c contains a char which represents a digit. You can check that before calling the function, using ctype.h's isdigit function.
Since the ASCII codes for '0', '1', '2', ... run from 48 to 57, they are contiguous. The arithmetic operators promote char operands to int, so what you are basically doing is:
53 - 48, which gives the value 5, with which you can do any integer operation. Note that when converting back from int to char the compiler gives no error. For unsigned char the value is reduced modulo 256 (with 8-bit chars), while for a signed char an out-of-range value gives an implementation-defined result.
You can simply use the atol() function, which takes a string rather than a single char:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const char *c = "5";
long d = atol(c); /* atol() returns long */
printf("%ld\n", d);
return 0;
}

Resources