CS50 problem result is not right and I don't understand why - c

I'm really sorry to bother you, but I have a problem and I don't know how to fix it. I've been doing CS50 and in Problem Set 2 I've been getting wrong results and I don't understand what I did wrong. It should give me "Before Grade 1" for the sentence "One fish. Two fish. Red fish. Blue fish." but it gives me Grade 2, and it gives Grade 14 for a sentence that should be Grade 16+. Can someone help me? This is my code:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
int count_letters(string text);
int count_word(string text);
int count_sentences(string text);
int letters;
int words;
int sentences;
int main(void)
{
string text = get_string("Text: ");
printf("Text: %s\n", text);
count_letters(text);
count_word(text);
count_sentences(text);
float L = 100 * (letters / words);
float S = 100 * (sentences / words);
int index = round(0.0588 * L - 0.296 * S - 15.8);
if (index < 1)
{
printf("Before Grade 1\n");
} else if (index > 16)
{
printf("Grade 16+\n");
} else
{
printf("Grade: %i\n", index);
}
}
int count_letters(string text)
{
letters = 0;
for (int i = 0; i < strlen(text); i++)
{
if ((text[i] >= 65 && text[i] <= 99) || (text[i] >= 97 && text[i] <= 122))
{
letters++;
}
}
return letters;}
int count_word(string text)
{
words = 1;
for (int i = 0; i < strlen(text); i++)
{
if (isspace(text[i]))
{
words++;
}
}
return words;}
int count_sentences(string text)
{
sentences = 0;
for (int i = 0; i < strlen(text); i++)
{
if (text[i] == 33 || text[i] == 46 || text[i] == 63)
{
sentences++;
}
}
return sentences;
}
Thank you!!

Regarding types:
There's a phenomenon sometimes called "sloppy typing", which basically means carelessly spamming out any random variable type in the code and then wondering why nothing works. Here are some things you need to know:
Integer division truncates the result by dropping the remainder. int is an integer type, and constants such as 100 are also int. Thus 3/2 evaluates to 1 in C.
Unless you have very good and exotic reasons1), you should pretty much never use the type float in a C program.
On high end systems like PC, always use double. All the default math functions use double, for example round(). If you wish to use them with float you'd have to use a special function, roundf(). Similarly, all floating point constants such as 1.0 are of type double. If you want them to be float you'd write 1.0f.
For the two above reasons, make it a habit of never mixing different types in the same expression. Do not mix integer and floating point. Do not mix float and double. Mixing types can lead to unexpected implicit conversion problems, accidental truncation, loss of precision and so on.
So for example a line such as float L = 100 * (letters / words); needs to be rewritten to explicitly use double everywhere:
double L = 100.0 * ((double)letters / (double)words);
1) Like using a microcontroller with single precision floating point FPU but only software floating point double. Or a FPU where double would be far less efficient.
Regarding functions and global variables:
The variables letters, words and sentences could have been declared locally inside main(), since your functions return the values to use anyway.
Declaring global variables like you do is considered very bad practice for multiple reasons. It exposes variables to other parts of the program that shouldn't have access, which in turn increases chances of accidental/intentional abuse since the variables are available everywhere. It increases the chances for naming collisions. It is unsafe in multi-threaded applications. It's universally bad; don't do this.
Instead pass variables to/from functions by parameters and return values.
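Putting the two points together (local variables plus the double arithmetic shown above), here is a minimal sketch of how main() could look while reusing your existing functions unchanged:

int main(void)
{
    string text = get_string("Text: ");

    // each helper returns its count, so no globals are needed
    int letters = count_letters(text);
    int words = count_word(text);
    int sentences = count_sentences(text);

    double L = 100.0 * (double)letters / (double)words;
    double S = 100.0 * (double)sentences / (double)words;
    int index = (int)round(0.0588 * L - 0.296 * S - 15.8);

    if (index < 1)
    {
        printf("Before Grade 1\n");
    }
    else if (index > 16)
    {
        printf("Grade 16+\n");
    }
    else
    {
        printf("Grade: %i\n", index);
    }
}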
Regarding "magic numbers":
Dropping a constant such as 0.0588 in the middle of source code, with zero explanation of where that number comes from or what it does, is known as using "magic numbers". This is bad practice since the reader of the code has no clue what it's there for. The reader is most often yourself, one year from now, when you have forgotten all about what the code does, so it's a form of self-torture.
So instead of typing out some number like that, use a #define or const variable with a meaningful name, then use that meaningful name in the equation/expression instead.
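For example (the names here are just made up for illustration, for the readability formula you are using):

#define LETTER_WEIGHT   0.0588
#define SENTENCE_WEIGHT 0.296
#define GRADE_OFFSET    15.8

int index = (int)round(LETTER_WEIGHT * L - SENTENCE_WEIGHT * S - GRADE_OFFSET);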
In case of symbol table values, we don't have to invent that meaningful name ourselves, since C already has a built-in mechanism for it. Instead of text[i] >= 65 you should type text[i] >= 'A', which is 100% equivalent but much more readable.
An advanced detail regarding symbol table values is that they aren't actually guaranteed to be adjacent. Something like ch >= 'A' && ch <= 'Z' may work on classic 7 bit ASCII, but it's non-portable and also doesn't cover "locale" - local language-specific letters (as seen in Spanish, French, German and so on - almost every major language using Latin letters). The portable solution is to use isupper or islower from ctype.h. Or, in your case better yet, isalpha.
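For example, a minimal sketch of count_letters using isalpha (the cast to unsigned char is how the ctype.h functions should be fed):

int count_letters(string text)
{
    int letters = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (isalpha((unsigned char) text[i])) // portable letter test from ctype.h
        {
            letters++;
        }
    }
    return letters;
}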
Regarding code formatting:
Don't invent some own non-standard formatting. There are some things regarding code formatting that are subjective, but these are not:
Always use empty lines between function bodies.
Always place the last } of a function at a line of its own.

Related

Count Different Character Types In String

I wrote a program that counts and prints the number of occurrences of characters in a string, but it prints a garbage value when I use fgets(), whereas with gets() it does not.
Here is my code:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main() {
char c[1005];
fgets(c, 1005, stdin);
int cnt[26] = {0};
for (int i = 0; i < strlen(c); i++) {
cnt[c[i] - 'a']++;
}
for (int i = 0; i < strlen(c); i++) {
if(cnt[c[i]-'a'] != 0) {
printf("%c %d\n", c[i], cnt[c[i] - 'a']);
cnt[c[i] - 'a'] = 0;
}
}
return 0;
}
This is what I get when I use fgets():
baaaabca
b 2
a 5
c 1
32767
--------------------------------
Process exited after 8.61 seconds with return value 0
Press any key to continue . . . _
I fixed it by using gets() and got the correct result, but I still don't understand why fgets() gives the wrong result.
Hurray! So, the most important reason your code is failing is that it does not observe the following inviolable advice:
Always sanitize your inputs
What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:
Exploits of a Mom #xkcd.com
It is always worth reading the explanation even if you get it already #explainxkcd.com
medium.com wrote an article about “How Little Bobby Tables Ruined the Internet”
Heck, Bobby’s even got his own website — bobby-tables.com
Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.
For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” #makeuseof.com — a good number of which were caused by failure to process bad input!
Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!
Where your code went wrong
Using gets() is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?
But it does remove the Enter key from the end of the user’s input!
fgets(), in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of text by not removing that Enter key.
Hence, assuming the user types: b a n a n a Enter
gets() returns the string "banana"
fgets() returns the string "banana\n"
That newline character '\n' (what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!
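(As an aside: if all you wanted was to drop that trailing newline, one common idiom is the line below, placed right after the fgets() call. The rest of this answer takes the more general route of validating every character.)

c[strcspn(c, "\n")] = '\0'; // chop the '\n' that fgets() kept, if there is one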
The Fix
The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:
// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";
Now we want to create a mapping from each letter in letters[] to an array of integers (its name doesn’t matter, but we’re calling it count[]). Let’s wrap that up in a little function:
// Here is our mapping of letters[] ←→ integers[]
// • supply a valid input → get an integer unique to that specific input
// • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
    static int fooey;                        // number of invalid inputs
    static int count[sizeof(letters)] = {0}; // numbers of each valid input 'a'..'z'
    const char * p = strchr(letters, c);     // find the valid input, else NULL
    if (p) {
        int index = p - letters;  // 'a'=0, 'b'=1, ... (same order as in letters[])
        return &count[index];     // VALID INPUT → the corresponding integer in count[]
    }
    else return &fooey;           // INVALID INPUT → returns a dummy integer
}
For the more astute among you, this is rather verbose: we can totally get rid of those fooey and index variables.
“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”
Easy. Just check that your character is in range:
int * histogram(char c) {
    static int fooey = 0;
    static int count[26] = {0};
    if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
    return &fooey;
}
“But EBCDIC...!”
Fine. The following will work with both EBCDIC and ASCII:
int * histogram(char c) {
    static int fooey = 0;
    static int count[26] = {0};
    if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
    if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
    if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
    return &fooey;
}
You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'. Prove me wrong.
Back to main()
Before we forget, stick the required magic at the top of your program:
#include <stdio.h>
#include <string.h>
Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.
int main() {
    // Ask for and get user input
    char s[1005];
    printf("s? ");
    fgets(s, 1005, stdin);

    // Histogram the input
    for (int i = 0; i < strlen(s); i++) {
        *histogram(s[i]) += 1;
    }

    // Print out the histogram, not printing zeros
    for (int i = 0; i < strlen(letters); i++) {
        if (*histogram(letters[i])) {
            printf("%c %d\n", letters[i], *histogram(letters[i]));
        }
    }
    return 0;
}
We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[] array! Win-win!
s? a - ba na na !
a 4
b 1
n 2
But wait, there’s more!
We can totally reuse our histogram. Check out this little function:
// Reset the histogram to all zeros
//
void clear_histogram(void) {
    for (const char * p = letters; *p; p++)
        *histogram(*p) = 0;
}
All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!
Another way you could handle input is to transform things into acceptable values. For example you can use tolower() to convert any majuscule letters into your histogram's input set.
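A minimal sketch of that transformation in the histogram loop from main() above (tolower() comes from ctype.h), producing output like the run below:

for (int i = 0; i < strlen(s); i++) {
    *histogram(tolower((unsigned char) s[i])) += 1; // fold 'A'..'Z' onto 'a'..'z'
}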
s? ba na NA!
a 3
b 1
n 2
But I digress again...
Hang in there!

Create a precise atof() implementation in c

I have written an atof() implementation in C. I am facing rounding errors in this implementation. Putting in a test value of 1236.965 gives a result of 1236.964966, but the library atof() function returns 1236.965000. My question is: how do I make the user defined atof() implementation more 'correct'?
Can the library definition of atof() be found somewhere?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
float str_to_float(char *);
void float_to_str(float,char *);
int main(){
int max_size;
float x;
char *arr;
printf("Enter max size of string : ");
scanf("%d",&max_size);
arr=malloc((max_size+1)*sizeof(char));
scanf("%s",arr);
x=str_to_float(arr);
printf("%f\n%f",x,atof(arr));
return 0;
}
float str_to_float(char *arr){
int i,j,flag;
float val;
char c;
i=0;
j=0;
val=0;
flag=0;
while ((c = *(arr+i))!='\0'){
// if ((c<'0')||(c>'9')) return 0;
if (c!='.'){
val =(val*10)+(c-'0');
if (flag == 1){
--j;
}
}
if (c=='.'){ if (flag == 1) return 0; flag=1;}
++i;
}
val = val*pow(10,j);
return val;
}
Change all your floats to doubles. When I tested it, that gave the same result as the library function atof for your test case.
atof returns double, not float. Remember that it actually is double and not float that is the "normal" floating-point type in C. A floating-point literal, such as 3.14, is of type double, and library functions such as sin, log and (the perhaps deceptively named) atof work with doubles.
It will still not be "precise", though. The closest you can get to 1236.965 as a float is (exactly) 1236.9649658203125, and as a double 1236.964999999999918145476840436458587646484375, which will be rounded to 1236.965000 by printf. No matter how many bits you have in a binary floating-point number, 1236.965 can't be exactly represented, similar to how 1/3 can't be exactly represented with a finite number of decimal digits: 0.3333333333333333...
And also, as seen in the discussion in comments, this is a hard problem, with many possible pitfalls if you want code that will always give the closest value.
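To make the first suggestion concrete, here is a minimal sketch of the question's function with the types changed to double; the parsing logic itself is untouched (x in main() and the prototype would change to double as well):

double str_to_double(char *arr){
    int i = 0, j = 0, flag = 0;
    double val = 0;                      /* was float */
    char c;
    while ((c = *(arr + i)) != '\0'){
        if (c != '.'){
            val = (val * 10) + (c - '0');
            if (flag == 1)
                --j;
        }
        if (c == '.'){ if (flag == 1) return 0; flag = 1; }
        ++i;
    }
    return val * pow(10, j);             /* pow() already works in double */
}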
I used your code as inspiration to write my own.
What other commenters and answers do not recognize is that the original reason for the question is an embedded situation. In my case the library "atof" pulls in something that does "printf" which pulls in "systemcalls" which I don't have.
So.... here I present a simple (does not implement exponential notation) atof implementation that works in floats, and is suitable for embedding.
My implementation uses far fewer variables.
float ratof(char *arr)
{
    float val = 0;
    int afterdot = 0;
    float scale = 1;
    int neg = 0;

    if (*arr == '-') {
        arr++;
        neg = 1;
    }
    while (*arr) {
        if (afterdot) {
            scale = scale / 10;
            val = val + (*arr - '0') * scale;
        } else {
            if (*arr == '.')
                afterdot++;
            else
                val = val * 10.0 + (*arr - '0');
        }
        arr++;
    }
    if (neg) return -val;
    else return val;
}
how to make the user defined atof() implementation more 'correct' ?
Easy: 1) never overflow intermediate calculation and 2) only round once (at the end).
It is hard to do those 2 steps.
Note: C's atof(), strtof(), etc. also handle exponential notation - in decimal and hex.
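For instance, a small demonstration of what the library functions accept (expected output values shown in the comments):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("%f\n", strtod("1.236965e3", NULL)); /* decimal exponent: 1236.965000 */
    printf("%f\n", strtod("0x1.8p1", NULL));    /* C99 hex float:    3.000000    */
    return 0;
}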
Potential roundings
val*10
(val*10)+(c-'0');
pow(10,j)
val*pow(10,j) // This last multiplication is the only tolerable one.
Potential overflow (even though the final answer is within range)
val*10
(val*10)+(c-'0');
pow(10,j)
Using a wider type like double can greatly lessen the occurrence of such problems and achieve OP's "more 'correct'". Yet they still exist.
This is not an easy problem to solve if you want the best (correct) floating point result from all string inputs.
Sample approaches to solve.
Avoid overflow: rather than pow(10,j):
val = val*pow(5,j); // rounds, `pow(5,j)` not expected to overflow a finite final result.
val = val*pow(2,j); // Does not round except at extremes
Code should form (ival*10)+(c-'0') using extended integer math in the loop for exactness.
Yet this is just scratching the surface of the many corner cases.
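For illustration only, a hedged sketch of the "accumulate exactly in an integer, scale once at the end" idea. It ignores signs, exponents and overflow of the 64-bit accumulator, so it is nowhere near a complete solution:

#include <math.h>

double simple_parse(const char *s)
{
    unsigned long long ival = 0;  /* exact while the digits fit in 64 bits */
    int frac_digits = 0;
    int seen_dot = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') { seen_dot = 1; continue; }
        ival = 10 * ival + (unsigned)(*s - '0');
        if (seen_dot) frac_digits++;
    }
    /* one conversion and one division at the end */
    return (double)ival / pow(10.0, frac_digits);
}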
@Eric Postpischil commented on a robust C++ code that handles non-exponential notation string input well. It does the initial math using integers and only rounds later in the process. That linked code is not visible unless your rep is 10,000+, as the question was deleted.

Convert read-only character array to float without null termination in C

I'm looking for a C function like the following that parses a length-terminated char array that expresses a floating point value and returns that value as a float.
float convert_carray_to_float( char const * inchars, int incharslen ) {
...
}
Constraints:
1. The character at inchars[incharslen] might be a digit or other character that might confuse the commonly used standard conversion routines.
2. The routine is not allowed to invoke inchars[incharslen] = 0 to create a zero-terminated string in place and then use the typical library routines. Even patching up the overwritten character before returning is not allowed.
3. Obviously one could copy the char array into a new writable char array and append a null at the end, but I am hoping to avoid copying. My concern here is performance.
This will be called often so I'd like this to be as efficient as possible. I'd be happy to write my own routine that parses and builds up the float, but if that's the best solution, I'd be interested in the most efficient way to do this in C.
If you think removing constraint 3 really is the way to go to achieve high performance, please explain why and provide a sample that you think will perform better than solutions that maintain constraint 3.
David Gay's implementation, used in the *BSD libcs, can be found here: https://svnweb.freebsd.org/base/head/contrib/gdtoa/ The most important file is strtod.c, but it requires some of the headers and utilities. Modifying that to check the termination every time the string pointer is updated would be a bit of work but not awful.
However, you might afterwards think that the cost of the extra checks is comparable to the cost of copying the string to a temporary buffer of known length, particularly if the strings are short and of a known length, as in your example of a buffer packed with 3-byte undelimited numbers. On most architectures, if the numbers are no more than 8 bytes long and you were careful to ensure that the buffer had a bit of tail room, you could do the copy with a single 8-byte unaligned memory access at very little cost.
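For comparison, here is a minimal sketch of that copy-to-a-temporary-buffer route (the 64-byte limit is an arbitrary assumption; real code would pick a bound that fits its data):

#include <stdlib.h>
#include <string.h>

float convert_carray_to_float(char const *inchars, int incharslen)
{
    char buf[64];
    if (incharslen <= 0 || incharslen >= (int)sizeof buf)
        return 0.0f;                      /* or report an error */
    memcpy(buf, inchars, (size_t)incharslen);
    buf[incharslen] = '\0';               /* now the library can parse it */
    return strtof(buf, NULL);
}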
Here's a pretty good outline.
Not sure it covers all cases, but it shows most of the flow:
#include <ctype.h>    /* isdigit */
#include <stdbool.h>  /* bool */

float convert_carray_to_float(char const * inchars, int incharslen)
{
    int Sign = +1;
    int IntegerPart = 0;
    int DecimalPart = 0;
    int Denominator = 1;
    bool beforeDecimal = true;

    if (incharslen == 0)
    {
        return 0.0f;
    }

    int i = 0;
    if (inchars[0] == '-')
    {
        Sign = -1;
        i++;
    }
    if (inchars[0] == '+')
    {
        Sign = +1;
        i++;
    }

    for ( ; i < incharslen; ++i)
    {
        if (inchars[i] == '.')
        {
            beforeDecimal = false;
            continue;
        }
        if (!isdigit(inchars[i]))
        {
            return 0.0f;
        }
        if (beforeDecimal)
        {
            IntegerPart = 10 * IntegerPart + (inchars[i] - '0');
        }
        else
        {
            DecimalPart = 10 * DecimalPart + (inchars[i] - '0');
            Denominator *= 10;
        }
    }
    return Sign * (IntegerPart + ((float)DecimalPart / Denominator));
}

cipher overflowing

I am ciphering like the following, but I don't know how to prevent the capitals from going into other symbols when they are shifted out of range; similarly, the lowercase letters go out of the lowercase range. How can I make them wrap around in a circle and stop the overflow? Thanks
int size = strlen(plain_text);
int arrayelement = 0;
for(arrayelement = 0; arrayelement < size; arrayelement++)
{
if (islower(plain_text[arrayelement]))
{
ciphered_text[arrayelement] = (int)(plain_text[arrayelement] + shiftkey);
}
else if (isupper(plain_text[arrayelement]))
{
ciphered_text[arrayelement] = (int)(plain_text[arrayelement] + shiftkey);
}
}
ciphered_text[size] = '\0';
printf("%s", ciphered_text);
I guess you use a type like char, so an easy solution to avoid overflow is to do
int tmp_ciphered = (my_char + shift) % 0x100; /* wrap around the full range of a byte */
char ciphered = (char)(tmp_ciphered);
then you wrap around and do not overflow; this is a ring
This duplicates (almost exactly) c++ simple Caesar cipher algorithm.
Note that I don't agree with the accepted answer on that post. Basically you have to map the characters back into the range using something like ((c - 'a' + shift) % 26) + 'a'. However, that assumes your characters are in 'a'..'z'. It might be safer to use c >= 'a' && c <= 'z' instead of islower, as I'm not sure how locale will play into it on non-English systems. Similarly for isupper and the other range. Finally, you need an else clause to handle the case when the char is not in either range.
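In the question's own terms, a hedged sketch of that wrap-around (assuming 'a'..'z' and 'A'..'Z' are contiguous, as on ASCII systems, and shiftkey is in 0..25):

for (int i = 0; i < size; i++)
{
    char c = plain_text[i];
    if (c >= 'a' && c <= 'z')
        ciphered_text[i] = (char)(((c - 'a' + shiftkey) % 26) + 'a');
    else if (c >= 'A' && c <= 'Z')
        ciphered_text[i] = (char)(((c - 'A' + shiftkey) % 26) + 'A');
    else
        ciphered_text[i] = c; /* the else clause: pass non-letters through unchanged */
}
ciphered_text[size] = '\0';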
The only truly portable way to do this involves building a lookup table for the input domain, and manually building the chars based on non-linear-assumptions.
Even for the restricted domain of ['a'..'z','A'..'Z'], assuming 'A'..'Z' is contiguous is not defined by the language standard, and is provably not always the case. For any naysayers that think otherwise, I direct you to ordinal positions of characters in the chart at this link, paying close attention to the dead-zones in the middle of the assumed sequences. If you think "Nobody uses EBCDIC anymore", let me assure you both AS/400 and OS/390 are alive and well (and probably processing your US taxes right now, as the IRS is one of IBM's biggest customers).
In fact, the C standard is pretty explicit about this:
C99-5.2.1.3 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
Nowhere is there even a mention of defined ordering or even implied ordering on any other part of the character sets. In fact, '0'..'9' has one other unique attribute: they are the only characters guaranteed to be unaffected by locale changes.
So rather than assume a linear continuation exists for characters while thumbing our noses at the suspicious silence of the standard, let us define our own, hard map. I'll not inline the code here like I normally do; if you're still with me you're genuinely interested in knowing and will likely read and critique the code below. But I will describe in summary how it works:
1. Statically declare two alphabets, double in length (A..ZA..Z, a..za..z).
2. Declare two arrays (encrypt and decrypt) large enough to hold (1<<CHAR_BIT) entries.
3. Fully initialize both arrays with values corresponding to their indexes. Ex: a[0]=0, a[1]=1, ...
4. Fill each location in the encrypt array that is part of our alphabets from (1) with the proper value corresponding to the shift width. Ex: a['a'] = 'f' for a ROT5.
5. Mirror (4) by working backward from the tail of the alphabet, applying the opposite shift direction. Ex: a['f'] = 'a'.
You can now use the encryption array as a simple table to translate input text to cipher text:
enc-char = encrypt[ dec-char ];
dec-char = decrypt[ enc-char ];
If you think it seems like a ton of work just to get source-level platform independence, you're absolutely right. But you would be amazed at the #ifdef #endif hell that people try to pass off as "multi-platform". The core goal of platform-independent code is to not only define common source, but define behavior as well. No matter what the platform, the concepts above will work. (and not a #ifdef in sight).
Thanks for taking the time to read this fiasco. Such a seemingly simple problem...
Sample main.cpp
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>
// global tables for encoding. must call init_tables() before using
static char xlat_enc[1 << CHAR_BIT];
static char xlat_dec[1 << CHAR_BIT];
void init_tables(unsigned shift)
{
    // our rotation alphabets
    static char ucase[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static char lcase[] = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
    int i = 0;

    // ensure shift is below our maximum shift
    shift %= 26;

    // prime our table
    for (; i < (1 << CHAR_BIT); i++)
        xlat_enc[i] = xlat_dec[i] = i;

    // apply shift to our xlat tables, both enc and dec.
    for (i = 0; i < (sizeof(ucase) + 1) / 2; i++)
    {
        xlat_enc[ lcase[i] ] = lcase[i + shift];
        xlat_enc[ ucase[i] ] = ucase[i + shift];
        xlat_dec[ lcase[sizeof(lcase) - i - 1] ] = lcase[sizeof(lcase) - i - 1 - shift];
        xlat_dec[ ucase[sizeof(ucase) - i - 1] ] = ucase[sizeof(ucase) - i - 1 - shift];
    }
}

// main entrypoint
int main(int argc, char *argv[])
{
    // using a shift of 13 for our sample
    const int shift = 13;

    // initialize the tables
    init_tables(shift);

    // now drop the message to the console
    char plain[] = "The quick brown fox jumps over the lazy dog.";
    char *p = plain;
    for (; *p; fputc(xlat_enc[*p++], stdout));
    fputc('\n', stdout);

    char cipher[] = "Gur dhvpx oebja sbk whzcf bire gur ynml qbt.";
    p = cipher;
    for (; *p; fputc(xlat_dec[*p++], stdout));
    fputc('\n', stdout);

    return EXIT_SUCCESS;
}
Output
Gur dhvpx oebja sbk whzcf bire gur ynml qbt.
The quick brown fox jumps over the lazy dog.
You can implement it literally:
"if they are shifted out of range":
if (ciphered_text[arrayelement] > 'z')
"make them go round in a circle and stop overflow":
ciphered_text[arrayelement] -= 26;
In your context:
if (plain_text[arrayelement] >= 'a' && plain_text[arrayelement] <= 'z')
{
    ciphered_text[arrayelement] = (int)(plain_text[arrayelement] + shiftkey);
    if (ciphered_text[arrayelement] > 'z')
        ciphered_text[arrayelement] -= 26;
}
(assuming you work with English text in ASCII encoding, and shiftkey is in the range 1...25, like it should be)

Converting Letters to Numbers in C

I'm trying to write code that converts letters into numbers. For example
A ==> 0
B ==> 1
C ==> 2
and so on. Im thinking of writing 26 if statements. I'm wondering if there's a better way to do this...
Thank you!
This is a way that I feel is better than the switch method, and yet is standards compliant (does not assume ASCII):
#include <string.h>
#include <ctype.h>

/* returns -1 if c is not an alphabetic character */
int c_to_n(char c)
{
    int n = -1;
    static const char * const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char *p = strchr(alphabet, toupper((unsigned char)c));
    if (p)
    {
        n = p - alphabet;
    }
    return n;
}
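Example usage (with stdio.h included; the values in the comments are what this mapping produces):

printf("%d\n", c_to_n('c')); /* prints 2  */
printf("%d\n", c_to_n('?')); /* prints -1 */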
If you need to deal with upper-case and lower-case then you may want to do something like:
if (letter >= 'A' && letter <= 'Z')
    num = letter - 'A';
else if (letter >= 'a' && letter <= 'z')
    num = letter - 'a';
If you want to display these, then you will want to convert the number into an ASCII value by adding '0' to it:
asciinumber = num + '0';
The C standard does not guarantee that the characters of the alphabet will be numbered sequentially. Hence, portable code cannot assume, for example, that 'B'-'A' is equal to 1.
The relevant section of the C specification is section 5.2.1 which describes the character sets:
3 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
ABCDEFGHIJKLM
NOPQRSTUVWXYZ
the 26 lowercase letters of the Latin alphabet
abcdefghijklm
nopqrstuvwxyz
the 10 decimal digits
0123456789
the following 29 graphic characters
!"#%&'()*+,-./:
;<=>?[\]^_{|}~
the space character, and control characters representing horizontal tab, vertical tab, and form feed. The representation of each member of the source and execution basic character sets shall fit in a byte. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
So the specification only guarantees that the digits will have sequential encodings. There is absolutely no restriction on how the alphabetic characters are encoded.
Fortunately, there is an easy and efficient way to convert A to 0, B to 1, etc. Here's the code
char letter = 'E'; // could be any upper or lower case letter
char str[2] = { letter }; // make a string out of the letter
int num = strtol( str, NULL, 36 ) - 10; // convert the letter to a number
The reason this works can be found in the man page for strtol which states:
(In bases above 10, the letter 'A' in either upper or lower case
represents 10, 'B' represents 11, and so forth, with 'Z' representing
35.)
So passing 36 to strtol as the base tells strtol to convert 'A' or 'a' to 10, 'B' or 'b' to 11, and so on. All you need to do is subtract 10 to get the final answer.
Another, far worse (but still better than 26 if statements) alternative is to use switch/case:
switch (letter)
{
    case 'A':
    case 'a': // don't use this line if you want only capital letters
        num = 0;
        break;
    case 'B':
    case 'b': // same as above about 'a'
        num = 1;
        break;
    /* and so on and so on */
    default:
        fprintf(stderr, "WTF?\n");
}
Consider this only if there is absolutely no relationship between the letter and its code. Since there is a clear sequential relationship between the letter and the code in your case, using this is rather silly and going to be awful to maintain, but if you had to encode random characters to random values, this would be the way to avoid writing a zillion if()/else if()/else if()/else statements.
There is a much better way.
In ASCII (www.asciitable.com) you can look up the numerical values of these characters.
'A' is 0x41.
So you can simply subtract 0x41 from them to get the numbers. I don't know C very well, but something like:
int num = 'A' - 0x41;
should work.
In most programming and scripting languages there is a means to get the "ordinal" value of any character. (Think of it as an offset from the beginning of the character set).
Thus you can usually do something like:
for ch in somestring:
    if lowercase(ch):
        n = ord(ch) - ord('a')
    elif uppercase(ch):
        n = ord(ch) - ord('A')
    else:
        n = -1  # Sentinel error value
                # (or raise an exception as appropriate to your programming
                # environment and to the assignment specification)
Of course this wouldn't work for an EBCDIC based system (and might not work for some other exotic character sets). I suppose a reasonable sanity check would be to test whether this function returns monotonically increasing values in the range 0..25 for the strings "abc...xyz" and "ABC...XYZ".
A whole different approach would be to create an associative array (dictionary, table, hash) of your letters and their values (one or two simple loops). Then use that. (Most modern programming languages include support for associative arrays.)
Naturally I'm not "doing your homework." You'll have to do that for yourself. I'm simply explaining that those are the obvious approaches that would be used by any professional programmer. (Okay, an assembly language hack might also just mask out one bit for each byte, too).
Since the char data type is treated similarly to an int data type in C and C++, you could go with something like:
char c = 'A'; // just some character
int urValue = c - 65;
If you are worried about case sensitivity:
#include <ctype.h> // if using C++ #include <cctype>
int urValue = toupper(c) - 65;
Aww, if only you had C++...
For Unicode, here is a definition of how to map characters to values:
#include <map>

typedef std::map<wchar_t, int> WCharValueMap;

WCharValueMap fillMap();                  // forward declaration
WCharValueMap myConversion = fillMap();

WCharValueMap fillMap() {
    WCharValueMap result;
    result[L'A'] = 0;
    result[L'Â'] = 0;
    result[L'B'] = 1;
    result[L'C'] = 2;
    return result;
}
usage
int value = myConversion[L'Â'];
I wrote this bit of code for a project, and I was wondering how naive this approach is.
The benefit here is that it seems to adhere to the standard, and my guess is that the runtime is approx. O(k), where k is the size of the alphabet.
int ctoi(char c)
{
    int index;
    char* alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    c = toupper(c);
    // avoid doing strlen here to juice some efficiency.
    for (index = 0; index != 26; index++)
    {
        if (c == alphabet[index])
        {
            return index;
        }
    }
    return -1;
}
#include <stdio.h>
#include <ctype.h>

int val(char a);

int main()
{
    char r;
    scanf("%c", &r);
    printf("\n%d\n", val(r));
}

int val(char a)
{
    int i = 0;
    char k;
    // count how many letters come before toupper(a), so 'A' -> 0, 'B' -> 1, ...
    for (k = 'A'; k < toupper(a); k++)
        i++;
    return i;
}
