I'm trying to write code that converts letters into numbers. For example:
A ==> 0
B ==> 1
C ==> 2
and so on. I'm thinking of writing 26 if statements. I'm wondering if there's a better way to do this...
Thank you!
This is a way that I feel is better than the switch method, and yet is standards compliant (does not assume ASCII):
#include <string.h>
#include <ctype.h>
/* returns -1 if c is not an alphabetic character */
int c_to_n(char c)
{
    int n = -1;
    static const char * const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    /* strchr also matches the terminating '\0', so reject it explicitly */
    const char *p = (c != '\0') ? strchr(alphabet, toupper((unsigned char)c)) : NULL;
    if (p)
    {
        n = (int)(p - alphabet);
    }
    return n;
}
If you need to deal with upper-case and lower-case then you may want to do something like:
if (letter >= 'A' && letter <= 'Z')
num = letter - 'A';
else if (letter >= 'a' && letter <= 'z')
num = letter - 'a';
If you want to display these as characters, adding '0' turns the number into the matching digit character, but only for values 0 through 9 (for larger values, print the integer with printf("%d", num) instead):
asciinumber = num + '0';
The C standard does not guarantee that the characters of the alphabet will be numbered sequentially. Hence, portable code cannot assume, for example, that 'B'-'A' is equal to 1.
The relevant section of the C specification is section 5.2.1 which describes the character sets:
3 Both the basic source and basic execution character sets shall have
the following members: the 26 uppercase letters of the Latin
alphabet
ABCDEFGHIJKLM
NOPQRSTUVWXYZ
the 26 lowercase letters of the Latin alphabet
abcdefghijklm
nopqrstuvwxyz
the 10 decimal digits
0123456789
the following 29 graphic characters
!"#%&'()*+,-./:
;<=>?[\]^_{|}~
the space character, and control characters representing horizontal
tab, vertical tab, and form feed. The
representation of each member of the source and execution basic
character sets shall fit in a byte. In both the source and execution
basic character sets, the value of each character after 0 in the above
list of decimal digits shall be one greater than the value of the
previous.
So the specification only guarantees that the digits will have sequential encodings. There is absolutely no restriction on how the alphabetic characters are encoded.
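Since the digits are the one run of characters the standard does pin down, the subtraction idiom is portable for them. Here is a minimal sketch of that; the function name digit_value is made up for illustration and does not come from the standard or from the posts above:

```c
/* Portable on any conforming implementation: C guarantees that the
 * characters '0'..'9' have consecutive values.
 * digit_value is a hypothetical name used only for this example. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';   /* '0' -> 0, '1' -> 1, ..., '9' -> 9 */
    return -1;            /* not a decimal digit */
}
```

The same subtraction applied to letters would rely on an encoding property that the quoted section does not promise.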
Fortunately, there is an easy and efficient way to convert A to 0, B to 1, etc. Here's the code
char letter = 'E'; // could be any upper or lower case letter
char str[2] = { letter }; // make a string out of the letter
int num = strtol( str, NULL, 36 ) - 10; // convert the letter to a number
The reason this works can be found in the man page for strtol which states:
(In bases above 10, the letter 'A' in either upper or lower case
represents 10, 'B' represents 11, and so forth, with 'Z' representing
35.)
So passing 36 to strtol as the base tells strtol to convert 'A' or 'a' to 10, 'B' or 'b' to 11, and so on. All you need to do is subtract 10 to get the final answer.
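The same idea can be wrapped in a small function that also rejects non-letters. This is only a sketch: the name letter_index_base36 is invented here, and the range check is an addition, needed because in base 36 the digits '0'..'9' also convert successfully (to 0..9), while anything strtol cannot convert yields 0.

```c
#include <stdlib.h>

/* Maps 'A'/'a' -> 0, 'B'/'b' -> 1, ..., 'Z'/'z' -> 25 via strtol in base 36.
 * Returns -1 for anything that is not a letter.
 * letter_index_base36 is a hypothetical name used only for this example. */
int letter_index_base36(char letter)
{
    char str[2] = { letter, '\0' };      /* one-character string */
    long value = strtol(str, NULL, 36);  /* letters convert to 10..35 */

    if (value < 10 || value > 35)        /* digits give 0..9, failures give 0 */
        return -1;
    return (int)(value - 10);
}
```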
Another, far worse (but still better than 26 if statements) alternative is to use switch/case:
switch(letter)
{
case 'A':
case 'a': // don't use this line if you want only capital letters
num = 0;
break;
case 'B':
case 'b': // same as above about 'a'
num = 1;
break;
/* and so on and so on */
default:
fprintf(stderr, "WTF?\n");
}
Consider this only if there is absolutely no relationship between the letter and its code. Since there is a clear sequential relationship between the letter and the code in your case, using this is rather silly and going to be awful to maintain, but if you had to encode random characters to random values, this would be the way to avoid writing a zillion if()/else if()/else if()/else statements.
There is a much better way.
In ASCII (www.asciitable.com) you can know the numerical values of these characters.
'A' is 0x41.
So you can simply subtract 0x41 from the character to get the number. I don't know C very well, but something like:
int num = 'A' - 0x41;
should work.
In most programming and scripting languages there is a means to get the "ordinal" value of any character. (Think of it as an offset from the beginning of the character set).
Thus you can usually do something like:
for ch in somestring:
if lowercase(ch):
n = ord(ch) - ord ('a')
elif uppercase(ch):
n = ord(ch) - ord('A')
else:
n = -1 # Sentinel error value
# (or raise an exception as appropriate to your programming
# environment and to the assignment specification)
Of course this wouldn't work on an EBCDIC-based system (and might not work for some other exotic character sets). I suppose a reasonable sanity check would be to test whether this function returns monotonically increasing values in the range 0..25 for the strings "abc...xyz" and "ABC...XYZ".
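In C, that sanity check could look roughly like the sketch below. The function name letters_are_contiguous is made up for this example; it simply verifies that subtracting 'a' (or 'A') from each letter gives its position in the alphabet.

```c
/* Returns 1 if both 'a'..'z' and 'A'..'Z' are contiguous in the execution
 * character set (true for ASCII, false for EBCDIC), otherwise 0.
 * letters_are_contiguous is a hypothetical name used only for this example. */
int letters_are_contiguous(void)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    for (int i = 0; i < 26; i++) {
        if (lower[i] - 'a' != i || upper[i] - 'A' != i)
            return 0;   /* a gap: the subtraction trick is not safe here */
    }
    return 1;
}
```

A program could call this once at startup and fall back to a table-based mapping when it returns 0.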
A whole different approach would be to create an associative array (dictionary, table, hash) of your letters and their values (one or two simple loops). Then use that. (Most modern programming languages include support for associative arrays.)
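C has no built-in dictionary, but for single characters a plain array indexed by the character value does the same job. The following is a sketch of that idea, not anything prescribed above; letter_table, build_letter_table and letter_value are names invented for illustration.

```c
#include <limits.h>

/* One slot per possible unsigned char value; -1 means "not a letter".
 * All names here are hypothetical and exist only for this example. */
static int letter_table[UCHAR_MAX + 1];
static int table_ready = 0;

static void build_letter_table(void)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";

    for (int i = 0; i <= UCHAR_MAX; i++)
        letter_table[i] = -1;
    for (int i = 0; upper[i] != '\0'; i++) {
        letter_table[(unsigned char)upper[i]] = i;  /* 'A' -> 0, ... */
        letter_table[(unsigned char)lower[i]] = i;  /* 'a' -> 0, ... */
    }
    table_ready = 1;
}

/* Looks a character up in the table, building the table on first use. */
int letter_value(char c)
{
    if (!table_ready)
        build_letter_table();
    return letter_table[(unsigned char)c];
}
```

Because the table is filled from the alphabet strings rather than from arithmetic on character codes, it makes no assumption about how the letters are encoded.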
Naturally I'm not "doing your homework." You'll have to do that for yourself. I'm simply explaining that those are the obvious approaches that would be used by any professional programmer. (Okay, an assembly language hack might also just mask out one bit for each byte, too).
Since the char data type is treated much like an int in C and C++, you could go with something like:
char c = 'A'; // just some character
int urValue = c - 65; // 65 is 'A' in ASCII
If you are worried about case sensitivity:
#include <ctype.h> // if using C++ #include <cctype>
int urValue = toupper((unsigned char)c) - 65;
Aww if you had C++
For unicode
definition of how to map characters to values
#include <map>
typedef std::map<wchar_t, int> WCharValueMap;
// fillMap must be declared before it is used to initialize myConversion
WCharValueMap fillMap() {
    WCharValueMap result;
    result[L'A']=0;
    result[L'Â']=0;
    result[L'B']=1;
    result[L'C']=2;
    return result;
}
WCharValueMap myConversion = fillMap();
usage
int value = myConversion[L'Â'];
I wrote this bit of code for a project, and I was wondering how naive this approach was.
The benefit here is that it seems to conform to the standard, and my guess is that the runtime is approx. O(k) where k is the size of the alphabet.
int ctoi(char c)
{
int index;
const char* alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = (char)toupper((unsigned char)c);
// avoid doing strlen here to juice some efficiency.
for(index = 0; index != 26; index++)
{
if(c == alphabet[index])
{
return index;
}
}
return -1;
}
#include<stdio.h>
#include<ctype.h>
int val(char a);
int main()
{
char r;
scanf("%c",&r);
printf("\n%d\n",val(r));
}
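/* Counts the letters from 'A' up to and including toupper(a), so 'A' gives 1,
   'B' gives 2, and so on; subtract 1 for a zero-based index. Assumes the
   letters are contiguous in the character set (true for ASCII, not EBCDIC). */
int val(char a)
{
    int i=0;
    char k;
    for(k='A';k<=toupper((unsigned char)a);k++)
        i++;
    return i;
}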
I wrote a program that counts and prints the number of occurrences of each character in a string, but it prints a garbage value when I use fgets(), whereas with gets() it does not.
Here is my code:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main() {
char c[1005];
fgets(c, 1005, stdin);
int cnt[26] = {0};
for (int i = 0; i < strlen(c); i++) {
cnt[c[i] - 'a']++;
}
for (int i = 0; i < strlen(c); i++) {
if(cnt[c[i]-'a'] != 0) {
printf("%c %d\n", c[i], cnt[c[i] - 'a']);
cnt[c[i] - 'a'] = 0;
}
}
return 0;
}
This is what I get when I use fgets():
baaaabca
b 2
a 5
c 1
32767
--------------------------------
Process exited after 8.61 seconds with return value 0
Press any key to continue . . . _
I fixed it by using gets() and got the correct result, but I still don't understand why fgets() gives the wrong result.
Hurray! So, the most important reason your code is failing is that your code does not observe the following inviolable advice:
Always sanitize your inputs
What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:
Exploits of a Mom #xkcd.com
It is always worth reading the explanation even if you get it already #explainxkcd.com
medium.com wrote an article about “How Little Bobby Tables Ruined the Internet”
Heck, Bobby’s even got his own website — bobby-tables.com
Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.
For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” #makeuseof.com — a good number of which were caused by failure to process bad input!
Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!
Where your code went wrong
Using gets() is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?
But it does remove the Enter key from the end of the user’s input!
fgets(), in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of text by not removing that Enter key.
Hence, assuming the user types: b a n a n a Enter
gets() returns the string "banana"
fgets() returns the string "banana\n"
That newline character '\n' (what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!
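If you only want the gets()-style string back, one common idiom is to cut the buffer at the first newline after calling fgets(). This is a side sketch, not the fix this answer goes on to build; the helper name strip_newline is made up for illustration.

```c
#include <string.h>

/* Cuts s at its first '\n', if there is one. strcspn returns the length of
 * the leading part of s that contains no '\n', which is the whole string
 * when no newline is present, so that case is a harmless no-op.
 * strip_newline is a hypothetical name used only for this example. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

After fgets(c, 1005, stdin), calling strip_newline(c) would leave "banana" rather than "banana\n" in the buffer.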
The Fix
The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:
// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";
Now we want to create a mapping from each letter in letters[] to an array of integers (its name doesn’t matter, but we’re calling it count[]). Let’s wrap that up in a little function:
// Here is our mapping of letters[] ←→ integers[]
// • supply a valid input → get an integer unique to that specific input
// • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
static int fooey; // number of invalid inputs
static int count[sizeof(letters)] = {0}; // numbers of each valid input 'a'..'z'
const char * p = strchr(letters, c); // find the valid input, else NULL
if (p) {
int index = p - letters; // 'a'=0, 'b'=1, ... (same order as in letters[])
return &count[index]; // VALID INPUT → the corresponding integer in count[]
}
else return &fooey; // INVALID INPUT → returns a dummy integer
}
For the more astute among you, this is rather verbose: we can totally get rid of those fooey and index variables.
“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”
Easy. Just check that your character is in range:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
return &fooey;
}
“But EBCDIC...!”
Fine. The following will work with both EBCDIC and ASCII:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
return &fooey;
}
You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'. Prove me wrong.
Back to main()
Before we forget, stick the required magic at the top of your program:
#include <stdio.h>
#include <string.h>
Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.
int main() {
// Ask for and get user input
char s[1005];
printf("s? ");
fgets(s, 1005, stdin);
// Histogram the input
for (int i = 0; i < strlen(s); i++) {
*histogram(s[i]) += 1;
}
// Print out the histogram, not printing zeros
for (int i = 0; i < strlen(letters); i++) {
if (*histogram(letters[i])) {
printf("%c %d\n", letters[i], *histogram(letters[i]));
}
}
return 0;
}
We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[] array! Win-win!
s? a - ba na na !
a 4
b 1
n 2
But wait, there’s more!
We can totally reuse our histogram. Check out this little function:
// Reset the histogram to all zeros
//
void clear_histogram(void) {
for (const char * p = letters; *p; p++)
*histogram(*p) = 0;
}
All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!
Another way you could handle input is to transform things into acceptable values. For example, you can use tolower() to convert any majuscule letters to your histogram’s input set.
s? ba na NA!
a 3
b 1
n 2
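A rough sketch of that tolower() variant is below. It uses a plain caller-supplied array instead of the histogram() function above, and the name count_letters is made up for this example.

```c
#include <ctype.h>
#include <string.h>

/* Adds the number of occurrences of each letter in s to counts[0..25],
 * treating upper and lower case as the same letter and ignoring every
 * other character. count_letters is a hypothetical name for this example. */
void count_letters(const char *s, int counts[26])
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";

    for (; *s != '\0'; s++) {
        int lower = tolower((unsigned char)*s);     /* 'N' -> 'n' */
        const char *p = strchr(letters, lower);     /* NULL if not a letter */
        if (p != NULL)
            counts[p - letters]++;
    }
}
```

With counts zeroed beforehand, the input "ba na NA!" gives 3 for 'a', 1 for 'b' and 2 for 'n', matching the run shown above.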
But I digress again...
Hang in there!
I have the following problem:
I would like to implement a Caesar cipher. It mostly works, but when I reach the end of the alphabet the result goes beyond the alphabet, which I assume is due to the ASCII values.
For example:
If I insert a k and use the key 35, I get an H, but it should wrap around within the lowercase letters and produce a b.
It also sometimes produces a punctuation mark or something else like < which I do not want.
The code responsible for the encryption is
encripted_text = (plain_text + key - 97)%26 +97;
Am I missing something needed to make it wrap around and stay within the alphabet?
Example run of program:
char plain_text = 'k';
int k = 35;
char encripted_text = '\0';
encripted_text = (plain_text + key - 97)%26 + 97;
printf("%c", encripted_text);
Thanks for your help.
Assuming that encripted_text and plain_text are both char variables (1 byte), you have to choose a formula that yields a valid output character even when the result wraps around.
How to do that? It depends on which characters are valid for you! In some cases you can simply rely on how characters are mapped in ASCII, but I want to suggest a general solution that you can adapt to your specific requirements.
This general solution consists of defining an array with the accepted alphabet and translating the input character to one of the characters in it.
For example:
char alphabet[] = { 'a','A','b','B','c','C' }; //just an example
char input_char = 'k';
int key = 35;
char encripted_char = '\0';
encripted_char = alphabet[(input_char + key - 97)%(sizeof(alphabet))];
printf("%c", encripted_char );
Summarizing: the formula doesn't calculate directly the encrypted char, but the index of the accepted alphabet array normalized using % operator.
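A variation on the same idea is sketched below. Instead of computing the starting index with input_char - 97, it looks the character up in the alphabet itself, so the alphabet can be in any order and contain any characters. The function name shift_in_alphabet and the choice to leave unknown characters unchanged are assumptions made for this example.

```c
#include <string.h>

/* Shifts c by key positions within alphabet, wrapping around at both ends.
 * Characters that are not in alphabet are returned unchanged.
 * shift_in_alphabet is a hypothetical name used only for this example. */
char shift_in_alphabet(char c, int key, const char *alphabet)
{
    /* strchr would also match the terminating '\0', so exclude it */
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    if (p == NULL)
        return c;

    long len = (long)strlen(alphabet);
    long index = (long)(p - alphabet);
    /* the extra "+ len) % len" keeps the result non-negative for negative keys */
    long shifted = ((index + key) % len + len) % len;
    return alphabet[shifted];
}
```

With the lowercase alphabet, 'k' and a key of 35 land on index (10 + 35) % 26 = 19, which is 't'.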
I got the right output with the same logic, @nad34. In fact, I correctly got the output as 't' for 'k'. It shouldn't and won't give a 'b'.
Your code has the right logic, except for a few slight errors.
I don't know why you're using a string here, but since you are anyway, this -> char plain_text[] = 'k'; should instead be char plain_text[] = "k"; ==> Note the double quotes.
int k = 35; should be int key = 35;, since you have used the variable name key and not k.
Logicwise it is right, and it will give you the right output.
You can check out the execution of the same code here.
I'm a Python programmer who is doing a project in C. I need to map letters to the USB sendcodes corresponding to them. In Python I would hardcode a dictionary. In C I'm using a giant switch statement. Is there a better way?
switch(c){
case 'a':
return "x04";
break;
case 'b':
return "x05";
break;
case 'c':
return "x06";
break;
For my part, I'd create a lookup table:
static char sendcodes[256][5]; // declared outside of any function;
// "static" means it's only visible
// within the current source file
void init_sendcodes( void )
{
for ( unsigned char c = 'a'; c <= 'z'; c++ )
    sprintf( sendcodes[c], "0x%02x", (unsigned)(c - 'a' + 4) ); // "0x04" for 'a', "0x05" for 'b', ...
}
char *mapSendcode( char c )
{
  return sendcodes[(unsigned char)c]; // cast so a negative char cannot index before the array
}
Note that this code assumes an encoding where 'a' through 'z' are contiguous (ASCII or UTF-8). If they're not, well, you'll have to use multiple loops.
When you're done, sendcodes['a'] contains the string "0x04", sendcodes['b'] contains "0x05", etc. So while it takes some work to initialize the table, you only have to do that once at the beginning of the program - after that it's just an array lookup.
Assuming your system is using something like ASCII or UTF-8 character encoding, where the Latin letters a-z are sequential, you can create an array of your values:
char *sendcodes[] = { "0x04", "0x05", "0x06", ... };
Then you would index it by subtracting 'a' from the letter in question, giving you an index from 0 to 25:
return sendcodes[c - 'a'];
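Before indexing with c - 'a' it is worth checking that c really is a lowercase letter, otherwise the subscript can fall outside the array. The sketch below does that check and returns the code as a number rather than a string. It assumes the codes keep counting up from 0x04 for 'a', as the first three cases in the question suggest; the function name usb_sendcode is made up for illustration.

```c
#include <string.h>

/* Returns the send code for a lowercase letter, or -1 for anything else.
 * ASSUMPTION: codes run sequentially from 0x04 ('a') upward, extrapolated
 * from the three cases shown in the question.
 * usb_sendcode is a hypothetical name used only for this example. */
int usb_sendcode(char c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    /* strchr would also match the terminating '\0', so exclude it */
    const char *p = (c != '\0') ? strchr(letters, c) : NULL;

    if (p == NULL)
        return -1;
    return 0x04 + (int)(p - letters);
}
```

Looking the letter up in a string instead of subtracting 'a' also removes the dependence on the letters being sequential in the character set.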
You could try using parallel arrays, as long as you have the same amount of data in both of the arrays. Just loop through the first array and use the variable i as an index into the second array:
char array1[] = {'a', 'b', 'c'};
char *array2[] = {"x04", "x05", "x06"};
for (int i=0; i<sizeof(array1)/sizeof(array1[0]); i++)
{
printf("%s\n", array2[i]);
}
I'm writing code in C that receives a string as input and advances each letter by 3 in the alphabet. E.g. if the user types "abc", the program should return "def". The problem is that if the user types 'z', for example, the program returns some other character instead of what I want (in this case the letter 'c'). My current algorithm includes this if statement:
if ((text[i]>='a' && text[i]<='w')||(text[i]>='A' && text[i]<='W'))
text[i] = (text[i]) + 3;
But this forces me to write all these lines:
else if (text[i]=='x') text[i]='a';
else if (text[i]=='X') text[i]='A';
else if (text[i]=='y') text[i]='b';
else if (text[i]=='Y') text[i]='B';
else if (text[i]=='z') text[i]='c';
else if (text[i]=='Z') text[i]='C';
How can I optimize my code?
Your problem can be solved with simple arithmetic. Each char value corresponds to a character; in ASCII, the letters 'A'-'Z' and 'a'-'z' occupy 65-90 and 97-122 respectively, so there are two ranges to handle. Use a standard library function to check whether the character falls in the upper-case or lower-case range, and set the base of the range accordingly ('A' or 'a'). Then compute the offset of the character from that base and add 3 to it. Each range holds 26 characters, so taking the result modulo 26 with the % operator makes the range circular. Finally, add the resulting offset back to the base to get the desired character.
#include <ctype.h>
...
...
...
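char c = text[i];
if (isalpha((unsigned char)c)) { // leave digits, spaces and punctuation unchanged
    char base = isupper((unsigned char)c) ? 'A' : 'a';
    text[i] = base + (((c - base) + 3) % 26); // New Desired Character
}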
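Put together as a function over a whole string, the base-plus-offset idea might look like the sketch below. It assumes an ASCII-like encoding where each case of the alphabet is contiguous, and it also accepts negative shifts so the same function can undo the cipher. The name caesar_shift is made up for illustration.

```c
#include <stddef.h>

/* Shifts every letter in text by shift positions, wrapping within its own
 * case; all other characters are left alone.
 * ASSUMPTION: 'a'..'z' and 'A'..'Z' are contiguous (ASCII-like encoding).
 * caesar_shift is a hypothetical name used only for this example. */
void caesar_shift(char *text, int shift)
{
    int s = ((shift % 26) + 26) % 26;   /* normalize to 0..25, even if negative */

    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        if (c >= 'a' && c <= 'z')
            text[i] = (char)('a' + (c - 'a' + s) % 26);
        else if (c >= 'A' && c <= 'Z')
            text[i] = (char)('A' + (c - 'A' + s) % 26);
    }
}
```

Calling it with 3 turns "xyz" into "abc", and calling it again with -3 restores the original text.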
I am ciphering like the following but don't know how to prevent the capitals going into other symbols if they are shifted out of range, similarly the lowercase go out of range of the lowercase letters. How can I make them go round in a circle and stop overflow? Thanks
int size = strlen(plain_text);
int arrayelement = 0;
for(arrayelement = 0; arrayelement < size; arrayelement++)
{
if (islower(plain_text[arrayelement]))
{
ciphered_text[arrayelement] = (int)(plain_text[arrayelement] + shiftkey);
}
else if (isupper(plain_text[arrayelement]))
{
ciphered_text[arrayelement] = (int)(plain_text[arrayelement] + shiftkey);
}
}
ciphered_text[size] = '\0';
printf("%s", ciphered_text);
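I guess you use a type like char, so an easy way to avoid overflow is:
int tmp_ciphered = (my_char + shift) % 0xff;
char ciphered = (char)(tmp_ciphered);
Then the value wraps around instead of overflowing; this forms a ring. Note that this only keeps the value within a byte; keeping it within 'a'..'z' still requires working modulo 26.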
This duplicates (almost exactly) c++ simple Caesar cipher algorithm.
Note that I don't agree with the accepted answer on that post. Basically you have to map the characters back into the range using something like ((c-'a'+shift) % 26) + 'a'. However, that assumes your characters are in 'a'..'z'. It might be safer to use c >= 'a' && c <= 'z' instead of islower, as I'm not sure how the locale will come into play on non-English systems. The same goes for isupper and the other range. Finally, you need an else clause to handle the case where the char is in neither range.
The only truly portable way to do this involves building a lookup table for the input domain, and manually building the chars based on non-linear-assumptions.
Even for the restricted domain of ['a'..'z','A'..'Z'], assuming 'A'..'Z' is contiguous is not defined by the language standard, and is provably not always the case. For any naysayers that think otherwise, I direct you to ordinal positions of characters in the chart at this link, paying close attention to the dead-zones in the middle of the assumed sequences. If you think "Nobody uses EBCDIC anymore", let me assure you both AS/400 and OS/390 are alive and well (and probably processing your US taxes right now, as the IRS is one of IBM's biggest customers).
In fact, the C standard is pretty explicit about this:
C99-5.2.1.3 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
Nowhere is there even a mention of defined ordering or even implied ordering on any other part of the character sets. In fact, '0'..'9' has one other unique attribute: they are the only characters guaranteed to be unaffected by locale changes.
So rather than assume a linear continuation exists for characters while thumbing our noses at the suspicious silence of the standard, let us define our own, hard map. I'll not inline the code here like I normally do; if you're still with me you're genuinely interested in knowing and will likely read and critique the code below. But I will describe in summary how it works:
1. Static-declare two alphabets, double in length (A..ZA..Z,a..za..z).
2. Declare two arrays (encrypt and decrypt) large enough to hold (1<<CHAR_BIT) entries.
3. Fully initialize both arrays with values corresponding to their indexes. Ex: a[0]=0,a[1]=1,...
4. Fill each location in the encrypt-array that is part of our alphabets from (1) with the proper value corresponding to the shift width. Ex: a['a'] = 'f' for a ROT5.
5. Mirror (4) in the decrypt-array by working backward from the tail of the alphabet, applying the opposite shift direction. Ex: a['f'] = 'a';
You can now use the encryption array as a simple table to translate input text to cipher text:
enc-char = encrypt[ dec-char ];
dec-char = decrypt[ enc-char ];
If you think it seems like a ton of work just to get source-level platform independence, you're absolutely right. But you would be amazed at the #ifdef #endif hell that people try to pass off as "multi-platform". The core goal of platform-independent code is to not only define common source, but define behavior as well. No matter what the platform, the concepts above will work. (and not a #ifdef in sight).
Thanks for taking the time to read this fiasco. Such a seemingly simple problem...
Sample main.cpp
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>
// global tables for encoding. must call init_tables() before using
static char xlat_enc[1 << CHAR_BIT];
static char xlat_dec[1 << CHAR_BIT];
void init_tables(unsigned shift)
{
// our rotation alphabets
static char ucase[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
static char lcase[] = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
int i=0;
// ensure shift is below our maximum shift
shift %= 26;
// prime our table
for (;i<(1 << CHAR_BIT);i++)
xlat_enc[i] = xlat_dec[i] = i;
// apply shift to our xlat tables, both enc and dec.
for (i=0;i<(sizeof(ucase)+1)/2;i++)
{
    xlat_enc[ lcase[i] ] = lcase[i+shift];
    xlat_enc[ ucase[i] ] = ucase[i+shift];
    // sizeof counts the terminating '\0', so the last letter sits at sizeof - 2
    xlat_dec[ lcase[sizeof(lcase) - i - 2] ] = lcase[sizeof(lcase) - i - 2 - shift];
    xlat_dec[ ucase[sizeof(ucase) - i - 2] ] = ucase[sizeof(ucase) - i - 2 - shift];
}
}
// main entrypoint
int main(int argc, char *argv[])
{
// using a shift of 13 for our sample
const int shift = 13;
// initialize the tables
init_tables(shift);
// now drop the message to the console
char plain[] = "The quick brown fox jumps over the lazy dog.";
char *p = plain;
for (;*p; fputc(xlat_enc[*p++], stdout));
fputc('\n', stdout);
char cipher[] = "Gur dhvpx oebja sbk whzcf bire gur ynml qbt.";
p = cipher;
for (;*p; fputc(xlat_dec[*p++], stdout));
fputc('\n', stdout);
return EXIT_SUCCESS;
}
Output
Gur dhvpx oebja sbk whzcf bire gur ynml qbt.
The quick brown fox jumps over the lazy dog.
You can implement it literally:
"if they are shifted out of range":
if (ciphered_text[arrayelement] > 'z')
"make them go round in a circle and stop overflow":
ciphered_text[arrayelement] -= 26;
In your context:
if (plain_text[arrayelement] >= 'a' && plain_text[arrayelement] <= 'z')
{
    // compute in an int so the sum cannot overflow a signed char
    int shifted = plain_text[arrayelement] + shiftkey;
    if (shifted > 'z')
        shifted -= 26;
    ciphered_text[arrayelement] = (char)shifted;
}
(assuming you work with English text in ASCII encoding, and shiftkey is in the range 1...25, like it should be)