Accented/umlauted characters in C? - c

I'm just learning about C and got an assignment where we have to translate plain text into morse code and back. (I am mostly familiar with Java so bear with me on the terms I use).
To do this, I have an array with the strings for all letters.
char *letters[] = {
".- ", "-... ", "-.-. ", "-.. ", ".", "..-." etc
I wrote a function for returning the position of the desired letter.
int letter_nr(unsigned char c)
{
return c-97;
}
This is working, but the assignment specifications require the handling of the Swedish umlauted letters åäö. The Swedish alphabet is the same as the English with these three letters in the end. I tried checking for these, like so:
int letter_nr(unsigned char c)
{
if (c == 'å')
return 26;
if (c == 'ä')
return 27;
if (c == 'ö')
return 28;
return c-97;
}
Unfortunately, when I tried testing this function, I get the same value for all of these three: 98. Here is my main, testing function:
int main()
{
unsigned char letter;
while(1)
{
printf("Type a letter to get its position: ");
scanf("%c", &letter);
printf("%d\n", letter_nr(letter));
}
return 0;
}
What can I do to resolve this?

The encoding of character constants actually depends on your locale settings.
The safest bet is to use wide characters, and the corresponding functions. You declare the alphabet as const wchar_t* alphabet = L"abcdefghijklmnopqrstuvwxyzäöå", and the individual characters as L'ö';
This small example program works for me (also on a UNIX console with UTF-8) - try it.
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(int argc, char** argv)
{
wint_t letter = L'\0';
setlocale(LC_ALL, ""); /* Initialize locale, to get the correct conversion to/from wchars */
while(1)
{
if(!letter)
printf("Type a letter to get its position: ");
letter = fgetwc(stdin);
if(letter == WEOF) {
putchar('\n');
return 0;
} else if(letter == L'\n' || letter == L'\r') {
letter = L'\0'; /* skip newlines - and print the instruction again*/
} else {
printf("%d\n", letter); /* print the character value, and don't print the instruction again */
}
}
return 0;
}
Example session:
Type a letter to get its position: a
97
Type a letter to get its position: A
65
Type a letter to get its position: Ö
214
Type a letter to get its position: ö
246
Type a letter to get its position: Å
197
Type a letter to get its position: <^D>
I understand that on Windows, this does not work with characters outside the Unicode BMP, but that's not an issue here.

In general, encoding is quite complicated. On the other hand, if you just want a quick-and-dirty solution specific to your compiler/platform, then add something like this to your code:
printf("letter 0x%x is number %d\n", letter, letter_nr(letter));
It will give you the hex values for your umlauts. Then just replace the letters in your if statements with those numbers.
EDIT You say that you are always getting 98, so your scanf got 98 + 97 = 195 = 0xC3 from the console. According to this table, 0xC3 is the start of the UTF-8 sequence for the common LATIN SMALL LETTER ... WITH ... characters in the Latin-1 block. Are you on Mac OS X?
EDIT This is my final call. Quite hackery but it works for me :)
#include <stdio.h>
// scanf for a letter. Return its position in the Morse table.
// Recognises UTF8 for swedish letters.
int letter_nr()
{
unsigned char letter;
// scan for the first time,
scanf("%c", &letter);
if(0xC3 == letter)
{
// we scanf again since this is UTF8 and two byte encoded character will come
scanf("%c", &letter);
//LATIN SMALL LETTER A WITH RING ABOVE = å
if(0xA5 == letter)
return 26;
//LATIN SMALL LETTER A WITH DIAERESIS = ä
if(0xA4 == letter)
return 27;
// LATIN SMALL LETTER O WITH DIAERESIS = ö
if(0xB6 == letter)
return 28;
printf("Unknown letter. 0x%x. ", letter);
return -1;
}
// it seems to be regular ASCII
return letter - 97;
} // letter_nr
int main()
{
while(1)
{
printf("Type a letter to get its position: ");
int val = letter_nr();
if(-1 != val)
printf("Morse code is %d.\n", val);
else
printf("Unknown Morse code.\n");
// strip remaining new line
unsigned char new_line;
scanf("%c", &new_line);
}
return 0;
}

Hmmm ... at first I'd say the "funny" characters are not chars. You cannot pass one of them to a function accepting a char argument and expect it to work.
Try this (add the remaining bits):
char buf[100];
printf("Enter a string with funny characters: ");
fflush(stdout);
fgets(buf, sizeof buf, stdin);
/* now print it, as if it was a sequence of `char`s */
char *p = buf;
while (*p) {
printf("The character '%c' has value %d\n", *p, *p);
p++;
}
Now try the same with wide characters: #include <wchar.h> and replace printf with wprintf, fgets with fgetws, etc ...

Related

Ensuring you're reading a character through scanf

Is there a simple way to make sure you're reading a character through scanf. If it were an integer I'd use a do while loop
do{
printf("enter a number");
fehler = scanf(" %d", &x);
getchar();
} while(fehler!=1);
But I'm not fully sure what to do if the input is meant to be a string. I know the alphabets are stored as ASCII values but the if constraints in the while statement don't seem to be working(unless I'm doing it wrong)
char * temp2;
temp2 = malloc(sizeof(string));
do{
printf("PLease enter a string: ");
scanf(" %s", temp2);
getchar();
} while(temp2 <= 'A' && temp2 <= 'z')
You can't compare a string to a single character. You have to loop through the entire string, checking every character.
#include <ctype.h>
int is_alphabetic(char *str) {
for (int i = 0; str[i]; i++) {
if (!isalpha(str[i])) {
return 0;
}
}
return 1;
}
...
do{
printf("Please enter an alphabetic string: ");
scanf(" %s", temp2);
getchar();
} while(!is_alphabetic(temp2));
You see, printf and scanf work independently. Whatever you store, be it a character or a number, is stored as a number. How it is displayed depends on the conversion you ask printf for.
E.g.: if you store 'a' at a location, the number 97 is stored (in ASCII). If you then print it as a number you get 97, and if you print it as a character you get a.
#include <stdio.h>
int main()
{
int i = 97;
printf("%d \n", i);
printf("%c", i);
return 0;
}
See the results. Further, char, int and long int are just data types that specify the number of bits reserved for the variable.
Execute this program and you'll understand:
#include <stdio.h>
int main()
{
int i;
for (i=97; i <=200 ; i++)
{
printf("%d %c,\t",i,i);
};
return 0;}
This will show you a number printed as a number and then the SAME number printed as a character.
Note that there are no markers in memory recording which type of data is stored. It is simply stored as a number.
scanf is absolutely the wrong tool for this. But if you want to read only alphabetic characters, you can do it easily enough with something like:
char s[32];
if( 1 == scanf(" %31[a-zA-Z]", s) ){ ... }
The %31[a-zA-Z] conversion specifier will match only the literal characters a thru z and A thru Z, and will only consume up to 31 characters of input. You must always use a field width modifier with %s or %[] conversion specifiers to avoid an overflow.

Is there a way to check the uppercase and isalpha case for strings in an array?

while(1)
{
char buff[1000];
printf("Enter the word: ");
fgets(buff, 1000, stdin);
if(!strcmp(buff, "\n"))//empty search then break the loop
{
fprintf(stderr, "Exiting the program\n");
break;
}
int error = 0;
int i = 0;
while(buff[i] != '\0')
{
if(buff[i] >= 33 && buff[i] <= 96)
{
break;
}
error = 1;
i++;
}
if(error == 0)
{
fprintf(stderr, "Please enter something containing only lower-case letters\n");
}
}
I expect the output of hello World to be Please enter something containing only lower-case letters, but I am not getting that error.
If I enter World hello I am getting the expected result which is, it prints the error message.
Is there a way to use isalpha for the whole array?
You should not hard-code letter values but use the actual values. In this problem any letter outside the range 'a' to 'z' is invalid. But it is more portable to use the library functions isalpha() and islower() because the letter values are not guaranteed to be consecutive.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void)
{
while(1) {
char buff[1000];
printf("Enter the word: ");
fgets(buff, sizeof buff, stdin);
if(!strcmp(buff, "\n")) {
fprintf(stderr, "Exiting the program\n");
break;
}
int error = 0;
int i = 0;
while(buff[i] != '\0') {
if(isalpha(buff[i]) && !islower(buff[i])) {
error = 1;
break;
}
i++;
}
if(error == 1) {
fprintf(stderr, "Please enter something containing only lower-case letters\n");
}
}
}
Program session
Enter the word: hello world
Enter the word: Hello world
Please enter something containing only lower-case letters
Enter the word: hello World
Please enter something containing only lower-case letters
Enter the word: hello, world!
Enter the word:
Exiting the program
There are library functions for checking upper and lowercase. They are called isupper and islower. Use them. Although it's uncommon for 'a' to be something other than 97, it may happen. If you mean the letter 'a', then use the character literal 'a' instead of the number 97. Furthermore, the letters aren't even guaranteed to be consecutive, so 'z'-'a' is not guaranteed to evaluate to 25. However, digits are required to be consecutive, so '9'-'0' will always evaluate to 9. But it is much safer to rely on library functions like isalpha and such. I wrote about encoding here: https://stackoverflow.com/a/46890148/6699433
To correct your bug, you need a proper condition. According to your question, it should print the error message if any of the characters is not either a lower case or a space. Furthermore, your code is overly complicated. Here is a solution:
int i = 0;
while(buff[i] != '\0') {
if(!(islower(buff[i]) || isspace(buff[i]))) {
fprintf(stderr, "Please enter something containing only lower-case letters\n");
break;
}
i++;
}
Is there a way to use isalpha for the whole array?
C does not have built in functionality for such things, but you can write your own mapper.
/* Apply function f on each of the elements in str and return false
* if f returns false for any of the elements and true otherwise.
*/
bool string_is_mapper(const char *str, size_t size, int (*f)(int c))
{
for(size_t i=0; i<size && str[i] != '\0'; i++)
if(!f((unsigned char)str[i])) return false;
return true;
}
Now, you can use this mapper like this:
if(string_is_mapper(str, strlen(str), isupper))
puts("All characters in str are upper case");
You can even write your own functions to plugin, as long as they fit this prototype:
int condition(int c);

Converting user input to an array of characters, and filtering letters from other characters?

#include "stdafx.h"
#include "stdlib.h"
#include <ctype.h>
int num = 0;
int i = 0;
int ch = 0;
int letter_index_in_alphabet(int ch) {
if (isalpha(ch) == true) {
char temp_str[2] = { ch };
num = strtol(temp_str, NULL, 36) - 9;
printf("%d is a letter, with %d as its location in the alphabet!", ch, num);
}
else {
return -1;
}
}
int main()
{
char input_str[10];
printf("Please enter a series of up to 10 letters and numbers: \n");
fgets(input_str, 10, stdin);
for (i == 0; i <= 10; i++) {
ch = input_str[i];
letter_index_in_alphabet(ch);
}
return 0;
}
Hello everyone, this is my first post on SOF! The goal of this program is to read characters from the standard input to EOF. For each character, report if it is a letter. If it is a letter, print out its respective index in the alphabet ('a' or 'A' = 1, 'b' or 'B' = 2..etc). I have been searching some other posts on stackoverflow and this has helped me get this far(using fgets and strtol functions). I have no visible syntax errors when I run this code, but after I enter a string of characters (ex: 567gh3fr) the program crashes.
Basically, I am trying to use 'fgets' to bring each character entered into a string with the appropriate index. Once I have that string, I check each index for a letter and if it is, I print the number assigned to that letter of the alphabet.
Any help or insight into why this isn't working as intended is greatly appreciated, Thanks!
You have a few problems.
First, char input_str[10] is only big enough for the user to enter 9 characters, not 10, because you need to allow one character for the null byte that ends a string.
Second, your loop goes too far. For a string with 10 characters, indexes go up to 9, not 10. It also should stop when it gets to the null byte, since the user might not have entered all 9 characters.
To get the position in the alphabet, you can simply subtract the value of A or a from the value of the character. Use tolower() or toupper() to convert the character to the case that you're going to use. Your method works, but it's overly complicated and confusing.
letter_index_in_alphabet() is declared to return int. But when the character is a letter, it doesn't execute a return statement. I'm not sure why it's supposed to return something, since you never use the return value, but I've changed it to return the position (maybe the caller should be the one that prints the message, so the function just does the calculation).
In the for loop, it should be i = 0 to perform an assignment, not i == 0 which is comparison.
You also shouldn't use global variables so much. And system header files should have <> around them, not "".
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int letter_index_in_alphabet(int ch) {
if (isalpha(ch)) {
int num = tolower(ch) - 'a' + 1;
printf("%d is a letter, with %d as its location in the alphabet!\n", ch, num);
return num;
} else {
return -1;
}
}
int main()
{
char input_str[10];
printf("Please enter a series of up to 9 letters and numbers: \n");
fgets(input_str, sizeof(input_str), stdin);
for (int i = 0; input_str[i]; i++) {
letter_index_in_alphabet(input_str[i]);
}
return 0;
}

How to detect space and letters in a Char in C?

Do you know how I can detect space and letters in a CHAR variable?
I need to detect letters or space in a input of numbers:
This what I want to do:
Enter Document Number of 8 numbers:
// i press space and pressed enter
ERROR: please enter the age again: 4fpdpfsg
There's where my code doesn't detect the letters after the 4, and what I want is recognize that there's letters in the input, and then shows only the 4.
int isLetter(char input[]){
int i = 0;
while(input[i]!='\0'){
if((input[i]!=' ') && (input[i]<'a'||input[i]>'z') && (input[i]<'A'||input[i]>'Z'))
return 0;
i++;
}
return 1;
}
The standard C library has various character type testing functions. They are declared in the #include <ctype.h> header.
Unfortunately, the obvious way of using these functions is often wrong. They take an argument of type int which is actually expected to be an unsigned character value (a byte, effectively) in the range 0 to UCHAR_MAX. If you pass in a char value which happens to be negative, undefined behavior ensues, which might work by coincidence, crash or worse yet form a vulnerability similar to heartbleed (possibly worse).
Therefore the cast to (unsigned char) is quite likely necessary in the following:
#include <ctype.h>
/* ... */
char ch;
/* ... */
if (isalpha((unsigned char) ch) || ch == ' ') {
/* ch is an alphabetic character, or a space */
}
Simple character constants (not numeric escaped ones) derived from the C translation time character set have positive values in the execution environment; code which can safely assume that it only manipulates such characters can do without the cast. (For instance, if all the data being manipulated by the program came from string or character literals in the program itself, and all those literals use nothing but the basic C translation time character set.)
That is to say, isalpha('a') is safe; a is in the C translation time character set, and so the value of the character constant 'a' is positive. But say you're working with source code in ISO-8859-1 and have char ch = 'à';. If char is signed, this ch will have a negative value, which is fine according to ISO C because an accented à isn't in the basic C translation character set. The expression isalpha(ch); then passes a negative value to the isalpha function, which is wrong.
Try:
if (!((input[i] == ' ') || (input[i] >= 'a' && input[i] <= 'z') || (input[i] >= 'A' && input[i] <= 'Z')))
or, better:
#include <ctype.h>
if (!((input[i] == ' ') || isalpha(input[i])))
You could use sscanf(input,"%d%n",&number,&nrOfDigits), which reads an integral value into number and additionally stores in nrOfDigits the position of the first character that is not part of the number. With this information, you can then decide what to do, e.g. nrOfDigits < 8 would indicate that either the input was shorter than 8 characters, or that it contains fewer than 8 consecutive digits. See the sample code below.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
int isLetter(char input[]){
int nrOfDigits=0;
int number;
int scannedElems = sscanf(input,"%d%n",&number,&nrOfDigits);
if (scannedElems != 1) {// number could not be read (sscanf returns EOF for empty input)
printf ("No number read.\n");
return 0;
}
else {
char c = input[nrOfDigits];
int isAlpha = isalpha(c);
printf("input %s leads to number %d with %d digit(s); first characer after the digits is '%c', (isalpha=%d)\n", input, number, nrOfDigits, c, isAlpha);
return number;
}
}
int main(){
isLetter("4fpdpfsg"); // input 4fpdpfsg leads to number 4 with 1 digit(s); first characer after the digits is 'f', (isalpha=1)
isLetter("afpdpfsg"); // No number read.
isLetter("12345678"); // input 12345678 leads to number 12345678 with 8 digit(s); first characer after the digits is '�', (isalpha=0)
return 0;
}
BTW: you could implement a similar logic with strtoul as well.
Hey guys, I finally found a way to check that the input consists of exactly 8 digits. Here's the code:
char* InputDni(char dni[])
{
int sizeletter;
int i;
fflush(stdin);
gets(dni);
// 8 is the size of DNI in argentina
while((isLetter(dni)) || (strlen(dni)!=8))
{
printf("ERROR: enter again the DNI: ");
fflush(stdin);
gets(dni);
}
sizeletter=strlen(dni);
for(i=0 ;i<sizeletter; i++)
{
while(isalpha(dni[i]))
{
printf("ERROR: enter again the DNI: ");
fflush(stdin);
gets(dni);
i++;
}
}
return dni;
}
//isLetter
int isLetter(char input[])
{
int i = 0;
int sizeletter;
int flag=1;
sizeletter=strlen(input);
for(i=0;i<sizeletter;i++)
{
if((input[i]!=' ') && (input[i]<'a'||input[i]>'z') && (input[i]<'A'||input[i]>'Z'))
{
flag=0;
}
}
return flag;
}

Inverting Capitalization of letter in C

ok so i was asked to invert capitalization in C using a function inver_caps. My function works and prints the new letter correctly, but im having trouble as to why in the main it will not print correctly?
Any ideas?
void invert_caps (char letter);
int main(void){
char lettermain;
printf("Enter a letter: ");
scanf(" %c", &lettermain);
invert_caps(lettermain);
printf("The invert of the letter is %c \n", lettermain);
system("PAUSE");
return 0;
}
void invert_caps (char letter){
printf("\nletter is %d\n",letter); /*this was used for debugging*/
if ((int)letter >=65 && (int)letter<=90){
letter = (int)letter+32;
}else{
letter = (int)letter - 32;
}
printf("\nnew letter is %d or %c\n",letter, letter); /*this was used for debugging*/
return letter;
}
You are passing by value (the value is copied), so changes made inside the called function are not reflected in the caller. Either:
(1) pass by pointer (pass address), Or
(2) simply return converted value from function.
Side note: Don't use ascii values in code, just use char constants to keep code readable (you don't have to remember ascii value).
For example I write (2) solution for you (I believe that will be easy for you presently, avoiding pointer at this stage).
to understand the code read comments:
char invert_caps (char letter){
// ^ added return type, its not void now
if ( letter >= 'A' && letter<= 'Z'){ // not using ASCII value but char Constants
letter = letter + ('a' - 'A'); // its more readable
// Note 32 = 'a' - 'A' that is 97 - 65
}
else {
if ( letter >= 'a' && letter<= 'z'){// Add this case, to be safe is good practice
letter = letter - ('a' - 'A');
}
else
letter = '\0'; // if letter is neither Upper or lower alphabetic case
} // then convert it into the nul character (exception case)
return letter; // added a line
}
// I removed unnecessary typecasts and debug statements
In main() you need to call it like this, at the same place as before:
lettermain = invert_caps(lettermain);
// ^ return value assigned to variable `lettermain`
return is a key word in C. The return statement terminates the execution of a function and returns control to the calling function. A return statement can also return a value to the calling function and from function invert_caps() we are returning converted value.
Here is a small routine using library functions to detect and change letter case.
#include <ctype.h>
int invert_case(int c)
{
return islower(c) ? toupper(c) : isupper(c) ? tolower(c) : c;
}
If you don't like the nested ?: ternary operators, here is the same logic using if.
int invert_case(int c)
{
if (islower(c)) {
return toupper(c);
}
if (isupper(c)) {
return tolower(c);
}
return c;
}
Note: this can be made even simpler, see Dave's comment below.
The basic reason you are not getting the inversion in main is that you are only passing a copy of lettermain to invert_caps().
Instead, do the following:
char invert_caps (char letter);
int main(void){
char lettermain;
printf("\n=========Question 8=========\n");
printf("Enter a letter: ");
scanf(" %c", &lettermain);
lettermain=invert_caps(lettermain);
printf("The invert of the letter is %c \n", lettermain);
system("PAUSE");
return 0;
}
char invert_caps (char letter){
printf("\nletter is %d\n",letter);
if ((int)letter >=65 && (int)letter<=90){
letter = (int)letter+32;
}else{
letter = (int)letter - 32;
}
printf("\nnew letter is %d or %c\n",letter, letter); /*this was used for debugging*/
return letter;
}
Because when you pass the lettermain variable from main, it's copied.
You need to either return the new character, or pass the variable by reference.

Resources