Iterating over string/strlen with umlauted characters


This is a follow-up to my previous question. I succeeded in implementing the algorithm for checking umlauted characters. The next problem comes from iterating over all characters in a string. I do this like so:
int main()
{
char* str = "Hej du kalleåäö";
printf("length of str: %zu", strlen(str));
for (int i = 0; i < strlen(str); i++)
{
printf("%s ", to_morse(str[i]));
}
putchar('\n');
return 0;
}
The problem is that, because of the umlauted characters, it prints 18 instead of 15, and the to_morse function also fails (it ignores these characters). The to_morse function accepts an unsigned char as a parameter. What would be the best way to solve this? I know I can check for the umlauted character here instead of in the letterNr function, but I don't know if that would be a pretty/logical solution.

Normally, you'd store the string in a wchar_t array and use something like wcslen to get its length - that would give you the number of characters as opposed to the number of bytes you stored.
You really shouldn't be implementing UTF or Unicode or whatever multibyte character handling yourself - there are libraries for that sort of thing.

On OS X, Cocoa is a solution - note the use of "%C" in NSLog - that's an unichar (16-bit Unicode character):
#import <Cocoa/Cocoa.h>
int main()
{
NSAutoreleasePool * pool = [NSAutoreleasePool new];
NSString * input = @"Hej du kalleåäö";
printf("length of str: %lu", (unsigned long)[input length]);
int i=0;
for (i = 0; i < [input length]; i++)
{
NSLog(@"%C", [input characterAtIndex:i]);
}
[pool release];
}

You could do something like
for (int i = 0; str[i]!='\0'; ++i){
//do something with str[i]
}
Strings in C are terminated with '\0'. So it is possible to check for the end of the string like that.

EDIT: What locale are you using?
If you are going to iterate over a string, don't bother getting its length with strlen. Just iterate until you see a NUL character:
char *p = str;
while(*p != '\0') {
printf("%c\n", *p);
++p;
}
As for the umlauted characters and such, are they UTF-8? If the string is multi-byte, you could do something like this:
size_t n = strlen(str);
char *p = str;
char *e = p + n;
while(*p != '\0') {
wchar_t wc;
int l = mbtowc(&wc, p, e - p);
if(l <= 0) break;
p += l;
/* do whatever with wc which is now in wchar_t form */
}
I honestly don't know if mbtowc will simply return -1 if it encounters a NUL in the middle of a MB character. If it does, you could just pass MB_CUR_MAX instead of e - p and do away with the strlen call. But I have a feeling this is not the case.

Related

Bug with strlen?

I'm just getting started with C and I just started trying to figure out
call by reference in functions. I have noticed an odd result in my output
when using strlen() to iterate over a string and modify its contents. In this
example the result of strlen() is 3, not including the null character,
but if I do not explicitly check for the null character (or use less than the
result of strlen() instead of less than or equals) during the for loop then
it gives a bizarre bit character in the output which I ASSUME is because of the null character?
Please help this noob to understand what is happening here.
Code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void f_test_s(char s[]);
void f_test_s2(char s[]);
int main(){
char s_test[] = "abc";
f_test_s(s_test);
f_test_s2(s_test);
puts("\nTest complete!");
return 0;
}
void f_test_s(char s[]){
puts("Test #1: ");
printf("string before: %s\n", s);
int len = strlen(s);
printf("strlen() = %d\n", len);
int i=0;
for(i=0;i<=len;i++){
if(s[i] != '\0'){
s[i]++;
}
}
printf("string after: %s\n", s);
}
void f_test_s2(char s[]){
puts("\nTest #2: ");
printf("string before: %s\n", s);
int len = strlen(s);
printf("strlen() = %d\n", len);
int i=0;
for(i=0;i<=len;i++){
s[i]++;
}
printf("string after: %s\n", s);
}
output:
Test #1:
string before: abc
strlen() = 3
string after: bcd
Test #2:
string before: bcd
strlen() = 3
string after: cde
Test complete!
If it matters I am using gcc version 7.3.0 on Ubuntu. I am definitely
not an expert with either C, gcc, or Ubuntu.

This is the problem:
for (i = 0; i <= len; i++) {
s[i]++;
}
It should be:
for (i = 0; i < len; i++) {
s[i]++;
}
s[len] is the null character (0). When you increment it, it becomes 1, so after Test #2 the array holds {'c', 'd', 'e', 0x1} and is no longer terminated. When printf then prints s, it keeps reading past the end of the array until it happens to encounter a null character. Technically this is undefined behavior.

Change this:
for (i = 0; i <= len; i++) {
to this:
for (i = 0; i < len; i++) {
since strlen() returns the length of the string. A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
Your code invokes Undefined Behavior (UB): the loop overwrites the terminator at s[len], so printf() then reads out of bounds. Standard string functions (like printf() with %s) depend on the terminating null character to mark the end of the string. Without it, they do not know when to stop.

How to change individual characters in string on C?

I'm trying to write some basic hangman code to practice C, but I can't seem to change individual characters in the program:
int main(int argc, char *argv[]) {
int length, a, x;
x = 0;
char gWord[100];
char word[100] = "horse";
length = strlen(word) - 1;
for(a = 0; a<= length; a = a + 1){
gWord[a] = "_";
printf("%s", gWord[a]);
}
printf("%s", gWord);
}
When I try to run this, it just prints (null) every time it goes through the loop. It's probably a basic fix, but I'm new to C and can't find anything about it online.

To print character instead of string change:
printf("%s", gWord[a]);
to:
printf("%c", gWord[a]);
but before that change also:
gWord[a] = "_";
to:
gWord[a] = '_';
The latter problem is that you were assigning a string literal to a single character.
Edit:
Also, as @4386427 pointed out, you never zero-terminate gWord before printing it later on with printf("%s", gWord). You should change the last line from:
printf("%s", gWord);
to:
gWord[a] = '\0';
printf("%s", gWord);
because otherwise printf would read past the initialized part of gWord, which is undefined behavior.

This line
printf("%s", gWord[a]);
Must be
printf("%c", gWord[a]);
To print a char, c is the right specifier. s is only for whole strings and takes char pointers.

Are you getting any warning messages when compiling your code?
Since you are learning C, one suggestion: never ignore a warning message from the compiler; they are there for a reason.
Three problems in your code:
First:
Assigning string to a character:
gWord[a] = "_";
gWord is an array of 100 characters and gWord[a] is a character at location a of gWord array. Instead, you should do
gWord[a] = '_';
Second:
Using wrong format specifier for printing a character:
printf("%s", gWord[a]);
^^
For printing a character you should use %c format specifier:
printf("%c", gWord[a]);
Third:
Missed adding null terminating character at the end in gWord and printing it:
printf("%s", gWord);
In C, strings are one-dimensional arrays of characters terminated by a null character '\0'. The %s format specifier is used for character strings, and by default characters are printed until the terminating null character is encountered. So, you should make sure to add '\0' at the end of gWord after the for loop finishes:
gWord[a] = '\0';
Apart from these, there are a couple more things.
This statement:
length = strlen(word) - 1;
I do not see any reason to subtract 1 from the word length. strlen returns the length of the string without the terminating null character, so strlen(word) gives 5. Subtracting 1 from this and then looping while a <= length may confuse the reader of the code. You should simply do:
length = strlen(word);
for(a = 0; a < length; a = a + 1){
....
....
Also, the return type of strlen() is size_t, which is an unsigned type. So, you should use a variable of type size_t to receive the return value of strlen().
Last but not least, make sure not to have any unused variables/parameters in your program. You are not using argc and argv anywhere in your program. If you are using gcc, compile with the -Wall -Wextra options; the compiler will then report unused variables/parameters. So, if you are not using argc and argv, simply give void as the parameter list of main().
Putting these all together, you can do:
#include <stdio.h>
#include <string.h>
int main(void) {
size_t length, a;
char gWord[100];
char word[100] = "horse";
length = strlen(word);
for(a = 0; a < length; a = a + 1) {
gWord[a] = '_';
printf("%c", gWord[a]);
}
gWord[a] = '\0';
printf ("\n");
printf("%s\n", gWord);
return 0;
}

Syntactically, your code is alright @Blookey. But logically, it is not.
Let me point out 3 places which are causing the undesired behavior in your code:
gWord[a] = "_"; Observe this line. You have put the _ in double quotes. In case you are unaware of this, each individual element of a string is a character, and a character constant is written in single quotes (' '), not double quotes.
printf("%s", gWord[a]); A similar error again. gWord[a] is a character, not a string. Hence you need to print it using the format specifier %c instead of %s, which is for strings.
A string, any string, is supposed to end with a null character, which is \0 (backslash zero). That is what differentiates a string from a plain array of characters. So just add the following line once you finish loading characters into gWord[].
gWord[a] = '\0';
Here is the complete code, just with the 3 changes:
#include<stdio.h>
#include<string.h>
int main(int argc, char *argv[]) {
int length, a, x;
x = 0;
char gWord[100];
char word[100] = "horse";
length = strlen(word) - 1;
for(a = 0; a<= length; a = a + 1){
gWord[a] = '_';
printf("%c ", gWord[a]);
}
gWord[a] = '\0';
printf("\n%s", gWord);
}
Here is the OUTPUT:
_ _ _ _ _
_____

#include <stdio.h>
#include <stdlib.h>
#include<string.h>
int main(int argc, char *argv[]) {
int length, a, x;
x = 0;
char gWord[100] = {0};
char word[100] = "horse"; /* strlen needs the terminating '\0' to give a correct result; the initializer provides it */
length = strlen(word) - 1; /* strlen returns 5: the number of chars before the first '\0' */
for(a = 0; a<= length; a = a + 1){
gWord[a] = '_'; /* "_" is a string literal (the chars '_' and '\0'), not a single char */
printf("%c", gWord[a]); /* %s expects a string; a single char is printed with %c */
}
printf("\n"); /* you can print the whole string like this */
printf("%s", gWord);
return 0; /* and remember that main should return a value */
}

sscanf adding in 0's to my string

I'm writing some C code for an assembler for a virtual computer designed for our textbook. The point is to get the binary output to look the same as it does after running assembly through the program that accompanies the textbook. I was on the last instruction to convert to binary, BR (for branch), and was having some trouble with sscanf. The function is:
char* br(char* line) {
int num, i, l, n = 0, z = 0, p = 0;
char bin[17] = "0000";
char word[20], arg1[20];
sscanf(line, "%S%S", word, arg1);
l = strlen(word);
for (i = 2; i < l; i++) {
if (word[i] == 'N') {
n = 1;
} else if (word[i] == 'Z') {
z = 1;
} else if (word[i] == 'P') {
p = 1;
}
}
bin[4] = n + '0';
bin[5] = z + '0';
bin[6] = p + '0';
while (label[i] != 0) {
if (strcmp(label[i], arg1) == 0) {
num = address[i] - currentAddress - 1;
decToBinary(num, arg1);
break;
}
i++;
}
for (i = 7; i < 16; i++) {
bin[i] = arg1[i];
}
return bin;
}
The problem I'm having is that sscanf is adding 0's between every character placed in word and arg1, so they are terminated early. The incoming string "BRZP START" is broken into "B" for word and "S" for arg1 respectively. I've used sscanf in this way a bunch already and don't know why it's not working now.

If you look at the man page for sscanf, %S (capital S) is not a standard conversion. Based on the printf man page, it looks as if it is equivalent to %ls, which reads wide characters, and it should not be used. When a string of narrow characters is stored as wide characters, the bytes after each character are zero, so the result looks like a one-character string. Try this:
sscanf(line, "%s%s", word, arg1);

You don't specify the host OS or development platform you're using, but I'm not in the least bit surprised you're getting two bytes for every character considering that %S tells sscanf to read wide chars in some implementations. If your input is ASCII, you'll get ASCII chars plus a null byte, just as you're seeing. The solution is simple: use %s not %S.

How do i cycle through each letter in a string?

#include <stdio.h>
int main()
{
char msg[31] = {'\0'};
char encrypted[31] = {'\0'};
int key;
printf("Please enter a message under 30 characters: ");
fgets(msg, 31, stdin);
printf("Please enter an encryption key: ");
scanf("%d", &key);
int i = 0;
while (msg[i] && ('a' <= msg[i] <= 'z' || 'A' < msg[i] < 'Z'))
{
encrypted[i] = (msg[i] + key);
i++;
}
printf("%s\n", msg);
printf("%d\n", key);
printf("%s\n", encrypted);
}
Okay, I've got my code to increment the characters, but I don't know how to make it ignore special characters and spaces. Also, how do I use % to loop back to 'a' and 'A'?
Thank you.

You just need a simple for loop:
for (int i = 0; i < 31; i++)
{
// operate on msg[i]
}
If you didn't know the length of the string to begin with, you might prefer a while loop that detects the null terminator:
int i = 0;
while (msg[i])
{
// operate on msg[i]
i++;
}
Your fgets and scanf are probably fine, but personally, I would be consistent when reading input, and fgets for it all. Then you can sscanf to get key out later.

scanf and fgets seem fine in this situation the way you've used them.
In C, a string is just an array of characters. So, you access each element using a for loop and array indexing:
for (int i = 0; i < strlen(str); i++) {
char thisChar = str[i];
//Do the processing for each character
}
You can perform arithmetic on thisChar as necessary, but be careful not to exceed the range of char (CHAR_MAX from <limits.h>, which is 127 where char is signed). You might want to put a check on key to ensure it doesn't get too big.

Getting a string from scanf:
char msg[31];
scanf("%30s", msg);
OR (less efficient, because you have to fill the array with 0s first)
char msg[31] = { 0 };
scanf("%30c", msg);
Iterating a string is as easy as a for loop (be sure to use c99 or c11)
int len = strlen(msg);
for(int i = 0; i < len; i++) {
char current = msg[i];
//do something
msg[i] = current;
}
"Encrypting" (i.e. ciphering) a character requires a few steps
Determine if we have an uppercase character, lowercase character, or non-alphabetic character
Determine the position in the alphabet, if alphabetic.
Update the position, using the modulus operator (%)
Correct the position, if alphabetic
I could give you the code here, but then you wouldn't learn anything from doing it yourself. Instead, I encourage you to implement the cipher based on the steps I provided above.
Note that you can do things like:
char c = 'C';
char e = 'E' + 2;
char lower_c = 'C' - 'A' + 'a';

Reverse a string containing ASCII chars and non-ASCII chars

I have a problem: how do I reverse a string like 'abcd汉字efg'?
str_to_reverse = "abcd汉字efg"; /* those non-ASCII chars are Chinese characters, each of them takes 2 bytes */
after reversion, it should be:
str_to_reverse = "gfe字汉dcba";
I thought, to reverse the string, I gotta identify those non-ASCII chars, because I think that simply reversing every byte won't get the right answer.
How can I do it?
PS:
I wrote this program under Ubuntu, 32-bit.
then I printed every byte:
for(i = 0; i < strlen(s); i++)
printf("%c", s[i]);
I got some gibberish text instead of "汉字".

Pure C89 answer:
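#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
#include <string.h>
int main(void)
{
    char const* str;
    size_t slen;
    char* buf;
    char* rev;
    setlocale(LC_ALL, ""); /* use the environment's multibyte encoding */
    str = "abcd汉字efg";
    printf("%s\n", str);
    slen = strlen(str);
    buf = malloc(slen+1);
    if (buf == NULL) {
        fprintf(stderr, "Out of memory\n");
        return EXIT_FAILURE;
    }
    rev = buf+slen; /* fill the buffer from the end backwards */
    *rev = '\0';
    while (*str != '\0') {
        int clen, i;
        clen = mblen(str, slen); /* bytes in the next multibyte character */
        if (clen <= 0) {
            fprintf(stderr, "Bad encoding\n");
            free(buf);
            return EXIT_FAILURE;
        }
        /* copy the character's bytes in their original order */
        for (i = 0; i < clen; ++i) {
            *--rev = str[clen-1-i];
        }
        str += clen;
        slen -= (size_t)clen;
    }
    printf("%s\n", rev); /* rev now points at the start of buf */
    free(buf);
    return 0;
}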

If the string is encoded as utf8, it is pretty simple. You can obtain the length of well formed utf8 sequences by inspecting only the first byte.
In a first pass you reverse only the utf8 "subsequences" (those with length > 1)
In a second pass you reverse the whole string.
Voila.
