Ascii character encoding issue

#include <stdio.h>
int main()
{
    char line[80];
    int count;
    // read the line of characters
    printf("Enter the line of text below: \n");
    scanf("%[ˆ\n]",line);
    // encode each individual character and display it
    for(count = 0; line[count]!= '\0'; ++ count){
        if(((line[count]>='0')&& (line [count]<= '9')) ||
           ((line[count]>= 'A')&& (line[count]<='Z')) ||
           ((line[count]>= 'a')&& (line[count]<='z')))
            putchar(line[count]+1);
        else if (line[count]=='9')putchar('0');
        else if (line [count]== 'A')putchar('Z');
        else if (line [count]== 'a') putchar('z');
        else putchar('.');
    }
}
The problem in the code above is the character encoding. Whenever I compile the code, the compiler automatically converts the encoding, and then I cannot get the required output.
My target output should look like:
enter the string
Hello World 456
Output
Ifmmp.uif.tusjof
Every letter is replaced by the next letter, and every space is replaced by '.'.

This is suspect:
scanf("%[ˆ\n]",line);
It should be:
scanf("%79[^\n]",line);
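Your version has a multibyte character that looks a bit like ^ (U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT) instead of the ASCII ^. Without the real ^, %[...] is no longer a negated scanset meaning "everything except newline": it accepts only the bytes of that character and '\n', so ordinary input matches nothing and line is left uninitialized. Your symptoms sound as if the text that has been input is actually multibyte characters. The 79 in %79[^\n] also keeps scanf from writing past the end of your 80-byte line array.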
BTW, you could make your code easier to read by using isalnum( (unsigned char)line[count] ) from <ctype.h>. That single test replaces your a-z, A-Z and 0-9 tests.

You are not checking your conditions correctly:
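if ((line[count] >= 'A') && (line[count] <= 'Z'))
..
already converts the character 'Z' (into '[', the next ASCII character). The next check,
if (line [count]== 'A')putchar('Z');
is therefore never executed, because 'A' is already caught by the first test. But that is not the only thing wrong here: the wrap-around goes the wrong way. 'A' should simply become 'B' (which the first test already does); it is 'Z' that needs special handling. You probably want
if (line[count] >= 'A' && line[count] < 'Z')
(< instead of <=) and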
if (line [count]== 'Z')putchar('A');
and the same for lowercase and digits.

The problem is your format string for scanf. A simpler alternative is %s, but be aware that %s reads a single whitespace-delimited word, not a whole line: with the input Hello World 456 it stores only Hello.
If you want to make sure that you read at most 79 characters, use %79s (your line array has a length of 80, and one byte is needed for the terminating '\0').
So you could replace your scanf with this:
scanf("%79s", line);

Related

C, Help while loop continues while not true

Task:
Write a do/while loop that reads a char, where the program ends if the letter is not a capital:
Solution:
Char input;
do{
scanf("%c", &input);
} while (input <'a' || 'z'< input);
So I read my program as: "do this while the input is either a or z". Why does it check all the letters from a to z, and why does my program end when I type a lowercase letter instead of a capital?
I'm new to C and can't find an explanation anywhere. Thanks in advance.

The problem is this statement:
while (input <'a' || 'z'< input);
It keeps looping on anything that is not a lowercase letter, which covers almost the whole ASCII (single char) table of possibilities: uppercase letters, digits, punctuation, newline and so on.
And the requirement is about uppercase letters, whereas 'a' to 'z' are lowercase.
You could use:
while ('A' <= input && input <= 'Z');
However, it is better to use the functions in the header file ctype.h, because not all systems use the ASCII character set (IBM mainframes, for instance, use EBCDIC rather than ASCII, and in EBCDIC the alphabet is not contiguous).
Remember that the newline produced by the Enter key is not uppercase either, so it also ends the loop (the code does not allow for it).
The following proposed code:
- cleanly compiles
- performs the desired function
- properly checks for errors
- uses the facilities defined in the header file ctype.h
And now the proposed code:
#include <stdio.h>  // scanf(), perror()
#include <stdlib.h> // exit(), EXIT_FAILURE
#include <ctype.h>  // isupper()

int main( void )
{
    // 'char' is all lower case:
    // so this statement: Char input;
    // does not compile, suggest:
    char input;
    do
    {
        int scanfStatus = scanf("%c", &input);
        // always check the returned value (not the parameter value)
        if( 1 != scanfStatus )
        {
            perror( "scanf failed" );
            exit( EXIT_FAILURE );
        }
        // cast to unsigned char: passing a negative char to isupper() is undefined
    } while ( isupper( (unsigned char)input ) );
} // end function: main

Your question is why the loop checks all the letters from a to z, and why your program ends when you type a lowercase letter instead of a capital.
The answer lies in the while test, which checks whether input < 'a' or 'z' < input.
Here is some background information that will help you understand why this happens.
In your program, input is a char, and, according to the C standard, the char type is an integral type. This means that 'a' (C's way to designate a character literal) is, in fact, a number, and thus it can be compared, using comparison operators such as < or >, with other (integral) numbers or with other chars (here, the content of input).
Now, what is the actual integral value of a char? While the integral values of the character set are implementation-defined, in general, C compilers (including Visual Studio's) will use the ASCII Character Codes.
So:
'a' in your code, refers to the integral value of ASCII code for the char 'a', which is 97,
and 'z' in your code refers to the integral value of ASCII code for the char 'z', which is 122
As you can see also from the ASCII table (ASCII Character Codes Chart 1 from MSDN), the alphabet a-z has consecutive code numbers ('a' is 97, 'b' is 98, etc.), and the lowercase alphabet is effectively an ASCII code from 97 to 122.
So, in your code, the test input < 'a' || 'z' < input is equivalent to input < 97 || 122 < input. It is true whenever the entered char has an ASCII value outside the range 97 to 122, that is, for any char that is not a lowercase letter from 'a' to 'z', and then the loop repeats the scanf(). As soon as you type a lowercase letter, the test is false and the loop ends.
Finally, as noted by other commentators or contributors, the right type is char, not Char since C is case-sensitive.

How to print printable characters and characters like '\n', '\t', etc

I'm trying to display an HTTP frame. The problem is that some characters are not recognized. I use the isprint function.
Here is the function I created:
void printAscii(const int dataLength, const char *data){
    if (dataLength <= 0) {
        printf("No data (data length <= 0)\n");
    } else {
        printf("Warning: Unsupported characters are not displayed.\n\n");
        size_t i;
        printf("|- ");
        for (i = 0; i < (size_t)dataLength; i++) {
            if (isprint((unsigned char)data[i])) {
                printf("%c", data[i]);
            }
            if (data[i] == '\n') {
                printf("|- ");
            }
        }
    }
}
The problem is that characters like '\n' and '\t' are not displayed either, because isprint rejects them.
I thought of adding additional conditions in my function
if (isprint(data[i]) || data[i] == '\n' || data[i] == '\t')
But I was wondering whether there is a cleaner way.
I started learning C not long ago, so do not hesitate to tell me if I made mistakes in my function.
EDIT
I may not have been clear enough in my question.
My project is a frame analyzer (pcap), and I have reached the HTTP part. The frame contains only ASCII, so this type of frame is relatively easy to display. The problem is that some characters cannot be displayed directly (encoded image data, for example), so I decided to ignore them. But with isprint(), characters like '\n', '\t', etc. are not displayed either, so my display is less "beautiful".
For example, this HTTP frame:
<ul>
<li>Foo</li>
<li>Bar</li>
</ul>
becomes:
<ul><li>Foo</li><li>Bar</li></ul>
which is less understandable.
Edit 2
I found it. This code works as desired:
if (isprint((unsigned char)data[i]) || isspace((unsigned char)data[i]))
Thanks anyway.

I found it. This code works as desired.
if (isprint(data[i]) || isspace(data[i]))
– Eraseth

Since these characters are whitespace (control) characters, you will not see a visible symbol for them with the "%c" format specifier; they only produce layout such as a line break or a tab. If you want to see them, use the hexadecimal format specifier in printf as follows:
printf("%02x", (unsigned char)data[i]);
Note: "%x" (lowercase) and "%X" (uppercase) display the character's code as a hexadecimal number, with lowercase or uppercase digits, instead of the character itself. The cast to unsigned char keeps bytes of 0x80 and above from being sign-extended (with a signed char, 0xE2 could otherwise print as ffffffe2), and %02x pads values below 0x10 to two digits.
I should also note that formatting the string in this fashion gives you the raw data: this is what it would look like in memory, or on the wire if it were transmitted. So a \n would be 0x0A and a \t would be 0x09.
See the following link for a great reference on what all the specifiers mean.
http://www.cplusplus.com/reference/cstdio/printf/
Also, here is another link to an ASCII Table.
https://www.techonthenet.com/ascii/chart.php
Hope this helps!

How to change multicharacter signs by other ones in C?

I've got a UTF-8 text file containing several symbols that I'd like to replace with other ones (only those between |( and |)), but the problem is that some of these symbols are not treated as single characters but as multi-character signs. (By this I mean they can't be written between single quotes like '∞', only between double quotes like "∞", so as a char *?)
Here is my text file:
Text : |(abc∞∪v=|)
For example:
∞ should be replaced by ¤c
∪ by ¸!
= by "
Since some symbols (∞ and ∪) are multi-character, I decided to use fscanf to read the text word by word. The problem with this method is that I have to put a space between each character... My file would have to look like this:
Text : |( a b c ∞ ∪ v = |)
fgetc can't be used because characters like ∞ can't be treated as one single character. If I use it, I won't be able to strcmp a char with each symbol (a char *); I tried converting my char to a char *, but strcmp never returns 0.
Here is my code in C to help you understand my problem:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void){
    char *carac[]={"∞","=","∪"}; //array with our signs
    FILE *flot,*flot3;
    flot=fopen("fichierdeTest2.txt","r"); // input text file
    flot3=fopen("resultat.txt","w"); //output file
    int i=0,j=0;
    char a[1024]; //array that will contain each read word.
    while(!feof(flot))
    {
        fscanf(flot,"%s",&a[i]);
        if (strstr(&a[i], "|(") != NULL){ // if the word read contains |( then j=1
            j=1;
            fprintf(flot3,"|(");
        }
        if (strcmp(&a[i], "|)") == 0)
            j=0;
        if(j==1) { //it means we are between |( and |) so the conversion can begin
            if (strcmp(carac[0], &a[i]) == 0) { fprintf(flot3, "¤c"); }
            else if (strcmp(carac[1], &a[i]) == 0) { fprintf(flot3,"\"" ); }
            else if (strcmp(carac[2], &a[i]) == 0) { fprintf(flot3, " ¸!"); }
            else fprintf(flot3,"%s",&a[i]); // when it's a letter, number or sign that doesn't need to be converted
        }
        else { // when we are not between |( and |) just copy the word to the output file with a space after it
            fprintf(flot3, "%s", &a[i]);
            fprintf(flot3, " ");
        }
        i++;
    }
}
Thanks a lot for the future help !
EDIT: Every symbol is converted correctly if I put a space between each of them, but without spaces it doesn't work; that's what I'm trying to solve.

First of all, get the terminology right. Proper terminology is a bit confusing, but at least other people will understand what you are talking about.
In C, char is the same as byte. However, a character is something abstract like ∞ or ¤ or c. One character may occupy several bytes (that is, several chars); such characters are called multibyte characters.
Converting a character to a sequence of bytes (encoding) is not trivial. Different systems do it differently: some use UTF-8, while others may use UTF-16 big-endian, UTF-16 little-endian, an 8-bit code page or any other encoding.
When your C program has something inside quotes, like "∞" - it's a C-string, that is, several bytes terminated by a zero byte. When your code uses strcmp to compare strings, it compares each byte of both strings, to make sure they are equal. So, if your source code and your input file use different encodings, the strings (byte sequences) won't match, even though you will see the same character when examining them!
So, to rule out any encoding mismatches, you might want to use a sequence of bytes instead of a character in your source code. For example, if you know that your input file uses the UTF-8 encoding:
char *carac[]={
"\xe2\x88\x9e", // ∞
"=",
"\xe2\x88\xaa"}; // ∪
Alternatively, make sure the encodings (of your source code and your program's input file) are the same.
Another, less subtle, problem: when comparing strings, you actually have a big string and a small string, and you want to check whether the big string starts with the small string. Here strcmp does the wrong thing! You must use strncmp here instead:
if (strncmp(carac[0], &a[i], strlen(carac[0])) == 0)
{
fprintf(flot3, "\xC2\xA4""c"); // ¤c
}
Another problem (actually, a major bug): the fscanf function reads a word (text delimited by spaces) from the input file. If you only examine the first byte in this word, the other bytes will not be processed. To fix, make a loop over all bytes:
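// Assumes: int now_replacing = 0; declared before the read loop, carac[] as above,
// and repl[] holding the replacements in the same order as carac[].
const char *repl[] = {"\xC2\xA4""c", "\"", "\xC2\xB8!"}; // ¤c, ", ¸!
fscanf(flot, "%1023s", a); // note: fscanf drops the spaces between words
for (i = 0; a[i] != '\0'; )
{
    if (strncmp(&a[i], "|(", 2) == 0) // start pattern (strncmp returns 0 on a match)
    {
        now_replacing = 1;
        fputs("|(", flot3);
        i += 2;
        continue;
    }
    if (strncmp(&a[i], "|)", 2) == 0) // end pattern
    {
        now_replacing = 0;
        fputs("|)", flot3);
        i += 2;
        continue;
    }
    int matched = 0;
    if (now_replacing)
    {
        for (int k = 0; k < 3; ++k)
        {
            size_t n = strlen(carac[k]);
            if (strncmp(&a[i], carac[k], n) == 0)
            {
                fputs(repl[k], flot3);
                i += n; // skip all bytes of the multibyte symbol
                matched = 1;
                break;
            }
        }
    }
    if (!matched)
    {
        fputc(a[i], flot3); // copy anything else unchanged
        i += 1;             // processed just one char
    }
}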

You're on the right track, but you need to look at characters differently than strings.
strcmp(carac[0], &a[i])
(Pretending i = 2) As you know, this compares the string "∞" with &a[2]. But you forget that &a[2] is the address of the third character of the string (index 2), and strcmp works by scanning the entire string until it hits a null terminator. So "∞" actually ends up being compared with everything from a[2] to the end of the text, because a is only null-terminated at the very end.
What you could do instead is not use strings, but read two bytes at a time as a short (16 bits) and compare that with the UTF-16 value of your character. Note that this only works if the file is actually encoded in UTF-16; in a UTF-8 file, ∞ is stored as the three bytes 0xE2 0x88 0x9E, which never form 0x221E.
if (8734 == *((short *)&a[i])) { /* character is infinity */ }
8734 (0x221E) is the UTF-16 value of infinity (U+221E).
VERY IMPORTANT NOTE:
Whether your machine is big-endian or little-endian matters in this case. If 8734 (0x221E) does not work, give 7714 (0x1E22) a try.
Edit: Something else I overlooked is that you're scanning the entire string at once. "%s: String of characters. This will read subsequent characters until a whitespace is found (whitespace characters are considered to be blank, newline and tab)." (source)
//feof = false.
fscanf(flot,"%s",&a[i]);
//feof = true.
That means you never actually iterate. You need to go back and rethink your scanning procedure.

Reading from text file with '/n'

Okay. So I'm reading text from a text file and storing it into a char array, and this is working as intended. However, the text file contains numerous newline escape sequences. The problem is that when I print out the string array with the stored text, it ignores these newline sequences and simply prints them as "\n".
Here is my code:
char *strings[100];
void readAndStore(FILE *file) {
    int count = 0;
    char buffer[250];
    while(!feof(file)) {
        char *readLine = fgets(buffer, sizeof(buffer), file);
        if(readLine) {
            strings[count] = malloc(sizeof(buffer));
            strcpy(strings[count], buffer);
            ++count;
        }
    }
}
int main() {
    FILE *file1 = fopen("txts", "r");
    readAndStore(&*file1);
    printf("%s\n", strings[0]);
    printf("%s\n", strings[1]);
    return 0;
}
And the output becomes something like this:
Lots of text here \n More text that should be on a new line, but isn't \n And so \n on and
and on \n
Is there any way to make it read the "\n" as actual newline escape sequences or do I just need to remove them from my text file and figure out some other way to space out my text?

No. The fact is that \n is a special escape sequence for your compiler, which turns it into a single character, namely LF (line feed), with ASCII code 0x0A. So it's the compiler that gives a special meaning to that sequence.
When reading from a file, on the other hand, \n is read as two distinct characters, with ASCII codes 0x5C and 0x6E.
You will need to write a routine that replaces all occurrences of "\\n" (the string composed of the characters \ and n; the double backslash is necessary to tell the compiler not to interpret it as an escape sequence) with '\n' (the single escape sequence, meaning newline).

If you only intend to replace the two-character sequence \n with the actual newline character, use a custom replacement function like this:
#include <string.h> // strlen(), memmove()

void replacenewlines(char * str)
{
    while(*str)
    {
        if (*str == '\\' && *(str+1) == 'n') //found \n in the string. Warning, \\n will be replaced also.
        {
            *str = '\n'; //this is one character to replace two characters
            memmove(str+1, str+2, strlen(str+2) + 1); //So we need to move the rest of the string leftwards over the 'n'
            //Note memmove instead of memcpy/strcpy, since the areas overlap; the + 1 moves the '\0' as well
        }
        ++str;
    }
}
This code is not tested, but the general idea must be clear. It is not the only way to replace the string, you may use your own or find some other solution.
If you intend to replace all special characters, it might be better to look up some existing implementation. Keep in mind that printf does not translate backslash escape sequences at run time (the compiler does that for string literals); it only interprets % directives, so if you ever pass such a string as the format parameter, at the very minimum you will need to double every '%' sign in it.
Do not pass the string as the first argument of printf as is; that would cause all kinds of funny stuff.

c detecting empty input for stdin

This seems like it should be a simple thing but after hours of searching I've found nothing...
I've got a function that reads an input string from stdin and sanitizes it. The problem is that when I hit enter without typing anything in, it apparently just reads in some junk from the input buffer.
In the following examples, the prompt is "input?" and everything that occurs after it on the same line is what I type. The line following the prompt echoes what the function has read.
First, here is what happens when I type something in both times. In this case, the function works exactly as intended.
input? abcd
abcd
input? efgh
efgh
Second, here is what happens when I type something in the first time, but just hit enter the second time:
input? abcd
abcd
input?
cd
And here is what happens when I just hit enter both times:
input?
y
input?
y
It happens to return either 'y' or '#' every time when I run it anew. 'y' is particularly dangerous for obvious reasons.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#define STRLEN 128
int main() {
    char str[STRLEN];
    promptString("input?", str);
    printf("%s\n", str);
    promptString("input?", str);
    printf("%s\n", str);
    return EXIT_SUCCESS;
}
void promptString(const char* _prompt, char* _writeTo) {
    printf("%s ", _prompt);
    fgets(_writeTo, STRLEN, stdin);
    cleanString(_writeTo);
    return;
}
void cleanString(char* _str) {
    char temp[STRLEN];
    int i = 0;
    int j = 0;
    while (_str[i] < 32 || _str[i] > 126)
        i++;
    while (_str[i] > 31 && _str[i] < 127) {
        temp[j] = _str[i];
        i++;
        j++;
    }
    i = 0;
    while (i < j) {
        _str[i] = temp[i];
        i++;
    }
    _str[i] = '\0';
    return;
}
I've tried various methods (even the unsafe ones) of flushing the input buffer (fseek, rewind, fflush). None of it has fixed this.
How can I detect an empty input so that I can re-prompt, instead of this annoying and potentially dangerous behavior?

This part of cleanString
while (_str[i] < 32 || _str[i] > 126)
i++;
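skips past the terminating \0 when the input contains no printable characters, for example when you just press Enter and fgets stores only "\n". The loop then carries on into whatever follows in the buffer (leftovers from an earlier input, or uninitialized bytes).
You should add _str[i] != '\0' to the loop's condition.
To detect an empty string, simply check its length just after the input:
do {
    printf("%s ", _prompt);
    if (fgets(_writeTo, STRLEN, stdin) == NULL)
        break; // EOF or read error: nothing was read
} while (strlen(_writeTo) < 2); // strlen() needs <string.h>
(comparing with 2 because of the '\n' that fgets stores at the end of the buffer)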

Why do you have a bunch of variable names with leading underscores? That's nasty.
Anyway, the first thing you must do is check the return value of fgets. If it returns NULL, you didn't get any input. (You can then test feof or ferror to find out why you didn't get input.)
Moving on to cleanString, you have a while loop that consumes a sequence of non-printable characters (and you could use isprint for that instead of magic numbers), followed by a while loop that consumes a sequence of printable characters. If the input string doesn't consist of a sequence of non-printables followed by a sequence of printables, you will either consume too much or not enough. Why not use a single loop?
while(str[i]) {
    if(isprint((unsigned char)str[i]))
        temp[j++] = str[i];
    ++i;
}
temp[j] = '\0';
This is guaranteed to consume the whole string until the \0 terminator, and it can't keep going past the terminator, and it copies the "good" characters to temp. I assume that's what you wanted.
You don't even really need to use a temp buffer, you could just copy from str[i] to str[j], since j can never get ahead of i you'll never be overwriting anything that you haven't already processed.
