fgetc only reads UTF-8 encoded files, not UTF-16 - c

My aim is to find the encoding of a text file by dividing the size of the file by the number of characters in it. But fgetc only seems to read UTF-8 encoded files; it does not work for UTF-16. Please help me solve this problem, or suggest a substitute for fgetc.
#include <stdio.h>
#include <stdlib.h>
void main()
{
findEncode("C:\\UTF-8_TestCase\\TestCase1.txt");
}
int findEncode(char *str){
int ch = NumberOfCharecter(str);
int size = SizeOfFile(str);
if(size/ch == 1){
printf("UTF-8");
}else if(size/ch == 2){
printf("UTF-16");
}else {
printf("UTF-32");
}
}
int NumberOfCharecter(char *str){
FILE *fptr;
char ch;
int character=1;
fptr=fopen(str,"r");
if(fptr==NULL)
{
printf("File does not exist or can not be opened.");
}
while(1)
{
ch = fgetc(fptr); //fgetc only reads UTF8 encoded file. not working for UTF16
if(ch==EOF)
break;
character++;
}
fclose(fptr);
printf("The number of characters in the file %s are : %d\n\n",str,character-1);
return character-1;
}
//SizeOfFile working well
int SizeOfFile(char *str) {
FILE *fptr;
char ch;
int sz;
fptr=fopen(str,"r+");
fseek(fptr, 0, SEEK_END);
sz = ftell(fptr);
printf("the size of the file is %d \n\n", sz);
fclose(fptr);
return sz;
}

char ch;
…
ch = fgetc(fptr); //…
if(ch==EOF)
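You assign the return value of fgetc() to a char. fgetc() returns an int so that EOF can be told apart from every valid byte value, so you have to declare int ch in order to compare it to EOF. After this fix, you'll find that NumberOfCharecter() returns the same number as SizeOfFile() (apart from line-ending translation in text mode on Windows), because fgetc() reads one byte at a time. The "character" it returns is a byte, not a character in the sense of an encoding, so the ratio of file size to character count cannot reveal the encoding.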
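Since byte counts cannot distinguish encodings, one common alternative is to look for a byte-order mark (BOM) at the start of the file. The following is a minimal sketch, not a complete detector: the function names bom_encoding and file_bom_encoding are made up for illustration, and files without a BOM are simply reported as "unknown".

```c
#include <stdio.h>

/* Hypothetical helper: guess the encoding from the first n bytes of a file.
   Only byte-order marks are recognised; anything else yields "unknown". */
const char *bom_encoding(const unsigned char *b, size_t n)
{
    /* UTF-32LE must be tested before UTF-16LE: FF FE 00 00 starts with FF FE. */
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return "UTF-32LE";
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return "UTF-32BE";
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return "UTF-8";
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return "UTF-16LE";
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return "UTF-16BE";
    return "unknown";
}

/* Hypothetical wrapper: reads up to four bytes from fp (opened with "rb"),
   rewinds the stream, and classifies the bytes that were read. */
const char *file_bom_encoding(FILE *fp)
{
    unsigned char b[4] = {0, 0, 0, 0};
    size_t n = fread(b, 1, sizeof b, fp);
    rewind(fp);
    return bom_encoding(b, n);
}
```

Open the file in binary mode ("rb") before calling file_bom_encoding(), so that no text-mode translation touches the bytes. A UTF-16LE file whose first character is U+0000 would be misreported as UTF-32LE; that case is assumed not to occur here.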

Related

Saving file as UTF-8

I am trying to write a program for translating English to Greek. So I find the ASCII number of the English character (a) and then save the new char to a file. However, it still saves 'á' and not 'α', because they have the same decimal number.
int main(int argc, char *argv[]) {
FILE *fp1, *fp2;
char ch,demo;
int i;
fp1 = fopen( argv[1], "r");
fp2 = fopen("Translated.txt", "w");
while (1) {
ch = fgetc(fp1);
if (ch == EOF)
break;
else{
i = ch + 128;
demo = i;
putc(demo, fp2);
}
}
printf("File copied Successfully!");
fclose(fp1);
fclose(fp2);
return 0;
}
How can I save the file as UTF-8 so that it displays as Greek characters?
Is there any other way of converting ISO8859-1 to ISO8859-7?
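For context: the byte written by the program above (0x61 + 128 = 0xE1) is 'á' when the file is viewed as ISO8859-1 and 'α' when it is viewed as ISO8859-7, so the viewer's assumed encoding decides what appears. To make the file unambiguous, the Unicode code point of the Greek letter (U+03B1 for α) can be written as UTF-8 instead. The sketch below shows the standard UTF-8 bit layout; the function name utf8_encode is hypothetical, and the mapping from each English letter to a Greek code point is left out.

```c
#include <stddef.h>

/* Hypothetical helper: encodes the Unicode code point cp as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not encodable
   (a surrogate or a value above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+03B1 this yields the two bytes 0xCE 0xB1, which can be written to the output file with fwrite() in place of the single putc() call.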

Read txt file line by line into char array C

I know this question has been asked a few times, but never in a way that helps me figure out my problem. Essentially, I am reading four text files, each containing single words separated by newlines, and I want to store these in a char array. I first count the number of lines in the file and then create a new char array, but for the life of me, I cannot figure out how to get it to read correctly. The last two lines are just to test whether it has read the entire file correctly, and they always come back as NULL and the question mark symbol.
I want each line to be at the next index in the char array.
Any help would be awesome! Thank you ahead of time.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
void countAnagrams(char* fileName);
void main ()
{
char *fileNames[] = {"AnagramA.txt","AnagramB.txt","AnagramC.txt","AnagramD.txt"};
countAnagrams(fileNames[0]);
countAnagrams(fileNames[1]);
countAnagrams(fileNames[2]);
countAnagrams(fileNames[3]);
}
void countAnagrams(char* fileName)
{
int anagramCount = 0;
int ch, lines = 0;
//Count number of lines in file
FILE *myfile = fopen(fileName, "r");
do
{
ch = fgetc(myfile);
if(ch == '\n')
lines++;
}while(ch != EOF);
char contents[lines];
int i = 0;
for(i=1;i<lines;i++)
{
fscanf(myfile,"%s",contents[i]);
}
fclose(myfile);
printf("%.12s\n",fileName);
printf("number of lines: %d\n", lines);
printf("first thing: %s\n", contents[0]);
printf("last thing: %s\n", contents[lines-1]);
}
Here's a slight modification of your code that might help you.
The main points:
You can use getline() instead of fscanf(). Note that getline() is a POSIX function rather than part of standard C. fscanf() can be used to read line by line, but it needs an explicit check for the end-of-line condition; getline() handles that automatically.
As kaylum pointed out, it's necessary to rewind() the file pointer back to the beginning of the file after counting the number of lines.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
void countAnagrams(char* fileName);
int main(void)
{
    char *fileNames[] = {"AnagramA.txt","AnagramB.txt","AnagramC.txt","AnagramD.txt"};
    countAnagrams(fileNames[0]);
    countAnagrams(fileNames[1]);
    countAnagrams(fileNames[2]);
    countAnagrams(fileNames[3]);
    return 0;
}
void countAnagrams(char* fileName)
{
    int anagramCount = 0;
    int ch, lines = 0;
    //Count number of lines in file
    FILE *myfile = fopen(fileName, "r");
    if (myfile == NULL)
    {
        perror(fileName);
        return;
    }
    do
    {
        ch = fgetc(myfile);
        if (ch == '\n')
            lines++;
    } while (ch != EOF);
    if (lines == 0) //nothing to read; also avoids a zero-length array
    {
        fclose(myfile);
        return;
    }
    rewind(myfile); //go back to the start before reading the lines
    char *contents[lines];
    int i = 0;
    size_t len = 0;
    for(i = 0; i < lines; i++)
    {
        contents[i] = NULL;
        len = 0;
        getline(&contents[i], &len, myfile); //allocates the buffer; keeps the trailing '\n'
    }
    fclose(myfile);
    printf("%.12s\n",fileName);
    printf("number of lines: %d\n", lines);
    printf("first thing: %s\n", contents[0]);
    printf("last thing: %s\n", contents[lines-1]);
    for(i = 0; i < lines; i++)
        free(contents[i]); //buffers allocated by getline() must be freed by the caller
}
I think the problem is the declaration char contents[lines], followed by fscanf(myfile,"%s",contents[i]) and the printf calls after it. contents[i] has type char, so you are trying to read an array of chars into a single char. contents needs to be declared as char* contents[lines], with each contents[i] pointing to its own buffer, to be able to read a char array into contents[i].
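Where getline() is not available, the same idea (an array of char pointers, one allocated string per line) can be sketched with fgets(), which is standard C. This is an illustration under assumptions, not the code from either answer: read_lines and MAX_WORD are hypothetical names, and lines longer than MAX_WORD - 1 characters would be split across several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 128  /* assumed upper bound on line length, not from the question */

/* Hypothetical helper: reads every line of fp into a heap-allocated array of
   strings and stores the number of lines in *count. Returns NULL when the
   file is empty or an allocation fails (in both cases *count is 0). */
char **read_lines(FILE *fp, size_t *count)
{
    char buf[MAX_WORD];
    char **lines = NULL;
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line terminator */

        char **grown = realloc(lines, (n + 1) * sizeof *lines);
        if (grown == NULL)
            goto fail;
        lines = grown;

        lines[n] = malloc(strlen(buf) + 1);
        if (lines[n] == NULL)
            goto fail;
        strcpy(lines[n], buf);
        n++;
    }
    *count = n;
    return lines;

fail:
    while (n > 0)
        free(lines[--n]);
    free(lines);
    *count = 0;
    return NULL;
}
```

Because the array grows as lines are read, no separate counting pass or rewind() is needed. The caller frees each lines[i] and then lines itself.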

Why is my_getline() causing a system hang?

This program attempts to save the contents of a text file into a character array. It is then supposed to use my_getline() to print the contents of the character array. I've tested it and can see that the contents are in fact being saved into char *text, but I can't figure out how to print the contents of char *text using my_getline(). my_getline is a function we wrote in class that I need to use in this program. When I attempt to call it in the way that was taught, a 1 is printed to the terminal, but then the terminal just waits and nothing else is printed. Any guidance would be appreciated. Also, let me know if I'm missing any information that would help.
/* Include the standard input/output and string libraries */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Define the maximum lines allowed in an input text and NEWLINE for getline funct. */
#define MAXPATTERN 15
#define MAXFILENAMELENGTH 15
#define NEWLINE '\n'
/* function prototypes */
void my_getline(char text[]);
int find_string(char text[], char pattern[], int length_text, int length_pattern);
int main()
{
FILE *fp;
long lSize;
char *text;
char fileName[MAXFILENAMELENGTH], pattern[MAXPATTERN];
char c;
int length_text, length_pattern, j, lineNumber = 1;
printf("Enter file name: ");
scanf("%s", fileName);
fp = fopen(fileName, "r");
if (fp == NULL)
{
printf("fopen failed.\n");
return(-1);
}
fseek(fp, 0L, SEEK_END);
lSize = ftell(fp);
rewind(fp);
/* allocate memory for all of text file */
text = calloc(1, lSize + 2);
if(!text)
{
fclose(fp);
fputs("memory allocs fails", stderr);
exit(1);
}
/* copy the file into text */
if(1 != fread(text, lSize, 1, fp))
{
fclose(fp);
free(text);
fputs("Entire read fails", stderr);
exit(1);
}
text[lSize + 1] = '\0';
printf("%s has been copied.\n", fileName);
rewind(fp);
printf("%d ", lineNumber);
for (j = 0; (j = getchar()) != '\0'; j++)
{
my_getline(text);
printf("%d %s\n", j+1, text);
}
printf("Enter the pattern you would like to search for: ");
scanf("%s", pattern);
printf("\nYou have chosen to search for: %s\n", pattern);
fclose(fp);
free(text);
return(0);
}
void my_getline(char text[])
{
int i = 0;
while ((text[i] = getchar()) != NEWLINE)
++i;
text[i] = '\0';
}
Your function is causing a system hang because you're calling getchar(), which returns the next character from the standard input. Is this really what you want?
At this point, your program is expecting input from the user. Try typing in the console window and pressing Enter to see it come back from the "hang".
It is most likely causing an infinite loop because you are not checking whether you have reached EOF.
void my_getline(char text[])
{
int i = 0;
int c;
while ( (c = getchar()) != NEWLINE && c != EOF )
text[i++] = c;
text[i] = '\0';
}
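Both answers point at getchar() reading from standard input. If the goal is to walk through the lines already stored in text, a variant that reads from the buffer instead of the keyboard might look like the sketch below. The name buf_getline and its signature are hypothetical; they are not the my_getline() from the course.

```c
#include <stddef.h>

/* Hypothetical helper: copies the next line of src into line (at most
   cap - 1 characters plus the terminator) and returns a pointer to the start
   of the following line. Returns NULL when src is already exhausted. */
const char *buf_getline(const char *src, char *line, size_t cap)
{
    size_t i = 0;

    if (src == NULL || *src == '\0' || cap == 0)
        return NULL;

    while (*src != '\0' && *src != '\n') {
        if (i + 1 < cap)
            line[i++] = *src;   /* characters beyond the capacity are dropped */
        src++;
    }
    line[i] = '\0';

    if (*src == '\n')
        src++;                  /* step over the newline */
    return src;
}
```

A caller would start with a pointer p set to text and loop while (p = buf_getline(p, line, sizeof line)) is not NULL, printing a line number and line on each pass.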

counting the number of times a character appears in a file in a case insensitive manner using C language

The problem statement : a C program to count the number of times a character appears in the File. character is considered Case insensitive.
I have converted both the input character and the characters from the file to upper case so that no occurrence of the character goes uncounted. But when I execute this on an online editor, I get the result "wrong answer"; the editor isn't accepting this code. What is the mistake in this code?
#include<stdio.h>
#include<ctype.h>
#include<stdlib.h>
int main
{
FILE *fp;
char filename[20];
char character;
char compare;
int to_upper1;
int to_upper2;
int count=0;
printf("\nEnter the file name");
scanf("%s", filename);
fp = fopen(filename,"r");
if(fp == NULL)
{
exit(-1);
}
printf("\nEnter the character to be counted");
scanf("%c", &character);
to_upper1 = toupper(character);
while((compare = fgets(fp)) != EOF)
{
to_upper2 = toupper(compare);
if(to_upper1 == to_upper2)
count++;
}
printf("\nFile \'%s\' has %d instances of letter \'%c\'", filename, count, character);
return 0;
}
I found a few errors in your code and made a few small tweaks. The errors are: not consuming the whitespace before the character input, using fgets() instead of fgetc(), and using unnecessary escape characters before the ' symbols in the output text.
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int main(void) // added arg for compilability
{
    FILE *fp;
    char filename[20];
    char character;
    int compare; // correct type for fgetc and toupper
    int to_upper1;
    int to_upper2;
    int count=0;
    printf("Enter the file name: ");
    scanf("%19s", filename); // restrict length
    fp = fopen(filename,"r");
    if(fp == NULL)
    {
        printf ("Cannot open file '%s'\n", filename);
        exit(-1);
    }
    printf("\nEnter the character to be counted: ");
    scanf(" %c", &character); // filter out whitespace
    to_upper1 = toupper((unsigned char)character); // toupper needs an unsigned char value or EOF
    while((compare = fgetc(fp)) != EOF) // changed from fgets()
    {
        to_upper2 = toupper(compare);
        if(to_upper1 == to_upper2)
            count++;
    }
    fclose(fp); // remember to close file!
    printf("File '%s' has %d instances of letter '%c'", filename, count, character);
    return 0;
}
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
int main(void)
{
    FILE *fptr;
    int d=0;
    char c;
    int ch,ck; // int, so that EOF returned by fgetc() can be detected
    char b[100];
    printf("Enter the file name\n");
    scanf("%19s",b);
    fptr=fopen(b,"r");
    printf("Enter the character to be counted\n");
    scanf(" %c",&c);
    c=toupper((unsigned char)c);
    if(fptr==NULL)
    {
        exit(-1);
    }
    while((ck=fgetc(fptr))!=EOF)
    {
        ch=toupper(ck);
        if((unsigned char)c==ch||(unsigned char)c==ck)
            ++d;
    }
    fclose(fptr);
    printf("File '%s' has %d instances of letter '%c'.",b,d,c);
    return(0);
}
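The counting loop shared by the two programs above can also be isolated in a function, which makes it easier to test separately from the prompts. This is a sketch; the name count_char_ci is hypothetical and does not appear in either answer.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes in fp that match target, ignoring
   case. Reads from the current position to the end of the stream. */
long count_char_ci(FILE *fp, char target)
{
    int wanted = toupper((unsigned char)target);
    long count = 0;
    int ch;                          /* int, so EOF stays distinguishable */

    while ((ch = fgetc(fp)) != EOF) {
        if (toupper(ch) == wanted)   /* fgetc() already returns an unsigned char value */
            count++;
    }
    return count;
}
```

The cast to unsigned char matters only for characters outside the ASCII range, where a plain char may be negative on some platforms.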

Reading a UTF-16 CSV file by char

Currently I am trying to read a UTF-16 encoded CSV file char by char, and convert each char into ASCII so I can process it. I later plan to change my processed data back to UTF-16, but that is beside the point right now.
I know right off the bat I am doing this completely wrong, as I have never attempted anything like this before:
int main(void)
{
FILE *fp;
int ch;
if(!(fp = fopen("x.csv", "r"))) return 1;
while(ch != EOF)
{
ch = fgetc(fp);
ch = (wchar_t) ch;
ch = (char) ch;
printf("%c", ch);
}
fclose(fp);
return 0;
}
Wishfully, I was hoping that this would work by magic for some reason, but that was not the case. How can I read a UTF-16 CSV file and convert it to ASCII? My guess is that since each UTF-16 char is two bytes (I think?), I'm going to have to read two bytes at a time from the file into a variable of some datatype which I am not sure of. Then I guess I will have to check the bits of this variable to make sure it is valid ASCII and convert it from there? I don't know how I would do this, though, and any help would be great.
You should use fgetwc. The code below should work if the file starts with a byte-order mark and a locale named en_US.UTF-16 is available. Many systems do not provide such a locale, so check the return value of setlocale.
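#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void) {
    /* Assumes a locale named en_US.UTF-16 is installed; setlocale returns NULL otherwise. */
    if (setlocale(LC_ALL, "en_US.UTF-16") == NULL) {
        fputs("locale en_US.UTF-16 is not available\n", stderr);
        return 1;
    }
    FILE *fp = fopen("x.csv", "rb");
    if (fp) {
        /* Read the two BOM bytes: FE FF marks big-endian. */
        int first = fgetc(fp);
        int second = fgetc(fp);
        int order = (first == 0xFE && second == 0xFF);
        wint_t ch;
        /* Note: mixing byte reads (fgetc) and wide reads (fgetwc) on one stream is not portable. */
        while ((ch = fgetwc(fp)) != WEOF) {
            putchar(order ? ch >> 8 : ch);
        }
        putchar('\n');
        fclose(fp);
        return 0;
    } else {
        perror("opening x.csv");
        return 1;
    }
}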
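If no UTF-16 locale is available, the two-bytes-at-a-time approach guessed at in the question can be done by hand. The sketch below is one possible way, with several assumptions: the names read_utf16_unit and utf16_to_ascii are hypothetical, the input is assumed to be little-endian when it has no byte-order mark, and every code unit outside the ASCII range (including each half of a surrogate pair) is replaced with '?'.

```c
#include <stdio.h>

/* Hypothetical helper: reads one UTF-16 code unit from fp using the given
   byte order. Returns the unit (0..0xFFFF), or -1 at end of file, which
   includes a truncated trailing byte. */
long read_utf16_unit(FILE *fp, int big_endian)
{
    int b0 = fgetc(fp);
    int b1 = fgetc(fp);
    if (b0 == EOF || b1 == EOF)
        return -1;
    return big_endian ? (((long)b0 << 8) | b1) : (((long)b1 << 8) | b0);
}

/* Hypothetical helper: copies the ASCII-range characters of a UTF-16 stream
   to out, writing '?' for everything else. Returns the number of characters
   written. The input stream should be opened with "rb". */
size_t utf16_to_ascii(FILE *in, FILE *out)
{
    int big_endian = 0;          /* assumption: little-endian when no BOM */
    size_t written = 0;
    long u = read_utf16_unit(in, big_endian);

    if (u == 0xFFFE) {           /* BOM seen byte-swapped: file is big-endian */
        big_endian = 1;
        u = read_utf16_unit(in, big_endian);
    } else if (u == 0xFEFF) {    /* BOM in the assumed order: skip it */
        u = read_utf16_unit(in, big_endian);
    }

    for (; u != -1; u = read_utf16_unit(in, big_endian)) {
        fputc(u < 0x80 ? (int)u : '?', out);
        written++;
    }
    return written;
}
```

Calling utf16_to_ascii(fp, stdout) on a file opened with fopen("x.csv", "rb") prints the ASCII view of the file without depending on any locale.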
This is my solution, thanks to the comments under my original question. Since every character in the CSV file is valid ASCII, the solution was as simple as this:
#include <stdio.h>
int main(void)
{
    FILE *fp;
    int ch, i = 1;
    if(!(fp = fopen("x.csv", "rb"))) return 1;
    while((ch = fgetc(fp)) != EOF)
    {
        if(i % 2) //ch is valid ascii (first byte of each pair; assumes little-endian data without a BOM)
            putchar(ch);
        i++;
    }
    fclose(fp);
    return 0;
}
