Issue with file names and content that includes foreign characters - c

I'm trying to write a program that fetches the name and contents of a text file (UTF-8).
I use
system("chcp 65001 > nul");
to be able to properly read foreign characters in the text and copy them to a string one character at a time. The characters get properly copied to the string.
I also want to get the name of the file which includes Turkish characters such as İ, Ö, Ç, Ü etc.
For this I use strcpy with dir->d_name
The issue is that when I print them to the terminal, the text contents are fine but the file name is missing the aforementioned foreign characters.
"TRKE.txt" instead of "TÜRKÇE.txt"
I must be able to accurately compare the file contents with file name using strcmp but it's not possible since the name is missing characters.
Using
system("chcp 1254 > nul");
instead makes it possible to get the file name correctly but then the contents of the file are not represented correctly. Alternating between the two lines doesn't work since I need both to be working at the same time to use strcmp().
setlocale(LC_ALL, "Turkish");
doesn't fix it either. What do I do?

Related

Is it possible to prevent adding BOM to output UTF-8 file? (Visual Studio 2005)

I need some help.
I'm writing a program that opens 2 source files in UTF-8 encoding without a BOM. The first contains English text and some other information, including an ID. The second contains only the string ID and the translation. The program changes every string from the first file by replacing the English text with the Russian translation from the second one, and writes these strings to an output file. Everything seems to work, but a BOM appears in the destination file. I want to create the file without a BOM, like the sources.
I open files with fopen function in text mode with ccs=UTF-8
read string with fgetws function to wchar_t buffer
and write with fputws function to output file
Don't use text mode, don't use the MS ccs= extension to fopen, and don't use fputws. Instead use fopen in binary mode and write the correct UTF-8 yourself.
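A minimal sketch of that advice, assuming the translated text is already held in memory as UTF-8 bytes in a char buffer (the question reads into wchar_t, so a conversion step would be needed first and is not shown). The function names write_utf8_no_bom and starts_with_bom are hypothetical helpers, not part of any library:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a UTF-8 string to path byte for byte.
   Binary mode means no BOM is added and no newline translation happens.
   Returns 0 on success, -1 on failure. */
int write_utf8_no_bom(const char *path, const char *utf8_text)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    size_t len = strlen(utf8_text);
    size_t written = fwrite(utf8_text, 1, len, fp);

    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: return 1 if the file starts with the UTF-8 BOM
   (bytes EF BB BF), 0 if it does not, -1 if the file cannot be opened. */
int starts_with_bom(const char *path)
{
    unsigned char head[3];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(head, 1, sizeof head, fp);
    fclose(fp);

    return n == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
}
```

Because the stream is opened with "wb", the C runtime only copies the bytes it is given, so the output contains a BOM only if the buffer itself starts with one.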

Foreign language characters replaced by "?"

I am working on a program which takes file/folder names as input. Currently, when I try to run it on a file that has foreign-language characters in its name, each of those characters is replaced by a ?. I am running my exe from the command prompt, so trying to process that particular file results in an error. When I use DIR at the command prompt, it displays a ? for each character of the file name. Is there any way to display the actual foreign-language characters in the command prompt? I believe that could be what is causing my exe not to work on any of those files.
This is the text that I am trying to read - 科普書籍推展教案 which is being replaced by ? on the console.
The command prompt can only display characters in your current ACP. So, if you have files with names outside the ACP, you're going to see ?. You can use chcp to pick a different CP, but there is no code page for full Unicode in the DOS box.
Inside your code, you need to learn to use the 'W' API to work with full Unicode pathnames. The safest thing is to just #define UNICODE and _UNICODE and use the wide-character API uniformly.

How Can I Read Specific Parts of a File in C?

I wish to read and write to a .csv file using this format: ~83474\t>wed 19 march 2014\n
When reading, I need to ignore the ~, the tab and the >. They are just there to remind my program what the values that follow are used for. So far I have figured out how to write to the file using that format, but I do not know how to read from it. I wish to store the number after the ~ as an integer value and the characters after the > as a string. How can I read those two values from every line in the file if each line has the format stated above?
Read the whole line as a string using fgets and process it.
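One way this could look, as a sketch that assumes every line follows the ~number, tab, >text layout from the question. The names parse_record and read_records and the buffer sizes are illustrative assumptions, not anything from the original post:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one line of the form "~<number>\t><text>\n".
   Stores the number in *id and the text (without the line ending) in text.
   Returns 1 on success, 0 if the line does not match the format. */
int parse_record(const char *line, long *id, char *text, size_t text_size)
{
    if (line[0] != '~' || text_size == 0)
        return 0;

    char *end;
    long value = strtol(line + 1, &end, 10);
    if (end == line + 1)                    /* no digits after '~' */
        return 0;
    if (end[0] != '\t' || end[1] != '>')    /* separator must be tab then '>' */
        return 0;

    const char *start = end + 2;
    size_t len = strcspn(start, "\r\n");    /* stop before the line ending */
    if (len >= text_size)
        len = text_size - 1;                /* truncate to fit the buffer */
    memcpy(text, start, len);
    text[len] = '\0';

    *id = value;
    return 1;
}

/* Hypothetical helper: read every line with fgets, parse it, and print the
   two fields. Returns the number of lines that matched the format. */
int read_records(FILE *fp)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        long id;
        char text[128];
        if (parse_record(line, &id, text, sizeof text)) {
            printf("%ld -> %s\n", id, text);
            count++;
        }
    }
    return count;
}
```

Reading the whole line first and parsing it afterwards means a malformed line is skipped without disturbing the lines that follow it.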

Putting spaces in file directory string

I am new to file I/O, and I am writing a program in C to read a file that I already created. The examples in the book I have do not use literals with spaces. I was wondering if:
#define kErrorLog "/Dropbox/Dev/Learn%20C%20on%20Mac/Error%20Log"
would give me the appropriate path that corresponds to user/dropbox/dev/Learn C on Mac/Error Log.
No, you should just use spaces:
#define kErrorLog "/Dropbox/Dev/Learn C on Mac/Error Log"
The %20 escape is interpreted by web servers. Filenames are just character strings.
No; the file name need not be URL-encoded like that. You can include spaces normally:
#define kErrorLog "/Dropbox/Dev/Learn C on Mac/Error Log"
In general, there is no need to escape file names in C. If you're putting a file name directly in your code, you might need to escape problematic characters inside a string literal (for example, backslashes), but once you have it in a string, no modifications need to be made to that string to use it as a file name.
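A small sketch of both points, assuming a relative file name so that it runs anywhere. The names kSpacedName, kWindowsStylePath and append_line are made up for illustration:

```c
#include <stdio.h>

/* Spaces go into the literal as-is; no %20 or other escaping is needed. */
#define kSpacedName "Error Log example.txt"

/* A backslash is the one character here that needs care: inside a string
   literal it must be doubled, and the resulting string holds a single
   backslash for each pair. */
#define kWindowsStylePath "C:\\Learn C\\Error Log"

/* Hypothetical helper: append one line of text to the file at path.
   The path is passed to fopen unchanged. Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *message)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    int ok = fputs(message, fp) != EOF && fputc('\n', fp) != EOF;
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A call such as append_line(kSpacedName, "disk full") creates or extends a file whose name contains the spaces literally.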

differences in size of two

1) I have a directory containing 4 files, namely 1.c, 2.c, 3.c and 4.c. I am reading the file names in this directory using the readdir call, which returns them in a structure variable named myStruct.
2) I have another open file, a.txt, which contains file names such as 1.c, 2.c, 3.c, 4.c, etc.
My intention is to compare the file names listed in a.txt with the files present in the directory (a name comparison is enough; I am not checking contents).
When I do the comparison, even though the names in the directory match those in a.txt, they do not compare equal, and when I printed the lengths they were unequal.
Can anyone please let me know any solution to this problem
thanks
maddy
When you read a line from the file, there is an extra newline character ('\n') at the end of it, so the comparison will show that the strings are unequal. After reading the line, trim off the \n and then try again.
EDIT
This discussion tells you about how to trim whitespaces in a string using C - Painless way to trim leading/trailing whitespace in C?
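The trimming step can be sketched as follows, assuming the lines are read with fgets and that each line of the list file holds exactly one name. The helper names strip_line_ending and name_in_list are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n', removing
   the line ending that fgets leaves in the buffer. Works in place. */
void strip_line_ending(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Hypothetical helper: return 1 if name appears as a line of the list file,
   0 if it does not, -1 if the file cannot be opened. */
int name_in_list(const char *list_path, const char *name)
{
    char line[256];
    FILE *fp = fopen(list_path, "r");
    if (fp == NULL)
        return -1;

    int found = 0;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        strip_line_ending(line);       /* "1.c\n" becomes "1.c" */
        if (strcmp(line, name) == 0)
            found = 1;
    }

    fclose(fp);
    return found;
}
```

Each d_name value returned by readdir could then be passed as name. Without the trimming step, the line "1.c\n" has length 4 while the directory entry "1.c" has length 3, which matches the unequal lengths described in the question.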

Resources