A user of my program has reported problems reading a settings file written by my program. I looked at the settings file in question and instead of decimal points using the period "." it uses commas ",".
I'm assuming this is to do with locales?
The file i/o is using fprintf and mpfr_out_str for file output and getline combined with atol, atof, mpfr_set_str, etc for file input.
What do I do here? Should I force my program to always use periods even if the machine's locale wants to use commas? If so, where do I start?
Edit: I've just noticed that this problem occurs when specifying the settings file to use on the command line instead of loading it via the GUI - would this indicate a problem on the OP's machine or in my code?
Do you call setlocale at all? If not, I would suggest either embedding the locale used to generate the file in the settings file, or forcing all settings file I/O to use the C locale via the previously suggested setlocale(LC_ALL, "C").
One other option is to use the locale specific formatting functions (suffixed with _l in MSVC) and create the C locale explicitly, via _create_locale(LC_ALL, "C").
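As a rough sketch of the "force the C locale for settings I/O" idea, the helpers below switch LC_NUMERIC to "C" only around the formatting or parsing call and then restore the previous locale. The names format_setting, parse_setting, push_c_numeric and pop_numeric are made up for illustration, and the sketch assumes a single-threaded program, because setlocale changes process-wide state.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: remember the current LC_NUMERIC locale and switch
   to "C". The returned copy (may be NULL) must go to pop_numeric(). */
static char *push_c_numeric(void)
{
    /* Passing NULL only queries the locale; copy the result because a
       later setlocale call may overwrite the returned string. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }
    setlocale(LC_NUMERIC, "C");
    return saved;
}

/* Hypothetical helper: restore the locale saved by push_c_numeric(). */
static void pop_numeric(char *saved)
{
    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
}

/* Formats a value with a period as decimal point, whatever the locale. */
int format_setting(char *buf, size_t size, double value)
{
    char *saved = push_c_numeric();
    int n = snprintf(buf, size, "%.6f", value);
    pop_numeric(saved);
    return n;
}

/* Parses a value that was written with a period as decimal point. */
double parse_setting(const char *text)
{
    char *saved = push_c_numeric();
    double value = strtod(text, NULL);
    pop_numeric(saved);
    return value;
}
```

The same push/pop pair could be wrapped around the fprintf and atof calls mentioned in the question; whether that is enough for the mpfr calls depends on how the library picks its decimal point, which this sketch does not verify.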
I'm using a system function which writes the output information into a stream of file pointer.
func(FILE *__fp)
I need to use this information in my program rather than printing this out to a file. For that I thought of creating a tmpfile() and writing to it then reading back from it. But is there a better way to get this information?
There are OS-specific solutions for writing to a memory buffer instead of a file, for example the POSIX fmemopen or open_memstream (both of which should be very useful considering your linux tag).
You can also change the internal buffer to your own with setvbuf.
On an unrelated note: Symbols starting with a leading underscore and followed by another underscore (like for example your __fp argument) are reserved. Such symbols may only be used by "the implementation", i.e. the compiler and library.
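A minimal sketch of the open_memstream approach mentioned above, assuming a POSIX.1-2008 system. The body of func here is only a stand-in for the real system function, and capture_func_output is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the system function from the question (made-up body). */
static void func(FILE *fp)
{
    fprintf(fp, "value=%d\n", 42);
}

/* Runs func() against an in-memory stream and returns the captured text,
   NUL-terminated. The caller must free() the result. Returns NULL on
   failure. If len_out is not NULL it receives the number of bytes. */
char *capture_func_output(size_t *len_out)
{
    char *buf = NULL;
    size_t len = 0;

    FILE *mem = open_memstream(&buf, &len);
    if (mem == NULL)
        return NULL;

    func(mem);

    /* buf and len are only guaranteed to be up to date after fflush()
       or fclose() on the stream. */
    if (fclose(mem) != 0) {
        free(buf);
        return NULL;
    }
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

Compared with tmpfile(), nothing touches the filesystem, and the buffer grows automatically as func writes to it.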
I want to learn how to open a file with fopen() when its location is not fixed. For example, it may be a system file that lives in a different place on another computer. If I hard-code the location in my code, my program will not work on that computer. How can I set the location dynamically so that my program finds the file wherever it is?
You can (and often should) pass program arguments to your main, thru the conventional int argc, char**argv formal arguments of your main. See also this.
(I am focusing on Linux, but you could adapt my answer to other OSes and platforms)
So you would use some convention to pass that file path (not a location, that word usually refers to memory addresses) to your program (often thru the command line starting your program). See also this answer.
You could use (at least on Linux) getopt_long(3) to parse program arguments. But there are other ways, and you can process the arguments of main explicitly.
You could also use some environment variable to pass that information. You'll query it with getenv(3). Read also environ(7).
Many programs have configuration files (whose path is wired into the program but often can be given by program arguments or by environment variables) and are parsing them to find relevant file paths.
And you could even consider some other inter-process communication to pass a file path to your program. After all, a file path is just some string (with restrictions and interpretations explained in path_resolution(7)). There are many ways to pass some data to a program.
Read also about globbing, notably glob(7). On Unix, the shell is expanding the program arguments. You may want to use functions like glob(3) or wordexp(3) on something obtained elsewhere (e.g. in some configuration file) to get similar expansion.
BTW, be sure, when using fopen, to check for failure. You'll probably use perror like here.
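To tie the suggestions above together, here is a small sketch that takes the file path from the first program argument, falls back to an environment variable, then to a built-in default, and reports fopen failures with perror. The variable name MYAPP_CONFIG, the default path, and both function names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical default location, used when nothing else is given. */
#define DEFAULT_CONFIG_PATH "/etc/myapp.conf"

/* Picks the path: first program argument, then the (made-up)
   MYAPP_CONFIG environment variable, then the built-in default. */
const char *choose_config_path(int argc, char **argv)
{
    if (argc > 1 && argv[1][0] != '\0')
        return argv[1];

    const char *env = getenv("MYAPP_CONFIG");
    if (env != NULL && env[0] != '\0')
        return env;

    return DEFAULT_CONFIG_PATH;
}

/* Opens the file for reading; on failure prints the path and the
   reason to stderr and returns NULL. */
FILE *open_config(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        perror(path);
    return fp;
}
```

From main you would call open_config(choose_config_path(argc, argv)) and exit with EXIT_FAILURE when it returns NULL. For anything beyond a single positional argument, getopt_long is probably the better tool.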
Look also into the source code of several free software projects (perhaps on github) for inspiration.
I would suggest using an environment variable. On each PC, set the file location in an environment variable, then read that variable in your program and open the file. This idea works on both Linux and Windows, although you may have to adapt the code that reads the environment variable to the OS.
Besides specifying file location at runtime through command line arguments, environment variables or configuration files, you can implement a PATH-like logic:
Possible locations for your file are set in an environment variable:
export MY_FILE_PATH=/usr/bin:/bin:/opt/bin:$HOME/bin
Your program reads that environment variable, parses its contents and checks existence of file in each specified path, with fopen() return status.
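A possible sketch of that PATH-like lookup, assuming a colon-separated list as in the export line above. The function name open_on_search_path is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Tries each directory in the colon-separated list `dirs` and returns the
   first `dir/name` that fopen() can open for reading, or NULL if none
   works. On success the full path is left in `found`; on failure `found`
   is set to the empty string. `dirs` may be NULL. */
FILE *open_on_search_path(const char *dirs, const char *name,
                          char *found, size_t found_size)
{
    const char *p = dirs;

    while (p != NULL && *p != '\0') {
        size_t len = strcspn(p, ":");   /* length of this directory entry */

        if (len > 0) {
            int n = snprintf(found, found_size, "%.*s/%s",
                             (int)len, p, name);
            /* Skip candidates that did not fit into the buffer. */
            if (n > 0 && (size_t)n < found_size) {
                FILE *fp = fopen(found, "r");
                if (fp != NULL)
                    return fp;
            }
        }

        p += len;
        if (*p == ':')
            p++;                        /* step over the separator */
    }

    if (found_size > 0)
        found[0] = '\0';
    return NULL;
}
```

A call could look like open_on_search_path(getenv("MY_FILE_PATH"), "settings.conf", buf, sizeof buf), where settings.conf is just an example file name. Note that the shell expands $HOME in the export line before the program ever sees the value.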
I am using setlocale(LC_ALL,"Portuguese") so my program can read Brazilian Portuguese accented words like "joão" from a text file and print them on screen, and it works fine for this purpose. But when I try to input a word like "joão" from the keyboard using gets() or scanf(), the string saved is different from the input. Any advice?
If you are expecting terminal input, it is rarely correct to use setlocale in any way other than
setlocale(LC_ALL, "");
That will set the program's locale to the environment's locale. Normally, the locale setting in the interactive environment corresponds to the configuration of the terminal, so it represents the expectation of the interactive user. Changing the program's locale has no effect on the terminal [Note 1], so if you do change it, it will simply mean that the program's locale no longer corresponds to the user's expectations.
It would be correct to setlocale for file input if you provide some mechanism to specify the environment for the file [Note 2]. In Unix, however, the simplest way for the user to specify that is on the command-line:
LC_ALL=pt_BR.utf8 ./my_command the_portuguese_file.utf8
For Windows, you may want to provide a different mechanism to communicate the file's locale to the program. But in the absence of such a declaration, using the locale configured in the environment will usually be the correct option.
The one exception to the above is programs which prefer to be locale-unaware, which may wish to set the locale to "C" (or "POSIX", but "C" does not require a Posix-compatible setlocale). That can be useful to do as a form of self-documentation, but it is not necessary because a program which does not call setlocale at all will be executed in the "C" locale (on most operating systems).
Notes
1. In most cases, changing the environment's locale by modifying the value of the environment variable LC_ALL also has no effect on the terminal configuration. Indeed, the terminal may not even be part of the environment; for example, if you have a remote ssh/telnet session, or the GUI equivalent. A user should first configure their terminal according to their expectations, and then configure their environment to correspond; they will expect utility programs they run to respect the environment setting.
2. Aside from the strings "C", "POSIX" and "", there are no standards which will let you even know what possible locale names are, which is yet another reason not to try to set the locale except when the user has asked you to.
I have some questions, but I can't find a straight answer anywhere.
So, basically, I know what a locale is and I know how to use (set) it, but what I don't know is
how it works behind the scenes, and I would very much like to know.
So, when I use I/O functions, for example scanf to read a float, and the country's convention (decimal point or decimal comma) matters (I am actually from a decimal comma country :)),
does the scanf function "look" at the current locale?
And if I don't set it in my code, does the program create some standard locale by default, OR does it get it from the OS?
For example, in the part of the code where the handles to the console for stdout, stderr and stdin are obtained?
By default your program will have the C locale.
When you run setlocale(LC_ALL,""); you will set the locale from the outside environment (or you can set just parts LC_*).
By calling setlocale(LC_ALL,"specific_locale"); you will set the specific locale.
All I/O functions should follow the current locale (standard C I/O functions).
The behind-the-scenes behaviour depends on the operating system and compiler you are using.
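One way to see this for yourself is to ask the C library which decimal point it currently uses, via localeconv(), before and after adopting the environment's locale. This is only a small sketch; the function names are made up.

```c
#include <locale.h>
#include <stdio.h>

/* Returns the decimal-point string that the locale-aware standard
   functions (printf, scanf, strtod, ...) use in the current locale. */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/* Prints the numeric locale and decimal point, first as the program
   finds them, then after adopting the environment's locale. */
void show_locale_effect(void)
{
    printf("before: LC_NUMERIC=%s decimal point='%s'\n",
           setlocale(LC_NUMERIC, NULL), current_decimal_point());

    /* "" means: take the locale from the environment (LANG, LC_*). */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "environment locale is not available\n");

    printf("after:  LC_NUMERIC=%s decimal point='%s'\n",
           setlocale(LC_NUMERIC, NULL), current_decimal_point());
}
```

If show_locale_effect is the first thing main calls, the first line should report the "C" locale with a period. What the second line shows depends entirely on the environment the program is started in.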
When we invoke a system call in Linux like 'open', or a stdio function like 'fopen', we must provide a 'const char * filename'. My question is: what encoding is used here? Is it UTF-8, ASCII, or ISO 8859-x? Does it depend on the system or environment settings?
I know that in MS Windows there is a _wopen which accepts UTF-16.
It's a byte string, the interpretation is up to the particular filesystem.
Filesystem calls on Linux are encoding-agnostic, i.e. they do not (need to) know about the particular encoding. As far as they are concerned, the byte-string pointed to by the filename argument is passed down to the filesystem as-is. The filesystem expects that filenames are in the correct encoding (usually UTF-8, as mentioned by Matthew Talbert).
This means that you often don't need to do anything (filenames are treated as opaque byte-strings), but it really depends on where you receive the filename from, and whether you need to manipulate the filename in any way.
It depends on the system locale. Look at the output of the "locale" command. If the variables end in UTF-8, then your locale is UTF-8. Most modern linuxes will be using UTF-8. Although Andrew is correct that technically it's just a byte string, if you don't match the system locale some programs may not work correctly and it will be impossible to get correct user input, etc. It's best to stick with UTF-8.
The filename is the byte string; regardless of locale or any other conventions you're using about how filenames should be encoded, the string you must pass to fopen and to all functions taking filenames/pathnames is the exact byte string for how the file is named. For example if you have a file named ö.txt in UTF-8 in NFC, and your locale is UTF-8 encoded and uses NFC, you can just write the name as ö.txt and pass that to fopen. If your locale is Latin-1 based, though, you can't pass the Latin-1 form of ö.txt ("\xf6.txt") to fopen and expect it to succeed; that's a different byte string and thus a different filename. You would need to pass "\xc3\xb6.txt" ("ö.txt" if you interpret that as Latin-1), the same byte string as the actual name.
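The ö.txt example can be made concrete by spelling the two names out as explicit bytes. This is only an illustration; the identifiers are made up.

```c
#include <stdio.h>
#include <string.h>

/* "ö.txt" written as explicit bytes in two encodings. */
const char NAME_UTF8[]   = "\xc3\xb6.txt";  /* UTF-8, NFC    */
const char NAME_LATIN1[] = "\xf6.txt";      /* ISO 8859-1    */

/* Two names refer to the same directory entry only if their bytes are
   identical; this is a plain byte comparison, no locale involved. */
int same_file_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Writes the bytes of `name` into `out` as space-separated hex. */
void hex_bytes(const char *name, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        int n = snprintf(out + used, out_size - used,
                         used ? " %02x" : "%02x", *p);
        if (n < 0 || (size_t)n >= out_size - used)
            break;                      /* buffer full: stop early */
        used += (size_t)n;
    }
}
```

Here hex_bytes gives "c3 b6 2e 74 78 74" for the UTF-8 name and "f6 2e 74 78 74" for the Latin-1 name, so fopen(NAME_LATIN1, "r") cannot find a file that was created as NAME_UTF8.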
This situation is very different from Windows, which you seem to be familiar with, where the filename is a sequence of 16-bit units interpreted as UTF-16 (although AFAIK they need not actually be valid UTF-16) and filenames passed to fopen, etc. are interpreted according to the current locale as Unicode characters which are then used to open/access the file based on its UTF-16 name.
As already mentioned above, this will be a byte string and the interpretation is up to the underlying system. More specifically, imagine two C functions, one in user space and one in kernel space, which take char * as their parameter. The encoding in user space will depend upon the execution character set of the user program (e.g. specified by -fexec-charset=charset in gcc). The encoding expected by the kernel function depends upon the execution charset used during kernel compilation (not sure where to get that information).
I did some further inquiries on this topic and came to the conclusion that there are two different ways how filename encoding can be handled by unixoid file systems.
1. File names are encoded in the "system locale", which usually is, but need not be, the same as the current environment locale reflected by the locale command (it may instead be some preset in a global configuration file).
2. File names are encoded in UTF-8, independent of any locale settings.
GTK+ solves this mess by assuming UTF-8 and allowing it to be overridden by either the current locale encoding or a user-supplied encoding.
Qt solves it by assuming the locale encoding (and that the system locale is reflected in the current locale) and allowing it to be overridden with a user-supplied conversion function.
So the bottom line is: Use either UTF-8 or what LC_ALL or LANG tell you by default, and provide an override setting at least for the other alternative.
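One way this bottom line might look in code, assuming a POSIX system: take a user-supplied override if there is one, otherwise the codeset of the current locale, otherwise UTF-8. Both function names are hypothetical, and where the override comes from (a setting, an environment variable) is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Pure decision logic: the user override wins, then the locale's
   codeset, then UTF-8 as the last resort. */
const char *pick_filename_encoding(const char *user_override,
                                   const char *locale_codeset)
{
    if (user_override != NULL && user_override[0] != '\0')
        return user_override;
    if (locale_codeset != NULL && locale_codeset[0] != '\0')
        return locale_codeset;
    return "UTF-8";
}

/* Convenience wrapper that queries the environment's locale. Note that
   it changes the program's LC_CTYPE locale as a side effect. */
const char *filename_encoding(const char *user_override)
{
    setlocale(LC_CTYPE, "");
    return pick_filename_encoding(user_override, nl_langinfo(CODESET));
}
```

The returned name would then be handed to whatever conversion layer the program uses (iconv, for instance) when file names have to be displayed or typed in by the user.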