I am working on a personal project in which I need to obtain full locale formatting information from a C locale.
I cannot simply use localeconv or localeconv_l, since lconv does not provide all the formatting information I need. On *NIX this is solved by the nl_langinfo and nl_langinfo_l functions; however, they are not present on Windows.
What ways are there to obtain locale formatting information on Windows?
Start with: GetUserDefaultUILanguage
Similar and related APIs include:
GetUserDefaultLocaleName
GetUserDefaultLCID
GetUserDefaultLangID
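For instance, GetUserDefaultLocaleName can be paired with GetLocaleInfoEx to query individual formatting fields, much as nl_langinfo does. A minimal sketch, assuming you just want the decimal separator; the decimal_sep helper and its POSIX fallback branch are illustrative, not part of any API:

```c
#include <stdio.h>
#include <locale.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

/* Hypothetical helper: fetch the user's decimal separator.
 * On Windows it pairs GetUserDefaultLocaleName with GetLocaleInfoEx;
 * elsewhere it falls back to nl_langinfo for comparison. */
static const char *decimal_sep(void)
{
    static char buf[8] = "";
#ifdef _WIN32
    wchar_t name[LOCALE_NAME_MAX_LENGTH];
    wchar_t sep[8];
    if (GetUserDefaultLocaleName(name, LOCALE_NAME_MAX_LENGTH) &&
        GetLocaleInfoEx(name, LOCALE_SDECIMAL, sep, 8))
        snprintf(buf, sizeof buf, "%ls", sep);
#else
    setlocale(LC_ALL, "");                          /* honor the user's locale */
    snprintf(buf, sizeof buf, "%s", nl_langinfo(RADIXCHAR));
#endif
    return buf;
}
```

Other LCTYPE values (LOCALE_STHOUSAND, LOCALE_SSHORTDATE, and so on) cover most of what nl_langinfo items provide.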
The Perl 5 open-source C code contains an emulation of nl_langinfo() for Windows and other platforms that lack it. You can steal the code, though it is complicated by trying to work on a bunch of different platforms with a bunch of different configurations.
A few fields aren't implemented, such as the Japanese emperor era names, but anything in common use is available.
Start with this file: https://github.com/Perl/perl5/blob/blead/locale.c
The code continues to evolve.
I apologise if this is a silly question. I recently developed an application on Windows with C and the Win32 API. I need to check whether the application is Unicode compatible or not. How can I test this on my machine? Is there a standard procedure for checking Unicode compatibility? Moreover, I don't have a Chinese-language machine, or one for any other language; I want to run this test on my own machine, whose default language is English.
Please provide some links if possible or a detailed procedure.
Great question. On the Windows platform this is indeed challenging, because many different encodings and code pages are supported, and one can mix between them.
What I usually do is test the application on input that mixes two non-ASCII scripts, such as a filename containing both Russian and Hebrew letters, and verify that the application can open the file, etc. You can copy this: "שלום привет hello" and see how the application handles that kind of input.
Because two scripts are involved, no single ANSI code page can represent the string, so this exposes the most common kind of encoding bug.
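As a concrete variant of that test, here is a POSIX-flavoured sketch; the function name and filename are my own. On Windows you would go through the wide-character APIs (_wfopen/CreateFileW) with a UTF-16 name instead, since the narrow fopen takes ANSI-code-page names there:

```c
#include <stdio.h>
#include <string.h>

/* Sketch: create and reopen a file whose name mixes Hebrew, Cyrillic,
 * and Latin letters. On POSIX, filenames are byte strings, so a UTF-8
 * encoded name works with plain fopen. Returns 1 on success. */
int mixed_name_roundtrip(void)
{
    const char *name = "שלום_привет_hello.txt";
    FILE *f = fopen(name, "w");
    if (!f) return 0;
    fputs("ok\n", f);
    fclose(f);

    f = fopen(name, "r");            /* reopen by the same mixed-script name */
    if (!f) return 0;
    char buf[8] = {0};
    fgets(buf, sizeof buf, f);
    fclose(f);
    remove(name);                    /* clean up the test file */
    return strcmp(buf, "ok\n") == 0;
}
```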
Do all popular iconv implementations support conversion from UTF-16BE (i.e. UTF-16 with big-endian byte order)? GNU iconv supports this encoding, but what about the other implementations in common use? Specifically, what do mingw and the *BSDs support?
Should I rather do this conversion myself?
If it's a big deal for you, there's an easy way out: write an autoconf test for UTF-16BE support, and make the configure script fail with an error message if it's not present.
Then you can take your time to sift through the standards, or, just forget about the whole issue.
Since libiconv is LGPL and supports UTF-16BE (see its website), you can always point users to it. Some projects bundle libiconv rather than rely on platform implementations.
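A quick runtime probe for the same thing (an autoconf check would do essentially this at configure time; the function name is my own):

```c
#include <iconv.h>

/* Sketch: test whether the platform iconv can convert UTF-16BE to
 * UTF-8, by opening a descriptor and converting one known code point. */
int utf16be_supported(void)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1)
        return 0;                        /* encoding pair not supported */

    char in[2] = { 0x00, 0x41 };         /* U+0041 'A' in big-endian order */
    char out[8] = { 0 };
    char *inp = in, *outp = out;
    size_t inleft = sizeof in, outleft = sizeof out;

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return r != (size_t)-1 && out[0] == 'A';
}
```

(On platforms where iconv lives in libiconv rather than libc, you may need to link with -liconv.)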
So, I’m working on a plain-C (ANSI 9899:1999) project, and am trying to figure out where to get started re: Unicode, UTF-8, and all that jazz.
Specifically, it’s a language interpreter project, and I have two primary places where I’ll need to handle Unicode: reading in source files (the language ostensibly supports Unicode identifiers and such), and in ‘string’ objects.
I’m familiar with all the obvious basics about Unicode, UTF-7/8/16/32 & UCS-2/4, so on and so forth… I’m mostly looking for useful, C-specific (that is, please no C++ or C#, which is all that’s been documented here on SO previously) resources as to my ‘next steps’ to implement Unicode-friendly stuff… in C.
Any links, manpages, Wikipedia articles, example code, is all extremely welcome. I’ll also try to maintain a list of such resources here in the original question, for anybody who happens across it later.
A must read before considering anything else, if you’re unfamiliar with Unicode, and what an encoding actually is: http://www.joelonsoftware.com/articles/Unicode.html
The UTF-8 home-page: http://www.utf-8.com/
man 3 iconv (as well as iconv_open and iconvctl)
International Components for Unicode (via Geoff Reedy)
libbasekit, which seems to include light Unicode-handling tools
Glib has some Unicode functions
A basic UTF-8 detector function, by Christoph
International Components for Unicode provides a portable C library for handling Unicode. Here's their elevator pitch for ICU4C:
The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code can not make use of them. The ICU4C libraries fill in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).
GLib has some Unicode functions and is a pretty lightweight library. It's not near the same level of functionality that ICU provides, but it might be good enough for some applications. The other features of GLib are good to have for portable C programs too.
GTK+ is built on top of GLib. GLib provides the fundamental algorithmic language constructs commonly duplicated in applications. This library has features such as (this list is not a comprehensive list):
Object and type system
Main loop
Dynamic loading of modules (i.e. plug-ins)
Thread support
Timer support
Memory allocator
Threaded Queues (synchronous and asynchronous)
Lists (singly linked, doubly linked, double ended)
Hash tables
Arrays
Trees (N-ary and binary balanced)
String utilities and charset handling
Lexical scanner and XML parser
Base64 (encoding & decoding)
I think one of the interesting questions is: what should your canonical internal format for strings be? The two obvious choices (to me, at least) are:
a) UTF-8 in vanilla C strings
b) UTF-16 in unsigned short arrays
In previous projects I have always chosen UTF-8. Why? Because it's the path of least resistance in the C world. Everything you are interfacing with (stdio, string.h, etc.) will work fine.
Next comes the question of file format. The problem here is that it's visible to your users (unless you provide the only editor for your language). Here I guess you have to take what they give you and try to guess by peeking (byte order marks help).
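Peeking at the byte order mark can be as simple as comparing the first few bytes. A sketch, with an illustrative function name; note that UTF-32LE must be tested before UTF-16LE, since their BOMs share a two-byte prefix:

```c
#include <stddef.h>

/* Sketch: guess a source file's encoding from its byte order mark.
 * Returns a static string naming the encoding; defaults to UTF-8 when
 * no BOM is present (the common case). */
const char *guess_encoding(const unsigned char *buf, size_t n)
{
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (n >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0 && buf[3] == 0)
        return "UTF-32LE";                 /* must precede the UTF-16LE check */
    if (n >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF)
        return "UTF-32BE";
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return "UTF-8";                        /* no BOM: assume UTF-8 */
}
```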
Is there any difference between C written on Windows and C written on Unix?
I teach C as well as C++, but some of my students have come back saying some of the sample programs do not run for them on Unix. Unix is alien to me; unfortunately I have no experience with it whatsoever, and all I know is how to spell it. If there are any differences, I should advise our department to invest in Unix systems, as currently there are none in our lab. I do not want my students to feel they have been denied or kept away from something.
That kind of problem usually appears when you don't stick to the bare C standard and make assumptions about the environment that may not be true. These may include reliance on:
nonstandard, platform specific includes (<conio.h>, <windows.h>, <unistd.h>, ...);
undefined behavior (fflush(stdin), as someone else reported, is not required to do anything by the standard - it's actually undefined behavior to invoke fflush on anything but output streams; in general, older compilers were more lenient about violation of some subtle rules such as strict aliasing, so be careful with "clever" pointer tricks);
data type size (the short=16 bit, int=long=32 bit assumption doesn't hold everywhere - 64 bit Linux, for example, has 64 bit long);
in particular, pointer size (void * isn't always 32 bit, and can't be always casted safely to an unsigned long); in general you should be careful with conversions and comparisons that involve pointers, and you should always use the provided types for that kind of tasks instead of "normal" ints (see in particular size_t, ptrdiff_t, uintptr_t)
data type "inner format" (the standard does not say that floats and doubles are in IEEE 754, although I've never seen platforms doing it differently);
nonstandard functions (__beginthread, the MS "safe string" functions; on the other hand, POSIX/GNU extensions);
compiler extensions (__inline, __declspec, #pragmas, ...) and in general anything that begins with double underscore (or even with a single underscore, in old, nonstandard implementations);
console escape codes (this usually is a problem when you try to run Unix code on Windows);
carriage return format: in normal strings it's \n everywhere, but when written on file it's \n on *NIX, \r\n on Windows, \r on pre-OSX Macs; the conversion is handled automagically by the file streams, so be careful to open files in binary when you actually want to write binary data, and leave them in text mode when you want to write text.
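To illustrate the type-size items above, <stdint.h> provides types that make those assumptions explicit instead of implicit. A small sketch; portable_sizes is just an illustrative name:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: avoid size assumptions by using fixed-width and
 * pointer-sized integer types instead of plain int/long. */
size_t portable_sizes(void)
{
    int32_t i = 42;                  /* exactly 32 bits on every platform */
    uintptr_t p = (uintptr_t)&i;     /* integer guaranteed to hold a pointer */
    (void)p;                         /* silence unused-variable warning */
    return sizeof(int32_t);          /* always 4, unlike sizeof(long) */
}
```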
Anyhow, an example of a program that does not compile on *NIX would be helpful; then we could give you more precise suggestions.
I have yet to get the details of the programs; the students were from our previous batch, and I have asked for them. Turbo C is what is currently being used.
As said in the comments, please drop Turbo C (and Turbo C++, if you use it); nowadays they are both pieces of history and have many incompatibilities with the current C and C++ standards (and, if I remember correctly, they both generate 16-bit executables, which won't even run on 64-bit OSes on x86_64).
There are a lot of free, working, standard-compliant alternatives (VC++ Express, MinGW, Pelles C, Cygwin on Windows; gcc/g++ is the de facto standard on Linux, rivaled by clang); you just have to pick one.
The language is the same, but the libraries used to get anything platform-specific done are different. But if you are teaching C (and not systems programming) you should easily be able to write portable code. The fact that you are not doing so makes me wonder about the quality of your training materials.
The standard libraries that ship with MSVC and those that ship with a typical Linux or Unix compiler are different enough that you are likely to encounter compatibility issues. There may also be minor dialect variations between MSVC and GCC.
The simplest way to test your examples in a unix-like environment would be to install Cygwin or MSYS on your existing Windows kit. These are based on GCC and common open-source libraries and will behave much more like the C compiler environment on a unix or linux system.
Cygwin is the most 'unix like', and is based on cygwin.dll, an emulation layer that provides unix system calls on top of the native Win32 API. Generally anything that compiles on Cygwin is very likely to compile on Linux, as Cygwin is based on gcc and glibc. However, native Win32 APIs are not available to applications compiled on Cygwin.
MSYS/MinGW32 is designed for producing native Win32 apps using GCC. However, most of the standard GNU and other OSS libraries are available, so it behaves more like a unix environment than VC does. In fact, if you are working with code that doesn't use Win32 or unix specific APIs it will probably port between MinGW32 and Linux more easily than it would between MinGW32 and MSVC.
While getting Linux installed in your lab is probably a useful thing to do (Use VMWare player or some other hypervisor if you can't get funding for new servers) you can use either of the above toolchains to get something that will probably be 'close enough' for your purposes. You can learn unix as takes your fancy, and both Cygwin and MSYS will give you a unix-like environment that could give you a bit of a gentle intro in the meantime.
C syntax should be the same as long as both the Windows and Unix compilers adhere to the same C standard. I was told that MS compilers still don't support C99 in full, while Unix compilers are up to speed, so C89 seems to be the lowest common denominator.
However, in the Unix world you will typically use POSIX syscalls to do system-level work, such as IPC. Windows isn't a POSIX system, so it has a different API for that.
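A tiny illustration of that split: even something as simple as sleeping needs two code paths, selected by the compiler-predefined _WIN32 macro (portable_sleep_ms is an illustrative name):

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Sketch: sleep for the given number of milliseconds on either platform. */
void portable_sleep_ms(unsigned ms)
{
#ifdef _WIN32
    Sleep(ms);                                        /* Win32 takes milliseconds */
#else
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);                             /* POSIX takes a timespec */
#endif
}
```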
There is this thing called ANSI C. As long as you code in pure ANSI C, there should be no difference. However, this is a rather academic assumption.
In real life, I have never had any of my code port from Linux to Windows, or vice versa, without modification. Actually, these modifications (definitely plural) turned into a vast amount of preprocessor directives, such as #ifdef WINDOWS ... #endif and #ifdef UNIX ... #endif, and even more if parallel libraries such as OpenMPI were used.
As you may imagine, this is totally contrary to readable and debuggable code, but that was what worked ;-)
Besides, you have to consider things already mentioned: UTF-8 will sometimes knock out Linux compilers...
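The #ifdef pattern mentioned above usually keys off compiler-predefined macros rather than hand-rolled WINDOWS/UNIX symbols. A sketch, with platform_name as an illustrative helper:

```c
/* Sketch: select a code path using the macros compilers predefine.
 * _WIN32 is set by Windows compilers (even for 64-bit targets),
 * __APPLE__ by Apple's, and __unix__ by most Unix ones. */
const char *platform_name(void)
{
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "macOS";
#elif defined(__unix__)
    return "Unix";
#else
    return "unknown";
#endif
}
```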
There should be no difference in the C programming language between Windows and *nix, because the language is specified by an ISO standard.
The C language itself is portable from Windows to Unix, but operating-system details are different and sometimes those intrude into your code.
For instance, Unix systems typically use only "\n" to separate lines in a text file, while most Windows tools expect to see "\r\n". There are ways to get the C runtime to handle this sort of difference for you, but if you aren't careful and don't know about them, it's pretty easy to write OS-specific C code.
I suggest that you run Unix in a virtual machine and use that to test your code before you share it with your students.
I think it's critical that you familiarize yourself with Unix right now.
An excellent way to do this is a with a Knoppix CD.
Try to compile your programs under Linux using gcc, and when they don't work, track down the problems (#include <windows.h>?) and make them work. Then return to Windows, where they'll likely compile fine.
In this way, your programs will become cleaner and better teaching material, even for lab exercises on Windows machines.
A common problem is that fflush(stdin) doesn't work on Unix.
Which is perfectly normal, since the standard doesn't define what the implementation should do with it.
The solution is to consume the rest of the line yourself, with something like this (note that c must be declared outside the loop so the while condition can see it):

int c;
do
{
    c = getchar();
} while (c != '\n' && c != EOF);
Similarly, you need to avoid anything that causes undefined behavior.