How to test my application is UNICODE Compatible or not? - c

I apologise if this is a silly question. I recently developed an application on Windows with C and the WinAPI, and I need to check whether it is Unicode compatible. How can I test this on my machine? Is there a standard procedure for checking Unicode compatibility? Moreover, I don't have a Chinese-language machine or one set up for any other language. I want to run this test on my own machine, whose default language is English.
Please provide some links if possible or a detailed procedure.

Great question. On the Windows platform this is indeed challenging, because many different encodings and code pages are supported and they can be mixed.
What I usually do is test the application with input that mixes two non-ASCII languages, such as a filename made of Russian and Hebrew letters, and check that the application is able to open this file, etc. You can copy this: "שלום привет hello" and see how it works for this kind of input.
Because the input contains two languages, no single ANSI code page can represent it. Any code path that silently goes through an ANSI code page, which is the most common kind of bug, will therefore show up as a failure.
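Before relying on a test string, it can help to confirm that it really mixes two scripts, since Cyrillic and Hebrew live in different ANSI code pages (Windows-1251 and Windows-1255). The following is only a minimal sketch: the names scripts_in_utf8 and is_mixed_script are hypothetical, and it assumes the test string is held as UTF-8 and only looks at the Cyrillic (U+0400..U+04FF) and Hebrew (U+0590..U+05FF) blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names; not part of any Windows or CRT API. */
enum { SCRIPT_CYRILLIC = 1, SCRIPT_HEBREW = 2 };

/* Returns a bitmask of the scripts found in a NUL-terminated UTF-8 string.
   Only two-byte sequences are decoded, which covers U+0080..U+07FF and
   therefore both blocks of interest. */
unsigned scripts_in_utf8(const char *text)
{
    const unsigned char *s = (const unsigned char *)text;
    unsigned mask = 0;

    assert(text != NULL);
    while (*s) {
        if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            unsigned cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
            if (cp >= 0x0400 && cp <= 0x04FF) mask |= SCRIPT_CYRILLIC;
            if (cp >= 0x0590 && cp <= 0x05FF) mask |= SCRIPT_HEBREW;
            s += 2;
        } else {
            s++;   /* ASCII or a sequence this sketch does not inspect */
        }
    }
    return mask;
}

/* Nonzero if the string contains both Cyrillic and Hebrew letters, so no
   single ANSI code page can represent it. */
int is_mixed_script(const char *text)
{
    return scripts_in_utf8(text) == (SCRIPT_CYRILLIC | SCRIPT_HEBREW);
}
```

A string for which is_mixed_script returns nonzero is a reasonable candidate for filenames, window titles and other inputs in the test described above.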

Related

How to get extended locale information with Windows CRT API

I am working on a personal project in which I need to obtain full locale formatting information from a C locale.
I cannot simply use localeconv or localeconv_l since lconv does not provide all formatting information needed. To solve this on *NIX there are nl_langinfo and nl_langinfo_l functions, however they are not present on Windows.
What ways are there to obtain locale formatting information on Windows?
Start with GetUserDefaultUILanguage.
Similar and related APIs include:
GetUserDefaultLocaleName
GetUserDefaultLCID
GetUserDefaultLangID
The open-source C code of Perl 5 contains an emulation of nl_langinfo() for Windows and other platforms that lack it. You can steal the code, though it is complicated by having to work on many different platforms with many different configurations.
A few fields aren't implemented such as the Japanese emperor era names. But anything in common use is available.
Start with this file: https://github.com/Perl/perl5/blob/blead/locale.c
The code continues to evolve
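If only a handful of items are needed, one portable workaround is to ask strftime for them in the current locale. The sketch below is not taken from Perl's locale.c; the helper name weekday_name is hypothetical, and it only covers weekday names, roughly what nl_langinfo exposes as DAY_1 to DAY_7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Writes the full weekday name for wday (0 = Sunday .. 6 = Saturday),
   formatted according to the current LC_TIME locale, into buf.
   Returns the number of characters written, or 0 on failure. */
size_t weekday_name(int wday, char *buf, size_t cap)
{
    struct tm tm;

    assert(buf != NULL);
    if (wday < 0 || wday > 6)
        return 0;
    memset(&tm, 0, sizeof tm);
    tm.tm_wday = wday;              /* the only field %A depends on */
    return strftime(buf, cap, "%A", &tm);
}
```

Calling setlocale(LC_TIME, "") beforehand makes the result follow the user's locale; without it the program starts in the "C" locale and gets English names.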

Windows application ANSI to Unicode

I made the mistake of developing a big Windows application that supports only ANSI. What are the impacts if my application doesn't support Unicode? I am also planning to migrate my code to Unicode. Since I am under severe time pressure on this task, could someone help me?
My target environment: XP, Windows Vista, Windows 7 (both 32 bit and 64 bit). It's enough if my application supports only English, but it should run anywhere in the world.
Note: I am not getting input from the user as strings.
What are all the impacts, if my application doesn't support unicode?
It probably won't be able to deal with Unicode in
UI elements (no problem if your UI doesn't have any text entry facilities)
File paths
Text file content
Translation messages (no problem as you state English to be enough)
File formats like XML
Registry data (I think)
I won't claim to have listed all the impacts. I'll make this a community wiki, feel free to add to the list.
i am planning to migrate my code to Unicode? […] could some one help me?
The first sentence is not a question, is it? I doubt SO is the right place to search for people who'll help you port a “big” application, except by answering specific questions whenever you get stuck.
I guess this MSDN link should help you understand better about using unicode.
http://msdn.microsoft.com/en-us/goglobal/bb688113.aspx
Also, refer to the w3c documentation at:
http://www.w3.org/International/articles/unicode-migration/
Between them, they cover most of the answers to your question.

Status of Oberon readiness for application programming

I am getting interested in the Oberon language and I would like to know: is the language actually used by common programmers or is it still only used by researchers? Is it production-ready? What I have in mind are non-scientific applications requiring GUI support and possibly Internet connectivity (at least client-side POP3 and SMTP functionality).
Also, which of the Oberon flavors would you recommend for my needs (Oberon2, Active Oberon, etc)? The simpler, the better, as long as it is well maintained and has some community.
If possible, I would like to run my applications in a conventional host environment (Windows or Linux), without the need for a special runtime environment or a special operating system.
Thanks
BlackBox has some of what you want, runs on flavors of Windows.
There are also some environments that compile to Java bytecode and target the JVM.
Look at POW, and Gardens Point Component Pascal.
I happen to be using some command-line only tools that are Oberon Compilers.
OO2C is an Oberon to C compiler (but the output is not for human consumption).
Ofront is an Oberon to human-readable C compiler, but I haven't yet set up a Linux box to run it on (otherwise, it is supposed to run inside of BlackBox on Windows).
There is also the Oxford Oberon Compiler by Professor Spivey. A VERY enjoyable compiler that compiles to a virtual machine, but the whole object code is a self-contained application (albeit command-line).
It is a VERY small download, meant for an educational environment, keeps everything CLEAN, and works well for prototyping some of the grunt work or procedures/modules of your code. It also is supposed to allow bitmap drawing in XWindows in Black and White only, probably for drawing graphs, etc, but I have not had an opportunity to use that feature yet.
It has a GUI-based debugger, profiling, and some other interesting tools, and still is very small by comparison to most modern compilers like gcc. It is also totally stand alone.
Works on Mac, Win, Linux, and has source.
By comparison, OO2C took me about a day of futzing and compiling to get it going (but it is working).
I don't have a Windows box right now, so I can't run my copy of BlackBox, but it had a full GUI, and lots of Source code available at the Component Pascal Collection website.
http://www.zinnamturm.eu/index.htm
If you are looking for source code you should also check out that site in hopes you don't have to reinvent the wheel.
Really a joy to step into Oberon after having to fight C/C++ all day long to get simple stuff done.
OBNC is a new compiler for the latest version (2016) of the original Oberon language by Niklaus Wirth. It compiles via C and makes it easy to interface to existing C libraries.
https://miasap.se/obnc/
Given that Oberon [language] was developed as a complete [operating-]system, and that ETH's CS department ran ALL its computers (even the secretary's) on it I should think it is application-ready. This according to the following PDF:
http://www.ics.uci.edu/~franz/Site/pubs-pdf/BC03.pdf
is the language actually used by common programmers or is it still only used by researchers?
There was/is little use of the original Oberon language outside academia; there was some industrial adaptation of Oberon dialects like e.g. Component Pascal.
Is it production-ready?
Depends on your requirements. Given today's expectations of software developers, the (original) language and available toolchains seem very minimalistic.
non-scientific applications requiring GUI support and possibly Internet connectivity ... in a conventional host environment... which of the Oberon flavors would you recommend for my needs?
GUI support and network programming in a conventional host environment is e.g. supported by https://blackboxframework.org as already mentioned, which uses a language related to Oberon.
You could also have a look at https://github.com/rochus-keller/Oberon which includes a platform-independent IDE with semantic navigation and a source-level debugger, plus a platform-independent foreign function interface as a language extension which allows you to directly use any C shared library, and thus reuse the plethora of existing proven GUI or network libraries out there without having to program in C. It also offers a modern, lean syntax variant without all the semicolons and capitalized keywords, which should appeal especially to younger developers; the traditional syntax is of course also supported, even in projects that mix modern and traditional syntax.

should I eliminate TCHAR from Windows code?

I am revising some very old (10 years) C code. The code compiles on Unix/Mac with GCC and cross-compiles for Windows with MinGW. Currently there are TCHAR strings throughout. I'd like to get rid of the TCHAR and use a C++ string instead. Is it still necessary to use the Windows wide functions, or can I do everything now with Unicode and UTF-8?
Windows uses UTF16 still and most likely always will. You need to use wstring rather than string therefore. Windows APIs don't offer support for UTF8 directly largely because Windows supported Unicode before UTF8 was invented.
It is thus rather painful to write Unicode code that will compile on both Windows and Unix platforms.
Is it still necessary to use the Windows wide functions, or can I do everything now with Unicode and UTF-8?
Yes. Unfortunately, Windows does not have native support for UTF-8. If you want proper Unicode support, you need to use the wchar_t version of the Windows API functions, not the char version.
should I eliminate TCHAR from Windows code?
Yes, you should. The reason TCHAR exists is to support both Unicode and non-Unicode versions of Windows. Non-Unicode support may have been a major concern back in 2001 when Windows 98 was still popular, but not today.
And it's highly unlikely that any non-Windows-specific library would have the same kind of char/wchar_t overloading that makes TCHAR usable.
So go ahead and replace all your TCHARs with wchar_ts.
The code compiles on Unix/Mac with GCC and cross-compiles for Windows with MinGW.
I've had to write cross-platform C++ code before. (Now my job is writing cross-platform C# code.) Character encoding is rather painful when Windows doesn't support UTF-8 and Un*x doesn't support UTF-16. I ended up using UTF-8 as our main encoding and converting as necessary on Windows.
Yes, writing non-unicode applications nowadays is shooting yourself in the foot. Just use the wide API everywhere, and you'll not have to cry about it later. You can still use UTF8 on UNIX and wchar_t on Windows if you don't need (network) communication between platforms (or convert the wchar_t's with Win32 API to UTF-8), or go the hard way and use UTF-8 everywhere and convert to wchar_t's when you use Win32 API functions (that's what I do).
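For the "UTF-8 everywhere, convert at the API boundary" approach, the usual Win32 call is MultiByteToWideChar with CP_UTF8. To show what that conversion involves without depending on Windows headers, here is a portable, hand-rolled sketch. The name utf8_to_utf16 is hypothetical, and the function is deliberately simplified: it rejects malformed sequences and surrogates but does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts NUL-terminated UTF-8 into NUL-terminated UTF-16 code units.
   dst_cap is the capacity of dst in code units, terminator included.
   Returns the number of units written (terminator excluded), or
   (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dst_cap == 0)
        return (size_t)-1;

    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead */
        s++;

        for (; extra > 0; extra--, s++) {
            if ((*s & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3Fu);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        } else {                      /* encode as a surrogate pair */
            if (n + 2 >= dst_cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    dst[n] = 0;
    return n;
}
```

On Windows, wchar_t is 16 bits wide, so a buffer filled this way corresponds to what the W functions expect. In real code the system conversion function is the safer choice.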
To directly answer your question:
Is it still necessary to use the Windows wide functions, or can I do everything now with Unicode and UTF-8?
No, (non-ASCII) UTF-8 is not accepted by the vast majority of Windows API functions. You still have to use the wide APIs.
One could similarly bemoan that other OSes still have no support for wchar_t. So you also have to support UTF-8.
The other answers provide some good advice on how to manage this in a cross-platform codebase, but it sounds as if you already have an implementation supporting different character types. As desirable as ripping that out to simplify the code might sound, don't.
And I predict that someday, although probably not before the year 2020, Windows will add UTF-8 support, simply by adding U versions of all the API functions, alongside A and W, plus the same kind of linker hack. The 8-bit A functions are just a translation layer over the native W (UTF-16) functions. I bet they could generate a U-layer semi-automatically from the A-layer.
Once they've been teased enough, long enough, about their '20th century' Unicode support...
They'll still manage to make it awkward to write, ugly to read and non-portable by default, by using carefully chosen macros and default Visual Studio settings.

Where can I get started with Unicode-friendly programming in C?

So, I’m working on a plain-C (ANSI 9899:1999) project, and am trying to figure out where to get started re: Unicode, UTF-8, and all that jazz.
Specifically, it’s a language interpreter project, and I have two primary places where I’ll need to handle Unicode: reading in source files (the language ostensibly supports Unicode identifiers and such), and in ‘string’ objects.
I’m familiar with all the obvious basics about Unicode, UTF-7/8/16/32 & UCS-2/4, so on and so forth… I’m mostly looking for useful, C-specific (that is, please no C++ or C#, which is all that’s been documented here on SO previously) resources as to my ‘next steps’ to implement Unicode-friendly stuff… in C.
Any links, manpages, Wikipedia articles, example code, is all extremely welcome. I’ll also try to maintain a list of such resources here in the original question, for anybody who happens across it later.
A must read before considering anything else, if you’re unfamiliar with Unicode, and what an encoding actually is: http://www.joelonsoftware.com/articles/Unicode.html
The UTF-8 home-page: http://www.utf-8.com/
man 3 iconv (as well as iconv_open and iconvctl)
International Components for Unicode (via Geoff Reedy)
libbasekit, which seems to include light Unicode-handling tools
Glib has some Unicode functions
A basic UTF-8 detector function, by Christoph
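As an illustration of what a basic UTF-8 detector might look like, here is a minimal sketch. It is not Christoph's function; the name utf8_is_valid is hypothetical, and the check covers sequence structure, overlong encodings, surrogates and the U+10FFFF upper bound, nothing more.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the NUL-terminated string is well-formed UTF-8, else 0. */
int utf8_is_valid(const char *text)
{
    const unsigned char *s = (const unsigned char *)text;

    assert(text != NULL);
    while (*s) {
        unsigned long cp, min;
        int extra;

        if (*s < 0x80) { s++; continue; }                 /* plain ASCII */
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return 0;                  /* stray continuation or invalid lead */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)    /* also catches a truncated sequence */
                return 0;
            cp = (cp << 6) | (*s++ & 0x3Fu);
        }
        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

A check like this is useful when reading source files: input that fails it is either in another encoding or corrupted.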
International Components for Unicode provides a portable C library for handling unicode. Here's their elevator pitch for ICU4C:
The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code can not make use of them. The ICU4C libraries fills in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).
GLib has some Unicode functions and is a pretty lightweight library. It's not near the same level of functionality that ICU provides, but it might be good enough for some applications. The other features of GLib are good to have for portable C programs too.
GTK+ is built on top of GLib. GLib provides the fundamental algorithmic language constructs commonly duplicated in applications. This library has features such as (this list is not a comprehensive list):
Object and type system
Main loop
Dynamic loading of modules (i.e. plug-ins)
Thread support
Timer support
Memory allocator
Threaded Queues (synchronous and asynchronous)
Lists (singly linked, doubly linked, double ended)
Hash tables
Arrays
Trees (N-ary and binary balanced)
String utilities and charset handling
Lexical scanner and XML parser
Base64 (encoding & decoding)
I think one of the interesting questions is - what should your canonical internal format for strings be? The 2 obvious choices (to me at least) are
a) utf8 in vanilla c-strings
b) utf16 in unsigned short arrays
In previous projects I have always chosen UTF-8. Why? Because it's the path of least resistance in the C world: everything you interface with (stdio, string.h, etc.) will work fine.
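One caveat with UTF-8 in plain C strings is that byte-oriented functions such as strlen count bytes, not characters. A small sketch of counting code points instead, assuming the input is already valid UTF-8 (utf8_codepoints is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated, valid UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_codepoints(const char *text)
{
    size_t n = 0;

    assert(text != NULL);
    for (; *text; text++)
        if (((unsigned char)*text & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the Russian word "привет", strlen reports 12 bytes while utf8_codepoints reports 6. Functions like strcpy, strcmp and strstr still behave sensibly, because no UTF-8 sequence contains a NUL byte or an ASCII byte inside a multi-byte character.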
Next comes the file format. The problem here is that it's visible to your users (unless you provide the only editor for your language). Here I guess you have to take what they give you and try to guess the encoding by peeking at the content (byte order marks help).
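Peeking for a byte order mark could look like the sketch below. The enum and the function name detect_bom are hypothetical; the byte patterns are the standard BOM encodings. A file without a BOM still needs some other heuristic or a default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE, BOM_UTF32_LE, BOM_UTF32_BE
} bom_kind;

/* Inspects the first bytes of a file and reports which BOM, if any, they
   form. The UTF-32 checks come first because the UTF-32 LE mark begins
   with the same two bytes as the UTF-16 LE mark; a UTF-16 LE file whose
   first character is U+0000 would be misclassified, which is rare. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0) return BOM_UTF32_LE;
    if (len >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0) return BOM_UTF32_BE;
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)     return BOM_UTF8;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)         return BOM_UTF16_LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)         return BOM_UTF16_BE;
    return BOM_NONE;
}
```

The caller would typically read the first four bytes of the source file, call detect_bom, and then skip the mark before handing the rest to the decoder.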

Resources