Wide-character support on a VxWorks downloadable kernel module (DKM)

I'm working on porting a project from Linux to a VxWorks DKM.
But I face a problem: on Linux, wide characters and the wide-character functions (like wcslen() or mbrtowc()) are used in some parts of this project, and since VxWorks DKM doesn't support wide chars (or the wide-char functions) I'm stuck.
My question is: is there any alternative to wide chars that I can use on VxWorks DKM?
Wide chars are supported in the RTP mode of VxWorks, but not in DKM.

Related

wchar_t to UTF-8 and back string conversion in a cross-platform C program

My program is written in C as a cross-platform one (at least Linux and Windows are currently supported). It sends and receives some data over the network in custom-defined packages. Now is a good time to enable Unicode characters in this data, and I consider UTF-8 the most compact and universal format for the task.
So I get a null-terminated wchar_t* string from console input, to be able to handle all characters supported by the system the program runs on. Then I need to convert the wchar_t* to a UTF-8-coded char* to send the data over the network.
Then I receive an answer from the other side and need to do the same conversion backwards, to wchar_t*.
Is there a simple way to do these conversions in a cross-platform manner? Do I need to use a heavy library like ICU, or are things easier than that?

Wide char string functions on Linux / Windows

I want to create a string library with two different string classes for handling UTF-8 and UCS-2 (which I believe is a kind of UTF-16 that doesn't handle surrogates or characters above 0xFFFF).
On Windows platforms, wide chars are 2 octets wide. On Linux they are 4. So what happens with functions related to wide-char strings? Do you pass buffers of 2-octet items on Windows and 4-octet items on Linux? If so, that makes these functions behave quite differently on Windows and Linux, which doesn't make them really "standard"...
How can one handle this problem when trying to create a library that is supposed to manipulate wide chars the same way in cross-platform code? Thank you.
You're right about the different sizes of wchar_t on Windows and Linux. That also means you're right about the wide-character handling functions not being too useful. You should probably check out an encoding conversion library such as libiconv. Then you can work with UTF-8 internally and just convert on I/O.
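A minimal sketch of that approach, assuming a glibc or GNU libiconv build that accepts the encoding name "WCHAR_T" for the platform's native wchar_t representation (the helper name is mine):
#include <iconv.h>
#include <wchar.h>

/* Convert a wide string to UTF-8 into out[]; returns bytes written or (size_t)-1. */
size_t wcs_to_utf8(const wchar_t *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");   /* "WCHAR_T" is a GNU encoding name */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)in;                      /* iconv works on byte pointers */
    size_t inleft = (wcslen(in) + 1) * sizeof(wchar_t);
    char *outbuf = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outsize - outleft;
}
The opposite direction is symmetric: open the descriptor with iconv_open("WCHAR_T", "UTF-8") and feed it the UTF-8 bytes.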

wchar_t encoding on different platforms

I faced a problem with encodings on different platforms (in my case Windows and Linux). On Windows the size of wchar_t is 2 bytes, whereas on Linux it's 4 bytes. How can I "standardize" wchar_t so it's the same size on both platforms? Is that hard to implement without additional libraries? For now I'm aiming at the printf/wprintf API. The data is sent via socket communication. Thank you.
If you want to send Unicode data across different platforms and architectures, I'd suggest using UTF-8 encoding and (8-bit) chars. UTF-8 has some advantages, like not having endianness issues (UTF-8 is just a plain sequence of bytes, whereas both UTF-16 and UTF-32 can be little-endian or big-endian...).
On Windows, just convert the UTF-8 text to UTF-16 at the boundary of Win32 APIs (since Windows APIs tend to work with UTF-16). You can use the MultiByteToWideChar() API for that.
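For instance, a sketch of the UTF-8 to UTF-16 direction at such a boundary (error handling kept minimal; the wrapper name is illustrative):
#include <windows.h>
#include <stdlib.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16 string. */
wchar_t *utf8_to_utf16(const char *utf8)
{
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);  /* query length */
    if (wlen == 0)
        return NULL;
    wchar_t *wide = malloc(wlen * sizeof(wchar_t));
    if (wide != NULL)
        MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, wlen);
    return wide;  /* caller frees */
}
The reverse direction uses WideCharToMultiByte() with CP_UTF8 in the same two-call pattern.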
To solve this problem I think you are going to have to convert all strings into UTF-8 before transmitting. On Windows you would use the WideCharToMultiByte function to convert wchar_t strings to UTF-8 strings, and MultiByteToWideChar to convert UTF-8 strings into wchar_t strings.
On Linux things aren't as straightforward. You can use the functions wctomb and mbtowc; however, what they convert to/from depends on the underlying locale setting. So if you want them to convert to/from UTF-8, you'll need to make sure the locale is set to a UTF-8 encoding.
This article might also be a good resource.
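A small sketch of the Linux side using the standard mbstowcs/wcstombs pair; the locale name below is an assumption and must be installed on the target system:
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");   /* make the multibyte encoding UTF-8 */

    const wchar_t *wide = L"héllo";
    char utf8[64];
    wchar_t back[16];

    wcstombs(utf8, wide, sizeof utf8);                   /* wchar_t -> UTF-8 bytes  */
    mbstowcs(back, utf8, sizeof back / sizeof back[0]);  /* UTF-8 bytes -> wchar_t  */
    return 0;
}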

C - Read Directly From the Keyboard Buffer

This is a question in the C programming language.
How do I directly read the data in the keyboard buffer?
I want to directly access the data and store it in a variable. Of what data type should the variable be?
I need it for an operating system our institute is currently developing. It's called ICS-OS and I am not quite sure about the specifics. It runs on x86, 32-bit machines (we run it on QEMU in a Linux box). Here is the link to the Google Code project: http://code.google.com/p/ics-os/. I hope that's sufficient information.
The operating system does not support the conio.h library so kbhit is not an option.
This is really platform dependent.
If this is for Windows, the most direct access to a "keyboard buffer" is using WM_INPUT and GetRawInputData. See Using raw input with example for both keyboard and mouse.
Another DOS / Windows specific way is the conio.h functions getch() / kbhit().
A portable library for this is curses, which has ports for both Linux and Windows (see the sketch after this answer).
However, as you are targeting quite a specific OS, you need to check the docs for that OS.
The most direct platform-independent way is getchar / scanf / anything that reads from stdin, but stdin is line-buffered, so you will get no data until Enter is pressed. You may be able to change the buffering settings, but again, this is platform-dependent and may not be possible on some platforms. See a related discussion of setbuf(stdin, NULL).
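As a concrete illustration of the curses option above, a sketch using ncurses on an ordinary Linux terminal (ICS-OS itself would still need its own keyboard interface):
#include <ncurses.h>

int main(void)
{
    initscr();          /* enter curses mode                     */
    cbreak();           /* turn off line buffering               */
    noecho();           /* don't echo the typed character        */

    int ch = getch();   /* returns as soon as one key is pressed */
    mvprintw(0, 0, "You pressed: %c", ch);
    refresh();
    getch();            /* wait for another key before exiting   */

    endwin();           /* restore the terminal                  */
    return 0;
}
Build with -lncurses.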
Have you tried looking at the source code of the Linux kernel keyboard driver?
Take a look at /drivers/input/keyboard/xtkbd.* for a simple XT keyboard driver.
Also, here's an article which briefly explains how it's done.
If you want to read data from the keyboard buffer directly, you can use getchar or getc.
This reads from the keyboard buffer:
scanf("%d", &myvariable);
but you have to use
"%d" for int, "%f" for float, "%lf" for double, "%c" for char and "%s" for strings, and the conversion specifier has to match the type of your variable.

Why isn't wchar_t widely used in code for Linux / related platforms?

This intrigues me, so I'm going to ask: for what reason is wchar_t not used as widely on Linux and Linux-like systems as it is on Windows? Specifically, the Windows API uses wchar_t internally, whereas I believe Linux does not, and this is reflected in a number of open-source packages using char types.
My understanding is that, given a character c which requires multiple bytes to represent it, in char[] form c is split over several elements of the array, whereas it forms a single unit in a wchar_t[]. Is it not easier, then, to always use wchar_t? Have I missed a technical reason that negates this difference, or is it just an adoption problem?
wchar_t is a wide character with platform-defined width, which doesn't really help much.
UTF-8 characters span 1-4 bytes per character. UCS-2, which spans exactly 2 bytes per character, is now obsolete and can't represent the full Unicode character set.
Linux applications that support Unicode tend to do so properly, above the byte-wise storage layer. Windows applications tend to make this silly assumption that only two bytes will do.
wchar_t's Wikipedia article briefly touches on this.
The first people to use UTF-8 on a Unix-based platform explained:
The Unicode Standard [then at version 1.1] defines an adequate character set but an unreasonable representation [UCS-2]. It states that all characters are 16 bits wide [no longer true] and are communicated and stored in 16-bit units. It also reserves a pair of characters (hexadecimal FFFE and FEFF) to detect byte order in transmitted text, requiring state in the byte stream. (The Unicode Consortium was thinking of files, not pipes.) To adopt this encoding, we would have had to convert all text going into and out of Plan 9 between ASCII and Unicode, which cannot be done. Within a single program, in command of all its input and output, it is possible to define characters as 16-bit quantities; in the context of a networked system with hundreds of applications on diverse machines by different manufacturers [italics mine], it is impossible.
The italicized part is less relevant to Windows systems, which have a preference towards monolithic applications (Microsoft Office), non-diverse machines (everything's an x86 and thus little-endian), and a single OS vendor.
And the Unix philosophy of having small, single-purpose programs means fewer of them need to do serious character manipulation.
The source for our tools and applications had already been converted to work with Latin-1, so it was ‘8-bit safe’, but the conversion to the Unicode Standard and UTF[-8] is more involved. Some programs needed no change at all: cat, for instance, interprets its argument strings, delivered in UTF[-8], as file names that it passes uninterpreted to the open system call, and then just copies bytes from its input to its output; it never makes decisions based on the values of the bytes... Most programs, however, needed modest change.
...Few tools actually need to operate on runes [Unicode code points] internally; more typically they need only to look for the final slash in a file name and similar trivial tasks. Of the 170 C source programs... only 23 now contain the word Rune.
The programs that do store runes internally are mostly those whose raison d’être is character manipulation: sam (the text editor), sed, sort, tr, troff, 8½ (the window system and terminal emulator), and so on. To decide whether to compute using runes or UTF-encoded byte strings requires balancing the cost of converting the data when read and written against the cost of converting relevant text on demand. For programs such as editors that run a long time with a relatively constant dataset, runes are the better choice...
UTF-32, with code points directly accessible, is indeed more convenient if you need character properties like categories and case mappings.
But widechars are awkward to use on Linux for the same reason that UTF-8 is awkward to use on Windows. GNU libc has no _wfopen or _wstat function.
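A tiny sketch of that asymmetry, opening a file whose name contains 'é' (the function name is illustrative, and _WIN32 is the usual compiler-defined macro):
#include <stdio.h>

FILE *open_cafe(void)
{
#ifdef _WIN32
    /* Windows CRT offers a wide-path variant. */
    return _wfopen(L"caf\x00e9.txt", L"r");
#else
    /* glibc has no _wfopen; the UTF-8 bytes for 'é' go straight to open(2). */
    return fopen("caf\xc3\xa9.txt", "r");
#endif
}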
UTF-8, being compatible with ASCII, makes it possible to ignore Unicode somewhat.
Often, programs don't care (and in fact, don't need to care) about what the input is, as long as there is not a \0 that could terminate strings. See:
char buf[256];  /* any reasonable size */
printf("Your favorite pizza topping is which?\n");
fgets(buf, sizeof(buf), stdin); /* Jalapeños */
printf("%s it shall be.\n", buf);
The only time I found I needed Unicode support was when I had to treat a multibyte character as a single unit (wchar_t), e.g. when counting the number of characters in a string rather than the number of bytes. iconv from UTF-8 to wchar_t will quickly do that. For bigger issues like zero-width spaces and combining diacritics, something heavier like ICU is needed, but how often do you do that anyway?
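A sketch of the character-counting case; passing a null destination to mbstowcs to get just the count is POSIX-specified behaviour, and the example assumes the process locale is UTF-8 and the source file is saved as UTF-8:
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    setlocale(LC_ALL, "");                 /* pick up the user's (UTF-8) locale */

    const char *s = "naïve";               /* 5 characters, 6 bytes in UTF-8 */
    size_t bytes = strlen(s);
    size_t chars = mbstowcs(NULL, s, 0);   /* NULL destination: just count */

    printf("%zu bytes, %zu characters\n", bytes, chars);
    return 0;
}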
wchar_t is not the same size on all platforms. On Windows it is a UTF-16 code unit that uses two bytes. On other platforms it typically uses 4 bytes (for UCS-4/UTF-32). It is therefore unlikely that these platforms would standardize on using wchar_t, since it would waste a lot of space.
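For instance, a one-liner makes the difference visible; the values in the comment are what typical MSVC and glibc toolchains report:
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* Typically prints 2 on Windows (UTF-16 code units)
       and 4 on Linux (UTF-32 code points). */
    printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t));
    return 0;
}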
