Is it safe to assume they are ISO-8859-15 (Windows-1252?), or is there some function I can call to query this? The end goal is conversion to UTF-8.
Background:
The problem described by this question arises because XMLStarlet assumes its command-line arguments are UTF-8. Under Windows it seems they are actually ISO-8859-15 (Windows-1252?), or at least adding the following to the beginning of main makes things work:
char **utf8argv = malloc(sizeof(char*) * (argc+1));
utf8argv[argc] = NULL;
{
    iconv_t windows2utf8 = iconv_open("UTF-8", "ISO-8859-15");
    int i;
    for (i = 0; i < argc; i++) {
        char *in = argv[i];
        size_t len = strlen(in);
        size_t outlen = len*2 + 1; /* each Latin-1 byte expands to at most 2 UTF-8 bytes */
        char *utfarg = malloc(outlen);
        char *out = utfarg;
        size_t ret = iconv(windows2utf8,
                           &in, &len,
                           &out, &outlen);
        if (ret == (size_t)-1) { /* iconv() returns (size_t)-1 on error, never a negative value */
            perror("iconv");
            utf8argv[i] = NULL;
            continue;
        }
        out[0] = '\0';
        utf8argv[i] = utfarg;
    }
    iconv_close(windows2utf8);
    argv = utf8argv;
}
Testing Encoding
The following program prints out the bytes of its first argument in decimal:
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    for (size_t i = 0; i < strlen(argv[1]); i++) {
        printf("%d ", (unsigned char) argv[1][i]);
    }
    printf("\n");
    return 0;
}
chcp reports code page 850, so the characters æ and Æ should be 145 and 146, respectively.
C:\Users\npostavs\tmp>chcp
Active code page: 850
But we see 230 and 198 reported, which matches Windows-1252:
C:\Users\npostavs\tmp>cmd-chars æÆ
230 198
Passing characters outside of the codepage causes a lossy transformation
Making a shortcut to cmd-chars.exe with arguments αβγ (these are not present in codepage 1252) gives
C:\Users\npostavs\tmp>shortcut-cmd-chars.lnk
97 223 63
Which is "aß?".
You can call CommandLineToArgvW, passing it the result of GetCommandLineW, to get the command-line arguments as an argv-style array of wide strings. This is the only reliable way on Windows, given the code-page mess; Japanese characters can be passed via a Windows shortcut, for example. After that, you can use WideCharToMultiByte with a code page argument of CP_UTF8 to convert each wide-character argv element to UTF-8.
Note that calling WideCharToMultiByte with an output buffer size (byte count) of 0 will allow you to determine the number of UTF-8 bytes required for the number of characters specified (or the entire wide string including the null terminator if you wish to pass -1 as the number of wide characters to simplify your code). Then you can allocate the required number of bytes using malloc et al. and call WideCharToMultiByte again with the correct number of bytes instead of 0. If this was performance-critical, a different solution would probably be best, but since this is a one-time function to get command-line arguments, I'd say any decrease in performance would be negligible.
Of course, don't forget to free all of your memory, including calling LocalFree with the pointer returned by CommandLineToArgvW as the argument.
For more info on the functions and how you can use them, click the links to see the MSDN documentation.
The command-line parameters are in the system default codepage, which varies depending on system settings. Rather than specify a specific source charset at all, you can specify "char" or "" instead and let iconv_open() figure out what the system charset actually is:
iconv_t windows2utf8 = iconv_open("UTF-8", "char");
Otherwise, you are better off retrieving the command line as UTF-16 instead of as ANSI, and then you can convert it directly to UTF-8 using iconv_open("UTF-8", "UTF-16LE"), or WideCharToMultiByte(CP_UTF8) like Chrono suggested.
It seems that you are on Windows.
In that case, you can make a system() call to run the CHCP command.
#include <stdlib.h> // Uses: system()
#include <stdio.h>
// .....
// 1st: Store the present Windows codepage in a text file:
system("CMD /C \"CHCP > myenc.txt\"");
// 2nd: Read the first line of the file:
FILE *F = fopen("myenc.txt", "r");
char buffer[100];
fgets(buffer, sizeof(buffer), F); // fgets() takes the buffer size as its second argument
fclose(F);
// 3rd: Analyze the loaded string to find the Windows codepage:
int codepage = my_CHCP_analizer_func(buffer);
// The function my_CHCP_analizer_func() must be written by you,
// and it has to take into account the way CHCP prints its information.
Finally, the codepages sent by CHCP can be checked for example here:
Windows Codepages
Related
I have some code that works perfectly fine on Linux, BUT on Windows it only works as expected if it is compiled using Cygwin, which emulates a Linux environment on Windows but is bad for portability (you must have Cygwin installed for the compiled binary to work). The program does the following:
Opens a document in read mode and ccs=UTF-8 and reads it char by char.
Writes the braille Unicode pattern (U+2800..U+28FF) corresponding to that letter, num. or punct. mark to a 'dest' document (opened in write mode and ccs=UTF-8)
Significant code:
const char *brai[26] = {
"⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
"⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
"⠥","⠧","⠭","⠽","⠵","⠺"
};
int main(void) {
setlocale(LC_ALL, "es_MX.UTF-8");
FILE *source = fopen(origen, "r, ccs=UTF-8");
FILE *dest = fopen(destino, "w, ccs=UTF-8");
unsigned int letra;
while ((letra = fgetc(source)) != EOF) {
// This next line is the problem, I guess:
fwprintf(dest, L"%s", "⠷"); // Prints directly the braille sign as a char[]
// OR prints it from an array that contains the exact same sign.
fwprintf(dest, L"%s", brai[7]);
}
}
The code works as expected on Linux every time, but not on Windows. I tried everything and nothing seems to get the output right. In the 'dest' document I get random chars like:
甥╩極肠─猀甥iꃢ¨.
The only way to print braille patterns to the doc so far on Windows was:
fwprintf(dest, L"⠷");
Which is not very useful (I would need to make an 'else if' for every case instead).
If you wish to see the full code, it's on Github:
https://github.com/oliver-almaraz/Texto_a_Braille
What I tried so far:
Changing files open options to UTF-16LE and UNICODE.
Changing fwprintf() arguments in every way I could imagine.
Changing the array properties to unsigned int for the arrays containing the braille patterns.
Different compilers.
Here's a tested (with MSVC and mingw on Windows), semi-working example.
#include <stdio.h>
#include <ctype.h>
const char *brai[26] = {
"⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
"⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
"⠥","⠧","⠭","⠽","⠵","⠺"
};
int main(void) {
char* origen = "a.txt";
char* destino = "b.txt";
FILE *source = fopen(origen, "r");
FILE *dest = fopen(destino, "w");
int letra;
while ((letra = fgetc(source)) != EOF) {
if (isupper(letra))
fprintf(dest, "%s", brai[letra - 'A']);
else if (islower(letra))
fprintf(dest, "%s", brai[letra - 'a']);
else
fprintf (dest, "%c", letra);
}
}
Note these things.
No locale or wide character or anything like that in sight. None of this is needed.
This code only translates English letters. No punctuation or numbers (I don't know nearly enough about Braille to add that, but this should be straightforward).
Since the code only translates English letters and leaves everything else as is, it is OK to feed it a UTF-8 encoded file. It will just leave unrecognised characters untranslated. If you ever need to translate accented letters, you will need to learn a whole lot more about Unicode. Here is a good place to start.
Error handling omitted for brevity.
The source must be saved in the correct charset. For MSVC, use either UTF-8 with BOM or UTF-16; alternatively, use UTF-8 without BOM and the /utf-8 compiler switch if your MSVC version recognises it. For mingw, just use UTF-8.
This method will not work for standard console output on Windows. It is not a big problem since Windows console by default won't output Braille characters anyway. It will however work for msys console and many others.
Option 1: Use wchar_t and fwprintf. Make sure to save the source as UTF-8 w/ BOM encoding or use UTF-8 encoding and the /utf-8 switch to force assuming UTF-8 encoding on the Microsoft compiler; otherwise, MSVS assumes an ANSI encoding for the source file and you get mojibake.
#include <stdio.h>
const wchar_t brai[] = L"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";
int main(void) {
FILE *dest = fopen("out.txt", "w, ccs=UTF-8");
fwprintf(dest, L"%s", brai);
}
out.txt (encoded as UTF-8 w/ BOM):
⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺
Option 2: Use char and fprintf, save the source as UTF-8 or UTF-8 w/ BOM, and use the /utf-8 Microsoft compile switch. The char string will be in the source encoding, so it must be UTF-8 to get UTF-8 in the output file.
#include <stdio.h>
const char brai[] = "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";
int main(void) {
FILE *dest = fopen("out.csv","w");
fprintf(dest, "%s", brai);
}
Recent compilers can also use the u8"" syntax. The advantage here is that you can use a different source encoding and the char string will still be UTF-8, as long as you use the appropriate compiler switch to indicate the source encoding.
const char brai[] = u8"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";
For reference, these are the Microsoft compiler options:
/source-charset:<iana-name>|.nnnn set source character set
/execution-charset:<iana-name>|.nnnn set execution character set
/utf-8 set source and execution character set to UTF-8
Is it possible to read a text file that has non-English text?
Example of text in file:
E 37
SVAR:
Fettembolisyndrom. (1 poäng)
Example of what is present in the buffer which stores the fread() output, printed using puts():
E 37 SVAR:
Fettembolisyndrom.
(1 poäng)
Under Linux my program was working fine, but on Windows I am seeing this problem with non-English letters. Any advice on how this can be fixed?
Program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
int debug = 0;
int main(int argc, char* argv[])
{
if (argc < 2)
{
puts("ERROR! Please enter a filename\n");
exit(1);
}
else if (argc > 2)
{
debug = atoi(argv[2]);
puts("Debugging mode ENABLED!\n");
}
FILE *fp = fopen(argv[1], "rb");
fseek(fp, 0, SEEK_END);
long fileSz = ftell(fp);
fseek(fp, 0, SEEK_SET);
char* buffer;
buffer = (char*) malloc (sizeof(char) * (fileSz + 1));
size_t readSz = fread(buffer, 1, fileSz, fp);
buffer[readSz] = '\0'; // terminate so buffer can be used as a string
rewind(fp);
if (readSz == (size_t) fileSz)
{
char tmpBuff[100];
fgets(tmpBuff, 100, fp);
if (!ferror(fp))
{
printf("100 characters from text file: %s\n", tmpBuff);
}
else
{
printf("Error encountered");
}
}
if (strstr(buffer, "FRÅGA") == NULL) // strstr(haystack, needle)
{
printf("String not found!");
}
return 0;
}
Summary: If you read text from a file encoded in UTF-8 and display it on the console you must either set the console to UTF-8 or transcode the text from UTF-8 to the encoding used by the console (in English-speaking countries, usually MS-DOS code page 437 or 850).
Longer explanation
Bytes are not characters and characters are not bytes. The char data type in C holds a byte, not a character. In particular, the character Å (Unicode <U+00C5>) mentioned in the comments can be represented in many ways, called encodings:
In UTF-8 it is two bytes, '\xC3' '\x85';
In UTF-16 it is two bytes, either '\xC5' '\x00' (little-endian UTF-16), or '\x00' '\xC5' (big-endian UTF-16);
In Latin-1 and Windows-1252, it is one byte, '\xC5';
In MS-DOS code page 437 and code page 850, it is one byte, '\x8F'.
It is the responsibility of the programmer to translate between the internal encoding used by the program (usually but not always Unicode), the encoding used in input or output files, and the encoding expected by the display device.
Note: Sometimes, if the program does not do much with the characters it reads and outputs, one can get by just by making sure that the input files, the output files, and the display device all use the same encoding. In Linux, this encoding is almost always UTF-8. Unfortunately, on Windows the existence of multiple encodings is a fact of life. System calls expect either UTF-16 or Windows-1252. By default, the console displays Code Page 437 or 850. Text files are quite often in UTF-8. Windows is old and complicated.
I'm trying to create a wide-char file using MinGW C on Windows; however, the wide chars seem to be omitted. My code:
const wchar_t* str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE* fd = fopen("file.txt","w");
// FILE* fd = _wfopen(L"demo.txgs",L"w"); // attempt to open wide file doesn't help
fwide(fd,1); // attempt to force wide mode, doesn't help
fwprintf(fd,L"%ls",str);
// fputws(p,fd); // stops output after writing "p" (1B file size)
fclose(fd);
File contents
píern luouký k úpl ábelské ódy
The file size is 30B, so the wide chars are really missing. How to convince the compiler to write them?
As @chqrlie suggests in the comments, the result of
fwrite(str, 1, sizeof(L"příšerně žluťoučký kůň úpěl ďábelské ódy"), fd);
is 82 (I guess 2*30 + 2*10 (omitted chars) + 2 (wide trailing zero)).
It also might be useful to quote from here
The external representation of wide characters in files are multibyte
characters: These are obtained as if wcrtomb was called to convert
each wide character (using the stream's internal mbstate_t object).
Which explains why the ISO-8859-1 chars are single byte in the file, but I don't know how to use this information to solve my problem. Doing the opposite task (reading multibyte UTF-8 into wide chars) I failed to use mbtowc and ended up using winAPI's MultiByteToWideChar.
I am not a Windows user, but you might try this:
const wchar_t *str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE *fd = fopen("file.txt", "w,ccs=UTF-8");
fwprintf(fd, L"%ls", str);
fclose(fd);
I got this idea from this question: How do I write a UTF-8 encoded string to a file in windows, in C++
I figured this out. The internal use of wcrtomb (mentioned in details of my question) needs setlocale call, but that call fails with UTF-8 on Windows. So I used winAPI here:
char output[100]; // not wchar_t, write byte-by-byte
int len = WideCharToMultiByte(CP_UTF8,0,str,-1,NULL,0,NULL,NULL);
if(len>100) len = 100;
WideCharToMultiByte(CP_UTF8,0,str,-1,output,len,NULL,NULL);
fputs(output,fd);
And voila! The file is 56B long with expected UTF-8 contents:
příšerně žluťoučký kůň úpěl ďábelské ódy
I hope this will save Windows coders some nerves.
I'm using Windows XP.
I want to read from files in ASCII, UTF-8 and Unicode encodings and print out strings on stdout.
I was trying to use functions from wchar.h like fgetwc()/fputwc() and fgetws()/fputws(). They work on ASCII, but not when a file is in UTF-8 or Unicode: language-specific characters don't print, and when a file is in Unicode nothing prints but a box and the first letter.
Is there any way of making a program in pure C that will read files, compare strings and print them out correctly on stdout regardless of the encoding of the files fed to the program?
Since you're on Windows, the key is that you want to write your strings out using the WriteConsoleW function, having first assembled the sequence of UTF-16 characters that you want to write out. (You probably should only write a few kilobytes of characters at a time.) Use GetStdHandle to obtain the console handle, of course.
Harder is determining the encoding of a file. Luckily, you don't need to distinguish between ASCII and UTF-8 as the latter is a strict superset of the former. But for any other single-byte encoding, you need to guess. Some UTF-8 files, more likely so on Windows than elsewhere, have a UTF-8 encoded byte-order mark at the beginning of the file; that's nasty as BOMs are not really supposed to be used with UTF-8, but a strong indicator if present. (Spotting UTF-16 is easier, as it should either have a byte-order mark, or you can guess it from the presence of NUL (0) bytes.)
Here's a little piece of code I used to print various characters outside of the ASCII subset of Unicode (contains workarounds for what seems to be a bug in the Open Watcom compiler's implementation of printf()):
// Compile with Open Watcom C/C++ 1.9: wcl386 cons-utf8.c
#include <windows.h>
#include <stdio.h>
#include <stddef.h>
// Workarounds for printf() not printing multi-byte (UTF-8) strings
// with Open Watcom C/C++ 1.7-1.9.
// 0 - no workaround for printf()
// 1 - setbuf(stdout, NULL) before printf()
// 2 - fflush(stdout) after printf()
// 3 - WriteConsole() instead of printf()
#define PRINT_WORKAROUND 03
int main(void)
{
DWORD err, i, j;
// Code point ranges of characters to print
static const DWORD ranges[][2] =
{
{ 0x0A0, 0x0FF }, // Latin chars with diacritic marks + some others
{ 0x391, 0x3CE }, // Greek chars
{ 0x410, 0x44F } // Cyrillic chars
};
#if PRINT_WORKAROUND == 1
setbuf(stdout, NULL);
#endif
if (!SetConsoleOutputCP(CP_UTF8))
{
err = GetLastError();
printf("SetConsoleOutputCP(CP_UTF8) failed with error 0x%X\n", err);
goto Exit;
}
printf("Workaround: %d\n", PRINT_WORKAROUND);
for (j = 0; j < sizeof(ranges) / sizeof(ranges[0]); j++)
{
for (i = ranges[j][0]; i <= ranges[j][1]; i++)
{
char str[8];
int sz;
wchar_t wstr[2];
wstr[0] = i;
wstr[1] = 0;
sz = WideCharToMultiByte(CP_UTF8,
0,
wstr,
-1,
str,
sizeof(str),
NULL,
NULL);
if (sz <= 0)
{
err = GetLastError();
printf("WideCharToMultiByte() failed with error 0x%X\n", err);
goto Exit;
}
#if PRINT_WORKAROUND < 3
printf("%s", str);
#if PRINT_WORKAROUND == 2
fflush(stdout);
#endif
#else
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE),
str,
sz - 1,
&err,
NULL);
#endif
}
printf("\n");
}
printf("\n");
Exit:
return 0;
}
Output:
C:\>cons-utf8.exe
Workaround: 3
¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
I didn't find a way to print UTF-16 code points directly to the console in Windows XP that would work the same as above.
Is there any way of making a program in pure C that will read files, compare strings and print them out correctly on stdout regardless of the encoding of the files fed to the program?
No, the program obviously has to be told the encoding of the files too. Internally, you can choose to represent the data of the files with multibyte strings in UTF-8 or with wide strings.
I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but examining the binary chunks I read, it just doesn't look right; I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
// more stuff
buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc), and what do I store it in, is char[] okay?
EDIT: I'm able to read the binary in but none of the answers produce the bits in the correct order — they are all mangled up, I suspect endianness and problems reading and moving 8 bits around ( 1 char) at a time — this needs to work on Windows and C ... ?
What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns a filehandle, but it's the previous value (before we put it in binary mode), so don't use it for anything. After that, use fread() as has been mentioned.
As to your concerns, however, you may not be reading in "32 bits" but if you use fread() you will be reading in 4 chars (which is the best you can do in C - char is guaranteed to be at least 8 bits but some historical and embedded platforms have 16 bit chars (some even have 18 or worse)). If you use fgets() you will never read in 4 bytes. You will read in at most 3 (depending on whether any of them are newlines), and the 4th byte will be '\0' because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
Consider using SET_BINARY_MODE macro and setmode:
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
I had to piece the answer together from the various comments from the kind people above, so here is a fully working sample - Windows-only, but you can probably translate the Windows-specific stuff to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf, 1, nleft, f);
if (w == 0)
{
printf("error: unable to write %zu bytes to %s\n", nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}
For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
int result;
// Set "stdin" to have binary mode:
result = _setmode( _fileno( stdin ), _O_BINARY );
if( result == -1 )
perror( "Cannot set mode" );
else
printf( "'stdin' successfully changed to binary mode\n" );
}
fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
fread() suits best for reading binary data.
Yes, char array is OK, if you are planning to process them bytewise.
I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have not actually tried it myself. :)
I had it right the first time, except I needed ntohl ... C Endian Conversion : bit by bit