I have a file named output.txt and I want to print a root (√) symbol into it.
Here is my program:
#include <stdio.h>
#include <conio.h>
void main(void)
{
    FILE *fptr;
    fptr = fopen("output.txt", "w+"); // open the file for writing
    fprintf(fptr, "\xfb"); // \xfb is the hexadecimal code for the root symbol
    fclose(fptr);
}
but when I run the program, (û) is printed in it instead.
The problem you are encountering arises because you are attempting to use part of the extended ASCII set (i.e., characters above 127 in value). ASCII proper is generally a 7-bit encoding; when the most significant bit of an 8-bit character is set, the glyph it maps to depends on the active code page, which varies with region/locale, OS, and so on (Greek, Latin, etc.). Your byte \xfb is the root symbol under some code pages (such as IBM CP437) but û under others (such as Windows-1252), which is exactly what you are seeing.
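To see the code-page dependence directly, here is a minimal, Windows-specific sketch (it assumes <windows.h> is available and uses the Win32 SetConsoleOutputCP call; the code-page numbers are the standard identifiers for CP437 and Windows-1252):
#include <stdio.h>
#include <windows.h>

int main(void)
{
    SetConsoleOutputCP(437);   /* IBM CP437: byte 0xFB renders as the root symbol */
    printf("\xfb\n");
    SetConsoleOutputCP(1252);  /* Windows-1252: the same byte renders as u-circumflex */
    printf("\xfb\n");
    return 0;
}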
Attempting to use extended ASCII is not a portable approach, so your best alternative is to:
Make use of Unicode.
Make sure your C compiler is C99 compliant (for the \uXXXX universal character name syntax).
The following example resolves the original problem.
Source Code
#include <stdio.h>

int main(void) {
    FILE *fptr;
    fptr = fopen("output.txt", "w+"); // open the file for writing
    fprintf(fptr, "\u221A\n"); // U+221A SQUARE ROOT as a universal character name
    fclose(fptr);
    return 0;
}
Output from Sample Run
√
References
How to print Extended ASCII characters 127 to 160 in through a C program?, Accessed 2014-04-16, <https://stackoverflow.com/questions/16359225/how-to-print-extended-ascii-characters-127-to-160-in-through-a-c-program>
Unicode Character 'SQUARE ROOT' (U+221A), Accessed 2014-04-16, <http://www.fileformat.info/info/unicode/char/221a/index.htm>
I have some code that works perfectly fine on Linux, BUT on Windows it only works as expected if it is compiled under Cygwin, which emulates a Linux environment on Windows but is bad for portability (you must have Cygwin installed for the compiled binary to work). The program does the following:
Opens a document in read mode with ccs=UTF-8 and reads it character by character.
Writes the braille Unicode pattern (U+2800..U+28FF) corresponding to that letter, number, or punctuation mark to a 'dest' document (opened in write mode with ccs=UTF-8).
Significant code:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

const char *brai[26] = {
    "⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
    "⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
    "⠥","⠧","⠭","⠽","⠵","⠺"
};

int main(void) {
    setlocale(LC_ALL, "es_MX.UTF-8");
    FILE *source = fopen(origen, "r, ccs=UTF-8");
    FILE *dest = fopen(destino, "w, ccs=UTF-8");
    unsigned int letra;
    while ((letra = fgetc(source)) != EOF) {
        // This next line is the problem, I guess:
        fwprintf(dest, L"%s", "⠷"); // prints the braille sign directly as a char[]
        // OR prints it from an array that contains the exact same sign:
        fwprintf(dest, L"%s", brai[7]);
    }
}
The code works as expected on Linux every time, but not on Windows. I have tried everything, and nothing seems to get the output right. In the 'dest' document I get random chars like:
甥╩極肠─猀甥iꃢ¨.
The only way to print braille patterns to the doc so far on Windows was:
fwprintf(dest, L"⠷");
Which is not very useful (I would need an 'else if' for every case instead).
If you wish to see the full code, it's on Github:
https://github.com/oliver-almaraz/Texto_a_Braille
What I tried so far:
Changing the file open options to UTF-16LE and UNICODE.
Changing the fwprintf() arguments in every way I could imagine.
Changing the arrays containing the braille patterns to unsigned int.
Different compilers.
Here's a tested (with MSVC and MinGW on Windows), semi-working example.
#include <stdio.h>
#include <ctype.h>

const char *brai[26] = {
    "⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
    "⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
    "⠥","⠧","⠭","⠽","⠵","⠺"
};

int main(void) {
    char *origen = "a.txt";
    char *destino = "b.txt";
    FILE *source = fopen(origen, "r");
    FILE *dest = fopen(destino, "w");
    int letra;
    while ((letra = fgetc(source)) != EOF) {
        if (isupper(letra))
            fprintf(dest, "%s", brai[letra - 'A']);
        else if (islower(letra))
            fprintf(dest, "%s", brai[letra - 'a']);
        else
            fprintf(dest, "%c", letra);
    }
}
Note these things:
No locale, wide characters, or anything like that in sight; none of it is needed.
This code only translates English letters. No punctuation or numbers (I don't know nearly enough about braille to add that, but it should be straightforward).
Since the code only translates English letters and leaves everything else as is, it is OK to feed it a UTF-8 encoded file; it will just leave unrecognised characters untranslated. If you ever need to translate accented letters, you will need to learn a whole lot more about Unicode. Here is a good place to start.
Error handling omitted for brevity.
The source file must be saved in a charset the compiler understands. For MSVC, use either UTF-8 with BOM or UTF-16; alternatively, use UTF-8 without BOM together with the /utf-8 compiler switch, if your MSVC version recognises it. For MinGW, just use UTF-8.
This method will not work for standard console output on Windows. That is not a big problem, since the Windows console by default won't display braille characters anyway. It will, however, work for the msys console and many others.
Option 1: Use wchar_t and fwprintf. Make sure to save the source as UTF-8 with BOM, or save it as plain UTF-8 and use the /utf-8 switch to force the Microsoft compiler to assume UTF-8; otherwise MSVC assumes an ANSI encoding for the source file and you get mojibake.
#include <stdio.h>
#include <wchar.h>

const wchar_t brai[] = L"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";

int main(void) {
    FILE *dest = fopen("out.txt", "w, ccs=UTF-8");
    fwprintf(dest, L"%ls", brai); // %ls is the portable conversion for a wide string
    fclose(dest);
}
out.txt (encoded as UTF-8 w/ BOM):
⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺
Option 2: Use char and fprintf; save the source as UTF-8 (with or without BOM) and use the /utf-8 Microsoft compiler switch. The char string is stored in the execution character set, so that must be UTF-8 to get UTF-8 in the output file.
#include <stdio.h>

const char brai[] = "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";

int main(void) {
    FILE *dest = fopen("out.txt", "w");
    fprintf(dest, "%s", brai);
    fclose(dest);
}
Newer compilers can also use the u8"" prefix. The advantage is that you can use a different source encoding and the char string will still be UTF-8, as long as you use the appropriate compiler switch to indicate the source encoding.
const char brai[] = u8"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";
For reference, these are the Microsoft compiler options:
/source-charset:<iana-name>|.nnnn set source character set
/execution-charset:<iana-name>|.nnnn set execution character set
/utf-8 set source and execution character set to UTF-8
What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms ignore the "t" ("text mode") option when dealing with streams. On Windows, however, this is not the case. If you look at the description of the fopen() function on MSDN, you will see that specifying the "t" option has the following effects:
Line feeds ('\n') are translated to "\r\n" sequences on output.
Carriage return/line feed sequences are translated to line feeds on input.
If the file is opened in append mode, the end of the file is examined for a Ctrl-Z character (character 26), and that character is removed if possible. Its presence is also interpreted as meaning end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children unto the third or fourth generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open files in binary mode. Trying to read binary data in text mode won't work; it will be corrupted. You can read text fine in binary mode, though; it just won't do the automatic translation of "\r\n" to "\n" on input.
See the fopen documentation.
Additionally, when you fopen a file with "rt", the input is terminated at a Ctrl-Z character.
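Putting this advice into practice, here is a minimal sketch (the file name is illustrative) that reads a text file in binary mode and strips carriage returns by hand, sidestepping both the newline translation and the Ctrl-Z behaviour:
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("test.txt", "rb"); /* binary mode: no translation, no Ctrl-Z handling */
    if (fp == NULL)
        return 1;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r')
            continue;                   /* drop CR bytes; keep LF as the line ending */
        putchar(c);
    }
    fclose(fp);
    return 0;
}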
Another difference arises when using fseek:
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
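In other words, the only portable seeking pattern on a text stream is the one sketched below (the file name is illustrative):
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("test.txt", "r"); /* text mode */
    if (fp == NULL)
        return 1;
    long pos = ftell(fp);              /* remember a position */
    fgetc(fp);                         /* ... read some data ... */
    fseek(fp, pos, SEEK_SET);          /* OK: offset came from ftell */
    fseek(fp, 0, SEEK_END);            /* OK: offset zero works with any origin */
    fclose(fp);
    return 0;
}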
Even though this question has already been answered and clearly explained, I think it would be interesting to show the main issue (the translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>

int main() {
    FILE *f;
    char string[] = "A\nB";
    int len;

    len = strlen(string);
    printf("As you'd expect string has %d characters... ", len); /* prints 3 */

    f = fopen("test.txt", "w"); /* text mode */
    fwrite(string, 1, len, f); /* on Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course, you can also open the file with a text editor like Notepad++ and see the characters yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
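A small sketch of that inverse conversion (assuming test.txt was written by the program above): reading the file back in text mode on Windows yields 3 characters again, because "\r\n" collapses to '\n' on input.
#include <stdio.h>

int main(void)
{
    char buf[8] = {0};
    FILE *f = fopen("test.txt", "r");            /* text mode */
    if (f == NULL)
        return 1;
    size_t n = fread(buf, 1, sizeof buf - 1, f); /* reads "A\nB" on Windows */
    printf("read %u characters\n", (unsigned)n); /* prints 3, not 4 */
    fclose(f);
    return 0;
}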
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was to be able to store our current position in the file (we used fgetpos), close the file, and then later reopen the file and seek to that position (we used fsetpos).
However, where a file had a mixture of line endings, this process failed to seek to the same actual position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary mode - then you can control exactly what is read from and written to the file.
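Here is a sketch of the save/close/reopen/restore pattern described above, using binary mode so the stored position is reliable regardless of line endings (the file name is illustrative):
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("input.cpp", "rb"); /* binary: positions are plain byte offsets */
    if (fp == NULL)
        return 1;
    fgetc(fp);                           /* ... parse part of the file ... */
    fpos_t pos;
    fgetpos(fp, &pos);                   /* remember where we stopped */
    fclose(fp);

    fp = fopen("input.cpp", "rb");       /* later: reopen ... */
    if (fp == NULL)
        return 1;
    fsetpos(fp, &pos);                   /* ... and resume at the same byte */
    fclose(fp);
    return 0;
}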
In "w" mode, the file is opened in text mode, so on Windows newline translation is applied when writing.
In "wb" mode, the file is opened in write-binary mode: bytes are written untranslated, which is what you want for special characters and binary data. Note that neither mode implies a particular character encoding such as UTF-8 or UTF-16LE; the bytes you write are the bytes you get.
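A minimal sketch showing that on Windows the only difference is the newline translation (the file names are just examples):
#include <stdio.h>

int main(void)
{
    FILE *t = fopen("t.txt", "w");  /* text mode */
    FILE *b = fopen("b.txt", "wb"); /* binary mode */
    if (t == NULL || b == NULL)
        return 1;
    fputc('\n', t);                 /* written as "\r\n" on Windows: 2 bytes */
    fputc('\n', b);                 /* written as-is: 1 byte everywhere */
    fclose(t);
    fclose(b);
    return 0;
}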
Here's my code:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("img.ppm", "w");
    fprintf(fp, "%c", 10);
    fclose(fp);
    return 0;
}
For some reason that I am unable to uncover, this writes 2 bytes to the file, 0x0D 0x0A, while the behaviour I would expect is for it to just write 0x0A, which is 10 in decimal. It seems to work fine with every other value between 0 and 255 inclusive; it just writes one byte to the file. I am completely lost. Any help?
Assuming you are using the Windows C runtime library, newline characters ('\n', decimal 10) are written as "\r\n", i.e. 13 10, which is 0x0D 0x0A. This is the only character that's actually written as two bytes (by software compiled using the Windows toolchain), and only when the file is opened in text mode.
You need to open the file with fopen("img.ppm","wb") to write binary.
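For reference, a sketch of the corrected program, with the only change being the mode string:
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("img.ppm", "wb"); /* "b" suppresses the \n -> \r\n translation */
    if (fp == NULL)
        return 1;
    fprintf(fp, "%c", 10);             /* now writes exactly one byte, 0x0A */
    fclose(fp);
    return 0;
}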
I need to write some binary data into a file. The format is uint64_t.
#include <stdio.h>
#include <assert.h>

typedef unsigned long long uint64_t;

int main()
{
    FILE *file = fopen("data", "w");
    assert(file);
    uint64_t a[] = {
        16000550, 1051320, 14456018, 4743184, 11840752, 4225032,
        13642264, 6059108, 563784, 11823354, 3989084, 15759410,
        13413018, 1582802, 1574952, 1635384, 1102996, 10511428,
        10239562, 9472574, 2641952, 1350256, 3432142, 9920, 11573360,
        12121180, 10255874, 3198684, 7628524, 16522766, 12908660,
        2681374, 9482820, 6354462, 15230702, 16255676, 5813862,
        8174782, 7642752, 7362790, 6089340, 803928, 2669686, 4225032,
        7603956, 16551562, 15734364, 14424308, 12060282, 572450,
        18432, 10276902, 8134910, 10749010, 14166126, 1636942,
        5295788, 12342876, 2151156, 12322948
    };
    for (int i = 0; i < sizeof(a) / sizeof(uint64_t); i++) {
        fwrite((char *)&a[i], sizeof(uint64_t), 1, file);
    }
    fclose(file);
}
I found that the output doesn't match my expectation only when the array is large, so I use 60 uint64_t values in my example.
In testing, I found it outputs 0000 fe20 7c00 0000 for 8134910, and there are other errors too. With GCC it works well; with VS2012 it works badly.
Based on your feedback in the comments, the reason it's different in VS2012 is that the file has been opened by default in "text" mode. In this mode, each \n byte that is written will be expanded to \r\n, which corrupts your data.
The solution is to explicitly open the file in binary mode:
FILE *file = fopen("data", "wb");
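And for completeness, a sketch of reading the values back, mirroring the write with "rb" (using the standard <stdint.h> rather than a hand-rolled typedef):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *file = fopen("data", "rb"); /* binary mode for reading too */
    if (file == NULL)
        return 1;
    uint64_t v;
    while (fread(&v, sizeof v, 1, file) == 1)
        printf("%llu\n", (unsigned long long)v);
    fclose(file);
    return 0;
}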
Quoting from MSDN regarding the t and b characters that may be appended to the mode:
t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as an EOF character on input. In files that are opened for reading/writing by using "a+", fopen checks for a CTRL+Z at the end of the file and removes it, if possible. This is done because using fseek and ftell to move within a file that ends with CTRL+Z may cause fseek to behave incorrectly near the end of the file.
In text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).
b
Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.
If t or b is not given in mode, the default translation mode is defined by the global variable _fmode.
The MSDN documentation for _fmode says:
The default setting of _fmode is _O_TEXT for text-mode translation. _O_BINARY is the setting for binary mode.
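So, as a sketch (MSVC-specific; _fmode is declared in <stdlib.h> and _O_BINARY in <fcntl.h>), you can make binary the process-wide default instead of adding "b" to every fopen call:
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    _fmode = _O_BINARY;              /* fopen calls without "t" now default to binary */
    FILE *file = fopen("data", "w"); /* effectively "wb" */
    if (file == NULL)
        return 1;
    fputc('\n', file);               /* one byte, no \r inserted */
    fclose(file);
    return 0;
}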
I am working with the TTCN-3 (Testing and Test Control Notation) scripting language. I want to prepare a guideline checker for these code files.
For that I want to read the lines of a TTCN-3 script file (something like file.ttcn) one by one into a buffer. But fopen / sopen / open / fgetc / fscanf are not working properly for me and are not reading the file correctly; fopen is returning NULL. Is there any way I can read its characters into a buffer? I think C cannot read files with more than three characters in the extension (like .ttcn). Forgive me if my assumption is wrong.
My environment is Turbo C on Windows.
Edit:
Yes, I checked those errors also, but they give "unknown error" for read() and "no such file or directory exists".
My code is as follows:
#include <errno.h>
#include <io.h>
#include <fcntl.h>
#include <sys\stat.h>
#include <process.h>
#include <share.h>
#include <stdio.h>
#include <conio.h>

int main(void)
{
    int handle;
    int status;
    int i = 0;
    char ch;
    FILE *fp;
    char *buffer;
    char *buf;

    clrscr();
    handle = sopen("c:\\tc\\bin\\hi.ttcn", O_BINARY, SH_DENYNONE, S_IREAD);
    /* here I also tried O_TEXT and others */
    if (!handle)
    {
        printf("sopen failed\n");
        // exit(1);
    }
    printf("\nObtained string %s #", buf);
    close(handle);

    fp = fopen("c:\\tc\\bin\\hi.ttcn", "r"); // sorry for the old version with one slash
    if (fp == NULL)                          // I was doing it with argv[1] for opening
    {                                        // a user-given file name
        printf("\nCannot open file");
    }
    ch = fgetc(fp);
    i = 0;
    while (i < 10)
    {
        printf("\ncharacter is %c %d", ch, ch);
        i++;            // Here I wanted to take characters into
        ch = fgetc(fp); // a buffer
    }
    getch();
    return 0;
}
The most likely culprit is your compiler, Turbo C, which is ancient. It's technically a DOS compiler, not a Windows one, which would limit its runtime library to 8.3 filenames. Upgrade to something newer: Turbo C++ seems like a logical successor, but Microsoft's VC++ Express would work as well.
Your assumption about extensions is wrong. If fopen is returning NULL, you should output the result of strerror(errno) or use the perror() function to see why it failed.
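For example, a minimal sketch of that diagnostic (the path is the asker's; note the doubled backslashes):
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    FILE *fp = fopen("c:\\tc\\bin\\hi.ttcn", "r");
    if (fp == NULL) {
        printf("fopen failed: %s\n", strerror(errno));
        perror("fopen"); /* or equivalently, let perror format the message */
        return 1;
    }
    fclose(fp);
    return 0;
}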
Edit: The problem is probably that you wrote "c:\tc\bin\hi.ttcn". In C, "\t" is interpreted as a tab, for example.
You could do
"c:\\tc\\bin\\hi.ttcn"
But this is extremely ugly, and your system should accept:
"c:/tc/bin/hi.ttcn"
MS-DOS does not know about long file names, those including files with extensions longer than 3 characters. Therefore, the CRT provided by Turbo C most probably does not look for the name you are providing, but for a truncated one, or something else entirely.
Windows conveniently provides a short file name (matching the 8.3 format, usually ending in ~1 unless you play with files having the same 8-character prefix) for those; one way to discover it is to open a console window and run "dir /x" in the folder where your file is stored.
Find the short name associated with your file and patch it into your C source file.
Edit: Darn, I'll read the comments next time. All credits to j_random_hacker.
Now that you've posted the code, another problem comes to light.
The following line:
fp=fopen("c:\tc\bin\hi.ttcn","r");
Should instead read:
fp=fopen("c:\\tc\\bin\\hi.ttcn","r");
In C strings, the backslash (\) is an escape character that is used to encode special characters (e.g. \n represents a newline character, \t a tab character). To actually use a literal backslash, you need to double it. As it stands, the compiler is actually trying to open a file named "C:<tab>c<backspace>in\hi.ttcn" -- needless to say, no such file exists!