Trying to read a text file with emoji and print it - C

Input -> 😂😂
Output-> 😂😂
I simply want to maintain the original state of the emoji.
All I am doing is this:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char ch;
FILE *fp;
fp = fopen("test.txt","r");
while( ( ch = fgetc(fp) ) != EOF )
printf("%c",ch);
fclose(fp);
return 0;
}

In Unicode encodings such as UTF-8, an emoji takes more than one byte, so printing byte by byte with %c only works if the terminal reassembles those bytes using the same encoding as the file. If you redirect the output to a file, you should get the same bytes as in your input file.
You can try setting the locale with setlocale() (on Linux), or use wprintf on Windows (remember to convert to a wide string first).
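To illustrate why a byte-by-byte view splits an emoji, here is a minimal sketch, assuming the file is UTF-8 encoded (the question does not say so). The helper name utf8_sequence_length is made up for this example; it reports how many bytes a character occupies, judging only by its first byte.

```c
/* Hypothetical helper (not part of any library): returns how many bytes
 * a UTF-8 encoded character occupies, judging by its first byte.
 * Returns 0 for a continuation byte or an invalid lead byte.
 * It does not validate overlong or out-of-range sequences. */
int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx              */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx              */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx: most emoji  */
    return 0;                             /* 10xxxxxx or invalid   */
}
```

For 😂 (U+1F602, bytes F0 9F 98 82 in UTF-8) the lead byte 0xF0 gives 4, so each printf("%c") call in the question emits a quarter of the character. That is harmless when the terminal decodes UTF-8, and shows up as garbage when the console uses a different code page.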

Related

Problem with the first character of the binary file while reading it in a C file

I am trying to read a binary file and its content.
/* aoObj.fp is the file pointer (e.g. FILE *fp) */
char ch;
aoObj.fp = fopen(aoObj.f_name, "rb");
if (aoObj.fp == NULL)
{
perror("Error while opening the file.\n");
exit(EXIT_FAILURE);
}
/* /\*list all strings *\/ */
printf("\n\nThe content of the file: \n");
while ((ch = fgetc(aoObj.fp)) != EOF)
printf("%c", ch);
fclose(aoObj.fp);
(void) opt_free(&aoObj);
return 0;
}
But I am facing an issue when I print the content of the file: the first character of the input is not printed correctly.
May I know why this is happening?
EDIT: All the variables which are being read are declared as strings.
The OP states the file contents are 'binary', not 'text'. Therefore, the file should be accessed via the I/O functions made for binary files.
Suggest:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
Since the data read from the 'binary' file is not necessarily ASCII characters, it is a mistake to try to print those bytes with the %c conversion specifier.
Suggest:
printf( "%02x\n", ch );
Note: %02x is used so that a leading zero nibble is printed rather than suppressed.
When the code is corrected to use fread() rather than fgetc(), ch can be declared as unsigned char ch; there is no need to change it to int ch.
The following proposed code:
cleanly compiles
performs the desired functionality
is missing a main() function and the passing of the parameter: f_name so does not link
properly checks for an error when opening the input file
uses the returned value from fread() to 'assume' EOF, however, it may be instructive (and for robust code) to check the value of errno just to assure there was no other error.
documents why each header file is included
Note: the proposed code is not very efficient as it only reads a single byte at a time rather than a whole buffer full of bytes
Note: the proposed code will output one byte contents (in hex) on a single line. You might want to modify that to output several bytes contents (in hex) before moving to a new line.
and now, the proposed code:
#include <stdio.h> // FILE, fopen(), perror(), printf(), fclose()
// fread()
#include <stdlib.h> // exit(), EXIT_FAILURE
void myfunc( char *f_name )
{
unsigned char ch;
FILE *fp = fopen( f_name, "rb");
if (fp == NULL)
{
perror("Error while opening the file.\n");
exit(EXIT_FAILURE);
}
/* /\*list all strings *\/ */
printf("\n\nThe content of the file: \n");
size_t bytesRead;
while ( ( bytesRead = fread( &ch, 1, 1, fp ) ) == 1 )
{
printf("%02x\n", ch);
}
fclose(fp);
}
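The two notes above (one byte per read, one byte per output line) could be addressed along these lines. This is only a sketch: the function name hex_dump, the 4096-byte buffer and the 16-bytes-per-line layout are choices made for this example, and it uses ferror() rather than errno to detect a read error.

```c
#include <stdio.h>

/* Hypothetical helper: dumps `in` to `out` as hex, 16 bytes per line,
 * reading a whole buffer at a time instead of a single byte.
 * Returns the number of bytes dumped, or -1 on a read error. */
long hex_dump(FILE *in, FILE *out)
{
    unsigned char buf[4096];
    size_t n;
    long total = 0;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++, total++) {
            if (total % 16 != 0)
                fputc(' ', out);          /* separator inside a line */
            fprintf(out, "%02x", buf[i]);
            if (total % 16 == 15)
                fputc('\n', out);         /* line is full */
        }
    }
    if (total % 16 != 0)
        fputc('\n', out);                 /* finish a partial last line */
    return ferror(in) ? -1 : total;
}
```

A caller would open the file with fopen(f_name, "rb") as in the code above and pass the resulting pointer together with stdout.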

Store text from file in character array using fread()

Here is a minimal "working" example:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
int num = 10;
FILE* fp = fopen("test.txt", "r"); // test.txt contains character sequence
char* ptr = (char*) malloc(sizeof (char)*(num+1)); // +1 for '\0'
fread(ptr, sizeof(char), num, fp); // read bytes from file
ptr[num] = '\0';
printf("%s\n", ptr); // output: ´╗┐abcdefg
free(ptr);
fclose(fp);
return 0;
}
I would like to read some letters from a text file, containing all letters from the alphabet in a single line. I want my array to store the first 10 letters, but the first 3 shown in the output are weird symbols (see the comment at the printf statement).
What am I doing wrong?
The issue is that your file is encoded using UTF-8. While UTF-8 is backwards-compatible with ASCII (which is what your code effectively assumes), there are differences.
In particular, many programs put a BOM (Byte Order Mark) at the start of the file. In UTF-8 the BOM is the three bytes EF BB BF; it does not indicate a byte order there and only marks the file as UTF-8. If you print those bytes using the default Windows console code page, you get the three symbols you saw.
Whatever program you used to create your text file was automatically inserting that BOM at the start of the file. Notepad++ is notorious for doing this. Check the save options and make sure to save either as plain ASCII or as UTF-8 without BOM. That will solve your problem.
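If re-saving the file is not an option, the program can also skip the BOM itself. The following is a minimal sketch, assuming a seekable stream that was just opened; skip_utf8_bom is a made-up name for this example.

```c
#include <stdio.h>

/* Hypothetical helper: if the stream starts with the UTF-8 BOM
 * (bytes EF BB BF), leave the read position just after it;
 * otherwise rewind to the start of the stream.
 * Returns 1 if a BOM was skipped, 0 otherwise.
 * Intended to be called right after fopen(). */
int skip_utf8_bom(FILE *fp)
{
    unsigned char bom[3];
    size_t n = fread(bom, 1, sizeof bom, fp);

    if (n == 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return 1;
    rewind(fp);  /* no BOM: start over so no data is lost */
    return 0;
}
```

In the question's program this would be called between fopen() and fread(), so that the ten bytes read are the first ten letters.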

File in c language

I need help with my code for an assignment.
Suppose an encrypted file was created using the following encoding/decoding scheme.
Each letter is substituted by some other letter according to a given mapping as shown below.
char * letters = "abcdefghijklmnopqrstuvwxyz";
char * enc = "kngcadsxbvfhjtiumylzqropwe";
For example, every a becomes a k when encoding a text, and every k becomes an a when decoding.
You will write a program that encodes or decodes a file using the mapping above.
Capital letters are mapped the same way as the lower case letters above, but remain capitalized.
For example, every 'A' becomes 'K' when encoding a file, and every 'K' becomes an 'A' when decoding.
Numbers and other characters are not encoded and remain the same.
Write a program to read a file and encode the file to an encrypted file.
And write a program to get an encrypted file and decode to original file.
Your program should prompt the user to enter an input file name and an output file name.
Ask for the input file name and the output file name (the encrypted file), then encrypt using the mapping above.
Ask for the encrypted file and decode it back to the original input file.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main()
{
char letters[]={"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
char enlet[]={"kngcadsxbvfhjtiumylzqropweKNGCADSXBVFHJTIUMYLZQROPWE"};
char infile[20];
char outfile[20];
char ch;
int i;
FILE *org, * enc, *dec;
printf("Enter file name (***.txt) : ");
gets(infile);
printf("Enter saving file name (***.txt) : ");
gets(outfile);
org = fopen(infile,"r");
enc = fopen(outfile,"w+");
while((ch=fgetc(org))!=EOF)
{
for(i=0;i<52;i++)
{
if(letters[i]==ch)
{
ch=enlet[i];
}
}
fputc(ch,enc);
}
fclose(org);
fclose(enc);
return 0;
}
This code runs, but the letters are not substituted correctly.
If the original file contains "abcdefghijklmnopqrstuvwxyz",
the encoded file contains "felcadlpbrfhjeiqmwleqropwe".
I expected "kngcadsxbvfhjtiumylzqropwe".
I don't know what the errors in my code are.
Your if block should read:
if ( letters[i]==ch )
{
ch = enlet[i];
break;
}
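so that ch is not replaced twice. That is, as soon as you know the substitution for the current input character, break and move on. Without the break, 'a' becomes 'k' at i = 0, and that 'k' is then matched again at i = 10 and becomes 'f', which is why your output starts with 'f'.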
Inside this loop, you overwrite ch after it has been replaced.
while((ch=fgetc(org))!=EOF)
{
for(i=0;i<52;i++)
{
if(letters[i]==ch)
{
ch=enlet[i];
}
}
fputc(ch,enc);
}
You could do one of two things:
Instead of assigning ch=enlet[i], write the substitution directly with fputc(enlet[i], enc) (and then do not also write ch for that character)
or
break out of the loop as soon as you have found a match.
You could skip the for() loop and just use:
if( org && enc )
while( (ch=fgetc(org))!=EOF)
{
char *p = strchr( letters, ch );
fputc( (p)?enlet[p-letters]:ch, enc );
}
Also, you really should declare ch as an int so that it can be compared to EOF. gets() is a buffer overflow waiting to happen, which can crash your program or provide a security exploit hook; use fgets() instead and remember to strip the trailing newline. And you never check whether org and enc are non-NULL, i.e. whether the files were opened successfully.
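Putting these suggestions together, a sketch of the substitution might look like the following. The function names substitute and translate_stream are invented for this example; the two tables are the ones from the question. Because each character is looked up exactly once, the double replacement cannot happen, and swapping the two tables gives the decoder.

```c
#include <stdio.h>
#include <string.h>

const char letters[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const char enlet[]   = "kngcadsxbvfhjtiumylzqropweKNGCADSXBVFHJTIUMYLZQROPWE";

/* Hypothetical helper: maps ch through the table `from` -> `to`.
 * Characters not found in `from` are returned unchanged. */
int substitute(int ch, const char *from, const char *to)
{
    /* ch > 0 keeps strchr() from matching the terminating '\0' */
    const char *p = (ch > 0) ? strchr(from, ch) : NULL;
    return p ? to[p - from] : ch;
}

/* Hypothetical helper: copies `in` to `out`, substituting each
 * character exactly once. Pass (letters, enlet) to encode and
 * (enlet, letters) to decode. */
void translate_stream(FILE *in, FILE *out, const char *from, const char *to)
{
    int ch;  /* int, not char, so EOF is distinguishable from data */
    while ((ch = fgetc(in)) != EOF)
        fputc(substitute(ch, from, to), out);
}
```

The caller is still expected to check that both fopen() calls succeeded before passing the streams in.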

C echo user input

So I am a very beginner to C programming (I have used Ruby, Python and Haskell before) and I am having trouble getting the most simple thing to work in C (probably because of all the manual memory stuff). Anyway, what I am trying to do is (using simple constructs) make a script that just echoes what the user inputs to the console.
e.g. user inputs hi, console prints hi.
This is what I came up with.
Also, I haven't really mastered pointers, so none of that.
// echo C script
int echo();
int main() {
echo();
return 0;
}
int echo() {
char input[500];
while (1) {
if (scanf("%[^\n]", input) > 0) {
printf("%s\n", input);
}
input[0] = 0;
}
return 1;
}
I realize that there is a bunch of bad practices here, like setting a giant string array, but that is just for simplifying it.
Anyway, my problem is that it repeats the first input then the input freezes. As far as I can tell, it freezes during the while loop (1 is never returned).
Any help would be appreciated.
Oh, and using TCC as the compiler.
You don't need an array for echo:
#include <stdio.h>
int main(void)
{
int c;
while((c = getchar()) != EOF) putchar(c);
return 0;
}
It's fine to have such a large string allocated, as long as users might input a string of that length. What I would use for input is fgets. Proper usage in your situation, assuming you still want the 500-character buffer, would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int echo(){
char input[500];
while(fgets(input, 500, stdin)){ //read from stdin (aka command-line)
printf("%s", input); //print out what user typed in (fgets keeps the newline)
memset(input, 0, strlen(input)); //reset string to all 0's
}
return 1;
}
Note that changing the value of 500 to whatever smaller number (I would normally go with some power of 2 by convention, like 512, but it doesn't really matter) will limit the length of the user's input to that number. Also note that I didn't test my code but it should work.
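scanf("%[^\n]", input)
Should be:
scanf("%499s", input)
(%s skips leading whitespace, including the newline left over from the previous line, but it reads only one whitespace-delimited word per call.)
Then after your if you should do:
memset(input,0,500);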
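The loop in the question stalls because %[^\n] stops in front of the newline and never consumes it, so every later call fails on that same newline. If whole lines (rather than single words) should be echoed with the original conversion, one possible sketch is the following; echo_lines is a made-up name, and it takes the streams as parameters only so that it is easy to try out.

```c
#include <stdio.h>

/* Hypothetical helper: echoes `in` to `out` line by line using the
 * question's %[^\n] conversion. Empty lines are skipped.
 * Returns the number of lines echoed. */
int echo_lines(FILE *in, FILE *out)
{
    char input[500];
    int lines = 0;

    for (;;) {
        /* 499 leaves room for the terminating '\0'. */
        int matched = fscanf(in, "%499[^\n]", input);
        if (matched == EOF)
            break;
        if (matched == 1) {
            fprintf(out, "%s\n", input);
            lines++;
        }
        /* %[^\n] stops in front of the newline; remove it so the
         * next call does not fail on it again. */
        int c = fgetc(in);
        if (c != '\n' && c != EOF)
            ungetc(c, in);  /* line was longer than the buffer */
    }
    return lines;
}
```

Calling echo_lines(stdin, stdout) from main() gives the behaviour the question asks for; the loop ends when the user sends EOF.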
There are many ways of accomplishing this task; the easiest is to read from stdin one byte at a time and write each byte to stdout as you go.
Snippet:
#include <stdio.h>
int main( void ) {
// Iterates until EOF is sent.
for ( int byte = getchar(); byte != EOF; byte = getchar() ) {
// Outputs to stdout the byte.
putchar( byte );
}
return 0;
}
Remark:
You must store the value returned by getchar() in an int. getchar() returns either a byte value converted from unsigned char or EOF, which is negative, and a char cannot hold all of those distinctly. On top of that, you are not guaranteed whether plain char is signed or unsigned (there are in fact three char types in C: char, signed char and unsigned char). Include <limits.h> and check CHAR_MIN to determine whether char is signed in your environment.
You must compile as C99 or later; otherwise, move the declaration of byte outside of the for loop.

Why is file size different when the files have the same number of characters?

Here when I get file size using stat() it gives different output, why does it behave like this?
When "huffman.txt" contains a simple string like "Hi how are you" it gives file_size = 14. But when "huffman.txt" contains a string like "ά­SUä5Ñ®qøá"F" it gives file size = 30.
#include <sys/stat.h>
#include <stdio.h>
int main()
{
int size = 0;
FILE* original_fileptr = fopen("huffman.txt", "rb");
if (original_fileptr == NULL) {
printf("ERROR: fopen fail in %s at %d\n", __FUNCTION__, __LINE__);
return 1;
}
/*create variable of stat*/
struct stat stp = { 0 };
stat("huffman.txt", &stp);
/*determine the size of data which is in file*/
int filesize = stp.st_size;
printf("\nFile size is %d\n", filesize);
}
This has to do with encoding.
Plain English text is typically encoded in ASCII, where each character is one byte.
Characters outside ASCII are encoded differently; in UTF-8, for example, each of them takes two to four bytes.
The easiest way to see what is happening is to print each byte of the file in hex:
int c;
/* Read the file one byte at a time. */
while ((c = fgetc(original_fileptr)) != EOF)
printf("%02x ", c);
You'll understand why the file size is different.
If you're asking why different strings with the same number of characters can have different sizes in bytes, read up on UTF-8.
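The difference between "number of characters" and "number of bytes" can be made concrete with a small sketch. It assumes the text is valid UTF-8 (the question does not say which encoding the file uses), and utf8_count is a made-up name for this example.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters (code points) in a UTF-8
 * string and stores its length in bytes in *bytes.
 * Assumes the input is valid UTF-8. */
size_t utf8_count(const char *s, size_t *bytes)
{
    size_t chars = 0, n = 0;

    for (; s[n] != '\0'; n++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
         * starts a new character. */
        if (((unsigned char)s[n] & 0xC0) != 0x80)
            chars++;
    }
    *bytes = n;
    return chars;
}
```

stat() reports st_size in bytes, so a file whose text contains non-ASCII characters is larger than its character count, while for pure ASCII text such as "Hi how are you" the two numbers agree.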
