File in c language - c

I need help with my code for one of my assignments.
Suppose an encrypted file was created using the following encoding/decoding scheme.
Each letter is substituted by some other letter according to a given mapping as shown below.
char * letters = "abcdefghijklmnopqrstuvwxyz";
char * enc = "kngcadsxbvfhjtiumylzqropwe";
For example, every a becomes a k when encoding a text, and every k becomes an a when decoding.
You will write a program that encodes or decodes a file using the mapping above.
Capital letters are mapped the same way as the lower case letters above, but remain capitalized.
For example, every 'A' becomes 'K' when encoding a file, and every 'K' becomes an 'A' when decoding.
Numbers and other characters are not encoded and remain the same.
Write a program to read a file and encode the file to an encrypted file.
And write a program to get an encrypted file and decode to original file.
Your program should prompt the user to enter an input file name and an output file name.
Ask for the input file name and the output file name (the encrypted file), then encrypt using the mapping above.
Ask for the encrypted file and decode it back to the original input file.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main()
{
    char letters[]={"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
    char enlet[]={"kngcadsxbvfhjtiumylzqropweKNGCADSXBVFHJTIUMYLZQROPWE"};
    char infile[20];
    char outfile[20];
    char ch;
    int i;
    FILE *org, * enc, *dec;
    printf("Enter file name (***.txt) : ");
    gets(infile);
    printf("Enter saving file name (***.txt) : ");
    gets(outfile);
    org = fopen(infile,"r");
    enc = fopen(outfile,"w+");
    while((ch=fgetc(org))!=EOF)
    {
        for(i=0;i<52;i++)
        {
            if(letters[i]==ch)
            {
                ch=enlet[i];
            }
        }
        fputc(ch,enc);
    }
    fclose(org);
    fclose(enc);
    return 0;
}
This code runs, but the letters are not changed correctly.
If my original file contains "abcdefghijklmnopqrstuvwxyz",
the encoded file contains "felcadlpbrfhjeiqmwleqropwe".
I expected it to be "kngcadsxbvfhjtiumylzqropwe".
I don't know what the errors in my code are.

Your if block should read:
if ( letters[i]==ch )
{
    ch = enlet[i];
    break;
}
so that ch is not replaced twice. I.e., the moment you know the substitution for that input file position, break, and move on.

Inside this loop, you overwrite ch after it has been replaced.
while((ch=fgetc(org))!=EOF)
{
    for(i=0;i<52;i++)
    {
        if(letters[i]==ch)
        {
            ch=enlet[i];
        }
    }
    fputc(ch,enc);
}
You could do one of two things:
Instead of assigning ch=enlet[i], write the substitute directly with fputc(enlet[i], enc), and call fputc(ch, enc) only when no match was found
or
Break out of the loop as soon as you have found a match

You could skip the for() loop and just use:
if( org && enc )
    while( (ch=fgetc(org))!=EOF)
    {
        char *p = strchr( letters, ch );
        fputc( (p)?enlet[p-letters]:ch, enc );
    }
Also, you really should declare ch as an int so that it can be compared to EOF. gets() is a buffer overflow waiting to happen: it can crash your program or provide a security exploit hook, so use fgets() instead and remember to strip the trailing newline. Finally, you never check that org and enc are not NULL, that is, that the files were opened successfully.

Related

Stdin + Dictionary Text Replacement Tool -- Debugging

I'm working on a project in which I have two main files. Essentially, the program reads in a text file defining a dictionary with key-value mappings. Each key has a unique value and the file is formatted like this where each key-value pair is on its own line:
ipsum i%##!
fubar fubar
IpSum XXXXX24
Ipsum YYYYY211
Then the program reads input from stdin, and if any of the "words" match the keys in the dictionary file, they are replaced with the corresponding value. There is a subtlety with upper and lower case. This is the order of "match priority":
1. The exact word is in the replacement set
2. The word with all but the first character converted to lower case is in the replacement set
3. The word converted completely to lower case is in the replacement set
Meaning if the exact word is in the dictionary, it gets replaced, but if not the next possibility (2) is checked and so on...
My program passes the basic cases we were provided but then the terminal shows
that the output vs reference binary files differ.
I went into both files (not C files, but binary files), and one was very long with tons of numbers while the other just had a line of random characters, so that didn't really help. I also reviewed my code and made some small tests, but it seems okay. A friend recommended I make sure I'm accounting for the null terminator in processInput(), and I already was (or at least I think so, correct me if I'm wrong). I also stored the result of getchar() in an int to properly check for EOF, and allocated extra space for the char array. I also tried vimdiff and got more confused. I would love some help debugging this, please! I've been at it all day and I'm very confused.
There are multiple issues in the processInput() function:
the loop should not stop when the byte read is 0, you should process the full input with:
while ((ch = getchar()) != EOF)
the test for EOF should actually be done differently so the last word of the file gets a chance to be handled if it occurs exactly at the end of the file.
the cast in isalnum((char)ch) is incorrect: you should pass ch directly to isalnum. Casting as char is actually counterproductive because it will turn byte values beyond CHAR_MAX to negative values for which isalnum() has undefined behavior.
the test if(ind >= cap) is too loose: if word contains cap characters, setting the null terminator at word[ind] will write beyond the end of the array. Change the test to if (cap - ind < 2) to allow for a byte and a null terminator at all times.
you should check that there is at least one character in the word to avoid calling checkData() with an empty string.
char key[ind + 1]; is useless: you can just pass word to checkData().
checkData(key, ind) is incorrect: you should pass the size of the buffer for the case conversions, which is at least ind + 1 to allow for the null terminator.
the cast in putchar((char)ch); is useless and confusing.
There are some small issues in the rest of the code, but none that should cause a problem.
Start by testing your tokeniser with:
$ ./a.out <badhash2.c >zooi
$ diff badhash2.c zooi
$
Does it work for binary files, too?:
$ ./a.out <./a.out > zooibin
$ diff ./a.out zooibin
$
Yes, it does!
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
void processInput(void);
int main(int argc, char **argv) {
    processInput();
    return 0;
}
void processInput() {
    int ch;
    char *word;
    int len = 0;
    int cap = 60;
    word = malloc(cap);
    while(1) {
        ch = getchar(); // (1)
        if( ch != EOF && isalnum(ch)) { // (2)
            if(len+1 >= cap) { // (3)
                cap += cap/2;
                word = realloc(word, cap);
            }
            word[len++] = ch;
        } else {
            if (len) { // (4)
#if 0
                char key[len + 1];
                memcpy(key, word, len); key[len] = 0;
                checkData(key, len);
#else
                word[len] = 0;
                fputs(word, stdout);
#endif
                len = 0;
            }
            if (ch == EOF) break; // (5)
            putchar(ch);
        }
    }
    free(word);
}
I only repaired your tokeniser, leaving out the hash table and the search & replace stuff. It is now supposed to generate a verbatim copy of the input. (which is silly, but great for testing)
(1) If you want to allow binary input, you cannot use while((ch = getchar()) ...): a NUL in the input would cause the loop to end. You must postpone testing for EOF (... && ch != EOF), because there could still be a final word in your buffer.
(2) Treat EOF just like a space here: it could be the end of a word.
(3) You must reserve space for the NUL ('\0'), too.
(4) If len == 0 there would be no word, so no need to look it up.
(5) We treated EOF just like a space, but we don't want to write it to the output. Time to break out of the loop.

fgets storing unknown data from text file

I am creating a word guessing game that reads a text file line by line until it reaches a randomly chosen word and stores it in a string (word). Then the user enters letters until all the letters of the stored word are revealed.
So far it works, but whenever the very first word of the file is read, some unknown characters are stored at the beginning of char word[20].
Note that I use a C mobile app, which I think uses the clang 6.0 compiler. So does the error come from my code or from the app (which I love)?
Here is the FULL, clearer code:
//guess the right word
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#define TRUE 1
#define FALSE 0
#define NB_OF_WORDS 3 //number of words in text file
main() {
    char begin, word[20] = { 0 }, guessedletter;
    int num, rightletter, success;
    int i = 0;
    int show[20] = { 0 };//shown letters
    FILE *ressource = NULL;
    srand(time(NULL));
    if ((ressource = fopen("ressource.txt", "r")) == NULL) {
        fprintf(stderr, "Error ressource.txt");//open in read only mode
        exit(EXIT_FAILURE);
    }
    printf("Welcome to Word Guess, a word guessing contest.\n"
           "You have to find the letters of a word and guess what word it is.\n"
           "Begin? (y/n)\n");
    while ((begin = getchar()) == 'y') { //game loop
        /*reinitializations*/
        fseek(ressource, 3, SEEK_SET);//replaces rewind(ressource)
        success = FALSE;
        rightletter = 0;
        num = 0;
        num = rand() % NB_OF_WORDS; //random number between 0 and NB-1
        for (i = 0; i < 20; i++) {
            show[i] = FALSE;
            word[i] = 0;
        }
        i = 0;
        /*end of reinitializations*/
        while (i <= num) {//reads line by line until random word is stored
            if (fgets(word, 20, ressource) == NULL) {
                fprintf(stderr, "fgets did not work");
                exit(EXIT_FAILURE);
            }
            i++;
        }
        printf("%s", word);//here is just for testing if the read word is well read. Which isn't the case if num=0
        for (i = 0; i < 20; i++)
            if (word[i] == '\n')
                word[i] = '\0';//adds zero character to show where the string ends
        while (!success) { //guessing loop
            printf("\nWrite a letter: ");
            scanf("%c", &guessedletter);
            printf("\n");
            for (i = 0; word[i] != '\0'; i++) { //compares entered letter to letter from string. If they match...
                if (word[i] == guessedletter) {
                    if (!show[i])
                        rightletter++;//avoids letters entered twice from increasing count
                    show[i] = TRUE;
                }
                if (show[i])
                    printf("%c", word[i]);//...a letter is revealed...
                else
                    printf("*");//else it stays hidden
            }
            if (rightletter == strlen(word))
                success=TRUE;//if all the right letters found (same number of letters found as number of letters in the words) you win
        }
        printf("\nCongratulations you have won!\nDo you want to replay? (y/n)");
        getchar();//clears newline character
    }
    fclose(ressource);
    return 0;
}
When num=0, I get a weird sign before the printed word, as if the first characters of the text file were not supposed to be there...
Moreover, during the guessing game, if the word to guess (word[20]) is, say, "annoying" and it is the first word in "ressource.txt" (num=0), the word prints like this once I have guessed all the letters: ***annoying. This does not happen with ANY other word in the list.
I'm new to this site and post from my phone... sorry for any mistakes.
EDIT: removed fgetc for fgets. Still get several unknown characters if fgets reads first line.
EDIT 2: added the whole code, translated mot[20] into word[20], added error testing
EDIT 3: Replacing rewind(ressource); by fseek(ressource, 3, SEEK_SET); solved the problem. Which means there are indeed three unknown characters at the beginning of the text file
There are multiple problems in the posted code:
You only posted a fragment of code, the rest of the function could cause problems that cannot be analysed from what you posted. Please post the complete code to a minimal program exhibiting the problem.
You do not test for end of file in the while loop. If you try to skip more lines than are present in the file, this loop will run indefinitely.
You do not test the return value of fgets(): if by chance you skipped the full contents of the file in the preceding while loop, fgets() will fail and return NULL, leaving mot in an undetermined state, causing unexpected behavior in the subsequent printf.
EDIT: the modified code still does not check the return value of fgets().
EDIT: thank you for posting the full code, but modifying the question this way makes this answer and the comments irrelevant.
Your dictionary file ressource.txt seems to start with a BOM (Byte Order Mark) encoded in UTF-8. I am guessing it contains French words including some with accented letters such as reculées and encodées... Your text editor saved it encoded in UTF-8 with an extra code-point at the beginning for other programs to determine this encoding easily. The problem with that is your program does not handle UTF-8. It assumes each byte read from stdin represents a single character. It might by chance match some words with accented letters, but will more likely fail to find them.

Store text from file in character array using fread()

Here is a minimal "working" example:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
    int num = 10;
    FILE* fp = fopen("test.txt", "r"); // test.txt contains character sequence
    char* ptr = (char*) malloc(sizeof (char)*(num+1)); // +1 for '\0'
    fread(ptr, sizeof(char), num, fp); // read bytes from file
    ptr[num] = '\0';
    printf("%s\n", ptr); // output: ´╗┐abcdefg
    free(ptr);
    fclose(fp);
    return 0;
}
I would like to read some letters from a text file, containing all letters from the alphabet in a single line. I want my array to store the first 10 letters, but the first 3 shown in the output are weird symbols (see the comment at the printf statement).
What am I doing wrong?
The issue is that your file is encoded using UTF-8. While UTF-8 is backwards-compatible with ASCII (which is what your code will be using) there are many differences.
In particular, many programs put a BOM (Byte Order Mark) at the start of the file. In UTF-8 the BOM is the three bytes EF BB BF, and it serves only as a signature marking the file as UTF-8. If you print those bytes using the default Windows code page, you get the three symbols you saw.
Whatever program you used to create your text file was automatically inserting that BOM at the start of the file. Notepad++ is notorious for doing this. Check the save options and make sure to save either as plain ASCII or as UTF-8 without BOM. That will solve your problem.

Assembly Vigenère cipher program

I'm not really sure how to approach this problem:
For better frequency characteristics the keyword should not have any repeated
letters. Also, if it contains the letter A the encrypted letter will be the same as the plaintext, although this is not necessarily a bad thing.
To implement this algorithm with a pencil and paper, many descriptions ask you to build a Vigenère Square. However this is not really necessary when you are using a computer to do the encoding and decoding.
Essentially the keyword is written repeatedly over and over above the plaintext.
Suppose the keyword is CRYPTOGRAM.
CRYPTOGRAMCRYPTOGRAMCRYPTOGRAMCRYPTOGRAMCRYPTOGRAMCRYPTOGRAMCRYPTOGR
WEHAVEBEENBETRAYEDALLISDISCOVEREDFLYATONCEMEETUSBYTHEOLDTREEATNINEPM
Consider that the letters are numbered 0 to 25. The letter on the top determines
which Caesar-cypher to use for the letter below. Thus C means shift the alphabet by 2, A means shift by 0, and so on. In mathematical terms, we are adding the two letters together modulo 26. (The square was used because the concept of modular arithmetic was not generally understood by soldiers in 1553.)
To decrypt the message, the same operation is performed in reverse. That is, the
value of the keyword letter is subtracted rather than added.
Step 3. What your code should do
Your code should use STDIN and STDOUT for input and output. (This is the
default.) Use redirection on the command line to read from a file and write to a
file.
Your code should open a file, read it character by character and save it into an
array.
When you get to the end of the file you should encode the contents of the
array with a Vigenère cipher using the keyword CRYPTOGRAM, then print it
out.
Maintain the distinction between upper-case and lower-case letters, and do
not modify non-alphabetic characters. This is not very good for the security of
your message, but the result will look neater.
This program should use glibc functions. In addition to printf(), you may
need getchar() and putchar().
Assume that the input file contains just ASCII text. Don't worry about what
happens with non-text files.
Once the encoder is working, build a decoder by duplicating the code and
changing the addition to a subtraction.
If you use printf() to output the array, remember that a null termination is
required on a string.
Start by breaking the problem down in smaller parts like "read input from stdin", "encrypt a string", "print output to stdout".
You need to be familiar with the modulus operator, because you will need to use it more than once in your program.
If you are having a hard time, here is one way to break down the problem
(there are other ways that are just as good):
/* For printf, getchar etc: */
#include <stdio.h>
/* For strlen: */
#include <string.h>
/* For isalpha, isupper, islower etc: */
#include <ctype.h>
char encryptChar(char ch, char cypher) {
    int shiftBy = cypher - 'A';
    char encryptedLetter;
    /* There are 3 cases: uppercase, lowercase, other char */
    if (isupper(ch)) {
        /* add code to encrypt uppercase char */
    } else if (islower(ch)) {
        /* add code to encrypt lowercase char */
    } else {
        /* Other characters stay as they are */
        encryptedLetter = ch;
    }
    return encryptedLetter;
}
char *cypherString = "CRYPTOGRAM";
int main(int argc, char **argv) {
    int ch;
    int cypherStringLength = strlen(cypherString);
    int counter = 0;
    char cypher;
    while ((ch = getchar()) != EOF) {
        cypher = cypherString[counter%cypherStringLength];
        ch = encryptChar(ch, cypher);
        /* Add code to print the character */
        counter++;
    }
    return 0;
}

Trying to read a text file with emoji and print it

Input -> 😂😂
Output-> 😂😂
I simply want to maintain the original state of the emoji.
All i am doing is this
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char ch;
    FILE *fp;
    fp = fopen("test.txt","r");
    while( ( ch = fgetc(fp) ) != EOF )
        printf("%c",ch);
    fclose(fp);
    return 0;
}
In Unicode encodings such as UTF-8, an emoji takes more than one byte. Hence printing it byte by byte will not help in this case. If you redirect the output to a file, you should get almost the same content as your input file.
You may try to print the string after changing the locale (on Linux), or you can try wprintf on Windows (remember to convert to a wide string).
