C: Output with symbols in Caesar’s cipher encrypts, WHY? pset2 cs50 - c

This is the Caesar cipher problem from pset2 of the CS50x course on edx.org.
I already solved this problem with another algorithm, but this was my first try and I'm still curious why all these symbols appear to the right of the encrypted text.
For example, I enter the text "Testing" and the output is "Fqefuz�����w����l��B��"; the answer is correct apart from the symbols.
Can anyone explain that to me?
int main(int argc, string argv[])
{
bool keyOk = false;
int k = 0;
do
{
if(argc != 2) // Checking if the key was correctly entered.
{
printf("You should enter the key in one argument from"
" the prompt(i.e. './caesar <key>').\n");
return 1;
}
else
{
k = atoi(argv[1]); // Converting string to int.
keyOk = true; // Approving key.
}
}
while(keyOk == false);
string msg = GetString(); // Reading user input.
char caesarMsg[strlen(msg)];
for(int i=0, n = strlen(msg); i < n; i++)
{
if( (msg[i] >= 'a') && (msg[i] <= 'z') )
// Processing lower case characters
{
caesarMsg[i] = ((((msg[i] - 97) + k) % 26) + 97);
}
else if( (msg[i] >= 'A') && (msg[i] <= 'Z') )
// Processing upper case characters
{
caesarMsg[i] = ((((msg[i] - 65) + k) % 26) + 65);
}
else
{
caesarMsg[i] = msg[i];
}
}
printf("%s", caesarMsg);
printf("\n");
}

The root problem is C does not have a full, proper, or first-class "string" datatype. In C strings are in fact character arrays that are terminated with the NUL ('\0') (*) character.
Look at
string msg = GetString(); // Reading user input.
char caesarMsg[strlen(msg)];
This is equivalent to
char* msg = GetString(); /* User or library function defined elsewhere */
/* calculates the length of the string s, excluding the terminating null
byte ('\0') */
size_t len = strlen(msg);
char caesarMsg[len]; /* Create a character (byte) array of size `len` */
Hopefully this makes it clearer why this section fails to work correctly. The variable len that I've added is the length of the sequence of non-NUL characters in the string msg. So when you create the character array caesarMsg of length len, there is no room to store the NUL character.
The for loop correctly executes, but the printf("%s", caesarMsg); will continue to print characters until it finds a NUL or crashes.
BTW you can reduce the two printf statements at the end into a single printf statement easily.
printf("%s\n", caesarMsg);
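To make the fix concrete, here is a minimal sketch of the encryption step with a correctly sized and terminated buffer. The helper name caesar_encrypt and its parameters are my own invention, not part of the CS50 library, and it assumes an ASCII-compatible character set, as the original code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from cs50.h): writes the Caesar-shifted copy of
   msg into out. out_size must be at least strlen(msg) + 1 so that the
   terminating '\0' fits. */
void caesar_encrypt(const char *msg, int k, char *out, size_t out_size)
{
    size_t n = strlen(msg);
    assert(out_size >= n + 1);   /* n characters plus the terminator */

    k = ((k % 26) + 26) % 26;    /* normalise so large or negative keys work */

    for (size_t i = 0; i < n; i++)
    {
        char c = msg[i];
        if (c >= 'a' && c <= 'z')
            out[i] = (char)('a' + (c - 'a' + k) % 26);
        else if (c >= 'A' && c <= 'Z')
            out[i] = (char)('A' + (c - 'A' + k) % 26);
        else
            out[i] = c;          /* non-letters are copied unchanged */
    }
    out[n] = '\0';               /* without this, printf("%s") keeps reading */
}
```

In main you would declare the buffer as char caesarMsg[strlen(msg) + 1]; and pass sizeof caesarMsg as the last argument, after which printf("%s\n", caesarMsg); stops exactly at the end of the text.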
Strings and character arrays are a frequent source of confusion to anyone new to C, and some not-so-new to C. Some additional references:
I really recommend bookmarking the comp.lang.c FAQ.
I also strongly recommend that you either get your own copy of, or ensure you have access to, Kernighan and Ritchie's The C Programming Language, Second Edition (1988).
Rant: whoever created the string typedef is evil, or at least making a grave error, by misleading students into thinking C's strings are a "real" (or first-class) data type.
(*) NUL is different from NULL: NULL is the null pointer and is used as a pointer (so it is typically the same size as other pointers), whereas NUL is the null character '\0' (the size of a char when stored in a string, or of an int when written as a character constant).

Related

Unsure as to why toupper() is cutting off last letter in C

The goal of this program is to take a 26-letter 'key' in the terminal (through argv[]) and use its indices as a substitution guideline. There are two inputs: one passed in argv[] and one read with a plain get_string() call. The argv[] input looks like this: ./s YTNSHKVEFXRBAUQZCLWDMIPGJO, where s is the file name. The get_string() input looks like this: plaintext: HELLO (the input is HELLO). The program then loops through all the letters in the plaintext input and substitutes each one according to its alphabetical index in the argv[] key. For example, H has an alphabetical index of 7 (where a = 0 and z = 25), so we look at index 7 in the key YTNSHKV(E)FXRBAUQZCLWDMIPGJO, which in this case is E. It does this for each letter in the input, and we end up with the output ciphertext: EHBBQ. This is what it should look like in the terminal:
./s YTNSHKVEFXRBAUQZCLWDMIPGJO
plaintext: HELLO
ciphertext: EHBBQ
But my output is EHBB: the last letter is cut off for some reason when I use toupper().
Also, the case of the output follows the case of the plaintext input. If the plaintext input was hello, world and the argv[] key was VCHPRZGJNTLSKFBDQWAXEUYMOI, the output would be jrssb, ybwsp, and if the input was HellO, world with the same key, the output would be JrssB, ybwsp.
I'm basically done with the problem: my program substitutes the given plaintext into the correct ciphertext based on the key passed on the command line. Right now, if the plaintext input was HELLO and the key was vchprzgjntlskfbdqwaxeuymoi (all lowercase), the output should be all uppercase, like HELLO, and not lowercase, like hello. This is because my program puts all the letters of the command-line key into an array of length 26, then loops through the plaintext letters and matches each one's ASCII value (minus a certain number to get it into the 0-25 index range) with the index in the key. E has an alphabetical index of 4, so in this case my program would get lowercase r, but I need it to be R, which is why I'm using toupper().
When I used tolower(), everything worked fine, but once I started using toupper(), the last letter of the ciphertext was cut off for some reason. Here is my output before using toupper():
ciphertext: EHBBQ
And here is my output after I use toupper():
ciphertext: EHBB
Here is my code:
int main(int argc, string argv[]) {
string plaintext = get_string("plaintext: ");
// Putting all the argv letters into an array called key
char key[26]; // change 4 to 26
for (int i = 0; i < 26; i++) // change 4 to 26
{
key[i] = argv[1][i];
}
// Assigning array called ciphertext, the length of the inputted text, to hold cipertext chars
char ciphertext[strlen(plaintext)];
// Looping through the inputted text, checking for upper and lower case letters
for (int i = 0; i < strlen(plaintext); i++)
{
// The letter is lower case
if (islower(plaintext[i]) != 0)
{
int asciiVal = plaintext[i] - 97; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
char l = tolower(key[asciiVal]); // tolower() works properly
//printf("%c", l);
strncat(ciphertext, &l, 1); // Using strncat() to append the converted plaintext char to ciphertext
}
// The letter is uppercase
else if (isupper(plaintext[i]) != 0)
{
int asciiVal = plaintext[i] - 65; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
char u = toupper(key[asciiVal]); // For some reason having this cuts off the last letter
strncat(ciphertext, &u, 1); // Using strncat() to append the converted plaintext char to ciphertext
}
// If its a space, comma, apostrophe, etc...
else
{
strncat(ciphertext, &plaintext[i], 1);
}
}
// prints out ciphertext output
printf("ciphertext: ");
for (int i = 0; i < strlen(plaintext); i++)
{
printf("%c", ciphertext[i]);
}
printf("\n");
printf("%c\n", ciphertext[1]);
printf("%c\n", ciphertext[4]);
//printf("%s\n", ciphertext);
return 0;
}
The strncat function expects its first argument to be a null-terminated string that it appends to. You're calling it with ciphertext while it is uninitialized. This means that you're reading uninitialized memory, possibly reading past the end of the array, which triggers undefined behavior.
You need to make ciphertext an empty string before you call strncat on it. You also need to add 1 to the size of this array to account for the terminating null byte of the completed string, so that you don't write past the end of it.
char ciphertext[strlen(plaintext)+1];
ciphertext[0] = 0;
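As a small illustration of why the empty-string initialization matters, here is a sketch of a one-character append built on strncat. The helper name append_char and its cap parameter are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append the single character c to the string already
   stored in buf, whose total capacity is cap bytes. */
void append_char(char *buf, size_t cap, char c)
{
    size_t len = strlen(buf);   /* only valid if buf is already terminated */
    assert(len + 2 <= cap);     /* one byte for c, one for the new '\0' */
    strncat(buf, &c, 1);        /* copies at most 1 char, then writes '\0' */
}
```

The caller has to start from an empty string, for example char ciphertext[6]; ciphertext[0] = '\0'; before the first call. If the buffer is left uninitialized, the strlen inside the helper (and inside strncat itself) scans whatever bytes happen to be there.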
There are multiple problems in the code:
you do not test the command line argument presence and length
the array should be allocated with 1 extra byte for the null terminator and initialized as an empty string for strncat() to work properly.
instead of hard coding ASCII values such as 97 and 65, use character constants such as 'a' and 'A'
strncat() is overkill for your purpose. You could just write ciphertext[i] = l; instead of strncat(ciphertext, &l, 1)
islower() and isupper() are only defined for positive values of the type unsigned char and the special negative value EOF. You should cast char arguments as (unsigned char)c to avoid undefined behavior on non ASCII bytes on platforms where char happens to be a signed type.
avoid redundant tests such as islower(xxx) != 0. It is more idiomatic to just write if (islower(xxx))
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <cs50.h>
int main(int argc, string argv[]) {
// Testing the argument
if (argc < 2 || strlen(argv[1]) != 26) {
printf("invalid or missing argument\n");
return 1;
}
// Putting all the argv letters into an array called key
char key[26];
memcpy(key, argv[1], 26);
string plaintext = get_string("plaintext: ");
int len = strlen(plaintext);
// Define an array called ciphertext, the length of the inputted text, to hold ciphertext chars and a null terminator
char ciphertext[len + 1];
// Looping through the inputted text, checking for upper and lower case letters
for (int i = 0; i < len; i++) {
unsigned char c = plaintext[i];
if (islower(c)) { // The letter is lower case
int index = c - 'a'; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
ciphertext[i] = tolower((unsigned char)key[index]);
} else if (isupper(c)) {
// The letter is uppercase
int index = c - 'A'; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
ciphertext[i] = toupper((unsigned char)key[index]);
} else {
// other characters are unchanged
ciphertext[i] = c;
}
}
ciphertext[len] = '\0'; // set the null terminator
printf("ciphertext: %s\n", ciphertext);
return 0;
}

Why does the output of the code change when printing a random text with printf?

The following code changes the letters of the input by k. k is the first argument. For example, if the input should be changed by 2 letters the command line argument would be "./caesar 2", if three then "./caesar 3". etc.
Changing letters means, for example, change by 2, then input 'a' becomes 'c'. Change by 3 means input "abc" becomes "def", etc.
The program checks the user's input: (a) that the number of arguments is exactly 2, and (b) that the argument is a number.
When the code is executed as written below, the output is shortened by one letter. For example, "hello" shifted by 1 letter becomes "iffm".
If only one letter is entered, it shows the correct output followed by some undefined characters. For example, 'a' becomes "b��P" or "bm>�" or "b;���".
The output is as intended in any of these cases: (1) the input check (b) [whether the argument is a number] is removed; (2) a printf line with an arbitrary statement (it can even be an empty string) is inserted exactly between the get_string call that asks the user for input and the for loop that changes the letters; or (3) the input's last character is a special character (any non-alphabetical character). For example, "hello1" or "hello!" shifted by one letter becomes "ifmmp1" or "ifmmp!".
I am really desperate and I don't know what is happening and more importantly why this is happening.
So my questions are:
(1) Why is output shortened by one letter? Why is the output wrong, when the input is one letter only? (I guess it's the same problem).
(2)
(a) Why does the output change when either the number check is removed or
(b) a random printf line is inserted exactly between the lines mentioned above or
(c) the last character is a non-alphabetical character?
I really appreciate any help and please excuse any weird English as it is not my native language :). Thanks a lot! A desperate code learner :)
This is the code:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
bool isNumber(string numberToCheck);
int main(int argc, string argv[])
{
// checking, if arguments are correct
// checking, if input is correct (i.e. 2)
if(argc != 2)
{
printf("Usage: ./cesar key\n");
exit(1);
}
// checking, if input is a number, if the following if statement is removed the output changes
if(!isNumber(argv[1]))
{
printf("Usage: ./caesar key\n");
exit(1);
}
// variables
int k = atoi(argv[1]);
string plaintext;
int plaintextLength;
// getting the plaintext from user input, creating ciphertext string of same length
plaintext = get_string("paintext: ");
// checking the length of the input
plaintextLength = strlen(plaintext);
//the output changes, when the next line is being inserted
printf("");
// creating new empty string with plaintextLength
char ciphertext[plaintextLength];
// iterating through plaintext char by char
for(int i = 0; i < plaintextLength;i++)
{
// in case of capital letter
if(plaintext[i] >= 65 && plaintext[i] <= 90)
{
ciphertext[i] = 65 + (((plaintext[i] - 65) + k) % 26);
}
// else in case of small letter
else if(plaintext[i] >= 97 && plaintext[i] <= 122)
{
ciphertext[i] = 97 + (((plaintext[i] - 97) + k) % 26);
}
// else in case of non alphabetical letter
else
{
ciphertext[i] = plaintext[i];
}
}
printf("ciphertext: %s\n", ciphertext);
}
bool isNumber(string numberToCheck)
{
for(int i = 0;i < strlen(numberToCheck); i++)
{
if(!isdigit(numberToCheck[i]))
{
return false;
}
}
return true;
}
// one extra element so that the terminating '\0' fits
char ciphertext[plaintextLength + 1];
int i;
// iterating through plaintext char by char
for(i = 0; i < plaintextLength;i++)
{
// in case of capital letter
if(plaintext[i] >= 65 && plaintext[i] <= 90)
{
ciphertext[i] = 65 + (((plaintext[i] - 65) + k) % 26);
}
// else in case of small letter
else if(plaintext[i] >= 97 && plaintext[i] <= 122)
{
ciphertext[i] = 97 + (((plaintext[i] - 97) + k) % 26);
}
// else in case of non alphabetical letter
else
{
ciphertext[i] = plaintext[i];
}
}
ciphertext[i] = '\0';
printf("ciphertext: %s\n", ciphertext);
}
In C, every string uses \0 to mark where the string ends.
Without this null terminator, the computer may read past the string into adjacent memory, which is why you encounter random symbols.
So when you build a string, make sure to add \0 at the end of it; this is what I did with your code here. The array also needs one extra element to hold it, i.e. char ciphertext[plaintextLength + 1].
I declared "i" outside the for loop so that it doesn't cease to exist at the end of the loop.
When you reach the end of the loop, "i" is equal to the length of your string.
If the plaintext is "lol" and the key is 5, "i" will be equal to 3 and the ciphertext will be "qtq".
ciphertext[i] refers to the position just after the last "q", since we count from 0, and this is where you want to put your \0.
Also, there is a typo in your get_string prompt ("paintext").
Hope my explanation is clear; if you have any questions, just ask :)

Iterating through all possible Letter combinations

I'm trying to crack a password using brute force, but the program does not seem to realize when it has found the password.
So I tried writing a little program to crack passwords using brute force.
Basically, I iterate through all letter combinations using nested for loops (the passwords needn't be longer than 4 letters), then I call crypt on the candidate with a given salt (assuming the salt is fixed) to check whether I cracked the password (I have access to the crypted passwords). I had the code output what it generates, and it looks like it does iterate through all lowercase variants. But somehow it never finds the password.
I created the crypted version myself, so I know the password is 4 letters long; that's not the problem. The program does seem to iterate through all possibilities, I think, so what could be the problem? Is the if condition wrong?
I guess this is not the most elegant solution for the problem, but I think the general idea is right. If there is a problem with the nested for loop approach, I'd be happy to know that too :).
#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <cs50.h>
int main(int argc, string argv[]) {
if(argc < 2) {
printf("Please enter a password to crack! \n");
return 1;
}
char s[4] = "";
for(int i = 0; i <=27; i++) {
if(i == 0)
s[0] = 0;
else
s[0] = i - 1 + 'a';
if(strcmp(crypt(s, "50"), argv[1]) == 0) {
break; }
for(int j = 0; j <=26; j++) {
if(j == 0)
s[1] = 0;
else
s[1] = (j - 1 + 'a');
if(strcmp(crypt(s, "50"), argv[1]) == 0)
break;
for(int k = 0; k <= 26; k++) {
if(k == 0)
s[2] = 0;
s[2] = (k - 1 + 'a');
if(strcmp(crypt(s, "50"), argv[1]) == 0)
break;
for(int l = 0; l <= 26; l++) {
printf("%s \n", s);
if(l == 0)
s[3] = 0;
else
s[3] = (l - 1 + 'a');
if(strcmp(crypt(s, "50"), argv[1]) == 0)
break;
}
}
}
}
if (strcmp(crypt(s, "50"), argv[1]) != 0)
printf("Password not found");
else
printf("%s \n", s);
Declaring char s[4] fails to leave room for a null character terminator when the array has four non-null characters, resulting in behavior not defined by the C standard.
C strings are null terminated, that means, that a string in C is basically an array of bytes/characters (like the one you initialize: char s[4] = "";) with the last byte set to 0.
strcmp goes through two strings(byte arrays) and compares them one by one until it finds a different character or until both strings end, i.e. the currently compared byte is a 0 for both strings. When you initialize your char array s to be 4 bytes, and write 4 non-null bytes into it, strcmp will continue comparing bytes after the end of the array, because it did not hit the end of a string. This can also crash your program or lead to security vulnerabilities, because you continue reading into memory, that you are not supposed to. To get around this, properly terminate your input strings with a null byte and use the strncmp function when possible, which takes an additional parameter, indicating how many characters it should compare at most.
As you correctly identified, the nested loop approach is not the most elegant solution, because you are repeating yourself over and over and the resulting code is very rigid and can only compare up to 4 character passwords. It would be cleaner to have an outer for loop to loop through the possible passwords lengths and then call a function which checks all possible passwords of a certain length (using two nested loops).
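The restructuring described above could be sketched roughly as follows. All names here (check_fn, try_length, brute_force, equals_target, MAX_LEN) are hypothetical and introduced only for illustration. To keep the example self-contained, the crypt comparison is replaced by a callback; in the real program the callback would do something like strcmp(crypt(candidate, "50"), hash) == 0. The sketch only covers lowercase letters and assumes 'a' to 'z' are contiguous, as in ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LEN 4   /* longest password tried; buffers need MAX_LEN + 1 bytes */

/* Stand-in for the real test, which would compare crypt() output to the hash. */
typedef bool (*check_fn)(const char *candidate, const void *ctx);

/* Try every lowercase string of exactly len characters.
   buf must have room for len + 1 bytes. */
static bool try_length(char *buf, int len, check_fn check, const void *ctx)
{
    assert(len >= 1 && len <= MAX_LEN);
    memset(buf, 'a', (size_t)len);
    buf[len] = '\0';                 /* always keep the candidate terminated */

    for (;;)
    {
        if (check(buf, ctx))
            return true;

        /* Odometer-style increment: bump the last letter, carry leftwards. */
        int pos = len - 1;
        while (pos >= 0 && buf[pos] == 'z')
        {
            buf[pos] = 'a';
            pos--;
        }
        if (pos < 0)
            return false;            /* wrapped around: every candidate tried */
        buf[pos]++;
    }
}

/* Outer loop over the possible lengths, shortest first. */
bool brute_force(char out[MAX_LEN + 1], check_fn check, const void *ctx)
{
    for (int len = 1; len <= MAX_LEN; len++)
        if (try_length(out, len, check, ctx))
            return true;
    out[0] = '\0';
    return false;
}

/* Example check used for demonstration: plain string equality. */
bool equals_target(const char *candidate, const void *ctx)
{
    return strcmp(candidate, (const char *)ctx) == 0;
}
```

Calling brute_force(found, equals_target, "cab") with char found[MAX_LEN + 1]; leaves "cab" in found. Supporting longer passwords or a different alphabet then means changing MAX_LEN or the increment step instead of adding another nested loop.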
If you just want to brute-force some passwords and don't actually want to program it yourself, you will probably be better off using a professional tool like https://hashcat.net/hashcat/.

C: Comparing hash value seems to disappear

For the love of holy code, I am trying to compare hashes to find the correct password. I am given a hash as a command line argument, and I then hash words from "a" to "ZZZZ" until one of the hash pairs match.
void decipher(string hash)
{
//Set the password, and the salt.
char pass[4] = "a";
char salt[] ="50";
//Compare the crypted pass againts the hash until found.
while (strcmp(hash,crypt(pass, salt)) != 0)
{
//Use int i to hold position, and return next char
int i = 0;
pass[i] = get_next(pass[i]);
tick_over (pass, i);
//Hardcode in a fail safe max length: exit.
if (strlen(pass) > 4)
{
break;
}
}
printf("%s\n", pass);
}
The problem is that it will not 'catch' the correct password / comparison, when that password is 4 letters long. It works for 1,2 and 3 letter long words.
//Tick over casino style
string tick_over (string pass, int i)
{
//Once a char reaches 'Z', move the next char in line up one value.
char a[] = "a";
if (pass[i] == 'Z')
{
if (strlen(pass) < i+2)
{
strncat (pass, &a[0], 1);
return pass;
}
pass[i+1] = get_next(pass[i+1]);
//Recursively run again, moving along the string as necessary
tick_over (pass, i+1);
}
return pass;
}
//Give the next character in the sequence of available characters
char get_next (char y)
{
if (y == 'z')
{
return 'A';
}
else if (y == 'Z')
{
return 'a';
}
else
{
return y + 1;
}
}
It does iterate through the correct word, as I have found in debugging. I have tried moving the
strcmp(hash, crypt(pass, salt)) == 0
into a nested if statement among other things, but it doesn't seem to be the problem. Is c somehow 'forgetting' the command line value? When debugging the hash value seemed to have disappeared :/ Please help!
char pass[4] = "a"; defines a char array that can hold at most 3 chars plus the null terminator.
That is inconsistent with your "safety" test: if (strlen(pass) > 4)
When strlen is 4, the terminating null character has already been written past the end of the array: undefined behaviour.
Quick fix: char pass[5] ...
Here is the description of the strncat function:
Append characters from string
Appends the first num characters of source to destination, plus a terminating null-character.
With a size of 4, you are not accounting for the terminating null character that follows your four chars.
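To illustrate the sizing rule, here is a minimal sketch of the "append one more letter" step with an explicit capacity check. The helper grow_candidate and the MAX_PASS_LEN macro are hypothetical names of mine, not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PASS_LEN 4   /* longest candidate; the array needs one more byte */

/* Hypothetical helper: append one 'a' to pass, as tick_over does, but refuse
   when the result would no longer fit in cap bytes.
   Returns 1 if a letter was appended, 0 if the buffer is full. */
int grow_candidate(char *pass, size_t cap)
{
    assert(pass != NULL && cap > 0);
    size_t len = strlen(pass);
    if (len + 1 >= cap)       /* need len + 1 chars plus the terminator */
        return 0;
    strncat(pass, "a", 1);    /* writes one char and then a '\0' */
    return 1;
}
```

With char pass[MAX_PASS_LEN + 1] = "a"; three calls succeed and produce "aaaa", and a fourth returns 0 instead of writing past the array.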

Code not working as expected in C

I was working on a program in C to count the number of spaces in a sentence. But I haven't managed to get it to work properly. If I enter something like Hello world 1234 how are you the output I'm getting is 3 when the output expected is 5.
My code is :
//Program to count number of words in a given Sentence
#include <stdio.h>
#include <string.h>
int main()
{
char sent[100];
char sentence[] = {' ', '\0'};
printf("\nEnter a sentence :\n");
gets(sent);
strcat(sentence, sent);
int l = strlen(sentence), i = 0, count = 0, countCh = 0;
printf("%d", l);
char ch, ch1;
for (i = 0; i < (l-1); i++)
{
ch = sentence[i];
if (ch == ' ')
{
ch1 = sentence[i+1];
if (((ch1 >= 'A') && (ch1 <= 'Z'))||((ch1 >= 'a') && (ch1 <= 'z')))
count++;
}
}
printf("\nNo of words is : %d", count);
return 0;
}
I used the same logic in Java and it worked fine. Could someone explain what's going wrong?
The problem in your code is with the definition of sentence. When you leave out the array dimension and initialize it, the size of the array will be determined by the length of the initializer.
Quoting the man page of strcat()
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable;
That is, the program will invoke undefined behavior.
As defined, sentence certainly has far less memory than it is supposed to hold. Moreover, strcat() is not required there at all.
The correct way to do it would be:
Define sentence with a proper dimension, like char sentence[MAXSIZE] = {0};, where MAXSIZE is a macro holding the size of your choice.
Use fgets() to read the user input.
Use isspace() (from ctype.h) in a loop to check for the presence of whitespace in the input string.
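The three steps above could look roughly like this. The function names read_line and count_spaces are my own, and MAXSIZE is set to 100 only to match the buffer size in the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAXSIZE 100   /* arbitrary choice, matching the question's buffer */

/* Hypothetical helper: read one line into buf (capacity cap bytes) and strip
   the trailing newline. Returns 1 on success, 0 on end of input or error. */
int read_line(FILE *in, char *buf, int cap)
{
    assert(cap > 0);
    if (fgets(buf, cap, in) == NULL)   /* fgets never writes more than cap bytes */
        return 0;
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline if one was read */
    return 1;
}

/* Count whitespace characters in a terminated string. */
int count_spaces(const char *s)
{
    int count = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (isspace((unsigned char)s[i]))
            count++;
    return count;
}
```

In main this would be used as char sentence[MAXSIZE] = {0}; followed by read_line(stdin, sentence, MAXSIZE) and then count_spaces(sentence). For the input "Hello world 1234 how are you" the count is 5.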
The following
if (((ch1 >= 'A') && (ch1 <= 'Z'))||((ch1 >= 'a') && (ch1 <= 'z')))
count++;
probably should be
if (ch1 != ' ')
count++;
As it stands, " 12345" would not be counted as a word.
Also, count counts the spaces, so the word count is one more: hence 3 instead of 5.
Your sentence array seems to have been intended to account for the terminating NUL.
If you want to count real words containing letters, keep a bool state recording whether you are currently inside a run of letters, and count whenever the current and prior states differ on entering a letter.
As mentioned, overflow is possible with your code.
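One possible reading of the bool-state suggestion is sketched below. The function name count_letter_words is hypothetical, and the sketch treats any non-letter (digits included) as a separator, which is an assumption on my part.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count runs of letters in s. A new word is counted
   each time we move from "not in a letter" to "in a letter". */
int count_letter_words(const char *s)
{
    assert(s != NULL);
    int words = 0;
    bool in_word = false;               /* state for the previous character */

    for (size_t i = 0; s[i] != '\0'; i++)
    {
        bool is_letter = isalpha((unsigned char)s[i]) != 0;
        if (is_letter && !in_word)      /* state changed: a word starts here */
            words++;
        in_word = is_letter;
    }
    return words;
}
```

For "Hello world 1234 how are you" this returns 5, because "1234" contains no letters. No leading space has to be prepended to the input, so the strcat call and the overflow that comes with it disappear.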

Resources