CS50 / BEGINNER - Segmentation fault in nested for loop in C - arrays

I'm trying to write code that will take each digit from a plaintext string input and, if it is a letter, output a different letter, as defined by a substitution key (26-letter key).
In other words, if the alphabet was "abcd" and provided key was "hjkl", an input of "bad" would output "jhl".
// Regular alphabet is to be used as comparison base for key indexes //
string alphabet = "abcdefghijklmnopqrstuvwxyz";
// Prompt user for input and assign it to plaintext variable //
string plaintext = get_string("plaintext: ");
Non-letters should be printed as-is.
My idea was to compare each input character against every index in the alphabet, looking for the matching letter and, if found, print the character at the same index of the key. (Confusing, I think.)
This loop, however, segfaults when I run it, but not when debugging:
// Loop will iterate through every ith digit in plaintext and operate the cipher //
for (int i = 0; plaintext[i] != '\0'; i++) {
// Storing plaintext digit in n and converting char to string //
char n[2] = "";
n[0] = plaintext[i];
n[1] = '\0';
// If digit is alphabetic, operate cipher case-sensitive; if not, print as-is //
if (isalpha(n) != 0) {
for (int k = 0; alphabet[k] != '\0'; k++) {
char j[2] = "";
j[0] = alphabet[k];
j[1] = '\0';
if (n[0] == j[0] || n[0] == toupper(j[0])) {
if (islower(n) != 0) {
printf("%c", key[k]);
break;
} else {
printf("%c", key[k] + 32);
break;
}
}
}
} else {
printf("%c", (char) n);
}
}
What's going wrong? I've looked for help online but most sources are not very beginner-friendly.

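Your code seems to work apart from one kind of error: the program crashes at
isalpha(n)
Because you declared
char n[2]
the argument there decays to a pointer of type char*. But isalpha only accepts an int parameter, so just write it as
isalpha(n[0])
The same applies to islower. The final printf("%c", (char) n) has the same problem: it casts the pointer, not the character, so print n[0] instead.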
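The fix above can be sketched as a small helper that works on a single char, so no one-character string is needed. This is only a sketch: the function name encipher_char is hypothetical, and it assumes the key holds 26 letters and the default "C" locale.

```c
#include <ctype.h>

/* Hypothetical helper: map one character through a 26-letter key.
   Non-letters are returned unchanged; the case of the input is preserved. */
char encipher_char(char ch, const char *key)
{
    const char *alphabet = "abcdefghijklmnopqrstuvwxyz";
    unsigned char c = (unsigned char)ch; /* ctype functions expect an unsigned char value */

    if (!isalpha(c))
    {
        return ch; /* print non-letters as-is */
    }
    for (int k = 0; alphabet[k] != '\0'; k++)
    {
        if (tolower(c) == alphabet[k])
        {
            unsigned char sub = (unsigned char)key[k];
            /* match the case of the plaintext character */
            return islower(c) ? (char)tolower(sub) : (char)toupper(sub);
        }
    }
    return ch; /* not reached for ASCII letters */
}
```

Inside the original loop this would be used as printf("%c", encipher_char(plaintext[i], key));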

Related

Trying to write a loop to replace letters in a string with letters from a different string at the same index, but loop is ending early

I'm working on a program that is supposed to take a key as an input argument, and encrypt a user input word using this key.
The program should:
Ask the user for a plaintext word to encrypt
Standardize the letter case
Take each letter from the plaintext and find the index of this letter (A = 0, B = 1,...)
Look at the letter indexed at this location in the key string (input argument)
Assign this encrypted letter to a new string called cypher
Print the new cyphertext string.
The code I'm using is this:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc, string argv[])
{
//Check that key has 26 letters or end program
string key = argv[1];
if (strlen(argv[1]) != 26)
{
printf("Key must contain 26 characters\n");
return 1;
}
//Get plaintext
string plain = get_string("plaintext: ");
//Make key all letters upper case
for (int i = 0; i < plain[i]; i++)
{
if (islower(plain[i]))
{
plain[i] = plain[i] - 32;
}
printf("%c", plain[i]);
}
printf("\n");
//Encrypt
int index[] = {};
int cypher[] = {};
//Cycle through the letters in the word to be encoded
//printf("cyphertext: ");
printf("%c\n", key[79 - 65]);
for (int i = 0; i < strlen(plain); i++)
{
printf("index in key: %i\n", plain[i] - 65);
cypher[i] = key[plain[i] - 65];
printf("cypher: %c\n", cypher[i]);
}
printf("\n");
}
Everything executes fine until the fourth loop of the for loop that assigns the new values to the cypher string. When the program tries to set i = 4, I get the error Segmentation fault (core dumped)
I was expecting the last for loop to loop once for each letter of the input (e.g. input: hello; loops: 5), but I found that it stops at 4 and only outputs: 'HELL'.
I tried:
Words with 4 characters - executes the correct number of loops, but I still get Segmentation fault (core dumped) after the final loop
Words with 3 characters - executes fine, no error
Words with 5+ letters - Still loops 4 times before error
Please help!
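The first for loop should iterate from 0 to the length of plain. The condition i < plain[i] compares the index with a character code rather than with the string length, so it stops at the right place only by accident. The arrays also need a size: int cypher[] = {} has no room for any elements, so writing to cypher[i] runs past the end of the array.
//Get plaintext
string plain = get_string("plaintext: ");
//Make plaintext all upper case
for (int i = 0; i < strlen(plain); i++)
{
    if (islower(plain[i]))
    {
        plain[i] = plain[i] - 32;
    }
    printf("%c", plain[i]);
}
//*** Must allocate memory for the arrays (100 only covers inputs shorter than 100 characters)
//Encrypt
int index[100] = {0};
int cypher[100] = {0};
//Cycle through the letters in the word to be encoded
//printf("cyphertext: ");
printf("%c\n", key[79 - 65]);
for (int i = 0; i < strlen(plain); i++)
{
    //Assumes plain[i] is a letter; other characters give an index outside the key
    printf("index in key: %i\n", plain[i] - 65);
    cypher[i] = key[plain[i] - 65];
    printf("cypher: %c\n", cypher[i]);
}
printf("\n");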
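A fixed size of 100 still overflows on longer input. One alternative, sketched below, is to size the output from the input length. The helper name encrypt_upper is hypothetical, malloc is used here although the question does not use it, and the key is assumed to hold 26 letters.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated ciphertext string,
   or NULL if the allocation fails. The caller must free() the result. */
char *encrypt_upper(const char *plain, const char *key)
{
    size_t len = strlen(plain);
    char *cypher = malloc(len + 1); /* one extra byte for the terminator */
    if (cypher == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char)plain[i];
        if (isalpha(c))
        {
            cypher[i] = key[toupper(c) - 'A']; /* A = 0, B = 1, ... */
        }
        else
        {
            cypher[i] = plain[i]; /* leave non-letters unchanged */
        }
    }
    cypher[len] = '\0';
    return cypher;
}
```

Because the buffer is allocated from strlen(plain), the loop can never write past its end, whatever the length of the input.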

substitution of one letter for another cipher key

So I am working on a CS50 problem where I need to make a simple cipher and be able to encrypt words or sentences... After 2 full days I actually kind of figured it all out, but my code was really long, and after some googling I found a version out there that was much better. It was real easy and all, except for one part that I don't really understand, and I would really like to find out how it works... so here is the full code below (unfortunately I can't seem to find the original source of the code right now, but I actually did at least half of it myself, and only the part after "// SUBSTITUTION" is copied):
What I wonder about are these two lines:
printf("%c", toupper(arg[plaintext[i] - 65])); //calculation to print the encipher text and make sure it is Uppercase (case doesn't change)
and
printf("%c", tolower(argv[1][plaintext[i] - 97])); //calculation to print the encipher text and make sure it is lowercase (case doesn't change)
...I can't wrap my head around how the calculations "-65" and "-97" solve the issue...
Let's say that in my key the first letter is a Q, so when I write an A, it should be substituted with a Q...
A = 65 and Q = 81 in the ASCII table, so when I take 65 - 65... hmm, why would I do that? Obviously it needs to be done for this program to work correctly, but I don't understand how it works and what actually happens...
What is the logic behind these calculations? Please help!
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc, string argv[]) {
if (argc != 2) {
printf("Usage: ./substitution key\n");
return 1;
}
string arg = argv[1];
int chars = 0;
for (int i = 0; i < strlen(arg); i++) {
if (isalpha(arg[i])) {
for (int j = i+1; j < strlen(arg); j++) {
if (toupper(arg[j]) == toupper(arg[i])) {
printf("Key must not contain repeated alphabets.\n");
return 1;
}
}
chars += 1;
}
}
if (chars != 26) {
printf("Key must contain 26 characters.\n");
return 1;
}
// SUBSTITUTION
printf("%s\n", arg);
string plaintext = get_string("plaintext: "); //Getting user's input as plaintext
printf("ciphertext: "); //to print the ciphertext
int plaintext_length = strlen(plaintext); //get the strlen of plaintext (user's input)
for (int i = 0; i < plaintext_length; i++) { //iterate over the plaintext_Length
if (isupper(plaintext[i])) { //check if plaintext character is uppercase
printf("%c", toupper(arg[plaintext[i] - 65])); //calculation to print the encipher text and make sure it is Uppercase (case doesn't change)
}
else if (islower(plaintext[i])) { //check if plaintext character is lowercase
printf("%c", tolower(arg[plaintext[i] - 97])); //calculation to print the encipher text and make sure it is lowercase (case doesn't change)
}
else { //if plaintext is anything else, print it like that without changing
printf("%c", plaintext[i]);
}
}
printf("\n"); //print new line
}
In general, this is not good code.
For example, instead of using the magic numbers 65 and 97
printf("%c", toupper(arg[plaintext[i] - 65]));
printf("%c", tolower(argv[1][plaintext[i] - 97]));
it is better to write
printf("%c", toupper(arg[plaintext[i] - 'A']));
printf("%c", tolower(argv[1][plaintext[i] - 'a']));
argv[1] or arg contains a string of 26 letters, for example
"xyzabcgtidefuvwjklmnorspqh"
If you have a string such as "Hello", then 'H' - 'A' gives the value 7. Using that number as an index, you find the letter t at position 7 in the array pointed to by argv[1] or arg:
"xyzabcgtidefuvwjklmnorspqh"
        ^
        |
       'H'
So the letter 'H' in the source string is encoded as the letter 'T'. For the second letter, 'e' - 'a' is equal to 4. So you have
"xyzabcgtidefuvwjklmnorspqh"
     ^
     |
    'e'
So after the first two letters, the string "Hello" becomes "Tbllo". The same approach is used for the remaining letters of the source string.
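The subtraction can be isolated in a tiny function to see what it produces. This is a sketch only: the name key_index is hypothetical, and it relies on the letters being contiguous in the character set, as they are in ASCII.

```c
#include <ctype.h>

/* Hypothetical helper: position of a letter in the alphabet (0-25),
   or -1 if the character is not a letter. */
int key_index(char ch)
{
    unsigned char c = (unsigned char)ch;
    if (isupper(c))
    {
        return c - 'A'; /* 'A' - 'A' = 0, 'H' - 'A' = 7 */
    }
    if (islower(c))
    {
        return c - 'a'; /* 'a' - 'a' = 0, 'e' - 'a' = 4 */
    }
    return -1;
}
```

With the example key, arg[key_index('H')] is 't' and arg[key_index('e')] is 'b'. Subtracting 'A' or 'a' turns a character code into a position in the alphabet, and that position is what indexes the key.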
"... I can't wrap my head around,..."
Yes, it is difficult to see what is going on in code that avoids (appropriately) using a temporary copy of the data of interest.
Without the comments, this is the "confusing" portion of your code, rewritten to temporarily use a single character variable:
// SUBSTITUTION
printf("%s\n", arg);
string plaintext = get_string("plaintext: "); //Getting user's input as plaintext
printf("ciphertext: "); //to print the ciphertext
int plaintext_length = strlen(plaintext); //get the strlen of plaintext (user's input)
for (int i = 0; i < plaintext_length; i++) //iterate over the plaintext_Length
{
char c = plaintext[i];
if( isupper( c ) )
{
printf("%c", toupper(arg[c - 65]));
}
else if ( islower( c ) )
{
printf("%c", tolower(arg[c - 97]));
}
else
{
printf("%c", plaintext[i]);
}
}
It is now obvious that there will be an output character, so 3 calls to printf() are distracting. Simplifying that leads to 're-using' the temporary char variable. (Here just showing the for() loop):
for (int i = 0; i < plaintext_length; i++) //iterate over the plaintext_Length
{
char c = plaintext[i];
if( isupper( c ) )
{
c = toupper(arg[c - 65]);
}
else if ( islower( c ) )
{
c = tolower(arg[c - 97]);
}
else
{
c = plaintext[i];
}
printf( "%c", c );
}
It is now apparent that the final 'else' is redundant:
for (int i = 0; i < plaintext_length; i++) //iterate over the plaintext_Length
{
char c = plaintext[i];
if( isupper( c ) )
{
c = toupper(arg[c - 65]);
}
else if ( islower( c ) )
{
c = tolower(arg[c - 97]);
}
printf( "%c", c );
}
The values 65 and 97 are called "magic numbers" (you already understand that they correspond to ASCII 'A' and 'a' respectively). Cleaning up that bad practice:
if( isupper( c ) )
{
c = toupper( arg[ c - 'A' ] );
}
else if ( islower( c ) )
{
c = tolower( arg[ c - 'a' ] );
}
printf( "%c", c );
It now is readily apparent that the 'case' of each input character determines the 'case' of the corresponding output character.
It should now also be readily apparent that the offset of the plaintext character from 'A' or 'a' ('A/a' = 0, 'B/b' = 1, 'C/c' = 2) is being calculated. The result of that calculation becomes the INDEX into the 26-character enciphering key. A plaintext 'Q' becomes 16, so the character at index 16 of the key (its 17th character) is "looked up", turned into the appropriate case, and then used.
This operation can be further reduced as per the following:
for (int i = 0; i < plaintext_length; i++) //iterate over the plaintext_Length
{
    char c = plaintext[i]; // copy of plaintext character
    if( isalpha( c ) ) { // translate only alphabetic chars
        c = tolower( c ) - 'a'; // 'a-z' ==> '0-25'
        c = arg[ c ]; // use as index into key.
        if( isupper( plaintext[i] ) ) // make case of enciphered char match input
            c = toupper( c );
        else
            c = tolower( c );
    }
    printf( "%c", c );
}
Or, even more:
for (int i = 0; i < plaintext_length; i++) //iterate over the plaintext_Length
{
    char c = plaintext[i]; // copy of plaintext character
    if( isalpha( c ) ) { // translate only alphabetic chars
        c = arg[ tolower( c ) - 'a' ]; // select corresponding 'key' character
        if( isupper( plaintext[i] ) ) // make case of enciphered char match input
            c = toupper( c );
        else
            c = tolower( c );
    }
    printf( "%c", c );
}
Although that seems intricate, its brevity is its strength.
EDIT: isalpha() toupper() and tolower() are standard C functions. The code will need to: #include <ctype.h> to use those functions.
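EDIT2: toupper() and tolower() take an int and return an int; the argument must be representable as an unsigned char (or be EOF). To stay within defined behaviour on platforms where char is signed, change the declaration of 'c':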
unsigned char c = plaintext[i]; // copy of plaintext character
EDIT3:
Your OP did not ask about the "validation code" that you say you wrote. I'm sorry, but it is insufficient. While it confirms there are 26 distinct characters in the key, a user could type a key containing additional punctuation sprinkled in. "abc" contains 3 distinct letters, but so does "a.b:!c"... You test for isalpha(). Why not halt immediately if a non-alpha char is found in the key? As written, illegitimate keys may be used and the 'enciphering' very, very incorrect...
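A stricter check along the lines suggested in EDIT3 might look like the sketch below. The function name is_valid_key is hypothetical, and the code assumes ASCII letters in the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only if key is exactly 26 letters,
   each letter of the alphabet appearing once (case-insensitive). */
bool is_valid_key(const char *key)
{
    bool seen[26] = { false };

    if (strlen(key) != 26)
    {
        return false;
    }
    for (int i = 0; i < 26; i++)
    {
        unsigned char c = (unsigned char)key[i];
        if (!isalpha(c))
        {
            return false; /* halt immediately on a non-letter */
        }
        int idx = toupper(c) - 'A';
        if (idx < 0 || idx > 25 || seen[idx])
        {
            return false; /* outside A-Z, or a repeated letter */
        }
        seen[idx] = true;
    }
    return true;
}
```

Checking the length first guarantees that every index into the key used later by the cipher (0 to 25) lands on a letter.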

Unsure as to why toupper() is cutting off last letter in C

The goal of this program is to take a 26-letter 'key' on the command line (through argv[]) and use its indexes as a substitution guide. There are two inputs: one in argv[] and one from a plain get_string() call. The argv[] input looks like this: ./s YTNSHKVEFXRBAUQZCLWDMIPGJO, where s is the file name. The get_string() input looks like this: plaintext: HELLO (the input is HELLO). The program then loops through all the letters in the plaintext and substitutes each one, using its alphabetical index as an index into the argv[] key. For example, H has an alphabetical index of 7 (where a = 0 and z = 25), so we look at index 7 in the key YTNSHKV(E)FXRBAUQZCLWDMIPGJO, which in this case is E. It does this for each letter in the input, and we end up with the output ciphertext: EHBBQ. This is what it should look like in the terminal:
./s YTNSHKVEFXRBAUQZCLWDMIPGJO
plaintext: HELLO
ciphertext: EHBBQ
But my output is EHBB, since it cuts off the last letter for some reason when I use toupper().
Also, the case of the output depends on the plaintext input: if the plaintext input was hello, world and the argv[] key was VCHPRZGJNTLSKFBDQWAXEUYMOI, the output would be jrssb, ybwsp, and if the input was HellO, world with the same key, the output would be JrssB, ybwsp.
I'm basically done with the problem: my program substitutes the given plaintext into the correct ciphertext based on the key that was passed on the command line. Right now, if the plaintext input was HELLO and the key was vchprzgjntlskfbdqwaxeuymoi (all lowercase), then it should return uppercase ciphertext (JRSSB) and not lowercase (jrssb). This is because my program puts all the letters of the command-line key into an array of length 26, loops through all the plaintext letters, and matches each letter's ASCII value (minus a certain number to get it into the 0-25 index range) with the index in the key. E has an alphabetical index of 4, so in this case my program would get lowercase r, but I need it to be R, so that's why I'm using toupper().
When I use tolower(), everything worked fine, and once I started using toupper(), the last letter of the ciphertext is cut off for some reason. Here is my output before using toupper():
ciphertext: EHBBQ
And here is my output after I use toupper():
ciphertext: EHBB
Here is my code:
int main(int argc, string argv[]) {
string plaintext = get_string("plaintext: ");
// Putting all the argv letters into an array called key
char key[26]; // change 4 to 26
for (int i = 0; i < 26; i++) // change 4 to 26
{
key[i] = argv[1][i];
}
// Assigning array called ciphertext, the length of the inputted text, to hold cipertext chars
char ciphertext[strlen(plaintext)];
// Looping through the inputted text, checking for upper and lower case letters
for (int i = 0; i < strlen(plaintext); i++)
{
// The letter is lower case
if (islower(plaintext[i]) != 0)
{
int asciiVal = plaintext[i] - 97; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
char l = tolower(key[asciiVal]); // tolower() works properly
//printf("%c", l);
strncat(ciphertext, &l, 1); // Using strncat() to append the converted plaintext char to ciphertext
}
// The letter is uppercase
else if (isupper(plaintext[i]) != 0)
{
int asciiVal = plaintext[i] - 65; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
char u = toupper(key[asciiVal]); // For some reason having this cuts off the last letter
strncat(ciphertext, &u, 1); // Using strncat() to append the converted plaintext char to ciphertext
}
// If its a space, comma, apostrophe, etc...
else
{
strncat(ciphertext, &plaintext[i], 1);
}
}
// prints out ciphertext output
printf("ciphertext: ");
for (int i = 0; i < strlen(plaintext); i++)
{
printf("%c", ciphertext[i]);
}
printf("\n");
printf("%c\n", ciphertext[1]);
printf("%c\n", ciphertext[4]);
//printf("%s\n", ciphertext);
return 0;
}
The strncat function expects its first argument to be a null-terminated string that it appends to. You're calling it with ciphertext while it is uninitialized. This means that you're reading uninitialized memory, possibly reading past the end of the array, triggering undefined behavior.
You need to make ciphertext an empty string before you call strncat on it. Also, you need to add 1 to the size of this array to account for the terminating null byte on the completed string to prevent writing off the end of it.
char ciphertext[strlen(plaintext)+1];
ciphertext[0] = 0;
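The two requirements above (an empty string to start from, and room for the terminator) can be sketched in isolation. The helper name append_each is hypothetical; it only illustrates the strncat pattern used in the question.

```c
#include <string.h>

/* Hypothetical helper: copy src into dest one character at a time with
   strncat. dest must have room for strlen(src) + 1 bytes. */
void append_each(char *dest, const char *src)
{
    dest[0] = '\0'; /* make dest an empty string before the first append */
    for (size_t i = 0; src[i] != '\0'; i++)
    {
        /* appends one character and then writes a new terminator */
        strncat(dest, &src[i], 1);
    }
}
```

Called with a buffer declared as char buf[6] and the source "HELLO", the buffer ends up holding all five letters plus the terminator. Without the dest[0] = '\0' line, strncat would first search uninitialized memory for a terminator.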
There are multiple problems in the code:
you do not test the command line argument presence and length
the array should be allocated with 1 extra byte for the null terminator and initialized as an empty string for strncat() to work properly.
instead of hard coding ASCII values such as 97 and 65, use character constants such as 'a' and 'A'
strncat() is overkill for your purpose. You could just write ciphertext[i] = l; instead of strncat(ciphertext, &l, 1)
islower() and isupper() are only defined for values representable as unsigned char and the special negative value EOF. You should cast char arguments as (unsigned char)c to avoid undefined behavior on non-ASCII bytes on platforms where char happens to be a signed type.
avoid redundant tests such as islower(xxx) != 0. It is more idiomatic to just write if (islower(xxx))
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <cs50.h>
int main(int argc, string argv[]) {
// Testing the argument
if (argc < 2 || strlen(argv[1]) != 26) {
printf("invalid or missing argument\n");
return 1;
}
// Putting all the argv letters into an array called key
char key[26];
memcpy(key, argv[1], 26);
string plaintext = get_string("plaintext: ");
int len = strlen(plaintext);
// Define an array called ciphertext, the length of the inputted text, to hold ciphertext chars and a null terminator
char ciphertext[len + 1];
// Looping through the inputted text, checking for upper and lower case letters
for (int i = 0; i < len; i++) {
unsigned char c = plaintext[i];
if (islower(c)) { // The letter is lower case
int index = c - 'a'; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
ciphertext[i] = tolower((unsigned char)key[index]);
} else
if (isupper(c)) {
// The letter is uppercase
int index = c - 'A'; // Converting from ascii to decimal value and getting it into alphabetical index (0-25)
ciphertext[i] = toupper((unsigned char)key[index]);
} else {
// other characters are unchanged
ciphertext[i] = c;
}
}
ciphertext[len] = '\0'; // set the null terminator
printf("ciphertext: %s\n", ciphertext);
return 0;
}

Why would a character array be unchanged after a for-loop?

I have built a function with the goal of taking text that is fed from elsewhere in the program and removing all whitespace and punctuation from it. I'm able to remove whitespace and punctuation, but the changes don't stay after they are made. For instance, I put the character array/string into a for-loop to remove whitespace and verify that the whitespace is removed by printing the current string to the screen. When I send the string through a loop to remove punctuation, though, it acts as though I did not remove whitespace from earlier. This is an example of what I'm talking about:
Example of output to screen
The function that I'm using is here.
//eliminates all punctuation, capital letters, and whitespace in plaintext
char *formatPlainText(char *plainText) {
int length = strlen(plainText);
//turn capital letters into lower case letters
for (int i = 0; i < length; i++)
plainText[i] = tolower(plainText[i]);
//remove whitespace
for (int i = 0; i < length; i++) {
if (plainText[i] == ' ')
plainText[i] = plainText[i++];
printf("%c", plainText[i]);
}
printf("\n\n");
//remove punctuation from text
for (int i = 0; i < length; i++) {
if (ispunct(plainText[i]))
plainText[i] = plainText[i++];
printf("%c", plainText[i]);
}
}
Any help as to why the text is unchanged after it exits the loop would be appreciated.
Those for loops are not necessary. Your function can be modified as follows, with comments where I made changes:
char *formatPlainText(char *plainText)
{
    char *start = plainText; // remember the start of the string so it can be returned
    char *dest = plainText;  // write position for the characters that are kept
    while (*plainText)       // as long as *plainText is not '\0'
    {
        int k = tolower((unsigned char)*plainText);
        if (!ispunct(k) && k != ' ') // skip spaces and punctuation marks
            *dest++ = k;             // store the lower-case character and advance dest
        plainText++;
    }
    *dest = '\0'; // terminate the shortened string, since the loop does not copy the terminator
    return start;
}
From main:
int main(void)
{
    char str[] = "Practice ????? &&!!! makes ??progress!!!!!";
    char *res = formatPlainText(str);
    printf("%s\n", res);
}
The code does convert the string to lower case, but the space and punctuation removal phases are broken: plainText[i] = plainText[i++]; has undefined behavior because you use i and modify it elsewhere in the same expression.
Furthermore, you do not return plainText from the function. Depending on how you use the function, this leads to undefined behavior if you store the return value to a pointer and later dereference it.
You can fix the problems by using 2 different index variables for reading and writing to the string when removing characters.
Note too that you should not use a length variable, as the string length changes in the second and third phases. Testing for the null terminator is simpler.
Also note that tolower() and ispunct() and other functions from <ctype.h> are only defined for argument values in the range 0..UCHAR_MAX and the special negative value EOF. char arguments must be cast as (unsigned char) to avoid undefined behavior on negative char values on platforms where char is signed by default.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
//eliminate all punctuation, capital letters, and whitespace in plaintext
char *formatPlainText(char *plainText) {
size_t i, j;
//turn capital letters into lower case letters
for (i = 0; plainText[i] != '\0'; i++) {
plainText[i] = tolower((unsigned char)plainText[i]);
}
printf("lowercase: %s\n", plainText);
//remove whitespace
for (i = j = 0; plainText[i] != '\0'; i++) {
if (plainText[i] != ' ')
plainText[j++] = plainText[i];
}
plainText[j] = '\0';
printf("no white space: %s\n", plainText);
//remove punctuation from text
for (i = j = 0; plainText[i] != '\0'; i++) {
if (!ispunct((unsigned char)plainText[i]))
plainText[j++] = plainText[i];
}
plainText[j] = '\0';
printf("no punctuation: %s\n", plainText);
return plainText;
}

C: Comparing hash value seems to disappear

For the love of holy code, I am trying to compare hashes to find the correct password. I am given a hash as a command line argument, and I then hash words from "a" to "ZZZZ" until one of the hash pairs match.
void decipher(string hash)
{
//Set the password, and the salt.
char pass[4] = "a";
char salt[] ="50";
//Compare the crypted pass against the hash until found.
while (strcmp(hash,crypt(pass, salt)) != 0)
{
//Use int i to hold position, and return next char
int i = 0;
pass[i] = get_next(pass[i]);
tick_over (pass, i);
//Hardcode in a fail safe max length: exit.
if (strlen(pass) > 4)
{
break;
}
}
printf("%s\n", pass);
}
The problem is that it will not 'catch' the correct password / comparison, when that password is 4 letters long. It works for 1,2 and 3 letter long words.
//Tick over casino style
string tick_over (string pass, int i)
{
//Once a char reaches 'Z', move the next char in line up one value.
char a[] = "a";
if (pass[i] == 'Z')
{
if (strlen(pass) < i+2)
{
strncat (pass, &a[0], 1);
return pass;
}
pass[i+1] = get_next(pass[i+1]);
//Recursively run again, moving along the string as necessary
tick_over (pass, i+1);
}
return pass;
}
//Give the next character in the sequence of available characters
char get_next (char y)
{
if (y == 'z')
{
return 'A';
}
else if (y == 'Z')
{
return 'a';
}
else
{
return y + 1;
}
}
It does iterate through the correct word, as I have found in debugging. I have tried moving the
strcmp(hash, crypt(pass, salt)) == 0
into a nested if statement among other things, but it doesn't seem to be the problem. Is c somehow 'forgetting' the command line value? When debugging the hash value seemed to have disappeared :/ Please help!
char pass[4] = "a"; defines a char array that can hold at most 3 characters plus the null terminator.
That is not consistent with your "safety" test: if (strlen(pass) > 4)
As soon as strlen(pass) reaches 4, the null terminator has already been written past the end of the array, which is undefined behaviour.
Quick fix: char pass[5] ... Note that the safety test only fires once the string is 5 characters long, so either make the array large enough for that case as well (char pass[6]) or check the length before appending.
Here is the description of the strncat function:
Append characters from string
Appends the first num characters of source to destination, plus a terminating null-character.
With a size of 4, you are not accounting for the terminating null character that follows your four characters.
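Checking the length before appending can be sketched as follows. The names MAX_PASS_LEN and append_if_room are hypothetical and not part of the original code; the idea is that the array size is derived from the maximum password length, so the two can never disagree.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PASS_LEN 4 /* longest candidate password to try */

/* Hypothetical helper: append ch to buf only if the result still fits.
   buf must be an array of MAX_PASS_LEN + 1 bytes holding a valid string. */
bool append_if_room(char *buf, char ch)
{
    size_t len = strlen(buf);
    if (len >= MAX_PASS_LEN)
    {
        return false; /* no room left: the caller should stop searching */
    }
    buf[len] = ch;
    buf[len + 1] = '\0'; /* always fits, because len + 1 <= MAX_PASS_LEN */
    return true;
}
```

With pass declared as char pass[MAX_PASS_LEN + 1], a false return value takes the place of the strlen(pass) > 4 test, and the buffer is never written past its end.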
