How can I refactor this with less code?
This is homework: cracking a Caesar ciphertext using frequency distribution.
I have completed the assignment but would like it to be cleaner.
int main(int argc, char **argv){
// first allocate some space for our input text (we will read from stdin).
char* text = (char*)malloc(sizeof(char)*TEXT_SIZE+1);
char textfreq[ALEN][2];
char map[ALEN][2];
char newtext[TEXT_SIZE];
char ch, opt, tmpc, tmpc2;
int i, j, tmpi;
// Check the CLI arguments and extract the mode: interactive or dump and store in opt.
if(!(argc == 2 && isalpha(opt = argv[1][1]) && (opt == 'i' || opt == 'd'))){
printf("format is: '%s' [-d|-i]\n", argv[0]);
exit(1);
}
// Now read TEXT_SIZE or feof worth of characters (whichever is smaller) and convert to uppercase as we do it.
for(i = 0, ch = fgetc(stdin); i < TEXT_SIZE && !feof(stdin); i++, ch = fgetc(stdin)){
text[i] = (isalpha(ch)?upcase(ch):ch);
}
text[i] = '\0'; // terminate the string properly.
// Assign alphabet to one dimension of text frequency array and a counter to the other dimension
for (i = 0; i < ALEN; i++) {
textfreq[i][0] = ALPHABET[i];
textfreq[i][1] = 0;
}
// Count frequency of characters in the given text
for (i = 0; i < strlen(text); i++) {
for (j = 0; j < ALEN; j++) {
if (text[i] == textfreq[j][0]) textfreq[j][1]+=1;
}
}
//Sort the character frequency array in descending order
for (i = 0; i < ALEN-1; i++) {
for (j= 0; j < ALEN-i-1; j++) {
if (textfreq[j][1] < textfreq[j+1][1]) {
tmpi = textfreq[j][1];
tmpc = textfreq[j][0];
textfreq[j][1] = textfreq[j+1][1];
textfreq[j][0] = textfreq[j+1][0];
textfreq[j+1][1] = tmpi;
textfreq[j+1][0] = tmpc;
}
}
}
//Map characters to most occurring English characters
for (i = 0; i < ALEN; i++) {
map[i][0] = CHFREQ[i];
map[i][1] = textfreq[i][0];
}
// Sort the map lexicographically
for (i = 0; i < ALEN-1; i++) {
for (j= 0; j < ALEN-i-1; j++) {
if (map[j][0] > map[j+1][0]) {
tmpc = map[j][0];
tmpc2 = map[j][1];
map[j][0] = map[j+1][0];
map[j][1] = map[j+1][1];
map[j+1][0] = tmpc;
map[j+1][1] = tmpc2;
}
}
}
if(opt == 'd'){
decode_text(text, newtext, map);
} else {
// do option -i
}
// Print alphabet and map to stderr and the decoded text to stdout
fprintf(stderr, "\n%s\n", ALPHABET);
for (i = 0; i < ALEN; i++) {
fprintf(stderr, "%c", map[i][1]);
}
printf("\n%s\n", newtext);
return 0;
}
Um, Refactoring != less code. Obfuscation can sometimes result in less code, if that is your objective :)
Refactoring is done to improve code readability and reduce complexity. Suggestions for improvement in your case:
Look at the chunks of logic you've implemented and consider replacing them with built-in functions; that is usually a good place to begin. I'm convinced that some of the sorting you've performed can be replaced with qsort(). Side note, however: if this is an assignment, your tutor may be a douche and want to see you write out the code in full rather than use C's built-in functions, and dock you points for being too smart. (Sorry, personal history here :P)
Move your logical units of work into dedicated functions, and have a main function to perform orchestration.
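As a rough sketch of both suggestions together, the counting and the descending sort could each become a small function, with qsort() doing the sorting. The struct, the function names and the ALEN value of 26 are assumptions made for illustration, not taken from the assignment, and the code assumes the letters A-Z are contiguous, as in ASCII.

```c
#include <ctype.h>
#include <stdlib.h>

#define ALEN 26   /* assumed alphabet length */

/* One entry per letter: the letter itself and how often it occurs. */
struct letter_count {
    char letter;
    int  count;
};

/* Count occurrences of A-Z in text, ignoring case.
 * out must have room for ALEN entries. */
void count_letters(const char *text, struct letter_count out[ALEN])
{
    for (int i = 0; i < ALEN; i++) {
        out[i].letter = (char)('A' + i);
        out[i].count = 0;
    }
    for (const char *p = text; *p != '\0'; p++) {
        int c = toupper((unsigned char)*p);
        if (c >= 'A' && c <= 'Z') {
            out[c - 'A'].count++;
        }
    }
}

/* qsort comparator: higher counts first, ties broken alphabetically. */
int by_count_desc(const void *a, const void *b)
{
    const struct letter_count *x = a;
    const struct letter_count *y = b;
    if (x->count != y->count) {
        return (x->count < y->count) - (x->count > y->count);
    }
    return (x->letter > y->letter) - (x->letter < y->letter);
}

/* Replaces the hand-written bubble sort with a single library call. */
void sort_by_frequency(struct letter_count counts[ALEN])
{
    qsort(counts, ALEN, sizeof counts[0], by_count_desc);
}
```

Keeping the letter and its count in one struct means qsort() moves them together, so the paired swaps through temporaries disappear. An int counter also avoids overflowing a char when a letter occurs more than 127 times.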
I want to search for a block of characters (word) in a text.
For example, I have the next text "Hello xyz world", and I want to search for "xyz ", note the space after the word.
// The Text
const char * text = "Hello xyz world";
// The target word
const char * patt = "xyz ";
size_t textLen = strlen(text),
pattLen = strlen(patt), i, j;
for (i = 0; i < textLen; i++) {
printf("%c", text[i]);
for (j = 0; j < pattLen; j++) {
if (text[i] == patt[j]) {
printf(" <--");
break;
}
}
printf("\n");
}
The result should look like the following:
But unfortunately, the result is the following:
It collects all the similar characters in the whole text, not just the target characters (the word).
How to fix that problem?
You have to do a full substring match before you print; mark the applicable characters on a first pass, and then have a second pass to print the results. In your case, you'd create a second array, with boolean values corresponding to the first, something like
text  = "Hello xyz world"
match =  000000111100000
I assume that you can find a basic substring match program online. Printing on the second pass will be easy: you already have the logic. Instead of if (text[i] == patt[j]), just use if (match[i]).
Is that enough of a hint?
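If the hint is not quite enough, here is one possible sketch of the two passes. It uses the standard strstr() for the substring search; the function names mark_matches and print_marked are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* First pass: return a malloc'd array with one flag per character of text.
 * match[i] is 1 when text[i] lies inside an occurrence of patt.
 * Returns NULL on allocation failure. The caller frees the result. */
unsigned char *mark_matches(const char *text, const char *patt)
{
    size_t textLen = strlen(text);
    size_t pattLen = strlen(patt);
    unsigned char *match = calloc(textLen + 1, 1);

    if (match == NULL || pattLen == 0) {
        return match;   /* nothing to mark */
    }
    for (const char *hit = strstr(text, patt); hit != NULL;
         hit = strstr(hit + 1, patt)) {
        memset(match + (hit - text), 1, pattLen);
    }
    return match;
}

/* Second pass: one character per line, with an arrow on flagged ones. */
void print_marked(const char *text, const unsigned char *match)
{
    for (size_t i = 0; text[i] != '\0'; i++) {
        printf("%c%s\n", text[i], match[i] ? " <--" : "");
    }
}
```

For "Hello xyz world" and "xyz " the flags come out as 000000111100000, so only the four characters of the full match get an arrow.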
You need to make sure that there is a full match before you start printing any <--. And to avoid reading past the end of text, you have to stop searching when fewer than pattLen characters remain.
Then, once you have found a full match, you can print the contents of patt, each character followed by <--, and advance the index by pattLen-1. At the end, you have to print the remaining characters of text.
Code could become:
// The Text
const char * text = "Hello xyz world";
// The target word
const char * patt = "xyz ";
size_t textLen = strlen(text),
pattLen = strlen(patt), i, j;
for (i = 0; i + pattLen <= textLen; i++) { // stop when fewer than pattLen chars remain (no size_t underflow if patt is longer than text)
printf("%c", text[i]);
if (text[i] == patt[0]) { // first char matches
int found = 1; // be optimistic...
for (j = 1; j < pattLen; j++) {
if (patt[j] != text[i + j]) {
found = 0;
break; // does not fully match, go on
}
}
if (found) { // yeah, a full match!
printf(" <--"); // already printed first char
for (j = 1; j < pattLen; j++) {
printf("\n%c <--", patt[j]);// print all others chars from patt
}
i += pattLen - 1; // increase index...
}
}
printf("\n");
}
while (i < textLen) {
printf("%c\n", text[i++]); // process the end of text
}
Above code gives expected output for "xyz " and also "llo"...
You should check every letter of your pattern from the beginning (and not check the whole pattern). Try this (not tested):
int currIndex = 0;
for (i = 0; i < textLen; i++) {
printf("%c", text[i]);
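// Mark a character only if a full match starts here (currIndex == 0)
// or if we are already inside a verified match (currIndex > 0).
if (text[i] == patt[currIndex] &&
(currIndex > 0 || strncmp(text + i, patt, pattLen) == 0)) {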
printf(" <--");
currIndex++;
if(currIndex==pattLen)
currIndex = 0;
}
else{
currIndex = 0;
}
printf("\n");
}
Note: It is not the best way to achieve this but the easiest with your example
Note 2: This question should be closed as it is:
Questions seeking debugging help ("why isn't this code working?") must
include the desired behavior, a specific problem or error and the
shortest code necessary to reproduce it in the question itself.
Questions without a clear problem statement are not useful to other
readers. See: How to create a Minimal, Complete, and Verifiable
example.
I have a program I wrote to take a string of words and, based on the delimiter that appears, separate each word and add it to an array.
I've adjusted it to account for either a ' ', ',' or '.'. Now the goal is to adjust for multiple delimiters appearing together (as in "the dog,,,was walking") and still only add the word. While my program works, and it doesn't print out extra delimiters, every time it encounters additional delimiters, it includes a space in the output instead of ignoring them.
int main(int argc, const char * argv[]) {
char *givenString = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
int stringCharCount;
//get length of string to allocate enough memory for array
for (int i = 0; i < 1000; i++) {
if (givenString[i] == '\0') {
break;
}
else {
stringCharCount++;
}
}
// counting # of commas in the original string
int commaCount = 1;
for (int i = 0; i < stringCharCount; i++) {
if (givenString[i] == ',' || givenString[i] == '.' || givenString[i] == ' ') {
commaCount++;
}
}
//declare blank Array that is the length of commas (which is the number of elements in the original string)
//char *finalArray[commaCount];
int z = 0;
char *finalArray[commaCount] ;
char *wordFiller = malloc(stringCharCount);
int j = 0;
char current = ' ';
for (int i = 0; i <= stringCharCount; i++) {
if (((givenString[i] == ',' || givenString[i] == '\0' || givenString[i] == ',' || givenString[i] == ' ') && (current != (' ' | '.' | ',')))) {
finalArray[z] = wordFiller;
wordFiller = malloc(stringCharCount);
j=0;
z++;
current = givenString[i];
}
else {
wordFiller[j++] = givenString[i];
}
}
for (int i = 0; i < commaCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
This program took me hours and hours to get together (with help from more experienced developers) and I can't help but get frustrated. I'm using the debugger to my best ability but definitely need more experience with it.
/////////
I went back to pad and paper and kind of rewrote my code. Now I'm trying to store delimiters in an array and compare the elements of that array to the current string value. If they are equal, then we have come across a new word and we add it to the final string array. I'm struggling to figure out the placement and content of the "for" loop that I would use for this.
char * original = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
//create two initialized variables to count the number of characters and elements to add to the array (so we can allocate enough memory)
int stringCharCount = 0;
//by setting elementCount to 1, we can account for the last word that comes after the last comma
int elementCount = 1;
//calculate value of stringCharCount and elementCount to allocate enough memory for temporary word storage and for final array
for (int i = 0; i < 1000; i++) {
if (original[i] == '\0') {
break;
}
else {
stringCharCount++;
if (original[i] == ',') {
elementCount++;
}
}
}
//account for the final element
elementCount = elementCount;
char *tempWord = malloc(stringCharCount);
char *finalArray[elementCount];
int a = 0;
int b = 0;
//int c = 0;
//char *delimiters[4] = {".", ",", " ", "\0"};
for (int i = 0; i <= stringCharCount; i++) {
if (original[i] == ',' || original[i] == '\0') {
finalArray[a] = tempWord;
tempWord = malloc(stringCharCount);
tempWord[b] = '\0';
b = 0;
a++;
}
else {
tempWord[b++] = original[i];
}
}
for (int i = 0; i < elementCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
Many issues. Suggest dividing the code into small pieces and debugging those first.
--
Uninitialized data.
// int stringCharCount;
int stringCharCount = 0;
...
stringCharCount++;
Or
int stringCharCount = strlen(givenString);
Other problems too: the strings stored in finalArray[] are never given a terminating null character, yet printf("%s\n", finalArray[i]); is used.
Unclear use of char *
char *wordFiller = malloc(stringCharCount);
wordFiller = malloc(stringCharCount);
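One way to sidestep most of these issues is to let the standard library find the word boundaries. The sketch below is only an illustration: split_words and its parameters are hypothetical names. It uses strspn() to skip any run of delimiters and strcspn() to measure the next word, so "the dog,,,was walking" produces no empty entries, and every word gets its terminating null character.

```c
#include <stdlib.h>
#include <string.h>

/* Split input on any character in delims, skipping runs of delimiters.
 * Stores up to maxWords malloc'd, null-terminated copies in words[] and
 * returns how many were stored. The caller frees each words[i]. */
size_t split_words(const char *input, const char *delims,
                   char *words[], size_t maxWords)
{
    size_t count = 0;
    const char *p = input;

    while (count < maxWords) {
        p += strspn(p, delims);           /* skip leading delimiters */
        if (*p == '\0') {
            break;                        /* nothing left but delimiters */
        }
        size_t len = strcspn(p, delims);  /* length of the next word */
        char *word = malloc(len + 1);     /* +1 for the terminator */
        if (word == NULL) {
            break;                        /* out of memory: stop early */
        }
        memcpy(word, p, len);
        word[len] = '\0';
        words[count++] = word;
        p += len;
    }
    return count;
}
```

Called as split_words("the dog,,,was walking", " ,.", words, 8), it stores the four words and returns 4. Each word is allocated at exactly its own length, so nothing is sized from the whole input string.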
There are more bugs than lines in your code.
I'd suggest you start with something much simpler.
Work through a basic programming book with exercises.
Edit
Or, if this is about learning to program, try another, simpler programming language:
In C# your task looks rather simple:
string givenString = "USA,Canada Mexico,Bermuda.Grenada,Belize";
string[] words = givenString.Split(new char[] {' ', ',', '.'});
foreach (string word in words)
Console.WriteLine(word);
As you see, there are far fewer issues to worry about:
No memory management (alloc/free); this is handled by the garbage collector
no pointers, so nothing can go wrong with them
powerful builtin string capabilities like Split()
foreach makes loops much simpler
I'm writing a program that will read input and then give back a histogram of the character count from K & R - Ex. 1.13
Any suggestions on how I can improve my code? Does it matter whether I test the in-word/out-of-word state first or the character first? In the examples I have seen, people test whether c is a blank or tab first.
I think I need to revisit my histogram. It doesn't really scale the results. It just draws a hyphen based on the length.
Revised to make a little bit more readable I think.
// Print a histogram of the length of words in its input.
#include <stdio.h>
#define IN 1
#define OUT 2
#define MAX 99
int main(){
int c; // the character
int countOfLetters = 0;
int insideWord = OUT;
int frequencyOfLengths[MAX];
int longestWordCount = 0;
int i, j; // Counters
for (i = 0; i < MAX; i++){
frequencyOfLengths[i] = 0;
}
while ((c = getchar()) != EOF){
if (c == ' ' || c == '\n' || c == '\t'){
if (insideWord == IN){
if (countOfLetters > MAX){
return 1;
}
++frequencyOfLengths[countOfLetters];
if (countOfLetters >= longestWordCount) longestWordCount = countOfLetters;
}
countOfLetters = 0;
}
else {
countOfLetters++;
insideWord = IN;
}
}
for (i = 1; i <= longestWordCount; i++){
printf("%3i : %3i ", i, frequencyOfLengths[i]);
for (j = 0; j < frequencyOfLengths[i]; j++){
printf("*");
}
printf("\n");
}
return 0;
}
Definitely scale the results; check out my Character Histogram, which draws a horizontally scaled histogram.
Also, you could benefit from a y-axis label. It's hard to tell which bar belongs to which word length.
I added this code right before the histogram is displayed. It halves every value until the longest bar fits, which does throw off the bar count labels. You can figure it out!
// Iterates and tells us the most frequent word length
int mostFrequent = 0;
for (i = 1; i < MAXWORD; i++)
if (charCount[i] > mostFrequent)
mostFrequent = charCount[i];
// If the bar will be too big, cut every value in half
while (mostFrequent > 60) {
for (i = 1; i < MAXWORD; i++)
if (charCount[i] > 0) {
charCount[i] /= 2;
charCount[i] |= 1;
}
// Check again to find the most frequent word length category
mostFrequent = 0;
for (i = 1; i < MAXWORD; i++)
if (charCount[i] > mostFrequent)
mostFrequent = charCount[i];
}
Honestly the bars are hard to read, maybe just use a single row of characters such as █ !
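If you would rather keep the printed counts accurate, one alternative is to leave the array alone and scale only the bar width at print time. This is just a sketch: MAX_BAR_WIDTH, scaled_width and print_histogram are names invented here, and the limit of 60 mirrors the threshold used above.

```c
#include <stdio.h>

#define MAX_BAR_WIDTH 60   /* assumed widest bar, in characters */

/* Scale count into 0..MAX_BAR_WIDTH relative to the largest count.
 * Any non-zero count gets at least one character so it stays visible. */
int scaled_width(int count, int largest)
{
    if (count <= 0 || largest <= 0) {
        return 0;
    }
    if (largest <= MAX_BAR_WIDTH) {
        return count;                 /* everything already fits */
    }
    int width = (int)((long long)count * MAX_BAR_WIDTH / largest);
    return width > 0 ? width : 1;
}

/* Print rows 1..longest as "length : count bar".
 * freq[] is never modified, so the printed counts stay exact. */
void print_histogram(const int freq[], int longest)
{
    int largest = 0;
    for (int i = 1; i <= longest; i++) {
        if (freq[i] > largest) {
            largest = freq[i];
        }
    }
    for (int i = 1; i <= longest; i++) {
        printf("%3d : %3d ", i, freq[i]);
        for (int j = scaled_width(freq[i], largest); j > 0; j--) {
            putchar('*');
        }
        putchar('\n');
    }
}
```

Because only the width is scaled, the number next to each bar is still the real count, and the bars keep their relative proportions instead of being rounded by repeated halving.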
Great book so far, we're practically reading it together and are on the same page!
Cheers
I am experiencing an issue where the invocation of realloc seems to modify the contents of another string, keyfile.
It's supposed to run through a null-terminated char* (keyfile), which contains just above 500 characters. The problem, however, is that the reallocation I perform in the while-loop seems to modify the contents of the keyfile.
I tried removing the dynamic reallocation with realloc and instead initialize the pointers in the for-loop with a size of 200*sizeof(int) instead. The problem remains, the keyfile string is modified during the (re)allocation of memory, and I have no idea why. I have confirmed this by printing the keyfile-string before and after both the malloc and realloc statements.
Note: The keyfile contains only the characters a-z: no digits, spaces, line breaks or uppercase. The text uses only the 26 lowercase letters.
int **getCharMap(const char *keyfile) {
char *alphabet = "abcdefghijklmnopqrstuvwxyz";
int **charmap = malloc(26*sizeof(int));
for (int i = 0; i < 26; i++) {
charmap[(int) alphabet[i]] = malloc(sizeof(int));
charmap[(int) alphabet[i]][0] = 0; // place a counter at index 0
}
int letter;
int count = 0;
unsigned char c = keyfile[count];
while (c != '\0') {
int arr_count = charmap[c][0];
arr_count++;
charmap[c] = realloc(charmap[c], (arr_count+1)*sizeof(int));
charmap[c][0] = arr_count;
charmap[c][arr_count] = count;
c = keyfile[++count];
}
// Just inspecting the results for debugging
printf("\nCHARMAP\n");
for (int i = 0; i < 26; i++) {
letter = (int) alphabet[i];
printf("%c: ", (char) letter);
int count = charmap[letter][0];
printf("%d", charmap[letter][0]);
if (count > 0) {
for (int j = 1; j < count+1; j++) {
printf(",%d", charmap[letter][j]);
}
}
printf("\n");
}
exit(0);
return charmap;
}
charmap[(int) alphabet[i]] = malloc(sizeof(int));
charmap[(int) alphabet[i]][0] = 0; // place a counter at index 0
You are writing beyond the end of your charmap array. So, you are invoking undefined behaviour and it's not surprising that you are seeing weird effects.
You are using the character codes as an index into the array, but they do not start at 0! They start at the character code for 'a', which is 97 in ASCII.
You should use alphabet[i] - 'a' as your array index.
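A small helper makes the index conversion explicit and rejects anything outside a-z. The names letter_index and count_lowercase are made up for this sketch, and it assumes the lowercase letters are contiguous, as they are in ASCII.

```c
#include <stddef.h>

#define LETTERS 26

/* Map 'a'..'z' to 0..25; return -1 for anything else.
 * Assumes the lowercase letters are contiguous, as in ASCII. */
int letter_index(char c)
{
    if (c < 'a' || c > 'z') {
        return -1;
    }
    return c - 'a';
}

/* Count how often each lowercase letter occurs in keyfile. */
void count_lowercase(const char *keyfile, size_t counts[LETTERS])
{
    for (int i = 0; i < LETTERS; i++) {
        counts[i] = 0;
    }
    for (size_t pos = 0; keyfile[pos] != '\0'; pos++) {
        int idx = letter_index(keyfile[pos]);
        if (idx >= 0) {
            counts[idx]++;   /* always within 0..25 */
        }
    }
}
```

With this mapping every index falls inside a 26-element array, so the writes can no longer land on neighbouring memory.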
The following piece of code is a source of trouble:
int **charmap = malloc(26*sizeof(int));
for (int i = 0; i < 26; i++)
charmap[...] = ...;
If sizeof(int) < sizeof(int*), the loop performs illegal memory accesses.
For example, on 64-bit platforms it is usually the case that sizeof(int) == 4 < 8 == sizeof(int*).
In that scenario, writing into charmap[13...25] accesses unallocated memory.
Change this:
int **charmap = malloc(26*sizeof(int));
To this:
int **charmap = malloc(26*sizeof(int*));
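Writing the size as sizeof *charmap avoids naming the element type at all, so it cannot drift out of sync with the declaration. Below is a sketch of how the whole table could be built that way. build_char_map and free_char_map are hypothetical names, and the sketch also indexes by letter - 'a' (assuming contiguous lowercase letters) so that the table really has only 26 slots.

```c
#include <stdlib.h>

#define LETTERS 26

/* Release a table created by build_char_map; NULL is accepted. */
void free_char_map(int **charmap)
{
    if (charmap == NULL) {
        return;
    }
    for (int i = 0; i < LETTERS; i++) {
        free(charmap[i]);
    }
    free(charmap);
}

/* Build a table of 26 int arrays. Slot 0 of each array holds the number of
 * occurrences of that letter; slots 1..n hold its positions in keyfile.
 * Characters outside a-z are ignored. Returns NULL on allocation failure. */
int **build_char_map(const char *keyfile)
{
    int **charmap = malloc(LETTERS * sizeof *charmap);   /* 26 pointers */
    if (charmap == NULL) {
        return NULL;
    }
    for (int i = 0; i < LETTERS; i++) {
        charmap[i] = NULL;             /* safe to free on early exit */
    }
    for (int i = 0; i < LETTERS; i++) {
        charmap[i] = malloc(sizeof **charmap);
        if (charmap[i] == NULL) {
            free_char_map(charmap);
            return NULL;
        }
        charmap[i][0] = 0;             /* counter at index 0 */
    }
    for (int pos = 0; keyfile[pos] != '\0'; pos++) {
        if (keyfile[pos] < 'a' || keyfile[pos] > 'z') {
            continue;
        }
        int idx = keyfile[pos] - 'a';
        int n = charmap[idx][0] + 1;
        int *grown = realloc(charmap[idx], ((size_t)n + 1) * sizeof *grown);
        if (grown == NULL) {
            free_char_map(charmap);
            return NULL;
        }
        charmap[idx] = grown;
        grown[0] = n;                  /* updated counter */
        grown[n] = pos;                /* position of this occurrence */
    }
    return charmap;
}
```

The result of realloc goes into a temporary first, so the original block is not leaked if the call fails.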
I've just started to get in to C programming and would appreciate criticism on my ReplaceString function.
It seems pretty fast (it doesn't allocate any memory other than one malloc for the result string) but it seems awfully verbose and I know it could be done better.
Example usage:
printf("New string: %s\n", ReplaceString("great", "ok", "have a g grea great day and have a great day great"));
printf("New string: %s\n", ReplaceString("great", "fantastic", "have a g grea great day and have a great day great"));
Code:
#ifndef uint
#define uint unsigned int
#endif
char *ReplaceString(char *needle, char *replace, char *haystack)
{
char *newString;
uint lNeedle = strlen(needle);
uint lReplace = strlen(replace);
uint lHaystack = strlen(haystack);
uint i;
uint j = 0;
uint k = 0;
uint lNew;
char active = 0;
uint start = 0;
uint end = 0;
/* Calculate new string size */
lNew = lHaystack;
for (i = 0; i < lHaystack; i++)
{
if ( (!active) && (haystack[i] == needle[0]))
{
/* Start of needle found */
active = 1;
start = i;
end = i;
}
else if ( (active) && (i-start == lNeedle) )
{
/* End of needle */
active = 0;
lNew += lReplace - lNeedle;
}
else if ( (active) && (i-start < lNeedle) && (haystack[i] == needle[i-start]) )
{
/* Next part of needle found */
end++;
}
else if (active)
{
/* Didn't match the entire needle... */
active = 0;
}
}
active= 0;
end = 0;
/* Prepare new string */
newString = malloc(sizeof(char) * lNew + 1);
newString[sizeof(char) * lNew] = 0;
/* Build new string */
for (i = 0; i < lHaystack; i++)
{
if ( (!active) && (haystack[i] == needle[0]))
{
/* Start of needle found */
active = 1;
start = i;
end = i;
}
else if ( (active) && (i-start == lNeedle) )
{
/* End of needle - apply replacement */
active = 0;
for (k = 0; k < lReplace; k++)
{
newString[j] = replace[k];
j++;
}
newString[j] = haystack[i];
j++;
}
else if ( (active) && (i-start < lNeedle) && (haystack[i] == needle[i-start])
)
{
/* Next part of needle found */
end++;
}
else if (active)
{
/* Didn't match the entire needle, so apply skipped chars */
active = 0;
for (k = start; k < end+2; k++)
{
newString[j] = haystack[k];
j++;
}
}
else if (!active)
{
/* No needle matched */
newString[j] = haystack[i];
j++;
}
}
/* If still matching a needle... */
if ( active && (i-start == lNeedle))
{
/* If full needle */
for (k = 0; k < lReplace; k++)
{
newString[j] = replace[k];
j++;
}
newString[j] = haystack[i];
j++;
}
else if (active)
{
for (k = start; k < end+2; k++)
{
newString[j] = haystack[k];
j++;
}
}
return newString;
}
Any ideas? Thanks very much!
Don't call strlen(haystack). You are already checking every character in the string, so computing the string length is implicit to your loop, as follows:
for (i = 0; haystack[i] != '\0'; i++)
{
...
}
lHaystack = i;
It's possible you are doing this in your own way for practice. If so, you get many points for effort.
If not, you can often save time by using functions that are in the C Runtime Library (CRT) versus coding your own equivalent function. For example, you could use strstr to locate the string that's targeted for replacement. Other string manipulation functions may also be useful to you.
A good exercise would be to complete this example to your satisfaction and then recode using the CRT to see how much faster it is to code and execute.
During the first loop, record the indices where a replacement is needed, then use them in the copy/replace part of the function. That leaves a second loop that only does strncpy from either haystack or replace into the new string.
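To make the suggestions above concrete, here is one possible sketch built on strstr(). It is a variation on the idea rather than a drop-in for the original: instead of storing indices it simply searches twice, once to size the result and once to copy. The name replace_all is made up to keep it distinct from ReplaceString.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of haystack with every non-overlapping occurrence
 * of needle replaced by replacement, or NULL on allocation failure.
 * An empty needle matches nothing. The caller frees the result. */
char *replace_all(const char *needle, const char *replacement,
                  const char *haystack)
{
    size_t needleLen = strlen(needle);
    size_t replLen = strlen(replacement);
    size_t matches = 0;

    /* First pass: count matches so the result can be sized exactly. */
    if (needleLen > 0) {
        for (const char *p = strstr(haystack, needle); p != NULL;
             p = strstr(p + needleLen, needle)) {
            matches++;
        }
    }

    size_t newLen = strlen(haystack) - matches * needleLen + matches * replLen;
    char *result = malloc(newLen + 1);
    if (result == NULL) {
        return NULL;
    }

    /* Second pass: copy the text between matches, then the replacement. */
    char *out = result;
    const char *in = haystack;
    if (needleLen > 0) {
        const char *hit;
        while ((hit = strstr(in, needle)) != NULL) {
            size_t gap = (size_t)(hit - in);
            memcpy(out, in, gap);
            out += gap;
            memcpy(out, replacement, replLen);
            out += replLen;
            in = hit + needleLen;
        }
    }
    strcpy(out, in);   /* remaining tail, including the terminator */
    return result;
}
```

Because each search resumes right after the previous match, back-to-back occurrences are handled too: replace_all("BAR", "bar", "BARBARA WENT TO THE BAR") yields "barbarA WENT TO THE bar".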
Make the parameters const
char *ReplaceString(const char *needle, const char *replace, const char *haystack)
Oh ... is the function supposed to work only once per word?
ReplaceString("BAR", "bar", "BARBARA WENT TO THE BAR")
My one suggestion has nothing to do with improving performance, but with improving readability.
"Cute" parameter names are much harder to understand than descriptive ones. Which of the following parameters do you think better convey their purpose?
char *ReplaceString(char *needle, char *replace, char *haystack)
char *ReplaceString(char *oldText, char *newText, char *inString)
With one, you have to consciously map a name to a purpose. With the other, the purpose IS the name. Juggling a bunch of name mappings in your head while trying to understand a piece of code can become difficult, especially as the number of variables increases.
This might not seem so important when you're the only one using your code, but it's paramount when your code is being used or read by someone else. And sometimes, "someone else" is yourself, a year later, looking at your own code, wondering why you're searching through haystacks and trying to replace needles ;)