Program runs too slowly with large input - C

The goal for this program is for it to count the number of instances that two consecutive letters are identical and print this number for every test case. The input can be up to 1,000,000 characters long (thus the size of the char array to hold the input). The website which has the coding challenge on it, however, states that the program times out at a 2s run-time. My question is, how can this program be optimized to process the data faster? Does the issue stem from the large char array?
Also: I get a compiler warning "assignment makes integer from pointer without a cast" for the line str[1000000] = "". What does this mean, and how should it be handled instead?
Input:
number of test cases
strings of capital A's and B's
Output:
Number of duplicate letters next to each other for each test case, each on a new line.
Code:
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
int n, c, a, results[10] = {};
char str[1000000];
scanf("%d", &n);
for (c = 0; c < n; c++) {
str[1000000] = "";
scanf("%s", str);
for (a = 0; a < (strlen(str)-1); a++) {
if (str[a] == str[a+1]) { results[c] += 1; }
}
}
for (c = 0; c < n; c++) {
printf("%d\n", results[c]);
}
return 0;
}

You don't need the line
str[1000000] = "";
scanf() adds a null terminator when it parses the input and writes it to str. This line is also writing beyond the end of the array, since the last element of the array is str[999999].
The reason you're getting the warning is that the type of str[1000000] is char, but a string literal is an array of char that decays to a char* here, so the assignment tries to store a pointer in a char.
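To speed up the program, take the call to strlen() out of the loop condition. As written, it rescans the whole string on every iteration, which makes the loop quadratic in the input length.
size_t len = strlen(str);
for (a = 0; a + 1 < len; a++) { /* a + 1 < len also avoids unsigned wrap-around when len is 0 */
...
}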
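The hoisting advice can be packaged as a small helper. This is a minimal sketch rather than the asker's code: the name count_adjacent_pairs is made up for illustration, and it counts overlapping pairs, so "AAAA" gives 3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts positions i where s[i] == s[i + 1].
   strlen() is called once, outside the loop. */
size_t count_adjacent_pairs(const char *s)
{
    size_t len = strlen(s);
    size_t pairs = 0;

    /* i + 1 < len is safe even when len is 0 */
    for (size_t i = 0; i + 1 < len; i++) {
        if (s[i] == s[i + 1]) {
            pairs++;
        }
    }
    return pairs;
}
```

In the question's main, the inner loop could then become results[c] = count_adjacent_pairs(str);.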

str[1000000] = "";
This does not do what you think it does, and it overflows the buffer, which results in undefined behaviour. Valid indices run from 0 up to, but not including, the number of elements in str. So either declare the array with 1000001 elements or use index 999999 instead. To get rid of the compiler warning and produce cleaner code, use:
str[1000000] = '\0';
Or
str[999999] = '\0';
Depending on what you did to fix it.
As to optimizing, you should look at the assembly and go from there.

count the number of instances that two consecutive letters are identical and print this number for every test case
For efficiency, the code needs a new approach, as suggested by #john bollinger & #molbdnilo
void ReportPairs(const char *str, size_t n) {
int previous = EOF;
unsigned long repeat = 0;
for (size_t i=0; i<n; i++) {
int ch = (unsigned char) str[i];
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
}
char *testcase1 = "test1122a33";
ReportPairs(testcase1, strlen(testcase1));
or directly from input and "each test case, each on a new line."
int ReportPairs2(FILE *inf) {
int previous = EOF;
unsigned long repeat = 0;
int ch;
while ((ch = fgetc(inf)) != '\n') {
if (ch == EOF) return ch;
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
return ch;
}
while (ReportPairs2(stdin) != EOF);
Unclear how OP wants to count "AAAA" as 2 or 3. This code counts it as 3.
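If the intended reading is the other one, where each letter may belong to at most one pair and "AAAA" counts as 2, a sketch could look like the following. The function name count_disjoint_pairs is hypothetical, and whether the challenge wants this interpretation is an assumption.

```c
#include <stddef.h>

/* Hypothetical alternative: each character may belong to at most one pair,
   so "AAAA" yields 2 and "AAA" yields 1. */
size_t count_disjoint_pairs(const char *s)
{
    size_t pairs = 0;
    size_t i = 0;

    while (s[i] != '\0' && s[i + 1] != '\0') {
        if (s[i] == s[i + 1]) {
            pairs++;
            i += 2;   /* skip both members of the pair */
        } else {
            i++;
        }
    }
    return pairs;
}
```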

One way to dramatically improve the run-time of your code is to limit the number of times you read from stdin (basically, process the input in bigger chunks). You can do this in a number of ways, but probably one of the most efficient is fread. Even reading in 8-byte chunks can provide a big improvement over reading one character at a time. One example of such an implementation, considering capital letters [A-Z] only, would be:
#include <stdio.h>
#define RSIZE 8
int main (void) {
char qword[RSIZE] = {0};
char last = 0;
size_t i = 0;
size_t nchr = 0;
size_t dcount = 0;
/* read up to 8-bytes at a time */
while ((nchr = fread (qword, sizeof *qword, RSIZE, stdin)))
{ /* compare each byte to byte before */
for (i = 1; i < nchr && qword[i] && qword[i] != '\n'; i++)
{ /* if not [A-Z] continue, else compare */
if (qword[i-1] < 'A' || qword[i-1] > 'Z') continue;
if (i == 1 && last == qword[i-1]) dcount++;
if (qword[i-1] == qword[i]) dcount++;
}
last = qword[i-1]; /* save last for comparison w/next */
}
printf ("\n sequential duplicated characters [A-Z] : %zu\n\n",
dcount);
return 0;
}
Output/Time with 868789 chars
$ time ./bin/find_dup_digits <dat/d434839c-d-input-d4340a6.txt
sequential duplicated characters [A-Z] : 434893
real 0m0.024s
user 0m0.017s
sys 0m0.005s
Note: the string was actually a string of '0's and '1's run with a modified test of if (qword[i-1] < '0' || qword[i-1] > '9') continue; rather than the test for [A-Z]...continue, but your results with 'A's and 'B's should be virtually identical. 1000000 would still be significantly under .1 seconds. You can play with the RSIZE value to see if there is any benefit to reading a larger (suggested 'power of 2') size of characters. (note: this counts AAAA as 3) Hope this helps.
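The chunked idea can also be written so that the per-chunk work is a separate function that remembers the last byte of the previous chunk. This is only a sketch under assumptions: count_pairs_in_chunk is a made-up name, and like the program above it only considers capital letters and counts "AAAA" as 3.

```c
#include <stddef.h>

/* Hypothetical helper: counts adjacent equal capital letters in one chunk.
   *last holds the final byte of the previous chunk (0 before the first
   chunk) and is updated, so pairs that straddle two chunks are counted. */
size_t count_pairs_in_chunk(const char *buf, size_t n, char *last)
{
    size_t pairs = 0;

    for (size_t i = 0; i < n; i++) {
        char ch = buf[i];
        if (ch >= 'A' && ch <= 'Z' && ch == *last) {
            pairs++;
        }
        *last = ch;   /* a newline stored here breaks pairs across lines */
    }
    return pairs;
}
```

A caller would read with something like n = fread(buf, 1, sizeof buf, stdin) in a loop and add up the return values; the buffer size is a tuning choice, and larger sizes mean fewer calls to fread.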

Related

Why is this code producing an infinite loop?

#include <Stdio.h>
#include <string.h>
int main(){
char str[51];
int k = 1;
printf("Enter string\n");
scanf("%s", &str);
for(int i = 0; i < strlen(str); i++){
while(str[k] != '\0')){
if(str[i] == str[k]){
printf("%c", str[i]);
k++;
}
}
}
return 0;
}
It is simple C code that checks for duplicate characters in string and prints the characters. I am not understanding why it is producing an infinite loop. The inner while loop should stop when str[k] reaches the null terminator but the program continues infinitely.
Points to know
You don't need to pass the address of the variable str to scanf()
Don't use "%s", use "%<WIDTH>s", to avoid buffer-overflow
Always check whether scanf() conversion was successful or not, by checking its return value
Always use size_t to iterate over an array
i < strlen(str) rescans the string on every iteration of the loop, adding O(n) work each time; check str[i] != '\0' instead. Many modern C compilers will optimize this anyway.
#include <Stdio.h> is wrong: the header is stdio.h, and the name is case-sensitive on many systems
The call to printf() can be replaced with puts() or putc() when no special formatting is needed; modern compilers may make this optimization too
while(str[k] != '\0')){ has an extra closing parenthesis
Initialize your variable str using {0} (or {} in C23); this sets every element of str to 0
Better Implementation
My approach is to create a table of counters, one per possible character value (256 when CHAR_BIT is 8), initialized to 0, and then increment the counter at the index given by each character of str. After that, print the characters whose count is greater than 1.
Time Complexity = O(n), where n is the length of the string
Space Complexity = O(NO_OF_CHARACTERS), where NO_OF_CHARACTERS is 256
Final Code
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
static void print_dup(const char *str)
{
size_t *count = calloc((size_t)1 << CHAR_BIT, sizeof *count);
if (count == NULL) /* allocation failed, nothing to report */
{
return;
}
for(size_t i = 0; str[i]; i++)
{
count[(unsigned char)str[i]]++;
}
for (size_t i = 0; i < ((size_t)1 << CHAR_BIT); i++)
{
if(count[i] > 1)
{
printf("`%c`, count = %zu\n", (int)i, count[i]); /* %c expects an int */
}
}
free(count);
}
int main(void) {
char str[51] = {0};
puts("Enter string:");
if (scanf("%50s", str) != 1)
{
perror("bad input");
return EXIT_FAILURE;
}
print_dup(str);
return EXIT_SUCCESS;
}
Read your code in English: You only increment variable k if character at index k is equal to character at index i. For any string that has different first two characters you will encounter infinite loop: char at index i==0 is not equal to char at index k==1, so k is not incremented and while(str[k]!=0) loops forever.
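For comparison, here is a sketch of the nested-loop approach with the two fixes that explanation implies: k restarts for every i, and k advances whether or not the characters match. The function name collect_repeats and the output buffer are assumptions made for illustration; the original program printed directly instead.

```c
#include <stddef.h>

/* Hypothetical helper: writes to out every character of str that occurs
   again later in str, and returns how many were written.
   out must have room for strlen(str) + 1 bytes. */
size_t collect_repeats(const char *str, char *out)
{
    size_t n = 0;

    for (size_t i = 0; str[i] != '\0'; i++) {
        size_t k = i + 1;              /* restart the scan for each i */
        while (str[k] != '\0') {
            if (str[i] == str[k]) {
                out[n++] = str[i];
                break;                 /* report each position once */
            }
            k++;                       /* always advance, match or not */
        }
    }
    out[n] = '\0';
    return n;
}
```

This is still quadratic in the string length, so the counting-table version remains the better choice for long inputs.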

Kernighan and Ritchie C exercise 1-16

I tried to implement a solution for the exercise on the C language of K&R's book. I wanted to ask here if this could be considered a legal "solution", just modifying the main without changing things inside external functions.
Revise the main routine of the longest-line program so it will
correctly print the length of arbitrary long input lines, and as much
as possible of the text.
#include <stdio.h>
#define MAXLINE 2 ////
int get_line1(char s[], int lim)
{
int c, i;
for (i = 0; i < lim - 1 && ((c = getchar()) != EOF) && c != '\n'; i++) {
s[i] = c;
}
if (c == '\n') {
s[i] = c;
i++;
}
s[i] = '\0';
return i;
}
int main()
{
int len;
int max = MAXLINE;
char line[MAXLINE];
int tot = 0;
int text_l = 0;
while ((len = get_line1(line, max)) > 0) {
if (line[len - 1] != '\n') {
tot = tot + len;
}
if (line[1] == '\n' || line[0] == '\n') {
printf("%d\n", tot + 1);
text_l = text_l + (tot + 1);
tot = 0;
}
}
printf("%d\n", text_l);
}
The idea is to set the maximum length of the string held in the array line to 2.
For a string such as abcdef\n, the array line will be ab. Since the last element of the array is not \n (so the line we are considering is not over), we save the length so far and repeat the cycle. We then get the array cd, then ef, and at the end the array holding just \n. Then the second if condition is executed, since the first element of this array is \n, and we print the tot length obtained from the previous additions. We add 1 in order to also count the newline character \n. This also works for strings of odd length: with abcdefg\n the process goes on until we reach g\n, and the sum is done correctly.
Outside the loop then we print the total amount of text.
Is this a correct way to do the exercise?
The exercise says to “Revise the main routine,” but you altered the definition of MAXLINE, which is outside of main, so that is not a valid solution.
Also, your code does not have the copy or getline routines of the original. Your get_line1 appears to be identical except for the name. However, a correct solution would use identical source code except for the code inside main.
Additionally, the exercise says to print “as much as possible of the text.” That is unclearly stated, but I expect it means to keep a buffer of MAXLINE characters (with MAXLINE at its original value of 1000) and use it to print the first MAXLINE−1 characters of the longest line.
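One possible reading of that last point, sketched under assumptions: keep the first part of the line in a fixed buffer and keep reading (and counting) until the newline arrives. The names get_chunk and full_line_length are made up here, and reading from a FILE * rather than getchar() is a simplification; a solution in the spirit of the book would leave getline and copy untouched and put the accumulation loop inside main.

```c
#include <stdio.h>

/* Reads at most lim - 1 characters of one line into s, in the style of
   K&R's getline, but from an explicit stream. Returns the count stored. */
int get_chunk(FILE *in, char s[], int lim)
{
    int c = EOF, i;

    for (i = 0; i < lim - 1 && (c = getc(in)) != EOF && c != '\n'; i++)
        s[i] = (char)c;
    if (c == '\n')
        s[i++] = (char)c;
    s[i] = '\0';
    return i;
}

/* Returns the full length of the next line, however long it is.
   head receives only the first lim - 1 characters of that line. */
long full_line_length(FILE *in, char head[], int lim)
{
    char rest[64];                     /* scratch space for the overflow */
    const char *last = head;
    int len = get_chunk(in, head, lim);
    long total = len;

    /* keep reading while the most recent chunk did not end the line */
    while (len > 0 && last[len - 1] != '\n') {
        len = get_chunk(in, rest, (int)sizeof rest);
        total += len;
        last = rest;
    }
    return total;
}
```

The main loop would then compare the returned total against the longest seen so far and copy head when a new maximum is found.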

Program to get an indefinite number of strings in C and print them out

As part of an assignment, I am supposed to write a small program that accepts an indefinite number of strings, and then print them out.
This program compiles, with the following warning:
desafio1.c:24:16: warning: format not a string literal and no format arguments [-Wformat-security]
printf(words[i]);
It prints the following characters on the screen: �����8 ���#Rl�. I guess I did not properly terminate the strings I read with getchar with the null byte, so it prints garbage. The logic of the program is to run a while loop until I press the Enter key (\n); when a space is found, the characters read so far form a word that is stored in the array words. Why am I running into problems if, in the else branch, once a space is found, I terminate the word with word[i] = \0 and store the result in the array words?
#include <stdio.h>
#include <string.h>
int main()
{
char words[100][100];
int i,c;
char word[1000];
while((c = getchar()) != '\n')
{
if (c != ' '){
word[i++] = c;
c = getchar();
}
else{
word[i] = '\0';
words[i] == word;
}
}
int num = sizeof(words) / sizeof(words[0]);
for (i = 0; i < num; i++){
printf(words[i]);
}
return 0;
}
Here are some fixes to your code. As a pointer (as mentioned in other comments), make sure to enable compiler warnings, which will help you find 90% of the issues you had. (gcc -Wall)
#include <stdio.h>
#include <string.h>
int main() {
char words[100][100];
int i = 0;
int j = 0;
int c;
char word[1000];
while((c = getchar()) != '\n') {
if (c != ' '){
word[i++] = c;
} else {
word[i] = '\0';
strcpy(words[j++], word);
i = 0;
}
}
word[i] = '\0';
strcpy(words[j++], word);
for (i = 0; i < j; i++) {
printf("%s\n", words[i]);
}
return 0;
}
i was uninitialized, so its value was undefined. It should start at 0. It also needs to be reset to 0 after each word so it starts at the beginning.
The second c = getchar() was unnecessary, as this is done in every iteration of the loop. This was causing your code to skip every other letter.
You need two counters, one for the place in the word, and one for the number of words read in. That's what j is.
== is for comparison, not assignment. Either way, strcpy() was needed here since you are filling out an array.
Rather than looping through all 100 elements of the array, just loop through the words that have actually been filled (up to j).
The last word input was ignored by your code, since it ends with a \n, not a space. That's what the lines after the while are for.
When using printf(), the arguments should always be a format string ("%s"), followed by the arguments.
Of course, there are other things as well that I didn't fix (such as the disagreement between the 1000-character word and the 100-character words). If I were you, I'd think about what to do if the user entered, for some reason, more than 1000 characters in a word, or more than 100 words. Your logic will need to be modified in these cases to prevent illegal memory accesses (outside the bounds of the arrays).
As a reminder, this program does not accept an indefinite number of words, but only up to 100. You may need to rethink your solution as a result.
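To lift the 100-word limit, the array of words can be grown on demand with realloc. The following is a rough sketch, not a drop-in replacement for the program above: split_words and free_words are hypothetical names, the input is assumed to be a line already held in memory, and words are assumed to be separated by spaces only.

```c
#include <stdlib.h>
#include <string.h>

/* Frees an array produced by split_words(). */
void free_words(char **words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(words[i]);
    }
    free(words);
}

/* Splits line on spaces into heap-allocated copies of each word.
   Stores the number of words in *count. Returns NULL when there are
   no words or when an allocation fails. */
char **split_words(const char *line, size_t *count)
{
    char **words = NULL;
    size_t used = 0, cap = 0;

    *count = 0;
    for (;;) {
        line += strspn(line, " ");            /* skip separators */
        size_t len = strcspn(line, " \n");    /* length of the next word */
        if (len == 0) {
            break;                            /* end of line or of string */
        }
        if (used == cap) {                    /* array full: double it */
            size_t new_cap = cap ? cap * 2 : 4;
            char **grown = realloc(words, new_cap * sizeof *grown);
            if (grown == NULL) {
                free_words(words, used);
                return NULL;
            }
            words = grown;
            cap = new_cap;
        }
        char *copy = malloc(len + 1);
        if (copy == NULL) {
            free_words(words, used);
            return NULL;
        }
        memcpy(copy, line, len);
        copy[len] = '\0';
        words[used++] = copy;
        line += len;
    }
    *count = used;
    return words;
}
```

Doubling the capacity keeps the number of realloc calls small; the same trick would be needed for the input line itself if its length is also unbounded.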

Getchar gets char only one time

Even though the while condition is true, getchar reads a character only once. I tried calling getchar both in the while condition and in the loop body, but it doesn't work.
int main() {
char* s = malloc(sizeof(char)) /*= get_string("Write number: ")*/;
char a[MAXN];
int i = 0;
do {
a[i] = getchar();
*s++ = a[i];
i++;
} while (isdigit(a[i-1]) && a[i-1] != EOF && a[i-1] != '\n' && i< MAXN);
/*while (isdigit(*s++=getchar()))
i++;*/
*s = '\0';
s -= i;
long n = conversion(s);
printf("\n%lu\n", n);
}
As others have pointed out, there isn't much use for s because a can be passed to conversion. And, again, the malloc for s only allocates a single byte.
You're incrementing i before doing the loop tests, so you have to use i-1 there. Also, the loop ends with i being one too large.
Even for your original code, doing int chr = getchar(); a[i] = chr; and replacing a[i-1] with chr can simplify things a bit.
Better yet, by restructuring to use a for instead of a do/while loop, we can add some more commenting for each escape condition rather than a larger single condition expression.
#include <ctype.h>
#include <stdio.h>
#define MAXN 1000
int
main(void)
{
char a[MAXN + 1];
int i;
for (i = 0; i < MAXN; ++i) {
// get the next character
int chr = getchar();
// stop on EOF
if (chr == EOF)
break;
// stop on newline
if (chr == '\n')
break;
// stop on non-digit
if (! isdigit(chr))
break;
// add digit to the output array
a[i] = chr;
}
// add EOS terminator to string
a[i] = 0;
unsigned long n = conversion(a);
printf("\n%lu\n",n);
return 0;
}
Code does not allocate enough memory with malloc(sizeof(char)) as that is only 1 byte.
When code tries to save a 2nd char into s, bad things can happen: undefined behavior (UB).
In any case, the allocation is not needed.
Instead form a reasonable fixed sized buffer and store characters/digits there.
// The max digits in a `long` is about log10(LONG_MAX) + a few
// The number of [bits in a `long`]/3 is a bit more than log10(LONG_MAX)
#define LONG_DEC_SZ (CHAR_BIT*sizeof(long)/3 + 3)
int main(void) {
char a[LONG_DEC_SZ * 2]; // lets go for 2x to allow some leading zeros
int i = 0;
int ch; // `getchar()` typically returns 257 different values, use `int`
// As long as there is room and code is reading digits ...
while (i < (int)(sizeof a) - 1 && isdigit((ch = getchar())) ) { // leave room for '\0'
a[i++] = ch;
}
a[i] = '\0';
long n = conversion(a);
printf("\n%ld\n", n);
}
To Do: This code does not allow a leading sign character like '-' or '+'
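One way the sign could be handled is sketched below. It is an illustration only: read_signed_digits is a made-up name, it reads from an explicit FILE * (pass stdin to match the code above), and it pushes the first unused character back with ungetc so the caller can still see it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads an optional '+' or '-' followed by decimal
   digits from in into buf (at most size - 1 characters), terminates buf
   with '\0', and returns the number of characters stored. */
size_t read_signed_digits(FILE *in, char *buf, size_t size)
{
    size_t i = 0;
    int ch;                            /* int, so EOF stays distinguishable */

    if (size == 0) {
        return 0;
    }
    ch = getc(in);
    if ((ch == '-' || ch == '+') && i + 1 < size) {
        buf[i++] = (char)ch;
        ch = getc(in);
    }
    while (ch != EOF && isdigit(ch) && i + 1 < size) {
        buf[i++] = (char)ch;
        ch = getc(in);
    }
    if (ch != EOF) {
        ungetc(ch, in);                /* leave the stopper for the caller */
    }
    buf[i] = '\0';
    return i;
}
```

The resulting string could then be handed to conversion, or to the standard strtol if range checking is wanted.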

C program, Reversing an array

I am writing a C program that reads a line of characters from standard input and then outputs the line of characters in reverse order.
It doesn't print the reversed array; instead it prints the regular array.
Can anyone help me?
What am I doing wrong?
main()
{
int count;
int MAX_SIZE = 20;
char c;
char arr[MAX_SIZE];
char revArr[MAX_SIZE];
while(c != EOF)
{
count = 0;
c = getchar();
arr[count++] = c;
getReverse(revArr, arr);
printf("%s", revArr);
if (c == '\n')
{
printf("\n");
count = 0;
}
}
}
void getReverse(char dest[], char src[])
{
int i, j, n = sizeof(src);
for (i = n - 1, j = 0; i >= 0; i--)
{
j = 0;
dest[j] = src[i];
j++;
}
}
You have quite a few problems in there. The first is that there is no prototype in scope for getReverse() when you use it in main(). You should either provide a prototype or just move getReverse() to above main() so that main() knows about it.
The second is the fact that you're trying to reverse the string after every character being entered, and that your input method is not quite right (it checks an indeterminate c before ever getting a character). It would be better as something like this:
count = 0;
c = getchar();
while (c != EOF) {
arr[count++] = c;
c = getchar();
}
arr[count] = '\0';
That will get you a proper C string albeit one with a newline on the end, and even possibly a multi-line string, which doesn't match your specs ("reads input from the standard input a line of characters"). If you want a newline or file-end to terminate input, you can use this instead:
count = 0;
c = getchar();
while ((c != '\n') && (c != EOF)) {
arr[count++] = c;
c = getchar();
}
arr[count] = '\0';
And, on top of that, c should actually be an int, not a char, because it has to be able to store every possible character plus the EOF marker.
Your getReverse() function also has problems, mainly due to the fact it's not putting an end-string marker at the end of the array but also because it uses the wrong size (sizeof rather than strlen) and because it appears to re-initialise j every time through the loop. In any case, it can be greatly simplified:
void getReverse (char *dest, char *src) {
int i = strlen(src) - 1, j = 0;
while (i >= 0) {
dest[j] = src[i];
j++;
i--;
}
dest[j] = '\0';
}
or, once you're a proficient coder:
void getReverse (char *dest, char *src) {
int i = strlen(src) - 1, j = 0;
while (i >= 0)
dest[j++] = src[i--];
dest[j] = '\0';
}
If you need a main program which gives you reversed characters for each line, you can do that with something like this:
int main (void) {
int count;
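int MAX_SIZE = 80; /* raised from 20 so the sample lines fit */
int c;
char arr[MAX_SIZE];
char revArr[MAX_SIZE];
c = getchar();
count = 0;
while(c != EOF) {
if (c != '\n') {
if (count < MAX_SIZE - 1) /* drop characters that would overflow arr */
arr[count++] = c;
c = getchar();
continue;
}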
arr[count] = '\0';
getReverse(revArr, arr);
printf("'%s' => '%s'\n", arr, revArr);
count = 0;
c = getchar();
}
return 0;
}
which, on a sample run, shows:
pax> ./testprog
hello
'hello' => 'olleh'
goodbye
'goodbye' => 'eybdoog'
a man a plan a canal panama
'a man a plan a canal panama' => 'amanap lanac a nalp a nam a'
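If a separate destination buffer is not needed, the reversal can also be done in place by swapping characters from both ends toward the middle. A minimal sketch, with reverse_in_place being a hypothetical name rather than something from the answers here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reverses the NUL-terminated string s in place. */
void reverse_in_place(char *s)
{
    size_t len = strlen(s);

    if (len < 2) {
        return;                        /* nothing to swap */
    }
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

The argument must be a writable array; passing a string literal would be undefined behaviour.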
Your count variable is reset to 0 on every pass through the while loop, so each character is stored at arr[0].
You are passing the array to the reversal function after every character, which is wasteful even though it won't cause errors by itself. Instead, store all the characters in the array first, and call getReverse once when the array is complete.
sizeof(src) will not give the number of characters; inside the function, src is a pointer, so sizeof gives the size of a pointer. You could pass the count from main as an extra parameter once the loop has finished. Of course there are many other ways, but since you seem to be just starting out, you can also try strlen and similar functions.
You initialise j to 0 in the for statement, but setting it again INSIDE the loop resets it on every iteration, so j never advances. Remove the j = 0 from inside the loop, since it only needs to be initialised once.
check this out
#include <stdio.h>
#include <ctype.h>
void getReverse(char dest[], char src[], int count);
int main()
{
// *always* initialize variables
int count = 0;
// max string length, plus one for the ending \0; an enum gives true constants,
// which C requires for the size of an array that has an initializer
enum { MaxLen = 20, MaxSize = MaxLen + 1 };
int c = '\0';
char arr[MaxSize] = {0};
char revArr[MaxSize] = {0};
// first collect characters to be reversed
// note that input is buffered so user could enter more than MaxLen
do
{
c = fgetc(stdin);
if ( c != EOF && (isalpha(c) || isdigit(c))) // only consider "proper" characters
{
arr[count++] = (char)c;
}
}
while(c != EOF && c != '\n' && count < MaxLen); // EOF or Newline or MaxLen
getReverse( revArr, arr, count );
printf("%s\n", revArr);
return 0;
}
void getReverse(char dest[], char src[], int count)
{
int i = count - 1;
int j = 0;
while ( i > -1 )
{
dest[j++] = src[i--];
}
}
Dealing with strings is a rich source of bugs in C, because even simple operations like copying and modifying require thinking about issues of allocation and storage. This problem though can be simplified considerably by thinking of the input and output not as strings but as streams of characters, and relying on recursion and local storage to handle all allocation.
The following is a complete program that will read one line of standard input and print its reverse to standard output, with the length of the input limited only by the growth of the stack:
#include <stdio.h>
int florb (int c) { return c == '\n' || c == EOF ? '\n' : (putchar(florb(getchar())), c); }
int main (void) { florb('-'); return 0; }
..or check this
#include <stdio.h>
#include <stdlib.h>
#define MAX 100
void my_rev(const char *source);

int main(void)
{
    char *stringA = malloc(MAX); /* room for up to MAX - 1 characters plus '\0' */
    if (stringA == NULL) /* if malloc returns NULL, an error message is printed and the program exits */
    {
        fprintf(stderr, "Out of memory error\n");
        exit(1);
    }
    fprintf(stdout, "Type a string:\n");
    if (fgets(stringA, MAX, stdin) != NULL)
    {
        my_rev(stringA);
    }
    free(stringA);
    return 0;
}

void my_rev(const char *source) /* const makes sure the function does not modify the string */
{
    size_t len = 0;
    /* fgets keeps the trailing newline when it fits, so stop at '\n' or '\0' */
    while (source[len] != '\n' && source[len] != '\0')
    {
        len++;
    }
    while (len > 0) /* print the characters from last to first */
    {
        fputc(source[--len], stdout);
    }
    fputc('\n', stdout);
}
Output looks like this:
Type a string:
writing about C programming
gnimmargorp C tuoba gnitirw

Resources