Normalizing the frequency of letters in a given .txt file - c

The problem: Given a txt file, find the normalized frequencies of all the letters in the document.
For example, given letters.txt containing "aaabb", the output would be:
Letter | Frequency
a      | 0.6
b      | 0.4
Non-letters (including numbers) should be ignored.
My solution so far:
Since they want to input a text file, my main() should receive command line arguments.
int main (int argc, char* argv[]){
I've made an EOF check using getchar()
while ((c=getchar()) != EOF){
and an if statement that checks whether the char returned by getchar() is within the ASCII values for a-z or A-Z:
if (argv[1][c] >= 'a' && argv[1][c] <= 'z' || argv[1][c] >= 'A' && argv[1]<= 'Z')
Two things here. First, I don't know if argv[1][c] is the right way to write this, but intuitively it made sense to me.
Second, once the check is satisfied, I want the corresponding letter to update a count for its position in the alphabet. That needs a declared array whose element for that letter is incremented each time the letter is found.
count[26];
This is where I'm having trouble: associating the letter a or A with position count[0] in the count array. I don't know how to code this part.

Why not create an int array that is of size 52 and have the first half of the array be for lower-case character counts and the upper half be for uppercase?
So in pseudocode:
#define ALPHA_COUNTS (52)
#define UPPER_OFFSET (26)
int counts[ALPHA_COUNTS] = {0};
for (char c : the_file_stream) {
if (c is an alphabet character) {
if (c is a lowercase character){
++counts[c - 'a'];
} else {
++counts[c - 'A' + UPPER_OFFSET];
}
}
}
Even easier would be to create a table for all ASCII characters and populate it only for alphabet characters:
#define ASCII_COUNT (128) /* ASCII codes 0..127 */
int counts[ASCII_COUNT] = {0};
for (char c : the_file_stream) {
if (c is an alphabet character) {
++counts[c];
}
}
Later on, you could then iterate through the letters [A-Za-z] and check the count of each one.

Simplify the statistics gathering by counting the occurrences of every input character. Non-letters (including numbers) are then simply ignored when producing the result.
unsigned long long count[UCHAR_MAX + 1] = {0};
int ch;
while ((ch=getchar()) != EOF){
count[ch]++;
}
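Now fold upper case into lower case and sum only the letters. Folding in a separate first pass means each letter is added to the sum exactly once:
unsigned long long sum = 0;
for (int i=0; i<=UCHAR_MAX; i++) {
// fold into lower case
if (isupper(i) && tolower(i) != i) {
count[tolower(i)] += count[i];
count[i] = 0;
}
}
for (int i=0; i<=UCHAR_MAX; i++) {
if (isalpha(i)) {
sum += count[i];
}
}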
Print their frequencies:
for (int i=0; i<=UCHAR_MAX; i++) {
if (isalpha(i) && count[i] > 0) {
printf("%c %f\n", i, 1.0*count[i]/sum);
}
}

There are a number of different ways to approach the problem. You can make use of the functions provided in ctype.h (e.g. isalpha, tolower, toupper, etc.). Alternatively, given the limited number of tests required, you can simply test the characters directly using arithmetic or basic bitwise operations. For example, you can test whether a value is between 'a' and 'z' for lower case. For letters in 7-bit ASCII, bit 5 (the 6th bit, value 0x20) is the case bit, so simply toggling it changes a letter from upper to lower case or vice versa.
The read-then-analyze approach chux outlined is an excellent one to take. Any time you can separate input/output from data processing, you give yourself a great deal of flexibility.
Using that logic, a program that uses arithmetic and simple bitwise operations to analyze the frequency of alpha characters ([A-Za-z]) in a file can be written along the following lines. Note: the program reads from the filename given as the first argument (or from stdin by default if no filename is given):
#include <stdio.h>
#include <limits.h>
int main (int argc, char **argv) {
unsigned long long count[UCHAR_MAX + 1] = {0}, sum = 0;
int c, i;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while ((c = fgetc(fp)) != EOF) count[c]++; /* fill count */
for (i = 0; i <= UCHAR_MAX; i++) {/* freq of only chars */
if ('A' <= i && i <= 'Z') { /* fold upper-case */
count[i ^ (1u << 5)] += count [i]; /* into lower */
count[i] = 0; /* zero index */
}
if ('a' <= i && i <= 'z') /* if lower-case */
sum += count[i]; /* add to sum */
}
printf ("\n total characters [A-Za-z]: %llu\n\n", sum);
for (i = 0; i <= UCHAR_MAX; i++)
if (count[i] > 0 && ('a' <= i && i <= 'z'))
printf (" %c%c : %.2f\n", i ^ (1u << 5), i, 1.0 * count[i]/sum);
putchar ('\n');
if (fp != stdin) fclose (fp); /* close if not stdin */
return 0;
}
Example Use/Output
Using your example of 'aaabb', the program produces the following:
$ ./bin/char_freq < <(echo "aaabb")
total characters [A-Za-z]: 5
Aa : 0.60
Bb : 0.40
A slightly longer example shows that only [A-Za-z] are counted:
$ ./bin/char_freq < <(echo "*(a)123A_a/B+4b.")
total characters [A-Za-z]: 5
Aa : 0.60
Bb : 0.40
Look over this answer as well as all the others and let me know if you have additional questions.

Related

checking if a string contains numbers

I want to check whether a passcode contains at least 2 digits.
I tried it with a nested for loop like this:
for(int i = 0; i <= count; i++)
{
for(int k = 0; k < 10; k++)
{
if(passcode[i] == k)
{
printf("yee\n");
break;
}
}
}
but it doesn't work.
I thought that on each 'round', k would be a number from 0 to 9, and if it was equal to a number the user gave, it would (in this case) print something.
(passcode is the input from the user and count is the number of characters entered)
int sum = 0;
for(int i = 0; i < count; i++) /* valid indexes are 0..count-1 */
{
if (passcode[i] >= '0' && passcode[i] <= '9')
{
sum++;
}
}
if (sum >=2)
printf("At least two digits");
As mentioned in some comments, you can also use the standard isdigit() function, declared in <ctype.h>.
One other important point (also in the comments): a character and the number it represents are different things, so '2' is not the same as 2. '2' is a character, one of the 128 ASCII characters. When you compare it with a number, you are comparing its ASCII code, which is 50 for the character '2'.
The digits are just ASCII values, so a single range comparison is all you need, like this:
const char *s = "the string";
size_t len = strlen(s); /* strlen() needs <string.h> */
for(size_t i = 0; i < len; i++){
if(s[i] >= '0' && s[i] <= '9'){
printf("Found digit.\n");
}
}
Your comparison of k against [0, 10) is not a valid implementation.
It compares the ASCII value of the i'th char with the integers 0-9, not with the characters '0'-'9'.
You have another option: let strpbrk() do the work for you. strpbrk() (in string.h) locates the first occurrence in a string of any of the characters in a second "accept" string. For digits, the accept string is simply "0123456789". The function returns a pointer to the first occurrence found. You simply loop twice, adding 1 to the returned pointer before the second search. Any NULL return means the string does not contain two digits.
A short working example is:
#include <stdio.h>
#include <string.h>
#define DIGITS "0123456789"
#define NDIGITS 2
int main (int argc, char **argv) {
int ndigits = NDIGITS;
char *p = argv[1];
if (argc < 2 ) { /* validate 1 argument given for password */
fprintf (stderr, "error: insufficient input,\n"
"usage: %s password\n", argv[0]);
return 1;
}
while (ndigits--) { /* loop NDIGITS times */
if (!(p = strpbrk (p, DIGITS))) { /* get ptr to digit (or NULL) */
fputs ("error: password lacks 2 digits.\n", stderr);
return 1;
}
p += 1; /* increment pointer by 1 for next search */
}
puts ("password contains at least 2 digits");
}
Example Use/Output
$ ./bin/passwd2digits some2digitpass1
password contains at least 2 digits
or
$ ./bin/passwd2digits some2digitpass
error: password lacks 2 digits.
Let strpbrk() worry about how to find the digits; all you need to do is use it. See man 3 strpbrk.

C programming - How to count number of repetition of letter in a word

I am trying to write a program that is supposed to count the occurrences of each character using getchar(). For example, if the input is xxxyyyzzzz the output should be 334 (x is repeated 3 times, y 3 and z 4 times). If the input is xxyxx the output should be 212.
Here's what I have tried so far:
double nc ;
for (nc=0 ; getchar()!=EOF;++nc);
printf ("%.0f\n",nc);
return 0;
Input aaabbbccc, output 10. The expected output is 333.
Unfortunately, this shows the total number of characters, including the newline from Enter, but not the repetitions.
What you are describing is classically handled using a frequency array. This is simply an array with one element for each item whose frequency you want to count (e.g. 26 for all lowercase characters). The array is initialized to all zero. You then map each member of the set to an array index. For lowercase characters this is particularly easy, since ch - 'a' maps each character 'a' to 'z' to array index 0 - 25 (e.g. 'a' - 'a' = 0, 'b' - 'a' = 1, etc.).
Then it is just a matter of looping over all characters on stdin checking if they are one in the set you want the frequency of (e.g. a lowercase character) and incrementing the array index that corresponds to that character. For example if you read all characters with c = getchar();, then you would test and increment as follows:
#define NLOWER 26 /* if you need a constant, define one */
...
int c, /* var to hold each char */
charfreq[NLOWER] = {0}; /* frequency array */
...
if ('a' <= c && c <= 'z') /* is it a lowercase char ? */
charfreq[c - 'a']++; /* update frequency array */
When you are done reading characters, the frequency of each character is captured in your frequency array (e.g. charfreq[0] holds the number of 'a's, charfreq[1] the number of 'b's, etc.).
Putting it all together, you could do something similar to:
#include <stdio.h>
#define NLOWER 26 /* if you need a constant, define one */
int main (void) {
int c, /* var to hold each char */
charfreq[NLOWER] = {0}, /* frequency array */
i; /* loop var i */
while ((c = getchar()) != EOF ) /* loop over each char */
if ('a' <= c && c <= 'z') /* is it a lowercase char ? */
charfreq[c - 'a']++; /* update frequency array */
/* output results */
printf ("\ncharacter frequency is:\n");
for (i = 0; i < NLOWER; i++)
if (charfreq[i])
printf (" %c : %2d\n", 'a' + i, charfreq[i]);
return 0;
}
Example Use/Output
$ echo "my dog has xxxyyyzzzz fleas" | ./bin/freqofcharstdin
character frequency is:
a : 2
d : 1
e : 1
f : 1
g : 1
h : 1
l : 1
m : 1
o : 1
s : 2
x : 3
y : 4
z : 4
Look things over and let me know if you have questions. This is a fundamental frequency tracking scheme you will use over and over again in many different circumstances.
Outputting All Frequencies Sequentially
With changes to the printf only, you can output the frequencies as a string of integers instead of nicely formatted tabular output, e.g.
#include <stdio.h>
#define NLOWER 26 /* if you need a constant, define one */
int main (void) {
int c, /* var to hold each char */
charfreq[NLOWER] = {0}, /* frequency array */
i; /* loop var i */
while ((c = getchar()) != EOF ) /* loop over each char */
if ('a' <= c && c <= 'z') /* is it a lowercase char ? */
charfreq[c - 'a']++; /* update frequency array */
/* output results */
printf ("\ncharacter frequency is:\n");
for (i = 0; i < NLOWER; i++)
if (charfreq[i])
#ifdef SEQUENTIALOUT
printf ("%d", charfreq[i]);
putchar ('\n');
#else
printf (" %c : %2d\n", 'a' + i, charfreq[i]);
#endif
return 0;
}
Compile with SEQUENTIALOUT defined
$ gcc -Wall -Wextra -pedantic -std=gnu11 -Ofast -DSEQUENTIALOUT \
-o bin/freqofcharstdin freqofcharstdin.c
Example Use/Output
$ echo "my dog has xxxyyyzzzz fleas" | ./bin/freqofcharstdin
character frequency is:
2111111112344
or for the exact string and output in question:
$ echo "xxxyyyzzzz" | ./bin/freqofcharstdin
character frequency is:
334
Sequential Duplicate Characters
If I misunderstood your question and you do not want the frequency of occurrence, but instead you want the count of sequential duplicate characters, then you can do something simple like the following:
#include <stdio.h>
#define NLOWER 26 /* if you need a constant, define one */
int main (void) {
int c, /* var to hold each char */
prev = 0, /* var to hold previous char */
count = 1; /* sequential count */
while ((c = getchar()) != EOF) { /* loop over each char */
if (prev) { /* does prev contain a char ? */
if (prev == c) /* same as last ? */
count++; /* increment count */
else { /* chars differ */
printf ("%d", count); /* output count */
count = 1; /* reset count */
}
}
prev = c; /* save c as prev */
}
if (prev && prev != '\n') /* no trailing newline: output final run */
printf ("%d", count);
putchar ('\n');
return 0;
}
Example Use/Output
$ echo "xxyxx" | ./bin/sequentialduplicates
212
$ echo "xxxyyyzzzz" | ./bin/sequentialduplicates
334
You are missing two things: saving the last character reviewed, and resetting the counter when the character changes:
int c, prev = -1, count = 0;
while ((c = getchar()) != EOF && c != '\n') { /* stop at end of line or input */
/* if the current char is the same as the last one reviewed increment counter*/
if (c == prev) {
count++;
}
else {
/* handle the start condition of prev */
if (prev != -1) {
printf("%d", count);
}
/* update the prev char reviewed */
prev = c;
/* reset the counter - upon char change */
count = 1;
}
}
/* print out count for final letter, if any input was read */
if (prev != -1)
printf("%d\n", count);

How to exit stdin when EOF is reached in C programming

I'm having trouble with my program. Its aim is to read numbers entered by the user. When they stop entering numbers (Ctrl-D), it should collect the numbers entered and print 'Odd numbers were: blah blah'
and 'Even numbers were: blah blah'.
I'm having trouble exiting the program at EOF. When it feels like I have overcome that problem, another one appears: my program doesn't print the numbers from the array. It only prints 'Odd numbers were:' and 'Even numbers were:'.
Any help is appreciated.
Thanks
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
int main(void) {
int n, i, array[1000];
i=0;
while (i = getchar() !=EOF) {
scanf("%d", &array[i]);
i++;
}
printf("Odd numbers were:");
i=0 ;
while(i = getchar() != EOF) {
if(array[i]%2!=0) {
printf(" %d", array[i]);
i++;
}
}
printf("\nEven numbers were:");
i=0 ;
while(i = getchar() !=EOF) {
if (array[i]%2==0) {
printf(" %d", array[i]);
i++;
}
}
printf("\n");
return 0;
}
Performing Single-Digit Conversion to int
You may be making things a bit harder on yourself than they need to be. You can use while (scanf ("%d", &array[i]) == 1) to read whitespace-separated integers and avoid a manual conversion. However, if your intent is to read single digits, using getchar() is fine. (For that matter, you can read any multi-digit integer with getchar(); it is simply up to you to convert the ASCII characters to the final numeric value.)
The manual conversion for single-digit characters is straightforward. When you read digits as characters, you are not reading the integer value the digit represents; you are reading the ASCII value of the character. See ASCII Table and Description. To convert a single ASCII digit character to its integer value, you must subtract the ASCII value of '0'. (note: the single quotes are significant)
For example if you read a digit with int c = getchar();, then you need to subtract '0' to obtain the single-digit integer value, e.g. int n = c - '0';
When filling an array, you must ALWAYS protect against writing beyond the bounds of your array. If you declare int array[1000] = {0}; (which has available zero-based array indexes of 0-999), then you must validate that you never write beyond index 999 or Undefined Behavior results. To protect your array bounds, simply keep track of the number of indexes filled and test it is always below the number of array elements available, e.g.
while (n < MAX && (c = getchar()) != EOF) /* always protect array bounds */
if ('0' <= c && c <= '9') /* only handle digits */
array[n++] = c - '0'; /* convert ASCII to int */
Next, you are free to test n % 2 (using the modulo operator), but there is no need. Any odd number has its ones-bit set to 1, so all you need is a simple bitwise test (e.g. 7 in binary is 0111):
if (array[i] & 1) /* if ones-bit is 1, odd */
printf (" %d", array[i]);
Of course, for even numbers, the ones-bit will be 0 (e.g. 8 in binary is 1000), so the corresponding test can be:
if ((array[i] & 1) == 0) /* if ones-bit is 0, even */
printf (" %d", array[i]);
Putting all of the pieces together, you can store all the single digits read in the array and then separate the even and odd numbers for printing in a very simple manner (note: neither stdlib.h nor math.h is required):
#include <stdio.h>
#define MAX 1000 /* define a constant rather than use 'magic' numbers in code */
int main (void)
{
int array[MAX] = {0},
c, i, n = 0;
while (n < MAX && (c = getchar()) != EOF) /* always protect array bounds */
if ('0' <= c && c <= '9') /* only handle digits */
array[n++] = c - '0'; /* convert ASCII to int */
printf ("\narray : "); /* output array contents */
for (i = 0; i < n; i++)
printf (" %d", array[i]);
printf ("\narray - odd : ");
for (i = 0; i < n; i++)
if (array[i] & 1) /* if ones-bit is 1, odd */
printf (" %d", array[i]);
printf ("\narray - even : ");
for (i = 0; i < n; i++)
if ((array[i] & 1) == 0) /* if ones-bit is 0, even */
printf (" %d", array[i]);
putchar ('\n'); /* tidy up w/newline */
return 0;
}
Example Compilation with Warnings Enabled
$ gcc -Wall -Wextra -pedantic -std=gnu11 -Ofast -o bin/arrayevenodd arrayevenodd.c
Example Use/Output
$ echo "01234567890123456789" | ./bin/arrayevenodd
array : 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
array - odd : 1 3 5 7 9 1 3 5 7 9
array - even : 0 2 4 6 8 0 2 4 6 8
Look things over and let me know if you have further questions.
Performing Manual Multi-Digit Conversion to int
If you need to convert multi-digit integers from characters, the easy way is to use a function that provides the conversion for you. Options include the scanf family of functions, or line-oriented input with fgets or getline followed by parsing the strings of digits with strtok and converting them with strtol, or parsing with sscanf, etc.
However, manually converting the individual characters read by getchar() into multi-digit integers is straightforward. You check for a valid character that can begin an integer (including the +/- prefixes). Then you accumulate each digit, multiplying the running sum by 10 before adding each new digit. The code below actually subtracts each digit, building the value as a negative sum, which can hold INT_MIN without overflow. When you reach the next non-digit, you store the final sum in your array and advance the array index.
While building the sum, it is up to you to check for integer overflow before adding the final sum to your array. You can handle the overflow condition however you like; the example below just reports the error and exits.
The manual conversion probably adds about 20 lines of code, e.g.
#include <stdio.h>
#include <stdint.h> /* exact length types */
#include <limits.h> /* INT_X constants */
#define MAX 1000 /* define constant rather than use 'magic' number in code */
int main (void)
{
int array[MAX] = {0},
c, i, start = 0, sign = 1, n = 0,
havedigit = 0; /* flag - at least one digit read */
int64_t sum = 0;
while (n < MAX && (c = getchar()) != EOF) { /* always protect array bounds */
if (!start) { /* flag to start building int */
start = 1; /* flag - working on int */
if (c == '+') /* handle leading '+' sign */
continue;
else if (c == '-') /* handle leading '-' sign */
sign = -1;
else if ('0' <= c && c <= '9') { /* handle digits */
sum = sum * 10 - (c - '0'); /* note: sum always computed */
havedigit = 1; /* as negative value */
}
else
start = 0; /* reset - char not handled */
}
else if ('0' <= c && c <= '9') { /* handle digits */
sum = sum * 10 - (c - '0'); /* convert ASCII to int */
havedigit = 1;
if (sum < INT_MIN || (sign != -1 && -sum > INT_MAX))
goto err; /* check for overflow, handle error */
}
else { /* non-digit ends conversion */
if (havedigit) /* digits read (even "0"), add to array */
array[n++] = sign != -1 ? -sum : sum;
sign = 1; /* reset values for next conversion */
start = 0;
sum = 0;
havedigit = 0;
}
}
if (havedigit) /* add last element to array on EOF */
array[n++] = sign != -1 ? -sum : sum;
printf ("\narray : "); /* output array contents */
for (i = 0; i < n; i++)
printf (" %d", array[i]);
printf ("\narray - odd : ");
for (i = 0; i < n; i++)
if (array[i] & 1) /* if ones-bit is 1, odd */
printf (" %d", array[i]);
printf ("\narray - even : ");
for (i = 0; i < n; i++)
if ((array[i] & 1) == 0) /* if ones-bit is 0, even */
printf (" %d", array[i]);
putchar ('\n'); /* tidy up w/newline */
return 0;
err:
fprintf (stderr, "error: overflow detected - array[%d]\n", n);
return 1;
}
Example Use/Output
$ echo "1,2,3,5,76,435" | ./bin/arrayevenodd2
array : 1 2 3 5 76 435
array - odd : 1 3 5 435
array - even : 2 76
Example with '+/-' Prefixes
$ echo "1,2,-3,+5,-76,435" | ./bin/arrayevenodd2
array : 1 2 -3 5 -76 435
array - odd : 1 -3 5 435
array - even : 2 -76
Look over the new example and let me know if you have any more questions.
Change the input loop to:
n = 0;
while ( 1 == scanf("%d", &array[n]) )
++n;
What you actually want to do is keep reading numbers until the read attempt fails; this loop condition expresses that logic. Forget about EOF. (It would also be a good idea to add logic to stop when n reaches 1000, to avoid a buffer overflow).
In the output loops, you do not want to do any input, so it is not a good idea to call getchar(). Instead, use a "normal" loop, for (i = 0; i < n; ++i).

Program runs too slowly with large input - C

The goal for this program is for it to count the number of instances that two consecutive letters are identical and print this number for every test case. The input can be up to 1,000,000 characters long (thus the size of the char array to hold the input). The website which has the coding challenge on it, however, states that the program times out at a 2s run-time. My question is, how can this program be optimized to process the data faster? Does the issue stem from the large char array?
Also: I get a compiler warning "assignment makes integer from pointer without a cast" for the line str[1000000] = "" What does this mean and how should it be handled instead?
Input:
number of test cases
strings of capital A's and B's
Output:
Number of duplicate letters next to each other for each test case, each on a new line.
Code:
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
int n, c, a, results[10] = {};
char str[1000000];
scanf("%d", &n);
for (c = 0; c < n; c++) {
str[1000000] = "";
scanf("%s", str);
for (a = 0; a < (strlen(str)-1); a++) {
if (str[a] == str[a+1]) { results[c] += 1; }
}
}
for (c = 0; c < n; c++) {
printf("%d\n", results[c]);
}
return 0;
}
You don't need the line
str[1000000] = "";
scanf() adds a null terminator when it parses the input and writes it to str. This line is also writing beyond the end of the array, since the last element of the array is str[999999].
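You're getting the warning because the type of str[1000000] is char, but a string literal like "" is an array of char that decays to a pointer (char *) here. The assignment therefore tries to store a pointer in an integer type.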
To speed up the program, take the call to strlen() out of the loop.
size_t len = strlen(str)-1;
for (a = 0; a < len; a++) {
...
}
str[1000000] = "";
This does not do what you think it does: you're overflowing the buffer, which results in undefined behaviour. Valid indexes run from 0 to sizeof(str) - 1. So either add one to the 1000000 when declaring the array, or use 999999 to access the last element instead. To get rid of the compiler warning and produce cleaner code, use:
str[1000000] = '\0';
Or
str[999999] = '\0';
Depending on what you did to fix it.
As to optimizing, you should look at the assembly and go from there.
count the number of instances that two consecutive letters are identical and print this number for every test case
For efficiency, the code needs a new approach, as suggested by @john bollinger and @molbdnilo:
void ReportPairs(const char *str, size_t n) {
int previous = EOF;
unsigned long repeat = 0;
for (size_t i=0; i<n; i++) {
int ch = (unsigned char) str[i];
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
}
char *testcase1 = "test1122a33";
ReportPairs(testcase1, strlen(testcase1));
or directly from input and "each test case, each on a new line."
int ReportPairs2(FILE *inf) {
int previous = EOF;
unsigned long repeat = 0;
int ch;
while ((ch = fgetc(inf)) != '\n' && ch != EOF) { /* one line = one test case */
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
if (ch == '\n' || previous != EOF) /* skip report only if EOF came first */
printf("Pair count %lu\n", repeat);
return ch;
}
while (ReportPairs2(stdin) != EOF);
It is unclear whether OP wants "AAAA" counted as 2 or 3. This code counts it as 3.
One way to dramatically improve the run time of your code is to limit the number of times you read from stdin, i.e. to process input in bigger chunks. You can do this a number of ways, but one of the most efficient is probably fread. Even reading in 8-byte chunks can provide a big improvement over reading a character at a time. One example of such an implementation, considering capital letters [A-Z] only, would be:
#include <stdio.h>
#define RSIZE 8
int main (void) {
char qword[RSIZE] = {0};
char last = 0;
size_t i = 0;
size_t nchr = 0;
size_t dcount = 0;
/* read up to 8-bytes at a time */
while ((nchr = fread (qword, sizeof *qword, RSIZE, stdin)))
{ /* compare each byte to byte before */
for (i = 1; i < nchr && qword[i] && qword[i] != '\n'; i++)
{ /* if not [A-Z] continue, else compare */
if (qword[i-1] < 'A' || qword[i-1] > 'Z') continue;
if (i == 1 && last == qword[i-1]) dcount++;
if (qword[i-1] == qword[i]) dcount++;
}
last = qword[i-1]; /* save last for comparison w/next */
}
printf ("\n sequential duplicated characters [A-Z] : %zu\n\n",
dcount);
return 0;
}
Output/Time with 868789 chars
$ time ./bin/find_dup_digits <dat/d434839c-d-input-d4340a6.txt
sequential duplicated characters [A-Z] : 434893
real 0m0.024s
user 0m0.017s
sys 0m0.005s
Note: the test data was actually a string of '0's and '1's. It was run with the modified test if (qword[i-1] < '0' || qword[i-1] > '9') continue; instead of the [A-Z] test, but your results with 'A's and 'B's should be virtually identical. A 1000000-character input should still run well under 0.1 seconds. You can play with the RSIZE value to see whether reading a larger chunk (preferably a power of 2) gives any benefit. (note: this counts AAAA as 3) Hope this helps.

Validation 2 digits minimum C language

My job is to make a password validator with:
No spaces allowed
1 Symbol
2 Digits
Minimum 6 characters
Maximum 10 characters
Just the above restrictions. So far I have done requirements 1, 2, 4 and 5 on my own, but I can't solve the third requirement about 2 digits. I can only check for 1 digit, so how do I do it with 2? I think regex in C doesn't work like it does in C++ and C#. Here is my code:
#include <stdio.h>
#include <string.h>
int main(){
char pass[11];
int stop;
printf("Give password:\n");
scanf(" %[^\n]s",pass);
do{
if(strlen(pass)<6){
stop=0;
printf("Too short password.");
}
else if(strlen(pass)>10){
stop=0;
printf("Too long password.");
}
else if(strchr(pass, ' ')){
stop=0;
printf("No spaces.");
}
else if((strchr(pass, '$')==NULL && strchr(pass, '#')==NULL && strchr(pass, '#')==NULL && strchr(pass, '!')==NULL && strchr(pass, '%')==NULL && strchr(pass, '*')==NULL)){
stop=0;
printf("Must give at least one of $, #, #, !, %% or *.");
}
else{
stop=1;
printf("Your password is %s\n", pass);
}
}while(stop=0);
return 0;}
That's not really a regex. What you have there is basically a small program that checks its input for different conditions. So, in the same manner, to make sure you have 2 digits, let's create a function that counts the number of digits. Keep in mind that in ASCII the digits 0 to 9 are in a continuous block (this is very important, as you will see).
int countDigits(char *input) {
int digitCount = 0;
for (int i = 0; i < strlen(input); i++) // for every character
if (input[i] >= '0' && input[i] <= '9') // if it is a digit
digitCount++;
return digitCount;
}
I've written this function this way to show some character-manipulation techniques. It would be better for the condition to be:
if (isdigit((unsigned char) input[i]))
for clarity. The range check is portable too, since C guarantees that the digits '0' to '9' are contiguous in every character set. Note that isdigit is declared in the ctype.h header, and its argument should be cast to unsigned char as shown. There is still some room for improvement:
int countDigits(const char *input) {
int digitCount = 0;
size_t noOfCharacters = strlen(input); // avoid strlen() being called
// on every iteration
for (size_t i = 0; i < noOfCharacters; i++) // for every character
if (isdigit((unsigned char) input[i])) // isdigit() returns nonzero, not necessarily 1
digitCount++;
return digitCount;
}
Now you just need to call this function and check whether it returns at least 2 for the input you've got.
There is no reason for such a small buffer. Test #5 cannot have been done properly with pass[11]: input longer than 10 characters overflows the buffer before strlen() can reject it.
// char pass[11];
char pass[100];
It is dangerous to read input of unlimited length. Change to:
scanf(" %99[^\n]",pass);
Count the digits:
int cnt = 0;
for (int i = 0; pass[i]; i++) {
if (pass[i] >= '0' && pass[i] <= '9') cnt++;
}
if (cnt < 2) BadPass(); /* BadPass() is a placeholder for your error handling */
With a regex: search the input string for [^0-9] and replace every match with nothing.
Then verify that the remaining string is at least two characters long.
