Comparing two arrays in C without case sensitivity - arrays

For example, I have two arrays:
char 1 is: 77a abcd Abc abc1d #### v k
char2 is: 789 ABA AABB 123 ab #% abcde
The common indices should be 0, 3, 4, 5, 9, 10, 12, 20.
The result should be 8, but I get 9. The problem is that characters with an ASCII code below 64 still match through the +32 comparison, and they should not.
Here is the code:
int intersection(char arrayNumberOne[size], char arrayNumberTwo[size])
{
int counter = 0;
for (int i = 0; i < strlen(arrayNumberOne); i++)
{
if ((arrayNumberOne[i] == arrayNumberTwo[i]) || (arrayNumberOne[i] == arrayNumberTwo[i] + 32) || (arrayNumberOne[i] + 32 == arrayNumberTwo[i]))
{
if (arrayNumberOne[i] < 64)
{
?????
}
counter++;
}
}
return counter;
}

Adding 32 to an ASCII character can lead to undesirable combinations, and you have already found the issue.
I suggest breaking your conditionals into small pieces and making a few small changes:
First, the condition about the symbols is incomplete, so change:
if (arrayNumberOne[i] < 64)
to:
if (arrayNumberOne[i] <= 64 || arrayNumberTwo[i] <= 64)
Because both arrays could contain a symbol.
Second, split the expressions up to see the logic better; you can join them again afterwards:
// considering inside a for-loop used 'continue' to skip the other verifications
// compare any character
// both lower case or upper case
if (arrayNumberOne[i] == arrayNumberTwo[i]) {
counter++;
continue;
}
// is a symbol, skip the verification ahead
if (arrayNumberOne[i] <= 64 || arrayNumberTwo[i] <= 64)
continue;
// character verifications
// first is uppercase
if (arrayNumberOne[i] + 32 == arrayNumberTwo[i]) {
counter++;
continue;
}
// second is uppercase
if (arrayNumberOne[i] == arrayNumberTwo[i] + 32) {
counter++;
continue;
}
This will make your code work, but it would be better to check for 'a-z' explicitly or to use tolower, as suggested in the comments and the other answers.
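A minimal sketch of the "check for letters explicitly" suggestion, assuming an ASCII character set. The names fold_ascii and intersection_ascii are made up for this illustration and do not appear in the question. Only characters in 'A'..'Z' are shifted, so a symbol can never be turned into a letter by the +32 offset.

```c
#include <stddef.h>

/* Hypothetical helper: map an ASCII upper-case letter to lower case,
   leave every other character untouched. */
static int fold_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Count positions where both strings hold the same character,
   ignoring letter case. Stops at the end of the shorter string. */
int intersection_ascii(const char *a, const char *b)
{
    int counter = 0;
    for (size_t i = 0; a[i] != '\0' && b[i] != '\0'; i++) {
        if (fold_ascii((unsigned char)a[i]) == fold_ascii((unsigned char)b[i]))
            counter++;
    }
    return counter;
}
```

With this version '#' (35) and 'C' (67) no longer count as a match, even though their codes differ by exactly 32.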

Use tolower. There is no need to call strlen, but you do need to stop at the end of whichever string is shorter.
Cast to (unsigned char): this avoids undefined behavior if char is a signed type (which it often is) and any of the arrays contains negative char values (other than EOF, the one negative value that tolower permits) (thanks to @Kaz for flagging this).
#include <ctype.h>
#include <stdio.h>
int intersection(const char *s1, const char *s2)
{
int counter = 0;
while (*s1 != '\0' && *s2 != '\0') {
if (tolower((unsigned char) *s1++) == tolower((unsigned char) *s2++)) {
counter++;
}
}
return counter;
}
int main() {
printf("%d\n", intersection("77a abcd Abc abc1d #### v k",
"789 ABA AABB 123 ab #% abcde"));
return 0;
}
Returns 8

Related

Substitute characters in a string with their values?

I have a string given (a+b)&(a+c) and I have created a truth table with values of a,b, and c. Now the problem is to evaluate the logic expression by substituting a,b, and c with corresponding values from the truth table. How it can be done in C?
Ex: a=0 b=0 c=0 r=(0+0)&(0+0)=0
a=0 b=0 c=1 r=(0+0)&(0+1)=0
and so on
The code itself looks like this
#include <stdio.h>
#include <stdlib.h>
int main()
{
char c,* str, *vars, **result;
int i=0,count=0,j=0;
unsigned long long rows;
str = (char*) malloc(1*sizeof(char));
vars=(char*) malloc(1*sizeof(char));
result=(char**)malloc(1*sizeof(char));
char values[] = {'F', 'T'};
while ((c = getchar()) != EOF)
{
str[i++] = c;
str = (char*) realloc(str, (i+1) * sizeof(char));
if (c >= 'a' && c <= 'z')
{
vars[j++]=c;
vars=(char*) realloc(vars,(j+1)*sizeof(char));
count++;
}
}
rows=1ULL<<(count);
result=(char**)realloc(result,(rows+2)*sizeof(char));
for (i = 0; i < rows+1; i++)
{
result[i]=(char*)malloc(sizeof(char)*(count+1));
for (j = 0; j < count; j++)
{
if(i==0)
result[i][j]=vars[j];
else
result[i][j]=values[(i >> j) & 1];
}
}
result[0][count]='R';
for(i=0;i<rows+1;i++)
{
for(j=0;j<count+1;j++)
{
//do something
}
}
Now the problem is to evaluate the logic expression by substituting a,b, and c with corresponding values from the truth table.
Aside from the issues mentioned in the question's comments, substituting alone won't do the job to evaluate the logic expression. The following function for example substitutes the values while evaluating the expression. (You didn't specify the general syntax of your expressions, so I chose to support combinations of the used operators and lower case variables.)
#include <ctype.h>
#include <string.h>
int indx(char *s, char c) { return strchr(s, c)-s; }
char *gstr, *gvars, *vals; // expression string, variables, value combination
char eval()
{ // evaluate expression "gstr"
char or = 0; // neutral element of +
do
{
char and = 1; // neutral element of &
do
{
char c = *gstr++; // get next token
if (islower(c))
and &= indx("FT", vals[indx(gvars, c)]);
else
if (c == '(')
{ // evaluate subexpression
and &= eval();
c = *gstr++; // get next token
if (c != ')')
printf("error at '%c': expected ')'\n", c), exit(1);
}
else
printf("error at '%c'\n", c), exit(1);
} while (*gstr == '&' && ++gstr);
or |= and;
} while (*gstr == '+' && ++gstr);
return or;
}
It can be called from your main (inserted in your code, hence the inconsistent spacing)
result[0][count]='R';
gvars = vars; // make variable names globally accessible
for (i = 1; i <= rows; ++i)
{
gstr = str, vals = result[i], // globally accessible
result[i][count] = values[eval()];
while (isspace(*gstr)) ++gstr;
if (*gstr)
printf("error at '%c': expected end of input\n", *gstr), exit(1);
}
for(i=0;i<rows+1;i++)
{
for(j=0;j<count+1;j++)
{
putchar(result[i][j]);
}
putchar('\n');
}
(Don't forget to put str[i] = '\0'; after your getchar loop to make a null-terminated string.) Note that due to the given for loop counting, the order of the truth table entries is somewhat unusual in that the row with all variables F comes last.
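The same recursive-descent idea can also be sketched without global variables. This is an illustration under assumptions, not the answer's code: the struct expr_ctx and the function names are invented here, whitespace is not skipped, and the grammar is limited to lower-case variables, '&', '+' and parentheses, with '&' binding tighter than '+'.

```c
#include <ctype.h>
#include <string.h>

/* Parser state, passed explicitly instead of through globals. */
struct expr_ctx {
    const char *pos;   /* next unread character of the expression */
    const char *vars;  /* variable names, e.g. "abc" */
    const char *vals;  /* one 'T' or 'F' per variable, same order */
    int error;         /* set to 1 on any syntax error */
};

static int parse_or(struct expr_ctx *c);

/* operand := lower-case variable | '(' expression ')' */
static int parse_operand(struct expr_ctx *c)
{
    char ch = *c->pos;
    if (ch != '\0' && islower((unsigned char)ch)) {
        const char *hit = strchr(c->vars, ch);
        c->pos++;
        if (hit == NULL) { c->error = 1; return 0; }
        return c->vals[hit - c->vars] == 'T';
    }
    if (ch == '(') {
        c->pos++;
        int v = parse_or(c);
        if (*c->pos != ')') { c->error = 1; return 0; }
        c->pos++;
        return v;
    }
    c->error = 1;
    return 0;
}

/* term := operand { '&' operand } */
static int parse_and(struct expr_ctx *c)
{
    int v = parse_operand(c);
    while (!c->error && *c->pos == '&') {
        c->pos++;
        v &= parse_operand(c);
    }
    return v;
}

/* expression := term { '+' term } */
static int parse_or(struct expr_ctx *c)
{
    int v = parse_and(c);
    while (!c->error && *c->pos == '+') {
        c->pos++;
        v |= parse_and(c);
    }
    return v;
}

/* Returns 1 or 0 for a valid expression, -1 on a syntax error. */
int eval_expr(const char *expr, const char *vars, const char *vals)
{
    struct expr_ctx c = { expr, vars, vals, 0 };
    int v = parse_or(&c);
    if (c.error || *c.pos != '\0')
        return -1;
    return v;
}
```

A call such as eval_expr("(a+b)&(a+c)", "abc", "FTT") evaluates one row of the truth table; the caller would loop over all rows and store the result in the last column.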

how to check if the string contains only numbers or word in c

My teacher asked us to write a function declared as "int categorize(char *str)" which accepts a string as an input parameter and categorizes the string. The function should return the category number as described below:
Category 1: If the string contains only capital letters (e.g. ABCDEF), the function should return 1.
Category 2: If the string contains only small letters (e.g. abcdef), the function should return 2.
Category 3: If the string contains only letters but no other symbols (e.g. ABCabc), the function should return 3.
Category 4: If the string contains only digits (e.g. 12345), the function should return 4.
Category 5: If the string contains both letters and digits but no other symbols (e.g. Hello123), the function should return 5.
Category 6: Otherwise, the function should return 6.
I wrote this, but it is not working:
#include<stdio.h>
#include<string.h>
// Write your categorize function here
int categorize(char *str) {
int x=0;
while(str[x]!= '\0'){
if((*str>='A'&&*str<='Z') && (*str>='a'&&*str<='z') && (*str>='0'&&*str<='9'))
return 5;
else if((*str>='A'&&*str<='Z') && (*str>='a'&&*str<='z'))
return 3;
if((*str>='A') && (*str<='Z'))
return 1;
else if((*str>='a')&&(*str<='z'))
return 2;
else if((*str>='0')&&(*str<='9'))
return 4;
x++;
}
return 6;
}
///////////////////////////////////////
// Test main()
// DO NOT MODIFY main()
// or your code will not be evaluated
int main() {
char str[100];
scanf("%s",str);
int c = categorize(str);
printf("%d\n",c);
return 0;
}
///////////////////////////////////////
*str always yields the first character of the string, so throughout your loop it only ever checks that first character.
Furthermore, you are returning values before checking the entire string. You need to use flags, set them inside the loop for digits, capital letters and small letters, and at the end return the value that matches those flags.
The <ctype.h> header has functions like isalpha and isdigit that can make your job easier; for that I refer you to @DevSolar's answer as a better method.
If you can't use it, this is the way to go:
int categorize(char *str) {
int x = 0;
int is_digit, is_cap, is_small;
is_cap = is_digit = is_small = 0;
while (str[x] != '\0') {
if (!(str[x] >= 'a' && str[x] <= 'z') && !(str[x] >= 'A' && str[x] <= 'Z') && !(str[x] >= '0' && str[x] <= '9'))
return 6;
if (str[x] >= 'A' && str[x] <= 'Z' && !is_cap) //if there are caps
is_cap = 1;
if (str[x] >= '0' && str[x] <= '9' && !is_digit) //if there are digits
is_digit = 1;
if (str[x] >= 'a' && str[x] <= 'z' && !is_small) //if there are smalls
is_small = 1;
x++;
}
if ((is_small || is_cap) && is_digit){
return 5;
}
if(is_small && is_cap){
return 3;
}
if(is_digit){
return 4;
}
if(is_small){
return 2;
}
if(is_cap){
return 1;
}
return 6;
}
Note that this kind of character arithmetic works well in ASCII but fails in other character encodings like EBCDIC, whose alphabetic characters are not contiguous.
You can do this in O(n) as follows:
#include <ctype.h>
#include <stdio.h>
int check_str(char* str)
{
int category = 0; // 0b001 -> Upper | 0b010 -> Lower | 0b100 -> Number
for ( ; *str != 0; str++) {
unsigned char c = (unsigned char)*str; // <ctype.h> functions need a non-negative value
if (isupper(c)) {
category |= 1;
}
else if (islower(c)) {
category |= 2;
}
else if (isdigit(c)) {
category |= 4;
}
else
return 6;
}
if (category == 1)
return 1;
if (category == 2)
return 2;
if (category == 3)
return 3;
if (category == 4)
return 4;
if (category >= 5) // digits together with upper and/or lower case letters
return 5;
return 6;
}
int main()
{
printf("%d\n", check_str("ABCDEF"));
printf("%d\n", check_str("abcdef"));
printf("%d\n", check_str("ABCabc"));
printf("%d\n", check_str("12345"));
printf("%d\n", check_str("Hello123"));
printf("%d\n", check_str("Hello World 123"));
return 0;
}
OUTPUT
1
2
3
4
5
6
EDIT
Since 0b... is not portable, binary numbers are changed to their decimal equivalents
The standard header <ctype.h> literally has everything you are looking for:
Category 1 is if isupper() is true for all characters.
Category 2 is if islower() is true for all characters.
Category 3 is if isalpha() is true for all characters.
Category 4 is if isdigit() is true for all characters.
Category 5 is if isalnum() is true for all characters.
A quick-and-dirty approach could just iterate through the input once for each of the above:
int categorize( char * str )
{
char * s;
for ( s = str; *s; ++s )
{
/* Need the cast to unsigned char here; negative values
are reserved for EOF!
*/
if ( ! isupper( (unsigned char)*s ) )
{
break;
}
}
if ( ! *s )
{
/* True if the loop ran its full course without breaking */
return 1;
}
/* Same for the other categories */
return 6;
}
Once you got this done, you could start getting funny with the logic, putting all the checks into one loop and keeping track of conditions met.
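One possible shape of that single-loop version, as a sketch rather than the definitive solution. The name categorize_flags is invented for this example, and returning 6 for an empty string is an assumption, since the assignment does not say what an empty string should produce.

```c
#include <ctype.h>

/* Single pass: remember which character classes were seen,
   bail out with 6 on the first character that fits none. */
int categorize_flags(const char *str)
{
    int has_upper = 0, has_lower = 0, has_digit = 0;

    if (*str == '\0')
        return 6;               /* assumption: empty string fits no category */

    for (; *str != '\0'; str++) {
        unsigned char c = (unsigned char)*str;  /* required by <ctype.h> */
        if (isupper(c))
            has_upper = 1;
        else if (islower(c))
            has_lower = 1;
        else if (isdigit(c))
            has_digit = 1;
        else
            return 6;
    }

    if (has_digit)
        return (has_upper || has_lower) ? 5 : 4;
    if (has_upper && has_lower)
        return 3;
    return has_upper ? 1 : 2;
}
```

Because the loop returns early on any other symbol, at least one of the three flags is set whenever the final block is reached.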
You seem to have a pretty good start for your project. You capture a String, and then categorize it.
Where you need to improve is the logic for categorization. I would start by making each of your categories into a function.
int isOnlyCapital(char* string)
might return 1 if the string is only comprised of capital letters, and 0 otherwise. The logic might look like
char* position = string;
while (*position != '\0') {
if (*position < 'A' || *position > 'Z') {
return 0;
}
position++;
}
return 1;
Note that the code above only reaches the final return once it has checked every character, because the loop runs until it hits the null character \0.
With some work, you can modify the above loop to handle your other cases. That would make your logic for the overall top level look like.
if (isOnlyCapital(string)) return 1;
if (isOnlyLowercase(string)) return 2;
... and so on...
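Instead of writing five nearly identical loops, the per-category functions can share one loop that takes the character test as a parameter. This is a sketch under assumptions: all_chars and categorize_by_predicates are names invented here, the tests come from the standard <ctype.h> header, and an empty string is treated as category 6.

```c
#include <ctype.h>

/* Returns 1 if the string is non-empty and every character
   satisfies pred, 0 otherwise. */
static int all_chars(const char *s, int (*pred)(int))
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!pred((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* The order matters: the narrower categories are tested first. */
int categorize_by_predicates(const char *s)
{
    if (all_chars(s, isupper)) return 1;
    if (all_chars(s, islower)) return 2;
    if (all_chars(s, isalpha)) return 3;
    if (all_chars(s, isdigit)) return 4;
    if (all_chars(s, isalnum)) return 5;
    return 6;
}
```

This scans the string up to five times, which is fine for short inputs and keeps each rule readable on its own line.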

Questions about Kernighan and Ritchie Exercise 2-3

I'm trying to write a program in C that converts hexadecimal numbers to integers. I've successfully written a program that converts octals to integers. However, the problems begin once I start using the letters (a-f). My idea for the program is as follows:
The parameter must be a string that starts with 0x or 0X.
The parameter hexadecimal number is stored in a char string s[].
The integer n is initialized to 0 and then converted as per the rules.
My code is as follows (I've only read up to p37 of K & R so don't know much about pointers) :
/*Write a function htoi(s), which converts a string of hexadecimal digits (including an optional 0x or 0X) into its equivalent integer value. The allowable digits are 0 through 9, a through f, and A through F.*/
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <ctype.h>
int htoi(const char s[]) { //why do I need this to be constant??
int i;
int n = 0;
int l = strlen(s);
while (s[i] != '\0') {
if ((s[0] == '0' && s[1] == 'X') || (s[0] == '0' && s[1] == 'x')) {
for (i = 2; i < (l - 1); ++i) {
if (isdigit(s[i])) {
n += (s[i] - '0') * pow(16, l - i - 1);
} else if ((s[i] == 'a') || (s[i] == 'A')) {
n += 10 * pow(16, l - i - 1);
} else if ((s[i] == 'b') || (s[i] == 'B')) {
n += 11 * pow(16, l - i - 1);
} else if ((s[i] == 'c') || (s[i] == 'C')) {
n += 12 * pow(16, l - i - 1);
} else if ((s[i] == 'd') || (s[i] == 'D')) {
n += 13 * pow(16, l - i - 1);
} else if ((s[i] == 'e') || (s[i] == 'E')) {
n += 14 * pow(16, l - i - 1);
} else if ((s[i] == 'f') || (s[i] == 'F')) {
n += 15 * pow(16, l - i - 1);
} else {
;
}
}
}
}
return n;
}
int main(void) {
int a = htoi("0x66");
printf("%d\n", a);
int b = htoi("0x5A55");
printf("%d\n", b);
int c = htoi("0x1CA");
printf("%d\n", c);
int d = htoi("0x1ca");
printf("%d\n", d);
}
My questions are:
1. If I don't use const in the argument for htoi(s), i get the following warnings from the g++ compiler :
2-3.c: In function ‘int main()’: 2-3.c:93:20: warning: deprecated
conversion from string constant to ‘char*’ [-Wwrite-strings]
2-3.c:97:22: warning: deprecated conversion from string constant to
‘char*’ [-Wwrite-strings] 2-3.c:101:21: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings] 2-3.c:105:21:
warning: deprecated conversion from string constant to ‘char*’
[-Wwrite-strings]
Why is this?
2.Why is my program taking so much time to run? I haven't seen the results yet.
3.Why is it that when I type in cc 2-3.c instead of g++ 2-3.c in the terminal, I get the following error message:
"undefined reference to `pow'"
on every line that I've used the power function?
4. Please do point out other errors/ potential improvements in my program.
If I don't use const in the argument for htoi(s), i get the following warnings from the g++ compiler
The const parameter should be there because it is considered good programming practice never to cast away const from a pointer. String literals "..." should be treated as constants, so if the parameter is not const, the compiler sees the const qualifier being dropped when you pass a literal.
Furthermore, you should declare as const every pointer parameter whose contents you don't intend to modify; search for the term "const correctness".
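A small illustration of that rule, with two hypothetical functions that are not part of the question: one only reads its argument and therefore takes const char *, the other writes to it and therefore needs a modifiable buffer.

```c
#include <ctype.h>
#include <stddef.h>

/* Only reads the string, so the parameter is const and
   string literals can be passed without a warning. */
size_t count_hex_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isxdigit((unsigned char)*s))
            n++;
    }
    return n;
}

/* Modifies the string in place, so it needs a writable
   (non-const) buffer, never a string literal. */
void upcase_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Calling count_hex_digits("0x1ca") is fine, while upcase_in_place should only be given an array such as char buf[] = "1ca";.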
Why is my program taking so much time to run? I haven't seen the results yet.
I think mainly because you have made an initialization goof-up. int i; leaves i containing rubbish, so the loop tests while (s[rubbish_value] != '\0'). This function can be written a whole lot better too. Start by checking for the 0x at the start of the string; if it isn't there, signal some error (return -1?), otherwise skip it. Then run one single loop after that; you don't need 2 loops.
Note that the pow() function works on floating-point numbers, which will make your program slightly slower. You could consider using an integer-only version. Unfortunately there is no such function in standard C, so you will have to find or write one.
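An integer-only power function is short enough to write by hand. The following is a sketch using exponentiation by squaring; the name ipow is an assumption (it is not a standard function), and overflow is not detected.

```c
/* Integer power by repeated squaring: base raised to exp.
   No overflow detection; ipow(x, 0) is 1 for every x. */
unsigned long ipow(unsigned long base, unsigned int exp)
{
    unsigned long result = 1;
    while (exp > 0) {
        if (exp & 1u)          /* lowest bit set: multiply this power in */
            result *= base;
        exp >>= 1;
        if (exp > 0)           /* square only if another round follows */
            base *= base;
    }
    return result;
}
```

In the htoi from the question, pow(16, l - i - 1) could then be replaced by ipow(16, l - i - 1), although accumulating with n = 16 * n + digit avoids the power altogether.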
Also consider the function isxdigit(), a standard function in ctype.h, which checks for digits 0-9 as well as hex letters A-F or a-f. It may however not help with performance, as you will need to perform different calculations for digits and letters.
For what it is worth, here is a snippet showing how you can convert a single char to a hexadecimal int. It is not the most optimized version possible, but it takes advantage of available standard functions, for increased readability and portability:
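#include <ctype.h>
#include <stdint.h>
uint8_t hexchar_to_int (char ch)
{
unsigned char uch = (unsigned char)ch; // <ctype.h> functions need a non-negative value
uint8_t result;
if(isdigit(uch))
{
result = uch - '0';
}
else if (isxdigit(uch))
{
result = toupper(uch) - 'A' + 0xA;
}
else
{
result = 0; // error: not a hex digit (placeholder so result is never uninitialized)
}
return result;
}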
Don't use a C++ compiler to compile a C program. That's my first advice to you.
Secondly const in a function parameter for a char * ensures that the programmer doesn't accidentally modify the string.
Thirdly you need to include the math library with -lm as stated above.
a const char[] means that you cannot change it in the function. Casting from a const to not-const gives a warning. There is much to be said about const. Check out its Wikipedia page.
--
Probably, cc doesn't link the right libraries. Try the following build command: cc 2-3.c -lm
Improvements:
Don't use pow(), it is quite expensive in terms of processing time.
Use the same trick with the letters as you do with the numbers to get the value, instead of using fixed 'magic' numbers.
You don't need the last else part. Just leave it empty (or put an error message there, because those characters aren't allowed).
Good luck!
Regarding my remark about the pow() call: using the hexchar_to_int() function described above, this is how I'd implement it (without error checking):
const char *t = "0x12ab";
int i = 0, n = 0;
int result = 0;
for (i = 2; t[i] != '\0'; i++) {
n = hexchar_to_int(t[i]);
/* make room for the next hex digit, then add it */
result = (result << 4) | n;
}
I just worked through this exercise myself, and I think one of the main ideas was to use the knowledge that chars can be compared as integers (they talk about this in chapter 2).
Here's my function for reference. Thought it may be useful as the book doesn't contain answers to exercises.
int htoi(char s[]) {
int i = 0;
if(s[i] == '0') {
++i;
if(s[i] == 'x' || s[i] == 'X') {
++i;
}
}
int val = 0;
while (s[i] != '\0') {
val = 16 * val;
if (s[i] >= '0' && s[i] <= '9')
val += (s[i] - '0');
else if (s[i] >= 'A' && s[i] <= 'F')
val += (s[i] - 'A') + 10;
else if (s[i] >= 'a' && s[i] <= 'f')
val += (s[i] - 'a') + 10;
else {
printf("Error: number supplied not valid hexadecimal.\n");
return -1;
}
++i;
}
return val;
}
Always initialize your variables (int i = 0); otherwise i contains a garbage value, which could be any number, not necessarily 0 as you expect. You're running the while statement in an infinite loop, which is why it takes forever to get the results; print i to see why. Also, add a break if the string doesn't start with 0x, to avoid the same loop issue when the function is used on an arbitrary string. As others mentioned, you need to link the library containing the pow function and declare your string parameter as const to get rid of the warning.
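For checking a hand-written htoi, the standard library function strtol can serve as a reference: with base 16 it accepts an optional 0x or 0X prefix. The wrapper below is a sketch; htoi_strtol and its return convention are invented for this example. Note that strtol also accepts leading whitespace and a sign, which the exercise does not ask for.

```c
#include <errno.h>
#include <stdlib.h>

/* Converts a hexadecimal string (optional 0x/0X prefix) using strtol.
   Stores the value in *out and returns 0 on success, -1 if the string
   is empty, contains a non-hex character, or is out of range. */
int htoi_strtol(const char *s, long *out)
{
    char *end;
    errno = 0;
    long value = strtol(s, &end, 16);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    *out = value;
    return 0;
}
```

Comparing its results with your own function on inputs like "0x66", "0x5A55" and "0x1ca" is a quick way to spot off-by-one errors in the digit loop.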
This is my version of the program for the question above. It converts a string of hex digits into a decimal value, with or without the optional prefix (0x or 0X).
The important library functions used are strlen(s), isdigit(c), isxdigit(c), toupper(c) and pow(m,n).
Suggestions to improve the code are welcome :)
/*Program - 5d Function that converts hex(s)into dec -*/
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h> //Declares isdigit, isxdigit and toupper
#include<math.h> //Declares mathematical functions and macros
#include<string.h> //Refer appendix in Page 249 (very useful)
#define HEX_LIMIT 10
int hex_to_dec(char hex[]) //Function created by me :)
{
int dec = 0; //Initialization of decimal value
int size = strlen(hex); //To find the size of hex array
int temp = size-1 ; //Pointer pointing the right element in array
int loop_limit = 0; //To exclude '0x' or '0X' prefix in input
if(hex[0]=='0' && ((hex[1]=='x') || (hex[1]=='X')))
loop_limit = 2;
while(temp>=loop_limit)
{
int hex_value = 0; //Temporary value to hold the equivalent hex digit in decimal
if(isdigit(hex[temp]))
hex_value = (hex[(temp)]-'0') ;
else if(isxdigit(hex[temp]))
hex_value = (toupper(hex[temp])-'A' + 10);
else{
printf("Error: number supplied is not a valid hex\n\n");
return -1;
}
dec += hex_value * pow(16,(size-temp-1)); //Computes equivalent dec from hex
temp--; //Moves the pointer to the left of the array
}
return dec;
}
int main()
{
char hex[HEX_LIMIT];
printf("Enter the hex no you want to convert: ");
scanf("%s",hex);
printf("Converted no in decimal: %d\n", hex_to_dec(hex));
return 0;
}

method for expand a-z to abc...xyz form

Hi :) What I'm trying to do is write a simple program that expands a shorthand entry,
for example
a-z or 0-9 or a-b-c or a-z0-9
into its full form,
for example
abc...xyz or 0123456789 or abc or abcdefghijklmnopqrstuvwxyz0123456789
(the first shorthand example corresponds to the first expanded example, and so on).
So far I have written something like this, and it works only for letters from a to z:
expand(char s[])
{
int i,n,c;
n=c=0;
int len = strlen(s);
for(i = 1;s[i] > '0' && s[i]<= '9' || s[i] >= 'a' && s[i] <= 'z' || s[i]=='-';i++)
{
/*c = s[i-1];
g = s[i];
n = s[i+1];*/
if( s[0] == '-')
printf("%c",s[0]);
else if(s[i] == '-')
{
if(s[i-1]<s[i+1])
{
while(s[i-1] <= s[i+1])
{
printf("%c", s[i-1]);
s[i-1]++;
}
}
else if(s[i-1] == s[i+1])
printf("%c",s[i]);
else if(s[i+1] != '-')
printf("%c",s[i]);
else if(s[i-1] != '-')
printf("%c",s[i]);
}
else if(s[i] == s[i+1])
{
while(s[i] == s[i+1])
{
printf("%c",s[i]);
s[i]++;
}
}
else if( s[len] == '-')
printf("%c",s[len]);
}
}
But now I'm stuck :(
Any ideas what I should check to make my program work correctly?
Edit1: @Andrew Kozak (1) abcd (2) 01234
Thanks in advance :)
Here is a C version (in about 38 effective lines) that satisfies the same test as my earlier C++ version.
The full test program including your test cases, mine and some torture test can be seen live on http://ideone.com/sXM7b#info_3915048
Rationale
I'm pretty sure I'm overstating the requirements, but
this should be an excellent example of how to do parsing in a robust fashion
use states in an explicit fashion
validate input (!)
this version doesn't assume a-c-b can't happen
It also doesn't choke or even fail on simple input like 'Hello World' (or (char*) 0)
it shows how you can avoid printf("%c", c) each char without using extraneous functions.
I put in some comments as to explain what happens why, but overall you'll find that the code is much more legible anyways, by
staying away from too many short-named variables
avoiding complicated conditionals with un-transparent indexers
avoiding the whole string length business: We only need max lookahead of 2 characters, and *it=='-' or predicate(*it) will just return false if it is the null character. Shortcut evaluation prevents us from accessing past-the-end input characters
ONE caveat: I haven't implemented a proper check for output buffer overrun (the capacity is hardcoded at 2048 chars). I'll leave it as the proverbial exercise for the reader
Last but not least, the reason I did this:
It will allow me to compare raw performance of the C++ version and this C version, now that they perform equivalent functions. Right now, I fully expect the C version to outperform the C++ by some factor (let's guess: 4x?) but, again, let's just see what surprises the GNU compilers have in store for us. Update: it turns out I wasn't far off: github (code + results)
Pure C Implementation
Without further ado, the implementation, including the testcase:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int alpha_range(char c) { return (c>='a') && (c<='z'); }
int digit_range(char c) { return (c>='0') && (c<='9'); }
char* expand(const char* s)
{
char buf[2048];
const char* in = s;
char* out = buf;
// parser state
int (*predicate)(char) = 0; // either: NULL (free state), alpha_range (in alphabetic range), digit_range (in digit range)
char lower=0,upper=0; // tracks lower and upper bound of character ranges in the range parsing states
// init
*out = 0;
while (*in)
{
if (!predicate)
{
// free parsing state
if (alpha_range(*in) && (in[1] == '-') && alpha_range(in[2]))
{
lower = upper = *in++;
predicate = &alpha_range;
}
else if (digit_range(*in) && (in[1] == '-') && digit_range(in[2]))
{
lower = upper = *in++;
predicate = &digit_range;
}
else *out++ = *in;
} else
{
// in a range
if (*in < lower) lower = *in;
if (*in > upper) upper = *in;
if (in[1] == '-' && predicate(in[2]))
in++; // more coming
else
{
// end of range mode, dump expansion
char c;
for (c=lower; c<=upper; *out++ = c++);
predicate = 0;
}
}
in++;
}
*out = 0; // null-terminate buf
return strdup(buf);
}
void dotest(const char* const input)
{
char* ex = expand(input);
printf("input : '%s'\noutput: '%s'\n\n", input, ex);
if (ex)
free(ex);
}
int main (int argc, char *argv[])
{
dotest("a-z or 0-9 or a-b-c or a-z0-9"); // from the original post
dotest("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6"); // from my C++ answer
dotest("-x-s a-9 9- a-k-9 9-a-c-7-3"); // assorted torture tests
return 0;
}
Test output:
input : 'a-z or 0-9 or a-b-c or a-z0-9'
output: 'abcdefghijklmnopqrstuvwxyz or 0123456789 or abc or abcdefghijklmnopqrstuvwxyz0123456789'
input : 'This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6'
output: 'This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678'
input : '-x-s a-9 9- a-k-9 9-a-c-7-3'
output: '-stuvwx a-9 9- abcdefghijk-9 9-abc-34567'
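The output-buffer check left open above could look roughly like the sketch below. It is a simpler, separate expander and not a patch to the state machine: the name expand_bounded and its return convention are assumptions, it only expands ascending pairs within one class (lower case, upper case, digits), and it relies on letters being contiguous as in ASCII.

```c
#include <ctype.h>
#include <stddef.h>

/* Both characters are lower-case letters, both upper-case, or both digits. */
static int same_class(unsigned char a, unsigned char b)
{
    return (islower(a) && islower(b)) ||
           (isupper(a) && isupper(b)) ||
           (isdigit(a) && isdigit(b));
}

/* Expands ranges like a-d into abcd, writing at most cap bytes
   (including the terminator) to out. Returns 0 on success,
   -1 if the output did not fit (out then holds a truncated string). */
int expand_bounded(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return -1;
    for (size_t i = 0; in[i] != '\0'; ) {
        unsigned char lo = (unsigned char)in[i];
        if (in[i + 1] == '-' &&
            same_class(lo, (unsigned char)in[i + 2]) &&
            lo <= (unsigned char)in[i + 2]) {
            unsigned char hi = (unsigned char)in[i + 2];
            /* emit lo .. hi-1; hi itself is handled next round,
               so chains like a-b-c work */
            for (unsigned char c = lo; c < hi; c++) {
                if (n + 1 >= cap) { out[n] = '\0'; return -1; }
                out[n++] = (char)c;
            }
            i += 2;
        } else {
            if (n + 1 >= cap) { out[n] = '\0'; return -1; }
            out[n++] = in[i++];
        }
    }
    out[n] = '\0';
    return 0;
}
```

The short-circuit && guarantees that in[i + 2] is only read when in[i + 1] is a '-', so the function never reads past the terminating null character.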
Ok, I tested your program and it seems to be working for nearly every case. It correctly expands a-z and other expansions with only two letters/numbers. It fails when there are more letters and numbers. The fix is easy: keep the last printed character in a new char, and if the character about to be printed matches it, skip it. The a-z0-9 scenario didn't work because you wrote s[i] > '0' instead of s[i] >= '0'. The code is:
#include <stdio.h>
#include <string.h>
void expand(char s[])
{
int i,g,n,c,l;
n=c=l=0; // l (the last printed character) is read before its first assignment below
int len = strlen(s);
for(i = 1;s[i] >= '0' && s[i]<= '9' || s[i] >= 'a' && s[i] <= 'z' || s[i]=='-';i++)
{
c = s[i-1];
g = s[i];
n = s[i+1];
//printf("\nc = %c g = %c n = %c\n", c,g,n);
if(s[0] == '-')
printf("%c",s[0]);
else if(g == '-')
{
if(c<n)
{
if (c != l){
while(c <= n)
{
printf("%c", c);
c++;
}
l = c - 1;
//printf("\nl is %c\n", l);
}
else
{
c++;
while(c <= n)
{
printf("%c", c);
c++;
}
l = c - 1;
//printf("\nl is %c\n", l);
}
}
else if(c == n)
printf("%c",g);
else if(n != '-')
printf("%c",g);
else if(c != '-')
printf("%c",g);
}
else if(g == n)
{
while(g == n)
{
printf("%c",s[i]);
g++;
}
}
else if( s[len] == '-')
printf("%c",s[len]);
}
printf("\n");
}
int main (int argc, char *argv[])
{
expand(argv[1]);
}
Isn't this problem from K&R? I think I saw it there. Anyway I hope I helped.
Based on the fact that the existing function addresses "a-z" and "0-9" sequences just fine, separately, we should explore what happens when they meet. Trace your code (try printing each variable's value at each step -- yes it will be cluttered, so use line breaks), and I believe you will find a logical short-circuit when iterating, for example, from "current token is 'y' and next token is 'z'" to "current token is 'z' and next token is '0'". Explore the if() condition and you will find that it does not cover all possibilities, i.e. you have covered yourself if you are within a<-->z, within 0<-->9, or exactly equal to '-', but you have not considered being at the end of one (a-z or 0-9) with your next character at the start of the next.
Just for fun, I decided to demonstrate to myself that C++ is really just as suited to this kind of thing.
Test-first, please
First, let me define the requirements a little more strictly: I assumed it needs to handle these cases:
int main()
{
const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
std::cout << "input : " << in << std::endl;
std::cout << "output: " << expand(in) << std::endl;
}
input : This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6
output: This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678
C++0x Implementation
Here is an implementation (actually a few variants) in 14 lines (23 including whitespace, comments) of C++0x code1
static std::string expand(const std::string& in)
{
static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");
std::string out;
auto tail = in.begin();
for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
{
out.append(tail, match[0].first);
// char range bounds: the cost of accepting unordered ranges...
char a=127, b=0;
for (auto x=match[0].first; x<match[0].second; x+=2)
{ a = std::min(*x,a); b = std::max(*x,b); }
for (char c=a; c<=b; out.push_back(c++));
tail = match.suffix().first;
}
out.append(tail, in.end());
return out;
}
Of course I'm cheating a little because I'm using regex iterators from Boost. I will do some timings comparing to the C version for performance. I rather expect the C++ version to compete within a 50% margin. But, let's see what kind of surprises the GNU compiler has in store for us :)
Here is a complete program that demonstrates the sample input. It also contains some benchmark timings and a few variations that trade off functional flexibility against legibility and performance:
#include <algorithm> // std::min, std::max
#include <iostream>
#include <set> // only needed for the 'slow variant'
#include <string>
#include <boost/regex.hpp>
#include <boost/range.hpp>
using namespace boost;
using namespace boost::range;
static std::string expand(const std::string& in)
{
// static const regex re(R"([a-z]-[a-z]|[0-9]-[0-9])"); // "a-c-d" --> "abc-d", "a-c-e-g" --> "abc-efg"
static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");
std::string out;
out.reserve(in.size() + 12); // heuristic
auto tail = in.begin();
for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
{
out.append(tail, match[0].first);
// char range bounds: the cost of accepting unordered ranges...
#if !SIMPLE_BUT_SLOWER
// debug 15.149s / release 8.258s (at 1024k iterations)
char a=127, b=0;
for (auto x=match[0].first; x<match[0].second; x+=2)
{ a = std::min(*x,a); b = std::max(*x,b); }
for (char c=a; c<=b; out.push_back(c++));
#else // simpler but slower
// debug 24.962s / release 10.270s (at 1024k iterations)
std::set<char> bounds(match[0].first, match[0].second);
bounds.erase('-');
for (char c=*bounds.begin(); c<=*bounds.rbegin(); out.push_back(c++));
#endif
tail = match.suffix().first;
}
out.append(tail, in.end());
return out;
}
int main()
{
const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
std::cout << "input : " << in << std::endl;
std::cout << "output: " << expand(in) << std::endl;
}
1 Compiled with g++-4.6 -std=c++0x
This is a Java implementation. It expands the character ranges similar to 0-9, a-z and A-Z. Maybe someone will need it someday and Google will bring them to this page.
package your.package;
public class CharacterRange {
/**
* Expands character ranges similar to 0-9, a-z and A-Z.
*
* @param string a string to be expanded
* @return a string
*/
public static String expand(String string) {
StringBuilder buffer = new StringBuilder();
int i = 1;
while (i <= string.length()) {
final char a = string.charAt(i - 1); // previous char
if ((i < string.length() - 1) && (string.charAt(i) == '-')) {
final char b = string.charAt(i + 1); // next char
char[] expanded = expand(a, b);
if (expanded.length != 0) {
i += 2; // skip
buffer.append(expanded);
} else {
buffer.append(a);
}
} else {
buffer.append(a);
}
i++;
}
return buffer.toString();
}
private static char[] expand(char a, char b) {
char[] expanded = expand(a, b, '0', '9'); // digits (0-9)
if (expanded.length == 0) {
expanded = expand(a, b, 'a', 'z'); // lower case letters (a-z)
}
if (expanded.length == 0) {
expanded = expand(a, b, 'A', 'Z'); // upper case letters (A-Z)
}
return expanded;
}
private static char[] expand(char a, char b, char min, char max) {
if ((a > b) || !(a >= min && a <= max && b >= min && b <= max)) {
return new char[0];
}
char[] buffer = new char[(b - a) + 1];
for (int i = 0; i < buffer.length; i++) {
buffer[i] = (char) (a + i);
}
return buffer;
}
public static void main(String[] args) {
String[] ranges = { //
"0-9", "a-z", "A-Z", "0-9a-f", "a-z2-7", "0-9a-v", //
"0-9a-hj-kmnp-tv-z", "0-9a-z", "1-9A-HJ-NP-Za-km-z", //
"A-Za-z0-9", "A-Za-z0-9+/", "A-Za-z0-9-_" };
for (int i = 0; i < ranges.length; i++) {
String input = ranges[i];
String output = CharacterRange.expand(ranges[i]);
System.out.println("input: " + input);
System.out.println("output: " + output);
System.out.println();
}
}
}
Output:
input: 0-9
output: 0123456789
input: a-z
output: abcdefghijklmnopqrstuvwxyz
input: A-Z
output: ABCDEFGHIJKLMNOPQRSTUVWXYZ
input: 0-9a-f
output: 0123456789abcdef
input: a-z2-7
output: abcdefghijklmnopqrstuvwxyz234567
input: 0-9a-v
output: 0123456789abcdefghijklmnopqrstuv
input: 0-9a-hj-kmnp-tv-z
output: 0123456789abcdefghjkmnpqrstvwxyz
input: 0-9a-z
output: 0123456789abcdefghijklmnopqrstuvwxyz
input: 1-9A-HJ-NP-Za-km-z
output: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
input: A-Za-z0-9
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
input: A-Za-z0-9+/
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
input: A-Za-z0-9-_
output: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

How to create a function to count occurrences of each letter?

In C, I need to create a function that, for an input, will count and display the number of times each letter occurs.
For input of "Lorem ipsum dolor sit amet", the function should return something similar to:
a: 1
b: 0
c: 0
d: 1
e: 2
f: 0
...
So you will basically need to read through the entire file char-by-char. Assuming that you know how file-reading works, you will need to do something like (apologies, been a while since I did C):
if (isalpha(ch)) {
count[ch-'a']++;
}
/* rest of code where char pointer is moved on etc. */
You will need to import the ctype library for that:
#include <ctype.h>
I forgot to mention (and assumed you would deduce) the following: ch is the currently read character, while count[] is an int[] initialized to all zeros. Note that the index ch-'a' is only in range for lower-case letters; an upper-case letter would give a negative index, so the snippet above needs islower(ch) instead of isalpha(ch) unless the case is folded first. If upper and lower case should be treated the same, you can use the tolower(int c) function, also declared in <ctype.h>. In that case you only need a 26-element array.
if (isalpha(ch)) {
count[tolower(ch)-'a']++;
}
Then the count[] should contain the counts for each character.
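Put together as one function, the counting step might look like this sketch. The name count_letters and the caller-supplied array are choices made for this example; reading from a string instead of a file keeps it short, and the 'a'..'z' arithmetic assumes contiguous letters as in ASCII.

```c
#include <ctype.h>

/* Counts how often each letter occurs in text, ignoring case.
   counts[0] is 'a', counts[25] is 'z'; other characters are skipped. */
void count_letters(const char *text, int counts[26])
{
    for (int i = 0; i < 26; i++)
        counts[i] = 0;

    for (; *text != '\0'; text++) {
        int c = tolower((unsigned char)*text);
        if (c >= 'a' && c <= 'z')
            counts[c - 'a']++;
    }
}
```

Printing the result is then a loop over the 26 entries, for example printf("%c: %d\n", 'a' + i, counts[i]) for each i.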
/* ***** */
If you wanted to do this with only the stdio.h library, you can implement the two functions used from the ctype.h library.
A simple implementation of the isalpha(int c) function could be something like:
if (((int)c >= 'a' && (int)c <= 'z') || ((int)c >= 'A' && (int)c <= 'Z')) {
return TRUE;
} else {
return FALSE;
}
(where TRUE and FALSE are of type your return type and something you defined).
And a REALLY simple version of tolower could be something like:
if ((int)c >= 'A' && (int)c <= 'Z') {
return (int)c - 'A' + 'a';
} else {
return (int)c;
}
You could probably do without all the casts...
hints:
char c[26] = { 0 }; // init
// read each input chars
++c[input-'a'];
I would have an array (of a size equal to the char domain) and increment the count at the appropriate position.
count[ch]++;
Just in case you are concerned with speed:
unsigned int chars[256] = {0}; // one zero-initialized counter per possible unsigned char value
const unsigned char *p = (const unsigned char *)text;
while(*p)
chars[*p++]++; // count the character, then advance
for(int i = 'A'; i <= 'Z'; i++)
printf("%c) %u\n", i, chars[i]);

Resources