String compare in C

I'm a little confused with the string compare strcmp() function in C.
When you have two strings, grass and grapes and you use strcmp(grass, grapes); which results in 39, or any positive number, does this mean that "grapes" is alphabetized before "grass", or the opposite?
I know that if it results to 0, they're equal.

The strcmp function compares the two strings character by character, starting with the first character of each. If the characters are equal, it moves on to the next pair, until the characters differ or a terminating null character is reached.
In other words, it compares the raw character values; it knows nothing about alphabetical order.
The following program should give you an idea of how strcmp works:
#include <stdio.h>
#include <string.h>
/* Simplified strcmp: returns -1, 0 or 1 based on the first pair of characters that differ. */
int stringcmp(const char *s1, const char *s2){
    size_t count = 0;
    /* Skip the common prefix, stopping at the terminating null character. */
    while (s1[count] == s2[count] && s1[count] != '\0')
        count++;
    /* Compare the first differing pair as unsigned char, as strcmp does. */
    unsigned char c1 = (unsigned char)s1[count];
    unsigned char c2 = (unsigned char)s2[count];
    if (c1 == c2)
        return 0; /* both strings ended at the same position */
    return (c1 < c2) ? -1 : 1;
}
int main(void){
    const char *b = "grass";
    const char *a = "grapes";
    int result = stringcmp(a, b);
    if (result == 0)
        printf("Are equal.\n");
    else
        printf("Are not equal.\n");
    printf("Length of A = %zu\n", strlen(a));
    printf("Length of B = %zu\n", strlen(b));
    printf("Return of stringcmp = %d\n", result);
    return 0;
}
Output:
Are not equal.
Length of A = 6
Length of B = 5
Return of stringcmp = -1
If you swap a with b you get:
Are not equal.
Length of A = 5
Length of B = 6
Return of stringcmp = 1
And if A and B are the same:
Are equal.
Length of A = 5
Length of B = 5
Return of stringcmp = 0

The return value of strcmp is defined in C99 7.21.4:
The sign of a nonzero value returned by the comparison functions memcmp, strcmp,
and strncmp is determined by the sign of the difference between the values of the first
pair of characters (both interpreted as unsigned char) that differ in the objects being
compared.
So if the result is positive, the first argument comes after the second; if it is negative, the first argument comes before the second. For strcmp("grass", "grapes") the first differing pair is 's' vs 'p', and 's' > 'p', so the result is positive and "grapes" sorts before "grass".
It's not exactly alphabetical order, but is rather dependent on the underlying encoding of the characters. For instance, in ASCII, 'B' < 'a', because 'B' is encoded as 66 and 'a' is 97. If the characters are all letters of the same case, this will be equivalent to alphabetical order in all (non-multibyte) encodings I'm familiar with, but I don't believe this is required.
For cases like "grass" vs "grapes", it'll just keep scanning until it finds characters that differ ('s' vs 'p' in this case), and then make the decision. A special case of this is when one string is a prefix of the other: e.g. "grape" vs "grapes". For that case, you just need to remember that "grape" is actually { 'g', 'r', 'a', 'p', 'e', '\0' }, and apply the normal rule: '\0' < 's', so "grape" comes before "grapes".
This would be a conforming implementation of strcmp:
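int strcmp(const char *a, const char *b) {
    size_t i = 0;
    while (a[i] || b[i]) {
        /* the standard requires the characters to be compared as unsigned char */
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb) {
            if (ca < cb) return -1;
            else return 1;
        }
        i++;
    }
    return 0;
}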

Related

how to know the difference between the sum of the values of the even index and the sum of the values of the odd index in an array (recursive code)

(Recursive solutions only.) I'm using the function: int diff(char str[], int i)
For the input string 123, the sum of the values at the even indexes is 1+3=4
and the sum of the values at the odd indexes is 2,
so the difference between the sum of the values at the even indexes and the sum of the values at the odd indexes is 4-2=2.
I have written this in main, but it's not right. How can I fix my code?
printf("Enter a string:");
if(scanf("%s",str)!=1)
{
printf("Input error");
return 1;
}
printf("The difference is: %d", diff(str, 0));
return 0;
And outside main was the function:
int diff (char str[], int i)
{
if(str[i]=='\0' || i>=100)
return 0;
if(i%2==0)
return (str[i]+diff(str,i+1));
else
return (-str[i] +diff(str,i+1));
}

Another approach could be:
int diff (const char str[])
{
if (str[0] == '\0')
return 0;
if (str[1] == '\0')
return str[0] - '0';
return str[0] - str[1] + diff(str + 2);
}

The code, as written, is not working because it is not converting the character codes held in str to integer values in the range 0-9.
If the input to the diff function was "12345" then inspection of the values of str[0], str[1], ... str[4] either with the debugger or by printing them out would show them to be (assuming an ASCII-derived encoding):
49 50 51 52 53
Luckily (and thanks to user @SomeProgrammerDude for pointing this out), the C standard requires (see, for example: ISO/IEC 9899:TC3 §5.2.1, paragraph 3):
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
What this means in practice is you can convert the characters '0','1',...,'9' to their equivalent values by subtracting '0'.
int value = str[i] - '0';
Adding this to the code gives a working version of diff:
int diff (char str[], int i)
{
if(str[i]=='\0' || i>=100)
return 0;
int value = str[i] - '0';
if(i%2 == 0)
return (value + diff(str, i+1));
else
return (-value + diff(str, i+1));
}

Binary Search with strings in C

I have implemented this binary search algorithm with the goal of finding a desired char contained in an array. I've tried doing the 'if' 'else' comparisons the standard way, considering the ASCII table: [ if ( searchElement > arrayChar[mid] ... etc ]. That didn't seem to work, so I realized that maybe it had something to do with string comparisons. I'm now using the function strcmp and using its returned value to compare:
(If string1 < string2 -> negative value. If string1 > string2 -> positive value. If string1 == string2 -> 0. )
which doesn't seem to work either.
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
int binarySearch(int arraySize, char arrayChar[]) {
// Variable declaration;
char searchElement[2];
int startingPoint = 0, endingPoint = arraySize - 1;
// Input for desired search element;
printf("\nSearch for: ");
scanf(" %c", &searchElement);
while (startingPoint <= endingPoint) {
int mid = (startingPoint + endingPoint) / 2;
if (strcmp(searchElement, arrayChar[mid]) == 0) {
return mid;
}
else if (strcmp(searchElement, arrayChar[mid]) < 0)
endingPoint = mid - 1;
else {
if (strcmp(searchElement, arrayChar[mid]) > 0)
startingPoint = mid + 1;
}
}
return -1;
}
int main() {
// Array declaration;
char randomCharArray[7] = {'a', 'c', 'e', 'f', 'f', 'g', 'h'};
// Calling binarySearch() ;
if (binarySearch(6, randomCharArray) == -1) printf("Element not found!");
else printf("Element found in [%d] .", binarySearch(6, randomCharArray));
return 1;
}

You seem to be having trouble with the distinction between individual char values and strings, which are contiguous sequences of char values ending with one having value 0. Pairs of individual char values can be compared with the standard relational and equality-test operators (<, >, ==, etc.). Pairs of strings can be compared via the strcmp() function. Individual chars cannot be directly compared with strings, and you don't have strings anyway because neither the contents of binarySearch()'s searchElement nor the contents of main()'s randomCharArray are null-terminated.
That leads me to suggestion 1: make searchElement a char, not an array, since you know you need it to represent a single char only:
char searchElement;
Having done that, you can now (suggestion 2) compare the value of searchElement to elements of arrayChar via the standard operators, as you say you originally tried to do. For example,
// with 'searchElement' as a char, not a char[]:
if (searchElement == arrayChar[mid]) {
Alternatively, if you kept searchElement as an array then you could use searchElement[0] to access its first char for comparison.

How do I check the first two characters of my char array in C?

This is code to create a function similar to the C library's atoi(), without the use of any C runtime library routines.
I'm currently stuck on how to check the first two characters of the char array s to see whether the input begins with "0x".
If it starts with 0x, this means that I can then convert it as hexadecimal.
#include <stdio.h>
int checkforint(char x){
if (x>='0' && x<='9'){
return 1;
}
else{
return 0;
}
}
unsigned char converthex(char x){
//lets convert the character to lowercase, in the event its in uppercase
x = tolower(x);
if (x >= '0' && x<= '9') {
return ( x -'0');
}
if (x>='a' && x<='f'){
return (x - 'a' +10);
}
return 16;
}
int checkforhex(const char *a, const char *b){
if(a = '0' && b = 'x'){
return 1;
}else{
return 0;
}
}
//int checkforint
/* We read an ASCII character s and return the integer it represents*/
int atoi_ex(const char*s, int ishex)
{
int result = 0; //this is the result
int sign = 1; //this variable is to help us deal with negative numbers
//we initialise the sign as 1, as we will assume the input is positive, and change the sign accordingly if not
int i = 0; //iterative variable for the loop
int j = 2;
//we check if the input is a negative number
if (s[0] == '-') { //if the first digit is a negative symbol
sign = -1; //we set the sign as negative
i++; //also increment i, so that we can skip past the sign when we start the for loop
}
//now we can check whether the first characters start with 0x
if (ishex==1){
for (j=2; s[j]!='\0'; ++j)
result = result + converthex(s[j]);
return sign*result;
}
//iterate through all the characters
//we start from the first character of the input and then iterate through the whole input string
//for every iteration, we update the result accordingly
for (; s[i]!='\0'; ++i){
//this checks whether the current character is an integer or not
//if it is not an integer, we skip past it and go to the top of the loop and move to the next character
if (checkforint(s[i]) == 0){
continue;
} else {
result = result * 10 + s[i] -'0';
}
//result = s[i];
}
return sign * result;
}
int main(int argc)
{
int isithex;
char s[] = "-1";
char a = s[1];
char b = s[2];
isithex=checkforhex(a,b);
int val = atoi_ex(s,isithex);
printf("%d\n", val);
return 0;
}

There are several errors in your code. First, in C you start counting from zero. So in main(), you should write:
char a = s[0];
char b = s[1];
isithex = checkforhex(a, b);
Then, in checkforhex(), you should use == (two equal signs) to do comparisons, not =. So:
if (a == '0' && b == 'x')
However, as pointed out by kaylum, why not write the function to pass a pointer to the string instead of two characters? Like so:
int checkforhex(const char *str) {
if (str[0] == '0' && str[1] == 'x') {
...
}
}
And in main() call it like so:
isithex = checkforhex(s);

how do I correct this code to check for anagrams?

#include <stdio.h>
main()
{
char *a = "stack";
char *b = "cats";
int l = 0; // for length of string b
clrscr();
while (*b != NULL)
{
l++;
b++;
}
b--; // now b points to last char of the string
// printf("%d\n", l);
while ((*a - *b == 32) || (*a == *b))
{
printf("%c %c\n", *a, *b);
a++;
b--;
l--;
if (l == 0)
break;
}
l == 0 ? printf("anagrams") : printf("not anagrams");
getch();
}
I wrote the code to check whether two strings a and b are anagrams.
The code gives a false positive when b is shorter than a and the reverse of b is a prefix of a.
For example, b is "cats", which in reverse is "stac", and a is "stack"; the code reports a and b as anagrams.

Some pseudo-code for checking if two strings are anagrams of each other (using your variable names):
if (strlen(a) != strlen(b))
// Not an anagram
return
char *a_sorted = sort_chars(a)
char *b_sorted = sort_chars(b)
// Once the characters in each word are re-arranged in alphabetical
// order, we can just test for equality. Anagrams will be equal
// (Note that this will not work on multi-word strings with differing
// numbers of whitespace characters)
if (strcmp(a_sorted, b_sorted) == 0)
// Anagram!
else
// Not an anagram
The real work here is in the sort_chars(in) function, which sorts the characters in the in string alphabetically, and returns a pointer to a new string containing the sorted characters. Using the numeric ASCII values of the characters (they're already in alphabetical order, yay someone's doing something right out there!), something like bubble sort (or any other sorting algorithm you want to use) should work just fine.

First, the code you have written looks more like an attempt to check whether the two words are reverses of each other (a palindrome-style check). If that was your intent, the bug is that you are counting down the length of the shorter string, so l hits 0 before the last character of a is seen.
If you didn't want to check for palindromes, I would imagine that you are really looking for code that checks whether a is a permutation of b, since that is the same as checking for an anagram if you just have one word.
Don't do what others have suggested and sort the strings and compare. That ranges from comparable to much worse than three simple linear traversals, albeit with a small space trade-off and a bound on the number of possible char occurrences in a string.
The logic is this: make a map from chars to occurrence counts, recording which chars are present in a and how many times each occurs. Then go through b once, making sure that every char is present and decrementing its occurrence count. Then go through b again, making sure all occurrence counts are zero.
Here:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
// Only works for single words, but you get the idea.
int palindrome(const char* a, const char* b) {
ssize_t len_a = (ssize_t)strlen(a);
ssize_t len_b = (ssize_t)strlen(b);
if (len_a != len_b) return 0;
b += (len_b - 1);
while (len_a--) if (*a++ != *b--) return 0;
return 1;
}
// Really checks for a case-sensitive permutations.
int anagram(const char* a, const char* b) {
size_t len_a = strlen(a);
size_t len_b = strlen(b);
if (len_a != len_b) return 0;
uint16_t occurrences[256] = {0};
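// Index with unsigned char: a plain char may be negative, which would
// read and write before the start of the array.
for (; *a; a++) occurrences[(unsigned char)*a]++;
for (; *b; b++) {
if (!occurrences[(unsigned char)*b]) return 0;
occurrences[(unsigned char)*b]--;
}
b -= len_b;
for (; *b; b++) if (occurrences[(unsigned char)*b]) return 0;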
return 1;
}
int main(void) {
const char *a = "stack";
const char *b = "kcats";
printf("anagram: %d\n", anagram(a, b));
b = "cats";
printf("anagram: %d\n", anagram(a, b));
b = "kcbts";
printf("anagram: %d\n", anagram(a, b));
a = "aabbcdddd";
b = "aaddddbbc";
printf("anagram: %d\n", anagram(a, b));
a = "aabbcddcd";
b = "aadddddbc";
printf("anagram: %d\n", anagram(a, b));
return EXIT_SUCCESS;
}
Note that when you do any of these kinds of algorithms, there is an early trivial false condition if the lengths of the strings differ.
Also, the code is case-sensitive here for simplicity, but that is easily changed.
Output:
anagram: 1
anagram: 0
anagram: 0
anagram: 1
anagram: 0

Continuing from my comment, you can use a simple frequency array (recording how many times each character of the subject appears) and do the same thing for the proposed anagram and then simply compare the two frequency arrays to determine if the proposed anagram is in fact an anagram of the subject.
For example, you could do something similar to the following:
#include <stdio.h>
#include <string.h>
#define MAXF 62
int main (int argc, char **argv) {
char *sub = argc > 1 ? argv[1] : "anagram",
*ana = argc > 2 ? argv[2] : "nag a ram",
*sp = sub, *ap = ana;
int subf[MAXF] = {0}, /* subject frequency array */
anaf[MAXF] = {0}; /* anagram frequency array */
while (*sp) { /* fill subject frequency array */
if ('a' <= *sp && *sp <= 'z')
subf[*sp - 'a']++;
else if ('A' <= *sp && *sp <= 'Z')
subf[*sp - 'A' + 26]++;
else if ('0' <= *sp && *sp <= '9')
subf[*sp - '0' + 52]++;
sp++;
}
while (*ap) { /* fill anagram frequency array */
if ('a' <= *ap && *ap <= 'z')
anaf[*ap - 'a']++;
else if ('A' <= *ap && *ap <= 'Z')
anaf[*ap - 'A' + 26]++;
else if ('0' <= *ap && *ap <= '9')
anaf[*ap - '0' + 52]++;
ap++;
}
/* if character frequency arrays are equal - it's an anagram */
if (memcmp (subf, anaf, MAXF * sizeof *subf) == 0)
printf ("'%s' is an anagram of '%s'\n", ana, sub);
else
printf ("'%s' is NOT an anagram of '%s'\n", ana, sub);
return 0;
}
Example Use/Output
Checking the default values of anagram and nag a ram
$ ./bin/anagram
'nag a ram' is an anagram of 'anagram'
Checking if nag a rim is an anagram of the subject word anagram
$ ./bin/anagram anagram "nag a rim"
'nag a rim' is NOT an anagram of 'anagram'

C - Largest String From a Big One

So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...

Use strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use a linked list.
It's as simple as that.
EDIT
OK, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace every non-letter character with a null character ('\0') - basically that's what strtok() does.
Then you need to go through your modified source string a second time, advancing one token at a time, and find the longest one using strlen().

This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NUL.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character, stop counting and throw away the starting point. If you hit a NUL, check to see if the length of the current run is greater than the previous record holder. If so, record it, and start looking for the next string.

What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
if P(s[i]): # current char is good
if status == bad: # previous one was bad
current_run_start = current_run_end = i
status = good
else: # previous one was also good
current_run_end = i
else: # current char is bad
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length
status = bad
# if a good run ends with end-of-string:
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length

Why use strlen() at all?
Here's my version which uses no function whatsoever.
#include <stddef.h> /* size_t and NULL, needed even without UNIT_TEST */
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif

While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength] = 0;
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
printf(" longest: \"%s\"\n\n", longest(argv[i]));
}
return 0;
}
This is my solution in plain C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
The criteria for what constitutes a letter can easily be changed in the isLetter() function. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
would also count periods, commas and spaces as 'letters'.
As per your update:
Replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
However, this problem would be much more easily solved with the use of the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
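// strtok() returns NULL when there are no (more) tokens, so test
// the result before passing it to strlen()
for (curString = strtok(s, tokens); curString != NULL;
curString = strtok(NULL, tokens)) {
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
}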
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}

First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.

Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}