Binary Search with strings in C - c

I have implemented this binary search algorithm with the goal of finding a desired char in an array. I tried doing the if/else comparisons the standard way, relying on the ASCII table: [ if ( searchElement > arrayChar[mid] ... etc ]. That didn't seem to work, so I thought it might have something to do with string comparisons. I'm now using the function strcmp and using its return value to compare:
(If string1 < string2 -> negative value. If string1 > string2 -> positive value. If string1 == string2 -> 0.)
That doesn't seem to work either.
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
int binarySearch(int arraySize, char arrayChar[]) {
// Variable declaration;
char searchElement[2];
int startingPoint = 0, endingPoint = arraySize - 1;
// Input for desired search element;
printf("\nSearch for: ");
scanf(" %c", &searchElement);
while (startingPoint <= endingPoint) {
int mid = (startingPoint + endingPoint) / 2;
if (strcmp(searchElement, arrayChar[mid]) == 0) {
return mid;
}
else if (strcmp(searchElement, arrayChar[mid]) < 0)
endingPoint = mid - 1;
else {
if (strcmp(searchElement, arrayChar[mid]) > 0)
startingPoint = mid + 1;
}
}
return -1;
}
int main() {
// Array declaration;
char randomCharArray[7] = {'a', 'c', 'e', 'f', 'f', 'g', 'h'};
// Calling binarySearch() ;
if (binarySearch(6, randomCharArray) == -1) printf("Element not found!");
else printf("Element found in [%d] .", binarySearch(6, randomCharArray));
return 1;
}

You seem to be having trouble with the distinction between individual char values and strings, which are contiguous sequences of char values ending with one having value 0. Pairs of individual char values can be compared with the standard relational and equality-test operators (<, >, ==, etc.). Pairs of strings can be compared via the strcmp() function. Individual chars cannot be directly compared with strings, and you don't have strings anyway because neither the contents of binarySearch()'s searchElement nor the contents of main()'s randomCharArray are null-terminated.
That leads me to suggestion 1: make searchElement a char, not an array, since you know you need it to represent a single char only:
char searchElement;
Having done that, you can now (suggestion 2) compare the value of searchElement to elements of arrayChar via the standard operators, as you say you originally tried to do. For example,
// with 'searchElement' as a char, not a char[]:
if (searchElement == arrayChar[mid]) {
Alternatively, if you kept searchElement as an array then you could use searchElement[0] to access its first char for comparison.
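Putting both suggestions together, a minimal sketch of the search might look like the following. It assumes the array is sorted in ascending order, and it takes the target as a parameter instead of reading it with scanf; the name binary_search_char and the size_t parameter are illustrative choices, not part of the original code. Note also that the original main() passes 6 for a 7-element array, so the last element ('h') would never be examined.

```c
#include <stddef.h>

/* Returns the index of target in the sorted array, or -1 if it is absent.
   With duplicates, any one of the matching indexes may be returned. */
int binary_search_char(const char arr[], size_t size, char target)
{
    size_t low = 0, high = size;        /* search the half-open range [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (arr[mid] == target)
            return (int)mid;            /* plain char comparison, no strcmp */
        else if (arr[mid] < target)
            low = mid + 1;              /* target can only be to the right */
        else
            high = mid;                 /* target can only be to the left */
    }
    return -1;
}
```

If the target is still read interactively, scanf(" %c", &searchElement) is correct once searchElement is declared as a plain char.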

Related

Design a Character Searching Function, While Forced to Use strchr

Background Information
I was recently approached by a friend who was given a homework problem to develop a searching algorithm. Before anyone asks, I did think of a solution! However, my solution is not what the teacher is asking for...
Anyway, this is an introductory C programming course where the students have been asked to write a search function called ch_search that is supposed to search an array of characters to determine how many times a specific character occurs. The constraints are what I don't understand...
Constraints:
The arguments are: array to search, character to search for, and length of the array being searched.
The function must use a for-loop.
The algorithm must use the strchr function.
Okay, so the first two constraints I can understand... but the 3rd constraint is what really gets me... I was initially thinking that we could just use a for-loop to iterate through the string from the beginning to the end, simply counting each instance of the character. When the student originally described the problem to me, I came up with (although incorrect) the solution:
Proposed Solution
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++)
{
// count each character instance
if (array_to_search[i] == char_to_search_for)
{
// keep incrementing the count
count++;
}
}
return count;
}
Then I was told that I had to specifically use the character position function (and apparently it has to be strchr and not strrchr so we can't start at the end I guess?)... I just don't see how that wouldn't be overcomplicating this. I don't see how that would help at all, especially counting from the beginning... Even strrchr might make a little more sense to me. Thoughts?
It's true that, given the length of the array and the requirement to use a for loop, the most natural thing to do would be to iterate over every character of the source array. But you can also loop over the result of strchr like this:
int ch_search(char haystack[], char needle, int size)
{
int count = 0;
char *found;
for(; (found = strchr(haystack, needle)) != NULL; haystack = found + 1)
count++;
return count;
}
In this case you don't need the size of the array, but the assignment doesn't say that you have to use it. Obviously this solution requires the source to be '\0'-terminated.
I think the teacher wanted you to use strchr to navigate to the next occurrence of the char_to_search_for within a string:
int ch_search(char array_to_search[], char char_to_search_for, int array_size) {
int count = 0;
for (char *ptr = array_to_search ; ptr != &array_to_search[array_size] ; ptr++) {
ptr = strchr(ptr, char_to_search_for);
if (!ptr) {
break; // Character is not found
}
count++;
}
return count;
}
Note that array_to_search must be null-terminated in order to be used with the strchr solution above.
This sounds like your friend was given a trick question. The function gets an array of chars and the length of that array but is required to use strchr() even though that function only works on '\0' terminated strings (and no guarantee was given that the array is '\0' terminated).
You might think that it would be fine to use strchr() on the array anyway and then compare the returned pointer to the given length of the array to check if it went past the end of the array. But there are two problems with that:
If strchr() searches past the end of the array, then you already have Undefined Behavior before getting to the check. The program might have crashed before returning from strchr(), the returned pointer might be some total garbage or you might get a pointer to an address a bit further in memory than the end of the array.
Even if the returned pointer is just to an address a bit further in memory than the end of the array, then there is the problem that comparing two pointers (or subtracting them to find the distance between the pointed addresses) is Undefined Behavior unless they're both pointing to parts of the same memory object (or one position past the end of the object). In this instance it means that checking if the returned pointer is within the bounds of the array is only defined behavior if the returned pointer is within the bounds of the array (or one past the end) making the check a bit useless.
The only solution to that is to make sure that strchr() is working with a '\0' terminated string. For example:
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
char *buffer = malloc(array_size + 1);
if (buffer == NULL) return -1; // malloc failed; returning -1 as an error value is an assumed convention
strncpy(buffer, array_to_search, array_size);
buffer[array_size] = '\0';
int count = 0;
for (char *i = buffer; (i = strchr(i, char_to_search_for)) != NULL; i++) {
count++;
}
free(buffer);
return count;
}
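For comparison only: if the strchr requirement could be relaxed, the standard memchr function searches a region of a given length and never relies on a terminator, so no copy is needed. The sketch below swaps in memchr, so it does not satisfy the assignment as stated; the name ch_count_memchr is made up for illustration, and array_size is assumed to be non-negative.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of char_to_search_for in the first array_size bytes.
   The array does not need to be '\0'-terminated. */
int ch_count_memchr(const char array_to_search[], char char_to_search_for, int array_size)
{
    int count = 0;
    const char *p = array_to_search;
    const char *end = array_to_search + array_size;   /* one past the last element */

    for (; p < end; p++) {
        /* search only the bytes that remain inside the array */
        p = memchr(p, char_to_search_for, (size_t)(end - p));
        if (p == NULL)
            break;                                     /* no further occurrence */
        count++;
    }
    return count;
}
```

Because the search length shrinks with every match, the pointer never leaves the array, which avoids both problems described above.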
strchr is a very convenient function to search for a char in a string.
Find and read more about strchr. This is my favorite function ever!
The C library function char *strchr(const char *str, int c) searches for the first occurrence of the character c (an unsigned char) in the string pointed to by the argument str.
Declaration
Following is the declaration for strchr() function.
char *strchr(const char *str, int c)
Parameters
str − This is the C string to be scanned.
c − This is the character to be searched in str.
Return value
Function returns a pointer to the first occurrence of the character c in the string str, or NULL if the character is not found.
Constraints:
1) The arguments are: array to search, character to search for, and
length of the array being searched.
This constraint gives the length of the array to be searched. The given array has to contain '\0' at some point. However, the length of the search can be shorter, as specified by search_length.
The following compact solution takes this into account.
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
Or the equivalent ch_search2, shown here in a complete program:
#include<stdio.h>
#include<string.h>
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
// Your original function:
int ch_search1(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++){
// count each character instance
if (array_to_search[i] == char_to_search_for){
count++; // keep incrementing the count
}
}
return count;
}
int ch_search2(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
char *p = array_to_search;
for(;;)
{
p = strchr(p, char_to_search_for);
if( p != NULL )
{
if (p - array_to_search >= array_size) // we reached beyond
{
break;
}
else
{
count++;
p++;
}
}
else
break; // char not found
}
return count;
}
int main(void)
{
// the arr has to contain '\0' terminator but we can search within the specified length.
char arr[]={'1','1','2','2','1','1','3','3','3','1','4','4', '1','1','!','1','\0','1'};
char arr1[] = "zdxbab";
printf("count %d count %d \n",ch_search(arr , '1', 12),ch_search2(arr , '1', 12));
printf("count %d count %d \n",ch_search(arr1,'b',strlen(arr1)),ch_search2(arr1,'b',strlen(arr1)));
return 0;
}
Output:
count 5 count 5
count 2 count 2

String compare in C

I'm a little confused with the string compare strcmp() function in C.
When you have two strings, grass and grapes and you use strcmp(grass, grapes); which results in 39, or any positive number, does this mean that "grapes" is alphabetized before "grass", or the opposite?
I know that if it results to 0, they're equal.
strcmp function starts comparing the first character of each string. If they are equal to each other, it continues with the following pairs until the characters differ or until a terminating null-character is reached.
This means that this function performs a binary comparison of the characters.
The following program should give you an idea of how strcmp works:
#include <stdio.h>
#include <string.h>
int stringcmp(char *s1, char *s2){
int count = 0;
while (s1[count] == s2[count]) {
if (s1[count] == '\0' || s2[count] == '\0')
break;
count++;
}
if (s1[count] == '\0' && s2[count] == '\0'){
return 0;
}
// Otherwise the first differing pair of characters decides the sign,
// compared as unsigned char like strcmp does (string length is irrelevant).
if((unsigned char)s1[count] < (unsigned char)s2[count]){
return -1;
}else{
return 1;
}
}
int main(void){
char *b = "grass";
char *a = "grapes";
if(stringcmp(a, b) == 0){
printf("Are equal.\n");
printf("Length of A = %zu\n",strlen(a));
printf("Length of B = %zu\n",strlen(b));
printf("Return of stringcmp = %d\n",stringcmp(a, b));
}else{
printf("Are not equal.\n");
printf("Length of A = %zu\n",strlen(a));
printf("Length of B = %zu\n",strlen(b));
printf("Return of stringcmp = %d\n",stringcmp(a, b));
}
return 0;
}
Output:
Are not equal.
Length of A = 6
Length of B = 5
Return of stringcmp = -1
If you swap a with b you get:
Are not equal.
Length of A = 5
Length of B = 6
Return of stringcmp = 1
And if A and B are the same:
Are equal.
Length of A = 5
Length of B = 5
Return of stringcmp = 0
The return value of strcmp is defined in C99 7.21.4
The sign of a nonzero value returned by the comparison functions memcmp, strcmp,
and strncmp is determined by the sign of the difference between the values of the first
pair of characters (both interpreted as unsigned char) that differ in the objects being
compared.
So if the result is positive, it means the first argument comes after the second.
It's not exactly alphabetical order, but is rather dependent on the underlying encoding of the characters. For instance, in ASCII, 'B' < 'a', because 'B' is encoded as 66 and 'a' is 97. If the characters are all letters of the same case, this will be equivalent to alphabetical order in all (non-multibyte) encodings I'm familiar with, but I don't believe this is required.
For cases like "grass" vs "grapes", it'll just keep scanning until it finds characters that differ ('s' vs 'p' in this case), and then make the decision. A special case of this is when one string is a prefix of another: e.g. "grape" vs "grapes". For that case, you just need to remember that "grape" is actually { 'g', 'r', 'a', 'p', 'e', '\0' }, and apply the normal rule: '\0' < 's', so "grape" comes before "grapes".
This would be a conforming implementation of strcmp:
int strcmp(const char *a, const char *b) {
size_t i = 0;
while (a[i] || b[i]) {
if (a[i] != b[i]) {
if ((unsigned char)a[i] < (unsigned char)b[i]) return -1; // compare as unsigned char, as the standard specifies
else return 1;
}
i++;
}
return 0;
}
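To answer the original "grass" vs "grapes" question concretely, here is a small sketch that reduces strcmp's result to its sign, since only the sign is specified by the standard (a value such as 39 is implementation-specific). The helper name compare_sign is made up for this example.

```c
#include <string.h>

/* Returns -1 if a sorts before b, 1 if a sorts after b, 0 if they are equal.
   Only the sign of strcmp's result is portable, so the magnitude is discarded. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

With this helper, compare_sign("grass", "grapes") is 1 because 's' is greater than 'p' at the first differing position, so "grapes" sorts before "grass"; compare_sign("grape", "grapes") is -1 because the terminating '\0' is less than 's'.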

Pointers to string C

I'm trying to write a function that returns 1 if every letter in “word” appears in “s”.
For example:

containsLetters1("this_is_a_long_string","gas") returns 1
containsLetters1("this_is_a_longstring","gaz") returns 0
containsLetters1("hello","p") returns 0
I can't understand why it's not right:
#include <stdio.h>
#include <string.h>
#define MAX_STRING 100
int containsLetters1(char *s, char *word)
{
int j,i, flag;
long len;
len=strlen(word);
for (i=0; i<=len; i++) {
flag=0;
for (j=0; j<MAX_STRING; j++) {
if (word==s) {
flag=1;
word++;
s++;
break;
}
s++;
}
if (flag==0) {
break;
}
}
return flag;
}
int main() {
char string1[MAX_STRING] , string2[MAX_STRING] ;
printf("Enter 2 strings for containsLetters1\n");
scanf ("%s %s", string1, string2);
printf("Return value from containsLetters1 is: %d\n",containsLetters1(string1,string2));
return 0;
}
Try these:
for (i=0; i < len; i++)... (use < instead of <=, since otherwise you would take one additional character);
if (word==s) should be if (*word==*s) (you compare characters stored at the pointed locations, not pointers);
Pointer s advances, but it should get back to the start of the word s, after reaching its end, i.e. s -= len after the for (j=...);
s++ after word++ is not needed, you advance the pointer by the same amount, whether or not you found a match;
flag should be initialized with 1 when declared.
Ah, that should be if(*word == *s) you need to use the indirection operator. Also as hackss said, the flag = 0; must be outside the first for() loop.
Unrelated, but you should probably replace scanf with fgets, or use scanf with a length specifier. For example:
scanf("%99s",string1)
Things I can see wrong at first glance:
Your loop goes over MAX_STRING, it only needs to go over the length of s.
Your iteration should cover only the length of the string, but indexes start at 0 and not 1. for (i=0; i<=len; i++) is not correct.
You should also compare the contents of the pointer and not the pointers themselves. if(*word == *s)
The pointer advance logic is incorrect. Maybe treating the pointer as an array could simplify your logic.
Another unrelated point: A different algorithm is to hash the characters of string1 to a map, then check each character of the string2 and see if it is present in the map. If all characters are present then return 1 and when you encounter the first one that is not present then return 0. If you are only limited to using ASCII characters a hashing function is very easy. The longer your ASCII strings are the better the performance of the second approach.
Here is a one-liner solution, in keeping with Henry Spencer's Commandment 7 for C Programmers.
#include <string.h>
/*
* Does l contain every character that appears in r?
*
* Note degenerate cases: true if r is an empty string, even if l is empty.
*/
int contains(const char *l, const char *r)
{
return strspn(r, l) == strlen(r);
}
However, the problem statement is not about characters, but about letters. To solve the problem as literally given in the question, we must remove non-letters from the right string. For instance if r is the word error-prone, and l does not contain a hyphen, then the function returns 0, even if l contains every letter in r.
If we are allowed to modify the string r in place, then what we can do is replace every non-letter in the string with one of the letters that it does contain. (If it contains no letters, then we can just turn it into an empty string.)
void nuke_non_letters(char *r)
{
static const char *alpha =
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
while (*r) {
size_t letter_span = strspn(r, alpha);
size_t non_letter_span = strcspn(r + letter_span, alpha);
char replace = (letter_span != 0) ? *r : 0;
memset(r + letter_span, replace, non_letter_span);
r += letter_span + non_letter_span;
}
}
This also brings up another flaw: letters can be upper and lower case. If the right string is A, and the left one contains only a lower-case a, then we have failure.
One way to fix it is to filter the characters of both strings through tolower or toupper.
A third problem is that a letter is more than just the 26 letters of the English alphabet. A modern program should work with wide characters and recognize all Unicode letters as such so that it works in any language.
By the time we deal with all that, we may well surpass the length of some of the other answers.
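As a rough illustration of the first two points (ignoring non-letters and ignoring case), a sketch using isalpha and tolower from <ctype.h> could look like this. It works on single-byte characters in the current locale only, so it does not address the wide-character concern; the function name contains_letters_nocase is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Returns 1 if every letter of word occurs in haystack, ignoring case.
   Non-letters in either string are skipped. */
int contains_letters_nocase(const char *haystack, const char *word)
{
    unsigned char seen[UCHAR_MAX + 1] = { 0 };   /* one flag per character value */

    /* record the lower-case form of each letter found in haystack */
    for (const char *p = haystack; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;     /* ctype functions need unsigned char values */
        if (isalpha(c))
            seen[tolower(c)] = 1;
    }
    /* every letter of word must have been recorded */
    for (const char *p = word; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c) && !seen[tolower(c)])
            return 0;
    }
    return 1;
}
```

Unlike nuke_non_letters, this version leaves both input strings unmodified.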
Extending the idea in Rajiv's answer, you might build the character map incrementally, as in containsLetters2() below.
The containsLetters1() function is a simple brute force implementation using the standard string functions. If there are N characters in the string (haystack) and M in the word (needle), it has a worst-case performance of O(N*M) when the characters of the word being looked for only appear at the very end of the searched string. The strchr(needle, needle[i]) >= &needle[i] test is an optimization if there are likely to be repeated characters in the needle; if there won't be any repeats, it is a pessimization (but it can be removed and the code still works fine).
The containsLetters2() function searches through the string (haystack) at most once and searches through the word (needle) at most once, for a worst case performance of O(N+M).
#include <assert.h>
#include <stdio.h>
#include <string.h>
static int containsLetters1(char const *haystack, char const *needle)
{
for (int i = 0; needle[i] != '\0'; i++)
{
if (strchr(needle, needle[i]) >= &needle[i] &&
strchr(haystack, needle[i]) == 0)
return 0;
}
return 1;
}
static int containsLetters2(char const *haystack, char const *needle)
{
char map[256] = { 0 };
size_t j = 0;
for (int i = 0; needle[i] != '\0'; i++)
{
unsigned char c_needle = needle[i];
if (map[c_needle] == 0)
{
/* We don't know whether needle[i] is in the haystack yet */
unsigned char c_stack;
do
{
c_stack = haystack[j++];
if (c_stack == 0)
return 0;
map[c_stack] = 1;
} while (c_stack != c_needle);
}
}
return 1;
}
int main(void)
{
assert(containsLetters1("this_is_a_long_string","gagahats") == 1);
assert(containsLetters1("this_is_a_longstring","gaz") == 0);
assert(containsLetters1("hello","p") == 0);
assert(containsLetters2("this_is_a_long_string","gagahats") == 1);
assert(containsLetters2("this_is_a_longstring","gaz") == 0);
assert(containsLetters2("hello","p") == 0);
}
Since you can see the entire scope of the testing, this is not anything like thoroughly tested, but I believe it should work fine, regardless of how many repeats there are in the needle.

Array manipulation in C

I am like 3 weeks new at writing c code, so I am a newbie just trying some examples from a Harvard course video hosted online. I am trying to write some code that will encrypt a file based on the keyword.
The point is each letter of the alphabet will be assigned a numerical value from 0 to 25, so 'A' and 'a' will be 0, and likewise 'z' and 'Z' will be 25. If the keyword is 'abc' for example, I need to be able to convert it to its numerical form which is '012'. The approach I am trying to take (having learned nothing yet about many c functions) is to assign the alphabet list in an array. I think in the lecture he hinted at a multidimensional array but not sure how to implement that. The problem is, if the alphabet is stored as an array then the letters will be the actual values of the array and I'd need to know how to search an array based on the value, which I don't know how to do (so far I've just been returning values based on the index). I'd like some pseudo code help so I can figure this out. Thanks
In C, a char is a small integer type (at least 8 bits wide), so, assuming your letters are in order, you can actually use the char value to get the index by using the first letter (a) as an offset:
char offset = 'a';
char value = 'b';
int index = value - offset; /* index = 1 */
This is hard to answer, not knowing what you've learned so far, but here's a hint to what I would do: the chars representing letters are bytes representing their ASCII values, and occur sequentially, from a to z and A to Z though they don't start at zero. You can cast them to ints and get the ascii values out.
Here's the pseudo code for how I'd write it:
Cast the character to a number
IF it's between the ASCII values of A and Z, subtract the ASCII value of A from it
ELSE IF it's between the ASCII values of a and z, subtract the ASCII value of a from it
Output the result.
For what it's worth, I don't see an obvious solution to the problem that involves multidimensional arrays.
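The pseudo code above could be sketched in C roughly as follows. It assumes an encoding such as ASCII in which 'A'..'Z' and 'a'..'z' are each contiguous; the name letter_index and the -1 return value for non-letters are choices made for this example.

```c
/* Maps 'A'/'a' to 0 through 'Z'/'z' to 25; returns -1 for anything else.
   Assumes the letters of each case are contiguous, as they are in ASCII. */
int letter_index(char c)
{
    if (c >= 'A' && c <= 'Z')
        return c - 'A';     /* upper case: offset from 'A' */
    if (c >= 'a' && c <= 'z')
        return c - 'a';     /* lower case: offset from 'a' */
    return -1;              /* not a letter */
}
```

Applying it to each character of the keyword "abc" yields 0, 1 and 2, without storing the alphabet in an array at all.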
char '0' is the value 48
char 'A' is the value 65
char 'a' is the value 97
You said you want to learn how to search in the array:
char foo[26]; //your character array
...
...
//here is initialization of the array
for(int biz=0;biz<26;biz++)
{
foo[biz]=65+biz; // capital alphabet
}
...
...
//here is searching 1 by 1 iteration(low-yield)
char baz=67; //means we will find 'C'
for(int bar=0;bar<26;bar++)
{
if(foo[bar]==baz) {printf("we found C at the index: %i ",bar);break;}
}
//since this is a sorted array, you can use more efficient search algorithms.
Binary search algorithm (you may use it in later chapters):
http://en.wikipedia.org/wiki/Binary_search_algorithm
The use of a multidimensional array is to store both the lower case and upper case alphabets in an array so that they can be mapped. An efficient way is using their ASCII code, but since you are a beginner, I guess this example will introduce you to handle for loops and multidimensional arrays, which I think is the plan of the instructor as well.
Let us first set up the array for the alphabets. We will have two rows with 26 letters in each row:
char alphabetsEnglish[2][26] = {{'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'},
{'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'}};
Now we can map elements of both cases.
int main()
{
int c,i,j;
char word[10];
printf("Enter a word:");
scanf("%9s",word); // read at most 9 characters so the input fits in word[10]
c=strlen(word);
printf("Your word has %d letters ", c);
for (i = 0; i < c; i++) //loop for the length of your word
{
for (j = 0; j <= 25; j++) //second loop to go through your alphabet list
{
if (word[i] == alphabetsEnglish[0][j] || word[i] == alphabetsEnglish[1][j]) //check for both cases of your alphabet
{
printf("Your alphabet %c translates to %d: ", word[i], j);
}
}
}
return 0;
}
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int *conv(char* str){
static const char* table = "abcdefghijklmnopqrstuvwxyz";
int size, *ret, *p;
if(NULL==str || *str == '\0') return NULL;
size = strlen(str);
ret=p=(int*)malloc(size*sizeof(int));
while(*str){
char *pos;
pos=strchr(table, tolower((unsigned char)*str++)); // tolower expects an unsigned char value
*p++ = pos == NULL ? -1 : pos - table;
}
return ret;
}
int main(void){
char *word = "abc";
int i, size = strlen(word), *result;
result = conv(word);
for(i=0;i<size;++i){
printf("%d ", result[i]);//0 1 2
}
free(result);
return 0;
}

pointers and string parsing in c

I was wondering if somebody could explain to me how pointers and string parsing work. I know that I can do something like the following in a loop but I still don't follow very well how it works.
for (a = str; * a; a++) ...
For instance, I'm trying to get the last integer from the string. Suppose I have a string such as const char *str = "some string here 100 2000";
Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.
Thanks
for (a = str; * a; a++) ...
This works by starting a pointer a at the beginning of the string and incrementing a at each step, until dereferencing a yields a value that is implicitly converted to false.
Basically, you'll walk the array until you get to the NUL terminator that's at the end of your string (\0) because the NUL terminator implicitly converts to false - other characters do not.
Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.
You're going to want to look for the last space before the \0, then you're going to want to call a function to convert the remaining characters to an integer. See strtol.
Consider this approach:
find the end of the string (using that loop)
search backwards for a space.
use that to call strtol.
for (a = str; *a; a++); // Find the end.
while (*a != ' ') a--; // Move back to the space.
a++; // Move one past the space.
int result = strtol(a, NULL, 10);
Or alternatively, just keep track of the start of the last token:
const char* start = str;
for (a = str; *a; a++) { // Until you hit the end of the string.
if (*a == ' ') start = a; // New token, reassign start.
}
int result = strtol(start, NULL, 10);
This version has the benefit of not requiring a space in the string.
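A self-contained variant of the second approach is sketched below. It additionally uses strtol's end pointer to check that the last token really is a number; the function name last_int and its return convention are assumptions made for this example, and overflow (which strtol reports through errno) is not checked.

```c
#include <stdlib.h>

/* Parses the last space-separated token of str as a base-10 integer.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int last_int(const char *str, long *out)
{
    const char *start = str;
    for (const char *a = str; *a != '\0'; a++) {
        if (*a == ' ')
            start = a + 1;          /* a new token begins after each space */
    }

    char *end;
    long value = strtol(start, &end, 10);
    if (end == start || *end != '\0')
        return 0;                   /* no digits at all, or trailing non-digits */
    *out = value;
    return 1;
}
```

For the string from the question, "some string here 100 2000", the function stores 2000 in *out and returns 1.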
You just need to implement a simple state machine with two states, e.g.:
#include <ctype.h>
int num = 0; // the final int value will be contained here
int state = 0; // state == 0 == not parsing int, state == 1 == parsing int
for (i = 0; i < strlen(s); ++i)
{
if (state == 0) // if currently in state 0, i.e. not parsing int
{
if (isdigit(s[i])) // if we just found the first digit character of an int
{
num = s[i] - '0'; // discard any old int value and start accumulating new value
state = 1; // we are now in state 1
}
// otherwise do nothing and remain in state 0
}
else // currently in state 1, i.e. parsing int
{
if (isdigit(s[i])) // if this is another digit character
{
num = num * 10 + s[i] - '0'; // continue accumulating int
// remain in state 1...
}
else // no longer parsing int
{
state = 0; // return to state 0
}
}
}
I know this has been answered already but all the answers thus far are recreating code that is available in the Standard C Library. Here is what I would use by taking advantage of strrchr()
#include <string.h>
#include <stdio.h>
int main(void)
{
const char* input = "some string here 100 2000";
char* p;
long l = 0;
if(p = strrchr(input, ' '))
l = strtol(p+1, NULL, 10);
printf("%ld\n", l);
return 0;
}
Output
2000
for (a = str; * a; a++)...
is equivalent to
a=str;
while(*a!='\0') //'\0' is NUL, don't confuse it with NULL which is a macro
{
....
a++;
}
The loop you've presented just goes through all the characters (a string is an array of 1-byte chars that ends with 0). For parsing you should use sscanf or, better, C++'s string and string stream.
