Segmentation fault in Binary Search in C - c

I'm trying to write a program that scans an input file containing the number of letters in the array, a sorted list of letters, the number of letters to search for, and a list of letters to search for. It displays the search results in the format shown in the sample file.
I'm getting a segmentation fault error message at runtime with the code I have included below. Now before this post gets negative feedback for not including the right amount of code, I don't really know where the error is with this segmentation fault. I've included the relevant files here on Pastebin:
main.c
#include <stdio.h>
#include <stdlib.h>
#include "Proto.h"
int main()
{
/* Accepts number of elements from user */
scanf("%d", &elements);
/* Creates dynamic array */
array = (char *) calloc(elements, sizeof(char));
/* Copies sorted values to the dynamic array */
for(i = 0; i < elements; i++)
{
scanf("%s", &array[i]);
}
/* Accepts number of elements to search */
scanf("%d", &search);
/* Searches for elements in sorted array one at a time */
for(i = 1; i <= search; i++)
{
/* Accepts value to search */
scanf("%s", &value);
/* Resets counter to 0 */
count = 0;
/* Finds location of element in the sorted list using binary search */
location = binarySearch(array, value, 0, (elements-1));
/* Checks if element is present in the sorted list */
if (location == -1)
{
printf("%4s not found!\n", value);
}
else
{
printf("%4s found at %4d iteration during iteration %4d\n", value, location, count);
}
}
free(array);
}
BinarySearch.c
#include <stdio.h>
#include "Proto.h"
int binarySearch(char * nums, char svalue, int start, int end)
{
middle = (start + end) / 2;
/* Target found */
if (nums[middle] == svalue)
{
return middle;
}
/* Target not in list */
else if( start == end )
{
return -1;
}
/* Search to the left */
else if( nums[middle] > svalue )
{
count++;
return binarySearch( nums, svalue, start, (middle-1) );
}
/* Search to the right */
else if( nums[middle] < svalue )
{
count++;
return binarySearch( nums, svalue, (middle+1), end );
}
}
Proto.h
#ifndef _PROTO_H
#define _PROTO_H
char * array;
int elements, search, location, count, middle, i;
char value;
int binarySearch(char *, char, int, int);
#endif
Sample Input/Output
Sample Input file:
6
a d n o x y
3
n x z
Sample Output file:
n found at 2 during iteration 0.
x found at 4 during iteration 1.
z not found!

I did not check the whole code, but I see this error in your main.c.
Your code
/* Creates dynamic array */
array = (char *) calloc(elements, sizeof(char));
/* Copies sorted values to the dynamic array */
for(i = 0; i < elements; i++)
{
scanf("%s", &array[i]);
}
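is wrong: %s stores a whole string plus its terminating '\0', which does not fit in a single char element. If you read with a string conversion, your array should be a double pointer, char **array (the m modifier below makes scanf allocate each string; it is a POSIX extension, not ISO C):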
/* Creates dynamic array */
array = calloc(elements, sizeof(char*));
/* Copies sorted values to the dynamic array */
for(i = 0; i < elements; i++)
{
scanf("%ms", &array[i]);
}
Try to split your code into smaller parts and find out which part causes the problem, then come back with a small piece of code. That will make it easier to find the solution.
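For single letters, another option is to keep array as a plain char buffer and read each letter with the " %c" conversion, which stores exactly one char. The sketch below is not the asker's code: read_letters and binary_search_chars are hypothetical names, and an iterative loop is swapped in for the recursive version so that an empty range (start > end) ends the search instead of recursing without end.

```c
#include <stdio.h>

/* Hypothetical helper: reads n whitespace-separated letters from fp into buf.
 * The leading space in " %c" skips blanks and newlines, and %c stores exactly
 * one char, so nothing is written past buf[i].
 * Returns the number of letters actually read. */
int read_letters(FILE *fp, char *buf, int n)
{
    int i = 0;
    while (i < n && fscanf(fp, " %c", &buf[i]) == 1)
        i++;
    return i;
}

/* Hypothetical iterative binary search over a sorted char array of length n.
 * Returns the index of key or -1; *steps (if not NULL) receives the number
 * of times the range was narrowed before the search ended. */
int binary_search_chars(const char *a, int n, char key, int *steps)
{
    int lo = 0, hi = n - 1, narrowed = 0, found = -1;

    while (lo <= hi) {                 /* an empty range ends the loop */
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) {
            found = mid;
            break;
        }
        if (a[mid] < key)
            lo = mid + 1;              /* search to the right */
        else
            hi = mid - 1;              /* search to the left */
        narrowed++;
    }
    if (steps)
        *steps = narrowed;
    return found;
}
```

With the sample letters a d n o x y, searching for n returns index 2 after 0 narrowing steps and x returns index 4 after 1, which lines up with the sample output. The same change applies to value: it is a single char, so it should be read with " %c" and printed with %c rather than %s.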

Related

Segmentation Fault when My Code Executes the printf() in c

I have posted my code below. When I compile, I get no errors and only one warning about variables I haven't used yet. The code works up to the point where it starts to print. I have tested all the sections and I believe that one is at fault. Please let me know what I am doing wrong so I can fix it.
#include <stdio.h>
#include <string.h>
#define NUM_LINES 37
#define LINE_LENGTH 60
void select_sort_str(char list[NUM_LINES][LINE_LENGTH], int n);
int alpha_first(char list[NUM_LINES][LINE_LENGTH], int min_sub, int max_sub);
int main (void){
//store each line in an array of strings
FILE *inp;
FILE *outp;
char hurr[NUM_LINES][LINE_LENGTH];
;
inp = fopen("hurricanes.csv","r");
outp = fopen("out.txt","w");
//read in lines from file
for (int i = 0; i<NUM_LINES; i++){
fgets(hurr[i], LINE_LENGTH, inp);
}
inp = fopen("hurricanes.cvs","r");
//printf("%s", hurr[0]);
//define function
select_sort_str(hurr, NUM_LINES);
return(0);
}
int
alpha_first(char list[NUM_LINES][LINE_LENGTH], // input - array of pointers to strings
int min_sub, // input - min and max subscripts of
int max_sub) // portion of list to consider
{
int first, i;
first = min_sub;
for (i = min_sub + 1; i <= max_sub; ++i) {
if (strcmp(list[i], list[first]) < 0) {
first = i;
}
}
return (first);
}
/*
* Orders the pointers in an array list so they access strings in
* alphabetical order
* Pre: first n elements of list reference string of uniform case;
* n >= 0
*/
void
select_sort_str(char list[NUM_LINES][LINE_LENGTH], // input/output - array of pointers being
// ordered to access strings alphabetically
int n) // input - number of elements to sort
{
int fill, // index of element to contain next string in order
index_of_min; // index of next string in order
char *temp;
char temp1[NUM_LINES][LINE_LENGTH];
for (fill = 0; fill < n - 1; ++fill) {
index_of_min = alpha_first(list, fill, n - 1);
if (index_of_min != fill) {
temp = list[index_of_min];
list[index_of_min][LINE_LENGTH] = list[fill][LINE_LENGTH];
strncpy(temp1[index_of_min], list[index_of_min], LINE_LENGTH);
temp1[fill][LINE_LENGTH] = *temp;
}
}
char *name;
char *cat = 0;
char *date;
for (int i = 0; i<NUM_LINES; i++){
name = strtok(NULL, ",");
cat = strtok(NULL, "h");
date = strtok(NULL, " ");
printf("%s %s %s\n", name, cat, date);
}
// for( int i =0; i<NUM_LINES; i++){
// printf("%s", list[i]);
// }
}
The only first parameter you ever pass to strtok is NULL. You never actually give it anything to parse. Did you perhaps mean strtok(temp1[i], ",");?
Also, why no error checking? It's much easier to find bugs in code with error checking.
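To illustrate both points, here is a minimal sketch rather than a fix of the posted program. split_fields is a hypothetical helper, and the comma-separated layout is an assumption about hurricanes.csv. Only the first strtok call receives the buffer; later calls pass NULL to continue in that same buffer, and every returned pointer is checked before it is used.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on any character in delims and
 * stores up to max_fields token pointers in fields[].
 * Returns the number of tokens stored. strtok modifies line, so it must be
 * a writable buffer, not a string literal. */
int split_fields(char *line, const char *delims, char *fields[], int max_fields)
{
    int n = 0;
    /* First call: pass the string to parse. */
    char *tok = strtok(line, delims);

    while (tok != NULL && n < max_fields) {
        fields[n++] = tok;
        /* Later calls: NULL means "continue where the last call stopped". */
        tok = strtok(NULL, delims);
    }
    return n;
}
```

A caller would check the return value before printing with %s, for example by treating a line with fewer than three fields as malformed. The same applies to fopen and fgets, which return NULL on failure.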

How to return a matrix of all occurrences of given word/pattern?

I am trying to implement a function called Sniffer that takes two inputs and returns the corresponding matrix.
The first input is an integer array of binary values (only zeros and ones).
The second input is the searched word (the word you pass as an argument to the function).
What the function does:
It searches for occurrences of the given word within the given binary array.
Each occurrence of the word is always followed by 8 bits. Assume the input is always correct: two occurrences never follow one another without at least 8 bits (8 binary values) between them.
The function must return an integer matrix containing the 8 bits that follow each occurrence of the word, one occurrence per row. (The function returns only the first 8 bits that follow each occurrence of the searched word.)
This means:
The first row holds the first 8 bits that follow the first occurrence of the word.
The second row holds the first 8 bits that follow the second occurrence of the word.
The third row holds the first 8 bits that follow the third occurrence of the word.
The fourth row holds the first 8 bits that follow the fourth occurrence of the word.
etc ...
I will elaborate with examples.
The function structure is:
Code:
int ** SnifferPattern(int* givenArray , int* SearchedWord);
// it returns the corresponding matrix, so I used int**
example 1:
givenArray = {1,0,1,0,1,1,1,1,1,1,1,1,0,0};
SearchedWord = {1,0,1,0};
so the function returns a matrix (size 1x8; the 8 columns correspond to the
8 bits that follow). The first row is {1,1,1,1,1,1,1,1}, the first 8 bits that
follow the word 1010. The matrix has one row because there is just
one occurrence of the `SearchedWord` in the given array.
example 2:
givenArray = {1,1,0,0,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,0,1,0,1,0};
SearchedWord = {1,1,0,0}
so the function returns a matrix whose first row is {1,1,1,1,1,1,1,1},
the first 8 bits that follow the first occurrence of the word 1100.
The word appears a second time, so the second row of
the returned matrix (size 2x8) holds the first 8 bits that follow the
second occurrence of the word 1100: the second row of the matrix
is {1,0,1,0,1,0,1,0}. The returned matrix (size 2x8) has two rows
(because there are two occurrences of the SearchedWord), and each row
corresponds to one occurrence of the SearchedWord.
example 3:
givenArray = {1,1,1,0,1,1,1,1,1,1,1,1,0,0};
SearchedWord = {0,1,0}
so the function returns a zero matrix (a single row of zero values),
size 1x8 (the 8 columns correspond to the 8 bits that follow). There is no
occurrence of the SearchedWord within the givenArray, so we return a zero
matrix. Occurrences of the SearchedWord never overlap,
so we always assume the input is correct.
I will explain my algorithm (I would be glad to hear other suggestions better suited to my case).
My algorithm searches for every occurrence of the word and takes the first 8 bits that follow each occurrence. I store them in the rows of a matrix. But this seems very hard to complete.
What I have tried to implement in C so far is this:
int ** SnifferPattern(int* s ; int* w)
{
// s is the given array, w is the searched Word , number occurrences
// here 2 , so it should be two prints on the output of the program.
int n;
int a[1000];
int i=0;
int j;
int k = 0;
int l;
int found = 0;
int t = 0;
a[k++] = i;
j = 0;
for (i = 0; i < k; i++)
{
n = a[i] - j;
if (n == (sizeof(w)/sizeof(w[0])))
{
t = 0;
for (l = 0; w[l]; l++)
{
if (s[l + j] == w[l])
{
t++; // Matched a character.
}
}
if (t == (sizeof(w)/sizeof(w[0])))
{
found++; // We found a match!
printf("word occurred at location=%d \n", j); // Print location
}
}
j = a[i] + 1;
}
}
int main() {
int s[1000] = {1,0,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1};
int w[1000] = {1,0,1};
// s is the given array, w is the searched Word , number occurrences
// here 2 , so it should be two prints on the output of the program.
SnifferPattern(s , w)
//I should print all the matrix's row in the main function .
return 0;
}
I think I have figured out what you need. As stated in your question ("at each occurrence of your word there's always follows 8 bits"), the following requires that at least 8 integers follow any match of w in s. Since each row of the matrix you return holds 8 integers, using a pointer-to-array-of-int[8] allows a single free() of the result in the caller.
Your sniffer function loops over each integer in s, keeping a counter index (ndx) that advances each time an integer in w matches the integer in s. When ndx equals the number of elements in w, a match has been found and the next 8 integers are collected as the columns of that row of your matrix, using next8 as the index. You could write your sniffer function as:
#define AFTER 8
/* function returning allocated pointer to array of 8 int, *n of them */
int (*sniffer (int *s, int selem, int *w, int welem, int *n))[AFTER]
{
int ndx = 0, /* match index */
next8 = AFTER, /* counter for next 8 after match found */
found = 0, /* number of times w found in s */
(*matrix)[AFTER] = NULL;
for (int i = 0; i < selem; i++) { /* loop over each int in s */
if (next8 < AFTER) { /* count if next8 less than AFTER */
matrix[found-1][next8++] = s[i]; /* set next8 index in matrix row */
}
else if (s[i] == w[ndx]) { /* if s[i] matches w[ndx] */
if (++ndx == welem) { /* increment ndx, compare w/welem */
/* allocate storage for next row in matrix */
if (!(matrix = realloc (matrix, (found + 1) * sizeof *matrix))) {
perror ("realloc-matrix"); /* on failure, handle error */
*n = 0;
return NULL;
}
/* (optional) set all elements in the new row to 0 */
memset (matrix[found], 0, sizeof *matrix);
found += 1; /* increment found count */
ndx = next8 = 0; /* reset ndx, set next8 to count */
}
}
else /* otherwise, not a match and not counting */
ndx = 0; /* set match index to 0 */
}
*n = found; /* update value at n to found, so n available in main */
return matrix; /* return allocated matrix */
}
(note: the number of elements in s is provided in selem and the number of elements in w is provided by welem)
Changing your s[] in main to int s[] = {1,0,1,0,0,0,0,1,1,1,1,1,0,1,1,1,1,1,0,0,0,0} so it is easy to verify the results, you could write your program as:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARSZ 1000 /* if you need a constant, #define one (or more) */
#define AFTER 8
/* function returning allocated pointer to array of 8 int, *n of them */
int (*sniffer (int *s, int selem, int *w, int welem, int *n))[AFTER]
{
int ndx = 0, /* match index */
next8 = AFTER, /* counter for next 8 after match found */
found = 0, /* number of times w found in s */
(*matrix)[AFTER] = NULL;
for (int i = 0; i < selem; i++) { /* loop over each int in s */
if (next8 < AFTER) { /* count if next8 less than AFTER */
matrix[found-1][next8++] = s[i]; /* set next8 index in matrix row */
}
else if (s[i] == w[ndx]) { /* if s[i] matches w[ndx] */
if (++ndx == welem) { /* increment ndx, compare w/welem */
/* allocate storage for next row in matrix */
if (!(matrix = realloc (matrix, (found + 1) * sizeof *matrix))) {
perror ("realloc-matrix"); /* on failure, handle error */
*n = 0;
return NULL;
}
/* (optional) set all elements in the new row to 0 */
memset (matrix[found], 0, sizeof *matrix);
found += 1; /* increment found count */
ndx = next8 = 0; /* reset ndx, set next8 to count */
}
}
else /* otherwise, not a match and not counting */
ndx = 0; /* set match index to 0 */
}
*n = found; /* update value at n to found, so n available in main */
return matrix; /* return allocated matrix */
}
int main (void) {
int s[] = {1,0,1,0,0,0,0,1,1,1,1,1,0,1,1,1,1,1,0,0,0,0},
w[] = {1,0,1},
n = 0,
(*result)[AFTER] = NULL;
result = sniffer (s, sizeof s/sizeof *s, w, sizeof w/sizeof *w, &n);
for (int i = 0; i < n; i++) { /* loop over result matrix */
printf ("matrix[%d] = {", i); /* output prefix */
for (int j = 0; j < AFTER; j++) /* loop over column values */
printf (j ? ",%d" : "%d", result[i][j]); /* output column value */
puts ("}"); /* output suffix and \n */
}
free (result); /* free allocated memory */
}
Example Use/Output
$ ./bin/pattern_in_s
matrix[0] = {0,0,0,0,1,1,1,1}
matrix[1] = {1,1,1,1,0,0,0,0}
If I have misunderstood your question, please let me know in a comment below and I'm happy to help further. Let me know if you have any questions.
There are several issues you must solve.
How are arrays represented?
You have arrays of integers whose valid values can be 0 or 1. How do you determine the length of such an array? There are basically two possibilities:
Use a sentinel value. That's how C strings are stored: The actual string is terminated by the special value '\0', the null terminator. The memory used to store the string may be larger. In your case, you could use a value that isn't used:
enum {
END = -1
};
int a[] = {0, 1, 0, 1, END};
The disadvantage here is that you must be careful not to forget the explicit terminator. Also, if you want to find out the length of an array, you must walk it to the end. (But that's not an issue with small arrays.)
Use an explicit length that goes along with the array.
int a[] = {1, 0, 1, 0};
int alen = sizeof(a) / sizeof(*a);
The disadvantage here is that you must pass the length to any function that operates on the array. The function f(a) does not know how long a is; the function should be something like f(a, alen). (The sizeof(a) / sizeof(*a) mechanism works only where the array a itself is in scope; inside a called function, a is just a pointer. See here for a discussion on how to find the length of an array.)
You could, of course, define a struct that combines data and length.
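As a rough sketch of that last idea (int_span and count_ones are hypothetical names, not part of the question), a small struct can carry the pointer and the length together, so a function needs only one parameter:

```c
#include <stddef.h>

/* Hypothetical type: a view of an int array together with its length. */
struct int_span {
    const int *data;
    size_t len;
};

/* Counts the elements equal to 1. No separate length parameter is needed
 * because the length travels with the pointer. */
size_t count_ones(struct int_span a)
{
    size_t ones = 0;
    for (size_t i = 0; i < a.len; i++)
        if (a.data[i] == 1)
            ones++;
    return ones;
}
```

The length is still computed with sizeof(a) / sizeof(*a) where the array is declared, for example struct int_span s = { a, sizeof(a) / sizeof(*a) }; after that point only s is passed around.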
How do you return an array from a function?
That's your actual question. Again there are several possibilities:
Return the array. That usually means allocating the array you want to return on the heap with malloc, which means that the caller must call free on the result at some point. The caller must know how big the returned array is. You can use sentinels as described above, or you could pass in a pointer to a size variable, which the function fills in:
int **f(..., int *length) { ... }
Call this function like this:
int length;
int **p = f(..., &length);
for (int i = 0; i < length; i++) ...
Pass in an array and have the function fill it. That means that the function must know the size of the array. The return value can be used to return the actual size of the array:
int f(..., int **res, int max) { ... }
Call this function like this:
int *p[20];
int length = f(..., p, 20);
for (int i = 0; i < length; i++) ...
Let's apply this to your problem.
You want to match a pattern in a string and then return a list of all 8-bit sequences after the matches. Let's represent an array as array + length. Your function might then look like this:
int sniffer(const int *s, int slen, // haystack array + length
const int *w, int wlen, // needle array + length
const int **res, int reslen) // results array
{ ... }
The function takes the arrays s and w plus their lengths. It also takes a third array plus its length. That array holds the results. The number of valid results (the number of rows in your matrix) is the return value.
Call this function:
int s[] = {...};
int w[] = {...};
const int *res[8];
int n = sniffer(s, sizeof(s) / sizeof(*s),
w, sizeof(w) / sizeof(*w),
res, sizeof(res) / sizeof(*res));
What happens if there are more than reslen matches? The excess matches cannot be written, of course, but do they contribute to the return value? If they do, you could pass in an array length of 0 just to see how many matches there are. (That's what the string function snprintf does, in a way.) If they don't, you get the exact length of the result array. Both strategies are valid.
Here's an example implementation. It uses your test case #2:
#include <stdlib.h>
#include <stdio.h>
/*
* test whether the next len elements of s and w match
*/
int match(const int *s, const int *w, int len)
{
while (len-- > 0) {
if (*s++ != *w++) return 0;
}
return 1;
}
int sniffer(const int *s, int slen, // haystack array + length
const int *w, int wlen, // needle array + length
const int **res, int reslen) // results array
{
int n = 0;
for (int i = 0; i <= slen - wlen - 8; i++) {
const int *p = s + i;
if (match(p, w, wlen)) {
if (n < reslen) res[n] = p + wlen;
n++;
}
}
return n;
}
int main(void)
{
int s[] = {1, 1, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 1, 0, 1, 0, 1, 0};
int w[] = {1, 1, 0, 0};
const int *res[8];
int reslen = sizeof(res) / sizeof(*res);
int n = sniffer(s, sizeof(s) / sizeof(*s),
w, sizeof(w) / sizeof(*w),
res, reslen);
printf("%d patterns:\n", n);
for (int i = 0; i < n && i < reslen; i++) {
printf("[%d] {", i);
for (int j = 0; j < 8; j++) {
if (j) printf(", ");
printf("%d", res[i][j]);
}
printf("}\n");
}
return 0;
}

Boyer Moore replace more than one pattern

I am working on a string search and replace project. My program finds both occurrences of the target pattern in a sentence, but it replaces only one of them.
Example: just do it. you will do it.
find: do
replace: think
expected---> just think it. you will think it.
what actually happened ---> just do it. you will think it.
How can I replace both of them?
I read the sentence from file input.txt
# include <limits.h>
# include <string.h>
# include <stdio.h>
#include <sys/time.h>
# define NO_OF_CHARS 256
# define MAX 10000
int sum = 0;
int control = 0;
// A utility function to get maximum of two integers
int max (int a, int b) { return (a > b)? a: b; }
// The preprocessing function for Boyer Moore's bad character heuristic
void badCharHeuristic( char *str, int size, int badchar[NO_OF_CHARS]) {
int i;
// Initialize all occurrences as -1
for (i = 0; i < NO_OF_CHARS; i++)
badchar[i] = -1;
// Fill the actual value of last occurrence of a character
for (i = 0; i < size; i++)
badchar[(int) str[i]] = i;
}
/* A pattern searching function that uses Bad Character Heuristic of Boyer Moore Algorithm */
void search( char *txt, char *pat,char temp3[MAX],int k,char*r) {
int m = strlen(pat);
int n = strlen(txt);
char src[MAX],p[MAX],temp[MAX],temp2[MAX],tempP[MAX],out[MAX];
int badchar[NO_OF_CHARS],i,leng,l,count;
char v;
/* Fill the bad character array by calling the preprocessing function badCharHeuristic() for given pattern */
badCharHeuristic(pat, m, badchar);
leng = strlen(pat);
strcpy(tempP,r);
//strcat(tempP,"</mark>");
leng = strlen(pat);
l = strlen(txt);
int s = 0; // s is shift of the pattern with respect to text
while(s <= (n - m)) {
int j = m-1;
/* Keep reducing index j of pattern while characters of pattern and text are matching at this shift s */
while(j >= 0 && pat[j] == txt[s+j]) {
count++;
j--;
}
/* If the pattern is present at current shift, then index j will become -1 after the above loop */
if (j < 0) {
//printf("pattern occurs at shift = %d\n", s);
/* Shift the pattern so that the next character in text
aligns with the last occurrence of it in pattern.
The condition s+m < n is necessary for the case when
pattern occurs at the end of text */
printf("The desired pattern was found starting from %d. line at position %d\n",k,s+1);
strncpy(temp, txt, s);
temp[s] = '\0';
//strcat(temp,"<mark>");
control++;
strcat(temp,tempP);
for(i=0;i<MAX;i++) {
if((s+leng+i)<strlen(txt))
temp2[i] = txt[s+leng+i];
else
temp2[i] = v;
}
strcat(temp,temp2);
strcpy(temp3,temp);
s += (s+m < n)? m-badchar[txt[s+m]] : 1;
}
else
/* Shift the pattern so that the bad character in text
aligns with the last occurrence of it in pattern. The
max function is used to make sure that we get a positive
shift. We may get a negative shift if the last occurrence
of bad character in pattern is on the right side of the
current character. */
s += max(1, j - badchar[txt[s+j]]);
}
sum +=count;
}
/* Driver program to test above function */
int main() {
char txt[MAX],p[MAX],r[MAX],temp[MAX],temp2[MAX],tempP[MAX],out[MAX];
int k = 1;
FILE *input = fopen("input.txt","r");
FILE *output = fopen("output.txt","w");
printf("Enter the text in which pattern is to be searched:");
fgets(p, MAX, stdin);
printf("Enter the text in which pattern is to be replaced:");
fgets(r, MAX, stdin);
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
p[strlen(p)-1]='\0';
temp[1]='a';
while(!feof(input)){
if(fgets (txt, MAX, input)!=NULL) {
txt[strlen(txt)-1] = '\0';
search(txt, p,temp,k,r);
if(temp[1]!='a') {
fprintf(output,"%s\n",temp);
temp[1]='a';
}
else
fprintf(output,"%s\n",txt);
}
k++;
}
if(control==0) {
printf("\nThe pattern was not found in the given text\n\n");
}
gettimeofday(&tv2, NULL);
printf ("Total time = %f seconds\n", (double) (tv2.tv_usec - tv1.tv_usec) / 1000000 + (double) (tv2.tv_sec - tv1.tv_sec));
fclose(input);
fclose(output);
printf("The number of character comparison: %d\n",sum);
return 0;
}
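One likely cause, judging from the posted code: on every match, search() rebuilds the whole line from the unmodified txt, so each rebuild discards the previous replacement and only the last one survives. A way around that is to append to a separate output buffer while scanning. The sketch below rests on assumptions: replace_all is a hypothetical name, and strstr is swapped in for the Boyer-Moore search only to keep the example short; any search that yields the next match position would fit in its place.

```c
#include <string.h>

/* Hypothetical helper: copies txt into out (capacity outsz, including the
 * terminating NUL), replacing every occurrence of pat with rep.
 * Returns the number of replacements, or -1 if pat is empty or out is
 * too small. */
int replace_all(const char *txt, const char *pat, const char *rep,
                char *out, size_t outsz)
{
    size_t plen = strlen(pat), rlen = strlen(rep), used = 0;
    int count = 0;
    const char *hit;

    if (plen == 0 || outsz == 0)
        return -1;

    while ((hit = strstr(txt, pat)) != NULL) {
        size_t before = (size_t)(hit - txt);   /* text ahead of the match */
        if (used + before + rlen >= outsz)
            return -1;
        memcpy(out + used, txt, before);
        memcpy(out + used + before, rep, rlen);
        used += before + rlen;
        txt = hit + plen;                      /* resume after the match */
        count++;
    }

    size_t rest = strlen(txt);                 /* tail after the last match */
    if (used + rest >= outsz)
        return -1;
    memcpy(out + used, txt, rest + 1);         /* +1 copies the NUL */
    return count;
}
```

Called with the sentence from the question, the pattern do and the replacement think, this produces "just think it. you will think it." and returns 2. Note also that the posted main() strips the trailing newline from p but not from r, so the replacement text read with fgets still ends in '\n'.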

Try to split string but got messy substrings

I am trying to split a string into 3-gram strings, but the resulting substrings always come out messy. The length and char ** input... are needed, since I will use them as args later when calling the function from Python.
This is the function I wrote.
struct strArrIntArr getSearchArr(char* input, int length) {
struct strArrIntArr nameIndArr;
// flag of same bit
int same;
// flag/index of identical strings
int flag = 0;
// how many identical strings
int num = 0;
// array of split strings
char** nameArr = (char **)malloc(sizeof(char *) * (length - 2));
if ( nameArr == NULL ) exit(0);
// numbers of every split string
int* valueArr = (int* )malloc(sizeof(int) * (length-2));
if ( valueArr == NULL ) exit(0);
// loop length of search string -2 times (3-gram)
for(int i = 0; i<length-2; i++){
if(flag==0){
nameArr[i - num] = (char *)malloc(sizeof(char) * 3);
if ( nameArr[i - num] == NULL ) exit(0);
printf("----i------------%d------\n", i);
printf("----i-num--------%d------\n", i-num);
}
flag = 0;
// compare splitting string with existing split strings,
// if a string exists, it would not be stored
for(int k=0; k<i-num; k++){
same = 0;
for(int j=0; j<3; j++){
if(input[i + j] == nameArr[k][j]){
same ++;
}
}
// identical strings found, if all the three bits are the same
if(same == 3){
flag = k;
num++;
break;
}
}
// if the current split string doesn't exist yet
// put current split string to array
if(flag == 0){
for(int j=0; j<3; j++){
nameArr[i-num][j] = input[i + j];
valueArr[i-num] = 1;
}
}else{
valueArr[flag]++;
}
printf("-----string----%s\n", nameArr[i-num]);
}
// number of N-gram strings
nameIndArr.length = length- 2- num;
// array of N-gram strings
nameIndArr.charArr = nameArr;
nameIndArr.intArr = valueArr;
return nameIndArr;
}
To call the function:
int main(int argc, const char * argv[]) {
int length = 30;
char* input = (char *)malloc(sizeof(char) * length);
input = "googleapis.com.wncln.wncln.org";
// split the search string into N-gram strings
// and count the numbers of every split string
struct strArrIntArr nameIndArr = getSearchArr(input, length);
}
Below is the result. The strings from 17 are messy.
----i------------0------
----i-num--------0------
-----string----goo
----i------------1------
----i-num--------1------
-----string----oog
----i------------2------
----i-num--------2------
-----string----ogl
----i------------3------
----i-num--------3------
-----string----gle
----i------------4------
----i-num--------4------
-----string----lea
----i------------5------
----i-num--------5------
-----string----eap
----i------------6------
----i-num--------6------
-----string----api
----i------------7------
----i-num--------7------
-----string----pis
----i------------8------
----i-num--------8------
-----string----is.
----i------------9------
----i-num--------9------
-----string----s.c
----i------------10------
----i-num--------10------
-----string----.co
----i------------11------
----i-num--------11------
-----string----com
----i------------12------
----i-num--------12------
-----string----om.
----i------------13------
----i-num--------13------
-----string----m.w
----i------------14------
----i-num--------14------
-----string----.wn
----i------------15------
----i-num--------15------
-----string----wnc
----i------------16------
----i-num--------16------
-----string----ncl
----i------------17------
----i-num--------17------
-----string----clnsole
----i------------18------
----i-num--------18------
-----string----ln.=C:
----i------------19------
----i-num--------19------
-----string----n.wgram 馻绚s
----i------------20------
----i-num--------20------
-----string----n.wgram 馻绚s
-----string----n.wgram 馻绚s
-----string----n.wgram 馻绚s
-----string----n.wgram 馻绚s
-----string----n.wgram 馻绚s
-----string----n.oiles(騛窑=
----i------------26------
----i-num--------21------
-----string----.orSModu鯽蓼t
----i------------27------
----i-num--------22------
-----string----org
under win10, codeblocks 17.12, gcc 8.1.0
You are making life complicated for yourself in several places:
Don't count backwards: Instead of making num the count of duplicates, make it the count of unique trigraphs.
Scope variable definitions in functions as closely as possible. You have several uninitialized variables. You have declared them at the start of the function, but you need them only in local blocks.
Initialize as soon as you allocate. In your code, you use a flag to determine whether to create a new string. The code that allocates the string and the code that initializes it are in different blocks. Those blocks have the same flag as their condition, but the flag is updated in between. This can let the two get out of sync, and can even cause bugs when you try to initialize memory that wasn't allocated.
It's probably better to keep the strings and their counts together in a struct. If anything, this will help you with sorting later. This also offers some simplification: Instead of allocating chunks of 3 bytes, keep a char array of four bytes in the struct, so that all entries can be properly null-terminated. Those don't need to be allocated separately.
Here's an alternative implementation:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
struct tri {
char str[4]; // trigraph: 3 chars and NUL
int count; // count of occurrences
};
struct stat {
struct tri *tri; // list of trigraphs with counts
int size; // number of trigraphs
};
/*
* Find string 'key' in list of trigraphs. Return the index
* in the array or -1 if it isn't found.
*/
int find_trigraph(const struct tri *tri, int n, const char *key)
{
for (int i = 0; i < n; i++) {
int j = 0;
while (j < 3 && tri[i].str[j] == key[j]) j++;
if (j == 3) return i;
}
return -1;
}
/*
* Create an array of trigraphs from the input string.
*/
struct stat getSearchArr(char* input, int length)
{
int num = 0;
struct tri *tri = malloc(sizeof(*tri) * (length - 2));
for(int i = 0; i < length - 2; i++) {
int index = find_trigraph(tri, num, input + i);
if (index < 0) {
snprintf(tri[num].str, 4, "%.3s", input + i); // see [1]
tri[num].count = 1;
num++;
} else {
tri[index].count++;
}
}
for(int i = 0; i < num; i++) {
printf("#%d %s: %d\n", i, tri[i].str, tri[i].count);
}
struct stat stat = { tri, num };
return stat;
}
/*
* Driver code
*/
int main(void)
{
char *input = "googleapis.com.wncln.wncln.org";
int length = strlen(input);
struct stat stat = getSearchArr(input, length);
// ... do stuff with stat ...
free(stat.tri);
return 0;
}
Footnote 1: I find that snprintf(dst, n, "%.*s", len, src + offset) is useful for copying substrings: The result will not overflow the buffer and it will be null-terminated. There really ought to be a standard function for this, but strcpy may overflow and strncpy may leave the buffer unterminated.
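The footnote's idiom can be wrapped in a small function. This is only a sketch, and copy_substring is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: copies at most len chars of src, starting at offset,
 * into dst (capacity dstsz). The caller must make sure offset does not lie
 * past the end of src. Returns snprintf's result: the length the substring
 * would have had without truncation. */
int copy_substring(char *dst, size_t dstsz,
                   const char *src, size_t offset, int len)
{
    /* "%.*s" takes the precision (maximum chars to print) from an int
     * argument; snprintf always NUL-terminates when dstsz > 0. */
    return snprintf(dst, dstsz, "%.*s", len, src + offset);
}
```

A return value that is not smaller than dstsz tells the caller that the copy was truncated. The source and destination buffers must not overlap.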
This answer tries to fix the existing code instead of proposing alternative/better solutions.
After fixing the output
printf("-----string----%s\n", nameArr[i-num]);
in the question, there is still another important problem.
You want to store 3 characters in nameArr[i-num] and allocate space for 3 characters. Later you print it as a string in the code shown above. This requires a trailing '\0' after the 3 characters, so you have to allocate memory for 4 characters and either append a '\0' or initialize the allocated memory with 0. Using calloc instead of malloc would automatically initialize the memory to 0.
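In isolation, the calloc variant could look like the following sketch. make_trigram is a hypothetical helper, and it assumes src points at at least 3 readable characters:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of the
 * three chars at src, or NULL on allocation failure. The caller frees it. */
char *make_trigram(const char *src)
{
    /* 3 chars + 1 terminator; calloc zero-fills, so t[3] is already '\0'. */
    char *t = calloc(4, sizeof *t);
    if (t == NULL)
        return NULL;
    memcpy(t, src, 3);
    return t;
}
```

Because the fourth byte stays zero, the result can be passed to printf("%s") or strlen safely.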
Here is a modified version of the source code
I also changed the initialization of the string value and its length in main() to avoid the memory leak.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct strArrIntArr {
int length;
char **charArr;
int *intArr;
};
struct strArrIntArr getSearchArr(char* input, int length) {
struct strArrIntArr nameIndArr;
// flag of same bit
int same;
// flag/index of identical strings
int flag = 0;
// how many identical strings
int num = 0;
// array of split strings
char** nameArr = (char **)malloc(sizeof(char *) * (length - 2));
if ( nameArr == NULL ) exit(0);
// numbers of every split string
int* valueArr = (int* )malloc(sizeof(int) * (length-2));
if ( valueArr == NULL ) exit(0);
// loop length of search string -2 times (3-gram)
for(int i = 0; i<length-2; i++){
if(flag==0){
nameArr[i - num] = (char *)malloc(sizeof(char) * 4);
if ( nameArr[i - num] == NULL ) exit(0);
printf("----i------------%d------\n", i);
printf("----i-num--------%d------\n", i-num);
}
flag = 0;
// compare splitting string with existing split strings,
// if a string exists, it would not be stored
int match = -1; // index of the already stored identical string, if any
for(int k=0; k<i-num; k++){
same = 0;
for(int j=0; j<3; j++){
if(input[i + j] == nameArr[k][j]){
same ++;
}
}
// identical strings found, if all the three bits are the same
if(same == 3){
flag = 1;
match = k;
num++;
break;
}
}
// if the current split string doesn't exist yet
// put current split string to array
if(flag == 0){
for(int j=0; j<3; j++){
nameArr[i-num][j] = input[i + j];
valueArr[i-num] = 1;
}
nameArr[i-num][3] = '\0';
}else{
// count the duplicate on the entry it matched, not on index 1
valueArr[match]++;
}
printf("-----string----%s\n", nameArr[i-num]);
}
// number of N-gram strings
nameIndArr.length = length- 2- num;
// array of N-gram strings
nameIndArr.charArr = nameArr;
nameIndArr.intArr = valueArr;
return nameIndArr;
}
int main(int argc, const char * argv[]) {
int length;
char* input = strdup("googleapis.com.wncln.wncln.org");
length = strlen(input);
// split the search string into N-gram strings
// and count the numbers of every split string
struct strArrIntArr nameIndArr = getSearchArr(input, length);
}
This other answer contains more improvements which I personally would prefer over the modified original solution.

C - Print the most frequent strings

Over the past few days I have been posting code for an exercise I am working on. I thought I had finally finished it, but I noticed it doesn't work.
The exercise takes as input:
- N an integer, representing the number of strings to read
- K an integer
- N strings
The strings may contain duplicates. The output should be the K most frequent strings, printed in decreasing order of frequency.
Example test set:
Input:
6
2
mickey
mouse
mickey
hello
mouse
mickey
Output:
mickey // Has freq 3
mouse // Has freq 2
I hope I explained the exercise clearly. This is my attempt:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct _stringa {
char* string;
int freq;
} stringa;
int compare(const void *elem1, const void *elem2) {
stringa *first = (stringa *)elem1;
stringa *second = (stringa *)elem2;
if (first->freq < second->freq) {
return -1;
} else if (first->freq > second->freq) {
return 1;
} else {
return 0;
}
}
int BinarySearch(stringa** array, char* string, int left, int right) {
int middle;
if (left==right) {
if (strcmp(string,array[left]->string)==0) {
return left;
} else {
return -1;
}
}
middle = (left+right)/2;
if ((strcmp(string,array[middle]->string)<0) || (strcmp(string,array[middle]->string)==0) ) {
return BinarySearch(array, string, left, middle);
} else {
return BinarySearch(array, string, middle+1, right);
}
}
int main (void)
{
char value[101];
int n = 0;
int stop;
scanf("%d", &n); // Number of strings
scanf("%d", &stop); // number of the most frequent strings to print
stringa **array = NULL;
array = malloc ( n * sizeof (struct _stringa *) );
int i = 0;
for (i=0; i<n; i++) {
array[i] = malloc (sizeof (struct _stringa));
array[i]->string = malloc (sizeof (value));
scanf("%s", value);
int already;
already = BinarySearch(array, value, 0, i); // With a binary search, I see if the string is present in the previous positions of the array I am occupying. If it is not present, I copy the string into the array, otherwise, I use the value of binary search (which is the position of the element in the array) and I update the frequency field
if (already==-1) {
strcpy(array[i]->string,value);
array[i]->freq = 1;
} else {
array[already]->freq += 1;
}
}
stringa **newarray = NULL; // New struct array of strings
newarray = malloc ( n * sizeof (struct _stringa *) );
int k = 0;
for (i=0; i<n; i++) { // I use this loop to copy the element that don't have a frequency == 0
if (array[i]->freq != 0) {
newarray[k] = malloc(sizeof(struct _stringa));
newarray[k] = malloc(sizeof(value));
newarray[k]->string = array[i]->string;
newarray[k]->freq = array[i]->freq;
k++;
}
}
qsort(newarray, n, sizeof(stringa*), compare);
i=0;
while ((newarray[i]!= NULL) && (i<k)) {
printf("%s ", newarray[i]->string);
printf("%d\n", newarray[i]->freq);
i++;
}
// Freeing operations
while (--n >= 0) {
if (array[n]->string) free (array[n]->string);
if (array[n]) free (array[n]);
}
if (array) free (array);
if (newarray) free (newarray);
return 0;
}
Thank you in advance to anyone who will have the time and patience to read this code.
EDIT:
I forgot to say what is not working.
If I skip the qsort call for debugging purposes and use this input, for example:
5
2 // random number, I still have to do the 'print the k strings' part,
hello
hello
hello
hello
hello
It prints:
hello 3 (freq)
hello 2 (freq)
So it doesn't work properly. As you suggested in the comments, the binary search is flawed because it only works on an ordered list. I could sort the array after each insertion, but I think that would be counter-productive. What would be a good way to tell whether a string is already present in the array?
If you want an efficient method without sorting, use a hash table.
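A minimal sketch of the hash table idea, assuming separate chaining with a fixed number of buckets. The names `node`, `hash_string`, `table_increment`, `table_free` and the bucket count `NBUCKETS` are made up for this illustration, and the djb2 hash is just one common choice:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* arbitrary size chosen for this sketch */

/* One distinct string, its count, and the next node in the same bucket. */
typedef struct node {
    char *key;
    int freq;
    struct node *next;
} node;

/* djb2 string hash, reduced to a bucket index. */
unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Increment the count of `key`, inserting it with count 1 if absent.
   Returns the updated count, or -1 if allocation fails. */
int table_increment(node **buckets, const char *key)
{
    assert(buckets != NULL && key != NULL);
    unsigned long b = hash_string(key);
    for (node *n = buckets[b]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0)
            return ++n->freq;
    }
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    n->key = malloc(len);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->key, key, len);
    n->freq = 1;
    n->next = buckets[b]; /* push onto the front of the chain */
    buckets[b] = n;
    return 1;
}

/* Release every node and key stored in the table. */
void table_free(node **buckets)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        node *n = buckets[b];
        while (n != NULL) {
            node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
        buckets[b] = NULL;
    }
}
```

Start from a zeroed array such as `node *buckets[NBUCKETS] = {0};` and call `table_increment(buckets, value)` for each string read. To print the K most frequent strings you would still have to collect the nodes into an array and sort it by `freq`.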
Otherwise, simply put each unique string in an array and scan it linearly, which is simple and reliable.
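The linear scan could look roughly like this. It is a sketch, not a drop-in replacement for the question's code: the `entry` type and the helpers `find_entry`, `count_string` and `by_freq_desc` are hypothetical names, and the caller is assumed to provide a table with room for every distinct string (at most N entries):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One distinct string and how many times it has been seen. */
typedef struct {
    char *string;
    int freq;
} entry;

/* Return the index of `s` among the first `used` entries, or -1. */
int find_entry(const entry *table, int used, const char *s)
{
    for (int i = 0; i < used; i++) {
        if (strcmp(table[i].string, s) == 0)
            return i;
    }
    return -1;
}

/* Count `s`: bump its frequency if already stored, otherwise append a copy.
   Returns the new number of used entries, or -1 if allocation fails.
   The caller must make sure `table` has room for one more entry. */
int count_string(entry *table, int used, const char *s)
{
    assert(table != NULL && s != NULL);
    int pos = find_entry(table, used, s);
    if (pos >= 0) {
        table[pos].freq++;
        return used;
    }
    size_t len = strlen(s) + 1;
    table[used].string = malloc(len);
    if (table[used].string == NULL)
        return -1;
    memcpy(table[used].string, s, len);
    table[used].freq = 1;
    return used + 1;
}

/* qsort comparator: highest frequency first. */
int by_freq_desc(const void *a, const void *b)
{
    const entry *x = a;
    const entry *y = b;
    return (y->freq > x->freq) - (y->freq < x->freq);
}
```

After reading all N strings with `used = count_string(table, used, value);`, a call to `qsort(table, used, sizeof table[0], by_freq_desc);` puts the most frequent strings first, so printing the first K entries (or fewer, if `used < K`) gives the requested output. Only the `used` filled entries are sorted, so there are no empty slots to skip.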
On modern hardware, this kind of scan is actually fast, thanks to caches and minimal indirection. For small numbers of items, an insertion sort is often more efficient in practice than quicksort. Timsort, for instance, is stable and avoids the poor performance a naive quicksort can show on nearly sorted data: it combines merge sort and insertion sort to achieve O(n log n) without pathological cases on real data.
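For illustration, an insertion sort over the frequency table might be written as below. This is a sketch under the assumption that the distinct strings are already stored in an array of `entry` values. Both `entry` and `insertion_sort_desc` are names invented here, not part of the question's code:

```c
#include <assert.h>
#include <stddef.h>

/* One distinct string and how many times it was seen. */
typedef struct {
    const char *string;
    int freq;
} entry;

/* Stable insertion sort, highest frequency first.
   Entries with equal frequency keep their original relative order. */
void insertion_sort_desc(entry *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        entry cur = a[i];
        size_t j = i;
        /* Shift strictly smaller frequencies one slot to the right. */
        while (j > 0 && a[j - 1].freq < cur.freq) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = cur;
    }
}
```

The strict `<` in the inner loop is what keeps the sort stable: an element never moves past another with the same frequency.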
