Boyer-Moore Algorithm

Boyer-Moore Algorithm - c

I'm trying to implement the Boyer-Moore algorithm in C to search for a particular word in a .pcap file. I took the code from http://ideone.com/FhJok5 and am using it as is.
I just pass each packet as a string, together with the keyword I'm searching for, to its search() function. When I run my code, it gives different values every time. Sometimes it gives the correct value, but most of the time it fails to identify some matches.
I have also obtained results from a naive algorithm implementation, and those results are always correct.
I am using Ubuntu 12.0.4 on VMware 10.0.1. Language: C.
My question is: it should give the same result every time, right, whether correct or not? The output keeps changing every time I run the program on the same inputs, and on some runs it gives the correct answer too. Mostly the value varies between 3 or 4 values.
What I have done so far for debugging:
Passed plain strings instead of packets: it works perfectly and gives the same, correct value every time.
Checked the pcap part: I can see all packets are being passed to the function (I checked by printing the packet frame number).
Sent the same packets to the naive algorithm code: it gives perfect results.
Please give me some idea of what the issue could be. I suspect something is wrong with memory management, but how do I find out what?
Thanks in advance.
# include <limits.h>
# include <string.h>
# include <stdio.h>
# define NO_OF_CHARS 256
// A utility function to get maximum of two integers
int max (int a, int b) { return (a > b)? a: b; }
// The preprocessing function for Boyer Moore's bad character heuristic
void badCharHeuristic( char *str, int size, int badchar[NO_OF_CHARS])
{
int i;
// Initialize all occurrences as -1
for (i = 0; i < NO_OF_CHARS; i++)
badchar[i] = -1;
// Fill the actual value of last occurrence of a character
for (i = 0; i < size; i++)
badchar[(int) str[i]] = i;
}
/* A pattern searching function that uses Bad Character Heuristic of
Boyer Moore Algorithm */
void search( char *txt, char *pat)
{
int m = strlen(pat);
int n = strlen(txt);
int badchar[NO_OF_CHARS];
/* Fill the bad character array by calling the preprocessing
function badCharHeuristic() for given pattern */
badCharHeuristic(pat, m, badchar);
int s = 0; // s is shift of the pattern with respect to text
while(s <= (n - m))
{
int j = m-1;
/* Keep reducing index j of pattern while characters of
pattern and text are matching at this shift s */
while(j >= 0 && pat[j] == txt[s+j])
j--;
/* If the pattern is present at current shift, then index j
will become -1 after the above loop */
if (j < 0)
{
printf("\n pattern occurs at shift = %d", s);
/* Shift the pattern so that the next character in text
aligns with the last occurrence of it in pattern.
The condition s+m < n is necessary for the case when
pattern occurs at the end of text */
s += (s+m < n)? m-badchar[txt[s+m]] : 1;
}
else
/* Shift the pattern so that the bad character in text
aligns with the last occurrence of it in pattern. The
max function is used to make sure that we get a positive
shift. We may get a negative shift if the last occurrence
of bad character in pattern is on the right side of the
current character. */
s += max(1, j - badchar[txt[s+j]]);
}
}
/* Driver program to test above function */
int main()
{
char txt[] = "ABAAAABAACD";
char pat[] = "AA";
search(txt, pat);
return 0;
}
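A possible explanation, offered as a sketch rather than a confirmed diagnosis: if the packet is a raw byte buffer, strlen() stops at the first zero byte (or runs past the end of a buffer that is not NUL-terminated), and on platforms where char is signed, bytes above 127 produce negative indices into badchar[]. Either effect could make results vary between runs. The variant below assumes the caller knows the packet length; the name search_bytes and its parameters are hypothetical, and it counts matches instead of printing them.

```c
#include <stddef.h>

#define NO_OF_CHARS 256

/* Hypothetical length-aware variant of search(): counts occurrences of
   pat[0..m-1] in txt[0..n-1] using only the bad character heuristic.
   Lengths are passed in, so embedded '\0' bytes are handled, and bytes
   are unsigned, so table indices stay within 0..255. */
size_t search_bytes(const unsigned char *txt, size_t n,
                    const unsigned char *pat, size_t m)
{
    long badchar[NO_OF_CHARS];
    size_t count = 0;
    size_t s = 0;                       /* current shift of pat over txt */

    if (m == 0 || n < m)
        return 0;

    for (int i = 0; i < NO_OF_CHARS; i++)
        badchar[i] = -1;
    for (size_t i = 0; i < m; i++)
        badchar[pat[i]] = (long)i;      /* last occurrence of each byte */

    while (s <= n - m) {
        long j = (long)m - 1;
        while (j >= 0 && pat[j] == txt[s + (size_t)j])
            j--;
        if (j < 0) {
            count++;
            /* Align the byte after the window with its last occurrence */
            s += (s + m < n) ? (size_t)((long)m - badchar[txt[s + m]]) : 1;
        } else {
            long shift = j - badchar[txt[s + (size_t)j]];
            s += (shift > 1) ? (size_t)shift : 1;
        }
    }
    return count;
}
```

Called with the text "ABAAAABAACD" (length 11) and the pattern "AA" (length 2), it reports 4 matches, which are the shifts 2, 3, 4 and 7 that search() prints for the same input.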

Related

Count Different Character Types In String

I wrote a program that counts and prints the number of occurrences of each character in a string, but it prints a garbage value when I use fgets(); with gets() it does not.
Here is my code:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main() {
char c[1005];
fgets(c, 1005, stdin);
int cnt[26] = {0};
for (int i = 0; i < strlen(c); i++) {
cnt[c[i] - 'a']++;
}
for (int i = 0; i < strlen(c); i++) {
if(cnt[c[i]-'a'] != 0) {
printf("%c %d\n", c[i], cnt[c[i] - 'a']);
cnt[c[i] - 'a'] = 0;
}
}
return 0;
}
This is what I get when I use fgets():
baaaabca
b 2
a 5
c 1
32767
--------------------------------
Process exited after 8.61 seconds with return value 0
Press any key to continue . . . _
I fixed it by using gets() and got the correct result, but I still don't understand why fgets() gives the wrong result.

Hurray! So, the most important reason your code is failing is that your code does not observe the following inviolable advice:
Always sanitize your inputs
What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:
Exploits of a Mom #xkcd.com
It is always worth reading the explanation even if you get it already #explainxkcd.com
medium.com wrote an article about “How Little Bobby Tables Ruined the Internet”
Heck, Bobby’s even got his own website — bobby-tables.com
Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.
For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” #makeuseof.com — a good number of which were caused by failure to process bad input!
Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!
Where your code went wrong
Using gets() is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?
But it does remove the Enter key from the end of the user’s input!
fgets(), in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of text by not removing that Enter key.
Hence, assuming the user types: b a n a n a Enter
gets() returns the string "banana"
fgets() returns the string "banana\n"
That newline character '\n' (what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!
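If all you want is the gets()-style contents, one common idiom (a sketch, and not the approach recommended in the rest of this answer) is to cut the string at the newline with strcspn(). The helper name strip_newline is made up for illustration:

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\n', if there is one.
   strcspn() returns the length of the prefix that contains no '\n', so the
   assignment either overwrites the newline or rewrites the existing '\0'. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

After calling strip_newline() on the buffer filled by fgets(), "banana\n" becomes "banana", and a line that had no newline is left unchanged.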
The Fix
The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:
// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";
Now we want to create a mapping from each letter in letters[] to an array of integers (its name doesn’t matter, but we’re calling it count[]). Let’s wrap that up in a little function:
// Here is our mapping of letters[] ←→ integers[]
// • supply a valid input → get an integer unique to that specific input
// • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
static int fooey; // number of invalid inputs
static int count[sizeof(letters)] = {0}; // numbers of each valid input 'a'..'z'
const char * p = strchr(letters, c); // find the valid input, else NULL
if (p) {
int index = p - letters; // 'a'=0, 'b'=1, ... (same order as in letters[])
return &count[index]; // VALID INPUT → the corresponding integer in count[]
}
else return &fooey; // INVALID INPUT → returns a dummy integer
}
For the more astute among you, this is rather verbose: we can totally get rid of those fooey and index variables.
“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”
Easy. Just check that your character is in range:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
return &fooey;
}
“But EBCDIC...!”
Fine. The following will work with both EBCDIC and ASCII:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
return &fooey;
}
You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'. Prove me wrong.
Back to main()
Before we forget, stick the required magic at the top of your program:
#include <stdio.h>
#include <string.h>
Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.
int main() {
// Ask for and get user input
char s[1005];
printf("s? ");
fgets(s, 1005, stdin);
// Histogram the input
for (int i = 0; i < strlen(s); i++) {
*histogram(s[i]) += 1;
}
// Print out the histogram, not printing zeros
for (int i = 0; i < strlen(letters); i++) {
if (*histogram(letters[i])) {
printf("%c %d\n", letters[i], *histogram(letters[i]));
}
}
return 0;
}
We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[] array! Win-win!
s? a - ba na na !
a 4
b 1
n 2
But wait, there’s more!
We can totally reuse our histogram. Check out this little function:
// Reset the histogram to all zeros
//
void clear_histogram(void) {
for (const char * p = letters; *p; p++)
*histogram(*p) = 0;
}
All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!
Another way you could handle input is to transform things into acceptable values. For example, you can use tolower() to convert any majuscule letters to your histogram’s input set.
s? ba na NA!
a 3
b 1
n 2
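A minimal sketch of that tolower() idea, assuming the same letters[] and histogram() mapping as above (repeated here so the block stands alone). The helper name histogram_string is hypothetical; the cast to unsigned char is there because tolower() is only defined for values in that range (or EOF).

```c
#include <ctype.h>
#include <string.h>

static const char letters[] = "abcdefghijklmnopqrstuvwxyz";

/* Same mapping idea as histogram() above: valid letter -> its counter,
   anything else -> one shared dummy counter. The '\0' check keeps the
   string terminator from being treated as a valid input by strchr(). */
int *histogram(char c)
{
    static int fooey = 0;
    static int count[26] = {0};
    const char *p = (c != '\0') ? strchr(letters, c) : NULL;
    if (p)
        return &count[p - letters];
    return &fooey;
}

/* Hypothetical helper: fold each character to lowercase, then count it */
void histogram_string(const char *s)
{
    for (; *s; s++) {
        char lower = (char)tolower((unsigned char)*s);
        *histogram(lower) += 1;
    }
}
```

Feeding it "ba na NA!" leaves 3 in the counter for 'a', 1 for 'b' and 2 for 'n', matching the output shown above.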
But I digress again...
Hang in there!

Insertion sort C programming - alphabet letters

I just started to learn C and came across the Insertion Sort algorithm. I am now trying to apply it to letters of the alphabet. However, the challenge I face is that I want the program to sort the letters regardless of whether they are uppercase or lowercase.
For example, 'd' should come before 'E' in the sorted list. Here's what I have so far.
Thank you for the help.
#include <stdio.h>
#define MAX_NUMS 10
void InsertionSort(char list[]);
int main()
{
int index = 0; /* iteration variable */
char letters[MAX_NUMS]; /* list of number to be sorted */
/* Get input */
printf("Enter a word: ");
scanf("%s", letters);
printf("%s\n", letters);
InsertionSort(letters); /* Call sorting routine */
printf("pass");
/* Print sorted list */
printf("\nThe input set, in ascending order:\n");
while (letters[index] != '\0') {
printf("%c\n", letters[index]);
index += 1;
}
/* printf("%s", letters);
for (index = 0; index < MAX_NUMS; index++)
printf("%c\n", letters[index]); */
}
void InsertionSort(char list[])
{
int unsorted; /* index for unsorted list items */
int sorted; /* index for sorted items */
char unsortedItem; /* Current item to be sorted */
char lowUnsortedItem;
/* This loop iterates from 1 thru MAX_NUMS */
for (unsorted = 0; list[unsorted] != '\0'; unsorted++) {
unsortedItem = list[unsorted];
if (unsortedItem >= 'A' && unsortedItem <= 'Z') {
lowUnsortedItem = unsortedItem + 32;
}
/* This loop iterates from unsorted thru 0, unless
we hit an element smaller than current item */
for (sorted = unsorted - 1;
(sorted >= 0) && (list[sorted] > unsortedItem);
sorted--)
list[sorted + 1] = list[sorted];
list[sorted + 1] = unsortedItem; /* Insert item */
}
}

I think you need to use tolower() from ctype.h to obtain the lowercase variant of each character and then compare those lowercase variants.
For example:
lowUnsortedItem = tolower(unsortedItem)
...
for ( ...; tolower(list[sorted]) > lowUnsortedItem)
...

Improved answer: My previous attempt contained a bug and did not cover the (later) requirement that, e.g., A should come before a. I have therefore rewritten my answer.
Since we have a more complex sorting condition than merely "less than" according to the ASCII character set, let's make the code more modular by writing a function that sorts and a function that returns whether two characters are sorted correctly (the function isLessThan below). If, at a later stage, we'd like to compare different types or use a different ordering, we can simply use a different comparison function. Alternatively, we could use this comparison function with another sorting algorithm. See also the C standard library's qsort, which uses a similar modular approach.
To use boolean values, include the stdbool.h header. It is more convenient to use library functions to detect whether a character is uppercase or lowercase, and to convert between the two, so include ctype.h as well. Note that these functions take an int! The value passed must be representable as an unsigned char (or be EOF); otherwise their behavior is undefined.
Modified code snippet:
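#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
// Returns true if lhs may stay in front of rhs in the sorted list
bool isLessThan(char lhs, char rhs)
{
    // The <ctype.h> functions need values representable as unsigned char
    unsigned char l = (unsigned char)lhs;
    unsigned char r = (unsigned char)rhs;
    if (tolower(l) == tolower(r))
        return isupper(l) || islower(r);
    return tolower(l) < tolower(r);
}
void InsertionSort(char list[])
{
    if (list[0] == '\0')
        return; // Empty string: nothing to sort, and list[1] must not be read
    for (size_t unsortedIdx = 1U; list[unsortedIdx] != '\0'; ++unsortedIdx)
    {
        char unsortedItem = list[unsortedIdx];
        size_t sortedIdx = unsortedIdx;
        // Shift items right until the item on the left may stay in front
        for ( ; sortedIdx > 0U && !isLessThan(list[sortedIdx - 1U], unsortedItem); --sortedIdx)
            list[sortedIdx] = list[sortedIdx - 1U];
        list[sortedIdx] = unsortedItem; // Insert item
    }
}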
The comparison function first checks for the special property we'd like to have, e.g. A comes before a. Otherwise, we make sure both characters are lowercase and compare them.
The insertion sort algorithm now only contains the "insertion sorting logic". The indices used have been changed slightly to accommodate using the unsigned type size_t for indices, which is a good practice.
Some additional advice: scanf can be unsafe w.r.t. buffer overflows. It is best replaced by fgets and sscanf.
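To illustrate the qsort remark above, here is a sketch of the same ordering expressed as a qsort comparator. The names compareLetters and sortLetters are hypothetical; a qsort comparator returns a negative, zero or positive int rather than a bool, so the logic of isLessThan is restated in that form.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical qsort comparator: letters compare case-insensitively,
   and on a tie the uppercase form sorts first ('A' before 'a'). */
static int compareLetters(const void *lhs, const void *rhs)
{
    unsigned char a = *(const unsigned char *)lhs;
    unsigned char b = *(const unsigned char *)rhs;
    int la = tolower(a);
    int lb = tolower(b);
    if (la != lb)
        return (la > lb) - (la < lb);
    /* Same letter: uppercase (0) goes before lowercase (1) */
    return (islower(a) != 0) - (islower(b) != 0);
}

/* Sorts the characters of a NUL-terminated string in place */
void sortLetters(char list[])
{
    qsort(list, strlen(list), sizeof list[0], compareLetters);
}
```

With this comparator, the word "dEaAbB" is rearranged to "AaBbdE".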

Accounting for no existant characters as inputs C

Sorry if the question title is a little bit off, I had no idea what to call it just because it is such a peculiar question. What I am aiming to do is decode an input string encoded using a method I will explain in a bit, into a plain English text.
Encoding is done by choosing an integer nRows between 2 and half the length of the message, e.g. a message of length 11 would allow values of nRows in the range 2 to 5. The message is then written down the columns of a grid, one character in each grid cell, nRows in each column, until all message characters have been used. This may result in the last column being only partially filled. The message is then read out row-wise.
For example if the input message was ALL HAIL CAESAR, and the nRows value was 2, encoding would look like this:
A L H I _ A S R
L _ A L C E A #
Here _ stands for a space character of the message, and # marks a blank cell in the table that doesn't actually exist - I have simply added it to explain the next part :)
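The encoding described above can be sketched as follows. This is an assumed reconstruction from the description, not code from the assignment, and the name confab is hypothetical. Writing down the columns puts message character i into row i % nRows, so reading row r means taking every nRows-th character starting at r; cells past the end of the message are simply never visited.

```c
#include <string.h>

/* Hypothetical encoder: the grid cell in row r, column c holds
   inText[c * nRows + r]. Reading the rows one after another therefore
   visits r, r + nRows, r + 2 * nRows, ... for each row r. */
void confab(const char inText[], int nRows, char outText[])
{
    int len = (int)strlen(inText);
    int out = 0;
    for (int r = 0; r < nRows; r++)
        for (int i = r; i < len; i += nRows)
            outText[out++] = inText[i];
    outText[out] = '\0';
}
```

Encoding "ALL HAIL CAESAR" with nRows set to 2 gives "ALHI ASRL ALCEA", the string used in the example.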
The actual question I have is decoding these phrases. The code I have written thus far works for a few problems, but once the blank characters (#) become many the code begins to break down, as the code obviously does not register them and the algorithm skips past them.
My code is:
/*
* DeConfabulons.c
* A program to Decode for the Confabulons
*
* August 9th 2015
*/
#include <stdio.h>
#include <string.h>
#include <math.h>
//A simple function confab which given input text encoded using
//the Confabulons encoding scheme, and a number of rows, returns
//the originally encoded phrase.
void deconfab(const char inText[], int nRows, char outText[])
{
int count = 0;
int i = 0;
int len = strlen(inText);
float help = ((float)len/(float)nRows);
int z = 0;
while (z < round(help))
{
while (((int)inText[count] > 0) && (count <= len))
{
outText[i] = inText[count];
i ++;
if (count < (int)help)
{
count = count + round((int)help+0.5);
}
else
{
float helper = count + help;
count = round(helper);
}
}
z ++;
count = z;
}
outText[i] = '\0';
}
Which thus far works for the Caesar example I gave earlier. The encoded form of it was ALHI ASRL ALCEA. The main(void) input I have been provided for that problem was:
char buffer[40] = {'\0'};
deconfab("ALHI ASRL ALCEA", 2, buffer);
printf("%s\n", buffer);
Which correctly outputs:
ALL HAIL CAESAR
However when working with cases with extra "blank" characters such as:
char buffer[60] = {0};
char* s = "Two hnvde eo frgqo .uxti hcjeku  mlbparszo y";
deconfab(s, 13, buffer);
printf("%s\n", buffer);
The output should be:
The quick brown fox jumps over the lazy dog.
However my code will return:
Thdefq.the browneorouickmps ov g x julazy
I have concluded, by running through multiple tests by hand, that this is caused by the blank characters at the end of the last column. However, no matter what I try, the code will not work for every test case. I am allowed to edit the bulk of the function in nearly any way, but the inputs and anything else in int main(void) may not be edited.
I am simply looking for a way to have these blank characters recognized as characters without actually being there (as such) :)

First of all, as far as I can see, you don't include those "null" characters in your input. If you did (I guess by adding some "dummy" characters), the algorithm would work. The reason it works in the first case is that the blank character is missing at the end of the input, which is the same place where it is missing in the sentence.
You can try a workaround by working out the length the message would have with those dummy characters, like this:
ALHI ASRL ALCEA has 15 characters (15 mod 2 = 1) but ALHI ASRL ALCEA# has 16 characters. Similarly, Two hnvde eo frgqo .uxti hcjeku  mlbparszo y has 44 characters (44 mod 13 = 5) so you need quite a lot of the dummy chars to make this work (13-5=8).
There are several ways to proceed at this point. You can, for instance, insert the missing blanks to align the columns, copy everything into a 2-dimensional array character by character, and then read it line by line. Or you can determine the (len mod rows) characters of the last column, remove them from the input (this requires some fiddling with the classic C string functions, so I won't give you the full answer here), read the rest, and then append the characters from the last column.
I hope this helps.

The index calculation is somewhat muddled.
First of all, this is a purely discrete transformation, so it should be implemented using only integer arithmetic.
The function below does what you need.
#include <string.h>
void deconfab(const char inText[], int nRows, char outText[])
{
    int len = strlen(inText);
    int cols = len / nRows; // length of the short rows
    int rows_with_large_cols = len % nRows; // rows holding one extra character
    int count = 0;
    int col = 0;
    int row = 0;
    while (count < len)
    {
        int idx;
        // Start of the current row inside inText, plus the column offset
        if (row < rows_with_large_cols)
            idx = row * (cols + 1) + col;
        else
            idx = rows_with_large_cols * (cols + 1) +
                  (row - rows_with_large_cols) * cols + col;
        // Ran past the last row: continue at the top of the next column
        if (idx > len - 1) {
            ++col;
            row = 0;
            idx = col;
        }
        outText[count] = inText[idx];
        ++row;
        ++count;
    }
    outText[count] = '\0';
}
It could be written more neatly; as it stands it is closer to pseudocode meant to explain the algorithm.

You cannot use the standard str* functions if you are going to handle nulls. You must, instead, work with the data directly and use the *read family of functions to get your data.

How should I generate the n-th digit of this sequence in logarithmic time complexity?

I have the following problem:
The point (a) was easy, here is my solution:
#include <stdio.h>
#include <string.h>
#define MAX_DIGITS 1000000
char conjugateDigit(char digit)
{
if(digit == '1')
return '2';
else
return '1';
}
void conjugateChunk(char* chunk, char* result, int size)
{
int i = 0;
for(; i < size; ++i)
{
result[i] = conjugateDigit(chunk[i]);
}
result[i] = '\0';
}
void displaySequence(int n)
{
// +1 for '\0'
char result[MAX_DIGITS + 1];
// This variable temporarily stores the conjugates at each iteration.
// Since every component of the sequence is 1/4 the size of the sequence,
// the length of `tmp` will be MAX_DIGITS / 4 + the string terminator.
char tmp[(MAX_DIGITS / 4) + 1];
// Here I assign the base value to the sequence
strcpy(result, "1221");
// The initial value of k is 4, since the base sequence has length 4.
// At each step the sequence becomes 4 times longer than the previous one.
for(int k = 4; k < n; k *= 4)
{
// We conjugate the first part of the sequence.
conjugateChunk(result, tmp, k);
// We will concatenate the conjugate 2 time to the original sequence
strcat(result, tmp);
strcat(result, tmp);
// Now we conjugate the conjugate in order to get the first part.
conjugateChunk(tmp, tmp, k);
strcat(result, tmp);
}
for(int i = 0; i < n; ++i)
{
printf("%c", result[i]);
}
printf("\n");
}
int main()
{
int n;
printf("Insert n: ");
scanf("%d", &n);
printf("The result is: ");
displaySequence(n);
return 0;
}
But for point (b) I have to generate the n-th digit in logarithmic time, and I have no idea how to do it. I have tried to find a mathematical property of the sequence, but I failed. Can you help me, please? It is not the solution itself that really matters, but how you tackle this kind of problem in a short amount of time.
This problem was given last year (in 2014) at the admission exam at the Faculty of Mathematics and Computer Science at the University of Bucharest.

Suppose you define d_ij as the value of the ith digit in s_j.
Note that for a fixed i, d_ij is defined only for large enough values of j (at first, s_j is not large enough).
Now you should be able to prove to yourself the two following things:
once d_ij is defined for some j, it will never change as j increases (hint: induction).
For a fixed i, d_ij is defined for j logarithmic in i (hint: how does the length of s_j increase as a function of j?).
Combining this with the first item, which you solved, should give you the result along with the complexity proof.

There is a simple programming solution; the key is to use recursion.
First determine the minimal k such that the length of s_k is more than n, so that the n-th digit exists in s_k. By definition, s_k can be split into 4 equal-length parts. You can easily determine which part the n-th symbol falls into, and its position within that part: say the n-th symbol of the whole string is the n'-th within this part. This part is either s_{k-1} or inv(s_{k-1}). In either case you recursively determine the n'-th symbol of s_{k-1} and then, if needed, invert it.
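A sketch of that recursion in C, under the assumption (taken from the code in the question) that the sequence starts with "1221" and that each step appends the conjugate twice and then the original. Positions are 0-based here, and the names digitWithin and nthDigitRecursive are hypothetical.

```c
/* Digit at 0-based position n of a prefix made of four parts of length
   `quarter` (a power of 4), laid out as s, inv(s), inv(s), s.
   Requires n < 4 * quarter. */
static char digitWithin(unsigned long n, unsigned long quarter)
{
    if (quarter == 1)
        return "1221"[n];               /* base sequence */

    unsigned long part = n / quarter;   /* which of the four parts: 0..3 */
    char d = digitWithin(n % quarter, quarter / 4);

    /* The two middle parts are the conjugate of the first part */
    if (part == 1 || part == 2)
        d = (d == '1') ? '2' : '1';
    return d;
}

/* Hypothetical entry point: find the shortest prefix containing n, then recurse */
char nthDigitRecursive(unsigned long n)
{
    unsigned long quarter = 1;
    while (quarter <= n / 4)            /* stop once 4 * quarter > n */
        quarter *= 4;
    return digitWithin(n, quarter);
}
```

The length of a part shrinks by a factor of 4 at every level, so the recursion depth grows with the logarithm of n.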

The digits up to 4^k are used to determine the digits up to 4^(k+1). This suggests writing n in base 4.
Consider the binary expansion of n where we pair digits together, or equivalently the base 4 expansion where we write 0=(00), 1=(01), 2=(10), and 3=(11).
Let f(n) = +1 if the nth digit is 1, and -1 if the nth digit is 2, where the sequence starts at index 0 so f(0)=1, f(1)=-1, f(2)=-1, f(3)=1. This index is one lower than the index starting from 1 used to compute the examples in the question. The 0-based nth digit is (3-f(n))/2. If you start the indices at 1, the nth digit is (3-f(n-1))/2.
f((00)n) = f(n).
f((01)n) = -f(n).
f((10)n) = -f(n).
f((11)n) = f(n).
You can use these to compute f recursively, but since it is a back-recursion you might as well compute f iteratively. f(n) is (-1)^(binary weight of n) = (-1)^(sum of the binary digits of n).
See the Thue-Morse sequence.
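The closed form above turns into a short loop: the digit depends only on whether the number of 1 bits in n is even or odd. A minimal sketch with 0-based positions; the function name nthDigit is hypothetical.

```c
/* Digit at 0-based position n: '1' if n has an even number of 1 bits
   (f(n) = +1), '2' if it has an odd number (f(n) = -1). */
char nthDigit(unsigned long n)
{
    int parity = 0;
    while (n != 0) {
        parity ^= (int)(n & 1UL);   /* flip for every set bit */
        n >>= 1;
    }
    return parity ? '2' : '1';
}
```

The loop runs once per binary digit of n, so the running time is logarithmic in n. For positions 0 through 15 it produces 1221211221121221.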

Non-recursive combination algorithm to generate distinct character strings

This problem has been irritating me for too long. I need a non-recursive algorithm in C to generate non-distinct character strings (strings in which characters may repeat). For instance, if the character set is 26 characters long and the generated strings are of length 2, then there are 26^2 such strings.
Please note that the order matters: aab is not the same as baa or aba. I've searched S.O., and most solutions produce non-distinct combinations. Also, I do not need permutations.
The algorithm can't rely on libraries. I'm going to translate this C code into cuda where standard C libraries don't work (at least not efficiently).
Before I show you what I started, let me explain an aspect of the program. It is multithreaded on a GPU, so I initialize the beginning string with a few characters, aa in this case. To create a combination, I add one or more characters depending on the desired length.
Here's one method that I have attempted:
int main(void){
//Declarations
char final[12] = {0};
char b[3] = "aa";
char charSet[27] = "abcdefghijklmnopqrstuvwxyz";
int max = 4; //Set for demonstration purposes
int ul = 1;
int k,i;
//This program is multithreaded on a GPU. Each thread is initialized
//to a starting value for the string. In this case, it is aa
//Set final with a starting prefix
int pref = strlen(b);
memcpy(final, b, pref+1);
//Determine the number of non-distinct combinations
for(int j = 0; j < length; j++) ul *= strlen(charSet);
//Start concatenating characters to the current character string
for(k = 0; k < ul; k++)
{
final[pref+1] = charSet[k];
//Do some work with the string
}
...
It should be obvious that this program does nothing useful, except if I'm only appending one character from charSet.
My professor suggested that I try using a mapping (this isn't homework; I asked him about possible ways to generate distinct combinations without recursion).
His suggestion is similar to what I started above. Using the number of combinations calculated, he suggested to decompose it according to mod 10. However, I realized it wouldn't work.
For example, say I need to append two characters. This gives me 676 combinations using the character set above. If I am on the 523rd combination, the decomposition he demonstrated would yield
523 % 10 = 3
52 % 10 = 2
5 % 10 = 5
It should be obvious that this doesn't work. For one, it yields three characters, and two, if my character set is larger than 10 characters, the mapping ignores those above index 9.
Still, I believe a mapping is key to the solution.
The other method I explored utilized for loops:
//Pseudocode
c = charset;
for(i = 0; i < length(charset); i++){
concat string
for(j = 0; j < length(charset); j++){
concat string
for...
However, this hardcodes the length of the string I want to compute. I could use an if statement with a goto to break it, but I would like to avoid this method.
Any constructive input is appreciated.

Given a string, to find the next possible string in the sequence:
1. Find the last character in the string which is not the last character in the alphabet.
2. Replace it with the next character in the alphabet.
3. Replace every character to the right of that character with the first character in the alphabet.
Start with a string which is a repetition of the first character of the alphabet. When step 1 fails (because the string is all the last character of the alphabet) then you're done.
Example: the alphabet is "ajxz".
Start with aaaa.
First iteration: the rightmost character which is not z is the last one. Change it to the next character: aaaj
Second iteration. Ditto. aaax
Third iteration: Again. aaaz
Fourth iteration: Now the rightmost non-z character is the second last one. Advance it and change all characters to the right to a: aaja
Etc.
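The steps above can be sketched in C without any library calls. The function name next_string and its parameters are hypothetical, and the sketch assumes every character of s comes from the alphabet and that the alphabet has no repeated characters.

```c
/* Hypothetical helper: advance s (len characters) in place to the next
   string over the given alphabet. Returns 1 on success, or 0 when s
   already consists only of the last alphabet character. */
int next_string(char *s, int len, const char *alphabet, int alphabetLen)
{
    for (int pos = len - 1; pos >= 0; pos--) {
        /* Look up the position of s[pos] in the alphabet */
        int idx = 0;
        while (idx < alphabetLen && alphabet[idx] != s[pos])
            idx++;

        if (idx < alphabetLen - 1) {
            /* Found the rightmost character that can still advance */
            s[pos] = alphabet[idx + 1];
            /* Reset everything to its right to the first character */
            for (int k = pos + 1; k < len; k++)
                s[k] = alphabet[0];
            return 1;
        }
    }
    return 0;   /* every position holds the last alphabet character */
}
```

Starting from "aaaa" with the alphabet "ajxz", repeated calls yield aaaj, aaax, aaaz and then aaja, as in the example.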

First, thanks for everyone's input; it was helpful. Since I am translating this algorithm into cuda, I need it to be as efficient as possible on a GPU. The methods proposed certainly work, but they are not necessarily optimal for GPU architecture. I came up with a different solution using modular arithmetic that takes advantage of the base of my character set. Here's an example program, primarily in C with a mix of C++ for output, and it's fairly fast.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <iostream>
#include <string>
using namespace std;
typedef unsigned long long ull;
int main(void){
    //Declarations
    int init = 2; //Length of the starting prefix
    char final[12] = {'a', 'a'}; //init + max must stay below 12
    char charSet[27] = "abcdefghijklmnopqrstuvwxyz";
    const ull max = 2; //Number of characters to append; modify as need be
    int base = strlen(charSet);
    int placeHolder; //Maps to character in charset (result of %)
    ull quotient; //Quotient after division by base
    ull nComb = 1;
    char comb[max+1]; //Array to hold combinations
    int c = 0;
    ull i,j;
    //Compute the number of distinct combinations ((size of charset)^length)
    for(j = 0; j < max; j++) nComb *= strlen(charSet);
    //Begin computing combinations
    for(i = 0; i < nComb; i++){
        quotient = i;
        for(j = 0; j < max; j++){ //No need to check whether the quotient is zero
            placeHolder = quotient % base;
            final[init+j] = charSet[placeHolder]; //Copy the indicated character
            quotient /= base; //Divide the number by its base to calculate the next character
        }
        string str(final);
        c++;
        //Print combinations
        cout << final << "\n";
    }
    cout << "\n\n" << c << " combinations calculated";
    getchar();
}
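The mapping at the heart of that program can be isolated into a small plain-C function, which may be easier to carry over to a GPU kernel. This is a sketch: the name indexToString and its parameters are hypothetical. It treats the combination index as a number written in base `base` and emits the least significant character first, as the inner loop above does.

```c
/* Hypothetical helper: write the index-th combination of `length`
   characters into out, least significant character first, by treating
   index as a number in base `base` (the size of charSet).
   out must have room for length + 1 characters. */
void indexToString(unsigned long long index, const char *charSet,
                   unsigned base, char *out, unsigned length)
{
    for (unsigned j = 0; j < length; j++) {
        out[j] = charSet[index % base];   /* current base-`base` digit */
        index /= base;                    /* move on to the next digit */
    }
    out[length] = '\0';
}
```

For the 523rd combination from the question, with the 26-letter character set and a length of 2, the digits are 523 % 26 = 3 and 20 % 26 = 20, so the function writes "du".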
