Suggestions to improve a C ReplaceString function?

I've just started to get into C programming and would appreciate criticism of my ReplaceString function.
It seems pretty fast (it allocates no memory other than one malloc for the result string), but it is awfully verbose and I know it could be done better.
Example usage:
printf("New string: %s\n", ReplaceString("great", "ok", "have a g grea great day and have a great day great"));
printf("New string: %s\n", ReplaceString("great", "fantastic", "have a g grea great day and have a great day great"));
Code:
#ifndef uint
#define uint unsigned int
#endif
char *ReplaceString(char *needle, char *replace, char *haystack)
{
char *newString;
uint lNeedle = strlen(needle);
uint lReplace = strlen(replace);
uint lHaystack = strlen(haystack);
uint i;
uint j = 0;
uint k = 0;
uint lNew;
char active = 0;
uint start = 0;
uint end = 0;
/* Calculate new string size */
lNew = lHaystack;
for (i = 0; i < lHaystack; i++)
{
if ( (!active) && (haystack[i] == needle[0]))
{
/* Start of needle found */
active = 1;
start = i;
end = i;
}
else if ( (active) && (i-start == lNeedle) )
{
/* End of needle */
active = 0;
lNew += lReplace - lNeedle;
}
else if ( (active) && (i-start < lNeedle) && (haystack[i] == needle[i-start]) )
{
/* Next part of needle found */
end++;
}
else if (active)
{
/* Didn't match the entire needle... */
active = 0;
}
}
active= 0;
end = 0;
/* Prepare new string */
newString = malloc(sizeof(char) * lNew + 1);
newString[sizeof(char) * lNew] = 0;
/* Build new string */
for (i = 0; i < lHaystack; i++)
{
if ( (!active) && (haystack[i] == needle[0]))
{
/* Start of needle found */
active = 1;
start = i;
end = i;
}
else if ( (active) && (i-start == lNeedle) )
{
/* End of needle - apply replacement */
active = 0;
for (k = 0; k < lReplace; k++)
{
newString[j] = replace[k];
j++;
}
newString[j] = haystack[i];
j++;
}
else if ( (active) && (i-start < lNeedle) && (haystack[i] == needle[i-start])
)
{
/* Next part of needle found */
end++;
}
else if (active)
{
/* Didn't match the entire needle, so apply skipped chars */
active = 0;
for (k = start; k < end+2; k++)
{
newString[j] = haystack[k];
j++;
}
}
else if (!active)
{
/* No needle matched */
newString[j] = haystack[i];
j++;
}
}
/* If still matching a needle... */
if ( active && (i-start == lNeedle))
{
/* If full needle */
for (k = 0; k < lReplace; k++)
{
newString[j] = replace[k];
j++;
}
newString[j] = haystack[i];
j++;
}
else if (active)
{
for (k = start; k < end+2; k++)
{
newString[j] = haystack[k];
j++;
}
}
return newString;
}
Any ideas? Thanks very much!

Don't call strlen(haystack). You are already checking every character in the string, so computing the string length is implicit to your loop, as follows:
for (i = 0; haystack[i] != '\0'; i++)
{
...
}
lHaystack = i;

It's possible you are doing this in your own way for practice. If so, you get many points for effort.
If not, you can often save time by using functions that are in the C Runtime Library (CRT) versus coding your own equivalent function. For example, you could use strstr to locate the string that's targeted for replacement. Other string manipulation functions may also be useful to you.
A good exercise would be to complete this example to your satisfaction and then recode using the CRT to see how much faster it is to code and execute.

While looping the first time, you should record the index of every match, then skip those spans in the copy/replace part of the function. The second pass then reduces to a loop that only calls strncpy (or memcpy), copying alternately from haystack and from the replacement into the new string.
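The two-pass idea, combined with the earlier suggestion to use strstr, can be sketched roughly as follows. This is a minimal sketch and not the asker's code: the name replace_all and its parameter names are made up for illustration, an empty search string is assumed to mean "replace nothing", and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of text in which every
   non-overlapping occurrence of old_text is replaced by new_text.
   Returns NULL if the allocation fails. The caller frees the result. */
char *replace_all(const char *old_text, const char *new_text, const char *text)
{
    size_t old_len = strlen(old_text);
    size_t new_len = strlen(new_text);
    size_t text_len = strlen(text);
    size_t count = 0;
    size_t i;
    const char *p;
    const char *hit;
    char *result;
    char *out;

    /* Pass 1: count the matches so the result size is known up front. */
    if (old_len > 0) {
        for (p = text; (hit = strstr(p, old_text)) != NULL; p = hit + old_len) {
            count++;
        }
    }

    result = malloc(text_len - count * old_len + count * new_len + 1);
    if (result == NULL) {
        return NULL;
    }

    /* Pass 2: copy the text before each match, then the replacement. */
    out = result;
    p = text;
    for (i = 0; i < count; i++) {
        size_t gap;

        hit = strstr(p, old_text);
        gap = (size_t)(hit - p);
        memcpy(out, p, gap);
        out += gap;
        memcpy(out, new_text, new_len);
        out += new_len;
        p = hit + old_len;
    }
    strcpy(out, p); /* tail after the last match, including the '\0' */
    return result;
}
```

With this sketch, replace_all("BAR", "bar", "BARBARA WENT TO THE BAR") produces "barbarA WENT TO THE bar", because matches are taken left to right without overlapping. It searches the text twice; storing the match positions from the first pass would avoid the second search at the cost of extra memory.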

Make the parameters const
char *ReplaceString(const char *needle, const char *replace, const char *haystack)
Oh ... is the function supposed to work only once per word?
ReplaceString("BAR", "bar", "BARBARA WENT TO THE BAR")

My one suggestion has nothing to do with improving performance, but with improving readability.
"Cute" parameter names are much harder to understand than descriptive ones. Which of the following parameters do you think better convey their purpose?
char *ReplaceString(char *needle, char *replace, char *haystack)
char *ReplaceString(char *oldText, char *newText, char *inString)
With one, you have to consciously map a name to a purpose. With the other, the purpose IS the name. Juggling a bunch of name mappings in your head while trying to understand a piece of code can become difficult, especially as the number of variables increases.
This might not seem so important when you're the only one using your code, but it's paramount when your code is being used or read by someone else. And sometimes, "someone else" is yourself, a year later, looking at your own code, wondering why you're searching through haystacks and trying to replace needles ;)

Related

Why does this bit of code work in CLion but not in VS?

Hi, I copied the following code from my Linux machine, where I use CLion. In VS on Windows it seems to cause problems:
entry_t* find_entry( char* n )
{
// TODO (2)
int x = strlen(n);
char str[x];
for (size_t i = 0; i < strlen(str); i++)
{
str[i] = toupper(n[i]);
}
n = &str;
for (size_t i = 0; i < list_length; i++)
{
if (strcmp(n, name_list[i].name) == 0)
{
return &name_list[i];
}
}
}
VS underlines the x in char str[x];. Before I added the statement that computes x, the strlen call was inside the brackets of str; I thought computing the length first in another variable would solve the problem.
VS gives the following error (translated from German):
Severity Code Description Project File Line Suppression State
Error (active) E0028 expression must have a constant value Names.exe - x64-Debug C:\Users\Eyüp\source\repos\09\main.c 102
Variable-length arrays (i.e. arrays whose size is not known at compile time) are not supported by MSVC: they were introduced in C99, became an optional feature in C11, and Microsoft has not implemented them. Hence you need to use malloc and friends instead.
However that is not the only problem in your code: it has multiple undefined behaviours. Here is a suggested fix:
entry_t* find_entry( char* n )
{
// return value of strlen is of type size_t, not int
size_t x = strlen(n);
// [x] was wrong, it needs to be [x + 1] for the null terminator!
char *str = malloc(x + 1);
// do not use strlen again in the loop. In worst case it does need
// to go through the entire string looking for the null terminator every time.
// You must copy the null terminator, hence i <= x or i < x + 1
for (size_t i = 0; i <= x; i++)
{
// the argument of toupper needs to be *unsigned char*
str[i] = toupper((unsigned char)n[i]);
}
// why did this even exist? It is a type error anyway
// n = &str;
for (size_t i = 0; i < list_length; i++)
{
if (strcmp(str, name_list[i].name) == 0)
{
// need to free the str...
free(str);
return &name_list[i];
}
}
// in both paths...
free(str);
// add default return value
return NULL;
}
Your code invokes undefined behaviour:
you do not null-terminate the string
you call strlen on a string that is not null-terminated (and initially not even initialized)
The logic is also wrong.
entry_t* find_entry( const char* n )
{
// TODO (2)
size_t x = strlen(n);
char str[x + 1];
for (size_t i = 0; i <= x; i++)
{
str[i] = toupper((unsigned char)n[i]);
}
str[x] = 0;
for (size_t i = 0; i < list_length; i++)
{
if (strcmp(str, name_list[i].name) == 0)
{
return &name_list[i];
}
}
return NULL;
}
You need to return something if the string was not found.
To use VLAs in VS, see: How to use Visual Studio as an IDE with variable length array (VLA) working?

Check if Char Array contains special sequence without using string library on Unix in C

Let's assume we have a char array and a sequence. We would like to check whether the char array contains the sequence WITHOUT the <string.h> library: if yes -> return true; if no -> return false.
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
for (int i = 0; i < sizeof(Array); i++) {
for (int s = 0; s < sizeof(Sequence); s++) {
if (Array[i] == Sequence[i]) {
// How to check if Sequence is contained ?
}
}
}
return false;
}
// in Main Function
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained) {
printf("Contained\n");
} else {
printf("Not Contained\n");
}
Any ideas, suggestions, websites ... ?
Thanks in advance,
Regards, from ∆
The simplest way is the naive search function:
for (i = 0; i < lenS1; i++) {
    for (j = 0; j < lenS2; j++) {
        if (arr[i + j] != seq[j]) {
            break; // seq is not present in arr at position i!
        }
    }
    if (j == lenS2) {
        return true;
    }
}
Note that you cannot use sizeof here: sizeof is evaluated at compile time on the pointer type, while the length you want is only known at run time. sizeof(Array) returns the pointer size, so almost certainly four or eight whatever strings you pass. You need to calculate the string lengths explicitly, which in C is done by relying on the fact that the last character of a string is a zero:
lenS1 = 0;
while (arr[lenS1]) lenS1++;
lenS2 = 0;
while (seq[lenS2]) lenS2++;
An obvious and easy improvement is to limit i between 0 and lenS1 - lenS2, and if lenS1 < lenS2, immediately return false. Obviously if you haven't found "HELLO" in "WELCOME" by the time you've gotten to the 'L', there's no chance of five-character HELLO being ever contained in the four-character remainder COME:
if (lenS1 < lenS2) {
return false; // You will never find "PEACE" in "WAR".
}
lenS1minuslenS2 = lenS1 - lenS2;
for (i = 0; i <= lenS1minuslenS2; i++)
Further improvements depend on your use case.
Looking for the same sequence among lots of arrays, looking for different sequences always in the same array, looking for lots of different sequences in lots of different arrays - all call for different optimizations.
The length and distribution of characters within both array and sequence also matter a lot, because if you know that there only are (say) three E's in a long string and you know where they are, and you need to search for HELLO, there's only three places where HELLO might fit. So you needn't scan the whole "WE WISH YOU A MERRY CHRISTMAS, WE WISH YOU A MERRY CHRISTMAS AND A HAPPY NEW YEAR" string. Actually you may notice there are no L's in the array and immediately return false.
A balanced option for an average use case (it does have pathological cases) might be supplied by the Boyer-Moore string matching algorithm (C source and explanation supplied at the link). This has a setup cost, so if you need to look for different short strings within very large texts, it is not a good choice (there is a parallel-search version which is good for some of those cases).
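To make the "setup cost" concrete, here is a rough sketch of the Horspool simplification of Boyer-Moore. It is not the full Boyer-Moore algorithm and not the code behind the link: it keeps only a bad-character shift table, and the name horspool_find is made up for illustration. It uses <string.h> for brevity, so it illustrates the algorithm rather than meeting the question's no-library constraint.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Boyer-Moore-Horspool search.
   Returns a pointer to the first occurrence of pat in text, or NULL. */
const char *horspool_find(const char *text, const char *pat)
{
    size_t shift[UCHAR_MAX + 1];
    size_t text_len = strlen(text);
    size_t pat_len = strlen(pat);
    size_t i;
    size_t pos;

    if (pat_len == 0) {
        return text;
    }
    if (pat_len > text_len) {
        return NULL;
    }

    /* Setup cost: one pass over the table plus one over the pattern. */
    for (i = 0; i <= UCHAR_MAX; i++) {
        shift[i] = pat_len;
    }
    for (i = 0; i + 1 < pat_len; i++) {
        shift[(unsigned char)pat[i]] = pat_len - 1 - i;
    }

    /* Slide the window; the text character under the window's last cell
       decides how far it is safe to jump. */
    pos = 0;
    while (pos <= text_len - pat_len) {
        if (memcmp(text + pos, pat, pat_len) == 0) {
            return text + pos;
        }
        pos += shift[(unsigned char)text[pos + pat_len - 1]];
    }
    return NULL;
}
```

The table is rebuilt on every call, which is exactly the overhead that makes this a poor fit for many short, one-off searches; when the same pattern is searched repeatedly, the table could be built once and reused.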
This is not the most efficient algorithm but I do not want to change your code too much.
size_t mystrlen(const char *str)
{
const char *end = str;
while(*end++);
return end - str - 1;
}
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
bool result = false;
size_t s, i;
size_t arrayLen = mystrlen(Array);
size_t sequenceLen = mystrlen(Sequence);
if(sequenceLen <= arrayLen)
{
for (i = 0; i + sequenceLen <= arrayLen; i++) {
for (s = 0; s < sequenceLen; s++)
{
if (Array[i + s] != Sequence[s])
{
break;
}
}
if(s == sequenceLen)
{
result = true;
break;
}
}
}
return result;
}
int main()
{
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained)
{
printf("Contained\n");
}
else
{
printf("Not Contained\n");
}
}
Basically this is strstr
const char* strstrn(const char* orig, const char* pat, int n)
{
    const char* it = orig;
    do
    {
        const char* tmp = it;
        const char* tmp2 = pat;
        while (*tmp2 != '\0' && *tmp == *tmp2) {
            tmp++;
            tmp2++;
        }
        /* the whole pattern matched at position 'it' */
        if (*tmp2 == '\0' && n-- == 0)
            return it;
    } while (*it++ != '\0');
    return NULL;
}
The above returns the n-th occurrence (counting from 0, overlapping matches included) of the substring in the string, or NULL if there are fewer occurrences.

code accounting for multiple delimiters isn't working

I have a program I wrote to take a string of words and, based on the delimiter that appears, separate each word and add it to an array.
I've adjusted it to account for either a ' ', a ',' or a '.'. Now the goal is to handle multiple delimiters appearing together (as in "the dog,,,was walking") and still add only the words. My program works and doesn't print the extra delimiters, but every time it encounters additional delimiters it includes a space in the output instead of ignoring them.
int main(int argc, const char * argv[]) {
char *givenString = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
int stringCharCount;
//get length of string to allocate enough memory for array
for (int i = 0; i < 1000; i++) {
if (givenString[i] == '\0') {
break;
}
else {
stringCharCount++;
}
}
// counting # of commas in the original string
int commaCount = 1;
for (int i = 0; i < stringCharCount; i++) {
if (givenString[i] == ',' || givenString[i] == '.' || givenString[i] == ' ') {
commaCount++;
}
}
//declare blank Array that is the length of commas (which is the number of elements in the original string)
//char *finalArray[commaCount];
int z = 0;
char *finalArray[commaCount] ;
char *wordFiller = malloc(stringCharCount);
int j = 0;
char current = ' ';
for (int i = 0; i <= stringCharCount; i++) {
if (((givenString[i] == ',' || givenString[i] == '\0' || givenString[i] == ',' || givenString[i] == ' ') && (current != (' ' | '.' | ',')))) {
finalArray[z] = wordFiller;
wordFiller = malloc(stringCharCount);
j=0;
z++;
current = givenString[i];
}
else {
wordFiller[j++] = givenString[i];
}
}
for (int i = 0; i < commaCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
This program took me hours and hours to get together (with help from more experienced developers) and I can't help but get frustrated. I'm using the debugger to my best ability but definitely need more experience with it.
/////////
I went back to pad and paper and kind of rewrote my code. Now I'm trying to store delimiters in an array and compare the elements of that array to the current string value. If they are equal, then we have come across a new word and we add it to the final string array. I'm struggling to figure out the placement and content of the "for" loop that I would use for this.
char * original = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
//creating two initialized variables to count the number of characters and elements to add to the array (so we can allocate enough memory)
int stringCharCount = 0;
//by setting elementCount to 1, we can account for the last word that comes after the last comma
int elementCount = 1;
//calculate value of stringCharCount and elementCount to allocate enough memory for temporary word storage and for final array
for (int i = 0; i < 1000; i++) {
if (original[i] == '\0') {
break;
}
else {
stringCharCount++;
if (original[i] == ',') {
elementCount++;
}
}
}
//account for the final element
elementCount = elementCount;
char *tempWord = malloc(stringCharCount);
char *finalArray[elementCount];
int a = 0;
int b = 0;
//int c = 0;
//char *delimiters[4] = {".", ",", " ", "\0"};
for (int i = 0; i <= stringCharCount; i++) {
if (original[i] == ',' || original[i] == '\0') {
finalArray[a] = tempWord;
tempWord = malloc(stringCharCount);
tempWord[b] = '\0';
b = 0;
a++;
}
else {
tempWord[b++] = original[i];
}
}
for (int i = 0; i < elementCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
Many issues. I suggest dividing the code into small pieces and debugging those first.
--
Uninitialized data.
// int stringCharCount;
int stringCharCount = 0;
...
stringCharCount++;
Or
int stringCharCount = strlen(givenString);
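Other problems too: the strings stored in finalArray[] are never given a terminating null character, yet they are printed with printf("%s\n", finalArray[i]);
Unclear use of char * (each buffer is allocated with malloc but never terminated or freed):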
char *wordFiller = malloc(stringCharCount);
wordFiller = malloc(stringCharCount);
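One way to break the task into a small, testable piece is a helper that skips any run of delimiters before copying each word. The sketch below is only an illustration, not a fix of the posted program: the name split_words and its parameters are made up, and it leans on strspn and strcspn from <string.h> instead of hand-written character comparisons.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits text on any character in delims, treating a
   run of delimiters as one separator. Stores up to max_words newly
   allocated, null-terminated words in words[] and returns how many were
   stored. The caller frees each word. */
size_t split_words(const char *text, const char *delims,
                   char *words[], size_t max_words)
{
    size_t count = 0;

    while (count < max_words) {
        size_t len;

        text += strspn(text, delims);   /* skip any run of delimiters */
        if (*text == '\0') {
            break;
        }
        len = strcspn(text, delims);    /* length of the next word */

        words[count] = malloc(len + 1); /* +1 for the terminator */
        if (words[count] == NULL) {
            break;
        }
        memcpy(words[count], text, len);
        words[count][len] = '\0';
        count++;
        text += len;
    }
    return count;
}
```

Called with "the dog,,,was walking" and the delimiter set " ,.", this yields the four words "the", "dog", "was" and "walking", with no empty entries for the repeated commas. Each word is allocated with room for its terminator, which is the piece missing from the posted malloc(stringCharCount) calls.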
There are more bugs than lines in your code.
I'd suggest you start with something much simpler.
Work through a basic programming book with exercises.
Edit
Or, if this is about learning to program, try another, simpler programming language:
In C# your task looks rather simple:
string givenString = "USA,Canada Mexico,Bermuda.Grenada,Belize";
string[] words = givenString.Split(new char[] {' ', ',', '.'});
foreach (string word in words)
    Console.WriteLine(word);
As you can see, there are far fewer issues to worry about:
No memory management (alloc/free); this is handled by the garbage collector
No pointers, so nothing can go wrong with them
Powerful built-in string capabilities like Split()
foreach makes loops much simpler

Split String into String array

I have been playing around with programming for Arduino, but today I've come across a problem that I can't solve with my very limited C knowledge.
Here's how it goes.
I'm creating a PC application that sends serial input to the Arduino (deviceID, command, commandparameters). This Arduino will transmit that command over RF to other Arduinos. Depending on the deviceID, the correct Arduino will perform the command.
To be able to determine the deviceID, I want to split that string on the ",".
This is my problem: I know how to do this easily in Java (even without using the standard split function), but in C it's a totally different story.
Can any of you tell me how to get this working?
Thanks
/*
Serial Event example
When new serial data arrives, this sketch adds it to a String.
When a newline is received, the loop prints the string and
clears it.
A good test for this is to try it with a GPS receiver
that sends out NMEA 0183 sentences.
Created 9 May 2011
by Tom Igoe
This example code is in the public domain.
http://www.arduino.cc/en/Tutorial/SerialEvent
*/
String inputString; // a string to hold incoming data
boolean stringComplete = false; // whether the string is complete
String[] receivedData;
void setup() {
// initialize serial:
Serial.begin(9600);
// reserve 200 bytes for the inputString:
inputString.reserve(200);
}
void loop() {
// print the string when a newline arrives:
if (stringComplete) {
Serial.println(inputString);
// clear the string:
inputString = "";
stringComplete = false;
}
}
/*
SerialEvent occurs whenever a new data comes in the
hardware serial RX. This routine is run between each
time loop() runs, so using delay inside loop can delay
response. Multiple bytes of data may be available.
*/
void serialEvent() {
while (Serial.available()) {
// get the new byte:
char inChar = (char)Serial.read();
if (inChar == '\n') {
stringComplete = true;
}
// add it to the inputString:
if(stringComplete == false) {
inputString += inChar;
}
// if the incoming character is a newline, set a flag
// so the main loop can do something about it:
}
}
String[] splitCommand(String text, char splitChar) {
int splitCount = countSplitCharacters(text, splitChar);
String returnValue[splitCount];
int index = -1;
int index2;
for(int i = 0; i < splitCount - 1; i++) {
index = text.indexOf(splitChar, index + 1);
index2 = text.indexOf(splitChar, index + 1);
if(index2 < 0) index2 = text.length() - 1;
returnValue[i] = text.substring(index, index2);
}
return returnValue;
}
int countSplitCharacters(String text, char splitChar) {
int returnValue = 0;
int index = -1;
while (index > -1) {
index = text.indexOf(splitChar, index + 1);
if(index > -1) returnValue+=1;
}
return returnValue;
}
I have decided to use the strtok function.
I'm running into another problem now. The error is:
SerialEvent.cpp: In function 'void splitCommand(String, char)':
SerialEvent:68: error: cannot convert 'String' to 'char*' for argument '1' to 'char* strtok(char*, const char*)'
SerialEvent:68: error: 'null' was not declared in this scope
The code looks like this:
String inputString; // a string to hold incoming data
void splitCommand(String text, char splitChar) {
String temp;
int index = -1;
int index2;
for(temp = strtok(text, splitChar); temp; temp = strtok(null, splitChar)) {
Serial.println(temp);
}
for(int i = 0; i < 3; i++) {
Serial.println(command[i]);
}
}
This is an old question, but I have written a piece of code that may help:
String getValue(String data, char separator, int index)
{
int found = 0;
int strIndex[] = {0, -1};
int maxIndex = data.length()-1;
for(int i=0; i<=maxIndex && found<=index; i++){
if(data.charAt(i)==separator || i==maxIndex){
found++;
strIndex[0] = strIndex[1]+1;
strIndex[1] = (i == maxIndex) ? i+1 : i;
}
}
return found>index ? data.substring(strIndex[0], strIndex[1]) : "";
}
This function splits the data on a given separator character and returns the piece at the given index. For example:
String split = "hi this is a split test";
String word3 = getValue(split, ' ', 2);
Serial.println(word3);
This should print 'is'. You can also try index 0, which returns 'hi', or index 5, which safely returns 'test'.
Hope this helps!
Implementation:
int sa[4], r=0, t=0;
String oneLine = "123;456;789;999;";
for (int i=0; i < oneLine.length(); i++)
{
if(oneLine.charAt(i) == ';')
{
sa[t] = oneLine.substring(r, i).toInt();
r=(i+1);
t++;
}
}
Result:
// sa[0] = 123
// sa[1] = 456
// sa[2] = 789
// sa[3] = 999
For dynamic allocation of memory, you will need to use malloc on plain char buffers (not String objects), e.g.:
char *returnvalue[splitcount];
for (int i = 0; i < splitcount; i++)
{
    returnvalue[i] = malloc(maxsizeofstring * sizeof(char));
}
You will also need the maximum string length.
The C way to split a string based on a delimiter is to use strtok (or strtok_r).
See also this question.
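As a rough illustration of the strtok approach in plain C, the sketch below splits a comma-separated command line in place. The helper name split_fields and the sample field layout are assumptions made for this example, not part of the question's sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on commas and stores pointers
   to the fields in fields[]. line must be a writable buffer, because
   strtok overwrites each delimiter with '\0'. Returns the field count. */
size_t split_fields(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *token = strtok(line, ",");

    while (token != NULL && count < max_fields) {
        fields[count++] = token;
        token = strtok(NULL, ","); /* NULL means "continue the same string" */
    }
    return count;
}
```

Given a buffer such as char line[] = "12,ON,255";, the call fills fields with "12", "ON" and "255" and returns 3. Two points relate to the compiler errors quoted in the question: strtok takes a char * and a delimiter string (not an Arduino String and a single char), and the constant is spelled NULL, not null. An Arduino String would first have to be copied into a char buffer, for example with its toCharArray method. Note also that strtok treats consecutive delimiters as one, so empty fields are skipped.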
I think your idea is a good starting point. Here is some code that I use (to parse HTTP GET REST requests with an Ethernet shield).
The idea is to use a while loop with lastIndexOf and store the strings in an array (but you could do something else).
"request" is the string you want to parse (for me it was called request because... it was).
int goOn = 1;
int count = -1;
int pos1;
int pos2 = request.length();
while( goOn == 1 ) {
pos1 = request.lastIndexOf("/", pos2);
pos2 = request.lastIndexOf("/", pos1 - 1);
if( pos2 <= 0 ) goOn = 0;
String tmp = request.substring(pos2 + 1, pos1);
count++;
params[count] = tmp;
// Serial.println( params[count] );
if( goOn != 1) break;
}
// At the end you can know how many items the array will have: count + 1 !
I have used this code successfully, but I think there is an encoding problem when I try to print params[x]... I'm also a beginner, so I haven't mastered chars vs. strings yet.
Hope it helps.
I believe this is the most straightforward and quickest way:
String strings[10]; // Max amount of strings anticipated
void setup() {
Serial.begin(9600);
int count = split("L,-1,0,1023,0", ',');
for (int j = 0; j < count; ++j)
{
if (strings[j].length() > 0)
Serial.println(strings[j]);
}
}
void loop() {
delay(1000);
}
// string: string to parse
// c: delimiter
// returns number of items parsed
int split(String string, char c)
{
    String data = "";
    int bufferIndex = 0;
    for (int i = 0; i < string.length(); ++i)
    {
        char ch = string[i];
        if (ch != c)
        {
            data += ch;
        }
        else
        {
            strings[bufferIndex++] = data;
            data = "";
        }
    }
    // store the last item, which has no trailing delimiter
    strings[bufferIndex++] = data;
    return bufferIndex;
}

Boyer Moore Algorithm Implementation?

Is there a working example of the Boyer-Moore string search algorithm in C?
I've looked at a few sites, but they seem pretty buggy, including wikipedia.
Thanks.
The best site for substring search algorithms:
http://igm.univ-mlv.fr/~lecroq/string/
There are a couple of implementations of Boyer-Moore-Horspool (including Sunday's variant) on Bob Stout's Snippets site. Ray Gardner's implementation in BMHSRCH.C is bug-free as far as I know [1], and definitely the fastest I've ever seen or heard of. It's not, however, the easiest to understand -- he uses some fairly tricky code to keep the inner loop as simple as possible. I may be biased, but I think my version [2] in PBMSRCH.C is a bit easier to understand (though definitely a bit slower).
[1] Within its limits -- it was originally written for MS-DOS, and could use a rewrite for environments that provide more memory.
[2] This somehow got labeled as "Pratt-Boyer-Moore", but is actually Sunday's variant of Boyer-Moore-Horspool (though I wasn't aware of it at the time and didn't publish it, I believe I actually invented it about a year before Sunday did).
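For readers unfamiliar with Sunday's variant, here is a minimal sketch of the idea. It is not the code from BMHSRCH.C or PBMSRCH.C, and the name sunday_find is made up for illustration. The difference from plain Horspool is that the shift is chosen from the text character just past the current window rather than from the one under its last cell.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Sunday's "quick search" variant of
   Boyer-Moore-Horspool. Returns a pointer to the first occurrence of pat
   in text, or NULL. */
const char *sunday_find(const char *text, const char *pat)
{
    size_t shift[UCHAR_MAX + 1];
    size_t text_len = strlen(text);
    size_t pat_len = strlen(pat);
    size_t i;
    size_t pos;

    if (pat_len == 0) {
        return text;
    }
    if (pat_len > text_len) {
        return NULL;
    }

    /* A character absent from the pattern lets the window jump past it. */
    for (i = 0; i <= UCHAR_MAX; i++) {
        shift[i] = pat_len + 1;
    }
    /* Otherwise align the character's rightmost occurrence in the pattern. */
    for (i = 0; i < pat_len; i++) {
        shift[(unsigned char)pat[i]] = pat_len - i;
    }

    pos = 0;
    while (pos <= text_len - pat_len) {
        if (memcmp(text + pos, pat, pat_len) == 0) {
            return text + pos;
        }
        /* text[pos + pat_len] is at worst the terminating '\0'. */
        pos += shift[(unsigned char)text[pos + pat_len]];
    }
    return NULL;
}
```

The sketch uses memcmp for the window comparison to stay short; tuned implementations typically hand-code that inner loop.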
Here is a C90 implementation that I have stressed with a lot of strange test cases:
#include <stddef.h>
#include <string.h>
#ifndef MAX
#define MAX(a,b) (((a) > (b)) ? (a) : (b))
#endif
void fillBadCharIndexTable (
/*----------------------------------------------------------------
function:
the table fits for 8 bit character only (including utf-8)
parameters: */
size_t aBadCharIndexTable [],
char const * const pPattern,
size_t const patternLength)
/*----------------------------------------------------------------*/
{
size_t i;
size_t remainingPatternLength = patternLength - 1;
for (i = 0; i < 256; ++i) {
aBadCharIndexTable [i] = patternLength;
}
for (i = 0; i < patternLength; ++i) {
aBadCharIndexTable [(unsigned char) pPattern [i]] = remainingPatternLength--;
}
}
void fillGoodSuffixRuleTable (
/*----------------------------------------------------------------
function:
the table fits for patterns of length < 256; for longer patterns ... (1 of)
- increase the static size
- use variable length arrays and >= C99 compilers
- allocate (and finally release) heap according to demand
parameters: */
size_t aGoodSuffixIndexTable [],
char const * const pPattern,
size_t const patternLength)
/*----------------------------------------------------------------*/
{
size_t const highestPatternIndex = patternLength - 1;
size_t prefixLength = 1;
/* complementary prefix length, i.e. difference from highest possible pattern index and prefix length */
size_t cplPrefixLength = highestPatternIndex;
/* complementary length of recently inspected pattern substring which is simultaneously pattern prefix and suffix */
size_t cplPrefixSuffixLength = patternLength;
/* too hard to explain in a C source ;-) */
size_t iRepeatedSuffixMax;
aGoodSuffixIndexTable [cplPrefixLength] = patternLength;
while (cplPrefixLength > 0) {
if (!strncmp (pPattern, pPattern + cplPrefixLength, prefixLength)) {
cplPrefixSuffixLength = cplPrefixLength;
}
aGoodSuffixIndexTable [--cplPrefixLength] = cplPrefixSuffixLength + prefixLength++;
}
if (pPattern [0] != pPattern [highestPatternIndex]) {
aGoodSuffixIndexTable [highestPatternIndex] = highestPatternIndex;
}
for (iRepeatedSuffixMax = 1; iRepeatedSuffixMax < highestPatternIndex; ++iRepeatedSuffixMax) {
size_t iSuffix = highestPatternIndex;
size_t iRepeatedSuffix = iRepeatedSuffixMax;
do {
if (pPattern [iRepeatedSuffix] != pPattern [iSuffix]) {
aGoodSuffixIndexTable [iSuffix] = highestPatternIndex - iRepeatedSuffix;
break;
}
--iSuffix;
} while (--iRepeatedSuffix > 0);
}
}
char const * boyerMoore (
/*----------------------------------------------------------------
function:
find a pattern (needle) inside a text (haystack)
parameters: */
char const * const pHaystack,
size_t const haystackLength,
char const * const pPattern)
/*----------------------------------------------------------------*/
{
size_t const patternLength = strlen (pPattern);
size_t const highestPatternIndex = patternLength - 1;
size_t aBadCharIndexTable [256];
size_t aGoodSuffixIndexTable [256];
if (*pPattern == '\0') {
return pHaystack;
}
if (patternLength <= 1) {
return strchr (pHaystack, *pPattern);
}
if (patternLength >= sizeof aGoodSuffixIndexTable / sizeof aGoodSuffixIndexTable [0]) {
/* exit for too long patterns */
return 0;
}
{
char const * pInHaystack = pHaystack + highestPatternIndex;
/* search preparation */
fillBadCharIndexTable (
aBadCharIndexTable,
pPattern,
patternLength);
fillGoodSuffixRuleTable (
aGoodSuffixIndexTable,
pPattern,
patternLength);
/* search execution */
while (pInHaystack++ < pHaystack + haystackLength) {
int iPattern = (int) highestPatternIndex;
while (*--pInHaystack == pPattern [iPattern]) {
if (--iPattern < 0) {
return pInHaystack;
}
}
pInHaystack += MAX (aBadCharIndexTable [(unsigned char) *pInHaystack], aGoodSuffixIndexTable [iPattern]);
}
}
return 0;
}
