Binary search error in finding the longest substring without repeating characters - arrays

I have been working on the following problem:
Given a string s, find the length of the longest substring without repeating characters.
e.g. Input: s = "pwwkew"
Output: 3, i.e. "wke".
I implemented it in C++, but it gives a wrong answer (WA) on this test case, apparently because of how I use binary_search. Here is my implementation:
class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int n = s.size();
        vector<char> s1;
        int maxi = 0;
        for (int i = 0; i < n; i++) {
            if (binary_search(s1.begin(), s1.end(), s[i])) {
                s1.clear();
            }
            s1.push_back(s[i]);
            int p = s1.size();
            maxi = max(maxi, p);
        }
        return maxi;
    }
};
The code handles the other cases correctly, but when the vector holds [w,k,e], binary_search fails to find w in it and the vector becomes [w,k,e,w]. Why does this happen? I understand there is an optimal sliding-window solution to this problem, but what is the error in this implementation?
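A likely explanation: std::binary_search requires the range to be sorted (more precisely, partitioned with respect to the searched value), and [w,k,e] is not sorted, so its result cannot be relied on. Separately, clearing the whole vector on a repeat throws away characters that may still belong to the best substring; for "dvdf" that would give 2 instead of 3. The following is a minimal sketch, assuming a linear std::find is acceptable and that only the prefix up to the earlier occurrence should be dropped. The free function is an illustration, not the original Solution class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest substring without repeating characters.
// std::find is a linear search, so it does not need a sorted range
// (unlike std::binary_search).
int lengthOfLongestSubstring(const std::string& s) {
    std::vector<char> window;  // current substring, free of repeats
    std::size_t best = 0;
    for (char c : s) {
        auto it = std::find(window.begin(), window.end(), c);
        if (it != window.end()) {
            // Drop everything up to and including the earlier occurrence of c.
            window.erase(window.begin(), it + 1);
        }
        window.push_back(c);
        best = std::max(best, window.size());
    }
    return static_cast<int>(best);
}
```

This keeps the vector-based structure of the question; a sliding window over indices would avoid the erase cost.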

Related

How to implement multi keyword search in C?

I want to implement a case-insensitive text search that tests multiple keywords in parallel. I already have a working version, but it does not seem efficient to me.
The function "strcasestr" does a good job when searching for one keyword, but to test multiple keywords simultaneously you would, as I understand it, want to iterate over the characters of the text (haystack) only once to find an occurrence of any of the keywords (needles).
Calling "strcasestr" multiple times causes, as far as I understand, multiple passes over the text (haystack), which might not be the fastest solution. An example:
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
int main (void) {
    // Text to search in
    char *str = "This is a test!";
    char *result = strcasestr(str, "not_found1");
    if (result == NULL) {
        result = strcasestr(str, "NOT_FOUND2");
    }
    if (result == NULL) {
        result = strcasestr(str, "TEST!");
    }
    printf("Result pointer: %s\n", result );
    return 0;
}
Is there a way to get the position of the first occurrence of one of the (case-insensitive) keywords in the text in a faster way than I did it?
I would appreciate it if the solution would be extensible so that I could continue looping over the text to find all positions of the occurrences of the keywords, because I am working on a full-text search with a result rating system. Frameworks and small hints to put me in the right direction are also very welcome.
After a long time of learning and testing I found a solution which is working well for me. I tested a one-keyword version of it and the performance was comparable to the function "strcasestr" (Tested with ca. 500 MB of text).
To explain what the below code does:
First the text (haystack) and the keywords (needles) are defined. The keywords are then converted to lowercase once up front, for performance. iter is an array of counters that record how many characters of each keyword the current position in the text has matched so far. The program iterates linearly over each character of the text until it finds a match for one of the keywords; in that case it stops and the result is "True". If it finds no match, the result is "False".
I welcome tips in the comments for better code quality or higher performance.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main (void) {
    int i, j;
    int match = 0;
    // Haystack
    char *text = "This is a test!";
    // Needles
    int keywords_len = 3;
    char keywords[][12] = {
        "not_found1",
        "NOT_FOUND2",
        "TEST!"
    };
    // Make needles lowercase
    for (i = 0; i < keywords_len; i++)
        for (j = 0; keywords[i][j]; j++)
            keywords[i][j] = tolower((unsigned char)keywords[i][j]);
    // Define counters for keywords matches
    int iter[] = { 0, 0, 0 };
    // Loop over all characters and test match
    char ptext;
    while (!match && (ptext = *text++) != '\0') {
        // tolower() is used instead of (x | 32), which also changes
        // non-letters such as '_' and would stop those keywords matching
        char c = (char)tolower((unsigned char)ptext);
        for (i = 0; i < keywords_len; i++) {
            if (c == keywords[i][iter[i]]) {
                if (keywords[i][++(iter[i])] == '\0') {
                    match = 1;
                    break;
                }
            } else {
                iter[i] = 0;
            }
        }
    }
    printf("Result: %s\n", match ? "True" : "False");
    return 0;
}
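One caveat with the counter approach: when a partial match fails, the counter is reset without re-testing the current character, so an occurrence that overlaps a failed partial match can be missed (for example "test" inside "ttest"). The sketch below is a deliberately simple alternative, written in C++, that checks every start position and reports all occurrences, which also covers the "find all positions" requirement. It trades speed for correctness and makes several passes over parts of the text; a true single-pass solution would typically use the Aho-Corasick algorithm. The names Hit, matchesAt and findAll are hypothetical, chosen for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Hit {
    std::size_t position;  // offset in the text where the keyword starts
    std::size_t keyword;   // index of the keyword that matched
};

// True if `needle` occurs in `text` at offset `pos`, ignoring ASCII case.
// Requires pos < text.size().
static bool matchesAt(const std::string& text, std::size_t pos,
                      const std::string& needle) {
    if (needle.empty() || needle.size() > text.size() - pos) return false;
    for (std::size_t k = 0; k < needle.size(); ++k) {
        unsigned char a = static_cast<unsigned char>(text[pos + k]);
        unsigned char b = static_cast<unsigned char>(needle[k]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Returns every (position, keyword) occurrence, ordered by position.
std::vector<Hit> findAll(const std::string& text,
                         const std::vector<std::string>& keywords) {
    std::vector<Hit> hits;
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        for (std::size_t k = 0; k < keywords.size(); ++k) {
            if (matchesAt(text, pos, keywords[k])) {
                hits.push_back(Hit{pos, k});
            }
        }
    }
    return hits;
}
```

The first element of the returned vector, if any, is the first occurrence of any keyword; the remaining elements can feed a result rating step.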

Inserting word from a text file into a tree in C

I have been encountering a weird problem for the past 2 days and I can't solve it. I am trying to read words from 2 text files and add those words to a tree. The method I chose to read the words is described here:
Splitting a text file into words in C.
The function that I use to insert words into a tree is the following:
void InsertWord(typosWords Words, char * w)
{
    int error ;
    DataType x ;
    x.word = w ;
    printf(" Trying to insert word : %s \n",x.word );
    Tree_Insert(&(Words->WordsRoot),x, &error) ;
    if (error)
    {
        printf("Error Occured \n");
    }
}
As mentioned in the linked post, when I try to import the words from a text file into the tree, I get "Error Occured". The text file:
a
aaah
aaahh
The reading loop, once again:
char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1)
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
}
But when I insert the exact same words in the following way, it works just fine.
for (i = 0 ; i <=2 ; i++)
{
    if (i==0)
        InsertWord(W,"a");
    if (i==1)
        InsertWord(W,"aaah");
    if (i==2)
        InsertWord(W,"aaahh");
}
That shows the tree's functions work fine, but I can't understand what is happening. I have been debugging for 2 days straight and still can't figure it out. Any ideas?
When you read the words using
char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1)
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
}
you are always reusing the same memory buffer for the strings. This means that when you do
x.word = w ;
you are ALWAYS storing the SAME address, and every read overwrites ALL the words already stored, effectively corrupting the data structure.
Try changing char this_word[15]; to char *this_word; and allocating a new buffer with this_word = malloc(15); for each word. The buffer must exist before fscanf writes into it, so allocate once before the loop and again after each insert, something like:
char *this_word = malloc(15);
while (this_word != NULL && fscanf(wordlist, "%14s", this_word) == 1)
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
    this_word = malloc(15); /* fresh buffer for the next word */
}
free(this_word); /* the last buffer was never inserted */
As suggested by Michael Walz, strdup(3) also solves the immediate problem.
Of course, you will also have to free the .word elements when you are finished with the tree.
Seems like the problem was in the assignment of the strings. strdup seemed to solve the problem!
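To illustrate the copy-per-word idea, here is a minimal sketch in C++ with C library calls. It keeps one reusable read buffer and stores a separate heap copy of each word, which is what strdup does internally. Since strdup is POSIX rather than ISO C++, the sketch spells out the equivalent malloc plus strcpy. The function readWords and the in-memory input string are hypothetical stand-ins for the file loop and the tree insert.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Splits `input` into whitespace-separated words (at most 14 chars each)
// and returns one heap-allocated copy per word.
// The caller owns the copies and must free() each pointer.
std::vector<char*> readWords(const char* input) {
    std::vector<char*> words;
    char buffer[15];   // reused for every read, like this_word in the question
    int consumed = 0;  // characters consumed by the last sscanf call
    while (std::sscanf(input, "%14s%n", buffer, &consumed) == 1) {
        // Copy the word so later reads cannot overwrite it.
        char* copy = static_cast<char*>(std::malloc(std::strlen(buffer) + 1));
        if (copy == nullptr) break;
        std::strcpy(copy, buffer);
        words.push_back(copy);
        input += consumed;
    }
    return words;
}
```

Storing `buffer` itself instead of `copy` would make every stored pointer refer to the same 15 bytes, which is the behaviour described in the answer.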

Stack implemented as an array defaulting first value to 0 in C

I have an assignment where I am supposed to use this very simple (or so I thought) stack that my teacher wrote in C, using just an array. With it, I have to implement reverse Polish notation (RPN), reading input from a text file.
To implement this, I push values onto the stack until I hit an operation. I then perform the operation and push the result back onto the stack, until the user enters p to print the value.
The problem is that, for some reason, my professor's implementation of the stack array defaults the first (index 0) value to 0. Printing the stack without pushing anything onto it should result in null, but the output appears to be 0.
Here is my professor's implementation of the stack:
#define STK_MAX 1024
#define ELE int
ELE _stk[STK_MAX];
int _top = 0;
void stk_error(char *msg)
{
    fprintf(stderr, "Error: %s\n", msg);
    exit(-1);
}
int stk_is_full()
{
    return _top >= STK_MAX;
}
int stk_is_empty()
{
    return _top == 0;
}
void stk_push(ELE v)
{
    if ( stk_is_full() )
        stk_error("Push on full stack");
    _stk[_top++] = v;
}
ELE stk_pop()
{
    if ( stk_is_empty() )
        stk_error("pop on empty stack");
    return _stk[--_top];
}
void print()
{
    for(int i = 0; i <= _top; ++i)
        printf("%d ", _stk[i]);
    printf("\n");
}
I realize that the print statement will print a value that has not been pushed yet, but the problem is that even when I don't print it, it still ends up there and breaks my RPN calculator. Here is what happens when I do this:
// input
stk_push(2);
print();
stk_push(4);
print();
// output
2 0
2 4 0
How do I get rid of the 0 value that is affecting my calculator? Calling stk_pop() after pushing the first value onto the stack didn't seem to work, and neither did checking that _top == 0 and then directly inserting that element before incrementing _top.
When printing, loop from 0 to (_top - 1), since your topmost element is actually at _top - 1. Hint: look at your push/pop methods.
void print()
{
    for(int i = 0; i < _top; ++i)
        printf("%d ", _stk[i]);
    printf("\n");
}
"The problem is, is that the rpn calculator relies on the TOS being accurate. When I do pop() though, it will pop 0 and not the real TOS."
Sounds like a problem with your calculator implementation. You assumed the top of the stack would be null, but that's not the case for your professor's stack implementation. It's simply an invalid assumption.
Instead, he's provided a stk_is_empty() function to help you determine when you've popped everything.
If you need to pop all elements, you'll need to stop when stk_is_empty() becomes true.
stk_push(2);
stk_push(4);
while (!stk_is_empty())
{
    stk_pop();
}
Of course, in reality you'd assign the return value of stk_pop() to a variable and do something with it. The key point is leveraging stk_is_empty().
I haven't written C++ in a few years, so hopefully I didn't make a minor syntax error.
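To show how an RPN evaluator can rely only on push, pop and the emptiness check, here is a minimal sketch in C++. It uses the same convention as the stack in the question: top is the index of the next free slot, so the valid elements are data[0] through data[top - 1]. IntStack and evalRpn are hypothetical names, and throwing exceptions instead of calling exit is an assumption made to keep the sketch testable.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Array-backed stack: `top` is the index of the next free slot,
// so valid elements are data[0 .. top-1] and data[top] is never read.
struct IntStack {
    static const int kMax = 1024;
    int data[kMax];
    int top = 0;

    bool empty() const { return top == 0; }
    void push(int v) {
        if (top >= kMax) throw std::overflow_error("push on full stack");
        data[top++] = v;
    }
    int pop() {
        if (empty()) throw std::underflow_error("pop on empty stack");
        return data[--top];
    }
};

// Evaluates a whitespace-separated RPN expression of integers and + - * /.
int evalRpn(const std::string& expr) {
    IntStack stack;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            int rhs = stack.pop();  // the right operand was pushed last
            int lhs = stack.pop();
            int result = 0;
            if (tok == "+") result = lhs + rhs;
            else if (tok == "-") result = lhs - rhs;
            else if (tok == "*") result = lhs * rhs;
            else {
                if (rhs == 0) throw std::domain_error("division by zero");
                result = lhs / rhs;
            }
            stack.push(result);
        } else {
            stack.push(std::stoi(tok));  // operand
        }
    }
    int value = stack.pop();
    if (!stack.empty()) throw std::invalid_argument("leftover operands");
    return value;
}
```

Because pop decrements before reading, the slot at index top is never consulted, so whatever value sits there cannot affect the result.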

Entry Point Obscuring

I've been writing an EPO program. So far I've been able to find a call opcode, get the RVA from the address that follows it in the binary, and then parse the IAT to get the names of the imported functions and their corresponding RVAs.
I've run into a problem when trying to fill arrays with the names and RVAs and then compare the WORD value I have from the call address against the RVAs of all the imported functions.
Here's the code I've been working with:
//Declarations.
DWORD dwImportDirectoryVA,dwSectionCount,dwSection=0,dwRawOffset;
PIMAGE_IMPORT_DESCRIPTOR pImportDescriptor;
PIMAGE_THUNK_DATA pThunkData, pFThunkData;
// Arrays to hold names + rva's
unsigned long namearray[100];
DWORD rvaArray[100];
int i = 0;
And the rest:
/* Import Code: */
dwSectionCount = pNtHeaders->FileHeader.NumberOfSections;
dwImportDirectoryVA = pNtHeaders->OptionalHeader.DataDirectory[1].VirtualAddress;
for(;dwSection < dwSectionCount && pSectionHeader->VirtualAddress <= dwImportDirectoryVA;pSectionHeader++,dwSection++);
pSectionHeader--;
dwRawOffset = (DWORD)hMap+pSectionHeader->PointerToRawData;
pImportDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(dwRawOffset+(dwImportDirectoryVA-pSectionHeader->VirtualAddress));
for(;pImportDescriptor->Name!=0;pImportDescriptor++)
{
    pThunkData = (PIMAGE_THUNK_DATA)(dwRawOffset+(pImportDescriptor->OriginalFirstThunk-pSectionHeader->VirtualAddress));
    pFThunkData = (PIMAGE_THUNK_DATA)pImportDescriptor->FirstThunk;
    for(;pThunkData->u1.AddressOfData != 0;pThunkData++)
    {
        if(!(pThunkData->u1.Ordinal & IMAGE_ORDINAL_FLAG32))
        {
            namearray[i] = (dwRawOffset+(pThunkData->u1.AddressOfData-pSectionHeader->VirtualAddress+2));
            rvaArray[i] = pFThunkData;
            i++;
            //
            pFThunkData++;
        }
    }
}
printf("\nFinished.\n");
for (i = 0 ; i <= 100 ; i++)
{
    //wRva is defined and initialized earlier in code.
    if (rvaArray[i] == wRva)
    {
        printf("Call to %s found. Address: %X\n", namearray[i], rvaArray[i]);
    }
}
NOTE: A lot of this code has been stripped down (printf statements used to track progress were removed).
The problem is the types of the arrays I've been using. I'm not sure how to store pThunkData (names) and pFThunkData (RVAs) correctly for later use.
I've tried a few things and messed around with the code, but I'm admitting defeat and asking for your help.
You could create a list or array of structs, each containing pThunkData and pFThunkData.
#define MAX_IMPORTS 100
struct pdata
{
    PIMAGE_THUNK_DATA p_thunk_data;
    PIMAGE_THUNK_DATA pf_thunk_data;
};
struct pdata pdatas[MAX_IMPORTS];

Ford-Fulkerson algorithm with depth first search

I am doing homework on implementing the Ford-Fulkerson algorithm. We were told to use DFS for path finding, but I am stuck. I am not posting the code because it is too localized. My DFS algorithm works well, but the dead ends cause a problem. For example, if I run my code I get DFS output like this:
0=>1 1=>2 1=>3 3=>5
It starts from 0 and ends at 5, but the 1=>2 part is unnecessary for my algorithm. I store my path in an [N][2] matrix. My question is: how can I remove the dead ends from the resulting matrix (inside the DFS recursion, perhaps)?
You should perform the DFS to find some path between the source and the sink. Then, as the DFS returns, you should add the flow to each edge on that path. Branches that end in a dead end return 0 and are never updated, so they need no separate removal.
Here's an example. The function "send" is a DFS. Notice that I pass the minimum capacity found so far along with the DFS:
https://github.com/juanplopes/icpc/blob/master/uva/820.cpp
int send(int s, int t, int minn) {
    V[s] = true;
    if (s==t) return minn;
    for(int i=1; i<=n; i++) {
        int capacity = G[s][i]-F[s][i];
        if (!V[i] && capacity > 0) {
            if (int sent = send(i, t, min(minn, capacity))) {
                F[s][i] += sent;
                F[i][s] -= sent;
                return sent;
            }
        }
    }
    return 0;
}
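For completeness, here is a sketch of how such a DFS might be driven until no augmenting path remains. It is an assumption-laden illustration, not the code from the linked file: the MaxFlow struct, addEdge and maxFlow are hypothetical names, nodes are numbered from 0 rather than 1, and the graph is stored as dense matrices. The visited marks are cleared before every search, and each successful search adds its bottleneck to the total.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

struct MaxFlow {
    int n;                            // number of nodes, numbered 0 .. n-1
    std::vector<std::vector<int>> G;  // G[u][v]: capacity of edge u -> v
    std::vector<std::vector<int>> F;  // F[u][v]: current flow on u -> v
    std::vector<bool> V;              // visited marks for the current DFS

    explicit MaxFlow(int nodes)
        : n(nodes),
          G(nodes, std::vector<int>(nodes, 0)),
          F(nodes, std::vector<int>(nodes, 0)),
          V(nodes, false) {}

    void addEdge(int u, int v, int capacity) { G[u][v] += capacity; }

    // DFS that carries the smallest residual capacity seen so far.
    // Flow is recorded only while unwinding from a path that reached t.
    int send(int s, int t, int minn) {
        V[s] = true;
        if (s == t) return minn;
        for (int i = 0; i < n; i++) {
            int capacity = G[s][i] - F[s][i];
            if (!V[i] && capacity > 0) {
                int sent = send(i, t, std::min(minn, capacity));
                if (sent > 0) {
                    F[s][i] += sent;
                    F[i][s] -= sent;  // opens residual capacity backwards
                    return sent;
                }
            }
        }
        return 0;  // dead end: nothing is recorded for this branch
    }

    int maxFlow(int s, int t) {
        if (s == t) return 0;
        int total = 0;
        for (;;) {
            std::fill(V.begin(), V.end(), false);
            int sent = send(s, t, INT_MAX);
            if (sent == 0) break;
            total += sent;
        }
        return total;
    }
};
```

With the edges from the question (0=>1, 1=>2, 1=>3, 3=>5) and any positive capacities, the 1=>2 branch is visited but its flow stays at 0, since the update happens only on the way back from the sink.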

Resources