C: Help with Custom strpos() Function

I have the following function:
int strpos(const char *needle, const char *haystack)
{
int neLen, haLen, foundPos, nePos, i;
char temp;
neLen = strlen(needle);
haLen = strlen(haystack);
if(haLen < neLen)
return -1;
nePos = 0;
foundPos = -1;
i = 0;
while((temp = *haystack++) != '\0'
&& (i < (haLen-neLen+1) || foundPos > -1)
&& nePos < neLen)
{
if(temp == *needle+nePos)
{
if(nePos == 0)
foundPos = i;
nePos++;
}
else
{
nePos = 0;
foundPos = -1;
}
i++;
}
return foundPos;
}
It works properly when I search for a single character:
printf("Strpos: %d\n", strpos("a", "laoo")); // Result: "Strpos: 1"
But it behaves improperly with a longer string:
printf("Strpos: %d\n", strpos("ao", "laoo")); // Result: "Strpos: -1"
What is the problem?
Bonus question: is the while loop properly broken into multiple lines? What is the accepted way to do this?
EDIT: strlen() is, naturally, a custom function that returns the length of the string. This works properly.

Each time you go round the loop you get the next character from haystack. So if needle has two characters by the time you have finished comparing needle with the substring of haystack beginning at position 0, the haystack pointer is pointing at position 2 (for a two character needle).
This means that you skip comparing needle with the substring of haystack beginning at position 1.

The solution is of the standard bang-your-head-against-the-wall-in-an-infinite-loop-and-wonder-why-the-hell-you're-a-programmer variety.
if(temp == *needle+nePos)
Should be:
if(temp == *(needle+nePos))
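Putting both answers together, a version that fixes the dereference and also retries from every start position might look like the sketch below. The name find_pos is made up for illustration, and it assumes the standard strlen() rather than the custom one mentioned in the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for the question's strpos(); keeps the same
   (needle, haystack) argument order and returns -1 when there is no match. */
int find_pos(const char *needle, const char *haystack)
{
    size_t ne_len = strlen(needle);
    size_t ha_len = strlen(haystack);

    if (ha_len < ne_len)
        return -1;

    /* Try every start position that leaves room for the whole needle. */
    for (size_t start = 0; start + ne_len <= ha_len; start++) {
        size_t k = 0;
        /* *(needle + k) is the same as needle[k]; note the parentheses. */
        while (k < ne_len && haystack[start + k] == *(needle + k))
            k++;
        if (k == ne_len)
            return (int)start;
    }
    return -1;
}
```

Because the inner comparison always restarts at start + 0, a partial match never causes a start position to be skipped.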

Related

Count the occurrences of every word in C

I want to count the occurrences of every word of this small text " A broken heart of a broken mind ."
Every word of this text is stored in a 2D array[100][20], where 100 is max_words and 20 is max_word_length. I also have a pointers array[100] in which every pointer points to a word. I can't find a clever way to count the repeated words,
for example
a: 2 times
broken: 2 times
heart: 1 time
mind: 1 time
. : 1 time
These would be the pointers and the words array:
POINTERS ARRAY               WORDS ARRAY
point0 (points "a")          a
point1 (points "broken")     broken
point2 (points "heart")      heart
point3 (points "of")         of
point4 (points "a")          mind
point5 (points "broken")     .
point6 (points "mind")       \0\0\0\0\0
point7 (points ".")          \0\0\0\0\0
NULL                         ..
NULL
..
NULL                         \0\0\0\0\0
Side note: Every word is lowercase.
void frequence_word(char *pointers[], int frequence_array[]) {
int word = 0;
int i;
int count = 1;
int check[MAX_WORDS];
for (word = 0; word < MAX_WORDS; word++) {
check[word] = -1;
}
for (word = 0; word < MAX_WORDS; word++) {
count = 1;
for (i = word + 1; i < MAX_WORDS; i++) {
if (pointers[word + 1] != NULL
&& strcmp(pointers[word], pointers[i]) == 0) {
count++;
check[i] = 0;
}
}
if (check[word] != 0) {
check[word] = count;
}
}
}
Any ideas please?
This seems like a use case for strstr. You can call strstr in a loop, restarting the search just past each match, until it returns NULL.
const char substring[] = "broken"; /* the word to count */
const char* total = ...; /* the text to search */
const char* result;
long count = 0;
while ((result = strstr(total, substring)) != NULL) {
count++;
total = result + (sizeof(substring) - 1);
}
I think this is mostly self-explanatory, but I will explain this line:
total = result + (sizeof(substring) - 1);
It moves the search position just past the match that was found. sizeof applied to an array yields its size in bytes, which for a char array is the number of characters including the null terminator, so we subtract one to get the length of the word. Note that strstr matches anywhere in the text, so searching for "a" this way would also count the a inside "heart".
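For the NULL-terminated pointer array described in the question, a direct pairwise comparison with strcmp is another option. The following is only a sketch: the function name count_words and the convention of storing 0 for repeated words are assumptions made for illustration, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* counts[i] receives the number of occurrences of pointers[i] when i is the
   first occurrence of that word, and 0 for later repeats. Returns the number
   of words seen before the first NULL pointer (at most max_words). */
size_t count_words(char *pointers[], int counts[], size_t max_words)
{
    size_t n = 0;
    while (n < max_words && pointers[n] != NULL)
        n++;

    for (size_t i = 0; i < n; i++)
        counts[i] = -1;              /* -1 means "not visited yet" */

    for (size_t i = 0; i < n; i++) {
        if (counts[i] == 0)
            continue;                /* already counted as a repeat */
        counts[i] = 1;
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(pointers[i], pointers[j]) == 0) {
                counts[i]++;
                counts[j] = 0;       /* mark so it is not reported twice */
            }
        }
    }
    return n;
}
```

To print the result, loop over the first n entries and skip those whose count is 0. This is quadratic in the number of words, which is fine for 100 words; a sort or a hash table would scale better.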

Trying to delete a specific character from a string in C?

I'm trying to delete a specific character (?) from the end of a string and return a pointer to a string, but it's not removing it at all at the moment. What am I doing wrong? Is there a better way to go about it?
char * word_copy = malloc(strlen(word)+1);
strcpy(word_copy, word);
int length = strlen(word_copy);
int i = 0;
int j = 0;
for (i = 0; word_copy[i] != '\0'; i++) {
if (word_copy[length - 1] == '?' && i == length - 1){
break;
}
}
for (int j = i; word_copy[j] != '\0'; j++) {
word_copy[j] = word_copy[j+1];
}
word = strdup(word_copy);
I'm immediately seeing a couple of problems.
The first for loop does no real work. Its outcome depends only on whether the last character is ?, so it could be replaced with a single if statement.
if (word_copy[length - 1] == '?') {
    i = length - 1;
} else {
    i = length;
}
The second for loop also acts as an if statement since it starts at the end of the string and can only ever run 0 or 1 times.
You could instead do something like this to remove the ?. This code returns a newly malloced string with the last character removed if it is a ?.
char *remove_question_mark(char *word) {
unsigned int length = strlen(word);
if (length == 0) {
return calloc(1, 1);
}
if (word[length - 1] == '?') {
char *word_copy = malloc(length);
// Copy up to '?' and put null terminator
memcpy(word_copy, word, length - 1);
word_copy[length - 1] = 0;
return word_copy;
}
char *word_copy = malloc(length + 1);
memcpy(word_copy, word, length + 1);
return word_copy;
}
Or, if you are feeling lazy, you could just overwrite the trailing ? with a new null terminator in place. This leaves one byte of the allocation unused, but that is usually an acceptable loss. It should also be a fair bit faster since it doesn't need to allocate any new memory or copy the string. Note that it modifies word, so word must point to writable memory (not a string literal).
size_t length = strlen(word);
if (length > 0 && word[length - 1] == '?') {
    word[length - 1] = 0;
}
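Wrapped up as a small reusable function, the in-place approach could look like this. The name strip_trailing and the boolean return value are illustrative choices, not part of the original answer.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: removes one trailing `ch` from `s` in place.
   Returns true if a character was removed. `s` must be writable. */
bool strip_trailing(char *s, char ch)
{
    size_t length = strlen(s);
    if (length > 0 && s[length - 1] == ch) {
        s[length - 1] = '\0';   /* overwrite the last character, not the terminator */
        return true;
    }
    return false;
}
```

Call it on a writable buffer such as char word[] = "what?"; passing a string literal directly would be undefined behaviour.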

Check if Char Array contains special sequence without using string library on Unix in C

Let's assume we have a char array and a sequence. We would like to check whether the char array contains the sequence WITHOUT the <string.h> library: if yes, return true; if no, return false.
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
for (int i = 0; i < sizeof(Array); i++) {
for (int s = 0; s < sizeof(Sequence); s++) {
if (Array[i] == Sequence[i]) {
// How to check if Sequence is contained ?
}
}
}
return false;
}
// in Main Function
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained) {
printf("Contained\n");
} else {
printf("Not Contained\n");
}
Any ideas, suggestions, websites ... ?
Thanks in advance,
Regards, from ∆
The simplest way is the naive search function:
for (i = 0; i < lenS1; i++) {
    for (j = 0; j < lenS2; j++) {
        if (arr[i + j] != seq[j]) {
            break; // seq is not present in arr at position i!
        }
    }
    if (j == lenS2) {
        return true;
    }
}
Note that you cannot use sizeof here, because Array and Sequence are pointers and the string length is not known at compile time. sizeof will return the pointer size, so almost certainly four or eight regardless of the strings you use. You need to calculate the string lengths explicitly, which in C is done by relying on the fact that the last character of a string is a zero:
lenS1 = 0;
while (arr[lenS1]) lenS1++;
lenS2 = 0;
while (seq[lenS2]) lenS2++;
An obvious and easy improvement is to limit i to the range 0 to lenS1 - lenS2 inclusive, and to return false immediately if lenS1 < lenS2. Obviously if you haven't found "HELLO" in "WELCOME" by the time you've gotten past the 'L', there's no chance of five-character HELLO ever being contained in the four-character remainder COME:
if (lenS1 < lenS2) {
    return false; // You will never find "PEACE" in "WAR".
}
lenS1minuslenS2 = lenS1 - lenS2;
for (i = 0; i <= lenS1minuslenS2; i++)
Further improvements depend on your use case.
Looking for the same sequence among lots of arrays, looking for different sequences always in the same array, looking for lots of different sequences in lots of different arrays - all call for different optimizations.
The length and distribution of characters within both array and sequence also matter a lot, because if you know that there only are (say) three E's in a long string and you know where they are, and you need to search for HELLO, there's only three places where HELLO might fit. So you needn't scan the whole "WE WISH YOU A MERRY CHRISTMAS, WE WISH YOU A MERRY CHRISTMAS AND A HAPPY NEW YEAR" string. Actually you may notice there are no L's in the array and immediately return false.
A balanced option for an average use case (it does have pathological cases) might be supplied by the Boyer-Moore string matching algorithm (C source and explanation supplied at the link). This has a setup cost, so if you need to look for different short strings within very large texts, it is not a good choice (there is a parallel-search version which is good for some of those cases).
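To give a flavour of that family of algorithms, here is a sketch of the simpler Boyer-Moore-Horspool variant, which keeps only a "bad character" shift table. It is not the full Boyer-Moore algorithm, and the function name contains_horspool is made up for illustration. Lengths are computed by hand so that <string.h> is not needed.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `pat` occurs in `text` (Boyer-Moore-Horspool sketch). */
bool contains_horspool(const char *text, const char *pat)
{
    size_t n = 0, m = 0;
    while (text[n]) n++;
    while (pat[m]) m++;
    if (m == 0) return true;     /* the empty pattern matches everywhere */
    if (n < m) return false;

    /* shift[c] = how far the window may move when its last character is c. */
    size_t shift[UCHAR_MAX + 1];
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    for (size_t k = 0; k + 1 < m; k++)
        shift[(unsigned char)pat[k]] = m - 1 - k;

    size_t pos = 0;
    while (pos + m <= n) {
        /* Compare the window against the pattern from right to left. */
        size_t k = m;
        while (k > 0 && text[pos + k - 1] == pat[k - 1])
            k--;
        if (k == 0)
            return true;
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return false;
}
```

The table costs a fixed 256 entries to set up per pattern, which is the setup cost mentioned above; it pays off when the text is long relative to the pattern.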
This is not the most efficient algorithm but I do not want to change your code too much.
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
size_t mystrlen(const char *str)
{
const char *end = str;
while(*end++);
return end - str - 1;
}
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
bool result = false;
size_t s, i;
size_t arrayLen = mystrlen(Array);
size_t sequenceLen = mystrlen(Sequence);
if(sequenceLen <= arrayLen)
{
for (i = 0; i < arrayLen; i++) {
for (s = 0; s < sequenceLen; s++)
{
if (Array[i + s] != Sequence[s])
{
break;
}
}
if(s == sequenceLen)
{
result = true;
break;
}
}
}
return result;
}
int main()
{
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained)
{
printf("Contained\n");
}
else
{
printf("Not Contained\n");
}
}
Basically this is strstr, extended to skip the first n matches:
const char* strstrn(const char* orig, const char* pat, int n)
{
    const char* it = orig;
    do
    {
        const char* tmp = it;
        const char* tmp2 = pat;
        while (*tmp2 != '\0' && *tmp == *tmp2) {
            tmp++;
            tmp2++;
        }
        /* The whole pattern matched at `it`; return it once n earlier
           matches have been skipped. */
        if (*tmp2 == '\0' && n-- == 0)
            return it;
    } while (*it++ != '\0');
    return NULL;
}
The above returns a pointer to the n-th match (counting from 0) of the pattern in the string, or NULL if there are fewer matches.

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
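param[j] = buffer; k = 0; : every param[j] points to the same buffer, and you restart writing at its beginning, erasing the previous words
buffer[k+1] = '\0'; : the terminator is written one position too far (it should be buffer[k]), which leaves a random character after the word
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true or false, so one of the first two branches is always taken and the third is never reached.
if (line[i] == '\0') : you forget to terminate the current word with a '\0', and param is never terminated with NULL
if there are multiple white spaces, you currently (try to) add empty words in param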
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
param[j] = &(param[j-1][k+1]);
k = 0;
if(j == 20) { // param[20] is the last slot, keep it for the NULL terminator
param[j] = NULL;
break;
}
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in your code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seem right to me
param[j] = buffer;
because you keep assigning the same address, buffer, to every param[j].
I would suggest you copy all the chars from line[120] to buffer[120], then point param[j] to buffer + Next_Word_Position.
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!
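As a rough sketch of that loop, assuming the caller provides a writable line and an output array with room for max_words + 1 pointers (the name split_words and its signature are made up for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Splits `line` in place on whitespace and stores up to max_words token
   pointers in words[], followed by a NULL. words[] therefore needs
   max_words + 1 slots. Returns the number of words stored. */
size_t split_words(char *line, char *words[], size_t max_words)
{
    size_t count = 0;
    /* The characters isspace() accepts in the default "C" locale. */
    const char *delims = " \t\r\n\v\f";

    char *tok = strtok(line, delims);      /* first call: pass the string */
    while (tok != NULL && count < max_words) {
        words[count++] = tok;
        tok = strtok(NULL, delims);        /* later calls: pass NULL */
    }
    words[count] = NULL;
    return count;
}
```

strtok overwrites the delimiters in line with '\0' and skips runs of consecutive delimiters, so no empty words are produced. The stored pointers refer into line itself, so they are only valid while line is.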

C - Largest String From a Big One

So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...
Use strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use linked list.
As simple as this.
EDIT
OK, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace all non-letter characters with '\0' (NUL) - basically that's what strtok() does.
Then you need to go through your modified source string second time, advancing one token at a time, and find the longest one, using strlen().
This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NUL.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character, stop counting and throw away the starting point. If you hit a NUL, check to see if the length of the current run is greater than the previous record holder. If so, record it, and start looking for the next string.
What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
    if P(s[i]):  # current char is good
        if status == bad:  # previous one was bad
            current_run_start = current_run_end = i
            status = good
        else:  # previous one was also good
            current_run_end = i
    else:  # current char is bad
        if status == good:  # previous one was good -> end of run
            current_run_length = current_run_end - current_run_start + 1
            if current_run_length > longest_run_length:
                longest_run_start = current_run_start
                longest_run_end = current_run_end
                longest_run_length = current_run_length
        status = bad
# if a good run ends with end-of-string:
if status == good:  # previous one was good -> end of run
    current_run_length = current_run_end - current_run_start + 1
    if current_run_length > longest_run_length:
        longest_run_start = current_run_start
        longest_run_end = current_run_end
        longest_run_length = current_run_length
Why use strlen() at all?
Here's my version which uses no function whatsoever.
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif
While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength] = 0;
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
printf(" longest: \"%s\"\n\n", longest(argv[i]));
}
return 0;
}
This is my solution in simple C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
The criteria for what constitutes a letter can easily be changed in the isLetter() function. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
would count periods, commas and spaces as 'letters' as well.
As per your update:
replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
However, this problem would be much more easily solved with the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
curString = strtok(s, tokens);
while( curString != NULL ) { // strtok returns NULL when no token is left
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
curString = strtok(NULL, tokens);
}
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}
First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.
Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}

Resources