C String -- Sort by first-word length [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
I'm self-studying C and doing an exercise that, among other things, asks me to sort a list of user-entered strings by length of the first word in the string. The other functions in the exercise (including sorting the string by entire length) were easy to write. I've been working on this one for over three hours and can't get it to work. I'm sorting an array of pointers-to-char, and then printing them with a for loop in the main() function.
There's probably a much easier way to do this, but even if so, I cannot understand why this function doesn't work. I've made about thirty changes to it and the sort still comes out pretty random.
void srtlengthw(char * strings[], int n)
{
int top, seek, ct, ct_temp, i;
int ar_ct[n];
char * temp;
bool inWord;
for (top = 0, ct = 0, i = 0, inWord = false; top < n - 1; top++)
{
while (strings[top][i])
{
if (!isblank(strings[top][i]))
{
i++;
ct++;
inWord = true;
}
else if (!inWord)
i++;
else
break;
}
ar_ct[top] = ct;
for (seek = top + 1, ct = 0, i = 0, inWord = false; seek < n; seek++)
{
while(strings[seek][i])
{
if (!isblank(strings[seek][i]))
{
i++;
ct++;
inWord = true;
}
else if (!inWord)
i++;
else
break;
}
ar_ct[seek] = ct;
if (ar_ct[top] > ar_ct[seek])
{
ct_temp = ar_ct[top];
ar_ct[top] = ar_ct[seek];
ar_ct[seek] = ct_temp;
temp = strings[top];
strings[top] = strings[seek];
strings[seek] = temp;
}
}
}}
Example of wrong output, as requested:
Input:
Mary
had
a
little
lamb
that
was
sacrificed
to
Satan
=========
Output:
had
a
little
lamb
that
was
sacrificed
to
Mary
Satan
And here's an example of a much simpler function that worked properly. It's meant to sort the pointers by length of the entire string rather than just the first word. I tried to model the word-length sort function on this one, but I'm apparently having trouble getting my counter variables, and maybe my bool flag, right.
void srtlength(char * strings[], int n)
{
int top, seek;
char * temp;
for (top = 0; top < n - 1; top++)
for (seek = top + 1; seek < n; seek++)
if (strlen(strings[top]) > strlen(strings[seek]))
{
temp = strings[top];
strings[top] = strings[seek];
strings[seek] = temp;
}
}
For Craig, hopefully this helps?
Input:
They say it's lonely at the top, and whatever you do
You always gotta watch m*********s around you
Nobody's invincible
No plan is foolproof
We all must meet our moment of truth
The same sheisty cats that you hang with and do your thang with
Could set you up and wet you up, n***a, peep the language
It's universal
You play with fire, it may hurt you, or burn you
Lessons are blessins you should learn through
Output for me:
You always gotta watch m********s around you
Nobody's invincible
No plan is foolproof
We all must meet our moment of truth
The same sheisty cats that you hang with and do your thang with
Could set you up and wet you up, n***a, peep the language
It's universal
You play with fire, it may hurt you, or burn you
Lessons are blessins you should learn through
They say it's lonely at the top, and whatever you do

If you're looking for output similar to that of the example code that you posted, then I suggest using it as a template for a version with your expected behavior. The key that I'm looking to point out is that it sorts by the return value of the strlen function.
strlen is declared in C's <string.h> header and returns the length of a C-style string. In C, as you're probably aware, the end of a string is marked by a null terminator, written '\0'.
While the precise implementation varies from one library to another, here is a simplified version (renamed my_strlen so that it doesn't clash with the library function):
size_t my_strlen(const char * str){
const char * l;
for(l = str; *l != '\0'; l++);
return l - str;
}
People will likely argue that there are problems with this and it isn't perfect, but it does hopefully show how the length of a string is determined.
Now that we understand that the last example sorts by the total string length, and we know how string length is determined, we can probably make our own version of strlen that stops after the first word, instead of stopping at the null terminator:
int blank_strlen(char * str){
char * l;
for(l = str; *l != '\0' && !isblank(*l); l++);
return l - str;
}
Now, using the example code given:
void blank_srtlength(char * strings[], int n)
{
int top, seek;
char * temp;
for (top = 0; top < n - 1; top++)
for (seek = top + 1; seek < n; seek++)
if (blank_strlen(strings[top]) > blank_strlen(strings[seek]))
{
temp = strings[top];
strings[top] = strings[seek];
strings[seek] = temp;
}
}
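One thing blank_strlen does not handle is a line that begins with blanks: it would report a first-word length of 0. Here is a minimal sketch that skips leading blanks first. The names first_word_len and sort_by_first_word are made up for this example and are not from the question.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the first blank-delimited word in str, ignoring leading blanks.
   first_word_len is a hypothetical name used only for this sketch. */
size_t first_word_len(const char *str)
{
    /* Cast to unsigned char: passing a negative char value to isblank()
       is undefined behavior. */
    while (*str != '\0' && isblank((unsigned char)*str))
        str++;

    const char *end = str;
    while (*end != '\0' && !isblank((unsigned char)*end))
        end++;

    return (size_t)(end - str);
}

/* Same exchange sort as srtlength, keyed on first_word_len instead of strlen. */
void sort_by_first_word(char *strings[], int n)
{
    for (int top = 0; top < n - 1; top++)
        for (int seek = top + 1; seek < n; seek++)
            if (first_word_len(strings[top]) > first_word_len(strings[seek]))
            {
                char *temp = strings[top];
                strings[top] = strings[seek];
                strings[seek] = temp;
            }
}
```

Only the pointers are swapped, so this works on an array of string literals as well as on user-entered buffers.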

millinon's answer is a much better way to do it, as it is simpler. However, if you are looking for the reason why your code isn't working, it is due to your variables only being reset outside of each loop.
This code:
for (seek = top + 1, ct = 0, i = 0, inWord = false; seek < n; seek++)
{
while(strings[seek][i])
only sets ct, i and inWord once, before the loop is first started. When the program loops around, the values of ct, i and inWord will be kept from the last iteration.
Moving the assignments inside the loop like this:
for (seek = top + 1; seek < n; seek++)
{
ct = 0;
i = 0;
inWord = false;
while(strings[seek][i])
will fix your problem (you have to do it in both places).
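For reference, this is roughly what the whole function looks like with the resets moved inside both loops. It is a sketch that keeps the structure of the question's code; the early return for n < 2 and the unsigned char casts are my additions, not part of the original.

```c
#include <ctype.h>
#include <stdbool.h>

void srtlengthw(char *strings[], int n)
{
    if (n < 2)
        return;                 /* nothing to sort; also keeps the array size positive */

    int ar_ct[n];               /* first-word length per string (C99 VLA, as in the question) */
    int top, seek, ct, ct_temp, i;
    char *temp;
    bool inWord;

    for (top = 0; top < n - 1; top++)
    {
        ct = 0; i = 0; inWord = false;          /* reset for every string */
        while (strings[top][i])
        {
            if (!isblank((unsigned char)strings[top][i]))
            {
                i++; ct++; inWord = true;
            }
            else if (!inWord)
                i++;                            /* still skipping leading blanks */
            else
                break;                          /* first word has ended */
        }
        ar_ct[top] = ct;

        for (seek = top + 1; seek < n; seek++)
        {
            ct = 0; i = 0; inWord = false;      /* reset here as well */
            while (strings[seek][i])
            {
                if (!isblank((unsigned char)strings[seek][i]))
                {
                    i++; ct++; inWord = true;
                }
                else if (!inWord)
                    i++;
                else
                    break;
            }
            ar_ct[seek] = ct;

            if (ar_ct[top] > ar_ct[seek])
            {
                ct_temp = ar_ct[top];
                ar_ct[top] = ar_ct[seek];
                ar_ct[seek] = ct_temp;
                temp = strings[top];
                strings[top] = strings[seek];
                strings[seek] = temp;
            }
        }
    }
}
```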

Related

Segmentation Fault in pset2 Readability

I am working on the readability problem in pset 2. I get a segmentation fault, and I believe I have narrowed it down to two functions I created that calculate the number of words and letters in the user-entered text.
//function that counts letters
int count_letters(string t)
{
int letters = 0;
for (int i = 0, len = strlen(t); i < len; i++)
{
if (isalpha(t))
{
letters++;
}
}
return letters;
}
//function that counts words
int count_words(string t)
{
int words = 1;
for (int x = 0, len = strlen(t); x < len; x++)
{
if (isspace(t))
{
words++;
}
}
return words;
}
I am unsure how to fix the issue and would be open to any advice.
P.S. Sorry for any formatting issues; this is my first time posting to Stack Overflow.

First, please describe what you're talking about. It's not at all clear what the "readability problem in pset 2" is. Googling gave me some idea, but we shouldn't have to do that.
Second, isalpha() and isspace() take a char, not a string. You probably want
if (isalpha(t[i]))
Plus, you currently don't even use the variable 'i'. You should also check whether 't' is a NULL pointer before dereferencing it; that is a common cause of segfaults.
Third, the count_words() routine will always return at least 1. What if the passed in string is empty or only whitespace?
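Putting the three points together, a possible version of both functions might look like the sketch below. It takes const char * rather than the course's string type (an assumption on my part, to keep the example free of course-specific headers) and counts a word at each transition from whitespace to non-whitespace, so an empty or all-whitespace string yields 0.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts alphabetic characters; returns 0 for a NULL pointer. */
int count_letters(const char *t)
{
    if (t == NULL)
        return 0;

    int letters = 0;
    for (size_t i = 0; t[i] != '\0'; i++)
    {
        /* Test one character, not the pointer itself. */
        if (isalpha((unsigned char)t[i]))
            letters++;
    }
    return letters;
}

/* Counts whitespace-separated words; returns 0 for NULL, "" or "   ". */
int count_words(const char *t)
{
    if (t == NULL)
        return 0;

    int words = 0;
    bool in_word = false;
    for (size_t i = 0; t[i] != '\0'; i++)
    {
        if (isspace((unsigned char)t[i]))
            in_word = false;
        else if (!in_word)
        {
            in_word = true;     /* first character of a new word */
            words++;
        }
    }
    return words;
}
```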

C segmentation fault

I'm trying to create a sub array with the following function :
Track * subArray(Track * arr, int start, int end){
int size = end - start;
Track * t = malloc(sizeof(Track) * size);
for(int i = 0; i < size && start <= end; i++){
t[i] = arr[start++];
}
}
The size of the t pointer is always 8, even when I don't multiply it with size and I get a segmentation fault. I'm new to C so I don't know what is causing this exception.

This is why C is hard. It's an off by one error: You need to allocate (end-start+1) items and use <= in the loop. Try rewriting to something like this:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Track {
char* color;
} Track;
Track * subArray(Track* arr, int start, int end){
assert(end >= start); /* inclusive indices: a one-element range is valid */
const int size = 1 + end - start;
printf("Allocating %d items\n", size);
Track* t = malloc(sizeof(Track)*size);
for(int i=start; i <= end; ++i) {
printf("at %d fetching %d\n", i-start, i);
t[i-start] = arr[i];
}
return t;
}
int main() {
Track *track = malloc(sizeof(Track) * 7);
track[0].color = "red";
track[1].color = "orange";
track[2].color = "yellow";
track[3].color = "blue";
track[4].color = "indigo";
track[5].color = "green";
track[6].color = "violet";
Track *sub = subArray(track, 3, 5);
printf("%s\n", sub[0].color);
printf("%s\n", sub[1].color);
printf("%s\n", sub[2].color);
}
Compiling and running:
$ cc -g -W -Wall a.c && ./a.out
Allocating 3 items
at 0 fetching 3
at 1 fetching 4
at 2 fetching 5
blue
indigo
green
Note that I'm copying the value of char* pointers here. That may lead to additional confusing stuff, just in case you think about copying my code (I just drafted something that works to illustrate the problem).
Update
You're using inclusive indices. In C, however, it's quite common to specify a start index and a length. A lot of standard library functions do this, and this is what you'll most likely see in production code. One reason may be that it's easier to reason about. In your case, the code would be
Track* subArray(Track* arr, const size_t start, const size_t length) {
Track* t = malloc(sizeof(Track) * length);
for (size_t i = 0; i < length; ++i)
t[i] = arr[i + start];
return t;
}
and the corresponding call would be
Track *sub = subArray(track, 3, 3);
In my eyes, this not only looks better; it's simpler and easier to understand.
Another thing that's common is to copy pointers instead of the whole structs. This will depend on how your code and data structures are organized. In that case, it's quite common to use a sentinel value at the end of an array of pointers to mark its end: This will typically be a NULL pointer.
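To illustrate the pointer-copying idea, here is a small sketch. The function names subArrayPtrs and countTracks are hypothetical; the Track struct is the one from the example above. The returned array holds pointers into the original array plus a terminating NULL, so the caller frees only the pointer array, not the Tracks.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Track {
    char *color;
} Track;

/* Returns a NULL-terminated array of pointers to arr[start] ..
   arr[start + length - 1], or NULL if the allocation fails. */
Track **subArrayPtrs(Track *arr, size_t start, size_t length)
{
    Track **t = malloc(sizeof *t * (length + 1));   /* +1 for the sentinel */
    if (t == NULL)
        return NULL;

    for (size_t i = 0; i < length; ++i)
        t[i] = &arr[start + i];
    t[length] = NULL;                               /* marks the end */
    return t;
}

/* Walks the array until the NULL sentinel, so no length needs to be passed. */
size_t countTracks(Track *const *list)
{
    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```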
Keep practising and keep reading other people's code, and you'll soon discover C idioms and programming styles that will make your life much easier!

I think your error is that you're using <= in your tests when they should be <. This will prevent you from running off the end of your arrays.

Swapping elements of char array in C

I have this code:
char *sort(char *string){ //shell-sort
int lnght = length(string) - 1; // length is my own function
int gap = lnght / 2;
while (gap > 0)
{
for (int i = 0; i < lnght; i++)
{
int j = i + gap;
int tmp =(int)string[j];
while (j >= gap && tmp > (int)string[j - gap])
{
string[j] = string[j - gap]; // code fails here
j -= gap;
}
string[j] = (char)tmp; // and here as well
}
if (gap == 2){
gap = 1;
}
else{
gap /= 2.2;
}
}
return string;
}
The code should sort (shell-sort) the characters in the string, given the ordinal value (ASCII value). Even though the code is pretty simple, it still fails at lines I've commented - segmentation fault. I've spent plenty of time with this code and still can't find the problem.

As you say in a comment, you call your function like this:
char *str = "test string";
sort(str);
A string literal is stored in memory that must not be modified (often read-only), and str is just a pointer to it. Your function writes through that pointer, which is undefined behavior and can result in a segmentation fault.
Declare it as an array instead, so that you get a writable copy:
char str[] = "test string";
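The difference matters for any function that writes to its argument. As a small illustration, here is a hypothetical in-place helper (reverse_in_place is not from the question) that is safe to call on an array like the one above but not on a pointer to a string literal.

```c
#include <stddef.h>
#include <string.h>

/* Reverses s in place. The caller must pass writable storage, for example
   an array initialized from a literal, never the literal itself. */
void reverse_in_place(char *s)
{
    size_t len = strlen(s);
    for (size_t i = 0; i < len / 2; i++)
    {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }
}
```

Calling it as `char str[] = "test string"; reverse_in_place(str);` modifies the array's own copy of the characters.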

In situations like this look at your statements not so much as executable code, but as mathematical boundary conditions. I've replaced the monstrous name lnght with length for readability purposes.
Here are the relevant conditions that affect the value of j when entering the while loop, relative to the length.
i < length;
gap = length / 2;
j = i + gap;
Now we plug in a value. Consider the case where length == 10. Then presumably the maximum index in your array is 9 which is also the highest value that i can take on.
Then we also have that gap == 5 and so after entering the while loop j == i + gap == 9 + 5. Clearly 9 + 5 > 10. The rest is left as an exercise to the programmer.
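For comparison, here is one way the loop bounds can be arranged so that j never leaves the string. This is a sketch under my own assumptions: it uses strlen instead of the question's length(), halves the gap each round instead of dividing by 2.2, and keeps the question's comparison direction, so characters end up in descending order.

```c
#include <stddef.h>
#include <string.h>

/* Shell sort of the characters of s, largest first.
   s must point to writable storage. */
char *shell_sort_desc(char *s)
{
    size_t len = strlen(s);

    for (size_t gap = len / 2; gap > 0; gap /= 2)
    {
        /* Start at i = gap and insert s[i] into the gap-spaced run that
           ends at i, so the highest index ever touched is i < len. */
        for (size_t i = gap; i < len; i++)
        {
            unsigned char tmp = (unsigned char)s[i];
            size_t j = i;
            while (j >= gap && tmp > (unsigned char)s[j - gap])
            {
                s[j] = s[j - gap];
                j -= gap;
            }
            s[j] = (char)tmp;
        }
    }
    return s;
}
```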

How do you test your function? With a string literal (i.e. char *buffer = "test string";)?
On the first pass, at least, j and j-gap should be inside the string boundaries. So if you get a segfault, I guess it is because of an unwritable string (string literals can't be modified).
Replacing length() with strlen() and calling it with a writable test string led me to a valid result:
"adgfbce" → "gfedcba"

longest common subsequence: why is this wrong?

int lcs(char * A, char * B)
{
int m = strlen(A);
int n = strlen(B);
int *X = malloc(m * sizeof(int));
int *Y = malloc(n * sizeof(int));
int i;
int j;
for (i = m; i >= 0; i--)
{
for (j = n; j >= 0; j--)
{
if (A[i] == '\0' || B[j] == '\0')
X[j] = 0;
else if (A[i] == B[j])
X[j] = 1 + Y[j+1];
else
X[j] = max(Y[j], X[j+1]);
}
Y = X;
}
return X[0];
}
This works, but valgrind complains loudly about invalid reads. How was I messing up the memory? Sorry, I always fail at C memory allocation.

The issue here is with the size of your tables. Note that you're allocating space as
int *X = malloc(m * sizeof(int));
int *Y = malloc(n * sizeof(int));
However, both X and Y are indexed by j (and j + 1), which ranges over 0 ... n, so each array needs n + 1 slots regardless of m.
Try changing this to read
int *X = malloc((n + 1) * sizeof(int));
int *Y = malloc((n + 1) * sizeof(int));
Hope this helps!

Series of issues. First, as templatetypedef says, you're under-allocated.
Then, as paddy says, you're not freeing up your malloc'd memory. If you need the Y=X line, you'll need to store the original malloc'd space addresses in another set of variables so you can call free on them.
...mallocs...
int * original_y = Y;
int * original_x = X;
...body of code...
int result = X[0]; // read the answer before freeing the memory it lives in
free(original_y);
free(original_x);
return result;
But this doesn't address your new question, which is why doesn't the code actually work?
I admit I can't follow your code (without a lot more study), but I can propose an algorithm that is far easier to follow. Note that it finds the longest common substring (a contiguous run), which is a related but different problem from the longest common subsequence in your title. It is not particularly efficient, but getting it correct is the first step. I've listed some optimizations later.
int lcs(char * A, char * B)
{
int length_a = strlen(A);
int length_b = strlen(B);
// length of the longest common substring found so far
int longest_found_length = 0;
// go through each substring of one of the strings (doesn't matter which, you could pick the shorter one if you want)
char * candidate_substring = malloc(sizeof(char) * length_a + 1);
for (int start_position = 0; start_position < length_a; start_position++) {
for (int end_position = start_position; end_position < length_a; end_position++) {
int substring_length = end_position - start_position + 1;
// make a null-terminated copy of the substring to look for in the other string
strncpy(candidate_substring, &(A[start_position]), substring_length);
candidate_substring[substring_length] = '\0'; // strncpy does not terminate a truncated copy
// only record a match if it beats the best one so far
if (substring_length > longest_found_length && strstr(B, candidate_substring) != NULL) {
longest_found_length = substring_length;
}
}
}
free(candidate_substring);
return longest_found_length;
}
Some different optimizations you could do:
// if this can't be longer, then don't bother checking it. You can play games with the for loop to not have this happen, but it's more complicated.
if (substring_length <= longest_found_length) {
continue;
}
and
// there are more optimizations you could do to this, but don't check
// the substring if it's longer than b, since b can't contain it.
if (substring_length > length_b) {
continue;
}
and
if (strstr(B, candidate_substring) != NULL) {
longest_found_length = end_position - start_position + 1;
} else {
// if nothing contains the shorter string, then nothing can contain the longer one, so skip checking longer strings with the same starting character
break; // skip out of inner loop to next iteration of start_position
}
Instead of copying each candidate substring to a new string, you could do a character swap with the end_position + 1 and a NUL character. Then, after looking for that substring in b, swap the original character at end_position+1 back in. This would be much faster, but complicates the implementation a little.

NOTE: I don't normally write two answers, and if you feel that it is tacky, feel free to comment on this one and not vote it up. This answer is a more optimized solution, but I wanted to give the most straightforward one I could think of first and then put this in another answer to not confuse the two. Basically they are for different audiences.
The key to solving this problem efficiently is to not throw away information you have about shorter common substrings when looking for longer ones. Naively, you check each substring against the other one, but if you know that "AB" matches in "ABC", and your next character is C, don't check to see if "ABC" is in "ABC", just check that the spot after "AB" is a "C".
For each character in A, you have to check up to all the letters in B, but because we stop looking through B once a longer substring is no longer possible, it greatly limits the number of checks. Each time you get a longer match up front, you eliminate checks on the back-end, because it will no longer be a longer substring.
For example, if A and B are both long, but contain no common letters, each letter in A will be compared against each letter in B for a runtime of A*B.
For a sequence where there are a lot of matches, but the match length isn't a large fraction of the length of the shorter string, you have A * B combinations to check against the shorter of the two strings (A or B) leading to either A*B*A or A*B*B, which is basically O(n^3) time for similar length strings. I really thought the optimizations in this solution would do better than n^3 even though there are triple-nested for loops, but as best as I can tell they do not.
I'm thinking about this some more, though. Either the substrings being found are NOT a significant fraction of the length of the strings, in which case the optimizations don't do much, but the comparisons for each combination of A*B don't scale with A or B and drop out to be constants -- OR -- they are a significant fraction of A and B and it directly divides against the A*B combinations that have to be compared.
I just may ask this in a question.
int lcs(char * A, char * B)
{
int length_a = strlen(A);
int length_b = strlen(B);
// length of the longest common substring found so far
int longest_length_found = 0;
// for each character in one string (doesn't matter which), look for incrementally larger strings in the other
for (int a_index = 0; a_index < length_a - longest_length_found; a_index++) {
for (int b_index = 0; b_index < length_b - longest_length_found; b_index++) {
// offset into each string until end of string or non-matching character is found
for (int offset = 0; A[a_index+offset] != '\0' && B[b_index+offset] != '\0' && A[a_index+offset] == B[b_index+offset]; offset++) {
// offset is the index of the last matching character, so the match length is offset + 1
longest_length_found = longest_length_found > offset + 1 ? longest_length_found : offset + 1;
}
}
}
return longest_length_found;
}

In addition to what templatetypedef said, some things to think about:
Why aren't X and Y the same size?
Why are you doing Y = X? That's an assignment of pointers. Did you perhaps mean memcpy(Y, X, (n+1)*sizeof(int))?
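To make the Y = X point concrete: after that assignment both names refer to the same buffer, so the "previous row" the recurrence needs is overwritten while the current row is being computed. One possible way around it, sketched below, is to keep two rows of n + 1 ints each and swap the pointers after every row. This is my own sketch of the standard two-row dynamic program for the longest common subsequence, not the asker's code; the names cur and next are made up here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of A and B,
   or -1 if memory cannot be allocated. */
int lcs(const char *A, const char *B)
{
    size_t m = strlen(A);
    size_t n = strlen(B);

    /* Two rows of n + 1 entries: next is row i + 1, cur is row i. */
    int *next = calloc(n + 1, sizeof *next);
    int *cur = calloc(n + 1, sizeof *cur);
    if (next == NULL || cur == NULL)
    {
        free(next);
        free(cur);
        return -1;
    }

    for (size_t i = m; i-- > 0; )
    {
        cur[n] = 0;                             /* empty suffix of B */
        for (size_t j = n; j-- > 0; )
        {
            if (A[i] == B[j])
                cur[j] = 1 + next[j + 1];
            else
                cur[j] = next[j] > cur[j + 1] ? next[j] : cur[j + 1];
        }
        /* Swap the row pointers; both buffers stay allocated and distinct. */
        int *swap = next;
        next = cur;
        cur = swap;
    }

    int result = next[0];                       /* row 0 ended up in next */
    free(next);
    free(cur);
    return result;
}
```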

C problem inside function that searches a word inside another one and returns position

The following C program searches for s1 inside s2 and returns the position at which s1 was found in s2. I wrote this code and it works fine for values like s1: car, s2: carnal, but with s1: car and s2: cbrcarnal I think it enters an infinite loop or something, because it doesn't display anything. Can you see the problem? It must be in my function. Oh, and I'm not allowed to use strstr.
code:
#include "stdafx.h"
#include "stdio.h"
#include "string.h"
int subsir(char s1[],char s2[],int k)
{ int n,m,i=0,j,poz=-1;
n=strlen(s1);
m=strlen(s2);
if(n>m)
return -1;
j=k;
while(j<=m-n)
if((s1[i]==s2[j])&&(s1[n-1]==s2[j+n-1]))
{
poz=j;
while(j+1<poz+n-1)
if(s1[i+1]==s2[j+1])
{i++;
j++;
}
else
return subsir(s1,s2,poz++);
}
else
j++;
if(poz!=-1)
return poz;
else
return -1;
}
int _tmain(int argc, _TCHAR* argv[])
{char s1[30],s2[30];
int n;
printf("introduceti sirul 1: ");
scanf("%s",&s1);
printf("introduceti sirul 2: ");
scanf("%s",&s2);
n=subsir(s1,s2,0);
if(n!=-1)
printf("%s is found in %s, at: %d\n",s1,s2,n);
else
printf("s %s is not found in %s\n",s1,s2);
}

Here is a formatted version of your code. The only changes I made were to comments, braces, whitespace, declaring variables closer to where they're used, changing one while loop into a for loop, and removing the ineffective post-increment. It does exactly the same as your code:
int subsir(char s1[], char s2[], int k) {
// Return location of s1 in s2, starting at index k in s2.
int n = strlen(s1);
int m = strlen(s2);
if (n > m) // s1 cannot be in s2
return -1;
int poz = -1;
for (int j = k, i = 0; j <= m - n;) {
// if first and last character in s1 match corresponding locations in s2
if((s1[i] == s2[j]) && (s1[n - 1] == s2[j + n - 1])) {
// save current s2 location in poz
poz = j;
while (j + 1 < poz + n - 1) {
if(s1[i + 1] == s2[j + 1]) {
i++;
j++;
}
else {
return subsir(s1, s2, poz);
}
}
}
else {
j++;
}
}
return poz;
}
With this version it should be much easier for you to spot problems:
I know you had it in mind, and either wrote it down somewhere or were reading it from the assignment description, but it always helps your thought process to include a short sentence about the purpose of a function. This is the first comment.
I would not normally include the second and third comments in my code, but I'm writing for a different audience than you are. Explaining "why" (for the second comment) and how more complicated expressions work (for the third comment) should help you.
The fourth comment is a start at specifying the role poz serves. You should be able to state, similar to the short function description, a short blurb about each variable. Usually, such as for variables n, m, k, j, and i, you don't have to put this explicitly in the code, but if you get confused, start adding them. For example, "k is the starting index to search", "n is the length of s1", "j and i are the 'current' locations in s2 and s1, respectively", and so on.
The post-increment I removed was in a return statement, but since control never returns to this function (and poz is thus never used again), it has no effect.
You checked if poz was -1, and if it was, returned -1; otherwise returning poz. This is the same as returning poz directly.
You modify i, but if control escapes that while loop, you never reset it to zero; thus, you never start searching from the beginning of s1 from then on.
Because you start with poz (instead of poz + 1) on the recursive call, you get stuck in a loop continually checking the same location over and over again. (Passing poz + 1 fixes this.)
To continue from here, I'd refactor out your inner loop into a separate function doing the equivalent of strncmp — or use strncmp directly, if you can. Greater modularity makes problems easier to reason about.
Recursion is not necessary here, but it's fine to model the problem that way:
if s1 is in s2 starting at position k, return k
else call subsir(s1, s2, k + 1) to search at the next location
this would normally be done in a loop rather than a recursive call, but with tail-call optimization, it's exactly the same
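The recursive model above could be sketched as follows. It assumes strncmp is allowed (only strstr was ruled out in the question); if it is not, the comparison can be replaced by a small hand-written loop. The extra range check on k is my addition.

```c
#include <stddef.h>
#include <string.h>

/* Returns the first index >= k at which s1 occurs in s2, or -1 if none. */
int subsir(const char *s1, const char *s2, int k)
{
    size_t n = strlen(s1);
    size_t m = strlen(s2);

    /* No room left for s1 at or after position k. */
    if (k < 0 || (size_t)k > m || n > m - (size_t)k)
        return -1;

    /* If s1 is in s2 starting at position k, return k ... */
    if (strncmp(s1, s2 + k, n) == 0)
        return k;

    /* ... else search at the next location. */
    return subsir(s1, s2, k + 1);
}
```

Because every recursive call advances k by one, the search cannot get stuck re-checking the same position.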

Your code is sooo complicated... I honestly don't feel like looking at it closely. But I would like to show you how I would do it
int search(char* what, char* where)
{
int whatlen = strlen(what);
int wherelen = strlen(where);
for(int i = 0; i <= wherelen - whatlen; ++i)
{
int j;
for(j = 0; j < whatlen; ++j)
{
if(what[j] != where[i+j])
break;
}
if(j == whatlen)
return i;
}
return -1;
}

I think the problem is that you're not using strstr.
