Edit Distance Matrix - c

I'm trying to build a program that takes two strings and fills in the edit distance matrix for them. What's tripping me up is that the program skips over the second string input. I've tried clearing the buffer with getch(), but it didn't work. I've also tried switching over to scanf(), but that resulted in some crashes as well. Help please!
Code:
#include <stdio.h>
#include <stdlib.h>
int min(int a, int b, int c){
if(a > b && a > c)
return a;
else if(b > a && b > c)
return b;
else
return c;
}
int main(){
// allocate size for strings
int i, j;
char *input1 = (char*)malloc(sizeof(char)*100);
char *input2 = (char*)malloc(sizeof(char)*100);
// ask for input
printf("Enter the first string: ");
fgets(input1, sizeof(input1), stdin);
printf("\nEnter the second string: ");
fgets(input2, sizeof(input2), stdin);
// make matrix
int len1 = sizeof(input1), len2 = sizeof(input2);
int c[len1 + 1][len2 + 1];
// set up input 2 length
for(i = 0; i < len2 + 1; i++){
c[0][i] = i;
}
// set up input 1 length
for(i = 0; i < len1 + 1; i++){
c[i][0] = i;
}
// fill in the rest of the matrix
for(i = 1; i < len1; i++){
for(j = 1; j < len2; j++){
if(input1[i] == input2[j]) // if the first letters are equal make the diagonal equal to the last
c[i][j] = c[i - 1][j - 1];
else
c[i][j] = 1 + min(c[i - 1][j - 1], c[i - 1][j], c[i][j - 1]);
}
}
// print the matrix
printf("\n");
for(j = 0; j < len2; j++){
for(i = 0; i < len1; i++){
printf("| %d", c[i][j]);
}
printf("\n");
}
return 1;
}

Stick with fgets.
As others have pointed out, use char input1[100] instead of char *input1 = malloc(...)
But even with that change, which makes the sizeof inside the fgets call correct, using sizeof when setting up len1 and len2 is wrong. You'd be processing an entire buffer of 100, even if there are only 10 valid characters in it (i.e. the remaining ones are undefined/random).
What you [probably] want is strlen [and a newline strip] to get the actual useful lengths.
Here's the modified code [please pardon the gratuitous style cleanup]:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
min(int a, int b, int c)
{
// return the smallest of the three values
if (a <= b && a <= c)
return a;
if (b <= a && b <= c)
return b;
return c;
}
int
main(void)
{
// allocate size for strings
int i;
int j;
char input1[100];
char input2[100];
// ask for input
printf("Enter the first string: ");
fgets(input1, sizeof(input1), stdin);
int len1 = strlen(input1);
if (input1[len1 - 1] == '\n') {
input1[len1 - 1] = 0;
--len1;
}
printf("\nEnter the second string: ");
fgets(input2, sizeof(input2), stdin);
int len2 = strlen(input2);
if (input2[len2 - 1] == '\n') {
input2[len2 - 1] = 0;
--len2;
}
// make matrix
int c[len1 + 1][len2 + 1];
// set up input 2 length
for (i = 0; i < len2 + 1; i++) {
c[0][i] = i;
}
// set up input 1 length
for (i = 0; i < len1 + 1; i++) {
c[i][0] = i;
}
// fill in the rest of the matrix; row i / column j correspond to the
// characters at index i - 1 and j - 1
for (i = 1; i <= len1; i++) {
for (j = 1; j <= len2; j++) {
// if the letters are equal, copy the diagonal value
if (input1[i - 1] == input2[j - 1])
c[i][j] = c[i - 1][j - 1];
else
c[i][j] = 1 + min(c[i - 1][j - 1], c[i - 1][j], c[i][j - 1]);
}
}
// print the matrix, including the last row and column
printf("\n");
for (j = 0; j <= len2; j++) {
for (i = 0; i <= len1; i++) {
printf("| %d", c[i][j]);
}
printf("\n");
}
return 0;
}
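To make the sizeof point concrete, here is a minimal sketch (the function names are made up for illustration). With the malloc'ed version, sizeof(input1) is the size of the pointer, so fgets was told to read only a handful of characters. The rest of the first line was then still waiting in stdin, which is the likely reason the second prompt appeared to be skipped.

```c
#include <stddef.h>

/* sizeof applied to a pointer gives the size of the pointer itself,
   no matter how many bytes malloc handed back. */
size_t size_via_pointer(void)
{
    char *input1 = NULL;        /* stands in for the malloc'ed buffer */
    return sizeof(input1);      /* typically 8 on 64-bit targets, 4 on 32-bit */
}

/* sizeof applied to an array gives the size of the whole array. */
size_t size_via_array(void)
{
    char input1[100];
    return sizeof(input1);      /* always 100 */
}
```

If you do keep a malloc'ed buffer, pass the allocated size (100 here) to fgets explicitly instead of using sizeof.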
UPDATE:
Okay sweet I see what you mean! The reason I was trying to use malloc though was to avoid making the matrix that I had to print a size of 100x100 blank spaces.
With either the fixed size input1 or the malloced one, fgets will only fill it to the input size entered [clipped to the second argument, if necessary]. But, it does not pad/fill the remainder of the buffer with anything (e.g. spaces on the right). What it does do is add an EOS [end-of-string] character [which is a binary 0x00] after the last char read from input [which is usually the newline].
Thus, if the input string is abcdef\n, the length [obtainable from strlen] is 7, input1[7] will be 0x00, and input1[8] through input1[99] will have undefined/random/unpredictable values, not spaces.
Since a newline char isn't terribly useful, it is often stripped out before further processing. For example, it isn't terribly relevant when computing edit distance for a small phrase.
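As a sketch of the newline strip, one common idiom uses strcspn, which also behaves when the line has no newline at all (e.g. the input was clipped, or it ended at EOF). The helper name strip_newline is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: cut the string at the first '\n' (the one fgets
   leaves at the end, if any) and return the resulting length. */
size_t strip_newline(char *s)
{
    size_t len = strcspn(s, "\n");  /* index of '\n', or strlen(s) if none */
    s[len] = '\0';
    return len;
}
```

For the abcdef\n example this returns 6 and leaves "abcdef" in the buffer, so it can stand in for the strlen-plus-if pair used in the code above.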
Does using strlen() only count the number of chars inside the array, or does it include all the blank spaces too?
As I mentioned above, fgets does not pad the string at the end, so, not to worry. It will do what you want/expect.
strlen only counts chars up to [but not including] the EOS terminator character (i.e. zero). If some of these chars happen to be spaces, they will be counted by strlen, which is what we want.
Consider computing the edit distance between any two of the following phrases:
quick brown fox jumped over the lazy dogs
the quick brown fox jumped over lazy dogs
quick brown fox jumps over the lazy dog
In each case, we want strlen to include the [internal/embedded] spaces in the length calculation. That's because it is perfectly valid to compute the edit distance of phrases.
There is a valid use for malloc: when the amount of data is too big to fit on the stack. Most systems have a default stack size limit (e.g. under Linux, it's typically 8 MB).
Suppose we were computing the edit distance for two book chapters [read from files], we'd have (e.g.):
char input1[50000];
char input2[50000];
The above would fit, but the c matrix would cause a stack overflow:
int c[50000][50000];
because the size of this would be 50000 * 50000 * 4 bytes (assuming a 4-byte int), which is 10^10 bytes, or approx 9.3 GiB.
So, to fit all this data, we'd need to allocate it on the heap. While it is possible to do a malloc for c and maintain the 2D matrix access, we'd have to create a function and pass off the pointer to c to it.
So, here's a modified version that takes input of arbitrarily large sizes:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#define sysfault(...) \
do { \
fprintf(stderr,__VA_ARGS__); \
exit(1); \
} while (0)
#define C(y,x) c[((y) * (len2 + 1)) + (x)]
long
min(long a, long b, long c)
{
// return the smallest of the three values
if (a <= b && a <= c)
return a;
if (b <= a && b <= c)
return b;
return c;
}
char *
input(const char *prompt,long *lenp,const char *file)
{
FILE *fp;
char *lhs;
int chr;
long siz;
long len;
if (file != NULL) {
fp = fopen(file,"r");
if (fp == NULL)
sysfault("input: unable to open '%s' -- %s\n",file,strerror(errno));
}
else {
fp = stdin;
printf("Enter %s string: ",prompt);
fflush(stdout);
}
lhs = NULL;
siz = 0;
len = 0;
while (1) {
chr = fgetc(fp);
if (chr == EOF)
break;
if ((chr == '\n') && (file == NULL))
break;
// grow the character array
if ((len + 1) >= siz) {
siz += 100;
lhs = realloc(lhs,siz);
if (lhs == NULL)
sysfault("input: realloc failure -- %s\n",strerror(errno));
}
lhs[len] = chr;
len += 1;
}
if (file != NULL)
fclose(fp);
if (lhs == NULL)
sysfault("input: premature EOF\n");
// add the EOS
lhs[len] = 0;
// return the length to the caller
*lenp = len;
return lhs;
}
int
main(int argc,char **argv)
{
long i;
long j;
char *input1;
long len1;
char *input2;
long len2;
long *c;
--argc;
++argv;
switch (argc) {
case 2:
input1 = input("first",&len1,argv[0]);
input2 = input("second",&len2,argv[1]);
break;
default:
input1 = input("first",&len1,NULL);
input2 = input("second",&len2,NULL);
break;
}
// make matrix
c = malloc(sizeof(*c) * (len1 + 1) * (len2 + 1));
if (c == NULL)
sysfault("main: malloc failure -- %s\n",strerror(errno));
// set up input 2 length
for (i = 0; i < len2 + 1; i++) {
C(0,i) = i;
}
// set up input 1 length
for (i = 0; i < len1 + 1; i++) {
C(i,0) = i;
}
// fill in the rest of the matrix; row i / column j correspond to the
// characters at index i - 1 and j - 1
for (i = 1; i <= len1; i++) {
for (j = 1; j <= len2; j++) {
// if the letters are equal, copy the diagonal value
if (input1[i - 1] == input2[j - 1])
C(i,j) = C(i - 1,j - 1);
else
C(i,j) = 1 + min(C(i - 1,j - 1), C(i - 1,j), C(i,j - 1));
}
}
// print the matrix, including the last row and column
printf("\n");
for (j = 0; j <= len2; j++) {
for (i = 0; i <= len1; i++) {
printf("| %ld", C(i,j));
}
printf("\n");
}
return 0;
}
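As an alternative to the C(y,x) macro, here is a sketch that keeps the c[i][j] syntax on the heap by using a pointer to a variable-length row type. This relies on C99 variably modified types (optional in C11, but supported by gcc and clang). The names edit_distance and min3 are mine, not from the question, and this is the usual textbook recurrence rather than a drop-in for the program above. It returns the final distance instead of printing the matrix.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static long min3(long a, long b, long c)
{
    long m = a < b ? a : b;
    return m < c ? m : c;
}

/* Returns the edit distance between a and b, or -1 if malloc fails. */
long edit_distance(const char *a, const char *b)
{
    size_t len1 = strlen(a);
    size_t len2 = strlen(b);

    /* One heap block, still indexable as c[i][j]. */
    long (*c)[len2 + 1] = malloc(sizeof(long[len1 + 1][len2 + 1]));
    if (c == NULL)
        return -1;

    /* Turning an empty prefix into a prefix of length n costs n edits. */
    for (size_t j = 0; j <= len2; j++)
        c[0][j] = (long)j;
    for (size_t i = 0; i <= len1; i++)
        c[i][0] = (long)i;

    /* Cell (i, j) compares the characters at index i - 1 and j - 1. */
    for (size_t i = 1; i <= len1; i++) {
        for (size_t j = 1; j <= len2; j++) {
            if (a[i - 1] == b[j - 1])
                c[i][j] = c[i - 1][j - 1];
            else
                c[i][j] = 1 + min3(c[i - 1][j - 1], c[i - 1][j], c[i][j - 1]);
        }
    }

    long result = c[len1][len2];
    free(c);
    return result;
}
```

For example, edit_distance("kitten", "sitting") is 3. The matrix still needs (len1 + 1) * (len2 + 1) cells, so the memory estimate above applies unchanged.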

Related

Sorting an array of strings loaded from a text file

I have a problem sorting an array of strings, loaded from a text file, by length.
I read the file line by line and put the strings into an array. After that I sort the array by string length, but I get strange output.
The program does sort the array, but one of the strings ends up pasted onto another.
Example:
The data in the file I'm reading from:
X&Y
X|Y
!X
(X|Y)|Z
(X&Y)|Z
(X&Y)&Z
(X&Y)|Z&(A|B
((X|Y)|Z)&((A|B)|(C&D))
(X&Y)|(Z&(A|B))
(A|B)&(!C)
A|(B&(C&(D|E)))
((X|Y)|(Z&(A|B)))|((C&D)&(D|E))
(A|B)|(C&D)&(D|E)
!A&(B|C)
(A|B)|(C|D)&(D
Content of the array after sorting in ascending order:
!X
X|Y
X&Y
(X|Y)|Z
(X&Y)|Z
(X&Y)&Z
!A&(B|C)
(A|B)&(!C)
(X&Y)|Z&(A|B
(A|B)|(C|D)&(DA|(B&(C&(D|E))) //Here' is problem ! (A|B)|(C|D)&(D and A|(B&(C&(D|E))) are concatenated?
(X&Y)|(Z&(A|B))
(A|B)|(C&D)&(D|E)
((X|Y)|Z)&((A|B)|(C&D))
((X|Y)|(Z&(A|B)))|((C&D)&(D|E))
Here is the code:
//Sort function
void sort(char str[][MAXLEN], int number_of_elements) {
int d, j;
char temp[100];
for (d = 0; d < number_of_elements - 1; d++) {
for (j = 0; j < number_of_elements - d - 1; j++) {
if (strlen(str[j]) < strlen(str[j + 1])) {
strcpy(temp, str[j]);
strcpy(str[j], str[j + 1]);
strcpy(str[j + 1], temp);
}
}
}
}
int main() {
FILE *dat;
int number_of_elements;
char str[MAX][MAXLEN];
int i = 0;
dat = fopen("ulaz.txt", "r");
if (dat == NULL) {
printf("Error");
}
while (!feof(dat) && !ferror(dat)) {
if (fgets(str[i], 100, dat) != NULL)
i++;
}
number_of_elements = i;
fclose(dat);
sort(str, number_of_elements);
for (int d = 0; d < i; d++) {
printf("%s", str[d]);
}
return 0;
}
Thanks in advance !
Your observation is consistent with the last line of the source file having no trailing newline: (A|B)|(C|D)&(D
You can correct the problem by stripping the newline after fgets() and always appending one in the output phase.
Also make sure that the temporary array used for swapping the strings is long enough: instead of 100 bytes, it should have a length of MAXLEN. Also stop reading from the file when i reaches MAX.
Here is a modified version:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLEN 200
#define MAX 100
//Sort function by decreasing string lengths
void sort(char str[][MAXLEN], int number_of_elements) {
int d, j;
for (d = 0; d < number_of_elements - 1; d++) {
for (j = 0; j < number_of_elements - d - 1; j++) {
if (strlen(str[j]) < strlen(str[j + 1])) {
char temp[MAXLEN];
strcpy(temp, str[j]);
strcpy(str[j], str[j + 1]);
strcpy(str[j + 1], temp);
}
}
}
}
int main() {
int number_of_elements;
char str[MAX][MAXLEN];
int i;
FILE *dat = fopen("ulaz.txt", "r");
if (dat == NULL) {
fprintf(stderr, "Cannot open %s: %s\n", "ulaz.txt", strerror(errno));
return 1;
}
for (i = 0; i < MAX && fgets(str[i], MAXLEN, dat) != NULL; i++) {
/* strip the trailing newline if any */
str[i][strcspn(str[i], "\n")] = '\0';
}
number_of_elements = i;
fclose(dat);
sort(str, number_of_elements);
for (int d = 0; d < number_of_elements; d++) {
printf("%s\n", str[d]);
}
return 0;
}
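If you don't need to keep the hand-written bubble sort, the standard library's qsort can sort the same fixed-width rows. This is only a sketch: the names by_length_desc and sort_by_length are mine, and qsort is not guaranteed to be stable, so strings of equal length may come out in a different relative order than with the loop above.

```c
#include <stdlib.h>
#include <string.h>

#define MAXLEN 200

/* Comparator for qsort: each element is one row of MAXLEN chars.
   Longer strings sort first, matching the comparison in sort() above. */
static int by_length_desc(const void *a, const void *b)
{
    size_t la = strlen((const char *)a);
    size_t lb = strlen((const char *)b);
    return (la < lb) - (la > lb);
}

/* Sorts n rows of str by decreasing string length. */
void sort_by_length(char str[][MAXLEN], size_t n)
{
    qsort(str, n, sizeof str[0], by_length_desc);
}
```

Swap la and lb in the return expression to get increasing lengths instead.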

Add strings to an array

The problem: after the convert_tolower(words) function completes, I want to add a new word to the words array (if the words array has fewer than 5 words). But I am getting either errors or unexpected results (e.g. some weird characters being printed). My idea was to shift the elements of the words array and then work with pointers, because I am dealing with strings, but I am having quite some trouble achieving that. The problem is probably in lines 35-37.
How I want the program to behave:
1. Get at most 5 words (strings) from user input.
2. Take these strings and place them in an array words.
3. Convert the elements of the array to lowercase letters.
4. After the above, ask the user again to enter a new word and pick the position of that word. If the words array already has 5 words, the new word is not added. Otherwise, the new word is added at the position the user chose. (The other words are not deleted, they are just 'shifted'.)
Also by words[1] I refer to the first word of the words array in its entirety
The code:
#include <stdio.h>
#include <string.h>
#define W 5
#define N 10
void convert_tolower(char matrix[W][N]);
int main() {
int j = 0;
int i = 0;
int len = 0;
char words[W][N] = {{}};
char test[W][N];
char endword[N] = "end";
char newword[N];
int position;
while (scanf("%9s", test), strcmp(test, endword)) {
strcpy(words[i++], test);
j++;
len++;
if (j == W) {
break;
}
}
convert_tolower(words);
printf("Add a new word\n");
scanf("%9s", newword);
printf("\nPick the position\n");
scanf("%d",position);
if (len < W) {
for (i = 0; i < W-1; i++) {
strcpy(words[i], words[i + 1]); /*Shift the words */
words[position] = newword;
}
}
for (i = 0; i < W; i++) {
printf("%s", words[i]);
printf("\n");
}
printf("End of program");
return 0;
}
void convert_tolower(char matrix[W][N]) {
int i;
int j;
for (i = 0; i < W; i++) {
for (j = 0; j < N; j++) {
matrix[i][j] = tolower(matrix[i][j]);
}
}
}
This initialization
char words[W][N] = {{}};
is not valid in standard C before C23 (empty braces come from C++ and are otherwise a compiler extension). If you want to zero-initialize the array, just write, for example
char words[W][N] = { 0 };
In the condition of the while loop
while (scanf("%9s", test), strcmp(test, endword)) {
the comma operator is used, so the return value of scanf is discarded and never checked. Moreover, you are incorrectly using the two-dimensional array test where a one-dimensional array is needed.
It seems you mean
char test[N];
//...
while ( scanf("%9s", test) == 1 && strcmp(test, endword) != 0 ) {
Also, more variables are used than necessary: i, j and len all track the same count.
The loop could be written more simply as
char test[N];
//...
for ( ; len < W && scanf("%9s", test) == 1 && strcmp(test, endword) != 0; ++len )
{
strcpy(words[len], test);
}
In this call
scanf("%d",position);
there is a typo. You must write
scanf("%d", &position);
Also you should check whether the entered value of position is in the range [0, len].
For example
position = -1;
printf("\nPick the position\n");
scanf("%d", &position);
if ( len < W && -1 < position && position <= len ) {
Also this for loop
for (i = 0; i < W-1; i++) {
strcpy(words[i], words[i + 1]); /*Shift the words */
words[position] = newword;
}
does not make sense. Moreover, this assignment statement
words[position] = newword;
is invalid: arrays cannot be assigned to in C.
You need to move all strings starting from the specified position to the right.
For example
for ( i = len; i != position; --i )
{
strcpy( words[i], words[i-1] );
}
strcpy( words[position], newword );
++len;
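Putting the range check and the right shift together, the insertion could be wrapped in a small function. This is a sketch under my own naming (insert_word is not from the question). It also rejects words that would not fit in N - 1 characters, which the scanf("%9s", ...) call already guarantees for user input.

```c
#include <string.h>

#define W 5
#define N 10

/* Inserts word at index position, shifting later words one slot right.
   Returns 1 on success, 0 if the array is full, the position is outside
   [0, *len], or the word does not fit in a row. */
int insert_word(char words[][N], int *len, int position, const char *word)
{
    if (*len >= W || position < 0 || position > *len || strlen(word) >= N)
        return 0;
    for (int i = *len; i > position; --i)
        strcpy(words[i], words[i - 1]);   /* move right, starting at the end */
    strcpy(words[position], word);
    ++*len;
    return 1;
}
```

Called as insert_word(words, &len, position, newword), it leaves the array untouched whenever it returns 0.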
It also seems the function convert_tolower should be called on the resulting array after the new word is inserted. Moreover, you need to pass it the number of actual words in the array.
convert_tolower(words, len);
The nested loops within the function convert_tolower should look at least like this (tolower also needs #include <ctype.h>)
void convert_tolower(char matrix[][N], int n) {
int i;
int j;
for (i = 0; i < n; i++) {
for (j = 0; matrix[i][j] != '\0'; j++) {
matrix[i][j] = tolower(( unsigned char )matrix[i][j]);
}
}
}
The main problem with your code was initially that you declared char *words[W][N] and then tried to insert strings into this 2D array of pointers. Sparse use of organizing functions, and variables with larger scopes than necessary, made it hard to read. I think the best way to help you is to show you a working minimal implementation. Step 4 is not sufficiently specified. insert currently shifts. It is not clear what should happen if you insert at a position after empty slots, or if you insert at a position before empty slots, in particular if there are non-empty slots after said position.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define W 5
#define N 10
void convert(size_t w, size_t n, char list[][n]) {
for(size_t i = 0; i < w; i++) {
for(size_t j = 0; j < n; j++) {
list[i][j] = tolower((unsigned char)list[i][j]);
}
}
}
void insert(size_t w, size_t n, char list[][n], size_t pos, char *word) {
// out of bounds
if(pos + 1 > w) return;
// shift pos through w - 2 pos
for(size_t i = w - 2; i >= pos; i--) {
strcpy(list[i + 1], list[i]);
if(!i) break;
}
// insert word at pos
strcpy(list[pos], word);
}
void print(size_t w, size_t n, char list[][n]) {
for (size_t i = 0; i < w; i++) {
printf("%zu: %s\n", i, list[i]);
}
}
int main() {
char words[W][N] = { "a", "BB", "c" };
convert(W, N, words);
insert(W, N, words, 0, "start");
insert(W, N, words, 2, "mid");
insert(W, N, words, 4, "end");
insert(W, N, words, 5, "error");
print(W, N, words);
return 0;
}
and the output (note: "c" was shifted out as we initially had 3 elements and added 3 new words with valid positions):
0: start
1: a
2: mid
3: bb
4: end

memory allocation problem while reading a file

I'm trying to multiply two matrices stored in a file thus formatted:
1 2
2 3
*
-4 1
1 0
I do not know initially what the dimension of each matrix is. But I let the user define it or otherwise a default value of 100 is taken.
int maxc = argc > 2 ? atoi(argv[2]) * atoi(argv[2]) : 100;
I can already perform the calculation correctly, but I've noticed that if I enter the dimension argv[2] = "2", so that maxc = 4 (which should be enough for this example), errors occur when reading or printing the file. But if I enter argv[2] = "3", everything works out fine for this example. Since maxc is used to allocate memory here: matrix = malloc(maxc * sizeof *matrix), I suspect the problem could be located on that line. Should I also allocate memory for size_t row; size_t col;?
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#define MAXNOP 50 /*Max number of operations allowed */
#define MAXNMATR 20 /*Max number of matrices */
struct m {
size_t row;
size_t col;
double *data;
};
struct m multiply(struct m *A, struct m *B);
void f(double x);
void print_matrix(struct m *A);
void read_file(int maxc, FILE *fp);
void scalar_product(double scalar, struct m *B);
void calculate(struct m *matrix, int nop, int id, char *op);
int main(int argc, char *argv[]) {
FILE *file = argc > 1 ? fopen(argv[1], "rb") : stdin;
/* define max dimension of a matrix */
int maxc = argc > 2 ? atoi(argv[2]) * atoi(argv[2]) : 100;
read_file(maxc, file);
return 0;
}
void read_file(int maxc, FILE *fp) {
struct m *matrix;
int id = 0; /* id of a matrix */
size_t ncol, nrow; /* No of columns of a matrix*/
ncol = nrow = 0;
int nop = 0; /*No of operators*/
int off = 0;
int i;
int n;
double *d;
char buf[2 * maxc]; /*to store each lines of file */
char *p = buf;
char op[MAXNOP];
for (i = 0; i < MAXNOP; i++)
op[i] = '?';
if (!(matrix = malloc(maxc * sizeof *matrix))) {
perror("malloc-matrix");
exit(1);
}
/* Read file line by line */
while (fgets(buf, maxc, fp)) {
if (nrow == 0) {
/* allocate/validate max no. of matrix */
d = matrix[id].data = malloc(sizeof(double) * MAXNMATR);
}
/* check if line contains operator */
if ((!isdigit(*buf) && buf[1] =='\n')) {
op[nop++] = *buf;
matrix[id].col = ncol;
matrix[id].row = nrow;
nrow = ncol = 0;
id++;
continue;
} else {
/* read integers in a line into d */
while (sscanf(p + off, "%lf%n", d, &n) == 1) {
d++;
if (nrow == 0)
ncol++;
off += n;
}
nrow++;
off = 0;
}
} /*end of while fgets cycle */
/* Assign last matrix No of columns and rows */
matrix[id].col = ncol;
matrix[id].row = nrow;
/* Printing the matrices and operations */
for (i = 0; i <= id; i++) {
if (op[i] == '*' || op[i] == '-' || op[i] =='+') {
print_matrix(&matrix[i]);
if (op[i-1] != 'i')
printf("%c\n", op[i]);
else
continue;
} else
if (op[i] == '?') {
print_matrix(&matrix[i]);
}
}
calculate(matrix, nop, id, op);
}
void calculate(struct m *matrix, int nop, int id, char *op) {
int i;
for (i = 0; i <= nop; i += 2) {
if (op[i] == '*' && op[i+1] == '?') {
if (matrix[i].row == 1 && matrix[i].col == 1)
scalar_product(matrix[i].data[0], &matrix[i + 1]); //Multiplication of Scalar per matrix
else {
matrix[i + 1] = multiply(&matrix[i], &matrix[i + 1]);
matrix[i + 2] = multiply(&matrix[i + 1], &matrix[i + 2]);
}
break;
}
}
printf("=\n");
print_matrix(&matrix[id]); /* Print the result */
free(matrix);
}
struct m multiply(struct m *A, struct m *B) {
size_t i, j, k;
struct m C;
C.data = malloc(sizeof(double) * A->row * B->col);
C.row = A->row;
C.col = B->col;
for (i = 0; i < C.row; i++)
for (j= 0 ; j < C.col; j++)
C.data[i * C.col + j] = 0;
// Multiplying matrix A and B and storing in C.
for (i = 0; i < A->row; ++i)
for (j = 0; j < B->col; ++j)
for (k = 0; k < A->col; ++k)
C.data[i * C.col + j] += A->data[i * A->col + k] * B->data[k * B->col + j];
return C;
}
void f(double x) {
double i, f = modf(x, &i);
if (f < .00001)
printf("%.f ", i);
else
printf("%f ", x);
}
/* printing a Matrix */
void print_matrix(struct m *A) {
size_t i, j;
double *tmp = A->data;
for (i = 0; i < A->row; i++) {
for (j = 0; j < A->col; j++) {
f(*(tmp++));
}
putchar('\n');
}
}
void scalar_product(double scalar, struct m *B) {
size_t i, j;
for (i = 0; i < B->row; i++)
for (j = 0; j < B->col; j++)
B->data[i * B->col + j] = scalar * B->data[i * B->col + j];
}
The expected result is this: https://ideone.com/Z7UtiR
There, argv[2] is not passed, so the default is used and there is enough memory to store all the data.
Your read buffer only has room for maxc (ie. 4) characters :
char buf[maxc]; /*to store each lines of file */
You then attempt to get a line from the file into that buffer :
while (fgets (buf, maxc, fp)){
But that buffer is only large enough for 2 characters, followed by a newline, and then a '\0' terminator.
Looking at your sample file, the longest line has 4 characters : "-4 1". So, your buffer needs to at least be able to hold 6 (including the newline and '\0' terminator).
It's probably better to make your buffer quite a bit larger.
The problem is entirely in reading the arrays.
With maxc = 4, the buffer char buf[maxc]; has room for only 3 characters plus the terminating null character.
So successive calls to fgets (buf, maxc, fp) behave as follows:
the first call reads buf = "1 2" (3 characters and the zero byte)
the second call reads buf = "\n" (1 newline character, then fgets stops)
then reads buf = "2 3"
then reads buf = "\n"
buf = "*\n"
buf = "-4 "
and so on
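The splitting described in this list can be reproduced with a small counting function. This is a sketch for experimenting, not part of the original program. The name count_chunks is made up, and the 64-byte cap is an arbitrary choice so that a fixed array can be used.

```c
#include <stdio.h>

/* Counts how many fgets calls with a size argument of `size` it takes to
   consume the rest of fp. Returns -1 if size is not in the range 2..64. */
int count_chunks(FILE *fp, int size)
{
    char buf[64];
    int calls = 0;

    if (size < 2 || size > (int)sizeof buf)
        return -1;
    while (fgets(buf, size, fp) != NULL)
        calls++;                /* one "line" as the caller would see it */
    return calls;
}
```

On the five-line sample file, a size of 4 yields 9 chunks instead of 5, because each 3-character row is followed by a separate read that contains only its newline, and "-4 1" is split into "-4 " and "1\n".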
Because of these newline-only reads, inside this code snippet:
else /* read integers in a line into d */
{
while (sscanf (p + off, "%lf%n", d, &n) == 1) {
d++;
if(nrow == 0)
ncol++;
off += n;
}
nrow++;
off = 0;
}
The variable nrow will be incremented 4 times (twice for the real rows and twice for the newline-only reads), which is 2 times too many. The second matrix will have 1 column, because only "-4 " is read from its first line, so the while (sscanf ...) loop scans only one number and ncol ends up as 1.
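To see the column-count effect in isolation, the sscanf/%n loop can be pulled out into a function that takes the line as a string. This is a sketch with a hypothetical name (parse_row) and an added max bound that the original loop does not have.

```c
#include <stddef.h>
#include <stdio.h>

/* Parses up to max doubles from line into out and returns how many were
   read. %n reports how many characters sscanf consumed, so the next call
   can continue where the previous one stopped. */
size_t parse_row(const char *line, double *out, size_t max)
{
    size_t count = 0;
    int off = 0;
    int n = 0;

    while (count < max &&
           sscanf(line + off, "%lf%n", &out[count], &n) == 1) {
        off += n;
        count++;
    }
    return count;
}
```

parse_row("-4 1\n", v, 4) returns 2, the truncated chunk "-4 " gives 1, and a newline-only chunk gives 0, which is exactly the miscount described above.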
The fix you posted in the comment is incomplete, because you only increased the buffer size but didn't increase the size argument you pass to fgets. If you declare char buf[2*maxc]; you should also call fgets (buf, 2 * maxc, fp), which will "fix" the current problem. I would rather rewrite the whole thing, or at least write fgets(buf, sizeof(buf)/sizeof(buf[0]), fp) to accommodate future changes.
Don't use VLAs, e.g. char buf[maxc];. For simplicity you can use an arbitrarily long buffer for the line, e.g. #define LINE_MAX 1024 and char buf[LINE_MAX], and then fgets(buf, sizeof(buf)/sizeof(buf[0]), file). Or use, or write yourself, a function that dynamically resizes memory while reading a line, like getline (a GNU extension that was standardized in POSIX.1-2008).
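If getline is not available, a line reader that grows its buffer is short enough to write by hand. The following is a sketch, not a replacement for getline. The name read_line and the starting capacity of 64 are my own choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from fp.
   Returns a malloc'ed string without the trailing newline, or NULL at
   end of file or on allocation failure. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 0;
    size_t len = 0;
    char *line = NULL;
    int ch;

    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {               /* keep room for the terminator */
            size_t new_cap = cap ? cap * 2 : 64;
            char *tmp = realloc(line, new_cap);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap = new_cap;
        }
        line[len++] = (char)ch;
    }
    if (ch == EOF && len == 0)
        return NULL;                        /* nothing left to read */
    if (line == NULL) {                     /* empty line: only "\n" */
        line = malloc(1);
        if (line == NULL)
            return NULL;
    }
    line[len] = '\0';
    return line;
}
```

With this, the read loop becomes while ((p = read_line(fp)) != NULL) { ...; free(p); } and the line length no longer depends on maxc.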

Uninitialised values in dynamic array in C

I've been given a task that requires a dynamic 2D array in C, but we haven't even covered pointers yet, so I'm kind of at a loss here. I have to read some text input and store it in a 2D array, without limiting its size.
Unfortunately, Valgrind keeps reporting an uninitialised value when the puts() function executes, and sometimes the program prints some random characters. I understand that I must have missed some indexes, but I just can't find where the issue stems from. Additionally, any advice on the quality of my code is very much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <assert.h>
#define MULT 3
#define DIV 2
char **read(int *row, int *col) {
char **input = NULL;
int row_size = 0;
int col_size = 0;
int i = 0;
int c;
while ((c = getchar()) != EOF) {
if (c != '\n') { // skip empty lines
assert(i < INT_MAX);
if (i == row_size) { // if not enough row memory, allocate more
row_size = 1 + row_size * MULT / DIV;
input = realloc(input, row_size * sizeof *input);
assert(input != NULL);
}
char *line = NULL;
int j = 0;
// I need all the rows to be of the same size (see last loop)
line = malloc(col_size * sizeof *line);
// do while, so as to not skip the first character
do {
assert(j < INT_MAX-1);
if (j == col_size) {
col_size = 1 + col_size * MULT / DIV;
line = realloc(line, col_size * sizeof *line);
assert(line != NULL);
}
line[j++] = c;
} while(((c = getchar()) != '\n') && (c != EOF));
// zero-terminate the string
if (j == col_size) {
++col_size;
line = realloc(line, col_size * sizeof *line);
line[j] = '\0';
}
input[i++] = line;
}
}
// Here I give all the lines the same length
for (int j = 0; j < i; ++j)
input[j] = realloc(input[j], col_size * sizeof *(input+j));
*row = i;
*col = col_size;
return input;
}
int main(void) {
int row_size, col_size, i, j;
char **board = read(&row_size, &col_size);
// Initialize the remaining elements of each array
for (i = 0; i < row_size; ++i) {
j = 0;
while (board[i][j] != '\0')
++j;
while (j < col_size-1)
board[i][++j] = ' ';
}
for (i = 0; i < row_size; ++i) {
puts(board[i]);
}
for (i = 0; i < row_size; ++i)
free(board[i]);
free(board);
return 0;
}

How to avoid duplicates when finding all k-length substrings

I want to display all substrings with k letters, one per line, but avoid duplicate substrings. I managed to write to a new string all the k length words with this code:
void subSent(char str[], int k) {
int MaxLe, i, j, h, z = 0, Length, count;
char stOu[1000] = {'\0'};
Length = (int)strlen(str);
MaxLe = maxWordLength(str);
if((k >= 1) && (k <= MaxLe)) {
for(i = 0; i < Length; i++) {
if((int)str[i] == 32) {
j = i = i + 1;
} else {
j = i;
}
for(; (j < i + k) && (Length - i) >= k; j++) {
if((int)str[j] != 32) {
stOu[z] = str[j];
} else {
stOu[z] = str[j + 1];
}
z++;
}
stOu[z] = '\n';
z++;
}
}
}
But I'm struggling with the part that needs to save only one time of a word.
For example, the string HAVE A NICE DAY
and k = 1 it should print:
H
A
V
E
N
I
C
D
Y
Your subSent() routine poses a couple of challenges: first, it neither returns nor prints its result, so you can only see it in the debugger; second, it calls maxWordLength(), which you didn't supply.
Although avoiding duplicates can be complicated, in the case of your algorithm, it's not hard to do. Since all your words are fixed length, we can walk the output string with the new word, k letters (plus a newline) at a time, doing strncmp(). In this case the new word is the last word added so we quit when the pointers meet.
I've reworked your code below and added a duplication elimination routine. I didn't know what maxWordLength() does so I just aliased it to strlen() to get things running:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#define maxWordLength strlen
// does the last (fixed size) word in string appear previously in string
bool isDuplicate(const char *string, const char *substring, size_t n) {
for (const char *pointer = string; pointer != substring; pointer += (n + 1)) {
if (strncmp(pointer, substring, n) == 0) {
return true;
}
}
return false;
}
void subSent(const char *string, int k, char *output) {
int z = 0;
size_t length = strlen(string);
int maxLength = maxWordLength(string);
if (k >= 1 && k <= maxLength) {
for (int i = 0; i < length - k + 1; i++) {
int start = z; // where does the newly added word begin
for (int j = i; (z - start) < k; j++) {
output[z++] = string[j];
while (string[j + 1] == ' ') {
j++; // assumes leading spaces already dealt with
}
}
output[z++] = '\n';
if (isDuplicate(output, output + start, k)) {
z -= k + 1; // last word added was a duplicate so back it out
}
while (string[i + 1] == ' ') {
i++; // assumes original string doesn't begin with a space
}
}
}
output[z] = '\0'; // properly terminate the string
}
int main() {
char result[1024];
subSent("HAVE A NICE DAY", 1, result);
printf("%s", result);
return 0;
}
I somewhat cleaned up your space avoidance logic but it can be tripped by leading spaces on the input string.
OUTPUT
subSent("HAVE A NICE DAY", 1, result);
H
A
V
E
N
I
C
D
Y
subSent("HAVE A NICE DAY", 2, result);
HA
AV
VE
EA
AN
NI
IC
CE
ED
DA
AY
subSent("HAVE A NICE DAY", 3, result);
HAV
AVE
VEA
EAN
ANI
NIC
ICE
CED
EDA
DAY
