I have to do an assignment where I have to read a file that contains an adjacency matrix and later do some stuff.
Everything works, but my code is too slow, at least for the benchmarking system.
I'm reading all the file rows in this code snippet below:
while (fgets(buf, sizeof(buf), stdin) != NULL) {
parse(buf);
i++;
}
and then I initialize my 2d array with all the values using strtok and atoi:
void parse(char *str, int count, char *sep) {
//char *aux = malloc(count * sizeof(char*));
char *aux;
aux = strtok(str, sep);
int j = 0;
while (aux) {
array[(i*DIM) + j] = atoi(aux);
j++;
aux = strtok(NULL, sep);
}
//free(aux);
}
Arrays are DIM*DIM size and each INT is separated by a comma.
Sample input for a 3*3 matrix:
1,20,1
0,111,3
4,7,10
How can I improve this for better performances?
EDIT:
array definition:
array = malloc(DIM*DIM*sizeof(int));
The malloc part makes no sense since you just need one single character pointer for strtok. Similarly, functions like strtol or atoi already parse the data, so you don't even need strtok - it just takes up extra time in this case. Furthermore, atoi doesn't have any error handling so it should never be used.
So you can just call strtol in a loop and it will do what you want. By checking the endptr argument you can see whether each read was successful or not (man strtol). On the next iteration of the loop, start over from endptr + 1 to step past the separator.
If combining this with your 2D int array requirement, the function might look like this:
#include <stdio.h>
#include <stdlib.h>
void csv_to_int (size_t col, size_t row, int dst[col][row], const char* str)
{
const char* ptr = str;
char* end;
for(size_t c=0; c<col; c++)
{
for(size_t r=0; r<row; r++)
{
int val=strtol(ptr,&end,10);
if(ptr==end)
{
return ;
}
dst[c][r]=val;
ptr = end+1;
}
}
}
int main (void)
{
const char* input = "1,20,1\n0,111,3\n4,7,10\n";
int arr[3][3];
csv_to_int(3, 3, arr, input);
for(size_t i=0; i<3; i++)
{
for(size_t j=0; j<3; j++)
{
printf("%3d ", arr[i][j]);
}
puts("");
}
}
Output:
1 20 1
0 111 3
4 7 10
This is of course assuming that the input suits the 3x3 format - this code has almost no error handling.
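If some error handling is wanted, the same strtol loop can be extended. The following is only a sketch under assumptions: the name csv_to_int_checked, the flat row-major destination buffer and the -1 error code are all made up for illustration and are not part of the answer above.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parses rows*cols integers separated by ',' within a
   row and by '\n' between rows, storing them row-major in dst.
   Returns 0 on success and -1 on the first malformed field, out-of-range
   value, or unexpected separator. */
int csv_to_int_checked(size_t rows, size_t cols, int *dst, const char *str)
{
    const char *ptr = str;
    for (size_t i = 0; i < rows * cols; i++) {
        char *end;
        errno = 0;
        long val = strtol(ptr, &end, 10);
        if (end == ptr)                        /* no digits were consumed */
            return -1;
        if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
            return -1;                         /* value does not fit in int */
        dst[i] = (int)val;

        int last_in_row = ((i + 1) % cols == 0);
        if (*end == ',' && !last_in_row)
            ptr = end + 1;                     /* next field in the same row */
        else if (*end == '\n' && last_in_row)
            ptr = end + 1;                     /* next row */
        else if (*end == '\0' && i + 1 == rows * cols)
            ptr = end;                         /* final newline is optional */
        else
            return -1;                         /* wrong or missing separator */
    }
    return 0;
}
```

Note that strtol itself skips leading whitespace, so this sketch is still lenient about spaces before a number.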
How can I improve this for better performances?
So do not use these functions if you think they are slow. Limit your requirements: do not handle locale-specific digits, read and convert the characters yourself, and blatantly disregard error checking. Something along these lines:
#define _GNU_SOURCE 1
#include <stdio.h>
int main() {
char data[] ="1,20,1\n0,111,3\n4,7,10\n";
FILE *f = fmemopen(data, sizeof(data) - 1, "r"); // exclude the terminating '\0'
int i = 0, j = 0;
#define DIM 3
int array[20];
// this reading part
int buf = 0;
for (int c; (c = fgetc(f)) != EOF; ) {
if (c == '\n') {
array[i * DIM + j] = buf;
buf = 0;
++i;
j = 0;
} else if (c == ',') {
array[i * DIM + j] = buf;
buf = 0;
++j;
} else {
buf *= 10;
buf += (c - '0');
}
}
array[i * DIM + j] = buf;
++j;
// checking
for (int i = 0; i < 3; ++i) {
for (int j = 0; j < 3; ++j) {
printf("%d %d = %d\n", i, j, array[i * DIM + j]);
}
}
}
Even further, ignore portability, and use system calls (on unix - read(STDIN_FILENO)) instead of C API.
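As a rough sketch of that last suggestion, assuming a POSIX system: the helper name read_ints, the 64 KiB chunk size and the "count of values stored" return convention are my own choices for illustration, not something from the question. It keeps the same hand-rolled digit conversion but pulls data in large chunks with read().

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: reads non-negative decimal integers separated by
   ',' or '\n' from fd into out (at most max values) and returns how many
   were stored. Digits are not validated beyond '0'..'9'. */
size_t read_ints(int fd, int *out, size_t max)
{
    char chunk[65536];
    size_t count = 0;
    int value = 0;
    int in_number = 0;    /* set once at least one digit has been seen */
    ssize_t got;

    while ((got = read(fd, chunk, sizeof chunk)) > 0) {
        for (ssize_t k = 0; k < got; k++) {
            char c = chunk[k];
            if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                in_number = 1;
            } else if (in_number) {        /* a separator ends the number */
                if (count < max)
                    out[count++] = value;
                value = 0;
                in_number = 0;
            }
        }
    }
    if (in_number && count < max)          /* last value without a newline */
        out[count++] = value;
    return count;
}
```

With the question's layout this would be called as read_ints(STDIN_FILENO, array, DIM * DIM). Because value and in_number live outside the inner loop, a number that is split across two chunks is still assembled correctly.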
Related
The problem: After the convert_tolower(words) function has completed, I want to add a new word to the words array (if the words array has fewer than 5 words). But I am getting either errors or unexpected results (e.g. some weird characters being printed). My idea was to shift the elements of the words array and then work with pointers, because I am dealing with strings, but I am having quite some trouble achieving that. The problem is probably in lines 35-37.
How I want the program to behave:
Get 5 words(strings) at most from user input
Take these strings and place them in an array words
Convert the elements of the array to lowercase letters
After the above, ask the user again to enter a new word and pick the position of that word. If the words array already has 5 words, then the new word is not added. Otherwise, the new word is added at the position the user chose. (The other words are not deleted, they are just 'shifted'.)
Also by words[1] I refer to the first word of the words array in its entirety
The code:
#include <stdio.h>
#include <string.h>
#define W 5
#define N 10
void convert_tolower(char matrix[W][N]);
int main() {
int j = 0;
int i = 0;
int len = 0;
char words[W][N] = {{}};
char test[W][N];
char endword[N] = "end";
char newword[N];
int position;
while (scanf("%9s", test), strcmp(test, endword)) {
strcpy(words[i++], test);
j++;
len++;
if (j == W) {
break;
}
}
convert_tolower(words);
printf("Add a new word\n");
scanf("%9s", newword);
printf("\nPick the position\n");
scanf("%d",position);
if (len < W) {
for (i = 0; i < W-1; i++) {
strcpy(words[i], words[i + 1]); /*Shift the words */
words[position] = newword;
}
}
for (i = 0; i < W; i++) {
printf("%s", words[i]);
printf("\n");
}
printf("End of program");
return 0;
}
void convert_tolower(char matrix[W][N]) {
int i;
int j;
for (i = 0; i < W; i++) {
for (j = 0; j < N; j++) {
matrix[i][j] = tolower(matrix[i][j]);
}
}
}
This initialization
char words[W][N] = {{}};
is incorrect in C. If you want to zero initialize the array then just write for example
char words[W][N] = { 0 };
In the condition of the while loop
while (scanf("%9s", test), strcmp(test, endword)) {
the comma operator is used, so the return value of scanf is discarded and never checked. Moreover, you are incorrectly using the two-dimensional array test instead of a one-dimensional array.
It seems you mean
char test[N];
//...
while ( scanf("%9s", test) == 1 && strcmp(test, endword) != 0 ) {
Also, variables like i, j and len are redundant because they all hold the same count.
The loop could be written more simply like
char test[N];
//...
for ( ; len < W && scanf("%9s", test) == 1 && strcmp(test, endword) != 0; ++len )
{
strcpy(words[len], test);
}
In this call
scanf("%d",position);
there is a typo. You must write
scanf("%d", &position);
Also you should check whether the entered value of position is in the range [0, len].
For example
position = -1;
printf("\nPick the position\n");
scanf("%d", &position);
if ( len < W && -1 < position && position <= len ) {
Also this for loop
for (i = 0; i < W-1; i++) {
strcpy(words[i], words[i + 1]); /*Shift the words */
words[position] = newword;
}
does not make sense. Moreover, this assignment statement
words[position] = newword;
is invalid. Arrays cannot be assigned to in C; strings have to be copied with strcpy.
You need to move all strings starting from the specified position to the right.
For example
for ( i = len; i != position; --i )
{
strcpy( words[i], words[i-1] );
}
strcpy( words[position], newword );
++len;
And it seems the function convert_tolower should be called for the result array after inserting a new word. And moreover you need to pass the number of actual words in the array.
convert_tolower(words, len);
The nested loops within the function convert_tolower should look at least the following way
void convert_tolower(char matrix[][N], int n) {
int i;
int j;
for (i = 0; i < n; i++) {
for (j = 0; matrix[i][j] != '\0'; j++) {
matrix[i][j] = tolower(( unsigned char )matrix[i][j]);
}
}
}
The main problem with your code was initially that you declared char *words[W][N], then tried to insert strings into this 2D array of pointers. Sparse use of organizing functions and variables with larger scopes than necessary made it hard to read. I think the best way to help you is to show you a working minimal implementation. Step 4 is not sufficiently specified. insert currently shifts the following words to the right. It is not clear what should happen if you insert at a position after empty slots, or if you insert at a position before empty slots, in particular if there are non-empty slots after said position.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define W 5
#define N 10
void convert(size_t w, size_t n, char list[][n]) {
for(size_t i = 0; i < w; i++) {
for(size_t j = 0; j < n; j++) {
list[i][j] = tolower((unsigned char)list[i][j]);
}
}
}
void insert(size_t w, size_t n, char list[][n], size_t pos, char *word) {
// out of bounds
if(pos + 1 > w) return;
// shift pos through w - 2 pos
for(size_t i = w - 2; i >= pos; i--) {
strcpy(list[i + 1], list[i]);
if(!i) break;
}
// insert word at pos
strcpy(list[pos], word);
}
void print(size_t w, size_t n, char list[][n]) {
for (size_t i = 0; i < w; i++) {
printf("%zu: %s\n", i, list[i]);
}
}
int main() {
char words[W][N] = { "a", "BB", "c" };
convert(W, N, words);
insert(W, N, words, 0, "start");
insert(W, N, words, 2, "mid");
insert(W, N, words, 4, "end");
insert(W, N, words, 5, "error");
print(W, N, words);
return 0;
}
and the output (note: "c" was shifted out as we initially had 3 elements and added 3 new words with valid positions):
0: start
1: a
2: mid
3: bb
4: end
Let's say I have a series of data that's in this form:
"SomethingIDontCareAbout : SomethingICareAbout"
where the part after the ":" can vary in length of course.
The goal here is to store only the "SomethingICareAbout" substring efficiently. I made this function, but the problem is that I'm storing both substrings, which seems like a waste of memory. Any help to reduce the time/space complexity?
char** ExtractKey(char* S)
{
int n = strlen(S);
int count = 0, i = 0, j = 0;
for(i = 0; i < n; i++)
{
if(S[i] == ':')
break;
count++;
}
char** T = (char**)malloc(2 * sizeof(char*));
T[0] = (char*)malloc((count + 1) * sizeof(char));
T[1] = (char*)malloc((n - count) * sizeof(char));
for(i = 0; i < count; i++) // inefficient ? cus we won't need T[0] [j]
{
T[0][j] = S[i];
j++;
}
T[0][j+1] = '\0';
j = 0;
for(i = count + 1; i < n; i++)
{
T[1][j] = S[i];
j++;
}
T[1][j+1] = '\0';
return T;
}
There is no reason to invent a search for a character in a string, or a copy of a string.
If the input data will live long enough for you to use the "value" part, just return a pointer to the value:
char* ExtractKey(char* S)
{
char* p = strchr(S, ':');
return p ? p + 1 : NULL; /* step over the ':' itself; NULL if there is none */
}
If it doesn't, or if you for some reason need a separate copy:
char* ExtractKey(char* S)
{
char* p = strchr(S, ':');
return p ? strdup(p + 1) : NULL; /* the caller must free() the copy */
}
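Since the sample data uses " : " with spaces around the colon, the value may also need its leading padding skipped. A minimal sketch, assuming only spaces are used as padding; the name value_after_colon is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer into s at the first character of
   the value after ':' (leading spaces skipped), or NULL if s contains no
   ':'. Nothing is allocated or copied. */
const char *value_after_colon(const char *s)
{
    const char *p = strchr(s, ':');
    if (p == NULL)
        return NULL;
    p++;                    /* step over the ':' */
    while (*p == ' ')
        p++;                /* step over the padding before the value */
    return p;
}
```

The returned pointer is only valid for as long as the original string is.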
Honestly, this could be done efficiently if strtok() was used to split those strings. I have designed the following code that parses each string of a 2-D array with a common delimiter that is : here.
Now, let's take a look into the code (notice the comments):
#include <stdio.h>
#include <string.h>
#define MAX_LEN 128
int main(void) {
// The 2-D string
char str[][MAX_LEN] = {"SomethingElse : SomethingToCareAbout",
"Something2 : SomethingToCare2",
"Unnecessary : Necessary"};
int size = sizeof(str) / sizeof(str[0]);
// Applying Variable-Length Array (valid in C)
char store_cared_ones[size][MAX_LEN];
for (int i = 0; i < size; i++) {
// Declaring a temporary pointer variable to obtain the required
// substring from each string
char *sub_str = NULL;
sub_str = strtok(str[i], ": ");
sub_str = strtok(NULL, ": ");
// Copying the 'sub_str' into each array element of 'store_cared_ones'
// strtok returns NULL when there is no second token, so guard the copy
strcpy(store_cared_ones[i], sub_str ? sub_str : "");
}
// Displaying each of 'store_cared_ones'
for (int i = 0; i < size; i++)
fprintf(stdout, "%s\n", store_cared_ones[i]);
return 0;
}
Finally, let's see what that code does:
rohanbari#genesis:~/stack$ ./a.out
SomethingToCareAbout
SomethingToCare2
Necessary
The program reads a file that contains one word on every line. The function picks a random word, stores it in the buffer a pointer refers to, and returns that pointer. In the main function,
printf("%s",func("example.txt",str)) prints a different string each time the program runs. I want to do this for a 2D array (20*20), like a table, but I cannot figure out how to do it. When I call the function in the inner loop, it gives me the same word in every loop step.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
char *word(char *file, char *str);
int main() {
char *str ;
int i, j;
str = (char *)malloc(20);
srand(time(NULL));
char *puzzle[20][20];
for (i = 0; i < 20; i++) {
for (j = 0; j < 20; j++) {
puzzle[i][j] = word("words.txt", str);
}
}
for (i = 0; i < 20; i++) {
for (j = 0; j < 20; j++) {
printf("%s ", puzzle[i][j]);
}
printf("\n");
}
}
char *word(char *file, char *str) {
int end, loop, line;
FILE *fd = fopen(file, "r");
if (fd == NULL) {
printf("Failed to open file\n");
return (NULL);
}
srand(time(NULL));
line = rand() % 100 + 1;
for (end = loop = 0; loop < line; ++loop) {
if (0 == fgets(str, 20, fd)) {
end = 1;
break;
}
}
if (!end)
return (char *)str;
fclose(fd);
free(str);
}
I do not have your words.txt file, so I've created some random strings below.
And a note:
Because your nested loop is in main, your code opens the file in the sub-function and returns without closing it; the next call then reopens it, again and again... It's always better to read the file once and close it before returning from the function.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
char **word(int countString, int maxChars) {
int i;
int j;
int k;
// allocate memory for pointers that are pointing to each string
char **arrStr = malloc(countString * sizeof(char *));
// srand(time(NULL));
for (i = 0; i < countString; i++) {
// create a random string with a length of 'k'
// say, 5 <= k <= maxChars
// that (+ 1) is for the string terminating character '\0'
k = (rand() % (maxChars - 5)) + 5 + 1;
// allocate memory for string
arrStr[i] = malloc(k * sizeof(char));
for (j = 0; j < k - 1; j++) {
*(arrStr[i] + j) = rand() % 26 + 'A';
}
*(arrStr[i] + j) = '\0';
}
return arrStr;
}
int main() {
int countString = 10;
int maxChars = 20;
char **arrStr = NULL;
int i;
arrStr = word(countString, maxChars);
for (i = 0; i < countString; i++) {
printf("%s\n", *(arrStr + i));
}
// free the strings first
// and then the array of string pointers
for (i = 0; i < countString; i++) {
free(arrStr[i]);
}
free(arrStr);
return 0;
}
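The note about reading the file once could look roughly like the sketch below. It is not taken from the answer: load_words, random_word and the WORD_LEN limit are assumed names and values, and words longer than WORD_LEN - 1 characters would be split across entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_LEN 20   /* assumed maximum word length, including '\0' */

/* Hypothetical helper: reads up to max lines from path into list, strips
   the trailing newline, closes the file, and returns the number of words
   read (0 if the file cannot be opened). */
size_t load_words(const char *path, char list[][WORD_LEN], size_t max)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    size_t n = 0;
    while (n < max && fgets(list[n], WORD_LEN, fp) != NULL) {
        list[n][strcspn(list[n], "\n")] = '\0';
        if (list[n][0] != '\0')     /* skip empty lines */
            n++;
    }
    fclose(fp);
    return n;
}

/* Picks one of the n loaded words at random; n must be greater than 0.
   Call srand() once in main before using this. */
const char *random_word(char list[][WORD_LEN], size_t n)
{
    return list[(size_t)rand() % n];
}
```

To fill the 20*20 table from the question, main would call load_words once and then assign puzzle[i][j] = random_word(list, n) inside the nested loops, with puzzle holding const char * elements. Each cell then points at its own word instead of at one shared buffer.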
I have a .csv file that reads like:
SKU,Plant,Qty
40000,ca56,1245
40000,ca81,12553.3
40000,ca82,125.3
45000,ca62,0
45000,ca71,3
45000,ca78,54.9
Note: This is my example but in reality this has about 500,000 rows and 3 columns.
I am trying to convert these entries into a 2D array so that I can then manipulate the data. You'll notice that in my example I just set a small 10x10 matrix A to try and get this example to work before moving on to the real thing.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char *getfield(char *line, int num);
int main() {
FILE *stream = fopen("input/input.csv", "r");
char line[1000000];
int A[10][10];
int i, j = 0;
//Zero matrix
for (i = 0; i < 10; i++) {
for (j = 0; j < 10; j++) {
A[i][j] = 0;
}
}
for (i = 0; fgets(line, 1000000, stream); i++) {
while (j < 10) {
char *tmp = strdup(line);
A[i][j] = getfield(tmp, j);
free(tmp);
j++;
}
}
//print matrix
for (i = 0; i < 10; i++) {
for (j = 0; j < 10; j++) {
printf("%s\t", A[i][j]);
}
printf("\n");
}
}
const char *getfield(char *line, int num) {
const char *tok;
for (tok = strtok(line, ",");
tok && *tok;
tok = strtok(NULL, ",\n"))
{
if (!--num)
return tok;
}
return 0;
}
It prints only "null" errors, and it is my belief that I am making a mistake related to pointers on this line: A[i][j] = getfield(tmp, j). I'm just not really sure how to fix that.
This is work that is based almost entirely on this question: Read .CSV file in C . Any help in adapting this would be very much appreciated as it's been a couple years since I last touched C or external files.
It looks like commenters have already helped you find a few errors in your code. However, the problems are pretty entrenched. One of the biggest issues is that you're using strings. Strings are, of course, char arrays; that means that there's already a dimension in use.
It would probably be better to just use a struct like this:
struct csvTable
{
char sku[10];
char plant[10];
char qty[10];
};
That will also allow you to set your columns to the right data types (it looks like SKU could be an int, but I don't know the context).
Here's an example of that implementation. I apologize for the mess, it's adapted on the fly from something I was already working on.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Based on your estimate
// You could make this adaptive or dynamic
#define rowNum 500000
struct csvTable
{
char sku[10];
char plant[10];
char qty[10];
};
// Declare table
struct csvTable table[rowNum];
int main()
{
// Load file
FILE* fp = fopen("demo.csv", "r");
if (fp == NULL)
{
printf("Couldn't open file\n");
return 0;
}
for (int counter = 0; counter < rowNum; counter++)
{
char entry[100];
if (fgets(entry, sizeof entry, fp) == NULL)
break; // end of file: stop instead of parsing stale data
char *sku = strtok(entry, ",");
char *plant = strtok(NULL, ",");
char *qty = strtok(NULL, ",");
if (sku != NULL && plant != NULL && qty != NULL)
{
strcpy(table[counter].sku, sku);
strcpy(table[counter].plant, plant);
strcpy(table[counter].qty, qty);
}
else
{
strcpy(table[counter].sku, "\0");
strcpy(table[counter].plant, "\0");
strcpy(table[counter].qty, "\0");
}
}
// Prove that the process worked
for (int printCounter = 0; printCounter < rowNum; printCounter++)
{
printf("Row %d: column 1 = %s, column 2 = %s, column 3 = %s\n",
printCounter + 1, table[printCounter].sku,
table[printCounter].plant, table[printCounter].qty);
}
// Wait for keypress to exit
getchar();
}
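The remark about giving the columns the right data types can be sketched as follows. The field types are a guess from the sample rows (SKU as int, Qty as double), and csvRow and parse_row are hypothetical names, not part of the answer above.

```c
#include <stdio.h>

/* Hypothetical typed row; the field types are guessed from the sample. */
struct csvRow {
    int sku;
    char plant[10];
    double qty;
};

/* Parses one "SKU,Plant,Qty" line. Returns 1 on success and 0 otherwise,
   for example on the header line, whose first field is not a number. */
int parse_row(const char *line, struct csvRow *row)
{
    return sscanf(line, "%d,%9[^,],%lf", &row->sku, row->plant, &row->qty) == 3;
}
```

Because the header line fails to parse, a reading loop can simply skip every line for which parse_row returns 0.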
There are multiple problems in your code:
In the second loop, you do not stop reading the file after 10 lines, so you would try and store elements beyond the end of the A array.
You do not reset j to 0 at the start of the while (j < 10) loop. j happens to have the value 10 at the end of the initialization loop, so you effectively do not store anything into the matrix.
The matrix A should be a 2D array of char *, not int, or potentially an array of structures.
Here is a simpler version with an allocated array of structures:
#include <stdio.h>
#include <stdlib.h>
typedef struct item_t {
char SKU[20];
char Plant[20];
char Qty[20];
} item_t;
int main(void) {
FILE *stream = fopen("input/input.csv", "r");
char line[200];
int size = 0, len = 0, i;
char c; // %c in sscanf needs a char, not an int
item_t *A = NULL;
if (stream) {
while (fgets(line, sizeof(line), stream)) {
if (len == size) {
size = size ? size * 2 : 1000;
A = realloc(A, sizeof(*A) * size);
if (A == NULL) {
fprintf(stderr, "out of memory for %d items\n", size);
return 1;
}
}
if (sscanf(line, "%19[^,\n],%19[^,\n],%19[^,\n]%c",
A[len].SKU, A[len].Plant, A[len].Qty, &c) != 4
|| c != '\n') {
fprintf(stderr, "invalid format: %s\n", line);
} else {
len++;
}
}
fclose(stream);
//print matrix
for (i = 0; i < len; i++) {
printf("%s,%s,%s\n", A[i].SKU, A[i].Plant, A[i].Qty);
}
free(A);
}
return 0;
}
Suppose that we have the string "11222222345646". How can I print out the longest run of repeated characters, 222222, in C?
I have a function here, but I think something is incorrect. Can someone correct it for me?
int *longestsubstring(int a[], int n, int *length)
{
int location = 0;
length = 0;
int i, j;
for (i = 0, j = 0; i <= n-1, j < i; i++, j++)
{
if (a[i] != a[j])
{
if (i - j >= *length)
{
*length = i - j;
location = j;
}
j = i;
}
}
return &a[location];
}
Sorry, I don't really understand your question.
I just have a little code that prints the longest substring; I hope it helps.
/*brief : print the longest sub string*/
void printLongestSubString(const char * str,int length)
{
if(length <= 0)
return;
int i ;
int num1 = 0,num2 = 0;
int location = 0;
for(i = 0; i< length - 1; ++i)
{
if(str[i] == str[i+1])
++num2;//count the sub string ,may be not the longest,but we should try.
else
{
if(num2 >num1)//I use num1 store the sum longest of current sub string.
{ num1 = num2;location = i - num2;}
else
;//do nothing for short sub string.
num2 = 0;
}
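}
if(num2 > num1)//the longest run may be the one that ends the string.
{ num1 = num2;location = i - num2;}
for(i = location;i <= location + num1;++i)//num1 counts equal pairs, so the run has num1 + 1 chars.
printf("%c",str[i]);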
printf("\n");
}
int main()
{
char * str = "1122222234566";
printLongestSubString(str,13);
return 0;
}
From your code it appears you want to return the longest sub-sequence (sub-string). Since I'm relearning C I thought I would give it a shot.
I've used strndup to extract the substring. I'm not sure how portable it is, but I found an implementation if needed. It allocates memory to store the new C string, so you have to remember to free the memory once you are finished with the substring. Following your argument list, the length of the substring is returned through the third argument of the extraction routine.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *extract_longest_subsequence(const char *str, size_t str_len, size_t *longest_len);
int main()
{
char str[] = "11222234555555564666666";
size_t substr_len = 0;
char *substr = extract_longest_subsequence(str, sizeof(str), &substr_len);
if (!substr)
{
printf("Error: NULL sub-string returned\n");
return 1;
}
printf("original string: %s, length: %zu\n", str, sizeof(str)-1);
printf("Longest sub-string: %s, length: %zu\n", substr, substr_len);
/* Have to remember to free the memory allocated by strndup */
free(substr);
return 0;
}
char *extract_longest_subsequence(const char *str, size_t str_len, size_t *longest_len)
{
if (str == NULL || str_len < 1 || longest_len == NULL)
return NULL;
size_t longest_start = 0;
*longest_len = 0;
size_t curr_len = 1;
size_t i = 0;
for (i = 1; i < str_len; ++i)
{
if (str[i-1] == str[i])
{
++curr_len;
}
else
{
if (curr_len > *longest_len)
{
longest_start = i - curr_len;
*longest_len = curr_len;
}
curr_len = 1;
}
}
/* strndup allocates memory for storing the substring */
return strndup(str + longest_start, *longest_len);
}
It looks like in your loop that j is supposed to be storing where the current "substring" starts, and i is the index of the character that you are currently looking at. In that case, you want to change
for (i = 0, j = 0; i <= n-1, j < i; i++, j++)
to
for (i = 0, j = 0; i <= n-1; i++)
That way, you are using i to store which character you're looking at, and the j = i line will "reset" which string of characters you are checking the length of.
Also, a few other things:
1) length = 0 should be *length = 0. You probably don't actually want to set the pointer to point to address 0x0.
2) That last line would return where your "largest substring" starts, but it doesn't truncate where the characters start to change (i.e. the resulting string isn't necessarily *length long). It can be intentional depending on use case, but figured I'd mention it in case it saves some grief.
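Putting the loop change and the *length fix together, the function might end up looking like the sketch below. Two details are my own assumptions rather than part of the points above: the loop runs up to i == n so that a run ending at the last element is also counted, and > is used instead of >= so that the first of several equally long runs wins.

```c
/* Returns a pointer to the start of the longest run of equal values in
   a[0..n-1] and stores the run length in *length (0 when n <= 0). */
int *longestsubstring(int a[], int n, int *length)
{
    int location = 0;
    int i, j;
    *length = 0;
    /* j marks the start of the current run; i == n closes the last run */
    for (i = 0, j = 0; i <= n; i++)
    {
        if (i == n || a[i] != a[j])
        {
            if (i - j > *length)
            {
                *length = i - j;
                location = j;
            }
            j = i;
        }
    }
    return &a[location];
}
```

The caller still has to use *length to know where the run ends, since the returned pointer refers into the original array.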