I'm working on a program that reads every integer in a CSV file and copies it into a buffer so that I can later use it to construct a binary search tree. I'll show my code, then explain the issue I'm having:
Code -
int *createBuffer(int count) {
FILE *file = fopen(FILE1, "r");
int buffer[count + 1];
int *bufferPointer = buffer;
int number;
int ch;
int i = 0;
while (1) {
ch = fgetc(file);
if(ch == EOF){
break;
}
if (fscanf(file, "%i", &number)) {
buffer[i] = number;
i++;
}
}
return bufferPointer;
}
Count refers to the number of commas that are present in the file so I can allocate enough space for each number in the array. The file pointer points to the file I'm opening in read-only mode. The buffer is created using the aforementioned count variable. bufferPointer is the pointer to the buffer that I'm returning from the function. The while loop runs until the variable ch is equal to EOF at which point it breaks. The if statement's purpose is basically to scan the file for integers and read them into number, and then copy number into the next buffer index. Finally, the buffer pointer is returned.
This code is giving me extremely strange results. When I print the buffer, I get the result:
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 850045856 0 -2141008008 32767 0 0 214814639 1 0 0 -2141007448 32767 0 0 214814639 1 -487430544 32766 539243238 32767 -2141007448 32767 6 0 -487430496 32766 539279361 32767 0 0 0 0 0 0 0 0 -487430272 32766 539271526 32767 92 68 68 0 0 0 69 0 -2141007976 32767 0 0 42 68 55 46 10 40 44 100 75 63 19 13 10 95 43 47 47 49 59 40 0 0 -2141006600 %
The reason this is strange is because although I am getting some garbage values, the entire sequence from 42...40 matches numbers in my data file. I'm not exactly sure where I'm going wrong in this code so if anyone knows, please do share.
As always, if you take the time to answer or attempt to answer this question, thank you for your time. If you need further clarification, don't hesitate to ask.
Two things go wrong in your code. First, buffer is a local array, so the pointer you return is dangling as soon as createBuffer returns; that is where the garbage values come from. The version below allocates the buffer on the heap instead, and the caller is responsible for freeing it.
Second, your loop calls ch = fgetc(file) before fscanf, so the first iteration consumes the first character of the file. If the first number in your file is 220, the 2 is thrown away and 20 is stored. This problem does not occur in the remaining iterations, because the character consumed there is the comma.
To get around that problem, we can put ch = fgetc(file) at the end of the loop. This way, after entering the loop, it reads the first number, gets rid of the comma, reads the next number, gets rid of the comma, and so on.
#include<stdio.h>
#include<stdlib.h>
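int *createBuffer(int count) {
    FILE *file = fopen("filename.txt", "r");
    if (file == NULL) {
        return NULL; // could not open the file
    }
    // calloc zero-fills, so unused slots print as 0 rather than garbage
    int *buffer = calloc(count + 1, sizeof(int));
    if (buffer == NULL) {
        fclose(file);
        return NULL;
    }
    int number;
    int ch;
    int i = 0;
    while (i < count + 1) {
        // fscanf returns 1 only when it actually parsed a number
        if (fscanf(file, "%i", &number) == 1) {
            buffer[i] = number;
            i++;
        }
        // consume the separator that follows the number
        ch = fgetc(file);
        if (ch == EOF) {
            break;
        }
    }
    fclose(file);
    return buffer;
}
int main(void) {
    int *arr = createBuffer(10);
    if (arr == NULL) {
        return 1;
    }
    for (int i = 0; i < 10; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    free(arr);
    return 0;
}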
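As a rough sketch of the same idea in a reusable form, the helper below reads comma-separated integers from an already opened stream into a caller-supplied array. The name read_csv_ints and its parameters are hypothetical, not part of the original code. It uses %d rather than %i, on the assumption that the file holds plain decimal numbers (with %i, a value such as 010 would be read as octal).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads comma-separated integers from fp into out,
   storing at most cap values. Returns the number of values stored. */
size_t read_csv_ints(FILE *fp, int *out, size_t cap)
{
    assert(fp != NULL && out != NULL);
    size_t n = 0;
    int value;

    /* fscanf returns 1 only when a number was parsed; it returns 0 on a
       mismatch and EOF at end of input, and both end the loop. */
    while (n < cap && fscanf(fp, "%d", &value) == 1) {
        out[n++] = value;

        /* Consume one separator after the number. */
        int ch = fgetc(fp);
        if (ch == EOF)
            break;
        if (ch != ',' && ch != '\n')
            ungetc(ch, fp); /* not a separator: let fscanf look at it */
    }
    return n;
}
```

Because the count comes back as the return value, the caller does not have to guess how many slots were actually filled.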
I have a text file that contains:
1 1 1
1 2 2
1 3 2
1 7 5
1 8 4
1 9 4
1 10 2
...
and this is my function:
void addRatings()
{
int n,m,l;
int a[50][100];
MovieR = fopen("d://ratings.txt","r");
l = LineNum(MovieR);
MovieR = fopen("d://ratings.txt","r");
for(int i=0;i<l;i++)
{
fscanf(MovieR,"%[^\t]\t%[^\t]\t%[^\t]\n",&n,&m,&a[n][m]);
}
}
Now I want to get the first and second column for n and m
then I want to give third column to the a[n][m].
How can I do that?
You need to read the third value into a temporary variable, and then store that value into the array if and only if the following conditions are met:
fscanf returned 3, meaning that it actually found three numbers
the value for n is between 0 and 49 inclusive
the value for m is between 0 and 99 inclusive
And the code doesn't need to count the number of lines (using LineNum()). The loop should end when fscanf runs out of numbers to read, i.e. returns something other than 3.
The resulting code looks something like this:
void addRatings(void)
{
int a[50][100] = {{0}}; // initialize all ratings to 0
FILE *MovieR = fopen("d://ratings.txt", "r");
if (MovieR != NULL)
{
int n, m, rating;
while (fscanf(MovieR, "%d%d%d", &n, &m, &rating) == 3) // loop until end-of-file
{
if (n < 0 || n > 49 || m < 0 || m > 99) // check for valid indexes
break;
a[n][m] = rating;
}
fclose(MovieR);
}
}
I have little experience with C and am still on the learning curve.
I am working on a little project that involves dividing a 32-char string into 4 strings of 8 chars each in C.
The 32-char string should resemble a 32-bit instruction. Those "32 bits" are divided into 4 "8-bit" strings that I want to print out as hex. The code below is what I have so far. The data types are the ones I am using in the rest of my code. I intend to feed the unsigned char t variable into a Substitution Box program that will give me the equivalent of that t char from the S-Box lookup table.
The code below seems to me like it should work
unsigned char inst[] = "10101010101010101111111100111101";
unsigned char in[8];
for (int i = 0; i < 33; i++){
if (i%8 == 0 && i != 0) {
unsigned char t = (unsigned char) strtol(in, NULL, 2);
printf("%x \n", t);
}
in[i%8] = inst[i];
printf("%c ", in[i%8]);
}
but the output looks like this:
1 0 1 0 1 0 1 0 3d
1 0 1 0 1 0 1 0 3d
1 1 1 1 1 1 1 1 3d
0 0 1 1 1 1 0 1 3d
I can see that in[i%8] = inst[i]; line is reading the chars from inst[] correctly, but the
if (i%8 == 0 && i != 0) {
unsigned char t = (unsigned char) strtol(in, NULL, 2);
printf("%x \n", t);
}
conditional statement prints the wrong hex.
The output should look like something like this
1 0 1 0 1 0 1 0 aa
1 0 1 0 1 0 1 0 aa
1 1 1 1 1 1 1 1 ff
0 0 1 1 1 1 0 1 3d
Any help would be appreciated.
Problems with the current code:
"4 strings of 8 chars each" is char in[4][8+1]; and not char in[8]. You need room for null termination.
32 bits means iterate from 0 to 31, not from 0 to 32.
There's no need to copy byte by byte. It's slow and makes everything needlessly complicated.
This seems to be the requirements:
Split the original string in 4 sub strings.
Convert each sub string to an integer.
Display the result as hex
In which case you can simply iterate 4 times over the input string:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (void)
{
const char* inst = "10101010101010101111111100111101";
char in [4][8+1];
puts("Bin Hex");
for(size_t i=0; i<4; i++)
{
memcpy(in[i], &inst[i*8], 8);
in[i][8] = '\0';
unsigned long val = strtoul(in[i], NULL, 2);
printf("%.8s %.2lX\n", in[i], val);
}
}
Output:
Bin Hex
10101010 AA
10101010 AA
11111111 FF
00111101 3D
The problem is that your in is not NUL-terminated.
Thus passing in to strtol invokes undefined behavior.
Do as below.
unsigned char in[9]; // +1 to hold the NUL char.
....
if (i%8 == 0 && i != 0) {
    in[8] = '\0'; // NUL-terminate the string.
    unsigned char t = (unsigned char) strtol((const char *)in, NULL, 2);
    printf("%x \n", t);
}
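If copying into a temporary string is not needed at all, one alternative is to build each byte directly from the characters, which sidesteps the NUL-termination issue. This is only a sketch; bits_to_byte is a hypothetical name, and it assumes the caller guarantees at least 8 readable characters at the given position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts exactly 8 characters of '0'/'1', starting
   at bits, into one byte value (0..255). Returns -1 if any of the 8
   characters is not a binary digit. No NUL terminator is required. */
int bits_to_byte(const char *bits)
{
    assert(bits != NULL);
    int value = 0;
    for (size_t i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        /* shift the accumulated bits left and append the new one */
        value = (value << 1) | (bits[i] - '0');
    }
    return value;
}
```

For the instruction string in the question, calling it on inst, inst + 8, inst + 16 and inst + 24 gives the four bytes, which can then be printed with %x after a cast to unsigned.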
I am trying to write a function which gets a matrix 9x9 and updates it accordingly to user's input with the following rules:
Valid number is between 1 and 9 (zero is invalid).
I have to use scanf until I get EOF.
Input has digits and symbols. valid input is a pair of two digits following with a symbol or EOF or space. string with more than two digits is invalid. for example (123% isn't valid but 12% is valid).
Example:
Input: 10 33%55^21 $123%
Output:
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Explanation: 10 and 123 are invalid. 33, 55 and 21 are valid so we will put 1 in 22, 44 and 10.
What I tried to do:
void updateMarix(int matrix[][9]) {
int digits = 0, one_previous, two_previous;
char input;
while (scanf("%c", &input) != EOF) {
if(isValidDigit(input)) {
digits++;
if(digits == 1) {
two_previous = input - '0' - 1;
continue;
} else if(digits == 2){
one_previous = input - '0' -1;
continue;
}
} else if(digits == 2) {
matrix[two_previous][one_previous]++;
}
digits = 0; // reset
}
}
Most tests pass, but some fail. I think that is because I don't handle the last input: if the input ends with 22, for example, the matrix is not updated, because in my implementation the update happens on the next iteration, when another symbol is read.
Is there a better implementation for this? My code has become messy.
*Edit: It should ignore invalid input. a3b doesn't count and a03b doesn't count either, but a13b does count as 13, meaning we should increase the number in matrix[0][2].
Edit 2: @JonathanLeffler mentioned an FSM, so I tried to create one:
It doesn't handle the case of 1234 (invalid number) or 123 (also invalid), though. The closest I got was to add an arrow from the second digit to the symbol, but that isn't quite right, because in 1234%12 only 12 is valid.
I think your FSM needs 4 states plus the end state:
Zero digits read (D0).
One digit read (D1).
Two digits read (D2).
Digits are invalid but no more error reporting needed (DI).
There are 4 different inputs, too:
Digit 1-9.
Digit 0.
Other.
EOF.
I've used a switch on state and if/else code in each state, but it leads to somewhat verbose code. OTOH, I believe it handles inputs correctly.
/*
** FSM
** States: 0 digits (D0), 1 digit (D1), 2 digits (D2), digits invalid (DI)
** Inputs: digit 1-9 (D), digit 0 (0), other (O), EOF.
** Action: S - save, E - error, I - ignore, P - print
** Body of FSM encodes "action;state"
**
** State    D0      D1      D2      DI
** Input
** D        S;D1    S;D2    E;DI    I;DI
** O        I;D0    E;D0    P;D0    I;D0
** 0        E;DI    E;DI    E;DI    I;DI
** EOF      I;end   E;end   P;end   I;end
*/
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
enum State { D0, D1, D2, DI };
enum Input { Digit, Zero, Other, End };
static int debug = 0;
static enum Input input(int *rv)
{
int c = getchar();
if (debug)
printf("Input: %c\n", (c == EOF) ? 'X' : c);
*rv = c;
if (c == EOF)
return End;
if (isdigit(c))
{
*rv = c - '0';
return (c == '0') ? Zero : Digit;
}
return Other;
}
static void updateMatrix(int matrix[9][9])
{
char pair[2] = { 0, 0 };
enum State state = D0;
int c;
enum Input value;
while ((value = input(&c)) != End)
{
switch (state)
{
case D0:
if (value == Digit)
{
pair[0] = c;
state = D1;
}
else if (value == Zero)
{
fprintf(stderr, "Received zero digit - invalid\n");
state = DI;
}
else
{
assert(value == Other);
}
break;
case D1:
if (value == Digit)
{
pair[1] = c;
state = D2;
}
else if (value == Zero)
{
fprintf(stderr, "Received zero digit - invalid\n");
state = DI;
}
else
{
assert(value == Other);
fprintf(stderr, "Received one digit where two expected\n");
state = D0;
}
break;
case D2:
if (value == Digit)
{
fprintf(stderr, "Received more than two digits where two were expected\n");
state = DI;
}
else if (value == Zero)
{
fprintf(stderr, "Received zero digit - invalid\n");
state = DI;
}
else
{
assert(value == Other);
printf("Valid number %d%d\n", pair[0], pair[1]);
matrix[pair[0]-1][pair[1]-1] = 1;
state = D0;
}
break;
case DI:
if (value == Other)
state = D0;
break;
}
}
if (state == D2)
{
printf("Valid number %d%d\n", pair[0], pair[1]);
matrix[pair[0]-1][pair[1]-1] = 1;
}
else if (state == D1)
fprintf(stderr, "Received one digit where two expected\n");
}
static void dump_matrix(const char *tag, int matrix[9][9])
{
printf("%s:\n", tag);
for (int i = 0; i < 9; i++)
{
for (int j = 0; j < 9; j++)
printf("%4d", matrix[i][j]);
putchar('\n');
}
}
int main(void)
{
int matrix[9][9] = { 0 };
updateMatrix(matrix);
dump_matrix("After input", matrix);
return 0;
}
On your test input, it produces the output:
Received zero digit - invalid
Valid number 33
Valid number 55
Valid number 21
Received more than two digits where two were expected
After input:
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
On the mostly-invalid input file:
123345132
bbbb12cccc1dddd011dd
it produces the output:
Received more than two digits where two were expected
Valid number 12
Received one digit where two expected
Received zero digit - invalid
After input:
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
You can argue (easily) that the error messages could be more informative (identifying the erroneous character, and possibly the prior valid digits), but it only produces one error message for each invalid sequence, which is beneficial.
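For comparison, here is a more compact sketch of the same state machine. It is not the code above: it works on a string instead of stdin so that it is easy to test, it drops the error messages, and the name mark_pairs is hypothetical. The end of the string plays the role of EOF and is treated like an "other" character.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum PairState { P_D0, P_D1, P_D2, P_DI };

/* Hypothetical string-based variant of the FSM: sets matrix[r-1][c-1] = 1
   for every valid two-digit pair "rc" in text and returns how many valid
   pairs were found. */
int mark_pairs(const char *text, int matrix[9][9])
{
    assert(text != NULL);
    enum PairState state = P_D0;
    int first = 0, second = 0, found = 0;
    size_t len = strlen(text);

    for (size_t i = 0; i <= len; i++) {        /* i == len stands in for EOF */
        unsigned char ch = (unsigned char)text[i];
        if (i < len && isdigit(ch)) {
            int d = ch - '0';
            if (d == 0 || state == P_D2 || state == P_DI) {
                state = P_DI;                   /* zero, or a third digit */
            } else if (state == P_D0) {
                first = d;
                state = P_D1;
            } else {                            /* state == P_D1 */
                second = d;
                state = P_D2;
            }
        } else {
            if (state == P_D2) {                /* pair closed by symbol or EOF */
                matrix[first - 1][second - 1] = 1;
                found++;
            }
            state = P_D0;
        }
    }
    return found;
}
```

Folding the digit cases into one branch is shorter than the switch, at the cost of losing the per-state error reporting.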
You could use a combination of fgets(), sscanf() and strpbrk() for this.
The input line is read into a character array str and a pointer ptr pointing to the part of the string in str being processed is maintained.
First, set up a loop to read input line by line. fgets() will return NULL on EOF.
for(; fgets(str, sizeof(str), stdin); )
{
...
...
...
}
fgets() will read in the trailing newline as well, if the line fits in the buffer. You could remove it like
str[strcspn(str, "\n")]='\0';
Now inside the above loop, use another loop to process the input line in str like
for(ptr=str; (ptr=strpbrk(ptr, "0123456789"))!=NULL; ptr+=len)
{
    sscanf(ptr, "%d%n", &n, &len);
    if(len==2 && n>=11 && n%10!=0) // exactly two digits, neither of them 0
    {
        //accepted
        printf("\n%d", n);
        arr[n/10-1][n%10-1]=1;
    }
    //else discarded
}
strpbrk()'s prototype is
char *strpbrk(const char *s1, const char *s2);
and it returns a pointer to the first character in s1 which is a character in the string s2. If there is no match, NULL is returned.
So we are looking to see the first digit part in str that remains to be processed with strpbrk(ptr, "0123456789").
This number part is read into n via sscanf(). If this number is in the range you need, you may accept it.
The %n format specifier stores the number of characters that sscanf() has consumed so far. That count is the amount by which ptr must be advanced to get past the number just read.
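To see %n in isolation, here is a small sketch. The wrapper name parse_int_prefix is hypothetical; the only library call it relies on is sscanf().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: parses a decimal integer at the start of s.
   On success stores it in *value and returns the number of characters
   consumed (as reported by %n); returns 0 if no integer was found. */
int parse_int_prefix(const char *s, int *value)
{
    assert(s != NULL && value != NULL);
    int consumed = 0;
    /* %n does not count as a conversion, so success means a return of 1 */
    if (sscanf(s, "%d%n", value, &consumed) != 1)
        return 0;
    return consumed;
}
```

For the string 123% this returns 3, which is what lets the caller notice that the digit run is longer than two characters.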
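The digit in the ones place will be n%10 and that in the tens place will be n/10 as the number you need is a 2-digit number. Valid digits run from 1 to 9 while the indexes of a 9x9 matrix run from 0 to 8, so subtract 1 from each.
You may set your array representing the matrix like
arr[n/10-1][n%10-1]=1;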
So the whole thing could look something like
char *ptr, str[50];
int n, len;
for(; fgets(str, sizeof(str), stdin); )
{
    str[strcspn(str, "\n")]='\0'; // drop the trailing newline, if any
    for(ptr=str; (ptr=strpbrk(ptr, "0123456789"))!=NULL; ptr+=len)
    {
        sscanf(ptr, "%d%n", &n, &len);
        if(len==2 && n>=11 && n%10!=0) // exactly two digits, neither of them 0
        {
            printf("\n%d", n);
            arr[n/10-1][n%10-1]=1;
        }
    }
}
And for your input 10 33%55^21 $123%, the output would be
33
55
21
as 10 and 123 will be discarded.
I have a txt file that contains 2 graphs and the number of vertices in the following format:
6
0 1 0 1 0 0
1 0 1 0 0 1
0 1 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 1
0 1 0 0 1 0
0 1 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 1
0 0 1 0 1 0
1 0 0 1 0 1
0 0 1 0 1 0
The matrices represent vertex adjacency. If two vertices are adjacent, their entry is 1.
Although the graphs are not separated visually, the second graph starts after the 6th row of the first.
Each graph can have a lot of vertices, like 5000, and both graphs are the same size.
I wrote an algorithm that checks whether the two graphs are isomorphic, and I noticed that reading the graphs takes 8 seconds while the actual algorithm takes 2.5 (for 5000 vertices).
Since my goal is to optimize the overall speed of my program, I want to know if I can improve (in terms of speed) my current code for reading from the file:
FILE* file = fopen ("input.txt", "r");
fscanf (file, "%d", &i);
int n = i;
while (!feof (file))
{
fscanf (file, "%d", &i);
if (j < (n*n)) { // first graph
if (i==1) {
adj_1[j/n][v_rank_1[j/n]] = j - (j/n)*n; // add the vertice to the adjacents of the current vertice
v_rank_1[j/n] += 1;
}
}
else if (j>=(n*n)) { // second graph
if (i==1) {
adj_2[(j-(n*n))/n][v_rank_2[(j-(n*n))/n]] = (j-(n*n)) - ((j-(n*n))/n)*n; // add the vertice to the adjacents of the current vertice
v_rank_2[(j-(n*n))/n] += 1;
}
}
j++;
}
fclose (file);
The adj_* table holds the indexes of the vertices adjacent to a vertex
The v_rank_* table holds the number of vertices adjacent to a vertex
It is important that I acquire this and only this information from the graph.
The first optimization is to read the whole file into memory in one shot. Accessing memory in the loops will be faster than calling fscanf for every value.
The second optimization is to do fewer arithmetic operations, even if it means more code.
The third optimization is to treat the data from the file as characters, to avoid integer conversion.
The result could be:
// bulk read file into memory
fseek(file, 0, SEEK_END);
long fsize = ftell(file);
fseek(file, 0, SEEK_SET);
char *memFile = malloc(fsize + 1);
if (memFile == NULL) return; // not enough memory !! Handle it as you wish
size_t nread = fread(memFile, 1, fsize, file); // bytes actually read
fclose(file);
memFile[nread] = 0; // terminator stops the scanning loops below
// parse the vertex count; strtol leaves mem just past it
char *mem;
n = (int)strtol(memFile, &mem, 10);
// more code but fewer arithmetic operations
char c;
for (int lig = 0; lig < n; lig++) { // first graph
    for (int col = 0; col < n; col++) {
        for (;;) { // skip separators up to the next '0' or '1'
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_1[lig][v_rank_1[lig]++] = col; // add the vertex to the adjacents of the current vertex
                k++; // ??
                break;
            }
            if (c == '0') break;
        }
    }
}
for (int lig = 0; lig < n; lig++) { // second graph
    for (int col = 0; col < n; col++) {
        for (;;) { // skip separators up to the next '0' or '1'
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_2[lig][v_rank_2[lig]++] = col; // add the vertex to the adjacents of the current vertex
                l++; // ??
                break;
            }
            if (c == '0') break;
        }
    }
}
free(memFile);
Remarks: you said nothing about variables k and l.
You could speed it up by accessing the file system less often. You are reading one integer at a time from the file thus accessing the file every time through the loop.
Instead, try reading the whole file or a large chunk of the file at once. (This is called block reading). You can buffer it into an array. Inside your loop, read from the memory buffer instead of the file. Refresh your memory buffer as needed inside the loop if you don't read in the entire file.
Use fgets() to read a line at a time into a line buffer. Parse the line buffer into integer values.
This function reduces the number of times you read from the file, because behind the scenes, fgets() reads a large chunk of data from the file and returns a line at a time. It only attempts to read another chunk when there are no more lines left in its internal buffer.
I want to open a text file (see below), read the first int in every line and store it in an array, but I get a segmentation fault. I got rid of all gcc warnings, read through several tutorials I found on the net and searched Stack Overflow for solutions, but I couldn't work out what I am doing wrong.
It works when I have everything in the main function (see example 1), but not when I move it to a second function (see example 2 further down). In example 2 I get, if I interpret gdb correctly, a seg fault at sscanf (line,"%i",classes[i]);.
I'm afraid, it could be something trivial, but I already wasted one day on it.
Thanks in advance.
[Example 1] This works, with everything in main:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
const int LENGTH = 1024;
int main() {
char *filename="somedatafile.txt";
int *classes;
int lines;
FILE *pfile = NULL;
char line[LENGTH];
pfile=fopen(filename,"r");
int numlines=0;
char *p;
while(fgets(line,LENGTH,pfile)){
numlines++;
}
rewind(pfile);
classes=(int *)malloc(numlines*sizeof(int));
if(classes == NULL){
printf("\nMemory error.");
exit(1);
}
int i=0;
while(fgets(line,LENGTH,pfile)){
printf("\n");
p = strtok (line," ");
p = strtok (NULL, ", ");
sscanf (line,"%i",&classes[i]);
i++;
}
fclose(pfile);
return 1;
}
[Example 2] This does not work, with the functionality moved into a function:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
const int LENGTH = 1024;
void read_data(int **classes,int *lines, char *filename){
FILE *pfile = NULL;
char line[LENGTH];
pfile=fopen(filename,"r");
int numlines=0;
char *p;
while(fgets(line,LENGTH,pfile)){
numlines++;
}
rewind(pfile);
* classes=(int *)malloc(numlines*sizeof(int));
if(*classes == NULL){
printf("\nMemory error.");
exit(1);
}
int i=0;
while(fgets(line,LENGTH,pfile)){
printf("\n");
p = strtok (line," ");
p = strtok (NULL, ", ");
sscanf (line,"%i",classes[i]);
i++;
}
fclose(pfile);
*lines=numlines;
}
int main() {
char *filename="somedatafile.txt";
int *classes;
int lines;
read_data(&classes, &lines,filename) ;
for(int i=0;i<lines;i++){
printf("\nclasses[i]=%i",classes[i]);
}
return 1;
}
[Content of somedatafile.txt]
50 21 77 0 28 0 27 48 22 2
55 0 92 0 0 26 36 92 56 4
53 0 82 0 52 -5 29 30 2 1
37 0 76 0 28 18 40 48 8 1
37 0 79 0 34 -26 43 46 2 1
85 0 88 -4 6 1 3 83 80 5
56 0 81 0 -4 11 25 86 62 4
55 -1 95 -3 54 -4 40 41 2 1
53 8 77 0 28 0 23 48 24 4
37 0 101 -7 28 0 64 73 8 1
...
This:
sscanf (line,"%i",classes[i]);
is probably wrong. You need to dereference there too, try:
sscanf (line,"%i", &(*classes)[i]);
This is because classes is a pointer to an array of integers. You want the address of one of those integers, so that sscanf() can write the parsed number there. Therefore, you must first dereference classes to get the array, then say that you want the address of element number i in that array.
You could also use
sscanf (line,"%i", *classes + i);
Which might be clearer, depending on how comfortable you are with these things.
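A small, self-contained sketch of the pattern, with the file reading replaced by an array of strings so it can be tried in isolation. The function name first_ints and its parameters are hypothetical, and it uses %d on the assumption that the values are decimal.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates *classes (count ints) and stores the first
   integer found on each line. Returns how many were stored, or -1 if the
   allocation fails. The caller frees *classes. */
int first_ints(const char *const lines[], int count, int **classes)
{
    assert(lines != NULL && classes != NULL && count > 0);
    *classes = malloc((size_t)count * sizeof **classes);
    if (*classes == NULL)
        return -1;

    int stored = 0;
    for (int i = 0; i < count; i++) {
        /* (*classes) is the int array; &(*classes)[stored] is the address
           of one element in it. classes[stored] would instead index an
           array of int* that does not exist. */
        if (sscanf(lines[i], "%d", &(*classes)[stored]) == 1)
            stored++;
    }
    return stored;
}
```

The caller passes the address of its own int * variable, exactly as main does with &classes in the question.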
The problem is that you're applying the [] operator to an int* in the first case and to an int** in the second. An int** behaves like an array of int*, so when you use [] on it you are indexing into an array of pointers. That is not what you want here, because only the first entry of that array (the pointer in main) exists. Accessing classes[1] therefore reads an uninitialized pointer and will likely crash. If you can compile as C++, you could avoid this confusion by passing the pointer by reference instead of as a double pointer (C has no references):
int*& classes instead of int** classes
Then you could use the same code as in your main function.