Read a txt file with gets in C

I want to know the best way to read a txt file that contains two lines of numbers using the gets function in C, and to save them in an array, all within 1 second.
Assume a txt file called ooo.txt. Its first line holds the number 2.000.000 (which will be the size of the array), and its second line holds the 2.000.000 numbers that will be stored in the array.
Example:
2000000
59 595 45 492 89289 5 8959 (+1.999.993 numbers)
Code I tried (only the fscanf part):
int t_size;
fscanf(fp, "%d",&t_size); // read the array size from the first line
int* my_array = NULL;
my_array = malloc(t_size*sizeof(*my_array));
if (my_array==NULL) {
printf("Error allocating memory!\n"); //print an error message
getchar(); // wait for a key press; this was unreachable after the return
return 1; //return with failure
}
int i =0;
for ( i = 0; i < t_size; i++ )
{
fscanf(fp, "%d",&my_array[i]); /* my_array[i] is the element at index i and &my_array[i] is its address */
}
Best code so far for getting the procedure to run within 1 second:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <time.h>
int is_end(char* input) {
return *input == 0;
}
int is_linebreak(char* input) {
return *input == '\r' || *input == '\n' || *input == ' ';
}
char* eat_linebreaks(char* input) {
while (is_linebreak(input))
++input;
return input;
}
size_t count_lines(char* input) {
char* p = input;
size_t rows = 1;
if (is_end(p))
return 0;
while (!is_end(p)) {
if (is_linebreak(p)) {
++rows;
p = eat_linebreaks(p);
}
else {
++p;
}
}
return rows;
}
/* split string by lines */
char** get_lines(char* input, size_t line_count) {
char* p = input;
char* from = input;
size_t length = 0;
size_t line = 0;
int i;
char** lines = (char**)malloc(line_count * sizeof(char*));
do {
if (is_end(p) || is_linebreak(p)) {
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
length = 0;
++line;
p = eat_linebreaks(p);
from = p;
}
else {
++length;
++p;
}
} while (!is_end(p));
// Copy the last line as well in case the input doesn't end in line-break
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
++line;
return lines;
}
int main(int argc, char* argv[]) {
clock_t start;
unsigned long microseconds;
float seconds;
char** lines;
size_t size;
size_t number_of_rows;
int count;
int* my_array;
start = clock();
FILE *stream;
char *contents;
int fileSize = 0;
int i;
// Open file, find the size of it
stream = fopen(argv[1], "rb");
fseek(stream, 0L, SEEK_END);
fileSize = ftell(stream);
fseek(stream, 0L, SEEK_SET);
// Allocate space for the entire file content
contents = (char*)malloc(fileSize + 1);
// Stream file into memory
size = fread(contents, 1, fileSize, stream);
contents[size] = 0;
fclose(stream);
// Count rows in content
number_of_rows = count_lines(contents);
// Get array of char*, one for each line
lines = get_lines(contents, number_of_rows);
// Get the numbers out of the lines
count = atoi(lines[0]); // First row has count
my_array = (int*)malloc(count * sizeof(int));
for (i = 0; i < count; ++i) {
my_array[i] = atoi(lines[i + 1]);
}
microseconds = clock() - start; // clock ticks, not necessarily microseconds
seconds = (float)microseconds / CLOCKS_PER_SEC;
printf("Took %fs\n", seconds);
return 0;
}

First of all, use fgets instead of gets to avoid dangerous buffer overflows. Second, remove all punctuation from your numbers, so that 2.000.000 becomes 2000000. Then you can walk the buffer with a pointer and use the strtol function to convert characters to integers; similar functions exist for floats and other types (strtod, for example).
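A minimal sketch of the strtol advice above, assuming the numbers are separated by whitespace and each fits in a long. The helper name parse_longs is hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert up to max whitespace-separated integers
   from text into out. Returns how many were converted; stops at the
   first token that is not a number or does not fit in a long. */
size_t parse_longs(const char *text, long *out, size_t max) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        char *end;
        errno = 0;
        long value = strtol(text, &end, 10);
        if (end == text || errno == ERANGE)
            break;              /* no digits found, or out of range */
        out[n++] = value;
        text = end;             /* continue right after this number */
    }
    return n;
}
```

A line holding 2.000.000 numbers will not fit in a small fgets buffer, so a helper like this is better suited to a buffer that holds the whole file, or to input with shorter lines.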

Since the code needs speed and I/O is typically the bottleneck, reading the entire file at once after using fstat() to find its length (@Charlon) makes some sense. What follows is a quick way to parse that buffer.
// Stream file into memory
size = fread(contents, 1, fileSize, stream);
contents[size] = 0;
fclose(stream);
#if 1
// new code
size_t array_n;
int n;
if (sscanf(contents, "%zu%n", &array_n, &n) != 1) Handle_BadInput();
my_array = malloc(array_n * sizeof *my_array);
if (my_array == NULL) Handle_OOM();
char *p = &contents[n];
errno = 0;
char *endptr;
for (size_t count = 0; count < array_n; count++) {
my_array[count] = strtol(p, &endptr, 10);
if (p == endptr || errno)
Handle_BadInput();
p = endptr;
}
char ch;
if (sscanf(p, " %c", &ch) == 1) Handle_ExtraInput();
#else
//old code
// Count rows in content
number_of_rows = count_lines(contents);
// Get array of char*, one for each line
lines = get_lines(contents, number_of_rows);
// Get the numbers out of the lines
count = atoi(lines[0]); // First row has count
my_array = (int*)malloc(count * sizeof(int));
for (i = 0; i < count; ++i) {
my_array[i] = atoi(lines[i + 1]);
}
#endif
I still prefer the scalable approach of reading one number at a time.

The fastest way needs a lot of RAM:
1) open the file (man open)
2) use the fstat function to get the size of your file (man fstat)
3) read the file into a buffer malloc-ed with the size you got in step 2 (man malloc)
4) close the file (man close)
5) parse your buffer and convert each block of digits (each run up to ' ', '\n' or '\0') to an int
EDIT: if you do not have enough RAM, you need to write a get_next_int function that only keeps the next number from the file in your buffer.
EDIT 2: You can read just far enough to learn how many ints you will need to store, compare that number (with a safety margin) to the size of your RAM, and pick the appropriate approach so that your allocation does not fail with errno set to ENOMEM.
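The get_next_int idea from the first EDIT could be sketched as follows. This is only an illustration under stated assumptions: the int_reader type, the 4096-byte buffer, and the restriction to non-negative numbers that fit in an int are all choices made for the example, not something the answer specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical reader state: a small fixed buffer refilled with fread. */
typedef struct {
    FILE *fp;
    unsigned char buf[4096];
    size_t len, pos;
} int_reader;

/* Return the next byte of the file, refilling the buffer when it runs out. */
static int next_char(int_reader *r) {
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return r->buf[r->pos++];
}

/* Store the next non-negative integer in *out. Returns 1 on success,
   0 at end of file. Assumes the value fits in an int (no overflow check). */
int get_next_int(int_reader *r, int *out) {
    assert(r != NULL && out != NULL);
    int c = next_char(r);
    while (c != EOF && !isdigit(c))
        c = next_char(r);          /* skip separators */
    if (c == EOF)
        return 0;
    int value = 0;
    while (c != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
        c = next_char(r);
    }
    *out = value;
    return 1;
}
```

Memory use stays constant regardless of file size, because only one buffer's worth of the file is held at a time. Initialize the state with `int_reader r = { .fp = fp };` before the first call.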

Related

How to separate a string and order each record by left number in C

I have a file.csv. Each line contains two numbers separated by a comma. I store every line, as a string, in an array of char arrays (arr). My aim is to sort the lines in ascending order by the left number (the number before the comma). These are the first left numbers I have to sort; the whole file is shown below:
9514902
1134289
7070279
etc.
I tried strtok(), but it deletes the number after the comma. I need both numbers of each pair.
To order the numbers I used insertion sort, converting my strings (each pair of numbers with its comma is one string to me) to long integers in order to compare them. The swap function doesn't work: it gives me back numbers that I never passed to it.
How can I resolve this?
main.c
#define SIZE 10
#define LEN 20
void swap(char *xp, char *yp){
char *temp=xp;
*xp = *yp;
*yp = *temp;
}
int main(){
FILE *fd = NULL;
fd = fopen("file.csv", "r");
int pos=0;
char (*arr)[LEN] = NULL;
arr = calloc ( SIZE, sizeof *arr);
while ( pos < SIZE && fgets ( arr[pos], sizeof arr[pos], fd)) {
++pos;
}
int i, j;
char *ptr;
for (i = 1; i < SIZE; i++){
char *p = strtok(arr[i], ",");
long pivot= strtol(p,&ptr,10);
char * c = strtok(arr[i-1], ",");
long value= strtol(c,&ptr,10);
for (j = i - 1; (j >= 0) && (value>pivot); j--){
swap(arr[j],arr[j+1]);
j--;
c = strtok(arr[j], ",");
value= strtol(c,&ptr,10);
}
}
}
file.csv
9514902,846
1134289,572
7070279,994
30886,48552
750704,1169
1385812,729
471548,3595
8908491,196
4915590,362
375309,212
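You can do it without strtok too. Maybe this is not exactly what you are looking for, but it shows another way of splitting each line. Note that it orders the two numbers within each line and writes the result to result.csv. Have fun!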
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(){
FILE* pFile = fopen("file.csv", "r");
if(pFile == NULL)
return 1;
char aBuf[256];
char aResult[1024];
aResult[0] = '\0';
// get each line
for(int i=0; fgets(aBuf, sizeof(aBuf), pFile) != NULL; i++){
// find comma in line
for(int j=0; j < strlen(aBuf); j++){
if(aBuf[j] != ',')
continue;
// copy everything before comma
char aAnotherBuffer[50];
strncpy(aAnotherBuffer, aBuf, j);
aAnotherBuffer[j] = '\0'; // strncpy does not terminate the copy here
// convert it to integer
int FirstNum = atoi(aAnotherBuffer);
// get after of comma and convert it too
int SecondNum = atoi(aBuf + j + 1);
// make line that with sorted values
char aRes[256];
sprintf(aRes,
"%d,%d\n",
FirstNum < SecondNum? FirstNum: SecondNum,
FirstNum > SecondNum? FirstNum: SecondNum
);
// concatenate to result buffer
strcat(aResult, aRes);
// go for next line
break;
}
}
fclose(pFile);
// save results
{
FILE* pResFile = fopen("result.csv", "w");
if(pResFile){
fputs(aResult, pResFile);
fclose(pResFile);
}
}
return 0;
}
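To order whole records by the left number, as the question asks, one option is to parse each line into a small struct and let qsort move both numbers together. This is a sketch rather than the answerer's method; the pair type and the parse_pair, by_left and sort_pairs names are hypothetical, and qsort is swapped in for the question's insertion sort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type: one CSV row, both numbers kept together. */
typedef struct {
    long left;
    long right;
} pair;

/* Parse "left,right" into *p. Returns 1 on success, 0 on malformed input. */
int parse_pair(const char *line, pair *p) {
    assert(line != NULL && p != NULL);
    char *end;
    p->left = strtol(line, &end, 10);
    if (end == line || *end != ',')
        return 0;
    const char *rest = end + 1;
    p->right = strtol(rest, &end, 10);
    return end != rest;
}

/* qsort comparator: ascending by the number before the comma. */
static int by_left(const void *a, const void *b) {
    const pair *x = a, *y = b;
    return (x->left > y->left) - (x->left < y->left);
}

/* Sort n rows in place by their left number. */
void sort_pairs(pair *rows, size_t n) {
    qsort(rows, n, sizeof *rows, by_left);
}
```

Because strtol reports where it stopped, the line is never modified, so the number after the comma is not lost the way it is with strtok.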

find special cases while comparing files in reverse order

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BS 12
void reverse(char * buffer, int size)
{
char tmp;
int i;
for(i = 0; i < size / 2; i++)
{
tmp = (char)buffer[i];
buffer[i] = buffer[size - i - 1];
buffer[size - i - 1] = tmp;
}
}
int compare_bin(char * buffer, char * buffer2, int size)
{
// because strncmp is only for string without \x00, so there must be a customized compare function
int i;
for(i = 0; i < size; i++)
{
if(buffer[i] != buffer2[i])
return 0;
}
return 1;
}
int main (const int argc, const char** argv)
{
if(argc != 3)
exit(-1);
int equal = 1;
char * buffer = malloc(BS), * buffer2 = malloc(BS);
FILE * f1, * f2;
f1 = fopen(argv[1], "r");
f2 = fopen(argv[2], "r");
fseek(f1, 0, SEEK_END);
fseek(f2, 0, SEEK_END);
long i = ftell(f1), j = ftell(f2);
if(i != j)
{
equal = 0;
goto endp;
}
fseek(f2, 0, SEEK_SET);
int need = 0;
int count;
int f2_pos = 0;
do
{
i = i - BS;
if(i < 0)
{
need = BS - abs((int)i);
i = 0;
}
else
need = BS;
fseek(f1, i, SEEK_SET);
count = fread(buffer, need, 1, f1);
reverse(buffer, count * need);
// fwrite(buffer, count * need, 1, f2);
fread(buffer2, need * need, 1, f2);
// printf("compare...\n");
// for(int i = 0; i < need * count; i++)
// {
// printf("%02hhX", buffer[i]);
// }
// printf("\n");
// for (int i = 0; i < need * count; i++)
// {
// printf("%02hhX", buffer2[i]);
// }
// printf("\n");
if(compare_bin(buffer, buffer2, need * count) == 0)
{
equal = 0;
break;
}
f2_pos += need * count;
fseek(f2, f2_pos, SEEK_SET);
if(i == 0)
break;
}while(i > 0);
fclose(f1);
fclose(f2);
free(buffer);
free(buffer2);
endp:
if(equal)
return 0;
else
{
printf("2 files not equal is reversed order\n");
return 1;
}
return 0;
}
So I wrote a program to compare file contents in reverse order. I have already accounted for \x00 bytes in binary files, so strncmp isn't used. But there is still a flaw. There is a test server that tests this program, but I don't have access to it, and the program always fails on that server. So there must be some special cases that make it fail. Any ideas?
There are other ways around it, for instance calculating an MD5 hash. But I want to fix this version.
In the very first iteration where you read data, you have
fread(buffer2, need * need, 1, f2);
The problem is that need is 12 in that case, which is the size of the memory allocated for buffer2, but you ask fread to read 12 * 12 bytes.
If the second file is large enough, you will write out of bounds in memory, leading to undefined behavior. If the file is shorter than that, fread returns 0 because no complete item was read.
Also note that the order of the two middle arguments to fread matters. If you swapped them, you would write out of bounds of the buffer whether or not the file is larger than need * need. You should really read byte-sized objects: the second argument should be 1 and the third should be the number of bytes, which of course means you need to change the order in the first call as well.
In short, your two fread calls should be
count = fread(buffer, 1, need, f1);
fread(buffer2, 1, count, f2);
Use need rather than BS in the first call, so that the final, shorter chunk does not re-read bytes that were already compared. With this argument order count is already a byte count, so pass count rather than need * count to reverse and compare_bin.
PS. Don't forget error checking.
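Putting the corrected fread calls together, the whole comparison might look like the sketch below. It is an illustration, not the original program: the function name is_reversed_copy and the CHUNK macro are hypothetical, and the reversal is folded into the comparison loop instead of using a separate reverse function.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 12  /* same chunk size as BS in the question */

/* Returns 1 if f2 holds the bytes of f1 in reverse order, 0 otherwise
   (including on any seek or read error). Both streams should be opened
   in binary mode. */
int is_reversed_copy(FILE *f1, FILE *f2) {
    assert(f1 != NULL && f2 != NULL);
    unsigned char a[CHUNK], b[CHUNK];

    if (fseek(f1, 0, SEEK_END) != 0 || fseek(f2, 0, SEEK_END) != 0)
        return 0;
    long remaining = ftell(f1);
    if (remaining < 0 || remaining != ftell(f2))
        return 0;
    if (fseek(f2, 0, SEEK_SET) != 0)
        return 0;

    while (remaining > 0) {
        size_t need = remaining < CHUNK ? (size_t)remaining : CHUNK;
        remaining -= (long)need;
        /* Read the next chunk from the tail of f1 and the head of f2. */
        if (fseek(f1, remaining, SEEK_SET) != 0)
            return 0;
        if (fread(a, 1, need, f1) != need || fread(b, 1, need, f2) != need)
            return 0;
        /* Compare a backwards against b forwards, byte by byte. */
        for (size_t i = 0; i < need; i++)
            if (a[need - 1 - i] != b[i])
                return 0;
    }
    return 1;
}
```

Every fread here uses an element size of 1, so its return value is a byte count that can be checked directly against need.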

The end of an integer line in C

I have to read a matrix of unknown size from a text file in C, and I want to read it line by line so that each line becomes an integer array. But how do I know where a line ends, since I can't test for "\n" as I would with characters?
Here is the code:
#include "stdafx.h"
#include "stdlib.h"
#include "stdio.h"
using namespace System;
typedef struct
{
int *v;
int n;
}vector;
int main(array<System::String ^> ^args)
{
vector *a;
FILE* f;
int n = 15;
int i = 0;
int j,k;
if ((f = fopen("C:\\Users\\Mirelaa\\Documents\\visual studio 2013\\Projects\\MatriceNedefinita\\MatriceNedefinita\\Debug\\fisier2.in", "rt")) == NULL)
{
printf("Fisierul nu poate fi deschis!");
exit(1);
};
a = (vector *)malloc(n * sizeof(vector));
while (!feof(f))
{
a[i].v = (int*)malloc(n * sizeof(int));
a[i].n = 0;
//citeste rand
//citesti fiecare element din rand
j = 0;
while (a[i].v[j] != '\0')// wrong!!
{
fscanf(f, "%d", &a[i].v[j]);
j++;
a[i].n = a[i].n + 1;
}
for (k = 0 ; k < a[i].n ; k++)
{
printf("%d", a[i].v[j]);
printf("\n");
}
i++;
if (i == n)
{
n = 2 * n;
a = (vector *)realloc(a, n * sizeof(vector));
a[i].v = (int *)realloc(a[i].v, n * sizeof(int));
}
}
return 0;
}
Reading a line of integers and saving it in a variable-sized array is one approach.
The trouble with fscanf(f, "%d",... is that it first skips leading white-space, so the code loses track of any '\n'. The code needs to look for it by some other means.
But rather than pack all the code into main(), consider helper functions. The following C function reads one line of numbers and returns NULL on 1) out-of-memory or 2) no data or a conversion failure with no numbers read. Otherwise it returns the vector. It is not limited to any line length.
typedef struct {
int *v;
size_t n;
} vector;
vector *Read_vector(vector *v1, FILE *inf) {
v1->v = NULL;
v1->n = 0;
size_t size = 0;
for (;;) {
int number;
int ch;
// Check leading white-space for \n
while (isspace(ch = fgetc(inf))) {
if (ch == '\n') break;
}
if (ch == '\n' || ch == EOF) break;
ungetc(ch, inf);
if (1 != fscanf(inf, "%d", &number)) {
break;
}
// Is `v1` large enough?
if (v1->n >= size) {
size = size*2 + 1;
vector nu;
nu.v = realloc(v1->v, size * sizeof *nu.v);
if (nu.v == NULL) {
free(v1->v);
v1->v = NULL;
v1->n = 0;
return NULL;
}
v1->v = nu.v;
}
v1->v[v1->n++] = number;
}
if (v1->n == 0) {
return NULL;
}
return v1;
}
With repeated calls, an array of vectors can be built. That is left to the OP, as it is very similar to the above.
Note: avoid using while (!feof(f)).
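The note about feof can be illustrated with a small sketch: test the return value of the read itself instead. The helper name sum_ints is hypothetical and only serves to show the loop shape.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: sum every integer in the stream, stopping when
   fscanf can no longer convert one. *count receives how many were read. */
long sum_ints(FILE *f, size_t *count) {
    assert(f != NULL && count != NULL);
    long sum = 0;
    int value;
    *count = 0;
    /* Test the read itself: feof() only becomes true after a read fails. */
    while (fscanf(f, "%d", &value) == 1) {
        sum += value;
        ++*count;
    }
    return sum;
}
```

With while (!feof(f)) the loop body would run one extra time on a file that ends in a newline, because the end-of-file flag is set only once a read has already hit the end.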
I would read in the lines with fgets:
char *fgets(char *str, int n, FILE *stream)
It reads a line of up to n - 1 characters into a string.
This method gives you a way of loading each line individually into a string.
Edit after a good comment:
It is easier to split up the string with strtok and then use atoi to convert each piece of the string to an integer (rather than using sscanf as in my original answer).
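A minimal sketch of the strtok approach, assuming the line has already been read with fgets and that every value fits in an int. The helper name split_ints is hypothetical, and strtol is used in place of atoi so that a malformed token can be detected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split line on spaces, tabs and newlines and store
   up to max integers in out. Modifies line (strtok writes '\0' into it).
   Returns the number of integers stored. */
size_t split_ints(char *line, int *out, size_t max) {
    assert(line != NULL && out != NULL);
    size_t n = 0;
    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && n < max;
         tok = strtok(NULL, " \t\r\n")) {
        char *end;
        long value = strtol(tok, &end, 10);
        if (*end != '\0')
            break;              /* token is not a plain integer */
        out[n++] = (int)value;  /* assumes the value fits in an int */
    }
    return n;
}
```

Calling this once per fgets line gives one integer array per row of the matrix, with the return value as that row's length.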

Creating array to hold 1000000 numbers

I wrote a code in C that read a text file with numbers into memory and the create an 2d int array to store them.
The file has the following format:
9
9 5 6 2235 45558 6 5544 56565 2
The first number is the size of the array and the second line holds as many numbers as the first line says.
My problem is that the array can't hold more than ~30.000 numbers. How can I change the following code so that the array can hold up to 1.000.000 numbers? I know that I should use some kind of long integer, but I couldn't get it to work.
Here's the code:
#include <stdio.h>
#include <stdlib.h>
int is_end(char* input) {
return *input == 0;
}
int is_separator(char* input) {
return *input == '\n' || *input == ' ';
}
char* eat_separators(char* input) {
while (is_separator(input))
++input;
return input;
}
size_t count_lines(char* input) {
size_t rows = 1;
while (!is_end(input)) {
if (is_separator(input)) {
++rows;
input = eat_separators(input);
}
else {
++input;
}
}
return rows;
}
char** get_lines(char* input, size_t number_of_rows) {
char* from = input;
size_t length = 0;
size_t line = 0;
size_t i;
char** lines = (char**)malloc(number_of_rows * sizeof(char*));
do {
if (is_end(input) || is_separator(input)) {
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
length = 0;
++line;
input = eat_separators(input);
from = input;
}
else {
++length;
++input;
}
} while (!is_end(input));
/*
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
++line; */
return lines;
}
int main(int argc, char* argv[]) {
char** lines;
size_t size;
size_t number_of_rows;
int count;
int* children;
FILE *input, *output;
char *contents;
int fileSize = 0;
int i;
input = fopen("xxx.in", "r");
long int filepos = 0L;
fseek(input, 0L, SEEK_END);
fileSize = ftell(input);
fseek(input, 0L, SEEK_SET);
contents = (char*)malloc(fileSize + 1);
size = fread(contents, 1, fileSize, input);
contents[size] = 0;
fclose(input);
number_of_rows = count_lines(contents);
lines = get_lines(contents, number_of_rows);
if ((count = atoi(lines[0])) <= 0 || count > 1000000){
return 1;
}
children = (int*)malloc(count * sizeof(int));
for (i = 0; i < count; ++i) {
if ((children[i] = atoi(lines[i + 1])) <= 0 )
return(-1);
}
// a check to see if everything stored in the array
for(i = 0;i<count;i++)
{
printf(" %d : %d\n", i, children[i]);
}
free(children);
free(lines);
// This is the end! Oh my dear friend, the end!
return 0;
}
First, let me explain why you get only 30.000 numbers, which will answer your question.
Basically, you are converting characters to their ASCII values. Take the character x, whose ASCII value is 120. You are replacing the character x with 120; x takes 1 byte of storage, but 120 written out takes 3 bytes. So you have to allocate 3 times the computed amount of memory, since each 1 byte expands into 3 bytes.
In your code, increase the memory allocation 3 times and your problem should be solved.

Getting every other line empty on output

I have a problem with getting every other line empty on output with this code. The desired output is: http://paste.ubuntu.com/1354365/
While I get: http://paste.ubuntu.com/1356669/
Does anyone have an idea of why I'm getting these empty lines on every other line?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
FILE *fp;
FILE *fw;
int main(int argc, char *argv[]){
char buffer[100];
char *fileName = malloc(10*sizeof(char));
char **output = calloc(10, sizeof(char*));
char **outputBuffer = calloc(10, sizeof(char*));
fw = fopen("calvin.txt", "w+");
for(int y = 0; y < 6; y++){
for(int i = 0; i < 10; i ++)
{
output[i] = malloc(100);
}
for(int x = 0; x < 12; x++){
sprintf(fileName,"part_%02d-%02d", x, y);
fp = fopen(fileName, "rb");
if(fp == NULL)
{
printf("Kan ikke åpne den filen(finnes ikke/rettigheter)\n");
}
else if(fp != NULL){
memset(buffer, 0, 100);
for(int i = 0; i < 10; i++){
outputBuffer[i] = malloc(100);
}
fread(buffer, 1, 100, fp);
for(int i = 0; i < 100; i++){
if(buffer[i] == '\0')
{
buffer[i] = ' ';
}
else if(buffer[i] == '\n')
{
buffer[i] = ' ';
}
}
for(int i = 0; i < 10; i++) {
strncpy(outputBuffer[i], buffer + i * 10, 10);
strncat(output[i], outputBuffer[i]+1, 11);
}
}
}
for(int i = 0; i < 10; i++){
printf("%s\n", output[i]);
}
}
fclose(fp);
free(fileName);
}
You are not reading correctly from the file. In the first output, the first line begins with:
o ""oo " o o o
and in the second with:
""oo o o o
That does not make a lot of sense, because it is the first line. It is not related to the empty lines, since we are talking about the first line.
It seems that your read is offset by 2 characters to the left, so " prints over o, the other " over the ' ', etc.
Try it this way; it may not be the most efficient solution:
int read(char *file)
{
FILE *fp = NULL;
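int size = 0, pos = 0, i, c;
fp = fopen(file,"r");
if (!fp) return 0;
for(; ((getc(fp))!=EOF); size++); // Count the number of elements in the file
fclose(fp);
char buffer[size + 1]; // +1 so that an empty file does not give a zero-length array
fp = fopen(file,"r");
if (!fp) return 0;
while(pos < size && (c = getc(fp)) != EOF) // Saving the chars into the buffer
buffer[pos++] = (char)c; // keep c as int so EOF is detected reliably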
for(i = 0; i < pos; i++) // print them.
printf("%c",buffer[i]);
fclose(fp);
return 1;
}
This part seems problematic:
strncpy(outputBuffer[i], buffer + i * 10, 10);
strncat(output[i], outputBuffer[i]+1, 11);
1) Why is it necessary to use the extra outputBuffer step?
2) strncpy() isn't guaranteed to null-terminate the string it copies.
3) More significantly, output[i] hasn't been initialized, so strncat() will concatenate the string after whatever junk is already in there. If you use calloc() instead of malloc() when creating each output[i], that might help. It's even possible that your output[i] variables are what hold your extra newline.
4) Even if initialized to an empty string, you could easily overflow output[i], since you're looping 12 times and writing up to 11 characters to it. 11 * 12 + 1 for the null terminator = 133 bytes written to a 100-byte array.
In general, unless this is a class assignment that requires use of malloc(), I don't understand why you aren't just declaring your variables once, at the start of the program, and zeroing them out at the start of each loop:
char fileName[11]; /* "part_00-00" is 10 characters plus the terminator */
char output[10][100];
char outputBuffer[10][100];
And, as stated by others, you're allocating a bunch of memory and never freeing it. Allocate it once outside of your loop, or just skip the allocation step and declare the arrays directly.
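Points 2) to 4) above can be addressed together with a bounded append that always terminates the destination. The following is a sketch under stated assumptions: append_chunk is a hypothetical helper, and the destination must already hold a valid (possibly empty) string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append at most n bytes of src to the string in dst,
   never writing past dst_size bytes in total. Returns 1 if everything fit,
   0 if the result was truncated. */
int append_chunk(char *dst, size_t dst_size, const char *src, size_t n) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t used = strlen(dst);            /* dst must be a valid string */
    assert(used < dst_size);
    /* Stop early if src contains a terminator within its first n bytes. */
    const char *nul = memchr(src, '\0', n);
    size_t want = nul ? (size_t)(nul - src) : n;
    size_t room = dst_size - 1 - used;    /* keep one byte for '\0' */
    size_t take = want < room ? want : room;
    memcpy(dst + used, src, take);
    dst[used + take] = '\0';
    return take == want;
}
```

With output declared as char output[10][134] and each row set to an empty string before the inner loop, a call such as append_chunk(output[i], sizeof output[i], buffer + i * 10, 10) would replace the strncpy/strncat pair without the intermediate outputBuffer.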
