fscanf can't read the entire file when the file is too large - C

I am trying to use C to read a large text file (say, a novel with more than ten thousand words) with fscanf, but each time the program reads up to a certain point and apparently reaches EOF, even though it is nowhere near the end of the file.
Here is my code:
void putWordsInArray(){
FILE* fileToRead;
char words[1024];
int singleSize = 0;
//singleWordArray, indexArray1 and contains() are declared elsewhere in the program
singleWordArray = (char**)malloc(1024*sizeof(char*));
indexArray1 = (int*)malloc(1024*sizeof(int));
//read the temp file automatically generated
fileToRead = fopen("temp.txt", "r");
if(fileToRead == NULL){
perror("temp.txt");
return;
}
//%1023s keeps fscanf inside words[]; a return value of 1 means one word was read
while(fscanf(fileToRead, "%1023s", words) == 1){
printf("%s\n", words);
//the following if-else statement adds the word read to the single word array
//if it is not yet in the array, or increments its count by 1 if it's already there
//check if the current list contains the incoming word or not
int result = contains(words, singleWordArray, singleSize);
if(result == -1){
indexArray1 = (int*)realloc(indexArray1, (singleSize + 1) * sizeof(int));
singleWordArray = (char**)realloc(singleWordArray, (singleSize + 1) * sizeof(char*));
//strlen(words) + 1 bytes holds the word and its terminating '\0'
singleWordArray[singleSize] = (char*)malloc(strlen(words) + 1);
strcpy(singleWordArray[singleSize], words);
indexArray1[singleSize] = 1;
singleSize++;
}
else{
indexArray1[result] += 1;
}
}
fclose(fileToRead);
}
Does anyone know what is happening here? I believe I have allocated enough memory each time. Thanks to whoever helps.
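One cause worth ruling out, given that the question says temp.txt is generated automatically (an assumption, since the code that writes it is not shown): if the same program writes temp.txt and then reopens it for reading while the writing FILE * is still open, the tail of the text can still be sitting in the writer's stdio buffer, so the reader sees a shorter file and reaches EOF early. Closing (or fflush-ing) the writer before reading avoids that. A minimal sketch under that assumption; writeWords and countWords are hypothetical helpers, not part of the original program:

```c
#include <stdio.h>

/* Hypothetical writer: puts n words into path, one per line.
   fclose() flushes the stdio buffer, so the whole text is in the
   file before anything reopens it for reading. */
int writeWords(const char *path, const char *words[], int n)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", words[i]);
    return fclose(out);   /* 0 on success; flushes buffered output */
}

/* Hypothetical reader: counts whitespace-separated words in path.
   fscanf returns 1 for each successful %s conversion; the width
   1023 keeps it inside the 1024-byte buffer. */
long countWords(const char *path)
{
    char word[1024];
    long count = 0;
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    while (fscanf(in, "%1023s", word) == 1)
        count++;
    fclose(in);
    return count;
}
```

In the question's program this would mean calling fclose on the writer (or at least fflush) before putWordsInArray reopens temp.txt; countWords is just a quick way to check how many words the reader actually sees.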

Related

Need to read a file with multiple lines of integers into an array

I need to read a file of ints into an array in C. A sample of the file I need to read is below, though note the files this will process can have thousands or hundreds of thousands of lines.
127
234
97
8723
I've opened the file in C and counted its lines so I know how many elements my array needs, but I can't seem to read/parse each line into the array.
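FILE *file;
int N = 0;
char filePath[30];
int endFile; //int, not char, so EOF can be told apart from a real character
int a, i = 0;
printf("What file should be used?\n");
scanf("%29s", filePath); //the width keeps the name within filePath[30]
file = fopen(filePath, "r");
if(file == NULL) {
printf("This file failed to open.\n");
return 1; //break is only valid inside a loop or switch
}
for(endFile = getc(file); endFile!=EOF; endFile=getc(file))
if(endFile == '\n') {
N = N+1;
}
rewind(file); //counting left the stream at EOF; go back to the start
int myArray[N];
while(i < N && fscanf(file, "%d", &a) == 1) {
//one fscanf per iteration; a second call here would skip every other number
printf("%d\n", a);
myArray[i++] = a;
}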
From here, I need to read the file contents into myArray, with each line being the corresponding spot in the array (i.e. line zero is myArray[0], line one is myArray[1], etc.). I can't seem to find a way to do this, though I see several methods to do tab-delimited 2d arrays or csv multi-dimensional arrays.
Please also let me know if creating the array/determining the array size can be done in a better way than literally counting new-line characters...
There's no need to first "count the number of lines".
The following code cautiously grows an array of integers (by increments of 10).
#define GROW 10
char buf[64]; /* one line of input; ifp is an already-opened FILE * */
int *rec = NULL, nRec = 0, sz = 0;
while( fgets( buf, sizeof buf, ifp ) != NULL ) {
if( nRec == sz ) {
rec = realloc( rec, (nRec+GROW) * sizeof *rec );
/*omitting test for failure */
sz += GROW;
}
rec[ nRec++ ] = atoi( buf );
}
This shows what is possible.
Note that realloc() can fail, returning NULL... It's up to you to add a bit of code to handle that condition.
Further, a common convention is to double the size of the allocation when needed, because realloc() may not be cheap. You can decide whether to grow the array in fixed increments (of 1024?) or exponentially.
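The doubling strategy mentioned above, combined with the realloc() failure check the answer leaves to the reader, might look like the following. This is a minimal sketch: readInts is a hypothetical name, it assumes every line fits in 63 characters, and it uses strtol instead of atoi so that blank or non-numeric lines can be skipped.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one integer per line from fp into a heap array whose capacity
   doubles whenever it fills up. Returns the array (caller frees it) and
   stores the element count in *count. Returns NULL with *count == 0 if
   nothing was read or an allocation failed. */
int *readInts(FILE *fp, size_t *count)
{
    char buf[64];              /* assumes each line fits in 63 chars */
    int *rec = NULL;
    size_t n = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        char *end;
        long v = strtol(buf, &end, 10);
        if (end == buf)
            continue;          /* no number on this line: skip it */
        if (n == cap) {
            size_t newCap = cap ? cap * 2 : 16;
            int *tmp = realloc(rec, newCap * sizeof *rec);
            if (tmp == NULL) { /* old block is still valid: free it */
                free(rec);
                *count = 0;
                return NULL;
            }
            rec = tmp;
            cap = newCap;
        }
        rec[n++] = (int)v;
    }
    *count = n;
    return rec;
}
```

Assigning realloc's result to a temporary first means the original block is not lost (and can still be freed) when realloc returns NULL.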

Getting data from a file and printing it out every time a new line starts

I'm completely new to programming but need to get a program running as part of my training. The program's ultimate goal is to read files from a database and then send them to the client who asks for them.
Currently I'm just learning how to read strings from a file and write them to a different file. My problem is that I want to print data out every time I hit a new line.
The data in the file I'm using is in the following format:
<DESCRIPTION>data,<DESCRIPTION>data,<DESCRIPTION>data etc.
The data contains both ints and chars.
Since the data is separated by a ",", I was thinking of first putting all the "<DESCRIPTION>data" pieces into substrings with the strtok function, which I found while googling. After that I would scan only for the "DESCRIPTION" part and put the desired data into an array, which I would print out on reaching the end of the array (the end of the line), and then move on to the next line until the end of the file.
What functions can I use to fix this? Or how do I set up a loop that won't take forever by scanning all the chars in the line every time it wants data? If what I'm saying and what I'm doing are two different things, I apologize again for being a total beginner. I have been programming for a week now and this is all I could produce:
#include <stdio.h>
#include <ctype.h>
void get9202() {
const char * str;
const char s = ",";
char * token;
/*open database file*/
FILE * fp = fopen("datafile.dat", "r");
/*create array with all lines of data
I would like it to be able to handle unknown amounts of data.
current file is ~177000 lines of data.*/
int i = 0;
char line[i];
/*checking until end of file*/
while (fgets(line, sizeof(line), fp)) {
/*This part has to be included in the loop somehow but put in here
so that you might get a picture of what im trying to do.*/
while ( * str) {
if (!isspace( * str++))
i++;
else break;
/*not entirely sure how to exit this subloop
to print out the data and go to new line*/
}
/*trying to segment the string into an array of substrings
but dont know when to introduce x*/
token[x] = strtok(str, s);
while (token[x] != NULL) {
printf("%s\n,", token);
}
}
return result;
/* dont know how to return the file to main*/
flclose("datafile.dat");
}
If the data looks like this:
<SYMBOL>9202.T,<SYMSTAT>2,<MSGSTAT>0,<TIME>20:50:40.905246,<SYS_DT>2018/07/19,<SYS_TIM>20:50:40.503,<SYS_TIMU>20:50:40.503236
<SYMBOL>9202.T,<SYMSTAT>2,<MSGSTAT>0,<TIME>20:51:40.000235,<SYS_DT>2018/07/19,<SYS_TIM>20:51:39.598,<SYS_TIMU>20:51:39.598597
the expected output file could look like this:
9202.T,2,0,20:50:40.905246
9202.T,2,0,20:51:40.000235
Only the wanted fields are selected; the rest are dropped.
A few problems:
This declares a zero-length array:
int i=0;
char line[i];
fclose is never executed because it comes after return. Also, fclose takes a FILE *, not a file name, and the call is misspelled as flclose.
return result;
/* dont know how to return the file to main*/
flclose("datafile.dat");
Suggestions:
trying to segment the string into an array of substrings but dont
know when to introduce x
Use fgets together with sscanf to parse each line, since all the lines have the same format.
dont know how to return the file to main
Define a structure with the needed fields and return an array of them to main.
Example:
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char symbol[50];
char symstat;
char msgstat;
char time[50];
}data;
data *get9202(int *numData) {
int memAllocated = 10;
int i = 0;
*numData = 0;
data *mData = malloc(sizeof(*mData) * memAllocated);
FILE *fp = fopen("datafile.dat", "r");
if (mData == NULL || fp == NULL) { //allocation or open failed
free(mData);
if (fp != NULL) fclose(fp);
return NULL;
}
char buf[3000];
while (fgets(buf, sizeof buf, fp) != NULL) {
if (i == memAllocated) {
memAllocated *= 2;
void *temp = realloc(mData, sizeof( *mData) * memAllocated);
if (temp != NULL) mData = temp;
else break; //error: keep the records read so far
}
//the widths (49) stop sscanf from overflowing the 50-byte arrays
if (sscanf(buf, "<SYMBOL>%49[^,],<SYMSTAT>%c,<MSGSTAT>%c,<TIME>%49[^,]",
mData[i].symbol, &mData[i].symstat, &mData[i].msgstat, mData[i].time) == 4) {
i++;
} else {
printf("error\n"); //line did not match the expected format
}
}
fclose(fp);
*numData = i;
return mData;
}
int main() {
int len = 0;
data *mData = get9202( &len);
int i = 0;
for (i = 0; i < len; i++)
printf("%s,%c,%c,%s\n", mData[i].symbol, mData[i].symstat, mData[i].msgstat,
mData[i].time);
free(mData); //free(NULL) is a no-op
return 0;
}
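The asker's original idea of splitting on commas with strtok also works, and it does not depend on the exact list of tags. A minimal sketch, assuming every field has the form <TAG>value, that values contain no commas, and that lines are shorter than 512 bytes; extractValues is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Copies the values of the first nfields comma-separated "<TAG>value"
   fields of line into out, joined by commas, and returns how many fields
   were copied. Works on a private copy because strtok modifies its input.
   Lines of 512 bytes or more are truncated. */
int extractValues(const char *line, int nfields, char *out, size_t outSize)
{
    char copy[512];
    int found = 0;

    if (outSize == 0)
        return 0;
    out[0] = '\0';
    snprintf(copy, sizeof copy, "%s", line);
    copy[strcspn(copy, "\r\n")] = '\0';   /* drop the newline fgets keeps */

    for (char *tok = strtok(copy, ","); tok != NULL && found < nfields;
         tok = strtok(NULL, ",")) {
        const char *gt = strchr(tok, '>');  /* the value starts after the tag */
        const char *value = gt ? gt + 1 : tok;
        size_t used = strlen(out);
        snprintf(out + used, outSize - used, "%s%s", found ? "," : "", value);
        found++;
    }
    return found;
}
```

Called on each line from fgets with nfields set to 4, this turns the first sample line into 9202.T,2,0,20:50:40.905246, which can then be written to the output file with fprintf.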

Char Array of Pointers storing the last line of the file for every subscript

So I'm writing a program that reads a .mal file and is supposed to print it to the output file. However, one problem I'm having is that in my first while loop, when I store the address of each line from the file in lineRef[i] and print it immediately inside that loop, it prints the correct line. In the second while loop, however, it just prints the last line of the file.
FILE * inputFile = fopen("example.mal", "r");
FILE * outputFile = fopen("output.lst", "w");
char linesFromFile[256];
char * lineRef[50];
char * labels[50];
int i = 0;
while(fgets(linesFromFile, 256, inputFile) != NULL)
{
lineRef[i] = linesFromFile;
i++;
}
i = 0;
while(lineRef[i] != NULL)
{
printf("%s\n", lineRef[i]);
i++;
}
return 0;
lineRef[i] = linesFromFile;
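Here you are assigning the same address, that of linesFromFile, to every element of lineRef, so all of them point at the one buffer, which holds whatever fgets read last.
You need to allocate memory for each line, store it in the corresponding element of lineRef, and then copy the contents of linesFromFile into it. Also initialize the array (char * lineRef[50] = {NULL};), because the second loop only stops when it finds a NULL element.
e.g.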
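//i < 49 leaves lineRef[49] as the NULL terminator the printing loop looks for
//(assumes lineRef was initialized to all NULLs)
while(i < 49 && fgets(linesFromFile, 256, inputFile) != NULL)
{
lineRef[i] = malloc(strlen(linesFromFile) + 1); //room for this line and its '\0'
if(lineRef[i] == NULL)
break;
strcpy(lineRef[i], linesFromFile);
i++;
}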
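A fuller version of the same fix, which also removes the fixed 50-line limit by growing the pointer array, might look like this. It is a minimal sketch: readLines is a hypothetical name, and it assumes lines shorter than 256 bytes, as in the question (longer lines end up split across several entries).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads every line of fp into its own heap buffer, so each pointer refers
   to a separate copy rather than to the shared fgets buffer. The pointer
   array grows by doubling. Returns the array (the caller frees each line
   and then the array) and stores the number of lines in *count. */
char **readLines(FILE *fp, size_t *count)
{
    char buf[256];                   /* assumes lines shorter than 256 bytes */
    char **lines = NULL;
    size_t n = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (n == cap) {
            size_t newCap = cap ? cap * 2 : 8;
            char **tmp = realloc(lines, newCap * sizeof *lines);
            if (tmp == NULL)
                break;               /* keep what was read so far */
            lines = tmp;
            cap = newCap;
        }
        size_t len = strlen(buf);
        lines[n] = malloc(len + 1);  /* separate storage for this line */
        if (lines[n] == NULL)
            break;
        memcpy(lines[n], buf, len + 1);  /* copies the '\0' as well */
        n++;
    }
    *count = n;
    return lines;
}
```

Because the caller gets the count back, the printing loop can run for (size_t k = 0; k < count; k++) instead of relying on a NULL sentinel; afterwards, free each line and then the array itself.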

Read access of a file to be shared by multiple threads: pthreads

I have to implement an application where the user passes multiple words via the command line and the application counts how often each word occurs in each line of a file. Each word will be searched for in its own thread.
So far I have implemented it as a single-threaded app.
The code looks like:
//Below function reads file line and returns it
char* readLine(FILE* file, char* line)
{
if (file == NULL) {
printf("Error: file pointer is null.");
exit(1);
}
int maximumLineLength = 128;
char *lineBuffer = (char *) malloc(sizeof(char) * maximumLineLength);
if (lineBuffer == NULL) {
printf("Error allocating memory for line buffer.");
exit(1);
}
int ch = getc(file);//Get each character; int, not char, so EOF can be detected reliably
int count = 0;
//loop for line or EOF
while ((ch != '\n') && (ch != EOF))
{
if (count + 1 == maximumLineLength)//grow early to keep room for the terminating '\0'
{
maximumLineLength += 128;
lineBuffer = realloc(lineBuffer, maximumLineLength);
if (lineBuffer == NULL)
{
printf("Error reallocating space for line buffer.");
exit(1);
}
}
lineBuffer[count] = ch;
count++;
ch = getc(file);
}
lineBuffer[count] = '\0';//Add null character
line = (char *) malloc(sizeof(char) * (count + 1));
strncpy(line, lineBuffer, (count + 1));
free(lineBuffer);
return line;
}
//The function below counts the occurrences of
//word in the line
//Need to refine to take into consideration
//scenarios such as {"Am"," am "," am","?Am",".Am"} etc
int findWord(char* line,char* word)
{
int count=0;
size_t wordLen = strlen(word);
const char* p = line;
if (wordLen == 0)
return 0;//an empty word would match at every position
//strstr returns a pointer to the next match (or NULL); step past it and search again.
//This avoids the overlapping strcpy calls, whose behavior is undefined.
while ((p = strstr(p, word)) != NULL)
{
count++;
p += wordLen;
}
return count;
}
//Below function fills the linked list for data structure lineCount
//with word occurrence statistics
//line by line and the total
//The number of elements in the list would be number of lines in the
//file
LineCount* findCount(FILE* file, char* word,LineCount** lineCountHead)//Make it multithreaded fn()
{
LineCount* lineHead= NULL;
char* line = NULL;
int lineNumber=1;
int count=0;
if (file == NULL) {
printf("Error: file pointer is null.");
exit(1);
}
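for (;;) {
int c = getc(file);
if (c == EOF)
break;//no more lines; feof() only becomes true after a read has already failed
ungetc(c, file);
LineCount* temp=NULL;
line = readLine(file, line);
//printf("%s\n", line);
count=findWord(line,word);
//Critical Section Start
temp=LineCountNode(lineNumber,count);
addToLineCountList(temp,lineCountHead);
//Critical Section End
free(line);//readLine allocates a new line on every call
lineNumber++;
}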
return lineHead;
}
So basically I want my calling thread function to be LineCount* findCount(FILE* file, char* word,LineCount** lineCountHead)
My understanding is that the threads will access the file only for reading, so there is no need to take care of synchronization.
Currently I am opening the file as:
pFile = fopen (argv[1],"r");. My question is: how do I open it in read-shared mode?
I know C++ has a read-shared mode. How do I achieve this in C?
Also, how do I write my function LineCount* findCount(FILE* file, char* word,LineCount** lineCountHead) in the form required for a thread start routine, i.e. void* fn(void*)?
While there are no issues with the file itself in read-only mode, the IO functions in the standard C library are not designed for several threads to share one stream in parallel. Each individual call is thread-safe (C11 and POSIX give every stream a lock that stream operations take), but using a shared stream correctly from multiple threads is not trivial.
At the lowest level, each FILE has a single file position, kept either by the library or by the OS for the underlying file descriptor. Having multiple threads move that one position around is a good way to make your life more difficult than it needs to be.
The best approach would be to open your file multiple times, once in each thread. Each thread would then have its own FILE pointer, stream buffer and file position. Note that this is not unique to C and POSIX threads; it's an inherent issue with using multiple threads.
In any case, I am unsure what you are trying to achieve by using multiple threads. Search operations like this are generally I/O bound, so multithreaded accesses to the same file are quite likely to make things worse.
The only case where it might make sense is if you had a huge number of strings to search for and a single I/O thread feeding all the other threads through a common buffer. That would distribute the CPU-intensive part without causing extra I/O.
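Following the advice above to give every thread its own FILE, and covering the void* fn(void*) part of the question, the arguments can be packed into a struct whose address is passed to pthread_create. A minimal sketch: SearchArgs, countWordThread and searchInParallel are hypothetical names, it counts raw substring matches for the whole file rather than filling the asker's LineCount list, and it assumes lines shorter than 4096 bytes.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread argument: each thread reads path itself
   and stores its result in count. */
typedef struct {
    const char *path;
    const char *word;
    long count;      /* output; -1 if the file could not be opened */
} SearchArgs;

/* Thread start routine with the required void *(*)(void *) signature.
   Each thread opens its own FILE, so no stream, buffer or file position
   is shared. A match split across two fgets chunks (a line of 4096
   bytes or more) would be missed. */
void *countWordThread(void *arg)
{
    SearchArgs *a = arg;
    char buf[4096];
    size_t wlen = strlen(a->word);
    FILE *fp = fopen(a->path, "r");

    a->count = 0;
    if (fp == NULL) {
        a->count = -1;
        return NULL;
    }
    while (wlen > 0 && fgets(buf, sizeof buf, fp) != NULL) {
        for (const char *p = buf; (p = strstr(p, a->word)) != NULL; p += wlen)
            a->count++;  /* non-overlapping occurrences */
    }
    fclose(fp);
    return NULL;
}

/* Starts one thread per word and waits for all of them.
   Returns 0 on success, -1 if a thread could not be created. */
int searchInParallel(SearchArgs args[], int n)
{
    pthread_t tids[64];
    int started = 0, rc = 0;

    if (n > 64)
        return -1;   /* this sketch supports at most 64 words */
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, countWordThread, &args[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

Each thread writes only to its own SearchArgs element, so no locking is needed; the results are read after pthread_join returns. Compile with -pthread on systems that require it.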

Trying to read text file into array without repeats in C

This is for a beginner's C programming unit. I'm trying to read a text file containing MAC addresses and the data they received, separate out the relevant data (address and number of packets), copy the addresses to an array without repeating any of them and sum the associated number of packets if an identical address is encountered.
I can read the file in just fine, and get the bits of each line I want without issue, but when I try to check each address read against those already in the array I hit a problem. Depending on the location of the integer counting the number of full lines, the program either fails to recognise identical strings and prints them all as they are in the file, or prints them over one another in addresses[0], leaving me with only the last address. I'm stumped and need some fresh eyes on this - any suggestions would be greatly appreciated.
My code follows:
static void readadds(char filename[])
{
FILE* packetfile = fopen(filename, "r");
FILE* datafile = fopen("packdata.txt", "w+");
// Open file from input; create temporary file to store sorted data.
char line[100];
char addresses[500][18];
int datasize[500];
int addressno = 0;
// Create storage for lines read from text file, addresses and related data.
if(packetfile != NULL)
{
while(fgets(line, sizeof line, packetfile) != NULL)
{
int linenum = 0;
char thisadd[18];
int thisdata;
//Create arrays to temp store data from each line
sscanf(line, "%*s %*s %s %i", thisadd, &thisdata);
for(int i = 0; i < 500; i++)
{
if(strcmp(thisadd, addresses[i]) == 0)
{ //check if the address is already in the array
int x = datasize[i];
datasize[i] = x + thisdata; //sum packet data if address already exists
printf("Match!\n");
break;
}
else
{
strcpy(addresses[linenum], thisadd); //initialize new address
datasize[linenum] = thisdata; //initialize assoc. data
linenum++;
addressno++;
printf("Started!\n");
break;
}
}
}
for(int i = 0; i <= addressno; i++)
{
printf("%s %i\n", addresses[i], datasize[i]);
fprintf(datafile,"%s %i\n", addresses[i], datasize[i]);
}
}
fclose(packetfile);
fclose(datafile);
}
This version prints over addresses[0]. If linenum is replaced by addressno in the for() loop, identical strings are not recognised. My dataset is arranged like this:
1378251369.691375 84:1b:5e:a8:bf:7f 68:94:23:4b:e8:35 100
1378251374.195670 00:8e:f2:c0:13:cc 00:11:d9:20:aa:4e 397
1378251374.205047 00:8e:f2:c0:13:cc 00:11:d9:20:aa:4e 397
1378251374.551604 00:8e:f2:c0:13:cc 00:11:d9:20:aa:4e 157
1378251375.551618 84:1b:5e:a8:bf:7c cc:3a:61:df:4b:61 37
1378251375.552697 84:1b:5e:a8:bf:7c cc:3a:61:df:4b:61 37
1378251375.553957 84:1b:5e:a8:bf:7c cc:3a:61:df:4b:61 37
1378251375.555332 84:1b:5e:a8:bf:7c cc:3a:61:df:4b:61 37
I'm almost certain this is what you're trying to do. The logic to add a new entry was incorrect: you should only add an address after searching all the current entries without finding it, which means the for loop must finish before the add.
Note: Not tested for compilation, but hopefully you get the idea.
#include <stdio.h>
#include <string.h>
static void readadds(char filename[])
{
// Open file from input; create temporary file to store sorted data.
FILE* packetfile = fopen(filename, "r");
FILE* datafile = fopen("packdata.txt", "w+");
// Create storage for lines read from text file, addresses and related data.
char addresses[500][18];
int datasize[500];
int addressno = 0;
if (packetfile != NULL && datafile != NULL)
{
char line[100];
while(fgets(line, sizeof line, packetfile) != NULL)
{
char thisadd[18];
int thisdata = 0;
//Create arrays to temp store data from each line
if (sscanf(line, "%*s %*s %17s %i", thisadd, &thisdata) == 2)
{
// try to find matching address; i is declared outside the loop
// because it is tested after the loop ends
int i;
for(i = 0; i < addressno; i++)
{
if(strcmp(thisadd, addresses[i]) == 0)
{
//check if the address is already in the array
datasize[i] += thisdata;
printf("Match!\n");
break;
}
}
// reaching addressno means no match, so add it (if there is room).
if (i == addressno && addressno < 500)
{
printf("Started!\n");
strcpy(addresses[addressno], thisadd); //initialize new address
datasize[addressno++] = thisdata; //initialize assoc. data
}
}
else
{ // failed to parse input parameters.
break;
}
}
// only entries 0 .. addressno-1 hold data
for(int i = 0; i < addressno; i++)
{
printf("%s %i\n", addresses[i], datasize[i]);
fprintf(datafile,"%s %i\n", addresses[i], datasize[i]);
}
}
if (packetfile != NULL) fclose(packetfile);
if (datafile != NULL) fclose(datafile);
}
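The search-then-add rule can also be pulled out into a small helper, which keeps the main loop short and makes the "add only after a full search" order explicit. A minimal sketch; findOrAdd, MAX_ADDRS and ADDR_LEN are hypothetical names, and it assumes the same fixed-size table of 17-character MAC addresses used above:

```c
#include <string.h>

#define MAX_ADDRS 500
#define ADDR_LEN 18        /* "xx:xx:xx:xx:xx:xx" plus '\0' */

/* Returns the index of addr in addresses[0..*count-1]. The address is
   appended (and *count incremented) only after the whole table has been
   searched without a match. Returns -1 if the table is full. */
int findOrAdd(char addresses[][ADDR_LEN], int *count, const char *addr)
{
    for (int i = 0; i < *count; i++)
        if (strcmp(addresses[i], addr) == 0)
            return i;                  /* already present */
    if (*count == MAX_ADDRS)
        return -1;                     /* no room for a new address */
    strncpy(addresses[*count], addr, ADDR_LEN - 1);
    addresses[*count][ADDR_LEN - 1] = '\0';
    return (*count)++;
}
```

In readadds, the body of the sscanf branch would then become k = findOrAdd(addresses, &addressno, thisadd); if (k >= 0) datasize[k] += thisdata;, which requires datasize to start zeroed (int datasize[500] = {0};) so that a new entry begins at 0.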
