What is the best approach for reading a file in C (essentially a grid map of characters) and putting it into a 2-dimensional array of some sort, in which each character can be accessed by its coordinates?
A sample input file looks something like:
ffflli
ffsdfg
fl979p
kl8dfj
Each character should be accessible by coordinates based on its position, e.g. (0,3) for the bottom-left character.
You can do something like:
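#include <stdio.h>
#include <string.h>
int main(void)
{
    FILE *f = fopen("sample.txt", "r");
    if (f == NULL)
        return 1;                   /* could not open the file */
    char strr[100];                 /* assumes lines shorter than 99 characters */
    int ch;                         /* int, not char, so EOF can be detected */
    size_t row = 0, column = 0, i = 0, j = 0;
    /* First pass: count the rows and find the longest line. */
    while (fgets(strr, sizeof strr, f)) {
        size_t len = strcspn(strr, "\n");   /* length without the newline */
        row++;
        if (column < len)
            column = len;
    }
    rewind(f);
    if (row == 0 || column == 0) {  /* nothing to store */
        fclose(f);
        return 1;
    }
    char arr[row][column];          /* arr[y][x] holds the character at (x, y) */
    /* Second pass: copy the characters, skipping the newlines. */
    while (i < row && (ch = fgetc(f)) != EOF) {
        if (ch == '\n') {
            i++;                    /* move to the next row */
            j = 0;
        } else if (j < column) {
            arr[i][j++] = (char)ch;
        }
    }
    fclose(f);
    return 0;
}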
You can also skip the first pass through the file if the file is very large. That pass exists only to size the array exactly, so that memory is not wasted on an unnecessarily large array.
I would suggest this approach:
while not end of file
    if the character at the file position is not '\n' or EOF
        push the character into the array
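The pseudocode above can be sketched in C as a single pass that grows a flat buffer as characters arrive. This is only a minimal sketch under stated assumptions: the names read_grid and grid_at are hypothetical, and every line of the file is assumed to have the same length.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole stream in one pass into a flat,
   growing buffer, dropping the newlines. Assumes every line has the same
   length. Returns NULL on allocation failure or ragged input; the caller
   frees the result. */
char *read_grid(FILE *f, size_t *rows, size_t *cols)
{
    size_t cap = 64, len = 0;      /* buffer capacity and used length */
    size_t width = 0, cur = 0;     /* expected and current line length */
    size_t nrows = 0;
    char *buf;
    int ch;                        /* int, so EOF is distinguishable */

    assert(f != NULL && rows != NULL && cols != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        ch = fgetc(f);
        if (ch == '\n' || (ch == EOF && cur > 0)) {
            /* End of a line: the first line fixes the width. */
            if (nrows == 0) {
                width = cur;
            } else if (cur != width) {
                free(buf);
                return NULL;
            }
            nrows++;
            cur = 0;
        }
        if (ch == EOF)
            break;
        if (ch == '\n')
            continue;
        if (len == cap) {          /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
        cur++;
    }
    *rows = nrows;
    *cols = width;
    return buf;
}

/* Character at column x, row y, matching the (x, y) convention of the question. */
char grid_at(const char *grid, size_t cols, size_t x, size_t y)
{
    return grid[y * cols + x];
}
```

With the sample file from the question, grid_at(grid, cols, 0, 3) yields 'k', the bottom-left character.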
This worked well for me. It iterates through an infile like the one provided, fills a provided 2D-array, and null-terminates unused slots.
#include <stdio.h>
#define NUM_ROWS 10
#define NUM_COLS 10
int main(void)
{
    FILE* infile;
    const char* file_path = "C:\\Your\\File\\Path\\Here.txt";
    char grid[NUM_ROWS][NUM_COLS];
    // Open infile for reading.
    if (0 == fopen_s(&infile, file_path, "r"))
    {
        char character;
        unsigned int row = 0;
        unsigned int column = 0;
        // Iterate through infile until it ends or the grid is full.
        // Testing the fscanf_s() result, rather than feof(), avoids
        // processing the last character twice.
        while (row < NUM_ROWS && 1 == fscanf_s(infile, "%c", &character, 1))
        {
            // Check for newline, moving to next grid row when necessary.
            if (character == '\n')
            {
                // Null-terminate/zero rest of row.
                grid[row][column] = '\0';
                // Move to beginning of next row.
                ++row;
                column = 0;
                continue;
            }
            // Set appropriate cell to next character, keeping the
            // last column free for the terminator.
            if (column < NUM_COLS - 1)
            {
                grid[row][column] = character;
                ++column;
            }
        }
        // Null-terminate to sever junk in unused slots.
        for (; row < NUM_ROWS; ++row)
        {
            grid[row][column] = '\0';
            column = 0;
        }
        // Close file.
        fclose(infile);
    }
    return 0;
}
A few caveats:
I assumed a 10x10 grid was sufficient. If this doesn't work for you, by all means change the dimensions. The algorithm should adjust fine (for reasonable dimension values).
Multiple newlines in succession will probably result in entirely empty rows. If you want to handle this differently, check for this case as you read input.
I assumed that null-terminating rows was desirable behavior. If I was wrong, feel free to do as you please.
I wrote this in Visual Studio, compiling with MSVC, hence the usage of fopen_s() and fscanf_s(). If this is undesirable, alteration to fopen() and fscanf() respectively should be relatively simple (mostly changing a couple of statements and function arguments).
I hope this helps you. Let me know if you have any questions.
In C you can access any 2D array stored in a 1D buffer (such as the contents of a file read into memory) by casting the pointer to the correct type. Like this:
int buffer1D[20] = {
0, 1, 2, 3, 4,
1, 2, 3, 4, 5,
2, 3, 4, 5, 6,
3, 4, 5, 6, 7
};
int* bufferPtr = buffer1D;
int (*twoDPtr)[5]; //a pointer of the correct type
twoDPtr = (int (*)[5])bufferPtr; //the interesting part: the cast
Now you can access the buffer as a 2D array:
assert(twoDPtr[2][3] == 5);
The trick is, that twoDPtr is a pointer to a line array of the 2D data. So, when you perform pointer arithmetic on it with twoDPtr[2], it will skip the first two lines of the 2D data.
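Applied to the character grid from the question, the same cast might look like the sketch below. It assumes the file contents were already copied into a flat buffer with the newlines removed, and that the row width is known at compile time. GRID_COLS and cell_at are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { GRID_COLS = 6 };   /* assumed row width, taken from the sample input */

/* Views a flat buffer of rows * GRID_COLS characters as a 2D array and
   returns the character at column x, row y. The caller must keep x and y
   inside the grid. */
char cell_at(const char *flat, size_t x, size_t y)
{
    assert(flat != NULL && x < GRID_COLS);
    /* Pointer to rows of GRID_COLS chars: grid[y] skips y whole rows. */
    const char (*grid)[GRID_COLS] = (const char (*)[GRID_COLS])flat;
    return grid[y][x];
}
```

Because the row width is part of the pointer type, this only works when the width is fixed; for a width discovered at run time, indexing with y * cols + x is the usual alternative.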
Related
I'm trying to read a file containing a paragraph, count the number of times specific words occur (words that I have specified and stored in an array) and then print that result to another file that would look something like,
systems, 2
computer, 3
programming, 6
and so on. Currently, all this code does is spit out every word in the paragraph and their respective counts. Any help would be much appreciated.
#include <stdio.h>
#include <string.h>
int main()
{
FILE* in;
FILE* out;
char arr1[13][100] = { "systems", "programming", "computer", "applications", "language", "machine"};
int arr2[180] = {0};
int count = 0;
char temp[150];
in = fopen("out2.dat", "r");
out = fopen("out3.dat", "w");
while (fscanf(in, "%s", temp) != EOF)
{
int i, check = 8;
for (i = 0;i < count;i++)
{
if (strcmp(temp, arr1[i]) == 0)
{
arr2[i]++;
check = 1;
break;
}
}
if (check == 1) continue;
strcpy(arr1[count], temp);
arr2[count++] = 1;
}
int i;
for (i = 0; i < count; i++)
fprintf(out, "%s, %d\n", arr1[i], arr2[i]);
return 0;
}
The use of count does not make much sense throughout this program.
It is declared as int count = 0;, and then used as the upper bound in this loop
for (i = 0; i < count; i++)
limiting which search words will be used. This also means that this loop will not be entered on the first iteration of the surrounding while loop.
As such, check != 1, so count is then used as the index in arr1 to which the currently read "word" is copied:
strcpy(arr1[count], temp);
which makes absolutely no sense. Why overwrite data you are searching for?
Then count is incremented to 1 after being used to set the first element of arr2 to 1.
On the second iteration of the while loop, the for loop will run for exactly one iteration, comparing the newly read "word" (temp) against the first element of arr1 (which is now the last "word" read).
If this matches: the first element in arr2 is incremented from 1 to 2, the string copy is skipped, and count is not incremented.
If this does not match, the new "word" is copied into the second element of arr1, the second element of arr2 is set to 1, and count is incremented to 2.
This spirals out of control from here.
Given the input shown above, this accesses arr1 out-of-bounds when count reaches 13.
With files that have a small selection of data (<= 13 unique "words", lengths < 100), this may accidentally "work" by populating arr1 with the words from the file. This will have the end effect of showing you the counts of each "word" in the input file.
Eventually, you will invoke Undefined Behavior when one of the following occurs:
fscanf(in, "%s", temp) reads a string that overflows the temp buffer.
count exceeds the bounds of arr1 or arr2.
strcpy(arr1[count], temp); copies a string that overflows a buffer in arr1.
Either fopen call fails, since the returned pointers are never checked.
In addition to being unsafe, fscanf(in, "%s", temp) will consider anything other than whitespace as being part of a valid string. This includes trailing punctuation, which may or may not be an issue depending on which tokens you want to match (systems. vs. systems). You may need more robust parsing.
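One possible way to handle trailing punctuation is to normalize each token before comparing it. The sketch below is an assumption about what "more robust parsing" could mean here, not the only option: normalize_word is a hypothetical helper that trims non-alphanumeric characters from both ends and lower-cases what remains.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strips leading and trailing punctuation from a
   token in place and lower-cases it, so "Systems." becomes "systems". */
void normalize_word(char *word)
{
    assert(word != NULL);
    size_t start = 0, end = strlen(word);

    /* Skip non-alphanumeric characters at both ends. */
    while (start < end && !isalnum((unsigned char)word[start]))
        start++;
    while (end > start && !isalnum((unsigned char)word[end - 1]))
        end--;

    /* Shift the kept part to the front, lower-casing as we go. */
    for (size_t i = start; i < end; i++)
        word[i - start] = (char)tolower((unsigned char)word[i]);
    word[end - start] = '\0';
}
```

Calling this on each token right after fscanf and before strcmp would make "systems." and "Systems" both match the search word "systems". Punctuation inside a word, such as a hyphen, is left alone.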
In any case, either create an array of structures composed of search words and frequencies, or, create two arrays of the same length to represent this data:
const char *words[6] = { "systems", "programming", "computer", "applications", "language", "machine"};
unsigned freq[6] = { 0 };
There is no need to copy anything. Remember to check whether fopen fails, and to limit %s when reading so as not to overflow the input buffer.
The rest of the program looks similar: test each input "word" against all search words; increment the corresponding frequency if a match.
An example using an array of structures:
#include <stdio.h>
#include <string.h>
int main(void) {
    struct {
        const char *word;
        unsigned freq;
    } search_words[] = {
        { "systems", 0 },
        { "programming", 0 },
        { "computer", 0 },
        { "applications", 0 },
        { "language", 0 },
        { "machine", 0 }
    };
    size_t length = sizeof search_words / sizeof *search_words;
    FILE *input_file = fopen("out2.dat", "r");
    FILE *output_file = fopen("out3.dat", "w");
    if (!input_file || !output_file) {
        /* fclose(NULL) is undefined, so close only what was opened. */
        if (input_file)
            fclose(input_file);
        if (output_file)
            fclose(output_file);
        fprintf(stderr, "Could not access files.\n");
        return 1;
    }
    char word[256];
    /* Compare every input token against every search word. */
    while (1 == fscanf(input_file, "%255s", word))
        for (size_t i = 0; i < length; i++)
            if (0 == strcmp(word, search_words[i].word))
                search_words[i].freq++;
    fclose(input_file);
    for (size_t i = 0; i < length; i++)
        fprintf(output_file, "%s, %u\n",
                search_words[i].word,
                search_words[i].freq);
    fclose(output_file);
}
cat out3.dat:
systems, 1
programming, 1
computer, 2
applications, 2
language, 1
machine, 1
I am given input containing data that I need to process and store in an array. The input looks like this:
{ [1, 10], [2,1] , [-10, 20] }
It can contain more elements. I need to load all the numbers from each [ number , number ] pair into a 2D array, with the first number at index 0 and the second at index 1, so the array should look like
[[1,10],[2,1],[-10,20]]
But I have failed to find a way to parse this input into the desired array. What is the right way to do it?
I tried to do as following:
int main()
{
long long int cisla[10][2];
int x;
int y;
int i;
int index=0;
int counter=0;
char c;
char zatvorka_one;
char zatvorka_three;
char ciarka;
char ciarka_two;
printf("Pozicia nepriatela\n");
c=getchar();
if(c!='{'){
return 0;
}
scanf(" %c%d,%d%c",&ciarka,&x,&y,&zatvorka_one);
cisla[index][0]=x;
cisla[index][1]=y;
index++;
while(1){
scanf("%c",&ciarka);
if(ciarka=='}'){
break;
}
scanf(" %c%d%,%d%c",&ciarka,&x,&y,&zatvorka_one);
cisla[index][0]=x;
cisla[index][1]=y;
index++;
}
for ( i = 0; i < index; i++){
printf("%d %d\n",cisla[i][0],cisla[i][1]);
}
}
But somehow it returns an unexpected result. How can I fix it?
You should use fgets instead of scanf (not gets, which cannot limit the input length and was removed from the language in C11). fgets reads the entire line into a string, which will be easier to work with. Then you should read about strtok, which can be used to split a string. For example, strtok(s, ",") will split your string into smaller strings. The input {[12,4], [8,9]} is divided into: first string {[12, second string 4], third string [8 (with a leading space), and fourth string 9]}. Now you just have to remove the characters that are not part of a number, such as { } and [ ]. After that you will have strings containing only the numbers, so you can use another predefined function you should read about, called atoi. It receives a string and turns it into an int (ASCII to int). There is also atof (ASCII to float) if you need it. Tutorialspoint is a good place to look for examples of how to use the functions I mentioned.
I'm relatively new to C and SO. Maybe I shouldn't give you the solution, but I did. It follows the advice of sharp c student.
You could try to do it like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXNBRELM 10 // maximum number of elements; adjust as needed
int main(void)
{
    int IntArr[MAXNBRELM][2]; // integer array to hold results
    int s = 0; // subscript
    int NbrElm; // number of elements found
    char Buf[81]; // buffer to hold input
    char * TknChr; // each individual token
    char * NxtTkn; // next token position (only needed for Visual C++)
    if (fgets(Buf, sizeof Buf, stdin) == NULL) // no input available
        return 1;
    // The newline is a delimiter too, so it never becomes a token.
    TknChr = strtok_s(Buf, " {[,]}\n", &NxtTkn);
    // s < MAXNBRELM (not <=) keeps the writes inside IntArr.
    while (s < MAXNBRELM && TknChr != NULL) {
        IntArr[s][0] = atoi(TknChr);
        TknChr = strtok_s(NULL, " {[,]}\n", &NxtTkn);
        if (TknChr != NULL) {
            IntArr[s][1] = atoi(TknChr);
            TknChr = strtok_s(NULL, " {[,]}\n", &NxtTkn);
            s++;
        }
    }
    NbrElm = s;
    for (s = 0; s < NbrElm; s++)
        printf("%d %d\n", IntArr[s][0], IntArr[s][1]);
    return 0;
}
This is for Visual Studio, which is why I needed to use strtok_s and &NxtTkn.
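A portable alternative to strtok_s is to let sscanf match the brackets and commas directly, using %n to learn how far each call got. The following is a minimal sketch rather than a definitive parser: parse_pairs is a hypothetical name, the whole input is assumed to be in one string already (for example from fgets), and at least one pair is expected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses text such as "{ [1, 10], [2,1] , [-10, 20] }"
   into pairs[][2]. Returns the number of pairs stored, or -1 if the text
   is malformed or holds more than max_pairs pairs. */
int parse_pairs(const char *text, int pairs[][2], int max_pairs)
{
    int count = 0;
    int used = 0;     /* characters consumed by the last sscanf call */
    char sep;

    assert(text != NULL && pairs != NULL);

    /* %n is only stored if everything before it matched. */
    sscanf(text, " {%n", &used);
    if (used == 0)
        return -1;
    text += used;

    for (;;) {
        int x, y;

        used = 0;
        if (sscanf(text, " [ %d , %d ]%n", &x, &y, &used) != 2 || used == 0)
            return -1;
        if (count == max_pairs)
            return -1;
        pairs[count][0] = x;
        pairs[count][1] = y;
        count++;
        text += used;

        /* After a pair comes either ',' (another pair) or '}' (done). */
        used = 0;
        if (sscanf(text, " %c%n", &sep, &used) != 1)
            return -1;
        text += used;
        if (sep == '}')
            return count;
        if (sep != ',')
            return -1;
    }
}
```

Unlike the tokenizing approach, this rejects input whose brackets or commas are misplaced instead of silently skipping them.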
I have a txt file consisting of tab-separated data of type double. The data file is over 10 GB, so I just wish to read the data line by line and then do some processing. In particular, the data is laid out as a matrix with, say, 1001 columns and millions of rows. Below is just a fake sample to show the layout.
10.2 30.4 42.9 ... 3232.000 23232.45
...
...
7.234 824.23232 ... 4009.23 230.01
...
For each line I'd like to store the first 1000 values in an array, and the last value in a separate variable. I am new to C, so it would be nice if you could kindly point out major steps.
Update:
Thanks for all valuable suggestions and solutions. I just figured out one simple example where I just read a 3-by-4 matrix row by row from a txt file. For each row, the first 3 elements are stored in x, and the last element is stored in vector y. So x is a n-by-p matrix with n=p=3, y is a 1-by-3 vector.
Below is my data file and my code.
Data file:
1.112272 -0.345324 0.608056 0.641006
-0.358203 0.300349 -1.113812 -0.321359
0.155588 2.081781 0.038588 -0.562489
My code:
#include<math.h>
#include <stdlib.h>
#include<stdio.h>
#include <string.h>
#define n 3
#define p 3
void main() {
FILE *fpt;
fpt = fopen("./data_temp.txt", "r");
char line[n*(p+1)*sizeof(double)];
char *token;
double *x;
x = malloc(n*p*sizeof(double));
double y[n];
int index = 0;
int xind = 0;
int yind = 0;
while(fgets(line, sizeof(line), fpt)) {
//printf("%d\n", sizeof(line));
//printf("%s\n", line);
token = strtok(line, "\t");
while(token != NULL) {
printf("%s\n", token);
if((index+1) % (p+1) == 0) { // the last element in each line;
yind = (index + 1) / (p+1) - 1; // get index for y vector;
sscanf(token, "%lf", &(y[yind]));
} else {
sscanf(token, "%lf", &(x[xind]));
xind++;
}
//sscanf(token, "%lf", &(x[index]));
index++;
token = strtok(NULL, "\t");
}
}
int i = 0;
int j = 0;
puts("Print x matrix:");
for(i = 0; i < n*p; i++) {
printf("%f\n", x[i]);
}
printf("\n");
puts("Print y vector:");
for(j = 0; j < n; j++) {
printf("%f\t", y[j]);
}
printf("\n");
free(x);
fclose(fpt);
}
With the above, hopefully things will work if I replace data_temp.txt with my raw 10 GB data file (after changing the values of n and p, and other code wherever necessary).
I have additional questions that I hope you can help me with.
I first declared char line[] as char line[(p+1)*sizeof(double)] (note: not multiplied by n), but then the line could not be read completely. How can I allocate memory for JUST one single line? What is its length? I assumed it was (p+1)*sizeof(double), since there are (p+1) doubles in each line. Should I also allocate memory for \t and \n? If so, how?
Does the code look reasonable to you? How could I make it more efficient since this code will be executed over millions of rows?
If I don't know the number of columns or rows in the raw 10 GB file, how could I quickly count rows and columns?
Again I am new to C, any comments are very appreciated. Thanks a lot!
1st way
Read file in chunks into preallocated buffer using fread.
2nd way
Map the file into your process memory space using mmap, then move a pointer over the mapped file.
3rd way
Since your file is delimited by lines, open the file with fopen, use setvbuf or similar to set a buffer size greater than about 10 lines or so, then read the file line-by-line using fgets.
To potentially read the file even faster, use open with O_DIRECT (assuming Linux), then use fdopen to get a FILE * for the open file, then use setvbuf to set a page-aligned buffer. Doing that will allow you to bypass the kernel page cache - if your system's implementation works successfully using direct IO that way. (There can be many restrictions to direct IO)
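For the fgets route, each line still has to be split into numbers, and the line buffer has to hold the text of a line rather than its binary size: sizeof(double) is typically 8 bytes, while the printed form of one value can easily take 20 or more characters plus a separator. The sketch below shows one way to parse a line that has already been read. The name parse_row and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parses nx values into x[] and one final value
   into *y from a single tab- or space-separated line.
   Returns 0 on success, -1 if the line has too few or too many values. */
int parse_row(const char *line, double *x, size_t nx, double *y)
{
    char *end;

    assert(line != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i <= nx; i++) {
        double value = strtod(line, &end);
        if (end == line)          /* nothing converted: too few values */
            return -1;
        if (i < nx)
            x[i] = value;         /* one of the first nx columns */
        else
            *y = value;           /* the last column */
        line = end;               /* continue after the parsed number */
    }
    /* Only whitespace (such as the newline) may follow the last value. */
    while (isspace((unsigned char)*line))
        line++;
    return *line == '\0' ? 0 : -1;
}
```

A caller would read each line with fgets into a buffer sized for the longest expected line, for example 1001 columns times roughly 25 characters, and pass it to parse_row with nx set to 1000. strtod skips leading whitespace itself, so tabs need no special handling.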
Something to get you started: Reading 1 line
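#include <stdio.h>
#define COLUMN (1000+1)
/* Reads one line of COLUMN tab-separated values from in_stream into data.
   Returns COLUMN on success, 0 at end of file, -1 on a format error.
   The function name read_line is illustrative; the original fragment
   showed only the body. */
int read_line(FILE *in_stream, double data[COLUMN]) {
    for (int i = 0; i < COLUMN; i++) {
        char delim = '\n';
        int cnt = fscanf(in_stream, "%lf%c", &data[i], &delim);
        if (cnt < 1) {
            if (cnt == EOF && i == 0) return 0; // None read, OK as end of file
            puts("Missing or bad data");
            return -1; // problem
        }
        if (delim != '\t') {
            // If tab not found, should be at end of line
            if (delim == '\n' && i == COLUMN-1) {
                return COLUMN; // Success
            }
            puts("Bad delimiter");
            return -1;
        }
    }
    puts("Extra data");
    return -1;
}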
We've narrowed down the issue to this function. It is meant to take in a group of words to be searched for, like:
fish
john
miss
nope
that appear immediately after an NxN grid to search, and extend to the end of the file.
I'm attempting to put these words into a 2D array-like structure using pointers, and she's giving me a segmentation fault.
Help?
Here's the code:
int acceptItems(char** items)/*Function reads in 2D array of items to be searched for*/
{
int row = 0;/*row, col keep track of position*/
int col = 0;
int numWords;/*Number of words to be searched for*/
int end = 1;/*1 means continue, 0 means end*/
char c;/*Temporary char for input*/
while(end == 1)
{
c = getchar();
if(c == EOF)/*Case ends repetition at end of file*/
{
end = 0;
}
else if(c == '\n')
{
items[row][col] = '\0';
row++;
col = 0;
}
else
{
items[row][col] = c;
col++;
}
}
numWords = row + 1;
return numWords;
}
Thanks!
Can't be 100% sure since you haven't posted your function call, but your items array is probably too small. You are going out of bounds when you try to set items[row][col].
1) In main(), ensure items is declared as a pointer (char **), not a plain char.
// char items; (from comment)
char** items; (** may or may not be missing from your comment, @Red Alert)
2) Declare c as int. getchar() returns 256 different char values and EOF. To distinguish these 257 different results, do not use char, but int.
// char c;
int c;
...
c = getchar();
3) Upon detecting EOF, terminate the current string. (I think this is it. If your last text line does not end with a \n, that line is never terminated, yet numWords = row + 1 still counts it. Printing that last line, which has no \0, then leads down to the scary place of UB.)
if(c == EOF)/*Case ends repetition at end of file*/
{
items[row][col] = '\0';
end = 0;
}
4) Add a test to ensure you are not writing out of bounds. This is the 2nd possibility: that somewhere the code has boldly gone where no code has gone before.
if (row >= 100 || col >= 100) HandleError();
items[row][col] = ...
5) Suggest changing numWords count.
numWords = row;
if (col > 0) numWords++;
If you declare a 2D array outside of function acceptItems, and then pass it as an argument when you call this function, then you need to provide (in the function's declaration) at least the "lower" dimension:
int acceptItems(char items[][COLS])
You can also provide both dimensions, although you don't have to:
int acceptItems(char items[ROWS][COLS])
The general rule for any type of array, is that you have to provide all dimensions except for the "highest":
int func(int arr[][S2][S3][S4][S5])
BTW, function getchar returns an int (in order to allow the end-of-file indication). So you should use int c instead of char c. Otherwise, if char is unsigned on your platform, c == EOF is never true; if it is signed, a valid byte with value 0xFF is mistaken for EOF.
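Putting these points together, a version of the function that takes a true 2D array might look like the sketch below. It is not the asker's code: accept_items, COLS and the max_rows parameter are hypothetical, and the input is read from a FILE * argument instead of stdin so that the function can be exercised on any stream.

```c
#include <assert.h>
#include <stdio.h>

#define COLS 100   /* assumed maximum word length, including the terminator */

/* Sketch of acceptItems() taking a true 2D array. Reads one word per line
   from in and returns the number of words stored (at most max_rows). */
int accept_items(FILE *in, char items[][COLS], int max_rows)
{
    int row = 0, col = 0;
    int c;                                /* int, so EOF is distinguishable */

    assert(in != NULL && items != NULL);
    while (row < max_rows && (c = getc(in)) != EOF) {
        if (c == '\n') {
            items[row][col] = '\0';       /* finish the current word */
            row++;
            col = 0;
        } else if (col < COLS - 1) {      /* leave room for the terminator */
            items[row][col++] = (char)c;
        }
    }
    if (row < max_rows && col > 0) {      /* last line had no newline */
        items[row][col] = '\0';
        row++;
    }
    return row;
}
```

A caller would declare char items[ROWS][COLS]; and pass items together with ROWS. Characters beyond COLS - 1 on a line are dropped, and an empty line is stored as an empty word.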
I am writing a program which will take every 3 numbers in a file and convert them to their ASCII symbol. So I thought I could read the numbers into a character array, and then make every 3 elements 1 element in a second array, convert them to int and then print these as char.
I am stuck on taking every 3 elements, however. This is my code snippet for this part:
char arry[] = "073102109109112"; <--example string read from a file
char arryNew[16] = {0};
for(int i = 0; i <= sizeof(arryNew); i++){
strncpy(arryNew, arry, 3);
arryNew[i+3]='\0';
puts(arryNew);
}
What this code gives me is the first 3 numbers, fifteen times. I've tried incrementing i by 3, which gives me the first 3 numbers 5 times. How do I write a for-loop with strncpy so that after copying n chars, it moves to the next n chars?
You always pass a pointer to the beginning of the array, so of course you always get the same result. You must use the loop counter to get to the next block:
strncpy(arryNew, &arry[i*3], 3);
Here you have a problem:
arryNew[i+3]='\0';
First of all, you don't need to set the null byte every time, because it will not change anyway. Additionally, you will corrupt memory: because you use i+3 as the index, once i reaches 13 the write lands beyond the array boundary (arryNew has 16 elements, and the loop runs until i is 16).
Your arryNew must also be longer if you intend to store several 3-character strings in it. The original array holds 15 digits plus the terminator, i.e. five groups of three, so the target needs 5*4 characters, because each string also needs its own 0-byte.
And of course, you must use the loop index for the target position as well, so that each group is written to its own slot.
So what you seem to want to do (not sure from your description) is:
char arry[] = "073102109109112"; /* example string read from a file */
char arryNew[20] = {0};
/* One iteration per 3-digit group; strlen() needs <string.h>. */
for(size_t i = 0; i < strlen(arry) / 3; i++)
{
    strncpy(&arryNew[i*4], &arry[i*3], 3);
    puts(&arryNew[i*4]);
}
Or if you just want to have the individual strings printed then you can just do:
char arry[] = "073102109109112"; /* example string read from a file */
char arryNew[4] = {0};
/* One iteration per 3-digit group; strlen() needs <string.h>. */
for(size_t i = 0; i < strlen(arry) / 3; i++)
{
    strncpy(arryNew, &arry[i*3], 3);
    puts(arryNew);
}
Making things a bit simpler: your target string doesn't change.
char arry[] = "073102109109112"; /* example string read from a file */
char target[4] = {0};
/* i + 3 <= strlen(arry) also includes the last group of three. */
for(size_t i = 0; i + 3 <= strlen(arry); i += 3)
{
    strncpy(target, arry + i, 3);
    puts(target);
}
Decoding:
start at the beginning of arry
copy 3 characters to target
(note the fourth element of target is \0)
print out the contents of target
increment i by 3
repeat until you fall off the end of the string.
Some problems.
// Need to change a 3 chars, as text, into an integer.
arryNew[i] = (char) strtol(buf, &endptr, 10);
// char arryNew[16] = {0};
// Overly large.
arryNew[6]
// for(int i = 0; i <= sizeof(arryNew); i++){
// Indexing too far. Should be `i <= (sizeof(arryNew) - 2)` or ...
for (i=0; i<arryNewLen; i++) {
// strncpy(arryNew, arry, 3);
// strncpy() can be used, but we know the length of source and destination,
// simpler to use memcpy()
// strncpy(buf, a, sizeof buf - 1);
memcpy(buf, arry, N);
// arryNew[i+3]='\0';
// Toward the loop's end, code is writing outside arryNew.
// Let's append the `\0` after the for() loop.
// int i
size_t i; // Better to use size_t (or ssize_t) for array index.
Suggestion:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char Source[] = "073102109109112"; // example string read from a file
const int TIW = 3; // textual integer width
// Avoid sprinkling bare constants about code. Define in 1 place instead.
const char *arry = Source;
size_t arryLen = strlen(arry);
if (arryLen%TIW != 0) return -1; // is it a strange sized arry?
size_t arryNewLen = arryLen/TIW;
char arryNew[arryNewLen + 1];
size_t i;
for (i=0; i<arryNewLen; i++) {
char buf[TIW + 1];
// strncpy(buf, a, sizeof buf - 1);
memcpy(buf, arry, TIW);
buf[TIW] = '\0';
char *endptr; // Useful should OP want to do error checking
// TBD: test if result is 0 to 255
arryNew[i] = (char) strtol(buf, &endptr, 10);
arry += TIW;
}
arryNew[i] = '\0';
puts(arryNew); // prints Ifmmp
return 0;
}
You could use this code to complete your task, i.e. to convert each group of three digits in the char array into the character with that ASCII value.
char arry[] = "073102109109112";
char arryNew[16] = {0};
size_t i, j = 0;
/* Subtract '0' to turn each digit character into its numeric value. */
for(i = 0; i + 2 < strlen(arry); i += 3)
{
    arryNew[j] = (char)((arry[i]-'0')*100 + (arry[i+1]-'0')*10 + (arry[i+2]-'0'));
    j++;
}
arryNew[j] = '\0';
puts(arryNew);