Reading all files in two directories at the same time - C

I have a problem with a task. I have two paths to directories. I can read all files from the first path in argv[1], but I can't open the files from the second folder given in argv[2]. Both folders contain the same number of files. Writing the file names into an array up front is not an option, because there are a few hundred of them. Below is an example of how I try to read the files. I need help. Thanks!
#include "stdafx.h"
#include "windows.h"
int main(int argc, char* argv[])
{
FILE *fp = 0;
uchar tmpl1[BUFFER_SIZE] = { 0 };
uchar tmpl2[BUFFER_SIZE] = { 0 };
size_t size;
size_t n;
FILE *Fl = 0;
if (argc != 3 || argv[1] == NULL || argv[2] == NULL)
{
printf("Error", argv[0]);
return -1;
}
char Fn[255];
HANDLE hFind;
WIN32_FIND_DATA ff;
char Fn1[255];
HANDLE hFind1;
WIN32_FIND_DATA ff1;
sprintf_s(Fn, 255, "%s\\*", argv[1]);
sprintf_s(Fn1, 255, "%s\\*", argv[2]);
if ((hFind = FindFirstFile(Fn, &ff)) != INVALID_HANDLE_VALUE)
{
if ((hFind1 = FindFirstFile(Fn1, &ff1)) != INVALID_HANDLE_VALUE)
{
do
{
if (ff.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) continue;
ff1.dwFileAttributes;
sprintf_s(Fn, "%s\\%s", argv[1], ff.cFileName);
sprintf_s(Fn1, "%s\\%s", argv[2], ff1.cFileName);
// here I can't read file's name from second folder
printf(Fn, "%s\\%s", argv[1], ff.cFileName);
printf(Fn1, "%s\\%s", argv[2], ff1.cFileName);
if (fopen_s(&fp, Fn, "rb") != 0)
{
printf("Error reading\nUsage: %s <tmpl1>\n", argv[1]);
return -1;
}
size = _filelength(_fileno(fp));
n = fread(tmpl1, size, 1, fp);
fclose(fp);
fp = 0;
} while (FindNextFile(hFind, &ff));
// also I have a problem how read next file in second directory
FindClose(hFind);
}
}
return 0;
}

I couldn't quite tell why you want to scan two directories at the same time.
When I saw "at the same time" in the title I first thought "concurrently". After looking at the presented code I realized it is meant to be done "interleaved" rather than "concurrently", but that's not essential here.
I assume you want to associate the file names in the first directory somehow with the file names in the second directory. That might mean comparing the file names, reading data from a file in the first directory and other data from the associated file in the second directory, or maybe something completely different.
However, based on this assumption, you have to consider the following:
You should not expect to get file names in any useful order when scanning with FindFirstFile()/FindNextFile(). These functions return the files in their "physical order", i.e. how they are listed internally. (At best, you get . and .. as the first two entries, but I wouldn't even count on that.)
Considering this, I would recommend the following procedure:
1. read the file names from the first directory and store them in an array names1
2. read the file names from the second directory and store them in an array names2
3. sort the arrays names1 and names2 with an appropriate criterion (e.g. lexicographically)
4. process the arrays names1 and names2.
As you can see, the "read file names from a directory and store them in an array" step could be implemented as a function and re-used, and so could the sorting; a sketch of such a helper follows below.
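For illustration, here is a minimal sketch of such a helper, assuming an ANSI (non-Unicode) build like the code in the question; the fixed MAX_FILES limit and the _strdup() allocation are simplifications for the example, not part of the original answer.

#include <windows.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILES 1024

static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Fills names[] with the plain file names found in dir; returns how many were stored. */
static int list_dir(const char *dir, char *names[], int max)
{
    char pattern[MAX_PATH];
    WIN32_FIND_DATA ff;
    HANDLE h;
    int n = 0;

    sprintf_s(pattern, sizeof pattern, "%s\\*", dir);
    h = FindFirstFile(pattern, &ff);
    if (h == INVALID_HANDLE_VALUE) return 0;
    do
    {
        if (ff.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) continue;  /* skip sub-directories */
        if (n < max) names[n++] = _strdup(ff.cFileName);
    } while (FindNextFile(h, &ff));
    FindClose(h);

    qsort(names, n, sizeof names[0], cmp_names);   /* the lexicographic sorting step */
    return n;
}

Calling it once per directory gives two sorted name arrays (names1, names2) that can then be walked in parallel.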
This said, finally, the answer for how to interleave two directory scans:
HANDLE hFind1 = FindFirstFile(Fn1, &ff1);
HANDLE hFind2 = FindFirstFile(Fn2, &ff2);
while (hFind1 != INVALID_HANDLE_VALUE || hFind2 != INVALID_HANDLE_VALUE) {
    if (hFind1 != INVALID_HANDLE_VALUE) {
        /** #todo process ff1 somehow */
    }
    if (hFind2 != INVALID_HANDLE_VALUE) {
        /** #todo process ff2 somehow */
    }
    /* iterate (never calling FindNextFile()/FindClose() on an already closed handle): */
    if (hFind1 != INVALID_HANDLE_VALUE && !FindNextFile(hFind1, &ff1)) {
        FindClose(hFind1); hFind1 = INVALID_HANDLE_VALUE;
    }
    if (hFind2 != INVALID_HANDLE_VALUE && !FindNextFile(hFind2, &ff2)) {
        FindClose(hFind2); hFind2 = INVALID_HANDLE_VALUE;
    }
}
Please note that I "abuse" the handles hFind1 and hFind2 themselves as loop-control flags, so no extra state variables are needed. (I like things like that.)
By the way, this loop iterates until both directories have been scanned completely (even if they don't contain the same number of entries).
If you instead want the loop to run only while both scans still deliver entries, simply change the while condition to:
while (hFind1 != INVALID_HANDLE_VALUE && hFind2 != INVALID_HANDLE_VALUE) {
i.e. the loop then terminates as soon as at least one of the two directory scans is exhausted.
Finally, a little story from my own past (where I learnt a useful lesson about this):
I had just finished my computer-science degree and was working at home on a rather freshly installed Windows NT when I started to copy a large directory from a CD drive to the hard disk. The estimated time was round about an hour and I thought: 'Hey, it does multi-tasking!' So I started a second File Manager to copy another directory from the same CD drive concurrently. When I hit the OK button, the immediate noises of the CD drive alerted me, as did the estimated time, which "exploded" to multiple hours. After that, I reacted as you would expect: tapped my forehead and mumbled something like "unshareable resources"... (and, of course, stopped the second copy and went for a coffee instead.)

Related

Count the number of files in, and below a directory in Linux C recursively

I wrote a function to count the number of files in, and below, a directory (including files in its subdirectories).
However, when I test the code on a directory that has a subdirectory, it always reports the error "fail to open dir: No such file or directory".
Is there anything I can do to make it work?
int countfiles(char *root, bool a_flag) // a_flag decides whether hidden files are included
{
    DIR *dir;
    struct dirent *ptr;
    int total = 0;
    char path[MAXPATHLEN];

    dir = opendir(root); // open root directory
    if (dir == NULL)
    {
        perror("fail to open dir");
        exit(1);
    }
    errno = 0;
    while ((ptr = readdir(dir)) != NULL)
    {
        // read every entry in dir
        // skip ".." and "."
        if (strcmp(ptr->d_name, ".") == 0 || strcmp(ptr->d_name, "..") == 0)
        {
            continue;
        }
        // If it is a directory, recurse
        if (ptr->d_type == DT_DIR)
        {
            sprintf(path, "%s%s/", root, ptr->d_name);
            // printf("%s/n", path);
            total += countfiles(path, a_flag);
        }
        if (ptr->d_type == DT_REG)
        {
            if (a_flag == 1) {
                total++;
            }
            else if (a_flag == 0) {
                if (isHidden(ptr->d_name) == 0) {
                    total++;
                }
            }
        }
    }
    if (errno != 0)
    {
        printf("fail to read dir");
        exit(1);
    }
    closedir(dir);
    return total;
}
Is there anything I could do to make it work?
Sure, lots. Personally, I'd start by using the correct interface for this stuff, which in Linux and POSIXy systems would be nftw(). This would lead to a program that was not only shorter and more effective, but would not as easily get confused if someone renames a directory or file in the tree being scanned at the same time.
Programmers almost never implement directory traversal with opendir()/readdir()/closedir() as robustly and as efficiently as nftw(), scandir(), glob(), or the fts family of functions do. Why teachers still insist on the archaic *dir() functions in this day and age puzzles me to no end.
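To illustrate what an nftw()-based version could look like, here is a minimal sketch; the global counter, the hidden-file rule, and the descriptor limit of 20 are choices made for this example, not something taken from the question.

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static long file_count = 0;
static int include_hidden = 0;   /* set to 1 to count dot-files as well */

static int counter(const char *fpath, const struct stat *sb,
                   int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    if (typeflag == FTW_F) {                       /* a file (non-directory) entry */
        const char *name = fpath + ftwbuf->base;   /* basename of the entry */
        if (include_hidden || name[0] != '.')
            file_count++;
    }
    return 0;                                      /* keep walking */
}

int main(int argc, char *argv[])
{
    if (argc < 2) return 1;
    if (nftw(argv[1], counter, 20, FTW_PHYS) == -1) {   /* FTW_PHYS: do not follow symlinks */
        perror("nftw");
        return 1;
    }
    printf("%ld\n", file_count);
    return 0;
}

Compiled and run as ./count /some/dir, it prints the number of files found in the whole tree.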
If you have to use the *dir functions because your teacher does not know POSIX and wants you to use interfaces you should not use in real life, then look at how you construct the path to the new directory: the sprintf() line. Perhaps even print it (path) out, and you'll probably find the fix on your own.
Even then, sprintf() is not something you should allow in real-life programs (because it will cause a silent buffer overrun when the arguments are longer than expected; and that CAN happen in Linux, because there actually isn't a fixed limit on the length of a path). You should use at minimum snprintf() and check its return value for overruns, or, on Linux, asprintf(), which allocates the resulting string dynamically.
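As a hedged sketch of that last point, reusing the variable names from the question: the recursion step could build the child path with snprintf(), put the separator between the two components, and bail out if the result would not fit.

/* inside the readdir() loop, for the DT_DIR case -- variable names as in the question */
if (ptr->d_type == DT_DIR)
{
    int len = snprintf(path, sizeof path, "%s/%s", root, ptr->d_name);
    if (len < 0 || (size_t)len >= sizeof path)
    {
        fprintf(stderr, "path too long: %s/%s\n", root, ptr->d_name);
    }
    else
    {
        total += countfiles(path, a_flag);
    }
}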

How to check if file exists and create new file in C

I am new to programming in C and I am programming for the Raspberry Pi. All I want to do is create a function that takes a string as a parameter and saves it as a text file in a specific location. I want to check that location to see which files already exist and save the new file to that folder with the number in the file name incremented by 1.
For example, folder contains:
TestFile1
TestFile2
And I want to be able to create the new file saved as TestFile3.
This is the code I have so far; I want to know whether I am on the right track and would appreciate any tips:
void WriteToFile(unsigned char *pID)
{
    printf("Writing to file. . . . .\n");
    /* Checking to see how many files are in the directory. */
    int *count = 0;
    DIR *d;
    struct dirent *dir;
    d = opendir("table_orders");
    if (d)
    {
        while ((dir = readdir(d)) != NULL)
        {
            printf("%s\n", dir->d_name);
            count = count + 1; // Adds 1 to count whenever a file is found.
        }
        closedir(d);
    }
    char str[sizeOf(count)]; // Creates string.
    sprintf(str, "%d", count); // Formats count integer into a string.
    File *f = fopen("table_orders/Order " + str + ".txt", "a"); // Creates new file.
    if (f == NULL)
    {
        printf("Error opening file!\n");
        exit(1);
    }
    fprintf(f, "Order: %s \n", pID);
    fclose(f);
    printf("The Order %s has been written to the file\n", pID);
}
int fd = open( "filename", O_RDWR | O_CREAT | O_EXCL, 0644 );
Nothing else is atomic - another process can create the file in between any check for existence and your actual creation of the file.
You can use stat (_stat in Windows) to see if a file exists. If it fails with errno set to ENOENT then the file doesn't exist. access is another possibility.
Of course it's not atomic: some other process could create the file between your check and your call to fopen.
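A minimal sketch of that stat()-based check (POSIX; the three-way return value is a convention chosen for this example, and remember it is only a snapshot, not an atomic guarantee):

#include <sys/stat.h>
#include <errno.h>

static int file_exists(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == 0)
        return 1;                  /* something with that name exists */
    if (errno == ENOENT)
        return 0;                  /* definitely does not exist */
    return -1;                     /* unknown (e.g. permission error) */
}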
You are close, but far enough off the mark that I think you need to put this on hold and read some C tutorials with structured examples.
You have an algorithmic error where you read through the directory and arbitrarily increase count. The logic should be more like:
parse dir->d_name into 3 tokens: "Testfile", 0001, ".log"
using atoi() or similar, convert the numeric string to an int
When you declare str there is a misunderstanding of sizeof; it should read more like
char str[25];
This is enough to hold all the digits of a 4-byte int as a string. sizeof count, by contrast, is probably just 4 (4 bytes / 32 bits), not the number of digits.
When you fopen() you do something like "Dir/file" + str + ".log".
That isn't how you do this; + is arithmetic in C, not string concatenation. You need to strcat() or sprintf() into a new work string, created and freed on the fly.
When appending a number to a file name like this, it makes sense to pad short numbers with leading zeros. This produces regular file names that sort nicely in ls.
Should you need to control creation and exclusivity, you'll need to use open() to open the file and fdopen() it into a FILE * handle (see the sketch after these points).
When you go live with this you will need to prevent other processes from causing timing/scheduling races; use semaphores or a ".lock" file.
You will also want some maintenance routine to delete old logs and subsequently renumber the remaining logs from 001.
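A minimal sketch pulling the zero-padding and open()/fdopen() points together; the table_orders directory name matches the question, while the hard-coded next_order value stands in for whatever number the directory scan produced (an assumption for the example).

#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    char name[64];
    int next_order = 3;   /* assumed: computed from the directory scan */

    /* zero-pad so the files sort nicely: "Order 0003.txt" */
    snprintf(name, sizeof name, "table_orders/Order %04d.txt", next_order);

    /* O_EXCL makes creation fail if the file already exists (atomic check-and-create) */
    int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    FILE *f = fdopen(fd, "w");   /* wrap the descriptor so fprintf() can be used */
    fprintf(f, "Order: %s\n", "example order");
    fclose(f);                   /* also closes the underlying descriptor */
    return 0;
}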
PS don't know why the down votes

read multiple fasta sequence using external library kseq.h

I am trying to find the fasta sequences for 5 ids/names provided by the user in a big fasta file (containing 80000 fasta sequences), using the external header file kseq.h as described at http://lh3lh3.users.sourceforge.net/kseq.shtml. When the open/close is inside the for loop, I have to open and close the big fasta file again and again (see the commented lines in the code), which makes the computation slow. On the other hand, if I open/close it only once outside the loop, the program stops as soon as it encounters an entry that is not present in the big fasta file, i.e. it reaches the end of the file. Can anyone suggest how to get all the sequences without losing computation time? The code is:
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "ext_libraries/kseq.h"

KSEQ_INIT(gzFile, gzread)

int main(int argc, char *argv[])
{
    char gwidd_ids[100];
    kseq_t *seq;
    int i = 0, nFields = 0, row = 0, col = 0;
    int size = 1000, flag1 = 0, l = 0, index0 = 0;

    printf("Opening file %s\n", argv[1]);
    char **gi_ids = (char **)malloc(sizeof(char *) * size);
    for (i = 0; i < size; i++)
    {
        gi_ids[i] = (char *)malloc(sizeof(char) * 50);
    }

    FILE *fp_inp = fopen(argv[1], "r");
    while (fscanf(fp_inp, "%s", gwidd_ids) == 1)
    {
        printf("%s\n", gwidd_ids);
        strcpy(gi_ids[index0], gwidd_ids);
        index0++;
    }
    fclose(fp_inp);

    FILE *f0 = fopen("xxx.txt", "w");
    FILE *f1 = fopen("yyy.txt", "w");
    FILE *f2 = fopen("zzz", "w");

    FILE *instream = NULL;
    instream = fopen("fasta_seq_uniprot.txt", "r");
    gzFile fpf = gzdopen(fileno(instream), "r");
    for (col = 0; col < index0; col++)
    {
        flag1 = 0;
        // FILE *instream = NULL;
        // instream = fopen("fasta_seq_nr_uniprot.txt", "r");
        // gzFile fpf = gzdopen(fileno(instream), "r");
        kseq_t *seq = kseq_init(fpf);
        while ((kseq_read(seq)) >= 0 && flag1 == 0)
        {
            if (strcasecmp(gi_ids[col], seq->name.s) == 0)
            {
                fprintf(f1, ">%s\n", gi_ids[col]);
                fprintf(f2, ">%s\n%s\n", seq->name.s, seq->seq.s);
                flag1 = 1;
            }
        }
        if (flag1 == 0)
        {
            fprintf(f0, "%s\n", gi_ids[col]);
        }
        kseq_destroy(seq);
        // gzclose(fpf);
    }
    gzclose(fpf);
    fclose(f0);
    fclose(f1);
    fclose(f2);
    for (i = 0; i < size; i++)
    {
        free(gi_ids[i]);
    }
    free(gi_ids);
    return 0;
}
A few example entries from the input file (fasta_seq_uniprot.txt) are:
P21306
MSAWRKAGISYAAYLNVAAQAIRSSLKTELQTASVLNRSQTDAFYTQYKNGTAASEPTPITK
P38077
MLSRIVSNNATRSVMCHQAQVGILYKTNPVRTYATLKEVEMRLKSIKNIEKITKTMKIVASTRLSKAEKAKISAKKMD
-----------
-----------
The user entry file is:
P37592\n
Q8IUX1\n
B3GNT2\n
Q81U58\n
P70453\n
Your problem appears to be a bit different from what you suppose. That the program stops after trying to retrieve a sequence that is not present in the data file is a consequence of the fact that it never rewinds the input. Therefore, even for a query list containing only sequences that are present in the data file, if the requested sequence IDs are not in the same relative order as in the data file then the program will fail to find some of them (it will pass them by while looking for an earlier-listed sequence, never to return).
Furthermore, I think it likely that the time savings you observe comes from making only a single pass through the file, instead of a (partial) pass for each requested sequence, not so much from opening it only once. Opening and closing a file is a bit expensive, but nowhere near as expensive as reading tens or hundreds of kilobytes from it.
To answer your question directly, I think you need to take these steps:
1. Move the kseq_init() call to just before the loop.
2. Move the kseq_destroy(seq) call to just after the loop.
3. Put a call to kseq_rewind(seq) as the last statement in the loop.
That should make your program right again, but it is likely to kill pretty much all your time savings, because you will return to scanning the file from the beginning for each requested sequence.
The library you are using appears to support only sequential access. Therefore, the most efficient way to do the job both right and fast would be to invert the logic: read sequences one at a time in an outer loop, testing each one as you go to see whether it matches any of the requested ones.
Supposing that the list of requested sequences will contain only a few entries, like your example, you probably don't need to do any better testing for matches than just using an inner loop to test each requested sequence id vs. the then-current sequence. If the query lists may be a lot longer, though, then you could consider putting them in a hash table or sorting them into the same order as the data file to make it possible to test more efficiently for matches.
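As a minimal sketch of that inverted loop, reusing the variable names from the question (fpf, gi_ids, index0, col, f0, f1, f2) and assuming index0 stays within the allocated size of 1000:

/* one pass over the FASTA file; each record is checked against the small query list */
int found[1000] = { 0 };                 /* found[col] parallels gi_ids[col] */
kseq_t *seq = kseq_init(fpf);            /* init once, before the outer loop */

while (kseq_read(seq) >= 0)              /* outer loop: every sequence, exactly once */
{
    for (col = 0; col < index0; col++)   /* inner loop: the handful of requested IDs */
    {
        if (!found[col] && strcasecmp(gi_ids[col], seq->name.s) == 0)
        {
            fprintf(f1, ">%s\n", gi_ids[col]);
            fprintf(f2, ">%s\n%s\n", seq->name.s, seq->seq.s);
            found[col] = 1;
        }
    }
}
kseq_destroy(seq);                       /* destroy once, after the outer loop */

for (col = 0; col < index0; col++)       /* IDs never matched go to the "missing" file */
{
    if (!found[col])
        fprintf(f0, "%s\n", gi_ids[col]);
}

With only a handful of query IDs the inner loop is cheap; for much longer query lists the hash-table or sorted-merge variants mentioned above become worthwhile.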

C: recursively opening sub-directories and creating new files

I'm writing something that recursively finds .c and .h files and deletes all comments (just as a learning exercise). For every .c/.h file found, this program creates an additional file which is equal to the original file without the comments. So for example, "helloworld.c" would result in an additional file "__helloworld.c".
The problem I am encountering is this:
I have a loop which iterates over all entries in a directory, and keeps going until it stops finding files with .c or .h extensions. However, the loop never actually ends, since each time a file is found, another is created. So I have this recursive situation where "__helloworld.c" becomes "____helloworld.c" which becomes "______helloworld.c", etc.
(in case anyone suggests, yes it is necessary for the new files to have a .c extension.)
One possible solution might be to keep track of the inode numbers so that we only iterate over the original files; however, this requires several passes over the directory: one to count the directory entries (and use that number to initialise an array for the inode numbers), a second to store the inode numbers, and finally a third to do the actual work.
Can anybody share any ideas that could achieve this in a single pass of the loop?
The code is split across two files, so I have posted just the main recursive routine.
consume_comments():
takes a single file as argument and creates a new file with the comments omitted.
My main routine pretty much just does some argument handling; the routine posted below is where the real problems are.
/*
   opens a directory stream of the dir pointed to by 'filename',
   looks for .c .h files, consumes comments. If 'rc' == 1, find()
   calls itself when it encounters a sub-directory.
*/
int find (const char * dirname)
{
    int count = 3;
    DIR * dh;
    struct dirent * dent;
    struct stat buf;
    const char * fnext;
    int filecount = 0;

    chdir(dirname);
    if ((dh = opendir(".")) == NULL)
    {
        printf("Error opening directory \"%s\"\n", dirname);
        exit(-1);
    }
    while ((dent = readdir(dh)) != NULL)
    {
        if (count) count--;
        if (!count)
        {
            if (lstat(dent->d_name, &buf) == -1)
            {
                printf("Error opening file \"%s\" for lstat()\n", dent->d_name);
                exit(EXIT_FAILURE);
            }
            if (S_ISDIR(buf.st_mode) && rc)
            {
                find(dent->d_name);
                chdir("..");
                // when this find() completes, it will be one level down:
                // so we must come back up again.
            }
            if (S_ISREG(buf.st_mode))
            {
                fnext = fnextension(dent->d_name);
                if (*fnext == 'c' || *fnext == 'h')
                {
                    consume_comments(dent->d_name);
                    printf("Comments consumed:%20s\n", dent->d_name);
                }
            }
        }
    }
}
You can use one of these 3 solutions:
1. As suggested in a comment by @Theolodis, ignore files starting with __.
2. Split your algorithm into 2 parts. In the first part, prepare a list of all the .c and .h files (recursive). In the second step, go through the list and generate stripped versions of the files (non-recursive).
3. Prepare the stripped .c and .h files in some temp directory (/tmp on Linux or %TEMP% on Windows) and move them into the folder once all the .c and .h files of that folder have been processed. Then scan all the sub-folders.
I see multiple solutions to your problem. But in any case you need to check whether the file you are going to create already exists; otherwise you could end up overwriting existing files!
(Example: with file.c and __file.c in your directory, you process __file.c and generate ____file.c, then you process file.c and overwrite the existing __file.c.)
1. Ignore files that begin with your chosen prefix.
Advantages: easy to implement.
Downsides: you might skip some original files that happen to start with your prefix.
2. While going through the directory, keep a set of the unique file names you have already created. Before converting any file, check whether that file was created by yourself.
Advantages: you don't miss original files that begin with your prefix.
Disadvantages: if you have a very long list of files the memory usage might explode.
Edit: the second and third solutions from Mohit Jain look pretty good too!
New implementation, using a routine chk_prefix() to match the prefix of filenames.
char * prefix = "__nmc_";

/* returns 0 if 'name' starts with 'prefix', 1 otherwise */
int chk_prefix (char * name)
{
    int nsize = strlen(name);
    int fsize = strlen(prefix);
    int i;
    if (nsize < fsize) return 1;
    for (i = 0; i < fsize; i++)
    {
        if (name[i] != prefix[i]) return 1;
    }
    return 0;
}

int find (const char * dirname)
{
    int count = 3;
    DIR * dh;
    struct dirent * dent;
    struct stat buf;
    const char * fnext;
    int filecount = 0;

    chdir(dirname);
    if ((dh = opendir(".")) == NULL)
    {
        printf("Error opening directory \"%s\"\n", dirname);
        exit(-1);
    }
    while ((dent = readdir(dh)) != NULL)
    {
        if (count) count--;
        if (!count)
        {
            if (lstat(dent->d_name, &buf) == -1)
            {
                printf("Error opening file \"%s\" for lstat()\n", dent->d_name);
                exit(EXIT_FAILURE);
            }
            if (S_ISDIR(buf.st_mode) && rc)
            {
                find(dent->d_name);
                chdir("..");
                // when this find() completes, it will be one level down:
                // so we must come back up again.
            }
            if (S_ISREG(buf.st_mode))
            {
                fnext = fnextension(dent->d_name);
                /* note the parentheses: without them, && binds tighter than ||
                   and the prefix check would only apply to .h files */
                if ((*fnext == 'c' || *fnext == 'h') && chk_prefix(dent->d_name))
                {
                    consume_comments(dent->d_name);
                    printf("Comments consumed:%20s\n", dent->d_name);
                }
            }
        }
    }
}

Reading in .txt file with different extension in C

At the moment my program has no problem reading in a .txt file, but it needs to read a text file with a different extension (.emu is the requirement). When I simply rename the same file to the .emu extension, the variable 'file' is NULL and therefore the file isn't opened. Can anyone help?
I've had a look around and haven't been able to find a solution, so any help is much appreciated.
Here's the source code:
void handleArgs (const char *filename, int trace, int before, int after) {
    FILE *file = fopen(filename, "r");
    char *address = malloc(MAX_ADD_LENGTH * sizeof(char));
    char *instruction = malloc(MAX_INS_LENGTH * sizeof(char));
    long int addressDecoded;

    if (file == NULL || file == 0) {
        fprintf(stderr, "Error: Could not open file");
    }
    else {
        if (ferror(file) == 0) {
            while (fscanf(file, "%s %s", address, instruction) != EOF) {
                if (strlen(address) == 8 && strlen(instruction) == 8) {
                    addressDecoded = strtol(address, NULL, 16);
                    printf("%ld\n", addressDecoded);
                    //instruction = decodeInstruction(instruction);
                }
                else {
                    fprintf(stderr, "Error: particular line is of wrong length");
                }
            }
        }
    }
    fclose(file);
}
The 'filename' argument passed when executing is simply '/foopath/test.emu'.
There's nothing special to C about the file extension. Reread your code for simple errors like changing the filename in one place, but not the other. If you're passing in the filename, pass the whole name, not just the part to the left of the period.
Files are data, and have names. What comes before the dot in a name is just as much a part of it as what comes after; extensions were created merely as hints about what the file contains, and they are NOT required to be strictly related to the file's contents.
The file may not exist, or your privileges may not be enough to open it. Or maybe there's some other kind of error. How can you diagnose this?
When you use a system call and it doesn't behave the way you want, there's a variable called errno in errno.h (#include <errno.h>) that will contain a number representing the status of the last call. There's a long list of symbolic constants that put names to these values; you can look them up.
For example, if you try to open a file and the returned pointer is useless, you might want to check errno to see whether the file exists, whether you're exceeding system limits on open files, etc.
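A minimal sketch of that kind of diagnosis, assuming the file name is passed on the command line; strerror() turns the errno value into readable text.

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    if (argc < 2) return 1;
    FILE *file = fopen(argv[1], "r");
    if (file == NULL) {
        /* errno was set by the failed open; report why, not just that it failed */
        fprintf(stderr, "Could not open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    fclose(file);
    return 0;
}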
