a tester which does not know the names of files - c

There are some test cases for a program, stored as ??.in and ??.out files in the directories
./input and ./output, such that for each test the first part of the two names is the same, e.g. test1.in and test1.out.
How can I write code that walks through these files in pairs? (Maybe the files can be visited in alphabetical order in each directory...)

Get a list of all the files in a directory using opendir() and the related functions readdir() and closedir(), then work through the resulting list of names.
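A minimal sketch of that approach, assuming a POSIX system (or MinGW, which ships a dirent.h) and the ./input and ./output layout from the question. The helper names has_suffix, make_out_path and for_each_test are made up for this illustration.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name ends with suffix, 0 otherwise. */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Builds "<out_dir>/<stem>.out" from an input name such as "test1.in".
   Returns 0 on success, -1 if in_name does not end in ".in" or buf is too small. */
int make_out_path(const char *out_dir, const char *in_name,
                  char *buf, size_t buf_size)
{
    if (!has_suffix(in_name, ".in"))
        return -1;
    int stem_len = (int)(strlen(in_name) - 3);   /* drop ".in" */
    int written = snprintf(buf, buf_size, "%s/%.*s.out",
                           out_dir, stem_len, in_name);
    return (written < 0 || (size_t)written >= buf_size) ? -1 : 0;
}

/* Visits every *.in file in in_dir and prints the matching pair of paths.
   Returns the number of pairs found, or -1 if in_dir cannot be opened. */
int for_each_test(const char *in_dir, const char *out_dir)
{
    DIR *dir = opendir(in_dir);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char out_path[512];
        if (make_out_path(out_dir, entry->d_name, out_path, sizeof out_path) == 0) {
            /* Replace this printf with the code that runs one test. */
            printf("%s/%s -> %s\n", in_dir, entry->d_name, out_path);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Calling for_each_test("./input", "./output") would visit each pair once. readdir() returns entries in no particular order, so if alphabetical order matters, the names would have to be collected and sorted first (for example with qsort()).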

for (int i = 0; i < 100; i++) {
    char in_filename[100];
    char out_filename[100];
    sprintf(in_filename, "./input/test%d.in", i);
    sprintf(out_filename, "./output/test%d.out", i);
    /* use in_filename and out_filename as you see fit */
    /* ... */
}

Related

Properties file reading in C (no C# or C++) compiled with minGW

I should say that I am a newbie at C and have only written about 100-150 lines of C code.
I need to read a .properties file with entries like the following:
Value1 = Hello
Value2 = Bye
I would like to get to the Values like this:
bla.getValue("Value1");
So i can work with it like this:
foo = bla.getValue("Value1");
bar = bla.getValue("Value2");
printf("%s - %s",foo,bar);
I don't need them for anything other than printing them to the screen.
I found two questions here that pointed in the right direction, but they couldn't help me with my task:
How to read configuration/properties file in C?
Properties file library for C (or C++)
I tried several of the answers from the threads above, but in every case my compiler (MinGW) rejects one of these lines:
using foo::bar;
or
using namespace foo;
When I try to compile my code, I get an error saying:
error: unknown type name 'using'
This is the code where I tried to implement the solution given in the thread above:
#include <windows.h>
#include <stdio.h>
#include <string.h>
using platformstl::properties_file;
int WINAPI WinMain(HINSTANCE a,HINSTANCE b,LPSTR c,int d)
{
char *tPath, *tWindow;
char *search = " ";
tWindow = strtok(c, search);
tPath = strtok(NULL, search);
properties_file properties("%s",tPath);
properties::value_type value1 = properties["Value1"];
properties::value_type value2 = properties["Value2"];
printf("Window: %s; Path: %s; %s %s",tWindow,tPath,value1,value2);
}
I use WinMain because the program is about finding an open window. I haven't included those parts of the code, because they are irrelevant to my question and work completely fine. The strtok() parts work fine for me too. I need them because the title of the window to find and the path of the properties file are both given as command-line arguments:
programm.exe windowtitle path/to/properties/file
When I tried other answers that told me to load some libraries, I got to a point where the libraries didn't contain the required header files. Some of the libraries are even for C++, which I am not allowed to use.
I hope that made things a little clearer; as you may have noticed, I am not used to asking questions here. :)
I solved my problem with a big workaround.
This is my final code:
if(vn != NULL){
    for(i = 0; i < 1; i++){
        if(fgets(temp, BUF, vn) == NULL){
            printf("Line is empty");
            return 2;
        }
    }
    if(fgets(puffer, BUF, vn) == NULL){
        printf("Line is empty");
        return 2;
    }
    tVariable = strtok(puffer, find);
    tValue = strtok(NULL, find);
}else {
    printf("Unable to read File");
    return 2;
}
I just read the second line of the given file and split it at the = sign.
I know that I need to read the second line, because the property I need is always on the second line of the .properties file.
I now have the value I want in tValue, so I can print it with printf("%s", tValue).
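For readers who cannot rely on the property being on a fixed line, here is a rough plain-C sketch of a lookup by key. It is not what the code above does; the names trim and get_property are invented for this example, and it assumes simple "key = value" lines shorter than 512 characters, with no comments, escapes or line continuations.

```c
#include <stdio.h>
#include <string.h>

/* Strips leading blanks and trailing blanks/line endings in place;
   returns a pointer to the first non-blank character. */
char *trim(char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    size_t n = strlen(s);
    while (n > 0 && strchr(" \t\r\n", s[n - 1]) != NULL)
        s[--n] = '\0';
    return s;
}

/* Looks up key in the properties file at path and copies its value into
   value (at most value_size - 1 characters).
   Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int get_property(const char *path, const char *key,
                 char *value, size_t value_size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        char *eq = strchr(line, '=');
        if (eq == NULL)
            continue;                 /* not a "key = value" line */
        *eq = '\0';                   /* split the line at '=' */
        if (strcmp(trim(line), key) == 0) {
            snprintf(value, value_size, "%s", trim(eq + 1));
            found = 1;
        }
    }
    fclose(fp);
    return found;
}
```

With the example file from the question, get_property(tPath, "Value1", buf, sizeof buf) would leave "Hello" in buf, which can then be printed with printf("%s", buf).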

Creating and writing in multiple txt files in C

I've got code that should create j files (named after Node[j].ID) in a directory called Nodes and write the information contained in NodeResults into those files. At the moment the code neither creates the files nor writes to them, apparently because the strcat function doesn't work. Any idea how to correct the code so that the files are created with the information from NodeResults in them? Thanks in advance. Please find the code below:
{
int period, j ;
FILE*temporal;
FILE* temp_time;
char path[25];
char* extention = ".txt";
char s[30];
char temporal2[25];
long time_val = 0;
_mkdir("Nodes");
_mkdir("time");
temp_time = fopen("Time/time.txt", "w");
fprintf(temp_time, "%d,%d\n", ReportStep, Nperiods);
fclose(temp_time);
for ( j = 0; j < Nobjects[NODE]; j++ ) {
/* File path writing */
strcpy(temporal2,"Nodes/");
strcat(temporal2, Node[j].ID);
strcat(temporal2, extention);
temporal= fopen(temporal2, "w");
}
for ( period = 1; period <= Nperiods; period++ ) {
output_readNodeResults(period, j);
fprintf(temporal, "%9.3f,%9.3f,%9.3f,%9.3f,%9.3f\n",
NodeResults[NODE_INFLOW],
NodeResults[NODE_OVERFLOW],
NodeResults[NODE_DEPTH],
//NodeResults[NODE_HEAD],
NodeResults[NODE_VOLUME]);
}
fclose(temporal);
return Nperiods;
}
You open a bunch of files in the first for loop, but do not write anything to them. At each iteration, you assign a new FILE * to variable temporal, overwriting any previous value. Afterward, in your second for loop you write a bunch of output to the last file opened -- the one to which temporal refers at that point.
It looks like you want to move the body of the second for loop and the fclose() into the first for loop.
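As a rough illustration of that restructuring, the sketch below opens, fills and closes one file per node inside a single outer loop. It is self-contained, so the asker's Node[], Nobjects[] and output_readNodeResults() are replaced by hypothetical stand-ins (an array of IDs and sample_result()); only the loop structure is the point.

```c
#include <stdio.h>

/* Hypothetical stand-in for the asker's NodeResults: one value per period. */
double sample_result(int node, int period)
{
    return node * 100.0 + period;
}

/* Writes one file per node: open it, write every period, close it, and only
   then move on to the next node.
   Returns the number of files written, or -1 on the first failure. */
int write_node_files(const char *dir, const char *const ids[],
                     int n_nodes, int n_periods)
{
    for (int j = 0; j < n_nodes; j++) {
        char path[260];
        int len = snprintf(path, sizeof path, "%s/%s.txt", dir, ids[j]);
        if (len < 0 || (size_t)len >= sizeof path)
            return -1;                    /* the name would not fit */
        FILE *fp = fopen(path, "w");
        if (fp == NULL) {
            perror(path);                 /* report why the open failed */
            return -1;
        }
        for (int period = 1; period <= n_periods; period++)
            fprintf(fp, "%9.3f\n", sample_result(j, period));
        fclose(fp);                       /* close before opening the next file */
    }
    return n_nodes;
}
```

Two further things in the original code are worth checking while restructuring it: the fprintf format has five %9.3f conversions but only four arguments (NODE_HEAD is commented out), and temporal2 holds only 25 characters, which a long Node[j].ID could overflow. Using snprintf with a size check, as above, avoids the second problem.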
I have properly formatted your code, and now John's point immediately sticks out: the brace is on the wrong line, resulting in wrong for loops and blocks!
Had you formatted the code properly yourself, you would have seen it immediately!

Regarding FOPEN in C

I am having a problem regarding FOPEN in C.
I have this code which reads a particular file from a directory
FILE *ifp ;
char directoryname[50];
char result[100];
char *rpath = "/home/kamal/samples/pipe26/divpipe0.f00001";
char *mode = "r";
ifp = fopen("director.in",mode); /* director file contains path of directory */
while (fscanf(ifp, "%s", directoryname) != EOF)
{
strcpy(result,directoryname); /* Path of directory /home/kamal/samples/pipe26 */
strcat(result,"/"); /* forward slash for path */
strcat(result,name); /* name of the file divpipe0.f00001*/
}
Up to this point my code works perfectly, creating a string that looks like "/home/kamal/samples/pipe26/divpipe0.f00001".
The problem arises when I try to use 'result' to open a file: it gives me an error. If I use 'rpath' instead, it works fine, even though both strings contain the same information.
if (!(fp=fopen(rpath,"rb"))) /* This one works fine */
{
printf("fopen failure2!\n");
return;
}
if (!(fp=fopen(result,"rb"))) /* This does not work */
{
printf("fopen failure2!\n");
return;
}
Could someone please tell me why I am getting this error?
I think you mean char result[100]; i.e. without the asterisk, rather than char *result[100]. (Ditto for directoryname.)
With the asterisk, you're stack-allocating an array of 100 pointers. This will not end well.
Note that rpath and mode point to read-only memory. Really you should use const char* for those two literals.
The error is the array 'char* result[100]': here you are allocating an array of 100 pointers to strings, not 100 bytes/characters, which was your intent.
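A short sketch of how the path could be built in a plain char array and how a failed fopen can be made to explain itself. The helper names join_path and open_in_dir are invented for this example; perror() prints the system's reason (for instance "No such file or directory") next to the path that was actually tried, which makes stray characters in the string easy to spot.

```c
#include <stdio.h>

/* Joins dir and name with a '/' into buf. Returns 0 on success,
   -1 if the result would not fit in buf_size bytes. */
int join_path(char *buf, size_t buf_size, const char *dir, const char *name)
{
    int len = snprintf(buf, buf_size, "%s/%s", dir, name);
    return (len < 0 || (size_t)len >= buf_size) ? -1 : 0;
}

/* Opens dir/name for binary reading. On failure, reports the path that was
   tried together with the system's error message and returns NULL. */
FILE *open_in_dir(const char *dir, const char *name)
{
    char path[256];                      /* an array of chars, not of pointers */
    if (join_path(path, sizeof path, dir, name) != 0) {
        fprintf(stderr, "path too long: %s/%s\n", dir, name);
        return NULL;
    }
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        perror(path);
    return fp;
}
```

Unlike strcpy() followed by strcat(), snprintf() never writes past the end of the buffer, so an over-long directory name is reported instead of corrupting memory.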

Filter text from huge .csv files, in C

I have the raw, unfiltered records in a CSV file (more than 1,000,000 records), and I am supposed to filter those records against a list of files (each larger than 282 MB, with approximately 2,000,000 records or more). I tried using strstr in C. This is my code:
while (!feof(rawfh)) //loop to read records from raw file
{
j=0; //counter
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh)) //read a line from raw file
{
line[j] = c; line[j+1] = '\0'; j++;
}
//function to extract the element in the specified column, in the CSV
extractcol(line, relcolraw, entry);
printf("\nWorking on : %s", entry);
found=0;
//read a set of 4000 bytes; this is the target file
while( fgets(buffer, 4000, dncfh)!=NULL && !found )
{
if( strstr(buffer, entry) !=NULL) //compare it
found++;
}
rewind(dncfh); //put the file pointer back to the start
// if the record was not found in the target list, write it into another file
if(!found)
{
fprintf(out, "%s,\n", entry); printf(" *** written to filtered ***");
}
else
{
found=0; printf(" *** Found ***");
}
//I hope this is the right way to null out a string
entry[0] = '\0'; line[0] ='\0';
//just to display a # on the screen, to let the user know that the program
//is still alive and running.
rawreccntr++;
if(rawreccntr>=10)
{
printf("#"); rawreccntr=0;
}
}
This program takes approximately 7 to 10 seconds on average to search for one entry in the target file (282 MB). So, 10*1000000 = 10000000 seconds :( God knows how long it is going to take if I decide to search in 25 files.
I was thinking of writing a program rather than being spoon-fed a solution (grep, sed, etc.). Oh, sorry, but I am using Windows 8 (64-bit, 4 GB RAM, AMD Radeon dual-core processor, 1000 MHz). I used Dev-C++ (gcc) to compile this.
Please enlighten me with your ideas.
Thanks in advance, and sorry if I sound stupid.
Update by Ali, the key information extracted from a comment:
I have a raw CSV file with customers' phone numbers and addresses. I have the target file(s) in CSV format: the Do Not Call list. I am supposed to write a program that keeps only the phone numbers that are not present in the Do Not Call list. The phone numbers (in both files) are in the 2nd column. I don't know of any other method, however. I looked into the Boyer-Moore algorithm but could not implement it in C. Any suggestions on how I should go about searching for records?
EDITED
I would recommend you try the ready-made tools found on any Unix/Linux system, grep and awk. You'll probably find they are just as fast and much more easily maintained. I haven't seen your data format, but you say the phone numbers are in the second column, so, assuming the fields are separated by commas, you can get the phone numbers on their own like this:
awk -F, '{print $2}' DontCallFile.csv
If your phone numbers are in double quotes, you can remove those like this:
awk -F, '{print $2}' DontCallFile.csv | tr -d '"'
Then you can use fgrep with the -f option to check whether strings listed in one file are present in a second file, like this:
fgrep -f file1.csv file2.csv
or you can invert the search and look for lines that do NOT contain any of the strings, by adding the -v switch to fgrep.
So, your final command would probably end up like this:
fgrep -v -f <(awk -F, '{print $2}' DontCallFile.csv | tr -d '"') file2.csv
That says: print the lines of file2.csv that contain none of the strings (-v option) found in column 2 of the file "DontCallFile.csv". The bit in <() is called process substitution; it basically makes a pseudo-file out of the output of the command inside the brackets. We need a pseudo-file because fgrep -f expects a file.
ORIGINAL ANSWER
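Why are you using fgetc() anyway? Surely you would read a whole line at a time with fgets() (or getline(), where it is available), like this:
while (fgets(line, sizeof line, rawfh) != NULL) /* assuming line is a char array */
{
...
}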
Are you really reading the whole "target" file from the start for every single line in your main file? That will kill you! And why are you doing it in chunks of 4,000 bytes? And what if one of your strings straddles the 4,000 bytes you compare it with, i.e. the first 8 bytes are in one 4k chunk and the remaining bytes are in the next 4k chunk?
I think you will get better help on here if you take the time to explain properly what you are trying to do, and maybe do it with awk or grep (at least figuratively) so we can see what you are actually trying to achieve. Your description doesn't mention the "target" file you use in the code, for example.
You can do this with awk, like this:
awk -F, '
FNR==NR {gsub(/"/,"",$2);dcn[$2]++;next}
{gsub(/ /,"",$2);if(!dcn[$2])print}
' DontCallFile.csv x.csv
That says... the field separator is a comma (-F,). Now read the first file (DontCallFile.csv) and process according to the part in curly braces after FNR==NR. Remove the double quotes from around the phone number in field 2, using gsub (global substitution). Then increment the element in the associative array (i.e. hash) as indexed by unquoted field 2 and then move to next record. So basically, after file "DontCallFile.csv" is processed, the array dcn[] will hold a hash of all the numbers not to call (dcn=dontcallnumbers). Then, the code in the second set of curly braces is executed for each line of the second file ("x.csv"). That says... remove all spaces from around the phone number in field 2. Then, if that phone number is not present in the array dcn[] that we built earlier, print the line.
Here is one idea for improvement...
In the code below, what's the point in setting line[j+1] = '\0' at every iteration?
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
{
line[j] = c; line[j+1] = '\0'; j++;
}
You might as well do it outside the loop:
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
line[j++] = c;
line[j] = '\0';
My advice is the following:
Put all the don't-call phone numbers into an array.
Sort this array.
Use binary search to check whether a given phone number is among the sorted don't-call numbers.
In the code below, I just hard-coded the numbers. In your application, you will have to replace that with the corresponding code.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
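/* Comparison callback for qsort(): a and b point to elements of an array of strings. */
int compare(const void* a, const void* b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}
/* Returns 1 if val occurs in the sorted range [first, last), 0 otherwise. */
int binary_search(const char** first, const char** last, const char* val) {
    ptrdiff_t len = last - first;
    while (len > 0) {
        ptrdiff_t half = len >> 1;
        const char** middle = first + half;
        if (compare(middle, &val) < 0) { /* *middle sorts before val: search the right half */
            first = middle + 1;
            len = len - half - 1;
        }
        else
            len = half;
    }
    return first != last && compare(&val, first) == 0;
}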
int main(int argc, char** argv) {
size_t i;
/* Read _all_ of your don't call phone numbers into an array. */
/* For the sake of the example, I just hard-coded it. */
char* dont_call[] = { "908-444-555", "800-200-400", "987-654-321" };
/* in your program, change length to the number of dont_call numbers actually read. */
size_t length = sizeof dont_call / sizeof dont_call[0];
qsort(dont_call, length, sizeof(char *), compare);
printf("The don\'t call numbers sorted\n");
for (i=0; i<length; ++i)
printf("%lu %s\n", (unsigned long)i, dont_call[i]);
/* For each phone number, check if it is in the sorted dont_call list. */
/* Use binary search to check it. */
char* numbers[] = { "999-000-111", "333-444-555", "987-654-321" };
size_t n = sizeof numbers / sizeof numbers[0];
printf("Now checking if we should call a given number\n");
for (i=0; i<n; ++i) {
int on_list = binary_search((const char **)dont_call, (const char **)dont_call+length, numbers[i]);
char* as_text = on_list ? "no" : "yes";
printf("Should we call %s? %s\n",numbers[i], as_text);
}
return 0;
}
This prints:
The don't call numbers sorted
0 800-200-400
1 908-444-555
2 987-654-321
Now checking if we should call a given number
Should we call 999-000-111? yes
Should we call 333-444-555? yes
Should we call 987-654-321? no
The code is definitely not perfect but it is sufficient to get you started.
The problem with your algorithm is its complexity. Your approach is O(n*m), where n is the number of customers and m is the number of do-not-call records (or the size of the file, in your case). You need to reduce this complexity. (The Boyer-Moore algorithm mentioned by Ali would not help here: it would improve only the constant factor, not the asymptotic complexity.) Even binary search, as Ali suggests in his answer, is not the best option; it would be O((n+m)*log m). We can do better. Nice solutions use fgrep and awk, as suggested by Mark Setchell in his answers. (I would choose the one using fgrep, which I guess should perform better, but that is only a guess.) I can provide a similar solution in Perl, which offers more robust CSV parsing and should easily handle your data sizes on decent hardware. This type of solution has complexity O(n+m).
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Text::CSV_XS;
use constant PHN_COL_DNC => 1;
use constant PHN_COL_CUSTOMERS => 1;
die "Usage: $0 dnc_file [customers]" unless @ARGV>0;
my $dncfile = shift @ARGV;
my $csv = Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1, binary=>1});
my %dnc;
open my $dnc, '<', $dncfile;
while(my $row = $csv->getline($dnc)){
$dnc{$row->[PHN_COL_DNC]} = undef;
}
close $dnc;
while(my $row = $csv->getline(*ARGV)){
$csv->print(*STDOUT, $row) unless exists $dnc{$row->[PHN_COL_CUSTOMERS]};
}
If it does not meet your performance expectations, you can go down the C road, but I would definitely recommend using good CSV parsing and hash map libraries. I would try libcsv and khash.h.
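To show what the O(n+m) idea looks like in C without pulling in a library, here is a small hand-rolled string set using open addressing with linear probing. It is only a sketch and not a substitute for khash.h: the StrSet type and the strset_* names are invented here, the capacity is fixed and must be a power of two (at least 2, chosen comfortably larger than the number of keys), and it uses the FNV-1a hash function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **slots;      /* NULL marks an empty slot */
    size_t capacity;   /* must be a power of two, at least 2 */
    size_t count;
} StrSet;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

int strset_init(StrSet *set, size_t capacity_pow2)
{
    set->slots = calloc(capacity_pow2, sizeof *set->slots);
    set->capacity = capacity_pow2;
    set->count = 0;
    return set->slots != NULL ? 0 : -1;
}

/* Index of the slot holding key, or of the empty slot where it would go. */
static size_t find_slot(const StrSet *set, const char *key)
{
    size_t i = hash_str(key) & (set->capacity - 1);
    while (set->slots[i] != NULL && strcmp(set->slots[i], key) != 0)
        i = (i + 1) & (set->capacity - 1);     /* linear probing */
    return i;
}

int strset_contains(const StrSet *set, const char *key)
{
    return set->slots[find_slot(set, key)] != NULL;
}

/* Stores a copy of key. Returns 0 on success (or if already present),
   -1 if the table is full or memory runs out. */
int strset_add(StrSet *set, const char *key)
{
    size_t i = find_slot(set, key);
    if (set->slots[i] != NULL)
        return 0;
    if (set->count + 1 >= set->capacity)
        return -1;       /* keep one slot empty so probing always stops */
    char *copy = malloc(strlen(key) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, key);
    set->slots[i] = copy;
    set->count++;
    return 0;
}

void strset_free(StrSet *set)
{
    for (size_t i = 0; i < set->capacity; i++)
        free(set->slots[i]);
    free(set->slots);
    set->slots = NULL;
}
```

The intended use would be one pass over the Do Not Call file calling strset_add() for each phone number, then one pass over the customer file calling strset_contains() per record, so each file is read exactly once.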

how do i create recursive directories for the following requirement in c?

I expect to have more than one million files with unique names. I have been told that if I put all these files in one or two directories, searching for them will be extremely slow. So I have come up with the following directory architecture.
I want the directory structure to branch out with 10 subdirectories at each level, 4 levels deep. Because the file names are guaranteed to be unique, I want to use them to compute hashes, which can be used to put a file in a directory and later to find it. The random hash values will make each directory hold approximately 1,000 files.
So if F is the root directory, then inserting or searching for a file will have to go through these steps:
I want to use the numbers 0-9 as directory names.
h=hash(filename)
sprintf(filepath,"f//%d//%d//%d//%d//.txt",h%10,h%10,h%10,h%10);
HOW DO I CREATE THESE DIRECTORIES?
EDIT:
All the files are text files.
The program will be distributed to many people in order to collect information for research, so it is important that these files are created like this.
EDIT:
I created the following code to implement perreal's pseudocode. It compiles successfully but gives the run-time error shown at the end.
The error occurs at the sprintf() line.
#include<iostream>
#include<stdlib.h>
#include<windows.h>
void make_dir(int depth, char *dir) {
if (depth < 4) {
if (! CreateDirectoryA (dir,NULL))
for (int i = 0; i < 10; i++) {
sprintf(dir,"\\%d",i);
char *sdir=NULL ;
strcpy(sdir,dir);
CreateDirectoryA(sdir,NULL);
make_dir(depth + 1, sdir);
}
}
}
int main()
{
make_dir(0,"dir");
return 1;
}
Unhandled exception at 0x5b9c1cee (msvcr100d.dll) in mkdir.exe:
0xC0000005: Access violation writing location 0x00be5898.
Kind of pseudocode, but it can be done like this:
void make_dir(int depth, char *dir) {
    if (depth < 4) {
        CreateDirectoryA (dir,NULL);
        for (int i = 0; i < 10; i++) {
            /* room for dir, a backslash, one digit and the terminating '\0' */
            char *sdir= (char*)malloc(strlen(dir) + 3);
            strcpy(sdir, dir);
            sprintf(sdir + strlen(sdir), "\\%d", i);
            printf("%s\n", sdir);
            //CreateDirectoryA(sdir,NULL);
            make_dir(depth + 1, sdir);
            free(sdir);
        }
    }
}
And call it with make_dir(0, rootdir);
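A variation on the same recursion that avoids malloc altogether by extending one caller-supplied buffer in place. This is a sketch under a few assumptions: the name make_tree and the MAKE_DIR macro are invented here, _mkdir() from direct.h is used on Windows in place of CreateDirectoryA, mkdir() is used elsewhere, and '/' is used as the separator on both.

```c
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <direct.h>
#define MAKE_DIR(path) _mkdir(path)
#else
#include <sys/stat.h>
#define MAKE_DIR(path) mkdir(path, 0777)
#endif

/* Creates path, then subdirectories "0".."9" below it, recursing until
   `levels` levels of subdirectories exist. path must be a writable buffer of
   path_size bytes; its contents are restored before the function returns.
   Returns the number of directories newly created, or -1 if the buffer is
   too small to hold a subdirectory name. */
int make_tree(char *path, size_t path_size, int levels)
{
    /* A failure here usually just means the directory already exists. */
    int created = (MAKE_DIR(path) == 0) ? 1 : 0;
    if (levels == 0)
        return created;

    size_t len = strlen(path);
    if (len + 3 > path_size)            /* separator + digit + '\0' */
        return -1;

    for (int i = 0; i < 10; i++) {
        path[len] = '/';
        path[len + 1] = (char)('0' + i);
        path[len + 2] = '\0';
        int sub = make_tree(path, path_size, levels - 1);
        path[len] = '\0';               /* restore the caller's path */
        if (sub < 0)
            return -1;
        created += sub;
    }
    return created;
}
```

It has to be called with a real array, for example char root[64] = "f"; make_tree(root, sizeof root, 4);, never with a string literal. Four levels of ten subdirectories come to 1 + 10 + 100 + 1000 + 10000 = 11111 directories including the root.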
Do not do this: sprintf(dir,"\\%d",i);
dir points to a string literal, which is read-only, in your example.
You're likely to run off the end of the string, corrupting things that follow it in memory.
Do not copy to sdir without allocating memory first (and allocate extra bytes if you then append to the copy):
sdir = (char *)malloc( strlen( dir ) + 1 );
At the end of the function make_dir, you will have to call free( sdir ); so you do not leak memory.
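Separately from the crash, the sprintf call in the question passes h%10 four times, so all four levels would get the same digit and only 10 of the 10,000 leaf directories would ever be used. One possible way to derive a different digit per level, shown here as a sketch, is to peel off successive decimal digits of the hash. The function name hashed_path is made up, and the hash function itself is left to the caller.

```c
#include <stdio.h>

/* Builds "<root>/<d1>/<d2>/<d3>/<d4>/<filename>", where d1..d4 are the four
   lowest decimal digits of hash, lowest first.
   Returns 0 on success, -1 if buf is too small. */
int hashed_path(char *buf, size_t buf_size, const char *root,
                unsigned long hash, const char *filename)
{
    unsigned d1 = (unsigned)(hash % 10);          /* ones */
    unsigned d2 = (unsigned)(hash / 10 % 10);     /* tens */
    unsigned d3 = (unsigned)(hash / 100 % 10);    /* hundreds */
    unsigned d4 = (unsigned)(hash / 1000 % 10);   /* thousands */
    int len = snprintf(buf, buf_size, "%s/%u/%u/%u/%u/%s",
                       root, d1, d2, d3, d4, filename);
    return (len < 0 || (size_t)len >= buf_size) ? -1 : 0;
}
```

Because the same function is used for both storing and looking up a file, the choice of which digits to use does not matter as long as it stays fixed.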
