How to compare 2 files lexicographically using C

Hey guys, I'm currently trying to implement a function in C that takes two file names as command-line arguments and compares the files' contents lexicographically.
The function should return -1 if the contents of the first file are less than the contents of the second file, 1 if the contents of the second file are less than the contents of the first file, and 0 if the files are identical.
Please give me some advice on how I should start with this.
[EDIT]
Hey guys, sorry if any part of the question is unclear; I'll just post the link to the question here: Original question. The thing is, it's a uni assignment, so we're expected to do it using only basic C features, probably including only stdio.h, stdlib.h, and string.h. Sorry for the trouble caused. Also, here's the code I already have. My main problem now is that the function doesn't recognise that file1.txt (refer to the link), whose first line is longer than that of file2.txt, is actually lexicographically less:
int filecmp(char firstFile[], char secondFile[])
{
int similarity = 0;
FILE *file1 = fopen(firstFile, "r");
FILE *file2 = fopen(secondFile, "r");
char line1[BUFSIZ];
char line2[BUFSIZ];
while (similarity == 0)
{
if (fgets(line1, sizeof line1, file1) != NULL)
{
if (fgets(line2, sizeof line2, file2) != NULL)
{
int length;
if (strlen(line1) > strlen(line2))
{
length = strlen(line1);
}
else
{
length = strlen(line2);
}
for (int i = 0; i < length; i++)
{
if (line1[i] < line2[i]) similarity = -1;
if (line1[i] > line2[i]) similarity = 1;
}
}
else
{
similarity = 1; //As file2 is empty
}
}
else
{
if (fgets(line2, sizeof line2, file2) != NULL)
{
similarity = -1; // As file1 is empty
}
else break;
}
}
fclose(file1);
fclose(file2);
return similarity;
}
[END EDIT]
Many thanks,
Jonathan Chua

Take a look at the source code of the UNIX cmp utility, e.g. here. The relevant file is regular.c. If you can't use mmap, the principle of an implementation based on fgetc() is the same: keep reading a single character from each of the two files as long as they compare equal. When (if!) you find a difference, return the result of the comparison. The borderline case of one file being a proper prefix of the other (e.g. "ABC" and "ABCCC") can be resolved by treating EOF as smaller than every character. This is already neatly solved in C, as fgetc() returns a negative value (EOF) only at end of file or on a read error; proper characters are returned as unsigned char values converted to int, so they are >= 0.
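A minimal sketch of the fgetc() approach described above. The names stream_cmp and filecmp, and the value 2 returned when a file cannot be opened, are illustrative assumptions, not part of the assignment or of cmp:

```c
#include <stdio.h>

/* Compare two open streams character by character.
 * Returns -1, 0 or 1. EOF is negative, so it sorts before every real
 * character (0..255) and a proper prefix compares as "less". */
int stream_cmp(FILE *a, FILE *b)
{
    int ca, cb;

    do {
        ca = fgetc(a);
        cb = fgetc(b);
    } while (ca == cb && ca != EOF);

    if (ca == cb)
        return 0;               /* both streams ended at the same point */
    return (ca < cb) ? -1 : 1;
}

/* Hypothetical wrapper with the question's signature.
 * Returning 2 on an open failure is an assumed convention. */
int filecmp(const char *first, const char *second)
{
    FILE *f1 = fopen(first, "r");
    FILE *f2 = fopen(second, "r");
    int result = 2;

    if (f1 != NULL && f2 != NULL)
        result = stream_cmp(f1, f2);
    if (f1 != NULL)
        fclose(f1);
    if (f2 != NULL)
        fclose(f2);
    return result;
}
```

Because fgetc() also returns EOF on a read error, this sketch treats an error like end of file; a caller that cares can check ferror() on each stream before closing it.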

Are you allowed to use strcmp?
If so (untested):
int ret = 0;
while (ret == 0)
{
char line1 [ MAX_LINE_LEN ];
char line2 [ MAX_LINE_LEN ];
if (fgets(line1, MAX_LINE_LEN, file1) != NULL )
{
if (fgets(line2, MAX_LINE_LEN, file2) != NULL )
{
ret = strcmp(line1, line2);
if (ret != 0) ret = (ret < 0) ? -1 : 1; // strcmp only guarantees the sign, not -1/1
}
else
{
ret = 1;
}
}
else
{
if (fgets(line2, MAX_LINE_LEN, file2) != NULL )
{
ret = -1;
}
else
{
break;
}
}
}
return ret;

Related

Correct way to limit document read in C without !feof, document error cannot read the .txt

I've been writing a C program that needs to read .txt files. There have been lots of warnings about using !feof, but I still don't understand what problems !feof can cause. I wonder whether the fault in my code today is really due to !feof?
typedef struct City {
char cityName[20];
char cityID[10];
};
void readFiles() {
//preparing .txt file to read
char *txtMap = "map.txt";
char *txtPrice = "deliver_price.txt";
FILE *fmap = fopen(txtMap, "r");
FILE *fprice = fopen(txtPrice, "r");
City cityArr[20]; //I've defined the typedef struct before
int j, a = 0;
if (fmap == NULL || fprice == NULL || fmap && fprice == NULL) {
if (fmap == NULL) {
printf("\n\n\n\t\t\t\t\tError: Couldn't open file %s\n\n\n\n\n\n\n",
fmap);
printf("\n\n\n\t\t\t\t\tPress enter to continue\n\t\t\t\t\t");
return 1;
} else if (fprice == NULL) {
printf("\n\n\n\t\t\t\t\tError: Couldn't open file %s\n\n\n\n\n\n\n",
fprice);
printf("\n\n\n\t\t\t\t\tPress enter to continue\n\t\t\t\t\t");
return 1;
}
}
while (!feof(fmap)) {
City newCity;
fscanf(fmap, "%[^#]||%[^#]\n", &newCity.cityName, &newCity.CityID);
cityArr[a] = newCity;
a++;
}
printf("reading file succesfull");
fclose(fmap);
for (j = 0; j < a; j++) {
printf("\n%s || %s\n", cityArr[j].cityName, cityArr[j].cityID);
}
getch();
}
The text file that needs to be read:
New York||0
Washington D.C||1
Atlanta||2
Colombus||3
This program cannot read the files properly and crashes, returning what looks like a memory address. Does anyone know what's wrong with this program?
Sometimes when I tried fixing it, it says 'this part is a pointer, maybe you meant to use ->' error stuff. I don't know why this happens, because in the previous code, from which I copied the file-processing part, it doesn't happen.

Code has various troubles including:
Code not compiled with all warnings enabled
Save time. Enable all warnings.
Wrong use of feof()
See Why is “while ( !feof (file) )” always wrong?.
No width limit
"%[^#]" risks reading too much into &newCity.cityName.
Wrong type
"%[^#]" matches a char *. &newCity.cityName is not a char *.
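Incorrect format
"%[^#]||%[^#]\n" will only match text that begins with one or more non-'#' characters (read up to, but not including, a '#') followed by a '|', which is impossible: the scan set stops only at a '#' (or at end of input), so the next character can never be '|'.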
Consuming more than 1 line
"\n" in a format consumes any amount of white space, including several newlines, so it can read past the end of the current line.
Code is not checking the return value of input functions
Unlimited lines
Code can attempt to read more than 20 lines, yet City cityArr[20]; is limited.
Some corrections:
while (a < 20) {
City newCity;
int count = fscanf(fmap, "%19[^|]||%9[^\n]%*1[\n]",
newCity.cityName, newCity.cityID); // member is cityID, as declared in struct City
if (count != 2) break;
cityArr[a] = newCity;
a++;
}
Better
// Size line buffer to about 2x expected maximum
#define LINE_SIZE (sizeof(struct City)*2 + 4)
char buf[LINE_SIZE];
while (a < 20 && fgets(buf, sizeof buf, fmap)) {
City newCity;
int n = 0;
sscanf(buf, "%19[^|]||%9[^\n] %n", newCity.cityName, newCity.cityID, &n);
if (n == 0 || buf[n] != '\0') {
fprintf(stderr, "Bad input line <%s>\n", buf);
return -1;
}
cityArr[a] = newCity;
a++;
}
Wrong test
fmap && fprice == NULL is not what OP wants. Review operator precedence.
// if(fmap == NULL || fprice == NULL || fmap && fprice == NULL){
if (fmap == NULL || fprice == NULL) {
Useful to post exact errors
Not "it says 'this part is a pointer, maybe you meant to use ->' error stuff."
Return from void readFiles()?
Code attempts return 1;. Use int readFiles().
FILEs not closed
Add fclose( name ) when done with file.
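Putting several of the points above together (width limits, checking the result of the scan, a bounded number of records, reading whole lines with fgets()), a hedged sketch could look like this. The helper names parse_city and read_cities and the 128-byte line buffer are assumptions made for illustration:

```c
#include <stdio.h>
#include <string.h>

typedef struct City {
    char cityName[20];
    char cityID[10];
} City;

/* Parse one "name||id" line into *out.
 * Returns 1 on success, 0 if the line is malformed. */
int parse_city(const char *line, City *out)
{
    int n = 0;

    /* %19[^|] and %9[^\n] cap the writes so they fit the arrays;
     * " %n" skips trailing white space and records how far the scan got.
     * n stays 0 if any earlier directive fails. */
    sscanf(line, "%19[^|]||%9[^\n] %n", out->cityName, out->cityID, &n);
    return n > 0 && line[n] == '\0';
}

/* Read at most max records from fp into arr.
 * Stops at end of file or at the first malformed line.
 * Returns the number of records stored. */
size_t read_cities(FILE *fp, City *arr, size_t max)
{
    char buf[128];              /* assumed to exceed the longest line */
    size_t count = 0;

    while (count < max && fgets(buf, sizeof buf, fp) != NULL) {
        if (!parse_city(buf, &arr[count]))
            break;
        count++;
    }
    return count;
}
```

The caller would open map.txt, check the FILE pointer against NULL, call read_cities(fmap, cityArr, 20), and fclose() the file afterwards.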

C parse comments from textfile

So I am trying to implement a very trivial parser for reading a file and executing some commands. I guess it is very similar to bash scripts, but much simpler. I am having trouble figuring out how to tokenise the contents of a file, given that it can contain comments denoted by #. To give you an example of how a source file might look:
# Script to move my files across
# Author: Person Name
# First delete files
"rm -rf ~/code/bin/debug";
"rm -rf ~/.bin/tmp"; #deleting temp to prevent data corruption
# Dump file contents
"cat ~/code/rel.php > log.txt";
So far, here is my code. Note that I am essentially using this little project as a means of becoming more comfortable and familiar with C, so pardon any obvious flaws in the code. I would appreciate the feedback.
// New line.
#define NL '\n'
// Quotes.
#define QT '"'
// Ignore comment.
#define IGN '#'
int main() {
if (argc != 2) {
show_help();
return 0;
}
FILE *fptr = fopen(argv[1], "r");
char *buff;
size_t n = 0;
int readlock = 0;
int qread = 0;
char c;
if (fptr == NULL){
printf("Error: invalid file provided %s for reading", argv[1]);
exit(1);
}
fseek(fptr, 0, SEEK_END);
long f_size = ftell(fptr);
fseek(fptr, 0, SEEK_SET);
buff = calloc(1, f_size);
// Read file contents.
// Stripping naked whitespace and comments.
// qread is when in quotation mode. Everything is stored even '#' until EOL or EOF.
while ((c = fgetc(fptr)) != EOF) {
switch(c) {
case IGN :
if (qread == 0) {
readlock = 1;
}
else {
buff[n++] = c;
}
break;
case NL :
readlock = 0;
qread = 0;
break;
case QT :
if ((readlock == 0 && qread == 0) || (readlock == 0 && qread == 1)) {
// Activate quote mode.
qread = 1;
buff[n++] = c;
}
else {
qread = 0;
}
break;
default :
if ((qread == 1 && readlock == 0) || (readlock == 0 && !isspace(c))) {
buff[n++] = c;
}
break;
}
}
fclose(fptr);
printf("Buffer contains %s \n", buff);
free(buff);
return 0;
}
So the above solution works, but my question is: is there a better way to achieve the desired outcome? At the moment I don't actually "tokenize" anything. Does the current implementation lack the logic needed to create tokens from the characters?

It is way easier to read your file by whole lines:
char line[1024];
while (fgets(line, sizeof line, fptr) != NULL) // loop ends at EOF or on a read error
{
if(line[0] == '#') // comment
continue; // skip it
//... handle command in line here
}
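Building on the line-by-line idea, here is one hedged way to pull the quoted command out of each line while ignoring comments. The helper name extract_command, and the rule that a command is the text between the first pair of double quotes, are assumptions based on the sample script in the question:

```c
#include <stdio.h>
#include <string.h>

/* Copy the text between the first pair of double quotes in line into out.
 * A '#' seen before the opening quote starts a comment, so the line is
 * skipped; a '#' inside the quotes is kept as part of the command.
 * Returns 1 if a command was copied, 0 otherwise. */
int extract_command(const char *line, char *out, size_t outsz)
{
    const char *p = line;

    while (*p != '\0' && *p != '"') {
        if (*p == '#')
            return 0;           /* comment before any command */
        p++;
    }
    if (*p != '"')
        return 0;               /* blank line or no quoted text */
    p++;                        /* step over the opening quote */

    const char *end = strchr(p, '"');
    if (end == NULL)
        return 0;               /* unterminated quote: treated as no command */

    size_t len = (size_t)(end - p);
    if (len >= outsz)
        return 0;               /* command would not fit in out */

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Inside the fgets() loop, each line would be passed to extract_command(line, cmd, sizeof cmd), and the resulting cmd handled as one token when the call returns 1.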

Check to see if file input line is empty

I have a text file in the following format:
Some information here
Some more information here
I want to check whether the line that was read is blank (line 2 above). I've tried various things, but none of them seem to work; there's obviously something simple that I am missing here.
void myFunc(char* file_path) {
FILE* file;
char buff[BUFFER_SIZE];
file = fopen(file_path, "r");
bool flag = false;
while(fgets(buff, BUFFER_SIZE, file) != NULL) {
if(buff[0] == '\n') {
flag = true;
}
}
}
I've tried strlen(buff) == 0, strcmp(buff, ""), buff[0] == '\0' and many other things, but I still can't get this to work properly.

It's possible that the second line has more than just the newline character.
You can use a helper function to test that out.
void printDebug(char* line)
{
char* cp = line;
for ( ; *cp != '\0'; ++cp )
{
printf("%d ", (int)(*cp));
}
printf("\n");
}
By examining the integer values of the characters printed, you can tell whether the line has more than one character, and what those characters are.
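If the debug output shows extra characters such as 13 ('\r', from Windows line endings) or 32 (space) before the 10 ('\n'), a test that treats any all-white-space line as blank may be what is needed. A small sketch, with is_blank_line as a hypothetical helper name:

```c
#include <ctype.h>
#include <stdbool.h>

/* True if the line holds nothing but white space
 * (spaces, tabs, '\r', '\n'), or is empty. */
bool is_blank_line(const char *line)
{
    for (; *line != '\0'; ++line) {
        /* The cast avoids undefined behaviour for negative char values. */
        if (!isspace((unsigned char)*line))
            return false;
    }
    return true;
}
```

In the question's loop, the condition buff[0] == '\n' would then become is_blank_line(buff).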

Getting sub strings separated by pipes given the exact positions of the pipes

I searched around and didn't find an answer that quite works, although I know there may well be plenty of answers out there. Honestly, I couldn't find one, as I am a beginner in C/C++.
My problem is that I have a text file, actually a log file, whose data is separated by pipes ('|'). Within each entry, fields are separated by pipes ('|'), and entries are separated by newlines ('\n'). The file is really lengthy. What I want is that when the user gives a sequence such as sequence=[2,5,7], the function reads that array and outputs only the fields at those pipe positions; here it should write the 2nd, 5th, and 7th fields to a text file. Below is the code I used. It doesn't work, for a reason I can't find: the resulting text file contains only '\n' characters and nothing more, and there are more of them than there are entries in the input file.
minorSeparatorChar is the character given as '|'
majorSeparatorChar is the character given as '\n'
inFile Input text file
outFile output text file
minSepCount minor separator count
majSepCount major separator count
sequence is a global const int array
void getFormattedOutput(char * inFile, char * outFile, char minorSeparatorChar,char majorSeparatorChar){
FILE *readFile,*writeFile;
int charactor=0, minSepCount=0, i=0,majSepCount = 0;
int flagMin = 0;
char charactorBefore = NULL;
readFile = fopen(inFile,"r"); // opens the file for reading
writeFile = fopen(outFile,"w"); // opens the file for writing
if (readFile==NULL || writeFile == NULL){
printf("\nFile creation is not a sucess, Exiting program..\n");
exit(0);
}
while(charactor!=EOF){
charactorBefore = charactor;
if (charactor==minorSeparatorChar)
flagMin=1;
charactor = fgetc(readFile);
if(charactorBefore == minorSeparatorChar){
flagMin = 0;
if (minSepCount==sequence[i]){
fputc(charactor,writeFile);
continue;
}
i++;
minSepCount++;
}
else if (charactorBefore == majorSeparatorChar){
minSepCount=0;
i=0;
majSepCount++;
fputc('\n',writeFile);
}
else{
if(flagMin==1)
fputc(charactor,writeFile);
continue;
}
}
fclose(readFile);
fclose(writeFile);
}
for example if the input file has
33|333|67|787|7889|9876554|56
20151001|0|0|0|0||94|71
1|94|71|1|94|71|1
and if I give sequence [2,5,6]
It should print to out file as
67 9876554 56
0 94 71
71 71 1

I ultimately concluded that there were too many flags and controls and variables in your code and that I couldn't make head or tail of what they were up to, and decided to rewrite the code. I couldn't see in your code how you knew how many fields were in the sequence, for example.
I write in C11 (C99), but in this program, that simply means that I declare variables when they're needed, not at the top of the function. If it's a problem (C89/C90), move the declarations to the top of the function.
I also found that the names used were so long that they obscured the purpose of the variables. You may think I've gone too far in the other direction; more significantly, your professor (teacher) may think that. So be it; names are fungible and global search and replace works well.
I also don't see how your code is supposed to interpolate semi-arbitrary numbers of blanks between the fields, so I've actually ducked the issue. This code outputs the field separator (minor_sep — a length reduction of minorSeparatorChar) and the record separator (major_sep — reduced from majorSeparatorChar) at the appropriate points.
I note that field numbers start with field 0 in your code. I'm not convinced your code would ever output data from field 0, but that is somewhat tangential given the rewrite.
I ended up with:
#include <stdio.h>
#include <stdlib.h>
static const int sequence[] = { 2, 5, 7 };
static const int seqlen = 3;
static
void getFormattedOutput(char *inFile, char *outFile, char minor_sep, char major_sep)
{
FILE *ifp = fopen(inFile, "r"); // opens the file for reading
FILE *ofp = fopen(outFile, "w"); // opens the file for writing
if (ifp == NULL || ofp == NULL)
{
printf("\nFile creation is not a success, Exiting program..\n");
exit(0);
}
int c;
int seqnum = 0;
int fieldnum = 0;
while ((c = getc(ifp)) != EOF)
{
if (c == major_sep)
{
putc(major_sep, ofp);
fieldnum = 0;
seqnum = 0;
}
else if (c == minor_sep)
{
if (seqnum < seqlen && fieldnum == sequence[seqnum])
{
putc(minor_sep, ofp);
seqnum++;
}
fieldnum++;
}
else if (seqnum < seqlen && fieldnum == sequence[seqnum]) // bounds check: seqnum can reach seqlen
fputc(c, ofp);
}
fclose(ifp);
fclose(ofp);
}
int main(void)
{
getFormattedOutput("/dev/stdin", "/dev/stdout", '|', '\n');
return 0;
}
When I run it (I called it split, though it isn't a good choice since there is also a standard command split), I get:
$ echo "fld0|fld1|fld2|fld3|fld4|fld5|fld6|fld7|fld8|fld9" | ./split
fld2|fld5|fld7|
$ echo "fld0|fld1|fld2|fld3|fld4|fld5|fld6" | ./split
fld2|fld5|
$
The only possible objection is that there is a field terminator rather than a field separator. As you can see, a terminator is not hard to implement; making it into a separator (so there isn't a pipe after the last field on the line, even when the line doesn't have as many fields as there are elements in the sequence — see the second sample output) is trickier. The code needs to output a separator when it reads the first character of a field that should be printed after the first such field. This code achieves that:
#include <stdio.h>
#include <stdlib.h>
static const int sequence[] = { 2, 5, 7 };
static const int seqlen = 3;
static
void getFormattedOutput(char *inFile, char *outFile, char minor_sep, char major_sep)
{
FILE *ifp = fopen(inFile, "r"); // opens the file for reading
FILE *ofp = fopen(outFile, "w"); // opens the file for writing
if (ifp == NULL || ofp == NULL)
{
printf("\nFile creation is not a success, Exiting program..\n");
exit(0);
}
int c;
int seqnum = 0;
int fieldnum = 0;
int sep = 0;
while ((c = getc(ifp)) != EOF)
{
if (c == major_sep)
{
putc(major_sep, ofp);
fieldnum = 0;
seqnum = 0;
sep = 0;
}
else if (c == minor_sep)
{
if (seqnum < seqlen && fieldnum == sequence[seqnum])
seqnum++;
fieldnum++;
sep = minor_sep;
}
else if (seqnum < seqlen && fieldnum == sequence[seqnum]) // bounds check: seqnum can reach seqlen
{
if (sep != 0)
{
putc(sep, ofp);
sep = 0;
}
putc(c, ofp);
}
}
fclose(ifp);
fclose(ofp);
}
int main(void)
{
getFormattedOutput("/dev/stdin", "/dev/stdout", '|', '\n');
return 0;
}
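Example run (each output line starts with a separator because sep is set at every field separator, including those before the first selected field):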
$ {
> echo "Afld0|Afld1|Afld2|Afld3|Afld4|Afld5|Afld6|Afld7|Afld8|Afld9"
> echo "Bfld0|Bfld1|Bfld2|Bfld3|Bfld4|Bfld5|Bfld6|Bfld7|Bfld8|Bfld9"
> echo "Cfld0|Cfld1|Cfld2|Cfld3|Cfld4|Cfld5|Cfld6|Cfld7|Cfld8|Cfld9"
> echo "Dfld0|Dfld1|Dfld2|Dfld3|Dfld4|Dfld5|Dfld6|Dfld7|Dfld8|Dfld9"
> echo "Efld0|Efld1|Efld2|Efld3|Efld4|Efld5|Efld6|Efld7|Efld8|Efld9"
> } | ./split
|Afld2|Afld5|Afld7
|Bfld2|Bfld5|Bfld7
|Cfld2|Cfld5|Cfld7
|Dfld2|Dfld5|Dfld7
|Efld2|Efld5|Efld7
$
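If neither a leading nor a trailing separator is wanted, one possible variation is to write the separator only when moving into a selected field after at least one field has already been selected on the current record. This is a sketch, not the answer's code: the function name select_fields, the FILE pointer parameters, and passing the sequence as arguments are assumptions, and the sequence is assumed to be in strictly ascending order:

```c
#include <stdio.h>

/* Is fieldnum the next field we are waiting for? */
static int selected(const int *seq, int seqlen, int seqnum, int fieldnum)
{
    return seqnum < seqlen && seq[seqnum] == fieldnum;
}

/* Copy the fields whose zero-based numbers appear in seq from in to out,
 * writing minor_sep only between selected fields. */
void select_fields(FILE *in, FILE *out, const int *seq, int seqlen,
                   char minor_sep, char major_sep)
{
    int c;
    int fieldnum = 0;   /* field currently being read */
    int seqnum = 0;     /* index of the next entry of seq to match */
    int printed = 0;    /* selected fields completed on this record */

    while ((c = getc(in)) != EOF) {
        if (c == major_sep) {
            putc(major_sep, out);
            fieldnum = seqnum = printed = 0;
        } else if (c == minor_sep) {
            if (selected(seq, seqlen, seqnum, fieldnum)) {
                seqnum++;               /* just finished a selected field */
                printed++;
            }
            fieldnum++;
            if (printed > 0 && selected(seq, seqlen, seqnum, fieldnum))
                putc(minor_sep, out);   /* separator before the next one */
        } else if (selected(seq, seqlen, seqnum, fieldnum)) {
            putc(c, out);
        }
    }
}
```

Because the separator is written on entering a selected field rather than on its first character, an empty selected field still leaves its separator in the output.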

c - strcmp not returning 0 for equal strings

So I've tried searching for a solution to this extensively but can only really find posts where the new line or null byte is missing from one of the strings. I'm fairly sure that's not the case here.
I am using the following function to compare a word to a file containing a list of words with one word on each line (dictionary in the function). Here is the code:
int isWord(char * word,char * dictionary){
FILE *fp;
fp = fopen(dictionary,"r");
if(fp == NULL){
printf("error: dictionary cannot be opened\n");
return 0;
}
if(strlen(word)>17){
printf("error: word cannot be >16 characters\n");
return 0;
}
char longWord[18];
strcpy(longWord,word);
strcat(longWord,"\n");
char readValue[50] = "a\n";
while (fgets(readValue,50,fp) != NULL && strcmp(readValue,longWord) != 0){
printf("r:%sw:%s%d\n",readValue,longWord,strcmp(longWord,readValue));//this line is in for debugging
}
if(strcmp(readValue,longWord) == 0){
return 1;
}
else{
return 0;
}
}
The code compiles with no errors and the function reads the dictionary file fine and will print the list of words as they appear in there. The issue I am having is that even when the two strings are identical, strcmp is not returning 0 and so the function will return false for any input.
eg I get:
r:zymoscope
w:zymoscope
-3
Any ideas? I feel like I must be missing something obvious but have been unable to find anything in my searches.

I see you are appending a newline to your test strings to try to deal with the problem of fgets() retaining the line endings. Much better to fix this at source. You can strip all trailing stuff like this, immediately after reading from file.
readValue [ strcspn(readValue, "\r\n") ] = '\0'; // remove trailing newline etc
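For what it is worth, the -3 in the question's debug output is consistent with a dictionary file that uses Windows line endings: strcmp(longWord, readValue) would first differ where longWord has '\n' (10) and readValue has '\r' (13), and many implementations return the byte difference, 10 - 13 = -3 (the standard only guarantees the sign). That is an inference from the output, not something the question confirms. A small sketch of the strcspn() approach, with strip_eol and line_matches as hypothetical helper names:

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', if there is one.
 * strcspn() returns the string length when neither is present,
 * so this is also safe for empty strings and unterminated last lines. */
void strip_eol(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Compare a line read by fgets() with a word, ignoring the line ending.
 * The line buffer is modified. Returns 1 on a match, 0 otherwise. */
int line_matches(char *line, const char *word)
{
    strip_eol(line);
    return strcmp(line, word) == 0;
}
```

With this in place there is no need to append "\n" to the word being looked up.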

The string you are reading contains trailing character(s), and hence is not the same as the string you are comparing it against.
Remove the trailing newline (and CR if that is there); then you do not need to add any newline or carriage return to the string being compared:
int isWord(char *word, char *dictionary){
FILE *fp;
fp = fopen(dictionary, "r");
if (fp == NULL){
fprintf(stderr, "error: dictionary cannot be opened\n");
return 0;
}
if (strlen(word) > 16){
fprintf(stderr, "error: word cannot be >16 characters\n");
fclose(fp);
return 0;
}
char readValue[50];
int found = 0;
while (!found && fgets(readValue, sizeof readValue, fp) != NULL){
// Strip the line ending (LF or CRLF); unlike walking back from the end, this is safe for empty lines
readValue[strcspn(readValue, "\r\n")] = '\0';
found = (strcmp(readValue, word) == 0);
}
fclose(fp); // close the file on every path
return found;
}
