fgetc not starting at beginning of file - c [duplicate]

Problem solved here:
fgetc not starting at beginning of large txt file
I am working in C and fgetc isn't getting chars from the beginning of the file. It seems to start somewhere random within the file, after a \n. The goal of this function is to modify the array productsPrinted: if "More Data Needed" or "Hidden non listed" is encountered, the corresponding position in the array, productsPrinted[newLineCount], is set to 0. Any help is appreciated.
Update: It works on smaller files, but doesn't start at the beginning of the larger (617 KB) file.
The function calls leading up to the category pass:
findNoPics(image, productsPrinted);
findVisible(visible, productsPrinted);
removeCategories(category, productsPrinted);
example input from fgetc():
Category\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hidden non listed\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hand Tools/Open Stock\n
Hand Tools/Sockets and Drive Sets\n
More Data Needed\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Shop Supplies & Equip/Tool Storage\n
Hidden non listed\n
Shop Supplies & Equip/Heaters\n
Code:
void removeCategories(FILE *category, int *prodPrinted){
    char more[17] = { '\0' }, hidden[18] = { '\0' };
    int newLineCount = 0, i, ch = 'a', fix = 0;

    while ((ch = fgetc(category)) != EOF){ //if fgetc is outside while, it works//
        more[15] = hidden[16] = ch;
        printf("%c", ch);
        /*shift char in each list <- one*/
        for (i = 0; i < 17; i++){
            if (i < 17){
                hidden[i] = hidden[i + 1];
            }
            if (i < 16){
                more[i] = more[i + 1];
            }
        }
        if (strcmp(more, "More Data Needed") == 0 || strcmp(hidden, "Hidden non listed") == 0){
            prodPrinted[newLineCount] = 0;
            /*printf("%c", more[0]);*/
        }
        if (ch == '\n'){
            newLineCount++;
        }
    }
}

Let computers do the counting. You have not null terminated your strings properly. The fixed strings (mdn and hnl in the version below) need their null terminators; string comparisons on arrays that are not null terminated are undefined.
Given this sample data:
Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3
This version of the program:
#include <stdio.h>
#include <string.h>

static
void removeCategories(FILE *category, int *prodPrinted)
{
    char more[17] = { '0' };
    char hidden[18] = { '0' };
    char mdn[17] = { "More Data Needed" };
    char hnl[18] = { "Hidden non listed" };
    int newLineCount = 0, i, ch = '\0';

    do
    {
        /*shift char in each list <- one*/
        for (i = 0; i < 18; i++)
        {
            if (i < 17)
                hidden[i] = hidden[i + 1];
            if (i < 16)
                more[i] = more[i + 1];
        }
        more[15] = hidden[16] = ch = fgetc(category);
        if (ch == EOF)
            break;
        printf("%c", ch); /*testing here, starts rndmly in file*/
        //printf("<<%c>> ", ch); /*testing here, starts rndmly in file*/
        //printf("more <<%s>> hidden <<%s>>\n", more, hidden);
        if (strcmp(more, mdn) == 0 || strcmp(hidden, hnl) == 0)
        {
            prodPrinted[newLineCount] = 0;
        }
        if (ch == '\n')
        {
            newLineCount++;
        }
    } while (ch != EOF);
}

int main(void)
{
    int prod[10];
    for (int i = 0; i < 10; i++)
        prod[i] = 37;
    removeCategories(stdin, prod);
    for (int i = 0; i < 10; i++)
        printf("%d: %d\n", i, prod[i]);
    return 0;
}
produces this output:
Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3
0: 37
1: 0
2: 0
3: 37
4: 37
5: 37
6: 0
7: 0
8: 37
9: 37

Check which mode you used to open the file, and add some error checking to make sure you got the right return value.
You can refer to man fopen to see how each mode sets the stream position.
The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it.
The argument mode points to a string beginning with one of the following sequences (additional characters may follow these sequences):
r    Open text file for reading. The stream is positioned at the beginning of the file.
r+   Open for reading and writing. The stream is positioned at the beginning of the file.
w    Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.
w+   Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
a    Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.
a+   Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.
Note also that if the file you operate on is larger than 2 GB, there may be problems on some systems.
You can use fseek to set the file position indicator.
You can also use a debugger to watch these variables and see where the unexpected values come from; that is usually more efficient than trace output.
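As a concrete illustration, here is a minimal sketch of opening the file for reading with error checking and explicitly positioning the stream at the start (the filename is hypothetical):
#include <stdio.h>

int main(void)
{
    FILE *category = fopen("categories.txt", "r");  /* hypothetical filename */
    if (category == NULL)
    {
        perror("fopen");                            /* report why the open failed */
        return 1;
    }
    if (fseek(category, 0L, SEEK_SET) != 0)         /* or simply rewind(category); */
        perror("fseek");
    printf("position: %ld\n", ftell(category));     /* should print 0 */
    fclose(category);
    return 0;
}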

Maybe you can try rewinding the file pointer at the beginning of your function.
rewind(category);
Most likely another function is reading from the same file. If this solves your problem, it would be better to find which other function (or previous call to this function) is reading from the same file and make sure rewinding the pointer won't break something else.
EDIT:
And just to be sure, maybe you could change the double assignment into two separate statements. Based on this post, your problem might well be caused by a compiler optimization of that line. I haven't checked the standard, but according to the top answer there the behavior in C and C++ might be undefined, which would explain your strange results. Good luck.
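For reference, splitting the chained assignment inside the loop might look like this (same variables as the posted code):
ch = fgetc(category);   /* one read... */
more[15] = ch;          /* ...then two separate stores */
hidden[16] = ch;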

Related

C Program to count keywords from a keyword text file in a fake resume and display the result

#EDIT: I think the problem was that I had put my 2 text files on the desktop. After moving them to the same place as the source file, they are found. But the program still cannot run; this time the line:
coK = 0;
shows "exception thrown".
// end EDIT
I have an assignment at school to write a C program that creates 2 text files: one file stores 25 keywords, and the other stores the fake resume. The problem is that my program cannot read my keywords.txt file. Can anyone help me? Thank you so much.
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//*****************************************************
// MAIN FUNCTION
int main()
{
    //File pointer for resume.txt file declared
    FILE *fpR;
    //File pointer for keywords.txt file declared and open it in read mode
    FILE* fpK = fopen("keywords.txt", "r");
    //To store character extracted from keyword.txt file
    char cK;
    //To store character extracted from resume.txt file
    char cR;
    //To store word extracted from keyword.txt file
    char wordK[50];
    //To store word extracted from resume.txt file
    char wordR[50];
    //To store the keywords
    char keywords[10][50];
    //To store the keywords counter and initializes it to zero
    int keywordsCount[10] = { 0, 0, 0, 0 };
    int coK, coR, r, r1;
    coK = coR = r = r1 = 0;

    //Checks if file is unable to open then display error message
    if (fpK == NULL)
    {
        printf("Could not open files");
        exit(0);
    }//End of if

    //Extracts a character from keyword.txt file and stores it in cK variable and Loops till end of file
    while ((cK = fgetc(fpK)) != EOF)
    {
        //Checks if the character is comma
        if (cK != ',')
        {
            //Store the character in wordK coK index position
            wordK[coK] = cK;
            //Increase the counter coK by one
            coK++;
        }//End of if
        //If it is comma
        else
        {
            //Stores null character
            wordK[coK] = '\0';
            //Copies the wordK to the keywords r index position and increase the counter r by one
            strcpy(keywords[r++], wordK);
            //Re initializes the counter to zero for next word
            coK = 0;
            fpR = fopen("resume.txt", "r");
            //Extracts a character from resume.txt file and stores it in cR variable and Loops till end of file
            while ((cR = fgetc(fpR)) != EOF)
            {
                //Checks if the character is space
                if (cR != ' ')
                {
                    //Store the character in wordR coR index position
                    wordR[coR] = cR;
                    //Increase the counter coR by one
                    coR++;
                }//End of if
                else
                {
                    //Stores null character
                    wordR[coR] = '\0';
                    //Re initializes the counter to zero for next word
                    coR = 0;
                    //Compares word generated from keyword.txt file and word generated from resume.txt file
                    if (strcmp(wordK, wordR) == 0)
                    {
                        //If both the words are same then increase the keywordCounter arrays r1 index position value by one
                        keywordsCount[r1] += 1;
                    }//End of if
                }//End of else
            }//End of inner while loop
            //Increase the counter by one
            r1++;
            //Close the file for resume
            fclose(fpR);
        }//End of else
    }//End of outer while loop

    //Close the file for keyword
    fclose(fpK);
    //Display the result
    printf("\n Result \n");
    for (r = 0; r < r1; r++)
        printf("\n Keyword: %s %d time available", keywords[r], keywordsCount[r]);
}//End of main
I think the problem is the text files, isn't it?
The name of my 1st test file is "keywords.txt", and its content is:
Java, CSS, HTML, XHTML, MySQL, College, University, Design, Development, Security, Skills, Tools, C, Programming, Linux, Scripting, Network, Windows, NT
The name of my 2nd test file is "resume.txt", and its content is:
Junior Web developer able to build a Web presence from the ground up -- from concept, navigation, layout, and programming to UX and SEO. Skilled at writing well-designed, testable, and efficient code using current best practices in Web development. Fast learner, hard worker, and team player who is proficient in an array of scripting languages and multimedia Web tools. (Something like this).
I don't see any problem with these 2 files. But my program still cannot open the file and the output keeps showing "Could not open files".
while ((cK = fgetc(fpK)) != EOF)
If you check the documentation, you can see that fgetc returns an int. But since cK is a char, you force a conversion to char, which can change its value. You then compare the possibly changed value to EOF, which is not correct. You need to compare the value that fgetc returns to EOF, since fgetc returns EOF on end of file.
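A minimal sketch of the corrected read loop, using the same file pointer as above:
int cK;                              /* int, not char, so EOF is representable */
while ((cK = fgetc(fpK)) != EOF)
{
    char c = (char)cK;               /* safe to narrow once EOF has been ruled out */
    /* ... process c as before ... */
}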

Processing file extensions in C

I'm developing some code in C that reads a file extension and stores it as a code in a byte, together with whether a text file or binary file is being processed. Later I wish to recover the file extension that is encoded in the byte.
As a test I created a loop in the main function where I can test out the function fileExtenCode(), which is in the second listing.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define EXLEN 9
#define EXNUM 8

typedef unsigned char BYTE;

bool fileExtenCode(char*, BYTE*, int*);

int main(void) {
    char fileExten[EXLEN];
    BYTE code;
    int bin;
    for (;;) {
        printf("Type file extension: ");
        scanf_s("%s", fileExten, EXLEN);
        if (fileExten[0] == '.') break;
        printf("%s\n", fileExten);
        code = 0;
        bin = 0;
        bool extFound = fileExtenCode(fileExten, &code, &bin); // <== (1)
        if (extFound) printf("Extension found: TRUE\n");
        else printf("Extension found: FALSE\n");
        printf("%s%d", "Code: ", code);
        if (bin) printf(" binary file\n");
        else printf(" text file\n");
        printf("\n");
        printf("Type code: ");
        int icode;
        scanf_s("%d", &icode);
        code = icode;
        bin = -1;
        fileExtenCode(fileExten, &code, &bin); // <== (2)
        printf("%s", fileExten); // <== (5)
        printf("\n");
    }
    return 0;
}
The function that I'm trying to test is as follows:
bool fileExtenCode(char* ext, BYTE* code, int* binary) {
    char *fileEx[EXNUM] = {
        "jpg1", "txt0", "html0", "xml0", "exe1", "bmp1", "gif1", "png1"};
    if (*binary < 0) {              // <== (3)
        ext = fileEx[*code];        // <== (4)
        return true;
    }
    size_t extLen = strlen(ext);
    for (BYTE i = 0; i < EXNUM; i++) {
        if (strncmp(fileEx[i], ext, extLen) == 0) {
            *binary = (fileEx[i][extLen] == '1') ? 1 : 0;
            *code = i;
            return true;
        }
    }
    return false;
}
The idea is that you pass a string with the file extension to fileExtenCode() in statement (1) in main; the function searches for that extension in an array and, if found, returns true together with the code argument indicating the position in the array of file extensions and the binary flag as 0 or 1 indicating whether the file is text or binary. A '0' or '1' immediately follows the file extension in the array. If the extension is not found, the function returns false and the values returned in the arguments have no meaning.
So far so good, and this part works correctly. However, using the function in reverse to recover the file extension given an input value of code fails when it is called at statement (2) in main. In this case binary is set to -1, so the condition at (3) is now true and ext at (4) recovers the file extension. This is confirmed by inserting a temporary print statement immediately after (4), but the value is not returned at (5) back in main, and an old input value is printed instead.
Obviously there is a problem with pointers, but I cannot see an obvious way of fixing it. My question is how to correct this without messing up the rest of the code, which is working correctly? Note that char* ext and BYTE* code are used for both input and output, whilst int* binary is used as an input flag and returns no useful value when set to -1.
Once this problem is fixed, then it should be relatively easy to separate the binary flag from the extension when the binary flag is set to -1. Eventually I plan to have many more file extensions, but not until this is working correctly with a sample of 8.
Getting help in fixing this problem would be most appreciated.
OK, many thanks pmg, that works, except that I have to use:
strcpy_s(ext, EXLEN, fileEx[*code]);
as the Visual Studio 2022 compiler flags an error. This also solves a warning I was getting when I declared the array *fileEx[EXNUM] with the const keyword.
In my haste last night I omitted to include the statement:
if (*code >= EXNUM) return false;
immediately after (3) to trap the case when *code goes out of bounds of *fileEx[EXNUM].
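For completeness, since pmg's answer itself is not shown here, the change being described presumably amounts to copying into the caller's buffer instead of reassigning the local pointer parameter, roughly:
if (*binary < 0) {                          // <== (3)
    if (*code >= EXNUM) return false;       // bounds check mentioned above
    strcpy(ext, fileEx[*code]);             // copy into the caller's buffer...
    // strcpy_s(ext, EXLEN, fileEx[*code]); // ...or the variant Visual Studio accepts
    return true;
}
Assigning ext = fileEx[*code] only changes the function's local copy of the pointer, which is why main never saw the new string.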

Strategy for cycling through a preexisting set of variables in C

I'm trying to program an HMI console to read a file from a USB pen drive and display its data on the screen. This is a CSV file, and the objective is to store the interpreted data in the HMI console memory, which the console later interprets. The macros on these consoles run in C (not C++).
I have no issue with reading or interpreting the file; the issue is that the existing function for writing to the console memory (not accessible to me, shown below) only accepts char data.
int WriteLocal( const char *type, int addr, int nRegs, void *buf , int flag );
Parameters:
    type is the string "LW", "LB", etc.;
    addr is the operation address;
    nRegs is the length to read or write;
    buf is the buffer which stores the data being read or written;
    flag is 0 for BIN code type, 1 for BCD.
Return value: 1 on success, 0 on failure.
As my luck would have it, I need to write integer values. What is available to me are the variables for each memory position. These are preexisting and named individually, such as:
int WR_LW200;
int WR_LW202;
int WR_LW204;
...
int WR_LW20n;
Ideally we could have a vector with all the names of the variables, but unfortunately this is not possible. I could manually write every single variable, but I need to do 300 of these…
There must be a better way, right?
Just to give you a look at how it ended up looking:
int* arr[50][5] = { {&WR_LW200, &WR_LW400, &WR_LW600, &WR_LW800, &WR_LW1000},
{&WR_LW202, &WR_LW402, &WR_LW602, &WR_LW802, &WR_LW1002},
{&WR_LW204, &WR_LW404, &WR_LW604, &WR_LW804, &WR_LW1004},
{&WR_LW206, &WR_LW406, &WR_LW606, &WR_LW806, &WR_LW1006},
{&WR_LW208, &WR_LW408, &WR_LW608, &WR_LW808, &WR_LW1008},
{&WR_LW210, &WR_LW410, &WR_LW610, &WR_LW810, &WR_LW1010},
{&WR_LW212, &WR_LW412, &WR_LW612, &WR_LW812, &WR_LW1012},
{&WR_LW214, &WR_LW414, &WR_LW614, &WR_LW814, &WR_LW1014},
{&WR_LW216, &WR_LW416, &WR_LW616, &WR_LW816, &WR_LW1016},
{&WR_LW218, &WR_LW418, &WR_LW618, &WR_LW818, &WR_LW1018},
{&WR_LW220, &WR_LW420, &WR_LW620, &WR_LW820, &WR_LW1020},
{&WR_LW222, &WR_LW422, &WR_LW622, &WR_LW822, &WR_LW1022},
{&WR_LW224, &WR_LW424, &WR_LW624, &WR_LW824, &WR_LW1024},
{&WR_LW226, &WR_LW426, &WR_LW626, &WR_LW826, &WR_LW1026},
{&WR_LW228, &WR_LW428, &WR_LW628, &WR_LW828, &WR_LW1028},
{&WR_LW230, &WR_LW430, &WR_LW630, &WR_LW830, &WR_LW1030},
{&WR_LW232, &WR_LW432, &WR_LW632, &WR_LW832, &WR_LW1032},
{&WR_LW234, &WR_LW434, &WR_LW634, &WR_LW834, &WR_LW1034},
{&WR_LW236, &WR_LW436, &WR_LW636, &WR_LW836, &WR_LW1036},
{&WR_LW238, &WR_LW438, &WR_LW638, &WR_LW838, &WR_LW1038},
{&WR_LW240, &WR_LW440, &WR_LW640, &WR_LW840, &WR_LW1040},
{&WR_LW242, &WR_LW442, &WR_LW642, &WR_LW842, &WR_LW1042},
{&WR_LW244, &WR_LW444, &WR_LW644, &WR_LW844, &WR_LW1044},
{&WR_LW246, &WR_LW446, &WR_LW646, &WR_LW846, &WR_LW1046},
{&WR_LW248, &WR_LW448, &WR_LW648, &WR_LW848, &WR_LW1048},
{&WR_LW250, &WR_LW450, &WR_LW650, &WR_LW850, &WR_LW1050},
{&WR_LW252, &WR_LW452, &WR_LW652, &WR_LW852, &WR_LW1052},
{&WR_LW254, &WR_LW454, &WR_LW654, &WR_LW854, &WR_LW1054},
{&WR_LW256, &WR_LW456, &WR_LW656, &WR_LW856, &WR_LW1056},
{&WR_LW258, &WR_LW458, &WR_LW658, &WR_LW858, &WR_LW1058},
{&WR_LW260, &WR_LW460, &WR_LW660, &WR_LW860, &WR_LW1060},
{&WR_LW262, &WR_LW462, &WR_LW662, &WR_LW862, &WR_LW1062},
{&WR_LW264, &WR_LW464, &WR_LW664, &WR_LW864, &WR_LW1064},
{&WR_LW266, &WR_LW466, &WR_LW666, &WR_LW866, &WR_LW1066},
{&WR_LW268, &WR_LW468, &WR_LW668, &WR_LW868, &WR_LW1068},
{&WR_LW270, &WR_LW470, &WR_LW670, &WR_LW870, &WR_LW1070},
{&WR_LW272, &WR_LW472, &WR_LW672, &WR_LW872, &WR_LW1072},
{&WR_LW274, &WR_LW474, &WR_LW674, &WR_LW874, &WR_LW1074},
{&WR_LW276, &WR_LW476, &WR_LW676, &WR_LW876, &WR_LW1076},
{&WR_LW278, &WR_LW478, &WR_LW678, &WR_LW878, &WR_LW1078},
{&WR_LW280, &WR_LW480, &WR_LW680, &WR_LW880, &WR_LW1080},
{&WR_LW282, &WR_LW482, &WR_LW682, &WR_LW882, &WR_LW1082},
{&WR_LW284, &WR_LW484, &WR_LW684, &WR_LW884, &WR_LW1084},
{&WR_LW286, &WR_LW486, &WR_LW686, &WR_LW886, &WR_LW1086},
{&WR_LW288, &WR_LW488, &WR_LW688, &WR_LW888, &WR_LW1088},
{&WR_LW290, &WR_LW490, &WR_LW690, &WR_LW890, &WR_LW1090},
{&WR_LW292, &WR_LW492, &WR_LW692, &WR_LW892, &WR_LW1092},
{&WR_LW294, &WR_LW494, &WR_LW694, &WR_LW894, &WR_LW1094},
{&WR_LW296, &WR_LW496, &WR_LW696, &WR_LW896, &WR_LW1096},
{&WR_LW298, &WR_LW498, &WR_LW698, &WR_LW898, &WR_LW1098} };
Big, right? I had concerns that this HMI would have issues with such an approach, but it did the job. The code below runs through a string that comes from the CSV file; it runs inside another while loop that cycles through the multidimensional array.
It's a little crude, but it works.
while (i <= 5)
{
    memset(lineTemp, 0, sizeof lineTemp); // clear lineTemp array
    while (lineFromFile[index] != delimiter)
    {
        if (lineFromFile[index] != delimiter && lineFromFile[index] != '\0') { lineTemp[j] = lineFromFile[index]; index++; j++; }
        if (lineFromFile[index] == '\0') { i = 5; break; }
    }
    index++;
    lineTemp[j] = '\0'; // NULL TERMINATION
    j = 0;
    if (i == -1) { WriteLocal("LW", temp, 3, lineTemp, 0); }
    if (i >= 0 && i <= 5) { *(arr[x][i]) = atoi(lineTemp); }
    i++;
}
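The key to the approach is that each cell of arr holds the address of one of the preexisting named variables, so *(arr[x][i]) = atoi(lineTemp) writes straight into, say, WR_LW204. A stripped-down, self-contained sketch of the same idea (the variable names and values here are stand-ins):
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the preexisting console variables. */
int WR_LW200, WR_LW202, WR_LW204;

int main(void)
{
    /* Table of pointers: index i refers to the i-th named variable. */
    int *slots[] = { &WR_LW200, &WR_LW202, &WR_LW204 };
    const char *fields[] = { "17", "42", "99" };        /* e.g. tokens from a CSV line */

    for (size_t i = 0; i < sizeof slots / sizeof slots[0]; i++)
        *slots[i] = atoi(fields[i]);                    /* writes into WR_LW200, ... */

    printf("%d %d %d\n", WR_LW200, WR_LW202, WR_LW204);
    return 0;
}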
Thanks again for the tip.
Cheers

LZW encoding for large file

I am building an LZW encoding algorithm which uses a dictionary and hashing so it can quickly find words that are already stored in the dictionary.
The algorithm gives proper results when run on smaller files (circa a few hundred symbols), but on larger files (and especially files that contain fewer distinct symbols - for example, it performs worst on a file consisting of only one symbol, say 'y') it simply crashes, even though the dictionary is not even close to being full. When the large input file consists of more than one symbol, the dictionary gets close to full, approximately 90%, but then it crashes as well.
Considering the structure of my algorithm, I am not quite sure what is causing it to crash in general, or to crash so soon when a large file of just one symbol is given.
It must be something about the hashing (first time doing it, so it might have some bugs).
The hash function I am using can be found here, and from what I have tested, it gives good results: oat_hash
The LZW encoding algorithm is based on this link, with a slight change: it runs only while the dictionary is not full: LZW encoder
Let's get into code:
Note: oat_hash is changed so it returns value % CAPACITY, so every index is within DICTIONARY
// Globals
#define CAPACITY 100000
char *DICTIONARY[CAPACITY];
unsigned short CODES[CAPACITY]; // CODES and DICTIONARY are linked via index: word from dictionary on index i, has its code in CODES on index i
int position = 0;
int code_counter = 0;

void encode(FILE *input, FILE *output){
    int succ1 = fseek(input, 0, SEEK_SET);
    if(succ1 != 0) printf("Error: file not open!");
    int succ2 = fseek(output, 0, SEEK_SET);
    if(succ2 != 0) printf("Error: file not open!");

    //1. Working word = next symbol from the input
    char *working_word = malloc(2048*sizeof(char));
    char new_symbol = getc(input);
    working_word[0] = new_symbol;
    working_word[1] = '\0';

    //2. WHILE(there are more symbols on the input) DO
    //3. NewSymbol = next symbol from the input
    while((new_symbol = getc(input)) != EOF){
        char *workingWord_and_newSymbol= NULL;
        char newSymbol[2];
        newSymbol[0] = new_symbol;
        newSymbol[1] = '\0';
        workingWord_and_newSymbol = working_word_and_new_symbol(working_word, newSymbol);
        int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));

        //4. IF(WorkingWord + NewSymbol) is already in the dictionary THEN
        if(DICTIONARY[index] != NULL){
            // 5. WorkingWord += NewSymbol
            working_word = working_word_and_new_symbol(working_word, newSymbol);
        }
        //6. ELSE
        else{
            //7. OUTPUT: code for WorkingWord
            int idx = oat_hash(working_word, strlen(working_word));
            fprintf(output, "%u", CODES[idx]);
            //8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
            if(!dictionary_full()){
                DICTIONARY[index] = workingWord_and_newSymbol;
                CODES[index] = code_counter + 1;
                code_counter += 1;
                working_word = strdup(newSymbol);
            }else break;
        }
        //10. END IF
    }
    //11. END WHILE

    //12. OUTPUT: code for WorkingWord
    int index = oat_hash(working_word, strlen(working_word));
    fprintf(output, "%u", CODES[index]);
    free(working_word);
}
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));
And later
int idx = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[idx]);
//8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
if(!dictionary_full()){
    DICTIONARY[index] = workingWord_and_newSymbol;
    CODES[index] = code_counter + 1;
    code_counter += 1;
    working_word = strdup(newSymbol);
}else break;
idx and index are unbounded and you use them to access a bounded array, so you're accessing memory out of range. Here's a suggestion, but it may skew the distribution; if your hash range is much larger than CAPACITY it shouldn't be a problem. You also have another problem that was mentioned: collisions. You need to handle them, but that's a different problem.
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol)) % CAPACITY;
// and
int idx = oat_hash(working_word, strlen(working_word)) % CAPACITY;
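As a side note on the collision problem mentioned above, one common way to handle it is linear probing. A minimal sketch, assuming the same DICTIONARY, CODES and oat_hash as in the question (not the poster's code, just one possible approach):
/* Walk forward from the hashed slot until the word is found or an empty
   slot appears; returns a usable index, or -1 if the table is full. */
int find_slot(const char *word)
{
    int start = oat_hash(word, strlen(word)) % CAPACITY;
    for (int probes = 0; probes < CAPACITY; probes++) {
        int i = (start + probes) % CAPACITY;
        if (DICTIONARY[i] == NULL || strcmp(DICTIONARY[i], word) == 0)
            return i;   /* empty slot (insert here) or exact match */
    }
    return -1;          /* table completely full */
}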
LZW compression is certainly used to construct binary files and normally is capable of reading binary files.
The following code is problematic as it relies on new_symbol never being a \0.
newSymbol[0] = new_symbol; newSymbol[1] = '\0';
strlen(workingWord_and_newSymbol)
strdup(newSymbol)
Needs re-write to work with arrays of bytes rather than strings.
fopen() was not shown. Ensure you are opening in binary mode: input = fopen(..., "rb");
@Wumpus Q. Wumbley is correct, use int newSymbol.
Minor:
new_symbol and newSymbol are confusing.
Consider:
// char *working_word = malloc(2048*sizeof(char));
#define WORKING_WORD_N (2048)
char *working_word = malloc(WORKING_WORD_N*sizeof(*working_word));
// or
char *working_word = malloc(WORKING_WORD_N);
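A minimal sketch of the byte-oriented reading described above (an int return value from getc, checked against EOF before it is stored as a byte):
int ch;                                          /* int so EOF can be distinguished */
while ((ch = getc(input)) != EOF)
{
    unsigned char symbol = (unsigned char)ch;    /* any byte value 0-255, including '\0' */
    /* ... append symbol to a length-counted byte buffer rather than a C string ... */
}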

C++ How to determine and report offset for current position in binary file using pubseekoff?

I have only been working with C++ for about 6 months so my apologies for any stupid mistakes I make in my code.
I am working on a project that will read in a binary file to a streambuf and then search the streambuf for specific sequences and report the sequences starting offset in the binary file as well as an identifier for which specific sequence was found in the binary file.
The sequences to search for are stored in a vector<vector<int>> called snortRules, where each index of the snortRules vector is a single sequence stored as a vector of ints.
Ultimately I need to create a text file that shows the offset and the specific sequence found, for comparison against another file that should match, but for now I am just printing the results to a console window.
My code seems to be working in that it reports finding all the sequences, except for a single issue: for some reason I sometimes get a repeat of the offset at which a sequence is found. For example, the output would be:
101521 #655
101816 #656 <--This output is correct
101816 #657 <--This output is wrong and should have offset 101865
101893 #658 <--This output is correct
101893 #659 <--This output is wrong and should have offset 102084
102105 #660
102325 #661
102378 #662
I think my issue is that I am not getting the offset correctly using streambuf's pubseekoff.
my code is:
fstream binFS(binName, ios::in | ios::binary);  //create stream
streambuf* binBuff = binFS.rdbuf();             //create streambuf from filestream
int currentChar;                                //ascii code of current character from binary file
streamsize offset;                              //location of first character of rule in binary file
do
{ //until the end of binary file
    for(unsigned i=0; i<snortRules.size(); ++i)
    {//start by finding the begining of current rule
        do
        { //check binBuff until first char of first rule is found.
            currentChar = binBuff->sgetc();
            if( snortRules[i][0] == currentChar )//binBuff->sgetc() ) //compare short rule 1st char to curtent position
            { //match save offset and compare rest of rule
                offset = binBuff->pubseekoff(0,ios::cur); //set the offset to current postion
                bool checkNext = true; //bool to break loop when checking rules if not matching
                for(unsigned srIdx=0; srIdx < snortRules[i].size() && checkNext == true; ++srIdx)
                { //loop through current rule comparing characters
                    if(snortRules[i][srIdx] == currentChar)
                    { //match check if end of rule or advance character to compare to next in rule
                        if( srIdx == snortRules[i].size()-1)
                        { //match write the offset of fist character of rule and which was matched
                            std::cout<<offset<<" #"<<i+1<<endl; //write to console for debugging purposes will write to text file when working
                            ++i; //increment rule to next when exact match found
                        }
                        else
                        { //not at the end of the rule so continue to compare next characters
                            currentChar = binBuff->snextc(); //set currentChar to next char to be checked by for loop
                        }
                    }
                    else
                    { //no match break out and continue
                        checkNext == false; //set flag to break for loop because characters didnt match
                    }
                }
            }
            else
            { //no match check next character in binBuff
                //offset = binBuff->pubseekoff (0,binFS.cur, binFS.in); //set the offset to current postion
                currentChar = binBuff->snextc(); //get next char in binBuff
                //offset = binBuff->pubseekoff (0,binFS.cur, binFS.in); //set the offset to current postion
            }
        }while( currentChar != streambuf::traits_type::eof() ); //stop checking if end of file reached
    }
}while( binBuff->snextc() != streambuf::traits_type::eof() ); //stop checking if end of file reached
I have tried to find documentation and examples for pubseekoff on stackoverflow.com and cplusplus.com, but what I can find is not making much sense to me, and honestly I am not 100% sure it is my problem...
Any help or suggestions is greatly appreciated!
The entire problem was the line:
checkNext == false;
The "==" has no effect and cause the error because it needed to be:
checkNext = false;
So that it would assign the bool value and terminate loops at appropriate times.
I overlooked the warning issued by the compiler in Visual Studio and chased my tail for a couple of hours and now I feel like a fool.
So if you need an involved example of streambuf this may be helpful. And if you are having some random runtime error that is not making any sense, check for warnings.
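The same pitfall exists in plain C; a minimal sketch of what the compiler warning points at (the exact warning text varies by compiler):
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool checkNext = true;
    checkNext == false;             /* comparison whose result is discarded; usually warned about */
    printf("%d\n", checkNext);      /* still 1: the flag never changed */
    checkNext = false;              /* assignment: actually clears the flag */
    printf("%d\n", checkNext);      /* now 0 */
    return 0;
}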
