Can memmove access the contents of a FILE* and delete information? - c

Does memmove work on file pointer data?
I am trying to remove a line from a file in C. I am trying to use memmove to make this more efficient than the internet's usual recommendation to copy everything into a duplicate file and overwrite the original. I have debugged and I can't figure out why this code isn't working, so I am asking for input. The logic is a for loop; inside the loop I do a memmove, but it doesn't seem to have any effect.
int RemoveRow(int iRowNum)
{
    char sReplaceLineStart[m_MaxSizeRow] = {0};
    char sTemp[m_MaxSizeRow] = {0};
    size_t RemovalLength = 0;
    GoToBeginningOfFile();
    for (int i = 0; i < m_iNumberOfRows; i++)
    {
        if (i == iRowNum)
        {
            // Line to remove
            fgets(m_sRemovalRow, m_MaxSizeRow, pFile);
            if (m_sRemovalRow == NULL)
            {
                // We're removing the last line - just make it null
                memset(m_sRemovalRow, 0, sizeof(m_MaxSizeRow));
            }
        }
        else if (i == iRowNum + 1)
        {
            // replace removal line with this.
            RemovalLength += strlen(sTemp);
            fgets(sReplaceLineStart, m_MaxSizeRow, pFile);
        }
        else if (i > iRowNum)
        {
            // start line to replace with
            RemovalLength += strlen(sTemp);
            fgets(sTemp, m_MaxSizeRow, pFile);
        }
        else
        {
            // we're trying to get to the removal line
            fgets(m_sCurrentRow, m_MaxSizeRow, pFile);
            printf("(not at del row yet)iRow(%d)<iRowNum(%d) %s\n",
                   i,
                   m_iNumberOfRows,
                   m_sCurrentRow);
        }
    }
    memmove(m_sRemovalRow,
            sReplaceLineStart,
            RemovalLength);
    return 1;
}

FILE is a so-called opaque type, meaning that the application programmer is purposely locked out of its internals as per design - private encapsulation.
Generally one would create an opaque type using the concept of forward declaration, like this:
// stdio.h
typedef struct FILE FILE;
And then inside the private library:
// stdio.c - not accessible by the application programmer
struct FILE
{
// internals
};
Since FILE was forward declared and we only have access to the header, FILE is now an incomplete type, meaning we can't declare an instance of that type, access its members, or pass it to sizeof, etc. We can only access it through the API, which does know the internals. Since C allows us to declare a pointer to an incomplete type, the API works with FILE*, as fopen does.
However, the implementation of the standard library isn't required to implement FILE like this - the option is simply there. So depending on the implementation of the standard library, we may or may not be able to create an instance of a FILE object and perhaps even access its internals. But that's all in the realm of non-standard language extensions, and such code would be non-portable.
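Worth adding for the question itself: memmove only rearranges bytes that are already in your process's memory; it never changes the bytes the operating system holds for the open file, so no amount of memmove on fgets buffers will delete a line on disk. The portable approach is the rewrite the question was hoping to avoid. A minimal sketch (not the poster's class; the file name, buffer size and error handling are illustrative):

#include <stdio.h>

/* Remove line `row` (0-based) from the text file at `path` by copying every
   other line to a temporary file and renaming it over the original. */
static int remove_row(const char *path, int row)
{
    char line[1024];                 /* assumes each line fits in 1024 bytes */
    FILE *in = fopen(path, "r");
    FILE *out = fopen("rewrite.tmp", "w");
    int i = 0;

    if (in == NULL || out == NULL) {
        if (in)  fclose(in);
        if (out) fclose(out);
        return -1;
    }
    while (fgets(line, sizeof line, in) != NULL) {
        if (i++ != row)
            fputs(line, out);        /* keep every line except the one to drop */
    }
    fclose(in);
    fclose(out);
    return rename("rewrite.tmp", path);
}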


Why do we use structure of function pointers? [closed]

As they say, you learn coding techniques from others' code. I've been trying to understand a couple of free stacks, and they all have one thing in common: a structure of function pointers. I have the following questions related to this architecture.
Is there any specific reason behind such an architecture?
Does function call via function pointer help in any optimization?
Example:
void do_Command1(void)
{
    // Do something
}
void do_Command2(void)
{
    // Do something
}
Option 1: Direct execution of above functions
void do_Func(void)
{
    do_Command1();
    do_Command2();
}
Option 2: Indirect execution of above functions via function pointers
// Create structure for function pointers
typedef struct
{
    void (*pDo_Command1)(void);
    void (*pDo_Command2)(void);
} EXECUTE_FUNC_STRUCT;
// Update structure instance with the functions' addresses
EXECUTE_FUNC_STRUCT ExecFunc = {
    do_Command1,
    do_Command2,
};
void do_Func(void)
{
    EXECUTE_FUNC_STRUCT *pExecFunc;  // Create structure pointer
    pExecFunc = &ExecFunc;           // Assign structure instance address to the structure pointer
    pExecFunc->pDo_Command1();       // Execute command 1 function via structure pointer
    pExecFunc->pDo_Command2();       // Execute command 2 function via structure pointer
}
While Option 1 is easy to understand and implement, why do we need to use Option 2?
Option 1 doesn't allow you to change the behavior without changing the code - it will always execute the same functions in the same order every time the program is executed. Which, sometimes, is the right answer.
Option 2 gives you the flexibility to execute different functions, or to execute do_Command2 before do_Command1, based on decisions made at runtime (say after reading a configuration file, or based on the result of another operation, etc.).
Real-world example from personal experience - I was working on an application that would read data files generated from Labview-driven instruments and load them into a database. There were four different instruments, and for each instrument there were two types of files, one for calibration and the other containing actual data. The file naming convention was such that I could select the parsing routine based on the file name. Now, I could have written my code such that:
void parse ( const char *fileName )
{
    if ( fileTypeIs( fileName, "GRA" ) && fileExtIs( fileName, "DAT" ) )
        parseGraDat( fileName );
    else if ( fileTypeIs( fileName, "GRA" ) && fileExtIs ( fileName, "CAL" ) )
        parseGraCal( fileName );
    else if ( fileTypeIs( fileName, "SON" ) && fileExtIs ( fileName, "DAT" ) )
        parseSonDat( fileName );
    // etc.
}
and that would have worked just fine. However, at the time, there was a possibility that new instruments would be added later and that there may be additional file types for the instruments. So, I decided that instead of a long if-else chain, I would use a lookup table. That way, if I did have to add new parsing routines, all I had to do was write the new routine and add an entry for it to the lookup table - I didn't have to modify any of the main program logic. The table looked something like this:
struct lut {
    const char *type;
    const char *ext;
    void (*parseFunc)( const char * );
} LUT[] = { {"GRA", "DAT", parseGraDat },
            {"GRA", "CAL", parseGraCal },
            {"SON", "DAT", parseSonDat },
            {"SON", "CAL", parseSonCal },
            // etc.
          };
Then I had a function that would take the file name, search the lookup table, and return the appropriate parsing function (or NULL if the filename wasn't recognized):
void (*parse)(const char *) = findParseFunc( LUT, fileName );
if ( parse )
    parse( fileName );
else
    log( ERROR, "No parsing function for %s", fileName );
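The post doesn't show findParseFunc itself; a sketch of what it might look like, assuming the fileTypeIs/fileExtIs helpers from above and a terminating { NULL, NULL, NULL } entry added to the table:

/* Returns the parsing function for fileName, or NULL if no table entry matches. */
void (*findParseFunc( const struct lut *table, const char *fileName ))( const char * )
{
    for ( ; table->type != NULL; table++ )   /* walk until the sentinel entry */
        if ( fileTypeIs( fileName, table->type ) && fileExtIs( fileName, table->ext ) )
            return table->parseFunc;
    return NULL;                             /* unrecognized file name */
}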
Again, there's no reason I couldn't have used the if-else chain, and in retrospect it's probably what I should have done for that particular app [1]. But it's a really powerful technique for writing code that needs to be flexible and responsive.
[1] I suffer from a tendency towards premature generalization - I'm writing code to solve what I think will be issues five years from now instead of the issue today, and I wind up with code that tends to be more complex than necessary.
Best explained via Example.
Example 1:
Let's say you want to implement a Shape class with a draw() method - then you would need a function pointer in order to do that.
struct Shape {
    void (*draw)(struct Shape*);
};
void draw(struct Shape* s) {
    s->draw(s);
}
void draw_rect(struct Shape *s) {}
void draw_ellipse(struct Shape *s) {}
int main()
{
    struct Shape rect = { .draw = draw_rect };
    struct Shape ellipse = { .draw = draw_ellipse };
    struct Shape *shapes[] = { &rect, &ellipse };
    for (int i = 0; i < 2; ++i)
        draw(shapes[i]);
}
Example 2:
FILE *file = fopen(...);
FILE *mem = fmemopen(...); /* POSIX */
Without function pointers, there would be no way to implement a common interface for file and memory streams.
Addendum
Well, there is another way. Based on the Shape example:
enum ShapeId {
    SHAPE_RECT,
    SHAPE_ELLIPSE
};
struct Shape {
    enum ShapeId id;
};
void draw(struct Shape *s)
{
    switch (s->id) {
    case SHAPE_RECT: draw_rect(s); break;
    case SHAPE_ELLIPSE: draw_ellipse(s); break;
    }
}
The advantage of the second example is that the compiler may be able to inline the functions, so you avoid the overhead of an indirect function call.
"Everything in computer science can be solved with one more level of indirection."
The struct-of-function-pointers "pattern", let's call it, permits runtime choices. SQLite uses it all over the place, for example, for portability. If you provide a "file system" meeting its required semantics, then you can run SQLite on it, with Posix nowhere in sight.
GnuCOBOL uses the same idea for indexed files. Cobol defines ISAM semantics, whereby a program can read a record from a file by specifying a key. The underlying name-value store can be provided by several (configurable) libraries, which all provide the same functionality, but use different names for their "read a record" function. By wrapping these up as function pointers, the Cobol runtime support library can use any of those key-value systems, or even more than one at the same time (for different files, of course).
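In miniature, the idea behind both the SQLite and GnuCOBOL cases looks something like this; the backend names and functions below are made up for illustration:

#include <stdio.h>
#include <string.h>

/* One table entry per pluggable key-value backend. */
struct kv_backend {
    const char *name;
    int (*read_record)(const char *key, char *buf, size_t buflen);
};

/* Two hypothetical backends with different native APIs, wrapped
   behind the same function-pointer signature. */
static int bdb_read(const char *key, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "bdb value for %s", key);   /* stand-in for the real call */
    return 0;
}
static int lmdb_read(const char *key, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "lmdb value for %s", key);  /* stand-in for the real call */
    return 0;
}

static const struct kv_backend backends[] = {
    { "bdb",  bdb_read  },
    { "lmdb", lmdb_read },
};

/* The runtime picks a backend by name (e.g. from configuration) and from
   then on calls it only through the function pointer. */
static const struct kv_backend *find_backend(const char *name)
{
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    return NULL;
}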

Why does a handle to an object frequently appear as a pointer-to-pointer?

Why is a handle to an object declared as a pointer-to-pointer rather than a plain pointer? As in the following code:
FT_Library library;
FT_Error error = FT_Init_FreeType( &library );
where
typedef struct FT_LibraryRec_ *FT_Library;
so &library is an FT_LibraryRec_ handle of type FT_LibraryRec_**.
It's a way to emulate pass-by-reference in C, which otherwise only has pass-by-value.
The 'C' library function FT_Init_FreeType has two outputs, the error code and/or the library handle (which is a pointer).
In C++ we'd more naturally either:
return an object which encapsulated the success or failure of the call and the library handle, or
return one output - the library handle, and throw an exception on failure.
C APIs are generally not implemented this way.
It is not unusual for a C Library function to return a success code, and to be passed the addresses of in/out variables to be conditionally mutated, as per the case above.
The approach hides implementation. It speeds up compilation of your code. It allows the library to upgrade the data structures it uses without breaking existing code that uses them. Finally, it makes sure the address of that object never changes, and that you don't copy these objects.
Here’s how the version with a single pointer might be implemented:
struct FT_Struct
{
    // Some fields/properties go here, e.g.
    int field1;
    char* field2;
};
FT_Error Init( FT_Struct* p )
{
    p->field1 = 11;
    p->field2 = malloc( 100 );
    if( nullptr == p->field2 )
        return E_OUTOFMEMORY;
    return S_OK;
}
Or C++ equivalent, without any pointers:
class FT_Struct
{
    int field1;
    std::vector<char> field2;
public:
    FT_Struct() :
        field1( 11 )
    {
        field2.resize( 100 );
    }
};
As a user of the library, you have to include the struct/class FT_Struct definition. Libraries can be very complex, so this will slow down compilation of your code.
If the library is dynamic, i.e. *.dll on Windows, *.so on Linux or *.dylib on OS X, and you upgrade the library to a new version that changes the memory layout of the struct/class, old applications will crash.
Because of the way C++ works, objects are passed by value, i.e. you normally expect them to be movable and copyable, which is not necessarily what the library author wants to support.
Now consider the following function instead:
FT_Error Init( FT_Struct** pp )
{
    try
    {
        *pp = new FT_Struct();
        return S_OK;
    }
    catch( std::exception& ex )
    {
        return E_FAIL;
    }
}
As a user of the library, you no longer need to know what’s inside FT_Struct or even what size it is. You don’t need to #include the implementation details, i.e. compilation will be faster.
This plays nicely with dynamic libraries, library author can change memory layout however they please, as long as the C API is stable, old apps will continue to work.
The API guarantees you won't copy or move the values: you can't copy a structure of unknown length.
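From the caller's side, the double-pointer API is used like this (FT_Free is a hypothetical cleanup counterpart to Init, not shown above):

FT_Struct *handle = NULL;
if( Init( &handle ) == S_OK )
{
    // use the opaque handle only through the library's API
    FT_Free( handle );    // hypothetical cleanup function
    handle = NULL;
}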

C example of using AntLR

I am wondering where I can find a C tutorial/example of using ANTLR. All I have found uses the Java language.
In particular, I am looking for a main function that uses the parser and lexer generated by ANTLR.
Take a look at this document
And here is an example:
// Example of a grammar for parsing C sources,
// Adapted from Java equivalent example, by Terence Parr
// Author: Jim Idle - April 2007
// Permission is granted to use this example code in any way you want, so long as
// all the original authors are cited.
//
// set ts=4,sw=4
// Tab size is 4 chars, indent is 4 chars
// Notes: Although all the examples provided are configured to be built
// by Visual Studio 2005, based on the custom build rules
// provided in $(ANTLRSRC)/code/antlr/main/runtime/C/vs2005/rulefiles/antlr3.rules
// there is no reason that this MUST be the case. Provided that you know how
// to run the antlr tool, then just compile the resulting .c files and this
// file together, using say gcc or whatever: gcc *.c -I. -o XXX
// The C code is generic and will compile and run on all platforms (please
// report any warnings or errors to the antlr-interest newsgroup (see www.antlr.org)
// so that they may be corrected for any platform that I have not specifically tested.
//
// The project settings such as additional library paths and include paths have been set
// relative to the place where this source code sits on the ANTLR perforce system. You
// may well need to change the settings to locate the includes and the lib files. UNIX
// people need -L path/to/antlr/libs -lantlr3c (release mode) or -lantlr3cd (debug)
//
// Jim Idle (jimi cut-this at idle ws)
//
// You may adopt your own practices by all means, but in general it is best
// to create a single include for your project, that will include the ANTLR3 C
// runtime header files, the generated header files (all of which are safe to include
// multiple times) and your own project related header files. Use <> to include and
// -I on the compile line (which vs2005 now handles, where vs2003 did not).
//
#include <C.h>
// Main entry point for this example
//
int ANTLR3_CDECL
main (int argc, char *argv[])
{
// Now we declare the ANTLR related local variables we need.
// Note that unless you are convinced you will never need thread safe
// versions for your project, then you should always create such things
// as instance variables for each invocation.
// -------------------
// Name of the input file. Note that we always use the abstract type pANTLR3_UINT8
// for ASCII/8 bit strings - the runtime library guarantees that this will be
// good on all platforms. This is a general rule - always use the ANTLR3 supplied
// typedefs for pointers/types/etc.
//
pANTLR3_UINT8 fName;
// The ANTLR3 character input stream, which abstracts the input source such that
// it is easy to provide input from different sources such as files, or
// memory strings.
//
// For an ASCII/latin-1 memory string use:
// input = antlr3NewAsciiStringInPlaceStream (stringtouse, (ANTLR3_UINT64) length, NULL);
//
// For a UCS2 (16 bit) memory string use:
// input = antlr3NewUCS2StringInPlaceStream (stringtouse, (ANTLR3_UINT64) length, NULL);
//
// For input from a file, see code below
//
// Note that this is essentially a pointer to a structure containing pointers to functions.
// You can create your own input stream type (copy one of the existing ones) and override any
// individual function by installing your own pointer after you have created the standard
// version.
//
pANTLR3_INPUT_STREAM input;
// The lexer is of course generated by ANTLR, and so the lexer type is not upper case.
// The lexer is supplied with a pANTLR3_INPUT_STREAM from whence it consumes its
// input and generates a token stream as output.
//
pCLexer lxr;
// The token stream is produced by the ANTLR3 generated lexer. Again it is a structure based
// API/Object, which you can customise and override methods of as you wish. a Token stream is
// supplied to the generated parser, and you can write your own token stream and pass this in
// if you wish.
//
pANTLR3_COMMON_TOKEN_STREAM tstream;
// The C parser is also generated by ANTLR and accepts a token stream as explained
// above. The token stream can be any source in fact, so long as it implements the
// ANTLR3_TOKEN_SOURCE interface. In this case the parser does not return anything
// but it can of course specify any kind of return type from the rule you invoke
// when calling it.
//
pCParser psr;
// Create the input stream based upon the argument supplied to us on the command line
// for this example, the input will always default to ./input if there is no explicit
// argument.
//
if (argc < 2 || argv[1] == NULL)
{
fName =(pANTLR3_UINT8)"./input"; // Note in VS2005 debug, working directory must be configured
}
else
{
fName = (pANTLR3_UINT8)argv[1];
}
// Create the input stream using the supplied file name
// (Use antlr3AsciiFileStreamNew for UCS2/16bit input).
//
input = antlr3AsciiFileStreamNew(fName);
// The input will be created successfully, providing that there is enough
// memory and the file exists etc
//
if ( input == NULL)
{
fprintf(stderr, "Failed to open file %s\n", (char *)fName);
exit(1);
}
// Our input stream is now open and all set to go, so we can create a new instance of our
// lexer and set the lexer input to our input stream:
// (file | memory | ?) --> inputstream -> lexer --> tokenstream --> parser ( --> treeparser )?
//
lxr = CLexerNew(input); // CLexerNew is generated by ANTLR
// Need to check for errors
//
if ( lxr == NULL )
{
fprintf(stderr, "Unable to create the lexer due to malloc() failure1\n");
exit(1);
}
// Our lexer is in place, so we can create the token stream from it
// NB: Nothing happens yet other than the file has been read. We are just
// connecting all these things together and they will be invoked when we
// call the parser rule. ANTLR3_SIZE_HINT can be left at the default usually
// unless you have a very large token stream/input. Each generated lexer
// provides a token source interface, which is the second argument to the
// token stream creator.
// Note that even if you implement your own token structure, it will always
// contain a standard common token within it and this is the pointer that
// you pass around to everything else. A common token has a pointer within
// it that should point to your own outer token structure.
//
tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lxr));
if (tstream == NULL)
{
fprintf(stderr, "Out of memory trying to allocate token stream\n");
exit(1);
}
// Finally, now that we have our lexer constructed, we can create the parser
//
psr = CParserNew(tstream); // CParserNew is generated by ANTLR3
if (psr == NULL)
{
fprintf(stderr, "Out of memory trying to allocate parser\n");
exit(ANTLR3_ERR_NOMEM);
}
// We are all ready to go. Though that looked complicated at first glance,
// I am sure, you will see that in fact most of the code above is dealing
// with errors and there isn't really that much to do (isn't this always the
// case in C? ;-).
//
// So, we now invoke the parser. All elements of ANTLR3 generated C components
// as well as the ANTLR C runtime library itself are pseudo objects. This means
// that they are represented as pointers to structures, which contain any
// instance data they need, and a set of pointers to other interfaces or
// 'methods'. Note that in general, these few pointers we have created here are
// the only things you will ever explicitly free() as everything else is created
// via factories, that allocated memory efficiently and free() everything they use
// automatically when you close the parser/lexer/etc.
//
// Note that this means only that the methods are always called via the object
// pointer and the first argument to any method, is a pointer to the structure itself.
// It also has the side advantage, if you are using an IDE such as VS2005 that can do it
// that when you type ->, you will see a list of all the methods the object supports.
//
psr->translation_unit(psr);
// We did not return anything from this parser rule, so we can finish. It only remains
// to close down our open objects, in the reverse order we created them
//
psr ->free (psr); psr = NULL;
tstream ->free (tstream); tstream = NULL;
lxr ->free (lxr); lxr = NULL;
input ->close (input); input = NULL;
return 0;
}
contrapunctus.net/blog/2012/antlr-c - a simple Google search would suffice. Note, however, that the example is C++; I don't think ANTLR supports pure C – Aniket Jan 1 at 1:56

How to get metadata from Libextractor into a struct

I want to use Libextractor to get keywords/metadata for files.
The basic example for it is -
struct EXTRACTOR_PluginList *plugins
= EXTRACTOR_plugin_add_defaults (EXTRACTOR_OPTION_DEFAULT_POLICY);
EXTRACTOR_extract (plugins, argv[1],
NULL, 0,
&EXTRACTOR_meta_data_print, stdout);
EXTRACTOR_plugin_remove_all (plugins);
However, this calls the function EXTRACTOR_meta_data_print, which "prints" the metadata to stdout.
I'm looking for a way to get this information to another function instead - i.e. to pass or store it in memory for further processing. The documentation was not clear to me on this. Any help or experience regarding this?
I've tried to install libextractor and failed to get it working (it always returns a NULL plugin pointer upon call to EXTRACTOR_plugin_add_defaults()), so what I will write next is NOT TESTED:
from : http://www.gnu.org/software/libextractor/manual/libextractor.html#Extracting
Function Pointer: int
(*EXTRACTOR_MetaDataProcessor)(void *cls,
const char *plugin_name,
enum EXTRACTOR_MetaType type,
enum EXTRACTOR_MetaFormat format,
const char *data_mime_type,
const char *data,
size_t data_len)
and
Type of a function that libextractor calls for each meta data item found.
cls
    closure (user-defined)
plugin_name
    name of the plugin that produced this value; special values can be used
    (i.e. '<zlib>' for zlib being used in the main libextractor library and
    yielding meta data);
type
    libextractor-type describing the meta data;
format
    basic format information about data
data_mime_type
    mime-type of data (not of the original file); can be NULL (if mime-type
    is not known);
data
    actual meta-data found
data_len
    number of bytes in data

Return 0 to continue extracting, 1 to abort.
So you would just have to write your own function called whatever you want, and have this declaration be like:
int whateveryouwant(void *cls,
                    const char *plugin_name,
                    enum EXTRACTOR_MetaType type,
                    enum EXTRACTOR_MetaFormat format,
                    const char *data_mime_type,
                    const char *data,
                    size_t data_len)
{
    // Do your stuff here
    if(stop)
        return 1; // Stops
    else
        return 0; // Continues
}
and call it via:
EXTRACTOR_extract (plugins, argv[1],
NULL, 0,
&whateveryouwant,
NULL/* here be dragons */);
Like described in http://www.gnu.org/software/libextractor/manual/libextractor.html#Generalities "3.3 Introduction to the libextractor library"
[here be dragons]: That is a parameter left for the user's use (even if it's redundant to say so). As defined in the doc: "For each meta data item found, GNU libextractor will call the ‘proc’ function, passing ‘proc_cls’ as the first argument to ‘proc’."
Where "the proc function" being the function you added (whateveryouwant() here) and proc_cls being an arbitrary pointer (can be anything) for you to pass data to the function. Like a pointer to stdout in the example, in order to print to stdout. That being said, I suspect that the function writes to a FILE* and not inevitably to stdout; so if you open a file for writing, and pass its "file decriptor" as last EXTRACTOR_extract()'s parameter you would probably end with a file filled with the information you can currently read on your screen. That wouldn't be a proper way to access the information, but if you're looking into a quick and dirty way to test some behavior or some feature; that could do it, until you write a proper function.
Good luck with your code!

How avoid using global variable when using nftw

I want to use nftw to traverse a directory structure in C.
However, given what I want to do, I don't see a way around using a global variable.
The textbook examples of using (n)ftw all involve doing something like printing out a filename. I want, instead, to take the pathname and file checksum and place those in a data structure. But I don't see a good way to do that, given the limits on what can be passed to nftw.
The solution I'm using involves a global variable. The function called by nftw can then access that variable and add the required data.
Is there any reasonable way to do this without using a global variable?
Here's the exchange in a previous post on Stack Overflow in which someone suggested I post this as a follow-up.
Using ftw can be really, really bad. Internally it will save the function pointer that you use; if another thread then does something else, it will overwrite the function pointer.
Horror scenario:
thread 1: count billions of files
thread 2: delete some files
thread 1: --- oops, it is now deleting billions of files instead of counting them.
In short. You are better off using fts_open.
If you still want to use nftw then my suggestion is to put the "global" type in a namespace and mark it as "thread_local". You should be able to adjust this to your needs.
/* in some cpp file */
namespace {
thread_local size_t gTotalBytes{0}; // thread_local makes this thread safe

int GetSize(const char* path, const struct stat* statPtr, int currentFlag, struct FTW* internalFtwUsage) {
    gTotalBytes += statPtr->st_size;
    return 0; // nftw continues
}
} // namespace

size_t RecursiveFolderDiskUsed(const std::string& startPath) {
    const int flags = FTW_DEPTH | FTW_MOUNT | FTW_PHYS;
    const int maxFileDescriptorsToUse = 1024; // or whatever
    const int result = nftw(startPath.c_str(), GetSize, maxFileDescriptorsToUse, flags);
    // log or something if result == -1
    return gTotalBytes;
}
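For comparison, the fts_open route mentioned above needs no global at all, because the traversal state is an object you hold. A minimal C sketch using POSIX fts(3), with error handling trimmed:

#include <fts.h>
#include <stddef.h>
#include <sys/stat.h>

size_t recursive_folder_disk_used(char *start_path)
{
    char *paths[] = { start_path, NULL };
    size_t total = 0;                   /* lives on the caller's stack, not in a global */

    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return 0;

    for (FTSENT *ent = fts_read(fts); ent != NULL; ent = fts_read(fts))
        if (ent->fts_info == FTS_F)     /* regular file */
            total += ent->fts_statp->st_size;

    fts_close(fts);
    return total;
}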
No. nftw doesn't offer any user parameter that could be passed to the function, so you have to use global (or static) variables in C.
GCC offers a "nested function" extension; nested functions can access the variables of their enclosing scope, so they could be used like this:
void f()
{
    int i = 0;
    int fn(const char *path,
           const struct stat *sb, int typeflag, struct FTW *ftwbuf) {
        i++;
        return 0;
    }
    nftw("path", fn, 10, 0);
}
The data is best given static linkage (i.e. file scope) in a separate module that contains only the functions required to access the data, including the function passed to nftw(). That way the data is not visible globally and all access is controlled. It may be that the function that calls nftw() is also part of this module, enabling the function passed to nftw() to also be static, and thus invisible externally.
In other words, you should do what you are probably doing already, but use separate compilation and static linkage judiciously to make the data only visible via access functions. Data with static linkage is accessible by any function within the same translation unit, and you avoid the problems associated with global variables by only including functions in that translation unit that are creators, maintainers or accessors of that data.
The general pattern is:
datamodule.h
#ifndef DATAMODULE_H
#define DATAMODULE_H

<type> create_data( <args> ) ;
<type> get_data( <args> ) ;

#endif
datamodule.c
#include "datamodule.h"
static <type> my_data ;
static int nftwfunc(const char *filename, const struct stat *statptr, int fileflags, struct FTW *pfwt)
{
// update/add to my_data
...
}
<type> create_data( const char* path, <other args>)
{
...
ret = nftw( path, nftwfunc, fd_limit, flags);
...
}
<type> get_data( <args> )
{
// Get requested data from my_data and return it to caller
}
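A concrete (hypothetical) instantiation of this pattern for the question's pathname-plus-checksum case might look like the following; the fixed-size array, field sizes and checksum placeholder are illustrative only:

// datamodule.c
#include <ftw.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_ENTRIES 1024

struct entry { char path[4096]; unsigned long checksum; };

static struct entry my_data[MAX_ENTRIES];   // file-scope, static linkage
static size_t count;

static unsigned long checksum_of(const char *filename)
{
    return 0;   // placeholder - compute the real checksum here
}

static int nftwfunc(const char *filename, const struct stat *statptr, int fileflags, struct FTW *pfwt)
{
    if (fileflags == FTW_F && count < MAX_ENTRIES)
    {
        strncpy(my_data[count].path, filename, sizeof my_data[count].path - 1);
        my_data[count].checksum = checksum_of(filename);
        count++;
    }
    return 0;
}

int create_data(const char *path)
{
    count = 0;
    return nftw(path, nftwfunc, 20, FTW_PHYS);
}

const struct entry *get_data(size_t *n)
{
    *n = count;
    return my_data;
}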
