Structure information for pcre - c

I have the following function to compile a pcre regex:
#include <stdio.h>
#include <pcre.h>

/**
 * common options: PCRE_DOTALL, PCRE_EXTENDED, PCRE_CASELESS, PCRE_MULTILINE
 * full options located at: https://man7.org/linux/man-pages/man3/pcre_compile.3.html
 */
pcre* pcre_compile_pattern(const char* pattern, int options)
{
    const char *pcre_error;
    int error_offset;
    pcre *re_compiled = pcre_compile(pattern, options, &pcre_error, &error_offset, NULL);
    if (re_compiled == NULL) {
        fprintf(stderr, "ERROR: '%s' occurs at pattern position %d\n", pcre_error, error_offset);
    }
    return re_compiled;
}
Is there a place where the pcre struct is described? For example, I'm looking to see if it contains the pattern (as a normal string) inside it or whether I have to keep the pattern separately. I've seen a lot of references in the man pages to pcre* but I haven't really been able to get more details on that struct.
Searching GitHub, I found one place that defines it, which seems like it might be what I'm using: https://github.com/luvit/pcre/blob/e2a236a5737b58d43bf324208406a60fe0dd95f4/pcre_internal.h#L2317. Everything there is private though, so you cannot access any part of the struct, for example to read or print it directly.

Is there a place where the pcre struct is described?
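The public interface is declared in pcre.h for version 1 (PCRE) or pcre2.h for version 2 (PCRE2).
Much in the same way that we don't need to know how stdio's FILE struct is laid out, we don't need to know how pcre is defined: it is an opaque type that you only hand back to the library. You also do not need the pattern for matching once you have received the pcre pointer. As far as I can tell the public API has no call that returns the source pattern, though, so keep your own copy if you want to print it later.
Shawn, in comments, pointed out the importance of using PCRE2 for new code. It is also noted on the website: PCRE is end of life, with 8.45 as its last version; use PCRE2 for new projects.
PCRE2 is not a drop-in replacement: it has a revised API (pcre2_compile(), pcre2_match(), and so on), along with more aggressive pattern validation.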
A demonstration of pcre2 is available here.
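If you do want the pattern text later, for logging say, one option is to carry your own copy next to the compiled handle. The following is a minimal sketch, assuming a hypothetical wrapper named regex_with_source that is not part of PCRE. The handle is stored as void* only so that the sketch compiles without the PCRE headers; in real code it would be the pcre* returned by pcre_compile_pattern().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: the compiled handle plus our own copy of the pattern. */
typedef struct {
    void *compiled;  /* e.g. the pcre* from pcre_compile(); not owned by this struct */
    char *pattern;   /* heap copy of the pattern text, owned by this struct */
} regex_with_source;

/* Stores the handle and copies the pattern. Returns 0 on success, -1 if malloc fails. */
int regex_with_source_init(regex_with_source *r, void *compiled, const char *pattern)
{
    size_t len = strlen(pattern) + 1;
    r->pattern = malloc(len);
    if (r->pattern == NULL)
        return -1;
    memcpy(r->pattern, pattern, len);
    r->compiled = compiled;
    return 0;
}

/* Frees the pattern copy only; the compiled handle must be released by its owner. */
void regex_with_source_clear(regex_with_source *r)
{
    free(r->pattern);
    r->pattern = NULL;
    r->compiled = NULL;
}
```

After regex_with_source_init(&r, re_compiled, pattern), r.pattern can be printed at any time; the handle itself is still released with pcre_free() as usual.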

Related

How to compare two (absolute) paths (given as char* ) in C and check if they are the same?

Given two paths as char*, I can't determine whether they point to the same file.
How can I implement, in C, a platform-independent utility that checks whether two paths point to the same file?
Using strcmp will not work because on Windows paths can contain \ or /.
Using st_ino will not help because it does not work on Windows.
char *fileName = du->getFileName();
char *oldFileName = m_duPtr->getFileName();
bool isSameFile = pathCompare(fileName, oldFileName) == 0; // (strcmp(fileName, oldFileName) == 0);
if (isSameFile) {
    stat(fileName, &pBuf);
    stat(oldFileName, &pBuf2);
    if (pBuf.st_ino == pBuf2.st_ino) {
        bRet = true;
    }
}
You can't, at least not portably. Hard links also exist on Windows, and the C standard library has no functions for operating on them.
A plausible solution to the larger problem: link against cygwin1.dll and use the st_ino method. Your sample code omits st_dev, though, and you need to put it back: inode numbers are only unique within one device.
While there is an actual way to accomplish this on Windows, it involves ntdll functions, and I had to read Cygwin's code to find out how to do it.
The methods are NtGetFileInformationByHandle and NtFsGetVolumeInformationByHandle. There are documented kernel32 calls that claim to do the same thing. See the Cygwin source code for why they don't work right (buggy fs drivers).
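On POSIX systems (or under Cygwin), the st_dev plus st_ino comparison might look like the sketch below. The function name same_file is made up for illustration, and this does not cover native Windows.

```c
#include <sys/stat.h>

/*
 * Returns 1 if both paths name the same file, 0 if they do not,
 * and -1 if either path cannot be stat()ed.
 */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    /* An inode number is only meaningful together with its device. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Because stat() follows symbolic links, two different spellings of a path, a symlink and its target, and two hard links to one file all compare as the same file.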

What is the entry point for git?

I was browsing through the git source code, and I was wondering where the entry point file is? I have gone through a couple files, that I thought would be it but could not find a main function.
I could be wrong, but I believe the entrypoint is main() in common-main.c.
int main(int argc, const char **argv)
{
    /*
     * Always open file descriptors 0/1/2 to avoid clobbering files
     * in die(). It also avoids messing up when the pipes are dup'ed
     * onto stdin/stdout/stderr in the child processes we spawn.
     */
    sanitize_stdfds();
    git_setup_gettext();
    git_extract_argv0_path(argv[0]);
    restore_sigpipe_to_default();
    return cmd_main(argc, argv);
}
At the end you can see it returns cmd_main(argc, argv). There are a number of definitions of cmd_main(), but I believe the one called here is the one defined in git.c, which is too long to post in its entirety but is excerpted below:
int cmd_main(int argc, const char **argv)
{
    const char *cmd;
    cmd = argv[0];
    if (!cmd)
        cmd = "git-help";
    else {
        const char *slash = find_last_dir_sep(cmd);
        if (slash)
            cmd = slash + 1;
    }
    /*
     * "git-xxxx" is the same as "git xxxx", but we obviously:
     *
     * - cannot take flags in between the "git" and the "xxxx".
     * - cannot execute it externally (since it would just do
     * the same thing over again)
     *
     * So we just directly call the builtin handler, and die if
     * that one cannot handle it.
     */
    if (skip_prefix(cmd, "git-", &cmd)) {
        argv[0] = cmd;
        handle_builtin(argc, argv);
        die("cannot handle %s as a builtin", cmd);
    }
    /* ... */
}
handle_builtin() is also defined in git.c.
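To make the name handling in that excerpt concrete, here is a standalone sketch of the same two steps: drop the directory part of argv[0], then strip a leading "git-". It is not git's code. skip_prefix_sketch and builtin_name_from_argv0 are names invented here, and strrchr stands in for git's find_last_dir_sep(), which I believe also handles other separators on some platforms.

```c
#include <string.h>

/* Simplified stand-in for git's skip_prefix(): on a match, *out points past the prefix. */
static int skip_prefix_sketch(const char *str, const char *prefix, const char **out)
{
    size_t n = strlen(prefix);

    if (strncmp(str, prefix, n) != 0)
        return 0;
    *out = str + n;
    return 1;
}

/*
 * For an argv[0] such as "/usr/lib/git-core/git-status", returns "status".
 * Returns NULL when the program name does not start with "git-".
 */
const char *builtin_name_from_argv0(const char *argv0)
{
    const char *cmd = argv0;
    const char *slash = strrchr(cmd, '/');  /* assumption: '/' is the only separator */

    if (slash)
        cmd = slash + 1;
    if (skip_prefix_sketch(cmd, "git-", &cmd))
        return cmd;
    return NULL;
}
```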
Perhaps it's best to address the misunderstanding. Git is a way of collecting, recording, and archiving changes to a project directory. This is the purpose of a Version Control System, and git is perhaps one of the more recognizable ones.
Sometimes they also provide build automation, but often the best tools focus on the fewest responsibilities. In the case of git, it mostly focuses on commits to a repository in order to preserve different states of the directory it is initialized to. It doesn't build the program, so the entry points are unaffected.
For C projects, the entry point will always be the same one defined by the compiler. Generally this is a function called main, but there are ways to redefine or hide this entry point. Arduino, for example, uses setup as the entry point and then calls loop.
The comment left by @larks is an easy way to find the entry point when you're not sure. A simple recursive search from the repository's root directory can hunt for the word main in every .c file:
grep -rn --include='*.c' main .
The Windows equivalent is FINDSTR, but recent updates to Windows 10 have greatly improved compatibility with Bash commands. grep is usable in the version I'm running. So is ls, though I'm not sure whether it has been there all along.
Some git projects include multiple languages, and many languages related to C (and predecessors) use the same entry point name. Looking only in file extensions of .c is a good way to find the entry point of the C components, assuming the code is of high enough quality that you'd want to run it in the first place.
There are definitely ways to interfere with how well the extension filters out other languages, but their use implies very haphazard coding practice.

Have Doxygen link type to file for C

I'm creating a module in C. When I refer to that module in the documentation, I want a link to the header file, not the struct (because the functions and other useful information are at the file level).
The file my_iterator.h contains
typedef struct {
    int foo;
    int bar;
} my_iterator_t;
I would like references to my_iterator to create a link to my_iterator.h. For example,
/**
Create a new, specially configured my_iterator
*/
my_iterator_t* special_factory_in_another_module();
Putting "my_iterator.h" in the documentation would create the correct link, but would sound strange. Putting my_iterator_t in the documentation would sound better, but not link to a useful place.
Although it is a bit verbose, this does what I want:
/**
* Create a new, specially configured [my_iterator](@ref my_iterator.h)
*/
my_iterator_t* special_factory_in_another_module();
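One thing worth checking, stated here as an assumption about your setup: as far as I know, Doxygen only generates a page (and so a link target) for a file that is itself documented, so my_iterator.h needs a @file block before a reference to it can resolve. A minimal sketch of such a header, with brief texts made up for illustration:

```c
/**
 * @file my_iterator.h
 * @brief Iterator module: the type plus the functions that operate on it.
 */
#ifndef MY_ITERATOR_H
#define MY_ITERATOR_H

/** State of one iteration. */
typedef struct {
    int foo; /**< First field, as in the question. */
    int bar; /**< Second field, as in the question. */
} my_iterator_t;

#endif /* MY_ITERATOR_H */
```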

Mac sandbox: running a binary tool that needs /tmp

I have a sandboxed Cocoa app that, during an export process, needs to run a third party command-line tool. This tool appears to be hardcoded to use /tmp for its temporary files; sandboxing doesn't permit access to this folder, so the export fails.
How can I get this tool to run? I don't have access to its source code, so I can't modify it to use NSTemporaryDirectory(), and it doesn't appear to respect the TMP or TEMPDIR environment variables. For reasons I don't understand, giving myself a com.apple.security.temporary-exception.files.absolute-path.read-write entitlement doesn't seem to work, either.
Is there some way to re-map folders within my sandbox? Is there some obscure trick I can use? Should I try to patch the tool's binary somehow? I'm at my wit's end here.
I was able to get user3159253's DYLD_INSERT_LIBRARIES approach to work. I'm hoping they will write an answer describing how that works, so I'll leave the details of that out and explain the parts that ended up being specific to this case.
Thanks to LLDB, elbow grease, and not a little help from Hopper, I was able to determine that the third-party tool used mkstemp() to generate its temporary file names, and some calls (not all) used a fixed template starting with /tmp. I then wrote a libtmphack.dylib that intercepted calls to mkstemp() and modified the parameters before calling the standard library version.
Since mkstemp() takes a pointer to a preallocated buffer, I didn't feel like I could rewrite a path starting with a short string like "/tmp" to the very long string needed to get to the Caches folder inside the sandbox. Instead, I opted to create a symlink to it called "$tmp" in the current working directory. This could break if the tool chdir()'d at an inopportune time, but fortunately it doesn't seem to do that.
Here's my code:
//
// libtmphack.c
// Typesetter
//
// Created by Brent Royal-Gordon on 8/27/14.
// Copyright (c) 2014 Groundbreaking Software. This file is MIT licensed.
//
#include "libtmphack.h"
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
//#include <errno.h>
#include <string.h>

static int gbs_has_prefix(char * needle, char * haystack) {
    return strncmp(needle, haystack, strlen(needle)) == 0;
}

int mkstemp(char *template) {
    static int (*original_mkstemp)(char * template) = NULL;
    if(!original_mkstemp) {
        // Look up the next mkstemp in load order, i.e. the standard library's
        original_mkstemp = dlsym(RTLD_NEXT, "mkstemp");
    }
    if(gbs_has_prefix("/tmp", template)) {
        printf("libtmphack: rewrote mkstemp(\"%s\") ", template);
        // Same length as before, so the caller's buffer is still big enough
        template[0] = '$';
        printf("to mkstemp(\"%s\")\n", template);
        // If this isn't successful, we'll presume it's because it's already been made
        const char *target = getenv("TEMP");
        if(target) {
            symlink(target, "$tmp");
        }
        int ret = original_mkstemp(template);
        // Can't do this, the caller needs to be able to open the file
        // int retErrno = errno;
        // unlink("$tmp");
        // errno = retErrno;
        return ret;
    }
    else {
        printf("libtmphack: OK with mkstemp(\"%s\")\n", template);
        return original_mkstemp(template);
    }
}
Very quick and dirty, but it works like a charm.
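One possible refinement, sketched here under my own assumptions rather than taken from the code above: the prefix test also matches names such as "/tmpfile", so the rewrite step could be pulled into a helper that only fires when /tmp is a whole path component. The name rewrite_tmp_prefix is invented for this example.

```c
#include <string.h>

/*
 * Rewrites a leading "/tmp" to "$tmp" in place, but only when "/tmp" is the
 * whole first path component ("/tmp" or "/tmp/..."). The string length does
 * not change, so the caller's buffer is always large enough.
 * Returns 1 if the path was rewritten, 0 if it was left alone.
 */
int rewrite_tmp_prefix(char *path)
{
    static const char prefix[] = "/tmp";
    size_t n = sizeof prefix - 1;

    if (strncmp(path, prefix, n) != 0)
        return 0;
    if (path[n] != '/' && path[n] != '\0')
        return 0;  /* e.g. "/tmpfile": a different name that merely starts with /tmp */

    path[0] = '$';
    return 1;
}
```

The interposed mkstemp() would call this on its template argument and create the "$tmp" symlink only when it returns 1.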
Since @BrentRoyal-Gordon has already published a working solution, I'm simply repeating the comment that inspired it:
To fix a program's behavior, I would intercept and override some library or system calls with the help of DYLD_INSERT_LIBRARIES and a custom shared library containing a custom implementation of those calls.
The exact list of calls that need to be overridden depends on the nature of the application and can be studied with tools such as dtruss, which is built on the macOS DTrace facility, or a disassembler such as Hopper. @BrentRoyal-Gordon found that the app can be fixed solely with an /appropriate/ implementation of mkstemp.
That's it. I'm still not sure that I've deserved the bounty :)
Another solution would be to use chroot within the child process (or posix_spawn options) to change its root directory to a directory that is within your sandbox. Its “/tmp” will then be a “tmp” directory within that directory.

Vala vapi files documentation

I'd like to hack on an existing GLib based C project using Vala.
Basically what I'm doing is, at the beginning of my build process, using valac to generate .c and .h files from my .vala files and then just compiling the generated files the way I would any .c or .h file.
This is probably not the best way, but seems to be working alright for the most part.
My problem is that I'm having a hard time accessing my existing C code from my Vala code. Is there an easy way to do this?
I've tried writing my own .vapi files (I didn't have any luck with the tool that came with vala), but I can't find any decent documentation on how to write these.
Does any exist? Do I need one of these files to call existing C code?
Yes, to call a C function, you need to write a binding for it. The process is described in http://live.gnome.org/Vala/Tutorial#Binding_Libraries_with_VAPI_Files, however, this doesn't apply directly to custom functions or libraries written without GObject. You'll probably need help from #vala IRC channel if you have complex binding for non-GObject libraries.
However, most of the time we use simple vapi files to bind an autoconf define or a few functions written in plain C, whether for efficiency, to work around a Vala limitation, or for some other reason. This is how most people do it:
myfunc.vapi
[CCode (cheader_filename = "myfunc.h")]
namespace MyFunc {
    [CCode (cname = "my_func_foo")]
    public string foo (int bar, Object? o = null);
}
myfunc.h (and corresponding implementation in a .c linked with your project)
#include <glib-object.h>
char* my_func_foo(int bar, GObject* o);
example.vala could be
using MyFunc;
void main() {
    string baz = foo(42);
}
When compiling with valac, use --vapidir= to give the directory location of the myfunc.vapi. Depending on your build system, you may need to pass extra argument to valac or gcc CFLAGS in order to link everything together.
The only addition I would make to elmarco's answer is the extern keyword. If you're trying to access a single C function that's already available in one of your packages or the standard C/Posix libraries, you can access it easily this way.
For GLib-based libraries written in C you can try to generate gir-files from your C-sources: Vala/Bindings.
Doing it manually is no problem too. Suppose you have a library which defines SomelibClass1 in C with a method called do_something which takes a string.
The name of the headerfile is "somelib.h". Then the corresponding vapi is as simple as the following:
somelib.vapi:
[CCode (cheader_filename="somelib.h")]
namespace Somelib {
    public class Class1 {
        public void do_something (string str);
    }
}
Documentation for writing vapis for non-GLib libraries can be found here: Vala/LegacyBindings
This is actually really easy. Let's take an excerpt from posix.vapi:
[Compact]
[CCode (cname = "FILE", free_function = "fclose", cheader_filename = "stdio.h")]
public class FILE {
    [CCode (cname = "fopen")]
    public static FILE? open (string path, string mode);
    [CCode (cname = "fgets", instance_pos = -1)]
    public unowned string? gets (char[] s);
}
This binds the following C functions:
FILE *fopen (const char *path, const char *mode);
char *fgets (char *s, int size, FILE *stream);
When the instance_pos attribute is omitted, Vala assumes that the object is the first parameter of a method; instance_pos = -1 moves it to the end, as fgets requires. This way it is possible to bind C constructs that are roughly object-oriented. The free_function of the compact class is called when the owning reference to the object goes out of scope.
The CCode(cname)-attribute of a method, class, struct, etc. has to be the name of it as it would be in C.
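For comparison, here is roughly what calls through that binding correspond to on the C side. This is a hand-written sketch, not valac output, and read_first_line is a made-up helper name.

```c
#include <stdio.h>

/*
 * Reads the first line of the file at path into buf (at most size - 1 chars).
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_first_line(const char *path, char *buf, int size)
{
    FILE *f = fopen(path, "r");        /* Vala: FILE.open (path, "r") */
    if (f == NULL)
        return -1;

    /* Vala: f.gets (buf). The array arrives with its length, and because of
     * instance_pos = -1 the instance f is passed as the last argument. */
    char *line = fgets(buf, size, f);

    fclose(f);                         /* the free_function named in the binding */
    return line != NULL ? 0 : -1;
}
```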
There is a lot more to this subject, but this should give you a general overview.
It would probably be easier to just access your Vala code from C, since all you have to do is compile it to C.
