What is the entry point for git? - c

I was browsing through the git source code and wondering where the entry point file is. I have gone through a couple of files that I thought would be it, but could not find a main function.

I could be wrong, but I believe the entrypoint is main() in common-main.c.
int main(int argc, const char **argv)
{
    /*
     * Always open file descriptors 0/1/2 to avoid clobbering files
     * in die(). It also avoids messing up when the pipes are dup'ed
     * onto stdin/stdout/stderr in the child processes we spawn.
     */
    sanitize_stdfds();
    git_setup_gettext();
    git_extract_argv0_path(argv[0]);
    restore_sigpipe_to_default();
    return cmd_main(argc, argv);
}
At the end you can see it returns cmd_main(argc, argv). There are a number of definitions of cmd_main(), but I believe the one returned here is the one defined in git.c, which is a bit long to post here in its entirety, but is excerpted below:
int cmd_main(int argc, const char **argv)
{
    const char *cmd;
    cmd = argv[0];
    if (!cmd)
        cmd = "git-help";
    else {
        const char *slash = find_last_dir_sep(cmd);
        if (slash)
            cmd = slash + 1;
    }
    /*
     * "git-xxxx" is the same as "git xxxx", but we obviously:
     *
     * - cannot take flags in between the "git" and the "xxxx".
     * - cannot execute it externally (since it would just do
     *   the same thing over again)
     *
     * So we just directly call the builtin handler, and die if
     * that one cannot handle it.
     */
    if (skip_prefix(cmd, "git-", &cmd)) {
        argv[0] = cmd;
        handle_builtin(argc, argv);
        die("cannot handle %s as a builtin", cmd);
    }
handle_builtin() is also defined in git.c.
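To make the dispatch idea concrete, here is a minimal sketch of a table-driven lookup keyed on the invoked name. This is not git's actual code: the "tool-" prefix, the two handlers, and the names run_by_name and base_name are all made up for illustration, and git's real table in git.c is far larger and does much more.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical command table entry: a name and the function that handles it. */
struct cmd_entry {
    const char *name;
    int (*fn)(int argc, const char **argv);
};

/* Two made-up builtins, only here so the table has something to call. */
static int cmd_hello(int argc, const char **argv)
{
    (void)argc; (void)argv;
    puts("hello");
    return 0;
}

static int cmd_version(int argc, const char **argv)
{
    (void)argc; (void)argv;
    puts("version 0.0");
    return 0;
}

static const struct cmd_entry commands[] = {
    { "hello", cmd_hello },
    { "version", cmd_version },
};

/* Return the part of path after the last '/', or path itself if there is none. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Strip the directory and an optional "tool-" prefix from argv0, then run
 * the matching table entry. Returns the handler's result, or -1 if no
 * entry matches.
 */
int run_by_name(const char *argv0, int argc, const char **argv)
{
    const char *cmd = base_name(argv0);
    size_t i;

    if (strncmp(cmd, "tool-", 5) == 0)
        cmd += 5;
    for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
        if (strcmp(commands[i].name, cmd) == 0)
            return commands[i].fn(argc, argv);
    return -1;
}
```

Calling run_by_name("/usr/bin/tool-hello", argc, argv) would run cmd_hello, which is roughly the shape of the "git-xxxx" branch in the excerpt above.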

Perhaps it's best to address the misunderstanding. Git is a way of collecting, recording, and archiving changes to a project directory. This is the purpose of a Version Control System, and git is perhaps one of the more recognizable ones.
Sometimes they also provide build automation, but often the best tools focus on the fewest responsibilities. In the case of git, it mostly focuses on commits to a repository in order to preserve different states of the directory it is initialized to. It doesn't build the program, so the entry points are unaffected.
For C projects, the entry point will always be the same one defined by the compiler. Generally this is a function called main, but there are ways to redefine or hide this entry point. Arduino, for example, uses setup as the entry point and then calls loop.
The comment left by @larks is an easy way to find the entry point when you're not sure. A simple search from a git repo's root directory can hunt for the word main in the top-level C files:
grep main *.c
The Windows equivalent is FINDSTR, but recent updates to Windows 10 have greatly improved compatibility with Bash commands. grep is usable in the version I'm running. So is ls, though I'm not sure whether it has been there all along.
Some git projects include multiple languages, and many languages related to C (and predecessors) use the same entry point name. Looking only in file extensions of .c is a good way to find the entry point of the C components, assuming the code is of high enough quality that you'd want to run it in the first place.
There are definitely ways to interfere with how well the extension filters out other languages, but their use implies very haphazard coding practice.

Related

Proper methods to Copy files/folders programmatically in C using POSIX functions

These terms may not be 100% accurate, but I'm using the GCC compiler and POSIX library. I have C code compiled with the SQLite amalgamation file to a single executable.
In the user interface that exchanges JSON messages with the C program, I'd like to make it possible for users to copy the SQLite database files they create through the C program, and copy a full directory/folder.
Thus far, I've been able to rename and move files and folders programmatically.
I've read many questions and answers here, at Microsoft's C runtime library, and other places but I must be missing the fundamental points. I'm using regular old C, not C++ or C#.
My question is are there POSIX functions similar to rename(), _mkdir(), rmdir(), remove(), _stat(), that allow for programmatic copying of files and folders in Windows and Linux?
If not, can one just make a new folder and/or file and fread/fwrite the bytes from the original file to the new file?
I am primarily concerned with copying SQLite database files, although I wouldn't mind knowing the answer in general also.
Is this answer an adequate method?
Is the system() function a poor method? It seems to work quite well. However, it took a while to figure out how to stop messages such as "copied 2 files" from being sent to stdout and shutting down the requesting application, since they are not well-formed JSON. This answer explains how and has a link to Microsoft's "Using command redirection operators". A /q in xcopy may or may not be necessary also, but it certainly didn't do the job alone.
Thank you very much for any direction you may be able to provide.
The question that someone suggested as an answer and placed the little submission box on this question is one that I had already linked to in my question. I don't mean to be rude but, if it had answered my question, I would not have written this one. Thank you whoever you are for taking the time to respond, I appreciate it.
I don't see how that would be a better option than using system() because with the right parameters all the sub-directories and files of a single parent folder can be copied in one statement without having to iterate through all of them manually. Is there any reason why it would not be better to use system() apart from the fact that code will need to be different for each OS?
Handling errors is a bit different because system() doesn't return an errno but an exit code; however, the errors can be redirected from stderr to a file and pulled from there when necessary.
rename(): posix
_mkdir(): not posix. You want mkdir which is. mkdir takes two arguments, the second of which is the permission mode and should usually be 0777 (the process umask then restricts it).
rmdir(): posix
remove(): posix
_stat(): not posix, you want stat() which is.
_stat and _mkdir are called as such on the Windows C library because they're not quite compatible with the modern Unix calls. _mkdir is missing an argument, and _stat looks like a very old version of the Unix call. You'll have trouble on Windows with files larger than 2GB.
You could do:
#ifdef _WIN32
int mkdir(const char *path, int mode) { return _mkdir(path); } /* In the original C we could have #defined this but that doesn't work anymore */
#define stat _stat64
#endif
but if you do so, test it like crazy.
In the end, you're going to be copying stuff with stdio; this loop works. (beware the linked answer; it has bugs that'll bite ya.)
#include <stdio.h>
#include <stdlib.h>
int copyfile(const char *src, const char *dst)
{
    const int bufsz = 65536;
    char *buf = malloc(bufsz);
    if (!buf) return -1; /* like mkdir, rmdir, return 0 for success, -1 for failure */
    FILE *hin = fopen(src, "rb");
    if (!hin) { free(buf); return -1; }
    FILE *hout = fopen(dst, "wb");
    if (!hout) { free(buf); fclose(hin); return -1; }
    size_t buflen;
    while ((buflen = fread(buf, 1, bufsz, hin)) > 0) {
        if (buflen != fwrite(buf, 1, buflen, hout)) {
            fclose(hout);
            fclose(hin);
            free(buf);
            return -1; /* IO error writing data */
        }
    }
    free(buf);
    int r = ferror(hin) ? -1 : 0; /* check if fread had indicated IO error on input */
    fclose(hin);
    return r | (fclose(hout) ? -1 : 0); /* final case: check if IO error flushing buffer -- don't omit this it really can happen; calling `fflush()` won't help. */
}
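For the "copy a full directory" half of the question, a stdio copy loop like the one above can be combined with opendir()/readdir() to walk the tree. The following is only a sketch for the POSIX side (it will not build against the Windows C runtime as written): copy_tree and copy_file are names I made up, the 4096-byte path buffers are an arbitrary limit, and symlinks and special files are skipped rather than copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Copy one regular file with stdio; 0 on success, -1 on failure. */
static int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    size_t n;
    int rc = 0;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (!in)
        return -1;
    out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)   /* flushing the last buffer can still fail */
        rc = -1;
    return rc;
}

/*
 * Recursively copy directory src to a new directory dst.
 * Returns 0 on success, -1 on the first failure (including dst existing).
 */
int copy_tree(const char *src, const char *dst)
{
    DIR *dir;
    struct dirent *ent;
    int rc = 0;

    if (mkdir(dst, 0777) != 0)
        return -1;
    dir = opendir(src);
    if (!dir)
        return -1;
    while (rc == 0 && (ent = readdir(dir)) != NULL) {
        char from[4096], to[4096];
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (snprintf(from, sizeof from, "%s/%s", src, ent->d_name) >= (int)sizeof from ||
            snprintf(to, sizeof to, "%s/%s", dst, ent->d_name) >= (int)sizeof to) {
            rc = -1;        /* path too long for the fixed buffers */
            break;
        }
        if (lstat(from, &st) != 0)
            rc = -1;
        else if (S_ISDIR(st.st_mode))
            rc = copy_tree(from, to);
        else if (S_ISREG(st.st_mode))
            rc = copy_file(from, to);
        /* symlinks, sockets, devices and so on are skipped in this sketch */
    }
    closedir(dir);
    return rc;
}
```

Compared with system("cp -r ...") or xcopy, this keeps error handling in errno-style return codes and writes nothing to stdout, at the cost of more code to maintain.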

How to call same file as different name for different method?

I was recently at a presentation where one of the speakers stated that he'd used a single CGI file, written in C, that the webserver calls under different names; depending on the name used, the CGI file runs a different method.
How can I have a single C file execute different functions within when it is called by different names? Also how do I re-direct the calls for differently named files back to this single file?
Is this possible or was he just full of himself?
If you create the executable with different names but with the same code base, you can take a different branch of the code based on the name of the executable used to invoke the program.
Simple example file:
#include <stdio.h>
#include <string.h>
int main1(int argc, char** argv)
{
    printf("Came to main1.\n");
    return 0;
}
int main2(int argc, char** argv)
{
    printf("Came to main2.\n");
    return 0;
}
int main(int argc, char** argv)
{
    // If the program was invoked using main1, go to main1
    if (strstr(argv[0], "main1") != NULL )
    {
        return main1(argc-1, argv+1);
    }
    // If the program was invoked using main2, go to main2
    if (strstr(argv[0], "main2") != NULL )
    {
        return main2(argc-1, argv+1);
    }
    // Don't know what to do.
    return -1;
}
Create two different executables from the file.
cc test-262.c -o main1
cc test-262.c -o main2
Then, invoke the program by using the two different executables:
./main1
Output:
Came to main1.
and...
./main2
Output:
Came to main2.
Unix filesystems support the concept of hard and soft links. To create them just type:
ln origfile newfile
to create a hard link, or:
ln -s origfile newfile
to create a soft link.
Soft links are just a special kind of file that contains the path of another file. Most operations on the link transparently result in operating on the target file.
Hard links are lower level. In effect, all files are a link from the pathname to the content. In Unix you can link more than one pathname to the same content. In effect, there's no "original" and "links", all are links. When you delete a file, you're just removing a link, and when the link count goes to zero, the content is removed.
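The link count described above is visible from C through stat(). Here is a small sketch, assuming a POSIX system; link_count, same_file and report are helper names I invented for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return how many pathnames currently link to this content, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/*
 * Return 1 if both names refer to the same content (same device and inode),
 * 0 if they do not, -1 if either name cannot be examined.
 */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Print the link count for a path; purely for demonstration. */
void report(const char *path)
{
    printf("%s: %ld link(s)\n", path, link_count(path));
}
```

After link("origfile", "newfile") succeeds, link_count() reports 2 for either name and same_file() returns 1; after unlink("origfile") the count for newfile drops back to 1 and the content is still there.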
Many Unix utilities use this trick. The shell passes the name used to invoke the executable as the 0th argument of the command line, so the program can inspect it like any other argument.
When a CGI script is being called by a web server, it receives a considerable amount of information in its environment to let it know how it was called, including:
SCRIPT_NAME, the path to the script from the document root
SCRIPT_FILENAME, the filesystem path to the script (usually the same as argv[0])
REQUEST_URI, the path that was requested by the browser (usually similar to SCRIPT_NAME in the absence of URL rewriting)
QUERY_STRING and PATH_INFO, which contain URL parameters following the script's name
HTTP_*, which contain most of the HTTP headers that were passed in the request
Point is, the script gets a lot of information about how it was called. It could be using any of those to make its decision.
It's possible and actually pretty common.
The first element in the argv array passed to the main function is the "name" of the executable. This can be the full path, or it can be just the last component of the path, or -- if the executable is started with an exec* function call -- it can be an arbitrary string. (POSIX allows it to be a null string as well, but in practice that's pretty rare.)
So there is nothing stopping the executable from looking at argv[0] (having first checked to make sure that argc > 0) and parsing it.
The most typical way to introduce a different name for the executable is to insert a filesystem link with the alternate name (which could be either a hard or a soft-link, but for maintainability soft links are more useful.)
For CGIs, it is not even necessary to examine argv[0], since there are various useful environment variables, including (at least): SCRIPT_NAME.
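As a rough sketch of the environment-variable route, the function below picks a handler from the last path component of SCRIPT_NAME. The script names list.cgi and detail.cgi, the two handlers, and the return convention are assumptions made up for the example; a real CGI would do actual work in each handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handlers; each prints a CGI header, a blank line, and a body. */
static int show_list(void)
{
    puts("Content-Type: text/plain\n\nlist");
    return 1;
}

static int show_detail(void)
{
    puts("Content-Type: text/plain\n\ndetail");
    return 2;
}

/*
 * Dispatch on the name the web server used for this script.
 * Returns 1 or 2 to say which handler ran, or -1 if SCRIPT_NAME is
 * missing or names something this binary does not know about.
 */
int dispatch_cgi(void)
{
    const char *script = getenv("SCRIPT_NAME");
    const char *name;

    if (!script)
        return -1;
    name = strrchr(script, '/');
    name = name ? name + 1 : script;
    if (strcmp(name, "list.cgi") == 0)
        return show_list();
    if (strcmp(name, "detail.cgi") == 0)
        return show_detail();
    return -1;
}
```

With list.cgi and detail.cgi both being links to the same binary, the server sets SCRIPT_NAME differently for each URL and the one executable behaves as two scripts.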

Mac sandbox: running a binary tool that needs /tmp

I have a sandboxed Cocoa app that, during an export process, needs to run a third party command-line tool. This tool appears to be hardcoded to use /tmp for its temporary files; sandboxing doesn't permit access to this folder, so the export fails.
How can I get this tool to run? I don't have access to its source code, so I can't modify it to use NSTemporaryDirectory(), and it doesn't appear to respect the TMP or TEMPDIR environment variables. For reasons I don't understand, giving myself a com.apple.security.temporary-exception.files.absolute-path.read-write entitlement doesn't seem to work, either.
Is there some way to re-map folders within my sandbox? Is there some obscure trick I can use? Should I try to patch the tool's binary somehow? I'm at my wit's end here.
I was able to get user3159253's DYLD_INSERT_LIBRARIES approach to work. I'm hoping they will write an answer describing how that works, so I'll leave the details of that out and explain the parts that ended up being specific to this case.
Thanks to LLDB, elbow grease, and not a little help from Hopper, I was able to determine that the third-party tool used mkstemp() to generate its temporary file names, and some calls (not all) used a fixed template starting with /tmp. I then wrote a libtmphack.dylib that intercepted calls to mkstemp() and modified the parameters before calling the standard library version.
Since mkstemp() takes a pointer to a preallocated buffer, I didn't feel like I could rewrite a path starting with a short string like "/tmp" to the very long string needed to get to the Caches folder inside the sandbox. Instead, I opted to create a symlink to it called "$tmp" in the current working directory. This could break if the tool chdir()'d at an inopportune time, but fortunately it doesn't seem to do that.
Here's my code:
//
// libtmphack.c
// Typesetter
//
// Created by Brent Royal-Gordon on 8/27/14.
// Copyright (c) 2014 Groundbreaking Software. This file is MIT licensed.
//
#include "libtmphack.h"
#include <dlfcn.h>
#include <stdio.h>   // for printf()
#include <stdlib.h>
#include <unistd.h>
//#include <errno.h>
#include <string.h>
static int gbs_has_prefix(char * needle, char * haystack) {
    return strncmp(needle, haystack, strlen(needle)) == 0;
}
int mkstemp(char *template) {
    static int (*original_mkstemp)(char * template) = NULL;
    if(!original_mkstemp) {
        original_mkstemp = dlsym(RTLD_NEXT, "mkstemp");
    }
    if(gbs_has_prefix("/tmp", template)) {
        printf("libtmphack: rewrote mkstemp(\"%s\") ", template);
        template[0] = '$';
        printf("to mkstemp(\"%s\")\n", template);
        // If this isn't successful, we'll presume it's because it's already been made
        symlink(getenv("TEMP"), "$tmp");
        int ret = original_mkstemp(template);
        // Can't do this, the caller needs to be able to open the file
        // int retErrno = errno;
        // unlink("$tmp");
        // errno = retErrno;
        return ret;
    }
    else {
        printf("libtmphack: OK with mkstemp(\"%s\")\n", template);
        return original_mkstemp(template);
    }
}
Very quick and dirty, but it works like a charm.
Since @BrentRoyal-Gordon has already published a working solution I'm simply duplicating my comment which inspired him to produce the solution:
In order to fix a program behavior, I would intercept and override some system calls with the help of DYLD_INSERT_LIBRARIES and a custom shared library with a custom implementation of the given system calls.
The exact list of the calls which need to be overridden depends on the nature of the application and can be studied with tools such as dtruss, which is built on the macOS DTrace kernel facility, or a disassembler such as Hopper. @BrentRoyal-Gordon found that the app can be fixed solely with an /appropriate/ implementation of mkstemp.
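For completeness, here is one way the launching side could look in plain C: set DYLD_INSERT_LIBRARIES in the environment, then exec the tool so dyld loads the custom library first. This is a sketch under assumptions; exec_with_preload is a made-up name, and a Cocoa app would more likely set the same variable in the environment dictionary of whatever it uses to launch the task.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/*
 * Replace the current process with tool_path, asking dyld to load
 * dylib_path before the tool's own libraries. Intended to be called in a
 * child process after fork(). Only returns (with -1) if something failed.
 */
int exec_with_preload(const char *dylib_path, const char *tool_path,
                      char *const argv[])
{
    if (setenv("DYLD_INSERT_LIBRARIES", dylib_path, 1) != 0)
        return -1;
    execv(tool_path, argv);   /* inherits the modified environment */
    return -1;                /* execv only returns on error */
}
```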
That's it. I'm still not sure that I've deserved the bounty :)
Another solution would be to use chroot within the child process (or posix_spawn options) to change its root directory to a directory that is within your sandbox. Its “/tmp” will then be a “tmp” directory within that directory.

program doesn't work if called from another folder

In Command Prompt, this works: whatever\folder> bezier.exe
but this doesn't: whatever> folder\bezier.exe
My bezier program loads some settings from a local file, so I believe the problem is that the program thinks its directory is whatever\ when it is actually whatever\folder\. I'm calling it from within a C program using CreateProcess(). If I am correct in guessing the problem, is there any way to ensure the program has the right directory for itself?
the main method of bezier.exe:
int main(int argc, char* argv[]) {
    char buf[200];
    FILE* f = fopen("out.txt","w");
    GetCurrentDirectory(200,buf);
    fprintf(f,"%s",buf);
    fclose(f);
    SDL_Surface* screen;
    SDL_Event e;
    SDL_Init(SDL_INIT_VIDEO);
    screen = SDL_SetVideoMode(WIDTH, HEIGHT, 32, SDL_FULLSCREEN|SDL_HWSURFACE);
    if (screen == NULL)
        exit(-1);
    SDL_ShowCursor(SDL_DISABLE);
    srand(time(NULL));
    loadColors(COLOR_FILE);
    fill(screen, backColor);
    initialiseVars();
    while (e.type != SDL_KEYDOWN)
    {
        //do stuff
    }
    SDL_Quit();
    return 0;
}
Here's the crazy part. With "..> folder\bezier.exe" it doesn't write its path, but it does start a new window. That doesn't make any sense to me, because SDL_SetVideoMode is after writing the path.
You can use GetModuleHandle and GetModuleFileName to find out where your executable file is, then use that information to create a file specification for your local settings file.
GetModuleHandle with a NULL argument will give you the handle for the current executable. Then, passing that to GetModuleFileName will give you the fully qualified name of that executable. Just strip off the executable filename from the end and add your configuration file name.
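The "strip off the executable filename and add your configuration file name" step might look roughly like this. sibling_path and settings_path are hypothetical helper names; the path manipulation is portable, and only the part inside #ifdef _WIN32 touches the Windows API.

```c
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Write "<directory part of exe_path><file_name>" into out.
 * Accepts either '\\' or '/' as the separator. Returns 0 on success,
 * -1 if the result does not fit in out_size bytes.
 */
int sibling_path(const char *exe_path, const char *file_name,
                 char *out, size_t out_size)
{
    const char *sep = strrchr(exe_path, '\\');
    const char *fwd = strrchr(exe_path, '/');
    size_t dir_len;
    int n;

    if (!sep || (fwd && fwd > sep))
        sep = fwd;                      /* use whichever separator comes last */
    dir_len = sep ? (size_t)(sep - exe_path) + 1 : 0;
    n = snprintf(out, out_size, "%.*s%s", (int)dir_len, exe_path, file_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

#ifdef _WIN32
/* Windows only: build the path of file_name next to the running executable. */
int settings_path(const char *file_name, char *out, size_t out_size)
{
    char exe[MAX_PATH];
    DWORD len = GetModuleFileNameA(GetModuleHandleA(NULL), exe, MAX_PATH);

    if (len == 0 || len >= MAX_PATH)    /* failure or truncated result */
        return -1;
    return sibling_path(exe, file_name, out, out_size);
}
#endif
```

For an executable at whatever\folder\bezier.exe and a settings file called, say, colors.txt, this produces whatever\folder\colors.txt no matter which directory the program was started from.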
However, that's been a bad idea for a long time now, since Windows provides per-application and per-user settings areas for this sort of stuff - you can generally get those locations with SHGetFolderLocation() and its brethren.
Use the first method only if this is for a personal project. If you plan to release your software to the wild, you should separate executable and configuration information as per Microsoft guidelines.
Regardless of that, it appears you now have the problem that you think the file is not being written to. You need to check that. When you open that file out.txt for write, it does so in the current directory. If you're running in the parent directory (with folder\bezier.exe), it will create it in the parent directory and looking for it in the folder directory is a waste of time.
If you are looking in the directory where you're running the program from, and it's still not being created, there are possible reasons for this. For a start, you should check (ie, capture and output) the return codes from all those f* functions, fopen, fprintf and fclose.
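Checking those return codes could look something like the sketch below. write_checked is a made-up helper name; the point is only that each of fopen, fprintf and fclose can fail independently and each failure is reported.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write text to path, reporting on stderr which stdio call failed.
 * Returns 0 on success, -1 on failure.
 */
int write_checked(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");

    if (!f) {
        fprintf(stderr, "fopen(%s): %s\n", path, strerror(errno));
        return -1;
    }
    if (fprintf(f, "%s", text) < 0) {
        fprintf(stderr, "fprintf to %s failed\n", path);
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0) {               /* buffered data is flushed here */
        fprintf(stderr, "fclose(%s): %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```

If fopen is the call that fails, the message from strerror(errno) usually says why (no such directory, permission denied, and so on), which narrows down where out.txt was supposed to go.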

Distinguish between optional arguments, pathname or file? c language

I am very new to C and even less experienced with any other language :/
For an assignment at uni, I am a little stuck on this small part. Essentially I am required to write a 'ls' function that has 4 optional arguments, for example:
list [-l] [-f] [pathname] [localfile]
Now, the first two are straightforward. To make things more difficult, the 'localfile' doesn't necessarily exist and the 'pathname' (if given) will be located on the server I'm connecting to through a socket (so checking if it is a file is out and checking the pathname is out). I was thinking of checking the last 4 chars in the string for a '.txt' or something similar. I'm actually completely stumped and will present this problem to my course convenor tomorrow if I can't find a solution.
This is a very small part of what I actually have to do but any push in the right direction would be appreciated.
You will need to process argc and argv to get your command line arguments. That is the first thing to work on, getting the arguments - ensuring they are correct, and determining what is being asked for.
int main(int argc, char *argv[])
Assuming you are on Linux/Unix, you will need to use the directory functions opendir()/readdir()/closedir(), declared in dirent.h. The stat() function will be required to satisfy the -l requirement. access() will determine if a file exists and then stat() will tell you if the file is a regular file or a directory.
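A bare-bones version of that listing loop might look like this, assuming a POSIX system. list_dir is a name I picked, and the "long" output here (a type letter and the size) is much simpler than what a real ls -l prints.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Print the entries of dir_path, one per line. When long_format is
 * non-zero, each line also shows 'd' for directories ('-' otherwise) and
 * the size in bytes. Returns the number of entries printed, or -1 if the
 * directory cannot be opened.
 */
int list_dir(const char *dir_path, int long_format)
{
    DIR *dir = opendir(dir_path);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char full[4096];
        struct stat st;

        if (long_format &&
            snprintf(full, sizeof full, "%s/%s", dir_path, ent->d_name) < (int)sizeof full &&
            stat(full, &st) == 0)
            printf("%c %10lld %s\n", S_ISDIR(st.st_mode) ? 'd' : '-',
                   (long long)st.st_size, ent->d_name);
        else
            printf("%s\n", ent->d_name);   /* plain listing, or stat failed */
        count++;
    }
    closedir(dir);
    return count;
}
```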
I'd make a struct to hold the four optional arguments and return it from a function called "process_arguments" that takes argc and argv as parameters.
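#include <stdbool.h>
struct args {
    bool valid;           /* false if the command line could not be parsed */
    bool l_option;        /* -l was given */
    bool f_option;        /* -f was given */
    char directory[200];  /* pathname on the server, empty if not given */
    char filename[200];   /* local file, empty if not given */
};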
With the requirement for a socket connection you will have to write a "server program" that will be constantly running on the server and a "client program" that it will fork to handle the requests from your local program. Try and locate examples of socket programs.
Another check for whether you have a path string or a filename is to look for the path separator character - '/' if the server is Unix/Linux. This scheme shouldn't have any path separators in filenames, so the presence of one tells you it is a path.
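Putting the struct and the separator check together, process_arguments could be sketched as follows. The classification rule (an operand containing '/' is the remote pathname, anything else is the local file) is just the heuristic suggested above, not something the assignment guarantees, and over-long arguments are silently truncated here to keep the example short.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct args {
    bool valid;           /* false if the command line could not be parsed */
    bool l_option;        /* -l was given */
    bool f_option;        /* -f was given */
    char directory[200];  /* pathname on the server, empty if not given */
    char filename[200];   /* local file, empty if not given */
};

/* Parse: list [-l] [-f] [pathname] [localfile] */
struct args process_arguments(int argc, char *argv[])
{
    struct args a;
    int i;

    memset(&a, 0, sizeof a);
    a.valid = true;
    for (i = 1; i < argc; i++) {
        const char *s = argv[i];

        if (strcmp(s, "-l") == 0)
            a.l_option = true;
        else if (strcmp(s, "-f") == 0)
            a.f_option = true;
        else if (s[0] == '-')
            a.valid = false;                    /* unknown option */
        else if (strchr(s, '/') && a.directory[0] == '\0')
            snprintf(a.directory, sizeof a.directory, "%s", s);
        else if (a.filename[0] == '\0')
            snprintf(a.filename, sizeof a.filename, "%s", s);
        else
            a.valid = false;                    /* too many operands */
    }
    return a;
}
```

For example, "list -l /srv/docs out.txt" would give l_option set, directory "/srv/docs" and filename "out.txt".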
