Retrieving Global Variable Values from Command Line - c

In one particular project, we're trying to embed version information into shared object files. We'd like to be able to use some standard linux tool to parse the shared object to determine the version for automated testing.
Currently I have "const int plugin_version = 14;". I can use 'nm' and 'objdump' and verify that it's there:
00000000000dcfbc r plugin_version
I can't, however, seem to be able to get the value of that variable easily from command line. I figured there'd be a POSIX tool for showing the initialized values for globals. I have contemplated using a format for the variable as the information itself, ie, plugin_version_14, but that seems like a huge hack. Embedding the information in the filename unfortunately is NOT an option. Any other suggestions welcome.

You could embed it as a string
"MAGIC MARKER STRING VERSION: 4.56 END OF MAGIC" then just look for "MAGIC MARKER STRING" in the file and extract the version information that comes after it.
if you make it a standard, you could easily make command line tool to find these embeded strings on all your software.
if you require it also to be an int, a little macro magic will construct both the int and magic string to make sure they are never out of synch.

There's a couple of options I think.
My first instinct is to make sure the version information lives in its own section in the ELF file. You can use objdump -s -j name of section /bin/whatever.
This rather relies on objdump being available of course.
Alternatively you can do what Keith suggested, and just use 'strings', along with a magical marker string. This feels a little hackish, but should work quite well.
Finally, why don't you just add a --version command line option? You can then store the version information however you like, and trivially retrieve it using the one tool which is certain to be installed on any system which has your software.

A terrible hack that I've used in the past is to embed the version information in a variable name, so nm will show:
00000000000dcfbc r plugin_version_14

Why not writing your own tool to get that version in C/C++ ? You could Use dlopen, then dlsym to get the symbol and print its value to standard output. This way you also verify if the symbol is already there. It looks like 20 ~ 30 lines of code to me and about 20 minutes of your life :)
I know that the question is about command line, but writing such a tool yourself should be easy (especially if such a command line tool does not exist).

If the binary is not stripped, you could use gdb to print the variable. (I just tried to script gdb, but it seems to refuse work if stdin is not a tty, maybe expect will do the job ? )

If you can accept using python, this might help:
import struct
import sys
import subprocess
if __name__ == '__main__':
so = sys.argv[1]
sym = sys.argv[2]
addr = subprocess.check_output('nm %s | grep %s' % (so, sym), shell=True)
addr = int(addr.split()[0], 16)
so_file = open(so)
so_file.seek(addr)
data = so_file.read(4)
print struct.unpack('#i', data)[0]
Disclaimer: This script doesn't do any error checking (if you like it I'm sure you can come up with some ;)). It also assumes you're reading a 4-byte native int value.
$ cat global.c
const int plugin_version = 14;
$ python readsym.py global.so plugin_version
14

Related

using GDB with arguments

For a class assignment we needed to write a compiler. This includes an optimizer portion. In other words, we take in a file with some "code". An output file is generated. In the second step we take in the outputted code and remove any "dead" code and re-output to a second file. I have some problems with the optimizer portion and would like to use gdb. But I can't get gdb to operate properly with the input and output files arguments. The way we would normally run the optimizer is:
./optimize <tinyL.out> optimized.out
where tinyL.out is the file outputted in the first step and optimized.out is the file I want to output with the new optimized and compiled code.
I have searched Google for the solution and the tips I have found do not seem to work for my situation. Most people seem to want to only accept an input file and not output a separate file as I need to do.
Any help is appreciated (of course)
I'm not exactly sure what you're asking. But since I'm not yet able to comment everywhere, I write this answer with a guess and edit/delete if necessary.
When GDB is started and before you start the program you wish to debug, set the arguments you want to use with set args.
A reference to the documentation.
You just need to do the file redirection within gdb.
gdb ./optimize
(gdb) run < tinyL.out > optimized.out
https://stackoverflow.com/a/2388594/5657035

Frama-c: save plugin analysis results in c file

I'am new in frama-c. So I apologize in advance for my question.
I would like to make a plugin that will modify the source code, clone some functions, insert some functions calls and I would like my plugin to generate a second file that will contain the modified version of the input file.
I would like to know if it is possible to generate a new file c with frama-c. For example, the results of the Sparecode and Semantic constant folding plugins are displayed on the terminal directly and not in a file. So I would like to know if Frama-c has the function to write to a file instead of sending the result of the analysis to the standard output.
Of course we can redirect the output of frama-c to a file.c for example, but in this case, for the plugin scf for example, the results of value is there and I found that frama-c replaces for example the "for" loops by while.
But what I would like is that frama-c can generate a file that will contain my original code plus the modifications that I would have inserted.
I looked in the directory src / kernel_services / ast_printing but I have not really found functions that can guide me.
Thanks.
On the command line, option -ocode <file> indicates that any subsequent -print will be done in <file> instead of the standard output (use -ocode "" after that if you want to print on stdout again). Note that -print prints the code corresponding to the current project. You can use -then-on <prj> to change the project you're interested in. More information is of course available in the user manual.
All of this is of course available programmatically. In particular, File.pretty_ast by defaults pretty-prints (i.e. output a C program) the AST of the current project on stdout, but takes two optional argument for changing the project or the formatter to which the output should be done.

How do I trace coreutils down to a syscall?

I am trying to navigate and understand whoami (and other coreutils) all the way down to the lowest level source code, just as an exercise.
My dive so far:
Where is the actual binary?
which whoami
/usr/bin/whoami
Where is it maintained?
http://www.gnu.org/software/coreutils/coreutils.html
How do I get source?
git clone git://git.sv.gnu.org/coreutils
Where is whoami source code within the repository?
# find . | grep whoami
./man/whoami.x
./man/whoami.1
./src/whoami.c
./src/whoami
./src/whoami.o
./src/.deps/src_libsinglebin_whoami_a-whoami.Po
./src/.deps/whoami.Po
relevant line (84):
uid = geteuid ();
This is approximately where my rabbit hole stops. geteuid() is mentioned in gnulib/lib/euidaccess.c, but not explicitly defined AFAICT. It's also referenced in /usr/local/unistd.h as extern but there's no heavy lifting related to grabbing a uid that I can see.
I got here by mostly grepping for geteuid within known system headers and includes as I'm having trouble backtracing its definition.
Question: How can I dive down further and explore the source code of geteuid()? What is the most efficient way to explore this codebase quickly without grepping around?
I'm on Ubuntu server 15.04 using Vim and some ctags (which hasn't been very helpful for navigating existing system headers). I'm a terrible developer and this is my method of learning, though I can't get through this roadblock.
Normally you should read the documentation for geteuid. You can either read GNU documentation, the specification from the Open Group or consult the man page.
If that doesn't help you could install the debug symbols for the c-library (it's called libc6-dbg or something similar) and download the source code for libc6) then you point out the path to the source file when you step into the library.
In this case I don't think this would take you much further, what probably happens in geteuid is that it simply issues an actual syscall and then it's into kernel space. You cannot debug that (kernel) code in the same way as you would debug a normal program.
So in your case you should better consult the documentation and read it carefully and try to figure out why geteuid doesn't return what you expect. Probably this will lead to you changing your expectation of what geteuid should return to match what's actually returned.

How do I check /etc/network/interfaces file grammar and test it in C code?

I have a file ("interfaces_new" like /etc/network/interfaces) that I have to test.
Tests concerns File grammar and if I can detect if the settings are correct by testing, it is only better.
I searched and I fell on the ifup command that seems to do everything I want.
The trouble is that I do not want to call system in my C code, so I try to retrieve the sources of ifupdown.
I do not even succeeded, many result indicating to me that a script ifup on most system. On my Debian 7.1.0, it is a EFL binary.
What comes to my questions:
* Is it a usable tool in C code which allows to parse the grammar of a file / etc / network / interfaces?
* Is there a doc listing all the valid options? ifup man gives nothing (or I look bad :/ ), the idea being to make my own parser.
* How to get the source code of the ifup command to Debian 7.1.0?
As I originally commented:
You should read and learn more about parsing. Read interfaces(5) to understand the format of /etc/network/interfaces
The ifup command is from ifupdown package, so get its source code.
At last, you simply could popen(3) the following command
ifup --verbose --no-act --force --all --interfaces=/tmp/interfaces_new
assuming your file is in /tmp/interfaces_new.
(Warning, I did not test my suggestion, leaving that to you)

Should I unescape bytea field in C-function for Postgresql and if so - how to do it?

I write my own C function for Postgresql which have bytea parameter. This function is defined as followed
CREATE OR REPLACE FUNCTION putDoc(entity_type int, entity_id int,
doc_type text, doc_data bytea) RETURNS text
AS 'proj_pg', 'call_putDoc'
LANGUAGE C STRICT;
My function call_putDoc, written on C, reads doc_data and pass its data to another function like file_magic to determine mime-type of the data and then pass data to appropriate file converter.
I call this postgresql function from php script which loads file content to last parameter. So, I should pass file contents with pg_escape_bytea.
When data are passed to call_putDoc C function, does its data already unescaped and if not - how to unescape them?
Edit: As I found, no, data, passed to C function, is not unescaped. How to unescape it?
When it comes to programming C functions for PostgreSQL, the documentation explains some of the basics, but for the rest it's usually down to reading the source code for the PostgreSQL server.
Thankfully the code is usually well structured and easy to read. I wish it had more doc comments though.
Some vital tools for navigating the source code are either:
A good IDE; or
The find and git grep commands.
In this case, after having a look I think your bytea argument is being decoded - at least in Pg 9.2, it's possible (though rather unlikely) that 8.4 behaved differently. The server should automatically do that before calling your function, and I suspect you have a programming error in how you are calling your putDoc function from SQL. Without sources it's hard to say more.
Try calling it putDoc from psql with some sample data you've verified is correctly escape encoded for your 8.4 server
Try setting a breakpoint in byteain to make sure it's called before your function
Follow through the steps below to verify that what I've said applies to 8.4.
Set a breakpoint in your function and step through with gdb, using the print function as you go to examine the variables. There are lots of gdb tutorials that'll teach you the required break, backtrace, cont, step, next, print, etc commands, so I won't repeat all that here.
As for what's wrong: You could be double-encoding your data - for example, given your comments I'm wondering if you've base64 encoded data and passed it to Pg with bytea_output set to escape. Pg would then decode it ... giving you a bytea containing the bytea representation of the base64 encoding of the bytes, not the raw bytes themselves. (Edit Sounds like probably not based on comments).
For correct use of bytea see:
http://www.postgresql.org/docs/current/static/runtime-config-client.html
http://www.postgresql.org/docs/current/static/datatype-binary.html
To say more I'd need source code.
Here's what I did:
A quick find -name bytea\* in the source tree locates src/include/utils/bytea.h. A comment there notes that the function definitions are in utils/adt/varlena.c - which turns out to actually be src/backend/util/adt/varlena.c.
In bytea.h you'll also notice the definition of the bytea_output GUC parameter, which is what you see when you SHOW bytea_output or SET bytea_output in psql.
Let's have a look at a function we know does something with bytea data, like bytea_substr, in varlena.c. It's so short I'll include one of its declarations here:
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
PG_RETURN_BYTEA_P(bytea_substring(PG_GETARG_DATUM(0),
PG_GETARG_INT32(1),
PG_GETARG_INT32(2),
false));
}
Many of the public functions are wrappers around private implementation, so the private implementation can be re-used with functions that have differing arguments, or from other private code too. This is such a case; you'll see that the real implementation is bytea_substring. All the above does is handle the SQL function calling interface. It doesn't mess with the Datum containing the bytea input at all.
The real implementation bytea_substring follows directly below the SQL interface wrappers in this partcular case, so read on in varlena.c.
The implementation doesn't seem to refer to the bytea_output GUC, and basically just calls DatumGetByteaPSlice to do the work after handling some border cases. git grep DatumGetByteaPSlice shows us that DatumGetByteaPSlice is in src/include/fmgr.h, and is a macro defined as:
#define DatumGetByteaPSlice(X,m,n) ((bytea *) PG_DETOAST_DATUM_SLICE(X,m,n))
where PG_DETOAST_DATUM_SLICE is
#define PG_DETOAST_DATUM_SLICE(datum,f,c) \
pg_detoast_datum_slice((struct varlena *) DatumGetPointer(datum), \
(int32) (f), (int32) (c))
so it's just detoasting the datum and returning a memory slice. This leaves me wondering: has the decoding been done elsewhere, as part of the function call interface? Or have I missed something?
A look at byteain, the input function for bytea, shows that it's certainly decoding the data. Set a breakpoint in that function and it should trip when you call your function from SQL, showing that the bytea data is really being decoded.
For example, let's see if byteain gets called when we call bytea_substr with:
SELECT substring('1234'::bytea, 2,2);
In case you're wondering how substring(bytea) gets turned into a C call to bytea_substr, look at src/catalog/pg_proc.h for the mappings.
We'll start psql and get the pid of the backend:
$ psql -q regress
regress=# select pg_backend_pid();
pg_backend_pid
----------------
18582
(1 row)
then in another terminal connect to that pid with gdb, set a breakpoint, and continue execution:
$ sudo -u postgres gdb -q -p 18582
Attaching to process 18582
... blah blah ...
(gdb) break bytea_substr
Breakpoint 1 at 0x6a9e40: file varlena.c, line 1845.
(gdb) cont
Continuing.
In the 1st terminal we execute in psql:
SELECT substring('1234'::bytea, 2,2);
... and notice that it hangs without returning a result. Good. That's because we tripped the breakpoint in gdb, as you can see in the 2nd terminal:
Breakpoint 1, bytea_substr (fcinfo=0x1265690) at varlena.c:1845
1845 PG_RETURN_BYTEA_P(bytea_substring(PG_GETARG_DATUM(0),
(gdb)
A backtrace with the bt command doesn't show bytea_substr in the call path, it's all SQL function call machinery. So Pg is decoding the bytea before it's passing it to bytea_substr.
You can now detach the debugger with quit. This won't quit the Pg backend, only detach and quit the debugger.

Resources