is it possible to get the the path where the program was called from?
I call the program on z/Os like this
call 'MCOE.XXXXXXXX.C.LOAD(args)' 'hi there'
My intention is to get the MCOE.XXXXXXXX.C.LOAD dataset in called program without specifying this path as a parameter.
Thanks!
PetrS
This is a non-trivial exercise, but it can generally be done.
You'd start by using CSVINFO to get some information about your program, and then the trick is to emulate the search z/OS would have done to find your module...private/task library, STEPLIB/JOBLIB, (M)LPA search, LNKLST, etc - once you know the load module name definitely (your "args" program name might be an alias, or something the caller setup with an IDENTIFY macro), you can get a lot of this with BLDL, assuming you know which DCB to use.
Once you figure out the DDNAME and concatenation number (after all, there might be 10 libraries in your STEPLIB!), you'd scan the allocation data structures to get the actual dataset name. Typically, this is done by traversing the data structures in memory (PSATOLD->TCBTIO, then indexing into the TIOT till you find the entry you want...the matching TIOT entry will have a pointer to the JFCB - or a SWA manager token - that you can use to get the JFCB, and the JFCB has the dataset name and all the other details you want).
In the case of a fetch from LNKLST, you have extra work to figure out exactly which dataset in the LNKLST concatenation you were fetched from. Again, possible but it requires a bit of finesse.
If your program happens to be in (M)LPA, I'm not sure you can reliably retrieve the original dataset name it was fetched from - this might be the worst case, although there are no doubt a variety of other potential challenges, such as dealing with UNIX Services executable pathnames.
Good luck if you decide to give it a try!
Related
For a certain time now, I'm looking to build a logging framework in C (not C++!), but for small microcontrollers or devices with a small footprint of some sort. For this, I've had the idea of hashing the strings that are being logged to a certain value and just saving the hashed value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file that would be generated from an external process that parses the strings out of the C source files and saves the logged strings along with the hash value.
After doing a little bit of research, this idea is not new, but I do not find an implementation of this idea in C. In other languages, this idea has been worked out, but that is not the goal of my exercise. An example may be this talk where the same concept has been worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
as portable C code as possible
COMPILE TIME optimization/hashing for the string hash conversion, it should be equivalent to just printf("%d\n", hashed_value) for a single log statement. (Assuming no parameters/arguments for this particular logging statement).
arguments can be passed to the logging statement similar to the printf function.
user can define their own output function (being console, file descriptor, sending the data directly over an UART connection,...)
fast to run!! fast to compile is nice to have, but it should not be terribly slow.
very easy to use, no very complicated API to use the library.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full pre-processor string hashing: did get it working, but the compile time is terribly slow. Also, this code does not feel to be very portable over multiple C compilers.
Semi pre-processor string hashing: The idea was to generate a hash for each string and make an external header file with the defines in of each string with their hash value. The problem here is that I cannot figure out a way of converting the string to the correct define preprocessor value.
Letting go of the default logging macro with a string pointer: Instead of working with the most used method of LOG_DEBUG("Some logging statement"), converting it with an external parser to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the problem of hashing the string since the hash will be replaced by the external parser with the correct hash, but is not the cleanest to read since the original statement will be a comment.
Also expanding this idea to take care of arguments proved to be tricky. How to take care of multiple types of variables as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...
I have this funny idea: write some data (say variable of integer type) to the end of the executable itself and then read it on the next run.
Is this possible? Is it a bad thing to do (I'm pretty sure it's :) )? How one would approach this problem?
Additional:
I would prefer to do this with C under Linux OS, but answers with any combination of programming language/OS would be appreciated.
EDIT:
After some time playing with the idea, it became apparent that Linux won't allow to write to a file while it's being executed. However, it allows to delete it.
My vision of the writing process at this point:
make a copy of the program from withing a program
append data to the end of the copy
make a program to delete itself
rename copy to the original name
Will try to implement that as soon as I have some time.
If anyone is interested about how "delete itself" works under Linux - look for info about inode. It's not possible to do this under Windows, as far as I know (might be wrong).
EDIT 2:
Have implemented a working example under Linux with C.
It basically use a strategy described above, i.e. appending bits of data to the end of the copy program, deletes itself and renaming program to the original name. It accepts integers to save as single argument in the CLI, and prints old data as well.
This surely won't work under Windows (although I found some options on a quick search), but I'm curious how it's gonna behave under OS X.
Efficiency thoughts:
Obviously copying whole executable isn't efficient. I guess that something faster is possible with another helper executable that will do the same after program stops executing.
It's not reusing old space but just appending new data to the end on each run. This can be fixed with some footer reservation process (maybe will try to implement this in the future)
EDIT 3:
Surprisingly, it works with OS X! (ver. 10.11.5, default gcc).
There are several questions dealing with some aspects of this problem, but neither seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add an arbitrarily sized binary data to it (not necessarily by itself which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case are self-extracting archives, where a program (the archiving utility, such as zip) is capable to construct such an executable which contains a pre-built decompressor (the already compiled executable), and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (Thanks, Mathias for the note and pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - In companion with the above, this would allow getting a file name to use for opening the exe, to access the added data. There are many more of these kind of questions, however neither focuses especially on the problem of getting a path suitable for the purpose of actually getting the binary opened as a file (which goal alone might (?) be easier to accomplish - truly you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform? (At least from maintenance standpoint) Note that it would be preferred if the program performing the adding of the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessiting those might also be useful.
Note the already compiled executable criteria (the first point in the above list), which requires a completely different approach than solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable , which ask for embedding data compile-time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, that to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem something trivial (opening the executable for reading is, but not finding the path to supply to the file open call, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program to see different binary data (provided by him) like which the executable actually has, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digital signed in Windows
The exe format allows for verifying the file has not been modified since publishing. This would allow you to :-
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system, is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme, is to use a resource. Windows resources can be added post- linking. They are protected by the authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to increase the signature to include binary data. Unfortunately this has been banned. There were binaries which used data in the signature section. Unfortunately this was used maliciously. Some details here msdn blog
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self or /proc/pid.
Unix does not seem to have a method which is reliable.
Data reading
The approach of the zip format, is to have a directory written to the end of the file. This means the data can be found at the end of the location, and then looked backwards for the start of the data. The advantage here, is the data blob is signposted from the end of the data, rather than the natural start.
Is it possible to store the data in gdb in some data structure like a dictionary (some sort of key value pairs).
I have a core and I want to get some important statistics from this core. I am able to scan through the data structure which I want to dump. However I want to extract some more meaningful information while walking the data structure.
Example: Just a trivial example, While walking the data structure, I want to know how many times the element occurs in the data structure of my interest.
Is there a way we can create a dictionary in gdb which holds this as the key and the occurrences as the value?
It could maybe be done using the gdb CLI. However, it's likely to be an enormous pain -- especially because you want to debug core files, and for any non-trivial data structure, gdb wants to allocate memory in the inferior, which can't be done in this case.
So, save yourself a lot of pain and write a Python script using gdb's Python scripting API. By taking this route you have access to all of Python's data structures. And, it is simple to use gdb's API to walk data structures in your core file, or other things like that.
So I've run into an interesting design pattern and I wanted to know if you guys had an opinion on it.
Basically, the design is passing everything around as a pre-serialized type. There is no "types" for the returns, for example. It is passed as a simple uint8_t*. There is a defined header that "tells" you what is in the buffer, how big it is, what the version of the buffer is, ect. I call it "pre-serialized" because it forces flattening of all structures.
The pros:
You can easily write it (or even a set of it) to what ever you want. Files, IO, whatever.
Can store arbitrary data.
The Cons: IMHO:
No type safety is going to be a nightmare
The programmer has to parse the code. Even if there is an enumerated type, the user would have to know what that type means. Even if there are functions to parse the type, the programmer has to know that is the function to call.
Version hell: changing code will cause a ripple effect of errors. Because everywhere is parsing it differently, you have no idea where the code works or where it is broken.
It is viral: because it is flat, you can't "insert" the header on the end of outside data. You could wrap the call if you copy your "data", but this could cause an unnecessary copy that would be SLOW. So either your code is slower than it needs to be, or you conform to this data structure.
It isn't human readable OR debug-able.
Have you seen this design pattern before? Is there a name for this design pattern? Things I missed?
Is there a name for this design pattern?
Well, Legacy Code? :) I have seen such design in 30 years old Cobol systems...
The pros you have stated are easily reachable also by using XML format (or JSON):
You can easily write it (or even a set of it) to what ever you want. Files, IO, whatever - most of all, web services!
Can store arbitrary data.
Furthermore, all your cons are eliminated.
The only pro I can see in your solution is conciseness - when every byte counts and you need to avoid any overhead as too expensive, then this is nice.
Added: Cobol has a feature to easily define the structure of such serialized data, see PICTURE clause. Reading the data is very easy then, you read them as variables. (Like if you have a binary data and define a struct in the C language and typecast the binary to the struct.)
As Honza said this would be normal in Legacy Cobol/PL1 (was there a Cobol/PL1 conversion or interface to COBOL programs ???).
In COBOL this design pattern would make sense, not sure about C though (one of the binary serialization packages or JSON etc might be more sensible).
In Cobol, you would have a Cobol copybook which all programs would use and could edit the data using the Cobol Copybook (with something like file-aid or Microfocus Data Editor).
Why use this "design pattern" in Cobol:
Regression testing of Modules; you can write a driver module like
Read Test-data-file
while more-data
Call Module
write Result to output-file
Read Test-data-file
end
You can then do a compare between Output from the
re-Change Program to the changed program.
Testing - some times you can use a "production file" in testing
A file provides trace or snapshot of what is going on, this can be very useful.
Easy to reorganize Batch streams:
Split a programs up (and pass the data via file). There variety of reason for doing this including
program has gotten to big and is hard to maintain.
Sorting the data
Performance (use a file rather than hitting the DB multiple times)
new uses for extracted data
While your cons are valid for C, they will be less of an issue in Cobol.
The key to using this "design pattern" is being able to edit/view/compare the format. If you can not edit/view/compare a file, I do not see the point