How To Read MVS System Catalog To Retrieve GDG Information?

I have a job (JCL) on the mainframe where I want to programmatically retrieve a particular GDG's most recent relative generation numbers from the system catalog (via an API call), so that my program can then dig through the results returned by the call to figure out the relative generation numbers. This is similar to doing a TSO 3.4 on the GDG base name, where the most recent generation numbers can be seen. IDCAMS doesn't appear to return the information in a program-friendly format. Thanks!
Example: GDG BASE NAME: TEST.FILE
GDG generations: TEST.FILE.G0010V00
TEST.FILE.G0011V00
TEST.FILE.G0012V00

Take a look at IGGCSI00, the catalog interface. You can call it from any program (REXX, CLIST, COBOL, assembler, PL/I), and it offers a lot of flexibility. Of course, like a lot of IBM flexible solutions, there is always some obtuseness.
There are lots of examples around the Internet, but the sample program in SYS1.SAMPLIB(IGGCSIRX) is excellent.

Programmatically (in assembler language), you could use the LOCATE SVC, with CAMLST specifying the parameter list, to get the information you're seeking. Here's a reference: https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idas300/s3099.htm. The example there only shows how to get a volume list, but I used this in the early '80s to get the GnnnnVnn (generation-version) qualifiers corresponding to the relative generation numbers: pass the GDG base DSNAME and you get all the generations back. If you'd like to see some threads on this, try searching bit.listserv.ibm-main; you could also search the online IBM manuals for the term "Generation Index Pointer Entry" (GIPE), which is a pivotal part of the associated control blocks.

Your choices include:
IDCAMS/TSO Listcat and writing a program to reformat
Rexx ListDsi command
In particular, for ListDsi you can have the following in the JCL:
//MYGDG DD DSN=my.gdg(0),DISP=SHR
and in the rexx program
x = ListDsi("MYGDG FILE")
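/* SYSDSNAME now contains the resolved absolute generation name, e.g. TEST.FILE.G0012V00 */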
say SYSDSNAME
You could also use background ISPF services, but that is overkill for this.
Note: to run the REXX exec in batch, you need to run TSO:
//* job statement
//TSOBATCH EXEC PGM=IKJEFT1A,DYNAMNBR=200
//SYSEXEC DD DSN=userid.REXX.EXEC,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//MYGDG DD DSN=my.gdg(0),DISP=SHR
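//* SYSTSIN below supplies the TSO commands: set the data set prefix, then invoke the MYREXX exec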
//SYSTSIN DD *
PROFILE PREFIX(userid) /* specifying a userid*/
%MYREXX


How to find out storage used by stages to clean out

The UI shows higher than normal storage usage for stages; how can I identify which internal/user/table stages have files that need to be purged?
The STORAGE_USAGE and STAGE_STORAGE_USAGE_HISTORY views in ACCOUNT_USAGE are not helpful, since they only provide daily averages for total used space.
You can list the files in the current user's stage using
ls @~;
This will return the following for each file:
Name
Size
MD5 value
Last modified date
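If you want to total this up programmatically, here is a rough sketch (not an official recipe) using the snowflake-connector-python package to run the same LIST command over the user stage and add up the file sizes; the connection parameters are placeholders:
import snowflake.connector

# Placeholder credentials; fill in your own account details.
conn = snowflake.connector.connect(account="my_account", user="my_user", password="my_password")
cur = conn.cursor()

total = 0
# LIST @~ returns one row per file: name, size, md5, last_modified
for name, size, md5, last_modified in cur.execute("LIST @~"):
    print(name, size, last_modified)
    total += size
print("Total bytes in user stage:", total)
conn.close()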

Works by itself, but not in a loop

I am using pyteaser to get information from a listing of websites. Since there could be hundreds of sites, I am trying to put it in a loop. When I run this code by itself:
summaries = SummarizeUrl(df['url'].values[1])
print (summaries)
It gives the following output, working fine:
[u'Bookings Institute researcher Paul C. Light published a study about failed government projects and their causes.', u'In 2011, U.K. government officials scraped a massive 9-year, $16 billion project to create a unified electronic health records system for British citizens.', u'Changing requirements, insufficient testing, and the monolithic nature of the project contributed to this failed government project for the failure.', u'Projected to cost $68 million, the projects costs skyrocketed to $700 million before being abandoned.', u'Here are a few examples of failed government projects, with estimated costs and causes:\n\nThe FBI system was designed to modernize tech systems and enable easier access across diverse FBI information assets.']
When I put it in a loop as follows:
i = 0
for i in list(df):
    summaries = SummarizeUrl(df['url'].values[i])
    str1 = ''.join(summaries)  # convert to string
    print(str1)
I get the following error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
I am trying to increment i based on the value in the dataframe. The dataframe looks like this:
(screenshot of the dataframe)
It works when I do it manually.
You're not incrementing i with this. In your code i is not an integer; it is one element of whatever iterating over df yields.
You could try printing your i like this:
for i in list(df):
    print(type(i))
    print(i)
to see what it is. Then you should correct your access of the url, which could be i['url'].
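For what it's worth, list(df) gives the DataFrame's column labels rather than row positions, which is why pandas complains about the index type. A minimal sketch of one way to loop over every URL by iterating over the column values directly (the sample DataFrame here is hypothetical):
import pandas as pd
from pyteaser import SummarizeUrl

# Hypothetical data; replace with your own DataFrame of site URLs.
df = pd.DataFrame({'url': ['http://example.com/a', 'http://example.com/b']})

# Iterate over the values of the 'url' column instead of indexing with column labels.
for url in df['url']:
    summaries = SummarizeUrl(url)
    if summaries:                  # the summarizer may return nothing for an unreachable page
        print(''.join(summaries))  # join the list of sentences into one string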

Extract Array of Values for Watson Dialog Variables

In the DevPost Watson Developer Challenge for Conversational Applications post, I saw that Watson may be able to analyze the following phrase: "I want to visit Tokyo, Sydney, Manchester, and Reykjavik during a trip that takes 30 days".
Is there a better way to extract that array of locations without having to predefine a maximum number of location variables (i.e. set location1 - 5) and manually specify various grammar items like $ (Locations)={location1} * (Locations)={location2} * (Locations)={location3} * (Locations)={location4} as per the Pizza example dialog? I would like to follow up with a comment such as "That's a lot" if the number of locations is greater than 4, or "Sure" if fewer.
You could try something like Alchemy or relationship extraction to identify all of the locations, and then simply add them to the user profile in Dialog. But today, the best way to do this within a broader conversation is to do it the same way the pizza sample does, as you outlined above.
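As a rough illustration of the entity-extraction idea (using the open-source spaCy library here, not Watson's own API), you could pull location entities out of the phrase and branch on how many you find:
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("I want to visit Tokyo, Sydney, Manchester, and Reykjavik "
        "during a trip that takes 30 days")

# GPE = geopolitical entity (cities, countries); a rough stand-in for "location"
locations = [ent.text for ent in nlp(text).ents if ent.label_ == "GPE"]

print(locations)
print("That's a lot" if len(locations) > 4 else "Sure")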

Neo4j output format

After working with neo4j for a while, I am now considering writing my own entity manager (object manager) to work with the fetched data in my application, and I wonder about neo4j's output format.
When I run a query, the result is always returned as tabular data. Why is this?
Sure, tables have their place in data and processing, but it seems strange that a graph database can only output in this format.
When I want to create an object graph in my application, I have to hydrate all the objects, which is not really good for performance and doesn't leverage true graph performance.
Consider MATCH (A)-->(B) RETURN A, B. When there is one A and three B's, it would return:
A B
1 1
1 2
1 3
That's the same A passed down three times over the database connection, while I only need it once and I know this before the data is fetched.
Something like this seems great: http://nigelsmall.com/geoff
load2neo is nice; a load-from-neo would also be nice, either in the Geoff format or in any of the other formats out there: https://gephi.org/users/supported-graph-formats/
Each language could then implement its own functions to create the objects directly.
To clarify:
Relations between nodes are lost in tabular data
Redundant (non-optimal) format for graphs
Edges (relations) and vertices (nodes) are usually not in the same table. (makes queries more complex?)
Another consideration (which might deserve its own post): what's a good way to model relations in an object graph? As objects, or as data/methods inside the node objects?
#Kikohs
Q: What do you mean by "Each language could then implement its own functions to create the objects directly."?
A: With a (partial) graph provided by the database (as the result of a query), a language such as PHP could provide a factory method (preferably implemented in C) to construct the object graph (this is usually an expensive operation). But only if the object graph is well defined in a standard format (because this function should be simple and universal).
Q: Do you want to export the full graph or just the result of a query?
A: The result of a query. However a query like MATCH (n) OPTIONAL MATCH (n)-[r]-() RETURN n, r should return the full graph.
Q: Do you want to dump to disk the subgraph created from the result of a query?
A: No, existing interfaces like REST are preferred for getting the query result.
Q: Do you want to create the subgraph which comes from a query in memory and then request it in another language?
A: No, I want the result of the query in another format than tabular (the examples mentioned above).
Q: You make a query which only returns the name of a node; in this case, would you like to get the full associated node or just the name? Same for the edges.
A: Nodes don't have names. They have properties, labels and relations. I would like enough information to retrieve A) the node ID, its labels and its properties, and B) the relations to other nodes which are in the same result.
Note that the first part of the question is not a concrete "how-to" question, but rather "why is this not possible?" (or, if it is possible, I'd like to be proven wrong on this one). The second part is a real "how-to" question, namely "how to model relations". The two questions have in common that they both try to find the answer to "how to get graph data efficiently in PHP".
#Michael Hunger
You have a point when you say that not all result data can be expressed as an object graph. It is reasonable to say that an alternative output format would only complement the table format, not replace it.
I understand from your answer that the natural (raw-ish) output format from the database is the result format with duplicates in it ("streams the data out as it comes"). In that case I understand that it is left to another program (in the dev stack) to do the mapping. So my conclusions on neo4j implementing something like this:
Pros: not having to do this in every implementation language (of the application)
Cons: 1) no application-specific mapping is possible; 2) no performance gain if the implementation language is fast
"Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results."
I don't understand this point entirely; are you saying that these formats are not able to hold deduplicated results (in certain cases)? So is there in fact no textual format with which a graph can be described without duplication?
"There is also the question of what you want to include in your output."
I was under the assumption that the Cypher language was powerful enough to specify this in the query, and so the output format would contain whatever the database can provide as a result.
"You could just return the paths that you get, which are unique paths through the graph in themselves".
Useful suggestion, I'll play around with this idea :)
"The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it".
Does the enriching process fetch additional data from the database or is the data already contained in the initial result?
There is more to it.
First of all as you said tabular results from queries are really commonplace and needed to integrate with other systems and databases.
Secondly, oftentimes you don't actually return raw graph data from your queries, but aggregated, projected, sliced, or extracted information from your graph. So the relationships to the original graph data are already lost in most of the query results I see being used.
The only time people need or use the raw graph data is when they want to export subgraph data from the database as a query result.
The problem with doing that as a de-duplicated graph is that the db has to fetch all the result data into memory first to deduplicate it, extract the needed relationships, etc.
Normally it just streams the data out as it comes and uses little memory with that.
Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results (which are returned as paths with potential duplicate nodes and relationships).
There is also the question of what you want to include in your output. Just the nodes and rels returned? Or additionally all the other rels between the nodes that you return? Or all the rels of the returned nodes (but then you also have to include the end nodes of those relationships)?
You could just return the paths that you get, which are unique paths through the graph in themselves:
MATCH p = (n)-[r]-(m)
WHERE ...
RETURN p
Another way to address this problem in Neo4j is to use sensible aggregations.
E.g. you can use collect to aggregate data per node (i.e. a kind of subgraph):
MATCH (n)-[r]-(m)
WHERE ...
RETURN n, collect([r,type(r),m])
or use the new literal map syntax (Neo4j 2.0)
MATCH (n)-[r]-(m)
WHERE ...
RETURN {node: n, neighbours: collect({ rel: r, type: type(r), node: m})}
The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it and then outputting it as cypher create statement(s).
A similar approach can be used for other output formats too if you need it. But so far there hasn't been the need.
If you really need this functionality it makes sense to write a server-extension that uses cypher for query specification, but doesn't allow return statements. Instead you would always use RETURN *, aggregate the data into an in-memory structure (SubGraph in the org.neo4j.cypher packages). And then render it as a suitable format (e.g. JSON or one of those listed above).
These could be starting points for that:
https://github.com/jexp/cypher-rs
https://github.com/jexp/cypher_websocket_endpoint
https://github.com/neo4j-contrib/rabbithole/blob/master/src/main/java/org/neo4j/community/console/SubGraph.java#L123
There are also other efforts, like GraphJSON from GraphAlchemist: https://github.com/GraphAlchemist/GraphJSON
And the d3 json format is also pretty useful. We use it in the neo4j console (console.neo4j.org) to return the graph visualization data that is then consumed by d3 directly.
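For completeness, here is a rough sketch of doing that deduplication client-side (shown with the official Neo4j Python driver purely for illustration; the same idea applies to a PHP client), collecting unique nodes and relationships out of the streamed records:
from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

nodes, rels = {}, {}
with driver.session() as session:
    for record in session.run("MATCH (a)-[r]->(b) RETURN a, r, b"):
        a, r, b = record["a"], record["r"], record["b"]
        for node in (a, b):
            # element_id (driver 5.x) is unique per entity, so duplicates collapse here
            nodes[node.element_id] = {"labels": list(node.labels), "properties": dict(node)}
        rels[r.element_id] = {"type": r.type,
                              "start": r.start_node.element_id,
                              "end": r.end_node.element_id,
                              "properties": dict(r)}

print(len(nodes), "unique nodes,", len(rels), "unique relationships")
driver.close()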
I've been working with neo4j for a while now and I can tell you that if you are concerned about memory and performance you should drop Cypher altogether, and use indexes and the other graph-traversal methods instead (e.g. retrieve all the relationships of a certain type from or to a start node, and then iterate over the found nodes).
As the documentation says, Cypher is not intended for in-app usage, but more as an administration tool. Furthermore, in production-scale environments, it is VERY easy to crash the server by running the wrong query.
Secondly, there is no mention in the docs of an API method to retrieve the output as a graph-like structure. You will have to process the output of the query and build it yourself.
That said, in the example you give you say that there is only one A and that you know it before the data is fetched, so you don't need to do:
MATCH (A)-->(B) RETURN A, B
but just
MATCH (A)-->(B) RETURN B
(you don't need to receive A three times because you already know these are the nodes connected with A)
or better (if you need info about the relationships) something like
MATCH (A)-[r]->(B) RETURN r

Prolog Doing a Query

This is directly from an online tutorial, and I get a top-down level design error. Help?
employee(193,'Jones','John','173 Elm St.','Hoboken','NJ',
         12345,1,'25 Jun 93',25500).
employee(181,'Doe','Betty','11 Spring St.','Paterson','NJ',
         12354,3,'12 May 91',28500).
employee(198,'Smith','Al','2 Ace Ave.','Paterson','NJ',
         12354,3,'12 Sep 93',27000).
Given these basic relations (also called extensional relations), we can define other relations using Prolog procedure definitions to give us answers to questions we might have about the data. For example, we can define a new relation containing the names of all employees making more than $28,000:
well_paid_emp(First,Last) :-
    employee(_Num,Last,First,_Addr,_City,_St,_Zip,_Dept,_Date,Sal),
    Sal > 28000.
It could be that you are using a Prolog system which shows a singleton warning for well_paid_emp/2.
Not all Prolog systems accept _<Capital><Rest> variables as intentional singletons, i.e. variables that occur only once in a clause.
