Multi-valued Database (UniVerse) -- SM (MV) vs SM (MS) and ASSOC()

I have a question taken from pg 16 of IBM's Nested Relational Database White Paper. I'm confused why the CREATE command below uses MV/MS/MS rather than MV/MV/MS, when both ORDER_# and PART_# are one-to-many relationships. I don't understand what value vs. sub-value means in non-1NF database design. I'd also like to know more about the ASSOC() clause.
Pg 16 of IBM's Nested Relational Database White Paper (slight whitespace modifications)
CREATE TABLE NESTED_TABLE (
CUST# CHAR (9) DISP ("Customer #"),
CUST_NAME CHAR (40) DISP ("Customer Name"),
ORDER_# NUMBER (6) DISP ("Order #") SM ("MV") ASSOC ("ORDERS"),
PART_# NUMBER (6) DISP ("Part #") SM ("MS") ASSOC ("ORDERS"),
QTY NUMBER (3) DISP ("Qty.") SM ("MS") ASSOC ("ORDERS")
);
The IBM nested relational databases implement nested tables as repeating attributes and
repeating groups of attributes that are associated. The SM clauses specify that the attribute is either repeating (multivalued--"MV") or a repeating group (multi-subvalued--"MS"). The ASSOC clause associates the attributes within a nested table. If desired, the IBM nested relational databases can support several nested tables within a base table. The following standard SQL statement would be required to process the 1NF tables of Figure 5 to produce the report shown in Figure 6:
SELECT CUSTOMER_TABLE.CUST#, CUST_NAME, ORDER_TABLE.ORDER_#, PART_#, QTY
FROM CUSTOMER_TABLE, ORDER_TABLE, ORDER_CUST
WHERE CUSTOMER_TABLE.CUST_# = ORDER_CUST.CUST_#
AND ORDER_CUST.ORDER_# = ORDER_TABLE.ORDER_#;
Nested Table

Customer #   Customer Name   Order #   Part #   Qty.
AA2340987    Zedco, Inc.     93-1123   037617   81
                                       053135   36
                             93-1154   063364   32
                                       087905   39
GV1203948    Alphabravo      93-2321   006776   72
                                       055622   81
                                       067587   29
MT1238979    Trisoar         93-2342   005449   33
                                       036893   52
                                       06525    29
                             93-4596   090643   33

I'll go ahead and answer my own question: while perusing IBM's UniVerse SQL Administration for DBAs I came across code for CREATE TABLE on pg 55.
ACT_NO INTEGER FORMAT '5R' PRIMARY KEY
BADGE_NO INTEGER FORMAT '5R' PRIMARY KEY
ANIMAL_ID INTEGER FORMAT '5L' PRIMARY KEY
(see distracting side note below) This amused me at first, but essentially I believe this to be a column-level directive equivalent to a table-level directive like PRIMARY KEY ( ACT_NO, BADGE_NO, ANIMAL_ID )
Later on page 5-19, I saw this
ALTER TABLE LIVESTOCK.T ADD ASSOC VAC_ASSOC (
VAC_TYPE KEY, VAC_DATE, VAC_NEXT, VAC_CERT
);
Which leads me to believe that tacking ASSOC ("VAC_ASSOC") onto each column would be the same... like this:
CREATE TABLE LIVESTOCK.T (
VAC_TYPE ... ASSOC ("VAC_ASSOC"),
VAC_DATE ... ASSOC ("VAC_ASSOC"),
VAC_NEXT ... ASSOC ("VAC_ASSOC"),
VAC_CERT ... ASSOC ("VAC_ASSOC")
);
Anyway, I'm not 100% sure I'm right, but I'm guessing the order doesn't matter, and that rather than these being an intransitive association they're just an order-insensitive grouping.
Onward! As for the second part of the question, pertaining to MS and MV: I can't for the life of me figure out where the hell IBM got this syntax from. I believe it to be imaginary. I don't have access to a dev machine I can play on to test this out, but I can't find the term MV in either the old 10.1 or the new UniVerse 10.3 SQL Reference.
Side note for those not used to UniVerse: the 5R and 5L mean 5 characters, right- or left-justified. That's right, a display feature built into the table metadata... Google for UniVerse FORMAT (or FMT) for more info.

Just so you know, Attribute, Multivalue and Sub-Multivalue come from the way they structure their data.
Essentially, all data is stored in a tree of sorts.
UniVerse is a MultiValue database. Generally, it does not work the same way as relational SQL databases.
Each record can have multiple attributes.
Each attribute can have multiple multivalues.
Each multivalue can have multiple sub-multivalues.
So, if I have a record called FRED
Then FRED<1,2,3> refers to the 1st attribute, 2nd multivalue position and 3rd subvalue position.
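If it helps to picture it, here is a rough model of that structure in Python (illustration only, not UniVerse code; the record contents are made up):

# A UniVerse-style record as nested Python lists (illustration only).
# Level 1: attributes; level 2: multivalues; level 3: sub-multivalues.
FRED = [
    ["a1-mv1", ["a1-mv2-sv1", "a1-mv2-sv2", "a1-mv2-sv3"]],  # attribute 1
    ["a2-mv1"],                                              # attribute 2
]

def extract(record, a, m, s):
    """Rough analogue of FRED<a,m,s> (1-based, as in UniVerse)."""
    mv = record[a - 1][m - 1]
    return mv[s - 1] if isinstance(mv, list) else mv

print(extract(FRED, 1, 2, 3))  # -> a1-mv2-sv3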
To read more about it, you need to learn more about how UniVerse works. The SQL section is just a side part of it. I suggest you read the other manuals to understand what you are working with.
EDIT
Essentially, the code above is telling you that:
There may be multiple orders per client. These are stored at the MV level in the 'table'.
There may be multiple parts per order. These are stored at the MS level in the 'table'.
There may be multiple qtys per order. These are also stored at the MS level in the 'table'. Since parts and qtys sit at the same level, they are 1-n with respect to orders but 1-1 with respect to each other.
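To tie this back to the original CREATE TABLE, here is a hypothetical Python model (not UniVerse code) of one customer's record from the nested table above, showing why PART_# and QTY sit one level below ORDER_#:

# One NESTED_TABLE record as nested lists (illustration only).
# CUST# and CUST_NAME are single-valued; ORDER_# is multivalued (MV);
# PART_# and QTY are multi-subvalued (MS) and parallel within each order.
record = {
    "CUST#":     "AA2340987",
    "CUST_NAME": "Zedco, Inc.",
    "ORDER_#":   ["93-1123", "93-1154"],                        # one multivalue per order
    "PART_#":    [["037617", "053135"], ["063364", "087905"]],  # subvalues per order
    "QTY":       [[81, 36], [32, 39]],                          # aligned 1-1 with PART_#
}

for i, order in enumerate(record["ORDER_#"]):
    for part, qty in zip(record["PART_#"][i], record["QTY"][i]):
        print(record["CUST#"], order, part, qty)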

Related

neo4j error while avoiding loops in relationships

I have a node label CUSTOMER with one key, CUSTOMER_ID.
Each customer ID is linked to other customer IDs; these bidirectional relationships are created from CSV files.
I want the result in the form below for all nodes:
CUSTOMER_ID, MIN(CUSTOMER_ID) over the set of related nodes
600,600
601,600
602,600
604,600
605,600
There will be many such groups of linked nodes (subgraphs) in the total data.
I was able to get it using the query below:
MATCH (a:Member_Matching_1)-[r:MATCHED*]->(b:Member_Matching_1)
WITH DISTINCT a, b
RETURN a.OPTUM_LAB_ID, min(b.OPTUM_LAB_ID)
ORDER BY toInt(min(b.OPTUM_LAB_ID)), toInt(a.OPTUM_LAB_ID)
but the issue is that the query traverses the graph many more times than needed.
Example:
wanted : 600 -> 601 -> 602 -> 604
Unwanted : 600 -> 601 -> 602 -> 603 -> 602 -> 604
As the data volume will be very high, I want the most efficient query possible.
After having spent some time searching the web, I came across a solution:
MATCH p=(a:Member_Matching_1)-[:MATCHED*]->(b:Member_Matching_1)
WHERE NONE(n IN nodes(p)
           WHERE size(filter(x IN nodes(p) WHERE n = x)) > 1)
RETURN extract(n IN nodes(p) | n.OPTUM_LAB_ID);
But I am facing the error
Neo.DatabaseError.General.UnknownError
key not found: UNNAMED32
Please advise
Thanks in advance
As of today, Cypher is not really well-suited for this sort of query, as it only supports edge uniqueness, not vertex uniqueness. There is a proposal in the openCypher language to support configurable matching semantics, but it was only accepted recently and has not been merged into Neo4j.
So currently, for this sort of traversal, you are probably better off using the APOC library's apoc.path.expandConfig stored procedure. This allows you to set uniqueness constraints such as NODE_PATH, which enforces that "For each returned node there's a unique path from the start node to it."
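For illustration, a rough sketch using the official Neo4j Python driver (the connection details are placeholders, and this is untested against your data):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# NODE_PATH uniqueness: every node appears at most once on each returned path,
# which prunes paths like 600 -> 601 -> 602 -> 603 -> 602 -> 604.
query = """
MATCH (a:Member_Matching_1)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: 'MATCHED>',
    uniqueness: 'NODE_PATH',
    maxLevel: 5
}) YIELD path
RETURN [n IN nodes(path) | n.OPTUM_LAB_ID] AS ids
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["ids"])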
Also, when I faced a similar problem, I tried the following hack: set a fixed depth for the traversal and manually specify the uniqueness constraints. This did not work well for my use case, but it might be worth a try. Sketch code:
MATCH p=(n)-[*5]->(n)
WHERE nodes(p)[0] <> nodes(p)[2]
AND nodes(p)[0] <> nodes(p)[4]
AND nodes(p)[2] <> nodes(p)[4]
RETURN nodes(p)
LIMIT 1
The error you got, Neo.DatabaseError.General.UnknownError / key not found: UNNAMED32, is very strange indeed; it seems that your query overstressed the database, which resulted in this (quite unique) error message.
Note: I agree with the comment of @TomGeudens stating that you should not create the MATCHED edge twice -- just use a single direction and incorporate the undirected nature of the edge in your queries, i.e. use (...)-[...]-(...) in Cypher.

one function or many for doing two-way lookups on multi-column junction table?

In SQL Server, we have a junction table associating our property IDs with IDs (text or numeric) of the same properties in several external databases (with sentinel value -1 or '-1' for no link):
SEPID SEPCODE AEID MRIID ABSID PEDID
2087 '140800' 26077 '140800' '-1' 3162
2088 '140900' -1 '140900' '-1' 3167
2089 'F21610' 25744 '-1' '-1' 3184
2090 '15402' -1 '-1' '-1' 3185
2094 '141200' 26085 '141200' '83296' 3198
As databases and business needs grow, we write new specific functions to translate one set of IDs to another, taking a comma-separated string list of input IDs and returning either a table of output IDs or another comma-separated list of output IDs, with different use cases for the table-valued (TVF) vs scalar-valued (SVF) functions. For example (using hard-coded rather than the normal programmatically-managed ID lists):
SELECT * FROM dbo.tvf_SepIdsToAbsIds('2088,2090,2094')
SELECT * FROM dbo.tvf_MriInfo(dbo.svf_AeIds2MriIds('26077,25744'))
would return:
ABSID
'-1'
'-1'
'83296'
and
MRIID NAME ETC...
'140800' 'Foo' ...
I'm wondering if there is a sensible, practical, reasonably performant way to replace these proliferating specific functions with a single generic TVF and SVF that would take two column names and a list of IDs:
SELECT * FROM dbo.tvf_Ids('SEPID', '2088,2090,2094', 'ABSID')
SELECT * FROM dbo.MriInfo(dbo.svf_Ids('AEID', '26077,25744', 'MRIID'))
Is it possible? Does it require dynamic SQL? Do the sometimes-text/sometimes-numeric external IDs complicate things greatly? Is it worth the struggle? Dynamic SQL always feels a bit dodgy and in this case I'm not even sure how to write it.
If I understand your question, you have a number of databases, each mastering a set of IDs. In your chosen database you relate some/all of these with local data. You then want a way of translating between the local/remote IDs.
You could use Synonyms to point to the database and then a view to consume your synonym.
So you would have
Select *
from MyNewView
and in your MyNewView
Select *
from MyNewSynonym s
inner join MyTable t
    on t.ABSID = s.ABSID
Your local data just looks at the view and doesn't know anything about the different databases or any particular conversion issues. It's all hidden in the synonym and view.
Not really sure why you are using TVFs, but they are better than SVFs, which you definitely want to avoid for performance reasons. Even better if you can hide it in a view, as that may give you indexing advantages.
As for splitting strings and converting them to a rowset, you can use a number of methods to do this, including a TVF, a CLR function or some horrid looping. There are a number of examples out there to assist. You then just join the result with the results from your view above.
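On the "does it require dynamic SQL" point: a single generic lookup function essentially does, because column names cannot be passed as query parameters. A client-side sketch in Python with pyodbc (the table name dbo.JunctionTable, the connection string and the column whitelist are assumptions; preserving input order and the '-1' sentinel rows is glossed over here):

import pyodbc

# Whitelisting the legal junction-table columns guards against SQL injection,
# since the column names have to be interpolated into the query text.
ID_COLUMNS = {"SEPID", "SEPCODE", "AEID", "MRIID", "ABSID", "PEDID"}

def lookup_ids(conn, from_col, id_list, to_col):
    """Generic two-way lookup on the junction table (sketch only)."""
    if from_col not in ID_COLUMNS or to_col not in ID_COLUMNS:
        raise ValueError("unknown column")
    ids = [s.strip() for s in id_list.split(",")]
    placeholders = ",".join("?" * len(ids))
    sql = (f"SELECT {to_col} FROM dbo.JunctionTable "
           f"WHERE {from_col} IN ({placeholders})")
    return [row[0] for row in conn.cursor().execute(sql, ids).fetchall()]

conn = pyodbc.connect("DSN=mydb")  # hypothetical connection
print(lookup_ids(conn, "SEPID", "2088,2090,2094", "ABSID"))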

Is it possible to query a value (or a value set) with cassandra even if we don't know the key (or key range) in advance?

I am going to express the idea in SQL:
SELECT key,value
FROM table1
WHERE value > 10
Or do we always need to know the key?
I suppose you can use secondary indexes, which have been available since version 0.7 of Cassandra.
You might also checkout the following answer: Cassandra and Secondary-Indexes, how do they work internally?
It is recommended to use secondary indexes only for low-cardinality columns, i.e. columns which do not have many distinct values (e.g. columns like 'status' or 'priority', which usually hold only a handful of values like 'high', 'medium', 'low').
In case you are using Hector as your Cassandra client, you can find information on how to use them here:
https://github.com/rantav/hector/wiki/User-Guide
Yes, of course. For example, you can use *:
select * from CF where value = 10
If you use the Hector API (e.g. CqlQuery), you can get a list of rows back from this query.
Note, currently for secondary indexes, you must have at least one equality conditional, so your query with just value > 10 would not work. See this question
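For what it's worth, with the current DataStax Python driver the secondary-index route looks roughly like this (keyspace and table names are made up, and the question's value > 10 predicate would still be rejected):

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # names assumed

# A secondary index makes the value column queryable at all...
session.execute("CREATE INDEX IF NOT EXISTS ON table1 (value)")

# ...but only with an equality predicate; "WHERE value > 10" alone fails.
for row in session.execute("SELECT key, value FROM table1 WHERE value = 10"):
    print(row.key, row.value)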

Selective PostgreSQL database querying

Is it possible to have selective queries in PostgreSQL which select different tables/columns based on values of rows already selected?
Basically, I've got a table in which each row contains a sequence of two to five characters (tbl_roots), optionally with a length field which specifies how many characters the sequence is supposed to contain (it's meant to be made redundant once I figure out a better way, i.e. by counting the length of the sequences).
There are four tables containing patterns (tbl_patterns_biliteral, tbl_patterns_triliteral, ...etc), each of which corresponds to a root_length, and a fifth table (tbl_patterns) which is used to synchronise the pattern tables by providing an identifier for each row, so that row #2 in tbl_patterns_biliteral corresponds to the same row in tbl_patterns_triliteral. These four pattern tables are restricted such that no row in tbl_patterns_(bi|tri|quadri|quinqui)literal can have a pattern_id that doesn't exist in tbl_patterns.
Each pattern table has nine other columns, each of which corresponds to an identifier (root_form).
The last table in the database (tbl_words), contains a column for each of the major tables (word_id, root_id, pattern_id, root_form, word). Each word is defined as being a root of a particular length and form, spliced into a particular pattern. The splicing is relatively simple: translate(pattern, '12345', array_to_string(root, '')) as word_combined does the job.
Now, what I want to do is select the appropriate pattern table based on the length of the sequence in tbl_roots, and select the appropriate column in the pattern table based on the value of root_form.
How could this be done? Can it be combined into a simple query, or will I need to make multiple passes? Once I've built up this query, I'll then be able to code it into a PHP script which can search my database.
EDIT
Here's some sample data (it's actually the data I'm using at the moment) and some more explanations as to how the system works: https://gist.github.com/823609
It's conceptually simpler than it appears at first, especially if you think of it as a coordinate system.
I think you're going to have to change the structure of your tables to have any hope. Here's a first draft for you to think about. I'm not sure what the significance of the "i", "ii", and "iii" are in your column names. In my ignorance, I'm assuming they're meaningful to you, so I've preserved them in the table below. (I preserved their information as integers. Easy to change that to lowercase roman numerals if it matters.)
create table patterns_bilateral (
pattern_id integer not null,
root_num integer not null,
pattern varchar(15) not null,
primary key (pattern_id, root_num)
);
insert into patterns_bilateral values
(1,1, 'ya1u2a'),
(1,2, 'ya1u22a'),
(1,3, 'ya12u2a'),
(1,4, 'me11u2a'),
(1,5, 'te1u22a'),
(1,6, 'ina12u2a'),
(1,7, 'i1u22a'),
(1,8, 'ya1u22a'),
(1,9, 'e1u2a');
I'm pretty sure a structure like this will be much easier to query, but you know your field better than I do. (On the other hand, database design is my field . . . )
Expanding on my earlier answer and our comments, take a look at this query. (The test table isn't even in 3NF, but the table's not important right now.)
create table test (
root_id integer,
root_substitution varchar[],
length integer,
form integer,
pattern varchar(15),
primary key (root_id, length, form, pattern));
insert into test values
(4,'{s,ş,m}', 3, 1, '1o2i3');
This is the important part.
select root_id
, root_substitution
, length
, form
, pattern
, translate(pattern, '12345', array_to_string(root_substitution, ''))
from test;
That query returns, among other things, the translation soşim.
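If the mechanics are unclear: translate() simply replaces each digit in the pattern with the root character at the same position ('1' -> 's', '2' -> 'ş', '3' -> 'm'). A quick Python mimic of the same idea:

# Mimic translate('1o2i3', '12345', 'sşm'): digit n -> nth root character.
pattern = "1o2i3"
root = ["s", "ş", "m"]
mapping = str.maketrans(dict(zip("12345", root)))  # digits 4 and 5 unused here
print(pattern.translate(mapping))  # -> soşim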
Are we heading in the right direction?
Well, that's certainly a bizarre set of requirements! Here's my best guess, but obviously I haven't tried it. I used UNION ALL to combine the patterns of different sizes and then filtered them based on length. You might need to move the length condition inside each of the subqueries for speed reasons, I don't know. Then I chose the column using the CASE expression.
select word,
translate(
case root_form
when 1 then patinfo.pattern1
when 2 then patinfo.pattern2
... up to pattern9
end,
'12345',
array_to_string(root.root, '')) as word_combined
from tbl_words word
join tbl_root root
on word.root_id = root.root_id
join tbl_patterns pat
on word.pattern_id = pat.pattern_id
join (
select 2 as pattern_length, pattern_id, pattern1, ..., pattern9
from tbl_patterns_biliteral bi
union all
select 3, pattern_id, pattern1, pattern2, ..., pattern9
from tbl_patterns_triliteral tri
union all
...same for quad and quin...
) patinfo
on
patinfo.pattern_id = pat.pattern_id
and length(root.root) = patinfo.pattern_length
Consider combining all the different patterns into one pattern_details table with a root_length field to filter on. I think that would be easier than combining them all together with UNION ALL. It might be even easier if you had multiple rows in the pattern_details table and filtered based on root_form. Maybe the best would be to lay out pattern_details with fields for pattern_id, root_length, root_form, and pattern. Then you just join from the word table through the pattern table to the pattern detail that matches all the right criteria.
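A rough sketch of that combined layout via psycopg2 (the adapter, connection string and the root array column are assumptions; the other names follow the question):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection
cur = conn.cursor()

# One table replaces the four per-length pattern tables and the UNION ALL.
cur.execute("""
    CREATE TABLE pattern_details (
        pattern_id  integer     NOT NULL,
        root_length integer     NOT NULL,
        root_form   integer     NOT NULL,
        pattern     varchar(15) NOT NULL,
        PRIMARY KEY (pattern_id, root_length, root_form)
    )
""")
conn.commit()

# The whole lookup then collapses into a single join.
cur.execute("""
    SELECT w.word_id,
           translate(pd.pattern, '12345', array_to_string(r.root, ''))
    FROM tbl_words w
    JOIN tbl_roots r ON r.root_id = w.root_id
    JOIN pattern_details pd
      ON pd.pattern_id  = w.pattern_id
     AND pd.root_form   = w.root_form
     AND pd.root_length = array_length(r.root, 1)
""")
for row in cur.fetchall():
    print(row)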
Of course, maybe I've completely misunderstood what you're looking for. If so, it would be clearer if you posted some example data and an example result.

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns, one has the ID and the other the abstracts of a document about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract in table 1 and identify any of the acronyms present in it that are also present in table 2.
Finally, the program will create a separate table where the first column contains the content of the first column of table 1 (i.e. the ID) and the second column contains the acronyms found in the document associated with that ID.
Can someone with expertise in Python, Perl or any other scripting language help?
It seems to me that you are trying to join the two tables where the acronym appears in the abstract, i.e. (pseudo-SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(documents.abstract)
Given the desired semantics, you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass  # word not an acronym
This is a straightforward implementation; however, its running time scales as the number of words times the number of acronyms, since acronyms.index performs a linear search (of our largest list, no less) for every word. We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass  # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
Thanks a lot for the quick response.
I assume the pseudo-SQL solution is for MySQL etc.; however, it did not work in Microsoft Access.
The second and the third are for Python, I assume. Can I feed the acronyms and documents as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])
