Perfoming calculations on properties from different edges in OrientDB - graph-databases

I have this unique problem where I want to make calculations from values returned by different edges and it doesn't seem like it's working for me.
My graph's function is to track people's collections of Magic: the Gathering cards. Please use the graph as an illustration for my problem.
I wanted to take #8:0 as a starting point and see which decks (MDeck) had the cards (MCard) I owned and calculate what percentage of the deck I already have in my collection. My query started out like this:
SELECT FROM (SELECT FLATTEN(out[label="has"]) FROM #8:0) WHERE in.in.size() > 1
This is to get the cards that I owned that belonged to decks. Next if I had more of a certain card than what is required on a deck I would only count what is required, so I had to use MIN(). This is where the problem arises:
SELECT
MIN(UNION(in.in[label="includes"].qty, qty))
FROM (
SELECT
FLATTEN(out[label="has"])
FROM #8:0
)
WHERE in.in.size() > 1
I thought this will do the trick but it just returned null. I made sure all the qty fields are integers. Am I missing something?
Thanks,
Ramon

Do you have a public database where to execute queries with the OrientDB console or Studio?
If not could you report here the intermediate results of the last query? For example:
SELECT
in.in[label="includes"].qty
FROM (
SELECT
FLATTEN(out[label="has"])
FROM #8:0
)
WHERE in.in.size() > 1
And then:
SELECT
UNION(in.in[label="includes"].qty, qty)
FROM (
SELECT
FLATTEN(out[label="has"])
FROM #8:0
)
WHERE in.in.size() > 1

Related

Extracting all records using a conditional SQL Server query?

I have a long database of observations for individuals. There are multiple observations for each individual, all assigned different medcodeid's.
I want to extract all records of individuals with certain medcodeid's assigned, but only if they at some point have had a smaller list of specific codes assigned.
This is an example of what I start with:
long dataset, multiple observations
and this is the records I'd like to extract:
multiple observations, but patients 3 and 5 are not extracted, as they never had a medcode 12
Would this be an additional WHERE clause? I am struggling as this will then only extract the second AND medcodeid list. But I want it to extract all, if the individual has had one of these certain fewer codes at some point. I hope that makes some sense. I am unfamiliar with IF command? And cannot see how CASE WHEN would work either.
Thank you very much in advance!
You definitely don't want to filter out all the rows so you're right that an additional condition won't help with that. And where only lets you look at the current row and you're trying to make a decision based all the rows belonging to the patient.
This query just uses a table expression and an analytic count() that tags each row with the number of matches as it lets you look outside the current row just like you need.
-- my additions to your query are in lowercase
with data as (
SELECT obs.patid, yob, obsdate, medcodeid,
count(case when medcodeid IN (<list of mandatory codes>) then 1 end)
over (partition by obs.patid) as medcode_count
-- assuming the relationship looks something like this
from obs inner join medcode on medcode.patid = obs.patid
WHERE medcodeid IN (<list of codes>)
AND obsdate BETWEEN '2004-12-31' AND GETDATE()
AND patienttypeid = 3 AND acceptable = 1 AND gender = 2
AND YEAR(obsdate) - yob > 15 AND YEAR(obsdate) - yob < 45
)
select * from data where medcode_count > 0;
At first I thought you were requiring that at least five of the codes from the full set were found. Now that you've edited the question I believe that you want to require that at least one code from a smaller subset is present. Either way this approach will work.
If I'm understanding what you're asking, I think what you need is an additional WHERE clause with a subquery. This could be done with and EXIST or a join but I find an IN query to be easier to work with.
You left the FROM out of your query so I had to guess at it but try this:
SELECT
obs.patid,
yob,
obsdate,
medcodeid
FROM
obs
WHERE
medcodeid IN (list of 20 codes)
AND (obsdate BETWEEN '2004-12-31' AND GETDATE())
AND patienttypeid = 3
AND acceptable = 1
AND gender = 2
AND ((YEAR(obsdate))-yob) > 15
AND ((YEAR(obsdate)) - yob) < 45
AND obs.patid IN (
SELECT
obs.patid
FROM
obs
WHERE
medcodeid IN (5 of the 20 codes)
);

SalesForce SOQL highest number from size column

i'm new to SOQL and SF, so bear with me :)
I have Sales_Manager__c object tied to Rent__c via Master-Detail relationship. What i need is to get Manager with highest number of Rent deals for a current year. I figured out that the Rents__r.size column stores that info. The question is how can i gain access to the size column and retrieve highest number out of it?
Here is the picture of query i have
My SOQL code
SELECT (SELECT Id FROM Rents__r WHERE StartDate__c=THIS_YEAR) FROM Sales_Manager__c
Try other way around, by starting the query from Rents and then going "up". This will let you sort by the count and stop after say top 5 men (your way would give you list of all managers, some with rents, some without and then what, you'd have to manually go through the list and find the max).
SELECT Manager__c, Manager__r.Name, COUNT(Id)
FROM Rent__c
WHERE StartDate__c=THIS_YEAR
GROUP BY Manager__c, Manager__r.Name
ORDER BY COUNT(Id) DESC
LIMIT 10
(I'm not sure if Manager__c is the right name for your field, experiment a bit)

Data model for unique sets

I am hunting for the best way to implement a data model for "recipes"
think like a pizza app where you can compose your own pizza. you select maybe 5 out of 100 ingredients and you select an amount for each. I need to check if I've "seen" that pizza combination before, assign ID if I have not, and retrieve ID if I have.
We have n ingredients.
A recipe is defined by a set of ingredients and a corresponding amount.
Could look like:
Ingr1 90
Ingr2 10
or
Ingr1 90
Ingr2 10
Ingr3 10
I want to store this in a structure where I give each unique recipe an ID, and so it's possible for me to query for the ID given the recipe data set.
I want a stored procedure that takes a data set as a parameter and returns an ID that is new if the recipe was unknown and existing if the recipe already exists.
I am looking for the most efficient way of doing this. My best idea so far is to either encode the recipe as a string (json) and use this as a unique constraint, or have a stored procedure that iterates through the recipe data set and constructs a n level deep if exists statement.
So, I'm confident I can solve the problem, but am looking for a beautiful method.
As far as I can see, you have entities Recipe and Ingredient and M:M relation between them. Data model can look like this (PK in bold):
Recipe (RecipeID, RecipeName)
Ingredient(IngredientID, IngredientName)
RecipeIngredients(RecipeID, IngredientID, Amount)
You can solve task of finding out if same recipe is already present in a database using query but this query wouldn't be simple. It is well-know problem, relational division. There are several approaches. One of the most popular is counting. If some recipe has same amount of ingredients as target one and all ingredients are the same, then they are equal. Such queries often involves data aggregations and perform not very fast on big amount of data.
You can help to solve this problem from application side and you are thinking in right direction. Represent recipe as a string, ordering values by IngredientID (to get same string even if ingredients were added in different order), converting Amount in some stable form (not to get 0.499999 instead of 0.5), calculate some hash out of string, and store this value in Recipe. In simple form hash is an integer value, so you can find doubles very fast.
So it is your call. Every approach has it's own issues. Heavy query in first case and hassle to keep hash in actual state in second case (and possible collisions too). I'd stick with first option until it works OK and start any optimizations only when they are unavoidable.
Query example (new recipe is in #tmp):
;with totals as
(
select RecipeID, count(*) totals
from RecipeIngredients
group by RecipeID
), matched_totals as
(
select i.RecipeID, count(*) matched_totals
from RecipeIngredients i
join #tmp t
on i.IngredientID = t.IngredientID
and i.Amount = t.Amount
group by i.RecipeID
)
select t.*
from totals t
join matched_totals m
on m.RecipeID = t.RecipeID
where
totals = matched_totals
and totals = (select count(*) from #tmp)
This solution is more elegant but much less intuitive:
select *
from Recipe r
where
not exists
( select 1
from RecipeIngredients ri
where
r.RecipeID = ri.RecipeID
and not exists
(select 1 from #tmp t where t.IngredientID = ri.IngredientID)
)

Using a sort order column in a database table

Let's say I have a Product table in a shopping site's database to keep description, price, etc of store's products. What is the most efficient way to make my client able to re-order these products?
I create an Order column (integer) to use for sorting records but that gives me some headaches regarding performance due to the primitive methods I use to change the order of every record after the one I actually need to change. An example:
Id Order
5 3
8 1
26 2
32 5
120 4
Now what can I do to change the order of the record with ID=26 to 3?
What I did was creating a procedure which checks whether there is a record in the target order (3) and updates the order of the row (ID=26) if not. If there is a record in target order the procedure executes itself sending that row's ID with target order + 1 as parameters.
That causes to update every single record after the one I want to change to make room:
Id Order
5 4
8 1
26 3
32 6
120 5
So what would a smarter person do?
I use SQL Server 2008 R2.
Edit:
I need the order column of an item to be enough for sorting with no secondary keys involved. Order column alone must specify a unique place for its record.
In addition to all, I wonder if I can implement something like of a linked list: A 'Next' column instead of an 'Order' column to keep the next items ID. But I have no idea how to write the query that retrieves the records with correct order. If anyone has an idea about this approach as well, please share.
Update product set order = order+1 where order >= #value changed
Though over time you'll get larger and larger "spaces" in your order but it will still "sort"
This will add 1 to the value being changed and every value after it in one statement, but the above statement is still true. larger and larger "spaces" will form in your order possibly getting to the point of exceeding an INT value.
Alternate solution given desire for no spaces:
Imagine a procedure for: UpdateSortOrder with parameters of #NewOrderVal, #IDToChange,#OriginalOrderVal
Two step process depending if new/old order is moving up or down the sort.
If #NewOrderVal < #OriginalOrderVal --Moving down chain
--Create space for the movement; no point in changing the original
Update product set order = order+1
where order BETWEEN #NewOrderVal and #OriginalOrderVal-1;
end if
If #NewOrderVal > #OriginalOrderVal --Moving up chain
--Create space for the momvement; no point in changing the original
Update product set order = order-1
where order between #OriginalOrderVal+1 and #NewOrderVal
end if
--Finally update the one we moved to correct value
update product set order = #newOrderVal where ID=#IDToChange;
Regarding best practice; most environments I've been in typically want something grouped by category and sorted alphabetically or based on "popularity on sale" thus negating the need to provide a user defined sort.
Use the old trick that BASIC programs (amongst other places) used: jump the numbers in the order column by 10 or some other convenient increment. You can then insert a single row (indeed, up to 9 rows, if you're lucky) between two existing numbers (that are 10 apart). Or you can move row 370 to 565 without having to change any of the rows from 570 upwards.
Here is an alternative approach using a common table expression (CTE).
This approach respects a unique index on the SortOrder column, and will close any gaps in the sort order sequence that may have been left over from earlier DELETE operations.
/* For example, move Product with id = 26 into position 3 */
DECLARE #id int = 26
DECLARE #sortOrder int = 3
;WITH Sorted AS (
SELECT Id,
ROW_NUMBER() OVER (ORDER BY SortOrder) AS RowNumber
FROM Product
WHERE Id <> #id
)
UPDATE p
SET p.SortOrder =
(CASE
WHEN p.Id = #id THEN #sortOrder
WHEN s.RowNumber >= #sortOrder THEN s.RowNumber + 1
ELSE s.RowNumber
END)
FROM Product p
LEFT JOIN Sorted s ON p.Id = s.Id
It is very simple. You need to have "cardinality hole".
Structure: you need to have 2 columns:
pk = 32bit int
order = 64bit bigint (BIGINT, NOT DOUBLE!!!)
Insert/UpdateL
When you insert first new record you must set order = round(max_bigint / 2).
If you insert at the beginning of the table, you must set order = round("order of first record" / 2)
If you insert at the end of the table, you must set order = round("max_bigint - order of last record" / 2)
If you insert in the middle, you must set order = round("order of record before - order of record after" / 2)
This method has a very big cardinality. If you have constraint error or if you think what you have small cardinality you can rebuild order column (normalize).
In maximality situation with normalization (with this structure) you can have "cardinality hole" in 32 bit.
It is very simple and fast!
Remember NO DOUBLE!!! Only INT - order is precision value!
One solution I have used in the past, with some success, is to use a 'weight' instead of 'order'. Weight being the obvious, the heavier an item (ie: the lower the number) sinks to the bottom, the lighter (higher the number) rises to the top.
In the event I have multiple items with the same weight, I assume they are of the same importance and I order them alphabetically.
This means your SQL will look something like this:
ORDER BY 'weight', 'itemName'
hope that helps.
I am currently developing a database with a tree structure that needs to be ordered. I use a link-list kind of method that will be ordered on the client (not the database). Ordering could also be done in the database via a recursive query, but that is not necessary for this project.
I made this document that describes how we are going to implement storage of the sort order, including an example in postgresql. Please feel free to comment!
https://docs.google.com/document/d/14WuVyGk6ffYyrTzuypY38aIXZIs8H-HbA81st-syFFI/edit?usp=sharing

how to know how many times its a value in the databes? and for all values?

This can be aplied to tags, to coments, ...
My question its to obtain for example, top 20 repetitions for an atribute in a table..
I mean, i know i can ask for a selected atribute with the mysql_num_rows, but how do i do it to know all of them?
For example,
Most popular colors:
Red -100000
blue -5000
white -200
and so on...
if anyone can give me a clue.. thanks!
SELECT `name`, count(*) as `count` FROM `colors`
GROUP BY `name`
ORDER BY `count` DESC
You want to do this computation on the database, not in your application. Usually, a query of the following form should be fine:
SELECT color_id, COUNT(product_id) AS number
FROM products
GROUP BY color_id
ORDER BY number DESC
LIMIT 20
It will be faster this way, as only the value-count data will be sent from the database to the application. Also, if you have indices set up correctly (on color_id, for instance), thighs will be smoother.

Resources