Solr query join that returns common values - solr

Is it possible to retrieve the common values that were used in a Solr join?
For example, say I have two cores:
1) hospital, fields: id, doctor_id (multiValued), patient_id (multiValued)
2) dental_office, fields: id, dentist_id (multiValued) patient_id (multiValued)
I would like to find all of the patients who go to a specific dental_office (id = 2) and see a specific doctor (doctor_id = 123).
Currently my query on the hospital core looks like this:
"q=doctor_id:(123)",
"fq={!join from=patient_id to=patient_id fromIndex=dental_office}id:(2)"
However, this returns only the hospitals that match the query. What I actually want is the hospitals along with the patient_ids that matched, something like:
hospital docs:
{ id: 1, patient_ids: [234, 56, 8] }
{ id: 8, patient_ids: [8, 45, 89] }
This seems difficult since patient_ids is multiValued. Is there a way to do this?
Thanks!

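Solr is document-oriented, so its join is not a relational JOIN: {!join} only filters the documents on the "to" side, and fields from the "from" documents are not available in the results. The query therefore cannot tell you which patient_id values matched.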
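Since the join in the question only filters hospital documents, one option is to compute the overlap on the client. The sketch below is plain Python and does not call Solr: it assumes you have already fetched the hospital documents for doctor_id 123 and the patient_id list of dental_office 2 in two separate queries. The function name matched_patients and the sample data are made up for illustration.

```python
def matched_patients(hospital_docs, dental_patient_ids):
    """Return each hospital with only the patient ids shared with the dental office."""
    wanted = set(dental_patient_ids)
    result = []
    for doc in hospital_docs:
        # Keep the hospital's own ordering of patient ids.
        common = [pid for pid in doc["patient_id"] if pid in wanted]
        if common:
            result.append({"id": doc["id"], "patient_ids": common})
    return result


# Hypothetical documents, shaped like the fields described in the question.
hospital_docs = [
    {"id": 1, "doctor_id": [123], "patient_id": [234, 56, 8, 900]},
    {"id": 8, "doctor_id": [123, 7], "patient_id": [8, 45, 89]},
]
dental_patient_ids = [234, 56, 8, 45, 89]

matches = matched_patients(hospital_docs, dental_patient_ids)
```

This costs one extra round trip to Solr, and it only stays cheap while the patient_id list of the dental office fits comfortably in memory.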

Related

Fastest way to fetch multiple rows by ID, with ordering intact

I need to look up rows from the Product table based on an ordered list of primary keys.
pks = [22, 51, 22, 45]
products = list(Products.objects.filter(pk__in=pks).values_list("pk", flat=True))
# The order of products is not the same as the order of pks.
# One solution is to put them into a mapping of sorts:
products_by_id = {
    prod.pk: prod for prod in Products.objects.filter(pk__in=pks)
}
ordered_prods = [products_by_id[pk] for pk in pks]
Is there a better or faster way of doing that with the Django ORM?
Something like Products.objects.filter(pk__in=pks).order_by(lambda p: ...pk.find[p.id])
https://gist.github.com/cpjolicoeur/3590737?permalink_comment_id=2202866#gistcomment-2202866
This seems to be exactly what I'm looking for.
SELECT * FROM foo WHERE id IN (3, 1, 2) ORDER BY array_position(ARRAY[3, 1, 2], id);
Is it possible to use extra() with array_position somehow?
With the ORM, you can annotate with the array_position function and then order by the annotated field:
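from django.db.models import F, Func, Value

# array_position is a PostgreSQL function, so this only works on Postgres.
Products.objects.filter(
    pk__in=pks
).annotate(
    _array_order=Func(
        Value(pks),
        F("pk"),
        function="array_position",
    )
).order_by(
    "_array_order"
)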
However, in many cases converting the QuerySet to a list and ordering it in Python code is also fine.
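A minimal sketch of the Python-side ordering, assuming the rows have already been fetched. It uses a namedtuple as a stand-in for a model instance so it runs without Django. The helper name order_by_pks is hypothetical, and unlike the mapping in the question it silently skips primary keys that were not found.

```python
from collections import namedtuple

# Stand-in for a Django model instance; only the pk attribute matters here.
Product = namedtuple("Product", ["pk", "name"])


def order_by_pks(rows, pks):
    """Return rows in the order given by pks, repeating rows for repeated pks."""
    by_pk = {row.pk: row for row in rows}
    # Skip pks that have no matching row instead of raising KeyError.
    return [by_pk[pk] for pk in pks if pk in by_pk]


# Rows as the database might return them, in arbitrary order.
rows = [Product(45, "c"), Product(22, "a"), Product(51, "b")]
ordered = order_by_pks(rows, [22, 51, 22, 45])
```

Building the dictionary is a single pass over the rows, so the whole reordering stays linear in the number of keys.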

Postgres search jsonb with indexes

I'm new to Postgres jsonb operations.
I'm storing some data in Postgres in a jsonb column, which holds flexible metadata as shown below.
I want to search the different unique metadata entries (key:value pairs).
id, type, metadata
1, player, {"name": "john", "height": 180, "team": "xyz"}
2, game, {"name": "afl", "members": 10, "team": "xyz"}
The results should look something like the list below: distinct and ordered ascending. I want the query to be efficient by using indexes.
key | value
______________
height 180
members 10
name afl
name john
team xyz
My solution below hits the index for the search, but the sorting and DISTINCT won't hit any index because they operate on values processed from the jsonb.
CREATE INDEX metadata_jsonb_each_text_idx ON table
USING GIN (jsonb_pretty(metadata) gin_trgm_ops);
select distinct t, t.*
from table u, jsonb_each_text(u.metadata) t
where jsonb_pretty(u.metadata) like '%key%'
order by t.key, t.value
Appreciate any thoughts on this issue.
Thanks!
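To make the expected result concrete, here is a plain-Python sketch of what the DISTINCT plus ORDER BY over jsonb_each_text is meant to produce for the two sample rows. It is not a substitute for an indexed query and says nothing about the query plan. The function name distinct_pairs is made up, and values are converted to text to mirror jsonb_each_text.

```python
def distinct_pairs(metadata_rows):
    """Collect unique (key, value-as-text) pairs and sort by key, then value."""
    pairs = {
        (key, str(value))
        for metadata in metadata_rows
        for key, value in metadata.items()
    }
    return sorted(pairs)


# The metadata column of the two sample rows from the question.
sample_rows = [
    {"name": "john", "height": 180, "team": "xyz"},
    {"name": "afl", "members": 10, "team": "xyz"},
]

pairs = distinct_pairs(sample_rows)
for key, value in pairs:
    print(key, value)
```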

OrientDB CRUD for large and nested data

I'm very new to OrientDB. I'm trying to create a structure to insert and retrieve large data with nested fields, and I couldn't find a proper solution or guideline.
This is the structure of the table I want to create:
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
},
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
}
....Too many records....
Now, I want to retrieve the Description field from the table by querying ItemNo and RAddress in bulk.
For example, I have 50K (50000) pairs of UID or RecordID and ItemNo or RAddress, and based on this data I want to retrieve the Description field. I want to do this in the fastest possible way. Can anyone please suggest a good query for this task?
I have 500M records, most of which contain 10-12 words each.
Can anyone suggest CRUD queries for it?
Thanks in advance.
You might want to create a single record using CONTENT, like this:
INSERT INTO Test CONTENT {"UID": 0,"Name": "Test","RecordID": 0,"RecordData": {"RAddress": ["RAddress1", "RAddress2", "RAddress3"],"ItemNo": [1, 2, 3],"Description": ["Description1", "Description2", "Description3"]}}
That'll get you started with embedded values and JSON. However, if you want to do a bulk insert, you should write a function. There are many ways to do so, but if you want to stay in Studio, go for the Function tab.
As for the retrieving part:
SELECT RecordData[Description] FROM Test WHERE (RecordData[ItemNo] CONTAINSTEXT "1") AND (RecordData[RAddress] CONTAINSTEXT "RAddress1")
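Note that the INSERT above stores the nested fields as three parallel lists, so an entry is identified by its position in each list. The following plain-Python sketch is independent of OrientDB and only illustrates how a lookup over that layout pairs the values up. The function name find_descriptions is hypothetical.

```python
# The same record as in the INSERT statement, as a Python dictionary.
record = {
    "UID": 0,
    "Name": "Test",
    "RecordID": 0,
    "RecordData": {
        "RAddress": ["RAddress1", "RAddress2", "RAddress3"],
        "ItemNo": [1, 2, 3],
        "Description": ["Description1", "Description2", "Description3"],
    },
}


def find_descriptions(rec, item_no, r_address):
    """Return descriptions whose ItemNo and RAddress match at the same position."""
    data = rec["RecordData"]
    entries = zip(data["RAddress"], data["ItemNo"], data["Description"])
    return [
        description
        for address, number, description in entries
        if number == item_no and address == r_address
    ]


found = find_descriptions(record, 1, "RAddress1")
```

If the three lists ever get out of step, the pairing breaks, which is one reason to consider a list of embedded objects, as in the structure from the question.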

Best way to use compound Index to query with multiple combination of query parameters?

I am building functionality to estimate inventory for my ad-serving platform. The fields I am trying to estimate on, with their cardinality, are as follows:
FIELD: CARDINALITY
location: 10000 (bengaluru, chennai etc..)
n/w speed : 6 (w, 4G, 3G, 2G, G, NA)
priceRange : 10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
users: contains number of users falling under any of the above combination.
Ex. {'location':'bengaluru', 'n/w':'4G', priceRange:8, users: 1000}
means 1000 users are from bengaluru having 4G and priceRange = 8
So the total number of combinations is 10000 * 6 * 10 = 600000. In the future, more fields may be added, up to around 29 (currently there are 3: location, n/w, priceRange), and the total number of combinations could reach the order of 10mn. Now I want to estimate how many users fall under
Now queries I will need are as follows:
1) find all users who are from location:bengaluru , n/w:3G, priceRange: 6
2) find all users from bengaluru
3) Find all users falling under n/w: 3G and priceRange: 8
What is the best possible way to approach to this?
Which database would be best suited for this requirement? What indexes do I need to build? Will a compound index help? If yes, then how? Any help is appreciated.
Here's my final answer:
Create table Attribute(
ID int,
Name varchar(50));
Create table AttributeValue(
ID int,
AttributeID int,
Value varchar(50));
Create table userAttributeValue(
userID int,
AttributeID int,
AttributeValue varchar(50));
Create table User(
ID int);
Insert into user (ID) values (1),(2),(3),(4),(5);
Insert into Attribute (ID,Name) Values (1,'Location'),(2,'nwSpeed'),(3,'PriceRange');
Insert into AttributeValue values
(1,1,'bengaluru'),(2,1,'chennai'),
(3,2, 'w'), (4, 2,'4G'), (5,2,'3G'), (6,2,'2G'), (7,2,'G'), (8,2,'NA'),
(9,3,'1'), (10,3,'2'), (11,3,'3'), (12,3,'4'), (13,3,'5'), (14,3,'6'), (15,3,'7'), (16,3,'8'), (17,3,'9'), (18,3,'10');
Insert into UserAttributeValue (userID, AttributeID, AttributeValue) values
(1,1,1),
(1,2,5),
(1,3,9),
(2,1,1),
(2,2,4),
(3,2,6),
(2,3,13),
(4,1,1),
(4,2,4),
(4,3,13),
(5,1,1),
(5,2,5),
(5,3,13);
Select USERID
from UserAttributeValue
where (AttributeID,AttributeValue) in ((1,1),(2,4))
GROUP BY USERID
having count(distinct concat(AttributeID,AttributeValue))=2
If you need a count, wrap userID in COUNT and divide by the number of attributes passed in: each user has one record per attribute, so to get the count of users you need to divide by the number of attributes.
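The query above can be tried out with Python's built-in sqlite3 module. This is a sketch under two substitutions, since SQLite is only a stand-in for whatever engine you use: the row-value IN list is written as OR-ed conditions, and COUNT(DISTINCT AttributeID) replaces the concat expression, which is equivalent as long as each attribute appears at most once in the filter. The function name users_matching is made up. The data is the UserAttributeValue sample from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserAttributeValue "
    "(userID INTEGER, AttributeID INTEGER, AttributeValue INTEGER)"
)
sample = [
    (1, 1, 1), (1, 2, 5), (1, 3, 9),
    (2, 1, 1), (2, 2, 4), (3, 2, 6), (2, 3, 13),
    (4, 1, 1), (4, 2, 4), (4, 3, 13),
    (5, 1, 1), (5, 2, 5), (5, 3, 13),
]
conn.executemany("INSERT INTO UserAttributeValue VALUES (?, ?, ?)", sample)


def users_matching(connection, wanted):
    """Return ids of users having every (AttributeID, AttributeValue) pair in wanted."""
    condition = " OR ".join(
        "(AttributeID = ? AND AttributeValue = ?)" for _ in wanted
    )
    params = [value for pair in wanted for value in pair]
    sql = (
        "SELECT userID FROM UserAttributeValue "
        f"WHERE {condition} "
        "GROUP BY userID "
        # A user qualifies only if all requested attributes matched.
        "HAVING COUNT(DISTINCT AttributeID) = ? "
        "ORDER BY userID"
    )
    return [row[0] for row in connection.execute(sql, params + [len(wanted)])]


# Location = bengaluru (1, 1) and nwSpeed = 4G (2, 4).
matching = users_matching(conn, [(1, 1), (2, 4)])
```

With the GROUP BY and HAVING in place, the number of matching users is simply the number of rows returned, here len(matching).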
This allows for N growth of Attributes and the AttributeValues per user without changes to UI or database if UI is designed correctly.
By treating each datapoint as an attribute and storing them in one place, we can enforce database integrity.
The Attribute and AttributeValue tables become lookups for UserAttributeValue, so you can translate the IDs back to the attribute name and the value.
This also means we only have 4 tables user, attribute, attributeValue, and UserAttributeValue.
Technically you don't have to store attributeID on the userAttributeValue, but for performance reasons on later joins/reporting I think you'll find it beneficial.
You need to add proper primary keys, foreign keys, and indexes to the tables; they should be fairly self-explanatory. On UserAttributeValue I would have a few composite indexes, each with a different order of the unique key. It depends on the type of reporting/analysis you'll be doing, but adding keys as performance tuning is needed is commonplace.
Assumptions:
You're ok with all datavalues being varchar data in all cases.
If needed, you could add a datatype, precision, and scale on the attribute table and allow the UI to cast the attribute value as needed. But since the values are all in the same field in the database, they all have to be the same datatype and of the same precision/scale.
Pivot tables to display the data across will likely be needed, and you know how to handle those (and your engine supports them!)
Gotta say I loved the mental exercise, but I would still appreciate feedback from others on SO. I've used this approach in 1 system I've developed, and it's been in two I've supported. There are some challenges, but it does follow 3rd normal form db design (except for the replicated attributeID in userAttributeValue, but that's there for performance gain in reporting/filtering).

SQL Server tables extend to many tables

I'm handling this problem at my company: we have different customers that need different fields in the same table, but we do not want a table with 300 columns, which is inefficient, hard to use, and so on. Example:
table_products has these fields: product_id, product_name, product_cost.
Then the first client, 'X', needs the field product_registerid.
Client 'Y' needs the field product_zipareaid.
This happens for different reasons. For example, the clients are from different states, which have different rules.
At the moment we have come up with this solution, which I don't like:
product_id, product_name, product_cost, product_personal. In product_personal we save values like '{product_registerid:001;product_zipareaid:001-000131}'.
I came up with a theoretical solution: extend the table, so that when I query the extended table, SQL Server knows to show me its column together with the main table's columns. Something like:
table_products with columns product_id, product_name, product_cost.
table_products_x with column product_registerid.
table_products_y with column product_zipareaid.
And the queries would return:
1.
select * from table_products where product_registerid = 001:
product_id, product_name, product_cost, product_registerid
1, iphone, 599, 001.
2.
select * from table_products where product_zipareaid = 000-000110:
product_id, product_name, product_cost, product_zipareaid
1, iphone, 599, 000-000110.
So, I'm open to different suggestions for solving our problem.
Thank you in advance!
One approach would be to add a single Extended Properties table, that would look something like this:
Product_id (FK)
Client_id
PropertyName
PropertyValue
And so it would be populated with values like:
Product_id Client_id PropertyName PropertyValue
1 x product_registerid 001
1 y product_zipareaid 000-000110
Then you just join table_products to Extended_properties on Product_Id and put the Client_id(s) you want in the WHERE clause.
Note that you'll probably end up wanting to use a PIVOT query to get multiple extended properties for each client.
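A rough sketch of the join described above, run through Python's built-in sqlite3 module so it is easy to try. SQLite stands in for SQL Server here and has no PIVOT operator, so the sketch uses conditional aggregation (MAX over a CASE expression) to turn the property row into a column. The column types and the table name Extended_properties are assumptions. The sample rows are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_products (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        product_cost INTEGER
    );
    CREATE TABLE Extended_properties (
        product_id INTEGER REFERENCES table_products(product_id),
        client_id TEXT,
        PropertyName TEXT,
        PropertyValue TEXT
    );
    INSERT INTO table_products VALUES (1, 'iphone', 599);
    INSERT INTO Extended_properties VALUES
        (1, 'x', 'product_registerid', '001'),
        (1, 'y', 'product_zipareaid', '000-000110');
    """
)

# Join the base table to the properties of one client and pivot one property.
sql = """
SELECT p.product_id, p.product_name, p.product_cost,
       MAX(CASE WHEN e.PropertyName = 'product_registerid'
                THEN e.PropertyValue END) AS product_registerid
FROM table_products AS p
JOIN Extended_properties AS e ON e.product_id = p.product_id
WHERE e.client_id = ?
GROUP BY p.product_id, p.product_name, p.product_cost
"""
row_for_x = conn.execute(sql, ("x",)).fetchone()
```

Each additional extended property needs one more MAX(CASE ...) column, which is the part a PIVOT query would express more compactly on SQL Server.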
