OrientDB CRUD for large and nested data - database

I'm very new to OrientDB. I'm trying to create a structure to insert and retrieve large amounts of data with nested fields, and I couldn't find a proper solution or guideline.
This is the structure of the table I want to create:
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
},
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
}
....Too many records....
Now I want to retrieve the Description field from the table by querying ItemNo and RAddress in bulk.
For example, I have 50K (50,000) pairs of UID or RecordID and ItemNo or RAddress, and based on this data I want to retrieve the Description field. I want to do this in the fastest possible way. Can anyone please suggest a good query for this task?
I have 500M records, most of which contain 10-12 words each.
Can anyone suggest CRUD queries for it?
Thanks in advance.

You might want to create a single record using CONTENT, like this:
INSERT INTO Test CONTENT {"UID": 0,"Name": "Test","RecordID": 0,"RecordData": {"RAddress": ["RAddress1", "RAddress2", "RAddress3"],"ItemNo": [1, 2, 3],"Description": ["Description1", "Description2", "Description3"]}}
That will get you started with embedded values and JSON. If you want to do a bulk insert, however, you should write a function. There are many ways to do so, but if you want to stay in Studio, use the Functions tab.
As for the retrieving part:
SELECT RecordData[Description] FROM Test WHERE (RecordData[ItemNo] CONTAINSTEXT "1") AND (RecordData[RAddress] CONTAINSTEXT "RAddress1")
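For the bulk-insert side, the statements themselves can be generated from application data. The following is a minimal sketch in Python that only builds `INSERT INTO ... CONTENT` strings of the kind shown above. It does not talk to an OrientDB server; `build_insert` and the sample record are hypothetical names and data introduced for illustration, and the record stores RecordData as a list of embedded objects, as in the question, rather than as parallel arrays.

```python
import json


def build_insert(class_name, record):
    """Return an `INSERT INTO <class> CONTENT <json>` statement as a string."""
    # json.dumps takes care of quoting and escaping the field values.
    return "INSERT INTO {} CONTENT {}".format(class_name, json.dumps(record))


# Hypothetical sample record shaped like the structure in the question.
record = {
    "UID": 0,
    "Name": "Test",
    "RecordID": 0,
    "RecordData": [
        {"RAddress": "RAddress1", "ItemNo": 1, "Description": "Description1"},
        {"RAddress": "RAddress2", "ItemNo": 2, "Description": "Description2"},
    ],
}

statement = build_insert("Test", record)
print(statement)
```

How the generated statements are sent to the server (a Studio function, a driver, or a batch script) is left open here.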

Related

Most optimal way to store nested information in a database

I want to store some nested information in a Postgres database and I am wondering what is the most optimal way to do so.
I have a list of cars for rent, structured like this:
[Brand] > [Model] > [Individual cars for rent of that brand and model], ex.:
[
{
"id": 1,
"name": "Audi",
"models": [
{
"id": 1,
"name": "A1",
"cars": [
{
"id": 1,
"license": "RPY9973",
"mileage": "41053"
},
{
"id": 2,
"license": "RPY3001",
"mileage": "102302"
},
{
"id": 3,
"license": "RPY9852",
"mileage": "10236"
}
]
},
{
"id": 2,
"name": "A3",
"cars": [
{
"id": 1,
"license": "RPY1013",
"mileage": "66952"
},
{
"id": 2,
"license": "RPY3284",
"mileage": "215213"
},
{
"id": 3,
"license": "RPY0126",
"mileage": "19632"
}
]
}
...
]
}
...
]
Currently, having limited experience with databases and storing arrays, I am storing it in a 'brands' table with the following columns:
id (integer) - brand ID
name (text) - brand name
models (text) - contains stringified content of models and cars within them, which are parsed upon reading
In practice, this does the job, however I would like to know what the most efficient way would be.
For example, should I split the single table into three tables: 'brands', 'models' and 'cars' and have the tables reference each other (brands.models would be an array of unique model IDs, which I could use to read data from the 'models' table, and models.cars would be an array of unique car IDs, which I could use to read data from the 'cars' table)?
Rather than store it as json, jsonb, or as arrays, the most efficient way to store the data would be to store it as relational data (excluding the data types for brevity):
create table brands(
id,
name,
/* other columns */
PRIMARY KEY (id)
);
create table models(
id,
name,
brand_id REFERENCES brands(id),
/* other columns */
PRIMARY KEY (id)
);
create table cars(
id,
model_id REFERENCES models(id),
mileage,
license,
/* other columns */
PRIMARY KEY (id)
);
You can then fetch and update each entity individually, without having to parse JSON. Partial updates are also much easier when you only have to focus on a single row, rather than worrying about updating arrays or JSON. For querying, you would join on the keys. For example, to get the rental cars available for a brand:
select b.id, b.name, m.id, m.name, c.id, c.mileage, c.license
FROM brands b
LEFT JOIN models m
ON m.brand_id = b.id
LEFT JOIN cars c
ON c.model_id = m.id
where b.id = ?
Based on querying / filtering patterns, you would then also want to create indexes on commonly used columns...
CREATE INDEX idx_car_model ON cars(model_id);
CREATE INDEX idx_model_brand ON models(brand_id);
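To try the relational layout end to end, here is a small sketch that uses Python's built-in sqlite3 module instead of Postgres, purely so that it runs without a server. The column types are assumptions (the answer leaves them out), and the sample rows are taken loosely from the question, with car IDs made unique across models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE models (
        id INTEGER PRIMARY KEY,
        name TEXT,
        brand_id INTEGER REFERENCES brands(id)
    );
    CREATE TABLE cars (
        id INTEGER PRIMARY KEY,
        model_id INTEGER REFERENCES models(id),
        mileage INTEGER,
        license TEXT
    );
    CREATE INDEX idx_car_model ON cars(model_id);
    CREATE INDEX idx_model_brand ON models(brand_id);
""")

conn.execute("INSERT INTO brands VALUES (1, 'Audi')")
conn.executemany("INSERT INTO models VALUES (?, ?, ?)", [(1, "A1", 1), (2, "A3", 1)])
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?)",
    [(1, 1, 41053, "RPY9973"), (2, 1, 102302, "RPY3001"), (3, 2, 66952, "RPY1013")],
)

# Same join as in the answer, parameterised on the brand ID.
rows = conn.execute(
    """
    SELECT b.name, m.name, c.license, c.mileage
    FROM brands b
    LEFT JOIN models m ON m.brand_id = b.id
    LEFT JOIN cars c ON c.model_id = m.id
    WHERE b.id = ?
    ORDER BY m.id, c.id
    """,
    (1,),
).fetchall()

for row in rows:
    print(row)
```

Each car comes back as its own row, so updating a single mileage is a one-row UPDATE on cars rather than a rewrite of a serialized blob.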
The best way to store nested data in your Postgres database is a json or jsonb field.
The benefits, of jsonb in particular, are:
it is significantly faster to process than json and supports indexing (which can be a significant advantage),
it allows simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000x)

Database Model Design With Laravel Eloquent

We have a problem querying our database the way it is meant to be queried:
Tables:
employees <1-n> employee_card_validity <n-1> card <1-n> stamptimes
id              id                           id         id
                employee_id                  no         card_id
                card_id                                 timestamp
                valid_from
                valid_to
Employee is mapped onto Card via the EmployeeCardValidity Pivot which has additional attributes.
We reuse cards, which means that a card has multiple entries in the pivot table. Which entry applies is determined by valid_from/valid_to. These attributes are constrained not to overlap. That way there is always a unique relationship from employee to stamptimes, where an Employee can have multiple cards and a card can belong to multiple Employees over time.
Where we fail is in defining a custom relationship from Employee to Stamptimes that takes into account which Stamptimes belong to an Employee. When I fetch a Stamptime, its timestamp assigns it unambiguously to one Card validity entry, because it lies between that entry's valid_from and valid_to.
But I cannot define an appropriate relation that gives me all Stamptimes for a given Employee. The only thing I have so far is a static field in Employee that I use to limit the relationship so that it only fetches Stamptimes after a given time.
public static $date = '';
public function cardsX() {
return $this->belongsToMany('App\Models\Tempos\Card', 'employee_card_validity',
'employee_id', 'card_id')
->wherePivot('valid_from', '>', self::$date);
}
Then I would say in the Controller:
\App\Models\Tempos\Employee::$date = '2020-01-20 00:00:00';
$ags = DepartmentGroup::with(['departments.employees.cardsX.stamptimes'])
But I cannot do that dynamically depending on the actual query result as you could with sql:
SELECT ecv.card_id, employee_id, valid_from, valid_to, s.timestamp
FROM staff.employee_card_validity ecv
join staff.stamptimes s on s.card_id = ecv.card_id
and s.timestamp between valid_from and coalesce(valid_to , 'infinity'::timestamp)
where employee_id = ?
So my question is: is that database design unusual, or is an ORM just not capable of describing such relationships? Do I have to fall back to QueryBuilder/SQL in such cases?
Do you adapt your database model to the ORM, or the other way round?
You can try:
// $employeeId is a placeholder for the ID of the employee you are looking up.
DB::query()->selectRaw('*')->from('employee_card_validity')
->join('stamptimes', function($join) {
return $join->on('employee_card_validity.card_id', '=', 'stamptimes.card_id')
->whereRaw('stamptimes.timestamp between employee_card_validity.valid_from and employee_card_validity.valid_to');
})->where('employee_id', $employeeId)->get();
If your Laravel version is above 5.5, I believe you can create a model that extends the Pivot class, so:
// $employeeId is a placeholder for the ID of the employee you are looking up.
EmployeeCardValidity::join('stamptimes', function($join) {
return $join->on('employee_card_validity.card_id', '=', 'stamptimes.card_id')
->whereRaw('stamptimes.timestamp between employee_card_validity.valid_from and employee_card_validity.valid_to');
})->where('employee_id', $employeeId)->get();
But the code above only translates your SQL query. I believe I could write something better if I knew your exact use cases.
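The range join that the SQL and the query-builder versions above both express can be tried out in isolation. This is a minimal sketch using Python's built-in sqlite3 module rather than Postgres or Laravel. The sample rows and the helper name `stamptimes_for` are hypothetical, timestamps are stored as ISO-8601 text, and an open-ended validity (NULL valid_to) is replaced by a far-future constant because SQLite has no 'infinity' timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_card_validity (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER,
        card_id INTEGER,
        valid_from TEXT,
        valid_to TEXT          -- NULL means "still valid"
    );
    CREATE TABLE stamptimes (id INTEGER PRIMARY KEY, card_id INTEGER, timestamp TEXT);
""")

# Card 7 is reused: employee 10 holds it in January, employee 11 from February on.
conn.executemany(
    "INSERT INTO employee_card_validity VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 7, "2020-01-01 00:00:00", "2020-01-31 23:59:59"),
        (2, 11, 7, "2020-02-01 00:00:00", None),
    ],
)
conn.executemany(
    "INSERT INTO stamptimes VALUES (?, ?, ?)",
    [(1, 7, "2020-01-20 08:00:00"), (2, 7, "2020-02-03 08:05:00")],
)


def stamptimes_for(employee_id):
    # ISO-8601 strings sort chronologically, so BETWEEN works on TEXT here.
    return conn.execute(
        """
        SELECT s.timestamp
        FROM employee_card_validity ecv
        JOIN stamptimes s
          ON s.card_id = ecv.card_id
         AND s.timestamp BETWEEN ecv.valid_from
                             AND COALESCE(ecv.valid_to, '9999-12-31 23:59:59')
        WHERE ecv.employee_id = ?
        ORDER BY s.timestamp
        """,
        (employee_id,),
    ).fetchall()


print(stamptimes_for(10))
print(stamptimes_for(11))
```

Both stamps were made with the same card, but each one is attributed only to the employee whose validity window contains it.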

Postgresql 9.5 JSONB nested arrays LIKE statement

I have a jsonb column, called "product", that contains a jsonb object similar to the one below. I'm trying to figure out how to do a LIKE statement against this data in PostgreSQL 9.5.
{
"name":"Some Product",
"variants":[
{
"color":"blue",
"skus":[
{
"uom":"each",
"code":"ZZWG002NCHZ-65"
},
{
"uom":"case",
"code":"ZZWG002NCHZ-65-CASE"
}
]
}
]}
The following query works for exact match.
SELECT * FROM products WHERE product #> '{variants}' @> '[{"skus":[{"code":"ZZWG002NCHZ-65"}]}]';
But I need to support LIKE statements such as "begins with", "ends with" and "contains". How would this be done?
Example: Let's say I want all products returned that have a SKU code that begins with "ZZWG00".
You should unnest variants and skus (using jsonb_array_elements()), so that you can examine sku->>'code':
SELECT DISTINCT p.*
FROM
products p,
jsonb_array_elements(product->'variants') as variants(variant),
jsonb_array_elements(variant->'skus') as skus(sku)
WHERE
sku->>'code' like 'ZZWG00%';
Use DISTINCT, because a single product yields multiple rows when several of its SKUs match.
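To make the unnesting logic concrete, here is a plain-Python analogue of the query above. It is not Postgres code: it walks the same variants and skus arrays, applies a predicate to each code, and keeps each product at most once, which is the role DISTINCT plays in the SQL. The function name `products_with_sku_like` and the second sample product are hypothetical.

```python
def products_with_sku_like(products, predicate):
    """Return the products that have at least one SKU code matching predicate."""
    matches = []
    for product in products:
        # Flatten variants -> skus -> code, like the two jsonb_array_elements() calls.
        codes = (
            sku.get("code", "")
            for variant in product.get("variants", [])
            for sku in variant.get("skus", [])
        )
        # any() keeps a product once even if several of its SKUs match.
        if any(predicate(code) for code in codes):
            matches.append(product)
    return matches


products = [
    {"name": "Some Product", "variants": [{"color": "blue", "skus": [
        {"uom": "each", "code": "ZZWG002NCHZ-65"},
        {"uom": "case", "code": "ZZWG002NCHZ-65-CASE"},
    ]}]},
    {"name": "Other Product", "variants": [{"color": "red", "skus": [
        {"uom": "each", "code": "ABC-1"},
    ]}]},
]

begins_with = products_with_sku_like(products, lambda c: c.startswith("ZZWG00"))
ends_with = products_with_sku_like(products, lambda c: c.endswith("-CASE"))
contains = products_with_sku_like(products, lambda c: "002N" in c)

print([p["name"] for p in begins_with])
```

The three predicates correspond to the LIKE patterns 'ZZWG00%', '%-CASE' and '%002N%' respectively.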

Solr query join that returns common values

Is it possible to retrieve the common values that were used in a Solr join?
For example, say I have two cores:
1) hospital, fields: id, doctor_id (multiValued), patient_id (multiValued)
2) dental_office, fields: id, dentist_id (multiValued), patient_id (multiValued)
I would like to find all of the patients who go to a specific dental_office (id = 2) and see a specific doctor (doctor_id = 123).
Currently my query on the hospital core looks like this:
"q=doctor_id:(123)",
"fq={!join from=patient_id to=patient_id fromIndex=dental_office}id:(2)"
However, this returns only the hospitals that match the query. What I actually want is the hospitals along with the patient_ids that matched. Something such as:
hospital docs:
{ id: 1, patient_ids: [234, 56, 8] }
{ id: 8, patient_ids: [8, 45, 89] }
This seems difficult since patient_ids is multiValued. Is there a way to do this?
Thanks!
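Solr is document-oriented, so it has no relational JOIN between cores. The {!join} query parser only filters the documents on the "to" side, much like a subquery, and does not report which values matched or return fields from the "from" core.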

Salesforce SOQL statement

I'm using Salesforce and trying to write a SOQL statement. My tables look like:
Person: [id, name]
Related: [id, personid1, personid2]
In SQL, to find all the people someone is related to, I might write something like:
select person2.name from
person person1, related, person person2
where person1.id = 'xyz'
and person1.id = related.personid1
and related.personid2 = person2.id
How can I achieve the same result set using a SOQL statement?
For the purposes of this query I'm going to assume your custom objects and fields use the regular Salesforce naming conventions.
If you're querying with a record ID:
select personid2__r.Name from Related__c where personid1__c = 'xxxyyyzzz123123'
Or if you're querying with a name:
select personid2__r.Name from Related__c where personid1__r.Name = 'John Doe'
If you absolutely need to return records of type Person__c, then you could do something like:
select Id, Name from Person__c where Id in (select personid2__c from Related__c where personid1__c = 'xxxyyyzzz123123')
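The last SOQL form is a semi-join (an IN subquery), and it returns the same people as the three-table join from the question. The sketch below checks that equivalence in plain SQL, using Python's built-in sqlite3 module; it does not call Salesforce. The table and column names follow the question's table listing (personid1, personid2), and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE related (id INTEGER PRIMARY KEY, personid1 TEXT, personid2 TEXT);
    INSERT INTO person VALUES ('xyz', 'John Doe'), ('abc', 'Jane Doe'), ('def', 'Jim Roe');
    INSERT INTO related VALUES (1, 'xyz', 'abc'), (2, 'def', 'xyz');
""")

# Three-table join, as in the question.
join_rows = conn.execute(
    """
    SELECT person2.name
    FROM person person1, related, person person2
    WHERE person1.id = ?
      AND person1.id = related.personid1
      AND related.personid2 = person2.id
    """,
    ("xyz",),
).fetchall()

# IN subquery, mirroring the shape of the last SOQL statement.
subquery_rows = conn.execute(
    """
    SELECT name FROM person
    WHERE id IN (SELECT personid2 FROM related WHERE personid1 = ?)
    """,
    ("xyz",),
).fetchall()

print(join_rows, subquery_rows)
```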
