Handling Id for multiple entities in solr/Lucene

I'm using Solr's DataImportHandler to index multiple tables: actors, actresses, directors and movies. Each of these tables has an id field which starts from 1. Solr's schema has a unique key field. Does this field need to be unique across the entire index or just within the entity? For example, if there are both an actor and a movie with id 1, will Solr be able to tell them apart, or will I have to make a globally unique key for each entity?

It needs to be unique across the entire index. This is easily achieved by prefixing each table's id with a per-table tag. For example, when selecting from the actors table:
SELECT CONCAT('ACT-', id) AS solrid, ...
Then index solrid as the document's unique key.
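As a rough illustration of why the prefix matters, here is a minimal Python sketch that models the index as a dictionary keyed by the unique key, so a repeated key overwrites the earlier document. The prefixes other than 'ACT', the helper names, and the sample rows are all made up for the example; this is not DataImportHandler code.

```python
# Hypothetical per-table prefixes; only 'ACT' appears in the answer above.
PREFIXES = {"actors": "ACT", "movies": "MOV"}


def solr_id(table: str, row_id: int) -> str:
    """Build an index-wide unique key from the table name and the table-local id."""
    return f"{PREFIXES[table]}-{row_id}"


def build_index(tables: dict, use_prefix: bool) -> dict:
    """Model an index as a dict keyed by unique key: a repeated key overwrites."""
    index = {}
    for table, rows in tables.items():
        for row in rows:
            key = solr_id(table, row["id"]) if use_prefix else str(row["id"])
            index[key] = {"solrid": key, "type": table, **row}
    return index


# Illustrative rows: both tables start their ids at 1.
tables = {
    "actors": [{"id": 1, "name": "some actor"}],
    "movies": [{"id": 1, "name": "some movie"}],
}

without_prefix = build_index(tables, use_prefix=False)  # the movie replaces the actor
with_prefix = build_index(tables, use_prefix=True)      # both documents survive
```

With raw ids only one document remains under key "1"; with prefixed ids the index holds both "ACT-1" and "MOV-1".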

Related

Two multivalued attributes in one entity

I am trying to create an entity (order) with a primary key, order ID, a foreign key, customer ID, and two multivalued attributes, Product ID and Quantity, because an order may have multiple products and each product has a specific quantity. The idea is that, since they are multivalued, a separate table will be created for them in this form: [orderID, ProductID, Quantity]. Is this reasoning correct?
ER diagram

Multi-Column b-tree index logic?

I know how to implement a b-tree for single-column indexes, but how do I implement a b-tree for multi-column indexes in my rdbms project?
For example, I have a table consisting of documents records:
Documents
-------------
id
serial_no
order_no
record_sequence
If I make an index with 3 columns, for example:
CREATE INDEX UNIQUE myindex(serial_no, order_no, record_sequence);
then I have a key name for my b-tree structure in this format:
serial_no*order_no*record_sequence.
I can request a record via this index and this query:
SELECT * FROM Documents WHERE serial_no='ABC' AND order_no=500 AND record_sequence=0;
Note: I am creating an index record ABC*500*0 as b-tree key name.
But when I call all records of a document, for example:
SELECT * FROM Documents WHERE serial_no='ABC' AND order_no=500;
I cannot use my index to search for records because record_sequence is missing in this example.
As a result, what is the method of creating and searching multi-column indexes?
As far as I know my b-tree object does not support searching for "ABC*500*ANY". I am using a RaptorDB_v2.7.5 b-tree object:
RaptorDB - the Document Store
NoSql, JSON based, Document store database with compiled .net map functions and automatic hybrid bitmap indexing and LINQ query filters
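One common way to support the "ABC*500*ANY" case is to keep the key as an ordered tuple of columns and treat a missing trailing column as a prefix range scan over the sorted keys. The sketch below uses a sorted Python list with bisect as a stand-in for the b-tree's ordered leaf level; the class and method names are hypothetical and this is not the RaptorDB API.

```python
import bisect


class CompositeIndex:
    """Sorted (key_tuple, record_id) entries standing in for b-tree leaf order."""

    def __init__(self):
        self._entries = []

    def insert(self, key, record_id):
        # Tuples compare column by column, which gives the multi-column ordering.
        bisect.insort(self._entries, (tuple(key), record_id))

    def prefix_scan(self, prefix):
        """Return record ids whose key starts with the given leading columns."""
        prefix = tuple(prefix)
        n = len(prefix)
        # Seek to the first entry whose key is >= the prefix, then walk forward
        # while the leading columns still match.
        i = bisect.bisect_left(self._entries, (prefix,))
        matches = []
        while i < len(self._entries) and self._entries[i][0][:n] == prefix:
            matches.append(self._entries[i][1])
            i += 1
        return matches


index = CompositeIndex()
index.insert(("ABC", 500, 0), 1)
index.insert(("ABC", 500, 1), 2)
index.insert(("ABC", 501, 0), 3)
index.insert(("ABD", 500, 0), 4)

all_of_document = index.prefix_scan(("ABC", 500))   # record_sequence left out
single_record = index.prefix_scan(("ABC", 500, 1))  # full key
```

The same idea only works for leading columns: a search on order_no alone cannot use this ordering, because matching keys are not contiguous.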

How to deal with compound keys using dih in solr

I am importing data from a MySQL database into Solr documents. All is fine, but I have one table with a compound key (a pair of columns that together form the primary key): the primary key for the post_locations table is (post_id, location_id).
But post_id is the unique key of my Solr document, so when data is imported from the post_locations table the location_ids overwrite one another. Is it possible to get the location_ids (which are of type int) as an array, since a post can have more than one location_id?

For MySQL you can use GROUP BY and GROUP_CONCAT to get all the values for a field grouped together in a single column, separated by commas. You can then use the RegexTransformer and splitBy on that field to index it as multiValued (in practice, indexing it as an array). I posted an example of this in a previous answer. You might also do this with dependent entity entries in DIH, but that will require more SQL queries than doing a GROUP BY and GROUP_CONCAT.
If you want one document for each row, you can build a custom uniqueKey instead, using CONCAT to build the compound key on the MySQL side.
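To make the data flow concrete, here is a small Python sketch that imitates the two steps: grouping the rows of post_locations into one comma-separated string per post (what GROUP_CONCAT produces) and then splitting that string back into a list (what splitBy does on import). The function names and sample rows are illustrative assumptions, not DIH or MySQL code.

```python
from collections import defaultdict


def group_concat(rows, sep=","):
    """Imitate GROUP BY post_id with GROUP_CONCAT(location_id)."""
    grouped = defaultdict(list)
    for post_id, location_id in rows:
        grouped[post_id].append(str(location_id))
    return {post_id: sep.join(ids) for post_id, ids in grouped.items()}


def split_by(value, sep=","):
    """Imitate splitting the concatenated column into a multivalued int field."""
    return [int(part) for part in value.split(sep)]


# Illustrative (post_id, location_id) rows from post_locations.
rows = [(1, 10), (1, 20), (2, 30)]

concatenated = group_concat(rows)
docs = {
    post_id: {"id": post_id, "location_ids": split_by(value)}
    for post_id, value in concatenated.items()
}
```

Each post ends up as a single document whose location_ids field holds every location, instead of later rows replacing earlier ones.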

Data Modeling and uuid on Cassandra

I am trying to build a movie database for educational purposes, using Cassandra as the backend. Queries on the database will mainly be by movie title. Currently the data I have fits the following model.
movie title | imdb rating | year of release | actors
Reading the CQL documentation I found the music playlist example where the following structure was used
CREATE TABLE playlists (
id uuid,
song_order int,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order ) );
My question is: why is a separate id column necessary? Can't the title column be used as the primary key? What are the advantages and disadvantages of not using a separate uuid field?
The command which I am designing for my model is
CREATE TABLE movies (
title text,
imdb_rating double,
year int,
actors text,
PRIMARY KEY (title, imdb_rating ) );
Here I believe in my model title is the PRIMARY KEY and the PARTITION KEY and imdb_rating is the CLUSTERING KEY(for arranging output in ascending order). Is there anything wrong in my model and how will it affect distribution of the data and why should I/should not use uuid? I am planning to keep a replication_factor of 2 because the number of nodes I am using is just 3.
Also according to the documentation
Do not use an index in these situations:
......
•On a frequently updated or deleted column
In my database the most updated column is imdb_rating so I am not building any secondary index on it.

Can't the title column be used as a primary key?
If the movie title is unique (which is not necessarily true) you could use title as primary key.
what are the advantages and disadvantages of not using a separate uuid field?
A UUID is good if you need an id that is globally unique without having to check for its uniqueness. If you can find a set of columns whose combination is guaranteed to be unique, you don't have to use a UUID (assuming you don't need an id to refer to the row).
But it all depends on your query pattern. If you are going to look up a movie by its id (probably coming from another table), use the UUID as the primary key. If you want to find movies with a specific title, use title as the primary key.
In your case, since title is not unique, use a combination of title and UUID as a composite key, given that you would search by title.
Here I believe in my model title is the PRIMARY KEY and the PARTITION KEY and imdb_rating is the CLUSTERING KEY(for arranging output in ascending order). Is there anything wrong in my model and how will it affect distribution of the data and why should I/should not use uuid?
In this case you have to use the rating and a UUID in the primary key, but when you query you will need to allow filtering.
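The effect of a non-unique title can be sketched in plain Python by modelling a table as a dictionary keyed by the primary key, where writing to an existing key replaces the row (Cassandra inserts behave as upserts on the primary key). The title, years, and helper names below are placeholders for illustration only.

```python
import uuid


def insert(table, primary_key, row):
    """Model an upsert: writing to an existing primary key replaces the row."""
    table[primary_key] = row


def find_by_title(table, title):
    """Return the years of all rows whose first key component is the title."""
    return sorted(row["year"] for key, row in table.items() if key[0] == title)


# PRIMARY KEY (title): two different movies sharing a title collide.
by_title = {}
insert(by_title, ("Same Title",), {"year": 1998})
insert(by_title, ("Same Title",), {"year": 2012})

# PRIMARY KEY (title, id): title is still the lookup key, the uuid keeps rows apart.
by_title_and_id = {}
for year in (1998, 2012):
    insert(by_title_and_id, ("Same Title", uuid.uuid4()), {"year": year})

years_title_only = find_by_title(by_title, "Same Title")
years_composite = find_by_title(by_title_and_id, "Same Title")
```

With title alone, only the last write survives; with the composite key, a search by title returns both movies.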

When doing a non ancestor query with a sort by key, will my result be ordered by entity groups?

I need to change a fair amount of entities belonging to different entity groups.
If I do a non-ancestor query, sorted by key, like:
Query query = new Query( "Kind" )
.setFilter( ... )
.addSort( Entity.KEY_RESERVED_PROPERTY, ASC or DESC );
Will I always have a result ordered by entity-groups? I am planning to iterate through the result until the parent (or grand-parent) key changes, and create a single transaction for all the entities in the same group - to avoid contention.
Will this work as expected? Any other suggestion?
Thank you.

Yes. Sorting by keys orders them by each entity in the ancestor list in order - eg, first by root entities, then by their children, and so forth.
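The batching plan from the question can be sketched in Python by modelling each key as its full ancestor path, a tuple of (kind, id) pairs from the root down. Comparing such tuples orders entities by root first, then by child, as described above, so entities of one entity group come out contiguous. The kinds and ids are invented, and tuple comparison here is only a stand-in for the Datastore's own key ordering.

```python
from itertools import groupby

# Each key is modelled as its ancestor path, root first (illustrative values).
keys = [
    (("Parent", 2), ("Kind", 1)),
    (("Parent", 1), ("Kind", 7)),
    (("Parent", 1), ("Child", 3), ("Kind", 4)),
    (("Parent", 2), ("Kind", 5)),
]

# Sorting by path compares the root element first, then each descendant in turn.
ordered = sorted(keys)

# Consecutive keys with the same root belong to the same entity group, so each
# batch could be handled in a single transaction.
batches = {
    root: list(members)
    for root, members in groupby(ordered, key=lambda path: path[0])
}
```

Iterating until the root element of the key changes is exactly what groupby does here, which is why it relies on the results already being sorted by key.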

Kindless queries or ancestor queries can only be sorted by key.
You are sorting by key, and that is OK.
The key is the result of PARENT + KIND + ID.
Each entity's kind is part of its key, so all your results will be sorted by kind, and then by key.
From GAE KEYS
Every model instance has an identifying key, which includes the
instance's entity kind along with a unique identifier. The identifier
may be either a key name string, assigned explicitly by the
application when the instance is created, or an integer numeric ID,
assigned automatically by App Engine when the instance is written
(put) to the Datastore.
