Selecting entities with the highest value of some attribute - datomic

Suppose I have one million article entities in my backend with an inst attribute called date, or one million player entities with an int attribute called points. What's a good way to select the 10 latest articles or top-scoring players?
Do I need to fetch all million entities to the peer and then sort and drop from them?

Until access to a reverse index becomes a Datomic feature, you could define one manually.
E.g. for a :db.type/instant attribute, create an additional indexed attribute (:db/index true) of type :db.type/long, which you fill with
(- Long/MAX_VALUE (.getTime date))
Newer dates then get smaller values, so the latest 10 articles are the first 10 datoms in the index:
(take 10 (d/index-range db reverse-attr nil nil))

Yes, you would need to fetch all the data, since there's no index that would help you out here.
I would create my own "index" and denormalize this data. You can keep a separate set of N entities, sized however you like: start with 10, or store 100 to trade some (possibly negligible) speed for more flexibility. This index can live on a separate "singleton" entity that you add as part of your schema.
;; The attribute that stores the index
{:db/id #db/id[:db.part/db]
 :db/ident :indexed-articles
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db.install/_attribute :db.part/db}
;; The named index entity.
{:db/id #db/id[:db.part/db]
 :db/ident :articles-index}
You can maintain it with a database function. Every time you insert a new entity that you want to "index", call this function in the same transaction.
[[:db/add tempid :article/title "Foo"]
 [:db/add tempid :article/date ....]
 [:index-article tempid 10]]
The implementation of index-article could look like this:
{:db/id #db/id[:db.part/user]
 :db/ident :index-article
 :db/fn #db/fn {:lang "clojure"
                :params [db article-id idx-size]
                :code (concat
                       ;; Retract the overflow: everything past the first
                       ;; (dec idx-size) articles in sort order (sort newest first).
                       (map
                        (fn [article]
                          [:db/retract
                           (datomic.api/entid db :articles-index)
                           :indexed-articles
                           (:db/id article)])
                        (->> (:indexed-articles (datomic.api/entity db :articles-index))
                             (sort-by (fn [] ... implement me ... ))
                             (drop (dec idx-size))))
                       ;; Add the new article to the set.
                       [[:db/add (datomic.api/entid db :articles-index) :indexed-articles article-id]])}}
Disclaimer: I haven't actually tested this function, so it probably contains errors :) The general idea is that we remove any "overflow" entities from the set, and add the new one. When idx-size is 10, we want to ensure that only 9 items are in the set, and we add our new item to it.
Now you have an entity you can look up by its ident, :articles-index, and the 10 most recent articles can be read from it (all refs are indexed) without causing a full database read.
;; "indexed" set of articles.
(d/entity db :articles-index)

I've been looking into this and think I have a slightly more elegant answer.
Declare your attribute as indexed with :db/index true
{:db/id #db/id[:db.part/db -1]
 :db/ident :ocelot/number
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db/doc "An ocelot number"
 :db/index true
 :db.install/_attribute :db.part/db}
This ensures that the attribute is included in the AVET index.
Then the following gives you access to the "top ten", albeit using the low-level datoms call. Note that take-last still has to walk every datom of the attribute to reach the end.
(take-last 10 (d/datoms (db conn) :avet :ocelot/number))
Obviously, if you need any further filtering ("who are the top ten scorers in this club?") this approach won't work, but at that point you have a much smaller amount of data in hand and shouldn't need to worry about indexing.
I did look extensively at the aggregation functions available in Datalog, but I'm having trouble getting my head around them, and I'm not certain that e.g. max would use this index rather than a full scan of the data. Similarly, the (index-range ...) function almost certainly does use this index but requires you to know the start and/or end values.


Ruby convert array of active records or objects into array of hashes

I have a Person class, which is an ActiveRecord model with some fields like :name, :age, etc.
Person has a 1:1 relationship with something called Account, where every person has an account.
I have some code that does:
Account.create!(person: current_person)
where current_person is a specified existing Person ActiveRecord object.
Note: the table Account has a field for person_id,
and both of them have has_one in the model for each other.
Now I believe we could do something like the following for bulk creation:
Account.create!([{person: person3}, {person: person2} ....])
I have an array of persons but am not sure of the best way to convert it to an array of hashes that all have the same key.
Basically, I want to do the reverse of "Convert array of hashes to array".
Why not just loop over your array of objects?
[person1, person2].each{|person| Account.create!(person: person)}
But if for any reason any of the items you loop over fail Account.create! you may be left in a bad state, so you may want to wrap this in an Active Record Transaction.
ActiveRecord::Base.transaction do
  [person1, person2].each { |person| Account.create!(person: person) }
end
The create method actually persists each hash individually, as shown in the source code, so it's probably not what you are looking for. Either way, the following code would do the job:
Account.create!(persons.map { |person| Hash[:person_id, person.id] })
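The mapping step on its own can be tried without a database. Here is a minimal sketch in plain Ruby, assuming a hypothetical Person struct as a stand-in for the ActiveRecord model (only id matters here; the names and ids are made up):

```ruby
# Hypothetical stand-in for the ActiveRecord model; only `id` is used.
Person = Struct.new(:id, :name)

persons = [Person.new(1, "Ann"), Person.new(2, "Bob"), Person.new(3, "Cy")]

# One hash per person, all sharing the same key.
account_attrs = persons.map { |person| { person_id: person.id } }

# Hash[:person_id, id], as used in the answer, builds the same single-pair hash.
same_attrs = persons.map { |person| Hash[:person_id, person.id] }

p account_attrs
p account_attrs == same_attrs
```

The literal `{ person_id: person.id }` form is probably the more common spelling; both produce an array that can be passed to create! or insert_all.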
If you need to create all records in a single database operation and are using Rails 6+, you could use the insert_all method. Note that insert_all skips validations and callbacks.
Account.insert_all(persons.map { |person| Hash[:person_id, person.id] })
For previous versions of Rails, consider using the activerecord-import gem.
# Combination(1).to_a converts [1, 2, 3] to [[1], [2], [3]]
Account.import [:person_id], persons.pluck(:id).combination(1).to_a
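To see what combination(1) contributes, here is a small plain-Ruby check (the ids array is made up for illustration). It wraps each id in its own one-element row, which is the one-row-per-record shape passed to import above. A plain map produces the same rows and may read more directly; it also keeps the input order, which the Ruby documentation does not promise for combination.

```ruby
ids = [1, 2, 3]

# Each id becomes its own one-element row.
rows_via_combination = ids.combination(1).to_a

# Equivalent, order-preserving alternative.
rows_via_map = ids.map { |id| [id] }

p rows_via_map
# Sort before comparing, since combination's yield order is not guaranteed.
p rows_via_combination.sort == rows_via_map
```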

How to create a Datomic partition without using db.part

In the official docs for Datomic (http://docs.datomic.com/schema.html) under the heading 'Creating new partitions' it says that a new partition (communities) can be created like this:
{:db/id #db/id[:db.part/db]
 :db/ident :communities}
Here :communities is not written as :db.part/communities.
I cannot install a new partition this way. For me it has to have the leading db.part/. Is the documentation wrong, or am I not seeing the bigger picture?
If you read further in the documentation, you'll see that you're missing another datom required for that transaction (labeled with "Here is the complete transaction..."). That datom is (the user-assigned tempid -1 is optional):
[:db/add :db.part/db :db.install/partition #db/id[:db.part/db -1]]
Anything transacted with a tempid that resolves to the system partition (:db.part/db) must also include a datom marking the installation, as with :db.install/partition and :db.install/attribute (the reverse ref version for attribute included in the map is more common).
Transacting the full example from the docs works fine:
(def tx [{:db/id #db/id[:db.part/db -1]
          :db/ident :communities}
         [:db/add :db.part/db :db.install/partition #db/id[:db.part/db -1]]])
@(d/transact conn tx)
;; returns successful tx map

Properties on Datomic ref relationships

I'm trying to model a schema where a list can have many items and each item can belong to many lists. It's clear to me that I can have a :list/items ref type to model the relationship, but I'd like to also have a rank attribute which determines an item's position in each list where it exists. How might one do such a thing?
The only answer I have - assuming that positioning is list dependent - is that you need to add an indirecting entity with a rank attribute. This isn't very pleasant. It would be nice if a many relation could be ordered, as this use case would simplify substantially.
Heterogeneous tuples, added in June 2019, are a new modelling option here.
An attribute value, i.e. the v in the eavto 5-tuple, can now itself be a tuple. This is a Clojure vector of max length 8.
Official blog post announcement.
Discussion of the release on twitter.
Note the example in the docs above uses
:db/tupleTypes [:db.type/long :db.type/long]
which is a little strange, as the point is heterogeneous tuples. In the case of the OP this would be:
{:db/ident :list/item
 :db/valueType :db.type/tuple
 :db/tupleTypes [:db.type/ref :db.type/long] ; ref to item, rank
 :db/cardinality :db.cardinality/many}
Or you could use a value type instead of a ref for item, if that works for you.
To use this in datalog, you can use the tuple and untuple functions.

How to safely remove a duplicate index from a Rails 3 schema?

I'm working on a Rails 3 app, and we recently realized we have a duplicate index:
# from schema.rb
add_index "dogs", ["owner_id"], :name => "index_dogs_on_owner"
add_index "dogs", ["owner_id"], :name => "index_dogs_on_owner_id"
How can I check which index ActiveRecord is using for relevant queries? Or do I even need to? If one of the indices is removed will ActiveRecord happily just use the other?
I can play around with it locally, but I'm not sure our production environment behaves exactly the same at the DB level.
The name of the index is arbitrary. The database engine chooses indexes based on the columns they cover, not their names, and ActiveRecord does not refer to indexes by name in the queries it generates, so removing one will not affect it. I recommend removing whichever index name is less obvious, in this case index_dogs_on_owner, because the other name makes clear that the index is on the owner_id column.
remove_index :dogs, :name => 'index_dogs_on_owner'
Cite: http://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/remove_index

Query to list all partitions in Datomic

What is a query to list all partitions of a Datomic database?
This should return
[[:db.part/db] [:db.part/tx] [:db.part/user] .... ]
where .... is all of the user defined partitions.
You should be able to get a list of all partitions in the database by searching for all entities associated with the :db.part/db entity via the :db.install/partition attribute:
(ns myns
  (:require [datomic.api :as d]))
(defn get-partitions [db]
  (d/q '[:find ?ident
         :where [:db.part/db :db.install/partition ?p]
                [?p :db/ident ?ident]]
       db))
Note
The current version of Datomic (build 0.8.3524) has a shortcoming such that :db.part/tx and :db.part/user (two of the three built-in partitions) are treated specially and aren't actually associated with :db.part/db via :db.install/partition, so the result of the above query function won't include the two.
This problem is going to be addressed in one of the future builds of Datomic. In the meantime, you should take care of including :db.part/tx and :db.part/user in the result set yourself.
1st method - using a query
(q '[:find ?i :where
     [:db.part/db :db.install/partition ?p] [?p :db/ident ?i]]
   (db conn))
2nd method - from the db object
(filter #(instance? datomic.db.Partition %) (:elements (db conn)))
The second method returns a sequence of datomic.db.Partition objects, which may be useful if we want additional info about the partitions.
Both methods have a known bug/inconsistency: they don't return the built-in partitions :db.part/tx and :db.part/user.
