Adding a shard key dynamically in ArangoDB

I'm setting up a clustered database with ArangoDB, and I need to use indexes on collections.
Suppose we have a collection named myCollection that was created with the shard key _key.
Let myVariable be the unique key of myCollection, so I have a unique constraint on myVariable.
By now myCollection has been created and already contains data.
I don't want to erase everything, recreate myCollection with myVariable as an additional shard key, and then restore the data; I need to add a new shard key dynamically while myCollection already exists.
Is this possible? Can I somehow add a new shard key?
I mean, add a key to the _shardBy attribute without recreating the collection.
Thanks for the help.

No, changing the shard key after creation is not supported. If you look at the consequences this would have, it's easy to understand why:
The shard key tells the coordinator which cluster node each document should end up on; conversely, the coordinator can predict where to search for a document based on its shard key. That assumption would fail if you changed the sharding condition to an arbitrary new one, so documents no longer matching the condition would have to be moved to the correct new shard.
As you can see, you need to work with all documents anyway. So if you don't want to download all the data to the client, some JavaScript on the coordinator, such as a Foxx service, could fill the gap:
create the new collection with the proper shard key
fetch all _keys into memory
issue repetitive AQL queries that select a range from the old collection and insert it into the new one (see the sketch below).
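A minimal sketch of one such batch query, assuming the old collection is myCollection, the new one is myCollectionNew, and @low/@high are bind parameters delimiting one slice of the sorted _key list fetched earlier (all of these names are illustrative):

  FOR doc IN myCollection
    FILTER doc._key >= @low && doc._key < @high
    INSERT UNSET(doc, '_id', '_key', '_rev') INTO myCollectionNew

The UNSET drops the system attributes; _key is dropped here because a cluster collection whose shard keys differ from _key will not accept user-supplied keys. Run the query once per key range until the old collection is exhausted.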
You may want to start an additional coordinator if you don't want to use your existing setup for this.
Hint: an upgrade to ArangoDB 3.0 will require a dump/restore cycle anyway, so if you can postpone the problem a little, you may be able to solve it then.

Related

I am able to add composite indexes to the index.yaml file and get them to work without removing and reloading the data from the Datastore.

I thought that this was not possible and that you had to reload all the data if you added new indexes.
Is this supposed to happen?
When Cloud Datastore builds a new index, it includes any existing entities that match the index, so there's no need to update your existing data.
If, however, you have inserted entities with unindexed properties and decide you want to define indexes on those properties, then you need to update each of those entities to mark the property as indexed.
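For reference, a composite index entry in index.yaml looks roughly like this (the kind and property names are made up for illustration):

  indexes:
  - kind: Task
    properties:
    - name: done
    - name: priority
      direction: desc

Once deployed, Datastore builds the index in the background and backfills it with the matching entities that already exist.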

Solr generate key

I'm working with Solr and indexing data from a DB.
When I import the data using a SQL query, I get some rows with the same key.
I need a way for Solr to generate a new field with a unique key.
How can I do that?
Thanks
I am not sure whether this is possible or not, but maybe you need to reconsider your logic here...
Indexing into Solr should be re-runnable. Imagine that one day you decide to change the schema of your core.
If you generate a new key every time you import a document, you will end up creating duplicate items when you re-run your data import.
Maybe you need to revisit your DB design to get a unique key, or maybe in the SELECT query you can create a derived or calculated column based on multiple columns (see the example below). But I am sure that pushing this problem to Solr is not the solution.
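For instance, the data-import SELECT could build a deterministic key from existing columns; the table and column names here are invented, and CONCAT is MySQL-flavored:

  SELECT CONCAT(order_id, '_', line_no) AS id,
         other_column
  FROM   order_lines

Because the key is derived from the source data, re-running the import overwrites the same documents instead of duplicating them.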
Ideally the unique key should come from the DB (are you sure you cannot get one, by composing some columns etc.?).
But if you cannot, Solr supports UUID generation for this; how to set it up depends on your Solr version (a sketch follows below).
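On Solr 4.x and later, one common setup (the field, type and chain names below are only placeholders) is a solr.UUIDField plus an update processor that fills the field for documents that arrive without one:

  <!-- schema.xml -->
  <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
  <field name="id" type="uuid" indexed="true" stored="true" required="true"/>

  <!-- solrconfig.xml -->
  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain still has to be referenced from your update request handler, and, as the other answer points out, a freshly generated UUID changes on every re-import, so duplicates remain a risk unless the key is derived from the source data.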

Best solution for hierarchical and non-typed data

I have a db schema like this:
ELEMENT(uuid[string], name[string], status[integer], ...)
APPLICATION(uuid[string], name[string], config[string], status[integer], parent[foreign key on self or foreign key on ELEMENT], ...)
ATTRIBUTE(uuid[string], name[string],type[string], parent[foreign key on APPLICATION], ...)
VALUES(created[datetime], data[varchar|text|integer|float|boolean|datetime])
And I have these constraints:
Application must have a fk on ELEMENT or on SELF
Attribute could grow up to 250000 values
For every Attribute I have 20.000 Values
Values data column can have different types
On user request I need to serve all Applications that match a specific Element
On user request I need to serve all Attributes, with their last value, that match a specific Application
On user request I need to serve all Values of a specific Attribute in an arbitrary datetime range
At regular intervals(x minutes) new values are added and the oldest are removed
When new values are added I might need to access the last X values of an Attribute
When someone is modifying an item of ELEMENT, APPLICATION or ATTRIBUTE, I need to lock that item
I need persistent data storage and good performance when users make requests
I cannot use a classic RDBMS because of constraint 4, and I think a relational database is not the best solution in any case. I think the best solution is to use a NoSQL database, but which one? There are lots of solutions with pros and cons, but I don't know which one fits my needs best.
Thanks

Selecting an Entity based on its auto-generated ID in Google Datastore

I have created an entity with a few attributes but without specifying any key, in which case an auto-generated ID has been created in the datastore.
Entity en = new Entity("Job");
Now when I fetch such entities and try to store them in a Java object, how can I get the auto-generated ID (which I need in order to perform an UPDATE operation later)?
I have tried the ways below, but they do not return the identifier value.
en.getProperty("__key__");
en.getProperty("ID/Name");
en.getProperty("Key");
You are probably looking for:
en.getProperty(Entity.KEY_RESERVED_PROPERTY)
mentioned in Key Filters (not an obvious place to find it).
Another approach would be to try:
en.getKey().getId()
mentioned in Entity JavaDoc and Key JavaDoc.
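Putting the two together, here is a minimal sketch using the low-level Datastore API; the "Job" kind comes from the question, everything else is illustrative:

  import com.google.appengine.api.datastore.*;

  // Fetch the "Job" entities and read each key's auto-generated numeric ID
  DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
  for (Entity en : ds.prepare(new Query("Job")).asIterable()) {
      long id = en.getKey().getId();              // the auto-generated ID
      Key key = KeyFactory.createKey("Job", id);  // rebuild the key later for an update or get
  }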

How can I fetch the latest entry of a model newly put into NDB?

How can I get the latest entry of a model newly put into NDB?
1: If I use the same parent key, how do I do it?
I see the documentation says:
Entities whose keys have the same root form an entity group or group.
If entities are in different groups, then changes to those entities
might sometimes seem to occur "out of order". If the entities are
unrelated in your application's semantics, that's fine. But if some
entities' changes should be consistent, your application should make
them part of the same group when creating them.
Does this mean that, with the same parent key, the order is the insertion order?
But how do I get the last one?
2: If I do not use the same parent key (the model is the same), how do I do it?
If you're OK with eventual consistency (i.e. you might not see the very latest one immediately) you can just add a DateTimeProperty with auto_now_add=True and then run a query sorting by that property to get the latest one. (This is also approximate since you might have several entities saved close together which are ordered differently than you expect.)
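A minimal sketch of that first approach; the Entry model name and the created property are placeholders rather than anything from the question:

  from google.appengine.ext import ndb

  class Entry(ndb.Model):
      # auto_now_add stamps the entity with its creation time on the first put()
      created = ndb.DateTimeProperty(auto_now_add=True)

  def get_latest_entry():
      # Eventually consistent: a just-written entity may not show up immediately
      return Entry.query().order(-Entry.created).get()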
If you need it to be exactly correct, the only way I can see is to create an entity whose job it is to hold a reference to the latest entry, and update that entity in the same transaction as the entry you're creating. Something like:
class LatestHolder(ndb.Model):
    # key of the most recently written entry ('Entry' stands in for your entry model's kind)
    latest = ndb.KeyProperty(kind='Entry')

# code to update:
@ndb.transactional(xg=True)
def put_new_entry(entry):
    entry.put()  # put the entry first so it has a key assigned
    holder = LatestHolder.get_or_insert('fixed-key')
    holder.latest = entry.key
    holder.put()
Note that I've used a globally fixed key name here with no parent for the holder class. This is a bottleneck; you might prefer to make several LatestHolder entities with different parents if your "latest entry" only needs to be from a particular parent, in which case you just pass a parent key to get_or_insert.
