Composite key with CouchDB, finding multiple records - database

I know you can pass a key or a range of keys to return records in CouchDB, but I want to do something like this: find multiple records given a list of key values.
For example, in regular SQL, let's say I wanted to return records with ids 5, 7, 29, and 102. I would do something like this:
SELECT * FROM sometable WHERE id = 5 OR id = 7 OR id = 29 OR id = 102
Is it possible to do this in CouchDB, where I toss all the values I want to find into the key array, and CouchDB then searches for all of the records matching the keys in that array?

You can do a POST, as documented on the CouchDB wiki. You pass the list of keys in the body of the request:
{"keys": ["key1", "key2", ...]}
The downside is that a POST request is not cached by the browser.
Alternatively, you can obtain the same response using a GET with the keys parameter. For example, you can query the view _all_docs with:
/DB/_all_docs?keys=["ID1","ID2"]&include_docs=true
which, properly URL encoded, becomes:
/DB/_all_docs?keys=%5B%22ID1%22,%22ID2%22%5D&include_docs=true
This should give better cacheability, but keep in mind that _all_docs changes with every document update. Sometimes you can work around this by defining your own view containing only the needed documents.
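A minimal sketch of both approaches from Python, using only the standard library. The server address and database name (http://localhost:5984, DB) and the document IDs are placeholders; adjust them to your setup:

```python
# Sketch: fetching a specific set of CouchDB documents in one request.
# Server URL, database name, and IDs below are assumptions, not real endpoints.
import json
import urllib.parse
import urllib.request


def keys_query_url(db_url, keys):
    """Build the cache-friendly GET URL with the keys as a query parameter."""
    encoded = urllib.parse.quote(json.dumps(keys, separators=(",", ":")))
    return f"{db_url}/_all_docs?include_docs=true&keys={encoded}"


def fetch_docs_by_keys(db_url, keys):
    """POST variant: send the list of keys in the request body."""
    req = urllib.request.Request(
        f"{db_url}/_all_docs?include_docs=true",
        data=json.dumps({"keys": keys}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        rows = json.loads(resp.read())["rows"]
    # Rows for deleted/missing IDs have no "doc" entry, so filter them out.
    return [row["doc"] for row in rows if "doc" in row]
```

For example, keys_query_url("http://localhost:5984/DB", ["ID1", "ID2"]) produces the URL-encoded keys parameter shown above.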

With a straight view function, this will not be possible. However, you can use a _list function to accomplish the same result.


ActiveRecord - Find rows with array and update column with array of values?

Let's say I have some Users and I want to update their ages because I entered them wrong at first. I could do them one at a time:
User.find(20).update(age: 25)
But can I pull the users with an array, and then update each individual with the corresponding place in another array? I'm imagining something like:
User.find([20,46,78]).update_all(age: [25, 50, 43])
I have a list of User ids and what their correct ages are supposed to be, so I want to change the existing ones for those users. What's the best way to do this?
One way of doing this, if your backing storage is MySQL/MariaDB, is something like:
arr = { 1 => 22, 2 => 55}.map { |id, age| "(#{id}, #{age})" }
User.connection.execute("INSERT into users (id, age) VALUES #{arr.join(',')} ON DUPLICATE KEY UPDATE age = VALUES(age)")
This assumes your table name is users. Be careful: since the SQL is manually composed, watch out for injection problems. Depending on what you need, this solution might or might not work for you. It mostly bypasses ActiveRecord, so you will not get any validations or callbacks.

OrientDB - find "orphaned" binary records

I have some images stored in the default cluster of my OrientDB database. I stored them by implementing the code given in the documentation for the case of multiple ORecordBytes (for large content): http://orientdb.com/docs/2.1/Binary-Data.html
So I have two types of object in my default cluster: binary data records, and ODocument records whose field 'data' links to the various binary data records.
Some of the ODocument records' RIDs are used in some other classes. But the other records are orphaned, and I would like to be able to retrieve them.
My idea was to use
select from cluster:default where #rid not in (select myField from MyClass)
But the problem is that this also retrieves the binary data records, and I just want the records with the field 'data'.
In addition, I would prefer a cleaner query, because I don't think the "not in" clause is really something that should be encouraged. Is there something like a JOIN which returns records that are not joined to anything?
Can you help me please?
To resolve my problem, I did the following. However, I don't know if it is the right (most optimized) way to do it:
I used the following SQL request:
SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []
In Java, I execute it using an OCommandSQL wrapped in an OCommandRequest, and I retrieve an OrientDynaElementIterable. I just iterate over it to retrieve an OrientVertex, contained in another OrientVertex, from which I retrieve the RID of the orphan.
Now, here is some code if it can help someone, assuming that you have an OrientGraphNoTx or an OrientGraph for the 'graph' variable :)
String cmd = "SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []";
List<String> orphanedRid = new ArrayList<String>();
OCommandRequest request = graph.command(new OCommandSQL(cmd));
OrientDynaElementIterable objects = request.execute();
Iterator<Object> iterator = objects.iterator();
while (iterator.hasNext()) {
    OrientVertex obj = (OrientVertex) iterator.next();
    OrientVertex orphan = obj.getProperty("rid");
    orphanedRid.add(orphan.getIdentity().toString());
}

Query given keys

I would like to accomplish some sort of hybrid solution between ndb.get_multi() and Query().
I have a set of keys, that I can use with:
entities = ndb.get_multi(keys)
I would like to query, filter, and order these entities using Query() or some more efficient way than doing all myself in the Python code manually.
How do people go about doing this? I want something like this:
query = Entity.gql('WHERE __key__ in :1 AND prop1 = :2 ORDER BY prop2', keys, 'hello')
entities = query.fetch()
Edit:
The above code works just fine, but it seems like fetch() never uses values from cache, whereas ndb.get_multi() does. Am I correct about this? If not, is the gql+fetch method much worse than get_multi+manual processing?
There is no way to run a query on already-fetched entities, unless you write it yourself, but all of this can easily be done with built-in Python filters. Note that it's more efficient to run a query if you have a big dataset, rather than get_multi hundreds of keys to get only 5 entities.
entities = ndb.get_multi(keys)
# filtering
entities = [e for e in entities if e.prop1 == 'bla' and e.prop2 > 3]
# sorting by multiple properties
entities = sorted(entities, key=lambda x: (x.prop1, x.prop2))
UPDATE: And yes, cache is only used when you receive your entity by key, it is not used when you query for entities.

ndb retrieving entity key by ID without parent

I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that this is not possible using the ndb interface. As I understand the datastore, this may be because the operation would require a full index scan.
The workaround I used is to create a computed property in the model that contains the id part of the key. I'm now able to do an ancestor query and get the key:
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities use less space and I don't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I then parse that string, generate a key, and get by key.
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dragonx's answer.
In my application's front-end I use string representations of keys:
str(instance.key())
When I need to make changes to an instance, even if it is a descendant, I use only the string representation of its key. For example, key_str is an argument from a request to delete an instance:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get the item without worrying about the parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now can get by urlsafe
item = ndb.Key(urlsafe=usafe)
print item

Simple search by value?

I would like to store some information as follows (note, I'm not wedded to this data structure at all, but this shows you the underlying information I want to store):
{ user_id: 12345, page_id: 2, country: 'DE' }
In these records, user_id is a unique field, but the page_id is not.
I would like to translate this into a Redis data structure, and I would like to be able to run efficient searches as follows:
For user_id 12345, find the related country.
For page_id 2, find all related user_ids and their countries.
Is it actually possible to do this in Redis? If so, what data structures should I use, and how should I avoid the possibility of duplicating records when I insert them?
It sounds like you need two key types: a HASH key to store each user's data, and a LIST for each page containing the related users. Below is an example of how this could work.
Load Data:
> RPUSH page:2:users 12345
> HMSET user:12345 country DE key2 value2
Pull Data:
# All users for page 2
> LRANGE page:2:users 0 -1
# All users for page 2 and their countries
> SORT page:2:users BY nosort GET # GET user:*->country GET user:*->key2
Remove User From Page:
> LREM page:2:users 0 12345
Repeat GETs in the SORT to retrieve additional values for the user.
I hope this helps; let me know if there's anything you'd like clarified or if you need further assistance. I also recommend reading the command list and documentation available on the Redis website, especially concerning the SORT operation.
Since user_id is unique and each user has a single country, keep them in a simple key-value pair; querying for a user is O(1) in that case. Then keep some Redis sets, keyed by page_id, with all the user_ids as members.
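The key-value-plus-sets layout can be sketched as follows. To keep the snippet runnable without a Redis server, plain Python containers stand in for the Redis keyspace; with a real client the same operations would be HSET/HGET on user:&lt;id&gt; hashes and SADD/SMEMBERS on page:&lt;id&gt;:users sets. The key names and fields are illustrative, not a fixed schema. Note that a set, unlike the LIST in the answer above, makes duplicate inserts a no-op, which addresses the deduplication concern in the question:

```python
# In-memory model of the proposed Redis keyspace (no server needed).
# user:<id>        -> hash {country: ..., page_id: ...}
# page:<id>:users  -> set of user ids (a SET avoids duplicates on re-insert)
hashes = {}  # stands in for Redis hashes
sets = {}    # stands in for Redis sets


def add_record(user_id, page_id, country):
    hashes[f"user:{user_id}"] = {"country": country, "page_id": page_id}  # HSET
    sets.setdefault(f"page:{page_id}:users", set()).add(user_id)          # SADD


def country_for_user(user_id):
    return hashes[f"user:{user_id}"]["country"]                           # HGET


def users_for_page(page_id):
    ids = sets.get(f"page:{page_id}:users", set())                        # SMEMBERS
    return {uid: hashes[f"user:{uid}"]["country"] for uid in ids}


add_record(12345, 2, "DE")
add_record(12345, 2, "DE")  # duplicate insert is a no-op thanks to the set
add_record(67890, 2, "FR")
```

Both lookups from the question are then direct: country_for_user(12345) is a single hash read, and users_for_page(2) is one set read plus one hash read per member.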
