ArangoDB: update properties depending on edge type

I am trying to use AQL to update the whole node collection, named Nodes, depending on the type of edges the nodes have.
Requirement:
Basically, if two entities in Nodes are connected by a relation with type = "Same", they should be updated with the same groupid property, unique to their group (likewise for groups of more than two).
This would only run one time in the beginning (to populate groupid)
My conceptual approach:
Use AQL
For each entity in Nodes, query all nodes reachable through edges with type=SAME
Generate a groupid and update all of them
Write those ids to a lookup object
For the next entity, do a lookup and skip the entity if its id is already there.
What I tried
FOR v, e, p IN 1..10 ANY v EntityRelationTest
    OPTIONS { uniqueVertices: "global", bfs: true }
    FILTER p.edges[*].relationType[0] == "EQUALS"
    UPDATE v WITH { typeName2: "test1" } IN EntityTest
    RETURN NEW
I am quite new to ArangoDB and AQL, though. Is something like the above possible?

In the end, I used a custom traversal object running directly inside Foxx to get the best of both worlds: performance and correctness. It seems that the above cannot be done with AQL alone.
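The grouping described in the question amounts to finding connected components over the edges whose type is "Same". Here is a minimal sketch of that logic in plain Python, independent of ArangoDB and Foxx. The function name `assign_group_ids`, the `(from, to, type)` edge tuples and the integer group ids are assumptions made for illustration, not part of the original setup:

```python
from collections import defaultdict, deque


def assign_group_ids(node_ids, edges, relation_type="Same"):
    """Give every node in one connected component the same group id.

    edges is an iterable of (from_id, to_id, type) tuples; only edges whose
    type equals relation_type are followed, in either direction.
    """
    neighbours = defaultdict(set)
    for src, dst, edge_type in edges:
        if edge_type == relation_type:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    group_of = {}  # the lookup object: node id -> group id
    next_group = 1
    for start in node_ids:
        if start in group_of:  # already reached from an earlier node: skip
            continue
        group_of[start] = next_group
        queue = deque([start])
        while queue:  # breadth-first walk over the component
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in group_of:
                    group_of[other] = next_group
                    queue.append(other)
        next_group += 1
    return group_of


groups = assign_group_ids(
    ["a", "b", "c", "d", "e"],
    [("a", "b", "Same"), ("b", "c", "Same"), ("d", "e", "Other")],
)
```

Nodes a, b and c end up sharing one group id, while d and e each get their own because their edge is not of type "Same".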

Related

MongoDB numeric index

I was wondering if it's possible to create a numeric count index where the first document would be 1 and the count would increase as new documents are inserted. If so, can it also be applied to documents imported via mongoimport? I have created an index via db.collection.createIndex( {index : 1} ) but it doesn't seem to be applied.
I would strongly recommend using ObjectId as your _id field. It is a good value for distributed systems, and it also encodes the time at which it was created. The _id field has a built-in index in MongoDB.
Example using Morphia:
Date d = ...;
QueryImpl<MyClass> query = datastore.createQuery(MyClass.class);
query.field("_id").greaterThanOrEq(new ObjectId(d));
query.sort("_id");
query.limit(100);
List<MyClass> myDocs = query.asList();
This would fetch all documents created since date d in order of creation.
To load the next batch, change to:
query.field("_id").greaterThan(lastDoc.getId());
This will very efficiently load the next batch based on the ID of the last document from the previous batch.
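The range query on _id works because the first four bytes of an ObjectId are a big-endian Unix timestamp in seconds. Below is a small sketch, using only the Python standard library, of building the lowest possible id for a given date. The helper name `min_object_id_hex` is hypothetical; drivers ship their own constructors for this purpose:

```python
import struct
from datetime import datetime, timezone


def min_object_id_hex(moment):
    """Return the 24-character hex form of the smallest ObjectId for moment.

    The 4-byte timestamp prefix is followed by 8 zero bytes, so an ObjectId
    generated at or after that second compares greater than or equal to it.
    """
    seconds = int(moment.timestamp())
    return (struct.pack(">I", seconds) + b"\x00" * 8).hex()


boundary = min_object_id_hex(datetime(2020, 1, 1, tzinfo=timezone.utc))
```

Because ids sort by this timestamp prefix first, sorting on _id approximates creation order, and a "greater than last id" filter picks up where the previous batch stopped.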

Django Query Optimisation

I am currently working on a telecom analytics project and am new to query optimisation. Showing the result in the browser takes a full minute, even though only 45,000 records are accessed. Could you please suggest ways to reduce the time it takes to show results?
I wrote the following query to find the average call duration for the people in an age group:
sigma = 0
popn = len(Demo.objects.filter(age_group=age))
card_list = [Demo.objects.filter(age_group=age)[i].card_no
             for i in range(popn)]
for card in card_list:
    dic = Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma += dic['duration__sum']
avgDur = sigma / popn
The code above sits inside a for loop that iterates over the age groups.
The models are as follows:
class Demo(models.Model):
    card_no = models.CharField(max_length=20, primary_key=True)
    gender = models.IntegerField()
    age = models.IntegerField()
    age_group = models.IntegerField()

class Fact_table(models.Model):
    pri_key = models.BigIntegerField(primary_key=True)
    card_no = models.CharField(max_length=20)
    duration = models.IntegerField()
    time_8bit = models.CharField(max_length=8)
    time_of_day = models.IntegerField()
    isBusinessHr = models.IntegerField()
    Day_of_week = models.IntegerField()
    Day = models.IntegerField()
Thanks
Try this:
sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you issue, and you should only select the records you need.
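To make the difference in database hits concrete, here is a sketch that uses Python's built-in sqlite3 module instead of Django. The table names, the sample rows and the `run` helper are assumptions for illustration only. It counts the statements issued by the per-card loop and by a single query with a subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (card_no TEXT PRIMARY KEY, age_group INTEGER);
    CREATE TABLE fact (card_no TEXT, duration INTEGER);
    INSERT INTO demo VALUES ('c1', 1), ('c2', 1), ('c3', 2);
    INSERT INTO fact VALUES ('c1', 10), ('c1', 20), ('c2', 30), ('c3', 99);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it as one database hit."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# One query per card: the pattern from the question.
cards = [row[0] for row in run("SELECT card_no FROM demo WHERE age_group = ?", (1,))]
slow_total = 0
for card in cards:
    slow_total += run("SELECT SUM(duration) FROM fact WHERE card_no = ?", (card,))[0][0]
slow_hits = query_count

# One query for the whole age group, using a subquery.
query_count = 0
fast_total = run(
    "SELECT SUM(duration) FROM fact WHERE card_no IN "
    "(SELECT card_no FROM demo WHERE age_group = ?)",
    (1,),
)[0][0]
fast_hits = query_count
```

Both variants produce the same total, but the loop costs one hit per card plus one for the card list, while the subquery version costs a single hit regardless of how many cards are in the group.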
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes the models easier to use, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
    # on_delete is required from Django 2.0 onwards; CASCADE is an assumption here
    card = models.ForeignKey(Demo, related_name='facts', on_delete=models.CASCADE)
    ...
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    # the two values must be passed to % as a tuple
    print("Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg']))
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
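The annotated query corresponds roughly to a JOIN with GROUP BY in SQL. The sketch below reproduces that shape with Python's built-in sqlite3 module. The table layout and sample rows are assumptions, and the SQL that Django actually generates may differ in detail:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (id INTEGER PRIMARY KEY, card_no TEXT UNIQUE, age_group INTEGER);
    CREATE TABLE fact (id INTEGER PRIMARY KEY,
                       card_id INTEGER REFERENCES demo(id),
                       duration INTEGER);
    INSERT INTO demo (id, card_no, age_group) VALUES (1, 'c1', 1), (2, 'c2', 1), (3, 'c3', 2);
    INSERT INTO fact (card_id, duration) VALUES (1, 10), (1, 20), (2, 30), (3, 100);
""")

# One statement returns the average call duration for every age group.
rows = conn.execute("""
    SELECT demo.age_group, AVG(fact.duration) AS duration_avg
    FROM demo
    LEFT JOIN fact ON fact.card_id = demo.id
    GROUP BY demo.age_group
    ORDER BY demo.age_group
""").fetchall()

averages = dict(rows)
```

Note that this averages over calls. The code in the question divides the total duration by the number of people in the age group, which gives a different figure whenever people make more than one call.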

ndb retrieving entity key by ID without parent

I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using ndb interface. As I understand datastore it may be caused by the fact that this operation requires full index scan to perform.
The workaround I used is to create a computed property in the model, which will contain the id part of the key. I'm able now to do an ancestor query and get the key
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for datastore the natural solution is to use full paths instead of identifiers. Initially I thought it would be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I then parse that string, generate a key, and get by key.
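Parsing such a path string could look like the sketch below. The helper name `path_to_pairs` and the kind names are hypothetical; the result is a list of (kind, id) pairs ordered from the outermost ancestor down to the entity itself:

```python
def path_to_pairs(path, kinds):
    """Turn "789-456-123" into [(kind, id), ...], outermost ancestor first.

    kinds lists the kind name for each path segment, in the same order.
    """
    ids = [int(part) for part in path.split("-")]
    if len(ids) != len(kinds):
        raise ValueError("expected %d ids, got %d" % (len(kinds), len(ids)))
    return list(zip(kinds, ids))


pairs = path_to_pairs("789-456-123", ["Grandparent", "Parent", "Kind"])
# With ndb available, the key could then be built with ndb.Key(pairs=pairs).
```

The length check rejects URLs that carry too few or too many path segments before any key is constructed.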
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a small addition to dragonx's answer.
In my application on front-end I use string representation of keys:
str(instance.key())
When I need to change an instance, even if it is a descendant, I use only the string representation of its key. For example, given key_str, a request argument identifying the instance to delete:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get the item without worrying about the parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# the entity can now be fetched from the urlsafe string alone
item = ndb.Key(urlsafe=usafe).get()
print(item)

objectify query filter by list in entity contains search parameter

In an app I have an entity that contains a list of other entities (let's say an event holding a list of assigned employees).
Using Objectify, I need to find all the events a particular employee is assigned to.
Is there a basic way to filter a query on whether a list contains the parameter, kind of the opposite of an "in" query?
Quick pseudocode:
findAll(Employee employee) {
    ...
    return ofy.query(Event.class).filter("employees.contains", employee).list();
}
Any help would be greatly appreciated.
I tried just doing filter("employees", employee) after seeing this http://groups.google.com/group/objectify-appengine/browse_thread/thread/77ba676192c08e20 but unfortunately it returns an empty list.
Currently I'm doing something really inefficient: going through each event, iterating through its employees, and adding the event to a new list if it contains the given employee, just to have something that works. I know this is not right, though.
Let me add one thing:
the query above is not the actual one. I simplified it because I did not think it would make a difference.
The Employee and Event entities are in the same entity group, with Business as the parent.
The actual query I am using is the following:
ofy.query(Event.class).ancestor(businessKey).filter("employees", employee).list();
Unfortunately this still returns an empty list. Does having the ancestor(key) in there mess up the filter?
Solution: the employees field was not indexed correctly.
I added the datastore-indexes file to create a composite index, but I was originally testing on a value that I had added before the employees field was indexed, which was my mistake. Simply having an index on the "business" field and the "employees" field fixed everything. The datastore-indexes file did not appear to be necessary: after deleting it and trying again, everything worked fine.
Generally, you do this one of two ways:
Put a property of Set<Key<Employee>> on the Event
or
Put a property of Set<Key<Event>> on the Employee
You could also create a relationship entity, but if you're just doing filtering on values with relatively low counts, usually it's easier to just put the set property on one entity or the other.
Then filter as you describe:
ofy.query(Event.class).filter("employees", employee).list()
or
ofy.query(Employee.class).filter("events", event).list()
The list property should hold Keys to the target entity. If you pass an entity to the filter() method, Objectify will understand that you want to filter by its key instead.
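An equality filter on a list property matches when any element of the list equals the filter value, because the datastore writes one index entry per element. The following is a toy model of that behaviour in plain Python, not Objectify code; the `build_index` helper and the sample data are assumptions for illustration:

```python
from collections import defaultdict


def build_index(events):
    """Map each employee key to the ids of the events that list it."""
    index = defaultdict(set)
    for event_id, employee_keys in events.items():
        for key in employee_keys:  # one index entry per list element
            index[key].add(event_id)
    return index


events = {"e1": ["alice", "bob"], "e2": ["bob"], "e3": []}
index = build_index(events)
bobs_events = sorted(index["bob"])
```

The model also shows why an unindexed property returns nothing: if no entries are written for the list, there is nothing for the filter to match.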
Example:
/***************************************************/
@Entity
@Cache
public class News {
    @Id Long id;
    String news;
    @Index List<Long> friend_list = new ArrayList<Long>();
    // My friends who can see my news, for example: friend_list.add(id_f1); friend_list.add(id_f2); friend_list.add(id_f3);
    // To query on "friend_list", it must be indexed
}
/*************************************************/
public News(Long id_f) {
    List<Long> friend_id = new ArrayList<Long>();
    friend_id.add(id_f);
    Query<News> query = ofy().load().type(News.class).filter("friend_list in", friend_id).limit(limit);
    // To filter on a list, add "in" right after the name of the field you want to filter.
    // here ==> .filter("friend_list in", friend_id);
    // if friend_list contains "id_friend" ==> the query returns a value
    .........
}

pull Drupal field values with db_query() or db_select()

I've created a content type in Drupal 7 with 5 or 6 fields. Now I want to use a function to query them in a hook_view call back. I thought I would query the node table but all I get back are the nid and title. How do I get back the values for my created fields using the database abstraction API?
Drupal stores the fields in other tables and can automatically join them in. The storage varies depending on how the field is configured so the easiest way to access them is by using an EntityFieldQuery. It'll handle the complexity of joining all your fields in. There's some good examples of how to use it here: http://drupal.org/node/1343708
But if you're working in hook_view, you should already be able to access the values: they're loaded into the $node object that's passed in as a parameter. Try running:
debug($node);
in your hook and you should see all the properties.
If you already know the IDs of the nodes (nid) you want to load, you should use node_load_multiple() to load them. This will load the complete nodes with all field values. To search for node IDs, EntityFieldQuery is the recommended way, but it has some limitations. You can also use the database API to query the node table for the nid (and revision ID, vid) of your nodes, then load them using node_load_multiple().
Loading complete nodes can have a performance impact, since it loads far more data than you need. If this proves to be an issue, you can try to access the field storage tables directly (if your field values are stored in your SQL database). The schema of these tables is built dynamically depending on the field types, cardinality and other settings. You will have to dig into your database schema to figure it out, and it will probably change as soon as you change something on your fields.
Another solution is to build stub node entities and use field_attach_load() with an $options['field_id'] value to load only the value of a specific field. But this requires a good knowledge and understanding of the Field API.
See How to use EntityFieldQuery article in Drupal Community Documentation.
Creating A Query
Here is a basic query looking for all articles with a photo that are
tagged as a particular faculty member and published this year. In the
last 5 lines of the code below, the $result variable is populated with
an associative array with the first key being the entity type and the
second key being the entity id (e.g., $result['node'][12322] = partial
node data). Note the $result won't have the 'node' key when it's
empty, thus the check using isset, this is explained here.
Example:
<?php
$query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', 1)
  ->fieldCondition('field_news_types', 'value', 'spotlight', '=')
  ->fieldCondition('field_photo', 'fid', 'NULL', '!=')
  ->fieldCondition('field_faculty_tag', 'tid', $value)
  ->fieldCondition('field_news_publishdate', 'value', $year. '%', 'like')
  ->fieldOrderBy('field_photo', 'fid', 'DESC')
  ->range(0, 10)
  ->addMetaData('account', user_load(1)); // Run the query as user 1.
$result = $query->execute();
if (isset($result['node'])) {
  $news_items_nids = array_keys($result['node']);
  $news_items = entity_load('node', $news_items_nids);
}
?>
Other resources
EntityFieldQuery on api.drupal.org
Building Energy.gov without Views
