In a Big Table, is it normal to key values in child (sub) collections? - google-app-engine

I'm using Google App Engine and thus Big Table.
I have a person entity that looks like this:
{
// This property would be encoded into JSON and saved un-indexed as db.Text()
phone_numbers:
{
'hHklams8akjJkaJSL': // <-- Should I key this object?
{
number:'555-555-5555',
type:'mobile',
},
etc...
},
// This property is an array of strings.
// It is searchable so that a query could be run to find all
// people with a particular phone number:
// "SELECT * FROM person WHERE phone_number_search_property =
// '5555555555'"
phone_number_search_property:['5555555555','other phone numbers...'],
first_name:'...',
etc...
}
The phone_numbers property is stored as a blob of unindexed text in JSON format (db.Text). To refer to a particular phone number, I decode the JSON, then look up the phone number by its key.
The phone_number_search_property is used for searching. It enables a search by phone number: "SELECT * FROM person WHERE phone_number_search_property = '5555555555'"
What is a good way to refer to a phone number inside of an entity in this situation? Here, I have each value keyed using a UUID. Is this a "normal" and accepted way of doing things? If not, what is?
Thanks!

If a data object is really just part of another object and is never accessed without the "parent" object (as is the case with a phone number and a person), then IMHO it's OK to serialize it and store it inside the "parent" object. So what you did is OK.
You search for persons by phone number, so an additional property holding the (normalized) phone numbers works. It would stop working if you needed to filter on a second attribute as well (e.g. limiting the search to mobile numbers only).
Why do you key the serialized phone numbers by a hashed string (I assume you generate it via UUID.fromString(String))? Just use the (normalized) phone number: it is unique.
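As a rough illustration of keying by the normalized number, here is a minimal Python sketch. The helper names normalize and build_person_properties are made up for this example, and "normalized" is assumed to mean "digits only", which the answer does not spell out.

```python
import json
import re


def normalize(number):
    """Strip everything except digits (an assumed normalization rule)."""
    return re.sub(r"\D", "", number)


def build_person_properties(phones):
    """Return (json_blob, search_list) for a list of phone dicts.

    The blob is keyed by the normalized number instead of a UUID; the
    search list holds the same normalized numbers for the indexed property.
    """
    keyed = {normalize(p["number"]): p for p in phones}
    return json.dumps(keyed, sort_keys=True), sorted(keyed)


blob, search = build_person_properties([
    {"number": "555-555-5555", "type": "mobile"},
    {"number": "(555) 555-0100", "type": "home"},
])

# Looking up one number later: decode the blob and index by normalized number.
mobile = json.loads(blob)[normalize("555 555 5555")]
```

The blob would go into the unindexed text property and the list into the searchable property, so both are derived from the same input and cannot drift apart.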

Related

Parse.com: query on an array field not working with a big number of values

I use Parse.com Core and Cloud Code to store data and perform some operations for a mobile app. I have an issue with a query on an array field that is sometimes not returning anything even if I am sure it should.
I store a large number of phone numbers in an array field to keep track of a user's matching contacts.
This field is called phoneContacts and looks like this (the real values contain digits only; the letters here just mask an example):
["+33W30VXXX0V","+33W30VXX843","+33W30VZVZVZ","+33W34W3X0Y4","+33W34W386Y0", ...]
I have a function in Cloud Code that is supposed to get matching rows for a given phone number. Here is my query:
var phoneNumber = request.params.phoneNumber;
var queryPhone = new Parse.Query('UserData');
queryPhone.equalTo('phoneContacts', phoneNumber); // phoneNumber is passed as a string param, i.e. "+33W30VXX843"
queryPhone.include('user');
var usersToNotify = [];
return queryPhone.each(function(userData) {
    var user = userData.get('user');
    usersToNotify.push(user.get('username'));
})
.then(function() {
    return usersToNotify;
});
I tested my query with an array of 2 or 3 phone numbers, and it works well and returns the expected rows. But then I tried with a user having around 300 phone numbers in that phoneContacts field, and even if I query a value that is present (it appears when I filter in the Parse Data Browser), nothing is returned. To be sure, I even took a phone number existing in 2 rows, one with few values and one with many, and only the row with few values was returned.
I've carefully read the Parse documentation, especially the parts about queries and field limits, but it doesn't seem to mention a restriction on the number of values in an array field, and nothing says that a query might not work with a lot of values.
Can anybody point me in the right direction? Should I design my Parse classes differently to avoid having so many values in an array field? Or is there something wrong with the query?
You need to be using a PFRelation or some sort of intermediate table. You should not use an array to store 300 phone numbers; your queries will get really slow.
PFRelations:
https://parse.com/docs/osx/api/Classes/PFRelation.html
http://blog.parse.com/learn/engineering/new-many-to-many/

JPA search by Key without Knowing Parent Key

Ok so I have an application that uses GAE and consequently the datastore.
Say I have multiple companies A, B and C and I have within each company Employees X,Y and Z. The relationship between a company and employee will be OneToMany, with the company being the owner. This results in the Company Key being of the form
long id = 4504699138998272; // Random Example
Key CompanyKey = KeyFactory.createKey(Company.class.getSimpleName(), id);
and the employee key would be of the form
long id2 = 5630599045840896;
Key EmployeeKey = KeyFactory.createKey(CompanyKey,Employee.class.getSimpleName(),id2);
This all works fine until the front end, during JSP rendering. Sometimes I need to generate a report or open an employee's profile, in which case the div containing the employee's information gets an id as follows:
<div class="employeeInfo" id="<%=employee.getKey().getId()%>" > .....</div>
This div has an onclick/submit event that sends the modifications to the employee profile to a servlet via AJAX, at which point I have to specify the primary key of the employee (which I thought I could easily get from the div id), but it didn't work server side.
The problem is that I know the kind (String) portion of the Employee key and the long id portion, but not the parent key. To save time I tried this, and it didn't work:
Key key = KeyFactory.createKey(Employee.class.getSimpleName(), id);
Employee X = em.find(Employee.class,key);
X is always returned null.
I would really appreciate any ideas on how to find or "query" entities by key without knowing their parent's key (as I would hate having to readjust the entity classes).
Thanks a lot!
An entity key and its parents cannot be separated. Together they form what is called an ancestor path: a chain composed of entity kinds and ids.
So, in your example ancestor paths will look like this:
CompanyKey: ("Company", 4504699138998272)
EmployeeKey: ("Company", 4504699138998272, "Employee", 5630599045840896)
A key composed only of ("Employee", 5630599045840896) is a completely different key from EmployeeKey, even though both end with the same values. Think of concatenating the elements into a single "string" and comparing the final values: they will never match.
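The comparison can be sketched in plain Python by modelling each ancestor path as a tuple of (kind, id) pairs. This is only an illustration of the idea, not how the datastore represents keys internally, and to_string is a made-up helper.

```python
# Ancestor paths modelled as plain tuples of (kind, id) pairs.
company_path = (("Company", 4504699138998272),)
employee_path = company_path + (("Employee", 5630599045840896),)

# A key built without the parent has a different path, so it is a different key.
orphan_path = (("Employee", 5630599045840896),)

same_tail = employee_path[-1] == orphan_path[-1]  # last elements match
same_key = employee_path == orphan_path           # full paths do not


def to_string(path):
    """Concatenate the path elements into one string for comparison."""
    return "/".join("%s:%d" % (kind, id_) for kind, id_ in path)
```

Comparing to_string(employee_path) with to_string(orphan_path) shows the same mismatch: the parentless key lacks the Company prefix.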
One thing you can do is use encoded keys instead of their id values:
String encodedKey = KeyFactory.keyToString(EmployeeKey);
Key decodedKey = KeyFactory.stringToKey(encodedKey);
decodedKey.equals(EmployeeKey); // true
More about Ancestor Paths:
https://developers.google.com/appengine/docs/java/datastore/entities#Java_Ancestor_paths
KeyFactory Java doc:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/KeyFactory#keyToString(com.google.appengine.api.datastore.Key)

ndb retrieving entity key by ID without parent

I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using ndb interface. As I understand datastore it may be caused by the fact that this operation requires full index scan to perform.
The workaround I used is to create a computed property in the model, which will contain the id part of the key. I'm able now to do an ancestor query and get the key
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to use http://url/kind/id (where id looked like "123") to fetch an entity. I changed that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I then parse that string, generate a key, and get by key.
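A minimal sketch of the parsing step might look like this. The function parse_path and the kind names are hypothetical; the answer does not say how the kind at each level is known, so here they are simply passed in.

```python
def parse_path(path_str, kinds):
    """Turn '789-456-123' into [(kind, id), ...], outermost ancestor first.

    `kinds` is a hypothetical list naming the kind at each level of the path.
    """
    ids = [int(part) for part in path_str.split("-")]
    if len(ids) != len(kinds):
        raise ValueError("expected %d ids, got %d" % (len(kinds), len(ids)))
    return list(zip(kinds, ids))


pairs = parse_path("789-456-123", ["Grandparent", "Parent", "SomeModel"])
# With ndb available, the key could then be built with ndb.Key(pairs=pairs)
# and fetched with .get(), which is a lookup rather than a query.
```

This only works with integer ids that contain no dashes; string ids would need a different separator or an encoded key.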
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a small addition to dragonx's answer.
In my application, on the front end I use the string representation of keys:
str(instance.key())
When I need to change an instance, even if it is a descendant, I use only the string representation of its key. For example, given key_str, a request argument identifying the instance to delete:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get the item without worrying about the parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now the entity can be fetched from the urlsafe string alone
item = ndb.Key(urlsafe=usafe).get()
print(item)

mongo db indexes on embedded documents

I have a domain object model as below...
@Document
Profile
{
    social profile list:
    SocialProfile
    {
        Interest list:
        {
            Interest
            {
                id
                type
                value
            }
            ...
        }
        ...
    }
}
Each profile can have many social profiles, and each social profile has many interests related to the profile via that specific social profile (a social profile represents a social network such as Facebook). Each interest is also an embedded document with the fields id, type and value.
So I have two questions:
Can I index a few fields separately in the embedded Interest document?
Can I create a compound index in the embedded Interest document?
I guess the complexity in my model is the depth of the embedded document, which is two levels, and the fact that the path to that document goes through arrays.
Can it be done the Spring way via metadata annotations? If you think my model is wrong, please let me know; I am a newbie with Mongo.
Thanks
You can index separately on the fields in an embedded document.
You can also create a compound index on the fields, so long as no more than one field is an array.
These might offer more answers:
http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys
http://www.mongodb.org/display/DOCS/Multikeys
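To make the two kinds of index concrete, here is a small Python sketch of what the index specifications could look like, using dotted paths to reach into the embedded documents. The field names socialProfiles and interests are assumptions (the question does not give the stored names), and the specs are plain data in the list-of-(field, direction) form that pymongo's create_index accepts.

```python
# Hypothetical field names; the question does not give the stored names.
INTERESTS = "socialProfiles.interests"

# Single-field indexes on individual fields of the embedded Interest document.
single_field_indexes = [
    [(INTERESTS + ".type", 1)],
    [(INTERESTS + ".value", 1)],
]

# One compound index over two fields of the same embedded document.
compound_index = [(INTERESTS + ".type", 1), (INTERESTS + ".value", 1)]

# With pymongo, each spec could be passed to collection.create_index(spec).
```

Both fields of the compound index sit under the same array path, so they are read here as not running into the one-array-field restriction mentioned in the answer; that reading is worth checking against the linked documentation.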

Elementary Apex Object IDs

Quick question. In the code below, you can see that the for loop (which takes each record in newTimecards and binds it to a variable called timecard) adds Resource__c to the resourceIds set. I'm confused about how this object is considered an ID data type. When an object is made in Salesforce, does it automatically have an ID made, so that it knows the Resource__c ID can be added to a set? Note that within the Resource__c object there is also a field called Resource_ID__c. Resource__c within Timecard__c is a Master-Detail data type. Resource__c is the parent of Timecard__c.
Now that I think about it, does resourceIds.add(timecard.Resource__c) reference the relationship between the two objects, then search through Resource__c and add the ID field Resource_ID__c automatically since it's a unique field?
Thanks for your help.
public class TimecardManager {
    public class TimecardException extends Exception {}

    public static void handleTimecardChange(List<Timecard__c> oldTimecards,
            List<Timecard__c> newTimecards) {
        Set<ID> resourceIds = new Set<ID>();
        for (Timecard__c timecard : newTimecards) {
            resourceIds.add(timecard.Resource__c);
        }
        // ...
    }
}
Every object instance (and that means EVERY one, including factory ones) has a unique organization-level ID. Its field name is always Id, it is covered by the Apex type ID, and it is a case-sensitive string of 15 characters that also has an 18-character case-insensitive representation. The first three characters are the object prefix code (e.g. 500 for a Case), so all instances of the same object share the same prefix. You see these values all across SF (for example, in https://na1.salesforce.com/02s7000000BW59L the 02s7000000BW59L in the URL is the ID). When an instance of the object is created using the INSERT DML operation, Salesforce automatically assigns a unique value based on the prefix and the next available transactional sub ID; it all happens transparently to you.
This is not to be confused with the object's Name field, which is a field you define when you create an object, which can be auto-incremented and so on (e.g. MYOBJ-{00000}), and which can have more meaning to a user than a cryptic ID.
When you create a lookup or master-detail relationship, it is the ID that is used to link the two instances, not the Name. In the above example Resource__c seems to be that lookup field, and it contains the Id value of the row's master.
What the code does is enumerate all resources used in the timecards and build a set of their IDs, most probably so that it can be used in a WHERE Id IN :resourceIds clause to load resource details from the master table.
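For the curious, the 18-character form mentioned above can be derived from the 15-character one by appending a three-character suffix that encodes which letters are uppercase. The following Python sketch shows the commonly described conversion; it is an illustration outside Salesforce, and the function name to_18_char_id is made up for this example.

```python
SUFFIX_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"


def to_18_char_id(id15):
    """Append the 3-character case suffix to a 15-character ID."""
    if len(id15) != 15:
        raise ValueError("expected a 15-character ID")
    suffix = ""
    for start in (0, 5, 10):
        chunk = id15[start:start + 5]
        # Bit i is set when the i-th character of the chunk is an uppercase letter.
        bits = sum(1 << i for i, ch in enumerate(chunk) if "A" <= ch <= "Z")
        suffix += SUFFIX_ALPHABET[bits]
    return id15 + suffix


full_id = to_18_char_id("02s7000000BW59L")
object_prefix = full_id[:3]  # the first three characters identify the object type
```

Because the suffix records the casing, the 18-character value can be compared case-insensitively without two different IDs colliding.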
mmix's answer is a great overview of what an ID is and where it comes from. To answer what I think is your specific question:
Any time there is a reference from one object to another (like here, between Timecard__c and Resource__c), the field representing the reference will be an ID. So the for loop that calls resourceIds.add(timecard.Resource__c) is just building up your set of IDs (those 15-character strings). The timecard.Resource__c expression doesn't look through the Resource__c table to find the ID; timecard.Resource__c is the ID.

Resources