How to create the LINK in data vault without having strong relationship keys (Foreign Keys)? - data-modeling

Our sales representatives call Leads to propose services. Leads are stored in the CRM with the attributes LeadId, PrimaryContactNumber, and SecondaryContactNumber. All calls are made via Teams and recorded. Calls can be extracted via the Microsoft Graph API and have the attributes CallId, UserId, CalleeNumber, CallerNumber, and Duration. Users have the attributes UserId, Username, and Email. How can I generate a LINK (the relation between User and Lead) using data vault modeling when the foreign keys are not known? I thought my design should look like this: User(hub)-Call(link)-Lead(hub). However, my call has only a UserId; the LeadId can only be inferred from one of the Lead attributes (PrimaryContactNumber or SecondaryContactNumber). What is the best solution to this problem? Or should I model Call as a hub as well and filter when loading data into the data mart?

A call is not a business entity; it is rather a relation between a salesperson and a lead. So your first thought is right.
To answer your question, you might want to take a look at the zero/ghost record concept: http://roelantvos.com/blog/unknown-keys-zero-keys-or-ghost-keys-in-hubs-for-dv2-0/
Basically, add an "Unknown" entry to your Lead hub. Then, when you generate the link and the Lead is not known, simply link it to the unknown entry. Make sure you add a satellite to your link so you can track the period during which the Lead was unknown and the point at which it became known.
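The zero-key idea above can be sketched roughly as follows. This is a minimal illustration, not a full Data Vault loader: the MD5 hashing, the all-zero UNKNOWN_LEAD_HK value, the matching on CalleeNumber only, and the function names are all assumptions made for the example.

```python
import hashlib

# Hypothetical zero key: the hash key reserved for the "Unknown" Lead hub record.
UNKNOWN_LEAD_HK = "0" * 32


def hash_key(business_key: str) -> str:
    """Derive a hub hash key from a business key (MD5 is an assumption)."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def build_lead_lookup(leads):
    """Map every known contact number to the hash key of its lead."""
    lookup = {}
    for lead in leads:
        for field in ("PrimaryContactNumber", "SecondaryContactNumber"):
            number = lead.get(field)
            if number:
                lookup[number] = hash_key(lead["LeadId"])
    return lookup


def build_call_link(call, lead_lookup):
    """Create one User-Call-Lead link row, using the zero key when no lead matches."""
    lead_hk = lead_lookup.get(call["CalleeNumber"], UNKNOWN_LEAD_HK)
    return {
        "call_id": call["CallId"],
        "user_hk": hash_key(call["UserId"]),
        "lead_hk": lead_hk,
        # Value a link satellite could track over time (unknown -> known).
        "lead_known": lead_hk != UNKNOWN_LEAD_HK,
    }
```

A call whose CalleeNumber matches no lead still produces a link row, pointing at the "Unknown" hub entry; a later load can emit a new row once the number is attached to a lead.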

Related

NoSql - entity holds an owner ID field vs owner holds list of child ID's

I am currently exploring MongoDB.
I built a notes web app and for now the DB has 2 collections: notes and users.
The user can create, read and update his notes.
I want to create a page called /my-notes that will display all the notes that belong to the connected user.
My question is:
Should the notes model have an ownerId field, or the opposite: should the user model have a noteIds field of type list?
Points I found relevant for the decision making:
noteIds approach:
There is no need to query for the notes that hold the desired ownerId (if we have a lot of notes, we will need indexes and a search across the whole notes collection). We just need to find the user by user ID and then get all the notes by their IDs.
In this case there are 2 calls to the DB.
The data is ordered by the order of insertion into the noteIds field in the document.
ownerId approach:
We do need to find the notes by their ownerId field across the notes collection, which might be more computationally intensive.
We can paginate / sort the data as we want - more control over the data.
Are there any more points you can think of?
As far as I can tell, this is a question of whether you want less computationally intensive DB calls or more control over the data.
What are the "best practices"?
Thanks,
A similar use case is explained in the documentation. If there is no limit on the number of notes a user can have, it might be better to store a userId reference field in each note document.
As you've figured out already, pagination would be easier with the second approach. Also, when updating notes, you can simply call updateOne({ _id: "note_id", userId: 1 }) instead of checking the user's document to see whether the note actually belongs to the user.
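The two benefits of the owner-reference approach can be illustrated with an in-memory sketch. It uses plain Python lists instead of a real MongoDB collection, the field is called ownerId as in the question, and createdAt and both function names are hypothetical. Against a real collection the same idea would be a filtered query with sort, skip, and limit, ideally backed by an index on the owner field.

```python
def notes_page(notes, owner_id, page=0, page_size=10):
    """Return one page of a user's notes, newest first."""
    owned = [n for n in notes if n["ownerId"] == owner_id]
    owned.sort(key=lambda n: n["createdAt"], reverse=True)
    start = page * page_size
    return owned[start:start + page_size]


def update_note(notes, note_id, owner_id, text):
    """Update a note only if both the note ID and the owner match."""
    for note in notes:
        if note["_id"] == note_id and note["ownerId"] == owner_id:
            note["text"] = text
            return True
    return False  # no such note, or it belongs to someone else
```

Because the owner is part of the match condition, the ownership check and the update happen in one step, with no prior read of the user document.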

Salesforce retrieve object id using custom field on another object

We're implementing a coupon program. The coupons are unique codes and are related to an existing customer and stored in a custom field on the Account.
When a Lead is created due to being referred using one of the unique coupons, the unique coupon is saved in a custom field on the Lead. I need to access the associated Account Id of the unique coupon.
I could do this by creating a trigger on lead insert and then querying accounts for the unique coupon. My concern with this approach is having a trigger and a query on every lead created; it seems this would not be good practice, since it uses so many resources for a rare situation.
Is there another (better) approach; lookup?
Thanks
If you build the trigger correctly, there should be no concern about resources. But it really depends on what you're trying to do with the Account data. I don't know the architecture, so you will need to give more details.
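One possible reading of "correctly" is a trigger that skips leads without a coupon and issues a single query per batch of leads. The sketch below shows that pattern in Python rather than Apex; the dictionary field names and the query_accounts_by_coupons callback are hypothetical stand-ins for the real objects and query.

```python
def match_leads_to_accounts(new_leads, query_accounts_by_coupons):
    """Resolve the referring account for a batch of new leads with one query."""
    # Only leads that actually carry a coupon need a lookup.
    coupons = {lead["coupon"] for lead in new_leads if lead.get("coupon")}
    if not coupons:
        return {}  # the common case: no query is issued at all

    # One query for the whole batch instead of one query per lead.
    accounts = query_accounts_by_coupons(coupons)
    account_by_coupon = {acc["coupon"]: acc["id"] for acc in accounts}

    return {
        lead["id"]: account_by_coupon[lead["coupon"]]
        for lead in new_leads
        if lead.get("coupon") in account_by_coupon
    }
```

With this shape, leads created without a coupon cost only a set comprehension, which addresses the concern about spending resources on a rare situation.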

GQL + Join Table Query Replacement for Google App Engine Datastore

Given the following Many to Many Relationship designed in Google App Engine Datastore:
User
    PK: UserID
    Name
Company
    PK: CompanyID
    Name
CompanyReview
    CK: CompanyID
    CK: UserID
    ReviewContent
For query optimization, what's the best way to query these relationship tables to show the selected company's reviews by users?
Currently, I'm doing the following:
results = CompanyReview.all().filter('owned_by = ', company).filter('written_by = ', user).fetch(10)
where I'm able to retrieve the data of the CompanyReview table. However, in this case, I would need to look up each UserID from the CompanyReview table in the User table in order to obtain the names of the users who have commented on the selected company.
Is there a better solution that grabs the user name as well, all in one statement, or at least a better optimized solution? Performance is the priority.
It depends on which side of the relationship will have more values. As described in this article in the Google App Engine docs, you can model many-to-many relationships by using a list of keys on one side of the relationship. "This means you should place the list on side of the relationship which you expect to have fewer values".
If both sides of the relationship will have many values, then you will really need the CompanyReview model. But pay attention to what the article says:
However, you need to be very careful because traversing the
connections of a collection will require more calls to the datastore.
Use this kind of many-to-many relationship only when you really need
to, and do so with care to the performance of your application.
This is because it uses ReferenceProperty in the relationship model:
class ContactCompany(db.Model):
    contact = db.ReferenceProperty(Contact,
                                   required=True,
                                   collection_name='companies')
    company = db.ReferenceProperty(Company,
                                   required=True,
                                   collection_name='contacts')
    title = db.StringProperty()
So if we try to access companies on a Contact entity, it will run a new query. And if we read attributes of the contact from a ContactCompany entity, as in contact_company.contact.name, a query for that single contact will be made as well. Read the ReferenceProperty docs for more info.
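The cost of dereferencing one reference at a time, compared with fetching the referenced entities as a batch, can be illustrated with a small simulation. FakeDatastore is a hypothetical in-memory stand-in that only counts round trips; it is not the App Engine API, and the sketch assumes the real datastore client offers some form of batch get by key.

```python
class FakeDatastore:
    """In-memory stand-in for a datastore that counts round trips."""

    def __init__(self, entities):
        self.entities = entities  # key -> entity dict
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.entities[key]

    def get_multi(self, keys):
        self.calls += 1  # one round trip for the whole batch
        return [self.entities[k] for k in keys]


def reviewer_names_one_by_one(store, reviews):
    """Dereference each review's user separately: one call per review."""
    return [store.get(r["user_key"])["name"] for r in reviews]


def reviewer_names_batched(store, reviews):
    """Collect the distinct user keys first, then fetch them in one call."""
    keys = sorted({r["user_key"] for r in reviews})
    users = dict(zip(keys, store.get_multi(keys)))
    return [users[r["user_key"]]["name"] for r in reviews]
```

Both functions return the same names, but the batched version makes a single call regardless of how many reviews are shown.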
Extra:
Since you are performance-savvy, I recommend using a decorator for memcaching function returns and using this excellent layered storage library for Google App Engine.

Table Module vs. Domain Model

I asked about Choosing a method to store user profiles the other day and received an interesting response from David Thomas Garcia suggesting I use the Table Module design pattern. It looks like this is probably the direction I want to take. Everything I've turned up with Google seems to be fairly high level discussion, so if anyone could point me in the direction of some examples or give me a better idea of the nuts and bolts involved that would be awesome.
The best reference is "Patterns of Enterprise Application Architecture" by Martin Fowler.
Here's an excerpt from the section on Table Module:
A Table Module organizes domain
logic with one class per table in the
database, and a single instance of a
class contains the various procedures
that will act on the data. The
primary distinction with Domain
Model is that, if you have many
orders, a Domain Model will have one
order object per order while a Table
Module will have one object to handle
all orders.
Table Module would be particularly useful in the flexible database architecture you have described for your user profile data, basically the Entity-Attribute-Value design.
Typically, if you use Domain Model, each row in the underlying table becomes one object instance. Since you are storing user profile information in multiple rows, then you end up having to create many Domain Model objects, whereas what you really want is one object that encapsulates all the user properties.
Instead, the Table Module makes it easier for you to code logic that applies to multiple rows in the underlying database table. If you create a profile for a given user, you'd specify all those properties, and the Table Module class would have the code to translate that into a series of INSERT statements, one row per property.
$table->setUserProfile( $userid, array('firstname'=>'Kevin', 'lastname'=>'Loney') );
Likewise, querying a given user's profile would use the Table Module to map the multiple rows of the query result set to object members.
$hashArray = $table->getUserProfile( $userid );
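A Table Module along these lines might look like the following Python sketch (the snippets above are PHP; Python with SQLite is used here only to keep the example self-contained). The user_profile table layout, the class name, and the choice of INSERT OR REPLACE are assumptions, not part of the original answer.

```python
import sqlite3


class UserProfileTable:
    """Table Module: one object handles all rows of the profile table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS user_profile ("
            " user_id INTEGER NOT NULL,"
            " attribute TEXT NOT NULL,"
            " value TEXT,"
            " PRIMARY KEY (user_id, attribute))"
        )

    def set_user_profile(self, user_id, properties):
        """Write one row per property, replacing existing values."""
        self.conn.executemany(
            "INSERT OR REPLACE INTO user_profile (user_id, attribute, value)"
            " VALUES (?, ?, ?)",
            [(user_id, name, value) for name, value in properties.items()],
        )
        self.conn.commit()

    def get_user_profile(self, user_id):
        """Fold the user's rows back into a single dictionary."""
        rows = self.conn.execute(
            "SELECT attribute, value FROM user_profile WHERE user_id = ?",
            (user_id,),
        )
        return dict(rows.fetchall())
```

Usage mirrors the two calls above: table.set_user_profile(42, {"firstname": "Kevin", "lastname": "Loney"}) followed by table.get_user_profile(42).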

Preferred way to map code with user created database entries

I am trying to work out the best database model for the current setup:
An administrator can create "customer products", meaning services/products that a customer can attach/subscribe to. The simple cases, where the product simply has a price or the product subscription should send an e-mail, are easy to model in the database.
But how about very specific backend code for a customer product? For example, one product might have very specific code implemented for checking a customer's status in a different database. How can I map this relationship in the database so I can turn some code on or off based on the product settings?
My intuitive way of handling it would be to have a string column on the CustomerProducts table where a pre-defined set of strings could be set, e.g. "MyCustomCodeHandler", and then the code would check for the existence of this string in order to execute it. But for me it doesn't really feel like a real relationship between the database and code.
Data is data, whereas code is code. I would not recommend storing code in the database.
If you need to allow customers to create product types (in the object-oriented sense of "types") with associated code, I'd choose to deploy that code in the same way you deploy other code.
The custom code may also reference custom data stored in the database. I'd choose to create a dependent table per product subtype, and put the type-specific columns in there. The relationship between this subtype table and the generic product table is one-to-one. That is, the primary key in the subtype table is also a foreign key to the generic product table.
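The subtype-table layout described above could be sketched as follows, using SQLite from Python. The table and column names (customer_product, status_check_product, status_endpoint) are hypothetical; only the one-to-one primary-key-as-foreign-key relationship comes from the answer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL
);
-- Subtype table: its primary key is also a foreign key to the generic table,
-- so each product has at most one row here (one-to-one).
CREATE TABLE status_check_product (
    product_id      INTEGER PRIMARY KEY REFERENCES customer_product(product_id),
    status_endpoint TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create both tables with foreign key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)


def status_check_settings(conn, product_id):
    """Return the type-specific setting, or None for a plain product."""
    row = conn.execute(
        "SELECT status_endpoint FROM status_check_product WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0] if row else None
```

The deployed code can then decide whether to run the status check by testing for a row in the subtype table, rather than by comparing a free-form string column.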

Resources