Common strategies for handling concurrent global 'inventory' updates in a database

To give a simplified example:
I have a database with one table, names, which has 1 million records, each containing a common boy's or girl's name, with more added every day.
I have an application server that takes as input an HTTP request from parents using my website, 'Name Chooser'. With each request, I need to pick a name from the DB and return it, and then NOT give that name to another parent. The server is concurrent so it can handle a high volume of requests, yet it has to respect "unique name per request" and still be highly available.
What are the major components and strategies for an architecture of this use case?

From what I understand, you have two operations: adding a name and choosing a name.
I have a couple of questions:
Question 1: Do parents choose names only, or do they also add names?
Question 2: If they add names, does that mean that a name should also be marked as already chosen when it is added?
Assuming that you don't want all name-selection requests to wait for one another (by locking or queueing them):
One solution to resolve concurrency in the choose-a-name-only case is to use an Optimistic Offline Lock.
The most common implementation of this is to add a version field to your table and increment it when you mark a name as chosen. You will need DB support for this, but most databases offer such a mechanism. MongoDB adds a version field to documents by default; for an RDBMS (like SQL Server) you have to add the field yourself.
You haven't specified what technology you are using, so I will give an example in pseudocode for an SQL DB. For MongoDB, you can check how the DB performs these checks for you.
NameRecord {
    id,
    name,
    parentID,
    version,
    isChosen,

    function chooseForParent(parentID) {
        if (this.isChosen) {
            throw Error/Exception;
        }
        this.parentID = parentID;
        this.isChosen = true;
        this.version++;
    }
}

NameRecordRepository {
    function getByName(name) { ... }

    function save(record) {
        var oldVersion = record.version - 1;
        var query = "UPDATE records SET .....
                     WHERE id = {record.id} AND version = {oldVersion}";
        var rowsCount = db.execute(query);
        if (rowsCount == 0) {
            throw ConcurrencyViolation;
        }
    }
}

// somewhere else in an object or module or whatever...
function chooseName(parentID, name) {
    var record = NameRecordRepository.getByName(name);
    record.chooseForParent(parentID);
    NameRecordRepository.save(record);
}
Before this object is saved to the DB, a version comparison must be performed. SQL provides a way to execute a query based on some condition and return the count of affected rows. In our case we check whether the version in the database is still the old one before updating; if it isn't, that means someone else has updated the record in the meantime.
In this simple case you can even remove the version field and use the isChosen flag in your SQL query, like this:
var query = "UPDATE records SET .....
             WHERE id = {record.id} AND isChosen = false";
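In C#, that single conditional UPDATE plus row-count check might look like this minimal ADO.NET sketch (connectionString, recordId and parentId are assumed to exist; table and column names follow the pseudocode above):

// requires: using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(
        "UPDATE records SET isChosen = 1, parentID = @parentId " +
        "WHERE id = @id AND isChosen = 0", conn);
    cmd.Parameters.AddWithValue("@parentId", parentId);
    cmd.Parameters.AddWithValue("@id", recordId);
    if (cmd.ExecuteNonQuery() == 0)
    {
        // zero rows updated: another request claimed this name first
        throw new InvalidOperationException("Name already chosen");
    }
}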
When adding a new name to the database, you will need a unique constraint on the name column; the constraint resolves the concurrency issues between simultaneous inserts.
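With the constraint in place, a concurrent duplicate insert surfaces as an error you can catch. A minimal sketch, assuming SQL Server (where error number 2627 signals a unique key violation) and a UNIQUE constraint on names.name:

// requires: using System.Data.SqlClient;
// assumes: ALTER TABLE names ADD CONSTRAINT UQ_names_name UNIQUE (name);
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand("INSERT INTO names (name) VALUES (@name)", conn);
    cmd.Parameters.AddWithValue("@name", newName);
    try
    {
        cmd.ExecuteNonQuery();
    }
    catch (SqlException ex) when (ex.Number == 2627) // unique constraint violated
    {
        // another request inserted the same name concurrently; report or skip
    }
}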

Related

Controlling NHibernate search query output regarding parameters

When you use NHibernate to "fetch" a mapped object, it outputs a SELECT query to the database. It outputs this using parameters; so if I query a list of cars based on tenant ID and name, I get:
select Name, Location from Car where tenantID=@p0 and Name=@p1
This has the nice benefit of our database creating (and caching) a query plan based on this query and the result, so when it is run again, the query is much faster as it can load the plan from the cache.
The problem with this is that we are a multi-tenant database, and almost all of our indexes are partition aligned. Our tenants have vastly different data sets; one tenant could have 5 cars, while another could have 50,000. And so because NHibernate does this, it has the net effect of our database creating and caching a plan for the FIRST tenant that runs it. This plan is likely not efficient for subsequent tenants who run the query.
What I WANT to do is force NHibernate NOT to parameterize certain parameters; namely, the tenant ID. So I'd want the query to read:
select Name, Location from Car where tenantID=55 and Name=@p0
I can't figure out how to do this in the HBM.XML mapping. How can I dictate to NHibernate how to use parameters? Or can I just turn parameters off altogether?
OK everyone, I figured it out.
The way I did it was overriding the SqlClientDriver with my own custom driver that looks like this:
public class CustomSqlClientDriver : SqlClientDriver
{
    // matches ".TenantID=@p0" in the generated SQL and captures the parameter name
    private static Regex _tenantIdReplacer =
        new Regex(@"\.TenantID=(@p0)", RegexOptions.Compiled);

    public override void AdjustCommand(IDbCommand command)
    {
        var m = _tenantIdReplacer.Match(command.CommandText);
        if (!m.Success)
            return;
        // find the parameter holding the tenant ID
        var parameterName = m.Groups[1].Value;
        var tenantId = (IDbDataParameter)command.Parameters[parameterName];
        var valueOfTenantId = tenantId.Value;
        // now replace the parameter reference with the literal value
        command.CommandText = _tenantIdReplacer.Replace(
            command.CommandText, ".TenantID=" + valueOfTenantId);
    }
}
I override the AdjustCommand method and use a Regex to replace the tenantID. This works; not sure if there's a better way, but I really didn't want to have to open up NHibernate and start messing with core code.
You'll have to register this custom driver in the connection.driver_class property of the SessionFactory upon initialization.
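Programmatically, that registration might look like this (a hedged sketch; it is equivalent to setting connection.driver_class in hibernate.cfg.xml, and the namespace holding CustomSqlClientDriver is up to you):

// sketch: register the custom driver before building the SessionFactory
var cfg = new NHibernate.Cfg.Configuration().Configure();
cfg.SetProperty(NHibernate.Cfg.Environment.ConnectionDriver,
                typeof(CustomSqlClientDriver).AssemblyQualifiedName);
var sessionFactory = cfg.BuildSessionFactory();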
Hope this helps somebody!

Trigger Duplicate CSV

I am trying to upload a CSV file / insert a bulk of records using the import wizard. In short, I would like to keep the latest record when duplicates are found. Duplicate records are a combination of First name, Last name and Title.
For example if my CSV file looks like the following:
James,Wistler,34,New York,Married
James,Wistler,34,London,Married
....
....
James,Wistler,34,New York,Divorced
This should only keep in my org: James,Wistler,34,New York,Divorced
I have been trying to write a trigger before update / insert, but so far no success. Here is my trigger code (it is not yet finished: it only filters on First name, and I am having a problem deleting the duplicates found in my CSV). Any hints? Thanks for reading!
trigger CheckDuplicateInsert on Customer__c (before insert, before update) {
    Map<String, Customer__c> customerFirstName = new Map<String, Customer__c>();
    List<Customer__c> customerList = Trigger.new;

    for (Customer__c newCustomer : customerList) {
        if ((newCustomer.First_Name__c != null) && System.Trigger.isInsert) {
            if (customerFirstName.containsKey(newCustomer.First_Name__c)) {
                // remove the duplicate from the map
                customerFirstName.remove(newCustomer.First_Name__c);
            }
            // at this stage we don't have any duplicate, so let's add the new customer
            customerFirstName.put(newCustomer.First_Name__c, newCustomer);
        } else if ((System.Trigger.oldMap.get(newCustomer.Id) != null)
                && newCustomer.First_Name__c != System.Trigger.oldMap.get(newCustomer.Id).First_Name__c) {
            // the field is being updated; mark it with UPDATED for tracking
            newCustomer.First_Name__c = newCustomer.First_Name__c + 'UPDATED';
            customerFirstName.put(newCustomer.First_Name__c, newCustomer);
        }
    }

    for (Customer__c customer : [SELECT First_Name__c FROM Customer__c
                                 WHERE First_Name__c IN :customerFirstName.keySet()]) {
        if (customer.First_Name__c != null) {
            // a record with this first name already exists in the DB; mark the incoming one
            Customer__c newCustomer = customerFirstName.get(customer.First_Name__c);
            newCustomer.First_Name__c = customer.First_Name__c + 'EXIST_DB';
        }
    }
}
A purely non-SF solution would be to sort them & deduplicate in Excel, for example ;)
Good news - you don't need a trigger. Bad news - you might have to ditch the import wizard and start using Data Loader. The solution is pretty long and looks scary but once you get the hang of it it should start to make more sense and be easier to maintain in future than writing code.
You can download the Data Loader in the setup area of your Production org, and here's some basic info about the tool.
Anyway.
I'd make a new text field on your Contact, call it "unique key" or something, and mark it as an External Id. If you have never used ext. ids, Jeff Douglas has a good post about them.
You might have to populate the field on your existing data before proceeding. Easiest would be to export all Contacts where it's blank (from a report, for example), fill it in with some Excel formulas and import it back.
If you want, you can even write a workflow rule to handle the generation of the unique key. This will help you when Mrs. Jane Doe gets married and becomes Jane Bloggs, and it also makes the previous point easier (you'd import the Contacts without changes, just "touching" them, and the workflow will fire). Something like:
condition: ISBLANK(Unique_key__c) || ISCHANGED(FirstName) || ISCHANGED(LastName) || ISCHANGED(Title)
new value: Title + FirstName + ' ' + LastName
Almost there. Fire up Data Loader and prepare an upsert job (because we want to insert some records and, when a duplicate is found, update it instead).
My only concern is what happens when what's effectively the same row appears more than once in one "batch" of records sent to SF, as in your example. Upsert will not know which value is valid (it's like setting x = 7; and x = 5; in the same save to the DB) and will decide to fail these rows. So you might have to tweak the number of records per batch in Data Loader's settings.

Is there any way to do an Insert or Update / Merge / Upsert in LLBLGen

I'd like to do an upsert using LLBLGen without first fetching and then saving the entity.
I already found the possibility to update without fetching the entity first, but then I have to know that it is already there.
Updating entries would happen about as often as inserting a new entry.
Is there a possibility to do this in one step?
Would it make sense to do it in one step?
Facts:
LLBLgen Pro 2.6
SQL Server 2008 R2
.NET 3.5 SP1
I know I'm a little late to this, but as I remember from working with LLBLGen Pro, it is totally possible, and one of its beauties is that everything is possible!
I don't have my samples at hand, but I'm pretty sure there is a method named UpdateEntitiesDirectly (taking an entity that carries the new values plus a filter) that can be used like this:
// suppose we have Product and Order entities
using (var daa = new DataAccessAdapter())
{
    // an entity carrying the new field values to apply to every matched row
    var newValues = new OrderEntity { State = 1 };
    var filter = new RelationPredicateBucket(
        OrderFields.ProductId == 23 & OrderFields.Date > DateTime.Now.AddDays(-2));
    int numberOfUpdatedEntities = daa.UpdateEntitiesDirectly(newValues, filter);
}
When using LLBLGen Pro we were able to do pretty much everything that is possible with an ORM framework; it's just great!
It also has a method for batch deletes, called DeleteEntitiesDirectly, which may be useful in scenarios where you need to delete an entity and replace it with another one.
Hope this is helpful.
I think you can achieve what you're looking for by using an EntityCollection. First fetch the entities you want to update with the FetchEntityCollection method of DataAccessAdapter; then change anything you want in that collection, insert new entities into it, and save it with the SaveEntityCollection method. This way existing entities are updated and new ones are inserted into the database. For example, in a product-order scenario where you want to manipulate the orders of a specified product, you could use something like this:
int productId = 23;
var orders = new EntityCollection<OrderEntity>();
using (DataAccessAdapter daa = new DataAccessAdapter())
{
    daa.FetchEntityCollection(orders, new RelationPredicateBucket(OrderFields.ProductId == productId));
    foreach (var order in orders)
    {
        order.State = 1;
    }
    OrderEntity newOrder = new OrderEntity();
    newOrder.ProductId = productId;
    newOrder.State = 0;
    orders.Add(newOrder);
    daa.SaveEntityCollection(orders);
}
As far as I know, this is not possible, nor could it really be.
If you were to just call adapter.Save(entity) on an entity that was not fetched, the framework would assume it was new. If you think about it, how could the framework know whether to emit an UPDATE or an INSERT statement? No matter what, something somewhere would have to query the database to see if the row exists.
It would not be too difficult to create something that did this more or less automatically for single-entity (non-recursive) saves; see the sketch after these steps:
1. Create a new entity and set its fields.
2. Attempt to fetch an entity of the same type using the PK or a unique constraint (there are other options as well, but none as uniform).
3. If the fetch fails, just save the new entity (INSERT).
4. If the fetch succeeds, map the fields of the created entity onto the fields of the fetched entity.
5. Save the fetched entity (UPDATE).
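A hedged sketch of those steps using the Adapter API from the examples above (method names are as I recall them from LLBLGen Pro 2.6, and OrderId as the primary key is an assumption):

void Upsert(OrderEntity newOrder)
{
    using (var adapter = new DataAccessAdapter())
    {
        // step 2: try to fetch an existing entity by primary key
        var existing = new OrderEntity(newOrder.OrderId);
        if (!adapter.FetchEntity(existing))
        {
            // step 3: nothing there yet, so this save becomes an INSERT
            adapter.SaveEntity(newOrder);
            return;
        }
        // step 4: map the new values onto the fetched entity
        existing.ProductId = newOrder.ProductId;
        existing.State = newOrder.State;
        // step 5: this save becomes an UPDATE
        adapter.SaveEntity(existing);
    }
}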

JDBC batched updates: good key retrieval strategy

I insert a lot of data into a table with an autogenerated key using the batchUpdate functionality of JDBC. Because JDBC doesn't say anything about combining batchUpdate with getGeneratedKeys, I need some database-independent workaround.
My ideas:
1. Somehow pull the next handed-out sequence values from the database before inserting, and then use the keys manually. But JDBC hasn't got a getTheNextFutureKeys(howMany). So how can this be done? And is pulling keys (e.g. in Oracle) also transaction-safe, so that no two transactions can ever pull the same set of future keys?
2. Add an extra column with a fake id that is only valid during the transaction.
3. Use all the other columns as a secondary key to fetch the generated key. This isn't really 3NF-conformant...
Are there better ideas, or how can I use idea 1 in a generalized way?
Partial answer
Is pulling keys (e.g. in Oracle) also transaction-safe?
Yes, getting values from a sequence is transaction-safe, by which I mean that even if you roll back your transaction, a sequence value returned by the DB won't be returned again under any circumstances.
So you can prefetch the ids from a sequence and use them in the batch insert.
I'd never run into this, so I dived into it a little.
First of all, there is a way to retrieve the generated ids from a JDBC statement:
String sql = "INSERT INTO AUTHORS (LAST, FIRST, HOME) VALUES " +
             "('PARKER', 'DOROTHY', 'USA')";
// stmt is an existing java.sql.Statement
int rows = stmt.executeUpdate(sql, Statement.RETURN_GENERATED_KEYS);
ResultSet rs = stmt.getGeneratedKeys();
if (rs.next()) {
    ResultSetMetaData rsmd = rs.getMetaData();
    int colCount = rsmd.getColumnCount();
    do {
        for (int i = 1; i <= colCount; i++) {
            String key = rs.getString(i);
            System.out.println("key " + i + " is " + key);
        }
    } while (rs.next());
} else {
    System.out.println("There are no generated keys.");
}
see this http://download.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/statement.html#1000569
Also, theoretically it could be combined with the JDBC batchUpdate.
This combination seems to be rather non-trivial, though; on that, please refer to this thread.
I suggest trying this, and if you do not succeed, falling back to pre-fetching from the sequence.
getGeneratedKeys() will also work with a batch update, as far as I remember.
It returns a ResultSet with all newly created ids, not just a single value.
But that requires the ID to be populated by a trigger during the INSERT operation.

How to use MS Sync Framework to filter client-specific data?

Let's say I've got a SQL 2008 database table with lots of records associated with two different customers, Customer A and Customer B.
I would like to build a fat client application that fetches all of the records that are specific to either Customer A or Customer B based on the credentials of the requesting user, then stores the fetched records in a temporary local table.
Thinking I might use the MS Sync Framework to accomplish this, I started reading about row filtering when I came across this little chestnut:
Do not rely on filtering for security. The ability to filter data from the server based on a client or user ID is not a security feature. In other words, this approach cannot be used to prevent one client from reading data that belongs to another client. This type of filtering is useful only for partitioning data and reducing the amount of data that is brought down to the client database.
So, is this telling me that the MS Sync Framework is only a good option when you want to replicate an entire table between point A and point B?
Doesn't that seem to be an extremely limiting characteristic of the framework? Or am I just interpreting this statement incorrectly? Or is there some other way to use the framework to achieve my purposes?
Ideas anyone?
Thanks!
No, it is only a security warning.
We use filtering extensively in our semi-connected app.
Here is some code to get you started:
//helper
void PrepareFilter(string tablename, string filter)
{
    SyncAdapters.Remove(tablename);
    var ab = new SqlSyncAdapterBuilder(this.Connection as SqlConnection);
    ab.TableName = "dbo." + tablename;
    ab.ChangeTrackingType = ChangeTrackingType.SqlServerChangeTracking;
    ab.FilterClause = filter;
    // the filter parameter; DBNull here is just a placeholder, the real
    // value is supplied by the client at sync time
    var cpar = new SqlParameter("@filterid", SqlDbType.UniqueIdentifier);
    cpar.IsNullable = true;
    cpar.Value = DBNull.Value;
    ab.FilterParameters.Add(cpar);
    var nsa = ab.ToSyncAdapter();
    nsa.TableName = tablename;
    SyncAdapters.Add(nsa);
}

// usage
void SetupFooBar()
{
    var tablename = "FooBar";
    var filter = "FooId IN (SELECT BarId FROM dbo.GetAllFooBars(@filterid))";
    PrepareFilter(tablename, filter);
}
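At sync time the client then binds the real value for that parameter. If I remember the Sync Services for ADO.NET API correctly, it looks roughly like this (verify SyncParameter against your version; currentTenantId stands for whatever Guid you derive from the requesting user's credentials):

var agent = new SyncAgent();
// ... assign agent.LocalProvider and agent.RemoteProvider, where the server-side
// provider is the class containing PrepareFilter/SetupFooBar above ...
agent.Configuration.SyncParameters.Add(
    new SyncParameter("@filterid", currentTenantId));
agent.Synchronize();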
