With Microsoft Common Data Service, is incremental refresh possible? - salesforce

We have tables in Salesforce which we'd like to make available to other applications using Microsoft Common Data Service. Moreover, we'd like to keep CDS more or less up to date, even including data that was created or updated five minutes ago.
However, some of those tables have hundreds of thousands, or even millions, of records. So, refreshing all the data is inefficient and impractical.
In theory, when CDS queries for data, it should be able to determine the timestamp of its most recent data and include that timestamp in the query for new data.
But I'm not clear how to make that part of the query that gets used in the refresh operation.
Is this possible?
What do I need to do?

How are you selecting the data that you are retrieving? If you have the option to use a SOQL query, you can use the fields CreatedDate and LastModifiedDate as part of your queries. For example:
SELECT Id, Name
FROM Account
WHERE CreatedDate > 2020-10-25T23:01:01Z
or
SELECT Id, Name
FROM Account
WHERE LastModifiedDate > 2020-10-25T23:01:01Z

There are some options, but I have no idea what (if anything) is implemented in your connector. Read up a bit and maybe you'll decide to do something custom. There's also a semi-decent SF connector in Azure Data Factory v2 if that helps.
Almost every Salesforce table contains CreatedDate, LastModifiedDate and SystemModstamp columns, but we don't have raw access to the underlying database. There's no ODBC driver (or if there is, it's lying: pretending SF objects are "linked tables" and hiding the API implementation magic in stored procedures). I'm not affiliated and have never used it personally, but I've heard good things about DBAmp. It's been a few years though; alternatives might have popped up. Go give that ADF connector a go.
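Since those audit columns are queryable through the API even without raw database access, an incremental pull can key on SystemModstamp; here's a minimal SOQL sketch (the object name and timestamp are placeholders, not from the original question):
SELECT Id, Name, SystemModstamp
FROM Account
WHERE SystemModstamp > 2020-10-25T23:01:01Z
ORDER BY SystemModstamp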
I mean, you could even rephrase the problem a bit: look around for a backup tool that does incremental backups and works with SQL Server? Kill two birds with one stone.
So...
The other answer gives you the query route, which is OK but a bit impractical if it's 100+ tables.
There's the Data Replication API to get the Ids (primary keys) of recently updated/deleted records. The SOAP API has a getUpdated call, and there's a similar resource in the REST API. You'd still have to call it per object, though, and you'd only know which records were modified; you'd still need to query all the columns you want (there's a "retrieve" call similar to SELECT *).
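For example, once getUpdated hands back a batch of Ids, the follow-up query might look like this (the Ids and columns below are placeholders; list whatever fields you actually need):
SELECT Id, Name, BillingCity, LastModifiedDate
FROM Account
WHERE Id IN ('001xx000003DGb1AAG', '001xx000003DGb2AAG')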
Perhaps you need to change direction. SF can raise events when data changes, and subscribing apps have between 1 and 3 days to consume them. It uses the CometD protocol, and chances are there's a library for that in the .NET world. There are a few types of events (you can raise custom events, or raise them only when certain conditions are met, from SF config or code; and the other way around, the subscribing app can specify a query it's interested in and get notified whenever that query's results would change). But if you just want everything, search for "Change Data Capture". It could be a nice near-real-time solution.

Related

Standard practice/API for sharing database data without giving direct database access

We would like to give some of our customers the option to read data from our central database. The data is live and new records are being added every few seconds. Our database is MySQL running on Amazon RDS.
I was wondering what is the common practice for doing so.
One option would be to grant them SELECT rights on specific tables, but in that case they would be able to access other customers' data as well.
I have tried searching for database, interface, and API keywords, among others, but I couldn't find a good answer.
Thanks!
Use REST to expose specific tables for CRUD operations. You can control access on it too.

Sitecore Analytics truncation

We have implemented Sitecore for our future site and are going live soon with our corporate site. We did some testing on this site to make sure everything was working right, and some data was written to the Analytics database as a result.
Now we are looking to get rid of this data and start fresh. We want to go the route of truncating the data out of the tables. Is there a way to do this? I know this could be done with built-in functionality in the OMS but not the DMS. Also, what tables would be safe to truncate?
Thanks
Personally, I would start fresh by attaching an empty DMS database. You will need to redeploy any goals, page events, campaigns, etc., but it's a much safer option than truncating the tables.
Another thing to consider is that most (if not all) of the reports in the DMS are set up to accept a start and end date. Simply running your reports starting from the launch date may be all you need.
If you decide to truncate the tables, I would focus on any tables that have a Foreign Key relationship to the Visits table (the DMS ships with a Database Diagram that's really handy for stuff like this). Going in order that would be the PageEvents, Pages, and Profiles tables. Then it should be safe to clear out the Visits table.
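If you do go down that path, a sketch of the clean-out order described above might look like the following. The table names are as listed above, but verify them against your own DMS database (and take a backup first); DELETE is used rather than TRUNCATE because TRUNCATE fails on tables referenced by foreign keys:
-- Clear child tables first, then Visits; check the FK relationships in the DMS Database Diagram.
DELETE FROM PageEvents;
DELETE FROM Pages;
DELETE FROM Profiles;
DELETE FROM Visits;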

Publish SQL Server data to clients from saas website with multi-tenant database?

We maintain a Software as a Service (SaaS) web application that sits on top of a multi-tenant SQL Server database. There are about 200 tables in the system, the biggest with just over 100 columns in it; at last look the database was about 10 gigabytes in size. We have about 25 client companies using the application every day, entering their data and running reports.
The single-instance architecture is working very effectively for us - we're able to design and develop new features that are released to all clients every month. Each client's experience can be configured through the use of feature toggles, data dictionary customization, CSS skinning, etc.
Our typical client is a corporation with several branches, one head office, and sometimes its own in-house IT software development team.
The problem we're facing now is that a few of the clients are undertaking their own internal projects to develop reporting, data warehousing and dashboards based on the data presently stored in our multi-tenant database. We see it as likely that the number and sophistication of these projects will increase over time and we want to cater for it effectively.
At present, we have a "lite" solution whereby we expose a secured XML webservice that clients can call to get a full download of their records from a table. They specify the table, and we map that to a purpose-built stored proc that returns a fixed number of columns. Currently clients are pulling about 20 tables overnight into a local SQL database that they manage. Some clients have tens of thousands of records in a few of these tables.
This "lite" approach has several drawbacks:
1) Each client needs to develop and maintain their own data-pull mechanism, deal with all the logging, error handling etc.
2) Our database schema is constantly expanding and changing. The stored procs they are calling have a fixed number of columns, but occasionally when we expand an existing column (e.g. turn a varchar(50) into a varchar(100)) their pull will fail because it suddenly exceeds the column size in their local database.
3) We are starting to amass hundreds of different stored procs built for each client and their specific download expectations, which is a management hassle.
4) We are struggling to keep up with client requests for more data. We provide a "shell" schema (i.e. a copy of our database with no data in it) and ask them to select the tables they need to pull. They invariably say "all of them" which compounds the changing schema problem and is a heavy drain on our resources.
Sorry for the long winded question, but what I'm looking for is an approach to this problem that other teams have had success with. We want to securely expose all their data to them in a way they can most easily use it, but without getting caught in a constant process of negotiating data exchanges and cleaning up after schema changes.
What's worked for you?
Thanks,
Michael
I've worked for a SaaS company that went through a similar exercise some years back, and web services are probably the best solution here. Incidentally, one of your "drawbacks" is actually a benefit: customers should be encouraged to do their own data pulls, because each customer's needs around timing and amount of data will be different.
Now, instead of a "lite" solution, you should look at building out a WSDL with separate CRUD calls for each table and good filtering capabilities. Also, make sure you have change times for records on each table. This way a customer can hit each table and immediately pull only the records that have been updated since the last time they pulled.
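As a rough illustration of the change-time idea (the names here are made up for the example, not taken from the original schema), each table's pull call could filter on a last-modified column plus the client's tenant id:
-- Hypothetical incremental pull: one tenant's orders changed since the caller's last sync.
CREATE PROCEDURE dbo.GetChangedOrders
    @ClientId INT,
    @Since    DATETIME
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderId, OrderDate, TotalAmount, LastModifiedUtc
    FROM dbo.Orders
    WHERE ClientId = @ClientId
      AND LastModifiedUtc > @Since
    ORDER BY LastModifiedUtc;
END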
Will it be easy? Not a chance, but if you want scalability, it's the only route to go.
Good luck.

Heavy-data application: Silverlight vs MVC vs ASP.NET

We've been using a custom-made ERP application for more than 4 years. There were some bad design decisions early on which are affecting the application's performance now.
The application can be classified as a data-heavy application, i.e. a business application.
The biggest design issue is that we fetch all the data for a particular business unit, bind it to controls or keep it in memory (ArrayLists, generic Lists), and give the user the flexibility of navigation and full application responsiveness.
In the first 2 years of use we didn't notice the problem, but by the end of that period we were getting more and more complaints about the application's performance.
We were, and still are, using ADO.NET with stored procedures generated for CRUD operations (Get, List, Delete, Insert, Update SPs). So if we are going to separate each year's data, we have to redesign those auto-generated SPs and make the corresponding changes in the data access layer, and since there is no real separation between layers, we cannot easily make that modification. The first quick fix was some database enhancements to cope with this huge data traffic, but the problem has been back for about 6 months now.
So we think this is the time to correct our mistakes (a very late decision), but how should we do it? We ended up with these options:
Use a web application rather than a Windows application, so we can use our server's full power and minimize connection traffic
Redesign the application so that it takes care of year endings, doesn't fetch all the data, and uses a better ORM rather than old-fashioned ADO.NET, but stick with Windows Forms and the same application
Given the time frame (somewhat limited) and the team (2 members, both with good experience in Windows and web development), I'm not able to make the decision right now; I'm afraid the Windows application modifications could take longer than expected.
Can you advise me?
I'm using C# 4.0 and ASP.NET Web Forms (MVC newbie).
If the database itself is quick, as you've indicated in your response to my comment, then, as you have suggested, you must reduce the amount of data populating the DataSet.
Although ADO.NET uses a "disconnected" model, the developer still has control over the select-query that is used to populate each table/relation in the DataSet. A hypothetical example: suppose you had 5000 customers in your CUSTOMERS table in the database. You might populate the CUSTOMERS table in your DataSet like this:
select * from CUSTOMERS order by customername
or like this:
select id, customername from CUSTOMERS order by customername
The order-headers table could be populated like this:
create proc getOrderHeaders4Customer
    @customerid int
as
select * from ORDERHEADER where customerid = @customerid
That is, you would need to have a current customer in the GUI before you'd populate the OrderHeaders relation, and you would fetch the order-detail data only when the user clicks on a particular order-header in the GUI. The point is that ADO.NET can be used to do this; you don't have to abandon everything in the app in order to be more parsimonious with the disconnected dataset.
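A hypothetical companion proc for that detail fetch (table and column names invented for the example) would follow the same shape:
create proc getOrderDetails4Order
    @orderheaderid int
as
-- fetch only the detail rows for the order-header the user clicked on
select * from ORDERDETAIL where orderheaderid = @orderheaderid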
How much data would be fetched by the client, and more importantly, when the data are fetched by the client, is completely under developer control regardless of whether it's Silverlight, WinForms, or ASP.NET. Now, if your ERP application was not efficient, retrofitting efficiency may be difficult. But you don't have to abandon ADO.NET to achieve efficiency, and simply choosing another data-layer technology will not itself bring you efficiency.

Is it a good idea to build an XML API based on FOR XML in SQL Server 2005?

One of my coworkers is going to build an API directly from the database. He wants to return XML for queries. Is that really a good idea?
We have to build an API for our partners and we are looking for a good architectural approach. We have a few million products, reviews, and so on.
Some partners will take 1000 products; others would like to copy almost the whole database.
We have to restrict access to some fields: for example, one partner will see productDescription, another only id and productName.
Sometimes we would like to return only info about categories in the XML response; sometimes we would like to include 100 products for each category returned.
One of the programmers is pushing a solution based almost entirely on MSSQL 2005 FOR XML AUTO. He wants to build the query in the application, send it to the server, and then return the XML to the partner, without any caching in the application.
Is that really a good idea?
I have used this technique for a particular web application. I have mixed feelings about this approach.
One pro is that this is really convenient for simple requirements. Another pro is that it is really easy to translate a database schema change in a change in the XML format, since everything is in one place.
I found there are cons too. When your target XML gets more complex and has more nested structures, this solution can rapidly get out of hand. Consider for example this (taken from http://msdn.microsoft.com/en-us/library/ms345137(SQL.90).aspx#forxml2k5_topic5):
SELECT CustomerID as "CustomerID",
(SELECT OrderID as "OrderID"
FROM Orders "Order"
WHERE "Order".CustomerID = Customer.CustomerID
FOR XML AUTO, TYPE),
(SELECT DISTINCT LastName as "LastName"
FROM Employees Employee
JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID
WHERE Customer.CustomerID = "Order".CustomerID
FOR XML AUTO, TYPE)
FROM Customers Customer
FOR XML AUTO, TYPE
So essentially, you see that you begin writing SQL to mirror the structure of the XML output. And if you think about it, this is a bad thing: you're mixing data retrieval logic with presentation logic. The fact that the presentation is, in this case, a representation in a data exchange format really does not change the fact that you're mixing two different things, making both of them harder.
For example, it is quite possible that the requirements for the exact structure of the XML change over time, whereas the actual associated data requirements remain unchanged. Then you would be rewriting queries even though there is nothing wrong with the actual dataset you are already retrieving. That's a code smell if you ask me.
Another consideration is performance/query tuning. I cannot say I have done much benchmarking of these types of queries, but I would typically avoid correlated subqueries like this whenever I can...and now, just because of this syntactic sugar, I would suddenly throw that overboard because of the convenience of generating XML with no intermediary language? I don't think it's a good idea.
So in short, I would use this technique if I could considerably simplify things. But if I could anticipate that I would need an intermediary language anyway to generate all the XML structures I need, I would choose to not use this technique at all. If you are going to generate XML, do it all in one place, don't put some in the query and some in your program because it will become a nightmare to manage change and maintain it.
It's a bad idea. SQL Server can return data only through the TDS protocol, meaning it can only return a result set (rows). Returning XML means you still return a rowset, but a rowset of SQL data formatted as XML. Ultimately, you still need a TDS protocol client (i.e. SqlClient, OleDB, ODBC, JDBC, etc.). You still need to deal with rows, columns, and T-SQL errors. You are returning a column containing XML data, not an XML response.
So if your client has to be a database client anyway, what advantage does XML give? Other than losing all the result schema metadata in the process...
In addition, consider that stored procedures are an API for everything that accesses the database, including SSIS tasks, maintenance and ETL jobs, other applications deployed on the database, etc. Having everything presented to that layer as XML will be a mess. Two stored procedures from related apps, both in the same instance, exchanging calls via FOR XML and then XPath/XQuery? Why? Keep in mind, your database will outlive every application you have in mind today.
I understand XML as a good format for exchange of data between web services. But not between the database and the client. So the answer is that your partners should see XML, but from your web service layer, not from your database.
In general, I see nothing wrong with exposing information stored in a RDBMS via a "read only" API that enforces access restrictions based on user privilege. Since your application is building the queries you can expose whatever the appropriate names are for your nouns (tables) and attributes (columns) in the user-facing API.
Most DBs can cache queries (and although I don't know SQL server at all I imagine it can do this), and the main advantage of not caching "downstream" in the application is simplicity - the data returned for each API call will be up to date, without any of the complexity of having to figure out when to refresh a "downstream" cache. You can always add caching later - when you're sure that everything works properly and you actually need the performance boost.
As for keeping the queries and the XML in sync: if you're simply dumping data records which are generated from a single table, then there's not much of an issue here. It is true that as you start combining information from multiple tables on the back end it may become harder to generate data records with a single query, but fixing this with intermediate data structures in the web server application is an approach that (typically) scales poorly as the tables grow - you're often better off putting the data you need to expose in a single query/API call into a database "view".
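For instance, a view can flatten the multi-table join once on the database side so each API call still maps to a single query. The object names below are illustrative only, loosely based on the fields mentioned in the question:
-- Hypothetical export view combining product and category data for the API layer.
CREATE VIEW dbo.ProductExport AS
SELECT p.id, p.productName, p.productDescription, c.categoryName
FROM Products p
JOIN Categories c ON c.id = p.categoryId;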
If your XML is designed in such a way that you have to load all the data into memory and calculate statistics (to be displayed in XML element attributes, for example) before rendering begins, then you'll have scalability problems no matter what you do. So try to avoid designing your XML this way from the beginning.
Note also that XML can often be "normalized" (just as DB tables should be), using internal links and "GLOSSARY" elements that factor out repeated information. This reduces the size of XML generated, and by rendering a GLOSSARY element at the end of the XML document (perhaps with information extracted from a subsequent SQL query) you can avoid having to hold lots of data in web server memory while serving the API call.
It depends on who will consume this API. If the API is going to be consumed by a wide range of different languages, then yes, it may make sense to expose returned data in an XML format, as pretty much everything is able to parse XML.
If, on the other hand, the API is predominantly going to be consumed by only one language (e.g. C# / .NET), then you would be much better off writing the API in that language and directly exposing data in a format native to that language; exposing XML-based results would mean needless generation and subsequent parsing of XML.
Personally I would probably opt for a mixed approach: choose a suitable, commonly used language (for the customers of this API) to write the API in, and then on top of that expose an extra XML-based API if it turns out it's needed.
