SQL Server linked servers for data federation

I am familiar with SQL Server as a product and want to use it for data federation across a wide range of data sources that can be accessed through linked servers. My data volumes are rather limited.
However, I am interested in how this approach would compare, performance-wise, to existing data virtualization software at the data federation level.
SQL Server has its own query optimization engine and so on, so will the performance of federated queries over SQL Server linked servers be comparable to that of Denodo, Red Hat, Cisco and the other data virtualization products of this world? Or do they have some other trick up their sleeve?
Kind Regards!

The specialised data virtualization tools do have optimization techniques that you will not find in linked servers and similar technologies. The optimizers of DV systems have been designed from scratch to work with distributed architectures, while the optimizer of SQL Server was designed for a conventional database. The query optimization problem is very different in the two scenarios.
You can check these two blog posts by Denodo to get an idea of the differences (DISCLAIMER: I work for Denodo):
http://www.datavirtualizationblog.com/performance-data-virtualization-logical-data-warehouse-scenarios/
http://www.datavirtualizationblog.com/cost-based-optimization-in-data-virtualization/
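To give a feel for the kind of decision a federation optimizer has to make, here is a minimal sketch in Python. It is not how Denodo or SQL Server work internally. Two in-memory SQLite databases stand in for two remote sources, and all table names, data and function names are made up for illustration. It compares pulling every row to the federation layer with pushing the filter and the aggregation down to the sources, and counts how many rows cross the "network" in each case. Whether a real engine picks the second plan depends on its optimizer and on what each source can execute.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two independent "remote" sources with hypothetical data.
customers = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "DE"), (2, "US"), (3, "DE"), (4, "FR")],
)
orders = make_source(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 50.0), (11, 2, 20.0), (12, 3, 70.0), (13, 3, 5.0), (14, 4, 9.0)],
)


def naive_join(country):
    """Fetch every row from both sources, then filter and join locally.

    Returns (total order amount, number of rows transferred).
    """
    all_customers = customers.execute("SELECT id, country FROM customers").fetchall()
    all_orders = orders.execute("SELECT customer_id, amount FROM orders").fetchall()
    wanted = {cid for cid, ctry in all_customers if ctry == country}
    total = sum(amount for cid, amount in all_orders if cid in wanted)
    return total, len(all_customers) + len(all_orders)


def pushdown_join(country):
    """Push the filter to the first source, then ship only the matching keys
    to the second source and let it aggregate there.

    Returns (total order amount, number of rows transferred).
    """
    ids = [row[0] for row in customers.execute(
        "SELECT id FROM customers WHERE country = ?", (country,))]
    if not ids:
        return 0.0, 0
    placeholders = ",".join("?" * len(ids))
    rows = orders.execute(
        f"SELECT SUM(amount) FROM orders WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()
    return rows[0][0] or 0.0, len(ids) + len(rows)


if __name__ == "__main__":
    print("naive   :", naive_join("DE"))
    print("pushdown:", pushdown_join("DE"))
```

Both functions return the same total, but the second one moves far fewer rows. With limited data volumes the difference may not matter much, and it grows with table size.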

Related

Replication across heterogeneous databases

Is it possible to use SQL Server Replication to replicate data to AND from (bi-directional) Oracle and SQL Server? The schemas are completely different. Real-time replication would be a bonus.
I have already investigated Oracle GoldenGate, which seemed to do the job, although the licence cost is not insignificant!
I wondered if anyone has had any experience in replicating data across different schemas, and what other tools they employed? I realise this is a bit of an open-ended question but any advice and previous experiences would be most useful.
Thanks
Duncan
I recently had to create a solution to periodically import lots of data from different databases (mostly Oracle) into a SQL Server database (a data warehouse). To do so, I used SQL Server Integration Services (SSIS) to create a package that imports, transforms and inserts the data as I wanted (since it came from heterogeneous sources too). This software comes with SQL Server, and versions 2005 and later are really easy to use (graphical programming). In your case, you could trigger the packages you create when needed. I am not sure it is the best solution, since you would need to create an SSIS package for each direction (from Oracle to SQL Server and from SQL Server to Oracle).

CAML queries against a database hosted on a RDBMS other than SQL Server?

We have a SharePoint 2007 site. It is supported by two back-end databases: one hosted on SQL Server, the other on an open-source RDBMS. We issue CAML queries to retrieve data from SQL Server, and ADO.NET queries to retrieve data from the other server. Our architect says we would be better off if we used the same approach (namely, CAML) to get data from both databases.
Is it possible to use CAML queries to retrieve data from any RDBMS other than SQL Server?
If so, please suggest any web resources, docs, anything you find appropriate.
CAML (at least that part used for SPList.GetItems queries) seems to be quite simple, so translating it into valid SQL statements should not be too complex. Which means, you could create a "translator module" and issue your queries against it. For instance, you can follow guidelines published in the article "[Implementing a .Net Framework Data Provider](http://msdn.microsoft.com/en-us/magazine/aa720164(VS.71).aspx)".
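To illustrate the "translator module" idea, here is a rough sketch in Python (not a .NET data provider). It handles only a small subset of CAML: a single `<Where>` element containing `And`/`Or` and the basic comparison operators. The function names, the `tickets` table and the `?` parameter style are assumptions for illustration, and a real translator would also need to cover `OrderBy`, `IsNull`, date values, and the parameter syntax of the target database.

```python
import re
import xml.etree.ElementTree as ET

# CAML comparison elements mapped to SQL operators (subset only).
COMPARISONS = {"Eq": "=", "Neq": "<>", "Gt": ">", "Geq": ">=", "Lt": "<", "Leq": "<="}
# Only plain identifiers are accepted as column or table names.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def translate(node, params):
    """Recursively turn one CAML condition element into a SQL fragment.

    Values are appended to `params` so the SQL stays parameterized.
    """
    if node.tag in ("And", "Or"):
        left, right = (translate(child, params) for child in node)
        return f"({left} {node.tag.upper()} {right})"
    if node.tag in COMPARISONS:
        field = node.find("FieldRef").get("Name")
        if not IDENTIFIER.match(field):
            raise ValueError(f"unsafe column name: {field!r}")
        value = node.find("Value")
        raw = value.text or ""
        params.append(float(raw) if value.get("Type") == "Number" else raw)
        return f"{field} {COMPARISONS[node.tag]} ?"
    raise ValueError(f"unsupported CAML element: {node.tag}")


def caml_where_to_sql(caml, table):
    """Translate a CAML <Where> fragment into (sql, params)."""
    if not IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    root = ET.fromstring(caml)
    if root.tag != "Where" or len(root) != 1:
        raise ValueError("expected a <Where> element with exactly one condition")
    params = []
    condition = translate(root[0], params)
    return f"SELECT * FROM {table} WHERE {condition}", params


EXAMPLE_CAML = """
<Where>
  <And>
    <Eq><FieldRef Name="Status"/><Value Type="Text">Open</Value></Eq>
    <Gt><FieldRef Name="Priority"/><Value Type="Number">2</Value></Gt>
  </And>
</Where>
"""

if __name__ == "__main__":
    sql, params = caml_where_to_sql(EXAMPLE_CAML, "tickets")
    print(sql)
    print(params)
```

Keeping values in a separate parameter list, instead of splicing them into the SQL text, avoids injection problems when the CAML comes from user input.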

How to report across large enterprise systems

Some time ago my company was evaluating different reporting solutions. We settled on MS SSRS, particularly because it's capable of connecting to various types of data stores, including MS SQL, Oracle and SAP NetWeaver BI. It has stood the test of time pretty well; however, we're now under fire from management because SSRS is not capable of mixing data sources in the same data set.
So, I searched long and hard for a reporting solution that can "inner join" data from separate systems, but I came up short. I am about to propose that we custom write reports (ASP.NET) for these cross-system report requests, but I wanted to ping the internet first for any advice.
How do you "inner join" across your massive enterprise systems for reporting purposes?
Take a look at BIRT from Actuate. BIRT comes in open-source and commercial flavors and I believe it allows joins across data sources.
Perhaps a linked server in SQL Server? Watch out for performance issues and I'm not sure if SSRS has limitations against them - I don't think that it does. You can reference a table like this - MYSERVER01.DATABASE1.dbo.TABLE. More info from the source.
For best performance you would pull all your disparate data into a data warehouse, but that is a major undertaking that management may not be willing to fund.
One way to join across data sources in SSRS is to use subreports - see http://msdn.microsoft.com/en-us/library/ms159837.aspx .
Performance is unlikely to be good using this method, however - Sam's suggestion of a linked server is likely to be more practical.
(According to BIRT's documentation it does enable joins between datasets, as DMKing suggested - I haven't tried using this feature yet.)
This limitation was removed in SQL Server 2008 R2. There is a workaround for previous versions.
See my full response here: How can i add a field to dataset from another dataset in ssrs?
I was going to suggest using SSIS as a possibility until I read this
http://blogs.msdn.com/b/jenss/archive/2009/04/23/consuming-ssis-package-data-in-reporting-services-and-using-web-services-in-addition-part-1.aspx
I have used linked servers to combine data from Oracle/SQL Server, not nice but it worked.
Failing that I'd go with subreports.
Failing that, point out to management how expensive SAP/Oracle etc are and they'll soon stop moaning.
:)

What advantages does a Document-based database have over a Relational database?

For example: Microsoft SQL Server vs. CouchDB.
The main benefit for me with CouchDB is that you can access it from pretty much anywhere! What advantages does a document based database have over a relational one?
Where would a document based db be a better choice over a relational?
I wouldn't say "accessing it from anywhere" is an advantage of CouchDB over SQL Server. Both are fully accessible from a variety of clients.
The key differentiating factor is the fundamental concept of how data is persisted: as tables & columns (SQL Server) versus documents (CouchDB). In addition, CouchDB is designed to leverage multiple copies with replication/map-reduce in a highly forgiving fashion. SQL Server can provide the same level of fault tolerance, but true map-reduce is nonexistent in it (although its ability to deal with sets fundamentally mimics those capabilities - see the GROUPING SETS keyword).
You should note this post which really shows that map reduce has its place, but you need to pick the right tool for the job:
http://gigaom.com/2009/04/14/mapreduce-vs-sql-its-not-one-or-the-other/
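To make the comparison between map-reduce and set-based SQL a bit more concrete, here is a small sketch in Python. It is neither CouchDB's view API nor T-SQL: the documents, function names and the in-memory SQLite table are all made up for illustration. It computes the same per-customer total twice, once with explicit map and reduce steps over documents and once with a GROUP BY over rows.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

# Hypothetical documents, as a document store might hold them.
DOCS = [
    {"type": "order", "customer": "alice", "amount": 30},
    {"type": "order", "customer": "bob", "amount": 10},
    {"type": "order", "customer": "alice", "amount": 5},
    {"type": "note", "text": "not an order, ignored by the map step"},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for each relevant document."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["amount"]


def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return reduce(lambda a, b: a + b, values)


def map_reduce(docs):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


def sql_group_by(docs):
    """Load the same orders into a table and aggregate with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(d["customer"], d["amount"]) for d in docs if d.get("type") == "order"],
    )
    return dict(conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))


if __name__ == "__main__":
    print(map_reduce(DOCS))
    print(sql_group_by(DOCS))
```

The results match. The practical difference is in where the work runs and how the data is shaped: the map step tolerates documents with different structures, while the SQL version needs the rows to fit a fixed schema first.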

Business Intelligence: Data mining with MS SQL Server?

I have to study data mining using SQL Server. As far as I know, Business Intelligence in SQL Server supports data mining, but I'm not sure.
Does BI really support data mining?
How can I get started with data mining in SQL Server? I mean resources such as books, blogs, etc.
Thank you all.
I would suggest beginning your research by focusing on SQL Server Analysis Services via the Books Online Documentation, in particular the "Analysis Services Information Worker InfoCenter" as the information presented is mostly for the attention of data analysts.
http://msdn.microsoft.com/en-us/library/ms174577(SQL.90).aspx
Using the reference, you can choose further reading on specific subjects such as "Data Mining Concepts", which will subsequently lead you on to the various Data Mining Algorithms that are available to you.
Then to get hands on with the technology, take a look at the Microsoft Data Mining Tutorial:
http://msdn.microsoft.com/en-us/library/ms167167.aspx
Every database supports data mining. Handling large amounts of structured data is what databases do!
First learn SQL which is useful in many applications and databases.
Then, if you find stuff you can't solve with SQL, you can turn to:
Reporting services: to create fancy reports
Analysis services: to analyse truly gigantic amounts of data (if you're thinking in millions of rows, Analysis server is overkill)
Integration Services: import from non Sql Server sources, automate tasks, combine queries graphically
These are Sql Server specific, and not as useful in every BI scenario.
The Microsoft BI stack is composed of:
SQL Server Reporting Services
SQL Server Analysis Services
SQL Server Integration Services
The combination of the three will allow you to mine data, perform analytics and display the results interactively.
I have bookmarked a few good links in my delicious account:
http://delicious.com/syalam/ssrs
Check this link out - I learned from those screencasts:
http://sqlblog.com/blogs/denis_gobo/archive/2007/12/13/3937.aspx

Resources