Given the following question:
What are two possible classes of intervention that an administrator of a system database
could take to improve the performance of a query?
What is a possible answer?
First, consider that DB optimizations are fairly dependent on the DB type/vendor, but here's a short list:
Fetch the "explain plan" of the query (if your DB lets you do this): this is useful for understanding what the DB does (using indexes, for instance) to retrieve the query results
If your query filters on attributes with small cardinality, create a Bitmap Index on them to speed up retrieval of table rows (see the sketch after this list)
If your query uses joins and your DB supports this, use a Join Index
If your query retrieves results against a portion of a table identified by a single attribute, you could use Sharding
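As an illustration of the first two items, here is a minimal sketch using Oracle-style syntax; the table and column names are made up, and the exact commands for explain plans and bitmap indexes differ by vendor:

-- Ask the DB for its execution plan instead of running the query:
EXPLAIN PLAN FOR
    SELECT * FROM orders WHERE status = 'SHIPPED';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Bitmap index on a low-cardinality filter column:
CREATE BITMAP INDEX idx_orders_status ON orders (status);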
I am investigating the possibility of splitting one DB into multiple. We decided to move some tables into another database, but we have queries with joins on these tables. I found a few solutions for how to achieve that:
Azure SQL Database elastic query
EXTERNAL DATA SOURCE
But I don't know what the difference between them is or which one to choose.
Thanks for any help!
Azure SQL Database Elastic Queries and External data sources are two names for the same concept.
My suggestion is to avoid cross-database queries and avoid splitting one database into multiple databases, because query performance involving external data sources won't be the same no matter what strategy you choose to query those external tables.
If you still want to stick with the plan of splitting the database into multiple databases, then know that cross-database queries show good performance when the remote tables are not big. When remote tables are big, this article shows you how to perform joins remotely using table variables and improve performance. This other article also shows you how to push parameterized operations to remote databases and improve performance.
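For reference, here is a rough sketch of what an elastic query setup looks like once tables have been moved out; every object name, the server address, and the credentials below are placeholders:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL ElasticCred
    WITH IDENTITY = '<remote user>', SECRET = '<remote password>';

-- Points at the database that now holds the moved tables:
CREATE EXTERNAL DATA SOURCE RemoteDb WITH (
    TYPE = RDBMS,
    LOCATION = 'yourserver.database.windows.net',
    DATABASE_NAME = 'OtherDb',
    CREDENTIAL = ElasticCred
);

-- Mirrors the schema of the remote table:
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderID    int NOT NULL,
    CustomerID int NOT NULL
) WITH (DATA_SOURCE = RemoteDb);

-- The external table can then be joined as if it were local:
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
JOIN dbo.RemoteOrders AS o ON o.CustomerID = c.CustomerID;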
If you are thinking of splitting your DB into multiple SQL Server databases on different hosts, then you may prefer a linked server, which gives you the flexibility to join across SQL Server instances.
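A minimal sketch of that approach, assuming both instances are regular SQL Server (not Azure SQL Database) and with placeholder names throughout:

-- 'OTHERSQL' stands in for the network name of the remote instance:
EXEC sp_addlinkedserver @server = N'OTHERSQL', @srvproduct = N'SQL Server';

-- Four-part names then let you join the remote table like a local one:
SELECT l.*
FROM dbo.LocalTable AS l
JOIN [OTHERSQL].[RemoteDb].[dbo].[RemoteTable] AS r
    ON r.ID = l.ID;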
I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be more clear, the job is to migrate data from the Staging environment to Production. I only want to push lookup records into prod if the associated Item IDs are in there. Here's some pseudo-T-SQL to give a clear goal of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
    SELECT P.[Item]
    FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I think I've created a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary hit on performance. I'm looking for a more direct and intuitive design for this seemingly simple data flow.
Doing this in a data flow, you'd have your Source query, sans filter, fed into a Lookup Component that performs the subquery.
The challenge is that SSIS is likely running on-premises, which means you are going to pull all of your data out of the Azure staging instance to the server running SSIS and push it back to the Azure production instance.
That's a lot of network activity and, as I read the Azure pricing guide, I guess as long as you have the appropriate DTUs, you'd be fine. Back in the day, you were charged for reads and not writes, so the idiom was to just push all your data to the target server and then do the comparison there, much as ElendaDBA mentions. The only suggestion I'd make on the implementation is to avoid temporary tables or ad-hoc creation/destruction of them. Just implement it as a physical table and truncate and reload it prior to transmission to production.
You could create a temp table on the staging server to copy production data into. Then you could create a query joining those two tables. After the SSIS package runs, you could delete the temp table on the staging server.
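A rough sketch of that idea, with hypothetical table names; the work table could equally be a permanent table that is truncated and reloaded, as suggested above:

-- On the staging server: a work table to hold the production Item IDs.
CREATE TABLE dbo.Prod_ItemIDs (ItemID int NOT NULL PRIMARY KEY);

-- Populate dbo.Prod_ItemIDs from production (e.g. via an SSIS data flow),
-- then filter the lookup rows with a plain join:
SELECT L.*
FROM [SourceDB].[dbo].[Lookup] AS L
JOIN dbo.Prod_ItemIDs AS P ON P.ItemID = L.[ID];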
Is there any way to trace the most used columns in every table on the SQL Server?
The intention is to get a query that suggests columns to be indexed, based on how the columns are used by the queries that run on the server.
Thanks In Advance
First of all, there is no direct relationship between the most used columns and the indexes. A column can be used often, but if it has poor selectivity the index will be useless.
You can use the Database Engine Tuning Advisor to get an optimal index set.
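If you want something query-based instead, SQL Server also records index suggestions of its own; here is a rough sketch that reads the missing-index DMVs (these statistics are reset when the instance restarts, so treat the output as hints rather than a tuning plan):

SELECT TOP (20)
    d.statement          AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;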
I'm writing a module that translates one SQL query into another query. When users send SQL queries to the DB engine, the DB engine should first forward these queries to my module before processing the SQL syntax.
How can I integrate my module into the SQL Server database engine?
You can redirect queries for certain data to different tables using a partitioned view:
http://technet.microsoft.com/en-US/library/ms188299(v=SQL.105).aspx
In a nutshell, you tell the server some rules as to which values reside in which tables (usually based on primary or foreign key ranges, for example). When you query using the partition field, the database can direct your query to the correct remote table. But you can still run queries over all the tables just as if they were held locally (only more slowly).
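Here is a rough local sketch of a partitioned view; the table, column, and view names are invented, and in the distributed case described in the article the member tables would sit on linked servers instead:

CREATE TABLE dbo.Orders2022 (
    OrderID   int NOT NULL,
    OrderYear int NOT NULL CHECK (OrderYear = 2022),
    CONSTRAINT PK_Orders2022 PRIMARY KEY (OrderID, OrderYear)
);
CREATE TABLE dbo.Orders2023 (
    OrderID   int NOT NULL,
    OrderYear int NOT NULL CHECK (OrderYear = 2023),
    CONSTRAINT PK_Orders2023 PRIMARY KEY (OrderID, OrderYear)
);
GO
CREATE VIEW dbo.Orders AS
    SELECT OrderID, OrderYear FROM dbo.Orders2022
    UNION ALL
    SELECT OrderID, OrderYear FROM dbo.Orders2023;
GO
-- Filtering on the partitioning column lets the optimizer touch only the
-- relevant member table:
SELECT * FROM dbo.Orders WHERE OrderYear = 2023;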
We have a SQL 2005/2008 database that has a table with a computed column. We're using the computed column as a discriminator in NHibernate so having it in the database is proving to be very useful.
In order to gain the benefits of faster integration tests, I'd like to be able to run our integration tests against an in-memory database such as SQLite or SQL CE. But I don't think either of those support the computed column.
Are there any other solutions to my problem? I have complete access to the database and can modify it if there's a better solution available. I've seen this post that suggests using a view instead of a computed column; is this the best alternative?
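For context, a minimal sketch of the two shapes being compared; the names and the discriminator expression are invented for illustration:

-- What the database has today: a computed discriminator column.
CREATE TABLE dbo.Account (
    AccountID int NOT NULL PRIMARY KEY,
    Balance   decimal(18, 2) NOT NULL,
    AccountType AS (CASE WHEN Balance < 0 THEN 'Overdrawn' ELSE 'Standard' END)
);
GO
-- The alternative from the linked post: drop the computed column from the
-- base table and map NHibernate to a view that supplies the discriminator.
CREATE VIEW dbo.AccountWithType AS
    SELECT AccountID, Balance,
           CASE WHEN Balance < 0 THEN 'Overdrawn' ELSE 'Standard' END AS AccountType
    FROM dbo.Account;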
What I did was add the computed column to the DataTable when loading the table from SqlCe. I stored the definition of the computed DataColumn in a "configuration" table stored in the database. I was able to do complex calculations that depended on a "chain" of tables, where each table performed a simpler piece of a more complex function. (The last table in the chain contained the results.) I used SqlCe because one of the five tables contained 15 million rows, which is too much data for the in-memory data sets of ADO.NET. (I had a requirement of using local, client-based calculations before posting to the server.)