Can SQL Server Replication include the source dbid in the replicated data? - sql-server

Let's say I have DatabaseA with TableA, which has these fields: Id, Name.
In another database, DatabaseB, I have TableA which has these fields: DatabaseId, Id, Name.
Is it possible to set up a replication publication that will send:
DatabaseA.dbid, DatabaseA.TableA.Id, DatabaseA.TableA.Name
to DatabaseB.TableA?
Edit:
The reason I'm asking is that I need to combine multiple databases (with identical schemas) into a single database, with as little latency as possible. Replication seemed like a good place to start (need to replicate data from one place to another), but I'm just in the brainstorming phase. I would definitely be open to suggestions on how to accomplish this without using replication.

There might be an easier way to do it, but the first thing I thought of is wrapping TableA in an indexed view on the source database and then replicating the view as a table (i.e., type = "indexed view logbased"). I don't think this would work with merge replication, though.
So, that would roughly be like:
CREATE VIEW dbo.TableA_with_dbid WITH SCHEMABINDING AS
SELECT CAST(7 AS int) AS DatabaseId, Id, Name -- hardcode your dbid; DB_ID() is nondeterministic, so it can't be used in an indexed view
FROM dbo.TableA
GO
CREATE UNIQUE CLUSTERED INDEX IX_TableA_with_dbid ON dbo.TableA_with_dbid (Id) -- or whatever your PK is
GO
EXEC sp_addarticle ...,
    @source_object = 'TableA_with_dbid',
    @destination_table = 'TableA',
    @type = 'indexed view logbased',
    ...
Big caveat: indexed views have a lot of requirements that may not be appropriate for your application. For example, certain options have to be set any time you update the base table.
(In response to the edit in your question...) This won't work for combining multiple sources into one table. AFAIK, an object in a subscribing database can only come from one published article. And you can't do an indexed view on the subscribing side since UNION is not allowed in an indexed view. (The docs don't explicitly state UNION ALL is disallowed, but it wouldn't surprise me. You might try it just in case.) But it still does answer your explicit question: the dbid would be in the replicated table.

Are you aggregating these events in one place from multiple sources? Replication only comes from one source - it's one-to-one, so the source ID doesn't seem like it would make much sense.
If you're aggregating data from multiple sources, maybe linked servers and triggers are a better choice, and if that's the case, then you could absolutely include any information about the source that you want.
If you can clarify your question to describe the purpose, it would help us find the best solution.
UPDATED FROM NEW DETAIL IN QUESTION:
Does this solution sound like it might be what you need?
Set up AFTER triggers on the source databases that send any changed rows to the central repository database, into some kind of holding table. These rows can include additional columns, like "Source" and "Change type" (for insert, delete, etc.).
Some central process watches the table and processes new rows (or runs periodically - once a minute, maybe), incorporating them into the central database.
You could adjust how frequently the check/merge process runs on the server based on your needs (even running it constantly to handle new rows as they appear, perhaps even with an AFTER trigger on that table as well).
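A minimal sketch of that trigger-plus-holding-table idea, assuming a linked server named CentralServer pointing at the repository and a holding table Staging.dbo.TableA_Changes (both names invented for illustration):

-- Sketch only: CentralServer and Staging.dbo.TableA_Changes are assumed names
CREATE TRIGGER trg_TableA_ToCentral
ON dbo.TableA
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- inserted/updated rows
    INSERT INTO CentralServer.Staging.dbo.TableA_Changes (Source, ChangeType, Id, Name)
    SELECT DB_NAME(), 'I/U', i.Id, i.Name
    FROM inserted AS i;

    -- deleted rows (in deleted but not in inserted)
    INSERT INTO CentralServer.Staging.dbo.TableA_Changes (Source, ChangeType, Id, Name)
    SELECT DB_NAME(), 'D', d.Id, d.Name
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.Id = d.Id);
END;

Note that writing across a linked server inside a trigger ties each source transaction to the remote server being reachable, which is part of the latency/availability trade-off here.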

Related

Update table as same table in another database changes

I have two databases in one instance of SQL Server and they have the same structure.
Now I want to write triggers for some of the tables in the databases so they stay synced with each other whenever records are inserted, updated, or deleted.
Something like the one below would be one of the triggers:
CREATE TRIGGER AdminMessage_Insert
ON AdminMessage
AFTER INSERT
AS
INSERT INTO SecondDb.dbo.AdminMessage
    (ID, DeptKey, AdminKey, ReceiverKey, MessageText, IsActive)
SELECT i.ID, i.DeptKey, i.AdminKey, i.ReceiverKey, i.MessageText, i.IsActive
FROM INSERTED i
So my problem is that there are many tables, and writing three triggers for each of them doesn't seem like the best solution.
Can you give me a better, smaller approach?
UPDATE
I found some options like CDC, Change Tracking, SQL Audit, and of course replication (snapshot replication), and read about them.
As I understand it, the best solution for me is CDC or Audit.
With both of them I must set up each table one by one, which takes a lot of my time.
Can I capture all table changes with less work and with one SQL instance? (Replication is good, but it needs more than one instance.)
What's your idea?
While Change Data Capture (CDC) wasn't designed to be used as a sort of replication, we use it this way at my company because it works for us. You enable CDC for the specific tables you need, so you only get the net changes. The records are then stored in change tables that CDC creates in the same database. From there you can push the changes to the other database.
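For reference, enabling CDC looks roughly like this (schema and table names taken from the question; the net-changes option assumes the table has a primary key):

-- Run in the source database; the capture job requires SQL Server Agent
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'AdminMessage',
    @role_name = NULL,                -- no gating role
    @supports_net_changes = 1;        -- needs a primary key on the table

-- Changes accumulate in cdc.dbo_AdminMessage_CT and can be read with the
-- generated cdc.fn_cdc_get_net_changes_dbo_AdminMessage(...) function.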
Because it seems like you are looking for a solution that is only replicating the data one way, can I assume that the second source is read-only? If so, and because you said both databases are on the same instance, you can use synonyms in your secondary database.
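A synonym sketch, assuming the primary database is called FirstDb (a name invented to match the question's SecondDb):

-- Run in the secondary database; reads now resolve to the primary's table
CREATE SYNONYM dbo.AdminMessage FOR FirstDb.dbo.AdminMessage;

SELECT ID, MessageText FROM dbo.AdminMessage; -- actually reads FirstDb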

How Do I find what is populating a table?

I constantly run into this problem. I am working in a data warehouse and I cannot find out what is populating a table. Typically the table is being populated on a daily basis, either from other tables in the warehouse or from an Oracle database. I have tried the query below and can confirm the updates, but I cannot see what is doing them. I searched the known SSIS packages and stored procedures with similar names, and the SQL jobs, but I can find nothing.
SELECT OBJECT_NAME(object_id) AS TableName, last_user_update, *
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('Warehouse')
  AND object_id = OBJECT_ID('PAYMENTS_DAILY')
I only have the most basic SQL Server tools available so no fancy search tools :(
There is no way to tell, after data has been inserted into a table, where the data came from without having some sort of logging.
SSIS has logging; triggers on the tables, change data capture, and audit columns are among the many ways to do this.
Frequently, if you know when the row was added, that can help you figure out what process is adding it. Add a new "InsertedDatetime" column to your warehouse table and give it a default value of getdate(). If you know that the rows always come in at 11:15 AM, you can use that to narrow your search.
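That column addition might look like this (constraint name invented; existing rows are backfilled with the default):

ALTER TABLE dbo.PAYMENTS_DAILY
    ADD InsertedDatetime datetime NOT NULL
        CONSTRAINT DF_PAYMENTS_DAILY_Inserted DEFAULT (GETDATE());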
That will probably be enough information, but if that doesn't help you track down the process, then you can add additional columns that contain everything from a source IP address to a calling object name.
As a last resort, you could rename your table and create a view named the same and then use an Instead Of Insert trigger on it that just holds open the connection so you can examine the currently executing processes to figure out where it's coming from.
I bet you can figure it out from the time alone though.
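If you do end up at that last resort, here is a rough sketch. Instead of holding the connection open, this variant simply records who performed the insert; all object names are hypothetical, and it assumes the base table has no IDENTITY column so the pass-through insert works:

EXEC sp_rename 'dbo.PAYMENTS_DAILY', 'PAYMENTS_DAILY_BASE';
GO
CREATE VIEW dbo.PAYMENTS_DAILY AS
    SELECT * FROM dbo.PAYMENTS_DAILY_BASE;
GO
CREATE TABLE dbo.InsertAudit (
    WhenDone datetime NOT NULL DEFAULT (GETDATE()),
    LoginName sysname NOT NULL DEFAULT (SUSER_SNAME()),
    HostName nvarchar(128) NULL,
    AppName nvarchar(128) NULL);
GO
CREATE TRIGGER trg_PaymentsDaily_Audit
ON dbo.PAYMENTS_DAILY
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- record who/what is doing the insert
    INSERT INTO dbo.InsertAudit (HostName, AppName)
    VALUES (HOST_NAME(), APP_NAME());
    -- then pass the rows through unchanged
    INSERT INTO dbo.PAYMENTS_DAILY_BASE SELECT * FROM inserted;
END;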

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive? (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is classically the application server's task. If you want to build your own tracking architecture, do it in your own layer.
In any case you will need an application server there: you are not going to update the tracking field in the same transaction as the SELECT, are you? What about rollbacks? So you need some manager that first runs the SELECT and then writes the tracking information. And what is the point of saving tracking information together with the entity data by sending it back to the DB? Save it to a file on the application server.
You could update the column in the table as you suggested, but if it were me I'd log the event to another table: the id of the record, a datetime, a userid (maybe ip address, browser version, etc.), just about anything else I could capture that was even possibly relevant. (For example, six months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage happens, etc.)
This type of information can be useful for things you've never even thought of down the road, and if the table starts to grow large you can always roll it up and prune it to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available down the road, and it will be impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action which wouldn't slow down the users experience at all.
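A sketch of that stored-procedure approach, with all table, column, and procedure names invented:

CREATE TABLE dbo.RecordReadLog (
    RecordId int NOT NULL,
    ReadAt datetime NOT NULL DEFAULT (GETDATE()),
    ReadBy sysname NOT NULL DEFAULT (SUSER_SNAME()));
GO
CREATE PROCEDURE dbo.GetRecord @Id int
AS
BEGIN
    SET NOCOUNT ON;
    -- one round trip: log the read, then return the row
    INSERT INTO dbo.RecordReadLog (RecordId) VALUES (@Id);
    SELECT Id, Stuff1, Stuff2 FROM dbo.MyTable WHERE Id = @Id;
END;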
Adding a new column to track SELECTs is not good practice, because it may affect database performance, and database performance is one of the major concerns in database server administration.
So here you can use a very good database feature called auditing; it is very easy and puts less stress on the database.
Search for database auditing of SELECT statements for more information.
Use another table as a key/value pair with two columns (e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query into the counting table. E.g., as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1 = 'somevalue';

INSERT INTO countTable (id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1 = 'somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY UPDATE times = times + 1;
The ON DUPLICATE KEY syntax is right off the top of my head in MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead.
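A rough MERGE equivalent of the MySQL upsert above, using the same invented names:

MERGE dbo.countTable AS t
USING (SELECT id FROM dbo.myTable WHERE stuff1 = 'somevalue') AS s
    ON t.id_selected = s.id
WHEN MATCHED THEN
    UPDATE SET t.times = t.times + 1
WHEN NOT MATCHED THEN
    INSERT (id_selected, times) VALUES (s.id, 1);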

Doing large updates against indexed view

We have an indexed view that runs across three large tables. Two of these tables (A & B) are constantly getting updated with user transactions, and the other table (C) contains product info that needs to be updated once a week. This product table contains over 6 million records.
We need this view across these three tables for our core business process, and unfortunately we cannot change this aspect. We even had a SQL Server MVP come in to help test under load to make sure we have the most efficient configuration. There is one column in the product table that gets utilized in the view and has to be updated each week.
The problem we are now encountering is that as volume is increasing on our transactions against tables A & B, the update to Table C is causing deadlocks.
I have tried several different methods to no avail:
1) I was hoping that we could change the view so that table C could be a dirty read "WITH (NOLOCK)", but apparently that functionality is not available with indexed views.
2) I thought about updating a new column in Table C and then just renaming it when the process is done, but you cannot do that due to the dependency in the view.
3) I also entertained the idea of writing this value to a temporary product table and then running an ALTER statement against the view to have it point to my new table. However, when I did that, the indexes on my view were dropped and it took quite a bit of time to recreate them.
4) We tried to do the weekly update in small chunks (as small as 100 records at a time) but we still run into deadlocks.
Questions:
a) We are using SQL Server 2005. Does SQL Server 2008 have new functionality for indexed views that would help us? Is there now a way to do dirty reads with an indexed view?
b) Is there a better approach to altering an existing view to point to a new table?
Thanks!
The issue you're experiencing is that adding the indexed view over the three tables is causing lock contention. There is a really good post about the issue here: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/06/02/be-ready-to-drop-your-indexed-view.aspx
Partitioning the table might provide some relief, although I don't know if the partitioning will circumvent the lock issue. You will have to upgrade to 2008 if you want to investigate this option however - as you need to use partition-aligned indexed views. 2005 will require you to drop the view before you swap in/out any partitions.
More information about partition-aligned indexed views: http://msdn.microsoft.com/en-us/library/dd171921.aspx
Have you considered making C a partitioned table and swapping in/out a partition as your price update mechanism? I'm not sure how that would work with an indexed view - I would think the index needs to be rebuilt at that point. I think this is probably the same situation you are seeing with the ALTER TABLE, actually.
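For what it's worth, the swap itself would look something like this on 2008, assuming TableC and a staging copy are partitioned identically and the indexed view is partition-aligned (table names and the partition number are invented):

-- swap the stale partition out, then the freshly loaded one in;
-- both statements are metadata-only operations
ALTER TABLE dbo.TableC SWITCH PARTITION 3 TO dbo.TableC_Old PARTITION 3;
ALTER TABLE dbo.TableC_Staging SWITCH PARTITION 3 TO dbo.TableC PARTITION 3;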
Is the indexed view really necessary? That is, could appropriate indexes on the three underlying tables perform just as well when a normal view is used? Remember that the indexed view may have to be updated on key changes to any of the three tables, while an index on a single table only has to be updated if a key changes or data moves in just that table. Typically indexed views are indexed on different columns than the base tables, because they give a different kind of section across the data than is available in the underlying tables - does that description really apply here?
How long does the pricing update take? This would appear to be the core of your problem, but it's hard to say without more information.
You can try this to avoid locking:
SELECT a, b, c FROM indexedview AS v WITH (NOEXPAND, NOLOCK) WHERE ...

What is a maintainable way to store large text fields without sacrificing performance?

I have been dancing around this issue for a while, but it keeps coming up. We have a system where many of our tables start with a description that is originally stored as NVARCHAR(150), and then we get a ticket asking to expand the field size to 250, then 1000, etc.
This cycle is repeated on every "note" and/or "description" field we add to most tables. Of course my concern is performance and breaking the 8k limit of the page. However, my other concern is making the system less maintainable by breaking these fields out of EVERY table in the system into a lazy-loaded reference.
So here I am, faced with the same two options that have been staring me in the face (others are welcome); please lend me your opinions:
1. Change all my notes and/or descriptions to NVARCHAR(MAX) and make sure we exclude these fields from all listings. Basically, never do a SELECT * FROM [TableName] unless it is only retrieving one record.
2. Remove all notes and/or description fields and replace them with a foreign key reference to a [Notes] table:
CREATE TABLE [dbo].[Notes] (
    [NoteId] [int] NOT NULL,
    [NoteText] [NVARCHAR](MAX) NOT NULL )
Obviously I would prefer to use option 1, because option 2 would change so much in our system. However, if option 2 is really the only good way to proceed, then at least I can say these changes are necessary and I have done the homework.
UPDATE:
I ran several tests on a sample database with 100,000 records in it. What I found is that, because of clustered index scans, the IO required for option 1 is roughly twice that of option 2. If I select a large number of records (1000 or more), option 1 is twice as slow even if I do not include the large text field in the select. As I request fewer rows, the lines blur more. I have a web app where page sizes of 50 or so are the norm, so option 1 will work, but I will be converting all instances to option 2 in the (very) near future for scalability.
Option 2 is better for several reasons:
1. When querying your tables, the large text fields fill up pages quickly, forcing the database to scan more pages to retrieve data. This is especially taxing when you don't actually need to return the text data.
2. As you mentioned, it gives you a clean break to change the data type in one swoop. Microsoft has deprecated TEXT in SQL Server 2008, so you should stick with VARCHAR/VARBINARY.
3. Separate filegroups. Having all your text data in a slower, cheaper storage location might be something you decide to pursue in the future. If not, no harm, no foul.
While Option 1 is easier for now, Option 2 will give you more flexibility in the long-term. My suggestion would be to implement a simple proof-of-concept with the "notes" information separated from the main table and perform some of your queries on both examples. Compare the execution plans, client statistics and logical I/O reads (SET STATISTICS IO ON) for some of your queries against these tables.
A quick note to those suggesting the use of TEXT/NTEXT, from MSDN:
"This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use varchar(max), nvarchar(max) and varbinary(max) data types instead. For more information, see Using Large-Value Data Types."
I'd go with Option 2.
You can create a view that joins the two tables to make the transition easier on everyone, and then go through a clean-up process that removes the view and uses the single table wherever possible.
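Such a transition view could look something like this (view and column names invented, assuming the main table carries the NoteId foreign key from option 2):

CREATE VIEW dbo.MyTableWithNotes
AS
SELECT t.Id, t.Name, n.NoteText
FROM dbo.MyTable AS t
LEFT JOIN dbo.Notes AS n ON n.NoteId = t.NoteId;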
You want to use a TEXT field. TEXT fields aren't stored directly in the row; instead, it stores a pointer to the text data. This is transparent to queries, though - if you ask for a TEXT field, it will return the actual text, not the pointer.
Essentially, using a TEXT field is somewhat between your two solutions. It keeps your table rows much smaller than using a varchar, but you'll still want to avoid asking for them in your queries if possible.
The TEXT/NTEXT data type has practically unlimited length while taking up next to nothing in your record.
It comes with a few strings attached, like special behavior with string functions, but for a secondary "notes/description" type of field these may be less of a problem.
Just to expand on Option 2
You could:
Rename existing MyTable to MyTable_V2
Move the Notes column into a joined Notes table (with 1:1 joining ID)
Create a VIEW called MyTable that joins MyTable_V2 and Notes tables
Create an INSTEAD OF trigger on the MyTable view which saves the Notes column into the Notes table (if NULL, delete any existing Notes row; if NOT NULL, insert if not found, otherwise update) and performs the appropriate action on the MyTable_V2 table (see the condensed sketch after this list)
Note: We've had trouble doing this where there is a computed column in MyTable_V2 (I think that was the problem; either way, we've hit snags when doing this with "unusual" tables)
All new Insert/Update/Delete code should be written to operate directly on the MyTable_V2 and Notes tables
Optionally: Have the INSTEAD OF trigger on the MyTable view log the fact that it was called (it can do this minimally: UPDATE a pre-existing log-table row with GETDATE() only if the existing row's date is > 24 hours old, so it will only do one update a day).
When you are no longer getting any log records you can drop the INSTEAD OF trigger on MyTable view and you are now fully MyTable_V2 compliant!
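A condensed sketch of those steps, with invented names; only the INSERT trigger is shown, and the UPDATE/DELETE triggers would follow the same pattern:

EXEC sp_rename 'dbo.MyTable', 'MyTable_V2';
GO
CREATE VIEW dbo.MyTable
AS
SELECT v2.Id, v2.Name, n.NoteText AS Notes
FROM dbo.MyTable_V2 AS v2
LEFT JOIN dbo.Notes AS n ON n.NoteId = v2.Id; -- 1:1 join on the shared ID
GO
CREATE TRIGGER trg_MyTable_InsteadOfInsert
ON dbo.MyTable
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- base row goes to the renamed table
    INSERT INTO dbo.MyTable_V2 (Id, Name)
    SELECT Id, Name FROM inserted;
    -- note text, when present, goes to the Notes table
    INSERT INTO dbo.Notes (NoteId, NoteText)
    SELECT Id, Notes FROM inserted WHERE Notes IS NOT NULL;
END;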
Huge amount of hassle to implement, as you surmised.
Alternatively, trawl the code for all references to MyTable and change them to MyTable_V2, put a VIEW in place of MyTable for SELECT only, and don't create the INSTEAD OF trigger.
My plan would be to fix all Insert/Update/Delete statements referencing the now-deprecated MyTable. For me this would be made somewhat easier because we use unique names for all tables and columns in the database, and we use the same names in all application code, so my confidence that I had found all instances with a simple FIND would be high.
P.S. Option 2 is also preferable if you have any SELECT * lying around. We have had clients whose application performance went downhill fast when they added large Text/Blob columns to existing tables, because of "lazy" SELECT * statements. Hopefully that isn't the case in your shop, though!
