I have a lengthy daily process that runs on an Oracle database every evening. I would like to:
Take a snapshot of the database at a certain point in the middle
of the daily process without interrupting it for a long time.
Query the snapshot to update a data warehouse database.
Drop the snapshot after pulling the necessary data.
I found the link below on Oracle's website, which describes exactly what I need and calls it a copy-on-write snapshot.
https://www.oracle.com/technetwork/database/features/availability/rman-fra-snapshot-322251.html
The problem is I could not find any help on creating such snapshots, as all search results for "snapshots" are related to materialized views, which were seemingly called snapshots in previous releases.
Is it possible to create a point-in-time version of a database in a short period of time (not a backup/restore) in order to use it for data warehousing?
Related
I have two databases - a CRM database (Microsoft Dynamics CRM) and a company database.
These two databases are different.
How can I copy the company database (all objects) into the CRM database every 5 seconds?
Thanks
The cheapest way to do this (and one of the easiest) is to use a method called log shipping. On a schedule (even every 5 minutes or so), it copies the log backup file to another machine and restores it into the target database. Please ignore the geniuses who will claim it can be done every minute: it takes a little while to close the log backup file, move it, and reapply it, but a 5-10 minute window is achievable.
You can also use mirroring, transactional replication, and other High Availability solutions, but there is no easy way to keep two machines in sync.
Do you need to duplicate the data? Can't you query the source system directly if they're on the same server?
Otherwise, this might point you in the right direction: Keep two databases synchronized with timestamp / rowversion
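At its core the cycle is just a log backup on the source, a file copy, and a log restore on the copy. A rough sketch (database names, file names and paths are placeholders, and the target copy must first be initialized from a full backup restored WITH NORECOVERY):

    -- On the source server, on a schedule:
    BACKUP LOG SourceDb
        TO DISK = N'\\FileShare\LogShip\SourceDb_0200.trn';

    -- Copy the file to the target server (Agent job, robocopy, etc.), then:
    RESTORE LOG SourceDb_Copy
        FROM DISK = N'D:\LogShip\SourceDb_0200.trn'
        WITH STANDBY = N'D:\LogShip\SourceDb_Copy_undo.dat';
        -- STANDBY keeps the copy readable between restores;
        -- use NORECOVERY if you don't need to query it in between.

The built-in log shipping feature in SQL Server automates exactly this backup/copy/restore loop for you.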
I need advice on copying daily data to another server.
Just to give you a picture of the situation, I will explain a little. There are workstations posting transactions to two database servers (DB1 and DB2). These DB servers are hosted on two separate physical servers and are linked. Daily transactions are around 50,000 for now but will increase soon. On some days a workstation may be down (operational but unable to post data), so its transactions get posted a few days later.
So, what I do is run a query against those two linked servers. The daily query output contains ~50,000 records and takes at least 15 minutes to fetch, as the linked servers have performance problems. I will create a stored procedure and schedule it to run at 2 AM.
My concern starts from here: the output will be copied across to another data warehouse (DW). This is our client's territory, which I do not know much about. This DW will be linked to these DB servers to make it possible to send the data (produced by my stored procedure) across.
Now, what would you do to copy the data across:
Create a dummy table on DB1 and copy the stored procedure output into it on the same server, so the data is available and we do not need to rerun the stored procedure. The client then retrieves it later.
Use a "select into" statement to copy the content to the remote DW table (see the sketch after this list). I do not know what happens with this one while fetching and sending the data across to the DW. Remember it takes ~15 minutes for my stored procedure to fetch the data.
Post the data (retrieved by the stored procedure) as an XML file over FTP.
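For the second option, what I have in mind is roughly the following (server, database, and table names are made up; since SELECT ... INTO creates its target table locally, pushing rows into an existing remote table is normally done with INSERT ... SELECT over the linked server):

    -- Run on DB1 after the stored procedure has filled its output table.
    -- [DWSERVER] is the linked server pointing at the client's data warehouse.
    INSERT INTO [DWSERVER].[DWDatabase].dbo.DailyTransactions
           (TransactionId, PostedDate, Amount)
    SELECT TransactionId, PostedDate, Amount
    FROM   dbo.DailyOutput;   -- table on DB1 holding the stored procedure's result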
Please tell me if there is a way of setting an alert or notification on jobs.
I just want to take precautions so it will be easier to track when something goes wrong.
Any advice is appreciated very much. Thank you. Oz.
When it comes to copying data in SQL Server, you need to look at High Availability solutions; depending on the version and edition of your SQL Server, you will have different options.
http://msdn.microsoft.com/en-us/library/ms190202(v=sql.105).aspx
If you just need to move data for specific tables, you have options like an SSIS job or SQL Server Replication.
If you are looking to have all tables in a given database copied to another server, you should use Log Shipping, which allows you to copy the entire contents of the source database to another location. Because this is done at smaller intervals, the load is distributed over a larger period of time instead of having one large transaction running at once.
Another great alternative is SQL Server Replication. This option captures transactions on the source and pushes them to the target. This model requires a publisher (the source), a distributor (which can be the source or another DB), and a subscriber (the target).
You can also create an SSIS job that runs on a frequent basis and moves only the specified amount of data.
I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too tightly tied to that database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new, empty database to which I only added the tables and columns I need (with different names than the production ones).
Now I need some way to regularly update the new database from the production database (ideally almost immediately).
I hesitated to ask this question on http://dba.stackexchange.com, but I'm not necessarily limited to using SQL Server for the job (I can develop and run some custom application if needed).
I have already done some research and found these (partial) solutions:
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / column names will be problematic. So I could use it to create a smaller database that SQL Server replicates automatically, but I would still need to replicate that database into my new one (it might at least avoid putting too much stress on my production database?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or a Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know which rows have been updated since my last replication) - see the sketch below.
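For that third option, the kind of incremental update I have in mind looks roughly like this (table and column names are only illustrative, and a linked server would turn the three-part names into four-part ones):

    -- Runs in the new reporting database every X minutes (SQL Agent job).
    DECLARE @LastSync datetime2 = (SELECT LastSyncDate FROM dbo.SyncState);

    MERGE dbo.Customer AS target                   -- new table, new names
    USING (
        SELECT CustomerId, Name
        FROM   ProductionDb.dbo.T_CUST             -- production table, old names
        WHERE  LastEditDate > @LastSync
    ) AS source
        ON target.Id = source.CustomerId
    WHEN MATCHED THEN
        UPDATE SET target.FullName = source.Name
    WHEN NOT MATCHED THEN
        INSERT (Id, FullName) VALUES (source.CustomerId, source.Name);

    -- Deleted rows would need separate handling.
    UPDATE dbo.SyncState SET LastSyncDate = SYSDATETIME();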
Do you have some advice, or maybe some other solutions that I didn't foresee?
Thanks
I think that transactional replication is better than using triggers.
Too many resources would be used on the source server/database, because the triggers fire for each DML transaction.
Transactional replication can be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is...
There is one more thing that you could try - DB mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones, but I can think of a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
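As a minimal sketch of that approach (all names invented): the replicated table keeps its original shape, and the view exposes the names the new application expects.

    -- In the reporting database, on top of the replicated table:
    CREATE VIEW dbo.Customer
    AS
    SELECT  CUST_ID   AS Id,
            CUST_NAME AS FullName,
            LAST_EDIT AS LastEditDate
    FROM    dbo.T_CUST;   -- replicated table, original name and columns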
I'm gonna throw this out there and say that I'd use transaction log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups, but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it, and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be: if you only need it done daily, you can set up something that takes your daily backups and simply restores them to the secondary.
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1,000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.
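For anyone curious, a trigger of the kind we mean might look roughly like this (all names are invented, and it assumes both databases live on the same instance):

    -- On the production table; pushes inserts/updates into the reporting database.
    CREATE TRIGGER trg_T_CUST_Sync
    ON dbo.T_CUST
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE ReportingDb.dbo.Customer AS target
        USING (SELECT CUST_ID, CUST_NAME FROM inserted) AS source
            ON target.Id = source.CUST_ID
        WHEN MATCHED THEN
            UPDATE SET target.FullName = source.CUST_NAME
        WHEN NOT MATCHED THEN
            INSERT (Id, FullName) VALUES (source.CUST_ID, source.CUST_NAME);
    END;
    -- Deletes would be handled by a similar AFTER DELETE trigger.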
I am new to designing an ETL process. Currently I have two databases: one is the live database that the application uses for everyday transactions; the other is the data warehouse.
I have a table in the live database that regularly has new data inserted into it. The goal is that every night the ETL process will transfer the data in the live database to the data warehouse, followed by deleting the data in the live database.
Due to my lack of knowledge, the solution I came up with is to implement something called a rolling table. Basically, on the live database I have two tables with the same structure, which I call tblLive1 and tblLive2. I also have a synonym called tblLive; all inserts are done through the synonym, which points at one of the two tables.
When I run the ETL process, I have a stored procedure that drops and re-creates the synonym so that it points to tblLive2. This allows the ETL process to transform data from tblLive1 without affecting the application. The assumption is that the ETL process takes an hour to run, and I don't want it to lock the table and prevent the application from inserting new data.
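The switch itself is just a couple of statements inside that stored procedure (using the names described above):

    -- Repoint the synonym so the application starts writing to tblLive2
    -- while the ETL process reads the now-frozen tblLive1.
    DROP SYNONYM dbo.tblLive;
    CREATE SYNONYM dbo.tblLive FOR dbo.tblLive2;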
This solution should theoretically work, but it is not elegant.
I am sure this is a common problem; are there any other solutions out there?
To add to Bob's answer (above): it is usual in DWH/BI applications that all necessary tables are essentially copied into a "staging" database, or a "staging" schema on your DWH database (depending on the number of tables, size, etc.). These would ordinarily be on a different server from your OLTP system - for a DWH implementation of any size, that is.
To answer the question on performance impact, it depends on your server spec and I/O configuration.
Is data being inserted into the OLTP system 24 hours a day, or are there downtimes or low-traffic periods?
It might be worthwhile using database compression, as I/O is going to be your biggest enemy and this will help considerably.
Read the table into a staging area and process the staging table. You usually want to spend as little time on the production system as you have to, especially if it is in use.
You may also want to look into using tables loaded by a trigger, or Change Data Capture if you are on SQL Server 2008 or later.
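Enabling Change Data Capture comes down to a couple of system procedure calls (database, schema, and table names below are placeholders):

    USE LiveDb;
    EXEC sys.sp_cdc_enable_db;          -- enable CDC for the database

    EXEC sys.sp_cdc_enable_table        -- enable CDC for the table being extracted
         @source_schema = N'dbo',
         @source_name   = N'tblLive1',
         @role_name     = NULL;         -- no gating role

    -- The ETL can then read changes from the generated cdc.dbo_tblLive1_CT table
    -- (or through the cdc.fn_cdc_get_all_changes_dbo_tblLive1 function).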
I'm just wondering: what's the difference between a SQL database snapshot and a regular SQL database? Can someone out there help me understand the difference between the two?
Thanks in advance.
A snapshot is a read-only copy of another database, made at a point in time. Any change to the original database causes the data as it existed when the snapshot was taken to be written to the file used by the snapshot. There is therefore a performance hit involved, but it can be very useful for knowing exactly what your database looked like at the point in the past when you created the snapshot.
It's definitely worth noting that the snapshot contains no data of its own when first created, as it can reference the original database for it, at least until the original database is changed.
When a snapshot is first created, it is an empty shell that delegates all queries (a snapshot is read only) to the original database.
As changes are made to the original database, the pages involved are copied to the snapshot. Queries of the snapshot at this point will be performed on a logical database that is the result of layering the pages in the snapshot over those in the original database.
The effect is that the snapshot appears to be a complete copy of the original database that was made at the same time as the snapshot was created.
One scenario in which this can be useful is in deploying changes. The snapshot can be a very inexpensive form of insurance if something goes wrong. Assuming that only a subset of the pages within the original database were modified during the deployment, only that subset of the pages will need to be copied back from the snapshot to the original database during a restore.
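For reference, the create / revert / drop cycle looks like this (the logical file name, path, and snapshot name are examples):

    -- Create the snapshot; its sparse file starts out essentially empty.
    CREATE DATABASE MyDb_Snapshot_PreDeploy
    ON ( NAME = MyDb_Data,          -- logical name of MyDb's data file
         FILENAME = N'D:\Snapshots\MyDb_Snapshot_PreDeploy.ss' )
    AS SNAPSHOT OF MyDb;

    -- If the deployment goes wrong, revert the database to the snapshot.
    RESTORE DATABASE MyDb
        FROM DATABASE_SNAPSHOT = N'MyDb_Snapshot_PreDeploy';

    -- Once it is no longer needed, drop it like any other database.
    DROP DATABASE MyDb_Snapshot_PreDeploy;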