We have several systems with Oracle (A) and SQL Server (B) databases on the back end. I have to consolidate data from those systems into a new SQL Server database.
Something like this:
(A) =>|---------------|
      | some software | => SQL Server
(B) =>|---------------|
where some software is:
transport (the A and B systems are located on the network)
processing business logic (custom .NET code)
Because of the first point, I need some queueing software or something similar (like MSMQ or Service Broker). On the other hand, I could implement a web service instead of a queue.
(A) =>|---------------|-------------|
      | queue/service | custom code | => SQL Server
(B) =>|---------------|-------------|
The question is: which queue/transport framework should I use with Oracle and SQL Server databases?
It would be nice if I could post messages to MSMQ from both Oracle and SQL Server stored procedures (can I?).
It would be nice if I could call a web service from both Oracle and SQL Server stored procedures (can I?).
It would be nice if I could use something similar from both Oracle and SQL Server stored procedures (what, exactly?).
Which software best fits my requirements?
UPDATE: some technical specs
This would be a regular sync process, once a day I think.
Latency is not critical (0.5-1 hour or more is OK).
Amount of data: 1-50 MB per sync from each system.
Encryption is required during transfer.
I would suggest creating an SSIS package that transfers the new data from servers A and B to the new server when invoked. You would launch the SSIS package on a schedule, say every 30 minutes, from the new server.
If both A and B were SQL Server, then Service Broker would make sense in order to provide very low latency. But with one of them being Oracle, and with no real-time requirements, it loses its appeal. As a side note, you can see here an example of using Service Broker for High volume real time contiguous ETL.
Doing the transfer as an SSIS package makes for easy maintenance (you can modify the package with relative ease), does not require invasive changes to the existing systems, is quite performant, and there is a tonne of SSIS know-how available online.
I would advise against using MSMQ, for several reasons:
When transactional reliability is needed, you'll have to enroll all MSMQ-related operations in a distributed transaction (DTC between the MSMQ dequeue and the SQL Server insert/update on the new server), which will slow down processing throughput significantly.
You'll need to write quite a few lines of code for marshaling/unmarshaling and shredding the deltagram messages into the target system (I know coding is fun, but SSIS is simply better at this kind of job, and easier to maintain).
MSMQ's 2 GB per-queue limit is quite small in the real world (it fills up quickly if your traffic increases and you have maintenance downtime).
The real problem I'd be worried about is how to detect changes on A and B: when the SSIS job runs every 30 minutes, how does it know which data is new? Especially, how does it detect deletes?
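A common pattern for the "what is new" part is a high-water mark: if the source tables carry a last-modified timestamp (or a rowversion column on the SQL Server side), the extraction query pulls only the rows changed since the last successful run. A rough sketch of what the source query could look like, assuming a hypothetical LastModified column and a small control table on the target server (both names are illustrative, not part of your existing systems):

-- Illustrative only: assumes the source table exposes a LastModified column
-- and the target server keeps one watermark row per source system.
DECLARE @since datetime2,
        @now   datetime2 = SYSUTCDATETIME();

SELECT @since = LastExtracted
FROM etl.Watermark
WHERE SourceSystem = 'A';

-- The delta; in SSIS this becomes the source query of the data flow task.
SELECT OrderId, CustomerId, Amount, LastModified
FROM dbo.Orders
WHERE LastModified > @since
  AND LastModified <= @now;

-- After a successful load, advance the watermark.
UPDATE etl.Watermark
SET LastExtracted = @now
WHERE SourceSystem = 'A';

Note that this still does not catch deletes; for those you need soft-delete flags, triggers, or a change-capture feature on the source.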
Related
We have an older, vendor-supplied application that is earmarked for platform upgrades in 2019 but is currently running on SQL Server 2008 (SP4). It's about 1.2 TB of data. Our internal IT unit has reached the point where we want to create a readable secondary for some reports, but mostly for ad hoc reporting. Usage is about 1,500 active sessions and about 25,000 Be/S at peak.
Now on to the actual question. The options I foresee are transactional replication, mirroring, and log shipping with a read-only standby. One of the developers also put forward Service Broker with CDC ... any landmines or curveballs with CDC and SB?
Service Broker is a very powerful tool for creating and managing queues. CDC reads the log asynchronously to pick up changes to designated tables. They don't interact with each other, and both are designed to have low impact on an active database. They both work very well even in high-volume situations. Like many features in SQL Server, they can be used with a minimal learning curve, but if you want to really take advantage of these tools, some study is required.
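For what it's worth, a minimal sketch of what turning CDC on looks like (schema and table names are placeholders; on SQL Server 2008 CDC is an Enterprise Edition feature, and the capture job needs SQL Server Agent running):

-- Placeholders throughout; dbo.Orders is just an example table.
EXEC sys.sp_cdc_enable_db;                 -- once per database

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;                -- no gating role in this sketch

-- A consumer (for example a Service Broker activated procedure) then reads
-- changes by LSN range from the generated change function:
DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders'),
        @to   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all');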
I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, with 400 jobs this quarter, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the linked tables. Works as expected.
Performance here in the US is decent - 10-15 seconds for ~40K records. Our India teams are seeing 5-10+ minutes for the same datasets. Their internet connection is decent but not great, and a variable I cannot control.
I am wondering if MS Access is adding some overhead here that can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting instead of Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 2007, 2010, and 2013 users.
Thanks!
It's not entirely clear, but if the MS Access database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN; there is too much latency.
When Access queries data, it doesn't open a stream; it fetches a block of it, waits for the data to be downloaded, then requests another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India probably has around 200 ms of latency. You can't do much about that, and it adds up very quickly if the communication protocol is chatty, all on top of a connection bandwidth that is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to perform this dump automatically every day and make it available on an FTP server or an internal website, ready for download.
You can also make it available on demand, maybe through an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server, or simply through a website or a shell.
You could also have a simple Windows service on your SQL Server that listens for requests from a remote client installed on the local machines everywhere; it would process the dump and send it to the client, which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid automating Access itself. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
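If the SQL Server box has the ACE OLE DB provider installed and 'Ad Hoc Distributed Queries' enabled, one possible server-side export is a plain INSERT through OPENROWSET. This is a sketch only, with made-up names, and it assumes an empty .accdb containing the target table already exists:

-- Sketch only: requires the Microsoft.ACE.OLEDB.12.0 provider on the server,
-- 'Ad Hoc Distributed Queries' enabled, and a pre-created JobData.accdb
-- that already contains an empty JobRecords table.
INSERT INTO OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                       'C:\Exports\JobData.accdb';'Admin';'',
                       'SELECT JobID, RecordDate, Reading FROM JobRecords')
SELECT JobID, RecordDate, Reading
FROM dbo.JobRecords
WHERE JobID = 12345;        -- illustrative filter for one "job"

The resulting .accdb can then be compacted, zipped and pushed to the FTP site or file share as described above.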
Renaud and all, thanks for taking the time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks of data (vs. a contiguous download) is exactly what I was hoping to avoid via an alternate approach.
Our workflow is evolving to better leverage both sides of the clock: User1 in the US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, who also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial download of the local DB tables from the server (huge multi-year DB) for the current "job" - this should happen just once at the start of the effort (a ~1-week-long process). This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamp-based push-pull of just the daily changes would be an overall plus. It seems to be, but the initial download hurdle remains.
We use a central SQL Server (2008 Standard Edition) and several smaller, dedicated SQL Servers (Express editions). We need to implement some mechanism for transferring data asynchronously from the dedicated, decentralized SQL Servers (bigger volume, see below) and back from the central SQL Server (a few records, basically some notifications for the machines and possibly some optimization hints).
The dedicated SQL Servers are physically located near the technology machines, and they collect, say, (datetime, temperature) rows at regular intervals (think an interval of a few seconds). There are about 500 records for one job, but the next job follows immediately (the machine does not know it is a new job -- it is quite stupid in that sense -- and simply collects the temperatures on and on).
The technology machines must be able to work without the central SQL Server, and the central SQL Server must also work when a machine is not accessible (i.e. its dedicated SQL engine cannot be reached, switched off along with the machine). In other words, the solution need not be super fast, but it must be robust in the sense that no collected data is lost.
The basic idea is to move the collected data from the dedicated SQL Server (preprocessed into a normalized format with the ID of the machine) to a well-known table on the central SQL Server. Only the newer data should be sent, to minimize the amount of data. That transfer should be started by the dedicated SQL Server at regular intervals (say, one hour) if the connection is OK. If the connection is not OK, the data will be sent after the next hour, etc.
Another well-known table on the central SQL Server will be used to send notifications to the dedicated SQL Server engines. This way the dedicated engine can be told (for example) what data has already been processed/archived on the central SQL Server (i.e. a hint about which records may already be deleted from the local database on the dedicated machine), or whatever other information is hinted from the central server (just hints, nothing with real-time requirements). The hints will be collected by the dedicated SQL Server (i.e. also the machine's responsibility). In other words, the central SQL Server only processes its well-known, local tables. It does not try to connect to the dedicated SQL Server machines.
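To make the intended flow concrete, here is a rough sketch of the hourly push a dedicated server would run (Central is an assumed linked server name; the table names are only illustrative):

-- Sketch only; Central is a linked server pointing at the central instance.
DECLARE @lastSent datetime2,
        @cutoff   datetime2 = SYSDATETIME();

SELECT @lastSent = LastSentAt FROM dbo.PushState;

BEGIN TRY
    INSERT INTO Central.CentralDb.dbo.Measurements (MachineId, MeasuredAt, Temperature)
    SELECT @@SERVERNAME, MeasuredAt, Temperature
    FROM dbo.Measurements
    WHERE MeasuredAt > @lastSent
      AND MeasuredAt <= @cutoff;

    UPDATE dbo.PushState SET LastSentAt = @cutoff;

    -- Pick up any hints the central server has left for this machine.
    SELECT HintType, Payload
    FROM Central.CentralDb.dbo.MachineHints
    WHERE MachineId = @@SERVERNAME;
END TRY
BEGIN CATCH
    -- Connection down: do nothing; the next scheduled run will retry.
END CATCH;

(For strict no-data-loss guarantees, the INSERT and the watermark update would have to be atomic across the two servers, which is part of what I am unsure about.)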
The solution should use only the standard mechanisms -- SQL commands (via stored procedures), no external software. What kind of solution should I focus on?
Thanks,
Petr
[Edited later] The SQL Servers are on the same Local Area Network.
If you are willing to make a mental switch and stop thinking in terms of tables and rows, and instead think in terms of data and messages, then Service Broker can handle all the communication, delivery and message processing. Instead of locally (on the Express machines) doing INSERT INTO LocalTable(datetime, temperature) VALUES (...), you think in terms of:
BEGIN DIALOG CONVERSATION ... TO SERVICE 'CentralServer' ...;
SEND ON CONVERSATION ... MESSAGE TYPE [Measurement] (<datetime ...><temperature ...>);
See Using Service Broker instead of Replication or High Volume Contiguous Real Time ETL
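To make that a bit more concrete, the sending side might look roughly like this (the service, contract and message type names are made up for the sketch, and the broker objects and routes are assumed to already exist on both instances):

-- Sketch only: queues, services, contract, message type and routes are assumed
-- to have been created already on both the Express and the central instance.
DECLARE @handle uniqueidentifier;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Plant/MeasurementSenderService]
    TO SERVICE   '//Central/MeasurementTargetService'
    ON CONTRACT  [//Measurement/Contract]
    WITH ENCRYPTION = OFF;     -- fine here since the servers share a LAN

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Measurement/Reading]
    (N'<reading machine="M01" datetime="2013-05-01T12:00:00" temperature="71.5" />');

In practice you would reuse conversations instead of opening one per reading, and an activated procedure on the central queue would shred the XML into the well-known table.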
Sounds like a job for merge replication.
We get daily files that need to be loaded into our database. The files get delivered to a separate server from the database server. Which of the two approaches is better for the ETL from a performance perspective?
Transfer the files from the delivery server to the database server, then do a bulk load.
Open a DB connection from the delivery server and load.
Edited to add: The servers are all on the same network.
It depends on whether the source servers are SQL Server or another technology, on the driver used (if it's Oracle, the Microsoft driver will nerf your performance badly; Oracle's is better), on the amount of database overhead you want to impose (while one server is feeding the other, they are probably both I/O bound), and on the disk layout you have (i.e. reading from one RAID and writing to another; compressing and transferring over a 1 Gbit or 100 Mbit link might be more efficient). Usually the dumps compress nicely, but as Beth has noted, test it.
With dumps, you can exploit parallelism (like multiple disk shares, and multiple processors used for compression - use 7-Zip, period). With Ethernet, you probably won't be able to exploit as much parallelism. The same considerations affect the target server.
All in all, as usual with performance: test, quantify, test, quantify, repeat :)
The universal response of 'it depends'. It depends particularly on what ETL technology you are using. If your ETL is tied to the database server for its processing power (SSIS, and BODI to a lesser degree), then you need to get your files onto the database server ASAP. If you have a more file-based ETL package (Ab Initio, Informatica), then you are free to do your transformation on the delivery server and then move your 'ready-to-load' data onto the database server for bulk loading.
In all cases, especially if the files are very large, you can compress the data files before transporting them over the network.
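For either option, the load itself on the database server is typically a single bulk operation; a rough sketch with made-up file and table names:

-- Illustrative names; assumes a comma-delimited file with a header row
-- and a staging table whose columns match the file layout.
BULK INSERT dbo.DailyLoad_Staging
FROM 'D:\Incoming\daily_extract.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,          -- skip the header row
    TABLOCK,                      -- allows minimal logging under the right recovery model
    BATCHSIZE       = 100000
);

With option 2, the rows are instead pushed from the delivery server over a remote connection (for example with bcp or SqlBulkCopy), so the data crosses the network once either way; the difference is mostly where the read I/O and any transformation work land.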
I just realized that my application was needlessly making 50+ database calls per user request due to some hidden code -- hidden in the sense that, between LINQ, persistence frameworks and events, it just so turned out that a huge number of calls were being made without my being aware of it.
Is there a recommended way to analyze individual transactions going to my SQL 2008 database, preferably with some integration to my Visual Studio 2010 environment? I want to be able to 'spy' on individual transactions being made, but only for certain pieces of my code, and without making serious changes to either the code or database.
In addition to SQL Server Profiler, there are a number of performance counters you can look at to get both a real-time view and a historic trend:
Batch Requests/sec: Effectively measures the number of actual calls made to the SQL Server
Transactions/sec: Number of transactions in each database.
Connection resets/sec: number of new connections started from the connection pool by your site.
There are many more performance counters you can monitor, especially if you want to measure performance, but going through them all is beyond the scope here. A good starting point is Monitoring Resource Usage.
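These counters show up in Performance Monitor, and you can also sample them from T-SQL via sys.dm_os_performance_counters, for example:

-- Snapshot of the counters mentioned above. The "/sec" counters are cumulative
-- values, so sample twice and divide the difference by the elapsed seconds to
-- get an actual rate.
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       instance_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Batch Requests/sec%'
   OR counter_name LIKE 'Transactions/sec%'
   OR counter_name LIKE 'Connection Reset%';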
You can use the SQL Profiler tool that comes with SQL Server Management Studio.
Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later. For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly.
As mentioned, SQL Profiler is useful at the SQL Server level. It is not available in SQL Server SSMS Express, however.
At the .NET level, LINQ to SQL and the Entity Framework both support logging. See Logging every data change with Entity Framework, http://msdn.microsoft.com/en-us/magazine/gg490349.aspx, http://peterkellner.net/2008/12/04/linq-debug-output-vs2008/.