Considerations: DECS vs SSIS? - sql-server

I need a solution to pump data from Lotus Notes to SQL Server. Data will be transferred in two modes:
Archive data transfer
Current data transfer
Availability of the data in SQL Server is not critical; the data is used for reports. Reports could be created daily, weekly or monthly.
I am considering choosing one of these solutions: DECS or SSIS. Could you please give me some tips about the pros and cons of both technologies? If you suggest something else, it could also be taken into consideration.
DECS - Domino Enterprise Connection Services
SSIS - SQL Server Integration Services

I've personally used XML frequently to get data out of Lotus Notes in a way that other systems can read easily. I'd suggest you take a look and see if that fits your needs. You can create views that emit XML, or use Notes agents or Java servlets, all of which can be accessed over HTTP.
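Domino can expose a view over HTTP with the `?ReadViewEntries` URL command, which returns the view rows as XML. A minimal Python sketch of the consuming side follows. The sample response is hand-written and simplified for illustration (not captured from a real server), and the column names and UNIDs are hypothetical. Fetching the real thing would be something like `urllib.request.urlopen("http://server/db.nsf/ViewName?ReadViewEntries")`, where server, database and view names are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified shape of a ?ReadViewEntries response (hypothetical data).
SAMPLE = """<viewentries toplevelentries="2">
  <viewentry position="1" unid="AAA1">
    <entrydata columnnumber="0" name="Customer"><text>Acme</text></entrydata>
    <entrydata columnnumber="1" name="Amount"><number>120</number></entrydata>
  </viewentry>
  <viewentry position="2" unid="AAA2">
    <entrydata columnnumber="0" name="Customer"><text>Globex</text></entrydata>
    <entrydata columnnumber="1" name="Amount"><number>75.5</number></entrydata>
  </viewentry>
</viewentries>"""


def parse_view_entries(xml_text):
    """Turn each <viewentry> into a dict of column name -> value, ready to insert into SQL Server."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("viewentry"):
        row = {"unid": entry.get("unid")}  # the document UNID makes a stable key for upserts
        for cell in entry.findall("entrydata"):
            child = next(iter(cell), None)  # first child element, e.g. <text> or <number>
            value = None
            if child is not None:
                value = float(child.text) if child.tag == "number" else child.text
            row[cell.get("name")] = value
        rows.append(row)
    return rows
```

Keeping the UNID on every row lets the archive load and the later "current data" loads update the same SQL rows instead of duplicating them.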

SSIS is a terrific tool for complex ETL tasks. You can even write C# code if you need to. There are lots of pre-written data-cleaning components out there that you can download if you want. It can do pretty much anything you need. It does, however, have a fairly steep learning curve. SSIS comes free with SQL Server, so that is a plus. A couple of things I really like about SSIS are its error logging and the way it handles configuration, so that moving a package from the dev environment to QA and Prod is easy once you have set it up.
We have also set up a metadata database to record a lot of information about our imports, such as the start and stop time, when the file was received, the number of records processed, types of errors, etc. This has really helped us in researching data issues and has let us write processes that stop automatically when a file exceeds the normal parameters by a set amount. This is handy if you normally receive a file with 2 million records and one day the file comes in with 1,000 records. Much better than deleting 2,000,000 potential customer records because you got a bad file. We also now have the ability to report on files that were received but not processed, or files that were expected but not received. This has tremendously improved our import processes (we have hundreds of imports and exports in our system). If you are designing from scratch, you might want to take some time to think about what metadata you want to have and how it will help you over time.
Depending on your situation at work, if there is a chance that data will also be sent to the SQL Server database from sources other than Lotus Notes, I would suggest it is worth your time to start using SSIS now, as that is how those other imports are likely to be done. As a database person, I would prefer to have all the imports I support using the same technology.
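The "stop automatically when the file exceeds the normal parameters" idea can be sketched in a few lines. This is a minimal Python illustration, not the answerer's actual setup: SQLite stands in for the metadata database, and the table name `import_log`, the 50% tolerance and the 10-file window are all hypothetical choices.

```python
import sqlite3
from datetime import datetime, timezone


def open_metadata_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS import_log (
        file_name TEXT, received_at TEXT, row_count INTEGER, status TEXT)""")
    return db


def check_and_log(db, file_name, row_count, max_deviation=0.5, window=10):
    """Accept the file unless its row count deviates from the average of the
    last `window` accepted files by more than `max_deviation` (a fraction).
    Every decision is logged so rejected files can be reported on later."""
    recent = [r[0] for r in db.execute(
        "SELECT row_count FROM import_log WHERE status = 'accepted' "
        "ORDER BY rowid DESC LIMIT ?", (window,))]
    accepted = True
    if recent:  # no history yet: let the first files through
        baseline = sum(recent) / len(recent)
        if baseline > 0 and abs(row_count - baseline) / baseline > max_deviation:
            accepted = False
    db.execute("INSERT INTO import_log VALUES (?, ?, ?, ?)",
               (file_name, datetime.now(timezone.utc).isoformat(), row_count,
                "accepted" if accepted else "rejected"))
    db.commit()
    return accepted
```

The import job would call `check_and_log` before touching the target tables and stop if it returns `False`, which is exactly the "1,000 records instead of 2 million" case described above.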
I can't say anything about DECS as I have never used it.

Just a thought - but as Lotus Notes tends to behave a bit "different" than relational databases (or anything else), you might be safer going with a tool which comes out of the Notes world, versus a tool from the sql world.
(I have used DECS in the past (prior to Domino 8) and it has worked fine for pumping data out into a SQL Server database. I have not used SSIS).

Related

Migrate data from MariaDB to SQLServer

We are planning to migrate all the data from MariaDB to SQL Server. Can anyone please suggest an approach to migrate the data so that no downtime is required and no data is lost?
In that context, I have gone through a few posts here but did not get much out of them.
You could look into SQL Server Integration Services functionality for migrating your data.
Or you could manually create a migration script using a linked server in your new SQL Server instance.
Or you could use BCP to perform bulk imports (which is quite fast, but requires intermediate steps to put the data in text files).
What's more important is how you want to meet the "no downtime" requirement. I suppose the migration routines need some functional capabilities that might be difficult to implement with a general migration tool, like:
the possibility to perform the migration in multiple batches/runs (where already migrated data is skipped), and
the possibility to implement different phases of the migration in different solutions, like bulk imports (using text files and staging tables) for history data (which will not change anymore), but live queries over a live database connection for the latest updates in the MariaDB/MySQL database.
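Running the migration in multiple batches, skipping what is already migrated, can be sketched as a keyset-paged copy. This is a minimal Python illustration under explicit assumptions: two SQLite connections stand in for MariaDB and SQL Server, the `customers` table is hypothetical, and the primary key is assumed to be ever-increasing and unchanged for history rows (rows updated after being copied would need the "live queries" phase mentioned above).

```python
import sqlite3


def migrate_in_batches(source, target, batch_size=1000):
    """Copy source.customers to target.customers in primary-key order.
    The job resumes from the highest id already in the target, so it can be
    rerun after an interruption without copying anything twice.
    Original primary key values are preserved."""
    (last_id,) = target.execute("SELECT COALESCE(MAX(id), 0) FROM customers").fetchone()
    copied = 0
    while True:
        batch = source.execute(
            "SELECT id, name, created_at FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size)).fetchall()
        if not batch:
            break
        target.executemany(
            "INSERT INTO customers (id, name, created_at) VALUES (?, ?, ?)", batch)
        target.commit()  # one short transaction per batch keeps locks brief
        last_id = batch[-1][0]
        copied += len(batch)
    return copied
```

Small batches with a commit each also help with the "does not freeze the live production environment" concern, and the returned count is a cheap progress metric.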
The migration strategy might also largely depend on the size of the data in MariaDB/MySQL, and the structure of the database(s) and its data. Perhaps you want to keep auto-generated primary key values, because the system requires them to remain unchanged. Perhaps you need to use different data types for some exotic table fields. Perhaps you need to re-implement some database logic (like stored procedures and functions). Etc. etc.
It is very difficult to give ad-hoc advice about this kind of migration project; as Tim Biegeleisen already commented, this can be quite a complex job, even for "small" databases. It practically always requires a lot of research, extensive preparations, test runs (in a testing environment using database backups), some more test runs, a final test run, etc. And, of course, some analytics, metrics, logging, and reporting for troubleshooting (and to know what to expect during the actual migration). If the migration will be long-running, you want to make sure it does not freeze the live production environment, and you might also want some form of progress indication during the migration.
And - last but not least - you surely want to have a "plan B" or a quick return strategy in case the actual migration will fail (despite all those careful preparations).
Hope I did not forget something... ;-)

Extract & transform data from Sql Server to MongoDB periodically

I have a Sql Server database which is used to store data coming from a lot of different sources (writers).
I need to provide users with some aggregated data, however in Sql Server this data is stored in several different tables and querying it is too slow ( 5 tables join with several million rows in each table, one-to-many ).
I'm currently thinking that the best way is to extract data, transform it and store it in a separate database (let's say MongoDB, since it will be used only for read).
I don't need the data to be live, just not older than 24 hours compared to the 'master' database.
But what's the best way to achieve this? Can you recommend any tools for it (preferably free) or is it better to write your own piece of software and schedule it to run periodically?
I recommend avoiding "not invented here" (NIH) syndrome here: reading and transforming data is a well-understood exercise. There are several free ETL tools available, with different approaches and focus. Pentaho Data Integration (formerly Kettle) and Talend are UI-based examples. There are other ETL frameworks, like Rhino ETL, that merely hand you a set of tools to write your transformations in code. Which one you prefer depends on your knowledge and, unsurprisingly, your preference. If you are not a developer, I suggest using one of the UI-based tools. I have used Pentaho in a number of smaller data warehousing scenarios; it can be scheduled with operating system tools (cron on Linux, Task Scheduler on Windows). More complex scenarios can make use of the Pentaho PDI repository server, which allows central storage and scheduling of your jobs and transformations. It has connectors for several database types, including MS SQL Server. I haven't used Talend myself, but I've heard good things about it and it should be on your list too.
The main advantage of sticking with a standard tool is that once your demands grow, you'll already have the tools to deal with them. You may be able to solve your current problem with a small script that executes a complex select and inserts the results into your target database. But experience shows those demands seldom stay the same for long, and once you have to incorporate additional databases or maybe even some information in text files, your scripts become less and less maintainable, until you finally give in and redo your work in a standard toolset designed for the job.
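For comparison, the "small script" route might look like the minimal Python sketch below: one join flattened into one read-optimised document per customer. It is illustrative only. SQLite stands in for SQL Server, the `customers`/`orders` tables are hypothetical, and the output is newline-delimited JSON, which `mongoimport` accepts (a driver call such as pymongo's `insert_many` would be the alternative).

```python
import json
import sqlite3


def build_customer_documents(db):
    """Collapse a one-to-many join into one document per customer,
    pre-computing the aggregate so readers never repeat the join."""
    rows = db.execute("""
        SELECT c.id, c.name, o.id, o.total
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        ORDER BY c.id, o.id""")
    docs = {}
    for cust_id, cust_name, order_id, total in rows:
        doc = docs.setdefault(cust_id, {"_id": cust_id, "name": cust_name,
                                        "orders": [], "order_total": 0})
        if order_id is not None:  # LEFT JOIN yields NULLs for customers without orders
            doc["orders"].append({"order_id": order_id, "total": total})
            doc["order_total"] += total
    return list(docs.values())


def to_json_lines(docs):
    """One JSON document per line, the default input format for mongoimport."""
    return "\n".join(json.dumps(d) for d in docs)
```

This is exactly the kind of script that works on day one and, as the answer warns, becomes harder to maintain as more sources are added.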

Data Replication vs Service Bus vs App Fabric vs...?

I am building an application which needs to consume data from a source database. The source database has several issues, including:
Performance issues
Legacy structure with terrible keys, naming conventions, etc.
Lots of data my application doesn’t care about
I would like to setup an application specific SQL Server database. The new database will be populated with a subset of data from the source database (and from a few other source systems). The data will always move one way from the source databases to the application specific database (i.e. - data won't sync back to the source). It will have a different DDL model than the source database.
The data doesn't need to be synced in absolute real time, but any lag longer than a few minutes could cause issues.
How should I move data from the source database into the application database? Should I use
Replication
Write Custom SSIS Packages
Abstract to a higher-level SOA solution like NServiceBus, AppFabric, etc.?
Some other ideas?
Pros/cons to each?
Sounds to me like you don't need a messaging service like NServiceBus - this would involve modifying the legacy system to publish events whenever data changes, something I expect you don't want to get into. Because it is acceptable in your case for your local store of data to be slightly out of date, an SSIS package could be acceptable.
However, if the source database is very large, this could be an issue, as you will be running the package every few minutes. Also, if users of the legacy system are already experiencing performance problems, an SSIS package running every few minutes won't help. Maybe you could add a timestamp column to the source data, so that the package only copies new or modified rows?
If the source data is very large and performance is seriously an issue, then maybe NServiceBus would be a good idea. You could also consider MassTransit or your own simple solution built on MSMQ. But this will mean getting your hands dirty with the legacy code.
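The timestamp idea amounts to a watermark-based incremental copy. A minimal Python sketch follows, assuming the hypothetical legacy table `tbl_prod` gains a `LAST_MOD` column that is updated on every write; SQLite stands in for both databases, and the renaming into `app_products` illustrates the "different DDL model" from the question.

```python
import sqlite3


def sync_changes(source, target, last_sync):
    """Copy rows modified after `last_sync` from the legacy table into the
    application table (renaming columns on the way) and return the new
    watermark to persist for the next run."""
    changed = source.execute(
        "SELECT PROD_ID, PRD_NM, LAST_MOD FROM tbl_prod "
        "WHERE LAST_MOD > ? ORDER BY LAST_MOD", (last_sync,)).fetchall()
    for row in changed:
        # SQLite upsert-by-replace; in SQL Server this would be a MERGE
        # or an UPDATE followed by an INSERT for missing keys.
        target.execute(
            "INSERT OR REPLACE INTO app_products (product_id, product_name, modified_at) "
            "VALUES (?, ?, ?)", row)
    target.commit()
    return changed[-1][2] if changed else last_sync
```

One caveat with plain timestamps: a transaction that commits late with an older timestamp can be skipped. In SQL Server a `rowversion` column is a common alternative for this kind of change detection.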

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of data that actually has to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, none of which yet stands out as the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will depend entirely on how often records are updated. But the basic idea is that if a record is changed, the user can publish it to the live database. Alternatively, we'll record the changes and send them across in a batch at a configurable frequency.
The number of records we're talking about is around 4000 rows per table for the core tables (product catalog) at the moment, but this varies with the client we deploy to, as each has their own product catalog, ranging from hundreds to thousands of products. To clarify, each client is on a separate local/hosted database combination; they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.
Well, I'm biased, I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've faced exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have a working project with all the right knobs and buttons set. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.
How much data are you talking about? How many 'client' DBs are there? And how often does it need to happen? The answers to those questions will make a big difference to the path you should take.
There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.
In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.
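The trigger-plus-polling design can be sketched as follows. This is a minimal Python illustration, not the poster's code: SQLite stands in for SQL Server (whose trigger syntax differs), the `products`/`change_log` tables and the XML element names are hypothetical, and posting the XML to the remote web service is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER, operation TEXT, processed INTEGER DEFAULT 0);
CREATE TRIGGER products_ins AFTER INSERT ON products
BEGIN INSERT INTO change_log (product_id, operation) VALUES (NEW.id, 'upsert'); END;
CREATE TRIGGER products_upd AFTER UPDATE ON products
BEGIN INSERT INTO change_log (product_id, operation) VALUES (NEW.id, 'upsert'); END;
CREATE TRIGGER products_del AFTER DELETE ON products
BEGIN INSERT INTO change_log (product_id, operation) VALUES (OLD.id, 'delete'); END;
"""


def poll_changes_as_xml(db):
    """Turn unprocessed change-log rows into one XML document and mark them processed.
    Upserts carry the row's current values; deleted rows only carry their key."""
    pending = db.execute(
        "SELECT l.seq, l.product_id, l.operation, p.name, p.price "
        "FROM change_log l LEFT JOIN products p ON p.id = l.product_id "
        "WHERE l.processed = 0 ORDER BY l.seq").fetchall()
    root = ET.Element("changes")
    for seq, product_id, operation, name, price in pending:
        item = ET.SubElement(root, "change", seq=str(seq),
                             operation=operation, productId=str(product_id))
        if operation == "upsert" and name is not None:  # row may have been deleted since
            ET.SubElement(item, "name").text = name
            ET.SubElement(item, "price").text = str(price)
    if pending:
        db.execute("UPDATE change_log SET processed = 1 WHERE seq <= ?", (pending[-1][0],))
        db.commit()
    return ET.tostring(root, encoding="unicode")
```

Marking rows processed only up to the last `seq` read means changes logged while the poller runs are picked up on the next pass. A real deployment would probably mark them only after the web service acknowledges the file.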

How to migrate from MS Access to SQL server 2005?

I have a VB.NET Windows application that pulls information from an MS Access database. The primary role of the application is to extract information from Excel files in various formats, standardize the file layout and write that out to CSV files. The application uses MS Access as the source for the keys and cross-reference files.
The Windows app uses typed datasets for much of the user interaction with the database. The standardization is done on each client's machine. The application is not... how can I say this... FAST :-).
Question: What is the best way to migrate the DB and application to SQL Server 2005? I am thinking it might be a good idea to write the code for the standardization in SSIS packages.
What is the appropriate way to go about this migration?
The application pulls data from 250 Excel files each week and approximately 800 files each month, with an average of about 5000 rows per file. There are 13 different file formats that are standardized and output into 3 different standard formats. The application takes between 25 and 40 minutes to run, depending on which data run we are talking about. 95% of the application is the standardization process. All the user does is pick a few parameters and then start the run.
Microsoft provide a free tool to migrate an Access Database to SQL Server. Once you've upgraded you should be able to change your connection string to point at SQL Server.
You might want to run your app through a profiler to ensure that the Access DB is really what's slowing down your app, and not something else. It would be a shame to go through all the work to convert it over to SQL server, and have nothing to show for it.
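Even without a full profiler, timing each stage of a run shows where the 25-40 minutes go. The sketch below is in Python for illustration only (in VB.NET, `System.Diagnostics.Stopwatch` serves the same purpose); the stage names in the usage note are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, totals):
    """Add the wall-clock time spent inside the `with` block to totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] = totals.get(stage, 0.0) + time.perf_counter() - start


def report(totals):
    """Return (stage, seconds, percent of total) tuples, slowest stage first."""
    grand = sum(totals.values()) or 1.0
    return [(stage, secs, 100 * secs / grand)
            for stage, secs in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)]
```

Usage would wrap each step of the per-file loop, e.g. `with timed("read_excel", totals): ...`, `with timed("access_lookups", totals): ...`, `with timed("write_csv", totals): ...`, and print `report(totals)` at the end. If the lookup stage is a small percentage, moving the database to SQL Server will not buy much.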
The Access upsizing wizard can be used as a starting point.
You may be able to change the backend to be SQL Server with linked tables in Access without changing your front end. Then, you can modify the front end to go directly to SQL Server at will.
Unless you are hitting Access very heavily, I doubt that it is your bottleneck.
As far as reading the Excel files goes, SSIS can do it, but it might not be as reliable as the mechanism you are using in VB.NET right now, if your VB.NET code has a lot of smart logic to deal with a degree of variation in the input files.
As far as writing data out to CSV, SSIS is fine, and I've found SSIS to be a pretty good performer.
If you could give more details about the workflow and how much the user interacts with the database versus the program pulling configuration, it might be easier to help with your architecture.
SSIS is very configurable on the fly (package configuring itself somewhat while it is running), and in many cases it could be programmed to read a variety of Excel files and convert them to CSV, but it's not as configurable on the fly as a hand-coded system. It is also possible to use the SSIS object model to generate packages programmatically and then execute them - this does not have some of the limitations of a package configuring itself, but the object model is pretty complex.
Making sure the scope is clear:
Use a .NET program to
drive an Access database front-end which enables you to
Extract data from a number of Excel spreadsheets,
Massage the data appropriately, and
Save the result in a CSV file.
What sorts of volumes are we talking about? How many clients, how many spreadsheets per client, how many rows per spreadsheet (I think it would be 65,536 max for a single worksheet in the pre-2007 .xls format, right?), and how much time are we talking about?
Seems like a lot of moving parts. And Access usually is a pretty good tool (with VBA) to do this sort of thing by itself.
It doesn't seem like enough volume to provide a major time sink for a well-designed Access database front-ending Excel to accomplish the whole process using VBA. If your alternative involves installing and operating SQL Server (in place of Access) on each client, I would be surprised if the admin and operational overhead doesn't increase.
So, weekly, per client:
250 files in 25 minutes
= 10 files/minute
or 6 seconds per file.
Monthly, per client:
800 files in 40 minutes
= 20 files/minute
or 3 seconds per file.
My expectation would be less than 1 sec. per file (5000 rows) round trip including:
a. Import or attach xls to mdb,
b. Transform via Access SQL
c. Export to csv
The only explanation that comes to mind is that perhaps the .NET app is reading, transforming, and saving a row at a time. Is that possibly the case?
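To make the row-at-a-time versus set-based distinction concrete, here is a minimal Python sketch of the import, transform, export round trip done with bulk statements. It is an illustration under assumptions: SQLite stands in for Access SQL, the workbook is assumed to have already been read into rows, and the `raw`/`xref` layout and the standardized output columns are hypothetical.

```python
import csv
import io
import sqlite3


def standardize(raw_rows, xref):
    """Bulk-load raw rows, map them to a standard layout with ONE set-based
    SQL statement, and return the result as CSV text."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw (acct TEXT, amount TEXT, period TEXT)")
    db.execute("CREATE TABLE xref (source_acct TEXT PRIMARY KEY, std_acct TEXT)")
    db.executemany("INSERT INTO raw VALUES (?, ?, ?)", raw_rows)  # a. import in bulk
    db.executemany("INSERT INTO xref VALUES (?, ?)", xref.items())
    result = db.execute("""
        SELECT x.std_acct, r.period, SUM(CAST(r.amount AS REAL))
        FROM raw r JOIN xref x ON x.source_acct = r.acct  -- unmapped accounts drop out
        GROUP BY x.std_acct, r.period
        ORDER BY x.std_acct, r.period""").fetchall()  # b. transform in one statement
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["account", "period", "amount"])
    writer.writerows(result)  # c. export
    db.close()
    return out.getvalue()
```

The point is the shape, not the tool: three bulk operations per file rather than a read, a lookup and a write for each of the 5000 rows.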
If you convert to SSIS, that probably obsoletes the .NET app, because SSIS will want to handle the ETL (and save) itself, so you will basically be rewriting the software. You may have better resources for SSIS than for Access, but it seems to me like overkill. Then again, .NET rather than VBA is maybe also overkill, and rewriting in VBA is work, too. I think the least effort would be to see whether you can do the entire ETL (and save) using Access SQL for most of it, and use VBA just for scripting, e.g. to iterate through the input files in a directory.
I think you could at least prototype the basic use cases and find out pretty quickly where the time is being spent now (as suggested by earlier responses). That would be worth finding out before committing redevelopment resources aimed at the wrong part of the problem. If you can expand a bit in those areas, I could probably direct you further. But Access is pretty well suited for this sort of thing, at (IMHO) a lower TCO than SQL Server + SSIS + .NET.
Not to mention that I'd be surprised if the csv files are the true end point, which may play a role in the decision. Isn't the Excel data really ending up further down the path?
Finally, how objectionable is a 25-40 minute process that presumably is unattended, can run over lunch break, and maybe basically works ok?
Notes:
Period     Excel files  Minutes  Minutes/file  Sec/file
Per week   250          25       0.1           6
Per month  800          40       0.05          3
