I've been using Shared Data Sources in all of my SSIS projects because I thought it was a "best practice". However, now that everything is under source control (TFS), just about every time I open a package, Visual Studio updates the Data Source connection in the package. I either have to roll the change back or check it in with some nonsense description.
I saw this SSIS Best Practice blog entry and it got me thinking about whether Shared Data Sources are really the way to go.
Don't use Data Sources: No, I don't mean data source components. I mean the .ds files that you can add to your SSIS projects in Visual Studio in the "Data Sources" node that is there in every SSIS project you create. Remember that Data Sources are not a feature of SSIS - they are a feature of Visual Studio, and this is a significant difference. Instead, use package configurations to store the connection string for the connection managers in your packages. This will be the best road forward for a smooth deployment story, whereas using Data Sources is a dead-end road. To nowhere.
What are your experiences with data sources, configuration and source control?
We use SVN, so it doesn't integrate the same way that TFS does. When starting out with SSIS, I used Shared Data Sources, but they got me into all sorts of trouble when I finally deployed the package to run on a schedule. So now I use XML configuration files (package configurations) to provide the connection properties, and I've never had any trouble with these.
So I agree: shared data sources = bad idea / loss of hair.
When we were migrating from SSIS 2005 to 2008, data sources were quite painful. Configurations, on the other hand, are pretty flexible - especially if you store them in one database table, because then you can change connections with just one UPDATE statement!
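For example, assuming the default table and columns that the SSIS package configuration wizard creates (the server names below are placeholders), repointing every configured connection string could look like:

    -- Swap the old server name for the new one in every configured
    -- connection string ('OldServer'/'NewServer' are placeholders).
    UPDATE dbo.[SSIS Configurations]
    SET    ConfiguredValue = REPLACE(ConfiguredValue, 'OldServer', 'NewServer')
    WHERE  PackagePath LIKE '%.Properties[[]ConnectionString]'
      AND  ConfiguredValue LIKE '%OldServer%';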
As a company we have grown, and we are now moving a couple of SQL Server 2016 databases over to a new server. We have SSIS packages that run off the databases we are moving from server 1 to server 2.
Is there a way, using SSMS, to easily identify which SSIS packages use the server and databases we are moving? Some of the old SSIS packages don't have documentation, so we are trying to avoid physically opening up every SSIS package. We would prefer to identify just the SSIS packages that are impacted.
Thank you!
Here are some solutions off the top of my head. I'm not an expert by any means, so don't be surprised if someone comes up with something better.
1. In SSMS, you can view the data sources used by a package via Object Explorer > SQL Server Agent > Jobs > (Your Job) > Steps > Edit... > Data Sources (tab). This is slightly faster than opening all your SSIS packages, but it isn't a great solution either.
2. Alternatively, recognize that .dtsx files are simply plain-text (XML) files. You can scan all of them for keywords using any number of scripts (PowerShell, Python, an SSIS package with a Script Task, etc.). What you can use depends on the tech stack your organization supports, but I imagine Googling for such a program/script would not be difficult. (A related T-SQL variant for packages stored in msdb is sketched below.)
3. If you are utilizing SQL Server configurations in your packages, and you consistently do so for every package, you can query the [SSIS_Configurations].[dbo].[SSIS Configurations] table (see the second sketch below).
NOTE: Solutions (1) and (2) do not take configurations into account.
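If some of your packages are stored in msdb rather than on the file system (the legacy package deployment model), the same keyword-scan idea from (2) works in T-SQL. A sketch, where 'OldServer' is a placeholder:

    -- Find msdb-stored packages whose XML mentions the old server name.
    -- msdb.dbo.sysssispackages holds the package XML in [packagedata].
    SELECT f.foldername, p.[name]
    FROM   msdb.dbo.sysssispackages       AS p
    JOIN   msdb.dbo.sysssispackagefolders AS f ON f.folderid = p.folderid
    WHERE  CAST(CAST(p.packagedata AS varbinary(max)) AS varchar(max))
           LIKE '%OldServer%';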
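And here is a sketch of the configuration query from (3), assuming the default table layout the package configuration wizard creates:

    -- List configured connection strings that mention the old server
    -- ('OldServer' is a placeholder for the server you are moving off).
    SELECT ConfigurationFilter, PackagePath, ConfiguredValue
    FROM   [SSIS_Configurations].dbo.[SSIS Configurations]
    WHERE  PackagePath LIKE '%.Properties[[]ConnectionString]'
      AND  ConfiguredValue LIKE '%OldServer%';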
Hopefully, some of these solutions are helpful to you. I would be interested in an efficient means to do this without delving into scripts as well.
We originally dismissed using database projects in conjunction with TFS as the solution for our deployment and source control needs. However, in the interest of thoroughness, I'm exploring and prototyping it.
I've set up my database project (with add to source control checked). I've checked in the changes. Now, where do you develop from?
I've tried:
- connecting to the remote development server to make changes
- syncing the schema to (localdb)\Projects and making changes there
- working directly in the Source Control Explorer
With options 1 and 2, I don't see an automated way to add code to source control. Am I supposed to be working in the Source Control Explorer? (This seems a little silly.) Is there a way to commit the entire solution to source control? My apologies in advance - I'm a database developer, and this concept of a "solution" is very foreign to me.
Also, there was a lot of chatter about Visual Studio doing a lot of ugly things in the background that turned a lot of development shops off of database projects. Can someone share their experiences with me - some of the pitfalls and gotchas?
And yes, we have looked at Redgate SQL Source Control (very nice tool).
Generally people do one of two things:
Develop in Visual Studio, via the Solution Explorer. Just open the project like you would any other project, add tables, indexes, etc. You even get the same GUI for editing DB objects as you get in SSMS. All changes will automatically be added to TFS Pending changes (just like any other code change), and can be checked in when you're ready.
Deploy the latest DB (using Publish in VS) to any SQL Server, make your changes in SSMS, then do a Schema Compare in Visual Studio to bring your changes back into your DB project so they can be checked into TFS.
I've been using DB projects for many years and I LOVE them! Every developer I've introduced them to refuses to develop without them from that point on.
I'm going to explain briefly how we use DB projects with TFS.
We basically have one DB already built, and if we require any changes or new tables, we create or alter them directly in SQL Server (each developer has their own dev SQL Server).
Then, in VS, we drag the tables we want from the SQL Server Object Explorer into the DB project. When we check in the changes, every user in TFS can get them and then publish the project, which generates and executes a script against the DB.
This is how we develop when we need to add specific tables or records to the DB, so we don't have to email scripts around or store them in a specific location (even one under source control). Anyone can get the latest version of the project and publish it to ensure they have the latest DB version, although it does require the user who made the changes to add them to the DB project.
Another way is to make all the changes directly in the DB project and then publish it (this can be done without any problem). That is arguably the more correct approach, since every change happens directly in a source-controlled project, but as you know, it is always more comfortable to work directly in SSMS.
Hope this helps somehow.
We use the SSDT tools and have implemented the SQL Server Database Project Type to develop our databases:
http://www.techrepublic.com/blog/data-center/auto-deploy-and-version-your-sql-server-database-with-ssdt/
The definitions of database objects and peripheral SQL code (e.g. functions, sprocs, triggers, etc.) sit within the Visual Studio project, and all changes are managed through VS. The interface is very similar to SSMS and, at this point, doesn't cause any issues.
The benefits of this approach for us are as follows:
An existing SQL database can be imported into the SQL Server Project and managed through Visual Studio.
SQL object definitions & code can be managed through the same version control system as the rest of the application code.
SQL Code can be checked for errors within Visual Studio in much the same way as you'd check your C# / VB for compilation / reference errors.
You can compare database schemas (within Visual Studio) between environments and easily identify key changes that you need to be aware of.
The SQL project can be compiled into a DACPAC file for automating deployment to different servers using a CI / Build Server (via the sqlpackage.exe utility, without any custom scripts or code - see the example just after this list).
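For illustration, a command-line publish of the compiled DACPAC might look like this (the file, server, and database names are hypothetical):

    SqlPackage.exe /Action:Publish /SourceFile:"MyDatabase.dacpac" /TargetServerName:"DbServer01" /TargetDatabaseName:"MyDatabase"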
In essence, developers can have a local version of the database to work on, manage any changes through VS, and then publish those changes to their local database. Once the changes are complete, they are committed to your version control system and then built centrally and automatically through a CI / build server to ensure that all changes integrate and play nicely, in much the same way as your other code.
Hope that helps :)
I recently had the unfortunate experience of having the size of a field changed in a table that was being used in SSIS packages. The developer who wrote the packages had since retired, and had grouped them all in one VS BI project. She developed them solely on her local PC and moved them to a shared drive when she left.
Anyone with any knowledge of Protection Levels in SSIS knows what happened next. She saved them with the default EncryptSensitiveWithUserKey option, so as a result I couldn't modify the packages, because I wasn't her and I wasn't on her machine. Her AD account has long since been deactivated, and her machine may be checking people out at a grocery store now for all I know.
I had to recreate the packages from scratch. Fortunately, the protection level prevents you from saving or building, not from looking at things in Design, so I was able to replicate what she did, but it was a long, tedious process that took the better part of a full day to complete.
My question is: We use SourceSafe to maintain our projects, so that's where the new SSIS project will be going. Given the following:
We each have our own PCs with working folders that sync to SourceSafe
We do not have the option of saving the project on the database server itself; we can only deploy the packages.
Software details: MS Visual Studio 2008, VSS 8.0, MS SQL Server 2008 R2
What would be the best security option to configure our project with? I immediately see that DontSaveSensitive would be the logical approach, but I don't know where the passwords would then be supplied from. I would think a config file, but I don't know how to set that up for an SSIS deployment.
Thanks!
Use VSS as the source control provider for Visual Studio and add the SSIS project just like any other source project. After installing VSS, go to Tools\Options\Source Control and set VSS as the provider.
I've been using this setup for several months. There's no difference, as far as I can tell, between any of my C# projects and my SSIS projects in relation to VSS.
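As for the DontSaveSensitive half of the question: with that protection level the package stores no passwords, so you supply them at run time through a package configuration. A minimal XML configuration (.dtsConfig) sketch - the connection name, server, and credentials below are all hypothetical - might look like:

    <?xml version="1.0"?>
    <DTSConfiguration>
      <Configuration ConfiguredType="Property"
                     Path="\Package.Connections[SourceDb].Properties[ConnectionString]"
                     ValueType="String">
        <!-- The full connection string, including the password, lives
             here rather than inside the encrypted package. -->
        <ConfiguredValue>Data Source=MyServer;Initial Catalog=SourceDb;User ID=etl_user;Password=***;</ConfiguredValue>
      </Configuration>
    </DTSConfiguration>

Enable it under SSIS > Package Configurations... in the designer, and point the runtime at the file with dtexec's /ConfigFile option (a SQL Agent job step for an SSIS package has an equivalent Configurations tab). Since the password sits in the file in clear text, lock the file down with NTFS permissions.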
A little background:
I have a remote, stand-alone SQL Server database that is truncated at the end of every weekend. The data is hardly relational, not normalized at all, and pretty annoying to work with. On top of that, the schema for this database cannot be modified at all, because it is recreated by a third-party application. Before the database is destroyed each week, a backup is created of that week's data. On average, each database will have between 500,000 and 2,000,000 records.
My task is to create a historical version of this database that is a superset of all of these database backups. It should tie into our other databases, which contain related sets of information. I have already started on an application to perform this task, and I've gotten to the point where I'm able to match data with our other databases, but I'm wondering if there's any best practice for handling this kind of import.
How do I make sure that I have unique IDs in my historical version of this database? Are there any features in SQL Server that can do some of the heavy lifting in this for me?
Thanks for your time on this.
There's definitely a feature in SQL Server that can assist you, and that feature is called SSIS (SQL Server Integration Services). One of the main uses of SSIS is ETL (Extract, Transform, Load): extracting data from several diverse sources, transforming it into whatever you need to get it into your destination database (such as a data warehouse - any linking with existing data will also happen here), and finally loading it into your destination DB.
I think the best way to get started, if that's what you want of course, is to pick up a good book on SSIS and work through it. While reading, don't forget to play around with BIDS (Business Intelligence Development Studio - one of the SQL Server tools) to create some test packages.
Furthermore, on the internet you'll find plenty of "getting started" articles.
For your case in particular what I would do is:
create a generic package that can import the data from a source DB (one of your weekly DBs) and insert it into the destination DB - this package can be parameterized using Parent Package Configuration.
create a main package that loops over all backups in a certain folder, restores them one by one, and calls the generic import package for each restore (the restore step might look like the sketch below). After each successful import, the Control Flow would delete the previously restored DB.
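For the restore step inside that loop, an Execute SQL Task could run something like the following (the paths and names are hypothetical):

    -- Restore one weekly backup under a temporary name before the
    -- generic import package reads from it (all names are placeholders).
    RESTORE DATABASE WeeklyStaging
    FROM DISK = N'C:\Backups\week_backup.bak'
    WITH MOVE N'WeeklyDb'     TO N'C:\Data\WeeklyStaging.mdf',
         MOVE N'WeeklyDb_log' TO N'C:\Data\WeeklyStaging_log.ldf',
         REPLACE;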
I think I've given you enough material to investigate now :-)
Good luck,
Valentino.
After having used an application for over 10 years, and having been constantly limited by its lack of extensibility, we have decided to rewrite it fully from scratch. Because the new architecture differs from the old application's, the database is also different. Here comes the problem: is there any industrial process for migrating the data from the previous database to the new one? Some tables are alike, others are not. Overall, we need a process that will help us make sure that no data or logical constraints are lost during the migration.
PS: The old and new databases are both Oracle databases.
Although you don't specify this in your question, I assume that you're going to develop the new version of your application/database, and then at some switchover point you need to migrate all of the live data from your old database into your new database.
If this is the case, then you're really asking about two distinct processes: the migration (with some modifications) of the database structure, followed later by the migration of the data itself.
For the first process, the best tool is you, the developer (I don't mean you're a "tool" - you know what I mean). You could bring over the structure of the old database and then change it as necessary for the new version; however, this approach tends to carry too much of the old structure forward. I think it's better to take advantage of the situation and rebuild the database from the ground up, using the original database just as a general reference.
For the second process, I would treat the data migration as a separate task requiring a separately-written and -tested application. This application could be a set of scripts or a compiled application or whatever is most convenient for you. Because your old and new databases will not have the same structure (and may in fact be very different), there are no commercial tools out there that will handle this task for you automagically. By treating this as a distinct application that you write yourself, you can test the data conversion process many times before your "go live" date.
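As a purely illustrative sketch of one such script - every table, column, sequence, and database-link name here is made up - assuming a database link from the new database back to the old one:

    -- Copy and reshape one table from the old schema into the new one,
    -- pulling rows across a database link named old_db.
    INSERT INTO customers (customer_id, full_name, created_at)
    SELECT cust_seq.NEXTVAL,
           c.first_name || ' ' || c.last_name,
           c.creation_date
    FROM   customers_old@old_db c;

Each script stays small enough to test repeatedly against a copy of the live data before the switchover.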
I've heard of several different ways to attack problems such as this. The simplest solution I've seen is to use a Microsoft Access database with ODBC connections to both the new and old Oracle databases. You can then use Access to migrate and transform the data as you need.
The more elegant solution involves installing the Microsoft SQL Server development tools. You can use Business Intelligence Development Studio to create an SSIS package with two Oracle endpoints. SSIS can handle the heavy lifting of transforming the data between the databases, and you can run the package locally, so you don't have to have an instance of SQL Server running anywhere.
There's a tutorial series for SSIS at:
http://www.developerdotstar.com/community/node/364
You might also want to check out Oracle Warehouse Builder (OWB). The name is a little confusing, but it's Oracle's ETL (Extract, Transform, and Load) package. I've never used it personally, but it might do what you're looking to do as well.