I have a database called CommonDB. I have created shared data sets from this database in one of my report projects. I now have a need to include the same shared data sets in another report project. Ideally it would be nice if I could just point it to the testing site in BIDS and develop my report based on a reference.
I was wondering if there is a way to do this without adding an existing data set (as I was hoping to keep the code base the same so I wouldn't have to update it in different projects). I am aware you can also add existing data sets from a URL, but that defeats the purpose as it just downloads a copy to my report solution and it isn't synced.
Any ideas? Thanks :)
This scenario is not supported by BIDS/Visual Studio.
It gets worse: if you deploy multiple projects to the same server, using the same Shared Datasets folder, then at runtime they will try to reuse each other's Shared Dataset definitions. The latest deployed Shared Dataset definition will win.
Data Sources have similar issues - although they tend to be less volatile.
To avoid this, I prefer to keep all reports for a single SSRS server instance in a single BIDS/Visual Studio Project. I use Project Configurations to manage deployment of reports to disparate folders.
I've worked a lot with Pentaho PDI so some obvious things jump out at me.
I'll call Connection Managers "CMs" from here on out.
Obviously, Project CMs > Package CMs for extensibility/reusability. It seems a rare case indeed where you need a Package-level CM.
But I'm wondering another best practice. Should each Project CM itself be composed of variables? (or parameters I guess).
Let's talk in concrete terms. There are specific database sources; let's call two of the ones in use Finance2000 and ETL_Log_db. These have specific connection strings (password, source, etc.).
Now if you have 50 packages pulling from Finance2000 and also using ETL_Log_db ... well ... what happens if the databases change? (host, name, user, password?)
Say it's now Finance3000.
Well I guess you can go into Finance2000 and change the source, specs, and even the name itself --- everything should work then, right?
Or should you simply build a project-level database connection called "FinanceX" or whatever and make it comprised of parameters, so the connection string is something like #Source + #Credentials + #Whatever?
Or is that simply redundant?
I can see one benefit of the parameter method is that you can change the "logging database" on the fly even within the package itself during execution, instead of passing parameters merely at runtime. I think. I don't know. I don't have a mountain of experience with SSIS yet.
SSIS, starting with the 2012 version, has the SSIS Catalog DB. You can create all 50 of your packages in one Project, and all these packages share the same Project Connection Managers.
Then you deploy this Project into the SSIS Catalog; the Project automatically exposes Connection Manager parameters with a CM prefix. The CM parameters are part of the Connection Manager definition.
In the SSIS Catalog you can create so-called Environments. In an Environment you define variables with a name and data type, and store their values.
Then - the most interesting part - you can associate the Environment with the uploaded Project. This allows you to bind a project parameter to an environment variable.
At package execution you specify which Environment to use, and thereby which connection strings. Yes, you can have several Environments in the Catalog and choose one when starting the package.
Cool, isn't it?
Moreover, passwords are stored encrypted, so no one can copy them. The values of these Environment Variables can be configured by support engineers who have no knowledge of SSIS packages.
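For reference, the whole flow can also be scripted directly against SSISDB instead of clicking through SSMS. A minimal sketch, assuming a folder "ETL", a deployed project "FinanceLoads", a package "LoadFinance.dtsx" and a project connection manager named Finance2000 - all of these names are placeholders, not anything prescribed above:

-- 1. Create an Environment and a variable holding the connection string.
EXEC SSISDB.catalog.create_environment
     @folder_name = N'ETL', @environment_name = N'General';

EXEC SSISDB.catalog.create_environment_variable
     @folder_name = N'ETL', @environment_name = N'General',
     @variable_name = N'FinanceConnStr', @data_type = N'String',
     @sensitive = 0,   -- use @sensitive = 1 for passwords; the value is then stored encrypted
     @value = N'Data Source=myhost;Initial Catalog=Finance2000;Integrated Security=SSPI;';

-- 2. Reference the Environment from the project (reference_type 'R' = same folder).
DECLARE @reference_id BIGINT;
EXEC SSISDB.catalog.create_environment_reference
     @folder_name = N'ETL', @project_name = N'FinanceLoads',
     @environment_name = N'General', @reference_type = 'R',
     @reference_id = @reference_id OUTPUT;

-- 3. Bind the exposed CM parameter to the environment variable (value_type 'R' = referenced).
EXEC SSISDB.catalog.set_object_parameter_value
     @object_type = 20,   -- 20 = project-level parameter
     @folder_name = N'ETL', @project_name = N'FinanceLoads',
     @parameter_name = N'CM.Finance2000.ConnectionString',
     @parameter_value = N'FinanceConnStr', @value_type = 'R';

-- 4. Run a package using that environment reference.
DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name = N'ETL', @project_name = N'FinanceLoads',
     @package_name = N'LoadFinance.dtsx',
     @reference_id = @reference_id, @execution_id = @execution_id OUTPUT;
EXEC SSISDB.catalog.start_execution @execution_id;

Changing the variable's value later (say, when Finance2000 becomes Finance3000) updates every package that references it, without touching the packages themselves.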
More Info on SSIS Catalog and Environments from MS Docs.
I'll give my fair share of experience.
I recently had a similar experience at work: our two main databases' names changed, and I had no issues or downtime on the schedules.
The model we use is not the best, but for this and other reasons it is quite comfortable to work with. We use BAT files to pass named parameters into a "Master" Job, and depending on two of those parameters the Job runs against an alternate database/host.
Concretely, in every KTR/KJB we use the variables ${host} and ${dbname}, and these parameters are passed in by each BAT file. So when we had to change the names of the hosts and databases, it was a simple Replace All text match in Notepad++ and done: 2,000+ BAT files fixed, and no downtime.
Having a variable for the Host/DB Name for both Client Connection and Logging Connection lets you have that flexibility when things change radically.
You can also use the kettle.properties file for the logging connection.
I've read about the use of Catalogs in 2012/14 SSIS as a replacement for Configurations in 2008. With that replacement, I haven't seen how people handle the scenario of a configuration that is used by all packages on the server, such as a server connection or path location. In that scenario, all packages point to one configuration, and should something about that value change, all packages are updated. Is this possible with catalogs? It seems each project has its own catalog, and if that is the case, every time a server-wide config/parameter changes, it needs to change in each project.
In the SSISDB, a project lives under a folder. A folder may also contain an SSIS Environment.
When you right click on a project (or package) and select Configure, this is where you would apply configurations, much as you did in 2008. You can use an SSIS Environment that exists in the same folder as the projects, or you can reference one in a different folder. That is the approach I use and suggest to people.
In my Integration Services Catalog, I have a folder called "Configurations" (because it sorts higher than Settings). Within that, I create one Environment called "General". Many people like to make environments called Dev, Test, Prod but unless you have 1 SSIS server handling all of those, I find the complexity of getting my deployment scripts nice and generic to be much too painful.
I then deploy my projects to sanely named folders so the Sales folder contains projects like SalesLoadRaw, SalesLoadStaging, SalesLoadDW.
If I have created a new project, then I need to add a reference to the Configurations.General collection and then associate the project item with the Environment item. For connection strings, you do not need to define a variable to accept the string; you can directly assign to the properties of a connection manager (either project or package scoped).
The great thing about Configurations is that once you've assigned them, they persist through redeploys of the project.
The biggest thing that tends to bite people in the buttocks is that when you create an Environment and add those entries to it, DO NOT CLICK OK. Instead, click the Script button and script them to a new window. Otherwise, you have to recreate all those entries for your dev/test/load/stage/production environments. I find it far cleaner to script once and then modify the values (SLSDEV to SLSPROD) versus trying to create them all by hand.
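To give an idea of what that scripted-out version roughly looks like, here is a hedged sketch against SSISDB. The Configurations/General/Sales/SalesLoadRaw names and the SLSDEV value follow the examples above; the variable name and connection string are just placeholders:

-- Create the shared Environment once, script it, and re-run with modified
-- values (e.g. SLSDEV -> SLSPROD) on the other servers.
DECLARE @folder_id BIGINT;
EXEC SSISDB.catalog.create_folder
     @folder_name = N'Configurations', @folder_id = @folder_id OUTPUT;
EXEC SSISDB.catalog.create_environment
     @folder_name = N'Configurations', @environment_name = N'General';

EXEC SSISDB.catalog.create_environment_variable
     @folder_name = N'Configurations', @environment_name = N'General',
     @variable_name = N'SalesDBConnStr', @data_type = N'String', @sensitive = 0,
     @value = N'Data Source=SLSDEV;Initial Catalog=Sales;Integrated Security=SSPI;';

-- Reference the shared Environment from a project in a different folder
-- (reference_type 'A' = absolute, so the environment's folder must be named).
DECLARE @ref BIGINT;
EXEC SSISDB.catalog.create_environment_reference
     @folder_name = N'Sales', @project_name = N'SalesLoadRaw',
     @environment_name = N'General', @environment_folder_name = N'Configurations',
     @reference_type = 'A', @reference_id = @ref OUTPUT;

Because every project references the single General environment, a server-wide value only has to change in one place.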
I am working on bringing an application comprised of several SQL Server databases into a source control system. I've come up with a solution containing several database projects, each representing one of the databases, but in order for it to compile, database references need to be defined between several of the projects, or I get errors about missing dependencies and the like. With the references set up, the solution compiles fine.
However, when publishing the solution, I need to publish the referenced database projects first if I don't want to get 'Invalid object name' script publishing errors in the projects that reference them. I would like to have this configured so I can publish by just clicking the 'publish solution' button. Is there a way to define 'publication dependencies', similar to the compilation dependencies, that would allow me to do this?
I had something like this - had to turn off transactions (so I wouldn't roll back), then publish several times. In my instance, I only had one DB that required that, but I had to:
Publish DB A
Publish DBs B/C/D/E
Re-publish DB A to catch the rest of the objects.
I had a special Publish profile set up that did not use transactions, just for this purpose. Other than that, you're probably going to have to keep some kind of track of the dependencies if you create a lot of new objects that depend on other databases.
I have not come across any publication dependencies so far, but it would be helpful to have something like that to avoid these sorts of issues.
We have two databases, A and B. A contains tables that should also be deployed to B; however, A should always be considered as the master of those tables. We don't want to duplicate the schema object scripts. We do not want to simply reference A's table from B - they need to be separate, duplicated tables.
As far as I can see, there are two ways to achieve this:
Partial projects: export the shared schema objects to a partial project (.files) file, and import it into B's database project
Adding the shared schema object files to B's database project as links.
These both have the disadvantage that you need to explicitly specify files - you cannot specify a folder, meaning that any time a schema object that needs sharing is added to A's database project, then either the partial project export would need to be run again, or the new file added as a link to B's project.
What are the advantages and disadvantages of these techniques? Are there any better ways of achieving this that I may have missed? Thanks.
Partial Projects are not supported in the VS2012 RC release, which makes me think that I shouldn't use them. Furthermore, it appears that Composite Projects within SQL Server Data Tools (SSDT) may be the long term solution.
I've discovered that it is possible to link entire folders by editing the dbproj file manually. For example:
<!-- Link every table script from the source database project into this project -->
<Build Include="..\SourceDatabase\Schema Objects\Tables\*.sql">
  <SubType>Code</SubType>
  <Link>Schema Objects\Tables\%(FileName).sql</Link>
</Build>
This works quite nicely, so it will be our preferred solution until we evaluate SSDT. The main drawback that I've found with it so far is that it will include all files in the source database's folders, including those that are not included in the source database project.
We are building a web app which is shipped to several clients as a Debian package. Each client runs their own server, but the updates and support are done by us.
We make regular releases of the product, with a clean version number. Most of the users get an automatic update (via Puppet); some others don't.
We want to keep a trace of the version of the application (in order to allow the user to check the version in an "about" section, and for our support to help the user more accurately).
We plan to store the version of the code and the version of the database in our database, and to keep this info up to date automatically.
Is that a good idea?
The other alternative we see is a file.
EDIT: The code and database schema are updated together (if we update to version x.y.z, both code and database go to x.y.z).
Using a table to track every change to a schema as described in this post is a good practice that I'd definitely suggest to follow.
For the application, if it is shipped independently of the database (which is not clear to me), I'd embed a file in the package (and thus not use the database to store the version of the web application).
If not, i.e. if both the application and the database versions are maintained in sync, then I'd just use the information stored in the database.
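If the database route is chosen, the usual pattern is a small version table that every upgrade script appends to as its last step; the "about" page then just reads the latest row. A minimal sketch, assuming SQL Server syntax (adapt to your engine) and illustrative names that are not prescribed anywhere in the question:

-- Minimal version-tracking table; names and sample values are illustrative only.
CREATE TABLE dbo.schema_version (
    version     NVARCHAR(32)  NOT NULL,                          -- e.g. '2.4.1'
    applied_at  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    description NVARCHAR(256) NULL
);

-- Each upgrade script appends one row as its final step.
INSERT INTO dbo.schema_version (version, description)
VALUES (N'2.4.1', N'Add invoice archiving tables');

-- The "about" page reads the latest row.
SELECT TOP (1) version, applied_at
FROM dbo.schema_version
ORDER BY applied_at DESC;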
As a general rule, I would have both the DB version and the application version. The problem here is how "private" the database is. If the database is "private" to the application, and users never modify the schema, then your initial solution is fine. In my experience, databases which accumulate several years of data stop being private: users add a table or two and access the data using some reporting tool, and from that point on the database is no longer used exclusively by the application.
UPDATE
One more thing to consider is the user (or application) not being able to connect to the DB and calling for support. For that case it would be better to have the version, etc., stored on the file system as well.
Assuming there are no compelling reasons to go with one approach or the other, I think I'd go with keeping them in the database.
I'd put them in both places. Then when running your about function you quickly check that they are both the same, and if they aren't you can display extra information about the version mismatch. If they're the same then you will only need to display one of them.
I've generally found users can do "clever" things like reverting databases back to old versions by manually copying directories around "because they can", so dealing with it defensively is always a good idea.