How to design a versioning table inside SQL Server 2016

I will preface this right off the bat by saying that I am new to database design.
I have been working to rewrite some legacy code that controls an import process into one of our pieces of software. Part of this new process includes the modification of the incoming XML files (that come into our system via FTP) to remove certain elements as well as swap values in special cases.
As a part of the new system, we are implementing a way of versioning inside the database so that we can pull the most recent version of the XML directly from the database instead of modifying the file over and over again. To prove that this can be done, I created a very simple table inside SQL Server 2016 that stores the XML, then wrote a simple PowerShell script to pull that XML from the database and store it inside an object. Now that I know this is possible, I need to refine how I design the table.
This is where my expertise starts to take a hit. As of right now, the table contains three columns: xml_Version, xml_FileID, and xml_FileContents.
The general idea is to have a GUID (xml_FileID) that is tied to each version of the XML and another column that indicates which version of the XML that is. I assume that I also need some way of tying each version of the XML to its original file.
I was hoping that someone could point me in the right direction about how I should go about designing the table to accomplish this task. I can provide more information if needed.
Thanks.
Edit: I think what I'm having the most trouble grasping is what I should be referencing when I'm trying to grab data out of the database. Storing the XML in the table with a unique identifier is the easy part - but the unfortunate part is that there's nothing in the XML itself that I can grab out of there that would be able to uniquely identify the correlating data within the database. Does that make sense?
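A minimal sketch of one possible layout, not a definitive design. It uses Python's built-in sqlite3 module as a stand-in for SQL Server 2016, keeps the three columns named in the question, and adds a hypothetical xml_SourceName column (for example, the incoming FTP file name) as the lookup key, on the assumption that something outside the XML has to identify the original file.

```python
import sqlite3
import uuid


def create_schema(conn):
    # xml_SourceName is a hypothetical column: some identifier that lives
    # outside the XML, such as the original file name from the FTP drop.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS xml_versions (
            xml_FileID       TEXT PRIMARY KEY,
            xml_SourceName   TEXT NOT NULL,
            xml_Version      INTEGER NOT NULL,
            xml_FileContents TEXT NOT NULL,
            UNIQUE (xml_SourceName, xml_Version)
        )
        """
    )


def save_version(conn, source_name, contents):
    """Store a new version for source_name and return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(xml_Version), 0) FROM xml_versions "
        "WHERE xml_SourceName = ?",
        (source_name,),
    ).fetchone()
    next_version = row[0] + 1
    conn.execute(
        "INSERT INTO xml_versions "
        "(xml_FileID, xml_SourceName, xml_Version, xml_FileContents) "
        "VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), source_name, next_version, contents),
    )
    return next_version


def latest_version(conn, source_name):
    """Return (version, contents) for the newest version, or None."""
    return conn.execute(
        "SELECT xml_Version, xml_FileContents FROM xml_versions "
        "WHERE xml_SourceName = ? ORDER BY xml_Version DESC LIMIT 1",
        (source_name,),
    ).fetchone()
```

Every modification inserts a new row rather than updating an old one, so the history stays intact and "most recent" is just the highest xml_Version for a given source. The UNIQUE constraint is there so that two writers cannot both claim the same version number.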

Related

Is there a way to export an Informatica maplet's 'graphical' data to a simple csv/Excel file?

The firm I work in has a lot of data sources entering the firm database using the Informatica ETL tool, stored in maplets and other data models (sorry If I'm not using the exact terminology).
The problem is that all the business logic is stored in the 'graphical interface' and nowhere else - Every time I want to see what field goes into the target field I have to trace the inputs through the maplet and that takes a very long time.
The question is: is there a tool that can take all the relationships in the Informatica maplet and somehow export them to an Excel table (so I can see it all without tracing)? That way I could try to make proper documentation.
Thanks in Advance.

It's possible to export mappings or whole workflows to XML. Next, you can use this tool - it will create tables with source to target dependency for every mapping.
Keep in mind it will only map input to output; it won't extract the full logic and transformations done along the way - that would be too complex for a simple visualization.

Informatica supports exporting mapping information to Excel - just search the documentation which tells you how to do it.
However, for anything other than the simplest of mappings, what ends up in Excel is not that easy to understand. If your Informatica installation supports it, then using the lineage capabilities is a much better bet.

How to pass database name dynamically to RoundHousE

I am trying to set up the RoundHousE project in my application to handle database migration and versioning. I am following this article. It works fine as long as I know the database name exactly.
But I am not able to find out how I should handle dynamic database names, because in my application I have a separate database for each client, and the list of these databases is kept in a table in my main database. So the names go like: client1_db, client2_db etc.
Any solution or pointer towards the solution will be a great help.

A pointer towards the solution - The wiki https://github.com/chucknorris/roundhouse/wiki
Asking about passing the database name dynamically is a bit weird to me as when you run rh.exe or use the embedded DLL, database name is one of the required arguments. So you always have to pass the name dynamically. See https://github.com/chucknorris/roundhouse/wiki/ConfigurationOptions#main-stuff
Reading and trying to understand what you are asking, it seems you have a list of database names in a main database somewhere that you want to give to RoundhousE? To do that you would need to create something custom that can gather the name of the database(s) you are looking for and provide the result to RoundhousE.
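The "something custom" mentioned above could be as small as a loop that calls rh.exe once per database. The sketch below is only an illustration: the option names (--database, --server, --files, --silent) are as I understand them from the ConfigurationOptions wiki page and should be checked against your RoundhousE version, and the list of database names is passed in as a plain list, whereas in practice it would be read from the table in the main database.

```python
import subprocess


def build_rh_command(database, server, sql_dir, rh_path="rh.exe"):
    """Build one rh.exe invocation for a single client database."""
    return [
        rh_path,
        f"--database={database}",
        f"--server={server}",
        f"--files={sql_dir}",
        "--silent",  # assumed flag for non-interactive runs
    ]


def migrate_all(database_names, server, sql_dir, run=subprocess.run):
    """Run RoundhousE against every database and collect the exit codes."""
    exit_codes = {}
    for name in database_names:
        completed = run(build_rh_command(name, server, sql_dir), check=False)
        exit_codes[name] = completed.returncode
    return exit_codes
```

A call such as migrate_all(["client1_db", "client2_db"], "(local)", "db") would then run the same migration scripts against each client database in turn; a non-zero exit code marks a database that needs attention.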

Generating several similar SSIS packages (file data source to DB)

Is there a way to automatically generate SSIS packages? I need to create a lot of SSIS packages that just erase data from one table and import data from a text file. The file name matches table name and the column headers are in the first line of the file.
For more detailed information:
I am working on a project in which I have to separate two systems that are currently coupled (one system has direct access to the other's database). After the modifications, one system will provide data through txt files to be loaded in the other database.
We have to use SSIS to load data into the database from the text files.
The text files will be provided in CSV format with column headers in the first line.
The tables from both databases have matching column names, and all we need to do is clear the table and load data from the files.
I have more than one hundred tables with different number of columns. Do I need to create each package manually?

I'm familiar with 2 free options.
EzAPI might be a good place to start if you're a .NET heavy shop or just really want to geek out with the API. This approach allows you to control pretty much the entire package generation, but at the cost of coding time. I find EzAPI generally easier than working with the base COM/.NET libraries for SSIS.
Biml is an interesting beast. Varigence will be happy to sell you a license to Mist but it's not needed. All you would need is BIDSHelper and then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context sensitive menu button in BIDSHelper and whoosh, it generates packages.

I did this using just VB. I passed in the table names as a command parameter and used VB to generate the insert and clear statements, and it worked a charm. I can try and dig it out tomorrow when I'm back in the office, but it was pretty simple. There didn't seem to be any other way to say "just get x and export it" or "just take y and import it into z", so VB it had to be. In fact, come to think of it, I think I actually used a small XML file to pass the table info for export and then determined the table name for import from the CSV file name. To be clear, this was only one package, but it could dynamically choose the number of imports/exports it did. Further clarification: this was VB within SSIS as a processing step.
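The original VB is not shown, so the following is only a rough sketch of the same idea in Python rather than VB inside SSIS: derive the table name from the CSV file name, read the column names from the header line, and generate the clear and insert statements. The function names and the use of "?" parameter markers are assumptions made for illustration.

```python
import csv
from pathlib import Path


def quote_name(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_load_statements(file_name, lines):
    """Return (clear_sql, insert_sql) for one CSV file.

    file_name: e.g. "Customers.csv"; the table name is the name without extension.
    lines: any iterable of CSV text lines (an open file works); only the
           header line is consumed here.
    """
    table = quote_name(Path(file_name).stem)
    header = next(csv.reader(lines))
    columns = ", ".join(quote_name(column.strip()) for column in header)
    markers = ", ".join("?" for _ in header)
    clear_sql = f"DELETE FROM {table};"
    insert_sql = f"INSERT INTO {table} ({columns}) VALUES ({markers});"
    return clear_sql, insert_sql
```

Because both statements come entirely from the file name and header, the same code covers all hundred-plus tables without a hand-built package per table, as long as the file and column names really do match the target tables.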

storing database values in source control

We have a table in our database that stores XSLs and XSDs that are applied to XML documents created in our application. This table is versioned in the sense that each time a change is made, a new row is created.
I'm trying to propose that we store the XSL's and XSD's as files in our Source control system instead of relying on the database to track the history. Each time a file is updated, we would deploy the new version to the database.
I don't seem to be getting much agreement on the issue. Can anyone help me out with the pros and cons of this approach? Perhaps I'm missing something.

XSL and XSD files are part of the application and so ought to be kept under source control. That's just obvious. Even if somebody wanted to categorise them as data they would be reference data and so - in my book at least - would need to be kept under source control. This is because reference data is part of the application and so part of its configuration. For instance, applications which use the database to store values for drop downs or to implement business rules need to be certain that it holds the right version of the data.
The only argument for keeping multiple versions of the files in the database would be if you might need to process older versions of the XML files. This depends on the nature of your application. Certainly I have worked on systems where XML files / messages came from external (third party) systems, where we really had no control over the format of the messages sent. So for a variety of reasons we needed to be able to handle incoming XML regardless of whether its structure was current or historical. But this is in addition to storing the files in a source control repository, not instead of.

How should I rename many Stored Procedures without breaking stuff?

My database has had several successive maintainers over the years and any naming guidelines that may have once been in place have been ignored.
I'd like to rename the stored procedures to a consistent format. Obviously I can rename them from within SQL Server Management Studio, but this will not then update the calls made in the website code behind (C#/ASP.NET).
Is there anything I can do to ensure all calls get updated to the new names, short of searching for every single old procedure name in the code? Does Visual Studio have the ability to refactor such stored procedure names?
NB I do not believe my question to be a duplicate of this question as the latter is solely about renaming within the database.

You could make the change in stages:
Copy the stored procedures to new stored procedures under their new names.
Alter the old stored procedures to call the new ones.
Add logging to the old stored procedures when you've changed all the code in the website.
After a while when you're not seeing any calls to the old stored procedures and you're happy you've found all the calls in the web site, you can remove the old stored procedures and logging.

You can move the 'guts' of the SPROC to a new SPROC meeting your new naming conventions, and then leave the original sproc as a shell / wrapper which delegates to the new SPROC.
You can also add an 'audit' table to track when the old wrapper SPROC is called - this way you will know that there are no dependencies on the old SPROC, and the old SPROC can be safely dropped (also, make sure that it isn't just 'your app' using the DB - e.g. cross database joins or other apps)
This has a small performance penalty, and won't really buy you that much (other than being able to 'find' your new SPROCs easier)
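As a rough sketch of the wrapper-plus-audit idea, the Python helper below generates the T-SQL for one wrapper. The audit table dbo.SprocAudit and its columns are hypothetical, and the parameter list has to be supplied by hand; output parameters and return codes are not handled.

```python
def build_wrapper_sql(old_name, new_name, parameters, audit_table="dbo.SprocAudit"):
    """Generate T-SQL that turns old_name into a logging shell around new_name.

    parameters: list of (name, sql_type) pairs, e.g. [("@Id", "int")].
    audit_table: hypothetical table with ProcName, AppName, LoginName, CalledAt.
    """
    header = f"ALTER PROCEDURE {old_name}"
    if parameters:
        declarations = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in parameters)
        header += "\n    " + declarations
    arguments = ", ".join(f"{name} = {name}" for name, _ in parameters)
    exec_line = f"EXEC {new_name} {arguments}".rstrip() + ";"
    literal = old_name.replace("'", "''")  # escape quotes for the string literal
    return "\n".join([
        header,
        "AS",
        "BEGIN",
        "    -- Record the legacy call so unused wrappers can be found later.",
        f"    INSERT INTO {audit_table} (ProcName, AppName, LoginName, CalledAt)",
        f"    VALUES ('{literal}', APP_NAME(), SUSER_NAME(), GETDATE());",
        "    -- Delegate to the renamed procedure with the same arguments.",
        f"    {exec_line}",
        "END",
    ])
```

Once the audit table shows no rows for a given old name over a long enough period, that wrapper is a candidate for dropping.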

You will need to handle this in at least two areas, the application and the database. There could be other areas as well, and you have to be careful not to overlook them.
The Application
A Nice Practice for Future Projects
It helps to abstract your sprocs out. In our apps, we wrap all of our sprocs in a giant class, so I can make calls like this:
Dim SomeData as DataTable = Sprocs.sproc_GetSomeData(5)
That way, the code end is nice and encapsulated. I can go into Sprocs.sproc_GetSomeData and tweak the sproc name in just one place, and of course I can right click on the method and do a symbolic rename to fix the method call solution-wide.
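The same encapsulation, sketched in Python rather than VB.NET purely for illustration. The execute callable is a hypothetical stand-in for whatever actually talks to the database; the point is only that the procedure name appears in exactly one place.

```python
class Sprocs:
    """Single place where stored procedure names appear in application code."""

    def __init__(self, execute):
        # execute(proc_name, params) is a hypothetical callable that wraps
        # the real database driver and returns the procedure's result.
        self._execute = execute

    def get_some_data(self, record_id):
        # If the procedure is renamed in the database, only this string
        # changes; every caller keeps using get_some_data().
        return self._execute("sproc_GetSomeData", (record_id,))
```

Callers would write something like Sprocs(execute).get_some_data(5) and never mention the procedure name themselves.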
Without the Abstraction
Without that abstraction, you can just do Find In Files (Ctrl+Shift+F) for the sproc name and then, if the results look right, open the files up and Find/Replace all the occurrences.
The Sql Server
Don't Trust View Dependencies
On the SQL server end, theoretically in MSSMS 2008 you can right click on a sproc and select View Dependencies.
That should show you a list of all the places where the sproc is used in the database, however my confidence in this feature is very low. It might be better in SQL 2008, but in previous versions it definitely had problems.
View Dependencies hurt me, and it will take time for that to heal. :)
Wrap It!
You end up having to keep the old sproc around for a while. This is the major reason why renaming sprocs is such a project - it can take a month to finally be done with it.
First replace its contents with some simple TSQL that calls the new sproc with the same parameters, and write some logging so that once some time goes by, you can tell if the old sproc is actually unused.
Finally, when you're sure the old sproc is unused, delete it.
Other Areas?
There could be a lot of other areas as well. Reporting Services springs to mind. SSIS packages. Using the technique of keeping the old sproc around and re-routing to the new one (mentioned above) will help you know if you missed anything, however it won't tell you what you missed. This can lead to much pain!
Good luck!

Short of testing every path in your application to ensure that any calls to the database and the relevant stored procedures have been updated... no.
Use global search and replace (but review each suggested replacement) to try to avoid missing any instances. If your app is well structured then there really should only be one place where each stored proc is called.

As far as changing your application, I have all my stored procs as settings in the web.config file, so all the names are in one place and can be changed at any time to match changes to the database.
When the application needs to call a stored proc, the name is determined from web.config.
This makes it easier to manage all the potential calls which the application could make to the database services layer.
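A small sketch of that lookup, in Python for illustration only. It assumes the names sit in appSettings entries of web.config and that their keys share a hypothetical "Sproc." prefix; neither detail comes from the answer above.

```python
import xml.etree.ElementTree as ET


def load_procedure_names(config_xml, prefix="Sproc."):
    """Map logical keys to stored procedure names from <add key=... value=.../> entries."""
    root = ET.fromstring(config_xml)
    names = {}
    for node in root.iter("add"):
        key = node.get("key", "")
        if key.startswith(prefix):
            names[key] = node.get("value")
    return names
```

The application then asks for names["Sproc.GetCustomer"] (a made-up key) instead of hard-coding the procedure name, so a rename in the database only needs a config change.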

It will be a bit of a tedious search through your source code and other database objects I'm afraid.
Don't forget SSIS Packages, SQL Agent Jobs, Reporting Services rdl as well as your main application code.
You could use a regular expression like spProc1|spProc2 to search in the source code for all object names at the same time if you have a tool that supports searching through files using regular expressions (I have used RegexBuddy for this in the past)
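If no such tool is at hand, a short script can do the same combined search. This is a minimal sketch: the file extensions are guesses at what a typical project contains, and files are read as UTF-8, so scripts saved in another encoding (UTF-16, for example) would need different handling.

```python
import re
from pathlib import Path


def build_pattern(procedure_names):
    """Combine all names into one alternation, e.g. spProc1|spProc2."""
    alternatives = "|".join(re.escape(name) for name in procedure_names)
    return re.compile(r"\b(?:" + alternatives + r")\b")


def scan_text(text, procedure_names):
    """Return (line_number, name) for every whole-word match in text."""
    pattern = build_pattern(procedure_names)
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


def scan_tree(root, procedure_names, suffixes=(".cs", ".aspx", ".sql", ".rdl", ".dtsx")):
    """Search every file under root with a matching extension."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_number, name in scan_text(text, procedure_names):
                results.append((str(path), line_number, name))
    return results
```

The word boundaries keep spProc1 from matching inside a longer name such as spProc10, which a plain substring search would report.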
If you want to just cover the possibility you might have missed the odd one you could leave all the previous stored procedures behind for a month and just have them log a custom SQL trace event with APP_NAME(), SUSER_NAME() and any other info you find helpful then have it call the renamed version. Then set up a trace monitoring this event.

If you use a connection to DB, stored procedures etc, you should create a service class to delegate these methods.
This way when something in your database, SP etc changes, you only have to update your service class, and everything is protected from breaking.
There are tools for VS that can manage changing a name, like the built-in refactoring support and ReSharper.

I did this and I relied heavily on global search in my source code for stored procedure names and SQL Digger to find SQL procs that called other SQL procs.
http://www.sqldigger.com/
SQL Server (as of SQL 2000) poorly understands its own dependencies, so one is left searching the text of the scripts to find dependencies, which could be other stored procs or substrings of dynamic SQL.

I would obtain a list of references to a procedure by using the following, because SSMS dependencies doesn't pick up dynamic SQL references or references outside the database.
SELECT OBJECT_NAME(m.object_id), m.*
FROM SYS.SQL_MODULES m
WHERE m.definition LIKE N'%my_sproc_name%'
The SQL needs to be run in every database where there could be references.
syscomments and INFORMATION_SCHEMA.routines have nvarchar(4000) columns. So if "mySprocName" is used at position 3998, it won't be found. syscomments does have multiple lines but ROUTINES truncates. Should you disagree, take it up with gbn.
Based on that list of dependencies, I'd create new stored procedures starting with the foundation stored procedures - those with the fewest dependencies. But I'd take care not to prefix the new names with "sp_".
Verify the foundation procedures work identically to existing ones
Move to the next level of stored procedures - repeat steps 1-3 as needed till the highest level procedure has been processed.
Test the switch over the application uses to the new procedure - don't wait until the all the procedures are updated to test interaction with the application code. This doesn't need to be done for every stored procedure, but waiting to do this wholesale isn't a great approach either.
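The "foundation first" order can be worked out mechanically once the dependency list exists. A minimal sketch, assuming the dependencies have already been collected into a dictionary by hand or from the query above; it uses graphlib from the Python standard library (3.9 or later), and the procedure names in the usage line are made up.

```python
from graphlib import TopologicalSorter


def migration_order(calls):
    """Order procedures so that each one comes after everything it calls.

    calls: dict mapping a procedure name to the set of procedures it calls.
    Raises graphlib.CycleError if procedures call each other in a loop.
    """
    return list(TopologicalSorter(calls).static_order())
```

For example, migration_order({"GetInvoice": {"GetCustomer"}, "GetCustomer": set()}) puts GetCustomer before GetInvoice, so the foundation procedure is migrated and verified before anything that depends on it.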
Developing in parallel has its risks too:
Any changes to existing code needs to also be applied to the new code. If possible, work in areas where development is frozen or use a bug fix as an opportunity to migrate to new code rather than apply the patch in two places (while also minimizing downtime for transition).

Use a utility like FileSeek to search the contents inside each and every file in your project folder. Don't trust the windows search - it's slow and user-unfriendly.
So if you had a stored procedure named OldSprocOne and want to rename it to SP_NewONe, search for all occurrences of OldSprocOne, then search for all occurrences of SP_NewONe to check that the new name isn't already being used somewhere else and won't cause problems. Then rename each and every occurrence in the code.
This can be very time consuming and repetitive for larger systems.

I would be more concerned about ignoring the names of the procedures and replacing your legacy DAL with Enterprise Library Data Access Block 5
Database Accessors in Enterprise Library 5 DAAB - Database.ExecuteSprocAccessor
Having code that is like
public Contact FetchById(int id)
{
    return _database.ExecuteSprocAccessor<Contact>
        ("FetchContactById", id).SingleOrDefault();
}
will have at least a billion times more value than having stored procs with consistent names, especially if the current code passes around DataTables or DataSets ::shudders::

I'm all in favor of refactoring any sort of code.
What you really need here is a method of slowly and incrementally renaming your stored procs.
I certainly would not do a global find and replace.
Rather, as you identify small pieces of functionality and understand the relationships between the procs, you can re-factor in small pieces.
Fundamental to this process, though, is source-code control of your database.
If you do not manage changes to your database the same as normal code, you will be in serious trouble.
Have a look at DBSourceTools. http://dbsourcetools.codeplex.com
It's specifically designed to help developers get their databases under source code control.
You need a repeatable method of restoring your database to a specific state - prior to refactoring.
Then re-apply your refactored changes in a controlled way.
Once you have embraced this mindset, this mammoth and error-prone task will become simple.

This is assuming that you use SQL Server 2005 or above. An option that I have used before is to rename the old database object and create a SQL Server synonym with the old name. This allows you to update your objects to whatever convention you choose and replace the references in code, SSIS packages, etc. as you come across them. Then you can concentrate on updating the references in your code gradually, over however many maintenance releases you choose (as opposed to breaking them all at once). Once you feel that you've found all references, you can remove the synonym as the code goes to QA.
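A rough sketch of the statements involved, generated from Python so that a list of renames can be scripted. The helper names are made up, and the use of sp_rename for the rename step is my assumption rather than something stated above: as far as I know sp_rename does not update the procedure name stored in the definition column of sys.sql_modules, so dropping and recreating the procedure under its new name may be the safer route.

```python
def build_synonym_rename_sql(schema, old_name, new_name):
    """Return T-SQL that renames a procedure and leaves a synonym under the old name."""
    return [
        # Assumed rename step; see the caveat about sys.sql_modules above.
        f"EXEC sp_rename '{schema}.{old_name}', '{new_name}', 'OBJECT';",
        # Existing callers of the old name are redirected to the new object.
        f"CREATE SYNONYM [{schema}].[{old_name}] FOR [{schema}].[{new_name}];",
    ]


def build_synonym_cleanup_sql(schema, old_name):
    """Return T-SQL that removes the synonym once all references are updated."""
    return f"DROP SYNONYM [{schema}].[{old_name}];"
```

The two rename statements would be run together per procedure, and the cleanup statement only once no caller uses the old name any more.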
