Is there a way to automatically generate SSIS packages? I need to create a lot of SSIS packages that each just erase the data in one table and import data from a text file. The file name matches the table name, and the column headers are in the first line of the file.
In more detail:
I am working on a project in which I have to separate two systems that are currently coupled (one system has direct access to the other's database). After the modifications, one system will provide data through txt files to be loaded into the other system's database.
We have to use SSIS to load data into the database from the text files.
The text files will be provided in CSV format with column headers in the first line.
The tables in both databases have matching column names; all we need to do is clear each table and load the data from the files.
I have more than a hundred tables with different numbers of columns. Do I have to create each package manually?
I'm familiar with 2 free options.
EzAPI might be a good place to start if you're a .NET-heavy shop or just really want to geek out with the API. This approach lets you control pretty much the entire package generation, but at the cost of coding time. I find EzAPI generally easier than working with the base COM/.NET libraries for SSIS.
Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and, whoosh, it generates packages.
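If you want a flavor of what code-driven generation looks like before committing to either, here is a minimal sketch using the base .NET API (Microsoft.SqlServer.Dts.Runtime) that EzAPI wraps. The table list and output folder are made up, and the connection managers and data flow wiring (the bulk of the work, and what EzAPI streamlines) are only indicated in comments:

using Microsoft.SqlServer.Dts.Runtime;

class GeneratePackages
{
    static void Main()
    {
        var app = new Application();
        foreach (var table in new[] { "Customer", "Order" })  // hypothetical table list
        {
            var package = new Package { Name = "Load_" + table };

            // Execute SQL Task that clears the target table.
            var truncate = (TaskHost)package.Executables.Add("STOCK:SQLTask");
            truncate.Name = "Truncate " + table;
            truncate.Properties["SqlStatementSource"].SetValue(
                truncate, "TRUNCATE TABLE dbo." + table + ";");
            // A connection manager and a data flow (flat file source ->
            // OLE DB destination) would be added and wired up here.

            app.SaveToXml(@"C:\Packages\Load_" + table + ".dtsx", package, null);
        }
    }
}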
I did this just using VB. I passed the table names in as a command parameter and used VB to generate the insert and clear; it worked a charm. I can try to dig it out tomorrow when I'm back in the office, but it was pretty simple. There didn't seem to be any other way to say "just get x and export it" or "just take y and import it into z", so VB it had to be. Come to think of it, I actually used a small XML file to pass the table info for the export and then determined the table name for the import from the CSV file name. To be clear, this was only one package, but it could dynamically choose the number of imports/exports it did. Further clarification: this was VB within SSIS as a processing step.
The firm I work at has a lot of data sources entering the firm's database through the Informatica ETL tool, stored in mapplets and other data models (sorry if I'm not using the exact terminology).
The problem is that all the business logic is stored in the 'graphical interface' and nowhere else - every time I want to see which fields feed a target field, I have to trace the inputs through the mapplet, and that takes a very long time.
The question is: is there a tool that can take all the relationships in an Informatica mapplet and export them to an Excel table (so I can see it all without tracing)? That way I could try to write proper documentation...
Thanks in advance.
It's possible to export mappings or whole workflows to XML. Next, you can use this tool - it will create tables with source-to-target dependencies for every mapping.
Keep in mind it will only map inputs to outputs; it won't extract the full logic and transformations done along the way - that would have been too complex for a simple visualization.
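If you'd rather roll your own, the exported mapping XML records the field-level links as CONNECTOR elements (FROMINSTANCE/FROMFIELD to TOINSTANCE/TOFIELD, as I understand PowerCenter's export format). Here is a rough C# sketch that flattens them into a CSV you can open in Excel; the file paths are placeholders:

using System.IO;
using System.Linq;
using System.Xml.Linq;

class ConnectorDump
{
    static void Main()
    {
        // Mapping/workflow XML exported from the Repository Manager.
        var doc = XDocument.Load(@"C:\exports\m_my_mapping.xml");

        // Each CONNECTOR element records one field-to-field link.
        var rows = doc.Descendants("CONNECTOR").Select(c => string.Join(",",
            (string)c.Attribute("FROMINSTANCE"), (string)c.Attribute("FROMFIELD"),
            (string)c.Attribute("TOINSTANCE"), (string)c.Attribute("TOFIELD")));

        File.WriteAllLines(@"C:\exports\connectors.csv",
            new[] { "FromInstance,FromField,ToInstance,ToField" }.Concat(rows));
    }
}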
Informatica supports exporting mapping information to Excel - the documentation describes how to do it.
However, for anything other than the simplest of mappings, what ends up in Excel is not that easy to understand. If your Informatica installation supports it, then using the lineage capabilities is a much better bet.
I will preface this right off the bat by saying that I am new to database design.
I have been working to rewrite some legacy code that controls an import process into one of our pieces of software. Part of this new process involves modifying the incoming XML files (which come into our system via FTP) to remove certain elements, as well as swapping values in special cases.
As part of the new system, we are implementing versioning inside the database so that we can pull the most recent version of the XML directly from there instead of modifying the file over and over again. To prove this can be done, I created a very simple table in SQL Server 2016 that stores the XML, then wrote a simple PowerShell script to pull that XML from the database and store it in an object. Now that I know this is indeed possible, I need to refine the table design.
This is where my expertise starts to take a hit. As of right now, the table contains three columns: xml_Version, xml_FileID, and xml_FileContents.
The general idea is to have a GUID (xml_FileID) that is tied to each version of the XML and another column that indicates which version of the XML it is. I assume you also need some way of tying each version of the XML back to its original file.
I was hoping someone could point me in the right direction on how to design the table to accomplish this. I can provide more information if needed.
Thanks.
Edit: I think what I'm having the most trouble grasping is what I should reference when I'm trying to pull data out of the database. Storing the XML in the table with a unique identifier is the easy part - the unfortunate part is that there's nothing in the XML itself that I could use to uniquely identify the correlating data within the database. Does that make sense?
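For what it's worth, here is a minimal sketch of one way this can hang together, assuming a table shaped like the one described (the DDL in the comment and all names are assumptions). The key point is that the identity lives in the table, not in the XML: you hand out the xml_FileID when the file first arrives, carry it through the import process, and ask for the highest xml_Version when you need the current revision.

using System;
using System.Data.SqlClient;

class XmlVersionStore
{
    // Assumed schema:
    //   CREATE TABLE dbo.XmlFile (
    //       xml_FileID       uniqueidentifier NOT NULL,  -- identifies the logical file
    //       xml_Version      int NOT NULL,               -- increments per revision
    //       xml_FileContents xml NOT NULL,
    //       CONSTRAINT PK_XmlFile PRIMARY KEY (xml_FileID, xml_Version));
    static string GetLatestXml(string connectionString, Guid fileId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            @"SELECT TOP (1) CAST(xml_FileContents AS nvarchar(max))
              FROM dbo.XmlFile
              WHERE xml_FileID = @id
              ORDER BY xml_Version DESC;", conn))
        {
            cmd.Parameters.AddWithValue("@id", fileId);
            conn.Open();
            return (string)cmd.ExecuteScalar();
        }
    }
}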
I have to import about 50 different types of files every day. Some of them have only a few columns; some include up to 250.
The Flat File connection always defaults all columns to 50 chars.
Some columns can be way longer than 50 chars and will, of course, end up causing errors.
Currently I am doing a stupid search & replace with Notepad++ - opening all SSIS packages and replacing:
DTS:MaximumWidth="50"
by
DTS:MaximumWidth="500"
This is an annoying workaround.
Is there any way to set a default length for flat file string columns to a certain value?
I am developing in Microsoft Visual Studio Professional 2015 and SQL Server Data Tools 14.0.61021.0
Thanks!
I don't think there is a way to achieve this from SQL Server Data Tools, but there are some workarounds:
Easiest solution: in the Flat File Connection Manager's Advanced tab, select all columns (using the Ctrl key) and change the data length property for them all in one edit (detailed in @MikeHoney's answer below).
You can use Biml (Business Intelligence Markup Language) to create the SSIS packages; if you're new to Biml, the BimlScript website has detailed tutorials.
You can create a small application that loops over the .dtsx files in a folder and replaces DTS:MaximumWidth="50" with DTS:MaximumWidth="500", using the normal String.Replace function or regular expressions. (You can read my answer to Automate Version number Retrieval from .Dtsx files to see an example of reading a .dtsx file with regular expressions.)
Function to read and fix the content of a .dtsx file (VB.NET):
Public Sub FixDTSX(ByVal strFile As String)
    ' Read the whole package XML into memory.
    Dim strContent As String = String.Empty
    Using sr As New IO.StreamReader(strFile)
        strContent = sr.ReadToEnd()
    End Using
    ' Bump the default column width from 50 to 500.
    strContent = strContent.Replace("DTS:MaximumWidth=""50""", "DTS:MaximumWidth=""500""")
    ' Overwrite the package with the updated XML.
    Using sw As New IO.StreamWriter(strFile, False)
        sw.Write(strContent)
    End Using
End Sub
There is a way to achieve what you want using the standard Visual Studio SSDT UI, although it is quite obscure. AFAIK it works in every version of this editor since SQL Server 2005.
With the package open, from the Connection Managers pane, right-click your Flat File Connection and choose Edit. Then navigate to the Advanced page. Then multi-select the columns you want to change (e.g. shift-click a range or ctrl-click a specific set). Now the Properties appearing at the right will be applied to all the selected columns.
For example, you can set all the selected columns to a width of 255 in a single edit.
Esteban,
I suggest you use the Object Model API, which allows you to develop SSIS packages programmatically. With it, you can use any .NET code to gather data and metadata from the text files. Also, since you are using SSIS, you are probably already familiar with writing code in C#/VB.NET.
Now, if you are just starting with the Object Model API, there is a huge learning curve (though it is worth learning if SSIS is your day-to-day life). If you do not have the time to invest right now, I would recommend a library I wrote (called Pegasus) which greatly simplifies how you use the Object Model API; you can create your packages in an almost declarative fashion (using C#).
On GitHub, there is an example that shows how to create a package that loads any number of text files with differing schemas from a given folder. See here, specifically the method GenerateProjectToLoadTextFilesToSqlServerDatabase().
In the example, I use a third-party .NET library called LumenWorks.Framework to probe delimited files and get their metadata. Using this library, I get the column names, and I infer data types and lengths by sampling the first 'n' rows. (In my code I only infer ints, dates and strings; if you have more data types, add the relevant code accordingly.) Or you can specify one data type and length for all your columns (it looks like you want strings of 500 chars). Or, in some cases, you might have this metadata available externally in an Excel or config file. I then use this metadata to configure my text file connection managers programmatically.
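To make the sampling idea concrete, here is a rough sketch using the LumenWorks CsvReader; the 100-row sample size is an arbitrary assumption:

using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class ColumnSampler
{
    // Read the header row, then sample the first N records to
    // estimate a maximum length for each column.
    static int[] SampleColumnLengths(string path, int sampleRows = 100)
    {
        using (var csv = new CsvReader(new StreamReader(path), true))  // true = has headers
        {
            var lengths = new int[csv.FieldCount];
            for (int row = 0; row < sampleRows && csv.ReadNextRecord(); row++)
                for (int i = 0; i < csv.FieldCount; i++)
                    lengths[i] = Math.Max(lengths[i], (csv[i] ?? "").Length);
            return lengths;
        }
    }
}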
You can download the code from GitHub and run the DataFlowExample, specifying where your source files are, to see how far it gets you.
Another recommendation would be Biml, but I am not sure whether you can incorporate your own or third-party full-fledged C# code (not just snippets) into a Biml workflow. If you can, then go with Biml.
Let me know if you have any questions.
I did see some other posts on this, but they were rather old and there do not appear to be any solutions at this point.
I'm trying to determine where particular tables that SSIS loads during a monthly job are being used in other packages. The package that loads these tables has been taking much longer over the past several months, and I'm trying to see if I can eliminate this load altogether.
I just happened to check the allocation packages in our database to see how the tables were being used, and discovered that I can't find anywhere when or where those tables are used. Is there a function or query I can run in SSMS or elsewhere to find this information?
Thx in advance - please let me know if I need to clarify something.
The packages are just XML files. If you have the packages somewhere on your file system, you can use any program that searches through text files.
I'm not sure about older SSIS projects, but with an SSIS project in Data Tools for SQL Server 2012 you can just use the built-in search function to search your entire solution. It will also search the XML of all the packages.
If you don't have this information saved somewhere in your documentation already, then I think you are going to have some difficulty finding an accurate way to retrieve it. However, there are a few automated data-collection options that might get you most of the way there.
The first option: because all SSIS packages are essentially glorified XML fed into an engine, you can run a pattern search over the packages, grep-style, for the table name (a short C# version is sketched below). Any packages that dynamically retrieve and build the table name, though, would not be found by this method.
Another option would be to run a server-side SQL trace with a pattern match on the table name(s), limited to the host or application name of SSIS. Run over the course of a month or so, that would make for a fairly accurate list.
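As a sketch of the grep-style option, a few lines of C# will do the job; the root folder and table name are placeholders:

using System;
using System.IO;

class PackageSearch
{
    static void Main()
    {
        // List every package under the folder that mentions the table.
        foreach (var file in Directory.EnumerateFiles(@"C:\SSISPackages", "*.dtsx",
                                                      SearchOption.AllDirectories))
        {
            if (File.ReadAllText(file).IndexOf("dbo.MonthlyLoadTable",
                                               StringComparison.OrdinalIgnoreCase) >= 0)
                Console.WriteLine(file);
        }
    }
}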
I haven't used it myself, but the DOC xPress tool from PragmaticWorks might be what you're looking for.
I want to export all my queries as individual files so I can put them into Mercurial source control, but I don't know how to export the individual queries as individual files without opening each one, saving it to the folder, and adding it to the project - or some equally convoluted process.
I wouldn't mind having to add each one individually, but how do I get them out of the database as individual files without opening them all and doing a Save As on each? Ideally they would be named with the name they have in the database right now.
I could easily dump the whole lot into one long file using database tasks, but that's not really super helpful, is it?
I have SSMS 2k5 and 2k8 (and VS 2k5, 2k8, 2010 to boot) to work with, any thoughts?
Right-click the database and select Generate Scripts. On the last page, under Script to File, you can choose a single file or one file per object.
When you script a database in SSMS you have the option of one file per object.
SMO is useful with a small app to iterate through the objects.
Third-party tools like Red Gate SQL Compare (there are also free tools) can script too.
I would write a small C# program which extracts your database objects via SMO and stores them in your file system the way you want.
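A minimal sketch of that SMO approach, assuming the "queries" live as stored procedures; the server, database, and output folder names are placeholders:

using System;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.Management.Smo;

class ScriptObjects
{
    static void Main()
    {
        var server = new Server("localhost");
        var db = server.Databases["MyDatabase"];

        // Script each user stored procedure to its own .sql file,
        // named after the object, ready to commit to Mercurial.
        foreach (StoredProcedure sp in db.StoredProcedures)
        {
            if (sp.IsSystemObject) continue;
            var sql = string.Join(Environment.NewLine, sp.Script().Cast<string>());
            File.WriteAllText(Path.Combine(@"C:\Repo", sp.Name + ".sql"), sql);
        }
    }
}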
It is rather easy to write stored procedures that fetch the definitions into the result set as text; sp_helptext could be used as a start.
Then you can use PowerShell to write the output to the file system.
It sounds as if this would fit rather well into the Really Simple Data Dictionary CodePlex project.