SSIS - Create Summary Output File - sql-server

I would like to use SSIS to perform transformations on multiple files (CSV, Excel) that come from various data sources; the output should always be CSV files in a certain structure.
One of the requirements after performing the transformation steps is to create an output summary file (MANIFEST FILE) about the results of the process, in the following structure:
BATCH_ID|EXTRACTED_FILE_NAME|MODEL_TYPE|RECORD_COUNT|TOTAL_QTY|GENERATED_ON_TE|CONTENTS_FROM_DATE|CONTENTS_TO_DATE|WORKSET_ID|FILE_STATUS|FILE_STATUS_TS
000005|NSL_B_YFRCARRAB0_PRODUCT_MASTER_20171122.txt|B|829||20171122121525|||||
Important columns:
BATCH_ID: ID of the run
EXTRACTED_FILE_NAME: Name of created CSV file by SSIS (output file)
RECORD_COUNT: Number of rows in output file
TOTAL_QTY: SUM of column QTY
GENERATED_ON_TE: When the file was generated
STATUS_TS: Status - OK / FAIL
Is this output possible to achieve in SSIS? Can I create it without using a script component? If I have to use a script component, can you point me in the right direction?
Many thanks,
Martin!
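As a rough illustration of what the manifest row involves, here is a minimal Python sketch rather than an SSIS solution. It assumes the output CSV is comma-delimited with a header row and a column named QTY, and that the OK / FAIL value belongs in FILE_STATUS; the function names `summarize_csv` and `manifest_row` are hypothetical.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Column order taken from the manifest header in the question.
MANIFEST_COLUMNS = [
    "BATCH_ID", "EXTRACTED_FILE_NAME", "MODEL_TYPE", "RECORD_COUNT",
    "TOTAL_QTY", "GENERATED_ON_TE", "CONTENTS_FROM_DATE", "CONTENTS_TO_DATE",
    "WORKSET_ID", "FILE_STATUS", "FILE_STATUS_TS",
]


def summarize_csv(csv_text, qty_column="QTY"):
    """Return (record_count, total_qty) for CSV text that has a header row."""
    count = 0
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        count += 1
        value = (row.get(qty_column) or "").strip()
        if value:  # blank quantities are skipped (assumption)
            total += Decimal(value)
    return count, total


def manifest_row(batch_id, file_name, model_type, csv_text, generated_on, status=""):
    """Build one pipe-delimited manifest line; unknown columns stay empty."""
    count, total = summarize_csv(csv_text)
    values = {
        "BATCH_ID": f"{batch_id:06d}",
        "EXTRACTED_FILE_NAME": file_name,
        "MODEL_TYPE": model_type,
        "RECORD_COUNT": str(count),
        "TOTAL_QTY": str(total),
        "GENERATED_ON_TE": generated_on.strftime("%Y%m%d%H%M%S"),
        "FILE_STATUS": status,
    }
    return "|".join(values.get(column, "") for column in MANIFEST_COLUMNS)
```

Calling `manifest_row` once per generated file and appending the result to the manifest would give one line per output file, in the same shape as the sample row above.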

Related

Extract data from Word documents with SSIS to ETL into SQL

I could really use some help with how to extract data from Word documents using SSIS and insert the extracted data into SQL. There are 10,000 - 13,000 Word files to process. The files most likely aren't consistent over the years. Any help is greatly appreciated!
Below is the example data from the Word documents that I'm interested in capturing. Note that Date and Job No are in the Header section.
Customer : Test Customer
Customer Ref. : 123456
Contact : Test Contact
Part No. : 123456789ABCDEFG
Manufacturer : Some Mfg.
Package : 123-456
Date Codes : 1234
Lot Number : 123456
Country of Origin : Country
Total Incoming Qty : 1 pc
XRF Test Result : PASS
HCT Result : PASS
Solder Test Result : PASS
My approach would be this:
Create a script in Python that extracts your data from the Word files and saves it in XML or JSON format
Create an SSIS package to load the data from each XML/JSON file into SQL Server
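The first step might look something like the sketch below. It assumes the document text has already been pulled out as plain text and that every field sits on its own line in the "Label : value" form shown in the question; `parse_fields` and `to_json` are hypothetical names.

```python
import json


def parse_fields(text):
    """Split 'Label : value' lines into a dict; other lines are ignored."""
    record = {}
    for line in text.splitlines():
        label, separator, value = line.partition(" : ")
        if separator:  # only lines that contain ' : '
            record[label.strip()] = value.strip()
    return record


def to_json(text):
    """Serialize the parsed fields so that a later load step can read them."""
    return json.dumps(parse_fields(text), indent=2)
```

Since the files are said to be inconsistent over the years, a real script would probably need to normalize label spellings before loading.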
1. Using a script component as a source
To import data from Microsoft Word into SQL Server, you can use a script component as a data source where you can implement a C# script to parse document files using Office Interoperability libraries or any third-party assembly.
Example of reading tables from a Word file
2. Extracting XML from DOCX file
A DOCX file is a ZIP archive composed of several embedded files. The text is mainly stored within an XML file. You can use a Script Task or an Execute Process Task to extract the DOCX file content and use an XML source to read the data.
How can I extract the data from a corrupted .docx file?
How to extract just plain text from .doc & .docx files?
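To make the DOCX approach concrete, here is a small Python sketch that reads the main document part straight out of the archive using only the standard library. It assumes a well-formed .docx; `docx_paragraphs` is a hypothetical helper. Header text lives in separate parts (often named like word/header1.xml, though that name is an assumption to verify per file), which can be passed through the `part` argument.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file, part="word/document.xml"):
    """Return the non-empty paragraphs of one XML part inside a DOCX archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read(part))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text can be split across several runs; join them.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```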
3. Converting the Word document into a text file
The third approach is to convert the Word document into a text file and use a flat-file connection manager to read the data.
convert a word doc to text doc using C#
Converting a Microsoft Word document to a text file in C#

How to handle this scenario in single SSIS package?

I'm receiving around 100 Excel files daily. There are 4 types of files, whose names start with ALC, PLC, GLC or SLC followed by a random number, and each Excel file's sheet name is the same as its file name.
In every file of every type, cell A3 contains 'request by' followed by a user name, e.g. Request by 'Ajeet', and we want to pick only the files requested by 'Ajeet'. The first few rows are not formatted; the actual data starts from:
ALC data start from A33 Cell
PLC data start from A36 Cell
GLC data start from A32 cell
SLC data start from A38 cell
A few files have no data; in that case "NoData" is written in the cell where the data would start for that type of file.
All types of file contain the same number of columns.
So how can we handle all these situations in SSIS and load the data into a single SQL table, without using a script task? I have attached a snapshot of one of the files for your reference.
This will help.
how-to-read-data-from-an-excel-file-starting-from-the-nth-row-with-sql-server-integration-services
Copying the solution here in case the link is unavailable
Solution 1 - Using the OpenRowset Function
Solution 2 - Query Excel Sheet
Solution 3 - Google It
Google it. The information above is from the first search result.
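For reference, the rules from the question (start row per file type, the requester filter and the NoData marker) can be written down as a small lookup. This is only a Python sketch of the decision logic, not an SSIS package; all function names are hypothetical, and it assumes the A3 cell always has the form Request by 'name'.

```python
# First data row per file-name prefix, as listed in the question.
START_ROWS = {"ALC": 33, "PLC": 36, "GLC": 32, "SLC": 38}


def start_row(file_name):
    """Return the first data row for a file, or None for an unknown type."""
    return START_ROWS.get(file_name[:3].upper())


def requested_by(cell_a3):
    """Extract the user name from a cell such as: Request by 'Ajeet'."""
    text = cell_a3.strip()
    if text.lower().startswith("request by"):
        text = text[len("request by"):]
    return text.strip(" '\":")


def should_load(file_name, cell_a3, first_data_cell, requester="Ajeet"):
    """True when the file is a known type, has the right requester and has data."""
    if start_row(file_name) is None:
        return False
    if requested_by(cell_a3).lower() != requester.lower():
        return False
    return str(first_data_cell).strip().lower() != "nodata"
```

The start row returned here is the value that a range-based read of the sheet would begin from.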

SSIS error handling: redirect rows that have zip code field more than 5 from a flat file

I have been given a task to load a simple flat file into another using an SSIS package. The source flat file contains a zip code field. My task is to extract the rows with a correct, 5-digit zip code and load them into another flat file, and to redirect the invalid rows to a new file.
Since I am new to SSIS, any help or ideas are much appreciated.
You can add a Derived Column that computes the length of the field, then add a Conditional Split based on that column: rows with length <= 5 go down the good path, rows with length > 5 go down the reject path.
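The same split, sketched in Python to show the logic. In SSIS the derived column would presumably be an expression along the lines of LEN(TRIM(ZipCode)), where ZipCode is an assumed column name. The field name `zip` and the function `split_by_zip` are hypothetical.

```python
def split_by_zip(rows, zip_field="zip", max_length=5):
    """Route rows to (good, rejected) based on the length of the zip field."""
    good, rejected = [], []
    for row in rows:
        zip_length = len(row[zip_field].strip())  # the "derived column"
        if zip_length <= max_length:              # the "conditional split"
            good.append(row)
        else:
            rejected.append(row)
    return good, rejected
```

This follows the answer's <= 5 rule as stated; a stricter check for exactly five digits would also reject shorter or non-numeric values.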

Reading and writing to xls and doc files in c

I have this particular problem where I have to write a C program that reads numerical data from a text file. The data is tab-delimited. Here is a sample from the text file.
1 23099 345565 345569
2 908 66766 66768
This is data for clients, and each client has a row. The columns represent customer no., previous balance, previous reading and current reading. I then have to generate a .doc document that summarizes all this information and calculates the balance. I can write a function that does this, but how do I create an xls document and a Word document where all the results are summarized using the program? The text document has only numerical data. Any ideas?
The easiest way is to create a CSV file rather than an XLS file. Office can open CSV files with good results, and it is far easier to create an ASCII text file with comma-separated values than to write a closed format like the MS Office formats.
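A minimal sketch of the CSV route, written in Python for brevity; the same idea carries over to C by printing comma-separated fields with fprintf. The question does not say how the balance is calculated, so the extra `usage` column (current reading minus previous reading) is purely an assumption, as are the column names.

```python
import csv
import io


def readings_to_csv(tab_text):
    """Convert whitespace/tab-delimited client rows to CSV text with a header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer_no", "previous_balance",
                     "previous_reading", "current_reading", "usage"])
    for line in tab_text.splitlines():
        fields = line.split()
        if len(fields) != 4:  # skip blank or malformed lines
            continue
        customer, balance, previous, current = (int(f) for f in fields)
        writer.writerow([customer, balance, previous, current, current - previous])
    return out.getvalue()
```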
The simplest way to create a spreadsheet that contains formulas and formatting, and can be opened by Excel, is to create an XML Spreadsheet file.
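As a hedged sketch of that idea, the Python below writes a bare-bones XML Spreadsheet (the Excel 2003 XML format) with one formula cell per row. The worksheet name, the choice of columns and the function names are assumptions made for illustration; formulas in this format use R1C1 notation.

```python
HEADER = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="Excel.Sheet"?>\n'
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"\n'
    '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">\n'
    ' <Worksheet ss:Name="Readings">\n'
    '  <Table>\n'
)
FOOTER = '  </Table>\n </Worksheet>\n</Workbook>\n'


def number_cell(value):
    """One numeric cell."""
    return f'<Cell><Data ss:Type="Number">{value}</Data></Cell>'


def spreadsheet_xml(rows):
    """rows: iterable of (customer_no, previous_reading, current_reading) ints."""
    lines = []
    for customer, previous, current in rows:
        # Formula cell: current reading (one column to the left) minus
        # previous reading (two columns to the left).
        formula_cell = (
            '<Cell ss:Formula="=RC[-1]-RC[-2]">'
            f'<Data ss:Type="Number">{current - previous}</Data></Cell>'
        )
        lines.append(
            "   <Row>" + number_cell(customer) + number_cell(previous)
            + number_cell(current) + formula_cell + "</Row>\n"
        )
    return HEADER + "".join(lines) + FOOTER
```

Saving the returned string with an .xml extension should give a file that Excel opens as a workbook.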

Need to extract/consolidate info from database files

Here's a summary of my problem:
Our company's old software had a large database of contacts in it.
We switched to a new program and have no way to easily transfer those contacts to it.
The contacts database appears to have 4 files which can all be opened in Excel, but not MSAccess. The four files contain the following:
File 1: A nicely formatted spreadsheet of names and some other BASIC info for each contact. There is an ID number on each one, but the numbers do not seem to correspond to anything in File 2.
File 2: Info on each contact, but not in rows. Instead it looks something like this :
JHGH_CONTACT_BLOB: 1426367745
EMAIL: SMITH
WEB:
PHONE_COUNT: 1
FAX_COUNT: 0
ADDRESS_COUNT: 0
NOTE_COUNT: 0
555-7364
(I changed some info for privacy reasons)
Each blob of info is on a separate spreadsheet row. Each starts off with the same first line, even the number is the same, so it can't be some sort of ID number.
File 3: A file containing a lot of gobbledygook, interspersed with a few readable bits of text here and there. The readable text looks like it belongs to the database (ie, it is info on contacts like place of work and other notes.)
File 4: Contains one row and one column labeled ID, with the number 12725 in it.
I need to somehow get the info from File 2, into the nicely formatted file 1. In essence, I need to add the phone numbers, emails etc included in a messy fashion in file 2 on their proper rows in file 1.
This probably makes little sense and I thank you for even reading down this far. If you have any suggestions, I'd love to hear them.
Thanks
We have established that you have a DBF file, an FPT file and a CDX file. These are likely to all relate to Visual FoxPro (a now discontinued Microsoft product).
The .dbf file can be opened in Excel via the standard file open dialog by changing "Files of type" to "dBase files (*.dbf)". Going by your original post, Excel seems to be able to open this sensibly in the first place.
The combination of all three files might be accessible by downloading this OLE DB provider for FoxPro which would let you access the database from Excel using the methods outlined here
You can get more info on the specific file structures at the following links: DBF, FPT and CDX. The DBF contains most of the data, the FPT contains binary memo data and the CDX is an index file.
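To give a feel for the DBF structure, the fixed fields at the start of the file can be decoded with a few lines of Python. This sketch reads only the record count and the header and record lengths from the first 12 bytes; `dbf_header` is a hypothetical helper, and the field descriptors that follow are not parsed.

```python
import struct


def dbf_header(data):
    """Decode the fixed fields in the first 12 bytes of a DBF file."""
    # Layout (little-endian): version byte, last-update date (3 bytes),
    # record count (uint32), header length (uint16), record length (uint16).
    version, _yy, _mm, _dd, record_count, header_length, record_length = (
        struct.unpack("<BBBBIHH", data[:12])
    )
    return {
        "version": version,
        "record_count": record_count,
        "header_length": header_length,
        "record_length": record_length,
    }
```

It would be called on the first bytes of the file, for example `dbf_header(open("contacts.dbf", "rb").read(12))`, where contacts.dbf is a placeholder file name.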
