Invalid Data Causes SSIS Flat File Import to Hang - file

I'm working on an SSIS package in VS 2010 for a SQL Server 2012 instance. I'm importing flat files from a vendor who will not clean their data. I cannot get beyond the "Flat File Source" step because the data is so mangled that the parser hangs and won't continue.
Here is an example of good data with headers:
EventID|AccountID|ListID|ID|Date
1|3000|20030|1092997696|10-Nov-2014 09:36:13
Here is bad data that is being captured by error handling:
1|3000|20030|1092997696;ҧ��DAVNJ��|11-Nov-2014 06:40:28
Here is data that hangs my package:
1|3000|20030|1092997696ci[
a5��~[�t:RW�uXXïA,u��ïn��I� �JA!QXQ|11-Nov-2014 08:27:27
How do I deal with this? Remember, I cannot get beyond the flat file parse step to use a derived column/conditional split/script task.
Thanks in Advance!
Kirsten

Figured out a way to get the data in so I can clean it!
CREATE TABLE dbo.crap_data_varcharmax(
DataBlob NVARCHAR(MAX));
BULK INSERT dbo.crap_data_varcharmax
FROM '\\SQLSERVERNAME01\e$\Folder\FileName.txt' WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '|', FIRSTROW = 2);
SELECT *
FROM dbo.crap_data_varcharmax;
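Once the raw lines are available as text, the cleanup itself can be sketched outside SSIS. The following is a minimal Python sketch, not the poster's method: it assumes every record should have the five pipe-delimited fields from the sample header and that the first four are numeric. The function names are hypothetical. It rejoins records that a stray newline split across physical lines, then separates clean rows from rows that need attention.

```python
EXPECTED_FIELDS = 5  # EventID|AccountID|ListID|ID|Date, per the sample header


def reassemble_records(lines, expected_fields=EXPECTED_FIELDS):
    """Join physical lines until each record has enough '|' delimiters."""
    records = []
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\r\n")
        if buffer.count("|") >= expected_fields - 1:
            records.append(buffer.split("|"))
            buffer = ""
    if buffer:
        # Trailing fragment that never completed; keep it for inspection.
        records.append(buffer.split("|"))
    return records


def split_good_bad(records, expected_fields=EXPECTED_FIELDS):
    """Return (good, bad): good rows have 5 fields and 4 numeric leading fields."""
    good, bad = [], []
    for fields in records:
        is_clean = len(fields) == expected_fields and all(
            f.isascii() and f.isdigit() for f in fields[:4]
        )
        (good if is_clean else bad).append(fields)
    return good, bad
```

With the three sample rows from the question, the clean row lands in `good`, while the row with junk after the ID and the row broken across two lines both land in `bad`, the latter as a single rejoined record.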

Related

pyodbc - Image from Microsoft SQL Server writing in binary mode error

Hopefully I'm missing something obvious here...
Whenever I try to use pyodbc to convert and save images from a Microsoft SQL Server database, the files I generate are not recognised as image files.
Python code:
cursor = conn.cursor()
q = "SELECT top 1 * from reactorA.dbo.ProcessedImageData"
records = cursor.execute(q).fetchall()
for row in records:
    data = row.AnalysedImage
    with open('test.bmp','wb') as f:
        f.write(data)
When I open the newly created image file, it's just repeating ÿ symbols.
When I view the record, it seems to be in the correct bytes format:
b'\xff\xff\xff .......
Any help on fixing this will be greatly appreciated.
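The thread records no answer, so the following is only a diagnostic sketch. An encoded BMP file begins with the bytes `BM`, and the value shown above begins with `\xff\xff\xff`, so one possibility (an assumption, not something the question confirms) is that the column holds raw pixel data rather than a complete image file. The helper name `sniff_image_format` is hypothetical; it checks the leading bytes against a few well-known file signatures.

```python
# Leading "magic" bytes of some common image file formats.
SIGNATURES = {
    b"BM": "bmp",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data):
    """Return the format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

If `sniff_image_format(row.AnalysedImage)` returns `None`, writing the bytes to `test.bmp` unchanged will not produce a file that image viewers recognise, whatever the extension.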

DolphinDB error: SegmentedTable does not support direct access. Please use sql query to retrieve data

dbDir = '/tests/dolphindb/valueDB'
devDir = '/tests/dolphindb/dev.csv'
db = database(dbDir)
dev = db.loadTable(`dev)
saveText(dev, devDir)
I want to export the table "dev" as a CSV file, but I encountered this error message:
Execution was completed with exception
SegmentedTable does not support direct access. Please use sql query to retrieve data
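I wonder if I have to load all the data into memory to export it as a CSV file.
Yes, the input table for saveText must be a non-partitioned table. As the error message suggests, retrieve the data with a SQL query first (for example, select * from dev); the query result is an in-memory table that saveText can write.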

Fitbit Data Export - Creating a data warehouse

I plan to create a Fitbit data warehouse for educational purposes, and there doesn't seem to be any material online for Fitbit data specifically.
A few issues faced:
You can only export one month of data (at most) at a time from the Fitbit website. My plan is to drop a month's worth of data at a time into a folder and have these files read separately.
You can export the data as either CSV or XLS. The issue with XLS is that each day in the month creates a separate sheet for food logs, which would then need to be merged in a staging table. The issue with CSV is that there is one sheet per file, with all of the data in it.
I would then use SSIS to load the data into a SQL Server database for reporting purposes.
Which would be the better-suited approach: exporting the data in XLS format or in CSV?
Edit: How would it be possible to load a CSV file with such a format into SSIS?
The CSV layout would be as such:
Body,,,,,,,,,
Date,Weight,BMI,Fat,,,,,,
01/06/2018,71.5,23.29,15,,,,,,
02/06/2018,71.5,23.29,15,,,,,,
03/06/2018,71.5,23.29,15,,,,,,
04/06/2018,71.5,23.29,15,,,,,,
05/06/2018,71.5,23.29,15,,,,,,
06/06/2018,71.5,23.29,15,,,,,,
07/06/2018,71.5,23.29,15,,,,,,
08/06/2018,71.5,23.29,15,,,,,,
09/06/2018,71.5,23.29,15,,,,,,
10/06/2018,71.5,23.29,15,,,,,,
11/06/2018,71.5,23.29,15,,,,,,
12/06/2018,71.5,23.29,15,,,,,,
13/06/2018,71.5,23.29,15,,,,,,
14/06/2018,71.5,23.29,15,,,,,,
15/06/2018,71.5,23.29,15,,,,,,
16/06/2018,71.5,23.29,15,,,,,,
17/06/2018,71.5,23.29,15,,,,,,
18/06/2018,71.5,23.29,15,,,,,,
19/06/2018,71.5,23.29,15,,,,,,
20/06/2018,71.5,23.29,15,,,,,,
21/06/2018,71.5,23.29,15,,,,,,
22/06/2018,71.5,23.29,15,,,,,,
23/06/2018,71.5,23.29,15,,,,,,
24/06/2018,71.5,23.29,15,,,,,,
25/06/2018,71.5,23.29,15,,,,,,
26/06/2018,71.5,23.29,15,,,,,,
27/06/2018,71.5,23.29,15,,,,,,
28/06/2018,71.5,23.29,15,,,,,,
29/06/2018,72.8,23.72,15,,,,,,
30/06/2018,72.95,23.77,15,,,,,,
,,,,,,,,,
Foods,,,,,,,,,
Date,Calories In,,,,,,,,
01/06/2018,0,,,,,,,,
02/06/2018,0,,,,,,,,
03/06/2018,0,,,,,,,,
04/06/2018,0,,,,,,,,
05/06/2018,0,,,,,,,,
06/06/2018,0,,,,,,,,
07/06/2018,0,,,,,,,,
08/06/2018,0,,,,,,,,
09/06/2018,0,,,,,,,,
10/06/2018,0,,,,,,,,
11/06/2018,0,,,,,,,,
12/06/2018,0,,,,,,,,
13/06/2018,100,,,,,,,,
14/06/2018,0,,,,,,,,
15/06/2018,0,,,,,,,,
16/06/2018,0,,,,,,,,
17/06/2018,0,,,,,,,,
18/06/2018,0,,,,,,,,
19/06/2018,0,,,,,,,,
20/06/2018,0,,,,,,,,
21/06/2018,0,,,,,,,,
22/06/2018,0,,,,,,,,
23/06/2018,0,,,,,,,,
24/06/2018,0,,,,,,,,
25/06/2018,0,,,,,,,,
26/06/2018,0,,,,,,,,
27/06/2018,"1,644",,,,,,,,
28/06/2018,"2,390",,,,,,,,
29/06/2018,981,,,,,,,,
30/06/2018,0,,,,,,,,
For example, "Foods" would be the table name, "Date" and "Calories In" would be column names. "01/06/2018" is the Date, "0" is the "Calories in" and so on.
Tricky. I just pulled my own Fitbit data, as this piqued my curiosity. That CSV is messy: you basically have mixed file formats in one file, which won't be straightforward in SSIS. As for the XLS format, the food logs add a worksheet for each day, as you mentioned, and SSIS won't like the set of worksheets changing.
Here are a couple of options for the CSV, off the top of my head.
Individual exports from Fitbit
I see you can pick which data you want to include in your export: Body, Foods, Activities, Sleep.
Do each export individually, saving each file with a prefix that indicates the type of data it holds.
Then build an SSIS package with a foreach loop and a data flow task for each individual file format.
That would work, but exporting the data from Fitbit this way would be tedious.
Handle the one file with all the data
With this option you have to get creative, since the formats are mixed and the sections have different column definitions.
One option is to create a staging table with as many columns as whichever section has the most, which looks to be "Activities". Give each column a generic name such as Column1, Column2 and make them all VARCHAR.
Since the formats are mixed and the data types don't line up, just get all the data out first and sort out the conversion later.
From there you can build one data flow with a flat file source, and add a line number to each row, since it will be needed later to work out where each section of data is.
When building the file connection for your source, you have to add all the columns manually: the first row of data in the file doesn't include all the commas for each field, so SSIS won't be able to detect every column. Add the number of columns needed, and also make sure:
Text Qualifier = "
Header row Delimiter = {LF}
Row Delimiter = {LF}
Column Delimiter = ,
That should at least get your data loaded into a staging table in the database. From there you would need a fair amount of T-SQL to zero in on each "section" of data, then parse, convert and load it.
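To make the staging idea concrete, here is a minimal Python sketch of the same shape of data, assuming the whole export is available as a string. It is an illustration rather than a replacement for the SSIS data flow: the function name `stage_rows` is hypothetical, and the standard `csv` module stands in for the flat file source with a `"` text qualifier. Every row gets a line number and is padded to the width of the widest row, so all rows share the same generic, text-only columns.

```python
import csv
import io


def stage_rows(text):
    """Return (line_number, col1, col2, ...) tuples, all padded to one width."""
    rows = list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))
    width = max((len(row) for row in rows), default=0)
    staged = []
    for line_number, row in enumerate(rows, start=1):
        padded = row + [""] * (width - len(row))  # short rows get empty columns
        staged.append((line_number, *padded))
    return staged
```

Values stay as text at this stage, so a quoted figure such as "1,644" is kept intact as `1,644` and converted only later, once the section it belongs to is known.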
For a small test, I used a table called TestTable:
CREATE TABLE [dbo].[TestTable](
[LineNumber] [INT] NULL,
[Column1] [VARCHAR](MAX) NULL,
[Column2] [VARCHAR](MAX) NULL,
[Column3] [VARCHAR](MAX) NULL,
[Column4] [VARCHAR](MAX) NULL,
[Column5] [VARCHAR](MAX) NULL,
[Column6] [VARCHAR](MAX) NULL,
[Column7] [VARCHAR](MAX) NULL,
[Column8] [VARCHAR](MAX) NULL,
[Column9] [VARCHAR](MAX) NULL
)
I then built the data flow, hooked up the file source, and executed it to load the data into TestTable.
From there I worked out some T-SQL to get to each "Section" of data. Here's an example that shows how you could filter to the "Foods" section:
DECLARE @MaxLine INT = (
    SELECT MAX([LineNumber])
    FROM [TestTable]
);
--Use a subquery to get the starting and ending line numbers for each section,
--then convert the columns that this section's data ended up in.
SELECT CONVERT(DATE, [a].[Column1], 103) AS [Date] --Style 103 = dd/mm/yyyy, as in the sample file
     , CONVERT(BIGINT, REPLACE([a].[Column2], ',', '')) AS [CaloriesIn] --Strip thousands separators such as "1,644"
FROM [TestTable] [a]
    INNER JOIN (
        --Build the starting and ending line number for each section
        SELECT [Column1]
             , [LineNumber] + 2 AS [StartLineNumber] --Data starts 2 lines after the section heading
             , LEAD([LineNumber], 1, @MaxLine + 1) OVER ( ORDER BY [LineNumber] )
               - 1 AS [EndLineNumber] --Default of @MaxLine + 1 keeps the last line of the final section
        FROM [TestTable]
        WHERE [Column1] IN ( 'Body', 'Foods', 'Activities' ) --Each of the sections of data
    ) AS [Section]
        ON [a].[LineNumber]
           BETWEEN [Section].[StartLineNumber] AND [Section].[EndLineNumber]
WHERE [Section].[Column1] = 'Foods' --Then just filter on the section you want
  AND NULLIF([a].[Column1], '') IS NOT NULL; --Skip the blank separator line before the next section
That returned the Foods rows as a Date column and a CaloriesIn column.
There could be other options for parsing that data, but this should give you a good starting point and an idea of how tricky this particular CSV file is.
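As one such alternative, the section logic can also be sketched outside the database. This is a hedged Python sketch, not part of the tested SSIS approach. It assumes that a section starts with a row whose first cell is one of the export names (Body, Foods, Activities, Sleep) and whose other cells are empty, that the next non-blank row holds that section's column names, and that dates are day/month/year as in the sample. The function names are hypothetical.

```python
import csv
import io
from datetime import datetime

SECTION_NAMES = {"Body", "Foods", "Activities", "Sleep"}


def split_sections(text):
    """Return {section name: list of dicts keyed by that section's header row}."""
    sections = {}
    current = None  # name of the section being read
    header = None   # column names of the current section
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # blank separator line between sections
        if cells[0] in SECTION_NAMES and not any(cells[1:]):
            current, header = cells[0], None
            sections[current] = []
        elif current is not None and header is None:
            # Keep header cells up to the last non-empty one.
            last = max(i for i, cell in enumerate(cells) if cell)
            header = cells[: last + 1]
        elif current is not None:
            sections[current].append(dict(zip(header, cells)))
    return sections


def parse_foods(rows):
    """Convert Foods rows to (date, calories), dropping thousands separators."""
    return [
        (
            datetime.strptime(row["Date"], "%d/%m/%Y").date(),
            int(row["Calories In"].replace(",", "")),
        )
        for row in rows
    ]
```

Calling `parse_foods(split_sections(text)["Foods"])` yields the same two columns as the T-SQL query, with the conversion deferred until the section is known.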
As for the XLS option, that would be straightforward for all sections except the food logs. You would basically set up an Excel file connection; each sheet would be a "table" in the source, and you would have an individual data flow for each worksheet.
But then what about the food logs? Once those changed, for example when you rolled into the next month, SSIS would error out, probably complaining about metadata.
One obvious workaround would be to manually edit the Excel file and merge all of those sheets into one "Food Log" sheet before running it through SSIS. That's not ideal, because you'd probably want something completely automated.
I'd have to tinker around with that. Maybe a script task with some C# code could combine all those sheets into one, parsing the date out of each sheet name and appending it to the data before a data flow loads it. It might be possible.
It looks like there are challenges with both of the file formats Fitbit exports, no matter which one you choose.

SQL Server FOR XML PATH carriage return after each root node

I am using FOR XML PATH in SQL Server 2014 to generate an XML file to send to one of our vendors. Their system requires that each root node be separated by a carriage return / line break. Here is the T-SQL code I'm using to generate it:
DECLARE @xmldata xml
SET @xmldata =
(SELECT a.StatementDate AS [stmt_date]
       ,a.CustomerID AS [student_id]
       ,'Upon Receipt' AS [due_date]
       ,a.TotalDue AS [curr_bal]
       ,a.TotalDue AS [total_due]
       ,a.AlternateID AS [alternate_id]
       ,a.FullName AS [student_name]
       ,a.Email AS [student_email]
       ,a.Addr1
       ,a.Addr2
       ,a.Msg AS [message]
       ,(
           SELECT b.StatementDate AS [activity_date]
                 ,b.ActivityDesc AS [activity_desc]
                 ,b.TermBalance AS [charge]
           FROM #ActivityXML AS b
           WHERE a.CustomerID = b.CustomerID
           ORDER BY a.StatementDate
           FOR XML PATH('activity'),TYPE
        )
 FROM #BillingStatement AS a
 FOR XML PATH('Billing'))
SELECT @xmldata AS returnXml
This works great, but returns one long string with no separation between nodes at all. (I would post an example but it would just look like a jumbled up mess in here.)
Anyhow, what we need is to generate a file where each <Billing> tag and contents within is placed on a new line after a closing </Billing> tag. I would guess there's a simple solution, such as inserting char(13)+char(10) somewhere in the code, but I've been unable to get that working. Is it possible or will I need to do it in another system?
Based on responses here and research elsewhere, this is not possible using just T-SQL. We would need to either copy / paste the output, or use another program to take the data and insert line breaks.
From @Shnugo: "The pretty print of XML is not supported natively within T-SQL. You might use a CLR method, a service or any kind of post processing with a physically stored file. You might open the XML from grid-results' xml viewer and copy-paste the output to a text editor. Don't forget to set the XML size for grid result to unlimited, if your XML is big."
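The post-processing step can be very small. Below is a minimal Python sketch of "another program" that inserts a line break after each closing tag, assuming the XML has already been exported as a single string. The function name is hypothetical, and the sketch uses plain string replacement rather than an XML parser, so it does not handle self-closing elements such as `<Billing/>`.

```python
def break_after_closing_tag(xml_text, tag="Billing", newline="\r\n"):
    """Insert a line break after every closing </tag> in xml_text."""
    closing = f"</{tag}>"
    return xml_text.replace(closing, closing + newline)
```

Applied to the output of the query above, this leaves each `<Billing>` element and its contents on its own line, separated by a carriage return and line feed.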

problem storing image as blob to Oracle database

First of all, I am very new to database systems. I am trying to store an image in my database (only for testing purposes), but I cannot get it to work. There is a problem in the code I use. Can you please tell me what is wrong with the following code?
Create DIRECTORY temp as 'c:\temp';
DECLARE
src_lob BFILE := BFILENAME('temp', 'IMAGE.png');
dest_lob BLOB;
BEGIN
INSERT INTO lob_table VALUES(2, EMPTY_BLOB())
RETURNING doc INTO dest_lob;
DBMS_LOB.OPEN(src_lob, DBMS_LOB.LOB_READONLY);
DBMS_LOB.LoadFromFile( DEST_LOB => dest_lob,
SRC_LOB => src_lob,
AMOUNT => DBMS_LOB.GETLENGTH(src_lob) );
DBMS_LOB.CLOSE(src_lob);
COMMIT;
END;
When I try to run it, I get the following error: ORA-00911: invalid character
What is wrong here?
Thanks in advance.
Never done it so I'm not certain, but I think the DIRECTORY has to be on the server, not the client.
(You may be running SQL*Plus on the server I guess)

Resources