How to load multiple files into multiple destination tables in SSIS - sql-server

Hi, I have a question about SSIS.
The source location has different files, and each file name contains a location name. We want to load each file into its corresponding table using an SSIS package.
The source location holds multiple files per location name.
Example: files location: c:\Sourcefile\
File names come like: hyd files, bang files.
Hyd files come like: hyd.txt, hyd1.txt, hyd2.txt; all have the same structure, and all hyd-related files load into the hyd table only.
Bang files come like: bang.txt, bang1.txt, bang2.txt; all have the same structure, and all bang-related files load into the bang table only.
All source files and target tables have the same structure.
Source file structure, for hyd.txt:
id,name,loc
1,abc,hyd
2,hari,hyd
For hyd1.txt:
id,name,loc
4,banu,hyd
5,ran,hyd
Similarly for bang.txt:
id,name,loc
10,gop,bang
11,union,loc
For bang1.txt:
id,name,loc
14,ja,bang
All hyd-related text files load into the hyd table in the SQL Server database; similarly, the bang files load into the bang table.
hyd table structure:
CREATE TABLE [dbo].[hyd](
[id] [int] NULL,
[name] [varchar](50) NULL,
[loc] [varchar](50) NULL
)
and similarly for bang:
CREATE TABLE [dbo].[bang](
[id] [int] NULL,
[name] [varchar](50) NULL,
[loc] [varchar](50) NULL
)
I tried it as below (see the screenshots), but the table name is not resolved dynamically: I had put a static value in the table variable, and then all locations' records were loaded into one table.
How can I load multiple files into multiple destination tables in SSIS? Please tell me how to achieve this task.

From the screenshots I have 3 suggestions:
You have to set the Data Flow Task's DelayValidation property to True.
You have to change the User::location variable value outside the Data Flow Task; you can add an Expression Task before the Data Flow Task with the following expression:
@[User::location] = SUBSTRING(@[User::FileName], 1, FINDSTRING(@[User::FileName], ".", 1) - 1)
or use a Script Component to achieve this.
Or you can add a Script Task followed by 2 Data Flow Tasks inside the Foreach Loop; the Script Task checks the file name: if it is hyd, the first DFT executes, and if it is bang, the second does (check this link: Working with Precedence Constraints in SQL Server Integration Services).
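As a minimal sketch of that last option (the variable and task names above are assumed): let the Script Task set User::location, then put an expression on each precedence constraint (Evaluation operation: Expression). On the constraint leading to the hyd Data Flow Task:
@[User::location] == "hyd"
and on the constraint leading to the bang one:
@[User::location] == "bang"
Only the branch whose expression evaluates to true runs for each file.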

Related

External Table error: ORA-29913, ORA-29400, KUP-04040

I created an external table; when I select from it, this error shows.
I am working with Oracle 19c.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file customer.csv in EXTERNAL not found
The code:
CREATE TABLE customers (
    Email   VARCHAR2(255) NOT NULL,
    Name    VARCHAR2(255) NOT NULL,
    Phone   VARCHAR2(255) NOT NULL,
    Address VARCHAR2(255) NOT NULL
)
ORGANIZATION EXTERNAL (
    TYPE oracle_loader
    DEFAULT DIRECTORY external
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
        REJECT ROWS WITH ALL NULL FIELDS
    )
    LOCATION ('customer.csv')
)
REJECT LIMIT UNLIMITED;
customer.csv data:
salma.55@gmm.com,salma,0152275522,44al,
mariam.66@hotmail.com,mariam,011145528,552www,
ahmed.85@gmail.com,ahmed,0111552774,44eee,
"DEFAULT DIRECTORY external" means you are looking in a named directory that you have called "external".
For example, if I had done:
create directory XYZ as '/tmp';
then
default directory XYZ
means I'll be searching in /tmp for my files. So look at DBA_DIRECTORIES to see where your "EXTERNAL" directory is pointing.
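For instance, a quick check along these lines (DBA_DIRECTORIES needs DBA privileges; ALL_DIRECTORIES works the same way otherwise):
SELECT directory_name, directory_path
FROM dba_directories
WHERE directory_name = 'EXTERNAL';
Note that customer.csv has to exist at that path on the database server itself, which is exactly what KUP-04040 is complaining about.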

Presto: How to read from s3 an entire bucket that is partitioned in sub-folders?

I need to read, using Presto, an entire dataset from S3 that sits in "bucket-a". But inside the bucket, the data was saved in sub-folders by year. So I have a bucket that looks like this:
Bucket-a>2017>data
Bucket-a>2018>more data
Bucket-a>2019>more data
All of the above data belongs to the same table but is saved this way in S3. Notice that bucket-a itself contains no data at the top level, only inside each folder.
What I have to do is read all the data from the bucket as a single table, adding the year as a column or partition.
I tried it this way, but it didn't work:
CREATE TABLE hive.default.mytable (
col1 int,
col2 varchar,
year int
)
WITH (
format = 'json',
partitioned_by = ARRAY['year'],
external_location = 's3://bucket-a/' -- also tried 's3://bucket-a/year/'
)
and also
CREATE TABLE hive.default.mytable (
col1 int,
col2 varchar,
year int
)
WITH (
format = 'json',
bucketed_by = ARRAY['year'],
bucket_count = 3,
external_location = 's3://bucket-a/' -- also tried 's3://bucket-a/year/'
)
None of the above worked.
I have seen people writing to S3 with partitions using Presto, but what I'm trying to do is the opposite: read data from S3 that is already split into folders, as a single table.
Thanks.
If your folders followed the Hive partition folder naming convention (year=2019/), you could declare the table as partitioned and just use the system.sync_partition_metadata procedure in Presto.
As it stands, your folders do not follow the convention, so you need to register each one individually as a partition using the system.register_partition procedure (available in Presto 330, about to be released). (The alternative to register_partition is to run the appropriate ADD PARTITION in the Hive CLI.)
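A rough sketch of both calls (schema and table names taken from the question; check the exact procedure signatures against your Presto version):
CALL system.sync_partition_metadata('default', 'mytable', 'ADD');
CALL system.register_partition('default', 'mytable', ARRAY['year'], ARRAY['2017'], 's3://bucket-a/2017/');
register_partition would be repeated once per year folder, with the matching partition value and location.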

Fitbit Data Export - Creating a data warehouse

I plan to create a Fitbit data warehouse for educational purposes, and there doesn't seem to be any material online for Fitbit data specifically.
A few issues faced:
You can only export 1 month of data (max) at a time from the Fitbit website. My plan would be to drop a month's worth of data at a time into a folder and have these files read separately.
You can export the data as either CSV or .XLS. The issue with XLS is that each day in the month creates a separate sheet for food logs, which would then need to be merged in a staging table. The issue with CSV is that all of the data sections are stacked in one sheet per file (see the CSV layout below).
I would then use SSIS to load the data into a SQL Server database for reporting purposes.
Which would be the more suitable approach: exporting the data in .XLS format or CSV?
Edit: How would it be possible to load a CSV file with such a format into SSIS?
The CSV layout would be as such:
Body,,,,,,,,,
Date,Weight,BMI,Fat,,,,,,
01/06/2018,71.5,23.29,15,,,,,,
02/06/2018,71.5,23.29,15,,,,,,
03/06/2018,71.5,23.29,15,,,,,,
04/06/2018,71.5,23.29,15,,,,,,
05/06/2018,71.5,23.29,15,,,,,,
06/06/2018,71.5,23.29,15,,,,,,
07/06/2018,71.5,23.29,15,,,,,,
08/06/2018,71.5,23.29,15,,,,,,
09/06/2018,71.5,23.29,15,,,,,,
10/06/2018,71.5,23.29,15,,,,,,
11/06/2018,71.5,23.29,15,,,,,,
12/06/2018,71.5,23.29,15,,,,,,
13/06/2018,71.5,23.29,15,,,,,,
14/06/2018,71.5,23.29,15,,,,,,
15/06/2018,71.5,23.29,15,,,,,,
16/06/2018,71.5,23.29,15,,,,,,
17/06/2018,71.5,23.29,15,,,,,,
18/06/2018,71.5,23.29,15,,,,,,
19/06/2018,71.5,23.29,15,,,,,,
20/06/2018,71.5,23.29,15,,,,,,
21/06/2018,71.5,23.29,15,,,,,,
22/06/2018,71.5,23.29,15,,,,,,
23/06/2018,71.5,23.29,15,,,,,,
24/06/2018,71.5,23.29,15,,,,,,
25/06/2018,71.5,23.29,15,,,,,,
26/06/2018,71.5,23.29,15,,,,,,
27/06/2018,71.5,23.29,15,,,,,,
28/06/2018,71.5,23.29,15,,,,,,
29/06/2018,72.8,23.72,15,,,,,,
30/06/2018,72.95,23.77,15,,,,,,
,,,,,,,,,
Foods,,,,,,,,,
Date,Calories In,,,,,,,,
01/06/2018,0,,,,,,,,
02/06/2018,0,,,,,,,,
03/06/2018,0,,,,,,,,
04/06/2018,0,,,,,,,,
05/06/2018,0,,,,,,,,
06/06/2018,0,,,,,,,,
07/06/2018,0,,,,,,,,
08/06/2018,0,,,,,,,,
09/06/2018,0,,,,,,,,
10/06/2018,0,,,,,,,,
11/06/2018,0,,,,,,,,
12/06/2018,0,,,,,,,,
13/06/2018,100,,,,,,,,
14/06/2018,0,,,,,,,,
15/06/2018,0,,,,,,,,
16/06/2018,0,,,,,,,,
17/06/2018,0,,,,,,,,
18/06/2018,0,,,,,,,,
19/06/2018,0,,,,,,,,
20/06/2018,0,,,,,,,,
21/06/2018,0,,,,,,,,
22/06/2018,0,,,,,,,,
23/06/2018,0,,,,,,,,
24/06/2018,0,,,,,,,,
25/06/2018,0,,,,,,,,
26/06/2018,0,,,,,,,,
27/06/2018,"1,644",,,,,,,,
28/06/2018,"2,390",,,,,,,,
29/06/2018,981,,,,,,,,
30/06/2018,0,,,,,,,,
For example, "Foods" would be the table name, "Date" and "Calories In" would be column names. "01/06/2018" is the Date, "0" is the "Calories in" and so on.
Tricky. I just pulled my Fitbit data, as this piqued my curiosity. That CSV is messy: you basically have mixed file formats in one file, which won't be straightforward in SSIS. As for the XLS format, like you mentioned, the food logs tag each day onto its own worksheet, and SSIS won't like that changing.
Here are a couple of options for the CSV, off the top of my head.
Individual exports from Fitbit
I see you can pick which data you want to include in your export: Body, Foods, Activities, Sleep.
Do each export individually, saving each file with a prefix indicating what type of data it is.
Then build the SSIS package with multiple Foreach Loops and a data flow task for each individual file format.
That would do it, but exporting the data from Fitbit each time would be a tedious effort.
Handle the one file with all the data
With this option you would have to get creative, since the formats are mixed and you have sections with different column definitions, etc.
One option would be to create a staging table with as many columns as whichever section has the most, which looks to be "Activities". Give each column a generic name, such as Column1, Column2, and make them all VARCHAR.
Since we have mixed "formats" and not all data types would line up, we just need to get all the data out first and sort out conversion later.
From there you can build one data flow with a flat file source, and also have a line number added, since we will need it later to work out where each section of data sits.
When building out the file connection for your source you will have to add all the columns manually: since the first row of data in your file doesn't include a comma for every field, SSIS won't be able to detect all the columns. Add the number of columns needed, and also make sure:
Text Qualifier = "
Header row Delimiter = {LF}
Row Delimiter = {LF}
Column Delimiter = ,
That should at least get your data loaded into a staging table in the database. From there you would need a bunch of T-SQL to zero in on each "section" of data, then parse, convert, and load it.
For a small test, I just had a table called TestTable:
CREATE TABLE [dbo].[TestTable](
[LineNumber] [INT] NULL,
[Column1] [VARCHAR](MAX) NULL,
[Column2] [VARCHAR](MAX) NULL,
[Column3] [VARCHAR](MAX) NULL,
[Column4] [VARCHAR](MAX) NULL,
[Column5] [VARCHAR](MAX) NULL,
[Column6] [VARCHAR](MAX) NULL,
[Column7] [VARCHAR](MAX) NULL,
[Column8] [VARCHAR](MAX) NULL,
[Column9] [VARCHAR](MAX) NULL
)
I built the data flow and hooked up the file source. After executing the data flow, the data was loaded into the staging table.
From there I worked out some T-SQL to get to each "Section" of data. Here's an example that shows how you could filter to the "Foods" section:
DECLARE @MaxLine INT = (
SELECT MAX([LineNumber])
FROM [TestTable]
);
-- Something like this, using a subquery that gets the starting and ending line numbers for each section.
-- Do the conversion based on which column that section's data ended up in.
SELECT CONVERT(DATE, [a].[Column1], 103) AS [Date] -- style 103 = dd/mm/yyyy, matching the file
, CONVERT(BIGINT, REPLACE([a].[Column2], ',', '')) AS [CaloriesIn] -- strip the thousands separator in values like "1,644"
FROM [TestTable] [a]
INNER JOIN (
-- Something like this to build out starting and ending line numbers for each section
SELECT [Column1]
, [LineNumber] + 2 AS [StartLineNumber] -- Add 2 here, as the first data line of a section is 2 after its "heading"
, LEAD([LineNumber], 1, @MaxLine) OVER ( ORDER BY [LineNumber] )
- 1 AS [EndLineNumber]
FROM [TestTable]
WHERE [Column1] IN ( 'Body', 'Foods', 'Activities' ) -- Each of the sections of data
) AS [Section]
ON [a].[LineNumber]
BETWEEN [Section].[StartLineNumber] AND [Section].[EndLineNumber]
WHERE [Section].[Column1] = 'Foods'; -- Then just filter on the section you want.
That in turn gave me the Foods section back as typed date and calorie columns.
There could be other options for parsing that data, but this should give a good starting point and an idea of how tricky this particular CSV file is.
As for the XLS option, that would be straightforward for all sections except the food logs. You would basically set up an Excel file connection, where each sheet is a "table" in the data flow source, and have an individual data flow for each worksheet.
But then what about the food logs? Once those changed, as you rolled into the next month, SSIS would freak out and error, probably complaining about metadata.
One obvious workaround would be to manipulate the Excel file manually and merge all of those sheets into one "Food Log" sheet before running it through SSIS. Not ideal, because you'd probably want something completely automated.
I'd have to tinker around with that. Maybe a Script Task and some C# code could combine all those sheets into one, parsing the date out of each sheet name and appending it to the data before a data flow loads it. Maybe possible.
Looks like there are challenges with the files Fitbit exports, no matter which format you look at.

Do I need to create the table I want to prepopulate with H2, or not?

I'm confused by the errors I get when trying to create an in-memory H2 DB for my Spring Boot application. The relevant configuration is
db.url=jdbc:h2:mem:test;MODE=MySQL;DB_CLOSE_DELAY=-1;INIT=runscript from 'classpath:create.sql'
hibernate.hbm2ddl.auto=create
And create.sql:
CREATE TABLE `cities` (
`name` varchar(45) NOT NULL,
PRIMARY KEY (`name`)
) ;
INSERT INTO `cities` VALUES ('JAEN'),('ALBACETE');
But I get the error Caused by: org.h2.jdbc.JdbcSQLException: Table "CITIES" already exists;
What's weird is that if I remove the CREATE TABLE statement, I get:
Caused by: org.h2.jdbc.JdbcSQLException: Table "CITIES" not found;
The only thing that works is using DROP TABLE IF EXISTS, but well, I don't think I should need to.
What's going on? What's the proper way of pre-populating static data into an H2 memory DB?
(The table "already exists" because H2 executes the INIT=runscript on every new connection while DB_CLOSE_DELAY=-1 keeps the database alive, and hibernate.hbm2ddl.auto=create also competes to create the schema; pick one initialization mechanism, not two.)
1) The Hibernate way: use an import.sql file, or specify files:
spring.jpa.properties.hibernate.hbm2ddl.import_files=file1.sql,file2.sql
http://docs.spring.io/spring-boot/docs/current/reference/html/howto-database-initialization.html
2) Spring Boot: use default schema.sql & data.sql files
or specify files through properties
spring.datasource.schema = file1.sql
spring.datasource.data = file1.sql, file2.sql
http://docs.spring.io/autorepo/docs/spring-boot/1.0.2.RELEASE/reference/html/howto-database-initialization.html
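A minimal sketch of the Spring Boot route (schema.sql and data.sql are the default file names Spring Boot picks up from src/main/resources; drop the INIT=runscript clause from the URL so the script doesn't re-run on every connection):
-- schema.sql
CREATE TABLE cities (
name VARCHAR(45) NOT NULL,
PRIMARY KEY (name)
);
-- data.sql
INSERT INTO cities VALUES ('JAEN'), ('ALBACETE');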

PostgreSQL Unique Index Error

I'm busy writing a script to restore a database backup and I've run into something strange.
I have a table.sql file which only contains CREATE TABLE statements, like
create table ugroups
(
ug_code char(10) not null ,
ug_desc char(60) not null
);
I have a second data.csv file which only contains delimited data, such as
xyz | dummy data
abc | more nothing
fun | is what this is
Then I have a third index.sql file which only creates indexes, like
create unique index i_ugroups on ugroups
(ug_code);
I use the commands from the terminal like so
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/table.sql" # loads table.sql
I have a batch script that loads in the data which works perfectly. Then I use the command
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/index.sql" # loads index.sql
When I try to create the unique indexes, it gives me the error
ERROR: could not create unique index "i_ugroups"
DETAIL: Key (ug_code)=(transfers ) is duplicated.
What's strange is that when I execute the table.sql file and the index.sql file together, and load the data last, I get no errors and it all works.
Is there something I am missing? Why would it not let me create the unique indexes after the data has been loaded?
There are two rows whose ug_code column contains the value "transfers ", and that's why it can't create the index.
Why it would succeed if you create the index first, I don't know. But I would suspect that the second time it tries to insert "transfers " into the database, that insert simply fails, while the other data gets inserted successfully.
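To locate the offending rows before building the index, a query along these lines should work (table and column names taken from the question):
SELECT ug_code, COUNT(*) AS cnt
FROM ugroups
GROUP BY ug_code
HAVING COUNT(*) > 1;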
