Bulk load to HDFS from Sybase database

I need to load data from Sybase (a production database) into HDFS. Using Sqoop takes a very long time and frequently hits the production database. So I am thinking of creating data files from a Sybase dump and then copying those files to HDFS. Is there any tool (open source) available to create the required data files (flat files) from a Sybase dump?
Thanks,

The iq_bcp command line utility is designed to do this on a per-table basis. You just need to generate a list of tables, and you can iterate through the list; a sketch of one way to generate those commands follows below.
iq_bcp [ [ database_name. ] owner. ] table_name { in | out } datafile
iq_bcp MyDB..MyTable out MyTable.csv -c -t#$#
-c specifies a character (plaintext) output
-t allows you to customize the column delimiter. You will want to use a character or series of characters that does not appear in your extract, e.g. if you have a text column that contains text with a comma, a CSV will be tricky to import without additional work.
Sybase IQ: iq_bcp
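One way to iterate is to have the database generate the iq_bcp commands for you and run the resulting script. A minimal sketch, assuming the SYS.SYSTABLE catalog view, an owner of DBA, and a database named MyDB (catalog and owner names are assumptions; verify them against your IQ version):
SELECT 'iq_bcp MyDB..' || table_name || ' out ' || table_name || '.csv -c -t#$#'
FROM SYS.SYSTABLE
WHERE table_type = 'BASE'              -- user tables only
  AND creator = USER_ID('DBA');        -- tables owned by DBA
Spool the output to a file and execute it as a shell script to export every table.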

Related

Snowflake Data Load Wizard - File Format - how to handle null string in File Format

I am using the Snowflake Data Load Wizard to upload a CSV file to a Snowflake table. The Snowflake table structure identifies a few columns as 'NOT NULL' (non-nullable). The problem is, the wizard is treating empty strings as NULL, and the Data Load Wizard issues the following error:
Unable to copy files into table. NULL result in a non-nullable column
File '<...../load_data.csv', line 2, character 1
Row 1, column "<TABLE_NAME>"["PRIMARY_CONTACT_ROLE":19]
I'm sharing my File Format parameters from the wizard:
I then updated the DDL of the table by removing the "NOT NULL" declaration on the PRIMARY_CONTACT_ROLE column, re-created the table, and this time the data load of 20K records was successful.
How do we fix the File Format in the wizard so that Snowflake does not treat empty strings as NULLs?
The option you have to set is EMPTY_FIELD_AS_NULL = FALSE. Unfortunately, modifying this option is not possible in the wizard. You have to create your file format, or alter your existing file format, manually in a worksheet as follows:
CREATE FILE FORMAT my_csv_format
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
EMPTY_FIELD_AS_NULL = FALSE;
This will cause empty fields to be loaded as empty strings rather than as NULL values.
The relevant documentation can be found at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv.
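Once the file format exists, you reference it when loading instead of relying on the wizard defaults. A minimal sketch, assuming a hypothetical target table my_table and its table stage (substitute your own table and stage):
COPY INTO my_table
  FROM @%my_table                                  -- table stage; a named stage works too
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');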
Let me know if you need a full example of how to upload a CSV file with the SnowSQL CLI.
Just to add, there are additional ways you can load your CSV into Snowflake without having to specify a file format.
You can use pre-built third-party modeling tools to upload your raw file, adjust the default column types to your preference and push your modeled data back into snowflake.
My team and I are working on such a tool - Datameer, feel free to check it out here
https://www.datameer.com/upload-csv-to-snowflake/

Search string from large amount of data (millions of records in CSV file)

I have millions of records in a CSV file and I need to do a string comparison and show the filtered records in a Bootstrap data table.
The CSV files are updated on a daily basis with millions of records.
Note:
If I import the CSV file into a SQL database and run a search query against the table to get the result, it takes a lot of time.
Can I search the CSV file without importing it into SQL?
Is there any specific method/way to store the data?
Are there any tools for text search, or can it be done in MS SQL?
Any help will be appreciated.
You can use OPENROWSET to read your CSV file directly in SQL Server
You will need "Ad Hoc Distributed Queries" enabled:
EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'ad hoc distributed queries', 1
GO
RECONFIGURE
GO
Then you define the datasource this way:
SELECT *
FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'Text;Database=C:\Temp\;IMEX=1;', -- the path to csv file
[data_file#csv] -- csv file name, please note # instead of dot
) as t
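Since the goal is searching, you can then filter that rowset like any regular table. A minimal sketch (SomeColumn is a hypothetical column name; use one from your file's header row):
SELECT t.*
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.12.0',
    'Text;Database=C:\Temp\;IMEX=1;',        -- folder containing the csv file
    [data_file#csv]                          -- csv file name, # instead of dot
) AS t
WHERE t.SomeColumn LIKE '%search string%'    -- hypothetical column; adjust to your data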
If your file is a real CSV (comma separated) then it should work with the default settings.
If your file is not a real CSV (comma separated), you can define your own file format by placing a file named "SCHEMA.INI" in the same folder as the CSV file.
This schema.ini file must contain a section defining the structure of your data file (see details here: Schema.ini File (Text File Driver)).
Example:
[data_file.csv]
Format=Delimited(;)
DecimalSymbol=.
ColNameHeader=True
MaxScanRows=10
Col1=ID Long
Col2=DESCR char width 4
Col3=FIELD_X char width 255
Col4=FIELD_Y DateTime
...
...
Coln=aFieldName aDataType aWidth
Can i do search from csv file without importing it in SQL?
Yes, there are many ways. If you're on Windows you can use the command prompt's find command: find "string to find" C:\Windows\file.csv
If there any specific method/way to store data?
Depends on what you need to do with your matches. What do you need to do with your results?
Is there any tools for text search or it can be done in MS SQL?
Yes to both. A database may not be the best place to store the data if it's not relational. If you need to find specific patterns in these text files, then have a look at regex.

Tilde (~) Delimited File Read in SSIS

I'm trying to load a tilde (~) delimited .DAT file into a SQL Server DB using SSIS. When I use a Flat File Source to read the file, I don't see the option of a ~ delimiter. I'm pasting a row from my file below:
7318~97836: LRX PAIN MONTHLY DX~001~ALL OTHER NSAIDs~1043676~001~1043676~001~OSR~401~01~ORALS,SOL,TAB/CAP RE~156720~50MG~ANSAID~100 0170-07
Here, I need to split the data into columns at each ~, i.e.
Column 1 should have '7318', Column 2 should have '97836: LRX PAIN MONTHLY DX', and so on.
Can someone help me with this? Can this be done using a Flat File Source or do I need to use a Script Task?
Sure you can; you just need to configure the "Column delimiter" property in the "Flat File Connection Manager Editor". There are some predetermined choices there, but you can click and type any separator you want:
After that you can click "refresh" and then "OK".

Exporting Specific Columns, Specific Rows from a Specific Table of a Specific Database in MySQL

I need to export only a subset of columns from a very large table containing a large number of columns. Also, this table contains millions of rows, so I want to export only specific rows from it.
I have recently started using MySQL; earlier I was working on Oracle.
This worked for me:
mysql -u USERNAME --password=PASSWORD --database=DATABASE \
--execute='SELECT `field_1`, `field_2` FROM `table_name`' -X > file.xml
And then importing the file, using command:
LOAD XML LOCAL INFILE '/pathtofile/file.xml'
INTO TABLE table_name(field_1, field_2, ...);
What format do you need the data in? You could get a CSV using a query, for example:
SELECT column1, column2, column3, ... FROM table WHERE column1 = criteria1 AND ...
INTO OUTFILE '/tmp/output.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
http://www.tech-recipes.com/rx/1475/save-mysql-query-results-into-a-text-or-csv-file/
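If you later need to load that CSV back into a MySQL table (mirroring the LOAD XML step in the other answer), a minimal sketch using LOAD DATA INFILE with matching delimiter settings (table and column names are placeholders):
LOAD DATA INFILE '/tmp/output.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(column1, column2, column3);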
An administration tool like phpMyAdmin (http://www.phpmyadmin.net/) could also be used to run the query and then export the results in a variety of formats.

Export Image column from SQL Server 2000 using BCP

I've been tasked with extracting some data from an SQL Server 2000 database into a flat format on disk. I've little SQL Server experience.
There is a table which contains files stored in an "IMAGE" type column, together with an nvarchar column storing the filename.
It looks like there are numerous types of files stored in the table: Word docs, XLS, TIF, txt, zip files, etc.
I'm trying to extract just one row using BCP, doing something like this:
bcp "select file from attachments where id = 1234" queryout "c:\myfile.doc" -S <host> -T -n
This saves a file, but it is corrupt and I can't open it with Word. When I open the file with Word, I can see a lot of the text, but I also get a lot of un-renderable characters. I have similar issues when trying to extract image files, e.g. TIF; photo software won't open the files.
I presume I'm hitting some kind of character encoding problems.
I've played around with the -C (e.g. trying RAW) and -n options within BCP, but still can't get it to work.
The table in SQL Server has a collation of "SQL_Latin1_General_CP1_CI_AS".
I'm running BCP remotely from my Windows 7 desktop.
Any idea where I'm going wrong? Any help greatly appreciated.
I got this working by changing the default options that BCP prompts you for when you invoke the command. The one that made the difference was changing the prefix-length field from 4 to 0.
bcp "select file from attachments where id = 1234" queryout "c:\myfile.doc" -S -T -n
At the prompts that follow, enter:
[image] : I (enter a capital "I")
0
0
Enter
Save format file: Y
...and your file is there.
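To sanity-check the export, you can compare the size of the file on disk with the byte length stored in the table; this sketch reuses the table and column names from the question. With a 4-byte prefix length the exported file ends up slightly larger than the stored value, which is enough to corrupt formats like DOC or TIF; with prefix length 0 the sizes should match.
SELECT id, DATALENGTH([file]) AS stored_bytes   -- byte length of the IMAGE value
FROM attachments
WHERE id = 1234;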
