Is there any way to import data from S3 into MSSQL? - sql-server

I have a Hadoop cluster running on Amazon EMR which processes some data and writes the output to S3. Now I want to import that data into MSSQL. Is there any open source connector for that? Or do I have to manually download the data, change the default separator '\001' to ',' and then import the data into MSSQL?

There is no direct way.
Use the config below in your MapReduce job so the output is written with ',' as the delimiter:
job.getConfiguration().set("mapreduce.textoutputformat.separator", ",");
The best way is to keep the processed data in S3. You can write CSV to S3, then write a PHP/Java/shell script to download the data from S3 and load it into MSSQL.
You can download the processed output directory from S3 and then use BULK INSERT to load the CSV files into MSSQL.
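A minimal sketch of that BULK INSERT step, assuming the files have already been downloaded locally; the path and table name below are hypothetical and the target table must already exist with matching columns:
BULK INSERT dbo.ProcessedData
FROM 'C:\data\part-r-00000.csv'
WITH (
    FIELDTERMINATOR = ',',    -- matches the delimiter set in the MapReduce job
    ROWTERMINATOR = '0x0a',   -- Unix line endings, typical for Hadoop output
    TABLOCK
);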

You can use Apache Sqoop for this use case.
Apache Sqoop supports importing from and exporting to mssql.
The following article explains how to install Sqoop on EMR:
http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/
Please refer to the Sqoop user guide:
http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html
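For this particular use case the relevant direction is sqoop export (HDFS/S3 -> MSSQL). A hedged sketch, assuming the SQL Server JDBC driver has been placed where Sqoop can find it, and with placeholder host, database, table and output path:
sqoop export \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=<database>" \
  --username <user> --password <password> \
  --table target_table \
  --export-dir /user/hadoop/output \
  --input-fields-terminated-by ','
On EMR, --export-dir can also point at the S3 output (an s3:// URI) if the cluster's filesystem is configured for it.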

Related

How to import an Oracle DB .dmp file using DBeaver?

I'm currently trying to import an Oracle DB .dmp (dump) file into my Oracle DB using DBeaver but have trouble doing so.
The Oracle DB in question is running in a docker container. I successfully connected to this Oracle database with DBeaver, and can thus browse the database using DBeaver.
Currently however, the DB is empty. That's where the .dmp file comes in.
I want to import this .dmp file into my database, under a certain schema, but I cannot seem to do this. The dump file is named something like 'export.dmp' and is around 16 MB in size.
I'd like to import the data from the .dmp file to be able to browse the data to get familiar with it, as similar data will be stored in our own database.
I looked online but was unable to get an answer that works for me.
I tried using DBeaver but I don't seem to have the option to import or restore a DB via a .dmp file. At best, DBeaver proposes to import data using a .CSV file. I also downloaded the Oracle tool SQLDeveloper, but I can't manage to connect to my database in the docker container.
Online there is also talk of an import / export tool that supposedly can create these .dmp files and import them, but I'm unsure how to get this tool and whether that is the way to do it.
If so, I still don't understand how I can get to browse the data in DBeaver.
How can I import and browse the data from the .dmp file in my Oracle DB using DBeaver?
How to find Oracle datapump dir
presumably set to /u01/app/oracle/admin/<mydatabase>/dpdump on your system
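If your installation differs, you can look the path up from inside the database, e.g. from sqlplus as a privileged user (DATA_PUMP_DIR is the default directory object name):
sqlplus / as sysdba
SELECT directory_name, directory_path FROM dba_directories WHERE directory_name = 'DATA_PUMP_DIR';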
How to copy files from host to docker container
docker cp export.dmp container_id:/u01/app/oracle/admin/<mydatabase>/dpdump/export.dmp
How do I get into a Docker container's shell
docker exec -it <mycontainer> bash
How to import an Oracle database from dmp file
If it was exported using expdp, then start the import with impdp:
impdp <username>/<password> dumpfile=export.dmp full=y
It will output the log file in the same default DATA_PUMP_DIR directory in the container.
Oracle has two utilities for importing dumps, the original Import (imp) and Data Pump Import (impdp). With imp you can't use database directories and you have to specify the file location; impdp, on the other hand, requires a database directory object.
Having said that, you can't import Oracle export dumps using DBeaver; you have to run imp or impdp from the OS.
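For reference, the two invocations look roughly like this (credentials and paths are placeholders; imp reads dumps created with the classic exp utility, while impdp reads expdp dumps):
# classic import: the dump file location is given directly
imp <username>/<password> file=/tmp/export.dmp full=y log=/tmp/import.log
# Data Pump import: the dump file must sit in a database directory object
impdp <username>/<password> directory=DATA_PUMP_DIR dumpfile=export.dmp full=y logfile=import.log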

How to load data from UNIX to snowflake

I have created CSV files on a UNIX server using Informatica. I want to load those CSV files directly from the UNIX box into Snowflake using SnowSQL; can someone help me with how to do that?
Log into SnowSQL:
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-log-in.html
Create a database, table and virtual warehouse, if you have not already done so:
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-create-objects.html
Stage the CSV files, using PUT:
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-stage-data-files.html
Copy the files into the target table using COPY INTO:
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-copy-into.html
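Putting the PUT and COPY INTO steps together, a minimal SnowSQL session might look like this (account, database, table and file path are hypothetical, and the target table is assumed to exist):
snowsql -a <account> -u <user> -d <database> -s <schema>
PUT file:///data/exports/*.csv @%my_csv_table AUTO_COMPRESS=TRUE;
COPY INTO my_csv_table
  FROM @%my_csv_table
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);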

import SQL server database to HDFS or to HIVE

I deployed an HDInsight cluster on Azure. I need to copy a SQL database to an HDFS location or directly to Hive. I am new to establishing these connections. Please let me know your suggestions. Thank you.
It looks like the requirement isn't entirely clear; can you provide more details about this task?
In the meantime, I suggest you verify the configuration files (.xml) and the connections for Hive or HDFS.
The links below might be helpful for debugging:
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-use-hive
https://msdn.microsoft.com/en-us/library/dn749882.aspx#sec3
If I am understanding your requirement correctly, you need to use Sqoop to import the SQL database into HDFS. Use the command below (if Sqoop is already installed); it will copy all the tables in the schema to the default location:
# note: this example uses a MySQL connect string; for SQL Server it would be of the
# form jdbc:sqlserver://<host>:1433;databaseName=<database>, with the JDBC driver installed
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/<schema name> \
  --username <username> \
  --password <password>
Then you can use the command below to check the tables that have been imported:
hadoop fs -ls
For more info, you can visit https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
https://community.hortonworks.com/questions/13132/best-practice-to-import-data-from-sql-server-to-hi.html
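Since the question also mentions loading directly into Hive, here is a hedged sketch of a single-table import using Sqoop's --hive-import option (the SQL Server JDBC driver must be available to Sqoop; host, database and table names are placeholders):
sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=<database>" \
  --username <user> --password <password> \
  --table source_table \
  --hive-import \
  --hive-table source_table \
  -m 1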

How to export data to local system from snowflake cloud data warehouse?

I am using the Snowflake cloud data warehouse, which, like Teradata, hosts data. I am able to run queries and get results in the web UI itself. But I am unclear how one can export the results to a local PC so that we can report based on the data.
Thanks in advance
You have two options, both of which use sfsql, which is based on HenPlus. The first option is to export the result of your query to an S3 staging file as shown below:
CREATE STAGE my_stage URL='s3://loading/files/' CREDENTIALS=(AWS_KEY_ID='****' AWS_SECRET_KEY='****');
COPY INTO @my_stage/dump
FROM (select * from orderstiny limit 5) file_format=(format_name='csv' compression='gzip');
The other option is to capture the sql result into a file.
test.sql:
set-property column-delimiter ",";
set-property sql-result-showheader off;
set-property sql-result-showfooter off;
select current_date() from dual;
$ ./sfsql < test.sql > result.txt
For more details and help, log in to your Snowflake account and access the online documentation, or post your question to Snowflake Support via the support portal, which is accessible through the Snowflake help section (Help -> Support Portal).
Hope this helps.
You can use a COPY command to export a table (or query results) into a file on S3 (using "stage" locations), and then a GET command to save it onto your local filesystem. You can only do it from the "sfsql" Snowflake command line tool (not from web UI).
Search the documentation for "unloading", you'll find more info there.
You can directly download the data from Snowflake to the local filesystem without staging to S3 or redirecting via a Unix pipe.
Use COPY INTO <location> to unload the table data to the table stage:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html
snowsql$> copy into @%test_table/result/data_ from test_table
file_format = (TYPE = '[FILE_TYPE]' compression = '[COMPRESSION_TYPE]');
Use the GET command to download the data from the table stage to the local filesystem:
https://docs.snowflake.net/manuals/sql-reference/sql/get.html
snowsql$> get @%test_table/result/data_ file:///tmp/;

importing data from sql server to hbase

I know that Sqoop allows us to import data from an RDBMS into HDFS. I was wondering if the SQL Server connector in Sqoop also allows us to import directly into HBase? I know we can do this with MySQL; can the same be done with SQL Server too?
I am working in the Hortonworks Sandbox, and I was able to pull data from a SQL Server instance into an HBase table by doing the following steps:
Get the SQL Server JDBC driver onto the Hadoop box.
curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
Copy the driver into the correct location for sqoop to be able to find it:
cp sqljdbc_4.0/enu/sqljdbc4.jar /usr/lib/sqoop/lib
Run a sqoop import
sqoop import --hbase-create-table --hbase-table table_name_in_hbase --column-family cf_name --hbase-row-key my_ID --connect "jdbc:sqlserver://hostname:1433;database=db_name;username=sqoop;password=???" --table tablename_in_sql_server -m 1
I referenced these sites:
http://hortonworks.com/hadoop-tutorial/import-microsoft-sql-server-hortonworks-sandbox-using-sqoop/
http://souravgulati.webs.com/apps/forums/topics/show/8680714-sqoop-import-data-from-mysql-to-hbase
It is possible to directly import data into HBase from any relational database using Sqoop.
This post shows how it can be done with a MySQL database server, importing the data directly into HBase.
You can import data into HBase from any RDBMS as long as it provides a JDBC driver; Sqoop interfaces with other RDBMSs via JDBC.
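For example, with MySQL the command has the same shape as the SQL Server one above; essentially only the JDBC connect string (and driver jar) changes. A rough sketch with placeholder names:
sqoop import \
  --connect jdbc:mysql://<host>/<database> \
  --username <user> --password <password> \
  --table source_table \
  --hbase-create-table \
  --hbase-table table_name_in_hbase \
  --column-family cf \
  --hbase-row-key id \
  -m 1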
