Why don't we use sqoop eval in a production environment, and what is the disadvantage of using sqoop eval?

Using sqoop eval we can verify the connection to the database, but beyond that, why don't we use it for running queries in a production environment? What is its importance in production?
sqoop eval \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--query "SELECT * FROM orders LIMIT 10"

Sqoop eval only prints the output of a query to the console; it is not used to import data from an RDBMS into HDFS. In other words, sqoop eval is a way to preview query results (or verify connectivity) on the console only, and it provides no functionality to move data into HDFS.
Sqoop import, on the other hand, is what actually imports the data into HDFS.
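For comparison, here is a minimal sketch of an actual import that moves the same orders table into HDFS (it reuses the connection details from the eval example above; the target directory is only an illustration):
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders_import \
-m 1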

Related

Auto generate models in Nest js from existing SQL Server DB using Sequelize

I am working on an API using Nest.js that has to connect to an existing database.
There are too many tables to create the entity classes manually in Nest.
I am using Sequelize.
Is there a way that I can auto-generate the models?
Sequelize-auto seems to only work well with Express; I need something that can generate class-based model entities.
I found a solution: you can use the sequelize-typescript-generator library to create the entities and models automatically based on the schema and connection.
From the description:
You can run this globally once you have installed the package.
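Assuming the package is published on npm under the same name, installing it globally would typically be:
npm install -g sequelize-typescript-generator
You will likely also need the Sequelize dialect library for your database available (e.g. mysql2 for MySQL).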
Usage of the command:
-h, --host Database IP/hostname
-p, --port Database port. Defaults:
- MySQL/MariaDB: 3306
- Postgres: 5432
- MSSQL: 1433
-d, --database Database name
-s, --schema Schema name (Postgres only). Default:
- public
-D, --dialect Dialect:
- postgres
- mysql
- mariadb
- sqlite
- mssql
-u, --username Database username
-x, --password Database password
-t, --tables Comma-separated names of tables to process
-T, --skip-tables Comma-separated names of tables to skip
-i, --indices Include index annotations in the generated models
-o, --out-dir Output directory. Default:
- output-models
-C, --case Transform tables and fields names
with one of the following cases:
- underscore
- camel
- upper
- lower
- pascal
- const
You can also specify a different
case for model and columns using
the following format:
<model case>:<column case>
-S, --storage SQLite storage. Default:
- memory
-L, --lint-file ES Lint file path
-l, --ssl Enable SSL
-r, --protocol Protocol used: Default:
- tcp
-a, --associations-file Associations file path
-g, --logs Enable Sequelize logs
-n, --dialect-options Dialect native options passed as json string.
-f, --dialect-options-file Dialect native options passed as json file path.
-R, --no-strict Disable strict typescript class declaration.
Example of the command:
stg \
-D mysql \
-h localhost \
-p 3306 \
-d myDatabase \
-u myUsername \
-x myPassword \
--indices \
--case camel \
--out-dir path/to/output/dir \
--clean

How to open mongo-db dump file?

I have a .dump file (8GB) which is a mongo database that I need to work with.
I'm working with Robo 3T.
I've tried:
a) the menu options in the Robo 3T GUI
b) the command mongorestore --db cert-db certctream.dump, which failed with the error below
(using certctream without the extension didn't work either)
Failed: file certctream.dump does not have .bson extension
What am I missing?
Solution:
mongoimport --db <new_db_name> --host localhost:27017 path_to_dump_file
To import a mongo collection:
mongoimport -d database_name -c collection_name path/to/collection_name.json
In this case, collection_name.json is the exported JSON file for the corresponding collection.
To import a mongo database:
mongorestore -d database_name
In this case, database_name corresponds to a dump folder (as produced by mongodump) rather than a single file.
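For example, using the database name from the question (the dump directory path is hypothetical):
mongorestore -d cert-db /path/to/dump/cert-db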
To restore from a .dump file you need to use the archive argument:
mongorestore --archive=mydump.dump
GOTCHA
The dump may have also been gzipped, in which case you'll get:
Failed: stream or file does not appear to be a mongodump archive
In this case, try adding --gzip as it might do the job:
mongorestore --gzip --archive=mydump.dump

sqoop import from SQL Server with windows authentication

I am trying to import a table from Microsoft SQL Server 11.0.5058 into HDFS through Sqoop (which is a service on Hortonworks Data Platform). The user I have only has Windows authentication (LDAP) on SQL Server.
I tried a few approaches:
1. Kept sqljdbc4.jar in the Sqoop shared library and used the import command.
2. Downloaded sqljdbc_auth.dll, kept it in the Java library path, and tried running the import command.
But no luck.
This worked for me:
HADOOP_CLASSPATH=/apps/lib/java/jdbc/jtds-1.3.1-patched/jtds-1.3.1.jar \
sqoop import --table XXXXX --connect "jdbc:jtds:sqlserver://XXXX:1433;useNTLMv2=true;domain=XXXX;databaseName=XXXXXX" \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXX -P \
--verbose
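If you would rather not set HADOOP_CLASSPATH on every run, the jTDS jar can instead be copied into Sqoop's lib directory; the path below is a typical HDP location and is an assumption:
cp jtds-1.3.1.jar /usr/hdp/current/sqoop-client/lib/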

Problems with Sqoop import-all-tables command

I am trying to import all the tables from several SQL Server databases into HDFS using Sqoop. I am using Cloudera CDH 5.7. So I type the following command:
sqoop import-all-tables --connect "jdbc:sqlserver://X.X.X.X:1433;database=FEPDB" --username XXXXX --password XXXXX --hive-import
It runs successfully, but not all of the tables present in the FEPDB database are imported; I don't find them in the Hive warehouse directory in HDFS or when I list the tables present in Hive.
So I tried to import all the tables into a directory in HDFS and then create hive tables. I gave the following command:
sqoop import-all-tables --connect "jdbc:sqlserver://X.X.X.X:1433;database=FEPDB" --username XXXXX --password XXXXX --target-dir "/user/FEPDB"
It gives me an error saying
unrecognized argument --target-dir
Doesn't the --target-dir argument work with the import-all-tables command? And why aren't all the tables from the database imported in the first place? Is there a way to get past these errors and import all the tables more easily?
Any help would be appreciated. Thank you.
import-all-tables expects the --warehouse-dir parameter instead of --target-dir; provide --warehouse-dir with an HDFS path and it will work.
Please try this: sqoop import-all-tables --connect "jdbc:sqlserver://X.X.X.X:1433;database=FEPDB" --username XXXXX --password XXXXX --warehouse-dir
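For example, reusing the HDFS directory from the question as the warehouse location (any writable HDFS path works; each table lands in its own subdirectory):
sqoop import-all-tables --connect "jdbc:sqlserver://X.X.X.X:1433;database=FEPDB" --username XXXXX --password XXXXX --warehouse-dir /user/FEPDB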

Sqoop and MSSQL with window auth on linux fails

I'm using Apache Sqoop to connect to an MS SQL Server using Windows authentication, but
I am not able to log in when I run:
sqoop list-databases --connect jdbc:sqlserver://192.168.xx.xx:1433;username=xxxxx;password=xxxxxx;database=xxxxx;
I'm getting exception:
java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'username'. ClientConnectionId:a593dc10-2d06-4b8b-b53b-e743fb133d0e
at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:73)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'username'. ClientConnectionId:a593dc10-2d06-4b8b-b53b-e743fb133d0e
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:254)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:84)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:2908)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:2234)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:41)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:2220)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1326)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:991)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:827)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1012)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:824)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
... 7 more
I am sorry to say it, but the current answer is that Sqoop does not support integrated (Kerberos) authentication for any database.
https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html#answer-82516
I spent months on this and was told by Cloudera that it is just not possible. The Kerberos tokens are lost when the mappers spawn (as YARN transitions the job to its internal security subsystem).
I think this is something that would have to be added to Sqoop itself: much like HBase MapReduce jobs have to pass job configuration to their mappers in code, Sqoop would have to do the same.
I spent weeks trying different things, doing network traces (all my servers are tied to AD with Centrify, etc.), watching the token drop, and finally found a paper written by Yahoo about YARN that explains the internal token subsystem it uses (after using Kerberos to verify someone externally, it moves to a different token-based subsystem for performance).
If you are coming from a Linux machine and you want to leverage direct authentication, you need to do a few things.
First, your Linux system needs to be integrated with Active Directory via Kerberos (tons of articles on this).
Second, you need to change your connection string:
"jdbc:sqlserver://;serverName=$fqdnOfDatabaseHost;database=$databaseName;integratedSecurity=true;authenticationScheme=JavaKerberos"
Replace the $variables with the values for your environment.
There is more information on this here:
http://blogs.msdn.com/b/psssql/archive/2015/01/09/jdbc-this-driver-is-not-configured-for-integrated-authentication.aspx
Using Kerberos Integrated Authentication to Connect to SQL Server
https://msdn.microsoft.com/en-us/library/gg558122(v=sql.110).aspx
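Putting the pieces together, a sketch of how that connection string could be used with Sqoop (hostname and database name are placeholders):
sqoop list-databases --connect "jdbc:sqlserver://;serverName=sqlhost.example.com;database=mydb;integratedSecurity=true;authenticationScheme=JavaKerberos"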
As a secondary option, if you do not feel like going through the hassle of integrating Linux with AD via Kerberos, there are third-party paid drivers you can purchase that allow you to avoid this.
DataDirect provides one; I cannot link it since I don't have enough rep (I created this account to reply to this).
Use "integratedSecurity=true" for windows authetication instead of username and password.
Your sqoop statement should look like below
sqoop list-databases --connect 'jdbc:sqlserver://192.168.xx.xx:1433;integratedSecurity=true;database=xxxxx;'
You can follow the steps below to use jTDS for Sqoop with SQL Server and Windows authentication:
1) Download the jTDS driver from: https://sourceforge.net/projects/jtds/files/ (find the FAQ on jTDS at http://jtds.sourceforge.net/faq.html)
2) Copy the jTDS jar files into the Sqoop lib directory
3) Use the following connection string template, adjusted for your environment, to connect:
jdbc:jtds:sqlserver://db_server:1433/DB_NAME;domain=NT_DOMAIN_NAME;integratedSecurity=true;authenticationScheme=JavaKerberos
I have found a solution to this, provided by another user here: https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html
Basically, you switch to the jTDS driver, which you can download here: http://jtds.sourceforge.net/
Per Rajendra Manjunath
"
Sqoop SQL Server data import to HDFS worked with manually supplied authentication (using Windows credentials) by adding parameters to the JDBC driver, since integrated security is not supported by the SQL Server driver as of now because of Kerberos authentication (delegated tokens are distributed over the cluster while the MR job runs).
So we need to pass the Windows credentials with a password, with integrated security disabled, to import the data into the system. As the normal SQL Server driver does not support this, I used jtds.jar and a different driver class to pull the data into the Hadoop lake.
A sample command I tried on the server follows:
sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://:;useNTLMv2=true;domain=;databaseName=XXXXXXXXXXXXX" \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \
--verbose --target-dir /tmp/33 -m 1 -- --schema dbo
"
Here are some examples that worked for me:
List databases
sqoop list-databases --connect "jdbc:jtds:sqlserver://databasehostname.yourdomain.com:1433;useNTLMv2=true;domain=myactivedirectorydomain.com" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P
List tables
sqoop list-tables --connect "jdbc:jtds:sqlserver://databasehostname.yourdomain.com:1433;useNTLMv2=true;domain=myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username jmiller.admin -P
Pull data example
sqoop import --table TABLENAMEHERE --connect "jdbc:jtds:sqlserver://databasehostname.yourdomain.com:1433;useNTLMv2=true;domain=myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P --fields-terminated-by '\001' --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo
Note: in the above examples you need to change the username to your own, and change the database name in the list-tables or import command to the one you need (note that the AD account you use will require access to the data).
Integrated authentication does not work with the MS SQL Server JDBC driver in a secure cluster with AD integration, because the containers will not have the security context: the Kerberos tokens are lost when the mappers spawn (as YARN transitions the job to its internal security subsystem).
Here is my repo that was used as a workaround to get Kerberos/AD authentication:
https://github.com/chandanbalu/mssql-jdbc-krb5
The solution implements a Driver that overrides the connect method of the latest MS SQL JDBC driver (mssql-jdbc-9.2.1.jre8.jar), obtains a ticket for the keytab file/principal, and hands back that connection.
You can grab the latest build of this custom driver from the release folder there.
Sqoop command
export HADOOP_CLASSPATH=/efs/home/c795701/mssql-jdbc-krb5/target/scala-2.10/mssql-jdbc-krb5_2.10-1.0.jar:/efs/home/c795701/.ivy2/jars/scala-library-2.11.1.jar
sqoop import -libjars "/efs/home/c795701/mssql-jdbc-krb5/target/scala-2.10/mssql-jdbc-krb5_2.10-1.0.jar,/efs/home/c795701/.ivy2/jars/scala-library-2.11.1.jar" \
-files "/efs/home/c795701/c795701.keytab" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver hadoop.sqlserver.jdbc.krb5.SQLServerDriver \
--connect "jdbc:krb5ss://<SERVER_NAME>:1433;databasename=<DATABASE_NAME>;integratedSecurity=true;authenticationScheme=JavaKerberos;krb5Principal=c795701#NA.DOMAIN.COM;krb5Keytab=/efs/home/c795701/c795701.keytab" \
--query "SELECT TOP 1000 * FROM <TABLE_NAME> WHERE \$CONDITIONS" \
--delete-target-dir \
--target-dir "/dev/product/sandbox/<table_name>" \
--num-mappers 1 \
--verbose \
-- --schema "dbo"
For Windows Authentication and importing directly into Hive this worked for me:
HADOOP_CLASSPATH=/apps/lib/java/jdbc/jtds-1.3.1-patched/jtds-1.3.1.jar \
sqoop import --table XXXXX --connect "jdbc:jtds:sqlserver://XXXX:1433;useNTLMv2=true;domain=XXXX;databaseName=XXXXXX" \
--split-by XXXXX --num-mappers 10 --hive-import --hive-table test --hive-overwrite \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXX -P \
--verbose --target-dir /apps/hive/warehouse/XXXX/XXX-$(uuidgen) \
-- --schema XXXXX
