Run Sqoop command on Windows using jar

I am new to Hadoop HDFS and Sqoop.
I installed Hadoop HDFS on a single-node machine, and I am able to put and get files to/from HDFS.
Requirement [do not want to install Hadoop and Sqoop on the client machine]:
I want to access HDFS from a different machine using WebHDFS, without installing Hadoop on the client machine, and that part is working fine.
To access HDFS, I am using the WebHDFS Java client jar.
Now I want to execute Sqoop export/import commands against the remote HDFS.
Case: export to the local file system on a machine where neither Hadoop nor Sqoop is installed; we are using only the Hadoop and Sqoop client jars.
public int importToHdfs(String tablename, String stmpPath) {
    int resultcode = -1;
    try {
        String s = File.separator;
        // Parent directory of the supplied path; Sqoop writes its generated code here
        String outdir = stmpPath.substring(0, stmpPath.lastIndexOf(s));
        String[] str = {
            "import",
            "--connect", jdbcURL,
            "--table", tablename,
            "--username", sUserName, "--password", sPassword,
            "-m", "1",
            "--target-dir", "/tmp/user/hduser",   // HDFS directory for the imported data
            "--outdir", outdir
        };
        Configuration conf = new Configuration();
        resultcode = Sqoop.runTool(str, conf);
    } catch (Exception ex) {
        System.out.print(ex.toString());
    }
    return resultcode;
}
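For context, a minimal sketch of how this method might be invoked; the enclosing class name SqoopClient, the table name, and the output path are assumptions, not part of the original code:

public static void main(String[] args) {
    // Hypothetical caller; assumes jdbcURL, sUserName and sPassword are fields
    // of the enclosing class and the Sqoop/Hadoop client jars are on the classpath.
    SqoopClient client = new SqoopClient();
    int resultcode = client.importToHdfs("employees", "C:\\temp\\sqoop\\gen");
    System.out.println("Sqoop exit code: " + resultcode); // 0 indicates success
}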

Related

Querying MS Sql Server from a Jenkins Pipeline

I've been using Jenkins (2.289.3) in a Docker container (https://hub.docker.com/r/jenkins/jenkins). The next update, to Jenkins 2.312, migrates the Docker container from Java 8 to Java 11.
I have some pipelines that use the SourceForge jTDS JDBC driver to query SQL Server (http://jtds.sourceforge.net/).
Example:
import java.sql.DriverManager
import groovy.sql.Sql
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password');
stmt = con.createStatement();
To make this work on Java 8, I ran this inside the Docker container:
cp jtds-1.3.1.jar ${JAVA_HOME}/jre/lib/ext
which loads the jar for use inside Jenkins. This mechanism no longer exists on Java 11 (the jre/lib/ext extension directory was removed).
It seems pipelines have added the @Grab syntax, e.g.
@Grab(group='net.sourceforge.jtds', module='jtds', version='1.3.1')
If I add this to my pipeline, I can see the jars are downloaded into /var/jenkins_home/.groovy/grapes/, but it doesn't seem to actually load the jar:
java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
or
java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://servername
depending on which commands I run. Either way, it appears to be due to the jar not being loaded.
All the Groovy examples use
@GrabConfig(systemClassLoader=true)
But this appears to not be supported in pipelines.
I've considered using a command-line client, but I need to parse the results of queries, and I haven't seen a tool that works well for this (i.e., one that would load results into a JSON file or similar).
I've also tried setting the -classpath argument in the docker container, eg
ENV JAVA_OPTS=-classpath /var/jenkins_home/test/jtds-1.3.1.jar
Running ps in the Docker container, I can see that the java process runs with the classpath option specified, but the jar still doesn't seem to be loaded for use.
I'm a bit lost on how to get this working, can anyone help? Thanks.
Well, I've found a workaround. It doesn't seem ideal, but it does work.
The original code:
import java.sql.DriverManager
import groovy.sql.Sql
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password');
stmt = con.createStatement();
Assuming the jar is saved at /var/jenkins_home/test/jtds-1.3.1.jar, it can be updated to:
import java.sql.DriverManager
import groovy.sql.Sql

def classLoader = this.class.classLoader
while (classLoader.parent) {
    classLoader = classLoader.parent
    if (classLoader.getClass() == java.net.URLClassLoader) {
        // load our jar into the URLClassLoader
        classLoader.addURL(new File("/var/jenkins_home/test/jtds-1.3.1.jar").toURI().toURL())
        break
    }
}
// register the driver class
Class.forName("net.sourceforge.jtds.jdbc.Driver")
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password')
stmt = con.createStatement()
Once this code has run, the jar seems to be accessible globally (even in other pipelines that don't load it).
Based on this, it seems like a better way to handle this is at Jenkins initialization rather than in the script at all. I created /var/jenkins_home/init.groovy with these contents:
def classLoader = this.class.classLoader
while (classLoader.parent) {
    classLoader = classLoader.parent
    if (classLoader.getClass() == java.net.URLClassLoader) {
        classLoader.addURL(new File("/var/jenkins_home/jars/jtds-1.3.1.jar").toURI().toURL())
        break
    }
}
Class.forName("net.sourceforge.jtds.jdbc.Driver")
After that, the scripts behave the way I'd expect them to with the jar on the classpath.
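With the driver loaded at startup, a pipeline script can then query SQL Server directly. A minimal sketch (the server name and query are placeholders; rows are collected into a list so they can be serialized afterwards, e.g. with groovy.json.JsonOutput.toJson):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

Connection con = DriverManager.getConnection("jdbc:jtds:sqlserver://servername", "user", "password");
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT name FROM sys.databases");
List<String> names = new ArrayList<String>();
while (rs.next()) {
    names.add(rs.getString("name"));  // collect each row for later processing
}
System.out.println(names);  // e.g. hand to groovy.json.JsonOutput.toJson(names)
rs.close();
stmt.close();
con.close();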

How to connect GCP Cloud Run Java Spring to Cloud SQL SQL Server

I have a little problem: I want to deploy a Java Spring app to Cloud Run and connect to Cloud SQL SQL Server. I know it is possible to connect via Unix socket for MySQL and PostgreSQL (https://cloud.google.com/sql/docs/mysql/connect-run?hl=es-419), but for SQL Server the drivers do not exist yet.
Another way is to connect through a proxy (https://medium.com/@petomalina/connecting-to-cloud-sql-from-cloud-run-dcff2e20152a).
I tried that, but I can't make it work: when the proxy starts, it tells me it is listening on 127.0.0.1 for my instance ID, but when I try to connect, the connection fails.
Here is my Dockerfile:
# Use the official maven/Java 8 image to create a build artifact.
# https://hub.docker.com/_/maven
FROM maven:3.5-jdk-8-alpine as builder
# Copy local code to the container image.
WORKDIR /app
COPY pom.xml .
COPY src ./src
COPY ohJpo-2.1.0.jar .
# download the cloudsql proxy binary
# RUN wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O ./build/cloud_sql_proxy
# RUN chmod +x ./build/cloud_sql_proxy
COPY cloud_sql_proxy /build/cloud_sql_proxy
RUN chmod +x /build/cloud_sql_proxy
# copy the wrapper script and credentials
COPY run.sh /build/run.sh
COPY credentials.json /build/credentials.json
# Build a release artifact.
RUN mvn install:install-file -Dfile=/app/ohJpo-2.1.0.jar -DgroupId=ovenfo -DartifactId=ohJpo -Dversion=2.1.0 -Dpackaging=jar
RUN mvn package -DskipTests
# Use AdoptOpenJDK for base image.
# It's important to use OpenJDK 8u191 or above that has container support enabled.
# https://hub.docker.com/r/adoptopenjdk/openjdk8
# https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
FROM adoptopenjdk/openjdk8:jdk8u202-b08-alpine-slim
RUN /build/cloud_sql_proxy -instances=idInstanceID=tcp:1433 -credential_file=/build/credentials.json & sleep 10
COPY --from=builder /app/target/helloworld-*.jar /helloworld.jar
# Run the web service on container startup.
CMD ["java","-Djava.security.egd=file:/dev/./urandom","-Dserver.port=${PORT}","-jar","/helloworld.jar"]
In my Java application I connect this way; on my local PC, with the proxy, it works:
@GetMapping("/pruebacuatro")
String pruebacuatro() {
    Map<String, String> config = new HashMap<String, String>();
    config.put("type", "SQLSERVER");
    config.put("url", "127.0.0.1");
    config.put("db", "bd");
    config.put("username", "user");
    config.put("password", "pass");
    Object data = null;
    Jpo miJpo = null;
    try {
        miJpo = new Jpo(config);
        Procedure store = miJpo.procedure("seg.menu_configuraciones");
        data = store.ejecutar();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (miJpo != null) {
            try {
                miJpo.finalizar();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
    return "Contents json: " + (new Gson().toJson(data));
}
I want to connect using the public or private IP of my SQL Server instance, but I can't find information about that either. Do you have any suggestions?
The Cloud SQL proxy works in two modes: Unix socket and TCP.
When you use it on your own computer, you use TCP mode and connect to it on the localhost IP. With Cloud Run, however, the Unix socket mode is used, and there is no SQL Server client that can use this connection mode.
Thus, you have to use the Cloud SQL instance's IP to connect from Cloud Run.
For your local tests, continue to use the Cloud SQL proxy in TCP mode.
For Cloud Run, I recommend using the private IP of your SQL Server instance:
Expose your instance in your VPC.
Create a serverless VPC connector in the correct region.
Attach the serverless VPC connector to your Cloud Run service.
Use the Cloud SQL private IP in your code to connect to your DB, as in the sketch below.
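A minimal sketch of that last step, assuming a private IP of 10.0.0.5, a database named mydb, and the Microsoft mssql-jdbc driver on the classpath (all of these are placeholders, not values from the question):

import java.sql.Connection;
import java.sql.DriverManager;

public class PrivateIpConnect {
    public static void main(String[] args) throws Exception {
        // 10.0.0.5 stands in for the Cloud SQL instance's private IP, which is
        // reachable from Cloud Run only through the serverless VPC connector.
        String url = "jdbc:sqlserver://10.0.0.5:1433;databaseName=mydb";
        try (Connection con = DriverManager.getConnection(url, "user", "pass")) {
            System.out.println("Connected: " + !con.isClosed());
        }
    }
}

In the question's code, that private IP would replace 127.0.0.1 in config.put("url", ...).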

Trying to query Azure SQL Database with Azure ML / Docker Image

I wanted to do a realtime deployment of my model on Azure, so I planned to create an image which first queries an ID in an Azure SQL DB to get the required features, then predicts using my model and returns the predictions. The error I get from the pyodbc library is that drivers are not installed.
I tried it in the Azure ML Jupyter notebook to establish the connection and found that no drivers are installed in the environment itself. After some research I found that I should create a Docker image and deploy it there, but I still met with the same results.
driver = '{ODBC Driver 13 for SQL Server}'
cnxn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + ';PORT=1433;DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;')
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")
I want a result from the query; instead I get this message.
Alternatively, you could use pymssql==2.1.1 if you add the following Docker steps to the deployment configuration (using either Environments or ContainerImages; Environments is preferred):
from azureml.core import Environment
from azureml.core.environment import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package('pymssql==2.1.1')
myenv = Environment(name="mssqlenv")
myenv.python.conda_dependencies=conda_dep
myenv.docker.enabled = True
myenv.docker.base_dockerfile = 'FROM mcr.microsoft.com/azureml/base:latest\nRUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc'
myenv.docker.base_image = None
Or, if you're using the ContainerImage class, you could add these Docker steps:
from azureml.core.image import Image, ContainerImage
image_config = ContainerImage.image_configuration(runtime= "python", execution_script="score.py", conda_file="myenv.yml", docker_file="Dockerfile.steps")
# Assuming this :
# RUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc
# is in a file called Dockerfile.steps, it should produce the same result.
See this answer for more details on how I've done it using an Estimator step and a custom Docker container. You could use this Dockerfile to locally create a Docker container for that Estimator step (no need to do that if you're just using an Estimator run outside of a pipeline):
FROM continuumio/miniconda3:4.4.10
RUN apt-get update && apt-get -y install freetds-dev freetds-bin gcc
RUN pip install Cython
For more details, see this posting: using estimator in pipeline with custom docker images. Hope that helps!
Per my experience, I think the comment @DavidBrowne-Microsoft made is right.
There is a similar SO thread, I am getting an error while connecting to an SQL DB in Jupyter Notebook, answered by me, which I think will help you install the latest msodbcsql driver for Linux on an Azure Notebook or in Docker.
Meanwhile, note a detail about the connection string for Azure SQL Database: you should use {ODBC Driver 17 for SQL Server} instead of {ODBC Driver 13 for SQL Server} if your Azure SQL database was created recently (ignore the connection string shown in the Azure portal).
You can use AzureML's built-in Dataset solution to connect to your SQL Server.
To do so, first create an azure_sql_database datastore (reference here).
Then create a dataset by passing it the datastore you created and the query you want to run (reference here).
Sample code:
from azureml.core import Dataset, Datastore, Workspace

workspace = Workspace.from_config()
sql_datastore = Datastore.register_azure_sql_database(
    workspace=workspace,
    datastore_name='sql_dstore',
    server_name='your SQL server name',
    database_name='your SQL database name',
    tenant_id='your directory ID/tenant ID of the service principal',
    client_id='the Client ID/Application ID of the service principal',
    client_secret='the secret of the service principal')
sql_dataset = Dataset.Tabular.from_sql_query((sql_datastore, 'SELECT * FROM my_table'))
You can also do it via the UI at ml.azure.com, where you can register an Azure SQL datastore using your user name and password.

How to Distribute (Export) a Java Application project with Database?

I've created a Java application with a database running on Derby 10.10.2.0 (JDK 1.7).
The problem: when NetBeans 7.3.1 is open and the database is connected, everything works fine.
But when I compile and build the application, then run it from the dist folder inside NetBeansProjects, the database won't connect.
This is the code to connect to the database:
// The ClientDriver connects over the network and requires the Derby server
// to be listening on port 1527
String url = "jdbc:derby://localhost:1527/reflet;create=true";
String driver = "org.apache.derby.jdbc.ClientDriver";
String userName = "root";
String password = "admin";
Class.forName(driver).newInstance();
conn = DriverManager.getConnection(url, userName, password);
conn.close();
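Worth noting (an assumption about the cause, since the question doesn't say how Derby is started): the ClientDriver above only works while a Derby network server is listening on port 1527, which NetBeans starts for you while the IDE is open. A sketch of the embedded alternative, which runs the database inside the application and needs no server; it assumes derby.jar ships on the application's classpath and reuses the reflet database name from the question:

import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedDerbyConnect {
    public static void main(String[] args) throws Exception {
        // The embedded driver runs Derby inside this JVM; no server required.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver").newInstance();
        Connection conn = DriverManager.getConnection("jdbc:derby:reflet;create=true");
        System.out.println("Connected: " + !conn.isClosed());
        conn.close();
    }
}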

Correct path to sqlite database used by JAR imported into my Android app?

My Android app uses a database through an API packaged in an external JAR. Since the JAR is opening/updating/closing the SQLite database file, do I need to make sure it uses its own package name instead of my own? I'm getting errors where it fails to open the database file, and I'm guessing it's because the package name I'm telling it to use is incorrect: should it be using the package name from the JAR file?
Code that is opening the database:
public AndroidSQLiteDataStore() throws SQLiteException {
    Logger logger = LoggerFactory.getDefaultInstance();
    logger.information(TAG, "OPEN CONNECTION TO DATABASE " + Config.BN_ANDROID_DB_PATH);
    try {
        this.db = SQLiteDatabase.openDatabase(Config.ANDROID_DB_PATH, null,
                SQLiteDatabase.CREATE_IF_NECESSARY | SQLiteDatabase.OPEN_READWRITE);
    } catch (SQLiteException e) {
        logger.error(TAG, e.toString());
        throw e;
    }
}
Also tried poking around with the shell:
$ adb shell
$ ls
sqlite_stmt_journals
cache
sdcard
etc
system
sys
sbin
proc
logo.rle
init.sapphire.rc
init.rc
init.goldfish.rc
init
default.prop
data
root
dev
$ cd data
$ ls
opendir failed, Permission denied
$
Since the JAR is opening/updating/closing the SQLite database file, do I need to make sure it uses its own package name instead of my own?
JARs don't have package names from an Android standpoint.
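If the hard-coded path is the problem rather than the package name, a sketch of resolving the path from the host app's Context instead of a constant; the Context parameter and the file name mydata.db are assumptions, not part of the original code:

public AndroidSQLiteDataStore(Context context) throws SQLiteException {
    // Resolves to this app's own data directory, e.g.
    // /data/data/<your.app.package>/databases/mydata.db
    File dbFile = context.getDatabasePath("mydata.db");
    dbFile.getParentFile().mkdirs(); // the databases/ directory may not exist yet
    this.db = SQLiteDatabase.openDatabase(dbFile.getAbsolutePath(), null,
            SQLiteDatabase.CREATE_IF_NECESSARY | SQLiteDatabase.OPEN_READWRITE);
}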
