Using Scala to Write a DF to a SQL Server Table


I'm trying to use the code below.
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
// Acquire a DataFrame collection (val collection)
val config = Config(Map(
"url" -> "mysqlserver.database.windows.net",
"databaseName" -> "MyDatabase",
"dbTable" -> "dbo.Clients",
"user" -> "username",
"password" -> "*********"
))
import org.apache.spark.sql.SaveMode
collection.write.mode(SaveMode.Append).sqlDB(config)
The script is from this link.
https://github.com/Azure/azure-sqldb-spark
I'm running this in a Databricks environment and getting these errors:
command-836397363127942:5: error: object sqlDB is not a member of package com.microsoft.azure
import com.microsoft.azure.sqlDB.spark.connect._
^
command-836397363127942:4: error: object sqlDB is not a member of package com.microsoft.azure
import com.microsoft.azure.sqlDB.spark.config.Config
^
command-836397363127942:7: error: not found: value Config
val bulkCopyConfig = Config(Map(
^
command-836397363127942:18: error: value sqlDB is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]
df.write.mode(SaveMode.Append).sqlDB(bulkCopyConfig)
I'm guessing that some kind of library is not installed correctly. I Googled for an answer, but didn't find anything useful. Any idea how to make this work? Thanks.

If you are getting the sqldb error, it means the other supporting libraries are already available in your notebook and only the latest JAR with dependencies is missing.
I reproduced the issue and got the same error message as shown above.
After a bit of research, I found that this error occurs when the JAR with dependencies is missing.
To resolve the issue, download the JAR file from here: https://search.maven.org/artifact/com.microsoft.azure/azure-sqldb-spark/1.0.2/jar
After downloading the JAR file, upload it to the cluster as a library and install it.
Note: After installing the libraries, make sure to restart the cluster.
You should then be able to run the command successfully.

I think you are missing the library.
If you're using a Maven build, add the following dependency to pom.xml:
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-sqldb-spark</artifactId>
<version>1.0.2</version>
</dependency>
If you're using an SBT build, add the following dependency to build.sbt:
libraryDependencies += "com.microsoft.azure" % "azure-sqldb-spark" % "1.0.2"

Have you imported and installed the library in Databricks?
I found it easiest to import the library using Maven. See this answer: How to install a library on a databricks cluster using some command in the notebook?
Note: You need to install the library on your cluster and then restart the cluster before you can use it.

Related

bibliometrix (Problem in uploading the data)

I am trying to run a bibliometric analysis of articles using the bibliometrix package. I installed the package and then loaded the library. However, when I try to load my data, I get this error:
Error in readFiles("C:/Users/patel/savedrecs.bib") : could not find
function "readFiles".
Also, when I load the library, this is the message I get:
library(bibliometrix) ### load bibliometrix package
Error: package or namespace load failed for ‘bibliometrix’: object
‘scale_type’ is not exported by 'namespace:ggplot2'
In addition:
Warning message: package ‘bibliometrix’ was built under R version
3.4.4
My question is: how do I load my data? Please help.

Try installing the package directly from the GitHub repository:
install.packages("devtools") # needed to install the most recent bibliometrix version from GitHub
devtools::install_github("massimoaria/bibliometrix")
Beware: the GitHub version is usually the most recent one, not a fixed (stable) release.

On Windows, I had to install Rtools from https://cran.r-project.org/bin/windows/Rtools/
Then I ran the following installs in RStudio:
> install.packages("bibliometrix", dependencies=TRUE)
> library(bibliometrix)
To cite bibliometrix in publications, please use:
Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, Journal of Informetrics, 11(4), pp 959-975, Elsevier.
http:\\www.bibliometrix.org
To start with the shiny web-interface, please digit:
biblioshiny()
> biblioshiny()
Only then were all of these prerequisite problems resolved. Without Rtools, the errors above kept being reported.

I re-installed Rtools, then installed the jsonlite package, and finally installed the bibliometrix package.
Now it's working.

Error when adding jar dependence in Zeppelin

Spark works normally in Zeppelin but when I add a jar dependency and run something like:
val df = spark.read.json("path/file1")
I get the following error:
com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.5.4
If I run it a second time, I get:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
(I'm not sure why the error changes on the second run.)
If I do not attach the jar, the code works normally. Could this be caused by a dependency conflict?
I tried something like the following in the Spark interpreter options, but I still get the same error:
artifact: path/utils.jar
exclude: com.fasterxml.jackson.databind (Is this the correct way to exclude Jackson?)
Any insights?
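One way to test the conflict hypothesis is to check whether the attached jar bundles its own copy of the Jackson classes. The following is a minimal sketch in Python that relies only on the fact that a jar is a zip archive. The function name find_bundled_packages is made up for illustration, and finding bundled classes would only suggest, not prove, that they are the source of the version error.

```python
import zipfile


def find_bundled_packages(jar_path, package_prefix="com/fasterxml/jackson/"):
    """Return the sorted .class entries in the jar that live under package_prefix."""
    # A jar is a zip archive, so the standard zipfile module can list its entries.
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith(package_prefix) and name.endswith(".class")
        )
```

Calling find_bundled_packages("path/utils.jar") returns an empty list when the jar does not ship any classes under that package path.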

Error with module using Cloud Storage with Python and its tutorial

I'm trying out Google Cloud Storage to store images (I need it for an app I'm developing), and I'm following the Bookshelf App tutorial on their website.
I'm using Python. When I install from requirements.txt, all packages install fine, but when I try to run the code, I see this error:
...sandbox.py", line 948, in load_module
raise ImportError('No module named %s' % fullname)
ImportError: No module named cryptography.hazmat.bindings._openssl
I have tried a hundred possible solutions: reinstalling only the cryptography package, trying different versions of the same module, and installing other packages that contain it, but nothing resolved the problem.
The requirements file contains this:
Flask==0.10.1
gcloud==0.9.0
gunicorn==19.4.5
oauth2client==1.5.2
Flask-SQLAlchemy==2.1
PyMySQL==0.7.1
Flask-PyMongo==0.4.0
PyMongo==3.2.1
six==1.10.0
I'm sure it is a simple error, but I can't find a way to solve it.
Any help would be welcome. Thanks.
EDIT:
When I try this in a standalone Python program, it works fine:
import os
from gcloud import storage
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'
client = storage.Client(project='xxxxxxx')
bucket = client.get_bucket('yyyyyyy')
f = open('profile.jpg', 'rb')
blob = bucket.blob(f.name)
blob.upload_from_string(f.read(), 'image/jpeg')
url = blob.public_url
print url
Why can't I use the gcloud library without errors in a GAE app?

It seems you're following the Bookshelf tutorial, but this line in your stack trace:
...sandbox.py", line 948, in load_module
hints that you're using dev_appserver.py to run the code. This isn't necessary for Managed VMs/Flexible unless you're using the compat runtime.
If this is the case, the tutorial provides correct instructions:
$ virtualenv env
$ source env/bin/activate
$ pip install -r requirements.txt
$ python main.py
(If this is not the case, please feel free to comment on this with more details about how you're running your application).
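To confirm that the code is really running from the virtualenv created by the commands above rather than under dev_appserver.py, a quick check can be run with the same interpreter. This is a minimal sketch; the helper name in_virtualenv is made up for illustration and is not part of the tutorial.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside virtualenv:", in_virtualenv())
```

If this reports that no virtualenv is active, the packages from requirements.txt were probably installed somewhere the running interpreter does not look.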

Python dbconnection

I have Python code (Python 3.5.1) that needs to connect to SQL Server. I am using the 'pyodbc' module, but it throws an error:
"ImportError: No module named 'pyodbc'"
I am using Eclipse on Windows 8.
Then I downloaded the pyodbc package from "https://github.com/mkleehammer/pyodbc". I first tried to install it using pip, but that does not work on Windows 8.1:
"building 'pyodbc' extension error: Unable to find vcvarsall.bat".
Then I tried running the setup.py file in the folder, but I am still not able to import the pyodbc module.
Can anyone help me import the module properly? I am very new to Python, and I may be making a beginner's mistake.
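One thing worth ruling out here is that Eclipse runs a different Python interpreter than the one pip or setup.py installed into. The following is a minimal sketch of that check using only the standard library. The helper name module_available is illustrative; 'pyodbc' is the only name taken from the question.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this both inside Eclipse and from the command line, then compare the output.
    print("interpreter:", sys.executable)
    print("pyodbc importable:", module_available("pyodbc"))
```

If the two runs print different interpreter paths, the module was likely installed for an interpreter other than the one Eclipse is configured to use.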

gaemechanize: ImportError: No module named _winreg

gaemechanize used to work fine when I did
import gaemechanize
But recently I updated GAE to the newest version, and now I get this error when I run on my local machine (gaemechanize still works fine when uploaded to the server):
Internal Server Error
import_string() failed for 'myapp.views.index'. Possible reasons are: - missing __init__.py in a package; - package or module path not included in sys.path; - duplicated package or module name taking precedence in sys.path; - missing module, class, function or variable; Debugged import: - 'myapp' found in 'D:\\Dropbox\\GAE Proj\\xxxx\\myapp\\__init__.pyc'. - 'myapp.views' not found. Original exception: ImportError: No module named _winreg
I found import _winreg in the _msiecookiejar.py file in gaemechanize.
I am using Python 2.7.7 on Windows 7.
I got an error when I tried to run
pip install _winreg
How can I fix it?
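_winreg is part of the Python 2 standard library on Windows (it was renamed winreg in Python 3), so pip is not expected to provide it. If editing the bundled _msiecookiejar.py is acceptable, one possible workaround is to guard the import so the module still loads where the registry module is unavailable. This is only a sketch under that assumption; the registry_available helper is hypothetical, and any code that actually reads the registry would need to check it first.

```python
try:
    import _winreg as winreg  # Python 2 name, available on Windows only
except ImportError:
    try:
        import winreg  # Python 3 name, available on Windows only
    except ImportError:
        winreg = None  # not on Windows, or the module is blocked by a sandbox


def registry_available():
    """Return True when a Windows registry module could be imported."""
    return winreg is not None
```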
