Can I use a SQL Server database with Apache Mahout?

I chose Apache Mahout as my recommendation engine, but for various reasons it would be easier if I could store my data in a SQL Server database. Can Mahout be connected to SQL Server without any problems?
The documentation says that it can connect to other database engines through a JDBC driver, but all the articles and books I have seen use MySQL, and the supported data models seem to be for MySQL only.

How to convert MySQL to SQL Server databases:
SQL Server Import and Export Wizard via ODBC (http://www.mssqltips.com/sqlservertutorial/2205/mysql-to-sql-server-data-migration/)
SQL Server Migration Assistant (http://msdn.microsoft.com/en-us/library/hh313125(v=sql.110).aspx)
Here is the JDBC driver for SQL Server:
JDBC Driver for SQL Server: http://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
Changing DB input format/driver in Hadoop cluster: http://blog.cloudera.com/blog/2009/03/database-access-with-hadoop/
There are also numerous examples of using Mahout with an Azure Hadoop cluster via HDInsight:
http://bluewatersql.wordpress.com/2013/04/12/installing-mahout-for-hdinsight-on-windows-server/
http://www.codeproject.com/Articles/620717/Building-A-Recommendation-Engine-Machine-Learning

I have just started experimenting with Mahout. I managed to run some of the book's examples after replacing the in-memory data models with SQL92JDBCDataModel or SQL92BooleanPrefJDBCDataModel, both of which ship with Mahout 0.9.
I passed an instance of SQLServerDataSource to the constructors of those data models. This class is included in the Microsoft JDBC Driver for SQL Server package (I used version 4.1).
However, the SQL92JDBCDataModel documentation states that it is "not optimized for performance".
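A minimal sketch of the SQL Server side of this setup. It assumes Mahout's default schema for its JDBC data models (table taste_preferences with columns user_id, item_id and preference) and the jdbc:sqlserver:// URL format of the Microsoft driver; the host, port, database name and credentials are hypothetical. The class only builds strings, so it runs without Mahout or the driver on the classpath; the comments show roughly where SQLServerDataSource and SQL92JDBCDataModel would plug in.

```java
// Hypothetical helper for preparing a SQL Server database for Mahout's JDBC data models.
// Assumes Mahout's default names: taste_preferences(user_id, item_id, preference).
final class MahoutSqlServerSetup {

    static final String TABLE = "taste_preferences";

    // URL format used by the Microsoft JDBC Driver for SQL Server.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    // T-SQL version of the preference table; REAL is SQL Server's single-precision float.
    static String createTableSql() {
        return "CREATE TABLE " + TABLE + " ("
                + "user_id BIGINT NOT NULL, "
                + "item_id BIGINT NOT NULL, "
                + "preference REAL NOT NULL, "
                + "PRIMARY KEY (user_id, item_id))";
    }

    // Extra index for item-based lookups; user_id is already the leading key column.
    static String createItemIndexSql() {
        return "CREATE INDEX ix_" + TABLE + "_item_id ON " + TABLE + " (item_id)";
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", 1433, "recommender"));
        System.out.println(createTableSql());
        System.out.println(createItemIndexSql());
        // With Mahout 0.9 and the Microsoft JDBC driver on the classpath,
        // the wiring would look roughly like this (not compiled here):
        //   SQLServerDataSource ds = new SQLServerDataSource();
        //   ds.setServerName("localhost");
        //   ds.setPortNumber(1433);
        //   ds.setDatabaseName("recommender");
        //   ds.setUser("mahout");        // hypothetical credentials
        //   ds.setPassword("secret");
        //   DataModel model = new SQL92JDBCDataModel(ds);
    }
}
```

Keeping Mahout's default table and column names means the single-argument data model constructor can be used; other names would have to be passed explicitly.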

Related

Old SQL Server DB without CDC feature streaming ingest to Kafka

I would like to have a SQL Server DB as a source into Kafka.
However, my SQL Server version is old (2012 Standard) and does not have the CDC feature. What are my options?
The two approaches I've seen in the Kafka documentation and blog posts are the Debezium connector and the JDBC connector. What I read was from a 2018 post, so I wanted to check, now in 2022, whether there are any new options I'm missing.
Given the situation (SQL Server 2012), here are the choices I could think of:
Upgrade the DB to a later version, enable CDC, and use the Debezium SQL Server Kafka Connect source connector for true CDC
Use the Confluent JDBC source connector
Are these the only 2 good options even in 2022?
(By "good", I mean options other than writing my own query-based CDC code.)
Is the decision really just between these two?
Ideal: I can upgrade the DB to a later SQL Server version with CDC, then enable CDC and use the Debezium connector.
Not so ideal: I cannot upgrade the DB, so I have no choice but to use the JDBC Source Connector.
P.S. I have read a similar question, but it was from 2019, and I also wanted to know whether upgrading and then using Debezium is a good idea.
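For the JDBC Source Connector option, a sketch of what the connector configuration might look like, built as a Java map and rendered as the body for a POST to the Kafka Connect REST API's /connectors endpoint. The host, database, table (orders), columns (id, updated_at), credentials and topic prefix are hypothetical. Because timestamp+incrementing mode works by repeatedly querying the table, it picks up new and updated rows (given a strictly increasing id and a reliably maintained timestamp column) but cannot observe hard deletes, which is one practical difference from log-based CDC with Debezium.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Confluent JDBC Source Connector config for a SQL Server 2012 table.
// All names and credentials below are placeholders.
final class JdbcSourceConfigSketch {

    static Map<String, String> config() {
        Map<String, String> c = new LinkedHashMap<>();
        c.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        c.put("connection.url", "jdbc:sqlserver://sql2012-host:1433;databaseName=sales");
        c.put("connection.user", "kafka_reader");
        c.put("connection.password", "********");
        c.put("table.whitelist", "orders");
        // Query-based polling: new rows via the increasing id, changed rows via the timestamp.
        c.put("mode", "timestamp+incrementing");
        c.put("incrementing.column.name", "id");
        c.put("timestamp.column.name", "updated_at");
        c.put("topic.prefix", "sql2012-");
        c.put("poll.interval.ms", "5000");
        return c;
    }

    // Minimal JSON string quoting (escapes backslashes and double quotes only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Request body of the form {"name": ..., "config": {...}} for POST /connectors.
    static String restPayload(String name, Map<String, String> config) {
        StringBuilder sb = new StringBuilder("{")
                .append(quote("name")).append(':').append(quote(name)).append(',')
                .append(quote("config")).append(":{");
        boolean first = true;
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(restPayload("sql2012-orders-source", config()));
    }
}
```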

Why does AWS Schema Conversion Tool use the JAR format for MS SQL Server?

During a demo in a recent AWS webinar, a JAR file was specified as the driver for the Schema Conversion Tool to connect to an MS SQL Server database. Why was JDBC chosen? Was this optional, a choice made by the person who presented the webinar? If so, what other driver options would be available?
From Schema Conversion Tool documentation:
For the AWS SCT to work correctly, you must install the JDBC drivers for your source and target database engines.
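Because SCT connects through JDBC, as the quoted documentation says, the driver it needs is a Java library, and JDBC drivers are distributed as JAR files. The sketch below illustrates the general mechanism a Java tool can use to load a driver JAR the user points it at; it is not SCT's code, and the class and method names are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.Properties;

// Illustrative only: how a Java tool can use a JDBC driver JAR chosen at runtime.
final class ExternalDriverLoader {

    // Loads the driver class from the given JAR without it being on the startup classpath.
    static Driver loadDriver(Path jar, String driverClass) throws Exception {
        URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()},
                ExternalDriverLoader.class.getClassLoader());
        Class<?> cls = Class.forName(driverClass, true, loader);
        return (Driver) cls.getDeclaredConstructor().newInstance();
    }

    // Calls Driver.connect directly: DriverManager skips drivers whose class
    // is not visible to the caller's class loader, as is the case here.
    static Connection connect(Driver driver, String url, String user, String password)
            throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return driver.connect(url, props);
    }
}
```

For the Microsoft driver, the class name passed to loadDriver would be com.microsoft.sqlserver.jdbc.SQLServerDriver.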

Connecting data source to CRM in SSIS

I'm an intern at a company, learning SSIS. When connecting a data source to CRM 4.0, do I need to create connections for both OLE DB and ODBC, or just ODBC?
Take a look at: http://blogs.msdn.com/b/crm/archive/2008/05/07/integrating-crm-using-sql-integration-services-ssis.aspx
To be able to make updates and create records you really need to use a web service as described in the blog above.
OLE DB Drivers would help you connect to the SQL Database that CRM 4.0 is using, but that would be no different than just connecting to a MS SQL Database. If you want to transfer data, manipulating the database directly is unsupported (and a bad idea).

Derby Database ODBC Connection

I have a Derby database in NetBeans with the connection string
jdbc:derby://localhost:1527/MyDatabase
Can this be used with ODBC? If so, how could I transform this string or configure my Derby database to be ODBC-compliant?
The end goal is to migrate the Derby database to MySQL. The migration wizard in MySQL Workbench appears to be the easiest way; however, it requires ODBC connectivity.
Do you want to use ODBC because MySQL Workbench uses it to migrate database?
I have migrated several databases between various engines, and my preferred approach is to convert the DDL schema (CREATE TABLE, CREATE VIEW, etc.) using a specialized Python program, then copy the data over JDBC with the getObject()/setObject() methods. You can see such a database-copy Jython program (Jython is a Python implementation that runs on the JVM and can use JDBC drivers) in my answer to "Blob's migration data from Informix to Postgres".
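A minimal Java sketch of the data-copy step described above (the answer's own program is in Jython; this is a separate illustration). It assumes the target table already exists with the same column names, created from the converted DDL; the class and method names are hypothetical. NULLs are sent with setNull and the source column type, because not every driver accepts an untyped null through setObject.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Generic row copy between two JDBC connections using getObject()/setObject().
final class JdbcTableCopy {

    // Builds "INSERT INTO t (a, b) VALUES (?, ?)" for the given column names.
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Copies all rows of sourceTable into targetTable, committing every batchSize rows.
    static long copy(Connection src, Connection dst, String sourceTable, String targetTable,
                     int batchSize) throws SQLException {
        long rows = 0;
        try (Statement select = src.createStatement();
             ResultSet rs = select.executeQuery("SELECT * FROM " + sourceTable)) {
            ResultSetMetaData md = rs.getMetaData();
            List<String> columns = new ArrayList<>();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                columns.add(md.getColumnName(i));
            }
            dst.setAutoCommit(false);
            try (PreparedStatement insert = dst.prepareStatement(insertSql(targetTable, columns))) {
                while (rs.next()) {
                    for (int i = 1; i <= columns.size(); i++) {
                        Object value = rs.getObject(i);
                        if (value == null) {
                            insert.setNull(i, md.getColumnType(i));
                        } else {
                            // Let the target driver map the Java object to its own SQL type.
                            insert.setObject(i, value);
                        }
                    }
                    insert.addBatch();
                    rows++;
                    if (rows % batchSize == 0) {
                        insert.executeBatch();
                        dst.commit();
                    }
                }
                insert.executeBatch();
                dst.commit();
            }
        }
        return rows;
    }
}
```

For the question above, src could come from DriverManager.getConnection("jdbc:derby://localhost:1527/MyDatabase") with the Derby client driver on the classpath, and dst from a MySQL Connector/J URL.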
BTW, a quick search shows that IBM has an ODBC driver for Derby (IBM bought Informix, which had earlier bought Cloudscape): http://www.ibm.com/developerworks/data/library/techarticle/dm-0409cline2/
Use OpenDBCopy, an open-source database utility for migrating data to and from any database via a JDBC connection.
You can copy table structures as well as data from any supported database.

DB2 database in Oracle SQL developer

I've heard it's possible to connect to a mainframe DB2 database with a client like Oracle SQL Developer. I've looked online and can't seem to find the connector files needed to do this in SQL Developer. Can anyone direct me to a link to make this work, or tell me if I'm just looking for the wrong thing to begin with? I've got the connector working with MySQL databases in SQL Developer, so I assumed it would be similar for a DB2 database.
To enable DB2 in SQL Developer, you need to obtain the db2jcc.jar file.
Go to "Oracle SQL Developer" > "Tools" > "Preferences" > "Third Party JDBC Driver".
My Oracle SQL Developer version is 4.1.1.19 (this works for many versions).
After adding that JAR to the third-party JDBC drivers, click "New Connection".
You should now see a DB2 option.
The easiest way to connect to Db2 is through their JDBC Type 4 JCC driver. This driver uses two JARs:
db2jcc4.jar, which is the JDBC 4 driver (The db2jcc.jar JDBC 3 driver has been deprecated.)
db2jcc_license_cisuz.jar, which permits the driver to connect to all Db2 server platforms, including z/OS
Your mainframe DBA should be able to provide you with both of these JARs, and assist you in building a connect string with the proper JDBC driver options.
More information about JDBC drivers for Db2 can be found here: https://www.ibm.com/support/pages/db2-jdbc-driver-versions-and-downloads
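A sketch of assembling a connect string for the JCC Type 4 driver, assuming the jdbc:db2://host:port/database:property=value; form, where each property is terminated by a semicolon and, on Db2 for z/OS, the database part is the Db2 location name. The host, port, location name and currentSchema value are hypothetical placeholders; your DBA would supply the real ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for a Db2 JCC (Type 4) connection URL.
final class Db2UrlSketch {

    // Format: jdbc:db2://host:port/database[:prop=value;...]
    static String url(String host, int port, String location, Map<String, String> props) {
        StringBuilder sb = new StringBuilder("jdbc:db2://")
                .append(host).append(':').append(port).append('/').append(location);
        if (!props.isEmpty()) {
            sb.append(':');
            for (Map.Entry<String, String> e : props.entrySet()) {
                // Every property, including the last, ends with a semicolon.
                sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("currentSchema", "MYSCHEMA");
        System.out.println(url("mainframe.example.com", 446, "DSNLOC1", props));
        // With db2jcc4.jar and the license JAR on the classpath, roughly:
        //   Connection c = DriverManager.getConnection(url(...), user, password);
    }
}
```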
SQL Developer supports the following JDBC drivers.
IBM DB2: You need the binary driver jar files db2jcc.jar and db2jcc_license_cu.jar. Search for DB2 Universal JDBC Drivers. https://www.ibm.com/support/pages/location-db2jcclicensecisuzjar-file
Microsoft Access: No additional driver is required. Access uses the JDBC/ODBC bridge.
Microsoft SQL Server and Sybase: jTDS driver version 1.2. The binary driver is located within jtds-1.2-dist.zip; the JAR file is called jtds-1.2.jar.
MySQL: MySQL JDBC Driver, version 5.0.4. The binary driver is located within mysql-connector-java-5.0.4.tar.gz (or .zip); the JAR file is called mysql-connector-java-5.0.4-bin.jar.
Teradata: Use Teradata JDBC Driver 12.0 or above. Both the Teradata JDBC Driver 12.0 and 13.0 use the jar files terajdbc4.jar and tdgssconfig.jar. https://www.teradata.com/downloadcenter/
