Accessing a non-JDBC DB using MapReduce

I have a database which is not JDBC-enabled, but where I can fire a query and get the result via an input stream. I want to access this database from a MapReduce program.
For a JDBC-enabled database, Hadoop provides "DBInputFormat.java" and "DBConfiguration.java", which take care of accessing the database and returning the result in a user-defined class that implements the DBWritable and Writable interfaces.
Is there a way to access the above-mentioned non-JDBC database in the same fashion?

I am not sure whether your DB supports ODBC. If it does, you could try the JDBC-ODBC bridge driver with DBInputFormat, though I have never tried this myself (and note that the bridge was removed in Java 8).
Another option, which should be your last resort, is to implement your own InputFormat.
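Implementing your own InputFormat for a stream-based source could look roughly like the sketch below, where each line read from the stream becomes one (row number, line) record. Only the Hadoop types (InputFormat, InputSplit, RecordReader, LongWritable, Text) are real; StreamDbSplit (which must also implement Writable so Hadoop can ship it to the mappers) and StreamDbClient.executeQuery (your client call that returns the result as an InputStream) are hypothetical stand-ins:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class StreamDbInputFormat extends InputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext context) {
        // One trivial split: a single mapper runs the whole query. If the
        // query can be range-partitioned, return one split per range instead.
        return Collections.singletonList((InputSplit) new StreamDbSplit());
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        return new RecordReader<LongWritable, Text>() {
            private BufferedReader reader;
            private final LongWritable key = new LongWritable();
            private final Text value = new Text();
            private long row;
            private boolean done;

            @Override
            public void initialize(InputSplit s, TaskAttemptContext c) throws IOException {
                // Hypothetical client: fire the query, get the result as a stream.
                InputStream in = StreamDbClient.executeQuery("SELECT ...");
                reader = new BufferedReader(new InputStreamReader(in));
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                String line = reader.readLine();
                if (line == null) {
                    done = true;
                    return false;
                }
                key.set(row++);   // key = row number, value = raw result line
                value.set(line);
                return true;
            }

            @Override public LongWritable getCurrentKey() { return key; }
            @Override public Text getCurrentValue() { return value; }
            @Override public float getProgress() { return done ? 1f : 0f; }
            @Override public void close() throws IOException { reader.close(); }
        };
    }
}
```

The mapper then parses each Text line into columns itself, which is the part DBInputFormat would normally do for you via DBWritable.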


Reactive Indexing from SQL Server to Elasticsearch with Spring

I want to use reactive Spring to index data from a SQL Server database (via R2DBC) into Elasticsearch (via reactive Elasticsearch). I have an entity class corresponding to the table in SQL Server and a model class corresponding to the documents that will be indexed by Elasticsearch.
For both databases (SQL Server & Elasticsearch), I have created repositories:
Repository for SQL Server:
@Repository
public interface ProductRepository extends ReactiveCrudRepository<ProductTbl, Long> {
}
Repository for Elasticsearch:
@Repository
public interface ProductElasticRepository extends ReactiveElasticsearchRepository<Product, String> {
}
Shouldn't I be able to index all documents by calling productElasticRepository.saveAll(productRepository.findAll())?
It doesn't quite work: either it exceeds the DataBufferLimit and throws an exception, or there is a ReadTimeoutException. When executing, I can see R2DBC creating RowTokens for each row of the SQL Server database, but the POST to the Elasticsearch client only happens once all rows have been read, which doesn't seem to be how reactive code should work. I am definitely missing something here and hope someone can help.
I cannot figure out what exactly the problem is in your case, but I did something similar in the past and remember a few problems I encountered.
It depends on how much data you need to migrate; with millions of rows it will definitely fail. In that case, use an algorithm with a window of, say, 5,000 rows: read one window, write it to Elasticsearch with a bulk insert, and repeat until there are no more rows to read.
Another problem I encountered was that the Elasticsearch WebClient wasn't configured to support the amount of data I was sending in the body.
Another: Elasticsearch has a queue capacity of 200 by default; if you exceed it, you will get an error. This happens if you try to insert the data in parallel.
Another: the connection with the relational database will be interrupted at some point if it is kept open for a very long time.
Remember that Elasticsearch is not reactive by default; there is a reactive driver at this point, but it is not official.
Another: when doing the migration, try to write to a single node with not too many shards.
You should do something like:
productRepository.findAll()
        .buffer(1000)
        .onBackpressureBuffer()
        .flatMap(productElasticRepository::saveAll)
        .subscribe();
Note that buffer(1000) emits lists of ProductTbl entities, so map each batch to your Product documents before calling saveAll, and remember that nothing runs until the resulting Flux is subscribed.
Also, if you're getting a ReadTimeoutException, increase the socket timeout:
spring:
  data:
    elasticsearch:
      client:
        reactive:
          socket-timeout: 5m
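The windowed read-and-bulk-write loop described in this answer can be illustrated with plain collections. This is only a sketch of the control flow: fetchRows(offset, limit) and the writer callback are hypothetical stand-ins for the R2DBC read and the Elasticsearch bulk insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class WindowedMigration {

    // Reads rows in fixed-size windows and hands each window to a writer.
    // fetchRows(offset, limit) and writer are hypothetical stand-ins for the
    // R2DBC read and the Elasticsearch bulk insert.
    static <T> int migrate(BiFunction<Integer, Integer, List<T>> fetchRows,
                           Consumer<List<T>> writer, int windowSize) {
        int offset = 0;
        while (true) {
            List<T> window = fetchRows.apply(offset, windowSize);
            if (window.isEmpty()) {
                break;                      // nothing left to read
            }
            writer.accept(window);          // e.g. one bulk request per window
            offset += window.size();
            if (window.size() < windowSize) {
                break;                      // last, partial window
            }
        }
        return offset;                      // total rows migrated
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            source.add(i);
        }
        List<List<Integer>> batches = new ArrayList<>();
        int total = migrate(
                (off, lim) -> source.subList(off, Math.min(off + lim, source.size())),
                batches::add, 5);
        System.out.println(total + " rows in " + batches.size() + " batches");
        // prints "12 rows in 3 batches"
    }
}
```

Bounding each write to one window is what keeps you under the WebClient body limit and Elasticsearch's default queue capacity mentioned above.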

Accessing own schema/tables of an H2 embedded database via a DB tool?

I have a web application using an embedded (in-memory) H2 database. The H2 DB seems to be set up correctly on Tomcat startup (no errors in the server log), but when accessing it through the application it appears to be "empty" (no application tables or data available, even though the log states they have been created/inserted).
How can I check whether the H2 DB is really set up? I've been trying to connect to it with a DB tool (e.g. the H2 Console, DbVisualizer), but I am not sure about the proper connection string or username/password, as they are not explicitly defined in the project.
By raising the log level in the server log, I could at least retrieve this information:
Creating new JDBC Driver Connection to [jdbc:h2:mem:myDataSource;DB_CLOSE_DELAY=-1]
I am not sure, though, whether I am really connected, because I can pass any user/password combination and still "connect". It's probably not the right way, because I can only see the INFORMATION_SCHEMA and PUBLIC schemas in the DB.
You cannot use an in-memory database this way, even a named one, because as soon as the last SQL connection using it is closed, the database is purged. If you open it again, the only thing you'll see is the INFORMATION_SCHEMA tables and views.
I suggest you use embedded (file-based) mode instead: the database will be persisted, and you can later open it from another process, a DB viewer, or even over TCP if you start an H2 server.
As you're using Tomcat, I just wrote an answer today to a similar issue (Embedding an H2 Database within the WEB-INF Directory), which might be helpful to you: https://stackoverflow.com/a/30638808/3956551

How to connect OpenCart to an MSSQL server

Is there any chance of connecting OpenCart to MSSQL? Has anyone tried? If so, what is the procedure?
That should not be a big problem. You only need to:
create a /system/database/mssql.php class - it should have the same methods, properties, and functionality as, e.g., the mysql.php one
rewrite the queries in all of the model classes' methods to match MS SQL / T-SQL syntax
in both config files (/config.php and /admin/config.php), set the proper DB_DRIVER - mssql
I am assuming you already have the OpenCart database created from the /install/opencart.sql file.
I guess nothing more should be done.
Anyway, what is the reason for switching to MS SQL?
EDIT: In /system/database/ there is an mmsql.php file which actually contains the MSSQL class, so it does not have to be implemented from scratch - just renamed to mssql.php.

Testing Connection Parameters with NHibernate

We have a program where users can specify their database connection parameters - the usual suspects, including host, port, username, password, and table name. We connect to the database using NHibernate. What we'd like to do is build the configuration with NHibernate and then test the connection parameters before continuing with other operations, notifying the user on failure.
Is this possible through NHibernate, or will it require using each supported database type's specific driver and writing a custom TestConnection() method for each?
I realize this is an old post, but I guess an answer to a question never hurts.
I don't think there is a way to explicitly tell NHibernate to test the connection string. However, when you instantiate the SessionFactory, it will attempt to connect to the database. You can wrap the SessionFactory creation in a Try/Catch and handle the error that way.
I use Fluent NHibernate, but I'm sure the following example will still explain the situation.
Dim sf As ISessionFactory
Try
    sf = CreateSessionFactory()
Catch ex As FluentNHibernate.Cfg.FluentConfigurationException
    MessageBox.Show(ex.InnerException.Message)
End Try
The ex.InnerException.Message contains the actual error and will tell you if:
The connection string was invalid
The server could not be found
The user/pass could not be authenticated
To configure NHibernate you have two options:
Set the dialect when you build the session factory. This assigns reasonable default values to NHibernate's ADO and other configuration settings.
Manually set the configuration values.
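For the second option, a minimal hibernate.cfg.xml sketch targeting SQL Server might look like this (server, database, and credential values are placeholders):

```xml
<?xml version="1.0" encoding="utf-8"?>
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <property name="connection.driver_class">NHibernate.Driver.SqlClientDriver</property>
    <property name="connection.connection_string">
      Server=myHost,1433;Database=myDb;User ID=myUser;Password=myPassword;
    </property>
    <property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
  </session-factory>
</hibernate-configuration>
```

Swapping the driver_class and dialect pair is what selects the target database, which is why you need to know the database type up front.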
That said, at some point you need to configure NHibernate to use the appropriate driver for the database you want to talk to, which means you need to be able to build session factories of different types (one per supported database). For that you need more than just host, port, username, password, and table name: you need to know the database type (dialect).
If you just try to connect to the database with every driver available, not knowing what the database type is, you may run into problems when the database and the dialect don't match. Imagine using a SqlServer2008 dialect against a SqlServer2005 machine: a SqlServer2008 feature you rely on will, obviously, not work. Moreover, if you don't stick to basic SQL throughout your code, you may generate SQL that works in, say, PostgreSql but not in SqlServer (think sequences and such).
To learn more about configuring NHibernate, read Chapter 3: Session Factory Configuration, especially sections 3.3, 3.4, and 3.5, which cover the configuration parameters.
One last note: NHibernate supports multiple databases, but for complex domain layers that rely on database-specific constructs, your code doesn't.

h2 in-memory tables, remote connection

I am having problems creating an in-memory table using the H2 database and accessing it outside of the JVM in which it is created and running.
The documentation gives the URL as jdbc:h2:tcp://<host>/mem:<databasename>
I've tried many combinations but simply cannot get the remote connection to work. Is this feature working? Can anyone give me the details of how they used it?
None of the solutions mentioned so far worked for me; the remote part just couldn't connect.
According to H2's official documentation:
To access an in-memory database from another process or from another computer, you need to start a TCP server in the same process as the in-memory database was created. The other processes then need to access the database over TCP/IP or TLS, using a database URL such as: jdbc:h2:tcp://localhost/mem:db1.
The crucial part is that the TCP server must be started in the same process as the in-memory database.
And I found a working solution at this guy's blog:
The first process is going to create the DB, with the following URL:
jdbc:h2:mem:db1
and it's going to need to start a TCP server:
org.h2.tools.Server server = org.h2.tools.Server.createTcpServer().start();
The other processes can then access your DB by using the following URL:
"jdbc:h2:tcp://localhost/mem:db1"
And that is it! Worked like a charm!
You might look at In-Memory Databases. For a network connection, you need a host and database name. It looks like you want one of these:
jdbc:h2:tcp://localhost/mem:db1
jdbc:h2:tcp://127.0.0.1/mem:db1
Having just faced this problem, I found I needed to append DB_CLOSE_DELAY=-1 to the JDBC URL for the TCP connection. So my URLs were:
In memory: jdbc:h2:mem:dbname
TCP connection: jdbc:h2:tcp://localhost:9092/dbname;DB_CLOSE_DELAY=-1
From the H2 docs:
By default, closing the last connection to a database closes the database. For an in-memory database, this means the content is lost. To keep the database open, add ;DB_CLOSE_DELAY=-1 to the database URL.
Not including DB_CLOSE_DELAY=-1 meant I could not connect to the correct database via TCP: the connection was made, but to a different database instance from the one created in-memory (validated using the IFEXISTS=true parameter).
In Spring Boot: https://www.baeldung.com/spring-boot-access-h2-database-multiple-apps
@Bean(initMethod = "start", destroyMethod = "stop")
public Server inMemoryH2DatabaseServer() throws SQLException {
    return Server.createTcpServer("-tcp", "-tcpAllowOthers", "-tcpPort", "9090");
}
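Putting those pieces together outside of Spring, a minimal plain-Java sketch (assuming the H2 jar is on the classpath; db1 and port 9092 are arbitrary choices):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

import org.h2.tools.Server;

public class H2RemoteMemDemo {
    public static void main(String[] args) throws SQLException {
        // The TCP server must run in the SAME process that owns the in-memory DB.
        Server server = Server.createTcpServer("-tcpPort", "9092", "-tcpAllowOthers").start();

        // Create the in-memory database; DB_CLOSE_DELAY=-1 keeps it alive
        // even after this connection is closed.
        Connection local =
                DriverManager.getConnection("jdbc:h2:mem:db1;DB_CLOSE_DELAY=-1", "sa", "");

        // Other processes (or a DB tool) can now connect with:
        //   jdbc:h2:tcp://localhost:9092/mem:db1
        Connection remote =
                DriverManager.getConnection("jdbc:h2:tcp://localhost:9092/mem:db1", "sa", "");

        remote.close();
        local.close();
        server.stop();
    }
}
```

Once the owning process exits (or server.stop() is called), the in-memory database is gone, which is exactly the behavior the answers above describe.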
