Apache Flink JDBC InputFormat throwing java.net.SocketException: Socket closed - apache-flink

I am querying an Oracle database using the Flink DataSet API. For this I have customised Flink's JDBCInputFormat to return a java.sql.ResultSet, because I need to perform further operations on the result set using Flink operators.
public static void main(String[] args) throws Exception {
    ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
    environment.setParallelism(1);

    @SuppressWarnings("unchecked")
    DataSource<ResultSet> source = environment.createInput(
            JDBCInputFormat.buildJDBCInputFormat()
                    .setUsername("username")
                    .setPassword("password")
                    .setDrivername("driver_name")
                    .setDBUrl("jdbcUrl")
                    .setQuery("query")
                    .finish(),
            new GenericTypeInfo<ResultSet>(ResultSet.class));

    source.print();
    environment.execute();
}
Following is the customised JDBCInputFormat:
public class JDBCInputFormat extends RichInputFormat<ResultSet, InputSplit> implements ResultTypeQueryable {

    // Connection parameters and state (field declarations implied by the original post).
    private String username, password, drivername, dbURL, queryTemplate;
    private int resultSetType, resultSetConcurrency;
    private transient Connection dbConn;
    private transient PreparedStatement statement;
    private transient ResultSet resultSet;
    private boolean isLastRecord;

    @Override
    public void open(InputSplit inputSplit) throws IOException {
        try {
            Class.forName(drivername);
            dbConn = DriverManager.getConnection(dbURL, username, password);
            statement = dbConn.prepareStatement(queryTemplate, resultSetType, resultSetConcurrency);
            resultSet = statement.executeQuery();
        } catch (ClassNotFoundException | SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            if (resultSet != null) {
                resultSet.close();
            }
            if (statement != null) {
                statement.close();
            }
            if (dbConn != null) {
                dbConn.close();
            }
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    public boolean reachedEnd() throws IOException {
        try {
            isLastRecord = resultSet.isLast();
            return isLastRecord;
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    public ResultSet nextRecord(ResultSet row) throws IOException {
        try {
            if (!isLastRecord) {
                resultSet.next();
            }
            return resultSet;
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}
This works with the query below, which limits the number of rows fetched:
SELECT a,b,c from xyz where rownum <= 10;
but when I try to fetch all the rows (approximately 1 million), I get the exception below after a seemingly random number of rows has been fetched:
java.sql.SQLRecoverableException: Io exception: Socket closed
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:263)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.fetch(T4CPreparedStatement.java:1024)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
at oracle.jdbc.driver.ScrollableResultSet.cacheRowAt(ScrollableResultSet.java:1839)
at oracle.jdbc.driver.ScrollableResultSet.isValidRow(ScrollableResultSet.java:1823)
at oracle.jdbc.driver.ScrollableResultSet.isLast(ScrollableResultSet.java:349)
at JDBCInputFormat.reachedEnd(JDBCInputFormat.java:98)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite0(Native Method)
So for my case, how can I solve this issue?

I don't think it is possible to ship a ResultSet like a regular record. It is a stateful object that internally maintains a connection to the database server. Using a ResultSet as a record that is transferred between Flink operators means that it can be serialized, shipped over the network to another machine, deserialized, and handed to a different thread in a different JVM process. That does not work.
Depending on the setup, a ResultSet might just as well stay on the same machine in the same thread, which is probably why the limited query worked for you. If you want to query a database from within an operator, you could implement the function as a RichMapPartitionFunction. Otherwise, I'd read the ResultSet in the data source and forward the resulting rows, as sketched below.
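A minimal sketch of that last approach, assuming a Flink release whose bundled flink-jdbc connector supports setRowTypeInfo (the driver class, URL, and column types are placeholders for your actual schema): the input format iterates over the ResultSet internally and emits one serializable Row per database row, so no connection-bound object ever crosses operator boundaries.
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class JdbcRowSource {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Describe the schema of the query result so Flink can serialize each Row.
        RowTypeInfo rowTypeInfo = new RowTypeInfo(
                BasicTypeInfo.STRING_TYPE_INFO,  // a
                BasicTypeInfo.STRING_TYPE_INFO,  // b
                BasicTypeInfo.INT_TYPE_INFO);    // c

        DataSet<Row> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("oracle.jdbc.OracleDriver")
                        .setDBUrl("jdbc:oracle:thin:@//host:1521/service")
                        .setUsername("username")
                        .setPassword("password")
                        .setQuery("SELECT a, b, c FROM xyz")
                        .setRowTypeInfo(rowTypeInfo)
                        .finish());

        // Rows are plain records now, so any DataSet operator can follow.
        rows.print(); // print() triggers execution for a DataSet
    }
}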

Related

Flink JDBC Sink part 2

I posted a question a few days back: Flink Jdbc sink.
Now I am trying to use the sink provided by Flink.
I have written the code and it runs as well, but nothing gets saved in the DB and no exceptions are thrown. With my previous sink the job would not finish (which is expected, since it is a streaming app), but with the following code I get no errors and nothing is saved to the DB.
public class CompetitorPipeline implements Pipeline {

    private final StreamExecutionEnvironment streamEnv;
    private final ParameterTool parameter;
    private static final Logger LOG = LoggerFactory.getLogger(CompetitorPipeline.class);

    public CompetitorPipeline(StreamExecutionEnvironment streamEnv, ParameterTool parameter) {
        this.streamEnv = streamEnv;
        this.parameter = parameter;
    }

    @Override
    public KeyedStream<CompetitorConfig, String> start(ParameterTool parameter) throws Exception {
        CompetitorConfigChanges competitorConfigChanges = new CompetitorConfigChanges();
        KeyedStream<CompetitorConfig, String> competitorChangesStream = competitorConfigChanges.run(streamEnv, parameter);

        // Add to JDBC sink
        competitorChangesStream.addSink(JdbcSink.sink(
                "insert into competitor_config_universe(marketplace_id,merchant_id, competitor_name, comp_gl_product_group_desc," +
                        "category_code, competitor_type, namespace, qualifier, matching_type," +
                        "zip_region, zip_code, competitor_state, version_time, compConfigTombstoned, last_updated) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
                (ps, t) -> {
                    ps.setInt(1, t.getMarketplaceId());
                    ps.setLong(2, t.getMerchantId());
                    ps.setString(3, t.getCompetitorName());
                    ps.setString(4, t.getCompGlProductGroupDesc());
                    ps.setString(5, t.getCategoryCode());
                    ps.setString(6, t.getCompetitorType());
                    ps.setString(7, t.getNamespace());
                    ps.setString(8, t.getQualifier());
                    ps.setString(9, t.getMatchingType());
                    ps.setString(10, t.getZipRegion());
                    ps.setString(11, t.getZipCode());
                    ps.setString(12, t.getCompetitorState());
                    ps.setTimestamp(13, Timestamp.valueOf(t.getVersionTime()));
                    ps.setBoolean(14, t.isCompConfigTombstoned());
                    ps.setTimestamp(15, new Timestamp(System.currentTimeMillis()));
                    System.out.println("sql" + ps);
                },
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:mysql://127.0.0.1:3306/database")
                        .withDriverName("com.mysql.cj.jdbc.Driver")
                        .withUsername("xyz")
                        .withPassword("xyz#")
                        .build()));
        return competitorChangesStream;
    }
}
You need to enable autocommit mode for the JDBC sink:
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
        .withUrl("jdbc:mysql://127.0.0.1:3306/database;autocommit=true")
It looks like SimpleBatchStatementExecutor only works in auto-commit mode. If you need to commit and roll back batches, you have to write your own JdbcBatchStatementExecutor.
Have you tried including JdbcExecutionOptions?
dataStream.addSink(JdbcSink.sink(
        sql_statement,
        (statement, value) -> {
            /* prepared statement parameters */
        },
        JdbcExecutionOptions.builder()
                .withBatchSize(5000)
                .withBatchIntervalMs(200)
                .withMaxRetries(2)
                .build(),
        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withUrl("jdbc:mysql://127.0.0.1:3306/database")
                .withDriverName("com.mysql.cj.jdbc.Driver")
                .withUsername("xyz")
                .withPassword("xyz#")
                .build()));
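For what it's worth, my reading of the connector (an assumption on my part, not something I have verified against the docs) is that JdbcSink buffers rows and flushes them only when the batch size is reached, the batch interval fires, or a checkpoint completes. The default execution options use a batch interval of 0, so with a trickle of records and checkpointing disabled the buffer may never be flushed, which would match seeing no rows and no errors. Enabling checkpointing forces periodic flushes:
// Interval is a placeholder; any enabled checkpointing makes the JDBC
// sink flush its buffered batch whenever a checkpoint is taken.
env.enableCheckpointing(10_000L); // checkpoint (and flush) every 10 s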

How to handle exception while parsing JSON in Flink

I am reading data from Kafka using Flink 1.4.2 and parsing it to ObjectNode using JSONDeserializationSchema. If an incoming record is not valid JSON, my Flink job fails. I would like to skip the broken record instead of failing the job.
FlinkKafkaConsumer010<ObjectNode> kafkaConsumer =
        new FlinkKafkaConsumer010<>(TOPIC, new JSONDeserializationSchema(), consumerProperties);
DataStream<ObjectNode> messageStream = env.addSource(kafkaConsumer);
messageStream.print();
I get the following exception if the data in Kafka is not valid JSON.
Job execution switched to status FAILING.
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting ('true', 'false' or 'null')
at [Source: [B@4f522623; line: 1, column: 6]
Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
The easiest solution is to implement your own DeserializationSchema and wrap JSONDeserializationSchema. You can then catch the exception and either ignore it or perform a custom action.
As suggested by @twalthr, I implemented my own DeserializationSchema by copying JSONDeserializationSchema and adding exception handling.
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

public class CustomJSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {

    private ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        ObjectNode objectNode;
        try {
            objectNode = mapper.readValue(message, ObjectNode.class);
        } catch (Exception e) {
            // Wrap the unparseable payload instead of failing the job.
            ObjectMapper errorMapper = new ObjectMapper();
            ObjectNode errorObjectNode = errorMapper.createObjectNode();
            errorObjectNode.put("jsonParseError", new String(message));
            objectNode = errorObjectNode;
        }
        return objectNode;
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }
}
In my streaming job:
messageStream
        .filter((event) -> {
            if (event.has("jsonParseError")) {
                LOG.warn("JsonParseException was handled: " + event.get("jsonParseError").asText());
                return false;
            }
            return true;
        }).print();
Flink has improved null record handling for FlinkKafkaConsumer
There are two possible design choices when the DeserializationSchema encounters a corrupted message: it can either throw an IOException, which causes the pipeline to be restarted, or it can return null, in which case the Flink Kafka consumer silently skips the corrupted message.
For more details, you can see this link.
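To illustrate the return-null route (a sketch assuming a connector version that follows the contract above; the class name is hypothetical):
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

public class SkippingJSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        try {
            return mapper.readValue(message, ObjectNode.class);
        } catch (IOException e) {
            // Returning null tells the Flink Kafka consumer to skip
            // this record instead of failing the job.
            return null;
        }
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }
}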

Would singleton database connection affect performance in a weblogic clustered environment?

I have a Java EE Struts web application using a singleton database connection. In the past there was only one WebLogic server, but now there are two WebLogic servers in a cluster.
Session replication has been tested and works in this cluster. The web application consists of a few links that open different forms for the user to fill in. Each form has a dynamic dropdown list whose values depend on which form was clicked. These dropdown values are retrieved from the Oracle database.
One unique issue is that the first form clicked might take around 2-5 seconds to load, while the second form clicked can take forever, or more than 5 minutes. I have checked the code and found that the problem lies in the call that obtains the one instance of the DB connection. Could this be a deadlock?
public static synchronized DataSingleton getDataSingleton()
        throws ApplicationException {
    if (myDataSingleton == null) {
        myDataSingleton = new DataSingleton();
    }
    return myDataSingleton;
}
Any help in explaining such a scenario would be appreciated.
Thank you
A sample read operation calling the singleton:
String sql = "...";
DataSingleton myDataSingleton = DataSingleton.getDataSingleton();
Connection conn = myDataSingleton.getConnection();
try {
    PreparedStatement pstmt = conn.prepareStatement(sql);
    try {
        pstmt.setString(1, userId);
        ResultSet rs = pstmt.executeQuery();
        try {
            while (rs.next()) {
                String group = rs.getString("mygroup");
            }
        } catch (SQLException rsEx) {
            throw rsEx;
        } finally {
            rs.close();
        }
    } catch (SQLException psEx) {
        throw psEx;
    } finally {
        pstmt.close();
    }
} catch (SQLException connEx) {
    throw connEx;
} finally {
    conn.close();
}
The singleton class:
/**
 * Private constructor looking up the server's DataSource through JNDI.
 */
private DataSingleton() throws ApplicationException {
    try {
        Context ctx = new InitialContext();
        SystemConstant mySystemConstant = SystemConstant.getSystemConstant();
        String fullJndiPath = mySystemConstant.getFullJndiPath();
        ds = (DataSource) ctx.lookup(fullJndiPath);
    } catch (NamingException ne) {
        throw new ApplicationException(ne);
    }
}

/**
 * Singleton: to obtain only one instance throughout the system.
 *
 * @return DataSingleton
 */
public static synchronized DataSingleton getDataSingleton()
        throws ApplicationException {
    if (myDataSingleton == null) {
        myDataSingleton = new DataSingleton();
    }
    return myDataSingleton;
}

/**
 * Fetching a SQL Connection through the DataSource.
 */
public Connection getConnection() throws ApplicationException {
    Connection conn = null;
    try {
        if (ds == null) {
        }
        conn = ds.getConnection();
    } catch (SQLException sqlE) {
        throw new ApplicationException(sqlE);
    }
    return conn;
}
It sounds like you may not be committing the transaction at the end of your use of the connection.
What's in DataSingleton? Is it a database connection? Allowing multiple threads to access the same database connection is not going to work, for example once you have more than one user. Why don't you use a database connection pool, for example a DataSource?
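As an illustration of the pooled pattern (a sketch; the JNDI name, table, and columns are placeholders): look up the container-managed DataSource once, then fetch a fresh Connection per request and always close it, which merely returns it to the pool. Try-with-resources guarantees the close even when an exception is thrown.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class GroupDao {

    private final DataSource ds;

    public GroupDao() throws NamingException {
        // The pool lives in the app server (e.g. a WebLogic data source);
        // "jdbc/myPool" is a placeholder JNDI name.
        this.ds = (DataSource) new InitialContext().lookup("jdbc/myPool");
    }

    public String findGroup(String userId) throws SQLException {
        String sql = "SELECT mygroup FROM groups WHERE user_id = ?";
        // One connection per call; closing returns it to the pool.
        try (Connection conn = ds.getConnection();
             PreparedStatement pstmt = conn.prepareStatement(sql)) {
            pstmt.setString(1, userId);
            try (ResultSet rs = pstmt.executeQuery()) {
                return rs.next() ? rs.getString("mygroup") : null;
            }
        }
    }
}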

Error in SQL Server to WCF development

I am testing a DB that has two tables (Satellite and Channel) to be exposed as I need using WCF. Unfortunately, I have tried everything I know and found online for more than a week now, and I can't solve the problem.
This is the service contract, IService.cs:
[ServiceContract]
public interface IService
{
    [OperationContract]
    List<Satalite> SelectSatalite(int satNum);

    [OperationContract]
    List<Satalite> SataliteList();

    [OperationContract]
    List<Channel> ChannelList(int satNum);

    [OperationContract]
    String Sat(int satNum);
}
And this is the Service.svc.cs file:
public class Service : IService
{
    DataDbDataContext DbObj = new DataDbDataContext();

    public List<Satalite> SataliteList()
    {
        var satList = from r in DbObj.Satalites
                      select r;
        return satList.ToList();
    }

    public List<Satalite> SelectSatalite(int satNum)
    {
        var satList = from r in DbObj.Satalites
                      where r.SateliteID == satNum
                      select r;
        return satList.ToList();
    }

    public List<Channel> ChannelList(int satNum)
    {
        var channels = from r in DbObj.Channels
                       where r.SateliteID == satNum
                       select r;
        return channels.ToList();
    }

    public String Sat(int satNum)
    {
        Satalite satObj = DbObj.Satalites.Single(p => p.SateliteID == satNum);
        return satObj.Name;
    }
}
Whenever I try to run the first three methods I get an error when testing them using wcftestclient.exe; the last one works with no issues.
The underlying connection was closed: The connection was closed
unexpectedly.
Server stack trace:
at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException
webException, HttpWebRequest request, HttpAbortReason abortReason)
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan
timeout)
at System.ServiceModel.Channels.RequestChannel.Request(Message message,
TimeSpan timeout)
at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message
message, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action,
Boolean oneway, ProxyOperationRuntime operation, Object[] ins,
Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage
methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage
message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage
reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData&
msgData, Int32 type) at IService.SelectSatalite(Int32 satNum)
at ServiceClient.SelectSatalite(Int32 satNum)
Inner Exception: The underlying connection was closed: The connection
was closed unexpectedly.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan
timeout)
What I understand is that the error happens for the custom classes that map the DB tables; if I use a type known to the .NET compiler (e.g. int or string), it works with no problems. Unfortunately, I haven't found a solution.
The error appears to have one of two causes:
a timeout, because you're returning too much data, i.e. selecting the data from the database takes too long for the service method to complete in time,
or:
the message size is too large, because you're selecting too much data, and so the WCF communication aborts before all of the data has been returned.
My solution:
don't select all the data from the tables! Return only as much data as you can really handle / display, e.g. 10 rows, 20 rows, or a maximum of 100 rows.
Try this: if you change your method to
public List<Satalite> SataliteList(int count)
{
    var satList = (from r in DbObj.Satalites
                   select r).Take(count);
    return satList.ToList();
}
can you call it from the WCF Test Client with e.g. count = 10 or count = 50?
Adjusting timeout settings on the server and client side will help you: on the server side, adjust the sendTimeout attribute of the binding element; on the client side, adjust the receiveTimeout attribute of the binding element.

SQL Server CLR Integration enlisting in current transaction

I'm trying to use CLR integration in SQL Server to handle accessing external files instead of storing them internally as BLOBs. I'm trying to figure out the pattern I need to follow to make my code enlist in the current SQL transaction. I figured I would start with the simplest scenario, deleting an existing row, since the insert/update scenarios would be more complex.
[SqlProcedure]
public static void DeleteStoredImages(SqlInt64 DocumentID)
{
    if (DocumentID.IsNull)
        return;

    using (var conn = new SqlConnection("context connection=true"))
    {
        conn.Open();
        string FaceFileName, RearFileName;
        int Offset, Length;
        GetFileLocation(conn, DocumentID.Value, true,
            out FaceFileName, out Offset, out Length);
        GetFileLocation(conn, DocumentID.Value, false,
            out RearFileName, out Offset, out Length);
        new DeleteTransaction().Enlist(FaceFileName, RearFileName);
        using (var comm = conn.CreateCommand())
        {
            comm.CommandText = "DELETE FROM ImagesStore WHERE DocumentID = " + DocumentID.Value;
            comm.ExecuteNonQuery();
        }
    }
}

private class DeleteTransaction : IEnlistmentNotification
{
    public string FaceFileName { get; set; }
    public string RearFileName { get; set; }

    public void Enlist(string FaceFileName, string RearFileName)
    {
        this.FaceFileName = FaceFileName;
        this.RearFileName = RearFileName;
        var trans = Transaction.Current;
        if (trans == null)
            Commit(null);
        else
            trans.EnlistVolatile(this, EnlistmentOptions.None);
    }

    public void Commit(Enlistment enlistment)
    {
        if (FaceFileName != null && File.Exists(FaceFileName))
        {
            File.Delete(FaceFileName);
        }
        if (RearFileName != null && File.Exists(RearFileName))
        {
            File.Delete(RearFileName);
        }
    }

    public void InDoubt(Enlistment enlistment)
    {
    }

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        preparingEnlistment.Prepared();
    }

    public void Rollback(Enlistment enlistment)
    {
    }
}
When I actually try to run this, I get the following exception:
A .NET Framework error occurred during execution of user defined routine or aggregate 'DeleteStoredImages':
System.Transactions.TransactionException: The operation is not valid for the state of the transaction. ---> System.Transactions.TransactionPromotionException: MSDTC on server 'BD009' is unavailable. ---> System.Data.SqlClient.SqlException: MSDTC on server 'BD009' is unavailable.
System.Data.SqlClient.SqlException:
at System.Data.SqlServer.Internal.StandardEventSink.HandleErrors()
at System.Data.SqlServer.Internal.ClrLevelContext.SuperiorTransaction.Promote()
System.Transactions.TransactionPromotionException:
at System.Data.SqlServer.Internal.ClrLevelContext.SuperiorTransaction.Promote()
at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
System.Transactions.TransactionException:
at System.Transactions.TransactionState.EnlistVolatile(InternalTransaction tx, IEnlistmentNotification enlistmentNotification, EnlistmentOptions enlistmentOptions, Transaction atomicTransaction)
at System.Transactions.TransactionStateSubordinateActive.EnlistVolatile(InternalTransaction tx, IEnlistmentNotification enlistmentNotification, EnlistmentOptions enlistmentOptions, Transaction atomicTransaction)
at System.Transactions.Transaction.EnlistVolatile(IEnlistmentNotification enlistmentNotification, EnlistmentOptions enlistmentOptions)
at ExternalImages.StoredProcedures.DeleteTransaction.Enlist(String FaceFileName, String RearFileName)
at ExternalImages.StoredProcedures.DeleteStoredImages(SqlInt64 DocumentID)
. User transaction, if any, will be rolled back.
The statement has been terminated.
Can anyone explain what I'm doing wrong, or point me to an example of how to do it right?
You have hopefully solved this by now, but in case anyone else has a similar problem: the error message you are getting suggests that you need to start the Distributed Transaction Coordinator service on the BD009 machine (presumably your own machine).
@Aasmund's answer regarding the Distributed Transaction Coordinator might solve the stated problem, but it still leaves you in a non-ideal state: you are tying a transaction, which holds locks on the ImagesStore table (even if only a row lock), to two file-system operations. And you need to BEGIN and COMMIT the transaction outside of this function, since that isn't handled in the presented code.
I would separate those two pieces:
Step 1: Delete the row from the table
and then, IF that did not error,
Step 2: Delete the file(s)
In the scenario where Step 1 succeeds but then Step 2, for whatever reason, fails, do one or both of the following:
return an error status code and keep track, in a status table, of which DocumentIDs got an error when attempting to delete the file. You can use that to manually delete the files and/or debug why the error occurred.
create a process that can run periodically to find and remove unreferenced files.
