Does StreamingOutput spawn a new thread to write to the output stream? - cxf

Let us say we have a web service like this:
//This code is taken from a stack overflow question
@Autowired
private Service service;
@GET
@Produces(MediaType.TEXT_PLAIN)
public Response streamExample() {
StreamingOutput stream = service.getStream();
return Response.ok(stream).build();
}
Service class:
public class Service{
public StreamingOutput getStream(){
log.info("Going to start Streaming");
StreamingOutput stream = new StreamingOutput() {
@Override
public void write(OutputStream os) throws IOException,
WebApplicationException {
Writer writer = new BufferedWriter(new OutputStreamWriter(os));
writer.write("test");
log.info("Inside streaming.");
writer.flush();
}
};
log.info("Finished streaming.");
return stream;
}
}
The output in the log file is:
Going to start Streaming.
Finished Streaming.
Inside streaming.
There are two questions I would like to ask regarding this:
1. Is there a new thread being created to stream the output for every request?
2. If I want to run a Hibernate query inside the write method of the StreamingOutput, how do I associate a session with that thread?
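The log order above is what one would expect if write is simply a callback that gets invoked after getStream() has returned. Here is a minimal plain-Java sketch of that ordering. It does not use JAX-RS: Callback and respond() are hypothetical stand-ins for StreamingOutput and for whatever the runtime does, so it illustrates deferred invocation only and does not show how CXF actually schedules the call.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredCallbackDemo {
    /** Hypothetical stand-in for a callback type such as StreamingOutput. */
    interface Callback {
        void write(StringBuilder out);
    }

    /** In-memory log so the ordering can be inspected. */
    static final List<String> LOG = new ArrayList<>();

    // Builds the callback. The body of write() does NOT run here.
    static Callback getStream() {
        LOG.add("Going to start streaming");
        Callback callback = out -> {
            LOG.add("Inside streaming on " + Thread.currentThread().getName());
            out.append("test");
        };
        LOG.add("Finished streaming");
        return callback;
    }

    // Hypothetical "framework" side: invokes the callback later, on the calling thread.
    static String respond() {
        Callback callback = getStream();
        StringBuilder body = new StringBuilder();
        callback.write(body);
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println(respond());
        LOG.forEach(System.out::println);
    }
}
```

In this sketch the callback body runs on the thread that calls write, so no extra thread is needed to produce the "Finished" line before the "Inside" line.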

Related

Flink Checkpointing mode ExactlyOnce is not working as expected

I am new to Flink, so I apologize if my understanding is wrong. I am building a dataflow application whose flow contains multiple data streams, each checking whether the required fields are present in the incoming DataStream. The application validates the incoming data and, if validation succeeds, appends the data to a given file if the file already exists. I am trying to simulate a failure in one DataStream to confirm that the other data streams are not affected, so I explicitly throw an exception in one of the flows. In the example below, for simplicity, I append the data to a text file on Windows.
Note: My flow has no state, since I have nothing to store in state.
public class ExceptionTest {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// start a checkpoint every 1000 ms
env.enableCheckpointing(1000);
// env.setParallelism(1);
//env.setStateBackend(new RocksDBStateBackend("file:///C://flinkCheckpoint", true));
// to set minimum progress time to happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// checkpoints have to complete within 5000 ms, or are discarded
env.getCheckpointConfig().setCheckpointTimeout(5000);
// set mode to exactly-once (this is the default)
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// allow only one checkpoint to be in progress at the same time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// enable externalized checkpoints which are retained after job cancellation
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION); // DELETE_ON_CANCELLATION
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
3, // number of restart attempts
Time.of(10, TimeUnit.SECONDS) // delay
));
DataStream<String> input1 = env.fromElements("hello");
DataStream<String> input2 = env.fromElements("hello");
DataStream<String> output1 = input1.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out) throws Exception {
//out.collect(value.concat(" world"));
throw new Exception("=====================NO VALUE TO CHECK=================");
}
});
DataStream<String> output2 = input2.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out) throws Exception {
out.collect(value.concat(" world"));
}
});
output2.addSink(new SinkFunction<String>() {
@Override
public void invoke(String value) throws Exception {
try {
File myObj = new File("C://flinkOutput//filename.txt");
if (myObj.createNewFile()) {
System.out.println("File created: " + myObj.getName());
BufferedWriter out = new BufferedWriter(
new FileWriter("C://flinkOutput//filename.txt", true));
out.write(value);
out.close();
System.out.println("Successfully wrote to the file.");
} else {
System.out.println("File already exists.");
BufferedWriter out = new BufferedWriter(
new FileWriter("C://flinkOutput//filename.txt", true));
out.write(value);
out.close();
System.out.println("Successfully wrote to the file.");
}
} catch (IOException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
}
});
env.execute();
}
}
I have a few doubts, as below.
When I throw the exception in the output1 stream, the second flow, output2, keeps running even after the exception and writes data to the file on my local disk. But when I check the file, the output is as below:
hello world
hello world
hello world
hello world
As per my understanding of the Flink documentation, if I use the checkpointing mode EXACTLY_ONCE, it should not write the data to the file more than once, as the process has already completed and written the data to the file. But that is not happening in my case, and I cannot tell whether I am doing something wrong.
Please help me clear my doubts about checkpointing and how I can achieve the EXACTLY_ONCE mechanism. I read about TWO_PHASE_COMMIT in Flink, but I could not find an example of how to implement it.
As suggested by @Mikalai Lushchytski, I implemented StreamingFileSink as below.
With StreamingFileSink
public class ExceptionTest {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// start a checkpoint every 1000 ms
env.enableCheckpointing(1000);
// env.setParallelism(1);
//env.setStateBackend(new RocksDBStateBackend("file:///C://flinkCheckpoint", true));
// to set minimum progress time to happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// checkpoints have to complete within 5000 ms, or are discarded
env.getCheckpointConfig().setCheckpointTimeout(5000);
// set mode to exactly-once (this is the default)
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// allow only one checkpoint to be in progress at the same time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// enable externalized checkpoints which are retained after job cancellation
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION); // DELETE_ON_CANCELLATION
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
3, // number of restart attempts
Time.of(10, TimeUnit.SECONDS) // delay
));
DataStream<String> input1 = env.fromElements("hello");
DataStream<String> input2 = env.fromElements("hello");
DataStream<String> output1 = input1.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out) throws Exception {
//out.collect(value.concat(" world"));
throw new Exception("=====================NO VALUE TO CHECK=================");
}
});
DataStream<String> output2 = input2.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out) throws Exception {
out.collect(value.concat(" world"));
}
});
String outputPath = "C://flinkCheckpoint";
final StreamingFileSink<String> sink = StreamingFileSink
.forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
.withRollingPolicy(
DefaultRollingPolicy.builder()
.withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
.withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
.withMaxPartSize(1)
.build())
.build();
output2.addSink(sink);
env.execute();
}
}
But when I check the checkpoint folder, I can see that it created four part files in the in-progress state.
Is there anything I am doing that causes it to create multiple part files?
In order to guarantee end-to-end exactly-once record delivery (in addition to exactly-once state semantics), the data sink needs to take part in the checkpointing mechanism (as well as the data source).
If you are going to write the data to a file, then you can use a StreamingFileSink, which emits its input elements to FileSystem files within buckets. This is integrated with the checkpointing mechanism to provide exactly-once semantics out of the box.
If you are going to implement your own sink, then the sink function must implement the CheckpointedFunction interface and properly implement the snapshotState(FunctionSnapshotContext context) method, which is called when a snapshot for a checkpoint is requested and should flush the current application state. In addition, I would recommend implementing the CheckpointListener interface to be notified once a distributed checkpoint has been completed.
Flink already provides an abstract TwoPhaseCommitSinkFunction, which is the recommended base class for all SinkFunction implementations that intend to provide exactly-once semantics. It does that by implementing a two-phase commit algorithm on top of CheckpointedFunction and CheckpointListener. As an example, you can have a look at the FlinkKafkaProducer.java source code.
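To make the two-phase commit idea more concrete, here is a minimal plain-Java sketch. It does not use Flink's API: the class TwoPhaseCommitSketch and its method signatures are hypothetical, and the method names only loosely echo the phases described above. It also simplifies recovery by assuming a failure always happens before the checkpoint has completed, so everything uncommitted is discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the two-phase commit idea; this is not Flink's API. */
public class TwoPhaseCommitSketch {
    // Output that is visible to readers of the sink.
    private final List<String> committed = new ArrayList<>();
    // Records written since the last checkpoint (the open transaction).
    private List<String> current = new ArrayList<>();
    // Records snapshotted by a checkpoint that has not been confirmed yet.
    private List<String> preCommitted = null;

    // Records go into the open transaction and are not visible yet.
    public void invoke(String value) {
        current.add(value);
    }

    // Phase 1: called when a checkpoint snapshot is taken.
    public void preCommit() {
        preCommitted = current;
        current = new ArrayList<>();
    }

    // Phase 2: called once the checkpoint is confirmed complete.
    public void commit() {
        if (preCommitted != null) {
            committed.addAll(preCommitted);
            preCommitted = null;
        }
    }

    // On failure (simplified): drop everything that was never committed.
    // The records are expected to be replayed after the restart.
    public void abort() {
        current = new ArrayList<>();
        preCommitted = null;
    }

    public List<String> committed() {
        return committed;
    }
}
```

With this shape, a record written before a failure and replayed after the restart ends up in the committed output only once, because the first attempt is aborted before it becomes visible.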

Camel - Enrich CSV from FTP with CSV from local disk using Camel Bindy

The goal is to produce a report every hour by comparing two CSV files using Camel 3.0.0. One is located on an FTP server, the other on disk. How can the poll enrich pattern be used in combination with unmarshalling the CSV on disk with the Bindy data format?
Example code (for simplicity the FTP endpoint is replaced by a file endpoint):
@Component
public class EnricherRoute extends RouteBuilder {
@Override
public void configure() {
from("file://data?fileName=part_1.csv&scheduler=quartz2&scheduler.cron=0+0+0/1+*+*+?")
.unmarshal().bindy(BindyType.Csv, Record.class)
.pollEnrich("file://data?fileName=part_2.csv", new ReportAggregationStrategy())
.marshal().bindy(BindyType.Csv, Record.class)
.to("file://reports?fileName=report_${date:now:yyyyMMdd}.csv");
}
}
The problem in this example is that in the ReportAggregationStrategy the resource (coming from data/part_2.csv, see below) is not unmarshalled. How can data/part_2.csv be unmarshalled as well?
public class ReportAggregationStrategy implements AggregationStrategy {
@Override
public Exchange aggregate(Exchange original, Exchange resource) {
final List<Record> originalRecords = original.getIn().getBody(List.class);
final List<Record> resourceRecords = resource.getIn().getBody(List.class); // Results in errors!
...
}
}
You can wrap the enrichment in a direct endpoint and do the unmarshalling there.
from("file://data?fileName=part_1.csv&scheduler=quartz2&scheduler.cron=0+0+0/1+*+*+?")
.unmarshal().bindy(BindyType.Csv, Record.class)
.enrich("direct:enrich_record", new ReportAggregationStrategy())
.marshal().bindy(BindyType.Csv, Record.class)
.to("file://reports?fileName=report_${date:now:yyyyMMdd}.csv");
from("direct:enrich_record")
.pollEnrich("file://data?fileName=part_2.csv")
.unmarshal().bindy(BindyType.Csv, Record.class);

Timeout waiting for connection from pool - despite single SolrServer

We are having problems with our solrServer client's connection pool running out of connections in no time, even when using a pool of several hundred (we've tried 1024, just for good measure).
From what I've read, the following exception can be caused by not using a singleton HttpSolrServer object. However, see our XML config below, as well:
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:455)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
XML Config:
<solr:solr-server id="solrServer" url="http://solr.url.domain/"/>
<solr:repositories base-package="de.ourpackage.data.solr" multicore-support="true"/>
At this point, we are at a loss. We are running a web application on Tomcat 7. Whenever a user requests a new page, we send one or more requests to the Solr server, requesting whatever we need, which is usually single entries or pages of 20 (using Spring Data).
As for the rest of our implementation, we are using an abstract SolrOperationsRepository class, which is extended by each of our repositories (one repository for each core).
The following is how we set our solrServer. I suspect we are doing something fundamentally wrong here, which is why our connections are overflowing. According to the logs, they are always being returned into the pool, btw.
private SolrOperations solrOperations;
@SuppressWarnings("unchecked")
public final Class<T> getEntityClass() {
return (Class<T>)((ParameterizedType)getClass().getGenericSuperclass()).getActualTypeArguments()[0];
}
public final SolrOperations getSolrOperations() {
/*HttpSolrServer solrServer = (HttpSolrServer)solrOperations.getSolrServer();
solrServer.getHttpClient().getConnectionManager().closeIdleConnections(500, TimeUnit.MILLISECONDS);*/
logger.info("solrOperations: " + solrOperations);
return solrOperations;
}
@Autowired
public final void setSolrServer(SolrServer solrServer) {
try {
String core = SolrServerUtils.resolveSolrCoreName(getEntityClass());
SolrTemplate template = templateHolder.get(core);
/*solrServer.setConnectionTimeout(500);
solrServer.setMaxTotalConnections(2048);
solrServer.setDefaultMaxConnectionsPerHost(2048);
solrServer.getHttpClient().getConnectionManager().closeIdleConnections(500, TimeUnit.MILLISECONDS);*/
if ( template == null ) {
template = new SolrTemplate(new MulticoreSolrServerFactory(solrServer));
template.setSolrCore(core);
template.afterPropertiesSet();
logger.debug("Creating new SolrTemplate for core '" + core + "'");
templateHolder.put(core, template);
}
logger.debug("setting SolrServer " + template);
this.solrOperations = template;
} catch (Exception e) {
logger.error("cannot set solrServer...", e);
}
}
The code that is commented out has been mostly used for testing purposes. I also read somewhere else that you cannot manipulate the solrServer object on-the-fly. Which begs the question, how do I set a timeout/poolsize in the XML config?
The implementation of a repository looks like this:
@Repository(value="stellenanzeigenSolrRepository")
public class StellenanzeigenSolrRepositoryImpl extends SolrOperationsRepository<Stellenanzeige> implements StellenanzeigenSolrRepositoryCustom {
...
public Query createQuery(Criteria criteria, Sort sort, Pageable pageable) {
Query resultQuery = new SimpleQuery(criteria);
if ( pageable != null ) resultQuery.setPageRequest(pageable);
if ( sort != null ) resultQuery.addSort(sort);
return resultQuery;
}
public Page<Stellenanzeige> findBySearchtext(String searchtext, Pageable pageable) {
Criteria searchtextCriteria = createSearchtextCriteria(searchtext);
Query query = createQuery(searchtextCriteria, null, pageable);
return getSolrOperations().queryForPage(query, getEntityClass());
}
...
}
Can any of you point to mistakes that we've made, that could possibly lead to this issue? Like I said, we are at a loss. Thanks in advance, and I will, of course update the question as we make progress or you request more information.
The MulticoreSolrServerFactory always returns an HttpClient object that only ever allows 2 concurrent connections to the same host, thus causing the above problem.
This seems to be a bug in spring-data-solr that can be worked around by creating a custom factory and overriding a few methods.
Edit: The clone method in MulticoreSolrServerFactory is broken. This hasn't been corrected yet. As some of my colleagues have run into this issue recently, I will post a workaround here: create your own class and override one method.
public class CustomMulticoreSolrServerFactory extends MulticoreSolrServerFactory {
public CustomMulticoreSolrServerFactory(final SolrServer solrServer) {
super(solrServer);
}
@Override
protected SolrServer createServerForCore(final SolrServer reference, final String core) {
// There is a bug in the original SolrServerUtils.cloneHttpSolrServer() method
// that doesn't clone the ConnectionManager and always returns the default
// PoolingClientConnectionManager with a maximum of 2 connections per host
if (StringUtils.hasText(core) && reference instanceof HttpSolrServer) {
HttpClient client = ((HttpSolrServer) reference).getHttpClient();
String baseURL = ((HttpSolrServer) reference).getBaseURL();
baseURL = SolrServerUtils.appendCoreToBaseUrl(baseURL, core);
return new HttpSolrServer(baseURL, client);
}
return reference;
}
}

Should a new instance be created for each request?

I am building a web application using Java servlets/JSP. I have a class that handles the database connection, but I don't know whether I should create a new instance for each request or use one instance for all requests.
For instance:
Scenario 1:
class HandleDB {
public static HandleDB getInstance(); // singleton pattern
public void initConnection();
public void releaseConnection();
}
then,
//at the beginning of a request:
HandleDB.getInstance().initConnection();
// handle tasks
// at the end of request
HandleDB.getInstance().releaseConnection();
Scenario 2:
class HandleDB {
public void initConnection();
public void releaseConnection();
}
//at the beginning of a request:
HandleDB db = new HandleDB();
db.initConnection();
// handle tasks
// at the end of request
db.releaseConnection();
db = null;
Which scenario should be used in practice?
Go with Scenario 2. The problem with Scenario 1 is that the same HandleDB instance will be shared by all requests and could lead to thread safety issues. Keep in mind that requests can be executed in parallel. The standard is to have one connection per thread/request.
Most Web applications use a connection pool (like C3P0 or Apache DBCP) to avoid having to create a new connection for each request. You get a connection from the pool at the beginning of the request and return it to the pool at the end of the request, so other requests can reuse it later.
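The borrow-and-return pattern described above can be sketched in plain Java as follows. This is only an illustration of the idea, not how C3P0 or Apache DBCP are implemented: SimplePoolSketch, FakeConnection, and handleRequest are hypothetical names, and FakeConnection stands in for a real java.sql.Connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SimplePoolSketch {
    /** Hypothetical placeholder for a real java.sql.Connection. */
    static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
    }

    // Idle connections; the queue is thread-safe, so parallel requests can share it.
    private final BlockingQueue<FakeConnection> idle;

    SimplePoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new FakeConnection(i));
        }
    }

    // Blocks until a connection is free.
    FakeConnection borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // Hands a connection back so that later requests can reuse it.
    void giveBack(FakeConnection connection) {
        idle.add(connection);
    }

    int available() {
        return idle.size();
    }

    // One request: borrow at the start, always return at the end.
    String handleRequest(String task) {
        FakeConnection connection = borrow();
        try {
            return task + " handled on connection " + connection.id;
        } finally {
            giveBack(connection);
        }
    }
}
```

Each request holds its own connection for the duration of the request, so no connection is shared between threads at the same time, and the finally block ensures it goes back to the pool even if the task fails.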
Use listeners:
public class AppServletContextListener implements ServletContextListener{
@Override
public void contextDestroyed(ServletContextEvent arg0) {
/// Destroy DB Connection
}
@Override
public void contextInitialized(ServletContextEvent arg0) {
/// Create DB Connection
}
}
If you have a batch of tasks, you should create the database connection only at the beginning of the first task and release it after all tasks have finished.
For your case, Scenario 1 is applicable.

Why Future<> result from ApiProxy.Delegate.makeAsyncCall() is never used?

I'm playing with GAE hooks and trying to follow Nick's blog post. But apparently it's somewhat outdated because it doesn't have implementation of makeAsyncCall which exists in my GAE SDK 1.6.1.
Here is a snippet of my code:
public class MultiTenantHook implements Delegate
{
@Override
public Future<byte[]> makeAsyncCall(final Environment env, final String pkgName, final String method, final byte[] request, ApiProxy.ApiConfig config)
{
Callable<byte[]> callable = new Callable<byte[]>()
{
@Override
public byte[] call() throws Exception
{
return makeSyncCall(env, pkgName, method, request);
}
};
FutureTask<byte[]> task = new FutureTask<byte[]>(callable);
return task;
}
}
This method is being called, but the returned Future<> is never used by GAE. The call() method of the inner class is never executed.
Do you know how to make it work?
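One thing that can be checked with the JDK alone: a FutureTask does not execute its Callable until something calls run() on it, either directly or through an executor. The sketch below shows this. FutureTaskDemo and its helper methods are hypothetical, and whether App Engine expects the delegate to run the task itself (or allows creating threads this way) is an assumption not confirmed here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class FutureTaskDemo {
    /** Records whether the Callable body has executed. */
    static final AtomicBoolean CALLED = new AtomicBoolean(false);

    // Creating the task does not run the Callable.
    static FutureTask<byte[]> newTask() {
        Callable<byte[]> callable = () -> {
            CALLED.set(true);
            return new byte[] {42};
        };
        return new FutureTask<>(callable);
    }

    // Runs the task on an executor and waits for its result.
    static byte[] runAndGet(FutureTask<byte[]> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.execute(task); // this is what triggers call()
            return task.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        FutureTask<byte[]> task = newTask();
        System.out.println("called before run: " + CALLED.get());
        runAndGet(task);
        System.out.println("called after run: " + CALLED.get());
    }
}
```

If no executor is available, calling task.run() synchronously before returning the task is the other way to make call() execute, at the cost of the call no longer being asynchronous.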
