Spring Data mongo profiling data - spring-data-mongodb

Is it possible to get profiling data from the Spring Data MongoDB layer?
I know that if I use something like this: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/ then I can get some degree of info out of the application.
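Something along these lines works from inside the application (a rough sketch via MongoTemplate.executeCommand; the mongoTemplate bean, the profiling level and the slowms threshold are assumptions):
// Rough sketch, assuming a configured MongoTemplate bean and a recent driver:
// enable profiling for operations slower than 50 ms, then inspect system.profile.
mongoTemplate.executeCommand("{ profile: 1, slowms: 50 }");
for (org.bson.Document op : mongoTemplate.getCollection("system.profile").find()) {
    System.out.println(op.toJson());
}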
I could also write a custom aspect to measure the queries and operations.
However, I am looking for some built-in functionality. Is there something via JMX or the like?
Thanks in advance

I searched around quite a bit and was not able to find anything, so I created my own mechanism to track timings and metrics for Mongo.
Given that we are using ELK to gather log and metrics data, I added an annotation and an aspect to track timings. I put the annotation on whatever I want to measure that touches our Mongo methods. The aspect gathers the data and writes it to the logs, and through Kibana I can see the access to Mongo under load for each type of access.
This is the annotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface TimedMethod {
}
This is the aspect:
@Component
@Slf4j(topic = "com.cisco.services.common.rpil.metrics")
@Aspect
public class TimedMethodAspect {

    @Around("@annotation(com.cisco.services.common.rpil.metrics.TimedMethod) && execution(public * *(..))")
    public Object time(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        String throwableName = null;
        try {
            return pjp.proceed();
        } catch (Throwable t) {
            throwableName = t.getClass().getName();
            throw t;
        } finally {
            long duration = System.nanoTime() - start;
            if (throwableName != null) {
                log.info("Timed [{}]: {} nsecs, with exception [{}]", pjp.getSignature().toString(), duration, throwableName);
            } else {
                log.info("Timed [{}]: {} nsecs", pjp.getSignature().toString(), duration);
            }
        }
    }
}
Basically it works like this:
@TimedMethod
public Object measureMe() {
    ...
}
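For the aspect to fire, AspectJ auto-proxying needs to be enabled (Spring Boot does this for you; with plain Spring, a minimal Java-config sketch, with an assumed class name and package, would be):
@Configuration
@EnableAspectJAutoProxy
@ComponentScan(basePackages = "com.cisco.services.common.rpil.metrics") // assumed package, adjust to yours
public class MetricsConfig {
}
Note that, as with any Spring AOP advice, the annotation only takes effect on Spring-managed beans and on calls that go through the proxy (self-invocation is not intercepted).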

Creating Prometheus collectors directly in Flink?

I understand that Flink has its own metrics collection abstraction and supports integration with Prometheus. Due to the abstraction it doesn't support some Prometheus concepts directly. I'm trying to do something like this:
Counter message_counter = Counter.build().name("messages").labelNames("source", "dest").register();
// Then...
void processMessage(Message m) {
    ...
    message_counter.labels(m.source, m.dest).inc();
}
Obviously this is for situations where there is a relatively small set of sources and destinations. As far as I can tell, Flink's metrics API requires you to pre-register all combinations of the labels in advance, or to maintain them in your own data structure and build them as necessary. That's doable, but it feels like I'm re-implementing one of the nice features of simpleclient in my own code.
Is there any way to bypass Flink's abstraction of Prometheus and instantiate counters directly? I've tried the following in combination with metrics.reporters: prom:
class FlinkMetricsExposingMapFunction extends RichMapFunction<Integer, Integer> {
    // Using Flink metrics
    private transient org.apache.flink.metrics.Counter eventCounter;

    // Using Prometheus directly
    private static io.prometheus.client.Counter message_counter =
            io.prometheus.client.Counter.build()
                    .name("messages")
                    .labelNames("source", "dest")
                    .register();

    @Override
    public Integer map(Integer value) {
        message_counter.labels("in", "out").inc();
        return value;
    }
}
The problem is that the counter is missing from the metrics endpoint (which happily serves the Flink-defined eventCounter). I examined the contents of defaultRegistry.metricFamilySamples() inside this map function and it only contains the counter I defined and nothing else.
Defining the counter using a transient and setting it in open() fails for different NPE-related reasons that I don't fully understand, but I suspect it's because Prometheus doesn't like the same counter being registered more than once, and I can't figure out or seem to guard against this happening.
Has anyone managed to get this working? Or am I completely on the wrong track?
Edit:
After banging my head against this for some time, I decided that it would be easier to re-implement Prometheus simpleclient's method for defining labels, which turns out to be fairly trivial. It would be nicer if Flink natively supported using labels in this kind of way or if this approach could be made a little more generic, but it's better than nothing.
private transient ConcurrentMap<List<String>, Counter> messageCounterMap;

private Counter labelledCounter(String source, String dest) {
    List<String> key = Arrays.asList(source, dest);
    Counter c = messageCounterMap.get(key);
    if (c != null) {
        return c;
    }
    Counter c2 = getRuntimeContext().getMetricGroup().addGroup("source", source)
            .addGroup("dest", dest).counter("incoming_messages");
    Counter tmp = messageCounterMap.putIfAbsent(key, c2);
    return tmp == null ? c2 : tmp;
}

@Override
public void open(Configuration parameters) {
    this.messageCounterMap = new ConcurrentHashMap<List<String>, Counter>();
}

@Override
public Integer map(Message msg) {
    labelledCounter(msg.source, msg.dest).inc();
    return 0; // placeholder return; the original snippet ends here
}

Flink streaming job is not scaling as expected

We are in the middle of testing the scaling ability of Flink, but we found that scaling does not work, no matter whether we add more slots or increase the number of Task Managers. We would expect linear, or at least close-to-linear, scaling, but the results even show degradation. Any comments are appreciated.
Test details:
- VMware vSphere
- Just a simple pass-through test:
  - auto-generated source of 3 million records, each 1 KB in size, parallelism = 1
  - the source passes into the next map operator, which just returns the same record and sends a counter to statsD; parallelism in the test cases = 2, 4, 6
- 3 TMs, 6 slots in total (2 per TM); each JM/TM has 32 vCPUs and 100 GB memory
Result:
- 2 slots: 26 seconds, 3 mil / 26 = ~115k TPS
- 4 slots: 23 seconds, 3 mil / 23 = ~130k TPS
- 6 slots: 22 seconds, 3 mil / 22 = ~136k TPS
As shown, there is almost no scaling. Any clue? Thanks.
You really should be using a RichParallelSourceFunction. If you care about making the records from different instances of the source distinct, you can get hold of each instance's index from the RuntimeContext, which is available via the getRuntimeContext() method in the RichFunction interface, as sketched below.
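A minimal sketch of such a parallel source, loosely modelled on the datagen source shown further down (the class name and payload are placeholders):
// Sketch only: a parallel version of the generator, where each instance's
// subtask index (from the RuntimeContext) keeps the records distinct.
public class ParallelDatagen extends RichParallelSourceFunction<String> {
    private volatile boolean running = true;
    private final int loop;

    public ParallelDatagen(int loop) {
        this.loop = loop;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        for (int i = 0; running && i < loop; i++) {
            ctx.collect(subtask + "-" + String.format("%09d", i));
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}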
Also, Flink has a built-in StatsD metrics reporter that you should be using instead of rolling your own. Moreover, numRecordsIn, numRecordsOut, numRecordsInPerSecond, and numRecordsOutPerSecond are already computed for you, so there is no need to create this instrumentation yourself. You can also access these metrics via Flink's web interface or the REST API.
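The reporter is enabled in flink-conf.yaml; roughly like this (the exact keys depend on your Flink version):
metrics.reporters: stsd
metrics.reporter.stsd.class: org.apache.flink.metrics.statsd.StatsDReporter
metrics.reporter.stsd.host: localhost
metrics.reporter.stsd.port: 8125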
As for why you might be experiencing poor scalability with the Kafka consumer, there are many things that could cause it. If you are using event-time processing, then idle partitions could be holding things up (see https://issues.apache.org/jira/browse/FLINK-5479). If the stream is keyed, then data skew could be an issue. If you are connecting to an external database or service, it could easily be a bottleneck. Misconfigured checkpointing or insufficient network capacity could also cause this.
I would start debugging by looking at some key metrics in the Flink web UI. Is the load well balanced across the sub-tasks, or is it skewed? You could turn on latency tracking and see whether one of the Kafka partitions is misbehaving (by inspecting the latency at the sink(s), which is reported on a per-partition basis), and you could look for backpressure.
Please refer to the sample code:
public class passthru extends RichMapFunction<String, String> {
    public void open(Configuration configuration) throws Exception {
        ... ...
        stats = new NonBlockingStatsDClient();
    }

    public String map(String value) throws Exception {
        ... ...
        stats.increment();
        return value;
    }
}

public class datagen extends RichSourceFunction<String> {
    ... ...
    public void run(SourceContext<String> ctx) throws Exception {
        int i = 0;
        while (run) {
            String idx = String.format("%09d", i);
            ctx.collect("{\"<a 1kb json content with idx in certain json field>\"}");
            i++;
            if (i == loop)
                run = false;
        }
    }
    ... ...
}

public class Job {
    public static void main(String[] args) throws Exception {
        ... ...
        DataStream<String> stream = env.addSource(new datagen(loop)).rebalance();
        DataStream<String> convert = stream.map(new passthru(statsdUrl));
        env.execute("Flink");
    }
}
The reductionState code:
dataStream.flatMap(xxx).keyBy(new KeySelector<xxx, AggregationKey>() {
    public AggregationKey getKey(rec r) throws Exception {
        ... ...
    }
}).process(new Aggr());

public class Aggr extends ProcessFunction<rec, rec> {
    private ReducingState<rec> store;

    public void open(Configuration parameters) throws Exception {
        store = getRuntimeContext().getReducingState(new ReducingStateDescriptor<>(
                "reduction store", new ReduceFunction<rec>() {
                    ... ...
                }, rec.class));
    }

    public void processElement(rec r, Context ctx, Collector<rec> out)
            throws Exception {
        ... ...
        store.add(r);
    }
}

Running a whole scenario before another scenario

I'm not able to figure out how to run a whole scenario before another scenario, so that my tests are not dependent on each other.
I have these imaginary scenarios:
Scenario: A
  Given I have something
  When I submit some data
  Then I should see it on my webpage

Scenario: B
  Given SCENARIO A
  When I delete the data
  Then I should not see it on my webpage
When I run this, the software does not recognize Scenario A inside Scenario B and asks me to create the missing step, like this:
You can implement missing steps with the snippets below:
@Given("^Registrere formue og inntekt$")
public void registrere_formue_og_inntekt() throws Throwable {
    // Write code here that turns the phrase above into concrete actions
    throw new PendingException();
}
You could either:
Use a Background to group all the steps that need to be executed before the different scenarios:
Background:
  Given I have something
  When I submit some data
  Then I should see it on my webpage

Scenario: B
  When I delete the data
  Then I should not see it on my webpage
Group them as part of a step definition:
@Given("^Scenario A")
public void scenario_A() {
    I_have_something();
    I_submit_some_data();
    I_should_see_it_on_my_page();
}
which you can then use like this:
Given Scenario A
When I delete the data
Then I should not see it on my webpage
Using this technique, you usually observe that some actions are constantly reused, and you may want to factor them out so they can be reused across different step definitions; at that point, the Page Object pattern comes in very handy (see the sketch below).
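For example, a minimal Page Object sketch (assuming Selenium WebDriver; the element locators are made up) that the step definitions above could delegate to:
// Sketch of a Page Object the step definitions can delegate to
// (Selenium WebDriver assumed; locators are illustrative).
public class DataPage {
    private final WebDriver driver;

    public DataPage(WebDriver driver) {
        this.driver = driver;
    }

    public void submitData(String data) {
        driver.findElement(By.id("data-input")).sendKeys(data);
        driver.findElement(By.id("submit")).click();
    }

    public boolean isDataVisible(String data) {
        return driver.getPageSource().contains(data);
    }
}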
Cucumber scenarios are supposed to be independent. A lot of work is done assuming and ensuring that independence, so trying to go against it will be an obstacle course.
Having said that, you could create your own implementation of the Cucumber JUnit runner. By looking at the source of the original runner, you can expose / wrap / change the internals to allow what you want. For example, with the following runner:
public class MyCucumber extends Cucumber {
    private static Runtime runtime;
    private static JUnitReporter reporter;
    private static List<CucumberFeature> features;

    public MyCucumber(Class<?> clazz) throws InitializationError, IOException {
        super(clazz);
    }

    @Override
    @SuppressWarnings("static-access")
    protected Runtime createRuntime(ResourceLoader resourceLoader,
            ClassLoader classLoader, RuntimeOptions runtimeOptions)
            throws InitializationError, IOException {
        this.runtime = super.createRuntime(resourceLoader, classLoader, runtimeOptions);
        this.reporter = new JUnitReporter(runtimeOptions.reporter(classLoader), runtimeOptions.formatter(classLoader), runtimeOptions.isStrict());
        this.features = runtimeOptions.cucumberFeatures(resourceLoader);
        return this.runtime;
    }

    public static void runScenario(String name) throws Exception {
        new ExecutionUnitRunner(runtime, getScenario(name), reporter).run(new RunNotifier());
    }

    private static CucumberScenario getScenario(String name) {
        for (CucumberFeature feature : features) {
            for (CucumberTagStatement element : feature.getFeatureElements()) {
                if (!(element instanceof CucumberScenario)) {
                    continue;
                }
                CucumberScenario scenario = (CucumberScenario) element;
                if (!name.equals(scenario.getGherkinModel().getName())) {
                    continue;
                }
                return scenario;
            }
        }
        return null;
    }
}
You can set up your test suite with:
@RunWith(MyCucumber.class)
public class MyTest {
}
And create a step definition like:
@Given("^I first run scenario (.*)$")
public void i_first_run_scenario(String name) throws Throwable {
    MyCucumber.runScenario(name);
}
It is a fragile customization (it can break easily with new versions of cucumber-junit), but it should work.

Timeout waiting for connection from pool - despite single SolrServer

We are having problems with our SolrServer client's connection pool running out of connections in no time, even when using a pool of several hundred connections (we've tried 1024, just for good measure).
From what I've read, the following exception can be caused by not using a singleton HttpSolrServer object; however, see our XML config below as well:
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
    at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:455)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
XML Config:
<solr:solr-server id="solrServer" url="http://solr.url.domain/"/>
<solr:repositories base-package="de.ourpackage.data.solr" multicore-support="true"/>
At this point, we are at a loss. We are running a web application on Tomcat 7. Whenever a user requests a page, we send one or more requests to the Solr server, requesting whatever we need, which is usually single entries or a page of 20 (using Spring Data).
As for the rest of our implementation, we are using an abstract SolrOperationsRepository class, which is extended by each of our repositories (one repository for each core).
The following is how we set our solrServer. I suspect we are doing something fundamentally wrong here, which is why our connections are overflowing. According to the logs, connections are always being returned to the pool, by the way.
private SolrOperations solrOperations;

@SuppressWarnings("unchecked")
public final Class<T> getEntityClass() {
    return (Class<T>) ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
}

public final SolrOperations getSolrOperations() {
    /*HttpSolrServer solrServer = (HttpSolrServer) solrOperations.getSolrServer();
    solrServer.getHttpClient().getConnectionManager().closeIdleConnections(500, TimeUnit.MILLISECONDS);*/
    logger.info("solrOperations: " + solrOperations);
    return solrOperations;
}

@Autowired
public final void setSolrServer(SolrServer solrServer) {
    try {
        String core = SolrServerUtils.resolveSolrCoreName(getEntityClass());
        SolrTemplate template = templateHolder.get(core);
        /*solrServer.setConnectionTimeout(500);
        solrServer.setMaxTotalConnections(2048);
        solrServer.setDefaultMaxConnectionsPerHost(2048);
        solrServer.getHttpClient().getConnectionManager().closeIdleConnections(500, TimeUnit.MILLISECONDS);*/
        if (template == null) {
            template = new SolrTemplate(new MulticoreSolrServerFactory(solrServer));
            template.setSolrCore(core);
            template.afterPropertiesSet();
            logger.debug("Creating new SolrTemplate for core '" + core + "'");
            templateHolder.put(core, template);
        }
        logger.debug("setting SolrServer " + template);
        this.solrOperations = template;
    } catch (Exception e) {
        logger.error("cannot set solrServer...", e);
    }
}
The code that is commented out was mostly used for testing purposes. I also read somewhere else that you cannot manipulate the SolrServer object on the fly. Which raises the question: how do I set a timeout/pool size in the XML config?
The implementation of a repository looks like this:
@Repository(value = "stellenanzeigenSolrRepository")
public class StellenanzeigenSolrRepositoryImpl extends SolrOperationsRepository<Stellenanzeige> implements StellenanzeigenSolrRepositoryCustom {
    ...

    public Query createQuery(Criteria criteria, Sort sort, Pageable pageable) {
        Query resultQuery = new SimpleQuery(criteria);
        if (pageable != null) resultQuery.setPageRequest(pageable);
        if (sort != null) resultQuery.addSort(sort);
        return resultQuery;
    }

    public Page<Stellenanzeige> findBySearchtext(String searchtext, Pageable pageable) {
        Criteria searchtextCriteria = createSearchtextCriteria(searchtext);
        Query query = createQuery(searchtextCriteria, null, pageable);
        return getSolrOperations().queryForPage(query, getEntityClass());
    }

    ...
}
Can any of you point to mistakes we've made that could possibly lead to this issue? Like I said, we are at a loss. Thanks in advance; I will, of course, update the question as we make progress or if you request more information.
The MulticoreSolrServerFactory always returns an HttpClient that only ever allows 2 concurrent connections to the same host, which causes the problem above.
This seems to be a bug in spring-data-solr that can be worked around by creating a custom factory and overriding a few methods.
Edit: The clone method used by MulticoreSolrServerFactory is broken, and this hasn't been corrected yet. As some of my colleagues have run into this issue recently, I will post a workaround here: create your own class and override one method.
public class CustomMulticoreSolrServerFactory extends MulticoreSolrServerFactory {

    public CustomMulticoreSolrServerFactory(final SolrServer solrServer) {
        super(solrServer);
    }

    @Override
    protected SolrServer createServerForCore(final SolrServer reference, final String core) {
        // There is a bug in the original SolrServerUtils.cloneHttpSolrServer() method:
        // it doesn't clone the ConnectionManager and always returns the default
        // PoolingClientConnectionManager with a maximum of 2 connections per host.
        if (StringUtils.hasText(core) && reference instanceof HttpSolrServer) {
            HttpClient client = ((HttpSolrServer) reference).getHttpClient();
            String baseURL = ((HttpSolrServer) reference).getBaseURL();
            baseURL = SolrServerUtils.appendCoreToBaseUrl(baseURL, core);
            return new HttpSolrServer(baseURL, client);
        }
        return reference;
    }
}
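You can then use this factory wherever the stock MulticoreSolrServerFactory was used, for example in the setSolrServer method from the question (a sketch):
// Sketch: swap the custom factory in when building the template
SolrTemplate template = new SolrTemplate(new CustomMulticoreSolrServerFactory(solrServer));
template.setSolrCore(core);
template.afterPropertiesSet();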

EJB 3.1 and NIO2: Monitoring the file system

I guess most of us agree that NIO.2 is a fine thing to make use of. Presuming you want to monitor part of the file system for incoming XML files, it is an easy task now. But what if I want to integrate this into an existing Java EE application, so I don't have to run another service (the app server AND the one that monitors the file system)?
So I have the heavyweight app server with all the EJB 3.1 stuff and some kind of service monitoring the file system, taking appropriate action once a file shows up. Incidentally, the appropriate action is to create a message and send it via JMS, so it might be nice to integrate both into the app server.
I tried @Startup, but deployment freezes (I know that I shouldn't do I/O in there, it was just a try). Anyhow... any suggestions?
You could create a singleton that loads at startup and delegates the monitoring to an asynchronous bean:
@Singleton
@Startup
public class Initialiser {

    @EJB
    private FileSystemMonitor fileSystemMonitor;

    @PostConstruct
    public void init() {
        String fileSystemPath = ....;
        fileSystemMonitor.poll(fileSystemPath);
    }
}
Then the asynchronous bean looks something like this:
@Stateless
public class FileSystemMonitor {

    @Asynchronous
    public void poll(String fileSystemPath) {
        WatchService watcher = ....;
        for (;;) {
            WatchKey key = null;
            try {
                key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    WatchEvent.Kind<?> kind = event.kind();
                    if (kind == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost or discarded
                    }
                    WatchEvent<Path> watchEvent = (WatchEvent<Path>) event;
                    // Process files....
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
                return;
            } finally {
                if (key != null) {
                    boolean valid = key.reset();
                    if (!valid) break; // the key is no longer valid: the directory is inaccessible, so exit the loop
                }
            }
        }
    }
}
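The WatchService setup elided above might look roughly like this (the directory path and event kinds are assumptions):
// Sketch of the elided watcher setup: watch a directory for new/changed files.
Path dir = Paths.get(fileSystemPath);
WatchService watcher = FileSystems.getDefault().newWatchService();
dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY);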
It might help if you specified which server you're using, but have you considered implementing a JMX-based service? It's a bit more "neutral" than EJB, more appropriate for a background service, and has fewer restrictions.
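A minimal sketch of that idea (the interface, class, and object name are illustrative, not a specific server's API):
// Sketch: expose the file-system monitor as a JMX MXBean instead of an EJB.
public interface FileWatcherMXBean {
    void start(String path);
    void stop();
}

public class FileWatcher implements FileWatcherMXBean {
    public void start(String path) { /* start the WatchService loop */ }
    public void stop() { /* stop watching */ }

    // Register the bean, e.g. from a ServletContextListener or startup hook:
    public static void register() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new FileWatcher(), new ObjectName("com.example:type=FileWatcher"));
    }
}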
