I've read various SO threads on why generating/injecting/parsing/fetching takes so long (or hangs), but with no luck. I've tried implementing the solutions from the following SO threads, again without success:
1) Nutch 2.1 urls injection takes forever
2) Nutch 2.2.1 doesn't continue after Injector job
and various other threads.
I'm using Nutch 2.3.1 and HBase 0.94.27. I've been following this and this tutorial, and the build succeeded. But whenever I run any Nutch command, it hangs.
Following is the output I get when running these commands:
Inject Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# ./bin/nutch inject seed/urls.txt
InjectorJob: starting at 2016-05-04 09:59:12
InjectorJob: Injecting urlDir: seed/urls.txt
Generate Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch generate -topN 40
GeneratorJob: starting at 2016-05-04 09:54:08
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
Fetch Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch fetch -all
FetcherJob: starting at 2016-05-04 10:00:14
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Parse Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch parse -all
ParserJob: starting at 2016-05-04 10:00:43
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
Update Command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch updatedb -all
DbUpdaterJob: starting at 2016-05-04 10:02:24
DbUpdaterJob: updatinging all
Following are the HBase logs:
client /0:0:0:0:0:0:0:1:45216
2016-05-04 10:00:47,214 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000e, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,215 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:45216 which had sessionid 0x1547b2be4bc000e
2016-05-04 10:00:47,215 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000d, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,216 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:59934 which had sessionid 0x1547b2be4bc000d
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:10,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000c
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:22,003 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000b
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000e
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000d
2016-05-04 10:02:25,195 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59938
2016-05-04 10:02:25,202 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59938
2016-05-04 10:02:25,204 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc000f with negotiated timeout 40000 for client /127.0.0.1:59938
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59940
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59940
2016-05-04 10:02:25,825 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc0010 with negotiated timeout 40000 for client /127.0.0.1:59940
2016-05-04 10:04:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
2016-05-04 10:04:28,372 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@25e5c862
2016-05-04 10:04:28,379 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 0 catalog row(s) and gc'd 0 unreferenced parent region(s)
2016-05-04 10:09:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
Hadoop.log
2016-05-04 10:42:18,132 INFO crawl.InjectorJob - InjectorJob: starting at 2016-05-04 10:42:18
2016-05-04 10:42:18,134 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: seed/urls.txt
2016-05-04 10:42:18,527 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
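To rule out a basic connectivity problem between the Nutch job and HBase, I'm considering running a small standalone check along the lines of the sketch below. This is only my own sketch (not from the tutorials); it assumes the HBase 0.94 client jars and the same hbase-site.xml that runtime/local/conf uses are on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseCheck {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath, so it must point at the same
        // ZooKeeper quorum that the HBase logs above come from.
        Configuration conf = HBaseConfiguration.create();
        // Throws MasterNotRunningException / ZooKeeperConnectionException if the
        // cluster cannot be reached; otherwise it returns quietly.
        HBaseAdmin.checkHBaseAvailable(conf);
        System.out.println("HBase master is reachable");
    }
}

If this check fails, I'd suspect the client cannot reach HBase at all rather than anything Nutch-specific.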
What exactly is the problem? I've configured everything correctly, and it still hangs. Why?
Related
I'm using Camel with embedded ActiveMQ for some functional tests. But after the tests have run, ActiveMQ is having trouble shutting down, and I see logs like this:
2022-01-31 19:12:34,873 [MQ ShutdownHook] INFO BrokerService - Apache ActiveMQ 5.15.9 (localhost, ID:8dc9dbba2775-37769-1643655318556-0:2) is shutting down
...
2022-01-31 19:12:34,875 [MQ ShutdownHook] INFO TransportConnector - Connector tcp://localhost:0 stopped
2022-01-31 19:12:35,052 [m://localhost#4] INFO PooledConnectionFactory - Expiring connection ActiveMQConnection {id=ID:8dc9dbba2775-37769-1643655318556-4:3,clientId=ID:8dc9dbba2775-37769-1643655318556-3:2,started=false} on IOException: peer (vm://localhost#5) stopped.
...
2022-01-31 19:12:35,053 [MQ ShutdownHook] INFO BrokerService - Apache ActiveMQ 5.15.9 (localhost, ID:8dc9dbba2775-37769-1643655318556-0:19) is shutdown
2022-01-31 19:12:40,052 [MQ ShutdownHook] INFO TransportConnection - The connection to 'vm://localhost#12' is taking a long time to shutdown.
2022-01-31 19:12:45,052 [MQ ShutdownHook] INFO TransportConnection - The connection to 'vm://localhost#12' is taking a long time to shutdown.
2022-01-31 19:12:50,053 [MQ ShutdownHook] INFO TransportConnection - The connection to 'vm://localhost#12' is taking a long time to shutdown.
One connection fails to shut down, and that last message keeps repeating.
The configuration is done programmatically for the tests:
private static void addActiveMqComponent()
{
    if (FunctionalTestFramework.framework().context().getComponent("activemq2") == null)
    {
        JmsConfiguration jmsConfig = new JmsConfiguration();
        ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost:61616?broker.persistent=false");
        connectionFactory.setTrustAllPackages(true);
        RedeliveryPolicy redeliveryPolicy = new RedeliveryPolicy();
        redeliveryPolicy.setInitialRedeliveryDelay(1000);
        redeliveryPolicy.setRedeliveryDelay(1000);
        redeliveryPolicy.setBackOffMultiplier(3.5);
        redeliveryPolicy.setUseExponentialBackOff(true);
        redeliveryPolicy.setMaximumRedeliveries(-1);
        connectionFactory.setRedeliveryPolicy(redeliveryPolicy);
        jmsConfig.setConnectionFactory(connectionFactory);
        transactionManager = new JmsTransactionManager();
        transactionManager.setConnectionFactory(connectionFactory);
        jmsConfig.setTransactionManager(transactionManager);
        ActiveMQComponent activeMqComponent = new ActiveMQComponent();
        activeMqComponent.setConfiguration(jmsConfig);
        activeMqComponent.setTransacted(true);
        activeMqComponent.setCacheLevelName("CACHE_CONSUMER");
        FunctionalTestFramework.framework().context().addComponent("activemq2", activeMqComponent);
    }
}
I've seen other threads suggesting setting useShutdownHook="false", but that only applies when shutdown is triggered explicitly (broker.stop()), which I'm not doing. And I'm not finding much else on this specific problem.
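One thing I haven't tried yet is stopping the Camel context (and with it the pooled vm:// connections) explicitly in the test teardown, before the broker's own shutdown hook runs. A minimal sketch of what I have in mind, assuming a JUnit 4 @AfterClass hook and the same FunctionalTestFramework accessor used above:

@AfterClass
public static void stopCamelBeforeBrokerShutdown() throws Exception {
    // Stop all routes and consumers first so no vm:// connection is still open
    // when the embedded broker's shutdown hook kicks in.
    FunctionalTestFramework.framework().context().stop();
}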
Any ideas on what the issue could be?
When trying to register routes via the Kong /routes endpoint, we are seeing intermittent timeout issues.
Kong and ScyllaDB are set up in Docker/Kubernetes.
[error] 41#0: *119054033 [kong] api_helpers.lua:310 could not execute insertion query: [Write timeout] Operation timed out for kong.routes - received only 0 responses from 2 CL=SERIAL., client: 10.xxx.xxx.xxx, server: kong_admin, request: \"POST /routes HTTP/1.1\"
This only happens on the /routes registration call, and write_request_timeout_in_ms is set to 2000 ms.
Has anyone seen this issue before?
I have a Camel context with many routes that start every 15 minutes via the Timer component.
These routes set some properties on the exchange (target host, query, and current date; I use a Processor to get the date, subtract 12 hours, and convert it to GMT).
After these properties are set, another route is called via Direct to execute the HTTP GET. When the request finishes, another route is called to post the result to Artemis ActiveMQ.
The project is deployed on WildFly 13.
The problem is:
Sometimes the routes simply freeze and don't start again after 15 minutes.
When I try to stop/start the route, I get the following log:
08:27:45,230 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216958569]
08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 299 seconds. Inflights per route: [Route1 = 1]
08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216959570]
08:27:47,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 298 seconds. Inflights per route: [Route1 = 1]
I don't know if some processes are stuck, making it impossible for other processes to start.
I thought about removing the generic routes (PostMessageInActiveMQ and GetDataAutomaticallyBySinceTime) and implementing the same code directly in the other routes (Route1, Route2 and Route3), but I don't think that is the best approach.
Routes:
Route1 (Route2 and Route3 are almost the same; only the property values change)
from("timer:Route1Timer?period=15m")
.routeId("Route1")
.autoStartup(false)
.setProperty("targetAddress", simple("hostname.route1"))
.process(new GetCurrentDate())
.setProperty("query",
simple("DataQuery%26URI=Route1%26format=xml%26Mode=since-time%26p1=${header.currentDate}"))
.to("direct:GetDataAutoBySinceTime");
GetDataAutomaticallyBySinceTime
from("direct:GetDataAutoBySinceTime")
.routeId("GetDataAutoBySinceTime")
.autoStartup(true)
.removeHeaders("*")
.setHeader(Exchange.HTTP_METHOD, constant("GET"))
.toD("http4:${header.targetAddress}/command=${header.query}%26httpClient.socketTimeout=3000")
.convertBodyTo(String.class, "utf-8")
.to("direct:PostMessageInActiveMQ");
PostMessageInActiveMQ
CamelArtemisComponent components = new CamelArtemisComponent();
getContext().addComponent("artemis", components.getArtemisComponent());
from("direct:PostMessageInActiveMQ")
.routeId("PostMessageInActiveMQ")
.autoStartup(true)
.convertBodyTo(String.class, "utf-8")
.inOnly("artemis:ARTEMIS.QUEUE");
Entire code: https://github.com/vitorvr/camel-example
EDIT:
Camel Version: 2.22.0
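One variant I'm considering is passing the HTTP client timeouts as real endpoint options instead of the %26-encoded form in the toD URI above, since I'm not sure the socketTimeout is actually being applied to the client (which could explain an inflight exchange hanging for so long). This is only a sketch; the connectTimeout option is my addition and not in the current code:

from("direct:GetDataAutoBySinceTime")
    .routeId("GetDataAutoBySinceTime")
    .autoStartup(true)
    .removeHeaders("*")
    .setHeader(Exchange.HTTP_METHOD, constant("GET"))
    // Use a literal '&' so Camel treats these as http4 endpoint options,
    // limiting how long a single hung request can block the route.
    .toD("http4:${header.targetAddress}/command=${header.query}"
            + "&httpClient.socketTimeout=3000&httpClient.connectTimeout=3000")
    .convertBodyTo(String.class, "utf-8")
    .to("direct:PostMessageInActiveMQ");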
I'm running a simulation using Gatling: 5 users per second for 150 minutes.
After a certain exception:
15:58:19.643 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_outbound_msg' failed for user 29989
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.643 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_outbound_msg' failed for user 29989: j.n.s.SSLException: handshake timed out
15:58:19.643 [ERROR] i.g.h.c.i.DefaultHttpClient - Failed to install SslHandler
15:58:19.678 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_inbound_msg' failed for user 29984
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.678 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_inbound_msg' failed for user 29984: j.n.s.SSLException: handshake timed out
the number of active users dropped to -1, and then every time this exception happens the count keeps dropping.
Example:
---- FacebookOutboundSimulation ------------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
---- FacebookInboundMessageSimulation ------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
================================================================================
Why does this happen, and how can I fix it?
The issue was with gatling-charts-highcharts-bundle-3.0.0-RC1; when I switched to gatling-charts-highcharts-bundle-3.0.0-RC3, released on 2 October 2018, it was resolved.
Sidenote: Use AsJson instead of AsJSON.
I'm building an HL7 listener using netty4 and processing HL7 messages. Once a message is successfully processed, an ACK is sent back.
from("hl7NettyListener")
.routeId("route_hl7listener")
.startupOrder(997)
.unmarshal()
.hl7(false)
.to("direct:a");
from("direct:a")
.doTry()
.to("bean:processHL7?method=process")
.doCatch(HL7Exception.class)
.to("direct:ErrorACK")
//.transform(ack())
.stop()
.end()
.transform(ack())
.wireTap("direct:b");
This works fine in my local Eclipse: I fire an HL7 message and get an ACK back.
But when I package this application into a jar, put it on my server, and then run
cat example.hl7 | netcat localhost 4444
(to fire an HL7 message to port 4444 on the Linux environment), I don't get an ACK back. I get a closed-connection exception:
DEBUG NettyConsumer - Channel: [id: 0xdf13b06b, L:0.0.0.0/0.0.0.0:4444] writing body: MSH|^~\&|Karisma||Kestral|Kestral|20180309144109.827+1300||ACK^R01|701||2.3.1
2018-03-09 14:41:09,838 [ad #3 - WireTap] DEBUG WireTapProcessor - >>>> (wiretap) direct:b Exchange[]
2018-03-09 14:41:09,839 [ServerTCPWorker] DEBUG NettyConsumer - Caused by: [org.apache.camel.CamelExchangeException - Cannot write response to null. Exchange[ID-annan06-56620-1520559639101-0-2]. Caused by: [java.nio.channels.ClosedChannelException - null]]
org.apache.camel.CamelExchangeException: Cannot write response to null. Exchange[ID-annan06-56620-1520559639101-0-2]. Caused by: [java.nio.channels.ClosedChannelException - null]
at org.apache.camel.component.netty4.handlers.ServerResponseFutureListener.operationComplete(ServerResponseFutureListener.java:54)
at org.apache.camel.component.netty4.handlers.ServerResponseFutureListener.operationComplete(ServerResponseFutureListener.java:36)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:488)
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:438)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:440)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
That worked. It was failing because netcat was immediately closing the connection. I used "netcat -i 5 localhost" to make netcat wait for 5 seconds, and I successfully received the ACK back.