Items not grouped correctly - CoGroupByKey - apache-flink

CoGroupByKey problem
Data description.
I have two datasets.
Records - the first, contains around 0.5-1M records per (key, day). For testing I use 2-3 keys and 5-10 days of data; what I'm aiming for is 1000+ keys. Each record contains a key, a timestamp in microseconds, and some other data.
Configs - the second, is rather small. It describes the key over time; you can think of it as a list of tuples: (key, start date, end date, description).
For the exploration I've encoded the data as files of length-prefixed, binary-encoded Protocol Buffer messages. Additionally, the files are compressed with gzip. The data is sharded by date, and each file is around 10MB.
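For illustration, a minimal sketch of how one such shard could be read, assuming the standard protobuf "delimited" (varint length-prefixed) framing; the actual framing and the Record message type are specific to this setup, so treat the names as placeholders. It would be called e.g. as readShard(path, Record.parser()) (or with the PARSER field on older protobuf versions).

import com.google.protobuf.Parser;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

final class ShardReader {
  // Reads all length-prefixed messages from one gzipped shard file.
  static <T> List<T> readShard(String path, Parser<T> parser) throws IOException {
    List<T> out = new ArrayList<>();
    try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(path)))) {
      T msg;
      while ((msg = parser.parseDelimitedFrom(in)) != null) { // returns null at EOF
        out.add(msg);
      }
    }
    return out;
  }
}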
Pipeline
I use Apache Beam to express a pipeline.
First I add keys to both datasets. For the Records dataset the key is (key, timestamp rounded down to the day). For Configs the key is (key, day), where day is each midnight timestamp between the start date and the end date.
The datasets are merged using CoGroupByKey.
As the key type I use org.apache.flink.api.java.tuple.Tuple2 with a Tuple2Coder from the repo github.com/orian/tuple-coder.
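For context, a rough sketch of the join step, not the actual pipeline: Record and Config stand in for the protobuf message types, Tuple2<String, Long> is an illustrative key type, and the imports assume the Apache Beam Java SDK (the Dataflow SDK used by the Flink runner at the time has the same classes under com.google.cloud.dataflow.sdk.*).

import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.flink.api.java.tuple.Tuple2;

class JoinSketch {
  // Tags identify each input side in the CoGbkResult.
  static final TupleTag<Record> RECORD_TAG = new TupleTag<>();
  static final TupleTag<Config> CONFIG_TAG = new TupleTag<>();

  // Both inputs are already keyed by (key, day) as described above;
  // the Tuple2 key needs a coder, e.g. the Tuple2Coder mentioned above.
  static PCollection<KV<Tuple2<String, Long>, CoGbkResult>> join(
      PCollection<KV<Tuple2<String, Long>, Record>> keyedRecords,
      PCollection<KV<Tuple2<String, Long>, Config>> keyedConfigs) {
    return KeyedPCollectionTuple.of(RECORD_TAG, keyedRecords)
        .and(CONFIG_TAG, keyedConfigs)
        .apply(CoGroupByKey.<Tuple2<String, Long>>create());
  }
  // Downstream, each CoGbkResult is read per (key, day):
  //   result.getAll(CONFIG_TAG)  -> expected exactly one Config
  //   result.getAll(RECORD_TAG)  -> all Records for that day
}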
The problem
If the Records dataset is tiny, e.g. 5 days, everything seems fine (check normal_run.log).
INFO [main] (FlinkPipelineRunner.java:124) - Final aggregator values:
INFO [main] (FlinkPipelineRunner.java:127) - item count : 4322332
INFO [main] (FlinkPipelineRunner.java:127) - missing val1 : 0
INFO [main] (FlinkPipelineRunner.java:127) - multiple val1 : 0
When I run the pipeline against 10+ days I encounter an error indicating that for some Records there's no Config (wrong_run.log).
INFO [main] (FlinkPipelineRunner.java:124) - Final aggregator values:
INFO [main] (FlinkPipelineRunner.java:127) - item count : 8577197
INFO [main] (FlinkPipelineRunner.java:127) - missing val1 : 6
INFO [main] (FlinkPipelineRunner.java:127) - multiple val1 : 0
Then I added some extra logging messages:
(a.java:144) - 68643 items for KeyValue3 on: 1462665600000000
(a.java:140) - no items for KeyValue3 on: 1463184000000000
(a.java:123) - missing for KeyValue3 on: 1462924800000000
(a.java:142) - 753707 items for KeyValue3 on: 1462924800000000 marked as no-loc
(a.java:123) - missing for KeyValue3 on: 1462752000000000
(a.java:142) - 749901 items for KeyValue3 on: 1462752000000000 marked as no-loc
(a.java:144) - 754578 items for KeyValue3 on: 1462406400000000
(a.java:144) - 751574 items for KeyValue3 on: 1463011200000000
(a.java:123) - missing for KeyValue3 on: 1462665600000000
(a.java:142) - 754758 items for KeyValue3 on: 1462665600000000 marked as no-loc
(a.java:123) - missing for KeyValue3 on: 1463184000000000
(a.java:142) - 694372 items for KeyValue3 on: 1463184000000000 marked as no-loc
You can see that in the first line 68643 items were processed for KeyValue3 at time 1462665600000000.
Later, in line 9, the operation seems to process the same key again, but it reports that no Config was available for these Records.
Line 10 says they've been marked as no-loc.
Line 2 says there were no items for KeyValue3 at time 1463184000000000, but in line 11 you can read that the items for this (key, day) pair were processed later and lacked a Config.
Some clues
During one of the exploration runs I got an exception (exception_thrown.log).
05/26/2016 03:49:49 GroupReduce (GroupReduce at GroupByKey)(1/5) switched to FAILED
java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce at GroupByKey)' , caused an error: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:455)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
at org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:94)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
... 3 more
Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:799)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at org.apache.flink.runtime.operators.sort.LargeRecordHandler.finishWriteAndSortKeys(LargeRecordHandler.java:263)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1409)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:799)
Caused by: java.lang.IllegalAccessError: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.readBytes(NoFetchingInput.java:122)
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:297)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:228)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:242)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:144)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:144)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at org.apache.flink.runtime.io.disk.InputViewIterator.next(InputViewIterator.java:43)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:973)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
Work-around (after more testing it turns out it doesn't work; staying with Tuple2)
I've switched from using Tuple2 to a Protocol Buffer message:
message KeyDay {
  optional bytes key = 1;
  optional int64 timestamp_usec = 2;
}
But using Tuple2.of() was just easier than: KeyDay.newBuilder().setKey(...).setTimestampUsec(...).build().
When I switched the key to a class derived from protobuf.Message, the problem disappeared for 10-15 days of data (a size that was already a problem for Tuple2), but increasing the data size to 20 days revealed it's still there.
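For reference, a minimal sketch of building such a key with the day rounding described earlier; the field names come from the KeyDay message above, while ByteString as the key's type and UTC midnight rounding are assumptions.

import com.google.protobuf.ByteString;

final class KeyDays {
  private static final long USEC_PER_DAY = 24L * 60 * 60 * 1_000_000L;

  // Rounds a microsecond timestamp down to midnight (UTC) and builds the protobuf key.
  static KeyDay keyFor(ByteString key, long timestampUsec) {
    long dayUsec = (timestampUsec / USEC_PER_DAY) * USEC_PER_DAY;
    return KeyDay.newBuilder()
        .setKey(key)
        .setTimestampUsec(dayUsec)
        .build();
  }
}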

Related

Alfresco - Configure 2 groupSearchBases for Active Directory

How to configure 2 groupSearchBases for Alfresco?
Right now I have this property in my global.properties:
ldap.synchronization.groupSearchBase=CN\=Alfresco users,OU\=Users,OU\=AWE,DC\=main,DC\=awe
But I need to configure a second search base with the path
CN=Alfresco users,OU=Labs,OU=AWE,DC=main,DC=awe.
What I have tried is to configure the property with an OR statement like this:
ldap.synchronization.groupSearchBase=(|(CN\=Alfresco users,OU\=Users,OU\=AWE,DC\=main,DC\=awe)(CN\=Alfresco users,OU\=Labs,OU\=AWE,DC\=main,DC\=awe))
This setting gave me an error:
00:30:07,147 ERROR [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] Synchronization aborted due to error
org.alfresco.error.AlfrescoRuntimeException: 02290000 Error during LDAP Search. Reason: null
...
Caused by: javax.naming.PartialResultException [Root exception is javax.naming.NamingException: LDAP response read timed out, timeout used:5000ms. [Root exception is com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name 'DC\=main,DC\=awe']; remaining name '']
...
Caused by: javax.naming.NamingException: LDAP response read timed out, timeout used:5000ms. [Root exception is com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name 'DC\=main,DC\=awe']; remaining name ''
...
Caused by: com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name 'DC\=main,DC\=awe'
I also shortened the searchBase path so that it includes both of the directories, like this:
ldap.synchronization.groupSearchBase=CN\=Alfresco users,OU\=AWE,DC\=main,DC\=awe
But this also gave me an error:
org.alfresco.error.AlfrescoRuntimeException: 02310000 Error during LDAP Search. Reason: [LDAP: error code 32 - 0000208D: NameErr: DSID-03100238, problem 2001 (NO_OBJECT), data 0, best match of: 'OU=AWE,DC=main,DC=awe'
...
Caused by: javax.naming.NameNotFoundException: [LDAP: error code 32 - 0000208D: NameErr: DSID-03100238, problem 2001 (NO_OBJECT), data 0, best match of:'OU=AWE,DC=main,DC=awe'
What am I doing wrong, and how can I make Alfresco search both groupSearchBases (the easiest way, if possible)? Thanks in advance.
As mentioned in the comments, the search base is an LDAP DN (Distinguished Name) path, not a query. This means you should set the search base for your user and group queries to a path to which both organizational units are subordinate: OU=AWE,DC=main,DC=awe.
Then you need to build the user and group queries so that only the expected groups and users are returned. E.g. the person query can look like this:
(&
(objectCategory\=Person)
(|
(memberOf\:1.2.840.113556.1.4.1941\:\=CN\=Alfresco users,OU\=Users,OU\=AWE,DC\=main,DC\=awe)
(memberOf\:1.2.840.113556.1.4.1941\:\=CN\=Alfresco users,OU\=Labs,OU\=AWE,DC\=main,DC\=awe)
)
(userAccountControl\:1.2.840.113556.1.4.803\:\=512)
)
For the group search you should do the same.
Hint: 1.2.840.113556.1.4.1941 is an Active Directory-specific matching rule to retrieve nested groups (recursive retrieval of all members of that DN). For more info check Active Directory: LDAP Syntax Filters | MS TechNet.
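Putting it together, a sketch of how this might look in alfresco-global.properties; the property names assume the standard ldap-ad synchronization subsystem, and the DNs and filter are taken from the question and the answer above, so adapt them to your directory:

ldap.synchronization.userSearchBase=OU\=AWE,DC\=main,DC\=awe
ldap.synchronization.groupSearchBase=OU\=AWE,DC\=main,DC\=awe
ldap.synchronization.personQuery=(&(objectCategory\=Person)(|(memberOf\:1.2.840.113556.1.4.1941\:\=CN\=Alfresco users,OU\=Users,OU\=AWE,DC\=main,DC\=awe)(memberOf\:1.2.840.113556.1.4.1941\:\=CN\=Alfresco users,OU\=Labs,OU\=AWE,DC\=main,DC\=awe))(userAccountControl\:1.2.840.113556.1.4.803\:\=512))
# ldap.synchronization.groupQuery should be restricted analogously for the group search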

Neo4j: How do you rebuild the label scan store?

I shut down my Neo4j instance every night to do a backup. This morning I found that it failed to start up again:
2015-12-05 03:38:49.326+0000 INFO Successfully shutdown Neo4j Server
2015-12-05 03:38:49.330+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#7728902c' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#7728902c' was successfully initialized, but failed to start. Please see attached cause exception.
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#7728902c' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:67)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:234)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:97)
at org.neo4j.server.CommunityBootstrapper.start(CommunityBootstrapper.java:48)
at org.neo4j.server.CommunityBootstrapper.main(CommunityBootstrapper.java:35)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase#7728902c' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:462)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:194)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, /lustre/scratch116/vr/vrpipe/neo4j/production/db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:143)
at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
at org.neo4j.server.CommunityNeoServer$1.newGraphDatabase(CommunityNeoServer.java:66)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:95)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.api.impl.index.LuceneLabelScanStore#28c94a12' failed to initialize. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:434)
at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:66)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:102)
at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:600)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:112)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:139)
... 10 more
Caused by: java.io.IOException: Label scan store could not be read, and needs to be rebuilt. To trigger a rebuild, ensure the database is stopped, delete the files in '/lustre/scratch116/vr/vrpipe/neo4j/production/db/schema/label/lucene', and then start the database again.
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.init(LuceneLabelScanStore.java:259)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:424)
... 19 more
I followed its advice to delete db/schema/label/lucene/*, and the database started up fine, but I can't query any existing nodes or relationships. The web front end says I have no node labels or relationship types. I tried doing match (n)-[r]-() return n,r, but that returns nothing.
How do I get my database back? Perhaps I need to force rebuilding of the lucene indexes somehow?
You took a backup before you deleted it?
You only deleted that directory?
What does the new startup log look like?
How much data do you have in your db?
What does this return? match (n) return count(*)

Error during node startup: Unable to start DSE server / Plugin activation failed / Cannot find core

I've been having these issues for quite a while already, but I ignored them initially because I could still start my nodes. However, one of these issues recently became serious enough that it now takes me a lot of tries to successfully start a node.
Issue #1: Unable to start DSE server / Plugin activation failed / Cannot find core
ERROR [main] 2015-01-28 03:30:40,058 DseDaemon.java (line 492) Unable to start DSE server.
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:135)
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:480)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:509)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659)
Caused by: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:284)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:128)
... 3 more
Caused by: java.lang.IllegalStateException: Cannot find core: myks.mycf
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.doWaitForCore(SolrCoreResourceManager.java:742)
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.waitForCore(SolrCoreResourceManager.java:478)
at com.datastax.bdp.plugin.SolrContainerPlugin.waitForSecondaryIndexesLoading(SolrContainerPlugin.java:237)
at com.datastax.bdp.plugin.SolrContainerPlugin.onActivate(SolrContainerPlugin.java:98)
at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:334)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:263)
... 4 more
INFO [Thread-3] 2015-01-28 03:30:40,059 DseDaemon.java (line 505) DSE shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:40,164 Gossiper.java (line 1307) Announcing shutdown
INFO [Thread-3] 2015-01-28 03:30:40,620 PluginManager.java (line 356) All plugins are stopped.
INFO [Thread-3] 2015-01-28 03:30:40,620 CassandraDaemon.java (line 463) Cassandra shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:42,165 MessagingService.java (line 701) Waiting for messaging service to quiesce
INFO [ACCEPT-/144.76.201.233] 2015-01-28 03:30:42,814 MessagingService.java (line 941) MessagingService has terminated the accept() thread
This exception started as a "mild" issue - mild because although it prevents a node from starting up when it happens, it usually took only one more try to successfully start the affected node. However, about two weeks ago, after not having restarted any of my nodes for quite a while, I discovered that I now need a lot more attempts (20+) in order to start a node.
From the stack trace, it looks like a timeout issue (in doWaitForCore()), but I cannot find a setting to increase the amount of time that DSE waits for a core to load during startup before giving up. The core mentioned in the stack trace is always the same, and I assume that this is because it is my biggest core (~1.4 billion records) and takes the longest to load. But when I do manage to start the node successfully, there are no signs of errors - I can query the core like any other core.
--
There are two other issues that may or may not be related to the one above. Both of them always appear during startup, and unlike the first one, they do not cause a startup failure (i.e. they also appear when a node starts successfully).
Issue #2: Invalid Number: static
ERROR [searcherExecutor-67-thread-1] 2015-01-28 04:26:49,691 SolrException.java (line 124) org.apache.solr.common.SolrException: Invalid Number: static
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
at org.apache.solr.schema.FieldType.getFieldQuery(FieldType.java:697)
at org.apache.solr.schema.TrieField.getFieldQuery(TrieField.java:343)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:545)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:153)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:135)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:183)
I looked at the data that I imported and I couldn't find a supposedly numeric value that was incorrectly supplied as "static". In the Java application that I wrote to convert CSVs to SSTables, I cast all numeric values to int/long/double depending on the field type, so I honestly don't think it has anything to do with my data.
Issue #3: Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
WARN [SolrSecondaryIndex myks.mycf2 index initializer.] 2015-01-28 04:26:51,770 JmxMonitoredMap.java (line 256) Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:185)
at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:236)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.registerExtraMBeans(CassandraCoreContainer.java:679)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.register(CassandraCoreContainer.java:427)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doLoad(CassandraCoreContainer.java:757)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:162)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex$2.run(AbstractSolrSecondaryIndex.java:882)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:174)
... 16 more
I have absolutely no idea what this is.
--
Has anyone encountered these errors/exceptions/warnings before? What did you do?
Issue #1: The max waiting time to load a core was hard-coded at 1 min. So your assumption is right: a very large core, or hundreds of cores, could prevent the node from starting due to the excessive time needed to load that particular core. In the next patch releases (4.5.6, 4.6.1) we address this issue by adding a new option, load_max_time_per_core, to dse.yaml. This option allows you to increase the max waiting time for core loading, starting at 1 min. For 500 cores you would need to increase load_max_time_per_core to about 3 minutes, for example.
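A sketch of the corresponding dse.yaml entry; the value is illustrative and the unit (minutes) is inferred from the wording above, so check the documentation for your DSE version:

# Maximum time to wait for each Solr core to load at startup
# (per the answer: starts at 1 minute; e.g. ~3 for ~500 cores).
load_max_time_per_core: 3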
Issue #2: Unfortunately, I don't know what could be causing this. We would need further info about this to see why it's happening.
Issue #3: We are currently investigating what this can be.
Regarding issue #2, are you sure you don't have a QuerySenderListener with a wrong warmup query in your solrconfig?
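For reference, such a warmup listener in solrconfig.xml looks like the snippet below, modelled on the stock Solr example config rather than the asker's actual file; note that the stock firstSearcher warming query begins with the literal word "static", which would yield exactly an "Invalid Number: static" error if it gets parsed against a numeric default search field.

<!-- modelled on the stock Solr example solrconfig.xml; check your core for something similar -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
</listener>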

Solr indexing not working when I try to insert 1000000 rows but works fine when I try to index 400000 rows or below

I am using Solr 4.7.1 and trying to do a full import. My data source is a table in MySQL. It has 10000000 rows and 20 columns.
Whenever I try to do a full import, Solr stops responding. But when I try to import 400000 rows or fewer, it works fine.
If I try to import more than that, Solr won't index the results; it either stops responding or shows "indexing failed". The error log says "Unable to execute query". But I don't understand how the query runs fine for a smaller number of records and fails when I run it for more records.
My system config is as follows:
CPU-i7
Ram -6Gb
OS-64 bit windows 7
I am not able to figure out what the problem is. I have tried increasing max_allowed_packet to 1000M and even the Java heap size.
Please help; thanks in advance.
This is the error:
Exception while processing: playername document : SolrInputDocument(fields: []):
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query:
SELECT player_id,firstname,lastname,value1,value2,value3,value4,value5,value6, value7,value8,value9,value10, value11,value18,value19,value20, country_id, playername_modtime,player_flag
from playername WHERE 'true' != 'false' OR playername.playername_modtime > '2014-05-23 10:38:56'
Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:281)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:238)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:42)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 130,037 milliseconds ago. The last packet sent successfully to the server was 130,038 milliseconds ago.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1127)
    at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2288)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:2044)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3549)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:489)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3240)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2411)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2834)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2832)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2781)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:908)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:788)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:274)
    ... 12 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 6 bytes, read 4 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
    at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2269)
    ... 23 more
5/23/2014 8:32:18 PM ERROR DataImporter Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT player_id,firstname,lastname,value1,value2,value3,value4,value5,value6, value7,value8,value9,value10, value11,value18,value19,value20, country_id, playername_modtime,player_flag from playername WHERE 'true' != 'false' OR playername.playername_modtime > '2014-05-23 10:38:56' Processing Document # 1
Last Check: 5/23/2014 8:36:34 PM
Added batchSize="-1" to data-config.xml and it worked
http://wiki.apache.org/solr/DataImportHandlerFaq
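For reference, a sketch of where that attribute goes in data-config.xml; the driver, URL, credentials, and entity details are placeholders, not the asker's actual configuration. With the MySQL JDBC driver, batchSize="-1" makes the DataImportHandler stream rows from the result set instead of buffering them all in memory.

<dataConfig>
  <!-- batchSize="-1" streams rows from MySQL instead of loading the whole result set -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"
              batchSize="-1"/>
  <document>
    <entity name="playername" query="SELECT ... FROM playername">
      <!-- field mappings -->
    </entity>
  </document>
</dataConfig>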

Solr error when doing full-import 250000 rows org.apache.solr.common.SolrException;null:org.eclipse.jetty.io.EofException

I am using Solr 4.6.0 with Jetty on Windows 7 Enterprise with a max heap of 2G. I can do a full-import of 200,000 records properly from the Solr Admin UI, but as soon as I increase to 250,000 records, it starts giving me this error below:
webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=true&verbose=true&entity=files&command=full-import&debug=true&wt=json&rows=250000} {add=[8065121, 8065126, 8065128, 8065146, 8065963, 7838189, 7838186, 8065155, 8065174, 8065179, ... (250001 adds)],commit=} 0 2693420
org.apache.solr.common.SolrException; null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
at su
Caused by: java.net.SocketException: Software caused connection abort: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method)
at j......
org.apache.solr.common.SolrException;null:org.eclipse.jetty.io.EofException at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
org.eclipse.jetty.servlet.ServletHandler; /solr/dihdb/dataimport
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
I have changed maxIdleTime to 3500000 in example/etc/jetty.xml.
I changed session-timeout to 720 in example/etc/webdefault.xml.
I still keep getting the error above.
TIA,
Vijay
I changed the heap to -Xmx5120M and that seems to have fixed the issue with 500K and 1 million records. Lack of memory was in essence the issue behind this misleading error.
Also tried 100000 1800000 for DataImportHandler.
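For anyone hitting the same thing: with the Jetty bundled in the Solr 4.x example directory, the heap is set on the java command line used to start Solr, e.g. (a sketch; paths and extra flags depend on your setup):

java -Xmx5120m -jar start.jar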
