Indexing Wikipedia with Solr doesn't work

I'm trying to index the English Wikipedia, around 40 GB, but it's not working. I've followed the tutorial at http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources and other related Stack Overflow questions like "Indexing wikipedia with solr" and "Indexing wikipedia dump with solr".
I was able to import the Simple English Wikipedia (about 150k documents) and the Portuguese Wikipedia (more than 1 million documents) using the configuration explained in the tutorial. The problem happens when I try to index the English Wikipedia (more than 8 million documents). It gives the following error:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:34)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:254)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:279)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:324)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:569)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:705)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:504)
... 6 more
I'm using a MacBook Pro with 4 GB of RAM and more than 120 GB of free space on the hard drive. I've already tried changing the 256 value in solrconfig.xml, but no success so far.
Could anyone help me, please?
Edit:
Just in case someone has the same problem: I used the command java -Xmx1g -jar start.jar suggested by Cheffe to solve my problem.
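For reference, I assume the "256" in solrconfig.xml referred to above is an indexing buffer such as ramBufferSizeMB inside the indexConfig section (that mapping is an assumption; the element itself is standard Solr config):
<indexConfig>
  <!-- assumption: the "256" mentioned in the question is this indexing RAM buffer -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
</indexConfig>
Raising this buffer alone does not help if the JVM heap itself stays small, which is why the -Xmx fix below is what worked.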

Your Java VM is running out of memory. Give it more memory, as explained in this SO question: Increase heap size in Java
java -Xmx1024m myprogram
Further detail on the -Xmx parameter can be found in the Java docs; just search for -Xmxsize:
Specifies the maximum size (in bytes) of the memory allocation pool. This value must be a multiple of 1024 and greater than 2 MB. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, or g or G to indicate gigabytes. The default value is chosen at runtime based on system configuration. For server deployments, -Xms and -Xmx are often set to the same value. For more information, see Garbage Collector Ergonomics at http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
The following examples show how to set the maximum allowed size of allocated memory to 80 MB using various units:
-Xmx83886080
-Xmx81920k
-Xmx80m
The -Xmx option is equivalent to -XX:MaxHeapSize.
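Applied to the Solr example distribution, and assuming it is launched with the bundled Jetty start.jar as in the edit above, that looks like:
java -Xms512m -Xmx1g -jar start.jar
Setting -Xms as well is optional; it just pre-allocates the heap so it does not have to grow during the import.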

If you have tomcat6, you can increase java heap size in the file
/etc/default/tomcat6
change the -Xmx parameter in the line (e.g. from -Xmx128m to -Xmx256m):
JAVA_OPTS="-Djava.awt.headless=true -Xmx256m -XX:+UseConcMarkSweepGC"
During the import, watch the Admin Dashboard web page, where you can see the actual JVM memory allocated.
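To confirm the new flag was actually picked up by the running Tomcat, a quick process check is enough (generic Linux command, not Tomcat-specific):
ps aux | grep tomcat6
The java command line shown there should contain the -Xmx value you configured.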

Related

Memory failure when running gem5 SE RISCV code

When I try to run a simulation in SE mode in gem5 I get the following output:
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
build/RISCV/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
0: system.remote_gdb: listening for remote gdb on port 7000
build/RISCV/sim/simulate.cc:194: info: Entering event queue # 0. Starting simulation...
build/RISCV/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/RISCV/sim/mem_state.cc:99: panic: Someone allocated physical memory at VA 0x4000000000000000 without creating a VMA!
Memory Usage: 619616 KBytes
Program aborted at tick 2222000
I'm using the ELF-Linux cross compiler. Compiling with the Newlib-ELF cross compiler simulates just fine, but the thing is that I need to use pthreads (OpenMP), and the Newlib compilation doesn't support it. To get a grip on things I tried to simulate in x86, and found out that it won't work either with a simple gnu/gcc compilation. Then I compiled replicating what the test-progs folder does with Docker, and then it worked fine. Is this the way to go? Since the error says there are problems with physical memory, would compiling with Docker help, or am I missing an obvious fix? How would I go about compiling RISCV with Docker (I couldn't find examples of Docker + RISCV)?

Increase heap to avoid Out of Memory Error in WEKA

I am trying to run a J48 classifier in WEKA using the following command line:
$ java -Xmx2048m -cp /home/weka-3-7-9/weka.jar weka.classifiers.trees.J48 -t input.arff -i -k -d J48-data.model &
Although the size of my ARFF file is 43.8 MB and I increased the heap space to 2048m, I still received the following errors:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:132)
at weka.core.Instances.initialize(Instances.java:196)
at weka.core.Instances.<init>(Instances.java:177)
at weka.classifiers.trees.j48.ClassifierSplitModel.split(ClassifierSplitModel.java:252)
at weka.classifiers.trees.j48.ClassifierTree.buildTree(ClassifierTree.java:159)
at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(C45PruneableClassifierTree.java:126)
at weka.classifiers.trees.J48.buildClassifier(J48.java:249)
at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1485)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:649)
at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:297)
at weka.classifiers.trees.J48.main(J48.java:1062)
Does someone know if I am doing something incorrectly, or can point me to a different way to increase the heap?
Thank you in advance.
A quick instruction for Ubuntu users: the heap can be set by changing the line MEMORY="256m" in the file /usr/bin/weka with your preferred editor.
Weka's instructions state that the "-Xmx..." option will not work from the simple command line interface. I believe you should increase the heap size by editing the RunWeka.ini file. The link I provided should point you in the right direction.
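If you go the RunWeka.ini route, the relevant entry is the maxheap setting; the exact default differs between Weka versions, and the value below is only an example to size against your RAM:
maxheap=2048m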
In terminal use this command
sudo gedit /usr/bin/weka
Change the size in the line
MEMORY="256m"
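For example, to allow WEKA up to 2 GB (assuming the machine has that much free RAM), the edited line would read:
MEMORY="2048m"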

Sybase initializes but does not run

I am using Red Hat 5.5 and I am trying to run Sybase ASE 12.5.4.
Yesterday I tried to use the command "service sybase start", and the console showed Sybase repeatedly trying, and failing, to initialize the main database server.
UPDATE:
I initialized a database at /ims_systemdb/master using the following commands:
dataserver -d /ims_systemdb/master -z 2k -b 51204 -c $SYBASE/ims.cfg -e db_error.log
chmod a=rwx /ims_systemdb/master
ls -al /ims_systemdb/master
And it gives me a nice database at /ims_systemdb/master with a size of 104865792 bytes (2048 × 51204).
But when I run
service sybase start
The error log at /logs/sybase_error.log goes like this:
00:00000:00000:2013/04/26 16:11:45.18 kernel Using config area from primary master device.
00:00000:00000:2013/04/26 16:11:45.19 kernel Detected 1 physical CPU
00:00000:00000:2013/04/26 16:11:45.19 kernel os_create_region: can't allocate 11534336000 bytes
00:00000:00000:2013/04/26 16:11:45.19 kernel kbcreate: couldn't create kernel region.
00:00000:00000:2013/04/26 16:11:45.19 kernel kistartup: could not create shared memory
I read "os_create_region" is normal if you don't set shmmax in sysctl high enough, so I set it to 16000000000000, but I still get this error. And sometimes, when I'm playing around with the .cfg file, I get this error message instead:
00:00000:00000:2013/04/25 14:04:08.28 kernel Using config area from primary master device.
00:00000:00000:2013/04/25 14:04:08.29 kernel Detected 1 physical CPU
00:00000:00000:2013/04/25 14:04:08.85 server The size of each partitioned pool must have atleast 512K. With the '16' partitions we cannot configure this value f
Why do these two errors appear and what can I do about them?
UPDATE:
Currently, I'm seeing the first error message (os cannot allocate bytes). The contents of /etc/sysctl.conf are as follows:
kernel.shmmax = 4294967295
kernel.shmall = 1048576
kernel.shmmni = 4096
But the log statements earlier state that
os_create_region: can't allocate 11534336000 bytes
So why is the region it is trying to allocate so big, and where did that get set?
The Solution:
When you get a message like "os_create_region: can't allocate 11534336000 bytes", it means that Sybase's configuration file is asking the kernel to create a shared memory region that exceeds the shmmax limit in /etc/sysctl.conf.
The main thing to do is to change ims.cfg (or whatever configuration file you are using) and adjust the max memory variable in the [Physical Memory] section:
[Physical Memory]
max memory = 64000
additional network memory = 10485760
shared memory starting address = DEFAULT
allocate max shared memory = 1
For your information, my /etc/sysctl.conf file ended with these three lines:
kernel.shmmax = 16000000000
kernel.shmall = 16000000000
kernel.shmmni = 8192
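After editing /etc/sysctl.conf, the new values can be loaded without a reboot using standard Linux commands (not Sybase-specific):
sysctl -p
# or set a single value directly on the running kernel:
sysctl -w kernel.shmmax=16000000000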
Once this is done, type "showserver" to see which processes are running.
For more information, consult the Sybase System Administrator's Guide, volume 2 as well as Michael Gardner's link to Red Hat memory management in the comments earlier.

OpenSearchServer: Why am I getting this error Error (java.lang.NullPointerException)

I am using OpenSearchServer v1.2.4 rc3.
In the first few days it worked fine, but when the index size reached 1.0 GB I got the error
"Error (java.lang.NullPointerException)"
when starting my crawler. The crawler works fine for some time and then stops with the same
"Error (java.lang.NullPointerException)".
What's wrong?
Depending on the size of your index, a memory parameter must be added. By default, OpenSearchServer is set up to run on small servers with the default RAM value provided by the Java Virtual Machine (from 64 MB to 512 MB only).
For large indexes, you must set up a higher value. On a Unix/Linux server, just create an /etc/opensearchserver file with the following content:
CATALINA_OPTS="-Xms2G -Xmx2G -server"
export CATALINA_OPTS
On a Windows server, edit the start.bat file. Add the following line just after :okExec
set CATALINA_OPTS="-Xms2G -Xmx2G -server"
Replace 2G (which means 2 GB) with the amount of memory you want to allocate to OpenSearchServer.
With a 32-bit version, the memory is limited to 2.5 GB. You can use more memory with a 64-bit operating system using the following line (on Unix/Linux):
CATALINA_OPTS="-Xms12G -Xmx12G -d64 -server"
For 64-bit Windows:
set CATALINA_OPTS="-Xms12G -Xmx12G -d64 -server"
After restarting OpenSearchServer, just check in the Runtime tab panel that you have the correct amount of memory available.
Regarding the error details, it is more useful to have the full stack trace. You can find it in the log file (data/logs/oss.log), or in the Runtime/Logs tab panel.
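If you want to watch for the NullPointerException while the crawler runs, you can follow that same log from a shell (standard tail usage; the path is the one given above):
tail -f data/logs/oss.log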

Using cscope with VIM: adding database returns errno 75

I've got a pretty large cscope.out database (over 2 GB) and an inverted index of over 1 GB, and when I issue the command :cscope add "path to database", I get the following error:
E563: stat("path to database") error: 75
Looking at the source code, it seems to return the errno, where 75 means "value too large for defined data type".
How can I get it to load my db?
32-bit Vim imposes a 2 GB limit on cscope databases. Use 64-bit Vim to overcome this limitation.
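To check whether the installed Vim binary is 32-bit or 64-bit before retrying (a generic Linux check; the path may differ on your system):
file $(which vim)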
