Running Apache Nutch on windows - solr

I am trying to run Apache Nutch on Windows for web crawling. I have installed Cygwin and set its path, but I am getting the following exception:
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cjindal\mapred\staging\cjindal-330065706\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
I have not installed hadoop. Please help.

Nutch runs better on Unix. If you need to run it on Windows, you can probably use Nutch 1.2, which ships with a Hadoop version that does not have this issue.

Related

Indexing problem with SOLR (MultiMaxScoreQParserPlugin)

I'm trying to integrate Solr with Hybris. Both run on Kubernetes, each in a different pod.
When I try to index from Hybris, it throws the error below:
ERROR [BackofficeLO-47] (000001JT) [SolrStandaloneSearchProvider] Error from server at http://10.10.100.181:34324/solr: Error CREATEing SolrCore 'master_backoffice_backoffice_product_flip': Unable to create core [master_backoffice_backoffice_product_flip] Caused by: de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.10.100.181:34324/solr: Error CREATEing SolrCore 'master_backoffice_backoffice_product_flip': Unable to create core [master_backoffice_backoffice_product_flip] Caused by: de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin
I guess something is wrong with the Solr "default" indexing directory.
Solr is running as a process inside the pod, as shown below:
solr@solr-fsd33wdf-qteg:/opt/solr-8.5.2$ ps -ef | grep solr
solr 10 1 0 Jun23 ? 00:15:55 /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M -Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes= -Dsolr.log.dir=/var/solr/logs -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.data.home= -Dsolr.install.dir=/opt/solr -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf -Dlog4j.configurationFile=/var/solr/log4j2.xml -Xss256k -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
So the default confdir is /opt/solr/server/solr/configsets/_default/conf
If I check the SOLR_HOME variable, it points to a different directory:
solr@solr-f575dcfdf-qtnpg:/opt/solr-8.5.2$ echo $SOLR_HOME
/var/solr/data
So, how can I change the confdir to /var/solr/data? I guess this is the problem here?
Thanks!
This page provides information on how to use the standalone setup. The ant configureSolrServer command takes as its argument the path to the original Solr binary, which you should download from here. It overwrites the files in that directory with the SAP Commerce specific setup. The MultiMaxScoreQParserPlugin class is part of the solr-hybris-components-<version_of_solr>.jar file, where <version_of_solr> corresponds to the Solr version your SAP Commerce installation runs on. Note that SAP Commerce supports multiple Solr versions, so the right jar depends on your configuration.
You can then extend the default Solr Docker image as provided here to get your setup running.

Cannot start indexing with solr in hybris 5.6

I tried to run indexing via the HMC, but every attempt is aborted with the error below.
ERROR [coreLoadExecutor-4-thread-1] [CoreContainer] Failed to load file /hybris_extensions/hybris/config/solr/embedded/collection1/solrconfig.xml
ERROR [coreLoadExecutor-4-thread-1] [CoreContainer] Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load config file /hybris_extensions/hybris/config/solr/embedded/collection1/solrconfig.xml
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)
I have started Solr from hybris 5.6 in standalone mode on a Linux machine.
hybris can run Solr in standalone or embedded mode. When running in embedded mode, the configuration for Solr is created in the directory:
hybris/config/solr/embedded/
Therefore the user running your hybris server needs write permission on this directory.
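As a quick way to confirm the permission point above, here is a minimal Java sketch that reports whether a directory exists and is writable for the current user. The class and method names are hypothetical, and the default path is just the relative directory quoted above; it only gives a meaningful answer when run as the same OS user that runs the hybris server.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of hybris.
final class ConfigDirCheck {

    /** True only if the path is an existing directory the current user can write to. */
    static boolean isWritableDirectory(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // Pass the real path as the first argument; the default is the relative path from the text.
        Path dir = Paths.get(args.length > 0 ? args[0] : "hybris/config/solr/embedded");
        System.out.println(dir.toAbsolutePath() + " writable: " + isWritableDirectory(dir));
    }
}
```

If this prints false for the embedded configuration directory, fixing ownership or permissions on that directory is the first thing to try.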
Further reading:
https://wiki.hybris.com/display/release5/SolrFacetSearch+-+Installation+Guide

Apache Flink Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Traversable

I have just started learning Apache Flink and found the guide link for starting development in the Eclipse IDE.
I followed this guide to get started, but I am getting the error below:
00:20:26,993 INFO org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Traversable
at java.lang.ClassLoader.defineClass1(Native Method)
I have placed the error log here: log file ... Please let me know if you require more details. Thanks, Nyamath
java.util.zip.ZipException: invalid LOC header (bad signature)
The Scala jar file that Maven downloaded seems to be corrupted. Update your Maven dependencies by running this on the command line from your project folder:
mvn -U clean install
In Eclipse, right-click your project and choose Update -> Maven dependencies.
If that does not work, you will need to delete the corrupted jar file from the .m2/repository/ folder.
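To find out which jar is actually damaged before deleting anything, a small check along these lines may help. This is a sketch with hypothetical names (JarCheck, isReadable): it opens the archive with java.util.zip.ZipFile and reads every entry to the end, so structural damage such as the invalid LOC header in the log above should surface as an exception. It does not verify checksums.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for spotting unreadable jar files.
final class JarCheck {

    /** Returns true if the archive opens and every entry can be read to the end. */
    static boolean isReadable(String path) {
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    while (in.read(buffer) != -1) {
                        // Discard the data; we only care whether reading succeeds.
                    }
                }
            }
            return true;
        } catch (IOException e) {
            // ZipException is a subclass of IOException, so corrupt and missing files both land here.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + ": " + (isReadable(path) ? "OK" : "UNREADABLE"));
        }
    }
}
```

Run it with the paths of the suspect jars in your local Maven repository as arguments, delete the ones reported as unreadable, and let Maven download them again.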

GAE 500 server error

I'm developing an app on GAE. The website works fine when I test it locally; however, every time I deploy it to GAE it reports:
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
I searched Google a lot, but no answer seems to solve my problem. When I looked at the logs of the GAE app, the following was the main problem I found. Initially I thought it was due to JDK 8, but when I set JDK 8 I can't even run the app locally!
Uncaught exception from servlet
java.lang.UnsupportedClassVersionError: org/apache/jsp/index_jsp : Unsupported major.minor version 52.0
at com.google.appengine.runtime.Request.process-aea5c804a9f29902(Request.java)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:795)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.lang.ClassLoader.loadClass(ClassLoader.java:360)
at org.mortbay.util.Loader.loadClass(Loader.java:91)
at org.mortbay.util.Loader.loadClass(Loader.java:71)
at org.mortbay.jetty.servlet.Holder.doStart(Holder.java:73)
at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:242)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:437)
at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:444)
at com.google.tracing.CurrentContext.runInContext(CurrentContext.java:188)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:308)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:300)
at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:441)
at java.lang.Thread.run(Thread.java:724)
This is the telltale sign: Unsupported major.minor version 52.0. It happens when you compile with a newer JDK (52 = Java 8) and then run the classes on an older JRE (GAE uses Java 7).
GAE does not yet support Java 8, so you should compile under Java 7.
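The version number in that message comes from the header of the compiled class file, so you can check what a build actually produced before deploying. The sketch below uses hypothetical names (ClassFileVersion, majorVersion, javaRelease); it reads the 4-byte magic number, the 2-byte minor version and the 2-byte major version, then maps the major version to a Java release by subtracting 44 (51 is Java 7, 52 is Java 8).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for inspecting compiled .class files.
final class ClassFileVersion {

    /** Extracts the major version from the first 8 bytes of a class file. */
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // ByteBuffer is big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file (bad magic number)");
        }
        buf.getShort();                 // minor version, not needed here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    /** Maps a major version to a Java release: 51 -> 7, 52 -> 8. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] header = new byte[8];
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                new DataInputStream(in).readFully(header);
            }
            int major = majorVersion(header);
            System.out.println(path + ": major " + major + " (Java " + javaRelease(major) + ")");
        }
    }
}
```

Pointing it at the compiled JSP class named in the stack trace (index_jsp.class) would show whether the JSPs are still being compiled for Java 8 even after the project settings were changed.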
I had the same problem on Windows when Java 8 was installed.
I tried modifying the project/workspace settings but it didn't help me.
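So I created the following batch file as a workaround for GAE projects:
eclipse_gae.bat:
REM Quote the whole assignment so the quotes do not become part of the value
SET "JAVA_HOME=C:\Program Files\Java\jdk1.7.0_55"
REM Put the JDK 7 bin directory first instead of replacing PATH entirely
SET "PATH=%JAVA_HOME%\bin;%PATH%"
START eclipse.exe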
There is no need to fully remove Java 8 from macOS. Just reconfigure Eclipse as shown here to force it to compile JSPs as Java 7:
http://java.wildstartech.com/Java-Platform-Standard-Edition/mac-os-x-java-development/how-to-configure-eclipse-to-run-with-java-7-when-java-8-is-installed
Make sure that the Java version under Project Properties -> Project Facets is set to 1.7 and not 1.8.
This is also nicely explained in the plugin's documentation, in the section 'Changing the JDK Compliance Level'.
I ran into a similar issue now. My setup is OS X + JDKs 6, 7 and 8, and in my eclipse.ini I had:
-vm
/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/bin/java
Apparently Eclipse uses the compiler from that JVM no matter what you set in the global or project preferences (I even removed JDK 8 from the Installed JREs list, with no success).
Once I changed that entry to point to JDK 7 and redeployed my GAE app, it worked.
You have to compile with Java 1.7. But if you have *.jsp files, you should also completely remove Java 1.8 from the system. If you use a Mac, here is how you can do it.
If you are using Gradle (e.g. Android Studio) on a Mac, you can work around this problem by adding the following lines to the top of your gradlew script:
# Insist on java 7
JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
Variations of this method should work on other platforms.
I'm sure this has been resolved, but here is the solution.
GAE runs on Java 7 (as indicated by Mr. Knego), so Maven must compile with Java 7. Do the following:
I prefer to do all this in the terminal.
Edit bash profile:
touch ~/.bash_profile; open ~/.bash_profile
Set Java Home Directory
#set JAVA_HOME
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home
export JAVA_HOME
# For Apache Maven Commands
export M2_HOME=/Users/your-username/your/path/to/maven/apache-maven-3.3.3
export PATH=$PATH:$M2_HOME/bin
Navigate to your project folder (the one with pom.xml) and run a clean Maven build:
mvn clean install
Navigate to the EAR or WAR directory and deploy with the new runtime:
mvn appengine:update
If you are using endpoints, you should also update your endpoint libraries.

Tomcat runs but complains about missing server.xml

I'm using Ubuntu 9.10 and installed Java and Tomcat using the package manager. When I ran startup.sh, it first complained that catalina.out did not exist and was not writable. I fixed that and it no longer complains about it (why isn't that included in the install?). Now it complains that server.xml is missing when I shut down the server. Here is my output from the command line:
user@desktop:/usr/share/tomcat6$ ./bin/startup.sh
Using CATALINA_BASE: /usr/share/tomcat6
Using CATALINA_HOME: /usr/share/tomcat6
Using CATALINA_TMPDIR: /usr/share/tomcat6/temp
Using JRE_HOME: /usr/lib/jvm/java-6-sun-1.6.0.15
user@desktop:/usr/share/tomcat6$ ./bin/shutdown.sh
Using CATALINA_BASE: /usr/share/tomcat6
Using CATALINA_HOME: /usr/share/tomcat6
Using CATALINA_TMPDIR: /usr/share/tomcat6/temp
Using JRE_HOME: /usr/lib/jvm/java-6-sun-1.6.0.15
Dec 11, 2009 4:42:57 PM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.io.FileNotFoundException: /usr/share/tomcat6/conf/server.xml (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:407)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:337)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:415)
user@desktop:/usr/share/tomcat6$
I'm really new to Tomcat, so this might be a dumb question, but why isn't there a sample server.xml in a fresh install of Tomcat? What can I put in there to shut it up, even if it's just a stub? And +1 to anyone who can explain the structure of this file to me.
Try using the server.xml located at:
/etc/tomcat6/server.xml
server.xml is the main configuration file for the Tomcat server. It defines things such as which ports the server listens on and where applications are deployed.
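To give a feel for the structure of the file, here is a minimal Java sketch that embeds a stripped-down server.xml and reads the main settings out of it. The XML is an illustrative stub using Tomcat's usual default values (shutdown port 8005, HTTP connector on 8080, appBase webapps), not the file the Ubuntu package ships, and the class and method names are hypothetical. The nesting is Server, then Service, then one or more Connector elements next to an Engine that contains Host elements. The stack trace in the question shows Catalina.stopServer opening conf/server.xml, which fits with the shutdown script needing the Server element's port.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration of the server.xml layout.
final class ServerXmlSketch {

    // Illustrative stub only; a real server.xml contains more elements (listeners, realms, valves).
    static final String STUB =
        "<Server port=\"8005\" shutdown=\"SHUTDOWN\">\n"
        + "  <Service name=\"Catalina\">\n"
        + "    <Connector port=\"8080\" protocol=\"HTTP/1.1\" connectionTimeout=\"20000\"/>\n"
        + "    <Engine name=\"Catalina\" defaultHost=\"localhost\">\n"
        + "      <Host name=\"localhost\" appBase=\"webapps\" unpackWARs=\"true\" autoDeploy=\"true\"/>\n"
        + "    </Engine>\n"
        + "  </Service>\n"
        + "</Server>\n";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** The port on the root Server element, where Tomcat listens for the shutdown command. */
    static int shutdownPort(Document doc) {
        return Integer.parseInt(doc.getDocumentElement().getAttribute("port"));
    }

    /** The ports of all Connector elements, i.e. where requests are accepted. */
    static List<Integer> connectorPorts(Document doc) {
        List<Integer> ports = new ArrayList<>();
        NodeList connectors = doc.getElementsByTagName("Connector");
        for (int i = 0; i < connectors.getLength(); i++) {
            ports.add(Integer.parseInt(((Element) connectors.item(i)).getAttribute("port")));
        }
        return ports;
    }

    /** The appBase of the first Host element, i.e. the directory applications are deployed from. */
    static String appBase(Document doc) {
        return ((Element) doc.getElementsByTagName("Host").item(0)).getAttribute("appBase");
    }

    public static void main(String[] args) {
        Document doc = parse(STUB);
        System.out.println("shutdown port:   " + shutdownPort(doc));
        System.out.println("connector ports: " + connectorPorts(doc));
        System.out.println("appBase:         " + appBase(doc));
    }
}
```

Rather than writing a stub by hand, it is usually safer to start from the packaged file mentioned above and change only the values you need.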
More info on Tomcat in Ubuntu:
https://help.ubuntu.com/12.04/serverguide/tomcat.html
