Carrot2 DCS webapp setup with Solr

I have been struggling to set up Carrot2 for use with PHP on a local machine. The plan is to have Carrot2 retrieve clusters from a Solr index populated by Nutch. Solr and Nutch are correctly configured, and I have been able to access the data via Carrot2 Workbench.
I deployed carrot2-dcs-3.10.0 through the Tomcat 6 manager, correctly as far as I can tell, although the documentation on setting this up is vague and incomplete. I changed source-solr-attributes.xml according to https://sites.google.com/site/profileswapnilkulkarni/tech-talk/howtoconfigureandruncarrot2webapplicationwithsolrdocumentsource . Tomcat is set up on port 8080.
The Carrot2 DCS PHP example, example.php, works and displays the test output correctly. However, when I try to cluster using localIPAddress:8080/carrot2-dcs/index.html, I run into a problem: when I set the document source to Solr and the query to *:* and then click Cluster, I get the following error message.
HTTP Status 500 - Could not perform processing: org.apache.http.conn.HttpHostConnectException: Connection to localhost:8983 refused
type Status report
message Could not perform processing: org.apache.http.conn.HttpHostConnectException: Connection to localhost:8983 refused
description The server encountered an internal error that prevented it from fulfilling this request.
I have searched everywhere in the deployed webapp folder for carrot2 and can't find where it is getting localhost:8983 from.
Any assistance would be appreciated, thank you.

It turns out that the source-solr-attributes.xml file had an extra overridden-attributes entry: one was before the default block comment with the example parameters, and the second was added by me with the parameters needed for my config. Deleting one of them, so that only one remained, corrected the problem. Apparently, with two of those present, Carrot2 ignores the configured server settings and uses the default values (hence localhost:8983) instead.

Related

How to make SolrCloud use HTTPS for creating collections?

I'm setting up SolrCloud 7.2.1 on a Windows Server 2016, following the getting started guide: https://lucene.apache.org/solr/guide/7_2/getting-started-with-solrcloud.html#GettingStartedwithSolrCloud-InteractiveStartup
I have to specify the host explicitly because, for some reason, the SOLR_HOST variable does not have any effect; localhost gets tried all the time.
With the host specified, everything works fine until a new collection has to be created.
For some reason I get the following error:
ERROR: Failed to create collection 'collectionname' due to: {solrurl:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://solrurl:8984/solr, solrurl:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://solrurl:8983/solr}
I'm a bit puzzled here because I don't understand why it uses HTTP instead of HTTPS.
I can call the URLs with HTTPS without any problems.
When creating a new collection inside the Solr web interface, the same error occurs.
SOLR_SSL_ENABLED variable is set to true.
So how can I change this behaviour? Is there anything I have to do beforehand to make Solr accept changes to the solr.in.cmd and solr.in.sh?
If you want to use inter-node communication over SSL, you have to tell Solr to do so before bringing the nodes up.
You can use the bundled zkcli tool to set the cluster property first:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https

SolrJ : Server refused connection

For a customer, I need to write a search engine running on Linux. I am using SolrJ and did not configure anything else so far.
I followed https://lucene.apache.org/solr/guide/7_4/using-solrj.html#common-build-systems, added SolrJ to the project's pom.xml, and also worked through that tutorial.
The Solr client is instantiated like this:
solrClient = new HttpSolrClient.Builder(
        GeneralSettings.getRootSolrPath() + "/" + getCollectionName())
    .build();
But for any query or commit I keep getting org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr/test. I read http://lucene.472066.n3.nabble.com/Default-query-error-quot-Server-refused-connection-quot-td4010806.html but I am already using the expected port.
My understanding of the documentation ("SolrClient's handle the work of connecting to and communicating with Solr, and are where most of the user configuration happens") is that I only need to import the jar and then everything will work out of the box.
But as I keep getting this "Server refused connection" error, I may have to configure something, and I could not find out how to configure SolrJ (use solrconfig.xml or core.properties, call System.setProperty, or call an API).
Please note that Apache may be running somewhere because I used to test some sites on it.
So how do I get rid of this "Server refused connection" error?
Any help or tutorial on setting SolrJ up, based on the available Solr documentation, would be very much appreciated.
Edit 2018-08-12 16:10
I thought SolrJ could work like Lucene, without a server, but it looks like I missed one essential piece: installing Solr (see https://www.baeldung.com/apache-solrj). I'll give it a try and post updates.
In case it helps someone else starting with SolrJ, here are the steps I took to get rid of the error mentioned in the title (I followed https://www.baeldung.com/apache-solrj).
Downloaded the latest binary release of Solr
Extracted it somewhere
CDed into that dir
Launched bin/solr start from that dir
Created a core with bin/solr create -c coreName (maybe another way exists but I hadn't been able to make it work so far)
Then Solr was running and listening on port 8983, and my Java app could connect to it via SolrJ.
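A quick way to tell "Solr is not running" apart from "SolrJ is misconfigured" is to check whether anything accepts TCP connections on the Solr port at all. The following is only a minimal sketch using the JDK's own socket classes; the class name SolrPortCheck and the 2-second timeout are my own choices for illustration, and the host and port are the ones from the error message above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class SolrPortCheck {

    /** Returns true if something accepts TCP connections on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: nothing reachable there.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "localhost"; // assumption: Solr runs on the same machine
        int port = 8983;           // port taken from the error message
        if (isListening(host, port, 2000)) {
            System.out.println("Something is listening on " + host + ":" + port);
        } else {
            System.out.println("Nothing is listening on " + host + ":" + port
                    + "; start Solr (bin/solr start) before using SolrJ");
        }
    }
}
```

If this reports that nothing is listening, no amount of SolrJ configuration will help; the server has to be started first. Note that a successful connection only shows that some process holds the port, not that it is Solr.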

Can I use Stackdriver Trace with a PHP application in GKE?

I want to check, every day, the RPC latency of each endpoint of a CakePHP application running in a GKE cluster. From the documentation, I found that this is possible using the Google PHP client or a Zipkin server, but I don't know how easy either would be to introduce into our app; both seem tough to me.
In addition, the GKE cluster configuration has a Stackdriver Trace option, which is disabled on our cluster. Can we trace spans if we enable it?
Could you give me some advice?
I managed to send traces to GCP's Trace API from the PHP client via REST. The traces set through the PHP client parameters were visible, but my endpoint for the Trace API then stopped working, and I don't know why. It may not be well supported yet, as the documentation has many ambiguous passages. So I ended up monitoring server responses with fluentd, BigQuery, and Data Studio instead. This seems to be the best solution, because the time span can be set automatically through table names with a yyyymmdd suffix, and arbitrary metrics can be viewed with custom queries or calculated fields.

Jackrabbit Oak: Getting started and connect to a standalone repository via RMI

I am totally new to Jackrabbit and Jackrabbit Oak. I worked a lot with Alfresco though, another JCR compliant open-source content repo.
I want to start a standalone Jackrabbit Oak repo, then connect to it via Java code. Unfortunately the Oak documentation is quite scarce.
I checked out the Oak repo, built it with mvn clean install and then ran the standalone server (memory repository is fine for me at the moment for testing) via:
$ java -jar oak-run-1.6-SNAPSHOT.jar server
Apache Jackrabbit Oak 1.6-SNAPSHOT
Starting Oak-Memory repository -> http://localhost:8080/
13:14:38.317 [main] WARN o.a.j.s.r.d.ProtectedRemoveManager - protectedhandlers-config is missing -> DIFF processing can fail for the Remove operation if the content toremove is protected!
When I open http://localhost:8080/ I see a blank page, although the page source does contain HTML/XHTML output.
I try to connect via Java code:
JcrUtils.getRepository("http://localhost:8080");
// or
JcrUtils.getRepository("http://localhost:8080/rmi");
but I get:
Connecting to http://localhost:8080
Exception in thread "main" javax.jcr.RepositoryException: Unable to access a repository with the following settings:
org.apache.jackrabbit.repository.uri: http://localhost:8080
The following RepositoryFactory classes were consulted:
org.apache.jackrabbit.oak.jcr.OakRepositoryFactory: declined
org.apache.jackrabbit.commons.JndiRepositoryFactory: declined
Perhaps the repository you are trying to access is not available at the moment.
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:223)
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:263)
at Main.main(Main.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
(The Oak documentation is not as complete as the Jackrabbit documentation, but I am also not sure how much of Jackrabbit 2 is still valid for Oak, since it's a complete rewrite.)
I found the same question on the mailing list/Nabble, but the answer provided there does not use a remote, standalone repository. It uses a local one running in the same servlet container and even the same app (only the MongoDB node store is eventually configured as remote, which would mean that the Mongo ports need to be open). So the app creates the repository itself, which is not my case (I got that case working fine in Oak as well).
In Jackrabbit2 (not Oak), I can simply connect via
Repository repo = new URLRemoteRepository("http://localhost:8080/rmi");
and it's working fine, but this method is not available for Oak, it seems.
Is RMI not enabled by default in Oak? Is there a different URI to use?
However, the documentation of Oak says "Oak comes with a runnable jar" and the runnable jar offers the server method to start the server, so I assume that my scenario above is a valid one.
The blank page is a result of your browser being unable to parse the <title/> tag.
Go into developer mode to see how the browser incorrectly interpreted that tag.
I have never seen an example of Jackrabbit Oak working like this. Are you sure it is possible to start Oak outside of your application?
How do you set up the persistent store, and which one are you going to use?
Here is how you normally set up Jackrabbit Oak: https://jackrabbit.apache.org/oak/docs/construct.html
For example, if you use MongoDB as the backend (which is the most powerful option), you first connect to the database via
DB db = new MongoClient(ip, port).getDB("testDB");
where ip and port are the IP address and port of your MongoDB server. This server doesn't need to be on the same machine as the one your Java code is running on. Instead of a single MongoDB instance, you can even use a replica set.
The same holds when using a relational database. Only if you choose the TAR file backend are you limited to your local machine.
Then, in a second step, you create a JCR repository based on the chosen backend (see the link).

This resource can not be previewed at the moment. - CKAN

I’m running CKAN 2.2 on Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64).
I have uploaded a dataset to the CKAN instance. It was uploaded successfully and can be downloaded as well. But when I try to preview the dataset I end up with the error below.
This resource can not be previewed at the moment.
When I click on “Click here for more information”, it says
Could not load preview: DataProxy returned an error (Request Error:
Backend did not respond after 10 seconds)
How can I fix this error?
The problem is that the data proxy (which is used to transform csv to
something that the data preview can understand) is a server on the
internet. Consequently the files you want to preview have to be publicly
accessible from the internet as well. localhost is your own computer
which means that the dataproxy cannot access it. To solve this, either put the file in the datastore using the datastorer or put the file on a server and provide the correct url.
This happens because the data proxy which is used to transform the
data into something we can preview with recline needs the files to be
accessible from the internet. The best solution is to store the data
in the datastore and then the preview will work.
Extracted from here & here
Sometimes you get the same message as the title question:
This resource can not be previewed at the moment.
But when you click on “Click here for more information”, it says:
Could not load preview: DataProxy returned an error (Data transformation failed. error: An error occured while connecting to the server: DNS lookup failed for URL: http:///dataset/c3ce226b-73bd-4b06-9d1b-ffea13d5f770/resource/580fb05f-6d86-4748-aac7-560b904a208f/download/foo.csv)
In this case, the datapusher plugin is probably not working. First follow the instructions for DataPusher in the CKAN manual. If you have already done this, or you installed CKAN from a package, check the CKAN configuration in the production.ini (or development.ini) file. A short checklist to solve the problem:
add datapusher in "ckan.plugins"
set "ckan.site_url"
set "ckan.datapusher.url"
check Apache/nginx server logs (/var/log/apache2/datapusher.*.log, /var/log/apache2/ckan_default*.log)
In my case, the issue was in my development.ini file (production.ini in your case, perhaps), where the lines for DataPusher's configuration were commented out with a # at the start of the line. The CKAN storage config line was commented out as well.
I uncommented those lines and the problem was solved.
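The checklist above can be partly automated. Below is a rough sketch that scans the text of an .ini file and reports which of the three settings from the checklist have no active (uncommented, non-empty) line. The class name CkanConfigCheck, the sample values, and the URLs are hypothetical and only serve as illustration; it is a plain line scan, not a full .ini parser, so continuation lines and sections are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CkanConfigCheck {

    // The three settings named in the checklist above.
    static final List<String> REQUIRED =
            Arrays.asList("ckan.plugins", "ckan.site_url", "ckan.datapusher.url");

    /** Returns the required keys that have no active "key = value" line in iniText. */
    static List<String> missingOrCommented(String iniText) {
        List<String> missing = new ArrayList<>(REQUIRED);
        for (String rawLine : iniText.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines and lines commented out with # or ;
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (!value.isEmpty()) {
                missing.remove(key); // key is set and active
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the DataPusher URL is commented out.
        String sample = String.join("\n",
                "ckan.site_url = http://example.org",
                "ckan.plugins = stats datastore datapusher",
                "# ckan.datapusher.url = http://127.0.0.1:8800/");
        System.out.println(missingOrCommented(sample)); // prints [ckan.datapusher.url]
    }
}
```

To run it against a real file, the file's contents can be read into a string first (for example with java.nio.file.Files) and passed to missingOrCommented. It does not check that datapusher is actually listed in ckan.plugins, only that the setting is present.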

Resources