ZooKeeper + Solr in Testcontainers

I'm using org.testcontainers to perform integration testing with Solr.
[Using SolrJ in my unit tests]
When I start Solr in cloud mode with an embedded ZooKeeper instance, I'm able to connect to the Solr instance from my unit test, but unable to connect to ZooKeeper from my SolrClient.
I think this is because the embedded ZooKeeper is bound to 127.0.0.1 and therefore inaccessible from the host.
If I start two separate containers [using a shared network], ZooKeeper and Solr, I can connect Solr to ZooKeeper, and I can connect to ZooKeeper from my unit tests, BUT when ZooKeeper returns the active Solr node, it returns the internal container IP, which is not accessible from my unit test [in my SolrJ client].
I'm not sure where to go with this.
Maybe there is a network mode that will do address translation?
Thoughts?

UPDATE:
There is an official Testcontainers Module: https://www.testcontainers.org/modules/solr/
It seems that this problem can't be solved that easily.
One way would be to use fixed ports with Testcontainers. In this case the ports 9983 and 8983 are mapped to the same ports on the host, which makes it possible to use the Solr Cloud client. But this only works if you can ensure that the tests run sequentially, which can be a bit tricky, e.g. on Jenkins with feature branches.
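For illustration, pinning the ports could look roughly like this (a sketch only; it assumes the Testcontainers Solr module with its embedded-ZooKeeper mode, and the image tag is just an example):
// Sketch: pin the container ports to fixed host ports.
// Assumes SolrContainer from the Testcontainers Solr module; "solr:8.3.0" is an example tag.
SolrContainer solr = new SolrContainer(DockerImageName.parse("solr:8.3.0"));
// 8983 = Solr, 9983 = embedded ZooKeeper; map both to identical host ports.
solr.setPortBindings(Arrays.asList("8983:8983", "9983:9983"));
solr.start();
// A CloudSolrClient can now reach the embedded ZooKeeper at localhost:9983.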
A different solution would be to use another client. Since SolrJ provides multiple clients, you can choose which one to use. If you only want to search or update, you can use the LBHttp2SolrClient, which load-balances between multiple nodes. If you want to use a specific client for the integration tests, this example could work:
// Create the Solr container.
SolrContainer container = new SolrContainer();
// Start the container. This step might take some time...
container.start();
SolrClient client = new Http2SolrClient.Builder("http://localhost:" + container.getSolrPort() + "/solr").build();
// Do whatever you want with the client ...
SolrPingResponse response = client.ping("dummy");
// Stop the container.
container.stop();
Here is a list of Solr clients in Java: https://lucene.apache.org/solr/guide/8_3/using-solrj.html#types-of-solrclients
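If you go the load-balancing route instead, a minimal sketch could look like this (assuming LBHttpSolrClient from SolrJ; the URLs stand in for whatever host/port Testcontainers maps for each node):
// Sketch: query-side load balancing over plain HTTP, bypassing ZooKeeper entirely.
// The base URLs are placeholders for the mapped container addresses.
LBHttpSolrClient lbClient = new LBHttpSolrClient.Builder()
        .withBaseSolrUrls("http://localhost:8983/solr", "http://localhost:8984/solr")
        .build();
QueryResponse rsp = lbClient.query("dummy", new SolrQuery("*:*"));
lbClient.close();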

I ran into the exact same issue. I solved it using a proxy. In my docker-compose.yml I added:
squid:
  image: sameersbn/squid:3.5.27-2
  ports:
    - "3128:3128"
  volumes:
    - ./squid.conf:/etc/squid/squid.conf
    - ./cache:/var/spool/squid
  restart: always
  networks:
    - solr
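The compose file mounts a ./squid.conf, which the answer doesn't show; a minimal configuration along these lines should be enough for this use case (my assumption, not the author's actual file):
# Assumed minimal squid.conf; the answer's actual config is not shown.
# Listen on the default proxy port and allow proxying from the test host.
http_port 3128
http_access allow all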
And in the configuration of the SolrClient I added:
[...]
HttpClient httpClient = HttpClientBuilder.create()
        .setProxy(new HttpHost("localhost", 3128))
        .build();
CloudSolrClient c = new CloudSolrClient.Builder(getZookeeperList(), Optional.empty())
        .withHttpClient(httpClient)
        .build();
[...]

protected List<String> getZookeeperList() {
    List<String> zookeeperList = new ArrayList<>();
    for (Zookeepers z : Zookeepers.values()) {
        zookeeperList.add(testcontainer.getServiceHost(z.getServicename(), z.getPort()) + ":"
                + testcontainer.getServicePort(z.getServicename(), z.getPort()));
    }
    return zookeeperList;
}
But I'd still be interested in the workaround that Jeremy mentioned in this comment.

Related

gcloud - Can't configure my VPC connector to work with my Redis instance

I'm facing a problem with gcloud, and their support can't seem to help me.
To put my app in prod I need a Redis instance to host some data. I'm using Memorystore because I like to have everything on gcloud.
My app runs in the App Engine standard environment, so the docs (https://cloud.google.com/memorystore/docs/redis/connect-redis-instance-standard) ask me to configure a VPC connector. But I think the CIDR that I put is always wrong; can someone help me find the right CIDR?
connectMode: DIRECT_PEERING
createTime: '2020-03-13T17:20:51.590448560Z'
currentLocationId: europe-west1-d
displayName: APP-EPG
host: 10.240.224.179
locationId: europe-west1-d
memorySizeGb: 1
name: projects/*************/locations/europe-west1/instances/app-epg
persistenceIamIdentity: *************
port: 6379
redisVersion: REDIS_4_0
reservedIpRange: 10.240.224.176/29
state: READY
tier: BASIC
Thank you all!
First, in order for the VPC connector to work, your App Engine instances have to be in the same VPC and region as your Redis instance. If not, there will be no connectivity between the two.
Also make sure your Redis instance and your app use one of the approved locations; by now that's a lot of them.
Your Redis instance is in the europe-west1 region, so to create your VPC connector you have to set the name of the VPC network your Redis instance is in (for example "default").
The IP range you were asking about can be any range not already reserved in the network the Redis instance is in. So, for example, if your "default" network is 10.13.0.0/28, then you have to specify something else, like 10.140.0.0/28. It has to be a /28, otherwise you won't be able to create the connector.
Why 10.13.0.0 or any other addresses? They are going to be assigned as the source network for your apps to connect to the Redis instance (or any other VMs) in the specified network.
I've tested it using the command:
gcloud compute networks vpc-access connectors create conn2 --network default \
    --range 10.13.0.0/28 --region=europe-west1
Or you can do it in the console under Serverless VPC Access by clicking "Add new connector".
You can also read the documentation on how to create a connector.

How to make a request in SolrCloud?

I have a Node.js app and I used to have a standalone Solr but then our company decided to use SolrCloud to provide failover.
In the standalone Solr setup I had only the one server, and all my requests looked like http://solr_server:8983/solr/mycore/select?indent=on&q=*:*&wt=json, so all requests went to the same server all the time.
But now I have 3 different instances, with 1 ZooKeeper and 1 Solr node on each of them, and my requests look like this: http://solr_server_1:8983/solr/mycollection/select?q=*:*
And now the question: what if solr_server_1 goes down? How can I still get my results? How should I handle requests in this case?
If you're doing this manually, you'll have to catch the exception when the connection fails and then retry the next server in your list:
let servers = ['ip1:8983', 'ip2:8983', 'ip3:8983']
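A minimal sketch of that retry loop (the question's app is Node.js, but the pattern is language-agnostic; shown here in Java/SolrJ, with placeholder server addresses):
import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FailoverQuery {
    private static final List<String> SERVERS =
            Arrays.asList("ip1:8983", "ip2:8983", "ip3:8983");

    // Try each server in turn; return the first successful response.
    static QueryResponse query(String collection, SolrQuery q) throws Exception {
        Exception last = null;
        for (String server : SERVERS) {
            try (HttpSolrClient client =
                         new HttpSolrClient.Builder("http://" + server + "/solr").build()) {
                return client.query(collection, q);
            } catch (Exception e) {
                last = e; // this node is down; try the next one
            }
        }
        throw last; // all servers failed
    }
}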
If you're using a library that supports ZooKeeper (i.e. it connects to ZooKeeper to find out what the live nodes are), you give the client a list of ZooKeeper nodes and let it figure out the rest. node-solr-smart-client is one client that supports ZooKeeper.
options = {
  zkConnectionString: 'ip1:2181,ip2:2181,ip3:2181',
  // etc.
};

solrSmartClient.createClient('my_solr_collection', options, function (err, solrClient) {
  // use solrClient here
});

Jackrabbit Oak: Getting started and connect to a standalone repository via RMI

I am totally new to Jackrabbit and Jackrabbit Oak. I worked a lot with Alfresco though, another JCR compliant open-source content repo.
I want to start a standalone Jackrabbit Oak repo, then connect to it via Java code. Unfortunately the Oak documentation is quite scarce.
I checked out the Oak repo, built it with mvn clean install, and then ran the standalone server (an in-memory repository is fine for me at the moment for testing) via:
$ java -jar oak-run-1.6-SNAPSHOT.jar server
Apache Jackrabbit Oak 1.6-SNAPSHOT
Starting Oak-Memory repository -> http://localhost:8080/
13:14:38.317 [main] WARN o.a.j.s.r.d.ProtectedRemoveManager - protectedhandlers-config is missing -> DIFF processing can fail for the Remove operation if the content toremove is protected!
When I open http://localhost:8080/ I see a blank page in the browser, but the HTML/XHTML output is visible when viewing the page source.
I try to connect via Java code:
JcrUtils.getRepository("http://localhost:8080");
// or
JcrUtils.getRepository("http://localhost:8080/rmi");
but getting:
Connecting to http://localhost:8080
Exception in thread "main" javax.jcr.RepositoryException: Unable to access a repository with the following settings:
org.apache.jackrabbit.repository.uri: http://localhost:8080
The following RepositoryFactory classes were consulted:
org.apache.jackrabbit.oak.jcr.OakRepositoryFactory: declined
org.apache.jackrabbit.commons.JndiRepositoryFactory: declined
Perhaps the repository you are trying to access is not available at the moment.
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:223)
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:263)
at Main.main(Main.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
(The Oak documentation is not as complete as the Jackrabbit documentation, but I am also not sure how much of Jackrabbit 2 is still valid for Oak, since it's a complete rewrite.)
I found the same question in the mailing list/Nabble, but the answer provided there does not use a remote, standalone repository; it uses a local one running in the same servlet container and even the same app (only the MongoDB node store is eventually configured as remote, which would mean the Mongo ports need to be open). So that app creates the repository itself, which is not my case (I got that case working fine in Oak as well).
In Jackrabbit 2 (not Oak), I can simply connect via
Repository repo = new URLRemoteRepository("http://localhost:8080/rmi");
and it works fine, but this method does not seem to be available for Oak.
Is RMI not enabled by default in Oak? Is there a different URI to use?
However, the Oak documentation says "Oak comes with a runnable jar", and the runnable jar offers the server method to start the server, so I assume that my scenario above is a valid one.
The blank page is a result of your browser being unable to parse the <title/> tag.
Go into developer mode to see how the browser incorrectly interpreted that tag.
[Screenshot: incorrect interpretation of the title tag]
I never saw an example of Jackrabbit Oak working like this. Are you sure it is possible to start Oak outside of your application?
How do you set up the persistent store (which one are you going to use)?
Here is the link describing how you normally set up Jackrabbit Oak: https://jackrabbit.apache.org/oak/docs/construct.html
For example, if you use MongoDB as the backend (which is the most powerful), you first connect to the db via
DB db = new MongoClient(ip, port).getDB("testDB");
where ip is the IP address of your MongoDB server and port is its port. This server doesn't need to be on the same machine your Java code is running on. You can even use a replica set instead of a single MongoDB instance.
The same is valid when using a relational db; only if you choose the tar file backend are you limited to your local machine.
Then, in a second step, you create a JCR repository based on the chosen backend (see the link).
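For reference, the construct page linked above boils down to this for a MongoDB backend (host, port, and db name are placeholders):
// Build a JCR repository on top of a MongoDB-backed Oak node store,
// following https://jackrabbit.apache.org/oak/docs/construct.html
DB db = new MongoClient("127.0.0.1", 27017).getDB("test");
DocumentNodeStore ns = new DocumentMK.Builder().setMongoDB(db).getNodeStore();
Repository repo = new Jcr(new Oak(ns)).createRepository();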

Distribute Solr Using Replication without Using SolrCloud

I want to use Solr replication without using SolrCloud.
I have three Solr servers; one is the master and the others are slaves.
How can I dispatch search queries to a Solr server that isn't busy? Which tools can do this, and how do I set it up?
You can use any load balancer - Solr talks HTTP, which makes any existing load-balancing technology available. HAProxy, Varnish, nginx, etc. will all work as you expect, and you'll be able to use all the advanced features the different packages offer. It'll also be independent of the client, meaning that you're not limited to the LBHttpSolrServer class from SolrJ or whatever your particular client offers. Certain LB solutions also offer high-throughput caching (Varnish) or dynamic real-time failover between live nodes.
Another option we've also used successfully is to replicate the core to each web node, allowing us to always query localhost for searching.
You have configured Solr in master-slave mode. I think you can use LBHttpSolrServer from the SolrJ API for querying Solr; you need to send the update requests to the master node explicitly. The LBHttpSolrServer will provide load balancing among all the specified nodes. In master-slave mode, the slaves are responsible for keeping themselves updated with the master; see the sketch after the quote below.
From its documentation: "Do NOT use this class for indexing in master/slave scenarios since documents must be sent to the correct master; no inter-node routing is done. In SolrCloud (leader/replica) scenarios, this class may be used for updates since updates will be forwarded to the appropriate leader."
I hope this will help.
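A sketch of that split, using the SolrJ 4.x-era classes this answer mentions (host names and core name are placeholders):
// Queries are load-balanced across the slaves...
LBHttpSolrServer queryServer = new LBHttpSolrServer(
        "http://slave1:8983/solr/mycore", "http://slave2:8983/solr/mycore");
QueryResponse results = queryServer.query(new SolrQuery("*:*"));

// ...while updates are sent explicitly (and only) to the master.
HttpSolrServer updateServer = new HttpSolrServer("http://master:8983/solr/mycore");
updateServer.add(doc); // doc: a SolrInputDocument built elsewhere
updateServer.commit();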
Apache Camel can be used as a general-purpose load balancer, like this:
public class LoadBalancer {
    public static void main(String args[]) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("jetty://http://localhost:8080")
                    .loadBalance().roundRobin()
                    .to("http://172.28.39.138:8080", "http://172.168.20.118:8080");
            }
        });
        context.start();
        Thread.sleep(100000);
        context.stop();
    }
}
There is some other material that may be useful:
Basic Apache Camel LoadBalancer Failover Example
http://camel.apache.org/load-balancer.html
But it seems there is no straightforward way to do a Solr-Camel integration, because Camel is used to balance requests across Java "bean" components:
http://camel.apache.org/loadbalancing-mina-example.html
There is another useful example:
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/CustomLoadBalanceTest.java
And you can use Camel as a proxy between a client and a server:
http://camel.apache.org/how-to-use-camel-as-a-http-proxy-between-a-client-and-server.html
There are some presentations on getting started with Apache Camel, its approach and architecture:
http://www.slideshare.net/ieugen222/eip-cu-apache-camel

Solr 4.0: wrong hostname when there is a dot

I am building a fully distributed Solr cluster on different servers; the servers are server1.mycompany.com and server2.mycompany.com. After configuration, when I click server1.mycompany.com in the web console, Solr tries to connect to server1:8983/solr rather than server1.mycompany.com:8983/solr. And I use the Java API:
ZkStateReader zkStateReader = cloudQuery.getCloudSolrServer().getZkStateReader();
ClusterState clusterState = zkStateReader.getClusterState();
System.out.println(clusterState);
I still get "node_name":"server1:8983_solr","base_url":"http://server1:8983/solr".
Can anyone give a hint? Is this a bug or something that needs to be configured?
