Solr: replication options

I've got a SOLR instance running behind a firewall. I'm about to put up another instance which will not be firewalled. However, SOLR appears to support only pull replication, not push replication.
What are my options for maintaining the same level of security? I'd rather not open too many ports in the firewall. Would HTTP over an SSH tunnel be the best option? Would it also be possible to just replicate the index files using plain old rsync (not using any SOLR-specific features), or would this break something?

Would it also be possible to just replicate the index files using plain old rsync
Solr actually supports this kind of distribution with its snappuller mechanism, documented here: http://wiki.apache.org/solr/CollectionDistribution

I would open a port and specify the IP address of the slave, and just use ordinary HTTP-based replication; that would be quite secure, I think, and easier to maintain probably. I know it's not exactly where you were angling, but it's what I'd recommend.

I'm answering my own question because the solution I went for is different from what the two other answers suggested. I ended up using an SSH tunnel for the HTTP traffic: SSH forwards all traffic for port 8080 on HostA to port 8080 on HostB through the tunnel.
The solution appears to be working fine. I'm using a script which validates the tunnel every 5 minutes or so.
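For anyone wanting to reproduce this, here is a minimal sketch of such a validation script, not the one actually used above. It assumes the tunnel is a local forward started on HostA with OpenSSH (`ssh -N -L`); the login `user@hostB` and the helper names are hypothetical. Note that a successful connect only shows that the local end of the tunnel is listening; a stricter check would also request a Solr URL through it.

```python
import socket
import subprocess


def tunnel_command(local_port=8080, remote_port=8080, gateway="user@hostB"):
    """Build the ssh command line for a local port forward (gateway is a placeholder)."""
    # -N: do not run a remote command; -L: forward a local port to the remote side.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", gateway]


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_tunnel(local_port=8080):
    """Start a new tunnel if nothing accepts connections on the local port."""
    if port_is_open("127.0.0.1", local_port):
        return None  # tunnel looks alive, nothing to do
    return subprocess.Popen(tunnel_command(local_port=local_port))
```

Calling `ensure_tunnel()` from cron every five minutes would match the schedule described above.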

You could use HTTP basic authentication (see https://wiki.apache.org/solr/SolrReplication#Slave) but since the password will be passed in plain text, an SSH tunnel or secure VPN would also be required in order to deter more determined attackers.
I'll be going for a VPN solution to start with and consider an SSH tunnel before moving to production if we feel we are unable to place sufficient trust in our internal networks.
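To illustrate why basic authentication alone is not enough on an untrusted network, here is a small sketch showing that the `Authorization` header is only base64-encoded, so anyone who can read the traffic can recover the password. The user name and password below are made up for the example.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


# Hypothetical credentials: the round trip needs no secret at all.
header = basic_auth_header("replicator", "s3cret")
print(decode_basic_auth(header))
```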

Related

Connecting to SolrCloud with SolrJ and Zookeeper with ACL

I'm trying to connect an application using SolrJ to a SolrCloud cluster via ZooKeeper which is secured using ACLs.
No problem you say? Right, we can just use the specific Java Properties (e.g. -DzkACLProvider etc.) and be done with it. Unfortunately we do not want to set these options with system properties because they would also contain the password. Our preferred way would be to provide these flags via environment variables. Looking into the SolrJ code this looks impossible without implementing some wrappers/extensions of SolrJ classes.
I've come so far that I think that maybe we're just doing it wrong, especially because I don't find any questions regarding this. Is this even the right way to do this? Should we even connect via Zookeeper? Or would you normally just connect to a loadbalanced SolrCloud HTTP endpoint and not connect to Zookeeper at all?
I'm thankful for any input!

Is it possible to restore a datomic backup to a local dynamodb

I'm trying to diagnose some performance issues, so I have the Datomic transactor running locally, backed by a local instance of DynamoDB. What I can't figure out is how to populate it from a backup of our primary Datomic environment. I know the basic command is:
>datomic restore-db s3://<BUCKET> datomic:ddb://<REGION>/<DB-NAME>
but how do I tell Datomic to use the local DynamoDB? It seems to accept only valid AWS regions for REGION. I've also tried using datomic:ddb-local as the protocol, but no luck there either.
How do I form the target URI? Or is this even possible?
You should be able to use a ddb-local URI as indicated here: http://docs.datomic.com/storage.html#dynamodb-local
It will be something like: datomic:ddb-local://localhost:8000/my-table/my-db-name?aws_access_key_id=ABC&aws_secret_key=DEF, assuming you're running ddb-local at localhost on port 8000.
Note that the ddb-local protocol does require an access key and secret, even though they are ignored.
Best,
Marshall
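As a small illustration of the URI shape given above, here is a sketch that assembles it from its parts. The helper name is hypothetical, and the defaults simply mirror the example values in the answer.

```python
from urllib.parse import urlencode


def ddb_local_uri(table, db_name, host="localhost", port=8000,
                  access_key="ABC", secret_key="DEF"):
    """Assemble a ddb-local URI in the shape shown in the answer above."""
    # The key and secret must be present even though ddb-local ignores them.
    query = urlencode({"aws_access_key_id": access_key,
                       "aws_secret_key": secret_key})
    return f"datomic:ddb-local://{host}:{port}/{table}/{db_name}?{query}"


print(ddb_local_uri("my-table", "my-db-name"))
```

When passing the result to restore-db on the command line, quote it, since an unquoted `&` is interpreted by the shell.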

Apache Mod_JK and Load Balancing

I am using Tomcat 6 and have some questions about Apache mod_jk as follows.
1. Do I have to install the Apache web server to use mod_jk?
2. If I run applications on 2 servers under Tomcat and load balance between them using mod_jk, will this also check the availability of the applications, i.e. will it only send requests to one server if the application is down on the other server?
3. If it checks for availability, do you need to have multicast available on the network?
4. We intend to use Tomcat clustering as well; will this work with mod_jk?
5. Is there anything else I could use to load balance with availability checking for applications running on Tomcat?
Any help will be appreciated.
Cheers
Jeff
1. Yes.
2. Yes, unless you go out of your way to configure mod_jk not to do that.
3. No.
4. Yes, but it is not necessary.
5. Pretty much any H/W load-balancer, pretty much any web server that supports reverse proxy over HTTP or AJP.
You would be much better off using mod_proxy_ajp rather than mod_jk for this. It's much simpler to configure: none of those nasty JkMount directives, and no Tomcat listener that supposedly 'auto-configures' it for you but doesn't. It also works a lot better. It's also not deprecated, unlike mod_jk since Tomcat 5.5.
1. Yes, you must have Apache httpd installed on your web server; you can then load balance with mod_jk, mod_cluster, or mod_proxy. I assume you are currently using mod_jk.
2. You are right. If you want each session to stay on one server instance, enable session stickiness. Load is distributed according to the "lbfactor" you set for each worker in mod_jk's workers.properties. A "redirect" option for failover is also available in workers.properties. Failover can also be handled on the application server side.
3. As far as I know, if you enable failover in the application server, a multicast address is available by default; the only thing you need to do is open the port.
4. mod_jk works perfectly with clustering in Tomcat/JBoss.
5. As I mentioned in answer 1, you can use any of those load balancers with Tomcat.
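To make the idea of weighted balancing with failover concrete, here is a toy sketch. It is not mod_jk's actual algorithm; it only illustrates the behaviour described above, where requests are spread by weight (the role "lbfactor" plays) and a worker that is down receives nothing. The worker names and the `up` flag are assumptions for the example.

```python
import random


def pick_worker(workers, rng=random):
    """Pick a worker by weight among those that are up; None if all are down.

    workers maps a name to {"lbfactor": int, "up": bool}.
    """
    available = {name: w for name, w in workers.items() if w["up"]}
    if not available:
        return None
    names = list(available)
    weights = [available[name]["lbfactor"] for name in names]
    # Weighted random choice: a higher lbfactor means a larger share of requests.
    return rng.choices(names, weights=weights, k=1)[0]


workers = {
    "node1": {"lbfactor": 2, "up": True},
    "node2": {"lbfactor": 1, "up": False},  # application down on this server
}
print(pick_worker(workers))  # node2 is down, so only node1 can be chosen
```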

Set up a webserver for multiple users and makes PHP scripts run under their account (with their permissions)

I'm setting up an Apache 2.2 webserver for multiple users (having the "developers" profile).
They need to execute PHP scripts/applications (both home-made and acquired) and run
I tried using *mod_userdir* but the problem is that Apache (thus the scripts) runs under "www-data" (I'm using GNU/Debian OS).
So I looked at suPHP but it doesn't support *php_admin_value* Apache directives.
I also saw apache2-mpm-itk mentioned but it uses virtual hosts, which itself requires DNS.
I think I could work around that by installing a DNS server on the webserver to manage a subdomain via delegation (e.g. my webserver's FQDN is "testsrv.mycompany.tld" and the users' virtual hosts would be "user1.testsrv.mycompany.tld", "user2.testsrv.mycompany.tld"). But that might be a bit "too much", no?
You could use virtual hosts along with mod_auth_basic so user1 would have a password protected site at www.user1.example.com.
If by 'php_admin_value' you are referring to the .htaccess files, then yes, they are not supported by suPHP, but I believe there is a way around that.
Finally, I am setting up my server locally (for testing), so I just updated my /etc/hosts file. That might be a good place for you to start.

Determining DB in use from http query

Is there a simple way to determine what database is in use behind a website from an external HTTP request? i.e., I make an HTTP request, get back whatever data is going to come from the webserver - can I inspect any of that and reliably determine that DB in use? I am thinking not, but figured I would ask this group.
No. The same answer could come from a static file, a SQL database, or a martian telepath.
No and for a good reason. If there were it would be a security hole. Unless it is a part of the application functionality.
For most websites, the answer is no. However, you may find security holes which reveal this information. For example, it's possible to get it if the site isn't protected against SQL injection attacks. Try entering the following as your user name:
'; select version();
Shared hosting systems often don't have a firewall protecting the database from external connections.
Try the following:
telnet localhost 3306
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
5
5.0.51a—Bjb-W
This tells you that the server is running MySQL version 5.0.51a. MSSQL and Sybase also identify their version number before the client attempts to login.
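The telnet output above looks consistent with the MySQL handshake packet: a 4-byte packet header (3 length bytes plus a sequence number), then a protocol version byte (10, which telnet prints as a line break), then the server version as a NUL-terminated string. Here is a minimal sketch that reads the same banner programmatically. The function names are hypothetical, it assumes the greeting arrives in a single read, and it does not handle the error packet a server sends when the connecting host is not allowed.

```python
import socket


def parse_mysql_greeting(packet):
    """Return (protocol_version, server_version) from an initial handshake packet."""
    if len(packet) < 6:
        raise ValueError("packet too short to be a MySQL greeting")
    payload = packet[4:]                 # skip 3-byte length + 1-byte sequence id
    protocol_version = payload[0]
    end = payload.index(b"\x00", 1)      # version string is NUL-terminated
    server_version = payload[1:end].decode("ascii", errors="replace")
    return protocol_version, server_version


def read_mysql_version(host, port=3306, timeout=3.0):
    """Connect, read the greeting the server sends first, and parse it."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return parse_mysql_greeting(conn.recv(1024))


# Hand-built sample shaped like the banner above (trailing bytes are filler).
sample = b"\x35\x00\x00\x00\x0a5.0.51a\x00\x01\x00\x00\x00"
print(parse_mysql_greeting(sample))  # (10, '5.0.51a')
```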
Probably the easiest way is just to ask the webmaster. If you're not a hacker, and the site isn't a bank, they will likely tell you.

Resources