SOLR on Elastic Beanstalk

I would like to run a Solr server on Elastic Beanstalk, but I cannot find much about that on the web.
It must be possible somehow, because some people are already doing it (e.g. https://forums.aws.amazon.com/thread.jspa?threadID=91276).
Any ideas how I could do that?
I can upload the Solr WAR file into the environment somehow, but then it gets complicated.
Where do I put the config files and the index directory so that each instance can reach them?

EDIT: Please keep in mind that this answer is from 2013. The products mentioned here have likely evolved. I have updated the documentation link to reflect changes in the solr clustering wiki. I encourage you to continue your research after reading this information.
ORIGINAL:
It only really makes sense to run Solr on Beanstalk instances if you plan to only ever use a single-server deployment. The minute you want to scale your app, you will need to configure your Beanstalk environment to either create a Solr cluster or move to something like CloudSearch. If you are unfamiliar with EC2 lifecycles and Solr deployments, then CloudSearch will almost certainly save you time (read: money).
If you do want to run Solr on a single instance, you can use rake to launch it by adding a file named .ebextensions/solr.config to your local repo with the following contents:
container_commands:
  01create_post_dir:
    command: "mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post"
    ignoreErrors: true
  02killjava:
    command: "killall java"
    test: "ps uax | grep java | grep root"
    ignoreErrors: true
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_start_solr.sh":
    mode: "755"
    owner: "root"
    group: "root"
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production bundle exec rake sunspot:solr:start" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake db:seed" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake sunspot:reindex" $EB_CONFIG_APP_USER
Please keep in mind that this will cause chaos if you are using autoscaling.

Related

How to deploy SQL Server Express on Docker Desktop Kubernetes

I've been studying "Kubernetes Up and Running" by Hightower et al (first edition) Chapter 13 where they discussed creating a Reliable MySQL Singleton (Since I just discovered that there is a second edition, I guess I'll be buying it soon).
Using their MySQL reliable singleton example as a model, I've been looking for some sample YAML files to make a similar deployment with Microsoft SQL Server (Express) on Docker Desktop for Kubernetes.
Apparently I need YAML files to deploy:
a Persistent Volume
a Persistent Volume Claim (should this be NFS?)
a SQL Server (Express edition) replica set (in spite of the fact that this is just a singleton).
I've tried this example, but I'm confused because it does not contain a persistent volume and claim, and it does not work. I get the error:
Error: unable to recognize "sqlserver.yml": no matches for kind "Deployment" in version "apps/v1beta1"
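(For reference: that error means the cluster no longer serves the apps/v1beta1 API group; Deployments now live under apps/v1 and need an explicit spec.selector. A minimal, hypothetical sketch with the updated header, using placeholder names and no persistent storage, just to illustrate the change:)
# The names, image tag, and password below are placeholders for illustration;
# apps/v1 requires spec.selector to match the pod template labels.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          value: "YourStrong!Passw0rd"
        ports:
        - containerPort: 1433
EOF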
Can someone please point me to some sample YAML files that are not Azure specific that will work on Docker Desktop Kubernetes for Windows 10? After debugging my application, I'll want to deploy this to Azure (AKS).
Wed Jul 15 2020 Update
I left out the "-n namespace" for the helm install command (possibly because I'm using Helm and you are using helm v2?).
That install command still did not work. Then I did a
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
Now this command works:
helm install todo-app-database stable/mssql-linux
Progress!
When I do a "k get pods" I see that my todo-app-mssql-linux database is in the pending state. So I did a
kubectl get events
and I see
Warning FailedScheduling pod/todo-app-database-mssql-linux-8668d9b88c-lsh5l 0/1 nodes are available: 1 Insufficient memory.
I've been google searching for "Kubernetes insufficient memory" and can find no match.
I suspect this is a problem specific to "Docker Desktop Kubernetes".
When I look at the output for
helm -n ns-todolistdemo template todo-app-database stable/mssql-linux
I see the deployment is asking for 2Gi. (Interesting: when I use the template command, the "-n ns-todolistdemo" does not cause an error like it does with the install command).
So I do
kubectl describe deployment todo-app-database-mssql-linux >todo-app-database-mssql-linux.yaml
I edit the yaml file to change 2Gi to 1Gi.
kubectl apply -f todo-app-database-mssql-linux.yaml
I get this error:
error: error parsing todo-app-database-mssql-linux.yaml: error converting YAML to JSON: yaml: line 9: mapping values are not allowed in this context
Hmm... that did not work. I try delete:
kubectl delete deployment todo-app-database-mssql-linux
kubectl create -f todo-app-database-mssql-linux.yaml
I get this error:
error: error validating "todo-app-database-mssql-linux.yaml": error validating data: invalid object to validate; if you choose to ignore these errors, turn validation off with --validate=false
So I try apply:
kubectl apply -f todo-app-database-mssql-linux.yaml
Same error!
Shucks.... Is there a way to adjust the memory allocation for Docker Desktop?
Thank you
Siegfried
Short answer
https://github.com/helm/charts/blob/master/stable/mssql-linux/templates/pvc-master.yaml
Detailed Answer
Docker Desktop already comes with a default StorageClass:
This storage class is responsible for auto-provisioning a PV whenever you create a PVC.
If you have a YAML definition of a PVC (persistent volume claim), you just need to leave the storage class empty so it will use the default.
k get storageclass
NAME                 PROVISIONER          AGE
hostpath (default)   docker.io/hostpath   11d
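As an illustration, a minimal PVC that relies on that default class might look like the sketch below (the claim name and size are made up, not taken from the chart):
# Hypothetical PVC relying on Docker Desktop's default StorageClass:
# no storageClassName is set, so "hostpath (default)" provisions the PV.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF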
This is fair enough, as the Docker Desktop cluster is a one-node cluster. So if your DB crashes and comes up again, it will not move to another node, because, quite simply, you have a single node :)
Now, should you write the YAML for the PVC from scratch?
No, you don't need to, because Helm should be your best friend.
(I explain below why you should use Helm; it does not even have a steep learning curve.)
Fortunately, the community provides a chart called stable/mssql-linux.
Let's run it together:
helm -n <your-namespace> install todo-app-database stable/mssql-linux
# helm -n <namespace> install <release-name> <chart-name-from-community>
If you want to check the YAML (namely the PVC) that Helm computed, you can run template instead of install:
helm -n <your-namespace> template todo-app-database stable/mssql-linux
Why did I answer with Helm?
Writing YAML from scratch means reinventing a wheel that others have already built.
The most efficient way is to reuse what the community has prepared for you.
However, you may ask: how can I reuse what others have done?
That's where Helm comes in.
Helm is your installer for any application on top of Kubernetes, no matter how much YAML the app requires.
Install it now and hit the ground running: choco install kubernetes-helm
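As a side note on the Insufficient memory event above: with Helm, the usual way to shrink a chart's memory request is to override values at install time rather than editing the rendered manifest by hand. The value keys below are an assumption; check them against the chart's values first.
# Inspect the chart's configurable values (verify the exact key names here).
helm show values stable/mssql-linux
# Hypothetical override of the memory request/limit at install time.
helm -n <your-namespace> install todo-app-database stable/mssql-linux \
  --set resources.requests.memory=1Gi \
  --set resources.limits.memory=1Gi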

Share storage bucket between apps

I have an internal tool that lets me edit configuration files and then the config files gets synced to Google Storage (* * * * * gsutil -m rsync -d /data/www/config_files/ gs://my-site.appspot.com/configs/).
How can I use these config files across multiple instances in Google App Engine? (I don't want to use the Google PHP SDK to read / write to the config files in the bucket).
The only thing I can come up with is a cron.yaml file that downloads the configs from the bucket to /app/configs/ every minute, but then I'd have to reload php-fpm every minute as well.
app.yaml:
runtime: custom
env: flex
service: my-site
env_variables:
  CONFIG_DIR: /app/configs
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cpu_utilization:
    target_utilization: 0.5
Dockerfile:
FROM eu.gcr.io/google-appengine/php71
RUN mkdir -p /app;
ADD . /app
RUN chmod -R a+r /app
I am assuming you are designing a solution where the apps pull configuration from the GCS bucket, so you can update them en masse quickly.
There are many points in the process, depending on your exact flow, where you can insert a "please update now" command. For example, why can't you simply queue a task as you update the configuration in your GCS bucket? That task would basically download the configuration and redeploy your application.
Unless, that is, you are thinking about multiple applications that have access to that bucket and you want to be able to update them all centrally at the same time. In that case, your cron job solution makes sense. Dan's suggestion definitely works, but I think you can make it easier by using version numbers. Simply keep another file with a version number in it; the cron job pulls that file, compares it, and performs an update if the version is newer. It's very similar to Dan's solution, except you don't really need to hash anything. If you are updating GCS with your configurations, you might as well tag on another file with the version information.
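A rough sketch of that version check, assuming a VERSION file is kept next to the configs in the bucket (the file name and paths here are made up):
#!/usr/bin/env bash
# Pull the remote version marker and only resync + reload when it has changed.
REMOTE_VERSION=$(gsutil cat gs://my-site.appspot.com/configs/VERSION)
LOCAL_VERSION=$(cat /app/configs/VERSION 2>/dev/null || echo "none")
if [ "$REMOTE_VERSION" != "$LOCAL_VERSION" ]; then
  gsutil -m rsync -c -r gs://my-site.appspot.com/configs/ /app/configs/
  # Same graceful php-fpm reload as the supervisord script further down.
  ps ax | grep php-fpm | cut -f2 -d" " - | xargs kill -s USR2
fi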
Another solution is to expose a handler in all those applications, for example an "/update" handler. Whenever it's hit, the application performs the update. You can hit that handler whenever you actually update the configuration in GCS. This is more of a push solution. The advantage is that you have more control over which applications get the update, which might be useful if you aren't sure about a certain configuration yet and don't want to update everything at once.
We did not want to add a handler in our application for this. We thought it was best to use supervisord.
additional-supervisord.conf:
[program:sync-configs]
command = /app/scripts/sync_configs.sh
startsecs = 0
autorestart = false
startretries = 1
sync_configs.sh:
#!/usr/bin/env bash
while true; do
  # Sync configs from Google Storage.
  gsutil -m rsync -c -r ${CONFIG_BUCKET} /app/config
  # Reload PHP-FPM.
  ps ax | grep php-fpm | cut -f2 -d" " - | xargs kill -s USR2
  # Wait 60 seconds.
  sleep 60
done
Dockerfile:
COPY additional-supervisord.conf /etc/supervisor/conf.d/

Ansible Issue - [Errno 2] No such file or directory

I've been holding off posting here because I feel like this issue could be too vague. I will try my best to explain. I have been through all of the existing questions but they don't seem relevant to what I am doing.
Basically, I have inherited 3 Ec2 Instances that are Dev / Staging / Live web applications in my new role. I use Ansible playbooks to migrate the Database between all environments. We recently had a new website that was deployed onto all three existing instances.
The Dev box recently died, so I blew it away and launched a new one. The website looks fine; however, exporting and importing the database no longer works (on the new instance).
Below is the Ansible output:
TASK: [Export database to migrate] ********************************************
failed: [172.**.**.***] => {"changed": true, "cmd": "wp db export dbv2.sql --tables=t*******0_links,t*******0_options,t*******0_postmeta,t*******0_posts,taxlt4ws0_rg_form,taxlt4ws0_rg_form_meta,taxlt4ws0_rg_form_view,t*******0_term_relationships,t*******0_term_taxonomy,t*******0_termmeta,t*******0_terms,t*******0_usermeta,t*******0_users", "delta": "0:00:00.001594", "end": "2017-09-01 10:21:25.225355", "rc": 127, "start": "2017-09-01 10:21:25.223761", "warnings": []}
stderr: /bin/sh: 1: wp: not found
FATAL: all hosts have already failed -- aborting
Things I've checked:
Chmod on the folders it imports/exports to/from.
IAM Role is set
Used Shell instead of Command in the Playbook
Configs for each environment
I'm really stumped. My Ansible knowledge is quite limited, as I only picked it up a couple of months ago and hadn't run into any issues (even with a new website) until the Dev box had to be replaced.
I think Ansible is referring to WP-CLI (the wp command); it is not able to find its executable.
If that is the case, you need to install it with another task before that one.
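A minimal sketch of installing WP-CLI on the new Dev instance (these are the standard steps from the WP-CLI docs; wrap them in a shell/command task, or use a role, in your playbook):
# Download the WP-CLI phar, make it executable, and put it on the PATH
# of the user the export task runs as.
curl -O https://raw.githubusercontent.com/wp-cli/builds/gh-pages/phar/wp-cli.phar
chmod +x wp-cli.phar
sudo mv wp-cli.phar /usr/local/bin/wp
wp --info   # verify the wp command now resolves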
Basically, what this is complaining about is that whatever command the "Export database to migrate" task runs cannot find a wp script or executable:
stderr: /bin/sh: 1: wp: not found
I would recommend running which wp, or doing a find, on the staging or live instances to see where it lives, and then install/copy it over to the Dev instance.
You can test this hypothesis by using a small test script:
#!/bin/sh
wp
Create this script, say test.sh, give it executable permissions, and run it in all the environments to see where it fails.

How to correctly add additional SOLR 5 (vm) nodes to SOLR Cloud

I have a SOLR / Zookeeper / Kafka setup. Each on separate VMs.
I have successfully run this all using two SOLR 4.9 vms (Ubuntu)
Now I wish to build two SOLR 5.4 vms and get it all working again.
Essentially, "Upgrade by Replacement"
I have "hacked" a solution to my problem but that makes me very nervous.
To begin, Zookeeper is running. I turn off my SOLR 4.9 vms and delete the config out of Zookeeper (not necessarily in that order... ;-) )
Now, I start up my 'solr5' VM (and SOLR in cloud mode) where I have installed SOLR 5.4 according to the "Production Install" instructions on the SOLR Wiki. I have also installed 5.4 on 'solr6', but it's not running yet.
I issue this command on the 'solr5' machine:
/opt/solr/bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1
and I get the following output:
Connecting to ZooKeeper at 192.168.56.5,192.168.56.6,192.168.56.7/solr ...
Re-using existing configuration directory statdx
Creating new collection 'fooCollection' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=fooCollection
{
  "responseHeader":{
    "status":0,
    "QTime":3822},
  "success":{"":{
      "responseHeader":{
        "status":0,
        "QTime":3640},
      "core":"fooCollection_shard1_replica1"}}}
Everything is working great. I turn on my microservice, and it pumps all my SOLR docs from Kafka into 'solr5'.
Now, I want to add 'solr6' to the collection. I can't find a way to do this besides my hack (which I'll describe later).
The command I used before to create a collection errors out with the observation that my collection already exists.
There seems to be no zkcli.sh or solr command that will do what I want, and none of the API commands seem to do this either.
Is there not a simple way to tell (SOLR? Zookeeper?) "I want to add another machine to my SOLR nodes; please configure it like the first (solr5) and begin replicating data"?
Maybe I should have had both machines running when I issued the create command?
I'd be grateful for some "approved" method for doing this since I need to come up with a "solution" to do the same kind of approach in Prod every time there is a need to upgrade SOLR.
Now for my hack. Keep in mind I have now spent two days trying to find clear docs on this. No flames please, I totally get that this is not the way to do things. At least, I HOPE this is not the way to do things...
Copy the fooCollection directory from where the create collection command put it on 'solr5' (which was /opt/solr/server/solr/fooCollection_shard1_replica1) to the same location on my 'solr6' VM.
Make what changes seem logical to the collection directory name (it becomes fooCollection_shard1_replica2).
Make what changes seem logical in the core.properties file:
For reference, here's the core.properties file that was created by the create command.
#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica1
shard=shard1
collection=fooCollection
coreNodeName=core_node1
Here is what the file looked like on 'solr6' when I was done hacking.
#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica2
shard=shard1
collection=fooCollection
coreNodeName=core_node2
When I did this and rebooted 'solr6' everything appeared golden. The "Cloud" web page looked right in the Admin web page - and when I added documents to 'solr5' they were available in 'solr6' if I hit it directly from the Admin web pages.
I would be grateful if someone can tell me how to achieve this without a hack like this... or if this IS the right way to do this...
=============================
In answer to #Mani and the suggested procedure
Thanks Mani - I did try this very carefully following your steps.
In the end, I get this output from the collection status query:
john#solr6:/opt/solr$ ./bin/solr healthcheck -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4 -c fooCollection
{
  "collection":"fooCollection",
  "status":"healthy",
  "numDocs":0,
  "numShards":1,
  "shards":[{
      "shard":"shard1",
      "status":"healthy",
      "replicas":[{
          "name":"core_node1",
          "url":"http://192.168.56.15:8983/solr/fooCollection_shard1_replica1/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 6 minutes, 24 seconds",
          "memory":"31 MB (%6.3) of 490.7 MB",
          "leader":true}]}]}
This is the kind of result I've been finding in my experimentation all along. The core gets created on one of the SOLR VMs (the one I issue the create command on), but I don't get anything created on the other VM -- which, based on your steps below, I believe you also expected to happen, yes?
Also, I'll note for anyone reading that in 5.4 the command is "healthcheck", not "healthstatus". The command line tells you immediately, so it's no big deal.
===============
Update 1 :: Manual add of 2nd core
If I go to the other VM and manually add the following:
sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2
sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2/data
nano /opt/solr/server/solr/fooCollection_shard1_replica2/core.properties
(in here I add only collection=fooCollection and then save/close)
Then I reboot my SOLR server on that same VM:
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr
I will find a second node magically appearing in my Admin console. It will be a "follower" (I.E. not the leader) and both will be branching off "shard1" in the cloud UI.
I don't know if this is "the way" but it's the only way I've found so far. I'm going to reproduce to that point and try with the Admin UI and see what I get. That would be a little easier for my IT guys when the time comes - if it works.
===============
Update 2 :: Slight modification of create command
#Mani -- I believe I have success following your steps - and like many things, it's simple once you understand.
I reset everything (deleted directories, cleared out Zookeeper with rmr /solr) and redid everything from scratch.
I changed the "create" command slightly thus:
./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 2
Note the "replicationFactor 2" rather than 1.
Suddenly I did indeed have cores on both VMs.
A couple of notes:
I found that I couldn't get a happy result from the status call just by starting the SOLR 5.4 servers in Cloud mode with the Zookeeper IP addresses. The "node" in Zookeeper was not yet created.
The create command also failed at that point.
The way I found around this was to use the zkcli.sh to load the configs like this:
sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir /home/john/conf/ -confname fooCollection -z 192.168.56.5/solr
When I checked Zookeeper immediately after running this command, there was a /solr/configs/fooCollection "path".
NOW the create command works and I assume that if I had wanted to override the configs, I could have done so at that point although I haven't tried.
I'm not positive at what point, but it seems I needed to restart the SOLR servers (probably after the create command) in order for everything to show up in the status, etc. I may be misremembering that because I've been through it so many times. If in doubt after the create command, try restarting the servers (the ZooKeeper list can be IP addresses or names that resolve correctly):
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr
sudo /opt/solr/bin/solr restart -c -z 192.168.56.5,192.168.56.6,192.168.56.7/solr
After doing these slight modifications to #Mani's recommended procedure, I get a Leader and a "follower" each on different VM's - in the /opt/solr/server/solr directory (fooCollection in this case) and I was able to send data in to one and search the other via the Admin console hitting the IP addresses.
=============
Variations
One thing anyone reading this may want to try is simply making another "node" in Zookeeper (solr5_4 for example).
I tried this and it works like a charm. Everywhere you see the /solr chroot associated with the Zookeeper ensemble, you could replace it with /solr5_4. This would allow the older SOLR VM's to keep functioning in Prod while you build out your new SOLR 5.4 "environment" and the same Zookeeper VM's could be used for both -- because a different chroot should guarantee no interaction or overlap.
Again, the "node" in Zookeeper won't be created until you do the config upload, but you need to start your SOLR process like this or you'd be in the wrong context later on. Note the "solr5_4" as the chroot.
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr5_4
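The corresponding config upload is the same zkcli.sh invocation shown earlier, just pointed at the /solr5_4 chroot:
# Upload the configs into the new chroot so the 5.4 "environment" gets its
# own /solr5_4/configs/fooCollection node in Zookeeper.
sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir /home/john/conf/ -confname fooCollection -z 192.168.56.5/solr5_4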
Once done with testing, the solr5_4 "environment" becomes what matters for Prod and the SOLR 4.x VM's and Zookeeper "node" of solr can be removed. It should be a fairly simple matter to point a load balancer at the new SOLR VM's and do a switchover without users really even noticing.
This strategy will work for SOLR 6, 6.5, 7, and so on.
This command also worked to add the collections/cores. However, the solr server had to be running first.
http://192.168.56.16:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=2&collection.configName=fooCollection
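(Not something I used above, but worth noting: the Collections API also has an ADDREPLICA action, which is the documented way to attach an additional replica on a new node to an existing collection instead of recreating the collection with a higher replicationFactor. A sketch, assuming the new node is already running in cloud mode against the same Zookeeper chroot; the node name format is host:port_solr:)
# Hypothetical example: add a second replica of shard1 on the solr6 node.
curl "http://192.168.56.15:8983/solr/admin/collections?action=ADDREPLICA&collection=fooCollection&shard=shard1&node=192.168.56.16:8983_solr"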
==================
Use as Upgrade By Replacement
In case it's not obvious, this technique (especially if using a "new" chroot in Zookeeper, such as /solr5_4 or similar) gives you the luxury of leaving your older version of SOLR running for as long as you want, allowing a re-indexing of all your data to take days if needed.
I haven't tried, but I'm guessing a backup of the index could be dropped into the new machines as well.
I just wanted readers to understand that this was an approach intended to make upgrades really low stress and straightforward. (You don't need to upgrade in place; just build new VMs and install the latest version of SOLR.)
This would allow the switch-over to occur without affecting prod until you're ready to drop the hammer and re-direct your load balancer at the new SOLR ip addresses (Which you will have already tested of course...)
The one assumption here is that you have the resources to bring up a set of SOLR VMs or physical servers to match whatever you already have in Production. Obviously, if you're resource-limited to only the boxes or VMs you have, upgrade-in-place may be your only option.
This is how I would do it. I am assuming that you have the luxury of downtime and the ability to completely reindex the documents, since you are essentially upgrading from 4.9 to 5.4.
Stop the 4.9 solr nodes and uninstall solr.
Remove the config from zk nodes using zkcli.sh with the clear command.
Install the solr on both solr5 & solr6 vm
Start both the solr nodes and make sure both can talk to zk. =>
On solr5 vm ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
On solr6 vm ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
Verify the status of Solrcloud using ./bin/solr status => this should return liveNodes as 2
Now create the fooCollection using the Collections API from any one of the solr nodes. This uploads the configsets to zookeeper and also creates the collection =>
./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1
Verify the healthstatus of the fooCollection =>
./bin/solr healthstatus -z zk1:port1,zk2:port1,zk3:port1 -c fooCollection
Now verify the config is present in Zookeeper by checking Solr-AdminConsole -> CloudSection -> Tree .. /configs
And also check the CloudSection -> Graph showing the active status on the nodes. That indicates that everything is good.
Now start pushing documents into the collection
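As an illustration of that last step, documents can be pushed with a plain JSON update request against the collection (the field names here are hypothetical and depend on your schema):
# Index one test document and commit; it should then be searchable by
# querying fooCollection from any node in the cluster.
curl "http://192.168.56.15:8983/solr/fooCollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id":"1","title_t":"hello solrcloud"}]'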
The wiki below is very helpful for doing the above.
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference

sunspot with jruby

Hi, can Sunspot be used with JRuby? Also, here are my app details:
jruby-1.6.1
rails 3.0.7
When I install all the gems and run rake sunspot:solr:start, it gives me the following error:
Gem::LoadError: Could not find RubyGem sunspot (>= 0)
report_activate_error at /Users/dpatel/.rvm/rubies/jruby-1.6.1/lib/ruby/site_ruby/1.8/rubygems.rb:861
activate at /Users/dpatel/.rvm/rubies/jruby-1.6.1/lib/ruby/site_ruby/1.8/rubygems.rb:255
gem at /Users/dpatel/.rvm/rubies/jruby-1.6.1/lib/ruby/site_ruby/1.8/rubygems.rb:1215
(root) at /Users/dpatel/.rvm/gems/jruby-1.6.1#solr/bin/sunspot-solr:18
However, when I run rake sunspot:solr:run, it works fine.
Also, when I search using Model.search it works fine, but when I fire up the Solr web app in the browser and search for something, it does not work.
Can anyone tell me what is happening? I'm new to Sunspot.
-Thanks
Hi, I kind of figured it out. I am on JRuby, and fork is not allowed on JRuby, so rake sunspot:solr:start tries to fork and throws an error, but rake sunspot:solr:run starts Solr in the foreground and works fine. A little painful, but all is well :-)
-D
You already figured out the forking issue, so if you want to stay in a single shell for development and testing, I found these aliases to be particularly useful for running Sunspot in a particular Rails environment and then finding and killing that process when I'm done.
If you keep the default ports:
alias sunspot_run_test="RAILS_ENV=test sunspot-solr run &"
alias sunspot_kill_test="fuser -n tcp 8982 -k"
alias sunspot_run_dev="RAILS_ENV=development sunspot-solr run &"
alias sunspot_kill_dev="fuser -n tcp 8982 -k"
If you change ports, you will need to change the auto-generated sunspot.yml or put a sunspot.rb in config/initializers; you can then add a -p {$port_num} before the & in the run aliases and change the explicit port numbers to {$port_num} in the kill aliases.
As Vlad mentioned, it's hard to know what's going on in the browser from your explanation. One thing that can catch you if you are new to Sunspot is that you need to have an instance running in the dev environment (use the sunspot_run_dev alias) before you try to CRUD anything in your database, or you will get a connection refused error.
See bash aliases not recognized by a bash function: sunspot_rails, jruby, rspec for some more troubleshooting with functions to wrap commands that require sunspot.
For the rake issue:
gem install sunspot -v 1.2.rc4
For "does not work in the browser": what do you mean by "it does not work"?
If no result is returned:
You can debug it by comparing the Solr requests you make manually in script/console with the ones made by the server. By default, the sunspot gem logs requests in 'logs/sunspot-solr-development.log'.
If it's an error: which one?
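For the no-result case, a quick way to compare is to replay a query against Solr directly with curl (a sketch; the port is the default one used in the aliases above, and the path/params are assumptions based on Sunspot's bundled Solr):
# Hit the development Solr instance directly and compare the response with
# the requests logged in sunspot-solr-development.log.
curl "http://localhost:8982/solr/select?q=*:*&rows=5&wt=json"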
