I am having a problem exporting a collection from Mongo Atlas to my local machine. I have tried several different formats including this one, which I found in the official Atlas documentation on importing and exporting data.
First I log into my Atlas like so:
mongosh "mongodb+srv://cluster0.oyvrw.mongodb.net/dbname" --username uname
Then I try the command from the official docs:
mongoexport --uri mongodb+srv://uname:password@cluster0.oyvrw.mongodb.net/dbname --collection colname --type json --out cats.json
I have looked around at other similar questions and tried everything I can find online without success. One suggestion was not to run the command from the mongo shell but from the regular command line, but this does not work either.
It seems like it should be easier to get a collection out of Atlas to JSON. Any help or suggestions are much appreciated. Thanks!
For anyone facing this error: mongoexport is not a mongosh command. It must be run from the system shell.
However, mongoexport is part of the MongoDB Database Tools (mongodb-database-tools), which have been released separately from the server since MongoDB 4.4. As a result, running mongoexport in the system shell fails with "command not found" if the installed MongoDB version is 4.4 or later and the tools have not been installed on their own.
To solve this you can install the database tools using homebrew:
brew install mongodb/brew/mongodb-database-tools
Of course, make sure you have Homebrew installed first. If not, a quick Google search will help.
Then the following command should perform the export:
mongoexport --uri mongodb+srv://<username>:<password>@cluster0.oyvrw.mongodb.net/<dbName> --collection <collectionName> --type json --out /Users/macuser/desktop/exportBU.json
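One thing worth checking with the --uri form: if the username or password contains characters such as @, : or /, they have to be percent-encoded inside the connection string, otherwise the URI is split in the wrong place. Below is a minimal Python sketch of assembling the same command with encoded credentials. The helper name build_mongoexport_args and the sample password are made up for illustration; the host, database and collection are the placeholders from the question.

```python
from urllib.parse import quote


def build_mongoexport_args(username, password, host, database, collection, out_path):
    """Return an argv list for mongoexport with percent-encoded credentials."""
    # safe="" forces every reserved character (including "/") to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    uri = "mongodb+srv://{}:{}@{}/{}".format(user, pwd, host, database)
    return [
        "mongoexport",
        "--uri", uri,
        "--collection", collection,
        "--type", "json",
        "--out", out_path,
    ]


# Hypothetical password containing "@" and ":" to show the encoding.
args = build_mongoexport_args(
    "uname", "p@ss:word", "cluster0.oyvrw.mongodb.net", "dbname", "colname", "cats.json"
)
print(" ".join(args))
```

The returned list can be passed to subprocess.run(args) from a regular (non-mongosh) environment where mongoexport is installed.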
Hope that helps anyone having similar problems getting data in/out of MongoDB.
I've been studying "Kubernetes Up and Running" by Hightower et al (first edition) Chapter 13 where they discussed creating a Reliable MySQL Singleton (Since I just discovered that there is a second edition, I guess I'll be buying it soon).
Using their MySQL reliable singleton example as a model, I've been looking for some sample YAML files to make a similar deployment with Microsoft SQL Server (Express) on Docker Desktop for Kubernetes.
Apparently I need YAML files to deploy
Persistent Volume
Volume claim (should this be NFS?)
SQL Server (Express edition) replica set (in spite of the fact that this is just a singleton).
I've tried this example but I'm confused because it does not contain a persistent volume & claim and it does not work. I get the error
Error: unable to recognize "sqlserver.yml": no matches for kind "Deployment" in version "apps/v1beta1"
Can someone please point me to some sample YAML files that are not Azure specific that will work on Docker Desktop Kubernetes for Windows 10? After debugging my application, I'll want to deploy this to Azure (AKS).
Wed Jul 15 2020 Update
I left out the "-n namespace" for the helm install command (possibly because I'm using Helm and you are using helm v2?).
That install command still did not work. Then I did a
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
Now this command works:
helm install todo-app-database stable/mssql-linux
Progress!
When I do a "k get pods" I see that my todo-app-mssql-linux database is in the pending state. So I did a
kubectl get events
and I see
Warning FailedScheduling pod/todo-app-database-mssql-linux-8668d9b88c-lsh5l 0/1 nodes are available: 1 Insufficient memory.
I've been google searching for "Kubernetes insufficient memory" and can find no match.
I suspect this is a problem specific to "Docker Desktop Kubernetes".
When I look at the output for
helm -n ns-todolistdemo template todo-app-database stable/mssql-linux
I see the deployment is asking for 2Gi. (Interesting: when I use the template command, the "-n ns-todolistdemo" does not cause an error like it does with the install command).
So I do
kubectl describe deployment todo-app-database-mssql-linux >todo-app-database-mssql-linux.yaml
I edit the yaml file to change 2Gi to 1Gi.
kubectl apply -f todo-app-database-mssql-linux.yaml
I get this error:
error: error parsing todo-app-database-mssql-linux.yaml: error converting YAML to JSON: yaml: line 9: mapping values are not allowed in this context
Hmm... that did not work. I try delete:
kubectl delete deployment todo-app-database-mssql-linux
kubectl create -f todo-app-database-mssql-linux.yaml
I get this error:
error: error validating "todo-app-database-mssql-linux.yaml": error validating data: invalid object to validate; if you choose to ignore these errors, turn validation off with --validate=false
So I try apply:
kubectl apply -f todo-app-database-mssql-linux.yaml
Same error!
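A likely cause of both errors: kubectl describe prints a human-readable summary, not a manifest, so the saved file is not something apply or create can parse. kubectl get deployment todo-app-database-mssql-linux -o json (or -o yaml) emits a manifest that can be edited and re-applied. Here is a rough Python sketch of the edit step, assuming the JSON output has been loaded into a dict. The function name set_memory_request and the trimmed sample manifest are illustrative only and are not the chart's real output.

```python
import copy


def set_memory_request(deployment, memory):
    """Return a copy of a Deployment manifest with every container's
    memory request set to `memory` (for example "1Gi")."""
    patched = copy.deepcopy(deployment)
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        resources = container.setdefault("resources", {})
        resources.setdefault("requests", {})["memory"] = memory
    return patched


# Trimmed, hypothetical manifest standing in for `kubectl get ... -o json`.
sample = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "todo-app-database-mssql-linux"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "mssql", "resources": {"requests": {"memory": "2Gi"}}}
                ]
            }
        }
    },
}

patched = set_memory_request(sample, "1Gi")
print(patched["spec"]["template"]["spec"]["containers"][0]["resources"])
```

Writing the patched dict out with json.dump and running kubectl apply -f on that file should work, since kubectl accepts JSON manifests as well as YAML.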
Shucks.... Is there a way to adjust the memory allocation for Docker Desktop?
Thank you
Siegfried
Short answer
https://github.com/helm/charts/blob/master/stable/mssql-linux/templates/pvc-master.yaml
Detailed Answer
Docker Desktop already comes with a default StorageClass (shown in the listing below).
This storage class automatically provisions a PV whenever you create a PVC.
If you have a YAML definition of a PVC (persistent volume claim), just leave out storageClassName so that the default class is used.
k get storageclass
NAME PROVISIONER AGE
hostpath (default) docker.io/hostpath 11d
This is fair enough, since the Docker Desktop cluster is a single-node cluster. If your DB crashes and the cluster restarts it, it will not move to another node, simply because there is only one node :)
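To make the "use the default StorageClass" point concrete, here is a minimal Python sketch that builds a PVC manifest with no storageClassName field at all. The claim name mssql-data and the 1Gi size are placeholders chosen for illustration, and make_pvc is a hypothetical helper. Note that omitting the field is not the same as setting it to an empty string: an explicit "" asks for a volume with no class and skips the default.

```python
import json


def make_pvc(name, size, namespace="default"):
    """Build a PersistentVolumeClaim manifest that omits storageClassName,
    so the cluster's default StorageClass provisions the volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# kubectl accepts JSON manifests, so the output can be saved and applied.
print(json.dumps(make_pvc("mssql-data", "1Gi"), indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it should create the claim, with the hostpath provisioner supplying the volume.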
Now, should you write the PVC YAML from scratch?
No, you don't need to, because Helm should be your best friend.
(I explain below why you should use Helm, even without a deep learning curve.)
Fortunately, the community provides a chart called stable/mssql-linux.
Let's run it together:
helm -n <your-namespace> install todo-app-database stable/mssql-linux
# helm -n <namespace> install <release-name> <chart-name-from-community>
If you want to check the YAML (namely PVC) that Helm computed, you can run template instead of install
helm -n <your-namespace> template todo-app-database stable/mssql-linux
Why do I give you the answer with Helm?
Writing YAML from scratch means reinventing the wheel when others have already built it.
The most efficient way is to reuse what the community has prepared for you.
However, you may ask: how can I reuse what others have done?
That's where Helm comes in.
Helm acts as an installer for any application on top of Kubernetes, regardless of how much YAML your app requires.
Install it now and hit the ground running:
choco install kubernetes-helm
I am getting an error while installing the features.
error executing command: error restarting bundles
Initially, it worked fine for some of the features but then it suddenly started throwing this error.
Any suggestion on this would be appreciated.
You can resolve the issue by cleaning Karaf. Run the following command:
./karaf clean
When OpenDaylight moved to karaf 4 there were problems identified with
installing features in the karaf shell one after the other. I think
you are hitting that problem.
You can try listing all the features you want in the featuresBoot
variable of the etc/org.apache.karaf.features.cfg file.
You may also have some success trying to install on the karaf shell with
the following flag --no-auto-refresh, like this:
feature:install --no-auto-refresh odl-l2switch-switch
Also, as sridhar reddy noted, if you use "karaf clean" to start karaf it will
wipe the data/ folder (and more) so that old loaded features will not come
back in on startup and you will start "clean".
I have a SOLR / Zookeeper / Kafka setup. Each on separate VMs.
I have successfully run this all using two SOLR 4.9 vms (Ubuntu)
Now I wish to build two SOLR 5.4 vms and get it all working again.
Essentially, "Upgrade by Replacement"
I have "hacked" a solution to my problem but that makes me very nervous.
To begin, Zookeeper is running. I turn off my SOLR 4.9 vms and delete the config out of Zookeeper (not necessarily in that order... ;-) )
Now, I start up my 'solr5' VM (and SOLR in cloud mode) where I have installed SOLR 5.4 according to the "Production Install" instructions on the SOLR Wiki. I have also installed 5.4 on 'solr6', but it's not running yet.
I issue this command on the 'solr5' machine:
/opt/solr/bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1
and I get the following output:
Connecting to ZooKeeper at 192.168.56.5,192.168.56.6,192.168.56.7/solr ...
Re-using existing configuration directory statdx
Creating new collection 'fooCollection' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=fooCollection
{
"responseHeader":{
"status":0,
"QTime":3822},
"success":{"":{
"responseHeader":{
"status":0,
"QTime":3640},
"core":"fooCollection_shard1_replica1"}}}
Everything is working great. I turn on my microservice, and it pumps all my SOLR docs from Kafka into 'solr5'.
Now, I want to add 'solr6' to the collection. I can't find a way to do this besides my hack (which I'll describe later).
The command I used before to create a collection, errors out with the observation that my collection already exists.
There seems to be no zkcli.sh or solr command that will do what I want. None of the api commands seem to do this either.
Is there not a simple way to say to (SOLR? Zookeeper?) I want to add another machine to my SOLR nodes, please configure it like the first (solr5) and begin replicating data?
Maybe I should have had both machines running when I issued the create command?
I'd be grateful for some "approved" method for doing this since I need to come up with a "solution" to do the same kind of approach in Prod every time there is a need to upgrade SOLR.
Now for my hack. Keep in mind I'm now two days trying to find clear docs on this. No flames please, I totally get that this is not the way to do things. At least, I HOPE this is not the way to do things...
Copy the fooCollection directory from where the create collection
command put it on 'solr5' (which was
/opt/solr/server/solr/fooCollection_shard1_replica1) to the same
location on my 'solr6' VM.
Make what changes seem logical to the collection directory name (becomes
fooCollection_shard1_replica2)
Make what changes seem logical in the core.properties file:
For reference, here's the core.properties file that was created by the create command.
#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica1
shard=shard1
collection=fooCollection
coreNodeName=core_node1
Here is what the file looked like on 'solr6' when I was done hacking.
#Written by CorePropertiesLocator
#Wed Jan 20 18:59:08 UTC 2016
numShards=1
name=fooCollection_shard1_replica2
shard=shard1
collection=fooCollection
coreNodeName=core_node2
When I did this and rebooted 'solr6' everything appeared golden. The "Cloud" web page looked right in the Admin web page - and when I added documents to 'solr5' they were available in 'solr6' if I hit it directly from the Admin web pages.
I would be grateful if someone can tell me how to achieve this without a hack like this... or if this IS the right way to do this...
=============================
In answer to @Mani and the suggested procedure
Thanks Mani - I did try this very carefully following your steps.
In the end, I get this output from the collection status query:
john@solr6:/opt/solr$ ./bin/solr healthcheck -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4 -c fooCollection
{
"collection":"fooCollection",
"status":"healthy",
"numDocs":0,
"numShards":1,
"shards":[{
"shard":"shard1",
"status":"healthy",
"replicas":[{
"name":"core_node1",
"url":"http://192.168.56.15:8983/solr/fooCollection_shard1_replica1/",
"numDocs":0,
"status":"active",
"uptime":"0 days, 0 hours, 6 minutes, 24 seconds",
"memory":"31 MB (%6.3) of 490.7 MB",
"leader":true}]}]}
This is the kind of result I've been finding in my experimentation all along. The core will get created on one of the SOLR VM's (the one I issue the command line to create the collection on) but I don't get anything created on the other VM -- which, based on your steps below, I believe you also thought should occur, yes?
Also, I'll note for anyone reading that in 5.4, the command is "healthcheck" and not healthstatus. The command line shows you immediately, so it's no big deal.
===============
Update 1 :: Manual add of 2nd core
If I go to the other VM and manually add the following:
sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2
sudo mkdir /opt/solr/server/solr/fooCollection_shard1_replica2/data
nano /opt/solr/server/solr/fooCollection_shard1_replica2/core.properties
(in here I add only collection=fooCollection and then save/close)
Then I reboot my SOLR server on that same VM:
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr
I will find a second node magically appearing in my Admin console. It will be a "follower" (I.E. not the leader) and both will be branching off "shard1" in the cloud UI.
I don't know if this is "the way" but it's the only way I've found so far. I'm going to reproduce to that point and try with the Admin UI and see what I get. That would be a little easier for my IT guys when the time comes - if it works.
===============
Update 2 :: Slight modification of create command
@Mani -- I believe I have success following your steps - and like many things, it's simple once you understand.
I reset everything (deleted directories, cleared out Zookeeper with rmr /solr) and redid everything from scratch.
I changed the "create" command slightly thus:
./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 2
Note the "replicationFactor 2" rather than 1.
Suddenly I did indeed have cores on both VMs.
A couple of notes:
I found that I couldn't get a happy result from the status call just by starting the SOLR 5.4 servers in Cloud mode with the Zookeeper IP addresses. The "node" in Zookeeper was not yet created.
The create command also failed at that point.
The way I found around this was to use the zkcli.sh to load the configs like this:
sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir /home/john/conf/ -confname fooCollection -z 192.168.56.5/solr
When I checked Zookeeper immediately after running this command, there was a /solr/configs/fooCollection "path".
NOW the create command works and I assume that if I had wanted to override the configs, I could have done so at that point although I haven't tried.
I'm not positive at what point, but it seems I needed to reboot the SOLR servers (probably after the create command) in order to find everything on status etc... I may be misremembering that because I've been through it so many times. If in doubt after the create command, try a reboot of the servers. (The Zookeeper hosts passed to -z can be IP addresses or names that resolve correctly.)
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr
sudo /opt/solr/bin/solr restart -c -z 192.168.56.5,192.168.56.6,192.168.56.7/solr
After doing these slight modifications to @Mani's recommended procedure, I get a Leader and a "follower" each on different VM's - in the /opt/solr/server/solr directory (fooCollection in this case) and I was able to send data in to one and search the other via the Admin console hitting the IP addresses.
=============
Variations
One thing anyone reading this may want to try is simply making another "node" in Zookeeper (solr5_4 for example).
I tried this and it works like a charm. Everywhere you see the /solr chroot associated with the Zookeeper ensemble, you could replace it with /solr5_4. This would allow the older SOLR VM's to keep functioning in Prod while you build out your new SOLR 5.4 "environment" and the same Zookeeper VM's could be used for both -- because a different chroot should guarantee no interaction or overlap.
Again, the "node" in Zookeeper won't be created until you do the config upload, but you need to start your SOLR process like this or you'd be in the wrong context later on. Note the "solr5_4" as the chroot.
sudo /opt/solr/bin/solr restart -c -z zoo1,zoo2,zoo3/solr5_4
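To make the chroot convention concrete: the chroot is appended once, after the last host of the comma-separated ensemble list, not after each host. A small Python sketch (the helper name zk_connect_string is hypothetical) that builds the value passed to -z:

```python
def zk_connect_string(hosts, chroot=""):
    """Join ZooKeeper hosts with commas and append the chroot once at the end."""
    joined = ",".join(hosts)
    if not chroot:
        return joined
    # Normalise so that "solr5_4", "/solr5_4" and "/solr5_4/" all behave the same.
    return joined + "/" + chroot.strip("/")


old_env = zk_connect_string(["zoo1", "zoo2", "zoo3"], "/solr")
new_env = zk_connect_string(["zoo1", "zoo2", "zoo3"], "/solr5_4")
print(old_env)
print(new_env)
```

Keeping the two strings side by side in a deployment script makes it harder to start a 5.4 node against the old /solr chroot by accident.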
Once done with testing, the solr5_4 "environment" becomes what matters for Prod and the SOLR 4.x VM's and Zookeeper "node" of solr can be removed. It should be a fairly simple matter to point a load balancer at the new SOLR VM's and do a switchover without users really even noticing.
This strategy will work for SOLR 6, 6.5, 7, and so on.
This command also worked to add the collections/cores. However, the solr server had to be running first.
http://192.168.56.16:8983/solr/admin/collections?action=CREATE&name=fooCollection&numShards=1&replicationFactor=2&collection.configName=fooCollection
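The same URL can be assembled programmatically. Here is a minimal Python sketch (collections_api_url is an illustrative helper, not part of Solr) that rebuilds the CREATE call above. It also builds an ADDREPLICA call, which is the Collections API action for adding a replica to a collection that already exists; that second call is untested against this particular 5.4 setup, so treat it as an assumption to verify.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Solr Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(query)


base = "http://192.168.56.16:8983/solr"

# Same request as the CREATE URL shown above.
create_url = collections_api_url(base, "CREATE", {
    "name": "fooCollection",
    "numShards": 1,
    "replicationFactor": 2,
    "collection.configName": "fooCollection",
})

# Assumption: add one more replica of shard1 to the existing collection.
add_replica_url = collections_api_url(base, "ADDREPLICA", {
    "collection": "fooCollection",
    "shard": "shard1",
})

print(create_url)
print(add_replica_url)
```

Either URL can be opened in a browser or fetched with curl, as long as the Solr node behind the base URL is already running in cloud mode.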
==================
Use as Upgrade By Replacement
In case it's not obvious, this technique (especially if using the "new" chroot in Zookeeper of something like /solr5_4 or similar) gives you the luxury of leaving your older version of SOLR running for as long as you want, allowing a re-indexing of all your data to take days if needed.
I haven't tried, but I'm guessing a backup of the index could be dropped into the new machines as well.
I just wanted readers to understand that this was an approach intended to make upgrades really low stress and straightforward. (Don't need to upgrade in place, just build new VMs and install latest version of SOLR.)
This would allow the switch-over to occur without affecting prod until you're ready to drop the hammer and re-direct your load balancer at the new SOLR ip addresses (Which you will have already tested of course...)
The one assumption here is that you have the resources to bring up a set of SOLR VMs or physical servers to match whatever you already have in Production. Obviously, if you're resource-limited to only the boxes or VMs you have, upgrade-in-place may be your only option.
This is how I would do it. I am assuming that you have the luxury of downtime and the ability to completely reindex the documents, since you are essentially upgrading from 4.9 to 5.4.
Stop the 4.9 solr nodes and uninstall solr.
Remove the config from zk nodes using zkcli.sh with the clear command.
Install Solr on both the solr5 and solr6 VMs.
Start both the solr nodes and make sure both can talk to zk. =>
On solr5 vm ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
On solr6 vm ./bin/solr start -c -z zk1:port1,zk2:port1,zk3:port1
Verify the status of Solrcloud using ./bin/solr status => this should return liveNodes as 2
Now create the fooCollection using the Collections API from any one of the Solr nodes. This uploads the configset to Zookeeper and also creates the collection =>
./bin/solr create -c fooCollection -d /home/john/conf -shards 1 -replicationFactor 1
Verify the healthstatus of the fooCollection =>
./bin/solr healthstatus -z zk1:port1,zk2:port1,zk3:port1 -c fooCollection
Now verify the config is present in Zookeeper by checking Solr-AdminConsole -> CloudSection -> Tree .. /configs
And also check the CloudSection -> Graph showing the active status on the nodes. That indicates that everything is good.
Now start pushing documents into the collection
The below wiki is very helpful to do the above.
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
I would like to run a SOLR server on Elastic Beanstalk, but I cannot find much about that on the web.
It must be possible somehow, because some people are already doing it (e.g. https://forums.aws.amazon.com/thread.jspa?threadID=91276).
Any ideas how I could do that?
Well, somehow I can upload the Solr WAR file into the environment, but then it gets complicated.
Where do I put the config files and the index directory, so that each instance can reach it?
EDIT: Please keep in mind that this answer is from 2013. The products mentioned here have likely evolved. I have updated the documentation link to reflect changes in the solr clustering wiki. I encourage you to continue your research after reading this information.
ORIGINAL:
It only really makes sense to run solr on beanstalk instances if you are planning to only ever use the single server deploy. The minute that you want to scale your app you will need to configure your beanstalk environment to either create a solr cluster or move to something like CloudSearch. If you are unfamiliar with ec2 lifecycles and solr deployments then CloudSearch will almost certainly save you time (read money).
If you do want to run solr on a single instance then you can use rake to launch it by adding a file to your local repo named .ebextensions/solr.config with the following contents:
container_commands:
  01create_post_dir:
    command: "mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post"
    ignoreErrors: true
  02killjava:
    command: "killall java"
    test: "ps uax | grep java | grep root"
    ignoreErrors: true
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_start_solr.sh":
    mode: "755"
    owner: "root"
    group: "root"
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production bundle exec rake sunspot:solr:start" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake db:seed" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake sunspot:reindex" $EB_CONFIG_APP_USER
Please keep in mind that this will cause chaos if you are using autoscaling.