I have been tasked with building a test index of about 100 million small records using Solr. I have been running this on my laptop since some time yesterday, adding 10 million records at a time and running queries at the major "milestones" (10m, 20m, etc.). I have reached about 70 million records, and all is going well. The laptop specs are as follows:
Quad Core i7
8 GB RAM
Windows 7
Tomcat 7 + latest version of Solr.
As a test, I decided to see what happens when I run a similar workload on my home workstation (dual-processor quad-core Xeon, 12 GB RAM, 2x 10K RPM disks in RAID 0 for the index, Windows 2008 R2, same software). The only difference is that I am now using multiple cores: I used the same schema and conf directory from the laptop and modified solr.xml.
On the laptop, at about 70 million records, I am getting results in less than 500 ms. That is with 150 queries, of which 100 are one-word queries and 50 are two-word queries. Only one field is queried (the name field). All good. On my workstation, using multiple cores and the following query string, I am getting times in excess of 4-5 seconds!
http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3&q=Name:Test Name
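For reference, here is a minimal sketch of how the distributed query string above could be assembled with proper URL encoding. The helper name `build_shard_query` is hypothetical, and quoting the two-word value as a phrase is an assumption about the intent: as far as I know, with the standard query parser an unquoted `Name:Test Name` ties only `Test` to the Name field and sends the second word to the default field.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_shard_query(host, cores, query):
    """Build a Solr select URL that fans the query out to every listed core.

    The request is sent to the first core, which merges the shard responses.
    """
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = urlencode({"shards": shards, "q": query})
    return f"http://{host}/solr/{cores[0]}/select?{params}"


# Hypothetical usage with the four cores from the question.
# The phrase quotes are an assumption, not part of the original query.
url = build_shard_query(
    "localhost:8080",
    ["core0", "core1", "core2", "core3"],
    'Name:"Test Name"',
)

# Round-trip check: decoding the URL gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

Encoding the space and the quotes avoids depending on how the browser or servlet container treats a raw space in the query string.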
This is a new index I have generated: I loop from 0 to 100,000,000, and every time I hit i % 10000 == 0 I add the documents to a Solr core. Each time I hit that condition, I increment a commitID; when commitID % 4 == 0 the batch goes to core0, when it is 1 it goes to core1, and so on.
I am pretty sure it's a config issue somewhere, but I just want to make sure: should I be expecting this to be a lot faster? Both processors (laptop and workstation) are in the 2.2 GHz range, and both are recent enough architectures (Nehalem on the workstation, a 2010 i7 on the laptop). So, any ideas what I should be looking at?
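The batching scheme described above can be sketched roughly as follows. This is an illustration only, not the actual indexing code: `route_batches` and `docs_per_core` are hypothetical names, and it assumes the first batch lands on core0 (whether it is core0 or core1 depends on when commitID is incremented).

```python
from collections import Counter


def route_batches(total_docs, batch_size, num_cores):
    """Yield (core_name, first_id, last_id) per batch, round-robin over cores."""
    commit_id = 0
    for start in range(0, total_docs, batch_size):
        end = min(start + batch_size, total_docs)
        # commit_id % num_cores picks core0, core1, ... in turn.
        yield f"core{commit_id % num_cores}", start, end - 1
        commit_id += 1


def docs_per_core(total_docs, batch_size, num_cores):
    """Count how many documents each core receives under this routing."""
    counts = Counter()
    for core, first_id, last_id in route_batches(total_docs, batch_size, num_cores):
        counts[core] += last_id - first_id + 1
    return dict(counts)


# 100,000,000 documents in batches of 10,000 across 4 cores.
distribution = docs_per_core(100_000_000, 10_000, 4)
print(distribution)
```

With these numbers there are 10,000 batches, so each of the four cores ends up with 2,500 batches, i.e. 25,000,000 documents.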
We're running SQL Server 2019 on a Google Cloud Linux VM (2 vCPU, 4 GB), using the mcr.microsoft.com/mssql/server:latest image. It might be of interest that it's the Enterprise edition. I understand that 2 vCPU and 4 GB is not really sufficient for any kind of SQL Server workload, but it's currently just a PoC/test environment, and that shouldn't be relevant since no queries are executed anyway (maybe?).
The instance hosts about 10 databases, all tiny (a few MB each), and the databases aren't queried at all. The only external queries executed are SELECT 1 health checks (from .NET Core apps).
After a few weeks of a flat gcloud dashboard (CPU ~1% at all times), a few days ago the CPU started to behave oddly. It sticks to 60% most of the time (+/- a few %). The pattern is visible in the image below: it stays at ~60% for 30 minutes, then drops for a few minutes, and goes back up to 60%. It once went higher, but most of the time it's a steady 60%:
When I ssh to the instance, top looks like this:
Additional information:
MAXDOP is set to 2
Memory is not limited (a mistake on my part; should it be ~3 GB, considering the 4 GB VM?)
Other config: default
I've run the query shown here: techcommunity.microsoft.com and got results that look like an old 2008 issue: constant resource monitor values of 32-33.
How can I find out what's eating the CPU?
I'm just curious in which direction to look... Is it resource monitor-related? Linux? Docker? Bad configuration? Too weak VM?
Edit: the reason I haven't increased the VM resources or set RAM limits so far is that it took a few weeks for this issue to occur, and I don't want to make it disappear by restarting the VM. We have 2 more VMs with the same setup that are still behaving as expected (~1% CPU).
Edit 2: I added monitoring snapshots of the last 48 hours for this VM and the 2 other VMs running a similar workload (below).
Column 1: VM1 - topic of the discussion
Column 2: VM2 - behaves as expected
Column 3: VM3 - behaves as expected
Memory footprint of all instances is the same:
VIRT = 12.5g
RES = 2.5-3.0g
%MEM = 65%
The setup:
A CouchDB 2.0 running in Docker on a Raspberry Pi 3
A Node application that uses PouchDB, also in Docker on the same Pi 3
The scenario:
At any given moment, the CouchDB has at most 4 databases with a total of about 60 documents
The Node application purges (using PouchDB's destroy) and recreates these databases periodically (some of them every two seconds, others every 15 minutes)
The databases are always recreated with the newest entries
The reason for purging the databases instead of deleting their documents is that I'd otherwise have a huge number of deleted documents, and my web client can't handle syncing all of them
The problem:
The file var/lib/couchdb/_dbs.couch keeps growing and never shrinks. Last time I left it alone for three weeks, and it grew to 37 GB. Fauxton showed that the CouchDB only contains these up to 60 documents, but this file still keeps growing until it fills all the available space
What I tried:
running everything on an x86 machine (OS X)
running couchdb without docker (because of this info)
using couchdb 2.1
running compaction manually (which didn't do anything)
googling for about 3 days now
Whatever I do, I always get the same result: _dbs.couch keeps growing. I also wasn't really able to find out what that file's purpose is. Googling that specific filename yields only two pages of search results, none of which are specific.
The only thing I can currently do is manually delete this file from time to time and restart the Docker container. That does delete all my databases, but that is not a problem, as the Node application recreates them soon after.
The _dbs database is a meta-database. It records the locations of all the shards of your clustered databases, but since it's a CouchDB database too (though not a sharded one), it also needs compacting from time to time.
Try:
curl localhost:5986/_dbs/_compact -XPOST -Hcontent-type:application/json
You can enable the compaction daemon to do this for you; we enable it by default in the recent 2.1.0 release.
Add this to the end of your local.ini file and restart CouchDB:
[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
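If you would rather trigger the manual compaction from a script than from curl, the same POST can be sketched in Python with the standard library. The function names below are hypothetical, and the sketch assumes the node-local port 5986 from the curl command above is reachable without authentication.

```python
import urllib.request


def build_compact_request(base_url="http://localhost:5986", db="_dbs"):
    """Build the POST request that asks CouchDB to compact the given database."""
    return urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",  # empty body; the endpoint only needs the POST itself
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def compact(base_url="http://localhost:5986", db="_dbs"):
    """Send the compaction request and return (status code, response body)."""
    request = build_compact_request(base_url, db)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8")


# Example (needs a running CouchDB node, so it is left commented out):
# status, body = compact()
# print(status, body)
```

Running `compact()` on a schedule is only a stopgap for installations where the compaction daemon is not enabled.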
I have a Solr instance running with a couple of cores, each of them holding between 15 and 25 million documents. Normally the size of each core's index on disk is around 30-50 GB, but there is one particular core whose index keeps growing until the hard disk is full (reaching 200 GB and more).
When I look at the other indexes, all files are from the current day, but this one core also keeps files from 4-5 days ago (I guess the data is duplicated on every import).
What could be causing such behavior and what should I look for when debugging it? Thanks.
We are running a vanilla Solr installation (in non-cloud mode).
Each document has about 100 fields, and the document size is ~5 KB.
There are multiple cores, ~20 in a single Solr instance. The total number of documents combined is ~2 million.
During testing, this node gives a peak QPS of ~100. For a modern 8-core, 60 GB machine, this seems really low.
Does anyone with experience of Solr internals know why it is so slow?
Will using the Lucene library directly with a thin server wrapper give a higher QPS?
NopCommerce 2.30 is running on IIS 7.5, and the server has 16 GB RAM with a dual-core processor. I want to implement Solr with NopCommerce.
How many searches per second can Solr handle if the index file is 2 GB or larger?
I recommend this presentation, which should make it easier to find out what your system is capable of.
http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012-presentations#Paddy_Mullen
It describes how to index Wikipedia as a benchmark of the single-machine performance limit.
There is also some information about this on the DataImportHandler page of the Solr wiki.
http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia
From my experience with Solr, I can say that I have a 10 GB index split across 9 cores, I serve 2300 requests per minute, and I get an average response time of 40 ms. This is on two AMD Opteron 2.6 GHz processors.
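To put those numbers in the per-second terms the question asks about, here is a small back-of-the-envelope sketch. It assumes the load is roughly steady and uses Little's law (average requests in flight = arrival rate x average latency); the function names are only illustrative. Note that this is an observed load, not a measured maximum.

```python
def requests_per_second(requests_per_minute):
    """Convert a per-minute request rate to a per-second rate."""
    return requests_per_minute / 60.0


def average_in_flight(rate_per_second, avg_latency_seconds):
    """Little's law: average concurrency = arrival rate * average latency."""
    return rate_per_second * avg_latency_seconds


rate = requests_per_second(2300)              # about 38.3 requests per second
concurrency = average_in_flight(rate, 0.040)  # about 1.5 requests in flight

print(f"{rate:.1f} requests/s, {concurrency:.2f} requests in flight on average")
```

So 2300 requests per minute is roughly 38 searches per second, with only one or two requests being processed at any moment at a 40 ms average response time.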
Maybe you should ask the experts who do this for NopCommerce here: http://nopaccelerate.com/