I have two Apache Solr collections: the first contains documents from the past year, and the second contains documents older than one year.
My problem is performing a sorted search across the two collections.
For example, I want to search for data between 300 and 400 days ago, returned in sorted order, and I don't know the most accurate and fastest way to do it.
Create a Collection alias that spans both collections - i.e. give both collection names in the collections list when creating the alias:
/admin/collections?action=CREATEALIAS&name=name&collections=uptopastyear,pastyeartonow
You can then query this collection with a regular range filter:
&fq=datetime:[<timestamp 400 days ago> TO <timestamp 300 days ago>]
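For illustration, assuming the alias is named allyears and today is 2024-01-01 (both hypothetical), the two requests might look like:

/admin/collections?action=CREATEALIAS&name=allyears&collections=uptopastyear,pastyeartonow
/solr/allyears/select?q=*:*&fq=datetime:[2022-11-27T00:00:00Z TO 2023-03-07T00:00:00Z]&sort=datetime asc

The sort parameter works across the alias just as it would on a single collection: the alias fans the query out to both collections and merges the sorted results.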
I am working on a MERN project (for a school), where one of the screens needs to display 3 lists: Grades, Subjects and Topics.
By default the first grade in the grade list is selected, the subject list shows that grade's subjects with the first subject selected, and the topic list shows the topics for that subject.
Grades run from 1 to 10
Each grade has a maximum of 7 subjects
Each subject can have a maximum of 50 topics
So the total data is about 350 records
Now what should my MongoDB data model look like?
Three collections, one for each list?
Or just one combined collection, since the record count is very low?
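For illustration, if you go with a single combined collection, each grade could be one document with its subjects and topics embedded (all field names here are hypothetical):

{
  "grade": 1,
  "subjects": [
    { "name": "Mathematics", "topics": ["Addition", "Subtraction"] },
    { "name": "Science", "topics": ["Plants", "Animals"] }
  ]
}

With at most 10 such documents, a single find() loads everything the screen needs, and selection for the three lists can be driven entirely from client-side state.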
I have a MongoDB database storing documents with a publication date and an end date. For my queries I need to create an index on both of those dates.
My collection is about 17GB in size, for a total of 6 million documents.
When I create my index, the index size is about 600MB ... but when I use it intensively, the size goes up to 70GB ... about four times the size of the documents themselves oO
Am I doing something wrong? Are there special considerations with date fields? (I only have this problem with date indexes.)
Note: I suspected my dates were "too precise" to index well, so I rounded them to the nearest hour ... without any decrease in index size.
Okay, I've found the reason for this really strange bug. I'm putting it here since it can happen to others.
I'm using moment.js for managing dates in Node.js
I'm using the MongoDB driver directly (not Mongoose)
(I forgot to mention this in my question:) my intensive scripts update those dates.
I was updating the dates with a moment object instead of a plain JavaScript Date. MongoDB then stores the whole moment object and, I think, tries to index every subpath of it, which is responsible for this huge index size.
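For illustration, the fix is a one-liner: convert the moment object to a plain Date before writing it. The collection and field names below are hypothetical:

const moment = require('moment');

// BAD: stores the entire moment object as a subdocument, bloating the index
await db.collection('articles').updateOne(
  { _id: articleId },
  { $set: { endDate: moment().add(30, 'days') } }
);

// GOOD: .toDate() yields a plain Date, stored as a compact BSON date
await db.collection('articles').updateOne(
  { _id: articleId },
  { $set: { endDate: moment().add(30, 'days').toDate() } }
);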
I have a sheet "RawCount" with Google Form results that will accumulate over time (people will make entries each week or as their raw number changes).
I need a way to compile the data to obtain the most recent entry for each individual who has entered data via the form:
This data will accumulate with new entries over the period of eight months from up to 100 or more different people.
I would like to sum the most recent entries for each individual onto another tab in the same Google Sheet that contains a scorecard.
Thanks for any help you can offer. I think I've sprained my brain on this.
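A minimal sketch of one approach, assuming the form writes a timestamp in column A, the person's name in column B, and their number in column C (a hypothetical layout): sort by timestamp descending, then deduplicate on the name column so only each person's latest row survives.

=SORTN(SORT(RawCount!A2:C, 1, FALSE), 9^9, 2, 2, TRUE)

Here SORT orders rows newest-first, and SORTN with display_ties_mode 2 keeps the first row per distinct value of the name column; wrapping the third column of that result in SUM would then give the scorecard total.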
I've got a business case where I need to check whether a search query is about businesses,
e.g. q="night clubs new york"
I've got a list of countries, states, cities and regions in my database (3 million+ records), and I've got a list of business categories.
All I want to do is check whether the query contains a business category ("night clubs") and a city, state or country name ("new york"). So I check the number of results returned for the query below; if I get numResults of 2, this is a business query, and I then query my Solr index to search for businesses.
query: places_ss:(night clubs new york) OR categories_ss:(night clubs new york)
Speed question: how should I store the list of cities, states and countries in Solr to get maximum search speed?
Have one document with id:places and put the distinct cities, states and countries in one array places_ss.
Have multiple documents with different ids, with 100,000 place names per document in an array.
Have a document (or multiple documents) with a place_s string field (not an array), each place separated by a space and each space within a place name replaced by an underscore, e.g. "new york" becomes "new_york".
At query time I would then generate multiple combinations of "night clubs new york",
e.g. night, night_clubs, night_clubs_new, night_clubs_new_york, clubs_new, clubs_new_york, new_york, york, and query for the place.
Would it be a good idea to have a separate core just for these place documents, to increase speed?
Is this a good solution?
Document organisation:
It is better to have one document per place, with:
- location
- activity
- other fields as needed
location
You should save your location like this:
country:state:city:suburb... so that you can search for usa:new york:new york* or ::new york*.
There is no need for underscores - avoid them.
activity
The activity (business category) should be stored in a separate field, for both precision and speed of the search.
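For illustration, a place document under this scheme might look like (field names hypothetical):

{ "id": "loc-42", "location_s": "usa:new york:new york:manhattan", "activity_s": "night clubs" }

One way to match on the leading components without worrying about escaping the spaces and colons is Solr's prefix query parser:

fq={!prefix f=location_s v='usa:new york:new york'}

A match anywhere in the path (the ::new york case) would instead need a wildcard query, which is slower, especially with a leading wildcard.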
I have an example schema like:
id:1,date:2012-05-01,parent:p1
id:1,date:2012-05-01,parent:p2
id:1,date:2012-05-01,parent:p3
id:1,date:2012-05-02,parent:p1
id:1,date:2012-05-02,parent:p4
I would like to perform a range query on "date" and know how many new/unique parents occurred each day. In other words, I would like to see how many NEW parents were added over time. For the given data the output should look like:
2012-04-30:0 (no parents existed at that time)
2012-05-01:3 (because three new parents occurred on 2012-05-01: p1, p2, p3)
2012-05-02:4 (the 3 parents from 2012-05-01 plus 1 new unique parent, p4, on 2012-05-02, giving a total of 4)
2012-05-03:4 (no new parent was added this day...)
Is this kind of query even possible in Solr?
Yes, this should be fairly simple if I understand your question correctly. Adding something like
fq=date:[2012-05-05T00:00:00Z TO 2012-05-06T00:00:00Z]
to your query will fetch all documents with a date between 5 May and 6 May. Make sure to store your dates in ISO 8601 format.
For more, check out the date examples here: http://wiki.apache.org/solr/SolrQuerySyntax
EDIT: I understand your question better now - you're looking for "field collapsing" (result grouping).
Try
&group=true&group.field=parent&group.limit=1
and count the number of documents returned.
If you want them with values for each date, you'll want to facet by date:
&facet=true&facet.field=date
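Putting the two together, a sketch of a per-day request (core name assumed, and rows=0 since only the count matters):

/solr/select?q=*:*&fq=date:[* TO 2012-05-02T23:59:59Z]&group=true&group.field=parent&group.ngroups=true&rows=0

group.ngroups=true makes Solr report the number of distinct groups - i.e. the number of distinct parents seen up to that day - so one such query per day in the range yields the cumulative counts you describe.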