I cannot find anywhere in your documentation whether "sddocname" is automatically indexed, and therefore whether I can fully rely on the best possible performance when querying by sddocname. Thank you for your answer.
sddocname is indexed. The &model.restrict API (http://docs.vespa.ai/documentation/reference/search-api-reference.html#model.restrict) uses sddocname internally.
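For illustration, restricting a query to a hypothetical document type called music (the value is matched against sddocname; hostname and port are placeholders) would look something like this:

    http://hostname:8080/search/?query=foo&model.restrict=music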
The official Solr documentation points to separating Simplified and Traditional Chinese by using different tokenizers. I wonder whether people use the ICU Transform Filter for Traditional <-> Simplified conversion in order to have one single field for both variants of Chinese.
At the same time, this conversion seems to be a really hard task, and it does not appear to be a solved problem.
The simple question is: what is the recommended way of indexing Traditional and Simplified Chinese in Solr? It would be really convenient to have a single field for both, but I couldn't find a good success story for that.
The truth is, it is possible. This video shows how you could create a field that handles as many languages as possible, but it looks tricky.
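As a rough sketch of the single-field idea (not taken from the video): assuming the ICU jars from the analysis-extras contrib are on the classpath, a field type that folds Traditional characters to Simplified at both index and query time might look something like this; the field type name text_zh is made up for the example.

    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- ICU tokenizer handles CJK segmentation -->
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <!-- Fold Traditional characters to Simplified so one field serves both -->
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
      </analyzer>
    </fieldType>

Because the same transform also runs on queries, both Traditional and Simplified input should match the same (Simplified) terms in the index.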
The documentation of the ndb.Query class states that it accepts a read_policy option that can be set to EVENTUAL_CONSISTENCY to allow faster queries that might not be strongly consistent, which implies that not using this option would return strongly consistent results.
However, global queries are always eventually consistent. So what does this flag actually do?
You can choose to have an ancestor query, which would normally be strongly consistent, use the eventually consistent policy instead for the stated speed improvement.
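A minimal sketch of what that looks like in practice (the Message model and the Board ancestor key are made up for the example):

    from google.appengine.ext import ndb

    class Message(ndb.Model):
        text = ndb.StringProperty()

    board_key = ndb.Key('Board', 'general')

    # Ancestor query: strongly consistent by default.
    latest = Message.query(ancestor=board_key).fetch(20)

    # Same ancestor query, but opting into the eventual-consistency read
    # policy: results may be slightly stale, in exchange for a faster read.
    latest_fast = Message.query(ancestor=board_key).fetch(
        20, read_policy=ndb.EVENTUAL_CONSISTENCY)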
The old 'db' module docs explain this.
(If you've only ever used NDB, then the DB docs are definitely worth reading - there is a lot more detail on how things work, and how best to make use of the datastore.)
I have a list of ASINs and need to get the corresponding EAN/UPC values.
I am aware this is possible using the AWSECommerceService ItemLookup call. However, my application already uses MWS, and I'd like to avoid using two APIs, two access keys, etc.
The most similar API call in MWS is GetMatchingProduct. However, the returned data does not include an EAN/UPC. I would be astonished if this were impossible with MWS; however, I can't see any way to get the EAN/UPC.
Any suggestions appreciated,
Paul
I don't think there is a call that does what you want. There is a call that does the opposite, if that is of any help: GetMatchingProductFromId will return the ASIN for a given EAN or UPC. Why the result from this call (and from GetMatchingProduct) does not return EANs etc. is beyond me.
If you already have items listed through MWS, the _GET_MERCHANT_LISTINGS_DATA_ report might help.
Just answering this question for my own amusement and because I might need it in the future when I have forgotten I previously looked at this.
Amazon apparently considers the EAN for an ASIN/SellerSKU proprietary information, which is why their standard seller APIs don't return it. This doesn't make a huge amount of sense to me personally, because you would think it would at least return them for your own products (when specifying your own SKU and authentication information).
I've combed the documentation and the MWS forums, and also asked Amazon directly, but it looks like it's not available through the standard APIs.
I've read somewhere that it may be possible via APIs available to associates, but that's not me, so it remains a rumour.
I would like to understand Solr's merge behaviour well. I did some research on the different merge policies, and it seems that TieredMergePolicy is better than the older merge policies (LogByteSizeMergePolicy, etc.). That's why I use it, and it is also the default policy in recent Solr versions.
First, here are some interesting links that I've read to get a better idea of the merge process:
http://java.dzone.com/news/merge-policy-internals-solr
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Based on the official Lucene documentation below, I would like to ask several questions about it:
http://lucene.apache.org/core/3_2_0/api/all/org/apache/lucene/index/TieredMergePolicy.html
Questions
1- In the official documentation, there is a method called setExpungeDeletesPctAllowed(double v). I checked the TieredMergePolicy class in Solr 4.3.0 and didn't find this method; there is another method that looks like it, called setForceMergeDeletesPctAllowed(double v). Are there any differences between the two methods?
2- Are the methods above used only when you do an expungeDeletes or an optimize, or are they also used during a normal merge?
3- I've read that merges between segments are done partly according to the percentage of deleted documents in a segment. By default, this percentage is set to 10%. Is it possible to set this value to 0% to be sure that there are no deleted documents left in the index after merging?
I need to reduce the size of my index without calling the optimize() method, if possible. That's why any information about the merge process would be interesting to me.
Thanks
You appear to be mixing up your documentation. If you are using Lucene 4.3.0, use the documentation for that version (see the correct documentation for TieredMergePolicy in 4.3.0), rather than for version 3.2.0.
Anyway, on these particular questions: See #Lucene-3577
1 - Seems to be mainly a necessary name change, for all intents and purposes.
2 - Firstly, IndexWriter.expungeDeletes no longer exists in 4.3.0. You can use IndexWriter.forceMergeDeletes(), if you must, though it is strongly recommended against, as it is very, very costly. I believe this setting only affects a forceMergeDeletes() call. If you want to favor reclaiming deletions during normal merging, set that in the MergePolicy using TieredMergePolicy.setReclaimDeletesWeight.
3 - The percentage allowed is right there in the method you indicated in your first question. Forcing all the deletions to be merged out when calling forceMergeDeletes() will only make an already very expensive operation that much more expensive, though.
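For reference, here is a rough sketch of how those knobs might be set in solrconfig.xml (Solr 4.x <indexConfig> section); the values are only illustrative, and the property names are simply mapped onto the corresponding TieredMergePolicy setters:

    <indexConfig>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <!-- Weight given to reclaiming deletions during normal merging
             (default 2.0; higher favors segments with many deletes) -->
        <double name="reclaimDeletesWeight">3.0</double>
        <!-- Only consulted by forceMergeDeletes(): segments with at least
             this percentage of deleted documents become merge candidates
             (default 10.0) -->
        <double name="forceMergeDeletesPctAllowed">0.0</double>
      </mergePolicy>
    </indexConfig>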
Just to venture a guess: if you need to save disk space taken up by your index, you'll likely have much more success looking more closely at how much data you are storing in the index. There is not enough information to say for sure, of course, but it seems a likely solution to consider.
It seems there's no such interface.
Do I have to iterate over all the keys to get the count?
What is the design reason for that? Or what limitation prevents implementing this feature?
"There is no way to implement Count more efficiently inside leveldb than outside." states offical issue 113
Looks like there is no better way to do it, except for either iterating through the whole dataset or implementing your own in-application on-write counter.
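For the full-scan approach, a minimal sketch using the plyvel Python bindings (the database path is made up) might look like this:

    import plyvel

    # Open (or create) the LevelDB database; the path is hypothetical.
    db = plyvel.DB('/tmp/example-db', create_if_missing=True)

    # Counting requires a full scan: visit every key exactly once.
    count = 0
    with db.iterator(include_value=False) as it:
        for _ in it:
            count += 1

    print(count)
    db.close()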
Probably, when LevelDB was built, the original authors did not need this API.
Sadly, LevelDB does not have an increment API that you could use to keep a count. What you can do right now is read and write a counter key in LevelDB yourself, but this is not thread safe.
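A naive read-modify-write counter along those lines (again sketched with the plyvel bindings; the key name is made up) would be:

    import plyvel

    db = plyvel.DB('/tmp/example-db', create_if_missing=True)
    COUNTER_KEY = b'__doc_count__'  # hypothetical reserved key

    def increment_counter(db, delta=1):
        # Read-modify-write: NOT safe if several threads or processes
        # update the counter concurrently.
        raw = db.get(COUNTER_KEY)
        current = int(raw) if raw is not None else 0
        db.put(COUNTER_KEY, str(current + delta).encode())
        return current + delta

Any synchronization (a lock, or funnelling all writes through a single process) has to come from the application itself.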
Maybe you could have a look at Redis, if it is better suited to your use case.