I'm using Waterline, which is an amazing ORM for Node.js. I think there are two ways to count relations (associations).
The first way is to maintain a counter whenever a related record is added or removed, e.g. when a comment is appended to a post, the post's comment-count field is incremented.
The second way is to use a 'count' query: I can count the relations whenever I need the number.
My worry is that the second way is easier but seems slower than the first, because it can generate too many queries. The first way, however, needs more dirty code.
I really don't know which is the best way to count relations.
The answers to this question have to be a little opinionated, but I will give you my point of view.
I would go with the "count query" solution because it is the most reliable way to get this information. As you said, the other solution needs more dirty code and is more likely to end up buggy. I always try to have a single way to retrieve a given piece of information.
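To illustrate, a count query in Waterline is essentially a one-liner. Here is a rough sketch assuming hypothetical Post and Comment models where Comment has a post association:

    // Count the comments that belong to one post (Comment and its `post`
    // association are assumed example names, not from the question).
    // Recent Waterline versions return a promise; older ones use .exec(callback).
    async function getCommentCount(postId) {
      return await Comment.count({ post: postId });
    }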
If the query is too slow and/or too frequent and slows down your application, then you should consider caching the result. Depending on the infrastructure you are using, you could cache the result of the query in a variable or in a fast cache backend like memcached or Redis. You will have to invalidate the cache when needed, and it is up to you to decide the lifetime of the cache. You should define a global cache strategy for your application so you can reuse it in other parts of your application.
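As a sketch of that caching idea (the key naming, the 60-second lifetime, and the already-connected node-redis v4 client called redis are all assumptions on my part):

    // Return the cached count if present, otherwise query and cache it for 60s.
    async function getCachedCommentCount(postId) {
      const key = `post:${postId}:commentCount`;
      const cached = await redis.get(key);
      if (cached !== null) {
        return Number(cached);
      }
      const count = await Comment.count({ post: postId });
      await redis.set(key, String(count), { EX: 60 }); // cache expires after 60s
      return count;
    }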
Just a question regarding NoSQL DBs. As far as I know, operations are done by the app/website outside the DB. For instance, if I need to add a value to a list, I need to
download the initial list,
add the new value to the list on my device,
upload the whole updated list.
In the end, a lot of data travels back and forth (twice the initial list) for no added value.
Is there any way to ask the DB directly to perform simple operations like this?
db.collection("collection_key").document("document_key").add("mylist", value)
Or simply increment a field?
The same goes for knowing the number of documents in a collection: do I need to download the whole set of documents just to get the count?
Couple different answers:
In Firestore, many intrinsic operations can be done with "FieldValues", such as increment/decrement (by a supplied value, so really add/subtract). There are also array unions, field deletes, etc. Just search the documentation for FieldValue. Whether this is true for NoSQL in general, I can't say.
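For example, with the namespaced (v8-style) Firestore web SDK, the two operations from the question look roughly like this (the collection, document and field names are just the ones the asker made up):

    const docRef = firebase.firestore()
      .collection("collection_key")
      .doc("document_key");

    // Append a value to an array field without downloading the list first.
    docRef.update({ mylist: firebase.firestore.FieldValue.arrayUnion(value) });

    // Increment a numeric field server-side, again without a prior read.
    docRef.update({ counter: firebase.firestore.FieldValue.increment(1) });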
Knowing the number of documents, on the other hand, is not trivially done in Firestore - but frankly, I can't think of any situations other than artificially contrived examples where you would need to know. It is easy enough to set up a way to "count" documents as you create/delete them and keep that number separately, if for some reason you find yourself needing it.
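One hedged sketch of that "count it yourself" approach, using a separate counter document and the same FieldValue.increment (the stats/posts paths are invented for the example):

    const db = firebase.firestore();
    const statsRef = db.collection("stats").doc("posts"); // invented counter doc

    // Batch the write and the counter bump so they succeed or fail together.
    async function createPost(data) {
      const batch = db.batch();
      batch.set(db.collection("posts").doc(), data);
      batch.update(statsRef, { count: firebase.firestore.FieldValue.increment(1) });
      await batch.commit();
    }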
Or were you just trying to generically put down NoSQL as a concept?
We want to use Solr in a Near Real Time scenario. Say, for example, we want to filter / rank our results by number of views.
Solr's soft commits were made for this use case, but:
In practice, the same few documents are updated very frequently (just for the nb_view field) while most of the documents are untouched.
As far as I know, each update, even a partial one, is implemented as a full delete and a full re-addition of the document in Lucene.
It seems to me that having the same docs in the tlog many times over is inefficient and might also be problematic during the merge process (is the doc marked as deleted and re-added n times?).
Any advice / good practice?
Two things you could use for supporting this scenario:
In-place updates: only that field is updated, not the whole doc. Check out the conditions you need to meet in order to use them (there's a rough sketch of such an update below).
ExternalFileField: you keep the values in an external file.
If the scenario is critical, I would test both in real-world conditions if possible, and assess.
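For what it's worth, here is a rough sketch of the first option: sending an atomic update over HTTP that only touches nb_view, using the global fetch available in modern Node/browsers. The core name is made up, and for Solr to execute it as a true in-place update, nb_view must be a single-valued, non-indexed, non-stored numeric docValues field:

    // Bump nb_view on one document without resending the whole document.
    async function incrementViews(docId) {
      await fetch("http://localhost:8983/solr/mycore/update?softCommit=true", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify([{ id: docId, nb_view: { inc: 1 } }]),
      });
    }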
I would like to understand Solr's merge behaviour well. I did some research on the different merge policies, and it seems that TieredMergePolicy is better than the older merge policies (LogByteSizeMergePolicy, etc.). That's why I use this one, and it's the default policy in recent Solr versions.
First, here are some interesting links that I've read to get a better idea of the merge process:
http://java.dzone.com/news/merge-policy-internals-solr
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Based on the official Lucene documentation, I would like to ask several questions about it:
http://lucene.apache.org/core/3_2_0/api/all/org/apache/lucene/index/TieredMergePolicy.html
Questions
1- In the official documentation, there is a method called setExpungeDeletesPctAllowed(double v). In Solr 4.3.0, I checked the TieredMergePolicy class and didn't find this method. There is another method that looks like it, called setForceMergeDeletesPctAllowed(double v). Are there any differences between the two methods?
2- Are both methods above called only when you do an expungeDeletes or an optimize, or are they also called during a normal merge?
3- I've read that merges between segments are chosen partly according to the percentage of deleted documents in a segment. By default, this percentage is set to 10%. Is it possible to set this value to 0% to be sure that there are no deleted documents left in the index after merging?
I need to reduce the size of my index without calling the optimize() method, if possible. That's why any information about the merge process would be interesting to me.
Thanks
You appear to be mixing up your documentation. If you are using Lucene 4.3.0, use the documentation for it (see the correct documentation for TieredMergePolicy in 4.3.0) rather than the documentation for version 3.2.0.
Anyway, on these particular questions (see LUCENE-3577):
1 - Seems to be mainly a necessary name change, for all intents and purposes.
2 - Firstly, IndexWriter.expungeDeletes no longer exists in 4.3.0. You can use IndexWriter.forceMergeDeletes(), if you must, though it is strongly recommended against, as it is very, very costly. I believe this setting only impacts a forceMergeDeletes() call. If you want to favor reclaiming deletions, set that in the MergePolicy, using TieredMergePolicy.setReclaimDeletesWeight.
3 - The percentage allowed is right there in the method call you indicated in your first question. Forcing all the deletions to be merged out when calling forceMergeDeletes() will only make an already very expensive operation that much more expensive, though.
Just to venture a guess: if you need to save the disk space taken by your index, you'll likely have much more success looking more closely at how much data you are storing in the index. There's not enough information to say for sure, of course, but it seems a likely solution to consider.
Hi, I'm using CakePHP and I'm wondering whether it's advisable to store things that don't change often, like the list of cities, in the database?
If your application already needs a database, why would you keep data anywhere else?
If the list doesn't change (per installation) and it's reasonably small and frequently used, then it might be worth reading it once on initialization and caching the result to improve performance and reduce the load on the database.
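The pattern itself is framework-agnostic; a minimal sketch (loadCitiesFromDb and the module-level variable are illustrative assumptions, not CakePHP API):

    // Load the city list from the database once, then serve it from memory.
    let cachedCities = null;

    async function getCities() {
      if (cachedCities === null) {
        cachedCities = await loadCitiesFromDb(); // hits the database only once
      }
      return cachedCities;
    }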
You get all sorts of queries and retrievals out of the box, the same way you access the rest of your data. Databases are as cheap as flat files today, but you get a full service.
I see this question has had an answer accepted - I still want to chime in with my $0.02
The way I typically handle arrays of static data (country lists, timezone lists, immutable sets you would use an enum for...) is to use this array datasource.
It allows you to map relationships between db models and array based models and to use the usual find syntax / Containable on the relationships.
http://github.com/jrbasso/array_datasource
If it is pretty much a static list, then you can store it either in the db or a file, but keep it in memory for use. In other words, load it once whether from db or file. What you don't want to do is keep taking a hit loading it. Especially if you use it on most page views. Those little bits of time add up if you have a large number of visitors.
The flip side, of course, is if you find yourself doing this for large lists or lots and lots of little lists. Then you could run into problems of keeping too much in memory.
Bill the Lizard is right about it being important whether or not the list links to other tables. If it does, then you will need it in the db if you need queries that will include it.
Imagine you're dealing with many strings of text, each about 10,000 characters long, entered by users. Would it be more efficient to write them out to pages (flat files) automatically or to insert them into a table in a database? I hope that question is clear enough...
It depends on what sort of "efficiency" you're aiming for.
Here's what I mean:
will you be changing the content of your text strings?
what sorts of searches will you be doing?
when you extract the text, what do you do with it?
My opinion is that provided you're not going to change the content much, nor perform much analysis, you're better off with the database.
10k isn't particularly large, so either is fine. I would personally use the database, as it will allow you to easily search though.
It depends on how you're accessing them, but normally using the FS would result in better performance. That's for the obvious reason that the DB is another layer built on top of the FS; using the FS directly, assuming no extra-heavy processing (for example, having hundreds of named files instead of one big bloated file in a special order that you need to parse), saves you the DBMS operations.
I'm wondering if SQLite would be the best of both worlds, or at least, the best database for that size of job.
The real answer here is what you're going to do with these strings.
Databases are meant to be able to quickly return specific records. If you're just going to SELECT * FROM Table and then concat it all together, there's no point in using a database.
However, if you have a relation between your data that you want to be able to search, then a database will likely be more efficient.
E.g., do you want to be able to pull up all the text records from a set of users on a set of dates? Or find all records from users who match some criteria?
These kinds of loads will likely be handled more efficiently by a database than by a naive file-based implementation, and probably still faster than by a decent one, even if the file approach avoids some access layers.
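To make that concrete, here is a small sketch using SQLite from Node (better-sqlite3; the table and column names are invented for the example) showing the kind of query that is trivial in a database and painful with flat files:

    const Database = require("better-sqlite3");
    const db = new Database("texts.db");

    db.exec(`CREATE TABLE IF NOT EXISTS texts (
      id INTEGER PRIMARY KEY,
      user_id INTEGER NOT NULL,
      created_at TEXT NOT NULL,
      body TEXT NOT NULL
    )`);

    // All texts from a set of users within a date range.
    function textsForUsers(userIds, from, to) {
      const placeholders = userIds.map(() => "?").join(", ");
      const sql = `SELECT id, user_id, created_at, body
                   FROM texts
                   WHERE user_id IN (${placeholders})
                     AND created_at BETWEEN ? AND ?`;
      return db.prepare(sql).all(...userIds, from, to);
    }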
There are a lot of considerations. As others said - either approach would work fine for a small number (thousands) of 10k rows.
But what's the rest of your app do? If it does everything in the database, then I'd be inclined to put this there as well; the opposite is true as well.
And how will you be selecting these? Do you need to do complex text searches? If so, a database might not be the best. Or, would you be adding new attributes, searching on those attributes - or matching them against data in other tables? In this common case a database would be better.
And if your data is really vast (many millions of 10k rows) and your performance requirements aren't terribly high - you may want to compress them and store them in the file system.
Lastly, how important is data quality? Given the features of a good database it's much easier to guarantee good data quality with a database.