Is creating Ext.data.Store objects an expensive operation? I quite often create stores just to retrieve data once.
It will depend on the quantity of data you're retrieving and how you use it in your application.
You need to weigh the overhead of calling data from your data source more than once against the overhead of storing it on the page and using it client side.
Using stores just to retrieve data once isn't really a problem, as a store is just a client-side collection of data. There isn't much weight to them.
It may also be worth knowing that if you're using ExtJS 4 and you're retrieving a single data item rather than a collection of items, you can create a single model and interact with that rather than a store, which would be a lighter solution.
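A minimal sketch of that approach in ExtJS 4; the model name, fields, and URL below are illustrative assumptions, not from the question:

Ext.define('User', {
    extend: 'Ext.data.Model',
    fields: ['id', 'name'],
    proxy: {
        type: 'ajax',
        url: '/users' // illustrative endpoint
    }
});

// Load a single record by id, no store required
User.load(1, {
    success: function (user) {
        console.log(user.get('name'));
    }
});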
Let's say I use Elasticsearch in my application for searching restaurants near me.
I get the IDs of all matching restaurants, sorted, from Elasticsearch. Then, using these IDs, I fetch the data to display (name, location, popular menus) from an RDB.
As you can guess, it takes some time to get the data from the RDB. If I stored all the data the application uses in Elasticsearch, I could make it faster.
But I'm wondering what the recommended way to store data in Elasticsearch is, and what to consider when choosing it.
I think there are some options like the ones below:
Store only the data used for search
Store all the data, for both search and display
Thanks!
This is a very interesting but very common question, and every application normally needs to decide this. I can provide some data points that should help you make an informed decision.
Elasticsearch is a near-real-time (NRT) search engine, and there will always be some latency when you update ES from your RDB, so some items that are in the RDB will not yet be in ES and thus will not appear in your search results.
Considering the above, why do you want to make another call to the RDB for your ES search results? Is it to fetch the latest info from the RDB, or for some other reason, such as avoiding fetching/storing large data in ES?
For every field, ES provides a way to store the value or not, using the store param or _source (enabled by default). If neither is enabled, you can't fetch the actual value and have to go to the RDB. (See the mapping sketch below.)
An RDB call to fetch field values puts a penalty on performance; have you benchmarked it against fetching the values directly from ES?
Every search system has its own functional and non-functional requirements. Based on the points above, I hope you have more information to help you make a better decision.
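As a rough sketch of keeping display data in ES, here is an index mapping created with the legacy 'elasticsearch' JavaScript client; the client choice, index name, and fields are assumptions for illustration, not from the question:

// Keep _source enabled (the default) so a search hit already carries the
// display fields, avoiding a follow-up RDB call per result.
var elasticsearch = require('elasticsearch');
var client = new elasticsearch.Client({ host: 'localhost:9200' });

client.indices.create({
  index: 'restaurants', // illustrative index name
  body: {
    mappings: {
      properties: {
        name:     { type: 'text' },      // searched and displayed
        location: { type: 'geo_point' }, // used for "near me" sorting
        menus:    { type: 'text' }       // display-only, still in _source
      }
    }
  }
});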
I'm using PouchDB in an Electron app. The data was stored in a Postgres database before moving to PouchDB. In some cases it wasn't hard to figure out how to structure the data in a document fashion.
My main concern is regarding relations. For example:
I have the data type Project, and Projects have many Events. Right now I have a field called project_id on each event, so when I want to get the events for the project with ID 'project/1' I'll do:
_db.allDocs({
  include_docs: true,
  startkey: 'event',
  endkey: 'event\uffff'
}).then(function (response) {
  // Fetch every event, then filter in memory for the given project
  var result = response.rows
    .filter(function (row) {
      return row.doc.project_id === 'project/1';
    })
    .map(function (row) {
      return row.doc;
    });
});
I've read that allDocs is the most performant API, but would having a view be more convenient in this case?
On the other hand, when I show a list with all the projects, each project needs to show the number of events it has. In this scenario it looks like I would have to run allDocs again, with include_docs: false, in order to count the events each project has.
Does having a view improve this situation?
Alternatively, I'm thinking of keeping an array with all the event IDs on the project document, so I can easily count how many events it has. In that case, should I use allDocs? Is there a way to pass an array of IDs to allDocs? Or would it be better to loop over that array and call get(id) for each ID?
Is this second approach more performant than the first one?
Thanks!
Good question! There are many ways to handle relationships in PouchDB. And like many NoSQL databases, each will give you a tradeoff of performance vs. convenience.
The system you describe is not terribly performant. Basically you are fetching every single event in the database (O(n)) and then filtering in-memory. If you have many events, then n will be large, meaning it will be very very slow.
You have a few options here. All of them are better than your current system:
1. Linked (aka joined) documents in map/reduce. I.e. in your map function, you would emit() the project _id for each event. This creates a secondary index on whatever you put as the key in the emit() function.
2. relational-pouch, which is a plugin that works by using prefixed _ids and running allDocs() with startkey and endkey for each one. So it would do one allDocs() to fetch the project, then a second allDocs() to fetch the events for that project.
3. Entirely separate databases, e.g. new PouchDB('projects') and new PouchDB('events')
(Roughly, these are listed in order of least performant to most performant.)
#1 is more performant than the system you describe, although it's still not terribly fast, because it requires creating a secondary index, and then after that will essentially do an allDocs() on the secondary index database as well as on the original database (to fetch the linked docs). So basically you are running allDocs() three times under the hood – one of which is on whatever you emitted as the key, which it seems like you don't need, so it would just be wasted.
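For illustration, a minimal sketch of option #1; the design-document/view name is made up here, and _db is the database from the question:

// Persistent view indexing events by project_id
var ddoc = {
  _id: '_design/events_by_project',
  views: {
    events_by_project: {
      map: function (doc) {
        if (doc.project_id) {
          emit(doc.project_id);
        }
      }.toString()
    }
  }
};

_db.put(ddoc).then(function () {
  // Query the secondary index instead of filtering in memory
  return _db.query('events_by_project', {
    key: 'project/1',
    include_docs: true
  });
}).then(function (response) {
  var events = response.rows.map(function (row) { return row.doc; });
});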
#2 is much better, because under the hood it runs two fast allDocs() queries - one to fetch the project, and another to fetch the events. It also doesn't require creating a secondary index; it can use the free _id index.
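A sketch of option #2 using relational-pouch's setSchema()/rel API; the schema below is inferred from the question's data model, so treat it as an assumption:

var PouchDB = require('pouchdb');
PouchDB.plugin(require('relational-pouch'));

var db = new PouchDB('mydb');
db.setSchema([
  { singular: 'project', plural: 'projects',
    relations: { events: { hasMany: 'event' } } },
  { singular: 'event', plural: 'events',
    relations: { project: { belongsTo: 'project' } } }
]);

// Under the hood: one allDocs() for the project, a second for its events
db.rel.find('project', 1).then(function (result) {
  // result.projects[0] is the project; result.events are its events
});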
#3 also requires two allDocs() calls. So why is it the fastest? Well, interestingly it's because of how IndexedDB orders read/write operations under the hood. Let's say you are writing to both 'projects' and 'events'. What IndexedDB will do is to serialize those two writes, because it can't be sure that the two aren't going to modify the same documents. (When it comes to reads, though, the two queries can run concurrently in either case – in Chrome, at least. I believe Firefox will actually serialize the reads.) So basically if you have two completely separate PouchDBs, representing two completely separate IndexedDBs, then both reads and writes can be done concurrently.
Of course, in the case of a parent-child relationship, you can't know the child IDs in advance, so you have to fetch the parent anyway and then fetch the children. So in that case, there is no performance difference between #2 and #3.
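A sketch of option #3, assuming the project document keeps an array of its child event _ids in a field called eventIds (that field name is hypothetical):

var PouchDB = require('pouchdb');
var projects = new PouchDB('projects');
var events = new PouchDB('events');

projects.get('project/1').then(function (project) {
  // 'eventIds' is a made-up field holding the child event _ids
  return events.allDocs({ keys: project.eventIds, include_docs: true });
}).then(function (response) {
  var projectEvents = response.rows.map(function (row) { return row.doc; });
});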
In your case, I would say the best choice is probably #2. It's a nice compromise between perf and convenience, especially since the relational-pouch plugin already does the work for you.
I'm trying to make a general purpose data structure. Essentially, it will be an append-only list of updates that clients can subscribe to. Clients can also send updates.
I'm curious to hear suggestions on how to implement this. I could have an ndb.Model, 'Update', that contains the data and an index, or I could use a StructuredProperty with repeated=True on the main entity. I could also just store a list of keys somehow and then keep the actual update data in a not-strongly-linked structure.
I'm not sure how repeated properties work: does appending to the list of them (via the Python API) have to rewrite them all?
I'm also worried about consistency. Since multiple clients might be sending updates, I don't want them to overwrite each other and lose an update, or somehow end up with two updates with the same index.
The problem is that you have a maximum total size for each entity in the datastore.
So any single entity that accumulates updates (storing the data directly or by collecting keys) will eventually run out of space (I'm not sure how the limit applies with regard to structured properties, however).
Why not have an 'Update' model, as you say? A simple version would be to have each incoming update create and save a new entity. If you store the save date as a field in the model, you can sort the updates by time when you query for them (presumably there is an upper limit at some level anyway).
That way you also don't have to worry about simultaneous client updates overwriting each other; the datastore will worry about that for you. And you don't need to worry about what "index" they've been assigned; it's done automatically.
As that might be costly in datastore reads, you could no doubt implement a version that used repeated properties in a single entity, moving to a new entity after N updates are stored, but then you'd have to wrap it in a transaction to be sure multiple updates don't clash, and so on.
You can also cache the query that generates the results and invalidate the cache only when a new update is saved. Look at NDB too, as it provides some automatic caching (though not for queries).
I have some data (about 1000 rows) that I want to display. I want the user to be able to filter and sort the data dynamically. I also have to be able to save and load the data. I am relatively new to databases and need your help deciding which approach is the best/most reasonable:
1st variant:
This is my current approach. I am using a ListView/GridView that is bound to an ObservableCollection of RowItems. I can then use the view to sort and filter. Saving and loading is done via serialisation.
2nd variant:
Use of a database to store the data. The data is then loaded into the same business objects as above.
3rd variant:
Bind the ListView directly to a database. I just found out that this is somehow possible, and I am still somewhat hazy on the details, especially on the aspect of sorting and filtering (could that be done via a database query?).
4th variant:
Store the data in a database. Use an ObservableCollection of business objects for binding, but filter (and sort?) via SQL queries.
These are the ideas I have for fulfilling my requirements. I'd like to know which one is the best/easiest/best performing, etc., or whether you'd suggest another approach.
Peter, there are a couple of points you need to consider:
Where possible, never tightly integrate back-end data and display; that makes it easier to change either later on. You can use NHibernate or another ORM to make it easier to represent and read/write data from the database. That way you have a layer/tier in between them (a.k.a. n-tier).
You will only need to call the DB to read the data; at 1000 rows that is not too bad, and you can read them in one go and keep them in memory. Any filtering/sorting/grouping can then be done on the in-memory domain model/business objects without touching the database.
I would use a mixture of variants 1) and 2).
I would use business objects to represent your data from the database (NHibernate is quite handy). I would then perform all filtering/sorting on the client without re-querying the DB each time. Any writes to the database should go through some sort of domain-model layer or ORM so that they are not tightly coupled. That allows you to change the DB without affecting the GUI front end.
When should I use a DataSet instead of a DataReader?
When must I use a DataSet instead of a DataReader?
When should I use data in a disconnected fashion?
When must I use data in a disconnected fashion?
N.B.
I am not asking which is better. I need to know the appropriate scenarios for using a DataSet. I have been programming in .NET for a couple of years, but I have never seriously needed one.
One scenario: when you want to pass data from one layer of your application to another, you could use a DataSet. For more information, see Dataset and DataReader.
A DataSet holds all of the needed data records in memory, whereas a DataReader reads records from a data connection one record at a time.
DataSets are commonly filled with data using DataReaders.
Use a DataReader when you need a high-performance, forward-only reader.
Use a DataSet when you need to do something that requires all of the data to be present at once, such as serialization or passing data between tiers. However, as others have pointed out, using List<T> rather than a DataSet object provides better separation of concerns between tiers.
See http://articles.sitepoint.com/article/dataset-datareader and http://msdn.microsoft.com/en-us/magazine/cc188717.aspx for more info on this.