How to maintain an ordered table with Core Data (or SQL) with insertions/deletions?

This question is in the context of Core Data, but if I am not mistaken, it applies equally well to a more general SQL case.
I want to maintain an ordered table using Core Data, with the possibility for the user to:
reorder rows
insert new lines anywhere
delete any existing line
What's the best data model to do that? I can see two ways:
1) Model it as an array: I add an int position property to my entity
2) Model it as a linked list: I add two one-to-one relations, next and previous from my entity to itself
1) makes it easy to sort, but painful to insert or delete as you then have to update the position of all objects that come after
2) makes it easy to insert or delete, but very difficult to sort. In fact, I don't think I know how to express a Sort Descriptor (SQL ORDER BY clause) for that case.
Now I can imagine a variation on 1):
3) add an int ordering property to the entity, but instead of having it count one-by-one, have it count 100 by 100 (for example). Then inserting is as simple as finding any number between the ordering of the previous and next existing objects. The expensive renumbering only has to occur when the 100 holes have been filled. Making that property a float rather than an int makes it even better: it's almost always possible to find a new float midway between two floats.
Am I on the right track with solution 3), or is there something smarter?

If the ordering is arbitrary, i.e. not inherent in the data being modeled, then you have no choice but to add an attribute or relationship to maintain the order.
I would advise the linked list since it is easiest to maintain. I'm not sure what you mean by a linked list being difficult to sort, since you most likely won't be sorting on an arbitrary order anyway. Instead, you will just fetch the topmost instance and walk your way down.
Ordering by a divisible float attribute is a good idea. You can create a very large number of intermediate indexes (limited only by floating-point precision) just by subtracting the lower existing index from the higher existing index, dividing the result by two and then adding that result to the lower index.
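The midpoint calculation can be sketched as follows. This is a minimal illustration, not Core Data code: the function names, the `GAP` spacing, and the choice to signal "renumber needed" with `None` are all assumptions made for the example.

```python
GAP = 1.0  # hypothetical spacing used when renumbering or extending either end


def key_for_position(keys, index):
    """Return an ordering key that sorts at `index` in the ascending list `keys`.

    Returns None when the two neighbours are too close for a float to fit
    between them, which means the list has to be renumbered first.
    """
    if not keys:
        return 0.0
    if index == 0:
        return keys[0] - GAP
    if index == len(keys):
        return keys[-1] + GAP
    lower, upper = keys[index - 1], keys[index]
    # lower + (upper - lower) / 2, exactly as described above
    mid = lower + (upper - lower) / 2
    if not (lower < mid < upper):
        return None  # precision exhausted between these two keys
    return mid


def renumber(count):
    """Return fresh, evenly spaced keys for `count` rows (the expensive step)."""
    return [i * GAP for i in range(count)]


def inserts_until_exhausted(lower, upper):
    """Count how many times a row can be inserted directly after `lower`."""
    keys = [lower, upper]
    count = 0
    while True:
        key = key_for_position(keys, 1)
        if key is None:
            return count
        keys.insert(1, key)
        count += 1
```

Calling `key_for_position([1.0, 2.0], 1)` gives `1.5`. Note that always inserting at the same spot halves the same gap each time, so with double-precision floats `inserts_until_exhausted(1.0, 2.0)` runs out after roughly fifty insertions; the renumbering step is rare but still needed.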
You can also combine a divisible index with a linked list if you need an ordering for tables or the like. The linked list would make it easy to find existing indexes and the divisible index would make it easy to sort if you needed to.
Core Data resists this kind of ordering because it is usually unnecessary. You don't want to add something to the data model unless it is necessary to simulate the real-world object, event or condition that the model describes. Usually, ordering/sorting is not inherent to the model but merely needed by the UI/view. In that case, you should have the sorting logic in the controller between the model and the view.
Think carefully before adding ordering to the model when you may not need it.

Since iOS 5 you can (and should) use an ordered to-many relationship, which Core Data exposes as NSOrderedSet and its mutable subclass NSMutableOrderedSet. See the Core Data Release Notes for OS X v10.7 and iOS 5.0.
See also the accepted answer at "How can I preserve an ordered list in Core Data".

Related

Database Position Index

Does anyone know of any databases (SQL or NoSQL) that have native support for position based indexes?
To clarify, on many occasions I've had the need to maintain a position-based collection, where the order or position is maintained by an external entity (user, external service, etc). By maintained I mean the order of the items in the collection will be changed quite often but is not based on any data fields in the record; the order is completely arbitrary as far as the service maintaining the collection is concerned. The service needs to provide an interface that allows CRUD functions by position (Insert after Pos X, Delete at Pos Y, etc) as well as manipulating the position (move from pos X to pos Y).
I'm aware there are workarounds that achieve this, and I've implemented many myself, but this seems like a pretty fundamental way to want to index data. So I can't help but feel there must be an off-the-shelf solution out there for this.
The only thing I've seen that comes close to this is Redis's List data type, which, while it's ordered by position, is pretty limited (compared to a table with multiple indexes), and Redis is more suited as a cache than as a persistent data store.
Finally, I'm asking this as I've got a requirement that needs user-ordered collections that could contain tens of thousands of records.
In case it helps anyone, the best approximation of this I've found so far is to implement a linked list structure in a graph database (like Neo4j). Maintaining the item links is considerably easier than maintaining a position column (especially if you only need next links, i.e. not doubly linked). It's easier as there is no need to leave holes, re-index, etc; you only have to move pointers (or relations). The performance is pretty good, but reads slow down linearly if you're trying to access items towards the end of the list by position, as you have to scan (SKIP) the list from the start up to that position.

How to remember multiple indexes in a buffer to later access them for modification one by one, keeping optimization in mind

I have a scenario where I have to set the field values of a few records to a constant and then later access those records one by one, sequentially.
The records can be any records in the buffer, in no particular pattern.
I don't want to use a linked list as it will be costly, and I don't want to traverse the whole buffer.
Please give me some ideas on how to do that.
When you say "set few records with field values to a constant" is this like a key to the record? And then "later access them one by one" - is this to recall them with some key? "one-by-one sequentially" and "don't want to traverse the whole buffer" seems to conflict, as sequential access sounds a lot like traversal.
But I digress. If you in fact do have a key (and it's a number), you could use some sort of Hash Table to organize your records. One basic implementation might be an array of linked lists, where you mod the key down into the array's range, then add it to the list there. This might increase performance assuming you have a good distribution of keys (your records spread across the array well).
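A minimal sketch of that "array of lists" layout, assuming integer keys. The class name, method names, and bucket count are made up for the example, and plain Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Hash table with chaining: an array of buckets, each bucket a small list."""

    def __init__(self, bucket_count=64):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # "mod the key down into the array's range"
        return self.buckets[key % len(self.buckets)]

    def put(self, key, record):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # overwrite an existing key
                return
        bucket.append((key, record))

    def get(self, key):
        # Only the one bucket is scanned, not the whole table.
        for existing_key, record in self._bucket(key):
            if existing_key == key:
                return record
        return None
```

With 64 buckets, keys 7 and 71 land in the same bucket and are told apart by the scan inside `get`, which is why a good spread of keys matters.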
Another data structure to look into might be a B-Tree or a binary search tree, which can access nodes in logarithmic time.
However, overall I agree with the commenters that over-optimizing is usually not a good idea.

Database design linked list vs. order by

I have a table which needs to persist some user actions in sequence. I can either save it using a self-referencing table, which will be like a linked list, or not use the self-reference at all and just use the timestamp to keep the sequence.
This table has references to other tables, such as user and the files associated with an action.
The operations will need to support CRUD. The frequency of operations is in this order: Retrieve > Insert > Update > Delete
What is your design preference and why?
Thanks!
i would avoid "linked lists" like the plague. about the only thing they are good for is retrieving the "next" item. the problem is that every extra "hop" requires a join, so if you want to parameterise over that (eg to provide a function that retrieves N items following a given item) then you need one of (1) machine-generated joins (2) multiple selects (3) SQL that's unlikely to be portable and/or supported by your ORM.
this is the same problem that makes trees notoriously nasty in sql. it's "fixed" by recursive joins, but that's (3) above (maybe i am old-fashioned and someone will say that these are well supported now - if so i guess i will learn...).
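For what it's worth, here is a sketch of the recursive approach using a recursive common table expression (WITH RECURSIVE), run against SQLite from Python. The table name `user_action`, its columns, and the helper `following` are hypothetical, and whether your engine or ORM supports this syntax is something you would have to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_action (id INTEGER PRIMARY KEY, label TEXT, next_id INTEGER)"
)
# List order is open -> edit -> save, deliberately not in id order.
conn.executemany(
    "INSERT INTO user_action VALUES (?, ?, ?)",
    [(1, "open", 3), (2, "save", None), (3, "edit", 2)],
)


def following(conn, start_id, n):
    """Return the labels of up to n actions that follow start_id, in list order."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, label, next_id, depth) AS (
            SELECT id, label, next_id, 0 FROM user_action WHERE id = ?
            UNION ALL
            SELECT a.id, a.label, a.next_id, chain.depth + 1
            FROM user_action AS a JOIN chain ON a.id = chain.next_id
            WHERE chain.depth < ?
        )
        SELECT label FROM chain WHERE depth > 0 ORDER BY depth
        """,
        (start_id, n),
    ).fetchall()
    return [label for (label,) in rows]
```

Here `following(conn, 1, 2)` walks two hops from "open" in a single statement, so N is a bound parameter rather than N generated joins. The cost is still one lookup per hop, which is the point made above about linked lists in SQL.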

C Database Design, Sortable by Multiple Fields

If memory is not an issue for my particular application (entry, lookup, and sort speed being the priorities), what kind of data structure/concept would be the best option for a multi-field rankings table?
For example, let's say I want to create a Hall of Fame for a game, sortable by top score (independent of username), username (with all scores by the same user placed together before ranking users by their highest scores), or level reached (independent of score or name). In this example, if I order a linked list, vector, or any other sequential data structure by the top score of each player, it makes searching for the other fields -- like level and non-highest scores -- more iterative (i.e. iterate across all looking for the stage, or looking for a specific score-range), unless I conceive some other way to store the information sorted when I enter new data.
The question is whether there is a more efficient (albeit complicated and memory-consumptive) method or database structure in C/C++ that might be primed for this kind of multi-field sort. Linked lists seem fine for simple score rankings, and I could even organize a hashtable by hashing on a single field (player name, or level reached) to sort by a single field, but then the other fields take O(N) to find, worse to sort. With just three fields, I wonder if there is a way (like sets or secondary lists) to prevent iterating in certain pre-desired sorts that we know beforehand.
Do it the same way databases do it: using index structures. You have your main data as a number of records (structs), perhaps ordered according to one of your sorting criteria. Then you have index structures, each one ordered according to one of your other sorting criteria, but these index structures don't contain copies of all the data, just pointers to the main data records. (Think "index" like the index in a book, with page numbers "pointing" into the main data body.)
Using ordered linked list for your index structures will give you a fast and simple way to go through the records in order, but it will be slow if you need to search for a given value, and similarly slow when inserting new data.
Hash tables will have fast search and insertion, but (with normal hash tables) won't help you with ordering at all.
So I suggest some sort of tree structure. Balanced binary trees (look for AVL trees) work well in main memory.
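To make the "main records plus index structures" idea concrete, here is a small sketch for the Hall of Fame example. It is written in Python rather than C for brevity, and it uses sorted lists maintained with `bisect` as a stand-in for the balanced trees suggested above: lookups are O(log N), but each insert shifts list elements and is O(N), which a real AVL tree would avoid. All class and field names are invented for the example.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Entry:
    username: str
    score: int
    level: int


class HallOfFame:
    def __init__(self):
        self.records = []   # main data; a record's position is its id
        self.by_score = []  # (-score, record id): highest score first
        self.by_level = []  # (-level, record id): highest level first
        self.by_user = []   # (username, -score, record id)

    def add(self, entry):
        rid = len(self.records)
        self.records.append(entry)
        # Each index stores only sort keys plus a "pointer" (the record id).
        bisect.insort(self.by_score, (-entry.score, rid))
        bisect.insort(self.by_level, (-entry.level, rid))
        bisect.insort(self.by_user, (entry.username, -entry.score, rid))

    def top_scores(self, n):
        return [self.records[rid] for _, rid in self.by_score[:n]]

    def top_levels(self, n):
        return [self.records[rid] for _, rid in self.by_level[:n]]

    def scores_for(self, username):
        """All of one user's scores, best first, found by binary search."""
        i = bisect.bisect_left(self.by_user, (username,))
        scores = []
        while i < len(self.by_user) and self.by_user[i][0] == username:
            scores.append(-self.by_user[i][1])
            i += 1
        return scores
```

Every sort order the UI needs gets its own index, and none of them requires re-sorting or scanning the main records.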
But don't forget the option to use an actual database! Database managers such as MySQL and SQLite can be linked with your program, without a separate server, and let you do all your sorting and indexing very easily, using SQL embedded in your program. It will probably execute a bit slower than if you hand-craft your own main-memory data structures, or if you use main-memory data structures from a library, but it might be easier to code, and you won't need to write separate code to save the data on disk.
So, you already know how to store your data and keep it sorted with respect to a single field. Assuming the values of the fields for a single entry are independent, the only way you'll be able to get what you want is to keep three different lists (using the data structure of your choice), each of which is sorted on a different field. You'll use three times the memory's worth of pointers of a single list.
As for what data structure each of the lists should be, a binary max heap can be effective if you mostly need the top entries. Insertion is O(lg N) and reading the top entry is O(1); note, however, that listing entries in order means removing them from the heap one at a time at O(lg N) each, so displaying all of them costs O(N lg N) rather than O(N). If in some of these list copies the entries need to be sub-sorted by another field, just account for that in the comparison function.

Should I denormalize properties to reduce the number of indexes required by App Engine?

One of my queries can take a lot of different filters and sort orders depending on user input. This generates a huge index.yaml file of 50+ indexes.
I'm thinking of denormalizing many of my boolean and multi-choice (string) properties into a single string list property. This way, I will reduce the number of query combinations because most queries will simply add a filter to the string list property, and my index count should decrease dramatically.
It will surely increase my storage size, but this isn't really an issue as I won't have that much data.
Does this sound like a good idea or are there any other drawbacks with this approach?
As always, this depends on how you want to query your entities. For most of the sorts of queries you could execute against a list of properties like this, App Engine will already include an automatically built index, which you don't have to specify in index.yaml. Likewise, most queries that require a composite index either cannot be expressed with a list property or would require an 'exploding' index on that list property.
If you tell us more about the sort of queries you typically run on this object, we can give you more specific advice.
Denormalizing your data to cut back on the number of indices sounds like a good tradeoff. Reducing the number of indices you need means fewer indices to update (though your one index will have more updates); it is unclear how this will affect performance on GAE. Size will of course be larger if you leave the original fields in place (since you're copying data into the string list property), but this might not be too significant unless your entity was quite large already.
This is complicated a little bit since the index on the list will contain one entry for each element in the list on each entity (rather than just one entry per entity). This will certainly impact space, and query performance. Also, be wary of creating an index which contains multiple list properties or you could run into a problem with exploding indices (multiple list properties => one index entry for each combination of values from each list).
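The arithmetic behind the exploding-index warning is just a product of list lengths. A rough model, with made-up property values; this only counts combinations and is not how App Engine actually builds its indexes:

```python
from itertools import product


def composite_index_entries(*list_properties):
    """One entry per combination, taking one value from each list property."""
    return list(product(*list_properties))


# Hypothetical entity with two list properties.
tags = ["red", "large", "sale"]
flags = ["featured", "in_stock"]

entries = composite_index_entries(tags, flags)
print(len(entries))  # 3 * 2 = 6 entries for this single entity
```

An index on `tags` alone would hold 3 entries for this entity, but an index covering both lists holds 6, and the count multiplies again with every further list property added to the same index.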
Try experimenting and see how it works in practice for you (use AppStats!).
"It will surely increase my storage size, but this isn't really an issue as I won't have that much data."
If this is true then you have no reason to denormalize.

Resources