Does this schema design support efficient querying in Realm?

If I had a series of Rows in a Realm database, each containing a field that was a List ([Points]) of thousands of coordinate objects (p1, p2, ... pn), and each of those coordinates had an optional TAG property, would there be an efficient and inexpensive way to query for results such as the following based on a TAG value (e.g. p3 below)?
1. individual Rows (where [Points] contains one or more points whose TAG property matches a queried value)
2. an array of objects, each containing a new [Points] list holding the matching Points from hundreds of Rows plus a reference to the Row they came from (there could be several Rows whose [Points] arrays contain elements with a TAG property matching the queried value; I want those points extracted from all of the Rows)
Row
Row
Row
    [Points]
        p1: lat, lon
        p2: lat, lon
        p3: lat, lon, TAG
        p4: ...
This article says:
Relationships in Realm are also extremely fast because they’re indexes that traverse a B‑tree–like structure to the related object. This is much faster than querying. Because of this there is no need to perform another full query as ORMs do. It simply is a native pointer to the related object.
But my concern is the size of the dataset: a query of type 2 above would involve going through hundreds of Rows, each containing thousands of Points, and checking each of those points to see a) if the TAG property is present and b) if its value matches the query.
I want to be able to specify a TAG value, then find all instances of Rows as well as coordinates containing that TAG value within a date range. This would happen locally on device, as well as on a server hosting the Realm Object Server. A request like this could happen over the network as well.
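For what it's worth, a minimal sketch of how those two queries might look with Realm's .NET SDK; the class names (Row, Point), the Tag and Timestamp properties, and the date field itself are assumptions, not from the question. Indexing Tag and using the ANY operator should let the match run against the index rather than a manual scan of every point:

using System;
using System.Collections.Generic;
using System.Linq;
using Realms;

public class Point : RealmObject
{
    public double Lat { get; set; }
    public double Lon { get; set; }

    [Indexed]                       // index the optional tag so matches avoid a full scan
    public string Tag { get; set; } // null when a point carries no TAG
}

public class Row : RealmObject
{
    public DateTimeOffset Timestamp { get; set; }  // assumed date field for the range filter
    public IList<Point> Points { get; }
}

public static class TagQueries
{
    public static void Run(string tag, DateTimeOffset start, DateTimeOffset end)
    {
        var realm = Realm.GetInstance();

        // Type 1: Rows containing at least one point whose Tag matches, in the date range.
        var rows = realm.All<Row>().Filter(
            "ANY Points.Tag == $0 AND Timestamp >= $1 AND Timestamp <= $2",
            tag, start, end);

        // Type 2: pull the matching points out of each row, keeping a reference to the row.
        var extracted = rows.AsEnumerable()
            .Select(r => new
            {
                Row = r,
                Points = r.Points.Where(p => p.Tag == tag).ToList(),
            })
            .ToList();
    }
}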

Related

Problem placing attribute in dimensional layout

I am doing a small exercise: I need to create a small dimensional design that deals with the tsunamis that have occurred in different countries over the years. I have created a "Country" dimension and a "Location" dimension. Each record of the provided table may or may not include the longitude and latitude of the place. My question is where I should put those attributes: in the fact table or in the Location dimension. My understanding is that the fact table should only contain metrics and the foreign keys of the dimensions. However, I don't know how correct it would be to add the longitude and latitude to the Location dimension, since the values have a very wide range and many records end up being created in the "Location" dimension table. Would it be more appropriate to put those attributes in the fact table?
Thanks.
You should merge Location and Country into a single Location dimension (country is an attribute of location) and hold lat and long in the dimension.
On the fact table. This is exactly like a fact that has a time attribute with sub-second resolution: it's not necessary to create a dimension table containing every possible point in time, or every possible lat/long.
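If it helps, a minimal sketch of how the two answers combine (all names hypothetical): lat/long ride along on the fact row, everything else goes through dimension keys.

// Hypothetical star schema: lat/long stay on the fact row (like a sub-second
// timestamp) because their cardinality is far too high to be a useful dimension.
public class TsunamiFact
{
    public int DateKey { get; set; }       // FK to the Date dimension
    public int LocationKey { get; set; }   // FK to the merged Location dimension
                                           // (country as an attribute of location)
    public double? Latitude { get; set; }  // may be absent in the source data
    public double? Longitude { get; set; }
    public double WaveHeightMeters { get; set; }  // example measure
    public int Fatalities { get; set; }           // example measure
}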

Dimension Creation - Multiple Uses

We received some generic training related to TM1 and dimension creation, and we were informed we'd need separate dimensions for the same values.
Let me describe: we transport goods, and we'd have an origin and a destination province. In typical database design I'd expect one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension. This seems cumbersome, and it seems we'd encounter the same issue with customers, services, etc.
Can someone clarify how this could work for us?
Again, I'd expect to see a "lookup" table in the database which contains all possible provinces (assumption is values in both columns would be the same), then you'd have an ID value in any column that used the "province" and join to the "lookup" table based on ID.
in typical database design I'd expect we'd have one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension
Following regular DB design it makes sense to keep the two data entities separate: one defines the source, the other defines the target. I think we'd both agree on this. If you could give more details it would be better.
Imagine a drop-down list: two lists populated from one single "source", but representing two different values in the DB.
assumption is values in both columns would be the same
if destination = origin, you don't need two dimensions then? :) This point needs clarification.
Besides your solution (a combination of all sources and destinations in one table with a unique ID, which could be a way of solving this), it seems resolvable by cube or dimension structure changes.
If in the cube you used e.g. ProvinceOrigin and ProvinceDestination as string-type elements and populated them from one single dimension (via a dynamic attribute), then whenever you save the cube these two fields would be populated from that single dimension.
Obviously the best solution for you depends on your system architecture.
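Not TM1 syntax, but a minimal illustration of the "one source list, two fields" idea the answers describe (all names hypothetical):

using System;
using System.Linq;

// One shared lookup "dimension" of provinces.
string[] provinces = { "ON", "QC", "BC", "AB" };

// Each record carries two province-valued fields, both validated
// against the same single source list.
var shipment = new { Origin = "ON", Destination = "BC" };
bool valid = provinces.Contains(shipment.Origin)
          && provinces.Contains(shipment.Destination);
Console.WriteLine(valid);  // True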

Datastore Index Creation Fails Without Explanation

I'm trying to create a compound index with a single Number field and a list of Strings field. When I view the status of the index it just has an exclamation mark with no explanation. I assume it is because datastore concludes that it is an exploding index based on this FAQ page: https://cloud.google.com/appengine/articles/index_building#FAQs.
Is there any way to confirm what the actual failure reason is? Is it possible to split the list field into multiple fields based on some size limit and create multiple indexes for each chunk?
You get the exploding-index problem when you have an index on multiple list/repeated properties. In that case a single entity generates all combinations of the property values (e.g. an index on (A, B) where A has N entries and B has M entries will generate N*M index entries).
In this case you shouldn't get the exploding index problem since you aren't combining two repeated fields.
There are some other obscure ways in which an index build can fail. I would recommend filing a production ticket so that someone can look into your specific indexes.
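To make the N*M arithmetic above concrete, a toy illustration in plain C# (not Datastore API code) of how a composite index on two repeated properties fans out:

using System;
using System.Linq;

// One entity with two repeated (list) properties.
string[] a = { "a1", "a2", "a3" };  // N = 3 entries
int[] b = { 10, 20 };               // M = 2 entries

// A composite index on (A, B) stores one entry per combination: N * M.
var indexEntries = a.SelectMany(x => b.Select(y => (x, y))).ToList();
Console.WriteLine(indexEntries.Count);  // 6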
I believe it was the 1000-item limit per entity for indexes on list properties. I partitioned the property into chunks of 999 (property1, property2, etc., as needed) and was then able to create indexes for each chunked property successfully.
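A sketch of that partitioning (the property names follow the answer; the chunking logic itself is a hypothetical illustration):

using System;
using System.Linq;

const int ChunkSize = 999;  // stay under the 1000-entry limit hit above

var values = Enumerable.Range(0, 2500).Select(i => $"v{i}").ToList();

// property1 -> first 999 values, property2 -> next 999, and so on.
var chunked = values
    .Select((value, index) => (value, index))
    .GroupBy(t => t.index / ChunkSize)
    .ToDictionary(g => $"property{g.Key + 1}", g => g.Select(t => t.value).ToList());

foreach (var (name, chunk) in chunked)
    Console.WriteLine($"{name}: {chunk.Count} values");
// property1: 999, property2: 999, property3: 502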

What is the best database structure for this scenario?

I have a database that holds real estate MLS (Multiple Listing Service) data. Currently, I have a single table that holds all the listing attributes (price, address, sqft, etc.). There are several different property types (residential, commercial, rental, income, land, etc.), and each property type shares a majority of the attributes, but a few are unique to each property type.
My issue is that the shared attributes are in excess of 250 fields, and this seems like too many fields to have in a single table. My thought is that I could break them out into an EAV (Entity-Attribute-Value) format, but I've read many bad things about that, and it would make running queries a real pain, as any of the 250 fields could be searched on. If I were to go that route, I'd literally have to pull all the data out of the EAV table grouped by listing id, merge it on the application side, then run my query against the in-memory object collection. That does not seem very efficient either.
I am looking for some ideas or recommendations on which way to proceed. Perhaps the 250+ field table is the only way to proceed.
Just as a note, I'm using SQL Server 2012, .NET 4.5 w/ Entity Framework 5, C# and data is passed to asp.net web application via WCF service.
Thanks in advance.
Let's consider the pros and cons of the alternatives:
One table for all listings + attributes:
- Very wide table - hard to view the model & schema definitions and the table data
- One query, with no joins required, retrieves all data on the listing(s)
- Requires a schema + model change for each new attribute
- Efficient if you always load all the attributes and most items have values for most of the attributes
Example LINQ query according to attributes:
context.Listings.Where(l => l.PricePerMonthInUsd < 10e3 && l.SquareMeters >= 200)
.ToList();
One table for all listings, one table for attribute types, and one for (listing IDs + attribute IDs +) values (EAV):
- Listing table is narrow
- Efficient if data is very sparse (most attributes don't have values for most items)
- Requires fetching all the data from the values table - one additional query (or one join; however, that would waste bandwidth, fetching the basic listing-table data once per attribute-value row)
- Does not require schema + model changes for new attributes
- If you want type-safe access to attributes via code, you'll need custom code generation based on the attribute-types table
Example LINQ query according to attributes:
var listingIds = context.AttributeValues.Where(v =>
        v.AttributeTypeId == PricePerMonthInUsdId && v.Value < 10e3)
    .Select(v => v.ListingId)
    .Intersect(context.AttributeValues.Where(v =>
        v.AttributeTypeId == SquareMetersId && v.Value >= 200)
    .Select(v => v.ListingId)).ToList();
or (compare performance on the actual DB):
var listingIds = context.AttributeValues.Where(v =>
        v.AttributeTypeId == PricePerMonthInUsdId && v.Value < 10e3)
    .Select(v => v.ListingId).ToList();
listingIds = context.AttributeValues.Where(v =>
        listingIds.Contains(v.ListingId)
        && v.AttributeTypeId == SquareMetersId
        && v.Value >= 200)
    .Select(v => v.ListingId).ToList();
and then:
var listings = context.Listings.Where(l => listingIds.Contains(l.ListingId)).ToList();
Compromise option - one table for all listings and one table per group of attributes, including values (assuming you can divide the attributes into groups):
- Multiple medium-width tables
- Efficient if data is sparse per group (e.g. garden-related attributes are all null for listings without gardens, so you don't add a row to the garden table for them)
- Requires one query with multiple joins (bandwidth is not wasted in the join, since the group tables are 1:0..1 with the listing table, not 1:many)
- Requires schema + model changes for new attributes
- Makes viewing the schema/model simpler - if you can divide the attributes into groups of 10, you'll have 25 tables with 11 columns each instead of another 250 columns on the listing table
- The LINQ query is somewhere between the above two examples; see the sketch below
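For illustration, one possible shape of the compromise query (GardenAttributes is a hypothetical 1:0..1 group table keyed by ListingId):

var listings = context.Listings
    .Where(l => l.PricePerMonthInUsd < 10e3)
    // join in only the attribute groups the query actually touches
    .Join(context.GardenAttributes,
          l => l.ListingId, g => g.ListingId,
          (l, g) => new { Listing = l, Garden = g })
    .Where(x => x.Garden.SquareMeters >= 50)
    .Select(x => x.Listing)
    .ToList();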
Consider the pros and cons according to your specific statistics (regarding sparseness) and requirements/maintainability plan (e.g. How often are attribute types added/changed?) and decide.
What I would probably do:
First I would create a table for the 250 fields, holding an ID and the field name, for example:
price -> 1
address -> 2
sqft -> 3
This table would also be hard-coded in my code as an enum and used in queries.
Then in the main table I would have two fields together: the type-of-field ID, taken from the table above, and its value, for example:
Line 1: 122 (map id), 1 (for price), 100 (the actual price)
Line 2: 122 (map id), 2 (for address), "where is it"
Line 3: 122 (map id), 3 (for sqft), 10 (the sqft)
Here the issue is that you may need at least two value fields, one for numbers and one for strings.
This is just a proposal of course.
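A sketch of that proposal (all names hypothetical), with the two value columns it calls for:

// Field catalog, mirrored in code as an enum and usable in queries.
public enum ListingField
{
    Price = 1,
    Address = 2,
    Sqft = 3,
    // ... one entry per field, up to ~250
}

// One row per (listing, field), with a column per value type.
public class ListingAttribute
{
    public int ListingId { get; set; }
    public ListingField FieldId { get; set; }
    public decimal? NumericValue { get; set; }  // Price, Sqft, ...
    public string StringValue { get; set; }     // Address, ...
}

// Example query, given the EF context from the question:
// listings priced under 10,000.
var cheapIds = context.ListingAttributes
    .Where(a => a.FieldId == ListingField.Price && a.NumericValue < 10000)
    .Select(a => a.ListingId)
    .ToList();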
I would create a listing table which contains only the shared attributes. This table would have listingId as the primary key. It would have a column that stores the listing type, so you know whether it's a residential listing, land listing, etc.
Then, for each of the subtypes, create an extra table. So you would have tables for residential_listing, land_listing, etc. The primary key for all of these tables would also be listingId. This column is also a foreign key to listing.
When you wish to operate on the shared data, you can do this entirely from the listing table. When you are interested in specific data you will join in the specific table. Some queries may be able to run entirely on the specific table if all the data is there.
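A sketch in LINQ with hypothetical names (here ResidentialListing is assumed to carry a Bedrooms column):

// Shared data only: the listing table alone answers the query.
var cheap = context.Listings
    .Where(l => l.Price < 500000)
    .ToList();

// Subtype data: join listing to residential_listing on the shared listingId key.
var threeBedrooms = context.Listings
    .Join(context.ResidentialListings,
          l => l.ListingId, r => r.ListingId,
          (l, r) => new { l.Address, l.Price, r.Bedrooms })
    .Where(x => x.Bedrooms >= 3)
    .ToList();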

How to set custom database keys?

I am designing a database structure (SQLite on Android) that will consist of two tables: Containers (table 1) and Object/Container Data (table 2).
Table 1 will contain a key for the container's data and a list of keys for the containers/objects within it.
Table 2 will contain the data for each object/container: title, description, category, pictures, etc.
With this design I will be able to go to a container, get its data (title, image) and an ordered list of the items it holds, then look up each item's data (image, possibly title). The user could then click on an item and, if it is an object, go to its data, or, if it is a container, repeat this process.
How can I set the IDs for each table so that I know which table an ID belongs to? For example, if an ID is even it points to table 1; if it's odd, it points to table 2.
I think I may have just found an answer that seems rather obvious now... I just have to test it out. Basically you can just set a custom ID by setting the ID to what you want when you create the row.
INSERT INTO test1(rowid, a, b) VALUES(123, 5, 'hello');
I got that from http://www.sqlite.org/autoinc.html (Wish I would have seen it two days ago :)) And now I have to go, so I'll post back if this works when I get a chance.
I guess my only concern now (if it works) is: does it create empty rows between the IDs you don't use, wasting space?
I've also considered just having two separate lists, but the order is important, so this would require another list to track the order of the objects, so I am wondering if key manipulation is possible.
If containers and objects have common properties, then you could make a base table (Item) which contains the common properties, with your Container and Object tables containing only the extra data (inheritance).
This is known as a Table-per-Type (TPT) hierarchy. More information in [1] and [2].
Your containers can then link to Items using a simple foreign key, because regardless of whether an entry is a container or an object, it will have a row there.
Now if you want only the objects, you select all items from the container and do an INNER JOIN with your object table. That way you get only the objects and not the containers. If you only want the containers, you do the join with the container table instead. If you want both containers and objects and need all the extra data (that is not in the Item base table), you can do a LEFT OUTER JOIN with the object table and then a LEFT OUTER JOIN with the container table.
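For illustration, the join shapes in SQL (table and column names hypothetical). The snippet uses Microsoft.Data.Sqlite so it runs as plain C#; on Android you would issue the same SQL through SQLiteDatabase.rawQuery():

using Microsoft.Data.Sqlite;

// Item(itemId PK, title, ...) holds the shared columns;
// ObjectData(itemId PK/FK) and ContainerData(itemId PK/FK) hold the extras.

// Objects only: the INNER JOIN drops rows with no ObjectData entry (containers).
const string objectsOnly = @"
    SELECT i.*, o.*
    FROM Item i
    INNER JOIN ObjectData o ON o.itemId = i.itemId;";

// Everything, with whichever extras apply: LEFT OUTER JOINs keep all items.
const string everything = @"
    SELECT i.*, o.*, c.*
    FROM Item i
    LEFT OUTER JOIN ObjectData o    ON o.itemId = i.itemId
    LEFT OUTER JOIN ContainerData c ON c.itemId = i.itemId;";

using var connection = new SqliteConnection("Data Source=items.db");
connection.Open();
using var command = new SqliteCommand(objectsOnly, connection);
using var reader = command.ExecuteReader();
while (reader.Read())
{
    // shared columns come from Item, extras (if any) from ObjectData
}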
References:
[1] http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server
[2] http://blogs.devart.com/dotconnect/table-per-type-vs-table-per-hierarchy-inheritance.html
