Reducing amount of http requests by grouping queries - angularjs

So I have a bunch of chatty HTTP requests in my Angular 1 app which are bottlenecking many of the other requests.
Imagine I have a list of Users from a totally different data source and I make calls to 5 different tables such as:
user.signup
+-----+------------+
| uid | date |
+-----+------------+
| 1 | 2016-12-13 |
| 2 | 2016-12-01 |
+-----+------------+
user.favourite_color
+-----+-------+
| uid | color |
+-----+-------+
| 1 | red |
| 5 | blue |
| 7 | green |
+-----+-------+
user.location
+-----+-----------+
| uid | location |
+-----+-----------+
| 2 | uk |
| 3 | france |
| 9 | greenland |
+-----+-----------+
The reason they are in different tables is that the fields are optional.
The way I see it I have 3 options:
Put them in 1 table
So I could just group them all in 1 table and have a bunch of null columns but that just doesn't sit right with me in terms of DB design.
+-----+------------+-----------+-------+
| uid | date | location | color |
+-----+------------+-----------+-------+
| 1 | 2016-12-13 | null | red |
| 2 | 2016-12-01 | uk | null |
| 3 | null | france | null |
| 5 | null | null | blue |
+-----+------------+-----------+-------+
Join them all with 1 request
So I could just have one query that joins all these tables, but the way I see it they would have to be full joins, since some uids won't exist in some tables, e.g.:
+------+------------+-------+-----------+-------+-------+
| uid | date | l_uid | location | c_uid | color |
+------+------------+-------+-----------+-------+-------+
| 1 | 2016-12-13 | null | null | 1 | red |
| 2 | 2016-12-01 | 2 | uk | null | null |
| null | null | 3 | france | null | null |
| null | null | null | null | 5 | blue |
+------+------------+-------+-----------+-------+-------+
which is probably even worse!
Change the way the requests are made?
Maybe make some clever changes to how the requests are made:
function activate() {
    $q.all([requestSignupDate(), requestFaveColor(), requestLocation(), ....])
        .then(function (data) {
            // do a bunch of stuff with the data
        });
}
which I want to change to:
function activate() {
    requestUserData();
}
Any suggestions?

This is a typical ORM problem: the database as designed does not offer a good way to store the user as a single entity.
The entity's properties are spread across multiple tables, and because they are optional you are forced to left join several tables.
So you essentially have to solve that problem first. You have several options (or non-options, since I don't know your requirements).
Put them in one table
If you can use nullable columns and refactor your application, this is preferable. I see that your other tables have just one or two more fields. Heavily normalized tables save some space, but with no data repetition here the other benefits of normalization are moot.
Join them all with 1 request
Only if your query stays performant. Can you use left joins?
You would do this if the above option is difficult. Use it only as a quick fix.
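To make the left-join idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It assumes the list of uids comes from the separate user source mentioned in the question; the table names drop the `user.` prefix, and `make_db` and `fetch_user_data` are hypothetical helpers, not part of the original app. Because the join starts from the requested uids, plain left joins are enough and no full join is needed.

```python
import sqlite3


def make_db():
    """Build the three optional-field tables from the question in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE signup (uid INTEGER PRIMARY KEY, date TEXT);
        CREATE TABLE favourite_color (uid INTEGER PRIMARY KEY, color TEXT);
        CREATE TABLE location (uid INTEGER PRIMARY KEY, location TEXT);
        INSERT INTO signup VALUES (1, '2016-12-13'), (2, '2016-12-01');
        INSERT INTO favourite_color VALUES (1, 'red'), (5, 'blue'), (7, 'green');
        INSERT INTO location VALUES (2, 'uk'), (3, 'france'), (9, 'greenland');
    """)
    return conn


def fetch_user_data(conn, user_ids):
    """Return one row per requested uid; missing optional fields come back as None."""
    if not user_ids:
        return []
    # One "(?)" per uid, so the uids are bound as parameters rather than inlined.
    placeholders = ", ".join("(?)" for _ in user_ids)
    query = f"""
        WITH wanted(uid) AS (VALUES {placeholders})
        SELECT w.uid, s.date, l.location, c.color
        FROM wanted AS w
        LEFT JOIN signup AS s ON s.uid = w.uid
        LEFT JOIN location AS l ON l.uid = w.uid
        LEFT JOIN favourite_color AS c ON c.uid = w.uid
        ORDER BY w.uid
    """
    return conn.execute(query, list(user_ids)).fetchall()


if __name__ == "__main__":
    for row in fetch_user_data(make_db(), [1, 2, 3, 5]):
        print(row)
```

A single endpoint backed by a query like this would let the client replace the `$q.all([...])` fan-out with one request.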
Other options to solve the database problem
Use server-side caching if feasible.
If feasible, use a NoSQL database instead (e.g. MongoDB).
Change the way the requests are made?
Do you really need all the properties upfront? The web is asynchronous, so why not keep things async? Use $q.all only if you need all the properties at once. For example, the user may never navigate or scroll to a certain part of the page, so why make those queries in the first place?
Along with this, you can cluster your server side and the database so that these queries land on multiple machines and the load gets distributed. You may get some items retrieved in parallel.
If the columns and tables you mentioned are all there is, i.e. the supplemental tables only hold a few properties each, I would go with the "Put them in one table" option.

Why doesn't using nullable fields sit right with you? NULL exists because it's useful, and a profile table with optional values for a fixed set of fields is practically the textbook case for using it. If fields can be dynamically redefined (e.g. swapping out "favorite color" for "favorite food"), it's another story, but that's not a requirement in what you've described.

Related

Using tables to categorise resources with

I'm trying to design a database that allows filtering by whether a specific resource belongs to certain categories. I've gotten to the point where I can input data in what seems to be the right shape, but I'm not sure how I should pull it out again.
The main resource table looks like this:
Table1 - resources
| resourceID | AutoNum |
| title | short text |
| author | short text |
| publish date | date |
| type | short text |
Table2 - Department Categories
| ID | AutoNum |
| 1 | Yes/No |
| 2 | Yes/No |
| fID | Number |
Table3 - Categories
| ID | AutoNum |
| cat | Yes/No |
| dog | Yes/No |
| bird | Yes/No |
| fID | Number |
I have built a form where you can fill in items to the resource ID, and at the same time check off the Yes/No boxes in tables 2 & 3.
I'm trying to use the primary key ID from table 1 and copy it into tables 2 & 3, with referential integrity set to cascade deletes and updates, which I think is the right way to do this.
So far I've learnt that I can implement a search function for the columns in table 1, and this seems to work fine. However, I am stuck on applying the relevant columns in tables 2 and 3 as filters.
apply search>
[X] - Cats
Should only return records from table 1 where in table 3 the relevant column has a tick in the Yes/No box.
I hope I have explained this properly. I'm very new to Access and databases, so if you need clarification, I don't mind providing it.
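The filter described above boils down to joining the resource table to the category table on the copied key and testing the ticked columns. The following is a rough sketch in Python with SQLite rather than Access; the sample rows, the `categories` table name and the helpers `make_db` and `find_resources` are all made up for illustration. In Access the comparison would typically be against True instead of 1.

```python
import sqlite3

# Only these column names may be used as filters (guards against SQL injection).
ALLOWED_FLAGS = {"cat", "dog", "bird"}


def make_db():
    """Table 1 (resources) and Table 3 (categories) with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resources (resourceID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE categories (
            ID INTEGER PRIMARY KEY,
            cat INTEGER NOT NULL DEFAULT 0,
            dog INTEGER NOT NULL DEFAULT 0,
            bird INTEGER NOT NULL DEFAULT 0,
            fID INTEGER NOT NULL REFERENCES resources(resourceID) ON DELETE CASCADE
        );
        INSERT INTO resources VALUES
            (1, 'Feline care'), (2, 'Dog training'), (3, 'Pets at home');
        INSERT INTO categories (cat, dog, bird, fID) VALUES
            (1, 0, 0, 1), (0, 1, 0, 2), (1, 1, 0, 3);
    """)
    return conn


def find_resources(conn, ticked):
    """Return resources whose category row has every ticked box set."""
    unknown = set(ticked) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unknown category columns: {sorted(unknown)}")
    # No boxes ticked means no filtering at all.
    where = " AND ".join(f"c.{name} = 1" for name in ticked) or "1 = 1"
    sql = f"""
        SELECT r.resourceID, r.title
        FROM resources AS r
        JOIN categories AS c ON c.fID = r.resourceID
        WHERE {where}
        ORDER BY r.resourceID
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(find_resources(make_db(), ["cat"]))
```

The same join pattern would apply to the department table, with its own list of allowed columns.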

Constraint to prevent overlapping time periods

Temporal tables and time periods are nothing special in the IT world. But somehow it seems my request is rather unique because I cannot find anything useful for it at all.
What I have are two tables, one containing static data and the other containing dynamic data for entries of the first table, which changes over time. Each row in the second table has two columns, ValidFrom and ValidUntil, where the latter can be null if there is no planned end of validity.
Simplified, the schema looks like this:
tbStatic
+----+------------+------------+
| Id | Attribute1 | Attribute2 |
+----+------------+------------+
| 1 | foo | bar |
| 2 | baz | foo |
+----+------------+------------+
tbDynamic
+----+------------+---------------------+---------------------+------------+------------+
| Id | tbStaticId | ValidFrom | ValidUntil | Attribute1 | Attribute2 |
+----+------------+---------------------+---------------------+------------+------------+
| 1 | 1 | 2018-01-01 00:00:00 | 2018-01-31 23:59:59 | 1 | 0 |
| 2 | 2 | 2018-04-01 00:00:00 | 2018-04-02 11:59:59 | 2 | 1 |
| 3 | 1 | 2018-02-01 00:00:00 | null | 2 | 1 |
| 4 | 2 | 2018-05-01 00:00:00 | 2018-06-01 00:00:00 | 23 | 15 |
| 5 | 2 | 2018-07-01 01:23:45 | 2018-07-05 23:12:01 | 80 | 12 |
+----+------------+---------------------+---------------------+------------+------------+
As you might have spotted, there can be gaps between time periods. What we cannot have, though, is overlapping periods: two rows with the same tbStaticId must never overlap in time.
Unfortunately, so far this is only a requirement, and although it is enforced in the application using the database, I would prefer a constraint on the table that prevents rows from being inserted or updated when they violate this time uniqueness.
As stated, my research so far has been rather disappointing, which is also why I cannot really show any code I've tried. The most promising approach so far was to create a function that takes a record (or the two time period values and the foreign key) as input and determines whether it overlaps with something else. This function could then be called in a check constraint. But after thinking about the number of cases to check, I gave up because it seemed unreasonable (especially when considering updates as well, which require additional attention).
So my question is whether there is some easy way to constrain time slices in SQL Server without the use of temporal tables (not available in my SQL Server version). And if so, how?
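One way to see that the number of cases is smaller than it looks: two closed periods overlap exactly when each one starts no later than the other ends, with a null end treated as "far future". The sketch below enforces that with triggers, using Python and SQLite purely for illustration; it is not SQL Server syntax, and the trigger names, the `9999-12-31` sentinel and the `try_insert` helper are assumptions. In SQL Server, a comparable check could live in an AFTER INSERT, UPDATE trigger that compares the `inserted` rows against the table.

```python
import sqlite3

FAR_FUTURE = "9999-12-31 23:59:59"  # assumed stand-in for "no planned end"

OVERLAP = f"""
    d.tbStaticId = NEW.tbStaticId
    AND d.ValidFrom <= COALESCE(NEW.ValidUntil, '{FAR_FUTURE}')
    AND NEW.ValidFrom <= COALESCE(d.ValidUntil, '{FAR_FUTURE}')
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE tbDynamic (
            Id INTEGER PRIMARY KEY,
            tbStaticId INTEGER NOT NULL,
            ValidFrom TEXT NOT NULL,   -- ISO timestamps compare correctly as text
            ValidUntil TEXT            -- NULL means open-ended
        );

        CREATE TRIGGER trg_no_overlap_insert BEFORE INSERT ON tbDynamic
        WHEN EXISTS (SELECT 1 FROM tbDynamic AS d WHERE {OVERLAP})
        BEGIN
            SELECT RAISE(ABORT, 'overlapping validity period');
        END;

        -- On update, the row being changed must not be compared with itself.
        CREATE TRIGGER trg_no_overlap_update BEFORE UPDATE ON tbDynamic
        WHEN EXISTS (SELECT 1 FROM tbDynamic AS d
                     WHERE d.Id <> OLD.Id AND {OVERLAP})
        BEGIN
            SELECT RAISE(ABORT, 'overlapping validity period');
        END;
    """)
    return conn


def try_insert(conn, static_id, valid_from, valid_until):
    """Insert a period; return False if the trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO tbDynamic (tbStaticId, ValidFrom, ValidUntil) VALUES (?, ?, ?)",
            (static_id, valid_from, valid_until),
        )
        return True
    except sqlite3.DatabaseError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, 1, "2018-01-01 00:00:00", "2018-01-31 23:59:59"))
    print(try_insert(db, 1, "2018-01-15 00:00:00", None))
```

Note that this sketch treats both endpoints as inclusive, so a period that starts at the exact second another one ends counts as overlapping.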

How to design a database for types and categories in Laravel?

As the question states, what is the best way to design a database for types and categories?
Scenario:
I have x number of database tables, e.g. users, feedback, facts and countries, and all these tables have a type attribute. What I've found is that a lot of people tend to just create a type table for each and every one of these, e.g. user_types, feedback_types, fact_types and country_types.
I'm currently working on a project where I don't want to create a bunch of extra tables just to handle their individual types. Therefore I'm trying to come up with a database-design-solution that fits all tables.
My best thought of solution:
At first I thought I might just create a polymorphic table that has id, type_id, typable_id and typable_type, plus a types table. Then I figured that I would have to specify in the types table which type attribute belongs to which table. Then it hit me: I can create a self-referencing table where the parent name is the table name.
E.g.
---------------------------------------------
|id | parent_id | name | description |
---------------------------------------------
| 1 | null | feedback | something |
---------------------------------------------
| 2 | 1 | general | something |
---------------------------------------------
| 3 | 1 | bug | something |
---------------------------------------------
| 4 | 1 | improvement | something |
---------------------------------------------
| 5 | null | countries | something |
---------------------------------------------
| 6 | 5 | europe | something |
---------------------------------------------
| 7 | 5 | asia | something |
---------------------------------------------
| etc... |
---------------------------------------------
Is this an OK design? I'm thinking a lot about the parent names in this table, since I haven't seen anyone else use table names as parents.
From a front-end point of view, it makes it easier to get the correct types depending on which types you're looking for.
Please give me feedback on this. I'm struggling to find a good design.
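To show what looking up types through such a table involves, here is a small sketch in Python with SQLite rather than Laravel. It assumes the table is called `types`, that every row has a unique id (6 and 7 for the last two rows are assumed values), and that top-level rows are named after the table they describe; `types_for` is a hypothetical helper.

```python
import sqlite3


def make_db():
    """The self-referencing table from the question, with unique ids assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE types (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES types(id),
            name TEXT NOT NULL,
            description TEXT
        );
        INSERT INTO types VALUES
            (1, NULL, 'feedback', 'something'),
            (2, 1, 'general', 'something'),
            (3, 1, 'bug', 'something'),
            (4, 1, 'improvement', 'something'),
            (5, NULL, 'countries', 'something'),
            (6, 5, 'europe', 'something'),
            (7, 5, 'asia', 'something');
    """)
    return conn


def types_for(conn, table_name):
    """Return (id, name) of every type whose parent row is named after the table."""
    return conn.execute(
        """
        SELECT child.id, child.name
        FROM types AS child
        JOIN types AS parent ON parent.id = child.parent_id
        WHERE parent.parent_id IS NULL AND parent.name = ?
        ORDER BY child.id
        """,
        (table_name,),
    ).fetchall()


if __name__ == "__main__":
    print(types_for(make_db(), "feedback"))
```

One thing the sketch makes visible: nothing in the schema stops a feedback row from pointing at a type that belongs to countries, so that rule would have to be enforced in application code.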

What's the fastest way to perform large inserts with foreign key relationships and preprocessing?

I need to regularly import large (hundreds of thousands of lines) tsv files into multiple related SQL Server 2008 R2 tables.
The input file looks something like this (it's actually even more complex and the data is of a different nature, but what I have here is analogous):
January_1_Lunch.tsv
+-------+----------+-------------+---------+
| Diner | Beverage | Food | Dessert |
+-------+----------+-------------+---------+
| Nancy | coffee | salad_steak | pie |
| Joe | milk | soup_steak | cake |
| Pat | coffee | soup_tofu | pie |
+-------+----------+-------------+---------+
Notice that one column contains a character-delimited list that needs preprocessing to split it up.
The schema is highly normalized -- each record has multiple many-to-many foreign key relationships. Nothing too unusual here...
Meals
+----+-----------------+
| id | name |
+----+-----------------+
| 1 | January_1_Lunch |
+----+-----------------+
Beverages
+----+--------+
| id | name |
+----+--------+
| 1 | coffee |
| 2 | milk |
+----+--------+
Food
+----+-------+
| id | name |
+----+-------+
| 1 | salad |
| 2 | soup |
| 3 | steak |
| 4 | tofu |
+----+-------+
Desserts
+----+------+
| id | name |
+----+------+
| 1 | pie |
| 2 | cake |
+----+------+
Each input column is ultimately destined for a separate table.
This might seem an unnecessarily complex schema -- why not just have a single table that matches the input? But consider that a diner may come into the restaurant and order only a drink or a dessert, in which case there would be many null rows. Considering that this DB will ultimately store hundreds of millions of records, that seems like a poor use of storage. I also want to be able to generate reports for just beverages, just desserts, etc., and I figure those will perform much better with separate tables.
The orders are tracked in relationship tables like this:
BeverageOrders
+--------+---------+------------+
| mealId | dinerId | beverageId |
+--------+---------+------------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 1 |
+--------+---------+------------+
FoodOrders
+--------+---------+--------+
| mealId | dinerId | foodId |
+--------+---------+--------+
| 1 | 1 | 1 |
| 1 | 1 | 3 |
| 1 | 2 | 2 |
| 1 | 2 | 3 |
| 1 | 3 | 2 |
| 1 | 3 | 4 |
+--------+---------+--------+
DessertOrders
+--------+---------+-----------+
| mealId | dinerId | dessertId |
+--------+---------+-----------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 1 |
+--------+---------+-----------+
Note that there are more records for Food because the input contained those nasty little lists that were split into multiple records. This is another reason it helps to have separate tables.
So the question is, what's the most efficient way to get the data from the file into the schema you see above?
Approaches I've considered:
Parse the tsv file line-by-line, performing the inserts as I go. Whether using an ORM or not, this seems like a lot of trips to the database and would be very slow.
Parse the tsv file to data structures in memory, or multiple files on disk, that correspond to the schema. Then use SqlBulkCopy to import each one. While it's fewer transactions, it seems more expensive than simply performing lots of inserts, due to having to either cache a lot of data or perform many writes to disk.
Per "How do I bulk insert two datatables that have an Identity relationship" and "Best practices for inserting/updating large amount of data in SQL Server 2008", import the tsv file into a staging table, then merge into the schema, using DB functions to do the preprocessing. This seems like the best option, but I'd think the validation and preprocessing could be done more efficiently in C# or really anything else.
Are there any other possibilities out there?
The schema is still under development so I can revise it if that ends up being the sticking point.
You can import your file into a table with the following structure: Diner, Beverage, Food, Dessert, ID (identity, primary key NOT CLUSTERED, for performance reasons).
After this, simply add the following columns: Diner_ID, Beverage_ID, Dessert_ID, and fill them from your separate tables. It is simple to group on each of the columns, add the missing values to the lookup tables (Beverages, Desserts, Meals) and then update the imported table with the IDs of both existing and newly added records.
The Food table is more complex because foods can be combined, but the same trick works: add the individual foods to your lookup table and, alongside that, store each distinct food combination in an additional temp table (with a unique ID), together with its breakdown into single dishes.
When the parsing is finished, you will have 3 temp tables:
table with all your imported data and IDs for all text columns
table with distinct food lists (with IDs)
table with IDs of food per food combination
From the above tables you can insert the parsed values into whatever structure you want.
In this case only 1 insert (bulk) will be done to the DB from the code side. All other data manipulations will be performed in the DB.
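The staging approach can be sketched end to end for the Food column, which is the awkward one. This is an illustration in Python with SQLite, not SQL Server: the `staging`, `Diners` and `food_items` tables and the `import_food_orders` helper are assumptions, the recursive query stands in for whatever string-splitting function the real database offers, and the loop-free `executemany` call plays the role of the single bulk insert.

```python
import sqlite3

SAMPLE_ROWS = [
    ("Nancy", "coffee", "salad_steak", "pie"),
    ("Joe", "milk", "soup_steak", "cake"),
    ("Pat", "coffee", "soup_tofu", "pie"),
]


def import_food_orders(rows, meal_id=1):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging (id INTEGER PRIMARY KEY, Diner TEXT, Beverage TEXT,
                              Food TEXT, Dessert TEXT);
        CREATE TABLE Diners (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE Food (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE FoodOrders (mealId INTEGER, dinerId INTEGER, foodId INTEGER);
        CREATE TABLE food_items (staging_id INTEGER, item TEXT);
    """)
    # The only write issued per input row from the client side.
    conn.executemany(
        "INSERT INTO staging (Diner, Beverage, Food, Dessert) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    # Everything below runs inside the database.
    conn.executescript("""
        -- Split 'salad_steak' into one row per dish.
        WITH RECURSIVE split(staging_id, item, rest) AS (
            SELECT id, '', Food || '_' FROM staging
            UNION ALL
            SELECT staging_id,
                   substr(rest, 1, instr(rest, '_') - 1),
                   substr(rest, instr(rest, '_') + 1)
            FROM split
            WHERE rest <> ''
        )
        INSERT INTO food_items (staging_id, item)
        SELECT staging_id, item FROM split WHERE item <> '';

        -- Add only the lookup values that are not there yet.
        INSERT OR IGNORE INTO Diners (name) SELECT DISTINCT Diner FROM staging;
        INSERT OR IGNORE INTO Food (name) SELECT DISTINCT item FROM food_items;
    """)
    # Resolve names to ids and fill the relationship table in one statement.
    conn.execute(
        """
        INSERT INTO FoodOrders (mealId, dinerId, foodId)
        SELECT ?, d.id, f.id
        FROM food_items AS fi
        JOIN staging AS s ON s.id = fi.staging_id
        JOIN Diners AS d ON d.name = s.Diner
        JOIN Food AS f ON f.name = fi.item
        """,
        (meal_id,),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = import_food_orders(SAMPLE_ROWS)
    print(db.execute("SELECT COUNT(*) FROM FoodOrders").fetchone()[0])
```

Beverages and desserts would follow the same pattern without the splitting step.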

Schema to support dynamic properties

I'm working on an editor that enables its users to create "object" definitions in real time. A definition can contain zero or more properties. A property has a name and a type. Once a definition is created, a user can create an object of that definition and set the property values of that object.
So, for example, by the click of a mouse button the user should be able to create a new definition called "Bicycle" and add the property "Size" of type "Numeric", then another property called "Name" of type "Text", and then another property called "Price" of type "Numeric". Once that is done, the user should be able to create a couple of "Bicycle" objects and fill in the "Name" and "Price" property values of each bike.
Now, I've seen this feature in several software products, so it must be a well-known concept. My problem started when I sat down and tried to come up with a DB schema to support this data structure, because I want the property values to be stored using the appropriate column types, i.e. a numeric property value is stored as, say, an INT in the database, and a textual property value is stored as VARCHAR.
First, I need a table that will hold all my object definitions:
Table obj_defs
id | name |
----------------
1 | "Bicycle" |
2 | "Book" |
Then I need a table for holding what sort of properties each object definition should have:
Table prop_defs
id | obj_def_id | name | type |
------------------------------------
1 | 1 | "Size" | ? |
2 | 1 | "Name" | ? |
3 | 1 | "Price" | ? |
4 | 2 | "Title" | ? |
5 | 2 | "Author" | ? |
6 | 2 | "ISBN" | ? |
I would also need a table that holds each object:
Table objects
id | created | updated |
------------------------------
1 | 2011-05-14 | 2011-06-15 |
2 | 2011-05-14 | 2011-06-15 |
3 | 2011-05-14 | 2011-06-15 |
Finally, I need a table that will hold the actual property values of each object, and one solution is for this table to have one column for each possible value type, such as this:
Table prop_vals
id | prop_def_id | object_id | numeric | textual | boolean |
------------------------------------------------------------
1 | 1 | 1 | 27 | | |
2 | 2 | 1 | | "Trek" | |
3 | 3 | 1 | 1249 | | |
4 | 1 | 2 | 26 | | |
5 | 2 | 2 | | "GT" | |
6 | 3 | 2 | 159 | | |
7 | 4 | 3 | | "It" | |
8 | 5 | 3 | | "King" | |
9 | 6 | 3 | 9 | | |
If I implemented this schema, what would the "type" column of the prop_defs table hold? Integers that each map to a column name, varchars that simply hold the column name? Any other possibilities? Would a stored procedure help me out here in some way? And what would the SQL for fetching the "name" property of object 2 look like?
You are implementing something called the Entity-Attribute-Value model: http://en.wikipedia.org/wiki/Entity-attribute-value_model.
Lots of folks will say it's a bad idea (usually I am one of those) because the answer to your last question, "What would the SQL for fetching..." tends to be "thick, hairy and nasty, and getting worse."
These criticisms tend to hold once you allow users to start nesting objects inside of other objects. If you do not allow that, the situation will remain manageable.
For your first question, "what would the "type" column of the prop_defs table hold", everything will be simpler if you have a table of types and descriptions that holds {"numeric","Any Number"}, {"textual","String"}, etc. The first value is the primary key. Then in prop_defs your column "type" is a foreign key to that table and holds values "numeric", "textual", etc. Some will tell you incorrectly to always use integer keys because they JOIN faster, but if you use the values "numeric", "textual" etc. you don't have to JOIN and the fastest JOIN is the one you don't do.
The query to grab a single value will have a CASE statement:
-- Depending on the DBMS, the CASE branches may need a CAST to a common type.
SELECT CASE WHEN pd.type = 'numeric' THEN pv.numeric
            WHEN pd.type = 'textual' THEN pv.textual
            WHEN pd.type = 'boolean' THEN pv.boolean
       END
FROM prop_vals pv
JOIN prop_defs pd ON pv.prop_def_id = pd.id
WHERE pv.object_id = 2
  AND pd.name = 'Name'
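For anyone who wants to try the lookup, here is a self-contained sketch in Python with SQLite, loaded with the Bicycle rows from the question. The value columns are renamed to `numeric_val`, `textual_val` and `boolean_val` to stay clear of type keywords, and `get_prop` is a hypothetical helper; SQLite's loose typing also hides the casting issue a stricter database would raise.

```python
import sqlite3


def make_db():
    """prop_defs and prop_vals for the two Bicycle objects in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prop_defs (
            id INTEGER PRIMARY KEY, obj_def_id INTEGER, name TEXT, type TEXT
        );
        CREATE TABLE prop_vals (
            id INTEGER PRIMARY KEY,
            prop_def_id INTEGER REFERENCES prop_defs(id),
            object_id INTEGER,
            numeric_val INTEGER, textual_val TEXT, boolean_val INTEGER
        );
        INSERT INTO prop_defs VALUES
            (1, 1, 'Size', 'numeric'),
            (2, 1, 'Name', 'textual'),
            (3, 1, 'Price', 'numeric');
        INSERT INTO prop_vals VALUES
            (1, 1, 1, 27, NULL, NULL),
            (2, 2, 1, NULL, 'Trek', NULL),
            (3, 3, 1, 1249, NULL, NULL),
            (4, 1, 2, 26, NULL, NULL),
            (5, 2, 2, NULL, 'GT', NULL),
            (6, 3, 2, 159, NULL, NULL);
    """)
    return conn


def get_prop(conn, object_id, prop_name):
    """Return the value of one property of one object, or None if it is not set."""
    row = conn.execute(
        """
        SELECT CASE pd.type
                   WHEN 'numeric' THEN pv.numeric_val
                   WHEN 'textual' THEN pv.textual_val
                   WHEN 'boolean' THEN pv.boolean_val
               END
        FROM prop_vals AS pv
        JOIN prop_defs AS pd ON pv.prop_def_id = pd.id
        WHERE pv.object_id = ? AND pd.name = ?
        """,
        (object_id, prop_name),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(get_prop(make_db(), 2, "Name"))
```

Fetching a whole object means one such row per property, which is where EAV queries start to grow.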
You must accept that relational databases are not good at providing this kind of functionality. They CAN provide it, but are not good at it. (I hope I'm wrong). Relational databases lend themselves better to defined interfaces, not changing interfaces.
--EAV tables give dynamic fields but suck on performance and on indexing, and they are complex to query. EAV gets the job done in many situations, but can fall apart on big tables with lots of users hitting the system.
--"Regular" tables with several placeholder columns are OK for performance, but you get non-descriptive column names and are limited in the number of columns you can "add". They also do not support sub-type separation.
--Typically you create/modify tables at development time, not run time. Should we really discriminate against modifying the database at run time? Maybe, maybe not. Creating new tables, foreign keys, and columns at run time can achieve true dynamic objects while giving the performance benefits of "regular" tables. But you would have to query the schema of the database, then dynamically generate all of your queries. That would suck. It would totally break the concept of tables as an interface.

Resources