I'm trying to find my way around DynamoDB and NoSQL.
What is the best (right?) approach for modeling a student table and class tables, given that I need a student-is-in-class relationship?
I'm taking into account that there is no secondary index available in DynamoDB.
The model needs to answer the following questions:
Which students are in a specific class?
Which classes does a student take?
Thanks
A very simple suggestion (without range keys) would be to have two tables: One per query type. This is not unusual in NoSQL databases.
In your case we'd have:
A table Student with attribute StudentId as (hash type) primary key. Each item might then have an attribute named Attends, whose value is a list of class Ids.
A table Class with attribute ClassId as (hash type) primary key. Each item might then have an attribute named AttendedBy, whose value is a list of student Ids.
Performing your queries would be simple. Updating the database with one "attends"-relationship between a student and a class requires two separate writes, one to each table.
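A minimal sketch of the two-table idea, using plain Python dicts as stand-ins for the Student and Class tables (no AWS calls are made). The function names are made up for illustration, and the lists are kept as sets here so that enrolling twice does not create duplicates; that is an assumption, not part of the suggestion above.

```python
# In-memory stand-ins for the two tables; each dict maps a hash key to an item.
student_table = {}  # StudentId -> {"Attends": set of ClassIds}
class_table = {}    # ClassId -> {"AttendedBy": set of StudentIds}


def enroll(student_id, class_id):
    """Record one attends-relationship. Note that it takes two writes."""
    student_table.setdefault(student_id, {"Attends": set()})["Attends"].add(class_id)
    class_table.setdefault(class_id, {"AttendedBy": set()})["AttendedBy"].add(student_id)


def classes_of(student_id):
    """Which classes does a student take? One key lookup in Student."""
    return sorted(student_table.get(student_id, {}).get("Attends", set()))


def students_in(class_id):
    """Which students are in a class? One key lookup in Class."""
    return sorted(class_table.get(class_id, {}).get("AttendedBy", set()))


enroll("s1", "math")
enroll("s1", "physics")
enroll("s2", "math")
```

Each question is answered by a single key lookup, at the price of keeping both tables in sync on every write.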
Another design would have one table Attends with a hash and range primary key. Each record would represent the attendance of one student in one class. The hash attribute could be the Id of the class and the range key could be the Id of the student. Supplementary data on the class and the student would then reside in other tables.
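A rough sketch of the single Attends table, again with an in-memory dict instead of a real table. The hash key (ClassId) selects a group of items and the range key (StudentId) identifies one item inside it; the helper names are hypothetical. It also shows the trade-off: without a secondary index, as the question assumes, the reverse question has to look at every class.

```python
from collections import defaultdict

# ClassId (hash key) -> {StudentId (range key) -> item}
attends = defaultdict(dict)


def put_attendance(class_id, student_id):
    """One write per student-in-class record."""
    attends[class_id][student_id] = {"ClassId": class_id, "StudentId": student_id}


def students_in(class_id):
    """Lookup by hash key: only the items of one class are touched."""
    return sorted(attends.get(class_id, {}))


def classes_of(student_id):
    """No key starts with StudentId, so this walks over every class (a scan)."""
    return sorted(c for c, students in attends.items() if student_id in students)


put_attendance("math", "s1")
put_attendance("math", "s2")
put_attendance("physics", "s1")
```

If the scan is too expensive, one option is a second table with the keys reversed (StudentId as hash, ClassId as range), which brings back the two-writes cost of the first design.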
To join two Amazon DynamoDB tables
The following example maps two Hive tables to data stored in Amazon DynamoDB. It then calls a join across those two tables. The join is computed on the cluster and returned. The join does not take place in Amazon DynamoDB. This example returns a list of customers and their purchases for customers that have placed more than two orders.
CREATE EXTERNAL TABLE hive_purchases(customerId bigint, total_cost double, items_purchased array<String>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Purchases",
"dynamodb.column.mapping" = "customerId:CustomerId,total_cost:Cost,items_purchased:Items");
CREATE EXTERNAL TABLE hive_customers(customerId bigint, customerName string, customerAddress array<String>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Customers",
"dynamodb.column.mapping" = "customerId:CustomerId,customerName:Name,customerAddress:Address");
Select c.customerId, c.customerName, count(*) as count from hive_customers c
JOIN hive_purchases p ON c.customerId=p.customerId
GROUP BY c.customerId, c.customerName HAVING count > 2;
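To see what the join itself returns without a Hive cluster, here is the same query shape run against a local SQLite database from Python. This is only a stand-in: the table names, the sample rows and the order_count alias are invented, and nothing here talks to DynamoDB or Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerId INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE purchases (customerId INTEGER, total_cost REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
-- Ann has three purchases, Bob has one.
INSERT INTO purchases VALUES (1, 10.0), (1, 5.0), (1, 7.5), (2, 3.0);
""")

# Customers with more than two purchases, with their purchase count.
rows = conn.execute("""
SELECT c.customerId, c.customerName, COUNT(*) AS order_count
FROM customers c
JOIN purchases p ON c.customerId = p.customerId
GROUP BY c.customerId, c.customerName
HAVING COUNT(*) > 2
""").fetchall()
```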
According to Kimball, factless fact tables are “fact tables that have no facts but captures the many-to-many relationship between dimension keys.”
A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions (it contains nothing but dimensional keys).
In my case, I am creating a fact table which captures, for each employee:
their function
their role
their main manager
their department
their status
EntryDate
ExitDate
The events related to my fact table are:
- any change to the function, role, main manager, etc. of an already existing employee
- the arrival of a new employee
For historical purposes, I am also adding to my fact table:
BI_StartDate
BI_EndDate
Is my fact table a factless fact table?
The fact table contains the history: how can I track the date of each update if an employee's function and type both change in the same period?
This is an example of a Type II (slowly changing) dimension.
Notes:
The current record should have a null BI_EndDate.
You can either join to the current record on EmpID where BI_EndDate is null,
or
you can join to the record that was valid at a given time:
EmpID and [Comparison date] >= BI_StartDate and [Comparison date] <= ISNULL(BI_EndDate,'20991231')
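As a sketch of both lookups, here is a small SQLite version driven from Python. The table name EmployeeHistory, the JobFunction column and the sample rows are assumptions made up for the example; SQLite's IFNULL stands in for ISNULL, and dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeHistory (
    EmpID INTEGER,
    JobFunction TEXT,
    BI_StartDate TEXT,
    BI_EndDate TEXT      -- NULL marks the current record
);
INSERT INTO EmployeeHistory VALUES
    (1, 'Analyst', '2020-01-01', '2021-06-30'),
    (1, 'Manager', '2021-07-01', NULL);
""")


def current_function(emp_id):
    """Current info: the record whose BI_EndDate is null."""
    row = conn.execute(
        "SELECT JobFunction FROM EmployeeHistory"
        " WHERE EmpID = ? AND BI_EndDate IS NULL",
        (emp_id,),
    ).fetchone()
    return row[0] if row else None


def function_on(emp_id, comparison_date):
    """Record valid at comparison_date (an ISO 'YYYY-MM-DD' string)."""
    row = conn.execute(
        "SELECT JobFunction FROM EmployeeHistory"
        " WHERE EmpID = ?"
        "   AND ? >= BI_StartDate"
        "   AND ? <= IFNULL(BI_EndDate, '2099-12-31')",
        (emp_id, comparison_date, comparison_date),
    ).fetchone()
    return row[0] if row else None
```

For example, function_on(1, '2020-12-15') picks the first row and current_function(1) picks the second.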
Furthermore, I think your example of a factless fact seems more in line with many-to-many relationships.
As an example, think of students and classes. There are many students and many classes, and the intersection of the two is a studentClass table (officially titled studentEnrollment, but that's not important).
I wouldn't necessarily call this factless, as the measures coming from this table are counts.
The main idea is to store multiple area ids in one column. Example:
Area A id=1
Area B id=2
If possible, I want to save in one column which areas my customer can service.
For example, if my customer can service both of them, I would store both in one column. I imagine something like:
ColumnArea
1,2 //....or whatever area can service
Then I want to use an SQL query to retrieve this customer if the column contains a given id.
Select * from customers where ColumnArea=1
Is there any technique or idea for doing that?
You really should not do that.
Storing multiple data points in a single column is bad design.
For a detailed explanation, read Is storing a delimited list in a database column really that bad?, where you will see a lot of reasons why the answer to this question is Absolutely yes!
What you want to do in this situation is create a new table, with a relationship to the existing table. In this case, you will probably need a many-to-many relationship, since clearly one customer can service more than one area, and I'm assuming one area can be serviced by more than one customer.
A many-to-many relationship is created by connecting the two tables containing the data through another table containing the connections between the data (a.k.a. a bridge table). All relationships directly between tables are either one-to-one or one-to-many, and it is the bridge table that allows the relationship between the two data tables to be many-to-many.
So the database structure you want is something like this:
Customers Table
CustomerId (Primary key)
FirstName
LastName
... Other customer related data here
Areas Table
AreaId (Primary key)
AreaName
... Other area related data here
CustomerToArea table
CustomerId
AreaId
(Note: The combination of both columns is the primary key here)
Then you can select customers for area 1 like this:
SELECT C.*
FROM Customers AS C
WHERE EXISTS
(
SELECT 1
FROM CustomerToArea AS CA
WHERE CA.CustomerId = C.CustomerId
AND CA.AreaId = 1
)
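For anyone who wants to try the structure out, here is a small runnable sketch using SQLite from Python. It uses the three tables listed above; the sample rows and the customers_for_area helper are made up for illustration, and the SQL dialect is SQLite's rather than any particular server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Areas (AreaId INTEGER PRIMARY KEY, AreaName TEXT);
CREATE TABLE CustomerToArea (
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
    AreaId     INTEGER NOT NULL REFERENCES Areas(AreaId),
    PRIMARY KEY (CustomerId, AreaId)   -- each pair can appear only once
);
INSERT INTO Customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO Areas VALUES (1, 'Area A'), (2, 'Area B');
-- Ann services both areas, Bob only Area B.
INSERT INTO CustomerToArea VALUES (1, 1), (1, 2), (2, 2);
""")


def customers_for_area(area_id):
    """Return the ids of customers that service the given area."""
    rows = conn.execute(
        """
        SELECT C.CustomerId
        FROM Customers AS C
        WHERE EXISTS (
            SELECT 1
            FROM CustomerToArea AS CA
            WHERE CA.CustomerId = C.CustomerId
              AND CA.AreaId = ?
        )
        ORDER BY C.CustomerId
        """,
        (area_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The composite primary key is what stops the same customer/area pair from being stored twice.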
I am trying to create a table in my database using Visual Studio.
I've got a table for my Products (like in online shop) and then I have a table for Orders, which should store all products that user has ordered. The problem is that I am not sure which datatype I should use when designing the database to store an array of products in my Orders table. This is what the Orders table should look like
You should create Products and Orders tables with a relationship between them.
Your Orders table should have an Id column as well (which is the primary key).
Then you should create a Products table that keeps all the information about products and, additionally, an OrderId column which is used as a foreign key to the Orders table.
Please look at that link:
https://msdn.microsoft.com/en-us/library/ms189049.aspx
It's also worth reading up on:
One To One, One To Many, Many To Many relations in SQL Server, to get a better understanding and design your data store properly.
In your case you need a ProductsOrders table, i.e. a Many To Many relationship.
In a relational database, you can create a relationship between two tables.
The relationship can be:
1 to 1 (1 product - 1 order)
1 to many (1 product - n orders)
Many to many (n products - n orders)
Based on your scenario, you can choose any of the relationships listed above. When querying the database, you can then easily work with each order or product.
Suppose I have a table for purchase orders. One customer might buy many products. I need to store all these products and their relevant prices in a single record, such as an invoice format.
If you can change the db design, it is better to create another table called PO_products that has PO_Id as a foreign key to the PurchaseOrder table. This would be more flexible and is the right design for your requirement.
If for some reason you are hard pressed to store everything in a single cell (which, I reiterate, is not a good design), you can make use of XMLType and store all of the product information as XML.
Note: Besides being bad design, storing the data as XML carries a significant performance cost.
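A minimal sketch of the PO_products approach, using SQLite from Python rather than the database from the question. Only PurchaseOrder, PO_products and PO_Id come from the answer above; the other columns (CustomerName, ProductName, Price), the sample rows and the invoice helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PurchaseOrder (PO_Id INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE PO_products (
    PO_Id       INTEGER NOT NULL REFERENCES PurchaseOrder(PO_Id),
    ProductName TEXT NOT NULL,
    Price       REAL NOT NULL
);
INSERT INTO PurchaseOrder VALUES (100, 'Ann');
-- One row per product on the order, instead of one cell holding them all.
INSERT INTO PO_products VALUES (100, 'Keyboard', 30.0), (100, 'Mouse', 12.5);
""")


def invoice(po_id):
    """Return the order's lines and their total, assembled at query time."""
    lines = conn.execute(
        "SELECT ProductName, Price FROM PO_products"
        " WHERE PO_Id = ? ORDER BY ProductName",
        (po_id,),
    ).fetchall()
    total = sum(price for _, price in lines)
    return lines, total
```

The invoice format is then something you build when reading, not something you store.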
This is a typical example of an n-n relationship between customer and products.
Let's say 1 customer can have from 0 to N products and 1 product can be bought by 0 to N customers. You want to use a junction table to store every purchase order.
This junction table may contain the id of the purchase, the id of the customer and the id of the product.
https://en.wikipedia.org/wiki/Many-to-many_(data_model)
I have a problem with a many-to-many relation in my tables, which is between an employee and instructor who work in a training centre. I cannot find the link between them, and I don't know how to get it. The employee fields are:
employee no.
employee name
company name
department job title
business area
mobile number
ext
ranking
The Instructors fields are
instructor name
institute
mobile number
email address
fees
In a many-to-many relationship, the relationships will be in a 3rd table, something like
table EmployeeInstructor
EmployeeID
InstructorID
To find all the employees for a specific instructor, you'd use a join against all three tables.
Or more likely there will be classes involved --
Employee takes Class
Instructor teaches Class
so you'll have an EmployeeClass table,
an InstructorClass table,
and join through them. And Class needs to be unique, or else you'll need
Class is taught in Quarter on ClassSchedule
and end up joining EmployeeClassSchedule to InstructorClassSchedule.
This ends up being one of your more interesting relational designs pretty quickly. If you google for "Terry Halpin" and "Object Role Modeling", this is used as an illustrative situation in the tutorial.
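Here is a rough sketch of the simpler variant (EmployeeClass joined to InstructorClass through the class), using SQLite from Python. The ID columns, the sample rows and the instructors_of helper are made up for the example, and it assumes each class is unique, so no ClassSchedule is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeClass (
    EmployeeID INTEGER NOT NULL,
    ClassID    INTEGER NOT NULL,
    PRIMARY KEY (EmployeeID, ClassID)
);
CREATE TABLE InstructorClass (
    InstructorID INTEGER NOT NULL,
    ClassID      INTEGER NOT NULL,
    PRIMARY KEY (InstructorID, ClassID)
);
-- Employee 1 takes classes 10 and 11; employee 2 takes class 11.
INSERT INTO EmployeeClass VALUES (1, 10), (1, 11), (2, 11);
-- Instructor 7 teaches class 10; instructor 8 teaches class 11.
INSERT INTO InstructorClass VALUES (7, 10), (8, 11);
""")


def instructors_of(employee_id):
    """Instructors linked to an employee through the classes they share."""
    rows = conn.execute(
        """
        SELECT DISTINCT ic.InstructorID
        FROM EmployeeClass AS ec
        JOIN InstructorClass AS ic ON ic.ClassID = ec.ClassID
        WHERE ec.EmployeeID = ?
        ORDER BY ic.InstructorID
        """,
        (employee_id,),
    ).fetchall()
    return [row[0] for row in rows]
```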
First of all, you will need a unique key in both tables. The employee number may work for the employee table, but you will need another for the instructor table. Personally, I tend to use auto incrementing identity fields called ID in my tables. This is the primary key.
Second, create a new table, InstructorEmployee. This table has two columns, InstructorID and EmployeeID. Both fields should be indexed. Now you can create an association between any Employee and any Instructor by creating a record which contains the two IDs.
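The two steps above could look roughly like this in SQLite, driven from Python. The name columns, the index names and the helper functions are assumptions for the example; only the ID keys and the InstructorEmployee table with its two indexed columns come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (ID INTEGER PRIMARY KEY AUTOINCREMENT, EmployeeName TEXT);
CREATE TABLE Instructor (ID INTEGER PRIMARY KEY AUTOINCREMENT, InstructorName TEXT);
CREATE TABLE InstructorEmployee (
    InstructorID INTEGER NOT NULL REFERENCES Instructor(ID),
    EmployeeID   INTEGER NOT NULL REFERENCES Employee(ID)
);
CREATE INDEX idx_ie_instructor ON InstructorEmployee (InstructorID);
CREATE INDEX idx_ie_employee ON InstructorEmployee (EmployeeID);
""")


def add_employee(name):
    """Insert an employee and return its generated ID."""
    return conn.execute(
        "INSERT INTO Employee (EmployeeName) VALUES (?)", (name,)
    ).lastrowid


def add_instructor(name):
    """Insert an instructor and return its generated ID."""
    return conn.execute(
        "INSERT INTO Instructor (InstructorName) VALUES (?)", (name,)
    ).lastrowid


def link(instructor_id, employee_id):
    """Associate one instructor with one employee."""
    conn.execute(
        "INSERT INTO InstructorEmployee VALUES (?, ?)", (instructor_id, employee_id)
    )


def employees_of(instructor_id):
    """Join through the association table to list an instructor's employees."""
    rows = conn.execute(
        """
        SELECT e.EmployeeName
        FROM InstructorEmployee AS ie
        JOIN Employee AS e ON e.ID = ie.EmployeeID
        WHERE ie.InstructorID = ?
        ORDER BY e.EmployeeName
        """,
        (instructor_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Usage would be along the lines of link(add_instructor("Dana"), add_employee("Ann")), after which employees_of returns the linked names for that instructor's ID.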