How to move table rows to other AMPs in Teradata?

Let's say there are clique 1 (containing 4 nodes) and clique 2 (containing 2 nodes),
and the table rows are distributed across the AMPs of both clique 1 and clique 2 nodes.
When clique 2 failed, we lost access to some important tables.
As one of our options, we want to move these important tables onto the AMPs of clique 1 nodes only.
How do you do this? I looked into some manuals, and it looks like sparse maps can be used, but I am having trouble determining the AMP numbers of the nodes in clique 1.
Also, the table I am referring to has millions of rows, so I am afraid the move will cause AMP skew.
Does anyone have an alternative solution? I will appreciate any ideas. Thank you.
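For reference, a rough sketch of the sparse-map route mentioned above, assuming the Teradata 16.10+ MAPS feature (the map and table names are placeholders, and the exact syntax should be checked against the DDL manual for your release):
-- A sparse map uses only a subset of the AMPs of an existing contiguous map.
-- Note that the system picks the member AMPs by hashing, so a sparse map
-- cannot be pinned to the AMPs of clique 1 directly.
CREATE MAP clique1_map
FROM TD_Map1
SPARSE AMPCOUNT = 32;   -- a count of AMPs, not a list of specific AMPs
-- Moving the table onto the new map redistributes every row.
ALTER TABLE important_db.important_table, MAP = clique1_map;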

Related

What is the correct data structure for a workout journal?

I have trouble getting my head around the correct data structure for the following application, although I am sure this is quite standard for people in the field.
I want to create, for learning purposes, a workout journal app. The idea is to be able to log a particular workout each day.
A workout is comprised of exercises, and each exercise has particular attributes.
For example, workout 1 is a strength session, so it will contain e.g. dumbbell press, squats, ..., which are all sets-and-reps based. So for workout 1 I need to be able to enter, for each exercise, the sets, reps and weight used in that particular workout.
Workout 2 is, say, a running session. This is time based and distance based, so for workout 2 I need to be able to enter time and distance.
What structure would I need in my database for such an application?
I guess I should have an "exercise" table, which should then somehow be a foreign key in the "workout" table. But how can the workout table accommodate varying attributes, as well as a varying number of entries (since a workout can be one, three or ten exercises)? Also, all of this should constitute only one record of the "workout" table.
EDIT:
I have tried to come up with a structure. Could someone confirm or refute that this is the correct way to do this?
So the final result, for human representation, is the one below:
Final Result (sport journal)
Date   | Timestamp | Exercise 1  | Sets | Reps | Weight | Exercise 2 | Time | Distance | Exercise 3 | Sets | Reps | Weight
--------------------------------------------------------------------------------------------------------------------------
120821 | 10.30     | Bench press | 5    | 10   | 40     | Run        | 60   | 400      | NULL       | NULL | NULL | NULL
120821 | 17.00     | Bench press | 5    | 10   | 40     | NULL       | NULL | NULL     | Squats     | 3    | 5    | 120
But I guess this can't really be achieved as such, since this is not (I think) possible in a relational database. So I need to have separate tables, and the human view shown above will be a join of those various tables. For example, one "record" of the human view can be obtained by joining various tables on the date and timestamp, i.e. the actual timing of the workout.
If that is correct, then I think a structure like this could work (at least the ideas are there, I think):
Exercise table (simply the list of exercises with their type, which determines the attributes needed):
Name        | Type
------------------
Bench press | Setsreps
Run         | Run
Squats      | Setsreps
Attributes table (the attributes depend on the exercise type; maybe this should be split further to avoid a varying number of columns per exercise type, i.e. run vs setsreps?):
Type     | Attribute1 | Attribute2 | Attribute3
-----------------------------------------------
Setsreps | Sets       | Reps       | Weight
Run      | Time       | Distance   | NULL
Carry    | Weight     | Distance   | NULL
Setsreps instances table (the actual realization of an exercise on a certain day; this table will be huge!):
Date   | Timestamp | Exercise    | Sets | Reps | Weight
-------------------------------------------------------
120821 | 10.30     | Bench press | 5    | 10   | 40
120821 | 17.00     | Squats      | 3    | 5    | 120
Run instances table (same as above but for runs, since a run instance has different attributes than a setsreps instance; is this the correct way to do it?):
Date   | Timestamp | Time | Distance
------------------------------------
120821 | 10.30     | 60   | 400
170821 | 17.00     | 120  | 800
So then I could have the "human" view by joining the setsreps and run tables on a particular date and timestamp (which together form a primary key), as sketched below.
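For illustration, a simplified sketch of that join (column names are assumptions, and it ignores workouts containing several setsreps exercises):
SELECT COALESCE(s.WorkoutDate, r.WorkoutDate) AS WorkoutDate,
       COALESCE(s.WorkoutTime, r.WorkoutTime) AS WorkoutTime,
       s.Exercise, s.Sets, s.Reps, s.Weight,  -- NULL when there was no strength work
       r.RunTime, r.Distance                  -- NULL when there was no run
FROM SetsrepsInstance s
FULL OUTER JOIN RunInstance r
  ON s.WorkoutDate = r.WorkoutDate
 AND s.WorkoutTime = r.WorkoutTime
ORDER BY WorkoutDate, WorkoutTime;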
Is this a correct way of thinking?
Thanks for the support
A workout is comprised of exercises. And each exercise has particular attributes.
One way to do this is to have a Workout table, an Exercise table, and an Attribute table. The Wikipedia article, Database normalization, will help you understand how to create a normalized database.
I'm assuming that you're not going to share attributes between exercises. If you do, the one-to-many relationship between exercises and attributes becomes a many-to-many relationship, and you would need a junction table to model it.
The Exercise table will have a foreign key to the Workout table, and the Attribute table will have a foreign key to the Exercise table.
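A minimal sketch of that layout in engine-neutral SQL (all names and types are assumptions):
CREATE TABLE Workout
(
    WorkoutID   INTEGER PRIMARY KEY,
    WorkoutDate DATE NOT NULL,
    WorkoutTime TIME NOT NULL
);
CREATE TABLE Exercise
(
    ExerciseID INTEGER PRIMARY KEY,
    WorkoutID  INTEGER NOT NULL REFERENCES Workout (WorkoutID),
    Name       VARCHAR(50) NOT NULL           -- e.g. 'Bench press'
);
-- One row per attribute of one exercise: ('Sets', 5), ('Reps', 10), ...
CREATE TABLE Attribute
(
    AttributeID INTEGER PRIMARY KEY,
    ExerciseID  INTEGER NOT NULL REFERENCES Exercise (ExerciseID),
    Name        VARCHAR(30) NOT NULL,
    Value       DECIMAL(9,2) NOT NULL
);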

What database lends itself to a composition relationship between entities?

TLDR: Looking for a free database option to run locally that lends itself to composition. Object A is composed of B, C, D, where B and C are of the same type as A. What should I use?
I would like to experiment with an open source database. I am playing a game that has crafting, and it is a bit cumbersome to drill down into the objects to figure out what resources I need. This seems like a good problem to solve with a database. I was hoping to explore a NoSQL option, as I do not have much experience with them.
To use a simple contrived example:
A staff: requires 5 wood
A spearhead: requires 2 iron
A spear: requires 1 staff, 1 spearhead
A trident: requires 1 staff, 3 spearheads, 2 iron
If I wanted to build 2 tridents and 1 spear, a query to my database would inform me that I need 15 wood and 18 iron.
So each craftable item requires some combination of base resources and/or other crafted items. The question to be put to the database is: given that I already possess some resources, what remains for me to collect in order to build some combination of items?
If I were to attempt this in SQL, I would make 4 tables (sketched below):
A resource table (the raw materials needed for crafting)
An item table (the things I can craft)
A many-to-many table, mapping items to items
A many-to-many table, mapping items to resources
What would you recommend I use? An acceptable answer might be that there are no NoSQL databases that lend themselves well to this problem (model and queries).
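A minimal sketch of those four tables (all names are assumptions):
CREATE TABLE Resource
(
    ResourceID INTEGER PRIMARY KEY,
    Name       VARCHAR(50) NOT NULL      -- wood, iron, ...
);
CREATE TABLE Item
(
    ItemID INTEGER PRIMARY KEY,
    Name   VARCHAR(50) NOT NULL          -- staff, spear, trident, ...
);
-- crafted items needed to craft an item
CREATE TABLE ItemItem
(
    ParentItemID INTEGER NOT NULL REFERENCES Item (ItemID),
    ChildItemID  INTEGER NOT NULL REFERENCES Item (ItemID),
    Quantity     INTEGER NOT NULL,
    PRIMARY KEY (ParentItemID, ChildItemID)
);
-- raw resources needed to craft an item
CREATE TABLE ItemResource
(
    ItemID     INTEGER NOT NULL REFERENCES Item (ItemID),
    ResourceID INTEGER NOT NULL REFERENCES Resource (ResourceID),
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (ItemID, ResourceID)
);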
Using the Bill of Materials picture I linked to in the comment, you have a Resource table.
Resource
--------
Resource ID
Resource Name
Here are some rows, based on your example. I deliberately added spearhead after spear. The order of the resources doesn't matter.
Resource ID | Resource Name
---------------------------
1           | Wood
2           | Iron
3           | Staff
4           | Spear
5           | Spearhead
6           | Trident
Next, you have a ResourceHierarchy table.
ResourceHierarchy
-----------------
ResourceHierarchy ID
Resource ID
Parent Resource ID (FK)
Resource Quantity
Here are some rows, again based on your example.
ResourceHierarchy ID | Resource ID | Parent Resource ID | Resource Quantity
---------------------------------------------------------------------------
1                    | 6           | null               | null
2                    | 5           | 6                  | 3
3                    | 3           | 6                  | 1
4                    | 2           | 6                  | 2
5                    | 3           | 4                  | 1
6                    | 5           | 4                  | 1
7                    | 2           | 5                  | 2
8                    | 1           | 3                  | 5
Admittedly, this is difficult to create by hand. You would have a part of your application that allows you to create Resource and ResourceHierarchy rows using the actual resource names.
You have to make several queries to retrieve all of the components of a top-level resource, starting with the row whose Parent Resource ID is null and querying your way down through the resources. That's the disadvantage of a Bill of Materials.
The advantage of a Bill of Materials is that there's no limit to the nesting, and you can freely combine items and resources to make more items.
You can identify resources and items with a flag in your Resource table if you wish.
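On engines that support recursive common table expressions, those repeated queries can be collapsed into a single statement. A sketch against the tables above (PostgreSQL-style WITH RECURSIVE; SQL Server uses plain WITH), with the column names assumed to be spelled without spaces:
WITH RECURSIVE BOM AS
(
    -- anchor: the top-level item to build, e.g. the Trident (ResourceID 6)
    SELECT ResourceID, 1 AS TotalQty
    FROM ResourceHierarchy
    WHERE ResourceID = 6 AND ParentResourceID IS NULL
    UNION ALL
    -- recurse: a component's quantity is multiplied by its parent's total
    SELECT rh.ResourceID, rh.ResourceQuantity * b.TotalQty
    FROM ResourceHierarchy rh
    JOIN BOM b ON rh.ParentResourceID = b.ResourceID
)
SELECT r.ResourceName, SUM(b.TotalQty) AS QtyNeeded
FROM BOM b
JOIN Resource r ON r.ResourceID = b.ResourceID
GROUP BY r.ResourceName;
For one trident this yields 5 Wood, 8 Iron, 1 Staff, 3 Spearhead and 1 Trident; a raw-material flag in the Resource table would let you filter down to just the base resources.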
You might want to consider a graph data model, such as JanusGraph, where entities (nodes) can be members of a set (defined as another node) via a relationship (edge).
That would allow you to have the multi-child or multi-parent relationships you are talking about.
Mother == married to == Father
child1, child2, child3 ... childN
would all then have a "childOf" relationship to both the mother and, separately, to the father, and would be "siblingOf" the other members of the set, as labeled along their edges.
Make sense?
Here is more on the types of edge labels and multiplicities you can have:
https://docs.janusgraph.org/basics/schema/
Disclosure: I work for ScyllaDB, and our database is often used as a storage engine under JanusGraph implementations. There are many other NoSQL graph databases you can check out; find the one that's right for your use case and data model.
Edit: JanusGraph is open source, as is Scylla:
https://github.com/JanusGraph
https://github.com/scylladb/scylla

One table with many necessary rows in SQL issue

Imagine one book table. A book's details comprise at least 40 columns (like Name, HeadAuthor, SecondAuthor, ISBN, DateOfPublish and so many more).
I want to add 30 more columns to this table that are completely related to my job but not really related to the book itself (like LibraryId, VisitedTimes, DownloadedTimes, HaveFiveStars, HaveFourStars, HaveThreeStars, HaveTwoStars, HaveOneStar [to calculate a book's rank], SoldTimes, LeasedTimes and some more).
So, in total we have 70 columns for at least 5 million books.
The second 30 columns will be filled eventually, but:
Another important thing is that some libraries may fill all of the first 40 columns completely, while some libraries with many books may fill just 10 of those 40. So in this case we have at least 2 million rows with many NULL or 0 columns.
I want speed and performance.
This question is very important to me, and I can't test both ways to check the speed and performance myself, so please don't tell me to go and check it myself.
I just need one best solution that explains what I should do!
Is it okay if I make a book table with 70 columns? Or what? Split the 70 columns into 2 tables with a 1:1 relation? Save the first 40 columns as JSON in one string field (will JSON be fast to read)?
Does it really matter: one 70-column table OR two tables of 40 and 30 columns with a 1:1 relation?
Just add a separate table that has book_id as a foreign key.
Since not all books will have additional details, use a left outer join from the book table to the additional-details table.
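A sketch of that split, reusing a few of the column names from the question (table and column names are assumptions):
-- Sparse, job-specific columns move to a 1:1 side table keyed by the book id.
CREATE TABLE BookDetails
(
    BookID          INTEGER PRIMARY KEY REFERENCES Book (BookID),
    LibraryID       INTEGER,
    VisitedTimes    INTEGER,
    DownloadedTimes INTEGER
);
-- Books without a details row still appear, with NULLs on the right-hand side.
SELECT b.Name, b.ISBN, d.VisitedTimes, d.DownloadedTimes
FROM Book b
LEFT OUTER JOIN BookDetails d ON d.BookID = b.BookID;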
I would create 2 tables, with most of the mandatory and important columns in table 1 (maybe 10-15 columns) and the rest in table 2.
Most importantly, some of your columns are redundant, like HaveFiveStars, HaveFourStars, HaveThreeStars, HaveTwoStars and HaveOneStar. Instead of 5 columns you can have a single column like ViewerRating.
Similarly, you can eliminate other columns.
I think performance will improve.
Read this:
Which is more efficient: Multiple MySQL tables or one large table?
Most of the reasons are already mentioned in that link. Also, the discussion there is not MySQL-specific; it applies to relational databases in general.
You should carefully read each and every line; the reasons given are very technical, not assumptions.
You have mentioned that there will be 5 million rows and that most of the columns will be null. I would say that not only will your performance improve, the design will also be much easier to maintain.
There are many good points mentioned there in favour of multiple tables.

Requesting feedback on database design

I am building a database for a Christmas tree growing operation. I have put together what I believe to be a workable schema. I am hoping to get some feedback, and I have no one to ask. You are my only hope.
So, there are 3 growing plots; we will call them Orchards. Each Orchard has rows and columns, and each row/column intersection can have zero or one trees planted in it. The rows and columns are numbers and letters, so row 3, column f, etc. Each row/column intersection has a status (empty, in use). A tree can be of a different species (denoted by a manually created GID {Genetic ID}), modified (have a different species grafted on), or moved to a different location. So a plant can have one or many locations, and a location can contain, through its history, one or many trees, but only one at a time.
Here is a diagram I put together:
So I was thinking for historical purposes, I would use the treelocation table. Do you think it is unnecessary?
No, but in that case you should have the information pertaining to the tree's location in the treelocation table, for instance MovedYear. If a tree moves multiple times, don't you want to keep the year of each move, instead of just one MovedYear per tree?
It's fine to have a history table the way you do, but right now, if TreeId 1 has been in 3 different locations, how could you query your database to see which location it's in NOW? All you'll see is:
TreeId | LocationId
-------------------
1      | 1
1      | 2
1      | 3
You won't know in what order the moves took place (unless you have some business rule that says trees can only move from 1 to 2 and from 2 to 3, and never in any other order).
The usual way to solve this is to have a StartDate and an EndDate in the history table.
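A sketch of such a history table (names and types are assumptions):
CREATE TABLE TreeLocation
(
    TreeID     INTEGER NOT NULL REFERENCES Tree (TreeID),
    LocationID INTEGER NOT NULL REFERENCES Location (LocationID),
    StartDate  DATE NOT NULL,
    EndDate    DATE,                     -- NULL = the tree is still there
    PRIMARY KEY (TreeID, StartDate)
);
-- Where is tree 1 now? The open-ended row.
SELECT LocationID
FROM TreeLocation
WHERE TreeID = 1 AND EndDate IS NULL;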
It seems:
A plant can have one or many locations
No, a plant has one location, but it can move.
To model this we need to:
Have a location foreign key (FK) inside the Tree table, showing the current tree location.
This FK needs to be mandatory (expressing the has-a relationship).
To prevent multiple trees from having the same location, we need a unique key constraint on this FK column.
A plant can move, so to trace a plant's location history
we will need a plant-location-history table.
Each row/column intersection has a status (empty, in use)
So the intersection's status can have predefined, limited values.
Do we need a LocationStatus table? I don't think so; status can be a static field inside the Location table with a check constraint of (1 = empty, 2 = in-use, 3 = etc.), as sketched below.
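A sketch of those two constraints (names and types are assumptions):
CREATE TABLE Location
(
    LocationID   INTEGER PRIMARY KEY,
    OrchardID    INTEGER NOT NULL,
    RowNumber    INTEGER NOT NULL,
    ColumnLetter CHAR(1) NOT NULL,
    Status       SMALLINT NOT NULL
                 CHECK (Status IN (1, 2))  -- 1 = empty, 2 = in use
);
CREATE TABLE Tree
(
    TreeID     INTEGER PRIMARY KEY,
    GID        VARCHAR(20) NOT NULL,       -- manually created genetic id
    LocationID INTEGER NOT NULL UNIQUE     -- mandatory FK; UNIQUE = one tree per spot
               REFERENCES Location (LocationID)
);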

Hierarchical Data Structure Design (Nested Sets)

I'm working on a design for a hierarchical database structure which models a catalogue containing products (this is similar to this question). The database platform is SQL Server 2005 and the catalogue is quite large (750,000 products, 8,500 catalogue sections over 4 levels) but is relatively static (reloaded once a day) and so we are only concerned about READ performance.
The general structure of the catalogue hierarchy is:
Level 1 Section
Level 2 Section
Level 3 Section
Level 4 Section (products are linked to here)
We are using the Nested Sets pattern for storing the hierarchy levels and storing the products which exist at that level in a separate linked table. So the simplified database structure would be
CREATE TABLE CatalogueSection
(
SectionID INTEGER,
ParentID INTEGER,
LeftExtent INTEGER,
RightExtent INTEGER
)
CREATE TABLE CatalogueProduct
(
ProductID INTEGER,
SectionID INTEGER
)
We do have an added complication in that we have about 1000 separate customer groups which may or may not see all products in the catalogue. Because of this we need to maintain a separate "copy" of the catalogue hierarchy for each customer group so that when they browse the catalogue, they only see their products and they also don't see any sections which are empty.
To facilitate this we maintain a table of the number of products at each level of the hierarchy "rolled up" from the section below. So, even though products are only directly linked to the lowest level of the hierarchy, they are counted all the way up the tree. The structure of this table is
CREATE TABLE CatalogueSectionCount
(
SectionID INTEGER,
CustomerGroupID INTEGER,
SubSectionCount INTEGER,
ProductCount INTEGER
)
So, onto the problem
Performance is very poor at the top levels of the hierarchy. The general query to show the "top 10" products in the selected catalogue section (and all child sections) takes somewhere in the region of 1 minute to complete. At lower sections of the hierarchy it is faster, but still not good enough.
I've put indexes (including covering indexes where applicable) on all key tables, and run it through the query analyzer, the index tuning wizard, etc., but still cannot get it to perform fast enough.
I'm wondering whether the design is fundamentally flawed or whether it's just because we have such a large dataset? We have a reasonable development server (3.8GHz Xeon, 4GB RAM) but it's just not working :)
Thanks for any help
James
Use a closure table. If your basic structure is parent-child, with the fields ID and ParentID, then the structure of a closure table is ID and DescendantID. In other words, a closure table is an ancestor-descendant table, where each possible ancestor is associated with all of its descendants. You may include a LevelsBetween field if you need it. Closure-table implementations usually include self-referencing records, i.e. ID 1 is an ancestor of descendant ID 1 with a LevelsBetween of zero.
Example:
Parent/Child
ParentID - ID
1 - 2
1 - 3
3 - 4
3 - 5
4 - 6
Ancestor/Descendant
ID - DescendantID - LevelsBetween
1 - 1 - 0
1 - 2 - 1
1 - 3 - 1
1 - 4 - 2
1 - 5 - 2
1 - 6 - 3
2 - 2 - 0
3 - 3 - 0
3 - 4 - 1
3 - 5 - 1
3 - 6 - 2
4 - 4 - 0
4 - 6 - 1
5 - 5 - 0
6 - 6 - 0
The table is intended to eliminate recursive joins. You push the load of the recursive join into the ETL cycle that you run when you load the data once a day, shifting it away from query time.
It also allows variable-level hierarchies, so you won't be stuck at 4 levels.
Finally, it allows you to slot products into non-leaf nodes. A lot of catalogues create "Miscellaneous" buckets at higher levels of the hierarchy just to have a leaf node to attach products to. You don't need to do that, since intermediate nodes are included in the closure.
As far as indexing goes, I would do a clustered index on ID/DescendantID.
Now for your query performance. This takes a chunk out, but not all of it. You mentioned a "top 10", which implies ranking over a set of facts that you haven't mentioned; we need details to help tune that. Also, this only gets the leaf-level sections, not the products. At the very least, you should have an index on CatalogueProduct ordered by SectionID/ProductID. I would force Section-to-Product joins to be loop joins, based on the cardinality you provided. A report on a catalogue section would go to the closure table to get the descendants (using a clustered index seek); that list of descendants would then be used to get products from CatalogueProduct by looped index seeks. Then, with those products, you would get the facts necessary to do the ranking.
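A sketch of that retrieval step, with the closure table assumed to be named SectionClosure with columns ID and DescendantID (the ranking facts are not shown in the question, so this stops at fetching the candidate products):
SELECT cp.ProductID
FROM SectionClosure sc
JOIN CatalogueProduct cp ON cp.SectionID = sc.DescendantID
WHERE sc.ID = @SectionID;   -- one clustered-index seek on the closure table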
You might be able to solve the customer-groups problem with roles and tree IDs, but you'll have to provide us with the query.
Might it be possible to calculate ProductCount and SubSectionCount after the load each day?
If the data is changing only once a day, surely it's worthwhile to calculate those figures then, even if some denormalization is required.
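A sketch of that post-load calculation, reusing the SectionClosure closure table assumed in the earlier answer and leaving the per-customer-group filtering out for brevity:
-- Rolled-up product count per section, computed once after the daily load;
-- the closure table's self-rows make each leaf section count its own products.
SELECT sc.ID AS SectionID,
       COUNT(cp.ProductID) AS ProductCount
FROM SectionClosure sc
LEFT OUTER JOIN CatalogueProduct cp ON cp.SectionID = sc.DescendantID
GROUP BY sc.ID;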
