I'm working on an SSIS/SSAS project to build a BI solution.
One of my data sources contains information about a Service Desk.
A user can create a new request related to a service catalog (for example because his laptop crashed).
So it will generate a new row in the Request table (creation date, comment, tracking number, etc.).
To resolve the request, a few actions will be performed, and these actions will be recorded in the Action table (there is a one-to-many relationship between the Request and Action tables).
An action can be : "try to format computer", "change hard drive", etc.
In the production environment a Request contains approximately 10 to 100 actions.
I'm facing a problem about designing this because many columns of my fact table cannot be aggregated.
In fact there are many date columns, a tracking number (string), boolean values, and only a few SUM-able attributes.
Here is an extract of the dw model :
FactRequest :
ID (DW primary key)
Business Key (original PK)
Request number (string)
Begin date (datetime)
End date (datetime)
Max resolution date (datetime)
Time to solve request
Comment (string)
Delay (int)
...
FactAction :
ID (DW primary key)
Business Key (original PK)
Begin date (datetime)
End date (datetime)
Name (string)
Time to solve action
...
I know adding non-aggregatable data to a fact table is not the best solution.
In my SSAS project, I created a new cube based on my FactRequest table.
It works fine except for "string" attributes such as the request identifier, since a string cannot be aggregated.
Should I use an SSAS "fact dimension" to create a "Request" dimension based on my FactRequest table?
Any ideas?
Thanks so much,
Sounds like you are lacking specific requirements (which is very common in BI projects). Is the textual data required to be displayed in the report at all? If yes: is it required only in some detail view?
Columns like ID, Business Key, or Request number typically have little value in your cube. This data is only interesting for detailed reports (e.g. getting all actions taken for a certain request ID), and such lists often require no aggregates. You do not need a cube for lists like that; you can query the database directly with SQL.
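For a list like that, a plain query against the relational tables is enough (a sketch, assuming hypothetical column names and that the Action table carries a request_id foreign key to Request):

-- All actions taken for one request, straight from the relational tables
SELECT r.request_number,
       a.name       AS action_name,
       a.begin_date,
       a.end_date
FROM   request r
JOIN   action  a ON a.request_id = r.id
WHERE  r.request_number = 'REQ-12345'
ORDER  BY a.begin_date;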
Only if you require a summary report (e.g. getting the average time taken to solve a request per weekday) could the cube make sense - and even then it may not be worth the effort of maintaining an SSAS database if you can get almost the same query response time with direct SQL queries.
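Even that weekday summary can be answered with direct SQL against the fact table (a sketch in SQL Server syntax, assuming space-free column names BeginDate and TimeToSolveRequest):

-- Average time to solve a request, per weekday
SELECT DATENAME(weekday, BeginDate) AS weekday,
       AVG(TimeToSolveRequest)      AS avg_time_to_solve
FROM   FactRequest
GROUP  BY DATENAME(weekday, BeginDate);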
I have these two tables
items
id
name
itemStats (many-to-one relation to items table)
id
item_id
update_time
stat_value
On most pages, the application will just display a summary of items in list form and will need to query the most recent update_time and the min/max values of stat_value.
Currently it's set up to just do the full query each time it's needed, but since the data is written only once a day and read maybe thousands of times per day, is it worth caching it in the items table? (by adding `lastUpdate`, `minStat`, `maxStat` fields that would be updated after each INSERT in the `itemStats` table)
Or is Symfony or Doctrine doing enough caching behind the scenes on frequent queries to make it pointless to do any of our own?
Neither Symfony nor Doctrine should/could do any caching, since they cannot know when data might have changed and the cache has to be invalidated.
So add the extra fields if you can keep them up to date easily.
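A minimal sketch of keeping those fields fresh, assuming MySQL/MariaDB and that the `lastUpdate`, `minStat` and `maxStat` columns have already been added to `items`:

-- Refresh the cached summary columns after each daily insert into itemStats
CREATE TRIGGER item_stats_after_insert
AFTER INSERT ON itemStats
FOR EACH ROW
UPDATE items
SET lastUpdate = GREATEST(COALESCE(lastUpdate, NEW.update_time), NEW.update_time),
    minStat    = LEAST(COALESCE(minStat, NEW.stat_value), NEW.stat_value),
    maxStat    = GREATEST(COALESCE(maxStat, NEW.stat_value), NEW.stat_value)
WHERE id = NEW.item_id;

(An UPDATE run once after the daily import would work just as well, since the data only changes once a day.)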
Or check the caching options of your database of choice and let the DBMS do all the work:
-- Per-item summary computed on the fly
SELECT
    item_id,
    MAX(update_time) AS lastUpdateTime,
    MIN(stat_value)  AS minStat,
    MAX(stat_value)  AS maxStat
FROM
    itemStats
GROUP BY
    item_id
MariaDB Query Cache: https://mariadb.com/kb/en/library/query-cache/
I'm creating a rather large APEX application which allows managers to go in and record statistics for associates in the company. Currently we have a database in Oracle with data from AD which holds all the associates' information: name, manager, employee ID, etc.
Now I'm responsible for creating and modeling a table that will house all the stats for each employee. The table I have created has 90+ columns in it. Some contain data such as:
Documents Processed
Calls Received
Amount of Doc 1 Processed
Amount of Doc 2 Processed
and the list goes on for well over 90 attributes. So here is my question:
When creating this table in my application with so many different columns, how would I go about choosing an appropriate primary key? Should I link it to our employee table using the employee's identification, which is unique (each has an associate number)?
Secondly, how can I create these tables (and possibly forms) to allow me to associate the statistic I am entering for an individual with the actual individual?
I have ordered two books from Amazon on data modeling since I am new to APEX and database design. Not a fresh chicken, but new enough to need some guidance. An additional problem I am running into is that each form can have only 60 fields on it. So I had thought about creating separate tables for different functions out of the 90+ columns I have.
Thanks
APEX 4.2 allows for 200 items per page.
Oracle APEX component limits
A couple of questions come to mind:
Are you sure that the employee IDs are not recyclable? If these IDs are unique and not recycled, you've found yourself a good primary key.
What do you plan on doing when you decide to add a new metric? Seems like you might have to add a new column to your rather large and likely not normalized table.
I'd recommend a vertical table for your metrics; you can use Oracle's PIVOT function to make your data appear more like a horizontal table.
If you went this route you would store your employee ID in one column, your metric key in another, and the metric value in a third.
I'd recommend that you create a metric table consisting of a primary key, a metric label, an active indicator, creation timestamp, creation user id, modified timestamp, modified user id.
This metric table will allow you to add new metrics, change the name of the metric, deactivate a metric, and determine who changed what and when.
This would be a much more flexible approach in my opinion. You may also want to think about audit logs.
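A rough sketch of that vertical layout (hypothetical table and column names; Oracle 12c+ syntax assumed for the identity column):

-- Catalog of metrics: add, rename or deactivate metrics without altering tables
CREATE TABLE metric (
    metric_id     NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    metric_label  VARCHAR2(100) NOT NULL,
    active_ind    CHAR(1) DEFAULT 'Y' NOT NULL,
    created_ts    TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    created_by    VARCHAR2(30),
    modified_ts   TIMESTAMP,
    modified_by   VARCHAR2(30)
);

-- One row per associate, per metric, per reporting period
CREATE TABLE associate_metric (
    associate_id  NUMBER NOT NULL,   -- the unique associate number from the AD-sourced table
    metric_id     NUMBER NOT NULL REFERENCES metric (metric_id),
    metric_date   DATE NOT NULL,
    metric_value  NUMBER,
    CONSTRAINT associate_metric_pk PRIMARY KEY (associate_id, metric_id, metric_date)
);

Adding a new statistic then becomes an INSERT into metric rather than an ALTER TABLE, and PIVOT can flatten associate_metric back into a wide shape for reports.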
I'm designing a database that has a few tables: FAQs, Bulletins, and Attachments. Bulletins and FAQs could have an attachment associated with them, so my initial thought was to create a joining table with the two primary keys as a composite key:
Bulletin
--------
BulletinID
Subject
Description
Notes
Attachment
-----------
AttachmentID
FileName
FilePath
etc.
Joining table:
BulletinAttachments
-------------------
BulletinID
AttachmentID
As I design this, I also thought, what if other entities are introduced later (say Newsletter, Email, etc.) that need Attachments as well. I would have to create a joining table for each of these entities. Not awful, but it made me think, what if I got rid of the joining tables and put an AttachmentType in the Attachment table and then assigned the type accordingly:
AttachmentType
--------------
AttachmentTypeID
AttachmentType
Description
The data in that table would be:
1-Bulletin
2-FAQ
3-Newsletter
4-Email
Then the Attachment table would hold the AttachmentTypeID to identify it:
Attachments
-----------
AttachmentID
AttachmentTypeID
FileName
FilePath
etc.
So my question is, performance-wise (using SQL Server 2008 R2), is there a better choice between the two? Is there a better way to design this? My concern with individual joining tables is that more entities may come along; to accommodate Attachments we would have to create another joining table and write front-end logic for it, whereas the AttachmentTypeID approach would let the front end insert a new AttachmentType with no database schema changes needed.
Your second solution doesn't have a way to link the attachment to the item, just what kind of item it is.
Even if it did (i.e. an ItemID), what you would create would be a violation of fourth normal form - i.e. a multivalued dependency.
Stick with your first plan, but consider whether Bulletins are fundamentally different from Newsletters, Emails, FAQs, etc. in your application. If you do need a new table for Newsletters, add a new table for NewsletterAttachments.
Also consider, are you going to share attachments between different items, or types of item?
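For reference, a minimal sketch of the first plan's joining table (illustrative types; the point is the composite primary key plus the two foreign keys):

-- One row per (bulletin, attachment) pair; the composite key prevents duplicates
CREATE TABLE BulletinAttachments (
    BulletinID   INT NOT NULL,
    AttachmentID INT NOT NULL,
    CONSTRAINT PK_BulletinAttachments PRIMARY KEY (BulletinID, AttachmentID),
    CONSTRAINT FK_BulletinAttachments_Bulletin
        FOREIGN KEY (BulletinID) REFERENCES Bulletin (BulletinID),
    CONSTRAINT FK_BulletinAttachments_Attachment
        FOREIGN KEY (AttachmentID) REFERENCES Attachment (AttachmentID)
);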
I totally agree with podiluska. You need to create a separate joining table for each type of attachment; otherwise you can't map the item ID to the attachment, and you will run into problems joining across the different attachment types. Separate joining tables for each attachment type should also perform better.
I am looking for a pattern, framework, or best practice to handle a generic problem of application-level data synchronization.
Let's take an example with only 1 table to make it easier.
I have an unreliable data source for a product catalog. Data can occasionally be unavailable, incomplete, or inconsistent (issues might come from manual data entry errors, ETL failures...).
I have a live copy in a MySQL table in use by a live system, let's say a website.
I need to implement safety mechanisms when updating the MySQL table to "synchronize" it with the original data source. Here are the safety criteria and the solutions I am suggesting:
avoid deleting records when they temporarily disappear from the data source => use a "deleted" boolean/date column or an archive/history table.
check for inconsistent changes => configure rules per column, such as: should never change, should only increment.
check for integrity issue => (standard problem, no point discussing approach)
ability to roll back the last sync => restore from a history table? use a version increment/date column?
What I am looking for is a best practice and pattern/tool to handle such problems. Even if you are not pointing to THE solution, I would be grateful for any keyword suggestions that would help me narrow down which field of expertise to explore.
We have the same problem importing data from web analytics providers - they suffer the same problems as your catalog. This is what we did:
Every import/sync is assigned a unique id (auto_increment int64)
Every table has a history table that is identical to the original, but has an additional column "superseded_id" which gets the import id of the import that changed the row (deletion is a change), and the primary key is (row_id, superseded_id)
Every UPDATE copies the row to the history table before changing it
Every DELETE moves the row to the history table
This makes rollback very easy:
Find out the import_id of the bad import
REPLACE INTO main_table SELECT <everything but superseded_id> FROM history_table WHERE superseded_id=<bad import id>
DELETE FROM history_table WHERE superseded_id>=<bad import id>
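A more concrete sketch of those steps, assuming a hypothetical products table with primary key product_id, a products_history table as described above, and illustrative columns name and price:

-- step 1: suppose the bad import was found to have import id 42
-- step 2: restore the rows as they looked before that import changed them
REPLACE INTO products (product_id, name, price)
SELECT product_id, name, price
FROM   products_history
WHERE  superseded_id = 42;
-- step 3: discard history entries for the bad import and anything after it
DELETE FROM products_history WHERE superseded_id >= 42;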
For databases where performance is a problem, we do this in a secondary database on a different server, then copy the found-to-be-good main table to the production database into a new table main_table_$id, with $id being the highest import id, and have main_table be a trivial view defined as SELECT * FROM main_table_$someid. Now, by redefining the view as SELECT * FROM main_table_$newid, we can atomically switch the table.
I'm not aware of a single solution to all this - probably because each project is so different. However, here are two techniques I've used in the past:
Embed the concept of version and validity into your data model
This is a way to deal with change over time without having to resort to history tables; it does complicate your queries, so you should use it sparingly.
For instance, instead of having a product table as follows
PRODUCTS
Product_ID primary key
Price
Description
AvailableFlag
In this model, if you want to delete a product, you execute "delete from products where product_id = ..."; modifying the price would be "update products set price = 1 where product_id = ...".
With the versioned model, you have:
PRODUCTS
product_ID primary key
valid_from datetime
valid_until datetime
deleted_flag
Price
Description
AvailableFlag
In this model, deleting a product requires you to update products set valid_until = getdate() where product_id = xxx and valid_until is null, and then insert a new row with the "deleted_flag = true".
Changing price works the same way.
This means that you can run queries against your "dirty" data and insert it into this table without worrying about deleting items that were accidentally missed off the import. It also allows you to see the evolution of the record over time, and roll-back easily.
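A sketch of a "delete" and a price change under the versioned model (SQL Server-style getdate(); note the primary key would need to be widened to (product_id, valid_from) so several versions of a product can coexist):

-- "Delete" product 42: close the current version, then insert a tombstone row
UPDATE products
SET    valid_until = getdate()
WHERE  product_id = 42 AND valid_until IS NULL;

INSERT INTO products (product_id, valid_from, valid_until, deleted_flag, price, description, availableflag)
VALUES (42, getdate(), NULL, 1, NULL, NULL, NULL);

-- Change the price of product 43: same pattern, the new row carries the new price
UPDATE products
SET    valid_until = getdate()
WHERE  product_id = 43 AND valid_until IS NULL;

INSERT INTO products (product_id, valid_from, valid_until, deleted_flag, price, description, availableflag)
VALUES (43, getdate(), NULL, 0, 9.99, '...', 1);   -- non-price columns copied from the previous version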
Use a ledger-like mechanism for cumulative values
Where you have things like "number of products in stock", it helps to create transactions to modify the amount, rather than take the current amount from your data feed.
For instance, instead of having an amount_in_stock column on your products table, have a "product_stock_transactions" table:
product_stock_transactions
product_id (FK)   transaction_date   transaction_quantity   transaction_source
---------------   ----------------   --------------------   ------------------
1                 1 Jan 2012                          100   product_feed
1                 2 Jan 2012                           -3   stock_adjust_feed
1                 3 Jan 2012                           10   product_feed
On 2 Jan, the quantity in stock was 97; on 3 Jan, 107.
This design allows you to keep track of adjustments and their source, and is easier to manage when moving data from multiple sources.
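The current level then falls out of a simple aggregate over the ledger (a sketch; add a date filter for an "as of" figure):

-- Quantity in stock per product, derived from the transaction ledger
SELECT product_id,
       SUM(transaction_quantity) AS quantity_in_stock
FROM   product_stock_transactions
GROUP  BY product_id;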
Both approaches can create large amounts of data - depending on the number of imports and the amount of data - and can lead to complex queries to retrieve relatively simple data sets.
It's hard to plan for performance concerns up front - I've seen both "history" and "ledger" work with large amounts of data. However, as Eugen says in his comment below, if you get to an excessively large ledger, it may be necessary to clean up the ledger table by summarizing the current levels and deleting (or archiving) old records.
For the first time, I am developing in an environment in which there is a central repository for a number of different industry-standard reference data tables and many different customers who need to select records from these reference data tables to fill in foreign key information for their customer-specific records.
Because these industry standard reference files are utilized by all customers, I want to reserve Create/Update/Delete access to these records for global product administrators. However, I would like to implement a (semi-)automated interface by which specific customers could request record additions, deletions or modifications to any of the industry standard reference files that are shared among all customers.
I know I need something like a "data change request" table specifying:
user id,
user request datetime,
request type (insert, modify, delete),
a user entered text explanation of the change request,
the user request's current status (pending, declined, completed),
admin resolution datetime,
admin id,
an admin entered text description of the resolution,
etc.
What I can't figure out is how to elegantly handle the fact that these data change requests could apply to dozens of different tables with differing column definitions.
I would like to give the customer users making these data change requests a convenient way to enter their proposed record additions/modifications directly into CRUD screens that look very much like the reference table CRUD screens they don't have write/delete permissions for (with an additional text explanation and perhaps a request priority field).
I would also like to give the global admins a tool that allows them to view all the outstanding data change requests for the users they oversee, sorted by date requested or by user/date requested. Upon selecting a data change request record from the list, the admin would be directed to another CRUD screen populated with the fields the customer user requested for the new/modified reference table record, along with the customer's text explanation, the request status, and the text resolution explanation field. At this point the admin could accept/edit/reject the requested change; if accepted, the affected industry standard reference file would be automatically updated with the appropriate fields, and the data change request record's status, text resolution explanation, and resolution datetime would all be appropriately updated.
However, I want to keep the actual production reference tables as simple as possible and free from these extraneous and typically null customer change request fields. I'd also like the data change request file to aggregate all data change requests across all the reference tables yet somehow "point to" the specific reference table and primary key in question for modification & deletion requests or the specific reference table and associated customer user entered field values in question for record creation requests.
Does anybody have any ideas of how to design something like this effectively? Is there a cleaner, simpler way I am missing?
Option 1
If preserving the base tables is important then I would create a "change details" table as a child to your change request table. I'm envisioning something like
ChangeID
TableName
TableKeyValue
FieldName
ProposedValue
Add/Change/Delete Indicator
So you'd have a row in this table for every proposed field change. The challenge in this scenario is maintaining the mapping of TableName and FieldName values to the actual tables and fields. If your database structure is fairly static then this may not be an issue.
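A rough sketch of the pair of tables (SQL Server-style syntax as one possibility; names and sizes are illustrative, with the header columns mirroring the list in the question and the detail columns mirroring the list above):

-- One row per change request, whichever reference table it targets
CREATE TABLE ChangeRequest (
    ChangeID            INT IDENTITY PRIMARY KEY,
    RequestedBy         INT NOT NULL,
    RequestedAt         DATETIME NOT NULL,
    RequestType         CHAR(1) NOT NULL,       -- I = insert, M = modify, D = delete
    RequestExplanation  VARCHAR(1000),
    RequestStatus       VARCHAR(20) NOT NULL,   -- pending / declined / completed
    ResolvedBy          INT NULL,
    ResolvedAt          DATETIME NULL,
    ResolutionNotes     VARCHAR(1000)
);

-- One row per proposed field value; TableName/FieldName point at the target column
CREATE TABLE ChangeDetail (
    ChangeID       INT NOT NULL REFERENCES ChangeRequest (ChangeID),
    TableName      VARCHAR(128) NOT NULL,
    TableKeyValue  VARCHAR(100) NULL,           -- NULL for insert requests
    FieldName      VARCHAR(128) NOT NULL,
    ProposedValue  VARCHAR(1000) NULL,
    ChangeType     CHAR(1) NOT NULL,            -- the Add/Change/Delete indicator
    CONSTRAINT PK_ChangeDetail PRIMARY KEY (ChangeID, TableName, FieldName)
);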
Option 2
Add a ChangeID field to each of your base tables. When a change is proposed add a record to the base table with the ChangeID populated. So as an example if you have a Company table, for a single company you could have multiple records:
CompanyCode  ChangeID  CompanyName  CompanyAddress
-----------  --------  -----------  --------------
COMP1                  My Company   Boston           <-- The "live" record
COMP1        1         New Name     Boston           <-- A proposed change
When the admin commits the change, the existing live record is deleted or archived and the ChangeID value is removed from the proposed record, making it the live record. It may be a little tricky to handle proposed deletions with this option. This option also has the potential to impact the performance of selecting live data for normal usage. However, it does save you the hassle of maintaining a list of table names and field names somewhere in your code.
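A sketch of how Option 2 would be used, assuming the live row is the one where ChangeID is NULL (hypothetical Company columns):

-- Normal usage only ever sees live rows
SELECT CompanyCode, CompanyName, CompanyAddress
FROM   Company
WHERE  ChangeID IS NULL;

-- Committing change 1: archive or delete the old live row, then promote the proposal
DELETE FROM Company WHERE CompanyCode = 'COMP1' AND ChangeID IS NULL;
UPDATE Company SET ChangeID = NULL WHERE CompanyCode = 'COMP1' AND ChangeID = 1;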
I'm sure others will have some opinions!