I have connected my Power BI to the company's Oracle database, and with a measure I have calculated the daily inventory level. My problem is that I only have the current inventory level in Power BI, which updates automatically when I press refresh (no historic data).
Is it possible to somehow export the measure's data (inventory level) to a new table containing, for example, a date and the inventory level from that date?
The end result would be one table of historic data showing the development of our company's inventory levels.
It's possible to get measure values with the recently announced XMLA endpoint, but you need Power BI Premium or Power BI Embedded capacity for that to work (at least for now).
Another way of doing this could be an Excel file and the Analyze in Excel feature. For this to work, you need to refresh the Excel file and save the results manually.
A third option is the one @MJoy suggested: a log table in the source database with automatic updates.
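The log-table idea can be sketched roughly as follows. This uses SQLite in Python as a stand-in for the real Oracle database, and all table and column names here are invented for illustration: a scheduled job appends the current level to a history table once per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the company database
cur = conn.cursor()

# Current inventory, as the BI measure would see it (illustrative schema)
cur.execute("CREATE TABLE inventory (item TEXT, qty INTEGER)")
cur.executemany("INSERT INTO inventory VALUES (?, ?)",
                [("widget", 120), ("gadget", 45)])

# History table the scheduled job appends to
cur.execute("""CREATE TABLE inventory_log (
                   snapshot_date TEXT,
                   total_qty     INTEGER)""")

def snapshot(day: str) -> None:
    # One row per day: the total inventory level on that date
    cur.execute("INSERT INTO inventory_log "
                "SELECT ?, SUM(qty) FROM inventory", (day,))

snapshot("2024-01-01")
cur.execute("UPDATE inventory SET qty = qty - 20 WHERE item = 'widget'")
snapshot("2024-01-02")

print(cur.execute("SELECT * FROM inventory_log ORDER BY snapshot_date").fetchall())
# → [('2024-01-01', 165), ('2024-01-02', 145)]
```

In the real setup the equivalent INSERT ... SELECT would live in a database job (e.g. DBMS_SCHEDULER on Oracle), and Power BI would simply read the history table.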
I have a table with 200 million records. This table is updated every minute as new records are added. I want to run group-by and sum queries over it for KPI analysis. What is the best way to query the table without performance drawbacks? Currently, I save the result in a separate table which I update with a SQL Server trigger, but that isn't a good approach. Is there another way you can suggest?
If you use SQL Server 2016 or a later version, you can use the Real-Time Operational Analytics approach to overcome this type of issue. Real-Time Operational Analytics lets you run analytics and OLTP workloads on the same database, so you can avoid a separate ETL process. It could be an option for your issue.
Using another table is a good solution if the events are stored in that second table. You can aggregate events by month, week, day, etc. and calculate the analysis from it.
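The summary-table idea can be sketched like this. Again SQLite in Python stands in for SQL Server, and the schema is invented for illustration: each incoming event also updates a small daily aggregate, so KPI queries never need to scan the 200-million-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (day TEXT, amount REAL)")
cur.execute("""CREATE TABLE daily_kpi (
                   day   TEXT PRIMARY KEY,
                   total REAL)""")

def ingest(day: str, amount: float) -> None:
    cur.execute("INSERT INTO events VALUES (?, ?)", (day, amount))
    # UPSERT keeps the aggregate current without re-scanning the big table
    cur.execute("""INSERT INTO daily_kpi VALUES (?, ?)
                   ON CONFLICT(day) DO UPDATE SET total = total + excluded.total""",
                (day, amount))

for day, amt in [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.0)]:
    ingest(day, amt)

print(cur.execute("SELECT * FROM daily_kpi ORDER BY day").fetchall())
# → [('2024-01-01', 15.0), ('2024-01-02', 7.0)]
```

On SQL Server the same rollup is better expressed as an indexed view or a columnstore index (the Real-Time Operational Analytics route) rather than a trigger, but the shape of the aggregate table is the same.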
I am a newbie to database systems and I was wondering what the difference is between a temporal database and a time-series database. I have searched the internet but cannot find a comparison of the two.
A temporal database stores events which happen at a certain time or for a certain period. For example, the address of a customer may change, so when you join the invoice table with the customer table the answer will be different before and after the customer's move.
A time-series database stores time series, which are arrays of numbers indexed by time, like the evolution of the temperature with one measurement every hour, or the stock value every second.
Time-series database: A time-series database is a database that is optimized to store time-series data. This is data stored along with a timestamp so that changes in the data can be measured over time. Prometheus is a time-series database used by SoundCloud, Docker and ShowMax.
Real world uses:
Autonomous trading algorithms, which continuously collect data on market changes.
DevOps monitoring, which stores data on the state of the system over its run time.
Temporal databases contain data that is time sensitive. That is, the data is stored with time indicators such as the valid time (the period for which the entry remains valid) and the transaction time (when the data was entered into the database). Any database can be used as a temporal database if the data is managed correctly.
Real world uses:
Shop inventory systems that keep track of stock quantities, times of purchase and best-before dates.
Industrial processes that are dependent on valid-time data during manufacturing and sales.
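The distinction can be made concrete with a small sketch (SQLite in Python; both schemas are invented for illustration): a temporal table carries valid-from/valid-to columns so a point-in-time lookup returns the row that was current then, while a time-series table is just measurements keyed by timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Temporal: each row carries the period for which it was valid
cur.execute("""CREATE TABLE customer_address (
                   customer TEXT, address TEXT,
                   valid_from TEXT, valid_to TEXT)""")
cur.executemany("INSERT INTO customer_address VALUES (?, ?, ?, ?)", [
    ("acme", "1 Old Rd", "2020-01-01", "2022-06-30"),
    ("acme", "2 New St", "2022-07-01", "9999-12-31"),
])

# Time series: plain measurements indexed by time
cur.execute("CREATE TABLE temperature (ts TEXT, celsius REAL)")
cur.executemany("INSERT INTO temperature VALUES (?, ?)",
                [("2024-01-01 00:00", 3.5), ("2024-01-01 01:00", 3.1)])

# Point-in-time query: which address was valid on 2021-03-15?
row = cur.execute("""SELECT address FROM customer_address
                     WHERE customer = 'acme'
                       AND valid_from <= ? AND ? <= valid_to""",
                  ("2021-03-15", "2021-03-15")).fetchone()
print(row[0])  # → 1 Old Rd
```

A dedicated time-series engine would add compression and time-bucketed aggregation on top of the second table; the temporal pattern works in any relational database, as noted above.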
In an on-premises SQL Server database, I have a number of tables into which various sales data for a chain of stores is inserted during the day. I would like to "harvest" these data to Azure every, say, 15 minutes via Data Factory and an on-premises data management gateway. Clearly, I am not interested in copying all table data every 15 minutes, but only the rows that have been inserted since the last fetch.
As far as I can see, the documentation suggests using data "slices" for this purpose. However, these slices seem to require a timestamp column (e.g. a datetime) on the tables the data is fetched from.
Can I perform a "delta" fetch (i.e. only fetch the rows inserted since the last fetch) without such a timestamp column? Could I use a sequential integer column instead? Or even have no incrementally increasing column at all?
Assume that the last slice fetched covered the window from 08:15 to 08:30. Now, if the clock on the database server is a bit behind the Azure clock, it might add some rows with a timestamp of 08:29 after that slice was fetched, and those rows will not be included when the next slice (08:30 to 08:45) is fetched. Is there a smart way to avoid this problem? Shifting the slice window a few minutes into the past would minimize the risk, but not totally eliminate it.
Take Azure Data Factory out of the equation. How do you arrange for transfer of deltas to a target system? I think you have a few options:
Add created/changed date columns to the source tables and write parameterised queries to pick up only new or modified values. ADF supports this scenario with time slices and system variables. Regarding an identity column, you could do that with a stored procedure (as per here) and a table tracking the last ID sent.
Engage Change Data Capture (CDC) on the source system. This will allow you to access deltas via the CDC functions. Wrap them in a proc and call it with the system variables, similar to the above example.
Always transfer all data, e.g. to staging tables on the target, and use set-based code such as EXCEPT and MERGE to work out which records have changed. Obviously not ideal for large volumes, but this works for small volumes.
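The identity-column variant of the first option can be sketched like this (SQLite in Python as a stand-in; table names are illustrative): a watermark table remembers the last ID transferred, and each run fetches only the rows beyond it, advancing the watermark afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
cur.execute("CREATE TABLE watermark (last_id INTEGER)")
cur.execute("INSERT INTO watermark VALUES (0)")

def fetch_delta():
    last = cur.execute("SELECT last_id FROM watermark").fetchone()[0]
    rows = cur.execute("SELECT id, amount FROM sales WHERE id > ? ORDER BY id",
                       (last,)).fetchall()
    if rows:
        # Advance the watermark only after a successful transfer
        cur.execute("UPDATE watermark SET last_id = ?", (rows[-1][0],))
    return rows

cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 4.0)])
print(fetch_delta())  # → [(1, 9.5), (2, 4.0)]
cur.executemany("INSERT INTO sales VALUES (?, ?)", [(3, 7.25)])
print(fetch_delta())  # → [(3, 7.25)]
```

An identity-based watermark also sidesteps the clock-skew problem from the question, since IDs are assigned by the database itself rather than compared against wall-clock windows.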
HTH
We are planning to add this capability to ADF. It may start with a sequential integer column instead of a timestamp. Could you please let me know if a sequential integer column would help?
By enabling Change Tracking on SQL Server, you can leverage SYS_CHANGE_VERSION to incrementally load data from on-premises SQL Server or Azure SQL Database via Azure Data Factory.
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal
If using SQL Server 2016, see https://msdn.microsoft.com/en-us/library/mt631669.aspx#Enabling-system-versioning-on-a-new-table-for-data-audit. Otherwise, you can implement the same using triggers.
And use NTP to synchronize your server clocks.
I need to remove duplicate records as a maintenance task, either in the SQL Server instance or in my local Compact Edition testing database. I have a tool that reads a clock device which records workers' check-in/out times for the workday. I export the readings to XML files as a backup and insert the parsed objects into the database.
There are many records to insert daily, and I would like to do it in an optimal manner without having to check against existing values in the database on every insert.
What recommendation can you give me?
I'm using Entity Framework 6
Do I deal with EF and LINQ for managing duplicates, plus SqlBulkCopy?
Do I create temporary tables in SQL Server?
Do I create a SQL stored procedure that does so?
Do I use SSIS (I'm a newbie at that) for importing the XML files?
I have two tables:
-Clock (Id, Name, Location)
-Stamp (Id, ClockId, WorkerId, StartDate, EndDate, State)
State: Evaluation of worker attendance according to Start/End (in a normal workday: 8.00am-5.00pm).
-BadStart
-BadEnd
-Critical (Start/End outside the admissible range)
-Pending (those which have not yet been processed and normalized)
How do I process data:
There are 2 clock units (each creates its own stamps, but workers can check in/out at either of them)
-Read clock data from the device (another application does that; the physical machine has a scheduled task that runs a script which reads the clock unit. Output: XML files)
-Parse the XML files (compatibility issue: the Human Resources department has another application that reads them in that specific format)
-Insert/update records in the database according to some normalizing rules
As you can see, the table can't have unique fields, because the same worker can check in/out several times (by mistake, as confirmation, or at the other clock), and all these stamps have to be unified/normalized for the current day.
The duplicates are created each time I run the parser, which reads all the XML files in the directory and inserts them into the database.
I don't have permissions to modify the physical machine directory hierarchy.
So I'm looking for a better strategy to classify, store and remove redundant records.
The task should be performed daily, and several XML files are created from each clock unit in a specific directory. The clock is connected via a serial cable to a physical machine.
Depending on your preference and data model, there are several ways to skin this cat.
See the following links for examples. Most of them use a CTE (Common Table Expression). You should easily be able to adapt one to your needs, and then schedule the script to run periodically as a SQL Server job.
1) Different strategies for removing duplicate records in SQL Server.
2) Using CTE to remove duplicate records
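A minimal sketch of the ROW_NUMBER approach (SQLite in Python here for illustration; the T-SQL version deletes directly from the CTE): rows are numbered within each (clock, worker, start-date) group, and everything after the first copy is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE stamp (
                   id INTEGER PRIMARY KEY,
                   clock_id INTEGER, worker_id INTEGER, start_date TEXT)""")
cur.executemany(
    "INSERT INTO stamp (clock_id, worker_id, start_date) VALUES (?, ?, ?)", [
        (1, 7, "2024-01-02 08:01"),
        (1, 7, "2024-01-02 08:01"),   # duplicate from re-running the parser
        (2, 7, "2024-01-02 17:02"),
    ])

# Keep the first row of each duplicate group, delete the rest
cur.execute("""DELETE FROM stamp WHERE id IN (
                 SELECT id FROM (
                   SELECT id,
                          ROW_NUMBER() OVER (
                            PARTITION BY clock_id, worker_id, start_date
                            ORDER BY id) AS rn
                   FROM stamp)
                 WHERE rn > 1)""")

print(cur.execute("SELECT COUNT(*) FROM stamp").fetchone()[0])  # → 2
```

Which columns go into the PARTITION BY is the important design decision: they define what "duplicate" means for the Stamp table.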
Change Data Capture is a new feature in SQL Server 2008. From MSDN:
Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.
This is highly sweet - no more adding CreatedDate and LastModifiedBy columns manually.
Does Oracle have anything like this?
Sure. Oracle actually has a number of technologies for this sort of thing depending on the business requirements.
Oracle has had something called Workspace Manager for a long time (since the 8i days) that allows you to version-enable a table and track changes over time. This can be a bit heavyweight, though, because it is based on views with instead-of triggers.
Starting in 11.1 (as an extra-cost option to Enterprise Edition), Oracle has Total Recall, which asynchronously mines the redo logs for data changes that get logged to a separate table, which can then be queried using flashback query syntax on the main table. Total Recall automatically partitions and compresses the historical data and takes care of purging the data after a specified retention period.
Oracle has a LogMiner technology that mines the redo logs and presents transactions to consumers. There are a number of technologies that are then built on top of LogMiner including Change Data Capture and Streams.
You can also use materialized views and materialized view logs if the goal is to replicate changes.
Oracle has Change Data Notification, where you register a query with the system and the resources accessed in that query are tagged to be watched. Changes to those resources are queued by the system, allowing you to run procedures against the changed data.
This is managed using the DBMS_CHANGE_NOTIFICATION package.
Here's an infodoc about it:
http://www.oracle-base.com/articles/10g/dbms_change_notification_10gR2.php
If you are connecting to Oracle from a C# app, ODP.NET (Oracle's .NET client library) can interact with Change Data Notification to alert your C# app when Oracle changes are made - pretty cool. Goodbye to polling repeatedly for data changes if you ask me: just register the table, set up change data notification through ODP.NET and voilà, C# methods get called only when necessary.
"no more adding CreatedDate and LastModifiedBy columns manually" ... as long as you can afford to keep complete history of your database online in the redo logs and never want to move the data to a different database.
I would keep adding them and avoid relying on built-in database techniques like that. If you have a need to keep historical status of records then use an audit table or ship everything off to a data warehouse that handles slowly changing dimensions properly.
Having said that, I'll add that Oracle 10g+ can mine the log files simply by using flashback query syntax. Examples here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2112847
This technology is also used in Oracle's Datapump export utility to provide consistent data for multiple tables.
I believe Oracle has provided auditing features since 8i; however, the tables used to capture the data are rather complex and there is a significant performance impact when auditing is turned on.
In Oracle 8i you could only enable auditing for the entire database, not per table; however, 9i introduced Fine-Grained Auditing, which provides far more flexibility. This has been expanded upon in 10g/11g.
For more information see http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Also, in 11g Oracle introduced Audit Vault, which provides secure storage for audit information; even DBAs cannot change this data (according to Oracle's documentation - I haven't used this feature yet). More info can be found at http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Oracle has a mechanism called Flashback Data Archive. From A Fresh Look at Auditing Row Changes:
Oracle Flashback Query retrieves data as it existed at some time in the past.
Flashback Data Archive provides the ability to track and store all transactional changes to a table over its lifetime. It is no longer necessary to build this intelligence into your application. A Flashback Data Archive is useful for compliance with record stage policies and audit reports.
CREATE TABLESPACE SPACE_FOR_ARCHIVE
  DATAFILE 'C:\ORACLE DB12\ARCH_SPACE.DBF' SIZE 50G;

CREATE FLASHBACK ARCHIVE longterm
  TABLESPACE space_for_archive
  RETENTION 1 YEAR;

ALTER TABLE EMPLOYEES FLASHBACK ARCHIVE LONGTERM;

SELECT EMPLOYEE_ID, FIRST_NAME, JOB_ID, VACATION_BALANCE,
       VERSIONS_STARTTIME AS TS,
       NVL(VERSIONS_OPERATION, 'I') AS OP
FROM EMPLOYEES
VERSIONS BETWEEN TIMESTAMP TIMESTAMP '2016-01-11 08:20:00' AND SYSTIMESTAMP
WHERE EMPLOYEE_ID = 100
ORDER BY EMPLOYEE_ID, TS;