SQL Statement to get all entries in timerange based on identifier - database

I'm storing interface data from my Router gathered through SNMP in a MariaDB. The data is structured in the following way (simplified):
id TIMESTAMP READING VALUE
============================================================
100 2020-04-15 11:29:51 if03_name eth0
101 2020-04-15 11:29:51 if03_totalBytesRx 654321
102 2020-04-15 11:29:51 if03_totalBytesTx 123456
103 2020-04-15 11:30:51 if03_totalBytesRx 765432
104 2020-04-15 11:30:51 if03_totalBytesTx 234567
Now, to get the data received and transmitted for eth0, it is easy to select on the READING if03_totalBytesRx or if03_totalBytesTx.
For instance, I can execute the following query to get the average bits per second received at any given time:
SELECT
    TIMESTAMP,
    GREATEST(0, (VALUE - LAG(VALUE,1) OVER (ORDER BY TIMESTAMP)) * 8
        / (UNIX_TIMESTAMP(TIMESTAMP)
            - UNIX_TIMESTAMP(LAG(TIMESTAMP,1) OVER (ORDER BY TIMESTAMP)))) AS value
FROM history
WHERE
    READING = "if03_totalBytesRx"
Unfortunately, sometimes (I believe when an interface goes down and comes up again) the mapping between READING and interface changes, so that if03 is no longer eth0 (if03 could, for instance, be eth1 or noSuchInstance), while eth0 is assigned another READING, for example if29:
id TIMESTAMP READING VALUE
============================================================
200 2020-04-15 12:16:51 if03_totalBytesRx 876543
201 2020-04-15 12:16:51 if03_totalBytesTx 345678
202 2020-04-15 12:17:51 if03_name noSuchInstance
203 2020-04-15 12:17:51 if03_totalBytesRx noSuchInstance
204 2020-04-15 12:17:51 if03_totalBytesTx noSuchInstance
205 2020-04-15 12:17:51 if29_name eth0
206 2020-04-15 12:17:51 if29_totalBytesRx 987654
207 2020-04-15 12:17:51 if29_totalBytesTx 456789
Note that if03_name is only stored in the DB when changed, not every minute.
Obviously that causes either no data when querying the READING if03_totalBytesRx (in the case of noSuchInstance) or false data, i.e. data from another interface.
What would be a viable way to select all the ifXX_totalBytesRx and ifXX_totalBytesTx for all the timeranges where the corresponding ifXX_name equals to eth0? (E.g. in the example above the union of 11:29:51 to 12:17:50 using if03 plus everything from 12:17:51 to NOW() using if29, assuming that id 207 is the last entry in the database)

OK, I think I found a solution myself; however, it is kind of slow, so I'm happy about any improvements:
First, I created a view ubnt that splits the original READING into a new field INTERFACE (the part before the underscore) and another field READING (the part after the underscore, reusing the name):
CREATE VIEW ubnt AS
SELECT
    id,
    TIMESTAMP,
    SUBSTRING_INDEX(READING, '_', 1) AS INTERFACE,
    SUBSTRING_INDEX(READING, '_', -1) AS READING,
    VALUE
FROM history
WHERE READING LIKE 'if%';
The second step is to create another view from this called ubnt_if that shows the timerange of a logical to physical interface mapping:
CREATE VIEW ubnt_if AS
SELECT
    TIMESTAMP AS START,
    COALESCE(LEAD(TIMESTAMP,1) OVER (PARTITION BY INTERFACE ORDER BY TIMESTAMP), NOW()) AS END,
    INTERFACE AS LOGICAL,
    VALUE AS PHYSICAL
FROM ubnt
WHERE READING = 'name';
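Given the sample rows above, ubnt_if would contain something like this (the END of the open-ended mappings being whatever NOW() returns at query time):
START               END                 LOGICAL PHYSICAL
============================================================
2020-04-15 11:29:51 2020-04-15 12:17:51 if03    eth0
2020-04-15 12:17:51 NOW()               if03    noSuchInstance
2020-04-15 12:17:51 NOW()               if29    eth0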
The third step is to join the interface to the original data, based on the timestamp. For this, I used another view called ubnt_int:
CREATE VIEW ubnt_int AS
SELECT
    a.TIMESTAMP AS TIMESTAMP,
    a.READING AS READING,
    a.VALUE AS VALUE,
    b.PHYSICAL AS INTERFACE
FROM ubnt a
LEFT JOIN ubnt_if b ON (
    a.INTERFACE = b.LOGICAL AND
    -- half-open interval: a reading taken exactly at a changeover
    -- belongs to the new mapping only, so no row matches twice
    a.TIMESTAMP >= b.START AND a.TIMESTAMP < b.END
);
This generates a table of data that I can filter based on the physical interface, not the "logical" interface that could change.
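With these views in place, the rate query from the top of the question only needs to filter on the physical name instead of the logical one. A sketch of the final query (same calculation as above, just running against ubnt_int):
SELECT
    TIMESTAMP,
    GREATEST(0, (VALUE - LAG(VALUE,1) OVER (ORDER BY TIMESTAMP)) * 8
        / (UNIX_TIMESTAMP(TIMESTAMP)
            - UNIX_TIMESTAMP(LAG(TIMESTAMP,1) OVER (ORDER BY TIMESTAMP)))) AS value
FROM ubnt_int
WHERE
    INTERFACE = 'eth0' AND
    READING = 'totalBytesRx'
Since the WHERE clause is applied before the window functions, LAG only ever sees consecutive eth0 readings, no matter which logical ifXX they came from.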
I was inspired by this answer to another question.

Related

SQL recursive get BOM from PSP

I have a MS SQL Server (2016) database which contains, among other things, a table like this (it's a view created in an Autodesk PSP database - please don't ask why... :-)):
CHILD_AIMKEY | QUANTITY | PARENT_AIMKEY | StatusOfParent | StatusOfChild
5706657      | 1        | 5664344       | 100            | 103
5706745      | 1        | 5664344       | 100            | 103
5707104      | 1        | 5664344       | 100            | 103
5707109      | 1        | 5664344       | 100            | 100
5801062      | 1        | 5664344       | 100            | 103
The "children" can contain other "children" and in that case they would be their "parents".
So it´s a standard structured BOM table from a CAD PDM System.
If I run the following SELECT statement I get all the children of the top-level parent:
SELECT [CHILD_AIMKEY], [POSITION], [QUANTITY], [PARENT_AIMKEY], [StatusOfParent], [StatusOfChild] FROM database_table WHERE Parent_aimkey = '5664344'
(as shown in the table above)
My first question is: how do I recursively process all children of each parent from that table? (The output could be another table or direct query output.)
The format should be: Parent_Aimkey, Child_Aimkey, Quantity
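For the first question, a recursive common table expression is the standard tool on SQL Server 2016. A minimal sketch, assuming the view is reachable as database_table exactly as in the SELECT above (untested against real PSP data):
WITH bom AS (
    -- Anchor: direct children of the top-level parent.
    SELECT PARENT_AIMKEY, CHILD_AIMKEY, QUANTITY
    FROM database_table
    WHERE PARENT_AIMKEY = '5664344'
    UNION ALL
    -- Recursive step: the children of every child found so far.
    SELECT t.PARENT_AIMKEY, t.CHILD_AIMKEY, t.QUANTITY
    FROM database_table t
    INNER JOIN bom b ON t.PARENT_AIMKEY = b.CHILD_AIMKEY
)
SELECT PARENT_AIMKEY AS Parent_Aimkey,
       CHILD_AIMKEY AS Child_Aimkey,
       QUANTITY AS Quantity
FROM bom;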
The second question is a bit more complicated:
I'll try to express it with some pseudocode:
If Tree_Level_of_DIRECT_Parent < 3 then show CHILD_AIMKEY,QUANTITY in queryresult_above
If Tree_Level_of_DIRECT_Parent > 2 and StatusOf_DIRECT_Parent = 103 and StatusOf_DIRECT_Child = 103 then show CHILD_AIMKEY,QUANTITY in queryresult_above
Is that in some way possible ? (If there is a need to extend the database view of an other field or another table, that´s no problem)
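For what it's worth, here is a hedged sketch of how both rules could be folded into one recursive CTE by carrying the level of the direct parent along; the view name database_table is again an assumption:
WITH bom AS (
    -- Anchor: the top-level parent sits at level 1, so its direct
    -- children see a direct-parent level of 1.
    SELECT PARENT_AIMKEY, CHILD_AIMKEY, QUANTITY,
           StatusOfParent, StatusOfChild,
           1 AS ParentLevel
    FROM database_table
    WHERE PARENT_AIMKEY = '5664344'
    UNION ALL
    -- Recursive step: each level down, the direct parent is one deeper.
    SELECT t.PARENT_AIMKEY, t.CHILD_AIMKEY, t.QUANTITY,
           t.StatusOfParent, t.StatusOfChild,
           b.ParentLevel + 1
    FROM database_table t
    INNER JOIN bom b ON t.PARENT_AIMKEY = b.CHILD_AIMKEY
)
SELECT PARENT_AIMKEY, CHILD_AIMKEY, QUANTITY
FROM bom
WHERE ParentLevel < 3
   OR (ParentLevel > 2 AND StatusOfParent = 103 AND StatusOfChild = 103);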
I know this looks a bit confusing, but what I need is the Autodesk Inventor structured BOM in an SQL Statement or stored procedure.
Any help would be really much appreciated.
Thanks
Alex.

Email sql query results of multiple email addresses daily

I am fairly green at SQL and have hit a roadblock. I have a job that already runs a query and sends an email to our purchasing department agents via SQL Server Agent.
Here is my sample data they receive as a text attachment:
po_num vend_num qty_needed external_email_addr
318 1 200 email#earthlink.net
318 1 910 email#earthlink.net
703 2 250 email#row.com
993 3 3600 email#cast.com
993 3 3600 email#cast.com
676 4 1 NULL
884 5 10000 email#Futures.com
118 5 2500 email#Futures.com
My goal is to automatically send each vendor one email of the qty_needed using the email address in external_email_addr field. Also, if the email address is NULL it would send me an email that this needs to be fixed.
I am not sure how complicated or simple this is but any help would be greatly appreciated.
Since po_num is unique, you will generate several mails per email address per day based on the example data you provided.
I don't have access to SQL at the moment, so the syntax might need some sprucing up.
SELECT po_num,
       vend_num,
       qty_needed,
       CASE WHEN external_email_addr IS NULL OR external_email_addr = ''
            THEN 'defaultempty#fixthisproblem.com'
            ELSE external_email_addr
       END AS email_address
FROM table_name
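Building on that, a hedged T-SQL sketch of the sending step with Database Mail: one email per distinct address, plus a notification to yourself for rows with a NULL address. The table name table_name, the mail profile, and both literal addresses are assumptions, and STRING_AGG needs SQL Server 2017+ (FOR XML PATH does the same job on older versions):
DECLARE @email nvarchar(255), @body nvarchar(max);

-- One email per distinct, non-null vendor address.
DECLARE vendor_cursor CURSOR FOR
    SELECT external_email_addr
    FROM table_name
    WHERE external_email_addr IS NOT NULL
    GROUP BY external_email_addr;

OPEN vendor_cursor;
FETCH NEXT FROM vendor_cursor INTO @email;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Collect that vendor's lines into one message body.
    SELECT @body = STRING_AGG(CONCAT('PO ', po_num, ': ', qty_needed), CHAR(13))
    FROM table_name
    WHERE external_email_addr = @email;

    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'YourMailProfile',  -- assumed Database Mail profile
        @recipients   = @email,
        @subject      = 'Daily quantities needed',
        @body         = @body;

    FETCH NEXT FROM vendor_cursor INTO @email;
END

CLOSE vendor_cursor;
DEALLOCATE vendor_cursor;

-- Separately flag rows with a missing address (recipient is an assumption).
IF EXISTS (SELECT 1 FROM table_name WHERE external_email_addr IS NULL)
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'YourMailProfile',
        @recipients   = 'you@yourcompany.com',
        @subject      = 'POs with missing vendor email',
        @body         = 'Some purchase orders have no external_email_addr.';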

Sqoop & Hadoop - How to join/merge old data and new data imported by Sqoop in lastmodified mode?

Background:
I have a table with the following schema on a SQL server. Updates to existing rows are possible and new rows are also added to this table.
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-18 19:07:00.0 | 180
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
I am using Sqoop to add incremental updates in lastmodified mode. My --check-column parameter is the last_login_date column. In my first run, I got the above two records into Hadoop - let's call this the current data. I noted that the last value (the max value of the check column from this first import) is 2016-06-18 19:07:00.0.
Assuming there is a change on the SQL server side, I now have the following changes on the SQL server side:
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-25 20:10:00.0 | 200
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
125-500 | 500 | 2016-06-28 19:54:00.0 | 1
I have the row 123-111 updated with a more recent last_login_date value and the count column has also been updated. I also have a new row 125-500 added.
On my second run, Sqoop looks at all rows with a last_login_date value greater than my known last value from the previous import - 2016-06-18 19:07:00.0.
This gives me only the changed data, i.e. 123-111 and 125-500 records. Let's call this - new data.
Question
How do I do a merge join in Hadoop/Hive using the current data and the new data so that I end up with the updated version of 123-111, 124-100, and the newly added 125-500?
Changed data load using Sqoop is a two-phase process.
1st phase - load the changed data into some temp (stage) table using the sqoop import utility.
2nd phase - merge the changed data with the old data using the sqoop-merge utility.
If the table is small (say, a few million records) then use a full load with sqoop import.
Sometimes it's possible to load only the latest partition - in such a case use the sqoop import utility to load the partition using a custom query, then instead of merging simply insert overwrite the loaded partition into the target table, or copy the files - this will work faster than sqoop merge.
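The merge phase itself can also be done directly in Hive. A minimal HiveQL sketch, assuming the current data sits in a table current_data, the Sqoop delta landed in a staging table new_data, and the result goes into reconciled_data (all three names are hypothetical):
-- For every unique_id keep only the most recent row; `count` is
-- backquoted to avoid clashing with the built-in function name.
INSERT OVERWRITE TABLE reconciled_data
SELECT unique_id, user_id, last_login_date, `count`
FROM (
    SELECT unique_id, user_id, last_login_date, `count`,
           ROW_NUMBER() OVER (PARTITION BY unique_id
                              ORDER BY last_login_date DESC) AS rn
    FROM (
        SELECT * FROM current_data
        UNION ALL
        SELECT * FROM new_data
    ) unioned
) ranked
WHERE rn = 1;
Applied to the example, this keeps the updated 123-111, the untouched 124-100 and the new 125-500.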
You can change the existing Sqoop query (by specifying a new custom query) to get ALL the data from the source table instead of getting only the changed data. Refer to using_sqoop_to_move_data_into_hive. This would be the simplest way to accomplish this - i.e. doing a full data refresh instead of applying deltas.

Compare Records in Single SQL Table

Using SQL Server 2008 R2.
I have a table that contains Windows security event log entries. The possible event IDs are 560, 562 and 564.
These are the three event log entries created when a user deletes a file.
The 560 contains most of the data regarding the user who performed the delete, the source IP, the file name, etc. However, the 560 is not the event that confirms the delete occurred. The 560 is the object open event type.
When a user deletes a file the 560 (object open) is created first, then a 562 (handle closed) and finally a 564 (object delete).
The common link between all three of these events is the Handle ID. So for a single delete you'll have something similar to the following:
EventID HandleID UserName Event File
564 000015f7 NT AUTHORITY\SYSTEM Object Delete N/A
562 000015f7 NT AUTHORITY\SYSTEM Handle Closed N/A
560 000015f7 DOMAIN\USER Object Open \share\filename
I would like the UserName and File from the 560 event, but only when there is a 564 with the same HandleID.
There are many ways to do it. You could use a correlated subquery:
SELECT UserName, File
FROM EventTableNameNotProvided e1
WHERE e1.EventID = 560
  AND EXISTS (SELECT 1
              FROM EventTableNameNotProvided e2
              WHERE e2.HandleID = e1.HandleID
                AND e2.EventID = 564)
Or a self JOIN:
SELECT e1.UserName, e1.File
FROM EventTableNameNotProvided e1
JOIN EventTableNameNotProvided e2
  ON e2.HandleID = e1.HandleID
WHERE e1.EventID = 560
  AND e2.EventID = 564
Either or both queries might be more useful with a SELECT DISTINCT. It depends on your data.
One way to solve this is to use a subquery:
SELECT UserName, File
FROM YourTable
WHERE EventID = 560
  AND HandleID IN (
      SELECT HandleID
      FROM YourTable
      WHERE EventID = 564
  )

How to get last access/modification date of a PostgreSQL database?

On a development server I'd like to remove unused databases. To do that, I need to know whether a database is still used by someone or not.
Is there a way to get the last access or modification date of a given database, schema or table?
You can do it by checking the last modification time of the table's file.
In PostgreSQL, every table corresponds to one or more OS files. You can look up the file name like this:
select relfilenode from pg_class where relname = 'test';
The relfilenode is the file name of the table "test". You can then find that file in the database's directory.
In my test environment:
cd /data/pgdata/base/18976
ls -l -t | head
The last command lists the files ordered by last modification time.
There is no built-in way to do this - and all the approaches that check the file mtime described in other answers here are wrong. The only reliable option is to add triggers to every table that record a change to a single change-history table, which is horribly inefficient and can't be done retroactively.
If you only care about "database used" vs "database not used" you can potentially collect this information from the CSV-format database log files. Detecting "modified" vs "not modified" is a lot harder; consider SELECT writes_to_some_table(...).
If you don't need to detect old activity, you can use pg_stat_database, which records activity since the last stats reset. e.g.:
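The record below is one row of that view, shown in psql's expanded display mode (\x), as produced by:
select * from pg_stat_database;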
-[ RECORD 6 ]--+------------------------------
datid | 51160
datname | regress
numbackends | 0
xact_commit | 54224
xact_rollback | 157
blks_read | 2591
blks_hit | 1592931
tup_returned | 26658392
tup_fetched | 327541
tup_inserted | 1664
tup_updated | 1371
tup_deleted | 246
conflicts | 0
temp_files | 0
temp_bytes | 0
deadlocks | 0
blk_read_time | 0
blk_write_time | 0
stats_reset | 2013-12-13 18:51:26.650521+08
so I can see that there has been activity on this DB since the last stats reset. However, I don't know anything about what happened before the stats reset, so if I had a DB showing zero activity since a stats reset half an hour ago, I'd know nothing useful.
PostgreSQL 9.5 lets us track the timestamp of the last modifying commit.
1. Check whether commit timestamp tracking is on or off using the following query:
show track_commit_timestamp;
2. If it returns "on", go to step 3; otherwise edit postgresql.conf:
cd /etc/postgresql/9.5/main/
vi postgresql.conf
and change
track_commit_timestamp = off
to
track_commit_timestamp = on
then restart PostgreSQL (or the system) and repeat step 1.
3. Use the following queries to look up the last commit:
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME WHERE COLUMN_NAME = VALUE;
My way to get the modification date of my tables:
Python Function
CREATE OR REPLACE FUNCTION py_get_file_modification_timestamp(afilename text)
RETURNS timestamp without time zone AS
$BODY$
import os
import datetime
# Return the file's last modification time (mtime) as a timestamp.
return datetime.datetime.fromtimestamp(os.path.getmtime(afilename))
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100;
SQL Query
SELECT
    schemaname,
    tablename,
    py_get_file_modification_timestamp('*postgresql_data_dir*/*tablespace_folder*/' || relfilenode)
FROM
    pg_class
    INNER JOIN pg_catalog.pg_tables ON (tablename = relname)
WHERE
    schemaname = 'public'
I'm not sure if things like VACUUM can mess with this approach, but in my tests it's a pretty accurate way to find tables that are no longer used, at least for INSERT/UPDATE operations.
I guess you should activate some log options. You can get information about logging in PostgreSQL here.
