Mask some fields when loading hdb in kdb - database

Suppose there are two instances, a1 and a2, and both connect to the same hdb. I would like a2 to connect to the hdb but apply a filter. For example, there is a table called elec.
I would like a2 to start with some of the values filtered out. If I write code and have a2 load it on startup, doesn't that load the data into memory? Is there any way to load it like a normal hdb when starting the a2 instance?
Basically, the question is: how do I mask some fields in one table when loading an hdb?

It is possible to prevent columns from being returned by select statements by manipulating the table definition in your HDB instance. The example below has a single date-partitioned table. We update the definition to a flipped dictionary with only a subset of the columns defined. This, however, is reversible, and it will not update the meta of the table in your instance, which will still show all columns.
q)meta trade
c   | t f a
----| -----
date| d
sym | s   p
size| j
px  | f
side| s
q)flip trade
`sym`size`px`side!`trade
q)`trade set flip `sym`size`px!`trade
q)select from trade where date=2017.05.27
date       sym  size px
------------------------------
2017.05.27 APPl 9968 92.79204
2017.05.27 APPl 9788 94.97189
2017.05.27 APPl 9660 27.62907
q)meta trade
c   | t f a
----| -----
date| d
sym | s   p
size| j
px  | f
side| s

Related

Use TQuery.Locate() function to find other then first matching

Locate moves the cursor to the first row matching a specified set of search criteria.
Let's say that q is a TQuery component connected to a database table with two columns, TAG and TAGTEXT. With the following code I get the letter a, and I would like to use the Locate() function to get the letter d.
If q.Locate('TAG','1',[loPartialKey]) Then
begin
tag60 := q.FieldByName('TAGTEXT');
end
For example if I got table like this:
+-----+---------+
| TAG | TAGTEXT |
+-----+---------+
| 1   | a       |
| 2   | b       |
| 3   | c       |
| 1   | d       |
| 4   | e       |
| 1   | f       |
+-----+---------+
Is it possible to locate the second occurrence of the number 1 in the table?
EDIT
My job is to find a particular occurrence of TAG with the value 1 (which occurrence I need depends on a parameter I receive). From there I need to iterate through the table and get the values of all the TAGTEXT fields until the TAG field contains 1 again. The number 1 in this case marks the start of a new segment, and everything between two 1s belongs to one segment. The segments do not all have the same number of rows. Also, I am not allowed to make any changes to the table.
What I thought I could do is create a counter variable that is incremented every time a row with TAG value 1 is reached. When the counter equals the parameter that gives the occurrence, I know I am in the right segment, and I can iterate through that segment and get the values I need.
But this might be a slow solution, and I wanted to know whether there is a faster one.
You need to be a bit wary of using Locate for a purpose like this, because some TDataSet descendants' implementations of Locate (or the underlying db-access layer) construct a temporary index on the dataset, which may be discarded immediately afterwards. Repeatedly calling Locate to iterate the rows of a given segment may therefore be a lot less efficient than one might expect.
Also, TClientDataSet constructs, uses and then discards an expression parser for each invocation of Locate (in its internal call to LocateRecord), which is a lot of overhead for repeated calls, especially when they are entirely avoidable.
In any case, the best way to do this is to ensure that your table records which segment a given row belongs to, adding a column like the SegmentID below if your table does not already have one:
TAG | TAGTEXT|SegmentID
+---+--------+---------+
| 1 | a | 1
| 2 | b | 1
| 3 | c | 1
| 1 | d | 2
+---+--------+---------+ // btw, what happened to the 2 missing rows after this one?
| 4 | e | 2
| 1 | f | 3
+---+--------+---------+
Then, you could use code like this to iterate the rows of a segment:
procedure IterateSegment(Query : TSomeTypeOfQueryComponent; SegmentID : Integer);
var
  Sql : String;
begin
  // Select only the rows that belong to the requested segment
  Sql := Format('select * from mytable where SegmentID = %d order by Tag', [SegmentID]);
  if Query.Active then
    Query.Close;
  Query.Sql.Text := Sql;
  Query.Open;
  // Suspend updates of data-aware controls while iterating
  Query.DisableControls;
  try
    while not Query.Eof do begin
      // process row here
      Query.Next;
    end;
  finally
    Query.EnableControls;
  end;
end;
Once you have the SegmentID column in the table, if you don't want to open a new query to iterate a block, you can set up a local index (by SegmentID, then Tag), assuming your dataset type supports it, set a filter on the dataset to restrict it to a given SegmentID, and then iterate over it.
You have many options for doing this.
If your component doesn't provide a LocateNext, you can write your own LocateNext function that compares the value and calls Next until it finds a match.
You can also fetch the rows with an ORDER BY in the SQL, use Locate for the first value, and then test whether the next row matches the comparison.
If you use a ClientDataSet, you can filter through the component's Filter property, or set IndexFieldNames to order the values instead of using the ORDER BY from the previous suggestion.
You can also filter in the SQL WHERE clause.
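The counter idea from the question, combined with the "compare and move to the next row" suggestion above, can be sketched in a language-neutral way. The Python below is only an illustration, not Delphi dataset code: the function name segment_texts and the list of (TAG, TAGTEXT) pairs are hypothetical stand-ins for rows read in table order.

```python
def segment_texts(rows, occurrence, start_tag=1):
    """Return the TAGTEXT values of the n-th segment (1-based).

    rows: iterable of (tag, tagtext) pairs in table order (hypothetical input).
    A row whose tag equals start_tag opens a new segment.
    """
    seen = 0      # number of segment starts passed so far
    texts = []
    for tag, text in rows:
        if tag == start_tag:
            seen += 1
            if seen > occurrence:
                break  # the following segment has begun, stop scanning
        if seen == occurrence:
            texts.append(text)
    return texts


rows = [(1, "a"), (2, "b"), (3, "c"), (1, "d"), (4, "e"), (1, "f")]
print(segment_texts(rows, 2))  # ['d', 'e']
```

This is a single forward pass that stops as soon as the requested segment ends, so it never repeats a search from the top of the dataset.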

Excel Lookup IP addresses in multiple ranges

I am trying to find a formula for column A that will check an IP address in column B and find if it falls into a range (or between) 2 addresses in two other columns C and D.
E.G.
A B C D
+---------+-------------+-------------+------------+
| valid? | address | start | end |
+---------+-------------+-------------+------------+
| yes | 10.1.1.5 | 10.1.1.0 | 10.1.1.31 |
| Yes | 10.1.3.13 | 10.1.2.16 | 10.1.2.31 |
| no | 10.1.2.7 | 10.1.1.128 | 10.1.1.223 |
| no | 10.1.1.62 | 10.1.3.0 | 10.1.3.127 |
| yes | 10.1.1.9 | 10.1.4.0 | 10.1.4.255 |
| no | 10.1.1.50 | … | … |
| yes | 10.1.1.200 | | |
+---------+-------------+-------------+------------+
This is supposed to represent an Excel table with 4 columns, a heading row and 7 data rows as an example.
I can do a lateral check with
=IF(AND((B3>C3),(B3 < D3)),"yes","no")
which only checks 1 address against the range next to it.
I need something that will check the 1 IP address against all of the ranges. i.e. rows 1 to 100.
This is checking access list rules against routes to see if I can eliminate redundant rules... but has other uses if I can get it going.
To make it extra special I can not use VBA macros to get it done.
I'm thinking some kind of index match to look it up in an array but not sure how to apply it. I don't know if it can even be done. Good luck.
Ok, so I've been tracking this problem since my initial comment, but have not taken the time to answer because just like Lana B:
I like a good puzzle, but it's not a good use of time if i have to keep guessing
+1 to Lana for her patience and effort on this question.
However, IP addressing is something I deal with regularly, so I decided to tackle this one for my own benefit. Also, no offense, but getting the MIN of the start and the MAX of the end is wrong. This will not account for gaps in the IP white-list. As I mentioned, this required 15 helper columns and my result is simply 1 or 0 corresponding to In or Out respectively. Here is a screenshot (with formulas shown below each column):
The formulas in F2:J2 are:
=NUMBERVALUE(MID(B2,1,FIND(".",B2)-1))
=NUMBERVALUE(MID(B2,FIND(".",B2)+1,FIND(".",B2,FIND(".",B2)+1)-1-FIND(".",B2)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2)+1)+1,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)-1-FIND(".",B2,FIND(".",B2)+1)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)+1,LEN(B2)))
=F2*256^3+G2*256^2+H2*256+I2
Yes, I used formulas instead of "Text to Columns" to automate the process of adding more information to a "living" worksheet.
The formulas in L2:P2 are the same, but replace B2 with C2.
The formulas in R2:V2 are also the same, but replace B2 with D2.
The formula for X2 is
=SUMPRODUCT(--($P$2:$P$8<=J2)*--($V$2:$V$8>=J2))
I also copied your original "valid" set in column A, which you'll see matches my result.
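For readers who want to check the helper-column logic outside Excel, here is a minimal Python sketch of the same idea: pack each address into one integer with the 256^3, 256^2, 256 weighting, then count the ranges that contain it, as the SUMPRODUCT does. The function names ip_to_int and matching_ranges are made up for this illustration, and the ranges and addresses are copied from the example table in the question.

```python
def ip_to_int(address):
    """Pack a dotted-quad IPv4 string into a single integer."""
    a, b, c, d = (int(part) for part in address.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


def matching_ranges(address, ranges):
    """Count the (start, end) ranges containing address, inclusive at both ends."""
    value = ip_to_int(address)
    return sum(ip_to_int(start) <= value <= ip_to_int(end) for start, end in ranges)


ranges = [
    ("10.1.1.0", "10.1.1.31"),
    ("10.1.2.16", "10.1.2.31"),
    ("10.1.1.128", "10.1.1.223"),
    ("10.1.3.0", "10.1.3.127"),
    ("10.1.4.0", "10.1.4.255"),
]
addresses = ["10.1.1.5", "10.1.3.13", "10.1.2.7", "10.1.1.62",
             "10.1.1.9", "10.1.1.50", "10.1.1.200"]

for address in addresses:
    print(address, "yes" if matching_ranges(address, ranges) else "no")
```

A non-zero count means the address falls inside at least one range, which corresponds to a result of 1 or more in X2.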
You will need helper columns.
Organise your data as outlined in the picture.
Split address, start and end into columns using the dot as the delimiter (ribbon menu Data=>Text To Columns).
Above the start/end parts, calculate MIN for the start and MAX for the end of each split text part (e.g. MIN(K5:K1000)).
FORMULAS:
VALIDITY formula - copy into cell D5, and drag down:
=IF(AND(B6>$I$1,B6<$O$1),"In",
IF(OR(B6<$I$1,B6>$O$1),"Out",
IF(B6=$I$1,
IF(C6<$J$1, "Out",
IF( C6>$J$1, "In",
IF( D6<$K$1, "Out",
IF( D6>$K$1, "In",
IF(E6>=$L$1, "In", "Out"))))),
IF(B6=$O$1,
IF(C6>$P$1, "Out",
IF( C6<$P$1, "In",
IF( D6>$Q$1, "Out",
IF( D6<$Q$1, "In",
IF(E6<=$R$1, "In", "Out") )))) )
)))

How to get the dimensionality of an ARRAY column?

I'm working on a project that collects information about your schema from the database directly. I can get the data_type of the column using information_schema.columns, which will tell me if it's an ARRAY or not. I can also get the underlying type (integer, bytea etc) of the ARRAY by querying information_schema.element_types as described here:
https://www.postgresql.org/docs/9.1/static/infoschema-element-types.html
My problem is that I also need to know how many dimensions the array has, whether it is integer[], or integer[][] for example. Does anyone know of a way to do this? Google isn't being very helpful here, hopefully someone more familiar with the Postgres spec can lead me in the right direction.
For starters, the dimensionality of an array is not reflected in the data type in Postgres. The syntax integer[][] is tolerated, but it's really just integer[] internally.
Read the manual here.
This means that dimensions can vary within the same array type (the same table column).
To get actual dimensions of a particular array value:
SELECT array_dims(my_arr); -- [1:2][1:3]
Or to just get the number of dimensions:
SELECT array_ndims(my_arr); -- 2
There are more array functions for similar needs. See table of array functions in the manual.
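If the array values are inspected on the client side instead, the same per-value check can be sketched in Python. This assumes the database driver hands multidimensional arrays back as nested Python lists, which is an assumption about your driver and not something stated above. The function below is a hypothetical helper that merely mimics what array_ndims reports, including NULL (None) for an empty array.

```python
def array_ndims(value):
    """Return the nesting depth of a list-of-lists, or None if empty or not a list."""
    depth = 0
    while isinstance(value, list):
        if not value:
            return None  # mirrors Postgres, which reports NULL for an empty array
        depth += 1
        value = value[0]  # Postgres arrays are rectangular, so one branch is enough
    return depth if depth else None


print(array_ndims([1, 2]))            # 1
print(array_ndims([[1, 2], [3, 4]]))  # 2
```

Because the depth is computed per value, two rows of the same column can legitimately give different answers.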
Related:
Use string[][] with ngpsql
If you need to enforce particular dimensions in a column, add a CHECK constraint. To enforce 2-dimensional arrays:
ALTER TABLE tbl ADD CONSTRAINT tbl_arr_col_must_have_2_dims
CHECK (array_ndims(arr_col) = 2);
Multidimensional array support in Postgres is very specific. Multidimensional array types do not exist. If you declare an array as multidimensional, Postgres casts it automatically to a simple array type:
create table test(a integer[][]);
\d test
      Table "public.test"
 Column |   Type    | Modifiers
--------+-----------+-----------
 a      | integer[] |
You can store arrays of different dimensions in a column of an array type:
insert into test values
(array[1,2]),
(array[array[1,2], array[3,4]]);
select a, a[1] a1, a[2] a2, a[1][1] a11, a[2][2] a22
from test;
       a       | a1 | a2 | a11 | a22
---------------+----+----+-----+-----
 {1,2}         |  1 |  2 |     |
 {{1,2},{3,4}} |    |    |   1 |   4
(2 rows)
This is a key difference between Postgres and programming languages like C, Python etc. The feature has its advantages and disadvantages, but it often causes problems for novices.
You can find the number of dimensions in the system catalog pg_attribute:
select attname, typname, attndims
from pg_class c
join pg_attribute a on c.oid = attrelid
join pg_type t on t.oid = atttypid
where relname = 'test'
and attnum > 0;
 attname | typname | attndims
---------+---------+----------
 a       | _int4   |        2
(1 row)
It is not clear whether you can rely on this number, because the documentation says:
attndims - Number of dimensions, if the column is an array type; otherwise 0. (Presently, the number of dimensions of an array is not enforced, so any nonzero value effectively means "it's an array".)

SSIS: How to split excel cell value into SQL columns

I have an excel file with data like this:
ID | FieldA | FieldB
1  | ABC    | A, B
2  | FGH    | W, Z
3  | KLÑ    | G, K
What I want to do is use SSIS to import this data into a SQL table. The only problem is that this table has a structure like this:
ID | FieldA | FieldB1 | FieldB2
So, what I need to do is to split the "FieldB" Column in Excel and put it into FieldB1 and FieldB2 in SQL.
The result would be something like this:
ID | FieldA | FieldB1 | FieldB2
1 | ABC | A | B
2 | FGH | W | Z
3 | KLÑ | G | K
Any ideas on how to achieve this?
Unless I'm missing something, I'd just skip the header row and have it import the subsequent data correctly. Take a minute or so to assign column names and voilà, done.
Try selecting the relevant range, then running this:
Sub SplitColumn()
    Dim strArr() As String
    Dim cell As Range
    For Each cell In Selection
        cell.Offset(0, 1).Resize(1, 2).Value = Split(cell.Value, ", ")
    Next cell
End Sub
Now copy and paste your data wherever required.
Non-VBA alternative:
Enter the following formula in cell D2:
=LEFT(C2,FIND(",",C2)-1)
And in E2:
=RIGHT(C2,LEN(C2)-FIND(", ",C2)-1)
And autocomplete the rest of the column.
As far as I can see, here is a detailed explanation of your example.
Alternatively, you can take another approach: split the one Excel column into two columns in Excel using formulas, then import the document with 4 columns.
You can use a Derived Column transformation and add the two parts as new columns. The first column's expression should look like this:
SUBSTRING([FieldB],1,FINDSTRING([FieldB],",",1) - 1)
and the second one like this:
TRIM(SUBSTRING([FieldB],FINDSTRING([FieldB],",",1) + 1,LEN([FieldB]) - FINDSTRING([FieldB],",",1)))
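To sanity-check the split logic outside SSIS, the same "text before the first comma, text after it" rule can be sketched in Python. The function name split_field_b is hypothetical, and the sample values come from the question's data. Whitespace is trimmed from both parts so that "A, B" yields "B" and not " B".

```python
def split_field_b(value):
    """Split a value such as 'A, B' at the first comma into two trimmed parts."""
    first, _, second = value.partition(",")
    return first.strip(), second.strip()


for raw in ["A, B", "W, Z", "G, K"]:
    print(split_field_b(raw))
```

A value without a comma comes back with an empty second part, which is a case worth deciding on before loading into FieldB2.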

A single MySQL query for 'bouncing' table selects

So, say for the sake of simplicity, I have a master table containing two fields - The first is an attribute and the second is the attributes value. If the second field is set to reference a value in another table it is denoted in parenthesis.
Example:
MASTER_TABLE:
Attr_ID | Attr_Val
--------+-----------
1 | 23(table1) --> 23rd value from `table1`
2 | ...
1 | 42 --> the number 42
1 | 72(table2) --> 72nd value from `table2`
3 | ...
1 | txt --> string "txt"
2 | ...
4 | ...
TABLE 1:
Val_Id | Value
--------+-----------
1 | some_content
2 | ...
. | ...
. | ...
. | ...
23 | some_content
. | ...
Is it possible to perform a single query in SQL (without parsing the results inside the application and requerying the db) that would iterate trough master_table and for the given <attr_id> get only the attributes that reference other tables (e.g. 23(table1), 72(table2), ...), then parse the tables names from the parenthesis (e.g. table1, table2, ...) and perform a query to get the (23rd, 72nd, ...) value (e.g. some_content) from that referenced table?
Here is something I've done, and it parses the Attr_Val for the table name, but I don't know how to assign it to a string and then do a query with that string.
PREPARE pstmt FROM
"SELECT * FROM information_schema.tables
WHERE TABLE_SCHEMA = '<my_db_name>' AND TABLE_NAME=?";
SET @str_tablename =
(SELECT table.tablename FROM
(SELECT @string:=(SELECT <string_column> FROM <table> WHERE ID=<attr_id>) as String,
@loc1:=length(@string)-locate("(", reverse(@string))+2 AS from,
@loc2:=length(@string)-locate(")", reverse(@string))+1-@loc1 AS to,
substr(@string,@loc1, @loc2) AS tablename
) table
); <--this returns 1 rows which is OK
EXECUTE pstmt USING @str_tablename; <--this then returns 0 rows
Any thoughts?
I love the purity of this approach, if pulled off. But I'm thinking you're creating a maintenance bomb. With a cure like this, who needs to be sick?
No one has ever said of a web site "Man, their data sure is pure!" They compliment what is being done with the data. I don't recommend you keep your hands tied behind your back on this one. I guarantee your competitors aren't.
