Table cursor in Perl - database

I want to iterate over a big table that doesn't fit in memory.
In Java, I can use a cursor and load the contents as needed without exhausting memory. How do I do the same in Perl?
I'm using PostgreSQL with DBI.

Just use a database cursor in PostgreSQL. An example from the manual:
BEGIN WORK;
-- Set up a cursor:
DECLARE liahona SCROLL CURSOR FOR SELECT * FROM films;
-- Fetch the first 5 rows in the cursor liahona:
FETCH FORWARD 5 FROM liahona;
 code  |          title          | did | date_prod  |   kind   |  len
-------+-------------------------+-----+------------+----------+-------
 BL101 | The Third Man           | 101 | 1949-12-23 | Drama    | 01:44
 BL102 | The African Queen       | 101 | 1951-08-11 | Romantic | 01:43
 JL201 | Une Femme est une Femme | 102 | 1961-03-12 | Romantic | 01:25
 P_301 | Vertigo                 | 103 | 1958-11-14 | Action   | 02:08
 P_302 | Becket                  | 103 | 1964-02-03 | Drama    | 02:28
-- Fetch the previous row:
FETCH PRIOR FROM liahona;
 code  |  title  | did | date_prod  |  kind  |  len
-------+---------+-----+------------+--------+-------
 P_301 | Vertigo | 103 | 1958-11-14 | Action | 02:08
-- Close the cursor and end the transaction:
CLOSE liahona;
COMMIT WORK;

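I used a PostgreSQL cursor through DBI:
my $sql = "SOME QUERY HERE";
$dbh->do("DECLARE csr CURSOR WITH HOLD FOR $sql");
my $sth = $dbh->prepare("FETCH 100 FROM csr");
$sth->execute;
my $count = 0;
while (my $ref = $sth->fetchrow_hashref()) {
    # ... processing here
    $count++;
    # After each full batch of 100 rows, fetch the next batch from the cursor
    if ($count % 100 == 0) {
        $sth->execute;
    }
}
# A WITH HOLD cursor outlives its transaction, so close it explicitly
$dbh->do("CLOSE csr");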

What's wrong with:
my $s = $h->prepare("select ...");
$s->execute;
while (my $row = $s->fetchrow_arrayref) {
    ; # do something
}

Look at DBD::Pg docs for an example.
Use the DBI fetchrow_* methods in a while() loop to keep memory allocation small, and avoid fetchall_*.
Other database options related to memory usage:
LongReadLen - maximum length of 'long' type fields (LONG, BLOB, CLOB, MEMO, etc.)
RowCacheSize (not used in DBD::Pg) - A hint to the driver indicating the size of the local row cache that the application would like the driver to use for future "SELECT" statements.
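
For comparison, the "fetch in a loop, avoid fetchall_*" advice can be sketched in Python. This is only an illustration of the batching pattern, not the DBI API: it uses the standard-library sqlite3 module and a made-up films table as stand-ins, and the helper name iter_rows is hypothetical. Whether a given PostgreSQL driver streams rows or buffers the whole result on the client depends on the driver, which is why the server-side cursor answers exist.

```python
import sqlite3


def iter_rows(cursor, batch_size=100):
    """Yield rows one at a time while holding at most batch_size rows in memory."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# In-memory stand-in table (hypothetical data, not from the question)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (code TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [(f"F{i:04d}", f"Film {i}") for i in range(250)],
)

cur = conn.execute("SELECT code, title FROM films ORDER BY code")
count = sum(1 for _ in iter_rows(cur, batch_size=100))
print(count)
```

The generator keeps the calling code a plain for loop while the batch size stays a tuning knob, much like the "FETCH 100" in the cursor example.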

Alert Before Running Query Consisting Of Large Size Data

Is there any mechanism in Snowflake to alert users who are about to run a query against large tables? That way, users would know that the query will consume many warehouse credits if they run it against a large dataset.
There is no alert mechanism for this, but users can run the EXPLAIN command before running the actual query to estimate the bytes/partitions read:
explain select c_name from "SAMPLE_DATA"."TPCH_SF10000"."CUSTOMER";
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
| step        | id | parent | operation | objects                           | alias | expressions     | partitionsTotal | partitionsAssigned | bytesAssigned |
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
| GlobalStats |    |        |           |                                   |       |                 | 6585            | 6585               | 109081790976  |
| 1           | 0  |        | Result    |                                   |       | CUSTOMER.C_NAME |                 |                    |               |
| 1           | 1  | 0      | TableScan | SAMPLE_DATA.TPCH_SF10000.CUSTOMER |       | C_NAME          | 6585            | 6585               | 109081790976  |
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
https://docs.snowflake.com/en/sql-reference/sql/explain.html
You can also assign users to specific warehouses, and use resource monitors to limit credits on those warehouses.
https://docs.snowflake.com/en/user-guide/resource-monitors.html#assignment-of-resource-monitors
As a third alternative, you can set STATEMENT_TIMEOUT_IN_SECONDS to prevent long-running queries.
https://docs.snowflake.com/en/sql-reference/parameters.html#statement-timeout-in-seconds
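
The EXPLAIN-based check could be automated on the client side. The following is a rough sketch, assuming the EXPLAIN rows have already been fetched and converted to one dict per row; that row shape, the function names, and the 10 GiB threshold are all assumptions for illustration, not part of any Snowflake API.

```python
def bytes_to_scan(explain_rows):
    """Return the GlobalStats bytesAssigned value from EXPLAIN rows, or None."""
    for row in explain_rows:
        if row.get("step") == "GlobalStats":
            return row.get("bytesAssigned")
    return None


def should_warn(explain_rows, limit_bytes):
    """True when the estimated scan size exceeds limit_bytes."""
    estimate = bytes_to_scan(explain_rows)
    return estimate is not None and estimate > limit_bytes


# Rows shaped like the EXPLAIN output shown earlier (dict-per-row is an assumption)
rows = [
    {"step": "GlobalStats", "partitionsTotal": 6585,
     "partitionsAssigned": 6585, "bytesAssigned": 109081790976},
    {"step": 1, "id": 1, "operation": "TableScan",
     "objects": "SAMPLE_DATA.TPCH_SF10000.CUSTOMER",
     "bytesAssigned": 109081790976},
]

LIMIT = 10 * 1024**3  # hypothetical 10 GiB threshold
if should_warn(rows, LIMIT):
    print(f"Query would scan about {bytes_to_scan(rows) / 1024**3:.1f} GiB")
```

A wrapper like this only warns; the resource monitors and statement timeout mentioned above are what actually enforce a limit.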

How to use SAS array wildcards to create an array of variables that have names ending with a certain string?

I have code similar to this:
data input;
input yr $ lob $ type $
allow_P25 los_P25 adm_P25;
cards;
2019 Com AMB 205.4 3.56 3444
2019 Med DME 34.4 1.11 533
;
run;
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
This code creates an array of all numeric variables. However, I now need to create an array of only the variables whose names end in P25.
Is there a way to do this using wildcards? I found some solutions on the internet that used wildcards, but the wildcard was always at the end of the variable name. What if I want to use the wildcard at the beginning of the variable name? I tried this (obviously not a valid solution):
data results;
length perc_type $15 perc_value 8;
set input;
array change :P25;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
As a workaround, you can match the suffix of the _numeric_ array using prxmatch(perl_regex, string) and perform action only if a match is found.
Code
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
/* match vnames ending with P25 */
if (prxmatch("/P25$/", vname(change)) > 0) then do;
/* do whatever you want */
perc_type = vname(change);
perc_value = change;
output results;
end;
end;
run;
Output
| Obs | perc_type | perc_value | yr | lob | type | allow_P25 | los_P25 | adm_P25 |
|-----|----------:|------------|------|-----|-----:|----------:|--------:|---------|
| 1 | allow_P25 | 205.40 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 2 | los_P25 | 3.56 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 3 | adm_P25 | 3444.00 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 4 | allow_P25 | 34.40 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 5 | los_P25 | 1.11 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 6 | adm_P25 | 533.00 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
Notes
A SAS array does not work like arrays in other languages: it is a reference to a group of variables. So, absent a direct wildcard syntax for suffixes, start from the built-in _numeric_ group and filter the variable names afterwards.
See also the official SAS documentation on regular expressions.
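
For readers more used to general-purpose languages, the same "take all numeric variables, then filter by name" logic can be sketched in Python. This is only an analogy under assumptions: one observation is modeled as a dict, and the function name to_long is made up for illustration.

```python
import re


def to_long(row, pattern=r"P25$"):
    """Emit (perc_type, perc_value) pairs for numeric fields whose name matches."""
    return [
        (name, value)
        for name, value in row.items()
        if isinstance(value, (int, float)) and re.search(pattern, name)
    ]


# First observation from the question's input data
row = {"yr": "2019", "lob": "Com", "type": "AMB",
       "allow_P25": 205.4, "los_P25": 3.56, "adm_P25": 3444}
print(to_long(row))
# [('allow_P25', 205.4), ('los_P25', 3.56), ('adm_P25', 3444)]
```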
There is no direct SAS syntax that allows this, though macros have been written to deal with problems like this one.
See a few at Roger's GitHub here.

Create/Update table in MS Access dynamically

EDIT:
Here's what I have: An Access database made up of 3 tables linked from SQL server. I need to create a new table in this database by querying the 3 source tables. Here are examples of the 3 tables I'm using:
PlanTable1
+------+------+------+------+---------+---------+
| Key1 | Key2 | Key3 | Key4 | PName | MainKey |
+------+------+------+------+---------+---------+
| 53 | 1 | 5 | -1 | Bikes | 536681 |
| 53 | 99 | -1 | -1 | Drinks | 536682 |
| 53 | 66 | 68 | -1 | Balls | 536683 |
+------+------+------+------+---------+---------+
SpTable
+----+---------+---------+
| ID | MainKey | SpName |
+----+---------+---------+
| 10 | 536681 | Wing1 |
| 11 | 536682 | Wing2 |
| 12 | 536683 | Wing3 |
+----+---------+---------+
LocTable
+-------+-------------+--------------+
| LocID | CenterState | CenterCity |
+-------+-------------+--------------+
| 10 | IN | Indianapolis |
| 11 | OH | Columbus |
| 12 | IL | Chicago |
+-------+-------------+--------------+
You can see the relationships between the tables. The NewMasterTable I need to create based off of these will look something like this:
NewMasterTable
+-------+--------+-------------+------+--------------+-------+-------+-------+
| LocID | PName | CenterState | Key4 | CenterCity | Wing1 | Wing2 | Wing3 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
| 10 | Bikes | IN | -1 | Indianapolis | 1 | 0 | 0 |
| 11 | Drinks | OH | -1 | Columbus | 0 | 1 | 0 |
| 12 | Balls | IL | -1 | Chicago | 0 | 0 | 1 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
The hard part for me is making this new table dynamic. In the future, rows may be added to the source tables. I need my NewMasterTable to reflect any changes/additions to the source. How do I go about building the NewMasterTable as described? Does this make any sort of sense?
If an Access table is a hard requirement, then probably the only way to go about it is to create a set of Update and Insert queries that are executed periodically. There is no built-in "dynamic" feature of Access that will monitor and update the table.
First, create the table. You could either 1) do this manually from scratch by defining the columns and constraints yourself, or 2) create a make-table query (i.e. SELECT... INTO) that generates most of the schema, then add any additional columns, edit necessary details and add appropriate indexes.
Define and save Update and Insert (and optional Delete) queries to keep the table synced. I'm not sharing actual code here, because that goes beyond your primary issue I think and requires specifics that you need to define. Due to some ambiguity with your key values (the field names and sample data still are not sufficient to reveal precise relationships and constraints), it is likely that you'll need multiple Update statements.
In particular, the "Wing" columns will likely require a transform statement.
You may not be able to update all columns appropriately using a single query. I recommend not trying to force such an "artificial" requirement. Multiple queries can actually be easier to understand and maintain.
In the event that you experience "query is not updateable" errors, you may need to define other "temporary" tables with appropriate indexes, into which you do initial inserts from the linked tables, then subsequent queries to update your master table from those.
Finally, and I think this is the key to solving your problem, you need to define some Access form (or other code) that periodically runs your set of "sync" queries. Access forms have a [Timer Interval] property and corresponding Timer event that fires periodically. Add VBA code in the Form_Timer sub that runs all your queries. I would suggest "wrapping" such VBA in a transaction and adding appropriate error handling and error logging, etc.
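
To make the shape of such a sync step concrete, here is a rough sketch in Python with the standard-library sqlite3 module standing in for Access. It is not the Update/Insert approach described above: for brevity it swaps in a full rebuild of the table on each run. The join conditions (SpTable.ID = LocTable.LocID and SpTable.MainKey = PlanTable1.MainKey) are assumptions read off the sample data, and the function names are hypothetical.

```python
import sqlite3


def wing_flag(name):
    """Build one 0/1 indicator column for a given SpName value."""
    safe = name.replace("'", "''")  # escape quotes inside the SQL literal
    return f", CASE WHEN s.SpName = '{safe}' THEN 1 ELSE 0 END AS [{name}]"


def rebuild_master(conn):
    """Recreate NewMasterTable from the three source tables."""
    wings = [r[0] for r in conn.execute(
        "SELECT DISTINCT SpName FROM SpTable ORDER BY SpName")]
    flags = "".join(wing_flag(w) for w in wings)
    conn.execute("DROP TABLE IF EXISTS NewMasterTable")
    conn.execute(
        "CREATE TABLE NewMasterTable AS "
        "SELECT l.LocID AS LocID, p.PName AS PName, "
        "l.CenterState AS CenterState, p.Key4 AS Key4, "
        "l.CenterCity AS CenterCity" + flags +
        " FROM LocTable l"
        " JOIN SpTable s ON s.ID = l.LocID"            # assumed relationship
        " JOIN PlanTable1 p ON p.MainKey = s.MainKey"  # assumed relationship
        " ORDER BY l.LocID"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PlanTable1 (Key1, Key2, Key3, Key4, PName, MainKey);
INSERT INTO PlanTable1 VALUES (53, 1, 5, -1, 'Bikes', 536681),
                              (53, 99, -1, -1, 'Drinks', 536682),
                              (53, 66, 68, -1, 'Balls', 536683);
CREATE TABLE SpTable (ID, MainKey, SpName);
INSERT INTO SpTable VALUES (10, 536681, 'Wing1'), (11, 536682, 'Wing2'),
                           (12, 536683, 'Wing3');
CREATE TABLE LocTable (LocID, CenterState, CenterCity);
INSERT INTO LocTable VALUES (10, 'IN', 'Indianapolis'), (11, 'OH', 'Columbus'),
                            (12, 'IL', 'Chicago');
""")
rebuild_master(conn)
for row in conn.execute("SELECT * FROM NewMasterTable ORDER BY LocID"):
    print(row)
```

In Access, the equivalent queries would be run from the Form_Timer sub; a full rebuild has the side benefit that a newly added SpName automatically gets its own column.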

How can we validate tabular data in robot framework?

In Cucumber, we can directly validate database table content in tabular form by listing the values in the format below:
| Type | Code | Amount |
| A | HIGH | 27.72 |
| B | LOW | 9.28 |
| C | LOW | 4.43 |
Do we have something similar in Robot Framework? I need to run a query on the DB, and the output looks like the table above.
No, there is nothing built in to do exactly what you say. However, it's fairly straight-forward to write a keyword that takes a table of data and compares it to another table of data.
For example, you could write a keyword that takes the result of the query and then rows of information (though, the rows must all have exactly the same number of columns):
| | ${ResultOfQuery}= | <do the database query>
| | Database should contain | ${ResultOfQuery}
| | ... | #Type | Code | Amount
| | ... | A | HIGH | 27.72
| | ... | B | LOW | 9.28
| | ... | C | LOW | 4.43
Then it's just a matter of iterating over all of the arguments three at a time, and checking if the data has that value. It would look something like this:
*** Keywords ***
| Database should contain
| | [Arguments] | ${actual} | @{expected}
| | :FOR | ${type} | ${code} | ${amount} | IN | @{expected}
| | | <verify that the values are in ${actual}>
Even easier might be to write a python-based keyword, which makes it a bit easier to iterate over datasets.
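
A Python-based keyword along those lines might look like the sketch below. It assumes the query result arrives as a sequence of three-column rows and that cell values can be compared as strings; the function name simply mirrors the "Database should contain" keyword used above, and the error messages are made up.

```python
def database_should_contain(actual, *expected):
    """Fail unless every expected (type, code, amount) triple is in actual.

    `actual` is the query result as a sequence of rows; `expected` is the
    flat list of cell values passed from the test, three per row.
    """
    if len(expected) % 3 != 0:
        raise AssertionError("Expected values must come in groups of three")
    # Compare as strings because Robot Framework passes arguments as text
    actual_rows = {tuple(str(cell) for cell in row) for row in actual}
    missing = []
    for i in range(0, len(expected), 3):
        row = tuple(str(cell) for cell in expected[i:i + 3])
        if row not in actual_rows:
            missing.append(row)
    if missing:
        raise AssertionError(f"Rows not found in query result: {missing}")
```

Saved in a module that the test suite imports as a library, the function would be callable as a keyword with the same argument layout as the example table. String comparison is a simplification: a value the database returns as 27.720 would not match the text 27.72.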

SAS - Do loop within If statement?

I have been using SAS off and on for a year and I'm finally getting into arrays, macros, and all that cool stuff.
What I want to do:
I have a merged dataset with data from students in different grades on a test. I need to create different files for each grade. I don't have a grade variable to easily sort the dataset by and create different files. I do have an index of variables specific to each grade.
Example - What I have:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Example - What I want:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
+-------+--------+--------+--------+--------+--------+
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Where I am:
I have a lot of variables specific to each grade, and some of the variables contain missing data, so to be thorough I should check all of the grade-specific variables and output any observations containing data in any of those fields. I could use a hideously long IF THEN statement...
DATA grade1 grade2 grade3 grade4;
SET gradeall;
IF sc_132 ^= . OR sc_139 ^= . OR (AND SO ON FOR ABOUT 34 VARIABLES) THEN OUTPUT grade1;
RUN;
But I thought this would be a good time to use an array. I can't find any easy to parse documentation about where and when you can use do loops. Using my logic of other programming languages and what I've browsed about do loops I've put together the following.
%let gr1_var = sc_132 sc_139 sc_142;
/*-GRADE SPECIFIC ARRAY REPEATED FOR OTHER GRADES -*/
DATA grade1 grade2 grade3 grade4;
SET gradeall;
PUT &gr1_var;
ARRAY grade1 [*] &gr1_var;
IF (
DO i= 1 TO (DIM(items5_all)-1);
items5_all(i) ^=. OR ;
END;
DO i= DIM(items5_all);
items5_all(i) ^=.;
END;
)
THEN OUTPUT grade1;
/*-IF THEN STATEMENT THEN REPEATED FOR OTHER GRADES-*/
run;
I was hoping this would give me the equivalent of the long IF THEN statement above without having to type it. But of course it is non-functional.
Can you even use do loops within If statements (I haven't found any examples of this)?
Does anyone have any recommendations for how to accomplish this task?
If you only want to output observations that contain data in any of the grade-specific fields, you can just take the sum over the array. The SUM function ignores missing values, so the sum is missing only when every variable in the array is missing, and such an observation is not output. No loop is needed:
%let gr1_var = sc_132--sc_142; /* "--" selects variables by their position in the dataset; "-" would expand to the numbered range sc_132, sc_133, ..., sc_142 */
%let gr2_var = sc_143 sc_151;
DATA grade1 grade2;
SET gradeall;
ARRAY grade1 [*] &gr1_var;
ARRAY grade2 [*] &gr2_var;
if sum(of grade1(*))^=. then output grade1;
if sum(of grade2(*))^=. then output grade2;
run;
By the way, if you wrap this in a macro, you don't need to write the array definition and IF ... THEN statement separately for each grade.
And no, I don't think you can put a DO loop inside the condition of an IF statement the way you tried here.
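
The "output the row if any grade-specific variable has data" rule can also be written out in Python to make the logic explicit. This is a sketch by analogy only: rows are modeled as dicts with None for SAS missing values, and the names has_any_value, split_by_grade and GRADE_VARS are made up for illustration.

```python
def has_any_value(row, variables):
    """True if at least one of the named variables is non-missing (not None)."""
    return any(row.get(name) is not None for name in variables)


GRADE_VARS = {
    "grade1": ["sc_132", "sc_139", "sc_142"],
    "grade2": ["sc_143", "sc_151"],
}


def split_by_grade(rows, grade_vars=GRADE_VARS):
    """Route each row's ID to every grade whose variables contain data."""
    out = {grade: [] for grade in grade_vars}
    for row in rows:
        for grade, variables in grade_vars.items():
            if has_any_value(row, variables):
                out[grade].append(row["ID"])
    return out


rows = [
    {"ID": 16623, "sc_132": 1, "sc_139": 1, "sc_142": 0,
     "sc_143": None, "sc_151": None},
    {"ID": 17225, "sc_132": None, "sc_139": None, "sc_142": None,
     "sc_143": 0, "sc_151": None},
]
print(split_by_grade(rows))
# {'grade1': [16623], 'grade2': [17225]}
```

Note that a score of 0 counts as data, just as a sum of 0 is not missing in the SAS version.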
