How can we validate tabular data in robot framework? - database

In Cucumber, we can directly validate the database table content in tabular format by mentioning the values in below format:
| Type | Code | Amount |
| A | HIGH | 27.72 |
| B | LOW | 9.28 |
| C | LOW | 4.43 |
Do we have something similar in Robot Framework. I need to run a query on the DB and the output looks like the above given table.

No, there is nothing built in to do exactly what you say. However, it's fairly straight-forward to write a keyword that takes a table of data and compares it to another table of data.
For example, you could write a keyword that takes the result of the query and then rows of information (though, the rows must all have exactly the same number of columns):
| | ${ResultOfQuery}= | <do the database query>
| | Database should contain | ${ResultOfQuery}
| | ... | #Type | Code | Amount
| | ... | A | HIGH | 27.72
| | ... | B | LOW | 9.28
| | ... | C | LOW | 4.43
Then it's just a matter of iterating over all of the arguments three at a time, and checking if the data has that value. It would look something like this:
**** Keywords ***
| Database should contain
| | [Arguments] | ${actual} | #{expected}
| | :FOR | ${type} | ${code} | ${amount} | IN | #{expected}
| | | <verify that the values are in ${actual}>
Even easier might be to write a python-based keyword, which makes it a bit easier to iterate over datasets.

Related

Alert Before Running Query Consisting Of Large Size Data

Do we have any mechanism in Snowflake where we alert Users running a Query containing Large Size Tables , this way user would get to know that Snowflake would consume many warehouse credits if they run this query against large size dataset,
There is no alert mechanism for this, but users may run EXPLAIN command before running the actual query, to estimate the bytes/partitions read:
explain select c_name from "SAMPLE_DATA"."TPCH_SF10000"."CUSTOMER";
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
| step | id | parent | operation | objects | alias | expressions | partitionsTotal | partitionsAssigned | bytesAssigned |
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
| GlobalStats | | | | 6585 | 6585 | 109081790976 | | | |
| 1 | 0 | | Result | | | CUSTOMER.C_NAME | | | |
| 1 | 1 | 0 | TableScan | SAMPLE_DATA.TPCH_SF10000.CUSTOMER | | C_NAME | 6585 | 6585 | 109081790976 |
+-------------+----+--------+-----------+-----------------------------------+-------+-----------------+-----------------+--------------------+---------------+
https://docs.snowflake.com/en/sql-reference/sql/explain.html
You can also assign users to specific warehouses, and use resource monitors to limit credits on those warehouses.
https://docs.snowflake.com/en/user-guide/resource-monitors.html#assignment-of-resource-monitors
As the third alternative, you may set STATEMENT_TIMEOUT_IN_SECONDS to prevent long running queries.
https://docs.snowflake.com/en/sql-reference/parameters.html#statement-timeout-in-seconds

Solr poor performance on date range query

I'm currently struggling to get decent performance on a ~18M documents core with NRT indexing from a date range query, in Solr 4.10 (CDH 5.14).
I tried multiple strategies but everything seems to fail.
Each document has multiple versions (10 to 100) valid at different non-overlapping periods (startTime/endTime) of time.
The query pattern is the following: query on the referenceNumber (or other criteria) but only return documents valid at a referenceDate (day precision). 75% of queries select a referenceDate within the last 30 days.
If we query without the referenceDate, we have very good performance, but 100x slowdown with the additional referenceDate filter, even when forcing it as a postfilter.
Here are some perf tests from a python script executing http queries and computing the QTime of 100 distinct referenceNumber.
+----+-------------------------------------+----------------------+--------------------------+
| ID | Query | Results | Comment |
+----+-------------------------------------+----------------------+--------------------------+
| 1 | q=referenceNumber:{referenceNumber} | 100 calls in <10ms | Performance OK |
+----+-------------------------------------+----------------------+--------------------------+
| 2 | q=referenceNumber:{referenceNumber} | 99 calls in <10ms | 1 call to warm up |
| | &fq=startDate:[* to NOW/DAY] | 1 call in >=1000ms | the cache then all |
| | AND endDate:[NOW/DAY to *] | | queries hit the filter |
| | | | cache. Problem: as |
| | | | soon as new documents |
| | | | come in, they invalidate |
| | | | the cache. |
+----+-------------------------------------+----------------------+--------------------------+
| 3 | q=referenceNumber:{referenceNumber} | 99 calls in >=500ms | The average of |
| | &fq={!cache=false cost=200} | 1 call in >=1000ms | calls is 734.5ms. |
| | startDate:[* to NOW/DAY] | | |
| | AND endDate:[NOW/DAY to *] | | |
+----+-------------------------------------+----------------------+--------------------------+
How is it possible that the additional date range filter query creates a 100x slowdown? From this blog, I would have expected similar performance from the daterange query as without the additional filter: http://yonik.com/advanced-filter-caching-in-solr/
Or is the only option is to change the softCommit/hardCommit delays, create 30 warmup fq for the past 30 days and tolerate poor performance on 25% of our queries?
Edit 1: Thanks for the answer, unfortunately, using integers instead of tdate does not seem to provide any performance gains. It can only leverage caching, like the query ID 2 above. That means we need a strategy of warmup of 30+ fq.
+----+-------------------------------------+----------------------+--------------------------+
| ID | Query | Results | Comment |
+----+-------------------------------------+----------------------+--------------------------+
| 4 | fq={!cache=false} | 35 calls in <10ms | |
| | referenceNumber:{referenceNumber} | 65 calls in >10ms | |
+----+-------------------------------------+----------------------+--------------------------+
| 5 | fq={!cache=false} | 9 calls in >100ms | |
| | referenceNumber:{referenceNumber} | 6 calls in >500ms | |
| | AND versionNumber:[2 TO *] | 85 calls in >1000ms | |
+----+-------------------------------------+----------------------+--------------------------+
edit 2: It seems that passing my referenceNumber from fq to q and setting different costs improve the query time (no perfect, but better). What's weird though is that the cost >= 100 is supposed to be executed as a postFilter, but setting the cost from 20 to 200 does not seem to impact performance at all. Does anyone know how to see whether a fq param is executed as a post filter?
+----+-------------------------------------+----------------------+--------------------------+
| 6 | fq={!cache=false cost=0} | 89 calls in >100ms | |
| | referenceNumber:{referenceNumber} | 11 calls in >500ms | |
| | &fq={!cache=false cost=200} | | |
| | startDate:[* TO NOW] AND | | |
| | endDate:[NOW TO *] | | |
+----+-------------------------------------+----------------------+--------------------------+
| 7 | fq={!cache=false cost=0} | 36 calls in >100ms | |
| | referenceNumber:{referenceNumber} | 64 calls in >500ms | |
| | &fq={!cache=false cost=20} | | |
| | startDate:[* TO NOW] AND | | |
| | endDate:[NOW TO *] | | |
+----+-------------------------------------+----------------------+--------------------------+
Hi I have an another solution for you, it will give a good performance after performing same query to solr.
My Suggestion is store date in int format, please find below example.
Your Start Date : 2017-03-01
Your END Date : 2029-03-01
**Suggested format in int format.
Start Date : 20170301
END Date : 20290301**
When you are trying fire same query with int number instead of dates it works faster as expected.
So your query will be.
q=referenceNumber:{referenceNumber}
&fq=startNewDate:[* to YYMMDD]
AND endNewDate:[YYMMDD to *]
Hope it will help you ..

Issues with ER model design

I am trying to design a model for our future database of our toys and certain measurements that have to be done post-production. I have trouble grasping how to model this. I have tried multiple ways, but none of them seem optimal and in the end I've always lost out on the connectivity between entities.
What I need to achieve is some kind of meaningful relationship between the following:
A toy (with some trivial properties).
A series of toys (multiple toys can be related to one series and a toy can only belong to one series).
Measurement steps. There are currently 6 of these steps. Each step has its own input parameters and these vary in type as well as in number (eg. only 3 parameters for measurement step 1 and 10 parameters for measurement step 2).
With each series, a sequence of these measurement steps is defined. Duplicates of tests are allowed (eg. measurement step 1 > measurement step 4 > measurement step 1 is a valid sequence). The sequence along with the parameters must be stored somewhere for future reference.
Each toy goes through the sequence of measurements that is defined by its series. All of the results must be stored somewhere (for each individual toy).
If I split the measurement steps into their own tables I can't reference them conditionally (as foreign keys) to some other table.
If I try to serialize part of the data I lose the ability to make connections between individual measurement steps, measurement results (at least with queries) etc.
I know people here generally hate/don't answer these kinds of "discussion-like" questions, but I'd ask of you to at least point out what is a good practice in a system where I need to store this locally on a machine, but need a database to hold the data - to move towards serial-like data and just do general relationships where it is easy to do so or keep trying to normalize it as much as possible?
If measurements steps share most of attributes (or are of the same type, like what you called PARAMETERS), and I understood correctly your definitions, I would make something like this.
It could be a starting point.
+----------------------------+ +------------------------------+
| TOYS | | TOY_SERIES |
+-----+----------------------+ +---------+--------------------+
| PK | ID_TOY | +----------+ PK, FK1 | ID_S +--------+
| | | | +------------------------------+ |
| FK1 | ID_S +---------+ | | ... | |
+----------------------------+ | | | |
| | ... | | | | |
| | | | | | |
+-----+----------------------+ +---------+--------------------+ |
|
|
|
|
+------------------------------+ |
| BR_SER_MEAS | |
+---------+--------------------+ |
| PK, FK1 | ID_S +--------+
| | |
| PK, FK2 | ID_M +--------+
| | | |
| PK | ID_SEQ | |
| | | |
+---------+--------------------+ |
|
|
+------------------------------+ |
| MEASURE_STEPS | |
+------------------------------+ |
| PK ID_M +--------+
+------------------------------+
| PARAM_01 |
| ... |
| PARAM_10 |
| |
| |
+------------------------------+

How to Write Conditional Statement in SQL Server

I am having a logic issue in relation to querying an SQL database. I need to exclude 3 different categories and any item that is included in those categories; however, if an item under one of those categories meets the criteria for another category I need to keep said item.
This is an example output I will get after querying the database at its current version:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 05RH-27N | FLDR | Finished | KH72n
4 | 84OH-95W | FLEP | Final | tar5x
5 | 81RS-67F | FLEP | Final | tar7x
6 | 48YU-40Q | FLCM | Final | bile6
7 | 19VB-89S | FLDR | Warranty | exp380
8 | 76CS-01U | FLCL | Gator | low5
9 | 28OC-08Z | FLCM | Redo | coil34Y
item_num and description are in a table together, and pro_type and area are in 2 separate tables--a total of 3 tables to pull data from.
I need to construct a query that will not pull back any item_num where area is equal to: Finished, Final, and Redo; but I also need to pull in any item_num that meets the type criteria: FLCM and FLEP. In the end my query should look like this:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 84OH-95W | FLEP | Final | tar5x
4 | 81RS-67F | FLEP | Final | tar7x
5 | 19VB-89S | FLDR | Warranty | exp380
6 | 76CS-01U | FLCL | Gator | low5
7 | 28OC-08Z | FLCM | Redo | coil34Y
Try this:
select * from table
join...
where area not in('finished', 'final', 'redo') or type in('flcm', 'flep')
Are you looking for something like
SELECT *
FROM Table_1
JOIN Table_ProType ON Table_1.whatnot = Table_ProType.whatnot
JOIN Table_Area ON Table_1.whatnot = Table_Area.whatnot
WHERE Table.area NOT IN ('Finished','Final','Redo') OR ProType.pro_type IN ('FLCM','FLEP')
Giving the names of the three tables and the joining criteria will help me improve the answer.

Difficult kind of Hierachical Data in Relational Database

I have "components" which can be assembled in different ways into a "system". I want my database to hold all these "components", their type specific data and define how they are connected to each other to form a "system".
The systems are typically gearboxes and they can have rather complex branched designs. Let's start with an easy example:
This system is built up out of Masses (horizontal lines) and Stiffnesses (vertical lines). Gears and clutches are types of masses and come in pairs. Colors represent different branch speeds due to gear ratios. Here's a (bad) example of how I could store everything from this particular illustration:
ID | Type | Clutch | Ends | DrivenBy | NoOfTeeth| Mass | Stiffness
--- | ---- | ------ | ---- | --------- | -------- | ---- | ---------
1 | Mass | | Input1 | | | 5 |
2 | Stiffness | | | | | | 15
3 | Mass | 1.1 | | | | 2 |
4 | Mass | 1.2 | | | | 3 |
5 | Stiffness | | | | | | 20
6 | Gear | | | | 10 | 4 |
7 | Stiffness | | | | | | 30
8 | Gear | | | | 4 | 5 |
9 | Gear | | | 8 | 7 | 2 |
10 | Stiffness | | | | | | 40
11 | Mass | | | | | 4 |
12 | Stiffness | | Output1 | | | | 10
13 | Gear | | | 6 | 5 | 4 |
14 | Stiffness | | | | | | 20
15 | Mass | 2.1 | | | | 4 |
16 | Mass | 2.2 | | | | 3
17 | Stiffness | | | | | | 30
18 | Mass | | Output2 | | | 2 |
Obviously, this is not a very good way to store the data. This design pattern resembles somewhat of a "Repeated attributes" since each component type has a different attribute to be filled. I could create a table for each type of component, but things become more complex when looking at other examples, such as this 2-stage gearbox:
There are also examples with more than 1 input and several outputs, but I can't post more links due to low reputation.
Eitherway, you will see that the usual hierarchical data storage doesn't apply here because the data is not purely "tree-shaped" where everything branches off from 1 main branch.
I think that even though I could store data in the above mentioned way, I will get huge difficulties when it comes to the programming stage.
To add to the complexity, these gearboxes are actually sub-systems to a much bigger system.
So, any suggestions on a good way to store this type of data?*
Perhaps this is a possible way of doing it?
Here you will see that there is a "main" table called GearboxBranch, keeping track of all elements in the gearbox, giving them an id and to identify in which branch the element exists.
Then for the elements themselves, masses are defined in their dedicated table, so are stiffnesses. Gears and Clutches (which are types of masses) are then defined in their perspective tables. A recursive relationship is existing in the gear table, since one gear has to be driven by at least one other gear.
Furthermore, the table with Shaft Ends defines which of the elements in the gearbox are input or output and what number they have.
I can't seem to see any problems with this method, but I'm a little unsure how to get data out of the database. There will be considerable coding involved I'm afraid.

Resources