Combining multiple external datasets with cucumber/selenium test environments - selenium-webdriver

When developing automated tests with Cucumber and Selenium WebDriver in Java, I use Excel spreadsheets as datasets for the Gherkin scenarios instead of the traditional Examples tables, keeping only the row numbers in a simple table in my feature files. This works very well for tests that only use data from one spreadsheet at a time, but when implementing tests that use multiple spreadsheets, how does one ensure it iterates over every combination?
For example, when testing multiple configurations and their impact on the main interface, I provide the configuration data, say 3 different configurations, via the first spreadsheet, and in my Gherkin feature I only enter the row numbers and let the code handle the actual reading of the data.
When the user uses configuration from row <ExcelRow>
...
Examples:
  | ExcelRow |
  | 1        |
  | 2        |
  | 3        |
The problem arises when I want to test such configurations against different combinations of inputs in the main interface, also provided via a separate Excel spreadsheet. I want the configuration from row 1 to be run with all rows from the second spreadsheet before moving on to row 2's configuration, and so on.
Manually using the Examples table to spell out the combinations does the job when working with smaller datasets:
Examples:
  | ConfigRow | InputRow |
  | 1         | 1        |
  | 1         | 2        |
  | 2         | 1        |
  | 2         | 2        |
  | 3         | 1        |
  | 3         | 2        |
With very large datasets, however, the Examples table starts to clutter the feature file even though it only contains row numbers.
I tried implementing the actual input testing as a single step that loops over the entire Excel spreadsheet for each configuration, but that forced me to do my assertions inside the same loop rather than in the Then step.

If you want to mention only the config row in the feature file and have the other rows executed for each config row, you may want to use cucumber-guice and make your shared state @ScenarioScoped. cucumber-guice can initialize the same classes independently for each scenario. You would need these dependencies in your pom:
<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-guice</artifactId>
    <version>${cucumber.version}</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.3</version>
    <scope>test</scope>
</dependency>
Then, in a global class, you can do:
import io.cucumber.guice.ScenarioScoped;

@ScenarioScoped
public class Global {
    public Helper help;

    // constructor
    public Global() {
        // the helper class contains the code that does all the second-spreadsheet work
        help = new Helper();
    }
}
In your step definitions you can do:
// import Global and the Guice dependencies
import yourPackage.Global;
import com.google.inject.Inject;
...

public class StepDef {

    @Inject
    Global global;

    @When("the user uses configuration from row {int}")
    public void useConfigs(int row) {
        global.help.doSomeExcelWork(row);
    }

    @Then("I assert from excel sheet")
    public void doAssertions() {
        // do assertions here
        global.help.doAssertion();
    }
}
Your helper class could be something like this:
public class Helper {
    public void doSomeExcelWork(int configRow) {
        // do the Excel work for the given row
    }

    public void doAssertion() {
        // return values for your assertions
    }
}
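If it helps, here is a minimal sketch of what doSomeExcelWork could look like using Apache POI; the file path, sheet index, and 0-based row indexing are assumptions you would adapt to your setup:
import java.io.File;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public void doSomeExcelWork(int configRow) {
    // assumed location of the second spreadsheet
    try (Workbook wb = WorkbookFactory.create(new File("src/test/resources/inputs.xlsx"))) {
        Sheet sheet = wb.getSheetAt(0);
        Row row = sheet.getRow(configRow); // POI rows are 0-based; adjust if your feature uses 1-based rows
        DataFormatter formatter = new DataFormatter();
        for (int col = 0; col < row.getLastCellNum(); col++) {
            String value = formatter.formatCellValue(row.getCell(col));
            // feed `value` into your page objects / WebDriver interactions here
        }
    } catch (Exception e) {
        throw new RuntimeException("Could not read row " + configRow, e);
    }
}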
Your feature file would look like:
When the user uses configuration from row <ExcelRow>
Then I assert from excel sheet

Examples:
  | ExcelRow |
  | 1        |
  | 2        |
Now, for each example row (scenario), Global is injected independently, and the Then step is also called for every example row.

I am not sure whether that is possible via Cucumber. You may try searching for dynamic Examples generation for Cucumber in Java.
I would just like to question whether Cucumber/Gherkin are the right tools for what you want to achieve. The primary goal of Gherkin / Cucumber / SpecFlow scenarios is to demonstrate the behavior of the system to anyone reading the feature file. So hiding the data in "linked" files can be accepted if they hold a complex piece of data which acts as a single unit, provided the file name demonstrates what is special about the data inside.
What you might be looking for are the Parameterized Tests and Data Providers available in JUnit 5 and TestNG. If you write your automation framework in such a way that Cucumber, or any other test framework, becomes only a thin wrapper around it which "assembles" the test, you can generate tests on the fly.
For example, your step "When the user uses configuration from row" becomes:
public void whenUserUsesConfiguration(SutConfiguration configuration) {
    // your configuration setup goes here
    // but you do not read the configuration from a file in this method
}
The method above can be used in both Cucumber steps and JUnit/TestNG tests without losing any readability or understandability.
By splitting your tests into two parts, one that demonstrates general system behavior and is accessible to all stakeholders, and one that checks lots of small nuances, while both use the same underlying framework, you will have greater flexibility and a more pleasant development experience.
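As a concrete illustration, here is a minimal JUnit 5 sketch of generating the config x input combinations on the fly; the row counts and the readConfigSheet/readInputSheet helpers are assumptions standing in for your Excel-reading code:
import java.util.stream.IntStream;
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

class ConfigurationCombinationTest {

    // Cartesian product of config rows and input rows, so the feature file
    // never has to list the combinations explicitly
    static Stream<Arguments> rowCombinations() {
        return IntStream.rangeClosed(1, 3).boxed()
                .flatMap(config -> IntStream.rangeClosed(1, 2)
                        .mapToObj(input -> Arguments.of(config, input)));
    }

    @ParameterizedTest(name = "config row {0} with input row {1}")
    @MethodSource("rowCombinations")
    void runCombination(int configRow, int inputRow) {
        // SutConfiguration configuration = readConfigSheet(configRow);  // hypothetical helper
        // whenUserUsesConfiguration(configuration);
        // drive the main interface with readInputSheet(inputRow) and assert
    }
}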

Related

How is inference by enumeration done on bayesian networks?

For instance, given the following Bayesian network and probabilities, how would I find P(BgTV | not(GfC))? I attempted to do so by simply using the equivalence P(A|B) = P(A and B)/P(B), but that resulted in a value of 200%, which is not possible. Do I need to treat George_feeds_cat as a dependent event as per the network and use what I know about baseball_game_on_TV and George_watches_TV to calculate the odds? Any guidance would be much appreciated!
Indeed, you need all the parameters to answer your question (oCF seems to be independent of BgTV and GwTV, but once GfC is known this is no longer the case).
Writing the query with the variable names, you want:
P(BgTV|GfC) = P(BgTV,GfC) / P(GfC) = sum_{GwTV,oCF} P(BgTV,GwTV,oCF,GfC) / sum_{BgTV,GwTV,oCF} P(BgTV,GwTV,oCF,GfC)
with the joint distribution factorized using the BN:
P(BgTV,GwTV,oCF,GfC) = P(BgTV) * P(GwTV|BgTV) * P(oCF) * P(GfC|GwTV,oCF)
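To make the enumeration concrete, here it is written out by hand in plain Python, reusing the numbers from the pyAgrum snippet below (0 = false, 1 = true):
# CPTs copied from the pyAgrum snippet below
P_BgTV = {0: 1 - 0.3041096, 1: 0.3041096}
P_oCF  = {0: 1 - 0.169863,  1: 0.169863}
P_GwTV = {(0, 0): 1 - 0.1181102, (0, 1): 0.1181102,        # keyed by (BgTV, GwTV)
          (1, 0): 1 - 0.9279279, (1, 1): 0.9279279}
P_GfC  = {(0, 0, 0): 1 - 0.9587629, (0, 0, 1): 0.9587629,  # keyed by (GwTV, oCF, GfC)
          (0, 1, 0): 1 - 0.3157895, (0, 1, 1): 0.3157895,
          (1, 0, 0): 1 - 0.706422,  (1, 0, 1): 0.706422,
          (1, 1, 0): 1 - 0.0416667, (1, 1, 1): 0.0416667}

def joint(b, w, c, f):
    # P(BgTV,GwTV,oCF,GfC) factorized along the network
    return P_BgTV[b] * P_GwTV[(b, w)] * P_oCF[c] * P_GfC[(w, c, f)]

num = sum(joint(1, w, c, 0) for w in (0, 1) for c in (0, 1))                  # P(BgTV=1, GfC=0)
den = sum(joint(b, w, c, 0) for b in (0, 1) for w in (0, 1) for c in (0, 1))  # P(GfC=0)
print(num / den)  # ~0.4841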
In Python, using the package pyAgrum, you would write:
# model
import pyAgrum as gum
bn=gum.fastBN("BgTV->GwTV->GfC<-oCF")
# where do those numbers come from ? :-)
bn.cpt("BgTV").fillWith([1-0.3041096,0.3041096])
bn.cpt("oCF").fillWith([1-0.169863,0.169863])
bn.cpt("GwTV")[{"BgTV":0}]=[1-0.1181102,0.1181102]
bn.cpt("GwTV")[{"BgTV":1}]=[1-0.9279279,0.9279279]
bn.cpt("GfC")[{"GwTV":0,"oCF":0}]=[1-0.9587629,0.9587629]
bn.cpt("GfC")[{"GwTV":0,"oCF":1}]=[1-0.3157895,0.3157895]
bn.cpt("GfC")[{"GwTV":1,"oCF":0}]=[1-0.706422,0.706422]
bn.cpt("GfC")[{"GwTV":1,"oCF":1}]=[1-0.0416667,0.0416667]
# compute
joint = bn.cpt("BgTV") * bn.cpt("GwTV") * bn.cpt("GfC") * bn.cpt("oCF")
joint.margSumIn(["GfC", "BgTV"]) / joint.margSumIn(["GfC"])
which should give you
      ||  BgTV             |
 GfC  ||  0       |  1     |
------||----------|--------|
  0   ||  0.5159  | 0.4841 |
  1   ||  0.7539  | 0.2461 |
where you can see that P(BgTV=1|GfC=0) = 48.41%.
Using a notebook, the model: [figure: the Bayesian network rendered in the notebook]
And the inference (using another method, with a junction tree): [figure: the posterior computed by junction-tree inference]
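That junction-tree inference can be reproduced in a couple of lines with pyAgrum's LazyPropagation, reusing the bn built above:
ie = gum.LazyPropagation(bn)   # exact inference on the junction tree
ie.setEvidence({"GfC": 0})
ie.makeInference()
print(ie.posterior("BgTV"))    # P(BgTV | GfC=0) -> [0.5159, 0.4841]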

How do I get a dataframe or database write from TFX BulkInferrer?

I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dict into a Pandas DataFrame, or store it as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
This must be a common use case, because TensorFlow Model Analysis functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery), or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood" but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure if they're compatible?
Help?
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec beforehand. Do the following:
Set the BulkInferrer to write to output_examples rather than inference_result by adding an output_example_spec to the component construction.
Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples.
Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is necessary.
bulk_inferrer = BulkInferrer(
    ....
    output_example_spec=bulk_inferrer_pb2.OutputExampleSpec(
        output_columns_spec=[bulk_inferrer_pb2.OutputColumnsSpec(
            predict_output=bulk_inferrer_pb2.PredictOutput(
                output_columns=[bulk_inferrer_pb2.PredictOutputCol(
                    output_key='original_label_name',
                    output_column='output_label_column_name', )]))]
    ))

statistics = StatisticsGen(
    examples=bulk_inferrer.outputs.output_examples
)

schema = SchemaGen(
    statistics=statistics.outputs.output,
)
After that, one can do the following:
import tensorflow as tf
from tfx.utils import io_utils
from tensorflow_transform.tf_metadata import schema_utils

# read the schema produced by SchemaGen
schema_path = '/path/to/schemagen/schema.pbtxt'
schema_proto = io_utils.SchemaReader().read(schema_path)
spec = schema_utils.schema_as_feature_spec(schema_proto).feature_spec

# read the inferred results
data_files = ['/path/to/bulkinferrer/output_examples/examples/examples-00000-of-00001.gz']
dataset = tf.data.TFRecordDataset(data_files, compression_type='GZIP')

# parse the dataset with the spec
def parse(raw_record):
    return tf.io.parse_example(raw_record, spec)

dataset = dataset.map(parse)
At this point, the dataset is like any other parsed dataset, so it's trivial to write it out as a CSV, to a BigQuery table, or whatever else from there. It certainly helped us in ZenML with our BatchInferencePipeline.
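For illustration, here is a minimal sketch of that "write a CSV" step, assuming the feature_spec yields dense scalar features (sparse/VarLen features would need densifying first); the output file name is a placeholder:
import pandas as pd

frames = []
for batch in dataset.batch(1024).as_numpy_iterator():
    # each batch is a dict: feature name -> numpy array
    frames.append(pd.DataFrame({name: arr.ravel() for name, arr in batch.items()}))

df = pd.concat(frames, ignore_index=True)
df.to_csv('inference_results.csv', index=False)
# or push it to BigQuery with pandas-gbq: df.to_gbq('namespace.table_name', project_id='our_project')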
Answering my own question here to document what we did, even though I think @Hamza Tahir's answer below is objectively better. This may provide an option for other situations where it's necessary to change the operation of an out-of-the-box TFX component. It's hacky though:
We copied and edited the file tfx/components/bulk_inferrer/executor.py, replacing this transform in the _run_model_inference() method's internal pipeline:
| 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
    os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
    file_name_suffix='.gz',
    coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
with this one:
| 'WritePredictionLogsBigquery' >> beam.io.WriteToBigQuery(
    'our_project:namespace.TableName',
    schema='SCHEMA_AUTODETECT',
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    custom_gcs_temp_location='gs://our-storage-bucket/tmp',
    temp_file_format='NEWLINE_DELIMITED_JSON',
    ignore_insert_ids=True,
)
(This works because when you import the BulkInferrer component, the per-node work gets farmed out to these executors running on the worker nodes, and TFX copies its own library onto those nodes. It doesn't copy everything from user-space libraries, though, which is why we couldn't just subclass BulkInferrer and import our custom version.)
We had to make sure the table at 'our_project:namespace.TableName' had a schema compatible with the model's output, but didn't have to translate that schema into JSON / AVRO.
In theory, my group would like to make a pull request to TFX built around this, but for now we're hard-coding a couple of key parameters, and we don't have the time to get this to a real public / production state.
I'm a little late to this party but this is some code I use for this task:
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2
import pandas as pd
from typing import List, Text


def parse_prediction_logs(inference_filenames: List[Text]) -> pd.DataFrame:
    """
    Args:
        inference_filenames: tf.io.gfile.glob(Inferrer artifact uri)
    Returns:
        a dataframe of userids, predictions, and features
    """

    def parse_log(pbuf):
        # parse the protobuf
        message = prediction_log_pb2.PredictionLog()
        message.ParseFromString(pbuf)
        # my model produces scores and classes and I extract the topK classes
        predictions = [x.decode() for x in (message
                                            .predict_log
                                            .response
                                            .outputs['output_2']
                                            .string_val)[:10]]
        # here I parse the input tf.train.Example proto
        inputs = tf.train.Example()
        inputs.ParseFromString(message
                               .predict_log
                               .request
                               .inputs['input_1'].string_val[0])
        # you can pull out individual features like this
        uid = inputs.features.feature["userId"].bytes_list.value[0].decode()
        feature1 = [
            x.decode() for x in inputs.features.feature["feature1"].bytes_list.value
        ]
        feature2 = [
            x.decode() for x in inputs.features.feature["feature2"].bytes_list.value
        ]
        return (uid, predictions, feature1, feature2)

    return pd.DataFrame(
        [parse_log(x) for x in
         tf.data.TFRecordDataset(inference_filenames, compression_type="GZIP").as_numpy_iterator()],
        columns=["userId", "predictions", "feature1", "feature2"])

Loading of mobilenet v2 works, but pretrained mobilenet v2 fails

I retrained a MobileNet v2 model using my own images, and I can label new images with the output in Python (https://www.tensorflow.org/hub/tutorials/image_retraining). Loading the file works, but prediction fails with (console.log of Firefox and Chromium):
The dict provided in model.execute(dict) has keys: [images] not part of model graph.
I retrained the model using the provided retrain.py:
python retrain.py --image_dir flower_photos/ --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/2 --random_brightness 10 --how_many_training_steps 100
Inside flower_photos there are folders named after the flowers, each containing the corresponding images:
flower_photos
--- Huflattich
------- 1.jpg
------- 2.jpg
....
--- Buschwindröschen
------- 1.jpg
------- 2.jpg
I can convert this model using
tensorflowjs_converter --input_format=tf_frozen_model --output_node_names='module_apply_default/MobilenetV2/Logits/output' /tmp/output_graph.pb Mobilenetv2/web_model
but this isn't working inside the provided example from https://github.com/tensorflow/tfjs-examples/tree/master/mobilenet
If I convert the original MobileNet v2 using
tensorflowjs_converter --input_format=tf_hub 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/2' mobilenetv2/web_model
I can load it inside the provided example.
In the end, the program should detect and classify different early-blooming flowers shown to the webcam. It should be a PWA for students, motivating them to experience nature.
TensorFlow.js currently has two types of models:
Layers models, which allow training; you can load them with tf.loadModel(...)
Models converted from TensorFlow-generated models, which do not allow training. This is what you have, so you should use tf.loadFrozenModel(...)
Here is an example of loading the frozen model and performing a prediction on an image: https://github.com/tensorflow/tfjs-converter/tree/master/demo/mobilenet

How to use java @Repeatable with cucumber

I have different Gherkin strings for the same business logic in Cucumber.
So I am trying to find a way to bind multiple Gherkin strings to one function.
I am trying the approach below, but I am not able to work out how to formulate it with Cucumber:
Using @Repeatable while maintaining support for Java 7
Example:
Scenario Outline: Looking up the definition of fruits
  Given the user is on the Wikionary home page for fruits
  When the user looks up the definition of the word <name>
  Then they should see the definition 'An edible fruit produced by the pear tree, similar to an apple but elongated towards the stem.'

  Examples:
    | name |
    | pear |

Scenario Outline: Looking up the definition of orange
  Given the user is on the Wikionary home page for orange
  When the user looks up the definition of the word <name>
  Then they should see the definition 'An edible fruit produced by the pear tree, similar to an apple but elongated towards the stem.'

  Examples:
    | name |
    | pear |
In the scenarios above, the Given step text is different but the business function is the same.
How can I bind both to one step definition with @Repeatable in Java?
Or is there any other way, apart from concatenating the strings with |?
Any workaround would be helpful!
Have a step definition like this. It will match any similar step, since the group is non-capturing:
@Given("^the user is on the Wikionary home page for (?:\\w+)$")
public void given() {
    System.out.println("given");
}
@Given("^should go to given (?:,*) $")
@Given("^should go to given - (.*?) - (?:,*) $")
@Given("^should go to given - (.*?) - (.*?) - (?:,*) $")
These will take in different parameters, but they completely ruin the Gherkin step text and make it near gibberish; it would be very uncomfortable to use.
You can write the step definition Java code only once for both of the above scenarios; the same step definition will automatically run for the two different scenarios:
Scenario Outline: Looking up the definition of fruits
  Given the user is on the Wikionary home page for "fruits"

Scenario Outline: Looking up the definition of orange
  Given the user is on the Wikionary home page for "orange"
For the above Given steps you can write only one step definition method; it will automatically execute for both scenarios with the different parameters:
@Given("^the user is on the Wikionary home page for \"(.*)\"$")
public void given(String fruitName) {
    System.out.println(fruitName);
}
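For the question as literally asked: in the newer io.cucumber.java bindings the step annotations are themselves marked @Repeatable, so on Java 8+ you can stack several texts on one method (this won't work on the old info.cukes bindings or on Java 7). A minimal sketch:
import io.cucumber.java.en.Given;

public class WikionarySteps {

    // one method bound to two different Gherkin step texts
    @Given("the user is on the Wikionary home page for fruits")
    @Given("the user is on the Wikionary home page for orange")
    public void userIsOnTheHomePage() {
        // shared business logic
    }
}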

How to store an array returned by javascript function split in Selenium IDE

Not a developer, new to Selenium IDE, and yes, limited to sticking with IDE only. Appreciate any and all help.
Trying to grab a password from an email generated upon Password Reset so the script can then log in with the new password.
I thought I'd do a split on a delimiter in the email content, then trim as necessary to grab the password. I'm running into problems with how to store the returned array: to do what I'm thinking, I need to store it back into an array that Selenium can traverse.
storeText | css=body | emailText
getEval | storeResults = javascript{storedVars['emailText'].split("delimiter")}
The getEval throws a "missing ; before statement" exception. Using the store method instead of getEval works (moving storeResults to a target), but then the results are typecast as a string. I feel I'm missing something very basic here.
I think the keyword javascript must be omitted.
For me the following code works:
storeText | //*[@id="_currentProduct"] | myText
getEval | alert(storedVars['myText'])
When run, the alert box shows the value of myText.
Thanks for the response. I came to the same conclusion. Here's the working code:
getEval | storeResults = storedVars['emailText'].split("delimiter")
Was able to access the stored value this way:
LOG.info(storeResults[1])
