Flink - Test Sink that can be used with .sinkTo

I'm using a KafkaSink to write results to Kafka via .sinkTo(kafkaSink). I'm trying to build an end-to-end integration test and want a simple sink for that purpose. I came across CollectSink, which lets me collect results into a list and run matchers against them. But since CollectSink is a SinkFunction, I can't pass it to .sinkTo; it only works with addSink.
I have tried PrintSink, but I need to read the written data back to run matchers against it.
Can anyone help me with how to add a test sink that can be used with .sinkTo?
Thank you in advance

You could look at how the integration tests for the Immerok Cookbook are organized. https://www.docs.immerok.cloud/docs/cookbook/creating-dead-letter-queues-from-and-to-apache-kafka-with-apache-flink/ might be a good place to start, since it illustrates how to approach testing a job with multiple sinks.
Disclaimer: I work for Immerok
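If you want something you can pass to .sinkTo directly, one option is to implement Flink's unified Sink interface yourself and collect elements into a static list, the same trick CollectSink uses for the legacy API. A minimal sketch, assuming Flink 1.15+; the class name CollectSink2 is invented here, and the static list only works when the whole test pipeline runs in a single JVM (e.g. on a MiniCluster):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.api.connector.sink2.SinkWriter;

// Collects every element into a static list so the test can assert on it afterwards.
public class CollectSink2<T> implements Sink<T> {

    // Static, so it survives serialization of the sink; only valid in single-JVM tests.
    public static final List<Object> VALUES =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public SinkWriter<T> createWriter(InitContext context) throws IOException {
        return new SinkWriter<T>() {
            @Override
            public void write(T element, Context context) {
                VALUES.add(element);
            }

            @Override
            public void flush(boolean endOfInput) {
                // nothing is buffered, so nothing to flush
            }

            @Override
            public void close() {
                // no resources to release
            }
        };
    }
}

In the test you would then call stream.sinkTo(new CollectSink2<>()), run env.execute(), and assert against CollectSink2.VALUES (clearing the list between tests).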

Related

How to make an automatic savepoint in Flink Stateful Functions application?

I am trying to dive into the new Stateful Functions approach and I already tried to create a savepoint manually (https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/deployment-and-operations/state-bootstrap.html#creating-a-savepoint).
It works like a charm, but I can't find a way to do it automatically. For example, I have a couple of million keys and I need to write them all to a savepoint.
Is your question about how to replace the env.fromElements in the example with something that reads from a file or another data source? Flink's DataSet API, which is what's used here, can read from any HadoopInputFormat. See DataSet Connectors for details.
There are easy-to-use shortcuts for common cases. If you just want to read data from a file using a TextInputFormat, that would look like this:
env.readTextFile(path)
and to read from a CSV file using a CsvInputFormat:
env.readCsvFile(path)
See Data Sources for more about working with these shortcuts.
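Put together, a minimal sketch of both shortcuts (the paths here are placeholders):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class ReadShortcuts {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Each line of the text file becomes one String element.
        DataSet<String> lines = env.readTextFile("file:///path/to/keys.txt");

        // Each CSV row becomes a Tuple2<String, Long> record.
        DataSet<Tuple2<String, Long>> records = env
                .readCsvFile("file:///path/to/keys.csv")
                .types(String.class, Long.class);

        lines.print();
        records.print();
    }
}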
If I've misinterpreted the question, please clarify your concerns.

Data Driven Testing for a database web application

I have a database web application and I need to see all the possible inputs and all the possible outputs of this application (using Selenium or JMeter).
I tried to understand how the "Input Coverage Method" works in software testing tools, but it seems too tough. If I'm not wrong, the kind of testing I'm trying to do is a form of data-driven testing (that is, figuring out all the possible inputs and outputs of a database web application).
Would you please suggest a tool (I prefer open source) that can do this, or a method for creating such a test?
Do I have to create it on my own?
First of all, you need to create equivalence classes that cover most of your input dataset.
After that you can simply run your selenium/JMeter tests with the test data created.
You just need to create a single test script and populate the test data in Excel or CSV sheets to perform data-driven testing.
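As an illustration, here is what a CSV-driven test might look like with JUnit 5 (the file name, columns, and the authenticate stand-in are hypothetical; in a real suite the body would drive Selenium or JMeter instead):

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvFileSource;

class LoginDataDrivenTest {

    // Each CSV row becomes one test invocation: username, password, expectedResult.
    @ParameterizedTest
    @CsvFileSource(resources = "/login-test-data.csv", numLinesToSkip = 1)
    void loginBehavesAsExpected(String username, String password, String expected) {
        // Replace with a Selenium page-object call in a real suite.
        assertEquals(expected, authenticate(username, password));
    }

    // Hypothetical stand-in for the system under test.
    private String authenticate(String username, String password) {
        return "admin".equals(username) && "secret".equals(password)
                ? "success" : "failure";
    }
}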
Have a look at jBehave.
It's a BDD tool that can drive Selenium and supports sets of input test data.
I've used it and it works well. You'll need patience to get through the glue code, but once you're out the other side you'll be glad you persevered.

database clean up in acceptance tests with specflow

I am a newbie to TDD. I have watched Brandon Satrom's videos and am trying to implement tests like them: an outer loop for acceptance tests and an inner loop for unit tests. I thought acceptance tests ran against the database too, so I expected to find examples of [BeginScenario/AfterScenario] events used for database cleanup in SpecFlow. They are said to be used for database cleanup, but none of the examples I saw do it.
Am I misunderstanding the acceptance test concept? Doesn't it cover the database too? Should we use mock objects there like we do in unit tests?
I'm using a real MS SQL Server database in my integration unit tests (MSTest) and in acceptance tests with the BDD tool SpecFlow, in this way: I have a dump of my test database (MDF/LDF files) stored as a template. On test initialize I copy the files to a temporary location and attach them to a dedicated SQL Server instance using the sp_attach_db stored procedure (an Express edition will do), then I run whatever test code I want, and on test cleanup I detach the test database and delete the MDF/LDF files. The whole copy/attach/detach/delete cycle is pretty fast (at least much faster than I expected).
If you're interested, I could put it into some more words on my blog.
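The original setup is .NET/MSTest, but the cycle itself is easy to sketch; purely for illustration, here is a rough JDBC version (all paths, the database name, and credentials are made up, and CREATE DATABASE ... FOR ATTACH is the modern equivalent of sp_attach_db):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TestDatabaseLifecycle {

    static final Path TEMPLATE_MDF = Paths.get("C:/templates/TestDb.mdf");
    static final Path TEMPLATE_LDF = Paths.get("C:/templates/TestDb_log.ldf");
    static final Path WORK_MDF = Paths.get("C:/temp/TestDb.mdf");
    static final Path WORK_LDF = Paths.get("C:/temp/TestDb_log.ldf");

    // Test initialize: copy the template files and attach them as a fresh database.
    static void attach() throws Exception {
        Files.copy(TEMPLATE_MDF, WORK_MDF, StandardCopyOption.REPLACE_EXISTING);
        Files.copy(TEMPLATE_LDF, WORK_LDF, StandardCopyOption.REPLACE_EXISTING);
        try (Connection c = connectToMaster(); Statement s = c.createStatement()) {
            s.execute("CREATE DATABASE TestDb ON (FILENAME = 'C:\\temp\\TestDb.mdf'), "
                    + "(FILENAME = 'C:\\temp\\TestDb_log.ldf') FOR ATTACH");
        }
    }

    // Test cleanup: detach the database and delete the working copies.
    static void detachAndDelete() throws Exception {
        try (Connection c = connectToMaster(); Statement s = c.createStatement()) {
            s.execute("EXEC sp_detach_db 'TestDb'");
        }
        Files.deleteIfExists(WORK_MDF);
        Files.deleteIfExists(WORK_LDF);
    }

    static Connection connectToMaster() throws Exception {
        return DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=master", "sa", "secret");
    }
}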
At last I am convinced that I must use the real database in my acceptance tests. I had to see some examples and read about it in several resources before it settled in my mind.
Now I am using acceptance tests as intended: for testing the flow of my user interfaces and database.
I wrote a happy-path scenario for my registration page to design the page flow, then I wrote some tests for the logic kept in my stored procedures in the database. The other logic lives in controllers and model classes, so for those I used unit tests. Now it makes more sense to me, until my next confusion about TDD :).
As for the cleanup process, I use the [BeginScenario/AfterScenario] events. At BeginScenario I store a DateTime.Now.Ticks value in a global variable and prepend it to the values that I send to the database. Then at the AfterScenario event I find and delete the records that start with that DateTime.Now.Ticks value when cleaning up for that scenario. This gives me unique values that don't interfere with other records. It has worked so far.
Regarding this matter, this article is very helpful.
It describes using MSDTC transactions that start at BeginScenario and are rolled back at AfterScenario.
(SpecFlow is not used in the article, but it's the same concept.)
We are currently using this technique with success in a mid scale development project.
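SpecFlow and MSDTC aside, the begin/roll-back idea itself is simple; here is a rough JUnit equivalent for illustration (the connection string and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class RollbackPerScenarioTest {

    private Connection connection;

    @Before
    public void beginTransaction() throws Exception {
        connection = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=TestDb", "sa", "secret");
        // Everything the test does happens inside one transaction.
        connection.setAutoCommit(false);
    }

    @After
    public void rollBack() throws Exception {
        // Undo all changes made by the scenario, leaving the database untouched.
        connection.rollback();
        connection.close();
    }

    @Test
    public void scenarioLeavesNoTrace() throws Exception {
        // Insert/update freely through 'connection' here;
        // the rollback in @After removes it all.
    }
}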

How to Test Web Code?

Does anyone have some good hints for writing test code for database-backend development where there is a heavy dependency on state?
Specifically, I want to write tests for code that retrieves records from the database, but the answers will depend on the data in the database (which may change over time).
Do people usually make a separate development system with a 'frozen' database so that any given function should always return the exact same result set?
I am quite sure this is not a new issue, so I would be very interested to learn from other people's experience.
Are there good articles out there that discuss this issue of web-based development in general?
I usually write PHP code, but I would expect all of these issues are largely language and framework agnostic.
You should look into DBUnit, or try to find a PHP equivalent (there must be one out there). You can use it to prepare the database with a specific set of data representing your test data, so that each test no longer depends on the database being in some pre-existing state. This way, each test is self-contained and will not break as the database continues to be used.
Update: A quick google search showed a DB unit extension for PHPUnit.
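On the Java side, a minimal DBUnit setup might look like this (the driver, URL, and fixture file name are placeholders):

import org.dbunit.IDatabaseTester;
import org.dbunit.JdbcDatabaseTester;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;
import org.junit.After;
import org.junit.Before;

public class CustomerDaoDbTest {

    private IDatabaseTester tester;

    @Before
    public void seedKnownState() throws Exception {
        tester = new JdbcDatabaseTester(
                "org.hsqldb.jdbcDriver", "jdbc:hsqldb:mem:testdb", "sa", "");
        IDataSet fixture = new FlatXmlDataSetBuilder()
                .build(getClass().getResourceAsStream("/customers-fixture.xml"));
        tester.setDataSet(fixture);
        // CLEAN_INSERT wipes the tables in the dataset and reloads them,
        // so every test starts from the same known state.
        tester.setSetUpOperation(DatabaseOperation.CLEAN_INSERT);
        tester.onSetup();
    }

    @After
    public void cleanUp() throws Exception {
        tester.onTearDown();
    }

    // Tests that rely on the fixture data go here.
}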
If you're mostly concerned with data layer testing, you might want to check out this book: xUnit Test Patterns: Refactoring Test Code. I was always unsure about it myself, but this book does a great job to help enumerate the concerns like performance, reproducibility, etc.
I guess it depends on what database you're using, but Red Gate (www.red-gate.com) makes a tool called SQL Data Generator. It can be configured to fill your database with sensible-looking test data. You can also tell it to always use the same seed in its random number generator, so your 'random' data is the same every time.
You can then write your unit tests to make use of this reliable, repeatable data.
As for testing the web side of things, I'm currently looking into Selenium (selenium.openqa.org). This appears to be a cross-browser capable test suite which will help you test functionality. However, as with all of these web site test tools, there's no real way to test how well these things look in all of the browsers without casting a human eye over them!
We use an in-memory database (HSQLDB: http://hsqldb.org/). Hibernate (http://www.hibernate.org/) makes it easy for us to point our unit tests at the testing DB, with the added bonus that they run as quick as lightning.
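In case it helps, wiring that up is only a handful of properties; a minimal sketch using Hibernate's classic Configuration bootstrap (entity mappings omitted, names are placeholders):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class TestSessionFactory {

    // Builds a SessionFactory against an in-memory HSQLDB instance;
    // the schema is created fresh for the test run and vanishes afterwards.
    public static SessionFactory create() {
        return new Configuration()
                .setProperty("hibernate.connection.driver_class", "org.hsqldb.jdbcDriver")
                .setProperty("hibernate.connection.url", "jdbc:hsqldb:mem:testdb")
                .setProperty("hibernate.connection.username", "sa")
                .setProperty("hibernate.connection.password", "")
                .setProperty("hibernate.dialect", "org.hibernate.dialect.HSQLDialect")
                .setProperty("hibernate.hbm2ddl.auto", "create-drop")
                // .addAnnotatedClass(YourEntity.class)  // map entities here
                .buildSessionFactory();
    }
}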
I have the exact same problem with my work, and I find that the best approach is to have a PHP script that re-creates the database, and then a separate script that throws crazy data at it to see if anything breaks.
I have never used unit testing or the like, so I can't say whether it works, sorry.
If you can setup the database with a known quantity prior to running the tests and tear down at the end, then you'll know what data you are working with.
Then you can use something like Selenium to easily test from your UI (assuming web-based here, but there are a lot of UI testing tools out there for other UI-flavours) and detect the presence of certain records pulled back from the database.
It's definitely worth setting up either a test version of the database, or making your test scripts populate the database with known data as part of the tests.
You could try http://selenium.openqa.org/. It is more for GUI testing than for data-layer testing, but it does record your actions, which can then be played back to automate tests across different platforms.
Here's my strategy (I use JUnit, but I'm sure there's a way to do the equivalent in PHP):
I have a method that runs before all of the Unit Tests for a specific DAO class. It puts the dev database into a known state (adds all test data, etc.). As I run tests, I keep track of any data added to the known state. This data is cleaned up at the end of each test. After all the tests for the class have run, another method removes all the test data in the dev database, leaving it in the state it was in before the tests were run. It's a bit of work to do all this, but I usually write the methods in a DBTestCommon class where all of my DAO test classes can get to them.
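A rough skeleton of that DBTestCommon idea (the table, SQL, and connection URL here are invented for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Shared helpers that DAO test classes call from their setup/teardown hooks.
public abstract class DBTestCommon {

    private static final List<String> cleanupStatements = new ArrayList<>();

    protected static Connection connect() throws Exception {
        return DriverManager.getConnection("jdbc:hsqldb:mem:devdb", "sa", "");
    }

    // Run once before a DAO class's tests: put the dev database into a known state.
    protected static void installKnownState() throws Exception {
        execute("INSERT INTO customers (id, name) VALUES (1, 'fixture')");
        registerCleanup("DELETE FROM customers WHERE id = 1");
    }

    // Every test that adds data registers a matching cleanup statement.
    protected static void registerCleanup(String sql) {
        cleanupStatements.add(sql);
    }

    // Run after each test (and after the class) to restore the original state.
    protected static void removeTestData() throws Exception {
        for (String sql : cleanupStatements) {
            execute(sql);
        }
        cleanupStatements.clear();
    }

    private static void execute(String sql) throws Exception {
        try (Connection c = connect(); Statement s = c.createStatement()) {
            s.execute(sql);
        }
    }
}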
I would propose to use three databases. One production database, one development database (filled with some meaningful data for each developer) and one testing database (with empty tables and maybe a few rows that are always needed).
A way to test database code is:
Insert a few rows (using SQL) to initialize state
Run the function that you want to test
Compare expected with actual results. Here you could use your normal unit testing framework
Clean up the rows that were changed (so the next run won't see the previous run)
The cleanup could be done in a standard way (of course, only in the testing database) with DELETE FROM table.
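Put together as a plain JDBC test, the four steps might look like this (the schema and query are invented for illustration):

import static org.junit.Assert.assertEquals;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Test;

public class OrderCountTest {

    @Test
    public void countsOrdersForCustomer() throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
             Statement s = c.createStatement()) {

            s.execute("CREATE TABLE orders (id INT, customer_id INT)");

            // 1. Insert a few rows to initialize state.
            s.execute("INSERT INTO orders VALUES (1, 42)");
            s.execute("INSERT INTO orders VALUES (2, 42)");

            // 2. Run the function under test (inlined here as a query).
            ResultSet rs = s.executeQuery(
                    "SELECT COUNT(*) FROM orders WHERE customer_id = 42");
            rs.next();

            // 3. Compare expected with actual results.
            assertEquals(2, rs.getInt(1));

            // 4. Clean up so the next run starts fresh.
            s.execute("DELETE FROM orders");
            s.execute("DROP TABLE orders");
        }
    }
}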
In general I agree with Peter, but for creating and deleting test data I wouldn't use SQL directly. I prefer to use a CRUD API that is also used in the product, to create data as similar to production as possible...

How do I unit test persistence?

As a novice in practicing test-driven development, I often end up in a quandary as to how to unit test persistence to a database.
I know that technically this would be an integration test (not a unit test), but I want to find out the best strategies for the following:
Testing queries.
Testing inserts. How do I know that the insert went wrong if it fails? I can test it by inserting and then querying, but how can I know that the query wasn't wrong?
Testing updates and deletes -- same as testing inserts
What are the best practices for doing these?
Regarding testing SQL: I am aware that this could be done, but if I use an O/R Mapper like NHibernate, it attaches some naming warts in the aliases used for the output queries, and as that is somewhat unpredictable I'm not sure I could test for that.
Should I just abandon everything and simply trust NHibernate? I'm not sure that's prudent.
Look into DB Unit. It is a Java library, but there must be a C# equivalent. It lets you prepare the database with a known set of data, and then lets you verify the database contents through its interface. It can run against many database systems, so you can use your actual database setup, or use something else, like HSQL in Java (a Java database implementation with an in-memory option).
If you want to test that your code is using the database properly (which you most likely should be doing), then this is the way to go to isolate each test and ensure the database has expected data prepared.
As Mike Stone said, DbUnit is great for getting the database into a known state before running your tests. When your tests are finished, DbUnit can put the database back into the state it was in before you ran the tests.
DbUnit (Java)
DbUnit.NET
You do the unit testing by mocking out the database connection. This way, you can build scenarios where specific queries in the flow of a method call succeed or fail. I usually build my mock expectations so that the actual query text is ignored, because I really want to test the fault tolerance of the method and how it handles itself -- the specifics of the SQL are irrelevant to that end.
Obviously this means your test won't actually verify that the method works, because the SQL may be wrong. This is where integration tests kick in. For that, I expect someone else will have a more thorough answer, as I'm just beginning to get to grips with those myself.
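A small sketch of that style with Mockito (the trySaveUser method is a made-up stand-in for the code under test):

import static org.junit.Assert.assertFalse;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.sql.Connection;
import java.sql.SQLException;
import org.junit.Test;

public class FaultToleranceTest {

    // Hypothetical method under test: returns false instead of throwing when the DB fails.
    static boolean trySaveUser(Connection c, String name) {
        try {
            c.prepareStatement("INSERT INTO users (name) VALUES (?)");
            // ... bind and execute in real code ...
            return true;
        } catch (SQLException e) {
            return false; // the behaviour we want to verify
        }
    }

    @Test
    public void reportsFailureWhenQueryThrows() throws Exception {
        Connection connection = mock(Connection.class);
        // Ignore the actual SQL text; any statement fails.
        when(connection.prepareStatement(anyString()))
                .thenThrow(new SQLException("simulated outage"));

        assertFalse(trySaveUser(connection, "alice"));
    }
}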
I have written a post here concerning unit testing the data layer which covers this exact problem. Apologies for the (shameful) plug, but the article is too long to post here.
I hope that helps you - it has worked very well for me over the last 6 months on 3 active projects.
Regards,
Rob G
The problem I experienced when unit testing persistence, especially without an ORM and thus mocking your database (connection), is that you don't really know whether your queries succeed. It could be that your queries are specifically designed for a particular database version and only succeed with that version. You'll never find that out if you mock your database. So in my opinion, unit testing persistence is only of limited use. You should always add tests that run against the targeted database.
For NHibernate, I'd definitely advocate just mocking out the NHibernate API for unit tests -- trust the library to do the right thing. If you want to ensure that the data actually goes to the DB, do an integration test.
For JDBC-based projects, my Acolyte framework can be used: http://acolyte.eu.org. It allows you to mock up the data access you want to test, benefiting from the JDBC abstraction, without having to manage a specific test DB.
I would also mock the database and check that the queries are what you expected. There is a risk that the test checks the wrong SQL, but this would be detected in the integration tests.
I usually create a repository, use it to save my entity, and then retrieve a fresh copy. Then I assert that the retrieved entity is equal to the saved one.
Technically, unit tests of persistence are not unit tests. They are integration tests.
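A minimal sketch of that save-and-reload round trip (the entity and repository here are in-memory stand-ins just to make the example self-contained; a real test would wire the repository to the test database):

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import org.junit.Test;

public class RepositoryRoundTripTest {

    // Minimal entity for the example.
    static class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    // Stand-in repository; in a real test this would wrap the production data access code.
    static class CustomerRepository {
        private final Map<Long, Customer> rows = new HashMap<>();
        private final AtomicLong ids = new AtomicLong();

        long save(Customer c) {
            long id = ids.incrementAndGet();
            rows.put(id, new Customer(c.name)); // store a copy, as a database would
            return id;
        }

        Customer findById(long id) { return rows.get(id); }
    }

    @Test
    public void savedEntityCanBeReadBack() {
        CustomerRepository repository = new CustomerRepository();

        long id = repository.save(new Customer("Ada"));
        Customer reloaded = repository.findById(id); // a fresh instance, not the one we saved

        assertEquals("Ada", reloaded.name);
    }
}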
With C# using mbUnit, you simply use the SqlRestoreInfo and RollBack attributes:
[TestFixture]
[SqlRestoreInfo(<connectionString>, <name>, <backupLocation>)]
public class Tests
{
    [SetUp]
    public void Setup()
    {
    }

    [Test]
    [RollBack]
    public void TEST()
    {
        // test insert.
    }
}
The same can be done in NUnit, except the attribute names differ slightly.
As for checking whether your query was successful, you normally need to follow it with a second query to see whether the database was changed as you expected.
