If you have created limited resources in Shake, can you tell which resource the current rule is using? - shake-build-system

We're hoping to use Shake to run many old Ruby tests in parallel. Many of these tests assume they have sole control of the test database, so we have been creating multiple instances of the test database and running a test process against each one, with each process running a subset of the overall tests sequentially.
Let's say we create 3 test databases. We can trivially create a resource with a capacity of 3 (newResource "RSpecTestDatabase" 3) and state that an action needs to wait for one to be available (withResource rspecTestDataBase 1 $ ... do stuff ...). Is there any way to tell which one of the three databases we've been given, so we can choose the correct connection string?

No - Shake doesn't give you information about which resource you got - to Shake it's just a number that it increments/decrements. One way to solve this is to have a separate MVar containing the list of free databases, and to access it only once you are inside withResource - Shake controls the demand on the resource, so you can always be sure there will be a free database when you need one.

Related

Should I have to submit jobs to Spark or can I run them from a client lib?

So I'm learning about Spark and I have a question about how client libs work.
My goal is to do some sort of data analysis in Spark, telling it where the data sources are (databases, CSVs, etc.) to process, and store the results in HDFS, S3, or some kind of database like MariaDB or MongoDB.
I thought about having a service (API application) that "tells" Spark what I want to do. The question is: is it enough to set the master configuration to spark://remote-host:7077 at context creation, or should I send the application to Spark with some sort of spark-submit command?
This depends entirely on how your environment is set up. If all paths are linked to your account, you should be able to run one of the two commands below to open a shell and run test commands. The reason to use a shell is that it lets you run commands interactively, chain them together, and see what results come out.
Scala
spark-shell
Python
pyspark
Inside the environment, if everything is linked to Hive tables, you can check the tables by running
spark.sql("show tables").show(100,false)
The above command will run a "show tables" against the Spark Hive metastore catalogue and will return all active tables you can see (which doesn't mean you can access the underlying data). The 100 means I am going to look at 100 rows, and the false means to show the full string rather than only the first N characters.
In a hypothetical example, if one of the tables you see is called Input_Table, you can bring it into the environment with the commands below:
val inputDF = spark.sql("select * from Input_Table")
inputDF.count
I would strongly advise, while you're learning, not to run the commands via spark-submit, because you will need to pass in the class and JAR, forcing you to edit and rebuild for each test, which makes it difficult to figure out how commands will run without a lot of downtime.

Testing database interactions

I have an API that has a storage layer. It only does the database interactions and performs the CRUD operations. Now I want to test these functions.
In my path API/storage/, I have different packages with functions to interact with different tables in the same database. Tables A, B, and C are all in the same database.
My file hierarchy goes like:
--api
--storage
--A
--A.go
--A_test.go
--B
--C
--server
--A
--testData
--A.sql
--B.sql
In this way I want to test the whole storage layer using the command
go test ./...
The approach I was following is that I have a function RefreshTables which first truncates the table, then fills it with fixed test data that I keep in the testData folder. For truncating I do:
db.Exec("SET FOREIGN_KEY_CHECKS = 0;")
db.Exec("truncate " + table)
db.Exec("SET FOREIGN_KEY_CHECKS = 1;")
As go test runs the test functions of different packages in parallel by default, multiple SQL connections get created, and the truncate may run on a different connection than the SET FOREIGN_KEY_CHECKS statements, since each db.Exec can pick any connection from the pool.
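For illustration, a minimal sketch of keeping all three statements on one connection checked out from the pool via db.Conn, assuming MySQL and the standard database/sql package (the truncateTable helper name is hypothetical):

package storage

import (
	"context"
	"database/sql"
)

// truncateTable runs SET FOREIGN_KEY_CHECKS and TRUNCATE on a single pooled
// connection, so all three statements share the same MySQL session.
func truncateTable(ctx context.Context, db *sql.DB, table string) error {
	conn, err := db.Conn(ctx) // pin one connection from the pool
	if err != nil {
		return err
	}
	defer conn.Close() // returns the connection to the pool

	if _, err := conn.ExecContext(ctx, "SET FOREIGN_KEY_CHECKS = 0"); err != nil {
		return err
	}
	if _, err := conn.ExecContext(ctx, "truncate "+table); err != nil {
		return err
	}
	_, err = conn.ExecContext(ctx, "SET FOREIGN_KEY_CHECKS = 1")
	return err
}

Note that this only guarantees the statements share a session; it does not by itself stop two packages from truncating the same tables at the same time.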
My tests fail when run together, but they all pass when run alone or package by package.
If I do :
go test ./... -p 1
which makes test functions run one by one, all the tests pass.
I have also tried using a transaction for the truncate and locking the table before truncating.
I checked this article (https://medium.com/kongkow-it-medan/parallel-database-integration-test-on-go-application-8706b150ee2e), and the author suggests creating a separate database in every test function and dropping it after the function ends. I think this would be very time-consuming.
It would be really helpful if someone could suggest the best method for testing database interactions in Golang.
I don't have much experience in integration testing, and I'm not sure if mocking the database driver would work for you, but if so: I've been using the go-sqlmock package for mocking SQL database results in unit tests and it works like a charm. You could use it and literally have a separate "database engine" for each of your tests. It's a bit time-consuming, since you have to manually tell the mock what queries to expect and what to return, but trust me, it's a good time investment.
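For example, here is a minimal sketch of what a go-sqlmock test can look like (the query, table, and test flow below are made up for illustration):

package storage

import (
	"regexp"
	"testing"

	"github.com/DATA-DOG/go-sqlmock"
)

func TestGetName(t *testing.T) {
	// Each test gets its own mock "database engine".
	db, mock, err := sqlmock.New()
	if err != nil {
		t.Fatalf("failed to open sqlmock: %v", err)
	}
	defer db.Close()

	// Tell the mock which query to expect and what rows to return.
	mock.ExpectQuery(regexp.QuoteMeta("SELECT name FROM A WHERE id = ?")).
		WithArgs(1).
		WillReturnRows(sqlmock.NewRows([]string{"name"}).AddRow("alice"))

	// Run the code under test against the mocked *sql.DB.
	var name string
	if err := db.QueryRow("SELECT name FROM A WHERE id = ?", 1).Scan(&name); err != nil {
		t.Fatalf("query failed: %v", err)
	}
	if name != "alice" {
		t.Fatalf("got %q, want %q", name, "alice")
	}

	// Verify that every expected query was actually executed.
	if err := mock.ExpectationsWereMet(); err != nil {
		t.Errorf("unmet expectations: %v", err)
	}
}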
As I said before, I'm not sure if using this strategy suits your case, because if you are interested in knowing how your application behaves in a "real database scenario", like verifying that records are actually saved, then mocking the database results is kind of useless.

Manage SSDT project file properly with version control (*.sqlproj)

We have a constant problem with the project XML file (*.sqlproj). If files are added/removed/moved, it automatically adds/removes records in some unexpected places. After that, we have big trouble merging it when somebody else has also changed that file.
We came to the conclusion that we might sort it before check-in. We would sort it alphabetically, and in that case the merge tool would understand it much better.
So, my questions would be:
Is it possible to re-arrange the sqlproj file somehow before EVERY check-in? Maybe there is some kind of option/tool that already does that?
Are there any other ways to make developers' lives easier?
UPDATE:
Once again I ran into the same problem. The sqlproj file was modified 3 times and I want to merge only the last change to production; the other 2 are not tested yet. In the merge tool I have the option to add all 3 new objects or leave the file without changes. I am not able to select only the last change...
EXAMPLE:
developerA created tableA and checked in;
developerB got the latest version of the dev branch, created tableB and checked in;
developerC got the latest version of the dev branch, created tableC and checked in. DeveloperC tested the code and is ready to go to production. He tries to merge his code to QA and gets a conflict where his only option is to take ALL the changes.
I understand the scenario you are running into very well. This typically happens when you have multiple work streams happening in the context of a single repository and you don't have a common promotion schedule (as in all work will go to QA at the same time and PROD at the same time).
There are a few ways I can think of to get around this problem, and there are pros and cons to each option.
Lock each environment until everything can promote together. Not realistic in most cases.
When you are ready to promote, create a promotion branch from source environment and take things out of the promotion branch that aren't ready to promote to destination environment. This allows devs to keep working and be able to promote without freezing.
Hybrid approach... Don't source control anything in Dev until it's ready to promote to test. Then either do option #1 or 2 from there onward.
Create a more flexible ecosystem that can spin up an environment for each Feature branch in order to demo/test with others (or at least allocate/rotate enough between the developers to accomplish the same objective). Once it's accepted, promote. This is what we are working towards currently, but building out the infrastructure and process when you have a ton of interconnected databases and apps that share them is a bit challenging, to say the least (especially in the Microsoft world).
Anyway, hope this helps...
1 - What source control are you using? No source control that I am aware of understands the context of sqlproj files, but this isn't normally a problem.
2.a - This shouldn't be a problem you get constantly. Are you checking in/out regularly? I would only expect to see issues if different developers are making large-scale changes to the projects and not checking out / checking in before and after.
2.b - It is also possible you are not merging correctly; if you take both sets of changes then it is normally fine.
ed

Creating the Front End MDE

I created a database for tracking metrics, with some automation tricks (email, .doc and .ppt presentations, etc.), a very large main table, and lots of forms/GUI. This is the first time I have ever worried about an MDE/front end for the thing. So if you would be so kind as to answer a few questions, or offer any advice, it would be greatly appreciated (I would hate for all this work to not be utilized).
What is the first thing I need to do? It's the 2000 version that must be converted to 2003 to create the MDE, but does that get done before I use the Database Splitter?
Will the number of objects in the database affect the ability to do this? I have something like 80 forms, 70 queries, 20+ macros, 12 tables, etc...but does the number of objects prevent some of this from working well once the front end is there?
When I split the database, can I continue to work/make changes and such on the "back end", and have those changes directly affect the front end?
These may be some basic questions, but I don't know the answer so.....Thanks!
Here is my 2¢.
Question 1 - I have never used the database splitter as I feel I have more control doing it manually. If you do it manually you can do it to a version that does not have a database splitter. But if you do use the splitter then--yes--you will have to upgrade to a version that has a splitter before doing it.
To do it manually here are the steps.
Backup everything.
Create a copy of your file in the same directory. So if you have MyApp.mdb, create a copy in the same directory with a new name, such as MyAppDATA.mdb.
Open the new DATA file (MyAppDATA.mdb) and delete all of the objects EXCEPT the TABLES.
Open the App file (MyApp.mdb) and delete all of the tables.
Also in MyApp.mdb...go to the File/Get External Data/Link Tables menu to link the tables in MyAppDATA.mdb to MyApp.mdb. Select All and create the links.
That should do it. And if you screw up you made a backup...right?
A couple of tips and gotchas...be sure that you go to Tools/Options and that you are NOT showing System and Hidden tables. You just don't want to delete system tables from MyApp. Another way to do it is do NOT delete tables that start with MSys or USys.
Question 2 - It does not matter how many objects you have. In fact, you don't have that many objects anyway.
Question 3 - Yes...you will make back-end changes in MyAppDATA.mdb, and when you open MyApp.mdb those changes will auto-magically be there to see and query against, etc. (In the query designer you may need to save/close/reopen to see new fields if you made the change while in the query.) The EXCEPTION to that is new tables: you will have to use the File/Get External Data/Link Tables option to create links to new tables.
One thing to remember (and that I hope you already realize) is that the one downside of splitting the database is that when you deploy the front-end file, the path to the data will usually vary from machine to machine, and there is no automatic re-linking of tables in Access. If your target clients have full access you can always use Tools/Database Utilities/Linked Table Manager to refresh the links to the right location. If you can't do that then you will have to do one of the following:
1. Write code that does the automatic re-linking for you. Basically it will check the links...if invalid it will prompt the user for the data location (or look it up in an INI file) and re-link the tables.
2. Always deploy your app to the same location on all machines. If you have commercial visions for your application this won't work...I mention it for academic reasons. It might be doable for a limited deployment where you have a lot of control over file placement on each machine.
3. Put the data file (MyAppDATA.mdb) onto a network share and link the tables across the network using a drive mapping or a UNC path (\\myserver\mydata\ApplicationData\MyAppData.mdb). The latter is preferred, but both of them run the same risks as number two.
Seth
PS This answer assumes Access 2003.
PPS If you have commercial visions for your application then the table linking has got to be REALLY robust.
PPPS I agree with the commenter that you may want to take the plunge and do SQL if it is in your skill set.
One thing that hasn't been discussed is the issue of whether the compile to MDE could fail. Basically, if your code compiles in your front-end MDB, it will convert to an MDE. But I've noticed that lots of people never compile.
Some hints for keeping your VBA code in good shape:
in VBE options, turn off COMPILE ON DEMAND.
add the COMPILE button to your standard VBE toolbar and USE IT OFTEN.
periodically, backup your MDB and decompile/recompile it.
Also, remember that you must keep the MDB source, as the VBA code is not editable in an MDE and not recoverable by any good method.
EDIT:
Steps for a decompile:
backup your MDB.
start an instance of Access with the /decompile command-line argument. For instance, I have a shortcut on my desktop that has this as the target:
"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" /decompile
having opened that instance of Access, open the MDB you want to decompile. You will see nothing happen. DO NOTHING FURTHER IN THIS INSTANCE OF ACCESS -- close this instance of Access (the reason for this is that Michael Kaplan, who knows a thing or two about this, recommended that you never do any work in an Access instance opened with the decompile switch because he said there was no guarantee that the Access application code executed under those circumstances in a way that was fully safe for all kinds of Access work).
open the just-decompiled MDB holding down the shift key (you want to be sure that startup routines don't run because that would likely recompile the product before you've finished your cleanup) and compact the MDB (holding down the shift key again).
open the code editor and compile the project (DEBUG -> COMPILE [db name] for those who haven't done step #2 in my original compiling instructions at the top of the post before the edit).
compact the MDB (doesn't matter if you bypass startup, since it's already fully compiled).
Why so many steps?
Because the purpose of the decompile is to get rid of the compiled p-code in order to start afresh from the canonical VBA code. Following the steps above ensures that you have completely cleared the data pages storing the compiled code before you recompile. The reason for this is that without the compact step after the decompile, under some very rare circumstances, the code can behave strangely. I can't imagine that the old discarded p-code is being used again, but there's something about the pointers between the canonical code and the compiled code that apparently doesn't get completely flushed by a decompile without a compact.
This would be a comment to Seth's answer, but my rep isn't high enough to comment yet.
Seth did a great job answering your questions, I just wanted to add a bit more to part #1 about using the Database Splitter. The Database Splitter in the Tools menu works fine. Doing it manually is alright too, but it's a whole lot faster and easier to use the Database Splitter. I've used it a dozen times and never encountered any issues after using it.
http://www.databasedev.co.uk/split_a_database.html has a decent page about some of the pros, cons of splitting your database.
http://www.accessmvp.com/TWickerath/articles/multiuser.htm also has some good info when dealing with a split database in a multi-user environment.
Seth gave you a very good answer. But I'll add a few comments.
The number of objects only becomes relevant when you get close to about 1000 forms, reports and modules which have code; there's a limit around there. If you do get that message when trying to make an MDE then you almost certainly have a code error and need to compile to find the error.
Another resource is "Splitting your app into a front end and back end Tips"
See the Auto FE Updater downloads page to make the process of distributing new FEs relatively painless. The utility also supports Terminal Server/Citrix quite nicely.

How to Test Web Code?

Does anyone have some good hints for writing test code for database-backend development where there is a heavy dependency on state?
Specifically, I want to write tests for code that retrieves records from the database, but the answers will depend on the data in the database (which may change over time).
Do people usually make a separate development system with a 'frozen' database so that any given function should always return the exact same result set?
I am quite sure this is not a new issue, so I would be very interested to learn from other people's experience.
Are there good articles out there that discuss this issue of web-based development in general?
I usually write PHP code, but I would expect all of these issues are largely language and framework agnostic.
You should look into DBUnit, or try to find a PHP equivalent (there must be one out there). You can use it to prepare the database with a specific set of data which represents your test data, and thus each test will no longer depend on the database and some existing state. This way, each test is self contained and will not break during further database usage.
Update: A quick google search showed a DB unit extension for PHPUnit.
If you're mostly concerned with data layer testing, you might want to check out this book: xUnit Test Patterns: Refactoring Test Code. I was always unsure about it myself, but this book does a great job to help enumerate the concerns like performance, reproducibility, etc.
I guess it depends what database you're using, but Red Gate (www.red-gate.com) make a tool called SQL Data Generator. This can be configured to fill your database with sensible looking test data. You can also tell it to always use the same seed in its random number generator so your 'random' data is the same every time.
You can then write your unit tests to make use of this reliable, repeatable data.
As for testing the web side of things, I'm currently looking into Selenium (selenium.openqa.org). This appears to be a cross-browser capable test suite which will help you test functionality. However, as with all of these web site test tools, there's no real way to test how well these things look in all of the browsers without casting a human eye over them!
We use an in-memory database (hsql: http://hsqldb.org/). Hibernate (http://www.hibernate.org/) makes it easy for us to point our unit tests at the testing DB, with the added bonus that they run as quick as lightning.
I have the exact same problem with my work, and I find that the best idea is to have a PHP script to re-create the database, and then a separate script where I throw crazy data at it to see if it breaks.
I have never used any unit testing or suchlike, so I cannot say if it works or not, sorry.
If you can setup the database with a known quantity prior to running the tests and tear down at the end, then you'll know what data you are working with.
Then you can use something like Selenium to easily test from your UI (assuming web-based here, but there are a lot of UI testing tools out there for other UI-flavours) and detect the presence of certain records pulled back from the database.
It's definitely worth setting up either a test version of the database - or make your test scripts populate the database with known data as part of the tests.
You could try http://selenium.openqa.org/ - it is more for GUI testing than a data-layer testing application, but it does record your actions, which can then be played back to automate tests across different platforms.
Here's my strategy (I use JUnit, but I'm sure there's a way to do the equivalent in PHP):
I have a method that runs before all of the Unit Tests for a specific DAO class. It puts the dev database into a known state (adds all test data, etc.). As I run tests, I keep track of any data added to the known state. This data is cleaned up at the end of each test. After all the tests for the class have run, another method removes all the test data in the dev database, leaving it in the state it was in before the tests were run. It's a bit of work to do all this, but I usually write the methods in a DBTestCommon class where all of my DAO test classes can get to them.
I would propose using three databases: one production database, one development database (filled with some meaningful data for each developer), and one testing database (with empty tables and maybe a few rows that are always needed).
A way to test database code is:
Insert a few rows (using SQL) to initialize state
Run the function that you want to test
Compare expected with actual results. Here you could use your normal unit testing framework
Clean up the rows that were changed (so the next run won't see the previous run)
The cleanup could be done in a standard way (of course, only in the testing database) with DELETE FROM table.
In general I agree with Peter, but for creating and deleting test data I wouldn't use SQL directly. I prefer to use some CRUD API that is used in the product, to create data as similar to production as possible...
