I'm trying to understand the DBT-2 benchmark tool. I want to benchmark Postgres-XC, but I could not find much information about how to set up a Postgres-XC environment for DBT-2. I got DBT-2 from the link.
I have fixed some compilation issues, but I don't have much idea of how to use this tool. Please provide some insights; I'm completely new to this. Thanks!
If you are unsure of how to use DBT-2, I would focus on just using it against PostgreSQL itself.
Once you are more familiar with it, you can try it against Postgres-XC. I have only run DBT-1 and DBT-3 against Postgres-XC and Postgres-XL.
Note that the DBT-1 version used in testing XC has the schema partitioned a certain way to reduce the number of implicit two-phase commits that you need to do. I have not looked at DBT-2, but you should select the hash distribution column carefully (CREATE TABLE .... DISTRIBUTE BY HASH(some_column)). Also, for tables that are fairly static, add DISTRIBUTE BY REPLICATION to the CREATE TABLE statements, as in the sketch below.
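For illustration, here is a minimal sketch of both distribution styles, assuming psycopg2 pointed at an XC coordinator; warehouse and item are tables from the TPC-C schema that DBT-2 models, trimmed to a couple of columns:

import psycopg2  # assumption: the standard PostgreSQL driver, connected to the coordinator

conn = psycopg2.connect("dbname=dbt2 host=coordinator_host")  # hypothetical DSN
cur = conn.cursor()

# Hot table: hash-distribute on the key most joins and filters use, so
# related rows land on the same node and cross-node two-phase commits
# are minimized.
cur.execute("""
    CREATE TABLE warehouse (
        w_id   int PRIMARY KEY,
        w_name varchar(10)
    ) DISTRIBUTE BY HASH (w_id)
""")

# Fairly static lookup table: replicate it to every node so joins
# against it never have to leave the node.
cur.execute("""
    CREATE TABLE item (
        i_id   int PRIMARY KEY,
        i_name varchar(24)
    ) DISTRIBUTE BY REPLICATION
""")

conn.commit()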
If you do end up testing Postgres-XC or Postgres-XL, I would love to hear about your results.
After doing some research, I was amazed by the power of Prolog to express queries in a very simple way, almost like telling the machine verbally what to do. This happened because I've become really bored with Propel and PHP at work.
So, I've been wondering if there is a way to translate database table rows (Postgres, for example) into Prolog facts. That way, I could stop writing so many boring joins and using an ORM, and instead write something like this to get what I want:
mantenedora_ies(ID_MANTENEDORA, ID_IES) :-
    papel_pessoa(ID_PAPEL_MANTENEDORA, ID_MANTENEDORA, 1),
    papel_pessoa(ID_PAPEL_IES, ID_IES, 6),
    relacionamento_pessoa(_, ID_PAPEL_IES, ID_PAPEL_MANTENEDORA, 3).
To see why I've become bored, look at this post. The code there would be replaced by the simple lines above, which are much easier to read and understand. I'm just curious about this, since it will be impossible to actually replace things around here.
It would also be cool if something like that could be done in PHP. Does anyone know of anything like that?
Check the ODBC interface of SWI-Prolog (maybe there is an equivalent for other Prolog implementations too):
http://www.swi-prolog.org/pldoc/doc_for?object=section%280,%270%27,swi%28%27/doc/packages/odbc.html%27%29%29
I can think of a few approaches to this:
On initialization, call a method that selects all the data from a table and asserts it into the Prolog database. Do this for each table. You will need to declare the shape of each row, e.g. :- dynamic ies_row/4 (see the sketch after this list).
You could modify load_files by overriding user:prolog_load_file. From this hook you could do something similar to #1. This has the benefit of looking like a load_files call. http://www.swi-prolog.org/pldoc/man?predicate=prolog_load_file%2F2 ... This documentation mentions library(http_load), but I cannot find it anywhere (I was interested in this recently)!
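A minimal sketch of approach #1, assuming SWI-Prolog with the pyswip bridge and psycopg2; the column names for papel_pessoa are guesses based on your rule:

import psycopg2
from pyswip import Prolog

prolog = Prolog()
conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# Mirror each table as a set of Prolog facts (repeat for every table
# you need, e.g. relacionamento_pessoa as well).
cur.execute("SELECT id_papel, id_pessoa, tipo FROM papel_pessoa")
for id_papel, id_pessoa, tipo in cur:
    prolog.assertz("papel_pessoa(%d, %d, %d)" % (id_papel, id_pessoa, tipo))

# Once the facts are loaded, the rule from the question can be asserted
# and queried directly:
prolog.assertz("(mantenedora_ies(M, I) :- papel_pessoa(PM, M, 1), papel_pessoa(PI, I, 6), relacionamento_pessoa(_, PI, PM, 3))")
for answer in prolog.query("mantenedora_ies(M, I)"):
    print(answer["M"], answer["I"])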
There is the Draxler Prolog-to-SQL compiler, which translates patterns like the conjunction you wrote into the more verbose SQL joins. You can find more info in the related post (Prolog to SQL converter).
But beware that Prolog has its weaknesses too, especially regarding aggregates. Without a library, getting sums, counts and the like is not very easy, and such libraries aren't common or easy to use.
I think you could try to specialize the PHP DB interface for equijoins, using the built-in features that allow you to shorten the query text (when this results in more readable code). Working in SWI-Prolog / ODBC, where (like in PHP) you need to compose SQL, I effectively found myself working that way, to handle something very similar to what you have shown in the other post.
Another approach I found useful: I wrote a parser for the subset of SQL used by the MySQL backup interface (PHPMyAdmin, really). So I routinely dump my CMS's DB locally, load it into memory, apply whatever duty task I need, computing and writing (or applying) the insert/update/delete statements, then upload those. This can be done because of the limited size of the DB, which fits in memory. I developed, and now maintain, a small e-commerce site with this naive approach.
Writing Prolog from PHP should not be too difficult: I'd try to modify an existing interface, like the awesome Adminer, which already offers a choice among basic serialization formats.
I am using the following to open the database:
db.open("db.kch#tune_defrag=10000", DB.OWRITER | DB.OCREATE)
I am putting and removing elements. At the end of execution the database is "empty": the count() function returns 0 because I removed all the elements. Why does the file size keep increasing when I repeat the test? Is it possible to run something like a garbage collector to clean up the removed records? If I execute the same test 100 times, I end up with a 500 MB database file, even though it holds only 2 records.
I tried setting "tune_defrag=10000", but it doesn't seem to work.
Note: a single record is less than 1 KB; I don't understand why the records take so much space on disk.
Thanks for any help.
Try this:
db.kch#dfunit=8
That makes Kyoto Cabinet run defragmentation after every 8 fragmented regions detected; 8 is actually the value recommended by Mikio.
The available options are listed here, though the page could use some polish:
http://fallabs.com/kyotocabinet/command.html
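For completeness, a minimal sketch using the same Python binding as in your question, just swapping the tuning option:

from kyotocabinet import DB

db = DB()
# dfunit=8: auto-defragment after every 8 fragmented regions detected
if not db.open("db.kch#dfunit=8", DB.OWRITER | DB.OCREATE):
    raise IOError("open error: " + str(db.error()))

db.set("key", "value")
db.remove("key")
db.close()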
Running:
kchashmgr defrag path_to_kcabinet_file
is what I do to get the db file 'resized'. I didn't find API access to this, which is why I do it with a shell command using the kchashmgr utility (obviously this can be called from inside a program).
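For example, assuming kchashmgr is on the PATH, calling it from a program is a one-liner:

import subprocess

# Runs the offline defrag; the database must not be open elsewhere.
subprocess.check_call(["kchashmgr", "defrag", "db.kch"])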
I haven't used this particular db, but in some other ones a hack to resolve this issue is to copy the db into a new one and then delete the old one, after making sure it copied well :).
I've implemented this process in production systems; as long as it is coded really, really carefully, it should work. A sketch of the idea is below.
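A rough sketch of that copy-and-swap hack with the same Python binding (assuming nothing else has the file open; error handling omitted):

import os
from kyotocabinet import DB

src, dst = DB(), DB()
src.open("db.kch", DB.OREADER)
dst.open("db-new.kch", DB.OWRITER | DB.OCREATE)

# Copy only the live records, leaving the dead space behind.
cur = src.cursor()
cur.jump()
while True:
    rec = cur.get(True)  # returns a (key, value) pair and steps forward
    if not rec:
        break
    dst.set(rec[0], rec[1])

src.close()
dst.close()
os.replace("db-new.kch", "db.kch")  # swap in the compacted file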
From a cursory look through the Kyoto documentation, it doesn't appear that you have any way to resize or otherwise clean the database of deleted records... or really to manage it in any way, shape or form.
That project looks like it is a long way off from being "production ready". If you really want to implement it, I'd suggest contacting the project owners (http://fallabs.com/) and seeing if they have any plans for some much needed utility functions.
Otherwise, I'd suggest moving to a different nosql style database that is a bit more mature.
Is there a tool that can let one browse relational data as a graph of connected nodes?
For example, I'm faced with trying to cleanse some anomalous data. I can start with two offending rows. In this particular example, the TransactionID should, by business rules, be unique to the table, but I find a transaction that violates that rule:
SELECT * FROM LCTTrans
WHERE TransactionID = 1075048
LCTID     TransactionID
========= =============
4358      1075048
4359      1075048
2 row(s) affected
But really what I want is to begin to hunt down all the related data, to try to see which row is right. So this hypothetical software would start by showing me these two rows.
Next, I want to see the transaction that is linked into this table.
Now that transaction points to an MAL, so show me that.
Now let's add those two LCTs that the transaction is "on". A transaction can be on only one LCT, yet this one points to two.
Okay computer, both of those LCTs point to an MAL and to the transaction that created them; show me those.
Those last two transactions also point at an MAL, and they themselves point to an LCT; show me those.
Okay, now are there any entries in LCTTrans that point to LCTs 4358 or 4359?...
And so on, and so on.
Now, I did all this manually: running single selects, copying and pasting uniqueidentifier keys, and converting them into friendly id numbers so I could easily see the relationships.
Is there software that can do this?
Ok, well I liked this idea so much that I've written it.
It's not released yet, but when it is it will be free.
Edit
Ok, it's now released. Free relational database exploring goodness at http://www.atlantis-interactive.co.uk/products/datasurf/default.aspx
Edit
Although initially free, this is now part of Pragmatic Works' DBA xPress package.
DBeauty is a powerful data browser (similar to Matt Whitfield's excellent DataSurf but more powerful). It is Java based, so you need to download the JDBC driver. I've found this tool invaluable in quickly navigating data (I fell in love with Microsoft's Quadrant before they killed it off and have been looking for a replacement ever since).
The old but good and free DB subsetting tool Jailer should be able to answer the question:
http://jailer.sourceforge.net/
Yes, I would advise you to look into DbSchema; it's a neat database management tool that will help you.
I can think of a few for relational data (RDF, Topic Map and conceptual graph browsers), but none off-hand for SQL. You could try to translate your queries into a relational language the browsers understand. You might also be able to build something on top of skyrails. Most of the visualisations I've tagged on delicious are for graph or relational data, but again they tend to be schema-free rather than SQL.
Basically you write a dedup tool where you show both records on the screen side by side, with the ability to pick the record you want to keep but also to check individual data from the other record to keep as well. Since deduping is very different from database to database and highly dependent on the specific table structure and business rules you have (as well as knowledge about which things must be looked at for the type of deduping you are doing, as such tools typically only show the most important relationship tables on screen), I have never seen one that wasn't built in house.
But if you want a quick look at all the data, write a query that left joins to all the child tables and shows all the fields for both TransactionIDs, then read through your results; something like the sketch below.
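A rough sketch of that "one wide query" approach, assuming pyodbc and guessing at the child table names (Transactions, MAL, LCT) and join keys from the question:

import pyodbc

conn = pyodbc.connect("DSN=mydsn")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    SELECT lt.*, t.*, m.*, l.*
    FROM LCTTrans lt
    LEFT JOIN Transactions t ON t.TransactionID = lt.TransactionID
    LEFT JOIN MAL m ON m.MALID = t.MALID
    LEFT JOIN LCT l ON l.LCTID = lt.LCTID
    WHERE lt.TransactionID = 1075048
""")
for row in cur.fetchall():
    print(row)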
More importantly, how did you end up with a dup if you have a business rule that requires the TransactionID to be unique? Did you forget that all of these types of rules must be enforced through the database and not the application? Why was there no unique index on that field? Something like the sketch below would have stopped this at insert time.
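A minimal sketch of enforcing the rule in the database itself (same hypothetical connection as above; the index name is illustrative):

import pyodbc

conn = pyodbc.connect("DSN=mydsn")  # hypothetical DSN
cur = conn.cursor()
# Once the dups are cleaned up, make the engine enforce the business rule.
cur.execute("CREATE UNIQUE INDEX UX_LCTTrans_TransactionID ON LCTTrans (TransactionID)")
conn.commit()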
I've looked for open source software that can do this sort of link analysis, without much success. If you have enough of a budget to go proprietary, you might consider talking to Palantir Technologies, Centrifuge Systems, i2, etc. about analytics platforms and visualization technologies.
Try this tool - it is in Russian, but the interface is comprehensible: http://sourceforge.net/projects/basescan/. Navigation through the database is via drag and drop.
I am currently developing a small project of mine that generates SQL calls in a dynamic way to be used by another piece of software. The SQL calls are not known beforehand, and therefore I would like to be able to unit test the object that generates the SQL.
Do you have any idea what the best approach to do this would be? Bear in mind that there is no possible way to know all the possible SQL calls to be generated.
Currently the only idea I have is to create test cases of the accepted SQL from the db using regex and make sure that the SQL will compile, but this does not ensure that the call returns the expected result.
Edited: Adding more info:
My project is an extension of Boo that will allow the developer to tag his properties with a set of attributes. These attributes are used to identify how the developer wants to store the object in the DB. For example:
# This attribute tells the Boo compiler extension that you want to
# store the object in a MySQL db. The Boo compiler extension will make
# sure that you meet the requirements.
[Storable(MySQL)]
class MyObject():
    # Tells the compiler that name is the PK
    [PrimaryKey(Size = 25)]
    [Property(Name)]
    private name as String

    [TableColumn(Size = 25)]
    [Property(Surname)]
    private surname as String

    [TableColumn()]
    [Property(Age)]
    private age as int
The great idea is that the generated code won't need to use reflection; instead it will be added to the class at compile time. Yes, the compilation will take longer, but there won't be a need to use reflection at all. I currently have this working: the required methods that return the SQL are generated at compile time, added to the object, and can be called. But I need to test that the generated SQL is correct :P
The whole point of unit testing is that you know the answer to compare the code results against. You have to find a way to know the SQL calls beforehand.
To be honest, as other answerers have suggested, your best approach is to come up with some expected results and essentially hard-code those in your unit tests. Then you can run your code, obtain the result, and compare it against the hard-coded expected value.
Maybe you can also record the actual SQL generated, rather than executing it and comparing the results? Something like the sketch below.
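A minimal sketch of that; generate_insert_sql is a hypothetical stand-in for whatever method your compiler extension attaches to the class:

import unittest

# Stand-in for the generated method; replace with the real call.
def generate_insert_sql():
    return "INSERT INTO MyObject (name, surname, age) VALUES (?, ?, ?)"

class TestGeneratedSql(unittest.TestCase):
    def test_insert_statement(self):
        expected = "INSERT INTO MyObject (name, surname, age) VALUES (?, ?, ?)"
        self.assertEqual(expected, generate_insert_sql())

if __name__ == "__main__":
    unittest.main()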
This seems like a chicken-and-egg situation. You aren't sure what the generator will spit out, and you have a moving target to test against (the real database). So you need to tie the loose ends down.
Create a small test database (for example with HSQLDB or Derby). This database should use the same features as the real one, but don't make a copy! You will want to understand what each thing in the test database is for and why it is there, so invest some time to come up with some reasonable test cases. Use your code generator against this (static) test database, save the results as fixed strings in your test cases. Start with a single feature. Don't try to build the perfect test database as step #1. You will get there.
When you change the code generator, run the tests. They should only break in the expected places. If you find a bug, replicate the feature in question in your test database. Create a new test, check the result. Does it look correct? If you can see the error, fix the expected output in the test. After that, fix the generator so it will create the correct result. Close the bug and move on.
This way, you can build more and more safe ground in a swamp. Do something you know, check whether it works (ignore everything else). If you are satisfied, move on. Don't try to tackle all the problems at once. One step at a time. Tests don't forget, so you can forget about everything that is being tested and concentrate on the next feature. The test will make sure that your stable foundation keeps growing until you can erect your skyscraper on it.
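As a concrete sketch of the static test database idea, here using Python's built-in sqlite3 as the small embedded engine (the answer suggests HSQLDB or Derby; the principle is the same, and generate_select_by_pk is a hypothetical stand-in for your generator's output):

import sqlite3
import unittest

# Stand-in for your generator's output; swap in the real generated SQL.
def generate_select_by_pk():
    return "SELECT name, surname, age FROM MyObject WHERE name = ?"

class TestGeneratorAgainstFixture(unittest.TestCase):
    def setUp(self):
        # A tiny, hand-built test database: you know why every row is here.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE MyObject (name TEXT PRIMARY KEY, surname TEXT, age INT)")
        self.conn.execute("INSERT INTO MyObject VALUES ('Ana', 'Silva', 30)")

    def test_select_by_pk(self):
        rows = self.conn.execute(generate_select_by_pk(), ("Ana",)).fetchall()
        self.assertEqual([("Ana", "Silva", 30)], rows)

if __name__ == "__main__":
    unittest.main()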
Regarding regex: I think the grammar of SQL is not regular, but it is context-free; subexpressions are the key to realizing this. You may want to write a context-free parser for SQL to check for syntax errors.
But ask yourself: what is it you want to test for? What are your correctness criteria?
If you are generating the code, why not also generate the tests?
Short of that, I would test/debug generated code in the same way you would test/debug any other code without unit tests (i.e. by reading it, running it and/or having it reviewed by others).
You don't have to test all cases. Make a collection of example calls, be sure to include as many of the difficult aspects that the function will have to handle as possible, then check whether the generated code is correct.
I would have a suite of tests that put in a known input and check that the generated SQL is as expected.
You're never going to be able to write a test for every scenario but if you write enough to cover at least the most regular patterns you can be fairly confident your generator is working as expected.
If you find it doesn't work in a specific scenario, write another test for that scenario and fix it.
Task: implement paging of database records in a way suitable for different RDBMSs. The method should work for mainstream engines - MSSQL 2000+, Oracle, MySQL, etc.
Please don't post RDBMS-specific solutions; I know how to implement this for most of the modern database engines. I'm looking for a universal solution. Only temporary-table-based solutions come to mind at the moment.
EDIT:
I'm looking for an SQL solution, not a 3rd-party library.
There would have been a universal solution if the SQL specifications had included paging as a standard. The requirements for a language to be called an RDBMS language do not include paging support.
Many database products support SQL with proprietary extensions to the standard language. Some of them support paging, like MySQL with the LIMIT clause and Oracle with ROWNUM; each is handled differently. Other DBMSs will need to add a field called rowid or something like that.
I don't think you can have a universal solution (anyone is free to prove me wrong here; open to debate) unless it is built into the database system itself, or unless, say, a company ABC that uses Oracle, MySQL and SQL Server decides to have its database developers provide their own implementation of paging for each system behind a universal interface for the code that uses it.
The most natural and efficient way to do paging is using the LIMIT/OFFSET construct (TOP in the Sybase world). A DB-independent way would have to know which engine it's running on and apply the proper SQL construct.
At least, that's the way I've seen it done in DB-independent libraries' code. You can abstract away the paging logic once you get the data from the engine with the specific query. A rough sketch of the dispatch idea follows.
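For example, a minimal sketch of that engine dispatch (the dialect strings are made up for illustration, and the clause placement is simplified; Oracle versions before 12c would need a ROWNUM subquery instead):

def paged_query(base_query, offset, limit, dialect):
    # Wrap an ordered SELECT with the engine-specific paging construct.
    if dialect in ("postgresql", "mysql", "sqlite"):
        return "%s LIMIT %d OFFSET %d" % (base_query, limit, offset)
    if dialect in ("mssql", "oracle"):
        # SQL Server 2012+ / Oracle 12c+; older versions need TOP or a
        # ROWNUM subquery instead, which is exactly the portability problem.
        return "%s OFFSET %d ROWS FETCH NEXT %d ROWS ONLY" % (base_query, offset, limit)
    raise ValueError("unsupported dialect: " + dialect)

# Usage:
print(paged_query("SELECT * FROM MyTable ORDER BY id", 20, 10, "mysql"))
# SELECT * FROM MyTable ORDER BY id LIMIT 10 OFFSET 20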
If you really are looking for a single, one-SQL-statement solution, could you show what you have in mind? Like the SQL for the temp table solution. That would probably get you more relevant suggestions.
EDIT:
I wanted to see what you were thinking, because I couldn't see a way to do it with temp tables without using an engine-specific construct. You used specific constructs in the example. I still don't see a way to implement paging in the database with only (implemented) standard SQL. You could bring back the whole table in standard SQL and page in the application, but that is obviously stupid.
So the question would now be more like "Is there a way to implement paging without using LIMIT/OFFSET or an equivalent?", and I guess that the answer is "Sanely, no." You could try using cursors, but you'll fall prey to database-specific statements/behavior there as well.
A wacko (read: stupid) idea that just occurred to me would be to add a page column to the table, say create table test (id int, name varchar, phone varchar, page int), and then you could get page 1 with select * from test where page = 1. But that means adding code to maintain that column, which, again, could only be done by either bringing back the whole table or using database-specific constructs. That is besides having to add a different column for each possible ordering, and many other flaws.
I can't provide proof, but I really think you just can't do it sanely.
Proceed as usual:
Start by implementing it according to the standard, and then handle the corner cases, i.e. the DBMSs which don't implement the standard. How to handle the corner cases depends on your development environment.
You are looking for a "universal" approach. The most universal way to paginate is by using cursors, but cursor-based pagination doesn't fit very well with a non-stateful environment like a web application.
I've written about the standard and the implementations (including cursors) here:
http://troels.arvin.dk/db/rdbms/#select-limit-offset
SubSonic can do this for you, if you can tolerate Open Source...
http://subsonicproject.com/querying/webcast-using-paging/
Other than that, I know NHibernate does as well.
JPA lets you do it with the Query class:
Query q = ...;
q.setFirstResult(0);
q.setMaxResults(10);
gives you the first 10 results in the result set.
If you want a DBMS independent raw SQL solution, I'm afraid you're out of luck. All the vendors do it differently.
@Vinko Vrsalovic,
as I wrote in the question, I know how to do it in most DBs. I want to find a universal solution, or get proof that one doesn't exist.
Here is one stupid solution based on a temporary table. It's obviously bad, so no need to comment on it.
N - upper bound
M - lower bound
create table #temp (Id int identity, originalId int)
insert into #temp(originalId)
select top N KeyColumn from MyTable
where ...
select MyTable.* from MyTable
join #temp t on t.originalId = MyTable.KeyColumn
where t.Id between M and N
order by t.Id asc
drop table #temp