Database replication with Slony-I: alternatives to shell-script setup on Linux - database

As the title suggests: the tutorials I have found so far use SLON commands embedded in shell script files to set up the required configuration for Slony-I master-to-slave replication.
For example: Slony-I Setup Tutorial
I was wondering whether it is possible to embed the SLON setup commands and call them from another language, e.g. C/C++/Python, within a Linux environment.

Slonik scripts generally work by piping commands to the standard input of the slonik binary. Any language can replicate this, but there is little practical difference from the shell script method, and in my experience it tends to obscure what is being done. You are still, after all, writing the commands out and sending them to the slonik binary.
I have in the past written Perl modules to assist with this, but they felt very kludgey, and I have only employed them when I needed to modify replication setups dynamically. I find that this is rarely required; for the vast majority of Slony work, a shell script is much simpler to manage.
To sum up: Yes you can, but it is probably only making things more complex.
An example of how you could do it in Python would be:
import subprocess

# Start slonik and feed it the script on stdin; communicate() closes
# stdin and collects the output without risking a pipe deadlock.
p = subprocess.Popen('/usr/bin/slonik', stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(b'<slon commands here>')


How to use shp2pgsql

My question should be very simple to answer for anyone who isn't a self-taught newbie like me...
This page is a cheat sheet for a tool used in a GIS/DB environment: http://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg
I would like to create a script that users can simply click to launch the process, given the proper data. But I don't understand how to use this. It obviously doesn't work in a Python console, nor directly in the Windows console. How is it supposed to work? What language is this?
Thanks
shp2pgsql is indeed a command-line tool. It comes with your PostgreSQL/PostGIS installation (usually) and, if not accessible via the PATH variable, can (usually) be run from the bin folder of your PostgreSQL installation. You can also always 'make' the program from source in any location yourself, if needed.
EDIT:
One way to set up a script (independent of whether you use it within QGIS's own Python environment or not) would be to use Python's subprocess module (or os.system) (check the related question here) to write to the shell and execute shp2pgsql.
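For instance, here is a minimal sketch of that approach; the installation paths, SRID, shapefile, database, and table names are all assumptions to replace with your own:

import subprocess

# Hypothetical paths; point these at your PostgreSQL/PostGIS bin folder.
shp2pgsql = r"C:\Program Files\PostgreSQL\9.6\bin\shp2pgsql.exe"
psql = r"C:\Program Files\PostgreSQL\9.6\bin\psql.exe"

# Convert the shapefile to SQL and pipe it straight into psql.
loader = subprocess.Popen(
    [shp2pgsql, "-s", "4326", "parcels.shp", "public.parcels"],
    stdout=subprocess.PIPE,
)
subprocess.check_call([psql, "-d", "gisdb", "-U", "postgres"], stdin=loader.stdout)
loader.stdout.close()
loader.wait()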
A slightly more sophisticated solution for (batch) inserting (multiple) shapefiles via script would be to use ogr2ogr via the gdal/ogr module within Python (check this blog). That, however, requires a working installation of the GDAL core library and the respective Python bindings (at least for use outside the QGIS Python environment, where it is pre-installed AFAIK), which can be tiresome at times. Once installed correctly, though, it offers a powerful (I dare say almighty) toolset for geodata management and manipulation via Python.
Apart from that, the blog link I provided also mentions a batch insert script/tool (which operates ogr2ogr) in the QGIS 2.8 toolbox... maybe that can help you, either with your work directly or (via the source code) by pointing you in the direction of creating your own tool. A sketch of the gdal/ogr route follows below.
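To illustrate the gdal/ogr idea, here is a minimal sketch assuming GDAL >= 2.1 with its Python bindings installed; the connection string, layer name, and shapefile are placeholders:

from osgeo import gdal

gdal.UseExceptions()

# Hypothetical connection string and input file; adjust for your setup.
gdal.VectorTranslate(
    "PG:dbname=gisdb user=postgres",  # destination: a PostGIS database
    "parcels.shp",                    # source shapefile
    format="PostgreSQL",
    layerName="parcels",
)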

What is a good pattern to synchronize files between computers in parallel (in CentOS)?

Trying to find a good way to copy code between one "deployment" computer and several "target" computers, hopefully in parallel. The idea is that the deployment computer holds a copy of the files as they are supposed to be copied to the target servers. We would like to have copying happen in parallel, as it might involve several tens of target servers.
Our current scheme uses rsync to synchronize the directory where the files reside, in order to keep the target servers up to date with the deployment server.
So, the questions are:
What is a good / better way to do this?
What sort of tools are used to do this?
Should this problem be faced from a different angle or perspective that I'm totally missing?
Thanks very much!
Another option is pdsh, a parallel, distributed shell. It's available from EPEL, and allows running remote commands (via ssh) on multiple nodes in parallel. For example:
pdsh -w node10,node11,node12 command
Runs "command" on all three nodes in parallel. It also has a handy hostname expression feature to do the same thing with a bit less typing:
pdsh -w node[10-12] command
It also includes the pdcp command, which copies files to multiple nodes in parallel. (The pdsh package needs to be installed on all nodes for pdcp to work.)
pdcp -w node[10-12] /local/file /remote/dir/
The local file is copied to the /remote/dir on all three nodes.
We use the lftp command to sync our remote web server to our local backup machine. We wrote a Bash script to automatically sync all backups from the server to the local box, and we set that script up in cron to run nightly.
rsync is a fine way of handling this, and I would recommend moving your current process into a cron setup if it isn't already; a sketch of running it against many targets in parallel follows below.
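One simple way to push a directory to several tens of targets in parallel is to drive rsync from a small Python wrapper; the host names and paths here are placeholders:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder host list and paths; the trailing slash on the source matters to rsync.
targets = ["node10", "node11", "node12"]

def push(host):
    return subprocess.call(
        ["rsync", "-az", "--delete", "/srv/deploy/", host + ":/srv/app/"]
    )

with ThreadPoolExecutor(max_workers=10) as pool:
    codes = list(pool.map(push, targets))

print("failed:", [h for h, c in zip(targets, codes) if c != 0])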
Unison is also a tool available for setting up two-way sync, if you require that functionality.
Hope this helps!
There is a program called clusterssh, available on Debian-based operating systems (though I was able to install it on RHEL 6.3 using an RPM and resolving other dependencies), that lets you open an ssh terminal to multiple machines with a single input location, so you type once and the keystrokes go to every open terminal. Then you just have to use a simple scp. I have used this program to move a file from a development workstation to as many as 25 other workstations at the same time, but this option is only really useful if you're trying to accomplish exactly what you stated in the question, that is, copying files from one computer to several others.
This is not an effective syncing mechanism, though. If you really want syncing, the answers above would be better.

Configuration Management for FPGA Designs

Which configuration management tool is best for FPGA designs, specifically Xilinx FPGAs programmed with VHDL and C for the embedded (MicroBlaze) software?
There isn't a "best", but configuration control solutions that work for software will be OK for FPGAs - the flow is very similar. I use Subversion at work and git at home, and wrote a little on 'why' at my blog.
In other answers, binary files keep getting mentioned. The only binary files I deal with are compilation products (equivalent to software object files and executables), so I don't keep them in the version control repository; instead, for each release/tag I create a zipfile containing all the important (and irritatingly slow to reproduce) ones.
I don't think it much matters what revision control tool you use -- anything that you would consider good in general will probably be OK here. I personally use Git for a sizable Verilog + software project, and I'm quite happy with it.
What will bite you in the ass -- no matter what version control you use -- is this: The Xilinx tools don't generally respect a clean division between "input" and "output" or between (human edited) "source" and (opaque) "binary." Many of the tools like to store some state information, like a last-run time or a hash value, in their "input" files meaning that you'll get lots of false changes. Coregen does this to its .xco files, and project navigator (the main GUI) does this to its .xise files. Also, both tools have a habit of inserting or removing lines for default-valued parameters, seemingly at random.
The biggest issue I've encountered is the work-flow with Coregen: In many cases, at least one of the following is true:
You have to manually edit the HDL files produced by Coregen.
The parameters that went into Coregen are stored somewhere other than the .xco file (usually in what looks like an output file).
You have to copy-and-paste the output from Coregen into your top-level design.
This means that there is no single logical source/master location for your input to the core-generating process. So even if you have the .xco file under version control, there's no expectation that the design you're running corresponds to it. If you re-generate "the same" core from its nominal inputs, you probably won't get the right outputs. And don't even think about merging.
I suggest CM tools that support version labeling and binary files. Most software CM applications handle ASCII text files fine; they may store just a "difference" file rather than the entire file for each update.
My recommendations: PVCS, ClearCase and Subversion. DO NOT USE Microsoft SourceSafe. I don't like it because it only supports one label per revision.
I've seen Perforce and Subversion used in a couple of FPGA-intensive companies.
We use Perforce, and it's great. You can have your code that lives in Linux-land checked in side-by-side with your specs and docs that live in Windows-land. And you get branching, labels, etc.
I've seen everything from ClearCase to RCS used, and any of them is really okay for this kind of thing. The important thing is to establish a good set of check-in policies for your group, and make sure people stick to them.
And have automated nightly regressions. That way, when someone breaks the rules, they can be identified and publicly shamed.
I have personally used Perforce, Subversion, git, and ClearCase for FPGA projects. Since VHDL and C are just text files, any of them works fine. However, be sure to also capture the other project and constraint files and any libraries you use.
Also think about what to do with the outputs, e.g. log files and bitstreams. Both tend to be big, and bitstreams are binaries.
Previously I used Subversion, but I switched to git two years ago. Git handles FPGA design files just as well as it handles every other text or binary file. Git is all you need for version controlling your files and artifacts.
For building the designs, I recommend just using a single ISE project called "ise" (living in a subdirectory called "ise/"). You can take a look at my (very modest) FPGA open-source project on github for the file layout. I don't bother storing the ISE files at all since they are easy to regenerate. The only things I save are the Verilog files and some ISIM waveform config files. In other projects that use coregen I save the coregen.cgp project file and all of the *.xco scripts for regenerating cores. Then I use a Makefile for actually running coregen on the *.xco files (sketched below). There are a few other Xilinx-specific files you should version control too: *.ucf, *.coe, *.xcf, etc.
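A rough sketch of that regeneration step, written with Python's subprocess here rather than make to keep the examples in one language; the coregen batch flags (-b, -p) and the file layout are assumptions to check against your ISE version:

import glob
import subprocess

# Re-run CORE Generator in batch mode for every saved .xco script.
# "-b" (batch command file) and "-p" (project file) are assumed flags.
for xco in glob.glob("cores/*.xco"):
    subprocess.check_call(["coregen", "-b", xco, "-p", "cores/coregen.cgp"])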
I experimented with using Makefiles and the Xilinx command-line tools but found that ISE did a much better job tracking dependencies and calling the tools with the right arguments. Just don't make the mistake of trying to version control your ise/ project files or you will go mad. Xilinx has something like 300 different file types which change every release. If you want to save a file, you can try the ISE project file itself with a .xise extension. Anything that is hard to recreate, like the golden bitfile that you know works and took 6 hours to build, you might want to copy that and configuration manage it explicitly.

Is it a good way to use system() for database scripts from C?

I was searching for a way to connect to a database from a C program, but ODBC connections, logon, and so on all need libraries. Also, I am using a minimal compiler, Tiny C Compiler, which is very fast, and I do not want to use any of the ODBC logic needed to connect to and query the database.
So I am using a method which is as follows.
I use a bteq script (Teradata) which has the login, query, and logoff commands in it. (FYI, bteq is a command-line database utility; you can use it much like mysql.exe at a command prompt by going to the path of the exe. You can replace bteq with mysql.exe, etc.) And I use
system("bteq <myscript.txt >out.txt");
myscript.txt will be like the following..
.logon boxname/user,password;
select date;
.logoff;
The above script will log on to the database and query the date (you can change the query and write the script according to your database engine and your needs), writing its output to out.txt.
I then parse out.txt for the row and column I want using fgetc, fscanf, or fgets.
Finally I use the data for checking and send a mail using PHP on any server:
system("c:/server/php/php.exe sendmail.php");
We can do the same for many database engines, like mysql, etc., through a simple C program.
Now my question is: is there any flaw in the above method, and if so, how can I overcome it?
I am asking because the method is unconventional, so please give your opinions on it. I am not concerned about execution time, RAM usage, performance, etc.; I know the system() function is time-consuming, but that is not my concern anyway. I have also developed specific functions to access the query results (similar to accessing a flat file). Please tell me if you have any improvements to this method, and if you know of any flaws in it, let me know. All kinds of suggestions are welcome.
My environment is: Teradata bteq on Windows with Tiny C Compiler.
This is a perfectly fine way to access an external database, as long as your needs are simple. If you already know about the performance and memory implications of doing this, then there's not much more to say.
The method is fine: it's great to decouple the db subsystem and the parser subsystem by implementing them in an appropriate language.
There's just this tiny little thing - but I may be mistaken because I'm not familiar with bteq: the program needs a bteq script installed in the execution folder, and this script contains a username and password. If those aren't encrypted in some way, there might be a security flaw.
I wouldn't recommend this if your calling code is running setuid or setgid, but in that case you could use one of the exec() functions instead. (There are a few other considerations you may wish to take into account, all detailed in man 3 system.)

If possible how can one embed PostgreSQL?

If it's possible, I'm interested in being able to embed a PostgreSQL database, similar to SQLite. I've read that it's not possible. I'm no database expert though, so I want to hear from you.
Essentially I want PostgreSQL without all the configuration and installation. If it's possible, tell me how.
Run PostgreSQL in a background process.
Start a separate thread in your application that starts a PostgreSQL server in local mode, either by binding it to localhost on some random free port or by using Unix sockets (does Windows support those?). That should be fairly easy, something like:
system("\"C:\\Program Files\\MyApplication\\pgsql\\postgres.exe\" -D \"C:\\Documents and Settings\\User\\Local Settings\\MyApplication\\database\" -h 127.0.0.1 -p 12345");
and then just connect to 127.0.0.1:12345.
When your application quits, you can always send a SIGTERM to the postgres process and then wait a few seconds for PostgreSQL to quit (i.e. join the thread).
PS: You can also use pg_ctl to control your "embedded" database, even without threads: just run "pg_ctl start" (with appropriate options) when starting the application and "pg_ctl stop" when quitting it.
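Putting that together, a minimal sketch in Python; the paths, port, and data directory are placeholders, and initdb must already have been run on the data directory:

import subprocess
import time

# Launch a private PostgreSQL server bound to localhost on a fixed port.
server = subprocess.Popen([
    r"C:\Program Files\MyApplication\pgsql\bin\postgres.exe",
    "-D", r"C:\MyApplication\database",
    "-h", "127.0.0.1",
    "-p", "12345",
])

time.sleep(2)  # crude startup wait; polling pg_isready would be more robust

# ... the application connects to 127.0.0.1:12345 and does its work ...

server.terminate()  # SIGTERM on Unix; a hard TerminateProcess on Windows
server.wait()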
You cannot embed it, nor should you try.
For embedding, you should use SQLite as you mentioned, or the Firebird RDBMS.
Unless you do a major rewrite of the code, it is not possible to run Postgres embedded. Either run it as a separate process or use something else. SQLite is an excellent choice, but there are others: MySQL has an embedded version (see http://mysql.com/oem/), there are several Java choices, and Mac has Core Data you can write to. Hell, you can even use FoxPro. What OS are you on, and what services do you need from the database?
You can't embed it as an in-process library the way you can SQLite etc., but you can easily embed it into your application's setup using Inno Setup at http://www.innosetup.org. Search their mailing list archive and you will find that someone did most of the work for you; all you have to do is grab the zipped distro, and you can easily have PostgreSQL installed when the user installs your app. You can then use the pg_hba.conf file to restrict the server to localhost only. Not a true embedded DB, but it would work.
PostgreSQL is intended to run as a stand-alone server; it's probably possible to embed it if you hack at it hard and long enough, but it would be much easier to just run it as intended in a separate process.
HSQLDB (http://hsqldb.org/) is another db which is easily embedded. Requires Java, but is an excellent and often-used choice for Java applications.
Has anyone tried these on Mac OS X:
http://pagesperso-orange.fr/bruno.gaufier/xhtml/prod_postgresql.xhtml
http://www.macosxguru.net/article.php?story=20041119135924825
(Of course SQLite would be my embedded DB of choice as well.)
Well, I know this is a very, very old post, but if anyone has this question nowadays: you can use containers running Postgres. Here's a post that could be helpful, doing something along these lines using R:
https://rsangole.netlify.app/post/2021/08/07/docker-based-rstudio-postgres/?utm_source=pocket_mylist
Take a look at DuckDB (https://duckdb.org/docs/installation/). It is relatively new and still needs to mature, but it works pretty much like an embedded database ("in-process, serverless"), with bindings for several languages (Python, R, Java, ...).
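For example, a minimal sketch of its in-process usage from Python, assuming the duckdb package is installed (pip install duckdb); the file name is a placeholder:

import duckdb

# Open (or create) a database file in-process; no server to configure.
con = duckdb.connect("app.duckdb")  # duckdb.connect() alone gives an in-memory DB
con.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER, name TEXT)")
con.execute("INSERT INTO t VALUES (1, 'alice')")
print(con.execute("SELECT * FROM t").fetchall())
con.close()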
