Advantages of Netezza client tools over Aginity

My team recently started working with Netezza. I'm responsible for loading data into the database in the most efficient manner, and they want me to look into things such as automating the loading of data.
Right now I'm using Aginity as an interface to load data, but I'm wondering if there are any advantages to using the Netezza client tools (nzload and others) instead of Aginity, whether for loading data or anything else. When should I use one over the other?

Aginity is nice for exploration and code development.
IMHO you’ll need a proper (but lightweight) scripting language to do any kind of automated loading/extraction/manipulation of data.
Python, bash, powershell - doesn’t really matter.
Automation requires error handling and simple decision making, combined with the ability to manipulate SQL statements dynamically, and all scripting languages can do that.
Whether you call nzsql as a command-line utility from that tool or use an ODBC or JDBC capability in said scripting language is not of any consequence either.
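For illustration, a minimal Python sketch of that pattern: wrap nzsql in a function, check the return code, and build the SQL dynamically. The flag names, credentials, and table names here are assumptions, not a tested recipe - check nzsql's help for your client version. The same structure works if you swap the subprocess call for a pyodbc or JDBC connection.

```python
# Hypothetical example: automate a load by shelling out to nzsql.
# Flag names (-d/-u/-pw/-c) and object names are illustrative only.
import subprocess
import sys

def run_sql(sql, database="SALESDB", user="loader", password="secret"):
    """Run one SQL statement through nzsql; return True on success."""
    result = subprocess.run(
        ["nzsql", "-d", database, "-u", user, "-pw", password, "-c", sql],
        capture_output=True, text=True)
    if result.returncode != 0:
        print("nzsql failed: " + result.stderr.strip(), file=sys.stderr)
    return result.returncode == 0

# Build statements dynamically and make a simple go/no-go decision.
for table in ["customers", "orders"]:
    if not run_sql(f"INSERT INTO {table} SELECT * FROM stage_{table}"):
        sys.exit(f"Load of {table} failed; stopping the batch.")
```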

Related

Database access in C

I am developing an application completely written in C. I have to save data permanently somewhere. I tried file storage, but I feel it's a really primitive way to do the job, and I don't want to save my sensitive data in a simple text file. How can I save my data and access it back in an easy manner? I come from a JavaScript background and would prefer something like JSON. I will be happy with something like PostgreSQL also. Give me some suggestions. I am using gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3.
SQLite seems to meet your requirements.
SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views is contained in a single disk file. The database file format is cross-platform - you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures. These features make SQLite a popular choice as an Application File Format. Think of SQLite not as a replacement for Oracle but as a replacement for fopen().
Check out the quickstart
http://www.postgresql.org/docs/8.1/static/libpq.html
libpq is the C application programmer's interface to PostgreSQL. libpq is a set of library functions that allow client programs to pass queries to the PostgreSQL backend server and to receive the results of these queries.
I would recommend SQLite. I think it is a great way of storing local data.
There are C library bindings, and its API is quite simple.
Its main advantage is that all you need is the library. You don't need a complex database server setup (as you would with PostgreSQL). Also, its footprint is quite small (it's also used a lot in the mobile development world: iOS, Android, and others).
Its drawback is that it doesn't handle concurrency that well. But if it is a local, simple, single-threaded application, then I guess it won't be a problem.
MySQL embedded or BerkeleyDB are other options you might want to take a look at.
SQLite is a lightweight database. This page describes the C language interface:
http://www.sqlite.org/capi3ref.html
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain.
SQLite is a popular choice because it's lightweight and speedy. It also offers a C/C++ interface (along with bindings for a bunch of other languages).
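To get a feel for the embedded model those answers describe (one library, one ordinary file on disk, no server), here is a rough sketch using Python's built-in sqlite3 module; the C API's sqlite3_open / sqlite3_exec / sqlite3_close calls follow the same open-execute-close shape. The file and table names are made up.

```python
# Rough sketch of the embedded-SQLite flow; the database is just a local file.
# Table and file names are illustrative.
import sqlite3

conn = sqlite3.connect("app_data.db")  # creates the file if it doesn't exist
conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT OR REPLACE INTO settings VALUES (?, ?)", ("theme", "dark"))
conn.commit()

for key, value in conn.execute("SELECT key, value FROM settings"):
    print(key, value)
conn.close()
```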
Everyone else has already mentioned SQLite, so I'll counter with dbm:
http://linux.die.net/man/3/dbm_open
It's not quite as fancy as SQLite (e.g., it's not a full SQL database), but it's often easier to work with from C, as it requires less setup.
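For a sense of how small the dbm model is, here is a sketch using Python's standard-library dbm module, which wraps the same family of key/value libraries; in C the equivalent calls are dbm_open, dbm_store, dbm_fetch, and dbm_close from the man page above. The file name and keys are invented.

```python
# Sketch of the dbm key/value model: no SQL, just store and fetch by key.
# The file name and keys are illustrative.
import dbm

with dbm.open("contacts", "c") as db:  # "c" = create the file if needed
    db["alice"] = "555-1234"           # keys and values are stored as bytes
    db["bob"] = "555-5678"
    print(db[b"alice"].decode())       # -> 555-1234
```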

Databases in offline software?

I'm primarily a web developer, currently learning C and planning on going into C++ in a year or so when I feel absolutely confident with C (Note: I'm not saying I'll be a master at C, just that I'll understand it in a fair amount of depth and will retain it properly rather than forgetting it when I see a new language).
My question is, how are offline/networked applications written with database functionality? I've built many a database-driven website in PHP and MySQL and would like to know how to use databases in my C projects - a lot of the applications I want to write rely more on content management than on processing data as such. What database formats are available to me? What should I be looking at to build a simple contact database, for example?
Thanks in advance.
I'd suggest SQLite for a file-based database. Mongo is pretty awesome too if you run it locally, but it is still networked.
For a small application SQLite might be a good option for you - it is part of your application and not dependent on other software, but as a database it is fairly weak (no triggers, no stored procedures, AFAIK).
If you are looking for something more substantial (especially when it involves multiple users) you should be looking at MySQL or SQL Server. These can be accessed directly from their respective APIs or via some kind of common mediator such as ODBC.
Your question is really very open; much application software depends on relational database technology at some level, but the OS and the required task usually dictate the best choices.
Going the SQL route with offline applications in C is not straightforward. While database storage brings advantages, in terms of reliability for example, it also adds conversion steps when saving and loading your data, simply by using SQL.
The question is why you would want to build SQL commands as character strings to load/save data that is treated as binary in your program, and that you could store as binary directly in your system's local storage. It costs!
On the other side, if you already know SQL well, then you'll only have to learn one of the several APIs for accessing a database (SQLite, MySQL, ...) from C to get started.

Simplest way to develop an app that can use multiple types of databases?

I have a project for a class which requires that if a database is used, options exist for the user to pick a database to use which could be of a different type. So while I can use e.g. MySQL for development, in the final version of the project, the user must be able to choose a database (Oracle, MySQL, SQLite, etc.) upon installation. What's the easiest way to go about this, if there is an easy way?
The language used is up to me as long as it's supported by the department's Linux machines, so it could be Java, PHP, Perl, etc. I've been researching and found info on ODBC, JDBC, and SQLJ (such as this) but I'm quite new to databases so I'm having a hard time figuring out what would be best for my needs. It's also possible there may not be a simple enough way to do this; the professor admitted he's not a database guy either and he seemed to think it would be easy without having a clear idea of what it would take.
This is for a web app, but it ought to be fairly straightforward, using for example HTML and JavaScript on the client side and Java with a MySQL database on the server side. No mention has been made of frameworks, so I think they're overkill. I have the option of using Tomcat or Apache if necessary, but the overall idea is to keep things simple, and everything used should be installable/changeable/configurable with just user-level access. So something like having to recompile PHP to use ODBC would be out, I think.
Within these limitations, what would be the best way (if any) to be able to interact with an arbitrary database?
The issue I think you will have here is that SQL is not truly standard. What I mean is that vendors (Oracle, MySQL, etc.) have included types and features that are not part of standard SQL in order to "tie you in" to their DB, such as Oracle's VARCHAR2 and so on.
When I was at university, my final year project was to create an application that allowed users to create relational databases using JDBC with a Java front-end.
The use of JDBC was very simple, but the issue was finding enough SQL features/types that all the vendors have in common, so that users could switch between them without any issues. A way around this is to implement modules to deal with vendor-specific issues and write ways to translate between them. So for example you may develop a database for MySQL with lots of MySQL-specific code in it, but then you may want to use Oracle, and then there are issues which you would need to resolve.
I would spend some time looking at what core of the SQL standard all the vendors implement and then code for those features. But I think the technology you use wouldn't be the issue, but rather the SQL you create.
Hope this helps; apologies if it's not helpful!
Well, you can go two ways (in Java):
You can develop your own classes to work with different databases and load their drivers in JDBC. This way you will create a data access layer for yourself, which takes some time.
You can use Hibernate (or other ORMs). This way Hibernate will take care of things for you and you only have to know how to use Hibernate. Learning Hibernate may take some time, but when you get used to it, it can be very useful for your future projects.
If you want to stick with Java, there's Hibernate (which wouldn't require a framework). Hibernate is fairly easy to use. You write HQL, which gets translated to the SQL needed for the database you're using.
Maybe use an object relational mapper (ORM) or database abstraction layer (DAL). They are designed to provide a standard API to multiple database backends, making it possible to switch between different backends with minimal or no changes to your code. In Python, for example, a popular ORM is SQLAlchemy, and an excellent DAL is the web2py DAL (it's part of the web2py framework but can be used as a standalone DAL outside the framework as well). There are many other options in other languages as well.
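To make the ORM/DAL idea concrete, here is a small sketch in SQLAlchemy 1.4+ style (Python, since that's one of the libraries mentioned above): the model and queries stay the same and only the connection URL changes per backend. The class, table, and URLs are invented for the example.

```python
# Minimal sketch: swap the database backend by changing only the URL.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Contact(Base):
    __tablename__ = "contacts"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

# Pick the backend at install/run time from configuration, e.g.:
#   sqlite:///contacts.db
#   mysql+pymysql://user:pass@localhost/mydb
#   postgresql+psycopg2://user:pass@localhost/mydb
engine = create_engine("sqlite:///contacts.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Contact(name="Alice"))
    session.commit()
    print(session.query(Contact).count())
```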
Use a framework with a database abstraction layer and ORM. Try Symfony or Rails.
There are a lot of object-relational mapping frameworks, unless you prefer plain JDBC. For simple/small applications this should work fine.

What database is a good progression from MS Access for ColdFusion?

All my (home) CF learning has so far been done using Access as a database, and as far as the DB goes I "get it". There's no database server, no need to log on to the database or anything, setting up table relationships is easy and visual, and it's essentially free to deploy.
However, I'm now working on an application that's likely to be used across several businesses and probably up to 50 concurrent users. I've heard that Access really isn't up to multi-user use or production use in an app. What would you recommend as more suitable: preferably easy to grasp, with minimal tweaking needed for my SQL (I used a tool to convert to MySQL and it certainly handles concatenation differently; I don't want to have to do too much debugging), a visual interface available, scalable, backupable, and whatever else I need that I don't yet know I need!
I recommend SQL Server 2008 Express. It has a great feature set, graphical UI admin tools, and you can step up easily from it to more commercial solutions as you continue to grow.
You could go with either MySQL or Microsoft SQL Server Express. Both are free and both work well.
Unfortunately you're going to have headaches converting your database no matter what you go with. Microsoft Access doesn't use standard SQL so string concatenation, functions, etc. will be different.
If you're merely using Access as a database, then naturally, Microsoft SQL Server is closest in concept (and SQL dialect) to Access.
However, if your focus is on web development, the LAMP stack and specifically MySQL are a better choice. You should at least have an idea of some basic administration.
My experience is that the main challenge is going to be with data types and with string operations (it sounds like you have similar issues).
Generally, strive to write SQL that is portable, so it's good to read up on things that make it not portable or avoid using special tricks. If you can't do that, then abstract away using code, or even use an ORM tool.
The main benefit of Access, IMHO, is its built in support for generating UIs and reports, while hiding much of the underlying SQL. Most of the Microsoft languages (especially VB) offer similar capabilities. If you've used the UI or the forms, you'll now have to both learn a language or tool for generating UIs/forms/pages, and how to use SQL. Since you mentioned ColdFusion, I'll assume you have most of these skills already.
Before doing anything at all, I'd choose a database abstraction layer and refactor the existing code to use that.
Then it's relatively trivial to swap out your back end pretty much completely ecumenically.
If you use prepared statements, you'll also get protection against many forms of SQL injection.
I would also point out that a Jet/ACE back end was never a good choice for a web-based application because of the mismatch between the threading models and memory management of the web server and the Jet/ACE database engine (though if you use ADO, Jet is reported to be threadsafe; I don't quite understand how you can magically transform a non-threadsafe db engine into a threadsafe one with a data interface layer, but Michael Kaplan said it was so, and he is the type of person you can trust on that type of subject).
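As a quick illustration of the prepared-statement point above, here is a tiny Python sketch (the concept is the same with cfqueryparam in ColdFusion or parameter markers in any other client); the table and input are made up.

```python
# Illustrative only: the same query built by concatenation vs. a bound parameter.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "alice' OR '1'='1"

# Dangerous: the input becomes part of the SQL text itself.
unsafe = "SELECT * FROM users WHERE name = '" + user_input + "'"

# Safe: the driver sends the value separately from the statement.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] -- the malicious string matches nothing
```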
I don't have any experience with Microsoft SQL Server Express, so I can't speak for it. However, I do use MySQL with ColdFusion 8 and I'm continually impressed with the capability, flexibility, ease, community support, and speed of MySQL. phpMyAdmin is pretty straightforward and a good web interface for MySQL. My $.02; spend it how you may.

ETL Tools and Build Tools

I am familiar with automated software build tools (such as Automated Build Studio). Now I am looking at ETL tools.
The one thing that crosses my mind is that I can do anything an ETL tool does by using a software build tool. ETL tools are tailored for data loading and manipulation, for which a lot of scripts are needed in order to do the job. A software build tool, on the other hand, is versatile enough to do any job, including writing scripts to extract, transform and load any data from any format into any format.
Am I right?
It is correct that you can roll out your own ETL scripts written using a development tool of your preference. Having said that, ETL jobs are frequently large (for lack of a better word) and demand considerable administration and attention to minute details (like programming). ETL tools allow the developer to focus on ETL tasks, as opposed to writing and debugging code, although that's part of it too. There are some open-source tools out there, so you can get a feel for what an average tool does before jumping into custom development. For example, more expensive tools provide data lineage, meaning you can (graphically) track every field on a report back to the originating table through all transformations (versions included); after a corporate merger that's quite a task to do.
For example, Pentaho has a community edition; if you have MS SQL Server, you can get SSIS. Also see if you can find something here.
The benefit of an ETL tool is maximized if you have many processes to build (I like the analogy in jsf80238's post about hammering in 100 nails). A key benefit of real ETL tools is the metadata they generate and the operational support. Writing your scripts in Perl/Ruby/etc. is fairly easy, but breaks down when problems need to be tracked down or someone other than the author has to figure out what's wrong. The ability for admin/support staff to quickly see what went wrong is what's worth paying money for. I have used Microsoft's SSIS (2005 - OK) and the latest Pentaho PDI (quite good). The Pentaho ETL GUI is used by business users (without IT support 99% of the time) at my workplace, and has replaced a tangle of SQL scripts and spreadsheets. Say what you like about the rest of the Pentaho stack, but the ETL component is, in my opinion, excellent "bang for buck".
The whole business of ETL is based on the premise that the source of the data is incompatible with the destination data source. And many times, the folks who dump the source data may not be thinking that this data needs to be collected and aggregated. This is why the whole business of ETL exists.
A commercial ETL tool will not magically read the source input and transform data according to the rules of the destination database. Rules have to be defined and fed into the ETL tool. Interestingly, many companies offer training(!) on how to use their proprietary scripting language. So it is not always that easy. But for non-programmers, maybe this is the preferred route.
Personally, I think that it is always easier to write your own ETL tool in a language like Perl. Simply write a state-machine algorithm to rip through the source data and convert it to the desired format. I use Perl to FTP into machines, read in the files, transform the data, and then load it into the database. This is always a superior solution and much faster if one is proficient in Perl or similar, or can hire someone who knows Perl.
And one final point: start with the end in mind. Dump your source data in a structured format to help out the analysis group in your company that wants to aggregate and study it. This will make the ETL program easier and faster to develop.
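A bare-bones sketch of that roll-your-own approach (the answer above uses Perl; this is a Python equivalent with invented file names, transform rule, and target table): extract from a flat file, normalize each record, and bulk-insert into the target.

```python
# Illustrative extract-transform-load script; names and rules are made up.
import csv
import sqlite3  # stand-in for whatever driver your warehouse actually needs

def transform(row):
    """Normalize one source record into the target layout."""
    return (row["customer_id"].strip(),
            float(row["amount"].replace(",", "")),
            row["order_date"])

conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales "
             "(customer_id TEXT, amount REAL, order_date TEXT)")

with open("sales_extract.csv", newline="") as src:
    rows = [transform(r) for r in csv.DictReader(src)]

conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```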
I like Damir Sudarevic's answer and wanted to add that your choice of tool might also depend on how much work you have in front of you. If you have the occasional ETL task and are already familiar with a tool that will allow you to accomplish that task, use the tool you already know (this approach assigns a zero value to learning a new tool, which is perhaps undervaluing new knowledge). If you have a lot of ETL tasks, the up-front investment of learning a new tool might very well pay off. You can use pliers to drive a nail, and if you have only one nail you can use the pliers. If you have to drive 100 nails, get yourself a hammer.
You can also do anything ETL tools can do with code. :-)
Both tool categories you mention can be used to solve this problem, but they are optimized for the class of problems they are trying to solve:
ETLs tend to come with a library of data manipulation tools (relational calculus, in-line computations, etc.), are optimized to handle large quantities of data, and have job management features (important if this isn't a single one-off data migration).
Build tools (for me, Ant comes to mind as a prototypical example) could do similar tasks, but are focused on compilation, file organization and manipulation, and packaging.
