Why does Hive not support stored procedures?
If it doesn't, how do we handle stored procedures in Hive? Is there any alternative solution?
(We already have an existing database in MS SQL Server.)
What about HBase? Does it support stored procedures?
First of all, Hadoop and Hive are NOT an alternative to your SQL DB. You must never consider either of these two to be used as a replacement for your RDBMS.
Hive was developed just to provide warehousing capabilities on top of an existing Hadoop cluster, keeping in mind the large base of SQL users, both expert database designers and administrators, as well as casual users who use SQL to extract information from their data warehouses. Although it provides a SQL-like interface, it is not a SQL DB. Hive is most suited to data warehouse applications, where relatively static data is analyzed, fast response times are not required, and the data is not changing rapidly. Simply put, it is meant for offline batch-processing kinds of workloads.
There is nothing like stored procedures in HBase either. But it has something called a Coprocessor, which resembles stored procedures in an RDBMS. To find out more about Coprocessors you can go here.
And as @zsxwing has said, Sqoop is just a data migration tool, nothing more. Once you switch to the NoSQL world you need to be flexible and you need to abide by the NoSQL rules.
If you could elaborate your use case a bit, maybe we can help you better.
In response to your comment:
Yes, Facebook uses Hadoop, Hive and other related tools extensively. In fact, Hive was developed at Facebook. But these are not the only things they use. Wherever they have OLTP and full transactional needs, they still depend on an RDBMS. One example is their Timeline feature, which uses MySQL. They have a gigantic (and awesome) pipeline which consists of a lot of things and not just Hadoop and Hive.
Hive and HBase do not support stored procedures. However, Hive plans to support stored procedures (HIVE-3087) in the future. HBase has no plans to support stored procedures, since it focuses on being a storage layer, more like a pure NoSQL store.
Hive UDFs can implement some of the functionality of stored procedures, though they are not a full replacement.
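For instance, one way to push a bit of per-row procedural logic into a Hive query without a stored procedure is the TRANSFORM clause with a streaming script. This is only a sketch: the table and column names (sales, user_id, amount) and the +10% adjustment are made up.

    #!/usr/bin/env python
    # normalize_amount.py - a minimal streaming script for Hive's TRANSFORM clause.
    # Hive pipes the selected columns to stdin as tab-separated lines and reads
    # tab-separated result lines back from stdout.
    import sys

    for line in sys.stdin:
        user_id, amount = line.rstrip("\n").split("\t")
        # Per-row logic that might otherwise have lived in a stored procedure:
        adjusted = "%.2f" % (float(amount) * 1.10)
        print("%s\t%s" % (user_id, adjusted))

    # Invoked from Hive roughly like:
    #   ADD FILE normalize_amount.py;
    #   SELECT TRANSFORM(user_id, amount) USING 'python normalize_amount.py'
    #     AS (user_id, adjusted_amount)
    #   FROM sales;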
Hive does not have stored procedures
Hive indeed does not have any stored procedures as explained in existing answers. However, here are 2 mitigating factors:
Hive has views
Of course it is not a proper substitute for stored procedures, but with smart use of views you can perhaps remove the need for some of your procedures.
You can call hive from another program
The last time I ran into the problem that Hive does not have stored procedures, I realized that the thing I wanted to do (loop over all columns) was something that I could also do in another program. As such, I followed this workflow:
1. Run a query to get the relevant (meta)data: Python calls Hive to get the column names
2. Use the information to build the query: Python takes all column names and builds the corresponding select statements
3. Run the resulting query: Python does a system call with hive -e
4. Optionally, go back to step 2 if needed
With views and external calls, I have so far been able to work around the lack of stored procedures.
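A minimal sketch of that workflow, assuming the hive command-line client is on the PATH; the table name is made up, and the DESCRIBE output format varies a bit between Hive versions, so the parsing may need adjusting:

    #!/usr/bin/env python
    # Hypothetical sketch of the "loop over all columns" workflow described above.
    import subprocess

    TABLE = "my_db.my_table"  # placeholder table name

    def run_hive(query):
        # -S keeps the output quiet; -e runs the query string and prints results,
        # with output columns separated by tabs.
        return subprocess.check_output(["hive", "-S", "-e", query]).decode("utf-8")

    # Step 1: get the metadata - the column names of the table.
    columns = []
    for line in run_hive("DESCRIBE %s" % TABLE).splitlines():
        if line.strip() and not line.startswith("#"):
            columns.append(line.split("\t")[0].strip())

    # Step 2: use the column names to build the next query,
    # e.g. count the distinct values in every column.
    select_list = ", ".join("COUNT(DISTINCT %s) AS %s_cnt" % (c, c) for c in columns)

    # Step 3: run the resulting query.
    print(run_hive("SELECT %s FROM %s" % (select_list, TABLE)))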
Please refer to HPL/SQL. I am looking for the same solution but have not tried it yet.
I believe data warehouse applications need stored procedure support, but set-based procedures are preferable to row-based ones.
In my personal experience, procedural support is needed when leveraging server-side program templates in a structured data warehouse application. It makes the data warehouse application easier to port between SQL/NoSQL platforms, like Netezza, MS SQL Server, Oracle, DB2, and BigInsights.
Have a look at the open-source project PL/HQL at http://www.plhql.org. It allows you to run existing SQL Server, Oracle, Teradata, MySQL, etc. stored procedures in Hive.
Related
I have a database hosted on a server, and I have to monitor it with a script containing the necessary queries and stored procedures. The metrics that I have to monitor are:
which accounts or users are connected
which transactions are active
which resources the transactions use, and when
processor use
disk use
I was told that I can do this with the MDA tables. How can I get those metrics from the ASE MDA tables, or with which stored procedures could I obtain them?
You are asking about the full functionality of a full-featured monitoring program. There are commercial tools available, like Bradmark Surveillance, or free ones, like asetune. You can also write your own scripts.
You could use built-in procedures like sp_sysmon, or you can write your own scripts that read the MDA tables and store the results. You can also try the tools delivered with the ASE server, like ASE Cockpit, Sybase Control Center (older versions), or Sybase Central (ancient ASE versions).
One tool in Sybase that may be very helpful is sp_help table_name (just replace table_name with the name of the table you want to know more about). sp_help will show you everything you need to know about the tables and columns in your database, and I've found it extremely helpful when I need to build queries but can't remember the full structure of all the tables.
Once you have an idea of what values are stored where, you can build queries that will pull the information you need. As @Adam points out in his answer above, Sybase has built-in procedures that will gather at least some of this data. The Sybase InfoCenter is also a great source of information about what's available to you already.
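If you do end up scripting this yourself, below is a rough, hypothetical sketch of polling a couple of MDA tables from Python. The DSN, credentials, and the exact table and column names (monProcess, monEngine and their columns) are assumptions to verify against the MDA documentation for your ASE version, and MDA monitoring must be enabled on the server before these tables return data.

    #!/usr/bin/env python
    # Hypothetical sketch: poll a few ASE MDA tables and print the results.
    # Connection string and table/column names are placeholders to adapt.
    import pyodbc

    conn = pyodbc.connect("DSN=my_ase_server;UID=monitor_user;PWD=secret")
    cursor = conn.cursor()

    # Connected processes/users.
    cursor.execute("SELECT SPID, Login, Command FROM master..monProcess")
    for spid, login, command in cursor.fetchall():
        print("spid=%s login=%s command=%s" % (spid, login, command))

    # Engine (CPU) usage.
    cursor.execute("SELECT EngineNumber, CPUTime, IdleCPUTime FROM master..monEngine")
    for engine, cpu, idle in cursor.fetchall():
        print("engine=%s cpu=%s idle=%s" % (engine, cpu, idle))

    conn.close()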
We have MS SQL Server as the primary option for various databases, and we run hundreds of stored procedures on a regular basis.
Now we are moving to a completely big-data stack. We are using Spark for the batch jobs. But we have already invested enormous effort in creating those stored procedures. Is there a way to reuse the stored procedures on top of Spark? Or is there an easy way to migrate them to Spark instead of writing them from scratch?
Or does any framework, like the Cloudera distribution / Impala, address this requirement?
No, there's not as far as I can tell. You may be able to use a very similar logical flow but you're going to need to invest serious time and effort to convert the T-SQL to Spark. I would recommend going straight to Scala and not wasting time with Python/PySpark.
My rule of thumb for the conversion would be to try to do anything that's SQL in the stored procs as SQL in Spark (sqlContext.sql("SELECT x FROM y")) but be aware that Spark DataFrames are immutable so any UPDATE or DELETE actions will have to be changed to output a new modified DataFrame.
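To illustrate the last point: a T-SQL UPDATE has no in-place equivalent, so the usual pattern is to derive a new DataFrame with the changed values and write it out. A minimal sketch of the idea, shown in PySpark for brevity even though the answer recommends Scala (the orders table and its columns are made up):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sp-migration-sketch").getOrCreate()

    orders = spark.table("orders")  # assumption: a metastore table named "orders"

    # T-SQL: UPDATE orders SET status = 'overdue' WHERE due_date < current_date
    # Spark: produce a NEW DataFrame with the modified column instead of mutating rows.
    orders_updated = orders.withColumn(
        "status",
        F.when(F.col("due_date") < F.current_date(), F.lit("overdue"))
         .otherwise(F.col("status")),
    )

    # Write the result out (e.g. overwrite a table) rather than updating in place.
    orders_updated.write.mode("overwrite").saveAsTable("orders_updated")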
I would like to transfer a whole database from Informix to Oracle. We have an application which works on both databases; one of our customers is moving from Informix to Oracle and needs to transfer the whole database (the structure is the same).
We also often need to transfer data between Oracle/MS SQL/Informix, sometimes only one table and not the whole database.
Does anybody know about any good program which does this kind of job?
The Pentaho Data Integration ETL tools are available as open source (also known under the former name "Kettle") for cross-database migration and many other use cases.
From their data sheet:
Common Use Cases
Data warehouse population with built-in support for slowly changing dimensions, junk dimensions
Export of database(s) to text-file(s) or other databases
Import of data into databases, ranging from text-files to Excel sheets
Data migration between database applications
...
A list of input / output data formats can be found in the accepted answer of this question: Does anybody know the list of Pentaho Data Integration (Kettle) connectors list?
It supports all databases with a JDBC driver, which means most of them.
Check this question of mine, it includes some very good ideas: Searching for (freeware) database migration tool
You could give the Oracle Migration Workbench a try; see http://download.oracle.com/docs/html/B15858_01/toc.htm. If you want to read Informix data into Oracle on a regular basis, using Heterogeneous Services might be a better option. Check for hs4odbc or dg4odbc, depending on the Oracle release you have.
I hope this helps,
Ronald.
I have done this in the past and it is not a trivial task. We ended up writing each table out to a pipe-delimited flat file and reloading each table into Oracle with Oracle SQL*Loader. There were a ton of Perl scripts to scrub the source data and shell scripts to automate the process as much as possible and run things in parallel as well.
Gotchas that can come up:
1. Pick a delimiter that is as unique as possible.
2. Try to find data types that match the Informix ones as closely as possible, e.g. date vs. timestamp.
3. Try to get the data as clean as possible prior to dumping out the flat files.
4. HS (Heterogeneous Services) will most likely be too slow.
This was done years ago. You may want to investigate GoldenGate (now owned by Oracle) software, which may help with the process (GoldenGate did not exist when I did it).
Another idea is to use an ETL tool to read Informix and dump the data into Oracle (Informatica comes to mind).
Good luck :)
sqlldr - Oracle's import utility
Here's what I did to transfer 50 TB of data from MySQL to Oracle: I generated CSV files from MySQL and used the sqlldr utility in Oracle to load all the data from those files into the Oracle DB. It is the fastest way to import data. I researched this for a few weeks, ran a lot of benchmark test cases, and sqlldr is hands down the best and fastest way to import into Oracle.
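As a rough illustration of that pipeline (the driver, connection details, and the orders table/columns are made up, and the SQL*Loader control file in the comment is only an outline to adapt):

    #!/usr/bin/env python
    # Hypothetical sketch: dump one MySQL table to a pipe-delimited file for sqlldr.
    import csv
    import pymysql  # assumption: any MySQL driver with a DB-API cursor works

    conn = pymysql.connect(host="mysql-host", user="etl", password="secret", database="sales")
    cursor = conn.cursor()
    cursor.execute("SELECT id, customer, amount FROM orders")  # made-up table/columns

    with open("orders.csv", "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        while True:
            rows = cursor.fetchmany(10000)  # stream in batches to keep memory flat
            if not rows:
                break
            writer.writerows(rows)
    conn.close()

    # Then load into Oracle with a control file (orders.ctl) along these lines:
    #   LOAD DATA
    #   INFILE 'orders.csv'
    #   INTO TABLE orders
    #   FIELDS TERMINATED BY '|'
    #   (id, customer, amount)
    # and run: sqlldr userid=scott/tiger control=orders.ctl direct=true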
We have an application which has metadata information stored in a database (some tables with relations between them). The metadata can be edited through a web app or by directly manipulating values in the SQL Server database.
The problem: metadata changes need to be merged between different environments (test, staging, production, etc.). There are tools (e.g. Redgate) that help, but it is still quite a lot of work to compare databases if autogenerated IDs are being used (as is the case now in our DB; and yes, one way is to use natural keys to make comparison easier).
However, our metadata does not necessarily have to be stored in a SQL database - it could be stored as documents in a NoSQL database (MongoDB, CouchDB, RavenDB) or even a simple XML database (maybe Berkeley DB XML?). Storing it as an XML file seems like it would work (as it is easier to compare and merge files rather than databases) but may not be a good option, as there need to be some concurrency mechanisms and some degree of transaction support.
We do not need replication to other servers, there is no need for high availability, etc.
The requirements to store data:
Some kind of ACID guarantees
Should run on Windows
Easy comparison (bi-directional sync)
(optional) A GUI to see what is in the database
(optional) Export to file (JSON, XML)
What are the options?
Why conflate the storage with the representation you are performing the diff on?
I'd keep everything in SQL, but when it comes time to compare, select all the important data (not the IDs) into an XML format and use an XML diffing tool (or a CSV format and a plain-text comparison tool).
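A small sketch of that idea: export the comparable data (leaving out the autogenerated IDs) from each environment to a deterministically ordered CSV and diff the two files. The query, DSNs, and table/column names below are placeholders.

    #!/usr/bin/env python
    # Hypothetical sketch: dump comparable metadata from two environments and diff it.
    import csv
    import difflib
    import pyodbc  # assumption: SQL Server reached via ODBC

    # Made-up query: select only the comparable columns, in a stable order.
    QUERY = "SELECT name, type, config_value FROM metadata_items ORDER BY name, type"

    def dump(conn_str, out_path):
        conn = pyodbc.connect(conn_str)
        rows = conn.cursor().execute(QUERY).fetchall()
        conn.close()
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)

    dump("DSN=metadata_test", "test.csv")  # placeholder DSNs
    dump("DSN=metadata_prod", "prod.csv")

    diff = difflib.unified_diff(open("test.csv").readlines(),
                                open("prod.csv").readlines(),
                                fromfile="test", tofile="prod")
    print("".join(diff))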
I have never used it, but CouchDB has built-in support for bidirectional syncing between databases.
I am developing a data-driven website and quite a lot of the programming logic resides in database stored procedures and functions. I find myself changing the stored procedures/functions quite a lot in order to fix bugs or add new functionality. The data (tables) has remained mostly untouched.
The issue I am having is keeping track of versions of the stored procedures/functions. Currently I increment the version of the whole database when I make a set of changes. As the data is huge (10 GB), I run into issues having to run the development and release versions of the database in parallel.
I would like to put all the stored procedures and functions in one database and keep the data in a separate database, so that I can better manage the changes.
I am sure others have encountered similar situations, and I would appreciate suggestions on how best to handle this.
I would also recommend using source control keyword expansion in your stored procedures ($Version:$)
That way you can eyeball it, grep for it, search syscomments, etc. to see what version you have in your deployed database.
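For example, once the keyword has been expanded into the procedure bodies, a small script along these lines could pull the versions back out of syscomments (SQL Server shown; the connection details are placeholders):

    #!/usr/bin/env python
    # Hypothetical sketch: list deployed procedures whose source contains the
    # expanded source-control keyword (e.g. "$Version: ...$").
    import pyodbc

    conn = pyodbc.connect("DSN=my_sqlserver")  # placeholder connection
    sql = """
        SELECT OBJECT_NAME(c.id) AS proc_name, c.text
        FROM syscomments c
        WHERE c.text LIKE '%$Version%'
    """
    for name, text in conn.cursor().execute(sql):
        # Print just the line that carries the version keyword.
        version_lines = [l for l in text.splitlines() if "$Version" in l]
        print(name, version_lines[0] if version_lines else "(keyword not found)")
    conn.close()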
You can version just the schema dumps. In combination with source control keyword expansion (as suggested by Rawheiser), you just look at what version you have in the database, generate a diff, and apply it.
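As a sketch of the schema-dump side, the procedure and function definitions can be scripted out to one file per object (SQL Server's sys.sql_modules is used here; other databases have equivalent catalogs), so that an ordinary diff or source-control tool can track the changes:

    #!/usr/bin/env python
    # Hypothetical sketch: script every procedure/function definition out to a file
    # so the files can be committed to source control and diffed between versions.
    import os
    import pyodbc

    conn = pyodbc.connect("DSN=my_sqlserver")  # placeholder connection
    sql = """
        SELECT OBJECT_NAME(m.object_id) AS name, m.definition
        FROM sys.sql_modules m
    """
    os.makedirs("schema_dump", exist_ok=True)
    for name, definition in conn.cursor().execute(sql):
        with open(os.path.join("schema_dump", "%s.sql" % name), "w") as f:
            f.write(definition)
    conn.close()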
Also, there are several excellent tools to compare databases and their schemas, generate DDL scripts, etc.: SQL Workbench, Power Architect, DdlUtils and Redgate SQL Compare, to name a few. SQL Compare is likely to work best with SQL Server, although all the others are FOSS and provide a higher ROI (in terms of time spent learning and what you can do with them) as they are platform- and RDBMS-independent.
Finally, I have to say... I understand that the immediate results you get with logic in the DB are tempting, but once you've gone beyond more than a couple of procedures in the database, you're setting yourself up for quite a lot of pain, sifting through what easily turns into spaghetti code and locking your application to a single database vendor. You might have your reasons, but I've been there and didn't like it very much. Logic can live very nicely in a different layer.
For source control you have several options:
1. Use a Visual Studio Database project.
2. Use SQL Server 2005's built-in support for source control.
3. Use a third-party tool such as SQL Compare.
IMO, option 1 is preferable.