COPY gzipped HDFS data into a Vertica database

I want to COPY HDFS (gzipped) data into Vertica.
I am using the following command, but it's not working:
COPY pix001 SOURCE Hdfs(url='http://hadoopnemenode.com:50070/webhdfs/v1/bq-upload/pix/m=03/d=01/03-01.txt.gz', username='xyz') GZIP DELIMITER E'\t';
Does anyone know a better way to do this?
Thanks

Yes, there is GZIP support; you just need to compile the GZip filter library. [The Vertica guys helped me figure it out in the end. :)]
Here are the steps:
# cd /opt/vertica/sdk/examples/
# make
# vsql -f FilterFunctions.sql
dbadmin=> CREATE LIBRARY GZipLib AS '/opt/vertica/sdk/examples/build/GZipLib.so';
dbadmin=> CREATE FILTER GZip AS LANGUAGE 'C++' NAME 'GZipUnpackerFactory' LIBRARY GZipLib;
COPY abc002 SOURCE Hdfs(url='http://hadoop-namenode.com:50070/webhdfs/v1/03-01.txt.gz', username='xyz') filter GZip() DELIMITER E'\t';
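As a quick sanity check (not part of the original answer), you can confirm the library was registered before running the COPY; this assumes vsql is on the PATH and you are connected as dbadmin:
vsql -c "SELECT lib_name FROM user_libraries;"
The new GZipLib entry should appear in the output.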

Adding to Roy's answer,
the steps for the build (the make step, #2 in Roy's answer) are:
sudo apt-get install g++
sudo apt-get install zlib1g-dev # for gzip
g++ -lz -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o /opt/vertica/sdk/examples/build/GZipLib.so /opt/vertica/sdk/examples/FilterFunctions/GZip.cpp /opt/vertica/sdk/include/Vertica.cpp
Hint: the -lz flag links GZipLib.so against the zlib library.
See the Vertica documentation on compiling UDFs.
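A small optional check before loading the library into Vertica (not part of the original steps) is to make sure the shared object was actually produced and built for the right architecture:
ls -l /opt/vertica/sdk/examples/build/GZipLib.so
file /opt/vertica/sdk/examples/build/GZipLib.so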

It doesn't look like copying from HDFS supports GZIP:
https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/HadoopIntegrationGuide/HDFSConnector/LoadingDataFromHDFS.htm
I don't see it in that doc, in any case.


macOS: library not found for -lpaho-mqtt3c

What I have done:
git clone https://github.com/eclipse/paho.mqtt.c
cd paho.mqtt.c
make
sudo make install
Then, I tried compiling a simple C program that includes the MQTT C library like this:
#include <MQTTClient.h>
The command I used was:
$ gcc -o mqttTest mqttTest.c -lpaho-mqtt3c
What I got was ...
... even though the libraries are clearly present in /usr/local/lib.
What do I need to do to compile my code?
I already tried adding -L/usr/local/lib to the compile command, to no avail.
I found the answer on GitHub. See VilleViktor's post here: https://github.com/eclipse/paho.mqtt.cpp/issues/150
All I had to do was:
$ mv /usr/local/lib/libpaho-mqtt3a.so.1.0 /usr/local/lib/libpaho-mqtt3a.so.1
$ mv /usr/local/lib/libpaho-mqtt3as.so.1.0 /usr/local/lib/libpaho-mqtt3as.so.1
$ mv /usr/local/lib/libpaho-mqtt3c.so.1.0 /usr/local/lib/libpaho-mqtt3c.so.1
$ mv /usr/local/lib/libpaho-mqtt3cs.so.1.0 /usr/local/lib/libpaho-mqtt3cs.so.1
Maybe that saves someone else a lot of time on Google ...
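A gentler variant of the same fix (a sketch, not from the linked issue) is to create the versioned name as a symlink instead of renaming the file, then relink and check what the binary actually picked up:
$ ln -sf /usr/local/lib/libpaho-mqtt3c.so.1.0 /usr/local/lib/libpaho-mqtt3c.so.1
$ gcc -o mqttTest mqttTest.c -L/usr/local/lib -lpaho-mqtt3c
$ otool -L mqttTest
otool -L lists the dynamic libraries the binary references, so you can confirm that libpaho-mqtt3c resolved from /usr/local/lib.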

Using readline statically in C (compilation and linkage)

I would like to link readline statically with my program. I found this page about compiling readline from source, http://www.bioinf.org.uk/software/profit/doc/node17.html, but I'm a bit confused about the process.
The page talks about a READLINELIB variable in the makefile, but I can't find it.
Could someone show me how to use readline statically in my program: what to put in my Makefile to compile readline from source and link it with my program?
Thank you.
Finally I figured out the proper way to do it. Using the --prefix option of the configure script, I can tell it where to put/install the library. My problem with installation was that I don't have the rights to access directories other than my $HOME, so there is no problem doing this:
configure --prefix=$HOME/libreadline && make && make install-static
Then in my program I include the headers from $HOME/libreadline/include.
To compile the main program I link it with the archive libraries $HOME/libreadline/lib/libreadline.a and $HOME/libreadline/lib/libhistory.a.
Also, since the readline headers use directives like #include <readline/readline.h>, which don't correspond to the location of the files, I must tell the compiler where to look for included files. To do this, before running gcc, I set the variable C_INCLUDE_PATH to $HOME/libreadline/include.
Finally, since readline uses the ncurses dynamic library, I must tell the compiler to link it dynamically with my program. The same may apply to termcap.
The overall process looks like:
configure --prefix=$HOME/libreadline && make && make install-static
export C_INCLUDE_PATH=$HOME/libreadline/include
gcc -o myprogram myprogram.c $HOME/libreadline/lib/libreadline.a $HOME/libreadline/lib/libhistory.a -lncurses -ltermcap
I was confused about what make install does: it only copies files to the location provided to configure. By default it installs into system directories like /usr/include, but with the --prefix option make install copies all the files into the specified directory.
Installation is just copying the compiled program, libraries, docs, etc. to a certain location, by default the standard system directories. If, like me, you don't have access to those directories, you can "install" into your own directory and then do whatever you want with it.
I could have installed the dynamic library instead of the static one, but then I would have had to modify the LD_LIBRARY_PATH environment variable.
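For comparison, the dynamic variant mentioned above would look roughly like this (a sketch using the same --prefix layout; it assumes the shared libreadline was built and installed under $HOME/libreadline/lib):
configure --prefix=$HOME/libreadline && make && make install
export C_INCLUDE_PATH=$HOME/libreadline/include
gcc -o myprogram myprogram.c -L$HOME/libreadline/lib -lreadline -lncurses
export LD_LIBRARY_PATH=$HOME/libreadline/lib:$LD_LIBRARY_PATH
./myprogram
Here -L tells the linker where to find libreadline.so at link time, and LD_LIBRARY_PATH tells the dynamic loader where to find it at run time.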
Get the readline source:
wget http://git.savannah.gnu.org/cgit/readline.git/snapshot/readline-master.tar.gz
tar zxvf readline-master.tar.gz
cd readline-master/
The examples folder does not have a Makefile; one is generated from the Makefile.in template.
The following steps build the static and dynamic libs and put them under /usr/local/lib:
./configure
make
sudo make install
You may have to install curses first: sudo apt-get install libncurses5-dev
Use the following makefile (a stripped-down version of the one in the examples folder).
(Make sure the tabs are preserved, otherwise the makefile will not work.)
RM = rm -f
CC = gcc
CFLAGS = -g -O
INCLUDES = -I/usr/local/include
LDFLAGS = -g -L/usr/local/lib
READLINE_LIB = -lreadline
TERMCAP_LIB = -ltermcap

.c.o:
	${RM} $@
	$(CC) $(CFLAGS) $(INCLUDES) -c $<

SOURCES = rlversion.c
EXECUTABLES = rlversion
OBJECTS = rlversion.o

all: $(EXECUTABLES)
everything: all

rlversion: rlversion.o
	$(CC) $(LDFLAGS) -o $@ rlversion.o $(READLINE_LIB) $(TERMCAP_LIB)

clean mostlyclean:
	$(RM) $(OBJECTS) $(OTHEROBJ)
	$(RM) $(EXECUTABLES)

rlversion.o: rlversion.c
I needed the libraries libreadline.a and libhistory.a in both 64-bit and 32-bit versions.
The answer provided by Rajeev Kumar worked for me (I had a little trouble finding and installing libncurses).
For the 32-bit versions, using https://packages.ubuntu.com/search?keywords=lib32readline-dev, the following command worked for me:
sudo apt install lib32readline-dev
The 64-bit equivalent should work the same way:
sudo apt install libreadline-dev
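Once those packages are installed, linking the static archives from their system locations looks roughly like this (a sketch; the paths are the usual Ubuntu multiarch locations, and the 32-bit build additionally needs gcc-multilib and a 32-bit ncurses):
gcc -o rlversion rlversion.c /usr/lib/x86_64-linux-gnu/libreadline.a /usr/lib/x86_64-linux-gnu/libhistory.a -lncurses
gcc -m32 -o rlversion32 rlversion.c /usr/lib32/libreadline.a /usr/lib32/libhistory.a -lncurses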

extracting and creating ipk files

IPK packages are the installation packages used by opkg.
I'm trying to extract the contents of one of them and also create my own ipk.
I've read that I should be able to simply untar them, but that is not true.
I've tried:
tar -zxvf mypack.ipk
and I get:
gzip: stdin: not in gzip format
I've also tried:
tar -xvf mypack.ipk
and I get:
tar: This does not look like a tar archive
I've found that most of the information on the internet regarding ipks is inaccurate.
My ipk was generated by BitBake. I'm having a hard time with BitBake and want to avoid using it.
Any ideas on how to extract and how to create ipk files? A simple template with a single package would be useful to have.
I figured it out.
You can extract the main package with the ar x command, then extract the control.tar.gz with the tar -zxf command.
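Spelled out, using the file name from the question (a sketch; this is the ar-based flavour of ipk):
ar x mypack.ipk
tar -zxf control.tar.gz
tar -zxf data.tar.gz
ar x leaves debian-binary, control.tar.gz and data.tar.gz in the current directory; the two tar commands then unpack the package metadata and the installed files respectively.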
I tested the ar x package-name.ipk command, but it didn't help.
I found that the command below worked perfectly:
tar zxpvf package-name.ipk
This extracts three files:
debian-binary
data.tar.gz
control.tar.gz
Use the same command to extract the data.tar.gz and control.tar.gz files.
For more information, refer to:
https://cognito.me.uk/computers/manual-extractioninstallation-of-ipk-packages-on-gargoyleopenwrt/
You need to create a control file, and then do some archiving using tar and ar. In my case I was distributing just Python scripts, so there was no architecture dependency. You should check the control file and the Makefile into version control, and delete all the other intermediate files.
Here are the contents of control:
Package: my-thing-python
Version: 1.0
Description: python scripts for MyCompany
Section: extras
Priority: optional
Maintainer: John
License: CLOSED
Architecture: all
OE: my-thing-python
Homepage: unknown
Depends: python python-distutils python-pyserial python-curses python-mmap python-ctypes
Source: N/A
Here is my Makefile, which sits in the same directory as all my Python scripts:
all: my-thing-python.ipk

my-thing-python.ipk:
	rm -rf ipk
	mkdir -p ipk/opt/my-thing-python
	cp *.py ipk/opt/my-thing-python
	tar czvf control.tar.gz control
	cd ipk; tar czvf ../data.tar.gz .; cd ..
	echo 2.0 > debian-binary
	ar r my-thing-python.ipk control.tar.gz data.tar.gz debian-binary

clean: FORCE
	rm -rf ipk
	rm -f control.tar.gz
	rm -f data.tar.gz
	rm -f my-thing-python.ipk

FORCE:
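To build and spot-check the result (a usage sketch, not part of the original answer):
make
ar t my-thing-python.ipk
tar tzf data.tar.gz
ar t lists the three members (control.tar.gz, data.tar.gz, debian-binary), and tar tzf shows the payload paths under ./opt/my-thing-python/.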
Extract with these commands:
Extract the outer .ipk by running:
ar -xv <.ipk file>
Extract the control.tar.gz file by running:
tar -zxvf control.tar.gz
Untar data.tar.gz by running:
tar -zxvf data.tar.gz
If you want a list of files in an ipk, you can do something like:
#!/bin/sh
for f
do
    tar -x -z -f "$f" ./data.tar.gz -O | tar tvzf -
done
-O extracts to standard output.
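For example, if the loop above is saved as list-ipk.sh (a hypothetical name) and made executable, you can point it at the tgz-flavour packages described earlier:
chmod +x list-ipk.sh
./list-ipk.sh *.ipk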
ipk files used to be ar archives (like dpkg's .deb files), but are now gzipped tarballs.
I feel that some dpkg utility ought to cope with ipkg files, but I haven't found the right one.

What is the simplest way to create source Debian package?

Suppose you have hello.c
int main() { return 0; }
and Makefile
hello: hello.c
	gcc hello.c -o hello

install: hello
	install -m 755 hello /usr/bin/
The quickest and easiest way to get a binary package seems to be to use checkinstall:
fakeroot checkinstall --pkgname hello -y -D --install=no --backup --nodoc --fstrans --pkgversion 0.0.1 make install
How do I do a similar thing, but for a source package (to put it in some source repository or run dpkg-buildpackage on it)?
The official text is rather long: orig.tar.gz, changelog, control file... Is there something like checkinstall, but for source packages? An additional bonus would be if it also figured out dependencies automatically (at least partially).
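For reference, the stock-tooling route for the hello example above looks roughly like this (a sketch, not a checkinstall-style one-liner; it assumes dh-make and devscripts are installed, and you still have to edit the generated debian/control and debian/changelog by hand):
cd hello-0.0.1
dh_make --single --createorig
dpkg-buildpackage -S -us -uc
dh_make creates the debian/ skeleton plus ../hello_0.0.1.orig.tar.gz, and dpkg-buildpackage -S builds the unsigned source package (.dsc and tarballs) that a source repository can consume.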

Library search paths using the -R option

There are multitudes of articles online that proclaim in strident tones that the use of LD_LIBRARY_PATH is a bad idea, and that one must set library search paths using the -R option. The majority of said articles also mention Solaris in the same breath. The trouble is, on Linux, this does not work with g++:
g++: unrecognized option '-R'
Now what?
You can use -Wl,-rpath=/your/rpath:
$ g++ -o t t.cpp -Wl,-rpath=/my/lib/dir -lwhatever
$ readelf -a t|grep RPATH
0x000000000000000f (RPATH) Library rpath: [/my/lib/dir]
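To confirm at run time that the embedded path is actually used (libwhatever and /my/lib/dir are the placeholders from the example above):
$ ldd ./t
libwhatever.so should be reported as resolving from /my/lib/dir even though LD_LIBRARY_PATH is unset.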
