How to extract data (download) from an IBM Watson notebook

I have a Pandas dataframe in a notebook running in IBM Watson. I need to download the dataframe as a CSV file.

Python file read and file write operations work in the notebook environment. If you want to run a system command, prefix it with !; for example, to list files you can just do
!ls -la
So my approach was to create a file in local storage, encode it in base64, and build a download link:
from IPython.display import HTML
import base64

def create_download_link(dataframe, title="Download CSV file", filename="myout222.csv"):
    csv = dataframe.to_csv()              # serialize the dataframe as CSV
    b64 = base64.b64encode(csv.encode())  # base64-encode the CSV bytes
    payload = b64.decode()                # decode to str for the data URI
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)                     # return the rendered download link
Now call the function: create_download_link(your_dataframe)
Instead of a dataframe, you can offer any file for download by reading it and encoding it the same way.
Since system commands work, you can also upload files to a separate server with curl, and download files into local storage with wget.
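For example, here is a minimal sketch of the same idea for an arbitrary file (the name data.bin is just a placeholder for a file in the notebook's working directory):
from IPython.display import HTML
import base64

def create_file_download_link(path, title="Download file"):
    with open(path, "rb") as f:                        # read the file as raw bytes
        payload = base64.b64encode(f.read()).decode()  # base64-encode for the data URI
    html = '<a download="{name}" href="data:application/octet-stream;base64,{payload}" target="_blank">{title}</a>'
    return HTML(html.format(name=path, payload=payload, title=title))

create_file_download_link("data.bin")  # placeholder file name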

Related

Opening a VCF.bgz file in Python

I have downloaded some data from gnomAD - https://gnomad.broadinstitute.org/downloads.
It comes as a VCF.bgz file, and I would like to read it as a VCF file.
I found some code here: Partially expand VCF bgz file in Linux, by @rnorris:
import gzip

ifile = gzip.GzipFile("gnomad.genomes.r2.1.1.sites.2.vcf.bgz")
ofile = open("truncated.vcf", "wb")

LINES_TO_EXTRACT = 100000
for line in range(LINES_TO_EXTRACT):
    ofile.write(ifile.readline())

ifile.close()
ofile.close()
I tried it on my data and got:
Not a gzipped file (b'TB')
Is there any way to fix it? I don't understand what the problem is.
"Not a gzipped file" means it's not a gzipped file. Either it was corrupted, downloaded incorrectly, or isn't a gzip file in the first place. A gzip file starts with b'\x1f\x8b'. Not b'TB'.
A .bgz file should be a gzip file, so you likely did not download what you think you downloaded. How did you download it?
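A quick way to see what you actually downloaded is to inspect the first two bytes yourself; a minimal sketch, using the filename from the question:
with open("gnomad.genomes.r2.1.1.sites.2.vcf.bgz", "rb") as f:
    print(f.read(2))  # b'\x1f\x8b' for a gzip/bgzip file; anything else means a bad download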
The gnomAD datasets should definitely be valid block-gzipped VCF files. Regardless, I think you might have a better time using Hail (disclaimer: I'm a Hail maintainer).
Hail can read directly out of Google Cloud Storage if you install the GCS connector.
curl https://broad.io/install-gcs-connector | bash
Then in Python:
import hail as hl

# The path below points at a Hail Table (.ht), so read it with read_table.
ht = hl.read_table(
    'gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht'
)
ht = ht.head(100_000)  # keep the first 100,000 rows
sites = ht.collect()   # pull them into local memory
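If you still need a plain VCF file on disk afterwards, Hail can write one back out; a one-line sketch continuing from the snippet above (an assumption on my part: that hl.export_vcf accepts a sites-only Table here):
hl.export_vcf(ht, 'truncated.vcf.bgz')  # writes a block-gzipped VCF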

Python script for creating a zip file on a remote server

I want to write a Python script that connects to a remote server and creates a zip file on that server consisting of specific files present on the remote server itself. I have written the script below:
import paramiko
import zipfile

c = paramiko.SSHClient()
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
c.connect('15.100.1.1', username='user', password='123')
sftp = c.open_sftp()
c.exec_command("cd 'C:\\Program Files\\temp'")

filenames = ['a.txt', 'b.txt', 'c.txt']
zf = zipfile.ZipFile('files.zip', mode='w')  # this path is on the local machine
for fname in filenames:
    zf.write(fname)
zf.close()

sftp.close()
c.close()
But instead of creating the zip file on the remote server, the zip file gets created on the local machine itself. Can anyone please help me with this?
When you create the zip file, you refer to a local file, files.zip:
zf = zipfile.ZipFile('files.zip', mode='w')
You never tell it to create the file remotely. You could probably (I haven't tried it myself) create a remote file and pass the handle to ZipFile. The issue with that is that the zipping would still run on your local machine instead of on the remote machine, so all the file contents would move over the network back and forth for no purpose other than creating the remote file. Try to do the zipping directly on the remote machine instead!
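A sketch of that idea (untested, as noted above; it assumes the imports and the sftp client from the question's snippet):
# Open a handle to a file on the remote server and hand it to ZipFile.
# The compression still runs locally, and every file's bytes cross the
# network, which is exactly the inefficiency described above.
remote_zip = sftp.open('files.zip', mode='w')
with zipfile.ZipFile(remote_zip, mode='w') as zf:
    for fname in ['a.txt', 'b.txt', 'c.txt']:
        with sftp.open(fname) as remote_file:   # pull the remote file's bytes
            zf.writestr(fname, remote_file.read())
remote_zip.close()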
As UlfR says,
"Try to do the zipping directly on the remote machine". 
Since he (she?) didn't say how to do that, I suggest something like
c.exec_command("zip files.zip a.txt b.txt c.txt")
As suggested by G-Man Says 'Reinstate Monica', I would like to add that, in order to create a zip in a specific location, you can use the following code:
import paramiko

host = ""      # enter host name
port = 22      # enter port number (an integer)
username = ""  # enter username
password = ""  # enter password

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port, username, password)
sftp = ssh.open_sftp()
Now we can zip a single file, a group of files, or a directory.
# 1. To zip a single file
ssh.exec_command("zip /location_to_save_zip/zipname.zip /location_of_file/file1.txt")
# 2. To zip multiple files
ssh.exec_command("zip /location_to_save_zip/zipname.zip /location_of_file/file1.txt /location_of_file/file2.txt")
# 3. To zip a folder
ssh.exec_command("zip -r /location_to_save_zip/zipname.zip /location_of_directory")
Close the SFTP and Paramiko SSH client connections:
sftp.close()
ssh.close()
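One caveat worth adding: exec_command returns immediately, so the script above can close the connection before the remote zip command finishes. A small sketch that waits for completion, using the same ssh client:
# Run the remote zip and block until it finishes, then check its exit status.
stdin, stdout, stderr = ssh.exec_command(
    "zip -r /location_to_save_zip/zipname.zip /location_of_directory")
exit_status = stdout.channel.recv_exit_status()  # waits for the remote command
if exit_status != 0:
    print(stderr.read().decode())                # show the remote error output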

Cannot open downloaded PDF files

I'm trying to download the PDFs related to a researcher.
But the downloaded PDFs can't be opened; the error says the files may be damaged or in the wrong format, while another URL used in testing resulted in normal PDF files. Do you have any suggestions?
import requests
from bs4 import BeautifulSoup

def download_file(url, index):
    local_filename = index + "-" + url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
    return local_filename

# For Test: http://ww0.java4.datastructures.net/handouts/
# Can't open: http://flyingv.ucsd.edu/smoura/publications.html
root_link = "http://ecal.berkeley.edu/publications.html#journals"
r = requests.get(root_link)
if r.status_code == 200:
    soup = BeautifulSoup(r.text)
    # print soup.prettify()
    index = 1
    for link in soup.find_all('a'):
        new_link = root_link + link.get('href')
        if new_link.endswith(".pdf"):
            file_path = download_file(new_link, str(index))
            print "downloading:" + new_link + " -> " + file_path
            index += 1
    print "all download finished"
else:
    print "errors occur."
Your code has a comment saying:
# Can't open: http://flyingv.ucsd.edu/smoura/publications.html
Looks like what you can't open is an HTML file. So no wonder a PDF reader complains about it...
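A quick guard against this is to check the response's Content-Type before saving; a sketch (fetch_pdf is a hypothetical helper, not part of the original script):
import requests

def fetch_pdf(url):
    # Refuse anything the server doesn't declare as a PDF.
    r = requests.get(url, stream=True)
    content_type = r.headers.get('Content-Type', '')
    if 'pdf' not in content_type.lower():
        # e.g. text/html means the link returned a web page, not a PDF
        raise ValueError("Not a PDF: %s returned %r" % (url, content_type))
    return r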
For any real PDF link that I had a problem with, I would proceed as follows:
- Download the file with a different method (wget, curl, a browser, ...).
  - Can you download it at all? Or is there some password hoop to be jumped through?
  - Is the download fast and complete?
  - Does it then open in a PDF viewer?
- If so, compare it to the file your script downloaded.
  - What are the differences?
  - Could they be caused by your script?
  - Are there no differences within the first few hundred lines, but differences later? Is the end of the file a bunch of nul bytes? Then your download didn't complete...
- If not, still compare the differences. If there are none, your script is not at fault; the PDF may really be corrupted...
  - What does it look like when opened in a text editor?
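For the inspection steps, a minimal sketch (the two file names are placeholders for your script's download and the reference download):
# A real PDF starts with the magic bytes b'%PDF-'; an HTML page starts
# with something like b'<!DOCTYPE' or b'<html'.
for path in ("script_download.pdf", "browser_download.pdf"):  # placeholder names
    with open(path, "rb") as f:
        print(path, f.read(16))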

Ruby Zlib: What is the purpose of orig_name=?

My code is like this:
gz = Zlib::GzipWriter.open('test.zip')
gz.orig_name = "test.csv"
gz.write("testing writing to zipped file")
gz.close
What I am trying to do: when using a zip extractor application, test.zip should be unzipped to test.csv.
I used the orig_name method thinking that when I extract the archive with another extractor, like Archive Utility, the resulting file would be 'test.csv'. But the file is still 'test'.
If by "other zip extractor" you mean the gzip utility, you'd need to use the -N option of gzip to use the name stored in the gzip header. Otherwise it will just use the compressed file name with the .gz removed.

Importing text files or compressed files in QlikView using a batch file

I have multiple text files in a folder. I need those files to be imported into QlikView on a daily basis. Is there any way to import those files using a batch/command file?
Moreover, can I import compressed files into QlikView?
I am not sure how your load script is set up, but if you wish to refresh your QlikView document, and you don't have QlikView Server, then you can use a batch file as follows:
"<Path To QlikView>\QV.exe" /r "ReportToReload.qvw"
The /r command parameter tells QlikView to open the document, reload it and then save and close the document. However, you must make sure that the QlikView User Preference option "Keep Progress Open after Reload" is not enabled, otherwise the progress dialogue will wait for you to close it after the document has been reloaded.
You can then schedule this batch file to run via Windows' Task Scheduler, or your favourite scheduling tool.
QlikView cannot import compressed files (e.g. Zip, RAR), so you would need to extract these first using a batch script.
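For example, a small Python script that could run before the reload to extract any archives (the folder paths are placeholders):
import glob
import zipfile

# Extract every zip in the drop folder so QlikView can load the text files inside.
for archive in glob.glob(r"C:\data\incoming\*.zip"):  # placeholder path
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(r"C:\data\extracted")           # placeholder path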
You can loop over your directory structure and read the existing files in your load script.
LET vCustCount = NoOfRows('Kunde');
TRACE Number of customers: $(vCustCount);
FOR i=1 TO $(vCustCount)
    LET vNameKunde = FieldValue('name_kunde', $(i));
    FOR each vFile in filelist('$(vNameKunde)/umsatz.qvd')
        TRACE $(vFile) has an umsatz.qvd;
        LOAD ....
        FROM [$(vFile)] (qvd);
    NEXT vFile
NEXT
In this case I load pre-calculated QVD files, but you could do the same with txt, csv, etc.
And as i_saw_drones mentioned, QlikView cannot import compressed files. If you need to read compressed files, you can batch-process them with an unzip tool first.
You should have a look at section 21.1, Loading Data from Files, in the Reference Manual.
HTH
The following script checks whether the QVD exists. If it does, the script updates it; otherwise it creates a new QVD.
IF NOT isNull(qvdCreateTime('G:\TestQvd\Data.qvd')) THEN
    data2:
    LOAD * FROM G:\TestQvd\Data.qvd (qvd);
    FOR each vFille in filelist('G:\Test\*')
        LOAD * FROM [$(vFille)]
        (txt, codepage is 1252, explicit labels, delimiter is spaces, msq);
    NEXT vFille
ELSE
    FOR each vFille in filelist('G:\Test\*')
        data2:
        LOAD * FROM [$(vFille)]
        (txt, codepage is 1252, explicit labels, delimiter is spaces, msq);
    NEXT vFille
ENDIF

STORE data2 into G:\TestQvd\Data.qvd;
exit Script;
