Perl: Getting a UTF-8 file with Mechanize :content_file

I'm creating a Perl script (Strawberry Perl v5.32.0 on Windows 10) to download my Google Calendar. Google provides a 'private' URL (no login required) to a 'basic.ics' file.
When I open this URL in my browser (Firefox), a window pops up to download the 'basic.ics' file. When saved, the file is UTF-8 encoded.
In my script I'm using WWW::Mechanize's get with ":content_file" to download the file:
#!/usr/bin/perl -w
use WWW::Mechanize;
# URL modified for obvious reasons ...
my $url = 'https://calendar.google.com/path-to-private-calendar-file/basic.ics';
my $local_file_name = 'Calendar.ics';
my $mech = WWW::Mechanize->new();
$mech->get( $url, ":content_file" => $local_file_name );
However, the file received with $mech->get is written ANSI-encoded and contains gibberish (untranslated UTF-8 data, I suppose).
How can I make get with :content_file create the local file UTF-8 encoded?
Or should I just download the file as is and convert it to UTF-8 later?
If so, please point me in the right direction, because reading the ANSI-encoded file as UTF-8 does not do the trick...

This has nothing to do with UTF-8 or character encodings. You are getting a gzip-compressed response.
If Compress::Zlib is available, WWW::Mechanize provides the following header by default:
Accept-Encoding: gzip
This allows the remote end to compress the response. If it does, the remote end will provide the following header in the response:
Content-Encoding: gzip
This is happening here, and you are saving the compressed response.
You could use :content_cb instead of :content_file to provide a callback that decompresses the data and stores it. Or you could simply request an uncompressed version by providing the following header:
Accept-Encoding: identity
This is done using
$mech->get($url,
Accept_Encoding => 'identity',
":content_file" => $local_file_name,
);
If you don't otherwise need the overhead of WWW::Mechanize, why not use its base class, LWP::UserAgent? It doesn't send an Accept-Encoding header by default, so the server is unlikely to gzip the response:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
# Will work 99.999% of the time.
$ua->get($url, ":content_file" => $local_file_name);
# Definitely works.
$ua->get($url,
Accept_Encoding => 'identity',
":content_file" => $local_file_name,
);
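If a compressed response has already been saved to disk, it does not have to be downloaded again: a gzip stream always starts with the two magic bytes 1F 8B and can be decompressed after the fact. Below is a minimal sketch in Python (the language used for the other examples on this page); in Perl, the IO::Uncompress::Gunzip module does the same job. The ICS snippet is made-up sample data, and `maybe_gunzip` is a hypothetical helper name.

```python
import gzip

# Every gzip stream begins with these two "magic" bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Return the decompressed bytes if `data` is gzip, else `data` unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Made-up ICS snippet standing in for the real calendar (UTF-8, CRLF endings).
ics = "BEGIN:VCALENDAR\r\nSUMMARY:Caf\u00e9\r\nEND:VCALENDAR\r\n".encode("utf-8")

compressed = gzip.compress(ics)      # what a server sends with Content-Encoding: gzip
restored = maybe_gunzip(compressed)  # the original UTF-8 bytes again
text = restored.decode("utf-8")      # readable text, including the accented letter
```

For a file already on disk, read it with `open("Calendar.ics", "rb")`, pass the bytes to `maybe_gunzip`, and write the result back in binary mode so the line endings stay untouched.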

Sorry for my late reply, but I had some other items to handle.
Thanks to ikegami and Hakon for your replies and solutions.
From your replies, I have distilled four methods that produce a readable, UTF-8-encoded, uncompressed result file:
Mechanize + Content_file and encoding
UserAgent + Content_file, no encoding
UserAgent + Content_file and encoding
UserAgent + print content to file
In the last case, however, there was an extra CR at the end of each line (CRCRLF instead of CRLF), but nothing a little regex can't solve ...
Here is the result of my tests:
CASE 1: Mechanize + Content_file and encoding ...
Get: $mech1->get($url, Accept_Encoding => 'identity', ":content_file" => $fn1)
Calendar1.ics received, size = 78564 bytes
CASE 2: UserAgent + Content_file, no encoding ...
Get: $ua2->get($url, ":content_file" => $fn2);
Calendar2.ics received, size = 78564 bytes
CASE 3: UserAgent + Content_file and encoding ...
Get: $ua3->get($url, Accept_Encoding => 'identity', ":content_file" => $fn3);
Calendar3.ics received, size = 78564 bytes
CASE 4: UserAgent + print content to file ...
Get: my $res4 = $ua4->get($url);
...
my $content = $res4->content;
$content =~ s/\r[\n]*/\n/gm;
print $fh $content;
...
Calendar4.ics received, size = 78564 bytes
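The extra CR in case 4 is most likely a newline-translation effect: ICS data already uses CRLF line endings, and on Windows Perl's default :crlf output layer turns every LF written to a text-mode handle into CRLF, which gives CRCRLF. Calling binmode($fh), or opening the file with '>:raw', writes the bytes untouched. The sketch below reproduces the same mechanism in Python by forcing CRLF translation with `newline="\r\n"`; it is an analogy to illustrate the effect, not Perl's own code path.

```python
import os
import tempfile

# ICS data already uses CRLF line endings, as received from the server.
content = "BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.ics")
    raw_path = os.path.join(tmp, "raw.ics")

    # Text mode with forced LF -> CRLF translation, analogous to Perl's
    # default :crlf layer on Windows: each "\r\n" becomes "\r\r\n".
    with open(text_path, "w", encoding="ascii", newline="\r\n") as f:
        f.write(content)

    # Binary mode writes the bytes untouched, analogous to binmode($fh)
    # or opening with '>:raw' in Perl.
    with open(raw_path, "wb") as f:
        f.write(content.encode("ascii"))

    with open(text_path, "rb") as f:
        text_bytes = f.read()
    with open(raw_path, "rb") as f:
        raw_bytes = f.read()
```

Writing in binary mode avoids the need for a cleanup regex afterwards.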

Yes, I also get an unreadable (in fact gzip-compressed) file when using WWW::Mechanize; LWP::UserAgent, however, works fine for me:
use feature qw(say);
use strict;
use warnings;
use LWP::UserAgent;
my $fn = 'Calendar.ics';
my $url = 'https://calendar.google.com/calendar/XXXXXXXX/basic.ics';
my $ua = LWP::UserAgent->new();
my $res = $ua->get( $url );
if ($res->is_success) {
say "Saving file: '$fn'";
# ':raw' writes the bytes as received, so Windows does not add an extra CR (CRCRLF)
open ( my $fh, '>:raw', $fn ) or die "Could not open file '$fn': $!";
print $fh $res->content;
close $fh;
}
else {
die $res->status_line;
}

Related

PHP Download File Script (doesn't work)

I have never written a PHP download-file script, have no experience with it, and I'm really not a pro. I got the snippet of code below from another website and tried to use it. I understand what the script does, but I don't understand the error messages, or rather, I don't know how to prevent them.
Here is the download.php script; I have put it into the /download/ folder below my main domain:
<?php
ignore_user_abort(true);
set_time_limit(0); // disable the time limit for this script
$path = "/downloads/"; // change the path to fit your websites document structure
$dl_file = preg_replace("([^\w\s\d\-_~,;:\[\]\(\).]|[\.]{2,})", '', $_GET['download_file']); // simple file name validation
$dl_file = filter_var($dl_file, FILTER_SANITIZE_URL); // Remove (more) invalid characters
$fullPath = $path.$dl_file;
if ($fd = fopen ($fullPath, "r")) {
$fsize = filesize($fullPath);
$path_parts = pathinfo($fullPath);
$ext = strtolower($path_parts["extension"]);
switch ($ext) {
case "pdf":
header("Content-type: application/pdf");
header("Content-Disposition: attachment; filename=\"".$path_parts["basename"]."\""); // use 'attachment' to force a file download
break;
// add more headers for other content types here
default;
header("Content-type: application/octet-stream");
header("Content-Disposition: filename=\"".$path_parts["basename"]."\"");
break;
}
header("Content-length: $fsize");
header("Cache-control: private"); //use this to open files directly
while(!feof($fd)) {
$buffer = fread($fd, 2048);
echo $buffer;
}
}
fclose ($fd);
exit;
Inside the /download/ folder, which contains download.php, I have a folder /downloads, which contains the .pdf that should be downloaded.
The link I use on my webpage is:
PHP download file
Now I get the following errors when I click on the link:
Warning: Cannot set max_execution_time above master value of 30 (tried to set unlimited) in /var/www/xxx/html/download/download.php on line 4
Warning: fopen(/downloads/test.pdf): failed to open stream: No such file or directory in /var/www/xxx/html/download/download.php on line 12
Warning: fclose() expects parameter 1 to be resource, boolean given in /var/www/xxx/html/download/download.php on line 34
If I use an absolute path (https://www.my-domain.de/downloads/) for the $path variable, I get these errors:
Warning: Cannot set max_execution_time above master value of 30 (tried to set unlimited) in /var/www/xxx/html/download/download.php on line 4
Warning: fopen(): https:// wrapper is disabled in the server configuration by allow_url_fopen=0 in /var/www/xxx/html/download/download.php on line 12
Warning: fopen(https://www.my-domain.de/downloads/test.pdf): failed to open stream: no suitable wrapper could be found in /var/www/xxx/html/download/download.php on line 12
Warning: fclose() expects parameter 1 to be resource, boolean given in /var/www/xxx/html/download/download.php on line 34
I am grateful for any advice!
<?php
ignore_user_abort(true);
//set_time_limit(0); disable the time limit for this script
$path = "downloads/"; // change the path to fit your websites document structure
$dl_file = preg_replace("([^\w\s\d\-_~,;:\[\]\(\).]|[\.]{2,})", '', $_GET['download_file']); // simple file name validation
$dl_file = filter_var($dl_file, FILTER_SANITIZE_URL); // Remove (more) invalid characters
$fullPath = $path.$dl_file;
if ($fd = fopen ($fullPath, "r")) {
$fsize = filesize($fullPath);
$path_parts = pathinfo($fullPath);
$ext = strtolower($path_parts["extension"]);
switch ($ext) {
case "pdf":
header("Content-type: application/pdf");
header("Content-Disposition: attachment; filename=\"".$path_parts["basename"]."\""); // use 'attachment' to force a file download
break;
// add more headers for other content types here
default:
header("Content-type: application/octet-stream");
header("Content-Disposition: filename=\"".$path_parts["basename"]."\"");
break;
}
header("Content-length: $fsize");
header("Cache-control: private"); //use this to open files directly
while(!feof($fd)) {
$buffer = fread($fd, 2048);
echo $buffer;
}
fclose ($fd); // only close the handle if fopen() succeeded
}
exit;
?>
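Try this code.
Your server probably does not allow an unlimited maximum execution time: the warning says the master value is 30 seconds. Check max_execution_time in your php.ini file.
Also, the path was wrong: "/downloads/" points to a folder at the root of the server's filesystem, not to the downloads folder next to your script, and "https://www.my-domain.de/downloads/" is not a filesystem path at all; it's a URL.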

Handling Hebrew files and folders with Python 3.4

I used Python 3.4 to create a program that goes through emails and saves specific attachments to a file server.
Each file is saved to a specific destination depending on the sender's email address.
My problem is that the destination folders and the attachments are both in Hebrew, and for a few attachments I get an error that the path does not exist.
That shouldn't be possible, because it can fail for one attachment but not for the others in the same email (the destination folder is determined by the sender's address).
I want to debug the issue, but I cannot get Python to display the file path it is trying to save to correctly. (It's mixed Hebrew and English, and it always displays the path as a big mess, although saving to the file server works correctly 95% of the time.)
So my questions are:
What should I add to this code so that it will process Hebrew correctly?
Should I encode or decode something?
Are there characters I should avoid when processing the files?
Here's the main piece of code that fails:
try:
    found_attachments = False
    for att in msg.Attachments:
        _, extension = split_filename(str(att))
        # check if attachment is not inline
        if str(att) not in msg.HTMLBody:
            if extension in database[sender][TYPES]:
                file = create_file(str(att), database[sender][PATH], database[sender][FORMAT], time_stamp)
                # This is where the program fails:
                att.SaveAsFile(file)
                print("Created:", file)
                found_attachments = True
    if found_attachments:
        items_processed.append(msg)
    else:
        items_no_att.append(msg)
except:
    print("Error with attachment: " + str(att) + " , in: " + str(msg))
and the create file function:
def create_file(att, location, format, timestamp):
    """
    process an attachment to make it a file
    :param att: the name of the attachment
    :param location: the path to the file
    :param format: the format of the file
    :param timestamp: the time and date the attachment was created
    :return: return the file created
    """
    # create the file by the given format
    if format == "":
        output_file = location + "\\" + att
    else:
        # split file to name and type
        filename, extension = split_filename(att)
        # extract and format the time sent on
        time = str(timestamp.time()).replace(":", ".")[:-3]
        # extract and format the date sent on
        day = str(timestamp.date())
        day = day[-2:] + day[4:-2] + day[:4]
        # initiate the output file
        output_file = format
        # add the original file name where needed
        output_file = output_file.replace(FILENAME, filename)
        # add the sent date where needed
        output_file = output_file.replace(DATE, day)
        # add the time sent where needed
        output_file = output_file.replace(TIME, time)
        # add the path and type
        output_file = location + "\\" + output_file + "." + extension
    print(output_file)
    # add an index to the file if necessary and return it
    index = get_file_index(output_file)
    if index:
        filename, extension = split_filename(output_file)
        return filename + "(" + str(index) + ")." + extension
    else:
        return output_file
Thanks in advance, I would be happy to explain more or supply more code if needed.
I found out that the problem was not the Hebrew. There is a limit on the number of characters that the path plus the filename can hold (255 characters).
The files that failed exceeded that limit, and that caused the problem.
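A minimal sketch of the two debugging aids this thread needed: a length check before saving, and an ASCII-only view of a mixed Hebrew/English path that the console cannot reorder or garble. The 255 limit is the value reported above and is an assumption about your environment (classic Windows MAX_PATH is 260 characters including the terminating NUL); `path_fits`, `debug_repr` and the sample path are hypothetical names for illustration.

```python
# Assumption: 255 is the limit reported above. On Windows the classic
# MAX_PATH limit is 260 characters including the terminating NUL, so
# adjust PATH_LIMIT to match your own server.
PATH_LIMIT = 255


def path_fits(path, limit=PATH_LIMIT):
    """True if the full path (folder + file name) is within `limit` characters."""
    return len(path) <= limit


def debug_repr(path):
    """ASCII-only view of a path: Hebrew letters become \\uXXXX escapes,
    so the console cannot reorder or garble them."""
    return ascii(path)


# Hypothetical destination: a share folder plus a Hebrew file name.
sample = "\\\\server\\share\\" + "\u05e7\u05d5\u05d1\u05e5" + ".pdf"
print(debug_repr(sample), len(sample), path_fits(sample))
```

In the loop above, checking `path_fits(file)` before `att.SaveAsFile(file)`, and replacing the bare `except:` with `except Exception as e:` and printing `e` together with `debug_repr(file)`, would have shown the real error instead of hiding it.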

Python how to save dictionary with cyrillic symbols into json file

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
d = {'a':'текст',
'b':{
'a':'текст2',
'b':'текст3'
}}
print(d)
w = open('log', 'w')
json.dump(d,w, ensure_ascii=False)
w.close()
It gives me:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
Post the full traceback; the error could be coming from the print statement, which fails when the console's encoding cannot represent the Cyrillic text in the dictionary.
Here is how I save to json my dictionary that contains Cyrillics:
mydictionary = {'a':'текст'}
filename = "myoutfile"
with open(filename, 'w') as jsonfile:
    json.dump(mydictionary, jsonfile, ensure_ascii=False)
The tricky part is reading the JSON back into a dictionary and working with it.
To read the JSON back into a dictionary:
with open(filename, 'r') as jsonfile:
    newdictionary = json.load(jsonfile)
Now when you look at the dictionary, the word 'текст' is displayed as u'\u0442\u0435\u043a\u0441\u0442'. That is only how Python 2 shows the repr of a unicode string; to print the value as UTF-8 text, encode it with encode('utf-8'):
for key, value in newdictionary.iteritems():
    print value.encode('utf-8')
Same goes for lists if your Cyrillic text is stored there:
for f in value:
    print f.encode('utf-8')
    # or if you plan to use the val somewhere else:
    f = f.encode('utf-8')
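Assuming Python 3, the original UnicodeEncodeError most likely comes from `open('log', 'w')` picking up an ASCII default encoding from the locale; passing `encoding='utf-8'` explicitly avoids it, and no `.encode('utf-8')` step is needed when reading the values back. A minimal round-trip sketch (the file name `log.json` in a temporary directory is an arbitrary choice):

```python
import json
import os
import tempfile

d = {'a': 'текст', 'b': {'a': 'текст2', 'b': 'текст3'}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.json')

    # Explicit UTF-8: without it, open() uses the locale's default encoding,
    # which may be ASCII and then raises UnicodeEncodeError.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(d, f, ensure_ascii=False)

    with open(path, encoding='utf-8') as f:
        raw = f.read()          # the file holds real Cyrillic text, no \u escapes

    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)   # back to a dict of ordinary str values

print(loaded['b']['a'])
```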

Reading and writing variables from file

I'm trying to create a register/login CLI, and I don't understand why the variables I write to the register file don't work later when I try to use them in the login part.
I thought it was as easy as writing abc="xyz" and then reading that file, but it seems it's not like that.
###RegOrLog###
RegOrLog = str(raw_input('[Register/Log]:'))
###register the user detail###
if RegOrLog in('r', 'reg', 'register'):
    reguser = raw_input('username:')
    regpass = raw_input('password:')
    regage = int(raw_input('age:'))
    reggender = str(raw_input('gender:'))
    ###creating file for the user###
    f = open(reguser, 'w')
    f.write('password =' + repr(regpass) + '\n')
    f.write('age =' + repr(regage) + '\n')
    f.write('gender =' + repr(reggender) + '\n')
    f.close()
    RegOrLog = 'log'
###login as a user###
if RegOrLog in('l','log','login'):
    loguser = raw_input('Login username:')
    regpass = raw_input('password:')
    #registered user#
    regeduser = open(loguser, 'r')
    regeduser = regeduser.read()
    if regpass == password:
        print 'Welcome', loguser
        print 'You are', gender
        print 'You are', age,'years old'
I now understand your problem. A text file just stores plain data, so you cannot write a file as if it were a Python file, with variable definitions, and expect those variables to be initialized when you read the file again. Python only reads the data and assigns it to a single variable, which contains all the text you originally wrote into that file.
So, summing up, you are treating a text file like a Python file and expecting Python to execute it and create variables.
But it doesn't work like that. You have to store the data in a structured file (or in a database), then read it back and parse it to extract the values you wrote earlier.
You can do that manually, which is not very nice, or you can use the Python module pickle, which handles pretty much everything you would otherwise write yourself, so you only have to call this module in your code.
If you search for the pickle module, the Python.org website provides a very nice example of its usage:
#Pickle Example
# Save a dictionary into a pickle file.
import pickle
favorite_color = { "lion": "yellow", "kitty": "red" }
pickle.dump( favorite_color, open( "save.p", "wb" ) )
# Load the dictionary back from the pickle file.
favorite_color = pickle.load( open( "save.p", "rb" ) )
# favorite_color is now { "lion": "yellow", "kitty": "red" }
So, how would you use it in your code? Like this:
# RegOrLog
import pickle # new line!
RegOrLog = raw_input('[Register/Log]:') # why str()?
# register the user detail
if RegOrLog in('r', 'reg', 'register'):
    reguser = raw_input('username:') # raw_input gives string by default, if you want other data type then you do have to convert it.
    regpass = raw_input('password:')
    regage = int(raw_input('age:'))
    reggender = raw_input('gender:')
    # creating file for the user
    data = {"username": reguser, "password": regpass, "age": regage, "gender": reggender}
    pickle.dump( data, open( "whatever_file_name", "wb" ) )
    RegOrLog = 'log'
###login as a user###
if RegOrLog in('l','log','login'):
    loguser = raw_input('Login username:')
    regpass = raw_input('password:')
    #registered user#
    data = pickle.load( open( "whatever_file_name", "rb" ) )
    if regpass == data["password"]:
        print 'Welcome', loguser
        # gender and age come from the loaded dict, not from plain variables
        print 'You are', data["gender"]
        print 'You are', data["age"], 'years old'
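A minimal Python 3 sketch of the same pickle approach, assuming one pickle file per user (the `<username>.p` naming and the `register`/`login` helpers are hypothetical) and `with` blocks so every file gets closed. Passwords are kept in plain text as in the code above, purely to keep the example short.

```python
import os
import pickle
import tempfile


def register(directory, username, password, age, gender):
    """Save one user's details to <directory>/<username>.p."""
    data = {"username": username, "password": password, "age": age, "gender": gender}
    with open(os.path.join(directory, username + ".p"), "wb") as f:
        pickle.dump(data, f)


def login(directory, username, password):
    """Return the stored dict if the user exists and the password matches, else None."""
    path = os.path.join(directory, username + ".p")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data if data["password"] == password else None


with tempfile.TemporaryDirectory() as tmp:
    register(tmp, "alice", "s3cret", 30, "female")
    user = login(tmp, "alice", "s3cret")
    if user:
        print("Welcome", user["username"])
        print("You are", user["gender"])
        print("You are", user["age"], "years old")
    bad = login(tmp, "alice", "wrong")
```

Storing a dict rather than loose variables means the login code reads every field by key, which is exactly what the text-file version was missing.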

MD5 encryption using jython

I need to hash some data with hashlib (MD5) in Jython. The value of the variable "output" is a set of junk characters: "¦?ìîçoÅ"w2?¨?¼?6"
import hashlib
m = hashlib.md5()
m.update(unicode(input).encode('utf-8'))
output = m.digest()
grinder.logger.info("digest= " + str(output))
How can I get the output of the above code as an array?
The digest() method returns raw bytes, which are meant for other functions that take bytes (for example, to base64-encode or compress them). To simply display the MD5 result as hex, use the hexdigest() method:
output = m.digest()
hexoutput = m.hexdigest()
print("digest= " + str(hexoutput))
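The "array" asked for is most likely the 16 individual byte values of the digest. A minimal sketch showing three views of the same MD5 result; it is written for Python 3 and, assuming `bytearray` behaves as in CPython 2.7, should also run on Jython 2.7. The input "hello" and the helper name `md5_views` are arbitrary choices for illustration.

```python
import hashlib


def md5_views(text):
    """Return (raw digest, hex string, list of byte values) for `text`."""
    m = hashlib.md5()
    m.update(text.encode("utf-8"))
    raw = m.digest()               # 16 raw bytes, not printable text
    hexd = m.hexdigest()           # 32 hex characters, safe to log
    arr = list(bytearray(raw))     # 16 ints in the range 0..255
    return raw, hexd, arr


raw, hexd, arr = md5_views("hello")
print("digest= " + hexd)
print(arr)
```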
