How can I add a file of proxies into dictionary format - arrays

I'm trying to make my result something like:
proxies_dict = {
    'http': 'http://178.141.249.246:8081',
    'http': 'http://103.12.198.54:8080',
    'http': 'http://23.97.173.57:80',
}
I have tried doing
proxies_dict = {}
with open('proxies.txt', 'r') as proxy_file:
    for proxy in proxy_file:
        proxies_dict['http'] = 'http://' + proxy.rstrip()
print(proxies_dict)
But this only keeps the last proxy, not the whole thing. How can I make it add every proxy in my .txt file?

Something like this could get ya going!
The proxy text file looks like this:
178.141.249.246:8081
103.12.198.54:8080
23.97.173.57:80
proxies_list = []
with open('proxies.txt', 'r+') as proxy_file:
    # read txt file
    proxies = proxy_file.readlines()
    for proxy in proxies:
        # initialize dict in loop
        proxies_dict = {}
        # add proxy to dict
        proxies_dict['http'] = 'http://' + proxy.rstrip()
        # append dict to list
        proxies_list.append(proxies_dict)
print(proxies_list)
[{'http': 'http://178.141.249.246:8081'},
{'http': 'http://103.12.198.54:8080'},
{'http': 'http://23.97.173.57:80'}]
Basically you have to read the file first, and then, each time you add an item to a dictionary, you append that dictionary to the list which holds all of your proxies. I did it this way so you could keep the 'http' key for each of your proxies.
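As a side note, the same list of single-key dicts can be built in one pass with a list comprehension (a minimal sketch, equivalent to the loop above):
with open('proxies.txt') as proxy_file:
    proxies_list = [{'http': 'http://' + line.strip()} for line in proxy_file]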
EDIT
If you need them all in one dictionary, it would look something like this, per David's answer:
with open(file, 'r+') as f:
    # read file
    content = f.readlines()
    # this is an extra step if you want to
    # strip the newlines from each item; otherwise
    # links = content
    # would work as well
    links = [row.strip() for row in content]
# initialize dict
tmp = {}
# create the 'http' key holding the links list
tmp['http'] = links
# print result
print(tmp)
{'http': ['178.141.249.246:8081', '103.12.198.54:8080', '23.97.173.57:80']}
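If you then want to rotate through the proxies, each single-proxy dict from the list version above can be passed to requests directly (a hedged sketch; the requests usage and the placeholder URL are assumptions, not part of the original answer):
import requests

for proxies_dict in proxies_list:
    try:
        # requests routes plain-http traffic through the proxy mapped to the 'http' key
        response = requests.get('http://example.com', proxies=proxies_dict, timeout=5)
        print(proxies_dict['http'], response.status_code)
    except requests.RequestException:
        # skip proxies that time out or refuse the connection
        continue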


How to build an array of Objects in a loop

I'm new to Python, but I'm a PowerShell user, so maybe what I'm trying to do isn't possible the same way in Python.
To learn Python 3, I'm trying to make a list of the files in a directory and store it in an indexstore variable.
To do that, this is what I did:
I created two classes, Index and IndexStore:
class Index(object):
    def __init__(self, filepath, size):
        self.filepath = filepath
        self.size = size
and
class IndexStore(object):
    def __init__(self, filepath, size):
        self.filepath = filepath
        self.size = size
After that I get my file list from a location on my HDD:
listOfFile = os.listdir(SourcePath)
With this list I start a loop where I get the full path and the size of each file (like ForEach) in two variables, fullPath and fileSize:
fullPath = os.path.join(SourcePath, entry)
fileSize: int = os.path.getsize(fullPath)
With those values I set the Index object:
setattr(Index, 'filepath', fullPath)
setattr(Index, 'size', fileSize)
And it seems to work: with
pprint(vars(Index))
I get the result:
mappingproxy({'__dict__': <attribute '__dict__' of 'Index' objects>,
              '__doc__': None,
              '__init__': <function Index.__init__ at 0x00000271B9C7E940>,
              '__module__': '__main__',
              '__repr__': <property object at 0x00000271B9F30590>,
              '__weakref__': <attribute '__weakref__' of 'Index' objects>,
              'filepath': 'D:\AmigaForever\AmigaOS3.9.ISO',
              'size': 28862259})
After that is my problem! In PowerShell, if I want to add object2 to my objectlist1, I just do Objectlist1 += object2 and the work is done, but in Python 3.x I have tried many things from forums without success. The best way seems to be:
IndexStore = []
IndexStore.append(Index(fullPath, fileSize))
But the IndexStore variable stays empty, and if I try to print it with
print(IndexStore)
pprint(vars(IndexStore))
the console says:
print(IndexStore)
TypeError: 'tuple' object is not callable
Can you help me, please? Am I checking the value of my IndexStore correctly,
or is my error in how I'm appending the values?
In a second loop I want to use the values of the object array again to continue my code.
The goal is 'Using Python 3 to make a list of the files in a directory and store it into an indexstore variable'.
The first problem I see is that you create a class IndexStore but later completely shadow that class when you rebind the name with IndexStore = [].
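To see the rebinding problem in isolation, here is a tiny sketch (separate from the question's code):
class IndexStore:
    pass

IndexStore = []  # the name IndexStore now refers to this list;
                 # the class defined above is no longer reachable by that name
print(type(IndexStore))  # <class 'list'>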
So, given you have a valid list of files from:
listOfFile = os.listdir(SourcePath)
This is an approach that will work:
First build an IndexItem class:
class IndexItem:
    def __init__(self, filepath, size):
        self.filepath = filepath
        self.size = size

    def __repr__(self):
        # Returns a string representation of the IndexItem
        return f"({self.filepath}, {self.size})"
This class has an initialization method, which preserves the values passed during instantiation, and a __repr__ method, which is used to convert the index values into readable text.
Next we create the IndexStore Class as follows:
class IndexStore:
    def __init__(self):
        self._index = []

    def append(self, o: object):
        # Append an entry onto the index
        self._index.append(o)

    def __repr__(self):
        # Returns a string representation of self._index
        return ', '.join(str(x) for x in self._index)
This class includes an __init__ method, which creates a list to hold the IndexItems passed to it; an append method, to add IndexItems to the IndexStore; and finally a __repr__ method, to create a readable string of the values.
Finally, we implement the basic functionality required to build the IndexStore as follows:
listOfFile = os.listdir(sourcePath)
index = IndexStore()
for f in listOfFile[:5]:
    # For each entry f in listOfFile
    fullpath = os.path.join(sourcePath, f)
    # add an instantiation of IndexItem to the IndexStore
    index.append(IndexItem(fullpath, int(os.path.getsize(fullpath))))
print(index)
A simpler and more direct approach to this problem makes use of Python's built-in data structures and capabilities, as follows:
IndexStore = []
listOfFile = os.listdir(sourcePath)
for f in listOfFile[:5]:
    # For each entry f in listOfFile
    fullpath = os.path.join(sourcePath, f)
    # add a (fullpath, size) tuple to IndexStore
    IndexStore.append((fullpath, int(os.path.getsize(fullpath))))
print(IndexStore)
In this approach, the class definitions are eliminated, and IndexStore contains a list of tuples, with each tuple holding the full path to the file and its size.
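Since the question mentions reusing the values in a second loop, the stored tuples can simply be unpacked again (a minimal sketch building on the tuple version above):
for filepath, size in IndexStore:
    # each entry unpacks back into the two stored values
    print(f'{filepath}: {size} bytes')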

In Python, use beautiful soup to pull data using list of div ID

Have some Python experience. Very new to Beautiful soup.
I'm trying to take a list of div IDs for soup to find, then export the results.
What is the correct way to write this?
#my div ID list
DivIdList = [IdOne, IdTwo, IdThree,]
#to be filled with soup
ListName = []
HostList = []
InfoList = []
#loop through div ID list
for i in DivIdList:
    #when found fill up with soup
    Name = soup.find('IdOne')
    Host = soup.find('IdTwo')
    Info = soup.find('IdThree')
    #Soup found to be exported
    ListName.append(Name.text)
    HostList.append(Host.text)
    InfoList.append(Info.text)
#export soup info with headers
df = pd.DataFrame({'All Names':ListOfNames,....})
df.to_csv('MyFile.csv', index=False, encoding='utf-8')
Assuming IdOne etc are variables, you can use an f-string construct with soup.select_one()
soup.select_one(f'#{IdOne}') # etc
The # denotes an id css selector.
You will use i to stand in for the id, since that is the loop variable; also, be consistent with your variable naming (e.g. HostList versus ListName).
If IdOne is already a css id selector then remove the # and use it directly, e.g. soup.select_one(i).
You then need a way to add to the appropriate list e.g.
ListName = []
HostList = []
InfoList = []
list_of_lists = [ListName, HostList, InfoList]
DivIdList = [IdOne, IdTwo, IdThree]
for number, i in enumerate(DivIdList):
    list_of_lists[number].append(soup.select_one(f'#{i}').text)
It would be sensible to check soup.select_one(f'#{i}') is not None before using the .text accessor.
You could also have a dictionary, where the key is the id and the associated value, at start, is the relevant list to add to during the loop.
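A minimal sketch of that dictionary variant, with the None check from above folded in (IdOne etc. are assumed to already hold the actual id strings):
id_to_list = {IdOne: ListName, IdTwo: HostList, IdThree: InfoList}
for div_id, target_list in id_to_list.items():
    node = soup.select_one(f'#{div_id}')
    # only append when the id was actually found in the page
    if node is not None:
        target_list.append(node.text)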

Pytorch Dataloader for Image GT dataset

I am new to pytorch. I am trying to create a DataLoader for a dataset of images where each image has a corresponding ground truth (same name):
root:
--->RGB:
------>img1.png
------>img2.png
------>...
------>imgN.png
--->GT:
------>img1.png
------>img2.png
------>...
------>imgN.png
When I use the path of the root folder (the one that contains the RGB and GT folders) as input for torchvision.datasets.ImageFolder, it reads all of the images as if they were all intended for input (classified as RGB and GT), and there seems to be no way to pair the RGB-GT images. I would like to pair the RGB-GT images, shuffle them, and divide them into batches of a defined size. How can it be done? Any advice will be appreciated.
Thanks.
I think the good starting point is to use the VisionDataset class as a base. What we are going to use here is the DatasetFolder source code, so we are going to create something similar. You can notice this class depends on two other functions from the datasets.folder module: default_loader and make_dataset.
We are not going to modify default_loader, because it's already fine; it just helps us load images, so we will import it.
But we need a new make_dataset function that prepares the right pairs of images from the root folder. The original make_dataset pairs images (image paths, to be more precise) with their root folder as the target class (class index), producing a list of (path, class_to_idx[target]) pairs, but we need (rgb_path, gt_path) instead. Here is the code for the new make_dataset:
def make_dataset(root: str) -> list:
    """Reads a directory with data.
    Returns a dataset as a list of tuples of paired image paths: (rgb_path, gt_path)
    """
    dataset = []

    # Our dir names
    rgb_dir = 'RGB'
    gt_dir = 'GT'

    # Get all the filenames from RGB folder
    rgb_fnames = sorted(os.listdir(os.path.join(root, rgb_dir)))

    # Compare file names from GT folder to file names from RGB:
    for gt_fname in sorted(os.listdir(os.path.join(root, gt_dir))):
        if gt_fname in rgb_fnames:
            # if we have a match - create pair of full paths to the corresponding images
            rgb_path = os.path.join(root, rgb_dir, gt_fname)
            gt_path = os.path.join(root, gt_dir, gt_fname)
            item = (rgb_path, gt_path)
            # append to the list dataset
            dataset.append(item)
        else:
            continue

    return dataset
What do we have now? Let's compare our function with the original one:
from torchvision.datasets.folder import make_dataset as make_dataset_original
dataset_original = make_dataset_original(root, {'RGB': 0, 'GT': 1}, extensions='png')
dataset = make_dataset(root)
print('Original make_dataset:')
print(*dataset_original, sep='\n')
print('Our make_dataset:')
print(*dataset, sep='\n')
Original make_dataset:
('./data/GT/img1.png', 1)
('./data/GT/img2.png', 1)
...
('./data/RGB/img1.png', 0)
('./data/RGB/img2.png', 0)
...
Our make_dataset:
('./data/RGB/img1.png', './data/GT/img1.png')
('./data/RGB/img2.png', './data/GT/img2.png')
...
I think it works great. It's time to create our Dataset class. The most important part here is the __getitem__ method, because it loads the images, applies transformations, and returns tensors that can be used by dataloaders. We need to read a pair of images (rgb and gt) and return a tuple of two image tensors:
from torchvision.datasets.folder import default_loader
from torchvision.datasets.vision import VisionDataset


class CustomVisionDataset(VisionDataset):
    def __init__(self,
                 root,
                 loader=default_loader,
                 rgb_transform=None,
                 gt_transform=None):
        super().__init__(root,
                         transform=rgb_transform,
                         target_transform=gt_transform)

        # Prepare dataset
        samples = make_dataset(self.root)

        self.loader = loader
        self.samples = samples
        # list of RGB images (first element of each pair)
        self.rgb_samples = [s[0] for s in samples]
        # list of GT images (second element of each pair)
        self.gt_samples = [s[1] for s in samples]

    def __getitem__(self, index):
        """Returns a data sample from our dataset.
        """
        # getting our paths to images
        rgb_path, gt_path = self.samples[index]

        # import each image using loader (by default it's PIL)
        rgb_sample = self.loader(rgb_path)
        gt_sample = self.loader(gt_path)

        # here go transforms if needed;
        # maybe we need different transforms for each type of image
        if self.transform is not None:
            rgb_sample = self.transform(rgb_sample)
        if self.target_transform is not None:
            gt_sample = self.target_transform(gt_sample)

        # now we return the right imported pair of images (tensors)
        return rgb_sample, gt_sample

    def __len__(self):
        return len(self.samples)
Let's test it:
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

bs = 4  # batch size
transforms = ToTensor()  # we need this to convert PIL images to Tensor
shuffle = True

dataset = CustomVisionDataset('./data', rgb_transform=transforms, gt_transform=transforms)
dataloader = DataLoader(dataset, batch_size=bs, shuffle=shuffle)

for i, (rgb, gt) in enumerate(dataloader):
    print(f'batch {i+1}:')
    # some plots (inner loop variable j, so it does not shadow the batch index i)
    for j in range(bs):
        plt.figure(figsize=(10, 5))
        plt.subplot(221)
        plt.imshow(rgb[j].squeeze().permute(1, 2, 0))
        plt.title(f'RGB img{j+1}')
        plt.subplot(222)
        plt.imshow(gt[j].squeeze().permute(1, 2, 0))
        plt.title(f'GT img{j+1}')
        plt.show()
Out:
batch 1:
...
Here you can find a notebook with code and simple dummy dataset.

Rails 5 CarrierWave, can't remove last file in a multiple upload

Following this How-to:
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Add-more-files-and-remove-single-file-when-using-default-multiple-file-uploads-feature
class ImagesController < ApplicationController
  before_action :set_gallery

  def create
    add_more_images(images_params[:images])
    flash[:error] = "Failed uploading images" unless @gallery.save
    redirect_to :back
  end

  def destroy
    remove_image_at_index(params[:id].to_i)
    flash[:error] = "Failed deleting image" unless @gallery.save
    redirect_to :back
  end

  private

  def set_gallery
    @gallery = Gallery.find(params[:gallery_id])
  end

  def add_more_images(new_images)
    images = @gallery.images
    images += new_images
    @gallery.images = images
  end

  def remove_image_at_index(index)
    remain_images = @gallery.images # copy the array
    deleted_image = remain_images.delete_at(index) # delete the target image
    deleted_image.try(:remove!) # delete image from S3
    @gallery.images = remain_images # re-assign back
  end

  def images_params
    params.require(:gallery).permit({images: []}) # allow nested params as array
  end
end
I can't seem to correctly remove the very last file. In my printed file list it is still there, oddly enough with 0 kB.
Then when I upload new files it does go away.
I had the same problem, and I found that you have to call remove_images! if this was the last one. In the remove_image_at_index function add:
@gallery.remove_images! if remain_images.empty?
Regards

Need help parsing results from ldap to csv

I am trying to create a script to generate a csv file with the results of some ldap queries using Net::LDAP, but I'm having trouble skipping incomplete lines when one element of the @attributes array is blank.
my @attributes = ('cn', 'mail', 'telephoneNumber');
So for example, if a user has no mail or no telephoneNumber listed, the whole line should be skipped instead of returning:
"Foo Bar",, # this line should be skipped since there is no mail nor telephone
"Bar Foo","bar@foo.com", # this line should be skipped too, no number listed
"John Dever","john_dever@google.com","12345657" # this one is fine, has all values
My loop right now is looking like this:
# Now dump all found entries
while (my $entry = $mesg->shift_entry()) {
    # Retrieve each field's value and print it;
    # if attr is multivalued, separate each value
    my $current_line = ""; # prepare fresh line
    foreach my $a (@attributes) {
        if ($entry->exists($a)) {
            my $attr = $entry->get_value($a, 'asref' => 1);
            my @values = @$attr;
            my $val_str = "";
            if (!$singleval) {
                # retrieve all values and separate them via $mvsep
                foreach my $val (@values) {
                    if ($val eq "") { print "empty"; }
                    $val_str = "$val_str$val$mvsep"; # add all values to field
                }
                $val_str =~ s/\Q$mvsep\E$//; # eat last MV-separator
            } else {
                $val_str = shift(@values); # user wants only the first value
            }
            $current_line .= $fieldquot.$val_str.$fieldquot; # add field data to current line
        }
        $current_line .= $fieldsep; # close field and add to current line
    }
    $current_line =~ s/\Q$fieldsep\E$//; # eat last $fieldsep
    print "$current_line\n"; # print line
}
I have tried code like:
if ($attr == "") { next; }
if (length($attr) == 0) { next; }
and several others without any luck. I also tried simple if () { print "isempty"; } debug tests and it's not working. I'm not exactly sure how I could do this.
I appreciate any help or pointers you could give me on what I am doing wrong.
Thanks a lot in advance for your help.
UPDATE:
Per chaos request:
my $singleval = 0;
A sample run for this program would return:
Jonathan Hill,Johnathan_Hill@example.com,7883
John Williams,John_Williams@example.com,3453
Template OAP,,
Test Account,,
Template Contracts,,
So what I want to do is to skip all the lines that are missing a field, either email or extension number.
Label your while loop:
Record: while (my $entry = $mesg->shift_entry()){
and use:
next Record;
Your problem is that your next is associated with your foreach. Using the label avoids that.
By the way, $attr == '', though it will work in this case, is bad logic; in Perl, == is a numeric comparison. String comparison would be $attr eq ''. Though I'd just use next Record unless $attr.
