I have a line like this:
Name:sample Location:(xyz)
I want to convert it to a dictionary as follows:
{'Name':'sample','Location':'(xyz)'}
I want to do this with a Python script. Please suggest how I can do this. I am working on Linux.
# First split at whitespaces ==> ['Name:sample', 'Location:(xyz)']
# Next split each item at ':' and convert them into list of tuples
# ==>[('Name', 'sample'), ('Location', '(xyz)')]
# Convert the list of tuples to dictionary
sample_string = "Name:sample Location:(xyz)"
split_sample_string = sample_string.split()
# Split only at the first ':' so that values containing ':' stay intact
tuple_string = [tuple(item.split(":", 1)) for item in split_sample_string]
final_dictionary = dict(tuple_string)
print(final_dictionary)
# final_dictionary = {'Name': 'sample', 'Location': '(xyz)'}
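The same steps can be wrapped in a small helper. This is a minimal sketch, assuming Python 3, whitespace-separated items, and values without spaces; the function name parse_pairs is made up for illustration and is not part of the answer above.

```python
def parse_pairs(line):
    """Turn a line such as 'Name:sample Location:(xyz)' into a dict."""
    result = {}
    for item in line.split():
        # partition splits at the first ':' only, so 'a:b:c' gives ('a', ':', 'b:c')
        key, sep, value = item.partition(":")
        if not sep:
            continue  # skip tokens that contain no ':' at all
        result[key] = value
    return result


parsed = parse_pairs("Name:sample Location:(xyz)")
print(parsed)  # {'Name': 'sample', 'Location': '(xyz)'}
```

Using partition rather than split avoids the error that dict() raises when a token has no colon and therefore yields a one-element tuple.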
I'm using the HDBSCAN clustering algorithm together with RandomizedSearchCV. When I fit the features with the labels, I get the error "IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed". The shape of embedding is (5000, 4) and the shape of hdb_labels is (5000,). Below is my code:
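import umap
import hdbscan
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV
# UMAP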
umap_hdb = umap.UMAP(n_components=4, random_state = 42)
embedding = umap_hdb.fit_transform(customer_data_hdb)
# creating HDBSCAN wrapper
class HDBSCANWrapper(hdbscan.HDBSCAN):
    def predict(self, X):
        return self.labels_.astype(int)
# HDBSCAN
clusterer_hdb = HDBSCANWrapper(min_samples=40, min_cluster_size=1000, metric='manhattan', gen_min_span_tree=True).fit(embedding)
hdb_labels = clusterer_hdb.labels_
# specify parameters and distributions to sample from
param_dist = {'min_samples': [10,30,50,60,100,150],
              'min_cluster_size':[100,200,300,400,500],
              'cluster_selection_method' : ['eom','leaf'],
              'metric' : ['euclidean','manhattan']
             }
# validity scorer
validity_scorer = make_scorer(hdbscan.validity.validity_index,greater_is_better=True)
n_iter_search = 20
random_search = RandomizedSearchCV(clusterer_hdb
,param_distributions=param_dist
,n_iter=n_iter_search
,scoring=validity_scorer
,random_state=42)
random_search.fit(embedding, hdb_labels)
I'm getting an error in the random_search.fit and could not get rid of it. Any suggestions/help would be appreciated.
I want to convert a variable to a string and then to an array that I can use for comparison, but I don't know how to do that.
My code:
import face_recognition
import numpy as np
a = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_10_32_24_Pro.jpg') # my picture 1
b = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_56_Pro.jpg') # my picture 2
c = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_52_Pro.jpg') # my picture 3
d = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\ziv sion.jpg') # my picture 4
e = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191120_17_46_40_Pro.jpg') # my picture 5
f = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191117_16_19_11_Pro.jpg') # my picture 6
a = face_recognition.face_encodings(a)[0]
b = face_recognition.face_encodings(b)[0]
c = face_recognition.face_encodings(c)[0]
d = face_recognition.face_encodings(d)[0]
e = face_recognition.face_encodings(e)[0]
f = face_recognition.face_encodings(f)[0]
Here I tried to convert the variable to a string
str_variable = str(a)
array_variable = np.array(str_variable)
my_face = a, b, c, d, e, f, array_variable
while True:
    new = input('path: ')
    print('Recognizing...')
    unknown = face_recognition.load_image_file(new)
    unknown_encodings = face_recognition.face_encodings(unknown)[0]
The program cannot use the variable:
    results = face_recognition.compare_faces(array_variable, unknown_encodings, tolerance=0.4)
    print(results)
    recognize_times = int(results.count(True))
    if (3 <= recognize_times):
        print('hello boss!')
        my_face = *my_face, unknown_encodings
Please help me.
The error shown:
Traceback (most recent call last):
File "C:/Users/zivsi/PycharmProjects/AI/pytt.py", line 37, in <module>
results = face_recognition.compare_faces(my_face, unknown_encodings, tolerance=0.4)
File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-packages\face_recognition\api.py", line 222, in compare_faces
return list(face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)
File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-packages\face_recognition\api.py", line 72, in face_distance
return np.linalg.norm(face_encodings - face_to_compare, axis=1)
ValueError: operands could not be broadcast together with shapes (7,) (128,)
First of all, array_variable should be a list of the known encodings, not a numpy array built from a string.
Also, you do not need str.
Now, in your case, every entry in the list of known encodings must be a vector of the same length as the encoding you compare against. face_encodings returns a 128-element vector for each face, so a, b, c, d, e and f are fine, but the string-based array_variable is not, and putting it into my_face next to them gives the shape (7,) seen in the error. The reason is that the comparison is based on distance, and distance is only defined between vectors of the same length.
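The shape mismatch can be reproduced with NumPy alone. The sketch below is only an illustration: random vectors stand in for real face encodings, and the distance line mirrors the np.linalg.norm call shown in the traceback rather than calling face_recognition itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the six known encodings a..f (assumption: 128 floats each).
known = [rng.random(128) for _ in range(6)]
# An "unknown" encoding that is almost identical to the first known one.
unknown = known[0] + 0.001

# A list of equal-length vectors becomes a (6, 128) array, so the
# row-wise distance to a (128,) vector is well defined.
distances = np.linalg.norm(np.array(known) - unknown, axis=1)
matches = list(distances <= 0.4)

# Mixing in a stringified encoding leaves seven unrelated objects,
# i.e. a 1-D object array of shape (7,), not a (7, 128) matrix.
mixed = np.empty(7, dtype=object)
for i, item in enumerate(known + [str(known[0])]):
    mixed[i] = item

try:
    mixed - unknown
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(matches)
print(mixed.shape, error_message)
```

Here only the first entry of matches is a match, and the subtraction on the mixed container fails with the same kind of broadcast error as in the question.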
Here is a simple working example using the photos from https://github.com/ageitgey/face_recognition/tree/master/examples:
import face_recognition
import numpy as np
from PIL import Image, ImageDraw
from IPython.display import display
# Load a sample picture and learn how to recognize it.
obama_image = face_recognition.load_image_file("obama.jpg")
obama_face_encoding = face_recognition.face_encodings(obama_image)[0]
# Load a second sample picture and learn how to recognize it.
biden_image = face_recognition.load_image_file("biden.jpg")
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
array_variable = [obama_face_encoding,biden_face_encoding] # list of known encodings
# compare the list with the biden_face_encoding
results = face_recognition.compare_faces(array_variable, biden_face_encoding, tolerance=0.4)
print(results)
# Output: [False, True] (True means match, False means mismatch)
# False: coming from obama_face_encoding VS biden_face_encoding
# True: coming from biden_face_encoding VS biden_face_encoding
To run it go here: https://beta.deepnote.com/project/09705740-31c0-4d9a-8890-269ff1c3dfaf#
Documentation: https://face-recognition.readthedocs.io/en/latest/face_recognition.html
EDIT
To save the known encodings you can use numpy.save
np.save('encodings',biden_face_encoding) # save
load_again = np.load('encodings.npy') # load again
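If several known encodings should be stored together, one option is to stack them into a single 2-D array before saving. This is a sketch under assumptions: random vectors stand in for real encodings, and the temporary directory and file name are only for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for real face encodings (assumption: 128 floats each).
known_encodings = [rng.random(128) for _ in range(3)]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encodings.npy")
    # Stack the list into one (3, 128) array and write it to disk.
    np.save(path, np.stack(known_encodings))
    # np.load reads the whole array back into memory.
    loaded = np.load(path)

# Turn the rows back into a list of 1-D encodings.
restored = list(loaded)
print(loaded.shape)  # (3, 128)
```

The restored list has the same form as the list of known encodings passed to compare_faces in the example above.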
Here is my sample data:
# sample data
xdata = [3.33172, 3.33348, 3.33525, 3.33702, 3.33878, 3.34055, 3.34232,
3.34408, 3.34585, 3.34762, 3.34938, 3.35115, 3.35292 , 3.35468, 3.35645,
3.35822, 3.35998, 3.36175, 3.36352, 3.36529, 3.36705, 3.36882]
ydata = [-0.00437834, -0.00486735, -0.0118371, -0.00582171, 0.00339488,
-0.000369502, -0.000898799, -0.00797662, -0.00853566, -0.0123596,
-0.0162318, -0.0103355, -0.00445416, 0.00137920, -0.00251916, -0.00968244,
0.00260652, 0.00841350, 0.00445556, 0.00373271, 0.00621243, 0.00220983]
How could I use np.where to find the index value of 3.35115, for example?
You first need to turn your data into a numpy array so that you can check where it is equal to your target value:
>>> np.where(np.array(xdata) == 3.35115)
# (array([11]),)
This says that index 11 of xdata is 3.35115.
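Exact equality works here because the target is typed with the same digits as the stored value. For values that come out of a calculation, a tolerance-based lookup is often safer. The following is a sketch of two such alternatives using np.isclose and np.argmin; the variable names are illustrative.

```python
import numpy as np

xdata = [3.33172, 3.33348, 3.33525, 3.33702, 3.33878, 3.34055, 3.34232,
         3.34408, 3.34585, 3.34762, 3.34938, 3.35115, 3.35292, 3.35468,
         3.35645, 3.35822, 3.35998, 3.36175, 3.36352, 3.36529, 3.36705,
         3.36882]
target = 3.35115

x = np.array(xdata)

# Exact comparison, as in the answer above.
exact = np.where(x == target)[0]

# Comparison within np.isclose's default tolerances.
close = np.flatnonzero(np.isclose(x, target))

# Index of the value nearest to the target, whatever the distance.
nearest = int(np.argmin(np.abs(x - target)))

print(exact, close, nearest)  # [11] [11] 11
```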
I have two sets of arrays stored in two files, and I need to extract the values one by one and compare them. I am using this code, but it does not look like I am doing it correctly.
# First Dataset
File.foreach(file_set_a) do |data_a|
  data_array_a = data_a.split("\t")
  #file_name_a = data_array_a[0]
  #file_ext_a = data_array_a[1]
  # Second Dataset
  File.foreach(file_set_b) do |data_b|
    data_array_b = data_b.split("\t")
    #file_name_b = data_array_b[0]
    #file_ext_b = data_array_b[1]
    #Compare
    #file_name_a == #file_name_b
  end
end
The problem is, I cannot go back and extract the next values in the set A when I enter the set B. Any suggestions?
First, read the two files into two separate arrays of lines:
lines_array_a = File.readlines(file_set_a)
lines_array_b = File.readlines(file_set_b)
I am assuming both arrays have the same size. Now loop over the indices and take the item at the same position from each array to compare them:
for i in 0..(lines_array_a.count - 1) do
  data_array_a = lines_array_a[i].split("\t")
  #file_name_a = data_array_a[0]
  #file_ext_a = data_array_a[1]
  data_array_b = lines_array_b[i].split("\t")
  #file_name_b = data_array_b[0]
  #file_ext_b = data_array_b[1]
  #file_name_a == #file_name_b
end
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
d = {'a':'текст',
'b':{
'a':'текст2',
'b':'текст3'
}}
print(d)
w = open('log', 'w')
json.dump(d,w, ensure_ascii=False)
w.close()
It gives me:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
Post the full traceback. The error could be coming from the print statement if it fails to encode the dictionary's contents for output, which can happen when the dictionary contains Cyrillic text.
Here is how I save a dictionary that contains Cyrillic text to JSON:
mydictionary = {'a':'текст'}
filename = "myoutfile"
with open(filename, 'w') as jsonfile:
    json.dump(mydictionary, jsonfile, ensure_ascii=False)
The trick is reading the JSON back into a dictionary and working with it.
To read the JSON back into a dictionary:
with open(filename, 'r') as jsonfile:
    newdictionary = json.load(jsonfile)
Now when you look at the dictionary, the word 'текст' is shown in escaped form, as '\u0442\u0435\u043a\u0441\u0442'. You simply need to encode it with encode('utf-8') before printing:
for key, value in newdictionary.iteritems():
    print value.encode('utf-8')
Same goes for lists if your Cyrillic text is stored there:
for f in value:
    print f.encode('utf-8')
    # or if you plan to use the val somewhere else:
    f = f.encode('utf-8')
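The snippets above are written for Python 2. On Python 3 the same round trip can be sketched as follows, assuming the file should be UTF-8 encoded; the temporary directory and the file name log.json are only for illustration.

```python
import json
import os
import tempfile

d = {'a': 'текст',
     'b': {
         'a': 'текст2',
         'b': 'текст3'
     }}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.json")

    # Passing encoding="utf-8" avoids depending on the platform default,
    # which may be ASCII and then fails on Cyrillic characters.
    with open(path, "w", encoding="utf-8") as jsonfile:
        json.dump(d, jsonfile, ensure_ascii=False)

    # Read the raw text back to confirm the characters were not escaped.
    with open(path, "r", encoding="utf-8") as jsonfile:
        raw_text = jsonfile.read()

# Parsing the text gives back ordinary str values; no encode() step is needed.
restored = json.loads(raw_text)
```

With ensure_ascii=False the file contains the Cyrillic characters themselves rather than \uXXXX escapes, and restored compares equal to d.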