Adding prefix to existing array - arrays

I am trying to add a prefix to each element of an existing array. Below is my code:
print('a' + [10, 100])
With this I get the error below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "list") to str
Could you please help with how to do that? I could use a for loop, but I believe there may be a more straightforward way to achieve the same result.

You can create a new concatenated array as:
>>> ['{0}{1}'.format('a', num) for num in [10, 100]]
['a10', 'a100']
Read about string formatting and list comprehensions in the Python documentation.

If I understand your question, you want a new string array (list). You could try this:
new_lst = ['a'+str(x) for x in [10, 100]] # just use string concatenation
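The same idea can also be written with f-strings (Python 3.6+). The sketch below is only an illustration: the helper names add_prefix and add_suffix are made up here and are not part of any library.

```python
def add_prefix(prefix, values):
    """Return a new list of strings, each value preceded by `prefix`."""
    return [f"{prefix}{value}" for value in values]


def add_suffix(values, suffix):
    """Return a new list of strings, each value followed by `suffix`."""
    return [f"{value}{suffix}" for value in values]


prefixed = add_prefix("a", [10, 100])
suffixed = add_suffix([10, 100], "a")
print(prefixed)
print(suffixed)
```

The f-string converts each number to text implicitly, so no explicit str() call is needed.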

Related

numpy indexing using argsort result [duplicate]

I have a 2D NumPy array and the list of indices corresponding to the top 3 elements, obtained using argsort. Now I am trying to extract the values corresponding to these indices, and it is not working. What is the workaround?
A = array([[0.19334242, 0.9787497 , 0.41453434, 0.35298119, 0.17943745,
0.63468207, 0.43840688],
[0.39811914, 0.68040634, 0.7589702 , 0.3573046 , 0.16365397,
0.86329535, 0.48559053],
[0.5848541 , 0.54203383, 0.27262654, 0.21979374, 0.06917679,
0.10586995, 0.57083441],
[0.76765549, 0.05703751, 0.83383973, 0.71867625, 0.16338699,
0.85721418, 0.5953548 ]])
np.flip(A.argsort(),axis=1)[:,0:3]
array([[1, 5, 6],
[5, 2, 1],
[0, 6, 1],
[5, 2, 0]])
Indexing A with that result raises an error:
>>> A[np.flip(A.argsort(),axis=1)[:,0:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for axis 0 with size 4
In [22]: A.ravel()[A.argsort(axis=None)[::-1][:3]]
Out[22]: array([ 0.9787497 , 0.86329535, 0.85721418])
Explanation
By default, argsort() sorts along the last axis. In your case, you want to sort a flattened version of the array as you don't give any meaning to the fact that the array is 2D. This happens by passing axis=None to argsort().
Since you get 1D indices, you also need to access values on a flattened version of the array, which is what ravel() does.
[::-1] reverses the argsort array to get top values first and [:3] gets the first 3 values.
Note: there are other and possibly more efficient ways to do that, but this was the first thing that came to my mind.
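If the goal is instead the top values of each row (which is what the per-row argsort in the question produces), the error comes from A[idx] treating every index as a row number along axis 0. A minimal sketch of two ways to pair each index with its own row follows; the small array here is made up for illustration and is not the one from the question.

```python
import numpy as np

# Hypothetical small array, used only for illustration
A = np.array([[0.2, 0.9, 0.4, 0.6],
              [0.7, 0.1, 0.8, 0.3]])

k = 2
# Column indices of the k largest values in each row, largest first
idx = np.argsort(A, axis=1)[:, ::-1][:, :k]

# Option 1: take_along_axis matches each row of idx with the same row of A
top_values = np.take_along_axis(A, idx, axis=1)

# Option 2: explicit row indices that broadcast against idx
rows = np.arange(A.shape[0])[:, None]
top_values_fancy = A[rows, idx]

print(idx)
print(top_values)
```

Both options return an array of shape (number of rows, k).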

DataConversionWarning: A column-vector y was passed when a 1d array was expected

I keep getting a warning when running this part of my code:
scores = cross_val_score(XGB_Clf, X_resampled, y_resampled, cv=kf)
The warning is:
DataConversionWarning: A column-vector y was passed when a 1d array
was expected. Please change the shape of y to (n_samples, ), for
example using ravel(). y = column_or_1d(y, warn=True)
I know there are lots of answers to this question, and that I need to use ravel(), but using it does not change anything!
Also, the array "y" I'm passing to the function is not a column-vector ...
See:
y_resampled
Out[82]: array([0, 0, 0, ..., 1, 1, 1], dtype=int64)
When I run
y_resampled.ravel()
I get
Out[81]: array([0, 0, 0, ..., 1, 1, 1], dtype=int64)
which is exactly the same as my initial variable...
Also, when I run y_resampled.values.ravel() I get an error confirming that this is indeed a NumPy array:
Traceback (most recent call last):
File "<ipython-input-80-9d28d21eeab5>", line 1, in <module>
y_resampled.values.ravel()
AttributeError: 'numpy.ndarray' object has no attribute 'values'
Does anyone have a solution to this?
Thanks a lot!
Check out this answer.
Simply:
model = forest.fit(train_fold, train_y.values.ravel())
If you convert y_resampled to a DataFrame, you can use its values attribute:
import pandas as pd
y_resampled = pd.DataFrame(y_resampled)
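A quick way to see which array is actually two-dimensional is to inspect its shape attribute. The sketch below assumes the warning is triggered by an array of shape (n_samples, 1) somewhere in the pipeline; the helper name to_1d_labels is hypothetical. Note that ravel() returns a flattened array and does not modify the original in place, so the result has to be assigned or passed on.

```python
import numpy as np


def to_1d_labels(y):
    """Return y as a 1D array of shape (n_samples,)."""
    return np.asarray(y).ravel()


# A column vector: shape (4, 1), which triggers the warning in scikit-learn
y_column = np.array([[0], [0], [1], [1]])
y_flat = to_1d_labels(y_column)

print(y_column.shape)  # the original is unchanged
print(y_flat.shape)
```

If y already has shape (n_samples,), the helper leaves its shape as it is, which matches the observation in the question that ravel() appeared to change nothing.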

Usage of Pydatalog Aggregate Functions

I have been playing around with the various aggregate functions to get a feel for them, and after being confused for the past few days I am in need of clarification. I either get completely unintuitive behavior or unhelpful errors. For instance, I test:
(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((6,))
looking at sample output:
p[0]==X,Y,Z
([(6,)], [4, 6, 2], [6, 6, 6])
p[1]==X,Y,Z
([(6,)], [6, 4, 2], [6, 6, 6])
p[2]==X,Y,Z
([(6,)], [4, 2, 6], [6, 6, 6])
1. Why is the minimum 6?
2. Why has the value bound to Z been repeated 3 times?
3. What exactly is the purpose of 'order_by' in relation to the list from which a minimum value is found?
4. Why does the output change depending on whether there are multiple values in the 'order_by' list; why does a specific value (6, in this case) in the 'order_by' list affect the output as it has?
Another example:
(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((0,))
Output:
p[0]==X,Y,Z
([(6,)], [4, 6, 2], [0, 0, 0])
p[1]==X,Y,Z
([(6,)], [2, 6, 4], [0, 0, 0])
p[2]==X,Y,Z
([(2,)], [2, 6, 4], [0, 0, 0])
Why did the output of X change--from 6 to 2--based upon the index provided? Even though the output was wrong in the previous example, at least it was consistent for the indexes used; with there being only one min/max, this makes sense.
I at least get to see the output using the min_, max_, sum_ functions; but, I am lost when it comes to rank_ and running_sum_. I follow a similar process when defining my function:
(p[X]==running_sum_(Z, group_by=Z, order_by=Z)) <= Z.in_((43,34,65))
I try to view the output:
p[0]==X
I get the error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/UserList.py", line 16, in repr
def repr(self): return repr(self.data)
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 109, in data
self.todo.ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 566, in ask
self._data = Body(self.pre_calculations, self).ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 686, in ask
self._data = literal.lua.ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 909, in _
invoke(subgoal)
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 664, in invoke
todo.do() # get the thunk and execute it
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 640, in do
self.thunk()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 846, in
aggregate.complete(base_subgoal, subgoal))
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 820, in complete
result = [ tuple(l.terms) for l in list(base_subgoal.facts.values())]
AttributeError: 'bool' object has no attribute 'values'
What does this mean? What was done incorrectly? What are the relations shared by the running_sum_ (and rank_) parameters--'group_by' and 'order_by'?
As there seem to be no examples on the web, 2 or 3 short examples of rank_ and running_sum_ usage would be greatly appreciated.
Aggregate clauses are solved in 2 steps:
1. first, resolve the unknowns in the clause, ignoring the aggregate function;
2. then apply the aggregate function to the result.
Here is how you could write the first clause:
(p[None]==min_(Y, order_by=Y)) <= Y.in_((4,6,2))
The variable(s) in the brackets after p act like "group by" in SQL, and must also appear in the body of the clause. In this case, there is nothing to group by, so I use None. The order_by variable is needed when you want to retrieve a value other than the one you order by.
Let's say you want to retrieve the name of the youngest pupil in each class of a school. The base predicate would be pupil(ClassName, Name, Age).
+ pupil('1A', 'John', 8)
+ pupil('1B', 'Joe', 9)
The aggregate clause would be:
(younger[ClassName] == min_(Name, order_by= Age)) <= pupil(ClassName, Name, Age)
The query would then be:
(younger[ClassName]==X)
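To make the roles of the bracketed key and order_by more concrete, here is a plain-Python analogue of the two steps. It does not use pyDatalog and is not how pyDatalog works internally; it only mimics the grouping and ordering described above. The pupil 'Mary' is a hypothetical extra fact added so that one class has more than one pupil.

```python
from collections import defaultdict

# Step 1: the resolved facts (ClassName, Name, Age); 'Mary' is a made-up addition
pupils = [("1A", "John", 8), ("1A", "Mary", 7), ("1B", "Joe", 9)]


def youngest_per_class(facts):
    """Group by class, order by age, and return the name with the smallest age."""
    groups = defaultdict(list)
    for class_name, name, age in facts:
        groups[class_name].append((name, age))
    # Step 2: apply the aggregate per group; age plays the role of order_by
    return {
        class_name: min(rows, key=lambda row: row[1])[0]
        for class_name, rows in groups.items()
    }


younger = youngest_per_class(pupils)
print(younger)
```

The value returned (the name) differs from the value used for ordering (the age), which is the situation in which order_by is needed.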

Python, loop program

Can someone tell me what I am doing wrong? I am writing a program using loops in Python 3.x, but when I execute the program I get a traceback error:
multiple of 13 is 195 and factors are as follows
Traceback (most recent call last):
File "C:/Users/Darlene/Desktop/Chapter 4/program4_2.py", line 19, in
list1.append(j)
AttributeError: 'dict' object has no attribute 'append'
This is the code I entered:
def main():
    for i in reversed(list(range(100,201))):
        if i%13==0:
            print("multiple of 13 is",i,"and factors are as follows")
            list1 = {}
            for j in list(range(2,i+1)):
                if i%j == 00:
                    list1.append(j)
            print(list1)
main()
As commented by Luke Park, list1 = {} will declare a dictionary. What you need is list1 = [].
Also, range already returns a range object that can be used directly in loops, so there is no need to convert it to a list.
list1 must be a list, like so:
list1 = []
You defined it as a dict, and as Python said:
'dict' object has no attribute 'append'
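Putting both suggestions together, a corrected version might look like the sketch below. The helper function factors is introduced here only to keep the loop readable; it is not part of the original program.

```python
def factors(n):
    """Return every j in 2..n that divides n evenly."""
    result = []  # a list, so append() is available
    for j in range(2, n + 1):
        if n % j == 0:
            result.append(j)
    return result


def main():
    # range objects can be reversed directly, no list() needed
    for i in reversed(range(100, 201)):
        if i % 13 == 0:
            print("multiple of 13 is", i, "and factors are as follows")
            print(factors(i))


main()
```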

Remove punctuation from a list using loops-Python3

So what I'm trying to do is create a function that reads in a list and creates a new list without the punctuation, using loops.
So far, I've got:
list=["This:","is","a","list."]
def depunctuate():
    for i in range(0,len(list),1):
        list1=""
        for j in range(0,len(list[i]),1):
            if(list[i][j] !=['(',')','?',':',';',',','.','!','/','"',"'"]):
                list1+=list1[i][j]
        cleanList+=[list1]
    return cleanList
depunctuate()
So what I'm looking for it to return is "This is a list"
However I'm getting
Traceback (most recent call last):
File "depunctuate.py", line 10, in <module>
depunctuate()
File "depunctuate.py", line 7, in depunctuate
tokens1 += tokens1[i][j]
IndexError: string index out of range
Any help is appreciated, thanks!
clean_l = [s.strip(".:;,'\"?!/") for s in l]
This will remove leading and trailing punctuation characters from each string in the list l.
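Since the question asks for loops, here is a loop-based sketch as well. Unlike strip(), it also removes punctuation in the middle of a word. It differs from the code in the question in three ways: the list is passed as a parameter instead of shadowing the built-in name list, the result list is initialised before use, and each character is tested with "not in" against the set of punctuation characters rather than compared to the whole list with !=. The names PUNCTUATION, words and clean_words are chosen here for illustration.

```python
PUNCTUATION = set('()?:;,.!/"\'')


def depunctuate(words):
    """Return a new list with punctuation characters removed from each word."""
    clean_words = []
    for word in words:
        cleaned = ""
        for char in word:
            if char not in PUNCTUATION:
                cleaned += char
        clean_words.append(cleaned)
    return clean_words


words = ["This:", "is", "a", "list."]
clean = depunctuate(words)
print(clean)
print(" ".join(clean))
```

Joining the cleaned words with a space gives the single string the question asks for.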
