How to break elements in an array at each comma? python - arrays

So I opened a dataset, and in short it looked something like this:
list1= ['Adrian,20,5,2000,green', 'Steve,15,6,1997,blue', ...]
trial = np.array(list1)
When I tried print(trial[0][0]) to get Adrian, I only got the A.
So I figured I should make everything that is followed by a comma an independent element. Please help me get the output to be:
(['Adrian', 20, 5, 2000, 'green'], ['Steve', 15, 6, 1997, 'blue'], ...)
where print(trial[0]) will give: ['Adrian', 20, 5, 2000, 'green']
and print(trial[0][0]) will give: Adrian

Just use the split function with a comma as the separator, like this:
import numpy
list2 = ['Adrian,20,5,2000,green', 'Steve,15,6,1997,blue']
list1 = []
for i in list2:
    # split each record at every comma into a list of fields
    a = i.split(',')
    list1 += [a]
trial = numpy.array(list1)
print(trial[0][0])
This will return Adrian.
You will still have to typecast the numbers to integers, but that's easy to work around.
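The typecasting step mentioned above could be sketched roughly as follows. This is only a minimal sketch: parse_row is a hypothetical helper name, and it assumes every numeric field is a plain non-negative integer (str.isdigit does not recognise signs or decimal points). It builds a plain list of lists, since a numpy array of mixed strings and integers would fall back to a single string or object dtype.

```python
def parse_row(row):
    """Split a comma-separated record and convert all-digit fields to int."""
    fields = []
    for field in row.split(','):
        # Assumption: numeric fields are non-negative integers such as '2000'
        fields.append(int(field) if field.isdigit() else field)
    return fields

list2 = ['Adrian,20,5,2000,green', 'Steve,15,6,1997,blue']
trial = [parse_row(row) for row in list2]

print(trial[0])     # ['Adrian', 20, 5, 2000, 'green']
print(trial[0][0])  # Adrian
```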

Related

How to find out if an arithmetic sequence exists in an array

If there is an array that contains random integers in ascending order, how can I tell if this array contains an arithmetic sequence (length>3) with the common difference x?
Example:
Input: Array=[1,2,4,5,8,10,17,19,20,23,30,36,40,50]
x=10
Output: True
Explanation of the example: the array contains [10,20,30,40,50], which is an arithmetic sequence (length=5) with the common difference 10.
Thanks!
I apologize that I have not tried any code to solve this, since I have no clue yet.
After reading the answers, I tried it in Python.
Here is my code:
df = [1,10,11,20,21,30,40]
i=0
common_differene=10
df_len=len(df)
for position_1 in range(df_len):
    for position_2 in range(df_len):
        if df[position_1] + common_differene == df[position_2]:
            position_1=position_2
            i=i+1
print(i)
However, it returns 9 instead of 4.
Is there any way to prevent the repeated counting within one sequence [10,20,30,40] and also prevent i from accumulating counts from other sequences [1,11,21]?
You can solve your problem by using 2 loops: one to run through every element and the other to check whether an element equals currentElement+x. If you find one that does, you can continue from there.
With the added rule of the sequence being more than 2 elements long, I have recreated your problem in FREE BASIC:
DIM array(13) As Integer = {1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50}
DIM x as Integer = 10
DIM arithmeticArrayMinLength as Integer = 3
DIM index as Integer = 0
FOR position As Integer = LBound(array) To UBound(array)
    FOR position2 As Integer = LBound(array) To UBound(array)
        IF (array(position) + x = array(position2)) THEN
            position = position2
            index = index + 1
        END IF
    NEXT
NEXT
IF (index <= arithmeticArrayMinLength) THEN
    PRINT false
ELSE
    PRINT true
END IF
Hope it helps
Edit:
After reviewing your edit, I have come up with a solution in Python that returns all arithmetic sequences, keeping the order of the list:
def arithmeticSequence(A,n):
    SubSequence=[]
    ArithmeticSequences=[]
    #Create array of pairs from array A
    for index,item in enumerate(A[:-1]):
        for index2,item2 in enumerate(A[index+1:]):
            SubSequence.append([item,item2])
    #finding arithmetic sequences
    for index,pair in enumerate(SubSequence):
        if (pair[1] - pair[0] == n):
            found = [pair[0],pair[1]]
            for index2,pair2 in enumerate(SubSequence[index+1:]):
                if (pair2[0]==found[-1] and pair2[1]-pair2[0]==n):
                    found.append(pair2[1])
            if (len(found)>2): ArithmeticSequences.append(found)
    return ArithmeticSequences
df = [1,10,11,20,21,30,40]
common_differene=10
arseq=arithmeticSequence(df,common_differene)
print(arseq)
Output: [[1, 11, 21], [10, 20, 30, 40], [20, 30, 40]]
This is how you can get all the arithmetic sequences out of df for you to do whatever you want with them.
Now, if you want to remove the sub-sequences of already existing arithmetic sequences, you can try running it through:
def distinct(A):
    DistinctArithmeticSequences = A
    for index,item in enumerate(A):
        for index2,item2 in enumerate([x for x in A if x != item]):
            if (set(item2) <= set(item)):
                DistinctArithmeticSequences.remove(item2)
    return DistinctArithmeticSequences
darseq=distinct(arseq)
print(darseq)
Output: [[1, 11, 21], [10, 20, 30, 40]]
Note: Not gonna lie, this was fun figuring out!
Try from 1: check the presence of 11, 21, 31... (you can stop immediately)
Try from 2: check the presence of 12, 22, 32... (you can stop immediately)
Try from 4: check the presence of 14, 24, 34... (you can stop immediately)
...
Try from 10: check the presence of 20, 30, 40... (bingo !)
You can use linear searches, but for a large array, a hash map will be better. If you can stop as soon as you have found a sequence of length > 3, this procedure takes linear time.
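The "try from each element" procedure with a hash-based lookup might look like the sketch below. The function name has_arithmetic_run and the min_length parameter are illustrative choices, not from the answer. It assumes a positive common difference and reads "length>3" as at least 4 elements. Elements that already have a predecessor in the set are skipped, so each chain is walked only once.

```python
def has_arithmetic_run(values, diff, min_length=4):
    """Return True if values contains v, v+diff, v+2*diff, ... of min_length terms."""
    present = set(values)  # hash-based membership test
    for v in values:
        if v - diff in present:
            continue  # v is in the middle of a chain; its start is handled elsewhere
        length = 1
        while v + length * diff in present:
            length += 1
            if length >= min_length:
                return True  # stop as soon as a long enough run is found
    return False

array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
print(has_arithmetic_run(array, 10))  # True
```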
Scan the list increasingly and for every element v, check if the element v + 10 is present and draw a link between them. This search can be done in linear time as a modified merge operation.
E.g. from 1, search 11; you can stop at 17; from 2, search 12; you can stop at 17; ... ; from 8, search 18; you can stop at 19...
Now you have a graph, the connected components of which form arithmetic sequences. You can traverse the array in search of a long sequence (or a longest), also in linear time.
In the given example, the only links are 10 -> 20 -> 30 -> 40 -> 50.
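One possible reading of this link-and-traverse idea is sketched below. It assumes the input is strictly ascending and the common difference is positive; the names longest_arithmetic_chain and successor are made up for illustration. The second pointer j only ever moves forward, which is what keeps the linking pass linear.

```python
def longest_arithmetic_chain(sorted_values, diff):
    """Link each value v to v + diff with a merge-style scan, then return the longest chain."""
    n = len(sorted_values)
    successor = [None] * n  # successor[i] = index of sorted_values[i] + diff, if present
    j = 0
    for i in range(n):
        target = sorted_values[i] + diff
        while j < n and sorted_values[j] < target:
            j += 1  # j never moves backwards because targets are ascending
        if j < n and sorted_values[j] == target:
            successor[i] = j
    # Chain length starting at each index, filled from the right (links point rightwards)
    length = [1] * n
    for i in range(n - 1, -1, -1):
        if successor[i] is not None:
            length[i] = length[successor[i]] + 1
    if n == 0:
        return []
    k = max(range(n), key=length.__getitem__)
    chain = []
    while k is not None:
        chain.append(sorted_values[k])
        k = successor[k]
    return chain

array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
print(longest_arithmetic_chain(array, 10))  # [10, 20, 30, 40, 50]
```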

Save integers into array given by first integer

I need to know how to save integers from stdin into an array selected by the first integer on the line. I hope that makes sense; here is an example.
On stdin I have:
0 : [ 1, 2, 3 ]
5 : [ 10, 11, 12, 13]
6 : [ 2, 4, 9 ]
0 : [ 4, 9, 8 ]
5 : [ 9, 6, 7 ]
5 : [ 1 ]
And I need to save these integers to the arrays like this:
0={1, 2, 3, 4, 9, 8}
5={10, 11, 12, 13, 9, 6, 7, 1}
6={2, 4, 9}
I absolutely don't know how to do it. The problem is that the number of arrays (in this case 0, 5, 6, so 3 arrays) can be very high, and I need to use memory efficiently. So I guess I will need something like malloc and free to solve this problem, or am I wrong? The names of the arrays (0, 5, 6) can change. The number of integers in brackets has no maximum limit.
Thank you for any help.
I'm going with the assumption that this is homework, and that it isn't your first, so I won't present a solution but rather some tips that should help you solve it yourself.
Given the input line
5 : [ 10, 11, 12, 13]
I will call "5" the "array name" and 10, 11, 12 and 13 the values to add.
You should implement some system to map array names to indices. A trivial approach would be like this:
size_t num_arrays;
size_t * array_names;
Here, in your example input, num_arrays will end up being 3 with array_names[3] = { 0, 5, 6}. If you find a new array name, realloc and add the new array name. Also you need the actual arrays for the values:
int * * array;
You need to realloc array for each new array name (just as you realloc array_names). array[0] will represent the array named array_names[0] (here array 0), array[1] the array named array_names[1] (here array 5), and array[2] the array named array_names[2] (here array 6).
To access an array, find its index like so:
size_t index;
for (index = 0; index < num_arrays && array_names[index] != search; ++index) ;
The second step is easy. Once you have figured out that you need to use array[index] to add elements, realloc that one (array[index] = realloc(array[index], new size)) and add the elements there: array[index][i+old_size] = new_value[i].
Obviously, you need to keep track of the number of elements in your separate arrays as well ;)
Hint: If searching for the array names takes too long, you will have to replace that trivial mapping part with a more sophisticated data structure, like a hash map or a binary search tree. The rest of the concept may stay more or less the same.
Should you have problems parsing the input lines, I suggest you open a new question specifically about the parsing part.
In algorithmic terms, you need a map (associative array) from ints to arrays. This was solved long ago in most higher-level languages.
If you have to implement it manually, you have a few options:
simple "master" array where you store your 0, 5, 6, 1000000 and then map them to indices 0, 1, 2, 3 by searching it each time you have to access an array (this becomes too time-consuming when there are many arrays);
hash table: write simple hash function to map 0, 5, 6, 1000000 (they're called keys) to values less than 1000, allocate array of 1000 elements and then make "master" array structures for each hash function result;
some kind of tree (e.g. red-black tree), may be a bit complex to implement manually.
The last two structures are programming classics and are well described in various articles and books.
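To illustrate what "a map from ints to arrays" looks like in a higher-level language, here is a rough Python sketch that groups the sample input by array name. It is not a substitute for the C exercise; the function name collect and the regular expression are illustrative assumptions, and the input format is assumed to be exactly "name : [ v1, v2, ... ]" per line.

```python
import re
from collections import defaultdict

def collect(lines):
    """Group bracketed integers under the array name that starts each line."""
    arrays = defaultdict(list)  # hash map: array name -> growing list of values
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, _, rest = line.partition(':')
        values = [int(tok) for tok in re.findall(r'-?\d+', rest)]
        arrays[int(name)].extend(values)
    return dict(arrays)

sample = [
    "0 : [ 1, 2, 3 ]",
    "5 : [ 10, 11, 12, 13]",
    "6 : [ 2, 4, 9 ]",
    "0 : [ 4, 9, 8 ]",
    "5 : [ 9, 6, 7 ]",
    "5 : [ 1 ]",
]
print(collect(sample))
# {0: [1, 2, 3, 4, 9, 8], 5: [10, 11, 12, 13, 9, 6, 7, 1], 6: [2, 4, 9]}
```

For real input, passing sys.stdin instead of sample should work the same way, since the function only iterates over lines.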

Confusion with Fancy indexing (for non-fancy people)

Let's assume a multi-dimensional array
import numpy as np
foo = np.random.rand(102,43,35,51)
I know that those last dimensions represent a 2D space (35,51) of which I would like to index a range of rows of a column
Let's say I want to have rows 8 to 30 of column 0
From my understanding of indexing I should call
foo[0][0][8::30][0]
Knowing my data though (unlike the random data used here), this is not what I expected
I could try this that does work but looks ridiculous
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]
Now from what I can find in this documentation I can also use
something like:
foo[0][0][[8,30],0]
which only gives me the values of rows 8 and 30
while this:
foo[0][0][[8::30],0]
gives an error
File "<ipython-input-568-cc49fe1424d1>", line 1
foo[0][0][[8::30],0]
^
SyntaxError: invalid syntax
I don't understand why the :: argument cannot be passed here. What is then a way to indicate a range in your indexing syntax?
So I guess my overall question is what would be the proper pythonic equivalent of this syntax:
foo[0][0][[8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30],0]
Instead of
foo[0][0][8::30][0]
try
foo[0, 0, 8:30, 0]
The foo[0][0] part is the same as foo[0, 0, :, :], selecting a 2d array (35 x 51). But foo[0][0][8::30] selects a subset of those rows.
Consider what happens when I use 0::30 on a 2d array:
In [490]: np.zeros((35,51))[0::30].shape
Out[490]: (2, 51)
In [491]: np.arange(35)[0::30]
Out[491]: array([ 0, 30])
The 30 is the step, not the stop value of the slice.
The last [0] then picks the first of those rows. With 0::30 the end result is the same as foo[0,0,0,:]; with the original 8::30 it is foo[0,0,8,:].
It is better, in most cases, to index multiple dimensions with the comma syntax. And if you want the first 30 rows use 0:30, not 0::30 (that's basic slicing notation, applicable to lists as well as arrays).
As for:
foo[0][0][[8::30],0]
Simplify it to x[[8::30], 0]. The Python interpreter accepts x[1:2:3, 0], translating it to the tuple (slice(1, 2, 3), 0) and passing it to the __getitem__ method. But the colon syntax is only accepted in a very specific context, directly inside the indexing brackets. The interpreter treats that inner set of brackets as a list, and colons are not accepted there.
foo[0,0,[1,2,3],0]
is ok, because the inner brackets are a list, and the numpy getitem can handle those.
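A quick way to see what the interpreter actually hands to __getitem__ is a throwaway class that returns its key unchanged. ShowIndex below is a hypothetical helper, not part of numpy, and the plain list at the end just shows that the stop-versus-step distinction is ordinary Python slicing:

```python
class ShowIndex:
    """Toy class (not numpy) that returns whatever key the interpreter builds."""
    def __getitem__(self, key):
        return key

x = ShowIndex()
print(x[1:2:3, 0])    # (slice(1, 2, 3), 0)
print(x[8:30, 0])     # (slice(8, 30, None), 0)
print(x[[8, 30], 0])  # ([8, 30], 0) -- the inner brackets are an ordinary list

rows = list(range(35))
print(rows[8:30])     # rows 8 through 29: 30 is the stop value
print(rows[8::30])    # [8]: here 30 is the step, so only index 8 fits in 35 rows
```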
numpy has a tool for converting a slice notation into a list of numbers. Play with that if it is still confusing:
In [495]: np.r_[8:30]
Out[495]:
array([ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29])
In [496]: np.r_[8::30]
Out[496]: array([0])
In [497]: np.r_[8:30:2]
Out[497]: array([ 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

How can I convert an array of integers into an array of digits for further processing?

I want to break given numbers into digits and sort. I expect to get:
unused_digits(2015, 8, 26) # => [0,1,2,2,5,6,8]
I tried:
def unused_digits(*x)
  x # => [2015, 8, 26]
  x = x.join.split "" # => [2, 0, 1, 5, 8, 2, 6]
  x = x.to_a # => [2, 0, 1, 5, 8, 2, 6]
  # other stuff here
  return x
end
If you are confused by the name "unused_digits", please ignore it and treat it as "find_out_used_digits".
Originally I was going to find the unused digits, but I got stuck at the first stage, finding the used digits, so I copied only the first part of the code and not the rest that finds the unused ones. My bad, apologies.
For the problem described in comments, here is the solution:
def unused_digits(*x)
  x.join.chars.sort.map(&:to_i)
end
unused_digits(2015,8,26)
#=> [0, 1, 2, 2, 5, 6, 8]
x is an array of arguments - [2015, 8, 26]
.join will join the arguments into a string and give us "2015826"
.chars will split the string into chars.
.sort will sort that character array
.map(&:to_i) will take each char and convert to number
TL;DR
Your question appears to be an X/Y problem, in large part because the name of your method (e.g. "unused_digits") doesn't actually seem to have anything to do with your expected return values. As originally posted, your method returns an array of used digits rather than unused digits.
If you truly want the return value to be [0,1,2,2,5,6,8] per your comment, then others have already posted useful answers. However, in the event that you actually want to return the digits that have not been used in any of your arguments (as suggested by your method name), then you may want to try the alternative described below.
Find Unused Digits with Array Difference
You can use various String functions to flatten an array of integers, and then use the Array difference method to return a de-duplicated list of unused digits. For example:
def unused_digits *integer_array
  Array(0..9) - integer_array.flatten.join.scan(/\d/).sort.map(&:to_i)
end
unused_digits 2015, 8, 26
#=> [3, 4, 7, 9]
unused_digits 2345678
#=> [0, 1, 9]
This will correctly return an array of digits that are not included in any passed arguments. This seems to be what is intended by your method name, but your mileage may certainly vary.
Beginning your function, you already have an array: [2015, 8, 26]. If that's what you want, then you don't have to do anything else.
By then calling split("") directly after join, you are converting your initial array into a string, then back into an array.
By way of an example, this is executing what is essentially the same code in irb, the interactive ruby shell:
>> digits = 2015,8,26
=> [2015, 8, 26]
>> joined = digits.join
=> "2015826"
>> split = joined.split("")
=> ["2", "0", "1", "5", "8", "2", "6"]
>> split.to_a
=> ["2", "0", "1", "5", "8", "2", "6"]
>> split.class
=> Array
As you can see, when you call join, your 2015,8,26 turns into "2015826", which is a string. After you call split(""), it becomes an array with each character as a separate element.
Calling to_a on what is already an array has no effect.
Hopefully that's helpful!
def unused_digits(*x)
  x.flat_map { |n| n.to_s.each_char.map(&:to_i) }.sort
end
unused_digits(2015,8,26)
#=> [0,1,2,2,5,6,8]

How to handle large files in python?

I am new to Python. I asked another question, How to arrange three lists in such a way that the sum of corresponding elements if greater then appear first? Now the problem is the following:
I am working with a large text file that has 419040 rows and 6 columns of floats. I take the first 3 columns to generate three lists, so each list I am working with has 419040 entries. While running the Python code that extracts the three columns into three lists, the Python shell stopped responding. I suspected the large number of entries was the cause. I used this code:
file=open("file_location","r")
a=[]
b=[]
c=[]
for lines in file:
    x=lines.split(" ")
    a.append(float(x[0]))
    b.append(float(x[1]))
    c.append(float(x[2]))
Note: for small file this code was running perfectly.
To avoid this problem I am using the following code:
import numpy as np
a = []
b = []
c = []
a,b,c = np.genfromtxt('file_location',usecols = [0,1,2], unpack=True)
When I run the code given in the answers to my previous question, the same problem happens. So what would the corresponding code be using numpy? Or are there any other solutions?
If you're going to use numpy, then I suggest using ndarrays, rather than lists. You can use loadtxt since you don't have to handle missing data. I assume it'll be faster.
a = np.loadtxt('file.txt', usecols=(0, 1, 2))
a is now a two-dimensional array, stored as an np.ndarray datatype. It should look like:
>>> a
array([[ 1, 20, 400],
[ 5, 30, 500],
[ 3, 50, 100],
[ 2, 40, 300],
[ 4, 10, 200]])
However, you now need to re-do what you did in the previous question, but using numpy arrays rather than lists. This can be easily achieved like so:
>>> b = a.sum(axis=1)
>>> b
Out[21]: array([421, 535, 153, 342, 214])
>>> i = np.argsort(b)[::-1]
>>> i
Out[26]: array([1, 0, 3, 4, 2])
>>> a[i, :]
Out[27]:
array([[ 5, 30, 500],
[ 1, 20, 400],
[ 2, 40, 300],
[ 4, 10, 200],
[ 3, 50, 100]])
The steps involved are described in a little greater detail here.
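For comparison, the same "largest row sum first" ordering can be sketched with only the standard library, reading the file one line at a time. This swaps numpy for plain Python and is not the approach from the answer above. The function name rows_by_descending_sum is hypothetical, and an in-memory io.StringIO stands in for the real file so the snippet is self-contained.

```python
import io

def rows_by_descending_sum(handle, columns=(0, 1, 2)):
    """Read whitespace-separated floats and return the chosen columns, largest row sum first."""
    rows = []
    for line in handle:
        parts = line.split()  # split() with no argument tolerates repeated spaces
        if not parts:
            continue  # skip blank lines
        rows.append(tuple(float(parts[c]) for c in columns))
    rows.sort(key=sum, reverse=True)
    return rows

# Stand-in for open("file_location"); six columns, of which only the first three are used
sample = io.StringIO(
    "1 20 400 0 0 0\n"
    "5 30 500 0 0 0\n"
    "3 50 100 0 0 0\n"
    "2 40 300 0 0 0\n"
    "4 10 200 0 0 0\n"
)
ordered = rows_by_descending_sum(sample)
print(ordered[0])   # (5.0, 30.0, 500.0)
print(ordered[-1])  # (3.0, 50.0, 100.0)
```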
