Trying to find area using SHAPE#AREA - cursor

I am trying to get the area of each polygon in the .shp file and then append it to a new column in the attribute table.
import os
import arcpy
import math
folderpath = 'C:\Users\Michaelf\Desktop\GEOG M173'
arcpy.env.workspace = folderpath
arcpy.env.overwriteOutput = True
input_shp = folderpath + r'\lower48_county_2012_election.shp'
equal_shape = folderpath + r'\project_lower48.shp'
out_point = folderpath + r'\lower_48_centroid.shp'
out_shp = folderpath + r'\48_State_Centroids.shp'
totarea = []
arcpy.AddField_management(input_shp, "totarea")
geometryField = arcpy.Describe(totarea).shapeFieldName
cursor = arcpy.UpdateCursor(totarea)
for row in cursor:
AreaValue = row.getValue(geometryField).area
row.setValue("total_area",AreaValue)
cursor.updateRow(row)
del row, cursor
print AreaValue
Below is one suggestion I've received but I don't quite understand it
with arcpy.da.SearchCursor(input_shp, ("OID#", "SHAPE#AREA")) as cursor:
for row in cursor:
print("Feature {0} has an area of {1}".format(row[0], row[1]))
Here is my error message:
Traceback (most recent call last):
File "C:/Users/Michaelf/Desktop/GEOG M173/test2.py", line 16, in <module>
geometryField = arcpy.Describe(totarea).shapeFieldName
File "C:\Program Files (x86)\ArcGIS\Desktop10.3\ArcPy\arcpy\__init__.py", line 1246, in Describe
return gp.describe(value)
File "C:\Program Files (x86)\ArcGIS\Desktop10.3\ArcPy\arcpy\geoprocessing\_base.py", line 374, in describe
self._gp.Describe(*gp_fixargs(args, True)))
RuntimeError: Object: Describe input value is not valid type

Since it's a geometry-centric database, Arc automatically stores the total area of a polygon as an attribute. There's no need to calculate it, which is why (I assume) that code snippet was suggested to you:
with arcpy.da.SearchCursor(input_shp, ("OID#", "SHAPE#AREA")) as cursor:
for row in cursor:
print("Feature {0} has an area of {1}".format(row[0], row[1]))
This does the following:
Create a cursor to go through the input shapefile input_shp, pulling just two attributes (the object ID and the total area).
Loop through the shapefile one row at a time.
For each row, print out row[0] (the object ID) and row[1] (the total area).
For further reading I'd take a look at this excellent answer on GIS.SE. It explores some of the options for accessing and calculating geometry in detail.
(Final note: the shape#area attribute will be in the units of the data's projection. Be aware of this if you're getting unexpected numbers.)

Related

Change ID in multiple FASTA files

I need to rename multiple sequences in multiple fasta files and I found this script in order to do so for a single ID:
original_file = "./original.fasta"
corrected_file = "./corrected.fasta"
with open(original_file) as original, open(corrected_file, 'w') as corrected:
records = SeqIO.parse(original_file, 'fasta')
for record in records:
print record.id
if record.id == 'foo':
record.id = 'bar'
record.description = 'bar' # <- Add this line
print record.id
SeqIO.write(record, corrected, 'fasta')
Each fasta file corresponds to a single organism, but it is not specified in the IDs. I have the original fasta files (because these have been translated) with the same filenames but different directories and include in their IDs the name of each organism.
I wanted to figure out how to loop through all these fasta files and rename each ID in each file with the corresponding organism name.
ok my effort, got to use my own input folders/files since they where not specified in question
/old folder contains files :
MW628877.1.fasta :
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
ATGCAAGATAAAAATTTAGTAGATGTTAATCTAACTAGTGAAATGAAAACGAGTTTTATCGATTACGCCA
TGAGTGTCATTGTTGCTCGTGCACTTCCAGATGTTAGAGATGGTTTAAAACCTGTTCATCGTCGTATTTT
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
CGGCGCGTACCGTACGCGATGCACGAGCTGAAAAATAACTGGAATGCCGCCTACAAAAAATCGGCGCGCA
TCGTCGGCGACGTCATCGGTAAATACCACCCCCACGGCGATTTCGCAGTTTACGGCACCATCGTCCGTAT
MG995190.1.fasta :
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCCAGCAGGAGA
TGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCGGAGGTGCGCGACGG
and an /empty folder.
/new folder contains files :
MW628877.1.fasta :
>MW628877.1
MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRI
>KY347969.1
RRVPYAMHELKNNWNAAYKKSARIVGDVIGKYHPHGDFAVYGTIVR
MG995190.1.fasta :
>MG995190.1
MTDTTLPPDDSLDRIEPVDIQQEMQRSYIDYAMSVIVGRALPEVRD
my code is :
from Bio import SeqIO
from os import scandir
old = './old'
new = './new'
old_ids_dict = {}
for filename in scandir(old):
if filename.is_file():
print(filename)
for seq_record in SeqIO.parse(filename, "fasta"):
old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])
print('_____________________')
print('old ids ---> ',old_ids_dict)
print('_____________________')
for filename in scandir(new):
if filename.is_file():
sequences = []
for seq_record in SeqIO.parse(filename, "fasta"):
if seq_record.id in old_ids_dict.keys():
print('### ', seq_record.id,' ', old_ids_dict[seq_record.id])
seq_record.id += '.'+old_ids_dict[seq_record.id]
seq_record.description = ''
print('-->', seq_record.id)
print(seq_record)
sequences.append(seq_record)
SeqIO.write(sequences, filename, 'fasta')
check how it works, it actually overwrites both files in new folder,
as pointed out by #Vovin in his comment it needs to be adapted per your files template from-to.
I am sure there is more than a way to do this, probably better and more pythonic than may way, I am learning too. Let us know

How can I create and shuffle a dataset for triplet mining in TensorFlow 2?

I'm working on a network using triplet mining for training. In order to make it work properly, I need my batches to contain several images of the same class. The problem I'm currently facing is that I have 751 classes, for a total of 12,937 pictures, and a batch size of 48 pictures. When shuffling the dataset using the command below, the odds to get pictures from the same class are really low, making the triplet mining inefficient.
dataset = dataset.shuffle(12937)
What I would need instead is a way of generating batches that contain a specific number of pictures for every class represented in this batch. As an example, let's say here that I want 12 classes per batch, there would be 4 pictures for each of them.
Another problem I'm facing is how would I shuffle this dataset at the end of every epoch so that I can have different batches that still follow the condition fixed above, that is 12 classes, 4 pictures for each one of them?
Is there any proper way to do it? I can't really find one. Please let me know if I'm unclear, and if you need further details.
================ EDIT ================
I've been trying a few things, and came up with something that would do what I want. The function would be the following:
counter = 0.
# Assuming a format such as (data, label)
def predicate(data, label):
global counter
allowed_labels = tf.constant([counter])
isallowed = tf.equal(allowed_labels, tf.cast(label, tf.float32))
reduced = tf.reduce_sum(tf.cast(isallowed, tf.float32))
counter += 1
return tf.greater(reduced, tf.constant(0.))
##tf.function
def custom_shuffle(train_dataset, batch_size, samples_per_class = 4, iterations_in_epoch = 100, database='market'):
assert batch_size%samples_per_class==0, F'batch size must be a {samples_per_class} multiple.'
if database == 'market':
class_nbr = 751
else:
raise Exception('Unsuported database yet')
all_datasets = [train_dataset.filter(predicate) for _ in range(class_nbr)] # Every element of this array is a dataset of one class
for i in range(iterations_in_epoch):
choice = tf.random.uniform(
shape=(batch_size//samples_per_class,),
minval=0,
maxval=class_nbr,
dtype=tf.dtypes.int64,
) # Which classes will be in batch
choice = tf.data.Dataset.from_tensor_slices(tf.concat([choice for _ in range(4)], axis=0)) # Exactly 4 picture from each class in the batch
batch = tf.data.experimental.choose_from_datasets(all_datasets, choice)
if i==0:
all_batches = batch
else:
all_batches = all_batches.concatenate(batch)
all_batches = all_batches.batch(batch_size)
return all_batches
It does what I want, however the returned dataset is extremely slow to iterate, making modele learning impossible. As per this thread, I understood that I needed to decorate custom_shuffle with #tf.function, as the one commented out. However, when doing so, it raises the following error:
Traceback (most recent call last):
File "training.py", line 137, in <module>
main()
File "training.py", line 80, in main
train_dataset = get_dataset(TRAINING_FILENAMES, IMG_SIZE, BATCH_SIZE, database=database, func_type='train')
File "E:\Morgan\TransReID_TF\tfr_to_dataset.py", line 260, in get_dataset
dataset = custom_shuffle(dataset, batch_size)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper
[[{{node BatchDatasetV2/_206}}]] [Op:__inference_custom_shuffle_11485]
Function call stack:
custom_shuffle
Which I don't understand, and don't see how to fix.
Is there something I'm doing wrong?
PS: I'm aware the lack of minimal code to reproduce this behavior makes it hard to debug, I'll try to provide some as soon as possible.

File Open Mechanism

I'm trying to create an application. The application gives the user 2 combo boxes. Combo Box 1 gives the first part of the file name the user wants, and Combo Box 2 gives the second part of the file name. E.g. Combo box 1 option 1 is 1 and Combo Box 2 option 1 is A; the selected file is 1_A.txt.
I have a load button which is to use the file name , and open a file with that name. If no file exists, the application opens a dialog saying "No Such File Exists"
from PySide import QtGui, QtCore
from PySide.QtCore import*
from PySide.QtGui import*
class MainWindow(QtGui.QMainWindow):
def __init__(self,):
QtGui.QMainWindow.__init__(self)
QtGui.QApplication.setStyle('cleanlooks')
#PushButtons
load_button = QPushButton('Load',self)
load_button.move(310,280)
run_Button = QPushButton("Run", self)
run_Button.move(10,340)
stop_Button = QPushButton("Stop", self)
stop_Button.move(245,340)
#ComboBoxes
#Option1
o1 = QComboBox(self)
l1 = QLabel(self)
l1.setText('Option 1:')
l1.setFixedSize(170, 20)
l1.move(10,230)
o1.move(200, 220)
o1.setFixedSize(100, 40)
o1.insertItem(0,'')
o1.insertItem(1,'A')
o1.insertItem(2,'B')
o1.insertItem(3,'test')
#Option2
o2 = QComboBox(self)
l2 = QLabel(self)
l2.setText('Option 2:')
l2.setFixedSize(200, 20)
l2.move(10,290)
o2.move(200,280)
o2.setFixedSize(100, 40)
o2.insertItem(0,'')
o2.insertItem(1,'1')
o2.insertItem(2,'2')
o2.insertItem(3,'100')
self.fileName = QLabel(self)
self.fileName.setText("Select Options")
o1.activated.connect(lambda: self.fileName.setText(o1.currentText() + '_' + o2.currentText() + '.txt'))
o2.activated.connect(lambda: self.fileName.setText(o1.currentText() + '_' + o2.currentText() + '.txt'))
load_button.clicked.connect(self.fileHandle)
def fileHandle(self):
file = QFile(str(self.fileName.text()))
open(file, 'r')
if __name__ == '__main__':
import sys
app = QtGui.QApplication(sys.argv)
window = MainWindow()
window.setWindowTitle("Test11")
window.resize(480, 640)
window.show()
sys.exit(app.exec_())
The error I'm getting is TypeError: invalid file: <PySide.QtCore.QFile object at 0x031382B0> and I suspect this is because the string described in the file handle isn't being inserted in the QFile properly. Can someone please help
The Python open() function doesn't have any knowledge of objects of type QFile. I doubt you actually need to construct a QFile object though.
Instead, just open the file directly via open(self.fileName.text(), 'r'). Preferably, you would do:
with open(self.fileName.text(), 'r') as myfile:
# do stuff with the file
unless you need to keep the file open for a long period of time
I came up with a solution also.
def fileHandle(self):
string = str(self.filename.text())
file = QFile()
file.setFileName(string)
file.open(QIODevice.ReadOnly)
print(file.exists())
line = file.readLine()
print(line)
What this does is that it takes the string of the filename field. Creates the file object. Names the file object the string, and then opens the file. I have exists to check if the file is there, and after reading the test document i have, ti seemed to work as I wanted.
Thanks anyway #three_pineapples, but I'm going to use my solution :P

ValueError: could not convert string to float: E-3 [duplicate]

This question already has answers here:
Issue querying from Access database: "could not convert string to float: E+6"
(3 answers)
Closed 7 years ago.
I am trying to access some of the columns in a Microsoft Access database table which contains numbers of type double but I am getting the error mentioned in the title.The code used for querying the database is as below, the error is occurring in the line where cur.execute(....) command is executed. Basically I am trying filter out data captured in a particular time interval. If I exclude the columns CM3_Up, CG3_Up, CM3_Down, CG3_Down which contains double data type in the cur.execute(....) command I wont get the error. Same logic was used to access double data type from other tables and it worked fine, I am not sure what is going wrong.
Code:
start =datetime.datetime(2015,03,28,00,00)
a=start
b=start+datetime.timedelta(0,240)
r=7
while a < (start+datetime.timedelta(1)):
params = (a,b)
sql = "SELECT Date_Time, CM3_Up, CG3_Up, CM3_Down, CG3_Down FROM
Lysimeter_Facility_Data_5 WHERE Date_Time >= ? AND Date_Time <= ?"
for row in cur.execute(sql,params):
if row is None:
continue
r = r+1
ws.cell(row = r,column=12).value = row.get('CM3_Up')
ws.cell(row = r,column=13).value = row.get('CG3_Up')
ws.cell(row = r,column=14).value = row.get('CM3_Down')
ws.cell(row = r,column=15).value = row.get('CG3_Down')
a = a+five_min
b = b+five_min
wb.save('..\SE_SW_Lysimeters_Weather_Mass_Experiment-02_03_26_2015.xlsx')
Complete error report:
Traceback (most recent call last):
File "C:\DB_PY\access_mdb\db_to_xl.py", line 318, in <module>
for row in cur.execute(sql,params):
File "build\bdist.win32\egg\pypyodbc.py", line 1920, in next
row = self.fetchone()
File "build\bdist.win32\egg\pypyodbc.py", line 1871, in fetchone
value_list.append(buf_cvt_func(alloc_buffer.value))
ValueError: could not convert string to float: E-3
As to this discussion:
Python: trouble reading number format
the trouble could be that e should be d, like:
float(row.get('CM3_Up').replace('E', 'D'))
Sounds weird to me though, but I know only little of Python.
It sounds like you receive strings like '2.34E-3', so try with a conversion. Don't know Python, but in C# it could be like:
ws.cell(row = r,column=12).value = Convert.ToDouble(row.get('CM3_Up'))
ws.cell(row = r,column=13).value = Convert.ToDouble(row.get('CG3_Up'))
ws.cell(row = r,column=14).value = Convert.ToDouble(row.get('CM3_Down'))
ws.cell(row = r,column=15).value = Convert.ToDouble(row.get('CG3_Down'))

Query or Expression for excluding certain values from DAL selection

I'm trying to exclude posts which have a tag named meta from my selection, by:
meta_id = db(db.tags.name == "meta").select().first().id
not_meta = ~db.posts.tags.contains(meta_id)
posts=db(db.posts).select(not_meta)
But those posts still show up in my selection.
What is the right way to write that expression?
My tables look like:
db.define_table('tags',
db.Field('name', 'string'),
db.Field('desc', 'text', default="")
)
db.define_table('posts',
db.Field('title', 'string'),
db.Field('message', 'text'),
db.Field('tags', 'list:reference tags'),
db.Field('time', 'datetime', default=datetime.utcnow())
)
I'm using Web2Py 1.99.7 on GAE with High Replication DataStore on Python 2.7.2
UPDATE:
I just tried posts=db(not_meta).select() as suggested by #Anthony, but it gives me a Ticket with the following Traceback:
Traceback (most recent call last):
File "E:\Programming\Python\web2py\gluon\restricted.py", line 205, in restricted
exec ccode in environment
File "E:/Programming/Python/web2py/applications/vote_up/controllers/default.py", line 391, in <module>
File "E:\Programming\Python\web2py\gluon\globals.py", line 173, in <lambda>
self._caller = lambda f: f()
File "E:/Programming/Python/web2py/applications/vote_up/controllers/default.py", line 8, in index
posts=db(not_meta).select()#orderby=settings.sel.posts, limitby=(0, settings.delta)
File "E:\Programming\Python\web2py\gluon\dal.py", line 7578, in select
return adapter.select(self.query,fields,attributes)
File "E:\Programming\Python\web2py\gluon\dal.py", line 3752, in select
(items, tablename, fields) = self.select_raw(query,fields,attributes)
File "E:\Programming\Python\web2py\gluon\dal.py", line 3709, in select_raw
filters = self.expand(query)
File "E:\Programming\Python\web2py\gluon\dal.py", line 3589, in expand
return expression.op(expression.first)
File "E:\Programming\Python\web2py\gluon\dal.py", line 3678, in NOT
raise SyntaxError, "Not suported %s" % first.op.__name__
SyntaxError: Not suported CONTAINS
UPDATE 2:
As ~ isn't currently working on GAE with Datastore, I'm using the following as a temporary work-around:
meta = db.posts.tags.contains(settings.meta_id)
all=db(db.posts).select()#, limitby=(0, settings.delta)
meta=db(meta).select()
posts = []
i = 0
for post in all:
if i==settings.delta: break
if post in meta: continue
else:
posts.append(post)
i += 1
#settings.delta is an long integer to be used with limitby
Try:
meta_id = db(db.tags.name == "meta").select().first().id
not_meta = ~db.posts.tags.contains(meta_id)
posts = db(not_meta).select()
First, your initial query returns a complete Row object, so you need to pull out just the "id" field. Second, not_meta is a Query object, so it goes inside db(not_meta) to create a Set object defining the set of records to select (the select() method takes a list of fields to return for each record, as well as a few other arguments, such as orderby, groupby, etc.).

Resources