How to copy DB from disk to memory with IronPython2 - database

I'm trying to figure out how to copy an existing database file on disk to memory to make queries on it faster. I know how to do this in CPython3 with:
import sqlite3
db_path = r"C:\path to\database.db"
db_disk = sqlite3.connect(db_path)
db_memory = sqlite3.connect(':memory:')
db_disk.backup(db_memory)
but the .backup() function doesn't exist in IronPython2 (SQLite library version 3.7.7).
Through various research, I've tried:
import clr
clr.AddReference('IronPython.SQLite.dll')
import sqlite3
db_path = r"C:\path to\database.db"
db_disk = sqlite3.connect(db_path)
db_memory = sqlite3.connect(':memory:')
script = ''.join(db_disk.iterdump())
db_memory.executescript(script)
and
db_server = sqlite3.connect(db_path)
db_memory = sqlite3.connect(':memory:')
script = "".join(line for line in db_server.iterdump())
db_memory.executescript(script)
But in both cases I keep getting an error on the line that calls iterdump():
Warning: IronPythonEvaluator.EvaluateIronPythonScript operation failed.
Traceback (most recent call last):
File "<string>", line 72, in <module>
NotImplementedError: Not supported with C#-sqlite for unknown reasons.
The code above came from seeing these posts:
Post 1
Post 2
Post 3
I was going to try the solution in this post, but I don't have apsw and I can't load any packages.
I was also going to try the solution in post 1 above, but again, I can't get pandas.io.sql or sqlalchemy.
Can anyone point me to a snippet of code that accomplishes this or correct my current code?
Thanks.
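Since the error says iterdump() simply isn't implemented in the C#-sqlite backend, one possible workaround is to copy the schema and rows by hand with plain DB-API calls. This is only a sketch, under the assumption that IronPython.SQLite supports the basic execute()/executemany()/fetchall() methods; it also interpolates table names directly, so it assumes simple table names that need no quoting:
import clr
clr.AddReference('IronPython.SQLite.dll')
import sqlite3

db_path = r"C:\path to\database.db"
db_disk = sqlite3.connect(db_path)
db_memory = sqlite3.connect(':memory:')

# Recreate each table's schema in the in-memory database, then copy its rows.
tables = db_disk.execute(
    "SELECT name, sql FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'").fetchall()
for name, create_sql in tables:
    db_memory.execute(create_sql)
    rows = db_disk.execute("SELECT * FROM %s" % name).fetchall()
    if rows:
        placeholders = ','.join('?' * len(rows[0]))
        db_memory.executemany(
            "INSERT INTO %s VALUES (%s)" % (name, placeholders), rows)
db_memory.commit()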

Related

Passing a cursor to ndb on Google App Engine Python 3 leads to an error

Note: This is happening on my development server (a Mac running Mojave).
I'm running Python 3 on Google App Engine (standard environment) and I have the code below:
cursor = ndb.Cursor(urlsafe = next_page) if next_page else ndb.Cursor()
q = myObject.query(myObject.link == linkKey).order(-myObject.created)
resultsFuture = q.fetch_page_async(PAGE_SIZE,start_cursor=cursor)
If next_page is not None (which means cursor is not None), I get the following error
Traceback (most recent call last):
File .../env/lib/python3.7/site-packages/google/cloud/ndb/_datastore_api.py", line 92, in rpc_call
result = yield rpc
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Error parsing protocol message"
debug_error_string = "{"created":"<CREATION_TIME>","description":"Error received from peer ipv6:<MY_IP_ADDRESS>","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Error parsing protocol message","grpc_status":3}"
>
Has anyone encountered this before and if so what was the solution?
If not, can anyone point me to a possible solution?
https://github.com/googleapis/python-ndb/issues/471#issuecomment-649173225 says:
Hi, the cursor returned by fetch_page_async is already an instance of Cursor. The urlsafe argument to Cursor is a string that can be derived by calling Cursor.urlsafe() on a cursor instance. Since the cursor returned by fetch_page_async is already a Cursor instance, it can be passed directly to another call to fetch_page_async.
So instead you should do:
cursor = next_page if next_page else ndb.Cursor()
q = myObject.query(myObject.link == linkKey).order(-myObject.created)
resultsFuture = q.fetch_page_async(PAGE_SIZE,start_cursor=cursor)
I had a similar error in the past week and finally fixed it with some help. If you are still facing this issue, kindly do the following:
cursor = ndb.Cursor(urlsafe = next_page.decode('utf-8')) if next_page else None
q = myObject.query(myObject.link == linkKey).order(-myObject.created)
resultsFuture = q.fetch_page_async(PAGE_SIZE,start_cursor=cursor)
This piece of code is really what fixes it:
next_page.decode('utf-8')
seeing as next_page is the urlsafe cursor_string you are sending and then later receiving from your frontend.
The problem seems to occur with Python 3 when you create the Cursor from a urlsafe web string. In Python 2.7 the urlsafe() method in datastore_query.Cursor() was, one way or another, decoded to UTF-8 for you, but in Python 3 you have to do the conversion manually; otherwise what you send to and later receive from the frontend is actually a raw bytestring. Mind you, that bytestring will convert smoothly to a Cursor object and even pass tests, but when it is sent as a _datastore_query.Cursor(), that is where the real problem arises: the Cursor created from that bytestring is wrong and leads to a status = StatusCode.INVALID_ARGUMENT error.
I hope this helps. Kindly let me know if you have any more questions on this.
For some more context you can read on the error here:
GRPC status code 3 for invalid arguments
Also, you can try logging the urlsafe (next_page) string you send to and receive from your frontend to verify whether it is indeed not UTF-8 decoded.
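To make the round trip concrete, here is a rough sketch reusing the names from the question (myObject, linkKey, PAGE_SIZE); the synchronous fetch_page is used just to keep the example short:
def fetch_page(linkKey, next_page=None):
    # next_page is the urlsafe token previously sent to the frontend; if it
    # arrives as bytes, decode it to a str before building the Cursor.
    cursor = ndb.Cursor(urlsafe=next_page.decode('utf-8')) if next_page else None
    q = myObject.query(myObject.link == linkKey).order(-myObject.created)
    results, next_cursor, more = q.fetch_page(PAGE_SIZE, start_cursor=cursor)
    # urlsafe() returns bytes in python-ndb; decode it before handing it to
    # the frontend so a clean UTF-8 string comes back on the next request.
    next_token = next_cursor.urlsafe().decode('utf-8') if more else None
    return results, next_token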

Why can't I invoke sagemaker endpoint with either bytes or file as payload

I have deployed a linear regression model on SageMaker. Now I want to write a Lambda function to make a prediction on input data. Files are pulled from S3 first. Some preprocessing is done and the final input is a pandas dataframe. According to the boto3 SageMaker documentation, the payload can be either bytes-like or a file, so I tried to convert the dataframe to a byte array using code from this post:
# Convert pandas dataframe to byte array
pred_np = pred_df.to_records(index=False)
pred_str = pred_np.tostring()
# Start sagemaker prediction
sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=pred_str,
    ContentType='text/csv',
    Accept='Accept')
I printed out pred_str which does seem like a byte array to me.
However, when I run it, I get the following Algorithm Error caused by a UnicodeDecodeError:
Caused by: 'utf8' codec can't decode byte 0xed in position 9: invalid continuation byte
The traceback shows Python 2.7; I'm not sure why that is:
Traceback (most recent call last):
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
data_iter = get_data_iterator(payload, **content_parameters)
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 99, in iterator_csv_dense_rank_2
payload = payload.decode("utf8")
File "/opt/amazon/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
Is the default decoder utf_8? What is the right decoder I should be using? Why is it complaining about position 9?
In addition, I also tried saving the dataframe to a csv file and using that as the payload:
pred_df.to_csv('pred.csv', index=False)
with open('pred.csv', 'rb') as f:
    payload = f.read()
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=payload,
    ContentType='text/csv',
    Accept='Accept')
However when I ran it I got the following error:
Customer Error: Unable to parse payload. Some rows may have more columns than others and/or non-numeric values may be present in the csv data.
And again, the traceback references Python 2.7:
Traceback (most recent call last):
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
data_iter = get_data_iterator(payload, **content_parameters)
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 123, in iterator_csv_dense_rank_2
It doesn't make sense at all, because it is a standard 6x78 dataframe: all rows have the same number of columns, and none of the columns are non-numeric.
How to fix this sagemaker issue?
I was finally able to make it work with the following code:
import io

payload = io.StringIO()
pred_df.to_csv(payload, header=None, index=None)
sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=payload.getvalue(),
    ContentType='text/csv',
    Accept='Accept')
It is very important to call the getvalue() function on the payload when invoking the endpoint. Hope this helps.
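As a small follow-up (not part of the original answer), the prediction itself comes back in the response's Body, which is a botocore StreamingBody; a minimal sketch of reading it:
# response comes from the invoke_endpoint call above. The exact format of
# the result depends on the algorithm and the Accept header.
result = response['Body'].read().decode('utf-8')
print(result)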

Django Model MultipleObjectsReturned

I am using django ORM to talk to a SQL Server DB.
I used the .raw() method to run a query:
@classmethod
def execute_native_queries(cls, query):
    return cls.objects.raw(query)
I later type-cast it into a list by running: data = list(modelName.execute_native_queries(query))
While iterating through the list, i would call certain columns as such:
for entry in data:
    a = entry.colA
    b = entry.colB
    c = entry.colC
For certain entries I am able to run one loop iteration fine, however for some I get the following error:
api.models.modelName.MultipleObjectsReturned: get() returned more than one modelName-- it returned 2!
What I do not get is why this error is surfacing at all.
EDIT: Added the stacktrace
Traceback (most recent call last):
File "<full filepath>\a.py", line 178, in method1
'vc': data.vc,
File "C:\FAST\Python\3.6.4\lib\site-packages\django\db\models\query_utils.py", line 137, in __get__
instance.refresh_from_db(fields=[self.field_name])
File "C:\FAST\Python\3.6.4\lib\site-packages\django\db\models\base.py", line 605, in refresh_from_db
db_instance = db_instance_qs.get()
File "C:\FAST\Python\3.6.4\lib\site-packages\django\db\models\query.py", line 403, in get
(self.model._meta.object_name, num)
api.models.modelName.MultipleObjectsReturned: get() returned more than one modelName-- it returned 2!
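Judging from the traceback, the failing access ('vc': data.vc) goes through Django's DeferredAttribute: the raw query apparently did not return that column, so Django lazily re-fetches the row with .get() filtered on the primary key, and that lookup finds two rows, which suggests the column mapped as the model's primary key is not actually unique in the table. A minimal sketch of the usual workaround, assuming a hypothetical table myTable whose id column really is unique, is to select the pk and every column the loop touches so that no field is deferred:
# Hypothetical query: include the pk (id here) and every column accessed
# later, so Django never has to call refresh_from_db() on a deferred field.
query = "SELECT id, colA, colB, colC, vc FROM myTable"
data = list(modelName.execute_native_queries(query))

for entry in data:
    a = entry.colA
    b = entry.colB
    c = entry.colC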

How to download a file via FTP and save it locally only if it does not exist already?

So, I'm downloading some data files from an FTP server. I need to go in daily, retrieve the new files, and save them on my PC, but only the new ones.
Code so far:
from ftplib import FTP
import os
ftp = FTP('ftp.example.com')
ftp.login()
ftp.retrlines('LIST')
filenames = ftp.nlst()
for filename in filenames:
    if filename not in ['..', '.']:
        local_filename = os.path.join('C:\\Financial Data\\', filename)
        file = open(local_filename, mode='x')
        ftp.retrbinary('RETR ' + filename, file.write)
I was thinking of using if not os.path.exists(), but I need os.path.join for this to work. Using open() with mode = 'x' as above, I get the following error message: "FileExistsError: [Errno 17] File exists"
Is error handling the way to go, or is there a neat trick that I'm missing?
I landed on the following solution:
filenames_ftp = ftp.nlst()
filenames_loc = os.listdir("C:\\Financial Data\\")
filenames = list(set(filenames_ftp) - set(filenames_loc))
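Putting that set difference together with the download loop could look roughly like this (a sketch; the host and directory come from the snippets above):
from ftplib import FTP
import os

local_dir = 'C:\\Financial Data\\'

ftp = FTP('ftp.example.com')
ftp.login()

# Download only the files that are not already present locally.
filenames_ftp = ftp.nlst()
filenames_loc = os.listdir(local_dir)
new_files = set(filenames_ftp) - set(filenames_loc) - {'.', '..'}

for filename in new_files:
    local_filename = os.path.join(local_dir, filename)
    with open(local_filename, 'wb') as f:  # binary mode to match retrbinary
        ftp.retrbinary('RETR ' + filename, f.write)

ftp.quit()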

How can I read a blob that was written to the Blobstore by a Pipeline within the test framework?

I have a pipeline that creates a blob in the blobstore and places the resulting blob_key in one of its named outputs. When I run the pipeline through the web interface I have built around it, everything works wonderfully. Now I want to create a small test case that will execute this pipeline, read the blob out from the blobstore, and store it to a temporary location somewhere else on disk so that I can inspect it. (Since testbed.init_files_stub() only stores the blob in memory for the life of the test).
The pipeline within the test case seems to work fine, and results in what looks like a valid blob_key, but when I pass that blob_key to the blobstore.BlobReader class, it cannot find the blob for some reason. From the traceback, it seems like the BlobReader is trying to access the real blobstore, while the writer (inside the pipeline) is writing to the stubbed blobstore. I have --blobstore_path set up on dev_appserver.py, and I do not see any blobs written to disk by the test case, but when I run it from the web interface, the blobs do show up there.
Here is the traceback:
Traceback (most recent call last):
File "/Users/mattfaus/dev/webapp/coach_resources/student_use_data_report_test.py", line 138, in test_serial_pipeline
self.write_out_blob(stage.outputs.xlsx_blob_key)
File "/Users/mattfaus/dev/webapp/coach_resources/student_use_data_report_test.py", line 125, in write_out_blob
writer.write(reader.read())
File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 837, in read
self.__fill_buffer(size)
File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 809, in __fill_buffer
self.__position + read_size - 1)
File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 657, in fetch_data
return rpc.get_result()
File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
return self.__get_result_hook(self)
File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/blobstore/blobstore.py", line 232, in _get_result_hook
raise _ToBlobstoreError(err)
BlobNotFoundError
Here is my test code:
def write_out_blob(self, blob_key, save_path='/tmp/blob.xlsx'):
    """Reads a blob from the blobstore and writes it out to the file."""
    print str(blob_key)
    # blob_info = blobstore.BlobInfo.get(str(blob_key))  # Returns None
    # reader = blob_info.open()  # Returns None
    reader = blobstore.BlobReader(str(blob_key))
    writer = open(save_path, 'w')
    writer.write(reader.read())
    print blob_key, 'written to', save_path

def test_serial_pipeline(self):
    stage = student_use_data_report.StudentUseDataReportSerialPipeline(
        self.query_config)
    stage.start_test()
    self.assertIsNotNone(stage.outputs.xlsx_blob_key)
    self.write_out_blob(stage.outputs.xlsx_blob_key)
It might be useful if you show how you finalized the blobstore file, or if you can try that finalization code separately. It sounds like the Files API didn't finalize the file correctly on the dev appserver.
Turns out that I was simply missing the .value property, here:
self.assertIsNotNone(stage.outputs.xlsx_blob_key)
self.write_out_blob(stage.outputs.xlsx_blob_key.value) # Don't forget .value!!
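For reference, the corrected test ends up looking roughly like this (only the .value access changes; xlsx_blob_key appears to be a pipeline output slot whose actual blob key lives on .value):
def test_serial_pipeline(self):
    stage = student_use_data_report.StudentUseDataReportSerialPipeline(
        self.query_config)
    stage.start_test()
    self.assertIsNotNone(stage.outputs.xlsx_blob_key)
    # The output is a slot object; pass its .value (the blob key) to the reader.
    self.write_out_blob(stage.outputs.xlsx_blob_key.value)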
[UPDATE]
The SDK dashboard also exposes a list of all blobs in your blobstore, conveniently sorted by creation date. It is available at http://127.0.0.1:8000/blobstore.