database python3 dimension error

c.execute('''CREATE TABLE IF NOT EXISTS Electronic
(elem_no INTEGER PRIMARY KEY, Econf text)''')
lists = []
for i in range(1, 5):
    filenm = str(i) + ".html"
    print(filenm)
    vals = []
    with open(filenm, "r") as elfile:
        for line in elfile:
            mstr = "]Electron Configuration"
            if mstr in line:
                vals.insert(len(vals), "Hello")
    print(len(vals))
    lists.append(vals)
print(lists)
c.executemany("INSERT INTO Electronic VALUES(?)", lists)
conn.commit()
where in each [1-5].html, I have a line:
$ grep "Electron Configuration" 1.html
[186]Electron Configuration 1s^1
Now, the problem is that with this exact setup I am getting this error:
1.html
1
2.html
1
3.html
1
4.html
1
[['Hello'], ['Hello'], ['Hello'], ['Hello']]
Traceback (most recent call last):
File "elem_parse.py", line 134, in <module>
c.executemany("INSERT INTO Electronic VALUES(?)",lists)
sqlite3.OperationalError: table Electronic has 2 columns but 1 values were supplied
As I have never worked with databases before, I tried:
c.executemany("INSERT INTO Electronic VALUES(?,?)",lists)
i.e. increasing the number of fields in VALUES, which then gives this error:
1.html
1
2.html
1
3.html
1
4.html
1
[['Hello'], ['Hello'], ['Hello'], ['Hello']]
Traceback (most recent call last):
File "elem_parse.py", line 134, in <module>
c.executemany("INSERT INTO Electronic VALUES(?,?)",lists)
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 2, and there are 1 supplied.
Since I have never worked with databases before, I am following Python-sqlite3 - Auto-increment of not null primary key?,
but now I am lost and can't figure out the problem.

When you use two parameter markers (?), you also need to supply two values in each element of the lists list. But in this application, you do not actually want to specify a different value (or any value at all) for the primary key column.
With two columns in the table, you either have to give values for all columns:
INSERT INTO Electronic VALUES (NULL, ?);
or specify the columns you want to use:
INSERT INTO Electronic(Econf) VALUES (?);
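A minimal sketch of that second form applied to the question's code (column name taken from the CREATE TABLE above):
# Name only the Econf column; SQLite fills the INTEGER PRIMARY KEY itself.
# Each element of lists is a one-item list, which matches the single ? marker.
c.executemany("INSERT INTO Electronic(Econf) VALUES (?)", lists)
conn.commit()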

Related

I am having parameter limitations with pyodbc. How do I manually perform inserts or escape the values to insert?

Let's say I want to execute an insert in this connection, which is valid:
import pyodbc
CONNSTR = "DRIVER={ODBC Driver 17 for SQL Server};"
"SERVER=....database.windows.net,1433;
"UID=...;PWD=...;DATABASE=..."
connection = pyodbc.connect(CONNSTR, autocommit=True)
cursor = connection.cursor()
Then, I make this insert, which is valid:
cursor.execute("INSERT INTO [dbo].[products]([name], [regular_price], [sale_price], [type]) VALUES (?, ?, ?, ?)", ["Hello", 1.1, 1.1, "lalala"])
That is: I build the query using parameters and insert a single record. This works (assume the table is valid and accepts those 4 columns).
But when I use 2100 or more arguments, I get an error:
>>> cursor.execute("INSERT INTO [dbo].[products]([name], [regular_price], [sale_price], [type]) VALUES " + ", ".join("(?, ?, ?, ?)" for _ in range(525)), ["Hello", 1.1, 1.1, "lalala"] * 525)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request. (8003) (SQLExecDirectW)')
>>> cursor.execute("INSERT INTO [dbo].[products]([name], [regular_price], [sale_price], [type]) VALUES " + ", ".join("(?, ?, ?, ?)" for _ in range(526)), ["Hello", 1.1, 1.1, "lalala"] * 526)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('07002', '[07002] [Microsoft][ODBC Driver 17 for SQL Server]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')
So it seems that using 2100 or more arguments is not allowed, and I need to support inserting up to 1000 records like this (in fact, this will be user-handled, so I DON'T KNOW how many columns the table will have).
So my question is: how do I escape the arguments manually so I don't have to resort to this argument-placeholder approach (which is limited on insert because of this)? Or, alternatively: is there a driver-enabled method in the ODBC adapters to insert a value through pyodbc (a method that actually takes care of the escaping by itself)?
Spoiler alert: no, there is no built-in process. What I had to do goes like this:
Compute the number of columns per row to insert.
Compute the number of rows to insert "per batch" as int(2099 / numcols).
Batch-insert using "insert into mytable({col_list}) values ..." with args_chunk as the parameters.
This will be explained now:
col_list will be ", ".join(cols), while numcols will be len(cols), where cols is a list of strings with the columns to insert.
args_chunk will be a flattened version of rows[index:index + batch_size] considering index as iterated from range(0, len(rows), batch_size).
... in the insert query will be ", ".join(["(?, ?...?, ?)"] * len(rows[index:index + batch_size])) where the number of question marks is numcols.
So the logic goes like this:
Considering the number of columns (which will be at most 1024), insert a number of rows that keeps the total number of arguments from exceeding 2099. Use this as a "safe" number of rows to insert per batch.
Each iteration will use that "safe" number of rows.
The query, on each iteration, will have the appropriate number of rows (and arguments).
The last iteration may have a different (lower) number of rows.
By the end, all of them will be safely inserted; a sketch of this logic follows below.
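A minimal sketch of that batching logic, assuming a pyodbc cursor, a table name, cols as a list of column names, and rows as a list of tuples (all names here are illustrative):
def batched_insert(cursor, table, cols, rows, max_params=2099):
    # Rows per batch so the total number of placeholders stays at or below max_params.
    numcols = len(cols)
    batch_size = max(1, max_params // numcols)
    col_list = ", ".join(cols)
    row_marks = "(" + ", ".join("?" * numcols) + ")"
    for index in range(0, len(rows), batch_size):
        chunk = rows[index:index + batch_size]
        placeholders = ", ".join([row_marks] * len(chunk))
        args_chunk = [value for row in chunk for value in row]  # flatten the chunk
        cursor.execute(
            "INSERT INTO {}({}) VALUES {}".format(table, col_list, placeholders),
            args_chunk)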

RuntimeError: The type numpy.ndarray(numpy.ustr) for column is not supported

I am trying to convert Python dataframe data types so that they can be returned through SQL Server using the sp_execute_external_script procedure. Some columns in particular are giving me issues. Sample data:
>>> df.column1
0 NaN
1 1403
2 NaN
3 NaN
4 NaN
Using the method found in another answer (https://stackoverflow.com/a/60779074/3084939) I created a function to do this and return a new series.
def str_convert(series):
    null_cells = series.isnull()
    return series.astype(str).mask(null_cells, np.NaN)
I then do:
df.column1 = str_convert(df.column1)
When I run the procedure in management studio I get an error:
Msg 39004, Level 16, State 20, Line 0
A 'Python' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 2, Line 0
An external script error occurred:
C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\functions\RxSummary.py:4: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
from pandas import DataFrame, Index, Panel
INTERNAL ERROR: should have tag
error while running BxlServer: caught exception: Error communicating between BxlServer and client: 0x000000e9
STDOUT message(s) from external script:
Express Edition will continue to be enforced.
Warning: numpy.int64 data type is not supported. Data is converted to float64.
Warning: numpy.int64 data type is not supported. Data is converted to float64.
SqlSatelliteCall function failed. Please see the console output for more information.
Traceback (most recent call last):
STDOUT message(s) from external script:
File "C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 605, in rx_sql_satellite_call
rx_native_call("SqlSatelliteCall", params)
File "C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\RxSerializable.py", line 375, in rx_native_call
ret = px_call(functionname, params)
RuntimeError: The type numpy.ndarray(numpy.ustr) for column1 is not supported.
No idea where to start. When I simply do the below, it does not error, but NaN values get replaced with 'nan' and therefore come back as the literal string rather than NULL in SQL Server, which is not what I want. Hopefully someone else has some insight as to what's going on. I tried searching and nothing relevant came up.
df.column1 = df.column1.astype(str)
Edit:
A more trivial example seems to reveal this occurring when the first value in the series is NaN.
declare @script nvarchar(max) = N'
import os
import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame([[np.NaN, "a", "b"],["w","x",np.NaN],[1, 2, 3]])
df.columns = ["a","b","c"]
print(df.head())
'
execute sp_execute_external_script
    @language = N'Python',
    @script = @script,
    @output_data_1_name = N'df'
with result sets ((
    a varchar(100) null
    ,b varchar(100) null
    ,c varchar(100) null
))
I believe that this might be due to the fact that np.NaN is not a string and therefore cannot be converted to varchar.
Try casting the df values to str, i.e. in your latest example:
df["a"] = df["a"].apply(str)

Errors Dropping an Index in Python

I have these pymssql index errors. I created 3 indexes in my code and I want to drop them, but they won't drop; I get these errors.
Code:
DbConnect = 'Micros'
myDbConn = pymssql.connect(*******,"******", "*******",DbConnect)
cursor = myDbConn.cursor()
cursor.execute("""DROP INDEX [IF EXISTS]
Micros ON payrolldata;""")
cursor.execute("""DROP INDEX [IF EXISTS]
MainSort ON s20data,StoreSort ON s20data;""")
myDbConn.commit()
Error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python37-32\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "C:\Python37-32\SqlVersionpr_import.py", line 840, in proceed
Micros ON payrolldata;""")
File "src\pymssql.pyx", line 465, in pymssql.Cursor.execute
pymssql.ProgrammingError: (102, b"Incorrect syntax near 'Micros'.DB-Lib error message 20018, severity
15:\nGeneral SQL Server error: Check messages from the SQL Server\n")
The correct syntax would be:
DROP INDEX IF EXISTS IndexName;
DROP INDEX IndexName;
Square brackets mean that the parameter or part of the statement is optional.
See the documentation:
https://learn.microsoft.com/en-us/sql/t-sql/statements/drop-index-transact-sql?view=sql-server-ver15
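Applied to the question's code, a minimal sketch (index and table names taken from the question; DROP INDEX IF EXISTS requires SQL Server 2016 or later):
# Square brackets removed; IF EXISTS is written literally, one index per statement.
cursor.execute("DROP INDEX IF EXISTS Micros ON payrolldata;")
cursor.execute("DROP INDEX IF EXISTS MainSort ON s20data;")
cursor.execute("DROP INDEX IF EXISTS StoreSort ON s20data;")
myDbConn.commit()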

Pandas insert into SQL Server

I've read an Excel file with 5 columns into a dataframe (using Pandas) and I'm trying to write it to an existing empty SQL Server table using this code:
for index, row in df.iterrows():
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)",
                    row['dfcolumn1'], row['dfcolumn2'], row['dfcolumn3'], row['dfcolumn4'], row['dfcolumn5'])
I get the following error message:
TypeError: execute() takes from 2 to 5 positional arguments but 7 were given
df.shape says I have 5 columns, but when I print the df to the screen it includes the RowNumber. Also, one of the columns is city_state, which includes a comma. Is this the reason it thinks I'm providing 7 arguments (5 actual columns + row number + the comma issue)? Is there a way to deal with the comma and row index columns in the dataframe before writing into SQL Server? If shape says 5 columns, why am I getting this error?
The code above indicated 7 parameters were being passed to the cursor execute command, and only between 2 and 5 are permissible. I was actually passing 7 parameters (Insert into, Values, and row['dfcolumn1'] through row['dfcolumn5'], 7 in all). The fix was to convert each row to a tuple using this code:
new_tuple = [tuple(r) for r in df.values.tolist()]
then I rewrote the for loop as follows:
for row_tuple in new_tuple:
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)", row_tuple)
This delivered the fields as tuples and they inserted correctly.
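As an aside, a sketch of the same insert using executemany, which takes the whole list of tuples at once (assuming a pyodbc-style cursor and the new_tuple list built above):
# executemany binds each tuple in new_tuple against the five placeholders in turn.
PRCcrsr.executemany(
    "Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)",
    new_tuple)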

Weird behavior with web2py DAL on table definition

Hi, I'm getting behaviour I don't understand with web2py.
In [50]: db = DAL('sqlite://deposit/sample.sqlite')
In [51]: db.define_table('customer', Field('name','string',required=True),
                         Field('nric','string',required=True),
                         Field('address','string'),
                         Field('phone','integer'),
                         primarykey=['name'])
Out[51]: <Table customer (name,nric,address,phone)>
works as expected.
I then do
In [53]: db.define_table('check',
                         Field('nric', db.customer.nric, required=True),
                         Field('clear','string'))
which gets me the message
AttributeError: 'DAL' object has no attribute 'customer.nric'
So, thinking this may be an issue of not having committed customer to the database, I do a db.commit() and then try again:
In [56]: db.define_table('check',Field('nric', db.customer.nric, required=True), Field('clear','string'))
File "<string>", line unknown
SyntaxError: table already defined: check
Not sure why, but anyway I try to drop the table:
In [59]: db['check'].drop()
and get the following weird traceback
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-59-998297b798f5> in <module>()
----> 1 db['check'].drop()
/home/tahnoon/.dropbox-cyn/Dropbox (TIS Group)/Cynopsis/Builds/Apollo/Code Src/web2py/gluon/dal.pyc in drop(self, mode)
9225
9226 def drop(self, mode=''):
-> 9227 return self._db._adapter.drop(self, mode)
9228
9229 def _listify(self, fields, update=False):
/home/tahnoon/.dropbox-cyn/Dropbox (TIS Group)/Cynopsis/Builds/Apollo/Code Src/web2py/gluon/dal.pyc in drop(self, table, mode)
1328 queries = self._drop(table, mode)
1329 for query in queries:
-> 1330 if table._dbt:
1331 self.log(query + '\n', table)
1332 self.execute(query)
/home/tahnoon/.dropbox-cyn/Dropbox (TIS Group)/Cynopsis/Builds/Apollo/Code Src/web2py/gluon/dal.pyc in __getitem__(self, key)
9108 return self._db(self._id == key).select(limitby=(0, 1), orderby_on_limitby=False).first()
9109 elif key:
-> 9110 return ogetattr(self, str(key))
9111
9112 def __call__(self, key=DEFAULT, **kwargs):
AttributeError: 'Table' object has no attribute '_dbt'
Checking tables shows
In [61]: db.tables()
Out[61]: ['customer']
Is this expected behaviour? If so, how do I drop/create a table after a syntax error? Thanks.
Since db.customer is a keyed table (i.e., you have defined a primarykey attribute rather than relying on the default autoincrement integer ID field as the primary key), it can only be referenced by other keyed tables.
Also, when creating reference fields for keyed tables, use the following syntax:
Field('nric', 'reference customer.nric', required=True)
However, I don't think keyed tables are supported for SQLite (the docs say only DB2, MS-SQL, Ingres, and Informix are supported). Anyway, if you are creating a new table in SQLite, there is no reason to use a keyed table (that functionality was added primarily to enable access to legacy databases that lack autoincrement integer primary key fields).
Finally, dropping a table does not remove the model from the db DAL instance -- rather, that operation drops the table from the database itself. If you want to redefine a model within a shell session, you should use the "redefine" argument:
db.define_table(..., redefine=True)
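Putting the answer's points together, a sketch for the shell session above (the primarykey choice for check is an assumption, since per the answer a keyed table can only be referenced by other keyed tables, and keyed tables may not be supported on SQLite at all):
db.define_table('check',
                Field('nric', 'reference customer.nric', required=True),
                Field('clear', 'string'),
                primarykey=['nric'],  # assumption: check must itself be keyed to reference the keyed customer table
                redefine=True)        # allow redefining the model within the same shell session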
