"Invalid key: 0. Please first select a split. For example: `my_dataset_dictionary['train'][0]`. Available splits: ['train']" - dataset

I tried to use the datasets API with my own data to train a Hugging Face model.
This is my code:
train_data= datasets.load_dataset('csv', data_files="/gdrive/MyDrive/project/train.csv")
test_data= datasets.load_dataset('csv', data_files="/gdrive/MyDrive/project/test.csv")
train_data
DatasetDict({
train: Dataset({
features: ['Post', 'Label'],
num_rows: 174
})
})
But when I run the trainer, I get the error message below. What is wrong with my dataset? I cannot find any error. Could you help me? Thank you!
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_data,
eval_dataset=test_data
)
trainer.train()
KeyError Traceback (most recent call last)
<ipython-input-63-3435b262f1ae> in <module>()
----> 1 trainer.train()
5 frames
/usr/local/lib/python3.7/dist-packages/datasets/dataset_dict.py in __getitem__(self, k)
44 suggested_split = available_suggested_splits[0] if available_suggested_splits else list(self)[0]
45 raise KeyError(
---> 46 f"Invalid key: {k}. Please first select a split. For example: "
47 f"`my_dataset_dictionary['{suggested_split}'][{k}]`. "
48 f"Available splits: {sorted(self)}"
KeyError: "Invalid key: 0. Please first select a split. For example: `my_dataset_dictionary['train'][0]`. Available splits: ['train']"
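The error text itself points at the likely cause: load_dataset returns a DatasetDict keyed by split name (the printout above shows a single 'train' split), while the trainer indexes its dataset with integers. The sketch below uses a toy stand-in class (SplitDict is hypothetical and not part of the datasets library) to reproduce that behaviour without any dependency, and to show that selecting the split first makes integer indexing work.

```python
class SplitDict(dict):
    """Toy stand-in for a dict of splits; NOT the real datasets.DatasetDict."""

    def __getitem__(self, key):
        # Only split names (strings) are valid keys, as in the traceback above.
        if not isinstance(key, str):
            raise KeyError(
                f"Invalid key: {key}. Please first select a split. "
                f"Available splits: {sorted(self)}"
            )
        return super().__getitem__(key)


# Two made-up rows standing in for the 174-row CSV.
rows = [{"Post": "first post", "Label": 0}, {"Post": "second post", "Label": 1}]
train_data = SplitDict(train=rows)

try:
    train_data[0]  # what a trainer effectively does with the whole dict
    failed = False
except KeyError:
    failed = True

train_split = train_data["train"]  # select the split first
first_row = train_split[0]         # integer indexing now works
print(failed, first_row)
```

Under that assumption, the corresponding change in the question's code would be train_dataset=train_data['train'] and eval_dataset=test_data['train']. The test CSV was also loaded without a split name, so it presumably ends up under 'train' as well.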

Related

Complex number in Fenics

I am currently trying to solve a complex-valued PDE with Fenics in a jupyter notebook but I am having trouble when I try to use a complex number in Fenics.
Here is how I've defined the variational problem:
u = TrialFunction(V)
v = TestFunction(V)
a = (inner(grad(u[0]), grad(v[0])) + inner(grad(u[1]), grad(v[1])))*dx + sin(lat)*(u[0]*v[1]-u[1]*v[0])*dx+1j*((-inner(grad(u[0]), grad(v[1])) + inner(grad(u[1]), grad(v[0])))*dx + (sin(lat)*(u[0]*v[0]-u[1]*v[1])*dx))
f = Constant((1.0,1.0))
b = (v[0]*f[0]+f[1]*v[1])*ds+1j*((f[1]*v[0]-f[0]*v[1])*ds)
I got the following error message:
AttributeError Traceback (most recent call last)
<ipython-input-74-7760afa5a395> in <module>()
1 u = TrialFunction(V)
2 v = TestFunction(V)
----> 3 a = (inner(grad(u[0]), grad(v[0])) + inner(grad(u[1]), grad(v[1])))*dx + sin(lat)*(u[0]*v[1]-u[1]*v[0])*dx+1j*((-inner(grad(u[0]), grad(v[1])) + inner(grad(u[1]), grad(v[0])))*dx + (sin(lat)*(u[0]*v[0]-u[1]*v[1])*dx)
4 f = Constant((0.0,0.0))
5 b = (v[0]*f[0]+f[1]*v[1])*ds+1j*((f[1]*v[0]-f[0]*v[1])*ds)
~/anaconda3_420/lib/python3.5/site-packages/ufl/form.py in __rmul__(self, scalar)
305 "Multiply all integrals in form with constant scalar value."
306 # This enables the handy "0*form" or "dt*form" syntax
--> 307 if is_scalar_constant_expression(scalar):
308 return Form([scalar*itg for itg in self.integrals()])
309 return NotImplemented
~/anaconda3_420/lib/python3.5/site-packages/ufl/checks.py in is_scalar_constant_expression(expr)
84 if is_python_scalar(expr):
85 return True
---> 86 if expr.ufl_shape:
87 return False
88 return is_globally_constant(expr)
AttributeError: 'complex' object has no attribute 'ufl_shape'
Could someone please help me?
By the way, Fenics might not be the best tool for solving complex-valued PDEs, so I would welcome your suggestions for such problems.
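The traceback shows UFL's Form.__rmul__ rejecting the Python complex scalar 1j. Since u[0] and u[1] already look like the real and imaginary parts of the unknown, one possible workaround is to stay entirely in real arithmetic: write out the real and imaginary equations separately and add them, rather than multiplying one of them by 1j. The scalar sketch below illustrates only that splitting idea in plain Python (no FEniCS involved); the function name and the numbers are made up.

```python
def solve_complex_as_real(a, b, f, g):
    """Solve (a + i*b) * (x + i*y) = f + i*g using real arithmetic only.

    Expanding the product gives two real equations:
        a*x - b*y = f   (real part)
        b*x + a*y = g   (imaginary part)
    which form a 2x2 real system, solved here by Cramer's rule.
    """
    det = a * a + b * b
    if det == 0:
        raise ZeroDivisionError("coefficient a + i*b is zero")
    x = (a * f + b * g) / det
    y = (a * g - b * f) / det
    return x, y


# Compare against Python's built-in complex division.
x, y = solve_complex_as_real(2.0, 1.0, 1.0, 1.0)
reference = complex(1.0, 1.0) / complex(2.0, 1.0)
print(x, y, reference)
```

In the variational setting the same idea would mean assembling one real bilinear form over the mixed (real, imaginary) space. Whether the signs in the form above match the intended PDE still has to be checked against the original equation.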

evaluating test dataset using eval() in LightGBM

I have trained a ranking model with LightGBM with the objective 'lambdarank'.
I want to evaluate my model to get the nDCG score for my test dataset using the best iteration, but I have not been able to use either lightgbm.Booster.eval() or lightgbm.Booster.eval_train().
First, I have created 3 dataset instances, namely the train set, valid set and test set:
lgb_train = lgb.Dataset(x_train, y_train, group=query_train, free_raw_data=False)
lgb_valid = lgb.Dataset(x_valid, y_valid, reference=lgb_train, group=query_valid, free_raw_data=False)
lgb_test = lgb.Dataset(x_test, y_test, group=query_test)
I then train my model using lgb_train and lgb_valid:
gbm = lgb.train(params,
lgb_train,
num_boost_round=1500,
categorical_feature=chosen_cate_features,
valid_sets=[lgb_train, lgb_valid],
evals_result=evals_result,
early_stopping_rounds=150
)
When I call the eval() or the eval_train() functions after training, it returns an error:
gbm.eval(data=lgb_test,name='test')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-122-7ff5ef5136b8> in <module>()
----> 1 gbm.eval(data=lgb_test,name='test')
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in eval(self, data, name, feval)
1925 raise TypeError("Can only eval for Dataset instance")
1926 data_idx = -1
-> 1927 if data is self.train_set:
1928 data_idx = 0
1929 else:
AttributeError: 'Booster' object has no attribute 'train_set'
gbm.eval_train()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-123-0ce5fa3139f5> in <module>()
----> 1 gbm.eval_train()
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in eval_train(self, feval)
1956 List with evaluation results.
1957 """
-> 1958 return self.__inner_eval(self.__train_data_name, 0, feval)
1959
1960 def eval_valid(self, feval=None):
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in __inner_eval(self, data_name, data_idx, feval)
2352 """Evaluate training or validation data."""
2353 if data_idx >= self.__num_dataset:
-> 2354 raise ValueError("Data_idx should be smaller than number of dataset")
2355 self.__get_eval_info()
2356 ret = []
ValueError: Data_idx should be smaller than number of dataset
And when I call the eval_valid() function, it returns an empty list.
Can anyone tell me how to evaluate a LightGBM model and get the nDCG score using test set properly? Thanks.
If you add keep_training_booster=True as an argument to lgb.train, the returned Booster object can run eval and eval_train (though eval_valid still returns an empty list for some reason, even when valid_sets is passed to lgb.train).
Documentation says:
keep_training_booster (bool, optional (default=False)) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning.
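If Booster.eval still proves awkward, the metric can also be computed by hand from the scores returned by gbm.predict(x_test). The sketch below is a plain-Python NDCG@k, assuming the exponential gain 2^label - 1 and a log2 position discount. The function names and the toy data are made up, and the result may differ slightly from LightGBM's own metric (for example in how ties or all-zero queries are handled).

```python
import math


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain of labels already sorted by predicted score."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)
        for pos, rel in enumerate(ranked_labels[:k])
    )


def ndcg_at_k(labels, scores, k):
    """NDCG@k for a single query."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    # Convention chosen here: a query with no relevant items scores 1.0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 1.0


def mean_ndcg(labels, scores, group_sizes, k):
    """Average NDCG@k over queries; group_sizes plays the role of `group`."""
    results, start = [], 0
    for size in group_sizes:
        end = start + size
        results.append(ndcg_at_k(labels[start:end], scores[start:end], k))
        start = end
    return sum(results) / len(results)


# Toy data: two queries with three documents each.
labels = [3, 2, 0, 0, 1, 2]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.1]
print(mean_ndcg(labels, scores, group_sizes=[3, 3], k=3))
```

With the question's variables, the call would look like mean_ndcg(list(y_test), list(gbm.predict(x_test)), list(query_test), k), provided query_test holds the per-query group sizes in the same order as the rows.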

Python3 - IndexError when trying to save a text file

I'm trying to follow this tutorial with my own local data files:
CNTK tutorial
I have the following function to save my data array into a txt file that can be fed to CNTK:
# Save the data files into a format compatible with CNTK text reader
def savetxt(filename, ndarray):
    dir = os.path.dirname(filename)
    if not os.path.exists(dir):
        os.makedirs(dir)
    if not os.path.isfile(filename):
        print("Saving", filename )
        with open(filename, 'w') as f:
            labels = list(map(' '.join, np.eye(11, dtype=np.uint).astype(str)))
            for row in ndarray:
                row_str = row.astype(str)
                label_str = labels[row[-1]]
                feature_str = ' '.join(row_str[:-1])
                f.write('|labels {} |features {}\n'.format(label_str, feature_str))
    else:
        print("File already exists", filename)
I have 2 ndarrays of the following shapes that I want to feed to the model:
train.shape
(1976L, 15104L)
test.shape
(1976L, 15104L)
Then I try to call the function like this:
# Save the train and test files (prefer our default path for the data)
data_dir = os.path.join("C:/Users", 'myself', "OneDrive", "IA Project", 'data', 'train')
if not os.path.exists(data_dir):
    data_dir = os.path.join("data", "IA Project")
print ('Writing train text file...')
savetxt(os.path.join(data_dir, "Train-128x118_cntk_text.txt"), train)
print ('Writing test text file...')
savetxt(os.path.join(data_dir, "Test-128x118_cntk_text.txt"), test)
print('Done')
and then I get the following error:
Writing train text file...
Saving C:/Users\A702628\OneDrive - Atos\Microsoft Capstone IA\Capstone data\train\Train-128x118_cntk_text.txt
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-24-b53d3c69b8d2> in <module>()
6
7 print ('Writing train text file...')
----> 8 savetxt(os.path.join(data_dir, "Train-128x118_cntk_text.txt"), train)
9
10 print ('Writing test text file...')
<ipython-input-23-610c077db694> in savetxt(filename, ndarray)
12 for row in ndarray:
13 row_str = row.astype(str)
---> 14 label_str = labels[row[-1]]
15 feature_str = ' '.join(row_str[:-1])
16 f.write('|labels {} |features {}\n'.format(label_str, feature_str))
IndexError: list index out of range
Can somebody please tell me what's going wrong with this part of the code, and how I could fix it? Thank you very much in advance.
Since you're using your own input data: are the labels (the last column of each row) integers in the range 0 to 10? The labels list built from np.eye(11) only has 11 entries, so any label outside that range would cause an out-of-range problem.
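One way to confirm this is to validate every label before it is used as an index, so that the failure names the offending value instead of raising a bare IndexError. The sketch below is plain Python without NumPy; format_cntk_line and num_classes are illustrative names and not part of CNTK.

```python
def format_cntk_line(row, num_classes):
    """Format one row (features..., label) as a CNTK text-reader line."""
    label = int(row[-1])
    if not 0 <= label < num_classes:
        raise ValueError(
            f"label {label} is outside the range 0..{num_classes - 1}"
        )
    # One-hot encode the label, e.g. label 2 of 4 classes -> "0 0 1 0".
    one_hot = " ".join("1" if i == label else "0" for i in range(num_classes))
    features = " ".join(str(value) for value in row[:-1])
    return f"|labels {one_hot} |features {features}"


line = format_cntk_line([5, 7, 2], num_classes=4)
print(line)

try:
    format_cntk_line([5, 7, 11], num_classes=11)  # 11 is not a valid class
    out_of_range = False
except ValueError:
    out_of_range = True
```

The check also catches negative labels, which plain list indexing would otherwise silently wrap around to the end of the list.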

Loading data into R with rsqlserver package

I've just installed rsqlserver like so (no errors)
install_github('rsqlserver', 'agstudy',args = '--no-multiarch')
And created a connection to my database:
> library(rClr)
> library(rsqlserver)
Warning message:
multiple methods tables found for ‘dbCallProc’
> drv <- dbDriver("SqlServer")
> conn <- dbConnect(drv, url = "Server=MyServer;Database=MyDB;Trusted_Connection=True;")
>
Now when I try to get data using dbGetQuery, I get this error:
> df <- dbGetQuery(conn, "select top 100 * from public2013.dim_Date")
Error in clrCall(sqlDataHelper, "GetConnectionProperty", conn, prop) :
Type: System.MissingMethodException
Message: Method not found: 'System.Object System.Reflection.PropertyInfo.GetValue(System.Object)'.
Method: System.Object GetConnectionProperty(System.Data.SqlClient.SqlConnection, System.String)
Stack trace:
at rsqlserver.net.SqlDataHelper.GetConnectionProperty(SqlConnection _conn, String prop)
>
When I try to fetch results using dbSendQuery, I also get an error.
> res <- dbSendQuery(conn, "select top 100 * from public2013.dim_Date")
> df <- fetch(res, n = -1)
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
Strangely, the file c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs doesn't actually exist on my computer.
Am I doing something wrong?
I am agstudy, the creator of the rsqlserver package. Sorry for the late reply, but I finally got some time to fix this bug (actually, it was a feature that had not been implemented yet). I demonstrate here how you can read and write a data.frame with missing values in SQL Server.
First I create a data.frame with missing values. It is important to distinguish between numeric and character variables.
library(rsqlserver)
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## create a table with some missing value
dat <- data.frame(txt=c('a',NA,'b',NA),
value =c(1L,NA,NA,2))
My input looks like this:
# txt value
# 1 a 1
# 2 <NA> NA
# 3 b NA
# 4 <NA> 2
I insert dat into my database with the handy function dbWriteTable:
dbWriteTable(conn,name='T_TABLE_WITH_MISSINGS',
dat,row.names=FALSE,overwrite=TRUE)
Then I will read it using 2 methods:
dbSendQuery
res = dbSendQuery(conn,'SELECT *
FROM T_TABLE_WITH_MISSINGS')
fetch(res,n=-1)
dbDisconnect(conn)
txt value
1 a 1
2 <NA> NaN
3 b NaN
4 <NA> 2
dbReadTable:
rsqlserver is DBI compliant and implements many convenient functions, so that you have to deal with SQL as little as possible.
conn <- dbConnect('SqlServer',url=url)
dbReadTable(conn,name='T_TABLE_WITH_MISSINGS')
dbDisconnect(conn)
txt value
1 a 1
2 <NA> NaN
3 b NaN
4 <NA> 2
(EDIT: I had missed something in your post (call to fetch). I can now reproduce the issue too.)
Short story: do you have a NULL value in your database? This may be the cause.
Longer story, for a full repro:
I've used a sample DB reproducible by following the instructions at http://www.codeproject.com/Tips/326527/Create-a-Sample-SQL-Database-in-Less-Than-2-Minute
EDIT:
I can reproduce your issue with:
library(rClr)
library(rsqlserver)
drv <- dbDriver("SqlServer")
conn <- dbConnect(drv, url = "Server=Localhost\\somename;Database=Fabrics;Trusted_Connection=True;")
res <- dbSendQuery(conn, "SELECT TOP 100 * FROM [Fabrics].[dbo].[Client]")
str(res)
## Formal class 'SqlServerResult' [package "rsqlserver"] with 1 slots
..# Id:<externalptr>
> df <- fetch(res, n = -1)
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
The following commands suggest that other functions work as expected:
> dbExistsTable(conn, name='Client')
Error in sqlServerExecScalar(conn, statement, ...) :
Message: There is already an open DataReader associated with this Command which must be closed first.
> dbClearResult(res)
[1] TRUE
> dbExistsTable(conn, name='Client')
[1] TRUE
> dbExistsTable(conn, name='SomeIncorrectColumn')
[1] FALSE
Note that I cannot reproduce the very odd one about MissingMethodException
df <- dbGetQuery(conn, "SELECT TOP 100 * FROM [Fabrics].[dbo].[Client]")
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
Since the debug symbols seem present, I can debug it further through visual studio. It bombs in SqlDataHelper.Fetch at
_resultSet[_cnames[i]].SetValue(_reader.GetValue(i), cnt);
and the variable watch gives me:
i 11 int
_cnames[i] "Street2" string
_reader.GetValue(i) {} object {System.DBNull}
_reader.GetValue(i-1) "806 West Sir Francis Drake St" object {string}
_reader.GetValue(i+1) "Spokane" object {string}
The entry for Street2 is indeed a NULL:
ClientId FirstName MiddleName LastName Gender DateOfBirth CreditRating XCode OccupationId TelephoneNumber Street1 Street2 City ZipCode Longitude Latitude Notes
1 Nicholas Pat Kane M 1975-10-07 00:00:00.000 3 ZU8 5ML 4 (279) 459 - 2707 2870 North Cherry Blvd. NULL Carlsbad 64906 32.7608137325835 117.112738329071
For information, sessionInfo() output includes:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
other attached packages:
[1] rsqlserver_1.0 rClr_0.5-2
loaded via a namespace (and not attached):
[1] DBI_0.2-7 tools_3.0.2
Hope this helps.

Complex query mongodb c

I've created my MongoDB query like this:
bson query[1];
bson_init(query);
bson_append_start_object(query, "service.virtual_machine");
bson_append_oid(query, "_id", result);
bson_append_finish_object(query);
bson_finish(query);
and I run it like this:
bson fields[1];
bson_init(fields);
bson_append_oid(fields, "_id", result);
bson_finish(fields);

mongo_cursor* cursor = NULL;
cursor = mongo_find(conn, "db.services", query, fields, 9999, 0, 0);
If I print the return value of mongo_cursor_next(cursor), I get -1 (ERROR). I want to know what the error in my query is.
Thank you in advance.
This one? https://github.com/mongodb/mongo-c-driver#error-handling
Most functions return MONGO_OK or BSON_OK on success and MONGO_ERROR or BSON_ERROR on failure. Specific error codes and error strings are then stored in the err and errstr fields of the mongo and bson objects. It is the client's responsibility to check for errors and handle them appropriately.
