fit_transform() missing 1 required positional argument: 'y' - logistic-regression

I receive the following error:
TypeError Traceback (most recent call last)
<ipython-input-109-d2877618fb0b> in <module>()
3 if cleaned_df[column].dtype == np.number:
4 continue
----> 5 cleaned_df[column] = LabelEncoder.fit_transform(cleaned_df[column])
TypeError: fit_transform() missing 1 required positional argument: 'y'
My code is as follows:
for column in cleaned_df.columns:
    if cleaned_df[column].dtype == np.number:
        continue
    cleaned_df[column] = LabelEncoder.fit_transform(cleaned_df[column])

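You are missing the parentheses () after LabelEncoder.
Try this:
le = LabelEncoder()
for column in cleaned_df.columns:
    # Skip columns that are already numeric (np.issubdtype tests the whole numeric family).
    if np.issubdtype(cleaned_df[column].dtype, np.number):
        continue
    cleaned_df[column] = le.fit_transform(cleaned_df[column])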

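You have to add parentheses after LabelEncoder to create an instance. When fit_transform is called on the class itself, your column is bound to self, so Python reports the y argument as missing: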
cleaned_df[column] = LabelEncoder().fit_transform(cleaned_df[column])
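To see why the missing parentheses produce exactly this message, here is a small sketch with a hypothetical ToyLabelEncoder that only mimics the fit_transform(self, y) shape of scikit-learn's LabelEncoder (it is not the real class). Calling the method on the class fills self with the column and leaves y empty; calling it on an instance works.

```python
class ToyLabelEncoder:
    """Hypothetical stand-in with the same fit_transform(self, y) shape as LabelEncoder."""

    def fit_transform(self, y):
        # Map each distinct label to an integer code, in sorted label order.
        self.classes_ = sorted(set(y))
        index = {label: i for i, label in enumerate(self.classes_)}
        return [index[label] for label in y]


column = ["red", "green", "red", "blue"]

# On the class: `column` is bound to `self`, so `y` is missing.
error_message = ""
try:
    ToyLabelEncoder.fit_transform(column)
except TypeError as exc:
    error_message = str(exc)

# On an instance: `self` is the new encoder and `column` becomes `y`.
codes = ToyLabelEncoder().fit_transform(column)
print(error_message)
print(codes)
```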


evaluating test dataset using eval() in LightGBM

I have trained a ranking model with LightGBM with the objective 'lambdarank'.
I want to evaluate my model to get the nDCG score for my test dataset using the best iteration, but I have not been able to use either lightgbm.Booster.eval() or lightgbm.Booster.eval_train().
First, I created three Dataset instances: the train set, the validation set and the test set:
lgb_train = lgb.Dataset(x_train, y_train, group=query_train, free_raw_data=False)
lgb_valid = lgb.Dataset(x_valid, y_valid, reference=lgb_train, group=query_valid, free_raw_data=False)
lgb_test = lgb.Dataset(x_test, y_test, group=query_test)
I then train my model using lgb_train and lgb_valid:
gbm = lgb.train(params,
lgb_train,
num_boost_round=1500,
categorical_feature=chosen_cate_features,
valid_sets=[lgb_train, lgb_valid],
evals_result=evals_result,
early_stopping_rounds=150
)
When I call eval() or eval_train() after training, they raise errors:
gbm.eval(data=lgb_test,name='test')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-122-7ff5ef5136b8> in <module>()
----> 1 gbm.eval(data=lgb_test,name='test')
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in eval(self, data, name, feval)
1925 raise TypeError("Can only eval for Dataset instance")
1926 data_idx = -1
-> 1927 if data is self.train_set:
1928 data_idx = 0
1929 else:
AttributeError: 'Booster' object has no attribute 'train_set'
gbm.eval_train()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-123-0ce5fa3139f5> in <module>()
----> 1 gbm.eval_train()
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in eval_train(self, feval)
1956 List with evaluation results.
1957 """
-> 1958 return self.__inner_eval(self.__train_data_name, 0, feval)
1959
1960 def eval_valid(self, feval=None):
/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in __inner_eval(self, data_name, data_idx, feval)
2352 """Evaluate training or validation data."""
2353 if data_idx >= self.__num_dataset:
-> 2354 raise ValueError("Data_idx should be smaller than number of dataset")
2355 self.__get_eval_info()
2356 ret = []
ValueError: Data_idx should be smaller than number of dataset
And when I call eval_valid(), it returns an empty list.
Can anyone tell me how to evaluate a LightGBM model and get the nDCG score using test set properly? Thanks.
If you pass keep_training_booster=True to lgb.train, the returned Booster object can execute eval and eval_train (though eval_valid still returns an empty list for some reason, even when valid_sets is provided to lgb.train).
Documentation says:
keep_training_booster (bool, optional (default=False)) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning.
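Alternatively, nDCG on the test set can be computed directly from predictions instead of through Booster.eval. The sketch below is a swapped-in technique, not LightGBM's own metric code: it assumes gain 2^rel - 1 and discount 1/log2(rank + 1), which should correspond to LightGBM's default label_gain, but check that against your params. The function names and the rule that an all-zero-label query scores 1.0 are assumptions of this sketch.

```python
import math


def dcg_at_k(relevances, k):
    # Gain 2**rel - 1, discounted by log2(rank + 1) with ranks starting at 1.
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(labels, scores, k):
    # Order the true labels by predicted score, highest score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        # Assumption: a query with no relevant documents counts as 1.0.
        return 1.0
    return dcg_at_k(ranked, k) / ideal_dcg


def mean_ndcg(labels, scores, group_sizes, k):
    # group_sizes uses the same layout as group= in lgb.Dataset:
    # consecutive rows, one size per query.
    per_query = []
    start = 0
    for size in group_sizes:
        end = start + size
        per_query.append(ndcg_at_k(labels[start:end], scores[start:end], k))
        start = end
    return sum(per_query) / len(per_query)


# With the question's objects this would be called roughly as:
#   scores = gbm.predict(x_test, num_iteration=gbm.best_iteration)
#   mean_ndcg(list(y_test), list(scores), list(query_test), k=10)
labels = [3, 2, 0, 1, 0]
scores = [0.9, 0.5, 0.1, 0.2, 0.8]
print(mean_ndcg(labels, scores, group_sizes=[3, 2], k=10))
```

The first query is ranked perfectly (nDCG 1.0); in the second, the relevant document is ranked second, so the mean drops below 1.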

PySpark: Convert T-SQL Case When Then statement to PySpark [duplicate]

I have seen this question asked here before and have learned from the answers. However, I am not sure why I am getting an error when I think this should work.
I want to create a new column in an existing Spark DataFrame based on some rules. Here is what I wrote. iris_spark is the DataFrame, with a categorical column iris_class that has three distinct categories.
from pyspark.sql import functions as F
iris_spark_df = iris_spark.withColumn(
"Class",
F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))
Throws the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))
TypeError: when() takes exactly 2 arguments (3 given)
Any idea why?
The correct structure is either:
(when(col("iris_class") == 'Iris-setosa', 0)
.when(col("iris_class") == 'Iris-versicolor', 1)
.otherwise(2))
which is equivalent to:
CASE
WHEN (iris_class = 'Iris-setosa') THEN 0
WHEN (iris_class = 'Iris-versicolor') THEN 1
ELSE 2
END
or:
(when(col("iris_class") == 'Iris-setosa', 0)
.otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
.otherwise(2)))
which is equivalent to:
CASE WHEN (iris_class = 'Iris-setosa') THEN 0
ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
ELSE 2
END
END
with the general syntax:
when(condition, value).when(...)
or
when(condition, value).otherwise(...)
You probably mixed this up with the Hive IF conditional:
IF(condition, if-true, if-false)
which can be used only in raw SQL with Hive support.
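To make the evaluation order concrete, here is a plain-Python model of a when/otherwise chain. It is hypothetical and not PySpark: CaseWhen, when and the row dictionaries are illustrative names only. It shows that every when takes exactly one condition and one value, that the first true condition wins, and that the chained and nested forms give the same result.

```python
class CaseWhen:
    """Hypothetical plain-Python model of when(...).when(...).otherwise(...)."""

    def __init__(self, condition, value):
        self.branches = [(condition, value)]
        self.default = None  # no otherwise(): unmatched rows get None (null in Spark)

    def when(self, condition, value):
        # Each extra branch is one more (condition, value) pair: still two arguments.
        self.branches.append((condition, value))
        return self

    def otherwise(self, value):
        self.default = value
        return self

    def evaluate(self, row):
        # Branches are tested in order and the first true condition wins.
        for condition, value in self.branches:
            if condition(row):
                return value
        # A CaseWhen passed to otherwise() is evaluated like an inner CASE.
        if isinstance(self.default, CaseWhen):
            return self.default.evaluate(row)
        return self.default


def when(condition, value):
    # Like F.when: exactly one condition and one value.
    return CaseWhen(condition, value)


def is_setosa(row):
    return row["iris_class"] == "Iris-setosa"


def is_versicolor(row):
    return row["iris_class"] == "Iris-versicolor"


rows = [{"iris_class": c} for c in ("Iris-setosa", "Iris-versicolor", "Iris-virginica")]

chained = when(is_setosa, 0).when(is_versicolor, 1).otherwise(2)
nested = when(is_setosa, 0).otherwise(when(is_versicolor, 1).otherwise(2))

print([chained.evaluate(r) for r in rows])  # [0, 1, 2]
print([nested.evaluate(r) for r in rows])   # [0, 1, 2]
```

Passing a third argument to this when fails with a TypeError for the same reason F.when does in the question.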
Conditional statements in Spark
Using “when otherwise” on DataFrame
Using “case when” on DataFrame
Using && and || operator
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark: SparkSession = SparkSession.builder().master("local[1]").appName("SparkByExamples.com").getOrCreate()
import spark.implicits._
val data = List(("James ","","Smith","36636","M",60000),
("Michael ","Rose","","40288","M",70000),
("Robert ","","Williams","42114","",400000),
("Maria ","Anne","Jones","39192","F",500000),
("Jen","Mary","Brown","","F",0))
val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
val df = spark.createDataFrame(data).toDF(cols:_*)
1. Using “when otherwise” on DataFrame
Replace the value of gender with a new value
val df1 = df.withColumn("new_gender", when(col("gender") === "M","Male")
.when(col("gender") === "F","Female")
.otherwise("Unknown"))
val df2 = df.select(col("*"), when(col("gender") === "M","Male")
.when(col("gender") === "F","Female")
.otherwise("Unknown").alias("new_gender"))
2. Using “case when” on DataFrame
val df3 = df.withColumn("new_gender",
expr("case when gender = 'M' then 'Male' " +
"when gender = 'F' then 'Female' " +
"else 'Unknown' end"))
Alternatively,
val df4 = df.select(col("*"),
expr("case when gender = 'M' then 'Male' " +
"when gender = 'F' then 'Female' " +
"else 'Unknown' end").alias("new_gender"))
3. Using && and || operator
val dataDF = Seq(
(66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
).toDF("id", "code", "amt")
dataDF.withColumn("new_column",
when(col("code") === "a" || col("code") === "d", "A")
.when(col("code") === "b" && col("amt") === "4", "B")
.otherwise("A1"))
.show()
Output:
+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66| a| 4| A|
| 67| a| 0| A|
| 70| b| 4| B|
| 71| d| 4| A|
+---+----+---+----------+
There are different ways you can achieve if-then-else.
Using the when function in the DataFrame API.
You can chain a list of conditions with when and use otherwise to specify the value for rows that match none of them. You can use this expression in nested form as well.
Using the expr function.
With expr you can pass a SQL expression as a string. Please find an example below; here we create a new column "quarter" based on the month column.
from pyspark.sql.functions import expr
cond = """case when month > 9 then 'Q4'
else case when month > 6 then 'Q3'
else case when month > 3 then 'Q2'
else case when month > 0 then 'Q1'
end
end
end
end as quarter"""
newdf = df.withColumn("quarter", expr(cond))
Using the selectExpr function.
We can also use selectExpr, the variant of select that accepts SQL expressions. Please find an example below.
cond = """case when month > 9 then 'Q4'
else case when month > 6 then 'Q3'
else case when month > 3 then 'Q2'
else case when month > 0 then 'Q1'
end
end
end
end as quarter"""
newdf = df.selectExpr("*", cond)
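A plain-Python mirror of the quarter expression (not Spark code; the quarter function is a hypothetical helper) makes the branch order easy to check. Each ELSE opens an inner CASE, so the first threshold that matches wins, and because the innermost CASE has no ELSE, a month of 0 or less yields NULL.

```python
def quarter(month):
    # Same thresholds, tested in the same order as the nested CASE.
    if month > 9:
        return "Q4"
    if month > 6:
        return "Q3"
    if month > 3:
        return "Q2"
    if month > 0:
        return "Q1"
    # The innermost CASE has no ELSE: SQL returns NULL, modelled here as None.
    return None


print([quarter(m) for m in (1, 4, 7, 9, 10, 12, 0)])
```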
You can use this:
if(exp1, exp2, exp3) inside spark.sql()
where exp1 is the condition: if it is true you get exp2, otherwise exp3.
The tricky thing with nested if-else is that you need to wrap every expression in parentheses "()",
otherwise it will raise an error.
Example:
if((1>2), (if((2>3), True, False)), (False))

internal timer of python class

I am writing a Python class that has a status attribute. I would like to be notified if the status of an object X does not change within 30 seconds, i.e. I want to receive something like a "Timeout" message. I tried to use the Timer class from the threading module (a subclass of Thread), but I get an error:
import threading
import datetime
import time

def change( ob ):
    print "TIMED OUT"
    ob.timer = threading.Timer( 30, change, args = ob )
    ob.timer.start( )
    return

class XXX(object):
    def __init__( self ):
        self.status = 0
        self.last_change = datetime.datetime.utcnow()
        self.timer = threading.Timer(30, change)
        self.timer.start()

    def set(self, new_status):
        D = datetime.timedelta(0, 30)
        if ( datetime.datetime.utcnow( ) < self.last_change + D ):
            self.timer.cancel( )
        self.last_change = datetime.datetime.utcnow( )
        self.status = new_status
        self.timer = threading.Timer(30, change, args = self )
        self.timer.start( )
        print "New status: ", new_status

def main():
    myx = XXX()
    x = 0
    while ( x < 120 ):
        x += 1
        print "Second ", x
        if ( x == 10 ):
            myx.set( 20 )
        elif ( x == 40 ):
            myx.set( 44 )
        elif ( x == 101 ):
            myx.set( 2 )
        time.sleep( 1 )

if __name__=='__main__':
    main()
But at Second 40 in the while loop I get the following exception raised:
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Users\alexa\Anaconda2\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "C:\Users\alexa\Anaconda2\lib\threading.py", line 1073, in run
self.function(*self.args, **self.kwargs)
TypeError: change() argument after * must be an iterable, not XXX
The weird thing is that if I remove the arguments of the change function and then initialize the timer without passing args = self, it works. The reason why I was passing arguments for the change function is that I wanted to restart the timer if the status was updated. Any thoughts on how to do/fix this? Thanks!
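So I figured that it is easier to move the change function inside the XXX class. This way, one does not need to pass args = ... to the timer. (The error itself comes from args expecting a sequence of positional arguments, so passing args = (ob,) and args = (self,) instead would also fix it.)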
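A minimal Python 3 sketch of that idea, assuming the goal is a watchdog that fires when the status has not changed for a given number of seconds. StatusWatcher, timed_out_count, stop and demo are hypothetical names; the callback is a bound method, so threading.Timer needs no args, and a lock guards the timer swap because set() and the timeout run on different threads.

```python
import threading
import time


class StatusWatcher:
    """Sketch: report a timeout when `status` has not changed for `timeout` seconds."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.status = 0
        self.timed_out_count = 0
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False
        self._restart_timer()

    def _on_timeout(self):
        # Runs on the timer's thread. As a bound method it already knows `self`,
        # so threading.Timer needs no args= at all.
        with self._lock:
            self.timed_out_count += 1
        print("TIMED OUT")
        self._restart_timer()

    def _restart_timer(self):
        with self._lock:
            if self._stopped:
                return
            if self._timer is not None:
                self._timer.cancel()  # harmless if the old timer already fired
            self._timer = threading.Timer(self.timeout, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive on exit
            self._timer.start()

    def set(self, new_status):
        self.status = new_status
        self._restart_timer()
        print("New status:", new_status)

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()


def demo():
    # Short timeout so the behaviour is visible in well under a second.
    watcher = StatusWatcher(timeout=0.2)
    time.sleep(0.05)
    watcher.set(20)             # change within the timeout: timer restarts
    before = watcher.timed_out_count
    time.sleep(0.35)            # no change for longer than the timeout
    after = watcher.timed_out_count
    watcher.stop()
    return before, after
```

With the question's 30-second setting you would create StatusWatcher() and call set() from the main loop.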

Kohana 3.2 Call to undefined method Database_MySQL_Result::offset()

This happens when I try to use DB::select instead of ORM.
The query is returned as an object, but the error appears.
Code:
$bd_userdata -> offset($pagination -> offset) -> limit($pagination -> items_per_page) -> find_all() -> as_array();
Error:
ErrorException [ Fatal Error ]: Call to undefined method Database_MySQL_Result::offset()
Does this mean I have to count the rows before I pass them to the pagination's offset?
When I try $query->count_all() I get the error message:
Undefined property: Database_Query_Builder_Select::$count_all
I tried count($query) but instead I got:
No tables used [ SELECT * LIMIT 4 OFFSET 0 ]
Here is the solution:
$results = DB::select('*')
->from('users')
->where('id', '=', 1)
->limit($pagination->items_per_page)
->offset($pagination->offset)->execute();
And a counter:
$count = $results->count_all();
I was doing it the other way around before, which is why it did not work.
As you can see, execute() returns a Database_Result object, which has none of the query builder's methods. You must apply all conditions (where, limit, offset, etc.) before calling execute().
Here is a simple example with pagination:
// don't forget to apply reset(FALSE)!
$query = DB::select()->from('users')->where('username', '=', 'test')->reset(FALSE);
// counting rows
$row_count = $query->count_all();
// create pagination
$pagination = Pagination::factory(array(
'items_per_page' => 4,
'total_items' => $row_count,
));
// select rows using pagination's limit&offset
$users = $query->offset($pagination->offset)->limit($pagination->items_per_page)->execute();
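The ordering rule (conditions on the builder, then execute(), which hands back a plain result) can be illustrated with a hypothetical toy builder in Python. This is not Kohana's API; SelectBuilder and its methods are invented for illustration, and count_all here simply ignores limit/offset the way the pagination's total_items needs.

```python
class SelectBuilder:
    """Hypothetical toy query builder over an in-memory list of dict rows (not Kohana)."""

    def __init__(self, rows):
        self._rows = rows
        self._filters = []
        self._limit = None
        self._offset = 0

    def where(self, column, value):
        self._filters.append((column, value))
        return self  # returning the builder is what makes calls chainable

    def limit(self, n):
        self._limit = n
        return self

    def offset(self, n):
        self._offset = n
        return self

    def _matching(self):
        return [r for r in self._rows if all(r[c] == v for c, v in self._filters)]

    def count_all(self):
        # Counts every matching row and ignores limit/offset.
        return len(self._matching())

    def execute(self):
        # Returns a plain list: a result, not a builder, so it has no offset().
        rows = self._matching()[self._offset:]
        return rows if self._limit is None else rows[: self._limit]


rows = [{"id": i, "username": "test" if i % 2 else "other"} for i in range(1, 11)]
query = SelectBuilder(rows).where("username", "test")
total_items = query.count_all()
items_per_page, page_offset = 2, 2
page = query.offset(page_offset).limit(items_per_page).execute()
print(total_items, [r["id"] for r in page])
```

Calling page.offset(...) after execute() fails with an AttributeError, the Python analogue of the "Call to undefined method Database_MySQL_Result::offset()" error.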

CodeIgniter unknown SQL error

Here is my code for selecting from the database:
$q = $this->db->like('Autor1' or 'Autor2' or 'Autor3' or 'Autor4', $vyraz)
->where('stav', 1)
->order_by('id', 'desc')
->limit($limit)
->offset($offset)
->get('knihy');
return $q->result();
Where $vyraz = "Zuzana Šidlíková";
And the error is:
A Database Error Occurred
Error Number: 1054
Unknown column '1' in 'where clause'
SELECT * FROM (`knihy`) WHERE `stav` = 1 AND `1` LIKE '%Zuzana Šidlíková%' ORDER BY `id` desc LIMIT 9
Filename: C:\wamp\www\artbooks\system\database\DB_driver.php
Line Number: 330
Can you help me solve this problem?
Your syntax is wrong for what you're trying to do, but still technically valid, because this:
'Autor1' or 'Autor2' or 'Autor3' or 'Autor4'
...is actually a valid PHP expression which evaluates to TRUE (because non-empty strings other than "0" are "truthy"). When cast to a string or echoed, TRUE comes out as 1, so the DB class is looking to match on a column called "1".
Example:
function like($arg1, $arg2)
{
return "WHERE $arg1 LIKE '%$arg2%'";
}
$vyraz = 'Zuzana Šidlíková';
echo like('Autor1' or 'Autor2' or 'Autor3' or 'Autor4', $vyraz);
// Output: WHERE 1 LIKE '%Zuzana Šidlíková%'
Anyway, here is what you need:
$q = $this->db
->like('Autor1', $vyraz)
->or_like('Autor2', $vyraz)
->or_like('Autor3', $vyraz)
->or_like('Autor4', $vyraz)
->where('stav', 1)
->order_by('id', 'desc')
->limit($limit)
->offset($offset)
->get('knihy');
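For reference, here is a sketch of the SQL the intended search needs, written in Python against an in-memory SQLite table rather than with CodeIgniter (search_books, AUTHOR_COLUMNS and the sample rows are hypothetical). The four LIKE tests are grouped in parentheses because AND binds tighter than OR, so without grouping a row with stav = 0 could still match through one of the OR branches. The search term is passed as a bound parameter, not pasted into the string.

```python
import sqlite3

AUTHOR_COLUMNS = ["Autor1", "Autor2", "Autor3", "Autor4"]  # fixed names, not user input


def search_books(conn, vyraz, limit, offset):
    # Group the OR-ed LIKE tests so "stav = 1" applies to every branch.
    likes = " OR ".join(f"{col} LIKE ?" for col in AUTHOR_COLUMNS)
    sql = (
        f"SELECT id FROM knihy WHERE stav = 1 AND ({likes}) "
        "ORDER BY id DESC LIMIT ? OFFSET ?"
    )
    pattern = f"%{vyraz}%"
    params = [pattern] * len(AUTHOR_COLUMNS) + [limit, offset]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knihy (id INTEGER, stav INTEGER, "
    "Autor1 TEXT, Autor2 TEXT, Autor3 TEXT, Autor4 TEXT)"
)
conn.executemany(
    "INSERT INTO knihy VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Zuzana Šidlíková", "", "", ""),
        (2, 1, "", "", "Zuzana Šidlíková", ""),
        (3, 0, "", "Zuzana Šidlíková", "", ""),  # stav = 0: must not appear
        (4, 1, "Someone Else", "", "", ""),
    ],
)
print(search_books(conn, "Zuzana Šidlíková", limit=9, offset=0))
```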
