How can I find a result by its row number? - cakephp

Say I have two models, Foo belongsTo Bar:
$this->Foo->find('first',array(
'conditions'=>array(
'barRowNumber'=>$barRowNumber,
'fooRowNumber'=>$fooRowNumber)));
I can't just use the auto_increment id, because the ids skip around when I add and delete records. So how can I use the row number as a parameter for a find query in CakePHP?

If you don't have an id, you have to count rows up to the position where your search occurs.
If you have the id, then setting $this->Foo->id = $value is the key.
If you have some other search value, then $this->Foo->findByYourValue() will do.
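One way to read "count the rows" is to order by id and skip to the Nth row, so gaps in the auto_increment values no longer matter. The sketch below shows that idea in plain SQL through Python's sqlite3 module rather than in CakePHP; the table foo, its columns and the helper nth_row are hypothetical. In CakePHP 2 the same idea would presumably map onto the 'order', 'limit' and 'offset' options of find(), but that mapping is an assumption and is not tested here.

```python
import sqlite3


def nth_row(conn, row_number):
    """Return the row at 1-based position row_number when ordered by id, or None."""
    cur = conn.execute(
        "SELECT id, name FROM foo ORDER BY id LIMIT 1 OFFSET ?",
        (row_number - 1,),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
# The ids 1, 4 and 9 mimic the gaps left behind by deleted records.
conn.executemany(
    "INSERT INTO foo (id, name) VALUES (?, ?)",
    [(1, "first"), (4, "second"), (9, "third")],
)
```

Calling nth_row(conn, 2) looks up the second row by position, whatever its id happens to be. OFFSET still makes the database walk past the skipped rows, so this is only a reasonable fit for modest row counts.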


Sort solr response by value in subdocument collection

I'm using DSE solr to index a cassandra table that contains a collection of UDTs. I want to be able to sort search results based on a value inside those UDTs.
Given a simplistic example table...
create type test_score (
test_name text,
percentile double,
score int,
description text
);
create table students (
id int,
name text,
test_scores set<frozen<test_score>>,
...
);
... and assuming I'm auto-generating the solr schema via dsetool, I want to be able to write a solr query that finds students who have taken a test (by a specific test_name), and sort them by that test's score (or percentile, or whatever).
Unfortunately you can't sort by UDT fields.
However, I'm not sure what the value of a UDT is here. Perhaps I don't know enough about your use case. Another issue I see is that each partition key is a student id, so you can only store one test result per student. A better approach might be to use test id as a clustering column so you can store all the test results for a student in a single partition. Something like this:
CREATE TABLE students (
id int,
student_name text,
test_name text,
score int,
percentile double,
description text,
PRIMARY KEY (id, student_name, test_name)
);
Student name is kind of redundant (it should be the same for every row in each partition), but it doesn't have to be a clustering column.
Then you can sort on any field like so:
SELECT * FROM students WHERE solr_query='{"q":"test_name:Biology", "sort":"percentile desc"}' LIMIT 10;
I've used the JSON syntax described here: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchJSON.html
OK, so basically you want to do a JOIN between the test_score and students tables, right?
According to the official docs: http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchQueryJoin.html
Joining Solr cores is possible only if the two tables share the same partition key, which is not the case in your example.

Which database can do this?

In a recent project, I need a database that does the following.
Each item is a key-value pair, and each key is a multi-dimensional string, so for example:
item 1:
key :['teacher','professor']
value: 'david'
item 2:
key :['staff', 'instructor', 'professor']
value: 'shawn'
So the keys do not all have the same length. I want to be able to run a query like:
anyone with both ['teacher','staff'] as keys.
I also want to be able to add another item easily later on, for example a key-value pair like:
item 3:
key :['female', 'instructor', 'professor','programmer']
value: 'annie'
So the idea is that I can tag a value with any array of keys, and then search by a subset of those keys.
Since (judging on your comments) you don't need to enforce uniqueness, these are not actually "keys", and can be more appropriately thought of as "tags" whose primary purpose is to be searched on (not unlike StackOverflow.com tags).
The typical way of implementing tags in a relational database uses three tables: ITEM, TAG, and a junction table TAG_ITEM whose primary key is (TAG_ID, ITEM_ID).
Note the order of fields in the junction table TAG_ITEM primary key: since our goal is to find items of given tag (not tags of given item), the leading edge of the index "underneath" PK is TAG_ID. This facilitates efficient index range scan on given TAG_ID.
Cluster TAG_ITEM if your DBMS supports it.
You can then search for items with any of the given tags like this:
SELECT [DISTINCT] ITEM_ID
FROM
TAG
JOIN TAG_ITEM ON TAG.TAG_ID = TAG_ITEM.TAG_ID
WHERE
TAG_NAME = 'teacher'
OR TAG_NAME = 'professor'
And if you need any other fields from ITEM, you can:
SELECT * FROM ITEM WHERE ITEM_ID IN (<query above>)
You can search for items with all of the given tags like this:
SELECT ITEM_ID
FROM
TAG
JOIN TAG_ITEM ON TAG.TAG_ID = TAG_ITEM.TAG_ID
WHERE
TAG_NAME = 'teacher'
OR TAG_NAME = 'professor'
GROUP BY
ITEM_ID
HAVING
COUNT(*) = 2
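To make the schema and the "all of the given tags" query concrete, here is a minimal runnable sketch using Python's sqlite3 module. The table and column names TAG, TAG_ITEM, TAG_ID, TAG_NAME and ITEM_ID come from the queries above; the ITEM_VALUE column, the UNIQUE constraint on TAG_NAME and the helper items_with_all_tags are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ITEM (ITEM_ID INTEGER PRIMARY KEY, ITEM_VALUE TEXT);
CREATE TABLE TAG (TAG_ID INTEGER PRIMARY KEY, TAG_NAME TEXT UNIQUE);
-- TAG_ID leads the primary key so lookups by tag can use the index.
CREATE TABLE TAG_ITEM (
    TAG_ID INTEGER REFERENCES TAG (TAG_ID),
    ITEM_ID INTEGER REFERENCES ITEM (ITEM_ID),
    PRIMARY KEY (TAG_ID, ITEM_ID)
);
""")

# The three items from the question.
items = {
    "david": ["teacher", "professor"],
    "shawn": ["staff", "instructor", "professor"],
    "annie": ["female", "instructor", "professor", "programmer"],
}
for value, tags in items.items():
    item_id = conn.execute(
        "INSERT INTO ITEM (ITEM_VALUE) VALUES (?)", (value,)
    ).lastrowid
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO TAG (TAG_NAME) VALUES (?)", (tag,))
        tag_id = conn.execute(
            "SELECT TAG_ID FROM TAG WHERE TAG_NAME = ?", (tag,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO TAG_ITEM (TAG_ID, ITEM_ID) VALUES (?, ?)",
            (tag_id, item_id),
        )


def items_with_all_tags(conn, tags):
    """Return the values of items carrying every tag in tags (tags must be distinct)."""
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT ITEM_VALUE FROM ITEM WHERE ITEM_ID IN (
            SELECT ITEM_ID
            FROM TAG JOIN TAG_ITEM ON TAG.TAG_ID = TAG_ITEM.TAG_ID
            WHERE TAG_NAME IN ({placeholders})
            GROUP BY ITEM_ID
            HAVING COUNT(*) = ?
        ) ORDER BY ITEM_VALUE
    """
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]
```

The COUNT(*) comparison only works because the TAG_ITEM primary key prevents the same tag from being attached to an item twice, and because the list of tags passed in contains no duplicates.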
PostgreSQL can do something similar with its hstore data type: http://www.postgresql.org/docs/9.1/static/hstore.html
Or maybe you are looking for arrays: http://postgresguide.com/sexy/arrays.html

CakePHP: How to write a find() query that excludes one field in the resultset?

Is there a way to write a CakePHP query to return all fields (columns) except one via find()? Or do I need to use the fields parameter and actually list all the fields, except the excluded field?
For example, if I have a database table (model), Company, with these fields:
id
name
street
city
state
zip
phone
Normally, $this->Company->find('all') would return all the fields. I want to exclude the phone field from the resultset.
$fields = array_keys($this->Company->getColumnTypes());
$key = array_search('phone', $fields);
unset($fields[$key]);
$this->Company->find('all', array('fields' => $fields));
For more info, please have a look at http://book.cakephp.org/2.0/en/models/additional-methods-and-properties.html#model-getcolumntypes
I can think of a couple of ways to do this:
Write out all the fields you want to include in the fields parameter. As mentioned in the comment, you can use $this->Company->schema() to get all the fields programmatically rather than writing them out.
Unset the field you don't want after you get the data. You can also do this in the model's afterFind callback.

CakePHP HABTM relationship, setting related records always adds new entry in jointable even if already exists

This is my code:
$charities = explode(',',$this->data['Charity']['charities']);
foreach ($charities as $key=>$charity){
$data['Charity'][$key] = $charity;
}
$this->Grouping->id = $this->data['Charity']['grouping_id'];
if ($this->Grouping->save($data)){
//Great!
}else{
//Oh dear
}
The code is from an action that gets called by AJAX. $this->data['Charity']['charities'] is a comma separated list of ids for the Charity model, which HABTM Grouping.
The code works fine, except that Cake doesn't seem to check whether that charity is already associated with that grouping, so I end up with lots of duplicate entries in the join table. I can see that this will be a pain later on so I'd like to get it right now.
What am I doing wrong?
Thanks
A few things I spotted:
A. Is $this->data['Charity']['charities'] an array with named keys corresponding to the table columns? Inside the foreach() loop you take each key and put it in the $data['Charity'] array, so the result could be in the wrong format. The $data array for Model saves is commonly formatted as $data['Charity']['NAME_OF_COLUMN1'], $data['Charity']['NAME_OF_COLUMN2'], and so on for each column to be saved. So, if the keys are numbers, you could end up with something like $data['Charity'][0], $data['Charity'][1], which is wrong.
B. HABTM Associations are saved differently. What I do recommend, is that you take a look at the book's section on saving HABTM. I save a HABTM relation like this (supposing a relation between users and coupons):
$this->data['Coupon'] = array('id' => 1, 'code' => 'BlaBla');
$this->data['User'] = array('id' => 33);
$this->Coupon->save($this->data, false);
So, as you can see, the $data array has a subarray named after the foreign model ('User') with the columns+values that I want to save. Then I save this array to the Coupon model.
C. Be aware that CakePHP will always delete the old records and insert new ones in the join table when it is 'updating' HABTM relations. To avoid this, set the 'unique' key to false in the model's $hasAndBelongsToMany configuration.

Best database design (model) for user tables

I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users can create tables. Note that these tables are not represented as TABLES in the database. Here is an example:
First form:
Name of the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100, for example). Every column has the same type.
Second form (after choosing a particular table the user can fill the table):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
from google.appengine.ext import db
class Table(db.Model):
    name = db.StringProperty(required=True)
class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    # Reference the Table model class (the original had lowercase "table").
    table = db.ReferenceProperty(Table, collection_name="columns")
class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values (here table is the Table entity being listed):
data = []
for column in table.columns:
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
# Transpose the list of columns into a list of rows.
data = zip(*data)
I think the problem is the order of the values, because nothing guarantees that the values of one column come back in the same order as those of the other columns. I'm expecting this bug to show up (although I have never seen it so far):
Table as I want:    Table as I will get:
a z c               a e c
d e f               d h f
g h i               g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()
class Row(db.Model):
    values = db.ListProperty(yourtype)
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property.
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
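The index-based lookup described above can be sketched with plain Python classes standing in for the datastore entities. This is not App Engine code: the dataclasses, the value_for helper and the sample table are hypothetical and only illustrate how a position in column_names selects the matching entry in values.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    # The list position doubles as the implicit column order.
    column_names: list = field(default_factory=list)


@dataclass
class Row:
    table: Table
    # values[i] belongs to table.column_names[i].
    values: list = field(default_factory=list)

    def value_for(self, column_name):
        """Look up a value by finding the column's position in column_names."""
        index = self.table.column_names.index(column_name)
        return self.values[index]


table = Table("example", ["first", "second", "third"])
rows = [
    Row(table, ["a", "z", "c"]),
    Row(table, ["d", "e", "f"]),
]
```

Because each row carries all of its values in one list, rows[0].value_for("second") cannot pick up a value that belongs to a different row.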
Caveat emptor:
This model will not work well if columns can be added to a table after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would need a value appended to its values list. If it were possible to store dictionaries efficiently in the datastore, this would not be a problem, but lists can really only be appended to.
Alternatively, you could use Expando...
Another possibility is to define the Row model as an Expando, which allows you to create properties on an entity dynamically. You would set column values only for the columns that actually have values, and you could also add columns to the table after it has data in it without breaking anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')
    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index
    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        # first_row = Row.all().get()
        # col1 = first_row[1]
        # col12 = first_row[12]
        # Returns None if the given column is not defined for this Row.
        return getattr(self, Row.__name_for_column_index(key), None)
    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        # first_row = Row.all().get()
        # first_row[5] = "New values for column 5"
        # setattr() goes through Expando, so the value is stored as a
        # dynamic property.
        setattr(self, Row.__name_for_column_index(key), value)
        # In order to allow efficient multiple column changes,
        # the put() can go somewhere else.
        self.put()
Why don't you add an IntegerProperty called rowNumber to Value and increment it every time you add a new row of values? Then you can reconstruct the table by sorting by rowNumber.
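A minimal sketch of that reconstruction in plain Python, assuming each stored value is available as a (row_number, column_number, value) tuple; the tuple layout and the rebuild_rows helper are hypothetical, not part of the datastore API.

```python
from collections import defaultdict


def rebuild_rows(values, column_count):
    """Group values by row number and return the rows in row-number order."""
    rows = defaultdict(lambda: [None] * column_count)
    for row_number, column_number, value in values:
        rows[row_number][column_number] = value
    return [tuple(rows[n]) for n in sorted(rows)]


# Values arrive in arbitrary order, as they might from separate queries.
stored = [
    (2, 1, "h"), (0, 0, "a"), (1, 1, "e"),
    (0, 1, "z"), (1, 0, "d"), (2, 0, "g"),
    (0, 2, "c"), (1, 2, "f"), (2, 2, "i"),
]
```

Since every value carries its own row number, the result no longer depends on the order in which each column's values come back.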
You're going to make life very hard for yourself unless your user's 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the datastore:
table: this would serve as a dictionary, storing the structure of the pseudo-tables your app creates. It would have three fields: table_name, column_name and column_order, where column_order gives the position of the column within the table.
data: this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name and column_data. row_id would be the same for data pertaining to the same row and would be unique across the various pseudo-tables.
Put the data in a LongBlob.
The power of a database is that it can search and organise data, so that you fetch only the part you want, for both performance and simplicity: you don't want the whole database, you just want a part of it, and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve all of it and display it. So you don't need to store the data in a normal "database" way.
What I would suggest is to simply serialize all the data of a single user table and store it in a single column of a suitable type (LongBlob, for example). The format would be an object holding a list of columns and the rows of values. You define that object in whatever language you use to communicate with the database.
The columns in your (real) database would be : User int, TableNo int, Table Longblob.
If user8 has 3 tables, you will have the following rows :
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
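As a rough illustration of this layout, the sketch below stores each user table as a JSON document in a BLOB column, using Python's sqlite3 module as a stand-in for the real database. The names user_tables, user_id, table_no, table_blob, save_table and load_table are hypothetical, and JSON is only one possible serialization format.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_tables (
        user_id INTEGER,
        table_no INTEGER,
        table_blob BLOB,
        PRIMARY KEY (user_id, table_no)
    )
""")


def save_table(conn, user_id, table_no, columns, rows):
    """Serialize the whole pseudo-table and store it as a single blob."""
    blob = json.dumps({"columns": columns, "rows": rows}).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO user_tables VALUES (?, ?, ?)",
        (user_id, table_no, blob),
    )


def load_table(conn, user_id, table_no):
    """Return the stored pseudo-table as a dict, or None if it does not exist."""
    row = conn.execute(
        "SELECT table_blob FROM user_tables WHERE user_id = ? AND table_no = ?",
        (user_id, table_no),
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


save_table(conn, 8, 1, ["start", "finish"], [["a", "b"], ["c", "d"]])
```

The trade-off is the one described above: the whole table is read and written at once, and the database cannot search inside the blob.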
