Django QuerySet - Multiple Order By when there is a single field per record - database

I am accessing an API where the number of fields can change at any time, but I must store and display the data as a table. Therefore, each record from the API is stored as one database row per field. My problem is that I am having trouble working out how to order by multiple columns at a time. Putting all of the data into a 2D array (list of lists) before sorting is not a viable option, as the number of records could be too large to feasibly hold in memory.
I've put together a simple example to explain. If anyone has an idea on how to overcome the problem, or how I could redesign my approach, I'd be very grateful.
+-----------+-------+------+
| record_id | field | data |
+-----------+-------+------+
|         1 | x     |    2 |
|         1 | y     |    1 |
|         1 | z     |    3 |
|         2 | x     |   30 |
|         2 | y     |   42 |
|         2 | z     |    7 |
|         3 | x     |   53 |
|         3 | y     |    2 |
|         3 | z     |    7 |
+-----------+-------+------+
If ordering by fields 'z' then 'x' (both ascending), the record order would be 1,2,3
If ordering by fields 'z' then 'y' (both ascending), the record order would be 1,3,2
I am using Django models to store the data and QuerySets to retrieve it. I don't have any control over the API or the database from which the data originally comes.
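For what it's worth, this kind of multi-column ordering can be pushed down to the database by pivoting the per-field rows with conditional aggregation, so nothing has to be held in application memory. A sketch, where api_record is a hypothetical name for the table above:
SELECT record_id,
       MAX(CASE WHEN field = 'z' THEN CAST(data AS integer) END) AS z,
       MAX(CASE WHEN field = 'y' THEN CAST(data AS integer) END) AS y
FROM api_record        -- hypothetical table holding the rows shown above
GROUP BY record_id
ORDER BY z, y;         -- yields record_ids 1, 3, 2, as in the second example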

After a fair amount of research I realised I was going about this all wrong - I am now using an hstore field in Postgres and django-hstore to utilise it, for a schema-less approach. I now have a single row per original record, and I can order_by after casting the required field in an 'extra' method.
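The resulting SQL looks roughly like the sketch below (api_record and data are illustrative names, with data being the hstore column); the extra() call injects the cast expressions so the sort happens inside Postgres:
SELECT record_id, data
FROM api_record
ORDER BY (data -> 'z')::integer, (data -> 'y')::integer;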

Create/Update table in MS Access dynamically

Here's what I have: an Access database made up of 3 tables linked from SQL Server. I need to create a new table in this database by querying the 3 source tables. Here are examples of the 3 tables I'm using:
PlanTable1
+------+------+------+------+---------+---------+
| Key1 | Key2 | Key3 | Key4 | PName   | MainKey |
+------+------+------+------+---------+---------+
|   53 |    1 |    5 |   -1 | Bikes   |  536681 |
|   53 |   99 |   -1 |   -1 | Drinks  |  536682 |
|   53 |   66 |   68 |   -1 | Balls   |  536683 |
+------+------+------+------+---------+---------+
SpTable
+----+---------+---------+
| ID | MainKey | SpName  |
+----+---------+---------+
| 10 |  536681 | Wing1   |
| 11 |  536682 | Wing2   |
| 12 |  536683 | Wing3   |
+----+---------+---------+
LocTable
+-------+-------------+--------------+
| LocID | CenterState | CenterCity   |
+-------+-------------+--------------+
|    10 | IN          | Indianapolis |
|    11 | OH          | Columbus     |
|    12 | IL          | Chicago      |
+-------+-------------+--------------+
You can see the relationships between the tables. The NewMasterTable I need to create based on these will look something like this:
NewMasterTable
+-------+--------+-------------+------+--------------+-------+-------+-------+
| LocID | PName  | CenterState | Key4 | CenterCity   | Wing1 | Wing2 | Wing3 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
|    10 | Bikes  | IN          |   -1 | Indianapolis |     1 |     0 |     0 |
|    11 | Drinks | OH          |   -1 | Columbus     |     0 |     1 |     0 |
|    12 | Balls  | IL          |   -1 | Chicago      |     0 |     0 |     1 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
The hard part for me is making this new table dynamic. In the future, rows may be added to the source tables. I need my NewMasterTable to reflect any changes/additions to the source. How do I go about building the NewMasterTable as described? Does this make any sort of sense?
Since an Access table is a necessary requirement, probably the only way to go about it is to create a set of Update and Insert queries that are executed periodically. There is no built-in "dynamic" feature of Access that will monitor and update the table.
First, create the table. You could either 1) do this manually from scratch by defining the columns and constraints yourself, or 2) create a make-table query (i.e. SELECT... INTO) that generates most of the schema, then add any additional columns, edit necessary details and add appropriate indexes.
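As a sketch, such a make-table query might look like this (the join conditions are guesses from the sample data, so adjust them to your real keys):
SELECT l.LocID, p.PName, l.CenterState, p.Key4, l.CenterCity
INTO NewMasterTable
FROM (PlanTable1 AS p
INNER JOIN SpTable AS s ON p.MainKey = s.MainKey)
INNER JOIN LocTable AS l ON s.ID = l.LocID;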
Define and save Update and Insert (and optionally Delete) queries to keep the table synced. I'm not sharing actual code here, because I think that goes beyond your primary issue and requires specifics that you need to define. Due to some ambiguity with your key values (the field names and sample data are not sufficient to reveal the precise relationships and constraints), it is likely that you'll need multiple Update statements.
In particular, the "Wing" columns will likely require a transform statement.
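If the set of wings is fixed, the flags can also be derived with IIf expressions rather than a full TRANSFORM, as in this sketch (same guessed join conditions as above):
SELECT l.LocID, p.PName, l.CenterState, p.Key4, l.CenterCity,
IIf(s.SpName = 'Wing1', 1, 0) AS Wing1,
IIf(s.SpName = 'Wing2', 1, 0) AS Wing2,
IIf(s.SpName = 'Wing3', 1, 0) AS Wing3
FROM (PlanTable1 AS p
INNER JOIN SpTable AS s ON p.MainKey = s.MainKey)
INNER JOIN LocTable AS l ON s.ID = l.LocID;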
You may not be able to update all columns appropriately using a single query. I recommend not trying to force such an "artificial" requirement. Multiple queries can actually be easier to understand and maintain.
In the event that you experience "query is not updateable" errors, you may need to define other "temporary" tables with appropriate indexes, into which you do initial inserts from the linked tables, then subsequent queries to update your master table from those.
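A hypothetical staging step could look like this (tmpSpTable and idxMainKey are placeholder names):
SELECT s.ID, s.MainKey, s.SpName
INTO tmpSpTable
FROM SpTable AS s;
CREATE INDEX idxMainKey ON tmpSpTable (MainKey);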
Finally, and I think this is the key to solving your problem, you need to define some Access form (or other code) that periodically runs your set of "sync" queries. Access forms have a [Timer Interval] property and corresponding Timer event that fires periodically. Add VBA code in the Form_Timer sub that runs all your queries. I would suggest "wrapping" such VBA in a transaction and adding appropriate error handling and error logging, etc.

Why does Neo4j hit every indexed record when only returning a count?

I am using version 3.0.3, and running my queries in the shell.
I have ~58 million record nodes with 4 properties each, specifically an ID string, an epoch time integer, and lat/lon floats.
When I run a query like profile MATCH (r:record) RETURN count(r); I get a very quick response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
29 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| Operator                 | Estimated Rows | Rows | DB Hits | Variables | Other                          |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| +ProduceResults          |           7644 |    1 |       0 | count(r)  | count(r)                       |
| |                        +----------------+------+---------+-----------+--------------------------------+
| +NodeCountFromCountStore |           7644 |    1 |       0 | count(r)  | count( (:record) ) AS count(r) |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
Total database accesses: 0
The Total database accesses: 0 and NodeCountFromCountStore tell me that Neo4j uses a counting mechanism here that avoids iterating over all the nodes.
However, when I run profile MATCH (r:record) WHERE r.time < 10000000000 RETURN count(r);, I get a very slow response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
151278 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| Operator              | Estimated Rows | Rows     | DB Hits  | Variables | Other                        |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| +ProduceResults       |           1324 |        1 |        0 | count(r)  | count(r)                     |
| |                     +----------------+----------+----------+-----------+------------------------------+
| +EagerAggregation     |           1324 |        1 |        0 | count(r)  |                              |
| |                     +----------------+----------+----------+-----------+------------------------------+
| +NodeIndexSeekByRange |        1752922 | 58430739 | 58430740 | r         | :record(time) < { AUTOINT0}  |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
Total database accesses: 58430740
The count is correct, as I chose a time value larger than that of any of my records. What surprises me here is that Neo4j is accessing EVERY single record. The profiler shows that Neo4j is using NodeIndexSeekByRange here instead of the count store.
My question is, why does Neo4j access EVERY record when all it is returning is a count? Are there no intelligent mechanisms inside the system to count a range of values after seeking the boundary/threshold value within the index?
I use Apache Solr for the same data, and returning a count after searching an index is extremely fast (about 5 seconds). If I recall correctly, both platforms are built on top of Apache Lucene. While I don't know much about that software internally, I would assume that the index support is fairly similar for both Neo4j and Solr.
I am working on a proxy service that will deliver results in a paginated form (using the SKIP n LIMIT m technique) by first getting a count, and then iterating over results in chunks. This works really well for Solr, but I am afraid that Neo4j may not perform well in this scenario.
Any thoughts?
The latter query performs a NodeIndexSeekByRange operation: it walks the :record(time) index over the requested range and visits every matching entry to find the nodes whose time value is less than 10000000000.
The count store that answered your first query only keeps totals per label (and per relationship type), so a count with a property predicate cannot be served from it. Because every matching index entry has to be read for the comparison, the query is much slower.

Making an Object Dependent Number of Fields for a Table in MS Access

I'm trying to make a database that will hold a table of objects, and these objects are composed of objects from a second table. One table is a table of possible sets, and the second is a table of possible components. The table of sets has to include fields for each of its components, but each set has an unknown number of components. How do I make a table whose fields (Component 1, Component 2, Component 3, ...) depend on each set to decide how many of them it needs?
Is there a way to do this just using the Access interface or will I actually have to get into the code behind it?
I think it would also solve my problem if there were a way to make a field in a column that acted as an ArrayList, so if anyone can think of how to do that, please let me know.
Assuming that a component can be part of more than one set, what you need here is a many-to-many relationship.
In a database you don't do this with an arbitrary number of columns, you use a junction table.
When you need a tabular representation, you use a Pivot / Crosstab query (see the sketch after the example tables below).
Your data model could look like this:
Sets
+--------+----------+
| Set_ID | Set_Name |
+--------+----------+
|      1 | foo      |
|      2 | bar      |
+--------+----------+
Components
+--------------+----------------+
| Component_ID | Component_Name |
+--------------+----------------+
|            1 | aaa            |
|            2 | bbb            |
|            3 | ccc            |
|            4 | ddd            |
+--------------+----------------+
Junction table
+----------+----------------+
| f_Set_ID | f_Component_ID |
+----------+----------------+
|        1 |              2 |
|        1 |              4 |
|        2 |              1 |
|        2 |              2 |
|        2 |              3 |
+----------+----------------+
(f_ as in Foreign Key)
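The tabular representation mentioned above can then be produced with an Access crosstab query along these lines (a sketch; Junction is a placeholder name for the junction table):
TRANSFORM Count(j.f_Component_ID)
SELECT s.Set_Name
FROM (Sets AS s
INNER JOIN Junction AS j ON s.Set_ID = j.f_Set_ID)
INNER JOIN Components AS c ON j.f_Component_ID = c.Component_ID
GROUP BY s.Set_Name
PIVOT c.Component_Name;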

Using Data from One Table in Another Table in Access

Hello Stack Overflow users
I am struggling with transferring values between Access database tables, which I will use in a Delphi program to tally election votes and determine the winning candidates. I have a total of six tables. One is my overall table, tblCandidates, which identifies each candidate and contains the number of votes they received from each party, namely the Grade Heads, the Teachers and the Learners. When it comes to the Learners we have four participating grades, namely grades 8, 9, 10 and 11, and each grade again has multiple participating classes, namely class A, B, C, etc.
Now, I have set up tables for each grade that contains all the classes in that grade. I named these tables tblGrX with X being the grade represented by 8 through 11. Each one of these tables has two extra fields, namely a field to identify a candidate and a field that will add up all the votes that candidate received from each of the classes in that grade. Lastly I have another table, tblGrTotals with fields Total_GrX (once again with X being the grade), that will contain all the total votes a candidate received from each grade, adding them up in another field for my tblCandidates table to use in its Total_Learners field.
So in short, I want, for example, tblGrTotals to use the value in the field Total of tblGr8 in its Total_Gr8 field, and then tblCandidates to use the value in field Total of tblGrTotals in its Total_Learners field. Is there any way to keep these values updated between tables like cells are updated in Excel the moment a change is made?
Thank you in advance!
You need to rethink your table design. I guess your background is Excel, and your tables are laid out like you would do in Excel sheets, but a relational database works differently.
Think about the objects you are modelling.
Candidates - that's easy. ID, Name, perhaps additional info that belongs to each candidate. But nothing about votes here.
"Groups that are voting" or Parties. Not so trivial, due to the different types of parties. Still I would put them in one table, with Grade and Class only set for Learners, NULL for Heads and Teachers.
e.g.
+----------+------------+-------+-------+
| Party_ID | Party_Type | Grade | Class |
+----------+------------+-------+-------+
|        1 | Head       |       |       |
|        2 | Teacher    |       |       |
|        3 | Learner    |     8 | A     |
|        4 | Learner    |     8 | B     |
|        5 | Learner    |     8 | C     |
|        6 | Learner    |     9 | A     |
|        7 | Learner    |     9 | B     |
|        8 | Learner    |    10 | A     |
+----------+------------+-------+-------+
Votes: they are a Junction Table between Candidates and Parties.
e.g.
+----------+--------------+-----------+
| Party_ID | Candidate_ID | Num_Votes |
+----------+--------------+-----------+
|        1 |            1 |         5 |
|        1 |            2 |        17 |
|        3 |            1 |         2 |
|        3 |            2 |         6 |
|        3 |            3 |        10 |
+----------+--------------+-----------+
Now: if you want to know the votes of Class 8A:
SELECT Candidate_ID, SUM(Num_Votes)
FROM Parties p INNER JOIN Votes v
ON p.Party_ID = v.Party_ID
WHERE p.Party_Type = 'Learner'
AND p.Grade = 8
AND p.Class = 'A'
GROUP BY Candidate_ID
Or of all Grade 8? Simply omit the p.Class criterion.
For the votes per candidate you join Candidates with Votes.
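For example, a sketch of that join:
SELECT c.Candidate_ID, c.Candidate_Name, SUM(v.Num_Votes) AS Total_Votes
FROM Candidates AS c
INNER JOIN Votes AS v ON c.Candidate_ID = v.Candidate_ID
GROUP BY c.Candidate_ID, c.Candidate_Name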
Edit:
As for votes from the different parties counting differently, that weighting is an attribute of Party_Type.
We don't have a table for them yet, so create one:
+------------+---------------+
| Party_Type | Multiplicator |
+------------+---------------+
| Head       |             4 |
| Teacher    |             3 |
| Learner    |             1 |
+------------+---------------+
and to count all votes:
SELECT c.Candidate_ID, c.Candidate_Name, SUM(v.Num_Votes * t.Multiplicator) AS SumVotes
FROM Parties p
INNER JOIN Votes v ON p.Party_ID = v.Party_ID
INNER JOIN Party_Types t ON p.Party_Type = t.Party_Type
INNER JOIN Candidates c ON v.Candidate_ID = c.Candidate_ID
GROUP BY c.Candidate_ID, c.Candidate_Name
With a design like this, you don't need to keep updating data from one table into another - you calculate it when and how you need it, and it's always current.
The magic of databases. :)

Fill sequence in sql rows

I have a table that stores a group of attributes and keeps them ordered in a sequence. There is a chance that one of the attributes (rows) will be deleted from the table, and the sequence of positions should then be compacted.
For instance, if I originally have these set of values:
+----+--------+-----+
| id | name   | pos |
+----+--------+-----+
|  1 | one    |   1 |
|  2 | two    |   2 |
|  3 | three  |   3 |
|  4 | four   |   4 |
+----+--------+-----+
And the second row was deleted, the position of all subsequent rows should be updated to close the gaps. The result should be this:
+----+--------+-----+
| id | name   | pos |
+----+--------+-----+
|  1 | one    |   1 |
|  3 | three  |   2 |
|  4 | four   |   3 |
+----+--------+-----+
Is there a way to do this update in a single query? How could I do this?
PS: I'd appreciate examples for both SQL Server and Oracle, since the system is supposed to support both engines. Thanks!
UPDATE: The reason for this is that users are allowed to modify the positions at will, as well as add or delete rows. Positions are shown to the user, and for that reason they should form a consistent sequence at all times (and this sequence must be stored, not generated on demand).
I'm not sure it works in every case (the order in which ROWNUM assigns values is not guaranteed to follow the existing pos values), but with Oracle I would try the following:
update my_table set pos = rownum;
This would work, but may be suboptimal for large data sets:
UPDATE my_table t
SET pos = (SELECT COUNT(*) FROM my_table WHERE pos <= t.pos);

3 rows updated

SELECT * FROM my_table;

        ID NAME              POS
---------- ---------- ----------
         1 one                 1
         3 three               2
         4 four                3

Counting the rows at or below each row's current position (rather than its id) preserves the user-defined order even after rows have been reordered.
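Since the question asks for SQL Server as well, the same compaction can be done there by updating through a CTE with ROW_NUMBER(); ordering by the current pos keeps the user-defined order. A sketch:
WITH ordered AS (
    SELECT pos, ROW_NUMBER() OVER (ORDER BY pos) AS rn
    FROM my_table
)
UPDATE ordered SET pos = rn;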
Do you really need the sequence values to be contiguous, or do you just need to be able to display the contiguous values? The easiest way to do this is to let the actual sequence become sparse and calculate the rank based on the order:
select id,
       name,
       dense_rank() over (order by pos) as pos,
       pos as sparse_pos
from my_table
(note: DENSE_RANK is an analytic/window function available in both Oracle and SQL Server)
If you make the position sparse in the first place, this would even make re-ordering easier, since you could make each new position halfway between the two existing ones. For instance, if you had a table like this:
+----+--------+-----+
| id | name   | pos |
+----+--------+-----+
|  1 | one    | 100 |
|  2 | two    | 200 |
|  3 | three  | 300 |
|  4 | four   | 400 |
+----+--------+-----+
When it becomes time to move ID 4 into position 2, you'd just change the position to 150.
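In SQL, that move is just a single-row update, with the midpoint hard-coded for this example:
-- move id 4 halfway between pos 100 (id 1) and pos 200 (id 2)
UPDATE my_table SET pos = 150 WHERE id = 4;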
Further explanation:
Using the above example, the user initially sees the following (because you're masking the position):
+----+--------+-----+
| id | name   | pos |
+----+--------+-----+
|  1 | one    |   1 |
|  2 | two    |   2 |
|  3 | three  |   3 |
|  4 | four   |   4 |
+----+--------+-----+
When the user, through your interface, indicates that the record in position 4 needs to be moved to position 2, you update the position of ID 4 to 150, then re-run your query. The user sees this:
+----+--------+-----+
| id | name   | pos |
+----+--------+-----+
|  1 | one    |   1 |
|  4 | four   |   2 |
|  2 | two    |   3 |
|  3 | three  |   4 |
+----+--------+-----+
The only reason this wouldn't work is if the user is editing the data directly in the database. Though, even in that case, I'd be inclined to use this kind of solution, via views and instead-of triggers.
