Pivot table for Wagtail Form Builder for polling and voting purposes

I am trying to use the Wagtail Form Builder for voting and polling, and Highcharts to display the results interactively on a web page.
The problem is that Wagtail's FormSubmission class only stores the information of each individual vote:
| vote user | question 1 | question 2 |
| jason     | A          | C          |
| lily      | D          | B          |
But I want to get information like:
how many users voted for A, B, C, and D for questions 1 and 2 respectively, and who those users are - essentially a pivot table over the FormSubmission results.
I understand I can use QuerySet API aggregation to get what I want, but I do not want to run this expensive manipulation every time a user visits the web page.
I am thinking about using class-level attributes to achieve this.
Q: What is the best practice for storing those aggregation results in the DB and updating them every time a vote is submitted?

Wagtail form builder is not really suitable for this task. It's designed to allow non-programmers to construct forms for simple data collection - where it just needs to be stored and retrieved, with no further processing - without having to know how to define a Django model. To achieve this, all the data is stored in the FormSubmission model in a single field as JSON text, so that the same model can be re-used for any form. Since this isn't a format that the database engine understands natively, there's no way to perform complex queries on it efficiently - the only way is to unpack each submission individually and run calculations on it in Python code, which is going to be less efficient than any queryset functionality.
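To illustrate what "unpack each submission individually" means in practice, here is a minimal sketch of pivoting the JSON answers in Python. The field names and the JSON layout are assumptions for illustration; real FormSubmission rows expose the answers through their form_data attribute.

```python
import json
from collections import defaultdict

# Stand-ins for FormSubmission rows: each stores its answers as JSON text.
submissions = [
    {"user": "jason", "form_data": json.dumps({"question 1": "A", "question 2": "C"})},
    {"user": "lily",  "form_data": json.dumps({"question 1": "D", "question 2": "B"})},
]

# pivot[question][answer] -> list of users who chose that answer
pivot = defaultdict(lambda: defaultdict(list))
for sub in submissions:
    for question, answer in json.loads(sub["form_data"]).items():
        pivot[question][answer].append(sub["user"])

print(dict(pivot["question 1"]))  # {'A': ['jason'], 'D': ['lily']}
```

This works, but it loads and parses every submission on every request, which is exactly the cost the database engine could avoid with a proper relational model.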
Instead, I would recommend writing a custom Django app for this. The tutorial in the Django documentation is a polls app, which should give you some idea of the way to go about it, but in short, you'll most likely need three models: a Question model containing the text of each question, an AnswerChoice model where each item is one of the possible answers for one question, and a Response model indicating which AnswerChoice a given user has chosen. With these models, you'll be able to perform queries such as "how many users answered A for question 1" with a queryset expression such as:
Response.objects.filter(question=1, answer_choice='A').count()

Firestore: Running Complex Update Queries With Multiple Retrievals (ReactJS)

I have a grid of data whose endpoints are displayed from data stored in my firestore database. So for instance an outline could be as follows:
| Spent total: $150 |
| Item 1: $80 |
| Item 2: $70 |
So the values for all of these costs (70, 80, and 150) are stored in my Firestore database, with the sub-items being a separate collection from my total spent. Now, I want to be able to update the price of item 2 to, say, $90, which will then update item 2's value in Firestore, but I want this to then run a check against the table so that the "spent total" is also updated to say "$170". What would be the best way to accomplish something like this?
Especially if I were to add multiple rows and columns that are all dependent on one another, what is the best way to update one part of my grid so that afterwards all of the data endpoints on the grid are updated correctly? Should I be using Cloud Functions somehow?
Additionally, I am creating a ReactJS app. Previously I just had my grid endpoints stored in my Redux store state so that I could run complex methods that checked each row and column and did some math to update each endpoint correctly, but what is the best way to do this now that I have migrated my data to Firestore?
Edit: here are some pictures of how I am currently trying to set up my Firestore layout:
You might want to back up a little and get a better understanding of the type of database that Firestore is. It's NoSQL, so things like rows and columns and tables don't exist.
Try this video: https://youtu.be/v_hR4K4auoQ
and this one: https://youtu.be/haMOUb3KVSo
But yes, you could use a cloud function to update a value for you, or you could make the new Spent total calculation within your app logic and when you write the new value for Item 2, also write the new value for Spent total.
But mostly, you need to understand how Firestore stores your data and how it charges you to retrieve it. You are charged mostly per read/write request, with much less concern for the total amount of data you have stored. So it will probably be better to NOT keep these values in separate collections if you are always going to be using them at the same time.
For example:
Collection(transactions) => Document(transaction133453) {item1: $80, item2: $70, spentTotal: $150}
and then if you needed to update that transaction, you would just update the values for that document all at once and it would only count as 1 write operation. You could store the transactions collection as a subcollection of a customer document, or simply as its own collection. But the bottom line is most of the best practices you would rely on for a SQL database with tables, columns, and rows are 100% irrelevant for a Firestore (NoSQL) database, so you must have a full understanding of what that means before you start to plan the structure of your database.
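The "recompute the dependent value in app logic, then write the whole document once" idea can be sketched in plain code. This is only the in-app calculation step; the function name and the convention that item fields start with "item" are assumptions for illustration, and the resulting dict would be handed to a single Firestore document write.

```python
def update_item(doc, item, new_price):
    """Return an updated copy of the transaction document with the
    dependent spentTotal recomputed before the single write."""
    updated = dict(doc, **{item: new_price})
    updated["spentTotal"] = sum(v for k, v in updated.items()
                                if k.startswith("item"))
    return updated

doc = {"item1": 80, "item2": 70, "spentTotal": 150}
print(update_item(doc, "item2", 90))  # {'item1': 80, 'item2': 90, 'spentTotal': 170}
```

Since everything lives in one document, the update is one write operation and readers never see an inconsistent total.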
I hope this helps!! Happy YouTubing...
Edit in response to comment:
The way I like to think about it is how am I going to use the data as opposed to what is the most logical way to organize the data. I'm not sure I understand the context of your example data, but if I were maybe tracking budgets for projects or something, I might use something like the screenshots I pasted below.
Since I am likely going to have a pretty limited number of team members for each budget, that can be stored in an array within the document, along with ALL of the fields specific to that budget - basically anything that I might like to show in a screen that displays budget details, for instance. Because when you make a query to populate the data for that screen, if everything you need is all in one document, then you only have to make one request! But if you kept your "headers" in one doc and then your "data" in another doc, now you have to make 2 requests just to populate 1 screen.
Then maybe on that screen, I have a link to "View Related Transactions", if the user clicks on that, you would then call a query to your collection of transactions. Something like transactions is best stored in a collection, because you probably don't know if you are going to have 5 transactions or 500. If you wanted to show how many total transactions you had on your budget details page, you might consider adding a field in your budget doc for "totalTransactions: (number)". Then each time a user added a transaction, you would write the transaction details to the appropriate transactions collection, and also increase the totalTransactions field by 1 - this would be 2 writes to your db. Firestore is built around the concept that users are likely reading data way more frequently than writing data. So make two writes when you update your transactions, but only have to read one doc every time you look at your budget and want to know how many transactions have taken place.
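The two-writes-per-transaction pattern above can be simulated with plain Python objects; the names are illustrative, and in the real Firestore API the counter bump would be done with an atomic increment on the budget document rather than a local +=.

```python
budget = {"name": "My budget", "totalTransactions": 0}
transactions = []  # stands in for the transactions subcollection

def add_transaction(tx):
    transactions.append(tx)           # write 1: the new transaction doc
    budget["totalTransactions"] += 1  # write 2: bump the counter on the budget doc

add_transaction({"amount": 80})
add_transaction({"amount": 70})
print(budget["totalTransactions"])  # 2
```

Two writes on every insert buy a one-document read whenever the budget screen needs the count, which fits Firestore's read-heavy pricing model.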
Same for something like chats. But you would only make chats a subcollection of the budget document if you wanted to only ever show chats for one budget at a time. If you wanted all your chats to be taking place in one screen to talk about all budgets, you would likely want to make your chats collection at the root level.
As for getting your data from the document, it's basically a JSON object so (may vary slightly depending on what kind of app you are working in),
a nested array is referred to by:
documentName.arrayName[index]
budget12345.teamMembers[1]
a nested object:
documentName.objectName.fieldName
budget12345.projectManager.firstName
And then a subcollection is
collection(budgets).document(budget12345).subcollection(transactions)
FirebaseExample budget doc
FirebaseExample remainder of budget doc
FirebaseExample team chats collection
FirebaseExample transactions collection

The right record access implementation

I am looking into indexing engines, specifically Apache Solr (built on Lucene). We would like to use it for our searches, yet one of the problems solved by our framework's search is row-level access.
Solr does not provide record access out of the box:
<...> Solr does not concern itself with security either at the document level or the communication level.
And in the section about document level security: http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
There are a few suggestions - either use ManifoldCF (which is largely undocumented and seems to be in a very early pre-beta stage) or write your own request handler/search component (that part is marked as a stub) - I guess the latter would have a bigger impact on performance.
So I assume not much is being done in this field.
In the recently released 4.0 version of Solr, they have introduced joining two indexed entities. Joining might seem like a nice idea, since our framework also does a join to determine whether a record is accessible to the user. The problem is that sometimes we do an inner join and sometimes an outer join, depending on whether the security setting in the scope is optimistic (everything that is not forbidden is allowed) or pessimistic (everything is forbidden except what is explicitly allowed).
To give a better understanding of what our structure looks like:
Documents
DocumentNr | Name
------------------
1          | Foo
2          | Bar
DocumentRecordAccess
DocumentNr | UserNr | AllowRead | AllowUpdate | AllowDelete
------------------------------------------------------------
1          | 1      | 1         | 1           | 0
So for example the generated query for the Documents in pessimistic security setting would be:
SELECT * FROM Documents AS d
INNER JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1
This would return only the foo, but not the bar. And in optimistic setting:
SELECT * FROM Documents AS d
LEFT JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1
Returning both - the Foo and the Bar.
Coming back to my question - maybe someone has already done this and can share their insight and experience?
I am afraid there's no easy solution here. You will have to sacrifice something to get ACLs working together with the search.
If your corpus size is small (I'd say up to 10K documents), you could create a cached bit set of forbidden (or allowed, whichever is less verbose) documents and send the relevant filter query (+*:* -DocumentNr:1 ... -DocumentNr:X). Needless to say, this doesn't scale. Sending large queries will make the search a bit slower, but this is manageable (up to a point, of course). Query parsing is cheap.
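Building that exclusion filter is a one-liner; a sketch, assuming the DocumentNr field name from the question and a precomputed list of forbidden ids:

```python
def forbidden_filter(forbidden_ids):
    """Build a Solr fq that matches everything except the forbidden documents."""
    return "+*:* " + " ".join(f"-DocumentNr:{n}" for n in forbidden_ids)

print(forbidden_filter([1, 7, 42]))  # +*:* -DocumentNr:1 -DocumentNr:7 -DocumentNr:42
```

The string grows linearly with the forbidden set, which is why this only works for small corpora.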
If you can somehow group these documents and apply ACLs on document groups, this would allow cutting on query length and the above approach would fit perfectly. This is pretty much what we are using - our solution implements taxonomy and has taxonomy permissions done via fq query.
If you don't need to show the overall result set count, you can run your query and filter the result set on the client side. Again, not perfect.
You can also denormalize your data structures and store both tables flattened in a single document like this:
DocumentNr: 1
Name: Foo
Allowed_users: u1, u2, u3 (or Forbidden_users: ...)
The rest is as easy as sending user id with your query.
Above is only viable if the ACLs are rarely changing and you can afford reindexing the entire corpus when they do.
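With the ACLs denormalized into each document like this, the question's optimistic/pessimistic distinction collapses into a choice of filter query. A sketch, where the Allowed_users/Forbidden_users field names come from the example above and the function name is an assumption:

```python
def acl_filter(user_id, optimistic=False):
    """Solr fq for per-user record access over denormalized ACL fields."""
    if optimistic:
        # allow everything except documents that explicitly forbid the user
        return f"*:* -Forbidden_users:{user_id}"
    # pessimistic: allow only documents that explicitly list the user
    return f"Allowed_users:{user_id}"

print(acl_filter("u2"))                   # Allowed_users:u2
print(acl_filter("u2", optimistic=True))  # *:* -Forbidden_users:u2
```

Either string is sent as an fq parameter alongside the main query, so the ACL check costs one cached filter rather than a join.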
You could write a custom query filter which would have cached BitSets of allowed or forbidden documents by user(group?) retrieved from the database. This would require not only providing DB access for Solr webapp but also extending/repackaging the .war which comes with Solr. While this is relatively easy, the harder part would be cache invalidation: main app should somehow signal Solr app when ACL data gets changed.
Options 1 and 2 are probably more reasonable if you can put Solr and your app onto the same JVM and use javabin driver.
It's hard to advise more without knowing the specifics of the corpus/ACLs.
I agree with mindas; I have implemented my solution the same way he suggested (solution 4), but the difference is that I have a few different types of ACLs: at the user-group level, the user level, and even the document level (private access).
The solution is working fine. The main concern in my case is that the ACLs change frequently and need to be updated in the index, while search performance should not be affected either.
I am trying to manage this with load balancing and by adding a few more nodes to the cluster.
mindas, unicron, can you please share your thoughts on this?

How to model football game statistics in RavenDB

I'm new to RavenDB and I'm still trying to get my head around the best way to model the data for the current scenario. Here is what the data looks like.
Game
- Teams
- Team 1
- list of players
- Team 2
- list of players
- Events
- Event 1
- type: Pass
- teamId
- PlayerId
- Event 2
- type: Goal
- teamId
- PlayerId
At the beginning of each game we get the overall info for the game (e.g. Teams, Venue etc) and then every few minutes we get an updated list of events.
Also, I need to be able to query data for a particular player throughout the game (e.g. how many passes a player has)
Do I store it as a single document? Do I split the Events into a separate document e.g. GameEvents? Is there a third scenario?
I wouldn't worry about how this will be stored in RavenDB. That's the beauty of document databases; don't think relationally. Create your domain model in the object-oriented way that it should be created (a Team would have a List<Player> property, etc...), then just save the entities as necessary.
I've been meaning to blog about how I kept my domain model pure while using RavenDB. I need to publish that...
** EDIT **
I finally published that blog: http://bit.ly/xUsYJK. This shows how Presto kept a somewhat pure domain model while using RavenDB.
By the way, Daniel Lang has a good blog about this subject:
http://daniellang.net/how-to-handle-relations-in-ravendb/
I use the Include<T> approach because I like to keep my domain entities referencing each other in what I consider an appropriate way.
Daniel also has a section called "Denormalize your references." Some people prefer that method.
Store it as a single object - the structure you've defined is great. Then just define indexes for the various types of queries you will be doing. Not having to break things down into tables with relationships is what makes document DBs like Raven awesome - great for scenarios just like what you are describing.
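The single-document structure from the question can be sketched as a plain object graph; since RavenDB stores documents as serialized JSON, any serializable model like this works. The class and field names below are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Player:
    id: str
    name: str

@dataclass
class Team:
    id: str
    name: str
    players: List[Player] = field(default_factory=list)

@dataclass
class Event:
    type: str        # "Pass", "Goal", ...
    team_id: str
    player_id: str

@dataclass
class Game:
    teams: List[Team]
    events: List[Event] = field(default_factory=list)

game = Game(teams=[Team("t1", "Team 1", [Player("p9", "Smith")]),
                   Team("t2", "Team 2")])
game.events.append(Event("Pass", "t1", "p9"))
game.events.append(Event("Goal", "t1", "p9"))

# "How many passes does player p9 have?" - the kind of per-player
# question an index over events would answer.
passes = sum(1 for e in game.events
             if e.type == "Pass" and e.player_id == "p9")
print(passes)  # 1
```

Whether events live inside the game document or as separate documents (as the next answer argues), the query shape stays the same; only the indexing strategy changes.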
Thinking about how many such events can occur during a game, I believe that you definitely want to have them as separate documents. That way you don't need to load and update the game document on each incoming event, which would also be quite expensive if the document grows very large.
To get statistics across all events of a game, I'd rather have some indexes that collect the appropriate data.

Django: efficient database search

I need an efficient way to search through my models to find a specific User, here's a list,
User - list of users, their names, etc.
Events - table of events for all users, on when they're not available
Skills - many-to-many relationship with the User, a User could have a lot of skills
Contracts - many-to-one with User, a User could work on multiple contracts, each with a rating (if completed)
... etc.
So I got a lot of tables linked to the User table. I need to search for a set of users fitting certain criteria; for example, he's available from next Thurs through Fri, has x/y/z skills, and has received an average 4 rating on all his completed contracts.
Is there some way to do this search efficiently while minimizing the # of times I hit the database? Sorry if this is a very newb question.
Thanks!
Not sure if this method will solve your issue for all 4 cases, but at least it should help you out with the first one - querying user data efficiently.
I usually find the values or values_list query functions faster because they slim down the SELECT part of the actual SQL, so you will get results faster. Django docs regarding this.
Also worth mentioning that, starting with the current dev version, values and values_list can traverse any type of relationship, including many-to-one.
And finally, you might find in_bulk useful as well. For a complex query, you can fetch the ids of some models first using values or values_list, and then use in_bulk to get the model instances faster. Django docs about that.

Deletion / invalidation approaches for reference data

Based on the discussion I found here: Database: To delete or not to delete records, I want to focus on reference data in particular, add a few thoughts on that, and ask for your preferred approach in general, or for the criteria on which you decide which of the available approaches to go for.
Let's assume the following data structure for a 'request database' for customers, where requests may be delivered via various channels (phone, mail, fax, ...); the channel table is the reference data I mainly want to focus on:
Request (ID, Text, Channel_ID)
Channel(ID, Description)
Let's, for the beginning, assume the following data within those two tables:
Request:
ID | Text | Channel_ID
===============================================================
1 | How much is product A currently? | 1
2 | What about my inquiry from 2011/02/13? | 1
3 | Did you receive my payment from 2011/03/04? | 2
Channel:
ID | Description
===============================================================
1 | Phone
2 | Mail
3 | Fax
So, how do you attack this assuming the following requirements:
Channels may change over time. That means: their descriptions may change; new ones may be added, only valid starting from some particular date; channels may be invalidated (by some particular date)
For reporting and monitoring purposes, it needs to be possible to identify which channel a request was originally filed through.
For new requests, only the currently 'valid' channels should be allowed, whereas for pre-existing ones, the channels that were valid at that particular date should also be allowed.
In my understanding, that clearly calls for a richer invalidation approach that goes beyond a deletion flag, probably something incorporating a 'ValidFrom / ValidTo' approach for the reference data table.
On the other hand, this introduces several difficulties during data capture of requests: for new requests, you only display the currently available channels, whereas for maintenance of pre-existing ones, all channels available as of the creation of that record need to be displayed. This might not only be complicated from a development point of view, but may also be unintuitive to the users.
How do you commonly set up your data model for reference data that might change over time? How do you create your user interface then? Which further parameters do you take into account for proper database design?
In such cases I usually create another table, for example channel_versions, that duplicates all fields from channel and has an extra create_date column (and its own PK, of course). For channel I define after-insert/after-update triggers that copy the new values into channel_versions. All requests in the Request table then refer to records from channel_versions. For new requests you take the most recent version of the channel from channel_versions. For old requests you always know how the channel looked when the request was filed.
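The channel_versions idea can be sketched end to end with SQLite triggers; the table and trigger names are illustrative, and a real Request table would store the channel_versions id rather than the channel id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE channel (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE channel_versions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    channel_id  INTEGER NOT NULL,
    description TEXT,
    create_date TEXT DEFAULT CURRENT_TIMESTAMP
);
-- every insert or update on channel snapshots the new state
CREATE TRIGGER channel_ai AFTER INSERT ON channel BEGIN
    INSERT INTO channel_versions (channel_id, description)
    VALUES (NEW.id, NEW.description);
END;
CREATE TRIGGER channel_au AFTER UPDATE ON channel BEGIN
    INSERT INTO channel_versions (channel_id, description)
    VALUES (NEW.id, NEW.description);
END;
""")

con.execute("INSERT INTO channel (id, description) VALUES (1, 'Phone')")
con.execute("UPDATE channel SET description = 'Telephone' WHERE id = 1")

# New requests use the most recent version; old requests keep the
# version id that was current when they were captured.
rows = con.execute("SELECT channel_id, description FROM channel_versions "
                   "ORDER BY id").fetchall()
print(rows)  # [(1, 'Phone'), (1, 'Telephone')]
```

Renaming "Phone" to "Telephone" thus never rewrites history: old requests still point at the 'Phone' version, while the picker for new requests shows only the latest row per channel_id.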
