Is it possible to make changes to WCF RIA entities on the server, send them to the client but not affect the underlying entities? - silverlight

From reading the title it might seem like an odd request, so let me clarify.
I'm storing dates and times on the server alongside their time zone information. I want the clients to be able to request these objects with a parameter matching their required time zone and receive the objects with the appropriate data.
So say I have a table of Bookings for particular times. A couple of rows might look like
BookingId | When | TimeZone | Notes
1 | 2011-05-06 12:00:00.000 | GMT +12 | null
2 | 2011-05-06 08:00:00.000 | GMT +2 | null
The client would call something like GetBookings("Pacific Standard Time") and the result would be the above two rows (probably without the time zone field) with their DateTimes adjusted so that the times are given in the client's time zone, with no additional time zone/offset information.
I know I could just do the time zone conversion on the client, but if I have multiple different clients I'm looking at duplicating this (somewhat tricky) code on multiple platforms, which I don't want to do.
The problem here is that if the server makes changes to these entities (which are backed by EF) then the changes are tracked by the ObjectContext. I'm sure there's a simple way around this?
The best solution I have thought of so far is a DTO for my Booking object, which I'd rather avoid but will implement if necessary.
Thanks.

Well, one approach would be to simply create a new object of the same class, copy the data from your "real" object into it, and adjust the copy's timestamp. Of course, you should not add this copy to the ObjectContext :p. If you return this object it will work fine and you can achieve your result.
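A rough sketch of that first approach, assuming an EF-backed Booking entity and a WCF RIA domain operation (the mapping from the stored TimeZone value to a system time zone id is assumed here):

public IEnumerable<Booking> GetBookings(string timeZoneId)
{
    var target = TimeZoneInfo.FindSystemTimeZoneById(timeZoneId);
    // Materialize the query, then project into fresh Booking instances that are
    // never attached to the ObjectContext, so the adjusted times are not tracked.
    return this.ObjectContext.Bookings
        .AsEnumerable()
        .Select(b => new Booking
        {
            BookingId = b.BookingId,
            When = TimeZoneInfo.ConvertTime(
                DateTime.SpecifyKind(b.When, DateTimeKind.Unspecified),
                TimeZoneInfo.FindSystemTimeZoneById(b.TimeZone), // assumes TimeZone holds a system time zone id
                target),
            Notes = b.Notes
        })
        .ToList();
}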
A better solution would be to create a partial class for your entity (mind you, it must be in the same namespace) and add a computed property. Since you are using Silverlight, put [DataMemberAttribute()] on the property and populate it with the value adjusted to the desired time zone. I think this is good to go.
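A minimal sketch of the partial-class idea (the property name is made up; the domain service would populate it with the converted time before returning each entity):

using System.Runtime.Serialization;

// Server-side partial class; it must live in the same namespace (and project)
// as the generated Booking entity so the compiler merges the two halves.
public partial class Booking
{
    // Not part of the EF model, so setting it does not go through EF change tracking;
    // [DataMember] asks WCF RIA Services to serialize it to the Silverlight client.
    [DataMember]
    public DateTime WhenInClientTimeZone { get; set; }
}

The GetBookings(timeZoneId) operation would then set WhenInClientTimeZone on each entity it returns, leaving the mapped When column untouched.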

Related

Pivot table for Wagtail Form Builder for polling and voting purposes

I am trying to use Wagtail Form Builder for voting and polling, and HighCharts to display the results interactively on a web page.
The problem is that the Wagtail FormSubmission class only stores the information of each individual vote:
| vote user | question 1 | question 2 |
| jason | A | C |
| lily | D | B |
But I want to get information like:
how many users voted for A, B, C, or D on question 1 and question 2 respectively, and who those users are; in other words, something like a pivot table over the FormSubmission results.
I understand I can use a QuerySet API aggregation to get what I want, but I do not want to do this expensive manipulation every time a user visits the page.
I am thinking about using class-level attributes to achieve this.
Q: What is the best practice for storing those aggregation results in the DB and updating them every time a vote is submitted?
Wagtail form builder is not really suitable for this task. It's designed to allow non-programmers to construct forms for simple data collection - where it just needs to be stored and retrieved, with no further processing - without having to know how to define a Django model. To achieve this, all the data is stored in the FormSubmission model in a single field as JSON text, so that the same model can be re-used for any form. Since this isn't a format that the database engine understands natively, there's no way to perform complex queries on it efficiently - the only way is to unpack each submission individually and run calculations on it in Python code, which is going to be less efficient than any queryset functionality.
Instead, I would recommend writing a custom Django app for this. The tutorial in the Django documentation is a polls app, which should give you some idea of the way to go about it, but in short, you'll most likely need three models: a Question model containing the text of each question, an AnswerChoice model where each item is one of the possible answers for one question, and a Response model indicating which AnswerChoice a given user has chosen. With these models, you'll be able to perform queries such as "how many users answered A for question 1" with a queryset expression such as:
Response.objects.filter(question=1, answer_choice='A').count()

Making a table with fixed columns versus key-valued pairs of metadata?

I was asked to create a table to store paid-hours data from multiple attendance systems from multiple geographies from multiple sub-companies. This table would be used for high-level reporting, so basically it skips the step of creating tables for each system (which might already exist) and moves directly to what the final product would be.
The request was to have a dimension for each type of hours or pay like this:
date       | employee_id | type          | hours | amount
2016-04-22 | abc123      | regular       | 80    | 3500
2016-04-22 | abc123      | overtime      | 6     | 200
2016-04-22 | abc123      | adjustment    | 1     | 13
2016-04-22 | abc123      | paid time off | 24    | 100
2016-04-22 | abc123      | commission    |       | 600
2016-04-22 | abc123      | gross total   |       | 4413
There are multiple rows per employee, but the thought process is that this will allow us to capture new dimensions if they are added.
The data is coming from several sources and I was told not to worry about the ETL, but just design the ultimate table and make it work for any system. We would provide this format to other people for them to fill in.
I have only seen the raw data from one system, and it looks like this:
date | employee_id | gross_total_amount | regular_hours | regular_amount | OT_hours | OT_amount | classification | amount | hours
It is pretty messy: there are multiple rows per employee, and values like gross_total_amount repeat on each row. There is a classification column with items like PTO (paid time off), adjustments, empty values, commission, etc. Because of the repeating values, it is impossible to simply sum the data up and have it equal the gross_total_amount.
Anyway, I would prefer a column-based approach where each row describes an employee's paid hours for a cut-off period. One problem is that I won't know all of the possible types of hours, so I can't necessarily make a table like:
date | employee_id | gross_total_amount | commission_amount | regular_hours | regular_amount | overtime_hours | overtime_amount | paid_time_off_hours | paid_time_off_amount | holiday_hours | holiday_amount
I am more used to data formatted that way, though. The concern is that you might not capture all of the necessary columns, or that something new might be added later. (For example, I know there is maternity leave, paternity leave, and bereavement leave; in other geographies there are labor laws about working at night, etc.)
Any advice? Is the table which was suggested to me from my superior a viable solution?
TAM makes lots of good points, and I have only two additional suggestions.
First, I would generate some fake data in the table as described above, and see if it can generate the required reports. Show your manager each of the reports based on the fake data, to check that they're OK. (It appears that the reports are the ultimate objective, so work back from there.)
Second, I would suggest that you get sample data from as many of the input systems as you can. This is to double check that what you're being asked to do is possible for all systems. It's not so you can design the ETL, or gather new requirements, just testing it all out on paper (do the ETL in your head). Use this to update the fake data, and generate fresh fake reports, and check the reports again.
Let me recapitulate what I understand to be the basic task.
You get data from different sources, having different structures. Your task is to consolidate them in a single database to be able to answer questions about all these data. I understand the hint about "not to worry about the ETL, but just design the ultimate table" in the sense that your consolidated database doesn't need to contain all the detail information that might be present in the original data, just enough information to fulfill the specific requirements for the consolidated database.
This sounds sensible as long as your superior is certain enough about these requirements. In that case, you will reduce the information coming from each source to the consolidated structure.
Either way, you'll have to capture the domain semantics of the data coming in from each source. Lacking access to your domain semantics, I can't clarify the mess of repeating values etc. for you. E.g., if there are detail records and gross-total records, as in your example, it would be wrong to add up the hours of all records, as this would always yield twice the hours actually worked. So someone will have to worry about ETL, namely interpreting each set of records (probably consisting of all entries for one employee and one working day), finding out what they mean, and transforming them to the consolidated structure.
I understand another part of the question to be about the usage of metadata. You can have different columns for notions like holiday leave and maternity leave, or you have a metadata table containing these notions as a key-value pair, and refer to the key from your main table. The metadata way is sometimes praised as being more flexible, as you can introduce a new type (like paternity leave) without redesigning your database. However, you will need to redesign the software filling and probably also querying your tables to make use of the new type. So you'll have to develop and deploy a new software release anyway, and adding a few columns to a table will just be part of that development effort.
There is one major difference between a broad table containing all notions as attributes and the metadata approach. If you want to make sure that, for a time period, either all or none of the values are present, that's easy with the broad table: just make all attributes not null, and you're done. Ensuring this for the metadata solution would mean some rather complicated constraint that may or may not be available depending on the database system you use.
If that's not a main requirement, I would go a pragmatic way and use different columns if I expect only a handful of those types, and a separate key-value table otherwise.
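To make the difference concrete, here is how the two shapes might look as C# classes (names are purely illustrative); note that the broad shape can express "all values must be present" simply through non-nullable members, while the key-value shape cannot:

// Broad table: one column per pay type; "all present" is just non-nullable members.
public class PayPeriodWide
{
    public DateTime Date { get; set; }
    public string EmployeeId { get; set; }
    public decimal RegularHours { get; set; }
    public decimal RegularAmount { get; set; }
    public decimal OvertimeHours { get; set; }
    public decimal OvertimeAmount { get; set; }
    // ... one pair of members per pay type known up front
}

// Key-value (metadata) table: one row per pay type; new types need no schema change,
// but "either all types are present or none" has no simple constraint.
public class PayLine
{
    public DateTime Date { get; set; }
    public string EmployeeId { get; set; }
    public string PayType { get; set; }  // "regular", "overtime", "paid time off", ...
    public decimal? Hours { get; set; }  // null for pure amounts such as commission
    public decimal Amount { get; set; }
}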
All these considerations relied on your superior's assertion (as I understand it) that your consolidated table will only need to fulfill the requirements known today, so you are free to throw original detail information away if it's not needed due to these requirements. I'm wary of that kind of assertion. Let's assume some of your information sources deliver additional information. Then it's quite probable that someday someone asks for a report also containing this information, where present. This won't be possible if your data structure only contains what's needed today.
There are two ways to handle this, i.e. to provide for future needs. You can, after knowing the data coming from each additional source, extend your consolidated database to cover all data structures coming from there. This requires some effort, as different sources might express the same concept using different data, and you would have to consolidate those to make the data comparable. Also, there is some probability that not all of your effort will be worth the trouble, as not all of the detail information you get will actually be needed for your consolidated database. Another more elegant way would therefore be to keep the original data that you import for each source, and only in case of a concrete new requirement, extend your database and reimport the data from the sources to cover the additional details. Prices of storage being low as they are, this might yield an optimal cost-benefit ratio.

Database schema for calendar application

I'm creating a calendar application in which each date has one of 3 states: available, maybe available, and unavailable. Trying to figure out the best schema for this situation.
One thought might be to have a UserDate model with a state field. The problem with this is that the DB will have (number of users) × 365 rows for each year, which seems like it would grow too quickly for a modestly sized app.
Another thought might be to have a default state, and only create a UserDate object when the user has signified that their availability on that date is different from the default. This seems convoluted though.
Has anyone dealt with this situation before? Any suggestions on the best way to go about this?
When you create a new user, you do not want to be inserting records for the next 50 years of their life. Only creating the UserDate object when there is a non-default value is what you should do.
You could consider storing a range of dates for a user if you are likely to have lots of consecutive dates with the same status. For example, if they are unavailable for all of December, then this could be represented as a single row.
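As a sketch of that idea (a C# entity with illustrative names), storing only deviations from the default and letting one row cover a run of consecutive days could look like this:

public enum AvailabilityState { Available, MaybeAvailable, Unavailable }

// One row per contiguous run of days that differs from the user's default state;
// any date not covered by a row is assumed to have the default.
public class AvailabilityOverride
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public DateTime StartDate { get; set; }  // inclusive
    public DateTime EndDate { get; set; }    // inclusive
    public AvailabilityState State { get; set; }
}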
Think about the sort of information you want to extract from your database, and how difficult this will be with each of your possible designs.

When is it OK to blur the abstraction between data and logic?

I mean referring to specific database rows by their ID, from code, or specifying a class name in the database. Example:
You have a database table called SocialNetwork. It's a lookup table. The application doesn't write to or delete from it. It's mostly there for database integrity; let's say the whole shebang looks like this:
SocialNetwork table:
Id | Description
-----------------------------
1 | Facebook
2 | Twitter
SocialNetworkUserName table:
Id | SocialNetworkId | Name
---------------------------------------------------
1 | 2 | #seanssean
2 | 1 | SeanM
In your code, there's some special logic that needs to be carried out for Facebook users. What I usually do is make either an enum or some class constants in the code to easily refer to it, like:
if (socialNetwork.Id == SocialNetwork.FACEBOOK) // SocialNetwork.FACEBOOK = 1
// special facebook-specific functionality here
That's a hard-coded database ID. It's not a huge crime since it's just referencing a lookup table, but there's no longer a clean division between data and logic, and it bothers me.
The other option I can think of would be to specify the name of a class or delegate in the database, but that's even worse IMO because now you've not only broken the division between data and logic, but you've tied yourself to one language now.
Am I making much ado about nothing?
I don't see the problem.
At some point your code needs to do things. Facebook is a real social network, with its own real API, and you want it to do Facebook-specific things in your code. Unless your tasks are trivial, to put all of the Facebook-specific stuff in the database would mean a headache in your code. (What's the equivalent of "Like" in Twitter, for example?)
If the Facebook entry isn't in your database, then the Facebook-specific code won't be executed. You can do that much.
Yep, but with the caveat that "it depends." A lookup ID like that is unlikely to change, but still.
Storing the name of a class or delegate is probably bad, but storing a token used by a class or delegate factory isn't, because it's language-neutral; you'll still have the problem of having to maintain the connection somewhere. Unless you have a table of language-specific things tied to that table, at which point I believe you'd be shot.
Rather than keeping the constant comparison in mainline code, IMO this kind of situation is a nice fit for a factory pattern, enum lookup, etc. to implement network-specific lookup/behavior. The mainline code shouldn't have to care how it's implemented, which it does right now; that part is a genuine concern.
With the caveat that ultimately it may never matter. If it were me, I'd at least de-couple the mainline code, because stuff like that makes me twitchy.
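A sketch of that decoupling, reusing the SocialNetwork constants from the question (SocialNetwork.TWITTER, the handler interface, and the class names are invented for illustration):

using System.Collections.Generic;

public interface ISocialNetworkHandler
{
    void Apply(SocialNetworkUserName userName);
}

public class FacebookHandler : ISocialNetworkHandler
{
    public void Apply(SocialNetworkUserName userName) { /* Facebook-specific functionality */ }
}

public class TwitterHandler : ISocialNetworkHandler
{
    public void Apply(SocialNetworkUserName userName) { /* Twitter-specific functionality */ }
}

public static class SocialNetworkHandlers
{
    // Keyed by the lookup-table ID (or by a language-neutral token stored in that table).
    private static readonly Dictionary<int, ISocialNetworkHandler> handlers =
        new Dictionary<int, ISocialNetworkHandler>
        {
            { SocialNetwork.FACEBOOK, new FacebookHandler() },
            { SocialNetwork.TWITTER,  new TwitterHandler() },
        };

    public static ISocialNetworkHandler For(int socialNetworkId)
    {
        return handlers[socialNetworkId];
    }
}

// Mainline code no longer needs the ID comparison:
// SocialNetworkHandlers.For(socialNetwork.Id).Apply(userName);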

Deletion / invalidation approaches for reference data

Based on the discussion I found here: Database: To delete or not to delete records, I want to focus on reference data in particular, add a few thoughts on that, and ask for your preferred approach in general, or based on which criteria you make the decision which of the approaches available you go for.
Let's assume the following data structure for a 'request database' for customers, where requests may be delivered via various channels (phone, mail, fax, ...); the Channel table is the reference data table I mainly want to focus on:
Request (ID, Text, Channel_ID)
Channel(ID, Description)
Let's, for the beginning, assume the following data within those two tables:
Request:
ID | Text | Channel_ID
===============================================================
1 | How much is product A currently? | 1
2 | What about my inquiry from 2011/02/13? | 1
3 | Did you receive my payment from 2011/03/04? | 2
Channel:
ID | Description
===============================================================
1 | Phone
2 | Mail
3 | Fax
So, how do you attack this assuming the following requirements:
Channels may change over time. That means: their descriptions may change, new ones may be added that are only valid starting from some particular date, and channels may be invalidated (as of some particular date).
For reporting and monitoring purposes, it needs to be possible to identify which channel a request was originally filed through.
For new requests, only the currently 'valid' channels should be allowed, whereas for pre-existing ones, the channels that were valid at that particular date should also be allowed.
In my understanding, that clearly asks for a richer invalidation approach that goes beyond a deletion flag, probably something incorporating a 'ValidFrom / ValidTo' approach for the reference data table.
On the other hand, this involves several difficulties during data capture of requests, because for new requests you only display the currently available channels, whereas for maintenance of pre-existing ones, all channels available as of the creation of that record need to be displayed. This might not only be complicated from a development point of view, but may also be non-intuitive to the users.
How do you commonly set up your data model for reference data that might change over time? How do you create your user interface then? Which further parameters do you take into account for proper database design?
In such cases I usually create another table, for example channel_versions, that duplicates all fields from channel and has an extra create_date column (and its own PK, of course). For channel I define after insert/update triggers that copy the new values into channel_versions. All requests in the Request table then refer to records from channel_versions. For new requests you need to get the most recent version of the channel from channel_versions. For old requests you always know how the channel looked when the request was fulfilled.
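Sketched as C# entity classes (to stay with the language of the first example above; the copy-into-versions step is described as application logic here rather than as a database trigger, and all names are illustrative):

public class Channel
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// Duplicates the fields of Channel, plus its own key and a creation timestamp.
public class ChannelVersion
{
    public int Id { get; set; }
    public int ChannelId { get; set; }
    public string Description { get; set; }
    public DateTime CreateDate { get; set; }
}

public class Request
{
    public int Id { get; set; }
    public string Text { get; set; }
    public int ChannelVersionId { get; set; } // requests reference the snapshot, not the live channel
}

// Whenever a channel is inserted or its description changes, a new ChannelVersion row
// is also inserted (the answer above does this with after insert/update triggers).
// New requests pick the most recent ChannelVersion for the chosen channel;
// old requests keep pointing at the version that was current when they were filed.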
