Is creating a view in BigQuery good practice? - analytics

I am creating a view in BigQuery,
but when I run a query against this view, it says the query will process the same number of GB as the table the view was created from.
Is a view destroyed automatically, or do we have to delete it?

Views act in a similar way to macros: they auto-expand at execution time.
Storing a view processes no data: it just stores the associated query under a name you can use later.
When you use a view in a larger query, the underlying data is processed at query time: BigQuery had no materialized views when this was written.
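The macro-like behaviour can be tried out in any SQL engine. The sketch below uses Python's built-in sqlite3 module rather than BigQuery, and the table and view names are made up for illustration. It suggests that creating a view stores only the query, so every query against the view re-reads the base table, and that (in SQLite at least) the view stays defined until it is dropped explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 100), (2, 250)])

# Creating the view stores only the SELECT text; no rows are copied.
conn.execute(
    "CREATE VIEW big_events AS SELECT user_id, bytes FROM events WHERE bytes > 200"
)
rows_before = conn.execute("SELECT * FROM big_events").fetchall()

# A row inserted after the view was created is visible through it,
# because the view's query runs against the table on every use.
conn.execute("INSERT INTO events VALUES (3, 900)")
rows_after = conn.execute("SELECT * FROM big_events ORDER BY user_id").fetchall()

# The view remains in the catalog until it is dropped explicitly.
view_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
]
conn.execute("DROP VIEW big_events")
```

Because the view is only a stored query, the amount of data scanned is driven by the underlying table, which matches what the question observed.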

Database views are immaterial. If you are looking for a logical layer, you are almost always better off doing the modeling in a tool such as Looker or MicroStrategy, where you can document the view logic as well:
https://discourse.looker.com/t/how-to-reference-views-and-fields-in-lookml/179

Related

How do I handle a database with over 100 tables?

I have a database with over 100 tables; 30 of them are lookup tables with lookup language tables. Each table links back to one or three tables, but there are around 20 different web forms that need to interlink for a registered user.
My question is: do I create one connection string with one model, or do I break them up into individual models?
I've tried breaking them up into individual models based on the page they are required for, but this just throws validation and reference errors looking for the same field.
I don't have any errors to show at the moment, but I can provide them if necessary.
Sounds like you need to create some views so that you can consolidate the queries coming from the database. Try to think of logical groupings of the tables (lookup and otherwise) that you have and create a view for each logical grouping. Then, have your application query against those views to retrieve data.
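As a rough illustration of one such logical grouping, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns and view name are all hypothetical and not taken from the question; the idea is only that the application queries the view instead of repeating the lookup joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE country_lang (country_id INTEGER, lang TEXT, label TEXT);
CREATE TABLE registration (id INTEGER PRIMARY KEY, user_name TEXT, country_id INTEGER);

INSERT INTO country VALUES (1, 'DE'), (2, 'FR');
INSERT INTO country_lang VALUES
    (1, 'en', 'Germany'), (1, 'de', 'Deutschland'), (2, 'en', 'France');
INSERT INTO registration VALUES (10, 'alice', 1), (11, 'bob', 2);

-- One view per logical grouping: a registration plus its lookup labels.
CREATE VIEW registration_details AS
SELECT r.id, r.user_name, c.code AS country_code, l.lang, l.label AS country_label
FROM registration r
JOIN country c ON c.id = r.country_id
JOIN country_lang l ON l.country_id = c.id;
""")

# The application reads from the view and never repeats the joins itself.
rows = conn.execute(
    "SELECT user_name, country_label FROM registration_details "
    "WHERE lang = ? ORDER BY id",
    ("en",),
).fetchall()
```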
As for connection strings, I don't see why you would need more than one if all of the tables are in the same database.
If you have the possibility to create only one connection string, that is what you should do.
You create a second connection string only when you have no choice. Having many different connection strings is just going to add to the confusion you might already be in.
The number of tables in a database should never influence how many connection strings you have. I would even say that having access to all the information in your database through one single object is an advantage. Now, how you organise this impressive amount of information is crucial, and there are many ways to do it. You need to find the one that suits you.

Is Couchbase an ordered key-value store?

Are documents in Couchbase stored in key order? In other words, would they allow efficient queries for retrieving all documents with keys falling in a certain range? In particular, I need to know if this is true for Couchbase Lite.
Query efficiency is correlated with the construction of the views that are added to the server.
Couchbase/Couchbase Lite only stores the indexes specified and generated by the programmer in these views. As Couchbase rebalances it moves documents between nodes, so it seems impractical that key order could be guaranteed or consistent.
(Few databases/datastores guarantee document or row ordering on disk, as indexes provide this functionality more cheaply.)
Couchbase document retrieval is performed via map/reduce queries in views:
A view creates an index on the data according to the defined format and structure. The view consists of specific fields and information extracted from the objects in Couchbase. Views create indexes on your information that enables search and select operations on the data.
source: views intro
A view is created by iterating over every single document within the Couchbase bucket and outputting the specified information. The resulting index is stored for future use and updated with new data stored when the view is accessed. The process is incremental and therefore has a low ongoing impact on performance. Creating a new view on an existing large dataset may take a long time to build but updates to the data are quick.
source: Views Basics
and finally, the section on Translating SQL to map/reduce may be helpful:
In general, for each WHERE clause you need to include the corresponding field in the key of the generated view, and then use the key, keys or startkey / endkey combinations to indicate the data you want to select.
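To make the startkey / endkey idea concrete, here is a minimal sketch in plain Python. It does not use the Couchbase API; build_index and range_query are hypothetical helpers that only imitate how a view index sorted by key can answer a range query without scanning every document.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, field):
    """Emit (key, doc_id) rows for one field and keep them sorted by key."""
    return sorted(
        (doc[field], doc_id) for doc_id, doc in docs.items() if field in doc
    )


def range_query(index, startkey, endkey):
    """Return the doc ids whose key lies in [startkey, endkey], inclusive."""
    keys = [key for key, _ in index]
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]


docs = {
    "a": {"age": 31},
    "b": {"age": 17},
    "c": {"age": 45},
    "d": {"name": "no age field"},
}
age_index = build_index(docs, "age")
# Rough equivalent of: SELECT ... WHERE age BETWEEN 18 AND 40
matches = range_query(age_index, 18, 40)
```

The ordering lives in the index, not in how the documents themselves are stored, which is why the field from the WHERE clause has to be part of the emitted key.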
In conclusion, Couchbase views keep their indexes up to date so that queries stay fast. Couchbase Lite is queried in a similar way, although its mechanics differ slightly from the server's:
View indexes are updated on demand when queried. So after a document changes, the next query made to a view will cause that view's map function to be called on the doc's new contents, updating the view index. (But remember that you shouldn't write any code that makes assumptions about when map functions are called.)
How to improve your view indexing: The main thing you have control over is the performance of your map function, both how long it takes to run and how many objects it allocates. Try profiling your app while the view is indexing and see if a lot of time is spent in the map function; if so, optimize it. See if you can short-circuit the map function and give up early if the document isn't a type that will produce any rows. Also see if you could emit less data. (If you're emitting the entire document as a value, don't.)
from Couchbase Lite - View
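The advice about the map function can be sketched in plain Python. This is not the Couchbase Lite API: map_person, index_docs and the document fields are hypothetical, and the sketch only shows the two suggestions of returning early for irrelevant documents and emitting a small value instead of the whole document.

```python
def map_person(doc, emit):
    """Map function: index people by last name."""
    # Short-circuit: give up early if this document type produces no rows.
    if doc.get("type") != "person":
        return
    # Emit only the value the query needs, not the entire document.
    emit(doc["last_name"], doc["first_name"])


def index_docs(docs, map_fn):
    """Run the map function over every document and sort the emitted rows."""
    rows = []
    for doc_id, doc in docs.items():
        map_fn(doc, lambda key, value: rows.append((key, value, doc_id)))
    return sorted(rows)


docs = {
    "p1": {"type": "person", "first_name": "Ada", "last_name": "Lovelace"},
    "o1": {"type": "order", "total": 12},
    "p2": {"type": "person", "first_name": "Alan", "last_name": "Turing"},
}
rows = index_docs(docs, map_person)
```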

Table vs View vs Materialized View

I'm currently learning PostgreSQL. However, I am a little confused about tables, views, and materialized views. I understand the basic definitions and concepts, but I sometimes have trouble deciding whether I should create a table, a view, or a materialized view. Would anyone share some experience of how to apply them correctly? What are the pros and cons of each?
A table is where data is stored. You always start with tables first, and then your usage pattern dictates whether you need views or materialized views.
A view is like a stored query for future use, which helps if you're frequently joining or filtering the same tables the same way in multiple places.
A materialized view is like a combination of both: it's a table that is populated from a view's query and refreshed on demand (in PostgreSQL, with REFRESH MATERIALIZED VIEW). You'd use this if you were using views and want to pre-join or pre-aggregate the rows to speed up queries.
This article has a nice explanation on this part. Quoting from it,
When you query a TABLE, you fetch its data directly. On the other
hand, when you query a VIEW, you are basically querying another query
that is stored in the VIEW's definition.
...
Between the two there is MATERIALIZED VIEW - it's a VIEW that has a
query in its definition and uses this query to fetch the data directly
from the storage, but it also has its own storage that basically acts
as a cache in between the underlying TABLE(s) and the queries
operating on the MATERIALIZED VIEW. It can be refreshed, just like an
invalidated cache - a process that would cause its definition's query
to be executed again against the actual data.
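The cache analogy can be tried out with a small sketch. It uses Python's built-in sqlite3 module, which has no materialized views, so the materialized view is emulated with an ordinary table rebuilt by a hypothetical refresh_materialized helper. In PostgreSQL the same step would be a REFRESH MATERIALIZED VIEW statement. All table and view names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("eu", 10), ("eu", 5), ("us", 7)]
)

SUMMARY_QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Plain view: only the query is stored, so it is always up to date.
conn.execute(f"CREATE VIEW sales_by_region AS {SUMMARY_QUERY}")


def refresh_materialized(conn):
    """Emulate a materialized view refresh by rebuilding a cache table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region_mat")
    conn.execute(f"CREATE TABLE sales_by_region_mat AS {SUMMARY_QUERY}")


refresh_materialized(conn)
conn.execute("INSERT INTO sales VALUES ('us', 100)")

view_rows = conn.execute(
    "SELECT * FROM sales_by_region ORDER BY region"
).fetchall()
# The cached copy is stale until it is refreshed.
stale_rows = conn.execute(
    "SELECT * FROM sales_by_region_mat ORDER BY region"
).fetchall()
refresh_materialized(conn)
fresh_rows = conn.execute(
    "SELECT * FROM sales_by_region_mat ORDER BY region"
).fetchall()
```

The trade-off is the usual one for caches: reads from the stored copy avoid re-running the aggregate, at the cost of possibly stale results between refreshes.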

GAE Datastore: Normalization?

By normalization I don't mean the general relational database sense in this context.
I have received reports from a User. The data in these reports was generated at roughly the same time, so the timestamp is the same for all reports gathered in one request.
I'm still pretty new to the datastore. I know you can query on properties and that you have to grab the ancestor entity's key to traverse down, so I'm wondering which option is better in terms of performance and write/read/etc. cost.
Should I do:
Option 1:
User (Entity, ancestor of ReportBundle): general user information properties
ReportBundle (Entity, ancestor of Report): timestamp
Report (Entity): general data properties
Option 2:
User (Entity, ancestor of Report): insert general user information properties
Report (Entity): timestamp property AND general data properties
Go with option 2, because:
You save the time needed to read and write an additional entity.
You also save database operations (which in the end will save money).
As I see from your options, you need to check the timestamp property anyhow, so putting it inside the Report entity is fine.
Your code is also less complex and easier to maintain.
As Chris and the comments mention, using the datastore means thinking denormalized.
It's better to store the data twice than to run complex queries; the goal of your data design should be to get entities by ID.
Doing so will also reduce the number of indexes you may need. This is important to know.
The number of indexes is limited because of denormalization.
For each index you create, the datastore creates a new table behind the scenes that holds the data in the right order for that index. So when you use indexes, your data is already stored more than once. The good thing about this behavior is that writes are faster, because you can write to all the index tables in parallel. Reads are faster too, because you read data that is already in the right order for your index.
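The "one sorted table per index" description can be pictured with a toy model in plain Python. This is not the App Engine API: ToyStore is a hypothetical class that only imitates the behaviour described above, and it ignores overwrites and deletes for brevity.

```python
from bisect import insort


class ToyStore:
    """Toy datastore: one sorted index table per indexed property."""

    def __init__(self, indexed_properties):
        self.entities = {}
        self.indexes = {prop: [] for prop in indexed_properties}

    def put(self, entity_id, entity):
        # Assumes each id is written once; stale index rows are not removed.
        self.entities[entity_id] = entity
        for prop, rows in self.indexes.items():
            if prop in entity:
                # Every write also inserts a row into each matching index table.
                insort(rows, (entity[prop], entity_id))

    def query_ordered(self, prop):
        # Reads walk the index table, which is already in the right order.
        return [self.entities[eid] for _, eid in self.indexes[prop]]


store = ToyStore(["timestamp", "score"])
store.put("r1", {"timestamp": 30, "score": 2})
store.put("r2", {"timestamp": 10, "score": 9})
store.put("r3", {"timestamp": 20})

by_time = [e["timestamp"] for e in store.query_ordered("timestamp")]
index_rows = sum(len(rows) for rows in store.indexes.values())
```

Three entities end up as five index rows here, which is the extra storage per index that the answer refers to.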
Knowing this, and if only these 2 options are available, option 2 would be the better one.
We have lots of very denormalized models because of the inability to do JOINs.
You should also think about how you are going to process the data, in case you expect request timeouts.

Querying NHibernate

We are using NHibernate as our ORM for the project, and the application is read-only: it will not update, delete, or insert any records; it will only query the database for records.
My question is: which is the best method to query the database with NHibernate in the scenario explained above?
Are you sure you really need an ORM?
Anyway, there are 3 common options for querying a database with NHibernate:
1. HQL.
2. Criteria API.
3. LINQ.
The easiest is 3, the most powerful is 1.
But I don't really understand the nature of your question, as the query APIs in NHibernate are not mutually exclusive; rather, they complement each other.
So you can use any of them depending on the situation:
For dynamic queries, the Criteria API is best.
For complex queries that never change, HQL.
For quick and easy queries, LINQ.
Since it is read-only, you probably won't have much use for retrieving the query results as mapped objects. A result-set style return value might be more useful. For that, use session.CreateQuery and then query.List.
When the query selects individual columns, each element of the list will be an object array, and each array element corresponds to one selected column.

Resources