Large Postgres data size to handle with Google Data Studio - sql-server

I have remote access to a distant Postgres server hosting a database whose size is almost 50 GB.
Is there any way to work on it with Google Data Studio without downloading the whole 50 GB?

When you use the native PostgreSQL connector in Data Studio, it does not pull all the data from the database server. Rather, it constructs separate aggregate queries for each chart element and pushes those queries down to the database server.
For example, let's assume you have sales data by product and by day in your database. If you draw a chart that shows only the current month's total sales by product, Data Studio will create a SQL query with a WHERE clause that limits dates to the current month and a GROUP BY clause on product. Data Studio therefore only gets back the aggregated results. This is transparent to both the dashboard owner and the viewer.
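As a rough illustration, a query along the lines of the one below is the kind of thing the connector would push down for that chart (the sales table and column names here are assumptions, not something Data Studio exposes):

-- Hypothetical schema: sales(product, sale_date, amount).
-- A "current month's total sales by product" chart translates to an
-- aggregate query like this, so only one row per product travels back
-- to Data Studio:
SELECT
    product,
    SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= date_trunc('month', CURRENT_DATE)
  AND sale_date <  date_trunc('month', CURRENT_DATE) + INTERVAL '1 month'
GROUP BY product;

Only the aggregated rows cross the network, regardless of how many raw rows the table holds.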
Keep this in mind while designing your dashboard and make sure to put in filters and aggregations to limit the amount of data you are pulling from the database.

Related

How to push data from an on-premises database to Tableau CRM

We have an on-premises Oracle database installed on a server. We have to create some charts/dashboards with Tableau CRM from that on-premises data. Note that Tableau CRM is not Tableau Online; it is a Tableau version for the Salesforce ecosystem.
Tableau CRM has APIs, so we can push data to it or upload CSVs to it programmatically.
So, what can be done is:
Run a Node.js app on the on-premises server, pull data from the Oracle DB, and push it to Tableau CRM via the TCRM API.
Run a Node.js app on the on-premises server, pull data from the Oracle DB, create a CSV, and push the CSV via the TCRM API.
I have tested the second option and it works fine.
But, as you all know, it is not efficient, because I have to run a cron job and schedule the process multiple times a day, and I have to query the full table every time.
I am looking for a better approach. Any other tools/technology you know to have a smooth sync process?
Thanks
The second method you described in the question is a good solution. However, you can optimize it a bit.
I have to query the full table all the time.
This can be avoided. If you take a look at the documentation for the InsightsExternalData sObject, you can see that it has a field named Operation which takes one of these values: Append, Delete, Overwrite, Upsert.
What you will have to do is, when you push data to Tableau CRM, use the Append operation and push only the records that don't yet exist in TCRM. That way you only query the delta records from your database. This reduces the size of the CSV you have to push, and since the size is smaller it takes less time to upload into TCRM.
However, to implement this solution you need two things on the database side.
A unique identifier that uniquely identifies every record in the database
A DateTime field
Once you have these two, you have to write a query that sorts all the records in ascending order of the DateTime field and takes only the records that come after the last one you pushed into TCRM. That way your result set contains only the delta records that you don't yet have in TCRM. After that you can use the same pipeline you built to push the data.
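As a minimal sketch, assuming the Oracle table is called SALES_FACT and has an ID primary key plus a LAST_MODIFIED timestamp column (all names here are placeholders), the delta extraction could look like:

-- :last_pushed_ts is a bind variable holding the LAST_MODIFIED value of
-- the most recent record already uploaded to Tableau CRM.
SELECT *
FROM   sales_fact
WHERE  last_modified > :last_pushed_ts
ORDER  BY last_modified ASC;
-- Export this result set to CSV and upload it through the
-- InsightsExternalData API with Operation = 'Append'.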

SQL Server table daily sync of records from table A to table B

I want to create a daily process where I reload all rows from table A into table B. Over time, table A's rows will change due to changes in the source system and also because of aging/deletion of records in the origin table. Table A gets truncated/reloaded daily in step 1. Table B is the master table that just gets new/updated rows.
From a historical point of view, I want to keep track of ALL the rows in table B and be able to do a point in time comparison for analytics purposes.
So I need to do two things: daily, insert rows from table A into table B if they don't exist, and also create a new record in table B if the record already exists but ANY of the columns have changed. At one point I attempted to use temporal tables, but I had too many false positives on 'real' changes; certain columns were throwing things off because a date/time column was updated (the only real change in the row).
I'm using an Azure SQL Managed Instance database (Microsoft SQL Azure (RTM) - 12.0.2000.8).
At my disposal I have SSMS, SQL Server and also Azure Data Factory.
Any suggestions on the best way to do this or tools to help with this?
There are two approaches, either of which you can implement:
Temporal tables
Change Data Capture (CDC)
CDC is the more commonly used approach: you can create an Azure Data Factory pipeline that loads delta data, based on change data capture (CDC) information in the source Azure SQL Managed Instance database, into Azure Blob storage.
To implement CDC, you can follow this Microsoft tutorial: Incrementally load data from Azure SQL Managed Instance to Azure Storage using change data capture (CDC)
Note: You also need to create a storage account, which is required but not covered in the tutorial above.
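For reference, enabling CDC on the source database and table comes down to two system procedure calls; the dbo.TableA name below is a placeholder:

-- Enable CDC at the database level (run in the source Azure SQL MI database).
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC on the table to track; dbo.TableA is a placeholder name.
-- @supports_net_changes = 1 requires a primary key or unique index.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'TableA',
    @role_name     = NULL,
    @supports_net_changes = 1;
GO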

How can I query the data stored in Azure Table Storage using T-SQL

We will be writing several hundred thousand rows of data to an Azure Table Storage container. The data is made up of 4 columns, one of which contains a lot of JSON text; that is the main column I'm interested in.
How can I query this data using T-SQL? I was hoping to join this with some existing data we currently hold in a table on SQL Server too.
I am new to Azure Storage and am trying to work out if I have to query the data directly or can I get it to my SQL Server to perform some more detailed querying? It is being stored on Azure to start with due to ease and cost.
Azure Table storage does not support SQL: https://db-engines.com/en/system/Microsoft+Azure+Table+Storage%3BMicrosoft+SQL+Server
If you store your data in Blob storage, you might be able to use PolyBase to query it from your SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15
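If you do export the table data to Blob storage (for example as CSV files), a PolyBase setup on SQL Server could look roughly like the sketch below; the storage account, container, credential, and column definitions are all assumptions:

-- Assumes PolyBase is installed and a database scoped credential named
-- BlobCredential already exists holding the storage account key.
CREATE EXTERNAL DATA SOURCE BlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- The four columns, including the JSON payload, are placeholders.
CREATE EXTERNAL TABLE dbo.ExternalRows (
    PartitionKey NVARCHAR(100),
    RowKey       NVARCHAR(100),
    CreatedUtc   DATETIME2,
    JsonPayload  NVARCHAR(4000)
)
WITH (
    DATA_SOURCE = BlobStore,
    FILE_FORMAT = CsvFormat,
    LOCATION = '/exported-rows/'
);

-- The external data can then be joined with local tables and the JSON
-- parsed with JSON_VALUE / OPENJSON:
SELECT e.RowKey, JSON_VALUE(e.JsonPayload, '$.status') AS Status
FROM dbo.ExternalRows AS e
JOIN dbo.LocalTable AS l ON l.RowKey = e.RowKey;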

Automatically or easily updating my database

I have available to me a report that is generated in Microsoft SharePoint, and it holds the quantities for certain items. The report can be exported as an Excel document, but if possible I would like to avoid that.
In my Access database I have all the same items but with additional data concerning special requests and item identification in the item's respective documentation folders.
I am looking for a way to have the select few columns that represent the quantities and some other factors, to be automatically updated in my database.
How can I go about this? Is there a specific terminology for what I am attempting to do? I have been unable to find it on Google.
So to clarify ... you have item data exported from SharePoint and item data in Access and ideally you'd like to merge both and store the results in Access.
Or, to put it another way, you would like to complement the data in Access with the data from SharePoint.
If the database that powered the SharePoint report ran in Access as well, the word you are looking for is replication. You want to automatically replicate the data from one server/database to another.
Unfortunately I don't know of any software that replicates data to Access.
Your best bet would be to write a program that scheduled the running of the SharePoint report and then imported that data into Access.
I'm happy to give you the terminology of what to Google for. Just don't make me use SharePoint and Access. :)
If you have the same items in a report in SharePoint and in Access, hopefully there is a field that uniquely identifies each item and is used in each table (a unique key). If these items (typically we would say 'records' or 'tuples' in database circles) are inventory, then SKUs or product numbers would be examples of potential unique keys. If you are taking the information in two tables and merging it together using a unique key, we call it a 'natural join'. I know Access and SharePoint both support SQL, and using SQL this would be done with a SELECT statement.
I would try googling: natural join tables in SharePoint and Access
Or: SQL SELECT between SharePoint and Access
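As a minimal sketch of that kind of SELECT, assuming both datasets end up as tables in Access and share an ItemID key (all names here are placeholders):

-- ItemsLocal holds the Access data; ItemsSharePoint is the imported or
-- linked SharePoint report data. ItemID is the assumed unique key.
SELECT l.ItemID,
       l.SpecialRequests,
       l.DocumentationFolder,
       s.Quantity
FROM ItemsLocal AS l
INNER JOIN ItemsSharePoint AS s
        ON l.ItemID = s.ItemID;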
Hope this helps.
If you choose tables linked to SharePoint (as opposed to importing them locally), then you will always have a live copy of the data; in fact this is the replication model in Access 2010. A query could then be used that joins in the additional table columns with quantity etc. Replication needs caution, though, since any changes to the local Access table would go back up to SharePoint, and that may not be desired or even allowed.
In this case I would thus simply import the SharePoint tables locally and again use a join, based on a PK, to the local tables with quantity etc. Note that the local copy + cache runs very fast in Access 2010; prior to Access 2010 + SharePoint 2010 the speed of such a setup is not so good by comparison.
If you are using an older version of Access + SharePoint, then I would suggest you continue your approach of importing the SharePoint tables (as opposed to linking to the live tables on SharePoint). You then again simply use a query that joins in the additional columns you wish to display in your reports.
Such a results query would not only be useful for reports; you could also export it to Excel or Word.
Best regards.

SQL Server need to partition data, but only have standard edition

Is there a way that I can, in code (sproc, etc.), distribute the data for a table into multiple filegroups without actually having SQL Server partitioning available (I only have Standard Edition)? I wanted to be able to break out my FILESTREAM data into different "partitions", but without an Enterprise license I can't actually use the partitioning functionality.
Any suggestions would be greatly appreciated.
Thanks,
S
You can distribute your data into different databases and join them with views. The tricky part of that will be to keep the views updated as you add/remove data.
You need to do this "partition" on a logical key (like a calendar date) where each DB has data within a certain range. If you cluster on this field, the query analyzer will be able to determine which DB to pull data from without issue.
At my workplace we are using this technique for a very large (multi-billion row) data set that we get monthly additions to and it works great.
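As a rough sketch of that layout, assuming monthly member databases and a SaleDate partitioning key (all names and date ranges are placeholders):

-- Run in each member database (e.g. SalesDB_2023_01), adjusting the CHECK range:
CREATE TABLE dbo.Sales (
    SaleID   BIGINT NOT NULL,
    SaleDate DATE   NOT NULL
        CONSTRAINT CK_Sales_202301
        CHECK (SaleDate >= '2023-01-01' AND SaleDate < '2023-02-01'),
    Amount   DECIMAL(18, 2) NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY (SaleDate, SaleID)
);

-- In the database the application queries, a view unions the member tables:
CREATE VIEW dbo.SalesAll
AS
SELECT SaleID, SaleDate, Amount FROM SalesDB_2023_01.dbo.Sales
UNION ALL
SELECT SaleID, SaleDate, Amount FROM SalesDB_2023_02.dbo.Sales
UNION ALL
SELECT SaleID, SaleDate, Amount FROM SalesDB_2023_03.dbo.Sales;

-- Because of the CHECK constraints, a date-filtered query only has to
-- touch the member tables whose range matches:
SELECT SUM(Amount)
FROM dbo.SalesAll
WHERE SaleDate >= '2023-02-01' AND SaleDate < '2023-03-01';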
