BigQuery is a database that can store values from a spreadsheet.
However, calculation is normally done in the spreadsheet cell that contains the formula.
So, when a spreadsheet formula calculates over values stored in the BigQuery database, does the calculation run in BigQuery (using BigQuery's processor) or in the spreadsheet's processor?
Link to Spreadsheet : https://docs.google.com/spreadsheets/d/1U_7ZvpbfUNqMoyvLrKd0G5wXS_aOYImQtu_iZuhXVQs/edit#gid=0
In Sheet2!B2 there is a formula, =SUM(partial!num). Is it calculated with the BigQuery processor or the Spreadsheet processor?
I need to optimize a database for high-volume inserts/reads.
I have a Postgres table of raw data, around 9 million rows.
Each row is composed of 2 columns: timestamp and string value; the data is sorted ascending by time.
The only query I run against this data is:
select * where unixtimestamp between 1600000 and 1700000
After getting the data from the raw table, I apply some function to the result and cache the processed data into another table for faster future queries.
I tried Mongo with a time-series collection; it was better, yet it still took ~25 sec to fetch, and inserts take even longer.
Postgres had the best fetch time, but trying to insert 500k rows at once would take forever.
So far the best way I have found is to parse the CSV files as strings and use binary search to select rows (sketched below).
I think my indexes are not right and that's the main problem.
So what are your suggestions for maintaining time-series data where the only operation I need is a range fetch?
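Here is roughly what that binary search over the sorted CSV looks like (a minimal sketch only; the file name and exact column layout are placeholders, not my real setup):

    import csv
    from bisect import bisect_left, bisect_right

    # Load (timestamp, value) rows from a CSV that is already sorted ascending by timestamp.
    def load_rows(path):
        with open(path, newline="") as f:
            return [(int(ts), value) for ts, value in csv.reader(f)]

    # Return every row whose timestamp falls in [start_ts, end_ts].
    # bisect locates the two boundaries in O(log n) against the sorted timestamp list.
    def range_fetch(rows, timestamps, start_ts, end_ts):
        lo = bisect_left(timestamps, start_ts)
        hi = bisect_right(timestamps, end_ts)
        return rows[lo:hi]

    # Hypothetical usage, mirroring the query above (file name is made up):
    # rows = load_rows("sensor_raw.csv")
    # timestamps = [ts for ts, _ in rows]   # built once, reused for every range query
    # chunk = range_fetch(rows, timestamps, 1600000, 1700000)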
EDIT 1 (to clarify the question):
I have raw data of 9 million rows.
A user can request data from the API using one of 20+ aggregation formulas,
for example: api/get?formula=1&from=2019&to=2021
So I check whether I already have cached data for formula=1 within this date range.
If not, I need to load the raw data for this date range, apply the formula the user requested (the result is usually ~1k rows for every 1 million rows of raw data), and then cache the aggregated data for subsequent requests.
I cannot pre-process the data for all formulas in the system, because I have over 2000 sensors (each with its own 9 million raw rows); the result would be more than 2 TB of space.
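To make that flow concrete, this is roughly the cache-or-compute logic I have in mind (a sketch only; load_raw() and aggregate() are hypothetical stand-ins for the real storage layer and the 20+ formulas, and a real cache would live in a table, not a dict):

    cache = {}  # (formula_id, from_ts, to_ts) -> aggregated rows

    def load_raw(from_ts, to_ts):
        # Placeholder: would run the range query against the raw 9M-row store.
        return [(ts, 0.0) for ts in range(from_ts, to_ts)]

    def aggregate(formula_id, raw_rows):
        # Placeholder: one of the 20+ formulas; yields roughly 1k rows per 1M raw rows.
        return [(formula_id, len(raw_rows))]

    def get_aggregated(formula_id, from_ts, to_ts):
        key = (formula_id, from_ts, to_ts)
        if key not in cache:                 # cache miss: compute once, then store
            raw = load_raw(from_ts, to_ts)
            cache[key] = aggregate(formula_id, raw)
        return cache[key]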
I have a SQL table with close to 2 million rows and I am trying to export this data into an Excel file so the stakeholders can manipulate data, see charts, and so on...
The issue is, when I hit refresh, it fails after getting all the data, saying the number of rows exceeds the max row limitation in Excel. This table is going to keep growing every day.
What I am looking for here is a way to refresh data, then add rows to Sheet 1 until max rows limitation is reached. Once maxed out, I want the rows to start getting inserted into Sheet 2. Once maxed out, move to 3rd sheet, all from the single SQL table, from a single refresh.
This does not have to happen in Excel (the Data -> Refresh option); I can have this as part of the SSIS package that I am already using to populate rows in the SQL table.
I am also open to any alternate ways to export SQL table into a different format that can be used by said stakeholders to create charts, analyze data, and whatever else pleases them.
Without sounding too facetious, you are suggesting a very inefficient method.
The best way of approaching this is not to use .xlsx files at all for the data storage.
Assuming your destination stakeholders don't have read access to the SQL Server, export the data to .csv and then use Power Query in some sort of 'Dashboard.xlsx' type file to load the .csv into the data model, which can handle hundreds of millions of rows instead of just 1.05M.
This will allow for the use of Power Pivot and DAX for analysis, and the data will also be visible in the data model table view if users do want raw rows (or they can refer to the csv file...).
If they do have SQL read access, then you can query the server directly, so you don't need to store any rows whatsoever; the workbook will read directly from the server.
Failing all that and you decide to do it your way, I would suggest the following.
Read your table into a Pandas DataFrame and iterate over each row and cell of the DataFrame, writing to your xlsx [Sheet1] using openpyxl; then once the row number reaches 1,048,560, simply move on to xlsx [Sheet2].
In short: openpyxl allows you to create workbooks, worksheets, and write to cells directly.
But depending on how many columns you have it could take incredibly long.
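A rough sketch of that loop, assuming pandas and openpyxl are available (the table/connection names in the usage comment are placeholders, and the row limit below leaves one row per sheet for a header):

    import numpy as np
    import pandas as pd
    from openpyxl import Workbook

    MAX_DATA_ROWS = 1_048_575  # Excel's 1,048,576-row limit, minus one row for a header

    def dataframe_to_multisheet_xlsx(df: pd.DataFrame, path: str) -> None:
        # Write df to an .xlsx file, spilling onto Sheet2, Sheet3, ... when a sheet fills up.
        wb = Workbook()
        ws = wb.active
        ws.title = "Sheet1"
        ws.append(list(df.columns))                  # header row on the first sheet
        sheet_index, rows_in_sheet = 1, 0
        for row in df.itertuples(index=False):
            if rows_in_sheet >= MAX_DATA_ROWS:       # current sheet is full: start the next one
                sheet_index += 1
                ws = wb.create_sheet(title=f"Sheet{sheet_index}")
                ws.append(list(df.columns))          # repeat the header on each new sheet
                rows_in_sheet = 0
            # Convert numpy scalars to native Python types, which openpyxl expects.
            ws.append([v.item() if isinstance(v, np.generic) else v for v in row])
            rows_in_sheet += 1
        wb.save(path)

    # Hypothetical usage: pull the table with pandas first, e.g.
    # df = pd.read_sql("SELECT * FROM dbo.BigTable", some_sqlalchemy_engine)  # table name is made up
    # dataframe_to_multisheet_xlsx(df, "export.xlsx")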
Product limitations: Excel 2007+ supports 1,048,576 rows by 16,384 columns per worksheet.
A challenge with your suggestion of filling a worksheet with the max number of rows and then splitting is "How are they going to work with that data?" and "Did you split data that should have been together to make an informed choice?"
If Excel is the tool the users want to use and they must have access to all the data, then you're going to need to put the data into a Power Pivot data model (and yes, that's going to impact the availability of some data visualizations). A Power Pivot model is an in-memory tabular data set. What that means is that the data engine, xVelocity, is going to use a lot of memory but can get past the 1 million row limitation. Depending on how much memory is required, you might need to switch from the default 32-bit Office install to a 64-bit install (and I've seen clients have to max out RAM on old, low-end desktops because they went cheap for business users).
Power Pivot will have a connection to your SQL Server (or other provider). When it refreshes data, it's going to fire off queries, determine the unique values in each column, and then create a dictionary of unique values. This allows it to compress low-cardinality data really well - sales dates are likely going to be repeated heavily within your set, so the compression is good. Assuming your customers are typically non-repeat customers, a customer surrogate key would have high cardinality and thus not compress well, since there is little to no repetition. The refresh is going to be dependent on your use case and environment. Maybe the user has to manually kick it off; maybe you have SharePoint with Excel Services installed, and then you can have it refresh the data on various intervals.
If they're good analysts, you might try turning them on to Power BI. Same-ish engine behind the scenes, but built from the ground up to be a responsive reporting tool. If they're just wading through tables of data, they're not ready for PBI. If they are making visuals out of the data, PBI is likely a better fit.
I have an as-is spreadsheet data source with merged cells in some columns, and for the weekly data the columns are added incrementally, e.g. after 2019W12 the next column (column R) will be populated.
As-Is Spreadsheet Data Source
I need to parse the spreadsheet content and load it into a SQL Server table using SSIS; the proposed format is:
Proposed Spreadsheet Data Transformation
I've tried some alternatives, such as applying the transformation in SSIS, but the column increment causes an exception in the next week's load job. I also tried to parse and split the spreadsheet data with Python (xlrd), but without success in transposing and associating the data from columns F to 'N' with columns A to E. Has anybody faced this type of problem when ingesting spreadsheet data into SQL Server using SSIS, or is there another logical way to transform the data before ingestion?
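Something along these lines is what I'm after, sketched here with pandas rather than raw xlrd (the file name, sheet name, and the assumption that columns A to E are the first five columns are placeholders; I could not get the equivalent working with xlrd):

    import pandas as pd

    # Read the as-is sheet; merged cells come through as NaN except in their first row.
    df = pd.read_excel("source.xlsx", sheet_name="Sheet1")   # file/sheet names are placeholders

    id_cols = list(df.columns[:5])     # columns A to E: the descriptive (merged) columns
    week_cols = list(df.columns[5:])   # columns F onward: one column per week (2019W12, ...)

    # "Unmerge": forward-fill the values Excel only stores in the first row of a merged range.
    df[id_cols] = df[id_cols].ffill()

    # Transpose the weekly columns into rows, keeping the A-to-E attributes on every row.
    long_df = df.melt(id_vars=id_cols, value_vars=week_cols,
                      var_name="Week", value_name="Value")

    # long_df can then be staged for SSIS (e.g. long_df.to_csv("staged.csv", index=False))
    # or loaded directly into SQL Server with to_sql() through a SQLAlchemy engine.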
Splitting the question into subtasks
Based on your question there are three main functionalities that you are looking to achieve:
Finding an efficient way to manipulate Excel files
Unmerge Cells and fill duplicate values
Transpose Rows into Columns
Possible solution
In order to perform a complex transformation, you have to use one of the following approaches, because they provide all the functionality that can be done in Microsoft Excel:
.Net Microsoft.Office.Interop.Excel library (C# or VB.NET)
Excel VBA
The solution you are looking for is complex and very specific to the issue; you have to implement the logic on your own. I will provide some links that can help you achieve that:
Helpful Links
Unmerge Cells and fill duplicate values
How To Unmerge Cells And Fill With Duplicate Values In Excel? (This link describes the process manually and using VBA)
Unmerging excel rows, and duplicate data
How to unmerge and fill cells in an excel file while loading into datatable
Unmerge and fill values - if the cells are merged to the same column
Manipulating Excel files using C#
C# Excel Interop: Microsoft.Office.Interop.Excel
C# Excel Tutorial
How to automate Microsoft Excel from Microsoft Visual C#.NET
Working with Excel Using C#
Read Excel File in C#
Transpose Excel Rows
C# - Transpose excel table using Interop
C# Transpose() method to transpose rows and columns in excel sheet
Transpose values in excel using C#
I'm stuck on this one...
...we're using linear regression for some trending and forecasting, and I'm having to query data, create a dataset, then paste it into Excel and apply a LINEST function to my data. Since the data requirements change daily, this has become a very cumbersome thing to whip together. I want SQL Server to take care of that processing, as this will be an automated forecast that I do not want to touch after I hand it over to an end user. When they refresh the data, I want it to refresh the LINEST function.
Here's some sample data
The [JanTrend] field is a logarithmic trend in Excel, calculated from the Jan-12, Jan-13, and Jan-14 fields.
Here's that function in Excel
=LINEST([Jan-12]:[Jan-14]^{1})
The Forecasted field is basically [Jan-14] + [JanTrend].
StockCode Jan-12 Jan-13 Jan-14 JanTrend Forecasted
300168 2 3 11 5 16
300169 1 4 3 1 4
The JanTrend field is where my LINEST function is located in my Excel spreadsheet.
I want to convert the above function to T-SQL or use it in an SSRS report. How can I achieve this?
EDIT: I'm trying to calculate a logarithmic trend. I made some changes to my sample data to make things clearer.
The LINEST Excel function is just linear regression. It's (still) not available in SQL Server, but you will find a lot of examples of UDFs or queries implementing it. Just Google "sqlserver udf linear regression" or refer to this previous question:
Are there any Linear Regression Function in SQL Server?
UDFs are generally slow, so you might want to go with the solution in the third post of this forum thread: http://www.sqlservercentral.com/Forums/Topic710626-338-1.aspx
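As a sanity check against the sample data, the ordinary least-squares slope (which is what such a UDF or query ultimately computes) reproduces the JanTrend column. Sketched here in Python purely to show the arithmetic, not as the T-SQL solution:

    # Ordinary least-squares slope for a single series, i.e. what LINEST returns here.
    # x = 1, 2, 3 stands in for the Jan-12, Jan-13, Jan-14 columns.
    def linest_slope(ys):
        n = len(ys)
        xs = range(1, n + 1)
        sum_x = sum(xs)
        sum_y = sum(ys)
        sum_xy = sum(x * y for x, y in zip(xs, ys))
        sum_x2 = sum(x * x for x in xs)
        return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

    print(linest_slope([2, 3, 11]))  # 4.5, which rounds to the 5 shown for StockCode 300168
    print(linest_slope([1, 4, 3]))   # 1.0, matching the 1 shown for StockCode 300169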
I've been working with SSIS for a few months now. I'm trying to implement a data flow to replace a sequence of SQL Tasks used to do some data transformation.
Data flow description :
Source:
Each row gives information about energy consumed (X) during a number of days (Y).
Destination:
Energy consumed in Day 1 (X/Y), Day 2 (X/Y), Day 3 (X/Y), ...
Any ideas about how to implement such logic in a single data flow?
Thanks a lot.
Yacine.
If I understand what you're doing correctly, you would like one data flow task that will run the necessary calculations before storing the data - meaning that, as just an example, we may have energy amount data and date data, and we'd like to store the energy amount divided by the day number of the year.
One approach to this would be a Derived Column between the source and destination, where we could apply mathematical functions to our existing data. I've done this with weather data before, where an additional column was added which calculated a value to be stored along with the specific day, temperature and forecast based on the latter two.
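Just to pin down the logic the Derived Column has to reproduce, here is the same transformation sketched outside of SSIS in pandas (the column names and values are invented for illustration):

    import pandas as pd

    # Toy rows in the shape of the source: total energy consumed (X) over a number of days (Y).
    source = pd.DataFrame({
        "meter_id": ["A", "B"],          # invented identifier column
        "energy_consumed": [90.0, 30.0], # X
        "num_days": [3, 2],              # Y
    })

    # The Derived Column equivalent: a new column holding X / Y.
    source["energy_per_day"] = source["energy_consumed"] / source["num_days"]

    # If the destination needs one value per day (Day 1 .. Day Y, each X/Y),
    # expand to one row per day and then pivot those rows into Day columns.
    expanded = source.loc[source.index.repeat(source["num_days"])].copy()
    expanded["day_number"] = expanded.groupby("meter_id").cumcount() + 1
    wide = expanded.pivot(index="meter_id", columns="day_number", values="energy_per_day")
    print(wide)   # columns 1, 2, 3, ... each holding X/Y for that meter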
Another possible approach is an OLE DB Command, but note (according to SSIS):
Runs [a] SQL statement for each row in a data flow. For example, call a 'new employee setup' stored procedure for each row in the 'new employees' table. Note: running [a] SQL statement for each row of a large data flow may take a long time.