Adding data to and querying a partitioned BigQuery table - google-app-engine

In BigQuery I'm creating a partitioned table (partitioned by hour), and while data is going into it, the results don't appear to have a _PARTITIONTIME pseudocolumn; when I run
SELECT
_PARTITIONTIME AS pt,
*
FROM
[my_dataset.my_partitioned_table]
LIMIT
1000
I get all the regular columns of my table, but _PARTITIONTIME is null for every row. The data is being sent in by a call to the Go BigQuery API, the same way it was when I was sending data to an unpartitioned table, and it's queried from the BigQuery console. Is the problem more likely to be in how the data is being inserted or in how it is being queried?

Currently only DAY is supported as a partitioning type!
See more details in the timePartitioning property.
And see more about Partitioned Tables in general.
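Once the table is created with day partitioning, _PARTITIONTIME is populated per day and can be filtered on directly. Roughly, in the same legacy-SQL style as the question (the date literal here is just an arbitrary example):
SELECT
_PARTITIONTIME AS pt,
*
FROM
[my_dataset.my_partitioned_table]
WHERE
_PARTITIONTIME = TIMESTAMP('2016-12-01')
LIMIT
1000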

Related

Large table issue in Snowflake

I have large tables with about 2.2 GB of data. When I use SELECT * to select a row from the tables, it takes about 14 minutes to run. Is there a method to speed up this query?
Here is some other information that might be helpful:
~ 2 million rows
~ 25k columns
Data type: VARCHAR
Warehouse:
Size: Computer_WH
Clusters: min:1, max:2
Auto Suspension: 10 minutes
Owner: ACCOUNTADMIN
2 GB is not that large and really should not take 14 minutes on an X-SMALL warehouse.
First rule of Snowflake: don't SELECT * FROM x, for two reasons.
The query compiler has to wait for all metadata to be loaded for all partitions before the plan can start being built, as some partitions might have more data than the first partitions; thus the output shape cannot be planned until everything is known.
Second, when you "select all columns", all columns are loaded from disk, and if your data is semi-structured JSON it has to rebuild all of that data, which is relatively expensive. You should name the columns you want, and only the columns you want.
If you want to join to another table to do some filtering, select only the columns needed for the filter and the join, get the set of keys you want, and then re-join to the base table on those results (sometimes as a second query) so pruning can happen.
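Roughly, that pattern looks like this (table and column names are made up):
WITH wanted_keys AS (
    SELECT t.id
    FROM big_table t
    JOIN filter_table f ON f.id = t.id
    WHERE f.region = 'EMEA' -- whatever filter you actually need
)
SELECT b.id,
       b.col_a, -- name only the columns you need
       b.col_b
FROM big_table b
JOIN wanted_keys k ON k.id = b.id;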
Sigh, I have just looked at your stats a little harder: 25K columns... sigh. This is not a database, this is something very painful.
As a strong opinion, you cannot have a row of data for which it makes sense to have 25K related and meaningful columns. You should have a table with a primary key, and something like 25K rows of subtype data, one row per attribute. Yes, it means you have to explode the data back out via a PIVOT or the like, but it's more honest about the relations present in the data and how to process this volume of data.
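Roughly, turning the wide table into that tall key/attribute shape looks like this (names are made up, and all listed columns must share a data type, which they do here since everything is VARCHAR); a PIVOT goes the other way when a wide shape is genuinely needed:
SELECT id,
       attribute_name,
       attribute_value
FROM wide_table
UNPIVOT (attribute_value FOR attribute_name IN (col_001, col_002, col_003));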
With columnar databases, each column in a table has its own file. Previously (in older DBMSs) each table was a file. If you have 25,000 columns, you'd be selecting 25,000 files.
Some of these files are big and some are small; this depends on the data type and the number of distinct values.
If you found a column that, say, had 100 distinct values and selected just that column from your table, I'd guess you'd get sub-second response times.
So back to your problem: instead of choosing all the columns (*), why not just choose some interesting ones?

How to join query_id & METADATA$ROW_ID in Snowflake

I am working on tracking changes in data along with a few audit details, like the user who made the changes.
Streams in Snowflake give delta record details and a few audit columns, including METADATA$ROW_ID.
Another table, information_schema.query_history, contains query history details including query_id, user_name, DB name, schema name, etc.
I am looking for a way to join query_id and METADATA$ROW_ID so that I can find the user_name corresponding to each change in the data.
Any lead will be much appreciated.
Regards,
Neeraj
The METADATA$ROW_ID column in a stream uniquely identifies each row in the source table so that you can track its changes using the stream.
It isn't there to track who changed the data; rather, it is used to track how the data changed.
To my knowledge, Snowflake doesn't track who changed individual rows; this is something you would have to build into your application yourself, by having a column like updated_by for example.
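Roughly, that approach looks like this (object names are made up; CURRENT_USER() returns the Snowflake user executing the statement):
ALTER TABLE my_table ADD COLUMN updated_by STRING;
ALTER TABLE my_table ADD COLUMN updated_at TIMESTAMP_NTZ;

UPDATE my_table
SET amount = 100,
    updated_by = CURRENT_USER(), -- who made the change
    updated_at = CURRENT_TIMESTAMP() -- when it was made
WHERE id = 42;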
The only way I have found is to add
SELECT * FROM table(information_schema.QUERY_HISTORY_BY_SESSION()) ORDER BY start_time DESC LIMIT 1
during reports / table / row generation
Assuming that you have not changed the settings so that more queries can run at the same time in one session, that gets the running query's ID. Change it into a CTE and cross join it in the last part of the SELECT used for the insert, so the ID goes onto all rows (a sketch follows below).
This way you get all the variables from the query_history table. Also remember that Snowflake keeps SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY (and other data) for up to one year, so I recommend a weekly/monthly job which merges that data into a long-term history table. That way you can also handle access to the history data much more easily than by giving the ACCOUNTADMIN role to users.
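Roughly, the CTE + cross join version looks like this (the target and source table names are made up; query_id, user_name, and start_time are real columns of QUERY_HISTORY_BY_SESSION):
INSERT INTO audit_target (col_a, col_b, query_id, user_name)
WITH last_query AS (
    SELECT query_id, user_name
    FROM TABLE(information_schema.QUERY_HISTORY_BY_SESSION())
    ORDER BY start_time DESC
    LIMIT 1
)
SELECT s.col_a,
       s.col_b,
       q.query_id,
       q.user_name
FROM source_rows s
CROSS JOIN last_query q;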

More Than one Column under same column Heading like Colspan

Is it possible to get this output from a SELECT query?
I tried the query below:
select monthly + savings as monthly savings from table
but the resulting data is under one column.
Is there any solution to get more than one column under the same heading?
There is no way to retrieve the information from SQL Server with merged headers, at least not with the most widely used clients.
SQL Server is a relational database and its foundation is based on sets of data arranged in tables with columns, rows, and relationships between them. Suppressing a header would mean breaking the column-value link. If you want to manipulate the headers, you will have to do so after retrieving the data from the database, maybe in your display layer or in a helper process between the database and your presentation, as Tim suggested.
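For what it's worth, the query itself would just return the two values as two separate columns (the table name here is made up); the grouped "Monthly Savings" heading would then be rendered by the report or UI layer:
SELECT monthly AS [Monthly],
       savings AS [Savings]
FROM dbo.budget; -- made-up table name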

SSIS Move Data Between Databases - Maintain Referential Integrity

I need to move data between two databases and wanted to see if SSIS would be a good tool. I've pieced together the following solution, but it is much more complex than I was hoping it would be - any insight on a better approach to tackling this problem would be greatly appreciated!
So, what makes my situation unique: we have a large volume of data, so to keep the system performant we have split our customers across multiple database servers. These servers have databases with the same schema, but each is populated with unique data. Occasionally we need to move a customer's data from one server to another. Because of this, simply recreating the tables and moving the data in place won't work: the database on server A could have 20 records in a table while the same table in the database on server B has 30 records, so when moving record 20 from A to B it will need to be assigned ID 31. Getting past this wasn't difficult, but the trouble comes when moving the tables which have a foreign key reference to what is now record 31...
An example:
Here's a sample schema for a simple example:
There is a table to track manufacturers, and a table to track products, each of which references a manufacturer.
Example of data in the source database:
To handle moving this data while maintaining relational integrity, I've taken the approach of gathering the manufacturer records, looping through them, and for each manufacturer moving the associated products. Here's a high level look at the Control Flow in SSDT:
The first Data Flow grabs the records from the source database and pulls them into a Recordset Destination:
The OLE DB Source pulls all columns from the source database's manufacturer table and places them into a recordset:
Back in the control flow, I then loop through the records in the Manufacturer recordset:
For each record in the manufacturer recordset I then execute a SQL task which determines what the next available auto-incrementing ID will be in the destination database, inserts the record, and then returns the results of a SELECT MAX(ManufacturerID) in the Execute SQL Task result set so that the newly created Manufacturer ID can be used when inserting the related products into the destination database:
The above works; however, once you get more than a few layers deep in tables that reference one another, this is no longer very tenable. Is there a better way to do this?
You could always try this:
Populate your manufacturers table.
Get your products data (ensure you have a reference to the manufacturer, such as its name).
Use a lookup to get the ID where your name, or whatever you choose, matches.
Insert into the database.
This will keep your FK constraints and not require you to do all that max-key selection (see the sketch below).
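The same idea expressed as plain T-SQL rather than an SSIS Lookup, assuming both databases are reachable from one server (for example via a linked server); the Manufacturer/Product object names are made up to match the example:
-- 1. Copy the manufacturers without carrying their old IDs across
INSERT INTO DestDB.dbo.Manufacturer (Name)
SELECT sm.Name
FROM SourceDB.dbo.Manufacturer AS sm;

-- 2. Copy the products, looking up each newly assigned ManufacturerID by name
INSERT INTO DestDB.dbo.Product (ManufacturerID, ProductName)
SELECT dm.ManufacturerID, -- ID as assigned in the destination
       sp.ProductName
FROM SourceDB.dbo.Product AS sp
JOIN SourceDB.dbo.Manufacturer AS sm ON sm.ManufacturerID = sp.ManufacturerID
JOIN DestDB.dbo.Manufacturer AS dm ON dm.Name = sm.Name; -- the "lookup" on a natural key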

ORA-01858 while inserting data from another table with the same schema in the same database

I am an application programmer, but currently I have a situation in which I need to copy a huge amount of data, collected over one month (approximately 653 GB), from a table in one database to an exactly similar table in another database (both Oracle 11g). Each row is approximately 150 bytes, so there are roughly 4,000 million rows. I am not joking.
I have to do this. The table which holds this data (the source table) is partitioned on a date column, so there is a partition for each day of the month and hence 31 partitions in total for December.
The target DB is partitioned by month, so there is a single partition in the target DB for the complete month of December.
I have chosen to copy the data over a DB link, and with the help of the DBAs I created a DB link between these two databases.
I have a stored procedure in the target DB which accepts (date, tablename) as input parameters. This procedure creates a temporary table in the target DB with the given tablename and copies all the data from the source DB for the given date into that temporary table. I have done this successfully for 2-3 days' worth of data. Now I want to insert the data from the temporary table into the actual table in the same target database. For that I executed the following query:
insert into schemaname.target_table select * from schemaname.temp_table;
But I am getting the following ORA error.
ORA-01858: a non-numeric character was found where a numeric was expected
Both tables have exactly the same table definition. I searched the internet for ways of copying data and found the above insert to be the simplest, but I don't understand the error. Searching for this error shows it has something to do with a date column. But shouldn't it work, since both tables have the same table structure?
Data types used in the table are varchar2(x), date, number(x,y), char(x).
Please help me to get over this error. Let me know if any other information is required.
It means your schemaname.temp_table table has some non-numeric value which cannot be inserted into your new table. Was schemaname.temp_table populated with a script or an automated tool? There is a possibility that an empty space or a junk character was inserted into schemaname.temp_table. Kindly check once again using any SQL tool.
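One way to run that check from a SQL tool (column names below are made up). Since SELECT * matches columns by position, a differing column order between the two "identical" tables is also worth ruling out, because it can raise ORA-01858 as well:
SELECT table_name, column_id, column_name, data_type
FROM all_tab_columns
WHERE owner = 'SCHEMANAME'
AND table_name IN ('TARGET_TABLE', 'TEMP_TABLE')
ORDER BY table_name, column_id;

-- If the positions differ, list the columns explicitly instead of SELECT *:
INSERT INTO schemaname.target_table (id_col, record_date, amount, code)
SELECT id_col, record_date, amount, code
FROM schemaname.temp_table;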
