DB space usage outages

Picked up that one of our databases is using a lot of space.
What is the best script or stored procedure that returns detailed information about which table is using the most space?
Also, is there a log, perhaps in the system logs, that shows which application added the data?

Please see this question, which has been asked previously: postgresql list and order tables by size.
TL;DR try the below:
select table_schema,
       table_name,
       pg_relation_size('"'||table_schema||'"."'||table_name||'"') as size_bytes
from information_schema.tables
where table_type = 'BASE TABLE'  -- skip views, which have no storage of their own
order by 3 desc
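If you also want indexes and TOAST data counted, with human-readable output, a common variant uses pg_total_relation_size and pg_size_pretty (both standard PostgreSQL functions):
select table_schema,
       table_name,
       pg_size_pretty(pg_total_relation_size('"'||table_schema||'"."'||table_name||'"')) as total_size
from information_schema.tables
where table_type = 'BASE TABLE'
order by pg_total_relation_size('"'||table_schema||'"."'||table_name||'"') desc;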

Related

Snowflake ACCOUNT_USAGE Views - TABLES (ROW_COUNT, BYTES) are not updated within LATENCY period

I discovered while gathering metadata from the SNOWFLAKE "ACCOUNT_USAGE" views that the values in TABLES (ROW_COUNT, BYTES) are not being updated. I expected some delay within the documented LATENCY period, but for most objects I get "0". I have to clearly state that those tables are not new; they have been there for months.
When comparing with INFORMATION_SCHEMA.TABLES I see up-to-date figures.
Another ACCOUNT_USAGE view, TABLE_STORAGE_METRICS, does at least show size information, and it is calculated there.
This is my account_usage select:
use schema snowflake.account_usage;
SELECT table_id, table_catalog, table_schema, table_name, row_count, bytes
FROM tables
WHERE table_type = 'BASE TABLE'
  AND deleted IS NULL
  AND table_owner IS NOT NULL
and the information_schema select is:
SELECT table_catalog, table_schema, table_name, row_count, bytes
FROM CS_DB.INFORMATION_SCHEMA.TABLES
WHERE table_type = 'BASE TABLE'
#snowflake Could you please help?
I believe it's an error, and it would be beneficial for everyone if Snowflake fixed the problem. Or would someone be able to explain why there is a discrepancy?
I don't get 0 for all tables in ACCOUNT_USAGE, but I do for most of them.
Best regards,
Petr
I'd strongly recommend you document this like you have here and open a Snowflake Support Request.
Here is the link that explains how to do this:
https://community.snowflake.com/s/article/How-To-Submit-a-Support-Case-in-Snowflake-Lodge
I hope this helps...Rich
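In the meantime, since TABLE_STORAGE_METRICS is being populated, a query like the following sketch can stand in for the stale BYTES figures (ACTIVE_BYTES and DELETED are documented columns of that view; the exact filters you need are an assumption):
USE SCHEMA snowflake.account_usage;

SELECT table_catalog, table_schema, table_name, active_bytes
FROM table_storage_metrics
WHERE deleted = FALSE   -- skip dropped tables
ORDER BY active_bytes DESC;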

How can I find out if someone modified a row in SQL Server on a specific date?

I am just wondering: can I find out if somebody wrote a query and updated a row in a specific table on some date?
I tried this:
-- look up the object (the table name was left blank in the original)
SELECT id, name
FROM sys.sysobjects
WHERE name = ''

-- then search the transaction log for lock records referencing it
SELECT TOP 1 *
FROM fn_dblog(NULL, NULL)
WHERE [Lock Information] LIKE '%TheOutput%'
It does not show me anything. Any suggestions?
No, row-level history/change stamps are not built into SQL Server. You need to add that in the table design. If you want an automatic update-date column, it would typically be set by a trigger on the table.
There is, however, a way to find out what happened if you really need to in a forensics scenario, but only if you have the right backup plans: you can use the DB transaction log to find when the modification was done. Note that this is not anything an application can or should do at runtime.
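As a sketch of the trigger approach: assuming a hypothetical table dbo.Orders with an OrderID key and a LastModified column (all names here are illustrative), an AFTER UPDATE trigger could stamp the change time like this:
CREATE TRIGGER trg_Orders_SetLastModified
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp every updated row with the current UTC time
    UPDATE o
    SET LastModified = SYSUTCDATETIME()
    FROM dbo.Orders AS o
    INNER JOIN inserted AS i
            ON o.OrderID = i.OrderID;
END;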

SQL Server - best way to validate SQL schema and seed data

I am working on a web based app with a SQL Server backend. This is a somewhat legacy app that I've just begun working on a few months ago and the database versioning situation is a bit of a mess.
There is one set of scripts for a new install and another set for upgrading from version to version. Some of the scripts update the schema, others insert seed data. Other people, not developers, do the deployment and running of these scripts. Because of the versioning situation, there are sometimes issues with the scripts.
I'm revising the scripts to be more robust, less likely to fail, and to have better logging when they do.
Meanwhile, what I want to do is create a validation script that we can run after a deployment. The script would run and check that all the necessary tables are there with the expected schema, that the seed data scripts ran, and that everything is how it should be. Other than writing a ton of 'if not exists' (write to log) type statements, is there a better way to do this?
I can sometimes use Visual Studio schema compare to compare the newly updated database to an existing one, but data compare is not feasible in our environment.
I have the same situation on some of my projects, and I use the following approach:
I have written a script which collects all significant metadata from the DB (descriptions of tables, columns, their order/types..., indexes, etc., and even stored procedures) as XML, and then calculates a hash of it. The script then checks the calculated hash value against the expected one (calculated in the development environment).
As a result, I have a quite simple way to check the consistency of the DB: before development, to ensure that I have the actual DB state, and during test and production deployments, to ensure that all expected changes were included in the release.
I suppose you can use a similar approach, but include some additional information in the list of significant data to control the seed data too. Of course, calculating a hash of all the data in the DB isn't a good idea, but if you know your DB you can find simple signals to control it (row counts, max ID, last modified date, etc.).
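A minimal sketch of that idea in T-SQL, assuming SQL Server 2016+ (where HASHBYTES accepts nvarchar(max)) and hashing only column metadata; a fuller version would add indexes, procedures, and the seed-data signals mentioned above:
-- serialize the column metadata deterministically as XML
DECLARE @schema_xml nvarchar(max) =
(
    SELECT TABLE_NAME, COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    ORDER BY TABLE_NAME, ORDINAL_POSITION
    FOR XML PATH('column'), ROOT('schema')
);

-- compare this value against the hash captured in the development environment
SELECT HASHBYTES('SHA2_256', @schema_xml) AS schema_hash;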
Using INFORMATION_SCHEMA in the versioned database would be helpful for this.
First create a persisted table named DBSCHEMA to store all the versions of the database with an initial version of the database you want to track the changes (versions) in:
SELECT ID=IDENTITY(int,1,1), TABLE_CATALOG, TABLE_NAME, COLUMN_NAME,
DATA_TYPE, ORDINAL_POSITION, '1.0.1' AS VERSION, GETDATE() AS VersionDate
INTO DBSCHEMA FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION
Note: you can get more columns than this from INFORMATION_SCHEMA (precision, text length, etc.); I did not include those.
With a subsequent change to the database schema, you will add a new version, e.g. '1.0.2', and perform an insert of the new schema into the same table, incrementing the version each time.
INSERT DBSCHEMA (TABLE_CATALOG, TABLE_NAME, COLUMN_NAME, DATA_TYPE, ORDINAL_POSITION,
VERSION, VersionDate)
SELECT TABLE_CATALOG, TABLE_NAME, COLUMN_NAME, DATA_TYPE, ORDINAL_POSITION, '1.0.2',
GETDATE()
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION
Now, after you have multiple versions, you can check for changes between them with a query similar to the following to see what has changed in the database. I would change this up depending on whether I am looking for tables or columns that changed.
SELECT DISTINCT T1.TABLE_NAME, T1.COLUMN_NAME
FROM DBSCHEMA T1
WHERE T1.VERSION = '1.0.1'
  -- note the parentheses: without them, AND binds tighter than OR
  AND (T1.COLUMN_NAME NOT IN (SELECT COLUMN_NAME FROM DBSCHEMA WHERE VERSION = '1.0.2')
   OR  T1.TABLE_NAME  NOT IN (SELECT TABLE_NAME  FROM DBSCHEMA WHERE VERSION = '1.0.2'))
The results of this query will give you changes that have occurred for tables and columns, but you can get more granular and check for data type, precision, etc. if you need more information. (There is probably a CTE query that would work better for this last query, but the one above gives you an idea of how to find the changes between the versions.)
Regarding the seed data, I would create another table that uses the same version value as the one you inserted into DBSCHEMA and stores the name and COUNT(*) of each table containing seed data. This will allow you to see if the seed data is there, and/or if its count changed between versions; see the sketch below.
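A minimal sketch of such a seed-data table, using sys.partitions for fast row counts (the DBSEEDDATA name and the version string are illustrative):
SELECT t.name AS TABLE_NAME,
       SUM(p.rows) AS [ROW_COUNT],
       '1.0.1' AS VERSION,
       GETDATE() AS VersionDate
INTO DBSEEDDATA
FROM sys.tables AS t
INNER JOIN sys.partitions AS p
        ON p.object_id = t.object_id
       AND p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY t.name;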

Error when inserting into a linked server

I want to insert some data from the local server into a remote server, and used the following SQL:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error:
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
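A sketch of that temp-table fallback, run on the destination server (the placeholder names follow the examples below):
-- pull across the link first; SELECT INTO a temp table avoids the distributed transaction
SELECT *
INTO #staging
FROM [server_a].[database].[schema].[table];

-- then move the rows into the real target table locally
INSERT INTO [database].[schema].[table]
SELECT * FROM #staging;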
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and performed the following workaround:
If you are able to log on to the remote server where you want to insert the data (with SSMS or sqlcmd), rebuild your query the other way around,
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
@2Toad: For sure INSERT INTO is better/more efficient. However, for small queries and quick operations SELECT * INTO is more flexible, because it creates the table on the fly and inserts your data immediately, whereas INSERT INTO requires creating the table (identity options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to the right database, ensures the table does not already exist, and selects from SourceServer. Minimally logged, and no fuss. This information may already be out there somewhere, but I hope it helps anyone searching for similar issues.

SQL Server Linked Server Example Query

While in Management Studio, I am trying to run a query/do a join between two linked servers.
Is this the correct syntax for using linked DB servers?
select foo.id
from databaseserver1.db1.table1 foo,
databaseserver2.db1.table1 bar
where foo.name=bar.name
Basically, do you just prefix the server name to the db.table?
The format should probably be:
<server>.<database>.<schema>.<table>
For example:
DatabaseServer1.db1.dbo.table1
Update: I know this is an old question and the answer I have is correct; however, I think anyone else stumbling upon this should know a few things.
Namely, when querying against a linked server in a join situation, the ENTIRE table from the linked server will likely be downloaded to the server the query is executing from in order to do the join operation. In the OP's case, both table1 from databaseserver1 and table1 from databaseserver2 will be transferred in their entirety to the (possibly third) server executing the query.
If you have large tables, this may result in an operation that takes a long time to execute. After all, it is now constrained by network traffic speeds, which are orders of magnitude slower than memory or even disk transfer speeds.
If possible, perform a single query against the remote server, without joining to a local table, to pull the data you need into a temp table. Then query off of that.
If that's not possible, then you need to look at the various things that would cause SQL Server to have to load the entire table locally, for example using GETDATE() or even certain joins. Other performance killers include not giving appropriate rights.
See http://thomaslarock.com/2013/05/top-3-performance-killers-for-linked-server-queries/ for some more info.
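A sketch of the temp-table pattern described above, using the OP's names (the column list is an illustrative assumption):
-- run on databaseserver1: let the remote server do the scan, keep only what you need
SELECT id, name
INTO #remote_rows
FROM OPENQUERY(databaseserver2, 'select id, name from db1.dbo.table1');

-- then join locally against the temp table
SELECT foo.id
FROM db1.dbo.table1 AS foo
INNER JOIN #remote_rows AS bar
        ON foo.name = bar.name;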
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME..TABLENAME')
This may help you.
For those having trouble with the other answers, try OPENQUERY.
Example:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from [DBName].[schema].[tablename]')
If you still have issues with <server>.<database>.<schema>.<table>, enclose the server name in [].
You need to specify the schema/owner (dbo by default) as part of the reference. Also, it would be preferable to use the newer (ANSI-92) join style.
select foo.id
from databaseserver1.db1.dbo.table1 foo
inner join databaseserver2.db1.dbo.table1 bar
on foo.name = bar.name
select * from [Server].[database].[schema].[tablename]
This is the correct way to call it.
Be sure to verify that the servers are linked before executing the query!
To check for linked servers call:
EXEC sys.sp_linkedservers
Right-click on a table and choose "Script Table as > SELECT To" to get a fully qualified query, e.g.:
select name from drsql01.test.dbo.employee
drsql01 is the server name (the linked server)
test is the database name
dbo is the schema (the default schema)
employee is the table name
I hope this helps you understand how to execute a query against a linked server.
Usually direct queries should not be used against a linked server, because they make heavy use of the local SQL Server's tempdb: as a first step the data is retrieved into tempdb, and only then does the filtering occur. There are many threads about this. It is better to use OPENQUERY, because it passes the SQL to the source linked server and returns already-filtered results, e.g.
SELECT *
FROM OPENQUERY(Linked_Server_Name , 'select * from TableName where ID = 500')
For what it's worth, I found the following syntax to work the best:
SELECT * FROM [LINKED_SERVER]...[TABLE]
I couldn't get the recommendations of others to work, using the database name. Additionally, this data source has no schema.
In SQL Server (local) there are two ways to query data from a linked server (remote).
Distributed query (four-part notation):
Might not work with all remote servers. If your remote server is MySQL, then a distributed query will not work.
Filters and joins might not work efficiently. If you have a simple query with a WHERE clause, the local SQL Server might first fetch the entire table from the remote server and then apply the WHERE clause locally. For large tables this is very inefficient, since a lot of data will be moved from remote to local. However, this is not always the case: if the local server has access to the remote server's table statistics, it might be as efficient as using OPENQUERY. More details
On the positive side, T-SQL syntax will work.
SELECT * FROM [SERVER_NAME].[DATABASE_NAME].[SCHEMA_NAME].[TABLE_NAME]
OPENQUERY
This is basically a pass-through. The query is fully processed on the remote server, thus making use of indexes and any optimization on the remote server, and effectively reducing the amount of data transferred to the local SQL Server.
A minor drawback of this approach is that T-SQL syntax will not work if the remote server is anything other than SQL Server.
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME.SCHEMA_NAME.TABLENAME')
Overall, OPENQUERY seems like a much better option in the majority of cases.
I used OPENQUERY to find out the data types of the columns in a table on the linked server, and the results were successful:
SELECT * FROM OPENQUERY (LINKSERVERNAME, '
SELECT DATA_TYPE, COLUMN_NAME
FROM [DATABASENAME].INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ''TABLENAME''
')
It works for me.
The following query works best.
Try this query:
SELECT * FROM OPENQUERY([LINKED_SERVER_NAME], 'SELECT * FROM [DATABASE_NAME].[SCHEMA].[TABLE_NAME]')
It is very helpful when linking MySQL to MS SQL.
PostgreSQL:
You must provide a database name in the Data Source DSN.
Run Management Studio as Administrator
You must omit the DBName from the query:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from schema."tablename"')
For MariaDB (and so probably MySQL), attempting to specify the schema using the three-dot syntax did not work, resulting in the error "invalid use of schema or catalog". The following solution worked:
In SSMS, go to Server Objects > Linked Servers > Providers > MSDASQL
Ensure that "Dynamic parameter", "Level zero only", and "Allow inprocess" are all checked
You can then query any schema and table using the following syntax:
SELECT TOP 10 *
FROM LinkedServerName...[SchemaName.TableName]
Source: SELECT * FROM MySQL Linked Server using SQL Server without OpenQuery
Have you tried adding double quotes (") around the first name?
Like this:
select foo.id
from "databaseserver1".db1.table1 foo,
"databaseserver2".db1.table1 bar
where foo.name=bar.name
