There is a table that has only a few rows but contains several images, and I want to know its size. Does sp_spaceused give the right size?
Does sp_spaceused account for image columns in a table?
example:
sp_spaceused tab1
will give
name rowtotal reserved data index_size unused
tab1 153390 6436832 8248 63270576 79528
This table's structure contains image data, so what is the actual size of the table?
sp_help table_name reports information about a database object (any object listed in sysobjects) and about system or user-defined datatypes; its output also includes the optimistic_index_lock column.
Use sp_spaceused [objname [,1]], which reports the table and each index separately. Dividing the data space used by the rowtotal will give you one value for the actual row length (not counting fragmentation, which depends on the lock scheme and activity).
data : in use for table/data storage
index : in use by indexes or text/image chains
unused : space reserved for object usage but which hasn't yet been used
Official source from which this information was taken.
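As a rough illustration of the advice above (Sybase/T-SQL style, assuming the figures in the sample output are reported in KB and reusing the numbers from the question as-is):

EXEC sp_spaceused tab1, 1   -- the second argument reports the table and each index (including text/image chains) separately

-- Approximate data-row length, ignoring fragmentation and the off-row image chains:
SELECT (8248 * 1024) / 153390 AS approx_row_length_bytes   -- data KB * 1024 / rowtotal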
Background
I'm using Azure data factory v2 to load data from on-prem databases (for example SQL Server) to Azure data lake gen2. Since I'm going to load thousands of tables, I've created a dynamic ADF pipeline that loads the data as-is in the source based on parameters for schema, table name, modified date (for identifying increments) and so on. This obviously means I can't specify any type of schema or mapping manually in ADF. This is fine since I want the data lake to hold a persistent copy of the source data in the same structure. The data is loaded into ORC files.
Based on these ORC files I want to create external tables in Snowflake with virtual columns. I have already created normal tables in Snowflake with the same column names and data types as in the source tables, which I'm going to use in a later stage. I want to use the information schema for these tables to dynamically create the DDL statement for the external tables.
The issue
Since unquoted column names always resolve to UPPER case in Snowflake, and Snowflake is case-sensitive in many ways, it is unable to parse the ORC file with the dynamically generated DDL statement, because the definition of the virtual columns no longer corresponds to the source column name casing. For example it will generate one virtual column as -> ID NUMBER AS(value:ID::NUMBER)
This will return NULL as the column is named "Id" with a lower case D in the source database, and therefore also in the ORC file in the data lake.
This feels like a major drawback with Snowflake. Is there any reasonable way around this issue? The only options I can think of are to:
1. Load the information schema from the source database to Snowflake separately and use that data to build a correct virtual column definition with correct cased column names.
2. Load the records in their entirety into some variant column in Snowflake, converted to UPPER or LOWER.
Both options add a lot of complexity or even mess up the data. Is there any straightforward way to only return the column names from an ORC file? Ultimately I would need to be able to use something like Snowflake's DESCRIBE TABLE on the file in the data lake.
Unless you set the parameter QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE, you can declare your columns in whatever casing you want:
CREATE TABLE "MyTable" ("Id" NUMBER);
If your dynamic SQL carefully uses "Id" and not just Id you will be fine.
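For example, a hedged sketch of what the generated external table DDL could then look like with the casing preserved (the external table name, stage, and directory here are placeholders, not from the question):

-- Quoting both the virtual column name and the path element keeps the source casing "Id";
-- an unquoted value:ID would look for an upper-case field in the ORC data and return NULL.
CREATE EXTERNAL TABLE "MyExternalTable" (
    "Id" NUMBER AS (value:"Id"::NUMBER)
)
LOCATION = @my_stage/my_directory/
FILE_FORMAT = (TYPE = ORC);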
Found an even better way to achieve this, so I'm answering my own question.
With the query below we can get the path/column names directly from the ORC file(s) in the stage, with a hint of the data type from the source. This filters out columns that only contain NULL values. I will most likely create some kind of data type ranking table for the final data type determination for the virtual columns we're aiming to define dynamically for the external tables.
SELECT f.path AS "ColumnName"
     , TYPEOF(f.value) AS "DataType"
     , COUNT(1) AS NbrOfRecords
FROM (
    -- read the raw ORC rows from the stage as a single VARIANT column
    SELECT $1 AS "value"
    FROM @<db>.<schema>.<stg>/<directory>/ (FILE_FORMAT => '<fileformat>')
),
LATERAL FLATTEN(INPUT => "value", RECURSIVE => TRUE) f
WHERE TYPEOF(f.value) != 'NULL_VALUE'
GROUP BY f.path, TYPEOF(f.value)
ORDER BY 1
I have the following fact table: PlaceId, DateId, StatisticId, StatisticValue.
I also have a dimension with the statistic IDs and their names, as follows: StatisticId, StatisticName.
I want to load the fact table with data that contains two statistics. With this architecture, each row of my data will be represented by two rows in my fact table.
The data has the following attributes: Place, Date, Stat1_Value, Stat2_Value.
How do I load my fact table with the IDs of these measures and their corresponding values?
Thank You.
I would use SSIS to move your data into a holding table that has the same columns as your data. Then call a stored procedure that uses SQL to populate your fact table, using UNION to get all the Stat1_Values, and then all the Stat2_Values.
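A minimal T-SQL sketch of that stored procedure step, assuming hypothetical names for the holding table (dbo.StagingStats, with Place and Date already resolved to dimension keys) and for the statistic dimension rows ('Stat1', 'Stat2'):

INSERT INTO dbo.FactStatistic (PlaceId, DateId, StatisticId, StatisticValue)
SELECT u.PlaceId, u.DateId, d.StatisticId, u.StatisticValue
FROM (
    -- UNION ALL turns the two measure columns into two rows per source row
    SELECT PlaceId, DateId, 'Stat1' AS StatisticName, Stat1_Value AS StatisticValue
    FROM dbo.StagingStats
    UNION ALL
    SELECT PlaceId, DateId, 'Stat2' AS StatisticName, Stat2_Value AS StatisticValue
    FROM dbo.StagingStats
) AS u
JOIN dbo.DimStatistic AS d
    ON d.StatisticName = u.StatisticName;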
My main Access table contains 95 rows. One column is a name field with a unique name in each row. Two other tables also have a name column, but the name field in each of these tables contains one or more names separated by a comma and a space. These tables are different lengths too: one has 99 rows, the other has 33.
I need to link the data from these tables to a comprehensive form. To do this I think I want to make a crosstab query using the value in the main table's name field. It will need to search the name field of the other tables to see if one of the listed names matches.
Please help.
Are you looking for this:
SELECT * FROM mainTable, Tble99Rows, Tbl33Rows
WHERE InStr(Tble99Rows.Name, mainTable.Name) > 0 AND InStr(Tbl33Rows.Name, mainTable.Name) > 0
?
Note that it could be inaccurate; for example, it will link a record named Max with one named Maxine.
For proper table joining, follow database normalization rules, in our case the first rule (first normal form): all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.
EDIT:
Please read more: Is storing a delimited list in a database column really that bad?
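To make the normalization advice concrete, here is a hedged sketch in Access SQL; the new table and column names are made up, the point being one name per row so the join becomes an exact match instead of a substring search:

CREATE TABLE Table99Names (
    Table99ID LONG,        -- FK back to the original 99-row table
    [Name]    TEXT(255)    -- one name per row instead of "Max, Maxine, ..."
);

SELECT mainTable.*, Table99Names.Table99ID
FROM mainTable INNER JOIN Table99Names
    ON mainTable.[Name] = Table99Names.[Name];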
We have a table which stores information about clients and gets loaded by a scheduled job on a daily basis from a data warehouse. There are more than 1 million records in that table.
I wanted to define a bitmap index on the Country column as there would be a limited number of values.
Does it have any impact on the indexes if we delete and reload data into the table on a daily basis? Do we need to explicitly rebuild the index after every load?
A bitmap index is dangerous when the indexed column of the table is frequently updated, because DML on a single row can lock many rows in the table. That's why it is more of a data warehouse tool than an OLTP one. Also, the true power of bitmap indexes comes from combining several of them using logical operations and translating the result into ROWIDs (and then accessing the rows or aggregating them). In Oracle there are in general not many reasons to rebuild an index: when frequently modified it will always adapt by 50/50 block splits, and it doesn't make sense to try to compact it into the smallest possible space. One million rows today is nothing unless each row contains a big amount of data.
Also be aware that bitmap indexes require an Enterprise Edition license.
The rationale for defining a bitmap index is not that a column has only a few distinct values, but that there are queries which can profit from it when accessing the table rows.
For example, if you have say 4 countries equally populated, Oracle will not use the index, as a FULL TABLE SCAN comes cheaper.
If you have some "exotic" countries (very few records), a BITMAP index could be used, but you will most probably spot no difference compared to a conventional index.
I wanted to define a bitmap index on the Country column as there would be a limited number of values.
Just because a column is low cardinality does not mean it is a candidate for a bitmap index. It might be, it might not be.
Good explanation by Tom Kyte here.
Bitmap indexes are extremely useful in environments where you have
lots of ad hoc queries, especially queries that reference many columns
in an ad hoc fashion or produce aggregations such as COUNT. For
example, suppose you have a large table with three columns: GENDER,
LOCATION, and AGE_GROUP. In this table, GENDER has a value of M or F,
LOCATION can take on the values 1 through 50, and AGE_GROUP is a code
representing 18 and under, 19-25, 26-30, 31-40, and 41 and over.
For example, you have to support a large number of ad hoc queries that take the following form:
select count(*)
from T
where gender = 'M'
and location in ( 1, 10, 30 )
and age_group = '41 and over';
select *
from t
where ( ( gender = 'M' and location = 20 )
or ( gender = 'F' and location = 22 ))
and age_group = '18 and under';
select count(*) from t where location in (11,20,30);
select count(*) from t where age_group = '41 and over' and gender = 'F';
You would find that a conventional B*Tree indexing scheme would fail you. If you wanted to use an index to get the answer, you would need at least three and up to six combinations of possible B*Tree indexes to access the data via the index. Since any of the three columns or any subset of the three columns may appear, you would need large concatenated B*Tree indexes on:
GENDER, LOCATION, AGE_GROUP: for queries that used all three columns, GENDER with LOCATION, or GENDER alone
LOCATION, AGE_GROUP: for queries that used LOCATION and AGE_GROUP, or LOCATION alone
AGE_GROUP, GENDER: for queries that used AGE_GROUP with GENDER, or AGE_GROUP alone
Having only a single bitmap index on a table is useless most of the time. The benefit of bitmap indexes comes when you have several of them on a table and your query combines them, as in the sketch below.
Maybe a list partition is more suitable in your case.
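To illustrate the point about combining several bitmap indexes on the table from the question (the table name and every column other than Country are made up):

CREATE BITMAP INDEX ix_clients_country ON clients (country);
CREATE BITMAP INDEX ix_clients_status  ON clients (status);
CREATE BITMAP INDEX ix_clients_segment ON clients (segment);

-- A query like this can AND/OR the three bitmaps before touching any table rows:
SELECT COUNT(*)
FROM clients
WHERE country = 'DE'
  AND status  = 'ACTIVE'
  AND segment IN ('SMB', 'ENTERPRISE');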
I'm looking for advice about structuring a data table as in the title to make it efficient for querying and writing. I store information about an entity which has the usual data types: numbers, short strings, etc. Now I need to store an additional field with a large amount of data (~30 KB), and I'm looking at two options:
add an nvarchar(100000) column to the entity table
create a separate table to store such data and link to it from the entity table
other factors:
each entity row will have an accompanying large text field
each accompanying text field will have at least 20 KB of data
~20% of queries against entity table also need the large field. Other queries can do without it
~95% of queries seek for single entity
I'm using an O/RM to access the data, so all the columns are pulled in (I could pick and choose by making the code look horrid)
Right now I'm leaning toward having a separate table, but it also has a downside in that I have to keep some data consistency concerns in mind.
It's hard to make a decision without doing a real benchmark, but that could require a few days of work, so I'm turning to SO for a shortcut.
We recently had this exact problem (though it was an XML column instead of an nvarchar(max)), and the problem is exactly the same.
Our use case was to display a list of records on a web page (the first 6 columns of the table) and then to store a tonne of additional information in the nvarchar(max) column, which got displayed once you selected an individual row.
Originally a single table contained all 7 columns.
TABLE 1
INT ID (PK IDentity)
5 other columns
NVARCHAR(max)
Once we refactored it to the following we got a massive perf. boost.
TABLE 1
INT ID (PK IDentity)
5 other columns
INT FID (FK -TABLE2)
TABLE 2
FID (PK IDENTITY)
nvarchar(max)
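For concreteness, roughly the same split expressed as T-SQL DDL (column and constraint names here are made up):

CREATE TABLE dbo.EntityLargeText (
    FID      INT IDENTITY(1,1) NOT NULL
             CONSTRAINT PK_EntityLargeText PRIMARY KEY,
    BigValue NVARCHAR(MAX) NOT NULL
);

CREATE TABLE dbo.Entity (
    ID  INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Entity PRIMARY KEY,
    -- ... the 5 other small columns ...
    FID INT NULL
        CONSTRAINT FK_Entity_EntityLargeText
        REFERENCES dbo.EntityLargeText (FID)
);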
The reason is that if the nvarchar(max) is short enough, it will be stored "in-row", but if it extends beyond the page size, then it gets stored elsewhere; depending on a) the size of the table and the record set you're querying, and b) the amount of data in your nvarchar(max), this can cause a pretty dramatic performance drop.
Have a read of this link:
http://msdn.microsoft.com/en-us/library/ms189087.aspx
When a large value type or a large object data type column value is
stored in the data row, the Database Engine does not have to access a
separate page or set of pages to read or write the character or binary
string. This makes reading and writing the in-row strings about as
fast as reading or writing limited size varchar, nvarchar, or
varbinary strings. Similarly, when the values are stored off-row, the
Database Engine incurs an additional page read or write.
I'd bite the bullet now and design your tables to store the large nvarchar(max) in a separate table, assuming you don't need the data it contains in every select query.
With regard to your comment about using an ORM: we were also using NHibernate in our situation, and it's relatively easy to configure your mappings to lazy-load the related object on demand.
Well, you could start with documentation...
add an nvarchar(100000) column to the entity table
Given the documented max size of 8000 bytes for a field and thus nvarchar(4000) being the maximum, I am interested to know how you consider this an option?
nvarchar(max) - ntext etc. would be the right thing to do.
And then you should read up on full-text search, which has been in SQL Server for ages. Your ORM likely does not support it, though; technology choices limiting features is a typical problem when people abstract things away. Not something I would access with an ORM.
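For reference, a minimal sketch of what SQL Server full-text search on such a column could look like (the catalog name is made up, and the table and index names reuse the hypothetical sketch further up; it also assumes the full-text feature is installed):

CREATE FULLTEXT CATALOG EntityCatalog;

CREATE FULLTEXT INDEX ON dbo.EntityLargeText (BigValue)
    KEY INDEX PK_EntityLargeText      -- must name a unique index on the table
    ON EntityCatalog;

-- Word/phrase searches then go through CONTAINS/FREETEXT instead of LIKE '%...%':
SELECT FID
FROM dbo.EntityLargeText
WHERE CONTAINS(BigValue, N'invoice');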