We are using SQL Server 2014 (not SQL Azure), and I have defined a simple table to store images (.jpg):
CREATE TABLE [dbo].[Graphic](
[GraphicID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[FileName] [varchar](100) NOT NULL,
[FileDescription] [nvarchar](200) NULL,
[Image] [varbinary](max) NULL
)
The maximum size of a stored image is 1 MB; this validation is handled on the front end. I have just inserted 15 images, and the table size is currently 5544 KB. There is a primary key on the GraphicID column; no other indexes are placed.
But when I retrieve one or more images using the simple SELECT query below, it takes a long time, around 25-30 seconds.
select * from [Graphic]
where [GraphicID] = 53
Is there a faster mechanism to query images in SQL Server, in less than 5 seconds?
Is there an alternative save-and-retrieve mechanism for images in SQL Server 2014 with better performance?
Please help.
Thanks
Bhanu
CREATE TABLE [dbo].[Graphic](
[GraphicID] [int] IDENTITY(1,1) NOT NULL,
[FileName] [varchar](100) NOT NULL,
[FileDescription] [nvarchar](200) NULL,
[Image] [varbinary](max) NULL
)
If that is indeed your schema (and you seem extremely unsure of it), the problem is that you never added indices to your table. Add a clustered index over GraphicID and it should fix this particular access pattern.
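If GraphicID really has no index yet, here is a minimal sketch of adding one as a clustered primary key (the constraint name PK_Graphic is illustrative):
ALTER TABLE [dbo].[Graphic]
    ADD CONSTRAINT [PK_Graphic] PRIMARY KEY CLUSTERED ([GraphicID]);
GO
With the clustered index in place, the WHERE [GraphicID] = 53 lookup becomes an index seek instead of a full scan of the heap.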
Another note: if you know the maximum size of your varbinary column (and you said you do), you should declare that size rather than using max. That way the table layout can store the image inside the row (up to a certain size) rather than appending it at the end, making row retrieval (SELECT *) a lot faster on larger, fragmented tables.
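As a side note, for a varbinary(max) column there is also a table option that controls whether values small enough to fit are kept in-row; it will not help here if every image is close to 1 MB, since those are always stored off-row:
EXEC sp_tableoption 'dbo.Graphic', 'large value types out of row', 0;
-- 0 = keep varbinary(max)/nvarchar(max) values in-row when they fit
--     (up to roughly 8,000 bytes); larger values still go off-row.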
I have 2 tables, one containing a bunch of polygons like this
CREATE TABLE [dbo].[GeoPolygons](
[OID] [int] IDENTITY(1,1) NOT NULL,
[Version] [int] NULL,
[entity_id] [varchar](200) NULL,
[Geometry] [geometry] NULL
)
and one containing a bunch of nodes (places) like this
CREATE TABLE [dbo].[GeoPoints](
[point_id] [int] NOT NULL,
[point_name] [varchar](40) NULL,
[latitude] [decimal](8, 6) NULL,
[longitude] [decimal](9, 6) NULL,
[point_geometry] [geography] NULL
)
These tables came from disparate sources, so one has a geography field type and the other has a geometry field type.
What I want to do - which is very possible within my GIS, but I want to do it in T-SQL for a variety of reasons - is find out which nodes are inside which polygons.
Is the first step to match the geo-field types? geometry-to-geometry or geography-to-geography?
Or can that be done on the fly?
My ideal output would be a 3-field table:
CREATE TABLE [dbo].[Points2Polygons](
[match_id] [int] IDENTITY(1,1) NOT NULL,
[entity_id] [varchar](200) NOT NULL,
[point_id] [int] NOT NULL
)
and be able to update that on the fly (or perhaps daily) when new records are added to each table.
I found this post but it seems to deal with a single point and a single polygon, as well as WKT definitions. I don't have WKT, and I have thousands of polys and thousands of points. I want to do it at a larger scale.
How do I do it in a T-SQL query?
The server is running SQL Server Web Edition 64-bit (V15.0.4138.2) on Windows Server 2019 Datacenter.
TIA
From the comments above, here's a proposal to convert the Geometry column in your GeoPolygons table. As with anything like this, where you have data one way and you want it to look a different way on an ongoing basis, the high-level steps are:
Start writing the data in both formats
Convert the old format into the new format
Convert all read paths to the new format
Drop the old format
I'll be focusing on "Convert the old format into the new format". Create a new column in your table (I'll call it Polygon).
alter table [dbo].[GeoPolygons] add
Polygon geography null;
Note that this is a prerequisite for the "Start writing the data in both formats" phase and so should already be done by the time you're ready to convert data.
The most straightforward method to do that looks like this:
update [dbo].[GeoPolygons]
set [Polygon] = geography::STGeomFromText(
[Geometry].STAsText(),
[Geometry].STSrid
)
where [Geometry] is not null
and [Polygon] is null;
I'm making the assumption that the SRID on your Geometry column is set properly. If not, you'll have to find the SRID that is appropriate given the WKT that was used to create the geometry in the first place.
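Once the Polygon column is populated, the point-in-polygon matching itself can be done in T-SQL. A minimal sketch, assuming the Points2Polygons table from the question already exists and that GeoPoints.point_geometry uses the same SRID as the converted polygons:
insert into [dbo].[Points2Polygons] ([entity_id], [point_id])
select poly.[entity_id], pt.[point_id]
from [dbo].[GeoPolygons] poly
join [dbo].[GeoPoints] pt
    on pt.[point_geometry].STIntersects(poly.[Polygon]) = 1  -- point lies inside or on the edge of the polygon
where poly.[entity_id] is not null
  and not exists (                                           -- skip pairs already recorded, so this can run daily
        select 1
        from [dbo].[Points2Polygons] existing
        where existing.[entity_id] = poly.[entity_id]
          and existing.[point_id] = pt.[point_id]);
With thousands of polygons and thousands of points, a spatial index on each of the two spatial columns is what keeps this join from turning into a brute-force comparison of every pair.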
I have a big table on which I need to rebuild the index. The table is configured with a clustered columnstore index (CCI), and we realized we need to sort the data according to a specific use case.
Users perform date-range and equality queries, but because the data was not sorted the way they want to get it back, the queries are not optimal. The SQL Advisory Team recommended that the data be organized into the right rowgroups so queries can benefit from rowgroup elimination.
Table Description:
Partitioned by Timestamp1, monthly partition function
Total Rows: 31 billion
Est row size: 60 bytes
Est table size: 600 GB
Table Definition:
CREATE TABLE [dbo].[Table1](
[PkId] [int] NOT NULL,
[FKId1] [smallint] NOT NULL,
[FKId2] [int] NOT NULL,
[FKId3] [int] NOT NULL,
[FKId4] [int] NOT NULL,
[Timestamp1] [datetime2](0) NOT NULL,
[Measurement1] [real] NULL,
[Measurement2] [real] NULL,
[Measurement3] [real] NULL,
[Measurement4] [real] NULL,
[Measurement5] [real] NULL,
[Timestamp2] [datetime2](3) NULL,
[TimeZoneOffset] [tinyint] NULL
)
CREATE CLUSTERED COLUMNSTORE INDEX [Table1_ColumnStoreIndex] ON [dbo].[Table1] WITH (DROP_EXISTING = OFF)
GO
Environment:
SQL Server 2014 Enterprise Ed.
8 Cores, 32 GB RAM
VMware High Performance Platform
My strategy is:
Drop the existing CCI
Create an ordinary clustered rowstore index on the right columns; this will sort the data
Recreate the CCI with DROP_EXISTING = ON; this will convert the existing clustered rowstore index into the CCI
My questions are:
Does it make sense to rebuild the index, or just reload the data? Reloading may take a month to complete, and rebuilding the index may take just as long, maybe...
If I drop the existing CCI, will the table expand, since it will no longer be compressed?
31 billion rows is about 31,000 full rowgroups. A rowgroup is just another form of horizontal partitioning, so it really matters when and how you load your data. SQL Server 2014 supports only offline index builds.
There are a few pros and cons when considering create index vs. reload:
Create index is a single operation, so if it fails at any point you lose your progress. I would not recommend it at your data size.
Index build will create primary dictionaries, so it is beneficial for low-cardinality, dictionary-encoded columns.
Bulk load won't create primary dictionaries, but you can reload data if for some reason your batches fail.
Both index build and bulk load will run in parallel if you give them enough resources, which means the ordering from the base clustered index won't be perfectly preserved. This is just something to be aware of; at your scale of data a few overlapping rowgroups won't matter.
If your data will undergo updates/deletes and you reorganize (from SQL Server 2019 onward the Tuple Mover will also do this), your ordering may degrade over time.
I would create an ordered clustered index partitioned on the date-range column, so that you end up with anything between 50 and 200 rowgroups per partition (do some experiments). Then you can create a partition-aligned clustered columnstore index and switch in one partition at a time; the partition switch triggers an index build, so you get the benefit of primary dictionaries, and if a partition later accumulates updates/deletes you can fix its index quality by rebuilding that partition rather than the whole table. If you decide to use reorganize, you still maintain some level of ordering, because rowgroups will only be merged within the same partition.
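A minimal sketch of the sort-then-convert step on a single structure (the partition scheme name ps_Monthly is hypothetical, and the full staging-plus-partition-switch flow described above adds a switch step per partition):
-- 1. Sort the data by building a clustered rowstore index on the partitioning/sort column
--    (assumes the existing CCI has already been dropped).
CREATE CLUSTERED INDEX [CIX_Table1_Timestamp1]
    ON [dbo].[Table1] ([Timestamp1])
    ON ps_Monthly([Timestamp1]);
GO
-- 2. Replace it with a partition-aligned clustered columnstore index.
--    DROP_EXISTING = ON converts the rowstore index in place; MAXDOP = 1 keeps the
--    sort order intact at the cost of a much slower (and, on 2014, offline) build.
CREATE CLUSTERED COLUMNSTORE INDEX [CIX_Table1_Timestamp1]
    ON [dbo].[Table1]
    WITH (DROP_EXISTING = ON, MAXDOP = 1)
    ON ps_Monthly([Timestamp1]);
GO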
We are creating this “clients” table that will have around 50 million records.
I am having a hard time deciding the ‘clustered index’.
Theory says that it should be unique, narrow, static, and follow an ever-increasing pattern… But in practice it should be the key you use to refer to your records most often.
The table has 50 columns…
Per the first approach the CI should be:
[Client_id] [bigint] IDENTITY(1,1) NOT NULL,
But I feel tempted to use:
[SF_id] [varchar](18) NOT NULL,
or
[UpdateDate] [datetime] NOT NULL,
or
[SystemModStamp] [datetime] NOT NULL,
The reality is that I do not know exactly how the end users will query the table, but I know they will use SF_id quite often and will rarely use Client_id… And I also know that I myself will use UpdateDate or SystemModStamp (not sure yet) as the key for the 'delta' daily merges that I will set up in a job/stored procedure.
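One common way to serve all three access paths, sketched purely as an illustration (table and index names are made up, and this is not the only reasonable layout): cluster on the narrow, ever-increasing identity, then cover the other lookups with nonclustered indexes.
CREATE TABLE [dbo].[Clients](
    [Client_id] [bigint] IDENTITY(1,1) NOT NULL,
    [SF_id] [varchar](18) NOT NULL,
    [UpdateDate] [datetime] NOT NULL,
    [SystemModStamp] [datetime] NOT NULL,
    -- ... the remaining columns ...
    CONSTRAINT [PK_Clients] PRIMARY KEY CLUSTERED ([Client_id])
)
GO
-- End users look rows up by SF_id, so give that its own seek path
-- (UNIQUE assumes SF_id is unique per client; drop it if that is not guaranteed).
CREATE UNIQUE NONCLUSTERED INDEX [IX_Clients_SF_id]
    ON [dbo].[Clients] ([SF_id]);
GO
-- The daily delta merge filters by modification time; a nonclustered index keeps
-- that range scan cheap without making the clustered key wide or volatile.
CREATE NONCLUSTERED INDEX [IX_Clients_SystemModStamp]
    ON [dbo].[Clients] ([SystemModStamp]);
GO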
I need to store data in a SQL Server 2008 database from various data sources with different data types. The data types allowed are: bit, numeric (1, 2, or 4 bytes), real, and string. There is going to be a value, a timestamp, a FK to the item to which the value belongs, and some other information for the data stored.
The most important points are the read performance and the size of the data. There might be a couple thousand items and each item may have millions of values.
I have 5 possible options:
Separate tables for each data type (ValueBit, ValueTinyInt, ValueSmallInt, etc... tables)
Separate tables with inheritance (Value table as base table, ValueBit table just for storing the Bit value, etc...)
Single value table for all data types, with separate fields for each data type (Value table, with ValueBit BIT, ValueTinyInt TINYINT etc...)
Single table and single value field using sql_variant
Single table and single value field using UDT
With case 2, a PK is a must, and:
1,000 items * 10,000,000 values each > Int32.Max, and
1,000 items * 10,000,000 values each * an 8-byte bigint PK is huge (roughly 80 GB for the key values alone).
Other than that, I am considering 1 or 3 with no PK. Will they differ in size?
I do not have experience with 4 or 5 and I do not think that they will perform well in this scenario.
Which way shall I go?
Your question is hard to answer because you seem to be using a relational database system for something it is not designed for. The data you want to keep in the database seems too unstructured to get much benefit from a relational database system. Database designs built mostly around fields like "parameter type" and "parameter value", trying to cover very generic situations, are generally considered bad designs. Maybe you should consider using a "non-relational database" like BigTable. If you really want to use a relational database system, I'd strongly recommend reading Beginning Database Design by Clare Churcher. It's an easy read, but it gets you on the right track with respect to RDBMS design.
What are the usage scenarios? Start with samples of queries and calculate the necessary indexes.
Consider data partitioning as mentioned before. Try to understand your data and relations more. I believe the decision should be based on the business meaning/usage of the data.
I think it's a great question - This situation is fairly common, though it is awkward to make tables to support it.
In terms of performance, having a table like the one indicated in #3 potentially wastes a huge amount of storage and RAM, because for each row you allocate space for a value of every type but only use one. The new sparse-column feature of 2008 could help, but there are other issues too: it's a little hard to constrain/normalize, because you want only one of the multiple value columns to be populated for each row; having values in two columns would be an error, but the design doesn't reflect that. I'd cross that off.
So, if it were me, I'd be looking at option 1, 2, or 4, and the decision would be driven by this: do I typically need to make one query returning rows that have a mix of values of different types in the same result set, or will I almost always ask for the rows by item and by type? I ask because if the values are of different types, that implies to me some difference in the source or the use of that data (you are unlikely, for example, to compare a string and a real, or a string and a bit). This is relevant because having different tables per type might actually be a significant performance/scalability advantage, if partitioning the data that way makes queries faster. Partitioning data into smaller sets of more closely related data can give a performance advantage.
It's like having all the data in one massive (albeit sorted) set or having it partitioned into smaller, related sets. The smaller sets favor some types of queries, and if those are the queries you will need, it's a win.
Details:
CREATE TABLE [dbo].[items](
[itemid] [int] IDENTITY(1,1) NOT NULL,
[item] [varchar](100) NOT NULL,
CONSTRAINT [PK_items] PRIMARY KEY CLUSTERED
(
[itemid] ASC
)
)
/* This table has the problem of allowing two values
in the same row, plus allocates but does not use a
lot of space in memory and on disk (bad): */
CREATE TABLE [dbo].[vals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[valueBit] [bit] NULL,
[valueNumericA] [numeric](2, 0) NULL,
[valueNumericB] [numeric](8, 2) NULL,
[valueReal] [real] NULL,
[valueString] [varchar](100) NULL,
CONSTRAINT [PK_vals] PRIMARY KEY CLUSTERED
(
[itemid] ASC,
[datestamp] ASC
)
)
ALTER TABLE [dbo].[vals] WITH CHECK
ADD CONSTRAINT [FK_vals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[vals] CHECK CONSTRAINT [FK_vals_items]
GO
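/* To make the "only one value per row" rule explicit rather than relying
   on application discipline, a constraint along these lines could be added
   (the constraint name is illustrative). It addresses the correctness
   concern, though not the wasted space: */
ALTER TABLE [dbo].[vals] WITH CHECK
ADD CONSTRAINT [CK_vals_one_value] CHECK (
    ( CASE WHEN [valueBit]      IS NOT NULL THEN 1 ELSE 0 END
    + CASE WHEN [valueNumericA] IS NOT NULL THEN 1 ELSE 0 END
    + CASE WHEN [valueNumericB] IS NOT NULL THEN 1 ELSE 0 END
    + CASE WHEN [valueReal]     IS NOT NULL THEN 1 ELSE 0 END
    + CASE WHEN [valueString]   IS NOT NULL THEN 1 ELSE 0 END ) = 1
)
GO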
/* This is probably better, though casting is required
all the time. If you search with the variant as criteria,
that could get dicey as you have to be careful with types,
casting and indexing. Also everything is "mixed" in one
giant set */
CREATE TABLE [dbo].[allvals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[value] [sql_variant] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[allvals] WITH CHECK
ADD CONSTRAINT [FK_allvals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[allvals] CHECK CONSTRAINT [FK_allvals_items]
GO
/* This would be an alternative, but you trade multiple
queries and joins for the casting issue. OTOH the implied
partitioning might be an advantage */
CREATE TABLE [dbo].[valsBits](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] [bit] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[valsBits] WITH CHECK
ADD CONSTRAINT [FK_valsBits_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[valsBits] CHECK CONSTRAINT [FK_valsBits_items]
GO
CREATE TABLE [dbo].[valsNumericA](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric( 2, 0 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
CREATE TABLE [dbo].[valsNumericB](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric ( 8, 2 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
etc...
I have a database that the client needs to update. They like to use Access. Some tables randomly become read-only for them. Any ideas why?
They are using Access 2007 and MS SQL 2005.
SQL Table:
CREATE TABLE [dbo].[Users](
[SyncGroup] [varchar](20) NULL,
[UserID] [varchar](20) NOT NULL,
[Password] [varchar](20) NOT NULL,
[Restriction] [text] NULL DEFAULT (' '),
[SiteCode] [varchar](20) NULL,
[Group] [varchar](20) NULL,
[EmpId] [varchar](20) NULL,
[TimeZoneOffset] [int] NULL,
[UseDaylightSavings] [bit] NULL,
PRIMARY KEY ([UserID]) )
Access really likes having a TimeStamp aka RowVersion field on every table. I don't know if this will fix your problem though.
"On servers that support them (such as Microsoft SQL Server), timestamp fields make updating records more efficient. Timestamp fields are maintained by the server and are updated every time the record is updated. If you have a timestamp field, Microsoft Access needs to check only the unique index and the timestamp field to see whether the record has changed since it was last retrieved from the server. Otherwise, Microsoft Access must check all the fields in the record. If you add a timestamp field to an attached table, re-attach the table in order to inform Microsoft Access of the new field."
http://technet.microsoft.com/en-us/library/cc917601.aspx
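A minimal sketch of adding such a column to the table above (the column name RowVer is just illustrative); after adding it, re-link the table in Access so it picks up the new field:
ALTER TABLE [dbo].[Users]
    ADD [RowVer] timestamp;  -- rowversion/timestamp value maintained automatically by SQL Server on every update
GO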
Are users accessing the database while you're trying to do stuff with SQL? If so, you will get an error message stating that the database is in use and is read-only. No one can be in the database while you are doing things with it through SQL.
Sounds like a permissions problem. Are you keeping careful track of who is altering the schema? You may have users who aren't permitted to use changes made by certain other users.