Fetching ElasticSearch Results into SQL Server by calling a Web Service using SQL CLR

Code migration due to performance issues:
SQL Server LIKE condition (BEFORE)
SQL Server Full Text Search --> CONTAINS (BEFORE)
Elastic Search (CURRENTLY)
Achieved so far:
We have a web page created in ASP.NET Core which has an autocomplete drop-down of 2.5+ million companies indexed in Elastic Search: https://www.99corporates.com/
Due to performance issues we have successfully shifted our code from SQL Server Full Text Search to Elastic Search, using NEST v7.2.1 and Elasticsearch.Net v7.2.1 in our .NET code.
Still looking for a solution:
If the user does not select a company from the autocomplete list and simply enters a few characters and clicks Go, then a result list should be displayed, which we had done earlier using SQL Server Full Text Search --> CONTAINS.
Can we call the ASP.NET web service which we have created, using SQL CLR and code like: SELECT * FROM dbo.Table WHERE Name IN( dbo.SQLWebRequest('') )?
[System.Web.Script.Services.ScriptMethod()]
[System.Web.Services.WebMethod]
public static List<string> SearchCompany(string prefixText, int count)
{
}
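For context, SearchCompany might look something along these lines with NEST v7 (a sketch only; the node URI, index name, CompanyDoc class, and the choice of a match_phrase_prefix query are assumptions, not taken from the question):
using System;
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical document type for the companies index
public class CompanyDoc
{
    public string Name { get; set; }
}

public static List<string> SearchCompany(string prefixText, int count)
{
    // Assumed local node and index name; adjust to your cluster
    var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
        .DefaultIndex("companies");
    var client = new ElasticClient(settings);

    // Prefix-style match on the company name (one of several possible query types)
    var response = client.Search<CompanyDoc>(s => s
        .Size(count)
        .Query(q => q.MatchPhrasePrefix(m => m
            .Field(f => f.Name)
            .Query(prefixText))));

    return response.Documents.Select(d => d.Name).ToList();
}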
Any better or alternate options?

While that solution (i.e. the SQL-APIConsumer SQLCLR project) "works", it is not scalable. It also requires setting the database to TRUSTWORTHY ON (a security risk), and it loads a few assemblies as UNSAFE, such as Json.NET. That is risky if any of them use static variables for caching, expecting each caller to be isolated / have its own App Domain: SQLCLR is a single, shared App Domain, so static variables are shared across all callers, and multiple concurrent threads can cause race conditions. This is not to say that this is definitely happening, since I haven't seen the code; but if you haven't reviewed the code or tested with multiple concurrent threads, then it's a gamble with regard to stability and predictable, expected behavior.
To a slight degree I am biased, given that I sell a SQLCLR library, SQL#, whose Full version contains a stored procedure that also does this, but: a) it handles security properly via signatures (it does not enable TRUSTWORTHY); b) it allows for handling scalability; c) it does not require any UNSAFE assemblies; and d) it handles more scenarios (better header handling, etc.). It doesn't handle any JSON; it just returns the web service response, and you can unpack that using OPENJSON or something else if you prefer. (Yes, there is a Free version of SQL#, but it does not contain INET_GetWebPages.)
HOWEVER, I don't think SQLCLR is a good fit for this scenario in the first place. In your first two versions of this project (using LIKE and then CONTAINS) it made sense to send the user input directly into the query. But now that you are using a web service to get a list of matching values from that user input, you are no longer confined to that approach. You can, and should, handle the web service / Elastic Search portion of this separately, in the app layer.
Rather than passing the user input into the query, only to have the query pause to get that list of 0 or more matching values, you should do the following:
Before executing any query, get the list of matching values directly in the app layer.
If no matching values are returned, you can skip the database call entirely, as you already have your answer, and respond immediately to the user (a much faster response when there are no matches).
If there are matches, then execute the search stored procedure, sending that list of matches as-is via a Table-Valued Parameter (TVP), which becomes a table variable in the stored procedure. Use that table variable to INNER JOIN against the table rather than doing an IN list, since IN lists do not scale well. Also, be sure to send the TVP values to SQL Server using the IEnumerable<SqlDataRecord> method, not the DataTable approach, as that merely wastes CPU / time and memory (see the sketch after the pseudo-code below).
For example code on how to accomplish this correctly, please see my answer to Pass Dictionary to Stored Procedure T-SQL
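On the database side, the table type and search procedure could look something along these lines (a sketch only; all object names here are hypothetical, not taken from the question):

-- Hypothetical table type holding the matched company names
CREATE TYPE dbo.CompanyNameList AS TABLE (Name NVARCHAR(200) NOT NULL);
GO

CREATE PROCEDURE dbo.Company_Search
    @CompanyNames dbo.CompanyNameList READONLY
AS
BEGIN
    -- INNER JOIN against the TVP instead of a long IN list
    SELECT c.*
    FROM dbo.Company c
    INNER JOIN @CompanyNames list
            ON list.Name = c.Name;
END;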
In C#-style pseudo-code, this would be something along the lines of the following:
List<string> companies = SearchCompany(PrefixText, Count);

if (companies.Count == 0)
{
    Response.Write("Nope");
}
else
{
    using (SqlConnection db = new SqlConnection(connectionString))
    {
        using (SqlCommand batch = db.CreateCommand())
        {
            batch.CommandType = CommandType.StoredProcedure;
            batch.CommandText = "ProcName";

            SqlParameter tvp = new SqlParameter("ParamName", SqlDbType.Structured);
            tvp.Value = MethodThatYieldReturnsList(companies);
            batch.Parameters.Add(tvp);

            db.Open();
            using (SqlDataReader results = batch.ExecuteReader())
            {
                if (results.HasRows)
                {
                    // deal with results
                    Response.Write(results....);
                }
            }
        }
    }
}
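And MethodThatYieldReturnsList could be a streaming iterator along these lines (a sketch, matching the hypothetical dbo.CompanyNameList type from the earlier SQL sketch):

// requires: using System.Collections.Generic; using System.Data;
//           using Microsoft.SqlServer.Server;
private static IEnumerable<SqlDataRecord> MethodThatYieldReturnsList(List<string> companies)
{
    // Schema must match the table type's columns
    SqlMetaData[] schema = { new SqlMetaData("Name", SqlDbType.NVarChar, 200) };

    foreach (string company in companies)
    {
        SqlDataRecord record = new SqlDataRecord(schema);
        record.SetString(0, company);
        yield return record; // rows are streamed; no DataTable is materialized
    }
}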

Done. Got the solution.
Used SQL CLR: https://github.com/geral2/SQL-APIConsumer
exec [dbo].[APICaller_POST]
     @URL = 'https://www.-----/SearchCompany'
    ,@JsonBody = '{"searchText":"GOOG","count":10}'
Let me know if there are any other / better options to achieve this.


Using ServiceStack OrmLite, is there a way to get an execution plan?

Using ServiceStack OrmLite 6.4 and Azure SQL Server (with SQLServerDialect2012), we have an issue with an enum causing excessive stalling and timeouts.
If we just convert it to a string, it is as quick as it should be.
var results = db.Select(q => q.SomeColumn == SomeEnum.Value);                        // -> 3.5 seconds
var results2 = db.Select(q => q.SomeColumn.ToString() == SomeEnum.Value.ToString()); // -> 0.08 seconds
We are using default settings, so the enum in the DB is defined as varchar(255).
Both queries give the same result.
To track down the issue we wanted to see what OrmLite is actually firing, but all we get is a query with some @1, @2, etc., with no indication of what parameter values were used or how they are defined.
All our attempts to get a 1:1 SQL string that we can use to manually test the query and see the results have failed... MiniProfiler was the closest, as it shows the parameter values,
but it does not contain the details necessary to recreate the query and reproduce the issue we have (manually recreating the query gives 80 ms, as above).
Trying to get the execution plan along with the query also fails:
db.ExecuteSql("SET STATISTICS PROFILE ON;");
var results = db.Select(q => q.SomeColumn == SomeEnum.Value);
db.ExecuteSql("SET STATISTICS PROFILE OFF;");
only returns data, not any of the extra info I was hoping for.
I have not been able to find any sites or threads that explain how others get any kind of debug info.
What is the correct next step here?
OrmLite's Logging & Introspection page shows how you can view the SQL generated by OrmLite.
E.g. Configuring a debug logger should log the generated SQL and params:
LogManager.LogFactory = new ConsoleLogFactory(debugEnabled:true);
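If the console logger alone doesn't show the parameter values, OrmLite also exposes a BeforeExecFilter hook and a GetDebugString() helper; a minimal sketch (dbFactory, MyTable, and SomeEnum are placeholders):

// requires: using ServiceStack.Logging; using ServiceStack.OrmLite;
LogManager.LogFactory = new ConsoleLogFactory(debugEnabled: true);

// Print every command, including parameter names and values,
// so the query can be replayed manually in SSMS
OrmLiteConfig.BeforeExecFilter = dbCmd => Console.WriteLine(dbCmd.GetDebugString());

using (var db = dbFactory.Open())
{
    var results = db.Select<MyTable>(q => q.SomeColumn == SomeEnum.Value);
    Console.WriteLine(db.GetLastSql()); // last SQL generated on this connection
}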

NDB Queries Exceeding GAE Soft Private Memory Limit

I currently have an application running in the Google App Engine Standard Environment which, among other things, contains a large database of weather data and a frontend endpoint that generates graphs of this data. The database lives in Google Cloud Datastore, and the Python Flask application accesses it via the NDB library.
My issue is as follows: when I try to generate graphs for WeatherData spanning more than about a week (the data is stored for every 5 minutes), my application exceeds GAE's soft private memory limit and crashes. However, stored in each of my WeatherData entities are the relevant fields that I want to graph, in addition to a very large json string containing forecast data that I do not need for this graphing application. So, the part of the WeatherData entities that is causing my application to exceed the soft private memory limit is not even needed in this application.
My question is thus as follows: is there any way to query only certain properties in the entity, as can be done for specific columns in a SQL-style query? Again, I don't need the entire forecast JSON string for graphing, only a few other fields stored in the entity. The other approach I tried was to fetch only a couple of entities at a time and split the query into multiple API calls, but it ended up taking so long that the page would time out and I couldn't get it to work properly.
Below is my code for how it is currently implemented and breaking. Any input is much appreciated:
wDataCsv = 'Time,' + ','.join(wData.keys())
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc), start=start_date, end=end_date)
for acct in qry.fetch():
    d = [acct.time.strftime(date_string)]
    for attr in wData.keys():
        d.append(str(acct.dict_access(attr)))
        wData[attr].append([acct.time.strftime(date_string), acct.dict_access(attr)])
    wDataCsv += '\n' + ','.join(d)

# Child entity - log of weather at the parent location
class WeatherData(ndb.Model):
    # model for data to save
    ...

    # Function for querying data below a given ancestor between two optional times
    @classmethod
    def time_ordered_query(cls, ancestor_key, start=None, end=None):
        return cls.query(cls.time >= start, cls.time <= end, ancestor=ancestor_key).order(-cls.time)
EDIT: I tried the iterative page fetching strategy described in the link from the answer below. My code was updated to the following:
wDataCsv = 'Time,' + ','.join(wData.keys())
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc), start=start_date, end=end_date)
cursor = None
while True:
    gc.collect()
    fetched, next_cursor, more = qry.fetch_page(FETCHNUM, start_cursor=cursor)
    if fetched:
        for acct in fetched:
            d = [acct.time.strftime(date_string)]
            for attr in wData.keys():
                d.append(str(acct.dict_access(attr)))
                wData[attr].append([acct.time.strftime(date_string), acct.dict_access(attr)])
            wDataCsv += '\n' + ','.join(d)
    if more and next_cursor:
        cursor = next_cursor
    else:
        break
where FETCHNUM = 500. In this case, I am still exceeding the soft private memory limit for queries of the same length as before, and the query takes much, much longer to run. I suspected the problem might be Python's garbage collector not deleting the already-used data that is still referenced, but even when I include gc.collect() I see no improvement.
EDIT:
Following the advice below, I fixed the problem using projection queries. Rather than having a separate projection for each custom query, I simply ran the same projection each time: namely, querying all properties of the entity excluding the JSON string. While this is not ideal, as it still pulls gratuitous information from the database each time, generating an individual projection for each specific query is not scalable due to the exponential growth of the necessary indexes. For this application, since each additional property adds negligible memory (aside from that JSON string), it works!
You can use projection queries to fetch only the properties of interest from each entity; see the sketch after this list. Watch out for the limitations, though, and this still can't scale indefinitely.
You can split your queries across multiple requests (more scalable), but use bigger chunks, not just a couple (you can fetch 500 at a time), and use cursors. Check out the examples in How to delete all the entries from google datastore?
You can bump your instance class to one with more memory (if not done already).
You can prepare intermediate results (also in the datastore) from the big entities ahead of time and use these pre-computed values in the final stage.
Finally, you could try to create and store just portions of the graphs and stitch them together at the end (only if it comes down to that; I'm not sure exactly how it would be done, but I imagine it wouldn't be trivial).
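For the first option, a minimal sketch of an NDB projection query, reusing the model from the question (the temperature property name is hypothetical; the point is that the big forecast JSON property is simply left out of the projection list):

# Only the listed properties are fetched and populated on each entity;
# the large forecast JSON property is never transferred.
qry = WeatherData.time_ordered_query(ndb.Key('Location', loc),
                                     start=start_date, end=end_date)
for acct in qry.fetch(projection=[WeatherData.time, WeatherData.temperature]):
    print(acct.time, acct.temperature)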

Controlling NHibernate search query output regarding parameters

When you use NHibernate to "fetch" a mapped object, it outputs a SELECT query to the database. It outputs this using parameters; so if I query a list of cars based on tenant ID and name, I get:
select Name, Location from Car where tenantID=@p0 and Name=@p1
This has the nice benefit of our database creating (and caching) a query plan based on this query and the result, so when it is run again, the query is much faster as it can load the plan from the cache.
The problem with this is that we are a multi-tenant database, and almost all of our indexes are partition aligned. Our tenants have vastly different data sets; one tenant could have 5 cars, while another could have 50,000. And so because NHibernate does this, it has the net effect of our database creating and caching a plan for the FIRST tenant that runs it. This plan is likely not efficient for subsequent tenants who run the query.
What I WANT to do is force NHibernate NOT to parameterize certain parameters; namely, the tenant ID. So I'd want the query to read:
select Name, Location from Car where tenantID=55 and Name=@p0
I can't figure out how to do this in the HBM.XML mapping. How can I dictate to NHibernate how to use parameters? Or can I just turn parameters off altogether?
OK everyone, I figured it out.
The way I did it was overriding the SqlClientDriver with my own custom driver that looks like this:
public class CustomSqlClientDriver : SqlClientDriver
{
    // Matches ".TenantID=@p0" in the command text and captures the parameter name
    private static Regex _tenantIDReplacer = new Regex(@"\.TenantID=(@p0)", RegexOptions.Compiled);

    public override void AdjustCommand(IDbCommand command)
    {
        var m = _tenantIDReplacer.Match(command.CommandText);
        if (!m.Success)
            return;

        // find the parameter holding the tenant ID value
        var parameterName = m.Groups[1].Value;
        var tenantID = (IDbDataParameter)command.Parameters[parameterName];
        var valueOfTenantID = tenantID.Value;

        // now replace the parameter reference with the literal value
        command.CommandText = _tenantIDReplacer.Replace(command.CommandText, ".TenantID=" + valueOfTenantID);
    }
}
I override the AdjustCommand method and use a Regex to replace the tenantID. This works; not sure if there's a better way, but I really didn't want to have to open up NHibernate and start messing with core code.
You'll have to register this custom driver in the connection.driver_class property of the SessionFactory upon initialization.
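For example, with programmatic configuration this could look something like the following sketch (the XML equivalent sets the same connection.driver_class property):

var cfg = new NHibernate.Cfg.Configuration().Configure();

// Point NHibernate at the custom driver instead of the stock SqlClientDriver
cfg.SetProperty(NHibernate.Cfg.Environment.ConnectionDriver,
                typeof(CustomSqlClientDriver).AssemblyQualifiedName);

var sessionFactory = cfg.BuildSessionFactory();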
Hope this helps somebody!

Where does EF6 do the where clause: at SQL or at the client?

I am trying to start using EF6 for a project. My database is already filled with millions of records.
I can't find a proper explanation of how EF sends T-SQL to SQL Server, and I am afraid that I am going to download a bunch of data to the user for no reason.
In the code below I have found three ways to get my data into a List<>, but I am not sure which one is the right way that applies the WHERE clause on the SQL side.
I do not want to fill the client with millions of records and then query (filter) that data at the client.
using (rgtBaza baza = new rgtBaza())
{
    // Option 1: raw SQL with parameters
    var t1 = baza.Database.SqlQuery<CJE_DOC>(
        "select * from cje_doc where datum between @od and @do",
        new SqlParameter("od", this.dateTimePickerOD.Value.Date),
        new SqlParameter("do", this.dateTimePickerDO.Value.Date)).ToList();

    // Option 2: LINQ method syntax
    var t2 = baza.CJE_DOC
        .Where(s => s.DATUM.Value >= this.dateTimePickerOD.Value.Date
                 && s.DATUM.Value <= this.dateTimePickerDO.Value.Date)
        .ToList();

    // Option 3: LINQ query syntax
    var query = from b in baza.CJE_DOC
                where b.DATUM.Value >= this.dateTimePickerOD.Value.Date
                   && b.DATUM.Value <= this.dateTimePickerDO.Value.Date
                select b;
    var t3 = query.ToList();

    this.dataGridViewCJENICI.DataSource = t3;
}
In all 3 cases the filtering will happen on the database side; the filtering (or WHERE clause) will not take place on the client side.
If you want to verify that this is true, especially for your last 2 options, add some logging so that you can see the generated SQL:
baza.Database.Log = s => Console.WriteLine(s);
In this case, since you are using EF already, choose the 2nd or 3rd option; they are equivalent, just with different syntax. Pick your favorite.
In all of those examples, EF6 will generate a SQL query including the where clause - it won't perform the where clause on the client.
It won't actually retrieve any data from the database until you iterate through the results, which in the examples above, is when you call .ToList().
EF6 would only run the filter on the client if you called something like:
baza.CJE_DOC.ToList().Where(x => x.Field == value)
In this instance, it would retrieve the entire table when you called ToList(), and then use a client-side Linq query to filter the results in the where clause.
Any of the 3 will run the query on the SQL Server.
EF relies on LINQ's deferred execution model to build up an expression tree. Once you take an action that causes the expression to be enumerated (e.g. calling ToList(), ToArray(), or any of the other To*() methods), it will convert the expression tree to SQL, send the query to the server, and then start returning the results.
One of the side effects of this is that when using the query or lambda syntax, expressions that EF does not understand how to convert to SQL will cause an exception.
If you absolutely need to use some code that EF can't handle, you can break your code into multiple segments -- filtering things down as far as possible via code that can be converted to SQL, using the AsEnumerable() method to "close off" the EF expression, and doing your remaining filtering or transformations using Linq to Objects.
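For example, a sketch of that split using the context from the question (SomeCustomPredicate and the date variables are placeholders for logic EF can't translate):

var results = baza.CJE_DOC
    .Where(d => d.DATUM.Value >= start && d.DATUM.Value <= end) // translated to SQL, runs on the server
    .AsEnumerable()                                             // closes off the EF expression tree
    .Where(d => SomeCustomPredicate(d))                         // LINQ to Objects, runs in memory
    .ToList();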

Why does the CodeIgniter database cache regenerate the file?

I've used the CodeIgniter database cache for a while and it works quite well, but now I have a weird problem with a specific query. The steps are as follows:
My cache directory is empty.
I open the URL /myproject/mycontroller/myaction/
The file is cached in the mycontroller-myaction directory (I open the file with my notepad to be sure that it contains the correct data).
I open /myproject/mycontroller/myaction/ again, expecting the data to be retrieved from the cache, and it turns out that the data is retrieved from the database and the file is regenerated. I don't know why, but the point is that the generated file is useless.
In case it's important, here is some additional info:
The query is a stored procedure.
I have other queries that are working perfectly.
I'd really appreciate your help; if you need specific data just let me know.
Thanks.
Using Xdebug I found out that in the DB_driver.php file, in the query function, there is a condition at line 277 that reads as follows:
// Is query caching enabled? If the query is a "read type"
// we will load the caching class and return the previously
// cached query if it exists
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT')))
{
    if ($this->_cache_init())
    {
        $this->load_rdriver();
        if (FALSE !== ($cache = $this->CACHE->read($sql)))
        {
            return $cache;
        }
    }
}
where the query must be a SELECT; but I am calling a stored procedure, and my model is:
public function cobertura($param1 = NULL) {
    $query = $this->db->query("[SP_NAME] ?", array($param1));
    return $query->result();
}
So, as I'm using a stored procedure instead of a SELECT statement, the condition returns FALSE and the cache file is generated again.
How could I modify that function so that it detects my stored procedure call as a valid statement?
Thank you very much.
Well, the solution for this problem was to modify this
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT')))
to this
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT') OR strpos($sql, '[')))
This way I tell CI that when my SQL statement starts with [ it is a SELECT query too.
And when I have a stored procedure that is not a SELECT statement, I simply disable the cache (see the note below).
If you have a better solution, please share it.
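For the "disable the cache" part, CodeIgniter's database driver exposes per-query toggles; a minimal sketch (the procedure name is a placeholder):

// Skip the cache for a stored procedure that is not a SELECT
$this->db->cache_off();
$query = $this->db->query("[SP_THAT_MODIFIES_DATA] ?", array($param1));
$this->db->cache_on(); // re-enable for subsequent read queries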
You should not use the query cache in CodeIgniter. It is not safe to use unless you know exactly what you are doing.
Instead, I would create a cacheQuery() function or something like it, to at least give you control over caching for queries; a sketch is at the end of this answer. The assumption that any query that returns (or might return) data can be cached is very risky.
By replacing CI's attempted automation and doing it manually, you will also gain some speed: if you profile with Xdebug, you will see that all the guessing CI does about your queries adds a lot of overhead to the query function.
The solution posted above is also a bit dodgy, because not all stored procedures return data, and the square bracket could be anywhere in a query, so that check can get it wrong: strpos() returns 0 (falsy) when the bracket is the first character, and a truthy value when the bracket appears anywhere later in the query, even if it isn't a stored procedure call; for example, DELETE FROM table1 WHERE column1="[xxx]".
Checks like stristr($sql, 'SELECT') and strpos($sql, '[') are not good. CodeIgniter is simple, tidy, consistent and well organised, but the actual quality of the PHP code in it is extremely poor. To be fair, part of this is because it maintains compatibility with much older versions of PHP, but there are several things that are inexplicably bad. When it comes to this kind of thing, the CodeIgniter policy is "If it works 90% or more of the time then it's OK". This is of course not good for large, serious enterprise applications.
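A minimal sketch of such a cacheQuery() wrapper, assuming CodeIgniter's Caching Driver is loaded (the method name, key scheme, and TTL are placeholders, not CI built-ins):

// In a model or MY_Model: explicit, opt-in caching per query
public function cache_query($sql, $binds = array(), $ttl = 300)
{
    // Key derived from the SQL and its bindings
    $key = 'dbcache_' . md5($sql . serialize($binds));

    $cached = $this->cache->get($key);
    if ($cached !== FALSE)
    {
        return $cached;
    }

    $result = $this->db->query($sql, $binds)->result();
    $this->cache->save($key, $result, $ttl);
    return $result;
}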
