Why does codeigniter database cache regenerates file? - sql-server

I've used Codeigniter Database Cache for a while and it works quite good, but now I have a weird problem with a specific query. The steps are as follow:
My cache directory is empty.
I open the URL /myproject/mycontroller/myaction/
The file is cached in mycontroller-myaction directory (I open the file with my notepad to be sure that it contains the correct data.
I open again the /myproject/mycontroller/myaction/ expecting the data to be retrieved from the cache and turns out that the data is retrieved from the database and the file y regenerated. I don't know why but the point is that the file generated is being useless.
If is important to you I give you the next info:
The query is a stored procedure.
I have other queries that are working perfectly.
I'd really appreciate your help, if you need specific data just let me know.
Thanks.
Using xdebug I found out that in the DB_driver.php file in the query function, there is a condition in the line 277 that says as follow:
// Is query caching enabled? If the query is a "read type"
// we will load the caching class and return the previously
// cached query if it exists
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT')))
{
if ($this->_cache_init())
{
$this->load_rdriver();
if (FALSE !== ($cache = $this->CACHE->read($sql)))
{
return $cache;
}
}
}
where the query must be a SELECT, and I am using a stored procedure and my model is:
public function cobertura($param1= NULL) {
$query = $this->db->query("[SP_NAME] ?", array($param1));
}
return $query->result();
}
So as I'm using a stored procedure instead of a SELECT statement, the condition returns FALSE and generates again the cache file.
How could I modify that function in order to detect my stored procedure as a valid statement?
Thank you very much.

Well, the solution for this problem was modify this
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT')))
by this
if ($this->cache_on == TRUE AND (stristr($sql, 'SELECT') OR strpos($sql, '[')))
This way I tell CI that when my sql statment starts with [ is a SELECT query too.
And when I have an stored procedure that is not a select statement simply disable the cache.
If you have a better solution you can share.

You should not use the query cache in CodeIgniter. It is not safe to use unless you know exactly what you are doing.
Instead I would create a cacheQuery function or something like it to at least give you control over caching for queries. The assumption that any query that returns data or might return data can be cached is very risky.
By replacing CI's attempted automation and doing it manually you will also gain some speed. If you xdebug profile all the guessing CI is doing about your queries add a lot of overhead to the query function.
The solution posted above might also be a bit dodgy because not all stored procedures return data and the square bracket could be anywhere in your queries so that check can get it wrong. It will return false if the first character and return true if the character is anywhere in the query even if it doesn't make up a stored procedure. For example, DELETE FROM table1 WHERE column1="[xxx]".
This kind of thing stristr($sql, 'SELECT') and strpos($sql, '[') are not good. CodeIgniter is simple, tidy, consistent and well organised but the actually quality of PHP code in it is extremely poor. To be fair, part of this is because it maintains compatibility with much older versions of PHP but there are several things that are inexplicably bad. When it comes to this kind of thing the CodeIgniter policy is "If it works 90% or more of the time then it's ok". This is of course not good for large serious enterprise applications.

Related

Fetching ElasticSearch Results into SQL Server by calling Web Service using SQL CLR

Code Migration due to Performance Issues :-
SQL Server LIKE Condition ( BEFORE )
SQL Server Full Text Search --> CONTAINS ( BEFORE )
Elastic Search ( CURRENTLY )
Achieved So Far :-
We have a web page created in ASP.Net Core which has a Auto Complete Drop Down of 2.5+ Million Companies Indexed in Elastic Search https://www.99corporates.com/
Due to performance issues we have successfully shifted our code from SQL Server Full Text Search to Elastic Search and using NEST v7.2.1 and Elasticsearch.Net v7.2.1 in our .Net Code.
Still looking for a solution :-
If the user does not select a company from the Auto Complete List and simply enters a few characters and clicks on go then a list should be displayed which we had done earlier by using the SQL Server Full Text Search --> CONTAINS
Can we call the ASP.Net Web Service which we have created using SQL CLR and code like SELECT * FROM dbo.Table WHERE Name IN( dbo.SQLWebRequest('') )
[System.Web.Script.Services.ScriptMethod()]
[System.Web.Services.WebMethod]
public static List<string> SearchCompany(string prefixText, int count)
{
}
Any better or alternate option
While that solution (i.e. the SQL-APIConsumer SQLCLR project) "works", it is not scalable. It also requires setting the database to TRUSTWORTHY ON (a security risk), and loads a few assemblies as UNSAFE, such as Json.NET, which is risky if any of them use static variables for caching, expecting each caller to be isolated / have their own App Domain, because SQLCLR is a single, shared App Domain, hence static variables are shared across all callers, and multiple concurrent threads can cause race-conditions (this is not to say that this is something that is definitely happening since I haven't seen the code, but if you haven't either reviewed the code or conducted testing with multiple concurrent threads to ensure that it doesn't pose a problem, then it's definitely a gamble with regards to stability and ensuring predictable, expected behavior).
To a slight degree I am biased given that I do sell a SQLCLR library, SQL#, in which the Full version contains a stored procedure that also does this but a) handles security properly via signatures (it does not enable TRUSTWORTHY), b) allows for handling scalability, c) does not require any UNSAFE assemblies, and d) handles more scenarios (better header handling, etc). It doesn't handle any JSON, it just returns the web service response and you can unpack that using OPENJSON or something else if you prefer. (yes, there is a Free version of SQL#, but it does not contain INET_GetWebPages).
HOWEVER, I don't think SQLCLR is a good fit for this scenario in the first place. In your first two versions of this project (using LIKE and then CONTAINS) it made sense to send the user input directly into the query. But now that you are using a web service to get a list of matching values from that user input, you are no longer confined to that approach. You can, and should, handle the web service / Elastic Search portion of this separately, in the app layer.
Rather than passing the user input into the query, only to have the query pause to get that list of 0 or more matching values, you should do the following:
Before executing any query, get the list of matching values directly in the app layer.
If no matching values are returned, you can skip the database call entirely as you already have your answer, and respond immediately to the user (much faster response time when no matches return)
If there are matches, then execute the search stored procedure, sending that list of matches as-is via Table-Valued Parameter (TVP) which becomes a table variable in the stored procedure. Use that table variable to INNER JOIN against the table rather than doing an IN list since IN lists do not scale well. Also, be sure to send the TVP values to SQL Server using the IEnumerable<SqlDataRecord> method, not the DataTable approach as that merely wastes CPU / time and memory.
For example code on how to accomplish this correctly, please see my answer to Pass Dictionary to Stored Procedure T-SQL
In C#-style pseudo-code, this would be something along the lines of the following:
List<string> = companies;
companies = SearchCompany(PrefixText, Count);
if (companies.Length == 0)
{
Response.Write("Nope");
}
else
{
using(SqlConnection db = new SqlConnection(connectionString))
{
using(SqlCommand batch = db.CreateCommand())
{
batch.CommandType = CommandType.StoredProcedure;
batch.CommandText = "ProcName";
SqlParameter tvp = new SqlParameter("ParamName", SqlDbType.Structured);
tvp.Value = MethodThatYieldReturnsList(companies);
batch.Paramaters.Add(tvp);
db.Open();
using(SqlDataReader results = db.ExecuteReader())
{
if (results.HasRows)
{
// deal with results
Response.Write(results....);
}
}
}
}
}
Done. Got the solution.
Used SQL CLR https://github.com/geral2/SQL-APIConsumer
exec [dbo].[APICaller_POST]
#URL = 'https://www.-----/SearchCompany'
,#JsonBody = '{"searchText":"GOOG","count":10}'
Let me know if there is any other / better options to achieve this.

Cloudant Database Map Reduce

I am new to cloudant , no-sql data base (i had worked on mongodb )
1) is there any cloudant ui to write the queires to find the resultset for developing.
2) how to create map-reduce in cloudant ?..
can u please reply me or send your thoughts.
The search indexes are written in JavaScript (at the moment, Cloduant has launched their own "Cloudant Query" which promises to be easier to work with but I haven't had the time to try it properly yet.)
Say you have documents in your DB which contain a field called "UserName" and you want to create a view on all these. You could write a function like this;
function(doc) {
if ( typeof doc.UserName !== "undefined" ) {
emit([doc.UserName], doc._id);
}
}
For example (it will output the user names and document ids)
If a given user name could be associated with multiple documents you could do this, for example;
function(doc) {
if ( typeof doc.UserName !== "undefined" ) {
emit([doc.UserName,doc._id], 1);
}
}
and also use the built-in "count" or "sum" reduce functions that Cloudant provides to tally the number of documents a given user name is associated with etc.
You can use the UI in the Cloudant DB dashboard to execute queries or (as I personally favour) use a tool like Postman (https://www.getpostman.com/)
One word of warning though; error- and sanity -checking of your JavaScript code is pretty much non-existent and you'll only know that something isn't working when you hit "save & build index" which can be a major pain if you're working on large databases (it can grind the whole thing to a halt). A pro tip, therefore, is to work out your indexes on smaller data sets in some safe little sandbox database before you let it lose on anything important...
All of this is supposedly going to be Much Better with Cloudant Query.

Entity Framework: Max. number of "subqueries"?

My data model has an entity Person with 3 related (1:N) entities Jobs, Tasks and Dates.
My query looks like
var persons = (from x in context.Persons
select new {
PersonId = x.Id,
JobNames = x.Jobs.Select(y => y.Name),
TaskDates = x.Tasks.Select(y => y.Date),
DateInfos = x.Dates.Select(y => y.Info)
}).ToList();
Everything seems to work fine, but the lists JobNames, TaskDates and DateInfos are not all filled.
For example, TaskDates and DateInfos have the correct values, but JobNames stays empty. But when I remove TaskDates from the query, then JobNames is correctly filled.
So it seems that EF can only handle a limited number of these "subqueries"? Is this correct? If so, what is the max. number of these "subqueries" for a single statement? Is there a way to work around these issue without having to make more than one call to the database?
(ps: I'm not entirely sure, but I seem to remember that this query worked in LINQ2SQL - could it be?)
UPDATE
I'm getting crazy about this. I tried to repro the issue from ground up using a fresh, simple project (to post the entire piece of code here, not only an oversimplified example) - and I found I wasn't able to repro it. It still happens within our existing code base (apparently there's more behind this problem, but I cannot share this closed code base, unfortunately).
After hours and hours of playing around I found the weirdest behavior:
It works great when I don't SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; before calling the LINQ statement
It also works great (independent of the above) when I don't use a .Take() to only get the first X rows
It also works great when I add an additional .Where() statements to cut the the number of rows returned from SQL Server
I didn't find any comprehensible reason why I see this behavior, but I started to look at the SQL: Although EF generates the exact same SQL, the execution plan is different when I use READ UNCOMMITTED. It returns more rows on a specific index in the middle of the execution plan, which curiously ends in less rows returned for the entire SQL statement - which in turn results in the missing data, that is the reason for my question to begin with.
This sounds very confusing and unbelievable, I know, but this is the behavior I see. I don't know what else to do, I don't even know what to google for at this point ;-).
I can fix my problem (just don't use READ UNCOMMITTED), but I have no idea why it occurs and if it is a bug or something I don't know about SQL Server. Maybe there's some "magic max number of allowed results in sub-queries" in SQL Server? At least: As far as I can see, it's not an issue with EF itself.
A little late, but does calling ToList() on each subquery produce the required effect?
var persons = (from x in context.Persons
select new {
PersonId = x.Id,
JobNames = x.Jobs.Select(y => y.Name.ToList()),
TaskDates = x.Tasks.Select(y => y.Date).ToList(),
DateInfos = x.Dates.Select(y => y.Info).ToList()
}).ToList();

How to get all results from solr query?

I executed some query like "Address:Jack*". It show numFound = 5214 and display 100 documents in results page(I changed default display results from 10 to 100).
How can I get all documents.
I remember myself doing &rows=2147483647
2,147,483,647 is integer's maximum value. I recall using a number bigger than that once and having a NumberFormatException because it couldn't be parsed into an int. I don't know if they use Long nowadays, but 2 billion rows is normally more than enough.
Small note:
Be careful if you are planning to do this in production. If you do a query like * : * and your index is big, you could transferring a couple of gigabytes in that query.
If you know you won't have many docs, go ahead and use integer's max value.
On the other hand, if you are doing a one-time script and just need to dump all results (for example document ID's) then this approach is valid, if you don't mind waiting 3-5 minutes for a query to return.
Don't use &rows=2147483647
Don't use Integer.MAX_VALUE(2147483647) as value of rows in production. This will heavily slow down your query even if you have a small resultset, because solr preallocates a queue in this size. see https://issues.apache.org/jira/browse/SOLR-7580
I strongly suggest to use Exporting Result Sets
It’s possible to export fully sorted result sets using a special rank query parser and response writer specifically designed to work together to handle scenarios that involve sorting and exporting millions of records.
Or I suggest to use Deep Paging.
Simple Pagination is a easy thing when you have few documents to read and all you have to do is play with start and rows parameters. But this is not a feasible way when you have many documents, I mean hundreds of thousands or even millions.
This is the kind of thing that could bring your Solr server to their knees.
For typical applications displaying search results to a human user,
this tends to not be much of an issue since most users don’t care
about drilling down past the first handful of pages of search results
— but for automated systems that want to crunch data about all of the
documents matching a query, it can be seriously prohibitive.
This means that if you have a website and are paging search results, a real user do not go so further but consider on the other hand what can happen if a spider or a scraper try to read all the website pages.
Now we are talking of Deep Paging.
I’ll suggest to read this amazing post:
https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
And take a look at this document page:
https://solr.apache.org/guide/pagination-of-results.html
And here is an example that try to explain how to paginate using the cursors.
SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);
solrQuery.setQuery("*:*");
solrQuery.addSort("id", ORDER.asc); // Pay attention to this line
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
QueryResponse rsp = solrClient.query(solrQuery);
String nextCursorMark = rsp.getNextCursorMark();
for (SolrDocument d : rsp.getResults()) {
...
}
if (cursorMark.equals(nextCursorMark)) {
done = true;
}
cursorMark = nextCursorMark;
}
Returning all the results is never a good option as It would be very slow in performance.
Can you mention your use case ?
Also, Solr rows parameter helps you to tune the number of the results to be returned.
However, I don't think there is a way to tune rows to return all results. It doesn't take a -1 as value.
So you would need to set a high value for all the results to be returned.
What you should do is to first create a SolrQuery shown below and set the number of documents you want to fetch in a batch.
int lastResult=0; //this is for processing the future batch
String query = "id:[ lastResult TO *]"; // just considering id for the sake of simplicity
SolrQuery solrQuery = new SolrQuery(query).setRows(500); //setRows will set the required batch, you can change this to whatever size you want.
SolrDocumentList results = solrClient.query(solrQuery).getResults(); //execute this statement
Here I am considering an example of search by id, you can replace it with any of your parameter to search upon.
The "lastResult" is the variable you can change after execution of the first 500 records(500 is the batch size) and set it to the last id got from the results.
This will help you execute the next batch starting with last result from previous batch.
Hope this helps. Shoot up a comment below if you need any clarification.
For selecting all documents in dismax/edismax via Solarium php client, the normal query syntax : does not work. To select all documents set the default query value in solarium query to empty string. This is required as the default query in Solarium is :. Also set the alternative query to :. Dismax/eDismax normal query syntax does not support :, but the alternative query syntax does.
For more details following book can be referred
http://www.packtpub.com/apache-solr-php-integration/book
As the other answers pointed out, you can configure the rows to be max integer to yield back all the results for a query.
I would recommend though to use Solr feature of pagination, and build a function that will return for you all the results using the cursorMark API. The gist of it is you set the cursorMark parameter to '*', you set the page size(rows parameter), and on each result you'll get a cursorMark for the next page, so you execute the same query only with the cursorMark given from the last result. This way you'll have more flexibility on how much of the results you want back, in a much more performant way.
The way I dealt with the problem is by running the query twice:
// Start with your (usually small) default page size
solrQuery.setRows(50);
QueryResponse response = solrResponse(query);
if (response.getResults().getNumFound() > 50) {
solrQuery.setRows(response.getResults().getNumFound());
response = solrResponse(query);
}
It makes a call twice to Solr, but gets you all matching records....with the small performance penalty.
query.setRows(Integer.MAX_VALUE);
works for me!!

Django hits the database for each filter() call

I have some Django 1.3 code that looks up many model instances in a loop, ie.
my_set = myinstance.subitem_set.all()
for value in values:
existing = my_set.filter(attr_name=value)
if len(existing) == 1:
...
This works, but profiling SQL queries shows that it hits the DB on each iteration. According to https://docs.djangoproject.com/en/1.3/ref/models/querysets/ iterating over the related items should eagerly load them, so I tried calling:
list(my_set)
However, this doesn't help. It does do a query to load all the sub-items, but then it still does an individual query for each sub-item inside the loop. How do I get it to use the cached set and not hit the DB each time? The DB is PostgreSQL 8.4.
The problem is in this line:
if len(existing) == 1:
From Django documentation:
len(). A QuerySet is evaluated when you call len() on it. This, as you might expect, returns the length of the result list.
Note: Don't use len() on QuerySets if all you want to do is determine the number of records in the set. It's much more efficient to handle a count at the database level, using SQL's SELECT COUNT(*), and Django provides a count() method for precisely this reason. See count() below.
So in your case it executes the query each time when you call len(existing). The more effective way is:
existing.count() == 1
This will also hit the database each time you call it but it will execute SELECT COUNT(*) which is faster.

Resources