Entity Framework, full-text search and temporary tables - sql-server

I have a LINQ-2-Entity query builder, nesting different kinds of Where clauses depending on a fairly complex search form. Works great so far.
Now I need to use a SQL Server fulltext search index in some of my queries. Is there any chance to add the search term directly to the LINQ query, and have the score available as a selectable property?
If not, I could write a stored procedure to load a list of all row IDs matching the full-text search criteria, and then use a LINQ-2-Entity query to load the detail data and evaluate other optional filter criteria in a loop per row. That would, of course, be a very bad idea performance-wise.
Another option would be to use a stored procedure to insert all row IDs matching the full-text search into a temporary table, and then let the LINQ query join the temporary table. Question is: how to join a temporary table in a LINQ query, as it cannot be part of the entity model?

I think I would probably suggest a hybrid approach.
Write a stored procedure which returns all the information you need.
Map an entity to the results. The entity can be created for this sole purpose. Alternately, use version 4 of the Entity Framework, which allows mapping complex types to stored procedure results. The point is that instead of trying to coerce the procedure results into existing entity types, we're going to handle them as their own type.
Now you can build a LINQ to Entities query.
Sample query:
var q = from r in Context.SearchFor("searchText")
        let fooInstance = (r.ResultType == "Foo")
            ? Context.Foos.Where(f => f.Id == r.Id).FirstOrDefault()
            : null
        where (fooInstance == null) || (fooInstance.SpecialCriterion == r.SpecialCriterion)
        select new
        {
            // ...
        };
This is off the top of my head, so the syntax might not be right. The important point is treating search results as an entity.
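For illustration, the mapped result type could be as simple as the following; the property names here are assumptions inferred from the sample query, not a prescribed shape:
// Hypothetical complex type mapped to the search procedure's result set.
public class SearchResult
{
    public int Id { get; set; }
    public string ResultType { get; set; }       // e.g. "Foo"
    public string SpecialCriterion { get; set; }
    public double Rank { get; set; }             // full-text relevance score
}
Context.SearchFor("searchText") would then be an EF function import returning this type.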
Alternately: Use a more flexible FTS system, which can do the "special", per-type filtering when building the index.

I've seen code like this for EF4:
var query = context.ExecuteStoreQuery<Person>(
    "SELECT * FROM People WHERE FREETEXT(*,{0})",
    searchText
).AsQueryable();
This may be simpler than creating a stored proc or UDF in some cases.
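And if you also want the relevance score the question asks about, FREETEXTTABLE exposes a RANK column you can join on. A sketch, assuming People has an Id key and a PersonSearchResult class that mirrors the People columns plus a Score property:
var results = context.ExecuteStoreQuery<PersonSearchResult>(
    @"SELECT p.*, ft.RANK AS Score
      FROM People p
      JOIN FREETEXTTABLE(People, *, {0}) ft ON p.Id = ft.[KEY]
      ORDER BY ft.RANK DESC",
    searchText);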

Related

Specify columns while appending Snowpark Python Dataframe to table

So right now, I have a Dataframe created using the session.createDataFrame() in Python. The intention is to append this Dataframe to an existing table object in Snowflake.
However, the schema of the source dataframe doesn't match exactly with the schema of the target table. In Snowpark Scala, the DataFrameWriter object has an option() method (see "Saving/Appending Dataframe to a table") that allows the specification of column order, and hence allows skipping columns from the dataframe, since the columns can be matched by their names.
However, Snowpark Python lacks option() on DataFrameWriter at the moment. This forces Snowflake to require that the schemas and column counts of source and target match, else an error is thrown.
Not sure when Snowpark for Python will receive this feature, but in the interim, is there any alternative (apart from hardcoding column names in the INSERT query)?
You are right that Snowpark does not make inserting novel records easy. But it is possible. I did it with the Snowpark Java SDK, which lacked any source/docs, just banging my head on the desk until it worked.
I first did a select against the target table (see the first line), then got the schema, then created a new Row object with the correct order and types. Use the column "order" mode, not the column "name" mode. It's also really finicky about types: it doesn't like java.util.Dates but wants Timestamps, doesn't like Integers but needs Longs, etc.
Then do an "append" -> "saveAsTable". By some miracle it worked. Agreed, it would be fantastic if they accepted a Map<String, Object> to insert a row, or let you map columns using names. But they probably want to discourage this given the nature of warehouse performance for row-based operations.
In Java...
// Grab the target table's schema via a one-row select
DataFrame dfSchema = session.sql("select * from TARGET_TABLE limit 1");
StructType schema = dfSchema.schema();
System.out.println(schema);
// Build a Row whose values match the schema's column order and types
Row[] rows = new Row[]{Row.fromArray(new Object[]{endpoint.getDatabaseTable(), statusesArr, numRecords, Integer.valueOf(filenames.size()).longValue(), filenamesArr, urlsArr, startDate, endDate})};
DataFrame df = session.createDataFrame(rows, schema);
System.out.println(df.showString(0, 120));
// Append the new row(s) to the existing table
df.write().mode("Append").saveAsTable("TARGET_TABLE");
In the save_as_table method, use the parameter column_order="name". See Snowflake save_as_table docs. This should match the columns by name and allow you to omit missing columns without the column number mismatch error.
It's also good practice to include a schema when creating your dataframe. See the Snowflake create_dataframe docs on using the StructType class.
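Putting the two together in Snowpark Python, a minimal sketch (assuming an existing session, a rows list, and a StructType schema):
df = session.create_dataframe(rows, schema=schema)
df.write.mode("append").save_as_table("TARGET_TABLE", column_order="name")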

Mapping multiple objects in dapper using Split-on and Query Multiple together

Consider Role-Employee-Address
An employee can have only one role but can have many addresses. So using split-on is not efficient, as we may get duplicates of the role and employee. We can use QueryMultiple, but I feel that if there is a way to capture the role and employee together in one result set and the addresses in another, that would be better.
In that case, instead of returning role and employee separately, we can directly join both in a single query and split on some column while mapping.
I'm expecting something like this
string query = "StoredProcedure";
using (var multi = connection.QueryMultiple(query, null))
{
empRole = multi.Read<Employee,Role>().Single().SplitOn("Id");
add = multi.Read<Address>().ToList();
}
Is there any way we could do something like this, using both techniques together?
Correct, you need a One-To-Many mapping which is not natively supported by Dapper, but can be easily implemented if the database you're using supports JSON. Here's an article I wrote (along with working samples) that shows how you can do it using SQL Server / Azure SQL:
https://medium.com/dapper-net/one-to-many-mapping-with-dapper-55ae6a65cfd4
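The core idea from that article, condensed into a sketch; the table and column names, the EmployeeRow type, and its AddressesJson/Addresses properties are all assumptions for illustration (JsonSerializer is System.Text.Json; Json.NET works just as well):
const string sql = @"
    SELECT e.Id, e.Name, r.Name AS RoleName,
           (SELECT a.Street, a.City
              FROM Addresses a
             WHERE a.EmployeeId = e.Id
               FOR JSON PATH) AS AddressesJson
      FROM Employees e
      JOIN Roles r ON r.Id = e.RoleId";
var rows = connection.Query<EmployeeRow>(sql);
foreach (var row in rows)
    row.Addresses = JsonSerializer.Deserialize<List<Address>>(row.AddressesJson ?? "[]");
The employee/role columns map normally, while the addresses arrive as one JSON column per employee row, so no duplicate employee rows come back.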

SQL query to search for patterns against a specific string

I am currently dealing with a table containing a list of hundreds of part number patterns for discount purposes.
For example:
1) [FR]08-[01237]0[67]4E-%
2) _10-[01]064[CD]-____
3) F12-[0123]0[67]4C-%
I have a string search criteria: F10-1064C-02TY and I am trying to find out which pattern(s) matches that particular string. For this example my query would return the second pattern. The objective is to find the correct part discount based on the matched pattern(s).
What is the best approach in handling this type of problem? Is there a simple and elegant approach or does this involve some complex TSQL procedure?
The pattern on the right side of a LIKE clause can be any expression, which includes values from a table. This means you can use your patterns table in a query, i.e.
SELECT PatternId, Pattern
FROM Patterns
WHERE 'F10-1064C-02TY' LIKE Pattern
You can build from here - if your part numbers are stored in a different table, join to that table using the LIKE clause as a join criterion, or build a procedure that takes a part number parameter, or whatever else fits your requirements.
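For example, assuming the part numbers live in a PartNumbers table (names assumed here), the join version looks like:
SELECT pn.PartNumber, p.PatternId, p.Pattern
FROM PartNumbers pn
JOIN Patterns p ON pn.PartNumber LIKE p.Pattern
Each part number row comes back once per pattern it matches, ready to be joined to the discount data.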

Using an IN clause with LINQ-to-SQL's ExecuteQuery

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it, but that will prevent the query from ever ending up in the query cache.
What's the right way to do this?
NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.
Table-Valued Parameters
On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.
The bad way: Dynamic SQL
One way to pass this list in was to use dynamic SQL.
IEnumerable<long> assetIDs = GetAssetIDs();
var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + string.Join(",", assetIDs) + ")";
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);
This is a very bad thing to do:
- Dynamic SQL gives attackers a weakness by making SQL injection attacks easier. Since we are usually just concatenating numbers together, this is highly unlikely, but if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT ' and our site is dead.
- Stored procedures can't have dynamic SQL, so the query had to be stored in code instead of in the DB schema.
- Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.
However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.
The good way: Table-Valued Parameters
SQL Server 2008 adds a new ability: users can define a table-valued database type.
Most other types are scalar (they only return one value), but table-valued types can hold multiple values, as long as the values are tabular.
We've defined three types: varchar_array, int_array, and bigint_array.
CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)
Both stored procedures and programmatically defined SQL queries can use these table-valued types.
IEnumerable<long> assetIDs = GetAssetIDs();
return Config.GetDatabase().ExecEnumerableSql(dr => dr.GetString("Name"),
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)",
    new Parameter("@AssetIDs", assetIDs));
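For reference, ExecEnumerableSql and Parameter are in-house wrappers. With plain ADO.NET (System.Data / System.Data.SqlClient) the same query looks roughly like this, as a sketch assuming the bigint_array type above and an open SqlConnection named connection:
var ids = new DataTable();
ids.Columns.Add("Id", typeof(long));
foreach (long assetID in assetIDs)
    ids.Rows.Add(assetID);
using (var cmd = new SqlCommand(
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)", connection))
{
    var p = cmd.Parameters.AddWithValue("@AssetIDs", ids);
    p.SqlDbType = SqlDbType.Structured; // send as a table-valued parameter
    p.TypeName = "dbo.bigint_array";    // the TYPE created above
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine(reader.GetString(0)); // Name
}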
Advantages
- Can be used in both stored procedures and programmatic SQL without much effort.
- Not vulnerable to SQL injection.
- Cacheable, stable queries.
- Does not lock the schema table.
- Not limited to 8k of data.
- Less work done by both the DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
- "Typical use" statistics can be derived by the query analyzer, which can lead to even better performance.
Disadvantages
- Only works on SQL Server 2008 and above.
- Rumor has it that TVPs are prebuffered in their entirety before execution of the query, which means phenomenally large TVPs may be rejected by the server. Further investigation of this rumor is ongoing.
Further reading
This article is a great resource for learning more about TVPs.
If you can't use table-valued parameters, this option is a little faster than the XML option while still allowing you to stay away from dynamic SQL: pass the joined list of values as a string parameter, and parse the delimited string back to values in your query. Please see this article for instructions on how to do the parsing efficiently.
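On modern SQL Server (2016 and later), the built-in STRING_SPLIT function does that parsing for you. A sketch reusing the Asset example from above, with @AssetIDs as the comma-joined string parameter:
SELECT Name
FROM Asset
WHERE AssetID IN (SELECT CAST(value AS bigint)
                  FROM STRING_SPLIT(@AssetIDs, ','))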
I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.
This works for SQL Server 2005 (and later):
create procedure IGetAListOfValues
    @Ids xml -- This will receive a list of values
as
begin
    -- You can load them into a temp table or use it as a subquery:
    create table #Ids (Id int);
    INSERT INTO #Ids
    SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p);
    ...
end
You have to invoke this procedure with a parameter like this:
exec IGetAListOfValues
    @Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter
The nodes function takes an XPath expression. In this case it's /params/p, and that's why the XML uses <params> as the root element and <p> as the elements.
The value function casts the text inside each p element to int, but you can use it with other data types easily. In this sample there is a DISTINCT to avoid repeated values but, of course, you can remove it depending on what you want to achieve.
I have an auxiliary (extension) method that converts an IEnumerable<T> into a string like the one shown in the exec example. It's easy to create one and have it do the work for you whenever you need it. (You have to test the data type of T and convert it to an adequate string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern for receiving the parameters (you can pass in as many lists as needed).
One advantage is that you don't need to make anything special in your database for it to work.
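Such an extension method might look like the following minimal sketch, which assumes numeric element values (string values would additionally need XML escaping):
public static class SqlXmlExtensions
{
    // Builds "<params><p>1</p><p>2</p></params>" from any sequence.
    public static string ToXmlParameter<T>(this IEnumerable<T> values)
    {
        var sb = new StringBuilder("<params>");
        foreach (var value in values)
            sb.Append("<p>").Append(value).Append("</p>");
        return sb.Append("</params>").ToString();
    }
}
Calling something like ids.ToXmlParameter() then produces the string to pass as the @Ids parameter.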
Of course, you don't need to create a temp table as is done in my example; you can use the query directly as a subquery inside an IN predicate:
WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int')
                    FROM @Ids.nodes('/params/p') as params(p))
I am not 100% sure that I understand the problem correctly, but LinqToSql's ExecuteQuery has an overload that takes parameters, where the query uses a format similar to string.Format.
Using this overload is safe against SQL injection, and behind the scenes LinqToSql translates it to use sp_executesql with parameters.
Here is an example:
string sql = "SELECT * FROM city WHERE city LIKE {0}";
db.ExecuteQuery(sql, "Lon%"); //Note that we don't need the single quotes
This way one can get the benefit of parameterized queries, even while using dynamic SQL.
However, when it comes to using IN with a dynamic number of parameters, there are two options:
Construct the string dynamically, and then pass the values as an array, as in:
string sql = "SELECT * FROM city WHERE zip IN (";
List<string> placeholders = new List<string>();
for(int i = 0; i < zips.Length;i++)
{
placeholders.Add("{"+i.ToString()+"}");
}
sql += string.Join(",",placeholders.ToArray());
sql += ")";
db.ExecuteQuery(sql, zips.ToArray());
Or, more compactly, using the LINQ extension methods:
string sql = "SELECT * FROM city WHERE zip IN (" +
    string.Join(",", zips.Select((z, i) => "{" + i + "}")) +
    ")";
db.ExecuteQuery<City>(sql, zips.ToArray());

Linq vs. database views

Here's an interesting question. Suppose we have related tables in the database, for example, Instrument and Currency. The Instrument table has a currency_id field that is mapped to an entry in the Currency table. In Linq land, what's the better way:
a) Create Instrument and Currency entities in the DataContext and then create an association, or simply use a join in Linq queries, or
b) Create a view in the database that joins Instrument and Currency (thus resolving currency_id to a currency code) and use that as an entity in the Linq context?
Would you ever use them independently? If so, you will need to have entities for each one that will be used independently. I suspect that you will use the Currency independently (say for a dropdown that allows you to choose a currency when creating an instrument). That being the case, I think it would be easier to just keep them separate and have an association.
With the ORM now abstracting the data access logic, that particular function of views is no longer needed. It would be best to leave it to the ORM, since that's part of its function.
However, views may still be useful for simplifying stored procedure code and even for creating useful indexes.
If you load an Instrument and later use the Currencies property to load the related Currencies, there will be two queries.
If you issue a LINQ query with a join, LINQ will convert it to SQL with a join and you get all the data at once.
If you set up DataLoadOptions, you get all the data in one query and you do not have to write the join.
http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.aspx
http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.loadwith.aspx
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Instrument>(i => i.Currencies);
myDataContext.LoadOptions = dlo;
I find LINQ to be temperamental. I can run the same query 1 minute apart and get a different result. Note I am working off a local database, so I know the data hasn't changed. Using a view with a dataset is far more reliable in my opinion, especially with joins.
