Get random result from JPQL query over large table - sql-server

I'm currently using JPQL queries to retrieve information from a database. The project tests a sample environment by randomizing different elements, so I need queries that return a single random result throughout the project.
The problem is that JPQL has no built-in function for random retrieval, and picking a random row after loading the whole result list takes too long (14 seconds for the function below to return a random result):
public Player getRandomActivePlayerWithTransactions() {
    List<Player> randomPlayers = entityManager.createQuery("SELECT pw.playerId FROM PlayerWallet pw JOIN pw.playerId p"
            + " JOIN p.gameAccountCollection ga JOIN ga.iDAccountStatus acs"
            + " WHERE (SELECT count(col.playerWalletTransactionId) FROM pw.playerWalletTransactionCollection col) > 0 AND acs.code = :status")
            .setParameter("status", "ACTIVATED")
            .getResultList();
    return randomPlayers.get(random.nextInt(randomPlayers.size()));
}
Since ORDER BY NEWID() is not allowed in JPQL, I have tested the following inline conditions; all of them failed with a syntax error on compilation.
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) as int)) % 100) < 10
WHERE Rnd % 100 < 10
FROM TABLESAMPLE(10 PERCENT)

Have you considered generating a random number and skipping to that result?
I mean something like this:
// Caveat: the count should use the same joins and WHERE clause as the query below;
// otherwise the offset may fall outside its result.
String q = "SELECT COUNT(p) FROM Player p";
Query query = entityManager.createQuery(q);
Number countResult = (Number) query.getSingleResult();
// Math.random() returns a double, so the product must be cast to int.
int random = (int) (Math.random() * countResult.intValue());
Player randomPlayer = (Player) entityManager.createQuery("SELECT pw.playerId FROM PlayerWallet pw JOIN pw.playerId p"
        + " JOIN p.gameAccountCollection ga JOIN ga.iDAccountStatus acs"
        + " WHERE (SELECT count(col.playerWalletTransactionId) FROM pw.playerWalletTransactionCollection col) > 0 AND acs.code = :status")
        .setParameter("status", "ACTIVATED")
        .setFirstResult(random)
        .setMaxResults(1)
        .getSingleResult();
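To make the count-then-offset idea concrete outside JPA, here is a minimal sketch in Python with the standard sqlite3 module. The player table, its columns and the function name are made up for illustration; the point is only that the count and the data query share the same filter and that the offset is drawn from [0, count).

```python
import random
import sqlite3

# Hypothetical schema, used only to illustrate the technique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO player (id, status) VALUES (?, ?)",
    [(i, "ACTIVATED" if i % 2 == 0 else "BLOCKED") for i in range(1, 101)],
)


def random_active_player(conn, rng=random):
    # Count only the rows that the data query below can return.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM player WHERE status = ?", ("ACTIVATED",)
    ).fetchone()
    if total == 0:
        return None
    offset = rng.randrange(total)  # uniform integer in [0, total)
    # A stable ORDER BY makes the offset refer to a well-defined row.
    row = conn.execute(
        "SELECT id FROM player WHERE status = ? ORDER BY id LIMIT 1 OFFSET ?",
        ("ACTIVATED", offset),
    ).fetchone()
    return row[0]


picked = random_active_player(conn)
```

Only one row is transferred, but the database still has to walk past the skipped rows, so this is cheaper than loading the full list, not free.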

I have figured it out. When retrieving the player I was also retrieving another, unused related entity, all the entities related to that one, and so on.
After adding fetch=FetchType.LAZY (don't fetch entity until required) to the problematic relation the performance of the query has increased dramatically.

Related

Peewee select query with multiple joins and multiple counts

I've been attempting to write a peewee select query which results in a table with 2 counts (one for the number of prizes associated with the lottery, and one for the number of packages associated with the lottery), as well as the fields in the Lottery model.
I've managed to write select queries with 1 count working (seen below), and then I've had to convert the ModelSelects to lists and join them manually (which I think is very hacky).
I did manage to write a select query where the results were joined, but it would multiply the packages count with the prizes count (I've since lost that query).
I also tried using a .switch(Lottery) but I didn't have any luck with this.
query1 = (Lottery.select(Lottery, fn.count(Package.id).alias('packages'))
          .join(LotteryPackage)
          .join(Package)
          .order_by(Lottery.id)
          .group_by(Lottery)
          .dicts())
query2 = (Lottery.select(Lottery.id.alias('lotteryID'), fn.count(Prize.id).alias('prizes'))
          .join(LotteryPrize)
          .join(Prize)
          .group_by(Lottery)
          .order_by(Lottery.id)
          .dicts())
lottery = list(query1)
query3 = list(query2)
for x in range(len(lottery)):
    lottery[x]['prizes'] = query3[x]['prizes']
While the above code works, is there a cleaner way to write this query?
Your best bet is to do this with subqueries.
# Create query which gets lottery id and count of packages.
L1 = Lottery.alias()
subq1 = (L1
         .select(L1.id, fn.COUNT(LotteryPackage.package).alias('packages'))
         .join(LotteryPackage, JOIN.LEFT_OUTER)
         .group_by(L1.id))
# Create query which gets lottery id and count of prizes.
L2 = Lottery.alias()
subq2 = (L2
         .select(L2.id, fn.COUNT(LotteryPrize.prize).alias('prizes'))
         .join(LotteryPrize, JOIN.LEFT_OUTER)
         .group_by(L2.id))
# Select from lottery, joining on each subquery and returning
# the counts.
query = (Lottery
         .select(Lottery, subq1.c.packages, subq2.c.prizes)
         .join(subq1, on=(Lottery.id == subq1.c.id))
         .join(subq2, on=(Lottery.id == subq2.c.id))
         .order_by(Lottery.name))
for row in query.objects():
    print(row.name, row.packages, row.prizes)
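For anyone wondering why the single joined query multiplies the counts, here is a small sketch in plain SQL through Python's sqlite3 module. The table and column names are invented for the example and are not the asker's actual schema. Joining both link tables in one query yields packages x prizes rows per lottery before grouping, while pre-aggregating each side in its own subquery keeps the counts independent.

```python
import sqlite3

# Hypothetical tables standing in for Lottery, LotteryPackage and LotteryPrize.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lottery (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lottery_package (lottery_id INTEGER, package_id INTEGER);
CREATE TABLE lottery_prize (lottery_id INTEGER, prize_id INTEGER);
INSERT INTO lottery VALUES (1, 'spring'), (2, 'summer');
INSERT INTO lottery_package VALUES (1, 10), (1, 11);
INSERT INTO lottery_prize VALUES (1, 20), (1, 21), (1, 22);
""")

# Both joins in one query: lottery 1 expands to 2 * 3 = 6 rows before grouping.
naive = conn.execute("""
    SELECT l.name, COUNT(lp.package_id), COUNT(pr.prize_id)
    FROM lottery l
    LEFT JOIN lottery_package lp ON lp.lottery_id = l.id
    LEFT JOIN lottery_prize pr ON pr.lottery_id = l.id
    GROUP BY l.id
    ORDER BY l.id
""").fetchall()

# One pre-aggregated subquery per count: each side contributes at most one row.
fixed = conn.execute("""
    SELECT l.name, COALESCE(p.packages, 0), COALESCE(z.prizes, 0)
    FROM lottery l
    LEFT JOIN (SELECT lottery_id, COUNT(*) AS packages
               FROM lottery_package GROUP BY lottery_id) p ON p.lottery_id = l.id
    LEFT JOIN (SELECT lottery_id, COUNT(*) AS prizes
               FROM lottery_prize GROUP BY lottery_id) z ON z.lottery_id = l.id
    ORDER BY l.id
""").fetchall()

print(naive)  # [('spring', 6, 6), ('summer', 0, 0)]
print(fixed)  # [('spring', 2, 3), ('summer', 0, 0)]
```

The sketch uses LEFT JOIN plus COALESCE on the outer query so that lotteries without packages or prizes still show up with 0; that detail is a choice of this example, not something taken from the peewee code above.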

How to select unrelated entities with a single query

Let's imagine we have 5 tables that have no relationship between each other but all of them share the same column. Let's name the tables ClojureConf, KotlinConf, ScalaConf, GroovyConf, JavaConf. They all have a column UserId. The number and data types of other columns are different in each of them. A given User may have attended zero or more conferences.
The task is to just select all the records from each of the 5 tables for a given UserId, convert them to DTOs and return as json.
Currently the code does 5 trips to the database to get a list of results from each table.
Is there a library support in hibernate/jpa what would make a single trip to a database? The goal is to improve performance.
Is it possible to define a projection for entities that would look similar to this:
interface ConfAttended {
List<ClojureConf> getClojureConfs();
List<KotlinConf> getKotlinConfs();
List<ScalaConf> getScalaConfs();
List<GroovyConf> getGroovyConfs();
List<JavaConf> getJavaConfs();
}
and a repository that would select and map the results in one go
interface ConfAttendedDAO extends JpaRepository<User, Long> {
    @Query("SELECT c, k, s, g, j FROM ClojureConf c " +
           "JOIN KotlinConf k ON c.UserId = k.UserId " +
           "JOIN ScalaConf s ON c.UserId = s.UserId " +
           "JOIN GroovyConf g ON c.UserId = g.UserId " +
           "JOIN JavaConf j ON c.UserId = j.UserId " +
           "WHERE c.UserId = :userId")
    ConfAttended findByUserIdForProjection(@Param("userId") long userId);
}
?
I ended up having a query like this:
interface ConfAttendedDAO extends JpaRepository<User, Long> {
    @Query("SELECT c, k, s, g, j FROM User u " +
           "LEFT JOIN ClojureConf c ON c.UserId = :userId " +
           "LEFT JOIN KotlinConf k ON k.UserId = :userId " +
           "LEFT JOIN ScalaConf s ON s.UserId = :userId " +
           "LEFT JOIN GroovyConf g ON g.UserId = :userId " +
           "LEFT JOIN JavaConf j ON j.UserId = :userId " +
           "WHERE u.Id = :userId")
    List<Object[]> findAllByUserId(@Param("userId") long userId);
}
Hibernate takes care of mapping rows to entities. Each Object[] has all 5 entities (or nulls) as its elements. Selecting from User forces the query to return results; otherwise, if the first table returns nothing, the whole query returns nothing. Another downside is that if one table has 10 results and another has 1, the rows of the table with fewer results are duplicated.
As for performance (the sole point of doing all this), getting and processing results is 4-5 times faster than with 5 separate SELECTs.
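The duplication mentioned above is easy to reproduce. The following is only a sketch with Python's sqlite3 module and two invented tables (kotlin_conf and scala_conf with made-up columns), not the Hibernate mapping itself. It shows the repeated row and one way to collapse the result back into per-table lists by keying on each table's primary key.

```python
import sqlite3

# Hypothetical tables; only two of the five conferences to keep it short.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE kotlin_conf (id INTEGER PRIMARY KEY, user_id INTEGER, city TEXT);
CREATE TABLE scala_conf (id INTEGER PRIMARY KEY, user_id INTEGER, year INTEGER);
INSERT INTO users VALUES (1);
INSERT INTO kotlin_conf VALUES
    (1, 1, 'Amsterdam'), (2, 1, 'Copenhagen'), (3, 1, 'San Francisco');
INSERT INTO scala_conf VALUES (1, 1, 2019);
""")

rows = conn.execute("""
    SELECT k.id, k.city, s.id, s.year
    FROM users u
    LEFT JOIN kotlin_conf k ON k.user_id = u.id
    LEFT JOIN scala_conf s ON s.user_id = u.id
    WHERE u.id = ?
""", (1,)).fetchall()

# The single scala_conf row is repeated once per kotlin_conf row (3 rows total).
# Keying on each table's primary key removes the duplicates again.
kotlin = {}
scala = {}
for k_id, city, s_id, year in rows:
    if k_id is not None:
        kotlin[k_id] = city
    if s_id is not None:
        scala[s_id] = year
```

With all five tables joined this way the row count is the product of the per-table counts, so the single round trip only pays off while those counts stay small.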

sql server 2012 : how to optimize this like % query

This query takes too much time, so I am trying to optimize it. Do you have any ideas or suggestions?
I tried full-text search, a procedure and a while loop ... it only gets worse (dbo.url has more than 100 000 rows; dbo.url where status = 'tocheck' only 1000).
select tocheck.*
from dbo.url tocheck inner join dbo.url done
on tocheck.id != done.id
and tocheck.url like done.url+'%'
and done.status in ('tocheck','todo','done')
where tocheck.status = 'tocheck'
Edit :
I call a webservice multiple times with different urls :
urls look like http://ws.com/query?p1=a&p2=b (url1).
If I have already called url http://ws.com/query?p1=a (url2), I don't want to call url1, because:
url1 like url2+'%'
Thanks for your help.
Edit2 :
I add a column suburl that contains 'query?p1=a' for each url and modify the query :
select tocheck.*
from dbo.url tocheck inner join dbo.url done
on tocheck.id != done.id
and tocheck.suburl = done.suburl --NEW
and tocheck.url like done.url+'%'
and done.status in ('tocheck','todo','done')
where tocheck.status = 'tocheck'
More than 10 times shorter ... Phew !!
I think the overhead comes from joining the table to itself on unequal ids: that is effectively a cartesian product that only excludes each row's pairing with itself.
I suggest trying a subquery. The outer query then returns only the 1000 (as you mentioned) 'tocheck' rows, while the subquery additionally excludes urls that start with the same characters:
select
tocheck.*
from
dbo.url tocheck
where
tocheck.status = 'tocheck'
and
tocheck.id not in (
select
done.id
from
dbo.url done
where
tocheck.url like done.url+'%'
and
done.status in ('tocheck','todo','done')
)
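The speed-up from the suburl column in the question's second edit comes from turning one big all-pairs comparison into many small ones. Here is a rough sketch of that bucketing idea in Python; suburl() and redundant_urls() are hypothetical helpers, and the rule "path plus first query parameter" is an assumption based on the 'query?p1=a' example above.

```python
from collections import defaultdict
from urllib.parse import urlsplit

KNOWN_STATUSES = {"tocheck", "todo", "done"}


def suburl(url):
    # Assumed bucket key: path plus first query parameter, e.g. 'query?p1=a'.
    parts = urlsplit(url)
    first_param = parts.query.split("&", 1)[0]
    return f"{parts.path.lstrip('/')}?{first_param}"


def redundant_urls(rows):
    """rows: iterable of (id, url, status).

    Returns the ids of 'tocheck' rows whose url starts with the url of
    another row in the same suburl bucket.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[suburl(row[1])].append(row)

    redundant = set()
    for bucket in buckets.values():
        for rid, url, status in bucket:
            if status != "tocheck":
                continue
            # Prefix test only against rows in the same bucket, never all rows.
            if any(
                oid != rid and ostatus in KNOWN_STATUSES and url.startswith(ourl)
                for oid, ourl, ostatus in bucket
            ):
                redundant.add(rid)
    return redundant


rows = [
    (1, "http://ws.com/query?p1=a", "done"),
    (2, "http://ws.com/query?p1=a&p2=b", "tocheck"),
    (3, "http://ws.com/query?p1=c&p2=b", "tocheck"),
]
print(redundant_urls(rows))  # {2}
```

Note that str.startswith is a literal, case-sensitive prefix test, whereas LIKE in SQL Server interprets characters such as % and _ inside done.url as wildcards and follows the column collation, so the two are not exactly equivalent.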

Converting complex sql stored proc into linq

I'm using Linq to Sql and have a stored proc that won't generate a class. The stored proc draws data from multiple tables into a flat file resultset.
The amount of data returned must be as small as possible, the number of round trips to the Sql Server need to be limited, and the amount of server-side processing must be limited as this is for an ASP.NET MVC project.
So, I'm trying to write a Linq to Sql Query however am struggling to both replicate and limit the data returned.
Here's the stored proc that I'm trying to convert:
SELECT AdShops.shop_id as ID, Users.image_url_75x75, AdShops.Advertised,
Shops.shop_name, Shops.title, Shops.num_favorers as hearts, Users.transaction_sold_count as sold,
(select sum(L4.num_favorers) from Listings as L4 where L4.shop_id = L.shop_id) as listings_hearts,
(select sum(L4.views) from Listings as L4 where L4.shop_id = L.shop_id) as listings_views,
L.title AS listing_title, L.price as price, L.listing_id AS listing_id, L.tags, L.materials, L.currency_code,
L.url_170x135 as listing_image_url_170x135, L.url AS listing_url, l.views as listing_views, l.num_favorers as listing_hearts
FROM AdShops INNER JOIN
Shops ON AdShops.shop_id = Shops.shop_id INNER JOIN
Users ON Shops.user_id = Users.user_id INNER JOIN
Listings AS L ON Shops.shop_id = L.shop_id
WHERE (Shops.is_vacation = 0 AND
L.listing_id IN
(
SELECT listing_id
FROM (SELECT l2.user_id , l2.listing_id, RowNumber = ROW_NUMBER() OVER (PARTITION BY l2.user_id ORDER BY NEWID())
FROM Listings l2
INNER JOIN (
SELECT user_id
FROM Listings
GROUP BY
user_id
HAVING COUNT(*) >= 3
) cnt ON cnt.user_id = l2.user_id
) l2
WHERE l2.RowNumber <= 3 and L2.user_id = L.user_id
)
)
ORDER BY Shops.shop_name
Now, so far I can return a flat file but am not able to limit the number of listings. Here's where I'm stuck:
Dim query As IEnumerable = From x In db.AdShops
Join y In (From y1 In db.Shops
Where y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False
Order By y1.Shop_name
Select y1) On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Select New With {
.shop_id = y.Shop_id,
.user_id = z.user_id,
.listing_id = l.Listing_id
} Take 24 ' Fields omitted for brevity...
I assume that to select a random set of 3 listings per shop I'd need to use a lambda expression, but I am not sure how to do this. I also need to add consolidated totals of the listing fields for each individual shop somewhere...
Anyone have any thoughts?
UPDATE:
Here's the current solution that I'm looking at:
Result class wrapper:
Public Class NewShops
Public Property Shop_id As Integer
Public Property listing_id As Integer
Public Property tl_listing_hearts As Integer?
Public Property tl_listing_views As Integer?
Public Property listing_creation As Date
End Class
Linq + code:
Using db As New Ads.DB(Ads.DB.Conn)
Dim query As IEnumerable(Of IGrouping(Of Integer, NewShops)) =
(From x In db.AdShops
Join y In (From y1 In db.Shops
Where (y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False)
Select y1
Skip ((_Paging.CurrentPage - 1) * _Paging.ItemsPerPage)
Take (_Paging.ItemsPerPage))
On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Join lt In (From l2 In db.Listings _
Group By id = l2.Shop_id Into Hearts = Sum(l2.Num_favorers), Views = Sum(l2.Views), Count() _
Select New NewShops With {.tl_listing_views = Views,
.tl_listing_hearts = Hearts,
.Shop_id = id})
On lt.Shop_id Equals y.Shop_id
Select New NewShops With {.Shop_id = y.Shop_id,
.tl_listing_views = lt.tl_listing_views,
.tl_listing_hearts = lt.tl_listing_hearts,
.listing_creation = l.Creation,
.listing_id = l.Listing_id
}).GroupBy(Function(s) s.Shop_id).OrderByDescending(Function(s) s(0).tl_listing_views)
Dim Shops as New Dictionary(Of String, List(Of NewShops))
For Each item As IEnumerable(Of NewShops) In query
Shops.Add(item(0).shop_name, (From i As NewShops In item
Order By i.listing_creation Descending
Select i Take 3).ToList)
Next
End Using
Anyone have any other suggestions?
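For reference, the part of the stored proc that is hard to express in LINQ is the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY NEWID()) block: keep 3 random listings per owner, and only for owners that have at least 3. A minimal in-memory sketch of that logic in Python follows; sample_per_group is a hypothetical helper and the (group_id, listing_id) tuples are invented sample data.

```python
import random
from collections import defaultdict


def sample_per_group(listings, per_group=3, rng=random):
    """listings: iterable of (group_id, listing_id) pairs.

    Mirrors the SQL: groups with fewer than per_group listings are dropped
    (HAVING COUNT(*) >= 3), the others keep per_group random listings.
    """
    by_group = defaultdict(list)
    for group_id, listing_id in listings:
        by_group[group_id].append(listing_id)
    return {
        group_id: rng.sample(ids, per_group)
        for group_id, ids in by_group.items()
        if len(ids) >= per_group
    }


listings = [(1, i) for i in range(10)] + [(2, 100), (2, 101)]
picked = sample_per_group(listings)
```

Doing this on the client means every listing row has to be fetched first, which works against the requirement to keep the returned data small, so treat it as a description of the logic and not as a recommendation.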
From the looks of that SQL and code, I'd not be turning it into LINQ queries. It'll just obfuscate the logic and probably take you days to get it correct.
If SQLMetal doesn't generate it properly, have you considered using the ExecuteQuery method of the DataContext to return a list of the items you're after?
Assuming that your sproc you're trying to convert is called sp_complicated, and takes in one parameter, something like the following should do the trick
Protected Class TheResults
Public Property ID as Integer
Public Property image_url_75x75 as String
'... and so on and so forth for all the returned columns. Be careful with nulls
End Class
'then, when you want to use it
Using db As New Ads.DB(Ads.DB.Conn)
dim results = db.ExecuteQuery(Of TheResults)("exec sp_complicated {0}", _Search)
End Using
Before you freak out, that's not susceptible to SQL Injection. L2SQL uses proper SQLParameters, as long as you use the squigglies and don't just concatenate the strings yourself.
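The difference between placeholders and string concatenation can be shown in a few lines. This sketch uses Python's sqlite3 module and an invented shops table instead of LINQ to SQL; the `?` placeholder plays the role that `{0}` plays in ExecuteQuery.

```python
import sqlite3

# Hypothetical table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (shop_name TEXT)")
conn.executemany("INSERT INTO shops VALUES (?)", [("alpha",), ("beta",)])

search = "x' OR '1'='1"  # hostile input

# Concatenation: the input becomes part of the SQL text and changes its logic.
unsafe_sql = "SELECT shop_name FROM shops WHERE shop_name = '" + search + "'"
unsafe = conn.execute(unsafe_sql).fetchall()

# Placeholder: the input is bound as a value and only compared as a string.
safe = conn.execute(
    "SELECT shop_name FROM shops WHERE shop_name = ?", (search,)
).fetchall()

print(len(unsafe))  # 2 (every row leaks)
print(safe)         # []
```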

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hierarchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post an update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses TOP. TOP 10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
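The cardinality problem can be seen with any SQL engine. Below is a small sketch using Python's sqlite3 module, with LIMIT standing in for SQL Server's TOP and invented parent/child tables. It is not what LINQ to SQL generates; it only shows why limiting the joined rows and limiting the parents are different things.

```python
import sqlite3

# Hypothetical parent/child tables: parent 1 has 3 children, 2 has 2, 3 has 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO parent VALUES (1), (2), (3);
INSERT INTO child (parent_id) VALUES (1), (1), (1), (2), (2), (3);
""")

# Limit applied to the joined rows: "first 2" covers only part of parent 1.
joined_then_limited = conn.execute("""
    SELECT p.id, c.id
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id
    ORDER BY p.id, c.id
    LIMIT 2
""").fetchall()

# Limit applied to the parents first, children joined afterwards:
# 2 parents, all 5 of their children.
limited_then_joined = conn.execute("""
    SELECT p.id, c.id
    FROM (SELECT id FROM parent ORDER BY id LIMIT 2) p
    LEFT JOIN child c ON c.parent_id = p.id
    ORDER BY p.id, c.id
""").fetchall()

print(joined_then_limited)   # [(1, 1), (1, 2)]
print(limited_then_joined)   # [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5)]
```

The second form corresponds to the row-number approach described above: the limit is computed over parent rows only, and the children are attached afterwards.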
This is going to be a hard conceptual leap from where you are (calling Skip and Take) to a "simple workaround". What we need to do is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p1 + @p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p1 + @p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check whether I'm on the first page (i.e. Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1), and AddRange the lists together.
I've not had a chance to try this, but given that the anonymous type isn't part of LINQ but rather a C# construct, I wonder if you could use known types with object initializers:
from at in Context.Transaction
select new KnownType {
    Amount = at.Amount,
    PostingDate = at.PostingDate,
    Details =
        from tb in at.TransactionDetail
        select new KnownSubType {
            Amount = tb.Amount,
            Description = tb.Desc
        }
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.
