Using a search server with CakePHP

I am trying to implement a customized search in my application. The table structure is given below:
main table: teacher
sub tables: skills, skill_values, cities, city_values
The search is triggered by location, which is stored in the city_values table with the reference fields user_id and city_id. The name of the city and its latitude and longitude are found in the cities table.
The search also includes skills; the table relations are similar to those for cities. The users table and the skill_values table are related through the user_id field in skill_values, and the skills and skill_values tables are related through the skill_id field in skill_values.
We need to find the location of the user who performs the search and filter the results to within a 20-mile radius. There are a few other filters as well.
My problem is that I need to filter these results without a page reload, so I am using AJAX. But if the number of records increases, my AJAX request will take a long time to get a response.
Would it be a good idea to use an open-source search server like Sphinx or Solr to fetch the results?
I am using CakePHP for development, and my application is hosted on a cloud server.

... but if the number of records increases my AJAX request will take a long time to get a response.
Regardless of the search technology, there should be a pagination mechanism of some kind.
You should therefore be able to set the limit or maximum number of results returned per page.
When a user performs a search query, you can use JavaScript to request the first page of results.
You can then simply increment the page number to request the second, third, fourth page, and so on.
This should mean that the top N results always appear in roughly the same amount of time.
It's then up to you to decide whether you want to request each page of search results sequentially (i.e. as the callback for each successful response), or wait for some kind of user input (i.e. clicking a 'more' link or scrolling to the end of the results).
The timeline/newsfeed pages on Twitter or Facebook are a good example of this technique.
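To make the paging concrete, here is a minimal sketch in Python (the same shape applies to a CakePHP controller action answering the AJAX call). The teacher/city_values schema and the join are my guesses from the question, and fetch_results_page is an illustrative name, not an existing API:

import sqlite3  # stands in for whatever DB-API driver you use

PAGE_SIZE = 20  # fixed number of results per AJAX request

def fetch_results_page(conn, city_id, page_number):
    """Return one fixed-size page of matching teachers (hypothetical schema)."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT t.* FROM teacher t "
        "JOIN city_values cv ON cv.user_id = t.id "
        "WHERE cv.city_id = ? "
        "ORDER BY t.id "
        "LIMIT ? OFFSET ?",
        (city_id, PAGE_SIZE, offset),
    ).fetchall()

Each scroll or 'more' click on the client just bumps page_number by one, so every request returns at most PAGE_SIZE rows no matter how large the table grows.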

Related

Limit the number of rows counted in Google Data Studio

I have a scorecard that looks at the number of URL clicks driven by all queries, which works as expected. I am now trying to display the number of clicks driven by the top 10 queries in the scorecard. I was able to limit the number of rows in my table by disabling pagination to show only the top 10 queries, but now I'm looking to sum the clicks in a scorecard to provide a quick summary rather than having a table.
I don't think what you want to do is possible dynamically via the Search Console connector alone. Google Data Studio does not provide any way to calculate rankings via calculated fields, so there's no way for you to know which query is in the top 10 without looking at a sorted table. A few imperfect alternatives (roughly in order of increasing complexity):
You apply a filter so that the scorecard only aggregates values above a certain threshold. This would be hardcoded, so you would be filtering on Clicks (i.e. aggregate all URL clicks above 100).
You apply a filter to the scorecard so that it only aggregates clicks from the top 10 URLs. This would not be a dynamically updating filter, so you'd have to look at the table to see which URLs are in the top 10, and that would change as time goes on. The filter would end up looking like: "Include URLs Contains www.google.com,www.stackoverflow.com"
If you don't mind using Google Sheets as an intermediary, you could dump your Search Console data into a spreadsheet so that you can manipulate it however you like, and then use the spreadsheet as the data source for Data Studio (as opposed to the Search Console connector). It looks like there are some add-ons out there that you can use out of the box, although I haven't used them myself, so I'm not sure how difficult they are. Alternatively, you can build something yourself via Google Apps Script and the Search Console API (a rough sketch follows this list).
You could build a custom Data Studio Community Visualization. (By the way, just because they are called 'Community Visualizations' does not mean you have to make them publicly available.) Essentially, you would be building a scorecard-like component that aggregates the data according to your own rules, although this requires more coding experience. (Before you build one, check whether something like what you need already exists in the gallery; at a quick glance, I don't see anything that would meet your needs.)
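For the scripted route above, a rough sketch using the Search Console API's Python client might look like the following. The site URL, date range, and key file are placeholders, and it relies on the API's default ordering of rows by clicks, descending:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical service-account key file; any OAuth flow with the
# webmasters.readonly scope works just as well.
creds = service_account.Credentials.from_service_account_file(
    "key.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"])
service = build("searchconsole", "v1", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2021-01-01",
        "endDate": "2021-01-31",
        "dimensions": ["query"],
        "rowLimit": 10,  # rows come back sorted by clicks, so this is the top 10
    },
).execute()

top10_clicks = sum(row["clicks"] for row in response.get("rows", []))
print(top10_clicks)  # the single number the scorecard needs

You could dump that number into a sheet on a schedule and point the scorecard at the sheet.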

How to query data from different database servers in ASP.NET microservices?

I would like your feedback and input on how to query data from different database servers in a microservices pattern. I have been struggling for a couple of days and could not come up with an efficient way yet.
Here is my scenario:
For simplicity, let's say that I have three microservices:
One is handling the User Posts
One is handling the User Room Chats
An API Gateway
Now, I need to get all user posts and room chats and display them on the front end ordered by creation date, but I don't want to display them all at once. I want a pagination system implemented on the front end as well.
What I have:
A query to get all user posts for a given user id, with pagination, ordered by creation date (page size and page number passed to the stored proc)
A query to get all user room chats for a given user id, with pagination, ordered by creation date (page size and page number passed to the stored proc)
What I come up with:
On the front end:
1. The request is constructed with a page number and a fixed page size. It starts with page number = 1 and page size = 15.
2. When the user scrolls down, the request's page number increases by 1.
On the back end:
1. Execute two queries, one against the user posts microservice and the other against the user room chats microservice, each fixed with page number = 1 and a large page size, like 1000.
2. Get the result lists from both queries, then use AutoMapper to map them into one list of big combination objects.
3. Once I have the list of big combination objects, sort it by creation date, then paginate it using the page size and page number passed in the front-end request (sketched below).
4. Return the result from step 3 to the front end.
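For what it's worth, the merge-and-paginate in steps 2-3 can be done lazily so the full combined list is never materialized. A sketch in Python for brevity (the same logic maps to LINQ in C#), assuming both services return items already sorted by a creationDate field, newest first:

import heapq
from itertools import islice
from operator import itemgetter

def merged_page(posts, chats, page_number, page_size):
    """Merge two creation-date-sorted streams and slice out one page."""
    merged = heapq.merge(posts, chats,
                         key=itemgetter("creationDate"), reverse=True)
    start = (page_number - 1) * page_size
    # islice stops early, so at most start + page_size items are compared.
    return list(islice(merged, start, start + page_size))

Note that page k of the merged result can never need more than k x page_size items from either source, so each stored proc can be called with that bound instead of a fixed 1000.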
I don't like this implementation, since I have to use a fixed, large value for the page size when executing the stored procs. My concern is that if I have a lot of user posts and room chats, this will have a big impact on performance. However, I can't figure out a better way to do it.
FYI, I'm using ASP.NET Core for the back end, Angular for the front end, and MS SQL for my database.
I would really appreciate it if someone could give me some hints/input on how to do that, or, if you have run into this scenario in a real-life project, how you solved it. Thank you.

Google Search API Wildcard

I have a Python project running on Google App Engine. I have a set of data currently stored in the datastore. On the user side, I fetch it from my API and show it to the user in a Google Visualization table with client-side search. Because of the limitations, I can only fetch 1000 records in one query. I want my users to search across all the records I have. I could fetch them with multiple queries before showing them, but fetching 1000 records already takes 5-6 seconds, so this process could exceed the 30-second timeout, and I don't think putting around 20,000 records in a table is a good idea.
So I decided to put my records into the Google Search API and wrote a script to sync the important data between the datastore and the Search API index. When performing a search, though, I couldn't find anything like a wildcard character. For example, let's say I have a user field that stores the string "Ilhan". When a user searches for "Ilha", that record does not show up. I want the record containing "Ilhan" to show up even when the query is only partially typed. So basically, the SQL equivalent of my search would be something like "select * from users where user like '%ilh%'".
I wonder if there is a way to do that, or is this not how the Search API works?
I set up similar functionality purely within the datastore. I have a repeated computed property that contains all the search substrings that can be formed for a given object.
class User(ndb.Model):
    # ... other fields

    # Index every lowercase substring of the user's identifying fields
    # so partial matches can be answered with a simple equality query.
    search_strings = ndb.ComputedProperty(
        lambda self: [i.lower() for i in all_substrings(strings=[
            self.email,
            self.first_name,
            self.last_name,
        ])],
        repeated=True)
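The all_substrings helper isn't shown in the answer; a plausible reconstruction (my sketch, not the author's code) could be:

def all_substrings(strings, min_length=1):
    """Collect every distinct substring of each input string."""
    found = set()
    for s in strings:
        s = s or ""  # tolerate missing fields
        for start in range(len(s)):
            for end in range(start + min_length, len(s) + 1):
                found.add(s[start:end])
    return list(found)

In practice you would probably cap the substring length to keep the number of index entries per entity bounded.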
Your search query would then look like this:
User.query(User.search_strings == search_text.strip().lower()).fetch_page(20)
If you don't need the other features of the Google Search API, and if the number of substrings per entity won't put you at risk of hitting the 900-property limit, then I'd recommend doing this instead, as it's pretty simple and straightforward.
As for it taking 5-6 seconds to fetch 1000 records: do you need to fetch that many? Why not fetch only 100, or even 20, and use the query cursor to let the user pull the next page only if they need it.
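A sketch of that cursor suggestion with ndb's fetch_page, reusing the User model above; search_page and the 20-row page size are illustrative:

from google.appengine.datastore.datastore_query import Cursor

def search_page(search_text, cursor_token=None):
    """Fetch one page of matches plus a token for the next page."""
    start = Cursor(urlsafe=cursor_token) if cursor_token else None
    results, next_cursor, more = User.query(
        User.search_strings == search_text.strip().lower()
    ).fetch_page(20, start_cursor=start)
    token = next_cursor.urlsafe() if (more and next_cursor) else None
    return results, token

The client echoes the token back on the next request, so each round trip stays small no matter how many records match.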

Sorting in batches

I have a Java servlet which queries a DB and shows a table in the browser. I have implemented pagination so that new requests are made only when the user scrolls the table. The problem is that if the user chooses to sort the table in the UI by some column, the request takes a long time, because the actual table in the DB is quite big: it sorts the entire table and then sends the sorted, paged data to the client/browser. So, if the table has 100k rows and I have a page size of 100 rows, is there a way to tweak the sorting in the DB, or the pagination in the servlet, so that sorting the entire 100k rows is not required?
Pagination may help. Here is how it is usually done. Unlike the old paginated, page-by-page web pages, you have a single page with infinite scrolling, and you usually have a drop-down that lists the sorting columns.
You first load the first page; as soon as the page's bottom appears, you request the next page via AJAX. Up to here, I guess you are fine.
What seems to be troubling you is that if the user has scrolled 10 pages deep and then sorts, you would have to load 10 pages of data in one go. This is the wrong approach.
Two things:
When you change the sorting criteria, you change the context in which you were viewing the table.
If you were to load 10 pages and keep the user at the 10th page, they would be surprised by what happened.
So, as soon as the user changes the sort criteria, you clear the DIV and load page 1 with the new criteria. You no longer have that burden.
A couple of quick tips:
I personally think it's better to let the DBMS do the sorting and pagination; that is what it's made for. You should write optimized queries (a sketch follows these tips).
Indexing the columns you sort by helps.
If items do not change frequently, you may want to cache the pages (results from the DB) with an appropriate TTL.
Leverage the special functions your DBMS provides to optimize the query; there are good articles on this for MySQL.
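As an illustration of the first tip, keyset (a.k.a. "seek") pagination avoids re-sorting and skipping rows on every request. A Python sketch with made-up table and column names, assuming a composite index on (sort_col, id):

import sqlite3  # stands in for any DB-API connection

PAGE_SIZE = 100

def next_page(conn, last_value, last_id):
    """Start right after the last row of the previous page instead of
    using OFFSET, so the DB touches only PAGE_SIZE rows."""
    return conn.execute(
        "SELECT id, sort_col FROM big_table "
        "WHERE (sort_col, id) > (?, ?) "
        "ORDER BY sort_col, id "
        "LIMIT ?",
        (last_value, last_id, PAGE_SIZE),
    ).fetchall()

Combined with the reset-to-page-1 behaviour above, no request ever sorts the whole 100k rows.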

Paged results when selecting data from 2 databases

Hi
I have one web service connected to one DB that has a table called clients, which holds some data.
I have another web service connected to another DB that has a table called clientdetails, which holds some other data.
I have to return a paged list of clients, where every client object contains the information from both tables.
But I have a problem.
The search criteria have to be applied to both tables.
So basically in the clients table I can have the properties: cprop1, cprop2
In the clientdetails table I can have: cdprop1, cdprop2
And my search criteria can be cprop1 = something, cdprop2 = somethingelse.
I call the first web service and send it the criterion cprop1 = something, and it returns some items. Then I call the method in the second web service; but if I have to return, say, 10 items per page, and the second web service's criterion (cdprop2 = somethingelse) is applied to the 10 items selected by the first web service, then I may be left with 8 items, or none at all.
So what do I do in this case?
How can I make sure I always get the right number of items (that is, as many as the user wants on a page)?
Until you have both responses you don't know how many records you are going to have to display.
You don't say what kind of database access you are using, but you imply that you ask for "N records matching criterion X", with N set to 10. In some DB access mechanisms you can ask for all matching records and then advance a cursor through the set, so you don't need to set any upper bound; we assume the DB manages resources efficiently for such a query.
If you can't do that, then you need to be able to revisit the first database asking for the next 10 records, repeating until you have a full page or no more records can be found (a sketch of this loop follows). This requires some way to express a query for the "next 10".
You need the ability to get at all records matching the criteria in some efficient way, either by a cursor mechanism offered by your DB or by your own "paged" queries; without that capability I don't see a way to guarantee an accurate result.
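A rough sketch of that "next 10 until the page is full" loop in Python, with both services stubbed as callables (all names are placeholders):

def filtered_page(page_size, fetch_clients, fetch_matching_ids):
    """Keep asking the first service for the next batch until enough
    rows also satisfy the second service's criteria."""
    page, offset = [], 0
    while len(page) < page_size:
        batch = fetch_clients(offset=offset, limit=page_size)
        if not batch:
            break  # first criterion exhausted; return a short page
        ids = [c["id"] for c in batch]
        keep = set(fetch_matching_ids(ids))  # ids passing the cdprop2 filter
        page.extend(c for c in batch if c["id"] in keep)
        offset += len(batch)
    return page[:page_size]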
I found that in instances like this it's better not to use identity primary keys, but primary keys with generated values in the second database (generated by the first database).
As for searching, you should fetch the first 1000 items that fit your criteria from the first database, intersect them with the first 1000 that match the given criteria from the second database, and return the needed number of items from that intersection.
Your queries should never return an unlimited number of items anyway, so 1000 should do. The number could be bigger or smaller, of course.
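A sketch of that intersection in Python (sizes and names are illustrative; the ids are comparable thanks to the shared generated keys described above):

def intersect_page(ids_first, ids_second, page_number, page_size):
    """Intersect the two candidate lists, keeping the first DB's order,
    then slice out the requested page."""
    second = set(ids_second)
    common = [i for i in ids_first if i in second]
    start = (page_number - 1) * page_size
    return common[start:start + page_size]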
