Our application has a search feature for finding businesses; users can search by name, category, location, features, and more. We would like to track and save these search arguments to get a sense of user preferences: which values are searched for the most, and so on.
The challenge is that a user can send many search requests, since every change to a search argument sends a new request to the backend. My initial plan is to create a Postgres table to store the search arguments, inserting a new row for each request. We will end up with a large table, but I will keep rows for only 30 days before archiving them, and a weekly cron job will store aggregated values in a separate table (e.g. the number of times the city "New York" appeared in a search). However, I don't feel this is a good solution.
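For concreteness, this is roughly what I had in mind, as a sketch only (the table and column names are placeholders, and I'm assuming psycopg2 for the connection):

import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder connection string

def log_search(args):
    # args is a dict of search arguments, e.g. {"city": "New York", "category": "restaurant"}.
    # One row per argument per request, in a hypothetical search_events table.
    with conn, conn.cursor() as cur:
        for name, value in args.items():
            cur.execute(
                "INSERT INTO search_events (param_name, param_value, searched_at)"
                " VALUES (%s, %s, now())",
                (name, value),
            )

def aggregate_weekly():
    # Run from the weekly cron: roll the raw rows up into per-value counts
    # in a hypothetical search_stats table, e.g. how often "New York" was searched.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO search_stats (param_name, param_value, hits, week)"
            " SELECT param_name, param_value, count(*), date_trunc('week', searched_at)"
            " FROM search_events"
            " GROUP BY param_name, param_value, date_trunc('week', searched_at)"
        )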
So before I start designing a solution for managing the search data, I would like some advice from someone who has worked on something similar: what are the best practices and processes?
I'm using CouchDB for syncing offline and online data. The problem is that since CouchDB has no layer between client and server, validation is kinda poor. The only field that can be unique is the _id.
For example, if you want to make sure the field "phone" is unique, you need to make sure the phone number is somewhere in the _id (e.g. 555-666-777_username) while the field "phone" holds 555-666-777.
My problem is this: I have a calendar whose events cannot overlap each other. An event has a start time and an end time. How can I make sure that a user won't book a date that falls between an existing event's start time and end time?
One idea is, instead of making one document with a startTime and endTime, to make several documents, one for each date within the user's desired range.
Example: the user selects a range between 2018-09-02 and 2018-09-10, so I'll create 8 documents, each with an _id composed of {date}{username}.
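Roughly what I mean, as a sketch (not tested against CouchDB; the _id format and field names are just what I'd try):

from datetime import date, timedelta

def event_docs(username, start, end, event_id):
    # One document per day in the range (end date exclusive, matching the
    # 8-document example above), keyed so a clashing day fails on a duplicate _id.
    docs = []
    day = start
    while day < end:
        docs.append({
            "_id": "{}_{}".format(day.isoformat(), username),  # e.g. "2018-09-02_username"
            "type": "event_day",
            "event": event_id,
        })
        day += timedelta(days=1)
    return docs

docs = event_docs("username", date(2018, 9, 2), date(2018, 9, 10), "trip-1")
# POSTing these in a single _bulk_docs request means any day that is already
# taken comes back as a conflict, because its _id already exists.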
If you think CouchDB isn't suited to this kind of thing, I'm open to suggestions.
I have a Python project running on Google App Engine. I have a set of data currently stored in the Datastore. On the user side, I fetch it from my API and show it to the user in a Google Visualization table with client-side search. Because of the limitations, I can only fetch 1,000 records per query. I want my users to be able to search across all the records I have. I could fetch them with multiple queries before showing them, but fetching 1,000 records already takes 5-6 seconds, so the process could exceed the 30-second timeout, and I don't think putting around 20,000 records in a table is a good idea.
So I decided to put my records into the Google Search API. I wrote a script to sync the important data between the Datastore and a Search API index. When performing a search, though, I couldn't find anything like a wildcard character. For example, say I have a user field that stores a string containing the value "Ilhan". When a user searches for "Ilha", that record doesn't show up. I want the record containing "Ilhan" to appear even if the value is only partially typed. So basically, the SQL equivalent of my search should be something like "select * from users where user like '%ilh%'".
I wonder if there is a way to do that, or is this just not how the Search API works?
I set up similar functionality purely within the Datastore. I have a repeated computed property that contains all the search substrings that can be formed for a given object.
from google.appengine.ext import ndb

class User(ndb.Model):
    # ... other fields
    search_strings = ndb.ComputedProperty(
        lambda self: [i.lower() for i in all_substrings(strings=[
            self.email,
            self.first_name,
            self.last_name,
        ])],
        repeated=True)
Your search query would then look like this:
User.query(User.search_strings == search_text.strip().lower()).fetch_page(20)
If you don't need the other features of the Google Search API, and if the number of substrings per entity won't put you at risk of hitting the 900-property limit, then I'd recommend doing this instead, as it's pretty simple and straightforward.
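For reference, all_substrings isn't a built-in; a minimal version of that helper might look something like this (illustrative only):

def all_substrings(strings, min_length=3):
    # Every substring of each input string (at least min_length characters long),
    # so a partial query like "ilh" can match "Ilhan".
    results = set()
    for s in strings:
        if not s:
            continue
        for start in range(len(s)):
            for end in range(start + min_length, len(s) + 1):
                results.add(s[start:end])
    return list(results)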
As for taking 5-6 seconds to fetch 1,000 records: do you need to fetch that many? Why not fetch only 100, or even 20, and use a query cursor so the user pulls the next page only if they need it?
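If it helps, cursor-based paging with ndb looks roughly like this (a sketch assuming the classic App Engine runtime and the User model above):

from google.appengine.datastore.datastore_query import Cursor

def get_page(search_text, cursor_str=None):
    # cursor_str is the urlsafe cursor from the previous page, or None for page one.
    cursor = Cursor(urlsafe=cursor_str) if cursor_str else None
    query = User.query(User.search_strings == search_text.strip().lower())
    results, next_cursor, more = query.fetch_page(20, start_cursor=cursor)
    # Hand the next cursor back to the client so it can ask for the following page.
    return results, next_cursor.urlsafe() if (more and next_cursor) else None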
Okay. I was looking at my physio's spreadsheets for his small private business today. He uses Excel to keep track of his clients' appointments, fees, attendances, medical reports, etc. At the moment he has a single sheet where he adds every client's appointment to the list as he goes - there are 3 years of details, one row for every appointment! It's huge and pretty hard to navigate and make sense of when he's extracting information such as fees paid/unpaid, total visits, etc.
I'm a novice to sub-intermediate at Excel, but getting better. What I'm wondering is: is it possible to set up a "front page" where he can enter a day's details in a single sheet, press an export cell, and then have Excel pass the relevant data to an individual sheet for each client? The front page would look up the client's name as a string to find the relevant sheet and drop the information in.
I'm not asking how to do it as such, but rather wondering if this is possible at all!
Thanks
=COUNTIF(Sheet1!RangeToLookThrough, ValueYoureLookingFor)
Put this where you want.
If you want a daily dashboard, you could replicate this to do something like:
=SUMPRODUCT(--(Sheet1!B2:B4000<>""),--(MONTH(Sheet1!B2:B4000)=9))
Feel free to alter this to capture yesterday or whatever date you're looking for. I just used an old report to mess with and give you an example.
I am stuck on a database problem for a client and wondering if someone could help me out. I am currently trying to implement filtering functionality so that a user can filter results after they have searched for something. We are using SQL Server 2008. I am working on an electronics e-commerce site and the database is quite large (500,000-plus records). The scenario is this: a user goes to our website, types in 'laptop', and clicks search. This brings up the first page of several thousand results. What I want to do is then filter these results further and present the user with options such as:
Filter By Manufacturer
Dell (10,000)
Acer (2,000)
Lenovo (6,000)
Filter By Colour
Black (7000)
Silver (2000)
The main columns of the database are like this - the primary key is an integer ID
ID Title Manufacturer Colour
The key part of the question is how to get the counts in various categories in an efficient manner. The only way I currently know how to do it is with separate queries. However, should we wish to filter by further categories then this will become very slow - especially as the database grows. My current SQL is this:
select count(*) as ManufacturerCount, Manufacturer from [ProductDB.Product] GROUP BY Manufacturer;
select count(*) as ColourCount, Colour from [ProductDB.Product] GROUP BY Colour;
My question is whether I can get the results as a single table using some kind of join or union, and whether this would be faster than my current method of issuing multiple queries with the COUNT(*) function. Thanks for your help; if you require any further information, please ask. PS: I am wondering how sites like eBay and Amazon manage to do this so fast. To understand my problem better, go to eBay and type in 'laptop': you will see a number of filters on the left, which is basically what I am trying to achieve. I don't know how it can be done efficiently when there are many filters. E.g. to get functionality equivalent to eBay I would need about 10 queries, and I'm sure that will be slow. I was thinking of creating an intermediate table with all the counts, but the intermediate table would have to be continuously updated to reflect changes to the database, and that would be a problem if there are multiple updates per minute. Thanks.
The "intermediate table" is exactly the way to go. I can guarantee you that no e-commerce site with substantial traffic and large number of products would do what you are suggesting on the fly at every inquiry.
If you are worried about keeping track of changes to products, just do all changes to the product catalog thru stored procs (my preferred method) or else use triggers.
One complication is how you will group things in the intermediate table. If you are only grouping on pre-defined categories and sub-categories that are built into the product hierarchy, then it's fairly easy. It sounds like you are allowing free-text search... if so, how will you manage multiple keywords that result in an unexpected intersection of different categories?

One way is to save the keywords searched along with the counts and a time stamp. Then, the next time someone searches on the same keywords, check the intermediate table: if the time stamp is older than some predetermined threshold (say, 5 minutes), return your results to a temp table, query the category counts from the temp table, overwrite the previous counts with the new time stamp, and return the whole enchilada to the web app. Otherwise, skip the temp table and just return the pre-aggregated counts and data records.

In this case, you might get some quirky front-end count behavior, like it might say "10 results" in a particular category but then when the user drills down, they actually find 9 or 11. It's happened to me on different sites as a customer and it's really not a big deal.
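Very roughly, the refresh logic I'm describing looks like this (just a Python sketch of the flow; an in-memory dict stands in for the intermediate table, and all the names are made up):

from datetime import datetime, timedelta
from collections import Counter

CACHE_TTL = timedelta(minutes=5)
_count_cache = {}  # search keywords -> (timestamp, {facet: Counter})

def facet_counts(keywords, run_search):
    # run_search(keywords) should return the matching rows, each with
    # 'Manufacturer' and 'Colour' keys (the expensive part).
    key = keywords.strip().lower()
    hit = _count_cache.get(key)
    if hit and datetime.utcnow() - hit[0] < CACHE_TTL:
        return hit[1]  # fresh enough: reuse the pre-aggregated counts
    rows = run_search(keywords)
    counts = {
        "Manufacturer": Counter(r["Manufacturer"] for r in rows),
        "Colour": Counter(r["Colour"] for r in rows),
    }
    _count_cache[key] = (datetime.utcnow(), counts)
    return counts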
BTW, I used to work for a well-known e-commerce company and we did things like this.
I am trying to implement customized search in my application. The table structure is given below
main table:
teacher
sub tables:
skills
skill_values
cities
city_values
The search is triggered by location, which is stored in the table city_values with reference fields user_id and city_id. The name of the city and its latitude and longitude are found in the table cities.
Searching also includes skills; the table relations are similar to those for cities. The users table and the skill_values table are related through the user_id field in skill_values, and the skills and skill_values tables are related through the skill_id field in skill_values.
Here we need to find the location of the user who performs the search and filter the results to within a 20-mile radius. There are a few other filters as well.
My problem is that I need to filter these results without a page reload, so I am using AJAX; but if the number of records increases, my AJAX request will take a long time to get a response.
Would it be a good idea to use an open-source search server like Sphinx or Solr to fetch the results from the server?
I am using CakePHP for development, and my application is hosted on a cloud server.
... but if the number of records increases, my AJAX request will take a long time to get a response.
Regardless of the search technology, there should be a pagination mechanism of some kind.
You should therefore be able to set the limit or maximum number of results returned per page.
When a user performs a search query, you can use JavaScript to request the first page of results.
You can then simply increment the page number and request the second, third, fourth page, etc.
This should mean that the top N results always appear in roughly the same amount of time.
It's then up to you to decide whether you want to request each page of search results sequentially (i.e. as the callback for each successful response), or wait for some kind of user input (i.e. clicking a 'more' link or scrolling to the end of the results).
The timeline/newsfeed pages on Twitter or Facebook are a good example of this technique.